AIX How-To Documents
1 - Preparing systems for migration
Use the following prerequisites and recommendations to ensure a successful migration to IBM Power for Google Cloud (IP4G).
For AIX systems, ensure that the operating systems are running the following minimum versions:
AIX Version | Minimum TL/ML | Notes |
---|---|---|
5.3 | 5300-12-04 | Only supported within a Versioned WPAR |
7.1 | 7100-05-06 | |
7.2 | 7200-05-01 | |
7.3 | 7300-00-01 | |
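As a rough illustration, the comparison against this table could be scripted. The sketch below assumes the current level is taken from `oslevel -s` (for example 7200-05-03-2148); the function name `meets_minimum` is hypothetical and is not part of AIX.

```shell
# meets_minimum CURRENT MINIMUM
#   CURRENT: a level string such as the output of `oslevel -s`
#   MINIMUM: a value from the table above, e.g. 7200-05-01
# Returns 0 if CURRENT is at or above MINIMUM, 1 if it is below,
# and 2 if the two strings belong to different AIX releases.
meets_minimum() {
    awk -v cur="$1" -v min="$2" 'BEGIN {
        split(cur, c, "-"); split(min, m, "-")
        if (c[1] != m[1]) exit 2                           # different release
        if (c[2]+0 != m[2]+0) exit !(c[2]+0 > m[2]+0)      # compare TL
        exit !(c[3]+0 >= m[3]+0)                           # same TL: compare SP
    }'
}
```

On a 7.2 system this might be used as `meets_minimum "$(oslevel -s)" 7200-05-01 && echo "level OK"`.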
Additional software requirements:
- The devices.fcp.disk.ibm.mpio package must not be installed. Uninstall it if necessary.
Additional recommendations:
- Install cloud-init. This can be done using dnf, yum, or RPM. cloud-init has several prerequisites that must be installed first. The cloud-init package is required for some features, including:
- Image capture and deploy.
- Automatic configuration of IP addresses, through the IP4G interface.
- Update MPIO settings for each disk to set algorithm to shortest_queue or round_robin:
chdev -P -l hdiskX -a algorithm=shortest_queue -a reserve_policy=no_reserve
- Reboot for the attribute changes to take effect.
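To apply the chdev command above to every disk, the per-disk commands can be generated from the `lspv` listing. This is a minimal sketch: it assumes every hdisk reported by `lspv` should receive the same settings and that shortest_queue is the chosen algorithm. The function name `mpio_chdev_commands` is hypothetical. It only prints the commands so they can be reviewed before being run.

```shell
# Reads `lspv` output on stdin and prints one chdev command per hdisk.
# Nothing is changed on the system; the output is meant for review.
mpio_chdev_commands() {
    awk '$1 ~ /^hdisk[0-9]+$/ {
        printf "chdev -P -l %s -a algorithm=shortest_queue -a reserve_policy=no_reserve\n", $1
    }'
}
```

For example, `lspv | mpio_chdev_commands` lists the commands, and `lspv | mpio_chdev_commands | sh` would run them.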
2 - Migrating to IP4G using a mksysb and alt_disk_mksysb
Use a mksysb from an existing system to migrate into IP4G. Do this by building a new system from a stock image, and using the alt_disk_mksysb command. The following steps highlight how to do so using the pcloud CLI. However, the IP4G specific steps can also be performed from the GUI.
Capturing a Source System mksysb
First, check if the fileset devices.fcp.disk.ibm.mpio exists. To do this, execute the following:
lslpp -Lc | grep devices.fcp.disk.ibm.mpio
If the fileset is installed, there are two ways to handle it. Either remove it from the source system before the mksysb is created, or remove it on the target after running alt_disk_mksysb. The second option involves using the alt_rootvg_op command to wake the alternate disk and remove the fileset from it.
Customers are responsible for evaluating the impact of removing the fileset in their environment.
To remove the fileset, execute:
installp -u devices.fcp.disk.ibm.mpio
To begin, take a mksysb on the source system. It is recommended that the rootvg of the source system is not mirrored. If it is, edit the image.data file to unmirror it before restoring it. Details for that are included below. If the source system is running AIX 7.2 or higher, use the following command:
mksysb -C -i /path/to/hostname.mksysb
If the source system is running AIX 7.1, use the following command:
mksysb -i /path/to/hostname.mksysb
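The choice between the two commands can be sketched as a small helper. It assumes the version string comes from `oslevel` (for example 7.2.0.0) and applies the rule stated above: `-C` for 7.2 or higher, no `-C` otherwise. The function name `mksysb_command` is hypothetical, and treating releases older than 7.1 the same as 7.1 is an assumption of this sketch.

```shell
# mksysb_command OSLEVEL OUTPUT_FILE
# Prints the mksysb command line suggested by the text for the given
# `oslevel` string. It does not run mksysb.
mksysb_command() {
    awk -v lvl="$1" -v out="$2" 'BEGIN {
        split(lvl, v, ".")
        # AIX 7.2 and later: include -C, as described above
        if (v[1]+0 > 7 || (v[1]+0 == 7 && v[2]+0 >= 2))
            flags = "-C -i"
        else
            flags = "-i"
        printf "mksysb %s %s\n", flags, out
    }'
}
```

For example, `mksysb_command "$(oslevel)" /path/to/hostname.mksysb` prints the command to run on the source system.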
Build the target system
Build the system on which to run alt_disk_mksysb. This system will boot at first from a stock AIX image. Then, after the mksysb restore, it will boot from the restored AIX image. If a stock image has not already been imported, see Obtaining a VM Image.
To get started, gather the following information:
- Use a hostname that will be the final hostname in IP4G.
- Use a stock image that most closely matches the source system. Later TL/SP levels are OK.
- Use the CPU, memory, CPU type, and network settings that the final host should have.
Example:
pcloud compute instances create hostname --image AIX-7200-05-09 --network gcp-network -c 0.25 -m 8 -t shared
Add disks to the target system
Two disks are needed: one for temporary storage of the mksysb file, and one as the target disk that alt_disk_mksysb restores onto. Wait for the new instance to build.
First, build the target disk for the mksysb. It needs to be large enough to hold the source system rootvg. Change the size and disk type appropriately for the system.
pcloud compute volumes create hostname-rootvg -s 20 -T ssd
Log into the new target system from the console as root. Then, run cfgmgr to discover the new disk. There should now be two disks: hdisk0, the original stock AIX image disk, and hdisk1, the target disk for the mksysb. One more disk is needed to hold the mksysb to restore. The easiest approach to clean up afterward is to add it to the rootvg and expand /tmp.
Use the pcloud command to create a new disk, changing the size to be sufficient for holding the mksysb:
pcloud compute volumes create hostname-tempdisk -s 20 -T ssd
Log into the target system, and discover the new disk with cfgmgr. The new disk should be hdisk2. To validate which is which, run:
lsmpio -qa
Add the disk to the rootvg. Note that it is important to only add the temporary disk here:
extendvg rootvg hdisk2
Add space to /tmp:
chfs -a size=+20G /tmp
Restore the mksysb on the target system
Use the following to restore the mksysb on a target system.
Copy the mksysb file from the original source system to the new target system.
Place it in /tmp. Use any preferred method, such as scp, for transferring the mksysb. Note that if the original rootvg was mirrored, unmirror it before using alt_disk_mksysb. Do this by restoring the image.data file and editing it. It must be edited so the PPs line for each LV is equal to the LPs line. The restore command extracts image.data into the current directory, so run it from /tmp:
restore -xqvf /tmp/hostname.mksysb ./image.data
If the image.data file was edited, tell alt_disk_mksysb to use it by adding the flag -i /tmp/image.data to the command.
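The image.data edit described above can be sketched with awk rather than done by hand. This sketch assumes each lv_data stanza contains a line beginning with `LPs=` followed later by a line beginning with `PP=`, which is how the PPs value is assumed to appear in the file. Check a real image.data before relying on it. The function name `unmirror_image_data` is hypothetical, and the sketch implements only the PPs-equals-LPs step from the text.

```shell
# Reads an image.data file on stdin and writes a copy to stdout in which
# every "PP=" line carries the value of the preceding "LPs=" line.
# Lines such as PP_SIZE= and STALE_PPs= are left untouched.
unmirror_image_data() {
    awk '
        /^[ \t]*LPs=/ { lps = $2 }                  # remember logical partitions
        /^[ \t]*PP=/  { sub(/PP=.*/, "PP= " lps) }  # make PPs equal to LPs
        { print }
    '
}
```

One possible usage is `unmirror_image_data < /tmp/image.data > /tmp/image.data.new`, followed by reviewing the result and moving it into place.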
To restore the mksysb use the alt_disk_mksysb command:
alt_disk_mksysb -m /tmp/hostname.mksysb -d hdisk1 -z -c /dev/vty0
- This will automatically set the bootlist to boot off of the new volume, hdisk1.
- Reboot:
shutdown -Fr
Confirm the VM has booted from the disk containing the restored mksysb. In this example, that would be hdisk1. Use lspv to validate which disks and volume groups are present:
lspv
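To make that check less error-prone, the disk backing the active rootvg can be picked out of the `lspv` listing. A minimal sketch, assuming the usual `lspv` column order of disk name, PVID, volume group, and state. The function name `rootvg_disks` is hypothetical.

```shell
# Reads `lspv` output on stdin and prints the name of every disk
# whose volume group column is exactly "rootvg".
rootvg_disks() {
    awk '$3 == "rootvg" { print $1 }'
}
```

In this example, `lspv | rootvg_disks` would be expected to print hdisk1 after the reboot, with hdisk0 and hdisk2 listed under old_rootvg.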
Clean up the temporary disks
Use the following to clean up the temporary disks.
Use exportvg to remove the old rootvg, then remove the old disk devices:
exportvg old_rootvg
rmdev -dl hdisk0
rmdev -dl hdisk2
Use the pcloud command to set the new rootvg volume as bootable:
pcloud compute volumes update hostname-rootvg --bootable yes
Use the pcloud command to clean up the old hdisks. Find the old boot disk name using:
pcloud compute instances describe hostname
Set the old disk as not bootable
Set the old disk so it is not bootable. Match the name to the boot-0 volume from the instances describe output. To do this use:
pcloud compute volumes update hostname-d4751509-00000b25-boot-0 --bootable no
Delete the original boot volume and the temporary disks
Use the following to delete the original boot volume, and the temporary disks.
pcloud compute volumes delete hostname-d4751509-00000b25-boot-0
pcloud compute volumes delete hostname-tempdisk
3 - AIX MPIO Recommendations
This document outlines Multipath I/O (MPIO) best practices for IBM Power for Google Cloud (IP4G) deployments, focusing on actions customers can take to ensure optimal performance and availability.
Configuration
Converge handles the underlying MPIO configuration, including redundant paths, adapter diversity, and fabric management.
You can learn more about the general MPIO configuration from the official IBM documentation.
It is important that customers understand their application and select the MPIO policies that best suit its requirements.
Key Considerations:
- Redundant Paths: Converge provides four physical paths to the backend storage, distributed across two VIOS, for enhanced redundancy.
- Dual Fabric Fibre Channel: Converge uses dual fabric Fibre Channel for all paths to minimize single points of failure.
- Pathing Policy: Customers should understand and adjust the MPIO pathing policy if needed. Common options include:
- Round Robin: Distributes I/O requests evenly across available paths.
- Shortest Queue: Directs I/O to the path with the least congestion.
- Failover Only: Designates a primary path and uses alternative paths only when the primary fails.
Monitoring Available Paths
Regularly monitor the status of MPIO paths to proactively identify potential issues. Customers should consider integrating MPIO path status into their monitoring and alerting. Sample commands for monitoring:
lspath:
lspath
This command displays path status for all devices.
lspath -l <device_name>
This command displays all paths associated with a specific device, including their status (Available, Defined, Failed).
lsmpio:
lsmpio
This command shows detailed information for all MPIO devices, including the status of each path.
lsmpio -l <device_name>
This command provides detailed information for a specific device.
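For monitoring and alerting, the `lspath` output can be reduced to a simple pass or fail check. This is a sketch under the assumption that `lspath` prints one path per line as status, device, and parent adapter, and that any status other than Enabled deserves attention. The function name `degraded_paths` is hypothetical.

```shell
# Reads `lspath` output on stdin, prints every path that is not Enabled,
# and returns non-zero if at least one such path was found.
degraded_paths() {
    awk '$1 != "Enabled" { print; bad = 1 } END { exit bad + 0 }'
}
```

A monitoring job might run `lspath | degraded_paths || echo "MPIO path problem detected"` and forward the message to the alerting system in use.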
Scheduled Maintenance
Converge manages all hardware and VIOS maintenance, including firmware upgrades and network changes. Converge sends notifications in advance of any planned maintenance.
Before Maintenance:
Check Path Status: Use lspath or lsmpio to get a baseline of current path status. This will help identify any discrepancies after maintenance.
Resolve any Down Paths: If any paths are found to be down, fix them prior to maintenance to avoid an outage. A standard method for doing so is to:
- Find the failed paths using lspath, note the hdisk and fscsi device
- Remove the failed paths using
rmpath -l hdiskX -p fscsiY
- Rediscover all paths using cfgmgr
- Use lspath to verify the path state
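The first two steps above can be sketched as a helper that turns failed paths into the matching rmpath commands. It assumes the same status, device, parent layout of `lspath` output and only prints the commands, so they can be reviewed before anything is removed. The function name `rmpath_commands` is hypothetical.

```shell
# Reads `lspath` output on stdin and prints an rmpath command
# for each path whose status is Failed.
rmpath_commands() {
    awk '$1 == "Failed" { printf "rmpath -l %s -p %s\n", $2, $3 }'
}
```

For example, `lspath | rmpath_commands` lists the removals. After running them, `cfgmgr` and `lspath` complete the remaining steps.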
After Maintenance:
Verify Path Status: Use lspath or lsmpio again to confirm that all paths have recovered and are in the “Available” state.
Recover Paths: Sometimes AIX does not automatically recover paths. In these scenarios, customers should attempt to recover the paths. A standard method for doing so is to:
- Find the failed paths using lspath, note the hdisk and fscsi device
- Remove the failed paths using
rmpath -l hdiskX -p fscsiY
- Rediscover all paths using cfgmgr
- Use lspath to verify the path state
Report Issues: If there are any issues with pathing or storage connectivity after maintenance, promptly report them to Converge for resolution.
By following these guidelines and proactively monitoring MPIO paths, customers can ensure the high availability and performance of their applications running on IBM Power for Google Cloud.
4 - AIX TCP/IP Settings
Optimizing TCP/IP Settings for Improved Network Performance in IP4G
This document provides guidance on adjusting TCP/IP settings in your IP4G environment within Google Cloud to potentially enhance network performance. These settings are intended as starting points and may require further tuning based on the specific needs of your applications and virtual machines.
Note: Before making any changes, ensure you have a baseline understanding of your current network performance. This will help you assess the impact of any adjustments made.
Recommended TCP/IP Settings
The following commands can be used to modify the TCP/IP settings:
chdev -l en0 -a tcp_sendspace=2049152
chdev -l en0 -a tcp_recvspace=2049152
chdev -l en0 -a rfc1323=1
chdev -l en0 -a mtu=1460
no -p -o sb_max=8196608
no -p -o tcp_nodelayack=0
no -p -o sack=1
chdev -l en0 -a mtu_bypass=on
no -p -o tcp_sendspace=2049152
no -p -o tcp_recvspace=2049152
Explanation of Settings
- tcp_sendspace & tcp_recvspace: These settings control the send and receive buffer sizes for TCP connections. Increasing these values can improve performance, especially for high-bandwidth connections.
- rfc1323: Enables TCP extensions defined in RFC 1323, including Timestamps and Window Scaling, which can improve performance on high-latency connections.
- mtu: Sets the Maximum Transmission Unit (MTU) size. This value determines the largest packet size that can be transmitted over a network. In Google Cloud, the default VPC MTU is 1460 bytes. While you can adjust this to a value between 1300 and 8896 bytes (inclusive), it’s generally recommended to keep the MTU at 1460 to ensure compatibility within the Google Cloud environment and avoid potential fragmentation issues. If your VPC is configured with a custom MTU, ensure the mtu setting on your IP4G instances matches the VPC MTU. If your GCP VPC is at the default 1460 MTU, your IP4G AIX instances should use an MTU of 1440.
- sb_max: Sets the maximum socket buffer size. Increasing this value can improve performance for applications that utilize large socket buffers.
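- tcp_nodelayack: Controls delayed acknowledgments. The value 0 used above leaves delayed ACKs enabled, which reduces ACK traffic. Setting it to 1 makes TCP acknowledge segments immediately, which can reduce latency for certain applications but increases network overhead.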
- sack: Enables Selective Acknowledgment (SACK), which can improve performance in the presence of packet loss.
- mtu_bypass: Enables large send offload on the interface, which lets TCP pass segments larger than the MTU down to the adapter for segmentation, potentially improving throughput for certain applications.
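One common way to judge whether the send and receive buffers are large enough is the bandwidth-delay product: the number of bytes that can be in flight on a link at once. The sketch below is only an illustration. The link speeds and round-trip times used with it are hypothetical inputs, and the function name `bdp_bytes` is not part of AIX.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS
# Prints the bandwidth-delay product in bytes.
# 1 Mbit/s sustained for 1 ms is 1000 bits, which is 125 bytes.
bdp_bytes() {
    awk -v mbps="$1" -v rtt_ms="$2" 'BEGIN {
        printf "%d\n", mbps * rtt_ms * 125
    }'
}
```

For a hypothetical 1000 Mbit/s path with a 10 ms round trip, `bdp_bytes 1000 10` prints 1250000, which fits within the 2049152-byte buffers set above. A 10000 Mbit/s path with a 2 ms round trip gives 2500000, which would exceed them.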
Evaluating the Results
After implementing these settings, it’s essential to monitor your network performance to determine their effectiveness. Several tools can assist in this evaluation:
- Network Monitoring Tools: Utilize tools like netstat, tcpdump, or Wireshark to monitor network traffic and identify any bottlenecks or performance issues.
- Performance Benchmarking Tools: Employ tools like iperf3 to measure network throughput and latency before and after applying the settings.
- Application-Specific Monitoring: Monitor the performance of your applications to assess the impact of the TCP/IP adjustments on their behavior.
Remember: These settings are starting points, and further adjustments may be necessary based on your specific environment and application requirements. Continuously monitor and fine-tune these settings to optimize your network performance.
Additional Considerations
- Google Cloud Network Infrastructure: When adjusting TCP/IP settings, consider the characteristics of your Google Cloud Virtual Private Cloud (VPC) network. Factors like the configured MTU (typically 1460 bytes), subnets, firewall rules, and any network virtualization layers can influence network performance. Ensure your settings are compatible with your VPC configuration and don’t introduce unintended bottlenecks.
- Application Requirements: Different applications have varying network performance needs. Research and understand the specific requirements of your applications to fine-tune the settings accordingly. For example, applications sensitive to latency might benefit from setting tcp_nodelayack to 1 so that acknowledgments are sent immediately, while those prioritizing throughput might benefit from larger send and receive buffers.
- Virtual Machine Configuration: If you’re running virtual machines on Compute Engine, ensure the virtual network interfaces are configured correctly. Verify that the machine type provides sufficient network bandwidth and that no resource limitations on the VM instance are hindering network performance.
By carefully adjusting and monitoring your TCP/IP settings, you can potentially enhance the performance of your IP4G environment and ensure optimal network efficiency for your applications.
5 - Install gcloud SDK on AIX
Installing the gcloud SDK on AIX allows you to upload to and download from Google Cloud Storage buckets, as well as control other aspects of your Google Cloud environment. On AIX, it is primarily used for interacting with storage buckets and objects.
This guide is not comprehensive, as covering all AIX versions and types is not possible. Note that installation is easiest on AIX 7.3, because the SDK requires Python 3.8 or above. This example assumes a system built from the downloaded AIX 7.3 TL1 stock image.
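The Python requirement can be checked before starting. A minimal sketch, assuming the version string has the form printed by `python3 --version` (for example Python 3.9.16). The function name `python_meets_minimum` is hypothetical.

```shell
# python_meets_minimum "Python X.Y.Z"
# Returns 0 if the version is 3.8 or above, non-zero otherwise.
python_meets_minimum() {
    echo "$1" | awk '{
        split($2, v, ".")
        exit !(v[1]+0 > 3 || (v[1]+0 == 3 && v[2]+0 >= 8))
    }'
}
```

For example: `python_meets_minimum "$(python3 --version)" || echo "Python 3.8 or above is required"`.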
First, prepare your filesystems for new content
chfs -a size=+2G /opt
chfs -a size=+500M /tmp
chfs -a size=+500M /var
Next, it is recommended that you update your system using SUMA. To do this, first clear out /usr/sys/inst.images:
rm -rf /usr/sys/inst.images/*
smitty suma
Select Download Updates Now (Easy)
Select Download All Latest Fixes
Once those have downloaded, update your system using
smitty update_all
For the directory, enter /usr/sys/inst.images
Change ACCEPT new license agreements? to yes
Once those updates have installed run
updtvpkg
dnf update python3 dnf
You should now be ready to install requisite software for the gcloud sdk
dnf install curl coreutils tar git bash python3-pip
Change your path to use the new gnu utilities
export PATH=/opt/freeware/bin:$PATH
Download the gcloud sdk installer and run it
curl https://sdk.cloud.google.com | bash
For an installation directory use /opt/freeware/
For Do you want to help improve the Google Cloud CLI (y/N)? say n
You will now see:
ERROR: (gcloud.components.update) The following components are unknown [gcloud-crc32c].
You can disregard this. You may wish to switch to a non-root user for the remaining steps.
Set your path
export PATH=/opt/freeware/google-cloud-sdk/bin/:$PATH:/opt/freeware/bin
Now you can run the google cloud sdk:
gcloud auth login
Follow the login prompts, pasting in the code to authenticate.
Adjust your CRC settings. Using if_fast_else_skip is faster and uses less CPU, but it skips CRC checking whenever a fast checksum implementation is not available.
gcloud config set storage/check_hashes if_fast_else_skip
or
gcloud config set storage/check_hashes always
You should now be able to list the content of buckets you have access to, and download files.
gcloud storage ls gs://<bucketname>
gcloud storage cp gs://<bucketname>/<filename> /path/to/download/
6 - RMC details and troubleshooting
This article provides details on Resource Monitoring and Control (RMC). Also presented below are troubleshooting methods for common problems.
Use the methods below to address issues. If unable to resolve the issue, reach out for support. For more information about contacting support, see Obtaining a VM Image.
What is RMC?
Management consoles use RMC to perform dynamic operations on a virtual machine (VM). RMC connections are routed through a dedicated internal virtual network using IPv6. That network’s configuration prevents a VM from communicating with another VM.
How to troubleshoot RMC
The methods below can help troubleshoot common problems with RMC. The most common problem is a VM that cannot be modified online and is in an unhealthy state. In the example below, the Health of the virtual machine is listed as “Warning”.
$ pcloud compute instances list
InstanceID Name Status Health IPs
12345678-9abc-123a-b456-789abcdef123 lpar1 ACTIVE WARNING [192.168.1.5]
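When many instances are listed, the ones needing attention can be filtered out of this output. The sketch assumes the column layout shown in the example above (InstanceID, Name, Status, Health, IPs, separated by whitespace, with a single header line) and that instance names contain no spaces. The function name `warning_instances` is hypothetical.

```shell
# Reads `pcloud compute instances list` output on stdin and prints the
# name of every instance whose Health column reads WARNING.
warning_instances() {
    awk 'NR > 1 && $4 == "WARNING" { print $2 }'
}
```

For example: `pcloud compute instances list | warning_instances`.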
Restart RMC
Restarting RMC is the most common solution.
/usr/sbin/rsct/bin/rmcctrl -z
/usr/sbin/rsct/bin/rmcctrl -A
/usr/sbin/rsct/bin/rmcctrl -p
Be aware that layered software using Reliable Scalable Cluster Technology (RSCT) will be impacted. For example, this will trigger an immediate failover in PowerHA environments.
Validate RSCT version
Validate the version of the RSCT. Methods for this depend on the operating system. The RSCT packages must be at version 3.2.1.0 or later.
- AIX
lslpp -L rsct.*
- RedHat
rpm -qa | grep -e rsct -e src
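On AIX, the version comparison can be scripted against the colon-separated listing from `lslpp -Lc`. This is a sketch under the assumption that the fileset name is in the second field and its level in the third, with header lines starting with `#`. The function name `rsct_below_minimum` is hypothetical.

```shell
# Reads `lslpp -Lc` output on stdin, prints every rsct.* fileset whose
# level is older than 3.2.1.0, and returns non-zero if any were found.
rsct_below_minimum() {
    awk -F: -v min="3.2.1.0" '
        # older(a, b): 1 if dotted version a is lower than b, else 0
        function older(a, b,    x, y, i) {
            split(a, x, "."); split(b, y, ".")
            for (i = 1; i <= 4; i++) {
                if (x[i]+0 < y[i]+0) return 1
                if (x[i]+0 > y[i]+0) return 0
            }
            return 0
        }
        /^#/ { next }                                   # skip header lines
        $2 ~ /^rsct\./ && older($3, min) { print $2, $3; bad = 1 }
        END { exit bad + 0 }
    '
}
```

For example: `lslpp -Lc | rsct_below_minimum || echo "RSCT needs updating"`.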
Gathering RMC information
Use the following to gather information about the RMC. This information can be helpful in resolving many issues.
/usr/sbin/rsct/bin/lsnodeid
lsrsrc IBM.MCP
/opt/rsct/bin/rmcdomainstatus -s ctrmc
Validating Connectivity
Validate the connectivity by using the methods below.
- Verify that the en1 interface has an IPv6 address beginning with fe80::
- For AIX use:
netstat -in
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
...
en1 1500 fe80::ecad:f1ff:febe:ea13 711114 0 711198 0 0
...
Make sure the following lines are uncommented in /etc/rc.tcpip:
start /usr/sbin/autoconf6 "" " -i en1"
start /usr/sbin/ndpd-host "$src_running"
Then, execute the following:
autoconf6 -i en1
- For Linux use:
ip addr show
- Get the HMC or Novalink IPv6 address from the virtual machine. Use this command:
lsrsrc IBM.MCP
- Ping the IPv6 address. If the ping fails, please escalate to support.
- Telnet ipv6_address 657. If a ping is successful, but telnet fails to connect, there may be a firewall issue.
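The first check in this list can be sketched as a small filter over the `netstat -in` output. It assumes the interface name is the first column and that the link-local address appears as its own whitespace-separated field, as in the sample output for en1. The function name `has_link_local` is hypothetical.

```shell
# has_link_local INTERFACE
# Reads `netstat -in` output on stdin and returns 0 if the interface
# has at least one address beginning with fe80::
has_link_local() {
    awk -v ifc="$1" '
        $1 == ifc {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^fe80::/) found = 1
        }
        END { exit !found }
    '
}
```

For example: `netstat -in | has_link_local en1 || echo "en1 has no link-local IPv6 address"`.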
Verify the services are active
Use the following command to verify if the services are active.
lssrc -s ndpd-host
If it isn’t active, use the following:
startsrc -s ndpd-host
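The check and the start can be combined so that the subsystem is started only when needed. A minimal sketch, assuming `lssrc -s` prints a header line followed by one line per subsystem, with the status as the last field. The function name `subsystem_active` is hypothetical.

```shell
# subsystem_active NAME
# Reads `lssrc -s NAME` output on stdin and returns 0 only if the
# subsystem line ends with the status "active".
subsystem_active() {
    awk -v name="$1" '
        $1 == name && $NF == "active" { ok = 1 }
        END { exit !ok }
    '
}
```

For example: `lssrc -s ndpd-host | subsystem_active ndpd-host || startsrc -s ndpd-host`.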