1 - Preparing systems for migration

Use the following prerequisites and recommendations to ensure a successful migration to IBM Power for Google Cloud (IP4G).

For AIX systems, ensure that the operating systems are running the following minimum versions:

AIX Version    Minimum TL/ML    Notes
5.3            5300-12-04       Only supported within a Versioned WPAR
7.1            7100-05-06
7.2            7200-05-01
7.3            7300-00-01

Additional software requirements:

  • The devices.fcp.disk.ibm.mpio package must not be installed. Uninstall it if necessary.

Additional recommendations:

  • Install cloud-init, using dnf, yum, or RPM. Cloud-init has several prerequisite packages that must also be installed. The cloud-init package is required to use certain IP4G features, including:
    • Image capture and deploy.
    • Automatic configuration of IP addresses through the IP4G interface.
  • Update the MPIO settings for each disk to set the algorithm to shortest_queue or round_robin (a sketch that applies this to every disk follows this list):
    chdev -P -l hdiskX -a algorithm=shortest_queue -a reserve_policy=no_reserve
    
  • Reboot for the attribute changes to take effect.
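
The chdev change above must be repeated for every disk on the VM. As a minimal sketch, assuming all hdisks should receive the same settings, the attributes can be applied in a loop and then activated with a reboot:

for d in $(lsdev -Cc disk -F name); do
    chdev -P -l "$d" -a algorithm=shortest_queue -a reserve_policy=no_reserve
done
shutdown -Fr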

2 - Migrating to IP4G using a mksysb and alt_disk_mksysb

Use a mksysb from an existing system to migrate into IP4G. Do this by building a new system from a stock image and using the alt_disk_mksysb command. The following steps highlight how to do so using the pcloud CLI. However, the IP4G-specific steps can also be performed from the GUI.

Capturing a Source System mksysb

First, check if the fileset devices.fcp.disk.ibm.mpio exists. To do this, execute the following:

lslpp -Lc | grep devices.fcp.disk.ibm.mpio

If the fileset is installed, there are two ways to handle it: remove it from the source system before the mksysb is created, or remove it after running alt_disk_mksysb. The latter involves using the alt_rootvg_op command to wake the alternate disk and then removing the fileset.

Customers are responsible for evaluating the impact of removing the fileset in their environment.

To remove the fileset, execute:

installp -u devices.fcp.disk.ibm.mpio

To begin, take a mksysb on the source system. It is recommended that the rootvg of the source system is not mirrored. If it is, edit the image.data file to unmirror it before restoring; details are included below. If the source system is running AIX 7.2 or higher, use the following command:

mksysb -C -i /path/to/hostname.mksysb

If the source system is running AIX 7.1, use the following command:

mksysb -i /path/to/hostname.mksysb

Build the target system

Build the system to run the alt_disk_mksysb on. This system will boot at first from a stock AIX image; after the mksysb restore, it will boot from the restored AIX image. If a stock image has not already been imported, see Obtaining a VM Image.

To get started, gather the following information:

  • Use a hostname that will be the final hostname in IP4G.
  • Use a stock image that most closely matches the source system. Later TL/SP levels are OK.
  • Use the desired CPU, Memory, CPU Type, and Network settings that the final host should have.

Example:

pcloud compute instances create hostname --image AIX-7200-05-09 --network gcp-network -c 0.25 -m 8 -t shared
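
While the instance builds, its status can be checked with the list subcommand (shown later in this guide with sample output):

pcloud compute instances list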

Add disks to the target system

Two disks are needed: one to temporarily hold the mksysb file, and one as the target disk for the alt_disk_mksysb restore. Wait for the new instance to finish building.

  1. First, build the target disk for the mksysb. It needs to be large enough to hold the source system rootvg. Change the size and disk type appropriately for the system.

    pcloud compute volumes create hostname-rootvg -s 20 -T ssd
    
  2. Log into the new target system from the console as root, then run cfgmgr to discover the new disk. There should now be two disks: hdisk0, the original stock AIX image disk, and hdisk1, the target disk for the mksysb restore. One more disk is needed to hold the mksysb file to be restored. The easiest to clean up later is to add it to the rootvg and expand /tmp.

  3. Use the pcloud command to create a new disk, changing the size to be sufficient for holding the mksysb:

    pcloud compute volumes create hostname-tempdisk -s 20 -T ssd
    
  4. Log into the target system, and discover the new disk with cfgmgr. The new disk should be hdisk2. To validate which disk is which, run lsmpio -qa (see the sketch after this list).

  5. Add the disk to the rootvg. Note that it is important to only add the temporary disk here: extendvg rootvg hdisk2.

  6. Add space to /tmp: chfs -a size=+20G /tmp
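
Before extending rootvg, it can help to double-check which hdisk is which, as mentioned in step 4. A minimal sketch, assuming the disk names used above, compares the device details and reported sizes:

lsmpio -qa
bootinfo -s hdisk1
bootinfo -s hdisk2

bootinfo -s reports each disk's size in megabytes, which, together with the lsmpio output, helps confirm which disk is the stock boot disk, which is the restore target, and which is the temporary disk.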

Restore the mksysb on the target system

Use the following to restore the mksysb on a target system.

  1. Copy the mksysb file from the original source system to the new target system.

  2. Place it in /tmp. Use any preferred method, such as scp, for transferring the mksysb. Note that if the original rootvg was mirrored, unmirror it before using alt_disk_mksysb. Do this by restoring the image.data file and editing it so the PPs value for each LV equals the LPs value (an illustrative edit follows this list). To restore image.data, use this command:

    restore -xqvf /tmp/hostname.mksysb ./image.data
    
    • If this had to be done, use the new image.data when restoring by adding the flag -i /tmp/image.data to the alt_disk_mksysb command.

    • To restore the mksysb use the alt_disk_mksysb command:

    alt_disk_mksysb -m /tmp/hostname.mksysb -d hdisk1 -z -c /dev/vty0
    
    • This will automatically set the bootlist to boot off of the new volume, hdisk1.
    • Reboot: shutdown -Fr
  3. Confirm the VM has booted from the disk containing the restored mksysb. In this example, that would be hdisk1. Use lspv to validate which disks and VGs are present:

    lspv
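
The image.data edit mentioned in step 2 is only needed when the source rootvg was mirrored. As an illustrative sketch (the fields below come from a typical lv_data stanza; the exact layout varies by AIX level, and setting COPIES to 1 is an assumption based on common unmirroring practice), edit each logical volume's stanza so PPs equals LPs:

Before (mirrored):
    COPIES= 2
    LPs= 128
    PPs= 256

After (unmirrored):
    COPIES= 1
    LPs= 128
    PPs= 128

Then pass the edited file to alt_disk_mksysb with -i /tmp/image.data, as noted above.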
    

Clean up the temporary disks

Use the following to clean up the temporary disks.

  1. Use exportvg to remove the old rootvg, then remove the old boot disk and the temporary disk devices:

    exportvg old_rootvg
    rmdev -dl hdisk0
    rmdev -dl hdisk2

  2. Use the pcloud command to set the new rootvg volume as bootable:

    pcloud compute volumes update hostname-rootvg --bootable yes
    
  3. Use the pcloud command to clean up the old hdisks. Find the old boot disk name using:

    pcloud compute instances describe hostname
    

Set the old disk as not bootable

Set the old disk so it is not bootable. Match the name to the boot-0 volume from the instances describe output. To do this use:

pcloud compute volumes update hostname-d4751509-00000b25-boot-0 --bootable no

Delete the original boot volume and the temporary disks

Use the following to delete the original boot volume, and the temporary disks.

pcloud compute volumes delete hostname-d4751509-00000b25-boot-0

pcloud compute volumes delete hostname-tempdisk

3 - AIX MPIO Recommendations

This document outlines Multipath I/O (MPIO) best practices for IBM Power for Google Cloud (IP4G) deployments, focusing on actions customers can take to ensure optimal performance and availability.

Configuration

Converge handles the underlying MPIO configuration, including redundant paths, adapter diversity, and fabric management.

You can learn more about general MPIO configuration in the official IBM documentation.

It is important that customers understand their applications and select the MPIO policies that best suit their requirements.

Key Considerations:

  1. Redundant Paths: Converge provides four physical paths to the backend storage, distributed across two VIOS, for enhanced redundancy.
  2. Dual Fabric Fibre Channel: Converge uses dual-fabric Fibre Channel for all paths to minimize single points of failure.
  3. Pathing Policy: Customers should understand and, if needed, adjust the MPIO pathing policy (see the example after this list). Common options include:
    • Round Robin: Distributes I/O requests evenly across available paths.
    • Shortest Queue: Directs I/O to the path with the least congestion.
    • Failover Only: Designates a primary path and uses alternative paths only when the primary fails.
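
As a minimal sketch (hdisk0 is a placeholder), the current policy on a disk can be checked with lsattr and changed with chdev; the -P flag defers the change until the next reboot:

lsattr -El hdisk0 -a algorithm -a reserve_policy
chdev -P -l hdisk0 -a algorithm=round_robin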

Monitoring Available Paths

Regularly monitor the status of MPIO paths to proactively identify potential issues. Customers should consider integrating MPIO path status into their monitoring and alerting. Sample commands for monitoring:

lspath:

lspath

This command displays path status for all devices.

lspath -l <device_name> 

This command displays all paths associated with a specific device, including their status (for example Enabled, Failed, Defined, or Missing).

lsmpio:

lsmpio

This command shows detailed information for all MPIO storage devices and their paths, including the parent adapter and path status.

lsmpio -l <device_name>

This command provides detailed information for a specific device.
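
For scripted monitoring, a simple approach (assuming the default lspath output format) is to alert on any path that is not reported as Enabled:

lspath | grep -v Enabled

An empty result means all paths are healthy; any output lists paths that need attention.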

Scheduled Maintenance

Converge manages all hardware and VIOS maintenance, including firmware upgrades and network changes. Converge sends notifications in advance of any planned maintenance.

Before Maintenance:

  • Check Path Status: Use lspath or lsmpio to get a baseline of current path status. This will help identify any discrepancies after maintenance.

  • Resolve any Down Paths: If any paths are found to be down, they should be fixed prior to maintenance to avoid an outage. A standard method for doing so is to:

    • Find the failed paths using lspath, note the hdisk and fscsi device
    • Remove the failed paths using rmpath -l hdiskX -p fscsiY
    • Rediscover all paths using cfgmgr
    • Use lspath to verify the path state

After Maintenance:

  • Verify Path Status: Use lspath or lsmpio again to confirm that all paths have recovered and are in the “Enabled” state.

  • Recover Paths: Sometimes AIX does not automatically recover paths. In these scenarios, customers should attempt to recover the paths (a concrete example follows this list). A standard method for doing so is to:

    • Find the failed paths using lspath, note the hdisk and fscsi device
    • Remove the failed paths using rmpath -l hdiskX -p fscsiY
    • Rediscover all paths using cfgmgr
    • Use lspath to verify the path state
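
As a concrete example of that sequence (hdisk2 and fscsi1 are placeholders for the devices reported by lspath):

lspath | grep -w Failed
rmpath -l hdisk2 -p fscsi1
cfgmgr
lspath -l hdisk2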

  • Report Issues: If there are any issues with pathing or storage connectivity after maintenance, promptly report them to Converge for resolution.

By following these guidelines and proactively monitoring MPIO paths, customers can ensure the high availability and performance of their applications running on IBM Power for Google Cloud.

4 - AIX TCP/IP Settings

Optimizing TCP/IP Settings for Improved Network Performance in IP4G

This document provides guidance on adjusting TCP/IP settings in your IP4G environment within Google Cloud to potentially enhance network performance. These settings are intended as starting points and may require further tuning based on the specific needs of your applications and virtual machines.

Note: Before making any changes, ensure you have a baseline understanding of your current network performance. This will help you assess the impact of any adjustments made.

The following commands can be used to modify the TCP/IP settings:

chdev -l en0 -a tcp_sendspace=2049152
chdev -l en0 -a tcp_recvspace=2049152
chdev -l en0 -a rfc1323=1
chdev -l en0 -a mtu=1460
no -p -o sb_max=8196608
no -p -o tcp_nodelayack=0
no -p -o sack=1
chdev -l en0 -a mtu_bypass=on
no -p -o tcp_sendspace=2049152
no -p -o tcp_recvspace=2049152

Explanation of Settings

  • tcp_sendspace & tcp_recvspace: These settings control the send and receive buffer sizes for TCP connections. Increasing these values can improve performance, especially for high-bandwidth connections.
  • rfc1323: Enables TCP extensions defined in RFC 1323, including Timestamps and Window Scaling, which can improve performance on high-latency connections.
  • mtu: Sets the Maximum Transmission Unit (MTU) size. This value determines the largest packet size that can be transmitted over a network. In Google Cloud, the default VPC MTU is 1460 bytes. While you can adjust this to a value between 1300 and 8896 bytes (inclusive), it’s generally recommended to keep the MTU at 1460 to ensure compatibility within the Google Cloud environment and avoid potential fragmentation issues. If your VPC is configured with a custom MTU, ensure the mtu setting on your IP4G instances matches the VPC MTU. If your GCP VPC is at the default 1460 MTU, your IP4G AIX instances should use an MTU of 1440.
  • sb_max: Sets the maximum socket buffer size. Increasing this value can improve performance for applications that utilize large socket buffers.
  • tcp_nodelayack: Controls delayed TCP acknowledgments. Setting it to 1 sends acknowledgments immediately, which can reduce latency for certain applications but may increase network overhead. The example above leaves it at the default of 0 (delayed acknowledgments enabled).
  • sack: Enables Selective Acknowledgment (SACK), which can improve performance in the presence of packet loss.
  • mtu_bypass: Enables largesend (TCP segmentation offload) on the interface, allowing TCP to hand segments larger than the MTU to the adapter for segmentation, which can improve throughput for certain applications.
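
To confirm the values after making the changes, the interface attributes and no tunables can be listed (en0 is assumed to be the interface tuned above):

lsattr -El en0 -a tcp_sendspace -a tcp_recvspace -a rfc1323 -a mtu -a mtu_bypass
no -o sb_max -o tcp_nodelayack -o sack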

Evaluating the Results

After implementing these settings, it’s essential to monitor your network performance to determine their effectiveness. Several tools can assist in this evaluation:

  • Network Monitoring Tools: Utilize tools like netstat, tcpdump, or Wireshark to monitor network traffic and identify any bottlenecks or performance issues.
  • Performance Benchmarking Tools: Employ tools like iperf3 to measure network throughput and latency before and after applying the settings (a brief example follows this list).
  • Application-Specific Monitoring: Monitor the performance of your applications to assess the impact of the TCP/IP adjustments on their behavior.
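
For example, a quick throughput test with iperf3 (assuming iperf3 is installed on both VMs and the port it uses is open between them) runs a server on one VM and a client on the other:

iperf3 -s
iperf3 -c <server_ip> -t 30 -P 4

The -t flag sets the test length in seconds and -P runs parallel streams; compare the reported throughput before and after applying the TCP/IP changes.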

Remember: These settings are starting points, and further adjustments may be necessary based on your specific environment and application requirements. Continuously monitor and fine-tune these settings to optimize your network performance.

Additional Considerations

  • Google Cloud Network Infrastructure: When adjusting TCP/IP settings, consider the characteristics of your Google Cloud Virtual Private Cloud (VPC) network. Factors like the configured MTU (typically 1460 bytes), subnets, firewall rules, and any network virtualization layers can influence network performance. Ensure your settings are compatible with your VPC configuration and don’t introduce unintended bottlenecks.
  • Application Requirements: Different applications have varying network performance needs. Research and understand the specific requirements of your applications to fine-tune the settings accordingly. For example, latency-sensitive applications might benefit from enabling tcp_nodelayack (sending acknowledgments immediately), while those prioritizing throughput might benefit from larger send and receive buffers.
  • Virtual Machine Configuration: If you’re running virtual machines on Compute Engine, ensure the virtual network interfaces are configured correctly. Verify that the machine type provides sufficient network bandwidth and that no resource limitations on the VM instance are hindering network performance.

By carefully adjusting and monitoring your TCP/IP settings, you can potentially enhance the performance of your IP4G environment and ensure optimal network efficiency for your applications.

5 - Install gcloud SDK on AIX

Installing the gcloud SDK on AIX allows you to download from and upload to Google Cloud Storage buckets, as well as control other aspects of your Google Cloud environment. On AIX, it is primarily used for interacting with storage buckets and objects.

This guide is not comprehensive, as covering all AIX versions and configurations is not possible. Note that installation is easiest on AIX 7.3, since the SDK requires Python 3.8 or above. This example assumes a system built by downloading the AIX 7.3 TL1 stock image.

First, prepare your filesystems for new content

chfs -a size=+2G /opt
chfs -a size=+500M /tmp
chfs -a size=+500M /var

Next, I recommend you update your system using SUMA. To do this, we’ll clear out /usr/sys/inst.images first

rm -rf /usr/sys/inst.images/*
smitty suma

Select Download Updates Now (Easy)
Select Download All Latest Fixes

Once those have downloaded, update your system using

smitty update_all

For the directory, enter /usr/sys/inst.images
Change ACCEPT new license agreements? to yes

Once those updates have installed, run

updtvpkg
dnf update python3 dnf

You should now be ready to install the requisite software for the gcloud SDK

dnf install curl coreutils tar git bash python3-pip

Change your path to use the new GNU utilities

export PATH=/opt/freeware/bin:$PATH

Download the gcloud SDK installer and run it

curl https://sdk.cloud.google.com | bash

For an installation directory use /opt/freeware/
For Do you want to help improve the Google Cloud CLI (y/N)? say n

You will now see:

ERROR: (gcloud.components.update) The following components are unknown [gcloud-crc32c].

You can disregard this. You may wish to switch to a non-root user for the remaining steps.

Set your path

export PATH=/opt/freeware/google-cloud-sdk/bin/:$PATH:/opt/freeware/bin

Now you can run the Google Cloud SDK:

gcloud auth login

Follow the login prompts, pasting in the code to authenticate.

Adjust your CRC settings. Using if_fast_else_skip is faster and uses less CPU, but performs no CRC checking.

gcloud config set storage/check_hashes if_fast_else_skip

or

gcloud config set storage/check_hashes always

You should now be able to list the contents of buckets you have access to and download files.

gcloud storage ls gs://<bucketname>
gcloud storage cp gs://<bucketname>/<filename> /path/to/download/
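
Uploading works the same way in the other direction, for example to copy a file from AIX into a bucket (the bucket name and path are placeholders):

gcloud storage cp /path/to/file gs://<bucketname>/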

6 - RMC details and troubleshooting

This article provides details on Resource Monitoring and Control (RMC). Also presented below are troubleshooting methods for common problems.

Use the methods below to address issues. If unable to resolve the issue, reach out to support.

What is RMC?

Management consoles use RMC to perform dynamic operations on a virtual machine (VM). RMC connections are routed through a dedicated internal virtual network using IPv6. That network’s configuration prevents a VM from communicating with another VM.

How to troubleshoot RMC

The methods below can help troubleshoot common problems with RMC. The most common problem is that a VM cannot be modified online and is reported in an unhealthy state. In the example below, the health of the virtual machine is listed as “Warning”.

$ pcloud compute instances list
InstanceID                            Name             Status   Health   IPs
12345678-9abc-123a-b456-789abcdef123  lpar1            ACTIVE   WARNING  [192.168.1.5]

Restart RMC

Restarting RMC is the most common solution.

/usr/sbin/rsct/bin/rmcctrl -z
/usr/sbin/rsct/bin/rmcctrl -A
/usr/sbin/rsct/bin/rmcctrl -p

Be aware that layered software using Reliable Scalable Cluster Technology (RSCT) will be impacted. For example, this will trigger an immediate failover in PowerHA environments.
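
After the restart, RMC can be confirmed to be running with lssrc, and the management connection can be checked with the rmcdomainstatus command shown in the next section:

lssrc -s ctrmc
/opt/rsct/bin/rmcdomainstatus -s ctrmc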

Validate RSCT version

Validate the version of the RSCT. Methods for this depend on the operating system. The RSCT packages must be at version 3.2.1.0 or later.

  • AIX
lslpp -L rsct.*
  • RedHat
rpm -qa | grep -e rsct -e src

Gathering RMC information

Use the following to gather information about the RMC. This information can be helpful in resolving many issues.

/usr/sbin/rsct/bin/lsnodeid
lsrsrc IBM.MCP
/opt/rsct/bin/rmcdomainstatus -s ctrmc

Validating Connectivity

Validate the connectivity by using the methods below.

  1. Verify that the en1 interface has an IPv6 address beginning with fe80::
    • For AIX use: netstat -in
      # netstat -in
      Name   Mtu   Network     Address                 Ipkts     Ierrs        Opkts     Oerrs  Coll
      ...
      en1    1500  fe80::ecad:f1ff:febe:ea13              711114     0           711198     0     0
      ...
      
      Make sure the following lines are uncommented in /etc/rc.tcpip:
      start /usr/sbin/autoconf6 "" " -i en1"
      start /usr/sbin/ndpd-host "$src_running"
      
      Then, execute the following: autoconf6 -i en1
    • For Linux use: ip addr show
  2. Get the HMC or Novalink IPv6 address from the virtual machine. Use this command: lsrsrc IBM.MCP
  3. Ping the IPv6 address. If the ping fails, please escalate to support.
  4. Run telnet ipv6_address 657. If the ping is successful but telnet fails to connect, there may be a firewall issue.

Verify the services are active

Use the following command to verify if the services are active.

lssrc -s ndpd-host

If it isn’t active, use the following:

startsrc -s ndpd-host