Failover and Failback

Concepts

Failover and failback are used in two situations: testing and disaster recovery. Other cases may exist, but these are the two that customers primarily need to address. An actual failover disrupts a customer’s DR posture and ability to recover, so it is generally not used for testing. Instead, a cloning process creates copies of the DR volumes, which can be validated and removed after the test without disrupting the ability to fail over should an actual emergency occur.

To perform any of these activities, the auxiliary volumes must be onboarded at the target site.

Cloning (DR Testing)

Validate the volume group you want to test.

pcloud compute volume-groups list

Validate the list of all volumes in the volume group.

pcloud compute volume-groups describe <volume-group>

Clone the volumes. To keep the volumes at a consistent point in time, clone them all in a single command.

pcloud compute volumes clones create <clone name> -v <vol1> [-v <volX> ...]

Check the cloning status:

pcloud compute volumes clones status <clone name>

When the status is completed, you’ll have volumes named clone-<clone name>-X, which can be attached to VMs for testing.
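
Because every volume has to go into one clone command, it can help to assemble that command from a list rather than typing it by hand. The following is a minimal sketch, not an official tool: the function name build_clone_cmd and the sample names drtest, vol1, and vol2 are hypothetical, and the only pcloud syntax it uses is the create command shown above. It prints the command for review instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build ONE clone command that covers every volume, so that all
# clones are taken at the same consistent point.
# Prints the command; it does not execute pcloud.
build_clone_cmd() {
  local clone_name
  local cmd
  local vol
  clone_name="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "error: at least one volume is required" >&2
    return 1
  fi
  cmd="pcloud compute volumes clones create ${clone_name}"
  # Append one -v option per volume, as in the command shown above.
  for vol in "$@"; do
    cmd="${cmd} -v ${vol}"
  done
  printf '%s\n' "$cmd"
}
```

For example, build_clone_cmd drtest vol1 vol2 prints:

pcloud compute volumes clones create drtest -v vol1 -v vol2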

Failover

To stop a volume group so that the target site volumes can be used, first validate the volume group you want to fail over.

pcloud compute volume-groups list

Validate the list of all volumes in the volume group.

pcloud compute volume-groups describe <volume-group>

Stop the volume group, allowing the target auxiliary volumes to be accessed.

pcloud compute volume-groups stop <volume-group> --allow-read-access

You can now attach the volumes directly to a VM.

Note: At this point, the volumes at BOTH sites can be modified. To restart replication after a failover, the volumes that will become the TARGET must not be attached to a VM. You must also select a replication direction; data at the specified TARGET site will then be overwritten with the data from the specified SOURCE.

Restarting Replication

To determine which direction you want, look at the volume group relationship:

pcloud compute volume-groups relationship <group>

The output includes a key named primary, which indicates whether the primary (source) is currently the volume(s) listed as master or aux. In the start command, you specify which side is to be the primary once replication starts.

Example: if the output shows primary: master and you specify master when you restart the volume group, replication keeps its original copy direction. Data on the aux volume will be overwritten with the data from the master volume.

To restart replication in the original direction (overwriting the target):

pcloud compute volume-groups start <group> master
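
Since choosing the wrong direction overwrites data, it may be worth spelling out the effect of a start argument before running it. The sketch below assumes the relationship output contains a line of the form "primary: master" or "primary: aux", as in the example above; the exact output layout is an assumption, and the helper names get_primary, other_role, and describe_start are hypothetical. Nothing here calls pcloud.

```shell
#!/usr/bin/env bash
# Extract the value of the "primary" key from relationship output on stdin.
# ASSUMPTION: the output has a line such as "primary: master".
get_primary() {
  sed -n 's/^[[:space:]]*primary:[[:space:]]*\([[:alnum:]_]*\).*$/\1/p' | head -n 1
}

# Map a role to the opposite role (the side that will be overwritten).
other_role() {
  case "$1" in
    master) echo aux ;;
    aux) echo master ;;
    *) return 1 ;;
  esac
}

# Describe what "start <group> <requested>" would do, given the current primary.
describe_start() {
  local current
  local requested
  local target
  current="$1"
  requested="$2"
  target="$(other_role "$requested")" || {
    echo "error: role must be master or aux" >&2
    return 1
  }
  if [ "$current" = "$requested" ]; then
    echo "keep direction: ${requested} overwrites ${target}"
  else
    echo "reverse direction: ${requested} overwrites ${target}"
  fi
}
```

A possible use, with GROUP as a hypothetical shell variable holding the volume group name:

current="$(pcloud compute volume-groups relationship "$GROUP" | get_primary)"
describe_start "$current" master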

Failback

To fail back to the original site, first restart your replication so that the original aux volume is now the source:

pcloud compute volume-groups start <group> aux

Note: This will copy all data from the aux volume to the master volume.

Once the volume group is in a consistent_copying state, use the same process as described under Failover to stop the replication, enable access to the master volumes at the original site, and access the volumes.
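
Waiting for the consistent_copying state can be scripted as a simple polling loop. The sketch below is hedged in two ways: it assumes the state name appears literally somewhere in the output of whichever pcloud command you use to inspect the group (which command reports it is not confirmed here), and the helper names is_consistent_copying and wait_for_consistent are hypothetical. The match is on the whole word, so a longer state name that merely contains the string (for example, a hypothetical inconsistent_copying) does not count.

```shell
#!/usr/bin/env bash
# Succeeds when stdin mentions consistent_copying as a whole word.
is_consistent_copying() {
  grep -Eq '(^|[^[:alnum:]_])consistent_copying([^[:alnum:]_]|$)'
}

# Poll a caller-supplied command until its output shows consistent_copying.
# Usage: wait_for_consistent <tries> <delay-seconds> <command> [args...]
wait_for_consistent() {
  local tries
  local delay
  local i
  tries="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" | is_consistent_copying; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # State was not reached within the allowed number of tries.
  return 1
}
```

For instance, assuming the relationship command reports the state and GROUP holds the group name, the following checks every 30 seconds for up to 60 tries:

wait_for_consistent 60 30 pcloud compute volume-groups relationship "$GROUP"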