
Failover from one MP to another MP

How to fail over from one MP instance to another.

Fail over to an alternate Management Plane

The failover process functions by updating the DNS address that identifies the Management Plane location, so that it points to the new Management Plane instance. Clients will start using the new instance when they re-resolve the DNS name.
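
For example, assuming a hypothetical FQDN of tsb.example.com for your Management Plane, you can check which IP address clients currently resolve, and the TTL that governs how quickly a change propagates:

dig +noall +answer tsb.example.com

A low TTL on this record shortens the failover window, because clients re-resolve the name sooner.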

Ensure that:

  • You are able to update the DNS address that identifies the Management Plane location
  • The new Management Plane has up-to-date configuration and is ready to take control
  1. Shut down the current Management Plane instance and activate the new Management Plane instance

    If necessary, shut down the current Management Plane so that it does not receive configuration updates:

    On the current Management Cluster
    kubectl scale deploy -n tsb tsb iam --replicas 0
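
    As a quick check, you can confirm that the old instance has stopped by verifying that both deployments report 0 ready replicas:

    On the current Management Cluster
    kubectl get deploy -n tsb tsb iam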

    Similarly, start the new Management Plane so that it can receive configuration updates:

    On the new Management Cluster
    kubectl scale deploy -n tsb tsb iam --replicas 1

    Suspend the restore job (if present) on the new Management Cluster so that it does not attempt to write to the Postgres database:

    On the new Management Cluster
    kubectl patch cronjobs tsb-restore -n tsb -p '{"spec" : {"suspend" : true }}'
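
    As a quick check on the new cluster, confirm that the deployments have scaled up and that the restore job is suspended (the SUSPEND column should read True):

    On the new Management Cluster
    kubectl get deploy -n tsb tsb iam
    kubectl get cronjob tsb-restore -n tsb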
  2. Verify that the new Management Plane is ready to take control

    Log in to the new Management Plane UI:

    • Verify that your Tetrate configuration is present in the Postgres database; look for cluster configurations (clusters will not have synced at this point) and the organizational structure (organization, tenants, workspaces) that you expect to see
    • If you expect historical data, check that it is present in Elasticsearch
  3. Update the DNS Record to point to the new Management Plane

    Update the DNS record that you use to identify the Management Plane location, so that it points to the IP address of the new Management Plane instance.

    Propagation may take time. Once the change has propagated, verify that you can access the Management Plane UI using the updated FQDN.
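
    For example, assuming the hypothetical FQDN tsb.example.com and the default TSB UI port of 8443 (adjust both to match your installation), you can confirm that the record resolves to the new IP address and that the UI responds:

    dig +short tsb.example.com
    curl -skI https://tsb.example.com:8443 | head -n 1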

  4. Provoke each Edge cluster to reconnect to the new Management Plane

    If possible, shut down the envoy service on the old Management Plane instance:

    On the old Management Cluster
    kubectl scale deploy -n tsb envoy --replicas 0

    This should be sufficient to provoke each Edge cluster to reconnect to the new Management Plane.
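
    You can confirm that the old instance's envoy has stopped by checking that its deployment reports 0 ready replicas:

    On the old Management Cluster
    kubectl get deploy -n tsb envoy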

    Force a reconnect

    If you need to manually force an Edge cluster to reconnect, restart the edge deployment to re-resolve the management plane IP address. This will provoke the cluster to begin using the new, working instance rather than the previous instance.

    Switch to each workload cluster and restart the edge deployment:

    kubectl rollout restart deployment -n istio-system edge
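
    You can watch each restart complete using the standard rollout status check:

    kubectl rollout status deployment -n istio-system edge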
  5. Validate Status in TSB UI

    Go to the TSB UI and review the Clusters page. When each workload cluster connects to the new Management Plane, you will see its status and a last sync timestamp.

  6. Validate Status using tctl

    If preferred, you can use tctl to validate the status of each cluster:

    tctl x status cluster my-cluster-id
    NAME            STATUS    LAST EVENT      MESSAGE
    my-cluster-id   READY     XCP_ACCEPTED    Cluster onboarded
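
    To check several clusters in one pass, you can loop over your cluster IDs (cluster-1 and cluster-2 are placeholders for your own cluster names):

    for cluster in cluster-1 cluster-2; do
      tctl x status cluster "$cluster"
    done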

With a successful restore of the new Management Plane, you will have fully recovered from the failure, and your Workload Clusters will be under the control of the new Management Plane instance.

Final Steps

A failover operation is a last resort, used when it is not possible to recover the current Management Plane instance quickly. The old, failed Management Plane will contain a snapshot of the previous configuration and should not be reused.

You may wish to deploy another standby Management Plane for your newly-active instance, and prepare to perform the failover operation again should your new Management Plane instance ever fail.