Troubleshooting Failover Operations
If the standby Management Plane does not take control, follow these steps to troubleshoot.
Troubleshooting
You can monitor the progress of control planes as they re-connect to the new Management Plane instance using the TSB UI or the tctl tool.
- Go to the TSB UI and review the Clusters page. When each workload cluster connects to the new Management Plane, you will see its status and a last sync timestamp.
- If preferred, you can use tctl to validate the status of each cluster:

    tctl status cluster my-cluster-id

    NAME             STATUS    LAST EVENT      MESSAGE
    my-cluster-id    READY     XCP_ACCEPTED    Cluster onboarded
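When several clusters are reconnecting, it can be convenient to poll the tctl status check rather than re-run it by hand. The following is a minimal sketch, not an official TSB tool: the function names, the retry count, and the delay are hypothetical, and it assumes the output layout matches the sample above (one header row, then one data row whose second whitespace-separated field is STATUS).

```shell
# Sketch: poll `tctl status cluster` until the cluster reports READY.
# Assumption: output has a header row followed by one data row, and the
# second whitespace-separated field of the data row is the STATUS column.

# Succeeds if the status table read from stdin shows READY.
status_is_ready() {
  awk 'NR == 2 { ready = ($2 == "READY") } END { exit (ready ? 0 : 1) }'
}

# wait_for_cluster CLUSTER_ID [ATTEMPTS] [DELAY_SECONDS]
# ATTEMPTS and DELAY_SECONDS defaults are illustrative values only.
wait_for_cluster() {
  cluster_id=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if tctl status cluster "$cluster_id" 2>/dev/null | status_is_ready; then
      echo "$cluster_id: READY"
      return 0
    fi
    echo "$cluster_id: not ready (attempt $i of $attempts)"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_cluster my-cluster-id` would check the cluster every 10 seconds for up to 30 attempts and return a non-zero exit code if it never reports READY.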
If the control plane in a Workload cluster does not reconnect to the new Management Plane, perform the following steps:
Restart the Edge deployment on the control plane in the Workload cluster
Switch to each workload cluster and restart the edge deployment:
On the workload cluster....
    kubectl rollout restart deployment -n istio-system edge

Check the Edge deployment logs
Check the logs from the edge deployment as follows:
On the workload cluster....
    kubectl logs deploy/edge -n istio-system -f

Clear any cached tokens
If the Edge has reconnected to the new Management Plane but its logs indicate token or authentication issues, delete the existing tokens on the Control Plane, then verify that they are regenerated:
On the workload cluster....
    kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
    sleep 60
    kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
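The fixed 60-second sleep above may be longer or shorter than the time the tokens actually take to reappear. As an alternative, the sketch below polls until every secret exists again. The function name, timeout, and interval are hypothetical choices for illustration; it relies only on the fact that `kubectl get secret` exits non-zero when the named secret is missing.

```shell
# Sketch: poll until every token secret exists again, instead of a fixed sleep.
# The helper name and its arguments are illustrative, not part of TSB.

# wait_for_secrets NAMESPACE TIMEOUT_SECONDS INTERVAL_SECONDS SECRET...
# INTERVAL_SECONDS should be a positive number so the timeout can elapse.
wait_for_secrets() {
  namespace=$1
  timeout=$2
  interval=$3
  shift 3
  elapsed=0
  while :; do
    missing=""
    for secret in "$@"; do
      # `kubectl get secret` exits non-zero when the secret does not exist.
      kubectl get secret "$secret" -n "$namespace" >/dev/null 2>&1 \
        || missing="$missing $secret"
    done
    if [ -z "$missing" ]; then
      echo "all secrets present"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "still missing:$missing" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

After running the delete command, a call such as `wait_for_secrets istio-system 120 5 otel-token oap-token ngac-token xcp-edge-central-auth-token` would wait up to roughly two minutes and report any secret that has not been regenerated.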
Further troubleshooting
Check for certificate errors and other error situations, as described in the Control Plane troubleshooting instructions.
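When scanning the Edge logs for such errors, a simple filter can cut down the noise. The sketch below is only a starting point: the function name is hypothetical and the search patterns are illustrative guesses at common certificate and token wording (for example the `x509` prefix used by Go TLS errors), not a list of actual Edge log messages.

```shell
# Sketch: keep only log lines that hint at certificate or token problems.
# Assumption: the patterns below are illustrative, not taken from TSB logs.
filter_auth_errors() {
  # grep exits non-zero when nothing matches; `|| true` keeps the helper
  # from failing a pipeline just because the logs are clean.
  grep -iE 'x509|certificate|token|unauthenticated|unauthorized' || true
}
```

It can be combined with the log command shown earlier, for example `kubectl logs deploy/edge -n istio-system --since=15m | filter_auth_errors`.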