Troubleshooting Failover Operations
If the standby Management Plane does not take control, follow these steps to troubleshoot.
Troubleshooting
You can monitor the progress of the control planes as they reconnect to the new Management Plane instance by using the TSB UI or the tctl tool.
- Go to the TSB UI and review the Clusters page. As each workload cluster connects to the new Management Plane, you will see its status and a last sync timestamp.
- If preferred, you can use tctl to validate the status of each cluster:

      tctl x status cluster my-cluster-id

  Example output:

      NAME            STATUS   LAST EVENT     MESSAGE
      my-cluster-id   READY    XCP_ACCEPTED   Cluster onboarded
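If you are scripting the failover check, you can test the STATUS column instead of reading the table by eye. The following is a minimal sketch, assuming the whitespace-separated layout shown above (cluster name in the first column, status in the second). The cluster_is_ready function name is hypothetical, and the output format of the experimental tctl x commands may change between releases.

```shell
#!/usr/bin/env bash
# Sketch: read "tctl x status cluster" output on stdin and succeed only if the
# named cluster's STATUS column is READY. Assumes the whitespace-separated
# layout shown above (NAME in column 1, STATUS in column 2).
cluster_is_ready() {
  local cluster="$1"
  # Skip the header row, find the cluster's row, and check its STATUS field.
  # Exit status 0 = found and READY; 1 = missing or any other status.
  awk -v c="$cluster" '
    NR > 1 && $1 == c { found = 1; ready = ($2 == "READY") }
    END { exit !(found && ready) }
  '
}

# Usage:
#   tctl x status cluster my-cluster-id | cluster_is_ready my-cluster-id && echo "my-cluster-id is READY"
```

Because the function only reads standard input, the same check works on saved output or in a loop over several cluster names.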
If the control plane in a workload cluster does not reconnect to the new Management Plane, perform the following steps:
Restart the Edge deployment on the control plane in the workload cluster
Switch to each workload cluster and restart the edge deployment:

    # On the workload cluster
    kubectl rollout restart deployment edge -n istio-system
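When several workload clusters need restarting, a small loop over kubeconfig contexts avoids switching by hand, and kubectl rollout status lets you wait for each new edge rollout to finish before moving on. This is a sketch only: the restart_edge_everywhere function name and the context names in the usage line are hypothetical, and the five-minute timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: restart the edge deployment in every workload cluster given as a
# kubeconfig context name, waiting for each rollout before moving on.
# Context names are supplied by the caller; the 5m timeout is an arbitrary choice.
restart_edge_everywhere() {
  local ctx rc=0
  for ctx in "$@"; do
    echo "Restarting edge in context ${ctx}"
    # Trigger a new rollout, then block until it completes or times out.
    if ! kubectl --context "$ctx" rollout restart deployment/edge -n istio-system ||
       ! kubectl --context "$ctx" rollout status deployment/edge -n istio-system --timeout=5m; then
      echo "edge rollout failed in context ${ctx}" >&2
      rc=1   # remember the failure but continue with the remaining clusters
    fi
  done
  return "$rc"
}

# Usage (context names are hypothetical):
#   restart_edge_everywhere workload-east workload-west
```

The loop keeps going after a failure so that one unhealthy cluster does not block the restart of the others; the non-zero return status still reports that something went wrong.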
Check the Edge deployment logs
Check the logs from the edge deployment as follows:

    # On the workload cluster
    kubectl logs deploy/edge -n istio-system -f
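To look specifically for token or authentication problems, you can filter recent edge logs instead of following them live. This sketch assumes the relevant log lines contain the words "token" or "auth" (a heuristic, not a documented log format); the edge_auth_errors function name and the 15-minute default window are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print recent edge log lines that mention tokens or authentication.
# Matching on "token" or "auth" (case-insensitive) is a heuristic, not a
# documented log format. $1 is the look-back window passed to
# "kubectl logs --since" (default 15m, an arbitrary choice).
edge_auth_errors() {
  kubectl logs deploy/edge -n istio-system --since="${1:-15m}" |
    grep -iE 'token|auth'
}

# Usage (on the workload cluster); a non-zero exit status means no matching lines:
#   edge_auth_errors 30m
```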
Clear any cached tokens
If the Edge has reconnected to the new Management Plane but its logs show token or authentication errors, delete the existing token secrets on the control plane, then verify that they are regenerated:

    # On the workload cluster
    kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
    # Give the control plane time to regenerate the tokens
    sleep 60
    kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
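A fixed sleep 60 may be too short on a slow cluster or longer than necessary on a fast one. As an alternative, you can poll until all four secrets exist again. This is a minimal sketch: the wait_for_tokens function name, the 300-second default timeout, and the 10-second polling interval are assumptions, and it relies on kubectl get returning a non-zero exit code when any of the named secrets is missing.

```shell
#!/usr/bin/env bash
# Sketch: poll until all four token secrets exist again in istio-system.
# $1 = timeout in seconds (default 300), $2 = polling interval in seconds
# (default 10); both defaults are assumptions. Relies on "kubectl get"
# exiting non-zero when any named secret is missing.
wait_for_tokens() {
  local timeout="${1:-300}" interval="${2:-10}" waited=0
  until kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token \
      -n istio-system >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "token secrets not regenerated after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "all token secrets are present"
}

# Usage (on the workload cluster, after deleting the secrets):
#   wait_for_tokens 300 10
```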
Further troubleshooting
Check for certificate errors and other failure conditions, as described in the Control Plane troubleshooting instructions.