Tetrate Service Bridge, Version: 1.12.x

Troubleshooting Failover Operations

If the standby Management Plane does not take control, follow these steps to troubleshoot.

Troubleshooting

You can monitor the progress of the control planes as they reconnect to the new Management Plane instance using either the TSB UI or the tctl tool.

  1. Go to the TSB UI and review the Clusters page. When each workload cluster connects to the new Management Plane, you will see its status and a last sync timestamp.

  2. If preferred, you can use tctl to validate the status of each cluster:

    tctl x status cluster my-cluster-id
    NAME            STATUS    LAST EVENT      MESSAGE
    my-cluster-id   READY     XCP_ACCEPTED    Cluster onboarded
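
When several workload clusters are onboarded, the tctl check above can be repeated in a loop. The following is a minimal sketch, not an official tool: the function name `check_clusters` is hypothetical, and it assumes that `tctl x status cluster <name>` prints a header row followed by a single row whose second column is the status, as in the sample output above.

```shell
#!/bin/sh
# Sketch: report the STATUS column for each cluster name passed as an argument.
# Assumption: the tctl output has a header row, then one data row whose
# second whitespace-separated field is the status (for example READY).
check_clusters() {
  local failed=0 name status
  for name in "$@"; do
    # Read the second field of the second line, that is, the STATUS column.
    status=$(tctl x status cluster "$name" 2>/dev/null | awk 'NR==2 {print $2}')
    if [ "$status" = "READY" ]; then
      echo "$name: READY"
    else
      # An empty status means tctl failed or printed something unexpected.
      echo "$name: ${status:-UNKNOWN}"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_clusters my-cluster-id another-cluster-id` prints one line per cluster and returns a non-zero exit code if any cluster is not READY. The cluster names are placeholders.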

If the control plane in a workload cluster does not reconnect to the new Management Plane, perform the following steps:

  1. Restart the Edge deployment on the control plane in the workload cluster

    Switch to each workload cluster and restart the edge deployment:

    # On the workload cluster
    kubectl rollout restart deployment -n istio-system edge

  2. Check the Edge deployment logs

    Check the logs from the edge deployment as follows:

    # On the workload cluster
    kubectl logs deploy/edge -n istio-system -f

  3. Clear any cached tokens

    If the Edge has reconnected to the new Management Plane but its logs show token or authentication errors, delete the existing tokens on the Control Plane, then verify that they are regenerated:

    # On the workload cluster
    # Delete the cached tokens
    kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
    # Allow time for the tokens to be regenerated, then confirm that all four exist
    sleep 60
    kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
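
The restart and token steps above can be sketched as two small helper functions. This is a hedged illustration, not part of TSB: the function names `restart_edge` and `wait_for_tokens`, the `NAMESPACE` and `TOKENS` variables, and the timeout and polling defaults are all assumptions. `kubectl rollout status` is used to wait for the restarted deployment, and polling replaces the fixed `sleep 60` so that the check returns as soon as every token secret exists again.

```shell
#!/bin/sh
# Assumed defaults; the namespace and secret names are taken from the steps above.
NAMESPACE="${NAMESPACE:-istio-system}"
TOKENS="otel-token oap-token ngac-token xcp-edge-central-auth-token"

# Restart the edge deployment and wait until the rollout completes or times out.
# $1: rollout timeout (default 120s, an arbitrary choice for this sketch).
restart_edge() {
  kubectl rollout restart deployment -n "$NAMESPACE" edge &&
    kubectl rollout status deployment/edge -n "$NAMESPACE" --timeout="${1:-120s}"
}

# Poll until every token secret exists, or give up.
# $1: number of attempts (default 12), $2: seconds between attempts (default 5).
wait_for_tokens() {
  local attempts="${1:-12}" delay="${2:-5}" i=1 secret missing
  while [ "$i" -le "$attempts" ]; do
    missing=""
    for secret in $TOKENS; do
      # kubectl exits non-zero when the secret does not exist.
      kubectl get secret "$secret" -n "$NAMESPACE" >/dev/null 2>&1 ||
        missing="$missing $secret"
    done
    if [ -z "$missing" ]; then
      echo "all tokens present"
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still missing:$missing" >&2
  return 1
}
```

A typical sequence on the workload cluster would be to run `restart_edge`, delete the token secrets as shown in step 3, and then run `wait_for_tokens`. If the function reports secrets that are still missing after the final attempt, continue with the Edge deployment logs from step 2.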

Further troubleshooting

Check for certificate errors and other error conditions, as described in the Control Plane troubleshooting instructions.