Tetrate Service Bridge, Version: 1.12.x

Troubleshoot the Tetrate Management Plane

How to troubleshoot the Tetrate Management Plane, and actions to recover in the event of a failure.

This document explains how to troubleshoot a failing Tetrate Management Plane and suggests possible recovery steps. If the failed Management Plane cannot be recovered, you can then start your chosen failover process. You may wish to contact Tetrate Technical Support for assistance with this procedure.

Control Plane Troubleshooting

If your Management Plane appears to be functioning correctly, but one or more of your Workload clusters cannot connect, check out the Control Plane Troubleshooting guide.

Troubleshooting Problems with the Tetrate Management Plane

If the Tetrate Management Plane appears to have failed, you should first consider the following steps to attempt to recover it:

  1. Check that the Management Plane components appear to be running

    Verify that the Management Plane cluster is running, and that you can access the tsb namespace:

    kubectl get pods -n tsb

    Check the logs from the tsb-operator, which is responsible for deploying and configuring the TSB Management Plane components:

    kubectl logs -n tsb deployment/tsb-operator-management-plane
  2. Check access to the Management Plane endpoint

    The Management Plane exposes a well-known front-Envoy endpoint that listens for HTTPS traffic (UI and API) on port 443:

    kubectl get svc envoy -n tsb

    Verify that the FQDN for the Management Plane resolves to the external IP address of this service, and that the address is reachable. The FQDN appears in the tctl configuration (run tctl ui to see it), and in each Workload Cluster's controlplane CR:

    # Run against a Workload Cluster (not the Management Cluster)
    kubectl get controlplane -o json -n istio-system | jq '.items[0].spec|.managementPlane,.telemetryStore'

    Check the logs from the front-Envoy proxy:

    kubectl logs deployment/envoy -n tsb
  3. Check the logs from the Control Plane services

    If the Management Plane is not functioning, then it's possible that the Control Plane services on each Workload Cluster cannot connect to the Management Plane. Check the logs from various services, looking for errors regarding connections to the Management Plane or errors regarding token validation:

    kubectl logs deploy/edge -n istio-system -f

    If necessary, delete the existing tokens on the Control Plane, then verify that they are regenerated:

    kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
    sleep 60
    kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system

    Check for certificate errors, as described in the Control Plane troubleshooting instructions.

  4. Check the logs from the Management Plane IAM service

    All requests through the front-Envoy are authenticated by the Management Plane IAM service:

    kubectl logs deployment/iam -n tsb

    You can try restarting the IAM service if you see unexpected errors:

    kubectl delete pod -n tsb -l app=iam
    kubectl logs -f deployment/iam -n tsb
  5. Further Analysis

    You can continue to check logs from the deployments of other TSB Management Plane components. Look particularly for errors relating to connection problems (indicating firewall issues) and authentication problems (indicating certificate or token problems). You can safely stop (delete) any TSB component pod that is managed by the TSB Management Plane Operator; the operator creates a deployment that will restart the component.
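
The log review and token check in the steps above can be sketched as two small shell helpers. This is an illustrative sketch, not part of TSB: the function names triage_logs and wait_for_secrets are hypothetical, the error patterns are generic guesses at how connection and authentication failures are usually worded rather than a documented TSB log format, and the 5-second poll interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers; names and patterns are assumptions, not TSB tooling.

# Read log lines on stdin and print only the suspicious ones, prefixed with
# a rough category. Patterns are generic guesses, so expect both false
# positives and misses.
triage_logs() {
  awk '
    {
      line = tolower($0)
      if (line ~ /connection refused|timeout|timed out|no route to host|unreachable/) {
        print "[connection] " $0
      } else if (line ~ /x509|certificate|token|unauthenticated|unauthorized|permission denied/) {
        print "[auth] " $0
      }
    }'
}

# Poll until every named secret exists in the namespace, or give up.
# Arguments: namespace, timeout in seconds, then one or more secret names.
# Relies on kubectl get exiting non-zero while any named secret is missing.
wait_for_secrets() {
  ns=$1
  timeout=$2
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until kubectl get secret "$@" -n "$ns" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for secrets in $ns: $*" >&2
      return 1
    fi
    sleep 5
  done
}

# Example usage with the deployments and secrets named in this guide:
#   kubectl logs deployment/iam -n tsb --tail=500 | triage_logs
#   kubectl logs deployment/envoy -n tsb --tail=500 | triage_logs
#   wait_for_secrets istio-system 120 otel-token oap-token ngac-token xcp-edge-central-auth-token
```

Polling for the secrets instead of using a fixed sleep returns as soon as the tokens reappear, and fails with a clear message if they do not appear within the timeout.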

Next Steps

If you cannot quickly restore your Tetrate Management Plane, you can: