Troubleshoot the Tetrate Management Plane
How to troubleshoot the Tetrate Management Plane, and the actions you can take to recover it after a failure.
This document explains how to troubleshoot a failing Tetrate Management Plane and suggests possible recovery steps. If the failed Management Plane cannot be recovered, you can then start your chosen failover process. You may wish to contact Tetrate Technical Support for assistance with this procedure.
If your Management Plane appears to be functioning correctly but one or more of your Workload Clusters cannot connect to it, see the Control Plane Troubleshooting guide.
Troubleshooting Problems with the Tetrate Management Plane
If the Tetrate Management Plane appears to have failed, work through the following steps to try to recover it:
Check that the Management Plane components appear to be running
Verify that the Management Plane cluster is running, and that you can access the tsb namespace:
kubectl get pods -n tsb
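On a busy cluster it can be easier to list only the pods that need attention. The following is a minimal sketch. It assumes a bash-compatible shell and kubectl's default tabular output, with columns NAME, READY, STATUS and so on. The function name tsb_unhealthy_pods is hypothetical and is not part of TSB or tctl. The function prints pods that are not Running, and pods that are Running with fewer ready containers than expected. It ignores pods that have Completed.

```shell
# Hypothetical helper: list pods in a namespace (default: tsb) that need attention.
# Assumes kubectl's default column order: NAME READY STATUS RESTARTS AGE.
tsb_unhealthy_pods() {
  local ns="${1:-tsb}"
  kubectl get pods -n "$ns" --no-headers |
    awk '{
      split($2, ready, "/")              # READY column, for example "1/2"
      if ($3 == "Completed") next        # pods from finished Jobs are expected
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```

Run tsb_unhealthy_pods with no arguments to check the tsb namespace. An empty result means every pod is Running and fully ready.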
Check the logs from the tsb-operator, which is responsible for deploying and configuring the TSB Management Plane components:
kubectl logs -n tsb deployment/tsb-operator-management-plane
Check access to the Management Plane endpoint
The Management Plane uses a well-known front-Envoy endpoint that listens for HTTPS traffic (UI and API) on port 443. Check that its service exists and has an external IP address:
kubectl get svc envoy -n tsb
Verify that the external IP address is reachable, and that the FQDN for the Management Plane resolves to it. The FQDN appears in the tctl configuration (run tctl ui to see it), and in each Workload Cluster's controlplane CR. Run the following against a Workload Cluster (not the Management Cluster):
kubectl get controlplane -o json -n istio-system | jq ".items[0].spec|.managementPlane,.telemetryStore"
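The following sketch compares the first address that the FQDN resolves to with the external IP of the envoy service. It rests on four assumptions:
- you have a bash-compatible shell on a Linux host with getent;
- kubectl points at the Management Plane cluster;
- the load balancer publishes an IP address rather than a hostname (some cloud load balancers publish only a hostname; in that case compare against .status.loadBalancer.ingress[0].hostname instead);
- the name check_mp_endpoint is hypothetical.

```shell
# Hypothetical helper: compare the envoy LoadBalancer IP with what the
# Management Plane FQDN resolves to. Assumes the service publishes an IP
# and that getent is available (Linux/glibc).
check_mp_endpoint() {
  local fqdn="$1" lb_ip resolved
  lb_ip=$(kubectl get svc envoy -n tsb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # getent prints "ADDRESS NAME..."; keep only the first address returned.
  resolved=$(getent hosts "$fqdn" | awk '{print $1; exit}')
  if [ -z "$lb_ip" ]; then
    echo "envoy service has no external IP"
    return 1
  elif [ "$resolved" != "$lb_ip" ]; then
    echo "MISMATCH: $fqdn resolves to '${resolved:-nothing}', envoy LB is $lb_ip"
    return 1
  fi
  echo "OK: $fqdn -> $lb_ip"
}
```

For example, run check_mp_endpoint <your-management-plane-fqdn>. If the addresses agree but the UI or API is still unreachable, run curl -sk -o /dev/null -w '%{http_code}\n' https://<your-management-plane-fqdn>/ to see whether front-Envoy answers HTTPS on port 443 at all.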
Check the logs from the front-Envoy proxy:
kubectl logs deployment/envoy -n tsb
Check the logs from the Control Plane services
If the Management Plane is not functioning, the Control Plane services on each Workload Cluster may be unable to connect to it. Check the logs from these services, such as edge. Look for errors about connections to the Management Plane or about token validation:
kubectl logs deploy/edge -n istio-system -f
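To look beyond edge, you can scan the recent logs of every deployment in istio-system for the kinds of errors described above. This sketch assumes a bash-compatible shell. The function name scan_cp_logs, the 30-minute window, and the search pattern are all illustrative. The pattern is not an official list of TSB error messages, so widen it if it misses something.

```shell
# Hypothetical helper: print log lines that mention tokens, certificates, or
# connection failures from every deployment in a Control Plane namespace.
scan_cp_logs() {
  local ns="${1:-istio-system}" d
  for d in $(kubectl get deployments -n "$ns" -o name); do
    # --since limits output to the last 30 minutes; each match is prefixed
    # with the deployment it came from.
    kubectl logs -n "$ns" "$d" --since=30m 2>/dev/null |
      grep -iE 'token|x509|certificate|connection (refused|reset)|unavailable' |
      sed "s|^|$d: |"
  done
}
```

Note that kubectl logs against a deployment reads from a single pod. If a deployment has several replicas, check the other pods individually.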
If necessary, delete the existing tokens on the Control Plane, then verify that they are regenerated:
kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
sleep 60
kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
Check for certificate errors, as described in the Control Plane troubleshooting instructions.
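Instead of a fixed sleep 60, you can poll until the token secrets reappear. This sketch assumes a bash-compatible shell. wait_for_secrets is a hypothetical helper, and the 5-second polling interval is an arbitrary choice. The sketch relies on kubectl get exiting with a non-zero status while any of the named secrets is missing.

```shell
# Hypothetical helper: wait until all named secrets exist in a namespace.
# Usage: wait_for_secrets NAMESPACE TIMEOUT_SECONDS SECRET...
wait_for_secrets() {
  local ns="$1" timeout="$2"
  shift 2
  local waited=0
  # kubectl get fails while any listed secret is still missing.
  until kubectl get secret "$@" -n "$ns" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "all secrets present: $*"
}
```

For example: wait_for_secrets istio-system 120 otel-token oap-token ngac-token xcp-edge-central-auth-token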
Check the logs from the Management Plane IAM service
All requests through the front-Envoy are authenticated by the Management Plane IAM service:
kubectl logs deployment/iam -n tsb
You can try restarting the IAM service if you see unexpected errors:
kubectl delete pod -n tsb -lapp=iam
kubectl logs -f deployment/iam -n tsb
Further Analysis
You can also check the logs from the deployments of the other TSB Management Plane components:
- tsb hosts the Management Plane API service
- web hosts the Management Plane UI
- oap hosts the Management Plane observability analysis platform (based on Apache SkyWalking)
- If you are using the embedded Postgres database, kubegres-controller-manager hosts the Kubegres operator which manages the Postgres instances.
Look particularly for errors relating to connection problems, which may indicate firewall issues, and authentication problems, which may indicate certificate or token problems. You can safely stop (delete) any TSB component pod that is managed by the TSB Management Plane Operator. Each component runs in a deployment that the operator creates, so a replacement pod is started automatically.
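To get a quick overview of which components are logging errors, the sketch below counts matching log lines from each of the deployments listed above over the last hour. It assumes a bash-compatible shell and that all four deployments run in the tsb namespace. The function name tsb_error_summary and the search pattern are illustrative. kubegres-controller-manager is reported as not found when you are not using the embedded Postgres database, or when it runs in a different namespace.

```shell
# Hypothetical helper: count suspicious log lines per Management Plane deployment.
tsb_error_summary() {
  local ns="${1:-tsb}" d n
  for d in tsb web oap kubegres-controller-manager; do
    if ! kubectl get deployment "$d" -n "$ns" >/dev/null 2>&1; then
      echo "$d: not found in namespace $ns"
      continue
    fi
    # grep -c prints 0 (and exits non-zero) when nothing matches; "|| true"
    # keeps that from aborting scripts that run with "set -e".
    n=$(kubectl logs -n "$ns" "deployment/$d" --since=1h 2>/dev/null |
      grep -ciE 'error|denied|unauthenticated|refused' || true)
    echo "$d: $n matching line(s) in the last hour"
  done
}
```

A high count only shows where to start reading. Follow up with kubectl logs on that deployment to see the messages in context.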
Next Steps
If you cannot quickly restore your Tetrate Management Plane, you can: