Cluster onboarding troubleshooting
This document explains most common issues when onboarding new control planes into TSB.
Connectivity
The deployment tsb-operator-control-plane needs to have connectivity with the management plane URL. Communication is performed to
the front-envoy component in the tsb namespace, which is served by the envoy service.
Make sure that the control plane can reach it and it's not blocked by network policies, security groups or any firewall.
Troubleshooting
Once you've applied the necessary secrets, installed the control plane operator
and created the control plane CR, if there's some misconfiguration, some pods won't be able to start. Always check for tsb-operator-control-plane
logs, as it will give more information about what could be wrong.
Service account issues
If the service account to generate the tokens is not created, you'll get the following error:
error controlplane token rotation failed, retrying in 15m0s: secret istio-system/cluster-service-account not found: Secret "cluster-service-account" not found [scope="controlplane"]
Or it can also happen that it is not correctly configured:
error controlplane token rotation failed, retrying in 15m0s: cluster has been configured with incorrect service account secret. ControlPlane CR has cluster name "demo", but service account secret has "organizations/tetrate/clusters/not-demo" [scope="controlplane"]
In this example, we've created a cluster object called demo, but in the CP we're generating the service account for a cluster called not-demo.
To fix this issue you'll need to add the cluster name and service account token to the values.yaml file to install the CP. First generate the token:
tctl install cluster-service-account --cluster demo > /tmp/demo.jwk
And then configure the values.yaml file with the cluster name and the JWK file:
secrets:
tsb:
...
clusterServiceAccount:
clusterFQN: organizations/tetrate/clusters/demo
JWK: |
'{{ .Secrets.ClusterServiceAccount.JWK }}'
The cluster name needs to match with the cluster name added in the control plane CR under spec.managementPlane.clusterName.
Remember to restart the tsb-operator-control-plane pod to generate the secrets, and once generated, restart the control plane pods.
Control plane certificate issues
If the certificate tsb-certs configured in the management plane don't contain the correct URI SAN which is configured in
the control plane CR under spec.managementPlane.host, or both tsb-certs in tsb namespace and mp-cert in istio-system
namespace doesn't contain the same URI SAN, or are not signed by the same root/intermediate CA you'll get the following error:
error controlplane token rotation failed, retrying in 7.153870785s: generate tokens: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for demo.tsb.tetrate.io, not tsb.tetrate.io" [scope="controlplane"]
You can update the mp-cert by configuring the value secrets.tsb.cacert in your control plane values.yaml file, or update
the tsb-certs by configuring the values secrets.tsb.cert and secrets.tsb.key in the management plane values.yaml file.
If the certificate provided in tsb-certs is signed by a public CA such as Digicert or Let’s Encrypt you can let the default values
for the control plane CR, but if this certificate is signed by an internal CA or it's self signed you can get the following error:
error controlplane token rotation failed, retrying in 1.661766738s: generate tokens: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority" [scope="controlplane"]
If that's the case, you'll need to modify the control plane CR to set spec.managementPlane.selfSigned to true.
Remember to restart the tsb-operator-control-plane pod to generate the secrets, and once generated, restart the control plane pods.
XCP connection issues
If the newly onboarded cluster it's not reporting the cluster status or new configurations applied are not being created in the cluster,
check the edge pod logs in istio-system namespace, as even if the pod is running, it is possible that it's having some issues. For example:
warn stream error getting stream. retrying in 21.72809085s: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for xcp.tetrate.io, not tsb.tetrate.io" name=configs-4d116fd6
In this case the xcp-central-cert in tsb namespace is configured for xcp.tetrate.io but the host configured in the control plane
CR is tsb.tetrate.io. To update the certificate, you'll need to update the management plane values.yaml accordingly to this.
If edge is unable to start you can describe to pod in order to get more information. Sometimes it couldn't start due to:
Warning FailedMount 7m15s (x7 over 7m47s) kubelet MountVolume.SetUp failed for volume "xcp-central-auth-jwt" : secret "xcp-edge-central-auth-token" not found
Warning FailedMount 5m44s kubelet Unable to attach or mount volumes: unmounted volumes=[xcp-central-auth-ca], unattached volumes=[config-map-volume xcp-central-auth-jwt xcp-central-auth-ca xcp-edge-webhook-ca kube-api-access-hxk8l webhook-certs]: timed out waiting for the condition
Warning FailedMount 3m26s kubelet Unable to attach or mount volumes: unmounted volumes=[xcp-central-auth-ca], unattached volumes=[xcp-edge-webhook-ca kube-api-access-hxk8l webhook-certs config-map-volume xcp-central-auth-jwt xcp-central-auth-ca]: timed out waiting for the condition
Warning FailedMount 95s (x11 over 7m47s) kubelet MountVolume.SetUp failed for volume "xcp-central-auth-ca" : secret "xcp-central-ca-bundle" not found
Warning FailedMount 69s kubelet Unable to attach or mount volumes: unmounted volumes=[xcp-central-auth-ca], unattached volumes=[kube-api-access-hxk8l webhook-certs config-map-volume xcp-central-auth-jwt xcp-central-auth-ca xcp-edge-webhook-ca]: timed out waiting for the condition
This error is because the secret xcp-central-ca-bundle in istio-system namespace don't exist. This secret must contain the same URI SAN and must
be signed by the same root/intermediate CA as xcp-central-cert in tsb namespace. In order to configure this secret, you'll need to update the value
secrets.xcp.rootca from your control plane values.yaml file.
Remember to restart the tsb-operator-control-plane pod to generate the secrets, and once generated, restart the edge pod.