Reinstall the Tetrate Management Plane

How to install a new Management Plane instance and import its configuration from a backup or existing database.

If the Tetrate Management Plane fails and cannot be recovered, you will need to restore it to resume normal operational status. This guide provides an overview of the process; you may wish to engage Tetrate Technical Support for assistance with this procedure.

When first deploying your Tetrate Management Plane, please ensure you follow the recommended Best Practices:

  • You have a current or recent backup of the Postgres database, or the database is running externally and is available
  • You have a backup of the iam-signing-key, the Root CA, and the mp-values.yaml configuration used to install the Management Plane
  • You have a copy of the authentication credentials (e.g. username and password) used to access the Postgres and ElasticSearch databases
  • You can update the DNS name used to identify the Management Plane so that it points to the new instance
  • If preserving metrics is important, maintain the ElasticSearch database in a reliable, redundant cluster, or make regular backups so that it can be restored if necessary.
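
If you have access to a healthy Management Plane, the iam-signing-key can be exported ahead of time with a command along the following lines. This is a minimal sketch, assuming the secret is named iam-signing-key and lives in the tsb namespace; the output file name matches the one used later in this guide:

kubectl get secret iam-signing-key -n tsb -o yaml > source_mp_operational_secrets.yaml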

Procedure

Should the Management Plane fail, or the cluster hosting the Management Plane become non-operational, you will need to restore the Management Plane to resume normal operational status. The recovery is performed using a Helm-based install.

This scenario walks through the task of restoring configuration from the failed Management Plane cluster onto a newly-installed Management Plane cluster.

Prerequisites

This guide makes the following assumptions:

  • You installed the Management Plane using Helm
  • The PostgreSQL Database is available, or can be restored from a backup.
  • The ElasticSearch Database (metrics) is available: either the database is external to the failed cluster, it can be restored from a backup, or a fresh (empty) ElasticSearch database can be used and loss of metrics tolerated
  • All certificates for the new Management Plane cluster use the same Root Certificate Authority as the previous (failed) cluster
  • You can update any DNS record used to discover the Management Plane
  • You have a backup of the iam-signing-key
  • You have a backup of the Root CA used (if necessary)
  • You have a backup of the Helm values used to install the Management Plane (for reference)

You should review the Helm Installation Procedure and any specific notes before proceeding. Please work with Tetrate Technical Support to go through the following procedure:

  1. Restore the Postgres Database (if necessary)

    If you plan to use an external Postgres database, and the existing instance is not available:

    • Deploy or acquire a new Postgres database, taking note of the credentials (for example, username and password) that can be used to create and manage schemas, tables and contents within the database (reference)
    • Import the current or recent backup to the Postgres Database (reference)

    Wait for the restore to complete before proceeding.
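
    For example, if the backup is a plain-SQL dump produced with pg_dump, it could be imported with something like the following sketch; the host, credentials, database name and dump file name are placeholders and must match your environment:

    psql "host=<postgres-host> port=5432 user=<postgres-user> dbname=<tsb-database> sslmode=require" -f tsb-postgres-backup.sql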

    If you plan to use the embedded Postgres database, this is installed with the management plane and you'll restore the contents later in this process.

  2. Create a new Management Plane cluster

    Create a new Kubernetes cluster for the Tetrate Management Plane. Note that the management plane will be installed in the tsb namespace in this cluster. A dedicated cluster is recommended.

  3. Install Dependencies

    Install the required dependencies into the cluster. These dependencies will likely include:

    • Cert-Manager (if you're not using the bundled cert-manager instance) and related issuers/certificates. Ensure you use the same root CA (see the sketch after this list)
    • Any secrets that hold credentials/certificates for the Management Plane
    • The iam-signing-key from the failed Management Plane cluster - optional
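
    For example, if your cert-manager issuer reads the root CA from a TLS secret, the backed-up CA material could be restored with something like the following; the secret name, namespace and file names are assumptions and must match your issuer configuration:

    kubectl create secret tls <root-ca-secret> -n cert-manager --cert=root-ca.crt --key=root-ca.key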

    Restore the iam-signing-key secret from your backup into the tsb namespace using kubectl apply:

    kubectl apply -n tsb -f source_mp_operational_secrets.yaml

    If this is not possible, you will need to reconfigure each Control Plane with a fresh secret later in this procedure.

    For more information, refer to the Helm Installation Guide.

  4. Prepare the configuration

    Using the mp-values.yaml from the original installation, update any environment-dependent fields, such as the image hub or registry, as required.

    There is no need to update the Elastic/Postgres configuration if using external database instances, but you may need to adjust firewall rules.
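
    As a purely illustrative sketch, the kind of field you may need to update looks like the following; the exact keys depend on your chart version and should be taken from your original mp-values.yaml:

    # Illustrative excerpt only; copy the real structure from your original mp-values.yaml.
    spec:
      hub: <your-registry>/tetrate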

  5. Install the Management Plane

    Perform the helm install for the Management Plane using your original mp-values.yaml (with necessary modifications), and monitor progress using:

    kubectl get pod -n tsb
    kubectl logs -f -n tsb -l name=tsb-operator

    Ensure that the front Envoy certificate and key, and the root CA and key are provided, for example through the Helm values.
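
    For reference, the install command might look like the following sketch; the release name, Helm repository alias and chart name are assumptions, so use the chart reference and version from your original installation:

    # Re-use any additional flags (for example, an image tag) from your original install.
    helm install mp <tetrate-helm-repo>/managementplane \
      --namespace tsb --create-namespace \
      --values mp-values.yaml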

  6. Restore the Postgres configuration

    If you are using the embedded Postgres database, now is the time to restore the configuration from your backup. This step is not necessary if you are using an external Postgres database and it is up-to-date.
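
    How you restore depends on how the backup was taken. A minimal sketch, assuming a plain-SQL dump and an embedded Postgres pod that is reachable with kubectl exec (the pod, user and database names are placeholders):

    kubectl cp tsb-postgres-backup.sql tsb/<postgres-pod>:/tmp/tsb-postgres-backup.sql
    kubectl exec -n tsb <postgres-pod> -- psql -U <postgres-user> -d <tsb-database> -f /tmp/tsb-postgres-backup.sql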

  7. Get the Management Plane address

    Once installation has completed, obtain the public IP address of the front Envoy service, for example:

    kubectl get svc -n tsb envoy
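
    If the service is exposed through a LoadBalancer, you can extract just the address, for example:

    kubectl get svc -n tsb envoy -o jsonpath='{.status.loadBalancer.ingress[0].ip}'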

    Log in to the UI using the Envoy IP address:

    • Verify that your Tetrate configuration has been preserved in the Postgres database; look for cluster configurations (clusters will not have synced at this point) and the organizational structure (organization, tenants, workspaces) that you expect to see
    • Check the Elastic historical data if available

    This confirms that the rebuild was successful.

  8. Update DNS

    Update the DNS A record used to locate the Management Plane with the new IP address obtained in the previous step. Remote Control Plane clusters will use this DNS record to communicate with the Management Plane.

    Propagation may take time. Once the change has propagated, verify that you can access the Management Plane UI using the updated FQDN address.
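
    You can check propagation from a workstation or from a workload cluster, for example (the FQDN shown is a placeholder):

    dig +short <management-plane-fqdn>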

  9. Verify Control Plane operation

    In the Management Plane UI, verify that the workload cluster Control Planes are connecting and synchronising with the new Management Plane.

    Refresh the Control Plane tokens

    The iam-signing-key is used to generate, validate and rotate tokens that are given to the Control Plane Clusters for communication to the Management Plane.

    If you could not recover and restore the original iam-signing-key, you will need to refresh the tokens on each Control Plane manually:

    1. Log into each Control Plane cluster

    2. Rotate tokens by deleting the old tokens:

      kubectl delete secret otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system
    3. Verify that the Control Planes are now connecting to and synchronising with the new Management Plane

With the new Management Plane successfully restored, you will have fully recovered from the failure and your workload clusters will be under the control of the new Management Plane instance.

Troubleshooting

The Management Plane and Control Plane installations are managed by operators. If you make a configuration change, you can monitor the operator logs to watch progress and identify any errors.

The Control Planes won't synchronize

Check the logs of the Control Plane's edge deployment, looking for errors regarding connections to the Management Plane or token validation:

kubectl logs deploy/edge -n istio-system -f

Delete the existing tokens on the Control Plane as described above, and verify that they are re-generated on the Control Plane:

kubectl get secrets otel-token oap-token ngac-token xcp-edge-central-auth-token -n istio-system

If the tokens are not regenerated:

  • Check the firewall rules between the Control Plane instance and the new Management Plane instance, and ensure that connections are allowed
  • Ensure that the Management Plane is using the same Root CA

Cannot access external components such as Postgres

  1. Validate the firewall rules to Postgres or any other external component (see the sketch below).
  2. Verify the credentials passed via Helm or in mp-values.yaml.
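
As a quick connectivity check, you can run a short-lived Postgres client pod from inside the Management Plane cluster. This is a sketch only; the image tag, host, user and database names are placeholders, and psql will prompt for the database password:

kubectl run pg-check -n tsb --rm -it --restart=Never --image=postgres:15 -- \
  psql "host=<postgres-host> user=<postgres-user> dbname=<tsb-database> sslmode=require" -c 'SELECT 1;'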