Best Practices - Preparing for High Availability
Best practices to follow when preparing for a High Availability (HA) deployment.
This document explains the measures to take when preparing a failover Tetrate Management Plane deployment. You may wish to contact Tetrate Technical Support for assistance with this procedure.
Prepare for HA
To prepare for a High Availability Deployment, ensure you follow these best practices.
Take regular backups of critical basic configuration
Back up the IAM signing key:
kubectl get secrets -n tsb -o yaml iam-signing-key > source_mp_operational_secrets.yaml
When necessary, you can restore it with:
kubectl apply -n tsb -f source_mp_operational_secrets.yaml
Helm install: Back up the values used to install the Management Plane:
helm get values <release-name>
TCTL install: Back up the additional secrets used to install the Management Plane:
kubectl get secrets -n tsb -o yaml admin-credentials azure-credentials custom-host-ca elastic-credentials es-certs iam-oidc-client-secret ldap-credentials postgres-credentials tsb-certs xcp-central-cert iam-signing-key > source_mp_all_secrets.yaml
When necessary, you can restore these with:
kubectl apply -n tsb -f source_mp_all_secrets.yaml
Store these backups in a secure location, away from the Management Plane cluster.
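The backup commands above could be wrapped in a small helper so that they run the same way every time. The following is a minimal sketch, not an official Tetrate tool: the function name backup_mp_secrets, the output directory and the timestamped file name are assumptions made for illustration. Only the tsb namespace and the secret names come from the commands above.

```shell
# Sketch: export Management Plane secrets to a timestamped YAML file.
# backup_mp_secrets <output-dir> <secret-name>...
backup_mp_secrets() {
  local out_dir="$1"
  shift
  local out_file
  out_file="${out_dir}/mp_secrets_$(date -u +%Y%m%dT%H%M%SZ).yaml"

  mkdir -p "$out_dir" || return 1

  # Secrets are sensitive: create the file readable by its owner only.
  if ! ( umask 077; kubectl get secrets -n tsb -o yaml "$@" > "$out_file" ); then
    echo "backup failed: kubectl returned an error" >&2
    return 1
  fi

  # An empty file means nothing was exported; treat that as a failure.
  if [ ! -s "$out_file" ]; then
    echo "backup failed: ${out_file} is empty" >&2
    return 1
  fi

  # Print the path so a caller can copy the file off-cluster.
  echo "$out_file"
}
```

For example, `backup_mp_secrets /path/to/backups iam-signing-key` writes one file per run and prints its path, which a scheduled job could then copy to storage outside the Management Plane cluster.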
Use an easy-to-modify DNS name to identify your Management Plane endpoints
Control Plane clusters connect to the front-envoy endpoint on your Management Plane. If the Management Plane fails, you will need to point these clusters to a standby or new Management Plane instance.
The easiest way to achieve this is to configure the Control Plane clusters to locate the Management Plane using a well-known, modifiable DNS address (for example, by making the address a CNAME for the specific MP instance). To perform a failover, you then just need to update that address to point to the alternate Management Plane.
Alternatively, you can reconfigure each Control Plane cluster one by one to update its managementPlane address.
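Before and after a failover, it can help to confirm where the shared DNS name actually points. The sketch below assumes the name is published as a CNAME and that dig is installed. The helper check_mp_cname and the host names in the usage note are hypothetical and are not part of TSB.

```shell
# Sketch: verify that a DNS name is a CNAME for the expected Management Plane.
# check_mp_cname <dns-name> <expected-target>
check_mp_cname() {
  local name="$1" expected="$2" actual
  # dig prints the CNAME target with a trailing dot; strip it before comparing.
  actual="$(dig +short "$name" CNAME | sed 's/\.$//')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: ${name} -> ${actual}"
  else
    echo "MISMATCH: ${name} -> ${actual:-<no CNAME>} (expected ${expected})" >&2
    return 1
  fi
}
```

For example, after repointing the record you might run `check_mp_cname tsb.example.com mp-standby.example.com`. Resolvers may serve a cached answer until the record's TTL expires, so a short TTL on this name keeps failover quick.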
Back up the Postgres service configuration database
There are two ways to operate the Postgres database that stores service configuration:
- Operate an external Postgres database service, which can be deployed for each Management Plane instance or can be shared between Active and Standby instances.
- Use TSB's embedded Postgres database service, which deploys a dedicated HA Postgres cluster within each Management Plane cluster.
Whichever method you choose, you should take regular (e.g. nightly or hourly) backups of the service configuration so that the database can be restored if needed:
- An external Postgres database can be backed up using pg_dump. You may wish to trim the audit logs stored in the database to save space, as they are generally not critical when restoring the database.
- The embedded Postgres database is automatically backed up every 24 hours; a backup can also be initiated at any time. Backups are stored in a local PVC and should be copied off-cluster for resilience.
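For the external-database case, a scheduled job might wrap pg_dump along the following lines. This is a sketch under stated assumptions: the helper name dump_tsb_db, the host, user, database name and output directory are placeholders, and authentication is assumed to come from PGPASSWORD or a ~/.pgpass file. Skipping a table's rows at dump time is only one possible way to keep audit data out of the backup, and the table name passed in is hypothetical, so check your own schema before excluding anything.

```shell
# Sketch: dump an external Postgres database to a timestamped file.
# dump_tsb_db <host> <user> <database> <output-dir> [table-whose-rows-to-skip]
dump_tsb_db() {
  local host="$1" user="$2" db="$3" out_dir="$4" skip="${5:-}"
  local out_file
  out_file="${out_dir}/tsb_$(date -u +%Y%m%dT%H%M%SZ).dump"

  mkdir -p "$out_dir" || return 1

  # -Fc selects pg_dump's compressed custom format, restorable with pg_restore.
  set -- -h "$host" -U "$user" -d "$db" -Fc -f "$out_file"

  # Optionally keep the table definition but leave its rows out of the dump.
  if [ -n "$skip" ]; then
    set -- "$@" --exclude-table-data="$skip"
  fi

  # Print the dump path only if pg_dump succeeded.
  pg_dump "$@" && echo "$out_file"
}
```

A nightly job could call, for example, `dump_tsb_db db.example.com tsb tsb /path/to/backups` and then copy the resulting file to off-cluster storage.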
Use an external, Highly-Available ElasticSearch database for metrics
TSB uses an external ElasticSearch database to store metrics, logs and traces.
The database can become very large, and it is updated frequently. In general, it is not critical to preserve the database during a failover operation; TSB will tolerate the loss of historical metrics.
Take appropriate measures to keep the ElasticSearch database resilient and highly available.