Overview of the manual reinstall and failover process
How to install a new Management Plane instance and prepare for failover.
The instructions differ slightly depending on whether you use an external Postgres database (either shared or dedicated to each Management Plane instance) or the embedded Postgres database:
- If you are using your own external Postgres implementation, refer to the "install when using external Postgres" instructions
- If you are using the embedded Postgres implementation, refer to the "install when using embedded Postgres" instructions
The solution provided here is an alternative to the Active-Standby approach with automated synchronization.
If the Tetrate Management Plane fails and cannot be recovered, you will need to restore it to resume normal operation. This guide provides an overview of the process; you may wish to contact Tetrate Technical Support for assistance with the procedure.
When first deploying your Tetrate Management Plane, ensure that you follow the recommended best practices:
- You have a current or recent backup of your Postgres database, or the database runs externally and remains available
- You have a backup of the iam-signing-key, the Root CA, and the mp-values.yaml configuration used to install the Management Plane
- You have a copy of the authentication tokens (for example, username and password) used to access the Postgres and Elasticsearch databases
- You can update the DNS name used to identify the Management Plane so that it points to the new instance
If preserving metrics is important, run the Elasticsearch database in a reliable, redundant cluster, or make regular backups so that it can be restored if necessary.
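The backup items listed above can be checked routinely rather than at the moment of failure. The following is a minimal sketch, assuming the backups are exported as files into a single directory. The file names (`iam-signing-key.yaml`, `root-ca.crt`, `mp-values.yaml`) are hypothetical placeholders and are not names mandated by the product; substitute whatever your own backup process produces.

```python
from pathlib import Path

# Hypothetical file names: adjust them to match your own backup process.
REQUIRED_ARTIFACTS = (
    "iam-signing-key.yaml",  # exported iam-signing-key secret
    "root-ca.crt",           # Root CA certificate
    "mp-values.yaml",        # Helm values used to install the Management Plane
)


def missing_artifacts(backup_dir, required=REQUIRED_ARTIFACTS):
    """Return the required file names that are absent or empty in backup_dir."""
    root = Path(backup_dir)
    missing = []
    for name in required:
        path = root / name
        # An empty file is treated as missing: it cannot be restored from.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_artifacts("/path/to/backups")` returns an empty list when every item is present and non-empty. The check only confirms that the files exist; it does not validate their contents, and it does not cover the database backups themselves.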
Be prepared
You can either deploy a standby Management Plane instance and regularly restore database backups to it, so that you can fail over quickly, or install a Management Plane instance on demand and import the most recent database backup.
When you deploy the new Management Plane, it must have the same configuration as the failed Active instance so that it can smoothly take over:
- Management Plane Deployment: Both MPs are installed using Helm with the same values file.
- PostgreSQL Configuration: The PostgreSQL secret must be identical in both Management Plane instances.
- Certificates & Secrets: Both MPs use the same set of certificates, the same iam-signing-key secret, and the same authentication tokens for the Elasticsearch database.
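One way to confirm that two instances share the same configuration is to compare their settings after loading them into nested dictionaries. The sketch below is illustrative only: it assumes the values have already been parsed (for example, from the Helm values file), and the keys in the usage example are hypothetical and do not reflect the actual Management Plane values schema.

```python
def config_differences(active, standby, path=""):
    """Return the dotted key paths whose values differ between two nested dicts."""
    diffs = []
    # Walk the union of keys so that settings present on only one side are reported.
    for key in sorted(set(active) | set(standby), key=str):
        here = f"{path}.{key}" if path else str(key)
        if key not in active or key not in standby:
            diffs.append(here)
        elif isinstance(active[key], dict) and isinstance(standby[key], dict):
            diffs.extend(config_differences(active[key], standby[key], here))
        elif active[key] != standby[key]:
            diffs.append(here)
    return diffs


# Hypothetical example values, for illustration only.
active = {"secrets": {"postgres": {"username": "tsb"}}, "image": {"tag": "1.0"}}
standby = {"secrets": {"postgres": {"username": "other"}}, "image": {"tag": "1.0"}}
mismatches = config_differences(active, standby)
```

In this example `mismatches` contains the single path `secrets.postgres.username`. An empty result means the two dictionaries are identical; it says nothing about secrets or certificates that are stored outside the values file.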
Procedure
Should the Management Plane fail, or the cluster hosting the Management Plane become non-operational, you will need to restore the Management Plane to resume normal operational status. Recovery is performed with a Helm-based install.
Prerequisites
This guide makes the following assumptions:
- You installed the Management Plane using Helm
- The PostgreSQL database is available or can be restored from a backup, or you are using the embedded Postgres implementation
- The Elasticsearch database (metrics) is available: either the database is external to the failed cluster, it can be restored from a backup, or a fresh (empty) Elasticsearch database can be used and the loss of metrics tolerated
- All certificates for the new Management Plane cluster use the same Root Certificate Authority as the failed cluster
- You can update any DNS record used to discover the Management Plane
- You have a backup of the iam-signing-key
- You have a backup of the Root CA used (if necessary)
- You have a backup of the Helm values used to install the Management Plane (for reference)
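The DNS prerequisite above can be verified after the record is updated. The following is a minimal sketch using Python's standard `socket` module; the hostname and IP address in the usage note are placeholders, and the `resolve` parameter is an illustrative hook added here so the check can be exercised without network access.

```python
import socket


def resolved_addresses(hostname):
    """Return the set of IP addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_points_to(hostname, expected_ip, resolve=resolved_addresses):
    """Return True if hostname resolves to expected_ip from this machine."""
    return expected_ip in resolve(hostname)
```

For example, `dns_points_to("mp.example.com", "203.0.113.10")` would report whether the placeholder name resolves to the placeholder address of the new instance. The result reflects only the resolver used by the machine running the check; other clients may continue to see the old record until cached entries expire.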