TSB Upgrade Best Practices
Introduction
This guide outlines the best practices and procedures for upgrading Tetrate Service Bridge (TSB). It provides a structured approach to a smooth and efficient upgrade, covering all crucial aspects from pre-upgrade preparations to post-upgrade verifications. The goal is to help you minimize downtime and maintain TSB integrity throughout the transition to a newer version of TSB.
This guide applies to TSB upgrades across various environments and can be adapted to suit specific deployment scenarios. While the principles outlined here are broadly applicable, always refer to the latest version-specific documentation and release notes for the most up-to-date information.
Importance of careful planning and execution
Careful upgrade planning and execution are key to leveraging the latest features and improvements in TSB while maintaining system stability. Proper preparation helps identify potential issues before they occur and ensures that all necessary resources are available during the upgrade. This guide walks you through essential considerations, best practices, and step-by-step procedures so that your TSB deployment remains robust, secure, and up to date.
Pre-upgrade preparations
Upgrade planning
Schedule considerations
- Plan the upgrade during a maintenance window with minimal impact on operations.
- Allow sufficient time for the upgrade and potential rollback scenarios.
- Coordinate with all relevant teams to ensure availability during the upgrade window.
- Avoid scheduling the TSB upgrade alongside other major changes to simplify troubleshooting.
Team preparation
- Assign specific roles and responsibilities for the upgrade process.
- Ensure key personnel from Application, Database, Networking, Security, and Infrastructure teams are available.
- Conduct a pre-upgrade briefing to align all team members on the process and their roles.
- Establish a communication plan for real-time updates during the upgrade.
Communication with Tetrate support
- Notify Tetrate Customer Engineering and Support about planned upgrade activities.
- Provide a detailed timeline of your upgrade plan at least 3 working days in advance.
- Share any specific concerns or requirements unique to your environment.
- Be prepared to share dashboard access or screenshots to facilitate support.
- Refer to Working with Tetrate support for support best practices.
Test in lower environments
- Always upgrade in non-production environments before proceeding to production.
- Maintain a test environment that closely mirrors your production setup.
- Document and address any issues encountered in the test environment.
Monitor regularly
- Use TSB dashboards consistently before, during, and after the upgrade.
- Establish baseline metrics for normal operations to compare against during and after the upgrade.
Environment assessment
Version compatibility
- Verify TSB support for your Kubernetes/OpenShift version and other components
- Review the TSB Requirements documentation for your target version
- Plan any necessary infrastructure upgrades (e.g., Kubernetes or OpenShift version) well in advance of the TSB upgrade.
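The compatibility check above can be scripted as a small gate in a pre-upgrade pipeline. The following is a minimal sketch, not a definitive implementation: the supported range is a hypothetical placeholder and must be replaced with the values from the TSB Requirements documentation for your target version, and the function names are illustrative.

```python
import re

# Hypothetical placeholder range; take the real values from the TSB
# Requirements documentation for your target version.
SUPPORTED_MIN = (1, 26)
SUPPORTED_MAX = (1, 29)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.27.3-gke.100'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version, minimum=SUPPORTED_MIN, maximum=SUPPORTED_MAX) -> bool:
    """Return True when the cluster's major.minor is inside the inclusive range."""
    return minimum <= parse_major_minor(version) <= maximum


if __name__ == "__main__":
    # Example version strings; in practice these come from each cluster.
    for cluster_version in ("v1.27.3", "v1.24.17"):
        status = "supported" if is_supported(cluster_version) else "NOT supported"
        print(f"{cluster_version}: {status}")
```

Running the check against every workload cluster, not only the management cluster, helps surface infrastructure upgrades that need to happen before the TSB upgrade.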
Infrastructure Requirements
- Ensure proper connectivity between nodes and across different Kubernetes clusters
- Test inter-cluster communication, especially if using multi-cluster features
- Verify that all required ports are open in your network configuration
Resource Health Verification
- Confirm that your Kubernetes cluster, load balancers, ingresses and required external components are functioning and running smoothly
- Check health of TSB dependencies like Elasticsearch, Postgres, Redis, DNS, and cert-manager
- Monitor resource utilization (CPU, memory, disk I/O) and address any issues before proceeding
- Ensure that you have enough spare resources (CPU, memory, storage) to cover what the upgrade requires.
Backup and Recovery
Critical Component Backups
- Take full backups of:
  - PostgreSQL (use pg_dump or a similar tool)
  - ManagementPlane CR (use kubectl or your preferred method)
  - ControlPlane CR for all CPs
  - Application/Gateway/TSB/Istio-related secrets
  - All relevant Kubernetes resources, including TSB and Istio custom resources (using tools like Velero if available)
- Verify the integrity of backups by performing a test restoration in a separate environment
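A test restoration is the real proof that a backup is usable, but a checksum manifest recorded at backup time makes it cheap to detect a truncated or altered file before you attempt one. The sketch below is one possible approach and is not part of TSB: the function names and the flat backup directory layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file name in backup_dir to its checksum (non-recursive)."""
    return {
        entry.name: sha256_of(entry)
        for entry in sorted(backup_dir.iterdir())
        if entry.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose checksum changed."""
    problems = []
    for name, expected in manifest.items():
        candidate = backup_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(name)
    return problems
```

Store the manifest separately from the backups themselves, and call verify_manifest on the copy you intend to restore from. An empty result only shows the files are unchanged; it does not replace the test restoration.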
Certificate management
- Document all certificates in use (including expiration dates).
- Be prepared to generate new certificates if needed.
- Ensure you have access to your certificate authority or the means to generate new certificates.
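Once certificates and their expiration dates are documented, flagging the ones that expire close to the upgrade window is a simple date comparison. This is a minimal sketch under stated assumptions: the inventory is a plain mapping you maintain yourself, and the certificate names and dates shown are hypothetical.

```python
from datetime import date, timedelta


def expiring_certificates(inventory, today, warn_days=30):
    """Return (name, days_left) pairs for certificates that expire within
    warn_days of today or have already expired, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    flagged = [
        (name, (expires - today).days)
        for name, expires in inventory.items()
        if expires <= cutoff
    ]
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical inventory; names and dates are illustrative only.
    inventory = {
        "example-ui-tls": date(2025, 3, 1),
        "example-mp-tls": date(2025, 1, 20),
        "example-gateway-tls": date(2026, 1, 1),
    }
    for name, days_left in expiring_certificates(inventory, today=date(2025, 1, 10)):
        print(f"{name}: {days_left} days left")
```

A negative days_left value means the certificate has already expired. Choose warn_days so that it comfortably covers the upgrade window plus any rollback period.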
Rollback Plan
- Document detailed rollback procedures for each upgrade step
- Ensure all team members understand the criteria for initiating a rollback
- Test rollback procedures in a non-production environment if possible
- Include the rollback plan in your change management document shared with Tetrate.
Pre-upgrade Checks
Configuration Validation
- Review and document any custom configurations (e.g., EnvoyFilters).
- Check manually added configurations outside of TSB for compatibility with the new version.
- Verify that all TSB components are in a healthy state
- Address any reported issues or inconsistencies before proceeding
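One way to check component health without reading pod listings by eye is to parse the JSON that kubectl get pods -o json produces. The sketch below relies only on standard Kubernetes Pod status fields (status.phase and status.containerStatuses[].ready). The namespace and pod names in the sample are hypothetical, and this check is an assumption about what "healthy" means for your environment, not an official TSB health check.

```python
def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return 'namespace/name' for pods that are neither Succeeded nor
    Running with every container reporting ready."""
    flagged = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed Job pods are expected to have exited
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            flagged.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return flagged


if __name__ == "__main__":
    # Hypothetical sample shaped like `kubectl get pods -o json` output.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "example-ns", "name": "component-a"},
                "status": {"phase": "Running", "containerStatuses": [{"ready": True}]},
            },
            {
                "metadata": {"namespace": "example-ns", "name": "component-b"},
                "status": {"phase": "Running", "containerStatuses": [{"ready": False}]},
            },
        ]
    }
    for name in unhealthy_pods(sample):
        print(name)
```

In practice you would load the dictionary with json.load from the saved kubectl output for each namespace that hosts TSB components, and treat any non-empty result as a reason to pause before upgrading.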
Performance Baseline
- Establish baselines for how long it takes to apply a configuration and for that configuration to propagate.
- Document normal latencies for API calls and configuration updates.
- Capture key metrics from TSB and Istio Grafana dashboards for post-upgrade comparison.
- Share these baselines with Tetrate to aid in identifying potential issues during and after the upgrade.
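Recorded baselines are most useful when the post-upgrade comparison is mechanical rather than visual. The sketch below shows one way to flag metrics that grew beyond a tolerance. The metric names and the 20% default threshold are assumptions for illustration, and the function treats higher values as worse, which fits latencies and durations but not throughput.

```python
def find_regressions(baseline, current, tolerance=0.20):
    """Return {metric: (baseline, current, relative_change)} for metrics that
    grew by more than `tolerance` (0.20 = 20%) relative to the baseline.

    Assumes higher values are worse. Metrics missing from `current` are
    reported with current=None so that gaps in collection are visible.
    """
    regressions = {}
    for metric, base_value in baseline.items():
        value = current.get(metric)
        if value is None:
            regressions[metric] = (base_value, None, None)
            continue
        if base_value == 0:
            # Relative change is undefined; flag any increase from zero.
            if value > 0:
                regressions[metric] = (base_value, value, None)
            continue
        change = (value - base_value) / base_value
        if change > tolerance:
            regressions[metric] = (base_value, value, change)
    return regressions


if __name__ == "__main__":
    # Hypothetical metric names and values, for illustration only.
    baseline = {"apply_seconds": 2.0, "propagation_seconds": 10.0}
    after_upgrade = {"apply_seconds": 2.2, "propagation_seconds": 15.0}
    for metric, (before, after, change) in find_regressions(baseline, after_upgrade).items():
        print(f"{metric}: {before} -> {after} ({change:+.0%})")
```

Keeping the baseline as a small file of named values also makes it easy to share with Tetrate alongside the dashboard captures.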
Operational Checks
- Confirm all CPs are refreshing tokens at the set interval
- Verify all edge pods are syncing to central
- Check for any persistent sync issues and resolve them before upgrading
Service Resilience Verification
- Review and update redundancy/resiliency plans for critical services.
- Document and test failover procedures for essential components.
- Verify that these procedures maintain service availability during the upgrade process.
Upgrade Process
Get target TSB version
- Download the target tctl version and make sure the TSB images are synced to your registry.
- If you use Helm, run helm repo update to get the chart for your target version.
Upgrade TSB
- Upgrade the TSB management plane first, then upgrade the control planes in the workload clusters.
- Follow TSB Helm Upgrade if you use Helm. That document also includes rollback instructions.
- Follow TSB tctl Upgrade if you use tctl. That document also includes rollback instructions.
- If you use Isolation Boundaries, make sure to check Isolation Boundaries Control Plane Upgrades.
Post-upgrade Verification
TSB component checks
- Make sure all TSB components are running.
- Examine TSB Grafana dashboards and compare with pre-upgrade values.
- Investigate any significant deviations from the baseline.
Resource usage
- Confirm TSB pods are running and consuming resources similar to pre-upgrade levels.
- Investigate any unexpected changes in resource consumption.
CP-MP synchronization
- Ensure CPs are syncing with the upgraded MP.
- Check Edge pod connectivity to the MP's Central pod.
- Verify that configuration changes are propagating correctly.
- Verify that CPs can generate necessary tokens from the MP for inter-component communication.
- Check logs for any token-related errors.
UI functionality
- Confirm the UI is loading and users can log in.
- Verify all dashboard elements are visible and accurate:
  - Service metrics
  - Gateway metrics
  - Topology views
- Test all major UI functions (e.g., configuration updates, viewing logs).
Configuration propagation
- Compare tctl apply timings and config propagation with pre-upgrade baselines.
- Ensure configurations are being applied within expected timeframes.
Test plan execution
- Run through the prepared upgrade test plan.
- Verify all critical functionalities:
  - Service-to-service communication
  - Authentication and authorization
  - Traffic routing and load balancing
  - Observability and tracing
Log analysis
- Review logs of different TSB components for errors and traces.
- Look for any unexpected log patterns or error messages.
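A first pass over component logs can be automated by counting lines that match a few patterns, which makes a sudden jump after the upgrade easy to spot. The sketch below is illustrative only: the regular expressions are assumptions and are not taken from TSB's actual log format, so adjust them to the messages your components emit.

```python
import re
from collections import Counter

# Illustrative patterns only; they are not taken from TSB's actual log format.
DEFAULT_PATTERNS = {
    "error": re.compile(r"\b(error|fatal|panic)\b", re.IGNORECASE),
    "token": re.compile(r"\btoken\b.*\b(expired|invalid|denied|failed)\b", re.IGNORECASE),
}


def summarize_log(lines, patterns=DEFAULT_PATTERNS):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in patterns.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "level=info msg=started",
        'level=error msg="token expired for cluster example"',
        "token refreshed",
    ]
    for label, count in summarize_log(sample).items():
        print(f"{label}: {count}")
```

Running the same summary on logs captured before the upgrade gives a reference point, since some error lines may be normal background noise in your environment.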
Performance validation
- Conduct performance tests to ensure the upgrade hasn't introduced any regressions.
- Compare results with pre-upgrade benchmarks.
Security checks
- Verify that all security policies are still in effect and functioning as expected.
- Ensure that there are no unintended open ports or exposed services.
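The check for unintended exposure can be partly automated by comparing externally reachable Services against an allowlist you maintain. The sketch below reads the standard Kubernetes Service fields (spec.type and spec.ports[].port) from kubectl get svc -o json output. The allowlist and the names in the sample are hypothetical, and the check covers only Service objects, so Ingress and gateway configuration still need a separate review.

```python
EXTERNAL_TYPES = {"LoadBalancer", "NodePort"}


def exposed_ports(service_list):
    """Return a set of (namespace, name, port) for externally reachable Services."""
    exposed = set()
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        if spec.get("type") not in EXTERNAL_TYPES:
            continue
        meta = svc.get("metadata", {})
        for port in spec.get("ports", []):
            exposed.add((meta.get("namespace"), meta.get("name"), port.get("port")))
    return exposed


def unexpected_exposure(service_list, allowed):
    """Return exposed (namespace, name, port) tuples that are not in `allowed`."""
    return exposed_ports(service_list) - set(allowed)


if __name__ == "__main__":
    # Hypothetical sample shaped like `kubectl get svc -o json` output.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "example-ns", "name": "example-gateway"},
                "spec": {"type": "LoadBalancer", "ports": [{"port": 443}, {"port": 9443}]},
            }
        ]
    }
    allowed = {("example-ns", "example-gateway", 443)}
    for item in unexpected_exposure(sample, allowed):
        print("unexpected:", item)
```

Capturing the exposed set before the upgrade and using it as the allowlist afterwards turns this into a simple before-and-after comparison.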
Summary
By following this comprehensive checklist, you can ensure a smoother TSB upgrade process and quickly identify any potential issues. Remember that every environment is unique, so you may need to adapt this checklist to your specific needs. If you encounter any problems or have questions during the upgrade, please don't hesitate to contact Tetrate Customer Support for assistance.