This document describes a conservative guideline for capacity planning of Tetrate Service Bridge (TSB) in Management and Control planes.
These parameters apply to production installations: TSB will run with minimal resources if you are using a demo-like environment.
The resource provisioning guidelines described in this document are very conservative.
Also please be aware that the resource provisioning described in this document are applicable to vertical resource scaling. Multiple replicas of the same TSB components do not share the load with each other, and therefore you cannot expect the combined resources from multiple components to have the same effect. Replicas of TSB components should only be used for high availability purposes only.
Recommended baseline production installation resource requirements
For a baseline installation of TSB with 1 registered cluster and 1 deployed service within that cluster, the following resources are recommended.
To reiterate, the amount of memory described below are very conservative. Also, the actual performance given by the number of vCPUs tend to fluctuate depending on your underlying infrastructure. You are advised to verify the results in your environment.
|Component||vCPU #||Memory MiB|
|TSB server (Management Plane) 1||2||512|
|XCP Central Components 2||2||128|
1 Including the Kubernetes operator and persistent data
2 Including the Kubernetes operator.
Recommended scaling resource parameters
The TSB stack is mostly CPU-bound. Additional clusters registered with TSB via XCP increase the CPU utilization by ~4%.
The effect of additional registered clusters or additional deployed workload services on memory utilisation is almost negligible. Likewise, the effect of additional clusters or workloads on resource consumption of the majority of TSB components is mostly negligible, with the notable exceptions of TSB, XCP Central component, TSB UI and IAM.
Components that are part of the visibility stack (e.g. OTel/Zipkin, etc.) have their resource utilisation driven by requests, thus the resource scaling should follow the user request rate statistics. As a general rule of thumb, more than 1 vCPU is preferred. It is also important to notice that the visibility stack performance is largely bound by Elasticsearch performance.
Thus, we recommend vertically scaling the components by 1 vCPU for a number of deployed workflows:
Besides OAP, All components don't require any resource adjustment. Those components are architectured and tested to support very large clusters.
OAP in Management plane requires extra CPU and Memory ~ 100 millicores of CPU and 1024 MiB of RAM per every 1000 services. E.g. 4000 services aggregated in TSB Management Plane from all TSB clusters would require approximately 400 millicores of CPU and 4096 MiB of RAM in total.