Resource Consumption and Capacity Planning
This document provides conservative guidelines for capacity planning of Tetrate Service Bridge (TSB) in the Management and Control Planes.
These parameters apply to production installations. In a demo-like environment, TSB will run with minimal resources.
The resource provisioning guidelines described in this document are very conservative.
Also be aware that the resource provisioning described in this document applies to vertical resource scaling. Multiple replicas of the same TSB component do not share load with each other, so you cannot expect the combined resources of multiple replicas to have the same effect as a single, larger instance. Use replicas of TSB components for high availability purposes only.
Recommended baseline production installation resource requirements
For a baseline installation of TSB with 1 registered cluster and 1 deployed service within that cluster, the following resources are recommended.
To reiterate, the memory amounts described below are very conservative. Also, the actual performance delivered by a given number of vCPUs tends to fluctuate depending on your underlying infrastructure. You are advised to verify the results in your environment.
Component | vCPUs | Memory (MiB) |
---|---|---|
TSB server (Management Plane) [1] | 2 | 512 |
XCP Central Components [2] | 2 | 128 |
XCP Edge | 1 | 128 |
Front Envoy | 1 | 50 |
IAM | 1 | 128 |
TSB UI | 1 | 256 |
OAP | 4 | 5192 |
OTEL-collector | 2 | 1024 |
Zipkin | 2 | 2048 |
[1] Including the Kubernetes operator and persistent data reconciliation processes.
[2] Including the Kubernetes operator.
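Adding up the baseline table gives a rough total footprint for a one-cluster, one-service installation across both planes. The sketch below simply sums the per-component values from the table; the dictionary name and helper function are illustrative, not part of TSB.

```python
# Baseline per-component recommendations from the table above:
# component -> (vCPUs, memory in MiB). Names are copied from the table.
BASELINE = {
    "TSB server (Management Plane)": (2, 512),
    "XCP Central Components": (2, 128),
    "XCP Edge": (1, 128),
    "Front Envoy": (1, 50),
    "IAM": (1, 128),
    "TSB UI": (1, 256),
    "OAP": (4, 5192),
    "OTEL-collector": (2, 1024),
    "Zipkin": (2, 2048),
}


def baseline_totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (total vCPUs, total memory in MiB) summed over all components."""
    total_cpu = sum(cpu for cpu, _ in table.values())
    total_mem = sum(mem for _, mem in table.values())
    return total_cpu, total_mem


if __name__ == "__main__":
    cpu, mem = baseline_totals(BASELINE)
    # 16 vCPUs and 9466 MiB (about 9.2 GiB) for the whole baseline stack.
    print(f"{cpu} vCPU, {mem} MiB (~{mem / 1024:.1f} GiB)")
```

Because components such as XCP Edge run in the control plane cluster, this total is spread across clusters rather than required on a single node.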
Recommended scaling resource parameters
The TSB stack is mostly CPU-bound. Each additional cluster registered with TSB via XCP increases CPU utilization by roughly 4%.
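As a back-of-the-envelope aid, the sketch below projects CPU for a given number of registered clusters. It assumes, as a loud simplification, that the ~4% increase applies per additional cluster and grows linearly relative to the CPU observed with a single cluster; `estimate_cpu_millicores` and its inputs are hypothetical names, and the result should be checked against measurements in your environment.

```python
# Assumption: ~4% CPU increase per additional registered cluster, applied
# linearly to the single-cluster baseline. Not an official TSB formula.
CPU_INCREASE_PER_CLUSTER = 0.04


def estimate_cpu_millicores(baseline_millicores: float, clusters: int) -> float:
    """Project CPU (millicores) for `clusters` registered clusters.

    `baseline_millicores` is the CPU you observed with one registered cluster.
    """
    if clusters < 1:
        raise ValueError("at least one registered cluster is expected")
    additional_clusters = clusters - 1
    return baseline_millicores * (1 + CPU_INCREASE_PER_CLUSTER * additional_clusters)


if __name__ == "__main__":
    # A component using 2000m with one cluster, projected to 10 clusters:
    # roughly 2000 * 1.36, i.e. about 2720 millicores.
    print(round(estimate_cpu_millicores(2000, 10)))
```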
The effect of additional registered clusters or additional deployed workload services on memory utilization is almost negligible. Likewise, additional clusters or workloads have a mostly negligible effect on the resource consumption of most TSB components, with the notable exceptions of the TSB server, the XCP Central component, the TSB UI, and IAM.
Components that are part of the visibility stack (e.g., OTel and Zipkin) have their resource utilization driven by requests, so their resource scaling should follow your request-rate statistics. As a general rule of thumb, more than 1 vCPU is preferred. Note also that the performance of the visibility stack is largely bound by Elasticsearch performance.
We therefore recommend vertically scaling the components by 1 vCPU as the number of deployed workloads grows, as described for each plane below:
Management Plane
Apart from OAP, the Management Plane components do not require any resource adjustment: they are designed and tested to support very large clusters.
OAP in the Management Plane requires approximately 100 millicores of extra CPU and 1024 MiB of extra RAM for every 1000 services. For example, 4000 services aggregated in the TSB Management Plane from all TSB clusters would require approximately 400 millicores of extra CPU and 4096 MiB of extra RAM in total.
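The OAP rule above is simple proportional arithmetic. The sketch below encodes it; rounding up to the next block of 1000 services is an added assumption meant to keep the estimate conservative, and `oap_extra_resources` is an illustrative helper name.

```python
import math

# Rule of thumb from the guideline: extra OAP capacity of ~100 millicores CPU
# and 1024 MiB RAM per 1000 services aggregated in the Management Plane.
CPU_M_PER_1000_SERVICES = 100
MEM_MIB_PER_1000_SERVICES = 1024


def oap_extra_resources(services: int) -> tuple[int, int]:
    """Return (extra CPU in millicores, extra memory in MiB) for OAP.

    Rounds up to whole blocks of 1000 services (an assumption, not part
    of the guideline) so partial blocks are still budgeted for.
    """
    blocks = math.ceil(services / 1000)
    return blocks * CPU_M_PER_1000_SERVICES, blocks * MEM_MIB_PER_1000_SERVICES


if __name__ == "__main__":
    print(oap_extra_resources(4000))  # (400, 4096), the 4000-service example
    print(oap_extra_resources(4500))  # (500, 5120) after rounding up
```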
Control Plane Resource Requirements
The following table shows typical peak resource utilization for the TSB control plane under these assumptions:
- 50 services with sidecars
- Traffic on the entire cluster is 500 requests per second (rps)
- Zipkin sampling rate is 1% of the traffic
- Metrics are captured for every request at every workload.
Note that average CPU utilization is a fraction of the typical peak value.
Component | Typical Peak CPU (m) | Typical Peak Memory (Mi) |
---|---|---|
Istiod | 300m | 250Mi |
OAP | 2500m | 2500Mi |
Zipkin | 200m | 1000Mi |
XCP Edge | 100m | 100Mi |
Istio Operator - Control Plane | 50m | 100Mi |
Istio Operator - Data Plane | 150m | 100Mi |
TSB Control Plane Operator | 100m | 100Mi |
TSB Data Plane Operator | 150m | 100Mi |
OTEL Collector | 50m | 100Mi |
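Summing the per-component peaks gives a pessimistic upper bound for the whole control plane, since the individual peaks are unlikely to coincide and average usage is lower. The sketch below adds up the values from the table; the names are illustrative.

```python
# Typical peak values from the control plane table:
# component -> (CPU in millicores, memory in MiB).
CONTROL_PLANE_PEAKS = {
    "Istiod": (300, 250),
    "OAP": (2500, 2500),
    "Zipkin": (200, 1000),
    "XCP Edge": (100, 100),
    "Istio Operator - Control Plane": (50, 100),
    "Istio Operator - Data Plane": (150, 100),
    "TSB Control Plane Operator": (100, 100),
    "TSB Data Plane Operator": (150, 100),
    "OTEL Collector": (50, 100),
}


def summed_peaks(peaks: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum per-component peaks into a worst-case total (CPU m, memory Mi)."""
    cpu = sum(c for c, _ in peaks.values())
    mem = sum(m for _, m in peaks.values())
    return cpu, mem


if __name__ == "__main__":
    cpu_m, mem_mi = summed_peaks(CONTROL_PLANE_PEAKS)
    print(f"{cpu_m}m CPU, {mem_mi}Mi memory")  # 3600m CPU, 4350Mi memory
```

OAP dominates both totals in this scenario, which is consistent with the visibility stack being driven by request volume.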
TSB/Istio Operator resource usage per Ingress Gateway
The following table shows the resources used by the TSB Operator and the Istio Operator as the number of Ingress Gateways grows.
Keep in mind that these are estimates: actual consumption varies with the applications you deploy, but these values give a general idea of what to expect.
Ingress Gateways | TSB Operator CPU(m) | TSB Operator Mem(Mi) | Istio Operator CPU(m) | Istio Operator Mem(Mi) |
---|---|---|---|---|
0 | 100m | 50Mi | 10m | 45Mi |
50 | 2600m | 125Mi | 1100m | 120Mi |
100 | 3500m | 200Mi | 1300m | 175Mi |
150 | 3800m | 250Mi | 1400m | 200Mi |
200 | 4000m | 325Mi | 1400m | 250Mi |
250 | 4700m | 325Mi | 1750m | 300Mi |
300 | 5000m | 475Mi | 1750m | 400Mi |
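For gateway counts that fall between the measured rows, one option is to interpolate linearly between the two nearest data points. Linear interpolation is an assumption here: the measured curve is not linear (for example, TSB Operator memory stays at 325Mi from 200 to 250 gateways, then jumps), so treat the result as a rough guide. Counts outside the measured 0 to 300 range are rejected instead of extrapolated; `estimate_operator_usage` is an illustrative name.

```python
from bisect import bisect_left

# Rows from the table: (ingress gateways, TSB Operator CPU m, TSB Operator Mem Mi,
#                       Istio Operator CPU m, Istio Operator Mem Mi)
OPERATOR_USAGE = [
    (0, 100, 50, 10, 45),
    (50, 2600, 125, 1100, 120),
    (100, 3500, 200, 1300, 175),
    (150, 3800, 250, 1400, 200),
    (200, 4000, 325, 1400, 250),
    (250, 4700, 325, 1750, 300),
    (300, 5000, 475, 1750, 400),
]


def estimate_operator_usage(gateways: float) -> tuple[float, ...]:
    """Interpolate (TSB CPU m, TSB Mem Mi, Istio CPU m, Istio Mem Mi)."""
    xs = [row[0] for row in OPERATOR_USAGE]
    if not xs[0] <= gateways <= xs[-1]:
        raise ValueError("gateway count outside the measured range")
    i = bisect_left(xs, gateways)
    if xs[i] == gateways:
        # Exact match with a measured row: return it unchanged.
        return tuple(float(v) for v in OPERATOR_USAGE[i][1:])
    lo, hi = OPERATOR_USAGE[i - 1], OPERATOR_USAGE[i]
    t = (gateways - lo[0]) / (hi[0] - lo[0])  # fraction of the way from lo to hi
    return tuple(a + t * (b - a) for a, b in zip(lo[1:], hi[1:]))


if __name__ == "__main__":
    print(estimate_operator_usage(75))  # (3050.0, 162.5, 1200.0, 147.5)
```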
Component resource utilization
The following tables show how the different TSB components scale up to about 4000 services. The traffic applied at each data point, in requests per minute (rpm), is listed in the Traffic column. The results are divided into Management Plane and Control Plane measurements.
Management Plane
Services | Gateways | Traffic(rpm) | Central CPU(m) | Central Mem(Mi) | MPC CPU(m) | MPC Mem(Mi) | OAP CPU(m) | OAP Mem(Mi) | Otel CPU(m) | Otel Mem(Mi) | TSB CPU(m) | TSB Mem(Mi) | Zipkin CPU(m) | Zipkin Mem(Mi) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 rpm | 3m | 39Mi | 5m | 30Mi | 37m | 408Mi | 22m | 108Mi | 14m | 57Mi | 2m | 708Mi |
420 | 7 | 600 rpm | 4m | 42Mi | 15m | 31Mi | 116m | 736Mi | 24m | 123Mi | 50m | 63Mi | 14m | 835Mi |
820 | 9 | 600 rpm | 4m | 54Mi | 24m | 34Mi | 43m | 909Mi | 26m | 127Mi | 85m | 75Mi | 25m | 948Mi |
1220 | 11 | 600 rpm | 4m | 59Mi | 32m | 41Mi | 28m | 1141Mi | 27m | 210Mi | 213m | 78Mi | 25m | 954Mi |
1620 | 13 | 600 rpm | 5m | 63Mi | 44m | 48Mi | 209m | 1475Mi | 29m | 249Mi | 113m | 86Mi | 25m | 957Mi |
2020 | 15 | 600 rpm | 5m | 73Mi | 41m | 51Mi | 51m | 1655Mi | 24m | 319Mi | 211m | 91Mi | 27m | 957Mi |
2420 | 17 | 300 rpm | 4m | 84Mi | 72m | 62Mi | 57m | 1910Mi | 29m | 381Mi | 227m | 97Mi | 27m | 755Mi |
2820 | 19 | 60 rpm | 5m | 90Mi | 73m | 65Mi | 43m | 2136Mi | 16m | 466Mi | 275m | 104Mi | 27m | 770Mi |
3220 | 21 | 60 rpm | 5m | 106Mi | 85m | 78Mi | 89m | 2600Mi | 43m | 574Mi | 382m | 108Mi | 27m | 802Mi |
3620 | 23 | 60 rpm | 5m | 123Mi | 94m | 71Mi | 245m | 2772Mi | 37m | 578Mi | 625m | 115Mi | 27m | 825Mi |
4020 | 25 | 60 rpm | 5m | 147Mi | 90m | 81Mi | 521m | 3224Mi | 15m | 704Mi | 508m | 122Mi | 27m | 856Mi |
IAM peaks at 509m CPU / 52Mi memory, LDAP at 2m / 17Mi, and the XCP Operator at 9m / 37Mi.
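The Management Plane table can be used to sanity-check the OAP rule of thumb of ~1024 MiB per 1000 services. The sketch below fits an ordinary least-squares line to OAP memory against service count. Note that traffic also varied between rows (600, 300, and 60 rpm), so the fitted slope mixes the effects of service count and traffic; it is an observation about this data set, not a sizing formula.

```python
# Management Plane OAP memory (Mi) at each measured service count.
SERVICES = [0, 420, 820, 1220, 1620, 2020, 2420, 2820, 3220, 3620, 4020]
OAP_MEM_MI = [408, 736, 909, 1141, 1475, 1655, 1910, 2136, 2600, 2772, 3224]


def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    per_1000 = least_squares_slope(SERVICES, OAP_MEM_MI) * 1000
    print(f"Observed OAP memory growth: ~{per_1000:.0f} Mi per 1000 services")
```

The fitted growth is roughly 670 Mi per 1000 services, below the 1024 MiB the guideline budgets, which is consistent with the guideline being deliberately conservative.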
Control Plane
Services | Gateways | Traffic(rpm) | Edge CPU(m) | Edge Mem(Mi) | Istiod CPU(m) | Istiod Mem(Mi) | OAP CPU(m) | OAP Mem(Mi) | Otel CPU(m) | Otel Mem(Mi) | Zipkin CPU(m) | Zipkin Mem(Mi) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 rpm | 6m | 49Mi | 9m | 53Mi | 48m | 610Mi | 26m | 80Mi | 25m | 723Mi |
400 | 2 | 600 rpm | 350m | 120Mi | 600m | 600Mi | 900m | 1510Mi | 27m | 86Mi | 75m | 931Mi |
800 | 4 | 600 rpm | 700m | 230Mi | 2170m | 1140Mi | 1720m | 2310Mi | 32m | 91Mi | 123m | 1030Mi |
1200 | 6 | 600 rpm | 1010m | 366Mi | 2680m | 1890Mi | 2630m | 3280Mi | 35m | 101Mi | 139m | 1080Mi |
1600 | 8 | 600 rpm | 1600m | 438Mi | 2690m | 2490Mi | 3610m | 4030Mi | 41m | 180Mi | 180m | 1070Mi |
2000 | 10 | 600 rpm | 1900m | 514Mi | 3240m | 3820Mi | 4470m | 5890Mi | 43m | 106Mi | 209m | 1080Mi |
2400 | 12 | 300 rpm | 682m | 628Mi | 2010m | 4660Mi | 3910m | 5750Mi | 37m | 110Mi | 281m | 1070Mi |
4000 | 20 | 600 rpm | 1470m | 1040Mi | 3730m | 9790Mi | 13300m | 35000Mi | 37m | 135Mi | 465m | 1100Mi |
Metric Server peaks at 11m CPU / 32Mi memory, the Onboarding Operator at 6m / 38Mi, and the XCP Operator at 11m / 46Mi.
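Because the Control Plane measurements are not monotonic (the 2400-service row ran at 300 rpm and shows lower Istiod CPU than the 2000-service row), a conservative way to use them is to take, for each metric, the maximum over all measured rows up to the first row that covers your service count, then add headroom. The sketch below does this for Istiod and OAP only. The 20% default headroom and the `conservative_sizing` name are hypothetical choices, not TSB recommendations.

```python
from bisect import bisect_left

# Control Plane rows: (services, Istiod CPU m, Istiod Mem Mi, OAP CPU m, OAP Mem Mi)
CONTROL_PLANE_ROWS = [
    (0, 9, 53, 48, 610),
    (400, 600, 600, 900, 1510),
    (800, 2170, 1140, 1720, 2310),
    (1200, 2680, 1890, 2630, 3280),
    (1600, 2690, 2490, 3610, 4030),
    (2000, 3240, 3820, 4470, 5890),
    (2400, 2010, 4660, 3910, 5750),
    (4000, 3730, 9790, 13300, 35000),
]


def conservative_sizing(services: int, headroom: float = 0.2) -> dict[str, float]:
    """Size Istiod and OAP from measured peaks, plus `headroom` (hypothetical default)."""
    counts = [row[0] for row in CONTROL_PLANE_ROWS]
    i = bisect_left(counts, services)  # first measured row with >= `services`
    if i == len(counts):
        raise ValueError("service count exceeds the largest measured data point")
    covered = CONTROL_PLANE_ROWS[: i + 1]
    # Column-wise maximum so a lower-traffic row never lowers the estimate.
    peaks = [max(row[col] for row in covered) for col in range(1, 5)]
    scale = 1 + headroom
    keys = ("istiod_cpu_m", "istiod_mem_mi", "oap_cpu_m", "oap_mem_mi")
    return {key: value * scale for key, value in zip(keys, peaks)}


if __name__ == "__main__":
    print(conservative_sizing(2200, headroom=0))
```

With zero headroom, 2200 services maps to the 2400-service row but keeps the higher Istiod CPU (3240m) and OAP values measured at 2000 services.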