Tetrate Service Bridge
Version: 1.14.x

Vertical Pod Autoscaler (VPA)

This document explains how to use the Kubernetes Vertical Pod Autoscaler (VPA) to right-size resource requests for TSB control plane and management plane components. It covers a recommended rollout procedure, per-component starting points, verification, rollback, and reference material for advanced configuration.

Overview

VPA observes a workload's actual CPU and memory usage and recommends — or applies — adjusted resource requests. VPA right-sizes per-pod resource requests, while the Horizontal Pod Autoscaler (HPA) scales replica count in response to load. The two are complementary, not alternatives. In the context of a TSB deployment, the main value of VPA is eliminating manual request tuning as mesh size and traffic patterns change over time.

VPA works best on workloads with relatively steady CPU patterns. Its recommendation engine derives recommendations from usage history aggregated over time, so it does not react quickly to sudden load changes. This makes it a good fit for control plane components and a poor fit for bursty traffic-handling workloads.

Prerequisites

cgroupv2 on all nodes. This is the only runtime prerequisite for in-place resize. EKS with AL2023, GKE with COS, AKS with Ubuntu 22.04+, and most modern on-prem distros default to cgroupv2.
Verify with stat -fc %T /sys/fs/cgroup: the output should be cgroup2fs.
Kubernetes 1.33 or later.
VPA installed via the upstream Kubernetes installer (./hack/vpa-up.sh). This is the installer TSB has tested against.
VPA Updater configured with --min-replicas=1. By default, the Updater requires at least two live replicas before acting on a deployment. TSB control plane components run as single replicas, so without this change VPA will silently skip them. The only indication is a log line in the Updater pod:
```
"Too few replicas" livePods=1 requiredPods=2 globalMinReplicas=2
```
Familiarity with TSB concepts and Kubernetes customization via overlays.
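
As a rough illustration of the --min-replicas prerequisite, the fragment below shows where the flag would sit in the Updater's Deployment. It assumes an upstream install with a Deployment named vpa-updater in kube-system and a container named updater; verify both against your own manifests, and keep any arguments that are already present.

```yaml
# Fragment of the vpa-updater Deployment (kube-system in an upstream install).
# Assumption: the container is named "updater"; check your manifest before editing.
spec:
  template:
    spec:
      containers:
      - name: updater
        args:
        - --min-replicas=1   # let the Updater act on single-replica workloads
```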

Rollout procedure

Use this procedure for each TSB component you plan to manage with VPA. Do not skip the observation step.

Confirm prerequisites. Ensure cgroupv2 is enabled on all nodes, Kubernetes 1.33 or later is running, VPA is installed, and the VPA Updater is configured with --min-replicas=1.
Apply a VPA object in Off mode. Target the component's Deployment and use the recommended configuration below. Off mode produces recommendations without modifying any pods.
Wait 24–48 hours under representative load. Recommendations need a real workload sample to be meaningful. Staging traffic is not representative of production.
Read the recommendations:
```
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation}'
```
You will see four values per resource: target, lowerBound, upperBound, and uncappedTarget. Use lowerBound to inform minAllowed and upperBound to inform maxAllowed. See Tuning bounds.
Set minAllowed and maxAllowed. Update the VPA object based on the observed recommendations.
Switch to the appropriate update mode. Use Initial for single-replica control plane components and InPlaceOrRecreate for multi-replica components that tolerate in-place resize. See Per-component starting points.
Monitor events and behavior. See Verifying VPA behavior.
Revisit bounds after major mesh changes. Significant increases in route count, traffic pattern shifts, or mesh size changes may require updated bounds.
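
For orientation, the recommendation read in the "Read the recommendations" step has roughly the following shape, shown here as YAML (as printed by kubectl get vpa <name> -n <ns> -o yaml) for readability. The container name and all numbers are made-up placeholders, not measurements from a TSB deployment.

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: discovery      # illustrative container name
      lowerBound:
        cpu: 200m
      target:
        cpu: 350m
      uncappedTarget:
        cpu: 350m                   # equals target while no bound is capping it
      upperBound:
        cpu: 1200m
```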

Recommended configuration

For TSB components, the recommended baseline is:

controlledResources: ["cpu"] — manage CPU only. Memory introduces restart risk on decrease.
controlledValues: RequestsOnly — adjust requests only; leave limits as configured.

containerPolicies:
- containerName: "*"
  controlledResources: ["cpu"]
  controlledValues: RequestsOnly
  minAllowed:
    cpu: 50m       # Derive from your Off-mode observations
  maxAllowed:
    cpu: "4"       # Derive from your Off-mode observations

Why CPU only: Memory decrease is the only resize operation that can cause a restart in InPlaceOrRecreate mode (see Memory management). For most TSB deployments the restart risk outweighs the benefit of automatic memory right-sizing.

Why RequestsOnly: The default RequestsAndLimits scales limits proportionally to requests. If your limits are intentionally set high for headroom, VPA will scale them down whenever it lowers requests, removing that margin. RequestsOnly lets VPA right-size requests for scheduler placement while preserving your manually tuned ceiling.
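
A small worked example may help. Assume, purely for illustration, a container with a 500m CPU request and a 2-CPU limit (a 1:4 ratio), and a VPA target of 250m. The comments show what each controlledValues setting would do to that container.

```yaml
# Container resources as configured in the Deployment (illustrative values)
resources:
  requests:
    cpu: 500m
  limits:
    cpu: "2"         # 4x the request

# With a VPA target of 250m:
#   RequestsAndLimits -> requests.cpu: 250m, limits.cpu: "1"  (1:4 ratio kept)
#   RequestsOnly      -> requests.cpu: 250m, limits.cpu: "2"  (limit untouched)
```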

Per-component starting points

VPA is most valuable for components with relatively steady CPU usage that benefit from right-sizing, such as control plane components during config reconciliation and operators during reconcile loops. The recommended starting point for all components is to monitor in Off mode, observe recommendations for 24–48 hours, and only graduate to an active mode when the recommendation shows a meaningful gap from current requests.

Single-replica components

For components deployed as a single replica (most components, including istiod, XCP central/edge, OAP, MPC, operators, and the management plane front-envoy), avoid any mode that can trigger pod eviction, in order to minimize downtime. For istiod specifically, eviction triggers a full xDS re-sync across all sidecars and gateways, making it a high-impact event. For these components, graduate to Initial mode, which applies recommendations at the next natural restart (deploy, upgrade, node maintenance) without ever forcing one. Do not use InPlaceOrRecreate unless you have validated in-place resize behavior end-to-end in your environment.

Multi-replica components

Multi-replica components can graduate to InPlaceOrRecreate once the recommendation shows a significant gap from current requests and the component tolerates the restart semantics described in Memory management.
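
As a sketch, a VPA for a multi-replica component might look like the following. The names example-component and example-namespace are hypothetical, and the bounds are placeholders to be replaced with values derived from Off-mode observations.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-component-vpa        # hypothetical name
  namespace: example-namespace       # hypothetical; must match the Deployment's namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-component          # hypothetical multi-replica Deployment
  updatePolicy:
    updateMode: "InPlaceOrRecreate"  # in-place resize first, eviction only as a fallback
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["cpu"]
      controlledValues: RequestsOnly
      minAllowed:
        cpu: 100m                    # placeholder
      maxAllowed:
        cpu: "2"                     # placeholder
```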

Memory management

While including memory in controlledResources is not recommended, the sections below explain what to expect from each type of resize operation and how to control the behavior.

Restart behavior by resource (`InPlaceOrRecreate` mode)

VPA applies recommendations directly on the running pod, without modifying the Deployment spec or ReplicaSet. No rollout is triggered, and the operator will not reconcile the pod back to the Deployment's values. VPA and the Deployment controller do not conflict.

| Operation | Restart |
| --- | --- |
| CPU increase | No, in-place |
| CPU decrease | No, in-place |
| Memory increase | No, in-place |
| Memory decrease | Yes, see below |

Memory decrease behavior

Memory decrease is the only resize operation that can cause a restart. The behavior depends on the resizePolicy field set on the container spec (not on the VPA object):

| `resizePolicy` on container | Memory decrease behavior |
| --- | --- |
| Not set / `NotRequired` | VPA falls back to pod eviction: pod deleted, rescheduled, full cold start |
| `RestartContainer` | In-place container restart: pod preserved, same IP, faster than pod eviction |

TSB deployments do not ship with resizePolicy set on any container, so a memory decrease recommendation will silently trigger pod eviction.

Setting `resizePolicy` via overlays

To set resizePolicy on a TSB component without it being overwritten by the operator, use TSB overlays. Find the deployment name with kubectl get deployment -n <namespace>:

overlays:
  - apiVersion: apps/v1
    kind: Deployment
    name: <deployment-name>
    patches:
      - path: spec.template.spec.containers.[name:<container-name>].resizePolicy
        value:
          - resourceName: memory
            restartPolicy: RestartContainer
          - resourceName: cpu
            restartPolicy: NotRequired

Note: The preferred approach is still to exclude memory from VPA management entirely (controlledResources: ["cpu"]). If you do include memory, set minAllowed.memory above observed peak usage so that VPA never shrinks the request below what the component needs. Use resizePolicy: RestartContainer only when pod identity preservation is a hard requirement.
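
If memory is nevertheless added to controlledResources, a memory floor can be expressed in the same containerPolicies block used for CPU. The sketch below is illustrative only: the 2Gi and 4Gi figures are placeholders and should come from your own observed peak usage.

```yaml
containerPolicies:
- containerName: "*"
  controlledResources: ["cpu", "memory"]
  controlledValues: RequestsOnly
  minAllowed:
    cpu: 50m
    memory: 2Gi    # placeholder: set above the observed peak memory usage
  maxAllowed:
    cpu: "4"
    memory: 4Gi    # placeholder ceiling
```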

Tuning bounds

VPA uses minAllowed and maxAllowed to constrain its recommendations. minAllowed prevents under-provisioning; maxAllowed caps runaway recommendations during anomalous load.

Bounds cannot be prescribed upfront: istiod's footprint depends on mesh size, while XCP and OAP scale with cluster count and telemetry volume. Derive bounds from your own environment using the rollout procedure.

When reading .status.recommendation, four values are returned per resource:

target: VPA's primary recommendation, what it would apply.
lowerBound: VPA's floor estimate. Use this to inform minAllowed: set minAllowed to roughly 50–75% of lowerBound to leave VPA room to maneuver while preventing under-provisioning.
upperBound: VPA's ceiling estimate. Use this to inform maxAllowed: cap it at a value you are comfortable with as a hard ceiling.
uncappedTarget: the recommendation VPA would make ignoring minAllowed and maxAllowed. Useful for detecting when your bounds are too restrictive.
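
To make the arithmetic concrete, here is a hedged sketch that assumes an Off-mode observation reported a CPU lowerBound of 200m and an upperBound of 1200m. Both figures are invented for illustration. Taking 50–75% of lowerBound gives a minAllowed between 100m and 150m; the maxAllowed value is a judgment call about an acceptable ceiling rather than something VPA computes.

```yaml
containerPolicies:
- containerName: "*"
  controlledResources: ["cpu"]
  controlledValues: RequestsOnly
  minAllowed:
    cpu: 100m      # 50% of the assumed lowerBound of 200m (75% would be 150m)
  maxAllowed:
    cpu: "2"       # chosen ceiling, above the assumed upperBound of 1200m
```

If uncappedTarget later differs from target, one of these bounds is capping the recommendation and may deserve another look.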

Revisit bounds after major mesh changes.

Verifying VPA behavior

Recommendations and status

# Current recommendation values
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation}'

# Conditions — RecommendationProvided, LowConfidence, FetchingHistory, etc.
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.conditions}'

Events

# In-place resizes succeeded
kubectl get events -n <ns> --field-selector reason=InPlaceResizedByVPA

# Fell back to pod eviction
kubectl get events -n <ns> --field-selector reason=EvictedByVPA

# In-place resize attempted but failed
kubectl get events -n <ns> --field-selector reason=FailedInPlaceResize

If you see EvictedByVPA events for a component without resizePolicy set, this is the silent fallback described in Memory decrease behavior.

Updater logs

kubectl logs -n kube-system -l app=vpa-updater

Watch for "Too few replicas" (indicates --min-replicas=1 was not applied), pod selection decisions, and eviction failures.

Rollback and recovery

If a VPA recommendation degrades a component:

Immediately switch the VPA to Off mode. This stops new recommendations from being applied but does not revert pods to their original resource values — VPA does not restore previous values when disabled.
```
kubectl patch vpa <name> -n <ns> --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
```
Force pods back to Deployment-spec values by triggering a rollout. The Deployment spec was never modified by VPA, so a fresh rollout restores the original requests:
```
kubectl rollout restart deployment/<name> -n <ns>
```
If the issue persists, delete the VPA object entirely and verify the Deployment's resource block reflects the values you want.
Investigate before re-enabling. Common causes: bounds too low (minAllowed below actual baseline), bounds too high (maxAllowed allowed runaway), or memory included in controlledResources causing eviction.

Interactions with other controllers

PodDisruptionBudgets

VPA eviction respects PodDisruptionBudgets. If a PDB would be violated by an eviction, VPA will skip the pod and retry later. This is the expected interaction, and no special configuration is needed. For single-replica components without a PDB, eviction proceeds immediately.
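
For a multi-replica component, a PodDisruptionBudget along these lines would keep at least one pod running while VPA evicts. This is a generic Kubernetes sketch rather than a TSB-specific recommendation; the name, namespace, and label are hypothetical and must match your Deployment's actual pod labels.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-component-pdb      # hypothetical
  namespace: example-namespace     # hypothetical
spec:
  minAvailable: 1                  # never evict the last available pod
  selector:
    matchLabels:
      app: example-component       # hypothetical label
```

Avoid applying minAvailable: 1 to a single-replica component: it would block every voluntary eviction, including node drains.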

Multi-cluster deployments

VPA is a per-cluster controller: it must be installed in every cluster where you want it to act, including each control plane and data plane cluster. VPA recommendations are cluster-local; bounds derived in one cluster will not transfer cleanly to another with different mesh size or telemetry volume. Run the rollout procedure independently per cluster.

Reference

VPA modes

| Mode | Behavior |
| --- | --- |
| `Off` | Observe only mode. Recommendations appear in `.status`, nothing is applied. |
| `Initial` | Injects recommendations at pod creation only. Running pods are never touched. |
| `Recreate` | Evicts pods that drift out of range and recreates them with right-sized values. Always a full pod restart. |
| `InPlaceOrRecreate` | Tries in-place resize first; evicts only as a fallback. Recommended for K8s 1.33+. |
| `Auto` | Deprecated. Equivalent to `InPlaceOrRecreate` on K8s 1.33+. If you have existing VPAs using `Auto`, update them to `InPlaceOrRecreate`; behavior is the same. |

`controlledResources`

| Value | Behavior |
| --- | --- |
| `["cpu", "memory"]` | Manages both. Default when unspecified. |
| `["cpu"]` | CPU only. Memory requests and limits never touched. Recommended. |
| `["memory"]` | Memory only. CPU left as-is. |

`controlledValues`

| Value | Behavior |
| --- | --- |
| `RequestsAndLimits` | Scales both requests and limits, maintaining their ratio. Default when unspecified. |
| `RequestsOnly` | Adjusts requests only; limits stay as configured in the Deployment. Recommended. |

Full example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: istiod-vpa
  namespace: istio-system           # Must match the Deployment's namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istiod                    # Exact deployment name — no wildcards
  updatePolicy:
    updateMode: "Off"               # Start here; switch to Initial after observation
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["cpu"]
      controlledValues: RequestsOnly
      minAllowed:
        cpu: 50m                    # PLACEHOLDER — derive from your Off-mode observations
      maxAllowed:
        cpu: "4"                    # PLACEHOLDER — derive from your Off-mode observations

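The resizePolicy fragment below is only needed if you add memory to controlledResources and want VPA to be able to decrease memory without evicting the pod. Set it in the Deployment spec (for TSB components, via overlays), not in the VPA object. Without it, a memory decrease triggers pod eviction.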

containers:
- name: discovery
  resizePolicy:
  - resourceName: memory
    restartPolicy: RestartContainer
  - resourceName: cpu
    restartPolicy: NotRequired
