Managing Gateways during Upgrades
When you upgrade your ControlPlane, it may be necessary to upgrade and restart the TSB-managed Istio gateways on the cluster.
A minor istioctl version bump (e.g., 1.24.2 to 1.24.4) can produce different output from istioctl manifest generate (different annotations, label ordering, resource limits, or sidecar configuration) even when the gateway configuration has not changed. When the TSB CP operator applies these changes, Kubernetes rolls all gateway pods.
This drives simultaneous restarts of all gateways during an operator upgrade, potentially causing traffic disruption.
Platform Operators can prepare for and manage this situation, using Gateway Reconciliation Control:
- Pause gateway reconciliation globally, by namespace or per-gateway before an upgrade
- After the upgrade, preview what will change before re-enabling (dry-run)
- Selectively re-enable reconciliation per-namespace or per-gateway after verifying changes
- Protect gateway install CRs from accidental deletion independent of their TSB parent
The control is exposed through the IstioRevision API in the EdgeXcp CR, with per-namespace overrides and per-gateway label escape hatches, following established Kubernetes operator best practices.
Preparing for an Upgrade
Prepare for a cluster ControlPlane upgrade by identifying critical Gateway instances that should not risk being restarted during the upgrade.
A gateway is restarted automatically, if needed, when gatewayReconciliation.enabled is true.
A gateway is 'paused' and not restarted when gatewayReconciliation.enabled is false.
Pausing Collections of Gateways
To pause all gateways in a namespace, isolationBoundary or cluster-wide, edit the edge-xcp CR as follows:
apiVersion: install.xcp.tetrate.io/v1alpha1
kind: EdgeXcp
metadata:
  name: edgexcp
  namespace: istio-system
spec:
  hub: gcr.io/xcp-istio
  isolationBoundaries:
  - name: prod
    revisions:
    - name: stable
      istio:
        tsbVersion: v1.8.0
      # Freeze all gateways for this revision during upgrade (gatewayReconciliation = false)
      gatewayReconciliation:
        enabled: false
        namespaceOverrides:
        # But allow staging namespace to reconcile (for canary verification)
        - namespace: envoy-staging
          enabled: true
        # And allow the test namespace
        - namespace: envoy-test
          enabled: true
  - name: canary
    revisions:
    - name: canary-v2
      istio:
        tsbVersion: v1.8.1
      # Canary revision: reconciliation enabled (default)
Override for Individual Gateways
The label on an individual Gateway has the highest precedence, and overrides settings in the EdgeXcp CR.
You can label either the Gateway Install resource (CR gateways.install.tetrate.io) in the gateway's namespace or the Gateway Deployment resource (CR gatewaydeployments.install.xcp.tetrate.io) in the istio-system namespace.
Changes to the Gateway Install resource are propagated to the Gateway Deployment resource, and the reconciliation state is obtained from the Gateway Deployment resource.
Label the Gateway Install resource
Label the Gateway Install CR and apply the label xcp.tetrate.io/gateway-reconcile=false.
- List the Installed Gateways in the desired namespace:
  kubectl get gateways.install.tetrate.io -n bookinfo
  NAME          TYPE   REVISION   AGE
  bookinfo-gw                     6d20h
- Label the Gateway Install resource:
  kubectl label gateways.install.tetrate.io -n bookinfo bookinfo-gw xcp.tetrate.io/gateway-reconcile=false
  Use --overwrite if the label already exists.
Label the Gateway Deployment resource
Label the Gateway Deployment CR and apply the label xcp.tetrate.io/gateway-reconcile=false.
- List the Gateway Deployments:
  kubectl get gatewaydeployments.install.xcp.tetrate.io -n istio-system
  NAME                                           AGE
  unified-3791488a-1e64-4b7e-88f4-8df13a802786   5d1h
  If there are multiple Gateway Deployments, find the one you want to control by extracting the uid from the gateway's Gateway Install CR:
  kubectl get gateway.install.tetrate.io -n bookinfo bookinfo-gw -o yaml | yq ".metadata.uid"
  3791488a-1e64-4b7e-88f4-8df13a802786
- Label the Gateway Deployment for the gateway instance you want to control:
  kubectl label gatewaydeployments.install.xcp.tetrate.io -n istio-system \
    unified-3791488a-1e64-4b7e-88f4-8df13a802786 \
    xcp.tetrate.io/gateway-reconcile=false
Legacy Ingress Gateways: If you are using a deprecated legacy Ingress Gateway, label the corresponding ingressdeployments.install.xcp.tetrate.io resource.
List Gateways where Reconciliation is Paused
The status for a GatewayDeployment contains the status of Gateway Reconciliation.
List All Gateways
kubectl get gatewaydeployments -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {.status.phase}{"\n"}{end}'
istio-system/unified-3791488a-1e64-4b7e-88f4-8df13a802786: RECONCILIATION_PAUSED
istio-system/unified-556e8410-e28b-41b4-a716-4c66a5740201: READY
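When many gateways are listed, it can help to filter the output programmatically. The following is a minimal sketch that assumes the listing has exactly the `namespace/name: PHASE` shape produced by the jsonpath expression above; the function name `paused_gateways` is hypothetical and not part of any TSB tooling.

```python
def paused_gateways(listing: str) -> list[str]:
    """Return 'namespace/name' for each gateway whose phase is RECONCILIATION_PAUSED."""
    paused = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is expected to look like "<namespace>/<name>: <phase>".
        ref, sep, phase = line.rpartition(": ")
        if sep and phase == "RECONCILIATION_PAUSED":
            paused.append(ref)
    return paused


# Sample input copied from the listing above.
sample = """\
istio-system/unified-3791488a-1e64-4b7e-88f4-8df13a802786: RECONCILIATION_PAUSED
istio-system/unified-556e8410-e28b-41b4-a716-4c66a5740201: READY
"""
print(paused_gateways(sample))
```

The kubectl output can be saved to a file or piped in and passed to the function as a single string.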
Examine an Individual Gateway
Look at the status for an individual Gateway Deployment:
...
status:
  conditions:
  - lastTransitionTime: "2026-03-18T15:07:43Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: DeploymentReady
  - lastTransitionTime: "2026-03-18T15:08:05.099567468Z"
    message: 'Load balancer is ready with endpoint: k8s-bookinfo-bookinfo-10207a3f15-bc749bae0ad45e10.elb.eu-west-2.amazonaws.com'
    reason: LoadBalancerReady
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2026-03-23T16:45:51.804136132Z"
    message: 'Reconciliation paused: ObjectLabelDisabled'
    reason: ObjectLabelDisabled
    status: "True"
    type: ReconciliationPaused
  phase: RECONCILIATION_PAUSED
...
A phase of RECONCILIATION_PAUSED indicates that reconciliation is paused for this Gateway Deployment. The reason on the ReconciliationPaused condition takes one of the following values:
| Reason | Source | Action to Enable Reconciliation |
|---|---|---|
| RevisionDisabled | gatewayReconciliation.enabled: false on IstioRevision | Set gatewayReconciliation.enabled: true in EdgeXcp CR |
| NamespaceApiDisabled | namespaceOverrides entry with enabled: false | Remove or update the namespace override in EdgeXcp CR |
| ObjectLabelDisabled | xcp.tetrate.io/gateway-reconcile: false label | Remove the label or set to true |
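The precedence described in this guide (per-gateway label first, then namespace override, then the revision-level setting) can be sketched as a small function. This is an illustration of the documented ordering, not the operator's actual implementation: the function name, its parameters, and the handling of label values other than "true" or "false" (ignored here) are assumptions.

```python
from typing import Optional, Tuple


def effective_reconciliation(
    revision_enabled: bool,
    namespace_overrides: dict,
    namespace: str,
    label_value: Optional[str] = None,
) -> Tuple[bool, Optional[str]]:
    """Return (enabled, pause_reason); pause_reason is None when reconciliation is enabled."""
    # 1. The xcp.tetrate.io/gateway-reconcile label has the highest precedence.
    #    Assumption: only the literal values "true" and "false" are recognised.
    if label_value in ("true", "false"):
        enabled = label_value == "true"
        return enabled, None if enabled else "ObjectLabelDisabled"
    # 2. A namespaceOverrides entry beats the revision-level setting.
    if namespace in namespace_overrides:
        enabled = bool(namespace_overrides[namespace])
        return enabled, None if enabled else "NamespaceApiDisabled"
    # 3. Fall back to gatewayReconciliation.enabled on the revision.
    return revision_enabled, None if revision_enabled else "RevisionDisabled"


# Settings mirroring the "prod" example: frozen revision, two namespaces released.
overrides = {"envoy-staging": True, "envoy-test": True}
print(effective_reconciliation(False, overrides, "envoy-staging"))
print(effective_reconciliation(False, overrides, "envoy-prod-us-east"))
print(effective_reconciliation(False, overrides, "envoy-staging", label_value="false"))
```

Working through the three layers in this order makes it easier to predict which reason a paused gateway will report.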
Release a Gateway to be restarted
When an Upgrade is performed, gateways with gatewayReconciliation=true will be restarted where necessary. Gateways with gatewayReconciliation=false will not be restarted.
When you reach a planned Maintenance Window, you can release any paused Gateways and allow them to be restarted. Enable the paused namespaces one by one (or all at once):
gatewayReconciliation:
  enabled: false
  namespaceOverrides:
  - namespace: envoy-staging
    enabled: true
  - namespace: envoy-prod-us-east # <-- add
    enabled: true
  - namespace: envoy-prod-us-west # <-- add
    enabled: true
Or re-enable globally:
gatewayReconciliation:
  enabled: true # <-- change to true
  # namespaceOverrides no longer needed, can be removed
Verify the unpaused gateways are healthy:
kubectl get deployments -n envoy-staging
kubectl get pods -n envoy-staging
Additional Topics
Debugging with the Dry-Run Endpoint
The xcp edge pod has an internal debugging endpoint that can be queried to determine the reconcile diffs for gateways that have been paused during an upgrade.
This capability is intended for debugging only, and is not designed for production environments.
Enable the XCP Edge Debug Endpoint
The debug endpoint is disabled by default. Set the ENABLE_XCP_EDGE_DEBUG_ENDPOINTS environment variable to true and wait for the edge deployment to restart; the internal debug endpoint is then available on port 8090.
kubectl set env -n istio-system deployment/edge ENABLE_XCP_EDGE_DEBUG_ENDPOINTS=true
# wait for the edge deployment to restart
kubectl port-forward deployment/edge -n istio-system 8090:8090
Query the Internal Debug Endpoint
Query the debug endpoint as follows:
curl -s "http://localhost:8090/debug/gateway-reconcile-diff?namespace=bookinfo"
curl -s "http://localhost:8090/debug/gateway-reconcile-diff?namespace=bookinfo&name=bookinfo-gw"
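If you script these queries, building the URL with proper query-string encoding avoids quoting mistakes. A minimal sketch follows; the helper name `diff_url` is hypothetical, and the base URL assumes the port-forward shown earlier is active.

```python
from typing import Optional
from urllib.parse import urlencode


def diff_url(namespace: str, name: Optional[str] = None,
             base: str = "http://localhost:8090") -> str:
    """Build the dry-run URL for a whole namespace or for a single gateway."""
    params = {"namespace": namespace}
    if name:
        # Narrow the query to one gateway when a name is supplied.
        params["name"] = name
    return f"{base}/debug/gateway-reconcile-diff?{urlencode(params)}"


print(diff_url("bookinfo"))
print(diff_url("bookinfo", "bookinfo-gw"))
```

The two printed URLs correspond to the two curl commands above.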
The response for a Gateway Deployment indicates whether the gateway needs to be restarted, and provides a reason:
{
  "gateways": [
    {
      "name": "bookinfo-gw",
      "namespace": "bookinfo",
      "kind": "GatewayDeployment",
      "revision": "stable",
      "reconcileEnabled": false,
      "reconcileDisabledReason": "revision_api_disabled",
      "diff": {
        "deployment": {
          "hasChanges": true,
          "summary": "Container image tag changed: 1.24.2 -> 1.24.4",
          "willCauseRestart": true,
          "patch": "--- current\n+++ desired\n@@ -5 +5 @@\n- image: istio/proxyv2:1.24.2\n+ image: istio/proxyv2:1.24.4"
        },
        "service": { "hasChanges": false },
        "serviceAccount": { "hasChanges": false },
        "hpa": { "hasChanges": false }
      }
    }
  ],
  "summary": {
    "total": 47,
    "withChanges": 3,
    "willCauseRestart": 2,
    "paused": 45,
    "enabled": 2
  }
}
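To turn a response like this into a restart checklist, you can extract the gateways whose deployment diff reports willCauseRestart. The sketch below assumes the response has the field names shown in the sample above; the function name `gateways_needing_restart` is hypothetical.

```python
import json


def gateways_needing_restart(response_text: str) -> list[dict]:
    """List gateways whose pending deployment change would restart their pods."""
    report = json.loads(response_text)
    pending = []
    for gw in report.get("gateways", []):
        deployment = gw.get("diff", {}).get("deployment", {})
        if deployment.get("willCauseRestart"):
            pending.append({
                "gateway": f'{gw["namespace"]}/{gw["name"]}',
                # A paused gateway keeps running the old spec until it is released.
                "paused": not gw.get("reconcileEnabled", True),
                "summary": deployment.get("summary", ""),
            })
    return pending


# Abbreviated sample in the same shape as the response above.
sample_response = """
{"gateways": [{"name": "bookinfo-gw", "namespace": "bookinfo",
  "reconcileEnabled": false,
  "diff": {"deployment": {"hasChanges": true, "willCauseRestart": true,
           "summary": "Container image tag changed: 1.24.2 -> 1.24.4"},
           "service": {"hasChanges": false}}}]}
"""
for entry in gateways_needing_restart(sample_response):
    print(entry)
```

Reviewing this list before changing gatewayReconciliation.enabled shows which gateways will roll as soon as they are released.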
Metrics and Observability
Monitor the status of gateway reconciliation.
| Metric | Type | Labels | Description |
|---|---|---|---|
| gateway_reconcile_skipped_total | Counter | gateway_type, reason, namespace, name | Incremented each time a gateway reconciliation is skipped because gateway reconciliation is disabled. reason values: object_label_disabled, namespace_api_disabled, revision_api_disabled |
| gateway_reconcile_paused | Gauge | gateway_type, namespace, name | Set to 1 when a gateway is paused, 0 when active. Enables alerting on "how many gateways are frozen" and kubectl wait-style monitoring |
| gateway_reconcile_drift_detected | Counter | gateway_type, namespace, name | Incremented when the dry-run endpoint detects a diff between desired and actual state (post-1.14) |
| gateway_deletion_protected_total | Counter | gateway_type, namespace, name | Incremented when gateway deletion is blocked by lifecycle protection (post-1.14) |
For example, to list the Gateways where reconciliation is paused:
kubectl port-forward -n istio-system service/xcp-operator-edge 8084:8080
curl -s http://localhost:8084/metrics | grep gateway_reconcile_paused
# HELP xcp_edge_operator_gateway_reconcile_paused Whether gateway reconciliation is currently paused (1=paused, 0=active)
# TYPE xcp_edge_operator_gateway_reconcile_paused gauge
xcp_edge_operator_gateway_reconcile_paused{gateway_name="unified-3791488a-1e64-4b7e-88f4-8df13a802786",gateway_namespace="istio-system",gateway_type="unified_gateway"} 1
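The same check can be scripted against the metrics text. The sketch below parses the exposition lines and returns the gateways whose gauge is 1. It assumes the label names gateway_name and gateway_namespace seen in the sample output above, and matches any metric name ending in gateway_reconcile_paused so that the prefix does not matter; the function name is hypothetical.

```python
import re

# One sample line: metric{labels} value [optional timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def paused_from_metrics(text: str) -> list[str]:
    """Return 'namespace/name' for each gateway whose paused gauge equals 1."""
    paused = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or not match.group("metric").endswith("gateway_reconcile_paused"):
            continue
        if float(match.group("value")) != 1:
            continue  # 0 means reconciliation is active
        labels = dict(LABEL.findall(match.group("labels")))
        paused.append(f'{labels.get("gateway_namespace")}/{labels.get("gateway_name")}')
    return paused


metrics_text = """\
# TYPE xcp_edge_operator_gateway_reconcile_paused gauge
xcp_edge_operator_gateway_reconcile_paused{gateway_name="unified-3791488a-1e64-4b7e-88f4-8df13a802786",gateway_namespace="istio-system",gateway_type="unified_gateway"} 1
xcp_edge_operator_gateway_reconcile_paused{gateway_name="unified-556e8410-e28b-41b4-a716-4c66a5740201",gateway_namespace="istio-system",gateway_type="unified_gateway"} 0
"""
print(paused_from_metrics(metrics_text))
```

The second sample line (value 0) is illustrative and was added to show that active gateways are filtered out.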
The gateway_ignored_total gauge gets a new reason value to distinguish user-intentional skips from automatic revision mismatch skips:
const (
    revisionMismatchReason    = "revision_mismatch"  // existing
    gwReconcileDisabledReason = "reconcile_disabled" // NEW
)
Gateway Reconciliation Alert Rules
| Alert | PromQL | For | Severity | Description |
|---|---|---|---|---|
| GatewayReconcilePausedTooLong | gateway_reconcile_paused == 1 | 7d | Warning | Gateway reconciliation has been paused for over 7 days. This may indicate a forgotten freeze after an upgrade. |
| AllGatewaysFrozen | count(gateway_reconcile_paused == 1) == count(gateway_reconcile_paused) | 1h | Warning | All gateways have reconciliation paused. No gateways are being actively managed. |
| GatewayReconcileSkippedRate | rate(gateway_reconcile_skipped_total[5m]) > 0 | 1h | Info | Gateway reconciliation is being actively skipped. This is expected during maintenance windows. |
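One way to use this table is to render it as a Prometheus alerting rule file. The sketch below emits the standard rule-file layout (groups, rules, alert, expr, for, labels, annotations). The group name, the lowercase severity label values, and the helper name `render_rules` are assumptions. The expressions are copied verbatim from the table; because the sample metrics output earlier shows the exported name with an xcp_edge_operator_ prefix, the metric names may need adjusting for your environment.

```python
import json

# (alert, expr, for, severity, description) rows taken from the table above.
ALERTS = [
    ("GatewayReconcilePausedTooLong", "gateway_reconcile_paused == 1", "7d", "warning",
     "Gateway reconciliation has been paused for over 7 days."),
    ("AllGatewaysFrozen",
     "count(gateway_reconcile_paused == 1) == count(gateway_reconcile_paused)", "1h", "warning",
     "All gateways have reconciliation paused."),
    ("GatewayReconcileSkippedRate", "rate(gateway_reconcile_skipped_total[5m]) > 0", "1h", "info",
     "Gateway reconciliation is being actively skipped."),
]


def render_rules(alerts, group: str = "gateway-reconciliation") -> str:
    """Render alert rows as Prometheus rule-file YAML text."""
    lines = ["groups:", f"- name: {group}", "  rules:"]
    for name, expr, duration, severity, description in alerts:
        lines += [
            f"  - alert: {name}",
            # json.dumps yields a double-quoted string, which is also valid YAML.
            f"    expr: {json.dumps(expr)}",
            f"    for: {duration}",
            "    labels:",
            f"      severity: {severity}",
            "    annotations:",
            f"      description: {json.dumps(description)}",
        ]
    return "\n".join(lines) + "\n"


print(render_rules(ALERTS))
```

Generating the file from a single list keeps the alert definitions and this table easy to compare when either changes.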