
Priority-based Multi-cluster Traffic Routing and Failover

When designing a resilient and efficient system architecture, you often need to decide whether to keep traffic within the local cluster and nodes or distribute it across clusters in different regions or fault domains. This decision heavily impacts the system's reliability, latency, and fault tolerance.

This document describes how failoverSettings can be configured for multi-cluster traffic routing and failover using topologyChoice and failoverPriority settings.

What is the use-case?

Here are some of the common scenarios that can be addressed using failoverSettings:

Local Traffic Management

  • Keeping the traffic within the local cluster and nodes ensures low latency and high throughput as communication occurs within the same network boundary.
  • It is suitable for scenarios where services communicate frequently and require fast data transfer without incurring the overhead of additional network hops.
  • Service-to-service communication can be kept local to the cluster (within nodes) or allowed across different availability zones for a specific set of services, using Workspace Settings or Organization Settings.

Global Traffic Distribution

  • Distributing traffic across different clusters based on regions or fault domains enhances fault tolerance and resilience against regional failures.
  • It allows for load balancing and scaling services based on geographical proximity to users, optimizing latency and providing better user experience.
  • Active-Active and Active-Standby endpoint groups define how traffic is managed across multiple clusters for failover scenarios.
  • Active-Active configurations distribute traffic evenly across multiple instances or clusters, while Active-Standby setups designate primary and secondary instances for failover.
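As a concrete illustration, fault domains are usually expressed as labels on the gateway Service objects. A minimal sketch (the label key and values below are the ones used later in this guide; the Service name is illustrative):

```yaml
# Sketch: mark a cluster's Tier2 gateway Service as part of the
# "active-active" fault domain; standby clusters would instead carry
# failover.tetrate.io/fault-domain: active-standby
apiVersion: v1
kind: Service
metadata:
  name: bookinfo-gateway   # illustrative Service name
  labels:
    failover.tetrate.io/fault-domain: active-active
```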

How to Configure FailoverSettings?

In this guide, you are going to:

✓ Deploy the bookinfo application distributed across multiple application clusters, with an IngressGateway deployed on each.
✓ Deploy a UnifiedGateway in one of the clusters as a Tier1 gateway to distribute traffic to the application clusters configured as Tier2.
✓ Configure FailoverSettings with different configuration options to load balance and fail over services within the same availability zone as well as across multiple regions.

Before you get started, make sure:

✓ TSB is up and running, and GitOps has been enabled for the target cluster
✓ You are familiar with TSB concepts
✓ You have completed the TSB usage quickstart. This document assumes you are familiar with Tenants, Workspaces, and Config Groups.

Scenario 1: Local Traffic Management Using TopologyChoice

Starting from TSB version 1.9.0, topologyChoice can be used under WorkspaceSettings for a set of application workloads, or as an organization-wide setting under OrganizationSettings, to configure traffic management.

apiVersion: tsb.tetrate.io/v2
kind: WorkspaceSetting
metadata:
  name: workspace-setting
  annotations:
    tsb.tetrate.io/organization: <organization_name>
    tsb.tetrate.io/tenant: <tenant_name>
    tsb.tetrate.io/workspace: <workspace_name>
spec:
  failoverSettings:
    topologyChoice: CLUSTER # or LOCALITY

There are two options:

  • CLUSTER: Gives higher priority to endpoints that are local to the cluster, using the failoverPriority labels configured for each proxy endpoint.
  • LOCALITY: Gives higher priority based on locality (region/zone/subzone) rather than to endpoints that are local to the cluster.
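The same choice can also be made organization-wide. A minimal sketch of an OrganizationSetting carrying the identical failoverSettings block (the resource name here is illustrative):

```yaml
apiVersion: tsb.tetrate.io/v2
kind: OrganizationSetting
metadata:
  name: org-setting   # illustrative name
  annotations:
    tsb.tetrate.io/organization: <organization_name>
spec:
  failoverSettings:
    topologyChoice: CLUSTER # or LOCALITY
```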

Configuration

In this example, we are going to expose only the reviews-v3 service in both cp-cluster-1 and cp-cluster-2 for EastWest routing, and observe how the load-balancing behaviour changes based on the topologyChoice configuration.

Deploy Bookinfo application

Create the namespace payment in both cp-cluster-1 and cp-cluster-2 and label it with istio-injection=enabled.

kubectl create namespace payment
kubectl label namespace payment istio-injection=enabled --overwrite=true
kubectl apply -f https://docs.tetrate.io/examples/flagger/bookinfo.yaml -n payment

Create TSB configurations Tenant/Workspace/Group for bookinfo

bookinfo.yaml
apiVersion: tsb.tetrate.io/v2
kind: Tenant
metadata:
  name: payment
  annotations:
    tsb.tetrate.io/organization: tetrate
spec:
  displayName: Payment
---
apiVersion: tsb.tetrate.io/v2
kind: Workspace
metadata:
  name: payment-ws
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
spec:
  namespaceSelector:
    names:
    - "cp-cluster-1/payment"
    - "cp-cluster-2/payment"
  displayName: payment-ws

Apply configuration using kubectl

kubectl apply -f bookinfo.yaml -n payment

Create Bookinfo IngressGateway Configuration

apiVersion: gateway.tsb.tetrate.io/v2
kind: Group
metadata:
  name: bookinfo-gg
  namespace: bookinfo
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
spec:
  displayName: Bookinfo Gateway Group
  namespaceSelector:
    names:
    - "cp-cluster-1/payment"
    - "cp-cluster-2/payment"
  configMode: BRIDGED
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  name: bookinfo-ingress-gateway
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
    tsb.tetrate.io/gatewayGroup: bookinfo-gg
spec:
  displayName: Bookinfo Ingress
  workloadSelector:
    namespace: payment
    labels:
      app: bookinfo-gateway
  http:
  - hostname: bookinfo.tetrate.io
    name: bookinfo-tetrate
    port: 80
    routing:
      rules:
      - route:
          serviceDestination:
            host: payment/productpage.payment.svc.cluster.local
            port: 9080

Apply configuration using kubectl

kubectl apply -f bookinfo.yaml -n payment

Deploy UnifiedGateway as Tier1 Gateway in cp-cluster-3 to distribute the traffic to cp-cluster-1 and cp-cluster-2

gateway-config.yaml
apiVersion: tsb.tetrate.io/v2
kind: Tenant
metadata:
  name: tier1
  annotations:
    tsb.tetrate.io/organization: tetrate
spec:
  displayName: Tier1
---
apiVersion: tsb.tetrate.io/v2
kind: Workspace
metadata:
  name: tier1-ws
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
spec:
  displayName: Tier1
  namespaceSelector:
    names:
    - "cp-cluster-3/tier1"
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Group
metadata:
  name: tier1-gg
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
    tsb.tetrate.io/workspace: tier1-ws
spec:
  displayName: Tier1 Gateway Group
  namespaceSelector:
    names:
    - "cp-cluster-3/tier1"
  configMode: BRIDGED
---
apiVersion: install.tetrate.io/v1alpha1
kind: Gateway
metadata:
  name: tier1-gateway
  namespace: tier1
spec:
  type: INGRESS
  kubeSpec:
    service:
      type: LoadBalancer
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  name: t1-gateway
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
    tsb.tetrate.io/workspace: tier1-ws
    tsb.tetrate.io/gatewayGroup: tier1-gg
spec:
  displayName: T1 Gateway
  workloadSelector:
    namespace: tier1
    labels:
      app: tier1-gateway
  http:
  - hostname: bookinfo.tetrate.io
    port: 80
    name: bookinfo-tetrate
    routing:
      rules:
      - route:
          clusterDestination: {}

Apply configuration using kubectl

kubectl apply -f gateway-config.yaml -n tier1

TopologyChoice as CLUSTER

Configure workspace settings with topologyChoice as CLUSTER first.

apiVersion: tsb.tetrate.io/v2
kind: WorkspaceSetting
metadata:
  name: payment-wss
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
spec:
  failoverSettings:
    topologyChoice: CLUSTER
  defaultEastWestGatewaySettings:
  - workloadSelector:
      namespace: payment
      labels:
        app: bookinfo-gateway
    exposedServices:
    - serviceLabels:
        failover: enable
        app: reviews

Apply configuration using kubectl

kubectl apply -f payment-wss.yaml -n payment

Send some requests to Tier1 GW service endpoint in cp-cluster-3 and observe traffic in topology UI.

Export Tier1 Gateway Service IP

export GATEWAY_IP=$(kubectl -n tier1 get service tier1-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Send Request

curl http://bookinfo.tetrate.io/api/v1/products/1/reviews --resolve "bookinfo.tetrate.io:80:$GATEWAY_IP" -v

Topology

As you can see, traffic from productpage to reviews-v3 remains within the cluster, even though the EastWest Gateway is enabled and remote endpoints are available in the other cluster.

TopologyChoice CLUSTER

TopologyChoice as LOCALITY

Configure workspace settings with topologyChoice as LOCALITY this time.

apiVersion: tsb.tetrate.io/v2
kind: WorkspaceSetting
metadata:
  name: payment-wss
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
spec:
  failoverSettings:
    topologyChoice: LOCALITY
  defaultEastWestGatewaySettings:
  - workloadSelector:
      namespace: payment
      labels:
        app: bookinfo-gateway
    exposedServices:
    - serviceLabels:
        failover: enable
        app: reviews

Apply configuration using kubectl

kubectl apply -f payment-wss.yaml -n payment

Send some requests to Tier1 GW service endpoint in cp-cluster-3 and observe traffic in topology UI.

Export Tier1 Gateway Service IP

export GATEWAY_IP=$(kubectl -n tier1 get service tier1-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Send Request

curl http://bookinfo.tetrate.io/api/v1/products/1/reviews --resolve "bookinfo.tetrate.io:80:$GATEWAY_IP" -v

Topology

As you can see, traffic from productpage to reviews-v3 is distributed across both clusters.

TopologyChoice LOCALITY

Failover

When reviews-v3 becomes unavailable in either cluster, traffic fails over to the other cluster, irrespective of the configured topologyChoice value.

TopologyChoice FAILOVER

Scenario 2: Global Traffic Distribution Using FailoverPriority

Starting from TSB version 1.9.0, failoverPriority can be used under WorkspaceSettings for a set of application workloads, or as an organization-wide setting under OrganizationSettings, to configure global traffic distribution.

What is the use-case?

In a multi-cluster scenario where a Tier1 Gateway routes traffic to multiple Tier2 clusters (cp-cluster-1, cp-cluster-2, and cp-cluster-3) provisioned across different regions, you can use TSB-specific labels prefixed with failover.tetrate.io/ to configure fault domains or groupings. This allows you to define active-active and active-standby sets of Tier2 Gateway endpoints, ensuring fault tolerance and resilience against regional failures.

In this setup, cp-cluster-1 and cp-cluster-2 are provisioned in one region, i.e. us-west-1, and cp-cluster-3 in a different region, i.e. us-east-1.

Deploy Bookinfo application

Create the namespace payment in cp-cluster-1, cp-cluster-2, and cp-cluster-3 and label it with istio-injection=enabled.

kubectl create namespace payment
kubectl label namespace payment istio-injection=enabled --overwrite=true
kubectl apply -f https://docs.tetrate.io/examples/flagger/bookinfo.yaml -n payment

Create TSB configurations Tenant/Workspace/Group for bookinfo

bookinfo.yaml
apiVersion: tsb.tetrate.io/v2
kind: Tenant
metadata:
  name: payment
  annotations:
    tsb.tetrate.io/organization: tetrate
spec:
  displayName: Payment
---
apiVersion: tsb.tetrate.io/v2
kind: Workspace
metadata:
  name: payment-ws
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
spec:
  namespaceSelector:
    names:
    - "cp-cluster-1/payment"
    - "cp-cluster-2/payment"
    - "cp-cluster-3/payment"
  displayName: payment-ws

Apply configuration using kubectl

kubectl apply -f bookinfo.yaml -n payment

Create Bookinfo IngressGateway Configuration

apiVersion: gateway.tsb.tetrate.io/v2
kind: Group
metadata:
  name: bookinfo-gg
  namespace: bookinfo
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
spec:
  displayName: Bookinfo Gateway Group
  namespaceSelector:
    names:
    - "cp-cluster-1/payment"
    - "cp-cluster-2/payment"
    - "cp-cluster-3/payment"
  configMode: BRIDGED
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  name: bookinfo-ingress-gateway
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: payment
    tsb.tetrate.io/workspace: payment-ws
    tsb.tetrate.io/gatewayGroup: bookinfo-gg
spec:
  displayName: Bookinfo Ingress
  workloadSelector:
    namespace: payment
    labels:
      app: bookinfo-gateway
  http:
  - hostname: bookinfo.tetrate.io
    name: bookinfo-tetrate
    port: 80
    routing:
      rules:
      - route:
          serviceDestination:
            host: payment/productpage.payment.svc.cluster.local
            port: 9080

Apply configuration using kubectl

kubectl apply -f bookinfo.yaml -n payment

Create UnifiedGateway as Tier1 Gateway to distribute the traffic to Tier2 Gateways deployed in cp-cluster-1, cp-cluster-2 & cp-cluster-3

gateway-config.yaml
apiVersion: tsb.tetrate.io/v2
kind: Tenant
metadata:
  name: tier1
  annotations:
    tsb.tetrate.io/organization: tetrate
spec:
  displayName: Tier1
---
apiVersion: tsb.tetrate.io/v2
kind: Workspace
metadata:
  name: tier1-ws
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
spec:
  displayName: Tier1
  namespaceSelector:
    names:
    - "cp-cluster-3/tier1"
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Group
metadata:
  name: tier1-gg
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
    tsb.tetrate.io/workspace: tier1-ws
spec:
  displayName: Tier1 Gateway Group
  namespaceSelector:
    names:
    - "cp-cluster-3/tier1"
  configMode: BRIDGED
---
apiVersion: install.tetrate.io/v1alpha1
kind: Tier1Gateway
metadata:
  name: tier1-gateway
  namespace: tier1
spec:
  type: INGRESS
  kubeSpec:
    service:
      type: LoadBalancer
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  name: t1-gateway
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
    tsb.tetrate.io/workspace: tier1-ws
    tsb.tetrate.io/gatewayGroup: tier1-gg
spec:
  displayName: T1 Gateway
  workloadSelector:
    namespace: tier1
    labels:
      app: tier1-gateway
  http:
  - hostname: bookinfo.tetrate.io
    port: 80
    name: bookinfo-tetrate
    routing:
      rules:
      - route:
          clusterDestination: {}

Apply configuration using kubectl

kubectl apply -f gateway-config.yaml -n tier1

How FailoverPriority Labels are Configured

To configure failover priority labels for the Tier2 Gateway proxy endpoints, label the Bookinfo Tier2 Gateway services as shown below.

cp-cluster-1 & cp-cluster-2: active-active

apiVersion: v1
kind: Service
metadata:
  labels:
    app: bookinfo-gateway
    failover.tetrate.io/fault-domain: active-active

cp-cluster-3: active-standby

apiVersion: v1
kind: Service
metadata:
  labels:
    app: bookinfo-gateway
    failover.tetrate.io/fault-domain: active-standby
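If the gateway Services are already deployed, the fault-domain label can also be added in place rather than by editing the manifests. A sketch, assuming the Tier2 gateway Service is named bookinfo-gateway in the payment namespace (run against each cluster with the appropriate value):

```shell
# Add (or overwrite) the fault-domain label on an existing Service
kubectl -n payment label service bookinfo-gateway \
  failover.tetrate.io/fault-domain=active-active --overwrite
```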

Default Behaviour

By default, TSB configures topologyChoice: CLUSTER, which keeps traffic within the same nodes/cluster based on the locality of the originating client.

Topology

As you can see, traffic from the Tier1 Gateway to the Tier2 Gateway and on to productpage remains within the same cluster, i.e. cp-cluster-1.

FailoverPriority Default

Configure FailoverPriority Settings

Here we configure a custom failoverPriority label using the TSB-specific prefix, i.e. failover.tetrate.io/fault-domain.

tier1-wss.yaml
apiVersion: tsb.tetrate.io/v2
kind: WorkspaceSetting
metadata:
  name: tier1-wss
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tier1
    tsb.tetrate.io/workspace: tier1-ws
spec:
  failoverSettings:
    failoverPriority:
    - failover.tetrate.io/fault-domain

Apply using kubectl

kubectl apply -f tier1-wss.yaml -n tier1
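failoverPriority can likewise be set organization-wide instead of per workspace. A minimal OrganizationSetting sketch (the resource name here is illustrative):

```yaml
apiVersion: tsb.tetrate.io/v2
kind: OrganizationSetting
metadata:
  name: org-setting   # illustrative name
  annotations:
    tsb.tetrate.io/organization: tetrate
spec:
  failoverSettings:
    failoverPriority:
    - failover.tetrate.io/fault-domain
```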

Topology

As you can see, traffic is now distributed across all Tier2 Gateway endpoints irrespective of their locality. This happens because we have not configured failover.tetrate.io labels on the client proxy (the Tier1 Gateway in cp-cluster-1), so it gives equal priority to all endpoints across cp-cluster-1, cp-cluster-2, and cp-cluster-3.

FailoverPriority Default

Apply Custom Failover Priority Labels at the Client Workloads

Modify the tier1-gateway workload and add the following label. This causes Istio/Envoy to prefer only those endpoints that carry the same fault-domain label, i.e. failover.tetrate.io/fault-domain: active-active.

tier1-gateway-patch.yaml
apiVersion: install.tetrate.io/v1alpha1
kind: Gateway
metadata:
  name: tier1-gateway
  namespace: tier1
spec:
  type: INGRESS
  kubeSpec:
    service:
      type: LoadBalancer
    overlays:
    - apiVersion: apps/v1
      kind: Deployment
      name: tier1-gateway
      patches:
      - path: spec.template.metadata.labels.failover\.tetrate\.io/fault-domain
        value: active-active

Apply using kubectl

kubectl apply -f tier1-gateway-patch.yaml -n tier1

Topology

As you can see, traffic is now limited to cp-cluster-1 and cp-cluster-2, whose Tier2 Gateway endpoints match the failover.tetrate.io/fault-domain: active-active label.

FailoverPriority Custom

Trigger Failover

Now we are going to trigger failover within the active-active domain and then to the active-standby domain.

Failover within active-active domain

Scale down Tier2 Gateway workloads in cp-cluster-1

kubectl scale --replicas=0 deployment bookinfo-gateway -n payment

Topology

As you can see, traffic has now failed over to cp-cluster-2, as the Tier2 Gateway endpoints in cp-cluster-2 match the client (Tier1 Gateway) label, i.e. failover.tetrate.io/fault-domain: active-active.

Failover within active-active

Failover from active-active to active-standby

Scale down Tier2 Gateway workloads in cp-cluster-2

kubectl scale --replicas=0 deployment bookinfo-gateway -n payment

Topology

As you can see, traffic has now failed over to cp-cluster-3 based on the failover priority label failover.tetrate.io/fault-domain: active-standby.

Failover to active-standby
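To restore the original state after the failover tests, scale the Tier2 Gateway deployments back up in cp-cluster-1 and cp-cluster-2 (this assumes they originally ran a single replica; adjust the count to your deployment):

```shell
kubectl scale --replicas=1 deployment bookinfo-gateway -n payment
```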