
Canary Progressive Delivery Using Flagger

In this document, you will learn how to use Flagger to automate canary deployments with TSB. Flagger is a Kubernetes operator that automates the promotion of canary deployments and supports multiple ingress controllers such as Istio, NGINX, and Gateway API. Flagger supports several metrics providers for canary analysis, such as Prometheus, Datadog, and SkyWalking. In the following example, we will use SkyWalking as the metrics provider. SkyWalking provides a PromQL service that exposes Prometheus-compatible PromQL query APIs, which Flagger can use for canary analysis. If you want to use Prometheus as the metrics provider, follow the official Flagger documentation.

Before you get started, make sure:
✓ TSB is up and running, and GitOps has been enabled for the target cluster
✓ Flagger is installed in the target cluster. You can install Flagger by following the official Flagger documentation

GitOps deployment

Please refer to Configuring Flux CD for GitOps to understand how to use Flux to deploy your applications. This document focuses only on how to use Flagger to automate canary deployments with TSB.

Deploy Application

First, let's deploy a sample application to the target cluster. In this example, we will deploy a Bookinfo application.

Bookinfo Reviews

The bookinfo deployment below is modified to only use v1 of the reviews service. This is because we will use v2 of the reviews service as the canary deployment.

The reviews service is also removed, because Flagger automatically generates the reviews service for the canary deployment. See the Flagger documentation for more information on what Flagger generates.

kubectl create namespace bookinfo
kubectl label namespace bookinfo istio-injection=enabled
kubectl apply -f https://docs.tetrate.io/examples/flagger/bookinfo.yaml -n bookinfo

Deploy the traffic generator, which sends continuous traffic to the bookinfo application.

kubectl apply -f https://docs.tetrate.io/examples/flagger/traffic-gen.yaml

To view the topology, go to the TSB UI.

Apply TSB and Flagger Canary configuration

Next, we will apply the TSB configuration for the bookinfo application. This assumes you already have a Tenant named tetrate. If you don't have a Tenant, you can create one by following the TSB quickstart.

Create a file named bookinfo-tsb.yaml with the following content:

cat <<EOF > bookinfo-tsb.yaml
apiVersion: v1
kind: List
items:
  - apiVersion: tsb.tetrate.io/v2
    kind: Workspace
    metadata:
      name: bookinfo-ws
      namespace: bookinfo
      annotations:
        tsb.tetrate.io/organization: tetrate
        tsb.tetrate.io/tenant: tetrate
    spec:
      namespaceSelector:
        names:
          - "*/bookinfo"
  - apiVersion: traffic.tsb.tetrate.io/v2
    kind: Group
    metadata:
      name: bookinfo-tg
      namespace: bookinfo
      annotations:
        tsb.tetrate.io/organization: tetrate
        tsb.tetrate.io/tenant: tetrate
        tsb.tetrate.io/workspace: bookinfo-ws
    spec:
      configMode: BRIDGED
      namespaceSelector:
        names:
          - "*/bookinfo"
EOF

and apply the configuration:

kubectl apply -f bookinfo-tsb.yaml

Next, create and apply the canary configuration for the bookinfo application.

cat <<EOF > reviews-canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: reviews-rollout
  namespace: bookinfo
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews
  service:
    port: 9080
    # delegation: true is REQUIRED when using Flagger with TSB
    delegation: true
  analysis:
    # schedule interval
    interval: 5m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary (0-100)
    maxWeight: 50
    # canary increment step percentage (0-100)
    stepWeight: 10
EOF

kubectl apply -f reviews-canary.yaml

With this configuration, Flagger increases traffic to the canary by 10% every 5 minutes until it reaches 50%. It then waits a further 5 minutes before promoting the canary deployment to primary. The threshold on failed metric checks has no effect yet, because no metrics analysis is configured.
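The timing implied by these settings can be estimated with a little arithmetic. The sketch below is only a rough guide: it assumes the rule of thumb from the Flagger documentation (minimum promotion time is interval × (maxWeight / stepWeight), rollback time is interval × threshold). The function names are hypothetical helpers, not part of Flagger or TSB, and a real rollout takes longer because of scale-up and promotion.

```shell
# Rough timing estimate for a Flagger canary analysis.
# Assumption: formulas follow the rule of thumb in the Flagger documentation.

# Minimum minutes needed to step the canary up to maxWeight.
# Usage: min_promotion_minutes <interval_min> <max_weight> <step_weight>
min_promotion_minutes() {
  echo $(( $1 * ($2 / $3) ))
}

# Minimum minutes before a continuously failing canary is rolled back.
# Usage: rollback_minutes <interval_min> <threshold>
rollback_minutes() {
  echo $(( $1 * $2 ))
}

# Print each canary weight that would be set on the way to maxWeight.
# Usage: weight_steps <max_weight> <step_weight>
weight_steps() {
  w=$2
  while [ "$w" -le "$1" ]; do
    echo "$w"
    w=$(( w + $2 ))
  done
}

min_promotion_minutes 5 50 10   # prints 25
rollback_minutes 5 5            # prints 25
weight_steps 50 10              # prints 10, 20, 30, 40, 50 (one per line)
```

With interval: 5m, maxWeight: 50, and stepWeight: 10, the canary passes through five weight steps, so at least 25 minutes elapse before promotion can start.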

VirtualService delegation

One important detail is the delegation: true field in the canary configuration. This field is required when using Flagger with TSB. When delegation is enabled, Flagger generates an Istio VirtualService without hosts and gateways, which makes it compatible with Istio VirtualService delegation. This is required to avoid conflicts with the VirtualService generated by TSB.

Then create and apply a TSB ServiceRoute that tells TSB to use the Flagger-generated services for the reviews service.

cat <<EOF > bookinfo-reviews-sr.yaml
apiVersion: traffic.tsb.tetrate.io/v2
kind: ServiceRoute
metadata:
  name: reviews-sr
  namespace: bookinfo
  annotations:
    tsb.tetrate.io/organization: tetrate
    tsb.tetrate.io/tenant: tetrate
    tsb.tetrate.io/workspace: bookinfo-ws
    tsb.tetrate.io/trafficGroup: bookinfo-tg
spec:
  service: bookinfo/reviews.bookinfo.svc.cluster.local
  portLevelSettings:
    - port: 9080
      trafficType: HTTP
  httpRoutes:
    - name: reviews-flagger
      match:
        - name: port-9080
          port: 9080
      flagger:
        canary: reviews-rollout
        namespace: bookinfo
EOF

kubectl apply -f bookinfo-reviews-sr.yaml

At this point, you are ready to trigger the canary deployment.

Trigger Canary Deployment

Trigger the canary deployment by updating the reviews deployment image.

kubectl set image -n bookinfo deployment/reviews reviews=docker.io/istio/examples-bookinfo-reviews-v2:1.18.0

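You can check the canary status by describing the Flagger canary object. The sample output below was captured with the analysis interval set to 1m, so its timings differ from the 5m configuration above.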

kubectl describe canary reviews-rollout -n bookinfo
Name:         reviews-rollout
Namespace:    bookinfo
API Version:  flagger.app/v1beta1
Kind:         Canary
Metadata:
  Creation Timestamp:  2024-02-27T08:15:56Z
  Generation:          1
  Resource Version:    2110146
  UID:                 48b41d3f-a48b-46cc-ae42-9669fc32d5a5
Spec:
  Analysis:
    Interval:     1m
    Max Weight:   50
    Step Weight:  10
    Threshold:    5
  Service:
    Delegation:  true
    Port:        9080
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         reviews
Status:
  Canary Weight:  10
  Conditions:
    Last Transition Time:  2024-02-28T04:23:58Z
    Last Update Time:      2024-02-28T04:23:58Z
    Message:               New revision detected, progressing canary analysis.
    Reason:                Progressing
    Status:                Unknown
    Type:                  Promoted
  Failed Checks:         0
  Iterations:            0
  Last Applied Spec:     7456477c8d
  Last Promoted Spec:    fd4676cc5
  Last Transition Time:  2024-02-28T04:24:58Z
  Phase:                 Progressing
  Tracked Configs:
Events:
  Type    Reason  Age                  From     Message
  ----    ------  ----                 ----     -------
  Normal  Synced  3m27s (x3 over 15h)  flagger  New revision detected! Scaling up reviews.bookinfo
  Normal  Synced  2m27s (x3 over 15h)  flagger  Starting canary analysis for reviews.bookinfo
  Normal  Synced  2m27s (x3 over 15h)  flagger  Advance reviews-rollout.bookinfo canary weight 10
  Normal  Synced  87s (x3 over 15h)    flagger  Advance reviews-rollout.bookinfo canary weight 20
  Normal  Synced  27s (x3 over 15h)    flagger  Advance reviews-rollout.bookinfo canary weight 30
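For scripting, reading individual status fields is usually easier than parsing the describe output. The following is a minimal sketch, not an official procedure. It assumes that kubectl is configured for the target cluster, that the JSON field names are the lower-camel-case forms of the Phase, Canary Weight, and Failed Checks entries shown above, and that Flagger reports the phase Succeeded after a successful promotion. All function names are hypothetical.

```shell
# Sketch: poll a Flagger Canary until the rollout finishes.
# Assumptions: status fields are .status.phase, .status.canaryWeight and
# .status.failedChecks; final phases are "Succeeded" and "Failed".

# Pure helper: succeed when the phase means the rollout has finished.
is_final_phase() {
  case "$1" in
    Succeeded|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Print "<phase> <canaryWeight> <failedChecks>" for a Canary.
# Usage: canary_status <namespace> <name>
canary_status() {
  kubectl -n "$1" get canary "$2" \
    -o jsonpath='{.status.phase} {.status.canaryWeight} {.status.failedChecks}'
}

# Poll every 30 seconds; return 0 on Succeeded, 1 on Failed,
# 2 if kubectl itself fails.
# Usage: wait_for_canary <namespace> <name>
wait_for_canary() {
  while :; do
    status=$(canary_status "$1" "$2") || return 2
    echo "$status"
    phase=${status%% *}   # first word of the status line
    if is_final_phase "$phase"; then
      if [ "$phase" = "Succeeded" ]; then return 0; fi
      return 1
    fi
    sleep 30
  done
}

# Usage: wait_for_canary bookinfo reviews-rollout
```

Keeping the phase check in a separate helper makes the loop easy to adapt, for example to a CI job that should fail when the canary is rolled back.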

You can also see the VirtualService generated by Flagger, in which the traffic split between primary and canary shifts incrementally.

kubectl get virtualservice -n bookinfo reviews -o yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    helm.toolkit.fluxcd.io/driftDetection: disabled
    kustomize.toolkit.fluxcd.io/reconcile: disabled
  creationTimestamp: "2024-02-27T08:16:58Z"
  generation: 16
  labels:
    app: reviews
  name: reviews
  namespace: bookinfo
  ownerReferences:
  - apiVersion: flagger.app/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Canary
    name: reviews-rollout
    uid: 48b41d3f-a48b-46cc-ae42-9669fc32d5a5
  resourceVersion: "2111703"
  uid: 52f6ea39-3b6a-4ef9-857f-848382d7113a
spec:
  hosts: []
  http:
  - route:
    - destination:
        host: reviews-primary
      weight: 70
    - destination:
        host: reviews-canary
      weight: 30

When the canary analysis succeeds, the canary deployment is promoted to primary and the previous primary deployment is deleted.

Observing Canary Rollout in TSB UI

You can follow the canary rollout in the TSB UI. You will see the following changes in the topology.

Initial state: reviews v1 is primary.

Canary is progressing: traffic is split between reviews-primary (which points to the v1 workload) and reviews (which points to the v2 workload).

Canary is promoted: reviews-primary is updated to point to the v2 workload and the v1 workload is deleted. Reviews v2 makes requests to the ratings service.

Using SkyWalking Metrics for Canary Analysis

To use SkyWalking metrics for canary analysis, you need to create a MetricTemplate that defines the metrics query to be used by Flagger.

In the following example, we will create a MetricTemplate that uses the SkyWalking PromQL service to query the Apdex score of the reviews service. Refer to the SkyWalking PromQL service documentation for more information on the queries it supports.

cat <<EOF > reviews-metrics-template.yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: reviews-metrics-template-apdex
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://oap.istio-system:9090/promql
  query: |
    service_apdex{service='-|reviews|bookinfo|c1|-', layer='MESH'}
EOF

kubectl apply -f reviews-metrics-template.yaml
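Before relying on the template, it can help to run the same query by hand and confirm that it returns data. The sketch below rests on several assumptions: that the Prometheus-style instant-query path /api/v1/query is appended to the provider address, that a Service named oap exists in istio-system (inferred from the address in the template), and that the port is reachable through a port-forward. The function names are hypothetical.

```shell
# Sketch: run the MetricTemplate query manually against the SkyWalking
# PromQL service.
# Assumption: the endpoint is <address>/api/v1/query, as in the Prometheus
# HTTP API; verify this against your own installation.

# Pure helper: build the instant-query endpoint from a provider address.
# Usage: promql_endpoint <address>
promql_endpoint() {
  echo "${1%/}/api/v1/query"
}

# Run an instant query; curl URL-encodes the PromQL expression.
# Usage: promql_query <address> <promql-expression>
promql_query() {
  curl -sS -G "$(promql_endpoint "$1")" --data-urlencode "query=$2"
}

# Usage, in one terminal (forwards the in-cluster port to localhost):
#   kubectl -n istio-system port-forward svc/oap 9090:9090
# Then, in another terminal:
#   promql_query http://localhost:9090/promql \
#     "service_apdex{service='-|reviews|bookinfo|c1|-', layer='MESH'}"
```

An empty result here would explain a "no values found" warning during canary analysis, so checking the query first can save a failed rollout.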

Then update the canary configuration to use the MetricTemplate.

cat <<EOF > reviews-canary-metrics.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: reviews-rollout
  namespace: bookinfo
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews
  service:
    port: 9080
    # delegation: true is REQUIRED when using Flagger with TSB
    delegation: true
  analysis:
    # schedule interval
    interval: 5m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary (0-100)
    maxWeight: 50
    # canary increment step percentage (0-100)
    stepWeight: 10
    # metrics analysis configuration
    metrics:
      - name: apdex
        templateRef:
          # must match metadata.name of the MetricTemplate created above
          name: reviews-metrics-template-apdex
          namespace: istio-system
        thresholdRange:
          min: 99
        interval: 1m
EOF

kubectl apply -f reviews-canary-metrics.yaml

Now, if you update the reviews deployment image, Flagger uses the SkyWalking metrics to analyze the canary deployment. When the analysis succeeds, the canary deployment is promoted to primary and the previous primary deployment is deleted. If the canary deployment fails the metrics check, Flagger rolls it back to the previous primary deployment.

The following describe canary output shows that the canary deployment failed and that Flagger rolled it back.

Status:
  Canary Weight:  0
  Conditions:
    Last Transition Time:  2024-03-21T03:33:44Z
    Last Update Time:      2024-03-21T03:33:44Z
    Message:               Canary analysis failed, Deployment scaled to zero.
    Reason:                Failed
    Status:                False
    Type:                  Promoted
  Failed Checks:         0
  Iterations:            0
  Last Applied Spec:     7bb6f89f7
  Last Promoted Spec:    68d966ff59
  Last Transition Time:  2024-03-21T03:38:44Z
  Phase:                 Failed
  Tracked Configs:
Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Normal   Synced  12m                  flagger  New revision detected! Restarting analysis for reviews.bookinfo
  Normal   Synced  11m (x4 over 25h)    flagger  Starting canary analysis for reviews.bookinfo
  Normal   Synced  11m (x4 over 25h)    flagger  Advance reviews-rollout.bookinfo canary weight 10
  Warning  Synced  7m36s (x6 over 23h)  flagger  Halt advancement no values found for custom metric: apdex: no values found
  Warning  Synced  6m36s (x5 over 19h)  flagger  Halt reviews-rollout.bookinfo advancement apdex 0.00 < 99
  Warning  Synced  5m36s (x3 over 19h)  flagger  Canary failed! Scaling down reviews-rollout.bookinfo
  Warning  Synced  5m36s (x2 over 19h)  flagger  Rolling back reviews-rollout.bookinfo failed checks threshold reached 5
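The events above reflect two settings working together: each sample is compared with thresholdRange.min, and the rollback happens once the number of failed checks reaches threshold. The sketch below illustrates that rule under stated assumptions. It treats a sample as passing when it is greater than or equal to min and counts failures cumulatively. It is an illustration with hypothetical function names, not Flagger's actual implementation.

```shell
# Sketch of the pass/fail rule applied to metric samples.
# Assumptions: a sample passes when value >= min; failures are counted
# cumulatively; rollback happens when the count reaches the threshold.

# Succeed when value >= min (awk handles the decimal comparison).
# Usage: metric_passes <value> <min>
metric_passes() {
  awk -v v="$1" -v min="$2" 'BEGIN { exit !(v + 0 >= min + 0) }'
}

# Walk through a series of samples and print the outcome.
# Usage: analyze <threshold> <min> <sample>...
analyze() {
  threshold=$1; min=$2; shift 2
  failed=0
  for sample in "$@"; do
    if ! metric_passes "$sample" "$min"; then
      failed=$(( failed + 1 ))
      if [ "$failed" -ge "$threshold" ]; then
        echo "rollback after $failed failed check(s)"
        return 0
      fi
    fi
  done
  echo "continue with $failed failed check(s)"
}

analyze 5 99 0.00 0.00 0.00 0.00 0.00   # prints: rollback after 5 failed check(s)
analyze 5 99 99.5 100 98 100 100        # prints: continue with 1 failed check(s)
```

The first call mirrors the failed run shown above, where the reported value of 0.00 stayed below the minimum of 99 until the threshold of 5 was reached.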