Tetrate Service Bridge, Version: next

Local Rate Limiting

In this document, we will enable a local rate limit on the Ingress Gateway and show how to rate limit based on attributes of the HTTP request, such as the client address or a request header.

Before you get started, make sure you:
✓ Familiarize yourself with TSB concepts
✓ Install the TSB environment. You can use the TSB demo for a quick install
✓ Complete the TSB usage quickstart. This document assumes you have already created a Tenant and are familiar with Workspaces and Config Groups. You also need to configure tctl to connect to your TSB environment.

Deploy httpbin Service

Deploy a test httpbin service as follows:

NS=httpbin

kubectl create namespace ${NS}
kubectl label namespace ${NS} istio-injection=enabled --overwrite=true

kubectl apply -n ${NS} -f https://raw.githubusercontent.com/istio/istio/master/samples/httpbin/httpbin.yaml

cat <<EOF > ${NS}-ingressgw.yaml
apiVersion: install.tetrate.io/v1alpha1
kind: Gateway
metadata:
  name: ${NS}-ingressgw
  namespace: ${NS}
spec:
  kubeSpec:
    service:
      type: LoadBalancer
EOF

kubectl apply -f ${NS}-ingressgw.yaml

Cloud-provider specific annotations

Some cloud providers may require additional annotations. For example, on AWS:

spec:
  kubeSpec:
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

Create the TSB resources to contain the configuration

Create a Workspace and Gateway Group to contain the TSB configuration. Set ORG and TEN to the names of your Tetrate organization and tenant:

ORG=tetrate
TEN=tetrate

cat <<EOF > ${NS}-wsconfig.yaml
apiVersion: api.tsb.tetrate.io/v2
kind: Workspace
metadata:
  organization: ${ORG}
  tenant: ${TEN}
  name: ${NS}-ws
spec:
  namespaceSelector:
    names:
      - "*/${NS}"
---
apiVersion: gateway.tsb.tetrate.io/v2
kind: Group
metadata:
  organization: ${ORG}
  tenant: ${TEN}
  workspace: ${NS}-ws
  name: ${NS}-gwgroup
spec:
  namespaceSelector:
    names:
      - "*/${NS}"
EOF

tctl apply -f ${NS}-wsconfig.yaml

Expose the application using a simple Gateway resource

cat <<EOF > ${NS}-gw.yaml
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  organization: ${ORG}
  tenant: ${TEN}
  workspace: ${NS}-ws
  group: ${NS}-gwgroup
  name: ${NS}-gw
spec:
  workloadSelector:
    namespace: ${NS}
    labels:
      app: ${NS}-ingressgw
  http:
  - name: httpbin
    port: 80
    hostname: "httpbin.tetrate.io"
    routing:
      rules:
      - route:
          serviceDestination:
            host: "${NS}/httpbin.${NS}.svc.cluster.local"
            port: 8000
EOF

tctl apply -f ${NS}-gw.yaml

Test the application

Determine the public endpoint (IP address or DNS name) for the gateway:

kubectl get svc -n ${NS} ${NS}-ingressgw
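
To avoid copying the address by hand, the lookup can be scripted. The following is a minimal sketch, assuming the Service is named ${NS}-ingressgw as in this guide; the function name gateway_address is hypothetical. It relies on kubectl's JSONPath output, which prints whichever of the hostname or ip fields the cloud provider has populated.

```shell
# Print the external address of the ingress gateway Service in a namespace.
# Assumption: the Service is named <namespace>-ingressgw, as in this guide.
gateway_address() {
  local ns="$1"
  # Load balancers publish either a hostname (for example on AWS) or an IP
  # address; a field that is not set prints as an empty string.
  kubectl get svc -n "${ns}" "${ns}-ingressgw" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}'
}

# Usage (requires access to the cluster):
#   GW=$(gateway_address "${NS}")
```

An empty result usually means the load balancer has not been provisioned yet.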

Access the application as follows. Set GW to the public IP address or DNS name:

GW=k8s-httpbin-httpbini-4a722ad4c0-ec822974540ebfb1.elb.eu-west-1.amazonaws.com

curl -H "Host: httpbin.tetrate.io" http://${GW}/

Allow time to provision gateways

It can take 5-10 minutes for a cloud platform to provision the downstream load balancer instances and DNS (if used), before you can access the service.
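
Rather than retrying by hand while the load balancer is provisioned, you could poll the endpoint. The sketch below is one possible approach, not part of TSB; retry_until is a hypothetical helper name, and the attempt count and delay are arbitrary values chosen for illustration.

```shell
# Retry a command until it succeeds or the attempts run out.
#   $1 = maximum attempts, $2 = seconds between attempts, rest = command to run
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (requires a reachable gateway): poll every 15 seconds, up to 40 times
# (roughly 10 minutes). curl -f makes HTTP errors count as failures.
#   retry_until 40 15 curl -sf -o /dev/null -H "Host: httpbin.tetrate.io" "http://${GW}/"
```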

You can send a steady stream of requests using the wrk benchmarking tool:

wrk -c 10 -t 10 -d 10 -H "Host: httpbin.tetrate.io" http://${GW}/

# repeat indefinitely (press Ctrl-C to stop)
while wrk -c 10 -t 10 -d 10 -H "Host: httpbin.tetrate.io" http://${GW}/ ; do : ; done

Apply a local rate limit

Local rate limits are configured in the rateLimiting: local section of the Gateway resource.

The following example limits each individual client (dimensions: remoteAddress: value: "*") to 8 requests per second:

cat <<EOF > ${NS}-gw-ratelimit.yaml
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  organization: ${ORG}
  tenant: ${TEN}
  workspace: ${NS}-ws
  group: ${NS}-gwgroup
  name: ${NS}-gw
spec:
  workloadSelector:
    namespace: ${NS}
    labels:
      app: ${NS}-ingressgw
  http:
  - name: httpbin
    port: 80
    hostname: "httpbin.tetrate.io"
    routing:
      rules:
      - route:
          serviceDestination:
            host: "${NS}/httpbin.${NS}.svc.cluster.local"
            port: 8000
    rateLimiting:
      local:
        rules:
        - dimensions:
          - remoteAddress:
              value: "*"
          tokenBucket:
            maxTokens: 8
            tokensPerFill: 8
            fillInterval: 1s
EOF

tctl apply -f ${NS}-gw-ratelimit.yaml

Any requests that exceed that limit will immediately receive an HTTP 429 response:

curl -D - -H 'Host: httpbin.tetrate.io' http://${GW}/
HTTP/1.1 429 Too Many Requests
content-length: 18
content-type: text/plain
date: Fri, 08 Aug 2025 14:08:32 GMT
server: istio-envoy

local_rate_limited

Note

Rate limits based on remoteAddress may not be accurate if there are multiple load balancers downstream of the Envoy Gateway, as the requests will appear to originate from these load balancers. Refer to your cloud platform documentation to determine if client IP addresses can be preserved or if they can be obtained from a request header.

Check traces or logs from the Envoy gateway to verify the source IP addresses that it observes, which may be different from the client's source IP addresses.
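
The tokenBucket settings used above can be read as follows: the bucket holds at most maxTokens tokens, each request consumes one token, and every fillInterval adds tokensPerFill tokens without exceeding maxTokens. The sketch below is a simplified model of that behaviour, not the gateway's implementation. The function name simulate_bucket, the millisecond time unit, and the assumption that the bucket starts full are all choices made for illustration.

```shell
# Simulate a token bucket against a list of request arrival times.
#   $1 = maxTokens, $2 = tokensPerFill, $3 = fillInterval in milliseconds
#   remaining args = request arrival times in milliseconds, ascending
# Prints one status per request: 200 (token available) or 429 (bucket empty).
simulate_bucket() {
  local max="$1" per_fill="$2" interval="$3"
  shift 3
  local tokens="$max"   # assumption: the bucket starts full
  local last_fill=0     # time of the most recent refill tick
  local t fills out=""
  for t in "$@"; do
    # Apply every refill tick that has elapsed since the last one.
    fills=$(( (t - last_fill) / interval ))
    if [ "$fills" -gt 0 ]; then
      tokens=$(( tokens + fills * per_fill ))
      if [ "$tokens" -gt "$max" ]; then
        tokens="$max"
      fi
      last_fill=$(( last_fill + fills * interval ))
    fi
    if [ "$tokens" -gt 0 ]; then
      tokens=$(( tokens - 1 ))
      out="${out}200 "
    else
      out="${out}429 "
    fi
  done
  echo "${out% }"
}

# Eleven requests, 100 ms apart, against the 8-per-second bucket:
simulate_bucket 8 8 1000 0 100 200 300 400 500 600 700 800 900 1000
# prints: 200 200 200 200 200 200 200 200 429 429 200
```

In this model the ninth and tenth requests are rejected because the bucket is empty, and the request at 1000 ms succeeds because a refill has just occurred.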

Other Rate Limiting Options

Details for other rate limiting options can be found in the local rate limiting API reference.

For example, the following local rate limit restricts each client token (header Authorization) to a maximum of 4 requests per minute:

cat <<EOF > ${NS}-gw-ratelimit2.yaml
apiVersion: gateway.tsb.tetrate.io/v2
kind: Gateway
metadata:
  organization: ${ORG}
  tenant: ${TEN}
  workspace: ${NS}-ws
  group: ${NS}-gwgroup
  name: ${NS}-gw
spec:
  workloadSelector:
    namespace: ${NS}
    labels:
      app: ${NS}-ingressgw
  http:
  - name: httpbin
    port: 80
    hostname: "httpbin.tetrate.io"
    routing:
      rules:
      - route:
          serviceDestination:
            host: "${NS}/httpbin.${NS}.svc.cluster.local"
            port: 8000
    rateLimiting:
      local:
        rules:
        - dimensions:
          - header:
              name: "Authorization"
              # with no value, rate limit on each unique header value
              # value:
              #   exact: "bar"
          tokenBucket:
            maxTokens: 4
            tokensPerFill: 4
            fillInterval: 60s
EOF

tctl apply -f ${NS}-gw-ratelimit2.yaml

Make the HTTP request several times within a minute; the first 4 requests should succeed, but subsequent requests fail (429 Too Many Requests) for the remainder of that minute:

for i in {1..5} ; do curl -s -o /dev/null -w "%{http_code} " -H "Host: httpbin.tetrate.io" -H "Authorization: my-code" http://${GW}/ ; done
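
A long run of status codes is easier to read as a tally. The helper below is a small sketch using standard Unix tools; the name tally_codes is hypothetical. It reads status codes separated by spaces or newlines and prints each code with the number of times it occurred.

```shell
# Count how many times each HTTP status code appears on standard input.
# Output is one "<code> <count>" line per code, sorted by code.
tally_codes() {
  tr -s ' \t' '\n' | grep -v '^$' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (requires a reachable gateway): pipe the loop above into the helper.
#   for i in {1..5} ; do curl -s -o /dev/null -w "%{http_code} " \
#     -H "Host: httpbin.tetrate.io" -H "Authorization: my-code" http://${GW}/ ; done | tally_codes
```

If the limit behaves as described, a fresh run of five requests should tally as four 200 responses and one 429 response.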

Try different values for the Authorization header. Requests with each distinct value are counted independently, so every value is allowed its own 4 requests per minute:

for i in {1..5} ; do for j in {1..4} ; do curl -s -o /dev/null -w "%{http_code} " -H "Host: httpbin.tetrate.io" -H "Authorization: my-code${j}" http://${GW}/ ; done ; echo ; done
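
The independent counting can be pictured as one bucket per distinct header value. The sketch below is an illustrative model only, assuming that each value gets its own full bucket on first use and ignoring refills within the window; the function name simulate_keyed_buckets is hypothetical.

```shell
# Model one token bucket per key, with no refill during the window.
#   $1 = maxTokens; remaining args = the key (header value) of each request, in order
# Prints "<key> <status>" for each request.
simulate_keyed_buckets() {
  local max="$1"
  shift
  printf '%s\n' "$@" | awk -v max="$max" '
    {
      # The first request for a key creates a full bucket for that key.
      if (!($1 in tokens)) tokens[$1] = max
      if (tokens[$1] > 0) { tokens[$1]--; print $1, 200 }
      else                { print $1, 429 }
    }'
}

# Two keys with 2 tokens each: key "a" runs out while "b" is unaffected.
simulate_keyed_buckets 2 a b a a b
# prints:
#   a 200
#   b 200
#   a 200
#   a 429
#   b 200
```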