Tetrate Enterprise Gateway for Envoy (TEG), Version: v0.1.0

Rate Limiting

There are many reasons to limit the rate of requests to an endpoint, including ensuring fair use of your resources by customers and protecting your services by keeping them within their queries-per-second (QPS) capacity.


The Envoy proxies that Tetrate Enterprise Gateway for Envoy (TEG) uses to process and forward user requests do support a form of ratelimiting natively. However, this is local-only: each proxy keeps its own count of how many requests have hit an endpoint, and can only make decisions based on that local number. Since TEG horizontally scales out the Envoys at the edge of your cluster (both for capacity and to avoid single points of failure), each proxy doing its own local limiting isn't sufficient to keep load on your services to a given level. What's needed is global ratelimiting, i.e. across all the proxies. This requires them to coordinate on the request counts for each endpoint, which in turn needs a global counter available to all of them. TEG's batteries-included approach provides this for you as part of the standard install, realized as a Redis server available on the network.
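
To make the difference concrete, here is a minimal Python sketch (not TEG or Envoy code). It assumes three proxies, a limit of 1 request per window, and nine requests spread round-robin across the proxies. The `Counter` class and `admitted` function are hypothetical names invented for this illustration; the single shared counter stands in for the role the Redis server plays.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Request counter for a single time window (hypothetical helper)."""
    limit: int
    count: int = 0

    def allow(self) -> bool:
        # Admit the request only while the window's budget is not used up.
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def admitted(counters: list[Counter], requests: int) -> int:
    """Spread requests round-robin over proxies; counters[i] is proxy i's counter."""
    return sum(counters[i % len(counters)].allow() for i in range(requests))


PROXIES, LIMIT, REQUESTS = 3, 1, 9

# Local limiting: every proxy owns a separate counter.
local_admitted = admitted([Counter(LIMIT) for _ in range(PROXIES)], REQUESTS)

# Global limiting: every proxy consults the same shared counter.
shared = Counter(LIMIT)
global_admitted = admitted([shared] * PROXIES, REQUESTS)

print(local_admitted, global_admitted)  # 3 1
```

With local counters the backend receives one request per proxy, three times the intended limit; with the shared counter it receives exactly one.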

TEG makes enforcing global ratelimits easy: they are configured with a single, simple resource. And because TEG is layer-7 (i.e. HTTP) aware, ratelimits can be applied to individual HTTP hosts, paths, and other attributes available in the requests' HTTP headers, e.g. username.


This article covers only configuring ratelimiting. Before starting, you should have a Kubernetes cluster with TEG installed, an app deployed, and that app exposed outside of the cluster using TEG. You can follow the Expose Your Application guide to get to this point. That guide suggests you install an example httpbin instance to test with. The instructions in this article configure ratelimiting for this httpbin instance, but you can of course change them to refer to your own app(s).

Configuring ratelimiting

  1. Ratelimiting Policy

    First, we will configure a ratelimiting policy. This states that any endpoint subject to it will be limited to 1 request per second.

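    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: RateLimitFilter
    metadata:
      namespace: httpbin
      name: ratelimit-1hz
    spec:
      type: Global
      global:
        rules:
        - limit:
            requests: 1
            unit: Second

    Save this as ratelimit-1hz.yaml, then apply it:

    kubectl apply -f ratelimit-1hz.yaml
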
  2. Apply Policy to httpbin


    As you can see, although ratelimits are applied to routes, ratelimit policies are not defined in those routes directly. This stops the route (e.g. HTTPRoute) resources from becoming too complex, allows other types of policy to be added to the API later, and allows ratelimit policies to be reused.

    Now that we have a policy deployed, we'll update httpbin's HTTPRoute to use it. This is done by adding another filter to the rule object that handles httpbin traffic.


    This requires a change to the filters array in the (only) object in the rules array of the resource. Sadly, the resource type we're dealing with, HTTPRoute, doesn't support Kubernetes' strategic merge patch type, so to make such an intricate change we will have to re-specify the whole object.

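    apiVersion: gateway.networking.k8s.io/v1beta1
    kind: HTTPRoute
    metadata:
      name: httpbin
      namespace: httpbin
    spec:
      parentRefs:
      - group: gateway.networking.k8s.io
        kind: Gateway
        name: dedicated-gateway
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /httpbin/
        filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
        - type: ExtensionRef
          extensionRef:
            group: gateway.envoyproxy.io
            kind: RateLimitFilter
            name: ratelimit-1hz
        backendRefs:
        - group: ""
          kind: Service
          name: httpbin
          port: 8000

    Save this as httproute-httpbin.yaml, then apply it:

    kubectl apply -f httproute-httpbin.yaml
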
  3. Test the Ratelimit

    Now let's make some requests to httpbin to see the ratelimit in action. Refer back to Expose Your Application for how to set the $DEDICATED_GATEWAY_IP environment variable, which we'll need to be able to call TEG.

    We can then make a request with curl:

    curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get

    You'll see output like:

    HTTP/1.1 200 OK
    server: envoy
    date: Thu, 07 Sep 2023 10:48:22 GMT
    content-type: application/json
    content-length: 334
    access-control-allow-origin: *
    access-control-allow-credentials: true
    x-envoy-upstream-service-time: 1
    x-ratelimit-limit: 1, 1;w=1
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 1


    The body of the response isn't important; it just echoes our request back at us. But the response headers clearly talk about ratelimiting (their syntax is complicated; I'll run through it below for the interested). Let's empirically check whether the rate limit is working: we're allowed one request per second, so let's send two in quick succession:

    curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get; curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get;

    We should now see output like:

    HTTP/1.1 200 OK
    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true

    The second request was denied because the ratelimit was exceeded. This is indicated by an HTTP status 429.
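
For more than a couple of requests, the same check can be scripted. The following is a rough Python sketch using only the standard library; `http_status` and `tally` are hypothetical helper names, not part of TEG. It sends a burst of requests and counts the status codes that come back.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def tally(url: str, n: int, fetch: Callable[[str], int] = http_status) -> Counter:
    """Send `n` back-to-back requests and count the status codes seen."""
    return Counter(fetch(url) for _ in range(n))
```

Calling `tally(f"http://{ip}/httpbin/get", 5)`, where `ip` holds the value of $DEDICATED_GATEWAY_IP, should report mostly 429s and roughly one 200 per second that the burst spans, assuming the 1-request-per-second policy above is in effect.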


    The status code we get is in the 400 range - these indicate client errors, i.e. we sent an invalid request. And indeed we did: we asked for resources more quickly than we were allowed. The first request was successful, and it told us (in those ratelimit headers) that we were subject to a ratelimit, and how far away we were from hitting it.

    If we were making requests at an acceptable rate (according to our SLA with the service provider), but the app couldn't keep up, that's a server-side error, and hence is indicated by a 500 range error, specifically a 503 Service Unavailable.

    That's it! Our ratelimit works! You can now proceed to the next steps, or keep reading this article for some more advanced content.

  4. Configuring a Second Rule

    Ratelimits are made of buckets. A 3-per-second ratelimit is a bucket of size 3. Every request decrements that bucket's value by 1. When the bucket hits 0, no more requests are allowed. On its own, that would limit us to 3 requests ever, so to enforce a limit of 3 per second, the bucket's value is reset to 3 every second.
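
The bucket model can be sketched in a few lines of Python. This is an illustration of the idea as described above, not the implementation used by Envoy or TEG; the `Bucket` class and its fields are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Bucket:
    """Fixed-window bucket: `size` requests per `window` seconds (illustrative only)."""

    def __init__(self, size: int, window: float) -> None:
        self.size = size
        self.window = window
        self.remaining = size
        self.reset_at = window  # time at which the bucket is next refilled

    def allow(self, now: float) -> bool:
        # Refill once the current window has elapsed.
        if now >= self.reset_at:
            self.remaining = self.size
            # Move the reset time to the end of the window containing `now`.
            self.reset_at = (now // self.window + 1) * self.window
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


bucket = Bucket(size=3, window=1.0)
# Five requests within the first second, then one more in the next second.
results = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 1.0)]
print(results)  # [True, True, True, False, False, True]
```

The fourth and fifth requests are denied because the bucket is empty; the request at t=1.0 succeeds because the bucket has been reset to 3.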

    Ratelimit policies can have more than one rule, i.e. more than one bucket which is drained and reset. Edit your policy's rules section to look like this:

    - limit:
        requests: 1
        unit: Second
    - limit:
        requests: 20
        unit: Minute

    This enforces a policy of no more than 1 request per second and no more than 20 per minute. Whichever bucket is exhausted first causes all traffic to be blocked until it resets; think of the blocking conditions as being OR'd rather than AND'ed.
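
As a rough Python sketch of how two rules combine (again an illustration, not Envoy's actual code): each rule is a fixed-window bucket, and a request is admitted only if every bucket still has capacity. The `make_bucket` and `allow` helpers are hypothetical, and the sketch assumes a denied request consumes nothing from any bucket.

```python
def make_bucket(size: int, window: float) -> dict:
    """Create a fixed-window bucket of `size` requests per `window` seconds."""
    return {"size": size, "window": window, "remaining": size, "reset_at": window}


def allow(buckets: list[dict], now: float) -> bool:
    """Admit a request only if every bucket still has capacity."""
    for b in buckets:
        # Refill any bucket whose window has elapsed.
        if now >= b["reset_at"]:
            b["remaining"] = b["size"]
            b["reset_at"] = (now // b["window"] + 1) * b["window"]
    # A single empty bucket is enough to deny the request.
    if any(b["remaining"] == 0 for b in buckets):
        return False
    for b in buckets:
        b["remaining"] -= 1
    return True


# Two requests half a second apart: the per-second bucket denies the second.
burst = [make_bucket(1, 1), make_bucket(20, 60)]
print([allow(burst, t) for t in (0.0, 0.5)])  # [True, False]

# One request per second for 25 seconds: the per-minute bucket runs out.
steady = [make_bucket(1, 1), make_bucket(20, 60)]
admitted = [t for t in range(25) if allow(steady, float(t))]
print(len(admitted), admitted[-1])  # 20 19
```

In the second run the client never exceeds 1 request per second, yet requests from t=20 onward are denied because the 20-per-minute bucket is empty until it resets at t=60.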

  5. Response Header Format

    Now we understand the bucket model of ratelimits, the response headers will make more sense:

    • x-ratelimit-limit: 1, 1;w=1, 20;w=60 - tells the client the ratelimit policies that are in effect. Ignore the first element for now, and focus on 1;w=1 and 20;w=60. These are the rules in our policy: 1 request in a window of 1 second (w=1), and 20 requests in a window of 60 seconds (w=60). That first number 1 is the size of the bucket we're closest to draining; the rule that will deny our requests first, and which we should take care not to exceed. The other two headers also concern this "closest" bucket...
    • x-ratelimit-remaining: 0 - how many requests are left in the closest bucket. In this case 0: the current request succeeded, but the bucket is now empty, and future ones will fail until that bucket is reset...
    • x-ratelimit-reset: 1 - how long, in seconds, until the closest bucket resets to its full value. In this case that's just 1 second, as it's the 1-request-per-1-second bucket that we're in danger of exhausting.
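
The x-ratelimit-limit value can be taken apart mechanically. Below is a minimal Python sketch that handles only the shape shown above (a leading number, followed by `requests;w=seconds` entries); `parse_limit_header` is a hypothetical helper and makes no attempt to cover the full draft specification.

```python
def parse_limit_header(value: str) -> tuple[int, list[tuple[int, int]]]:
    """Split e.g. '1, 1;w=1, 20;w=60' into (closest_limit, [(requests, window_seconds), ...])."""
    first, *policies = [part.strip() for part in value.split(",")]
    rules = []
    for policy in policies:
        quota, *params = [p.strip() for p in policy.split(";")]
        # Collect key=value parameters; only the window `w` is used here.
        kv = dict(p.split("=", 1) for p in params if "=" in p)
        rules.append((int(quota), int(kv["w"])))
    return int(first), rules


print(parse_limit_header("1, 1;w=1, 20;w=60"))  # (1, [(1, 1), (20, 60)])
```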

    If you're writing the clients that will connect to TEG when it's enforcing ratelimits, they should ideally parse and act on these headers, rather than just retrying in a tight loop until they get a 200.
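
One way a client might act on these headers is sketched below in Python. `seconds_to_wait` is a hypothetical helper, and the 1-second fallback used when a 429 carries no ratelimit headers is an arbitrary assumption, not something TEG specifies.

```python
def seconds_to_wait(status: int, headers: dict[str, str]) -> float:
    """Return how long to pause before the next request, based on the ratelimit headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")
    if reset is None:
        # No ratelimit information: fall back to a fixed pause after a 429 (assumed default).
        return 1.0 if status == 429 else 0.0
    if status == 429 or remaining == "0":
        # The closest bucket is empty; wait until it resets.
        return float(reset)
    return 0.0


print(seconds_to_wait(200, {"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1"}))  # 1.0
```

A client loop would call `time.sleep(seconds_to_wait(status, headers))` after each response instead of retrying immediately.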

    If you're interested, you can read the draft RFC for these headers. At the time of writing, Envoy is emitting version 3, even though the spec is already on version 7.