Tetrate Enterprise Gateway for Envoy
Version: v1.4.x

Rate Limiting

There are many reasons to limit the rate of requests to an endpoint, including ensuring fair use of your resources by customers, and protecting your services by keeping them within their queries-per-second (QPS) capacity.

info

The Envoy proxies used by Tetrate Enterprise Gateway for Envoy (TEG) to process and forward user requests do support a form of ratelimiting natively. However, this is local-only: each proxy keeps its own count of how many requests have hit an endpoint, and can only make decisions based on that local number. Since TEG horizontally scales out the Envoys at the edge of your cluster (both for capacity and to avoid single points of failure), each proxy doing its own local limiting isn't sufficient to keep load on your services to a given level. What's needed is global ratelimiting, i.e. across all the proxies. This requires them to coordinate on the request counts for each endpoint, which in turn needs a global counter available to all of them. TEG's batteries-included approach provides this for you as part of the standard install, realized as a Redis server available on the network.
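
To make the difference concrete, here is a minimal arithmetic sketch in Python. It is not how Envoy or TEG implement limiting; the function names and the traffic numbers are hypothetical and only illustrate why per-proxy counters admit more traffic than a shared counter.

```python
def allowed_with_local_limits(requests_per_proxy, limit):
    """Each proxy enforces `limit` against only the requests it saw itself."""
    return sum(min(seen, limit) for seen in requests_per_proxy)


def allowed_with_global_limit(requests_per_proxy, limit):
    """All proxies consult one shared counter, so the limit covers the total."""
    return min(sum(requests_per_proxy), limit)


# Hypothetical one-second window: 3 proxies each receive 5 requests,
# and the intended limit for the endpoint is 1 request per second.
seen = [5, 5, 5]
print(allowed_with_local_limits(seen, 1))  # 3
print(allowed_with_global_limit(seen, 1))  # 1
```

With local limiting, the backend receives one request per proxy, so the effective limit grows with the number of proxies. With a shared counter it stays at the configured value.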

warning

While the TEG-managed Redis instance is a great way to get started, it is not deployed in a production-hardened manner and should be considered suitable for demo purposes only.

TEG makes enforcing global ratelimits easy: they are configured with a single, simple resource. And because TEG is layer-7 (i.e. HTTP) aware, ratelimits can be applied to individual HTTP hosts, paths, and other attributes available in the requests' HTTP headers, e.g. username.

Prerequisites

This article only covers configuring ratelimiting. Before starting, you should have a Kubernetes cluster with TEG installed, an app deployed, and that app exposed outside of the cluster using TEG. You can follow the Expose Your Application guide to get to this point. That guide suggests you install an example httpbin instance to test with. The instructions in this article configure ratelimiting for this httpbin instance, but you can of course change them to refer to your own app(s).

Configuring ratelimiting

Ratelimiting using BackendTrafficPolicy

First, we will configure a ratelimiting setting using BackendTrafficPolicy. This states that any endpoint subject to it will be limited to 1 request per second.

```
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  namespace: httpbin
  name: ratelimit-1hz
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 1
            unit: Second
EOF
```

Test the Ratelimit
Now let's make some requests to httpbin to see the ratelimit in action. Refer back to Expose Your Application for how to set the $DEDICATED_GATEWAY_IP environment variable, which we'll need to be able to call TEG.
We can then make a request with curl:
```
curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get
```
You'll see output like:
```
HTTP/1.1 200 OK
server: envoy
date: Thu, 07 Sep 2023 10:48:22 GMT
content-type: application/json
content-length: 334
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 1
x-ratelimit-limit: 1, 1;w=1
x-ratelimit-remaining: 0
x-ratelimit-reset: 1

{
    ...
}
```
The body of the response isn't important; it just echoes our request back at us. But the response headers clearly talk about ratelimiting (their syntax is complicated, and is explained in the Response Header Format section for those interested). Let's empirically check that the ratelimit is working: we're allowed one request per second, so let's send two in quick succession:
```
curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get; curl -i http://$DEDICATED_GATEWAY_IP/httpbin/get;
```
We should now see output like:
```
HTTP/1.1 200 OK
...
HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
...
```
The second request was denied because the ratelimit was exceeded. This is indicated by an HTTP status 429.
info
The status code we get is in the 400 range - these indicate client errors, i.e. we sent an invalid request. And indeed we did: we asked for resources more quickly than we were allowed. The first request was successful, and it told us (in those ratelimit headers) that we were subject to a ratelimit, and how far away we were from hitting it.
If we were making requests at an acceptable rate (according to our SLA with the service provider), but the app couldn't keep up, that would be a server-side error, and hence indicated by a status in the 500 range, specifically a 503 Service Unavailable.
That's it! Our ratelimit works! You can now proceed to the next steps, or keep reading this article for some more advanced content.
Configuring a Second Rule
Ratelimits are made of buckets. A 3-per-second ratelimit is a bucket of size 3. Every request decrements that bucket's value by 1. When the bucket hits 0, no more requests are allowed. On its own, that would limit us to 3 requests ever, so to enforce a limit of 3 per second, the bucket's value is reset to 3 every second.
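The bucket model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code TEG or Envoy runs: the `Bucket` class is hypothetical, and the caller passes in the current time so the behaviour is easy to follow.

```python
class Bucket:
    """Fixed-window bucket: holds `size` requests, reset every `window` seconds."""

    def __init__(self, size, window):
        self.size = size
        self.window = window
        self.index = None      # which window we are currently counting in
        self.remaining = 0

    def allow(self, now):
        # Reset the bucket to its full value when a new window starts.
        index = int(now // self.window)
        if index != self.index:
            self.index = index
            self.remaining = self.size
        # An empty bucket denies the request until the next reset.
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


# A 3-per-second limit: four requests in the first second, one in the next.
bucket = Bucket(size=3, window=1.0)
results = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 1.0)]
print(results)  # [True, True, True, False, True]
```

The fourth request is denied because the bucket is empty; the fifth succeeds because it arrives after the reset.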
Ratelimit policies can have more than one rule, i.e. more than one bucket that is drained and reset. Edit your policy's rules section to look like this:
```
    rules:
      - limit:
          requests: 1
          unit: Second
      - limit:
          requests: 20
          unit: Minute
```
This enforces a policy of no more than 1 request per second, and no more than 20 per minute. Whichever limit is exceeded first causes all traffic to be blocked until that bucket resets; in other words, the conditions for denying a request are OR'd rather than AND'ed.
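As a worked example of how the two rules interact, the following Python sketch simulates a client sending exactly one request per second. It is a simplified model, not Envoy's implementation: the `simulate` function is hypothetical, and it assumes a denied request does not consume from any bucket, which a real rate limit service may handle differently.

```python
def simulate(rules, arrival_times):
    """Return one bool per arrival: True if every rule's bucket had room.

    `rules` is a list of (requests, window_seconds) pairs.
    """
    state = [{"index": None, "remaining": 0} for _ in rules]
    outcomes = []
    for now in arrival_times:
        # Reset any bucket whose window has rolled over.
        for (size, window), bucket in zip(rules, state):
            index = int(now // window)
            if index != bucket["index"]:
                bucket["index"] = index
                bucket["remaining"] = size
        # A single empty bucket is enough to deny the request.
        allowed = all(bucket["remaining"] > 0 for bucket in state)
        if allowed:
            for bucket in state:
                bucket["remaining"] -= 1
        outcomes.append(allowed)
    return outcomes


rules = [(1, 1), (20, 60)]                # 1 per second, 20 per minute
outcomes = simulate(rules, range(0, 61))  # one request at t = 0, 1, ..., 60
print(sum(outcomes[:60]))                 # 20
print(outcomes[19], outcomes[20], outcomes[60])  # True False True
```

In this model the client never trips the per-second rule, but the per-minute bucket is empty after the first 20 requests, so requests 21 to 60 are denied until that bucket resets.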
Response Header Format
Now we understand the bucket model of ratelimits, the response headers will make more sense:
- x-ratelimit-limit: 1, 1;w=1, 20;w=60 - tells the client the ratelimit policies that are in effect. Ignore the first element for now, and focus on 1;w=1 and 20;w=60. These are the rules in our policy: 1 request in a window of 1 second (w=1), and 20 requests in a window of 60 seconds (w=60). That first number 1 is the size of the bucket we're closest to draining; the rule that will deny our requests first, and which we should take care not to exceed. The other two headers also concern this "closest" bucket...
- x-ratelimit-remaining: 0 - how many requests are left in the closest bucket. In this case 0: the current request succeeded, but the bucket is now empty, and future ones will fail, until that bucket is reset...
- x-ratelimit-reset: 1 - how long, in seconds, until the closest bucket resets to its full value. In this case that's just 1 second, as it's the 1-request-per-1-second bucket that we're in danger of exhausting.
If you're writing the clients that will connect to TEG when it's enforcing ratelimits, they should ideally parse and act on these headers, rather than just retrying in a tight loop until they get a 200.
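A client could parse these headers along the following lines. This is a minimal sketch that assumes the header values have exactly the shape shown above; `parse_ratelimit_headers` and `seconds_to_wait` are hypothetical helper names, not part of TEG or any library.

```python
def parse_ratelimit_headers(headers):
    """Parse the three x-ratelimit-* headers into plain Python values."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    parts = [part.strip() for part in h["x-ratelimit-limit"].split(",")]
    policies = []
    for part in parts[1:]:
        # Each policy is expected to look like "20;w=60".
        quota, _, param = part.partition(";")
        name, _, value = param.partition("=")
        if name.strip() == "w":
            policies.append((int(quota), int(value)))
    return {
        "closest_limit": int(parts[0]),
        "policies": policies,
        "remaining": int(h["x-ratelimit-remaining"]),
        "reset_seconds": int(h["x-ratelimit-reset"]),
    }


def seconds_to_wait(info):
    """Suggest a pause before the next request: wait for the reset if empty."""
    return info["reset_seconds"] if info["remaining"] == 0 else 0


info = parse_ratelimit_headers({
    "x-ratelimit-limit": "1, 1;w=1, 20;w=60",
    "x-ratelimit-remaining": "0",
    "x-ratelimit-reset": "1",
})
print(info["policies"])       # [(1, 1), (20, 60)]
print(seconds_to_wait(info))  # 1
```

Sleeping for the suggested number of seconds before the next request avoids the tight retry loop described above.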
If you're interested, you can read the draft RFC for these headers. At the time of writing, Envoy is emitting version 3, even though the spec is already on version 7.
