How to set up rate limits on services

How to set up rate limits on services running on Qovery.

Now that your service is up and running on Qovery, you might want to set up rate limits to protect it from abuse. We usually recommend doing this via a third party such as Cloudflare, because with such solutions the traffic is filtered out before it reaches your workload, so your resources aren't wasted. That said, this guide will show you how to do it directly with NGINX on your cluster.

Goal

This tutorial covers how to set up rate limits on your services by customizing the NGINX configuration. Several options are possible:

Understand NGINX rate limiting configuration

More information about how rate limiting works in NGINX can be found in this post on their blog.
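As a quick refresher, NGINX rate limiting always combines two directives: limit_req_zone, declared at the http level, defines a shared memory zone and a rate for a given key, and limit_req applies that zone inside a server or location block. A minimal, generic sketch (the zone name, rate, and backend address below are illustrative, not values from this tutorial):

```nginx
# http-level: one shared 10 MB zone keyed on the client IP,
# allowing 10 requests per second per key.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    listen 80;

    location / {
        # Apply the zone: excess requests are rejected (503 by default).
        limit_req zone=per_ip;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

The rest of this guide uses exactly this pattern, with Qovery advanced settings injecting the snippets into the NGINX ingress controller configuration.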

Initial setup

  1. Configure service

    I will use a basic container service, echo-server, set up with Qovery. This service listens on port 80.

    Service initial setup in Qovery console

    To start with, this service doesn't have any rate limit set, so everything will be accepted.

  2. For demonstration purposes: set nginx replicas to 1

    For this tutorial, and to ease the demonstration, I am setting the NGINX controller instances to 1 (min = max = 1) in the cluster advanced settings. Please do not do this in production, as it will reduce the availability of your services.

    Set nginx replicas to 1 in cluster advanced settings

  3. Load testing

    I will send a burst of requests against this service (100 req/sec, 500 requests total) to show that no rate limit is set.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z01d340ed-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9964 secs
    Slowest: 0.0275 secs
    Fastest: 0.0029 secs
    Average: 0.0052 secs
    Requests/sec: 100.0720
    Total data: 229.00 KiB
    Size/request: 469 B
    Size/sec: 45.83 KiB
    Response time histogram:
    0.003 [1] |
    0.005 [432] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.008 [14] |
    0.010 [1] |
    0.013 [2] |
    0.015 [2] |
    0.018 [39] |■■
    0.020 [4] |
    0.023 [1] |
    0.025 [3] |
    0.028 [1] |
    Response time distribution:
    10.00% in 0.0032 secs
    25.00% in 0.0034 secs
    50.00% in 0.0038 secs
    75.00% in 0.0043 secs
    90.00% in 0.0150 secs
    95.00% in 0.0163 secs
    99.00% in 0.0211 secs
    99.90% in 0.0275 secs
    99.99% in 0.0275 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0126 secs, 0.0115 secs, 0.0172 secs
    DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0003 secs
    Status code distribution:
    [200] 500 responses

    As we can see, all requests ended up with a 200 status code.

Global rate limit

Setting a global rate limit that affects all matching requests is useful for protecting your server from abuse while still letting legitimate traffic through. This setting applies a global rate limit to all exposed services on the cluster.

  1. Declare the global rate at cluster level

    In order to set a global rate limit, we need to declare it at the cluster level in the cluster advanced setting nginx.controller.http_snippet (see documentation).

    Here's the nginx.controller.http_snippet value we will set:

    limit_req_zone "$server_name" zone=global:10m rate=10r/s;
    Details
    • limit_req_zone: the NGINX directive that defines a shared memory zone for rate limiting
    • "$server_name": could be replaced with any constant value like "1"; the key just needs to be the same for all requests. You can also use $http_x_forwarded_for to rate limit based on the client IP address, or any other custom header, see custom rate limit key
    • zone=global:10m: global is the name of the zone (you'll reference this name in your location blocks), and 10m allocates 10 megabytes of shared memory for storing rate limiting states
    • rate=10r/s: allows 10 requests per second; any requests above this rate will be delayed or rejected.

    Declare global rate limit at cluster level in cluster advanced settings

  2. Use this global rate

    Now that our global rate is defined, let's use it in our NGINX configuration.

    In order to do so, we need to declare this server snippet at cluster level in advanced setting nginx.controller.server_snippet (see documentation):

    location / {
    limit_req zone=global;
    }
    Details
    • location /: matches all HTTP requests to your server (the / path is the root and matches everything)
    • limit_req zone=global: applies the rate limiting rules from the zone named "global" that we defined earlier. By default, with this basic syntax, it will: not allow any bursting, reject excess requests with a 503 error, and use the "leaky bucket" algorithm for request processing
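    If you want to tolerate short spikes instead of strictly rejecting everything above the rate, the same directive accepts burst (and optionally nodelay) parameters. A hedged variant of the snippet above (the burst value 20 is just an example, not part of this tutorial's setup):

```nginx
location / {
    # Allow up to 20 requests above the rate before rejecting;
    # nodelay serves the burst immediately instead of queuing it.
    limit_req zone=global burst=20 nodelay;
}
```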

    Make use of global rate limit at cluster level in cluster advanced settings

  3. Deploy your cluster

    You can now deploy your cluster with the new settings.

    Deploy cluster after advanced settings changes

  4. Testing the global rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z01d340ed-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9966 secs
    Slowest: 0.0405 secs
    Fastest: 0.0024 secs
    Average: 0.0049 secs
    Requests/sec: 100.0674
    Total data: 105.85 KiB
    Size/request: 216 B
    Size/sec: 21.18 KiB
    Response time histogram:
    0.002 [1] |
    0.006 [439] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.010 [8] |
    0.014 [0] |
    0.018 [35] |■■
    0.021 [7] |
    0.025 [5] |
    0.029 [3] |
    0.033 [0] |
    0.037 [1] |
    0.041 [1] |
    Response time distribution:
    10.00% in 0.0028 secs
    25.00% in 0.0030 secs
    50.00% in 0.0032 secs
    75.00% in 0.0035 secs
    90.00% in 0.0145 secs
    95.00% in 0.0167 secs
    99.00% in 0.0253 secs
    99.90% in 0.0405 secs
    99.99% in 0.0405 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0141 secs, 0.0113 secs, 0.0264 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0005 secs
    Status code distribution:
    [503] 452 responses
    [200] 48 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 452 requests rejected with a 503 status code. Doing the maths, 48 requests were accepted with a 200 status code (48 / 5 = 9.6 req/sec), which matches the limit we set: 10 req/sec.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the global rate limit

Service level rate limit

You can also set rate limits at the service level, which can be useful if you want to have different rate limits for different services.

The configuration described below limits the number of requests per second from a single IP address (it is not global to the service).

  1. Set rate limit in service advanced settings

    Go to your service advanced settings and set the following settings:

    • network.ingress.nginx_limit_rps: the rate limit in requests per second you want to set for your service (see documentation)
    • network.ingress.nginx_limit_rpm: the rate limit in requests per minute you want to set for your service (see documentation)
    • network.ingress.nginx_limit_burst_multiplier: the burst limit multiplier in requests for your service (default is 5) (see documentation)

    For this example, I will set a rate limit of 10 requests per minute with a burst limit of 2x.
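    For reference, the behavior produced by these settings is roughly equivalent to the following hand-written NGINX configuration. This is an illustrative sketch only: the zone name is made up, the exact configuration generated by the ingress controller differs, and it assumes burst = rate × multiplier (10 × 2 = 20):

```nginx
# Hypothetical hand-rolled equivalent: 10 requests/minute per client IP,
# with a 2x burst multiplier, i.e. burst = 10 * 2 = 20.
limit_req_zone $binary_remote_addr zone=per_ip:1m rate=10r/m;

location / {
    limit_req zone=per_ip burst=20 nodelay;
}
```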


    Service advanced settings to set request limit per minute and burst

  2. Deploy your service

    Deploy your service with the new settings.

    Deploy service

  3. Testing the service level rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z417ab7bf-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 5.0009 secs
    Slowest: 0.0705 secs
    Fastest: 0.0025 secs
    Average: 0.0062 secs
    Requests/sec: 99.9827
    Total data: 98.52 KiB
    Size/request: 201 B
    Size/sec: 19.70 KiB
    Response time histogram:
    0.002 [1] |
    0.009 [449] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.016 [5] |
    0.023 [12] |
    0.030 [11] |
    0.036 [5] |
    0.043 [4] |
    0.050 [6] |
    0.057 [2] |
    0.064 [4] |
    0.070 [1] |
    Response time distribution:
    10.00% in 0.0029 secs
    25.00% in 0.0031 secs
    50.00% in 0.0033 secs
    75.00% in 0.0037 secs
    90.00% in 0.0145 secs
    95.00% in 0.0263 secs
    99.00% in 0.0578 secs
    99.90% in 0.0705 secs
    99.99% in 0.0705 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0223 secs, 0.0114 secs, 0.0511 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
    Status code distribution:
    [503] 479 responses
    [200] 21 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 479 requests rejected with a 503 status code. Doing the maths, 21 requests were accepted with a 200 status code, which matches the limit we set: 10 req/min with a 2x burst multiplier.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the service rate limit

    When you set both the network.ingress.nginx_limit_rpm (rate per minute) and network.ingress.nginx_limit_rps (rate per second) settings on a service, both rate limits will be enforced simultaneously. This means that traffic will be restricted based on whichever limit is hit first. For example, if you set:

    • network.ingress.nginx_limit_rpm: 300
    • network.ingress.nginx_limit_rps: 10

    This configuration would mean:

    • No more than 300 requests allowed per minute
    • No more than 10 requests allowed per second

    In practice, network.ingress.nginx_limit_rps often becomes the more restrictive limit. In the example above, while network.ingress.nginx_limit_rpm allows 300 requests per minute (averaging 5 requests per second), the network.ingress.nginx_limit_rps setting would block any burst over 10 requests in a single second, even if the total number of requests per minute is well below 300.
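    NGINX supports several limit_req directives in the same context, each checked independently, which is what makes this dual enforcement possible. The dual limit above can be pictured as two zones applied together; a sketch with illustrative zone names (not the configuration the ingress controller actually generates):

```nginx
# Two independent zones keyed on the client IP.
limit_req_zone $binary_remote_addr zone=per_ip_rps:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=per_ip_rpm:10m rate=300r/m;

location / {
    # Both limits are checked; the most restrictive one wins.
    limit_req zone=per_ip_rps;
    limit_req zone=per_ip_rpm;
}
```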

Service rate limit per custom header

So far, we've created rate limits based on the server name or the client IP address, but you can also create rate limits based on custom headers.

This configuration requires declaring a rate limit at the cluster level and then updating the service configuration to use it.

  1. Define this rate limiter

    Go to cluster advanced settings, and declare this new rate limiter by setting nginx.controller.http_snippet (see documentation) with the following value (if any configuration is already set as in the example below, you can just append it at the end):

    map $http_x_qovery_api_key $limit_key {
        default $http_x_qovery_api_key;
        "" "anonymous"; # Fallback for missing or empty key
    }
    limit_req_zone $limit_key zone=qovery_api_limit:10m rate=5r/s;
    Details
    • map: this directive creates a variable $limit_key based on the value of the $http_x_qovery_api_key header
    • $http_x_qovery_api_key: the incoming HTTP header that contains the API key of the client making the request

    If $http_x_qovery_api_key is present and non-empty, $limit_key will be set to the value of $http_x_qovery_api_key. If $http_x_qovery_api_key is missing or empty (""), $limit_key is set to "anonymous".

    This ensures that requests without an API key are handled differently (e.g., rate-limited as anonymous users).

    Cluster settings custom header rate limiter

  2. Make use of this rate limiter at service level

    Go to your service advanced settings and set network.ingress.nginx_controller_configuration_snippet with the following:

    limit_req zone=qovery_api_limit burst=2 nodelay;

    In this example, we have set a burst of 2, but it's not mandatory; you can tune it to best fit your use case (check out the Handling bursts chapter of this documentation to better understand bursts). It allows a small burst of requests beyond the defined rate before the rate limit is enforced. Normally, the rate=5r/s setting in the limit_req_zone configuration allows up to 5 requests per second per $limit_key.

    • burst=2: allows up to 2 extra requests to pass through immediately, even if the rate limit has been exceeded, but only for short bursts.
    • nodelay: disables queuing of excess requests. Without nodelay, excess requests would be queued and served later, respecting the rate limit.

    Service settings custom header rate limiter

  3. Deploy your service

    Deploy your service with the new settings.

    Deploy service with custom header rate limiter

  4. Testing the service level rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 -H "X-QOVERY-API-KEY: benjamin" https://p8080-za845ce06-z82b5bc64-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9952 secs
    Slowest: 0.0474 secs
    Fastest: 0.0025 secs
    Average: 0.0050 secs
    Requests/sec: 100.0959
    Total data: 101.40 KiB
    Size/request: 207 B
    Size/sec: 20.30 KiB
    Response time histogram:
    0.002 [1] |
    0.007 [447] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.011 [2] |
    0.016 [12] |
    0.020 [23] |
    0.025 [4] |
    0.029 [6] |
    0.034 [3] |
    0.038 [0] |
    0.043 [1] |
    0.047 [1] |
    Response time distribution:
    10.00% in 0.0028 secs
    25.00% in 0.0031 secs
    50.00% in 0.0032 secs
    75.00% in 0.0036 secs
    90.00% in 0.0147 secs
    95.00% in 0.0183 secs
    99.00% in 0.0310 secs
    99.90% in 0.0474 secs
    99.99% in 0.0474 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0160 secs, 0.0117 secs, 0.0338 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
    Status code distribution:
    [503] 473 responses
    [200] 27 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 473 requests rejected with a 503 status code. Doing the maths, 27 requests were accepted with a 200 status code (27 / 5 = 5.4 req/sec), which matches the limit we set: 5 req/sec with a burst of 2 requests.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the custom header rate limit in service logs

    Looking at the raw NGINX logs, we can see the name of the rate limiter rejecting those requests:

    NGINX logs showing request rejections from the custom header rate limit

Other configuration examples

In this section, you will find a few other configuration examples (non-exhaustive) that you can set.

Custom request limit status HTTP code

While NGINX defaults to the 503 HTTP status code when rejecting requests, you can change this default by setting nginx.controller.limit_request_status_code in the cluster advanced settings. For example, we usually use 429 (Too Many Requests).
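Under the hood, this corresponds to NGINX's limit_req_status directive (and its limit_conn_status counterpart for connection limits), set in the http context:

```nginx
# Return 429 Too Many Requests instead of the default 503 when a
# request is rejected by limit_req.
limit_req_status 429;
```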

Custom http status code for limit in cluster advanced settings

For any rate-limited request, NGINX will now return an HTTP 429 status code.

oha -q 100 -n 500 -H "X-QOVERY-API-KEY: benjamin" https://p8080-za845ce06-z82b5bc64-gtw.z77ccfcb8.slab.sh/
Summary:
Success rate: 100.00%
Total: 5.0001 secs
Slowest: 0.0616 secs
Fastest: 0.0052 secs
Average: 0.0102 secs
Requests/sec: 99.9981
Total data: 88.46 KiB
Size/request: 181 B
Size/sec: 17.69 KiB
Response time histogram:
0.005 [1] |
0.011 [438] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.017 [10] |
0.022 [1] |
0.028 [0] |
0.033 [15] |■
0.039 [21] |■
0.045 [4] |
0.050 [6] |
0.056 [1] |
0.062 [3] |
Response time distribution:
10.00% in 0.0062 secs
25.00% in 0.0065 secs
50.00% in 0.0069 secs
75.00% in 0.0076 secs
90.00% in 0.0278 secs
95.00% in 0.0353 secs
99.00% in 0.0501 secs
99.90% in 0.0616 secs
99.99% in 0.0616 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0282 secs, 0.0206 secs, 0.0537 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[429] 473 responses
[200] 27 responses

We can see the 429 status code returned; the same goes when looking at the NGINX logs:

NGINX logs showing request rejections with a custom HTTP status code

Limit connections

You might want to limit the number of concurrent connections allowed from a single IP address. To do so, set the network.ingress.nginx_limit_connections advanced setting at the service level.
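In plain NGINX terms, connection limiting relies on the limit_conn_zone / limit_conn directive pair rather than limit_req. A generic sketch (the zone name and limit of 10 are illustrative, not what this setting generates verbatim):

```nginx
# Track concurrent connections per client IP in a 10 MB shared zone.
limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

location / {
    # Allow at most 10 simultaneous connections per client IP;
    # extra connections are rejected (503 by default, or the custom
    # status code if limit_conn_status is set).
    limit_conn per_ip_conn 10;
}
```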