How to set up rate limits on services

How to set up rate limits on services running on Qovery.

Now that your service is up and running on Qovery, you might want to set up rate limits to protect it from abuse. We usually recommend doing this via a third party such as Cloudflare, because with such solutions the traffic is filtered out before it reaches your workload, so your resources aren't wasted. That said, this guide will show you how to do it directly with NGINX on your cluster.

Goal

This tutorial covers how to set up rate limits on your services by customizing the NGINX configuration. Several options are possible:

Understand NGINX rate limiting configuration

More information about how rate limiting works in NGINX can be found in this post on their blog.
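As a quick refresher, NGINX rate limiting always combines two directives: limit_req_zone, declared at the http level, defines a shared memory zone and a rate for a given key, and limit_req applies that zone inside a server or location block. A minimal, generic sketch (the zone name, rate, and backend address below are illustrative, not values from this tutorial):

```nginx
# http-level: one shared 10 MB zone keyed on the client IP,
# allowing 10 requests per second per key.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    listen 80;

    location / {
        # Apply the zone: excess requests are rejected (503 by default).
        limit_req zone=per_ip;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

The rest of this guide uses exactly this pattern, with Qovery advanced settings injecting the snippets into the NGINX ingress controller configuration.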

Initial setup

  1. Configure service

    I will use a basic container service, echo-server, set up with Qovery. This service listens on port 80.

    Service initial setup in Qovery console

    To start with, this service doesn't have any rate limit set, so everything will be accepted.

  2. For demonstration purposes: set nginx replicas to 1

    For this tutorial, and to ease the demonstration, I am setting the NGINX controller instances to 1 (min = max = 1) in the cluster advanced settings. Please do not do this in production, as it will reduce the availability of your services.

    Set nginx replicas to 1 in cluster advanced settings

  3. Load testing

    I will send a burst of requests against this service (100 req/sec, 500 requests total) to show that no rate limit is set.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z01d340ed-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9964 secs
    Slowest: 0.0275 secs
    Fastest: 0.0029 secs
    Average: 0.0052 secs
    Requests/sec: 100.0720
    Total data: 229.00 KiB
    Size/request: 469 B
    Size/sec: 45.83 KiB
    Response time histogram:
    0.003 [1] |
    0.005 [432] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.008 [14] |
    0.010 [1] |
    0.013 [2] |
    0.015 [2] |
    0.018 [39] |■■
    0.020 [4] |
    0.023 [1] |
    0.025 [3] |
    0.028 [1] |
    Response time distribution:
    10.00% in 0.0032 secs
    25.00% in 0.0034 secs
    50.00% in 0.0038 secs
    75.00% in 0.0043 secs
    90.00% in 0.0150 secs
    95.00% in 0.0163 secs
    99.00% in 0.0211 secs
    99.90% in 0.0275 secs
    99.99% in 0.0275 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0126 secs, 0.0115 secs, 0.0172 secs
    DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0003 secs
    Status code distribution:
    [200] 500 responses

    As we can see, all requests ended up with a 200 status code.

Global rate limit

Setting a global rate limit that affects all matching requests is useful for protecting your server from abuse while still letting legitimate traffic through. This setting applies a global rate limit to all exposed services on the cluster.

  1. Declare the global rate at cluster level

    In order to set a global rate limit, we need to declare it at the cluster level in the cluster advanced setting nginx.controller.http_snippet (see documentation).

    Here's the nginx.controller.http_snippet value we will set:

    limit_req_zone "$server_name" zone=global:10m rate=10r/s;
    Details
    • limit_req_zone: the NGINX directive that defines a shared memory zone for rate limiting
    • "$server_name": could be replaced with any constant value like "1"; the key just needs to be the same for all requests. You can also use $http_x_forwarded_for to rate limit based on the client IP address, or any other custom header, see custom rate limit key
    • zone=global:10m: global is the name of the zone (you'll reference this name in your location blocks), and 10m allocates 10 megabytes of shared memory for storing rate limiting states
    • rate=10r/s: allows 10 requests per second; any requests above this rate will be delayed or rejected.

    Declare global rate limit at cluster level in cluster advanced settings

  2. Use this global rate

    Now that our global rate is defined, let's use it in our NGINX configuration.

    In order to do so, we need to declare this server snippet at cluster level in advanced setting nginx.controller.server_snippet (see documentation):

    location / {
    limit_req zone=global;
    }
    Details
    • location /: matches all HTTP requests to your server (the / path is the root and matches everything)
    • limit_req zone=global: applies the rate limiting rules from the zone named "global" that we defined earlier. By default, with this basic syntax, it will: not allow any bursting, reject excess requests with a 503 error, and use the "leaky bucket" algorithm for request processing
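    If you want to tolerate short spikes instead of strictly rejecting everything above the rate, the same directive accepts burst (and optionally nodelay) parameters. A hedged variant of the snippet above (the burst value 20 is just an example, not part of this tutorial's setup):

```nginx
location / {
    # Allow up to 20 requests above the rate before rejecting;
    # nodelay serves the burst immediately instead of queuing it.
    limit_req zone=global burst=20 nodelay;
}
```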

    Make use of global rate limit at cluster level in cluster advanced settings

  3. Deploy your cluster

    You can now deploy your cluster with the new settings.

    Deploy cluster after advanced settings changes

  4. Testing the global rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z01d340ed-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9966 secs
    Slowest: 0.0405 secs
    Fastest: 0.0024 secs
    Average: 0.0049 secs
    Requests/sec: 100.0674
    Total data: 105.85 KiB
    Size/request: 216 B
    Size/sec: 21.18 KiB
    Response time histogram:
    0.002 [1] |
    0.006 [439] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.010 [8] |
    0.014 [0] |
    0.018 [35] |■■
    0.021 [7] |
    0.025 [5] |
    0.029 [3] |
    0.033 [0] |
    0.037 [1] |
    0.041 [1] |
    Response time distribution:
    10.00% in 0.0028 secs
    25.00% in 0.0030 secs
    50.00% in 0.0032 secs
    75.00% in 0.0035 secs
    90.00% in 0.0145 secs
    95.00% in 0.0167 secs
    99.00% in 0.0253 secs
    99.90% in 0.0405 secs
    99.99% in 0.0405 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0141 secs, 0.0113 secs, 0.0264 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0005 secs
    Status code distribution:
    [503] 452 responses
    [200] 48 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 452 requests rejected with a 503 status code. Doing the maths, 48 requests were accepted with a 200 status code (48 / 5 = 9.6 req/sec), which matches the limit we set: 10 req/sec.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the global rate limit

Service level rate limit

You can also set rate limits at the service level, which can be useful if you want to have different rate limits for different services.

The configuration described below limits the number of requests per second from a single IP address (it is not global to the service).

  1. Set rate limit in service advanced settings

    Go to your service advanced settings and set the following settings:

    • network.ingress.nginx_limit_rps: the rate limit in requests per second you want to set for your service (see documentation)
    • network.ingress.nginx_limit_rpm: the rate limit in requests per minute you want to set for your service (see documentation)
    • network.ingress.nginx_limit_burst_multiplier: the burst limit multiplier in requests for your service (default is 5) (see documentation)

    For this example, I will set a rate limit of 10 requests per minute with a burst limit of 2x.
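    For reference, the behavior produced by these settings is roughly equivalent to the following hand-written NGINX configuration. This is an illustrative sketch only: the zone name is made up, the exact configuration generated by the ingress controller differs, and it assumes burst = rate × multiplier (10 × 2 = 20):

```nginx
# Hypothetical hand-rolled equivalent: 10 requests/minute per client IP,
# with a 2x burst multiplier, i.e. burst = 10 * 2 = 20.
limit_req_zone $binary_remote_addr zone=per_ip:1m rate=10r/m;

location / {
    limit_req zone=per_ip burst=20 nodelay;
}
```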


    Service advanced settings to set request limit per minute and burst

  2. Deploy your service

    Deploy your service with the new settings.

    Deploy service

  3. Testing the service level rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 https://p8080-za845ce06-z417ab7bf-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 5.0009 secs
    Slowest: 0.0705 secs
    Fastest: 0.0025 secs
    Average: 0.0062 secs
    Requests/sec: 99.9827
    Total data: 98.52 KiB
    Size/request: 201 B
    Size/sec: 19.70 KiB
    Response time histogram:
    0.002 [1] |
    0.009 [449] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.016 [5] |
    0.023 [12] |
    0.030 [11] |
    0.036 [5] |
    0.043 [4] |
    0.050 [6] |
    0.057 [2] |
    0.064 [4] |
    0.070 [1] |
    Response time distribution:
    10.00% in 0.0029 secs
    25.00% in 0.0031 secs
    50.00% in 0.0033 secs
    75.00% in 0.0037 secs
    90.00% in 0.0145 secs
    95.00% in 0.0263 secs
    99.00% in 0.0578 secs
    99.90% in 0.0705 secs
    99.99% in 0.0705 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0223 secs, 0.0114 secs, 0.0511 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
    Status code distribution:
    [503] 479 responses
    [200] 21 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 479 requests rejected with a 503 status code. Doing the maths, 21 requests were accepted with a 200 status code, which matches the limit we set: 10 req/min with a 2x burst multiplier.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the service rate limit

    When you set both the network.ingress.nginx_limit_rpm (rate per minute) and network.ingress.nginx_limit_rps (rate per second) settings on a service, both rate limits will be enforced simultaneously. This means that traffic will be restricted based on whichever limit is hit first. For example, if you set:

    • network.ingress.nginx_limit_rpm: 300
    • network.ingress.nginx_limit_rps: 10

    This configuration would mean:

    • No more than 300 requests allowed per minute
    • No more than 10 requests allowed per second

    In practice, network.ingress.nginx_limit_rps often becomes the more restrictive limit. In the example above, while network.ingress.nginx_limit_rpm allows 300 requests per minute (averaging 5 requests per second), the network.ingress.nginx_limit_rps setting would block any burst over 10 requests in a single second, even if the total number of requests per minute is well below 300.
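    NGINX supports several limit_req directives in the same context, each checked independently, which is what makes this dual enforcement possible. The dual limit above can be pictured as two zones applied together; a sketch with illustrative zone names (not the configuration the ingress controller actually generates):

```nginx
# Two independent zones keyed on the client IP.
limit_req_zone $binary_remote_addr zone=per_ip_rps:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=per_ip_rpm:10m rate=300r/m;

location / {
    # Both limits are checked; the most restrictive one wins.
    limit_req zone=per_ip_rps;
    limit_req zone=per_ip_rpm;
}
```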

Service rate limit per custom header

So far, we've created rate limits based on the server name or the client IP address, but you can also create rate limits based on custom headers.

This configuration requires declaring a rate limit at the cluster level and then updating the service configuration to use it.

  1. Define this rate limiter

    Go to cluster advanced settings, and declare this new rate limiter by setting nginx.controller.http_snippet (see documentation) with the following value (if any configuration is already set as in the example below, you can just append it at the end):

    map $http_x_qovery_api_key $limit_key {
        default $http_x_qovery_api_key;
        "" "anonymous"; # Fallback for missing or empty key
    }
    limit_req_zone $limit_key zone=qovery_api_limit:10m rate=5r/s;
    Details
    • map: this directive creates a variable $limit_key based on the value of the $http_x_qovery_api_key header
    • $http_x_qovery_api_key: the incoming HTTP header that contains the API key of the client making the request

    If $http_x_qovery_api_key is present and non-empty, $limit_key will be set to the value of $http_x_qovery_api_key. If $http_x_qovery_api_key is missing or empty (""), $limit_key is set to "anonymous".

    This ensures that requests without an API key are handled differently (e.g., rate-limited as anonymous users).

    Cluster settings custom header rate limiter

  2. Make use of this rate limiter at service level

    Go to your service advanced settings and set network.ingress.nginx_controller_configuration_snippet with the following:

    limit_req zone=qovery_api_limit burst=2 nodelay;

    In this example, we have set a burst of 2, but it's not mandatory; you can tune it to best fit your use case (check out the Handling bursts chapter of this documentation to better understand bursts). It allows a small burst of requests beyond the defined rate before the rate limit is enforced. Normally, the rate=5r/s setting in the limit_req_zone configuration allows up to 5 requests per second per $limit_key.

    • burst=2: allows up to 2 extra requests to pass through immediately, even if the rate limit has been exceeded, but only for short bursts.
    • nodelay: disables queuing of excess requests. Without nodelay, excess requests would be queued and served later, respecting the rate limit.

    Service settings custom header rate limiter

  3. Deploy your service

    Deploy your service with the new settings.

    Deploy service with custom header rate limiter

  4. Testing the service level rate limit

    Let's test our setup by sending 100 requests per second, 500 requests total. We should see some requests rejected with a 503 status code.

    ❯ oha -q 100 -n 500 -H "X-QOVERY-API-KEY: benjamin" https://p8080-za845ce06-z82b5bc64-gtw.z77ccfcb8.slab.sh/
    Summary:
    Success rate: 100.00%
    Total: 4.9952 secs
    Slowest: 0.0474 secs
    Fastest: 0.0025 secs
    Average: 0.0050 secs
    Requests/sec: 100.0959
    Total data: 101.40 KiB
    Size/request: 207 B
    Size/sec: 20.30 KiB
    Response time histogram:
    0.002 [1] |
    0.007 [447] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    0.011 [2] |
    0.016 [12] |
    0.020 [23] |
    0.025 [4] |
    0.029 [6] |
    0.034 [3] |
    0.038 [0] |
    0.043 [1] |
    0.047 [1] |
    Response time distribution:
    10.00% in 0.0028 secs
    25.00% in 0.0031 secs
    50.00% in 0.0032 secs
    75.00% in 0.0036 secs
    90.00% in 0.0147 secs
    95.00% in 0.0183 secs
    99.00% in 0.0310 secs
    99.90% in 0.0474 secs
    99.99% in 0.0474 secs
    Details (average, fastest, slowest):
    DNS+dialup: 0.0160 secs, 0.0117 secs, 0.0338 secs
    DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
    Status code distribution:
    [503] 473 responses
    [200] 27 responses

    We do see a total of 500 requests sent over 5 seconds (100 req/sec), with 473 requests rejected with a 503 status code. Doing the maths, 27 requests were accepted with a 200 status code (27 / 5 = 5.4 req/sec), which matches the limit we set: 5 req/sec with a burst of 2 requests.

    With one quick look at the NGINX logs in our service logs, we can see a message for those rejected requests:

    NGINX logs showing request rejections from the custom header rate limit in service logs

    Looking at the raw NGINX logs, we can see the name of the rate limiter rejecting those requests:

    NGINX logs showing request rejections from the custom header rate limit

Other configuration examples

In this section, you will find a few other configuration examples (non-exhaustive) that you can set.

Custom request limit status HTTP code

While NGINX defaults to the 503 HTTP status code when rejecting requests, you can change this default by setting nginx.controller.limit_request_status_code in the cluster advanced settings. For example, we usually use 429 (Too Many Requests).
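Under the hood, this corresponds to NGINX's limit_req_status directive (and its limit_conn_status counterpart for connection limits), set in the http context:

```nginx
# Return 429 Too Many Requests instead of the default 503 when a
# request is rejected by limit_req.
limit_req_status 429;
```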

Custom http status code for limit in cluster advanced settings

For any rate-limited request, NGINX will now return an HTTP 429 status code.

oha -q 100 -n 500 -H "X-QOVERY-API-KEY: benjamin" https://p8080-za845ce06-z82b5bc64-gtw.z77ccfcb8.slab.sh/
Summary:
Success rate: 100.00%
Total: 5.0001 secs
Slowest: 0.0616 secs
Fastest: 0.0052 secs
Average: 0.0102 secs
Requests/sec: 99.9981
Total data: 88.46 KiB
Size/request: 181 B
Size/sec: 17.69 KiB
Response time histogram:
0.005 [1] |
0.011 [438] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.017 [10] |
0.022 [1] |
0.028 [0] |
0.033 [15] |■
0.039 [21] |■
0.045 [4] |
0.050 [6] |
0.056 [1] |
0.062 [3] |
Response time distribution:
10.00% in 0.0062 secs
25.00% in 0.0065 secs
50.00% in 0.0069 secs
75.00% in 0.0076 secs
90.00% in 0.0278 secs
95.00% in 0.0353 secs
99.00% in 0.0501 secs
99.90% in 0.0616 secs
99.99% in 0.0616 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0282 secs, 0.0206 secs, 0.0537 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[429] 473 responses
[200] 27 responses

We can see the 429 status code returned; the same goes when looking at the NGINX logs:

NGINX logs showing request rejections with a custom HTTP status code

Limit connections

You might want to limit the number of concurrent connections allowed from a single IP address. To do so, set the network.ingress.nginx_limit_connections advanced setting at the service level.
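In plain NGINX terms, connection limiting relies on the limit_conn_zone / limit_conn directive pair rather than limit_req. A generic sketch (the zone name and limit of 10 are illustrative, not what this setting generates verbatim):

```nginx
# Track concurrent connections per client IP in a 10 MB shared zone.
limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

location / {
    # Allow at most 10 simultaneous connections per client IP;
    # extra connections are rejected (503 by default, or the custom
    # status code if limit_conn_status is set).
    limit_conn per_ip_conn 10;
}
```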