Redis Rate Limiting and Request Throttling for Load Smoothening

Traffic spikes are inevitable. Without throttling, a burst of requests can overwhelm your database, exhaust third-party API quotas, or degrade service for all users. Redis gives you the primitives to implement smooth, fair request limiting across your entire fleet of servers.

Rate Limiting vs Throttling

Rate limiting enforces a maximum number of requests per time window and rejects anything beyond it. Throttling smoothens the intake rate by queuing or slowing excess requests rather than rejecting them. Redis supports both approaches.

Fixed Window Counter (INCR + EXPIRE)

The simplest approach: count requests per time window using INCR and expire the counter once the window is over. The key encodes both the user identifier and the current time window (for one-minute windows, the Unix timestamp divided by 60 and rounded down), so each window gets its own counter, and the expiry removes old counters automatically.

# key format: rate:{user_id}:{window}
# window = floor(current_unix_timestamp / 60)  → changes every 60 s

# First request in the window
INCR rate:user:42:28234567
# (integer) 1

# Set expiry so the key self-cleans after the window ends
EXPIRE rate:user:42:28234567 60
# (integer) 1

# Subsequent requests just increment — no EXPIRE needed again
INCR rate:user:42:28234567
# (integer) 2

# Decision: if returned count <= 100 → allow. Otherwise → HTTP 429.
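
The key derivation and the decision rule above can be sketched in a few lines of Python. This is only an illustration: the function names fixed_window_key and is_allowed are made up for this example, and the limit of 100 is the one used in the commands above.

```python
WINDOW_SECONDS = 60   # one-minute windows, as in the example above
LIMIT = 100           # requests allowed per window


def fixed_window_key(user_id: str, unix_ts: float,
                     window_seconds: int = WINDOW_SECONDS) -> str:
    """Build the per-window counter key, e.g. rate:user:42:28234567."""
    window = int(unix_ts // window_seconds)  # changes every window_seconds
    return f"rate:{user_id}:{window}"


def is_allowed(count_after_incr: int, limit: int = LIMIT) -> bool:
    """Apply the decision rule to the value returned by INCR."""
    return count_after_incr <= limit
```

Any two timestamps inside the same minute map to the same key, and the first second of the next minute maps to a new one.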

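The INCR command is atomic: concurrent clients incrementing the same key each receive a distinct counter value. The EXPIRE call needs more care. Because the window number is part of the key name, refreshing the TTL on every request cannot stretch the window itself; it only costs an extra write per request. If you ever drop the window from the key name and rely on the TTL alone to end the window, an unconditional EXPIRE becomes a real bug: every request pushes the deadline back, and a steady client never gets its counter reset. Sending INCR and EXPIRE as two separate commands also has a failure mode of its own: if the client dies between them, the counter is left without a TTL and never cleans itself up.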

The correct approach is to set the expiry only on the first increment (when the count equals 1). Two ways to do this atomically:

Option 1 — EXPIRE NX (Redis 7.0+): The NX flag tells Redis to apply the expiry only when no TTL is set yet, so subsequent requests leave the deadline untouched:

MULTI
INCR rate:user:42:28234567
EXPIRE rate:user:42:28234567 60 NX
EXEC
# 1) (integer) 3
# 2) (integer) 0   → 0 means expiry was already set; TTL was not reset

Option 2 — Lua script (Redis < 7.0): A small Lua script runs atomically server-side in a single round-trip and checks the count before setting the expiry:

EVAL "
  local count = redis.call('INCR', KEYS[1])
  if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return count
" 1 rate:user:42:28234567 60
# (integer) 4

Because count == 1 is true only for the request that creates the key, the expiry is set exactly once and never extended. The key disappears 60 seconds after that first request. By then its window has ended, and later requests are already counting against a new key.
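
To see the "set once, never extend" behaviour without a Redis server, here is a small in-memory sketch in Python. TinyStore and its methods are hypothetical names invented for this illustration; the class only mimics a counter with a deadline and is not a Redis client.

```python
class TinyStore:
    """In-memory stand-in for the two Redis features used here: counters and key expiry."""

    def __init__(self):
        self.counts = {}     # key -> current counter value
        self.deadlines = {}  # key -> absolute time at which the key expires

    def _drop_if_expired(self, key, now):
        if key in self.deadlines and now >= self.deadlines[key]:
            del self.counts[key]
            del self.deadlines[key]

    def incr_with_expiry(self, key, ttl, now):
        """Mirror the Lua script: increment, then set the TTL only when the count is 1."""
        self._drop_if_expired(key, now)
        count = self.counts.get(key, 0) + 1
        self.counts[key] = count
        if count == 1:
            self.deadlines[key] = now + ttl
        return count

    def seconds_left(self, key, now):
        """Remaining lifetime of the key, or None if it does not exist."""
        self._drop_if_expired(key, now)
        if key not in self.counts:
            return None
        return self.deadlines[key] - now
```

With a TTL of 60, a key created at t=0 and incremented again at t=45 still has only 15 seconds left at t=45, which is the property the count == 1 guard is there to protect.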

Drawback: a user can send 100 requests at 0:59 and 100 more at 1:01, effectively 200 requests in 2 seconds at the window boundary. The sliding window solves this.
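
The boundary problem is easy to reproduce with a short simulation. The sketch below is plain Python with an in-memory dictionary standing in for the Redis counters; run_fixed_window is a made-up helper name, and times are given in seconds since an arbitrary start.

```python
from collections import defaultdict


def run_fixed_window(timestamps, limit=100, window_seconds=60):
    """Return how many of the given requests a fixed-window limiter would allow."""
    counters = defaultdict(int)  # window number -> requests seen in that window
    allowed = 0
    for ts in timestamps:
        window = int(ts // window_seconds)
        counters[window] += 1          # plays the role of INCR
        if counters[window] <= limit:  # same decision rule as above
            allowed += 1
    return allowed


# 100 requests at 0:59 and 100 more at 1:01
burst = [59.0] * 100 + [61.0] * 100
print(run_fixed_window(burst))  # 200
```

Both halves of the burst fall into different windows, so every request passes even though all 200 arrive within two seconds.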

Sliding Window with Sorted Sets

Store each request as a member in a sorted set (ZADD) with the unix timestamp as the score. Before counting, trim entries older than the window with ZREMRANGEBYSCORE, then use ZCARD to count what remains. This gives an exact count of requests in the last N seconds regardless of where in the minute the clock sits.

The four commands must run atomically. Use a Lua script so the ZADD only fires when the request is actually allowed — otherwise denied requests pile up in the sorted set and artificially inflate the count for subsequent checks:

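# ARGV: now, window in seconds, limit, unique request ID (req-8f3a2c is a placeholder)
EVAL "
  local key    = KEYS[1]
  local now    = tonumber(ARGV[1])
  local win    = tonumber(ARGV[2])
  local limit  = tonumber(ARGV[3])
  local member = ARGV[4]  -- unique per request, so requests with the same timestamp do not overwrite each other
  local window_start = now - win

  -- drop entries that have aged out of the window
  redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
  local count = redis.call('ZCARD', key)

  if count < limit then
    redis.call('ZADD', key, ARGV[1], member)
    redis.call('EXPIRE', key, win + 1)
    return 1  -- allowed
  end
  return 0    -- denied, no new entry is added
" 1 rate:sliding:user:42 1700000060.500 60 100 req-8f3a2c
# (integer) 1   → allowed
# (integer) 0   → denied (HTTP 429)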

ZREMRANGEBYSCORE removes every member whose score (timestamp) is at or before window_start, leaving only the last 60 seconds of traffic. ZCARD counts what remains in O(1). The ZADD is guarded by the if count < limit check, so denied requests are never written to the set and a flood of rejected calls cannot inflate the counter for legitimate ones.

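Unlike the fixed-window counter, where the expiry is set once, refreshing the TTL on every allowed request is intentional here. The key must outlive the most recent entry by at least one full window so that the entry keeps counting until it ages out of the score range. Once traffic stops for longer than that, the whole set expires and frees its memory.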
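
The same trim, count, and conditional add logic can be sketched in Python with a deque in place of the sorted set. SlidingWindowLog is a hypothetical name for this illustration, and the sketch assumes timestamps arrive in non-decreasing order, which a single process with a monotonic clock would give you.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding-window limiter that mirrors the Lua script's steps."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.entries = deque()  # request timestamps, oldest first

    def allow(self, now):
        window_start = now - self.window
        # Trim: same role as ZREMRANGEBYSCORE key 0 window_start
        while self.entries and self.entries[0] <= window_start:
            self.entries.popleft()
        # Count and add only when under the limit: ZCARD + guarded ZADD
        if len(self.entries) < self.limit:
            self.entries.append(now)
            return True
        return False  # denied requests are not recorded


limiter = SlidingWindowLog(limit=100, window_seconds=60)
burst = [59.0] * 100 + [61.0] * 100
print(sum(limiter.allow(t) for t in burst))  # 100
```

The burst that slipped through the fixed window is cut to 100 here, because the requests at 0:59 are still inside the 60-second lookback when the second batch arrives. A deque keeps duplicate timestamps as separate entries, whereas a sorted set keeps one entry per distinct member.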

Token Bucket Algorithm

The token bucket smoothens bursts: tokens refill at a constant rate up to a maximum capacity, and each request consumes one token. It allows short bursts while enforcing a sustained long-term rate. The state — current token count and the last refill timestamp — is stored in a Redis hash with HSET and read back with HMGET.

# key: token_bucket:user:42
# capacity = 10 (max burst), refill_rate = 1 token/second

# Read current state
HMGET token_bucket:user:42 tokens last_refill
# 1) "8.500000"
# 2) "1700000050.123"

# --- client-side arithmetic ---
# elapsed     = now - last_refill  = 1700000060.500 - 1700000050.123 = 10.377 s
# refilled    = min(10, 8.5 + 10.377 * 1) = min(10, 18.877) = 10 tokens
# after use   = 10 - 1 = 9 tokens
# → request is ALLOWED
# ------------------------------

# Write back the new state
HSET token_bucket:user:42 tokens 9 last_refill 1700000060.500
# (integer) 0   → 0 new fields created (both fields already existed)

# TTL = capacity / refill_rate + safety buffer = 10 / 1 + 10 = 20 s
EXPIRE token_bucket:user:42 20
# (integer) 1

# If fewer than 1 token is available after the refill step: skip HSET, return HTTP 429.

The arithmetic runs on the application side after reading the hash, so Redis itself never needs to know the formula. The catch is that HMGET and HSET are separate round-trips: two simultaneous requests can both read the same token count and both spend the same token. Wrapping the two commands in MULTI/EXEC does not fix this on its own, because the value to write depends on the value just read. High-concurrency systems should move the read-compute-write cycle into a Lua script (EVAL), which runs atomically in a single Redis call, or guard it with WATCH and retry on conflict.
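
The refill arithmetic itself is small enough to write as a pure function. The Python sketch below covers only the computation between the read and the write; take_token is a made-up name, and the defaults match the capacity of 10 and refill rate of 1 token per second used above.

```python
def take_token(tokens, last_refill, now, capacity=10.0, refill_rate=1.0):
    """Refill the bucket for the elapsed time, then try to spend one token.

    Returns (allowed, new_tokens, new_last_refill). When the request is
    denied the stored state is returned unchanged, matching "skip HSET".
    """
    elapsed = max(0.0, now - last_refill)  # guard against clock skew
    refilled = min(capacity, tokens + elapsed * refill_rate)
    if refilled < 1.0:
        return False, tokens, last_refill
    return True, refilled - 1.0, now


# The worked example from above: 8.5 tokens, roughly 10.4 s since the last refill
print(take_token(8.5, 1700000050.123, 1700000060.500))
# (True, 9.0, 1700000060.5)
```

Keeping the function free of I/O makes it easy to unit test, and the same few lines translate almost directly into the body of a Lua script.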

Choosing the Right Algorithm

Fixed window — simplest, minimal memory. Good for rough limits where boundary edge cases don't matter.
Sliding window — precise, prevents boundary exploitation. Higher memory usage (one sorted-set entry per request).
Token bucket — allows controlled bursts, smoothens load. Best for API quota management and load smoothening.

Rate Limiting in Elixir with Hammer

Hammer is the de-facto rate-limiting library for Elixir. Its pluggable backend system lets you swap storage without touching application code. Two Redis backends cover the most common production setups.

Single Redis Instance — hammer_backend_redis

The hammer_backend_redis package wraps Redix and implements the fixed-window counter algorithm using Redis pipelines under the hood — the same INCR + EXPIRE pattern shown above, but entirely managed for you.

# mix.exs
# The `use Hammer, backend: ...` macro and hit/3 used below belong to the 7.x line
defp deps do
  [
    {:hammer, "~> 7.0"},
    {:hammer_backend_redis, "~> 7.0"}
  ]
end

Define a module that acts as your rate limiter and add it to the supervision tree:

defmodule MyApp.RateLimit do
  use Hammer, backend: Hammer.Redis
end

# application.ex — supervision tree
children = [
  {MyApp.RateLimit, url: "redis://localhost:6379"}
]

Then call hit/3 anywhere in your application. The first argument is a string key (typically user ID or IP), the second is the window size in milliseconds, and the third is the request limit:

defmodule MyAppWeb.ApiController do
  # Allow 100 requests per minute per user
  def check_rate(user_id) do
    case MyApp.RateLimit.hit("api:#{user_id}", 60_000, 100) do
      {:allow, _count} ->
        :ok
      # In Hammer 7.x the deny tuple carries the time in milliseconds until the window resets
      {:deny, _retry_after_ms} ->
        {:error, :rate_limited}
    end
  end
end

For production deployments with TLS you can pass ssl and socket_opts directly when starting the process:

{MyApp.RateLimit,
 host: "redis.internal",
 port: 6379,
 ssl: true,
 socket_opts: [
   customize_hostname_check: [
     match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
   ]
 ]}

Redis Cluster — hammer_backend_eredis_cluster

When your Redis deployment is a cluster (multiple shards for horizontal scaling), use hammer_backend_eredis_cluster. It uses eredis_cluster under the hood to route commands to the correct shard based on the key hash slot, so counters for different users land on different nodes automatically.

# mix.exs
defp deps do
  [
    {:hammer, "~> 6.1"},
    {:hammer_backend_eredis_cluster, "~> 1.0"}
  ]
end

Configuration goes in config/config.exs. Provide at least a few seed nodes — eredis_cluster discovers the full topology from them at startup:

# config/config.exs
config :hammer,
  backend: {Hammer.Backend.Eredis, [
    expiry_ms: 60_000 * 60 * 2,  # counters live for 2 hours
    eredis_cluster_init_servers: [
      {"redis-node-1.internal", 6379},
      {"redis-node-2.internal", 6380},
      {"redis-node-3.internal", 6381}
    ]
  ]}

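The call takes the same three arguments as before: key, window in milliseconds, and limit. Note that the snippet below goes through the global Hammer.check_rate/3 function, which is the Hammer 6.x calling style, rather than the module-level hit/3 shown for the single-node backend. The two styles belong to different major versions of Hammer, so pin the version that matches the backend you choose: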

case Hammer.check_rate("api:#{user_id}", 60_000, 100) do
  {:allow, _count} ->
    conn |> send_resp(200, "OK")
  {:deny, _limit} ->
    conn |> put_status(429) |> json(%{error: "Too many requests"})
end

The cluster backend is the right choice when you need Redis to scale horizontally, for example when a single Redis node cannot handle the write throughput of your rate-limiting traffic or hold all the counters in memory.

Key Takeaways

INCR + EXPIRE is the simplest fixed-window rate limiter — two commands, minimal overhead, O(1) memory per user
Sorted sets with ZREMRANGEBYSCORE + ZCARD enable precise sliding windows at the cost of one entry per request in memory
Token bucket via a Redis hash smoothens bursts while enforcing a sustained rate — use a Lua script to make the read-compute-write cycle atomic
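Pipelines and MULTI/EXEC both save round-trips, but a pipeline is not atomic, and MULTI/EXEC cannot branch on a value read inside the transaction; use a Lua script (or WATCH with retries) whenever the write depends on the read
In Elixir, Hammer with hammer_backend_redis gives you a clean, tested fixed-window implementation on top of Redix; hammer_backend_eredis_cluster covers clustered Redis with the same key, window, and limit arguments, though the calling style depends on the Hammer major version you pin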
Always return Retry-After headers so clients know when to retry
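
For the fixed-window limiter, a reasonable Retry-After value is the number of whole seconds until the current window closes. The Python sketch below shows one way to compute it; retry_after_seconds is a hypothetical helper name, and the 60-second window matches the examples above.

```python
import math


def retry_after_seconds(now, window_seconds=60):
    """Seconds until the current fixed window ends, rounded up, never below 1."""
    window_end = (int(now // window_seconds) + 1) * window_seconds
    return max(1, math.ceil(window_end - now))
```

The result can go straight into the header of a 429 response, since Retry-After accepts a delay as a non-negative integer number of seconds.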