Redis Rate Limiting and Request Throttling for Load Smoothening

June 9, 2026

Traffic spikes are inevitable. Without throttling, a burst of requests can overwhelm your database, exhaust third-party API quotas, or degrade service for all users. Redis gives you the primitives to implement smooth, fair request limiting across your entire fleet of servers.

Rate Limiting vs Throttling

Rate limiting enforces a maximum number of requests per time window and rejects anything over the limit. Throttling smoothens the intake rate by queuing or slowing excess requests rather than rejecting them. Redis supports both approaches.
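To make the distinction concrete, here is a minimal sketch of the two behaviours at the handler level. The function names, the `allowed` flag, and the `wait_seconds` value are hypothetical; in practice they would come from one of the Redis checks in the following sections.

```python
import time

def handle_with_rate_limit(allowed):
    # Rate limiting: an over-limit request is rejected outright.
    if not allowed:
        return {"error": "Too many requests"}, 429
    return {"status": "ok"}, 200

def handle_with_throttle(allowed, wait_seconds, sleep=time.sleep):
    # Throttling: an over-limit request is delayed, then served.
    # `sleep` is injectable so the delay can be replaced in tests.
    if not allowed:
        sleep(wait_seconds)
    return {"status": "ok"}, 200
```

Blocking a worker with a sleep is only one way to slow a request down; a queue in front of the handler is a common alternative.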

Fixed Window Counter (INCR + EXPIRE)

The simplest approach: count requests per time window with INCR and expire the counter once the window has passed.

# Allow 100 requests per minute per user
import redis
import time

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def is_allowed(user_id, limit=100, window_seconds=60):
    window = int(time.time() // window_seconds)
    key = f"rate:{user_id}:{window}"

    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    count, _ = pipe.execute()

    return count <= limit

# Usage (inside a request handler)
def handle_request(user_id):
    if not is_allowed(user_id):
        return {"error": "Too many requests"}, 429
    return {"status": "ok"}, 200

Drawback: a user can send 100 requests at 0:59 and 100 more at 1:01, effectively 200 requests in 2 seconds. The sliding window solves this.
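The boundary problem can be reproduced without Redis. The sketch below is an in-memory stand-in for the per-window counters (the `simulate_fixed_window` helper is hypothetical, written only for illustration) and replays the burst described above.

```python
from collections import Counter

def simulate_fixed_window(timestamps, limit=100, window_seconds=60):
    """Return how many of the given request timestamps would be allowed."""
    counts = Counter()  # stands in for the per-window Redis counters
    allowed = 0
    for ts in timestamps:
        window = int(ts // window_seconds)  # same bucketing as is_allowed
        counts[window] += 1
        if counts[window] <= limit:
            allowed += 1
    return allowed

# 100 requests at t=59s, then 100 more at t=61s
burst = [59.0] * 100 + [61.0] * 100
print(simulate_fixed_window(burst))  # 200
```

All 200 requests pass because the two halves of the burst land in different windows, each with a fresh counter.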

Sliding Window with Sorted Sets

Store each request as a member in a sorted set with its timestamp as the score. Count only requests within the last N seconds.

import time
import uuid

def is_allowed_sliding(user_id, limit=100, window_seconds=60):
    now = time.time()
    window_start = now - window_seconds
    key = f"rate:sliding:{user_id}"
    # Unique member so two requests with the same timestamp are both counted
    member = f"{now}:{uuid.uuid4().hex}"

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in the window (before adding this one)
    pipe.zcard(key)
    # Add current request (recorded even if it ends up rejected)
    pipe.zadd(key, {member: now})
    # Set expiry to clean up idle keys
    pipe.expire(key, window_seconds + 1)
    _, count, _, _ = pipe.execute()

    return count < limit
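As a rough check of the sliding-window logic, the sketch below mirrors the sorted-set steps with an in-memory deque. The `simulate_sliding_window` helper is hypothetical and assumes the timestamps arrive in ascending order.

```python
from collections import deque

def simulate_sliding_window(timestamps, limit=100, window_seconds=60):
    """Return how many of the given (sorted) request timestamps would be allowed."""
    log = deque()  # stands in for the sorted set of request timestamps
    allowed = 0
    for now in timestamps:
        # Drop entries at or before the window start (the ZREMRANGEBYSCORE step)
        while log and log[0] <= now - window_seconds:
            log.popleft()
        # Count what is left before adding this request (the ZCARD step)
        if len(log) < limit:
            allowed += 1
        # Record the request whether or not it was allowed (the ZADD step)
        log.append(now)
    return allowed

# 100 requests at t=59s, then 100 more at t=61s
burst = [59.0] * 100 + [61.0] * 100
print(simulate_sliding_window(burst))  # 100
```

The second half of the burst is rejected because the first 100 requests are still inside the 60-second window when it arrives.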

Token Bucket Algorithm

The token bucket smoothens bursts: tokens accumulate over time up to a maximum capacity, and each request consumes one token. It allows short bursts while maintaining a long-term rate.

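def consume_token(user_id, capacity=10, refill_rate=1):
    """
    capacity: max burst size
    refill_rate: tokens added per second
    """
    key = f"token_bucket:{user_id}"

    with r.pipeline() as pipe:
        while True:
            try:
                # WATCH aborts the transaction if another client changes the key,
                # so two concurrent requests cannot spend the same token
                pipe.watch(key)
                now = time.time()
                data = pipe.hmget(key, 'tokens', 'last_refill')
                tokens = float(data[0]) if data[0] is not None else float(capacity)
                last_refill = float(data[1]) if data[1] is not None else now

                # Add tokens based on elapsed time
                elapsed = max(0.0, now - last_refill)
                tokens = min(capacity, tokens + elapsed * refill_rate)

                if tokens < 1:
                    pipe.unwatch()
                    return False  # throttled

                pipe.multi()
                pipe.hset(key, mapping={'tokens': tokens - 1, 'last_refill': now})
                pipe.expire(key, int(capacity / refill_rate) + 10)
                pipe.execute()
                return True  # allowed
            except redis.WatchError:
                continue  # key changed between read and write; retry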
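The refill arithmetic is easier to follow with the clock passed in explicitly. The class below is a hypothetical in-memory sketch of the same bucket, not a replacement for the Redis version; it takes `now` as an argument so the behaviour is deterministic.

```python
class InMemoryTokenBucket:
    """Illustrative token bucket with an explicit clock (not Redis-backed)."""

    def __init__(self, capacity=10, refill_rate=1, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # a new bucket starts full
        self.last_refill = now

    def consume(self, now):
        # Add tokens for the time elapsed since the last call, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # allowed
        return False      # throttled

bucket = InMemoryTokenBucket(capacity=10, refill_rate=1, now=0.0)
print(sum(bucket.consume(0.0) for _ in range(15)))  # 10
print(sum(bucket.consume(3.0) for _ in range(5)))   # 3
```

The first burst of 15 drains the full bucket, so 10 pass. Three seconds later only three tokens have been refilled, so 3 of the next 5 requests pass.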

Choosing the Right Algorithm

  • Fixed window — simplest, minimal memory. Good for rough limits where edge cases don't matter.
  • Sliding window — precise, prevents boundary exploitation. Higher memory usage (one entry per request).
  • Token bucket — allows controlled bursts, smoothens load. Best for API quota management and load smoothening.

Key Takeaways

  • INCR + EXPIRE is the simplest fixed-window rate limiter — two commands, minimal overhead
  • Sorted sets enable precise sliding windows at the cost of more memory per user
  • Token bucket smoothens bursts while enforcing a sustained rate — ideal for load smoothening
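  • Use Redis pipelines to reduce round-trips; redis-py wraps them in MULTI/EXEC by default, so the queued commands run atomically, but read-modify-write logic also needs WATCH or a Lua script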
  • Always return Retry-After headers so clients know when to retry
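For the Retry-After header, a rough sketch of how the delay could be derived for two of the limiters above. Both helper names are hypothetical, and the values are rounded up to whole seconds because the header carries an integer delay.

```python
import math

def retry_after_fixed_window(now, window_seconds=60):
    """Seconds until the current fixed window ends and the counter resets."""
    return math.ceil(window_seconds - (now % window_seconds))

def retry_after_token_bucket(tokens, refill_rate=1):
    """Seconds until the bucket holds at least one token."""
    return max(0, math.ceil((1 - tokens) / refill_rate))

print(retry_after_fixed_window(125))        # 55
print(retry_after_token_bucket(0.25, 0.5))  # 2
```

A 429 response would then carry the result as a string, for example `{"Retry-After": str(retry_after_fixed_window(time.time()))}`.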