Redis Rate Limiting and Request Throttling for Load Smoothening
Traffic spikes are inevitable. Without throttling, a burst of requests can overwhelm your database, exhaust third-party API quotas, or degrade service for all users. Redis gives you the primitives to implement smooth, fair request limiting across your entire fleet of servers.
Rate Limiting vs Throttling
Rate limiting enforces a maximum number of requests in a time window and rejects anything beyond it. Throttling smoothens the intake rate by queuing or slowing excess requests rather than rejecting them. Redis supports both approaches.
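To make the distinction concrete, here is a minimal sketch in plain Python. The `allowed` callable stands in for any limiter check, such as the Redis-backed functions in this article. The names `reject_if_limited` and `wait_if_limited`, their parameters, and the tuple-style responses are assumptions made for illustration, not part of any library.

```python
import time

def reject_if_limited(allowed):
    """Rate limiting: refuse the request outright when over the limit."""
    if allowed():
        return {"status": "ok"}, 200
    return {"error": "Too many requests"}, 429

def wait_if_limited(allowed, max_wait=1.0, poll_interval=0.05, sleep=time.sleep):
    """Throttling: delay the request until capacity frees up or max_wait passes."""
    waited = 0.0
    while not allowed():
        if waited >= max_wait:
            # Give up once the caller has waited long enough
            return {"error": "Too many requests"}, 429
        sleep(poll_interval)
        waited += poll_interval
    return {"status": "ok"}, 200
```

Polling is the simplest way to slow a caller down; a production throttle would more likely use a queue or compute the exact wait from the limiter state.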
Fixed Window Counter (INCR + EXPIRE)
The simplest approach: count requests per time window with INCR and let the counter expire once the window has passed.
# Allow 100 requests per minute per user
import time

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def is_allowed(user_id, limit=100, window_seconds=60):
    # All requests in the same window share one counter key
    window = int(time.time() // window_seconds)
    key = f"rate:{user_id}:{window}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    count, _ = pipe.execute()
    return count <= limit

# Usage, inside a request handler
def handle_request():
    if not is_allowed("user:42"):
        return {"error": "Too many requests"}, 429
    return {"status": "ok"}, 200
Drawback: a user can send 100 requests at 0:59 and 100 more at 1:01. The two bursts land in different windows, so both are accepted: 200 requests in 2 seconds, twice the intended limit. The sliding window solves this.
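The boundary problem can be reproduced without Redis. The sketch below is an in-memory stand-in for the fixed-window counter, with the clock passed in explicitly so the timing is deterministic. `make_fixed_window` and `check` are hypothetical names used only for this illustration.

```python
from collections import defaultdict

def make_fixed_window(limit=100, window_seconds=60):
    # window index -> requests seen, playing the role of the Redis counter key
    counts = defaultdict(int)

    def check(now):
        window = int(now // window_seconds)
        counts[window] += 1
        return counts[window] <= limit

    return check

check = make_fixed_window()
# 100 requests at t=59s (end of window 0), then 100 at t=61s (start of window 1)
accepted = sum(check(59.0) for _ in range(100))
accepted += sum(check(61.0) for _ in range(100))
print(accepted)  # 200
```

Every request is accepted even though all 200 arrive within two seconds of each other.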
Sliding Window with Sorted Sets
Store each request as a member in a sorted set with its timestamp as the score. Count only requests within the last N seconds.
import time
import uuid

def is_allowed_sliding(user_id, limit=100, window_seconds=60):
    now = time.time()
    window_start = now - window_seconds
    key = f"rate:sliding:{user_id}"
    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in the window
    pipe.zcard(key)
    # Add current request; the UUID suffix keeps members unique when two
    # requests share a timestamp (rejected requests are recorded too)
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Set expiry to clean up
    pipe.expire(key, window_seconds + 1)
    _, count, _, _ = pipe.execute()
    return count < limit
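To see how the sliding window handles the boundary burst, here is an in-memory sketch of the same trim, count, and add steps, using a deque in place of the sorted set and an explicit clock. `make_sliding_window` and `check` are hypothetical names for this illustration only.

```python
from collections import deque

def make_sliding_window(limit=100, window_seconds=60):
    log = deque()  # timestamps of recorded requests, oldest first

    def check(now):
        # Drop entries at or before the window start (mirrors ZREMRANGEBYSCORE)
        while log and log[0] <= now - window_seconds:
            log.popleft()
        count = len(log)  # mirrors ZCARD, taken before the new entry is added
        log.append(now)   # mirrors ZADD; rejected requests are logged too
        return count < limit

    return check

check = make_sliding_window()
# Same burst pattern that slips past the fixed window
accepted = sum(check(59.0) for _ in range(100))
accepted += sum(check(61.0) for _ in range(100))
print(accepted)  # 100
```

Only the first 100 requests pass. Because rejected requests are also logged, a client that keeps hammering the endpoint keeps its own window full, which may or may not be the behavior you want.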
Token Bucket Algorithm
The token bucket smoothens bursts: tokens accumulate over time up to a maximum capacity, and each request consumes one token. A request that finds the bucket empty is rejected or delayed. This allows short bursts while maintaining a long-term rate.
def consume_token(user_id, capacity=10, refill_rate=1):
    """
    capacity: max burst size
    refill_rate: tokens added per second
    """
    now = time.time()
    key = f"token_bucket:{user_id}"
    # Note: this read and the write below are separate commands, so
    # concurrent requests can race; use a Lua script for strict limits
    data = r.hmget(key, 'tokens', 'last_refill')
    tokens = float(data[0] or capacity)
    last_refill = float(data[1] or now)
    # Add tokens based on elapsed time
    elapsed = now - last_refill
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= 1:
        tokens -= 1
        r.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        r.expire(key, int(capacity / refill_rate) + 10)
        return True  # allowed
    return False  # throttled
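The refill arithmetic is easier to follow with the Redis calls stripped away. This sketch keeps the bucket state in a plain tuple and takes the clock as an argument; `take_token` and the `(tokens, last_refill)` state layout are assumptions made for illustration.

```python
def take_token(state, now, capacity=10, refill_rate=1):
    """Return (allowed, new_state) for one request arriving at time `now`."""
    tokens, last_refill = state
    # Refill based on elapsed time, capped at the bucket capacity
    tokens = min(capacity, tokens + (now - last_refill) * refill_rate)
    if tokens >= 1:
        return True, (tokens - 1, now)
    return False, state  # state is left unchanged on rejection

state = (10.0, 0.0)  # full bucket at t=0
results = []
# A burst of 15 requests at t=0, then 5 more at t=3
for now in [0.0] * 15 + [3.0] * 5:
    allowed, state = take_token(state, now)
    results.append(allowed)
print(results.count(True))  # 13
```

The first burst drains the 10 stored tokens and loses 5 requests; three seconds later exactly 3 tokens have been refilled, so 3 of the next 5 requests pass.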
Choosing the Right Algorithm
- Fixed window — simplest, minimal memory. Good for rough limits where edge cases don't matter.
- Sliding window — precise, prevents boundary exploitation. Higher memory usage (one entry per request).
- Token bucket — allows controlled bursts, smoothens load. Best for API quota management and load smoothening.
Key Takeaways
- INCR + EXPIRE is the simplest fixed-window rate limiter — two commands, minimal overhead
- Sorted sets enable precise sliding windows at the cost of more memory per user
- Token bucket smoothens bursts while enforcing a sustained rate — ideal for load smoothening
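- Use Redis pipelines to reduce round-trips; redis-py wraps them in MULTI/EXEC by default, so the queued commands also execute atomically. Read-modify-write logic such as the token bucket needs a Lua script or WATCH to stay race-free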
- Always return Retry-After headers so clients know when to retry
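One way to compute a Retry-After value is sketched below, assuming the fixed-window and token-bucket parameters used earlier. The helper names and the Flask-style `(body, status, headers)` tuple are hypothetical; adapt them to whatever framework you use.

```python
import math

def retry_after_fixed_window(now, window_seconds=60):
    """Seconds until the current fixed window ends and the counter resets."""
    return max(1, math.ceil(window_seconds - (now % window_seconds)))

def retry_after_token_bucket(tokens, refill_rate=1):
    """Seconds until the bucket holds at least one whole token again."""
    return max(1, math.ceil((1 - tokens) / refill_rate))

def too_many_requests(retry_after):
    # Retry-After is sent here as a whole number of seconds
    headers = {"Retry-After": str(retry_after)}
    return {"error": "Too many requests"}, 429, headers
```

Rounding up and clamping to at least one second avoids telling a client to retry immediately while it is still over the limit.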