Approximate Cardinality of Large Sets Using Redis HyperLogLog
How many unique visitors did your site have today? Tracking this with a Set means storing every user ID. For a popular site with many pages and time windows, that can reach gigabytes. Redis HyperLogLog answers the same question approximately, using at most 12 kilobytes of memory per counter, whether you have 1,000 or 1 billion unique users.
The Memory Problem with Exact Counting
Exact unique counting requires remembering every item you've seen. A Redis Set tracking 10 million user IDs (8 bytes each) needs at least 80MB for the raw IDs alone, and noticeably more in practice once Redis's per-member bookkeeping is included. Multiply that by daily, weekly, and monthly windows and hundreds of pages, and the storage adds up fast.
HyperLogLog trades a small amount of accuracy for a massive reduction in memory: at most 12KB per key regardless of cardinality, with a standard error of 0.81%. For most analytics use cases, an estimate of 1,004,238 unique visitors when the true count is 1,000,000 (about 0.42% off) is perfectly acceptable.
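The 12KB and 0.81% figures are not arbitrary. Redis's dense HyperLogLog encoding uses 2^14 = 16,384 registers of 6 bits each. The standard error of a HyperLogLog with m registers is about 1.04/√m. The short calculation below reproduces both numbers and the raw-ID lower bound for a Set. It takes the article's 8-bytes-per-ID assumption and deliberately ignores Redis's per-member overhead, so the Set figure is a floor, not a measurement.

```python
# Register layout of Redis's dense HyperLogLog encoding:
# 2**14 registers, 6 bits per register (plus a small fixed header, not counted here).
REGISTERS = 2 ** 14
BITS_PER_REGISTER = 6

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8   # 12288 bytes
std_error = 1.04 / REGISTERS ** 0.5                 # 1.04 / 128

# Lower bound for an exact Set: raw ID bytes only (8 bytes per ID, as assumed above),
# ignoring Redis's per-member hash table overhead.
unique_ids = 10_000_000
set_raw_bytes = unique_ids * 8

print(f"HLL registers: {dense_bytes} bytes ({dense_bytes / 1024:.0f} KB)")
print(f"Standard error: {std_error:.4%}")
print(f"Set raw IDs: {set_raw_bytes / 1_000_000:.0f} MB")
```

The HyperLogLog size stays the same whether `unique_ids` is ten thousand or ten billion. The Set bound grows linearly with it.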
The Three HyperLogLog Commands
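```
# PFADD: add elements (returns 1 if any internal register changed, 0 otherwise)
PFADD visitors:2026-06-09 "user:101" "user:202" "user:303"
# Re-adding a duplicate leaves the count unchanged (returns 0)
PFADD visitors:2026-06-09 "user:101"
PFADD visitors:2026-06-09 "user:404"
# PFCOUNT: estimate the unique count (returns ~4)
PFCOUNT visitors:2026-06-09
# PFMERGE: union several HLLs into a destination key, e.g. a monthly HLL from daily ones
# (if the destination already exists, its contents are included in the union)
PFMERGE visitors:2026-06 visitors:2026-06-01 visitors:2026-06-02 visitors:2026-06-09
```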
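To see why duplicates are free and why merging works, here is a toy HyperLogLog in pure Python that mirrors the three commands. `add` (like PFADD) raises a register only when a new maximum appears. `count` (like PFCOUNT) turns the registers into an estimate. `merge` (like PFMERGE) takes the register-wise maximum. This is a teaching sketch, not Redis's implementation: `ToyHLL` is a hypothetical name, it hashes with BLAKE2b instead of Redis's MurmurHash64A, stores registers in a plain list, and uses the classic estimator from the original HyperLogLog paper with a linear-counting correction for small cardinalities. That estimator is not necessarily the one Redis uses.

```python
import hashlib
import math

P = 14        # precision bits, as in Redis: 2**14 registers
M = 1 << P    # 16,384 registers


def _hash64(item: str) -> int:
    # Assumption: 64-bit BLAKE2b stands in for Redis's MurmurHash64A.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyHLL:
    """Hypothetical teaching sketch of a HyperLogLog; not Redis's code."""

    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, item: str) -> bool:
        """Like PFADD: returns True only if a register changed."""
        h = _hash64(item)
        index = h & (M - 1)                  # low P bits pick a register
        rest = (h >> P) | (1 << (64 - P))    # sentinel bit bounds the run of zeros
        rank = (rest & -rest).bit_length()   # 1 + number of trailing zero bits
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False  # no register changed (always the case for a repeated item)

    def count(self) -> int:
        """Like PFCOUNT: classic HyperLogLog estimate."""
        alpha = 0.7213 / (1 + 1.079 / M)
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * M * M / harmonic
        empty = self.registers.count(0)
        if estimate <= 2.5 * M and empty:
            # Small-range correction: linear counting on the empty registers
            estimate = M * math.log(M / empty)
        return round(estimate)

    def merge(self, *others: "ToyHLL") -> None:
        """Like PFMERGE into self: register-wise maximum (self's data is kept)."""
        for other in others:
            self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


day1, day2 = ToyHLL(), ToyHLL()
for i in range(20_000):
    day1.add(f"user:{i}")
for i in range(10_000, 30_000):       # overlaps day1 on user:10000..user:19999
    day2.add(f"user:{i}")

month = ToyHLL()
month.merge(day1, day2)
# Each estimate should land within a few percent of 20000, 20000 and 30000.
print(day1.count(), day2.count(), month.count())
```

Because merging keeps only the maximum per register, the monthly estimate counts users who appear on both days once. Summing the daily counts would double-count them.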
Practical Use Cases
Unique Visitors Per Page Per Day
```python
import redis
from datetime import date

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def track_visit(page_slug, user_id):
    today = date.today().isoformat()
    key = f"uv:{page_slug}:{today}"
    r.pfadd(key, user_id)
    r.expire(key, 90 * 86400)  # keep 90 days

def get_unique_visitors(page_slug, day=None):
    day = day or date.today().isoformat()
    key = f"uv:{page_slug}:{day}"
    return r.pfcount(key)

# Track
track_visit("blog/redis-caching", "user:42")
track_visit("blog/redis-caching", "user:99")
track_visit("blog/redis-caching", "user:42")  # duplicate, not counted

# Query
print(get_unique_visitors("blog/redis-caching"))  # ~2
```
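PFCOUNT also accepts several keys and returns the estimated cardinality of their union without storing anything, which makes rolling windows cheap. The sketch below builds the last N daily keys in the `uv:{page_slug}:{date}` format used above and counts their union in one call. `window_keys` and `unique_visitors_last_n_days` are hypothetical helpers, and `client` is assumed to be a redis-py client such as `r`.

```python
from datetime import date, timedelta
from typing import List, Optional


def window_keys(page_slug: str, end_day: date, days: int = 7) -> List[str]:
    """Daily HLL keys for the `days` days ending on `end_day` (inclusive), newest first."""
    return [
        f"uv:{page_slug}:{(end_day - timedelta(days=offset)).isoformat()}"
        for offset in range(days)
    ]


def unique_visitors_last_n_days(client, page_slug: str, days: int = 7,
                                end_day: Optional[date] = None) -> int:
    """Estimated unique visitors across the window: a union, not a sum of daily counts."""
    end_day = end_day or date.today()
    # PFCOUNT with several keys merges them on the fly; missing days count as empty.
    return client.pfcount(*window_keys(page_slug, end_day, days))


# Usage with the client `r` from the example above:
# print(unique_visitors_last_n_days(r, "blog/redis-caching"))
```

Adding up seven daily `get_unique_visitors` results would count a reader who came back on three days three times. The multi-key PFCOUNT counts them once.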
Monthly Active Users (MAU) from Daily HLLs
```python
import calendar

def get_monthly_active_users(year, month):
    # Reads site-wide daily HLLs named dau:YYYY-MM-DD (one PFADD per active user per day);
    # `r` is the client defined above.
    days_in_month = calendar.monthrange(year, month)[1]

    # Collect all daily keys for the month (days with no key are treated as empty)
    daily_keys = [f"dau:{year}-{month:02d}-{d:02d}"
                  for d in range(1, days_in_month + 1)]

    # Merge all daily HLLs into a temporary key
    merge_key = f"mau:{year}-{month:02d}:tmp"
    r.pfmerge(merge_key, *daily_keys)
    r.expire(merge_key, 3600)  # 1 hour temp key
    return r.pfcount(merge_key)
```
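The MAU function depends on something writing the `dau:YYYY-MM-DD` keys it reads. One way to do that is sketched below: PFADD each active user into the day's key and set a TTL in the same pipelined round trip. `record_active_user`, `dau_key` and `month_keys` are hypothetical helpers, the 400-day retention is an arbitrary assumption, and `client` is any redis-py client.

```python
import calendar
from datetime import date
from typing import List, Optional

# Assumption: keep daily keys for a little over a year; tune to your reporting needs.
DAU_TTL_SECONDS = 400 * 86400


def dau_key(day: date) -> str:
    """Build the dau:YYYY-MM-DD key that get_monthly_active_users reads."""
    return f"dau:{day.isoformat()}"


def record_active_user(client, user_id: str, day: Optional[date] = None) -> None:
    """PFADD the user into the day's HLL and refresh its TTL in one round trip."""
    key = dau_key(day or date.today())
    with client.pipeline() as pipe:
        pipe.pfadd(key, user_id)
        pipe.expire(key, DAU_TTL_SECONDS)
        pipe.execute()


def month_keys(year: int, month: int) -> List[str]:
    """All daily keys for one calendar month, in the same format as dau_key."""
    days_in_month = calendar.monthrange(year, month)[1]
    return [dau_key(date(year, month, d)) for d in range(1, days_in_month + 1)]


# Usage with the client `r` from the earlier example:
# record_active_user(r, "user:42")
# print(get_monthly_active_users(2026, 6))
```

redis-py pipelines are transactional by default, so PFADD and EXPIRE travel together and run inside a MULTI/EXEC block. Setting the TTL on every write keeps each daily key alive for the full retention period after its last update.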
Other Counting Use Cases
- Unique search queries — how many distinct queries hit your search endpoint today
- Unique IPs — approximate DDoS detection without storing all IPs
- A/B test reach — how many unique users saw each variant
- Unique product views — per product, per day, without bloated sets
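A/B reach has one extra wrinkle. Besides per-variant reach, you may want to know how many users saw both variants, for example to spot leakage between groups. HyperLogLog cannot intersect sets directly. An overlap can still be approximated by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| - |A ∪ B|, using PFCOUNT on each key and on both keys together. This is a sketch: the `ab:{experiment}:{variant}` key scheme and the helper names are hypothetical, and `client` is assumed to be a redis-py client.

```python
def variant_key(experiment: str, variant: str) -> str:
    # Hypothetical key scheme: one HLL per experiment variant.
    return f"ab:{experiment}:{variant}"


def record_exposure(client, experiment: str, variant: str, user_id: str) -> None:
    """Count a user as reached by a variant (PFADD ignores repeat exposures)."""
    client.pfadd(variant_key(experiment, variant), user_id)


def reach_report(client, experiment: str, variant_a: str, variant_b: str) -> dict:
    key_a = variant_key(experiment, variant_a)
    key_b = variant_key(experiment, variant_b)
    reach_a = client.pfcount(key_a)
    reach_b = client.pfcount(key_b)
    reach_total = client.pfcount(key_a, key_b)  # union of both variants
    # Inclusion-exclusion; clamp at 0 because independent estimates can overshoot.
    overlap = max(0, reach_a + reach_b - reach_total)
    return {variant_a: reach_a, variant_b: reach_b, "total": reach_total, "overlap": overlap}
```

Treat the overlap as rough. Its error scales with the size of the counts involved, not with the overlap itself. An overlap that is small relative to total reach can be swamped by the 0.81% error of each estimate.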
HyperLogLog vs Redis Sets
- HyperLogLog: fixed memory (at most 12KB), ~0.81% standard error, cannot retrieve members; ideal when you only need the count
- Redis Set: memory grows linearly with the number of members, exact count, can retrieve and iterate members; required when you need the actual list
Key Takeaways
- At most 12KB of memory per key, regardless of how many unique items you track
- 0.81% standard error — accurate enough for analytics and dashboards
- PFADD adds elements, PFCOUNT estimates, PFMERGE combines HLLs for roll-ups
- Use for DAU/MAU metrics, unique page views, and any high-cardinality counting problem
- Cannot retrieve individual members — use a Set if you need the actual list of items