Approximate Cardinality of Large Sets Using Redis HyperLogLog

June 9, 2026

How many unique visitors did your site have today? Tracking this with a set requires storing every user ID, which can mean gigabytes of data for a popular site. Redis HyperLogLog gives a close estimate using at most 12 kilobytes of memory per key, whether you have 1,000 or 1 billion unique users.

The Memory Problem with Exact Counting

Exact unique counting requires remembering every item you've seen. A Redis Set tracking 10 million user IDs (8 bytes each) needs about 80MB for the IDs alone, before any per-entry overhead. Multiply that by daily, weekly, and monthly windows and hundreds of pages, and the storage adds up fast.
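The arithmetic behind that estimate can be sketched in a few lines. This is a rough upper-bound calculation, not a measurement: the helper names are made up for illustration, and the 3 windows and 100 pages are hypothetical figures standing in for "daily/weekly/monthly" and "hundreds of pages".

```python
def set_payload_bytes(unique_ids, id_bytes=8):
    """Bytes needed for the IDs themselves (ignores Redis per-entry overhead)."""
    return unique_ids * id_bytes


def total_payload_bytes(unique_ids, windows, pages, id_bytes=8):
    """Worst case: every window/page combination holds its own full set."""
    return set_payload_bytes(unique_ids, id_bytes) * windows * pages


# One set of 10 million 8-byte IDs
print(set_payload_bytes(10_000_000) / 1_000_000)  # 80.0 (MB)

# Hypothetical worst case: 3 time windows x 100 pages, 10 million IDs each
print(total_payload_bytes(10_000_000, windows=3, pages=100) / 1_000_000_000)  # 24.0 (GB)
```

Real pages rarely all see the same number of visitors, so treat the second figure as a ceiling rather than a forecast.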

HyperLogLog trades a small amount of accuracy for a massive reduction in memory: at most 12KB per key regardless of cardinality, with a standard error of 0.81%. For most analytics use cases, reporting "1,004,238 unique visitors" when the exact figure is 1,000,000 is perfectly acceptable.
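Both numbers can be reproduced with a short calculation. The sketch below assumes the 16384 six-bit registers that the Redis implementation is documented to use and the usual HyperLogLog error formula of 1.04 divided by the square root of the register count; the function names are illustrative only.

```python
import math

REGISTERS = 16384        # assumed register count (2**14)
BITS_PER_REGISTER = 6    # assumed register width in bits


def dense_size_bytes(registers=REGISTERS, bits=BITS_PER_REGISTER):
    """Size of the register array, excluding any small header."""
    return registers * bits // 8


def standard_error(registers=REGISTERS):
    """Relative standard error of a HyperLogLog estimate."""
    return 1.04 / math.sqrt(registers)


def one_sigma_band(true_count, registers=REGISTERS):
    """Range covering one standard error either side of the true count."""
    delta = true_count * standard_error(registers)
    return true_count - delta, true_count + delta


print(dense_size_bytes())           # 12288 bytes, i.e. 12KB
print(f"{standard_error():.4%}")    # 0.8125%

low, high = one_sigma_band(1_000_000)
print(round(low), round(high))      # 991875 1008125
```

The standard error is a typical deviation, not a hard limit, so an individual estimate can occasionally land outside this band.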

The Three HyperLogLog Commands

# PFADD: add elements
PFADD visitors:2026-06-09 "user:101" "user:202" "user:303"
# Re-adding an existing element does not change the count
PFADD visitors:2026-06-09 "user:101"
PFADD visitors:2026-06-09 "user:404"

# PFCOUNT: estimate the unique count (about 4 here)
PFCOUNT visitors:2026-06-09

# PFMERGE: combine multiple HLLs into a destination key (e.g., monthly from daily)
PFMERGE visitors:2026-06 visitors:2026-06-01 visitors:2026-06-02 visitors:2026-06-09
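One point worth making explicit is that a merge represents the union of its sources, so a monthly figure is not the sum of the daily figures. The sketch below models that with a hypothetical in-memory stand-in, ExactStandIn, whose methods mirror the pfadd, pfcount and pfmerge signatures of redis-py. It uses exact Python sets, so it illustrates the semantics only, not the memory use or the estimation error of a real HyperLogLog.

```python
class ExactStandIn:
    """Set-backed stand-in for the three HyperLogLog commands (illustration only)."""

    def __init__(self):
        self._data = {}

    def pfadd(self, name, *values):
        seen = self._data.setdefault(name, set())
        before = len(seen)
        seen.update(values)
        return 1 if len(seen) != before else 0  # 1 when something new was added

    def pfcount(self, *sources):
        union = set()
        for key in sources:
            union |= self._data.get(key, set())  # missing keys count as empty
        return len(union)

    def pfmerge(self, dest, *sources):
        merged = set(self._data.get(dest, set()))  # an existing dest joins the union
        for key in sources:
            merged |= self._data.get(key, set())
        self._data[dest] = merged
        return True


client = ExactStandIn()
client.pfadd("visitors:2026-06-01", "user:101", "user:202")
client.pfadd("visitors:2026-06-02", "user:202", "user:303")
client.pfmerge("visitors:2026-06", "visitors:2026-06-01", "visitors:2026-06-02")

daily_sum = client.pfcount("visitors:2026-06-01") + client.pfcount("visitors:2026-06-02")
print(daily_sum)                            # 4 (user:202 counted twice)
print(client.pfcount("visitors:2026-06"))   # 3 (unique across both days)
```

Against a real server the counts are estimates, but the same relationship holds: visitors who return on several days are counted once in the merged key.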

Practical Use Cases

Unique Visitors Per Page Per Day

import redis
from datetime import date

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

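def track_visit(page_slug, user_id):
    """Record that user_id viewed page_slug today."""
    today = date.today().isoformat()
    key = f"uv:{page_slug}:{today}"
    # Queue both commands and send them in a single round trip
    pipe = r.pipeline()
    pipe.pfadd(key, user_id)
    pipe.expire(key, 90 * 86400)  # keep 90 days
    pipe.execute()

def get_unique_visitors(page_slug, day=None):
    """Return the estimated unique visitors for a page on an ISO date (default: today)."""
    day = day or date.today().isoformat()
    key = f"uv:{page_slug}:{day}"
    return r.pfcount(key)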

# Track
track_visit("blog/redis-caching", "user:42")
track_visit("blog/redis-caching", "user:99")
track_visit("blog/redis-caching", "user:42")   # duplicate, not counted

# Query
print(get_unique_visitors("blog/redis-caching"))  # ~2

Monthly Active Users (MAU) from Daily HLLs

import calendar

def get_monthly_active_users(year, month):
    """Estimate MAU by merging the month's daily HLLs (uses the client `r` defined earlier)."""
    days_in_month = calendar.monthrange(year, month)[1]

    # Collect all daily keys for the month
    daily_keys = [f"dau:{year}-{month:02d}-{d:02d}"
                  for d in range(1, days_in_month + 1)]

    # Merge all daily HLLs into a temporary key; days with no key count as empty
    merge_key = f"mau:{year}-{month:02d}:tmp"
    r.pfmerge(merge_key, *daily_keys)
    r.expire(merge_key, 3600)  # 1 hour temp key

    return r.pfcount(merge_key)
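If the monthly figure is only read occasionally, a possible alternative is to skip the temporary key: PFCOUNT accepts several keys and returns the estimated cardinality of their union. The sketch below is one way to write that, assuming the same dau: key layout as above. The names daily_keys_for_month and monthly_active_users are illustrative, and client is expected to be a redis-py style connection passed in by the caller.

```python
import calendar


def daily_keys_for_month(year, month, prefix="dau"):
    """Build the list of daily key names for one calendar month."""
    days = calendar.monthrange(year, month)[1]
    return [f"{prefix}:{year}-{month:02d}-{d:02d}" for d in range(1, days + 1)]


def monthly_active_users(client, year, month):
    """Estimate MAU with a multi-key PFCOUNT; no temporary key is written."""
    return client.pfcount(*daily_keys_for_month(year, month))


keys = daily_keys_for_month(2026, 6)
print(len(keys))   # 30
print(keys[0])     # dau:2026-06-01
print(keys[-1])    # dau:2026-06-30
```

With the connection from the earlier example this would be called as monthly_active_users(r, 2026, 6). The trade-off is that the union is recomputed on every call, so for a dashboard that polls often, merging once into a key and reading that key is likely the better fit. On Redis Cluster, multi-key commands like these also need all keys to live in the same hash slot.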

Other Counting Use Cases

  • Unique search queries — how many distinct queries hit your search endpoint today
  • Unique IPs — approximate DDoS detection without storing all IPs
  • A/B test reach — how many unique users saw each variant
  • Unique product views — per product, per day, without bloated sets
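Each of these follows the same pattern as the page-view example: one key per thing being counted, PFADD on every event, PFCOUNT when reading. As a rough sketch, A/B test reach might look like the following. The ab:{experiment}:{variant} key layout and both function names are hypothetical, and client is assumed to be a redis-py style connection supplied by the caller.

```python
def record_exposure(client, experiment, variant, user_id):
    """Note that user_id saw the given variant of an experiment."""
    return client.pfadd(f"ab:{experiment}:{variant}", user_id)


def variant_reach(client, experiment, variants):
    """Return {variant: estimated unique users} for one experiment."""
    return {v: client.pfcount(f"ab:{experiment}:{v}") for v in variants}
```

A call such as variant_reach(r, "checkout-button", ["control", "green"]) would then return one estimate per variant, with repeat views by the same user counted once.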

HyperLogLog vs Redis Sets

  • HyperLogLog: at most 12KB per key, ~0.81% standard error, cannot retrieve members, best when you only need the cardinality
  • Redis Set: memory grows linearly with members, exact count, can retrieve and iterate members, required when you need the actual list

Key Takeaways

  • At most 12KB of memory per key, regardless of how many unique items you track
  • 0.81% standard error — accurate enough for analytics and dashboards
  • PFADD adds elements, PFCOUNT estimates, PFMERGE combines HLLs for roll-ups
  • Use for DAU/MAU metrics, unique page views, and any high-cardinality counting problem
  • Cannot retrieve individual members — use a Set if you need the actual list of items