Database

PostgreSQL Partitioning with pg_partman: Automate Partition Management

December 3, 2025
7 min read

Large datasets become unwieldy fast. As your tables grow to millions or billions of rows, query performance degrades, index maintenance becomes expensive, and operations like VACUUM grow painfully slow. This is where PostgreSQL partitioning shines, but manual partition management is tedious and error-prone. pg_partman eliminates this burden by automating partition creation, maintenance, and cleanup.

What is pg_partman?

pg_partman is a PostgreSQL extension that automates the creation and maintenance of table partitions. Rather than manually creating new partitions and managing retention, pg_partman handles this complexity through time-based and serial-based partitioning strategies.

Partitioning splits large tables into smaller, more manageable chunks based on a partition key. PostgreSQL then routes queries to only relevant partitions, dramatically improving performance for large datasets. pg_partman automates this architecture, creating new partitions on schedule and removing old data according to retention policies.

pg_partman is available for self-hosted PostgreSQL and for managed services such as Amazon RDS, which supports it on PostgreSQL 12.5 and newer. It pairs naturally with pg_cron to schedule maintenance tasks that run automatically in the background.

Why Partition Large Tables?

Partitioning addresses critical challenges with massive datasets:

  • Improved query performance: PostgreSQL only scans partitions containing relevant data, dramatically reducing I/O
  • Efficient index maintenance: Smaller indexes on individual partitions are faster to build and maintain than monolithic indexes
  • Faster VACUUM operations: Vacuuming smaller partitions takes significantly less time and resource overhead
  • Simplified archival and retention: Drop entire old partitions instead of deleting rows from massive tables
  • Parallel query execution: PostgreSQL can parallelize scans across multiple partitions

Without partitioning, maintenance operations on a billion-row table must churn through the entire dataset. With proper partitioning, you operate on manageable chunks while keeping the database responsive.
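
As a quick illustration of partition pruning, here is a minimal sketch using plain declarative partitioning (the metrics table and its partitions are hypothetical, not managed by pg_partman):

-- A hypothetical table partitioned by month
CREATE TABLE metrics (
  recorded_at TIMESTAMP NOT NULL,
  value DOUBLE PRECISION
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2025_11 PARTITION OF metrics
  FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');
CREATE TABLE metrics_2025_12 PARTITION OF metrics
  FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

-- The plan should reference only metrics_2025_12; the November partition is pruned
EXPLAIN SELECT count(*) FROM metrics WHERE recorded_at >= '2025-12-01';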

Getting Started with pg_partman

First, install the pg_partman extension on your PostgreSQL server. For AWS RDS, it's available in the extensions list. For self-hosted PostgreSQL, install via your package manager or compile from source.

Once installed, create the extension in your database:

CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;

pg_partman installs its functions and configuration tables into the schema you choose at installation time (partman here). Verify the installation by querying pg_extension:

SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_partman';

Creating Partitioned Tables

The simplest approach is to start with a new table. Declare the table as partitioned on your control column, then use pg_partman's create_parent function to start managing its partitions:

-- Create the base table, partitioned on created_at
CREATE TABLE events (
  id BIGSERIAL NOT NULL,
  created_at TIMESTAMP NOT NULL,
  event_type VARCHAR(50),
  user_id BIGINT,
  payload JSONB,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Enable time-based partitioning
SELECT partman.create_parent(
  p_parent_table => 'public.events',
  p_control => 'created_at',
  p_type => 'range',
  p_interval => '1 day',
  p_premake => 7
);

This creates daily partitions with 7 days pre-created ahead of time. The control column (created_at) determines which partition receives each row. The p_premake => 7 parameter ensures new partitions exist before data arrives, so rows never fall through to the default partition.
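
To confirm what create_parent built, you can list the child partitions through the system catalogs (a quick check using only core PostgreSQL):

-- List the child partitions attached to public.events
SELECT inhrelid::regclass AS child_partition
FROM pg_inherits
WHERE inhparent = 'public.events'::regclass
ORDER BY inhrelid::regclass::text;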

For time-based partitioning, pass any interval string such as '1 day', '1 week', or '1 month'. For serial-based partitioning on integer IDs, pass a numeric interval such as p_interval => '1000000' to start a new partition every 1 million values.
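
For reference, an integer-based setup might look like the following sketch (the page_views table is hypothetical, and the syntax assumes pg_partman 5.x, where intervals are passed as text):

-- Hypothetical table partitioned by id range, 1 million ids per partition
CREATE TABLE page_views (
  id BIGINT NOT NULL,
  url TEXT,
  viewed_at TIMESTAMP,
  PRIMARY KEY (id)
) PARTITION BY RANGE (id);

SELECT partman.create_parent(
  p_parent_table => 'public.page_views',
  p_control => 'id',
  p_type => 'range',
  p_interval => '1000000'
);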

Automating Partition Maintenance

Partitions must be created regularly and old partitions removed according to retention policy. pg_partman integrates seamlessly with pg_cron for automated maintenance. First, ensure pg_cron is installed and enabled:

CREATE EXTENSION IF NOT EXISTS pg_cron;
GRANT USAGE ON SCHEMA cron TO postgres;

Now schedule the maintenance function to run daily:

SELECT cron.schedule(
  'maintain-events-partitions',
  '5 0 * * *',  -- Run at 00:05 UTC daily
  'SELECT partman.run_maintenance(''public.events'')'
);

The cron expression follows standard Unix cron format. This schedule creates new partitions and removes old ones according to your retention policy, ensuring your partition structure stays ahead of data growth.
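
To verify that the job is registered and running, you can query pg_cron's own tables (cron.job_run_details requires a reasonably recent pg_cron release):

-- Scheduled jobs
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Recent runs and their outcomes
SELECT jobid, status, return_message, start_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 5;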

Setting Retention Policies

Retention policies automatically remove old data. Define how long to keep partitions by updating pg_partman's part_config table:

-- Keep data for 90 days
UPDATE partman.part_config
SET retention = '90 days',
    retention_keep_table = true
WHERE parent_table = 'public.events';

The retention_keep_table option determines what happens to expired partitions. Setting it to true detaches them from the partition set but keeps the tables so you can archive the data; setting it to false drops each expired partition along with its data.

You can also move expired partitions into a separate schema instead of leaving them in place:

UPDATE partman.part_config
SET retention = '90 days',
    retention_schema = 'archive'
WHERE parent_table = 'public.events';

With retention_schema set, expired partitions are detached and moved into that schema during maintenance, keeping historical data available beyond the standard retention window without cluttering the active partition set.
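
Retention is enforced during maintenance runs, so a new policy takes effect on the next scheduled job. To apply it immediately, you can trigger maintenance for the table by hand:

SELECT partman.run_maintenance('public.events');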

Monitoring Partition Health

Query the part_config table to monitor partition status:

SELECT parent_table,
       control,
       partition_interval,
       premake,
       retention,
       retention_keep_table
FROM partman.part_config
WHERE parent_table = 'public.events';

Use the show_partitions function to list every existing child partition along with its size:

SELECT partition_schemaname,
       partition_tablename,
       pg_size_pretty(pg_relation_size(
         format('%I.%I', partition_schemaname, partition_tablename)::regclass
       )) AS size
FROM partman.show_partitions('public.events')
ORDER BY partition_tablename;

This reveals how data is distributed across partitions and helps identify imbalances or anomalies in your data.
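
For a single child partition, the show_partition_info function returns the boundaries it covers (the child table name below follows pg_partman's default naming convention and is illustrative):

-- Inspect the range covered by one daily partition
SELECT *
FROM partman.show_partition_info('public.events_p20251203');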

Best Practices

Successful partitioning requires thoughtful design:

  • Choose partition keys that align with your query patterns - typically timestamps for time-series data
  • Include the partition key in your primary key to satisfy PostgreSQL's constraint requirements
  • Pre-create about a week of partitions (p_premake) so new rows never arrive before their target partition exists
  • Set retention policies conservatively to avoid surprise data loss
  • Test maintenance scripts on staging environments before production deployment
  • Monitor partition creation and removal logs to catch issues early
  • Rethink index strategy - smaller per-partition indexes are cheaper to build and maintain than one monolithic index on an unpartitioned table

Review the template table for each partition set (its name is stored in the template_table column of partman.part_config) to see which properties pg_partman copies onto new partitions - chiefly those PostgreSQL does not propagate from a partitioned parent. Customize it to match your application's specific requirements.
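
For ordinary indexes you usually do not need the template at all: an index created on the partitioned parent cascades to every existing partition, and PostgreSQL adds it automatically to partitions created later (the index name here is illustrative):

-- Cascades to all current and future partitions of events
CREATE INDEX idx_events_user_id ON events (user_id);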

Migrating Existing Tables

Partitioning an existing large table requires care. Create a new partitioned table, copy data, then swap table names:

-- Create a new partitioned table with the same structure
CREATE TABLE events_partitioned (LIKE events INCLUDING ALL)
PARTITION BY RANGE (created_at);

-- Apply partitioning
SELECT partman.create_parent(
  p_parent_table => 'public.events_partitioned',
  p_control => 'created_at',
  p_type => 'range',
  p_interval => '1 day'
);

-- Copy existing data
INSERT INTO events_partitioned SELECT * FROM events;

-- Swap table names
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_partitioned RENAME TO events;

After verifying data integrity, drop the old table. For massive tables, consider partitioning data in batches to reduce memory pressure and lock contention.
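
If some copied rows are older than the partitions create_parent pre-made, they land in the default partition. pg_partman can move them into properly bounded child partitions in batches; a minimal sketch, assuming the partman schema used earlier:

-- Move rows out of the default partition in batches, committing as it goes
CALL partman.partition_data_proc('public.events');

-- Reclaim space and refresh planner statistics afterwards
VACUUM ANALYZE events;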

Conclusion

pg_partman transforms partition management from a manual maintenance burden into an automated system that scales effortlessly. By splitting large tables into manageable chunks and cleaning up old data systematically, it unlocks dramatic performance improvements for growing datasets.

Whether you're handling time-series events, logs, or transactional history, pg_partman enables you to focus on application logic while your database infrastructure handles the complexity of scale automatically.