Rollup Tables in Postgres: The Pattern That Replaces Most BI Queries

Rollup tables are pre-aggregated summaries of source data, kept up to date via triggers or scheduled jobs. They turn dashboard queries from seconds into milliseconds.

A rollup table is a pre-aggregated summary of source data. "Daily revenue by country" rolled up from individual orders. "Hourly active users" rolled up from session events. The rollup is much smaller than the source, and queries against it are much faster.

For any application with analytics or dashboards, rollups are usually the answer to "this query is slow." Materialized views are one form of rollup; explicit rollup tables are another. The patterns are similar but the operational characteristics differ.

When to roll up

The right shape:

  • Source data is large (millions to billions of rows).
  • Aggregations over time-bucketed dimensions are queried frequently.
  • Some staleness in the rollup is acceptable.
  • Running the aggregation against the source costs more than maintaining the rollup does.

This covers most analytics and dashboard use cases.

The basic shape

-- Source: events table with millions of rows
CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT NOT NULL,
  event_type TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Rollup: pre-aggregated by hour and event type
CREATE TABLE event_rollup_hourly (
  hour_start TIMESTAMPTZ NOT NULL,
  event_type TEXT NOT NULL,
  event_count INTEGER NOT NULL DEFAULT 0,
  unique_users INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (hour_start, event_type)
);

The rollup is orders of magnitude smaller than the source. A 100-million-row source with 50 distinct event types and a year of data rolls up to at most about 440,000 rows (8,760 hours × 50 event types), a reduction of more than 200x.
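
Dashboard queries then read the rollup instead of scanning the source. A minimal example of the kind of query this makes cheap:

-- Hourly counts for the last 24 hours: touches at most 24 rows per event type.
SELECT hour_start, event_type, event_count
FROM event_rollup_hourly
WHERE hour_start >= date_trunc('hour', now()) - interval '24 hours'
ORDER BY hour_start, event_type;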

Maintaining the rollup

Three patterns:

1. Scheduled batch updates. A cron job rolls up the previous hour's data:

INSERT INTO event_rollup_hourly (hour_start, event_type, event_count, unique_users)
SELECT
  date_trunc('hour', created_at),
  event_type,
  count(*),
  count(DISTINCT user_id)
FROM events
WHERE created_at >= date_trunc('hour', now() - interval '1 hour')
  AND created_at < date_trunc('hour', now())
GROUP BY 1, 2
ON CONFLICT (hour_start, event_type) DO UPDATE
SET event_count = EXCLUDED.event_count,
    unique_users = EXCLUDED.unique_users;

Run every hour. Each run processes the previous hour. Up-to-1-hour staleness.
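
If you want the schedule to live inside the database, one option is pg_cron (assuming that extension is available; any external scheduler works just as well). A sketch, with an illustrative wrapper function:

-- Assumes the pg_cron extension is installed and enabled in this database.
-- The function name is illustrative; it just wraps the upsert above.
CREATE OR REPLACE FUNCTION rollup_events_previous_hour() RETURNS void AS $$
  INSERT INTO event_rollup_hourly (hour_start, event_type, event_count, unique_users)
  SELECT date_trunc('hour', created_at), event_type, count(*), count(DISTINCT user_id)
  FROM events
  WHERE created_at >= date_trunc('hour', now() - interval '1 hour')
    AND created_at < date_trunc('hour', now())
  GROUP BY 1, 2
  ON CONFLICT (hour_start, event_type) DO UPDATE
  SET event_count = EXCLUDED.event_count,
      unique_users = EXCLUDED.unique_users;
$$ LANGUAGE sql;

-- Five minutes past every hour, roll up the hour that just ended.
SELECT cron.schedule('rollup-events-hourly', '5 * * * *',
                     'SELECT rollup_events_previous_hour()');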

2. Trigger-based updates. A trigger on the source updates the rollup on every insert:

CREATE OR REPLACE FUNCTION update_hourly_rollup()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO event_rollup_hourly (hour_start, event_type, event_count, unique_users)
  VALUES (
    date_trunc('hour', NEW.created_at),
    NEW.event_type,
    1,
    1  -- placeholder; unique_users is not maintained by this trigger (see the uniques section below)
  )
  ON CONFLICT (hour_start, event_type) DO UPDATE
  SET event_count = event_rollup_hourly.event_count + 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_rollup_trigger
AFTER INSERT ON events
FOR EACH ROW EXECUTE FUNCTION update_hourly_rollup();

The rollup is always current. The cost: every event INSERT does extra work, and concurrent inserts for the same hour and event type contend on the same rollup row.

3. Hybrid: triggers for current data, recompute periodically for older data. Triggers handle the current hour; a daily job recomputes the previous day from scratch to fix any drift.

This is the pattern I recommend most often. Real-time enough for dashboards; self-correcting against bugs in trigger logic.
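
A sketch of the daily correction job in this hybrid: recompute yesterday's buckets from the source in one transaction, erasing whatever the triggers wrote.

BEGIN;

-- Wipe yesterday's buckets, then rebuild them from the source.
-- Any drift the triggers introduced during the day is erased here.
DELETE FROM event_rollup_hourly
WHERE hour_start >= date_trunc('day', now() - interval '1 day')
  AND hour_start <  date_trunc('day', now());

INSERT INTO event_rollup_hourly (hour_start, event_type, event_count, unique_users)
SELECT date_trunc('hour', created_at), event_type, count(*), count(DISTINCT user_id)
FROM events
WHERE created_at >= date_trunc('day', now() - interval '1 day')
  AND created_at <  date_trunc('day', now())
GROUP BY 1, 2;

COMMIT;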

Multiple granularities

For flexible querying, maintain rollups at multiple granularities:

CREATE TABLE event_rollup_daily (
  day DATE NOT NULL,
  event_type TEXT NOT NULL,
  event_count BIGINT NOT NULL,
  unique_users INTEGER NOT NULL,
  PRIMARY KEY (day, event_type)
);

CREATE TABLE event_rollup_monthly (
  month DATE NOT NULL,
  event_type TEXT NOT NULL,
  event_count BIGINT NOT NULL,
  unique_users INTEGER NOT NULL,
  PRIMARY KEY (month, event_type)
);

Dashboard queries use the appropriate granularity:

  • Last 24 hours? Use hourly.
  • Last 30 days? Use daily.
  • Last 12 months? Use monthly.

Maintenance: each level rolls up from the level below (daily aggregates the hourlies, monthly aggregates the dailies). Or each rolls up directly from the source.
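
A sketch of the first option for the daily table, run after yesterday's hourlies are final. Event counts are additive across hourly buckets; unique users are not (more on that below), so they are recomputed from the source here:

-- Roll yesterday's hourlies into the daily table. Counts are summed from the
-- hourly rollup; unique users come from the source because uniques don't add.
INSERT INTO event_rollup_daily (day, event_type, event_count, unique_users)
SELECT day, event_type, counts.event_count, uniques.unique_users
FROM (
  SELECT hour_start::date AS day, event_type, sum(event_count) AS event_count
  FROM event_rollup_hourly
  WHERE hour_start >= date_trunc('day', now() - interval '1 day')
    AND hour_start <  date_trunc('day', now())
  GROUP BY 1, 2
) counts
JOIN (
  SELECT created_at::date AS day, event_type, count(DISTINCT user_id) AS unique_users
  FROM events
  WHERE created_at >= date_trunc('day', now() - interval '1 day')
    AND created_at <  date_trunc('day', now())
  GROUP BY 1, 2
) uniques USING (day, event_type)
ON CONFLICT (day, event_type) DO UPDATE
SET event_count = EXCLUDED.event_count,
    unique_users = EXCLUDED.unique_users;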

The storage savings are huge, and query cost scales with the size of the rollup rather than the size of the source.

Handling unique counts correctly

The tricky part: counting unique users. You cannot simply add unique counts across hours to get a daily unique count. A user who is active every hour of the day counts once in each of 24 hourly buckets but only once for the day.

Three approaches:

1. Recompute uniques at query time from the source. Slow but accurate.

2. Maintain a separate unique_users_set per bucket as an array or HyperLogLog sketch:

CREATE EXTENSION hll;  -- HyperLogLog
ALTER TABLE event_rollup_hourly
ADD COLUMN unique_users_hll hll DEFAULT hll_empty();

HLL is approximate but mergeable — daily unique count is the union of 24 hourly HLLs.
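
A sketch of how the hourly sketches get filled and merged, using functions from the postgresql-hll extension (hll_hash_bigint, hll_add_agg, hll_union_agg, hll_cardinality):

-- Populate the hourly HLL after the hourly upsert has created the rows.
UPDATE event_rollup_hourly r
SET unique_users_hll = agg.users_hll
FROM (
  SELECT date_trunc('hour', created_at) AS hour_start,
         event_type,
         hll_add_agg(hll_hash_bigint(user_id)) AS users_hll
  FROM events
  WHERE created_at >= date_trunc('hour', now() - interval '1 hour')
    AND created_at <  date_trunc('hour', now())
  GROUP BY 1, 2
) agg
WHERE r.hour_start = agg.hour_start AND r.event_type = agg.event_type;

-- Daily uniques for yesterday: union the 24 hourly sketches, read the estimate.
SELECT event_type,
       hll_cardinality(hll_union_agg(unique_users_hll)) AS approx_unique_users
FROM event_rollup_hourly
WHERE hour_start >= date_trunc('day', now() - interval '1 day')
  AND hour_start <  date_trunc('day', now())
GROUP BY event_type;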

3. Accept some staleness: maintain hourly uniques incrementally, and recompute daily uniques from the source on a schedule (weekly, say), treating the source as the truth.

For most dashboards, approximate uniques are fine. For billing or compliance, they need to be exact.

Rollup vs materialized view

Materialized views are a built-in Postgres feature that gives you a similar capability:

CREATE MATERIALIZED VIEW event_rollup_hourly_mv AS
SELECT date_trunc('hour', created_at) AS hour, event_type, count(*) AS event_count
FROM events
GROUP BY 1, 2;
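
Refreshing it is a single command. A plain refresh locks the view against reads while it rebuilds; with a unique index on the view (not part of the definition above), the refresh can run concurrently:

-- Full recompute; takes an exclusive lock, so reads block while it runs.
REFRESH MATERIALIZED VIEW event_rollup_hourly_mv;

-- With a unique index on the view, the refresh can run without blocking reads.
CREATE UNIQUE INDEX ON event_rollup_hourly_mv (hour, event_type);
REFRESH MATERIALIZED VIEW CONCURRENTLY event_rollup_hourly_mv;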

The difference:

  • Materialized view: refresh is full re-compute. Easy. Limited control over partial updates.
  • Rollup table: explicit table, explicit maintenance. More flexible but more code.

For most cases, a materialized view is the right starting point. Rollup tables become the better choice when:

  • The full re-compute is too expensive.
  • You need fine-grained control over which buckets are recomputed.
  • The update pattern needs to be incremental.

A worked example

A team I worked with had dashboard queries against a 200M-row events table. The most common query, "hourly events for the last 7 days," took 4 seconds.

Fix:

  1. Created event_rollup_hourly table.
  2. Backfilled with INSERT ... SELECT for the last 30 days (sketch after this list).
  3. Set up scheduled job to roll up the previous hour every 15 minutes.
  4. Changed the application's dashboard queries to hit the rollup.
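
Step 2's backfill is the same aggregation as the scheduled job, run over a wider window; roughly:

-- One-time backfill of the last 30 days into the rollup.
-- DO NOTHING leaves any buckets the scheduled job already wrote untouched.
INSERT INTO event_rollup_hourly (hour_start, event_type, event_count, unique_users)
SELECT date_trunc('hour', created_at), event_type, count(*), count(DISTINCT user_id)
FROM events
WHERE created_at >= date_trunc('hour', now() - interval '30 days')
GROUP BY 1, 2
ON CONFLICT (hour_start, event_type) DO NOTHING;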

New dashboard query latency: 30ms. Data can be up to 15 minutes stale, which the dashboard already communicated via a "data updated every 15 minutes" indicator.

The rollup table is 50,000 rows after a year of accumulation. Tiny by Postgres standards.

What I commit to with rollups

  1. The rollup is documented — what it is for, what staleness, how it is maintained.
  2. Maintenance is monitored — alerts if the rollup falls behind.
  3. The rollup query is tested — does the result match the source for a sample window (a check like the one sketched after this list).
  4. Periodic full recompute (weekly or monthly) catches drift from incremental maintenance bugs.
  5. The rollup has appropriate indexes for query patterns.
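
For point 3, a minimal drift check: recompute a recent window from the source and diff it against the rollup (extend with unique_users as needed).

-- Recompute yesterday from the source and diff it against the rollup.
-- Any rows returned indicate drift in the incremental maintenance.
WITH src AS (
  SELECT date_trunc('hour', created_at) AS hour_start,
         event_type,
         count(*) AS event_count
  FROM events
  WHERE created_at >= date_trunc('day', now() - interval '1 day')
    AND created_at <  date_trunc('day', now())
  GROUP BY 1, 2
),
roll AS (
  SELECT hour_start, event_type, event_count
  FROM event_rollup_hourly
  WHERE hour_start >= date_trunc('day', now() - interval '1 day')
    AND hour_start <  date_trunc('day', now())
)
SELECT hour_start, event_type,
       roll.event_count AS rollup_count,
       src.event_count  AS source_count
FROM roll
FULL OUTER JOIN src USING (hour_start, event_type)
WHERE roll.event_count IS DISTINCT FROM src.event_count;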

With these in place, rollup tables are a dependable foundation for fast analytics. Without them, rollups accumulate drift that makes the data subtly wrong over time.