Materialized views work great until the underlying query is too expensive to re-run from scratch. "Daily revenue per country over the last 5 years" might take 5 minutes to compute. If you need it refreshed every 5 minutes, you have a problem.
Incremental aggregation is the answer: only update the parts that changed, not the whole view. Postgres does not have native incremental materialized views (yet — there are extensions and proposals). You build it yourself.
The pattern
-- The aggregation table
CREATE TABLE daily_revenue (
    day DATE NOT NULL,
    country TEXT NOT NULL,
    order_count INTEGER NOT NULL DEFAULT 0,
    revenue_cents BIGINT NOT NULL DEFAULT 0,
    last_updated_order_id BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (day, country)
);
-- Periodic incremental update
WITH new_orders AS (
    SELECT
        date_trunc('day', o.created_at)::date AS day,
        c.country,
        count(*) AS order_count,
        sum(o.total_cents) AS revenue_cents,
        max(o.id) AS max_id
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.id > (SELECT coalesce(max(last_updated_order_id), 0) FROM daily_revenue)
    GROUP BY 1, 2
)
INSERT INTO daily_revenue (day, country, order_count, revenue_cents, last_updated_order_id)
SELECT day, country, order_count, revenue_cents, max_id
FROM new_orders
ON CONFLICT (day, country) DO UPDATE SET
    order_count = daily_revenue.order_count + EXCLUDED.order_count,
    revenue_cents = daily_revenue.revenue_cents + EXCLUDED.revenue_cents,
    last_updated_order_id = greatest(daily_revenue.last_updated_order_id, EXCLUDED.last_updated_order_id);
The key idea: track a watermark (last_updated_order_id) and only process rows past that watermark each time.
The upsert (ON CONFLICT ... DO UPDATE) handles existing days by adding to them. New days are inserted.
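If you want the refresh scheduled from inside the database, pg_cron is one option. A sketch, assuming the pg_cron extension is installed and the statement above has been wrapped in a function named refresh_daily_revenue() (a hypothetical name, not defined here):
-- Hypothetical: refresh_daily_revenue() wraps the WITH new_orders ... INSERT
-- statement above. pg_cron runs it every 5 minutes.
SELECT cron.schedule(
    'refresh-daily-revenue',          -- job name
    '*/5 * * * *',                    -- every 5 minutes
    'SELECT refresh_daily_revenue()'
);
Any scheduler works here; cron on a host, your job queue, or pg_cron are all fine as long as only one refresh runs at a time.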
Watermark choices
The watermark is whatever uniquely orders the source data:
- Auto-incrementing ID: simplest. Works if rows are inserted strictly in ID order.
- Created timestamp + ID: more robust if IDs can be inserted out of order (e.g., from multiple writers); see the sketch below.
- A dedicated processed_at column on the source: explicit but adds storage.
For most cases, an integer ID is enough.
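A minimal sketch of the timestamp + ID variant, using a hypothetical one-row watermark table (the names are illustrative, not part of the schema above; assumes created_at is a timestamptz):
-- Hypothetical one-row watermark table.
CREATE TABLE revenue_watermark (
    last_created_at TIMESTAMPTZ NOT NULL DEFAULT '-infinity',
    last_order_id   BIGINT      NOT NULL DEFAULT 0
);
INSERT INTO revenue_watermark DEFAULT VALUES;

-- Row-value comparison orders rows by (created_at, id): ties on the
-- timestamp are broken by the ID.
SELECT o.id, o.created_at, o.total_cents
FROM orders o
WHERE (o.created_at, o.id) >
      (SELECT last_created_at, last_order_id FROM revenue_watermark);
After each batch, set the watermark row to the largest (created_at, id) you processed.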
Handling updates and deletes
The pattern above only handles inserts. If source rows can be UPDATEd or DELETEd, you need more:
For updates, the source row's old values were already added to the aggregate; adding the new values on top double-counts unless you also subtract the old ones.
For deletes, the deleted row's values are still in the aggregate, and nothing past the watermark will ever remove them.
For write-heavy aggregations where this happens, the answer is usually one of:
- Use triggers to maintain the aggregate: a trigger on INSERT/UPDATE/DELETE updates the aggregate atomically. Most reliable, but adds write overhead.
- Recompute periodically: once a day, recompute the aggregate from scratch to fix any drift. Acceptable if eventual consistency is OK.
- Use a soft-delete pattern in the source: never actually delete; mark rows deleted and filter them out of the aggregate. A sketch follows this list.
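A minimal soft-delete sketch (the deleted_at column is an assumption, not part of the schema above). Note that marking a row deleted is itself an UPDATE, so this only keeps the aggregate honest when paired with triggers or periodic recomputation:
-- Hypothetical soft-delete column; assumes you control the orders schema.
ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ;

-- "Deleting" an order becomes an update.
UPDATE orders SET deleted_at = now() WHERE id = 12345;

-- Recomputations filter deleted rows out of the aggregate.
SELECT o.created_at::date AS day, c.country,
       count(*) AS order_count, sum(o.total_cents) AS revenue_cents
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.deleted_at IS NULL
GROUP BY 1, 2;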
Trigger-based incremental aggregation
The most robust pattern:
CREATE OR REPLACE FUNCTION update_daily_revenue()
RETURNS TRIGGER AS $$
BEGIN
    IF (TG_OP = 'INSERT') THEN
        INSERT INTO daily_revenue (day, country, order_count, revenue_cents)
        SELECT
            NEW.created_at::date,
            (SELECT country FROM customers WHERE id = NEW.customer_id),
            1,
            NEW.total_cents
        ON CONFLICT (day, country) DO UPDATE
        SET order_count = daily_revenue.order_count + 1,
            revenue_cents = daily_revenue.revenue_cents + NEW.total_cents;
    ELSIF (TG_OP = 'UPDATE') THEN
        -- An update is a delete of the old values plus an insert of the new ones.
        UPDATE daily_revenue
        SET order_count = order_count - 1,
            revenue_cents = revenue_cents - OLD.total_cents
        WHERE day = OLD.created_at::date
          AND country = (SELECT country FROM customers WHERE id = OLD.customer_id);
        INSERT INTO daily_revenue (day, country, order_count, revenue_cents)
        SELECT
            NEW.created_at::date,
            (SELECT country FROM customers WHERE id = NEW.customer_id),
            1,
            NEW.total_cents
        ON CONFLICT (day, country) DO UPDATE
        SET order_count = daily_revenue.order_count + 1,
            revenue_cents = daily_revenue.revenue_cents + NEW.total_cents;
    ELSIF (TG_OP = 'DELETE') THEN
        UPDATE daily_revenue
        SET order_count = order_count - 1,
            revenue_cents = revenue_cents - OLD.total_cents
        WHERE day = OLD.created_at::date
          AND country = (SELECT country FROM customers WHERE id = OLD.customer_id);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_aggregate_trigger
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION update_daily_revenue();
Every write to orders updates the aggregate. The aggregate is always current.
The cost: every INSERT/DELETE on orders does extra work. For high-throughput tables, this is a real overhead. Profile to confirm.
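One quick way to profile it: EXPLAIN ANALYZE on a single write reports per-trigger execution time. A sketch (column list and values are illustrative; the rollback keeps the test row out of your data):
-- EXPLAIN ANALYZE prints a "Trigger orders_aggregate_trigger: time=... calls=1"
-- line: that is the per-row cost of the trigger.
BEGIN;
EXPLAIN ANALYZE
INSERT INTO orders (customer_id, total_cents, created_at)
VALUES (1, 1999, now());
ROLLBACK;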
Hybrid: real-time + reconciliation
A pragmatic pattern combining both:
- Triggers maintain the aggregate in real time for current-day data.
- A nightly job recomputes the aggregate from scratch for the previous day, fixing any drift.
This gives you near-real-time accuracy with periodic correction. The drift correction handles edge cases (writes made with triggers disabled, such as bulk loads, or application-level bugs that bypassed them).
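A minimal sketch of the nightly reconciliation job: rebuild yesterday's rows from the source inside one transaction, so readers never see a half-rebuilt day (assumes created_at and current_date agree on time zone):
-- Rebuild yesterday's aggregate rows from scratch; run once a night.
BEGIN;

DELETE FROM daily_revenue
WHERE day = current_date - 1;

INSERT INTO daily_revenue (day, country, order_count, revenue_cents, last_updated_order_id)
SELECT
    o.created_at::date,
    c.country,
    count(*),
    sum(o.total_cents),
    max(o.id)
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= current_date - 1
  AND o.created_at <  current_date
GROUP BY 1, 2;

COMMIT;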
Performance characteristics
Incremental aggregation typically:
- Costs 5-20% of write throughput on the source table (trigger overhead).
- Reduces aggregate query latency by 100-1000x compared to recomputing.
- Pays off when the source table is at least 10M rows and the aggregate is queried at least hundreds of times per day.
For smaller workloads, plain materialized views are simpler and good enough.
Extension alternatives
For specific use cases, extensions handle this for you:
- TimescaleDB continuous aggregates: incremental materialized views for time-bucketed data. The killer feature for time-series workloads; a sketch follows below.
- pg_ivm (incremental view maintenance): a Postgres extension that adds incrementally-maintained materialized views. Active development.
If one of these fits your use case, use it. Custom incremental aggregation is more code than necessary for problems already solved by extensions.
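For flavor, here is the daily_revenue rollup as a continuous-aggregate sketch (assumes the timescaledb extension is installed, orders has been converted to a hypertable, and a TimescaleDB version recent enough to allow the join, roughly 2.10+):
-- Sketch: the daily revenue rollup as a TimescaleDB continuous aggregate.
CREATE MATERIALIZED VIEW daily_revenue_ca
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', o.created_at) AS day,
    c.country,
    count(*) AS order_count,
    sum(o.total_cents) AS revenue_cents
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY time_bucket('1 day', o.created_at), c.country;

-- Refresh incrementally in the background.
SELECT add_continuous_aggregate_policy('daily_revenue_ca',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');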
What I do for new aggregation needs
1. Start with a materialized view + scheduled refresh. Simplest.
2. If refresh becomes too slow, switch to incremental aggregation with watermarks.
3. If the source has updates/deletes that affect the aggregate, add triggers.
4. If the data is time-series-shaped, evaluate TimescaleDB.
The complexity grows with the requirements. Most teams never need step 3 or 4.