10 min read

PostgreSQL Multi-Tenant Performance: Tenant Skew, Partial Indexes, and Noisy Neighbors

Multi-tenant PostgreSQL performance fails when average metrics hide tenant skew. The fix is measuring query plans, indexes, and latency by tenant class.

Multi-tenant PostgreSQL systems rarely fail evenly. One tenant owns 60 percent of the rows, another tenant has a tiny corpus but strict latency expectations, and the average dashboard says everything is fine.

Tenant skew turns normal query tuning into a distribution problem. The same SQL can be perfect for small tenants and terrible for large tenants. Or it can be tuned for the largest tenant and waste effort for everyone else.

The first step is to stop treating tenant_id as just another filter. In a SaaS database, tenant_id is often the workload boundary.

Measure by tenant class

Split tenants by row count, traffic, and feature usage. Large, medium, small, and noisy tenants should have separate latency and plan checks. A global p95 can hide the only customer who is actually in trouble.

SELECT
  tenant_id,
  count(*) AS rows
FROM events
GROUP BY tenant_id
ORDER BY rows DESC
LIMIT 20;

Index for the real tenant query

Tenant-scoped queries usually need tenant_id early in the index, followed by the filter or sort that decides the access pattern.

CREATE INDEX CONCURRENTLY events_tenant_created_idx
ON events (tenant_id, created_at DESC);

Partial indexes can protect hot paths

If the application constantly queries active records, a partial index can keep the hot path smaller and cheaper. This works best when the predicate is stable and literal in the query.

CREATE INDEX CONCURRENTLY tasks_active_tenant_due_idx
ON tasks (tenant_id, due_at)
WHERE status = 'active';

When partitioning enters the conversation

Partitioning by tenant is tempting, but it can create too many partitions if tenant count is large. It is more useful for tenant tiers, region boundaries, or a small number of very large tenants that need isolated maintenance and query pruning.

The SaaS checklist

  • Track p95 and p99 by tenant class.
  • Test EXPLAIN plans with large and small tenant IDs.
  • Watch noisy tenants separately from average load.
  • Keep tenant_id visible in predicates.
  • Avoid one-off indexes for every small tenant.
  • Move extreme tenants to dedicated paths only when measurements justify it.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.