Measuring Table Bloat in Postgres: The Question, Answered Honestly
Bloat measurement is useful only when it changes a cleanup decision. Learn when estimates are good enough, when pgstattuple is worth the cost, and what action follows.
Notes for the problems that show up after launch: bad plans, awkward migrations, index debt, vacuum pressure, replica lag, and the small decisions that make PostgreSQL easier to operate.
Postgres memory tuning is a budget problem. shared_buffers, OS cache, work_mem, maintenance work, and connection count all spend the same RAM.
Disk-full on a Postgres server is rarely just one thing. WAL, temp files, logs, and delayed cleanup arrive together. Recovery depends mostly on what you prepared beforehand.
Connection storms look like database outages but usually start in pools, retries, autoscaling, deploys, and health checks. The fix is shaping pressure before it reaches Postgres.
PITR lets you restore the database to any moment within your retention window. The feature is well-documented; the operational reality is messier.
Most teams have backups. Most teams have never restored from one. The first time you test, you find out the backup was missing something.
Logical and physical backups solve different problems. Most teams need both, but only have one. Here is the actual decision framework.
WAL accumulates fast. The retention policy is a tradeoff between PITR window and storage cost. Here is how I think about it.
Pooling is about protecting Postgres from connection shape, not just increasing throughput. The hard parts are transaction mode, prepared statements, bursts, and failure behavior.
Logical replication is more flexible than physical replication and more fragile. Use it when you need partial replication, cross-version replication, or selective sync. Don't use it for HA.
Physical replication slots make sure replicas can catch up after a disconnect. They also make sure your primary's disk fills if a replica is gone and forgotten.
Read replicas are eventually consistent. The application's assumption that "after I wrote, my read should see it" is often wrong by milliseconds, sometimes by minutes.