
AWS RDS Parameter Groups: The Postgres Settings That Actually Matter

RDS exposes hundreds of Postgres parameters through Parameter Groups. About a dozen of them are worth tuning. Here are the ones I always change and why.

RDS gives you a Parameter Group with several hundred Postgres parameters. The default values are tuned for a generic mid-sized workload, which means they are slightly wrong for everyone. Most teams I work with leave them alone and live with the defaults; some change a couple based on a guide they found; very few do a deliberate review.

A deliberate review is worth a few hours of one engineer's time. The win is usually a 20-50% improvement in workload throughput at no infrastructure cost. Here are the parameters I always look at and the rationale.

Memory: shared_buffers, work_mem, effective_cache_size

shared_buffers — Postgres's main page cache. Default on RDS is {DBInstanceClassMemory/32768}, which works out to ~25% of memory on most instances. Reasonable.

For read-heavy workloads on instances with >16GB RAM, bumping to 30-35% sometimes helps. For write-heavy workloads, leaving at 25% is usually right because more shared buffers means more dirty pages to flush at checkpoint time.

On smaller instances (<8GB), the default tends to be slightly low. Move to 30%.
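
RDS Parameter Group values can be formulas. shared_buffers is measured in 8kB pages, so ~30% of memory works out to a divisor of 27307 (0.30/8192 ≈ 1/27307). A sketch in the same notation RDS uses for the default:

shared_buffers = {DBInstanceClassMemory/27307}  -- ~30% of memory, in 8kB pages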

work_mem — per-operation memory for sorts and hash tables. RDS default is 4MB. This is too low for almost any production workload.

For OLTP, set globally to 16-32MB. For dashboards/analytics, set per-session via SET LOCAL work_mem = '256MB' for the heavy queries.
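
SET LOCAL only takes effect inside a transaction block and reverts at COMMIT or ROLLBACK, which keeps the bump scoped to the one heavy statement. A sketch, with a hypothetical reporting query:

BEGIN;
SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT
SELECT customer_id, sum(total) FROM orders GROUP BY customer_id;  -- hypothetical heavy aggregation
COMMIT;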

The trap: work_mem is per-operation, not per-query. A query with 5 hash joins can allocate 5x work_mem, and that multiplies across concurrent connections: 200 connections × 5 operations × 32MB is 32GB in the worst case. Do not set work_mem globally above what your peak concurrent connection count can absorb.
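
To find the queries that actually need more work_mem, look for temp-file spills. A sketch against pg_stat_statements (loaded later under shared_preload_libraries):

-- Queries spilling sorts/hashes to disk: candidates for more work_mem
SELECT query, calls, temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;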

effective_cache_size — the planner's estimate of total OS+Postgres cache. RDS default is {DBInstanceClassMemory*3/4}, which is about right.

This does not allocate memory; it is purely a planner hint. The planner uses it to decide whether to expect index pages to be cached. Setting it correctly produces better plans, especially for index-only scans.

Concurrency: max_connections, max_worker_processes

max_connections — RDS scales this with instance size, typically LEAST({DBInstanceClassMemory/9531392}, 5000). For a db.m5.xlarge (16GB), that is roughly 1700.

This is too high for most applications. Each connection is a backend process that costs roughly 10MB of RAM. 1700 connections × 10MB = 17GB just for connection overhead, more than the 16GB instance has.

The right max_connections depends on your application. For most:

  • Run PgBouncer in front and set max_connections to 200-400; the pooler multiplexes application connections onto that smaller server-side pool.
  • Without PgBouncer, set it to (peak concurrent transactions × 1.2 + admin reserve).

Blindly raising max_connections is a common cause of OOM on RDS.
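
To size it from data rather than guesswork, compare actual usage against the ceiling at peak. A sketch:

-- How close are we to the ceiling right now?
SELECT count(*) AS connections_in_use,
       current_setting('max_connections')::int AS configured_max
FROM pg_stat_activity;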

max_worker_processes — the pool for parallel query workers and background workers such as logical replication. Default is 8. Raise to 16 if you run parallel scans on large tables. It is a static parameter, so the change takes effect at the next reboot.

Vacuum: autovacuum_*

The default autovacuum settings are conservative. Most production tables benefit from more aggressive defaults:

autovacuum_vacuum_scale_factor = 0.05  -- default 0.2
autovacuum_analyze_scale_factor = 0.02  -- default 0.1
autovacuum_naptime = '15s'  -- default 60s
autovacuum_max_workers = 6  -- default 3
autovacuum_vacuum_cost_limit = 2000  -- default 200
autovacuum_vacuum_cost_delay = '2ms'  -- already the default on PG 12+ (was 20ms before); pinned explicitly

This is roughly 4-10x more aggressive than the defaults, depending on the parameter. It runs autovacuum more often, with more workers, with less throttling. The trade is more I/O during vacuum, which is usually fine on modern instance storage.

For specific tables that are very high-churn or very large, override at the table level:

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
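
To decide which tables deserve overrides, rank them by dead-tuple buildup. A sketch:

-- Tables with the most dead tuples relative to live rows
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / greatest(n_live_tup, 1), 2) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;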

Checkpoint: max_wal_size, checkpoint_timeout

See the dedicated checkpoint-tuning post. Default max_wal_size = 1GB is too low for most workloads:

max_wal_size = '8GB' -- or higher for write-heavy
checkpoint_timeout = '15min'

This reduces how often checkpoints run and the latency spikes that come with each one.
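
To confirm the change worked, check how often checkpoints are triggered by WAL volume rather than the timer. On PG 16 and earlier the counters live in pg_stat_bgwriter (PG 17 moved them to pg_stat_checkpointer):

-- checkpoints_req (WAL-size-triggered) should be small relative to checkpoints_timed
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;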

Statement-level: statement_timeout, idle_in_transaction_session_timeout

Defaults are 0 (disabled). Set them:

statement_timeout = '60s' -- adjust per workload
idle_in_transaction_session_timeout = '5min'

These protect against runaway queries and connection-leak bugs. The values depend on your workload (analytical queries may need longer); the defaults of 0 are wrong for any production system.
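
The global value does not have to fit every client. Timeouts can be overridden per role, so a hypothetical analytics role can get a longer leash than the application user:

-- analytics_reader is a hypothetical role name
ALTER ROLE analytics_reader SET statement_timeout = '10min';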

Logging: log_min_duration_statement, log_lock_waits

For diagnostics:

log_min_duration_statement = '1s' -- log queries over 1 second
log_lock_waits = on
log_temp_files = 1024 -- log temp files over 1MB (value is in kB; catches sort spills)
log_autovacuum_min_duration = '1s'
log_checkpoints = on

These give you the diagnostic foundation when something goes wrong. RDS exports logs to CloudWatch; useful settings turn into useful CloudWatch queries.

SPL: shared_preload_libraries

The highest-stakes parameter that requires a reboot is shared_preload_libraries. The standard set:

shared_preload_libraries = 'pg_stat_statements,auto_explain,pg_buffercache,pg_cron'

Add what you need at instance creation if possible. Adding later requires a reboot, which is downtime.
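
After the reboot, verify the libraries actually loaded, and remember that pg_stat_statements still needs CREATE EXTENSION in each database you query it from:

SHOW shared_preload_libraries;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;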

Per-extension parameters

For pg_stat_statements:

pg_stat_statements.max = 10000
pg_stat_statements.track = all
pg_stat_statements.track_utility = off

For auto_explain:

auto_explain.log_min_duration = 5000  -- 5 seconds
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_format = json
auto_explain.log_nested_statements = on

These are the diagnostic foundation for slow-query investigation. Without them, you have aggregate stats but no per-execution visibility. One caveat: auto_explain.log_analyze turns on per-node timing for every query, not just the ones that get logged; if that overhead shows up, set auto_explain.log_timing = off to keep the plans without the timings.
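
With pg_stat_statements loaded, the standard first query in any investigation is top consumers by cumulative time (the column is total_exec_time on PG 13+, total_time before that):

-- Top 10 queries by cumulative execution time
SELECT round(total_exec_time) AS total_ms, calls, query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;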

What I do for a new RDS instance

The checklist:

  1. Custom Parameter Group from the start (do not use the default — you cannot modify it).
  2. Apply the settings above as starting points.
  3. Adjust work_mem and max_connections based on instance size and workload.
  4. Add pg_stat_statements, auto_explain, pg_buffercache to shared_preload_libraries.
  5. Reboot once to apply.

For existing instances, I do this in waves: dynamic parameters first (no reboot), then SPL changes during a maintenance window.

What does not need to be tuned

A list of parameters I have learned to leave alone:

  • random_page_cost — defaults to 4 on RDS, which assumes spinning disks. For SSDs (which all modern RDS uses), 1.1 is more accurate, but the change rarely affects plans meaningfully.
  • seq_page_cost — default 1, leave it.
  • default_statistics_target — 100 is fine for most. Increase per-column when needed.
  • synchronous_commit — leave at on unless you have a specific reason. Off trades durability for speed.
  • wal_level — logical if you use logical replication, else replica (the default is fine).

Validation

After changing parameters, verify with:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('work_mem', 'max_connections', 'autovacuum_vacuum_scale_factor');

The source column tells you where the value came from (configuration file, default, session, etc.). If a change you applied is not showing up, check how the parameter applies: dynamic parameters propagate within a few minutes, while static ones sit in pending-reboot status until the instance restarts.
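
pg_settings can also tell you which parameters are waiting on that reboot:

-- Changed in the Parameter Group but not applied until restart
SELECT name, setting
FROM pg_settings
WHERE pending_restart;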

What I have measured

On a real production workload (typical e-commerce OLTP), tuning the above from defaults produced:

  • ~30% reduction in p99 latency.
  • ~40% reduction in CPU during checkpoints.
  • ~50% reduction in vacuum-related backlog.

The instance size did not change. The cost was a few hours of careful review and one reboot.

RDS defaults are conservative; deliberate tuning is one of the highest ROI things a Postgres team can do.