6 min read

Azure Database for Postgres Flexible Server: The Settings Worth Tuning

Azure Flex's defaults are conservative. The Server parameters blade is where most of the meaningful tuning happens. Here are the parameters I always touch.

Azure Flex is the modern Postgres offering on Azure (the older Single Server is being phased out). The defaults are reasonable for a generic workload, which means they are slightly wrong for almost every specific workload. The Server parameters blade is where most of the value lives.

Here is the list I run through on every new Azure Flex instance.

SPL: shared_preload_libraries

The one parameter that requires a server restart. Get it right at instance creation time if possible.

For most workloads I add:

shared_preload_libraries = pg_stat_statements,auto_explain,pg_buffercache

For specific use cases, also add:

  • pg_cron for scheduled jobs.
  • pgaudit for compliance.
  • pg_wait_sampling for wait-event analysis (only if available; support varies by Postgres version on Azure).

Azure has a quirk: even after adding to SPL, you need to add the extension name to azure.extensions to allow CREATE EXTENSION. Two parameters, both required. Easy to miss the second one.
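Both parameters can be set from the CLI as well as the portal. A sketch with az (the resource group and server names are placeholders):

```shell
# Hypothetical resource names -- substitute your own.
az postgres flexible-server parameter set \
  --resource-group my-rg --server-name my-server \
  --name shared_preload_libraries \
  --value "pg_stat_statements,auto_explain,pg_buffercache"

az postgres flexible-server parameter set \
  --resource-group my-rg --server-name my-server \
  --name azure.extensions \
  --value "pg_stat_statements,auto_explain,pg_buffercache"

# shared_preload_libraries is static -- a restart is required to apply it.
az postgres flexible-server restart --resource-group my-rg --name my-server
```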

azure.extensions

The list of extensions allowed for CREATE EXTENSION. The default includes the common ones. To enable additional extensions:

azure.extensions = pg_stat_statements,auto_explain,pg_buffercache,hypopg,pgstattuple,pg_cron,pgaudit

This is unique to Azure Flex; RDS and Cloud SQL handle it differently. If CREATE EXTENSION foo fails on Flex, this is usually why.
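A quick check from psql, assuming pg_stat_statements is already on the allow-list:

```sql
-- Confirm the allow-list, then install:
SHOW azure.extensions;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```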

Memory parameters

work_mem — the default (4MB) is typically too low for analytical queries.

work_mem = 16MB    # for OLTP
work_mem = 64MB    # for mixed workloads

For heavy individual queries, override at the session level rather than globally. SET LOCAL work_mem = '256MB' inside a transaction applies to that transaction only.
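A sketch of the transaction-scoped override (the table and column names are made up):

```sql
BEGIN;
SET LOCAL work_mem = '256MB';      -- reverts automatically at COMMIT/ROLLBACK
SELECT account_id, sum(amount)     -- hypothetical reporting query
FROM ledger_entries
GROUP BY account_id;
COMMIT;
```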

shared_buffers — Azure sizes this based on instance memory automatically. Usually fine. If you need to override, target 25-35% of memory.

effective_cache_size — also auto-sized. Confirm it is set to roughly 75% of memory; if not, set explicitly.

maintenance_work_mem — used by VACUUM, REINDEX, CREATE INDEX. Default is conservative. Bump to 512MB or 1GB for faster maintenance:

maintenance_work_mem = 512MB

VACUUM finishes faster, REINDEX is faster, CREATE INDEX uses less I/O.
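For a one-off index build, the bump can be session-scoped instead of global; a sketch using a hypothetical events table:

```sql
SET maintenance_work_mem = '1GB';  -- session-level; leaves the server default alone
CREATE INDEX CONCURRENTLY idx_events_created_at ON events (created_at);
```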

Connections

max_connections — Azure scales the default with instance memory. For a Standard_D4s_v3 (16GB), the default is around 1700.

This is too high for almost any application. Each connection costs memory and lock-table space. The right shape:

  • Enable the built-in PgBouncer via the pgbouncer.enabled server parameter (opt-in, not on by default; unavailable on the Burstable tier).
  • Set max_connections to a more realistic number (200-400 for most workloads).

On Azure Flex, the built-in pooler is convenient. It runs in transaction mode by default and is configured through the pgbouncer.* server parameters (pgbouncer.default_pool_size, pgbouncer.max_client_conn, and friends).
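Clients opt in simply by changing the port. Assuming the built-in PgBouncer is enabled, it listens on 6432 next to Postgres on 5432 (server and database names are placeholders):

```shell
# Direct to Postgres:
psql "host=my-server.postgres.database.azure.com port=5432 dbname=app user=app_user sslmode=require"

# Through the built-in PgBouncer:
psql "host=my-server.postgres.database.azure.com port=6432 dbname=app user=app_user sslmode=require"
```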

Vacuum

More aggressive than defaults:

autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02
autovacuum_naptime = 15s
autovacuum_max_workers = 6
autovacuum_vacuum_cost_limit = 2000

This runs autovacuum more often with more parallelism. Cost: slightly more I/O during vacuum. Benefit: less bloat, fewer planner surprises, lower freeze-related incident risk.

For specific tables that need different treatment (very large or very high-churn), use table-level overrides:

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
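To see which tables carry overrides:

```sql
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL
  AND relkind = 'r';
```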

Checkpoints and WAL

max_wal_size = 8GB
checkpoint_timeout = 15min
bgwriter_lru_maxpages = 500

Reduces checkpoint frequency. Smooths I/O. Eliminates most checkpoint-induced latency spikes.
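To confirm the effect, compare timed vs. requested checkpoints; a healthy instance is mostly timed, and a high requested count means max_wal_size is still too small. (These columns are for Postgres 16 and earlier; Postgres 17 moves them to pg_stat_checkpointer.)

```sql
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```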

Logging

log_min_duration_statement = 1000   # ms; log statements slower than 1 second
log_lock_waits = on
log_temp_files = 1024               # kB; log temp files larger than 1MB
log_autovacuum_min_duration = 1000  # ms
log_checkpoints = on

These give you the diagnostic foundation. Azure Flex sends Postgres logs to Log Analytics; useful settings turn into useful Log Analytics queries.

For log_min_duration_statement, 1 second is a starting point. Adjust based on workload — for OLTP-heavy, drop to 500ms; for analytics, raise to 5000ms.
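As a sketch, assuming diagnostic settings route the PostgreSQLLogs category to a Log Analytics workspace, slow-query lines can be pulled with KQL like this (table and category names depend on your diagnostic configuration):

```kql
AzureDiagnostics
| where Category == "PostgreSQLLogs"
| where Message has "duration:"
| project TimeGenerated, Message
| order by TimeGenerated desc
```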

Statement timeouts

Default is 0 (no timeout). Set:

statement_timeout = 60s
idle_in_transaction_session_timeout = 5min
lock_timeout = 30s

These protect against runaway queries and stuck transactions. Adjust per workload — analytical queries may need longer statement_timeout, set per-session in those cases.
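Per-session and per-role overrides look like this (the reporting role name is made up):

```sql
-- One session, one long analytical job:
SET statement_timeout = '30min';

-- Or give a reporting role its own default:
ALTER ROLE reporting_user SET statement_timeout = '30min';
```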

What is unique about Azure Flex

A few Flex-specific quirks:

Built-in connection pooling. A managed PgBouncer runs alongside the server; enable it with the pgbouncer.enabled server parameter and connect on port 6432. Transaction mode by default. For most applications, this removes the need for a separate PgBouncer.

Automatic minor-version upgrades. Flex applies minor patches in a maintenance window you configure. Set this to a low-traffic time. Major-version upgrades are still manual.

Storage tiering. Flex storage is backed by Premium SSD tiers (P10, P20, P30, P40, and up), and IOPS scale with the provisioned size. Cheaper tiers cap IOPS lower. For write-heavy workloads, estimate your IOPS need and provision accordingly; an undersized tier saturates.

Read replicas. Async only on Flex, in the same or a different region. Lag is typically sub-second within the same region. For HA, use the high-availability mode with a sync standby; for read scaling, use a separate read replica.

Backup retention. Default 7 days, configurable up to 35 days. For longer retention, you have to do logical backups yourself (pg_dump to blob storage).
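A minimal sketch of the logical-backup path (server, database, and storage account names are placeholders; azcopy needs a SAS token or prior login):

```shell
pg_dump "host=my-server.postgres.database.azure.com dbname=app user=admin_user sslmode=require" \
  --format=custom --file="app_$(date +%F).dump"

azcopy copy "app_$(date +%F).dump" \
  "https://mystorage.blob.core.windows.net/pg-backups/"
```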

Verification

After applying changes via Server parameters:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('work_mem', 'max_wal_size', 'autovacuum_vacuum_scale_factor');

The source column shows where the value came from (configuration file, default, session). If your Server parameters change is not showing up, check whether it requires a restart — those are flagged in the Azure portal.
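pg_settings also flags parameters that are waiting on a restart directly:

```sql
SELECT name, setting
FROM pg_settings
WHERE pending_restart;
```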

A typical tuning session

For a new Azure Flex deployment, my workflow:

  1. Day 1: SPL + extensions configured, instance restarted once. Apply the dynamic parameter changes. Defer the work_mem/connections/vacuum tuning until I see actual workload patterns.

  2. Week 1: monitor pg_stat_statements and the Azure Insights blade. Identify hot queries and resource utilization.

  3. Week 2: tune work_mem, autovacuum settings, connection limits based on observed workload.

  4. Ongoing: revisit quarterly or after significant workload changes.

This incremental approach beats trying to set everything perfectly upfront. The defaults are not awful; they are just not optimal for any specific workload.