AWS Aurora PostgreSQL Monitoring
Aurora PostgreSQL is not just "RDS with faster storage." Its shared storage layer, multi-reader architecture, and automatic failover create monitoring requirements that standard PostgreSQL tools cannot address. MonPG is built to handle them.
Understanding Aurora's Architecture
Aurora PostgreSQL separates compute from storage in a way that fundamentally changes how database monitoring works. In standard PostgreSQL (including standard RDS), each instance has its own local storage and manages its own WAL. Aurora replaces this with a distributed, log-structured storage layer shared across all instances in a cluster.
This architecture delivers significant performance and durability advantages — data is automatically replicated six ways across three availability zones, and reader instances can serve queries from the same storage without streaming replication lag. But it also means that traditional PostgreSQL monitoring assumptions break down:
- WAL metrics behave differently. Aurora does not ship WAL segments to replicas the way standard streaming replication does. WAL generation rate is still a useful write-load indicator, but WAL-based replication lag metrics from pg_stat_replication do not apply in the same way.
- Replica lag is measured differently. Aurora replica lag is the time it takes for a reader instance to apply page cache invalidations from the writer, not the time to replay WAL. This lag is typically under 20 milliseconds but can spike during heavy write bursts.
- Storage is not per-instance. Disk IOPS and throughput are cluster-level metrics, not instance-level. A single reader running a heavy sequential scan affects storage I/O for the entire cluster.
- Failover changes the writer. When a failover occurs, a reader is promoted to writer in seconds. Any monitoring tool that hardcodes instance roles will show incorrect data after a failover until manually reconfigured.
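Because roles can change at any time, a monitoring collector has to discover which instance is the writer rather than hardcode it. A minimal sketch of that role split, assuming member dicts shaped like the `DBClusterMembers` entries returned by boto3's `rds.describe_db_clusters` (the instance identifiers below are made up):

```python
def split_cluster_roles(members):
    """Return (writer_id, [reader_ids]) from DBClusterMembers-style entries."""
    writer = None
    readers = []
    for m in members:
        if m["IsClusterWriter"]:
            writer = m["DBInstanceIdentifier"]
        else:
            readers.append(m["DBInstanceIdentifier"])
    return writer, readers

# Illustrative sample, not a real describe_db_clusters response:
members = [
    {"DBInstanceIdentifier": "app-db-1", "IsClusterWriter": True},
    {"DBInstanceIdentifier": "app-db-2", "IsClusterWriter": False},
    {"DBInstanceIdentifier": "app-db-3", "IsClusterWriter": False},
]
writer, readers = split_cluster_roles(members)
```

Re-running this discovery on every collection cycle is what keeps dashboards correct across failovers.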
Cluster-Level Monitoring with MonPG
MonPG treats an Aurora cluster as a first-class entity. When you add an Aurora cluster to MonPG, the collector connects to each instance in the cluster — the writer and every reader — and presents them as a unified group with per-instance drill-down.
Writer Instance Monitoring
The writer instance handles all INSERT, UPDATE, and DELETE operations plus any read queries routed to the cluster endpoint. MonPG tracks writer-specific metrics including:
- Write throughput (tuples inserted, updated, deleted per second)
- Buffer pool pressure and cache eviction rates
- Lock contention and blocking query chains
- Autovacuum performance and dead tuple accumulation
- Transaction commit and rollback rates
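Write throughput is typically derived by differencing counters between collection cycles. A sketch using the real `tup_inserted`, `tup_updated`, and `tup_deleted` columns from pg_stat_database (the snapshot values and 30-second interval are illustrative):

```python
def write_throughput(prev, curr, interval_s):
    """Tuples written per second between two pg_stat_database snapshots."""
    return {
        k: (curr[k] - prev[k]) / interval_s
        for k in ("tup_inserted", "tup_updated", "tup_deleted")
    }

# Two hypothetical snapshots taken 30 seconds apart:
prev = {"tup_inserted": 1000, "tup_updated": 400, "tup_deleted": 50}
curr = {"tup_inserted": 4000, "tup_updated": 1000, "tup_deleted": 110}
rates = write_throughput(prev, curr, 30)  # per-second rates
```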
Reader Instance Monitoring
Aurora supports up to 15 reader instances, each running its own query workload. MonPG monitors each reader independently:
- Per-reader pg_stat_statements analysis — see which queries each reader handles
- Reader-specific connection counts and pool utilization
- Cache hit ratio per reader (each has its own buffer pool)
- Aurora replica lag per reader in milliseconds
This per-reader visibility is critical because reader workloads are often heterogeneous. One reader might serve your reporting queries (long-running, scan-heavy), another might serve your API read path (low-latency, index-based), and a third might be dedicated to your analytics pipeline. Each has different performance characteristics and different optimization opportunities.
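The per-reader cache hit ratio mentioned above comes from each reader's own block counters. A sketch using the real `blks_hit` and `blks_read` columns from pg_stat_database (the sample numbers are invented to contrast a scan-heavy reader with an index-based one):

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Buffer cache hit ratio from pg_stat_database block counters."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

# A scan-heavy reporting reader vs. a low-latency API reader:
reporting = cache_hit_ratio(blks_hit=900, blks_read=100)
api = cache_hit_ratio(blks_hit=9990, blks_read=10)
```

On Aurora, a low ratio matters twice: it signals buffer pressure on that reader and it drives up cluster I/O charges.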
Aurora Replica Lag: What It Really Means
Aurora replica lag is one of the most misunderstood metrics in the AWS ecosystem. In standard PostgreSQL replication, lag is measured in bytes of WAL that the replica has not yet applied. In Aurora, lag is the delay between when the writer commits a change to the shared storage layer and when a reader sees that change in its buffer cache.
Under normal conditions, Aurora replica lag is under 20 milliseconds — effectively invisible to most applications. But several scenarios can cause lag spikes:
- Heavy write bursts. Large bulk inserts or updates generate a flood of page invalidations that readers must process.
- Reader buffer pressure. If a reader's buffer pool is too small for its workload, page evictions force it to re-read from shared storage, compounding lag.
- Long-running reader transactions. Snapshot isolation on readers can delay the visibility of new writer changes.
MonPG tracks replica lag at 30-second intervals per reader, stores historical data, and alerts when lag exceeds your threshold. You can correlate lag spikes with writer activity to understand root causes.
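The alerting side of this reduces to a threshold check over per-reader samples. A minimal sketch, assuming lag samples keyed by reader id in milliseconds (how they are collected, e.g. from Aurora's replica status view, varies by engine version):

```python
def lagging_readers(lag_samples_ms, threshold_ms=100):
    """Return reader ids whose replica lag exceeds the threshold, sorted."""
    return sorted(r for r, lag in lag_samples_ms.items() if lag > threshold_ms)

# Hypothetical samples from one 30-second collection cycle:
samples = {"reader-1": 12.0, "reader-2": 450.0, "reader-3": 18.5}
over = lagging_readers(samples)
```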
Storage and I/O Monitoring
Aurora storage auto-scales in 10 GB increments up to 128 TB, which eliminates the need to provision storage in advance. However, storage growth still needs monitoring for cost management and capacity planning.
MonPG tracks Aurora storage metrics including:
- Volume bytes used — track total storage consumption over time and project future costs.
- Read/write IOPS — Aurora charges per I/O operation. MonPG correlates IOPS with specific queries so you can identify which workloads drive your I/O bill.
- Buffer cache hit ratio — a low hit ratio means more reads from the storage layer, which means more I/O charges.
Understanding the relationship between query patterns and I/O is particularly important on Aurora because I/O costs are a significant portion of the total bill. A single poorly-indexed query running full table scans can double your Aurora I/O costs.
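A back-of-the-envelope model of that bill, assuming per-request billing at an illustrative price per million I/Os (check current Aurora pricing for your region; the Aurora I/O-Optimized configuration does not bill per request):

```python
def monthly_io_cost(read_iops, write_iops, price_per_million=0.20):
    """Rough monthly I/O cost from average cluster IOPS.

    price_per_million is an assumed illustrative rate, not a quote."""
    seconds_per_month = 30 * 24 * 3600
    ios = (read_iops + write_iops) * seconds_per_month
    return ios / 1_000_000 * price_per_million

# A sustained 700 IOPS average across the cluster:
cost = monthly_io_cost(read_iops=500, write_iops=200)
```

The model makes the lever obvious: halving the IOPS a bad query drives halves its share of the bill, which is why MonPG attributes IOPS to queries.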
Failover Detection and Role Tracking
Aurora failover promotes a reader to writer, typically completing in under 30 seconds. From a monitoring perspective, this event is significant: the former reader is now handling writes, and its performance profile changes completely. A monitoring tool that does not detect the role change will show misleading data — the "writer" dashboard shows read-only metrics from the old writer, and the "reader" dashboard shows write-heavy metrics from the promoted instance.
MonPG detects Aurora failovers automatically. On each collection cycle, the collector checks the instance role via pg_is_in_recovery(). When a role change is detected, MonPG updates the instance metadata and sends a notification. Historical metrics are preserved with correct role attribution — you can see exactly when the failover happened and how performance changed before and after.
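The core of the check above is comparing the role reported by `pg_is_in_recovery()` against the role recorded on the previous cycle. A sketch of that comparison (the in-memory state store and instance ids are illustrative):

```python
def detect_role_change(instance_id, in_recovery, last_roles):
    """Classify the instance role and flag a change since the last cycle.

    in_recovery is the boolean result of pg_is_in_recovery();
    last_roles maps instance id -> "writer" or "reader"."""
    role = "reader" if in_recovery else "writer"
    changed = last_roles.get(instance_id) not in (None, role)
    last_roles[instance_id] = role
    return role, changed

roles = {"app-db-2": "reader"}
# After a failover, app-db-2 reports pg_is_in_recovery() = false:
role, changed = detect_role_change("app-db-2", in_recovery=False, last_roles=roles)
```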
Aurora vs Standard RDS: Monitoring Differences
If you are deciding between Aurora and standard RDS PostgreSQL, or migrating from one to the other, understanding the monitoring differences helps set expectations:
- Replication. Standard RDS uses streaming replication with WAL shipping. Aurora uses shared storage with page-level cache invalidation. MonPG adapts its replication monitoring to match whichever engine you use.
- Storage metrics. Standard RDS has provisioned IOPS and a fixed storage size. Aurora auto-scales storage and charges per I/O. MonPG tracks both models.
- Failover time. Standard RDS Multi-AZ failover takes 60-120 seconds. Aurora failover takes under 30 seconds. MonPG's failover detection handles both.
- Reader scaling. Standard RDS supports up to 5 read replicas. Aurora supports up to 15 readers with automatic load balancing. MonPG monitors all readers regardless of count.
For a broader view of RDS monitoring including non-Aurora specifics, see our RDS PostgreSQL monitoring guide.
Aurora-Specific Optimizations
MonPG's index advisor and vacuum advisor account for Aurora-specific behavior:
- Index recommendations factor in I/O cost. On Aurora, adding an index that eliminates sequential scans saves I/O charges in addition to CPU time. MonPG's what-if analysis includes estimated I/O reduction.
- Vacuum behavior differs. Aurora's storage layer handles some dead tuple cleanup differently than standard PostgreSQL. MonPG's vacuum advisor accounts for Aurora-specific autovacuum behavior and thresholds.
- Connection management. Aurora reader endpoints distribute connections across readers using DNS round-robin. MonPG shows per-reader connection distribution so you can verify that load balancing is working as expected.
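One simple way to verify that distribution is a skew check over per-reader connection counts, e.g. as counted from pg_stat_activity on each reader. A sketch with invented counts:

```python
def connection_skew(conn_counts):
    """Ratio of the busiest reader's connection count to the mean.

    A value near 1.0 means the reader endpoint's DNS round-robin
    is distributing connections evenly."""
    counts = list(conn_counts.values())
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0

skew = connection_skew({"reader-1": 40, "reader-2": 38, "reader-3": 42})
```

A skew well above 1.0 often points at clients caching DNS results instead of re-resolving the reader endpoint.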
Related Resources
- AWS RDS PostgreSQL Monitoring — General RDS monitoring including CloudWatch and Performance Insights.
- PostgreSQL Monitoring Guide — Core PostgreSQL metrics and monitoring best practices.
- PostgreSQL Query Monitoring — Track query fingerprints, regressions, waits, and plan changes.
- Replication Monitoring Guide — Streaming and logical replication monitoring for managed services.
- PostgreSQL Index Optimization — Adding the right indexes reduces Aurora I/O costs significantly.
Monitor Your Aurora Cluster Today
Add your Aurora cluster to MonPG and get writer, reader, and storage visibility in minutes. The collector runs in your VPC alongside your application.
Start trial