Vector Database Monitoring Overview

What this page is for

Vector search is not healthy just because nearest-neighbor queries are fast. A production vector system can return quickly and still miss the right document, return stale chunks, ignore permissions, or serve vectors that no longer match the source rows in PostgreSQL.

MonPG treats vector search as an operational workload. For pgvector, the vector rows and indexes live directly inside PostgreSQL. For a dedicated vector store, PostgreSQL usually remains the source of truth for documents, tenants, permissions, deletes, and embedding jobs. In both cases, the useful monitoring question is the same: can retrieval return the right allowed evidence at the latency the product expects?

The signals to track

Vector query latency: p50, p95, and p99 by query class, not one global average.
Filtered result count: how often a query asks for 10 chunks and gets fewer after tenant, ACL, status, or language filters.
Recall checks: sampled approximate results compared with exact search or a curated golden set.
Index size growth: HNSW and IVFFlat indexes can grow quickly and add storage and memory pressure.
Embedding freshness: documents waiting to be embedded, re-embedded, or removed from the serving index.
Delete lag: deleted or restricted documents that still have searchable vectors.
Tenant skew: latency and recall split by large, medium, and small tenants.
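
As one illustration of the recall check in the list above, the sketch below compares an approximate top 10 with an exact top 10 for a single sampled query vector in pgvector. The table document_chunks, its embedding column, and the sample row id 42 are hypothetical names chosen for the example. It assumes a cosine-distance index on embedding and uses the pgvector convention of disabling index scans inside a transaction to force exact search. Exact search scans the whole table, so this is only practical on a small sample of queries.

```sql
BEGIN;

-- Approximate top 10: the planner may use the HNSW or IVFFlat index here.
-- Confirm with EXPLAIN that it actually does on your data.
CREATE TEMP TABLE approx_top ON COMMIT DROP AS
SELECT id
FROM document_chunks                      -- hypothetical table
WHERE id <> 42                            -- hypothetical sample row
ORDER BY embedding <=> (SELECT embedding FROM document_chunks WHERE id = 42)
LIMIT 10;

-- Exact top 10: with index scans disabled for this transaction,
-- the same query falls back to a full scan and an exact sort.
SET LOCAL enable_indexscan = off;

CREATE TEMP TABLE exact_top ON COMMIT DROP AS
SELECT id
FROM document_chunks
WHERE id <> 42
ORDER BY embedding <=> (SELECT embedding FROM document_chunks WHERE id = 42)
LIMIT 10;

-- Recall@10: share of exact neighbors that the approximate query also returned.
SELECT count(a.id)::numeric / NULLIF(count(e.id), 0) AS recall_at_10
FROM exact_top e
LEFT JOIN approx_top a USING (id);

COMMIT;
```

Repeating this for a handful of sampled rows per tenant size gives a rough recall trend rather than a precise measurement. A curated golden set is the better baseline when one exists.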

Where MonPG helps

When pgvector is installed, MonPG surfaces the normal PostgreSQL signals that decide whether vector search stays healthy: relation size, index size, query latency from pg_stat_statements, autovacuum lag, table bloat, index growth, and plan regressions.
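
The signals above can also be inspected by hand. The following is a minimal sketch using standard PostgreSQL catalogs and the pg_stat_statements view, not a description of how MonPG collects them. It assumes PostgreSQL 13 or later (for the mean_exec_time column) and that pg_stat_statements is installed. Matching on distance operators in the query text is a heuristic, since operators such as <-> are also used by other data types.

```sql
-- Size of every pgvector index, largest first.
SELECT t.relname                               AS table_name,
       c.relname                               AS index_name,
       am.amname                               AS access_method,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_class t ON t.oid = i.indrelid
JOIN pg_am am   ON am.oid = c.relam
WHERE am.amname IN ('hnsw', 'ivfflat')
ORDER BY pg_relation_size(c.oid) DESC;

-- Statements that look like vector searches, slowest mean first.
-- pg_stat_statements reports mean, max, and stddev, not percentiles.
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2)   AS mean_ms,
       round(max_exec_time::numeric, 2)    AS max_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       left(query, 80)                     AS query_start
FROM pg_stat_statements
WHERE query LIKE '%<=>%'
   OR query LIKE '%<->%'
   OR query LIKE '%<#>%'
ORDER BY mean_exec_time DESC
LIMIT 20;
```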

For dedicated vector stores, MonPG monitors the PostgreSQL side of the contract: document tables, embedding job tables, delete queues, update queues, and the queries that prepare retrieval metadata. The vector database may be outside PostgreSQL, but the lifecycle truth usually is not.
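
To make the delete side of that contract concrete, here is a sketch of a delete-lag query. It assumes a hypothetical vector_delete_queue table with tenant_id, document_id, requested_at, and applied_at columns, where applied_at is set once the external vector store confirms the removal. Your schema will differ; the point is only that pending deletes and their age should be queryable from PostgreSQL.

```sql
-- Deletes requested in PostgreSQL but not yet confirmed by the vector store.
-- vector_delete_queue and its columns are hypothetical names.
SELECT tenant_id,
       count(*)                  AS pending_deletes,
       max(now() - requested_at) AS oldest_pending_age
FROM vector_delete_queue
WHERE applied_at IS NULL
GROUP BY tenant_id
ORDER BY oldest_pending_age DESC;
```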

A minimal vector health table

If your application stores embedding job state in PostgreSQL, keep it queryable. A small operational table makes monitoring much easier.

CREATE TABLE embedding_jobs (
  id bigserial PRIMARY KEY,
  tenant_id bigint NOT NULL,       -- lets backlog and failures be split per tenant
  document_id bigint NOT NULL,
  embedding_model text NOT NULL,   -- separates re-embedding runs by model
  status text NOT NULL CHECK (status IN ('queued', 'running', 'succeeded', 'failed')),
  attempts integer NOT NULL DEFAULT 0,
  queued_at timestamptz NOT NULL DEFAULT now(),  -- basis for queue age
  finished_at timestamptz,
  error text
);

From there, alert on queue age, failure rate, and re-embedding backlog before search quality drops.
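
As a rough sketch, the three alerts can be read from the embedding_jobs table defined above. The one-hour window is an arbitrary assumption, and the query treats finished_at as the time a job reached succeeded or failed, which the table definition implies but does not enforce.

```sql
-- Queue age, backlog, and recent failure rate in one pass.
SELECT
  count(*) FILTER (WHERE status IN ('queued', 'running'))  AS backlog,
  max(now() - queued_at) FILTER (WHERE status = 'queued')  AS oldest_queued_age,
  count(*) FILTER (WHERE status = 'failed'
                     AND finished_at >= now() - interval '1 hour') AS failed_last_hour,
  round(
    count(*) FILTER (WHERE status = 'failed'
                       AND finished_at >= now() - interval '1 hour')::numeric
    / NULLIF(count(*) FILTER (WHERE status IN ('succeeded', 'failed')
                                AND finished_at >= now() - interval '1 hour'), 0),
    3)                                                     AS failure_rate_last_hour
FROM embedding_jobs;

-- Re-embedding backlog: jobs still waiting, split by embedding model.
SELECT embedding_model,
       count(*) AS waiting_jobs
FROM embedding_jobs
WHERE status IN ('queued', 'running')
GROUP BY embedding_model
ORDER BY waiting_jobs DESC;
```

The failure rate is NULL when no job finished in the window, which keeps an idle queue from looking like a zero-failure one.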

Production rule

Do not monitor only vector latency. Monitor the product contract around vector search: allowed data, fresh data, enough results, stable recall, and a clear rollback path for embedding model, chunking, and index changes.