RAG Retrieval Quality Runbook | MonPG Docs

Why retrieval quality needs a runbook

RAG failures rarely announce themselves as database errors. The assistant answers confidently, but it cites the wrong document, misses a fresh document, ignores a permission boundary, or invents an answer because retrieval returned weak evidence.

This runbook turns those product failures into measurable checks.

Keep a golden query set

Create a small but realistic set of queries with expected source documents. Include common questions, exact identifiers, rare edge cases, permission-restricted documents, deleted documents, and newly uploaded content.

Query class	What to verify
Exact identifier	Error codes, invoice IDs, table names, and product SKUs rank correctly.
Semantic question	Conceptually relevant docs appear even when wording differs.
Permission boundary	Restricted docs never appear for the wrong user.
Fresh document	Recently embedded docs are retrievable within the agreed freshness SLO.
Deleted document	Deleted or revoked docs disappear from retrieval.
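A golden set like the one described above could be kept as plain data under version control. The following is a minimal sketch, not a prescribed format: the GoldenQuery fields, the class names, and every query, user ID, and document ID are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass

# Query classes from the table above; the identifiers are assumed names.
REQUIRED_CLASSES = {
    "exact_identifier",
    "semantic_question",
    "permission_boundary",
    "fresh_document",
    "deleted_document",
}


@dataclass(frozen=True)
class GoldenQuery:
    """One golden-set entry. All field names are illustrative assumptions."""
    query: str
    query_class: str
    user_id: str                                # user the query runs as
    expected_doc_ids: frozenset = frozenset()   # must be retrievable
    forbidden_doc_ids: frozenset = frozenset()  # must never be returned


# Hypothetical entries, one per query class.
GOLDEN_SET = [
    GoldenQuery("What does error E1042 mean?", "exact_identifier", "user-a",
                expected_doc_ids=frozenset({"doc-error-codes"})),
    GoldenQuery("How do I get my money back?", "semantic_question", "user-a",
                expected_doc_ids=frozenset({"doc-refund-policy"})),
    GoldenQuery("Show the salary bands", "permission_boundary", "user-b",
                forbidden_doc_ids=frozenset({"doc-hr-salaries"})),
    GoldenQuery("What changed in the latest release?", "fresh_document", "user-a",
                expected_doc_ids=frozenset({"doc-release-notes-new"})),
    GoldenQuery("What is the old travel policy?", "deleted_document", "user-a",
                forbidden_doc_ids=frozenset({"doc-travel-policy-deleted"})),
]


def missing_classes(golden_set):
    """Return the query classes that have no entry in the golden set."""
    covered = {entry.query_class for entry in golden_set}
    return sorted(REQUIRED_CLASSES - covered)
```

Running missing_classes(GOLDEN_SET) before each evaluation is a cheap way to notice when a query class has been dropped from the set.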

Metrics to capture

Recall@k: how often expected documents appear in the top k retrieved chunks.
Result count: whether filters leave enough chunks for the answer step.
Citation coverage: whether answers cite retrieved evidence.
No-hit rate: how often retrieval correctly returns nothing when no supporting evidence exists.
Freshness lag: time from document update to searchable chunk.
Forbidden-hit rate: how often retrieval returns a result the user should not be allowed to see. Any nonzero value is a failure.
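Several of these metrics reduce to a few lines of arithmetic over the golden query results. The sketch below is one possible reading of the definitions, under stated assumptions: retrieved results are given as document IDs in rank order, forbidden-hit rate is counted per query (a query counts once if it returns any forbidden document), and no-hit rate is measured only over queries known to have no evidence. Function and parameter names are illustrative.

```python
from datetime import datetime, timedelta


def recall_at_k(retrieved_doc_ids, expected_doc_ids, k):
    """Fraction of expected documents found among the top k results."""
    expected = set(expected_doc_ids)
    if not expected:
        raise ValueError("recall@k is undefined without expected documents")
    found = expected & set(retrieved_doc_ids[:k])
    return len(found) / len(expected)


def forbidden_hit_rate(runs):
    """Fraction of queries that returned at least one forbidden document.

    runs: list of (retrieved_doc_ids, forbidden_doc_ids) pairs.
    """
    if not runs:
        return 0.0
    leaked = sum(
        1 for retrieved, forbidden in runs if set(retrieved) & set(forbidden)
    )
    return leaked / len(runs)


def no_hit_rate(results_for_unanswerable):
    """Fraction of no-evidence queries for which retrieval returned nothing.

    results_for_unanswerable: one list of retrieved IDs per query that is
    known to have no supporting document.
    """
    if not results_for_unanswerable:
        return 0.0
    empty = sum(1 for retrieved in results_for_unanswerable if not retrieved)
    return empty / len(results_for_unanswerable)


def freshness_lag(updated_at, searchable_at):
    """Time between a document update and its chunk becoming searchable."""
    return searchable_at - updated_at
```

For example, recall_at_k(["a", "b", "c"], {"a", "c"}, k=2) evaluates to 0.5, because only one of the two expected documents appears in the top two results.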

Debug sequence

Re-run the query with the same tenant, user, language, and timestamp filters.
Run exact search on a small sample and compare it with the approximate production path.
Check whether filters removed most approximate candidates.
Check whether the document was embedded with the current model and chunking version.
Check whether deleted or stale chunks still exist in the serving table or vector store.
Review the prompt only after retrieval evidence is proven healthy.
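The exact-versus-approximate comparison and the filter check in this sequence can be approximated offline on a small sample. The sketch below assumes cosine distance, a sample small enough to scan in memory, and approximate results supplied as a list of chunk IDs from the production path; the function names and the shape of the candidate records are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def exact_top_k(query_vec, sample, k):
    """Brute-force nearest neighbours over a sample.

    sample: dict mapping chunk ID to embedding vector.
    """
    ranked = sorted(
        sample, key=lambda cid: cosine_distance(query_vec, sample[cid])
    )
    return ranked[:k]


def overlap_at_k(exact_ids, approx_ids, k):
    """Share of the exact top k that the approximate path also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


def filter_survival(candidates, keep):
    """Share of approximate candidates that pass the metadata filters.

    candidates: list of candidate records; keep: predicate over a record.
    """
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if keep(c)) / len(candidates)
```

A low overlap_at_k points at the approximate index or its search parameters, while a low filter_survival suggests that filtering after the approximate search is starving the answer step of chunks.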

MonPG checks

Use MonPG to watch the PostgreSQL side of retrieval: slow vector queries in pg_stat_statements, vector index growth, dead tuples after re-embedding, queue tables for embedding jobs, and query plans for filtered retrieval. If retrieval quality drops after a data import or index rebuild, compare those operational signals with the golden query results.
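MonPG's own interface is not shown in this runbook, so the sketch below reads the underlying PostgreSQL statistics views directly as a way to cross-check the same signals. It assumes the pg_stat_statements extension is installed, PostgreSQL 13 or later (for the mean_exec_time column), pgvector-style distance operators (<->, <=>, <#>) in the query text, and any DB-API 2.0 connection such as one from psycopg. The helper names are hypothetical.

```python
# Slowest statements whose text contains a pgvector-style distance operator.
# Assumption: vector search is done with pgvector operators.
SLOW_VECTOR_QUERIES_SQL = """
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE strpos(query, '<=>') > 0
   OR strpos(query, '<->') > 0
   OR strpos(query, '<#>') > 0
ORDER BY mean_exec_time DESC
LIMIT 10
"""

# Tables with the most dead tuples, for example after re-embedding.
DEAD_TUPLES_SQL = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10
"""

# Largest indexes, to track vector index growth over time.
INDEX_SIZE_SQL = """
SELECT indexrelname, pg_relation_size(indexrelid) AS size_bytes
FROM pg_stat_user_indexes
ORDER BY size_bytes DESC
LIMIT 10
"""


def run_check(conn, sql):
    """Run one read-only check and return the rows as dictionaries."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def dead_tuple_ratio(row):
    """Dead tuples as a share of all tuples in a pg_stat_user_tables row."""
    total = row["n_live_tup"] + row["n_dead_tup"]
    return row["n_dead_tup"] / total if total else 0.0
```

Recording these values alongside each golden query run makes it easier to line up a drop in retrieval quality with a specific import or index rebuild.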