Vector Deletes, Freshness, and Permissions: The Hidden RAG Incident

A RAG system is unsafe if deleted, permission-restricted, or stale content can still surface in answers. Freshness is part of correctness.

The incident was not a hallucination. The assistant answered from a document that had been deleted two days earlier. The source database was correct. The vector index was not.

RAG freshness is a correctness problem. If search can retrieve content that the product has deleted or hidden, the answer is not merely stale. It can be a security incident.

The framework: source truth plus serving state

I separate source truth from serving state. The source system owns documents and permissions. The vector system is a serving index that must demonstrably reflect those documents and rules within the product's tolerance.

  • Deletes need tombstones or hard-delete confirmation.
  • Permission changes need propagation checks.
  • Embedding rows need source content hashes.
  • Queries need freshness and visibility filters.
  • Sync jobs need lag metrics and dead-letter handling.
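The tombstone and content-hash ideas above can be sketched in a few lines. This is a minimal illustration, not a schema recommendation: the `ServingChunk` fields and helper names are hypothetical, and a real index would store these columns alongside the embedding.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ServingChunk:
    chunk_id: str
    document_id: str
    acl_group_id: str
    content_hash: str         # hash of the source content this embedding was built from
    is_deleted: bool = False  # tombstone: kept until hard delete is confirmed

def source_hash(text: str) -> str:
    """Hash of the source content, stored on the serving row at embed time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def apply_delete(index: dict, document_id: str) -> int:
    """Tombstone every chunk of a deleted document; return how many were flipped."""
    flipped = 0
    for chunk in index.values():
        if chunk.document_id == document_id and not chunk.is_deleted:
            chunk.is_deleted = True
            flipped += 1
    return flipped

def is_stale(chunk: ServingChunk, current_source_text: str) -> bool:
    """An embedding row is stale if its stored hash no longer matches source."""
    return chunk.content_hash != source_hash(current_source_text)
```

The tombstone gives queries a filter column before the hard delete lands; the hash gives reconciliation a cheap staleness signal without re-embedding.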

A safe query shape

Even if deletes are eventually hard-removed, I keep visibility state in the serving row. That gives the query a final safety filter.

SELECT chunk_id, document_id
FROM document_chunks
WHERE organization_id = $1        -- tenant isolation
  AND acl_group_id = $2           -- caller's permission scope
  AND is_deleted = false          -- tombstone filter
  AND visible_after <= now()      -- freshness gate
ORDER BY embedding <=> $3         -- pgvector cosine distance to the query vector
LIMIT 20;

Freshness has a budget

Some systems can tolerate five minutes of search staleness. Legal, billing, HR, and security data often cannot. The budget should be explicit per corpus, not assumed globally.
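Making the budget explicit can be as small as a table of numbers checked against sync lag. A minimal sketch, where the corpus names and budget values are hypothetical examples, not recommendations:

```python
# Hypothetical per-corpus freshness budgets, in seconds.
FRESHNESS_BUDGET_S = {
    "docs": 300,      # general documentation tolerates minutes of lag
    "billing": 30,
    "hr": 30,
    "security": 10,
}

def over_budget(corpus: str, sync_lag_s: float) -> bool:
    """True when serving-index lag has exceeded this corpus's freshness budget."""
    budget = FRESHNESS_BUDGET_S.get(corpus, 60)  # conservative default for unknown corpora
    return sync_lag_s > budget
```

The point is that "docs" and "security" get different answers for the same lag, and the alert threshold lives in one reviewable place.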

What I monitor

  • Delete propagation lag.
  • Permission propagation lag.
  • Chunks with missing source documents.
  • Chunks whose content_hash no longer matches source.
  • Queries that return fewer than requested rows after visibility filters.
  • Dead-lettered sync events.
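Two of these checks, orphaned chunks and hash mismatches, come from the same reconciliation sweep. A sketch, assuming serving rows carry the `content_hash` described earlier (the row shapes here are hypothetical):

```python
import hashlib

def reconcile(source_docs: dict, serving_rows: list) -> dict:
    """Compare serving rows against source truth and bucket each anomaly.

    `source_docs` maps document_id -> current source text; each serving row
    is a dict with chunk_id, document_id, and content_hash.
    """
    findings = {"missing_source": [], "hash_mismatch": []}
    for row in serving_rows:
        text = source_docs.get(row["document_id"])
        if text is None:
            # Chunk survived a delete: the source document no longer exists.
            findings["missing_source"].append(row["chunk_id"])
        elif hashlib.sha256(text.encode("utf-8")).hexdigest() != row["content_hash"]:
            # Source was edited after this chunk was embedded.
            findings["hash_mismatch"].append(row["chunk_id"])
    return findings
```

Run periodically, the counts in `findings` become the delete-propagation and staleness metrics directly.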

The production default

Treat the vector index like any other derived serving system. It needs tombstones, replay, reconciliation, and alerts. The mistake is assuming semantic search can be eventually consistent without product consequences.
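The replay and dead-letter pieces can be sketched as a small event loop. This is an illustrative shape, not a framework: `apply_event` stands in for whatever writes tombstones and permission updates to the serving index.

```python
def process_sync_events(events, apply_event, dead_letters, max_attempts=3):
    """Apply each sync event to the serving index.

    Transient failures are retried; events that keep failing are dead-lettered
    with their last error so they can be replayed after a fix, instead of
    being silently dropped (which is how indexes drift from source truth).
    """
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                apply_event(event)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    dead_letters.append({"event": event, "error": str(exc)})
```

The dead-letter queue length is itself a metric: a nonzero value means some deletes or permission changes never reached the index.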

The runbook I want before this reaches production

Before I trust this design, I want a small runbook that names the failure mode, the owner, and the rollback path. Vector systems fail in ways that look like product quality problems: missing evidence, stale evidence, wrong-tenant evidence, high p99, or answers that cite weak chunks. If the team cannot tell which one happened, the system is not observable enough.

  • Define a golden query set with real permissions and expected source documents.
  • Track recall, result count, p95, p99, and cost by query class.
  • Keep a rollback path for index, embedding model, chunking, and metadata changes.
  • Test deleted, restricted, fresh, and re-embedded documents as canaries.
  • Review the dashboard after every bulk import, re-embedding job, and index rebuild.
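The golden-query canaries from the first bullet can be run as a plain assertion pass. A sketch, assuming a `search(query, acl_group_id)` function that returns document IDs (the case shape here is hypothetical):

```python
def run_canaries(golden_cases, search):
    """Run golden queries through `search` and report failures.

    Each case names a query, the caller's ACL group, document IDs that must
    appear, and document IDs that must never appear (deleted or restricted).
    """
    failures = []
    for case in golden_cases:
        got = set(search(case["query"], case["acl_group_id"]))
        missing = set(case["must_return"]) - got
        leaked = set(case["must_not_return"]) & got
        if missing:
            failures.append(f"{case['name']}: missing {sorted(missing)}")
        if leaked:
            failures.append(f"{case['name']}: leaked {sorted(leaked)}")
    return failures
```

Note the two failure modes are asymmetric: a missing document is a quality regression, a leaked one is the security incident from the opening paragraph, so "leaked" results usually deserve a paging alert rather than a dashboard line.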

The practical standard is simple: a retrieval change should be measurable before it ships, visible while it runs, and reversible when quality drops.