Integrations

Dedicated Vector Store Sync Checklist

How to keep PostgreSQL as the source of truth while syncing documents, permissions, deletes, and embedding versions to a dedicated vector store.

The rule: PostgreSQL owns product truth

A dedicated vector store should usually be a serving index, not the source of truth for documents, permissions, tenant state, or deletion lifecycle. Keep those facts in PostgreSQL and sync them outward deliberately.

Use an outbox for vector changes

An outbox table makes vector sync observable and retryable. It also gives MonPG something concrete to monitor.

CREATE TABLE vector_sync_outbox (
  id bigserial PRIMARY KEY,
  tenant_id bigint NOT NULL,
  document_id bigint NOT NULL,
  operation text NOT NULL CHECK (operation IN ('upsert', 'delete')),
  embedding_model text NOT NULL,
  embedding_version integer NOT NULL,
  status text NOT NULL DEFAULT 'queued',
  attempts integer NOT NULL DEFAULT 0,
  created_at timestamptz NOT NULL DEFAULT now(),
  processed_at timestamptz,
  error text
);

Sync checklist

  • Every vector row has a stable source document ID.
  • Every upsert includes tenant, permission, language, and document state metadata.
  • Deletes and permission revokes are high-priority sync events.
  • Embedding model and chunking version are stored in PostgreSQL and the vector store.
  • Backfills can be paused, resumed, and replayed safely.
  • Drift checks compare PostgreSQL document state with vector-store collection state.

Drift checks

Run regular checks for documents that are active in PostgreSQL but missing from the vector store, and documents deleted in PostgreSQL but still searchable in the vector store. Treat both as correctness bugs.

SELECT status, count(*)
FROM vector_sync_outbox
WHERE created_at >= now() - interval '24 hours'
GROUP BY status
ORDER BY count(*) DESC;

What to alert on

  • Oldest queued sync event age exceeds the freshness SLO.
  • Delete events are queued longer than a few minutes.
  • Failed sync attempts grow after a deploy.
  • Backfill throughput falls below the planned rate.
  • Vector-store document count diverges from PostgreSQL source count.