Agent Troubleshooting
Token errors, network failures, SSL issues, missing metrics — diagnostic walkthrough.
The agent's failure modes fall into a small number of categories. Below is a rundown keyed to what shows up in the logs, ordered roughly by how often each one hits.
"invalid agent token"
[http-writer] POST /api/v1/ingest/cycle returned 401: invalid agent token
The MONPG_AGENT_TOKEN value doesn't match any active row in our agent_tokens table. The root cause is almost always the same: the token was revoked from the UI, but the env var still holds the old value. Generate a fresh one (Settings → Agent Tokens) and update the agent. Tokens don't expire on a schedule, but they do disappear the moment you click Revoke.
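Before regenerating, it's worth confirming which token the agent's environment actually holds. A quick sketch; the masking helper below is illustrative, not part of the agent:

```shell
# check_token: print a masked prefix of MONPG_AGENT_TOKEN so you can compare
# it against the value shown in Settings -> Agent Tokens without exposing
# the full token in terminals or shell history.
check_token() {
  if [ -z "${MONPG_AGENT_TOKEN:-}" ]; then
    echo "MONPG_AGENT_TOKEN is not set"
    return 1
  fi
  # Show only the first 8 characters.
  printf 'token prefix: %s...\n' "$(printf '%s' "$MONPG_AGENT_TOKEN" | cut -c1-8)"
}

check_token || true
```

If the prefix doesn't match what the UI shows for an active token, the agent is running with a stale value.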
"connection refused" to your DB
[collector] connect failed: dial tcp 127.0.0.1:5432: connection refused
Three things to check, in order. Is PostgreSQL actually listening on the configured port (ss -tlnp | grep 5432)? Does pg_hba.conf allow the agent's source IP for the monpg_monitor role? And the gotcha: if the agent runs in a container, localhost means the container, not the host. Use host.docker.internal on Docker Desktop (Mac/Windows) or the host's LAN IP on Linux.
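The three checks above, as a command checklist. The pg_hba.conf entry and subnet are illustrative; adjust the port if yours differs:

```shell
# 1) Is PostgreSQL listening on the configured port?
ss -tlnp | grep 5432 || echo "nothing listening on 5432"

# 2) Does pg_hba.conf allow the agent's source IP for monpg_monitor?
#    An entry like this (example subnet) must match before any reject rule:
#      host  all  monpg_monitor  10.0.0.0/8  scram-sha-256
#    Reload after editing: SELECT pg_reload_conf();

# 3) Containerized agent: point it at the host, not at localhost.
#    host.docker.internal on Docker Desktop (Mac/Windows);
#    the host's LAN IP on Linux.
```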
SSL handshake failed
tls: failed to verify certificate: x509: certificate signed by unknown authority
For managed PG (RDS, Azure, Cloud SQL), the fix is usually MONPG_DB_SSLMODE=require with the provider's root CA installed in the host's trust store. For self-hosted with a self-signed cert, use MONPG_DB_SSLMODE=verify-ca plus MONPG_DB_SSLROOTCERT=/path/to/ca.crt. As a smoke test only, never in production, MONPG_DB_SSLMODE=require alone (with no CA installed) skips verification.
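In env-file form, the two verified configurations look like this. The CA path and hostname are placeholders, and the openssl -starttls postgres option requires OpenSSL 1.1.1 or newer:

```shell
# Managed PG (RDS, Azure, Cloud SQL): provider root CA installed in the
# host's trust store.
MONPG_DB_SSLMODE=require

# Self-hosted with a self-signed certificate: pin the CA explicitly.
# MONPG_DB_SSLMODE=verify-ca
# MONPG_DB_SSLROOTCERT=/path/to/ca.crt

# To inspect the server's certificate chain directly (OpenSSL 1.1.1+):
#   openssl s_client -starttls postgres -connect db.example.com:5432 </dev/null
```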
Egress to MonPG fails
[http-writer] dial tcp api.monpg.app:443: i/o timeout
Either an egress firewall is blocking api.monpg.app:443 (allowlist the FQDN or our static IPs), or you're behind a corporate proxy the collector doesn't know about. Set HTTP_PROXY and HTTPS_PROXY in the agent's environment.
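If a proxy is in play, a minimal agent environment looks like this. The proxy hostname and port are placeholders, and NO_PROXY is the conventional companion variable for keeping local traffic off the proxy:

```shell
HTTP_PROXY=http://proxy.corp.example:3128
HTTPS_PROXY=http://proxy.corp.example:3128
# Keep direct traffic to the local database off the proxy:
NO_PROXY=localhost,127.0.0.1
```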
Metrics arrive but the UI is empty
The agent reports green to ingest, but the UI shows no server. Start with the Server dropdown in the UI header; your DB should appear there with a green dot. If the dot is gray, the last-collected timestamp is more than five minutes old; check the agent logs around that time. If the server doesn't appear at all, the most likely cause is that the agent is pushing to a different organization than the one you're viewing. Generate the token from the same org you're looking at.
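The gray-dot rule is simple enough to reason about offline. A sketch of the staleness check as described above, assuming the threshold is exactly five minutes:

```shell
# dot_color AGE_SECONDS: green if the last-collected timestamp is within
# five minutes, gray otherwise (mirrors the UI rule described above).
dot_color() {
  if [ "$1" -le 300 ]; then
    echo green
  else
    echo gray
  fi
}

dot_color 120   # recent collection
dot_color 900   # stale: check agent logs around the last-collected time
```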
High CPU on the agent host
Steady state is under 100m of CPU. If you see more, there are three knobs to try. MONPG_COLLECTOR_INTERVAL defaults to 30 seconds; raise it to 60 or higher if you don't need that resolution. Set MONPG_COLLECT_LOGS=false if you already collect logs through a different pipeline. And if MONPG_LOG_FILE points at a chatty PG that logs a lot of statements, log tailing alone can pin a CPU core.
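As an env-file fragment, the first two knobs together. The values are examples; 30 seconds is the documented default interval:

```shell
# Collect half as often: one cycle per 60 s instead of the default 30 s.
MONPG_COLLECTOR_INTERVAL=60
# Skip log tailing if another pipeline already ships PostgreSQL logs.
MONPG_COLLECT_LOGS=false
```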