External Watchdog — Verify MonPG Itself
Quis custodiet ipsos custodes? Cron a check that ensures MonPG is collecting.
Here's the thing about monitoring tools: the day they fail silently is the day you find out from a Slack message about an alert that never paged anyone, three hours late. We collect from your DB, but we're a service like any other: credential rotation, a network split, or an expired cert can quietly take us off the grid. The discipline is to run a small external watchdog that alerts you when MonPG itself stops reporting.
The watchdog should run from somewhere different — a different cloud, your CI, a free monitoring service. The whole point is that it survives whatever takes us out.
Pattern A: "is everything reporting recently"
Pull the server list, assert each one's last_collected is within five minutes:
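#!/usr/bin/env bash
# Fail on errors, unset variables (e.g. a missing MONPG_KEY), and pipeline failures.
set -euo pipefail

# Fetch the server list; -f makes curl exit non-zero on HTTP errors.
RESP=$(curl -fsSL --max-time 30 -H "Authorization: Bearer $MONPG_KEY" https://api.monpg.app/api/v1/servers)
NOW=$(date -u +%s)

# Collect the names of servers whose last_collected is more than 300 seconds old.
STALE=$(echo "$RESP" | jq -r --argjson now "$NOW" '
  .data[]
  | select((.last_collected | fromdate) < ($now - 300))
  | .name')

if [ -n "$STALE" ]; then
  echo "STALE: $STALE"
  exit 1
fi
echo "all servers fresh"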
Cron it from anywhere. Page on non-zero exit. This is the better pattern because it tests the actual data pipeline, not just the API surface.
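A watchdog that can never fail is no better than no watchdog, so it is worth confirming offline that the filter flags what it should. The sketch below wraps the same jq filter in a function and feeds it a hand-written payload. The function name `stale_names`, the server names, and the timestamps are made up for illustration; the payload shape (`.data[].name`, `.last_collected`) is taken from the script above, and `fromdate` assumes timestamps of the form `2024-01-01T00:00:00Z`.

```shell
#!/usr/bin/env bash
# Sketch: the Pattern A filter as a function, so it can be exercised offline.
# Reads a server-list response on stdin; prints names older than max_age seconds.
stale_names() {
  local now="$1" max_age="${2:-300}"
  jq -r --argjson now "$now" --argjson max "$max_age" '
    .data[]
    | select((.last_collected | fromdate) < ($now - $max))
    | .name'
}

# Hand-written sample payload (hypothetical server names and timestamps).
sample='{"data":[
  {"name":"db-fresh","last_collected":"2024-01-01T00:09:00Z"},
  {"name":"db-stale","last_collected":"2024-01-01T00:00:00Z"}]}'

# 2024-01-01T00:10:00Z as epoch seconds.
now=1704067800

# db-fresh is 60 seconds old, db-stale is 600 seconds old.
echo "$sample" | stale_names "$now"   # prints: db-stale
```

If the real API returns timestamps with fractional seconds or a null `last_collected` for a server that has never reported, `fromdate` will raise an error instead. That still produces a non-zero exit, but it is worth checking against a real response before relying on the output.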
Pattern B: health endpoint
MonPG exposes /api/v1/health/ready publicly. Returns 200 if API + DBs are healthy:
curl -fsS https://api.monpg.app/api/v1/health/ready
Simpler, but it doesn't catch the "API up, ingest broken" case where the API returns 200 while no new snapshots are landing. Pattern A is what you actually want.
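Running both checks together tells you which layer failed, which can save a few minutes when the page arrives. This is a minimal sketch, not something MonPG ships: the function names, the exit codes 1 and 2, the `MONPG_API_BASE` and `FRESHNESS_SCRIPT` variables, and the `./check_fresh.sh` path (assumed to be the Pattern A script saved locally) are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run Pattern B, then Pattern A, and report which layer failed.

API_BASE="${MONPG_API_BASE:-https://api.monpg.app}"

# Pattern B: readiness endpoint; curl -f fails on any HTTP error status.
check_health() {
  curl -fsS --max-time 10 "$API_BASE/api/v1/health/ready" >/dev/null
}

# Pattern A: hypothetical path to the freshness script saved locally.
check_freshness() {
  "${FRESHNESS_SCRIPT:-./check_fresh.sh}" >/dev/null
}

# Prints a one-word verdict.
# Returns 0 when healthy, 1 when the API is down, 2 when the API is up
# but the freshness check fails.
diagnose() {
  if ! check_health; then
    echo "API_DOWN"
    return 1
  fi
  if ! check_freshness; then
    echo "INGEST_STALE"
    return 2
  fi
  echo "OK"
}
```

Call `diagnose` at the end of the script and page on any non-zero status. The `INGEST_STALE` verdict is exactly the case the health endpoint cannot see on its own.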
Where to run it
UptimeRobot, Pingdom, Cronitor — free tier on any of them is enough for a single watchdog. GitHub Actions on a schedule is essentially free for public repos. AWS Lambda + EventBridge if you want multi-region for full DR.
Where to alert
Use a different channel from your normal MonPG alerts. If the watchdog routes through MonPG, you're alerting yourself through a broken system. The cleanest pattern is for the watchdog to call PagerDuty's webhook directly when it fails, completely independent of us.
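One way to page directly is PagerDuty's Events API v2, which accepts a JSON event at `https://events.pagerduty.com/v2/enqueue`. The sketch below assumes you have created an Events API v2 integration and exported its key as `PD_ROUTING_KEY`. The function names, the `monpg-watchdog` source and dedup key, and the `critical` severity are illustrative choices, not anything MonPG prescribes.

```shell
#!/usr/bin/env bash
# Sketch: trigger a PagerDuty incident from the watchdog via Events API v2.

# Build the event body with jq so the summary text is escaped safely.
pd_event() {
  local summary="$1"
  jq -cn \
    --arg key "${PD_ROUTING_KEY:-}" \
    --arg summary "$summary" \
    '{routing_key: $key,
      event_action: "trigger",
      dedup_key: "monpg-watchdog",
      payload: {summary: $summary, source: "monpg-watchdog", severity: "critical"}}'
}

# POST the event; curl -f turns an HTTP error from PagerDuty into a non-zero exit.
pd_trigger() {
  pd_event "$1" | curl -fsS --max-time 10 -X POST \
    -H "Content-Type: application/json" \
    --data-binary @- https://events.pagerduty.com/v2/enqueue
}
```

A typical call would be `./check_fresh.sh || pd_trigger "MonPG watchdog: stale or unreachable"`, where `./check_fresh.sh` is a hypothetical name for the Pattern A script. The fixed `dedup_key` means repeated failures update one open incident rather than opening a new one on every cron run.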