Uptime and health checks

Monitor service availability with HTTP, TCP, or script-based checks run from one or more locations. Use this guide when you need to be notified that a service is down or degraded and to measure its uptime and response time.

Intent: How-to

Quick answer

HTTP check: GET a known endpoint (e.g. /health) from outside your infrastructure; expect a 200 status and, optionally, a specific response body or a latency under a threshold. Run it every 1–5 minutes from multiple regions or probes. Alert on consecutive failures (e.g. 2–3) rather than on a single miss.
TCP check: connect to a port (e.g. 443); the check succeeds if the connection is established. Use it when the service does not speak HTTP or when you only care about reachability. Combine it with an HTTP check to cover the full stack.
Script or internal check: run a script that logs in, runs a query, or calls an API; its success or failure drives the alert. Use it for deep health checks (e.g. DB connectivity from the app host). Document what each check validates and which runbook applies when it fails.
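
The "alert on consecutive failures" rule can be sketched as a small counter. This is a minimal illustration, not any uptime tool's actual logic; the class name FailureTracker and the threshold value are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FailureTracker:
    """Counts consecutive failed checks and signals when to alert."""
    threshold: int = 3      # assumed default; the text suggests 2-3
    consecutive: int = 0

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when the alert should fire."""
        if ok:
            self.consecutive = 0   # any success resets the streak
            return False
        self.consecutive += 1
        # Fire once, on the check that reaches the threshold,
        # not again on every later failure in the same outage.
        return self.consecutive == self.threshold


tracker = FailureTracker(threshold=2)
results = [True, False, True, False, False, False]
alerts = [tracker.record(ok) for ok in results]
```

With a threshold of 2, the isolated failure in the second position does not alert; only the second failure in a row does.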

Steps

Define health endpoint

Expose a /health or /ready endpoint that returns 200 when the service is healthy (e.g. DB connected, cache reachable) and 503 or another non-200 status when it is degraded. Keep the check fast and lightweight.
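
One way such an endpoint might look, using only the Python standard library. This is a sketch under assumptions: dependencies_ok is a hypothetical placeholder for your own fast DB and cache probes, and the port passed to make_server is arbitrary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def dependencies_ok() -> bool:
    """Hypothetical placeholder: replace with fast DB and cache probes."""
    return True


def health_status(healthy: bool) -> Tuple[int, bytes]:
    """Map overall health to an HTTP status code and a small body."""
    return (200, b"ok") if healthy else (503, b"degraded")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(dependencies_ok())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling make_server().serve_forever() starts the server; a GET to /health then returns 200 with the body "ok" while dependencies_ok() returns True, and 503 otherwise.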

Configure external check

Use an uptime service (e.g. UptimeRobot, Pingdom) or your own probe to GET https://yourapp/health every 1–5 minutes. Alert after 2–3 consecutive failures; optionally alert when latency exceeds a threshold.
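
If you run your own probe, a single HTTP check could be sketched as follows. The function names, the 5-second timeout, and the 2-second latency budget are assumptions for illustration; hosted uptime services implement the equivalent for you.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]   # None when no HTTP response was received
    latency_s: float


def evaluate(status: Optional[int], latency_s: float, max_latency_s: float) -> bool:
    """A check passes only on HTTP 200 within the latency budget."""
    return status == 200 and latency_s <= max_latency_s


def http_check(url: str, timeout_s: float = 5.0, max_latency_s: float = 2.0) -> CheckResult:
    """GET the URL once and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code          # non-2xx responses such as 503
    except (urllib.error.URLError, OSError):
        status = None              # DNS failure, refused connection, timeout
    latency = time.monotonic() - start
    return CheckResult(evaluate(status, latency, max_latency_s), status, latency)
```

A scheduler (cron, a loop with a sleep, or the uptime service itself) would call http_check("https://yourapp/health") every 1–5 minutes and feed the ok flag into the consecutive-failure rule.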

TCP and multi-region

Add a TCP check on port 443 or 80 if you want to test reachability without HTTP. Run checks from multiple regions to detect regional outages and to measure latency by region.
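
A TCP reachability check only needs to open and close a connection. The sketch below assumes a 3-second timeout and a hypothetical function name, tcp_check; running the same function from probes in several regions gives a per-region connect time.

```python
import socket
import time
from typing import Optional


def tcp_check(host: str, port: int, timeout_s: float = 3.0) -> Optional[float]:
    """Return the connect time in seconds, or None if the connection failed."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution errors.
        return None
```

For example, tcp_check("yourapp", 443) returns a float when the port accepts connections and None otherwise. It says nothing about whether the application behind the port is answering correctly, which is why the HTTP check is still needed.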

Runbook and SLA

Document what to do when the check fails (e.g. check app logs, DB, load balancer). Track uptime against your SLA and report on availability and mean time to recovery (MTTR).
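
Availability and MTTR can be derived from a list of outage durations. The sketch below assumes MTTR means the average outage length from detection to recovery, and the sample figures are made up for illustration.

```python
from datetime import timedelta
from typing import Sequence


def availability_pct(period: timedelta, outages: Sequence[timedelta]) -> float:
    """Percentage of the reporting period during which the service was up."""
    downtime = sum(outages, timedelta())
    return 100.0 * (1 - downtime / period)


def mttr(outages: Sequence[timedelta]) -> timedelta:
    """Mean outage duration; zero when there were no outages."""
    if not outages:
        return timedelta()
    return sum(outages, timedelta()) / len(outages)


# Illustrative numbers: two outages in a 30-day reporting period.
period = timedelta(days=30)
outages = [timedelta(minutes=20), timedelta(minutes=10)]
uptime = availability_pct(period, outages)   # about 99.93
mean_recovery = mttr(outages)                # 15 minutes
```

Thirty minutes of downtime in 30 days (43,200 minutes) works out to roughly 99.93% availability, which would miss a 99.95% target but meet a 99.9% one.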

Summary

Use HTTP or TCP health checks from external probes to monitor availability; alert on consecutive failures and optionally on latency. Use this to measure uptime and to get notified when a service is down.

Prerequisites

None.

Steps

Step 1: Define health endpoint

Expose a /health or /ready endpoint that returns 200 when healthy and 503 when degraded.

Step 2: Configure external check

Set up an uptime check that hits the endpoint periodically; alert after N consecutive failures.

Step 3: TCP and multi-region

Add TCP checks and run all checks from multiple regions for reachability and latency insight.

Step 4: Runbook and SLA

Document the response to failures; track uptime against the SLA.

Verification

Confirm that the check runs on schedule, that an alert fires when the service is down, and that the runbook is followed and MTTR is tracked.

Troubleshooting

False positives: Increase the consecutive failure count; check the probe location and network.
Check passes but app broken: Deepen the health check (e.g. DB query, cache read) and keep it fast.
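
A deeper check that still stays fast might run a few dependency probes under a time budget. This is a sketch only: run_deep_check and the probe names are hypothetical, an in-memory SQLite query stands in for a real database probe, and the budget does not interrupt a probe that is already running, so each probe should carry its own timeout.

```python
import sqlite3
import time
from typing import Callable, Dict


def run_deep_check(probes: Dict[str, Callable[[], bool]], budget_s: float = 1.0) -> Dict[str, bool]:
    """Run each probe in turn; a probe that raises, or starts after the budget is spent, fails."""
    results: Dict[str, bool] = {}
    deadline = time.monotonic() + budget_s
    for name, probe in probes.items():
        if time.monotonic() >= deadline:
            results[name] = False      # budget exhausted: report unhealthy
            continue
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False      # any probe error counts as a failure
    return results


def db_probe() -> bool:
    """Stand-in for a real DB probe: run a trivial query."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT 1").fetchone() == (1,)
    finally:
        conn.close()


def cache_probe() -> bool:
    """Stand-in for a cache read that is currently failing."""
    raise ConnectionError("cache unreachable")


report = run_deep_check({"db": db_probe, "cache": cache_probe})
healthy = all(report.values())
```

The health endpoint would return 200 only when healthy is True, so a broken cache now fails the check instead of passing silently.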
