How to set up disk, CPU, and memory alerts

Define alert rules for disk space, CPU usage, and memory (or swap) so you are notified before outages. Use thresholds and hysteresis to avoid flapping. Use this when configuring a monitoring system (e.g. Prometheus and Alertmanager, or cloud monitoring).

Intent: How-to

Quick answer

Disk: alert when filesystem usage exceeds 85% (warning) or 95% (critical). Compute usage as 1 - node_filesystem_avail_bytes / node_filesystem_size_bytes, or the equivalent in your monitoring system. Alert on the mount points that matter (e.g. /, /var).
CPU: alert when usage stays high for a sustained period (e.g. 5m average >80%) so that brief spikes do not trigger alerts.
Memory: alert when available memory is low (e.g. node_memory_MemAvailable_bytes is <10% of node_memory_MemTotal_bytes) or when swap usage is growing.
For clause: use a for clause (e.g. for: 5m) so the alert fires only after the condition has held for that long; add hysteresis (a lower threshold for resolving than for firing) if needed. Route alerts to email, Slack, or PagerDuty, and document runbooks.

Prerequisites

System metrics basics (CPU, memory, disk)

Steps

Disk alerts

Rule: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes > 0.85, held for 5m, as the warning. Use 0.95 for the critical alert. Filter by mountpoint (/, /var). The alert resolves when usage drops back below the threshold.

CPU and memory alerts

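CPU: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80 for 5m. Aggregating by instance evaluates each host separately; a plain avg() would average idle time across every scraped host.
Memory: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1 for 5m.
Swap: alert when swap in use (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) keeps growing or exceeds a threshold.

For and hysteresis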

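Use for: 5m so transient spikes do not alert. Optionally use tiered thresholds, e.g. warning at 80% and critical at 95%, and resolve only when the value drops below 75% so that a value hovering near the threshold does not cause flapping.

Route and runbook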

Send alerts to Alertmanager or your cloud alerting service, and route them to the team channel or PagerDuty. Add a runbook link or summary to each alert (e.g. 'High disk: see runbook disk-full') and document how to fix the problem.

Summary

Define disk, CPU, and memory alert rules with thresholds and a for clause; route the alerts and add runbooks. Use this to get notified before resources are exhausted.

Prerequisites

System metrics basics.

Steps

Step 1: Disk alerts

Alert when filesystem usage is above 85% (warning) and 95% (critical) for a sustained period.
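
The disk rule in this step could be written as a Prometheus rule file along the following lines. This is a minimal sketch that assumes node_exporter metrics; the group name, alert names, severity label values, and the mountpoint filter are illustrative choices, not requirements.

```yaml
# Sketch of a Prometheus alerting rule file for disk usage.
# Assumes node_exporter metrics; names below are illustrative.
groups:
  - name: disk-alerts
    rules:
      - alert: DiskUsageWarning
        # Used fraction = (size - available) / size, limited to / and /var.
        expr: |
          (
            node_filesystem_size_bytes{mountpoint=~"/|/var"}
            - node_filesystem_avail_bytes{mountpoint=~"/|/var"}
          ) / node_filesystem_size_bytes{mountpoint=~"/|/var"} > 0.85
        for: 5m                      # condition must hold for 5 minutes
        labels:
          severity: warning
        annotations:
          summary: "High disk: see runbook disk-full"
      - alert: DiskUsageCritical
        expr: |
          (
            node_filesystem_size_bytes{mountpoint=~"/|/var"}
            - node_filesystem_avail_bytes{mountpoint=~"/|/var"}
          ) / node_filesystem_size_bytes{mountpoint=~"/|/var"} > 0.95
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High disk: see runbook disk-full"
```

Above 95% both rules are active at once, so the severity label is what lets routing treat them differently.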

Step 2: CPU and memory alerts

Alert on sustained high CPU and low available memory; optionally on swap growth.
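
As a sketch of this step, the CPU, memory, and swap conditions might look like the rule group below. It assumes node_exporter metrics; the alert names and the 50% swap threshold are assumptions made for illustration and should be tuned to your hosts.

```yaml
# Sketch of CPU, memory, and swap alerting rules (node_exporter metrics assumed).
groups:
  - name: cpu-memory-alerts          # illustrative group name
    rules:
      - alert: HighCpuUsage
        # Busy percentage per host = 100 - idle percentage over the last 5m.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
      - alert: LowMemoryAvailable
        # Less than 10% of total memory is available.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 5m
        labels:
          severity: warning
      - alert: HighSwapUsage
        # Fraction of swap in use; 0.5 is an assumed threshold.
        # Hosts without swap yield NaN here and never match.
        expr: (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes > 0.5
        for: 5m
        labels:
          severity: warning
```

To alert on swap growth rather than a fixed level, a function such as deriv() over the swap-free gauge is one option, at the cost of more tuning.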

Step 3: For and hysteresis

Use a for clause to filter out transient spikes; if the alert still flaps, resolve at a lower threshold than the one that fires it.
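
Prometheus alerting rules have no dedicated resolve-threshold field, so threshold hysteresis has to be expressed in the rule expression itself. One commonly used pattern, sketched here under the assumption of node_exporter metrics and an illustrative alert name, is to consult the built-in ALERTS series: fire above 85%, and while the alert is already firing keep it active until usage drops below 75%.

```yaml
# Sketch: fire above 85%, resolve only below 75% (illustrative names and values).
groups:
  - name: disk-alerts-hysteresis
    rules:
      - alert: DiskUsageWarning
        expr: |
          (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.85
          or
          (
            (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.75
            and on (instance, mountpoint)
            ALERTS{alertname="DiskUsageWarning", alertstate="firing"}
          )
        for: 5m                      # still require 5 minutes before first firing
        labels:
          severity: warning
```

The alertname inside the ALERTS selector must match the rule's own name. Recent Prometheus versions also offer a keep_firing_for field on alerting rules, which gives time-based rather than threshold-based damping; check that your version supports it before relying on it.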

Step 4: Route and runbook

Route alerts to the right channel; link or describe runbooks for each alert.
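
If Alertmanager handles delivery, the routing described in this step could be sketched as below. The receiver names, the Slack channel, the webhook URL, and the PagerDuty key are all placeholders, and routing on a severity label assumes your rules set one.

```yaml
# Sketch of an Alertmanager configuration (all names and secrets are placeholders).
route:
  receiver: team-slack               # default receiver for everything
  group_by: [alertname, instance]
  routes:
    - matchers:
        - severity="critical"        # critical alerts page the on-call
      receiver: oncall-pagerduty

receivers:
  - name: team-slack
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE/ME"
        channel: "#alerts"
  - name: oncall-pagerduty
    pagerduty_configs:
      - routing_key: "REPLACE_ME"
```

The runbook reference itself usually lives in the alert rule's annotations (for example a summary such as 'High disk: see runbook disk-full'), so it travels with the notification to whichever receiver is chosen.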

Verification

Alerts fire when the condition is met; resolve when condition clears; runbooks are available.
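
One way to check firing behaviour without filling a real disk is a promtool rule unit test. The sketch below assumes a hypothetical rule file named disk_alerts.yml that contains a DiskUsageWarning rule (usage > 0.85, for: 5m) with a severity: warning label and no annotations; if your rule has annotations, list them under exp_annotations.

```yaml
# Sketch of a promtool unit test; file and alert names are hypothetical.
rule_files:
  - disk_alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Constant 90% usage: size 100, available 10, for 11 samples.
      - series: 'node_filesystem_size_bytes{instance="host1", mountpoint="/"}'
        values: '100+0x10'
      - series: 'node_filesystem_avail_bytes{instance="host1", mountpoint="/"}'
        values: '10+0x10'
    alert_rule_test:
      # Before the for duration has elapsed the alert is only pending.
      - eval_time: 2m
        alertname: DiskUsageWarning
        exp_alerts: []
      # After 5 minutes of the condition holding, the alert fires.
      - eval_time: 10m
        alertname: DiskUsageWarning
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
              mountpoint: /
```

Run it with promtool test rules followed by the test file name; promtool check rules validates the syntax of the rule file itself.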

Troubleshooting

Alert storm: increase the for duration or raise the thresholds; add hysteresis.
Missing alerts: check metric names and labels; verify that targets are being scraped and that the alert rule is being evaluated.
