System metrics basics (CPU, memory, disk)
Topic: Monitoring basics
Summary
Collect and interpret basic system metrics: CPU usage, memory (used, available, swap), and disk usage. Use top, free, df, and similar tools or an agent (e.g. Node Exporter) for monitoring. Use this when setting up monitoring or when diagnosing resource-related issues.
Intent: How-to
Quick answer
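- CPU: top or htop for per-process usage; /proc/stat or node_exporter for system-wide usage. Watch the user, system, iowait, and steal shares. High iowait means CPUs are idle while waiting on I/O (usually disk); high steal (in a VM) means the hypervisor is giving CPU time to other guests.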
- Memory: free -h; read the available column, not just free. Watch swap usage; if swap keeps growing, memory pressure is high. Per-process: top (RES, VIRT) or ps -o rss,vsz.
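- Disk: df -h for usage; iostat for I/O. Watch for full filesystems and high %util (a sign of saturation, though less reliable on devices that serve requests in parallel, such as SSDs and RAID arrays).
- Collection: gather metrics with an agent (Node Exporter, collectd) and send them to a time-series DB or monitoring service.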
Steps
- CPU metrics
  Run top or htop and note %CPU per process and overall. For scripts, read /proc/stat or use mpstat. Use node_exporter or collectd to expose the metrics to Prometheus or your monitoring stack.
- Memory metrics
  Run free -h and focus on the available column. Check swap activity with vmstat (the si and so columns show swap in and swap out). Expose node_memory_* with Node Exporter or an equivalent agent; alert when available memory drops below a threshold or when swap usage keeps growing.
- Disk metrics
  Run df -h for usage and iostat -x for utilization and throughput. Alert when a filesystem passes roughly 85% (warning) or 90% (critical) full; alert on sustained high %util if I/O performance matters for the workload. Include these in the agent metrics (node_filesystem_*, node_disk_*).
- Aggregate and alert
  Run an agent (Node Exporter) that exposes metrics; scrape it with Prometheus or send the data to cloud monitoring. Define alerts for high CPU, low memory, disk full, and high I/O wait.
Summary
Collect CPU, memory, and disk metrics with OS tools or an agent; expose and scrape for alerting. Use this to set up basic system monitoring and to interpret resource usage.
Prerequisites
None.
Steps
Step 1: CPU metrics
Use top/htop or /proc/stat; expose via Node Exporter or similar for scraping.
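For scripted collection, the following is a minimal sketch of how system-wide CPU shares could be derived from two samples of the aggregate "cpu" line in /proc/stat. The function names are hypothetical, and the field order is assumed to follow proc(5) on a reasonably recent Linux kernel.

```python
# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields contribute to the total.
    counted = [f for f in FIELDS[:8] if f in before and f in after]
    deltas = {f: after[f] - before[f] for f in counted}
    total = sum(deltas.values())
    if total <= 0:
        return {f: 0.0 for f in counted}
    return {f: 100.0 * d / total for f, d in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read the current counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

To use it, call read_cpu_sample() twice a second or more apart and pass both results to cpu_percentages(); the iowait and steal entries are the ones to watch. The counters are cumulative since boot, which is why a single sample is not enough.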
Step 2: Memory metrics
Use free and vmstat; track available and swap; expose and alert.
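As an illustration of tracking available memory and swap, here is a small sketch that parses /proc/meminfo. The function names are hypothetical; it assumes a kernel that reports MemAvailable (Linux 3.14 or later) and that values are in kiB, as /proc/meminfo prints them.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kiB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # The first field is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Return available memory and used swap as percentages."""
    available_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    # Hosts without swap report SwapTotal as 0; avoid dividing by zero.
    swap_used_pct = 100.0 * swap_used / swap_total if swap_total else 0.0
    return {"available_pct": available_pct, "swap_used_pct": swap_used_pct}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling memory_summary(read_meminfo()) at intervals and comparing swap_used_pct over time gives a rough view of swap growth; the thresholds to alert on depend on the workload.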
Step 3: Disk metrics
Use df and iostat; alert on usage and utilization; include in agent metrics.
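A usage check like this can be sketched with Python's standard shutil.disk_usage. The function names and the 85% and 90% defaults are illustrative assumptions, not fixed recommendations.

```python
import shutil


def used_percent(used, free):
    """Used share of the space an unprivileged user can reach."""
    # used + free can be smaller than the filesystem size because of
    # blocks reserved for root; this is close to how df derives Use%.
    reachable = used + free
    return 100.0 * used / reachable if reachable else 0.0


def classify_usage(pct, warn=85.0, crit=90.0):
    """Map a usage percentage to a simple severity label."""
    if pct >= crit:
        return "critical"
    if pct >= warn:
        return "warning"
    return "ok"


def check_path(path="/", warn=85.0, crit=90.0):
    """Return (percent used, severity) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    pct = used_percent(usage.used, usage.free)
    return pct, classify_usage(pct, warn, crit)
```

For example, check_path("/var") returns the usage percentage and a severity label for the filesystem that holds /var. This covers capacity only; I/O utilization still needs iostat or the agent's node_disk_* metrics.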
Step 4: Aggregate and alert
Scrape metrics with Prometheus or send to cloud; define alerts for critical thresholds.
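To show what a scrape-and-evaluate cycle involves, the sketch below reads Prometheus text-format output and checks one threshold. It assumes Node Exporter is listening on its default port 9100; the function names and the 10% threshold are hypothetical, and the parser is deliberately simplified. In practice Prometheus does the scraping and alert rules do the evaluation.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Fetch the raw text exposition from an agent (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse text-format lines into {series: value}.

    Simplified: skips comments, ignores optional timestamps, and does
    not handle a closing brace inside a label value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])
    return samples


def evaluate(samples, min_available_ratio=0.10):
    """Return a list of alert names for the given samples."""
    alerts = []
    available = samples["node_memory_MemAvailable_bytes"]
    total = samples["node_memory_MemTotal_bytes"]
    if available / total < min_available_ratio:
        alerts.append("low memory")
    return alerts
```

Running evaluate(parse_samples(fetch_metrics())) on a host with the agent installed returns the alerts that would fire. Checks for CPU, disk, and I/O wait would follow the same pattern using the corresponding node_* series.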
Verification
- Metrics are collected and visible in your monitoring system; alerts fire when thresholds are exceeded.
Troubleshooting
- No metrics: Ensure the agent is running and reachable; check the firewall and scrape config.
- Too many alerts: Tune thresholds; use hysteresis or rate-of-change to reduce noise.
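Hysteresis, as suggested for noisy alerts, can be sketched as an alert with separate fire and clear thresholds. The class name and the threshold values in the usage note are hypothetical; most monitoring systems offer an equivalent built in.

```python
class HysteresisAlert:
    """Fire at one threshold and clear only at a lower one."""

    def __init__(self, fire_at, clear_at):
        if clear_at >= fire_at:
            raise ValueError("clear_at must be below fire_at")
        self.fire_at = fire_at
        self.clear_at = clear_at
        self.firing = False

    def update(self, value):
        """Feed one measurement and return whether the alert is firing."""
        if not self.firing and value >= self.fire_at:
            self.firing = True
        elif self.firing and value <= self.clear_at:
            self.firing = False
        return self.firing
```

With HysteresisAlert(fire_at=90, clear_at=80), a disk that hovers between 88% and 92% raises a single alert instead of flapping each time it crosses 90%; the alert clears only once usage falls to 80% or below.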