System metrics basics (CPU, memory, disk)

Collect and interpret basic system metrics: CPU usage, memory (used, available, swap), and disk usage. Use top, free, df, and similar tools or an agent (e.g. Node Exporter) for monitoring. Use this when setting up monitoring or when diagnosing resource-related issues.

Intent: How-to

Quick answer

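CPU: use top or htop for per-process usage; use /proc/stat or node_exporter for system-wide figures. Watch the user, system, iowait, and steal fields. High iowait means CPUs are idle while waiting on I/O (usually disk); high steal (on a VM) means the hypervisor is giving CPU time to other guests.
Memory: run free -h and read available, not just free, because available includes cache the kernel can reclaim. Watch swap usage; if swap keeps growing, memory pressure is high. Per-process: top (RES, VIRT) or ps -o rss,vsz.
Disk: use df -h for usage and iostat -x for I/O. Watch for full filesystems and sustained high %util (a sign of saturation, although devices that serve requests in parallel, such as SSDs and RAID arrays, can show high %util before they are saturated). Collect metrics with an agent (Node Exporter, collectd) and send them to a time-series DB or monitoring service.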

Steps

CPU metrics

Run top or htop and note %CPU per process and the overall breakdown. For scripts, read /proc/stat or use mpstat. Use node_exporter or collectd to expose metrics for Prometheus or your monitoring stack.

Memory metrics

Run free -h and focus on the available column. Check swap activity with vmstat (the si and so columns show swap-in and swap-out). Expose node_memory_* with Node Exporter or an equivalent agent; alert when available memory falls below a threshold or swap usage keeps growing.

Disk metrics

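Run df -h for usage and iostat -x for utilization and throughput. Alert when a filesystem passes a usage threshold (for example, warn above 85% and escalate above 90%); alert on sustained high %util if I/O is critical to the workload. Include disk metrics in the agent output (node_filesystem_*, node_disk_*).

Aggregate and alert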

Run an agent (Node Exporter) that exposes metrics; scrape with Prometheus or send to cloud monitoring. Define alerts for high CPU, low memory, disk full, and high I/O wait.

Summary

Collect CPU, memory, and disk metrics with OS tools or an agent; expose and scrape for alerting. Use this to set up basic system monitoring and to interpret resource usage.

Prerequisites

None.

Steps

Step 1: CPU metrics

Use top/htop or /proc/stat; expose via Node Exporter or similar for scraping.
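For scripted collection, the /proc/stat approach can be sketched in Python as below. This is a minimal illustration, not a replacement for mpstat or an agent: it assumes a Linux host and the field order documented in proc(5), and the function names (parse_cpu_line, cpu_percentages, sample_cpu) are hypothetical helpers invented for this example. The counters are cumulative ticks since boot, so percentages only make sense as the difference between two samples.

```python
from pathlib import Path
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        # Match only the aggregate line, not per-core lines such as "cpu0".
        if parts and parts[0] == "cpu":
            numbers = (int(v) for v in parts[1:1 + len(FIELDS)])
            return dict(zip(FIELDS, numbers))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Turn two cumulative samples into each field's share of elapsed ticks."""
    deltas = {name: after.get(name, 0) - before.get(name, 0) for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_cpu(interval_seconds=1.0, path="/proc/stat"):
    """Read the live counters twice (Linux only) and return percentages."""
    first = parse_cpu_line(Path(path).read_text())
    time.sleep(interval_seconds)
    second = parse_cpu_line(Path(path).read_text())
    return cpu_percentages(first, second)
```

Calling sample_cpu() on a Linux host returns a dict keyed by field name; the iowait and steal entries are the ones to compare against the guidance above.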

Step 2: Memory metrics

Use free and vmstat; track available and swap; expose and alert.
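The same figures that free reports can be read from /proc/meminfo. The sketch below is one possible way to track available memory and swap use from a script; it assumes a Linux kernel recent enough to report MemAvailable, and the helper names (parse_meminfo, memory_summary, read_memory_summary) are made up for this example.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        amount = int(fields[0])
        # Most entries are in kB; a few (e.g. HugePages_Total) are plain counts.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def memory_summary(meminfo):
    """Summarize available memory and swap use from parsed meminfo values."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    swap_total = meminfo.get("SwapTotal", 0)
    swap_used = swap_total - meminfo.get("SwapFree", 0)
    return {
        "available_pct": 100.0 * available / total,
        "swap_used_bytes": swap_used,
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read the live values (Linux only)."""
    return memory_summary(parse_meminfo(Path(path).read_text()))
```

Recording swap_used_bytes at intervals and comparing successive readings is one simple way to detect the "swap is growing" condition; vmstat's si and so columns show the same trend as a rate.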

Step 3: Disk metrics

Use df and iostat; alert on usage and utilization; include in agent metrics.
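A usage check along the lines of df can be sketched with Python's standard shutil.disk_usage. The 85% and 90% thresholds mirror the ones mentioned earlier and are illustrative defaults, not recommendations for every system; used_percent, classify, and check_filesystem are hypothetical names. The percentage is computed as used / (used + free) so that space reserved for root is left out of the denominator, which should track df's Use% column closely.

```python
import shutil


def used_percent(used, free):
    """Percentage of the space usable by ordinary users that is in use."""
    usable = used + free
    if usable <= 0:
        return 0.0
    return 100.0 * used / usable


def classify(used_pct, warn_at=85.0, crit_at=90.0):
    """Map a usage percentage onto a simple severity label."""
    if used_pct >= crit_at:
        return "critical"
    if used_pct >= warn_at:
        return "warning"
    return "ok"


def check_filesystem(path="/"):
    """Return (used percentage, severity) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    pct = used_percent(usage.used, usage.free)
    return pct, classify(pct)
```

For example, check_filesystem("/var") reports on whichever filesystem contains /var. This covers capacity only; utilization figures such as %util still come from iostat -x or the agent's node_disk_* metrics.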

Step 4: Aggregate and alert

Scrape metrics with Prometheus or send to cloud; define alerts for critical thresholds.
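Before wiring up alerts, it can help to confirm what the agent actually exposes. The sketch below fetches and parses a metrics page in the Prometheus text format; it assumes Node Exporter is listening on its default port 9100 on localhost, and the helper names are invented for this example. The parser is deliberately simplified (it ignores optional timestamps) and is meant for spot checks, not as a substitute for Prometheus itself.

```python
import urllib.request


def parse_samples(text):
    """Parse text-format metric lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; labels may contain spaces.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Fetch the raw metrics page (URL is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def available_memory_pct(samples):
    """Available memory as a percentage of total, from node_memory_* gauges."""
    available = samples["node_memory_MemAvailable_bytes"]
    total = samples["node_memory_MemTotal_bytes"]
    return 100.0 * available / total
```

Running available_memory_pct(parse_samples(fetch_metrics())) on a host with the agent installed gives a quick sanity check that the scrape target is reachable and the expected series exist. Production alerting would normally be defined as rules in the monitoring system rather than in a script like this.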

Verification

Metrics are collected and visible in your monitoring system; alerts fire when thresholds are exceeded.

Troubleshooting

No metrics: Ensure the agent is running and reachable; check the firewall and scrape config.
Too many alerts: Tune thresholds; use hysteresis or rate-of-change to reduce noise.
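Hysteresis means using a higher threshold to raise an alert than to clear it, so a value hovering near the limit does not flap. The class below is a minimal sketch of that idea; HysteresisAlert and its parameters are hypothetical, and most monitoring systems offer an equivalent built in (for instance, requiring a condition to hold for some duration before firing).

```python
class HysteresisAlert:
    """Fire at or above fire_above; clear only at or below clear_below."""

    def __init__(self, fire_above, clear_below):
        if clear_below >= fire_above:
            raise ValueError("clear_below must be lower than fire_above")
        self.fire_above = fire_above
        self.clear_below = clear_below
        self.firing = False

    def update(self, value):
        """Feed one reading and return whether the alert is firing."""
        if not self.firing and value >= self.fire_above:
            self.firing = True
        elif self.firing and value <= self.clear_below:
            self.firing = False
        return self.firing
```

With HysteresisAlert(fire_above=90, clear_below=85), a disk that oscillates between 89% and 91% raises one alert and keeps it open, instead of firing and clearing on every sample.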
