Out of memory (OOM): how to diagnose and fix

Topic: Servers / Linux

Summary

Diagnose OOM: check dmesg and journalctl for oom-killer, identify the process killed and what was using memory; fix by adding RAM, limiting process memory, or fixing leaks. Use this when the system kills processes or becomes unresponsive and logs show out-of-memory.

Intent: Troubleshooting

Quick answer

  • Confirm: dmesg | grep -i oom, or journalctl -k -b -1 | grep -i oom for the previous boot; note the name and PID of the killed process and read the full OOM report in the journal or /var/log/syslog.
  • Reduce usage: cap the service with MemoryMax= in its systemd unit; restart the leaking app and update it to a fixed version; add swap for breathing room (not a long-term fix for chronic OOM).
  • Prevent: keep MemoryMax= set in the service unit; tune vm.overcommit_memory via sysctl if needed; add RAM or scale out; monitor memory and set alerts so you can act before the next kill.
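
Assuming a Linux host with procps installed, the quick-answer triage can be collected into one short script (a sketch; dmesg may need root on hardened systems, and journalctl may be absent on non-systemd boxes):

```shell
#!/bin/sh
# One-shot OOM triage: confirm kernel oom-killer activity, then show the
# current memory picture and the biggest resident-memory consumers.

# Kernel ring buffer (may need root) and the previous boot's journal.
dmesg 2>/dev/null | grep -i 'oom' | tail -n 5
journalctl -k -b -1 2>/dev/null | grep -i 'out of memory' | tail -n 5

# Current usage and the top consumers by resident set size (RSS).
free -h
ps -eo pid,rss,cmd --sort=-rss | head -n 10
```

If the grep lines print nothing, the kernel has not OOM-killed anything in the logs you can see; look at application-level logs instead.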

Prerequisites

  • Root or sudo; access to dmesg or journal.

Steps

Step 1: Confirm OOM occurred

dmesg | grep -i oom
journalctl -k -b -1 | grep -i "out of memory\|Killed process"

Note the process name and PID that was killed.
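
The kill line has a fixed shape, so the victim can be pulled out mechanically. A sketch using a sample line in the kernel's format (the PID, name, and sizes below are made up; in practice pipe in dmesg or journalctl -k output):

```shell
#!/bin/sh
# Extract the victim's PID and name from a kernel OOM kill line.
# Sample line mirroring the kernel's "Out of memory: Killed process" format:
line='Out of memory: Killed process 1234 (mysqld) total-vm:1843200kB, anon-rss:1638400kB'

pid=$(printf '%s\n' "$line"  | sed -n 's/.*Killed process \([0-9]*\).*/\1/p')
name=$(printf '%s\n' "$line" | sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')

echo "victim pid=$pid name=$name"
# → victim pid=1234 name=mysqld
```

The report also lists total-vm and anon-rss for the victim, which tells you whether it was genuinely the biggest consumer or just an unlucky target.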

Step 2: Identify the victim and consumers

free -h
ps -eo pid,rss,cmd --sort=-rss | head -20
systemctl show nginx --property=MemoryMax
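
ps sorted by RSS shows individual processes; for forking services a sketch that sums RSS per command name can be more honest about the aggregate footprint (note: RSS double-counts pages shared across a fork pool, so treat the totals as an upper bound):

```shell
#!/bin/sh
# Aggregate resident memory (kB) by command name, largest first.
# rss= / comm= suppress the header so awk sees only data rows.
ps -eo rss=,comm= | awk '
  { total[$2] += $1 }
  END { for (c in total) printf "%10d kB  %s\n", total[c], c }
' | sort -rn | head -n 10
```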

Step 3: Fix or limit

  • Cap the service: add MemoryMax=512M (or a value sized from observed usage) under [Service] in the unit file, then systemctl daemon-reload and restart the service.
  • Fix the application memory leak or upgrade to a fixed version.
  • For temporary relief, add swap: sudo fallocate -l 2G /swapfile; sudo chmod 600 /swapfile; sudo mkswap /swapfile; sudo swapon /swapfile; persist it with a "/swapfile none swap sw 0 0" line in /etc/fstab.
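
A drop-in override keeps the limit out of the packaged unit file. A sketch (the service name, path, and sizes are placeholders; size the limits from the RSS you observed, and note MemoryHigh=/MemoryMax= require cgroup v2):

```ini
# /etc/systemd/system/myapp.service.d/memory.conf  (path is illustrative)
[Service]
# Hard cap: the cgroup is OOM-killed if it exceeds this.
MemoryMax=512M
# Soft cap: reclaim/throttling starts here, before the hard cap is hit.
MemoryHigh=448M
```

Apply with sudo systemctl daemon-reload && sudo systemctl restart myapp, then confirm with systemctl show myapp --property=MemoryMax.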

Step 4: Verify and monitor

  • free -h shows the new swap if you added it; no new OOM entries appear in dmesg or the journal; alert on high memory usage so the next spike is caught early.
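
For the alerting piece, a minimal Linux-specific sketch that reads /proc/meminfo (the 10% threshold is an arbitrary placeholder; run it from cron or fold the check into your monitoring agent):

```shell
#!/bin/sh
# Warn when available memory drops below 10% of total.
# MemAvailable (kernel 3.14+) estimates memory usable without swapping.
avail_pct=$(awk '
  /^MemTotal:/     { total = $2 }
  /^MemAvailable:/ { avail = $2 }
  END { printf "%d", (avail * 100) / total }
' /proc/meminfo)

if [ "$avail_pct" -lt 10 ]; then
  echo "WARNING: only ${avail_pct}% of memory available"
fi
echo "available: ${avail_pct}%"
```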

Verification

  • OOM entries stop; memory usage stable or capped; service runs within limits.

Troubleshooting

OOM keeps happening — MemoryMax may be too high for the box; lower it or add RAM; find and fix the leak in the app.

Wrong process killed — the OOM killer picks its victim by oom_score; give critical processes a negative oom_score_adj (less likely to be killed), or set OOMScoreAdjust= in their systemd unit; still prefer fixing the real consumer.
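
To bias the killer away from a critical process, write a negative value to its oom_score_adj (lowering the score requires root; raising it does not). A sketch using the current shell as a stand-in victim:

```shell
#!/bin/sh
# oom_score_adj ranges from -1000 (exempt from OOM kills) to 1000
# (preferred victim). -200 here is an arbitrary example value.
pid=$$   # stand-in: protect this shell; use your service's PID instead
echo -200 > "/proc/$pid/oom_score_adj" 2>/dev/null \
  || echo "lowering the score requires root"
cat "/proc/$pid/oom_score_adj"
```

For a managed service, prefer the declarative form: OOMScoreAdjust=-200 under [Service] in the unit survives restarts, unlike a write to /proc.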
