Out of memory (OOM): how to diagnose and fix
Topic: Servers / Linux
Summary
Diagnose OOM: check dmesg and journalctl for oom-killer, identify the process killed and what was using memory; fix by adding RAM, limiting process memory, or fixing leaks. Use this when the system kills processes or becomes unresponsive and logs show out-of-memory.
Intent: Troubleshooting
Quick answer
- dmesg | grep -i oom or journalctl -k -b -1 | grep -i oom; note the name and PID of the killed process; check /var/log/syslog or the journal for the full OOM report.
- Reduce usage: limit app memory (systemd MemoryMax=), fix leaks (restart and update app), add swap for breathing room (not a long-term fix for chronic OOM).
- Prevent: set MemoryMax in the service unit; tune vm.overcommit if needed; add RAM or scale out; monitor memory and set alerts.
Prerequisites
- Root or sudo; access to dmesg or journal.
Steps
Step 1: Confirm OOM occurred
dmesg | grep -i oom
journalctl -k -b -1 | grep -i "out of memory\|Killed process"
Note the process name and PID that was killed.
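Step 1 can be scripted; a minimal sketch (the sample log line and the awk extraction are illustrative, not taken from your system):

```shell
# Pull any OOM-related lines from the kernel log; guarded so a missing
# match or restricted dmesg does not abort the script.
dmesg 2>/dev/null | grep -iE "out of memory|oom-killer|Killed process" || true

# The kernel report looks roughly like:
#   Out of memory: Killed process 1234 (java) total-vm:8388608kB ...
# Extract the PID and process name from a sample line with awk:
echo "Out of memory: Killed process 1234 (java) total-vm:8388608kB" \
  | awk '{for (i=1;i<=NF;i++) if ($i=="process") print $(i+1), $(i+2)}'
```

The awk one-liner prints the field after "process" (the PID) and the one after that (the command name in parentheses).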
Step 2: Identify the victim and consumers
free -h
ps -eo pid,rss,cmd --sort=-rss | head -20
systemctl show nginx --property=MemoryMax
Step 3: Fix or limit
- In the service unit, set MemoryMax=512M (or an appropriate value), then run systemctl daemon-reload and restart the service.
- Fix the application's memory leak or upgrade to a fixed version.
- Add swap for temporary relief:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add the swapfile to /etc/fstab so it persists across reboots.
- Add RAM if the usage is legitimate.
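One way to apply the limit without editing the unit file directly is a systemd drop-in; a sketch assuming nginx as the service name and 512M as the cap (both example values, pick what fits your box):

```shell
# Create a drop-in directory and a fragment that caps the service's memory.
sudo mkdir -p /etc/systemd/system/nginx.service.d
printf '[Service]\nMemoryMax=512M\n' \
  | sudo tee /etc/systemd/system/nginx.service.d/memory.conf
sudo systemctl daemon-reload
sudo systemctl restart nginx

# One-off alternative that writes the same drop-in for you:
# sudo systemctl set-property nginx.service MemoryMax=512M
```

With the cap in place, the cgroup OOM killer targets that service when it exceeds its limit, rather than the whole system thrashing first.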
Step 4: Verify and monitor
- free -h shows swap if added; no new OOM in dmesg; alert on memory usage.
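The monitoring step can be sketched as a small cron-friendly check; 90% is an example threshold, and the echo stands in for whatever alerting hook you use:

```shell
# Alert when used memory crosses a threshold percentage of total.
# free prints "Mem: total used free ..."; $3/$2 is used/total.
used_pct=$(free | awk '/Mem:/ {printf "%d", $3/$2*100}')
if [ "$used_pct" -ge 90 ]; then
    echo "memory usage at ${used_pct}% - investigate before the OOM killer does"
fi
```

Real deployments would send this to a monitoring system rather than stdout, but the arithmetic is the same.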
Verification
- OOM entries stop; memory usage stable or capped; service runs within limits.
Troubleshooting
- OOM keeps happening: MemoryMax may be too high for the box; lower it or add RAM, and find and fix the leak in the app.
- Wrong process killed: the OOM killer chooses by score; adjust oom_score_adj for critical processes (negative = less likely to be killed), but prefer fixing the real consumer.
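A sketch of the oom_score_adj adjustment (PID 1234 and the value -500 are placeholders; the valid range is -1000, never kill, to 1000):

```shell
# Runtime adjustment for an already-running process.
echo -500 | sudo tee /proc/1234/oom_score_adj

# Persistent equivalent for a systemd service - add to the unit:
#   [Service]
#   OOMScoreAdjust=-500
```

This only shifts the killer's preference; if the box is genuinely out of memory, something else gets killed instead, so it complements rather than replaces fixing the real consumer.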