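On a CentOS 7 server (3.10 kernel), I was surprised to find that the mysqld process had been OOM-killed. The cause is obvious: I sometimes run a very memory-hungry process (WebTorrent) that occasionally gets out of hand (it looks like a memory leak). That is fine with me, as long as it is the one that gets killed when this happens. On another system (Debian 11) that is exactly what happens, but on the older CentOS 7 box other processes get killed instead, and I don't understand why the most obvious candidate isn't chosen.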
From the logs (selected lines only):
Apr 20 09:12:57 vps001 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Apr 20 09:12:58 vps001 kernel: Out of memory (oom_kill_allocating_task): Kill process 996 (mysqld) score 0 or sacrifice child
Apr 20 09:12:58 vps001 kernel: Killed process 918 (mysqld), UID 27, total-vm:2184052kB, anon-rss:18492kB, file-rss:0kB, shmem-rss:0kB
--
Apr 20 09:26:40 vps001 kernel: in:imjournal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Apr 20 09:26:40 vps001 kernel: Out of memory (oom_kill_allocating_task): Kill process 663 (in:imjournal) score 0 or sacrifice child
Apr 20 09:26:40 vps001 kernel: Killed process 653 (rsyslogd), UID 0, total-vm:308640kB, anon-rss:340kB, file-rss:0kB, shmem-rss:156kB
--
Apr 20 09:26:40 vps001 kernel: tmux invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Apr 20 09:26:41 vps001 kernel: Out of memory (oom_kill_allocating_task): Kill process 23040 (tmux) score 0 or sacrifice child
Apr 20 09:26:41 vps001 kernel: Killed process 23041 (bash), UID 0, total-vm:115680kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
--
Apr 20 09:26:41 vps001 kernel: node invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Apr 20 09:26:41 vps001 kernel: Out of memory (oom_kill_allocating_task): Kill process 23241 (node) score 0 or sacrifice child
Apr 20 09:26:41 vps001 kernel: Killed process 23239 (WebTorrent), UID 1000, total-vm:14750096kB, anon-rss:1618448kB, file-rss:0kB, shmem-rss:0kB
Three processes were killed before the right one, for reasons I can't understand.
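To make the pattern in these excerpts easier to see, here is a minimal Python sketch that pairs each "invoked oom-killer" line with the "Killed process" line that follows it. The function name `pair_oom_events` and the regular expressions are my own and only assume the log layout shown above; other kernel versions may format these messages differently.

```python
import re

# Patterns assume the CentOS 7 (3.10) message layout quoted above.
INVOKED = re.compile(r"kernel: (?P<comm>.+?) invoked oom-killer:")
KILLED = re.compile(
    r"kernel: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<rss>\d+)kB"
)

SAMPLE = [
    "Apr 20 09:12:57 vps001 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0",
    "Apr 20 09:12:58 vps001 kernel: Killed process 918 (mysqld), UID 27, total-vm:2184052kB, anon-rss:18492kB, file-rss:0kB, shmem-rss:0kB",
    "Apr 20 09:26:41 vps001 kernel: node invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0",
    "Apr 20 09:26:41 vps001 kernel: Killed process 23239 (WebTorrent), UID 1000, total-vm:14750096kB, anon-rss:1618448kB, file-rss:0kB, shmem-rss:0kB",
]


def pair_oom_events(lines):
    """Return (invoker, victim, victim_pid, victim_anon_rss_kb) tuples."""
    events = []
    invoker = None
    for line in lines:
        match = INVOKED.search(line)
        if match:
            # Remember which task triggered the OOM killer.
            invoker = match.group("comm")
            continue
        match = KILLED.search(line)
        if match:
            events.append(
                (invoker, match.group("comm"),
                 int(match.group("pid")), int(match.group("rss")))
            )
            invoker = None
    return events


if __name__ == "__main__":
    for invoker, victim, pid, rss_kb in pair_oom_events(SAMPLE):
        print(f"{invoker} triggered the OOM killer; killed {victim} "
              f"(pid {pid}, anon-rss {rss_kb} kB)")
```

Running this over the full journal (for example the output of `journalctl -k`) puts the triggering task and the victim side by side for each event, which makes it easy to compare the victim's anon-rss with what the biggest consumer was using.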
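What I expected:
- The process consuming the most memory gets killed
- Processes with UID 0 are not killed before those of other UIDs
- UID 27 is more "important" than UID 1000
I would like to understand OOM behavior better, and in particular why all of my assumptions were wrong.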
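For the first expectation, it may help to look at the ranking the kernel itself exposes. When the OOM killer does scan for a victim, it compares a per-process badness score that can be read from /proc/<pid>/oom_score. The sketch below lists processes by that score; `read_oom_scores` is a hypothetical helper of mine, and it only reports what the kernel publishes rather than reimplementing the heuristic.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, comm) tuples, highest score first."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as handle:
                score = int(handle.read().strip())
            with open(os.path.join(base, "comm")) as handle:
                comm = handle.read().strip()
        except (OSError, ValueError):
            continue  # the process exited or is not readable
        scores.append((score, int(entry), comm))
    scores.sort(reverse=True)
    return scores


if __name__ == "__main__":
    # Show the five tasks the scanning OOM killer would currently rank highest.
    for score, pid, comm in read_oom_scores()[:5]:
        print(f"{score:>6}  {pid:>7}  {comm}")
```

If the memory-hungry process sits at the top of this list and still is not the one being killed, that suggests the victim is not being chosen by score at all.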
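After reading the kernel source, I now understand, and it is obvious:
oom_kill_allocating_task
means that no scan is performed and the task that requested the memory is the one that gets killed. This is not the default behavior; a sysctl setting on my system had enabled it.
So I must have set it a long time ago without understanding the consequences. Thanks for your input.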
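For anyone who wants to check their own machine, here is a minimal Python sketch. It reads the live value from /proc/sys/vm/oom_kill_allocating_task (non-zero means the allocating task is killed, 0 is the default) and searches sysctl configuration files for the line that sets it. The helper names are mine, and the file locations in the usage part are the usual ones rather than anything taken from my server.

```python
import glob
from pathlib import Path

KEY = "vm.oom_kill_allocating_task"


def oom_kill_allocating_task_enabled(
        path="/proc/sys/vm/oom_kill_allocating_task"):
    """Return True if the running kernel kills the allocating task on OOM."""
    return int(Path(path).read_text().strip()) != 0


def find_sysctl_setting(files, key=KEY):
    """Return (file, line_number, value) for each line that sets `key`.

    Only the dotted key form is matched; comments and blank lines are skipped.
    """
    hits = []
    for filename in files:
        try:
            text = Path(filename).read_text()
        except OSError:
            continue  # missing or unreadable file
        for lineno, line in enumerate(text.splitlines(), 1):
            stripped = line.strip()
            if stripped.startswith(("#", ";")) or "=" not in stripped:
                continue
            name, value = stripped.split("=", 1)
            if name.strip() == key:
                hits.append((str(filename), lineno, value.strip()))
    return hits


if __name__ == "__main__":
    print("enabled:", oom_kill_allocating_task_enabled())
    # Assumed locations; adjust for your distribution.
    candidates = ["/etc/sysctl.conf"] + sorted(glob.glob("/etc/sysctl.d/*.conf"))
    for filename, lineno, value in find_sysctl_setting(candidates):
        print(f"{filename}:{lineno}: {KEY} = {value}")
```

Setting the value back to 0 (for example with `sysctl -w vm.oom_kill_allocating_task=0`, plus removing the line from the configuration file) restores the default behavior, where the kernel scans the task list for a victim.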