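This machine has plenty of swap, yet processes still occasionally get killed by the oom-killer. Can anyone explain this behavior and, more importantly, how to prevent it?
dmesg output: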
python invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=4
Pid: 13996, comm: python Not tainted 2.6.27-gentoo-r8cluster-e1000 #9
Call Trace:
[<ffffffff8025ab6b>] oom_kill_process+0x57/0x1dc
[<ffffffff802460c7>] getnstimeofday+0x53/0xb3
[<ffffffff8025ae78>] badness+0x16a/0x1a9
[<ffffffff8025b0a9>] out_of_memory+0x1f2/0x25c
[<ffffffff8025e181>] __alloc_pages_internal+0x30f/0x3b2
[<ffffffff8026fea0>] read_swap_cache_async+0x48/0xc0
[<ffffffff8026ff6f>] swapin_readahead+0x57/0x98
[<ffffffff80266d0e>] handle_mm_fault+0x408/0x706
[<ffffffff8057da33>] do_page_fault+0x42c/0x7e7
[<ffffffff8057baf9>] error_exit+0x0/0x51
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 103
CPU 1: hi: 186, btch: 31 usd: 48
CPU 2: hi: 186, btch: 31 usd: 136
CPU 3: hi: 186, btch: 31 usd: 183
Active:480346 inactive:483 dirty:0 writeback:10 unstable:0
free:3408 slab:5146 mapped:1408 pagetables:2687 bounce:0
Node 0 DMA free:8024kB min:20kB low:24kB high:28kB active:1156kB inactive:0kB present:8364kB pages_scanned:3246 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5608kB min:5716kB low:7144kB high:8572kB active:1920228kB inactive:1932kB present:2051308kB pages_scanned:2941301 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 8*4kB 3*8kB 4*16kB 3*32kB 4*64kB 3*128kB 2*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 8024kB
Node 0 DMA32: 42*4kB 6*8kB 1*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5608kB
325424 total pagecache pages
323900 pages in swap cache
Swap cache stats: add 20776604, delete 20452704, find 7856195/10744535
Free swap = 151691424kB
Total swap = 156290896kB
524032 pages RAM
9003 pages reserved
331431 pages shared
186210 pages non-shared
Out of memory: kill process 12965 (bash) score 2236480 or a child
Killed process 13996 (python)
VM-related sysctl settings:
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 2
vm.swappiness = 60
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.drop_caches = 0
vm.min_free_kbytes = 5740
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65536
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.numa_zonelist_order = default
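Take a look at this page for some information that may help you diagnose your problem.
In particular, you will want to look at /proc/meminfo and /proc/slabinfo for more information as a start. You have a device driver or some other kernel subsystem allocating a large amount of real memory. That is why it is not being swapped out to your swap space.
You will need to identify the workload you are running and try to isolate the kernel subsystem that is allocating all that memory.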
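As a starting point for that inspection, here is a minimal Python sketch that summarizes the two files. The function names (`parse_meminfo`, `top_slab_caches`, `report`) are made up for illustration, and the slab parser assumes the "slabinfo - version: 2.1" column layout (name, active_objs, num_objs, objsize, ...). The per-cache size is only an estimate (num_objs times objsize) and ignores per-slab overhead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def top_slab_caches(text, n=5):
    """Return the n largest slab caches as (name, approx_kB) tuples.

    Assumes the slabinfo 2.1 layout: name active_objs num_objs objsize ...
    """
    caches = []
    for line in text.splitlines():
        if line.startswith("slabinfo") or line.startswith("#"):
            continue  # skip the version line and the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            num_objs = int(fields[2])
            objsize = int(fields[3])
        except ValueError:
            continue  # unexpected layout, ignore this line
        caches.append((fields[0], num_objs * objsize // 1024))
    caches.sort(key=lambda c: c[1], reverse=True)
    return caches[:n]


def report(meminfo_path="/proc/meminfo", slabinfo_path="/proc/slabinfo"):
    """Print a short summary; reading slabinfo usually requires root."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    for key in ("MemTotal", "MemFree", "Cached", "SwapCached", "Slab", "PageTables"):
        if key in mem:
            print(f"{key:12s} {mem[key]:>12d} kB")
    with open(slabinfo_path) as f:
        for name, kb in top_slab_caches(f.read()):
            print(f"{name:24s} ~{kb} kB")
```

Calling `report()` as root on the affected machine, ideally shortly before the oom-killer fires, shows whether the slab total or a single slab cache accounts for an unusually large share of RAM. Memory that a driver takes straight from the page allocator will not show up in the slab caches, so this only narrows the search.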