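On our virtualization server running KVM, the CPU cores are disabled and re-enabled in a cycle every 10 minutes (each disable causes all virtual machines to hang for about 15 seconds).
It started a week ago during a thunderstorm, when all virtual servers went down because of errors on the data disk (the system disk was fine). So we replaced the data disk. Next, we tried upgrading the host system from Ubuntu natty (kernel 2.6) to Ubuntu precise (kernel 3.2), but nothing changed.
I found only one forum thread about this, and it has no solution: http://ubuntuforums.org/showthread.php?p=12071553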
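I tried turning on KVM debugging and reading
/sys/kernel/debug/tracing/trace_pipe
and located the exact spot by matching the kernel time from syslog, but I do not understand the trace output and did not see any significant difference.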
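I think this could be some bad signal coming from the motherboard. Something may have happened to the motherboard as a result of the disk failure, but I don't know how to check for that.
Here is the part of the syslog covering one disable/enable cycle: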
Jul 14 15:36:44 node-01 kernel: [56713.568733] kvm: disabling virtualization on CPU1
Jul 14 15:36:44 node-01 kernel: [56713.668842] CPU 1 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.670835] CPU 3 MCA banks CMCI:2 CMCI:3 CMCI:5
Jul 14 15:36:44 node-01 kernel: [56713.673771] kvm: disabling virtualization on CPU2
Jul 14 15:36:44 node-01 kernel: [56713.674492] CPU 2 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.680172] kvm: disabling virtualization on CPU3
Jul 14 15:36:44 node-01 kernel: [56713.681114] CPU 3 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.681119] SMP alternatives: switching to UP code
Jul 14 15:36:44 node-01 kernel: [56713.701971] init: anacron main process (3613) killed by TERM signal
Jul 14 15:36:44 node-01 kernel: [56713.709803] r8169 0000:01:00.0: eth0: link down
Jul 14 15:36:44 node-01 kernel: [56713.710421] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:47 node-01 kernel: [56716.675313] r8169 0000:01:00.0: eth0: link up
Jul 14 15:36:47 node-01 kernel: [56716.676438] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:47 node-01 kernel: [56716.676454] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:56 node-01 kernel: [56725.666787] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:02 node-01 kernel: [56730.815937] SMP alternatives: switching to SMP code
Jul 14 15:37:02 node-01 kernel: [56730.825021] Booting Node 0 Processor 1 APIC 0x4
Jul 14 15:37:02 node-01 kernel: [56730.825025] smpboot cpu 1: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.836033] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.837012] kvm: enabling virtualization on CPU1
Jul 14 15:37:02 node-01 kernel: [56730.858555] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.862547] Booting Node 0 Processor 2 APIC 0x1
Jul 14 15:37:02 node-01 kernel: [56730.862551] smpboot cpu 2: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.873460] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.874453] kvm: enabling virtualization on CPU2
Jul 14 15:37:02 node-01 kernel: [56730.896371] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.898581] Booting Node 0 Processor 3 APIC 0x5
Jul 14 15:37:02 node-01 kernel: [56730.898586] smpboot cpu 3: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.909496] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.910227] kvm: enabling virtualization on CPU3
Jul 14 15:37:02 node-01 kernel: [56730.930644] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.963737] r8169 0000:01:00.0: eth0: link down
Jul 14 15:37:02 node-01 kernel: [56730.964069] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:04 node-01 kernel: [56733.432535] r8169 0000:01:00.0: eth0: link up
Jul 14 15:37:04 node-01 kernel: [56733.433808] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:04 node-01 kernel: [56733.433823] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:13 node-01 kernel: [56742.424751] br0: port 1(eth0) entering forwarding state
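To put a number on how long each cycle lasts, the kernel timestamps in an excerpt like the one above can be compared with a small script. The following is only a minimal sketch, assuming the syslog line format shown above; the names `find_cycles`, `LINE_RE`, `OFFLINE_RE` and `ONLINE_RE` are made up for illustration, and the regular expressions are based on nothing more than the "kvm: disabling/enabling virtualization" messages in this log.

```python
import re

# Assumption: lines look like
# "Jul 14 15:36:44 node-01 kernel: [56713.568733] CPU 1 is now offline"
LINE_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)$")
OFFLINE_RE = re.compile(r"kvm: disabling virtualization on CPU(\d+)")
ONLINE_RE = re.compile(r"kvm: enabling virtualization on CPU(\d+)")


def find_cycles(lines):
    """Return (start, end) kernel times, one pair per disable/enable cycle.

    start is the first "disabling" message of a cycle,
    end is the last "enabling" message that follows it.
    """
    cycles = []
    start = None        # kernel time of the first disable in the current cycle
    last_enable = None  # kernel time of the most recent enable in this cycle
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a kernel line in the expected format
        timestamp, message = float(match.group(1)), match.group(2)
        if OFFLINE_RE.search(message):
            if last_enable is not None:
                # A disable after an enable means a new cycle has begun.
                cycles.append((start, last_enable))
                start, last_enable = None, None
            if start is None:
                start = timestamp
        elif ONLINE_RE.search(message) and start is not None:
            last_enable = timestamp
    if start is not None and last_enable is not None:
        cycles.append((start, last_enable))
    return cycles


if __name__ == "__main__":
    import sys

    # Usage: python find_cycles.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for begin, end in find_cycles(handle):
            print(f"cycle from {begin:.6f} to {end:.6f}: {end - begin:.3f} s")
```

Running it over the whole syslog should also show whether the interval between cycles is really a constant 10 minutes, which might help tell a timer-driven cause from a random hardware fault.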
Thanks for any hints on how to track down the error.