I'm running SmartOS on a Hetzner EX4S (Intel Core i7-2600, 32 GB RAM, 2x3 TB SATA HDD). There are six virtual machines on the host:
[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID TYPE RAM STATE ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f KVM 512 running xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee KVM 1024 running xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358 OS 1024 running xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399 KVM 1024 running xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962 KVM 4096 running xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188 KVM 9216 running xxxx5
[root@10-bf-48-7f-e7-03 ~]#
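The host reboots several times a week, but there is no crash dump in /var/crash and no relevant message in the /var/adm/messages log. Basically, /var/adm/messages looks as if there had been a hard reset: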
2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
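The problem is that the host sometimes loses its network interface on reboot, so we have to perform a manual hardware reset to bring it back. We have no physical or virtual access to the server console: no KVM, no iLO, or anything similar. So the only way to debug this is to analyze crash dumps and log files. I'm not a SmartOS/Solaris expert, so I'm not sure how to proceed. Is there an equivalent of the Linux netconsole for SmartOS? Can I somehow redirect console output to a network port? Maybe I'm missing something obvious and the crash information is located somewhere else.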
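Run the command
dumpadm
to check whether crash dumps are enabled, and on which device. If they are enabled and you still find no crash dump, suspect a hardware fault and ask your hosting company to move you to a different physical server. (They will also be able to check the hardware logs and fault lights, call the vendor, and so on.)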
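As a rough sketch, the checks could be collected in a small script like the one below. The run_if_present helper is a made-up name used only for illustration. The fmadm and fmdump calls are an addition of mine, assuming the fault manager is running on the host, and /var/crash is taken from the question; the savecore directory that dumpadm reports may be a different path.

```shell
#!/bin/sh
# Hypothetical helper (not part of SmartOS): run a diagnostic command if it
# exists on this system, otherwise say that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        # Report a non-zero exit status instead of aborting the script.
        "$@" || echo "   (exit status $?)"
    else
        echo "== $1: not found, skipping"
    fi
}

# Dump device, savecore directory, and whether savecore runs at boot.
run_if_present dumpadm

# Any saved dumps? (Path taken from the question; compare with dumpadm output.)
run_if_present ls -l /var/crash

# Assumption: the fault manager may have recorded hardware problems.
run_if_present fmadm faulty   # resources currently diagnosed as faulty
run_if_present fmdump -e      # summary of the error telemetry log
```

If dumpadm shows that dumps are enabled and none of these commands reports anything, that is consistent with a hardware reset rather than a kernel panic.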