The server is an HP DL360 G7 with a P410i disk controller, 2x E5620 CPUs, and 16 GB of RAM. Linux mysql 2.6.32-5-amd64 #1 SMP Mon Feb 25 00:26:11 UTC 2013 x86_64 GNU/Linux (Debian 6.0.7)
hpacucli "ctrl all show status"
Smart Array P410i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
hpacucli "ctrl all show config"
Smart Array P410i in Slot 0 (Embedded) (sn: 5001438014555B80)
array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (136.7 GB, RAID 1+0, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 5001438014555B8F)
hpacucli "ctrl slot=0 ld all show"
Smart Array P410i in Slot 0 (Embedded)
array A
logicaldrive 1 (136.7 GB, RAID 1+0, OK)
I run this ad hoc stress script at night:
#!/bin/bash
mkdir -p /isotest
for i in {1..200}; do
    for j in {1..55}; do cp -v /root/ubuntu.iso "/isotest/ubuntu.iso${j}"; done
    rm /isotest/ubuntu.iso*
done
/root/ubuntu.iso is about 2 GB in size.
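For a rough idea of how much data the script pushes through the controller, here is a small back-of-the-envelope calculation. It assumes the ISO is exactly 2 GB (it is only "about" that size), so the figures are estimates, not measurements.

```shell
#!/bin/bash
# Rough write-volume estimate for the stress script above.
# ISO_GB=2 is an assumption; the real file is only "about 2 GB".
ISO_GB=2
COPIES_PER_PASS=55   # inner loop: {1..55}
PASSES=200           # outer loop: {1..200}

per_pass_gb=$((ISO_GB * COPIES_PER_PASS))
total_gb=$((per_pass_gb * PASSES))

echo "per pass: ${per_pass_gb} GB"
echo "total:    ${total_gb} GB"
```

Under that assumption each pass writes roughly 110 GB to a 136.7 GB logical drive before the files are deleted, so the array is mostly filled on every iteration.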
There are some errors in syslog. I think they are related to the disk controller:
Mar 28 06:59:17 mysql kernel: [850337.524306] INFO: task mandb:25565 blocked for more than 120 seconds.
Mar 28 06:59:17 mysql kernel: [850337.524337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 28 06:59:17 mysql kernel: [850337.524381] mandb D ffff88022740fa20 0 25565 25197 0x00000000
Mar 28 06:59:17 mysql kernel: [850337.524385] ffff88041ec4b880 0000000000000082 0000000000000000 000000009d778d11
Mar 28 06:59:17 mysql kernel: [850337.524388] ffffea000defe260 ffffea000defe260 000000000000f9e0 ffff88014d913fd8
Mar 28 06:59:17 mysql kernel: [850337.524390] 00000000000157c0 00000000000157c0 ffff88013228a350 ffff88013228a648
Mar 28 06:59:17 mysql kernel: [850337.524393] Call Trace:
Mar 28 06:59:17 mysql kernel: [850337.524404] [<ffffffff810168ec>] ? read_tsc+0xa/0x20
Mar 28 06:59:17 mysql kernel: [850337.524408] [<ffffffff8106bdca>] ? timekeeping_get_ns+0xe/0x2e
Mar 28 06:59:17 mysql kernel: [850337.524412] [<ffffffff810b4761>] ? sync_page+0x0/0x46
Mar 28 06:59:17 mysql kernel: [850337.524416] [<ffffffff812fc8f2>] ? io_schedule+0x73/0xb7
Mar 28 06:59:17 mysql kernel: [850337.524418] [<ffffffff810b47a2>] ? sync_page+0x41/0x46
Mar 28 06:59:17 mysql kernel: [850337.524421] [<ffffffff812fcd02>] ? __wait_on_bit_lock+0x3f/0x84
Mar 28 06:59:17 mysql kernel: [850337.524423] [<ffffffff810b472e>] ? __lock_page+0x5d/0x63
Mar 28 06:59:17 mysql kernel: [850337.524426] [<ffffffff810652e0>] ? wake_bit_function+0x0/0x23
Mar 28 06:59:17 mysql kernel: [850337.524428] [<ffffffff810b473d>] ? lock_page+0x9/0x1f
Mar 28 06:59:17 mysql kernel: [850337.524431] [<ffffffff810b4853>] ? find_lock_page+0x25/0x45
Mar 28 06:59:17 mysql kernel: [850337.524433] [<ffffffff810b4e63>] ? filemap_fault+0x1a5/0x2f6
Mar 28 06:59:17 mysql kernel: [850337.524438] [<ffffffff810cadf2>] ? __do_fault+0x54/0x3c3
Mar 28 06:59:17 mysql kernel: [850337.524455] [<ffffffffa01702d2>] ? __ext3_journal_stop+0x1f/0x3d [ext3]
Mar 28 06:59:17 mysql kernel: [850337.524458] [<ffffffff810cd146>] ? handle_mm_fault+0x3b8/0x80f
Mar 28 06:59:17 mysql kernel: [850337.524461] [<ffffffff81101d8e>] ? notify_change+0x2b3/0x2c5
Mar 28 06:59:17 mysql kernel: [850337.524464] [<ffffffff81103eb5>] ? mntput_no_expire+0x23/0xee
Mar 28 06:59:17 mysql kernel: [850337.524467] [<ffffffff81300096>] ? do_page_fault+0x2e0/0x2fc
Mar 28 06:59:17 mysql kernel: [850337.524469] [<ffffffff812fdf35>] ? page_fault+0x25/0x30
There are no other error messages.
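One way to double-check that across a whole log file is to count the hung-task reports and list which tasks were affected. This is only a sketch: the helper names count_hung_tasks and list_hung_tasks are made up for illustration, the log path in the usage comment is an assumption, and task names that contain spaces would not be matched.

```shell
#!/bin/bash
# count_hung_tasks FILE: print how many kernel hung-task reports FILE contains.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds' "$1"
}

# list_hung_tasks FILE: print the distinct task names that were reported.
# Matches lines such as "INFO: task mandb:25565 blocked for more than ...".
list_hung_tasks() {
    grep -o 'INFO: task [^ ]*:[0-9]* blocked' "$1" \
        | awk '{print $3}' \
        | cut -d: -f1 \
        | sort -u
}

# Example usage (the path is an assumption about where kernel messages land):
#   count_hung_tasks /var/log/syslog
#   list_hung_tasks /var/log/syslog
```

If the count grows while the copy loop is running, that would at least tie the stalls to heavy I/O rather than to a one-off event.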
Or could this error be related to memory? I have run memtest86+ on this server for several days without any errors.
While the server was in the data center, I could not boot it. It always showed this error:
Fatal PCI Express Device Error PCI ? B00/D00/F00
After I shipped it to my workplace, it boots normally. The ILO event log contains the following errors:
Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 0, Function 0, Error status 0x00000000)
Uncorrectable Memory Error ((Processor 1, Memory Module 2))
Uncorrectable Memory Error ((Processor 1, Memory Module 3))
An Unrecoverable System Error (NMI) has occurred (System error code 0x00000000, 0x00000000)
I have updated the BIOS, disk controller, and drive firmware to the latest versions.
You have either bad RAM or a bad system board. I would suspect a system board failure, because the Smart Array P410i controller is onboard.
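The ILO messages are pretty specific. The server-side agents would probably say the same thing if you look at the output of hplog -v, which prints the system's IML log. For now, I would reseat all components and see whether the system boots in a minimal configuration: one CPU and the minimum number of installed DIMMs.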
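If the IML is long, a simple filter can pull out the serious entries. This is a minimal sketch: filter_critical is a made-up helper, and it assumes the IML text uses wording similar to the ILO event log entries quoted in the question ("Uncorrectable", "Unrecoverable", "Fatal"), which may not match the exact hplog output on your system.

```shell
#!/bin/bash
# filter_critical: read log text on stdin and keep only lines that mention
# uncorrectable, unrecoverable, or fatal errors (case-insensitive).
filter_critical() {
    grep -i -E 'uncorrectable|unrecoverable|fatal'
}

# Example usage on the server (hplog comes with the HP management tools):
#   hplog -v | filter_critical
```

Matching on keywords rather than on a fixed column layout keeps the filter usable even if the log format differs between tool versions.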
You can also download the bootable HP SmartStart .ISO and load it through the ILO to run a diagnostic loop.
This is a G7 ProLiant, so the server should still be under standard warranty. Call HP.