I have a strange error on a Linux machine with XFS, and I don't know how to debug or fix it.
Here is an excerpt from dmesg:
Info fld=0x17
end_request: I/O error, dev sde, sector 34412208504
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sde, sector 35840057200
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sde, sector 35799212408
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sde, sector 39444095352
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 32974487928
device-mapper: multipath: Failing path 8:80.
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 32973734264
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 22213009752
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 32940065144
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 32974552944
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
<<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23
Info fld=0x17
end_request: I/O error, dev sdf, sector 17956282744
Buffer I/O error on device dm-3, logical block 9666270717
lost page write due to I/O error on dm-3
I/O error in filesystem ("dm-3") meta-data dev dm-3 block 0xe7ffb01c2 ("xlog_iodone") error 5 buf count 12800
Buffer I/O error on device dm-3, logical block 4028959741
lost page write due to I/O error on dm-3
xfs_force_shutdown(dm-3,0x2) called from line 956 of file fs/xfs/xfs_log.c. Return address = 0xffffffff883bec58
Filesystem "dm-3": Log I/O Error Detected. Shutting down filesystem: dm-3
Please umount the filesystem, and rectify the problem(s)
How can I debug this?
Thanks.
I know this is a very old post, but since the existing answers are incorrect, I think posting a correct one may be useful to future visitors...
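The error messages reported by the OP are not related to XFS itself; they are the result of a failing drive or cable. Examining the error entry:
end_request: I/O error, dev sde, sector 39444095352
The system was unable to retrieve the data located at LBA address 39444095352 of sde. This usually means that there is a bad block on the disk. The SCSI command was aborted due to a timeout (caused by the bad block), and the disk returned a vendor-specific code that explains the error in more detail.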
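To see at a glance which devices are affected, the end_request lines can be counted per device. The following is only a minimal sketch: the function name summarize_io_errors is made up for illustration, and it assumes the older kernel log format shown in the excerpt above ("end_request: I/O error, dev <name>, sector <n>"); newer kernels word this line differently.

```shell
#!/bin/sh
# Hypothetical helper: count block-layer I/O errors per device.
# Reads kernel log text on stdin and prints "<device> <count>" lines.
# Assumes lines like: end_request: I/O error, dev sde, sector 34412208504
summarize_io_errors() {
    awk '
        /end_request: I\/O error, dev / {
            dev = $5               # fifth field is the device name, e.g. "sde,"
            sub(/,$/, "", dev)     # strip the trailing comma
            count[dev]++
        }
        END {
            for (d in count) printf "%s %d\n", d, count[d]
        }
    ' | sort
}

# Usage (not run here): dmesg | summarize_io_errors
```

Errors spread over several devices that share one host adapter, as with sde and sdf here, may point at a cable, controller, or path problem rather than a single bad block.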
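Running smartctl --all shows the disk's various internal counters. The attributes with ID 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) are of particular interest, because they show how many disk blocks are unreadable or have been remapped.
What can you do in this situation? The safest and strongly recommended approach is to back up everything that is still readable to another, healthy disk (possibly with a tool that tolerates disk errors, such as ddrescue). If that is not feasible, there are two other possibilities: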
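badblocks -n <dev> (see its man page): this starts a non-destructive read/write test, which should trigger the disk's bad-block remapping process.
dd if=/dev/zero of=/dev/sde bs=512 count=1 seek=39444095352: this writes zeros directly over the affected sector, which should force the disk to remap it.
Note that both of the methods above (especially the second) will cause data loss, because the affected, unreadable sectors will be overwritten.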
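Because the dd approach is destructive, it can help to build the commands first and review them before running anything. The sketch below only prints the commands; the helper name sector_commands is hypothetical, and the read-only probe with skip= and iflag=direct is an addition not present in the original answer. It assumes GNU dd and a device with 512-byte logical sectors, so that the sector number reported by the kernel can be used directly as the skip/seek value with bs=512.

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT execute) the dd commands for one sector.
# $1 = block device, $2 = 512-byte sector number taken from the kernel log.
sector_commands() {
    dev=$1
    sector=$2
    # Read-only probe: expected to fail with an I/O error while the sector is bad.
    echo "dd if=$dev of=/dev/null bs=512 count=1 skip=$sector iflag=direct"
    # Destructive: overwrites that one sector with zeros so the drive can remap it.
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$sector"
}

# Usage: sector_commands /dev/sde 39444095352
```

Running the probe first gives a chance to confirm that the sector really is unreadable before its contents are overwritten.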
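Once the recovery/overwrite has finished, you should run a full filesystem check on the unmounted filesystem, in this case by issuing the command below (if the filesystem actually lives on a device-mapper device, such as the dm-3 shown in the log, point xfs_repair at that device instead):
xfs_repair /dev/sde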
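For the SMART check mentioned earlier in this answer, the three attributes of interest can be filtered out of the smartctl output. This is a rough sketch only: the function name smart_bad_block_counters is invented here, and it assumes the ATA-style attribute table that smartctl prints (ID, name, flag, ..., raw value in the tenth column). SCSI/SAS devices and SAN LUNs report their health data in a different layout, which this filter would not match.

```shell
#!/bin/sh
# Hypothetical helper: extract SMART attributes 5, 197 and 198.
# Reads "smartctl --all" output on stdin and prints "<id> <name> <raw value>".
# Assumes the ATA attribute table, where column 3 is a hex flag like 0x0033
# and column 10 is the raw value.
smart_bad_block_counters() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $3 ~ /^0x/ && NF >= 10 {
        print $1, $2, $10
    }'
}

# Usage (not run here): smartctl --all /dev/sde | smart_bad_block_counters
```

Non-zero raw values for any of these attributes suggest that the disk has unreadable or already remapped blocks.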
You can use the xfs_db command to debug an XFS filesystem. Use the following syntax: