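I replaced a failing hard drive (rising error counts in SMART, plus read/write/checksum errors in `zpool status`) in a ZFS raidz1 pool by running `zfs offline` and `zfs replace`. The resilver started promptly but keeps producing write errors (see below). When a resilver eventually finishes, the pool stays in the DEGRADED state. After a `zpool clear` or a reboot, another resilver starts automatically, though it may report a different number of write errors.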
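SMART shows no errors for the new drive. I have also tried a different new drive (I bought two replacements) and swapped the SATA cable. It is always the vdev being replaced, with either drive, that reports write errors during the resilver. This makes me suspect that the ZFS pool is damaged in some way, yet it keeps running and `zfs send` works every night.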
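What is the right way to troubleshoot and fix this (for example, should I `zfs scrub` the degraded pool with only three of its four drives, since `zfs replace` cannot complete without errors)? I do have `zfs send/receive` backups, and I hope they are good copies (does ZFS checksum a received stream?).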
  pool: space
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Nov 3 09:56:59 2023
        2.92T scanned at 594M/s, 2.63T issued at 536M/s, 5.90T total
        669G resilvered, 44.59% done, 01:46:40 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        space                                     DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            ata-ST3000DM001-1CH166_ZZZZ1111       ONLINE       0     0     0  block size: 512B configured, 4096B native
            ata-ST3000DM001-1CH166_ZZZZ2222       ONLINE       0     0     0  block size: 512B configured, 4096B native
            replacing-2                           UNAVAIL      0     0     0  insufficient replicas
              13284017409215481231                OFFLINE      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F12QMZ-part1
              ata-ST18000NM003D-3DL103_YYYY1111   FAULTED      0 4.38K     0  too many errors (resilvering)
            ata-ST3000DM001-1CH166_ZZZZ4444       ONLINE       0     0     0  block size: 512B configured, 4096B native
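The `block size: 512B configured, 4096B native` notes in the status output mark vdevs whose configured sector size is smaller than what the disk reports. As a rough sketch (not part of the original post), the snippet below picks those lines out of saved `zpool status` text. The function name `find_block_size_mismatches` and the shortened `SAMPLE` text are made up for illustration, and the regular expression assumes the five-column layout shown above.

```python
import re

# Assumed layout: NAME STATE READ WRITE CKSUM, followed by the annotation
# "block size: <N>B configured, <M>B native" on affected vdev lines.
MISMATCH = re.compile(
    r"^\s*(?P<dev>\S+)\s+\S+\s+\S+\s+\S+\s+\S+\s+"
    r"block size:\s*(?P<conf>\d+)B configured,\s*(?P<native>\d+)B native"
)


def find_block_size_mismatches(status_text):
    """Return (device, configured_bytes, native_bytes) for each annotated vdev."""
    hits = []
    for line in status_text.splitlines():
        m = MISMATCH.match(line)
        if m:
            hits.append((m["dev"], int(m["conf"]), int(m["native"])))
    return hits


# Shortened sample taken from the output above (hypothetical excerpt).
SAMPLE = """\
NAME STATE READ WRITE CKSUM
space DEGRADED 0 0 0
ata-ST3000DM001-1CH166_ZZZZ1111 ONLINE 0 0 0 block size: 512B configured, 4096B native
ata-ST18000NM003D-3DL103_YYYY1111 FAULTED 0 4.38K 0 too many errors (resilvering)
"""

for dev, conf, native in find_block_size_mismatches(SAMPLE):
    print(f"{dev}: configured {conf}B, native {native}B")
```

In practice the text would come from something like `zpool status space > status.txt`; the sketch only reads text and never touches the pool.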
Edit: I am seeing these errors in `dmesg`:
[19561.708059] sd 2:0:1:0: [sdb] tag#226 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=31s
[19561.708063] sd 2:0:1:0: [sdb] tag#226 Sense Key : Illegal Request [current]
[19561.708067] sd 2:0:1:0: [sdb] tag#226 Add. Sense: Unaligned write command
[19561.708070] sd 2:0:1:0: [sdb] tag#226 CDB: Write(16) 8a 00 00 00 00 00 68 f4 ff 40 00 00 00 53 00 00
[19561.708073] blk_update_request: I/O error, dev sdb, sector 1760886592 op 0x1:(WRITE) flags 0x700 phys_seg 83 prio class 0
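Thanks to some hints from reddit, this may be a software/firmware bug: the Seagate Exos X20 ST18000NM003D 18TB drive cannot be used with 512B sectors on a ThinkStation S30 running Ubuntu 22.04 (5.15.0-88-generic). I ended up destroying the pool and recreating it with 4K sectors, and the errors have not returned.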
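ZFS records a vdev's sector size as `ashift`, the base-2 logarithm of the size in bytes, so 512B corresponds to `ashift=9` and 4K to `ashift=12`. The post does not say which command was used to recreate the pool; a 4K pool is typically created by passing `-o ashift=12` to `zpool create`. The sketch below is only an illustration: `ashift_for` and `sector_sizes` are hypothetical helpers, and the second one assumes the Linux sysfs files `/sys/block/<dev>/queue/logical_block_size` and `physical_block_size`.

```python
from pathlib import Path


def ashift_for(sector_bytes):
    """Return log2(sector_bytes), the exponent ZFS calls ashift."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two")
    return sector_bytes.bit_length() - 1


def sector_sizes(dev, sys_block="/sys/block"):
    """Read (logical, physical) sector sizes for a disk from Linux sysfs."""
    queue = Path(sys_block) / dev / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


print(ashift_for(512), ashift_for(4096))  # prints: 9 12
```

On the affected machine, `sector_sizes("sdb")` would report what the kernel believes about the replacement drive, and `ashift_for` applied to the physical size gives a value to compare against the pool's configured one.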