One of the disks in my pool faulted (it generated too many errors).
The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.
impact: Fault tolerance of the pool may be compromised.
eid: 52
class: statechange
state: FAULTED
host: databank-a
time: 2021-12-11 16:36:33-0500
vpath: /dev/disk02_old
vphys: pci-0000:00:1f.2-ata-4
vguid: 0x73F7B0B1D1B45864
devid: /dev/disk02_old
pool: 0x47B3E7C1336F1F4F
So I replaced it with a brand-new disk (zpool replace pool /dev/foo /dev/bar) and cleared the errors on the new disk (zpool clear pool /dev/bar).
pool: DATA01
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Dec 15 11:23:57 2021
6.83T scanned at 256M/s, 5.80T issued at 217M/s, 9.08T total
232G resilvered, 63.85% done, 0 days 04:24:05 to go
config:
NAME                      STATE     READ WRITE CKSUM
DATA01                    DEGRADED     0     0     0
  raidz1-0                DEGRADED     0     0     0
    /dev/disk01           ONLINE       0     0     0
    replacing-1           UNAVAIL      0     0     0  insufficient replicas
      8356341911383201892 UNAVAIL      0     0     0  was /dev/disk02_old
      /dev/disk02_new     FAULTED      0    81     0  too many errors (resilvering)
    /dev/disk03           ONLINE       0     0     0
    /dev/disk04           ONLINE       0     0     0
errors: No known data errors
How likely is it that the drive is not actually faulty?
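The drive probably is faulty. If the error counters are accurate, dozens of errors within the first few TB of use is worse than expected. And you have already cleared the errors once, so this is not a one-off transient event.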
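Backblaze's consumer drive failure data does not cover exactly the hardware you have, but it does show that early failures still happen. Even if the infant mortality rate is low, you can be the unlucky one in a few thousand who gets a less-than-perfect unit.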
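Start running restore tests of your important data from backups on different media, in case you need them in the worst case. Make sure you have more spare disks in stock. Once the resilver completes, check the disks again, and keep replacing them as needed.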
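
For the "check the disks again" step, here is a minimal sketch of one way to pull the suspicious rows out of the status output. The function name flag_suspect_vdevs is hypothetical, not part of ZFS. It only parses text in the layout shown in the question (a NAME STATE READ WRITE CKSUM header, with the table ending at the "errors:" line), so treat it as a starting point rather than a complete health check.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS tool): reads `zpool status` text on stdin and
# prints every config row whose state is not ONLINE or whose READ/WRITE/CKSUM
# counter is anything other than 0.
flag_suspect_vdevs() {
    awk '
        # The column header marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line marks the end of the table.
        $1 == "errors:"               { in_config = 0 }
        # Rows with fewer than 5 fields (e.g. section labels) are skipped.
        in_config && NF >= 5 {
            if ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0")
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Example usage once the resilver has finished (pool name taken from the question):
#   zpool status DATA01 | flag_suspect_vdevs
```

On the output in the question this would list the pool, the raidz1-0 and replacing-1 rows, the old disk's GUID row, and /dev/disk02_new with write=81. An empty result after the resilver suggests the counters are clean. The drive's own SMART data (for example via smartctl from smartmontools, if it is installed) is a useful second opinion.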