After a power outage and a UPS failure, my ZFS pool is in a state I can't make sense of:
$ zpool status -c serial
  pool: storage
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 1 days 10:40:40 with 0 errors on Wed Apr 2 10:40:42 2025
config:
        NAME STATE READ WRITE CKSUM serial
        storage DEGRADED 0 0 0
          raidz2-0 DEGRADED 0 0 0
            8844532865098720143 FAULTED 0 0 0 was /dev/sda1 ZL2AJ3S10000C1111G0H
            scsi-35000c500cafcbb67 ONLINE 0 0 0 ZL2AJ3S10000C1111G0H
            scsi-35000c500cafc9a63 ONLINE 0 0 0 ZL2AKR3F0000C1128SV6
            scsi-35000c500cafcb303 ONLINE 0 0 0 ZL2AKQVX0000C1143G62
            scsi-35000c500cafcff33 ONLINE 0 0 0 ZL2AKAG10000C11445AW
            scsi-35000c500cafc392b ONLINE 0 0 0 ZL2AKCWB0000C1143ARJ
            wwn-0x5000c500cafa8287 ONLINE 0 0 0 ZL2AHSSL0000C107BQWN
            scsi-35000c500cafbec03 ONLINE 0 0 0 ZL2AGE6X0000C1122SME
            7647119559265938125 FAULTED 0 0 0 was /dev/sdi1 ZL2AGE6X0000C1122SME
            scsi-35000c500cafca18b ONLINE 0 0 0 ZL2AKR0B0000C1128RNJ
            scsi-35000c500cafc29c3 ONLINE 0 0 0 ZL2AGDN30000C1140NTP
            scsi-35000c500cafbe293 ONLINE 0 0 0 ZL2AKDSM0000C11278YB
          raidz2-1 DEGRADED 0 0 0
            scsi-SSEAGATE_ST16000NM002G_ZL2AKBXB0000C1126C6X ONLINE 0 0 0 ZL2AKBXB0000C1126C6X
            1470086598115969130 UNAVAIL 0 0 0 was /dev/sdy1 20342A6158FC
            wwn-0x5000c500cae0af8b ONLINE 0 0 0 ZL29T97Q0000C107188W
            12722321230162544658 FAULTED 0 0 0 was /dev/sdl1 ZL2AKDSM0000C11278YB
            scsi-35000c500cafc3be7 ONLINE 0 0 0 ZL2AJJZF0000C1143AH2
            scsi-35000c500cafc611f ONLINE 0 0 0 ZL2AKC6Z0000C11438R8
            scsi-35000c500cafcfb97 ONLINE 0 0 0 ZL2AHY5R0000C11441Z5
            scsi-35000c500cafc8663 ONLINE 0 0 0 ZL2AKBNX0000C1128RLR
            scsi-35000c500cafc9fa3 ONLINE 0 0 0 ZL2AKR0Y0000C1128RN1
            scsi-35000c500cafc96b3 ONLINE 0 0 0 ZL2AKR6T0000C1128SQP
            scsi-35000c500cafc2f23 ONLINE 0 0 0 ZL2AK1NP0000C1143FXB
            scsi-35000c500cafc4ccf ONLINE 0 0 0 ZL2AKCKG0000C1143ETM
        logs
          nvme-INTEL_SSDPED1K375GA_PHKS01530050375AGN ONLINE 0 0 0 PHKS01530050375AGN
        cache
          sdu FAULTED 0 0 0 corrupted data ZL2AKR6T0000C1128SQP
          sdw FAULTED 0 0 0 corrupted data ZL2AK1NP0000C1143FXB
There is a lot I don't understand in the status above:
- Every FAULTED disk reports the same serial number as one of the ONLINE disks
- The serial number of the UNAVAIL disk belongs to one of the two SSDs I use for cache, not to one of the HDDs used for the pool
- The serials of the two FAULTED cache devices are those of two of the storage HDDs
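To check the duplicated serials mechanically rather than by eye, here is a minimal sketch that groups the device lines of the status output by their trailing serial column. It assumes each device line has the layout NAME STATE READ WRITE CKSUM, optionally followed by a note, and ends with the serial printed by `-c serial`. The function name `duplicate_serials` and the `SAMPLE` excerpt are just illustrative, not part of any ZFS tooling.

```python
from collections import defaultdict

# Device states that can appear in the STATE column of `zpool status`.
STATES = {"ONLINE", "FAULTED", "UNAVAIL", "DEGRADED", "OFFLINE", "REMOVED"}

# Short excerpt of the output above, used only to exercise the function.
SAMPLE = """\
NAME STATE READ WRITE CKSUM serial
storage DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
8844532865098720143 FAULTED 0 0 0 was /dev/sda1 ZL2AJ3S10000C1111G0H
scsi-35000c500cafcbb67 ONLINE 0 0 0 ZL2AJ3S10000C1111G0H
scsi-35000c500cafc9a63 ONLINE 0 0 0 ZL2AKR3F0000C1128SV6
"""


def duplicate_serials(status_text):
    """Return {serial: [(device, state), ...]} for serials seen more than once."""
    by_serial = defaultdict(list)
    for line in status_text.splitlines():
        fields = line.split()
        # Assumption: a line carrying a serial has at least six fields
        # (NAME STATE READ WRITE CKSUM ... SERIAL) and the serial is last.
        # Pool and raidz lines have only five fields here and are skipped.
        if len(fields) >= 6 and fields[1] in STATES:
            by_serial[fields[-1]].append((fields[0], fields[1]))
    return {s: devs for s, devs in by_serial.items() if len(devs) > 1}


if __name__ == "__main__":
    for serial, devices in duplicate_serials(SAMPLE).items():
        print(serial, devices)
```

Feeding it the full output of `zpool status -c serial` instead of `SAMPLE` should list every serial that is reported for more than one device, which is exactly the oddity described in the list above.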
What could have happened to put the pool in this state? Is it recoverable? I thought about trying the procedure described here, but I can't even work out which disks are the faulty ones. Would the simple procedure described in this post work in my case? I really need help with this. Thanks in advance.