I have an existing RAID 6 array in which one of the drives failed. When I try to replace it with mdadm --manage /dev/md1 --add /dev/sde3, the add fails and I get the following message in the kernel log:
md: sde3 does not have a valid v1.2 superblock, not importing!
Note that I have zeroed the replacement drive and have repeatedly zeroed its superblock with mdadm --zero-superblock -e 1.2 --force /dev/sde3. It may be relevant that the add appears to write a new superblock to the drive, marking it as a spare, but never actually adds it to the array.
I think my mistake was removing the failed drive with mdadm --manage md1 --remove /dev/sde3 after it failed and before trying to add the replacement.
I have tried various combinations of zeroing the replacement drive's superblock and assembling the array with and without the replacement, but all of them fail with the same error.
At this point, I think my only option is to recreate the array with --assume-clean, marking the failed slot as missing.
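If the --assume-clean route ever becomes necessary, the devices have to be listed in exactly their original slot order. A minimal sketch, using a hypothetical slot_order helper (not part of mdadm), that reads the RaidDevice column of mdadm --detail and prints the members in slot order with missing for the removed slot. It only parses text and creates nothing; an actual recreate would also have to reproduce metadata 1.2, the 512K chunk, the left-symmetric layout and the 262144-sector data offset reported in the --examine output below.

```shell
#!/bin/sh
# slot_order is a hypothetical helper, not part of mdadm.
# It reads `mdadm --detail` output on stdin and prints the member devices in
# RaidDevice (slot) order, writing "missing" for a removed slot.
# It only parses text; it does not touch any disk.
slot_order() {
  awk '
    # Device rows are "Number Major Minor RaidDevice State... [device]";
    # field 4 is the slot. Spare/faulty rows have "-" there and are skipped.
    $4 ~ /^[0-9]+$/ && ($NF ~ /^\/dev\// || $NF == "removed") {
      slot = $4 + 0
      dev[slot] = ($NF == "removed") ? "missing" : $NF
      if (slot > max) max = slot
      found = 1
    }
    END {
      if (!found) exit 1   # no device table found on stdin
      for (i = 0; i <= max; i++) printf "%s%s", (i ? " " : ""), dev[i]
      print ""
    }
  '
}
```

Usage: mdadm --detail /dev/md1 | slot_order. For the table in the question this prints /dev/sdg3 /dev/sdh3 /dev/sdb3 missing /dev/sdf3 /dev/sdd3 /dev/sdc3 /dev/sda3, which matches the Device Role of each member in the --examine output.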
Here is the information for the array:
/dev/md1:
Version : 1.2
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Array Size : 22686329856 (21.13 TiB 23.23 TB)
Used Dev Size : 3781054976 (3.52 TiB 3.87 TB)
Raid Devices : 8
Total Devices : 7
Persistence : Superblock is persistent
Update Time : Mon Sep 30 17:28:13 2024
State : clean, degraded
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : resync
Name : playroom:raid (local to host playroom)
UUID : 34484e2b:99dc0604:10b071d5:c8012127
Events : 176577
Number Major Minor RaidDevice State
13 8 99 0 active sync /dev/sdg3
10 8 115 1 active sync /dev/sdh3
11 8 19 2 active sync /dev/sdb3
- 0 0 3 removed
9 8 83 4 active sync /dev/sdf3
8 8 51 5 active sync /dev/sdd3
15 8 35 6 active sync /dev/sdc3
14 8 3 7 active sync /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : b3b21814:7f42ad3f:cb7e8d5b:5dfd0d22
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
Checksum : ef2d77c6 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : 77d07ff9:e5a670c0:cfb45916:717b98e8
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
Checksum : d6aeae07 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : 67b4abd6:f1e5b87f:0851dd21:9200e1b6
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
Checksum : 4ceee719 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : fe728bca:23485bc8:a5e5ac21:73c1a89a
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
Checksum : 6d085d59 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=261864 sectors, after=655 sectors
State : clean
Device UUID : 16c61ab0:553c3ce5:d9b300b8:54b4fe14
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : 801fb4f3 - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=261864 sectors, after=655 sectors
State : clean
Device UUID : a8368569:56a9356f:e158fc12:9a75fcf4
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : fe7faa91 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : 4b25f2f9:65988664:86916d97:f0e3ac2a
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
Checksum : 3e5f2e4b - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 34484e2b:99dc0604:10b071d5:c8012127
Name : playroom:raid (local to host playroom)
Creation Time : Sun May 12 20:28:14 2013
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 7562110607 sectors (3.52 TiB 3.87 TB)
Array Size : 22686329856 KiB (21.13 TiB 23.23 TB)
Used Dev Size : 7562109952 sectors (3.52 TiB 3.87 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=261864 sectors, after=655 sectors
State : clean
Device UUID : 948d0c31:03c19927:22f18cdd:0d84ffb2
Update Time : Mon Sep 30 17:28:13 2024
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : 6fec028 - correct
Events : 176577
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA.AAAA ('A' == active, '.' == missing, 'R' == replacing)
cat /proc/mdstat produces:
md1 : active raid6 sdg3[13] sda3[14] sdc3[15] sdd3[8] sdf3[9] sdb3[11] sdh3[10]
22686329856 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/7] [UUU_UUUU]
unused devices: <none>
Thanks to frostschutz for spotting the bad block inconsistency. The mdadm --examine-badblocks dump is:
Bad-blocks on /dev/sda3:
3834808 for 512 sectors
3835320 for 368 sectors
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sdb3:
3834808 for 512 sectors
3835320 for 368 sectors
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sdc3:
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sdd3:
3834808 for 512 sectors
3835320 for 368 sectors
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sde3:
0 for 0 sectors
--- repeats for 512 lines ---
Bad-blocks on /dev/sdf3:
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sdg3:
1787061064 for 16 sectors
1788696152 for 8 sectors
Bad-blocks on /dev/sdh3:
1787061064 for 16 sectors
1788696152 for 8 sectors
OK, so this must be a kernel bug. For some reason, the kernel refuses to add devices here.
As a workaround, get rid of the bad block log:
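A minimal sketch of that workaround, assuming an mdadm recent enough to support --update=force-no-bbl (check man mdadm for your version), the device names from the question, and that nothing (filesystem, LVM) is using /dev/md1 while it is stopped. The run wrapper and the drop_bbl_and_add function are hypothetical helpers; by default they only print the commands.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default) the commands are printed, not run.
# Assumptions: array /dev/md1, active members /dev/sd[abcdfgh]3 and replacement
# /dev/sde3 as in the question; nothing (filesystem, LVM) is using /dev/md1.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@" || exit 1; fi
}

drop_bbl_and_add() {
  # The bad block log can only be removed while assembling, so stop the array first.
  run mdadm --stop /dev/md1
  # force-no-bbl drops the log even though it still has entries
  # (plain no-bbl refuses when the log is not empty).
  run mdadm --assemble /dev/md1 --update=force-no-bbl \
    /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sdf3 /dev/sdg3 /dev/sdh3
  # Clear the stale "spare" superblock the failed --add left on the replacement.
  run mdadm --zero-superblock /dev/sde3
  # Retry the add; recovery progress then shows up in /proc/mdstat.
  run mdadm --manage /dev/md1 --add /dev/sde3
}

drop_bbl_and_add
```

Review the printed commands first, then run it with DRY_RUN=0 to execute them. Leaving /dev/sde3 out of the assemble step and adding it afterwards keeps the replacement out of the reassembly entirely.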
You kind of have to remove the bad block log anyway, since these bad blocks won't go away otherwise (not when more drives share the same bad blocks than the array has redundancy for, which happens to be the case here: RAID 6 can only cover two). The bad block log is kind of a half-broken feature…
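The claim about shared bad blocks can be checked directly against the dump. A minimal sketch, using a hypothetical shared_badblocks helper, that counts how many members list each bad range and prints the ranges listed by more devices than the redundancy (REDUNDANCY, assumed to be 2 for RAID 6). It skips the zero-length "0 for 0 sectors" filler seen on /dev/sde3.

```shell
#!/bin/sh
# shared_badblocks is a hypothetical helper: it reads `mdadm --examine-badblocks`
# output on stdin and prints every bad range that more member devices share than
# the array can tolerate (REDUNDANCY, default 2 for RAID 6).
shared_badblocks() {
  awk -v redundancy="${REDUNDANCY:-2}" '
    # Header lines look like "Bad-blocks on /dev/sda3:".
    /^Bad-blocks on / { dev = $3; sub(/:$/, "", dev); next }
    # Entries look like "<start> for <length> sectors"; skip zero-length filler.
    $2 == "for" && $3 + 0 > 0 {
      range = $1 " for " $3 " sectors"
      if (!seen[range, dev]++) count[range]++
    }
    END {
      for (r in count)
        if (count[r] > redundancy + 0) printf "%s: %d devices\n", r, count[r]
    }
  ' | sort -n
}
```

Usage: mdadm --examine-badblocks /dev/sd[a-h]3 | shared_badblocks. For the dump in the question it reports all four ranges: 3834808 and 3835320 on three devices each, 1787061064 and 1788696152 on seven devices each.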
The data in those sectors (as listed by --examine-badblocks) may be out of sync and may not be correct, so watch out for possible data corruption there. Also check your logs to find out what triggered these bad blocks, and run a SMART check on your drives, just in case the bad blocks are actually physical.
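For the SMART check, a minimal sketch using smartctl from smartmontools, assuming the whole disks behind the md1 members are /dev/sda through /dev/sdh (minus the replacement) as in the question. The run and smart_check helpers are hypothetical; by default they only print the commands.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default) the smartctl commands are printed, not run.
# The disk names are the whole disks behind the md1 members in the question.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

smart_check() {
  for disk in "$@"; do
    run smartctl -H "$disk"        # overall health self-assessment
    run smartctl -t long "$disk"   # start an extended self-test; the drive runs it in the background
  done
}

smart_check /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh
```

smartctl prints an estimated duration when it starts the long test. Once it has finished, smartctl -a /dev/sdX shows the self-test log and attributes such as Reallocated_Sector_Ct and Current_Pending_Sector. The kernel log (journalctl -k or dmesg) is where the I/O errors that produced the bad block entries would have been recorded.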
Once the bad block log is removed, adding devices should work again, or at least it did for me when I tested it locally.
If it still doesn't work, you'll have to take it up with the developers on the linux-raid mailing list (try with a current kernel first).
Once the drive problem is sorted out, run mdadm with --action=check to detect mismatches in the array. If there are any, follow up with mdadm --action=repair, unless you want to try recovering the mismatched sectors first (see the older question on recovering mismatched sectors in a RAID array; it doesn't have a very good answer). Leaving mismatches uncorrected in a raid456 array can lead to data corruption in the long run, since new writes do not necessarily fix them.
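A minimal sketch of that check-then-decide sequence, assuming the array is /dev/md1 and reading the standard md sysfs files sync_action and mismatch_cnt. The helpers start_check and report_mismatches are hypothetical, and SYS is overridable only so the logic can be tried against a fake directory. The sketch deliberately stops at suggesting repair rather than running it, because the answer leaves that choice to you.

```shell
#!/bin/sh
# Sketch only. Assumptions: the array is /dev/md1; SYS points at its md sysfs
# directory (overridable only so the logic can be tried on a fake directory);
# with DRY_RUN=1 (the default) mdadm commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
SYS=${SYS:-/sys/block/md1/md}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

start_check() {
  # Equivalent to: echo check > /sys/block/md1/md/sync_action
  run mdadm --action=check /dev/md1
}

# Call once /proc/mdstat no longer shows a check running on md1.
report_mismatches() {
  action=$(cat "$SYS/sync_action") || return 2
  if [ "$action" != idle ]; then
    echo "still running: $action"
    return 1
  fi
  count=$(cat "$SYS/mismatch_cnt") || return 2
  if [ "$count" -eq 0 ]; then
    echo "no mismatches"
  else
    # Not run automatically: repair recomputes and rewrites parity, which you
    # may not want if you first intend to examine the mismatched sectors.
    echo "mismatch_cnt=$count; next step: mdadm --action=repair /dev/md1"
  fi
}

start_check
```

On a real system, run start_check with DRY_RUN=0, wait for the check to finish in /proc/mdstat, then call report_mismatches.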