Ping uma porta específica

Question

Scandinave

Asked: 2024-09-28 21:26:04 +0800 CST2024-09-28 21:26:04 +0800 CST 2024-09-28 21:26:04 +0800 CST

Melhor maneira de substituir uma unidade que ainda não falhou

772

Tenho um array Raid 5 de 4 drives. Um dos drives (sdb) mostra muitos erros e meu array mostra altas latências durante o acesso.

Atualmente tenho um disco novo e outro está comandado.

Como parte da substituição, também quero aumentar o nível do ataque de 5 para 6. Identifico que pretendo atingir isso:

Ao crescer primeiro para o nível 6

mdadm --manage /dev/md0 --add /dev/sdf1 
mdadm --grow /dev/md0 --raid-devices 5 --level 6 --backup-file /<path>/mdadm-backup
mdadm --manage /dev/md0 --replace /dev/sdb1 # Set faulty but keep it for replacement until a spare drive is available
# After my second drive arrived
mdadm --manage /dev/md0 --add /dev/sdg1

Primeiro substituindo a unidade defeituosa e depois aumentando para o nível 6 do RAID

mdadm --manage /dev/md0 --add /dev/sdf1 
mdadm --manage /dev/md0 --replace /dev/sdb1 --with /dev/sdf1
# after second drive is arrived
mdadm --manage /dev/md0 --add /dev/sdg1
mdadm --grow /dev/md0 --raid-devices 5 --level 6 --backup-file /path/mdadm-backup

Existe alguma diferença entre as duas soluções? Uma é mais segura que a outra por impor menos operações em discos no array?

[Editar] adicionando resultado inteligente:

Sdb (quase com defeito)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   072   006    Pre-fail  Always       -       22562487
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       124
  5 Reallocated_Sector_Ct   0x0033   077   077   010    Pre-fail  Always       -       29352
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       4963671931
  9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       65627
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       124
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       4793
188 Command_Timeout         0x0032   100   072   000    Old_age   Always       -       137445703785
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       107
190 Airflow_Temperature_Cel 0x0022   073   056   045    Old_age   Always       -       27 (Min/Max 25/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       82
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       296
194 Temperature_Celsius     0x0022   027   044   000    Old_age   Always       -       27 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       35384
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       35384
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SBC

  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       194860592
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       124
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       24
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       654004767
  9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       65631
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       124
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always       -       6
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2
189 High_Fly_Writes         0x003a   062   062   000    Old_age   Always       -       38
190 Airflow_Temperature_Cel 0x0022   073   059   045    Old_age   Always       -       27 (Min/Max 24/27)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       82
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       295
194 Temperature_Celsius     0x0022   027   041   000    Old_age   Always       -       27 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdd

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   094   006    Pre-fail  Always       -       51447952
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       124
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       665278320
  9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       65630
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       124
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       173
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       188
190 Airflow_Temperature_Cel 0x0022   073   055   045    Old_age   Always       -       27 (Min/Max 25/27)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       82
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       296
194 Temperature_Celsius     0x0022   027   045   000    Old_age   Always       -       27 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

1 respostas

Voted

Halfgaar · Answer 1 · 2024-09-28T22:34:04+08:00

Sim, há uma diferença, mas não tenho certeza de qual é melhor.

Quando você substitui um disco em um RAID5, você precisa que todos os outros discos estejam livres de erros de leitura para reconstruir o novo disco. Com unidades modernas, isso não é um desafio fácil. A chance de erro de leitura em unidades modernas enormes é quase tal que você certamente encontrará um. É por isso que o RAID5 não é mais recomendado. Você pode encontrar muitas informações sobre isso na internet.

Ao migrar para RAID6, você também incorrerá em uma carga de gravação nos discos antigos. Isso pode até ser vantajoso: uma unidade pode ter 'setores pendentes atuais' (veja SMART). Isso significa setores que ela considerou danificados. Ela os realocará assim que receberem uma nova gravação. Se o setor ainda não for bom o suficiente para receber a gravação, o contador de 'setores realocados' aumentará. O setor então, transparentemente, residirá em outro lugar no disco.

Meu primeiro passo seria verificar o status SMART das suas unidades e observar os dois parâmetros acima.

Mas, estou hesitante em recomendar uma maneira em vez da outra se suas unidades realmente tiverem setores pendentes. Acho que se não tiverem, eu primeiro substituiria o disco no RAID5, porque isso incorre em uma operação não destrutiva nos outros discos.

E tenha/faça backups...

Melhor maneira de substituir uma unidade que ainda não falhou

Você pode passar usuário/passar para autenticação básica HTTP em parâmetros de URL?