I just noticed I had been writing SDD instead of SSD. Corrected.
I need help interpreting this situation. /dev/sda is a data disk that is backed up and whose data is reproducible, so it is not critical to the system, but I would like to avoid the effort of restoring/rebuilding the data, some of which would be quite time-consuming.
Is it possible to recover/repair it?
If so, how? If I wipe the disk for reuse, how reliable will it be?
Summary (detailed reports below):
- will not mount: bad superblock
- badblocks finds no bad blocks
- smartctl reports no errors
- fsck cannot set superblock flags
- fdisk shows a clean partition
- dmesg shows write errors
- parted shows 792 GB free on the 1 TB drive
Mounting the SSD fails like this:
[stephen@meer ~]$ sudo mount /dev/sda1 /mnt/sda
mount: /mnt/sda: can't read superblock on /dev/sda1.
dmesg(1) may have more information after failed mount system call.
[stephen@meer ~]$
but badblocks finds no bad blocks:
[root@meer stephen]# badblocks -v /dev/sda1
Checking blocks 0 to 976760831
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
and smartctl finds no errors:
[root@meer stephen]# smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.17.9-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS100T2B0A-00SM50
Serial Number: 213159800516
LU WWN Device Id: 5 001b44 8bc4fdc6e
Firmware Version: 415020WD
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Tue May 24 16:06:23 2022 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 --- Old_age Always - 124
9 Power_On_Hours 0x0032 100 100 --- Old_age Always - 1470
12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 134
165 Block_Erase_Count 0x0032 100 100 --- Old_age Always - 4312400063
166 Minimum_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 1
167 Max_Bad_Blocks_per_Die 0x0032 100 100 --- Old_age Always - 65
168 Maximum_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 14
169 Total_Bad_Blocks 0x0032 100 100 --- Old_age Always - 630
170 Grown_Bad_Blocks 0x0032 100 100 --- Old_age Always - 124
171 Program_Fail_Count 0x0032 100 100 --- Old_age Always - 128
172 Erase_Fail_Count 0x0032 100 100 --- Old_age Always - 0
173 Average_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 2
174 Unexpected_Power_Loss 0x0032 100 100 --- Old_age Always - 90
184 End-to-End_Error 0x0032 100 100 --- Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
188 Command_Timeout 0x0032 100 100 --- Old_age Always - 64
194 Temperature_Celsius 0x0022 070 053 --- Old_age Always - 30 (Min/Max 18/53)
199 UDMA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
230 Media_Wearout_Indicator 0x0032 001 001 --- Old_age Always - 0x002600140026
232 Available_Reservd_Space 0x0033 097 097 004 Pre-fail Always - 97
233 NAND_GB_Written_TLC 0x0032 100 100 --- Old_age Always - 2703
234 NAND_GB_Written_SLC 0x0032 100 100 --- Old_age Always - 2842
241 Host_Writes_GiB 0x0030 253 253 --- Old_age Offline - 466
242 Host_Reads_GiB 0x0030 253 253 --- Old_age Offline - 622
244 Temp_Throttle_Status 0x0032 000 100 --- Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1470 -
Selective Self-tests/Logging not supported
and fsck fails like this:
[root@meer ~]# e2fsck -cfpv /dev/sda1
/dev/sda1: recovering journal
e2fsck: Input/output error while recovering journal of /dev/sda1
e2fsck: unable to set superblock flags on /dev/sda1
/dev/sda1: ********** WARNING: Filesystem still has errors **********
May 24 15:38:29 meer kernel: I/O error, dev sda, sector 121899008 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 CDB: Write(10) 2a 00 07 44 08 00 00 00 08 00
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 Add. Sense: Unaligned write command
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 Sense Key : Illegal Request [current]
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
May 24 15:38:29 meer kernel: ata3.00: configured for UDMA/33
May 24 15:38:29 meer kernel: ata3.00: error: { ABRT }
May 24 15:38:29 meer kernel: ata3.00: status: { DRDY ERR }
May 24 15:38:29 meer kernel: ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 31 dma 4096 out
res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
May 24 15:38:29 meer kernel: ata3.00: failed command: WRITE DMA
May 24 15:38:29 meer kernel: ata3.00: irq_stat 0x40000001
May 24 15:38:29 meer kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 24 15:38:29 meer kernel: ata3: EH complete
May 24 15:38:29 meer kernel: ata3.00: configured for UDMA/33
May 24 15:38:29 meer kernel: ata3.00: error: { ABRT }
May 24 15:38:29 meer kernel: ata3.00: status: { DRDY ERR }
May 24 15:38:29 meer kernel: ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 6 dma 4096 out
res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
May 24 15:38:29 meer kernel: ata3.00: failed command: WRITE DMA
May 24 15:38:29 meer kernel: ata3.00: irq_stat 0x40000001
May 24 15:38:29 meer kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
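The CDB line in the kernel log above can be decoded by hand to confirm which sector the drive rejected and how much data was being written. This is a minimal sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2A, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8). The function name `decode_write10` is made up for illustration.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) command descriptor block given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between hex pairs are ignored
    if len(cdb) != 10:
        raise ValueError("WRITE(10) CDB must be exactly 10 bytes")
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) opcode")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of logical blocks
    }


# CDB copied from the "[sda] tag#31 CDB: Write(10)" log line.
cmd = decode_write10("2a 00 07 44 08 00 00 00 08 00")
print(cmd["lba"], cmd["blocks"], cmd["blocks"] * 512)
```

The decoded LBA matches the "sector 121899008" in the I/O error line, and 8 logical sectors of 512 bytes match the "dma 4096 out" in the failed WRITE DMA command, so every retry in the log is the same 4 KiB write being refused.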
Partitioning as seen by fdisk:
Disk /dev/sda: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WDS100T2B0A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3F701164-2CF8-6D48-A94E-478634C140BE
Device Start End Sectors Size Type
/dev/sda1 2048 1953523711 1953521664 931.5G Linux filesystem
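The kernel reports the failing write as a sector on the whole disk (sda), while e2fsck works in filesystem blocks inside sda1. A rough sketch of the conversion, using the partition start from the fdisk output above. The 4096-byte filesystem block size is an assumption (a common ext4 default, not shown anywhere in this report), and `sector_to_fs_block` is a hypothetical helper name.

```python
SECTOR_SIZE = 512  # logical sector size reported by fdisk and smartctl


def sector_to_fs_block(disk_sector: int, part_start: int,
                       fs_block_size: int = 4096) -> tuple[int, int]:
    """Return (filesystem block, byte offset within that block)."""
    if disk_sector < part_start:
        raise ValueError("sector lies before the start of the partition")
    offset_bytes = (disk_sector - part_start) * SECTOR_SIZE
    return divmod(offset_bytes, fs_block_size)


# Failing sector from the kernel log, partition start from fdisk.
block, offset = sector_to_fs_block(121899008, 2048)
print(block, offset)
```

If the real block size differs, `dumpe2fs -h /dev/sda1` reports it, provided the superblock can still be read.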
From dmesg:
[ 5292.895300] ata3.00: configured for UDMA/33
[ 5292.895315] ata3: EH complete
[ 5293.021851] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 5293.021859] ata3.00: irq_stat 0x40000001
[ 5293.021864] ata3.00: failed command: WRITE DMA
[ 5293.021866] ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 18 dma 4096 out
res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
[ 5293.021874] ata3.00: status: { DRDY ERR }
[ 5293.021877] ata3.00: error: { ABRT }
parted:
[root@meer stephen]# parted /dev/sda
GNU Parted 3.5
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print free
Model: ATA WDC WDS100T2B0A (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
17.4kB 1049kB 1031kB Free Space
1 1049kB 1000GB 1000GB ext4
1000GB 1000GB 729kB Free Space
I don't know what you have been doing with this disk, but those are crazy numbers! Looking at the SMART output for the SSD:
That is a constant 16 MiB per second of writes over 61 days (the 1470 power-on hours).
I suspect you have an internal NAND failure. You may not be able to recover your data.
I suggest your best solution going forward is to use a RAID mirror of some kind to buffer the errors across multiple disks.
Ideally these would be two disks of different ages and/or from different production batches, to try to spread errors and failures across multiple disks.
Just to clarify, I consider this an abnormally high amount of writes in a very short period. You will need to take that into account in the storage configuration you use.
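For anyone who wants to check a write-rate estimate like the one above, the arithmetic is simply data written divided by powered-on time. The answer does not say which counter the 16 MiB per second figure was derived from, so this is only a sketch using the raw values from the smartctl report in the question. It assumes Power_On_Hours is the right time base and that the NAND counter is in GiB (the attribute name only says "GB").

```python
SECONDS_PER_HOUR = 3600
MIB_PER_GIB = 1024


def average_write_rate_mib_s(gib_written: float, power_on_hours: float) -> float:
    """Average write rate in MiB/s over the drive's powered-on time."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return gib_written * MIB_PER_GIB / (power_on_hours * SECONDS_PER_HOUR)


# Raw values copied from the smartctl attribute table.
power_on_hours = 1470       # ID 9, Power_On_Hours
host_writes_gib = 466       # ID 241, Host_Writes_GiB
nand_written_tlc = 2703     # ID 233, NAND_GB_Written_TLC (unit assumed GiB)

print(f"powered on for {power_on_hours / 24:.2f} days")
print(f"host writes:  {average_write_rate_mib_s(host_writes_gib, power_on_hours):.3f} MiB/s")
print(f"NAND (TLC):   {average_write_rate_mib_s(nand_written_tlc, power_on_hours):.3f} MiB/s")
```

With these inputs both counters average well under 1 MiB/s over the powered-on time, so it is worth checking which counter and time base any such estimate is built on before drawing conclusions from it.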