Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 2.5" HDD MQ01ABF...
Device Model: TOSHIBA MQ01ABF050
Serial Number: 563TT0VST
LU WWN Device Id: 5 000039 701e00efc
Firmware Version: AM0P2D
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Dec 1 20:52:46 2020 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 120) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 128
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 099 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 593
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 120
9 Power_On_Hours 0x0032 060 060 000 Old_age Always - 16114
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3715
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 1252
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 262
193 Load_Cycle_Count 0x0032 084 084 000 Old_age Always - 160450
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 32 (Min/Max 11/51)
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0032 062 062 000 Old_age Always - 15578
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 34624675632
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 27836621543
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 46937 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 46937 occurred at disk power-on lifetime: 16104 hours (671 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 00 08 58 2e 40 Error: UNC at LBA = 0x002e5808 = 3037192
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 08 58 2e 40 00 01:55:31.055 READ FPDMA QUEUED
60 18 f0 f2 61 6a 40 00 01:55:31.055 READ FPDMA QUEUED
60 20 e8 4a 62 6a 40 00 01:55:31.055 READ FPDMA QUEUED
60 00 e0 7a cb 2a 40 00 01:55:31.055 READ FPDMA QUEUED
60 08 d8 ea bd 1a 40 00 01:55:31.055 READ FPDMA QUEUED
Error 46936 occurred at disk power-on lifetime: 16104 hours (671 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 50 08 58 2e 40 Error: UNC at LBA = 0x002e5808 = 3037192
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 b8 7a cb 2a 40 00 01:55:26.911 READ FPDMA QUEUED
60 20 b0 4a 62 6a 40 00 01:55:26.906 READ FPDMA QUEUED
60 18 a8 f2 61 6a 40 00 01:55:26.906 READ FPDMA QUEUED
60 08 50 08 58 2e 40 00 01:55:26.890 READ FPDMA QUEUED
60 08 28 2a 66 11 40 00 01:55:26.889 READ FPDMA QUEUED
Error 46935 occurred at disk power-on lifetime: 16104 hours (671 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 b0 08 58 2e 40 Error: WP at LBA = 0x002e5808 = 3037192
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 10 f2 0b 51 40 00 01:55:22.635 WRITE FPDMA QUEUED
61 08 f0 ea 09 50 40 00 01:55:22.635 WRITE FPDMA QUEUED
61 08 e8 92 09 50 40 00 01:55:22.634 WRITE FPDMA QUEUED
61 08 08 32 09 50 40 00 01:55:22.633 WRITE FPDMA QUEUED
61 08 00 6a 08 50 40 00 01:55:22.633 WRITE FPDMA QUEUED
Error 46934 occurred at disk power-on lifetime: 16104 hours (671 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 a8 08 58 2e 40 Error: WP at LBA = 0x002e5808 = 3037192
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 d8 f2 07 10 40 00 01:55:22.178 WRITE FPDMA QUEUED
60 08 a8 08 58 2e 40 00 01:55:18.477 READ FPDMA QUEUED
60 28 d0 0a 66 11 40 00 01:55:18.459 READ FPDMA QUEUED
60 08 38 00 58 2e 40 00 01:55:18.444 READ FPDMA QUEUED
60 98 28 e2 03 aa 40 00 01:55:18.444 READ FPDMA QUEUED
Error 46933 occurred at disk power-on lifetime: 16104 hours (671 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 a0 08 58 2e 40 Error: UNC at LBA = 0x002e5808 = 3037192
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 80 a0 00 58 2e 40 00 01:55:14.288 READ FPDMA QUEUED
60 80 98 80 57 2e 40 00 01:55:14.287 READ FPDMA QUEUED
60 80 90 00 57 2e 40 00 01:55:14.286 READ FPDMA QUEUED
60 80 88 80 56 2e 40 00 01:55:14.286 READ FPDMA QUEUED
60 80 80 00 56 2e 40 00 01:55:14.285 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 16114 -
# 2 Short offline Aborted by host 90% 16088 -
# 3 Short offline Completed: read failure 00% 16056 11470
# 4 Short offline Completed: read failure 00% 16055 11470
# 5 Short offline Completed: read failure 00% 16055 11470
# 6 Short offline Completed: read failure 00% 16055 11470
# 7 Short offline Completed: read failure 00% 16055 11470
# 8 Short offline Completed: read failure 00% 16054 11470
# 9 Short offline Aborted by host 90% 16054 -
#10 Short offline Completed: read failure 00% 16052 11470
#11 Short offline Completed: read failure 00% 16052 11470
#12 Short offline Aborted by host 90% 16046 -
#13 Short offline Completed: read failure 00% 15343 368296
#14 Short offline Completed without error 00% 14996 -
#15 Short offline Aborted by host 50% 14996 -
#16 Short offline Completed without error 00% 13750 -
#17 Short offline Completed without error 00% 3162 -
#18 Short offline Completed without error 00% 3161 -
#19 Short offline Interrupted (host reset) 80% 3161 -
#20 Short offline Completed without error 00% 3161 -
#21 Short offline Completed without error 00% 3161 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
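Each error entry above prints the failing LBA next to the raw registers. As a rough sketch of how that number can be reconstructed (assuming the 28-bit layout, where the low nibble of DH, then CH, CL and SN form the address; the function names are made up for illustration):

```python
def lba_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA (assumed layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_row(row: str) -> dict:
    """Parse an 'ER ST SC SN CL CH DH' row of hex bytes into named integers."""
    names = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]
    values = [int(token, 16) for token in row.split()[:7]]
    return dict(zip(names, values))


# Register row copied from "Error 46937" in the log above.
regs = parse_register_row("40 41 00 08 58 2e 40")
lba = lba_from_registers(regs["SN"], regs["CL"], regs["CH"], regs["DH"])
print(hex(lba), lba)
```

All five logged errors decode to the same address, which suggests one bad sector being hit repeatedly rather than damage spread across the disk.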
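I installed a relatively new Kingston 1TB SSD in an Intel NUC running Ubuntu. It is a fresh build, only 2 months old. When booting today, I saw a SMART error telling me to back up and replace the device. I contacted Kingston and a new SSD is on its way. I managed to boot from the Live USB I had used for the installation so that I could get some diagnostics.
However, I cannot access the data on the drive to back it up.
Running smartctl shows that the drive has a problem. I tried mounting with various superblocks, but they all appear to be corrupted. I just need to get the drive into a read-only state so that I can pull whatever data I can from it.
How can I get a backup from this device, as the SMART message during boot suggested?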
ubuntu@ubuntu:~$ sudo smartctl /dev/nvme0n1p1 -x
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-26-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SA2000M81000G
Serial Number: 50026B7683CFB158
Firmware Version: S5Z42105
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization: 22,138,441,728 [22.1 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 683cfb1585
Local Time is: Mon Aug 10 10:23:53 2020 UTC
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.0040W - - 4 4 4 4 15000 15000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- media has been placed in read only mode
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x08
Temperature: 25 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 45,116 [23.0 GB]
Data Units Written: 373,748 [191 GB]
Host Read Commands: 852,421
Host Write Commands: 3,762,237
Controller Busy Time: 70
Power Cycles: 45
Power On Hours: 77
Unsafe Shutdowns: 22
Media and Data Integrity Errors: 46,979
Error Information Log Entries: 31,278
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 256 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 31278 1 0x0188 0x0280 - 2856 1 -
1 31277 1 0x0188 0x0280 - 3112 1 -
2 31276 1 0x0188 0x0280 - 3112 1 -
3 31275 1 0x0188 0x0280 - 2856 1 -
4 31274 1 0x0188 0x0280 - 2856 1 -
5 31273 1 0x0188 0x0280 - 3112 1 -
6 31272 1 0x0188 0x0280 - 3112 1 -
7 31271 1 0x0188 0x0280 - 2856 1 -
8 31270 1 0x0188 0x0280 - 2856 1 -
9 31269 1 0x0188 0x0280 - 3112 1 -
10 31268 1 0x0188 0x0280 - 3112 1 -
11 31267 1 0x0188 0x0280 - 2856 1 -
12 31266 1 0x0188 0x0280 - 2600 1 -
13 31265 1 0x0188 0x0280 - 2600 1 -
14 31264 1 0x0188 0x0280 - 2600 1 -
15 31263 1 0x0188 0x0280 - 2600 1 -
... (240 entries not shown)
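The "Critical Warning: 0x08" value in the output above is a bit field, and the "read only mode" line is smartctl's reading of it. A minimal sketch of that decoding, assuming the bit meanings from the NVMe SMART/Health log page (the helper names are illustrative only); it also checks the bracketed sizes, since one data unit is 1000 blocks of 512 bytes:

```python
# Bit positions of the NVMe "Critical Warning" byte (assumed from the NVMe spec).
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> list:
    """Return the description of every bit set in the warning byte."""
    return [text for bit, text in sorted(CRITICAL_WARNING_BITS.items())
            if value & (1 << bit)]


def data_units_to_bytes(units: int) -> int:
    """Convert NVMe data units (1000 x 512-byte blocks each) to bytes."""
    return units * 1000 * 512


warnings = decode_critical_warning(0x08)
bytes_read = data_units_to_bytes(45_116)
bytes_written = data_units_to_bytes(373_748)
print(warnings, bytes_read, bytes_written)
```

With only bit 3 set, the controller itself has switched the media to read-only, which is separate from any filesystem-level damage.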
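When I tried to install Windows 10 on it, one of my hard drives did not work properly, so I searched for the error and used smartctl self-tests to check the HD for errors.
I tried some common fixes, such as overwriting the faulty sections with zeros, but that did not help.
Here is the self-test log: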
sudo smartctl -l selftest /dev/sdb
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 828 15353528
# 2 Short offline Completed: read failure 90% 827 55762560
# 3 Extended offline Completed: read failure 90% 827 15325464
# 4 Extended offline Completed: read failure 90% 827 15323008
# 5 Extended offline Completed: read failure 90% 827 15323008
# 6 Short offline Completed: read failure 90% 827 16319388
# 7 Short offline Completed: read failure 90% 827 16319388
# 8 Short offline Completed without error 00% 537 -
# 9 Short offline Completed without error 00% 0 -
Can I fix these errors, or is the drive beyond recovery? Let me know if I can provide more information.
Hard drive information
Model Family: Seagate Samsung SpinPoint M8 (AF)
Device Model: ST1000LM024 HN-M101MBB
Serial Number: S32SJ5DF211384
LU WWN Device Id: 5 0004cf 4013ff254
Firmware Version: 2BA30001
User Capacity: 1.000.204.886.016 bytes [1,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Full smartctl log
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 29202
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 092 080 025 Pre-fail Always - 2643
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1361
5 Reallocated_Sector_Ct 0x0033 095 095 010 Pre-fail Always - 864
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 846
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 18
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1408
13 Read_Soft_Error_Rate 0x003a 100 100 000 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 3398445
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 107
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 099 099 000 Old_age Always - 17299
194 Temperature_Celsius 0x0002 064 055 000 Old_age Always - 31 (Min/Max 20/45)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 095 095 000 Old_age Always - 864
197 Current_Pending_Sector 0x0032 095 094 000 Old_age Always - 923
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 616
240 Head_Flying_Hours 0x0032 100 100 000 Old_age Always - 822
241 Total_LBAs_Written 0x0032 096 094 000 Old_age Always - 6322514
242 Total_LBAs_Read 0x0032 096 094 000 Old_age Always - 6719332
254 Free_Fall_Sensor 0x0032 252 252 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 828 15353528
# 2 Short offline Completed: read failure 90% 827 55762560
# 3 Extended offline Completed: read failure 90% 827 15325464
# 4 Extended offline Completed: read failure 90% 827 15323008
# 5 Extended offline Completed: read failure 90% 827 15323008
# 6 Short offline Completed: read failure 90% 827 16319388
# 7 Short offline Completed: read failure 90% 827 16319388
# 8 Short offline Completed without error 00% 537 -
# 9 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed_read_failure [90% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
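To pick the sector-health counters out of an attribute table like the one above, something along these lines could work. It is only a sketch: it assumes the ten-column layout shown in this output, and the choice of watched IDs (5, 196, 197, 198) is my own.

```python
# IDs of attributes that count reallocated or pending sectors (my selection).
WATCHED_IDS = {5, 196, 197, 198}


def parse_attribute_row(row: str):
    """Split one attribute row into a dict, or return None for non-data rows."""
    fields = row.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "thresh": int(fields[5]),
        # RAW_VALUE may carry a suffix such as "(Min/Max 20/45)"; keep the number.
        "raw": int(fields[9].split()[0]),
    }


def nonzero_sector_counters(rows) -> dict:
    """Map attribute name to raw value for watched attributes with raw > 0."""
    found = {}
    for row in rows:
        attr = parse_attribute_row(row)
        if attr and attr["id"] in WATCHED_IDS and attr["raw"] > 0:
            found[attr["name"]] = attr["raw"]
    return found


sample_rows = [
    "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE",
    "5 Reallocated_Sector_Ct 0x0033 095 095 010 Pre-fail Always - 864",
    "194 Temperature_Celsius 0x0002 064 055 000 Old_age Always - 31 (Min/Max 20/45)",
    "197 Current_Pending_Sector 0x0032 095 094 000 Old_age Always - 923",
    "198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0",
]
flagged = nonzero_sector_counters(sample_rows)
print(flagged)
```

On this drive the overall health result still says PASSED while both counters are in the hundreds, so the raw values are worth reading on their own.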
I want to test a USB hard drive with smartctl on Ubuntu disco.
alex@Guilmon:~$ LANG=C
alex@Guilmon:~$ sudo smartctl -d usbjmicron --all /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.0.0-36-generic] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: empty IDENTIFY data
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options
I got the -d usbjmicron option from smartmontools.org/wiki/
lsusb | grep -i micron
Bus 003 Device 002: ID 152d:2329 JMicron Technology Corp. / JMicron USA Technology Corp. JM20329 SATA Bridge
smartctl --scan shows it as well
sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d usbjmicron # /dev/sdc [USB JMicron], ATA device
After a while, fdisk and parted hang.
Nov 27 16:35:16 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 27 16:35:16 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Sense Key : Unit Attention [current]
Nov 27 16:35:16 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Add. Sense: Not ready to ready change, medium may have changed
Nov 27 16:35:16 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 27 16:35:16 Guilmon kernel: print_req_error: I/O error, dev sdc, sector 0 flags 0
Nov 27 16:35:16 Guilmon kernel: Buffer I/O error on dev sdc, logical block 0, async page read
Nov 27 16:36:30 Guilmon sudo[14829]: alex : TTY=pts/0 ; PWD=/home/alex ; USER=root ; COMMAND=/usr/sbin/fdisk -l /dev/sdc
Nov 27 16:38:17 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 27 16:38:17 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Sense Key : Unit Attention [current]
Nov 27 16:38:17 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Add. Sense: Not ready to ready change, medium may have changed
Nov 27 16:38:17 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 03 8a 47 80 00 00 08 00
Nov 27 16:38:17 Guilmon kernel: print_req_error: I/O error, dev sdc, sector 59393920 flags 80700
Nov 27 16:41:18 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 27 16:41:18 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Sense Key : Unit Attention [current]
Nov 27 16:41:18 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Add. Sense: Not ready to ready change, medium may have changed
Nov 27 16:41:18 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 03 8a 4f fe 00 00 02 00
Nov 27 16:41:18 Guilmon kernel: print_req_error: I/O error, dev sdc, sector 59396094 flags 80700
Nov 27 16:44:20 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 27 16:44:20 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Sense Key : Unit Attention [current]
Nov 27 16:44:20 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 Add. Sense: Not ready to ready change, medium may have changed
Nov 27 16:44:20 Guilmon kernel: sd 6:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 27 16:44:20 Guilmon kernel: print_req_error: I/O error, dev sdc, sector 0 flags 0
Nov 27 16:44:20 Guilmon kernel: Buffer I/O error on dev sdc, logical block 0, async page read
File systems
sudo lsblk -f | grep sdc
sdc
├─sdc1 ext4 bionic 8cc02316-1cd7-4f54-bd1a-c3f174e55251 #my bionic installation
├─sdc2 swap f923fdf9-3416-420d-898c-e481c82a757b
├─sdc3
└─sdc5 ext4 bionic-home f7217969-9cde-4eff-940b-761ebb06189b #old debian home
File system check
sudo fsck.ext4 /dev/sdc1
[sudo] password for alex:
e2fsck 1.44.6 (5-Mar-2019)
bionic: recovering journal
bionic: clean, 373117/1602496 files, 2540186/6400000 blocks
alex@Guilmon:~$ sudo fsck.ext4 /dev/sdc5
e2fsck 1.44.6 (5-Mar-2019)
bionic-home: clean, 161503/28672000 files, 102109567/114672128 blocks
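When I plug in the hard drive, it mounts fine. But yesterday my file manager hung while I was searching through its data.
The hard drive was tested before I replaced it with a bigger one, and it was not used often. How can I check this hard drive? I have tried several cables. I only have one enclosure.
Note that I can boot my bionic installation from it without any problems.
Sorry for the wall of text.
Now I do get fdisk output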
sudo fdisk -l /dev/sdc
Disk /dev/sdc: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: IB-272StU-OT
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000e2cd8
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 51202047 51200000 24,4G 83 Linux
/dev/sdc2 51202048 59394047 8192000 3,9G 82 Linux swap / Solaris
/dev/sdc3 59396094 976773119 917377026 437,5G 5 Extended
/dev/sdc5 59396096 976773119 917377024 437,5G 83 Linux
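The CDB bytes in the kernel messages above can be matched against the sector numbers in the "I/O error" lines. A small sketch, assuming the standard Read(10) layout (opcode 0x28, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 and 8); the function name is illustrative:

```python
def decode_read10(cdb_hex: str):
    """Return (lba, block_count) from a space-separated Read(10) CDB dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    length = int.from_bytes(cdb[7:9], "big")   # number of blocks requested
    return lba, length


# CDBs copied from the 16:38:17 and 16:41:18 kernel messages.
first = decode_read10("28 00 03 8a 47 80 00 00 08 00")
second = decode_read10("28 00 03 8a 4f fe 00 00 02 00")
print(first, second)
```

The decoded addresses agree with the sectors the kernel reports (59393920 and 59396094), and the second one is the Start sector of /dev/sdc3 in the fdisk table, so these reads look like partition probing rather than reads of file data.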
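On Windows, I use CrystalDiskInfo to tell me when a hard drive is going bad. It has an indicator icon that automatically refreshes the SMART data every few minutes or so.
On Linux, there are programs that can check SMART values, but they have to be run manually.
Is there a program that runs in the background and sends a notification as soon as a SMART warning appears?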
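I have been using a Lubuntu test bench to format a lot of hard drives before selling them. I have been using the GNOME Disks utility to make sure the disks are clean of my company's data and that all of my data has been removed.
I chose the SATA SECURE ERASE method to clean the drives because it wipes the entire disk surface, not just the blocks the operating system can see, and it is considered the most secure.
My system supports SATA HOT PLUG, and during the many disk swaps the power connector of some drives got moved and the erase was interrupted. I know that the secure erase command requires the drive to be locked with an ATA password before erasing.
I tried to unlock the now inaccessible drives with MHDD, but my SUDO password does not seem to unlock them. I have been looking for a detailed manual for the DISKS utility, but apart from a short page describing what the utility does, I could not find anything. I am not a programmer and cannot read the tool's source code to find out for myself where the password comes from.
Does anyone know whether this tool randomizes the password for each drive it erases, leaving me stuck with 10 ATA PASSWORD locked drives that are basically junk, or whether it uses some standard password for every erase?
Could anyone please give me some information, or put me in touch with the tool's developers, so that I can analyze the tool and find out whether it is worth my time to try guessing the password.
If the password is random, the tool should clearly warn users about the possibility of locking the drive in the event of a power surge or any other kind of interruption.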
Kind regards, Ł. Wolny
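I have a strange problem. On a freshly installed Ubuntu 18.04, the system seems to run fine. Then suddenly, for no apparent reason, the system hangs for anywhere from 10 seconds to a few minutes and I cannot do anything.
I tried keeping an instance of top open, and RAM/CPU usage seems fine. I am on an i5 machine with 6GB RAM and 12GB swap. I have just tested the memory and the disk, and they show no errors.
EDIT: Some additional information. I set the CPU frequency governor to performance, so it always runs at maximum speed.
The problem shows up more often when performing CPU-intensive operations (e.g. data analysis). Once they finish, the GUI becomes completely unresponsive, and it is difficult or impossible to get it working again.
EDIT
Output of grep . -r /sys/firmware/acpi/interrupts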
/sys/firmware/acpi/interrupts/gpe2F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe23: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe13: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe0F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe03: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3D: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe31: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2D: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe21: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1D: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/ff_pwr_btn: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe11: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe0D: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe01: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe3B: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2B: 0 invalid unmasked
/sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled unmasked
/sys/firmware/acpi/interrupts/ff_pmtimer: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe1B: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe38: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0B: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe28: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe18: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe08: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe36: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe26: 0 invalid unmasked
/sys/firmware/acpi/interrupts/error: 0
/sys/firmware/acpi/interrupts/gpe16: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/sci: 4
/sys/firmware/acpi/interrupts/gpe06: 4 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe34: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe24: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe14: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe04: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe32: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe22: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe12: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe0E: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe02: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe3C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe30: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe20: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe1C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe10: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe39: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0C: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe00: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe3A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe_all: 4
/sys/firmware/acpi/interrupts/gpe29: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe19: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe1A: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe09: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe37: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe27: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe17: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/ff_gbl_lock: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe07: 0 enabled unmasked
/sys/firmware/acpi/interrupts/sci_not: 0
/sys/firmware/acpi/interrupts/gpe35: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe25: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe15: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe05: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe33: 0 invalid unmasked
/sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid unmasked
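Since nearly every counter in the listing above is zero, it can help to filter for the ones that have actually fired. A minimal sketch, assuming each line has the "path: count flags..." shape shown here (the function name is made up):

```python
def nonzero_acpi_counters(lines) -> dict:
    """Map counter name to count for every line whose count is above zero."""
    counters = {}
    for line in lines:
        path, sep, rest = line.partition(":")
        tokens = rest.split()
        if not sep or not tokens or not tokens[0].isdigit():
            continue  # skip anything that is not "path: <number> ..."
        count = int(tokens[0])
        if count > 0:
            counters[path.rsplit("/", 1)[-1]] = count
    return counters


sample = [
    "/sys/firmware/acpi/interrupts/gpe07: 0 enabled unmasked",
    "/sys/firmware/acpi/interrupts/sci: 4",
    "/sys/firmware/acpi/interrupts/gpe06: 4 EN enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe_all: 4",
]
active = nonzero_acpi_counters(sample)
print(active)
```

In this dump only sci, gpe06 and gpe_all are non-zero, each with a count of 4.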
EDIT 04/03/2019: I ran a full SMART test, and now things do not look good, at least to me.
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 179 176 021 Pre-fail Always - 4025
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 218
5 Reallocated_Sector_Ct 0x0033 154 154 140 Pre-fail Always - 364
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 034 034 000 Old_age Always - 48741
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 217
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 100
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 117
194 Temperature_Celsius 0x0022 089 080 000 Old_age Always - 58
196 Reallocated_Event_Count 0x0032 022 022 000 Old_age Always - 178
197 Current_Pending_Sector 0x0032 199 199 000 Old_age Always - 234
198 Offline_Uncorrectable 0x0030 199 199 000 Old_age Offline - 245
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 188 188 000 Old_age Offline - 2436
240 Head_Flying_Hours 0x0032 038 038 000 Old_age Always - 45709
241 Total_LBAs_Written 0x0032 200 200 000 Old_age Always - 81196791754
242 Total_LBAs_Read 0x0032 200 200 000 Old_age Always - 75991010629
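One way to read the table above is to look at how far each Pre-fail attribute's normalized VALUE sits above its THRESH, since smartctl treats an attribute as failed once VALUE is less than or equal to THRESH. A rough sketch, assuming the ten-column layout shown (the function name is illustrative):

```python
def prefail_margins(rows) -> dict:
    """Map each Pre-fail attribute name to VALUE minus THRESH."""
    margins = {}
    for row in rows:
        fields = row.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or malformed line
        if fields[6] == "Pre-fail":
            margins[fields[1]] = int(fields[3]) - int(fields[5])
    return margins


sample_rows = [
    "1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0",
    "3 Spin_Up_Time 0x0027 179 176 021 Pre-fail Always - 4025",
    "5 Reallocated_Sector_Ct 0x0033 154 154 140 Pre-fail Always - 364",
    "9 Power_On_Hours 0x0032 034 034 000 Old_age Always - 48741",
]
margins = prefail_margins(sample_rows)
print(margins)
```

Reallocated_Sector_Ct has by far the smallest margin of the three Pre-fail attributes, which fits the non-zero pending and uncorrectable counts further down the table.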
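TL;DR: Is there a way to test the exec command of a smartd configuration, in a similar way to the test mail command?
In detail: My use case is that I use monit to monitor my Ubuntu Server 18.04. In my smartd.conf I tell smartmontools: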
/dev/sda -a -m root -M exec /usr/share/smartmontools/smartd-runner -M test
/dev/sdb -a -m root -M exec /usr/share/smartmontools/smartd-runner
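In /etc/smartmontools/run.d/ I have a script named notify-monit.sh which, when invoked, creates a file in /etc/monit/reports containing the smartd report. Then in /etc/monit/scripts I have another bash script that monit calls as a program check; it returns 1 if the file exists and 0 if it does not. monit then checks the exit code and raises an alert when it is != 0.
This setup is a bit complicated, so I would like to test it from start to finish. I ran each script manually and they work fine individually, but it would still be safer to test them in a real scenario where smartd launches them.
From what I understand of the smartd man page, -M test will only try to send a test email, and I can see in syslog that it does exactly that. Nothing more. Is there any way to test the whole exec chain?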
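For reference, the two-step chain described above boils down to "write a report file, then turn its presence into an exit code". Here is that logic sketched in Python rather than bash, run against a temporary directory instead of /etc/monit/reports. The report file name is hypothetical, and the SMARTD_DEVICE and SMARTD_MESSAGE variables are the ones smartd exports to -M exec scripts according to its man page; here they are simply passed in as a dict.

```python
import tempfile
from pathlib import Path


def write_report(env: dict, reports_dir: Path) -> Path:
    """Stand-in for notify-monit.sh: store the smartd message in a report file."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    device = env.get("SMARTD_DEVICE", "unknown").replace("/", "_")
    report = reports_dir / f"smartd{device}.report"  # hypothetical file name
    report.write_text(env.get("SMARTD_MESSAGE", "") + "\n")
    return report


def monit_exit_code(reports_dir: Path) -> int:
    """Stand-in for the monit program check: 1 if any report exists, else 0."""
    return 1 if reports_dir.is_dir() and any(reports_dir.iterdir()) else 0


with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp) / "reports"
    exit_before = monit_exit_code(reports)
    # Simulated environment; a real run would get these values from smartd.
    fake_env = {"SMARTD_DEVICE": "/dev/sda",
                "SMARTD_MESSAGE": "simulated SMART failure"}
    report_text = write_report(fake_env, reports).read_text().strip()
    exit_after = monit_exit_code(reports)

print(exit_before, exit_after, report_text)
```

This only exercises the file-and-exit-code logic with a simulated environment; it does not show that smartd itself launches the script.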
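After some recent updates, my server tells me that it can no longer find the external backup disk, which is mounted by its disk ID.
So I checked and noticed that the ID of my external backup drive has changed: from ata-ST3000DM001... to usb-Seagate_Expansion_Desk...
Fair enough, it is a USB drive after all. But then my backup script complained that it could not read the SMART data, which had never been a problem with the previous identifier.
I could remove the SMART check from my script, but I do not consider that an option, because it would mean I would not be notified if this disk went bad.
Things I have tried so far: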
smartctl -d sat -a /dev/disk/by-id/usb-...
smartctl (-d sat) -a /dev/sda
Rebooting
Useful information:
lsusb:
Bus 005 Device 002: ID 0bc2:331a Seagate RSS LLC
smartctl:
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-122-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: scsi error unsupported field in scsi command
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
uname:
4.4.0-122-generic
Thanks in advance!