我一直遇到系统进入只读状态的问题。我可以进行其他问题中描述的修复(例如Read-only filesystem error),这可以修复它〜一天。但是每天在启动时问题都会回来。
我的问题不是如何修复它,而是有什么方法可以让我诊断出可能是什么特定软件导致了这种情况?
编辑:
在下面添加 syslog 输出,通过对这些错误消息进行一些搜索,我的理解是没有什么特别的这样做;它只是反复找不到好块(即死驱动器)?
Jun 26 10:03:53 mal NetworkManager[1298]: <info> [1593180233.9376] manager: NetworkManager state is now CONNECTED_GLOBAL
Jun 26 10:04:03 mal systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Jun 26 10:04:24 mal anacron[1290]: Job `cron.daily' terminated
Jun 26 10:04:24 mal anacron[1290]: Normal exit (1 job run)
Jun 26 10:04:24 mal systemd[1]: anacron.service: Succeeded.
Jun 26 10:05:01 mal CRON[7965]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 26 10:07:00 mal systemd-resolved[1282]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Jun 26 10:07:07 mal kernel: [ 529.845356] ata1.00: READ LOG DMA EXT failed, trying PIO
Jun 26 10:07:07 mal kernel: [ 529.846653] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
Jun 26 10:07:07 mal kernel: [ 529.846656] ata1.00: irq_stat 0x40000008
Jun 26 10:07:07 mal kernel: [ 529.846659] ata1.00: failed command: READ FPDMA QUEUED
Jun 26 10:07:07 mal kernel: [ 529.846663] ata1.00: cmd 60/08:c0:e8:47:91/00:00:14:00:00/40 tag 24 ncq dma 4096 in
Jun 26 10:07:07 mal kernel: [ 529.846663] res 41/40:00:ec:47:91/00:00:14:00:00/00 Emask 0x409 (media error) <F>
Jun 26 10:07:07 mal kernel: [ 529.846665] ata1.00: status: { DRDY ERR }
Jun 26 10:07:07 mal kernel: [ 529.846666] ata1.00: error: { UNC }
Jun 26 10:07:07 mal kernel: [ 529.852714] ata1.00: configured for UDMA/133
Jun 26 10:07:07 mal kernel: [ 529.852746] sd 0:0:0:0: [sda] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 26 10:07:07 mal kernel: [ 529.852749] sd 0:0:0:0: [sda] tag#24 Sense Key : Medium Error [current]
Jun 26 10:07:07 mal kernel: [ 529.852752] sd 0:0:0:0: [sda] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 26 10:07:07 mal kernel: [ 529.852755] sd 0:0:0:0: [sda] tag#24 CDB: Read(10) 28 00 14 91 47 e8 00 00 08 00
Jun 26 10:07:07 mal kernel: [ 529.852758] blk_update_request: I/O error, dev sda, sector 345065452 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jun 26 10:07:07 mal kernel: [ 529.852779] ata1: EH complete
Jun 26 10:07:07 mal kernel: [ 529.852805] EXT4-fs warning (device sda2): htree_dirblock_to_tree:997: inode #10763504: lblock 0: comm duplicity: error -5 reading directory block
Jun 26 10:07:07 mal kernel: [ 529.950532] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Jun 26 10:07:07 mal kernel: [ 529.950534] ata1.00: irq_stat 0x40000008
Jun 26 10:07:07 mal kernel: [ 529.950536] ata1.00: failed command: READ FPDMA QUEUED
Jun 26 10:07:07 mal kernel: [ 529.950538] ata1.00: cmd 60/08:00:e8:47:91/00:00:14:00:00/40 tag 0 ncq dma 4096 in
Jun 26 10:07:07 mal kernel: [ 529.950538] res 41/40:00:ec:47:91/00:00:14:00:00/00 Emask 0x409 (media error) <F>
Jun 26 10:07:07 mal kernel: [ 529.950539] ata1.00: status: { DRDY ERR }
Jun 26 10:07:07 mal kernel: [ 529.950540] ata1.00: error: { UNC }
Jun 26 10:07:07 mal kernel: [ 529.956788] ata1.00: configured for UDMA/133
Jun 26 10:07:07 mal kernel: [ 529.956808] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 26 10:07:07 mal kernel: [ 529.956810] sd 0:0:0:0: [sda] tag#0 Sense Key : Medium Error [current]
Jun 26 10:07:07 mal kernel: [ 529.956811] sd 0:0:0:0: [sda] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 26 10:07:07 mal kernel: [ 529.956813] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 14 91 47 e8 00 00 08 00
Jun 26 10:07:07 mal kernel: [ 529.956814] blk_update_request: I/O error, dev sda, sector 345065452 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jun 26 10:07:07 mal kernel: [ 529.956838] ata1: EH complete
Jun 26 10:07:07 mal kernel: [ 529.956840] EXT4-fs error (device sda2): __ext4_find_entry:1531: inode #10763504: comm deja-dup: reading directory lblock 0
6 月 29 日更新
边注; 自从我从桌面拔下 Kindle 后,这个错误已经 3 天多没有发生了。我不知道这是否与它有关;但只是在这里为更多了解的人提一下。
从 live CD 启动:检查驱动器的输出
此外, smartctl 输出。我似乎无法-t long
真正完成测试,但简短的测试似乎很好。
ubuntu@ubuntu:~$ sudo smartctl -a /dev/sda2
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-26-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Marvell based SanDisk SSDs
Device Model: SanDisk SSD PLUS 1000GB
Serial Number: 190532802370
LU WWN Device Id: 5 001b44 8b92df610
Firmware Version: UH5100RL
User Capacity: 1,000,207,286,272 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jun 29 23:21:35 2020 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 182) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 3921
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 353
165 Total_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 1045
166 Min_W/E_Cycle 0x0032 100 100 --- Old_age Always - 4
167 Min_Bad_Block/Die 0x0032 100 100 --- Old_age Always - 0
168 Maximum_Erase_Cycle 0x0032 100 100 --- Old_age Always - 14
169 Total_Bad_Block 0x0032 100 100 --- Old_age Always - 1688
170 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Avg_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 4
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 17
184 End-to-End_Error 0x0032 100 100 --- Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 34
188 Command_Timeout 0x0032 100 100 --- Old_age Always - 0
194 Temperature_Celsius 0x0022 065 055 000 Old_age Always - 35 (Min/Max 17/55)
199 SATA_CRC_Error 0x0032 100 100 --- Old_age Always - 0
230 Perc_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 610 80 610
232 Perc_Avail_Resrvd_Space 0x0033 100 100 005 Pre-fail Always - 100
233 Total_NAND_Writes_GiB 0x0032 100 100 --- Old_age Always - 4201
234 Perc_Write/Erase_Ct_BC 0x0032 100 100 000 Old_age Always - 16347
241 Total_Writes_GiB 0x0030 100 100 000 Old_age Offline - 6893
242 Total_Reads_GiB 0x0030 100 100 000 Old_age Offline - 3558
244 Thermal_Throttle 0x0032 000 100 --- Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3921 -
# 2 Extended offline Fatal or unknown error 90% 3920 0
# 3 Short offline Completed without error 00% 3919 -
# 4 Extended offline Fatal or unknown error 90% 3889 0
# 5 Extended offline Self-test routine in progress 90% 3888 -
# 6 Extended offline Self-test routine in progress 90% 3888 -
# 7 Extended offline Self-test routine in progress 90% 3888 -
# 8 Extended offline Self-test routine in progress 90% 3888 -
# 9 Extended offline Self-test routine in progress 90% 3888 -
#10 Extended offline Self-test routine in progress 90% 3888 -
#11 Extended offline Self-test routine in progress 90% 3888 -
#12 Extended offline Self-test routine in progress 90% 3888 -
#13 Extended offline Self-test routine in progress 90% 3888 -
#14 Extended offline Self-test routine in progress 90% 3888 -
#15 Extended offline Self-test routine in progress 90% 3888 -
#16 Extended offline Self-test routine in progress 90% 3888 -
#17 Extended offline Self-test routine in progress 90% 3888 -
#18 Extended offline Self-test routine in progress 90% 3888 -
#19 Extended offline Self-test routine in progress 90% 3888 -
#20 Extended offline Self-test routine in progress 90% 3888 -
#21 Extended offline Self-test routine in progress 90% 3888 -
Selective Self-tests/Logging not supported
fsck
terminal
按Ctrl+ Alt+打开一个窗口Tsudo fdisk -l
sudo fsck -f /dev/sda2
,替换sdXX
为您之前找到的数字fsck
如果有错误,请重复该命令reboot
NCQ
您有 NCQ 错误。
以及扇区 345065452 上可能出现的错误。本机命令队列 (NCQ) 是串行 ATA 协议的扩展,允许硬盘驱动器在内部优化接收到的读取和写入命令的执行顺序。
这样做可以修复 NCQ 错误...
编辑
sudo -H gedit /etc/default/grub
并更改以下行以包含此额外参数。然后sudo update-grub
将更改写入磁盘。重启。监视器挂起,并观察/var/log/syslog
或dmesg
继续出现错误消息。阻塞不良如果在 NCQ 补丁后错误仍然存在,那么...
注意:不要中止坏块扫描!
注意:请勿对 SSD 进行坏块
注意:首先备份您的重要文件!
注意:这将需要几个小时
注意:您可能有待处理的 HDD 故障
以“试用 Ubuntu”模式启动到 Ubuntu Live DVD/USB。
在
terminal
...sudo fdisk -l
# 识别所有“Linux 文件系统”分区sudo e2fsck -fcky /dev/sdXX
# 只读测试或者
sudo e2fsck -fccky /dev/sda2
# 无损读/写测试(推荐)-k 很重要,因为它保存了以前的坏块表,并将任何新的坏块添加到该表中。如果没有 -k,您将丢失所有先前的坏块信息。
-fccky 参数...