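Yesterday our server (Ubuntu 18.04) reached 100% storage capacity, and one of our filesystems was switched to read-only mode. See:
/dev/md3 / ext4 ro,relatime,errors=remount-ro,data=ordered 0 0
I have already tried several solutions from other answers on serverfault, but none of them seems to fit my situation.
For example, I tried running the following command:
sudo mount -o remount,rw /dev/md3 /
but this results in the message:
mount: /: cannot remount /dev/md3 read-write, is write-protected.
How can I fix this so that the filesystem is read-write again?
Thanks!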
Update with debug information:
mdadm --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Fri Nov 10 10:07:34 2017
Raid Level : raid1
Array Size : 20478912 (19.53 GiB 20.97 GB)
Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sat Sep 18 09:15:35 2021
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : unknown
UUID : 4b632ac4:ae1a7c2b:a4d2adc2:26fd5302
Events : 0.861
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
And with dmesg:
dmesg | grep "md3"
[67448453.830094] EXT4-fs error (device md3): ext4_remount:4840: Abort forced by user
Running tune2fs:
tune2fs -l /dev/md3
tune2fs 1.44.1 (24-Mar-2018)
Filesystem volume name: /
Last mounted on: /
Filesystem UUID: d1a985c4-8c5e-4034-93e0-629b8e65f161
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1281120
Block count: 5119728
Reserved block count: 255986
Free blocks: 445848
Free inodes: 1001361
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1022
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8160
Inode blocks per group: 510
Flex block group size: 16
Filesystem created: Fri Nov 10 10:07:39 2017
Last mount time: Tue Jul 30 17:51:41 2019
Last write time: Thu Sep 16 20:06:05 2021
Mount count: 7
Maximum mount count: -1
Last checked: Fri Nov 10 10:07:39 2017
Check interval: 0 (<none>)
Lifetime writes: 4013 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 663035
Default directory hash: half_md4
Directory Hash Seed: ae316af1-086d-470f-af27-0c10ca25f3c8
Journal backup: inode blocks
FS Error count: 8
First error time: Thu Sep 16 20:06:04 2021
First error function: ext4_lookup
First error line #: 1607
First error inode #: 930317
First error block #: 0
Last error time: Sat Sep 18 09:15:35 2021
Last error function: ext4_remount
Last error line #: 4840
Last error inode #: 685456
Last error block #: 0
Debug information from e2fsck -n /dev/md3:
e2fsck -n /dev/md3
e2fsck 1.44.1 (24-Mar-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/ contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no
Inode 101 was part of the orphaned inode list. IGNORED.
Inode 117 was part of the orphaned inode list. IGNORED.
Inode 292 was part of the orphaned inode list. IGNORED.
Inode 460 was part of the orphaned inode list. IGNORED.
Inode 465 was part of the orphaned inode list. IGNORED.
Inode 471 was part of the orphaned inode list. IGNORED.
Inode 487 was part of the orphaned inode list. IGNORED.
Inode 529 was part of the orphaned inode list. IGNORED.
Inode 562 was part of the orphaned inode list. IGNORED.
Inode 564 was part of the orphaned inode list. IGNORED.
Inode 707 was part of the orphaned inode list. IGNORED.
Inode 723 was part of the orphaned inode list. IGNORED.
Inode 918 was part of the orphaned inode list. IGNORED.
...
Deleted inode 402614 has zero dtime. Fix? no
...
Inode 783370, end of extent exceeds allowed value
(logical block 1024, physical block 3068928, len 76)
Clear? no
Inode 783370, i_blocks is 8784, should be 8200. Fix? no
Inode 783470, end of extent exceeds allowed value
(logical block 2708, physical block 1322783, len 193)
Clear? no
Inode 783470, i_blocks is 23200, should be 21672. Fix? no
Inode 1047956 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Entry 'tmp' in /tmp/systemd-private-bb09aae54cab4e12844e5844d11ca5eb-certbot.service-VSBnVY (685456) has deleted/unused inode 685457. Clear? no
Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920168. Clear? no
Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920176. Clear? no
Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920166. Clear? no
Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920173. Clear? no
Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920172. Clear? no
Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no
...
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 685456 ref count is 3, should be 2. Fix? no
Pass 5: Checking group summary information
Block bitmap differences: -34565 -(53721--53734) -(59721--59761) -(59981--59983) -(61106--61184) -(61540--61544) -(70964--71007) -(71274--71313) -(84938--84989) -(85084--85107) -(85592--85599) -(116400--116408) -(116423--116436) -(128700--128703) -(128708--128721) -(138904--138914) -(165045--165150) -(169691--169713) -(169717--169742) -(464896--471464) -(471552--471989) -(472928--472947) -(499200--499612) -(501408--501434) -(503808--504070) -(513024--513301) -(513408--513491) -(589477--589480) -(711431--711441) -(747968--748030) -(838733--838740) -(838755--838758) -(838772--838783) -(838791--838800) -(838805--838816) -(838824--838835) -(848384--848972) -(875840--875880) -(1032187--1033031) -(1083840--1083878) -(1120110--1120132) -(1322783--1322975) -(1631196--1631251) -(1635150--1635169) -(1635360--1635391) -(1635571--1635575) -(1635848--1635855) -(1635996--1636001) -1648860 -1648880 -(1715533--1715536) -(1740800--1741311) -(1746432--1746573) -(1750528--1750729) -(1867776--1867880) -(1870717--1871294) -(1880576--1880791) -(1888256--1888258) -1888260 -(1888272--1888273) -(1888275--1888767) -(2226402--2226405) -(2235495--2235719) -(2266304--2266332) -(2301560--2301629) -(2528723--2528753) -(2589088--2589117) -(2597312--2597374) -(2597696--2597757) -(2614784--2615295) -(2619392--2619458) -(2619904--2620297) -2636181 -(2671360--2671491) -(2687328--2687350) -(3068928--3069003) -(3196998--3197002) -(3228728--3228738) -(3236697--3236703) -(3252961--3252970) -(3264276--3264277) -(3264287--3264298) -(3285164--3285170) -(3299518--3299524) -(3399680--3400062) -(3441024--3441129) -(3574080--3574142) -(3601664--3601795) -(3659648--3659724) -(3660672--3660755) -(3704233--3704234) -(3704237--3704242) -3707626 -3708898 -3709310 -3709356 -3709398 -3709984 -(3751694--3751696) -(3751707--3751711) -(3751767--3751768) -(3751774--3751775) -(3751800--3751814) -(3771264--3771343) -(3830025--3830040) -(3860480--3867203) -(3867616--3867644) -(3868160--3868618) -(3869696--3870139) 
-(4045457--4045483) -(4087936--4088023) -(4088032--4088055) -(4088320--4088780) -(4088960--4089064) -(4089088--4089126) -(4091136--4091324) -(4091392--4092119) -(4092928--4094514) -(4094976--4095854) -(4097088--4097120) -(4097536--4097816) -(4109312--4110157) -(4250368--4250378) -(4278497--4278513) -(4296960--4297014) -(4325486--4325616) -(4325632--4325707) -(4326688--4327074) -(4328826--4328961) -(4329202--4329314) -(4329600--4329666) -(4329764--4329804) -(4332027--4332178) -(4332406--4332476) -(4333568--4333942) -(4334372--4334454) -(4334564--4335227) -(4621153--4621176) -(4669781--4670170) -(4696470--4696548) -(4697074--4697429) -(4697662--4697711) -(4726778--4727894) -(5055921--5056185) -(5056648--5056667) -(5106412--5106620) -(5106668--5107034)
Fix? no
Free blocks count wrong for group #76 (3374, counted=3375).
Fix? no
Free blocks count wrong (445848, counted=445849).
Fix? no
Inode bitmap differences: -101 -117 -292 -460 -465 -471 -487 -529 -562 -564 -707 -723 -918 -(1837--1838) -2041 -2714 -3593 -3654 -3659 -3894 -3976 -4336 -4425 -5193 -5244 -5252 -5930 -5951 -5967 -(7066--7069) -7431 -8492 -8651 -9298 -9583 -9592 -14261 -14270 -18093 -19214 -21301 -(27843--27844) -27847 -27849 -(27853--27856) -(27868--27869) -(27872--27873) -27875 -27879 -27883 -27885 -(27889--27890) -27892 -162842 -391708 -391741 -391759 -391763 -(391800--391802) -(391804--391805) -(391812--391814) -(391831--391833) -391870 -391873 -391878 -391900 -391902 -(391910--391911) -391915 -391919 -391927 -391956 -392493 -392719 -393759 -393795 -395132 -395134 -395161 -395165 -395221 -395234 -395267 -395289 -(395312--395313) -395315 -395325 -395336 -395387 -395630 -396550 -396589 -(396699--396700) -402594 -(402596--402598) -402601 -(402604--402606) -402608 -(402611--402614) -407918 -413872 -413874 -413881 -413885 -413897 -413900 -413908 -421042 -421202 -421226 -426391 -652905 -(652931--652935) -663035 -685457 -920162 -(920164--920176) -1047956
Fix? no
Directories count wrong for group #84 (17, counted=16).
Fix? no
Free inodes count wrong for group #96 (80, counted=82).
Fix? no
Free inodes count wrong for group #112 (486, counted=487).
Fix? no
Free inodes count wrong (1001361, counted=1001364).
Fix? no
/: ********** WARNING: Filesystem still has errors **********
/: 279759/1281120 files (0.7% non-contiguous), 4673880/5119728 blocks
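It is the filesystem corruption that caused the switch to read-only mode, not the disk filling up, exactly as the mount option errors=remount-ro dictates.
Back up your important data and configuration and download them somewhere else. Prepare a recovery plan in case something important for booting turns out to be broken. If possible, move important services to another machine. There will be some downtime.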
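To tell the two causes apart on a live system, it can help to look at the mount options alongside the fill level. The following is only a sketch, assuming a POSIX shell and findmnt from util-linux; classify_opts is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given a comma-separated mount-options string,
# print "<ro|rw> <remount-ro|other>", i.e. the current read/write state
# and whether the errors=remount-ro policy is in effect.
classify_opts() {
    opts=",$1,"          # pad with commas so every option matches as ,opt,
    state=rw
    policy=other
    case "$opts" in *,ro,*) state=ro ;; esac
    case "$opts" in *,errors=remount-ro,*) policy=remount-ro ;; esac
    printf '%s %s\n' "$state" "$policy"
}

# Usage on the server (read-only queries, nothing is modified):
#   classify_opts "$(findmnt -no OPTIONS /)"
#   df -h /
```

If the first command reports a read-only state together with the remount-ro policy while df still shows free space, that points to an error-triggered remount rather than a full disk.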
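I notice this system is not rebooted often (only 7 mounts since 2017, and the last one was in 2019). So I suggest setting the maximum mount count to 1, so that the filesystem is checked on every boot, for example with tune2fs -c 1 /dev/md3.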
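A minimal sketch of applying that setting and confirming it took effect, assuming root access and the /dev/md3 device from the question. The mount_counts helper is hypothetical; it only parses the "Mount count" and "Maximum mount count" lines that tune2fs -l prints.

```shell
#!/bin/sh
# Hypothetical helper: read `tune2fs -l` output on stdin and print
# "<mount count> <maximum mount count>".
mount_counts() {
    awk -F': *' '
        $1 == "Mount count"         { cur = $2 }
        $1 == "Maximum mount count" { max = $2 }
        END { print cur, max }
    '
}

# On the server, as root:
#   tune2fs -c 1 /dev/md3                # request a check after every mount
#   tune2fs -l /dev/md3 | mount_counts   # second number should now be 1
```

In the output pasted in the question the maximum mount count is -1, which disables mount-count-based checking entirely.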
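Then reboot. The init scripts should check and repair the filesystem during boot. However, the corruption may be severe enough to require manual intervention, so make sure someone is near the server and ready to help you. Also, if the corruption has touched something important, you may run into strange problems.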
In the worst case, you will have to reinstall the system. But don't forget to set the maximum mount count to 1 again afterwards.
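Why did the filesystem get corrupted? It just happens. Blocks are cached in memory, and corruption can occur there, for example because of cosmic rays. This is very rare, but it does happen. Disks are not ideal either and cannot detect every error; there is a non-zero bit error rate (look up the actual value in your device's data sheet), so the probability that data is read back corrupted is very low but not zero. If this happens to a metadata block, problems can accumulate (a filesystem driver guided by wrong information may make incorrect assumptions and corrupt the filesystem further), which is why it is important to check the filesystem from time to time.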
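Between full checks, the error counters that ext4 records in the superblock can be watched as an early warning. This is a sketch only, assuming the tune2fs -l field names shown in the question ("Filesystem state", "FS Error count"); fs_looks_healthy is a hypothetical helper, and it is no substitute for an actual e2fsck run.

```shell
#!/bin/sh
# Hypothetical helper: read `tune2fs -l` output on stdin.
# Exit status 0 if the state is exactly "clean" and no error count is
# recorded; exit status 1 otherwise.
fs_looks_healthy() {
    awk -F': *' '
        $1 == "Filesystem state" { state = $2 }
        $1 == "FS Error count"   { errors = $2 + 0 }
        END {
            if (state == "clean" && errors == 0) exit 0
            exit 1
        }
    '
}

# Usage, as root (for example from a periodic job):
#   tune2fs -l /dev/md3 | fs_looks_healthy || echo "check /dev/md3"
```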