当 SAN 中出现问题时,ext3 检测磁盘写入错误并以只读方式重新挂载文件系统时,这是一个相对常见的问题。一切都很好,只有当 SAN 修复时,我无法弄清楚如何在不重新启动的情况下重新安装文件系统读写。
看哪:
[root@localhost ~]# multipath -ll
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
\_ 1:0:0:1 sdb 8:16 [active][ready]
\_ 2:0:0:1 sdc 8:32 [active][ready]
[root@localhost ~]# mount /dev/mapper/mpath0 /mnt/foo
[root@localhost ~]# touch /mnt/foo/blah
一切都好,现在我从它下面拉出 LUN。
[root@localhost ~]# touch /mnt/foo/blah
[root@localhost ~]# touch /mnt/foo/blah
touch: cannot touch `/mnt/foo/blah': Read-only file system
[root@localhost ~]# tail /var/log/messages
Mar 18 13:17:33 localhost multipathd: sdb: tur checker reports path is down
Mar 18 13:17:34 localhost multipathd: sdc: tur checker reports path is down
Mar 18 13:17:35 localhost kernel: Aborting journal on device dm-2.
Mar 18 13:17:35 localhost kernel: Buffer I/O error on device dm-2, logical block 1545
Mar 18 13:17:35 localhost kernel: lost page write due to I/O error on dm-2
Mar 18 13:17:36 localhost kernel: ext3_abort called.
Mar 18 13:17:36 localhost kernel: EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Mar 18 13:17:36 localhost kernel: Remounting filesystem read-only
它只认为它是只读的,实际上它甚至不存在。
[root@localhost ~]# multipath -ll
sdb: checker msg is "tur checker reports path is down"
sdc: checker msg is "tur checker reports path is down"
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:1 sdb 8:16 [failed][faulty]
\_ 2:0:0:1 sdc 8:32 [failed][faulty]
[root@localhost ~]# ll /mnt/foo/
ls: reading directory /mnt/foo/: Input/output error
total 20
-rw-r--r-- 1 root root 0 Mar 18 13:11 bar
它怎么还记得那个'bar'文件在那里......神秘,但现在并不重要。现在我重新介绍 LUN:
[root@localhost ~]# tail /var/log/messages
Mar 18 13:23:58 localhost multipathd: sdb: tur checker reports path is up
Mar 18 13:23:58 localhost multipathd: 8:16: reinstated
Mar 18 13:23:58 localhost multipathd: mpath0: queue_if_no_path enabled
Mar 18 13:23:58 localhost multipathd: mpath0: Recovered to normal mode
Mar 18 13:23:58 localhost multipathd: mpath0: remaining active paths: 1
Mar 18 13:23:58 localhost multipathd: dm-2: add map (uevent)
Mar 18 13:23:58 localhost multipathd: dm-2: devmap already registered
Mar 18 13:23:59 localhost multipathd: sdc: tur checker reports path is up
Mar 18 13:23:59 localhost multipathd: 8:32: reinstated
Mar 18 13:23:59 localhost multipathd: mpath0: remaining active paths: 2
Mar 18 13:23:59 localhost multipathd: dm-2: add map (uevent)
Mar 18 13:23:59 localhost multipathd: dm-2: devmap already registered
[root@localhost ~]# multipath -ll
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][enabled]
\_ 1:0:0:1 sdb 8:16 [active][ready]
\_ 2:0:0:1 sdc 8:32 [active][ready]
很棒吧?它在那里说[rw]。没那么快:
[root@localhost ~]# touch /mnt/foo/blah
touch: cannot touch `/mnt/foo/blah': Read-only file system
好的,不会自动执行,我会稍微推动一下:
[root@localhost ~]# mount -o remount /mnt/foo
mount: block device /dev/mapper/mpath0 is write-protected, mounting read-only
你真是个鬼:
[root@localhost ~]# mount -o remount,rw /mnt/foo
mount: block device /dev/mapper/mpath0 is write-protected, mounting read-only
呜呜呜。
我已经尝试了各种不同的 mount/tune2fs/dmsetup 命令,但我无法弄清楚如何让它将块设备取消标记为写保护。重新启动将修复它,但我更愿意在线进行。一个小时的谷歌搜索也让我无处可去。救救我ServerFault。