cagenut提出的问题 -server

cagenut

Asked: 2010-03-19 14:41:40 +0800 CST

从磁盘错误以只读方式挂载后，如何重新挂载 ext3 fs 读写？

当 SAN 中出现问题时，ext3 检测磁盘写入错误并以只读方式重新挂载文件系统时，这是一个相对常见的问题。一切都很好，只有当 SAN 修复时，我无法弄清楚如何在不重新启动的情况下重新安装文件系统读写。

看哪：

[root@localhost ~]# multipath -ll
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
\_ 1:0:0:1 sdb 8:16  [active][ready]
\_ 2:0:0:1 sdc 8:32  [active][ready]
[root@localhost ~]# mount /dev/mapper/mpath0 /mnt/foo
[root@localhost ~]# touch /mnt/foo/blah

一切都好，现在我从它下面拉出 LUN。

[root@localhost ~]# touch /mnt/foo/blah
[root@localhost ~]# touch /mnt/foo/blah
touch: cannot touch `/mnt/foo/blah': Read-only file system
[root@localhost ~]# tail /var/log/messages
Mar 18 13:17:33 localhost multipathd: sdb: tur checker reports path is down
Mar 18 13:17:34 localhost multipathd: sdc: tur checker reports path is down
Mar 18 13:17:35 localhost kernel: Aborting journal on device dm-2.
Mar 18 13:17:35 localhost kernel: Buffer I/O error on device dm-2, logical block 1545
Mar 18 13:17:35 localhost kernel: lost page write due to I/O error on dm-2
Mar 18 13:17:36 localhost kernel: ext3_abort called.
Mar 18 13:17:36 localhost kernel: EXT3-fs error (device dm-2): ext3_journal_start_sb:   Detected aborted journal                      
Mar 18 13:17:36 localhost kernel: Remounting filesystem read-only

它只认为它是只读的，实际上它甚至不存在。

[root@localhost ~]# multipath -ll
sdb: checker msg is "tur checker reports path is down"
sdc: checker msg is "tur checker reports path is down"
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16  [failed][faulty]
 \_ 2:0:0:1 sdc 8:32  [failed][faulty]
[root@localhost ~]# ll /mnt/foo/
ls: reading directory /mnt/foo/: Input/output error
total 20
-rw-r--r-- 1 root root     0 Mar 18 13:11 bar

它怎么还记得那个'bar'文件在那里......神秘，但现在并不重要。现在我重新介绍 LUN：

[root@localhost ~]# tail /var/log/messages
Mar 18 13:23:58 localhost multipathd: sdb: tur checker reports path is up
Mar 18 13:23:58 localhost multipathd: 8:16: reinstated
Mar 18 13:23:58 localhost multipathd: mpath0: queue_if_no_path enabled
Mar 18 13:23:58 localhost multipathd: mpath0: Recovered to normal mode
Mar 18 13:23:58 localhost multipathd: mpath0: remaining active paths: 1
Mar 18 13:23:58 localhost multipathd: dm-2: add map (uevent)
Mar 18 13:23:58 localhost multipathd: dm-2: devmap already registered
Mar 18 13:23:59 localhost multipathd: sdc: tur checker reports path is up
Mar 18 13:23:59 localhost multipathd: 8:32: reinstated
Mar 18 13:23:59 localhost multipathd: mpath0: remaining active paths: 2
Mar 18 13:23:59 localhost multipathd: dm-2: add map (uevent)
Mar 18 13:23:59 localhost multipathd: dm-2: devmap already registered
[root@localhost ~]# multipath -ll
mpath0 (36001f93000a310000299000200000000) dm-2 XIOTECH,ISE1400
[size=1.1T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][enabled]
 \_ 1:0:0:1 sdb 8:16  [active][ready]
 \_ 2:0:0:1 sdc 8:32  [active][ready]

很棒吧？它在那里说[rw]。没那么快：

[root@localhost ~]# touch /mnt/foo/blah
touch: cannot touch `/mnt/foo/blah': Read-only file system

好的，不会自动执行，我会稍微推动一下：

[root@localhost ~]# mount -o remount /mnt/foo
mount: block device /dev/mapper/mpath0 is write-protected, mounting read-only

你真是个鬼：

[root@localhost ~]# mount -o remount,rw /mnt/foo
mount: block device /dev/mapper/mpath0 is write-protected, mounting read-only

呜呜呜。

我已经尝试了各种不同的 mount/tune2fs/dmsetup 命令，但我无法弄清楚如何让它将块设备取消标记为写保护。重新启动将修复它，但我更愿意在线进行。一个小时的谷歌搜索也让我无处可去。救救我ServerFault。

cagenut

Asked: 2010-02-06 10:20:53 +0800 CST

如何“修复” device-mapper-multipath 中的错误路径

我有一个正在工作但现在显示“错误”路径的多路径配置：

[root@nas ~]# multipath -ll
sdd: checker msg is "readsector0 checker reports path is down"
mpath1 (36001f93000a63000019f000200000000) dm-2 XIOTECH,ISE1400
[size=200G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 1:0:0:1 sdb 8:16  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:1 sdd 8:48  [active][faulty]

与此同时，我一遍又一遍地看到这三行/var/log/messages

Feb  5 12:52:57 nas kernel: sd 2:0:0:1: SCSI error: return code = 0x00010000
Feb  5 12:52:57 nas kernel: end_request: I/O error, dev sdd, sector 0
Feb  5 12:52:57 nas kernel: Buffer I/O error on device sdd, logical block 0

这条线也经常出现

Feb  5 12:52:58 nas multipathd: sdd: readsector0 checker reports path is down

我不明白的一件事是为什么当我的文件说要使用时它使用readsector0检查方法/etc/multipath.conftur

[root@nas ~]# tail -n15 /etc/multipath.conf

devices {
        device {
                vendor                  "XIOTECH "
                product                 "ISE1400         "
                path_grouping_policy    multibus
                getuid_callout          "/sbin/scsi_id -g -u -d /dev/%n"
                path_checker            tur
                prio_callout              "none"
                path_selector           "round-robin 0"
                failback                    immediate
                no_path_retry           12
                user_friendly_names yes
        }
}

在这里查看上游文档，这一段似乎很相关： http ://christophe.varoqui.free.fr/usage.html

For each path:

\_ host:channel:id:lun devnode major:minor [path_status][dm_status_if_known]

The dm status (dm_status_if_known) is like the path status
(path_status), but from the kernel's point of view. The dm status has two
states: "failed", which is analogous to "faulty", and "active" which
covers all other path states. Occasionally, the path state and the 
dm state of a device will temporarily not agree.

对我来说已经超过 24 小时，所以它不是暂时的。

因此，以所有这些作为背景，我的问题是
- 我如何确定这里的根本原因？
- 我如何手动/命令行执行它所做的任何检查
- 为什么它忽略我的 multipath.conf（我做错了吗？）

提前感谢您的任何想法，如果还有其他我可以提供的信息，请在评论中告诉我，我会将其编辑到帖子中。

从磁盘错误以只读方式挂载后，如何重新挂载 ext3 fs 读写？

如何“修复” device-mapper-multipath 中的错误路径

新安装后 postgres 的默认超级用户用户名/密码是什么？

SFTP 使用什么端口？

命令行列出 Windows Active Directory 组中的用户？

什么是 Pem 文件，它与其他 OpenSSL 生成的密钥文件格式有何不同？

如何确定bash变量是否为空？

cagenut's questions

从磁盘错误以只读方式挂载后，如何重新挂载 ext3 fs 读写？

如何“修复” device-mapper-multipath 中的错误路径

新安装后 postgres 的默认超级用户用户名/密码是什么？

SFTP 使用什么端口？

命令行列出 Windows Active Directory 组中的用户？

什么是 Pem 文件，它与其他 OpenSSL 生成的密钥文件格式有何不同？

如何确定bash变量是否为空？