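My company has a server that is part of the production environment. An ActiveMQ server is running on it. I logged in to the ActiveMQ UI and tried to create a new queue. When I did, I got this message: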
HTTP ERROR: 500
/workspace/development/org/apache/activemq/5.1.0/data/kr-store/data/data-container-roots-2 (Read-only file system)
RequestURI=/admin/createDestination.action
Caused by:
java.io.FileNotFoundException: /workspace/development/org/apache/activemq/5.1.0/data/kr-store/data/data-container-roots-2 (Read-only file system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.apache.activemq.kaha.impl.data.DataFile.getRandomAccessFile(DataFile.java:51)
at org.apache.activemq.kaha.impl.data.SyncDataFileWriter.storeItem(SyncDataFileWriter.java:71)
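I am aware of the "file not found" message, but it doesn't seem to be directly related to the problem.
To troubleshoot, I logged in to the server and ran a few tests, and found that some basic commands fail with the same error: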
[root@ctrl3 kr-store]# touch 1
touch: cannot touch `1': Read-only file system
[root@ctrl3 /]# chgrp users /workspace
chgrp: changing group of `/workspace': Read-only file system
[root@ctrl3 kr-store]# chown peeradmin.users /workspace
chown: changing ownership of `/workspace': Read-only file system
[root@ctrl3 kr-store]# ls -ld data
drwxrwxr-x 2 peeradmin users 4096 Aug 12 12:27 data
[root@ctrl3 kr-store]# chmod o+w data/
chmod: changing permissions of `data/': Read-only file system
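If I remember correctly, the last time we hit an error like this it turned out to be a disk I/O problem. If that is not the case here, what else could it be?
Edit #1: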
[root@ctrl3 kr-store]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 ro,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda7 /tmp ext3 rw,data=ordered 0 0
/dev/VolGroup00/LogVol00 /workspace ext3 ro,data=ordered 0 0
/dev/sda5 /usr ext3 rw,data=ordered 0 0
/dev/sda3 /var ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/etc/auto.misc /misc autofs rw,fd=7,pgrp=3795,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=13,pgrp=3795,timeout=300,minproto=5,maxproto=5,indirect 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1 /nfs1 nfs rw,noatime,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1/NIS/home /home nfs rw,noatime,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1 /nfs1 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1/NIS/home /home nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
Sven: the logs don't say anything:
[root@ctrl3 kr-store]# cat /var/log/messages |grep -v [xinetd\|snmpd]
[root@ctrl3 kr-store]#
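Also, if nothing can be written to the disk, I guess the logs can't be updated either.
Edit #2: So it looks like the filesystem has been corrupted somehow... am I right?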
SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 1:0:0:0: SCSI error: return code = 0x06000000
end_request: I/O error, dev sdb, sector 745962211
printk: 215 messages suppressed.
Buffer I/O error on device dm-0, logical block 51773423
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 51773424
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 51773425
lost page write due to I/O error on dm-0
Thanks in advance,
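Your filesystem appears to be mounted read-only. You can verify this with
cat /proc/mounts
A filesystem that has been remounted read-only usually got that way because of filesystem errors. The underlying cause may be a failing hard disk, so you should check your disks (SMART values, controller status in the case of hardware RAID, and so on).
Edit #1:
Your mount table shows that / and /workspace are indeed mounted read-only (ro).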
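To pick the read-only entries out of a long mount table, something like the following sketch may help. The function name list_ro_mounts is made up for illustration; it only assumes the usual /proc/mounts layout (device, mount point, filesystem type, options) and matches "ro" as a whole option, so an option such as errors=remount-ro is not counted.

```shell
# Print "device mountpoint fstype" for every entry mounted read-only.
# $1: path to a mounts table in /proc/mounts format (default: /proc/mounts)
list_ro_mounts() {
    awk '{
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2, $3; break }
    }' "${1:-/proc/mounts}"
}
```

Called without arguments it reads /proc/mounts directly; given the table from the question, it would list the /dev/root and /dev/VolGroup00/LogVol00 entries.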
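You can try to remount the volume read-write, but I would not recommend doing so until you have found out why it was remounted read-only; otherwise you risk losing data.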
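For reference, a remount is normally requested with mount -o remount,rw on the affected mount point. The wrapper below is only a cautious sketch: the name remount_rw and the DRY_RUN variable are invented here, and by default it just prints the command instead of running it, given the data-loss warning above. With an aborted journal the kernel may refuse the remount anyway.

```shell
# Print (or, with DRY_RUN=0, run as root) the remount command for one mount point.
# DRY_RUN defaults to 1, so nothing is changed unless explicitly requested.
remount_rw() {
    mountpoint=${1:-}
    if [ -z "$mountpoint" ]; then
        echo "usage: remount_rw <mountpoint>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mount -o remount,rw $mountpoint"
    else
        mount -o remount,rw "$mountpoint"
    fi
}
```

For the mount table in the question the call would be remount_rw /workspace, which in dry-run mode only echoes the command it would execute.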
In any case, you should first check the output of
dmesg
and the disk's SMART data via smartctl.
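Kernel output can be long, so a small filter may make the relevant lines easier to spot. This is a rough sketch: scan_kernel_log is a hypothetical helper, and the pattern list is just a guess at typical ext3 and block-layer messages, not an exhaustive set.

```shell
# Print kernel log lines that hint at disk or filesystem trouble.
# Reads the file(s) given as arguments, or stdin when none are given.
scan_kernel_log() {
    grep -E 'I/O error|SCSI error|Remounting filesystem read-only|aborted journal|ext3_abort' "$@"
}
```

Typical use would be dmesg | scan_kernel_log. For the SMART side, smartctl -H on the suspect device prints the overall health verdict and smartctl -a prints the full attribute and error-log report; which device to pass depends on what the kernel messages name.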
Edit #2:
sdb appears to be the physical problem here: your kernel output reports I/O errors on dev sdb.
Check the output of