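I have searched and searched for others with the same problem, but every question here seems to be about an entire RAID disappearing after a reboot, whereas I only have a problem with one member drive.
This is a video production machine, and last week (after upgrading from CentOS 7 to Rocky 8) we noticed visual artifacts during video playback. All the videos are stored on the attached RAID.
It is a RAID 60: two RAID6 arrays of 12 x 1.2TB drives each, which are then combined into a RAID0. It was set up by an outside company long before I started working here, but it has been rock solid in my experience.
While investigating the visual artifacts, I found that, according to mdadm, one drive in one of the RAID6 arrays was marked as "removed". The RAID still works without that drive, as you would expect from RAID6, but I suspect it is related to the artifacts we were seeing. smartctl showed that the drive in question was failing, so we ordered a new one.
It arrived this morning, and from there I followed the instructions on redhat.com to the letter. It took nearly three hours to rebuild the RAID with the new drive, but it appeared to work: the RAID came back and I saw no artifacts.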
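However, I then rebooted the machine and we were back to square one. It is exactly as it was when we started, with one of the RAID6 arrays showing a removed drive. Also, when I look at the disks, I can see that the problem drive (/dev/sdc) has lost its partition, or at least it shows "1.2TB free space" rather than "1.2TB Linux RAID member". I thought (hoped) it might be a fluke, so tonight I went through the whole process again, and exactly the same thing happened. The only thing I did differently the second time was to create a file at /etc/mdadm.conf using
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
as su, but it seemed to make no difference. I have now cleared the file in order to start afresh.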
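I can't for the life of me figure out what is going on. I'm fairly competent with Linux, but before this week I didn't even know mdadm existed, so I have been trying to learn on the fly. This production machine needs to be back up and running on Tuesday, so I'm up against it! I'll rebuild the RAID again overnight and start afresh tomorrow. Below is all the output I think you might need, but if there is anything else I can provide, please let me know.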
Output of cat /proc/mdstat:
Personalities : [raid6] [raid5] [raid4] [raid0]
md103 : active raid0 md101[0] md102[1]
23439351808 blocks super 1.2 512k chunks
md102 : active raid6 sdu[6] sdz[11] sdx[9] sdw[8] sdy[10] sdq[2] sdt[5] sdr[3] sdv[7] sds[4] sdo[0] sdp[1]
11719808000 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
bitmap: 0/9 pages [0KB], 65536KB chunk
md101 : active raid6 sdk[8] sdh[5] sdg[4] sdl[9] sdf[3] sdi[6] sdj[7] sde[2] sdd[1] sdm[10] sdn[11]
11719808000 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [_UUUUUUUUUUU]
bitmap: 1/9 pages [4KB], 65536KB chunk
Output of mdadm --detail for the problem RAID6:
/dev/md101:
Version : 1.2
Creation Time : Tue Jun 8 17:37:23 2021
Raid Level : raid6
Array Size : 11719808000 (10.91 TiB 12.00 TB)
Used Dev Size : 1171980800 (1117.69 GiB 1200.11 GB)
Raid Devices : 12
Total Devices : 11
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Jan 12 19:20:16 2024
State : clean, degraded
Active Devices : 11
Working Devices : 11
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : grade1:101
UUID : 56d9ee6d:3a9ef416:91d3b7ec:0da562b0
Events : 1036527
Number Major Minor RaidDevice State
- 0 0 0 removed
1 8 48 1 active sync /dev/sdd
2 8 64 2 active sync /dev/sde
3 8 80 3 active sync /dev/sdf
4 8 96 4 active sync /dev/sdg
5 8 112 5 active sync /dev/sdh
6 8 128 6 active sync /dev/sdi
7 8 144 7 active sync /dev/sdj
8 8 160 8 active sync /dev/sdk
9 8 176 9 active sync /dev/sdl
10 8 192 10 active sync /dev/sdm
11 8 208 11 active sync /dev/sdn
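This might be overkill, but here is the output of fdisk -l. It is long because there are so many drives. sda and sdb are the OS drives, and the problem drive /dev/sdc looks different because I have already run sgdisk on it, per the redhat.com instructions, in preparation for re-adding it: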
Disk /dev/sda: 894.3 GiB, 960197124096 bytes, 1875385008 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 18255C5C-FE0C-4ADB-9D13-52560809D652
Device Start End Sectors Size Type
/dev/sda1 2048 1230847 1228800 600M EFI System
/dev/sda2 1230848 3327999 2097152 1G Linux filesystem
/dev/sda3 3328000 1875384319 1872056320 892.7G Linux LVM
Disk /dev/sdb: 894.3 GiB, 960197124096 bytes, 1875385008 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D2D8699C-C29B-4C34-B126-3667FA7B794A
Device Start End Sectors Size Type
/dev/sdb1 2048 1875384319 1875382272 894.3G Linux LVM
Disk /dev/mapper/rl-root: 70 GiB, 75161927680 bytes, 146800640 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/mapper/rl-swap: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdd: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdc: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: B9AB730B-09DD-44FF-BD9E-79502FB2CF5E
Disk /dev/sdh: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdg: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sde: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdl: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdi: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdm: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdo: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdp: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdk: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdq: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdf: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdn: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdr: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdj: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sds: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdt: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdu: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdv: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdw: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdx: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdy: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdz: 1.1 TiB, 1200243695616 bytes, 2344225968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/mapper/rl-home: 1.7 TiB, 1839227469824 bytes, 3592241152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/md101: 10.9 TiB, 12001083392000 bytes, 23439616000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 5242880 bytes
Disk /dev/md102: 10.9 TiB, 12001083392000 bytes, 23439616000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 5242880 bytes
Disk /dev/md103: 21.8 TiB, 24001896251392 bytes, 46878703616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 5242880 bytes
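Something is wiping your mdadm metadata. It probably sees the GPT backup header at the end of the disk and helpfully "repairs" the GPT by rewriting it at the start of the disk, wiping the mdadm metadata in the process.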
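This is a typical problem when using whole drives, rather than partitions, for RAID, LUKS, filesystems and so on. It works fine until something goes wrong, because many programs try to help you partition your drives. This can kick not just one drive out of the array, but all of them...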
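To illustrate why a rewritten primary GPT and an md superblock cannot coexist on a whole-disk member, here is a small arithmetic sketch. It assumes 512-byte logical sectors (as in the fdisk output above), the usual GPT layout of 128 partition entries of 128 bytes each starting at LBA 2, and the version 1.2 md superblock position of 4 KiB from the start of the device. Whether the "repairing" tool rewrote the whole entry array in this particular case is an assumption.

```shell
#!/bin/sh
# Byte offsets at the start of a disk, assuming 512-byte logical sectors.
SECTOR=512
GPT_HEADER_START=$((1 * SECTOR))    # primary GPT header lives in LBA 1
GPT_ENTRIES_START=$((2 * SECTOR))   # partition entry array usually starts at LBA 2
# 128 entries x 128 bytes each is the usual default size of the entry array.
GPT_ENTRIES_END=$((GPT_ENTRIES_START + 128 * 128 - 1))
MD_SB_START=4096                    # md v1.2 superblock: 4 KiB from device start

echo "GPT header starts at byte $GPT_HEADER_START"
echo "GPT entries span bytes $GPT_ENTRIES_START-$GPT_ENTRIES_END"
echo "md 1.2 superblock starts at byte $MD_SB_START"

if [ "$MD_SB_START" -ge "$GPT_ENTRIES_START" ] && [ "$MD_SB_START" -le "$GPT_ENTRIES_END" ]; then
    echo "overlap: writing a full primary GPT would clobber the md superblock"
else
    echo "no overlap"
fi
```

Under these assumptions the superblock sits inside the region a primary GPT occupies, which would match the symptom of the drive losing its RAID member signature after a reboot.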
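You could try wiping the GPT partition table headers (at the start and end of the disk) with
wipefs
and then hope that nothing tries to write a new partition table again. I prefer to have a partition table and use partitions instead of whole disks (like the setup described in the tutorial you linked). It is more standard and less prone to this kind of accident, because most software knows to leave partitions alone, which is not the case for "unpartitioned" drives.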
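A minimal sketch of what that wipefs step might look like, assuming the affected disk is /dev/sdc and is not currently an active member of the array. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of wipefs: by default the script only prints the commands. The type names gpt and PMBR are the ones I would expect wipefs to report for a GPT disk; confirm them with the read-only listing first.

```shell
#!/bin/sh
# DISK is an assumption: replace it with the device that keeps dropping out.
DISK="${DISK:-/dev/sdc}"

# Hypothetical helper: print each command, and only execute it
# when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# 1. List the signatures wipefs can see (read-only).
run wipefs "$DISK"
# 2. Show what would be erased, without writing anything.
run wipefs --no-act --all --types gpt,PMBR "$DISK"
# 3. Erase only the GPT and protective-MBR signatures,
#    saving a backup of the erased bytes under $HOME.
run wipefs --all --backup --types gpt,PMBR "$DISK"
```

Limiting the erase with --types is a deliberate choice here: a plain --all would also remove a linux_raid_member signature if one is present.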
But in your case that would involve migrating the entire setup, which is probably not what you want either.
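Make backups, in particular metadata/header backups that include the drive serial numbers, so that you know how to assign the drives later. This may help you recover if you run into a similar partition table accident in the future.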
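One way such a metadata backup could be collected is sketched below. The output directory, the run helper and the DRY_RUN variable are hypothetical, and the device range /dev/sd[c-z] is taken from the listings in the question. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# OUT is a hypothetical destination; keep the result somewhere off the array.
OUT="${OUT:-/root/md-metadata-backup}"

# Hypothetical helper: print each command, execute only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

backup_plan() {
    run mkdir -p "$OUT"
    # Map kernel names (sdc, sdd, ...) to serial numbers and WWNs.
    run sh -c "lsblk -o NAME,SIZE,SERIAL,WWN,MODEL > '$OUT/lsblk.txt'"
    # Array-level view: UUIDs, levels and member devices.
    run sh -c "mdadm --detail --scan --verbose > '$OUT/mdadm-detail-scan.txt'"
    # Per-member superblocks: role, data offset, event count.
    run sh -c "mdadm --examine /dev/sd[c-z] /dev/md101 /dev/md102 > '$OUT/mdadm-examine.txt' 2>&1"
    run sh -c "cat /proc/mdstat > '$OUT/mdstat.txt'"
}

backup_plan
```

Keeping the serial numbers next to the mdadm --examine output matters because the sdX names can change between boots, while the serial numbers do not.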