#!/bin/sh
# sleepAwhile: gives udevd and mdadm time to settle before the root file system is mounted.
# This is the sleep script installed as /usr/share/initramfs-tools/scripts/local-premount/sleepAwhile in section 6.1 below.
echo
echo "sleeping for 30 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 25 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 20 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 15 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 10 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 5 seconds while udevd and mdadm settle down"
sleep 5
echo "done sleeping"
Update: I have verified that the description below also works for Ubuntu 16.04. Other users have reported that it works on 17.10 and 18.04.1.
Note: This HOWTO will not give you LVM. If you want LVM too, try Install Ubuntu 18.04 desktop with RAID 1 and LVM on machine with UEFI BIOS instead.
After several days of trying, I now have a working system! In brief, the solution consists of the steps outlined in the detailed HOWTO below.
A key component of step 6 of the solution is a delay in the boot sequence; without it, I was dumped straight at the GRUB prompt (no keyboard!) whenever either SSD was missing.
Detailed HOWTO
1. Boot up
Boot using EFI from the USB stick. Exactly how varies from system to system. Select Try ubuntu without installing.
Start a terminal emulator, e.g. xterm, to run the commands below.
1.1 Log in from another computer
While trying this out, I often found it easier to log in from another, already fully configured computer. That simplifies cut-and-paste of commands, etc. If you want to do the same, you can log in via ssh by doing the following:
On the computer to be configured, install the openssh server:
Change the password. The default password for the user ubuntu is empty. You can pick a medium-strength password; it will be forgotten as soon as you reboot your new computer. Now you can log in to the ubuntu live session from another computer. The instructions below are for Linux:
If you get a warning about a suspected man-in-the-middle attack, you need to clear the ssh keys used to identify the new computer. This is because openssh-server generates new server keys whenever it is installed. The command to use is typically printed for you and should look like the one in the sketch below. After executing that command, you should be able to log in to the ubuntu live session.
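A rough sketch of this step; the IP address 192.168.1.100 is only a placeholder for the live session's address, and the exact original commands may have differed:
# On the new computer (the Ubuntu live session):
sudo apt-get update
sudo apt-get install -y openssh-server
sudo passwd ubuntu        # set a temporary password for the default 'ubuntu' user
# On the other, already configured computer:
ssh ubuntu@192.168.1.100  # replace with the live session's IP address
# If ssh warns about a changed host key (suspected man-in-the-middle attack),
# remove the stale key; ssh usually prints the exact command, typically:
ssh-keygen -f "$HOME/.ssh/known_hosts" -R 192.168.1.100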
2. Partition the disks
Clear all old partitions and boot blocks. Warning! This will destroy the data on your disks!
Create new partitions on the smallest of your drives: 100M for the ESP, 32G for RAID SWAP, and the rest for RAID root. If your sda drive is the smallest, follow Section 2.1, otherwise follow Section 2.2.
2.1 Create the partition tables (/dev/sda is smaller)
Perform the following steps:
Copy the partition table to the other disk and regenerate unique UUIDs (this will actually regenerate the UUIDs for sda).
2.2 Create the partition tables (/dev/sdb is smaller)
Perform the following steps:
Copy the partition table to the other disk and regenerate unique UUIDs (this will actually regenerate the UUIDs for sdb).
2.3 Create a FAT32 file system on /dev/sda
Create a FAT32 file system for the EFI partition (see the sketch below).
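A sketch of sections 2.1 and 2.3 using sgdisk, assuming /dev/sda is the smaller drive as in section 2.1 (for section 2.2, swap sda and sdb). The sizes and type codes follow the text above; the exact original commands may have differed:
# WARNING: this destroys all data on both disks!
sudo sgdisk -Z /dev/sda            # zap old partition tables and boot blocks
sudo sgdisk -Z /dev/sdb
# 100M ESP, 32G RAID swap, rest RAID root on the smaller drive
sudo sgdisk -n 1:0:+100M -t 1:ef00 -c 1:"EFI system partition" /dev/sda
sudo sgdisk -n 2:0:+32G  -t 2:fd00 -c 2:"Linux RAID swap" /dev/sda
sudo sgdisk -n 3:0:0     -t 3:fd00 -c 3:"Linux RAID root" /dev/sda
# Copy the partition table to the other disk; -G then regenerates the UUIDs
# on the main device (sda, as noted above), so both disks end up unique
sudo sgdisk /dev/sda -R /dev/sdb -G
# FAT32 file system for the EFI partition
sudo mkfs.fat -F 32 /dev/sda1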
3. Install the missing packages
The Ubuntu Live CD lacks two key packages, grub-efi and mdadm. Install them. (I am not 100% sure grub-efi is needed here, but to keep the symmetry with the coming installation, bring it in as well.)
If secure boot is enabled, you may need grub-efi-amd64-signed instead of grub-efi-amd64. (See Alecz's comment.)
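A minimal sketch of this step (swap in grub-efi-amd64-signed under secure boot, as noted above):
sudo apt-get update
sudo apt-get install -y mdadm grub-efi-amd64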
4. Create the RAID partitions
Create the RAID devices in degraded mode; they will be completed later. Creating a complete RAID1 did sometimes give me problems during the ubiquity installation below, I am not sure why. (Mount/unmount? Format?) Verify the RAID status.
Partition the md devices. (A sketch of this section follows.)
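One way to do this, assuming the partition layout from section 2 (partition 2 for swap, partition 3 for root) and keeping the sdb halves marked as missing for now; the exact original commands are not shown here:
# Degraded arrays; the sdb partitions are attached after the installer has run
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 missing
sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda3 missing
# Verify the RAID status
cat /proc/mdstat
# Partition the md devices: one partition spanning each array
sudo sgdisk -Z /dev/md0
sudo sgdisk -n 1:0:0 -t 1:8200 /dev/md0   # becomes md0p1 (swap)
sudo sgdisk -Z /dev/md1
sudo sgdisk -n 1:0:0 -t 1:8300 /dev/md1   # becomes md1p1 (root)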
5. Run the installer
Run the ubiquity installer, excluding the boot loader that would fail anyway (see the sketch below). (Note: if you have logged in via ssh, you will probably want to do this on your new computer instead.)
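Presumably the installer is started along these lines; the -b flag tells ubiquity to skip the boot loader installation (an assumption about the exact invocation):
ubiquity -b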
Choose Something else as the installation type and modify md1p1 to type ext4, format: yes, mount point /. The md0p1 partition will automatically be selected as swap. Have a cup of coffee while the installation finishes.
Important: once the installation has finished, select Continue testing, since the system is not yet ready to boot.
Complete the RAID devices
Attach the waiting sdb partitions to the RAID (see the sketch below).
Verify that all RAID devices are okay (and optionally sync'ing).
The procedure below may continue during the sync, including reboots.
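A sketch of completing the arrays, using the same partition layout as above:
sudo mdadm --add /dev/md0 /dev/sdb2   # second half of the swap array
sudo mdadm --add /dev/md1 /dev/sdb3   # second half of the root array
# Both arrays should eventually show [UU]; [U_] or [_U] means they are still rebuilding
cat /proc/mdstat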
6. Configure the installed system
Set things up to enable a chroot into the installed system (a sketch follows).
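A sketch of the chroot setup, assuming the root file system on /dev/md1p1 and the mount point /mnt (the mount point name is arbitrary):
sudo mount /dev/md1p1 /mnt
sudo mount -o bind /dev     /mnt/dev
sudo mount -o bind /dev/pts /mnt/dev/pts
sudo mount -o bind /sys     /mnt/sys
sudo mount -o bind /proc    /mnt/proc
sudo cp /etc/resolv.conf /mnt/etc/   # optional, so that name resolution works inside the chroot
sudo chroot /mnt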
Configure and install the packages.
If your md devices are still sync'ing, you may see occasional warnings. This is normal and can be ignored (see the answer at the bottom of this question).
Disabling quick_boot will avoid the Diskfilter writes are not supported error. Disabling quiet_boot is a matter of personal preference only.
Modify /etc/mdadm/mdadm.conf to remove any label references, i.e. remove the name=ubuntu:0/1 references from the ARRAY lines. This step may be unnecessary, but I have seen some pages suggest that the naming scheme could be unstable (name=ubuntu:0/1) and that this might stop a perfectly fine RAID device from assembling during boot.
Modify the lines in /etc/default/grub accordingly (a sketch of these edits follows). Again, this step may be unnecessary, but I prefer to boot with my eyes open...
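A sketch of these edits, under the assumption that quick_boot and quiet_boot are the variables in /etc/grub.d/10_linux on these Ubuntu releases and that booting "with my eyes open" means dropping quiet splash from /etc/default/grub; the original edits may have differed:
# Inside the chroot: disable quick_boot (avoids "Diskfilter writes are not supported")
# and quiet_boot (personal preference)
sed -i 's/quick_boot="1"/quick_boot="0"/' /etc/grub.d/10_linux
sed -i 's/quiet_boot="1"/quiet_boot="0"/' /etc/grub.d/10_linux
# Show kernel messages during boot instead of the splash screen
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"/GRUB_CMDLINE_LINUX_DEFAULT=""/' /etc/default/grub
update-grub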
6.1. Add the sleep script
(It has been suggested by the community that this step may be unnecessary and could be replaced by using GRUB_CMDLINE_LINUX="rootdelay=30" in /etc/default/grub. For reasons explained near the bottom of this HOWTO, I suggest sticking to the sleep script even though it is uglier than using rootdelay. So, we continue with our regular programme...)
Create the script that will wait for the RAID devices to settle. Without this delay, mounting root may fail because the RAID assembly is not finished in time. I had a hard time finding this out: the problem did not show up until after I had disconnected one of the SSDs to simulate a disk failure! The timing may need to be adjusted depending on the hardware present, e.g. slow external USB disks, etc.
Enter the code shown at the top of this HOWTO into /usr/share/initramfs-tools/scripts/local-premount/sleepAwhile. Make the script executable and install it (see the sketch below).
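Still inside the chroot, a sketch of making the script executable and baking it into the initramfs:
chmod a+x /usr/share/initramfs-tools/scripts/local-premount/sleepAwhile
update-initramfs -u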
7. Enable boot from the first SSD
The system is almost ready now; only the UEFI boot parameters need to be installed:
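A sketch of this step from inside the chroot; the bootloader id Ubuntu matches the EFI/Ubuntu directory mentioned below, and grub-install normally creates the corresponding NVRAM boot entry itself (the original commands may have differed):
mount /dev/sda1 /boot/efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=Ubuntu
update-grub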
This will install the boot loader in /boot/efi/EFI/Ubuntu (a.k.a. EFI/Ubuntu on /dev/sda1) and put it first in the UEFI boot chain of the computer.
8. Enable boot from the second SSD
We are almost done. At this point, we should be able to reboot on the sda drive. Furthermore, mdadm should be able to handle a failure of either the sda or the sdb drive. However, the EFI is not RAIDed, so we need to clone it (see the sketch below).
In addition to installing the boot loader on the second drive, this will make the UUID of the FAT32 file system on the sdb1 partition (as reported by blkid) match that of sda1 and /etc/fstab. (Note, however, that the UUIDs of the /dev/sda1 and /dev/sdb1 partitions are still different; compare ls -la /dev/disk/by-partuuid | grep sd[ab]1 with blkid /dev/sd[ab]1 after the installation to check for yourself.)
Finally, we must insert the sdb1 partition into the boot order. (Note: this step may be unnecessary, depending on your BIOS. I have received reports that some BIOSes automatically generate a list of valid ESPs.) I have not tested it, but unique labels are probably needed for the ESPs on sda and sdb.
This will produce a printout of the current boot order. Note that Ubuntu #2 (sdb) and Ubuntu (sda) are first in the boot order.
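A sketch of this section; the label "Ubuntu #2" and the loader path \EFI\Ubuntu\grubx64.efi are assumptions based on the names used above:
# Clone the ESP; this also copies the FAT32 file system UUID referenced in /etc/fstab
sudo dd if=/dev/sda1 of=/dev/sdb1
# Insert the clone into the boot order, with a label distinct from the sda entry
sudo efibootmgr -c -g -d /dev/sdb -p 1 -w -L "Ubuntu #2" -l '\EFI\Ubuntu\grubx64.efi'
# Print the resulting boot order
sudo efibootmgr -v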
Reboot
Now we are ready to reboot.
The system should now reboot into Ubuntu (you may have to remove the Ubuntu Live installation media first).
After booting, you can run the following to attach the Windows boot loader to the grub boot chain.
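Presumably this is the usual update-grub pass, which lets os-prober pick up the Windows boot loader (an assumption about the exact command):
sudo update-grub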
Virtual machine gotchas
If you want to try this out in a virtual machine first, there are some caveats: apparently, the NVRAM that holds the UEFI information is remembered between reboots, but not between shutdown-restart cycles. In that case, you may end up at the UEFI Shell console.
The following commands should boot you into your machine from /dev/sda1 (use FS1: for /dev/sdb1):
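A sketch of what to type at the UEFI Shell prompt, assuming the loader path EFI/Ubuntu/grubx64.efi used earlier; FS0: is typically the first ESP the firmware found, but the exact mapping and path are assumptions:
FS0:
\EFI\Ubuntu\grubx64.efi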
The first solution in the top answer of UEFI boot in virtualbox - Ubuntu 12.04 may also be helpful.
Simulate a disk failure
Failure of either RAID component device can be simulated using mdadm. However, to verify that the boot setup would survive a disk failure, I had to shut down the computer and disconnect power from a disk. If you do so, first ensure that the md devices are sync'ed.
In the instructions below, sdX is the failed device (X=a or b) and sdY is the ok device.
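For the pure mdadm simulation, a sketch using the partition layout from this HOWTO (partitions 2 and 3 of the failed disk are the RAID members):
sudo mdadm /dev/md0 --fail /dev/sdX2 --remove /dev/sdX2
sudo mdadm /dev/md1 --fail /dev/sdX3 --remove /dev/sdX3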
Disconnect a drive
Shut down the computer. Disconnect a drive. Restart. Ubuntu should now boot with the RAID drives in degraded mode. (Celebrate! This is what you were trying to achieve! ;)
Recover from a failed disk
This is the process to follow if you have needed to replace a faulty disk. If you want to emulate a replacement, you may boot into an Ubuntu Live session and use dd if=/dev/zero of=/dev/sdX to wipe the disk clean before rebooting into the real system. If you just tested the boot/RAID redundancy in the section above, you can skip this step. However, you must at least perform steps 2 and 4 below to recover full boot/RAID redundancy for your system.
Restoring the RAID+boot system after a disk replacement requires the following steps:
1. Partition the new drive
Copy the partition table from the healthy drive:
Re-randomize UUIDs on the new drive.
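A sketch with sgdisk; sdY is the healthy drive and sdX the replacement, as defined above:
sudo sgdisk /dev/sdY -R /dev/sdX   # copy the partition table from the healthy drive
sudo sgdisk -G /dev/sdX            # re-randomize the UUIDs on the new drive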
2. Add to md devices
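The commands for this step would be along the lines of the sketch below, re-attaching partitions 2 and 3 of the new drive (an assumption following the layout used throughout this HOWTO):
sudo mdadm --add /dev/md0 /dev/sdX2
sudo mdadm --add /dev/md1 /dev/sdX3
# Watch the rebuild
cat /proc/mdstat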
3. Clone the boot partition
Clone the ESP from the healthy drive. (Careful, maybe do a dump-to-file of both ESPs first to enable recovery if you really screw it up.)
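A sketch; the dump-to-file copies are the suggested safety net, and the file names are arbitrary:
# Safety copies of both ESPs first (restore with dd in the other direction if needed)
sudo dd if=/dev/sdX1 of=/tmp/sdX1.esp.backup
sudo dd if=/dev/sdY1 of=/tmp/sdY1.esp.backup
# Clone the ESP from the healthy drive onto the replacement
sudo dd if=/dev/sdY1 of=/dev/sdX1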
4. Insert the newly revived disk into the boot order
Add an EFI record for the clone. Modify the -L label as required.
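A sketch; the label and loader path are the same assumptions used earlier in this HOWTO:
sudo efibootmgr -c -g -d /dev/sdX -p 1 -w -L "Ubuntu #2" -l '\EFI\Ubuntu\grubx64.efi'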
Now, rebooting the system should have it back to normal (the RAID devices may still be sync'ing)!
Why the sleep script?
It has been suggested by the community that adding a sleep script might be unnecessary and could be replaced by using GRUB_CMDLINE_LINUX="rootdelay=30" in /etc/default/grub followed by sudo update-grub. This suggestion is certainly cleaner and does work in a disk failure/replace scenario. However, there is a caveat...
I disconnected my second SSD and found out that with rootdelay=30, etc. instead of the sleep script:
1) The system does boot in degraded mode without the "failed" drive.
2) In non-degraded boot (both drives present), the boot time is reduced. The delay is only perceptible with the second drive missing.
1) and 2) sounded great until I re-added my second drive. At boot, the RAID array failed to assemble and left me at the initramfs prompt without knowing what to do. It might have been possible to salvage the situation by a) booting to the Ubuntu Live USB stick, b) installing mdadm and c) re-assembling the array manually, but...I messed up somewhere. Instead, when I re-ran this test with the sleep script (yes, I did start the HOWTO from the top for the nth time...), the system did boot. The arrays were in degraded mode and I could manually re-add the /dev/sdb[23] partitions without any extra USB stick. I don't know why the sleep script works whereas the rootdelay doesn't. Perhaps mdadm gets confused by two slightly out-of-sync component devices, but I thought mdadm was designed to handle that. Anyway, since the sleep script works, I'm sticking to it.
It could be argued that removing a perfectly healthy RAID component device, re-booting the RAID to degraded mode and then re-adding the component device is an unrealistic scenario: the realistic scenario is rather that one device fails and is replaced by a new one, leaving less opportunity for mdadm to get confused. I agree with that argument. However, I don't know how to test how the system tolerates a hardware failure except by actually disabling some hardware! And after testing, I want to get back to a redundant, working system. (Well, I could attach my second SSD to another machine and wipe it before I re-add it, but that's not feasible.)
In summary: to my knowledge, the rootdelay solution is clean, faster than the sleep script for non-degraded boots, and should work for a real drive failure/replace scenario. However, I don't know of a feasible way to test it. So, for the time being, I will stick to the ugly sleep script.

My suggestion is for Debian OS, but I think it would also work for Ubuntu and others.
One possible way to solve a problem that occurs with a lot of motherboards not correctly handling the UEFI entries (Debian doesn't boot even if you made the correct entry with efibootmgr -c -g -d /dev/sda -p 1 -w -L "debian" -l /EFI/debian/grubx64.efi; the UEFI BIOS shows a "debian" bootable disk but it won't boot from it) is to use the generic entry /boot/efi/EFI/boot/bootx64.efi instead.
For example, the Asus Z87C doesn't like /EFI/debian/grubx64.efi.
So, if you mounted the EFI partition /dev/sda1 at the /boot/efi path:
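The fix is presumably along these lines: copy GRUB's EFI image to the generic fallback location EFI/boot/bootx64.efi on the ESP (the exact original commands are an assumption):
sudo mkdir -p /boot/efi/EFI/boot
sudo cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/boot/bootx64.efi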
Then reboot.
The UEFI BIOS will see a generic "UEFI OS" disk, as well as any other entries previously created with efibootmgr, but it will boot from the generic "UEFI OS" entry without any trouble.