Question

Zhro

Asked: 2018-11-23 12:50:17 +0800 CST

What is causing my EFI partition to become corrupted when booting CentOS after installation?

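I am trying to install the latest CentOS 7.5 x64 on a small PC, the ASUS Eee Box EB1037. It has an Intel Celeron J1900 (Bay Trail) with an onboard NVIDIA GeForce GT 820M. The installation media locks up unless Nouveau is disabled first. That is fine. But after installation and the subsequent reboots, the EFI partition appears to be corrupted.

This question is not about how to troubleshoot the boot, but about understanding why this boot failure corrupts the EFI partition and causes GRUB to fail.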

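Here is the installation process:

Burn CentOS 7.5 to USB
Boot to the USB installer (GRUB bootloader)
Edit the GRUB options to add "nouveau.modeset=0"

Set the time zone
Software selection: Minimal Install (no changes)
Network and host name: set the host name
Set manual partitioning to "Standard Partition" (no LVM) with the automatic partition layout

Installation proceeds
Set the root password and a user account (as administrator)

Installation completes
Reboot
The hard disk GRUB appears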

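I did not change any GRUB settings (such as disabling Nouveau). See the default settings here:

Attempting to boot CentOS with these defaults, it hangs as expected (because I have not disabled Nouveau). All I can see is a black screen. The monitor is on, but the keyboard indicator lights and backlight and the optical mouse LED are all off. The keyboard does not respond to ctrl-alt-del.

Hold down the power button to perform a hard reset. The system boots to the hard disk GRUB menu a second time without a problem. Attempting to boot with the defaults again, it locks up the same as before (as expected, because I still have not disabled Nouveau).

Note that I still have the CentOS USB installer inserted. After the third reboot (following the first two post-installation reboots), the system takes me to the USB GRUB instead of the one on the hard disk. Strange. Eject the CentOS USB and reboot with ctrl-alt-del.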

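Now I see a message from GRUB flash on the screen, briefly stating that it is unable to read the EFI partition:

After a moment it goes away and I see this:

The system is now no longer bootable to the EFI partition.

Why is this happening? How is the EFI partition being corrupted?

Additional Information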

Secure Boot is Enabled in the BIOS and cannot be disabled but is set to "Other OS".

There is only ONE SATA port inside the unit and it is populated by a Samsung 850 Pro 500GB SSD. Despite being set to AHCI and visible as SATA1 and the only disk connected to the system, CentOS identifies it as sdb instead of sda, possibly because it thinks that the USB install media is sda. It does not present the USB drive as a second disk during installation, however, and displays the Samsung SSD as the only visible drive.

GRUB sees the attached CentOS install USB media as (hd0) and the onboard SATA as (hd1) when both are inserted. The onboard SATA is seen as (hd0) when the USB media is removed. Interestingly, the onboard SATA is seen as sd by the CentOS installer but hd by GRUB.

Highlights

System has an Nvidia graphics processor (Optimus?)
Secure Boot is ENABLED (cannot be disabled)
BIOS presents USB disks as attached SATA disks? (sda during installation, hd0 in GRUB)

PLEASE NOTE

I can already get the system to boot by removing the USB stick after installation, setting nouveau.modeset=0 and updating GRUB afterwards at /boot/efi/EFI/centos/grub.cfg.

The question is to understand what is corrupting the EFI partition!

Photo of the system booted:

1 Answer

telcoM · Answer 1 · 2018-11-23T23:10:23+08:00

The name \EFI\BOOT\grubx64.efi tells me the system is not using the CentOS default UEFI boot path, but the fallback one. But the fallback boot path is \EFI\BOOT\bootx64.efi, which would be occupied by the SecureBoot shim. So it would seem the shim is loaded, but it is failing to perform the next step: the loading of the actual GRUB bootloader from the fallback directory.

My theory:

the installation set up the bootloader in the usual fashion: \EFI\CentOS\shimx64.efi is the SecureBoot shim bootloader, and \EFI\CentOS\grubx64.efi is the actual GRUB bootloader. The path \EFI\CentOS\shimx64.efi was registered into UEFI NVRAM boot variables. The installer also (attempted to) set up a second copy with shim in the default fallback/removable media boot path \EFI\BOOT\bootx64.efi and GRUB as \EFI\BOOT\grubx64.efi.
in the first reboot that was triggered by the installer, the NVRAM boot variables were intact and the firmware executed a "warm reboot", booting the kernel successfully using \EFI\CentOS\shimx64.efi and \EFI\CentOS\grubx64.efi. This boot attempt then resulted in a hang because Nouveau was not disabled.
Then, something caused the firmware to forget the NVRAM boot variables, causing the system to attempt a boot from the fallback path \EFI\BOOT\bootx64.efi instead. That happens when you tell UEFI to boot from a specific disk but don't specify a bootloader path. For some reason or another, this allows the fallback copy of the SecureBoot shim to be loaded, but then fails in loading \EFI\BOOT\grubx64.efi. Note that it doesn't say the file is corrupted: it is saying that the file just does not exist.

Now, you should probably use efibootmgr -v to view your UEFI boot variables as they exist now, and write down the current set-up, or at least the CentOS boot entry, so that you will be able to reproduce it if it is lost ever again. In that situation, you might either boot into rescue mode from CentOS installation media and use the efibootmgr command to fix the NVRAM variables, or perhaps just type in the correct settings using the UEFI "boot settings" menu, if it allows that. (Sadly, most UEFI implementations I've seen won't.)
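As a rough illustration of recording the current set-up, the sketch below parses the text printed by efibootmgr -v into a list of entries that can be saved or printed. It assumes the usual line shape of that output (a BootNNNN number, an optional * for active entries, a label, a tab, then the device path); the helper names parse_boot_entries and current_efibootmgr_output are made up for this example, and running efibootmgr normally requires root.

```python
import re
import subprocess

# "Boot0000* CentOS<TAB>HD(...)/File(\EFI\centos\shimx64.efi)" style lines.
# Lines such as "BootCurrent:" or "BootOrder:" do not match, because the
# four characters after "Boot" must be hexadecimal digits.
ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")
LOADER = re.compile(r"File\(([^)]*)\)")


def parse_boot_entries(text):
    """Return one dict per BootNNNN line found in efibootmgr -v output."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        number, star, rest = match.groups()
        # Assumption: label and device path are separated by a tab.
        label, _, device_path = rest.partition("\t")
        loader = LOADER.search(device_path)
        entries.append({
            "number": number,
            "active": star == "*",
            "label": label.strip(),
            "device_path": device_path,
            # Entries that name only a device (no File(...)) have no loader.
            "loader": loader.group(1) if loader else None,
        })
    return entries


def current_efibootmgr_output():
    """Run efibootmgr -v and return its standard output as text."""
    result = subprocess.run(
        ["efibootmgr", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling parse_boot_entries(current_efibootmgr_output()) as root and saving the result somewhere off the machine gives a record of the CentOS entry, including its loader path, to re-create later if the variables are lost.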

You should also verify that the fallback GRUB bootloader is intact. The file should be accessible as /boot/efi/EFI/BOOT/grubx64.efi in Linux. Verify that the file exists and is identical to /boot/efi/EFI/CentOS/grubx64.efi.
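One way to do that comparison is to hash both files and compare the digests. This is only a minimal sketch: the two paths are the ones named above, fallback_matches is a name invented for the example, and it assumes the EFI system partition is mounted at /boot/efi (vfat mounts are typically case-insensitive, so CentOS and centos should refer to the same directory).

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fallback_matches(primary, fallback):
    """True only if both files exist and have identical contents."""
    primary, fallback = Path(primary), Path(fallback)
    if not primary.is_file() or not fallback.is_file():
        return False
    return sha256_of(primary) == sha256_of(fallback)


if __name__ == "__main__":
    same = fallback_matches(
        "/boot/efi/EFI/CentOS/grubx64.efi",
        "/boot/efi/EFI/BOOT/grubx64.efi",
    )
    print("fallback GRUB matches" if same else "fallback GRUB missing or different")
```

A False result does not distinguish a missing file from a differing one; checking the two paths with ls first makes that clear.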

I don't really know what caused the UEFI NVRAM boot variables to be lost between the first reboot and the third one. There are various buggy UEFI implementations out there. Or did you perhaps reset the "BIOS settings" as part of troubleshooting the hang that turned out to be caused by Nouveau? Resetting the UEFI "BIOS settings" may or may not reset the NVRAM boot variables too, depending on UEFI implementation.

If it turns out the occasional loss of UEFI NVRAM boot variables is a firmware bug, you might check for a BIOS upgrade: run dmidecode -s bios-version to see the current version. According to ASUS support pages, the most up-to-date UEFI BIOS for your system is version 1301. ASUS typically includes an update feature into the UEFI BIOS itself; if that's true on your system, you just need to save the update file onto the EFI system partition (= anywhere under /boot/efi in CentOS), go to BIOS settings, activate the update tool from there, and tell it where the update file is.

One possible reason for NVRAM corruption is the efi-pstore kernel module. If it is enabled (or built into the CentOS standard kernel) and the feature to store kernel log into pstore on a kernel panic is active, this may have filled the NVRAM to 100% with a series of variables containing the kernel log. This might have caused the firmware to detect the variable storage as corrupt and reinitialized the NVRAM boot variables automatically.
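A rough way to check this theory is to look for leftover crash records among the UEFI variables. The sketch below assumes efivarfs is mounted at /sys/firmware/efi/efivars and that efi-pstore stores its records as variables whose names start with dump-; both are assumptions about the running system, and pstore_dump_variables is a helper name invented for this example.

```python
import os


def pstore_dump_variables(efivars_dir="/sys/firmware/efi/efivars"):
    """List (name, size in bytes) for variables that look like pstore dumps."""
    if not os.path.isdir(efivars_dir):
        # Not booted via UEFI, or efivarfs is not mounted.
        return []
    found = []
    for name in sorted(os.listdir(efivars_dir)):
        # Assumption: efi-pstore record names begin with "dump-".
        if name.startswith("dump-"):
            path = os.path.join(efivars_dir, name)
            found.append((name, os.path.getsize(path)))
    return found


if __name__ == "__main__":
    dumps = pstore_dump_variables()
    total = sum(size for _, size in dumps)
    print(f"{len(dumps)} pstore-style variables, {total} bytes in total")
```

Many such variables after the hangs would be consistent with the pstore theory; none at all would not rule it out, since the firmware may already have reinitialized its variable store.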

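If the fallback /boot/efi/EFI/BOOT/grubx64.efi is not actually corrupted, then the failure to boot from the fallback path might be caused by a bug in the SecureBoot shim, or by over-zealous enforcement of Secure Boot in the HDD fallback boot path (technically a UEFI firmware bug or undocumented feature that makes it incompatible with the SecureBoot shim). In that case, updating the SecureBoot shim might help.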
