Question

Zhro

Asked: 2018-11-23 12:50:17 +0800 CST

What is causing my EFI partition to become corrupted when booting CentOS after installation?

I am trying to install the latest CentOS 7.5 x64 on an ASUS Eee Box EB1037 small form factor PC. It has an Intel Celeron J1900 (Bay Trail) with an integrated NVIDIA GeForce GT 820M. The installation media will lock up unless Nouveau is disabled first. This is fine. But after installation and subsequent reboots, the EFI partition appears to become corrupted.

This question is NOT about how to troubleshoot the boot problem, but about understanding why exactly this boot failure is corrupting the EFI partition and causing GRUB to fail.

Here is the installation procedure:

CentOS 7.5 written to USB
Boot to USB installer (GRUB bootloader)
Edit the GRUB entry to add "nouveau.modeset=0"

Set time zone
Software selection: Minimal Install (no changes)
Network and host name: Set host name
Set manual partitioning to "Standard Partition" (no LVM) with automatic partition layout

Installation continues
Set root password and user account (as administrator)

Installation completes
Reboot
Hard disk GRUB appears

I have not changed any of the GRUB settings (such as disabling Nouveau). See the default settings here:

I tried booting CentOS with these defaults and it locked up as expected (since I had not disabled Nouveau). All I could see was a black screen. The monitor was on, but the keyboard indicators and backlight, as well as the optical mouse LED, were all off. The keyboard did not respond to ctrl-alt-del.

Performed a hard reset by holding the power button. System booted up to the hard disk GRUB menu a second time with no problems. Tried to boot using defaults again and it locked the same as before (as expected, as I still haven't disabled Nouveau).

Note that I still have the CentOS USB installer inserted. Upon this THIRD reboot (after the previous two post-install reboots), the system takes me to the USB GRUB instead of the hard disk one. Odd. Popped out the CentOS USB and rebooted with ctrl-alt-del.

Now I see a message from GRUB flash briefly on the screen, indicating that it cannot read the EFI partition:

After a moment it disappears and I see this:

The system is now no longer bootable to the EFI partition.

Why is this happening? How is the EFI partition corrupting?

Additional Information

Secure Boot is Enabled in the BIOS and cannot be disabled but is set to "Other OS".

There is only ONE SATA port inside the unit and it is populated by a Samsung 850 Pro 500GB SSD. Despite being set to AHCI and visible as SATA1 and the only disk connected to the system, CentOS identifies it as sdb instead of sda, possibly because it thinks that the USB install media is sda. It does not present the USB drive as a second disk during installation, however, and displays the Samsung SSD as the only visible drive.

GRUB sees the attached CentOS install USB media as (hd0) and the onboard SATA as (hd1) when both are inserted. The onboard SATA is seen as (hd0) when the USB media is removed. Interestingly, the onboard SATA is seen as sd by the CentOS installer but hd by GRUB.

Highlights

System has an Nvidia graphics processor (Optimus?)
Secure Boot is ENABLED (cannot be disabled)
BIOS presents USB disks as attached SATA disks? (sda during installation, hd0 in GRUB)

PLEASE NOTE

I can already get the system to boot by removing the USB stick after installation, setting nouveau.modeset=0 and updating GRUB afterwards at /boot/efi/EFI/centos/grub.cfg.

The question is to understand what is corrupting the EFI partition!

Photo of the system booted:

1 Answer

telcoM · Answer 1 · 2018-11-23T23:10:23+08:00

The name \EFI\BOOT\grubx64.efi tells me the system is not using the CentOS default UEFI boot path, but the fallback one. But the fallback boot path is \EFI\BOOT\bootx64.efi, which would be occupied by the SecureBoot shim. So it would seem the shim is loaded, but it is failing to perform the next step: the loading of the actual GRUB bootloader from the fallback directory.

My theory:

the installation set up the bootloader in the usual fashion: \EFI\CentOS\shimx64.efi is the SecureBoot shim bootloader, and \EFI\CentOS\grubx64.efi is the actual GRUB bootloader. The path \EFI\CentOS\shimx64.efi was registered into UEFI NVRAM boot variables. The installer also (attempted to) set up a second copy with shim in the default fallback/removable media boot path \EFI\BOOT\bootx64.efi and GRUB as \EFI\BOOT\grubx64.efi.
in the first reboot that was triggered by the installer, the NVRAM boot variables were intact and the firmware executed a "warm reboot", booting the kernel successfully using \EFI\CentOS\shimx64.efi and \EFI\CentOS\grubx64.efi. This boot attempt then resulted in a hang because Nouveau was not disabled.
Then, something caused the firmware to forget the NVRAM boot variables, causing the system to attempt a boot from the fallback path \EFI\BOOT\bootx64.efi instead. That happens when you tell UEFI to boot from a specific disk but don't specify a bootloader path. For some reason or another, this allows the fallback copy of the SecureBoot shim to be loaded, but then fails in loading \EFI\BOOT\grubx64.efi. Note that it doesn't say the file is corrupted: it is saying that the file just does not exist.

Now, you should probably use efibootmgr -v to view your UEFI boot variables as they exist now, and write down the current set-up, or at least the CentOS boot entry, so that you will be able to reproduce it if it is lost ever again. In that situation, you might either boot into rescue mode from CentOS installation media and use the efibootmgr command to fix the NVRAM variables, or perhaps just type in the correct settings using the UEFI "boot settings" menu, if it allows that. (Sadly, most UEFI implementations I've seen won't.)
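A minimal sketch of that record-keeping step, under stated assumptions: the helper names and the output file path are hypothetical, and the disk, partition number and label passed to the restore helper are placeholders that must be taken from your own efibootmgr -v listing. The restore helper only prints the command so it can be reviewed before being run as root.

```shell
# Save the current UEFI boot entries so they can be re-created later.
# Must run as root on a system booted in UEFI mode.
# $1: output file (the default path is a hypothetical choice).
save_boot_entries() {
    efibootmgr -v > "${1:-/root/efi-boot-entries.txt}"
}

# Print (do not run) a command that would re-create a CentOS boot entry.
# $1: disk holding the EFI system partition (assumption, e.g. /dev/sda)
# $2: partition number of the EFI system partition (assumption, e.g. 1)
# The label "CentOS" is also an assumption; the loader path is the one
# named in the answer above.
print_restore_command() {
    printf '%s\n' "efibootmgr -c -d $1 -p $2 -L CentOS -l '\\EFI\\CentOS\\shimx64.efi'"
}
```

Run save_boot_entries once while the system boots correctly. If the entry is lost later, print_restore_command /dev/sda 1 (with your real disk and partition) shows the command to run from rescue mode.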

You should also verify that the fallback GRUB bootloader is intact. The file should be accessible as /boot/efi/EFI/BOOT/grubx64.efi in Linux. Verify that the file exists and is identical to /boot/efi/EFI/CentOS/grubx64.efi.
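That check could be sketched as follows. The function name is hypothetical, and the default paths are the two named above; pass different paths if your layout differs.

```shell
# Compare the fallback GRUB binary with the copy in the CentOS directory.
# Prints a short verdict; returns 0 only when both files exist and match.
# $1: fallback copy, $2: primary copy (defaults are the paths from the answer).
check_fallback_grub() {
    fallback="${1:-/boot/efi/EFI/BOOT/grubx64.efi}"
    primary="${2:-/boot/efi/EFI/CentOS/grubx64.efi}"
    if [ ! -f "$fallback" ]; then
        echo "missing: $fallback"
        return 2
    fi
    if [ ! -f "$primary" ]; then
        echo "missing: $primary"
        return 2
    fi
    # cmp -s compares byte by byte and stays silent; only the exit status is used.
    if cmp -s "$fallback" "$primary"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

A "missing" verdict for the fallback copy would match the GRUB error described in the question, while "different" would point at a damaged or outdated file.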

I don't really know what caused the UEFI NVRAM boot variables to be lost between the first reboot and the third one. There are various buggy UEFI implementations out there. Or did you perhaps reset the "BIOS settings" as part of troubleshooting the hang that turned out to be caused by Nouveau? Resetting the UEFI "BIOS settings" may or may not reset the NVRAM boot variables too, depending on UEFI implementation.

If it turns out the occasional loss of UEFI NVRAM boot variables is a firmware bug, you might check for a BIOS upgrade: run dmidecode -s bios-version to see the current version. According to ASUS support pages, the most up-to-date UEFI BIOS for your system is version 1301. ASUS typically includes an update feature into the UEFI BIOS itself; if that's true on your system, you just need to save the update file onto the EFI system partition (= anywhere under /boot/efi in CentOS), go to BIOS settings, activate the update tool from there, and tell it where the update file is.
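A small sketch of the version check, assuming the firmware reports its version as a plain decimal number such as 1301 (if dmidecode prints something else on your unit, compare by eye instead). The function name is hypothetical.

```shell
# Return 0 (true) when the installed BIOS version is older than the latest one.
# Assumes both arguments are plain decimal numbers, e.g. 1301.
# $1: installed version, $2: latest available version.
bios_is_outdated() {
    [ "$1" -lt "$2" ]
}

# Example, to be run as root:
#   bios_is_outdated "$(dmidecode -s bios-version)" 1301 && echo "update available"
```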

One possible reason for NVRAM corruption is the efi-pstore kernel module. If it is enabled (or built into the CentOS standard kernel) and the feature to store kernel log into pstore on a kernel panic is active, this may have filled the NVRAM to 100% with a series of variables containing the kernel log. This might have caused the firmware to detect the variable storage as corrupt and reinitialized the NVRAM boot variables automatically.
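One way to test this theory is to look for crash-log variables in NVRAM. The sketch below assumes efivarfs is mounted at /sys/firmware/efi/efivars and that efi-pstore names its variables with a dump-type prefix, which is my understanding of the kernel's naming; the function name is hypothetical.

```shell
# Count crash-log variables that efi-pstore has written to UEFI NVRAM.
# $1: efivars directory (default: the usual efivarfs mount point).
count_pstore_dumps() {
    dir="${1:-/sys/firmware/efi/efivars}"
    count=0
    for f in "$dir"/dump-type*; do
        # An unmatched glob stays literal, so confirm the entry really exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A large count after the Nouveau hangs would be consistent with the variable store filling up. The same records, if any, should also be visible under /sys/fs/pstore when pstore is mounted.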

If the fallback /boot/efi/EFI/BOOT/grubx64.efi actually was not damaged, the failure to boot from the fallback path might have been caused by a bug in the SecureBoot shim, or by overzealous Secure Boot enforcement on the HDD fallback boot path (technically an undocumented ~~bug~~ feature of the UEFI firmware that makes it incompatible with the SecureBoot shim). An update of the SecureBoot shim might help in that case.
