I'm trying to figure out why the WiFi on my Intel NUC running CentOS 7 keeps dying. For background, I have a 5-node Hadoop cluster and all nodes are configured the same way (as far as I know), yet the other machines that are on WiFi don't crash. I don't know what is wrong with this particular machine.
Here is the error from /var/log/messages. It is the same message I see regularly; I have been watching this problem for days.
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Start IWL Error Log Dump:
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Status: 0x00000100, count: 6
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Loaded firmware version: 34.0.1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x000022CE | ADVANCED_SYSASSERT
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x05900280 | trm_hw_status0
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | trm_hw_status1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00023FDC | branchlink2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0003915A | interruptlink1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | interruptlink2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0000012C | data1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x03830000 | data2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xDEADBEEF | data3
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xD28011F1 | beacon time
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x72F4FDDD | tsf low
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000182 | tsf hi
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | time gp1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xCA511FA7 | time gp2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000001 | uCode revision type
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000022 | uCode version major
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | uCode version minor
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000230 | hw version
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00C89000 | board version
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0A96001C | hcmd
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xA7F93882 | isr0
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00050000 | isr1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0020180A | isr2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x40417DCD | isr3
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | isr4
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0A95001C | last cmd Id
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | wait_event
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00004288 | l2p_control
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00018024 | l2p_duration
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | l2p_mhvalid
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x000000EF | l2p_addr_match
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0000000D | lmpm_pmg_sel
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x30101345 | timestamp
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00007888 | flow_handler
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Start IWL Error Log Dump:
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: Status: 0x00000100, count: 7
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000070 | ADVANCED_SYSASSERT
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | umac branchlink1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0086964 | umac branchlink2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac interruptlink1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac interruptlink2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000800 | umac data1
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac data2
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xDEADBEEF | umac data3
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000022 | umac major
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | umac minor
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC088628C | frame pointer
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC088628C | stack pointer
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00DF019C | last host cmd
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | isr status reg
Jan 2 08:41:06 mapr04 kernel: ieee80211 phy0: Hardware restart was requested
Jan 2 08:41:06 mapr04 kernel: iwlwifi 0000:3a:00.0: FW error in SYNC CMD STATISTICS_CMD
Jan 2 08:41:06 mapr04 kernel: CPU: 0 PID: 4898 Comm: NetworkManager Kdump: loaded Not tainted 3.10.0-957.1.3.el7.x86_64 #1
Jan 2 08:41:06 mapr04 kernel: Hardware name: Intel Corporation NUC7i7BNH/NUC7i7BNB, BIOS BNKBL357.86A.0049.2017.0724.1541 07/24/2017
Jan 2 08:41:06 mapr04 kernel: Call Trace:
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaeb61e41>] dump_stack+0x19/0x1b
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0afa983>] iwl_trans_pcie_send_hcmd+0x563/0x580 [iwlwifi]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae4c2d00>] ? wake_up_atomic_t+0x30/0x30
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0b060fc>] iwl_trans_send_cmd+0x5c/0xe0 [iwlwifi]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0c6d312>] iwl_mvm_send_cmd+0x32/0xb0 [iwlmvm]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0c6e632>] iwl_mvm_request_statistics+0x72/0x100 [iwlmvm]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0c616fe>] iwl_mvm_mac_sta_statistics+0xbe/0x100 [iwlmvm]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0bb68f7>] sta_set_sinfo+0xb7/0x800 [mac80211]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc0bcd052>] ieee80211_get_station+0x52/0x80 [mac80211]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffc08cae41>] nl80211_get_station+0xa1/0x240 [cfg80211]
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae794d0d>] ? list_del+0xd/0x30
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae5bdf1a>] ? __rmqueue+0x8a/0x460
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea77918>] genl_family_rcv_msg+0x208/0x430
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae5bf134>] ? free_one_page+0x2e4/0x310
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea77b9b>] genl_rcv_msg+0x5b/0xc0
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea73ec0>] ? __netlink_lookup+0xc0/0x110
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea77b40>] ? genl_family_rcv_msg+0x430/0x430
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea75bab>] netlink_rcv_skb+0xab/0xc0
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea760e8>] genl_rcv+0x28/0x40
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea75530>] netlink_unicast+0x170/0x210
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae78c042>] ? memcpy_fromiovec+0x62/0xb0
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea758d8>] netlink_sendmsg+0x308/0x420
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea73112>] ? netlink_recvmsg+0x212/0x490
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea193a6>] sock_sendmsg+0xb6/0xf0
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea194f5>] ? sock_recvmsg+0xc5/0x100
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea1a269>] ___sys_sendmsg+0x3e9/0x400
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae656fe0>] ? __pollwait+0xf0/0xf0
Jan 2 08:41:06 mapr04 kernel: [<ffffffffae68ee1e>] ? ep_poll+0x31e/0x360
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea1b921>] __sys_sendmsg+0x51/0x90
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaea1b972>] SyS_sendmsg+0x12/0x20
Jan 2 08:41:06 mapr04 kernel: [<ffffffffaeb74ddb>] system_call_fastpath+0x22/0x27
Where should I start debugging? I can edit the original post with updates.
Here are some things I think may be useful:
uname -a:
Linux mapr04.wired.carnoustie 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
dmesg | grep iwlwifi:
[ 3.822041] iwlwifi 0000:3a:00.0: irq 132 for MSI/MSI-X
[ 3.831295] iwlwifi 0000:3a:00.0: loaded firmware version 34.0.1 op_mode iwlmvm
[ 3.924043] iwlwifi 0000:3a:00.0: Detected Intel(R) Dual Band Wireless AC 8265, REV=0x230
[ 3.984049] iwlwifi 0000:3a:00.0: base HW address: f8:94:c2:5c:07:24
Here is the latest error:
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Current CMD queue read_ptr 246 write_ptr 247
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Start IWL Error Log Dump:
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Status: 0x00000100, count: 6
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Loaded firmware version: 34.0.1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000280 | trm_hw_status0
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | trm_hw_status1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00023FDC | branchlink2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0003915A | interruptlink1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0003915A | interruptlink2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | data1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000080 | data2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x03830000 | data3
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xD4C029D9 | beacon time
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x456535F1 | tsf low
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x000001BA | tsf hi
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | time gp1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x6D3BFE27 | time gp2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000001 | uCode revision type
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000022 | uCode version major
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | uCode version minor
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000230 | hw version
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00C89000 | board version
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0000001C | hcmd
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00012000 | isr0
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | isr1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0000180A | isr2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00417CC0 | isr3
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | isr4
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0A89001C | last cmd Id
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | wait_event
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00004288 | l2p_control
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00018024 | l2p_duration
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | l2p_mhvalid
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x000000EF | l2p_addr_match
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x0000000D | lmpm_pmg_sel
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x30101345 | timestamp
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00002838 | flow_handler
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Start IWL Error Log Dump:
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Status: 0x00000100, count: 7
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000070 | ADVANCED_SYSASSERT
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | umac branchlink1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0086964 | umac branchlink2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac interruptlink1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac interruptlink2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000800 | umac data1
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC0083A94 | umac data2
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xDEADBEEF | umac data3
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000022 | umac major
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | umac minor
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC088628C | frame pointer
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0xC088628C | stack pointer
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00F6019C | last host cmd
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: 0x00000000 | isr status reg
Jan 5 03:17:01 mapr04 kernel: ieee80211 phy0: Hardware restart was requested
Jan 5 03:17:01 mapr04 kernel: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.
It seems that the WiFi drivers cannot manage the WiFi hardware in your NUCs.
Several Linux distributions can be tested live without installing them. I think NUCs have Intel WiFi, which should work with the built-in Linux drivers, but the drivers must be new enough.
I have a NUC with 6th-generation Intel hardware. I noticed that older operating system versions cannot manage its WiFi hardware, while newer versions manage it without any tweaks, 'out of the box'.
I ran CentOS-7-x86_64-LiveGNOME-1810.iso live, and it can manage both the wired and the wireless hardware of my NUC6i3SYH. It started as easily as Ubuntu 18.04.1 LTS, but I did not test stability over a long period.
Edit 3: You should consider that the hardware may be damaged (for example, failing when it heats up). But if it works well with another operating system, you can conclude that the hardware is good.
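To put numbers on stability, for example when comparing two operating systems or checking whether failures cluster at certain times, it may help to count the firmware restarts per day. The following is only a sketch: the function name count_fw_restarts is made up for illustration, and it assumes the syslog line layout and the "Microcode SW error detected" message shown in the question.

```shell
#!/bin/sh
# Count iwlwifi firmware restarts per day in syslog-style text read from stdin.
# Assumption: lines look like "Jan 2 08:41:06 host kernel: iwlwifi ...",
# so the first two fields are month and day.
count_fw_restarts() {
    grep 'iwlwifi.*Microcode SW error detected' |
        awk '{ n[$1 " " $2]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Example (log path taken from the question; reading it usually needs root):
#   count_fw_restarts < /var/log/messages
```

Each output line is a date followed by the number of restarts logged on that date, so a rising count over time or a large difference between two systems is easy to spot.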
When was your NUC hardware developed, and when was the CentOS 7 software developed?
CentOS 7 has an old kernel series, 3.10; the kernel version in the '1810' live system is 3.10.0-957.el7.x86_64 #1 SMP. The Ubuntu 18.04.1 live system has kernel version 4.15.0-29, and an up-to-date installed system has version 4.15.0-43.
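When comparing systems like this, it can be useful to record the kernel release together with the firmware version that iwlwifi actually loaded. A minimal sketch follows; the helper name fw_version is hypothetical, and it assumes the boot message has the same form as the "loaded firmware version 34.0.1 op_mode iwlmvm" line in the question's dmesg output.

```shell
#!/bin/sh
# Extract the iwlwifi firmware version from dmesg-style text read from stdin.
# Assumption: the driver logs "loaded firmware version <version> op_mode ...".
fw_version() {
    sed -n 's/.*iwlwifi.*loaded firmware version \([^ ]*\).*/\1/p' | head -n 1
}

# Example, to be run on each system being compared:
#   uname -r                 # kernel release
#   dmesg | fw_version       # firmware version loaded at boot
#   modinfo -F firmware iwlwifi | grep 8265   # firmware files this driver build can load
```

If two systems on the same hardware report different kernel releases or firmware versions and only one of them crashes, that points at the software combination rather than the card.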