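Ubuntu 24 is supposed to have good support for AMD XDNA and for SoCs (CPU+GPU+NPU on a single chip) such as the AMD 8945HS. I need the AMD XDNA driver, but I cannot find a package for it through apt or similar tools. How can I install it on my Ubuntu server? amdgpu-installer solved every problem I had enabling the GPU. Is there a similar installer for the NPU?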
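Before looking for an installer, it may help to check whether an NPU driver is already present on the running kernel. The following is only a sketch under two assumptions that do not come from the question: that the kernel module is named `amdxdna` (the name used by the upstream accel driver), and that NPU device nodes appear under `/dev/accel`. It does not say whether a packaged installer exists.

```python
from pathlib import Path

# Assumption: the NPU kernel module is called "amdxdna".
NPU_MODULE = "amdxdna"


def module_loaded(proc_modules_text: str, name: str = NPU_MODULE) -> bool:
    """Return True if `name` is listed in /proc/modules-style text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


def accel_nodes(dev_root: str = "/dev/accel") -> list:
    """List device nodes under dev_root, or [] if the directory is missing."""
    root = Path(dev_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    text = modules_file.read_text() if modules_file.exists() else ""
    print("amdxdna loaded:", module_loaded(text))
    print("accel nodes:", accel_nodes())
```

If both checks come back empty, the running kernel most likely has no NPU driver bound, and the next step would be to look at the kernel version and the vendor's documentation.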
root@gpt:/home# lshw -C display
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:01:00.0
version: c7
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:41 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:e000(size=256) memory:fea00000-fea3ffff memory:c0000-dffff
root@gpt:/home# lspci -nnk | grep -i vga -A3
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev c7)
Subsystem: Gigabyte Technology Co., Ltd Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1458:22df]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
root@gpt:/home# apt install amdgpu-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
dctrl-tools dkms
Suggested packages:
debtags menu
The following NEW packages will be installed:
amdgpu-dkms dctrl-tools dkms
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/10.3 MB of archives.
After this operation, 444 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Selecting previously unselected package dctrl-tools.
(Reading database ... 90817 files and directories currently installed.)
Preparing to unpack .../dctrl-tools_2.24-3_amd64.deb ...
Unpacking dctrl-tools (2.24-3) ...
Selecting previously unselected package dkms.
Preparing to unpack .../dkms_2.8.1-5ubuntu2_all.deb ...
Unpacking dkms (2.8.1-5ubuntu2) ...
Selecting previously unselected package amdgpu-dkms.
Preparing to unpack .../amdgpu-dkms_1%3a6.2.4.50700-1646729.20.04_all.deb ...
Unpacking amdgpu-dkms (1:6.2.4.50700-1646729.20.04) ...
Setting up dctrl-tools (2.24-3) ...
Setting up dkms (2.8.1-5ubuntu2) ...
Setting up amdgpu-dkms (1:6.2.4.50700-1646729.20.04) ...
Loading new amdgpu-6.2.4-1646729.20.04 DKMS files...
Building for 5.4.0-166-generic
Building for architecture x86_64
Building initial module for 5.4.0-166-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/amdgpu-dkms.0.crash'
Error! Bad return status for module build on kernel: 5.4.0-166-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Processing triggers for man-db (2.9.1-1) ...
Errors were encountered while processing:
amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)
make[1]: *** [scripts/Makefile.build:520: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu] Error 2
To stay within the character limit I had to abbreviate parts of the log. Abbreviated sections are marked with "...".
root@gpt:/home# cat /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/make.log
DKMS make.log for amdgpu-6.2.4-1646729.20.04 for kernel 5.4.0-166-generic (x86_64)
Tue 14 Nov 2023 11:58:15 PM UTC
make: Entering directory '/usr/src/linux-headers-5.4.0-166-generic'
/var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/Makefile:219: "The local C standard(gnu89) doesn't match kernel default C standard(gnu11/gnu99)"
...
In file included from /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdxcp/backport/include/kcl/kcl_drm_drv.h:29,
from /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdxcp/backport/backport.h:1,
from <command-line>:
./include/drm/drm_drv.h:784:23: note: expected ‘struct drm_driver *’ but argument is of type ‘const struct drm_driver *’
784 | struct drm_driver *driver,
| ~~~~~~~~~~~~~~~~~~~^~~~~~
make[2]: *** [scripts/Makefile.build:270: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdxcp/./backport/kcl_drm_drv.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_bo_util.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdkcl/kcl_kernel_params.o
make[1]: *** [scripts/Makefile.build:520: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdxcp] Error 2
make[1]: *** Waiting for unfinished jobs....
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/scheduler/sched_fence.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdkcl/kcl_dma-resv.o
/var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_drv.c:2379:13: warning: ‘amdgpu_driver_release’ defined but not used [-Wunused-function]
2379 | static void amdgpu_driver_release(struct drm_device *ddev)
| ^~~~~~~~~~~~~~~~~~~~~
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_bo_vm.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_module.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_device.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_execbuf_util.o
/var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_bo_vm.c: In function ‘amdttm_bo_vm_dummy_page’:
/var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/ttm/ttm_bo_vm.c:319:21: warning: unused variable ‘ddev’ [-Wunused-variable]
319 | struct drm_device *ddev = bo->base.dev;
| ^~~~
...
In file included from ./include/drm/drm_modes.h:33,
from ./include/drm/drm_crtc.h:40,
from /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/include/kcl/kcl_drm_connector.h:25,
from /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdkcl/kcl_drm_connector.c:22:
./include/drm/drm_connector.h:1526:5: note: previous declaration of ‘drm_mode_create_colorspace_property’ was here
1526 | int drm_mode_create_colorspace_property(struct drm_connector *connector);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:270: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdkcl/kcl_drm_connector.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_benchmark.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/atombios_dp.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_afmt.o
make[1]: *** [scripts/Makefile.build:520: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdkcl] Error 2
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_trace_points.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/atombios_encoders.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_sa.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/atombios_i2c.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_dma_buf.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_vm.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_vm_pt.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_ib.o
CC [M] /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_pll.o
/var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build/amd/amdgpu/amdgpu_dma_buf.c: In function ‘amdgpu_dma_buf_map_detach’:
...
make: *** [Makefile:1778: /var/lib/dkms/amdgpu/6.2.4-1646729.20.04/build] Error 2
make: Leaving directory '/usr/src/linux-headers-5.4.0-166-generic'
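A long DKMS make.log like the one above is easier to read once the failing targets are pulled out. The sketch below is a hypothetical helper, not part of DKMS: it collects `make` failure lines and any compiler lines containing `: error: `. In the abbreviated excerpt only `note:` and `warning:` lines survive, so the compiler-error list would be empty for it.

```python
import re

# Matches lines such as:
#   make[2]: *** [scripts/Makefile.build:270: /path/to/file.o] Error 1
MAKE_ERROR = re.compile(
    r"^make(?:\[\d+\])?: \*\*\* \[(?P<rule>[^\]]+)\] Error (?P<code>\d+)$"
)


def failed_targets(log_text: str) -> list:
    """Return (target, exit_code) pairs in the order make reported them."""
    targets = []
    for line in log_text.splitlines():
        match = MAKE_ERROR.match(line.strip())
        if match:
            # The rule looks like "Makefile:LINE: /path/to/target".
            target = match.group("rule").split(": ", 1)[-1]
            targets.append((target, int(match.group("code"))))
    return targets


def compiler_errors(log_text: str) -> list:
    """Return compiler diagnostics that are real errors, not notes or warnings."""
    return [line.strip() for line in log_text.splitlines() if ": error: " in line]


if __name__ == "__main__":
    import sys

    text = open(sys.argv[1], encoding="utf-8", errors="replace").read()
    for target, code in failed_targets(text):
        print(f"Error {code}: {target}")
    for line in compiler_errors(text):
        print(line)
```

The first failed target is usually the most informative, because later failures are often just parent directories reporting that a child build failed.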
When I try to enable the amdgpu driver for my R9 270x using:
grep amdgpu /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="[truncated] radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1"
gandalf@hans-desktop ~
it never actually gets loaded:
sudo lspci -k |grep amdgpu
Kernel modules: radeon, amdgpu
whereas we would expect:
sudo lspci -k |grep amdgpu
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu
How can I enable amdgpu on Ubuntu 20.04?
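Two things are worth checking for a case like this. Parameters added to /etc/default/grub only reach the kernel after `update-grub` and a reboot, and /proc/cmdline shows what the running kernel actually received. Separately, `lspci -k` prints a "Kernel driver in use" line only when a driver is bound to the device. The sketch below parses both; the function names are made up for illustration, and it assumes the usual `lspci -k` layout in which detail lines are indented under each device line.

```python
def cmdline_params(cmdline: str) -> dict:
    """Parse a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def drivers_by_device(lspci_k_output: str) -> dict:
    """Map each device line to the driver in use and the candidate modules."""
    devices = {}
    current = None
    for raw in lspci_k_output.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Unindented line: start of a new device entry.
            current = raw.strip()
            devices[current] = {"in_use": None, "modules": []}
        elif current is not None:
            key, _, value = raw.strip().partition(":")
            if key == "Kernel driver in use":
                devices[current]["in_use"] = value.strip()
            elif key == "Kernel modules":
                devices[current]["modules"] = [m.strip() for m in value.split(",")]
    return devices


if __name__ == "__main__":
    import subprocess

    with open("/proc/cmdline", encoding="utf-8") as handle:
        params = cmdline_params(handle.read())
    print({k: v for k, v in params.items() if k.startswith(("amdgpu.", "radeon."))})
    output = subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout
    for device, info in drivers_by_device(output).items():
        if "VGA" in device:
            print(device, info)
```

If the amdgpu and radeon parameters are missing from the first printout, the GRUB change has not taken effect yet. If they are present but `in_use` is still `None`, neither driver has claimed the card.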
When I try to install rock-dkms, the KFD driver for AMD ROCm, I get the following error:
$ sudo apt install rock-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
rock-dkms-firmware
The following NEW packages will be installed:
rock-dkms rock-dkms-firmware
0 upgraded, 2 newly installed, 0 to remove and 14 not upgraded.
Need to get 0 B/11.6 MB of archives.
After this operation, 243 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Selecting previously unselected package rock-dkms-firmware.
(Reading database ... 304692 files and directories currently installed.)
Preparing to unpack .../rock-dkms-firmware_1%3a3.10-27_all.deb ...
Unpacking rock-dkms-firmware (1:3.10-27) ...
Setting up rock-dkms-firmware (1:3.10-27) ...
Selecting previously unselected package rock-dkms.
(Reading database ... 305096 files and directories currently installed.)
Preparing to unpack .../rock-dkms_1%3a3.10-27_all.deb ...
Unpacking rock-dkms (1:3.10-27) ...
Setting up rock-dkms (1:3.10-27) ...
Loading new amdgpu-3.10-27 DKMS files...
Building for 5.4.0-56-generic
Building for architecture x86_64
Building initial module for 5.4.0-56-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/rock-dkms-firmware.0.crash'
Error! Bad return status for module build on kernel: 5.4.0-56-generic (x86_64)
Consult /var/lib/dkms/amdgpu/3.10-27/build/make.log for more information.
dpkg: error processing package rock-dkms (--configure):
installed rock-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
rock-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)
Here are the contents of /var/lib/dkms/amdgpu/3.10-27/build/make.log.
I am on Ubuntu 20.04.1 LTS x86_64. Two GPUs are installed: an AMD Radeon RX Vega 64 and an NVIDIA GeForce GTX 1060 6 GB.
$ uname -a
Linux basecamp 5.4.0-56-generic #62-Ubuntu SMP Mon Nov 23 19:20:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lspci -v | grep VGA
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c1) (prog-if 00 [VGA controller])
0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller])
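Any help would be greatly appreciated.
I asked this question before on 18.04, and I upgraded to 20.04 hoping things would improve. But I still have the same high power consumption problem.
My laptop runs for only about 3 hours, and I can hear the CPU fan even when nothing is running.
Powertop stats: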
The battery reports a discharge rate of 351 mW
The power consumed was 7.07 J
The estimated remaining time is 29 hours, 54 minutes
Summary: 1142.5 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 12.0% CPU use
Power est. Usage Events/s Category Description
890 mW 19.7 ms/s 195.4 Process [PID 2540] /usr/lib/xorg/Xorg vt2 -displayfd
790 mW 1.2 ms/s 199.6 kWork dbs_work_handler
529 mW 6.3 ms/s 119.3 Process [PID 5179] /usr/lib/firefox/firefox -new-wind
443 mW 1.1 ms/s 111.6 Timer tick_sched_timer
271 mW 16.6 ms/s 55.4 Process [PID 2730] /usr/bin/gnome-shell
250 mW 8.3 ms/s 58.0 Process [PID 12800] /usr/libexec/gnome-terminal-serve
207 mW 1.0 ms/s 51.9 Process [PID 743] [sdma0]
172 mW 3.7 ms/s 42.0 Interrupt [68] amdgpu
161 mW 4.1 ms/s 39.2 Timer hrtimer_wakeup
148 mW 2.6 ms/s 27.6 Process [PID 2579] /usr/lib/xorg/Xorg vt2 -displayfd
130 mW 93.9 µs/s 32.8 kWork flush_to_ldisc
122 mW 19.3 ms/s 23.2 Process [PID 5326] /usr/lib/firefox/firefox -contentp
119 mW 5.1 ms/s 28.0 Process [PID 7877] /home/sachith/tsetup.2.3.2/Telegra
84.7 mW 4.8 ms/s 19.6 Process [PID 12655] /usr/lib/firefox/firefox -content
84.1 mW 1.0 ms/s 20.9 Process [PID 5204] /usr/lib/firefox/firefox -new-wind
49.9 mW 205.9 µs/s 12.6 Process [PID 1] /sbin/init splash
40.3 mW 149.0 µs/s 10.1 kWork psi_avgs_work
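A powertop table like the one above can be totalled per category to see whether processes, timers, or kernel work dominate the estimate. This is a rough sketch, not a powertop feature. It assumes rows follow the layout shown here (power, usage, events/s, category, description) and that power units are limited to `W`, `mW`, and `uW`/`µW`.

```python
import re
from collections import defaultdict

# One table row: "<power> <unit> <usage> <unit>/s <events> <category> <description>"
ROW = re.compile(
    r"^\s*(?P<power>[\d.]+)\s*(?P<unit>mW|uW|µW|W)\s+"
    r"(?P<usage>[\d.]+)\s*(?P<usage_unit>\S+/s)\s+"
    r"(?P<events>[\d.]+)\s+"
    r"(?P<category>\S+)\s+(?P<description>.*)$"
)
TO_MILLIWATTS = {"W": 1000.0, "mW": 1.0, "uW": 0.001, "µW": 0.001}


def power_by_category(table_text: str) -> dict:
    """Sum the estimated power (in mW) of every parsable row, per category."""
    totals = defaultdict(float)
    for line in table_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header or summary lines are skipped
        milliwatts = float(match.group("power")) * TO_MILLIWATTS[match.group("unit")]
        totals[match.group("category")] += milliwatts
    return dict(totals)


if __name__ == "__main__":
    import sys

    totals = power_by_category(sys.stdin.read())
    for category, milliwatts in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{category:10s} {milliwatts:8.1f} mW")
```

The figures are powertop's own estimates, so the totals are only as reliable as those estimates.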
TLP configuration:
TLP_ENABLE=1
CPU_SCALING_GOVERNOR_ON_BAT=powersave
CPU_SCALING_MAX_FREQ_ON_AC=0
CPU_SCALING_MIN_FREQ_ON_BAT=0
SCHED_POWERSAVE_ON_BAT=1
RADEON_POWER_PROFILE_ON_BAT=auto
RADEON_DPM_STATE_ON_BAT=battery
RESTORE_THRESHOLDS_ON_BAT="1"
My hardware specs:
AMD Ryzen 7 PRO 2700U APU Integrated Radeon Vega Graphics
Kernel : 5.4.25-050425-generic
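I have not installed any AMD graphics drivers, because they are not officially supported, and when I tried them on 18.04 they did not work properly.
Bluetooth and WiFi are not connected.
Edit: as sancho suggested, I suspect this is related to the Ubuntu kernel or the AMD firmware.
I recently bought an ASUS TUF Gaming FA506IH with a Ryzen 5 4600H and a dedicated NVIDIA GTX 1650 Ti. I am not a gamer, so I installed Ubuntu 20.04 and planned not to use the NVIDIA driver until I need it (because of compatibility issues with newer kernels), relying only on the integrated Radeon graphics. However, I noticed that the 5.4 kernel does not support the newer Radeon graphics, so I installed the mainline 5.8 kernel. After installing the amdgpu driver it seems to work fine, but I get a DKMS error saying it is not supported on the 5.8 kernel. I also tried the xanmod 5.8 kernel, with the same result. This is the error message I get when amdgpu tries to build its package on this kernel.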
Setting up amdgpu-dkms (1:5.6.0.15-1098277) ...
Removing old amdgpu-5.6.0.15-1098277 DKMS files...
------------------------------
Deleting module version: 5.6.0.15-1098277
completely from the DKMS tree.
------------------------------
Done.
Loading new amdgpu-5.6.0.15-1098277 DKMS files...
Building for 5.8.16-xanmod1
Building for architecture x86_64
Building initial module for 5.8.16-xanmod1
ERROR (dkms apport): kernel package linux-headers-5.8.16-xanmod1 is not supported
Error! Bad return status for module build on kernel: 5.8.16-xanmod1 (x86_64)
Consult /var/lib/dkms/amdgpu/5.6.0.15-1098277/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of amdgpu:
amdgpu depends on amdgpu-dkms (= 1:5.6.0.15-1098277); however:
Package amdgpu-dkms is not configured yet.
dpkg: error processing package amdgpu (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Errors were encountered while processing:
amdgpu-dkms
amdgpu
E: Sub-process /usr/bin/dpkg returned an error code (1)
I get the same error when I try to install amdgpu on a 5.8.*-generic kernel.
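When the same DKMS failure shows up on several kernels, it can help to extract the failing kernel and the matching make.log path from the apt output. The sketch below is a hypothetical helper that relies only on the two message formats visible in the output above. It does not decide whether a given amdgpu-dkms release supports a given kernel.

```python
import re

BUILD_FAILED = re.compile(
    r"^Error! Bad return status for module build on kernel: "
    r"(?P<kernel>\S+) \((?P<arch>[^)]+)\)$"
)
CONSULT = re.compile(r"^Consult (?P<log>\S+) for more information\.$")


def dkms_failures(apt_output: str) -> list:
    """Return one dict per failed DKMS build: kernel, arch, and make.log path."""
    failures = []
    pending = None
    for raw in apt_output.splitlines():
        line = raw.strip()
        match = BUILD_FAILED.match(line)
        if match:
            pending = {
                "kernel": match.group("kernel"),
                "arch": match.group("arch"),
                "log": None,
            }
            failures.append(pending)
            continue
        match = CONSULT.match(line)
        if match and pending is not None:
            # The "Consult ..." line follows the failure it belongs to.
            pending["log"] = match.group("log")
            pending = None
    return failures


if __name__ == "__main__":
    import sys

    for failure in dkms_failures(sys.stdin.read()):
        print(failure)
```

Piping the saved apt output through this script prints the kernel and log path for each failed build, which makes it easier to compare the make.log files across kernels.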
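I have had this problem on 19.10 and now on 20.04. With 18.04, which I used when I built this computer in February 2020, I did not have it. I did a fresh install of 20.04. In short, after scrolling in Firefox for a while (a few minutes to an hour), the mouse becomes inactive (I can move it, but clicks do not register), and a few seconds later the system becomes completely unresponsive, usually with a blank or wrongly colored low-resolution screen, and needs a hard reboot to reset.
Usually this happens after resuming from suspend, but it also happens (more rarely) after a fresh boot. It is an intermittent problem, however, and I cannot be sure what the preconditions are. Scrolling in Firefox seems to be a more or less consistent trigger. My suspicion is that some race condition on resume or initialization leaves the amdgpu driver in a bad state. I searched for this problem using the errors in syslog and followed the leads I could gather: reinstalling the amdgpu driver from the AMD site and updating the kernel (now 5.8.1), but nothing helped. The syslog errors always start with:
Aug 18 21:05:26 mvlLinux-pc kernel: [28611.718399] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Aug 18 21:05:31 mvlLinux-pc kernel: [28611.718497] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=624416, emitted seq=624418
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360584] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2328 thread gnome-shel:cs0 pid 2354
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360590] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
Hardware summary:
Motherboard: Asus PRIME X470-PRO
Processor: AMD Ryzen 5 2600X six-core processor
Video: Asus Strix Radeon RX570
RAM: CRUCIAL 16 GiB
I can of course provide more details. Any suggestions are gratefully accepted. Lately I have found Linux crashing far too easily.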
@heynnema
I don't think memory is the problem, but here it is:
free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       2.7Gi        10Gi       235Mi       2.0Gi        12Gi
Swap:         2.0Gi          0B       2.0Gi
sudo dmidecode -s bios-version
5406
sudo lshw -C memory
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 5406
date: 11/13/2019
size: 64KiB
capacity: 16MiB
capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: 2e
slot: System board or motherboard
size: 16GiB
*-bank:0
description: [empty]
product: Unknown
vendor: Unknown
physical id: 0
serial: Unknown
slot: DIMM_A1
*-bank:1
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
product: BLS8G4D32AESBK.M8FE1
vendor: CRUCIAL
physical id: 1
serial: E316F686
slot: DIMM_A2
size: 8GiB
width: 64 bits
clock: 2400MHz (0.4ns)
*-bank:2
description: [empty]
product: Unknown
vendor: Unknown
physical id: 2
serial: Unknown
slot: DIMM_B1
*-bank:3
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
product: BLS8G4D32AESBK.M8FE1
vendor: CRUCIAL
physical id: 3
serial: E316E264
slot: DIMM_B2
size: 8GiB
width: 64 bits
clock: 2400MHz (0.4ns)
*-cache:0
description: L1 cache
physical id: 30
slot: L1 - Cache
size: 576KiB
capacity: 576KiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 31
slot: L2 - Cache
size: 3MiB
capacity: 3MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 32
slot: L3 - Cache
size: 16MiB
capacity: 16MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=3
@heynnema
Adding more error messages from a freeze after suspend/resume:
Aug 29 08:36:17 mvlLinux-pc systemd-resolved[830]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248541] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248550] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248553] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00200000/04400000
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248556] pcieport 0000:00:03.1: AER: [21] ACSViol (First)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248559] amdgpu 0000:09:00.0: AER: can't recover (no error_detected callback)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248561] snd_hda_intel 0000:09:00.1: AER: can't recover (no error_detected callback)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248587] pcieport 0000:00:03.1: AER: device recovery failed
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331741] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331751] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331756] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00200000/04400000
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331759] pcieport 0000:00:03.1: AER: [21] ACSViol (First)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331763] amdgpu 0000:09:00.0: AER: can't recover (no error_detected callback)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331765] snd_hda_intel 0000:09:00.1: AER: can't recover (no error_detected callback)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331799] pcieport 0000:00:03.1: AER: device recovery failed
Aug 29 08:39:47 mvlLinux-pc kernel: [ 8040.390787] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 29 08:39:47 mvlLinux-pc kernel: [ 8040.390799] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:49:crtc-1] flip_done timed out
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438900] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22040, emitted seq=22042
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438988] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438995] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.146715] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.146795] [drm:gfx_v8_0_kcq_disable.isra.0 [amdgpu]] *ERROR* KCQ disable failed
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.423697] amdgpu: cp is busy, skip halt cp
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8043.700692] amdgpu: rlc is busy, skip halt rlc
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8043.701711] amdgpu 0000:09:00.0: amdgpu: GPU BACO reset
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.346691] amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.348500] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.348515] [drm] VRAM is lost due to GPU reset!
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678238] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678302] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678328] amdgpu 0000:09:00.0: amdgpu: GPU reset(1) failed
Aug 29 08:39:52 mvlLinux-pc kernel: [ 8044.680626] amdgpu 0000:09:00.0: amdgpu: GPU reset end with ret = -110
Aug 29 08:39:54 mvlLinux-pc kernel: [ 8047.302923] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727115] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22042, emitted seq=22042
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727203] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727216] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
Aug 29 08:40:46 mvlLinux-pc systemd-modules-load[388]: Inserted module 'lp'
Aug 29 08:40:46 mvlLinux-pc systemd-modules-load[388]: Inserted module 'ppdev'
Aug 29 08:40:46 mvlLinux-pc kernel: [ 0.000000] Linux version 5.8.1-050801-generic (kernel@sita) (gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35) #202008111432 SMP Tue Aug 11 14:34:42 UTC 2020
Aug 29 08:40:46 mvlLinux-pc kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.8.1-050801-generic root=UUID=566746e2-b4e2-42a6-b18a-fa84ebca61aa ro quiet splash vt.handoff=7
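I have seen similar errors in bug reports, always involving AMD graphics, but mostly integrated APUs rather than my discrete setup. The problem appeared when I moved from Ubuntu 18.04 to 19.10. Others have said that a newer kernel fixed it, but updating to 5.8.1 did not help me. Given how intermittent the problem is, others may only think it has gone away, and I have seen several people note that it came back. So far I have not seen any solution in the dozens of threads I have read. I suppose I could try putting in an older video card to see whether that narrows it down. Thanks!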
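The syslog excerpt above contains repeated PCIe AER reports from the root port ahead of the GPU. Counting them per device and error type shows whether they cluster around the freezes. This is only a sketch that matches the "PCIe Bus Error" line format seen in this log; other kernels may word these messages differently.

```python
import re
from collections import Counter

# Example: "pcieport 0000:00:03.1: AER: PCIe Bus Error:
#           severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)"
AER_LINE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+)"
)


def count_aer_errors(log_text: str) -> Counter:
    """Count AER bus errors keyed by (PCI address, severity, layer)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AER_LINE.search(line)
        if match:
            key = (
                match.group("bdf"),
                match.group("severity").strip(),
                match.group("layer").strip(),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    for key, count in count_aer_errors(sys.stdin.read()).most_common():
        print(count, *key)
```

Feeding it the whole syslog, rather than just one excerpt, shows whether the errors also occur at times when the system did not freeze.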
@heynnema
After setting pci=noaer on the grub command line, I got the same error when resuming from suspend. Dmesg output from the resume:
[ 2456.697121] ACPI: Low-level resume complete
[ 2456.697163] ACPI: EC: EC started
[ 2456.697164] PM: Restoring platform NVS memory
[ 2456.697710] Enabling non-boot CPUs ...
[ 2456.697747] x86: Booting SMP configuration:
[ 2456.697748] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 2456.697845] microcode: CPU1: patch_level=0x0800820d
[ 2456.700139] ACPI: \_PR_.C002: Found 2 idle states
[ 2456.700328] CPU1 is up
[ 2456.700344] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 2456.700442] microcode: CPU2: patch_level=0x0800820d
[ 2456.702609] ACPI: \_PR_.C004: Found 2 idle states
[ 2456.702779] CPU2 is up
[ 2456.702793] smpboot: Booting Node 0 Processor 3 APIC 0x8
[ 2456.702921] microcode: CPU3: patch_level=0x0800820d
[ 2456.705121] ACPI: \_PR_.C006: Found 2 idle states
[ 2456.705330] CPU3 is up
[ 2456.705344] smpboot: Booting Node 0 Processor 4 APIC 0xa
[ 2456.705468] microcode: CPU4: patch_level=0x0800820d
[ 2456.707683] ACPI: \_PR_.C008: Found 2 idle states
[ 2456.707886] CPU4 is up
[ 2456.707901] smpboot: Booting Node 0 Processor 5 APIC 0xc
[ 2456.708026] microcode: CPU5: patch_level=0x0800820d
[ 2456.710215] ACPI: \_PR_.C00A: Found 2 idle states
[ 2456.710422] CPU5 is up
[ 2456.710435] smpboot: Booting Node 0 Processor 6 APIC 0x1
[ 2456.710561] microcode: CPU6: patch_level=0x0800820d
[ 2456.712760] ACPI: \_PR_.C001: Found 2 idle states
[ 2456.713055] CPU6 is up
[ 2456.713084] smpboot: Booting Node 0 Processor 7 APIC 0x3
[ 2456.713186] microcode: CPU7: patch_level=0x0800820d
[ 2456.715367] ACPI: \_PR_.C003: Found 2 idle states
[ 2456.715594] CPU7 is up
[ 2456.715609] smpboot: Booting Node 0 Processor 8 APIC 0x5
[ 2456.715709] microcode: CPU8: patch_level=0x0800820d
[ 2456.717892] ACPI: \_PR_.C005: Found 2 idle states
[ 2456.718131] CPU8 is up
[ 2456.718143] smpboot: Booting Node 0 Processor 9 APIC 0x9
[ 2456.718271] microcode: CPU9: patch_level=0x0800820d
[ 2456.720463] ACPI: \_PR_.C007: Found 2 idle states
[ 2456.720728] CPU9 is up
[ 2456.720742] smpboot: Booting Node 0 Processor 10 APIC 0xb
[ 2456.720868] microcode: CPU10: patch_level=0x0800820d
[ 2456.723067] ACPI: \_PR_.C009: Found 2 idle states
[ 2456.723342] CPU10 is up
[ 2456.723356] smpboot: Booting Node 0 Processor 11 APIC 0xd
[ 2456.723483] microcode: CPU11: patch_level=0x0800820d
[ 2456.725687] ACPI: \_PR_.C00B: Found 2 idle states
[ 2456.725971] CPU11 is up
[ 2456.727331] ACPI: Waking up from system sleep state S3
[ 2456.728144] ACPI: EC: interrupt unblocked
[ 2456.810892] ACPI: EC: event unblocked
[ 2456.810961] usb usb1: root hub lost power or was reset
[ 2456.810962] usb usb2: root hub lost power or was reset
[ 2456.811202] usb usb3: root hub lost power or was reset
[ 2456.811203] usb usb4: root hub lost power or was reset
[ 2456.811595] sd 1:0:0:0: [sda] Starting disk
[ 2456.811933] serial 00:03: activated
[ 2457.124313] ata5: SATA link down (SStatus 0 SControl 330)
[ 2457.124331] ata6: SATA link down (SStatus 0 SControl 330)
[ 2457.124375] ata7: SATA link down (SStatus 0 SControl 330)
[ 2457.124474] ata1: SATA link down (SStatus 0 SControl 300)
[ 2457.124622] ata9: SATA link down (SStatus 0 SControl 300)
[ 2457.128321] ata3: SATA link down (SStatus 0 SControl 330)
[ 2457.168893] nvme nvme0: Shutdown timeout set to 8 seconds
[ 2457.181058] ata4: SATA link down (SStatus 0 SControl 330)
[ 2457.204000] nvme nvme0: 32/0/0 default/read/poll queues
[ 2457.215120] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 2457.283762] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 2457.366979] usb 4-2: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 2457.403433] [drm] UVD and UVD ENC initialized successfully.
[ 2457.526411] [drm] VCE initialized successfully.
[ 2457.586664] usb 3-1: reset high-speed USB device number 2 using xhci_hcd
[ 2457.850542] ata8: failed to resume link (SControl 0)
[ 2457.850553] ata8: SATA link down (SStatus 0 SControl 0)
[ 2458.122724] usb 3-1.1: reset full-speed USB device number 3 using xhci_hcd
[ 2460.178827] igb 0000:07:00.0 enp7s0: igb: enp7s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2462.202613] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2462.379171] usb 5-2.2: reset low-speed USB device number 5 using xhci_hcd
[ 2462.607145] ata2.00: configured for UDMA/133
[ 2467.726718] PM: dpm_run_callback(): usb_dev_resume+0x0/0x20 returns -5
[ 2467.726722] PM: Device 5-2.2 failed to resume async: error -5
[ 2467.727071] OOM killer enabled.
[ 2467.727072] Restarting tasks ... done.
[ 2467.821378] PM: suspend exit
[ 2467.887621] usb 5-2.2: USB disconnect, device number 5
[ 2467.994352] usb 5-2.2: new low-speed USB device number 7 using xhci_hcd
[ 2468.103947] usb 5-2.2: New USB device found, idVendor=0764, idProduct=0501, bcdDevice= 0.01
[ 2468.103949] usb 5-2.2: New USB device strings: Mfr=3, Product=1, SerialNumber=0
[ 2468.103950] usb 5-2.2: Product: ST Series
[ 2468.103951] usb 5-2.2: Manufacturer: CPS
[ 2468.161509] hid-generic 0003:0764:0501.0008: hiddev2,hidraw5: USB HID v1.10 Device [CPS ST Series] on usb-0000:0a:00.3-2.2/input0
[ 2471.910903] igb 0000:07:00.0 enp7s0: igb: enp7s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2472.022608] IPv6: ADDRCONF(NETDEV_CHANGE): enp7s0: link becomes ready
[ 2575.502700] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2575.502806] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2580.632921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=84864, emitted seq=84866
[ 2580.633010] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1874 thread Xorg:cs0 pid 1877
[ 2580.633018] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[ 2581.335993] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 2581.336073] [drm:gfx_v8_0_kcq_disable.isra.0 [amdgpu]] *ERROR* KCQ disable failed
[ 2581.613633] amdgpu: cp is busy, skip halt cp
[ 2581.890354] amdgpu: rlc is busy, skip halt rlc
[ 2581.891376] amdgpu 0000:09:00.0: amdgpu: GPU BACO reset
[ 2582.546375] amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2582.548207] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 2582.548220] [drm] VRAM is lost due to GPU reset!
[ 2582.878644] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 2582.878708] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[ 2582.878764] amdgpu 0000:09:00.0: amdgpu: GPU reset(2) failed
[ 2582.881066] amdgpu 0000:09:00.0: amdgpu: GPU reset end with ret = -110
[ 2585.742804] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 2585.742817] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:49:crtc-1] flip_done timed out
[ 2588.558904] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 2592.910983] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 2603.150799] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
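At this point the screen is blank and the system is frozen. It looks the same as always. The GPU reset is retried, times out, and fails, so it seems to me that the GPU cannot recover/reset after suspend. I have seen this after a reboot too, but rarely, and I can usually work/play for hours, as long as I don't let it suspend. Thanks!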
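The sequence described here (ring timeout, reset attempt, reset failure) can be pulled out of a dmesg dump to compare freezes. The sketch below is a hypothetical helper that matches only the message formats appearing in the logs above. A return value of -110 corresponds to ETIMEDOUT on Linux.

```python
import re

RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)
RESET_END = re.compile(r"GPU reset end with ret = (?P<ret>-?\d+)")


def gpu_hang_events(dmesg_text: str) -> list:
    """Return ring timeouts and reset results in the order they were logged."""
    events = []
    for line in dmesg_text.splitlines():
        match = RING_TIMEOUT.search(line)
        if match:
            events.append({
                "event": "ring_timeout",
                "ring": match.group("ring"),
                # Jobs emitted to the ring but not yet signaled as complete.
                "outstanding": int(match.group("emitted")) - int(match.group("signaled")),
            })
            continue
        match = RESET_END.search(line)
        if match:
            events.append({"event": "reset_end", "ret": int(match.group("ret"))})
    return events


if __name__ == "__main__":
    import sys

    for event in gpu_hang_events(sys.stdin.read()):
        print(event)
```

Running it over logs from several freezes would show whether the same ring times out each time and whether the reset ever ends with a return value of 0.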
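I have run into a strange problem on a fresh install of Ubuntu 20.04. Everything works fine except native Linux games. In Rimworld I get a striped picture, and in Stardew Valley (Steam) all text boxes at best show up in Asian characters. I have tried many things: oibaf drivers, odd grub options and so on... but to no avail. Rimworld screenshot
The strangest thing is that all my Windows games running under WINE work perfectly. Have you ever come across something like this?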
TIA
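In addition, I tried reinstalling everything related to OpenGL, but still... Unigine Valley screenshot
I also tried swapping the memory... same difference. I reinstalled Windows 10 and had no problems, which rules out a hardware fault. This is really infuriating.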