Shortly after copying a large file on my main SSD, the IO PSI values climb above 90% and the system becomes very sluggish. It takes about a minute to return to normal.
I had also noticed freezes when doing things like docker pull, but I can now reproduce what I believe is the same problem with rsync.
The problem does not seem to occur when the test file is on the second SSD.
A reproducible example:
Inspired by this article, I log the load average and the CPU/IO PSI values to a CSV file once per second.
In a first terminal session I run:
rm monitor.csv; t=0; while true; do t=$(($t+1)); echo $t,$(cat /proc/loadavg | cut -d ' ' -f1),$(grep some /proc/pressure/cpu | cut -d' ' -f2 | cut -d= -f2),$(grep some /proc/pressure/io | cut -d' ' -f2 | cut -d= -f2),$(grep full /proc/pressure/io | cut -d' ' -f2 | cut -d= -f2) >> monitor.csv; sleep 1; done
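The one-liner above can also be written as a small, more readable script. This is just a sketch of the same logic; unlike the one-liner it takes a sample count (defaulting to 5) instead of looping forever, and it assumes a kernel with PSI enabled (CONFIG_PSI, Linux >= 4.20):

```shell
#!/bin/sh
# Log load average and CPU/IO pressure (PSI) to monitor.csv once per second.

psi() {   # psi <some|full> <cpu|io>  ->  the avg10 value from /proc/pressure
    grep "^$1" "/proc/pressure/$2" | cut -d' ' -f2 | cut -d= -f2
}

rm -f monitor.csv
t=0
while [ "$t" -lt "${1:-5}" ]; do    # number of samples; default 5
    t=$((t + 1))
    load=$(cut -d' ' -f1 /proc/loadavg)
    echo "$t,$load,$(psi some cpu),$(psi some io),$(psi full io)" >> monitor.csv
    sleep 1
done
```

Each `/proc/pressure/*` line looks like `some avg10=0.00 avg60=0.00 avg300=0.00 total=...`, so taking the second space-separated field and splitting on `=` yields the 10-second average.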
Then I copy a 3.3 GB test file on the main SSD:
rm test.zip ; echo "start rsync" >> monitor.csv ; rsync original.zip test.zip ; echo "end rsync" >> monitor.csv
Now monitor.csv looks like this:
1,0.64,0.18,0.00,0.00
2,0.67,0.14,0.00,0.00
3,0.67,0.14,0.00,0.00
4,0.67,0.12,0.00,0.00
start rsync
5,0.67,0.12,0.00,0.00
6,0.67,0.09,0.00,0.00
7,0.70,0.09,0.00,0.00
8,0.70,0.08,0.00,0.00
end rsync
9,0.70,0.08,0.00,0.00
10,0.70,0.42,6.15,5.79
11,0.70,0.42,6.15,5.79
12,0.80,0.35,16.99,15.97
13,0.80,0.35,16.99,15.97
14,0.80,0.28,30.94,29.38
15,0.80,0.28,30.94,29.38
16,1.14,0.23,43.09,41.27
17,1.14,0.23,51.59,49.73
18,1.14,0.19,51.59,49.73
19,1.14,0.19,51.59,49.73
20,1.14,0.33,58.00,55.40
21,2.57,0.27,62.89,60.22
22,2.57,0.27,62.89,60.22
23,2.57,0.22,67.80,65.25
24,2.57,0.22,67.80,65.25
25,2.57,0.18,72.37,69.73
26,3.57,0.18,72.37,69.73
27,3.57,0.15,75.38,72.68
28,3.57,0.15,75.38,72.68
29,3.57,0.12,79.48,76.90
30,3.57,0.12,79.48,76.90
31,4.56,0.10,83.01,80.36
32,4.56,0.10,83.01,80.36
33,4.56,0.08,84.46,81.38
34,4.56,0.08,84.46,81.38
35,4.56,0.06,86.37,83.12
36,5.24,0.06,86.37,83.12
37,5.24,0.05,86.66,83.10
38,5.24,0.05,86.66,83.10
39,5.24,0.04,87.09,82.90
40,5.24,0.04,87.09,82.90
41,5.94,0.03,84.71,80.02
42,5.94,0.03,84.71,80.02
43,5.94,0.02,84.77,79.65
44,5.94,0.02,84.77,79.65
45,5.94,0.02,86.26,80.08
46,6.91,0.02,86.26,80.08
47,6.91,0.01,86.93,80.42
48,6.91,0.01,86.93,80.42
49,6.91,0.01,86.04,79.80
50,6.91,0.01,86.04,79.80
51,7.95,0.01,85.67,79.29
52,7.95,0.01,85.67,79.29
53,7.95,0.01,86.45,79.78
54,7.95,0.01,86.45,79.78
55,7.95,0.00,86.73,79.82
56,8.12,0.00,86.73,79.82
57,8.12,0.18,80.08,73.69
58,8.12,0.18,80.08,73.69
59,8.12,0.33,65.57,60.34
60,7.55,0.33,65.57,60.34
61,7.55,0.27,54.05,49.77
62,7.55,0.27,54.05,49.77
63,7.55,0.22,44.26,40.76
64,7.55,0.18,36.24,33.37
65,7.02,0.18,36.24,33.37
66,7.02,0.18,36.24,33.37
67,7.02,0.14,29.67,27.32
68,7.02,0.12,24.30,22.37
69,7.02,0.12,24.30,22.37
70,6.46,0.10,19.89,18.32
71,6.46,0.10,19.89,18.32
72,6.46,0.08,16.29,15.00
73,6.46,0.08,16.29,15.00
74,6.46,0.06,13.34,12.28
75,5.94,0.06,13.34,12.28
76,5.94,0.05,10.92,10.06
77,5.94,0.05,10.92,10.06
78,5.94,0.04,8.94,8.23
79,5.94,0.04,8.94,8.23
80,5.55,0.03,7.32,6.74
81,5.55,0.03,7.32,6.74
82,5.55,0.02,5.99,5.52
83,5.55,0.02,5.99,5.52
84,5.55,0.02,4.91,4.52
85,5.10,0.02,4.91,4.52
86,5.10,0.01,4.02,3.70
87,5.10,0.01,4.02,3.70
88,5.10,0.01,3.29,3.03
89,5.10,0.01,3.29,3.03
Charted:
- the copy itself only takes ~5 seconds
- after that the PSI gets very high and only returns to normal about a minute later
- during that time the system GUI becomes extremely sluggish
After the copy process ends, atop also briefly flashes a warning:
The same operation on a file on the second SSD takes longer, but the system does not freeze and the IO PSI values do not spike nearly as badly:
1,2.42,0.00,0.01,0.01
2,2.42,0.00,0.01,0.01
3,2.42,0.00,0.01,0.00
4,2.23,0.00,0.01,0.00
start rsync
5,2.23,0.00,0.00,0.00
6,2.23,0.00,0.00,0.00
7,2.23,0.00,1.27,1.09
8,2.23,0.00,1.27,1.09
9,2.13,0.00,2.85,2.70
10,2.13,0.00,2.85,2.70
11,2.13,0.18,3.42,3.30
12,2.13,0.18,3.42,3.30
13,2.13,0.14,4.25,4.15
14,2.04,0.14,4.25,4.15
15,2.04,0.12,5.11,5.03
16,2.04,0.12,5.11,5.03
17,2.04,3.72,9.44,7.02
end rsync
18,2.04,3.72,9.44,7.02
19,1.88,3.04,8.63,6.65
20,1.88,3.04,8.63,6.65
21,1.88,2.49,7.07,5.44
22,1.88,2.49,7.07,5.44
23,1.73,2.04,5.79,4.46
24,1.73,2.04,5.79,4.46
25,1.73,1.67,4.74,3.65
26,1.73,1.67,4.74,3.65
27,1.73,1.36,3.88,2.99
28,1.59,1.36,3.88,2.99
29,1.59,1.12,3.17,2.44
30,1.59,1.12,3.17,2.44
System details
Lenovo E14
Ubuntu 22.04
GNOME with Wayland
AMD Ryzen 7 4700U CPU with Radeon graphics
2 internal NVMe SSDs, both with encrypted filesystems
- Main SSD (nvme0):
  - Model: KBG40ZNT512G TOSHIBA MEMORY
  - Firmware version: 0109AELA (appears to be the latest)
- Secondary SSD (nvme1):
  - Model: CT500P3SSD8
The scheduler is the same on both devices:
$ cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline
$ cat /sys/block/nvme1n1/queue/scheduler
[none] mq-deadline
Theories / questions:
- Is the main SSD failing?
- Is this related to the limited number of PCIe lanes in the system?
Tried
- I tried kernel 6.8.0-40-generic (the default on my system) as well as 6.5.0-45 - the behavior is the same in this respect.
EDIT1 (following @ubfan1's comment)
- Trim seems to run more often on the second device (the one without the problem). On the main device it runs less frequently - is that good or bad?
$ journalctl | grep "fstrim\[" | grep nvme0
Jul 01 08:32:31 example fstrim[27758]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Jul 01 08:32:31 example fstrim[27758]: /boot: 1.2 GiB (1269686272 bytes) trimmed on /dev/nvme0n1p2
Jul 08 09:53:27 example fstrim[50828]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Jul 08 09:53:27 example fstrim[50828]: /boot: 1.2 GiB (1269567488 bytes) trimmed on /dev/nvme0n1p2
Jul 15 09:43:17 example fstrim[46889]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Jul 15 09:43:17 example fstrim[46889]: /boot: 1.2 GiB (1269567488 bytes) trimmed on /dev/nvme0n1p2
Jul 22 12:33:55 example fstrim[14093]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Jul 22 12:33:55 example fstrim[14093]: /boot: 1.2 GiB (1267404800 bytes) trimmed on /dev/nvme0n1p2
Jul 29 10:15:54 example fstrim[63916]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Jul 29 10:15:54 example fstrim[63916]: /boot: 1.2 GiB (1267404800 bytes) trimmed on /dev/nvme0n1p2
Aug 05 09:37:39 example fstrim[46612]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Aug 05 09:37:39 example fstrim[46612]: /boot: 1.2 GiB (1269469184 bytes) trimmed on /dev/nvme0n1p2
Both devices are built in; neither is connected via USB.
smartctl
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.8.0-40-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KBG40ZNT512G TOSHIBA MEMORY
Serial Number: 30TPxxxxxxx
Firmware Version: 0109AELA
PCI Vendor/Subsystem ID: 0x1e0f
IEEE OUI Identifier: 0x8ce38e
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 8ce38e 0400911b0e
Local Time is: Tue Aug 20 08:58:37 2024 CEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 86 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 3.50W - - 0 0 0 0 1 1
1 + 2.60W - - 1 1 1 1 1 1
2 + 1.80W - - 2 2 2 2 1 1
3 - 0.0500W - - 4 4 4 4 800 1200
4 - 0.0050W - - 4 4 4 4 3000 32000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 3
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 29 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 12%
Data Units Read: 41.196.433 [21,0 TB]
Data Units Written: 54.357.811 [27,8 TB]
Host Read Commands: 792.033.014
Host Write Commands: 1.093.950.301
Controller Busy Time: 10.506
Power Cycles: 3.308
Power On Hours: 6.391
Unsafe Shutdowns: 4
Media and Data Integrity Errors: 0
Error Information Log Entries: 44
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 29 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
It looks like the OS is telling rsync the copy is complete when it actually isn't. So when the data is really written to disk (the same disk the OS runs from), you get the utilization spike that follows.
This looks like write-back caching, where a write is acknowledged as complete as soon as the write intent reaches the cache. Your SSD has no DRAM cache, so this probably means the ~3.3 GB file is first being cached in RAM.
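One way to see this buffering in action (a hypothetical check, not something from the original post): write a file through the page cache and then time an explicit flush. The write command returns almost immediately; the sync then blocks for however long the real flush takes, which is where the IO PSI spike occurs:

```shell
# dd returns as soon as the data is in the page cache (deliberately no
# conv=fsync here), so it looks "done" long before the disk has the data
dd if=/dev/zero of=testfile bs=1M count=256 2>/dev/null

# The explicit sync blocks until the dirty pages are actually flushed;
# on a slow or DRAM-less SSD this is where the real wait happens
time sync

rm -f testfile
</ ```

Comparing this with `dd ... conv=fsync` (or rsync followed by sync) gives the true end-to-end copy time rather than the time to fill the cache.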
How much RAM does your system have? If you have 4 GB of RAM, Ubuntu uses ~1.3 GB and the file takes ~3.3 GB - that exceeds 4 GB. Perhaps there is some contention between paging (the write cache) and writing from RAM to that same disk?
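To check the RAM theory, both the installed memory and the dirty page cache can be inspected directly via standard Linux interfaces (shown here as an illustrative sketch):

```shell
# Total and available RAM - sanity check for the page-cache theory
free -h

# Right after rsync returns, Dirty/Writeback should balloon to roughly
# the size of the copied file, then shrink as the SSD absorbs the flush;
# re-run this (or wrap it in watch -n1) to see the drain
grep -E '^(Dirty|Writeback):' /proc/meminfo
```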
If that is not the case and you have plenty of RAM, perhaps consider a faster disk to hold and run the OS and disk images? The Crucial (M.2 2280) is faster than the Toshiba (M.2 2242):
Crucial: https://www.harddrivebenchmark.net/hdd.php?hdd=Crucial%20P3%20500GB
Toshiba: https://www.harddrivebenchmark.net/hdd.php?hdd=KBG40ZNS512G%20NVMe%20TOSHIBA%20512GB
Checking your specs I don't see anything wrong; both NVMe slots appear to be PCIe 3.0 x4 (so possibly the same limited PCIe lanes, yes). I can only imagine the problem is either a feature of the SSD (perhaps the Crucial has built-in encryption offload) or simply the Toshiba's own IO capability. Otherwise I would expect IO on the PCIe bus to be scheduled fairly.
As for PSI, you may want to narrow down which metric it refers to. The chart is color-coded and, in your example, shows different contention points, but I don't see a legend mapping each color to a pressure metric. Based on your script and the CSV output, however, the last two columns are the highest, and both are IO-based, so the problem does appear to be IO-related. In the atop output the encryption process appears highest, which is why I pointed at some SSD feature set that helps with IO.
I can only suggest trying the same operation without disk encryption to see whether that reduces the IO penalty/overhead... but it is your OS disk, so I wouldn't recommend leaving it in that state.
I also wonder whether the problem would follow the disk or the slot if you swapped the disks, but I suspect you can't do that, since they are presumably 80 mm and 42 mm respectively.
I think the problem was that trim had never run. After setting it up (https://askubuntu.com/a/122007/39966) and running it - which took about 25 minutes to complete - I get this performance during and after rsync: the IO PSI stays low and the system stays responsive.
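For reference, the trim setup boils down to the standard util-linux/systemd commands (a sketch; the linked answer has the details):

```shell
# Trim all mounted filesystems that support discard (one-off; needs root)
sudo fstrim -av

# Keep it running automatically via the weekly systemd timer
sudo systemctl enable --now fstrim.timer
systemctl list-timers fstrim.timer
```

Note that on LUKS-encrypted filesystems, discards must also be allowed through the dm-crypt layer (e.g. the `discard` option in /etc/crypttab), otherwise fstrim has no effect on the underlying SSD - relevant here since both SSDs use encrypted filesystems.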