I'm setting up an experimental lab cluster, and the write speed of data received over the 10G fiber connection is about 10% of the local write speed.
Testing the transfer speed between two identical machines: iperf3 shows a healthy memory-to-memory speed of 9.43 Gbits/s, and disk(read)-to-memory transfer runs at 9.35 Gbits/s:
test@rbox1:~$ iperf3 -s -B 10.0.0.21
test@rbox3:~$ iperf3 -c 10.0.0.21 -F /mnt/k8s/test.3g
Connecting to host 10.0.0.21, port 5201
Sent 9.00 GByte / 9.00 GByte (100%) of /mnt/k8s/test.3g
[ 5] 0.00-8.26 sec 9.00 GBytes 9.35 Gbits/sec
But sending data over the 10G link and writing it to the disk of the other machine is an order of magnitude slower:
test@rbox1:~$ iperf3 -s 10.0.0.21 -F /tmp/foo -B 10.0.0.21
test@rbox3:~$ iperf3 -c 10.0.0.21
Connecting to host 10.0.0.21, port 5201
[ 5] local 10.0.0.23 port 39970 connected to 10.0.0.21 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 103 MBytes 864 Mbits/sec 0 428 KBytes
[ 5] 1.00-2.00 sec 100 MBytes 842 Mbits/sec 0 428 KBytes
[ 5] 2.00-3.00 sec 98.6 MBytes 827 Mbits/sec 0 428 KBytes
[ 5] 3.00-4.00 sec 99.3 MBytes 833 Mbits/sec 0 428 KBytes
[ 5] 4.00-5.00 sec 91.5 MBytes 768 Mbits/sec 0 428 KBytes
[ 5] 5.00-6.00 sec 94.4 MBytes 792 Mbits/sec 0 428 KBytes
[ 5] 6.00-7.00 sec 98.1 MBytes 823 Mbits/sec 0 428 KBytes
[ 5] 7.00-8.00 sec 91.2 MBytes 765 Mbits/sec 0 428 KBytes
[ 5] 8.00-9.00 sec 91.0 MBytes 764 Mbits/sec 0 428 KBytes
[ 5] 9.00-10.00 sec 91.5 MBytes 767 Mbits/sec 0 428 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 959 MBytes 804 Mbits/sec 0 sender
Sent 959 MByte / 9.00 GByte (10%) of /mnt/k8s/test.3g
[ 5] 0.00-10.00 sec 953 MBytes 799 Mbits/sec receiver
The NVMe drives are capable of much faster local writes with dd (details and fio measurements below); for a single process and 4k/8k/10M blocks, fio measures random write speeds of 330/500/1300 MB/s.
I'm trying to achieve write speeds close to the real local write speed of the NVMe drive (so yes, to state the assumption explicitly: I expect to write over the network at speeds very similar to writing to a single NVMe drive locally, but I can't even get to 20% of that).
At this point I'm completely stuck and can't see what else to try, short of a different kernel/OS. Any pointers, corrections and help would be greatly appreciated.
Here are some measurements/info/results.
What I have tried so far:
- Jumbo frames (MTU 9000) on both machines, verified that they work with ping -M do -s 8972 (a sketch of the MTU/verification commands follows this list).
- To take the network switch out of the equation, I connected the two machines directly with a 2m duplex OM3 multimode fiber cable (cables and transceivers are identical on all machines) and bound the iperf3 server/client to those interfaces. The result is the same (slow).
- Disconnected all other Ethernet/fiber cables during the tests (to eliminate routing issues) - no change.
- Updated the firmware of the motherboard and the fiber NIC (again, no change). I have not updated the NVMe firmware - it already appears to be the latest.
- Even tried moving the 10G card from PCIe slot 1 to the next available slot, wondering whether the NVMe drive and the 10G NIC were sharing and saturating the bandwidth of a physical hub/lane (again, no measurable change).
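A minimal sketch of the jumbo-frame setup and verification (the interface name matches the dmesg output further down; the exact invocations are a reconstruction, not a transcript):
# set MTU 9000 on the 10G interface, on both machines
sudo ip link set dev enp35s0f1np1 mtu 9000
# verify: 8972 bytes ICMP payload + 8 bytes ICMP header + 20 bytes IP header = 9000, fragmentation forbidden
ping -M do -s 8972 10.0.0.21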
Found some "interesting" behavior:
- Increasing the number of parallel clients increases bandwidth utilization: with 1 client the target machine writes at about 900 Mbits/sec, with 4 clients about 1.26 Gbits/sec (more than 4 parallel clients has an adverse effect); see the sketch after this list for how the parallel runs were started
- Testing writes on a more powerful machine with an AMD Ryzen 5 3600X and 64G RAM (same NVMe drive + 10G NIC): 1 client reaches 1.53 Gbits/sec, 4 clients 2.15 Gbits/sec (8 clients 2.13 Gbits/sec). In this case the traffic went through a Mikrotik CS309 switch with MTU 1500; the more powerful machine appears to receive/write faster
- There is no significant load during the tests - this holds for both the small and the big machines; maybe 26% across 2 cores
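Roughly how the parallel runs were started (sketch; iperf3's -P opens several streams from one process, which is close to but not exactly the same as separate client processes):
# 4 parallel streams from a single client
iperf3 -c 10.0.0.21 -P 4
# or 4 independent clients (needs one iperf3 -s per port on the receiver)
for i in 1 2 3 4; do iperf3 -c 10.0.0.21 -p $((5200 + i)) & done; wait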
Edit 06/07:
Following @shodanshok's comment, I mounted the remote machine over NFS; here are the results:
nfs exports: /mnt/nfs *(rw,no_subtree_check,async,insecure,no_root_squash,fsid=0)
cat /etc/mtab | grep nfs 10.0.0.21:/mnt/nfs /mnt/nfs1 nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.21,mountvers=3,mountport=52335,mountproto=udp,local_lock=none,addr=10.0.0.21 0 0
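The export was mounted on the client roughly like this (sketch; the mtab line above is what the mount actually resolved to):
# TCP mount, matching the mtab entry above
sudo mount -t nfs -o vers=3,proto=tcp,rsize=1048576,wsize=1048576 10.0.0.21:/mnt/nfs /mnt/nfs1
# UDP variant used for the "nfs (udp)" row
sudo mount -t nfs -o vers=3,proto=udp 10.0.0.21:/mnt/nfs /mnt/nfs1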
fio --name=random-write --ioengine=libaio --rw=randwrite --bs=$SIZE --numjobs=1 --iodepth=1 --runtime=30 --end_fsync=1 --size=3g
dd if=/dev/zero of=/mnt/nfs1/test bs=$SIZE count=$(3*1024/$SIZE)
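$SIZE stands for the block size of each run; spelled out, the sweep looks roughly like this (sketch; fio is run from inside the NFS mount since no --filename is given, and the dd counts are just "enough to write ~3 GB" at each block size):
cd /mnt/nfs1
for SIZE in 4k 8k 1M; do
  fio --name=random-write --ioengine=libaio --rw=randwrite --bs=$SIZE \
      --numjobs=1 --iodepth=1 --runtime=30 --end_fsync=1 --size=3g
done
dd if=/dev/zero of=/mnt/nfs1/test bs=4k count=786432   # ~3 GB
dd if=/dev/zero of=/mnt/nfs1/test bs=1M count=3072     # ~3 GB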
          | fio (bs=4k) | fio (bs=8k) | fio (bs=1M) | dd (bs=4k) | dd (bs=1M)
nfs (udp) | 153         | 210         | 984         | 907        | 962
nfs (tcp) | 157         | 205         | 947         | 946        | 916
All of these measurements are in MB/s. I'm happy that with 1M blocks the results get very close to the theoretical speed limit of the 10G connection.
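(Back of the envelope: 10 Gbit/s / 8 = 1.25 GB/s raw; after Ethernet/IP/TCP and NFS framing overhead the practical ceiling is somewhere around 1.1-1.2 GB/s, so ~950-985 MB/s is in the right ballpark.)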
So it looks like iperf3 -F ... is not the way to test network write speed; but I will also try to get the iperf3 developers' take on it.
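For completeness, a crude way to exercise the network-to-disk path without iperf3 -F is to pipe data over the link straight into dd on the receiver. A sketch only (the receiver path is a placeholder, and netcat option syntax differs between implementations):
# receiver (rbox1): write whatever arrives to an NVMe-backed path
nc -l 5201 | dd of=/mnt/k8s/recv.bin bs=1M
# sender (rbox3): stream the ~9 GB test file across
dd if=/mnt/k8s/test.3g bs=1M | nc 10.0.0.21 5201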
Setup details:
Each machine has an AMD Ryzen 3 3200G with 8GB RAM and an MPG X570 GAMING PLUS (MS-7C37) motherboard, one 1TB NVMe drive (consumer-grade WD Blue SN550 NVMe SSD, WDS100T2B0C) in the M.2 slot closest to the CPU, and one SolarFlare S7120 10G fiber card in a PCIe slot.
NVMe disk info:
test@rbox1:~$ sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 21062Y803544 WDC WDS100T2B0C-00PXH0 1 1.00 TB / 1.00 TB 512 B + 0 B 211210WD
NVMe disk write speed (4k/8k/10M blocks):
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=4k count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 0.00599554 s, 683 MB/s
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=8k count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB, 7.8 MiB) copied, 0.00616639 s, 1.3 GB/s
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 7.00594 s, 1.5 GB/s
Testing random write speed with fio-3.16:
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=9447MiB (9906MB), run=30174-30174msec
Disk stats (read/write):
dm-0: ios=2/969519, merge=0/0, ticks=0/797424, in_queue=797424, util=21.42%, aggrios=2/973290, aggrmerge=0/557, aggrticks=0/803892, aggrin_queue=803987, aggrutil=21.54%
nvme0n1: ios=2/973290, merge=0/557, ticks=0/803892, in_queue=803987, util=21.54%
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=8k --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=491MiB/s (515MB/s), 491MiB/s-491MiB/s (515MB/s-515MB/s), io=14.5GiB (15.6GB), run=30213-30213msec
Disk stats (read/write):
dm-0: ios=1/662888, merge=0/0, ticks=0/1523644, in_queue=1523644, util=29.93%, aggrios=1/669483, aggrmerge=0/600, aggrticks=0/1556439, aggrin_queue=1556482, aggrutil=30.10%
nvme0n1: ios=1/669483, merge=0/600, ticks=0/1556439, in_queue=1556482, util=30.10%
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=10m --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=1250MiB/s (1310MB/s), 1250MiB/s-1250MiB/s (1310MB/s-1310MB/s), io=36.9GiB (39.6GB), run=30207-30207msec
Disk stats (read/write):
dm-0: ios=9/14503, merge=0/0, ticks=0/540252, in_queue=540252, util=68.96%, aggrios=9/81551, aggrmerge=0/610, aggrticks=5/3420226, aggrin_queue=3420279, aggrutil=69.16%
nvme0n1: ios=9/81551, merge=0/610, ticks=5/3420226, in_queue=3420279, util=69.16%
Kernel:
test@rbox1:~$ uname -a
Linux rbox1 5.8.0-55-generic #62-Ubuntu SMP Tue Jun 1 08:21:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Fiber NIC:
test@rbox1:~$ sudo sfupdate
Solarflare firmware update utility [v8.2.2]
Copyright 2002-2020 Xilinx, Inc.
Loading firmware images from /usr/share/sfutils/sfupdate_images
enp35s0f0np0 - MAC: 00-0F-53-3B-7D-D0
Firmware version: v8.0.1
Controller type: Solarflare SFC9100 family
Controller version: v6.2.7.1001
Boot ROM version: v5.2.2.1006
The Boot ROM firmware is up to date
The controller firmware is up to date
Fiber NIC initialization and MTU setting:
test@rbox1:~$ sudo dmesg | grep sf
[ 0.210521] ACPI: 10 ACPI AML tables successfully acquired and loaded
[ 1.822946] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): Solarflare NIC detected
[ 1.824954] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): Part Number : SFN7x22F
[ 1.825434] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): no PTP support
[ 1.958282] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): Solarflare NIC detected
[ 2.015966] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): Part Number : SFN7x22F
[ 2.031379] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): no PTP support
[ 2.112729] sfc 0000:23:00.0 enp35s0f0np0: renamed from eth0
[ 2.220517] sfc 0000:23:00.1 enp35s0f1np1: renamed from eth1
[ 3.494367] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 1748.247082] sfc 0000:23:00.0 enp35s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
[ 1809.625958] sfc 0000:23:00.1 enp35s0f1np1: link up at 10000Mbps full-duplex (MTU 9000)
Motherboard model:
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7C37
Version: 2.0
Other hardware info (mostly to list the physical connections - bridges):
test@rbox1:~$ hwinfo --short
cpu:
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1500 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1775 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1266 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 2505 MHz
storage:
ASMedia ASM1062 Serial ATA Controller
Sandisk Non-Volatile memory controller
AMD FCH SATA Controller [AHCI mode]
AMD FCH SATA Controller [AHCI mode]
network:
enp35s0f1np1 Solarflare SFN7x22F-R3 Flareon Ultra 7000 Series 10G Adapter
enp35s0f0np0 Solarflare SFN7x22F-R3 Flareon Ultra 7000 Series 10G Adapter
enp39s0 Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
network interface:
br-0d1e233aeb68 Ethernet network interface
docker0 Ethernet network interface
vxlan.calico Ethernet network interface
veth0ef4ac4 Ethernet network interface
enp35s0f0np0 Ethernet network interface
enp35s0f1np1 Ethernet network interface
lo Loopback network interface
enp39s0 Ethernet network interface
disk:
/dev/nvme0n1 Sandisk Disk
/dev/sda WDC WD5000AAKS-4
partition:
/dev/nvme0n1p1 Partition
/dev/nvme0n1p2 Partition
/dev/nvme0n1p3 Partition
/dev/sda1 Partition
bridge:
AMD Matisse Switch Upstream
AMD Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
AMD Raven/Raven2 Device 24: Function 3
AMD Raven/Raven2 PCIe GPP Bridge [6:0]
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 1
AMD Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
AMD FCH LPC Bridge
AMD Matisse PCIe GPP Bridge
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 6
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Root Complex
AMD Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
AMD Raven/Raven2 Device 24: Function 4
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 2
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 0
AMD Raven/Raven2 Device 24: Function 7
AMD Raven/Raven2 PCIe GPP Bridge [6:0]
AMD Raven/Raven2 Device 24: Function 5