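I am trying to root-cause a customer case in which two identical drives, formatted with the same command, differ by about 55 GB in total disk space because of extra inode overhead.
I want to understand:
- the math behind how 2x `Inodes per group` translates into 2x `Inode count`
- how `Inodes per group` gets set when the `lazy_itable_init` flag is used
Environment:
The two drives sit in two identical hardware servers running the same operating system. Details of both drives follow (sensitive information redacted):
Drive A: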
=== START OF INFORMATION SECTION ===
Vendor: HPE
Product: <strip>
Revision: HPD4
Compliance: SPC-5
User Capacity: 7,681,501,126,656 bytes [7.68 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: <strip>
Serial number: <strip>
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Apr 25 07:39:27 2022 GMT
SMART support is: Available - device has SMART capability.
Drive B:
=== START OF INFORMATION SECTION ===
Vendor: HPE
Product: <strip>
Revision: HPD4
Compliance: SPC-5
User Capacity: 7,681,501,126,656 bytes [7.68 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: <strip>
Serial number: <strip>
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Apr 25 07:39:23 2022 GMT
SMART support is: Available - device has SMART capability.
The command used to format the drives is:
sudo mke2fs -F -m 1 -t ext4 -E lazy_itable_init,nodiscard /dev/sdc1
Problem:
The `df -h` output shows a size of 6.9T for Drive A versus 7.0T for Drive B:
/dev/sdc1 6.9T 89M 6.9T 1% /home/<strip>/data/<serial>
...
/dev/sdc1 7.0T 3.0G 6.9T 1% /home/<strip>/data/<serial>
Observations:
- The fdisk output shows that both drives have identical partitions.
Drive A:
Disk /dev/sdc: 7681.5 GB, 7681501126656 bytes, 15002931888 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk label type: gpt
Disk identifier: 70627C8E-9F97-468E-8EE6-54E960492318
# Start End Size Type Name
1 2048 15002929151 7T Microsoft basic primary
Drive B:
Disk /dev/sdc: 7681.5 GB, 7681501126656 bytes, 15002931888 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk label type: gpt
Disk identifier: 702A42FA-9A20-4CE4-B938-83D3AB3DCC49
# Start End Size Type Name
1 2048 15002929151 7T Microsoft basic primary
- The contents of `/etc/mke2fs.conf` are identical on both systems, so there is nothing interesting here:
================== DriveA =================
[defaults]
base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
enable_periodic_fsck = 1
blocksize = 4096
inode_size = 256
inode_ratio = 16384
[fs_types]
ext3 = {
features = has_journal
}
ext4 = {
features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
inode_size = 256
}
...
================== DriveB =================
[defaults]
base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
enable_periodic_fsck = 1
blocksize = 4096
inode_size = 256
inode_ratio = 16384
[fs_types]
ext3 = {
features = has_journal
}
ext4 = {
features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
inode_size = 256
}
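To put `inode_ratio = 16384` in context: the mke2fs man page describes it as a bytes-per-inode ratio, so a first-order estimate of the inode count is the filesystem size divided by that ratio. The sketch below is only that estimate, not the exact mke2fs algorithm (which also rounds per block group). The block count is taken from the tune2fs output further down, and the 32768 ratio is a hypothetical comparison point that does not appear in the config excerpts shown here.

```python
# First-order estimate: one inode per `inode_ratio` bytes of filesystem.
# This deliberately ignores the per-group rounding that mke2fs applies.
BLOCK_SIZE = 4096          # blocksize from mke2fs.conf
BLOCK_COUNT = 1875365888   # "Block count" reported by tune2fs on both drives


def estimated_inodes(inode_ratio: int) -> int:
    """Return the approximate inode count for a given bytes-per-inode ratio."""
    return BLOCK_COUNT * BLOCK_SIZE // inode_ratio


if __name__ == "__main__":
    # 16384 is the configured default; 32768 is a hypothetical value for comparison.
    for ratio in (16384, 32768):
        print(f"inode_ratio={ratio}: ~{estimated_inodes(ratio)} inodes")
```

The first estimate lands within a few thousand inodes of DriveA's reported count and the second within a few thousand of DriveB's, which hints (but does not prove) that DriveB was formatted with an effective ratio of 32768.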
- If we diff the `tune2fs -l` output of the two drives, `Inodes per group` on DriveA is 2x that of DriveB, and `Inode count` on DriveA is also 2x that of DriveB (full diff HERE):
DriveA:
Inode count: 468844544
Block count: 1875365888
Reserved block count: 18753658
Free blocks: 1845578463
Free inodes: 468843793
...
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
DriveB:
Inode count: 234422272 <----- Half of A
Block count: 1875365888
Reserved block count: 18753658
Free blocks: 1860525018
Free inodes: 234422261
...
Fragments per group: 32768
Inodes per group: 4096 <---------- Half of A
Inode blocks per group: 256 <---------- Half of A
Flex block group size: 16
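The relationship between these fields can be checked with plain arithmetic: the inode count is the number of block groups multiplied by `Inodes per group`, and `Inode blocks per group` is the space those inodes occupy in blocks. The sketch below assumes 32768 blocks per group (the usual ext4 value for 4096-byte blocks, and equal to the `Fragments per group` shown; the `Blocks per group` line itself is elided in the output) and the 256-byte inode size from mke2fs.conf.

```python
# Values taken from the tune2fs output and mke2fs.conf shown in this post.
BLOCK_COUNT = 1875365888
BLOCK_SIZE = 4096
INODE_SIZE = 256
BLOCKS_PER_GROUP = 32768   # assumption: not shown explicitly in the excerpt

# Number of block groups, rounding up for the final partial group.
GROUPS = -(-BLOCK_COUNT // BLOCKS_PER_GROUP)


def inode_count(inodes_per_group: int) -> int:
    """Total inodes = block groups * inodes per group."""
    return GROUPS * inodes_per_group


def inode_blocks_per_group(inodes_per_group: int) -> int:
    """Blocks needed per group to hold that group's inode table."""
    return inodes_per_group * INODE_SIZE // BLOCK_SIZE


if __name__ == "__main__":
    for name, ipg in (("DriveA", 8192), ("DriveB", 4096)):
        print(name, GROUPS, inode_count(ipg), inode_blocks_per_group(ipg))
```

Because the group count is fixed by the block count, doubling `Inodes per group` doubles both `Inode count` and `Inode blocks per group`, which matches the numbers reported for the two drives.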
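From "How is 'Inode blocks per group' calculated on an ext2 filesystem?" I understand that `Inode blocks per group` follows from `Inodes per group`.
From the mke2fs code (Source), the `Inodes per group` value seems to be used in the `write_inode_tables` function only when `lazy_itable_init` is provided: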
write_inode_tables(fs, lazy_itable_init, itable_zeroed);
...
static void write_inode_tables(ext2_filsys fs, int lazy_flag, int itable_zeroed)
...
if (lazy_flag)
num = ext2fs_div_ceil((fs->super->s_inodes_per_group - <--------- here
ext2fs_bg_itable_unused(fs, i)) *
EXT2_INODE_SIZE(fs->super),
EXT2_BLOCK_SIZE(fs->super));
If we multiply the difference in inode count by the constant inode size (256), we get (468844544 - 234422272) * 256 = 60012101632 bytes, i.e. roughly 55.9 GiB of extra inode overhead.
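The overhead figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the inode counts and inode size quoted in this post; the variable names are illustrative.

```python
# Inode counts from tune2fs and inode_size from mke2fs.conf, as quoted in this post.
INODE_SIZE = 256
INODES_DRIVE_A = 468844544
INODES_DRIVE_B = 234422272

# Extra inode table space on DriveA relative to DriveB.
extra_inodes = INODES_DRIVE_A - INODES_DRIVE_B
extra_bytes = extra_inodes * INODE_SIZE
extra_gib = extra_bytes / 2**30

if __name__ == "__main__":
    print(f"{extra_inodes} extra inodes = {extra_bytes} bytes = {extra_gib:.2f} GiB")
```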
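Can anyone help me with the math on how `Inode count` doubles when `Inodes per group` doubles? Does `lazy_itable_init` affect, at run time, the value chosen for `Inodes per group`, and if so, how can we work out what value it will set? (This flag is the only place in the code where I found a reference to s_inodes_per_group.)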