Basic hardware information:
The affected drive is a Seagate BarraCuda 4TB (model: ST4000DM004). For more details, see the hdparm -I output in the appendix at the end.
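Problem description and tests:
On the surface, this looks like data being cached for writing to the disk, with the actual write speed being slower than the rate at which it is cached. In this case, however, things do not seem to be that simple.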
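Copying files (on an NTFS filesystem):
When a fairly large amount of data is written, the drive's performance suddenly drops sharply. Again, ordinarily this would be as simple as the files being cached in RAM while the disk keeps working for a while. Here, however, the behavior observed while monitoring /proc/meminfo (under Ubuntu) does not seem to support that. Even after the data (one large file or several small ones) has been written and sync has been called, the amount of "Dirty" memory keeps decreasing for a while and then almost completely stops. It then keeps going down very slowly until, at some point, it speeds up again. This can repeat, depending on how much data is left.
When the write speed has dropped, reading from the device is also very slow, and if the copy finishes in this "slow mode", the drive stays slow for a while even after it is done.
These initial tests were run on both Ubuntu 21.10 and Windows 10, with similar behavior.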
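For anyone who wants to reproduce the observation, here is a minimal sketch of how the "Dirty" and "Writeback" counters could be sampled from /proc/meminfo while a copy is running. The function names parse_meminfo and watch_dirty are illustrative and not part of any existing tool, and the sketch assumes a Linux system where /proc/meminfo is readable.

```python
import time


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB or a bare count)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def watch_dirty(path="/proc/meminfo", interval=1.0, samples=10):
    """Print the Dirty and Writeback counters once per interval (Linux only)."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(f"Dirty={info.get('Dirty', 0)} kB  Writeback={info.get('Writeback', 0)} kB")
        time.sleep(interval)
```

Calling watch_dirty() in a second terminal during a large copy should show whether the amount of dirty memory stalls in the way described above.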
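Additional note for Windows:
When the disk was still slow after a copy operation had finished and I tried to read a file from it (for example, playing a video, which lagged constantly), both Resource Monitor and Task Manager showed high usage for the device (100% or close to it), while the actual transfer rate shown was <1 MB/s. (The OS also managed to freeze completely at one point, but that may or may not be strictly related.)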
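Disk benchmarks:
To see whether this was caused by the filesystem or by the hardware itself, I benchmarked the device with the gnome-disks utility. The results of one such benchmark, shown here, illustrate what I described above: read and write speeds drop sharply to almost nothing after a certain point and recover later (blue and red are the read and write speeds of each individual sample, taken at positions from the outside to the inside of the disk, 1000 samples in total; the green dots and line correspond to the access-time benchmark, which is separate from the others):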
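Note that, as far as I understand, the benchmark tool eliminates factors such as write caching. In addition, /proc/meminfo shows that almost no data waiting to be written was held in the cache during the slow period in any case; its full contents can be seen in the appendix.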
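With writing disabled in the benchmark, this phenomenon does not occur, although the speed does seem to drop unusually abruptly toward the inside of the disk:
(The position of the drop depends not on elapsed time but on the physical location on the disk, as shown by other benchmarks with different sample counts, where the cutoff occurs at the same point.)
An equivalent benchmark on another, presumably healthy, hard drive in the same system produces the expected, regular results, as shown here: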
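Conclusion/question:
Based on this, I think the problem may be caused by some hardware or firmware fault, but I may well be overlooking many things.
What are the possible causes of this behavior? What steps should I take next to diagnose the problem further? Any help is greatly appreciated.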
Appendix:
Detailed hardware information (output of hdparm -I):
/dev/sdb:
ATA device, with non-removable media
Model Number: ST4000DM004-2CV104
Serial Number: ZFN3J8RH
Firmware Revision: 0001
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x006d)
Supported: 10 9 8 7 6 5
Likely used: 10
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 7814037168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 3815447 MBytes
device size with M = 1000*1000: 4000787 MBytes (4000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 5425
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 208, current value: 208
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* unknown 119[6]
* unknown 119[7]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
unknown 78[7]
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Data Tables (AC5)
unknown 206[7]
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
* DOWNLOAD MICROCODE DMA command
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
490min for SECURITY ERASE UNIT. 490min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c500c6a79fae
NAA : 5
IEEE OUI : 000c50
Unique ID : 0c6a79fae
Checksum: correct
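As a quick sanity check on the output above, the two reported device sizes can be recomputed from the LBA48 sector count and the logical sector size. This is only a sketch of the arithmetic; the constant and function names are illustrative.

```python
# Values copied from the hdparm -I output above.
LBA48_SECTORS = 7814037168      # "LBA48 user addressable sectors"
LOGICAL_SECTOR_BYTES = 512      # "Logical Sector size"


def capacity_bytes(sectors, sector_size):
    """Total addressable capacity in bytes."""
    return sectors * sector_size


total = capacity_bytes(LBA48_SECTORS, LOGICAL_SECTOR_BYTES)
size_mib = total // (1024 * 1024)   # "device size with M = 1024*1024"
size_mb = total // (1000 * 1000)    # "device size with M = 1000*1000"
print(total, size_mib, size_mb)
```

Both results agree with the sizes hdparm reports (3815447 and 4000787 MBytes), so the drive presents its full 4 TB capacity to the host.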
/proc/meminfo during the first benchmark, while read/write speeds were slow:
MemTotal: 16323712 kB
MemFree: 9894056 kB
MemAvailable: 12815716 kB
Buffers: 138380 kB
Cached: 3038420 kB
SwapCached: 0 kB
Active: 1533040 kB
Inactive: 4396560 kB
Active(anon): 2960 kB
Inactive(anon): 2817480 kB
Active(file): 1530080 kB
Inactive(file): 1579080 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 17577980 kB
SwapFree: 17577980 kB
Dirty: 176 kB
Writeback: 0 kB
AnonPages: 2752844 kB
Mapped: 694816 kB
Shmem: 73200 kB
KReclaimable: 137092 kB
Slab: 260112 kB
SReclaimable: 137092 kB
SUnreclaim: 123020 kB
KernelStack: 13872 kB
PageTables: 33292 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 25739836 kB
Committed_AS: 9749696 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 76616 kB
VmallocChunk: 0 kB
Percpu: 8128 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 512904 kB
DirectMap2M: 7813120 kB
DirectMap1G: 8388608 kB
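The Seagate ST4000DM004 uses SMR to write data to the disk surface. This means that, in order to write a single byte, it may have to rewrite multiple gigabytes.
In "normal usage patterns" (as defined by the HDD vendor, not by the user!) this does not cause much of a problem: data is written to a CMR cache on the outer rim of the disk. Later, when disk activity drops, the firmware moves the data to its final location in the SMR bands.
When a lot of data is written at once, this CMR cache is exhausted and the I/O has to go straight to the SMR bands, which is orders of magnitude slower.
Note: this is not a RAM cache. It is a small part of the disk surface, written in CMR (i.e., without overlapping tracks), to make the SMR horror less visible to the user.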
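The cliff described above can be illustrated with a toy model. Every number in it (cache size, fast and slow write rates) is a made-up assumption for illustration, not a measured or published figure for this drive, and the model ignores the firmware draining the cache in the background during the burst.

```python
def write_time_seconds(total_mb, cache_mb, fast_mb_s, slow_mb_s):
    """Time to absorb a write burst: the part that fits in the CMR cache
    is written at the fast rate, the remainder at the slow SMR rate."""
    fast_part = min(total_mb, cache_mb)
    slow_part = max(0.0, total_mb - cache_mb)
    return fast_part / fast_mb_s + slow_part / slow_mb_s


# Hypothetical parameters, chosen only to show the shape of the curve.
CACHE_MB = 20_000      # assumed CMR cache size
FAST_MB_S = 150.0      # assumed write rate while the cache has room
SLOW_MB_S = 10.0       # assumed write rate once the cache is exhausted

for burst_mb in (5_000, 20_000, 40_000, 100_000):
    seconds = write_time_seconds(burst_mb, CACHE_MB, FAST_MB_S, SLOW_MB_S)
    print(f"{burst_mb:>7} MB burst: average {burst_mb / seconds:6.1f} MB/s")
```

Under these assumptions the average rate stays at the fast figure for bursts that fit in the cache and falls quickly toward the slow figure once they do not, which is the general shape of the behavior reported in the question.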
Hard drives write data in sectors on tracks, but there is a limit to how closely tracks can be placed without interfering with each other.
Hard drive vendors realized that the problem of adjacent tracks interfering with each other could be mitigated if they gave up on the traditional random write access model and wrote large areas of data sequentially. Each track written would overlap slightly with the last. That means more data per platter, which means higher capacity and/or lower cost. This is known as "Shingled Magnetic Recording" (SMR), by analogy to the way roofing shingles overlap.
Of course, a hard drive that required major changes in the OS wouldn't sell very well. So they added translation firmware and a CMR cache area, so that the SMR drive would look like a regular drive to the OS. It is not terribly dissimilar to what SSD vendors already do.
The difference, though, is that flash is fast, so even with the translation layer, SSDs were still much faster than HDDs. SMR HDDs, on the other hand, have performance that drops off a cliff when the CMR cache area runs out and the drive must block new write operations on the slow process of rewriting shingles.
Unfortunately, all three of the remaining HDD vendors decided that the way they would release this technology is by slipping it into the product lineup without telling people about it. So rather than being able to make a conscious choice whether or not to accept a performance cliff in exchange for a slightly lower cost per unit of storage, people unknowingly received these drives. Under pressure from the media, they did eventually release the information on which drive models were SMR, but it's still not made obvious to customers.
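Since all three of the major hard drive vendors did this, you can't simply boycott the culprit, so the only option seems to be to carefully check every hard drive you buy from now on.
Oddly, although the original motivation behind SMR was capacity, the largest drives generally still seem to be CMR, while SMR mostly shows up in drives in the low single-digit TB range.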