AskOverflow.Dev

SolidSnackDrive's questions

SolidSnackDrive
Asked: 2025-03-06 05:21:38 +0800 CST

Why am I flooded with READ FPDMA QUEUED errors when the drives look healthy and the cables have been replaced?

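Not sure whether this is a better fit here or on Server Fault, so I'm trying here for now.

I have a machine that I use as a personal server, with the following specs:

  • 2x Xeon E5-2690 v3
  • ASUS Z10PA-D8 series motherboard
  • Nvidia Quadro P4000
  • 1x Samsung SSD 870 EVO 500GB (system drive)
  • 3x WDC WD40EFAX-68JH4N1 (in an mdadm RAID5 configuration)
  • 1000W Cooler Master Gold power supply
  • Running Ubuntu 24.04 LTS

It seems somewhat intermittent, but I often find my dmesg log flooded with the same kind of READ FPDMA QUEUED errors across multiple drives.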

[  +0.003702] ata10.00: status: { DRDY }
[  +0.001747] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001794] ata10.00: cmd 60/40:38:80:89:33/05:00:a9:01:00/40 tag 7 ncq dma 688128 in
                       res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003511] ata10.00: status: { DRDY }
[  +0.001673] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001828] ata10.00: cmd 60/40:40:c0:8e:33/05:00:a9:01:00/40 tag 8 ncq dma 688128 in
                       res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003079] ata10.00: status: { DRDY }
[  +0.000513] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001324] ata10.00: cmd 60/40:48:40:99:33/05:00:a9:01:00/40 tag 9 ncq dma 688128 in
                       res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003468] ata10.00: status: { DRDY }
[  +0.001671] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001707] ata10.00: cmd 60/40:50:80:9e:33/05:00:a9:01:00/40 tag 10 ncq dma 688128 in
                       res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
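
The `cmd` lines above can be unpacked to see which request failed. This is a minimal sketch, assuming the taskfile bytes are printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and assuming 512-byte logical sectors. The function name `decode_ncq_read` is made up for illustration. The assumption can be cross-checked against the log itself: the decoded byte count and tag should match the `dma 688128` and `tag 7` printed on the same line.

```python
import re

# Assumed byte order of the kernel's "cmd" dump (not stated in the post):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2}) tag (\d+) ncq dma (\d+) in"
)


def decode_ncq_read(line, sector_size=512):
    """Decode one 'cmd ... tag N ncq dma BYTES in' line from dmesg."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("not an NCQ 'cmd' line")
    fields = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # For NCQ commands the sector count travels in the feature fields,
    # and the queue tag sits in the upper five bits of nsect.
    sectors = (hob_feature << 8) | feature
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": command,
        "tag_from_taskfile": nsect >> 3,
        "tag_logged": int(m.group(2)),
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "bytes_logged": int(m.group(3)),
        "lba": lba,
    }


if __name__ == "__main__":
    sample = ("[  +0.001794] ata10.00: cmd 60/40:38:80:89:33/05:00:a9:01:00/40 "
              "tag 7 ncq dma 688128 in")
    info = decode_ncq_read(sample)
    # sectors is 1344 and bytes is 688128, matching the logged "dma 688128"
    print(info)
```

Decoded this way, each of the four failed commands shown is a 1344-sector read at a different LBA on the same port.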

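What I've tried:

  • Bought new SATA cables for every drive and replaced the old ones
  • Reseated the power connector on each drive
  • Rearranged which motherboard ports the drives use
  • Checked the BIOS version (it is the latest)
  • Checked SMART health (see pastebins)

The system still seems usable, but I don't think these errors are a good sign. I'm not sure how to diagnose this further, though. The drives' SMART reports seem to indicate that they are all in good health, and the SATA cables are all brand new.

dmesg log, truncated (copied from my terminal scrollback before I rebooted; I'll update with the full version the next time the errors occur): https://pastebin.com/j0Jnhkgt

skdump for each disk: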

Device: sat16:/dev/sda
Type: 16 Byte SCSI ATA SAT Passthru
Size: 476940 MiB
Model: [Samsung SSD 870 EVO 500GB]
Serial: [S6PXNU0X400255M]
Firmware: [SVT02B6Q]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 0 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: no
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 85 min
Conveyance Self-Test Polling Time: 0 min
Bad Sectors: 0 sectors
Powered On: 6.7 months
Power Cycles: 30
Average Powered On Per Power Cycle: 6.7 days
Temperature: 25.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  5 reallocated-sector-count    100   100    10   0 sectors   0x000000000000 prefail online  yes  yes 
  9 power-on-hours               99    99     0   6.7 months  0xdb1200000000 old-age online  n/a  n/a 
 12 power-cycle-count            99    99     0   30          0x1e0000000000 old-age online  n/a  n/a 
177 wear-leveling-count          99    99     0   3           0x030000000000 prefail online  n/a  n/a 
179 used-reserved-blocks-total  100   100    10   0           0x000000000000 prefail online  yes  yes 
181 program-fail-count-total    100   100    10   0           0x000000000000 old-age online  yes  yes 
182 erase-fail-count-total      100   100    10   0           0x000000000000 old-age online  yes  yes 
183 runtime-bad-block-total     100   100    10   0           0x000000000000 prefail online  yes  yes 
187 reported-uncorrect          100   100     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
190 airflow-temperature-celsius  75    63     0   25.0 C      0x190000000000 old-age online  n/a  n/a 
195 hardware-ecc-recovered      200   200     0   0           0x000000000000 old-age online  n/a  n/a 
199 udma-crc-error-count         99    99     0   475         0xdb0100000000 old-age online  n/a  n/a 
235 good-block-rate              99    99     0   n/a         0x130000000000 old-age online  n/a  n/a 
241 total-lbas-written           99    99     0   34642.150 TB 0x106d893d0000 old-age online  n/a  n/a 
252 attribute-252               100   100     0   n/a         0x020000000000 old-age online  n/a  n/a

Device: sat16:/dev/sdb
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX22D11RCA6L]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 404 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 120 min
Conveyance Self-Test Polling Time: 2 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 27.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                201   200    21   2.9 s       0x7d0b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x027b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            194   194     0   20445       0xdd4f00000000 old-age online  n/a  n/a 
194 temperature-celsius-2       120   109     0   27.0 C      0x1b0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   200     0   1825        0x210700000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a

Device: sat16:/dev/sdc
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX22D11RCFDP]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 54720 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 121 min
Conveyance Self-Test Polling Time: 2 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 28.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                201   200    21   2.9 s       0x640b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x2d7b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            194   194     0   20720       0xf05000000000 old-age online  n/a  n/a 
194 temperature-celsius-2       119   107     0   28.0 C      0x1c0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   199     0   1806        0x0e0700000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a

Device: sat16:/dev/sdd
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX32D11ED8N5]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 46184 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 502 min
Conveyance Self-Test Polling Time: 3 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 28.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                202   199    21   2.9 s       0x430b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x447b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            200   200     0   2402        0x620900000000 old-age online  n/a  n/a 
194 temperature-celsius-2       119   106     0   28.0 C      0x1c0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   200     0   1783        0xf70600000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a
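
One detail in the dumps above is easy to miss behind the "Overall Status: GOOD" lines: attribute 199 (udma-crc-error-count) is non-zero on all four disks, including the SSD. That attribute counts CRC errors on the SATA link rather than media faults. The sketch below decodes the Raw column, assuming skdump prints the raw bytes least-significant first. That is an inference from the dumps themselves, since decoding this way reproduces the Pretty column. The function name `skdump_raw_to_int` is made up for illustration.

```python
def skdump_raw_to_int(raw_hex):
    """Read an skdump Raw field such as '0x210700000000' as a little-endian integer.

    Assumption: the six raw bytes are printed least-significant byte first.
    """
    digits = raw_hex[2:] if raw_hex.startswith("0x") else raw_hex
    return int.from_bytes(bytes.fromhex(digits), "little")


# Raw values of attribute 199 copied from the four dumps above.
CRC_RAW = {
    "/dev/sda": "0xdb0100000000",
    "/dev/sdb": "0x210700000000",
    "/dev/sdc": "0x0e0700000000",
    "/dev/sdd": "0xf70600000000",
}

if __name__ == "__main__":
    for dev, raw in CRC_RAW.items():
        print(dev, skdump_raw_to_int(raw))
    # /dev/sda 475
    # /dev/sdb 1825
    # /dev/sdc 1806
    # /dev/sdd 1783
```

Recording these counts before and after a reproduction run would show whether the link-level errors are still accumulating with the new cables.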

mdadm status:

/dev/md0:
           Version : 1.2
     Creation Time : Sun Jul 11 18:12:41 2021
        Raid Level : raid5
        Array Size : 7813772288 (7.28 TiB 8.00 TB)
     Used Dev Size : 3906886144 (3.64 TiB 4.00 TB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Mar  5 12:58:11 2025
             State : clean 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : crab:0  (local to host crab)
              UUID : b9f769fb:49026686:78737cc2:90e8e63c
            Events : 23954

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       16        1      active sync   /dev/sdb
       3       8       48        2      active sync   /dev/sdd
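
As a sanity check on the sizes reported above: RAID5 spends one member's worth of space on parity, so the usable size should be (Raid Devices - 1) × Used Dev Size. A minimal sketch, assuming mdadm reports both sizes in KiB; the helper name `raid5_capacity_kib` is made up for illustration.

```python
def raid5_capacity_kib(used_dev_size_kib, raid_devices):
    """Usable RAID5 capacity: one member's worth of space goes to parity."""
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least two member devices")
    return used_dev_size_kib * (raid_devices - 1)


if __name__ == "__main__":
    size_kib = raid5_capacity_kib(3906886144, 3)
    print(size_kib)                             # 7813772288
    print(round(size_kib / 2**30, 2))           # 7.28 (TiB)
    print(round(size_kib * 1024 / 10**12, 2))   # 8.0 (TB)
```

The result agrees with the reported Array Size, so the array geometry itself looks consistent.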

Update: I can reproduce the issue consistently if I run an mdadm check.

  • dmesg log without the noncq flag: https://pastebin.com/sz1sXNQ1
  • dmesg log with the noncq flag enabled in /etc/default/grub: https://pastebin.com/Aib0B8wz
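
To compare the two logs, it may help to tally failed commands per ATA port rather than reading them by eye. The sketch below counts `failed command:` lines grouped by port and command name. The function name and the file names in the usage note are hypothetical, and the regular expression assumes the line format shown in the dmesg excerpt earlier in the post.

```python
import re
from collections import Counter

# Matches lines such as "ata10.00: failed command: READ FPDMA QUEUED".
FAILED_RE = re.compile(r"(ata\d+\.\d+): failed command: (.+)$")


def count_failed_commands(dmesg_text):
    """Return a Counter keyed by (port, command name) for failed ATA commands."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = FAILED_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python count_failed.py with_ncq.log without_ncq.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(path, dict(count_failed_commands(fh.read())))
```

If the failures spread across several ports in both logs, that would point away from any single drive or cable.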

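I also noticed that although my drives are WD Reds, they are actually SMR drives. I'm starting to suspect that this might be the problem, but I'm not sure.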

ubuntu