AskOverflow.Dev

SolidSnackDrive's questions

SolidSnackDrive
Asked: 2025-03-06 05:21:38 +0800 CST

Why do my drives look healthy, even with the cables replaced, yet produce a flood of FPDMA read errors?

  • 6

I wasn't sure whether this fits better here or on Server Fault. I'll try here for now.

I have a machine that I use as a personal server, with the following specs:

  • 2x Xeon E5-2690 v3
  • ASUS Z10PA-D8 series motherboard
  • Nvidia Quadro P4000
  • 1x Samsung SSD 870 EVO 500 GB (system drive)
  • 3x WDC WD40EFAX-68JH4N1 (in an mdadm RAID5 configuration)
  • Cooler Master Gold 1000W power supply
  • Running Ubuntu 24.04 LTS

It seems to be somewhat intermittent, but my dmesg log is frequently flooded with the same kind of READ FPDMA QUEUED error across multiple drives.

[  +0.003702] ata10.00: status: { DRDY }
[  +0.001747] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001794] ata10.00: cmd 60/40:38:80:89:33/05:00:a9:01:00/40 tag 7 ncq dma 688128 in
                       res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003511] ata10.00: status: { DRDY }
[  +0.001673] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001828] ata10.00: cmd 60/40:40:c0:8e:33/05:00:a9:01:00/40 tag 8 ncq dma 688128 in
                       res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003079] ata10.00: status: { DRDY }
[  +0.000513] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001324] ata10.00: cmd 60/40:48:40:99:33/05:00:a9:01:00/40 tag 9 ncq dma 688128 in
                       res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
[  +0.003468] ata10.00: status: { DRDY }
[  +0.001671] ata10.00: failed command: READ FPDMA QUEUED
[  +0.001707] ata10.00: cmd 60/40:50:80:9e:33/05:00:a9:01:00/40 tag 10 ncq dma 688128 in
                       res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error)
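For readers trying to interpret the `cmd` lines above, here is a minimal Python sketch that splits one into its taskfile fields. It assumes the field order libata uses when logging a failed command (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the usual NCQ convention that the sector count travels in the feature fields while the tag sits in the upper bits of nsect. The function name `decode_ncq_read` is illustrative and not part of any tool.

```python
import re

# Matches lines such as:
# ata10.00: cmd 60/40:38:80:89:33/05:00:a9:01:00/40 tag 7 ncq dma 688128 in
CMD_RE = re.compile(
    r"(?P<port>ata\d+\.\d+): cmd (?P<tf>[0-9a-f/:]+) "
    r"tag (?P<tag>\d+) ncq dma (?P<bytes>\d+) in"
)


def decode_ncq_read(line):
    """Decode one 'cmd' log line; return None if the line does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    parts = m.group("tf").split("/")
    if len(parts) != 4:
        return None
    command = int(parts[0], 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in parts[1].split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in parts[2].split(":")
    )
    # Assumption: for NCQ commands the 16-bit sector count is split across
    # feature (low byte) and hob_feature (high byte); 0 stands for 65536.
    sectors = ((hob_feature << 8) | feature) or 65536
    # 48-bit LBA, most significant byte first.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "port": m.group("port"),
        "command": command,           # 0x60 is READ FPDMA QUEUED
        "sectors": sectors,
        "lba": lba,
        "tag": int(m.group("tag")),
        "tag_in_nsect": nsect >> 3,   # NCQ tag as encoded in the taskfile
        "dma_bytes": int(m.group("bytes")),
    }


if __name__ == "__main__":
    sample = ("[  +0.001794] ata10.00: cmd 60/40:38:80:89:33/05:00:a9:01:00/40 "
              "tag 7 ncq dma 688128 in")
    print(decode_ncq_read(sample))
```

Under these assumptions the first logged command is a 1344-sector read (1344 × 512 = 688128 bytes, matching the `ncq dma` figure), which suggests the failures are large sequential reads rather than anything exotic.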

What I've tried:

  • Bought new SATA cables for each of the drives, replacing the old ones
  • Reseated the power cables on each drive
  • Shuffled around which motherboard ports are in use
  • Checked the BIOS version (it is up to date with the latest release)
  • Checked SMART health (see pastebins)

The system still seems usable, but I can't imagine these errors are a good thing. I'm not sure what to try next to diagnose this. The drives' SMART reports seem to indicate that they are all in good condition, and the SATA cables are all new.

dmesg logs, truncated (copied from my terminal scrollback before rebooting; I will update with a full version the next time the error shows up): https://pastebin.com/j0Jnhkgt

skdump for each disk:

Device: sat16:/dev/sda
Type: 16 Byte SCSI ATA SAT Passthru
Size: 476940 MiB
Model: [Samsung SSD 870 EVO 500GB]
Serial: [S6PXNU0X400255M]
Firmware: [SVT02B6Q]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 0 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: no
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 85 min
Conveyance Self-Test Polling Time: 0 min
Bad Sectors: 0 sectors
Powered On: 6.7 months
Power Cycles: 30
Average Powered On Per Power Cycle: 6.7 days
Temperature: 25.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  5 reallocated-sector-count    100   100    10   0 sectors   0x000000000000 prefail online  yes  yes 
  9 power-on-hours               99    99     0   6.7 months  0xdb1200000000 old-age online  n/a  n/a 
 12 power-cycle-count            99    99     0   30          0x1e0000000000 old-age online  n/a  n/a 
177 wear-leveling-count          99    99     0   3           0x030000000000 prefail online  n/a  n/a 
179 used-reserved-blocks-total  100   100    10   0           0x000000000000 prefail online  yes  yes 
181 program-fail-count-total    100   100    10   0           0x000000000000 old-age online  yes  yes 
182 erase-fail-count-total      100   100    10   0           0x000000000000 old-age online  yes  yes 
183 runtime-bad-block-total     100   100    10   0           0x000000000000 prefail online  yes  yes 
187 reported-uncorrect          100   100     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
190 airflow-temperature-celsius  75    63     0   25.0 C      0x190000000000 old-age online  n/a  n/a 
195 hardware-ecc-recovered      200   200     0   0           0x000000000000 old-age online  n/a  n/a 
199 udma-crc-error-count         99    99     0   475         0xdb0100000000 old-age online  n/a  n/a 
235 good-block-rate              99    99     0   n/a         0x130000000000 old-age online  n/a  n/a 
241 total-lbas-written           99    99     0   34642.150 TB 0x106d893d0000 old-age online  n/a  n/a 
252 attribute-252               100   100     0   n/a         0x020000000000 old-age online  n/a  n/a

Device: sat16:/dev/sdb
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX22D11RCA6L]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 404 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 120 min
Conveyance Self-Test Polling Time: 2 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 27.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                201   200    21   2.9 s       0x7d0b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x027b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            194   194     0   20445       0xdd4f00000000 old-age online  n/a  n/a 
194 temperature-celsius-2       120   109     0   27.0 C      0x1b0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   200     0   1825        0x210700000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a

Device: sat16:/dev/sdc
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX22D11RCFDP]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 54720 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 121 min
Conveyance Self-Test Polling Time: 2 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 28.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                201   200    21   2.9 s       0x640b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x2d7b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            194   194     0   20720       0xf05000000000 old-age online  n/a  n/a 
194 temperature-celsius-2       119   107     0   28.0 C      0x1c0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   199     0   1806        0x0e0700000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a

Device: sat16:/dev/sdd
Type: 16 Byte SCSI ATA SAT Passthru
Size: 3815447 MiB
Model: [WDC WD40EFAX-68JH4N1]
Serial: [WD-WX32D11ED8N5]
Firmware: [83.00A83]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was never started.]
Total Time To Complete Off-Line Data Collection: 46184 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 502 min
Conveyance Self-Test Polling Time: 3 min
Bad Sectors: 0 sectors
Powered On: 3.6 years
Power Cycles: 43
Average Powered On Per Power Cycle: 1.0 months
Temperature: 28.0 C
Attribute Parsing Verification: Good
Overall Status: GOOD
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                202   199    21   2.9 s       0x430b00000000 prefail online  yes  yes 
  4 start-stop-count            100   100     0   45          0x2d0000000000 old-age online  n/a  n/a 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200     0   0           0x000000000000 old-age online  n/a  n/a 
  9 power-on-hours               57    57     0   3.6 years   0x447b00000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 11 calibration-retry-count     100   253     0   0           0x000000000000 old-age online  n/a  n/a 
 12 power-cycle-count           100   100     0   43          0x2b0000000000 old-age online  n/a  n/a 
192 power-off-retract-count     200   200     0   28          0x1c0000000000 old-age online  n/a  n/a 
193 load-cycle-count            200   200     0   2402        0x620900000000 old-age online  n/a  n/a 
194 temperature-celsius-2       119   106     0   28.0 C      0x1c0000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       100   253     0   0 sectors   0x000000000000 old-age offline n/a  n/a 
199 udma-crc-error-count        200   200     0   1783        0xf70600000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       100   253     0   0           0x000000000000 old-age offline n/a  n/a
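Every disk in the dumps above reports a non-zero udma-crc-error-count (attribute 199) even though the overall status is GOOD. The sketch below pulls that attribute out of skdump-style text. It assumes the Raw column lists the six raw bytes least-significant first, which is consistent with the Pretty column in these dumps (for example `0x210700000000` against 1825). The helper names are illustrative.

```python
def decode_raw(raw_hex):
    """Interpret an skdump Raw field such as '0x210700000000' as little-endian."""
    digits = raw_hex[2:] if raw_hex.lower().startswith("0x") else raw_hex
    return int.from_bytes(bytes.fromhex(digits), "little")


def crc_counts(skdump_text):
    """Map each 'Device:' header to its udma-crc-error-count raw value."""
    counts = {}
    device = None
    for line in skdump_text.splitlines():
        if line.startswith("Device:"):
            device = line.split(":", 1)[1].strip()
        elif "udma-crc-error-count" in line and device is not None:
            # The raw field is the only token on the line starting with 0x.
            raw = next(tok for tok in line.split() if tok.startswith("0x"))
            counts[device] = decode_raw(raw)
    return counts


if __name__ == "__main__":
    sample = """Device: sat16:/dev/sdb
199 udma-crc-error-count        200   200     0   1825        0x210700000000 old-age online  n/a  n/a
"""
    print(crc_counts(sample))
```

Because this counter is cumulative over the drive's lifetime, a single reading cannot tell whether the errors predate the new cables. Capturing the values before and after a reproduction run and comparing them is one way to check whether they are still increasing.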

mdadm status:

/dev/md0:
           Version : 1.2
     Creation Time : Sun Jul 11 18:12:41 2021
        Raid Level : raid5
        Array Size : 7813772288 (7.28 TiB 8.00 TB)
     Used Dev Size : 3906886144 (3.64 TiB 4.00 TB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Mar  5 12:58:11 2025
             State : clean 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : crab:0  (local to host crab)
              UUID : b9f769fb:49026686:78737cc2:90e8e63c
            Events : 23954

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       16        1      active sync   /dev/sdb
       3       8       48        2      active sync   /dev/sdd
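As a quick sanity check on the figures above, the reported Array Size is what a three-member RAID5 should give: one member's worth of space goes to parity, so the usable size is (members - 1) × Used Dev Size. The sketch assumes mdadm reports both sizes in KiB, which agrees with the TiB/TB values printed next to them.

```python
def raid5_array_size_kib(raid_devices, used_dev_size_kib):
    """Usable RAID5 capacity: one device's worth of space holds parity."""
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least two member devices")
    return (raid_devices - 1) * used_dev_size_kib


if __name__ == "__main__":
    size_kib = raid5_array_size_kib(3, 3906886144)
    print(size_kib)                      # compare with the Array Size line
    print(round(size_kib / 2**30, 2))    # size in TiB
```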

Update: I can reproduce the problem consistently if I run an mdadm check.
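For anyone who wants to reproduce this, one common way to start a check is through the md sysfs interface. This is a sketch, not necessarily the exact command I used: the array name `md0` comes from the status output above, the function names are illustrative, and writing to sysfs requires root.

```python
from pathlib import Path


def request_md_check(array="md0", sysfs_root="/sys/block"):
    """Ask the md driver to start a read-only consistency check of the array."""
    action = Path(sysfs_root) / array / "md" / "sync_action"
    action.write_text("check\n")
    return action


def read_mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch counter the md driver maintains for the array."""
    counter = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())


if __name__ == "__main__":
    # Needs root and an existing /dev/md0; watch dmesg while the check runs.
    request_md_check("md0")
```

The `sysfs_root` parameter exists only so the functions can be exercised against a scratch directory instead of the live system.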

  • dmesg logs without the noncq flag: https://pastebin.com/sz1sXNQ1
  • dmesg logs with the noncq flag enabled in /etc/default/grub: https://pastebin.com/Aib0B8wz

I also noticed that my drives, despite being WD Reds, are actually SMR drives. I'm starting to suspect that this might be the problem, though I'm not sure.
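When comparing the two logs it helps to confirm that the flag actually reached the kernel after regenerating the GRUB configuration and rebooting. The sketch below assumes the flag was passed as a `libata.force=` kernel parameter (for example `libata.force=noncq`, optionally prefixed with a port ID); the function name is illustrative.

```python
def ncq_disabled(cmdline):
    """Return True if the kernel command line forces NCQ off via libata.force."""
    for param in cmdline.split():
        key, _, value = param.partition("=")
        if key != "libata.force":
            continue
        # Entries are comma-separated, each optionally prefixed by "PORT[.DEVICE]:".
        for entry in value.split(","):
            if entry.rpartition(":")[2] == "noncq":
                return True
    return False


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as fh:
        print(ncq_disabled(fh.read()))
```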

ubuntu
  • 1 answer
  • 81 Views
