AskOverflow.Dev


Questions tagged [smart] (server)

pyofey
Asked: 2023-03-02 03:56:39 +0800 CST

Interpreting smartctl -a output

  • 5

Please help me understand this output:

root@bdb16e4bb2e3:/opt/scrutiny# smartctl --all /dev/sdb
smartctl 7.2 2020-12-30 r5155 [aarch64-linux-5.15.0-1024-raspi] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SanDisk based SSDs
Device Model:     SanDisk pSSD
Serial Number:    <removed>
LU WWN Device Id: <removed>
Firmware Version: 6EB 1030
User Capacity:    128,043,712,512 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Feb 21 15:00 2023 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  41) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       4
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       120
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       0
165 Total_Write/Erase_Count 0x0002   100   100   000    Old_age   Always       -       2054
171 Program_Fail_Count      0x0002   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0002   100   100   000    Old_age   Always       -       4
173 Avg_Write/Erase_Count   0x0002   100   100   000    Old_age   Always       -       41
174 Unexpect_Power_Loss_Ct  0x0002   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   092   008   000    Old_age   Always       -       8 (Min/Max -4/33)
230 Perc_Write/Erase_Count  0x0002   100   100   000    Old_age   Always       -       136
232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    Pre-fail  Always   FAILING_NOW 0
234 Perc_Write/Erase_Ct_BC  0x0002   100   100   000    Old_age   Always       -       10000
241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       0
242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       0

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Attribute 232 Perc_Avail_Resrvd_Space (0x0003 000 100 005 Pre-fail Always FAILING_NOW 0) is failing. Should I be worried?
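To make the FAILING_NOW flag easier to read, here is a minimal Python sketch that scans attribute rows and reports those whose normalized VALUE has dropped to or below THRESH. The two sample rows are copied from the table above. The function name `failing_attributes` and the rule "a threshold of 000 never triggers" are assumptions made for illustration, not something smartctl documents in this output.

```python
import re

# Two rows copied verbatim from the attribute table in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       4
232 Perc_Avail_Resrvd_Space 0x0003   000   100   005    Pre-fail  Always   FAILING_NOW 0
"""

# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)


def failing_attributes(text):
    """Return (id, name, value, thresh, type) for rows with VALUE <= THRESH."""
    failing = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not an attribute row
        value = int(match["value"])
        thresh = int(match["thresh"])
        # Assumption: a threshold of 0 means the attribute cannot fail.
        if thresh > 0 and value <= thresh:
            failing.append(
                (int(match["id"]), match["name"], value, thresh, match["type"])
            )
    return failing


if __name__ == "__main__":
    for attr in failing_attributes(SAMPLE):
        print(attr)
```

On the sample rows only attribute 232 is reported: its normalized value 000 is below the threshold 005, and it is a Pre-fail attribute, which is consistent with the overall FAILED! assessment at the top of the output.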

smart
  • 1 answer
  • 19 Views
dkd6
Asked: 2022-04-06 06:40:36 +0800 CST

btrfs - failing disk produced checksum errors, disks replaced, errors persist

  • 1

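I have a pair of 3TB disks in a btrfs raid1 array.

One of the disks started to fail (smartd showed bad sectors), so I bought a pair of new 8TB drives to replace both disks in the array.

I replaced both using btrfs replace and then ran btrfs balance, which failed with the following messages: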

[ 5063.136378] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
[ 5063.140428] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912

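I had already seen these messages before replacing the disks, but now that both disks have been replaced, I believe the problem is with btrfs rather than the hardware.

My data is fully backed up, and the filesystem is online and working, but I cannot balance because of this error. Running a scrub yields a small number of uncorrectable errors, just as it did before I replaced the disks.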
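For anyone comparing such messages across runs, here is a small Python sketch that pulls the fields out of the kernel line quoted above. It assumes the number after "failed on" is the logical byte address of the affected tree block and that "wanted"/"found" are transaction generations. The helper name `parse_transid_error` and the `gap` field are made up for this example.

```python
import re

# Kernel message copied from the question.
LOG_LINE = (
    "[ 5063.136378] BTRFS error (device sdc): parent transid verify failed "
    "on 5153170751488 wanted 1433374 found 1417912"
)

PATTERN = re.compile(
    r"BTRFS error \(device (?P<dev>\w+)\): parent transid verify failed "
    r"on (?P<logical>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(line):
    """Extract device, address and generations, or return None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    wanted = int(match["wanted"])
    found = int(match["found"])
    return {
        "device": match["dev"],
        "logical": int(match["logical"]),  # assumed: logical byte address
        "wanted": wanted,
        "found": found,
        "gap": wanted - found,  # how many generations the block lags behind
    }


if __name__ == "__main__":
    print(parse_transid_error(LOG_LINE))
```

Both quoted messages refer to the same address, so in this log a single metadata block appears to be stale, lagging 15462 generations behind what its parent expects.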

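I would like to know how I could, perhaps:

  1. Find the corrupted files and restore them from backup
  2. Reset the transactions on the filesystem to clear the error
  3. Ignore the error while balancing

...or any other reasonable solution.

Thanks!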

storage smart btrfs
  • 1 answer
  • 326 Views
fi11222
Asked: 2022-02-23 23:43:10 +0800 CST

SATA errors appear in journalctl while SMART diagnostics are clean - motherboard problem?

  • 0

After noticing unusually long delays in disk operations, I looked at journalctl, and this is what I found:

Feb 22 14:02:11.711182 Onan01 kernel: ata10: hard resetting link
Feb 22 14:02:12.186958 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:02:12.187044 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:02:12.187068 Onan01 kernel: ata10: EH complete
Feb 22 14:02:22.782960 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:02:22.783033 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:03:27.472083 Onan01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0xd0000 action 0x6 frozen
Feb 22 14:03:27.472241 Onan01 kernel: ata10: SError: { PHYRdyChg CommWake 10B8B }
Feb 22 14:03:27.472271 Onan01 kernel: ata10.00: failed command: WRITE DMA EXT
Feb 22 14:03:27.472300 Onan01 kernel: ata10.00: cmd 35/00:18:00:35:44/00:00:74:00:00/e0 tag 14 dma 12288 out
                                               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 22 14:03:27.472323 Onan01 kernel: ata10.00: status: { DRDY }
Feb 22 14:03:27.472345 Onan01 kernel: ata10: hard resetting link
Feb 22 14:03:27.950979 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:03:27.951084 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:03:27.951113 Onan01 kernel: ata10: EH complete
Feb 22 14:04:03.852081 Onan01 kernel: ata10.00: exception Emask 0x10 SAct 0x0 SErr 0x40d0000 action 0xe frozen
Feb 22 14:04:03.852242 Onan01 kernel: ata10.00: irq_stat 0x00000040, connection status changed
Feb 22 14:04:03.852274 Onan01 kernel: ata10: SError: { PHYRdyChg CommWake 10B8B DevExch }
Feb 22 14:04:03.852301 Onan01 kernel: ata10.00: failed command: WRITE DMA EXT
Feb 22 14:04:03.852325 Onan01 kernel: ata10.00: cmd 35/00:38:58:35:44/00:00:74:00:00/e0 tag 17 dma 28672 out
                                               res 50/00:00:38:23:00/00:00:ac:00:00/e0 Emask 0x10 (ATA bus error)
Feb 22 14:04:03.852357 Onan01 kernel: ata10.00: status: { DRDY }

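The first kind of error (timeout) seems to be more frequent than the second (ATA bus error), but there are quite a few of each. SATA channel ata10 is connected to a WD Caviar Green HDD.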
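The SErr values in the log can be mapped back to the flag names the kernel prints next to them. The Python sketch below decodes the upper (diagnostic) half of the SATA SError register. The four names that appear in the log (PHYRdyChg, CommWake, 10B8B, DevExch) can be checked against it directly; the remaining names in the table are recalled from the Linux libata sources and should be treated as assumptions.

```python
# Bit position -> flag name for the diagnostic half of the SATA SError register.
# Only PHYRdyChg, CommWake, 10B8B and DevExch are confirmed by the log above;
# the other names are assumptions based on libata's naming.
SERROR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all diagnostic bits set in an SError value."""
    return [
        name
        for bit, name in sorted(SERROR_DIAG_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    for serr in (0xD0000, 0x40D0000):
        print(hex(serr), decode_serror(serr))
```

All the flags set here describe the link itself (PHY ready change, wake, 10b/8b decode error, device exchange) rather than data read from the platters, which fits the observation that the drive's own SMART counters stay clean.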

The SMART diagnostics for this disk appear to be clean:

sudo smartctl --all /dev/sdf1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-100-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EZAZ-00GGJB0
Serial Number:    WD-WXT1A29LE265
LU WWN Device Id: 5 0014ee 211b07a4f
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 23 11:37:14 2022 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (32520) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 103) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x3035) SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   170   021    Pre-fail  Always       -       1783
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1573
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       13100
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1524
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       761
193 Load_Cycle_Count        0x0032   147   147   000    Old_age   Always       -       160779
194 Temperature_Celsius     0x0022   115   104   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13100         -
# 2  Short offline       Completed without error       00%     13099         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

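One odd thing is that long SMART tests do not seem to work properly. They go straight from 90% to completed (no 80%, 70%, etc.), and afterwards they do not appear in the "SMART Self-test log" section.

I have experienced file operation delays on two consecutive days. After a reboot the problem seemed to go away, then it came back. Specifically, it shows up as long delays when copying or moving files and as LibreOffice hanging on file save. Any idea what could be causing this kind of error?

OS: Ubuntu 20.04

CPU: Ryzen 3

Motherboard: Gigabyte X570 UD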

smart sata hdd
  • 1 answer
  • 86 Views
Nuno
Asked: 2021-11-25 01:07:56 +0800 CST

NVMe health tests

  • 1

On the servers I have with HDDs or SSDs, I have a cron job that periodically runs:

/usr/sbin/smartctl --test=short/long /dev/sd1

(for each disk)

While the test runs, the script simply looks at the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:

[0-9]+% of test remaining.

It then checks that the test completed without errors:

(   0)  The previous self-test routine completed
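The polling logic described above can be sketched in Python as follows. The two patterns are the ones quoted in the question; the function names, the `read_status` callable and the default 60-second poll interval are assumptions made for this example.

```python
import re
import time

# Patterns taken from the question text.
IN_PROGRESS = re.compile(r"[0-9]+% of test remaining\.")
COMPLETED_OK = re.compile(r"\(\s*0\)\s+The previous self-test routine completed")


def self_test_running(output):
    """True while the status output still reports remaining test progress."""
    return IN_PROGRESS.search(output) is not None


def self_test_passed(output):
    """True if the status output shows execution status 0 (no error)."""
    return COMPLETED_OK.search(output) is not None


def wait_for_self_test(read_status, poll_seconds=60, sleep=time.sleep):
    """Poll read_status() until the test finishes, then report success.

    read_status is any zero-argument callable returning the text of
    `smartctl -c <device>`; it is injected so the loop can be tested
    without a real disk.
    """
    output = read_status()
    while self_test_running(output):
        sleep(poll_seconds)
        output = read_status()
    return self_test_passed(output)
```

In a real cron script, `read_status` could wrap something like `subprocess.run(["/usr/sbin/smartctl", "-c", device], capture_output=True, text=True).stdout`, where `device` is a placeholder for the disk being tested.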

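However, as of version 7.0, smartctl does not yet seem to support self-tests on NVMe, according to: https://www.smartmontools.org/wiki/NVMe_Support

It does say:

The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)

But what actually runs the tests? Unless we run short/long tests, I'm not sure whether the output of -H and -l gets updated.

I have also read about nvme-cli, but I can't seem to find a way to run a health test on the disk with it.

Any ideas?

Using CentOS 7 here.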

centos smart healthcheck nvme smartctl
  • 1 answer
  • 1058 Views
dkd6
Asked: 2021-10-02 04:11:43 +0800 CST

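SMART shows unreadable sectors, btrfs scrub is clean - is this right?

  • 7

I have a pair of disks in RAID1, formatted with btrfs.

The disks are scrubbed regularly and I get notified of the results. They have been running for about 2-3 years without any problems.

However, I recently added smartd to my setup, and it immediately complained about a small number of unreadable sectors on one of the drives: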

Device: /dev/sdc [SAT], 4 Currently unreadable (pending) sectors

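I ran a scrub on that drive, which found and corrected the same number of errors, but the SMART error message did not go away. Subsequent scrubs of the same disk show no errors.

I'm not sure which of these tools is the most accurate: is smartd showing a false positive, is btrfs missing bad sectors, or am I perhaps misinterpreting the results?

What is the best way to verify the health of the disks?

Thanks!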

storage smart btrfs
  • 2 answers
  • 1181 Views
Peit
Asked: 2021-03-01 00:19:38 +0800 CST

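Relatively new WD Red Pro produces ATA status: 41 (DRDY ERR), error: 40 (UNC) on FreeBSD 12.2

  • 2

I am running a TrueNAS server based on FreeBSD 12.2. I migrated my storage to 10 TB WD Red Pro drives. They have now been running for 42 days.

Suddenly, during a ZFS scrub, one of the disks produced 5 errors. They all read more or less like this: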

(ada2:ahcich14:0:0:0): READ_FPDMA_QUEUED. ACB: 60 b8 08 3a 0f 40 f8 01 00 07 00 00
(ada2:ahcich14:0:0:0): CAM status: ATA Status Error
(ada2:ahcich14:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada2:ahcich14:0:0:0): RES: 41 40 90 3b 0f 40 f8 01 00 30 06
(ada2:ahcich14:0:0:0): Retrying command, 3 more tries remain
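The two hex bytes in "ATA status: 41 (DRDY ERR), error: 40 (UNC )" are bit fields. The Python sketch below decodes them using the bit names as FreeBSD prints them; only DRDY, ERR and UNC are confirmed by the log above, and the other names are recalled from the ATA register layout and should be read as assumptions (bit names vary between ATA revisions and by command type).

```python
# Bit masks for the ATA status register, most significant bit first.
# Confirmed by the log above: DRDY (0x40) and ERR (0x01). Others are assumptions.
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "SERV"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "INDEX"),
    (0x01, "ERR"),
]

# Bit masks for the ATA error register.
# Confirmed by the log above: UNC (0x40). Others are assumptions.
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNC"),
    (0x20, "MC"),
    (0x10, "IDNF"),
    (0x08, "MCR"),
    (0x04, "ABRT"),
    (0x02, "NM"),
    (0x01, "ILI"),
]


def decode_bits(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("status 0x41:", decode_bits(0x41, STATUS_BITS))
    print("error  0x40:", decode_bits(0x40, ERROR_BITS))
```

Read this way, the drive reported itself ready (DRDY), flagged an error (ERR), and gave "uncorrectable data" (UNC) as the reason for the failed read.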

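After the incident I ran an extended SMART test, which did not produce any errors (apart from the logged ones), and in particular no reallocated sectors or the like: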

smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD102KFBX-68M95N0
Serial Number:    [deleted]
LU WWN Device Id: 5 000cca 0b0cd3041
Firmware Version: 83.00A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    [deleted]
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (   87) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1108) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   132   132   054    Old_age   Offline      -       96
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1077
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       215
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       215
194 Temperature_Celsius     0x0002   142   142   000    Old_age   Always       -       42 (Min/Max 25/67)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 5
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 10 08 3a 0f 40 08  15d+21:45:25.729  READ FPDMA QUEUED
  60 80 38 b8 60 0f 40 08  15d+21:45:18.777  READ FPDMA QUEUED
  60 b8 30 f8 58 0f 40 08  15d+21:45:18.775  READ FPDMA QUEUED
  60 b8 28 40 51 0f 40 08  15d+21:45:18.775  READ FPDMA QUEUED
  60 b8 20 80 49 0f 40 08  15d+21:45:15.608  READ FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 28 10 d8 0e 40 08  15d+21:45:10.298  READ FPDMA QUEUED
  60 80 40 48 ef 0e 40 08  15d+21:45:03.370  READ FPDMA QUEUED
  60 b8 38 88 e7 0e 40 08  15d+21:45:03.178  READ FPDMA QUEUED
  60 b8 30 d0 df 0e 40 08  15d+21:45:00.444  READ FPDMA QUEUED
  60 20 20 f0 d3 0e 40 08  15d+21:45:00.286  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 00 90 81 23 40 08  15d+21:41:08.578  READ FPDMA QUEUED
  60 80 10 08 91 23 40 08  15d+21:41:08.336  READ FPDMA QUEUED
  60 b8 08 48 89 23 40 08  15d+21:41:01.627  READ FPDMA QUEUED
  60 b8 f8 d0 79 23 40 08  15d+21:40:57.546  READ FPDMA QUEUED
  60 b8 f0 18 72 23 40 08  15d+21:40:56.899  READ FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 18 f0 d5 17 40 08  15d+21:34:13.263  READ FPDMA QUEUED
  60 20 50 10 0c 18 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 48 58 04 18 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 40 98 fc 17 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 38 e0 f4 17 40 08  15d+21:34:06.288  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 50 28 8b 17 40 08  15d+21:33:33.959  READ FPDMA QUEUED
  60 b8 48 70 83 17 40 08  15d+21:33:16.648  READ FPDMA QUEUED
  60 80 40 e8 82 17 40 08  15d+21:33:16.647  READ FPDMA QUEUED
  ea 00 00 00 00 00 40 08  15d+21:33:16.640  FLUSH CACHE EXT
  61 08 30 f0 fd 3f 40 08  15d+21:33:16.638  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1072         -
# 2  Short offline       Completed without error       00%      1023         -
# 3  Extended offline    Completed without error       00%       946         -
# 4  Short offline       Completed without error       00%       855         -
# 5  Short offline       Completed without error       00%       687         -
# 6  Extended offline    Completed without error       00%       610         -
# 7  Short offline       Completed without error       00%       519         -
# 8  Short offline       Completed without error       00%       279         -
# 9  Extended offline    Completed without error       00%       202         -
#10  Short offline       Completed without error       00%       111         -
#11  Short offline       Completed without error       00%        11         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

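At first I thought I might have bought a defective disk, and I would have expected the SMART assessment to fail. However, that is not the case. I don't think it is a faulty PSU, as it is not even a year old; besides, it is a 550 W PSU and the machine draws about 100 W. I also don't think it is a faulty cable, because I ran other disks on it for almost a year without issues. Moreover, with those other disks I actually did have a defective cable at one point, which I replaced, and the symptoms were different.

I am considering an RMA for the drive, but I'm not sure whether it qualifies for one. What do you think? Could this be a transient error? Any advice is appreciated.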

hard-drive freebsd zfs smart sata
  • 2 answers
  • 1278 Views
Mike Texter
Asked: 2020-08-08 09:06:17 +0800 CST

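Resetting SMART (power-on hours) on an SSD

  • 1

A known issue with SanDisk SSDs (Dell or HPE branded) is plaguing us: they fail hard after a certain number of power-on hours, 32768 or 40000 depending on the exact model. Is there a reliable way to roll back this SMART attribute so that we can update the firmware on these drives and get them running again? We have many tools at our disposal, but as far as we know none of them can do this.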

ssd smart drive-failure
  • 1 answer
  • 162 Views
Alexandru
Asked: 2020-02-29 03:41:15 +0800 CST

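Interpreting HDD SMART data

  • 2

I need your opinion on whether the drive below is failing.

When I run "smartctl -a /dev/sda -d megaraid,1", two errors appear at the end of the output, each stating "Error: WP at LBA". I don't see anything suspicious in the SMART attributes.

Here is the full output of "smartctl -a /dev/sda -d megaraid,1".

This HDD is one of two HDDs in a hardware RAID 1 (mirror) configuration on a Dell H330 controller in a Dell PowerEdge server.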
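The LBA shown in each error entry of the output below can be reproduced from the register bytes printed on the same line. This Python sketch assumes the 28-bit layout (low nibble of DH, then CH, CL, SN); the function name `lba28_from_registers` is made up for this example, and the register values are the ones from the "Error 2" entry.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA task-file registers.

    sn = Sector Number, cl = Cylinder Low, ch = Cylinder High,
    dh = Device/Head (only its low 4 bits carry LBA bits 24-27).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    # Registers from the error entry: ER ST SC SN CL CH DH = 40 41 10 0e fb 74 40
    lba = lba28_from_registers(sn=0x0E, cl=0xFB, ch=0x74, dh=0x40)
    print(f"LBA = 0x{lba:08x} = {lba}")
```

Both logged errors carry the same SN/CL/CH/DH bytes, so they point at the same LBA.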

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-957.21.3.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
Device Model:     TOSHIBA MG03ACA300
Serial Number:    73VCK8GDF
LU WWN Device Id: 5 000039 4ebc82c58
Firmware Version: FL1A
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Feb 27 23:05:39 2020 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 510) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       8874
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       12964
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       27
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       42
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Min/Max 11/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   068   068   000    Old_age   Always       -       12994
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       103
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 12901 hours (537 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 10 0e fb 74 40  Error: WP at LBA = 0x0074fb0e = 7666446

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 48 7a e0 40 00  42d+20:47:35.187  WRITE FPDMA QUEUED
  61 08 20 58 89 8a 40 00  42d+20:47:35.187  WRITE FPDMA QUEUED
  61 10 20 48 89 8a 40 00  42d+20:47:35.187  WRITE FPDMA QUEUED
  61 08 20 48 7a e0 40 00  42d+20:47:35.183  WRITE FPDMA QUEUED
  61 08 20 40 89 8a 40 00  42d+20:47:35.183  WRITE FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 12901 hours (537 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 0e fb 74 40  Error: WP at LBA = 0x0074fb0e = 7666446

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 10 10 18 94 de 40 00  42d+20:47:32.312  WRITE FPDMA QUEUED
  60 00 08 00 fc 74 40 00  42d+20:47:32.311  READ FPDMA QUEUED
  60 00 00 00 fb 74 40 00  42d+20:47:32.311  READ FPDMA QUEUED
  60 00 00 00 fa 74 40 00  42d+20:47:32.284  READ FPDMA QUEUED
  60 00 00 00 f9 74 40 00  42d+20:47:32.264  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
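
The LBA printed in each error entry above can be reproduced from the register dump. The following is a minimal Python sketch, assuming the 28-bit layout in which SN, CL, and CH hold the low three bytes and the low nibble of DH holds bits 24-27. The function name `lba_from_registers` is made up for illustration and is not part of smartmontools.

```python
def lba_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine a 28-bit LBA from the SN, CL, CH, and DH register bytes.

    Assumed layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes from "Error 2" above: ER ST SC SN CL CH DH = 40 41 10 0e fb 74 40
lba = lba_from_registers(sn=0x0E, cl=0xFB, ch=0x74, dh=0x40)
print(f"0x{lba:08x} = {lba}")  # matches "LBA = 0x0074fb0e = 7666446" in the log
```

Both logged errors report the same LBA, which is why the SN/CL/CH bytes are identical in the two register lines.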

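Later edit 1:

I also checked iDRAC on the PowerEdge server, and under Storage menu > Summary > Recently Logged Storage Events I found events that correspond to the 2 SMART errors.

Event status: "Disk media error on Disk 1 in Backplane 1 of RAID Controller in Slot 1 was corrected during recovery". The screenshot is below.

[Image: iDRAC > Storage menu > Summary > Recently Logged Storage Events]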

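Later edit 2:

A few days later, Current_Pending_Sector rose to 1 for a few hours and then dropped back to 0.

Reallocated_Sector_Ct, Reallocated_Event_Count, and Offline_Uncorrectable stayed at 0 the whole time.

Another error also appeared in the SMART error log: "Error: UNC at LBA".

However, no further errors appeared in iDRAC.

We decided to replace the drive with a new one, because we no longer trust it.

Thanks!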
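
For tracking the four counters mentioned in edit 2 over time, here is a minimal Python sketch that pulls their raw values out of captured `smartctl` attribute-table text. The regular expression assumes the ten-column layout shown in the output above, and the names `ROW`, `WATCHED`, and `raw_values` are hypothetical helpers, not part of any tool.

```python
import re

# One attribute row, assuming the column layout printed above:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)

# Attributes named in the question's second edit.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def raw_values(report: str) -> dict:
    """Return {attribute name: raw value} for the watched attributes."""
    found = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in WATCHED:
            # The raw column can carry trailing text; keep the leading number.
            found[match.group(2)] = int(match.group(9).split()[0])
    return found


sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
"""
print(raw_values(sample))
```

Comparing the returned dictionary between two polls would show a transient change such as Current_Pending_Sector going from 0 to 1 and back.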

raid hardware-raid smart smartctl dell-perc
  • 2 answers
  • 1756 Views
Chris
Asked: 2016-12-01 16:34:24 +0800 CST

SMART errors from a previous hard drive, or does it need replacing?

  • 3

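Although the disk is quite fresh (Power_On_Minutes 427h+41m), I ran SMART and it came up with some strange errors.

I'm curious: are these errors from a previous hard drive?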

Error 1 occurred at disk power-on lifetime: 13729 hours (572 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle
Error 2 occurred at disk power-on lifetime: 23300 hours (970 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.
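
The mismatch behind the question can be made explicit by comparing the lifetime hours in these log lines with the reported power-on time. The following is a minimal Python sketch; `error_lifetimes` and `days_and_hours` are hypothetical helper names, and the 427-hour figure is taken from the Power_On_Minutes value quoted above, truncated to whole hours. The sketch only shows that the two counters disagree, not which one is correct.

```python
import re

ERROR_LINE = re.compile(
    r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours"
)


def error_lifetimes(log: str) -> dict:
    """Map each error number to the power-on lifetime (hours) it was logged at."""
    return {int(num): int(hours) for num, hours in ERROR_LINE.findall(log)}


def days_and_hours(hours: int) -> tuple:
    """Split a lifetime in hours into (whole days, remaining hours)."""
    return divmod(hours, 24)


log = """\
Error 1 occurred at disk power-on lifetime: 13729 hours (572 days + 1 hours)
Error 2 occurred at disk power-on lifetime: 23300 hours (970 days + 20 hours)
"""

reported_power_on_hours = 427  # from "Power_On_Minutes ... 427h+41m", minutes dropped

for num, hours in sorted(error_lifetimes(log).items()):
    days, rest = days_and_hours(hours)
    beyond = hours > reported_power_on_hours
    print(f"Error {num}: {hours} h = {days} d + {rest} h, beyond reported power-on time: {beyond}")
```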

Here is the output:

# smartctl --all /dev/sda
    smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-51-generic] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     Toshiba 2.5" HDD MK..76GSX
    Device Model:     TOSHIBA MK2576GSX
    Serial Number:    Y1J9S0IGS
    LU WWN Device Id: 5 000039 3a5a06b8e
    Firmware Version: GS001A
    User Capacity:    250,059,350,016 bytes [250 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    5400 rpm
    Form Factor:      2.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Thu Dec  1 00:28:22 2016 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (  120) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        (  81) minutes.
    SCT capabilities:              (0x003d) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1229
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       15
      5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
      9 Power_On_Minutes        0x0032   036   036   000    Old_age   Always       -       427h+41m
     10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
    193 Load_Cycle_Count        0x0032   070   070   000    Old_age   Always       -       304324
    194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27 (Min/Max 20/31)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
    220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       109
    222 Loaded_Hours            0x0032   067   067   000    Old_age   Always       -       13230
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
    224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
    226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       375
    240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

    SMART Error Log Version: 1
    ATA Error Count: 2
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 2 occurred at disk power-on lifetime: 23300 hours (970 days + 20 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      84 51 01 1f 7a 05 e0  Error: ICRC, ABRT 1 sectors at LBA = 0x00057a1f = 358943

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      35 00 00 20 76 05 e0 00   6d+01:49:26.915  WRITE DMA EXT
      35 00 00 00 72 05 e0 00   6d+01:49:26.741  WRITE DMA EXT
      35 00 08 80 0f 0c e0 00   6d+01:49:26.741  WRITE DMA EXT
      35 00 08 48 8a c4 e0 00   6d+01:49:26.741  WRITE DMA EXT
      ca 00 08 00 08 14 e9 00   6d+01:49:26.741  WRITE DMA

    Error 1 occurred at disk power-on lifetime: 13729 hours (572 days + 1 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      84 51 01 3f 8c 4e e0  Error: ICRC, ABRT 1 sectors at LBA = 0x004e8c3f = 5147711

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      35 00 00 40 88 4e e0 00  12d+21:23:20.732  WRITE DMA EXT
      ca 00 08 a8 48 c8 e3 00  12d+21:23:20.731  WRITE DMA
      35 00 08 40 c1 1d e0 00  12d+21:23:20.731  WRITE DMA EXT
      35 00 08 b0 19 14 e0 00  12d+21:23:20.731  WRITE DMA EXT
      35 00 10 28 bf 13 e0 00  12d+21:23:20.731  WRITE DMA EXT

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.

Is this hard drive likely to fail soon and need replacing?

linux smart drive-failure
  • 2 answers
  • 1287 Views
FreeSoftwareServers
Asked: 2016-08-10 11:04:28 +0800 CST

Does "smartctl -H or -all" run anything on the disk, or does it only poll data?

  • 0

I'm currently setting up SMART monitoring, and I have a question about this command:

smartctl -H /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

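Does this actually run anything against the disk, or does it just poll the logs/data currently available to SmartMonTools?

I understand, and am considering, running smartd with short and long tests, but that would be managed by smartd. My script is simple: it just greps for the health status OK and passes or fails based on what it finds. It also displays "smartctl -all /dev/sda", and I'd like to know about that too.

I just want to be sure.

I think neither smartctl -H /dev/sda nor smartctl -all /dev/sda actually performs any test when it runs; they just poll the available data. Can someone confirm?

The reason is that I poll this data frequently with my network monitoring software (currently every 15m). If it doesn't affect the disk, I'll leave it as is and use smartd to schedule the actual self-tests, which will 100% read/write/test the disk.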
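
The pass/fail check described above (grep for the health status line) could be sketched in Python roughly as follows. This is only an illustration of the idea, not the asker's actual script: `PASS_MARKERS` and `health_passed` are hypothetical names, and the two marker strings are the health lines that appear in the outputs quoted on this page.

```python
# Health lines seen in the smartctl outputs quoted on this page.
PASS_MARKERS = (
    "SMART Health Status: OK",
    "SMART overall-health self-assessment test result: PASSED",
)


def health_passed(report: str) -> bool:
    """Return True if any line of the captured report contains a pass marker."""
    return any(
        marker in line
        for line in report.splitlines()
        for marker in PASS_MARKERS
    )


report = """\
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
"""
print("pass" if health_passed(report) else "fail")
```

The function works on text that has already been captured, so how the report is obtained (for example from the monitoring software's own command runner) is left open here.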

hardware smart smartctl
  • 1 answer
  • 470 Views
