AskOverflow.Dev


Questions tagged [sata] (server)

Martin Hope
Esben von Buchwald
Asked: 2022-03-08 04:55:21 +0800 CST

Synology Storage Manager rejects healthy disks as "Critical" because of "reset command" errors?

  • 1

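I'm trying to install new drives in my DS1515+. The drives are brand-new Seagate Exos X.

I bought 3 of these drives. 1 of them works fine and is now part of my storage pool, but the other 2 give me an error when I install them in the DS. The error is "Multiple reset command errors have occurred...".

In Storage Manager I can't proceed, and the system won't let me initialize the disks to add them to the storage pool. I've tried a secure erase on one of the disks, without success.

I've looked at the dmesg output in the DS Linux terminal and found no errors for the disks that are giving me trouble.

And in all of the expanded views I can see that no reset/re-identify/reconnect errors etc. have occurred. So why does Storage Manager stay in the "Critical" state?

Both disks work fine when connected to another computer, so I believe they are 100% physically OK.

I suspect the DS is rejecting the disks because of some historical data. At first I tried connecting the two problem disks through the 2 eSATA ports of the DS1515+, but the disks never showed up in Storage Manager, and I saw some reset/connection errors in the dmesg log (using the Linux terminal). This was probably caused by an old, unstable eSATA cable. But my theory is that what happened while these disks were connected over eSATA made the DS "blacklist" them, because they once caused "reset commands" due to the flaky cable.

How can I force the DS to accept these disks and let me use them? Is there a way to reset any stored history about these disks and have the DS re-evaluate them?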


hard-drive storage sata synology
  • 1 answer
  • 639 Views
Martin Hope
fi11222
Asked: 2022-02-23 23:43:10 +0800 CST

SATA errors in journalctl while SMART diagnostics are fine - motherboard problem?

  • 0

After noticing unusually long disk operation delays, I looked at journalctl, and this is what I found:

Feb 22 14:02:11.711182 Onan01 kernel: ata10: hard resetting link
Feb 22 14:02:12.186958 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:02:12.187044 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:02:12.187068 Onan01 kernel: ata10: EH complete
Feb 22 14:02:22.782960 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:02:22.783033 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:03:27.472083 Onan01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0xd0000 action 0x6 frozen
Feb 22 14:03:27.472241 Onan01 kernel: ata10: SError: { PHYRdyChg CommWake 10B8B }
Feb 22 14:03:27.472271 Onan01 kernel: ata10.00: failed command: WRITE DMA EXT
Feb 22 14:03:27.472300 Onan01 kernel: ata10.00: cmd 35/00:18:00:35:44/00:00:74:00:00/e0 tag 14 dma 12288 out
                                               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 22 14:03:27.472323 Onan01 kernel: ata10.00: status: { DRDY }
Feb 22 14:03:27.472345 Onan01 kernel: ata10: hard resetting link
Feb 22 14:03:27.950979 Onan01 kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 22 14:03:27.951084 Onan01 kernel: ata10.00: configured for UDMA/33
Feb 22 14:03:27.951113 Onan01 kernel: ata10: EH complete
Feb 22 14:04:03.852081 Onan01 kernel: ata10.00: exception Emask 0x10 SAct 0x0 SErr 0x40d0000 action 0xe frozen
Feb 22 14:04:03.852242 Onan01 kernel: ata10.00: irq_stat 0x00000040, connection status changed
Feb 22 14:04:03.852274 Onan01 kernel: ata10: SError: { PHYRdyChg CommWake 10B8B DevExch }
Feb 22 14:04:03.852301 Onan01 kernel: ata10.00: failed command: WRITE DMA EXT
Feb 22 14:04:03.852325 Onan01 kernel: ata10.00: cmd 35/00:38:58:35:44/00:00:74:00:00/e0 tag 17 dma 28672 out
                                               res 50/00:00:38:23:00/00:00:ac:00:00/e0 Emask 0x10 (ATA bus error)
Feb 22 14:04:03.852357 Onan01 kernel: ata10.00: status: { DRDY }
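The `cmd` lines above are libata taskfile dumps. A minimal bash sketch for decoding them, assuming the field order libata prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and a non-NCQ 48-bit command such as 0x35 WRITE DMA EXT; `decode_ata_cmd` is a hypothetical helper, not an existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a libata "cmd" taskfile dump such as
#   35/00:18:00:35:44/00:00:74:00:00/e0
# Assumes a non-NCQ 48-bit (EXT) command, where the sector count lives in
# nsect/hob_nsect (NCQ commands put the count in the feature field instead).
decode_ata_cmd() {
  local cmd feat nsect lbal lbam lbah hfeat hnsect hlbal hlbam hlbah dev
  # Split on both '/' and ':' into the twelve register bytes.
  IFS='/:' read -r cmd feat nsect lbal lbam lbah hfeat hnsect hlbal hlbam hlbah dev <<< "$1"
  # 16-bit sector count: high byte from hob_nsect, low byte from nsect.
  local sectors=$(( (16#$hnsect << 8) | 16#$nsect ))
  # 48-bit LBA built from the six LBA bytes, most significant first.
  local lba=$(( (16#$hlbah << 40) | (16#$hlbam << 32) | (16#$hlbal << 24) | (16#$lbah << 16) | (16#$lbam << 8) | 16#$lbal ))
  echo "cmd=0x$cmd sectors=$sectors bytes=$(( sectors * 512 )) lba=$lba"
}

decode_ata_cmd '35/00:18:00:35:44/00:00:74:00:00/e0'
decode_ata_cmd '35/00:38:58:35:44/00:00:74:00:00/e0'
```

For the first failed command this prints `cmd=0x35 sectors=24 bytes=12288 lba=1950627072`, and the byte count agrees with `dma 12288` on the same log line; the second gives 56 sectors (28672 bytes, matching `dma 28672`) at LBA 1950627160, right next to the first.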

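The first kind of error (timeout) seems to be more frequent than the second (ATA bus error). There are quite a few of each. SATA channel ata10 is connected to a WD Caviar Green HDD.

SMART diagnostics on this disk are apparently clean: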

sudo smartctl --all /dev/sdf1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-100-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EZAZ-00GGJB0
Serial Number:    WD-WXT1A29LE265
LU WWN Device Id: 5 0014ee 211b07a4f
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 23 11:37:14 2022 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (32520) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 103) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x3035) SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   170   021    Pre-fail  Always       -       1783
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1573
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       13100
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1524
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       761
193 Load_Cycle_Count        0x0032   147   147   000    Old_age   Always       -       160779
194 Temperature_Celsius     0x0022   115   104   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13100         -
# 2  Short offline       Completed without error       00%     13099         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

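One odd thing is that long SMART tests don't seem to work properly. They jump straight from 90% progress to finished (no 80%, 70%, etc.), and afterwards they don't appear in the "SMART Self-test log" section.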
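The number in parentheses after `Self-test execution status:` in `smartctl -c` output (here `(   0)`) is a packed byte: the high nibble is a status code and the low nibble is the remaining part of the test in tenths. A small bash sketch for decoding it while polling a long test started with `smartctl -t long`; `decode_selftest_status` is a hypothetical helper, and only the status codes listed in the comments are mapped by name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the decimal value smartctl shows after
# "Self-test execution status:", e.g. "( 249)".
# High nibble = status code, low nibble = tenths of the test remaining.
decode_selftest_status() {
  local v=$1
  local code=$(( v >> 4 )) remaining=$(( (v & 0x0f) * 10 ))
  case "$code" in
    0)  echo "completed without error" ;;
    1)  echo "aborted by host" ;;
    2)  echo "interrupted by host with reset" ;;
    15) echo "in progress, ${remaining}% remaining" ;;
    *)  echo "status code $code, ${remaining}% remaining" ;;  # other failure codes, printed raw
  esac
}

decode_selftest_status 249
decode_selftest_status 0
```

With these inputs it prints `in progress, 90% remaining` (249 is 0xF9) and `completed without error`, so polling the value during a test shows directly whether the drive ever reports progress below 90%.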

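I've experienced file-operation delays two days in a row. After a reboot the problem seemed to go away, then it came back. Specifically, it shows up as long delays when copying or moving files and as LibreOffice hanging when saving files. Any idea what causes this kind of error?

OS: Ubuntu 20.04

CPU: Ryzen 3

MB: Gigabyte X570 UD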

smart sata hdd
  • 1 answer
  • 86 Views
Martin Hope
serveraddict
Asked: 2021-12-07 10:00:25 +0800 CST

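Can you run a SAS drive from a SAS backplane to SATA on the motherboard?

  • 0

Plug a SAS drive into a SAS port on the backplane and connect that to SATA on the motherboard. Can it be used this way? It's a test server for cloning another SSD.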

hard-drive sas sata
  • 1 answer
  • 161 Views
Martin Hope
Gnarflord
Asked: 2021-10-26 04:55:01 +0800 CST

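What can prevent HDD hot-swapping with Linux AHCI?

  • 0

I'm tearing my hair out over this one.

I want to add a hot-swap bay to my home server so I can easily add and remove HDDs, e.g. to rotate off-site backups. The motherboard in question is an ASRock J4105-ITX with four native SATA ports, split between an ASM1062 and the Intel processor's SATA controller. Both work fine and use the ahci kernel module. There is a hot-plug option in the BIOS, but it seems to have no effect.

If a drive is disconnected (via echo 1 > /sys/block/sdX/device/delete or by simply pulling it out), no new device is recognized after reconnecting. I've tried forcing a rescan (echo "- - -" > /sys/class/scsi_host/host<n>/scan) to no avail; the SATA port effectively stays unusable until the next reboot. I also tried some more drastic commands, without any luck: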

echo 1 > /sys/class/scsi_device/2:0:0:0/device/reset
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/rescan
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/reset

(taken from How can I get Linux to recognize a new SATA /dev/sda drive I hot-swapped in without rebooting?)
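The single-host rescan can be extended to every SCSI host, including each libata/AHCI port, by looping over `/sys/class/scsi_host/host*/scan`. A minimal bash sketch, to be run as root; the optional directory argument exists only so the hypothetical `rescan_all_scsi_hosts` helper can be tried on a scratch tree. It issues the same `- - -` wildcard scan as above, so if the controller never reports the newly inserted drive it will not help either:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask every SCSI host to rescan all channels, targets
# and LUNs by writing the "- - -" wildcard to its scan attribute.
# The sysfs directory is a parameter (default: the real one) for testing.
rescan_all_scsi_hosts() {
  local root="${1:-/sys/class/scsi_host}"
  local scan
  for scan in "$root"/host*/scan; do
    [ -e "$scan" ] || continue   # the glob matched nothing
    echo "- - -" > "$scan"
  done
}
```

Usage (as root): define the function, then call `rescan_all_scsi_hosts` with no arguments.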

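"Well, maybe the chipset doesn't support hot-plugging or the BIOS is broken." So I ordered two PCIe SATA controllers (one with an ASM1064, the other with a Marvell 88SE9215). Both show the same problem, even though other buyers say hot-swapping works for them, so I'm guessing the problem is software-related (my installation? I'm running Arch Linux, which I dutifully keep up to date).

Some hopefully useful information: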

$ uname -a
Linux servername 5.14.14-arch1-1 #1 SMP PREEMPT Wed, 20 Oct 2021 21:35:18 +0000 x86_64 GNU/Linux

$ dmesg | grep ahci
[    0.447450] ahci 0000:00:12.0: version 3.0
[    0.447842] ahci 0000:00:12.0: SSS flag set, parallel bus scan disabled
[    0.457970] ahci 0000:00:12.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    0.457981] ahci 0000:00:12.0: flags: 64bit ncq sntf stag pm clo only pmp pio slum part sxs deso sadm sds apst 
[    0.458750] scsi host0: ahci
[    0.459204] scsi host1: ahci
[    0.469788] ahci 0000:01:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[    0.469801] ahci 0000:01:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    0.470767] scsi host2: ahci
[    0.471203] scsi host3: ahci
[    0.471562] scsi host4: ahci
[    0.471904] scsi host5: ahci
[    0.472341] ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
[    0.472376] ahci 0000:04:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    0.472382] ahci 0000:04:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc 
[    0.472803] scsi host6: ahci
[    0.473011] scsi host7: ahci

$ lspci -v
[...]
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
    Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
    Flags: bus master, fast devsel, latency 0, IRQ 127
    I/O ports at e050 [size=8]
    I/O ports at e040 [size=4]
    I/O ports at e030 [size=8]
    I/O ports at e020 [size=4]
    I/O ports at e000 [size=32]
    Memory at a1340000 (32-bit, non-prefetchable) [size=2K]
    Expansion ROM at a1300000 [disabled] [size=256K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [70] Express Legacy Endpoint, MSI 00
    Capabilities: [e0] SATA HBA v0.0
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: ahci
[...]
arch-linux sata hotswap acpi
  • 1 answer
  • 402 Views
Martin Hope
Parker Kemp
Asked: 2021-09-20 17:43:12 +0800 CST

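Troubleshooting a SATA/SAS reverse breakout cable

  • 1

For a new server build, I'm trying to connect a SAS HDD backplane (SFF-8643) to a SATA motherboard. From my research, this should be possible with a reverse breakout cable.

I bought and installed two "SFF-8643 to 4x SATA" reverse breakout cables, but when I boot the machine it doesn't recognize any of the disks on the backplane (all SATA). As a sanity check, I connected one of the disks directly to the motherboard, and it works fine.

Am I right that this configuration should work? What other troubleshooting can I do?

The chassis/backplane is described as a 2U 8-bay 2.5" / 3.5" HDD / SSD rackmount storage chassis with Mini-SAS HD SFF-8643 12 Gb/s interface: https://www.silverstonetek.com/product.php?pid=922&area=en

The cables are described as Cable Matters Internal HD Mini SAS to SATA (SFF-8643 to 4x SATA) Reverse Breakout Cable 3.3 Feet/1m: https://www.newegg.com/p/0S8-02PG-00240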

sas sata
  • 1 answer
  • 281 Views
Martin Hope
mike
Asked: 2021-08-09 01:26:03 +0800 CST

Disk problem: irq_stat 0x20000000, host bus error

  • 0

When copying large files (50+ GB) from an NVMe disk to a SATA 7200 rpm HDD, I see the following errors in the logs on a fully patched Ubuntu 20.04:

Aug 08 00:45:59 host kernel: ata6.00: exception Emask 0x20 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 08 00:45:59 host kernel: ata6.00: irq_stat 0x20000000, host bus error
Aug 08 00:45:59 host kernel: ata6.00: failed command: WRITE DMA EXT
Aug 08 00:45:59 host kernel: ata6.00: cmd 35/00:08:30:a2:e0/00:00:e8:00:00/e0 tag 23 dma 4096 out
                                    res 50/00:00:00:00:00/00:00:00:00:00/00 Emask 0x20 (host bus error)
Aug 08 00:45:59 host kernel: ata6.00: status: { DRDY }
Aug 08 00:45:59 host kernel: ata6: hard resetting link
Aug 08 00:46:00 host kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 08 00:46:00 host kernel: ata6.00: configured for UDMA/133
Aug 08 00:46:00 host kernel: ata6: EH complete

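ata6.00 is the disk being written to.
The problem is intermittent. Sometimes it doesn't appear for 24 hours, sometimes it appears several times an hour. Usually the disk recovers, but sometimes the filesystem gets corrupted and has to be unmounted, repaired (if possible) and remounted.

What I've tried:

  1. I tried HDDs from 3 different brands. All have the same problem.
  2. I suspected a hardware problem, so I replaced the motherboard and the SATA cables. Neither helped.
  3. I have another server with the same configuration. The problem doesn't happen there. Same workload.
  4. I also have another server with a completely different configuration (Intel vs AMD). The problem happens there. Same workload.
  5. I disabled NCQ with echo 1 > /sys/block/sda/device/queue_depth. Didn't help.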
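Step 5 writes to `sda`, while the errors are reported on `ata6.00`; the kernel's `ataN` port numbers and `sdX` names are assigned independently, so it is worth confirming which block device sits behind `ata6` first. On libata systems `/sys/block/sdX/device` resolves to a path containing the `ataN` component, and the hypothetical helper below simply looks for it (an assumption that does not hold for disks behind SAS HBAs):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the block device(s) whose sysfs path runs
# through a given libata port, e.g. "ata6". On libata systems
# /sys/block/sdX/device resolves to something like
#   /sys/devices/pci0000:00/0000:00:17.0/ata6/host5/target5:0:0/5:0:0:0
# The sysfs root is a parameter so the function can be tried on a fake tree.
ata_to_block() {
  local port="$1" root="${2:-/sys/block}"
  local dev target
  for dev in "$root"/sd*; do
    [ -e "$dev/device" ] || continue
    target=$(readlink -f "$dev/device")
    case "$target" in
      */"$port"/*) basename "$dev" ;;   # exact component, so ata6 != ata60
    esac
  done
}
```

For example, `ata_to_block ata6` prints the matching `sdX`, which is the name to use in the `queue_depth` path.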

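I'm out of ideas...
These are all datacenter-grade components. Given the steps I've taken, I doubt it's a hardware manufacturing defect.
Could this be software/OS/BIOS related?
Any ideas what else I should try?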

hard-drive ubuntu sata drive-failure
  • 2 answers
  • 205 Views
Martin Hope
Peit
Asked: 2021-03-01 00:19:38 +0800 CST

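Relatively new WD Red Pro produces ATA status: 41 (DRDY ERR), error: 40 (UNC) on FreeBSD 12.2

  • 2

I'm running a TrueNAS server based on FreeBSD 12.2. I migrated my storage to 10 TB WD Red Pro drives. They have now been running for 42 days.

Suddenly, during a ZFS scrub, one of the disks produced 5 errors. They all read more or less like this: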

(ada2:ahcich14:0:0:0): READ_FPDMA_QUEUED. ACB: 60 b8 08 3a 0f 40 f8 01 00 07 00 00
(ada2:ahcich14:0:0:0): CAM status: ATA Status Error
(ada2:ahcich14:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada2:ahcich14:0:0:0): RES: 41 40 90 3b 0f 40 f8 01 00 30 06
(ada2:ahcich14:0:0:0): Retrying command, 3 more tries remain
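The `ATA status: 41` and `error: 40` values are bitfields. A bash sketch that names the set bits using the classic ATA register bit names; newer ACS revisions have made several of these bits obsolete, so treat names other than BSY, DRDY, DF, DRQ, ERR, ICRC, UNC, IDNF and ABRT as historical. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the set bits of a hex byte,
# most significant bit first. Arguments: hex value, then names for bit 0..7.
name_bits() {
  local val=$(( 16#$1 ))
  shift
  local -a names=("$@")
  local -a out=()
  local i
  for (( i = 7; i >= 0; i-- )); do
    if (( val & (1 << i) )); then
      out+=("${names[i]}")
    fi
  done
  echo "${out[*]}"
}

# Status register bits 0..7 (classic ATA names).
decode_ata_status() { name_bits "$1" ERR IDX CORR DRQ DSC DF DRDY BSY; }
# Error register bits 0..7 (classic ATA names).
decode_ata_error()  { name_bits "$1" AMNF TK0NF ABRT MCR IDNF MC UNC ICRC; }

echo "status: $(decode_ata_status 41)  error: $(decode_ata_error 40)"
```

This prints `status: DRDY ERR  error: UNC`, matching the FreeBSD message: the drive was ready, but reported an uncorrectable data error on the read.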

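After the incident I ran an extended SMART test. It produced no errors (apart from the logged ones), and in particular there are no reallocated sectors etc.: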

smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD102KFBX-68M95N0
Serial Number:    [deleted]
LU WWN Device Id: 5 000cca 0b0cd3041
Firmware Version: 83.00A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    [deleted]
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (   87) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1108) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   132   132   054    Old_age   Offline      -       96
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1077
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       215
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       215
194 Temperature_Celsius     0x0002   142   142   000    Old_age   Always       -       42 (Min/Max 25/67)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 5
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 10 08 3a 0f 40 08  15d+21:45:25.729  READ FPDMA QUEUED
  60 80 38 b8 60 0f 40 08  15d+21:45:18.777  READ FPDMA QUEUED
  60 b8 30 f8 58 0f 40 08  15d+21:45:18.775  READ FPDMA QUEUED
  60 b8 28 40 51 0f 40 08  15d+21:45:18.775  READ FPDMA QUEUED
  60 b8 20 80 49 0f 40 08  15d+21:45:15.608  READ FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 28 10 d8 0e 40 08  15d+21:45:10.298  READ FPDMA QUEUED
  60 80 40 48 ef 0e 40 08  15d+21:45:03.370  READ FPDMA QUEUED
  60 b8 38 88 e7 0e 40 08  15d+21:45:03.178  READ FPDMA QUEUED
  60 b8 30 d0 df 0e 40 08  15d+21:45:00.444  READ FPDMA QUEUED
  60 20 20 f0 d3 0e 40 08  15d+21:45:00.286  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 00 90 81 23 40 08  15d+21:41:08.578  READ FPDMA QUEUED
  60 80 10 08 91 23 40 08  15d+21:41:08.336  READ FPDMA QUEUED
  60 b8 08 48 89 23 40 08  15d+21:41:01.627  READ FPDMA QUEUED
  60 b8 f8 d0 79 23 40 08  15d+21:40:57.546  READ FPDMA QUEUED
  60 b8 f0 18 72 23 40 08  15d+21:40:56.899  READ FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 18 f0 d5 17 40 08  15d+21:34:13.263  READ FPDMA QUEUED
  60 20 50 10 0c 18 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 48 58 04 18 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 40 98 fc 17 40 08  15d+21:34:06.288  READ FPDMA QUEUED
  60 b8 38 e0 f4 17 40 08  15d+21:34:06.288  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 1050 hours (43 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 50 28 8b 17 40 08  15d+21:33:33.959  READ FPDMA QUEUED
  60 b8 48 70 83 17 40 08  15d+21:33:16.648  READ FPDMA QUEUED
  60 80 40 e8 82 17 40 08  15d+21:33:16.647  READ FPDMA QUEUED
  ea 00 00 00 00 00 40 08  15d+21:33:16.640  FLUSH CACHE EXT
  61 08 30 f0 fd 3f 40 08  15d+21:33:16.638  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1072         -
# 2  Short offline       Completed without error       00%      1023         -
# 3  Extended offline    Completed without error       00%       946         -
# 4  Short offline       Completed without error       00%       855         -
# 5  Short offline       Completed without error       00%       687         -
# 6  Extended offline    Completed without error       00%       610         -
# 7  Short offline       Completed without error       00%       519         -
# 8  Short offline       Completed without error       00%       279         -
# 9  Extended offline    Completed without error       00%       202         -
#10  Short offline       Completed without error       00%       111         -
#11  Short offline       Completed without error       00%        11         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

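At first I thought I might have bought a defective disk. I would have expected SMART to fail its assessment. However, that is not the case. I don't think it's a faulty PSU, since it's not even a year old. Also, it's a 550 W PSU and the machine draws about 100 W. I also don't think it's a faulty cable, since I ran other disks on it for almost a year without problems. Moreover, with those other disks I actually did have a faulty cable, which I replaced, and the symptoms were different.

I'm considering an RMA for the drive, but I'm not sure it qualifies. What do you think? Could this be a transient error? Any advice appreciated.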

hard-drive freebsd zfs smart sata
  • 2 answers
  • 1278 Views
Martin Hope
Declan Shanaghy
Asked: 2021-02-13 21:40:41 +0800 CST

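Missing LVM PV at boot - dropped to initramfs shell

  • 0

I rebooted my Ubuntu 20.10 server today, and it suddenly started complaining that it couldn't find one of the PVs in the root LV.

After some digging in the shell, I found that the PV is indeed missing. I can't activate the VG without adding the --activationmode partial option.

Some worrying messages keep appearing on the console, namely ata2 softreset... and ata2: SATA link down.

Here are some pictures of that session: https://photos.app.goo.gl/r5FBfdY5XaPa5y9h9

I booted a live Ubuntu desktop and kept exploring, and quickly found that the PV is now present and I can activate and mount the VG without any problem. I also see the SATA messages in dmesg on the live instance, but they don't keep repeating. The disk in question is an SSD. Here is the rest of the dmesg output for it.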

[   50.228406] ata2: softreset failed (1st FIS failed)
[   50.943122] ata2: SATA link down (SStatus 0 SControl 300)
[   56.855151] ata2: SATA link down (SStatus 0 SControl 300)
[   56.855157] ata2.00: link offline, clearing class 1 to NONE
[   56.859920] ata2: limiting SATA link speed to 1.5 Gbps
[   57.731143] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   57.737432] ata2.00: ATA-9: INTEL SSDSC2CT120A3, 300i, max UDMA/133
[   57.737436] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 32), AA
[   57.747452] ata2.00: configured for UDMA/133
[   57.747606] scsi 1:0:0:0: Direct-Access     ATA      INTEL SSDSC2CT12 300i PQ: 0 ANSI: 5
[   57.752238] sd 1:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[   57.755084] sd 1:0:0:0: [sdc] Write Protect is off
[   57.755127] sd 1:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   57.755444] sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   57.755515] sd 1:0:0:0: Attached scsi generic sg2 type 0
[   57.780008]  sdc: sdc1
[   57.780452] sd 1:0:0:0: [sdc] Attached SCSI disk
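The `SStatus`/`SControl` numbers libata prints are hex dumps of the SATA status and control registers: bits 3:0 are DET (device detection), bits 7:4 SPD (link speed) and bits 11:8 IPM (interface power state). A bash sketch of a hypothetical `decode_sstatus` helper that decodes the SStatus value:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the hex SStatus value from lines like
#   "ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)".
# Bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
decode_sstatus() {
  local v=$(( 16#$1 ))
  local det=$(( v & 0xf )) spd=$(( (v >> 4) & 0xf )) ipm=$(( (v >> 8) & 0xf ))
  local speed
  case "$spd" in
    1) speed="1.5 Gbps" ;;   # Gen1
    2) speed="3.0 Gbps" ;;   # Gen2
    3) speed="6.0 Gbps" ;;   # Gen3
    *) speed="no negotiated speed" ;;
  esac
  case "$det" in
    0) echo "no device detected" ;;
    3) echo "device present, link up at $speed (IPM=$ipm)" ;;  # PHY communication established
    *) echo "DET=$det, $speed" ;;  # e.g. 1 = device seen without PHY communication, 4 = PHY offline
  esac
}

for s in 0 113 133; do
  echo "SStatus $s: $(decode_sstatus "$s")"
done
```

In the log above, `SStatus 0` means no device was detected at all, and `SStatus 113` means the link finally came up at 1.5 Gbps. The accompanying `SControl 310` has its SPD field set to 1, the limit applied by the "limiting SATA link speed to 1.5 Gbps" step, whereas `SControl 300` imposes no speed limit.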

Here is the output of vgdisplay -v:

root@ubuntu:~# vgdisplay -v
  /dev/sdb: open failed: No medium found
  /dev/sdb: open failed: No medium found
  --- Volume group ---
  VG Name               ubuntu-vg
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               229.52 GiB
  PE Size               4.00 MiB
  Total PE              58758
  Alloc PE / Size       58758 / 229.52 GiB
  Free  PE / Size       0 / 0   
  VG UUID               ddb9uT-0717-jSfz-phaq-N8il-4OFu-TqR3fG
   
  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/ubuntu-lv
  LV Name                ubuntu-lv
  VG Name                ubuntu-vg
  LV UUID                nWtpix-WsV2-dT3v-RWtc-zPl1-6SdL-sSwIOB
  LV Write Access        read/write
  LV Creation host, time ubuntu-server, 2021-01-25 00:43:30 +0000
  LV Status              available
  # open                 0
  LV Size                229.52 GiB
  Current LE             58758
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Physical volumes ---
  PV Name               /dev/nvme0n1p3     
  PV UUID               55KfPo-ep2o-n3FB-stZz-65gO-J1Bz-Y9evX0
  PV Status             allocatable
  Total PE / Free PE    30141 / 0
   
  PV Name               /dev/sdc1     
  PV UUID               fPg1BI-COwe-n4YJ-Wo4F-c6I5-4f96-hk1oEn
  PV Status             allocatable
  Total PE / Free PE    28617 / 0
   

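I checked my BIOS SATA settings after seeing some posts that linked this message to changing the SATA mode, but I couldn't find a SATA section. BIOS options have gotten a lot more complicated since the last time I had to dig into them!!

Any pointers plzzzzzzz.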

linux ubuntu lvm sata
  • 1 answer
  • 484 Views
Martin Hope
Sylvain Leroux
Asked: 2021-01-08 03:27:09 +0800 CST

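What is the purpose of /sys/block/sd?/device/rescan?

  • 2

I need to "rescan" the SATA bus of our Linux server to find hot-plugged devices that didn't show up. I've seen an old question on the topic (How can I get Linux to recognize a new SATA /dev/sda drive I hot-swapped in without rebooting?), and the information given there does work.

However, I also noticed a rescan entry in /sys/block/sd?/device.

So, can I assume that the following: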

echo 1 > /sys/block/sdd/device/rescan

is equivalent to:

echo 1 > /sys/block/sdd/device/delete
echo "- - -" > /sys/class/scsi_host/host4/scan
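Independently of what `rescan` does, the `delete` + `scan` sequence needs the right host number, and that number has to be looked up before the delete, because `/sys/block/sdd` disappears afterwards. A bash sketch of a hypothetical helper that reads the `hostN` component from the resolved `/sys/block/<dev>/device` path:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SCSI host (e.g. "host4") a block device
# hangs off, taken from the resolved /sys/block/<dev>/device path.
# Returns 1 if the device does not exist or no hostN component is found.
block_to_scsi_host() {
  local dev="$1" root="${2:-/sys/block}"
  local path part
  path=$(readlink -f "$root/$dev/device") || return 1
  local IFS=/          # split the path into its components
  for part in $path; do
    case "$part" in
      host[0-9]*) echo "$part"; return 0 ;;
    esac
  done
  return 1
}
```

For example, run `h=$(block_to_scsi_host sdd)` before `echo 1 > /sys/block/sdd/device/delete`, then `echo "- - -" > /sys/class/scsi_host/$h/scan`.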
linux sata linux-kernel hotswap
  • 1 answer
  • 1690 Views
Martin Hope
Binarus
Asked: 2020-08-24 10:36:44 +0800 CST

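How do I use 12 HDDs with a Supermicro BPN-SAS3-826EL1 backplane?

  • 1

I have a chassis with a Supermicro BPN-SAS3-826EL1 backplane. The backplane accepts 12 HDDs. However, I apparently don't yet understand the cabling between the HBA and the backplane, and I need to actually use all 12 HDD slots.

According to the first figure on page 3-1 of the manual, PRI-J1 and PRI-J2 should be connected to the HBA (hence "inputs"), while PRI-J3 and PRI-J4 are "outputs" that can be connected to a cascaded backplane, which I don't have.

The manual further says that PRI-Jn are SFF-8643 connectors, each carrying the signals for 4 SAS or SATA drives. That would mean I can use only 8 HDDs on the backplane, even though it mechanically provides 12 drive connectors.

What do I need to do to use 12 HDDs with this backplane?

Forgive me if the question is stupid, but I have no experience with backplanes yet, and I've studied this and other Supermicro manuals several times and still don't understand the situation.

Update/clarification

Very sorry for not being clear enough:

I want to use SATA drives (not SAS), and I already have an LSI 9361-8i connected to the backplane with two cables that have SFF-8643 connectors on both ends. With that, the controller sees 8 drives. Now I'm wondering how to use the remaining 4 drives.

The motherboard I'm using has two integrated SATA/SAS controllers, each of which can handle four drives. Do I just need to connect one of them to the backplane's PRI-J3?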

hard-drive sas sata supermicro
  • 2 answers
  • 1522 Views

© 2023 AskOverflow.DEV All Rights Reserved