Any idea why the error rate drops with every loop?
Dual-channel Geil EVO Spear DDR4-2133
sudo memtester 6000 5
memtester version 4.5.1 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 6000MB (6291456000 bytes)
got 6000MB (6291456000 bytes), trying mlock ...locked.
Loop 1/5:
Stuck Address : testing 2FAILURE: possible bad address line at offset 0x000c04d0.
Skipping to next test...
Random Value : FAILURE: 0x3ed5db7679ece79f != 0x3ed7db7679ece79f at offset 0x2027fce0.
FAILURE: 0xf7fc8f5ffe7e74ef != 0xf7fe8f5ffe7e74ef at offset 0x35fa5ce0.
FAILURE: 0x10400b8d82107fb3 != 0x10420b8d82107fb3 at offset 0x35fa5ce0.
Compare XOR : FAILURE: 0x60a20d1c0800e073 != 0x60a00d1c0800e073 at offset 0x61914de0.
Compare SUB : Compare MUL : ok
FAILURE: 0x2000000000004 != 0x00000004 at offset 0x319f1ce0.
FAILURE: 0x2000000000004 != 0x00000004 at offset 0x659e9ce0.
Compare DIV : FAILURE: 0xfffdffffffffffff != 0xffffffffffffffff at offset 0x35fa5ce0.
Compare OR : FAILURE: 0xc7f4e347d8fafda != 0xc7e4e347d8fafda at offset 0x0a5fb5d0.
FAILURE: 0xc7f4e347d8fafda != 0xc7e4e347d8fafda at offset 0x3c3094d0.
Compare AND : FAILURE: 0xffec15854f70c19d != 0xffee15854f70c19d at offset 0x088b5ce0.
FAILURE: 0xffec1585563811bd != 0xffee1585563811bd at offset 0x3ec5dde0.
Sequential Increment: Solid Bits : testing 2FAILURE: 0xfffeffffffffffff != 0xffffffffffffffff at offset 0x09c774d0.
Block Sequential : testing 0FAILURE: 0x2000000000000 != 0x00000000 at offset 0x659e9ce0.
FAILURE: 0x1000000000000 != 0x00000000 at offset 0x692f15d0.
FAILURE: 0x2000000000000 != 0x00000000 at offset 0x7a8adce0.
Checkerboard : testing 1FAILURE: 0xaaa8aaaaaaaaaaaa != 0xaaaaaaaaaaaaaaaa at offset 0x35fa5ce0.
Bit Spread : testing 0FAILURE: 0x1000000000005 != 0x00000005 at offset 0x3abe94d0.
Bit Flip : testing 3FAILURE: 0x2000000000001 != 0x00000001 at offset 0x3abe5de0.
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 2/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
FAILURE: 0x2000000000001 != 0x00000001 at offset 0x53351ce0.
FAILURE: 0x2000000000001 != 0x00000001 at offset 0x65ae9ce0.
Compare DIV : Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing 3FAILURE: 0x2000000000000 != 0x00000000 at offset 0x659e9ce0.
Block Sequential : testing 0FAILURE: 0x2000000000000 != 0x00000000 at offset 0x319f1ce0.
Checkerboard : testing 11FAILURE: 0xaaa8aaaaaaaaaaaa != 0xaaaaaaaaaaaaaaaa at offset 0x35fa5ce0.
Bit Spread : ok
Bit Flip : testing 77FAILURE: 0x1000000000200 != 0x00000200 at offset 0x692f15d0.
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 3/5:
Stuck Address : testing 5FAILURE: possible bad address line at offset 0x3e2814d0.
Skipping to next test...
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing 15FAILURE: 0x1000000000000 != 0x00000000 at offset 0x692f15d0.
Block Sequential : testing 152FAILURE: 0x989a989898989898 != 0x9898989898989898 at offset 0x22c61de0.
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 4/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 5/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Done.
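Replace the memory modules. If the problem persists, ask hardware support to help replace other components that may be failing, such as fans, the power supply, the CPU, or the system board.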
Check error-detection hardware, ECC memory, and other RAS features for hints about what exactly failed.
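On Linux, one place such hints can show up is the kernel's EDAC sysfs tree, which exposes corrected and uncorrected error counters per memory controller. The sketch below is only an illustration: it assumes the EDAC driver for your platform is loaded and that the memory supports ECC, which may well not be the case for consumer DDR4 modules. The function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} for each mc* directory.

    An empty dict means no EDAC memory controllers were found, which is
    expected on systems without ECC memory or without an EDAC driver loaded.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts

if __name__ == "__main__":
    print(read_edac_counts())
```

An empty result does not mean the memory is healthy; it only means this particular reporting path is unavailable on the machine.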
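memtester will not tell you where or why the hardware is failing, only that the values stored by its test operations do not match what was expected.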
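That said, the mismatched pairs themselves carry a little information: XORing the two values in each FAILURE line shows which bits differed. The following is a minimal sketch, assuming the log has been saved as text. The helper name `flipped_bits` and the regular expression are illustrative and not part of memtester. Applied to the pairs in the log above, the differences appear to fall only on bits 48 and 49, which may hint at one data line or chip, although that is an inference and not something memtester reports.

```python
import re

# Matches lines such as "FAILURE: 0xAAA != 0xBBB at offset 0xCCC."
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)

def flipped_bits(log_text):
    """Map each differing bit position to the number of failures it appears in."""
    counts = {}
    for a, b, _offset in FAILURE_RE.findall(log_text):
        diff = int(a, 16) ^ int(b, 16)  # set bits mark positions that disagree
        bit = 0
        while diff:
            if diff & 1:
                counts[bit] = counts.get(bit, 0) + 1
            diff >>= 1
            bit += 1
    return counts

# Three lines copied from the log above.
sample = """\
Random Value : FAILURE: 0x3ed5db7679ece79f != 0x3ed7db7679ece79f at offset 0x2027fce0.
Block Sequential : testing 0FAILURE: 0x2000000000000 != 0x00000000 at offset 0x659e9ce0.
FAILURE: 0x1000000000000 != 0x00000000 at offset 0x692f15d0.
"""

if __name__ == "__main__":
    print(flipped_bits(sample))  # {49: 2, 48: 1}
```

Stuck Address failures carry no value pair, so this sketch ignores them.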
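Isolating environmental factors such as temperature, voltage, and radiation, working out which physical module is at fault, and tracking down the cause of intermittent problems (which is harder still) takes a lot of time. For consumable components that effort is not worth it; decommission them instead. Perhaps move them to a test environment you do not care about, to see whether such hardware failures cause problems for your workload.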