Questions tagged [crash] - page 1

Danco

Asked: 2021-10-19 12:42:14 +0800 CST

Server randomly freezes and only comes back after a cold boot

0

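I'm facing a very strange problem with one server: it randomly freezes/hangs, with no output on the console, it does not respond to keyboard shortcuts, and it needs a cold boot. When it is cold-booted, there are no errors at all on the boot screen.

It does not freeze under heavy load. At the time of a crash CPU usage is around 9-20% and the load average is about 2-5 (12-core CPU, 128 GB RAM).

We checked the logs; nothing shows a kernel panic or anything related to the problem itself.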

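After every freeze, when we check the logs following the cold boot, we do see the normal OOM reaper killing PHP processes (users hitting their limits). It is not excessive, but it is always there around the freeze. Sometimes the last log lines carry the time of the freeze; sometimes a few lines with an older date appear after the time of the crash, and then the log stops.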
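To see whether OOM activity really clusters before each freeze, the kills could be tallied per process name. The following is only a minimal sketch: it assumes kernel log lines containing `Killed process <pid> (<name>)`, and the sample lines are invented for illustration, not taken from the affected server.

```python
import re
from collections import Counter

# Assumed line shape: "... Killed process <pid> (<name>) ...".
# Check the exact wording your kernel version emits before relying on this.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def tally_oom_kills(lines):
    """Count OOM kills per process name."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample lines, made up for illustration only.
sample = [
    "Oct 18 03:12:01 host kernel: Killed process 4321 (php) total-vm:512000kB",
    "Oct 18 03:12:09 host kernel: Killed process 4388 (php) total-vm:498000kB",
    "Oct 18 03:15:44 host kernel: some unrelated message",
]
print(tally_oom_kills(sample))
```

Running this over the log window before each freeze would show whether the kill rate differs from a normal day.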

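Nothing in the logs points to a software cause, and under heavy load the server simply runs normally. This machine is an upgrade from an older machine that was stable for years. The freezes are random: one may come a week after boot, or after two days, or after three weeks, and so on.

We also tried to capture a vmcore dump of a freeze, but nothing was captured.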

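It just freezes with no screen output. The server is still powered on but cannot be pinged, SSH is unreachable, and, as I said, the KVM shows no output at all.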

Could this be related to faulty hardware? My suspicion is faulty memory.

I'm really lost on this one. Thanks.

Jame Goat

Asked: 2021-01-25 19:59:59 +0800 CST

My MySQL has crashed many times within a month

2

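My website (WordPress) sometimes stops working with the following error message:

"Unable to connect to the database"

I checked the MySQL log file and found the following crash information: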

---------- 
2021-01-21  0:44:59 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-01-21  0:44:59 0 [Note] InnoDB: Uses event mutexes
2021-01-21  0:44:59 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-01-21  0:44:59 0 [Note] InnoDB: Number of pools: 1
2021-01-21  0:45:00 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-01-21  0:45:00 0 [Note] InnoDB: Initializing buffer pool, total size = 16M, instances = 1, chunk size = 16M
2021-01-21  0:45:00 0 [Note] InnoDB: Completed initialization of buffer pool
2021-01-21  0:45:00 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-01-21  0:45:00 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:45:07 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-01-21  0:45:07 0 [Note] InnoDB: Uses event mutexes
2021-01-21  0:45:07 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-01-21  0:45:07 0 [Note] InnoDB: Number of pools: 1
2021-01-21  0:45:07 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-01-21  0:45:07 0 [Note] InnoDB: Initializing buffer pool, total size = 16M, instances = 1, chunk size = 16M
2021-01-21  0:45:07 0 [Note] InnoDB: Completed initialization of buffer pool
2021-01-21  0:45:07 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-01-21  0:45:07 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:50:02 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-01-21  0:50:02 0 [Note] InnoDB: Uses event mutexes
2021-01-21  0:50:02 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-01-21  0:50:02 0 [Note] InnoDB: Number of pools: 1
2021-01-21  0:50:02 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-01-21  0:50:02 0 [Note] InnoDB: Initializing buffer pool, total size = 16M, instances = 1, chunk size = 16M
2021-01-21  0:50:02 0 [Note] InnoDB: Completed initialization of buffer pool
2021-01-21  0:50:02 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-01-21  0:50:02 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:50:02 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2021-01-21  0:50:02 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-01-21  0:50:02 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-01-21  0:50:02 0 [Note] InnoDB: Setting file '/opt/lampp/var/mysql/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-01-21  0:50:02 0 [Note] InnoDB: File '/opt/lampp/var/mysql/ibtmp1' size is now 12 MB.
2021-01-21  0:50:02 0 [Note] InnoDB: Waiting for purge to start
2021-01-21  0:50:02 0 [Note] InnoDB: 10.4.11 started; log sequence number 215993131; transaction id 221150
2021-01-21  0:50:02 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-01-21  0:50:02 0 [Note] InnoDB: Loading buffer pool(s) from /opt/lampp/var/mysql/ib_buffer_pool
2021-01-21  0:50:02 0 [Note] Server socket created on IP: '::'.
----------
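The excerpt above shows three startup attempts within about five minutes, each recovering from the same checkpoint LSN, with only the last one reaching the "started" line. A minimal sketch for pulling that summary out of a longer log, assuming the line format shown in the excerpt (the function name `summarize_startups` is just an illustration):

```python
import re

# Matches lines such as:
# 2021-01-21  0:45:00 0 [Note] InnoDB: Starting crash recovery ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2}) \d+ \[Note\] (.*)$"
)
RECOVERY_RE = re.compile(r"Starting crash recovery from checkpoint LSN=(\d+)")
STARTED_RE = re.compile(r"InnoDB: \S+ started; log sequence number")


def summarize_startups(lines):
    """Return one dict per crash-recovery attempt found in the log."""
    attempts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        date, time, msg = m.groups()
        recovery = RECOVERY_RE.search(msg)
        if recovery:
            attempts.append(
                {"at": f"{date} {time}", "lsn": int(recovery.group(1)), "started": False}
            )
        elif attempts and STARTED_RE.search(msg):
            # The most recent attempt made it through to a running server.
            attempts[-1]["started"] = True
    return attempts


# Lines copied from the log excerpt in the question.
log_excerpt = """\
2021-01-21  0:45:00 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:45:07 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:50:02 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=215993122
2021-01-21  0:50:02 0 [Note] InnoDB: 10.4.11 started; log sequence number 215993131; transaction id 221150
""".splitlines()

for attempt in summarize_startups(log_excerpt):
    print(attempt)
```

Attempts that never reach "started" suggest the server process was stopped or killed during startup, which is worth correlating with the system log for the same timestamps.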

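I restarted MySQL and my website worked fine again. My MySQL version is: Distrib 10.4.11-MariaDB, for Linux (x86_64), on Ubuntu 20.

This has happened several times over the past month.

I did search earlier posts for a solution, but I still could not solve the problem: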

MySQL server crashes at least 2 times per week

Wordpress + PHP + Apache + MySQL, MySQL crashes about once a month

Has anyone been stuck in this situation and knows how to solve it?

foss4me

Asked: 2020-10-22 15:44:14 +0800 CST

fio 3.23 core dumps when benchmarking many small files

6

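I was asked to provide fio benchmark results for this test dataset: 1048576x1MiB. The overall size is therefore 1 TiB. The set contains 2^20 files of 1 MiB each. The server runs CentOS Linux release 7.8.2003 (Core). It has plenty of memory: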

[root@tbn-6 src]# free -g
              total        used        free      shared  buff/cache   available
Mem:            376           8         365           0           2         365
Swap:             3           2           1

It is not actually a physical server. Rather, it is a Docker container with the following CPU:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
[...]

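Why Docker? We are working on a project that evaluates the appropriateness of using containers instead of physical servers. Back to the fio issue.

I remembered having trouble with fio before when dealing with datasets made up of many small files. So I did the following checks: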

[root@tbn-6 src]# ulimit -Hn
8388608
[root@tbn-6 src]# ulimit -Sn
8388608
[root@tbn-6 src]# cat /proc/sys/kernel/shmmax
18446744073692774399

Everything looks fine to me. I also compiled fio 3.23, the latest version at the time of writing, with GCC 9.

[root@tbn-6 src]# fio --version
fio-3.23
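The open-file check above can also be expressed as a small comparison between the limit and the number of files in the dataset. This sketch assumes, pessimistically, that every file might be open at once; whether fio actually does that depends on the job options. The helper name `fd_limit_covers` is made up for illustration.

```python
import resource


def fd_limit_covers(n_files, soft_limit):
    """True if the soft open-file limit exceeds the number of files."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit > n_files


N_FILES = 2 ** 20  # 1048576 files, from the dataset description

# Limits of the current process (what `ulimit -Sn` / `ulimit -Hn` report).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("current soft/hard limit:", soft, hard)

# The value reported in the question.
print("question's limit covers dataset:", fd_limit_covers(N_FILES, 8388608))
```

With the limit from the question, file descriptors are not the bottleneck, which matches the author's reading.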

Here is the job file:

[root@tbn-6 src]# cat testfio.ini 
[writetest]
thread=1
blocksize=2m
rw=randwrite
direct=1
buffered=0
ioengine=psync
gtod_reduce=1
numjobs=12
iodepth=1
runtime=180
group_reporting=1
percentage_random=90
opendir=./1048576x1MiB

Note: of the options above, the following can be taken out:

[...]
gtod_reduce=1
[...]
runtime=180
group_reporting=1
[...]

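The rest must stay. This is because, in our view, when running fio the job file should be set up to emulate the application's interaction with the storage as closely as possible, even knowing that fio != the application.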
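For anyone reproducing the problem, the reduced job file described in the note can be generated rather than edited by hand. This is a sketch only: it treats the job file as plain INI and uses Python's `configparser`, which is an assumption that holds for this particular file but not necessarily for every fio job file.

```python
import configparser
import io

# Job file copied from the question.
JOB_TEXT = """\
[writetest]
thread=1
blocksize=2m
rw=randwrite
direct=1
buffered=0
ioengine=psync
gtod_reduce=1
numjobs=12
iodepth=1
runtime=180
group_reporting=1
percentage_random=90
opendir=./1048576x1MiB
"""

# Options the author says can be dropped.
OPTIONAL_KEYS = ("gtod_reduce", "runtime", "group_reporting")


def strip_optional(job_text, optional=OPTIONAL_KEYS):
    """Return the job file text with the optional keys removed."""
    # allow_no_value tolerates bare option names; interpolation is disabled
    # so that a literal '%' in a value would not be misread.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(job_text)
    for section in parser.sections():
        for key in optional:
            parser.remove_option(section, key)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(strip_optional(JOB_TEXT))
```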

My first run looked like this:

[root@tbn-6 src]# fio testfio.ini
smalloc: OOM. Consider using --alloc-size to increase the shared memory available.
smalloc: size = 368, alloc_size = 388, blocks = 13
smalloc: pool 0, free/total blocks 1/524320
smalloc: pool 1, free/total blocks 8/524320
smalloc: pool 2, free/total blocks 10/524320
smalloc: pool 3, free/total blocks 10/524320
smalloc: pool 4, free/total blocks 10/524320
smalloc: pool 5, free/total blocks 10/524320
smalloc: pool 6, free/total blocks 10/524320
smalloc: pool 7, free/total blocks 10/524320
fio: filesetup.c:1613: alloc_new_file: Assertion `0' failed.
Aborted (core dumped)

OK, time to use --alloc-size:

[root@tbn-6 src]# fio --alloc-size=776 testfio.ini
smalloc: OOM. Consider using --alloc-size to increase the shared memory available.
smalloc: size = 368, alloc_size = 388, blocks = 13
smalloc: pool 0, free/total blocks 1/524320
smalloc: pool 1, free/total blocks 8/524320
smalloc: pool 2, free/total blocks 10/524320
smalloc: pool 3, free/total blocks 10/524320
smalloc: pool 4, free/total blocks 10/524320
smalloc: pool 5, free/total blocks 10/524320
smalloc: pool 6, free/total blocks 10/524320
smalloc: pool 7, free/total blocks 10/524320
smalloc: pool 8, free/total blocks 8/524288
smalloc: pool 9, free/total blocks 8/524288
smalloc: pool 10, free/total blocks 8/524288
smalloc: pool 11, free/total blocks 8/524288
smalloc: pool 12, free/total blocks 8/524288
smalloc: pool 13, free/total blocks 8/524288
smalloc: pool 14, free/total blocks 8/524288
smalloc: pool 15, free/total blocks 8/524288
fio: filesetup.c:1613: alloc_new_file: Assertion `0' failed.
Aborted (core dumped)

Back to square one :(
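The numbers in the two smalloc messages allow a rough capacity estimate. Note that the per-pool totals for pools 0-7 are identical in both runs, so the second run only added eight more pools of about the same size. The sketch below assumes that each file costs at least one allocation of the size shown in the message (`blocks = 13`) and that the printed totals are each pool's full capacity; both are readings of the output, not confirmed fio internals.

```python
N_FILES = 2 ** 20           # files in the dataset (from the question)
BLOCKS_PER_ALLOC = 13       # "blocks = 13" in the smalloc message
POOL_BLOCKS_FIRST = 524320  # total blocks of pools 0-7 in both runs
POOL_BLOCKS_EXTRA = 524288  # total blocks of pools 8-15 in the second run


def blocks_needed(n_files, blocks_per_alloc=BLOCKS_PER_ALLOC):
    """Lower bound on blocks, assuming one such allocation per file."""
    return n_files * blocks_per_alloc


first_run = 8 * POOL_BLOCKS_FIRST
second_run = first_run + 8 * POOL_BLOCKS_EXTRA
needed = blocks_needed(N_FILES)

print("needed (lower bound):      ", needed)
print("capacity, first run:       ", first_run)
print("capacity, second run:      ", second_run)
print("files that fit, second run:", second_run // BLOCKS_PER_ALLOC)
```

Under these assumptions the second run has room for well under the 2^20 files in the dataset, which would explain why doubling the pool count did not change the outcome.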

I must be missing something. Any help is much appreciated.

King David

Asked: 2020-04-24 05:05:36 +0800 CST

Linux hang + kernel messages about failing to assign memory

0

We have 5 Linux RHEL machines, and we noticed that all of them hung at the same time.

In the messages file we can see the following:

Dec 29 19:54:25 localhost kernel: pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
Dec 29 19:54:25 localhost kernel: pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]

Could this kernel message be the cause of the Linux crash/hang?

Red Hat recommends updating the kernel version (https://access.redhat.com/solutions/2772311),

but they do not mention anything about a Linux crash/hang.
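When comparing the five machines, it can help to extract the device, BAR index, and region size from these messages. A minimal sketch, assuming the message format shown in the question (the function name `parse_bar_failures` is made up for illustration):

```python
import re

# Matches e.g. "pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]"
BAR_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign \[mem size (?P<size>0x[0-9a-f]+)\]"
)


def parse_bar_failures(lines):
    """Return (device, BAR index, size in bytes) for each failed assignment."""
    results = []
    for line in lines:
        m = BAR_RE.search(line)
        if m:
            results.append(
                (m.group("dev"), int(m.group("bar")), int(m.group("size"), 16))
            )
    return results


# The two lines from the question.
messages = [
    "Dec 29 19:54:25 localhost kernel: pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]",
    "Dec 29 19:54:25 localhost kernel: pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]",
]
for dev, bar, size in parse_bar_failures(messages):
    print(f"{dev} BAR {bar}: {size} bytes")
```

The two regions are 64 and 16 bytes. Checking whether the same lines also appear on boots that did not end in a hang would help decide whether they are related to it.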
