Every day or two we get the error below and MySQL crashes:
2024-07-14T05:31:59.327696Z 0 [ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2024-07-14T05:31:59.327741Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: srv0srv.cc:1878:ib::fatal triggered thread 140039700203264
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-07-14T05:31:59Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=1758de1e111952b1f61480360c447dc27d6caddc
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2130f51]
/usr/sbin/mysqld(print_fatal_signal(int)+0x397) [0xfee797]
/usr/sbin/mysqld(my_server_abort()+0x75) [0xfee8e5]
/usr/sbin/mysqld(my_abort()+0xe) [0x212adee]
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x309) [0x237b9b9]
/usr/sbin/mysqld(ib::fatal::~fatal()+0xcf) [0x237e2ff]
/usr/sbin/mysqld(srv_error_monitor_thread()+0x7aa) [0x231c41a]
/usr/sbin/mysqld(void Detached_thread::operator()<void (*)()>(void (*&&)())+0xca) [0x224a87a]
/lib64/libstdc++.so.6(+0xc2b13) [0x7f5de9854b13]
/lib64/libpthread.so.0(+0x81da) [0x7f5dea8bc1da]
/lib64/libc.so.6(clone+0x43) [0x7f5de8e6be73]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Before the crash, the log contains many entries like this:
2024-07-13T04:43:26.039036Z 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 140504724723456 has waited at trx0sys.h line 598 for 240 seconds the semaphore:
Mutex at 0x7fde903975e8, Mutex TRX_SYS created trx0sys.cc:565, locked by 140504716736256
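To see which threads are waiting and which thread holds the mutex across many of these entries, the warnings can be pulled out of the error log with a small script. This is only a rough sketch: the regular expressions are written against the exact line shapes quoted above, and the function name `summarize_semaphore_waits` and the `SAMPLE` text are made up for illustration. Other MySQL versions may format these lines differently.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this post.
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<seconds>\d+) seconds the semaphore:"
)
MUTEX_RE = re.compile(
    r"Mutex at (?P<addr>0x[0-9a-f]+), Mutex (?P<name>\w+) "
    r"created (?P<created>\S+), locked by (?P<holder>\d+)"
)


def summarize_semaphore_waits(log_text):
    """Return (waits, mutexes) extracted line by line from error-log text.

    Each line is matched on its own, so the result does not depend on the
    order in which the wait and mutex lines appear.
    """
    waits, mutexes = [], []
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            waits.append({
                "thread": int(m["thread"]),
                "where": f'{m["file"]}:{m["line"]}',
                "seconds": int(m["seconds"]),
            })
            continue
        m = MUTEX_RE.search(line)
        if m:
            mutexes.append({
                "name": m["name"],
                "created": m["created"],
                "holder": int(m["holder"]),
            })
    return waits, mutexes


# Sample taken from the log lines in this post.
SAMPLE = """\
2024-07-13T04:43:26.039036Z 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 140504724723456 has waited at trx0sys.h line 598 for 240 seconds the semaphore:
Mutex at 0x7fde903975e8, Mutex TRX_SYS created trx0sys.cc:565, locked by 140504716736256
"""

waits, mutexes = summarize_semaphore_waits(SAMPLE)
print(waits)
print(mutexes)
```

If the same holder thread ID shows up in every entry before a crash, that points to a single stuck thread rather than general contention.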
I also see many of these warnings in the log (mostly after a restart, I think):
[Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity.
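I don't know what is causing this. I increased innodb_redo_log_capacity a lot, but it still happened. I also tried turning off innodb_adaptive_hash_index, but that didn't help. I'm thinking of increasing innodb_log_buffer_size, which is currently set to 1M, because I suspect it may be making transactions too slow.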
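To be clear, MySQL seems to stop doing anything at all during those 600 seconds. I can't connect to the server, CPU usage drops to zero, and then we get this error and the crash.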
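Since I can't connect to the database during the outage, I thought that running a scheduled event inside MySQL itself to write the running queries to disk would do the trick, but I was wrong: even the scheduled event stops writing when the outage begins. Another thing I tried was enabling innodb_status_output, which logs the output of SHOW ENGINE INNODB STATUS; to stderr every 15 seconds. Even that stops writing to stderr when the outage begins.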
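Since everything inside mysqld goes quiet during the stall, one thing that could be tried is watching the error log from a separate process and treating a long silence as the start of an outage. The sketch below is only an illustration of that idea, not something from my setup: the function names, the `tolerance` factor, and the log path in the usage note are all hypothetical. The only value taken from above is the 15-second innodb_status_output interval.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime


def looks_stalled(path, expected_interval=15.0, tolerance=4.0, now=None):
    """Return True if the log has been silent for too long.

    With innodb_status_output enabled, the log is expected to grow roughly
    every `expected_interval` seconds. `tolerance` is an arbitrary safety
    factor (an assumption) to avoid false alarms from a single late write.
    """
    return seconds_since_last_write(path, now) > expected_interval * tolerance
```

Called in a loop from outside the server, for example `looks_stalled("/path/to/mysqld-error.log")` (a placeholder path), this would give a timestamp for when the silence began, at which point OS-level information could be captured while mysqld is still hung.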
We are using MySQL version 8.0.34.
The solution was to replace the node MySQL was running on; it was probably a hardware problem. After that, the issue never happened again.