我们正在为我们的一个客户管理一个 MySQL 服务器,该客户拥有超过 100 个数据库,每个数据库大约有 50 个表,其中许多是 InnoDB 表。服务器崩溃了,我正试图找到罪魁祸首。重新启动时innodb_force_recovery = 2
,我可以连接并且在 error.log 中看不到任何错误。更重要的是,mysqlcheck --all-databases
所有表格都报告“Ok”。但是当我删除时innodb_force_recovery
,服务器再次崩溃,将堆栈跟踪写入 error.log 并且只能使用kill -9
.
在这些情况下,我应该如何找到有问题的数据库,以及什么会导致 mysqlcheck 错过损坏的表?请不要告诉我忽略它并从转储中恢复所有数据库。这对于一个或两个数据库来说可能是可以接受的,如果它在蓝月亮中只发生一次,但是我不止一次在同一台服务器上遇到问题,从转储中恢复所有内容只需要太多时间和手动工作才能做到这一点时间。
服务器版本为 5.5.46 并且innodb_file_per_table
处于活动状态。
按要求摘录 error.log(是否The tablespace free space info is corrupt
意味着存在不在特定表中且无法更正的错误?):
180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
180222 17:13:48 InnoDB: Using Linux native AIO
180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
180222 17:13:49 InnoDB: Completed initialization of buffer pool
180222 17:13:49 InnoDB: highest supported file format is Barracuda.
180222 17:13:49 InnoDB: Waiting for the background threads to start
180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
180222 17:13:50 [Note] - '0.0.0.0' resolves to '0.0.0.0';
180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.46' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL) by Remi
InnoDB: Dump of the tablespace extent descriptor: len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc n ;
InnoDB: Serious error! InnoDB is trying to free page 512
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180222 17:13:50 InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:13:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=512
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
/usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
[0x7bc420]
[0x7bc410]
/lib/libc.so.6(gsignal+0x50)[0x626b10]
/lib/libc.so.6(abort+0x101)[0x628421]
/usr/libexec/mysqld[0x85012e7]
/usr/libexec/mysqld[0x850147e]
/usr/libexec/mysqld[0x849c0b1]
/usr/libexec/mysqld[0x84a8a61]
/usr/libexec/mysqld[0x8561fef]
/usr/libexec/mysqld[0x85570a9]
/usr/libexec/mysqld[0x847b082]
/usr/libexec/mysqld[0x846bf04]
/usr/libexec/mysqld[0x846dad4]
/lib/libpthread.so.0[0x50d912]
/lib/libc.so.6(clone+0x5e)[0x6d347e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180222 17:13:50 mysqld_safe Number of processes running now: 0
180222 17:13:50 mysqld_safe mysqld restarted
错误消息本身告诉您为什么验证表数据不能解决问题。它试图释放一个页面(可能从表或索引中);但该页面已被标记为免费。换句话说,当页面被认为可以提供给另一个表或索引时,您的一个表或索引正在使用一个页面。而且,很明显,如果 InnoDB 不知道哪些页面实际上是空闲的,就会发生坏事。
转储所有表/索引数据并重新加载,意味着让 InnoDB 有机会重建其空闲页面的集合。理想情况下,您会在新数据库上执行此操作。为什么?好吧,你永远不应该假设只有一个单一的、单独的损坏错误。因此,如果您将数据移动到全新安装,您不必担心是否存在其他未检测到的损坏问题。
正如建议的那样,从这些转储中转储和恢复是我最终选择恢复所有数据库的方式。幸运的是,
innodb_force_recovery = 2
允许我无错误地转储所有内容,这样我就不必使用备份中的转储。当然,我宁愿找出错误的真正原因,但 MySQL 除了在错误报告中提到“表空间可用空间信息已损坏”之外没有提供任何帮助。如果不识别和消除错误的原因,我预计它最终会再次发生——我们的客户可能会比我更生气。可能是硬件不好,但系统中所有磁盘的 SMART 数据看起来都不错,并且
/var/log/messages
在崩溃时不包含任何可疑的东西。也没有意外断电或重新启动。