A cada poucas semanas ou meses meu servidor MySQL (5.6.21) está usando toda a memória incluindo swap, embora não devesse.
O servidor é usado apenas como escravo de backup para um data warehouse.
Temos vários outros servidores MySQL, todos rodando com a mesma versão no mesmo SO (CentOS 6 rodando com kernel 2.6.32-358.14.1.el6.x86_64) mas este é o único servidor que apresenta este sintoma.
Não consegui observar consultas específicas. Mas isso acontece durante a replicação por volta das 05:00 da manhã. A atualização do data warehouse é executada entre 00:00 e 07:00.
Também não há leituras neste servidor além do backup. O backup é executado às 21:00 e não é a causa disso.
Apenas reiniciar o serviço mysqld resolve o problema até agora. Estou apenas curioso, o que está causando isso.
Eis a situação atual:
# free -m
total used free shared buffers cached
Mem: 48268 47549 718 0 115 872
-/+ buffers/cache: 46562 1706
Swap: 12087 12087 0
A partir da top
saída, fica claro que apenas o MySQL está usando toda a memória:
Tasks: 3 total, 0 running, 3 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 49427328k total, 48695740k used, 731588k free, 119452k buffers
Swap: 12378104k total, 12377212k used, 892k free, 894452k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21387 root 20 0 103m 344 340 S 0.0 0.0 0:00.05 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql/data ...
22234 mysql 20 0 57.9g 44g 4856 S 0.0 94.8 7501:59 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql/data ...
Na verdade, o servidor nunca deve usar mais de 36 GB:
@@key_buffer_size / (1024 * 1024): 32.0000
@@query_cache_size / (1024 * 1024): 0.0000
@@innodb_buffer_pool_size / (1024 * 1024): 33792.0000
@@innodb_log_buffer_size / (1024 * 1024): 8.0000
@@max_connections: 10
@@read_buffer_size / (1024 * 1024): 0.1250
@@read_rnd_buffer_size / (1024 * 1024): 0.2500
@@sort_buffer_size / (1024 * 1024): 0.2500
@@join_buffer_size / (1024 * 1024): 0.2500
@@binlog_cache_size / (1024 * 1024): 0.0313
@@thread_stack / (1024 * 1024): 0.2500
@@tmp_table_size / (1024 * 1024): 256.0000
MAX_MEMORY_GB: 35.5504
Aqui estão as partes importantes de my.cnf
:
[mysqld]
max-connections = 10
max_connect_errors = 10000
thread_cache_size = 50
key_buffer_size = 32M
max_allowed_packet = 128M
query_cache_size = 0
query_cache_type = 0
table_open_cache = 2048
open-files-limit = 65535
table-definition-cache = 1024
log_bin = /var/lib/mysql/mysql-bin
binlog-format = MIXED
sync_binlog = 0
binlog-checksum = CRC32
binlog-rows-query-log_events = 1
expire-logs-days = 1
gtid-mode = on
log-slave-updates = true
master-info-repository = TABLE
master-verify-checksum = 1
sync-master-info = 1
relay-log-info-repository = TABLE
innodb_log_group_home_dir = /var/lib/mysql/
innodb-file-format = BARRACUDA
innodb-log-files-in-group = 2
innodb_buffer_pool_instances = 4
innodb_log_file_size = 512M
innodb_lock_wait_timeout = 43200
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_buffer_pool_size = 33G
innodb_file_per_table = 1
tmp_table_size = 256M
max-heap-table-size = 256M
tmpdir = /var/lib/mysql/tmp
enforce-gtid-consistency = true
A saída de show engine innodb status
:
=====================================
2017-03-10 13:37:31 7f2727bdf700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 27 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 499196 srv_active, 0 srv_shutdown, 1999252 srv_idle
srv_master_thread log flush and writes: 2498448
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 147502997
OS WAIT ARRAY INFO: signal count 268030727
Mutex spin waits 483748333, rounds 5969256182, OS waits 132176123
RW-shared spins 40672399, rounds 210459357, OS waits 2338871
RW-excl spins 112924649, rounds 647917761, OS waits 7715821
Spin rounds per wait: 12.34 mutex, 5.17 RW-shared, 5.74 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 8637239322
Purge done for trx's n:o < 8637239322 undo n:o < 0 state: running but idle
History list length 2341
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 611991, OS thread handle 0x7f2727bdf700, query id 12787999 localhost root init
show engine innodb status
---TRANSACTION 8637239308, not started
MySQL thread id 599812, OS thread handle 0x7f273c0c3700, query id 12781840 Slave has read all relay log; waiting for the slave I/O thread to update it
---TRANSACTION 8637239320, not started
MySQL thread id 1, OS thread handle 0x7f30059a8700, query id 0 Waiting for master to send event
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
38113420 OS file reads, 692678361 OS file writes, 295468168 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 244028, seg size 244030, 1929352 merges
merged operations:
insert 7499128, delete mark 734782843, delete 66426401
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 73015559, node heap has 35208 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 130331695761980
Log flushed up to 130331695761980
Pages flushed up to 130331695761980
Last checkpoint at 130331695761980
0 pending log writes, 0 pending chkp writes
279646075 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 36263952384; in additional pool allocated 0
Dictionary memory allocated 1991419
Buffer pool size 2162684
Free buffers 516257
Database pages 1611219
Old database pages 594842
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 21809666, not young 8333442679
0.00 youngs/s, 0.00 non-youngs/s
Pages read 37539158, created 237144171, written 399299350
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1611219, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size 540671
Free buffers 129625
Database pages 402230
Old database pages 148499
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 5461997, not young 2089105440
0.00 youngs/s, 0.00 non-youngs/s
Pages read 9402002, created 59409061, written 106858985
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 402230, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size 540671
Free buffers 128947
Database pages 402917
Old database pages 148753
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 5443941, not young 2080427002
0.00 youngs/s, 0.00 non-youngs/s
Pages read 9381960, created 59195515, written 96676260
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 402917, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size 540671
Free buffers 129695
Database pages 402173
Old database pages 148475
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 5453587, not young 2090513432
0.00 youngs/s, 0.00 non-youngs/s
Pages read 9386136, created 59292596, written 96867291
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 402173, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size 540671
Free buffers 127990
Database pages 403899
Old database pages 149115
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 5450141, not young 2073396805
0.00 youngs/s, 0.00 non-youngs/s
Pages read 9369060, created 59246999, written 98896814
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 403899, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Main thread process no. 22234, id 139806582294272, state: sleeping
Number of rows inserted 23128153712, updated 2139463431, deleted 353665279, read 10407801812
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
As únicas mensagens que tenho no log de erros dizem respeito ao backup diário quando o xtrabackup mata o thread SQL escravo.
Não sei mais onde procurar ou o que pode ser suspeito na saída do monitor innodb. A pesquisa no Google falhou, porque tudo o que consigo são os artigos usuais sobre como o MySQL usa memória.
Tenho certeza que você está acertando o bug #72885 que foi corrigido em 5.6.25. Então você tem a opção de alterar:
Ou atualize para 5.6.25, a parte relevante das notas de lançamento afirma: