Onde posso encontrar o log lento do mysql?

Question

Martin

Asked: 2015-10-17 10:28:27 +0800 CST2015-10-17 10:28:27 +0800 CST 2015-10-17 10:28:27 +0800 CST

Como otimizar SELECT muito lento com LEFT JOINs em tabelas grandes

772

Eu estava pesquisando, autodidata e procurando soluções por horas, mas sem sorte. Encontrei algumas perguntas semelhantes aqui, mas não este caso.

Minhas mesas:

pessoas (~ 10 milhões de linhas)
atributos (local, idade,...)
links (M:M) entre pessoas e atributos (~40 milhões de linhas)

Despejo completo ~ 280 MB

Situação: Procuro selecionar todos os ids de pessoa ( person_id) de algumas localidades ( location.attribute_value BETWEEN 3000 AND 7000), ser de algum gênero ( gender.attribute_value = 1), ter nascido em alguns anos ( bornyear.attribute_value BETWEEN 1980 AND 2000) e ter alguma cor dos olhos ( eyecolor.attribute_value IN (2,3)).

Esta é a minha consulta que leva de 3 a 4 minutos. e gostaria de otimizar:

SELECT person_id
FROM person
    LEFT JOIN attribute location ON location.attribute_type_id = 1 AND location.person_id = person.person_id
    LEFT JOIN attribute gender ON gender.attribute_type_id = 2 AND gender.person_id = person.person_id
    LEFT JOIN attribute bornyear ON bornyear.attribute_type_id = 3 AND bornyear.person_id = person.person_id
    LEFT JOIN attribute eyecolor ON eyecolor.attribute_type_id = 4 AND eyecolor.person_id = person.person_id
WHERE 1
    AND location.attribute_value BETWEEN 3000 AND 7000
    AND gender.attribute_value = 1
    AND bornyear.attribute_value BETWEEN 1980 AND 2000
    AND eyecolor.attribute_value IN (2,3)
LIMIT 100000;

Resultado:

+-----------+
| person_id |
+-----------+
|       233 |
|       605 |
|       ... |
|   8702599 |
|   8703617 |
+-----------+
100000 rows in set (3 min 42.77 sec)

Explique estendida:

+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| id | select_type | table    | type   | possible_keys                               | key             | key_len | ref                      | rows    | filtered | Extra                    |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | bornyear | range  | attribute_type_id,attribute_value,person_id | attribute_value | 5       | NULL                     | 1265229 |   100.00 | Using where              |
|  1 | SIMPLE      | location | ref    | attribute_type_id,attribute_value,person_id | person_id       | 5       | test1.bornyear.person_id |       4 |   100.00 | Using where              |
|  1 | SIMPLE      | eyecolor | ref    | attribute_type_id,attribute_value,person_id | person_id       | 5       | test1.bornyear.person_id |       4 |   100.00 | Using where              |
|  1 | SIMPLE      | gender   | ref    | attribute_type_id,attribute_value,person_id | person_id       | 5       | test1.eyecolor.person_id |       4 |   100.00 | Using where              |
|  1 | SIMPLE      | person   | eq_ref | PRIMARY                                     | PRIMARY         | 4       | test1.location.person_id |       1 |   100.00 | Using where; Using index |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.02 sec)

Perfil:

+------------------------------+-----------+
| Status                       | Duration  |
+------------------------------+-----------+
| Sending data                 |  3.069452 |
| Waiting for query cache lock |  0.000017 |
| Sending data                 |  2.968915 |
| Waiting for query cache lock |  0.000019 |
| Sending data                 |  3.042468 |
| Waiting for query cache lock |  0.000043 |
| Sending data                 |  3.264984 |
| Waiting for query cache lock |  0.000017 |
| Sending data                 |  2.823919 |
| Waiting for query cache lock |  0.000038 |
| Sending data                 |  2.863903 |
| Waiting for query cache lock |  0.000014 |
| Sending data                 |  2.971079 |
| Waiting for query cache lock |  0.000020 |
| Sending data                 |  3.053197 |
| Waiting for query cache lock |  0.000087 |
| Sending data                 |  3.099053 |
| Waiting for query cache lock |  0.000035 |
| Sending data                 |  3.064186 |
| Waiting for query cache lock |  0.000017 |
| Sending data                 |  2.939404 |
| Waiting for query cache lock |  0.000018 |
| Sending data                 |  3.440288 |
| Waiting for query cache lock |  0.000086 |
| Sending data                 |  3.115798 |
| Waiting for query cache lock |  0.000068 |
| Sending data                 |  3.075427 |
| Waiting for query cache lock |  0.000072 |
| Sending data                 |  3.658319 |
| Waiting for query cache lock |  0.000061 |
| Sending data                 |  3.335427 |
| Waiting for query cache lock |  0.000049 |
| Sending data                 |  3.319430 |
| Waiting for query cache lock |  0.000061 |
| Sending data                 |  3.496563 |
| Waiting for query cache lock |  0.000029 |
| Sending data                 |  3.017041 |
| Waiting for query cache lock |  0.000032 |
| Sending data                 |  3.132841 |
| Waiting for query cache lock |  0.000050 |
| Sending data                 |  2.901310 |
| Waiting for query cache lock |  0.000016 |
| Sending data                 |  3.107269 |
| Waiting for query cache lock |  0.000062 |
| Sending data                 |  2.937373 |
| Waiting for query cache lock |  0.000016 |
| Sending data                 |  3.097082 |
| Waiting for query cache lock |  0.000261 |
| Sending data                 |  3.026108 |
| Waiting for query cache lock |  0.000026 |
| Sending data                 |  3.089760 |
| Waiting for query cache lock |  0.000041 |
| Sending data                 |  3.012763 |
| Waiting for query cache lock |  0.000021 |
| Sending data                 |  3.069694 |
| Waiting for query cache lock |  0.000046 |
| Sending data                 |  3.591908 |
| Waiting for query cache lock |  0.000060 |
| Sending data                 |  3.526693 |
| Waiting for query cache lock |  0.000076 |
| Sending data                 |  3.772659 |
| Waiting for query cache lock |  0.000069 |
| Sending data                 |  3.346089 |
| Waiting for query cache lock |  0.000245 |
| Sending data                 |  3.300460 |
| Waiting for query cache lock |  0.000019 |
| Sending data                 |  3.135361 |
| Waiting for query cache lock |  0.000021 |
| Sending data                 |  2.909447 |
| Waiting for query cache lock |  0.000039 |
| Sending data                 |  3.337561 |
| Waiting for query cache lock |  0.000140 |
| Sending data                 |  3.138180 |
| Waiting for query cache lock |  0.000090 |
| Sending data                 |  3.060687 |
| Waiting for query cache lock |  0.000085 |
| Sending data                 |  2.938677 |
| Waiting for query cache lock |  0.000041 |
| Sending data                 |  2.977974 |
| Waiting for query cache lock |  0.000872 |
| Sending data                 |  2.918640 |
| Waiting for query cache lock |  0.000036 |
| Sending data                 |  2.975842 |
| Waiting for query cache lock |  0.000051 |
| Sending data                 |  2.918988 |
| Waiting for query cache lock |  0.000021 |
| Sending data                 |  2.943810 |
| Waiting for query cache lock |  0.000061 |
| Sending data                 |  3.330211 |
| Waiting for query cache lock |  0.000025 |
| Sending data                 |  3.411236 |
| Waiting for query cache lock |  0.000023 |
| Sending data                 | 23.339035 |
| end                          |  0.000807 |
| query end                    |  0.000023 |
| closing tables               |  0.000325 |
| freeing items                |  0.001217 |
| logging slow query           |  0.000007 |
| logging slow query           |  0.000011 |
| cleaning up                  |  0.000104 |
+------------------------------+-----------+
100 rows in set (0.00 sec)

Estruturas de tabelas:

CREATE TABLE `attribute` (
  `attribute_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `attribute_type_id` int(11) unsigned DEFAULT NULL,
  `attribute_value` int(6) DEFAULT NULL,
  `person_id` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`attribute_id`),
  KEY `attribute_type_id` (`attribute_type_id`),
  KEY `attribute_value` (`attribute_value`),
  KEY `person_id` (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000001 DEFAULT CHARSET=utf8;

CREATE TABLE `person` (
  `person_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `person_name` text CHARACTER SET latin1,
  PRIMARY KEY (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=20000001 DEFAULT CHARSET=utf8;

A consulta foi realizada no servidor virtual da DigitalOcean com SSD e 1GB de RAM.

Suponho que pode haver problema com o design do banco de dados. Você tem alguma sugestão para projetar melhor esta situação, por favor? Ou apenas para ajustar o select acima?

5 respostas

Voted

Rick James · Answer 1 · 2015-10-22T08:20:12+08:00

Rick James

2015-10-22T08:20:12+08:002015-10-22T08:20:12+08:00

Escolha alguns atributos para incluir no person. Indexe-os em algumas combinações -- use índices compostos, não índices de coluna única.

Essa é essencialmente a única saída do EAV-suck-at-performance, que é onde você está.

Aqui está mais discussão: http://mysql.rjweb.org/doc.php/eav incluindo uma sugestão de usar JSON em vez da tabela de valores-chave.

13

Martin · Answer 2 · 2015-10-26T07:48:35+08:00

Espero ter encontrado uma solução suficiente. É inspirado por este artigo .

Resposta curta:

Criei 1 tabela com todos os atributos. Uma coluna para um atributo. Mais coluna de chave primária.
Os valores de atributo são armazenados em células de texto (para pesquisa de texto completo) em formato semelhante ao CSV.
Índices de texto completo criados. Antes disso é importante definir ft_min_word_len=1(para MyISAM) na [mysqld]seção e innodb_ft_min_token_size=1(para InnoDb) no my.cnfarquivo, reinicie o serviço mysql.
Exemplo de pesquisa: SELECT * FROM person_index WHERE MATCH(attribute_1) AGAINST("123 456 789" IN BOOLEAN MODE) LIMIT 1000onde 123, 456a 789são IDs que as pessoas deveriam ter associado em attribute_1. Essa consulta levou menos de 1 segundo.

Resposta detalhada:

Etapa 1. Criando tabela com índices de texto completo. O InnoDb suporta índices fulltext do MySQL 5.7, então se você usa 5.5 ou 5.6, você deve usar MyISAM. Às vezes, é ainda mais rápido para pesquisa FT do que InnoDb.

CREATE TABLE `person_attribute_ft` (
  `person_id` int(11) NOT NULL,
  `attr_1` text,
  `attr_2` text,
  `attr_3` text,
  `attr_4` text,
  PRIMARY KEY (`person_id`),
  FULLTEXT KEY `attr_1` (`attr_1`),
  FULLTEXT KEY `attr_2` (`attr_2`),
  FULLTEXT KEY `attr_3` (`attr_3`),
  FULLTEXT KEY `attr_4` (`attr_4`),
  FULLTEXT KEY `attr_12` (`attr_1`,`attr_2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Etapa 2. Insira os dados da tabela EAV (entity-attribute-value). Por exemplo, declarado em questão, pode ser feito com 1 SQL simples:

INSERT IGNORE INTO `person_attribute_ft`
SELECT
    p.person_id,
    (SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 1 AND a.person_id = p.person_id LIMIT 10) attr_1,
    (SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 2 AND a.person_id = p.person_id LIMIT 10) attr_2,
    (SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 3 AND a.person_id = p.person_id LIMIT 10) attr_3,
    (SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 4 AND a.person_id = p.person_id LIMIT 10) attr_4
FROM person p

Resultado deve ser algo assim:

mysql> select * from person_attribute_ft limit 10;
+-----------+--------+--------+--------+--------+
| person_id | attr_1 | attr_2 | attr_3 | attr_4 |
+-----------+--------+--------+--------+--------+
|         1 | 541    | 2      | 1927   | 3      |
|         2 | 2862   | 2      | 1939   | 4      |
|         3 | 6573   | 2      | 1904   | 2      |
|         4 | 2432   | 1      | 2005   | 2      |
|         5 | 2208   | 1      | 1995   | 4      |
|         6 | 8388   | 2      | 1973   | 1      |
|         7 | 107    | 2      | 1909   | 4      |
|         8 | 5161   | 1      | 2005   | 1      |
|         9 | 8022   | 2      | 1953   | 4      |
|        10 | 4801   | 2      | 1900   | 3      |
+-----------+--------+--------+--------+--------+
10 rows in set (0.00 sec)

Etapa 3. Selecione na tabela com consulta como esta:

mysql> SELECT SQL_NO_CACHE *
    -> FROM `person_attribute_ft`
    -> WHERE 1 AND MATCH(attr_1) AGAINST ("3000 3001 3002 3003 3004 3005 3006 3007" IN BOOLEAN MODE)
    -> AND MATCH(attr_2) AGAINST ("1" IN BOOLEAN MODE)
    -> AND MATCH(attr_3) AGAINST ("1980 1981 1982 1983 1984" IN BOOLEAN MODE)
    -> AND MATCH(attr_4) AGAINST ("2,3" IN BOOLEAN MODE)
    -> LIMIT 10000;
+-----------+--------+--------+--------+--------+
| person_id | attr_1 | attr_2 | attr_3 | attr_4 |
+-----------+--------+--------+--------+--------+
|     12131 | 3002   | 1      | 1982   | 2      |
|     51315 | 3007   | 1      | 1984   | 2      |
|    147283 | 3001   | 1      | 1984   | 2      |
|    350086 | 3005   | 1      | 1982   | 3      |
|    423907 | 3004   | 1      | 1982   | 3      |
... many rows ...
|   9423907 | 3004   | 1      | 1982   | 3      |
|   9461892 | 3007   | 1      | 1982   | 2      |
|   9516361 | 3006   | 1      | 1980   | 2      |
|   9813933 | 3005   | 1      | 1982   | 2      |
|   9986892 | 3003   | 1      | 1981   | 2      |
+-----------+--------+--------+--------+--------+
90 rows in set (0.17 sec)

A consulta seleciona todas as linhas:

correspondendo a pelo menos um desses IDs em attr_1:3000, 3001, 3002, 3003, 3004, 3005, 3006 or 3007
AND at the same time matching 1 in attr_2 (this column represents gender so if this solution were customized, it should be smallint(1) with simple index, etc...)
AND at the same time matching at least one of 1980, 1981, 1982, 1983 or 1984 in attr_3
AND at the same time matching 2 or 3 in attr_4

Conclusion:

I know this solution is not perfect and ideal for many situations but can be used as good alternative for EAV table design.

I hope it'll help someone.

wolfgangwalther · Answer 3 · 2015-10-19T04:39:25+08:00

Adicione índices a attributefor:

(person_id, attribute_type_id, attribute_value)e
(attribute_type_id, attribute_value, person_id)

Explicação

Com seu design atual, EXPLAINespera-se que sua consulta examine 1,265,229 * 4 * 4 * 4 = 80,974,656linhas no attribute. Você pode reduzir esse número adicionando um índice composto para . Usando esse índice, sua consulta examinará apenas 1 em vez de 4 linhas para cada , e .attribute(person_id, attribute_type_id)locationeyecolorgender

Você pode estender esse índice para incluir attribute_type_valuetambém: (person_id, attribute_type_id, attribute_value). Isso transformaria esse índice em um índice de cobertura para essa consulta, o que também melhoraria o desempenho.

Além disso, adicionar um índice em (attribute_type_id, attribute_value, person_id)(novamente um índice de cobertura incluindo person_id) deve melhorar o desempenho sobre apenas usar um índice attribute_valueonde mais linhas teriam que ser examinadas. Nesse caso, o primeiro passo em sua explicação será fixado: selecionar um intervalo de bornyear.

O uso desses dois índices reduziu o tempo de execução da sua consulta no meu sistema de ~ 2,0 s para ~ 0,2 s com a saída explicada assim:

+----+-------------+----------+--------+-------------------------------------+-------------------+---------+--------------------------------+---------+----------+--------------------------+
| id | select_type | table    | type   | possible_keys                       | key               | key_len | ref                            |    rows | filtered | Extra                    |
+----+-------------+----------+--------+-------------------------------------+-------------------+---------+--------------------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | bornyear | range  | person_type_value,type_value_person | type_value_person |       9 |                                | 1861881 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | location | ref    | person_type_value,type_value_person | person_type_value |       8 | bornyear.person_id,const       |       1 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | eyecolor | ref    | person_type_value,type_value_person | person_type_value |       8 | bornyear.person_id,const       |       1 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | gender   | ref    | person_type_value,type_value_person | person_type_value |      13 | bornyear.person_id,const,const |       1 |   100.00 | Using index              |
|  1 | SIMPLE      | person   | eq_ref | PRIMARY                             | PRIMARY           |       4 | bornyear.person_id             |       1 |   100.00 | Using index              |
+----+-------------+----------+--------+-------------------------------------+-------------------+---------+--------------------------------+---------+----------+--------------------------+

Vladimir Baranov · Answer 4 · 2015-10-22T16:51:25+08:00

Suponho que pode haver problema com o design do banco de dados.

Você está usando o chamado design Entity-Attribute-Value, que geralmente tem um desempenho ruim, bem, por design.

Você tem alguma sugestão para projetar melhor esta situação, por favor?

A maneira relacional clássica de projetar isso seria criar uma tabela separada para cada atributo. Em geral, você pode ter essas tabelas separadas: location, gender, bornyear, eyecolor.

O seguinte depende se determinados atributos são sempre definidos para uma pessoa ou não. E se uma pessoa pode ter apenas um valor de um atributo. Por exemplo, geralmente a pessoa tem apenas um gênero. Em seu design atual, nada impede que você adicione três linhas para a mesma pessoa com valores diferentes de gênero nelas. Você também pode definir um valor de gênero não como 1 ou 2, mas para algum número que não faz sentido, como 987 e não há restrição no banco de dados que impeça isso. Mas essa é outra questão separada de manter a integridade dos dados com o design EAV.

Se você sempre sabe o sexo da pessoa, não faz sentido colocá-lo em uma tabela separada e é muito melhor ter uma coluna não nula GenderIDna persontabela, que seria uma chave estrangeira para a tabela de pesquisa com a lista de todos os gêneros possíveis e seus nomes. Se você souber o sexo da pessoa na maioria das vezes, mas nem sempre, poderá tornar essa coluna anulável e defini-la para NULLquando as informações não estiverem disponíveis. Se na maioria das vezes o sexo da pessoa não for conhecido, talvez seja melhor ter uma tabela separada gendervinculada a person1:1 e que tenha linhas apenas para as pessoas que têm um sexo conhecido.

Considerações semelhantes se aplicam a eyecolore bornyear- é improvável que a pessoa tenha dois valores para um eyecolorou bornyear.

Se for possível que uma pessoa tenha vários valores para um atributo, você definitivamente o colocaria em uma tabela separada. Por exemplo, não é incomum que uma pessoa tenha vários endereços (casa, trabalho, correio, feriado etc.), então você listaria todos eles em uma tabela location. Tabelas persone locationseriam ligados 1:M.

Ou apenas para ajustar o select acima?

Se estiver usando o design EAV, eu faria pelo menos o seguinte.

Defina colunas attribute_type_id, attribute_value, person_idpara NOT NULL.
Configure a chave estrangeira vinculada attribute.person_idao person.person_id.
Crie um índice em três colunas (attribute_type_id, attribute_value, person_id). A ordem das colunas é importante aqui.
Até onde eu sei, MyISAM não aceita chaves estrangeiras, então não use, use InnoDB.

Eu escreveria a consulta assim. Use INNERem vez de LEFTjunções e escreva explicitamente a subconsulta para cada atributo para dar ao otimizador todas as chances de usar o índice.

SELECT person.person_id
FROM
    person
    INNER JOIN
    (
        SELECT attribute.person_id
        FROM attribute
        WHERE attribute_type_id = 1
            AND location.attribute_value BETWEEN 3000 AND 7000
    ) AS location ON location.person_id = person.person_id
    INNER JOIN
    (
        SELECT attribute.person_id
        FROM attribute
        WHERE attribute_type_id = 2
            AND location.attribute_value = 1
    ) AS gender ON gender.person_id = person.person_id
    INNER JOIN
    (
        SELECT attribute.person_id
        FROM attribute
        WHERE attribute_type_id = 3
            AND location.attribute_value BETWEEN 1980 AND 2000
    ) AS bornyear ON bornyear.person_id = person.person_id
    INNER JOIN
    (
        SELECT attribute.person_id
        FROM attribute
        WHERE attribute_type_id = 4
            AND location.attribute_value IN (2, 3)
    ) AS eyecolor ON eyecolor.person_id = person.person_id
LIMIT 100000;

Além disso, pode valer a pena particionar a attributetabela por attribute_type_id.

Muhammad Muazzam · Answer 5 · 2015-10-19T17:50:14+08:00

Muhammad Muazzam

2015-10-19T17:50:14+08:002015-10-19T17:50:14+08:00

Tente usar dicas de índice de consulta que parecem apropriadas

Dicas de índice do MySQL

0

Como otimizar SELECT muito lento com LEFT JOINs em tabelas grandes

Resposta curta:

Resposta detalhada:

conectar ao servidor PostgreSQL: FATAL: nenhuma entrada pg_hba.conf para o host

Como fazer a saída do sqlplus aparecer em uma linha?

Selecione qual tem data máxima ou data mais recente

Como faço para listar todos os esquemas no PostgreSQL?

Listar todas as colunas de uma tabela especificada

Como usar o sqlplus para se conectar a um banco de dados Oracle localizado em outro host sem modificar meu próprio tnsnames.ora

Como você mysqldump tabela (s) específica (s)?

Listar os privilégios do banco de dados usando o psql

Como inserir valores em uma tabela de uma consulta de seleção no PostgreSQL?

Como faço para listar todos os bancos de dados e tabelas usando o psql?

Como otimizar SELECT muito lento com LEFT JOINs em tabelas grandes

5 respostas

Resposta curta:

Resposta detalhada:

relate perguntas