Questions asked by Chloe - dba

Chloe

Asked: 2020-01-21 18:43:17 +0800 CST

Why doesn't MySQL use indexes for subqueries?

0

This query takes forever to run (30+ minutes, or it never finishes).

select date, 
       sc, 
       ( select count(fingerprint_id) 
         from stats 
         where hit_date >= t.date 
           and hit_date < date_add('2020-01-20', interval 1 day) 
           and hit_type = 0 
           and fingerprint_id is not null ) as total_fingerprint
from ( select date(hit_date) as date, 
              sum(sc) as sc 
       from delayed_stats  
       where hit_date > date_sub(now(), interval 1 day) 
       group by date(hit_date) 
       order by hit_date) t;

The individual queries take 1s and 8s to run, but combined they never finish. I expected 8-9s. If I replace t.date with the static '2020-01-20', it takes 8s. Just replacing a static date with t.date causes the query to 'hang'. The minimal query that reproduces this hang is:

select date, 
       (select count(fingerprint_id) from stats where hit_date >= t.date and hit_date < date_add(t.date, interval 1 day) and hit_type = 0 and fingerprint_id is not null) as total_fingerprint
from (select '2020-01-01' as date union select '2020-01-02' as date) t;

This is the EXPLAIN output for the query:

+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| id | select_type        | table         | partitions                                                                                                                                | type  | possible_keys | key          | key_len | ref  | rows      | filtered | Extra                                                  |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
|  1 | PRIMARY            | <derived3>    | NULL                                                                                                                                       | ALL   | NULL          | NULL         | NULL    | NULL |      7496 |   100.00 | NULL                                                   |
|  3 | DERIVED            | delayed_stats | NULL                                                                                                                                       | range | hit_date_idx  | hit_date_idx | 5       | NULL |      7496 |   100.00 | Using index condition; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | stats         | p20180101,p20180201,p20180301,p20180401,p20180501,p20180601,p20180701,p20180801,p20180901,p20181001,p20181101,p20181201,p20190101,p20190201,p20190301,p20190401,p20190501,p20190601,p20190701,p20190801,p20190901,p20191001,p20191101,p20191201,p20200101,p20200201 | ALL   | NULL          | NULL         | NULL    | NULL | 316867000 |     1.00 | Using where                                            |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
3 rows in set, 2 warnings (0.11 sec)
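
For a rough sense of scale, the EXPLAIN above can be read using only the optimizer's own estimates (the rows column is an estimate, not a measured count). The DEPENDENT SUBQUERY on stats is re-evaluated for each row produced by <derived3>, and with type ALL, key NULL, and all 26 monthly partitions listed, each evaluation is estimated to read the whole table:

```latex
\[
\underbrace{7496}_{\text{outer rows (est.)}} \times \underbrace{316\,867\,000}_{\text{stats rows per evaluation (est.)}}
= 2\,375\,235\,032\,000 \approx 2.4 \times 10^{12} \text{ rows examined}
\]
```

If the derived table actually holds only one or two dates (one day of data grouped by date(hit_date)), the real work would be closer to one or two full scans of about 3.2 × 10^8 rows each; the product above is the optimizer's worst-case reading of its own estimates.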

It doesn't seem to be using the hit_date index (PRIMARY KEY (id, hit_date)) in the subquery on the stats table. My end goal is to combine these two queries (interval 30 day):

select date(hit_date), 
       sum(sc) 
from delayed_stats 
where hit_date > date_sub(now(), interval 30 day) 
group by date(hit_date) 
order by hit_date;

select date(hit_date), 
       count(fingerprint_id) 
from stats 
where hit_date > date_sub(now(), interval 30 day) 
  and hit_type = 0 
  and fingerprint_id is not null 
group by date(hit_date) 
order by hit_date; -- 2m21s

When I look at the query plan for the second query on the stats table, it shows possible_keys as PRIMARY,source_id,stats_bag_id_idx. I tried another way of combining them, with a join, but that took 15m to run when it should take only about 2m.

select t.date, 
       sc, 
       fingerprint_count 
from ( select date(hit_date) date, 
              sum(sc) as sc 
       from delayed_stats 
       where hit_date > date_sub(now(), interval 30 day) 
       group by date(hit_date) 
       order by hit_date ) t 
join ( select date(hit_date) date, 
              count(fingerprint_id) as fingerprint_count 
       from stats 
       where hit_date > date_sub(now(), interval 30 day) 
         and hit_type = 0 
         and fingerprint_id is not null 
       group by date(hit_date) 
       order by hit_date ) t2 on t.date = t2.date;
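
The aggregate-then-join shape of the last query can be checked for correctness on toy data. This is a minimal sketch that assumes Python's built-in sqlite3 as a stand-in for MySQL (so it says nothing about MySQL timings or plans), uses made-up rows, a fixed cutoff instead of date_sub(now(), ...), and the alias day instead of date. It uses LEFT JOIN with COALESCE so a day that has delayed_stats rows but no qualifying stats rows still appears with 0, whereas the inner join above would drop such a day. The ORDER BY inside the derived tables is left out because only the outer ORDER BY determines the final order.

```python
import sqlite3

# Aggregate each table per day on its own, then join the two small per-day results.
# SQLite stands in for MySQL here; "day" replaces the "date" alias used in the question.
REPORT_SQL = """
    SELECT t.day,
           t.sc,
           COALESCE(t2.fingerprint_count, 0) AS fingerprint_count
    FROM (SELECT date(hit_date) AS day, SUM(sc) AS sc
          FROM delayed_stats
          WHERE hit_date > :cutoff
          GROUP BY date(hit_date)) AS t
    LEFT JOIN (SELECT date(hit_date) AS day, COUNT(fingerprint_id) AS fingerprint_count
               FROM stats
               WHERE hit_date > :cutoff
                 AND hit_type = 0
                 AND fingerprint_id IS NOT NULL
               GROUP BY date(hit_date)) AS t2
      ON t.day = t2.day
    ORDER BY t.day
"""


def daily_report(conn: sqlite3.Connection, cutoff: str) -> list:
    """Return (day, sc, fingerprint_count) rows for hits after `cutoff`."""
    return conn.execute(REPORT_SQL, {"cutoff": cutoff}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE delayed_stats (hit_date TEXT, sc INTEGER);
    CREATE TABLE stats (hit_date TEXT, hit_type INTEGER, fingerprint_id INTEGER);
    INSERT INTO delayed_stats VALUES
        ('2020-01-19 10:00:00', 5),
        ('2020-01-19 12:00:00', 3),
        ('2020-01-20 09:00:00', 7);
    INSERT INTO stats VALUES
        ('2020-01-19 10:05:00', 0, 101),
        ('2020-01-19 11:00:00', 0, NULL),  -- excluded: NULL fingerprint_id
        ('2020-01-19 11:30:00', 1, 102);   -- excluded: hit_type 1
""")

print(daily_report(conn, "2020-01-01"))
# [('2020-01-19', 8, 1), ('2020-01-20', 7, 0)]
```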

Chloe

Asked: 2019-09-24 20:46:53 +0800 CST

Is it faster to split a large table into 12 rolling monthly tables and UNION them for reports, or to keep one large table and delete rows older than 1 year?

0

My coworker wants to split a large 158-million-row stats table into stats_jan, stats_feb, ... and use UNION to select from them for reports. Is that standard practice, and is it faster than just keeping the large table in place and deleting rows older than a year? The table consists of many small rows.

mysql> describe stats;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| id             | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| badge_id       | bigint(20) unsigned | NO   | MUL | NULL    |                |
| hit_date       | datetime            | YES  | MUL | NULL    |                |
| hit_type       | tinyint(4)          | YES  |     | NULL    |                |
| source_id      | bigint(20) unsigned | YES  | MUL | NULL    |                |
| fingerprint_id | bigint(20) unsigned | YES  |     | NULL    |                |
+----------------+---------------------+------+-----+---------+----------------+

I manually split the table, copied the rows into the appropriate month tables, and wrote a giant UNION query. The big UNION query took 14s versus 4.5m for the single-table query. Why would many smaller tables take significantly less time than one large table when the total number of rows is the same?

create table stats_jan (...);
create table stats_feb (...);
...
create index stats_jan_hit_date_idx on stats_jan (hit_date);
...
insert into stats_jan select * from stats where hit_date >= '2019-01-01' and hit_date < '2019-02-01';
...
delete from stats where hit_date < '2018-09-01';
...

The monthly tables range from 1.7 million to 35 million rows.

select host as `key`, count(*) as value from stats join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id order by value desc limit 10;
4 min 30.39 sec

flush tables;
reset query cache;

select host as `key`, count(*) as value from stats_jan join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id
UNION
...
order by value desc limit 10;
14.16 sec
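
Separately from the timing question, the UNION query may not return the same numbers as the single-table query. Each branch is grouped on its own, so a source that appears in several months gets one row per month instead of a total, and UNION (unlike UNION ALL) also removes rows that happen to be identical. The sketch below compares the two shapes, assuming the elided branches repeat the first branch for each month table, and using Python's sqlite3 as a stand-in for MySQL with made-up hosts and two hypothetical month tables (stats_aug, stats_sep). Wrapping the branches in UNION ALL and re-aggregating outside reproduces the single-table totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (id INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE stats_aug (hit_date TEXT, source_id INTEGER);
    CREATE TABLE stats_sep (hit_date TEXT, source_id INTEGER);
    INSERT INTO sources VALUES (1, 'a.example'), (2, 'b.example'), (3, 'NONE');
    -- a.example: 2 hits in each month; b.example: 3 hits in August only
    INSERT INTO stats_aug VALUES
        ('2019-08-22', 1), ('2019-08-23', 1),
        ('2019-08-22', 2), ('2019-08-24', 2), ('2019-08-25', 2),
        ('2019-08-26', 3);
    INSERT INTO stats_sep VALUES ('2019-09-01', 1), ('2019-09-02', 1);
    -- Single-table baseline holding the same rows
    CREATE TABLE stats AS SELECT * FROM stats_aug UNION ALL SELECT * FROM stats_sep;
""")
PARAMS = {"cutoff": "2019-08-21 19:43:19"}


def branch(table: str, with_id: bool = False) -> str:
    """One per-table branch, grouped per table like the posted query."""
    id_col = "source_id, " if with_id else ""
    return (f'SELECT {id_col}host AS "key", count(*) AS value '
            f"FROM {table} JOIN sources ON source_id = sources.id "
            "WHERE hit_date >= :cutoff AND sources.host != 'NONE' "
            "GROUP BY source_id, host")


TOP = " ORDER BY value DESC LIMIT 10"
single = conn.execute(branch("stats") + TOP, PARAMS).fetchall()
# Posted shape: per-table groups merged with UNION (duplicates removed, no totals).
naive = conn.execute(branch("stats_aug") + " UNION " + branch("stats_sep") + TOP, PARAMS).fetchall()
# Re-aggregated shape: keep every per-table row with UNION ALL, then sum per source.
fixed = conn.execute(
    'SELECT "key", SUM(value) AS value FROM ('
    + branch("stats_aug", True) + " UNION ALL " + branch("stats_sep", True)
    + ') AS per_month GROUP BY source_id, "key"' + TOP, PARAMS).fetchall()

print(single)  # [('a.example', 4), ('b.example', 3)]
print(naive)   # [('b.example', 3), ('a.example', 2)]
print(fixed)   # [('a.example', 4), ('b.example', 3)]
```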

Chloe

Asked: 2018-12-13 11:40:19 +0800 CST

How can I speed up this 2m5s query that has indexes?

0

How can I speed up this 2m5s query that has indexes?

select urls.id as urlId, 
    count(case when s1.hit_type = 0 then 1 end) as aCount, 
    count(case when s1.hit_type = 1 then 1 end) as bCount, 
    count(case when s1.hit_type = 2 then 1 end) as cCount, 
    count(distinct s1.source_id) as sourcesCount 
from urls join stats s1 on urls.id = s1.url_id 
where s1.hit_date >= '2017-12-12' 
group by urls.id 
order by aCount desc 
limit 0,100;

mysql> show create table stats;

| stats | CREATE TABLE `stats` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `url_id` varchar(100) DEFAULT NULL,
  `hit_date` datetime DEFAULT NULL,
  `hit_type` tinyint(4) DEFAULT NULL,
  `source_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `url_id_idx` (`url_id`),
  KEY `source_id` (`source_id`),
  KEY `stats_hit_date_idx` (`hit_date`),
  CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`url_id`) REFERENCES `urls` (`ID`),
  CONSTRAINT `stats_ibfk_2` FOREIGN KEY (`source_id`) REFERENCES `sources` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6027557 DEFAULT CHARSET=latin1 |

mysql> describe select...
| id | select_type | table   | type   | possible_keys                                                                                   | key     | key_len | ref                      | rows    | Extra                                        |
+----+-------------+---------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | s1      | ALL    | url_id_idx,stats_hit_date_idx                                                                   | NULL    | NULL    | NULL                     | 5869695 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | urls    | eq_ref | PRIMARY,urls_email_idx,urls_status_idx,deptId_idx,deptId_status_email_idx                       | PRIMARY | 102     | db.s1.url_id             |     1   | Using index                                  |

It doesn't seem to be using the hit_date index or the url_id index.

I tried using a subselect, (select count(*) from stats where url_id = ... and hit_date >= ... and hit_type = 0) as aCount, and it was faster, taking 24s. Is there a way to get it under 5s? The limit for the whole request is 30 seconds.

MySQL Server version: 5.6.35-log MySQL Community Server (GPL)
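
One commonly suggested approach for this shape of query is a composite index that starts with the range column (hit_date) and also contains every other column the query reads (url_id, hit_type, source_id), so the counts can be computed from the index without touching the table rows. In MySQL the analogous DDL would be CREATE INDEX stats_cover ON stats (hit_date, url_id, hit_type, source_id). The sketch below assumes Python's sqlite3 as a stand-in; the index name stats_cover and the toy rows are hypothetical, and the join to urls is replaced by url_id IS NOT NULL, which, given the foreign key to urls(ID), should produce the same groups. SQLite's planner is not MySQL 5.6's, so this only shows that the index can serve as a covering index, not that MySQL would choose it or how fast it would be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (
        id INTEGER PRIMARY KEY,
        url_id TEXT,
        hit_date TEXT,
        hit_type INTEGER,
        source_id INTEGER
    );
    -- Hypothetical covering index: range column first, then every other column read.
    CREATE INDEX stats_cover ON stats (hit_date, url_id, hit_type, source_id);
    INSERT INTO stats (url_id, hit_date, hit_type, source_id) VALUES
        ('u1', '2018-01-05', 0, 1),
        ('u1', '2018-02-01', 0, 2),
        ('u1', '2018-03-01', 1, 2),
        ('u2', '2018-01-07', 2, 3),
        ('u2', '2017-01-01', 0, 4),   -- before the cutoff
        (NULL, '2018-01-08', 0, 5);   -- would not survive the join to urls
""")

QUERY = """
    SELECT url_id AS urlId,
           count(CASE WHEN hit_type = 0 THEN 1 END) AS aCount,
           count(CASE WHEN hit_type = 1 THEN 1 END) AS bCount,
           count(CASE WHEN hit_type = 2 THEN 1 END) AS cCount,
           count(DISTINCT source_id) AS sourcesCount
    FROM stats
    WHERE hit_date >= ? AND url_id IS NOT NULL
    GROUP BY url_id
    ORDER BY aCount DESC
    LIMIT 100
"""

rows = conn.execute(QUERY, ("2017-12-12",)).fetchall()
# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("2017-12-12",))]
print(rows)  # [('u1', 2, 1, 0, 2), ('u2', 0, 0, 1, 1)]
print(plan)
```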

Chloe

Asked: 2018-05-17 14:54:08 +0800 CST

Is it possible to combine these three join queries into one?

2

I'd like to combine the stats counts for the different hit_types into one query. Is that possible?

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 0 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| aaa |            1 |
| cnn |           16 |
+-----+--------------+

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 1 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| cnn |            1 |
+-----+--------------+

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 2 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| cnn |            4 |
+-----+--------------+

I tried combining the first two, but the numbers are all wrong and it dropped the first result, 'aaa'.

MariaDB [db]> select allurls.id, count(s1.id), count(s2.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 0 inner join stats s2 on allurls.id = s2.allurl_id and s2.hit_type = 1 where s1.hit_date >= '2018-01-15' and s2.hit_date >= '2018-01-15' group by allurls.id;
+-----+--------------+--------------+
| id  | count(s1.id) | count(s2.id) |
+-----+--------------+--------------+
| cnn |           16 |           16 |
+-----+--------------+--------------+

I expected to see

+-----+--------------+--------------+
| id  | count(s1.id) | count(s2.id) |
+-----+--------------+--------------+
| aaa |            1 |            0 |
| cnn |           16 |            1 |
+-----+--------------+--------------+

Ultimately, I also want to include count(distinct(s4.source_id)).

Here is a fiddle: https://www.db-fiddle.com/f/tGP5SbC2AdGgeEwWTAgobf/0
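
The 16/16 result follows from the second join: every hit_type = 0 row for cnn is paired with every hit_type = 1 row (16 × 1 = 16 rows), so both counts see 16 rows, and aaa has no hit_type = 1 row, so the inner join removes it. A single join with conditional counts avoids the row multiplication, the same count(case when ...) pattern used elsewhere on this page. The sketch assumes Python's sqlite3 as a stand-in for MariaDB, with made-up rows chosen to reproduce the counts shown above; the distinct-source count covers all hit types in the date range, since which rows s4 was meant to cover is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allurls (id TEXT PRIMARY KEY);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, allurl_id TEXT REFERENCES allurls (id),
                        hit_type INTEGER, hit_date TEXT, source_id INTEGER);
    INSERT INTO allurls VALUES ('aaa'), ('cnn');
""")

# Toy rows (made up) that reproduce the per-type counts from the three queries.
rows = [("aaa", 0, "2018-02-01", 1)]
rows += [("cnn", 0, "2018-02-01", 1 + i % 3) for i in range(16)]  # sources 1..3
rows += [("cnn", 1, "2018-02-02", 2)]
rows += [("cnn", 2, "2018-02-03", 4)] * 4
rows += [("cnn", 0, "2017-12-31", 9)]  # before the cutoff, ignored
conn.executemany(
    "INSERT INTO stats (allurl_id, hit_type, hit_date, source_id) VALUES (?, ?, ?, ?)", rows)

# One join; each CASE yields NULL for non-matching rows, and count() skips NULLs.
COMBINED_SQL = """
    SELECT allurls.id,
           count(CASE WHEN s.hit_type = 0 THEN 1 END) AS type0_count,
           count(CASE WHEN s.hit_type = 1 THEN 1 END) AS type1_count,
           count(CASE WHEN s.hit_type = 2 THEN 1 END) AS type2_count,
           count(DISTINCT s.source_id) AS sources_count
    FROM allurls
    JOIN stats s ON allurls.id = s.allurl_id
    WHERE s.hit_date >= ?
    GROUP BY allurls.id
    ORDER BY allurls.id
"""
result = conn.execute(COMBINED_SQL, ("2018-01-15",)).fetchall()
print(result)  # [('aaa', 1, 0, 0, 1), ('cnn', 16, 1, 4, 4)]
```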

Chloe

Asked: 2016-09-06 14:11:34 +0800 CST

How did this statement update 3 rows in Postgres?

4

I'm curious how this statement updated 3 rows in Postgres. Every other time I ran it, it updated 0 or 1. Is there a way to find out which rows?

bestsales=# update keyword set revenue = random()*10 where id = cast(random()*99999 as int);
UPDATE 3

id is the primary key.

 id               | integer                        | not null default nextval('keyword_id_seq'::regclass)
    "keyword_pkey" PRIMARY KEY, btree (id)

I tried running it as a SELECT:

bestsales=# select * from keyword where id = cast(random()*99999 as int);
  id   |       keyword       | seed_id | source | search_count | country | language | volume | cpc  | competition | modified_on | google_violation | revenue | bing_violation
-------+---------------------+---------+--------+--------------+---------+----------+--------+------+-------------+-------------+------------------+---------+----------------
  6833 | vizio m190mv        |         | GOOGLE |            0 |         |          |     70 | 0.38 |        0.90 |             |                  |         |
 65765 | shiatsu massage mat |         | SPYFU  |            0 |         |          |    110 | 0.69 |             |             |                  |         |
 87998 | granary flour       |         | SPYFU  |            0 |         |          |     40 | 0.04 |             |             |                  |         |
(3 rows)

And sometimes it returned more than one row. How is that possible?

PostgreSQL 9.5.3
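
A likely explanation is that random() is a volatile function, so when the table is scanned row by row the expression cast(random()*99999 as int) is re-evaluated for every row, and each row is compared with a fresh value. The number of matches then varies from run to run (often 0 or 1, occasionally more). The Python simulation below models that on a small hypothetical table of 1000 ids, using round() to mirror PostgreSQL's rounding float-to-int cast, and contrasts it with computing the value once. In PostgreSQL, an uncorrelated scalar subquery such as where id = (select cast(random()*99999 as int)) is typically evaluated once, which is one way to pick a single value.

```python
import random


def scan_with_per_row_random(ids: range, rng: random.Random) -> list:
    """Model a scan where random() in the WHERE clause is re-evaluated per row,
    so every row is compared against its own freshly drawn target."""
    n = len(ids)
    return [row_id for row_id in ids if row_id == round(rng.random() * n)]


def scan_with_single_random(ids: range, rng: random.Random) -> list:
    """Model the same filter with the random target computed once up front."""
    target = round(rng.random() * len(ids))
    return [row_id for row_id in ids if row_id == target]


rng = random.Random(0)
ids = range(1, 1001)  # small hypothetical table so the simulation runs quickly
per_row = [len(scan_with_per_row_random(ids, rng)) for _ in range(200)]
single = [len(scan_with_single_random(ids, rng)) for _ in range(200)]
print("per-row random(), match counts seen:", sorted(set(per_row)))
print("single random(),  match counts seen:", sorted(set(single)))
```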

Chloe

Asked: 2016-08-24 11:07:33 +0800 CST

How can this unique index allow duplicate rows?

1

Is there a way this unique index could allow duplicate rows? I thought there might be some extra whitespace characters, but I can't find any.

=> select *, length(keyword), length(country), length(language) from keyword where id in (4588076, 4951423);
   id    |       keyword       | seed_id | source | search_count | country | language | volume | cpc  | competition | modified_on | violation | revenue | length | length | length
---------+---------------------+---------+--------+--------------+---------+----------+--------+------+-------------+-------------+-----------+---------+--------+--------+--------
 4588076 | power wallet review |         | SPYFU  |            0 |         |          |     70 | 0.11 |        0.31 |             |           |         |     19 |        |
 4951423 | power wallet review |         | SPYFU  |            2 |         |          |     70 | 0.11 |        0.31 |             |           |         |     19 |        |
(2 rows)

The index is

"keyword_keyword_country_language" UNIQUE, btree (keyword, country, language)

PostgreSQL 9.5.3
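
The blank length(country) and length(language) values suggest those two columns are NULL rather than empty strings (length('') would print 0). PostgreSQL, including 9.5, treats NULLs as distinct from one another when enforcing a unique index, so rows with the same keyword and NULL country/language do not conflict. A minimal sketch of that rule, assuming Python's sqlite3 as a stand-in (SQLite applies the same NULL rule to UNIQUE indexes) and a table reduced to the indexed columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT, country TEXT, language TEXT)")
conn.execute("CREATE UNIQUE INDEX keyword_keyword_country_language "
             "ON keyword (keyword, country, language)")

# Same keyword, NULL country and language: both inserts succeed,
# because NULL is not considered equal to NULL when checking uniqueness.
conn.execute("INSERT INTO keyword VALUES (4588076, 'power wallet review', NULL, NULL)")
conn.execute("INSERT INTO keyword VALUES (4951423, 'power wallet review', NULL, NULL)")

# With non-NULL values in every indexed column, a duplicate is rejected.
conn.execute("INSERT INTO keyword VALUES (1, 'power wallet review', 'US', 'en')")
try:
    conn.execute("INSERT INTO keyword VALUES (2, 'power wallet review', 'US', 'en')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute(
    "SELECT count(*) FROM keyword WHERE keyword = 'power wallet review'").fetchone()[0]
print(count, duplicate_rejected)  # 3 True
```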

OK, I was planning to remove the other two columns, but I thought I'd test the keyword column and found this:

=> select k1.id, k1.keyword, k2.id, k2.keyword, k1.keyword=k2.keyword from keyword k1, keyword k2 where k1.id=4588076 and k2.id=4951423;
   id    |       keyword       |   id    |       keyword       | ?column?
---------+---------------------+---------+---------------------+----------
 4588076 | power wallet review | 4951423 | power wallet review | f
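
Since length() is 19 for both rows but the equality test returns f, one way to investigate is to compare the stored values code point by code point. In PostgreSQL, encode(convert_to(keyword, 'UTF8'), 'hex') shows the raw bytes of each value. The Python sketch below does the equivalent comparison on two strings; the no-break space is purely a hypothetical example of a character that renders like a normal space, since the actual differing character in these rows is not visible in the output.

```python
import unicodedata


def first_difference(a: str, b: str):
    """Return (index, char_in_a, char_in_b) at the first differing position,
    using None past the end of the shorter string; None if the strings are equal."""
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else None
        cb = b[i] if i < len(b) else None
        if ca != cb:
            return i, ca, cb
    return None


def describe(ch) -> str:
    """Show a character as its code point and Unicode name."""
    return "end of string" if ch is None else f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"


# Hypothetical pair: same visible text and same length, different space character.
a = "power wallet review"
b = "power\u00a0wallet review"
diff = first_difference(a, b)
print(len(a), len(b), diff[0], describe(diff[1]), "vs", describe(diff[2]))
# 19 19 5 U+0020 SPACE vs U+00A0 NO-BREAK SPACE
```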
