Esta consulta leva uma eternidade para ser executada (30+m - infinito).
select date,
sc,
( select count(fingerprint_id)
from stats
where hit_date >= t.date
and hit_date < date_add('2020-01-20', interval 1 day)
and hit_type = 0
and fingerprint_id is not null ) as total_fingerprint
from ( select date(hit_date) as date,
sum(sc) as sc
from delayed_stats
where hit_date > date_sub(now(), interval 1 day)
group by date(hit_date)
order by hit_date) t;
As consultas individuais levam 1s e 8s para serem executadas, mas combinadas nunca terminam. Eu esperava 8-9s. Se eu substituir t.date
pelo estático '2020-01-20', levará 8s. Apenas substituir uma data estática por t.date
causa que a consulta 'trava'. A consulta mínima que replica esse enforcamento é
select date,
(select count(fingerprint_id) from stats where hit_date >= t.date and hit_date < date_add(t.date, interval 1 day) and hit_type = 0 and fingerprint_id is not null) as total_fingerprint
from (select '2020-01-01' as date union select '2020-01-02' as date) t;
Esta é a explicação da consulta:
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 7496 | 100.00 | NULL |
| 3 | DERIVED | delayed_stats | NULL | range | hit_date_idx | hit_date_idx | 5 | NULL | 7496 | 100.00 | Using index condition; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | stats | p20180101,p20180201,p20180301,p20180401,p20180501,p20180601,p20180701,p20180801,p20180901,p20181001,p20181101,p20181201,p20190101,p20190201,p20190301,p20190401,p20190501,p20190601,p20190701,p20190801,p20190901,p20191001,p20191101,p20191201,p20200101,p20200201 | ALL | NULL | NULL | NULL | NULL | 316867000 | 1.00 | Using where |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
3 rows in set, 2 warnings (0.11 sec)
Ele não parece estar usando o índice hit_date ( PRIMARY KEY (
id ,
hit_date )
) na subconsulta para stats
tabela. Meu objetivo final é combinar essas duas consultas ( interval 30 day
):
select date(hit_date),
sum(sc)
from delayed_stats
where hit_date > date_sub(now(), interval 30 day)
group by date(hit_date)
order by hit_date;
select date(hit_date),
count(fingerprint_id)
from stats
where hit_date > date_sub(now(), interval 30 day)
and hit_type = 0
and fingerprint_id is not null
group by date(hit_date)
order by hit_date; -- 2m21s
Quando vejo o plano de consulta para a segunda consulta na stats
tabela, ele mostra o possible_keys
as PRIMARY,source_id,stats_bag_id_idx
. Tentei outra forma de combiná-los, com uma junção, mas isso levou 15m para rodar, quando deveria levar apenas 2m.
select t.date,
sc,
fingerprint_count
from ( select date(hit_date) date,
sum(sc) as sc
from delayed_stats
where hit_date > date_sub(now(), interval 30 day)
group by date(hit_date)
order by hit_date ) t
join ( select date(hit_date) date,
count(fingerprint_id) as fingerprint_count
from stats
where hit_date > date_sub(now(), interval 30 day)
and hit_type = 0
and fingerprint_id is not null
group by date(hit_date)
order by hit_date ) t2 on t.date = t2.date;
Eu resolvi isso usando o último exemplo de junção e consegui encadear 8 consultas diferentes usando esta estrutura:
Demorou cerca de 20m para ser executado, o que é bom para um relatório e menos do que eu pensava que levaria.