This query takes forever to run (30+ minutes, up to never finishing).
select date,
sc,
( select count(fingerprint_id)
from stats
where hit_date >= t.date
and hit_date < date_add('2020-01-20', interval 1 day)
and hit_type = 0
and fingerprint_id is not null ) as total_fingerprint
from ( select date(hit_date) as date,
sum(sc) as sc
from delayed_stats
where hit_date > date_sub(now(), interval 1 day)
group by date(hit_date)
order by hit_date) t;
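Run individually, the two queries take 1 second and 8 seconds, but combined they never finish. I expected 8-9 seconds. If I replace t.date with a static '2020-01-20', it takes 8 seconds. Simply swapping the static date for t.date causes the query to "hang". The minimal query that reproduces the hang is: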
select date,
(select count(fingerprint_id) from stats where hit_date >= t.date and hit_date < date_add(t.date, interval 1 day) and hit_type = 0 and fingerprint_id is not null) as total_fingerprint
from (select '2020-01-01' as date union select '2020-01-02' as date) t;
Here is the EXPLAIN output for the query:
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 7496 | 100.00 | NULL |
| 3 | DERIVED | delayed_stats | NULL | range | hit_date_idx | hit_date_idx | 5 | NULL | 7496 | 100.00 | Using index condition; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | stats | p20180101,p20180201,p20180301,p20180401,p20180501,p20180601,p20180701,p20180801,p20180901,p20181001,p20181101,p20181201,p20190101,p20190201,p20190301,p20190401,p20190501,p20190601,p20190701,p20190801,p20190901,p20191001,p20191101,p20191201,p20200101,p20200201 | ALL | NULL | NULL | NULL | NULL | 316867000 | 1.00 | Using where |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
3 rows in set, 2 warnings (0.11 sec)
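To see what changes between the static and the correlated version, it may help to run EXPLAIN on the subquery alone with the literal date and compare it against the DEPENDENT SUBQUERY row above. This is only a diagnostic sketch. My assumption, which the plan above does not confirm by itself, is that the literal date lets MySQL prune partitions, whereas with t.date every partition is listed and the subquery is re-evaluated for each outer row.

```sql
-- Static-date version of the dependent subquery, copied from the first query.
-- Compare the partitions, type, and rows columns with the DEPENDENT SUBQUERY row.
explain
select count(fingerprint_id)
from stats
where hit_date >= '2020-01-20'
  and hit_date < date_add('2020-01-20', interval 1 day)
  and hit_type = 0
  and fingerprint_id is not null;
```

If the partitions column shrinks to a single partition in the static version, that would account for 8 seconds in one case and a hang in the other.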
It does not appear to use the hit_date index (PRIMARY KEY (id, hit_date)) for the subquery on the stats table. My end goal is to combine these two queries (with interval 30 day):
select date(hit_date),
sum(sc)
from delayed_stats
where hit_date > date_sub(now(), interval 30 day)
group by date(hit_date)
order by hit_date;
select date(hit_date),
count(fingerprint_id)
from stats
where hit_date > date_sub(now(), interval 30 day)
and hit_type = 0
and fingerprint_id is not null
group by date(hit_date)
order by hit_date; -- 2m21s
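When I look at the query plan for the second query, the one on the stats table, it shows possible_keys as PRIMARY,source_id,stats_bag_id_idx. I tried another way of combining them, a join, but it takes 15m to run when it should only take about 2m.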
select t.date,
sc,
fingerprint_count
from ( select date(hit_date) date,
sum(sc) as sc
from delayed_stats
where hit_date > date_sub(now(), interval 30 day)
group by date(hit_date)
order by hit_date ) t
join ( select date(hit_date) date,
count(fingerprint_id) as fingerprint_count
from stats
where hit_date > date_sub(now(), interval 30 day)
and hit_type = 0
and fingerprint_id is not null
group by date(hit_date)
order by hit_date ) t2 on t.date = t2.date;
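I solved this by using the join approach (two derived tables joined on date), and was able to string 8 different queries together using this structure.
It ran in about 20m, which is fine for a report and less than I expected.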
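For reference, a slightly tidied sketch of the two-table join. It differs from the version I ran in two ways, both of which are my own assumptions about what is wanted: the order by inside each derived table is dropped in favor of a single order by on the final result, and the join is a left join so that a date present in delayed_stats but missing from stats still appears, with a NULL count.

```sql
select t.date,
       t.sc,
       t2.fingerprint_count
from ( select date(hit_date) as date,
              sum(sc) as sc
       from delayed_stats
       where hit_date > date_sub(now(), interval 30 day)
       group by date(hit_date) ) t
-- left join (an assumption): keep days that have no matching rows in stats
left join ( select date(hit_date) as date,
                   count(fingerprint_id) as fingerprint_count
            from stats
            where hit_date > date_sub(now(), interval 30 day)
              and hit_type = 0
              and fingerprint_id is not null
            group by date(hit_date) ) t2 on t.date = t2.date
-- sort once, on the final result
order by t.date;
```

Each additional per-day aggregate would presumably be joined in the same way on date, but I have not timed this variant.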