我今天遇到了一个问题,我不确定这是 MySQL 8 中的错误还是我做错了什么。
前言
在帖子的底部,我附加了一个查询以在本地重现该问题。
我有三张桌子:clients
还有两张。所有表都有大约 50 列。我正在做一个非常简单的查询,通过外键将所有表连接到clients
表中,然后对它们进行排序。
该clients
表有 1M 行。另外两张桌子是空的。
有问题的查询如下所示:
select
test_client.*,
test_join1.*,
test_join2.*
from
test_client
left join test_join1 on test_client.id = test_join1.client_id
left join test_join2 on test_client.id = test_join2.client_id
order by test_client.id
limit 10;
此查询需要 4-5 秒才能完成。
当我删除ORDER BY
它时,在50 ms
. 如果我使用/*+ NO_BNL() */
规划器提示禁用哈希连接,它也会在50 ms
.
这是查询计划:
1. 常规查询ORDER_BY
:
解释分析:
-> Limit: 10 row(s) (actual time=4757..4757 rows=10 loops=1)
-> Sort: test_client.id, limit input to 10 row(s) per chunk (actual time=4757..4757 rows=10 loops=1)
-> Stream results (cost=194554 rows=994944) (actual time=0.59..4241 rows=1e+6 loops=1)
-> Left hash join (test_join2.client_id = test_client.id) (cost=194554 rows=994944) (actual time=0.574..1334 rows=1e+6 loops=1)
-> Left hash join (test_join1.client_id = test_client.id) (cost=147022 rows=994944) (actual time=0.444..1240 rows=1e+6 loops=1)
-> Table scan on test_client (cost=100801 rows=994944) (actual time=0.331..1118 rows=1e+6 loops=1)
-> Hash
-> Table scan on test_join1 (cost=0.0478 rows=1) (actual time=0.0984..0.0984 rows=0 loops=1)
-> Hash
-> Table scan on test_join2 (cost=0.0955 rows=1) (actual time=0.117..0.117 rows=0 loops=1)
解释:
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
|id|select_type|table |partitions|type|possible_keys|key |key_len|ref |rows |filtered|Extra |
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
|1 |SIMPLE |test_client|null |ALL |null |null|null |null|994944|100 |Using temporary; Using filesort |
|1 |SIMPLE |test_join1 |null |ALL |client_id |null|null |null|1 |100 |Using where; Using join buffer (hash join)|
|1 |SIMPLE |test_join2 |null |ALL |client_id |null|null |null|1 |100 |Using where; Using join buffer (hash join)|
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
2. 没有ORDER BY
:
解释分析:
-> Limit: 10 row(s) (cost=194554 rows=10) (actual time=0.0813..0.096 rows=10 loops=1)
-> Left hash join (test_join2.client_id = test_client.id) (cost=194554 rows=994944) (actual time=0.0806..0.0946 rows=10 loops=1)
-> Left hash join (test_join1.client_id = test_client.id) (cost=147022 rows=994944) (actual time=0.0643..0.0776 rows=10 loops=1)
-> Table scan on test_client (cost=100801 rows=994944) (actual time=0.0482..0.0607 rows=10 loops=1)
-> Hash
-> Table scan on test_join1 (cost=0.0478 rows=1) (actual time=0.00996..0.00996 rows=0 loops=1)
-> Hash
-> Table scan on test_join2 (cost=0.0955 rows=1) (actual time=0.0105..0.0105 rows=0 loops=1)
解释:
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
|id|select_type|table |partitions|type|possible_keys|key |key_len|ref |rows |filtered|Extra |
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
|1 |SIMPLE |test_client|null |ALL |null |null|null |null|994944|100 |null |
|1 |SIMPLE |test_join1 |null |ALL |client_id |null|null |null|1 |100 |Using where; Using join buffer (hash join)|
|1 |SIMPLE |test_join2 |null |ALL |client_id |null|null |null|1 |100 |Using where; Using join buffer (hash join)|
+--+-----------+-----------+----------+----+-------------+----+-------+----+------+--------+------------------------------------------+
3./*+ NO_BNL() */
提示:
解释分析:
-> Limit: 10 row(s) (cost=497474 rows=10) (actual time=0.179..0.3 rows=10 loops=1)
-> Nested loop left join (cost=497474 rows=10) (actual time=0.178..0.296 rows=10 loops=1)
-> Nested loop left join (cost=248737 rows=10) (actual time=0.144..0.239 rows=10 loops=1)
-> Index scan on test_client using PRIMARY (cost=0.0131 rows=10) (actual time=0.0782..0.148 rows=10 loops=1)
-> Index lookup on test_join1 using client_id (client_id=test_client.id) (cost=0.25 rows=1) (actual time=0.00824..0.00824 rows=0 loops=10)
-> Index lookup on test_join2 using client_id (client_id=test_client.id) (cost=0.25 rows=1) (actual time=0.0048..0.0048 rows=0 loops=10)
解释:
+--+-----------+-----------+----------+-----+-------------+---------+-------+----------------------------+----+--------+-----+
|id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows|filtered|Extra|
+--+-----------+-----------+----------+-----+-------------+---------+-------+----------------------------+----+--------+-----+
|1 |SIMPLE |test_client|null |index|null |PRIMARY |4 |null |10 |100 |null |
|1 |SIMPLE |test_join1 |null |ref |client_id |client_id|5 |hashjoinissue.test_client.id|1 |100 |null |
|1 |SIMPLE |test_join2 |null |ref |client_id |client_id|5 |hashjoinissue.test_client.id|1 |100 |null |
+--+-----------+-----------+----------+-----+-------------+---------+-------+----------------------------+----+--------+-----+
这是 mysql 8 中的错误,还是我做错了什么?
编辑1
只是为了澄清 - 真实的表看起来与此处的表并不完全相同,但它具有相似数量的列(平均而言)。
真实的数据集也确实有一些数据test_join1
和test_join2
表格 - 我刚刚意识到那里的数据量似乎并不那么重要。
我添加了这么多列来说明问题。然而我注意到的是,如果查询不要求所有列,而只要求 id:
select
test_client.id,
test_join1.id,
test_join2.id
from
test_client
left join test_join1 on test_client.id = test_join1.client_id
left join test_join2 on test_client.id = test_join2.client_id
order by test_client.id limit 10;
那么查询几乎是即时的。请注意,任何其他列都不会用于任何进一步的过滤、连接或排序,但是当规划器期望结果数据足够大时,它会切换到使用散列连接,这不是最佳解决方案。
解释:
+--+-----------+-----------+----------+-----+-------------+--------------+-------+----------------------------+----+--------+-----------+
|id|select_type|table |partitions|type |possible_keys|key |key_len|ref |rows|filtered|Extra |
+--+-----------+-----------+----------+-----+-------------+--------------+-------+----------------------------+----+--------+-----------+
|1 |SIMPLE |test_client|null |index|null |test_client_id|4 |null |10 |100 |Using index|
|1 |SIMPLE |test_join1 |null |ref |client_id |client_id |5 |hashjoinissue.test_client.id|1 |100 |Using index|
|1 |SIMPLE |test_join2 |null |ref |client_id |client_id |5 |hashjoinissue.test_client.id|1 |100 |Using index|
+--+-----------+-----------+----------+-----+-------------+--------------+-------+----------------------------+----+--------+-----------+
解释分析:
-> Limit: 10 row(s) (cost=497474 rows=10) (actual time=0.796..0.835 rows=10 loops=1)
-> Nested loop left join (cost=497474 rows=10) (actual time=0.794..0.833 rows=10 loops=1)
-> Nested loop left join (cost=248737 rows=10) (actual time=0.721..0.757 rows=10 loops=1)
-> Covering index scan on test_client using test_client_id (cost=0.0131 rows=10) (actual time=0.543..0.545 rows=10 loops=1)
-> Covering index lookup on test_join1 using client_id (client_id=test_client.id) (cost=0.25 rows=1) (actual time=0.0198..0.0198 rows=0 loops=10)
-> Covering index lookup on test_join2 using client_id (client_id=test_client.id) (cost=0.25 rows=1) (actual time=0.00645..0.00645 rows=0 loops=10)
查询重现数据
以下查询可用于在本地重现数据:
drop table if exists test_join1;
drop table if exists test_join2;
drop table if exists test_client cascade;
create table if not exists test_client (
id int auto_increment primary key,
col1 varchar(255),
col2 varchar(255),
col3 varchar(255),
col4 varchar(255),
col5 varchar(255),
col6 varchar(255),
col7 varchar(255),
col8 varchar(255),
col9 varchar(255),
col10 varchar(255),
col11 varchar(255),
col12 varchar(255),
col13 varchar(255),
col14 varchar(255),
col15 varchar(255),
col16 varchar(255),
col17 varchar(255),
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255)
);
create table if not exists test_join1 (
id int auto_increment primary key,
client_id int,
col1 varchar(255),
col2 varchar(255),
col3 varchar(255),
col4 varchar(255),
col5 varchar(255),
col6 varchar(255),
col7 varchar(255),
col8 varchar(255),
col9 varchar(255),
col10 varchar(255),
col11 varchar(255),
col12 varchar(255),
col13 varchar(255),
col14 varchar(255),
col15 varchar(255),
col16 varchar(255),
col17 varchar(255),
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255),
foreign key (client_id) references test_client(id)
);
create table if not exists test_join2 (
id int auto_increment primary key,
client_id int,
col1 varchar(255),
col2 varchar(255),
col3 varchar(255),
col4 varchar(255),
col5 varchar(255),
col6 varchar(255),
col7 varchar(255),
col8 varchar(255),
col9 varchar(255),
col10 varchar(255),
col11 varchar(255),
col12 varchar(255),
col13 varchar(255),
col14 varchar(255),
col15 varchar(255),
col16 varchar(255),
col17 varchar(255),
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255),
foreign key (client_id) references test_client(id)
);
SET @@cte_max_recursion_depth = 1000001;
insert into test_client
WITH RECURSIVE nrows(id) AS (
SELECT 1 as id UNION ALL
SELECT id + 1 from nrows WHERE id<=1000000
)
SELECT id, '',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'' FROM nrows;
create index test_client_id on test_client (id);