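This seems to be a variant of a common problem in which a small change to a query's LIMIT clause flips the plan to one that performs far worse. In this case I have two tables in a PostgreSQL 13 database: assets has ~140 rows; asset_data has about 141 million rows.
My query is as follows: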
SELECT
    a.dp AS dp,
    a.at AS at,
    a.dt AS dt,
    a.it AS it,
    a.ai AS ai,
    ad.idt AS idt,
    ad.data AS data
FROM assets a
JOIN asset_data ad ON a.id = ad.asset_id_fk
WHERE
    a.dp = 'kr'
    AND a.at = 'fr'
    AND a.dt = 'oh'
    AND a.it = '1m'
    AND a.ai = 'st'
ORDER BY idt DESC
LIMIT 8000
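There is an index on idt.
With LIMIT 8000 the planner simply walks the idt index to get the ordering; with LIMIT 9000 it performs the sort after the join.
Execution time goes from about 3 seconds to nearly 12 minutes.
After reading some questions of this type, I tried a VACUUM ANALYZE, which changed the query plan, but not in any significant way.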
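The plans further down include Buffers and Settings lines, which suggests they were captured with something like the following. This is an assumed reconstruction; the exact commands are not given in this post.

```sql
-- Refresh visibility information and planner statistics on both tables.
VACUUM (ANALYZE) assets;
VACUUM (ANALYZE) asset_data;

-- Capture the plan with timing, buffer usage and non-default settings
-- (the SETTINGS option is available from PostgreSQL 12 onward).
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM assets a
JOIN asset_data ad ON a.id = ad.asset_id_fk
WHERE a.dp = 'kr'
  AND a.at = 'fr'
  AND a.dt = 'oh'
  AND a.it = '1m'
  AND a.ai = 'st'
ORDER BY ad.idt DESC
LIMIT 8000;
```

Changing only the final LIMIT value between runs keeps the two plans directly comparable.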
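Update:
I also tried setting the statistics target for the idt and asset_id_fk columns to 1000, but that did not help, and it is not obvious to me that it should have.
idt by itself is not unique
idt + asset_id_fk is unique and has a corresponding constraint
The query against the assets table returns only one row
There is an index on the combination of asset_id_fk and idt
There is a separate index on idt
Any suggestions on an appropriate way to solve this?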
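For reference, the statistics change mentioned in the update would typically be applied as sketched below. The exact statements that were run are not shown in this post, so the schema qualification and the final ANALYZE are assumptions.

```sql
-- Raise the per-column statistics target (the default is 100).
ALTER TABLE public.asset_data ALTER COLUMN idt SET STATISTICS 1000;
ALTER TABLE public.asset_data ALTER COLUMN asset_id_fk SET STATISTICS 1000;

-- The new target only takes effect once statistics are gathered again.
ANALYZE public.asset_data;
```

A larger target gives the planner finer histograms for each column on its own; it does not describe how asset_id_fk values are distributed along idt order.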
Create table statement:
CREATE TABLE IF NOT EXISTS public.asset_data
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
dp character varying(40) COLLATE pg_catalog."default",
at character varying(40) COLLATE pg_catalog."default",
it character varying(10) COLLATE pg_catalog."default",
ai character varying(40) COLLATE pg_catalog."default",
idt timestamp without time zone NOT NULL,
data jsonb NOT NULL,
inserted timestamp without time zone DEFAULT now(),
updated timestamp without time zone DEFAULT now(),
dt character varying(20) COLLATE pg_catalog."default",
asset_id_fk integer NOT NULL DEFAULT 1,
CONSTRAINT asset_data_pkey PRIMARY KEY (id),
CONSTRAINT asset_data_1_idx UNIQUE (dp, at, it, idt, dt, ai),
CONSTRAINT idx_asset_data_2_unique UNIQUE (asset_id_fk, idt)
);
CREATE INDEX IF NOT EXISTS asset_data_it_aif_idx
ON public.asset_data (idt, asset_id_fk);
CREATE INDEX IF NOT EXISTS idx_asset_data_asset_fk
ON public.asset_data (asset_id_fk);
CREATE INDEX IF NOT EXISTS idx_asset_data_asset_fk_idt_index
ON public.asset_data (asset_id_fk, idt);
CREATE INDEX IF NOT EXISTS idx_asset_data_idt_idx
ON public.asset_data (idt);
EXPLAIN ANALYZE with LIMIT 8000:
Limit (cost=0.57..4170458.87 rows=8000 width=173) (actual time=0.160..3677.025 rows=8000 loops=1)
Buffers: shared hit=61448 read=14657 dirtied=756
-> Nested Loop (cost=0.57..507975375.06 rows=974426 width=173) (actual time=0.159..3675.290 rows=8000 loops=1)
Join Filter: (a.id = ad.asset_id_fk)
Rows Removed by Join Filter: 474053
Buffers: shared hit=61448 read=14657 dirtied=756
-> Index Scan Backward using idx_asset_data_idt_idx on asset_data ad (cost=0.57..505855992.43 rows=141291824 width=146) (actual time=0.051..3437.070 rows=482053 loops=1)
Buffers: shared hit=61446 read=14657 dirtied=756
-> Materialize (cost=0.00..5.27 rows=1 width=35) (actual time=0.000..0.000 rows=1 loops=482053)
Buffers: shared hit=2
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.052..0.067 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 144
Buffers: shared hit=2
Settings: effective_cache_size = '1507160kB'
Planning:
Buffers: shared hit=4
Planning Time: 0.445 ms
Execution Time: 3679.005 ms
EXPLAIN ANALYZE with LIMIT 9000:
Limit (cost=4269756.79..4269779.29 rows=9000 width=173) (actual time=700091.606..700094.588 rows=9000 loops=1)
Buffers: shared hit=133 read=1538340, temp read=74205 written=112013
-> Sort (cost=4269756.79..4272192.85 rows=974426 width=173) (actual time=700091.604..700093.738 rows=9000 loops=1)
Sort Key: ad.idt DESC
Sort Method: external merge Disk: 304680kB
Buffers: shared hit=133 read=1538340, temp read=74205 written=112013
-> Nested Loop (cost=0.57..4200885.77 rows=974426 width=173) (actual time=1.190..693283.441 rows=1687735 loops=1)
Buffers: shared hit=133 read=1538340
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.032..0.050 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 144
Buffers: shared hit=2
-> Index Scan using idx_asset_data_asset_fk on asset_data ad (cost=0.57..4190011.91 rows=1086860 width=146) (actual time=1.151..691172.001 rows=1687735 loops=1)
Index Cond: (asset_id_fk = a.id)
Buffers: shared hit=131 read=1538340
Settings: effective_cache_size = '1507160kB'
Planning:
Buffers: shared hit=4
Planning Time: 0.317 ms
Execution Time: 700245.659 ms
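The switch between LIMIT 8000 and LIMIT 9000 looks consistent with the planner pricing a LIMIT over an ordered scan as the child's startup cost plus the fraction limit / estimated_rows of its run cost. The sketch below only redoes that arithmetic with numbers copied from the two plans above; the formula is an assumption about the planner's costing, not something the plans state.

```sql
-- Inputs: the Nested Loop under the Limit in the LIMIT 8000 plan
-- (cost=0.57..507975375.06 rows=974426) and the total cost of the
-- Limit node in the LIMIT 9000 plan (4269779.29).
-- Assumed formula: startup + (total - startup) * limit / estimated_rows
SELECT
    round(0.57 + (507975375.06 - 0.57) * 8000 / 974426, 2) AS ordered_scan_cost_at_8000,
    round(0.57 + (507975375.06 - 0.57) * 9000 / 974426, 2) AS ordered_scan_cost_at_9000,
    4269779.29 AS sort_plan_cost_at_9000;
```

The first column should reproduce the 4170458.87 shown on the Limit node of the LIMIT 8000 plan. The second comes out at roughly 4.69 million, which is above the sort plan's 4269779.29, so under this reading the idt-ordered scan stops looking cheaper somewhere between the two limits. The estimate treats the matching rows as evenly spread through idt order.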
Update 2:
After switching to the following query:
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM (
SELECT a.id, a.dp, a.at, a.dt, a.it, a.ai
FROM assets a
WHERE a.dp = 'kr'
AND a.at = 'fr'
AND a.dt = 'oh'
AND a.it = '1m'
AND a.ai = 'st'
LIMIT 1 -- make sure the planner understands
) a
JOIN asset_data ad ON ad.asset_id_fk = a.id
ORDER BY ad.idt DESC
LIMIT 8000;
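The only difference is that the query plan now switches between LIMIT 9000 and LIMIT 10000 (the LIMIT 9000 plan is shown first, then the LIMIT 10000 plan):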
Limit (cost=0.57..4206410.37 rows=9000 width=173) (actual time=0.139..432.265 rows=9000 loops=1)
Buffers: shared hit=82159
-> Nested Loop (cost=0.57..507975395.07 rows=1086860 width=173) (actual time=0.138..431.358 rows=9000 loops=1)
Join Filter: (a.id = ad.asset_id_fk)
Rows Removed by Join Filter: 533354
Buffers: shared hit=82159
-> Index Scan Backward using idx_asset_data_idt_idx on asset_data ad (cost=0.57..505856012.43 rows=141291824 width=146) (actual time=0.049..164.873 rows=542354 loops=1)
Buffers: shared hit=82158
-> Materialize (cost=0.00..5.28 rows=1 width=35) (actual time=0.000..0.000 rows=1 loops=542354)
Buffers: shared hit=1
-> Subquery Scan on a (cost=0.00..5.27 rows=1 width=35) (actual time=0.032..0.035 rows=1 loops=1)
Buffers: shared hit=1
-> Limit (cost=0.00..5.26 rows=1 width=35) (actual time=0.032..0.033 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on assets a_1 (cost=0.00..5.26 rows=1 width=35) (actual time=0.031..0.032 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 96
Buffers: shared hit=1
Settings: effective_cache_size = '1507160kB'
Planning Time: 0.322 ms
Execution Time: 432.904 ms
Limit (cost=4278529.50..4278554.50 rows=10000 width=173) (actual time=702389.569..702392.909 rows=10000 loops=1)
Buffers: shared hit=350 read=1538221, temp read=74241 written=112019
-> Sort (cost=4278529.50..4281246.65 rows=1086860 width=173) (actual time=702389.568..702392.118 rows=10000 loops=1)
Sort Key: ad.idt DESC
Sort Method: external merge Disk: 304704kB
Buffers: shared hit=350 read=1538221, temp read=74241 written=112019
-> Nested Loop (cost=0.57..4200885.78 rows=1086860 width=173) (actual time=1.267..695545.975 rows=1687836 loops=1)
Buffers: shared hit=350 read=1538221
-> Limit (cost=0.00..5.26 rows=1 width=35) (actual time=0.071..0.074 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.071..0.071 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 96
Buffers: shared hit=1
-> Index Scan using idx_asset_data_asset_fk on asset_data ad (cost=0.57..4190011.91 rows=1086860 width=146) (actual time=1.190..693414.817 rows=1687836 loops=1)
Index Cond: (asset_id_fk = a.id)
Buffers: shared hit=349 read=1538221
Settings: effective_cache_size = '1507160kB'
Planning Time: 0.301 ms
Execution Time: 702526.728 ms
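A further variant that may be worth testing is to pin asset_id_fk to a single value through a scalar subquery, so that the planner can treat it as a constant and read the (asset_id_fk, idt) index backward, already in idt order. This is a sketch that has not been run against this data; whether the planner actually chooses idx_asset_data_asset_fk_idt_index here is an assumption.

```sql
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM asset_data ad
JOIN assets a ON a.id = ad.asset_id_fk
WHERE ad.asset_id_fk = (
        -- Evaluated once before the main scan; LIMIT 1 guards against
        -- the "more than one row returned" error for scalar subqueries.
        SELECT id
        FROM assets
        WHERE dp = 'kr'
          AND at = 'fr'
          AND dt = 'oh'
          AND it = '1m'
          AND ai = 'st'
        LIMIT 1
      )
ORDER BY ad.idt DESC
LIMIT 9000;
```

If this helps, the plan should show an Index Scan Backward on the (asset_id_fk, idt) index with no Sort node above it.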