This seems to be a variant of the common problem where a minor change to a query's LIMIT clause flips the query plan to one that performs terribly. In this case I have two tables in a PostgreSQL 13 database: assets, with ~140 rows, and asset_data, with roughly 141 million rows.
My query is as follows:
SELECT
a.dp as dp,
a.at as at,
a.dt as dt,
a.it as it,
a.ai as ai,
ad.idt as idt,
ad.data as data
FROM assets a
JOIN asset_data ad ON a.id = ad.asset_id_fk
WHERE
a.dp = 'kr'
AND a.at = 'fr'
AND a.dt = 'oh'
AND a.it = '1m'
AND a.ai = 'st'
ORDER BY idt desc
LIMIT 8000
There is an index on idt.
With a LIMIT of 8000 it simply uses the idt index for the sort; with a LIMIT of 9000 it performs the sort after the join. Net performance goes from 3 seconds to nearly 12 minutes.
After reading some questions of this type, I tried a VACUUM ANALYZE, which changed the query plan, but not in any meaningful way.
Update:
I also tried setting the statistics target to 1000 on the idt and asset_id_fk columns, but that didn't help, and it isn't obvious to me that it should.
idt by itself is not unique
idt + asset_id_fk is unique and has a corresponding constraint
the assets table returns only a single row
there is an index on the combination of asset_id_fk and idt
there is a separate index on idt
Any suggestions on an appropriate way to fix this?
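For reference, the statistics-target change described in the update would take roughly this form (standard PostgreSQL DDL; a sketch, not necessarily the exact commands that were run):

```sql
-- Raise the per-column statistics target from the default (100) to 1000,
-- then re-gather statistics so the planner sees the extra detail:
ALTER TABLE asset_data ALTER COLUMN idt SET STATISTICS 1000;
ALTER TABLE asset_data ALTER COLUMN asset_id_fk SET STATISTICS 1000;
ANALYZE asset_data;
```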
Create table statement:
CREATE TABLE IF NOT EXISTS public.asset_data
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
dp character varying(40) COLLATE pg_catalog."default",
at character varying(40) COLLATE pg_catalog."default",
it character varying(10) COLLATE pg_catalog."default",
ai character varying(40) COLLATE pg_catalog."default",
idt timestamp without time zone NOT NULL,
data jsonb NOT NULL,
inserted timestamp without time zone DEFAULT now(),
updated timestamp without time zone DEFAULT now(),
dt character varying(20) COLLATE pg_catalog."default",
asset_id_fk integer NOT NULL DEFAULT 1,
CONSTRAINT asset_data_pkey PRIMARY KEY (id),
CONSTRAINT asset_data_1_idx UNIQUE (dp, at, it, idt, dt, ai),
CONSTRAINT idx_asset_data_2_unique UNIQUE (asset_id_fk, idt)
);
CREATE INDEX IF NOT EXISTS asset_data_it_aif_idx
ON public.asset_data (idt, asset_id_fk);
CREATE INDEX IF NOT EXISTS idx_asset_data_asset_fk
ON public.asset_data (asset_id_fk);
CREATE INDEX IF NOT EXISTS idx_asset_data_asset_fk_idt_index
ON public.asset_data (asset_id_fk, idt);
CREATE INDEX IF NOT EXISTS idx_asset_data_idt_idx
ON public.asset_data (idt);
EXPLAIN ANALYZE with LIMIT 8000:
Limit (cost=0.57..4170458.87 rows=8000 width=173) (actual time=0.160..3677.025 rows=8000 loops=1)
Buffers: shared hit=61448 read=14657 dirtied=756
-> Nested Loop (cost=0.57..507975375.06 rows=974426 width=173) (actual time=0.159..3675.290 rows=8000 loops=1)
Join Filter: (a.id = ad.asset_id_fk)
Rows Removed by Join Filter: 474053
Buffers: shared hit=61448 read=14657 dirtied=756
-> Index Scan Backward using idx_asset_data_idt_idx on asset_data ad (cost=0.57..505855992.43 rows=141291824 width=146) (actual time=0.051..3437.070 rows=482053 loops=1)
Buffers: shared hit=61446 read=14657 dirtied=756
-> Materialize (cost=0.00..5.27 rows=1 width=35) (actual time=0.000..0.000 rows=1 loops=482053)
Buffers: shared hit=2
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.052..0.067 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 144
Buffers: shared hit=2
Settings: effective_cache_size = '1507160kB'
Planning:
Buffers: shared hit=4
Planning Time: 0.445 ms
Execution Time: 3679.005 ms
EXPLAIN ANALYZE with LIMIT 9000:
Limit (cost=4269756.79..4269779.29 rows=9000 width=173) (actual time=700091.606..700094.588 rows=9000 loops=1)
Buffers: shared hit=133 read=1538340, temp read=74205 written=112013
-> Sort (cost=4269756.79..4272192.85 rows=974426 width=173) (actual time=700091.604..700093.738 rows=9000 loops=1)
Sort Key: ad.idt DESC
Sort Method: external merge Disk: 304680kB
Buffers: shared hit=133 read=1538340, temp read=74205 written=112013
-> Nested Loop (cost=0.57..4200885.77 rows=974426 width=173) (actual time=1.190..693283.441 rows=1687735 loops=1)
Buffers: shared hit=133 read=1538340
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.032..0.050 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 144
Buffers: shared hit=2
-> Index Scan using idx_asset_data_asset_fk on asset_data ad (cost=0.57..4190011.91 rows=1086860 width=146) (actual time=1.151..691172.001 rows=1687735 loops=1)
Index Cond: (asset_id_fk = a.id)
Buffers: shared hit=131 read=1538340
Settings: effective_cache_size = '1507160kB'
Planning:
Buffers: shared hit=4
Planning Time: 0.317 ms
Execution Time: 700245.659 ms
Update 2:
After switching to the following query:
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM (
SELECT a.id, a.dp, a.at, a.dt, a.it, a.ai
FROM assets a
WHERE a.dp = 'kr'
AND a.at = 'fr'
AND a.dt = 'oh'
AND a.it = '1m'
AND a.ai = 'st'
LIMIT 1 -- make sure the planner understands
) a
JOIN asset_data ad ON ad.asset_id_fk = a.id
ORDER BY ad.idt DESC
LIMIT 8000;
The only difference is that the query plan now switches between 9000 and 10000. With LIMIT 9000:
Limit (cost=0.57..4206410.37 rows=9000 width=173) (actual time=0.139..432.265 rows=9000 loops=1)
Buffers: shared hit=82159
-> Nested Loop (cost=0.57..507975395.07 rows=1086860 width=173) (actual time=0.138..431.358 rows=9000 loops=1)
Join Filter: (a.id = ad.asset_id_fk)
Rows Removed by Join Filter: 533354
Buffers: shared hit=82159
-> Index Scan Backward using idx_asset_data_idt_idx on asset_data ad (cost=0.57..505856012.43 rows=141291824 width=146) (actual time=0.049..164.873 rows=542354 loops=1)
Buffers: shared hit=82158
-> Materialize (cost=0.00..5.28 rows=1 width=35) (actual time=0.000..0.000 rows=1 loops=542354)
Buffers: shared hit=1
-> Subquery Scan on a (cost=0.00..5.27 rows=1 width=35) (actual time=0.032..0.035 rows=1 loops=1)
Buffers: shared hit=1
-> Limit (cost=0.00..5.26 rows=1 width=35) (actual time=0.032..0.033 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on assets a_1 (cost=0.00..5.26 rows=1 width=35) (actual time=0.031..0.032 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 96
Buffers: shared hit=1
Settings: effective_cache_size = '1507160kB'
Planning Time: 0.322 ms
Execution Time: 432.904 ms
And with LIMIT 10000:
Limit (cost=4278529.50..4278554.50 rows=10000 width=173) (actual time=702389.569..702392.909 rows=10000 loops=1)
Buffers: shared hit=350 read=1538221, temp read=74241 written=112019
-> Sort (cost=4278529.50..4281246.65 rows=1086860 width=173) (actual time=702389.568..702392.118 rows=10000 loops=1)
Sort Key: ad.idt DESC
Sort Method: external merge Disk: 304704kB
Buffers: shared hit=350 read=1538221, temp read=74241 written=112019
-> Nested Loop (cost=0.57..4200885.78 rows=1086860 width=173) (actual time=1.267..695545.975 rows=1687836 loops=1)
Buffers: shared hit=350 read=1538221
-> Limit (cost=0.00..5.26 rows=1 width=35) (actual time=0.071..0.074 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on assets a (cost=0.00..5.26 rows=1 width=35) (actual time=0.071..0.071 rows=1 loops=1)
Filter: (((dp)::text = 'kr'::text) AND ((at)::text = 'fr'::text) AND ((dt)::text = 'oh'::text) AND ((it)::text = '1m'::text) AND ((ai)::text = 'st'::text))
Rows Removed by Filter: 96
Buffers: shared hit=1
-> Index Scan using idx_asset_data_asset_fk on asset_data ad (cost=0.57..4190011.91 rows=1086860 width=146) (actual time=1.190..693414.817 rows=1687836 loops=1)
Index Cond: (asset_id_fk = a.id)
Buffers: shared hit=349 read=1538221
Settings: effective_cache_size = '1507160kB'
Planning Time: 0.301 ms
Execution Time: 702526.728 ms
Whatever else you do, create a multicolumn index on asset_data. You later disclosed an index on (asset_id_fk, idt), which is good, but my index with a matching sort order is better. Consider replacing idx_asset_data_asset_fk_idt_index with it. You typically don't need idx_asset_data_asset_fk in addition, either. See:
Likewise, asset_data_it_aif_idx already covers almost everything idx_asset_data_idt_idx brings to the table. If asset_data.data were a narrow column, it could pay to INCLUDE that column in the index to allow index-only scans - but not with the big JSON column you disclosed.
Since you know that only a single asset matches that WHERE clause, the following query makes sure Postgres understands as much and does not switch to the inferior plan (in combination with the index above!):
To be sure, I propose a LATERAL query. If the server configuration and column statistics are deceptive, my previously suggested query could still let Postgres slip into the inferior plan:
Or try this minimal query. It should be fastest:
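The answer's original code blocks did not survive; what follows are reconstructed sketches of the two queries it describes, using the table and column names from the question (treat these as illustrations, not the answer's exact code):

```sql
-- A LATERAL form: resolve the single matching asset first, then pull
-- its newest rows, so the planner cannot misjudge the join:
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM   assets a
CROSS  JOIN LATERAL (
   SELECT ad.idt, ad.data
   FROM   asset_data ad
   WHERE  ad.asset_id_fk = a.id
   ORDER  BY ad.idt DESC
   LIMIT  8000
   ) ad
WHERE  a.dp = 'kr'
AND    a.at = 'fr'
AND    a.dt = 'oh'
AND    a.it = '1m'
AND    a.ai = 'st';

-- The minimal form: use assets only to produce the id:
SELECT idt, data
FROM   asset_data
WHERE  asset_id_fk = (
   SELECT id
   FROM   assets
   WHERE  dp = 'kr' AND at = 'fr' AND dt = 'oh' AND it = '1m' AND ai = 'st'
   )
ORDER  BY idt DESC
LIMIT  8000;
```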
It does not return dp, at, ..., as those are necessarily identical to your input (you obviously know them already). So we only need assets to produce the id.
Alternatively, you can let Postgres know about the selectivity of the filter by adding a UNIQUE constraint on assets (dp, at, dt, it, ai). Then your original query should not end up with a bad query plan either.
Aside:
Since you already store all of those columns in assets, remove the redundant storage from asset_data. Rename asset_id_fk to asset_id. Clean up related names; a consistent naming convention for your indexes wouldn't hurt either. And add a FOREIGN KEY constraint to enforce referential integrity:
Clustering the big table on my suggested index would help a lot:
See:
You might also want to put some work into improving the server configuration and column statistics.
Related:
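The DDL this answer suggests was likewise in code blocks that were lost; a sketch under the answer's assumptions (all index and constraint names here are hypothetical, and DESC is inferred from the query's ORDER BY idt DESC):

```sql
-- Multicolumn index with matching sort order:
CREATE INDEX IF NOT EXISTS asset_data_asset_id_fk_idt_idx
    ON asset_data (asset_id_fk, idt DESC);

-- UNIQUE constraint so the planner knows the filter matches at most one asset:
ALTER TABLE assets
    ADD CONSTRAINT assets_dp_at_dt_it_ai_uni UNIQUE (dp, at, dt, it, ai);

-- FOREIGN KEY to enforce referential integrity:
ALTER TABLE asset_data
    ADD CONSTRAINT asset_data_asset_id_fkey
    FOREIGN KEY (asset_id_fk) REFERENCES assets (id);

-- Cluster the big table on the suggested index, then refresh statistics:
CLUSTER asset_data USING asset_data_asset_id_fk_idt_idx;
ANALYZE asset_data;
```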
Looking at an excerpt from your first plan:
It reads 482053/(61446+14657) = 6.33 rows per buffer access. But its cost estimate is 505855992.43/141291824 = 3.58 per row, only slightly less than the default random_page_cost of 4. So it thinks it will find only about one row per buffer. This issue doesn't explain the entire bad cost estimate, but it certainly explains enough of it to matter.
This suggests that your physical row ordering is moderately well correlated with the idt column, but the planner doesn't know it. What is the value of pg_stats.correlation for that column? Does it look like an accurate estimate? Maybe different parts of the table are correlated to different degrees, so that no single overall estimate is accurate. In that case partitioning might help, or simply running CLUSTER on the table.
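The correlation this answer asks about can be read straight from the pg_stats system view:

```sql
-- Planner's view of how well physical row order tracks each column
-- (correlation near 1.0 or -1.0 means nearly sorted; near 0 means random):
SELECT tablename, attname, correlation
FROM   pg_stats
WHERE  tablename = 'asset_data'
AND    attname IN ('idt', 'asset_id_fk');
```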
Awesome; since I know the most common query patterns, I'm going to drop all indexes except the one I want to force the query planner to use, and I'm going to update the application code to do two queries instead of one query with a join, because the query planner simply refuses to acknowledge any information telling it that the subquery will return a single row.
Details:
Well, after a bunch of experimentation, here's what I came up with. A lot of this is supposition, since I don't know exactly what's going on under the hood, but results are results.
First, some context: the data is time-series data, where idt is short for "interval datetime" and, under optimal conditions, basically amounts to an insertion timestamp. The first problem is probably that, due to several migrations and bulk changes, the rows were not physically inserted in an optimal order, which caused the query planner to ignore the fact that the index was already sorted and to attempt the sort after the join... if the limit moved from 9000 rows to 10000. That's a terrible plan.
jjanes's answer explained some things about clustering, correlation statistics, and physical insertion order, which led me to dig into that information further. Since Erwin had included some similar commands (clustering on the asset_id_fk and idt columns), I tried those first, since I could copy-paste them.
After clustering on the index over the asset_id_fk and idt columns, and perhaps a reindex and vacuum analyze, the correlation statistics updated to show correlation with asset_id_fk, as expected. This sped up the query but did not change the query plan. The speedup isn't surprising, since everything was already sorted (I think). However, given the table's typical operation (inserts driven by the idt column), I don't think this clustering approach would scale without regular re-clustering. Also, the fact that the query plan wasn't fixed is a red flag.
So I decided to cluster on just the idt column, using the single-column index on it. After the cluster/reindex/vacuum analyze, the correlation statistics updated as expected and showed full correlation with the idt column. And, lo and behold, the query plan correctly used the idt index for the sort, and this held up to at least 150k rows rather than just 10k. I ran several experiments with the following queries: in both cases they took about 2:40 to complete.
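The clustering step described here would look roughly like this (the single-column index name comes from the question's schema; the exact commands were not shown):

```sql
-- Rewrite the table in idt order using the single-column index,
-- then refresh planner statistics:
CLUSTER asset_data USING idx_asset_data_idt_idx;
VACUUM ANALYZE asset_data;
```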
I ran a second experiment because I noticed that basically every time Erwin said X would do Y, it didn't. Nothing he said would "inform" the query planner that the subquery would actually return one row. To test this, I simply replaced the WHERE conditions with the specific id of the asset I wanted, so those were the queries. These changed the query plan depending on the asset.
Some used the index on the asset_id_fk and idt columns, then did the join with the assets table and sorted after the join. These were really fast (took only a few seconds).
Some used the index on the idt column, did a gather merge strategy, then did the join with the asset table. These took about 50 seconds.
Looking at the row count for the assets, this was probably a reasonable change in query plan but I obviously can't verify since I can't force a query plan. Regardless, this took significantly less time in all cases.
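The direct-id experiment described above presumably took this shape (the literal id 42 is a placeholder for an actual assets.id):

```sql
SELECT a.dp, a.at, a.dt, a.it, a.ai, ad.idt, ad.data
FROM   assets a
JOIN   asset_data ad ON a.id = ad.asset_id_fk
WHERE  a.id = 42   -- specific asset id instead of the five WHERE conditions
ORDER  BY ad.idt DESC
LIMIT  8000;
```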
My next experiment was the same as the above but I dropped the index on the idt column. For the asset with the larger number of rows, this forced the query planner to use the index on the asset_id_fk and idt columns and the sorting was done using the index.
This query took about 40 seconds and, even cooler, seemed to result in some caching because the second run took less than one second.
My last experiment was the same as the above but instead of using the asset id, I used the multiple query parameters. This resulted in the query planner not making the optimizations for the asset with the larger number of rows and thus taking long enough that I stopped it (more than 2:40).
My conclusions are as follows:
As far as I can tell, the query planner simply ignores any attempt to tell it that a query will return one row if the table is small enough.
In situations like this, the better option is to simply do two queries...one for the id from the small table and a second query that doesn't involve a join.
The PostgreSQL query planner can be shockingly dumb and unpredictable at times and if you've ever googled "how to force postgresql to use a particular index" and felt the responses of "the query planner will do a better job than you" were condescending and unsupported by evidence, congratulations, you're probably right. Doing a fairly simple join like this should not require so much work to get it to work reliably and predictably.
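The two-query approach from the conclusions could be sketched as follows (the application feeds the id returned by the first query into the second; $1 is a bind parameter):

```sql
-- Query 1: resolve the single asset id from the small table.
SELECT id
FROM   assets
WHERE  dp = 'kr' AND at = 'fr' AND dt = 'oh' AND it = '1m' AND ai = 'st';

-- Query 2: fetch the data without a join, using the id from query 1.
SELECT idt, data
FROM   asset_data
WHERE  asset_id_fk = $1
ORDER  BY idt DESC
LIMIT  8000;
```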