We are using an Amazon RDS instance:
PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
I have a simple, classic top-1-per-group query. I need one row for each creativeScheduleId.
Here are the table and index definitions:
CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
    id serial PRIMARY KEY,
    "creativeScheduleId" text NOT NULL,
    -- other columns
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id
    ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);
When I order by id ASC, the engine just reads the index and does not do any additional sort:
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id ASC) AS rn -- !
    FROM creative_schedule_status_histories AS cssh
) AS history
WHERE history.rn = 1;
"Subquery Scan on history (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579766"
" -> WindowAgg (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
" Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"
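I expected to see exactly the same query plan when I order by id DESC, but the plan contains an explicit sort that spills to disk, and obviously everything becomes slower.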
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id DESC) AS rn -- !
    FROM creative_schedule_status_histories AS cssh
) AS history
WHERE history.rn = 1;
"Subquery Scan on history (cost=1267132.63..1438582.84 rows=26377 width=41) (actual time=11974.827..15840.338 rows=709046 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579802"
" -> WindowAgg (cost=1267132.63..1372640.45 rows=5275391 width=49) (actual time=11974.825..15529.679 rows=5288848 loops=1)"
" -> Sort (cost=1267132.63..1280321.11 rows=5275391 width=41) (actual time=11974.814..13547.038 rows=5288848 loops=1)"
" Sort Key: cssh.""creativeScheduleId"", cssh.id DESC"
" Sort Method: external merge Disk: 263992kB"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353550.90 rows=5275391 width=41) (actual time=0.015..1386.310 rows=5288848 loops=1)"
" Heap Fetches: 2508"
"Planning Time: 0.078 ms"
"Execution Time: 15949.877 ms"
I expected the given index to be equally useful in both variants of the query.
Can't Postgres scan the index backward here?
What am I missing?
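One observation that may help narrow this down: the Sort Key in the slow plan is "creativeScheduleId" ascending followed by id descending. Scanning idx_creativescheduleid_id backward would return both columns in descending order, which is a different ordering. The following is a minimal sketch, assuming an extra index with mixed directions is acceptable on this table. The index name idx_creativescheduleid_id_desc is hypothetical, and whether the planner then drops the Sort node would need to be confirmed with EXPLAIN:

```sql
-- Hypothetical index name. The column directions mirror the Sort Key
-- reported in the slow plan: "creativeScheduleId" ascending, id descending.
CREATE INDEX IF NOT EXISTS idx_creativescheduleid_id_desc
    ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id DESC);

-- Re-run the DESC variant and check whether the Sort node is still present.
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id DESC) AS rn
    FROM creative_schedule_status_histories AS cssh
) AS history
WHERE history.rn = 1;
```

The extra index adds write and storage overhead, so it is only worth keeping if the plan actually changes.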
When I run the query for one specific creativeScheduleId, Postgres uses the index equally well for both the ASC and DESC sort orders. There is no explicit sort in either variant:
EXPLAIN (ANALYZE)
SELECT id, "creativeScheduleId"
FROM creative_schedule_status_histories AS cssh
WHERE "creativeScheduleId" = '24238370-a64c-4b30-ac8e-27eb2b693aca'
ORDER BY id DESC -- or ASC, no sort
LIMIT 1;
"Limit (cost=0.56..0.71 rows=1 width=41) (actual time=0.022..0.022 rows=1 loops=1)"
" -> Index Only Scan Backward using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..14.06 rows=86 width=41) (actual time=0.021..0.021 rows=1 loops=1)"
" Index Cond: (""creativeScheduleId"" = '24238370-a64c-4b30-ac8e-27eb2b693aca'::text)"
" Heap Fetches: 0"
"Planning Time: 0.064 ms"
"Execution Time: 0.033 ms"
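Here we actually see Index Only Scan Backward, so Postgres is able to do it, just not for the whole table.
Any ideas on how to encourage the engine to scan the whole index backward for the first query, which reads the whole table?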
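One formulation that might be worth testing is DISTINCT ON, which is PostgreSQL-specific. This is a sketch rather than a confirmed fix: it assumes that ordering both columns descending lets the planner satisfy the ORDER BY with a backward scan of idx_creativescheduleid_id, which has not been verified against this table. It returns the row with the highest id for each creativeScheduleId:

```sql
-- Keeps the first row of each "creativeScheduleId" group in ORDER BY order,
-- i.e. the row with the largest id, because id is sorted descending.
EXPLAIN (ANALYZE)
SELECT DISTINCT ON (cssh."creativeScheduleId")
       cssh.id, cssh."creativeScheduleId"
FROM creative_schedule_status_histories AS cssh
ORDER BY cssh."creativeScheduleId" DESC, cssh.id DESC;
```

The rows come back ordered by "creativeScheduleId" descending, so any consumer that relies on a particular group order would need to account for that.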