Apologies if this seems like a duplicate question. I'm using Postgres 11.6 on AWS RDS. I have two tables:
CREATE TABLE public.e
(
id character varying(32) COLLATE pg_catalog."default" NOT NULL,
p_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT e_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE TABLE public.ed
(
e_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
<other columns + primary key>
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
I have an index on ed.e_id:
CREATE INDEX ix_ed_e_id
ON public.ed USING btree
(e_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
When I run this query:
select *
from ed, e
where e.id = ed.e_id
and e.p_id = '5c7cae8df6d10f1064b2eaf5';
(the problem persists when using from ed inner join e on e.id = ed.e_id)
the explain analyze plan is:
Gather (cost=1136.68..141235.01 rows=28320 width=311) (actual time=0.456..871.155 rows=102709 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Hash Join (cost=136.68..137403.01 rows=11800 width=311) (actual time=0.241..688.095 rows=34236 loops=3)
        Hash Cond: (ed.e_id = e.id)
        -> Parallel Seq Scan on ed ed (cost=0.00..133210.10 rows=1544610 width=218) (actual time=0.005..314.524 rows=1235269 loops=3)
        -> Hash (cost=135.67..135.67 rows=81 width=93) (actual time=0.125..0.126 rows=81 loops=3)
              Buckets: 1024 Batches: 1 Memory Usage: 19kB
              -> Bitmap Heap Scan on e e (cost=4.91..135.67 rows=81 width=93) (actual time=0.045..0.097 rows=81 loops=3)
                    Recheck Cond: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
                    Heap Blocks: exact=31
                    -> Bitmap Index Scan on ix_e_p_id (cost=0.00..4.89 rows=81 width=0) (actual time=0.035..0.035 rows=81 loops=3)
                          Index Cond: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
Planning Time: 0.329 ms
Execution Time: 877.804 ms
It uses a Parallel Seq Scan on ed to match on ed.e_id.
When I SET SESSION enable_seqscan = OFF, the explain plan is:
Nested Loop (cost=0.72..395895.14 rows=28320 width=311) (actual time=0.037..60.068 rows=102709 loops=1)
  -> Index Scan using e_pkey on e e (cost=0.29..917.61 rows=81 width=93) (actual time=0.019..4.995 rows=81 loops=1)
        Filter: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
        Rows Removed by Filter: 10522
  -> Index Scan using ix_ed_e_id on ed ed (cost=0.43..4757.83 rows=11844 width=218) (actual time=0.013..0.334 rows=1268 loops=81)
        Index Cond: (e_id = e.id)
Planning Time: 0.273 ms
Execution Time: 64.675 ms
That is an order of magnitude faster (877 ms vs 64 ms)! I tried VACUUM ANALYZE ed, but that didn't help. I even tried changing e.id and ed.e_id to the UUID type, but that didn't help either.
How can I convince Postgres to use the ix_ed_e_id index without setting enable_seqscan to off?
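It seems that PostgreSQL overestimates the cost of the index scans, which makes it prefer the hash join over the nested loop join.
There are two parameters that tell PostgreSQL about the hardware and influence its estimate of index scan cost:
random_page_cost: the larger it is compared to seq_page_cost, the more expensive PostgreSQL estimates the random I/O of an index scan to be relative to sequential I/O. So you can lower this parameter to encourage index scans.
effective_cache_size: this tells the optimizer how much memory is available for caching data. If the value is high, it will assume that the index is cached and price index scans lower.
Perhaps adjusting these parameters will change PostgreSQL's mind, even though the cost estimates are quite far off.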
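A minimal sketch of how these two parameters could be tried out at session level before changing anything permanently. The values 1.1 and '4GB' are assumptions chosen only for illustration; they do not come from the question and should be adapted to the actual storage and memory of the instance.

```sql
-- Session-level experiment: these settings last only for the current connection.
-- The values 1.1 and '4GB' are illustrative assumptions, not recommendations.
SET random_page_cost = 1.1;        -- default is 4.0; seq_page_cost defaults to 1.0
SET effective_cache_size = '4GB';  -- planner estimate only; allocates no memory

-- Re-run the query from the question and check which join strategy is chosen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM ed
JOIN e ON e.id = ed.e_id
WHERE e.p_id = '5c7cae8df6d10f1064b2eaf5';

-- Return both parameters to their configured values for this session.
RESET random_page_cost;
RESET effective_cache_size;
```

If the plan switches to the nested loop with the index scan on ix_ed_e_id, the same values can then be made persistent. On RDS that would typically be done through the DB parameter group, or per database with ALTER DATABASE ... SET.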