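I am using PostgreSQL 17. As an example, I have a table, table0, with a column, col0, that holds randomly generated strings and is covered by a GIN index. I create such a table with the following setup.sql: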
create table public.table0 (
col0 varchar(25)
);
select setseed(0.12345);
insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);
create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);
vacuum (full, analyze) table0;
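Using select setseed(0.12345); ensures that the table created is the same every time setup.sql is executed. I want to look at the execution plan of a simple partial string match query. For that I use query.sql: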
explain (analyze, buffers)
select * from table0 where col0 like '%abc%' limit 500;
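To my surprise, the execution plan of the query is not constant; it seems to depend on the time between table creation and query execution. To demonstrate this, I created the following bash script: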
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
rm -f records.txt
for i in {1..9}; do
docker run \
--name postgres-db \
--env POSTGRES_DB=postgres \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=mysecretpassword \
--publish 5432:5432 \
--detach postgres
sleep 2 # wait for docker container to start
PGPASSWORD=mysecretpassword psql \
--host=localhost \
--port=5432 \
--username=postgres \
--dbname=postgres \
--set=ON_ERROR_STOP=1 \
--file=setup.sql
sleep $((RANDOM % 10))
PGPASSWORD=mysecretpassword psql \
--host=localhost \
--port=5432 \
--username=postgres \
--dbname=postgres \
--set=ON_ERROR_STOP=1 \
--file=query.sql >> records.txt
docker rm --force postgres-db
done
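This script sets up the table, waits a random time of up to 10 seconds, and then executes query.sql. It does so several times and appends the output of query.sql to records.txt. Looking at records.txt, I see that the query is sometimes executed with a sequential scan and sometimes with an index scan. Here is a filtered version of records.txt (obtained with cat records.txt | grep "\->"):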
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
-> Seq Scan on table0 (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
-> Seq Scan on table0 (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
The full records.txt is:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=10) (actual time=11.527..15.249 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 4.666 ms
Execution Time: 15.657 ms
(13 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..882.39 rows=500 width=9) (actual time=0.068..17.831 rows=500 loops=1)
Buffers: shared read=1439
-> Seq Scan on table0 (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
Filter: ((col0)::text ~~ '%abc%'::text)
Rows Removed by Filter: 282649
Buffers: shared read=1439
Planning:
Buffers: shared hit=55 read=9 dirtied=1
Planning Time: 2.427 ms
Execution Time: 17.918 ms
(10 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=9) (actual time=5.965..8.988 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 2.427 ms
Execution Time: 9.234 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.32..1833.89 rows=500 width=9) (actual time=7.448..11.454 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=81 read=13 dirtied=1
Planning Time: 2.835 ms
Execution Time: 11.721 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.33..1833.75 rows=500 width=9) (actual time=6.662..12.059 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 3.332 ms
Execution Time: 12.511 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=9) (actual time=9.154..13.621 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 4.113 ms
Execution Time: 14.018 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.31..1833.95 rows=500 width=10) (actual time=10.079..13.249 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 3.596 ms
Execution Time: 13.682 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.33..1833.75 rows=500 width=10) (actual time=7.323..12.126 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 5.907 ms
Execution Time: 12.485 ms
(13 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..441.64 rows=500 width=9) (actual time=0.074..23.096 rows=500 loops=1)
Buffers: shared read=1439
-> Seq Scan on table0 (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
Filter: ((col0)::text ~~ '%abc%'::text)
Rows Removed by Filter: 282649
Buffers: shared read=1439
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 2.087 ms
Execution Time: 23.218 ms
(10 rows)
I used dockerised Postgres 17 for easy reproducibility.
Restarting the docker container, running setup.sql, and then:
explain (analyze, buffers, settings)
select * from table0 where col0 like '%abc%' limit 500;
returns:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.32..1833.89 rows=500 width=9) (actual time=9.476..17.923 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=9.475..17.858 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=7.662..7.662 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=55 read=9 dirtied=1
Planning Time: 5.597 ms
Execution Time: 18.334 ms
(13 rows)
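However, I do not understand why a sequential scan is used in some runs and an index scan in others.
Why does the query plan seem to depend on the time between the table setup and the query?
Below I answer some questions that came up in the comments.
Is there any other write activity in the database?
I am not initiating any write activity, and since this is just a container running on my local machine, there should be none unless some automated process exists that I am not aware of.
How busy is your server?
I do not see anything out of the ordinary. It is an Apple M1 Pro running macOS Sonoma 14.5.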
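I only realized this after taking a close look at your setup: you recreate the whole database for every iteration of the loop.
You use setseed() to recreate exactly the same table contents. That is great for building a stable test bed. But setseed() only applies to random() and random_normal() calls in the same session, not to ANALYZE, which collects statistics from its own random sample of rows. So, between iterations, the column statistics vary randomly within the normal sampling spread. This shows up as slightly varying row estimates in the query plans.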
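A minimal sketch of how this could be observed within a single session, assuming table0 and its index from setup.sql already exist. It collects statistics twice and prints the planner's estimates after each run. The numbers may differ between the two runs, for example in the row estimate or in avg_width (which appears as width=9 or width=10 in the plans above), although any given pair of runs can also happen to agree.

```sql
-- First sample: collect statistics and inspect what the planner derives from them.
analyze table0;

select avg_width, n_distinct
from pg_stats
where schemaname = 'public' and tablename = 'table0' and attname = 'col0';

-- Plain EXPLAIN (no ANALYZE option): shows estimates only, the query is not executed.
explain select * from table0 where col0 like '%abc%';

-- Second sample: re-seeding random() is not expected to pin the sample
-- that ANALYZE draws, per the explanation above.
select setseed(0.12345);
analyze table0;

select avg_width, n_distinct
from pg_stats
where schemaname = 'public' and tablename = 'table0' and attname = 'col0';

explain select * from table0 where col0 like '%abc%';
```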
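I recreated your scenario and got 20852 rows containing "abc" out of 12345678 rows in total. To satisfy your LIMIT 500, Postgres needs to read (12345678 / 20852) * 500 = 296031 rows on average, which is not much more expensive than running a bitmap index scan on the trigram GIN index. If the statistics overestimate the frequency of your search pattern, faulty planner constants can lead the query planner astray.
Solution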
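Obviously, your hardware is fast. The default setting random_page_cost = 4 misleads the cost estimation. I expect a more realistic setting of random_page_cost = 1.1 (as suggested in the manual) to eliminate the sequential scans in the given scenario, since it shifts the relative cost estimates by roughly a factor of four. You can also test with a substantially smaller LIMIT or a longer (more selective) search pattern, tune other configuration settings so that index usage becomes (actually!) cheaper in the planner's estimates, or increase the statistics target for the table (or just the one column), as suggested in the manual.
Any of these should make the sequential scans go away for this case.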
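The two main suggestions can be tried from a psql session as sketched below, assuming the table0 from setup.sql. The SET only affects the current session, and the statistics target of 1000 is an illustrative value chosen here, not one taken from the answer.

```sql
-- Session-local change; nothing is written to postgresql.conf.
set random_page_cost = 1.1;

-- The SETTINGS option lists non-default planner settings in the plan output.
explain (analyze, buffers, settings)
select * from table0 where col0 like '%abc%' limit 500;

reset random_page_cost;

-- Alternatively, let ANALYZE sample more rows for col0 to steady the estimates.
-- 1000 is an assumed value for illustration (default_statistics_target defaults to 100).
alter table table0 alter column col0 set statistics 1000;
analyze table0;

explain (analyze, buffers, settings)
select * from table0 where col0 like '%abc%' limit 500;
```

Running each variant over several fresh ANALYZE passes is what would show whether the sequential scans actually stop appearing.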
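Aside 1: VACUUM (FULL, ANALYZE) is overkill; VACUUM (ANALYZE) is good enough for your pristine DB (and much cheaper).
Aside 2: Better yet (and cheaper still), create a "template" database in the DB cluster once, then recreate identical databases (with identical column statistics!) from that template.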
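A sketch of the template approach as a psql script. The database names table0_template and test_run are hypothetical, and the \connect and \include lines are psql meta-commands rather than SQL. It assumes setup.sql and query.sql are in the current directory and that the session starts in the postgres database.

```sql
-- Hypothetical names: table0_template and test_run are illustrative only.
-- One-time setup: build the table, index, and statistics inside the template.
create database table0_template;
\connect table0_template
\include setup.sql
\connect postgres

-- Per test iteration: copy the template, including its stored statistics.
-- No other session may be connected to the template while it is being copied.
create database test_run template table0_template;
\connect test_run
\include query.sql
\connect postgres
drop database test_run;
```

Because ANALYZE runs only once, inside the template, every copy starts from the same statistics, so the plan should no longer vary between iterations for that reason.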