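I have a query that filters on an indexed boolean column. However, the query takes a long time to complete. When I leave this filter out, the query returns very quickly.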
Here are the explain plans. The first one includes processed is true and takes a long time to complete. The second one omits it and returns immediately.
explain select count(*) from listen_events where (started_at >='2021-12-26' and started_at <'2021-12-27') and processed is true;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=212405.62..212405.63 rows=1 width=8)
   ->  Bitmap Heap Scan on listen_events  (cost=187657.78..212390.09 rows=6213 width=0)
         Recheck Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
         Filter: (processed IS TRUE)
         ->  BitmapAnd  (cost=187657.78..187657.78 rows=6213 width=0)
               ->  Bitmap Index Scan on index_listen_events_on_started_at  (cost=0.00..17323.56 rows=813898 width=0)
                     Index Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
               ->  Bitmap Index Scan on listen_events_processed_idx  (cost=0.00..170330.87 rows=9125639 width=0)
                     Index Cond: (processed = true)
(9 rows)
=> explain select count(*) from listen_events where (started_at >='2021-12-26' and started_at <'2021-12-27');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=24549.13..24549.14 rows=1 width=8)
   ->  Gather  (cost=24548.92..24549.13 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=23548.92..23548.93 rows=1 width=8)
               ->  Parallel Index Only Scan using index_listen_events_on_started_at on listen_events  (cost=0.58..22701.11 rows=339124 width=0)
                     Index Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
(6 rows)
Here is the table definition:
Table "public.listen_events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------+-----------------------------+-----------+----------+-------------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('listen_events_id_seq'::regclass) | plain | |
event_type | text | | | | extended | |
stream_type | text | | | | extended | |
event_id | text | | | | extended | |
broadcast_uid | text | | | | extended | |
user_agent | text | | | | extended | |
city | text | | | | extended | |
country | text | | | | extended | |
referrer | text | | | | extended | |
country_code | character varying(2) | | | | extended | |
continent_code | character varying(2) | | | | extended | |
user_id | integer | | | | plain | |
started_at | timestamp without time zone | | | | plain | |
created_at | timestamp without time zone | | | | plain | |
updated_at | timestamp without time zone | | | | plain | |
ip_address | cidr | | | | main | |
location | point | | | | plain | |
ended_at | timestamp without time zone | | | | plain | |
server_id | text | | | | extended | |
channel_id | integer | | | | plain | |
id_bigint | bigint | | | | plain | |
processed | boolean | | not null | false | plain | |
Indexes:
"listen_events_pkey" PRIMARY KEY, btree (id)
"index_listen_events_event_id" btree (event_id)
"index_listen_events_on_broadcast_uid" btree (broadcast_uid)
"index_listen_events_on_started_at" btree (started_at)
"index_listen_events_on_user_id" btree (user_id)
"listen_events_processed_idx" btree (processed)
Options: autovacuum_enabled=true, autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=30000, autovacuum_vacuum_cost_delay=0, autovacuum_analyze_scale_factor=0, autovacuum_analyze_threshold=30000, toast.autovacuum_enabled=true
The table currently has 1.9 billion rows, most of which have processed = false.
Any clues as to why this happens?
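You didn't show the EXPLAIN (ANALYZE, BUFFERS) output, so I can only guess. In any case, there are two main differences.
First, since there is no single index that fits the query, PostgreSQL combines two indexes. That is somewhat more work than scanning a single index: the plan builds a bitmap from an estimated 9 million entries in listen_events_processed_idx before it can intersect it with the bitmap for started_at.
Second, and more importantly, the fast query can use an index-only scan, while the slow one cannot. The slow query has to fetch the matching rows from the table itself (the Bitmap Heap Scan).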
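To replace the guess with measurements, the slow query from the question can be run under EXPLAIN (ANALYZE, BUFFERS). This is only a sketch of how to collect that output; note that ANALYZE actually executes the query, so it will take as long as the slow query does.

```sql
-- Executes the slow query for real and reports actual row counts,
-- timings, and buffer hits/reads for every plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM listen_events
WHERE started_at >= '2021-12-26'
  AND started_at <  '2021-12-27'
  AND processed IS TRUE;
```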
I would create a two-column index like this:
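A minimal sketch of such an index, with some assumptions: the column order (started_at, processed) and the index name listen_events_started_at_processed_idx are chosen here for illustration, and CONCURRENTLY is an assumption that writes to the 1.9-billion-row table should not be blocked while the index builds. With both columns in one index, the query can in principle be answered by an index-only scan, provided the table has been vacuumed recently enough for the visibility map to be mostly set.

```sql
-- Hypothetical index name; column order is an assumption.
-- CONCURRENTLY avoids blocking writes during the build,
-- but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY listen_events_started_at_processed_idx
    ON listen_events (started_at, processed);
```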
If you only ever query for rows with processed IS TRUE, you can also create a smaller and faster index:
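One way this could look is a partial index on started_at restricted to processed rows. This is a sketch, not necessarily the exact statement intended: the index name is made up, and the predicate is written as processed IS TRUE on the assumption that matching the query's condition literally makes it easiest for the planner to prove that the index applies.

```sql
-- Hypothetical index name. Only rows with processed IS TRUE are indexed,
-- so the index stays small when most rows have processed = false.
CREATE INDEX CONCURRENTLY listen_events_started_at_where_processed_idx
    ON listen_events (started_at)
    WHERE processed IS TRUE;
```

Queries that filter on processed = false, or that don't filter on processed at all, cannot use this index and would still rely on index_listen_events_on_started_at.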