我有一个过滤结果成本很高的查询,我想我应该添加一个索引来优化计划,但我到目前为止尝试的索引没有任何影响。我可以通过为过滤列添加复合索引来优化查询吗?这是计划:
Limit (cost=3069.33..14926.59 rows=4 width=509) (actual time=258424.190..258424.197 rows=4 loops=1)
InitPlan 1 (returns $0)
-> HashAggregate (cost=82.19..82.99 rows=80 width=8) (actual time=1320.215..1320.535 rows=2045 loops=1)
Group Key: booking_passengers.bid
Batches: 1 Memory Usage: 257kB
-> Index Scan using idx_booking_passengers_user_id on booking_passengers (cost=0.44..81.99 rows=80 width=8) (actual time=10.687..1314.519 rows=2045 loops=1)
Index Cond: ((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text)
-> Incremental Sort (cost=2986.35..18414332.62 rows=6211 width=509) (actual time=258424.188..258424.189 rows=4 loops=1)
" Sort Key: booking_data.last_segment_arrival_at DESC, booking_data.bid"
Presorted Key: booking_data.last_segment_arrival_at
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB
-> Index Scan Backward using idx_booking_data_last_segment_arrival_at on booking_data (cost=0.44..18414054.67 rows=6211 width=509) (actual time=48419.376..258424.093 rows=5 loops=1)
Index Cond: (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone)
Filter: ((is_deleted IS FALSE) AND (bid >= 1100000) AND (((confirmation_sent IS TRUE) AND ((final_status)::text <> 'refunded'::text)) OR ((final_status)::text = 'confirmed'::text) OR ((confirmation_sent IS FALSE) AND ((final_status)::text = 'closed'::text))) AND (((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) OR (bid = ANY ($0))))
Rows Removed by Filter: 2315888
Planning Time: 2.132 ms
Execution Time: 258424.387 ms
这是查询:
explain analyze
SELECT *
FROM booking_data
WHERE booking_data.bid >= 1100000
AND booking_data.is_deleted IS false
AND booking_data.last_segment_arrival_at < '2022-06-13 13:36'
AND (booking_data.user_id = 'NJ8QigsGcQCDOttoGsD3iS'
OR booking_data.bid = ANY (CAST(array((
SELECT DISTINCT booking_passengers.bid AS anon_2
FROM booking_passengers
WHERE booking_passengers.user_id = 'NJ8QigsGcQCDOttoGsD3iS')) AS BIGINT[]))
)
AND (booking_data.confirmation_sent IS true
AND booking_data.final_status != 'refunded'
OR booking_data.final_status = 'confirmed'
OR booking_data.confirmation_sent IS false
AND booking_data.final_status IN ('closed')
)
ORDER BY booking_data.last_segment_arrival_at DESC, booking_data.bid ASC
LIMIT 4 OFFSET 0
booking_data 表上的当前索引:
create index idx_booking_data_final_status on booking_data (final_status);
create index idx_booking_data_user_id on booking_data (user_id);
create index idx_booking_data_last_segment_arrival_at on booking_data (last_segment_arrival_at);
create index idx_booking_data_first_segment_arrival_at on booking_data (first_segment_arrival_at);
create index idx_booking_data_confirmed_at on booking_data (confirmed_at);
create index idx_booking_data_booked_email on booking_data (booked, email);
create index idx_booking_data_first_last_segment_bid_user_id on booking_data (first_segment_arrival_at, last_segment_arrival_at, bid, user_id);
我添加了索引:
CREATE index CONCURRENTLY idx_booking_data_user_id_last_segment_arrival_at on booking_data (user_id, last_segment_arrival_at);
它现在有关于暂存数据库的计划(具有生产数据的较弱实例)。这是计划:
Limit (cost=13432.55..13432.56 rows=4 width=509) (actual time=11958.229..11958.235 rows=4 loops=1)
InitPlan 1 (returns $0)
-> HashAggregate (cost=82.19..82.99 rows=80 width=8) (actual time=2741.877..2742.215 rows=2053 loops=1)
Group Key: booking_passengers.bid
Batches: 1 Memory Usage: 257kB
-> Index Scan using idx_booking_passengers_user_id on booking_passengers (cost=0.44..81.99 rows=80 width=8) (actual time=18.064..2734.284 rows=2053 loops=1)
Index Cond: ((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text)
-> Sort (cost=13349.57..13365.09 rows=6210 width=509) (actual time=11958.227..11958.230 rows=4 loops=1)
" Sort Key: booking_data.last_segment_arrival_at DESC, booking_data.bid"
Sort Method: top-N heapsort Memory: 28kB
-> Bitmap Heap Scan on booking_data (cost=195.64..13256.42 rows=6210 width=509) (actual time=3771.506..11952.815 rows=854 loops=1)
Recheck Cond: ((((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone)) OR ((bid = ANY ($0)) AND (bid >= 1100000)))
Filter: ((is_deleted IS FALSE) AND (bid >= 1100000) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone) AND (((confirmation_sent IS TRUE) AND ((final_status)::text <> 'refunded'::text)) OR ((final_status)::text = 'confirmed'::text) OR ((confirmation_sent IS FALSE) AND ((final_status)::text = 'closed'::text))))
Rows Removed by Filter: 10202
Heap Blocks: exact=10935
-> BitmapOr (cost=195.64..195.64 rows=12634 width=0) (actual time=3718.959..3718.961 rows=0 loops=1)
-> Bitmap Index Scan on idx_booking_data_user_id_last_segment_arrival_at (cost=0.00..176.81 rows=12625 width=0) (actual time=17.294..17.294 rows=11025 loops=1)
Index Cond: (((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone))
-> Bitmap Index Scan on booking_data_pkey (cost=0.00..15.72 rows=10 width=0) (actual time=3701.663..3701.663 rows=2062 loops=1)
Index Cond: ((bid = ANY ($0)) AND (bid >= 1100000))
Planning Time: 2.263 ms
Execution Time: 11958.434 ms
首次运行查询后,执行时间更快:
Limit (cost=13432.55..13432.56 rows=4 width=509) (actual time=29.641..29.647 rows=4 loops=1)
InitPlan 1 (returns $0)
-> HashAggregate (cost=82.19..82.99 rows=80 width=8) (actual time=2.507..2.761 rows=2053 loops=1)
Group Key: booking_passengers.bid
Batches: 1 Memory Usage: 257kB
-> Index Scan using idx_booking_passengers_user_id on booking_passengers (cost=0.44..81.99 rows=80 width=8) (actual time=0.021..1.664 rows=2053 loops=1)
Index Cond: ((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text)
-> Sort (cost=13349.57..13365.09 rows=6210 width=509) (actual time=29.640..29.643 rows=4 loops=1)
" Sort Key: booking_data.last_segment_arrival_at DESC, booking_data.bid"
Sort Method: top-N heapsort Memory: 28kB
-> Bitmap Heap Scan on booking_data (cost=195.64..13256.42 rows=6210 width=509) (actual time=11.942..28.832 rows=854 loops=1)
Recheck Cond: ((((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone)) OR ((bid = ANY ($0)) AND (bid >= 1100000)))
Filter: ((is_deleted IS FALSE) AND (bid >= 1100000) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone) AND (((confirmation_sent IS TRUE) AND ((final_status)::text <> 'refunded'::text)) OR ((final_status)::text = 'confirmed'::text) OR ((confirmation_sent IS FALSE) AND ((final_status)::text = 'closed'::text))))
Rows Removed by Filter: 10202
Heap Blocks: exact=10935
-> BitmapOr (cost=195.64..195.64 rows=12634 width=0) (actual time=10.139..10.140 rows=0 loops=1)
-> Bitmap Index Scan on idx_booking_data_user_id_last_segment_arrival_at (cost=0.00..176.81 rows=12625 width=0) (actual time=2.024..2.024 rows=11025 loops=1)
Index Cond: (((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) AND (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone))
-> Bitmap Index Scan on booking_data_pkey (cost=0.00..15.72 rows=10 width=0) (actual time=8.113..8.113 rows=2062 loops=1)
Index Cond: ((bid = ANY ($0)) AND (bid >= 1100000))
Planning Time: 0.404 ms
Execution Time: 29.765 ms
在生产实例上,所有查询运行都很慢,即使它是更强大的实例(idx_booking_data_user_id_last_segment_arrival_at
不使用索引):
Limit (cost=523.03..2268.86 rows=4 width=509) (actual time=28549.479..28549.482 rows=4 loops=1)
InitPlan 1 (returns $0)
-> HashAggregate (cost=82.19..82.99 rows=80 width=8) (actual time=155.070..155.307 rows=2053 loops=1)
Group Key: booking_passengers.bid
Batches: 1 Memory Usage: 257kB
-> Index Scan using idx_booking_passengers_user_id on booking_passengers (cost=0.44..81.99 rows=80 width=8) (actual time=0.414..153.733 rows=2053 loops=1)
Index Cond: ((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text)
-> Incremental Sort (cost=440.05..2710839.81 rows=6210 width=509) (actual time=28549.478..28549.479 rows=4 loops=1)
" Sort Key: booking_data.last_segment_arrival_at DESC, booking_data.bid"
Presorted Key: booking_data.last_segment_arrival_at
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB
-> Index Scan Backward using idx_booking_data_last_segment_arrival_at on booking_data (cost=0.44..2710561.90 rows=6210 width=509) (actual time=2034.195..28549.417 rows=5 loops=1)
Index Cond: (last_segment_arrival_at < '2022-06-13 13:36:00+00'::timestamp with time zone)
Filter: ((is_deleted IS FALSE) AND (bid >= 1100000) AND (((confirmation_sent IS TRUE) AND ((final_status)::text <> 'refunded'::text)) OR ((final_status)::text = 'confirmed'::text) OR ((confirmation_sent IS FALSE) AND ((final_status)::text = 'closed'::text))) AND (((user_id)::text = 'NJ8QigsGcQCDOttoGsD3iS'::text) OR (bid = ANY ($0))))
Rows Removed by Filter: 2323153
Planning Time: 1.845 ms
Execution Time: 28549.694 ms
这是关于表格分析的答案吗?
SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'booking_passengers';
所以在两个相关表上运行 ANALYZE :
ANALYZE VERBOSE public.booking_data;
ANALYZE VERBOSE public.booking_passengers;
生产索引仍未使用:(
你的 WHERE 有 5 个 ANDed together 块。在没有 LIMIT 的情况下,每个单独返回多少行?
select count(*) FROM booking_data WHERE bid >= 1100000
- 28208008
select count(*) FROM booking_data WHERE is_deleted IS false
- 29249188
select count(*) FROM booking_data WHERE last_segment_arrival_at < '2022-06-13 13:36'
- 23594003
select count(*)
FROM booking_data
WHERE (booking_data.user_id = 'NJ8QigsGcQCDOttoGsD3iS'
OR booking_data.bid = ANY (CAST(array((
SELECT DISTINCT booking_passengers.bid AS anon_2
FROM booking_passengers
WHERE booking_passengers.user_id = 'NJ8QigsGcQCDOttoGsD3iS')) AS BIGINT[]))
)
11079
select count(*)
FROM booking_data
WHERE (booking_data.confirmation_sent IS true
AND booking_data.final_status != 'refunded'
OR booking_data.final_status = 'confirmed'
OR booking_data.confirmation_sent IS false
AND booking_data.final_status IN ('closed')
)
17294003
我按照建议使用更高的 statistics_target 运行 ANALYZE:
show default_statistics_target ;
set default_statistics_target to 1000;
ANALYZE VERBOSE public.booking_data;
ANALYZE VERBOSE public.booking_passengers;
但仍然没有使用 user_id 和 last_segment_arrival_at 的索引:(
在第一个计划中,通过向后跟踪索引,它可以提供已经在请求的排序中的行,并且一旦找到通过查询条件的 LIMIT 4 加上足够的更多以打破“last_segment_arrival_at”上的任何关系,它就可以提前停止。根据该计划,我们知道它预计有 6211 行通过条件,但我们不知道实际有多少行通过条件。根据第二个计划,我们知道它实际上找到了通过相同条件的 854 行。从 854 字段中获取限制比从 6211 字段中获取限制要多得多,因此它错误估计了该计划的成本也就不足为奇了。还有其他原因,它也可能被错误估计,但既然我们已经有了这个原因的证据,为什么还要推测超出我们所能看到的范围呢?
是什么导致不正确的行估计?我们不知道。可能是 WHERE 的 5 个顶级 AND 部分中的一个或多个部分被错误估计(您没有回答我的问题),或者可能是它们被单独正确估计,但在组合时被错误估计,因为它们不是统计上相互独立。可能只是增加统计样本量就可以解决问题。这很容易测试,您只需要在会话中本地增加 default_statistics_target,分析这两个表,看看它是否有所作为。如果可行,那么您需要决定是否要在系统范围内永久增加 default_statistics_target,或者只是为这些表覆盖它。
那么,为什么这个计划错误会导致更好的产品计划而不是分期呢?如果您查看前两个计划,您会发现两者的成本估算几乎相等,即使运行时间完全不同。由于它们一开始就非常接近,因此即使是很小的差异也可能导致它们交换顺序。因此,不必有一个深刻的答案,如果一个系统在一个系统上排名第一,而另一个系统在另一个系统上排名第一,这可能只是运气不好。但这也可能是一个真正的系统性差异。如果您查看第一个和第四个计划,它们的行估算值几乎相同,但成本估算值却大不相同,这可能是设置的差异。prod 上的 effective_cache_size 很可能较高,或者 random_page_cost 较低。其中任何一个都可能对这两个计划产生不同的影响,从而使选择的计划发生变化。幸运的是,这两个设置都可以在您的会话中本地覆盖,因此很容易测试在 prod 上将它们更改为与在舞台上相同会导致它选择另一个计划。
如果这些都不起作用,或者如果您只是想在 prod 上强制执行其他计划,以便您至少可以为它制定计划,您可以通过在本地将 enable_incremental_sort 设置为 off 或删除索引 idx_booking_data_last_segment_arrival_at 或通过重写 ORDER BY 以包含一些虚拟算术,例如: