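I am running the Postgres query below, which joins each consumer's usage to the service cost that applied at that time, based on when the consumer used the service in the service tables. The query below is a truncated version; sometimes I have to perform 12 or more joins. The problem is that a single query can take up to 2.5 minutes to run. How can I reduce this time? Is my approach correct?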
select
c.consumption,
c.interval_start,
c.interval_end,
s1.value_exc_vat as service1_price,
s2.value_exc_vat as service2_price
from
consumer as c
left join service1 as s1 on c.interval_start >= timestamp '2023-03-18T00:00:00Z'
and c.interval_start < timestamp '2024-02-15T00:00:00Z'
and (
s1.payment_method is null
or s1.payment_method = 'DIRECT_DEBIT'
)
and c.interval_start >= s1.valid_from
and (
c.interval_start < s1.valid_to
or s1.valid_to is null
)
left join service2 as s2 on c.interval_start >= timestamp '2024-02-15T00:00:00Z'
and c.interval_start < timestamp '2025-02-15T00:00:00Z'
and (
s2.payment_method is null
or s2.payment_method = 'DIRECT_DEBIT'
)
and c.interval_start >= s2.valid_from
and (
c.interval_start < s2.valid_to
or s2.valid_to is null
)
order by
c.interval_start desc
I have isolated the problem to this part of the query in each join:
and c.interval_start >= s1.valid_from
and (
c.interval_start < s1.valid_to
or s1.valid_to is null
)
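It appears that finding the right interval on which to join the tables takes most of the time. In both the consumer table and the service tables, a period can span anything from 30 minutes to several months or years, so I can't simply compare c.interval_start = s1.valid_from and c.interval_end = s1.valid_to.
Here is the EXPLAIN (ANALYZE, BUFFERS) output: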
"QUERY PLAN"
"Sort (cost=5523752586.99..5549030615.96 rows=10111211585 width=28) (actual time=140588.278..140589.028 rows=20593 loops=1)"
" Sort Key: c.interval_start DESC"
" Sort Method: quicksort Memory: 2216kB"
" Buffers: shared hit=396"
" -> Nested Loop Left Join (cost=0.00..2633916836.23 rows=10111211585 width=28) (actual time=12.169..140546.307 rows=20593 loops=1)"
" Join Filter: ((c.interval_start >= '2023-03-18 00:00:00'::timestamp without time zone) AND (c.interval_start < '2024-02-15 00:00:00'::timestamp without time zone) AND (c.interval_start >= s1.valid_from) AND ((c.interval_start < s1.valid_to) OR (s1.valid_to IS NULL)))"
" Rows Removed by Join Filter: 559372220"
" Buffers: shared hit=396"
" -> Nested Loop Left Join (cost=0.00..3979716.14 rows=4302977 width=24) (actual time=0.058..27617.147 rows=20593 loops=1)"
" Join Filter: ((c.interval_start >= '2024-02-15 00:00:00'::timestamp without time zone) AND (c.interval_start < '2025-02-15 00:00:00'::timestamp without time zone) AND (c.interval_start >= s2.valid_from) AND ((c.interval_start < s2.valid_to) OR (s2.valid_to IS NULL)))"
" Rows Removed by Join Filter: 176848172"
" Buffers: shared hit=196"
" -> Seq Scan on consumer c (cost=0.00..337.93 rows=20593 width=20) (actual time=0.007..21.813 rows=20593 loops=1)"
" Buffers: shared hit=132"
" -> Materialize (cost=0.00..214.29 rows=8588 width=20) (actual time=0.000..0.272 rows=8588 loops=20593)"
" Buffers: shared hit=64"
" -> Seq Scan on service2 s2 (cost=0.00..171.35 rows=8588 width=20) (actual time=0.006..0.891 rows=8588 loops=1)"
" Filter: ((payment_method IS NULL) OR (payment_method = 'DIRECT_DEBIT'::bpchar))"
" Buffers: shared hit=64"
" -> Materialize (cost=0.00..675.37 rows=27164 width=20) (actual time=0.000..0.896 rows=27164 loops=20593)"
" Buffers: shared hit=200"
" -> Seq Scan on service1 s1 (cost=0.00..539.55 rows=27164 width=20) (actual time=0.002..2.830 rows=27164 loops=1)"
" Filter: ((payment_method IS NULL) OR (payment_method = 'DIRECT_DEBIT'::bpchar))"
" Buffers: shared hit=200"
"Planning Time: 0.107 ms"
"Execution Time: 140590.312 ms"
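I think you can replace
with
after which an index can be used
By using a functional (expression) index, you don't need to restructure the table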
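One possible reading of this suggestion, offered only as a sketch: express the validity window as a range and index that expression with GiST, so the planner can probe an index instead of filtering every row pair in a nested loop. It assumes valid_from and valid_to are timestamp without time zone columns (use tstzrange instead for timestamptz), and the index names are hypothetical.

```sql
-- Hypothetical index names; assumes valid_from / valid_to are "timestamp" columns.
-- tsrange() defaults to '[)' bounds (lower inclusive, upper exclusive),
-- and a NULL valid_to yields a range with no upper bound.
create index service1_validity_idx
    on service1 using gist (tsrange(valid_from, valid_to));
create index service2_validity_idx
    on service2 using gist (tsrange(valid_from, valid_to));

-- The two valid_from / valid_to conditions of each join collapse into a
-- single containment test against the indexed expression.
select
    c.consumption,
    c.interval_start,
    c.interval_end,
    s1.value_exc_vat as service1_price
from
    consumer as c
    left join service1 as s1
        on c.interval_start >= timestamp '2023-03-18T00:00:00Z'
        and c.interval_start < timestamp '2024-02-15T00:00:00Z'
        and (
            s1.payment_method is null
            or s1.payment_method = 'DIRECT_DEBIT'
        )
        and tsrange(s1.valid_from, s1.valid_to) @> c.interval_start
order by
    c.interval_start desc;
```

Two caveats with this rewrite: tsrange() raises an error if valid_from is later than valid_to, and a NULL valid_from is treated as unbounded below, whereas the original comparison would never match such a row. Whether the planner actually picks the index is worth confirming with EXPLAIN (ANALYZE, BUFFERS) on the real data.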