I have this table:
                     Table "public.lineitem"
     Column      |     Type      | Collation | Nullable | Default
-----------------+---------------+-----------+----------+---------
 l_orderkey      | integer       |           |          |
 l_partkey       | integer       |           |          |
 l_suppkey       | integer       |           |          |
 l_linenumber    | integer       |           |          |
 l_quantity      | integer       |           |          |
 l_extendedprice | numeric(12,2) |           |          |
 l_discount      | numeric(12,2) |           |          |
 l_tax           | numeric(12,2) |           |          |
 l_returnflag    | character(1)  |           |          |
 l_linestatus    | character(1)  |           |          |
 l_shipdate      | date          |           |          |
 l_commitdate    | date          |           |          |
 l_receiptdate   | date          |           |          |
 l_shipinstruct  | character(25) |           |          |
 l_shipmode      | character(10) |           |          |
 l_comment       | character(44) |           |          |
 l_partsuppkey   | character(20) |           |          |
Indexes:
    "l_shipdate_c_idx" btree (l_shipdate) CLUSTER
    "l_shipmode_h_idx" hash (l_shipdate)
Foreign-key constraints:
    "lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
    "lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey) REFERENCES part(p_partkey)
    "lineitem_l_partsuppkey_fkey" FOREIGN KEY (l_partsuppkey) REFERENCES partsupp(ps_partsuppkey)
    "lineitem_l_suppkey_fkey" FOREIGN KEY (l_suppkey) REFERENCES supplier(s_suppkey)
and this query:
explain analyze select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1 - l_discount)) as sum_disc_price,
sum(l_extendedprice*(1 - l_discount)*(1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate<='31/08/1998'
GROUP by
l_returnflag,
l_linestatus
ORDER by
l_returnflag,
l_linestatus
It returns this query plan:
Finalize GroupAggregate  (cost=2631562.25..2631564.19 rows=6 width=212) (actual time=28624.012..28624.466 rows=4 loops=1)
  Group Key: l_returnflag, l_linestatus
  ->  Gather Merge  (cost=2631562.25..2631563.65 rows=12 width=212) (actual time=28623.998..28624.442 rows=12 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=2630562.23..2630562.24 rows=6 width=212) (actual time=28620.633..28620.633 rows=4 loops=3)
              Sort Key: l_returnflag, l_linestatus
              Sort Method: quicksort  Memory: 27kB
              Worker 0:  Sort Method: quicksort  Memory: 27kB
              Worker 1:  Sort Method: quicksort  Memory: 27kB
              ->  Partial HashAggregate  (cost=2630562.03..2630562.15 rows=6 width=212) (actual time=28620.607..28620.611 rows=4 loops=3)
                    Group Key: l_returnflag, l_linestatus
                    Batches: 1  Memory Usage: 24kB
                    Worker 0:  Batches: 1  Memory Usage: 24kB
                    Worker 1:  Batches: 1  Memory Usage: 24kB
                    ->  Parallel Seq Scan on lineitem  (cost=0.00..1707452.35 rows=24616258 width=24) (actual time=0.549..19028.353 rows=19701655 loops=3)
                          Filter: (l_shipdate <= '1998-08-31'::date)
                          Rows Removed by Filter: 293696
Planning Time: 0.374 ms
Execution Time: 28624.523 ms
- Why does the optimizer prefer a sequential scan on lineitem instead of using the index l_shipdate_c_idx? Should I drop it?
Postgres version: PostgreSQL 14.2 on x86_64-apple-darwin20.6.0, compiled by Apple clang version 12.0.0 (clang-1200.0.32.29), 64-bit
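Your filter is not very selective. The plan shows that it removes only 293,696 rows and keeps 19,701,655. Both figures are per-process averages over the 3 parallel processes (loops=3), so roughly 98.5% of the table passes the filter. Fetching that many rows one at a time through the index would probably be much slower than scanning the table sequentially.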
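If you want to check this yourself, one option is to measure the fraction of rows the filter keeps and then ask the planner for a plan with sequential scans discouraged, so you can compare timings. This is only a sketch: it uses a cut-down version of your query, and depending on your settings the planner may pick a bitmap scan rather than a plain index scan.

```sql
-- Fraction of lineitem rows that pass the filter (close to 1 means poor selectivity).
SELECT (count(*) FILTER (WHERE l_shipdate <= DATE '1998-08-31'))::numeric
       / count(*) AS fraction_kept
FROM lineitem;

-- Compare against a plan where sequential scans are discouraged.
-- SET LOCAL only lasts until the end of the transaction, so ROLLBACK undoes it.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT l_returnflag, l_linestatus, count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= DATE '1998-08-31'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
ROLLBACK;
```

If the forced plan turns out slower than the 28.6 s sequential plan, the optimizer made the right call for this filter.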
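As for dropping it: if this is the only query you run and the only filter you use, then maybe. Otherwise, there is not enough information to go on. If you wanted to see the rows for a single day, the index could be useful. It might be even better with some extra columns included in it. It is impossible to say without knowing your application.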
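A rough way to gather that information is sketched below. The single-day query and the index name l_shipdate_cover_idx are made up for illustration, and the choice of included columns is an assumption about what your other queries might need.

```sql
-- How often has each index on lineitem been scanned since statistics were last reset?
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'lineitem';

-- A selective single-day filter, where an index on l_shipdate is far more likely to be chosen.
EXPLAIN
SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty
FROM lineitem
WHERE l_shipdate = DATE '1998-08-31'
GROUP BY l_returnflag, l_linestatus;

-- Hypothetical covering index: INCLUDE (PostgreSQL 11+) stores extra columns in the
-- btree so that a query like the one above can be answered by an index-only scan,
-- provided the table has been vacuumed recently enough.
CREATE INDEX l_shipdate_cover_idx
    ON lineitem (l_shipdate)
    INCLUDE (l_returnflag, l_linestatus, l_quantity);
```

An idx_scan count that stays at zero over a representative period of normal workload is a reasonable hint that an index is not earning its keep.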