我有这个查询,来自 TPCH-H 基准:
explain analyze select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1 - l_discount)) as sum_disc_price,
sum(l_extendedprice*(1 - l_discount)*(1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate<='31/08/1998'
GROUP by
l_returnflag,
l_linestatus
ORDER by
l_returnflag,
l_linestatus
返回这个:
"Finalize GroupAggregate (cost=2300777.06..2300779.00 rows=6 width=212) (actual time=38289.923..38290.426 rows=4 loops=1)"
" Group Key: l_returnflag, l_linestatus"
" -> Gather Merge (cost=2300777.06..2300778.46 rows=12 width=212) (actual time=38289.907..38290.390 rows=12 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=2299777.04..2299777.05 rows=6 width=212) (actual time=38284.169..38284.169 rows=4 loops=3)"
" Sort Key: l_returnflag, l_linestatus"
" Sort Method: quicksort Memory: 27kB"
" Worker 0: Sort Method: quicksort Memory: 27kB"
" Worker 1: Sort Method: quicksort Memory: 27kB"
" -> Partial HashAggregate (cost=2299776.84..2299776.96 rows=6 width=212) (actual time=38284.129..38284.133 rows=4 loops=3)"
" Group Key: l_returnflag, l_linestatus"
" Batches: 1 Memory Usage: 24kB"
" Worker 0: Batches: 1 Memory Usage: 24kB"
" Worker 1: Batches: 1 Memory Usage: 24kB"
" -> Parallel Seq Scan on lineitem (cost=0.00..1493832.54 rows=21491848 width=24) (actual time=0.281..29321.949 rows=17236798 loops=3)"
" Filter: (l_shipdate <= '1998-08-31'::date)"
" Rows Removed by Filter: 256933"
"Planning Time: 3.870 ms"
"Execution Time: 38290.784 ms"
它涉及这种关系:
CREATE TABLE LINEITEM
(
L_ORDERKEY INTEGER REFERENCES ORDERS(O_ORDERKEY),
L_PARTKEY INTEGER REFERENCES PART(P_PARTKEY),
L_SUPPKEY INTEGER REFERENCES SUPPLIER(S_SUPPKEY),
L_LINENUMBER INTEGER,
L_QUANTITY INTEGER,
L_EXTENDEDPRICE NUMERIC (12,2),
L_DISCOUNT NUMERIC (12,2),
L_TAX NUMERIC (12,2),
L_RETURNFLAG CHAR (1),
L_LINESTATUS CHAR (1),
L_SHIPDATE DATE ,
L_COMMITDATE DATE ,
L_RECEIPTDATE DATE ,
L_SHIPINSTRUCT CHAR (25),
L_SHIPMODE CHAR (10),
L_COMMENT CHAR (44),
L_PARTSUPPKEY CHAR (20) REFERENCES PARTSUPP(PS_PARTSUPPKEY)
)
如您所见,它大约需要 40 秒,我想对此进行优化。我在列上添加了一个 b 树索引L_SHIPDATE
(排序顺序 ASC 和最后一个 NULL)。
- 我怎样才能做得更好?
正如您在此处看到的,优化器没有使用索引,l_shipdate
因此他更喜欢顺序扫描lineitem
表。
没有什么神奇的方法可以让这更快。您的选择是:
更快的磁盘
更多 RAM,并确保表缓存在 RAM 中
通过增加投入更多的工人
max_parallel_workers_per_gather