In some query plans I see a Finalize GroupAggregate parent node whose child nodes are Partial HashAggregates. When does this make sense?
For example, I have a query like the following:
=# SELECT x, count(*) AS n FROM t GROUP BY x ;
The query involves no sorting, so why does the planner choose a GroupAggregate at the top? And why do the parallel workers use HashAggregate?
"Finalize GroupAggregate (cost=44630.76..47219.48 rows=10218 width=24) (actual time=270.025..309.145 rows=27909 loops=1)"
"  Group Key: x"
"  -> Gather Merge (cost=44630.76..47015.12 rows=20436 width=24) (actual time=270.014..293.964 rows=61056 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        -> Sort (cost=43630.73..43656.28 rows=10218 width=24) (actual time=264.612..270.608 rows=20352 loops=3)"
"              Sort Key: x"
"              Sort Method: external merge Disk: 728kB"
"              Worker 0: Sort Method: external merge Disk: 720kB"
"              Worker 1: Sort Method: external merge Disk: 776kB"
"              -> Partial HashAggregate (cost=39474.60..42950.27 rows=10218 width=24) (actual time=198.285..223.757 rows=20352 loops=3)"
"                    Group Key: x"
"                    Batches: 5 Memory Usage: 1073kB Disk Usage: 2312kB"
"                    Worker 0: Batches: 5 Memory Usage: 1073kB Disk Usage: 1760kB"
"                    Worker 1: Batches: 5 Memory Usage: 1073kB Disk Usage: 3400kB"
"                    -> Parallel Seq Scan on t (cost=0.00..17344.46 rows=345446 width=16) (actual time=0.053..52.217 rows=276357 loops=3)"
I have seen something similar in this question, although in that case I don't know the original query.
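
Read bottom-up, the plan above appears to describe a two-stage aggregation: each of the three processes (loops=3) hashes its share of the scanned rows into partial counts, sorts those partial results by x, and the leader merges the sorted streams (61056 partial rows) and sums adjacent entries into the final 27909 groups. The Python sketch below mimics that data flow on toy data. It is only an illustration of how I read the plan, not PostgreSQL's implementation; the function names and the sample chunks are made up.

```python
from collections import Counter
from heapq import merge
from itertools import groupby
from operator import itemgetter


def partial_hash_aggregate(rows):
    """Partial HashAggregate: one process counts its own share of rows per key."""
    return list(Counter(rows).items())  # (key, partial_count) pairs, not ordered by key


def finalize_group_aggregate(sorted_partials):
    """Finalize GroupAggregate: sum adjacent partial counts that share a key."""
    for key, group in groupby(sorted_partials, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_plan(chunks):
    # Each chunk stands in for the rows one process reads via Parallel Seq Scan.
    partials = [partial_hash_aggregate(chunk) for chunk in chunks]
    # Sort: every process orders its own partial results by the group key.
    sorted_runs = [sorted(p, key=itemgetter(0)) for p in partials]
    # Gather Merge: combine the already-sorted streams while keeping the order.
    merged = merge(*sorted_runs, key=itemgetter(0))
    return list(finalize_group_aggregate(merged))


# Hypothetical values of x, split across three processes.
chunks = [[3, 1, 3, 2], [2, 2, 1], [3, 3]]
result = run_plan(chunks)
print(result)  # [(1, 2), (2, 3), (3, 4)]
```

In this sketch the sort runs over the partial results (one row per group per process), not over the raw table rows, and the final step only needs to compare neighbouring keys.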