我想在 Postgresql 上使用外部合并排序算法对随机记录进行排序的基准测试。因此,我尝试了以下两种方法(一个接一个,保持所有参数/配置相同):
尝试 1:
CREATE TABLE test(id BIGINT, name varchar(200));
INSERT INTO test (id,name) SELECT (random() * 1000000), concat(CONCAT(md5(random()::text), md5(random()::text))) FROM generate_series(1, 1000000) as t;
explain analyze select * from test order by id, name;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=41486.43..63526.06 rows=188898 width=426) (actual time=76.477..207.253 rows=1000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=40486.40..40722.52 rows=94449 width=426) (actual time=73.418..101.593 rows=333333 loops=3)
Sort Key: id, name
Sort Method: external merge Disk: 29744kB
Worker 0: Sort Method: external merge Disk: 26008kB
Worker 1: Sort Method: external merge Disk: 25512kB
-> Parallel Seq Scan on test (cost=0.00..14278.49 rows=94449 width=426) (actual time=0.011..20.945 rows=333333 loops=3)
Planning Time: 2.820 ms
Execution Time: 227.090 ms
(11 rows)
尝试2:
explain analyze SELECT (random() * 1000000) as id, concat(CONCAT(md5(random()::text), md5(random()::text))) as name
FROM generate_series(1, 1000000) as t order by id, name;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=194348.85..196848.85 rows=1000000 width=40) (actual time=1707.086..1768.986 rows=1000000 loops=1)
Sort Key: ((random() * '1000000'::double precision)), (concat(concat(md5((random())::text), md5((random())::text))))
Sort Method: external merge Disk: 81256kB
-> Function Scan on generate_series t (cost=0.00..40000.00 rows=1000000 width=40) (actual time=55.734..1388.681 rows=1000000 loops=1)
Planning Time: 0.191 ms
JIT:
Functions: 3
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.338 ms (Deform 0.000 ms), Inlining 0.000 ms, Optimization 0.497 ms, Emission 11.837 ms, Total 12.672 ms
Execution Time: 1841.843 ms
(10 rows)
有人能向我解释一下,为什么对随机生成的数据进行排序比对磁盘中的类似随机数据进行排序慢吗?
我重新运行了这两个查询max_parallel_workers_per_gather = 0
;第一个查询的延迟下降到 360 毫秒,而正如预期的那样,第二个查询没有变化。