
Question

Asked: 2023-07-24 20:31:51 +0800 CST

Postgres 15 memoize slows down many queries


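I have a query that uses joins and a GROUP BY. The tables it works on hold several million rows each (10 to 50 million), and one of them is partitioned (I am not sure that is relevant, but I want to give as much information as possible). For some reason this query (and many others I looked at, not necessarily ones with a GROUP BY) runs rather slowly, but after setting enable_memoize=false its run time drops by almost half.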

Why does this happen? All over the web, memoize is praised as a great new feature that improves many queries.

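Are there any PostgreSQL settings I should change so that memoize runs fast, or so that the planner does not choose it when it is the worse plan (short of disabling memoize outright)?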

The query itself:

EXPLAIN ANALYZE 
SELECT table3.id, table3.type, count(*) FROM table1 
 JOIN table2 ON table1.id=table2.table1_id AND table1.tenant_id=table2.tenant_id
 JOIN table3 ON table2.table3_id=table3.id AND table2.tenant_id=table3.tenant_id
WHERE table1.tenant_id=123 
GROUP BY table3.id, table3.type
ORDER BY count(*) DESC LIMIT 10;

The resulting plan:

 Limit  (cost=79555.10..79555.12 rows=10 width=45) (actual time=11017.288..11017.294 rows=10 loops=1)
   ->  Sort  (cost=79555.10..80128.80 rows=229481 width=45) (actual time=11017.286..11017.291 rows=10 loops=1)
         Sort Key: (count(*)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  HashAggregate  (cost=68715.65..74596.10 rows=229481 width=45) (actual time=9814.014..10892.716 rows=814630 loops=1)
               Group Key: table3.id
               Planned Partitions: 4  Batches: 33  Memory Usage: 9265kB  Disk Usage: 105720kB
               ->  Merge Join  (cost=5.67..59034.42 rows=229481 width=37) (actual time=0.101..8846.868 rows=1912806 loops=1)
                     Merge Cond: (table2.table1_id = table1.id)
                     ->  Nested Loop  (cost=0.87..184852.18 rows=919397 width=53) (actual time=0.062..7766.353 rows=1912806 loops=1)
                           ->  Index Scan using idx_table2_tenant_id_table1id on table2  (cost=0.43..124987.43 rows=1872626 width=24) (actual time=0.034..1710.167 rows=1912806 loops=1)
                                 Index Cond: (tenant_id = 123)
                           ->  Memoize  (cost=0.44..0.62 rows=1 width=45) (actual time=0.003..0.003 rows=1 loops=1912806)
                                 Cache Key: table2.table3_id
                                 Cache Mode: logical
                                 Hits: 1040220  Misses: 872586  Evictions: 816079  Overflows: 0  Memory Usage: 8389kB
                                 ->  Index Scan using table3_pkey on table3  (cost=0.43..0.61 rows=1 width=45) (actual time=0.004..0.004 rows=1 loops=872586)
                                       Index Cond: (id = table2.table3_id)
                                       Filter: (tenant_id = 123)
                     ->  Index Only Scan using table1_partition_123_pkey on table1_partition_123 table1  (cost=0.43..39581.09 rows=1683017 width=16) (actual time=0.035..455.399 rows=1912806 loops=1)
                           Index Cond: (tenant_id = 123)
                           Heap Fetches: 9346
 Planning Time: 7.250 ms
 Execution Time: 11038.258 ms

The plan after setting enable_memoize=false:

  Limit  (cost=102850.70..102850.72 rows=10 width=45) (actual time=6040.773..6040.960 rows=10 loops=1)
   ->  Sort  (cost=102850.70..103424.40 rows=229481 width=45) (actual time=6040.772..6040.957 rows=10 loops=1)
         Sort Key: (count(*)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  HashAggregate  (cost=92011.24..97891.69 rows=229481 width=45) (actual time=4841.865..5916.868 rows=814630 loops=1)
               Group Key: table3.id
               Planned Partitions: 4  Batches: 33  Memory Usage: 9265kB  Disk Usage: 105720kB
               ->  Merge Join  (cost=1005.72..82330.01 rows=229481 width=37) (actual time=10.344..3868.398 rows=1912806 loops=1)
                     Merge Cond: (table2.table1_id = table1.id)
                     ->  Gather Merge  (cost=1000.92..508058.81 rows=919397 width=53) (actual time=10.288..2796.661 rows=1912806 loops=1)
                           Workers Planned: 4
                           Workers Launched: 4
                           ->  Nested Loop  (cost=0.86..397549.71 rows=229849 width=53) (actual time=0.071..2360.817 rows=382561 loops=5)
                                 ->  Parallel Index Scan using idx_table2_tenant_id_table1id on table2  (cost=0.43..110942.73 rows=468156 width=24) (actual time=0.040..403.754 rows=382561 loops=5)
                                       Index Cond: (tenant_id = 123)
                                 ->  Index Scan using table3_pkey on table3  (cost=0.43..0.61 rows=1 width=45) (actual time=0.004..0.004 rows=1 loops=1912806)
                                       Index Cond: (id = table2.table3_id)
                                       Filter: (tenant_id = 123)
                     ->  Index Only Scan using table1_partition_123_pkey on table1_partition_123 table1  (cost=0.43..39581.09 rows=1683017 width=16) (actual time=0.052..460.241 rows=1912806 loops=1)
                           Index Cond: (tenant_id = 123)
                           Heap Fetches: 9346
 Planning Time: 0.907 ms
 Execution Time: 6061.154 ms

Possibly relevant PostgreSQL settings (?):

max_parallel_workers_per_gather=4;
max_parallel_workers=32;
max_parallel_maintenance_workers=4;
random_page_cost=1.1;
work_mem='4194kB';

jjanes · Answer 1 · 2023-07-25T01:49:43+08:00

Without access to your data and machine it is hard to give exact details here, but the general picture seems clear to me.

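The planner thinks it can benefit from memoize, and it also thinks it can benefit from a parallel query. But it does not think it can benefit from combining the two (if each worker had its own cache, the hit rate would drop because workers cannot see each other's caches, and if they shared a cache they would pay heavy locking costs to avoid corrupting each other's entries), so it has to choose between them. It made the wrong choice. Either it overestimated the benefit of memoize or (more likely, I think) it underestimated the benefit of parallelization. That may be because the default setting of parallel_tuple_cost is rather high, which discourages parallel queries. The low setting of random_page_cost compounds this: parallelizing random reads is a major benefit of parallelization, and a low random_page_cost makes those reads look cheap to the planner in the first place.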
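One way to test this hypothesis is to lower parallel_tuple_cost for the current session only and check whether the planner then picks the parallel plan while memoize stays enabled. This is a sketch, not a recommendation: the value 0.01 is an arbitrary assumption for the experiment (the built-in default is 0.1), and the query is the one from the question.

```sql
-- Show the current value (0.1 unless it has been changed in the configuration).
SHOW parallel_tuple_cost;

-- Experimental value, chosen arbitrarily for this test; it affects this session only.
SET parallel_tuple_cost = 0.01;

-- Re-run the query from the question and compare the plan and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT table3.id, table3.type, count(*) FROM table1
 JOIN table2 ON table1.id=table2.table1_id AND table1.tenant_id=table2.tenant_id
 JOIN table3 ON table2.table3_id=table3.id AND table2.tenant_id=table3.tenant_id
WHERE table1.tenant_id=123
GROUP BY table3.id, table3.type
ORDER BY count(*) DESC LIMIT 10;

-- Return to the configured value.
RESET parallel_tuple_cost;
```

If the plan switches to Gather Merge without touching enable_memoize, that would support the idea that parallelism was being underestimated rather than memoize overestimated.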

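But you also have to keep in mind that parallelization is not free. Using 5 times the resources to get an answer 2 times faster is great if those resources would otherwise sit idle. But if other things are happening on your production server at the same time that could use those resources, then running every query in parallel can be a net loss, even though each query looks faster when run in isolation.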
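To keep one heavy query from taking too many workers on a busy server, the worker limit can be lowered for a single transaction instead of server-wide. This is a minimal sketch, assuming a cap of 2 workers is acceptable for this query; the number is a placeholder to tune against the real workload, not a measured optimum.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction.
-- The question's server-wide value is 4; 2 is an assumed, more conservative cap.
SET LOCAL max_parallel_workers_per_gather = 2;

SELECT table3.id, table3.type, count(*) FROM table1
 JOIN table2 ON table1.id=table2.table1_id AND table1.tenant_id=table2.tenant_id
 JOIN table3 ON table2.table3_id=table3.id AND table2.tenant_id=table3.tenant_id
WHERE table1.tenant_id=123
GROUP BY table3.id, table3.type
ORDER BY count(*) DESC LIMIT 10;

COMMIT;
```

The same SET LOCAL pattern works for any planner setting that should apply to one query without changing the behavior of everything else on the server.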
