
Asked: 2024-07-04 22:18:05 +0800 CST

PostgreSQL uses a slow Index Scan instead of Bitmap Heap + Index Scan with ORDER BY and a specific LIMIT


Given the following table:

CREATE TABLE chat_message (
    id bigint DEFAULT nextval('public.chat_message_id_seq'::regclass) NOT NULL,
    "user" integer,
    type smallint,
    text text
);
ALTER TABLE ONLY chat_message ADD CONSTRAINT pk_chat_message PRIMARY KEY (id);
CREATE INDEX idx_chat_message_user_type ON chat_message USING btree ("user", type);
CREATE INDEX k_chat_message_user ON chat_message USING btree ("user");

where type is either 1 or NULL, the query

EXPLAIN ANALYZE
SELECT *
FROM "chat_message" AS t
WHERE true
  AND "type" = 1
  AND "user" = 1234567
ORDER BY "user", "type", "id" ASC
LIMIT 10 OFFSET 0;

gives the following output:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=53644.94..53644.97 rows=10 width=127) (actual time=4.817..4.818 rows=6 loops=1)
   ->  Sort  (cost=53644.94..53681.60 rows=14663 width=127) (actual time=4.816..4.816 rows=6 loops=1)
         Sort Key: id
         Sort Method: quicksort  Memory: 26kB
         ->  Bitmap Heap Scan on chat_message t  (cost=362.86..53328.08 rows=14663 width=127) (actual time=1.975..2.181 rows=6 loops=1)
               Recheck Cond: (("user" = 1234567) AND (type = 1::smallint))
               Heap Blocks: exact=3
               ->  Bitmap Index Scan on idx_chat_message_user_type  (cost=0.00..359.19 rows=14663 width=0) (actual time=1.822..1.822 rows=6 loops=1)
                     Index Cond: (("user" = 1234567) AND (type = 1::smallint))
 Planning time: 0.348 ms
 Execution time: 5.028 ms

But as soon as the LIMIT is lowered to a certain threshold or below (9 on my local machine), the query plan changes to this:

                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..50193.33 rows=9 width=127) (actual time=23119.188..46005.965 rows=6 loops=1)
   ->  Index Scan using pk_chat_message on chat_message t  (cost=0.56..81775168.50 rows=14663 width=127) (actual time=23119.187..46005.962 rows=6 loops=1)
         Filter: ((type = 1::smallint) AND ("user" = 1234567))
         Rows Removed by Filter: 49452956
 Planning time: 14.840 ms
 Execution time: 46006.683 ms

This is far too slow.

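For this particular user there is a huge data skew: it has about 50 000 rows WHERE type IS NULL but only 6 rows WHERE type = 1. Moreover, the same LIMIT 9 query with WHERE type IS NULL instead gets the same kind of plan as the fast query above (Bitmap Heap Scan plus Sort) and runs fast: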

                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=153793.13..153793.15 rows=9 width=127) (actual time=886.897..886.898 rows=9 loops=1)
   ->  Sort  (cost=153793.13..153909.07 rows=46374 width=127) (actual time=886.894..886.894 rows=9 loops=1)
         Sort Key: gs_type, id
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Bitmap Heap Scan on chat_message t  (cost=1143.90..152826.25 rows=46374 width=127) (actual time=12.561..878.947 rows=49934 loops=1)
               Recheck Cond: (("user" = 1234567) AND (type IS NULL))
               Heap Blocks: exact=10903
               ->  Bitmap Index Scan on idx_chat_message_user_type  (cost=0.00..1132.31 rows=46374 width=0) (actual time=9.942..9.942 rows=49934 loops=1)
                     Index Cond: (("user" = 1234567) AND (type IS NULL))
 Planning time: 0.308 ms
 Execution time: 887.027 ms

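On the production server, the exact same data is loaded onto a machine with different specs from my laptop (more memory, huge shared_buffers, max_mem, and a constant workload on various other tables). It behaves similarly; only the LIMIT threshold differs: up to 75 it uses the slow Index Scan, and from 76 upward it uses the fast Bitmap Heap Scan + Bitmap Index Scan.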

Some additional information:

SELECT * FROM pg_stat_user_tables WHERE relname = 'chat_message';

relname     |seq_scan   |seq_tup_read   |idx_scan   |idx_tup_fetch  |n_tup_ins  |n_tup_upd  |n_tup_del  |n_tup_hot_upd  |n_live_tup |n_dead_tup |n_mod_since_analyze|last_vacuum|last_autovacuum|last_analyze   |last_autoanalyze   |vacuum_count   |autovacuum_count   |analyze_count  |autoanalyze_count  |
chat_message|0          |0              |11         |197,652,914    |0          |0          |0          |0              |0          |0          |0                  |           |               |               |                   |0              |0                  |0              |0                  |

SELECT * FROM pg_stats where tablename = 'chat_message';

schemaname  |tablename      |attname       |inherited|null_frac|avg_width|n_distinct|most_common_vals
public      |chat_message   |id            |false    |0        |8        |-1        |
public      |chat_message   |user          |false    |0        |4        |30145     |{redacted}
public      |chat_message   |text          |false    |0        |38       |45553     |{redacted}
public      |chat_message   |type          |false    |0.7656   |2        |1         |{1}
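The skew also shows up when the plans' row estimates are set against these statistics. The short Python calculation below (a worked check, not PostgreSQL code) takes the estimated and actual row counts from the EXPLAIN ANALYZE output above and compares the share of type = 1 rows the planner expected for this user with the table-wide share implied by null_frac. The planner's figure comes out at roughly 24 %, close to the table-wide 23.44 %, which is consistent with its default assumption that conditions on different columns are independent; the real share for this user is about 0.01 %.

```python
# Row estimates and actual counts copied from the EXPLAIN ANALYZE output above.
EST_TYPE_1 = 14663       # estimated rows for "user" = 1234567 AND type = 1
EST_TYPE_NULL = 46374    # estimated rows for "user" = 1234567 AND type IS NULL
ACTUAL_TYPE_1 = 6
ACTUAL_TYPE_NULL = 49934

# Table-wide share of type = 1 implied by pg_stats (null_frac = 0.7656, only value 1).
TABLE_SHARE_TYPE_1 = 1 - 0.7656


def share(type_1_rows: int, null_rows: int) -> float:
    """Fraction of this user's rows that have type = 1."""
    return type_1_rows / (type_1_rows + null_rows)


estimated = share(EST_TYPE_1, EST_TYPE_NULL)
actual = share(ACTUAL_TYPE_1, ACTUAL_TYPE_NULL)

print(f"table-wide share of type = 1: {TABLE_SHARE_TYPE_1:.4f}")
print(f"planner's share for the user: {estimated:.4f}")
print(f"actual share for the user:    {actual:.6f}")
print(f"overestimate of type = 1 rows: {EST_TYPE_1 / ACTUAL_TYPE_1:.0f}x")
```

An overestimate of this size matters because the planner uses the estimated 14663 matching rows to decide how soon an ordered scan would reach the LIMIT.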

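My questions are:

Why does the Index Scan become so slow when only those few rows are involved?
Why does the Index Scan always use the pk_chat_message index, even though the more suitable idx_chat_message_user_type index exists and the ORDER BY clause contains all the columns from the WHERE clause (ORDER BY affects which index can be used)?
Why does LIMIT N affect the query plan so that it prefers the Index Scan over the Bitmap Index + Heap Scan?
How can I make this query perform well (under 1 second) for this user + type combination as well as for others?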


Laurenz Albe · Answer 1 · 2024-07-05T16:30:47+08:00

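PostgreSQL has two options for processing the query:

It can use an index for the WHERE clause, then sort and return the first few results (this is your fast plan).
It can use an index for the ORDER BY clause and discard rows that do not satisfy the WHERE condition until it has found enough result rows (this is your slow plan).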

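Deciding which plan is better is hard, and PostgreSQL certainly gets it wrong sometimes. In your slow case, it scans the primary key index and discards 49452956 rows that fail the WHERE condition, even though it estimated that 14663 rows would satisfy it (in reality only 6 do). The problem is that PostgreSQL has no statistics that tell it that all matching rows have a large id, so it does not expect to scan that many rows before finding one. And since only 6 rows match, fewer than the LIMIT of 9, the scan cannot stop early at all and has to run through the entire index.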
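The effect of that correlation can be reproduced with a toy model. In the hypothetical Python sketch below (invented table size and distribution, not PostgreSQL code), the target user's type = 1 rows get the highest ids. Walking the rows in id order and filtering, as the slow plan does, then examines every other row before the first match, and when fewer rows match than the LIMIT asks for, the walk only ends at the last row.

```python
from typing import List, Optional, Tuple

# Toy row: (id, user, type). Hypothetical data, not the real table.
Row = Tuple[int, int, Optional[int]]


def build_toy_table(n_rows: int, target_user: int, n_matches: int) -> List[Row]:
    """Rows in primary-key order; the target user's type = 1 rows get the largest ids."""
    rows: List[Row] = []
    for id_ in range(1, n_rows - n_matches + 1):
        # Every 1000th row belongs to the target user with type NULL; the rest
        # belong to other (small-numbered) users with type = 1.
        if id_ % 1000 == 0:
            rows.append((id_, target_user, None))
        else:
            rows.append((id_, id_ % 997, 1))
    for id_ in range(n_rows - n_matches + 1, n_rows + 1):
        rows.append((id_, target_user, 1))
    return rows


def pk_order_scan(rows: List[Row], user: int, type_: int, limit: int) -> Tuple[List[Row], int]:
    """Mimic the slow plan: walk rows in id order, filter, stop once LIMIT rows are found."""
    result: List[Row] = []
    examined = 0
    for row in rows:
        examined += 1
        if row[1] == user and row[2] == type_:
            result.append(row)
            if len(result) == limit:
                break
    return result, examined


table = build_toy_table(100_000, target_user=1234567, n_matches=6)
for limit in (3, 9):
    found, examined = pk_order_scan(table, 1234567, 1, limit)
    print(f"LIMIT {limit}: {len(found)} rows found, "
          f"{examined - len(found)} rows removed by filter")
```

In both runs all 99 994 filler rows are removed by the filter, the toy counterpart of "Rows Removed by Filter: 49452956"; with LIMIT 9 the scan also has to reach the very end because only 6 matching rows exist.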

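Of course, the fewer result rows you need, the more attractive the second strategy (slow in your case) becomes, which explains why the optimizer switches to such a plan when you reduce the LIMIT.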
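The switch point can be recomputed from the numbers in the question. The Python sketch below is an approximation of PostgreSQL's cost formulas for Sort and Limit as I understand them, assuming the default cpu_operator_cost of 0.0025 and a sort that fits in work_mem. A Limit node charges only the fraction LIMIT / estimated rows of its child's run cost, which quietly assumes the matches are spread evenly through the index (exactly what fails here), while a top-N sort above the bitmap scan must still pay for reading all 14663 estimated rows. With the laptop's estimates this reproduces the printed Limit costs to within rounding and puts the crossover between LIMIT 9 and 10; the production threshold of 75/76 would follow from that server's own estimates, which are not shown.

```python
import math

# Assumption: PostgreSQL's default cpu_operator_cost.
CPU_OPERATOR_COST = 0.0025

# Planner estimates copied from the laptop EXPLAIN output in the question.
EST_ROWS = 14663                # estimated matches for "user" = 1234567 AND type = 1
BITMAP_HEAP_TOTAL = 53328.08    # total cost of the Bitmap Heap Scan
PK_SCAN_STARTUP = 0.56          # Index Scan using pk_chat_message: startup cost
PK_SCAN_TOTAL = 81775168.50     # Index Scan using pk_chat_message: total cost


def limit_cost(startup: float, total: float, rows: float, limit: int) -> float:
    """Limit node: startup cost plus the fraction limit/rows of the child's run cost."""
    return startup + (total - startup) * min(limit, rows) / rows


def pk_scan_plan_cost(limit: int) -> float:
    """Slow plan: Limit directly on the primary-key Index Scan (no sort needed)."""
    return limit_cost(PK_SCAN_STARTUP, PK_SCAN_TOTAL, EST_ROWS, limit)


def bitmap_sort_plan_cost(limit: int) -> float:
    """Fast plan: Limit on a Sort on the Bitmap Heap Scan, assuming it fits in work_mem."""
    comparison_cost = 2.0 * CPU_OPERATOR_COST
    if EST_ROWS > 2 * limit:
        # Bounded top-N heapsort: about rows * log2(2 * limit) comparisons.
        sort_cost = comparison_cost * EST_ROWS * math.log2(2.0 * limit)
    else:
        # Plain quicksort over all input rows.
        sort_cost = comparison_cost * EST_ROWS * math.log2(EST_ROWS)
    # The sort must consume its whole input before returning the first row.
    sort_startup = BITMAP_HEAP_TOTAL + sort_cost
    sort_total = sort_startup + CPU_OPERATOR_COST * EST_ROWS
    return limit_cost(sort_startup, sort_total, EST_ROWS, limit)


def cheaper_plan(limit: int) -> str:
    if pk_scan_plan_cost(limit) < bitmap_sort_plan_cost(limit):
        return "Index Scan using pk_chat_message"
    return "Sort + Bitmap Heap Scan"


for n in (8, 9, 10, 11):
    print(f"LIMIT {n:2d}: pk scan {pk_scan_plan_cost(n):10.2f}  "
          f"bitmap+sort {bitmap_sort_plan_cost(n):10.2f}  -> {cheaper_plan(n)}")
```

The pk scan's cost grows linearly with the LIMIT (about 5577 per row), while the sorted plan's cost barely moves, so a small enough LIMIT always tips the choice towards the Index Scan.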

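Note that an "Index Scan" is processed very differently from a "Bitmap Index Scan": the former returns results in index order, while the latter returns rows in table order but performs better when there are many result rows.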
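The difference in heap access can be sketched with a simplified Python model using hypothetical (key, heap block) pairs. An Index Scan follows the entries in index order and may return to the same heap block several times; a bitmap scan first collects the set of blocks and then reads each one once, in physical order. The model ignores caching, lossy bitmap pages and prefetching.

```python
from typing import List, Optional, Set, Tuple

# Toy index entry: (index key, heap block that holds the row). Hypothetical data.
Entry = Tuple[int, int]


def heap_visits_index_scan(entries: List[Entry]) -> int:
    """Index Scan: follow entries in key order, reading a heap block whenever it changes.

    Simplification: no buffer cache, so coming back to a block counts as a new visit.
    """
    visits = 0
    last_block: Optional[int] = None
    for _, block in entries:
        if block != last_block:
            visits += 1
            last_block = block
    return visits


def heap_visits_bitmap_scan(entries: List[Entry]) -> Tuple[int, List[int]]:
    """Bitmap Index Scan + Bitmap Heap Scan: gather the blocks, then read each once in block order."""
    bitmap: Set[int] = {block for _, block in entries}
    order = sorted(bitmap)
    return len(order), order


entries = [(1, 7), (2, 3), (3, 7), (4, 3), (5, 9)]
print(heap_visits_index_scan(entries))
print(heap_visits_bitmap_scan(entries))
```

Because the bitmap path yields rows in block order rather than key order, an ORDER BY on top of it needs a separate Sort node, as in the fast plans in the question.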

There are two ways to improve the situation:

Create an index that supports both the ORDER BY and the WHERE condition:
```
CREATE INDEX ON chat_message ("user", type, id);
```
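Use a crude trick to keep PostgreSQL from using the primary key index (ordering by the expression id + 0 means the index on id can no longer supply the sort order):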
```
... ORDER BY "user", type, id + 0
```
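The first suggestion works because, with equality conditions on "user" and type, the matching entries of an index on ("user", type, id) sit next to each other and are already sorted by id, so PostgreSQL can read the first N of them and stop, with no sort. The Python sketch below models this with a sorted list and bisect; it is a toy, not PostgreSQL's B-tree, and it assumes (as PostgreSQL does by default for ascending B-tree indexes) that NULLs sort after all non-NULL values. The sample rows are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

TypeKey = Tuple[bool, int]
IndexKey = Tuple[int, TypeKey, int]  # ("user", type, id)


def type_key(t: Optional[int]) -> TypeKey:
    """Sort key for type that places NULLs after all non-NULL values."""
    return (t is None, 0 if t is None else t)


def build_index(rows: List[Tuple[int, int, Optional[int]]]) -> List[IndexKey]:
    """Toy stand-in for an index on ("user", type, id): a sorted list of keys."""
    return sorted((user, type_key(t), id_) for id_, user, t in rows)


def first_ids(index: List[IndexKey], user: int, t: Optional[int], n: int) -> List[int]:
    """Seek to the (user, type) prefix and read forward; ids already come out in order."""
    prefix = (user, type_key(t))
    # A shorter tuple sorts before its extensions, so this finds the first matching entry.
    pos = bisect.bisect_left(index, prefix)
    ids: List[int] = []
    while pos < len(index) and index[pos][:2] == prefix and len(ids) < n:
        ids.append(index[pos][2])
        pos += 1
    return ids


rows = [(5, 1234567, None), (3, 1234567, 1), (9, 42, 1),
        (1, 1234567, 1), (7, 1234567, None), (8, 1234567, 1)]
index = build_index(rows)
print(first_ids(index, 1234567, 1, 2))
print(first_ids(index, 1234567, None, 10))
```

The lookup touches at most N entries after the seek, regardless of how many rows other users or other types have, which is why such an index stays fast for the skewed user.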

mediocre · Answer 2 · 2024-07-05T16:53:27+08:00

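The fastest way to deal with skewed data is a partial index. In your case, an index like the following would be very fast and also very compact. You can tune and test it by changing the index column or the filtering column.

CREATE INDEX ix_partial_type ON chat_message (<user_id/id>) WHERE type = 1;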
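A partial index only contains the rows that satisfy its WHERE clause, which is where the compactness comes from: with null_frac = 0.7656 for type, only about 23 % of the table would be indexed. The Python sketch below is a toy model with hypothetical rows; it picks (user, id) as one reading of the answer's `<user_id/id>` placeholder, which is my assumption, and shows that a lookup for one user then returns type = 1 ids already in id order.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[int, int, Optional[int]]  # (id, user, type), hypothetical sample data


def build_partial_index(rows: List[Row]) -> List[Tuple[int, int]]:
    """Toy partial index on (user, id) WHERE type = 1: other rows are simply not indexed."""
    return sorted((user, id_) for id_, user, t in rows if t == 1)


def first_type1_ids(partial: List[Tuple[int, int]], user: int, n: int) -> List[int]:
    """Seek to the user's first entry; every entry already has type = 1, ordered by id."""
    pos = bisect.bisect_left(partial, (user,))
    ids: List[int] = []
    while pos < len(partial) and partial[pos][0] == user and len(ids) < n:
        ids.append(partial[pos][1])
        pos += 1
    return ids


rows: List[Row] = [(5, 1234567, None), (3, 1234567, 1), (9, 42, 1),
                   (1, 1234567, 1), (7, 1234567, None), (8, 1234567, 1)]
partial = build_partial_index(rows)
print(len(partial), "of", len(rows), "rows indexed")
print(first_type1_ids(partial, 1234567, 10))
```

Note that PostgreSQL can only use such an index when the query's WHERE clause implies the index predicate, so it serves the type = 1 query but not the type IS NULL one.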
