
Question

Asked: 2023-09-18 23:53:14 +0800 CST

Why does the number of live rows fetched by index scans drop after ANALYZE on the table?


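We use PostgreSQL 12 and have a simple table, event_participant, that stores 100 GB of data. event_participant has all the necessary indexes, and all rows are fetched through them; that is, no rows are fetched by sequential scans.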

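Normally, index scans fetch 65 rows per second, but one day at 10 AM we ran a scheduled event, and the number of rows fetched by index scans jumped to 5.4 million per second. The number of index scans, however, stayed the same at 200 per second. The table contents started to change slowly, but not enough to trigger autoanalyze, because autovacuum_analyze_scale_factor is set to 0.01, i.e. 1% of the table size.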
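As a rough check of the autoanalyze claim: the PostgreSQL documentation states that autoanalyze runs when the number of rows inserted, updated, or deleted since the last ANALYZE exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor × reltuples. The Python sketch below just evaluates that rule. The question does not give the row count, so the 100-million-row figure is a hypothetical assumption, as is the base threshold of 50 (the PostgreSQL default).

```python
def autoanalyze_threshold(reltuples: float,
                          scale_factor: float = 0.01,
                          base_threshold: int = 50) -> float:
    """Rows that must change since the last ANALYZE before autoanalyze fires.

    threshold = autovacuum_analyze_threshold
                + autovacuum_analyze_scale_factor * reltuples
    base_threshold=50 is PostgreSQL's default (assumed, not stated in the question).
    """
    return base_threshold + scale_factor * reltuples


def autoanalyze_due(n_mod_since_analyze: int, reltuples: float,
                    scale_factor: float = 0.01) -> bool:
    """True once the modification count is strictly above the threshold."""
    return n_mod_since_analyze > autoanalyze_threshold(reltuples, scale_factor)


ROWS = 100_000_000  # hypothetical: the question only says "100 GB"
print(autoanalyze_threshold(ROWS))     # 1000050.0
print(autoanalyze_due(500_000, ROWS))  # False: 500k new rows would not trigger it
```

With a 1% scale factor on a large table, a new event can add hundreds of thousands of rows before the statistics are refreshed.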

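It is worth mentioning that we set plan_cache_mode to force_custom_plan on this database, because our application uses prepared statements and we want to avoid generic plans being chosen during the live event.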
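For context, here is a simplified Python model of how PostgreSQL chooses between a custom and a generic plan for a prepared statement, loosely following `choose_custom_plan` in the plan cache (the class and function names are hypothetical, and special cases such as unparameterized or one-shot queries are ignored). It shows that with force_custom_plan every execution is planned afresh with the actual parameter values, so no cached plan is reused; the plan can still be poor if the statistics it is built from are stale.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreparedStatement:
    """Hypothetical stand-in for a prepared statement's plan-cache state."""
    custom_plan_costs: List[float] = field(default_factory=list)


def use_custom_plan(stmt: PreparedStatement, generic_cost: float,
                    plan_cache_mode: str = "auto") -> bool:
    """Simplified model of the custom-vs-generic plan decision."""
    if plan_cache_mode == "force_generic_plan":
        return False
    if plan_cache_mode == "force_custom_plan":
        # Re-plan every execution with the real parameters and current statistics.
        return True
    # "auto": always build custom plans for the first five executions...
    if len(stmt.custom_plan_costs) < 5:
        return True
    # ...then prefer the generic plan if it is cheaper than the average custom plan
    # (PostgreSQL also adds an estimated planning overhead to custom-plan costs).
    avg_custom = sum(stmt.custom_plan_costs) / len(stmt.custom_plan_costs)
    return generic_cost >= avg_custom


stmt = PreparedStatement(custom_plan_costs=[2.0] * 5)  # five earlier custom plans
print(use_custom_plan(stmt, generic_cost=1.0))  # False: auto switches to generic
print(use_custom_plan(stmt, generic_cost=1.0, plan_cache_mode="force_custom_plan"))  # True
```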

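After 3 hours of huge CPU load and index scans, we manually ran ANALYZE on event_participant, and the number of live rows fetched by index scans immediately dropped from 5.4 million rows/s to 450 rows/s.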
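The question's own numbers can be turned into an average per index scan, which points to a change in access path rather than in query volume. A minimal sketch; it assumes the metric is the live-rows-fetched counter (presumably `idx_tup_fetch` from `pg_stat_user_tables`) divided by the index-scan rate, so the result is an average over all queries on the table.

```python
def rows_per_scan(rows_fetched_per_sec: float, scans_per_sec: float) -> float:
    """Average number of live rows each index scan returned."""
    return rows_fetched_per_sec / scans_per_sec


SCANS_PER_SEC = 200  # unchanged throughout, per the question

for phase, fetched_per_sec in [("before the event", 65),
                               ("during the event", 5_400_000),
                               ("after ANALYZE", 450)]:
    print(f"{phase}: {rows_per_scan(fetched_per_sec, SCANS_PER_SEC):,.3f} rows per scan")
```

That is 0.325 rows per scan before the event, 27,000 during it, and 2.25 after ANALYZE.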

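I am trying to figure out how the ANALYZE command affected the number of live rows fetched by index scans, while the number of index scans stayed the same.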

Update: more details about the table structure and indexes.

> \d+ event_participant
                            Table "public.event_participant"
  Column  |       Type       | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+------------------+-----------+----------+---------+----------+--------------+-------------
 event_id | text             |           | not null |         | extended |              | 
 user_id  | bigint           |           | not null |         | plain    |              | 
 progress | text             |           | not null |         | extended |              | 
 level    | integer          |           | not null | 0       | plain    |              | 
 quality  | double precision |           |          |         | plain    |              | 
Indexes:
    "event_participant_pkey" PRIMARY KEY, btree (user_id, event_id)
    "event_participant_event_id_idx" btree (event_id)
Access method: heap

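So at 10 AM the event with a new event_id started, and the event_participant table began to grow. Every time a user logs in, the backend application knows which events are active and selects the entries by user_id and event_id to get the user's progress: SELECT * from event_participant WHERE user_id=? AND event_id=?;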

1 Answer


Milan Ilic · Answer 1 · 2023-09-27T21:35:26+08:00

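Again, after the event started, the event_participant table began to grow, but not enough to trigger autoanalyze and refresh the statistics the query planner relies on.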

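Before the event started at 10 AM, event_id=tour2023 did not exist in the table. The most recent autoanalyze had run hours earlier, so the statistics knew nothing about tour2023, and the planner chose the index event_participant_event_id_idx. I confirmed this by running EXPLAIN on a SELECT with a nonexistent event_id: it uses the index on event_id and then filters the rows by user_id: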

explain select * from event_participant where user_id = 1 and event_id = 'bla';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using event_participant_event_id_idx on event_participant  (cost=0.56..1.61 rows=1 width=1409)
   Index Cond: (event_id = 'bla'::text)
   Filter: (user_id = 1)
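One plausible reason the single-column index looked attractive, sketched under assumptions: when a value is absent from the most-common-values (MCV) list collected by the last ANALYZE, PostgreSQL estimates its frequency from whatever fraction of the table the MCV list and NULLs do not cover. If that earlier ANALYZE saw only a few event_id values, all of them in the MCV list, a new value such as tour2023 is estimated at roughly zero rows (clamped to 1), so an index on event_id alone looks as selective as the primary key. The function below is a simplified Python model of that rule (based on `var_eq_const` in PostgreSQL's selfuncs.c); the statistics fed to it and the table size are hypothetical.

```python
from typing import Sequence


def eq_selectivity(value: str, mcv_vals: Sequence[str], mcv_freqs: Sequence[float],
                   null_frac: float, n_distinct: float) -> float:
    """Simplified model of PostgreSQL's selectivity estimate for `column = constant`.

    Assumes n_distinct is an absolute count (pg_stats may also store it as a
    negative fraction of the row count) and ignores the no-statistics fallbacks.
    """
    if value in mcv_vals:
        # ANALYZE saw this value: use its sampled frequency.
        return mcv_freqs[list(mcv_vals).index(value)]
    # Unknown value: at most the fraction of rows not covered by MCVs or NULLs,
    selec = min(max(1.0 - sum(mcv_freqs) - null_frac, 0.0), 1.0)
    # shared evenly among the remaining distinct values,
    other_distinct = n_distinct - len(mcv_vals)
    if other_distinct > 1:
        selec /= other_distinct
    # and never more than the least common MCV.
    if mcv_freqs and selec > min(mcv_freqs):
        selec = min(mcv_freqs)
    return selec


def estimated_rows(selec: float, reltuples: float) -> int:
    """Planner-style row estimate, never below 1 (like clamp_row_est)."""
    return max(1, round(selec * reltuples))


ROWS = 100_000_000  # hypothetical table size

# Hypothetical stale statistics from before 10 AM: every event_id is an MCV.
stale = dict(mcv_vals=["tour2022", "spring2023", "summer2023"],
             mcv_freqs=[0.5, 0.25, 0.25], null_frac=0.0, n_distinct=3)
# Hypothetical fresh statistics after the manual ANALYZE.
fresh = dict(mcv_vals=["tour2022", "tour2023", "spring2023", "summer2023"],
             mcv_freqs=[0.375, 0.25, 0.1875, 0.1875], null_frac=0.0, n_distinct=4)

print(estimated_rows(eq_selectivity("tour2023", **stale), ROWS))  # 1
print(estimated_rows(eq_selectivity("tour2023", **fresh), ROWS))  # 25000000
```

In this model the stale estimate for event_id = 'tour2023' is the same 1 row the planner reports for a nonexistent value like 'bla'; once ANALYZE records tour2023 as a common value, the event_id-only index no longer looks cheap.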

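This means that once the tour2023 event started, PostgreSQL executed SELECT * from event_participant WHERE user_id=? AND event_id=?; by using event_participant_event_id_idx to fetch all rows with event_id=tour2023 and then filtering them by user_id, instead of using the composite index "event_participant_pkey" PRIMARY KEY, btree (user_id, event_id). This is what drove up the number of rows fetched by index scans, as well as the huge CPU usage.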
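A toy Python model of the two plans: sorted lists searched with `bisect` stand in for the B-tree indexes (this is not how PostgreSQL stores indexes), and the 10,000-user data set is hypothetical. Both plans return the same single row, but the (event_id) plan fetches every tour2023 row from the heap before filtering on user_id, which inflates the rows-fetched counter while the number of index scans stays the same.

```python
import bisect
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (user_id, event_id)


class SortedIndex:
    """Toy stand-in for a B-tree index: sorted keys, each pointing at a heap position."""

    def __init__(self, rows: List[Row], key: Callable[[Row], object]):
        entries = sorted((key(r), pos) for pos, r in enumerate(rows))
        self.keys = [k for k, _ in entries]
        self.positions = [p for _, p in entries]

    def lookup(self, k) -> List[int]:
        """Heap positions of all index entries whose key equals k."""
        lo = bisect.bisect_left(self.keys, k)
        hi = bisect.bisect_right(self.keys, k)
        return self.positions[lo:hi]


def run_query(rows: List[Row], index: SortedIndex, index_key,
              user_id: int, event_id: str) -> Tuple[List[Row], int]:
    """Fetch heap rows through `index`, then apply the full WHERE clause as a filter.

    Returns (result rows, heap rows fetched); the second number is what the
    "rows fetched by index scans" statistic counts.
    """
    fetched = [rows[p] for p in index.lookup(index_key)]
    result = [r for r in fetched if r == (user_id, event_id)]
    return result, len(fetched)


# Hypothetical data: 10,000 users, each with a row for an older event and for tour2023.
rows = [(uid, ev) for uid in range(10_000) for ev in ("tour2022", "tour2023")]
event_idx = SortedIndex(rows, key=lambda r: r[1])  # like event_participant_event_id_idx
pkey_idx = SortedIndex(rows, key=lambda r: r)      # like event_participant_pkey (user_id, event_id)

print(run_query(rows, event_idx, "tour2023", 42, "tour2023"))
# → ([(42, 'tour2023')], 10000)
print(run_query(rows, pkey_idx, (42, "tour2023"), 42, "tour2023"))
# → ([(42, 'tour2023')], 1)
```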

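After I ran ANALYZE manually, the statistics were refreshed, the query plan changed, and the database chose the composite index. As a result, the number of rows fetched by index scans dropped to 450 rows/s.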

EXPLAIN output when using an existing event_id:

explain select * from event_participant where user_id = 1 and event_id = 'tour2023';
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using event_participant_pkey on event_participant  (cost=0.56..2.58 rows=1 width=1409)
   Index Cond: ((user_id = 1) AND (event_id = 'tour2023'::text))

So the answer is that the query plan was built from outdated statistics, and PostgreSQL chose a suboptimal index.

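What I still don't understand is why PostgreSQL used the (event_id)-only index at all, since I expected the query planner to favor the composite index when both user_id and event_id are specified in the query.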
