Question

Asked: 2024-05-24 19:41:44 +0800 CST

Different filter values lead to different (slower) query plans

I am running the following query in Postgres 15, with the Timescale extension, against an alerts table to get the latest alert for a username.

EXPLAIN ANALYZE
SELECT *
FROM alerts_alerts
WHERE username IN ('<username_here>')
ORDER BY timestamp DESC
LIMIT 1

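For most usernames the query executes quickly, in under 150 ms. For some usernames, however, it takes much longer. Nearly all of them have about the same number of alerts, around 450, and most have fairly recent data, all within the last 6 months.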

Here is the EXPLAIN ANALYZE output for a problematic username:

"Limit  (cost=0.29..2262.68 rows=1 width=86) (actual time=36129.346..36129.370 rows=1 loops=1)"
"  ->  Custom Scan (ChunkAppend) on alerts_alerts  (cost=0.29..2262.68 rows=1 width=86) (actual time=36129.344..36129.368 rows=1 loops=1)"
"        Order: alerts_alerts.""timestamp"" DESC"
"        ->  Index Scan using _hyper_1_234_chunk_alerts_alerts_timestamp_idx_1 on _hyper_1_234_chunk  (cost=0.29..2262.68 rows=1 width=89) (actual time=5.795..5.796 rows=0 loops=1)"
"              Filter: ((username)::text = 'username_long_query'::text)"
"              Rows Removed by Filter: 30506"
"        ->  Index Scan using _hyper_1_233_chunk_alerts_alerts_timestamp_idx_1 on _hyper_1_233_chunk  (cost=0.29..4337.82 rows=1 width=91) (actual time=11.112..11.112 rows=0 loops=1)"
"              Filter: ((username)::text = 'username_long_query'::text)"
"              Rows Removed by Filter: 59534"
            [   ...     Cut redundant log lines here    ...    ]
"        ->  Index Scan using _hyper_1_156_chunk_alerts_alerts_timestamp_idx_1 on _hyper_1_156_chunk  (cost=0.42..11418.54 rows=2591 width=80) (never executed)"
"              Filter: ((username)::text = 'username_long_query'::text)"
"        ->  Index Scan using _hyper_1_155_chunk_alerts_alerts_timestamp_idx_1 on _hyper_1_155_chunk  (cost=0.29..7353.95 rows=749 width=84) (never executed)"
"              Filter: ((username)::text = 'username_long_query'::text)"
            [   ...     Cut redundant log lines here    ...    ]
"Planning Time: 13.154 ms"
"Execution Time: 36129.923 ms"

And here is the EXPLAIN ANALYZE output for a username that executes quickly:

"Limit  (cost=471.73..471.73 rows=1 width=458) (actual time=1.672..1.691 rows=1 loops=1)"
"  ->  Sort  (cost=471.73..472.76 rows=414 width=458) (actual time=1.671..1.689 rows=1 loops=1)"
"        Sort Key: _hyper_1_234_chunk.""timestamp"" DESC"
"        Sort Method: top-N heapsort  Memory: 27kB"
"        ->  Append  (cost=0.29..469.66 rows=414 width=457) (actual time=1.585..1.654 rows=210 loops=1)"
"              ->  Index Scan using _hyper_1_234_chunk_alerts_alerts_fleet_a3933a38_1 on _hyper_1_234_chunk  (cost=0.29..2.49 rows=1 width=372) (actual time=0.006..0.007 rows=0 loops=1)"
"                    Index Cond: ((username)::text = 'username_value'::text)"
"              ->  Index Scan using _hyper_1_233_chunk_alerts_alerts_fleet_a3933a38_1 on _hyper_1_233_chunk  (cost=0.29..2.37 rows=1 width=385) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Index Cond: ((username)::text = 'username_value'::text)"
            [   ...     Cut redundant log lines here    ...    ]
"              ->  Seq Scan on _hyper_1_83_chunk  (cost=0.00..1.12 rows=1 width=504) (actual time=0.013..0.013 rows=0 loops=1)"
"                    Filter: ((username)::text = 'username_value'::text)"
"                    Rows Removed by Filter: 10"
"              ->  Seq Scan on _hyper_1_81_chunk  (cost=0.00..1.12 rows=1 width=504) (actual time=0.009..0.009 rows=0 loops=1)"
"                    Filter: ((username)::text = 'username_value'::text)"
"                    Rows Removed by Filter: 10"
"Planning Time: 899.811 ms"
"Execution Time: 2.613 ms"

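Initial research suggested performing maintenance on the table. After running VACUUM I executed the query again, but the results did not change.

It should also be noted that other usernames get the "problematic" plan and still execute quickly.

I am not sure how to resolve this difference in execution time. Adding another index might help, but as I am new to PostgreSQL, I am currently unsure of the best approach.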

1 Answer

Erwin Brandstetter · Answer 1 · 2024-05-27T08:19:17+08:00

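"Filter by one column (username), sort by another (timestamp), and LIMIT 1!"

This is an age-old struggle between two possible approaches:

Traverse the index on timestamp and filter for the matching username. The first hit finishes the job. This is what happens in your first plan: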

"        ->  Index Scan using _hyper_1_234_chunk_alerts_alerts_timestamp_idx_1 on _hyper_1_234_chunk  (cost=0.29..2262.68 rows=1 width=89) (actual time=5.795..5.796 rows=0 loops=1)"
"              Filter: ((username)::text = 'username_long_query'::text)"
"              Rows Removed by Filter: 30506"

The same step is simply repeated for every partition of the Timescale hypertable (Timescale calls partitions "chunks").

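This works well unless the first hit lies a long way back, as in the case at hand. Postgres may fall into this trap if it has no valid information available (column statistics: the most-common-values list, the n_distinct setting).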
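To see what the planner has to work with, the column statistics can be inspected directly. The query below is a minimal sketch: it assumes the chunks live in Timescale's default internal schema, and the chunk name is the one that appears in the plan above. Statistics are only as fresh as the last ANALYZE, so refreshing them first may be worthwhile.

```sql
-- Refresh planner statistics for the hypertable.
-- Assumption: Timescale extends this to the chunks; a chunk can also be
-- analyzed individually if that turns out not to be the case.
ANALYZE alerts_alerts;

-- Inspect what the planner knows about the username column.
-- Assumption: chunks are stored in the _timescaledb_internal schema.
SELECT schemaname, tablename, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  attname = 'username'
AND    tablename IN ('alerts_alerts', '_hyper_1_234_chunk');
```

If a username does not appear in most_common_vals, the planner falls back to an estimate derived from n_distinct, which can be far off when the data is skewed.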

Use the index on username to retrieve all rows for the given username, then sort and take the latest one. This is what your second plan shows:

"        ->  Append  (cost=0.29..469.66 rows=414 width=457) (actual time=1.585..1.654 rows=210 loops=1)"
"              ->  Index Scan using _hyper_1_234_chunk_alerts_alerts_fleet_a3933a38_1 on _hyper_1_234_chunk  (cost=0.29..2.49 rows=1 width=372) (actual time=0.006..0.007 rows=0 loops=1)"
"                    Index Cond: ((username)::text = 'username_value'::text)"

This is far more efficient when there are few qualifying rows and even the most recent one is buried under many newer, non-qualifying rows.

If your queries only ever filter on a single username at a time, a multicolumn index on (username, timestamp DESC) would be perfect.

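But you have a Timescale hypertable that is (I assume) partitioned on the timestamp column ("time-partitioned into chunks"). It is optimized first and foremost for queries on timestamp, so the best strategy becomes tricky. Typically, said multicolumn index is still the way to go. Postgres/Timescale then still has to look into every chunk (or just its index), starting with the youngest, until it finds the first entry for the given username. But it no longer has to walk through all rows only to find nothing and report "Rows Removed by Filter: 30506", which is every row of the chunk in the example.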
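A minimal sketch of the multicolumn index on (username, timestamp DESC) follows. The index name is made up for illustration. Creating an index on a hypertable is expected to build a matching index on each chunk, which can take a while on a large table.

```sql
-- Hypothetical index name; the columns match the filter and the sort order
-- of the query in the question.
CREATE INDEX alerts_alerts_username_timestamp_idx
    ON alerts_alerts (username, "timestamp" DESC);

-- Re-check the plan for a previously slow username.
EXPLAIN ANALYZE
SELECT *
FROM   alerts_alerts
WHERE  username = '<username_here>'
ORDER  BY "timestamp" DESC
LIMIT  1;
```

If the planner picks up the new index, each chunk scan should show an Index Cond on username in place of the Filter line, and "Rows Removed by Filter" should disappear from the plan.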

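With only a few distinct usernames, you could sub-partition the hypertable. In Timescale's cumbersome terminology: "add a space partitioning dimension to the hypertable". But that is inefficient for too many distinct usernames.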
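For reference, adding such a dimension might look like the sketch below. It uses Timescale's add_dimension() function. The partition count of 4 is an arbitrary placeholder. Depending on the Timescale version, the call may be restricted to empty hypertables or may only affect chunks created afterwards, so check the documentation for the installed version before trying it.

```sql
-- Assumption: 4 hash partitions on username; choose a number that suits the data.
SELECT add_dimension('alerts_alerts', 'username', number_partitions => 4);
```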

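The best course of action depends on the complete picture: cardinalities, write frequency, data distribution, hardware resources, server configuration, Postgres and Timescale versions ... probably beyond the scope of a simple question here.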
