peremeykin

Asked: 2021-11-06 01:55:14 +0800 CST

PostgreSQL: Materialize as the inner node of a Merge Join in a query plan


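I am learning about PostgreSQL EXPLAIN plan nodes, and at the moment I am studying the Materialize node. Here is a query I found in a blog post (https://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/) and a plan I obtained myself (structurally, it is the same as the one in the blog post):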

set work_mem= '1GB';
explain analyze select * from
(select * from pg_class order by oid) as c
join
(select * from pg_attribute a order by attrelid) as a
on c.oid = a.attrelid;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=34.27..333.37 rows=2913 width=504) (actual time=0.823..28.413 rows=2913 loops=1)
   Merge Cond: (pg_class.oid = a.attrelid)
   ->  Sort  (cost=33.99..34.97 rows=395 width=265) (actual time=0.739..1.084 rows=395 loops=1)
         Sort Key: pg_class.oid
         Sort Method: quicksort  Memory: 130kB
         ->  Seq Scan on pg_class  (cost=0.00..16.95 rows=395 width=265) (actual time=0.038..0.285 rows=395 loops=1)
   ->  Materialize  (cost=0.28..257.05 rows=2913 width=239) (actual time=0.060..11.702 rows=2913 loops=1)
         ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..220.63 rows=2913 width=239) (actual time=0.050..6.827 rows=2913 loops=1)
 Planning Time: 1.472 ms
 Execution Time: 29.617 ms
(10 rows)

I cannot understand the blog author's argument for why Postgres uses Materialize here:

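...Merge Join has to match several criteria. Some are obvious (the data has to be sorted), and some are less obvious because they are more technical (the data has to be scrollable back and forth).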

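Because of this (these not-so-obvious criteria), Pg sometimes has to materialize the data coming from a source (Index Scan in our case) so that it has all the necessary capabilities when it is used.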

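Why does Merge Join need data that can be scrolled back and forth? As I understand Merge Join, it iterates over both data sets simultaneously using two pointers, and there is no case in the algorithm where it steps backwards. Besides, as far as I understand, an Index Scan actually is "scrollable back and forth": I have seen "Index Scan Backward" many times. So why does Postgres have to materialize it?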

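I found confirmation of the blog author's point in another source, namely the old book "PostgreSQL: The comprehensive guide to building, programming, and administering PostgreSQL databases", 2nd edition, by Korry Douglas and Susan Douglas: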

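Materialize is also used for some merge join operations. In particular, if the inner input set of a Merge Join operator is not produced by a Seq Scan, Index Scan, Sort, or Materialize operator, the planner/optimizer will insert a Materialize operator into the plan. The reasoning behind this rule is not obvious; it has more to do with the capabilities of the other operators than with the performance or the structure of your data. The Merge Join operator is complex. One requirement of a Merge Join is that the input sets must be ordered by the join columns. A second requirement is that the inner input set must be repositionable; that is, the Merge Join needs to move backward and forward in the input set. Not all ordered operators can move backward and forward. If the inner input set is produced by an operator that is not repositionable, ...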
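The repositioning requirement described above can be illustrated with a minimal C sketch, assuming a hypothetical forward-only source of `int` rows. A Materialize-style buffer keeps every row it has already pulled, so the consumer can mark a position and later restore to it even though the underlying source only moves forward. The types and functions `Source`, `Materializer`, `mat_next`, `mat_mark`, and `mat_restore` are illustrative names, not PostgreSQL's executor API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical forward-only input: it can only hand out the next row. */
typedef struct {
    const int *rows;
    size_t len;
    size_t pos;
} Source;

static bool source_next(Source *s, int *out)
{
    if (s->pos >= s->len)
        return false;
    *out = s->rows[s->pos++];
    return true;
}

/* Caches every row pulled from the source so the reader can rewind. */
typedef struct {
    Source *src;
    int *buf;      /* rows pulled from the source so far */
    size_t count;  /* number of buffered rows */
    size_t cap;    /* allocated capacity of buf */
    size_t read;   /* index of the next buffered row to return */
    size_t mark;   /* read position saved by mat_mark */
} Materializer;

static bool mat_next(Materializer *m, int *out)
{
    if (m->read == m->count) {
        /* Buffer fully consumed: fetch one more row from the source. */
        int v;
        if (!source_next(m->src, &v))
            return false;
        if (m->count == m->cap) {
            size_t new_cap = m->cap ? m->cap * 2 : 8;
            int *nb = realloc(m->buf, new_cap * sizeof *nb);
            if (nb == NULL) {
                perror("realloc");
                exit(EXIT_FAILURE);
            }
            m->buf = nb;
            m->cap = new_cap;
        }
        m->buf[m->count++] = v;
    }
    *out = m->buf[m->read++];
    return true;
}

/* Remember the current read position. */
static void mat_mark(Materializer *m) { m->mark = m->read; }

/* Go back to the remembered position; only buffered rows are replayed. */
static void mat_restore(Materializer *m)
{
    assert(m->mark <= m->count);
    m->read = m->mark;
}
```

Each row is read from the source exactly once; a restore only replays rows already held in the buffer, which is why this approach trades memory for the ability to rewind.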

In my example, the inner input set is produced by an Index Scan, so according to the book there should be no Materialize node in this plan.

Then I decided to modify the query so that the planner would not use Materialize but would still use a Merge Join. This is what I came up with:

set enable_hashjoin = off;
set work_mem= '1GB';
explain analyze select * from
(select * from pg_class order by oid) as c
join
(select * from pg_attribute a) as a
on c.oid = a.attrelid;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=34.27..296.96 rows=2913 width=504) (actual time=0.554..10.103 rows=2913 loops=1)
   Merge Cond: (pg_class.oid = a.attrelid)
   ->  Sort  (cost=33.99..34.97 rows=395 width=265) (actual time=0.491..0.645 rows=395 loops=1)
         Sort Key: pg_class.oid
         Sort Method: quicksort  Memory: 130kB
         ->  Seq Scan on pg_class  (cost=0.00..16.95 rows=395 width=265) (actual time=0.021..0.177 rows=395 loops=1)
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..220.63 rows=2913 width=239) (actual time=0.041..2.296 rows=2913 loops=1)
 Planning Time: 1.188 ms
 Execution Time: 10.731 ms
(9 rows)

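I removed order by attrelid from the second subquery and forcibly disabled Hash Join. Apart from the Materialize node, this plan is identical to the previous one. So I conclude that Merge Join is not the reason the planner used Materialize in the previous plan. This plan is cheaper, but I assume the result is the same.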

I would be grateful if you could help me with some of these puzzles:

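Does Merge Join really need to be able to iterate backwards over the inner data set? In which cases? Is the output of an Index Scan "backward-iterable" as far as Merge Join's requirements are concerned?
Why does the Postgres planner use Materialize in the first plan, even though it is more expensive? What purpose does it serve there? And why does the planner not use Materialize in the second plan?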

2 Answers

jjanes · Answer 1 · 2021-11-06T07:53:36+08:00


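Why does Merge Join need data that can be scrolled back and forth? As I understand Merge Join, it iterates over both data sets simultaneously using two pointers, and there is no case in the algorithm where it steps backwards.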

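So the first node produces a "cat". The second node scans (discarding results) until it finds a "cat" or something greater, emits results until it sees something > "cat", and then pauses. Now the first node produces another "cat". What do you think the second node should do now?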
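To make the "cat" example concrete, here is a minimal C sketch of an inner merge join over two sorted `int` arrays, assuming integer keys stand in for the strings. When the outer side repeats a key, the inner cursor is restored to the position marked at the start of the matching group, which is exactly the "step back" that a forward-only inner input cannot perform. The function `merge_join` is a hypothetical, simplified model, not the state machine in PostgreSQL's executor.

```c
#include <stddef.h>

/* Stores (outer index, inner index) pairs whose keys match and returns the
 * total number of matching pairs (only the first max_pairs are stored).
 * Both inputs must be sorted ascending. */
static size_t merge_join(const int *outer, size_t n_outer,
                         const int *inner, size_t n_inner,
                         size_t out_pairs[][2], size_t max_pairs)
{
    size_t o = 0, i = 0, mark = 0, n = 0;

    while (o < n_outer) {
        if (o > 0 && outer[o] == outer[o - 1]) {
            /* Same key as the previous outer row: restore the inner cursor
             * to the marked start of the matching group. */
            i = mark;
        } else {
            /* Skip inner rows smaller than the new outer key. */
            while (i < n_inner && inner[i] < outer[o])
                i++;
            mark = i; /* mark where the matching group (if any) starts */
        }
        /* Emit every inner row equal to the outer key, then pause. */
        while (i < n_inner && inner[i] == outer[o]) {
            if (n < max_pairs) {
                out_pairs[n][0] = o;
                out_pairs[n][1] = i;
            }
            n++;
            i++;
        }
        o++;
    }
    return n;
}
```

With outer keys `{1, 2, 2, 3}` and inner keys `{2, 2, 3}`, the second outer `2` rewinds the inner cursor back to the first inner `2`; an inner input that could only move forward would have nothing left to match.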

In my example, the inner input set is produced by an Index Scan, so according to the book there should be no Materialize node in this plan.

~~What if it simply thinks that materializing will be faster?~~ (OK, as Laurenz pointed out, that is not the case here.)


Laurenz Albe · Answer 2 · 2021-11-06T07:55:21+08:00

Your analysis is correct; the decision to materialize the inner relation of a merge join is made in final_cost_mergejoin.

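PostgreSQL will consider materializing if that is cheaper or if sorting the inner path would spill to disk. Both of these can be disabled by turning off enable_material, so that is a useful test: in your case, the index scan is still materialized with enable_material off, so materialization must be required.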

This source comment describes when it is required:

/*
 * Even if materializing doesn't look cheaper, we *must* do it if the
 * inner path is to be used directly (without sorting) and it doesn't
 * support mark/restore.
 *
 * Since the inner side must be ordered, and only Sorts and IndexScans can
 * create order to begin with, and they both support mark/restore, you
 * might think there's no problem --- but you'd be wrong.  Nestloop and
 * merge joins can *preserve* the order of their inputs, so they can be
 * selected as the input of a mergejoin, and they don't support
 * mark/restore at present.
 *
 * We don't test the value of enable_material here, because
 * materialization is required for correctness in this case, and turning
 * it off does not entitle us to deliver an invalid plan.
 */
else if (innersortkeys == NIL &&
         !ExecSupportsMarkRestore(inner_path))
    path->materialize_inner = true;

ExecSupportsMarkRestore does this:

bool
ExecSupportsMarkRestore(Path *pathnode)
{
    /*
     * For consistency with the routines above, we do not examine the nodeTag
     * but rather the pathtype, which is the Plan node type the Path would
     * produce.
     */
    switch (pathnode->pathtype)
    {
        case T_IndexScan:
        case T_IndexOnlyScan:

            /*
             * Not all index types support mark/restore.
             */
            return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos;
    [...]
        default:
            break;
    }

    return false;
}

Now, B-tree indexes do support mark/restore, so what is going on?

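The problem is that you are not joining pg_attribute directly, but a subquery. This is not visible in the final execution plan, but during path generation the path type is not T_IndexScan but T_SubqueryScan, so ExecSupportsMarkRestore concludes that we must materialize.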
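A minimal C sketch of the decision described above, under heavy simplification: the enum `PathType`, the struct `PathModel`, and the functions `supports_mark_restore` and `must_materialize_inner` are hypothetical stand-ins for PostgreSQL's `Path`, `ExecSupportsMarkRestore`, and the check in `final_cost_mergejoin`. Only the correctness branch is modeled; the cost-based branches and the other node types handled by the real function are omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few Plan node types a Path might produce. */
typedef enum {
    PT_INDEX_SCAN,
    PT_INDEX_ONLY_SCAN,
    PT_SORT,
    PT_MATERIAL,
    PT_SUBQUERY_SCAN,
    PT_MERGE_JOIN,
    PT_NEST_LOOP
} PathType;

typedef struct {
    PathType pathtype;
    bool index_can_mark_pos; /* models indexinfo->amcanmarkpos; true for B-tree */
} PathModel;

/* Simplified analogue of ExecSupportsMarkRestore: decided by pathtype only. */
static bool supports_mark_restore(const PathModel *p)
{
    switch (p->pathtype) {
    case PT_INDEX_SCAN:
    case PT_INDEX_ONLY_SCAN:
        return p->index_can_mark_pos; /* not all index types support it */
    case PT_SORT:
    case PT_MATERIAL:
        return true;
    default:
        return false; /* e.g. SubqueryScan, MergeJoin, NestLoop */
    }
}

/* Correctness rule: the inner path is used as-is (no Sort will be added)
 * and it cannot mark/restore, so a Materialize node is mandatory. */
static bool must_materialize_inner(const PathModel *inner, bool inner_needs_sort)
{
    return !inner_needs_sort && !supports_mark_restore(inner);
}
```

With these definitions, an unsorted `PT_SUBQUERY_SCAN` inner path must be materialized while a B-tree `PT_INDEX_SCAN` need not be, mirroring the two plans in the question.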

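You can test this by omitting the ORDER BY (without it, the planner can pull the subquery up into the main query) and disabling hash joins: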

SET enable_hashjoin = off;

explain (costs off) select * from
(select * from pg_class order by oid) as c
join
(select * from pg_attribute a) as a
on c.oid = a.attrelid;

                                QUERY PLAN                                
══════════════════════════════════════════════════════════════════════════
 Merge Join
   Merge Cond: (pg_class.oid = a.attrelid)
   ->  Sort
         Sort Key: pg_class.oid
         ->  Seq Scan on pg_class
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a
(6 rows)

Voilà, no Materialize.

This could be optimized, but I don't know the code well enough to say whether that is feasible.
