AskOverflow.Dev

AskOverflow.Dev Logo AskOverflow.Dev Logo

AskOverflow.Dev Navigation

  • 主页
  • 系统&网络
  • Ubuntu
  • Unix
  • DBA
  • Computer
  • Coding
  • LangChain

Mobile menu

Close
  • 主页
  • 系统&网络
    • 最新
    • 热门
    • 标签
  • Ubuntu
    • 最新
    • 热门
    • 标签
  • Unix
    • 最新
    • 标签
  • DBA
    • 最新
    • 标签
  • Computer
    • 最新
    • 标签
  • Coding
    • 最新
    • 标签
主页 / dba / 问题 / 9043
Accepted
Bruno
Bruno
Asked: 2011-12-12 11:38:13 +0800 CST2011-12-12 11:38:13 +0800 CST 2011-12-12 11:38:13 +0800 CST

PostgreSQL:在同一查询中重用复杂的中间结果

  • 772

使用 PostgreSQL (8.4),我正在创建一个视图,该视图总结了几个表中的各种结果(例如,在视图中创建列a、、、),然后我需要在同一个查询中将其中一些结果组合在一起(例如,、、, ...),从而产生最终结果。我注意到的是,每次使用中间结果时都会对其进行完全计算,即使它是在同一个查询中完成的。bca+ba-b(a+b)/c

有没有办法对此进行优化以避免每次都计算相同的结果?

这是一个重现问题的简化示例。

CREATE TABLE test1 (
    id SERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL
);
CREATE TABLE test2 (
    test1_id INTEGER NOT NULL REFERENCES test1(id),
    category VARCHAR(10) NOT NULL,
    col1 INTEGER,
    col2 INTEGER
);
CREATE INDEX test_category_idx ON test2(category);

-- Added after edit to this question
CREATE INDEX test_id_idx ON test2(test1_id);

-- Populating with test data.
INSERT INTO test1(log_timestamp)
    SELECT * FROM generate_series('2011-01-01'::timestamp, '2012-01-01'::timestamp, '1 hour');
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (20000*random()-10000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-1000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-40)::int, (3000*random()-200)::int FROM test1;

这是执行最耗时操作的视图:

CREATE VIEW testview1 AS
    SELECT
       t1.id,
       t1.log_timestamp,
       (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
       (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
       (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
    FROM test1 t1;
  • SELECT a FROM testview1产生这个计划(通过EXPLAIN ANALYZE):

     Seq Scan on test1 t1  (cost=0.00..1787086.55 rows=8761 width=4) (actual time=12.877..10517.575 rows=8761 loops=1)
       SubPlan 1
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.193..1.193 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.109..1.177 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.414..0.414 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
     Total runtime: 10522.346 ms
    

  • SELECT a, a FROM testview1产生这个计划:

     Seq Scan on test1 t1  (cost=0.00..3574037.50 rows=8761 width=4) (actual time=3.343..20550.817 rows=8761 loops=1)
       SubPlan 1
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.183..1.183 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.100..1.166 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.418..0.418 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
       SubPlan 2
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.154..1.154 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.083..1.143 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.426..0.426 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
     Total runtime: 20557.581 ms
    

在这里,选择a, a花费的时间是选择的两倍a,而它们实际上可以只计算一次。例如,使用SELECT a, a+b, a-b FROM testview1,它通过子计划a3 次和b两次,而执行时间可以减少到总时间的 2/5(假设 + 和 - 在这里可以忽略不计)。

这是一件好事,它不会在不需要时计算未使用的列 (b和c),但是有没有办法让它只从视图中计算相同的使用列一次?

编辑: @Frank Heikens 正确地建议使用上面示例中缺少的索引。虽然它确实提高了每个子计划的速度,但它不会阻止多次计算相同的子查询。抱歉,我应该在最初的问题中说明这一点。

postgresql performance
  • 2 2 个回答
  • 4273 Views

2 个回答

  • Voted
  1. Best Answer
    Bruno
    2011-12-13T02:22:33+08:002011-12-13T02:22:33+08:00

    (很抱歉回答我自己的问题,但在阅读了这个不相关的问题和答案后,我想到我应该尝试使用 CTE。它有效。)

    这是另一个视图,类似于testview1问题中的视图,但使用了Common Table Expression:

    CREATE VIEW testview2 AS
        WITH testcte AS (SELECT
           t1.id,
           t1.log_timestamp,
           (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
           (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
           (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
          FROM test1 t1)
        SELECT * FROM testcte;
    

    (这只是一个例子,我并不是说将视图和 CTE 结合起来一定是一个好主意:一个 CTE 可能就足够了。)

    与 不同testview1的是,现在的查询计划SELECT a FROM testview2还计算b和c,因为在 中未使用而被忽略testview1:

    Subquery Scan testview2  (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
       ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
             CTE testcte
               ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
                     SubPlan 1
                       ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'A'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
                     SubPlan 2
                       ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'B'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
                     SubPlan 3
                       ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'C'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
    

    但是,它不会重新计算在同一查询中多次使用的结果(这是目标)。

    与testview1whichSELECT a, a, a, a, a花费 5 倍的时间不同SELECT a,这里花费的时间与orSELECT a, a, a, a, a, b, c, a+b, a+c, b+c FROM testview2一样长。它只通过,一次:SELECT a FROM testview2SELECT a, b, c FROM testview2abc

     Subquery Scan testview2  (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
       ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
             CTE testcte
               ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
                     SubPlan 1
                       ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'A'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
                     SubPlan 2
                       ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'B'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.006..0.006 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
                     SubPlan 3
                       ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=8761)
                             ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=8761)
                                   Recheck Cond: (test1_id = $0)
                                   Filter: ((category)::text = 'C'::text)
                                   ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                         Index Cond: (test1_id = $0)
    
    • 6
  2. Frank Heikens
    2011-12-12T12:19:06+08:002011-12-12T12:19:06+08:00

    您需要表 test2 中 test1_id 的索引,这会改变事情。

    Seq Scan on test1 t1  (cost=0.00..301450.63 rows=8761 width=12) (actual time=0.108..229.859 rows=8761 loops=1)
      SubPlan 1
        ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=8761)
              ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=8761)
                    Recheck Cond: (test1_id = t1.id)
                    Filter: ((category)::text = 'A'::text)
                    ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                          Index Cond: (test1_id = t1.id)
      SubPlan 2
        ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=8761)
              ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=8761)
                    Recheck Cond: (test1_id = t1.id)
                    Filter: ((category)::text = 'B'::text)
                    ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                          Index Cond: (test1_id = t1.id)
      SubPlan 3
        ->  Aggregate  (cost=11.46..11.47 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=8761)
              ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=8761)
                    Recheck Cond: (test1_id = t1.id)
                    Filter: ((category)::text = 'C'::text)
                    ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                          Index Cond: (test1_id = t1.id)
    Total runtime: 232.419 ms
    
    • 3

相关问题

  • PostgreSQL 中 UniProt 的生物序列

  • 如何确定是否需要或需要索引

  • 我在哪里可以找到mysql慢日志?

  • 如何优化大型数据库的 mysqldump?

  • PostgreSQL 9.0 Replication 和 Slony-I 有什么区别?

Sidebar

Stats

  • 问题 205573
  • 回答 270741
  • 最佳答案 135370
  • 用户 68524
  • 热门
  • 回答
  • Marko Smith

    你如何mysqldump特定的表?

    • 4 个回答
  • Marko Smith

    您如何显示在 Oracle 数据库上执行的 SQL?

    • 2 个回答
  • Marko Smith

    如何选择每组的第一行?

    • 6 个回答
  • Marko Smith

    使用 psql 列出数据库权限

    • 10 个回答
  • Marko Smith

    我可以查看在 SQL Server 数据库上运行的历史查询吗?

    • 6 个回答
  • Marko Smith

    如何在 PostgreSQL 中使用 currval() 来获取最后插入的 id?

    • 10 个回答
  • Marko Smith

    如何在 Mac OS X 上运行 psql?

    • 11 个回答
  • Marko Smith

    如何从 PostgreSQL 中的选择查询中将值插入表中?

    • 4 个回答
  • Marko Smith

    如何使用 psql 列出所有数据库和表?

    • 7 个回答
  • Marko Smith

    将数组参数传递给存储过程

    • 12 个回答
  • Martin Hope
    Manuel Leduc PostgreSQL 多列唯一约束和 NULL 值 2011-12-28 01:10:21 +0800 CST
  • Martin Hope
    markdorison 你如何mysqldump特定的表? 2011-12-17 12:39:37 +0800 CST
  • Martin Hope
    Stuart Blackler 什么时候应该将主键声明为非聚集的? 2011-11-11 13:31:59 +0800 CST
  • Martin Hope
    pedrosanta 使用 psql 列出数据库权限 2011-08-04 11:01:21 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 对 SQL 查询进行计时? 2011-06-04 02:22:54 +0800 CST
  • Martin Hope
    Jonas 如何从 PostgreSQL 中的选择查询中将值插入表中? 2011-05-28 00:33:05 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 列出所有数据库和表? 2011-02-18 00:45:49 +0800 CST
  • Martin Hope
    BrunoLM Guid vs INT - 哪个更好作为主键? 2011-01-05 23:46:34 +0800 CST
  • Martin Hope
    bernd_k 什么时候应该使用唯一约束而不是唯一索引? 2011-01-05 02:32:27 +0800 CST
  • Martin Hope
    Patrick 如何优化大型数据库的 mysqldump? 2011-01-04 13:13:48 +0800 CST

热门标签

sql-server mysql postgresql sql-server-2014 sql-server-2016 oracle sql-server-2008 database-design query-performance sql-server-2017

Explore

  • 主页
  • 问题
    • 最新
    • 热门
  • 标签
  • 帮助

Footer

AskOverflow.Dev

关于我们

  • 关于我们
  • 联系我们

Legal Stuff

  • Privacy Policy

Language

  • Pt
  • Server
  • Unix

© 2023 AskOverflow.DEV All Rights Reserve