AskOverflow.Dev

Milovan Zogovic
Asked: 2013-03-13 09:05:29 +0800 CST

How to chain postgres rules?

I have implemented a data denormalization strategy using PostgreSQL rules. I chose rules over triggers for performance reasons.


The schema is structured like this:

  • Application has many clients
  • Client has many projects
  • Project has many users

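One part of the system stores hits for every user in the stats table. Hit is an imaginary metric; it is not really relevant. The system could collect many such metrics. There are a lot of records in the stats table (more than 1,000,000 per day).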

I want to know how many hits there are per user, per project, per client and per application for a given day.

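To make it fast, I group stats by day and store the output in the user_hits table. In the process, application_id, client_id and project_id are added as well (as columns), and the appropriate indexes are created.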

I want to optimize the process further by grouping by project_id, then client_id, and finally application_id. The data pipeline looks like this:

stats -> user_hits -> project_hits -> client_hits -> application_hits
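The pipeline above can be sketched as a chain of GROUP BY rollups, where each level sums the level below it for one day. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `roll_up` helper and the sample rows are hypothetical, and only the table names come from the question.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL tables; columns follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_hits (day TEXT, user_id INT, project_id INT,
                        client_id INT, application_id INT, hits INT);
CREATE TABLE project_hits (day TEXT, project_id INT, client_id INT,
                           application_id INT, hits INT);
CREATE TABLE client_hits (day TEXT, client_id INT, application_id INT, hits INT);
CREATE TABLE application_hits (day TEXT, application_id INT, hits INT);
""")

# Made-up rows for one day: (day, user, project, client, application, hits).
conn.executemany(
    "INSERT INTO user_hits VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2013-03-13", 1, 10, 100, 1000, 5),
        ("2013-03-13", 2, 10, 100, 1000, 7),
        ("2013-03-13", 3, 11, 100, 1000, 2),
        ("2013-03-13", 4, 12, 101, 1000, 4),
    ],
)

def roll_up(conn, day):
    """Hypothetical helper: build each level by grouping the level below it."""
    conn.execute("""
        INSERT INTO project_hits
        SELECT day, project_id, client_id, application_id, SUM(hits)
          FROM user_hits WHERE day = ?
         GROUP BY day, project_id, client_id, application_id""", (day,))
    conn.execute("""
        INSERT INTO client_hits
        SELECT day, client_id, application_id, SUM(hits)
          FROM project_hits WHERE day = ?
         GROUP BY day, client_id, application_id""", (day,))
    conn.execute("""
        INSERT INTO application_hits
        SELECT day, application_id, SUM(hits)
          FROM client_hits WHERE day = ?
         GROUP BY day, application_id""", (day,))
    conn.commit()

roll_up(conn, "2013-03-13")
summary = {
    table: conn.execute(f"SELECT COUNT(*), SUM(hits) FROM {table}").fetchone()
    for table in ("user_hits", "project_hits", "client_hits", "application_hits")
}
print(summary)
```

Each level keeps the same daily total (18 hits here) in fewer rows (4, 3, 2, then 1), which is what makes the higher-level tables cheap to query.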

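I want to make sure that when I delete the data for a given date from user_hits, the data for the same date in project_hits is deleted as well. This should propagate all the way to the last table in the chain.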

I defined these simple rules:

CREATE RULE delete_children AS ON DELETE TO user_hits
  DO ALSO
  DELETE FROM project_hits WHERE day = OLD.day;

CREATE RULE delete_children AS ON DELETE TO project_hits
  DO ALSO
  DELETE FROM client_hits WHERE day = OLD.day;

CREATE RULE delete_children AS ON DELETE TO client_hits
  DO ALSO
  DELETE FROM application_hits WHERE day = OLD.day;

However, when I issue a statement like this:

DELETE FROM user_hits WHERE day = current_date;

I expect it to also run these 3 queries:

DELETE FROM project_hits WHERE day = current_date;
DELETE FROM client_hits WHERE day = current_date;
DELETE FROM application_hits WHERE day = current_date;

However, that is not what happens.

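It does complete the operation, but it takes several minutes (with test data). With real data it takes hours, whereas running those 3 queries by hand takes a few milliseconds. The time it takes seems to be proportional to the number of combinations (users x projects x clients x applications).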

What is the problem here? Am I missing something? Can this be implemented efficiently with triggers?


Here is a sample script that reproduces the problem:

https://gist.github.com/assembler/5151102


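Update: The transition from user_hits to project_hits (and so on) is done by a worker process in the background, because it involves contacting 3rd-party services for additional information. The worker is smart enough to recalculate everything for missing dates, so all I need is a way to cascade-delete records efficiently.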


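Update: The stats table is filled daily. The only scenario that can occur is unconditionally deleting a whole day's data and then replacing it with new values.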


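Update: I noticed that the number of affected rows (taken from the EXPLAIN output) is exactly equal to the product of the affected rows in the user_hits, project_hits, client_hits and application_hits tables (hundreds of millions of rows).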

It turns out it works like this:

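  1. I run DELETE FROM user_hits WHERE day = current_date;
  2. For each row in the user_hits table, the rule fires and deletes every row from project_hits
  3. For each row in project_hits, the rule fires and deletes every row from client_hits
  4. For each row in client_hits, the rule fires and deletes every row from application_hits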

So the number of operations equals the product of the numbers of affected rows in these tables.
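The multiplication comes from joining tables on a column that is not unique. As a hedged illustration, the sketch below uses SQLite (not PostgreSQL) with made-up row counts; it joins three tables on `day` alone, the same shape as the rule-generated statements, and counts the rows that join produces compared with the rows actually targeted.

```python
import sqlite3

# SQLite stand-in; only the day column matters for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_hits (day TEXT, user_id INT);
CREATE TABLE project_hits (day TEXT, project_id INT);
CREATE TABLE client_hits (day TEXT, client_id INT);
""")

day = "2013-03-13"
# Made-up row counts for a single day: 50 users, 8 projects, 3 clients.
conn.executemany("INSERT INTO user_hits VALUES (?, ?)", [(day, i) for i in range(50)])
conn.executemany("INSERT INTO project_hits VALUES (?, ?)", [(day, i) for i in range(8)])
conn.executemany("INSERT INTO client_hits VALUES (?, ?)", [(day, i) for i in range(3)])

def joined_rows(conn, day):
    """Rows produced by joining the tables on day alone, as the rewrite does."""
    return conn.execute("""
        SELECT COUNT(*)
          FROM user_hits u
          JOIN project_hits p ON p.day = u.day
          JOIN client_hits c ON c.day = p.day
         WHERE u.day = ?""", (day,)).fetchone()[0]

def rows_to_delete(conn, day):
    """Rows the asker actually wants removed from client_hits."""
    return conn.execute(
        "SELECT COUNT(*) FROM client_hits WHERE day = ?", (day,)).fetchone()[0]

print(joined_rows(conn, day))     # 1200 (50 * 8 * 3)
print(rows_to_delete(conn, day))  # 3
```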

postgresql
  • 3 answers
  • 1177 views

3 Answers

  1. Best Answer
    Chris Travers
    2013-03-16T23:34:34+08:00

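    Next time, please include the EXPLAIN output rather than making us dig through your scripts for it. There is no guarantee that my system uses the same plan as yours (although with your test data it is likely).

    The rule system is working correctly here. First, here are my own diagnostic queries (note that I did not run EXPLAIN ANALYZE, since I was only interested in the generated query plan):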

    rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------
     Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
       ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
             ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
                   Filter: (day = '2013-03-16'::date)
             ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
                   ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                         ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                               Filter: (day = '2013-03-16'::date)
                         ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                               ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                     ->  Seq Scan on application_hits  (cost=0.00..1.10 rows=4 width=10)
                                           Filter: (day = '2013-03-16'::date)
                                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                                 Filter: (day = '2013-03-16'::date)

     Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
       ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
             ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
                   Filter: (day = '2013-03-16'::date)
             ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
                   ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                         ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                               Filter: (day = '2013-03-16'::date)
                         ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                               ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                     Filter: (day = '2013-03-16'::date)

     Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
       ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
             ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
                   Filter: (day = '2013-03-16'::date)
             ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
                   ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                         Filter: (day = '2013-03-16'::date)

     Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
       ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
             Filter: (day = '2013-03-16'::date)
    (39 rows)

    rulestest=# select distinct day from application_hits;
        day
    ------------
     2013-03-15
     2013-03-16
    (2 rows)

    rulestest=# select count(*), day from application_hits group by day;
     count |    day
    -------+------------
         4 | 2013-03-15
         4 | 2013-03-16
    (2 rows)

    rulestest=# select count(*), day from client_hits group by day;
     count |    day
    -------+------------
         4 | 2013-03-15
         4 | 2013-03-16
    (2 rows)

    rulestest=# select count(*), day from project_hits group by day;
     count |    day
    -------+------------
       397 | 2013-03-15
       397 | 2013-03-16
    (2 rows)
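As a quick arithmetic check, the estimated row counts of the four Delete nodes in the plan above are exactly products of the per-table estimates for that day. The variable names below are only labels for numbers copied from the plan.

```python
# Per-table row estimates for day = '2013-03-16', copied from the plan.
user_rows = 49763         # Seq Scan on user_hits
project_rows = 397        # Seq Scan on project_hits
client_rows = 4           # Seq Scan on client_hits
application_rows = 4      # Seq Scan on application_hits

# Each rule-generated DELETE joins its target with every table above it
# on day alone, so its row estimate is the product of the filtered counts.
project_delete = user_rows * project_rows
client_delete = project_delete * client_rows
application_delete = client_delete * application_rows

print(project_delete)      # 19755911  (Delete on project_hits ... rows=19755911)
print(client_delete)       # 79023644  (Delete on client_hits ... rows=79023644)
print(application_delete)  # 316094576 (Delete on application_hits ... rows=316094576)
```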
    

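    If your real data looks anything like this test data, neither rules nor triggers will work very well. Better would be a stored procedure that you pass a value to and that deletes everything you want.

    First, note that indexes will get you nowhere here, because in every case you are pulling half of each table (I did add an index on day to all the tables to help the planner, but it made no real difference).

    You need to start with what RULEs actually do. RULEs basically rewrite the query, and they do so in as robust a way as possible. Your code also does not match your example, although it matches your question better: you have rules on tables that cascade to rules on other tables, which cascade to rules on yet other tables.

    So when you run delete from user_hits where [condition], the rules transform it into a set of queries like this: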

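    -- Approximate shape of the rewritten statements ([condition] is the original
    -- WHERE clause). DELETE rule actions run before the original statement.
    DELETE FROM application_hits USING client_hits, project_hits, user_hits
     WHERE application_hits.day = client_hits.day AND client_hits.day = project_hits.day
       AND project_hits.day = user_hits.day AND [condition];
    DELETE FROM client_hits USING project_hits, user_hits
     WHERE client_hits.day = project_hits.day AND project_hits.day = user_hits.day
       AND [condition];
    DELETE FROM project_hits USING user_hits
     WHERE project_hits.day = user_hits.day AND [condition];
    DELETE FROM user_hits WHERE [condition];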
    

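    Now, you might think the scan on client_hits could be skipped in the first place, but that is not what happens here. The problem is that you could have days in user_hits and application_hits that are not in client_hits, so all the tables really have to be scanned.

    There is no magic bullet here. A trigger will not work much better: although it can avoid scanning every table, it fires for every row being deleted, so you basically end up with the same nested-loop sequential scans that are currently killing performance. It would work somewhat better, because it deletes rows as it goes rather than rewriting the query, but it still will not perform well.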
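The per-row firing can be observed directly. This is a minimal sketch using SQLite as a stand-in (its triggers are always FOR EACH ROW) and a hypothetical trigger_log table that only counts firings: deleting 1000 user_hits rows runs the trigger body 1000 times, although only the first run finds any project_hits rows to delete. The index on project_hits(day) keeps the 999 empty runs to cheap lookups; without it, each run would scan the table. PostgreSQL also offers FOR EACH STATEMENT triggers, which fire once per statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_hits (day TEXT, user_id INT);
CREATE TABLE project_hits (day TEXT, project_id INT);
CREATE INDEX project_hits_day ON project_hits (day);
CREATE TABLE trigger_log (fired INT);

CREATE TRIGGER user_hits_cascade AFTER DELETE ON user_hits
FOR EACH ROW
BEGIN
    INSERT INTO trigger_log VALUES (1);
    DELETE FROM project_hits WHERE day = OLD.day;
END;
""")

day = "2013-03-16"
conn.executemany("INSERT INTO user_hits VALUES (?, ?)", [(day, i) for i in range(1000)])
conn.executemany("INSERT INTO project_hits VALUES (?, ?)", [(day, i) for i in range(40)])

# One statement deletes 1000 user rows; the trigger body runs once per row.
# Only the first run finds project_hits rows; the other 999 delete nothing.
conn.execute("DELETE FROM user_hits WHERE day = ?", (day,))
firings = conn.execute("SELECT COUNT(*) FROM trigger_log").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM project_hits").fetchone()[0]
print(firings, remaining)  # 1000 0
```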

    A better solution is to define a stored procedure and have the application call it. Something like:

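    -- Deletes one day's data from every aggregate table with plain, join-free
    -- DELETEs. Assumes the delete_children rules have been dropped; otherwise
    -- each DELETE below would still be rewritten by them.
    CREATE OR REPLACE FUNCTION delete_stats_at_date(in_day date) RETURNS BOOL
    LANGUAGE SQL AS
    $$
    DELETE FROM application_hits WHERE day = $1;
    DELETE FROM project_hits WHERE day = $1;
    DELETE FROM client_hits WHERE day = $1;
    DELETE FROM user_hits WHERE day = $1;
    SELECT TRUE;
    $$;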
    

    With the test data, this ran in 280 ms on my laptop.
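The same idea, driven from application code, is sketched below in Python with SQLite as a stand-in database. The Python `delete_stats_at_date` here is a hypothetical counterpart of the SQL function, not part of any library: it issues one plain DELETE per table inside a single transaction. Against PostgreSQL, the application would instead call the SQL function, e.g. `SELECT delete_stats_at_date('2013-03-16');`.

```python
import sqlite3

# Fixed list of table names; they are interpolated into SQL, so never take
# them from user input.
TABLES = ("application_hits", "project_hits", "client_hits", "user_hits")

def delete_stats_at_date(conn, day):
    """Hypothetical Python counterpart of the SQL function: one plain
    DELETE per table, all inside one transaction."""
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        for table in TABLES:
            conn.execute(f"DELETE FROM {table} WHERE day = ?", (day,))

# SQLite stand-in with made-up rows for two days in every table.
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (day TEXT, hits INT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)",
                     [("2013-03-15", 1), ("2013-03-16", 1), ("2013-03-16", 2)])
conn.commit()

delete_stats_at_date(conn, "2013-03-16")
remaining = {t: conn.execute(f"SELECT day FROM {t}").fetchall() for t in TABLES}
print(remaining)
```

Because each DELETE filters its own table on day and never joins another table, the work grows with the number of rows deleted rather than with their product.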

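    One of the difficult things about rules is remembering what they actually are, and recognizing that the computer cannot read your mind. That is why I do not consider them a tool for beginners.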

    • 7
  2. wildplasser
    2013-03-14T04:33:50+08:00
    -- this is the datamodel with the (correct?) PRIMARY and FOREIGN KEYs 
    -- the `deferrable initially deferred` thing is there to accommodate the
    -- table filling (which had to be altered to avoid duplicate keys)
    -- ------------------------------------------------------------------------
    
    DROP SCHEMA IF EXISTS tmp CASCADE;
    CREATE SCHEMA tmp ;
    SET search_path = tmp ;
    
    -- table definitions
    
    CREATE TABLE application_hits
      ( zday DATE NOT NULL
      , application_id INTEGER NOT NULL
      , hits INTEGER NOT NULL DEFAULT 0
            , PRIMARY KEY (zday,application_id)
      );
    
    CREATE TABLE client_hits
      ( zday DATE NOT NULL
      , client_id INTEGER NOT NULL
      , application_id INTEGER NOT NULL
      , hits INTEGER NOT NULL DEFAULT 0
            , PRIMARY KEY (zday,client_id,application_id)
            , FOREIGN KEY (zday,application_id)
               REFERENCES application_hits (zday,application_id) DEFERRABLE INITIALLY DEFERRED
      );
    
    CREATE TABLE project_hits
      ( zday DATE NOT NULL
      , project_id INTEGER NOT NULL
      , client_id INTEGER NOT NULL
      , application_id INTEGER NOT NULL
      , hits INTEGER NOT NULL DEFAULT 0
            , PRIMARY KEY (zday,project_id,client_id,application_id)
            , FOREIGN KEY (zday,client_id,application_id)
               REFERENCES client_hits (zday,client_id,application_id) DEFERRABLE INITIALLY DEFERRED
      );
    
    CREATE TABLE user_hits
      ( zday DATE NOT NULL
      , user_id INTEGER NOT NULL
      , project_id INTEGER NOT NULL
      , client_id INTEGER NOT NULL
      , application_id INTEGER NOT NULL
      , hits INTEGER NOT NULL DEFAULT 0
            , PRIMARY KEY (zday,user_id,project_id,client_id,application_id)
            , FOREIGN KEY (zday,project_id,client_id,application_id)
               REFERENCES project_hits (zday,project_id,client_id,application_id) DEFERRABLE INITIALLY DEFERRED
      );
    
    --- rules
    CREATE RULE delete_children AS ON DELETE TO user_hits
      DO ALSO
      UPDATE project_hits dst SET hits = dst.hits - OLD.hits
            WHERE dst.zday = OLD.zday
            AND dst.project_id = OLD.project_id
            AND dst.client_id = OLD.client_id
            AND dst.application_id = OLD.application_id
            ;
    
    CREATE RULE delete_children AS ON DELETE TO project_hits
      DO ALSO
      UPDATE client_hits dst SET hits = dst.hits - OLD.hits
            WHERE dst.zday = OLD.zday
            AND dst.client_id = OLD.client_id
            AND dst.application_id = OLD.application_id
            ;
    
    CREATE RULE delete_children AS ON DELETE TO client_hits
      DO ALSO
      UPDATE application_hits dst SET hits = dst.hits - OLD.hits
            WHERE dst.zday = OLD.zday
            AND dst.application_id = OLD.application_id
            ;
    
            -- Rules for UPDATE
    CREATE RULE update_children AS ON UPDATE TO project_hits
      DO ALSO
      UPDATE client_hits dst SET hits = dst.hits - OLD.hits +NEW.hits
            WHERE dst.zday = OLD.zday
            AND dst.client_id = OLD.client_id
            AND dst.application_id = OLD.application_id
            ;
    
    CREATE RULE update_children AS ON UPDATE TO client_hits
      DO ALSO
      UPDATE application_hits dst SET hits = dst.hits - OLD.hits +NEW.hits
            WHERE dst.zday = OLD.zday
            AND dst.application_id = OLD.application_id
            ;
    
    -- filling user_hits
    BEGIN WORK;
    INSERT INTO user_hits (zday, user_id, project_id, client_id, application_id, hits)
    SELECT
      current_date - (s%1313)::INT
            , (s % 1001)::INT , (s % 101)::INT , (s%13)::INT , (s%11)::INT
            , (50*random())::INT
    FROM
      generate_series(1, 100000) s;
    
    -- filling project_hits
    INSERT INTO project_hits (zday, project_id, client_id, application_id, hits)
    SELECT zday, project_id, client_id, application_id, SUM(hits)
    FROM user_hits
    GROUP BY zday, project_id, client_id, application_id
            ;
    
    -- filling client_hits
    INSERT INTO client_hits (zday, client_id, application_id, hits)
    SELECT zday , client_id , application_id , SUM(hits)
    FROM project_hits
    GROUP BY zday, client_id, application_id;
    
    -- filling application_hits
    INSERT INTO application_hits (zday, application_id, hits)
    SELECT zday, application_id, SUM(hits)
    FROM client_hits
    GROUP BY zday, application_id
            ;
    COMMIT WORK;
    
    
    -- create view for today
    CREATE VIEW v_today
    AS SELECT
      (SELECT SUM(hits) FROM user_hits WHERE zday = current_date) AS user_hits
      , (SELECT SUM(hits) FROM project_hits WHERE zday = current_date) AS project_hits
      , (SELECT SUM(hits) FROM client_hits WHERE zday = current_date) AS client_hits
      , (SELECT SUM(hits) FROM application_hits WHERE zday = current_date) AS application_hits
       ;
    
    
    SELECT * FROM v_today;
    -- explain analyse
    DELETE FROM user_hits WHERE zday = current_date;
    SELECT * FROM v_today;
    

    Next: write some rules for INSERT... and don't forget the nasty case of an UPDATE to the key fields.
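The subtract-from-the-parent idea behind these rules can also be sketched with a row-level trigger, which applies the subtraction once for every deleted row, including when several user rows roll up into the same project row. This is a minimal SQLite sketch covering only the user_hits to project_hits step; the trigger name and sample data are hypothetical, and client_hits and application_hits would need the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_hits (
    zday TEXT NOT NULL, project_id INT NOT NULL, hits INT NOT NULL DEFAULT 0,
    PRIMARY KEY (zday, project_id));
CREATE TABLE user_hits (
    zday TEXT NOT NULL, user_id INT NOT NULL, project_id INT NOT NULL,
    hits INT NOT NULL DEFAULT 0,
    PRIMARY KEY (zday, user_id, project_id));

-- Runs once per deleted user_hits row, so every row's hits are subtracted.
CREATE TRIGGER user_hits_subtract AFTER DELETE ON user_hits
FOR EACH ROW
BEGIN
    UPDATE project_hits SET hits = hits - OLD.hits
     WHERE zday = OLD.zday AND project_id = OLD.project_id;
END;
""")

day = "2013-03-14"
# Made-up data: users 1 and 2 both roll up into project 7.
conn.executemany("INSERT INTO user_hits VALUES (?, ?, ?, ?)",
                 [(day, 1, 7, 5), (day, 2, 7, 3), (day, 3, 8, 4)])
conn.execute("""INSERT INTO project_hits (zday, project_id, hits)
                SELECT zday, project_id, SUM(hits) FROM user_hits
                 GROUP BY zday, project_id""")

conn.execute("DELETE FROM user_hits WHERE zday = ? AND project_id = 7", (day,))
result = conn.execute(
    "SELECT project_id, hits FROM project_hits ORDER BY project_id").fetchall()
print(result)  # [(7, 0), (8, 4)]
```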

    • 0
  3. Andrew Lazarus
    2013-03-14T10:14:22+08:00

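    Well, it has been a long time, but if you run an EXPLAIN we can see whether my memory is right. I think the plans for the secondary queries are created at the wrong time, before the planner can take indexes into account. I suspect you are getting table scans.

    That said, have you benchmarked standard foreign keys with cascading deletes and found them too slow? And found that a rule would be faster?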
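One way to get cascading deletes from standard foreign keys is a parent table with one row per day that all four aggregate tables reference with ON DELETE CASCADE. The hit_days table below is hypothetical, not part of the question's schema (user_hits itself cannot be the parent, because its rows are not unique per day). A minimal sketch in SQLite, which only enforces foreign keys after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

TABLES = ("user_hits", "project_hits", "client_hits", "application_hits")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical parent table: one row per day, referenced by every aggregate table.
conn.execute("CREATE TABLE hit_days (day TEXT PRIMARY KEY)")
for table in TABLES:
    conn.execute(f"""
        CREATE TABLE {table} (
            day     TEXT NOT NULL REFERENCES hit_days (day) ON DELETE CASCADE,
            item_id INT,
            hits    INT)""")
    # Index the child key so the cascade does not scan the whole table.
    conn.execute(f"CREATE INDEX {table}_day ON {table} (day)")

conn.executemany("INSERT INTO hit_days VALUES (?)", [("2013-03-13",), ("2013-03-14",)])
for table in TABLES:
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)",
                     [("2013-03-13", 1, 10), ("2013-03-14", 1, 20)])
conn.commit()

# One DELETE on the parent removes that day from all four tables.
with conn:
    conn.execute("DELETE FROM hit_days WHERE day = ?", ("2013-03-13",))

remaining = {t: conn.execute(f"SELECT day FROM {t}").fetchall() for t in TABLES}
print(remaining)
```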

    [Edit after comments]

    CREATE RULE delete_children AS ON DELETE TO user_hits
      WHERE day = OLD.day  -- added
      DO ALSO
      DELETE FROM project_hits WHERE day = OLD.day;
    

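    Reading the documentation more closely, it seems that (unlike with triggers) a rule, in the absence of an added WHERE clause, is applied to the entire original table?!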

    • 0

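Related questions

  • Can I activate PITR after I have started using the database?

  • Best practices for running time-offset delayed replication

  • Do stored procedures prevent SQL injection?

  • Biological sequences from UniProt in PostgreSQL

  • What is the difference between PostgreSQL 9.0 replication and Slony-I?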
