AskOverflow.Dev

Accepted

Parker
Asked: 2017-10-25 04:37:25 +0800 CST

Poor query performance over a by-week timestamp range in PostgreSQL 9.3


I have a slow query that generates a report of account activity per week over the past year. The table currently has nearly 5 million rows, and this query currently takes 8 seconds to execute. The (current) bottleneck is a sequential scan over the timestamp range.

account=> EXPLAIN ANALYZE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

 GroupAggregate  (cost=450475.76..513465.44 rows=2290534 width=12) (actual time=7524.474..8003.291 rows=52 loops=1)
   Group Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
   ->  Sort  (cost=450475.76..456202.09 rows=2290534 width=12) (actual time=7519.053..7691.924 rows=2314164 loops=1)
         Sort Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
         Sort Method: external sort  Disk: 40704kB
         ->  Seq Scan on account_history  (cost=0.00..169364.81 rows=2290534 width=12) (actual time=1470.438..6222.076 rows=2314164 loops=1)
               Filter: ((event_time <= now()) AND (event_time >= (now() - '357 days'::interval)))
               Rows Removed by Filter: 2591679
 Planning time: 0.126 ms
 Execution time: 8011.160 ms

The table:

account=> \d account_history
                    Table "public.account_history"
   Column    |            Type             |         Modifiers
-------------+-----------------------------+---------------------------
 account     | integer                     | not null
 event_code  | text                        | not null
 event_time  | timestamp without time zone | not null default now()
 description | text                        | not null default ''::text
Indexes:
    "account_history_idx" btree (account, event_time DESC)
    "account_id_idx" btree (account, event_code, event_time)
Foreign-key constraints:
    "account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
    "event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICT

When I originally created this table, I added the timestamp column as part of a btree index, but I figured the sequential scan was due to the (then) small number of rows in the table (see related question).

However, now that the table has grown into the millions, I have noticed the query's performance problem, and discovered that the index is not being used by the query.

I tried adding an ordered index as recommended here, but that is evidently not used in the execution plan either.

Is there a better way to index this table, or is there something inherent in my query that bypasses both indexes?


Update: When I add an index on only the timestamp, that index is used. However, it only reduced the execution time by 25%:

account=> CREATE INDEX account_history_time_idx ON account_history (event_time DESC);

account=> EXPLAIN ANALYZE VERBOSE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

 GroupAggregate  (cost=391870.30..454870.16 rows=2290904 width=12) (actual time=5481.930..6104.838 rows=52 loops=1)
   Output: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), count(DISTINCT account)
   Group Key: ((to_timestamp(to_char(date_trunc('week'::text, account_history.event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
   ->  Sort  (cost=391870.30..397597.56 rows=2290904 width=12) (actual time=5474.181..5771.903 rows=2314038 loops=1)
         Output: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), account
         Sort Key: ((to_timestamp(to_char(date_trunc('week'::text, account_history.event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
         Sort Method: external merge  Disk: 40688kB
         ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..110710.59 rows=2290904 width=12) (actual time=0.108..4352.143 rows=2314038 loops=1)
               Output: (to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date, account
               Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
 Planning time: 0.204 ms
 Execution time: 6112.832 ms

https://explain.depesz.com/s/PSfU

I also tried VACUUM FULL as suggested here, but it made no difference in the execution time.


Here are the execution plans of some simpler queries against the same table:

Simply counting the rows takes 0.5 seconds:

account=> EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM account_history;

 Aggregate  (cost=97401.04..97401.05 rows=1 width=0) (actual time=551.179..551.179 rows=1 loops=1)
   Output: count(*)
   ->  Seq Scan on public.account_history  (cost=0.00..85136.43 rows=4905843 width=0) (actual time=0.039..344.675 rows=4905843 loops=1)
         Output: account, event_code, event_time, description
 Planning time: 0.075 ms
 Execution time: 551.209 ms

And using the same time range clause takes less than one second:

account=> EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now();

 Aggregate  (cost=93527.57..93527.58 rows=1 width=0) (actual time=997.436..997.436 rows=1 loops=1)
   Output: count(*)
   ->  Index Only Scan using account_history_time_idx on public.account_history  (cost=0.44..87800.45 rows=2290849 width=0) (actual time=0.100..897.776 rows=2313987 loops=1)
         Output: event_time
         Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
         Heap Fetches: 2313987
 Planning time: 0.239 ms
 Execution time: 997.473 ms

Per the comments, I tried a simplified form of the query:

account=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history
WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

 GroupAggregate  (cost=374676.22..420493.00 rows=2290839 width=12) (actual time=2475.556..3078.191 rows=52 loops=1)
   Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
   Group Key: (date_trunc('week'::text, account_history.event_time))
   ->  Sort  (cost=374676.22..380403.32 rows=2290839 width=12) (actual time=2468.654..2763.739 rows=2313977 loops=1)
         Output: (date_trunc('week'::text, event_time)), account
         Sort Key: (date_trunc('week'::text, account_history.event_time))
         Sort Method: external merge  Disk: 49720kB
         ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..93527.35 rows=2290839 width=12) (actual time=0.094..1537.488 rows=2313977 loops=1)
               Output: date_trunc('week'::text, event_time), account
               Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
 Planning time: 0.220 ms
 Execution time: 3086.828 ms
(12 rows)

account=> SELECT date_trunc('week', current_date) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
          date          | count
------------------------+-------
 2017-10-23 00:00:00-04 |   132
(1 row)

In fact, this cuts the execution time in half, but unfortunately it does not give the desired result, as seen here:

account=> SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
    date    | count
------------+-------
 2016-10-31 |    14
...
 2017-10-23 |   584
(52 rows)

If I could find a cheaper way to aggregate these records by week, that would go a long way toward solving this problem.


I am open to any suggestions for improving the performance of the weekly query with the GROUP BY clause, including altering the table.

I created a materialized view as a test, but of course refreshing it takes exactly as long as the original query, so unless I refresh it only a few times a day it doesn't really help, at the cost of added complexity:

account=> CREATE MATERIALIZED VIEW account_activity_weekly AS SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
SELECT 52
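An alternative worth sketching (not from the original thread; the `account_activity_daily` table below is a hypothetical addition, though the other names reuse the real schema) is to maintain a small daily table of distinct (day, account) pairs. Distinct counts cannot be summed across days, but they can be recomputed cheaply from this much smaller pair set:

```sql
-- Hypothetical pre-aggregation: one row per (day, account) with any activity.
CREATE TABLE account_activity_daily (
    day     date    NOT NULL,
    account integer NOT NULL,
    PRIMARY KEY (day, account)
);

-- Periodic (e.g. hourly) top-up for the current day; the anti-join keeps it
-- idempotent on 9.3, which predates INSERT ... ON CONFLICT.
INSERT INTO account_activity_daily (day, account)
SELECT DISTINCT h.event_time::date, h.account
FROM account_history h
WHERE h.event_time >= date_trunc('day', now())
  AND NOT EXISTS (
        SELECT 1 FROM account_activity_daily d
        WHERE d.day = h.event_time::date AND d.account = h.account);

-- The weekly report then scans far fewer rows than the millions of events:
SELECT date_trunc('week', day)::date AS date, count(DISTINCT account)
FROM account_activity_daily
WHERE day BETWEEN (now() - interval '51 weeks')::date AND now()::date
GROUP BY 1
ORDER BY 1;
```

Unlike the materialized view, the top-up only touches the most recent day's events, so keeping it fresh is cheap.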

Per an additional comment, I modified my query as follows, which cut the execution time in half and yields the expected result set:

account=> EXPLAIN ANALYZE VERBOSE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date_trunc('week', event_time) ORDER BY date;

 Sort  (cost=724523.11..730249.97 rows=2290745 width=12) (actual time=3188.495..3188.496 rows=52 loops=1)
   Output: ((to_timestamp(to_char((date_trunc('week'::text, event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), (count(DISTINCT account)), (date_trunc('week'::text, event_time))
   Sort Key: ((to_timestamp(to_char((date_trunc('week'::text, account_history.event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
   Sort Method: quicksort  Memory: 29kB
   ->  GroupAggregate  (cost=374662.50..443384.85 rows=2290745 width=12) (actual time=2573.694..3188.451 rows=52 loops=1)
         Output: (to_timestamp(to_char((date_trunc('week'::text, event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date, count(DISTINCT account), (date_trunc('week'::text, event_time))
         Group Key: (date_trunc('week'::text, account_history.event_time))
         ->  Sort  (cost=374662.50..380389.36 rows=2290745 width=12) (actual time=2566.086..2859.590 rows=2313889 loops=1)
               Output: (date_trunc('week'::text, event_time)), event_time, account
               Sort Key: (date_trunc('week'::text, account_history.event_time))
               Sort Method: external merge  Disk: 67816kB
               ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..93524.23 rows=2290745 width=12) (actual time=0.090..1503.985 rows=2313889 loops=1)
                     Output: date_trunc('week'::text, event_time), event_time, account
                     Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
 Planning time: 0.205 ms
 Execution time: 3198.125 ms
(16 rows)
optimization postgresql-9.3
  • 2 Answers
  • 10744 Views

2 Answers

  1. Best Answer
    Parker
    2017-10-25T07:07:13+08:00

    Thanks to those who contributed in the comments, I reduced the query time from ~8000 ms to ~1650 ms by:

    • Adding an index on only the timestamp column (~2000 ms improvement).
    • Removing the extra timestamp-to-character-to-timestamp conversion (or adding date_trunc('week', event_time) to the GROUP BY clause) (~3000 ms improvement).
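    The second bullet is the key rewrite. Side by side (both forms appear in the plans above), `date_trunc('week', ...)` already returns the Monday that starts the ISO week, so formatting it to an 'IYYY-IW' string and parsing it back only adds per-row overhead:

```sql
-- Round-trip form from the original query: truncate, format to an
-- ISO-week string, parse back, cast to date.
SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'),
                    'IYYY-IW')::date AS date,
       count(DISTINCT account)
FROM account_history
WHERE event_time BETWEEN now() - interval '51 weeks' AND now()
GROUP BY date ORDER BY date;

-- Equivalent simplified form: a plain cast of the truncated timestamp
-- yields the same Monday without the string round-trip.
SELECT date_trunc('week', event_time)::date AS date,
       count(DISTINCT account)
FROM account_history
WHERE event_time BETWEEN now() - interval '51 weeks' AND now()
GROUP BY date ORDER BY date;
```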

    For reference, the current table structure and execution plan are below.

    I did try other variations of multi-column indexes, but none of those were used by the execution plan.

    Additionally, I took the advice of another comment and performed the following steps (followed by VACUUM and REINDEX):

    • Dropped the constraints from the description column and set all empty strings to NULL
    • Converted the timestamp column from WITHOUT TIME ZONE to WITH TIME ZONE
    • Increased work_mem to 100MB (via postgresql.conf).

    ALTER TABLE account_history ALTER event_time TYPE timestamptz USING event_time AT TIME ZONE 'UTC';
    ALTER TABLE account_history ALTER COLUMN description DROP NOT NULL;
    ALTER TABLE account_history ALTER COLUMN description DROP DEFAULT;
    UPDATE account_history SET description=NULL WHERE description='';
    VACUUM FULL;
    REINDEX TABLE account_history;
    
    account=> show work_mem;
     work_mem
    ----------
     100MB
    

    These additional changes shaved another 400 ms off the execution time and also lowered the planning time. One thing to note is that the sort method changed from "external sort" to "external merge". Since the sort was still using disk, I increased work_mem to 200MB, which resulted in the quicksort (in-memory) method being used (176MB). This shaved a full second off the execution time (although that setting is really too high for our server instance).
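    Rather than raising work_mem server-wide, it can be raised for just the reporting transaction (a sketch; 200MB is the value found sufficient above):

```sql
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK, so other
-- backends keep the smaller server-wide default.
BEGIN;
SET LOCAL work_mem = '200MB';
SELECT date_trunc('week', event_time) AS date, count(DISTINCT account)
FROM account_history
WHERE event_time BETWEEN now() - interval '51 weeks' AND now()
GROUP BY date ORDER BY date;
COMMIT;
```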

    The updated table and execution plan are below.


    account=> \d account_history
                     Table "public.account_history"
       Column    |           Type           |       Modifiers
    -------------+--------------------------+------------------------
     account     | integer                  | not null
     event_code  | text                     | not null
     event_time  | timestamp with time zone | not null default now()
     description | text                     |
    Indexes:
        "account_history_account_idx" btree (account)
        "account_history_account_time_idx" btree (event_time DESC, account)
        "account_history_time_idx" btree (event_time DESC)
    Foreign-key constraints:
        "account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
        "event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICT
    

    account=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
    
     GroupAggregate  (cost=334034.60..380541.52 rows=2325346 width=12) (actual time=1307.742..1685.676 rows=52 loops=1)
       Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
       Group Key: (date_trunc('week'::text, account_history.event_time))
       ->  Sort  (cost=334034.60..339847.97 rows=2325346 width=12) (actual time=1303.565..1361.540 rows=2312418 loops=1)
             Output: (date_trunc('week'::text, event_time)), account
             Sort Key: (date_trunc('week'::text, account_history.event_time))
             Sort Method: quicksort  Memory: 176662kB
             ->  Index Only Scan using account_history_account_time_idx on public.account_history  (cost=0.44..88140.73 rows=2325346 width=12) (actual time=0.028..980.822 rows=2312418 loops=1)
                   Output: date_trunc('week'::text, event_time), account
                   Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
                   Heap Fetches: 0
     Planning time: 0.153 ms
     Execution time: 1697.824 ms
    

    I am quite happy with the improvement so far, but I welcome any other contributions to making this query faster, as it is still the slowest query I have, in my view.

  2. Michael
    2019-03-05T00:56:57+08:00

    A quick fix for date-between query problems: I converted dates to Unix time (UTC) (I only needed "seconds" precision, but you can do better if needed), then created a method to convert your date into a bigint/long (include your timezone conversion here). Then run your query and search between 2 integers. It may sound a bit wild, but it works like a dream.
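    A minimal sketch of that idea (the event_epoch column and index name are hypothetical additions, not part of the original schema):

```sql
-- Store the event time as epoch seconds alongside the timestamp.
ALTER TABLE account_history ADD COLUMN event_epoch bigint;
UPDATE account_history
   SET event_epoch = extract(epoch FROM event_time)::bigint;
CREATE INDEX account_history_epoch_idx ON account_history (event_epoch);

-- The range predicate then compares plain bigints:
SELECT count(*)
FROM account_history
WHERE event_epoch BETWEEN extract(epoch FROM now() - interval '51 weeks')::bigint
                      AND extract(epoch FROM now())::bigint;
```

    New rows would also need event_epoch populated at insert time (e.g. by the application or a trigger).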

