The data:
Given the following two tables:
                                Table "public.response_logs"
  Column  |           Type           | Collation | Nullable |                  Default
----------+--------------------------+-----------+----------+-------------------------------------------
 id       | integer                  |           | not null | nextval('response_logs_id_seq'::regclass)
 status   | bigint                   |           |          |
 uuid     | text                     |           |          |
 payload  | text                     |           |          |
 accessed | timestamp with time zone |           | not null | now()
Indexes:
    "response_logs_pkey" PRIMARY KEY, btree (id)
    "response_logs_uuid_idx" UNIQUE, btree (uuid)
Foreign-key constraints:
    "response_logs_uuid_fkey" FOREIGN KEY (uuid) REFERENCES request_logs(uuid)
and:
                                   Table "public.request_logs"
    Column     |           Type           | Collation | Nullable |                 Default
---------------+--------------------------+-----------+----------+-----------------------------------------
 id            | integer                  |           | not null | nextval('access_logs_id_seq'::regclass)
 accountid     | bigint                   |           |          |
 username      | bigint                   |           |          |
 applicationid | bigint                   |           |          |
 requesturi    | text                     |           |          |
 method        | text                     |           |          |
 accessed      | timestamp with time zone |           | not null | now()
 uuid          | text                     |           | not null |
 payload       | text                     |           | not null | ''::text
 apikeyid      | bigint                   |           |          |
 headers       | jsonb                    |           | not null | '[]'::jsonb
Indexes:
    "request_logs_pkey" PRIMARY KEY, btree (uuid)
    "request_logs_application_idx" btree (applicationid)
Referenced by:
    TABLE "response_logs" CONSTRAINT "response_logs_uuid_fkey" FOREIGN KEY (uuid) REFERENCES request_logs(uuid)
I am executing the following query:
SELECT
    req.uuid,
    res.status,
    req.method,
    req.requesturi,
    req.accessed,
    req.payload reqpayload,
    res.payload respayload,        /* #1 - the response payload */
    COUNT(*) OVER() AS total_rows  /* #2 - the total number of responses, repeated on every row */
FROM
    request_logs req
INNER JOIN
    response_logs res ON req.uuid = res.uuid AND res.status IS NOT NULL
WHERE
    req.applicationid = 1
    AND req.accessed BETWEEN '2018-01-01 15:04:05 +0000' AND '2019-01-02 15:04:05+0000'
    AND req.requesturi NOT ILIKE '/v1/sessions%'
ORDER BY
    accessed DESC
LIMIT 1000;
On average it takes about 270 ms.
The problem:
For fairly obvious reasons, I can speed the query up by omitting #2 (COUNT(*) OVER() AS total_rows). That brings it down to roughly 40 ms. What puzzles me, however, is that if I instead omit respayload from the SELECT, the query drops to roughly 34 ms.
The questions!
Shouldn't COUNT(*) OVER() AS total_rows be the bottleneck? How can the query reach ~40 ms while still computing it? And why does omitting respayload bring such an improvement? Removing reqpayload has no comparable effect, and even when we don't select respayload we still have to fetch the response_logs row with the matching UUID in order to copy its status into the result. Given that the only values that change between executions are applicationid and the dates we compare accessed against, what further improvements, indexes included, could raise performance?
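For concreteness, the kind of index I have in mind (hypothetical name, not yet benchmarked) would put the equality column first and the range/sort column second, so both varying predicates could be served from one index:

```sql
-- Hypothetical: equality column first, then the range column.
-- This would let Postgres satisfy applicationid = ? and the accessed
-- range (and possibly the ORDER BY accessed DESC) from a single index.
CREATE INDEX request_logs_app_accessed_idx
    ON request_logs (applicationid, accessed DESC);
```

Whether the planner would prefer this over the existing request_logs_application_idx presumably depends on the selectivity of the date range.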
More data!
Here is explain analyse output for the different query configurations:
With respayload and with COUNT(*) OVER() AS total_rows:
Limit  (cost=2826.59..2829.09 rows=1000 width=823) (actual time=408.535..419.136 rows=1000 loops=1)
  ->  Sort  (cost=2826.59..2829.79 rows=1281 width=823) (actual time=408.524..412.154 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 2064kB
        ->  WindowAgg  (cost=1090.16..2760.47 rows=1281 width=823) (actual time=368.207..390.866 rows=3951 loops=1)
              ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=823) (actual time=50.244..127.325 rows=3951 loops=1)
                    Hash Cond: (res.uuid = req.uuid)
                    ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=758) (actual time=0.008..36.196 rows=9129 loops=1)
                          Filter: (status IS NOT NULL)
                    ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=38.739..38.739 rows=4046 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                          ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.035..22.009 rows=4046 loops=1)
                                Index Cond: (applicationid = 1)
                                Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 2.699 ms
Execution time: 423.068 ms
With respayload and without COUNT(*) OVER() AS total_rows:
Limit  (cost=2810.58..2813.08 rows=1000 width=823) (actual time=136.977..146.820 rows=1000 loops=1)
  ->  Sort  (cost=2810.58..2813.78 rows=1281 width=823) (actual time=136.967..140.334 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 2064kB
        ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=823) (actual time=47.127..119.808 rows=3951 loops=1)
              Hash Cond: (res.uuid = req.uuid)
              ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=758) (actual time=0.015..33.307 rows=9129 loops=1)
                    Filter: (status IS NOT NULL)
              ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=38.328..38.328 rows=4046 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                    ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.047..21.813 rows=4046 loops=1)
                          Index Cond: (applicationid = 1)
                          Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 3.882 ms
Execution time: 150.465 ms
Without respayload and with COUNT(*) OVER() AS total_rows:
Limit  (cost=2826.59..2829.09 rows=1000 width=110) (actual time=164.428..174.760 rows=1000 loops=1)
  ->  Sort  (cost=2826.59..2829.79 rows=1281 width=110) (actual time=164.418..167.956 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 564kB
        ->  WindowAgg  (cost=1090.16..2760.47 rows=1281 width=110) (actual time=133.997..148.382 rows=3951 loops=1)
              ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=110) (actual time=46.282..119.070 rows=3951 loops=1)
                    Hash Cond: (res.uuid = req.uuid)
                    ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=45) (actual time=0.009..33.656 rows=9129 loops=1)
                          Filter: (status IS NOT NULL)
                    ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=37.844..37.844 rows=4046 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                          ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.029..21.602 rows=4046 loops=1)
                                Index Cond: (applicationid = 1)
                                Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 3.758 ms
Execution time: 178.675 ms
After experimenting with various configurations, I think I have found the answer to my own question. It is not 100% proven, but it makes sense.

COUNT(*) OVER() AS total_rows forces the whole query to be executed, including the WHERE clause and the full SELECT list. So when the result set is much larger than LIMIT 1000, we still materialize every matching record for the columns named in the SELECT, including respayload. Since respayload can be quite large, when the unlimited match count is high, say 10K, then 10K respayloads get copied, only to be discarded later, because the selected columns are then copied again for just the first 1000 records (due to the limit). That is why omitting respayload from the statement improves performance so significantly. In the future I will be careful with COUNT(*) OVER(), since it is evidently heavily affected by the selected columns.
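Based on that reasoning, one way to keep the total count while avoiding dragging the wide respayload column through the window aggregate is to apply the count and the LIMIT to the narrow columns first, and join the payload in afterwards. This is only a sketch against the schema above, not something I have benchmarked:

```sql
-- Sketch: compute the window count and LIMIT over narrow columns only,
-- then fetch the wide response payload for the surviving 1000 rows.
SELECT
    q.*,
    res.payload AS respayload
FROM (
    SELECT
        req.uuid,
        res.status,
        req.method,
        req.requesturi,
        req.accessed,
        req.payload AS reqpayload,
        COUNT(*) OVER() AS total_rows
    FROM request_logs req
    INNER JOIN response_logs res
        ON req.uuid = res.uuid AND res.status IS NOT NULL
    WHERE req.applicationid = 1
      AND req.accessed BETWEEN '2018-01-01 15:04:05 +0000' AND '2019-01-02 15:04:05+0000'
      AND req.requesturi NOT ILIKE '/v1/sessions%'
    ORDER BY req.accessed DESC
    LIMIT 1000
) q
INNER JOIN response_logs res ON res.uuid = q.uuid;
```

Since response_logs_uuid_idx is a unique index on uuid, the outer join should add at most one index probe per returned row, so only 1000 payloads ever need to be copied.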