The data:
Given the following two tables:
                                Table "public.response_logs"
  Column  |           Type           | Collation | Nullable |                  Default
----------+--------------------------+-----------+----------+-------------------------------------------
 id       | integer                  |           | not null | nextval('response_logs_id_seq'::regclass)
 status   | bigint                   |           |          |
 uuid     | text                     |           |          |
 payload  | text                     |           |          |
 accessed | timestamp with time zone |           | not null | now()
Indexes:
    "response_logs_pkey" PRIMARY KEY, btree (id)
    "response_logs_uuid_idx" UNIQUE, btree (uuid)
Foreign-key constraints:
    "response_logs_uuid_fkey" FOREIGN KEY (uuid) REFERENCES request_logs(uuid)
and:
                                   Table "public.request_logs"
    Column     |           Type           | Collation | Nullable |                 Default
---------------+--------------------------+-----------+----------+-----------------------------------------
 id            | integer                  |           | not null | nextval('access_logs_id_seq'::regclass)
 accountid     | bigint                   |           |          |
 username      | bigint                   |           |          |
 applicationid | bigint                   |           |          |
 requesturi    | text                     |           |          |
 method        | text                     |           |          |
 accessed      | timestamp with time zone |           | not null | now()
 uuid          | text                     |           | not null |
 payload       | text                     |           | not null | ''::text
 apikeyid      | bigint                   |           |          |
 headers       | jsonb                    |           | not null | '[]'::jsonb
Indexes:
    "request_logs_pkey" PRIMARY KEY, btree (uuid)
    "request_logs_application_idx" btree (applicationid)
Referenced by:
    TABLE "response_logs" CONSTRAINT "response_logs_uuid_fkey" FOREIGN KEY (uuid) REFERENCES request_logs(uuid)
I am executing the following query:
SELECT
    req.uuid,
    res.status,
    req.method,
    req.requesturi,
    req.accessed,
    req.payload reqpayload,
    res.payload respayload,        /* #1 - the response payload */
    COUNT(*) OVER() AS total_rows  /* #2 - the total number of responses, repeated on every row */
FROM
    request_logs req
INNER JOIN
    response_logs res ON req.uuid = res.uuid AND res.status IS NOT NULL
WHERE
    req.applicationid = 1
    AND req.accessed BETWEEN '2018-01-01 15:04:05 +0000' AND '2019-01-02 15:04:05+0000'
    AND req.requesturi NOT ILIKE '/v1/sessions%'
ORDER BY
    accessed DESC
LIMIT 1000;
On average it takes about 270 ms.
The problem:
For fairly obvious reasons, I can speed the query up by omitting #2 (COUNT(*) OVER() AS total_rows). That brings it down to roughly 40 ms. What puzzles me, however, is that if I instead omit respayload from the SELECT, the query drops to roughly 34 ms.
The questions!
Shouldn't COUNT(*) OVER() AS total_rows be the bottleneck? How can the query reach ~40 ms while still computing it? And why does omitting respayload bring such an improvement? Removing reqpayload has no comparable effect, and even when we don't select respayload we still have to fetch the response_logs row with the matching UUID in order to copy its status into the result. Given that the only values that change between executions are applicationid and the dates we compare accessed against, what further improvements, indexes included, could raise performance?
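For concreteness, the kind of index I have in mind (hypothetical name, not yet benchmarked) would put the equality column first and the range/sort column second, so both varying predicates could be served from one index:

```sql
-- Hypothetical: equality column first, then the range column.
-- This would let Postgres satisfy applicationid = ? and the accessed
-- range (and possibly the ORDER BY accessed DESC) from a single index.
CREATE INDEX request_logs_app_accessed_idx
    ON request_logs (applicationid, accessed DESC);
```

Whether the planner would prefer this over the existing request_logs_application_idx presumably depends on the selectivity of the date range.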
More data!
Here is explain analyse output for the different query configurations:
With respayload and with COUNT(*) OVER() AS total_rows:
Limit  (cost=2826.59..2829.09 rows=1000 width=823) (actual time=408.535..419.136 rows=1000 loops=1)
  ->  Sort  (cost=2826.59..2829.79 rows=1281 width=823) (actual time=408.524..412.154 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 2064kB
        ->  WindowAgg  (cost=1090.16..2760.47 rows=1281 width=823) (actual time=368.207..390.866 rows=3951 loops=1)
              ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=823) (actual time=50.244..127.325 rows=3951 loops=1)
                    Hash Cond: (res.uuid = req.uuid)
                    ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=758) (actual time=0.008..36.196 rows=9129 loops=1)
                          Filter: (status IS NOT NULL)
                    ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=38.739..38.739 rows=4046 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                          ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.035..22.009 rows=4046 loops=1)
                                Index Cond: (applicationid = 1)
                                Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 2.699 ms
Execution time: 423.068 ms
With respayload and without COUNT(*) OVER() AS total_rows:
Limit  (cost=2810.58..2813.08 rows=1000 width=823) (actual time=136.977..146.820 rows=1000 loops=1)
  ->  Sort  (cost=2810.58..2813.78 rows=1281 width=823) (actual time=136.967..140.334 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 2064kB
        ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=823) (actual time=47.127..119.808 rows=3951 loops=1)
              Hash Cond: (res.uuid = req.uuid)
              ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=758) (actual time=0.015..33.307 rows=9129 loops=1)
                    Filter: (status IS NOT NULL)
              ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=38.328..38.328 rows=4046 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                    ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.047..21.813 rows=4046 loops=1)
                          Index Cond: (applicationid = 1)
                          Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 3.882 ms
Execution time: 150.465 ms
Without respayload and with COUNT(*) OVER() AS total_rows:
Limit  (cost=2826.59..2829.09 rows=1000 width=110) (actual time=164.428..174.760 rows=1000 loops=1)
  ->  Sort  (cost=2826.59..2829.79 rows=1281 width=110) (actual time=164.418..167.956 rows=1000 loops=1)
        Sort Key: req.accessed DESC
        Sort Method: top-N heapsort  Memory: 564kB
        ->  WindowAgg  (cost=1090.16..2760.47 rows=1281 width=110) (actual time=133.997..148.382 rows=3951 loops=1)
              ->  Hash Join  (cost=1090.16..2744.46 rows=1281 width=110) (actual time=46.282..119.070 rows=3951 loops=1)
                    Hash Cond: (res.uuid = req.uuid)
                    ->  Seq Scan on response_logs res  (cost=0.00..1607.26 rows=9126 width=45) (actual time=0.009..33.656 rows=9129 loops=1)
                          Filter: (status IS NOT NULL)
                    ->  Hash  (cost=1044.85..1044.85 rows=3625 width=102) (actual time=37.844..37.844 rows=4046 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 1122kB
                          ->  Index Scan using request_logs_application_idx on request_logs req  (cost=0.29..1044.85 rows=3625 width=102) (actual time=0.029..21.602 rows=4046 loops=1)
                                Index Cond: (applicationid = 1)
                                Filter: ((accessed >= '2018-01-01 15:04:05+00'::timestamp with time zone) AND (accessed <= '2019-01-02 15:04:05+00'::timestamp with time zone) AND (requesturi !~~* '/v1/sessions%'::text))
Planning time: 3.758 ms
Execution time: 178.675 ms
After experimenting with various configurations, I think I have found the answer to my own question. It is not 100% proven, but it makes sense.

COUNT(*) OVER() AS total_rows forces the whole query to be executed, including the WHERE clause and the full SELECT list. So when the result set is much larger than LIMIT 1000, we still materialize every matching record for the columns named in the SELECT, including respayload. Since respayload can be quite large, when the unlimited match count is high, say 10K, then 10K respayloads get copied, only to be discarded later, because the selected columns are then copied again for just the first 1000 records (due to the limit). That is why omitting respayload from the statement improves performance so significantly. In the future I will be careful with COUNT(*) OVER(), since it is evidently heavily affected by the selected columns.
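Based on that reasoning, one way to keep the total count while avoiding dragging the wide respayload column through the window aggregate is to apply the count and the LIMIT to the narrow columns first, and join the payload in afterwards. This is only a sketch against the schema above, not something I have benchmarked:

```sql
-- Sketch: compute the window count and LIMIT over narrow columns only,
-- then fetch the wide response payload for the surviving 1000 rows.
SELECT
    q.*,
    res.payload AS respayload
FROM (
    SELECT
        req.uuid,
        res.status,
        req.method,
        req.requesturi,
        req.accessed,
        req.payload AS reqpayload,
        COUNT(*) OVER() AS total_rows
    FROM request_logs req
    INNER JOIN response_logs res
        ON req.uuid = res.uuid AND res.status IS NOT NULL
    WHERE req.applicationid = 1
      AND req.accessed BETWEEN '2018-01-01 15:04:05 +0000' AND '2019-01-02 15:04:05+0000'
      AND req.requesturi NOT ILIKE '/v1/sessions%'
    ORDER BY req.accessed DESC
    LIMIT 1000
) q
INNER JOIN response_logs res ON res.uuid = q.uuid;
```

Since response_logs_uuid_idx is a unique index on uuid, the outer join should add at most one index probe per returned row, so only 1000 payloads ever need to be copied.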