
Selecting multiple aggregates is slow on Postgres

Question · asked by JPM · 2021-03-07 02:27:04 +0800 CST

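I have a table with the columns id, antenna_id, latitude and longitude. There are two composite indexes, one on (antenna_id, latitude) and one on (antenna_id, longitude). When I compute max(latitude) for specific antenna ids the speed is acceptable, but computing the min and max of both latitude and longitude in the same query is extremely slow.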

Using PostgreSQL 12.3

Query


EXPLAIN (analyze, buffers, format text) 
SELECT max(latitude) 
FROM packets 
WHERE antenna_id IN (1,2)

Finalize Aggregate  (cost=443017.21..443017.22 rows=1 width=32) (actual time=4373.679..4373.679 rows=1 loops=1)
  Buffers: shared hit=10812 read=16887
  ->  Gather  (cost=443017.10..443017.21 rows=1 width=32) (actual time=4373.412..4389.032 rows=2 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        Buffers: shared hit=10812 read=16887
        ->  Partial Aggregate  (cost=442017.10..442017.11 rows=1 width=32) (actual time=4313.576..4313.577 rows=1 loops=2)
              Buffers: shared hit=10809 read=16887
              ->  Parallel Index Only Scan using idx_packets_antenna_id_latitude on packets  (cost=0.57..433527.51 rows=3395835 width=7) (actual time=0.375..3435.488 rows=2201866 loops=2)
                    Index Cond: (antenna_id = ANY ('{1,2}'::integer[]))
                    Heap Fetches: 0
                    Buffers: shared hit=10809 read=16887
Planning Time: 5.992 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 6.236 ms, Inlining 0.000 ms, Optimization 1.549 ms, Emission 32.058 ms, Total 39.842 ms
Execution Time: 4706.406 ms

The EXPLAIN output for max(longitude), min(latitude) and min(longitude) looks almost identical, and their speed is acceptable too.

But when I combine them into one query:

SELECT max(latitude), max(longitude), min(latitude), min(longitude) 
FROM packets 
WHERE antenna_id IN (1,2)

Duration

[2021-03-06 09:28:30] 1 row retrieved starting from 1 in 5 m 35 s 907 ms (execution: 5 m 35 s 869 ms, fetching: 38 ms)

EXPLAIN

Finalize Aggregate  (cost=3677020.18..3677020.19 rows=1 width=128)
  ->  Gather  (cost=3677020.06..3677020.17 rows=1 width=128)
        Workers Planned: 1
        ->  Partial Aggregate  (cost=3676020.06..3676020.07 rows=1 width=128)
              ->  Parallel Seq Scan on packets  (cost=0.00..3642080.76 rows=3393930 width=14)
                    Filter: (antenna_id = ANY ('{1,2}'::integer[]))
JIT:
  Functions: 7
  Options: Inlining true, Optimization true, Expressions true, Deforming true

EXPLAIN (analyze, buffers, format text) 
SELECT max(latitude), max(longitude), min(latitude), min(longitude) 
FROM packets 
WHERE antenna_id IN (1,2)

This has been running for 24 hours and has not finished yet.

Indexes

create index idx_packets_antenna_id_time
    on packets (antenna_id, time);

create index idx_packets_antenna_id_longitude
    on packets (antenna_id, longitude);

create index idx_packets_device_id_time
    on packets (device_id, time);

create index idx_packets_antenna_id_latitude
    on packets (antenna_id, latitude);

Data statistics

select count(*) from packets
136758098

select count(distinct (antenna_id)) from packets
17558

select antenna_id, count(*) as records 
from packets 
where antenna_id in (1,2) 
group by antenna_id 
order by records desc

1,4361049
2,42683

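Question

Why doesn't the second query, which computes the min and max of the latitude and longitude columns, use the indexes? And how can I rewrite the query to make it faster?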

1 Answer

bobflux · Answer 1 · 2021-03-08T09:43:32+08:00

Let's create some test data. It looks like your query hits about 1% of the rows per antenna id, so let's replicate that.

CREATE UNLOGGED TABLE foo( lat FLOAT NOT NULL, lon FLOAT NOT NULL, aid INTEGER NOT NULL );
INSERT INTO foo SELECT random(), random(), random()*100
    FROM generate_series(1,10000000) s;
CREATE INDEX foo_aid_lat ON foo( aid, lat );
CREATE INDEX foo_aid_lon ON foo( aid, lon );
VACUUM ANALYZE foo;

SELECT min(lat),max(lat),min(lon),max(lon) FROM foo WHERE aid IN (1,2);

 Finalize Aggregate  (cost=71572.35..71572.36 rows=1 width=32) (actual time=119.907..125.118 rows=1 loops=1)
   ->  Gather  (cost=71572.12..71572.33 rows=2 width=32) (actual time=119.648..125.110 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=70572.12..70572.13 rows=1 width=32) (actual time=95.595..95.596 rows=1 loops=3)
               ->  Parallel Bitmap Heap Scan on foo  (cost=4886.47..69687.39 rows=88473 width=16) (actual time=9.532..90.336 rows=66524 loops=3)
                     Recheck Cond: (aid = ANY ('{1,2}'::integer[]))
                     Heap Blocks: exact=26477
                     ->  Bitmap Index Scan on foo_aid_lon  (cost=0.00..4833.39 rows=212336 width=0) (actual time=20.022..20.023 rows=199572 loops=1)
                           Index Cond: (aid = ANY ('{1,2}'::integer[]))
 Planning Time: 0.499 ms
 Execution Time: 125.202 ms

That's really slow. Let's try a single antenna id.

SELECT min(lat),max(lat),min(lon),max(lon) FROM foo WHERE aid=1;

 Result  (cost=1.88..1.89 rows=1 width=32) (actual time=0.192..0.196 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.47 rows=1 width=8) (actual time=0.059..0.060 rows=1 loops=1)
           ->  Index Only Scan using foo_aid_lat on foo  (cost=0.43..3777.80 rows=106668 width=8) (actual time=0.057..0.057 rows=1 loops=1)
                 Index Cond: ((aid = 1) AND (lat IS NOT NULL))
                 Heap Fetches: 0
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.43..0.47 rows=1 width=8) (actual time=0.044..0.045 rows=1 loops=1)
           ->  Index Only Scan Backward using foo_aid_lat on foo foo_1  (cost=0.43..3777.80 rows=106668 width=8) (actual time=0.043..0.044 rows=1 loops=1)
                 Index Cond: ((aid = 1) AND (lat IS NOT NULL))
                 Heap Fetches: 0
   InitPlan 3 (returns $2)
     ->  Limit  (cost=0.43..0.47 rows=1 width=8) (actual time=0.038..0.038 rows=1 loops=1)
           ->  Index Only Scan using foo_aid_lon on foo foo_2  (cost=0.43..3777.80 rows=106668 width=8) (actual time=0.037..0.037 rows=1 loops=1)
                 Index Cond: ((aid = 1) AND (lon IS NOT NULL))
                 Heap Fetches: 0
   InitPlan 4 (returns $3)
     ->  Limit  (cost=0.43..0.47 rows=1 width=8) (actual time=0.042..0.042 rows=1 loops=1)
           ->  Index Only Scan Backward using foo_aid_lon on foo foo_3  (cost=0.43..3777.80 rows=106668 width=8) (actual time=0.041..0.041 rows=1 loops=1)
                 Index Cond: ((aid = 1) AND (lon IS NOT NULL))
                 Heap Fetches: 0
 Planning Time: 0.504 ms
 Execution Time: 0.277 ms

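This is the right plan: it uses the multicolumn indexes to compute the max and min. Each min() or max() needs only one index lookup, because

SELECT max(lat) FROM foo WHERE aid=...

is equivalent to

SELECT lat FROM foo WHERE aid=... AND lat IS NOT NULL ORDER BY lat DESC LIMIT 1

...which can be optimized with an index that holds the rows in pre-sorted order. (max() ignores NULLs, hence the lat IS NOT NULL condition, which also shows up in the Index Cond of the plan above.)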
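A minimal sketch of this equivalence, run through Python's built-in sqlite3 module purely as a convenient stand-in for PostgreSQL: it compares results, not query plans. The table mirrors the answer's foo, but the data is an assumption (seeded coordinates, aid = i % 101) and, unlike the answer's NOT NULL columns, two NULL latitudes are added for aid 1. They show why the top-1 form needs lat IS NOT NULL: min()/max() skip NULLs, ORDER BY does not. The engines place NULLs differently (PostgreSQL sorts them first under DESC by default, SQLite sorts them first under ASC), so which unfiltered query breaks differs between them.

```python
import random
import sqlite3

# SQLite stand-in for the answer's `foo` table. The data is an assumption:
# seeded random coordinates and aid = i % 101 (ids 0..100, ~100 rows each).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (lat REAL, lon REAL, aid INTEGER NOT NULL)")
rng = random.Random(42)
rows = [(rng.random(), rng.random(), i % 101) for i in range(10_100)]
# Two NULL latitudes for aid 1 (the answer's columns are NOT NULL).
rows += [(None, 0.5, 1), (None, 0.25, 1)]
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
conn.execute("CREATE INDEX foo_aid_lat ON foo (aid, lat)")


def one(sql, *params):
    """Return the first column of the first row, or None if there is no row."""
    row = conn.execute(sql, params).fetchone()
    return None if row is None else row[0]


aid = 1
agg_min = one("SELECT min(lat) FROM foo WHERE aid = ?", aid)
agg_max = one("SELECT max(lat) FROM foo WHERE aid = ?", aid)

# Top-1 rewrites: read the (aid, lat) order from either end, skipping NULLs.
top_min = one("SELECT lat FROM foo WHERE aid = ? AND lat IS NOT NULL "
              "ORDER BY lat ASC LIMIT 1", aid)
top_max = one("SELECT lat FROM foo WHERE aid = ? AND lat IS NOT NULL "
              "ORDER BY lat DESC LIMIT 1", aid)

# Without the NULL filter, SQLite sorts NULLs first in ascending order,
# so this "min" query returns NULL (None) instead of the minimum.
naive_min = one("SELECT lat FROM foo WHERE aid = ? ORDER BY lat ASC LIMIT 1", aid)

print(agg_min == top_min, agg_max == top_max, naive_min)  # True True None
```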

The max() and min() optimization above is basically syntactic sugar: the query is turned into ORDER BY + LIMIT, which is placed in an InitPlan so that it can use the index.
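The same rewrite can be written by hand as one top-1 scalar subquery per aggregate, which is roughly what the four InitPlans above correspond to. A sketch, again using sqlite3 only to check that the hand-written form returns the same four values as the plain aggregate; the data is a seeded assumption, and the helper `top1` is hypothetical. The query shape itself is ordinary SQL that PostgreSQL also accepts once the `:aid` placeholder is replaced by a literal.

```python
import random
import sqlite3

# Assumed data: seeded coordinates, aid = i % 101 (ids 0..100).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (lat REAL NOT NULL, lon REAL NOT NULL, aid INTEGER NOT NULL)")
rng = random.Random(7)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(rng.random(), rng.random(), i % 101) for i in range(10_100)],
)
conn.execute("CREATE INDEX foo_aid_lat ON foo (aid, lat)")
conn.execute("CREATE INDEX foo_aid_lon ON foo (aid, lon)")


def top1(col, direction):
    """Hypothetical helper: one top-1 scalar subquery, playing the role of one InitPlan."""
    return (f"(SELECT {col} FROM foo WHERE aid = :aid AND {col} IS NOT NULL "
            f"ORDER BY {col} {direction} LIMIT 1)")


# min(lat), max(lat), min(lon), max(lon) as four index-friendly lookups.
REWRITTEN = "SELECT " + ", ".join(
    [top1("lat", "ASC"), top1("lat", "DESC"), top1("lon", "ASC"), top1("lon", "DESC")]
)
PLAIN = "SELECT min(lat), max(lat), min(lon), max(lon) FROM foo WHERE aid = :aid"

rewritten = conn.execute(REWRITTEN, {"aid": 1}).fetchone()
plain = conn.execute(PLAIN, {"aid": 1}).fetchone()
print(rewritten == plain)  # True
```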

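But apparently it doesn't do this when several antenna ids are queried with WHERE aid IN (...). Adding GROUP BY aid to the end of the first query doesn't help either.

So... let's query one antenna id at a time.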

SELECT * FROM 
(VALUES (1),(2)) AS v
CROSS JOIN LATERAL (SELECT min(lat),max(lat),min(lon),max(lon) FROM foo WHERE aid=v.column1) x;

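This does a nested loop over the VALUES, and the inner side of the nested loop is the fast query above. It returns max() and min() for each antenna id, so to get the global max() and min() you have to wrap it in a subquery and apply max() and min() to its result.

Unless something else is wrong, this shouldn't take more than a millisecond.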
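The wrapping step described above can be sketched as follows, checked in sqlite3 against the plain WHERE aid IN (...) aggregate. SQLite has no LATERAL join, so correlated scalar subqueries stand in for the answer's CROSS JOIN LATERAL; in PostgreSQL you would keep the answer's LATERAL query as the inner subquery. The data is a seeded assumption, and id 999 is a hypothetical id with no rows: its per-id values are NULL, which the outer min()/max() ignore.

```python
import random
import sqlite3

# Assumed data: seeded coordinates, aid = i % 101 (ids 0..100).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (lat REAL NOT NULL, lon REAL NOT NULL, aid INTEGER NOT NULL)")
rng = random.Random(1)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(rng.random(), rng.random(), i % 101) for i in range(10_100)],
)
conn.execute("CREATE INDEX foo_aid_lat ON foo (aid, lat)")
conn.execute("CREATE INDEX foo_aid_lon ON foo (aid, lon)")

# Inner level: per-id aggregates (scalar subqueries instead of LATERAL).
# Outer level: min of the per-id minimums, max of the per-id maximums.
WRAPPED = """
WITH ids(aid) AS (VALUES (1), (2), (999)),
per_id AS (
    SELECT ids.aid,
           (SELECT min(lat) FROM foo WHERE foo.aid = ids.aid) AS min_lat,
           (SELECT max(lat) FROM foo WHERE foo.aid = ids.aid) AS max_lat,
           (SELECT min(lon) FROM foo WHERE foo.aid = ids.aid) AS min_lon,
           (SELECT max(lon) FROM foo WHERE foo.aid = ids.aid) AS max_lon
    FROM ids
)
SELECT min(min_lat), max(max_lat), min(min_lon), max(max_lon) FROM per_id
"""
# The original single-statement form, for comparison.
DIRECT = """
SELECT min(lat), max(lat), min(lon), max(lon)
FROM foo WHERE aid IN (1, 2, 999)
"""

wrapped = conn.execute(WRAPPED).fetchone()
direct = conn.execute(DIRECT).fetchone()
print(wrapped == direct)  # True
```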

Replacing the VALUES above with generate_series(1,100) to get the max values for 100 aids in the table takes about 5 ms. Doing it the old-fashioned way:

select aid,min(lat),max(lat),min(lon),max(lon) FROM foo group by aid;

takes about 100 times as long.
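A correctness check (not a timing) of the one-id-at-a-time approach against GROUP BY over ids 1..100, in sqlite3. generate_series is an optional extension in SQLite, so a recursive CTE produces the ids instead, and scalar subqueries again stand in for LATERAL. The data and sizes are assumptions, and no speed figures are implied.

```python
import random
import sqlite3

# Assumed data: seeded coordinates, aid = i % 101 (every id 0..100 present).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (lat REAL NOT NULL, lon REAL NOT NULL, aid INTEGER NOT NULL)")
rng = random.Random(3)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(rng.random(), rng.random(), i % 101) for i in range(10_100)],
)
conn.execute("CREATE INDEX foo_aid_lat ON foo (aid, lat)")
conn.execute("CREATE INDEX foo_aid_lon ON foo (aid, lon)")

# One id at a time: a recursive CTE emits 1..100 (in place of
# generate_series) and each aggregate is a per-id scalar subquery.
PER_ID = """
WITH RECURSIVE ids(aid) AS (
    SELECT 1 UNION ALL SELECT aid + 1 FROM ids WHERE aid < 100
)
SELECT ids.aid,
       (SELECT min(lat) FROM foo WHERE foo.aid = ids.aid),
       (SELECT max(lat) FROM foo WHERE foo.aid = ids.aid),
       (SELECT min(lon) FROM foo WHERE foo.aid = ids.aid),
       (SELECT max(lon) FROM foo WHERE foo.aid = ids.aid)
FROM ids
ORDER BY ids.aid
"""
# The "old-fashioned" GROUP BY, restricted to the same ids.
GROUPED = """
SELECT aid, min(lat), max(lat), min(lon), max(lon)
FROM foo
WHERE aid BETWEEN 1 AND 100
GROUP BY aid
ORDER BY aid
"""

per_id = conn.execute(PER_ID).fetchall()
grouped = conn.execute(GROUPED).fetchall()
print(len(per_id), per_id == grouped)  # 100 True
```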
