Questions asked by JPM - dba

JPM

Asked: 2021-03-17 01:07:14 +0800 CST

TimescaleDB wildcard (%) queries are slow

-1

I have a TimescaleDB hypertable like this:

create table logs
(
    time         timestamp not null,
    partitionkey text      not null,
    ip           inet,
    raw          text,
    transformed  double precision
);

with the following indexes:

create index logs_time_idx
    on logs (time desc);

create unique index logs_partitionkey_time_uindex
    on logs (partitionkey asc, time desc);

When I run this query, it takes 20 minutes to complete:

SELECT * FROM data.logs 
WHERE partitionkey LIKE '%m.60.05482730' 
AND time > NOW() - INTERVAL '3 days'

But when I run this one, it takes 2 seconds:

SELECT * FROM data.logs 
WHERE partitionkey LIKE '865617033605366.m.60.05482730'
AND time > NOW() - INTERVAL '3 days'

I tried indexing the partition key on its own to help the wildcard query find matching values, but it had no effect.

-- created this index later to try and fix the slow wildcard query
create index logs_partitionkey_index
    on logs (partitionkey);

EXPLAIN plan for the wildcard query:

Gather  (cost=1000.57..525711.89 rows=1219 width=81)
  Workers Planned: 2
  ->  Parallel Custom Scan (ChunkAppend) on logs  (cost=0.57..524589.99 rows=509 width=82)
        Chunks excluded during startup: 2
        ->  Parallel Index Scan using _hyper_2_10_chunk_logs_time_idx on _hyper_2_10_chunk  (cost=0.57..263956.91 rows=255 width=81)
              Index Cond: ("time" > (now() - '3 days'::interval))
              Filter: (partitionkey ~~ '%m.60.05482730'::text)
        ->  Parallel Index Scan using _hyper_2_9_chunk_logs_time_idx on _hyper_2_9_chunk  (cost=0.57..260629.72 rows=252 width=83)
              Index Cond: ("time" > (now() - '3 days'::interval))
              Filter: (partitionkey ~~ '%m.60.05482730'::text)
JIT:
  Functions: 8
  Options: Inlining true, Optimization true, Expressions true, Deforming true

EXPLAIN plan for a specific partitionkey value:

Custom Scan (ChunkAppend) on logs  (cost=0.44..903.08 rows=790 width=82)
  Chunks excluded during startup: 2
  ->  Index Scan using _hyper_2_9_chunk_logs_partitionkey_time_uindex on _hyper_2_9_chunk  (cost=0.57..447.44 rows=392 width=83)
        Index Cond: ((partitionkey = '865617033605366.m.60.05482730'::text) AND ("time" > (now() - '3 days'::interval)))
        Filter: (partitionkey ~~ '865617033605366.m.60.05482730'::text)
  ->  Index Scan using _hyper_2_10_chunk_logs_partitionkey_time_uindex on _hyper_2_10_chunk  (cost=0.57..452.27 rows=396 width=81)
        Index Cond: ((partitionkey = '865617033605366.m.60.05482730'::text) AND ("time" > (now() - '3 days'::interval)))
        Filter: (partitionkey ~~ '865617033605366.m.60.05482730'::text)

Is TimescaleDB unable to perform wildcard (%) queries, or am I missing an index?

JPM

Asked: 2021-03-07 02:27:04 +0800 CST

Selecting multiple aggregates is slow on Postgres

0

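I have a table with the columns id, antenna_id, latitude and longitude. There are two composite indexes, on (antenna_id, latitude) and on (antenna_id, longitude). When I run max(latitude) for specific antenna ids, the speed is acceptable, but computing the min and max of both latitude and longitude at the same time is very slow.

Using PostgreSQL 12.3

Query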


EXPLAIN (analyze, buffers, format text) 
SELECT max(latitude) 
FROM packets 
WHERE antenna_id IN (1,2)

Finalize Aggregate  (cost=443017.21..443017.22 rows=1 width=32) (actual time=4373.679..4373.679 rows=1 loops=1)
  Buffers: shared hit=10812 read=16887
  ->  Gather  (cost=443017.10..443017.21 rows=1 width=32) (actual time=4373.412..4389.032 rows=2 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        Buffers: shared hit=10812 read=16887
        ->  Partial Aggregate  (cost=442017.10..442017.11 rows=1 width=32) (actual time=4313.576..4313.577 rows=1 loops=2)
              Buffers: shared hit=10809 read=16887
              ->  Parallel Index Only Scan using idx_packets_antenna_id_latitude on packets  (cost=0.57..433527.51 rows=3395835 width=7) (actual time=0.375..3435.488 rows=2201866 loops=2)
                    Index Cond: (antenna_id = ANY ('{1,2}'::integer[]))
                    Heap Fetches: 0
                    Buffers: shared hit=10809 read=16887
Planning Time: 5.992 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 6.236 ms, Inlining 0.000 ms, Optimization 1.549 ms, Emission 32.058 ms, Total 39.842 ms
Execution Time: 4706.406 ms

The EXPLAIN output for max(longitude), min(latitude) and min(longitude) looks almost identical. The speed is acceptable.

But when I combine the queries

SELECT max(latitude), max(longitude), min(latitude), min(longitude) 
FROM packets 
WHERE antenna_id IN (1,2)

Duration

[2021-03-06 09:28:30] 1 row retrieved starting from 1 in 5 m 35 s 907 ms (execution: 5 m 35 s 869 ms, fetching: 38 ms)

Explain

Finalize Aggregate  (cost=3677020.18..3677020.19 rows=1 width=128)
  ->  Gather  (cost=3677020.06..3677020.17 rows=1 width=128)
        Workers Planned: 1
        ->  Partial Aggregate  (cost=3676020.06..3676020.07 rows=1 width=128)
              ->  Parallel Seq Scan on packets  (cost=0.00..3642080.76 rows=3393930 width=14)
                    Filter: (antenna_id = ANY ('{1,2}'::integer[]))
JIT:
  Functions: 7
  Options: Inlining true, Optimization true, Expressions true, Deforming true

EXPLAIN (analyze, buffers, format text) 
SELECT max(latitude), max(longitude), min(latitude), min(longitude) 
FROM packets 
WHERE antenna_id IN (1,2)

has been running for 24 hours and has not finished yet.

Indexes

create index idx_packets_antenna_id_time
    on packets (antenna_id, time);

create index idx_packets_antenna_id_longitude
    on packets (antenna_id, longitude);

create index idx_packets_device_id_time
    on packets (device_id, time);

create index idx_packets_antenna_id_latitude
    on packets (antenna_id, latitude);

Data statistics

select count(*) from packets
136758098

select count(distinct (antenna_id)) from packets
17558

select antenna_id, count(*) as records 
from packets 
where antenna_id in (1,2) 
group by antenna_id 
order by records desc

1,4361049
2,42683

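Question

Why does the second query, which computes the min and max of the latitude and longitude fields, not use the indexes? And how can I rewrite the query to make it faster?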

JPM

Asked: 2020-02-24 00:58:44 +0800 CST

Handling running out of connections or out of shared memory

0

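I have a golang program that performs calculations on data in many threads at once, with all of the data fetched from Postgres. The number of threads depends on earlier results, so there may be hundreds of threads trying to fetch data from Postgres at the same time.

The golang sql library allows a connection limit to be specified, to prevent Postgres from running out of shared memory or free connections.

If I hard-code the maximum number of connections, I will run out of connections as soon as something else connects. On the other hand, if I hard-code too low a number of allowed connections for the golang program, performance will be limited unnecessarily.

What is the best way to let the Go program use as many connections as possible without running into limits? I imagine this number is variable, depending on how many other services are connected to the database at the time.

I am considering running PgBouncer between the database and the golang program, in the hope that it accepts all connections from the golang program, lets as many through as possible, and blocks the rest until a connection becomes free. I am not sure that PgBouncer does this, however, so I will test it next.

Is there another way to have a connection pool block connections when no real connection is available? Block, rather than reject, because rejected connections mean I have to add retry logic to my golang program.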

JPM

Asked: 2020-02-05 06:43:35 +0800 CST

SELECT query on a unique numeric field returns no rows

0

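This is a cross-post of an issue I filed on the GORM GitHub repo. I am not sure whether the problem lies with Postgres 11 or with GORM.

See: https://github.com/jinzhu/gorm/issues/2872

I have a table with an ID as primary key and just one other column (mega_herz), which is a numeric(7,3). The numeric field also has a unique constraint. When I execute the following query from pgAdmin 4 or psql, I get one row in response: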

SELECT * 
FROM "ttnmapper_frequencies" 
WHERE ("ttnmapper_frequencies"."mega_herz" = 868.3) 
ORDER BY "ttnmapper_frequencies"."id" ASC 
LIMIT 1

But when I execute the same query through GORM, it returns no results, and when I then try to do an insert, it fails:

(/home/jpmeijers/go/src/ttnmapper-postgres-insert-raw/main.go:342) 
[2020-02-03 14:52:28]  [3.07ms]  
SELECT * 
FROM "ttnmapper_frequencies"  
WHERE ("ttnmapper_frequencies"."mega_herz" = 868.3)
ORDER BY "ttnmapper_frequencies"."id" ASC 
LIMIT 1  
[0 rows affected or returned ] 

(/home/jpmeijers/go/src/ttnmapper-postgres-insert-raw/main.go:342) 
[2020-02-03 14:52:28]  [2.10ms]  
INSERT  INTO "ttnmapper_frequencies" 
("mega_herz") 
VALUES 
(868.3) 
RETURNING "ttnmapper_frequencies"."id"  
[0 rows affected or returned ] 

(/home/jpmeijers/go/src/ttnmapper-postgres-insert-raw/main.go:345) 
[2020-02-03 14:52:28]  pq: duplicate key value violates unique constraint "ttnmapper_frequencies_mega_herz_key"

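Why does the select query return no results? Should I specify the number with three decimal places in the where clause?

Update, 9 February 2020:

The Postgres log output shows the following: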

2020-02-09 05:43:45.679 UTC [59458] ttnmapper@ttnmapper LOG:  execute <unnamed>: SELECT * FROM "ttnmapper_frequencies"  WHERE ("ttnmapper_frequencies"."mega_herz" = $1) ORDER BY "ttnmapper_frequencies"."id" ASC LIMIT 1
2020-02-09 05:43:45.679 UTC [59458] ttnmapper@ttnmapper DETAIL:  parameters: $1 = '868.2999877929688'
2020-02-09 05:43:45.688 UTC [59458] ttnmapper@ttnmapper LOG:  statement: BEGIN READ WRITE
2020-02-09 05:43:45.689 UTC [59458] ttnmapper@ttnmapper LOG:  execute <unnamed>: INSERT  INTO "ttnmapper_frequencies" ("mega_herz") VALUES ($1) RETURNING "ttnmapper_frequencies"."id"
2020-02-09 05:43:45.689 UTC [59458] ttnmapper@ttnmapper DETAIL:  parameters: $1 = '868.2999877929688'
2020-02-09 05:43:45.704 UTC [59458] ttnmapper@ttnmapper ERROR:  duplicate key value violates unique constraint "ttnmapper_frequencies_mega_herz_key"
2020-02-09 05:43:45.704 UTC [59458] ttnmapper@ttnmapper DETAIL:  Key (mega_herz)=(868.300) already exists.

The query

SELECT * FROM "ttnmapper_frequencies" WHERE ("ttnmapper_frequencies"."mega_herz" = 868.2999877929688) ORDER BY "ttnmapper_frequencies"."id" ASC LIMIT 1

returns no results.

However, the query

SELECT * FROM "ttnmapper_frequencies" WHERE ("ttnmapper_frequencies"."mega_herz" = 868.3) ORDER BY "ttnmapper_frequencies"."id" ASC LIMIT 1

returns a single row.

So I have to assume that Postgres does not automatically round the parameter of a select query. What is the best solution?
