Hello, I am trying to optimize a timestamp range containment (<@) query on Postgres 12.
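I have read some of the Postgres documentation and found that only GiST and SP-GiST indexes support this operator. However, I cannot add either of them (I think I would need to add one on the heartrate table, see the schema below, but its "time" column is not a range type...).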
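My question is similar to this question and this question, which also suggest that I need a GiST index. However, they are the reverse case: they have a single timestamp and want to return all rows from a table whose tsrange column contains it. I have a table of timestamps and want to join it to a table of tsranges.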
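For some background on my schema: in the real dataset I have a collection of heart rates sampled roughly every 1/3 of a second, plus a list of the songs I have listened to and when I listened to them. I would like to query things like:
- avg(heartrate) for a particular track and artist
- avg(heartrate) for a particular artist
- etc.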
Schema
    create table heartrate (
      "time" timestamp primary key
    , value float
    ) ;

    CREATE INDEX ON heartrate ("time", value);

    -- CREATE INDEX ON heartrate USING GIST ("time", value);
    -- can't do as "time" is not a range column.
    -- one gets the following error:
    -- ERROR: data type timestamp without time zone has no default operator class for access method "gist"
    -- Hint: You must specify an operator class for the index or define a default operator class for the data type.

    create table song_play(
      track TEXT NOT NULL,
      artist TEXT NOT NULL,
      play tsrange not null
    ) ;

    CREATE INDEX ON song_play(track, artist);

    INSERT INTO heartrate("time", value)
    SELECT d, 60+60*random()
    FROM generate_series('2015-01-01 00:00:00'::timestamp, '2020-01-01 00:00:00'::timestamp, '5 min'::interval) d ;

    INSERT INTO song_play(track, artist, play)
    SELECT
      case when random() > 0.5 then 'a' when random() > 0.5 then 'b' else 'c' end
    , case when random() > 0.5 then 'a' when random() > 0.5 then 'b' else 'c' end
    , tsrange(d, d + (((random()*3+1)::text || 'min')::interval))
    FROM generate_series('2015-01-01 00:00:00'::timestamp, '2020-01-01 00:00:00'::timestamp, '1 day'::interval) d ;

    EXPLAIN
    SELECT sp.track, sp.artist, avg(h.value)
    FROM song_play sp
    left join heartrate h ON h.time <@ sp.play
    where sp.track='a' and sp.artist='b'
    GROUP BY sp.track, sp.artist;
This results in:
    525889 rows affected
    1827 rows affected

    QUERY PLAN
    GroupAggregate  (cost=0.28..14689.24 rows=1 width=72)
      Group Key: sp.track, sp.artist
      ->  Nested Loop Left Join  (cost=0.28..14685.28 rows=526 width=72)
            Join Filter: (h."time" <@ sp.play)
            ->  Index Scan using song_play_track_artist_idx on song_play sp  (cost=0.28..8.29 rows=1 width=96)
                  Index Cond: ((track = 'a'::text) AND (artist = 'b'::text))
            ->  Seq Scan on heartrate h  (cost=0.00..8102.55 rows=525955 width=16)
Note: the plan above does a full sequential scan of heartrate, which is the largest table. Not ideal at all!
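I then decided to create the following function to see whether it would help speed up the query. It converts a range, e.g. tsrange('2020-01-01 00:00:00', '2020-01-02 00:00:00'), into a conditional expression, e.g. field >= '2020-01-01 00:00:00' and field < '2020-01-02 00:00:00'. That is essentially the same test as the <@ containment operator.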
It seems to work! However, this only helps to look up the heart rate for one specific song_play, not across all the song_plays of a track/artist.
    CREATE OR REPLACE FUNCTION range_to_conditional(range anyrange, field text)
    RETURNS text
    LANGUAGE SQL
    IMMUTABLE STRICT
    AS $$
      SELECT case
        when isempty(range) then 'false'
        when upper_inf(range) and lower_inf(range) then 'true'
        when upper_inf(range) then
          case
            when lower_inc(range) then format(' %L <= %I ', lower(range), field)
            else format(' %L < %I ', lower(range), field)
          end
        when lower_inf(range) then
          case
            when upper_inc(range) then format(' %L >= %I ', upper(range), field)
            else format(' %L > %I ', upper(range), field)
          end
        else
          case
            when lower_inc(range) and upper_inc(range) then format(' %1$L <= %3$I AND %2$L >= %3$I ', lower(range), upper(range), field)
            when lower_inc(range) then format(' %1$L <= %3$I AND %2$L > %3$I ', lower(range), upper(range), field)
            when upper_inc(range) then format(' %1$L < %3$I AND %2$L >= %3$I ', lower(range), upper(range), field)
            else format(' %1$L < %3$I AND %2$L > %3$I ', lower(range), upper(range), field)
          end
      end
    $$ ;

    create function avg_heartrate(sp song_play)
    returns double precision as $$
    DECLARE
      retval double precision ;
    BEGIN
      EXECUTE format('select avg(h.value) from heartrate h where %s', range_to_conditional(sp.play, 'time'))
      INTO STRICT retval;
      RETURN retval;
    END
    $$ LANGUAGE plpgsql stable;

    SELECT sp.track, sp.artist, sp.play, avg_heartrate(sp)
    from song_play sp
    where sp.track='a' and sp.artist='b'
    limit 10;
track | artist | play | avg_heartrate
:---- | :----- | :--- | :------------
a | b | ["2015-01-03 00:00:00","2015-01-03 00:03:42.413608") | 78.93074469582096
a | b | ["2015-01-10 00:00:00","2015-01-10 00:01:32.299356") | 83.89127804586359
a | b | ["2015-01-11 00:00:00","2015-01-11 00:03:24.722083") | 62.333722293527885
a | b | ["2015-01-19 00:00:00","2015-01-19 00:01:14.845757") | 77.65872734128969
a | b | ["2015-01-30 00:00:00","2015-01-30 00:01:40.991165") | 102.88233680407437
a | b | ["2015-02-06 00:00:00","2015-02-06 00:03:51.264716") | 70.34797302970127
a | b | ["2015-02-13 00:00:00","2015-02-13 00:01:23.358657") | 62.91734005187344
a | b | ["2015-02-25 00:00:00","2015-02-25 00:02:04.856602") | 115.45533419257616
a | b | ["2015-02-28 00:00:00","2015-02-28 00:02:46.800728") | 117.39846990343175
a | b | ["2015-03-18 00:00:00","2015-03-18 00:02:54.893186") | 68.1618921408235
db<>fiddle here
Thanks!
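Change the join condition from the range containment test h.time <@ sp.play to a pair of explicit comparisons of h.time against lower(sp.play) and upper(sp.play) (use different inequality operators if your ranges are open at the right end).
Then the nested loop join can use a regular b-tree index on heartrate(time) to speed up the inner query.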
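As a rough sketch of what that rewrite could look like for the query in the question: this assumes the tables from the question's schema and that every play range is '[)' (inclusive lower bound, exclusive upper bound), which is what the two-argument tsrange(d, ...) constructor used in the sample data produces. The exact operators are my assumption based on that data, so adjust them if your ranges are bounded differently.

```sql
-- Sketch only: the same aggregate as in the question, with the containment
-- test spelled out as two plain comparisons on heartrate."time".
-- Assumes '[)' ranges, so the upper bound is excluded with "<".
EXPLAIN
SELECT sp.track, sp.artist, avg(h.value)
FROM song_play sp
LEFT JOIN heartrate h
       ON h."time" >= lower(sp.play)
      AND h."time" <  upper(sp.play)
WHERE sp.track = 'a'
  AND sp.artist = 'b'
GROUP BY sp.track, sp.artist;
```

Since "time" is the primary key of heartrate, a b-tree index on it already exists, so no extra index should be needed for this. One caveat: lower() and upper() return NULL for empty or unbounded ranges, so such rows would match no heart rates here, whereas <@ would treat an unbounded range as containing everything on that side.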