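The ClickHouse documentation states:
lagInFrame behaviour differs from the standard SQL lag window function. The ClickHouse window function lagInFrame respects the window frame.
What is this window frame, and how does it affect the output?
Example: in a time series, I want to find the rows where the time column differs from the previous row's value by more than a given threshold.
I want to compare each row with the row before it.
The following statement from the documentation led me to believe that the query below is the right approach:
To get behaviour identical to lag, use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.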
SELECT lag, startdate, diff
FROM (SELECT startdate,
             lagInFrame(startdate)
               OVER(
                 ORDER BY startdate ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
                 UNBOUNDED FOLLOWING) AS lag,
             Date_diff('minute', lag, startdate) AS diff
      FROM <table>
      ORDER BY startdate ASC)
WHERE diff > 15
However, the following query gives exactly the same result and uses less memory.
SELECT lag, startdate, diff
FROM (SELECT startdate,
             lagInFrame(startdate)
               OVER(
                 ORDER BY startdate ASC ROWS BETWEEN 1 PRECEDING AND
                 CURRENT ROW) AS lag,
             Date_diff('minute', lag, startdate) AS diff
      FROM <table>
      ORDER BY startdate ASC)
WHERE diff > 15
The only difference is the window frame:
OVER(ORDER BY startdate ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
versus
OVER(ORDER BY startdate ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
Is there any case in which the output of the two queries differs?
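I wrote that part of the documentation to reduce the number of bug reports about leadInFrame: people simply did not understand how the window frame works in the case of
OVER (ORDER BY somecol)
See https://clickhouse.com/docs/en/sql-reference/window-functions#syntax
In that case the frame is bounded by the beginning of the partition and the current row. It is equivalent to
ORDER BY order ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
so the next row is not visible to leadInFrame.
For most people it is more convenient to use
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
because it is more compatible with lag/lead and works with any offset. For example, if you use a different offset, such as
lagInFrame(startdate, 2)
then you need to use BETWEEN 2 PRECEDING AND CURRENT ROW.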
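A minimal sketch of how the frame interacts with the offset, run on generated data rather than the table from the question. It assumes the numbers(6) table function (values 0 to 5) as stand-in data, and the column aliases are made up for illustration.

```sql
-- Stand-in data: numbers(6) yields one UInt64 column `number` with values 0..5.
SELECT
    number,
    -- Offset 2 with a frame that reaches only 1 row back:
    -- the requested row is never inside the frame.
    lagInFrame(number, 2) OVER (ORDER BY number ASC
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS lag2_narrow,
    -- Offset 2 with a frame that reaches exactly 2 rows back.
    lagInFrame(number, 2) OVER (ORDER BY number ASC
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS lag2_exact,
    -- Offset 2 with the whole partition as the frame.
    lagInFrame(number, 2) OVER (ORDER BY number ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lag2_full,
    -- No explicit frame: the frame ends at the current row,
    -- so the next row is outside it.
    leadInFrame(number) OVER (ORDER BY number ASC) AS lead_default,
    -- Whole partition as the frame: the next row is visible.
    leadInFrame(number) OVER (ORDER BY number ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lead_full
FROM numbers(6)
ORDER BY number ASC;
```

Under these assumptions, lag2_exact and lag2_full should agree with each other, while lag2_narrow and lead_default should fall back to the column default because the requested row lies outside the frame. That is the case where the two frames from the question stop being interchangeable: with the default offset of 1 they give the same result, with a larger offset they do not.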
Fiddle: https://fiddle.clickhouse.com/1ce6547a-109a-4552-9f1b-b9c03f972988
By the way, you can use
any() over ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
and you will get the same result: https://fiddle.clickhouse.com/cfa1df1e-2571-4167-9fc1-0f3b1c8f86ac
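The any() alternative can be sketched side by side with lagInFrame. As before, numbers(6) is assumed as stand-in data and the aliases via_lag and via_any are hypothetical names chosen for this illustration.

```sql
-- Stand-in data: numbers(6) yields values 0..5 in the column `number`.
SELECT
    number,
    -- Previous row via lagInFrame with a frame of the previous and current row.
    lagInFrame(number) OVER (ORDER BY number ASC
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS via_lag,
    -- Previous row via any() over a frame that contains only the previous row.
    any(number) OVER (ORDER BY number ASC
        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_any
FROM numbers(6)
ORDER BY number ASC;
```

The frame 1 PRECEDING AND 1 PRECEDING holds at most one row, so any() has only the previous row to pick from; the two columns should therefore match on every row of this sample.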