Moshe Katz
Asked: 2022-08-09 13:47:00 +0800 CST

Aggregate query for pairs of event arrays


db<>fiddle for all data and queries below

I have an events table with the following structure:

create table events (
    correlation_id char(26) not null,
    user_id        bigint,
    task_id        bigint not null,
    location_id    bigint,
    type           bigint not null,
    created_at     timestamp(6) with time zone not null,
    constraint events_correlation_id_created_at_user_id_unique
        unique (correlation_id, created_at, user_id)
);

This table contains records of tasks being performed, like this:

correlation_id              user_id  task_id  location_id  type  created_at
01CN4HP4AN0000000000000001  4        58       30           0     2018-08-17 18:17:15.348629
01CN4HP4AN0000000000000001  4        58       30           1     2018-08-17 18:17:22.852299
01CN4HP4AN0000000000000001  4        58       30           99    2018-08-17 18:17:25.535593
01CN4J9SZ80000000000000003  4        97       30           0     2018-08-17 18:28:00.104093
01CN4J9SZ80000000000000003  4        97       30           99    2018-08-17 18:29:09.016840
01CN4JC1430000000000000004  4        99       30           0     2018-08-17 18:29:12.963264
01CN4JC1430000000000000004  4        99       30           99    2018-08-17 18:32:09.272632
01CN4KJCDY0000000000000005  139      97       30           0     2018-08-17 18:50:09.725668
01CN4KJCDY0000000000000005  139      97       30           3     2018-08-17 18:50:11.842000
01CN4KJCDY0000000000000005  139      97       30           99    2018-08-17 18:51:42.240895
01CNC4G1Y40000000000000008  139      99       30           0     2018-08-20 17:00:40.260430
01CNC4G1Y40000000000000008  139      99       30           99    2018-08-20 17:00:47.583501

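Rows with type = 0 mark the start of a task, and rows with type = 99 mark its end. (Other values mean other things that are not relevant to this question, but two example rows are included here for completeness.)

Each task_id corresponds to a row in the tasks table. The only other field of the tasks table relevant to this question is called inprogress_status. It can be 1 or 2, meaning Opening task and Closing task, respectively.

I was originally asked to provide a query returning the list of tasks ordered by start date and location, with one row per task containing both its start (type = 0) and its end (type = 99).

This is the query I used for that: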

SELECT e.created_at::DATE, e.location_id, e.task_id
     , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type
     , e.correlation_id
     , json_object_agg(e.type, json_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events
FROM events e
JOIN tasks t on e.task_id = t.id
WHERE e.type IN (0, 99)
AND t.inprogress_status IN (1, 2)
group by created_at::DATE, location_id, task_id, correlation_id, inprogress_status
ORDER BY 1, 2, 3;

This is the result of the query with the data shown above:

created_at location_id task_id task_type correlation_id events
2018-08-17 30 58 OPEN 01CN4HP4AN0000000000000001 {"0": {"timestamp": "2018-08-17T18:17:15.348629+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:17:25.535593+00:00", "user_id": 4} }
2018-08-17 30 97 CLOSE 01CN4J9SZ80000000000000003 {"0": {"timestamp": "2018-08-17T18:28:00.104093+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:29:09.01684+00:00", "user_id": 4} }
2018-08-17 30 99 OPEN 01CN4JC1430000000000000004 { "0": {"timestamp": "2018-08-17T18:29:12.963264+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:32:09.272632+00:00", "user_id": 4} }
2018-08-17 30 97 CLOSE 01CN4KJCDY0000000000000005 { "0": {"timestamp": "2018-08-17T18:50:09.725668+00:00", "user_id": 139}, "99": {"timestamp": "2018-08-17T18:51:42.240895+00:00", "user_id": 139} }
2018-08-20 30 99 OPEN 01CNC4G1Y40000000000000008 { "0": {"timestamp": "2018-08-20T17:00:40.26043+00:00", "user_id": 139}, "99" : {"timestamp": "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }

In the example above, task_id 58 and 99 have inprogress_status = 1 and task_id 97 has inprogress_status = 2.
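
To reproduce the queries locally, a tasks table is needed as well. The question does not show its definition, so the following is only a minimal sketch: the column types are assumptions, and only id (used in the join) and inprogress_status are taken from the question.

```sql
-- Hypothetical minimal tasks table; the real table has more columns.
CREATE TABLE tasks (
    id                bigint PRIMARY KEY,
    inprogress_status smallint NOT NULL  -- 1 = Opening task, 2 = Closing task
);

-- Status values as stated for the sample data.
INSERT INTO tasks (id, inprogress_status) VALUES
    (58, 1),  -- OPEN
    (97, 2),  -- CLOSE
    (99, 1);  -- OPEN
```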

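Now I have been asked to change the returned data structure so that it also aggregates on inprogress_status and returns rows as OPEN+CLOSE event pairs.

To figure out how to build this, I first tried to get the following format (the final format I actually want is shown further below):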

created_at location_id events
2018-08-17 30 {"OPEN": [{"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }], "CLOSE": [{"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }, { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }]}
2018-08-20 30 {"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}

This is the first query I wrote to try to make this work:

WITH grouped_events AS (
    SELECT e.created_at::DATE AS created_date,
        location_id,
        task_id,
        CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type,
        jsonb_build_object('id', e.correlation_id) ||
                jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events
    FROM events e
    JOIN tasks t on e.task_id = t.id
    WHERE type IN (0, 99)
    AND inprogress_status IN (1, 2)
    GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status
)
SELECT created_date, location_id, json_object_agg(task_type, events)
FROM grouped_events
GROUP BY 1, 2
ORDER BY 1, 2

The problem is that this produces invalid JSON, with the same key appearing multiple times:

{
    "OPEN": {
        "0": { "user_id": 4, "timestamp": "2018-08-17T18:29:12.963264+00:00" },
        "99": { "user_id": 4, "timestamp": "2018-08-17T18:32:09.272632+00:00" },
        "id": "01CN4JC1430000000000000004"
    },
    "OPEN": {
        "0": { "user_id": 4, "timestamp": "2018-08-17T18:17:15.348629+00:00" },
        "99": { "user_id": 4, "timestamp": "2018-08-17T18:17:25.535593+00:00" },
        "id": "01CN4HP4AN0000000000000001"
    },
    // ... etc.
}
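
The duplicate keys can be reproduced in isolation. As a minimal sketch (the inline VALUES list and the column names k and v are made up for illustration): json_object_agg emits every key/value pair it receives, while jsonb_object_agg returns jsonb, which keeps only one value per key. Neither collects the values of a repeated key into an array, which is why an intermediate aggregation step is needed.

```sql
-- Two rows with the same key, aggregated into a single object.
SELECT json_object_agg(k, v)  AS as_json    -- both "OPEN" entries appear in the output text
     , jsonb_object_agg(k, v) AS as_jsonb   -- only one "OPEN" entry survives
FROM  (VALUES ('OPEN', 1), ('OPEN', 2)) AS t(k, v);
```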

I found that this query returns data in the format shown above:

WITH grouped_events1 AS (
    SELECT e.created_at::DATE AS created_date,
        location_id,
        task_id,
        CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type,
        jsonb_build_object('id', e.correlation_id) ||
                jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events
    FROM events e
    JOIN tasks t on e.task_id = t.id
    WHERE type IN (0, 99)
    AND inprogress_status IN (1, 2)
    GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status
), grouped_events2 AS (
    SELECT created_date, location_id, task_type, json_agg(events) AS events
    FROM grouped_events1
    GROUP BY 1, 2, 3
)
SELECT created_date, location_id, json_object_agg(task_type, events)
FROM grouped_events2
GROUP BY 1, 2
ORDER BY 1, 2

However, the format I actually need should pair a single OPEN with a single CLOSE, like this (each OPEN with the CLOSE that immediately follows it):

created_at location_id events
2018-08-17 30 {"OPEN": {"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, "CLOSE": {"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }}
2018-08-17 30 {"OPEN": {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }, "CLOSE": { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }}
2018-08-20 30 {"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}

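Now I am trying to figure out whether I have gone in the wrong direction, because I can't see how to get from what I have to my final format.

Am I approaching this wrong? How can I get the result I am looking for?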

postgresql group-by
1 Answer

  1. Best Answer
    Erwin Brandstetter
    2022-08-09T20:57:50+08:00

    This produces your desired result:

    SELECT the_day, location_id
         , jsonb_object_agg(task_type, events || jsonb_build_object('correlation_id', correlation_id)) AS events
    FROM  (
       SELECT e.created_at::date AS the_day, e.location_id, e.correlation_id
            , count(*) FILTER (WHERE t.inprogress_status = 1)
                       OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr
            , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type     
            , jsonb_object_agg(e.type, jsonb_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events
       FROM   events e
       JOIN   tasks t on e.task_id = t.id
       WHERE  e.type IN (0, 99)
       AND    t.inprogress_status IN (1, 2)
       GROUP  BY 1, 2, e.correlation_id, t.inprogress_status
       ) sub
    GROUP  BY the_day, location_id, task_nr
    ORDER  BY the_day, location_id, task_nr;
    

    db<>fiddle here

    Except that a missing "OPEN" event at the start of a day and a missing "CLOSE" event at the end are simply left out.
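
    If explicit null placeholders are wanted for the missing side, as in the question's desired output, one possible approach (a sketch only, not part of the query above) is to put a default object in front and let the aggregated object override it. The inline row and the column names task_type and payload below are made up for illustration. In the query above, the same default object could be prepended to the outer jsonb_object_agg(...) call.

    ```sql
    -- Keys in the right operand of || replace the same keys in the left operand,
    -- so the default null remains only for a task type that is absent.
    SELECT '{"OPEN": null, "CLOSE": null}'::jsonb
           || jsonb_object_agg(task_type, payload) AS events
    FROM  (VALUES ('OPEN', '{"correlation_id": "01CNC4G1Y40000000000000008"}'::jsonb)
          ) AS t(task_type, payload);
    ```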

    I use jsonb instead of json to allow the jsonb || jsonb operator. You can cast the result to json if you really need that.
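
    As a small illustration of that operator and of the cast (the literal values 'start' and 'abc' are placeholders, not data from the question): || merges two jsonb objects into one, and the result can then be cast to json. There is no || operator for the json type.

    ```sql
    SELECT jsonb_build_object('0', 'start')
           || jsonb_build_object('correlation_id', 'abc')         AS merged          -- jsonb
         , (jsonb_build_object('0', 'start')
           || jsonb_build_object('correlation_id', 'abc'))::json  AS merged_as_json; -- json
    ```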

    The core feature is the complex expression that forms the task number:

        , count(*) FILTER (WHERE t.inprogress_status = 1)
                   OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr
    

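    Every "OPEN" task starts a new group. created_at of the row with type = 0 defines the order of tasks. Technically, this works because we can nest an aggregate function in a window function (even one with an aggregate FILTER clause).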
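
    The running count can be seen in isolation on a reduced data set. This is only a sketch: each task run from the sample data is collapsed by hand into one row with its type and start time, and the column names task_type and started_at are made up for illustration. The timestamps are written with a +00 offset, matching the JSON output in the question.

    ```sql
    -- Running count of OPEN rows in start order: every OPEN bumps the number,
    -- and a following CLOSE inherits it.
    SELECT task_type, started_at
         , count(*) FILTER (WHERE task_type = 'OPEN')
                    OVER (ORDER BY started_at) AS task_nr
    FROM  (VALUES
             ('OPEN' , timestamptz '2018-08-17 18:17:15.348629+00')
           , ('CLOSE', timestamptz '2018-08-17 18:28:00.104093+00')
           , ('OPEN' , timestamptz '2018-08-17 18:29:12.963264+00')
           , ('CLOSE', timestamptz '2018-08-17 18:50:09.725668+00')
           , ('OPEN' , timestamptz '2018-08-20 17:00:40.260430+00')
          ) AS t(task_type, started_at)
    ORDER  BY started_at;
    -- task_nr comes out as 1, 1, 2, 2, 3.
    ```

    Grouping by task_nr then puts each OPEN and the CLOSE after it into the same group.
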
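    Related answers:

    • Return counts for multiple ranges in a single SELECT statement
    • Best way to get result count before LIMIT was applied in PHP/PostgreSQL
    • Select longest continuous sequence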