AskOverflow.Dev

Deepan Kaviarasu
Asked: 2020-07-16 19:36:21 +0800 CST

Group by time interval and output source and destination station_id and count


I am stuck on a query:

CREATE TABLE public.bulk_sample (
    serial_number character varying(255),
    validation_date timestamp,  -- timestamp of entry and exit
    station_id integer,
    direction integer           -- 1 = Entry | 2 = Exit
);

INSERT INTO public.bulk_sample VALUES
  ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 08:31:58', 120, 1)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 08:50:22', 113, 2)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 10:16:56', 113, 1)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 10:47:06', 120, 2)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 16:02:12', 120, 1)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 16:47:45', 102, 2)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 19:26:38', 102, 1)
, ('019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270', '2020-02-01 20:17:24', 120, 2)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 07:58:20', 119, 1)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 08:43:35', 104, 2)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 16:38:10', 104, 1)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 17:15:01', 119, 2)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 17:42:29', 119, 1)
, ('23cc9678e8cf834decb096ba36be0efee418402bce03aab52e69026adfec7663', '2020-02-01 17:48:05', 120, 2)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 15:17:59', 120, 1)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 15:25:25', 118, 2)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 16:16:12', 118, 1)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 16:32:51', 120, 2)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 19:31:20', 120, 1)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 19:39:33', 118, 2)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 20:57:50', 118, 1)
, ('2a8f28bf0afc655210aa337aff016d33100282ac73cca660a397b924808499af', '2020-02-01 21:16:25', 120, 2)
;

I have to write a query that produces a result like this:

source | dest | Count
120    | 113  |  1
113    | 120  |  1

I tried the following code, but could not get the desired result:

SELECT serial_number
     , count(*)
     , min(validation_date) AS start_time
     , CASE WHEN count(*) > 1 THEN max(validation_date) END AS end_time
FROM  (
   SELECT serial_number, validation_date
        , count(step OR NULL) OVER (ORDER BY serial_number, validation_date) AS grp
   FROM  (
      SELECT *
           , lag(validation_date) OVER (PARTITION BY serial_number ORDER BY validation_date)
           < validation_date - interval '60 min' AS step
      FROM   table1
      WHERE  validation_date BETWEEN '2020-02-01 00:00:00' AND '2020-02-01 23:59:59'
      ) sub1
   ) sub2
GROUP  BY serial_number, grp;

The time interval between each entry and exit is roughly 55 to 60 minutes.

I also tried an inner join, but could not group by time interval inside the join:

SELECT source.station_id AS source_station ,dest.station_id AS destination_station ,source.count FROM 
    (
        SELECT serial_number,station_id,count(bulk_transaction_id) FROM table1
        WHERE 
            direction = 1 AND 
            validation_date BETWEEN '2020-02-01 00:00:00' AND '2020-02-01 23:59:59' 
        GROUP BY serial_number,station_id
    )source

 INNER JOIN 
    (
        SELECT serial_number,station_id,count(bulk_transaction_id) FROM table1
        WHERE 
            direction = 2 AND 
            validation_date BETWEEN '2020-02-01 00:00:00' AND '2020-02-01 23:59:59'
        GROUP BY serial_number,station_id
    )dest
ON source.serial_number = dest.serial_number and source.station_id <> dest.station_id

The challenge is that sometimes the entry date is null and sometimes the exit date is null.

postgresql postgresql-10

2 Answers

  1. bbaird
    2020-07-17T05:58:48+08:00

    For this you will need two things:

    1. A correlated subquery in the join condition
    2. A unique index on (serial_number, validation_date)

    After that, your query becomes:

    SELECT
      station_entry.station_id AS source
     ,station_exit.station_id AS dest
     ,COUNT(*) AS count
    FROM
      public.bulk_sample station_entry
    INNER JOIN
      public.bulk_sample station_exit
        ON station_exit.serial_number = station_entry.serial_number
            AND station_exit.validation_date =
                  (
                    SELECT
                      MIN(validation_date)
                    FROM
                      public.bulk_sample
                    WHERE
                      serial_number = station_entry.serial_number
                        AND validation_date > station_entry.validation_date
                  )
    WHERE
      station_entry.direction = 1
        AND station_exit.direction = 2  --Ensure next transaction is valid
        AND station_entry.validation_date >= '2020-02-01 00:00:00'
        AND station_entry.validation_date <= '2020-02-01 23:59:59'
        AND station_exit.validation_date <= '2020-02-01 23:59:59' --Ensure both events occurred within specified timeframe
    GROUP BY
      station_entry.station_id
     ,station_exit.station_id
    

    This should return:

    source  dest    count
    102     120     1
    104     119     1
    113     120     1
    118     120     2
    119     104     1
    119     120     1
    120     102     1
    120     113     1
    120     118     2
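As a cross-check of the pairing rule used above (each entry is matched to the same card's next event, which must be an exit), here is a minimal sketch that runs a window-function variant in SQLite through Python's sqlite3 module. It is not the query from this answer: it swaps the correlated subquery for lead(), shortens the serial numbers to A, B and C, stores timestamps as text, and uses a half-open time filter. All of these are assumptions made for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Abbreviated copy of the sample data. Serial numbers are shortened to A/B/C
# (an assumption for readability); timestamps are stored as ISO-8601 text.
ROWS = [
    ("A", "2020-02-01 08:31:58", 120, 1), ("A", "2020-02-01 08:50:22", 113, 2),
    ("A", "2020-02-01 10:16:56", 113, 1), ("A", "2020-02-01 10:47:06", 120, 2),
    ("A", "2020-02-01 16:02:12", 120, 1), ("A", "2020-02-01 16:47:45", 102, 2),
    ("A", "2020-02-01 19:26:38", 102, 1), ("A", "2020-02-01 20:17:24", 120, 2),
    ("B", "2020-02-01 07:58:20", 119, 1), ("B", "2020-02-01 08:43:35", 104, 2),
    ("B", "2020-02-01 16:38:10", 104, 1), ("B", "2020-02-01 17:15:01", 119, 2),
    ("B", "2020-02-01 17:42:29", 119, 1), ("B", "2020-02-01 17:48:05", 120, 2),
    ("C", "2020-02-01 15:17:59", 120, 1), ("C", "2020-02-01 15:25:25", 118, 2),
    ("C", "2020-02-01 16:16:12", 118, 1), ("C", "2020-02-01 16:32:51", 120, 2),
    ("C", "2020-02-01 19:31:20", 120, 1), ("C", "2020-02-01 19:39:33", 118, 2),
    ("C", "2020-02-01 20:57:50", 118, 1), ("C", "2020-02-01 21:16:25", 120, 2),
]

# For every row, look at the next event of the same card. Keep only
# entry rows (direction = 1) whose next event is an exit (direction = 2).
QUERY = """
SELECT station_id AS source, next_station AS dest, count(*) AS cnt
FROM  (
   SELECT station_id, direction
        , lead(station_id) OVER (PARTITION BY serial_number ORDER BY validation_date) AS next_station
        , lead(direction)  OVER (PARTITION BY serial_number ORDER BY validation_date) AS next_direction
   FROM   bulk_sample
   WHERE  validation_date >= '2020-02-01 00:00:00'
   AND    validation_date <  '2020-02-02 00:00:00'
   ) sub
WHERE  direction = 1
AND    next_direction = 2
GROUP  BY station_id, next_station
ORDER  BY station_id, next_station
"""


def pair_counts(rows):
    """Load rows into an in-memory SQLite table and return (source, dest, cnt) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE bulk_sample ("
        "serial_number TEXT, validation_date TEXT, station_id INTEGER, direction INTEGER)"
    )
    con.executemany("INSERT INTO bulk_sample VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for source, dest, cnt in pair_counts(ROWS):
    print(source, dest, cnt)
```

On the sample data this reproduces the nine rows listed above. Because the time filter runs before the window functions, an entry whose exit falls outside the time frame has no next event and is dropped.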
    
  2. Best Answer
    Erwin Brandstetter
    2020-07-17T16:20:58+08:00

    This should be simplest and fastest, as long as transactions per serial_number never overlap:

    WITH cte AS (
       SELECT serial_number, validation_date, station_id, direction
            , row_number() OVER (PARTITION BY serial_number ORDER BY validation_date) AS rn
       FROM   bulk_sample
       WHERE  validation_date >= '2020-02-01'  -- ①
       AND    validation_date <  '2020-02-02'  -- entry & exit must be within time frame
       )
    SELECT s.station_id AS source, d.station_id AS dest, count(*)
    FROM   cte s
    JOIN   cte d USING (serial_number)
    WHERE  s.direction = 1
    AND    d.rn = s.rn + 1
    GROUP  BY 1, 2
    ORDER  BY 1, 2;  -- optional sort order
    

    db<>fiddle here

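    ① I rewrote the WHERE condition to get everything for February 1, 2020 in optimal fashion. BETWEEN is almost always the wrong tool for timestamp ranges. See:

    • How to add a day/night indicator to a timestamp column?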

    Also, '2020-02-01' is a perfectly valid timestamp constant: when the time component is missing, 00:00:00 is assumed.
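To illustrate the point about BETWEEN, here is a small Python sketch that compares a closed range ending at 23:59:59 with a half-open range ending at the next midnight. The three timestamps are hypothetical and not taken from the sample data; the second one carries a fractional second, which a timestamp column can store.

```python
from datetime import datetime

# Hypothetical events: one ordinary, one in the last second of the day
# with a fractional part, one exactly at the following midnight.
events = [
    datetime(2020, 2, 1, 8, 31, 58),
    datetime(2020, 2, 1, 23, 59, 59, 500000),  # 23:59:59.5
    datetime(2020, 2, 2, 0, 0, 0),
]

day_start = datetime(2020, 2, 1)                # time defaults to 00:00:00
last_second = datetime(2020, 2, 1, 23, 59, 59)
next_day = datetime(2020, 2, 2)

# BETWEEN-style closed range: lower <= x <= upper
closed = [e for e in events if day_start <= e <= last_second]

# Half-open range: lower <= x < upper
half_open = [e for e in events if day_start <= e < next_day]

print(len(closed))     # 1: the 23:59:59.5 event is missed
print(len(half_open))  # 2: all of 2020-02-01, nothing from 2020-02-02
```

Moving the BETWEEN upper bound to the next midnight would not help either, since it would then include the event at exactly 2020-02-02 00:00:00.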

    When retrieving results for a given time frame, a plain btree index on (validation_date) is optimal. For the complete table, an index on (serial_number, validation_date) would be more helpful.

    validation_date IS NULL?

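    The query keeps working while only the last destination per serial_number in the given time frame has validation_date IS NULL, because NULL values happen to sort last in default ascending order. But it breaks with validation_date IS NULL anywhere else. You will have to define more closely where those NULL values can pop up and how exactly to deal with them.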

    (2x) uuid instead of varchar(255) for serial_number?

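    Your serial_number appears to be a hexadecimal number with exactly 64 digits. If so, varchar(255) is a poor choice. See:

    • Should I add an arbitrary length limit to VARCHAR columns?

    Also, a single uuid (32 hex digits) might be enough. If all 64 hex digits are needed, still consider 2 uuid columns. Smaller, faster, safer. Consider: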

    SELECT *
         , replace(uuid1::text || uuid2::text, '-', '') AS reverse_engineered
         , replace(uuid1::text || uuid2::text, '-', '') = serial_number AS identical
         , pg_column_size(serial_number) AS varchar_size
         , pg_column_size(uuid1) + pg_column_size(uuid2) AS uuid_size
    FROM  (
       SELECT serial_number
            , left(serial_number, 32)::uuid  AS uuid1
            , right(serial_number, 32)::uuid AS uuid2
       FROM   bulk_sample
       ) sub;
    

    db<>fiddle here
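The same round trip can be sketched outside the database with Python's uuid module, for example to validate serial numbers in application code before loading them. The helper names split_serial and join_serial are hypothetical, and the sketch assumes lowercase hex input, as in the sample data.

```python
import uuid

# First serial number from the sample data (64 lowercase hex digits).
serial_number = "019b5526970fcfcf7813e9fe1acf8a41bcaf5a5a5c10870b3211d82f63fbf270"


def split_serial(serial):
    """Split a 64-digit hex string into two UUIDs (first and last 32 digits)."""
    if len(serial) != 64:
        raise ValueError("expected exactly 64 hex digits")
    return uuid.UUID(hex=serial[:32]), uuid.UUID(hex=serial[32:])


def join_serial(uuid1, uuid2):
    """Rebuild the 64-digit hex string from the two halves."""
    return uuid1.hex + uuid2.hex


uuid1, uuid2 = split_serial(serial_number)
print(uuid1)
print(uuid2)
print(join_serial(uuid1, uuid2) == serial_number)  # True
print(len(uuid1.bytes) + len(uuid2.bytes), "bytes vs", len(serial_number), "characters")
```

Each UUID holds 16 bytes, so the pair carries the full 64 hex digits in 32 bytes of payload.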

    See:

    • What is the optimal data type for an MD5 field?
    • Would index lookup be noticeably faster with char vs varchar when all values are 36 chars