
Question

Asked: 2023-11-28 13:58:00 +0800 CST

Aggregating SCD Type 2 data "as of" each day


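Question

When working with SCD Type 2 data, it is easy to view the state of a table "as of" a given point in time by using its date columns (e.g. valid_from and valid_to). For example: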

select * from table
where '2023-11-01' between valid_from and valid_to

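You can then aggregate by grouping or with window functions.

But what if I want to repeat this for every date in a range (e.g. every day)? I don't need to aggregate across those dates, only within each date.

Example

Say I have a table that tracks the quantity of people for each reservation_id. As the table tracks, both quantity and reservation_status can change over time. Each reservation is attached to an event_id. event_date is included to help limit the scope if needed (see the assumptions below).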

reservation_id	reservation_status	quantity	event_id	event_date	valid_from	valid_to
1	active	4	100	2023-05-25	2023-01-01	2023-01-02
2	active	2	200	2024-01-07	2023-01-01	2023-01-03
3	active	7	100	2023-05-25	2023-01-02	9999-12-31
4	active	1	200	2024-01-07	2023-01-02	9999-12-31
1	active	5	100	2023-05-25	2023-01-03	9999-12-31
5	active	8	100	2023-05-25	2023-01-03	9999-12-31
2	cancelled	2	200	2024-01-07	2023-01-04	9999-12-31
6	active	3	100	2023-05-25	2023-01-06	9999-12-31

Using PostgreSQL because BigQuery is harder to test (db-fiddle / SQL):

CREATE TABLE Reservations (
  "reservation_id" INTEGER,
  "reservation_status" VARCHAR(9),
  "quantity" INTEGER,
  "event_id" INTEGER,
  "event_date" DATE,
  "valid_from" DATE,
  "valid_to" DATE
);

INSERT INTO Reservations
  ("reservation_id", "reservation_status", "quantity", "event_id", "event_date", "valid_from", "valid_to")
VALUES
  ('1', 'active', '4', '100', '2023-05-25', '2023-01-01', '2023-01-02'),
  ('2', 'active', '2', '200', '2024-01-07', '2023-01-01', '2023-01-03'),
  ('3', 'active', '7', '100', '2023-05-25', '2023-01-02', '9999-12-31'),
  ('4', 'active', '1', '200', '2024-01-07', '2023-01-02', '9999-12-31'),
  ('1', 'active', '5', '100', '2023-05-25', '2023-01-03', '9999-12-31'),
  ('5', 'active', '8', '100', '2023-05-25', '2023-01-03', '9999-12-31'),
  ('2', 'cancelled', '2', '200', '2024-01-07', '2023-01-04', '9999-12-31'),
  ('6', 'active', '3', '100', '2023-05-25', '2023-01-06', '9999-12-31');

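While this will ultimately be for BigQuery, an answer in any dialect will be accepted as long as it is generic.

Assumptions

The "as of" dates can be a list or a range based on the min/max of valid_from
A valid_to of 9999-12-31 marks the most current data
All reservations for a given event will be made between event_date - INTERVAL '2 years' and event_date. This changes nothing for this example, but might be useful for scaling (?)

Desired output

For each interval (day), I'd like to know the sum of quantity, grouped by the as-of date, event_id, and reservation_status.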

as_of_date	event_id	reservation_status	sum_quantity
2023-01-01	100	active	4
2023-01-01	200	active	2
2023-01-02	100	active	11
2023-01-02	200	active	2
2023-01-03	100	active	20
2023-01-03	200	active	3
2023-01-04	100	active	20
2023-01-04	200	active	1
2023-01-04	200	cancelled	2
2023-01-06	100	active	23

(Rough estimate of the row values; they would differ if the full date range were used.)

I essentially want to do the following:

/* Invalid SQL, just for conceptual purposes */

-- Given a list of dates, for each "date":
select
  event_id,
  reservation_status,
  sum(quantity)
from table
where {{date}} between valid_from and valid_to
group by
  event_id,
  reservation_status
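
As a concrete instance of the template above, here is a sketch with the date placeholder filled in for a single day and run against the sample Reservations table from the example. The literal date 2023-01-03 is only an illustrative choice, and the sum_quantity alias is added for readability.

```sql
-- One "as of" date, inclusive on both ends (BETWEEN), PostgreSQL syntax.
-- Table and column names come from the CREATE TABLE in the example above.
select
  event_id,
  reservation_status,
  sum(quantity) as sum_quantity
from Reservations
where date '2023-01-03' between valid_from and valid_to
group by
  event_id,
  reservation_status
order by
  event_id,
  reservation_status;

-- With the sample rows this returns:
--   100 | active | 20
--   200 | active | 3
```

The open problem is then how to produce this result once per day without writing one query per date.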

I believe this could be done with a procedural language, e.g. a for loop, but I feel I'm overthinking this and struggling to combine simpler concepts.

1 Answer


camtech · Answer 1 · 2023-11-29T19:53:56+08:00

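Turns out I was overthinking it. Thanks to this post on Stack Overflow for reminding me of the basics, specifically that you can join on comparisons combined with and, and to SergeyA's comment: just make a list of dates and join to it.

Here is a complete solution (db-fiddle):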

with

-- Generate range of days based on valid_from
-- Or use a calendar table/date dimension
all_dates as (
  select day::date
  from generate_series(
      (select min(valid_from) from Reservations),
      (select max(valid_from) from Reservations),
      '1 day'
  ) day
),

-- Quantities as of each day
quantity_as_of as (
  select
    day as as_of_date,
    Reservations.event_id,
    Reservations.reservation_status,
    sum(Reservations.quantity) as sum_quantity
  from all_dates as ad
  join Reservations
    -- valid_to is treated as exclusive here, unlike BETWEEN in the question
    on valid_from <= ad.day and ad.day < valid_to
  group by
    ad.day,
    Reservations.event_id,
    Reservations.reservation_status
)

-- Sort in the outer query; an ORDER BY inside a CTE is not guaranteed to carry through
select * from quantity_as_of
order by as_of_date, event_id

as_of_date	event_id	reservation_status	sum_quantity
2023-01-01	100	active	4
2023-01-01	200	active	2
2023-01-02	100	active	7
2023-01-02	200	active	3
2023-01-03	100	active	20
2023-01-03	200	active	1
2023-01-04	100	active	20
2023-01-04	200	cancelled	2
2023-01-04	200	active	1
2023-01-05	100	active	20
2023-01-05	200	active	1
2023-01-05	200	cancelled	2
2023-01-06	100	active	23
2023-01-06	200	cancelled	2
2023-01-06	200	active	1
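
Since the question ultimately targets BigQuery, the same idea might be sketched there as follows. This is an untested adaptation rather than part of the solution above: it assumes a Reservations table with the same columns (DATE-typed valid_from and valid_to) is reachable without a dataset prefix, and it swaps PostgreSQL's generate_series for GENERATE_DATE_ARRAY with UNNEST.

```sql
-- BigQuery Standard SQL sketch (assumption: Reservations resolves in the default dataset).
with all_dates as (
  -- One row per day between the earliest and latest valid_from
  select as_of_date
  from unnest(
    generate_date_array(
      (select min(valid_from) from Reservations),
      (select max(valid_from) from Reservations),
      interval 1 day
    )
  ) as as_of_date
)

select
  ad.as_of_date,
  r.event_id,
  r.reservation_status,
  sum(r.quantity) as sum_quantity
from all_dates as ad
join Reservations as r
  -- Same half-open validity window as the PostgreSQL query: valid_to is exclusive
  on r.valid_from <= ad.as_of_date and ad.as_of_date < r.valid_to
group by
  ad.as_of_date,
  r.event_id,
  r.reservation_status
order by
  as_of_date,
  event_id;
```

If an existing calendar table or date dimension is available, it could stand in for the all_dates CTE, with the join left unchanged.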
