Question

Ryan

Asked: 2017-11-07 01:04:15 +0800 CST

Calculating period start and end dates

My table:

Date       Employee     Status
-----------------------------
20171106   001          At work
20171107   001          Sick leave
20171108   001          At work
20171109   001          At work
20171111   001          Sick leave (A gap here)
20171112   001          Sick leave
20171115   001          At work (Another gap)
20171116   001          At work

Expected result:

Employee      Status        StartDT                       EndDT
-------------------------------------------------------------------
001           At work       Some time in the history      20171106
001           Sick leave    20171107                      20171107
001           At work       20171108                      20171109
001           Sick leave    20171111                      20171112
001           At work       20171115                      20171116

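Logic: we reorganize the source table by status rather than by date, so gaps in the dates do not matter and should be ignored.

How can I do this in Teradata 15?

Note: select min(Date), max(Date) group by employee, status; will not work, because the status can change between two "At work" stretches.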

2 Answers

dnoeth · Answer 1 · 2017-11-08T03:03:03+08:00

The simplest solution is to normalize a period:

SELECT NORMALIZE Employee, Status, PERIOD(date,date+1) AS pd
FROM mytable
ORDER BY Employee, pd

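NORMALIZE is a little-known syntax that combines periods which meet or overlap; you just need to create a one-day period out of your date column. Because the result is a period, whose end is exclusive, the output is slightly different: the end date is +1 compared to your expected result.

To fix this, you can split the period back into separate columns: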

SELECT Employee, Status, Begin(pd), 
   Last(pd) -- last included date, i.e. expected EndDT
FROM
 (
   SELECT NORMALIZE Employee, Status, PERIOD(date,date+1) AS pd
   FROM myTable
 ) AS dt
ORDER BY 1,pd
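
To illustrate what NORMALIZE does with these one-day periods, here is a minimal Python sketch (not Teradata code). It builds half-open periods [date, date+1), merges those that meet or overlap per employee and status, and reports the last included date, as Last(pd) does. The function name normalize and the (date, employee, status) tuple layout are assumptions made for this illustration only.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Sample data from the question as (date, employee, status) tuples.
ROWS = [
    (date(2017, 11, 6), "001", "At work"),
    (date(2017, 11, 7), "001", "Sick leave"),
    (date(2017, 11, 8), "001", "At work"),
    (date(2017, 11, 9), "001", "At work"),
    (date(2017, 11, 11), "001", "Sick leave"),
    (date(2017, 11, 12), "001", "Sick leave"),
    (date(2017, 11, 15), "001", "At work"),
    (date(2017, 11, 16), "001", "At work"),
]


def normalize(rows):
    """Merge one-day periods that meet or overlap, per (employee, status)."""
    runs_by_key = {}
    for d, emp, status in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        begin, end = d, d + ONE_DAY  # half-open period, end is exclusive
        runs = runs_by_key.setdefault((emp, status), [])
        if runs and begin <= runs[-1][1]:
            # The new period meets or overlaps the previous one: extend it.
            runs[-1][1] = max(runs[-1][1], end)
        else:
            runs.append([begin, end])
    merged = [
        (emp, status, begin, end - ONE_DAY)  # end - 1 day = last included date
        for (emp, status), runs in runs_by_key.items()
        for begin, end in runs
    ]
    return sorted(merged, key=lambda r: (r[0], r[2]))


result = normalize(ROWS)
for row in result:
    print(row)
```

On the sample data this yields five rows, with 20171108 and 20171109 merged into one "At work" period and 20171111 and 20171112 merged into one "Sick leave" period.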

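Getting that "Some time in the history" value is more complicated and needs additional calculation; you should check whether you really need it.

Another, more classic solution calculates groups of rows with consecutive values: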

SELECT Employee, Status, Min(date), Max(date)
FROM
 (
   SELECT Employee, Status, date, 
   -- this is the tricky part: the difference between a monotonic sequence (row_number)
   -- and another monotonic sequence with gaps (date)
   -- is constant when there's no gap
      date - Row_Number() 
             Over (PARTITION BY Employee, Status
                   ORDER BY date) AS grp
   FROM myTable
 ) AS dt
GROUP BY Employee, Status, grp
ORDER BY 3
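
The grouping trick in the query above can be traced outside the database. The following Python sketch (an illustration, not Teradata code) numbers the dates within each employee and status, subtracts that number of days from each date, and groups on the difference. The function name islands and the tuple layout are assumptions, and it assumes at most one row per employee per day.

```python
from datetime import date, timedelta

# Sample data from the question as (date, employee, status) tuples.
ROWS = [
    (date(2017, 11, 6), "001", "At work"),
    (date(2017, 11, 7), "001", "Sick leave"),
    (date(2017, 11, 8), "001", "At work"),
    (date(2017, 11, 9), "001", "At work"),
    (date(2017, 11, 11), "001", "Sick leave"),
    (date(2017, 11, 12), "001", "Sick leave"),
    (date(2017, 11, 15), "001", "At work"),
    (date(2017, 11, 16), "001", "At work"),
]


def islands(rows):
    """Group consecutive dates per (employee, status) via date - row_number."""
    dates_by_key = {}
    for d, emp, status in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        dates_by_key.setdefault((emp, status), []).append(d)

    groups = {}
    for (emp, status), dates in dates_by_key.items():
        for row_number, d in enumerate(dates, start=1):
            # Consecutive dates share the same difference; a gap changes it.
            grp = d - timedelta(days=row_number)
            groups.setdefault((emp, status, grp), []).append(d)

    result = [
        (emp, status, min(ds), max(ds))
        for (emp, status, _grp), ds in groups.items()
    ]
    return sorted(result, key=lambda r: r[2])  # like ORDER BY 3


result = islands(ROWS)
for row in result:
    print(row)
```

For the "At work" dates 06, 08, 09, 15, 16 the differences are Nov 5, Nov 6, Nov 6, Nov 11, Nov 11, so the rows fall into three groups.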

Both solutions will result in an additional row when you add

20171120   001          At work
...
001           At work       20171115                      20171116
001           At work       20171120                      20171120

If you want it merged with the previous row

001           At work       20171115                      20171120

it gets a bit more complicated, too...
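
For reference, the merged behaviour can be described as "start a new group only when the status changes, regardless of date gaps". The Python sketch below shows that logic only; it is not the Teradata query this answer alludes to, and the name status_runs and the tuple layout are assumptions for illustration.

```python
from datetime import date
from itertools import groupby

# Sample data from the question plus the extra 20171120 row.
ROWS = [
    (date(2017, 11, 6), "001", "At work"),
    (date(2017, 11, 7), "001", "Sick leave"),
    (date(2017, 11, 8), "001", "At work"),
    (date(2017, 11, 9), "001", "At work"),
    (date(2017, 11, 11), "001", "Sick leave"),
    (date(2017, 11, 12), "001", "Sick leave"),
    (date(2017, 11, 15), "001", "At work"),
    (date(2017, 11, 16), "001", "At work"),
    (date(2017, 11, 20), "001", "At work"),
]


def status_runs(rows):
    """Collapse rows into runs of unchanged status, ignoring date gaps."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # employee, then date
    runs = []
    # groupby only merges adjacent rows, so a status that recurs later
    # starts a new run instead of joining the earlier one.
    for (emp, status), run in groupby(ordered, key=lambda r: (r[1], r[2])):
        dates = [r[0] for r in run]
        runs.append((emp, status, dates[0], dates[-1]))
    return runs


result = status_runs(ROWS)
for row in result:
    print(row)
```

With the extra row included, the last "At work" run spans 20171115 to 20171120, while the earlier runs match the expected result in the question.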

markp-fuso · Answer 2 · 2017-11-08T05:54:10+08:00

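I don't have access to a Teradata system (and my Teradata 'knowledge' is very old/outdated), so the following code was tested against SQL Server.

NOTE: I'm assuming converting it to Teradata syntax is a (relatively) minor issue...

We'll start with the table and sample data: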

create table emp_status
([Date]     date
,Employee   varchar(10)
,Status     varchar(30));

insert into emp_status values
('20171106','001','At work'),
('20171107','001','Sick leave'),
('20171108','001','At work'),
('20171109','001','At work'),
('20171111','001','Sick leave'),
('20171112','001','Sick leave'),
('20171115','001','At work'),
('20171116','001','At work');

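To make the queries easier to write, we'll expand our source data by adding temporary 'start' and 'end' records for each Employee.

The 'start' record duplicates the earliest Status but with Date=18000101, while the 'end' record sets Status='BOGUS' and Date=99991231: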

-- with expanded as ...

select [Date],
       Employee,
       Status
from   emp_status

union all

-- add our starting/history record

select distinct
       '18000101',
       es1.Employee,
       es1.Status
from   emp_status es1
where  [Date] = (select min(es2.[Date])
                 from   emp_status es2
                 where  es2.Employee = es1.Employee)

union all

-- add our ending record

select distinct
       '99991231',
       Employee,
       'BOGUS'
from   emp_status

 Date                | Employee | Status    
 ------------------- | -------- | ----------
 01/01/1800 00:00:00 | 001      | At work     <--- temporary 'start' record
 06/11/2017 00:00:00 | 001      | At work   
 07/11/2017 00:00:00 | 001      | Sick leave
 08/11/2017 00:00:00 | 001      | At work   
 09/11/2017 00:00:00 | 001      | At work   
 11/11/2017 00:00:00 | 001      | Sick leave
 12/11/2017 00:00:00 | 001      | Sick leave
 15/11/2017 00:00:00 | 001      | At work   
 16/11/2017 00:00:00 | 001      | At work   
 31/12/9999 00:00:00 | 001      | BOGUS       <-- temporary 'end' record

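The next step is to use our expanded source data to give each data record a range.

While the start of each range is simply Date, the upper limit for each record is determined by a) finding the Date of the next (different) Status and then b) subtracting 1 day from said Date: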

-- with ranges   as ...

select curr.Employee,
       curr.Status,
       curr.[Date],
       dateadd(day,-1,min(change.[Date])) as maxDate

from   expanded curr

left
join   expanded change

on     curr.Employee  = change.Employee
and    curr.Status   != change.Status
and    curr.[Date]    < change.[Date]

group by curr.Employee,
         curr.[Date],
         curr.Status

 Employee | Status     | Date                | maxDate            
 -------- | ---------- | ------------------- | -------------------
 001      | At work    | 01/01/1800 00:00:00 | 06/11/2017 00:00:00
 001      | At work    | 06/11/2017 00:00:00 | 06/11/2017 00:00:00
 001      | Sick leave | 07/11/2017 00:00:00 | 07/11/2017 00:00:00
 001      | At work    | 08/11/2017 00:00:00 | 10/11/2017 00:00:00
 001      | At work    | 09/11/2017 00:00:00 | 10/11/2017 00:00:00
 001      | Sick leave | 11/11/2017 00:00:00 | 14/11/2017 00:00:00
 001      | Sick leave | 12/11/2017 00:00:00 | 14/11/2017 00:00:00
 001      | At work    | 15/11/2017 00:00:00 | 30/12/9999 00:00:00
 001      | At work    | 16/11/2017 00:00:00 | 30/12/9999 00:00:00
 001      | BOGUS      | 31/12/9999 00:00:00 | null
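
The expand and range steps above can be traced with a short Python sketch (an illustration of the logic, not the SQL that was tested). The function names expand and ranges and the (date, employee, status) tuple layout are assumptions introduced here.

```python
from datetime import date, timedelta

# Sample data from the question as (date, employee, status) tuples.
ROWS = [
    (date(2017, 11, 6), "001", "At work"),
    (date(2017, 11, 7), "001", "Sick leave"),
    (date(2017, 11, 8), "001", "At work"),
    (date(2017, 11, 9), "001", "At work"),
    (date(2017, 11, 11), "001", "Sick leave"),
    (date(2017, 11, 12), "001", "Sick leave"),
    (date(2017, 11, 15), "001", "At work"),
    (date(2017, 11, 16), "001", "At work"),
]


def expand(rows):
    """Add a temporary 'start' and 'end' record for each employee."""
    extra = []
    for emp in {r[1] for r in rows}:
        earliest = min(r for r in rows if r[1] == emp)  # tuples sort by date first
        extra.append((date(1800, 1, 1), emp, earliest[2]))
        extra.append((date(9999, 12, 31), emp, "BOGUS"))
    return sorted(rows + extra)


def ranges(expanded):
    """For each row, maxDate = date of the next different status minus 1 day."""
    out = []
    for d, emp, status in expanded:
        later = [
            d2 for d2, emp2, status2 in expanded
            if emp2 == emp and status2 != status and d2 > d
        ]
        # Like the left join: no later row with a different status gives None.
        max_date = min(later) - timedelta(days=1) if later else None
        out.append((emp, status, d, max_date))
    return out


result = ranges(expand(ROWS))
for row in result:
    print(row)
```

Rows that share a maxDate belong to the same stretch of unchanged status, which is what the final grouping step relies on.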

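The last part is to group these ranges by maxDate, discard the Status=BOGUS record, and do some data conversion to 'pretty print' the results per the desired output shown in the question.

We do need to join back to our expanded data to get a valid EndDt (ranges.maxDate is not necessarily a valid date, since we just subtracted a day without verifying that said date is an actual Date value):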

-- with expanded as ...
-- with ranges   as ...

select  r.Employee,
        r.Status,
        case min(r.[Date])  
             when '18000101'
             then 'Some time in the history'
             else convert(varchar(30),min(r.[Date]),112)
        end                                                 as StartDT,
        convert(varchar(8),max(e.[Date]),112)               as EndDT

from   ranges r
join   expanded e

on     r.Employee  = e.Employee
and    r.Status    = e.Status
and    r.[Date]    = e.[Date]
and    r.maxDate  >= e.[Date]

and    r.Status   != 'BOGUS'

group by r.Employee,
         r.Status,
         r.maxDate

order by 1,4

 Employee | Status     | StartDT                  | EndDT
 -------- | ---------- | ------------------------ | --------
 001      | At work    | Some time in the history | 20171106
 001      | Sick leave | 20171107                 | 20171107
 001      | At work    | 20171108                 | 20171109
 001      | Sick leave | 20171111                 | 20171112
 001      | At work    | 20171115                 | 20171116

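Here's a fiddle for the above.

There may be a more efficient way to do this, and I may think of something else once this has percolated in the background for a while, but for now I wanted to get this (brute force?) idea jotted down... perhaps someone can use it as a starting point for a more efficient approach...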
