死锁的主要原因是什么，可以预防吗？

Question

Vladimir Baranov

Asked: 2016-04-23 05:40:44 +0800 CST2016-04-23 05:40:44 +0800 CST 2016-04-23 05:40:44 +0800 CST

将每日计划分组到 [开始日期; 结束日期] 与工作日列表的间隔

772

我需要在两个系统之间转换数据。

第一个系统将日程表存储为一个简单的日期列表。计划中包含的每个日期都是一行。日期顺序中可能存在各种间隙（周末、公共假期和较长的停顿，一周中的某些日子可能会被排除在日程表之外）。完全不能有空档，甚至可以包括周末。时间表最长可达 2 年。通常是几周之久。

这是一个简单的计划示例，该计划跨越两周，不包括周末（下面的脚本中有更复杂的示例）：

+----+------------+------------+---------+--------+
| ID | ContractID |     dt     | dowChar | dowInt |
+----+------------+------------+---------+--------+
| 10 |          1 | 2016-05-02 | Mon     |      2 |
| 11 |          1 | 2016-05-03 | Tue     |      3 |
| 12 |          1 | 2016-05-04 | Wed     |      4 |
| 13 |          1 | 2016-05-05 | Thu     |      5 |
| 14 |          1 | 2016-05-06 | Fri     |      6 |
| 15 |          1 | 2016-05-09 | Mon     |      2 |
| 16 |          1 | 2016-05-10 | Tue     |      3 |
| 17 |          1 | 2016-05-11 | Wed     |      4 |
| 18 |          1 | 2016-05-12 | Thu     |      5 |
| 19 |          1 | 2016-05-13 | Fri     |      6 |
+----+------------+------------+---------+--------+

ID是唯一的，但不一定是顺序的（它是主键）。每个合约中的日期都是唯一的（在上有唯一的索引(ContractID, dt)）。

第二个系统将时间表存储为带有作为时间表一部分的工作日列表的间隔。每个间隔由其开始日期和结束日期（包括）和日程表中包含的工作日列表定义。在这种格式中，您可以有效地定义重复的每周模式，例如周一至周三，但是当模式被打乱时（例如公共假期）就会变得很痛苦。

上面的简单示例如下所示：

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          1 | 2016-05-02 | 2016-05-13 |       10 | Mon,Tue,Wed,Thu,Fri, |
+------------+------------+------------+----------+----------------------+

[StartDT;EndDT]属于同一合约的区间不应重叠。

我需要将来自第一个系统的数据转换为第二个系统使用的格式。目前，我正在 C# 中为单个给定合同在客户端解决此问题，但我想在服务器端的 T-SQL 中执行此操作，以便在服务器之间进行批量处理和导出/导入。最有可能的是，它可以使用 CLR UDF 来完成，但在这个阶段我不能使用 SQLCLR。

这里的挑战是使间隔列表尽可能短且人性化。

例如，这个时间表：

+-----+------------+------------+---------+--------+
| ID  | ContractID |     dt     | dowChar | dowInt |
+-----+------------+------------+---------+--------+
| 223 |          2 | 2016-05-05 | Thu     |      5 |
| 224 |          2 | 2016-05-06 | Fri     |      6 |
| 225 |          2 | 2016-05-09 | Mon     |      2 |
| 226 |          2 | 2016-05-10 | Tue     |      3 |
| 227 |          2 | 2016-05-11 | Wed     |      4 |
| 228 |          2 | 2016-05-12 | Thu     |      5 |
| 229 |          2 | 2016-05-13 | Fri     |      6 |
| 230 |          2 | 2016-05-16 | Mon     |      2 |
| 231 |          2 | 2016-05-17 | Tue     |      3 |
+-----+------------+------------+---------+--------+

应该变成这样：

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          2 | 2016-05-05 | 2016-05-17 |        9 | Mon,Tue,Wed,Thu,Fri, |
+------------+------------+------------+----------+----------------------+

，不是这个：

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          2 | 2016-05-05 | 2016-05-06 |        2 | Thu,Fri,             |
|          2 | 2016-05-09 | 2016-05-13 |        5 | Mon,Tue,Wed,Thu,Fri, |
|          2 | 2016-05-16 | 2016-05-17 |        2 | Mon,Tue,             |
+------------+------------+------------+----------+----------------------+

我试图对gaps-and-islands这个问题应用一种方法。我试着分两次做。在第一遍中，我找到了简单连续天的岛屿，即岛屿的末端是天序列中的任何间隙，无论是周末、公共假期还是其他什么。对于每个这样找到的岛，我建立一个以逗号分隔的 distinct 列表WeekDays。在第二遍中，我小组通过查看周数序列中的差距或WeekDays.

使用这种方法，每个部分周都会以一个额外的间隔结束，如上所示，因为即使周数是连续的，也会WeekDays发生变化。此外，一周内可能存在定期间隔（参见ContractID=3示例数据，其中仅包含的数据Mon,Wed,Fri,），并且这种方法会在这样的时间表中为每一天生成单独的间隔。从好的方面来说，如果日程安排完全没有任何间隙（参见ContractID=7包含周末的示例数据），它会生成一个间隔，在这种情况下，开始周或结束周是否部分无关紧要。

请查看下面脚本中的其他示例，以更好地了解我所追求的。您可以看到，周末经常被排除在外，但一周中的任何其他日子也可能被排除在外。仅在示例 3Mon中，Wed并且Fri是计划的一部分。此外，还可以包括周末，如示例 7。该解决方案应平等对待一周中的所有日子。一周中的任何一天都可以包含在日程表中或排除在日程表之外。

要验证生成的间隔列表是否正确描述了给定的时间表，您可以使用以下伪代码：

循环遍历所有间隔
对于每个间隔，循环遍历开始日期和结束日期（包括）之间的所有日历日期。
对于每个日期，检查其星期几是否列在WeekDays. 如果是，则该日期包含在日程表中。

希望这可以阐明在什么情况下应该创建新的间隔。在示例 4 和 52016-05-09中，从日程表的中间删除了一个星期一 ( )，这样的日程表不能由单个间隔表示。在示例 6 中，时间表中有很长的间隔，因此需要两个间隔。

间隔代表计划中的每周模式，当模式被中断/更改时，必须添加新的间隔。例 11 前三周有一个模式Tue，然后这个模式变为Thu。因此，我们需要两个间隔来描述这样的时间表。

我目前使用的是 SQL Server 2008，所以解决方案应该在这个版本中工作。如果 SQL Server 2008 的解决方案可以使用更高版本的功能进行简化/改进，那将是一个奖励，请同时展示它。

我有一个Calendar表格（日期列表）和Numbers表格（从 1 开始的整数列表），所以如果需要，可以使用它们。也可以创建临时表并有多个查询分阶段处理数据。但是，算法中的阶段数必须固定，游标和显式WHILE循环都不行。

示例数据和预期结果的脚本

-- @Src is sample data
-- @Dst is expected result

DECLARE @Src TABLE (ID int PRIMARY KEY, ContractID int, dt date, dowChar char(3), dowInt int);
INSERT INTO @Src (ID, ContractID, dt, dowChar, dowInt) VALUES

-- simple two weeks (without weekend)
(110, 1, '2016-05-02', 'Mon', 2),
(111, 1, '2016-05-03', 'Tue', 3),
(112, 1, '2016-05-04', 'Wed', 4),
(113, 1, '2016-05-05', 'Thu', 5),
(114, 1, '2016-05-06', 'Fri', 6),
(115, 1, '2016-05-09', 'Mon', 2),
(116, 1, '2016-05-10', 'Tue', 3),
(117, 1, '2016-05-11', 'Wed', 4),
(118, 1, '2016-05-12', 'Thu', 5),
(119, 1, '2016-05-13', 'Fri', 6),

-- a partial end of the week, the whole week, partial start of the week (without weekends)
(223, 2, '2016-05-05', 'Thu', 5),
(224, 2, '2016-05-06', 'Fri', 6),
(225, 2, '2016-05-09', 'Mon', 2),
(226, 2, '2016-05-10', 'Tue', 3),
(227, 2, '2016-05-11', 'Wed', 4),
(228, 2, '2016-05-12', 'Thu', 5),
(229, 2, '2016-05-13', 'Fri', 6),
(230, 2, '2016-05-16', 'Mon', 2),
(231, 2, '2016-05-17', 'Tue', 3),

-- only Mon, Wed, Fri are included across two weeks plus partial third week
(310, 3, '2016-05-02', 'Mon', 2),
(311, 3, '2016-05-04', 'Wed', 4),
(314, 3, '2016-05-06', 'Fri', 6),
(315, 3, '2016-05-09', 'Mon', 2),
(317, 3, '2016-05-11', 'Wed', 4),
(319, 3, '2016-05-13', 'Fri', 6),
(330, 3, '2016-05-16', 'Mon', 2),

-- a whole week (without weekend), in the second week Mon is not included
(410, 4, '2016-05-02', 'Mon', 2),
(411, 4, '2016-05-03', 'Tue', 3),
(412, 4, '2016-05-04', 'Wed', 4),
(413, 4, '2016-05-05', 'Thu', 5),
(414, 4, '2016-05-06', 'Fri', 6),
(416, 4, '2016-05-10', 'Tue', 3),
(417, 4, '2016-05-11', 'Wed', 4),
(418, 4, '2016-05-12', 'Thu', 5),
(419, 4, '2016-05-13', 'Fri', 6),

-- three weeks, but without Mon in the second week (no weekends)
(510, 5, '2016-05-02', 'Mon', 2),
(511, 5, '2016-05-03', 'Tue', 3),
(512, 5, '2016-05-04', 'Wed', 4),
(513, 5, '2016-05-05', 'Thu', 5),
(514, 5, '2016-05-06', 'Fri', 6),
(516, 5, '2016-05-10', 'Tue', 3),
(517, 5, '2016-05-11', 'Wed', 4),
(518, 5, '2016-05-12', 'Thu', 5),
(519, 5, '2016-05-13', 'Fri', 6),
(520, 5, '2016-05-16', 'Mon', 2),
(521, 5, '2016-05-17', 'Tue', 3),
(522, 5, '2016-05-18', 'Wed', 4),
(523, 5, '2016-05-19', 'Thu', 5),
(524, 5, '2016-05-20', 'Fri', 6),

-- long gap between two intervals
(623, 6, '2016-05-05', 'Thu', 5),
(624, 6, '2016-05-06', 'Fri', 6),
(625, 6, '2016-05-09', 'Mon', 2),
(626, 6, '2016-05-10', 'Tue', 3),
(627, 6, '2016-05-11', 'Wed', 4),
(628, 6, '2016-05-12', 'Thu', 5),
(629, 6, '2016-05-13', 'Fri', 6),
(630, 6, '2016-05-16', 'Mon', 2),
(631, 6, '2016-05-17', 'Tue', 3),
(645, 6, '2016-06-06', 'Mon', 2),
(646, 6, '2016-06-07', 'Tue', 3),
(647, 6, '2016-06-08', 'Wed', 4),
(648, 6, '2016-06-09', 'Thu', 5),
(649, 6, '2016-06-10', 'Fri', 6),
(655, 6, '2016-06-13', 'Mon', 2),
(656, 6, '2016-06-14', 'Tue', 3),
(657, 6, '2016-06-15', 'Wed', 4),
(658, 6, '2016-06-16', 'Thu', 5),
(659, 6, '2016-06-17', 'Fri', 6),

-- two weeks, no gaps between days at all, even weekends are included
(710, 7, '2016-05-02', 'Mon', 2),
(711, 7, '2016-05-03', 'Tue', 3),
(712, 7, '2016-05-04', 'Wed', 4),
(713, 7, '2016-05-05', 'Thu', 5),
(714, 7, '2016-05-06', 'Fri', 6),
(715, 7, '2016-05-07', 'Sat', 7),
(716, 7, '2016-05-08', 'Sun', 1),
(725, 7, '2016-05-09', 'Mon', 2),
(726, 7, '2016-05-10', 'Tue', 3),
(727, 7, '2016-05-11', 'Wed', 4),
(728, 7, '2016-05-12', 'Thu', 5),
(729, 7, '2016-05-13', 'Fri', 6),

-- no gaps between days at all, even weekends are included, with partial weeks
(805, 8, '2016-04-30', 'Sat', 7),
(806, 8, '2016-05-01', 'Sun', 1),
(810, 8, '2016-05-02', 'Mon', 2),
(811, 8, '2016-05-03', 'Tue', 3),
(812, 8, '2016-05-04', 'Wed', 4),
(813, 8, '2016-05-05', 'Thu', 5),
(814, 8, '2016-05-06', 'Fri', 6),
(815, 8, '2016-05-07', 'Sat', 7),
(816, 8, '2016-05-08', 'Sun', 1),
(825, 8, '2016-05-09', 'Mon', 2),
(826, 8, '2016-05-10', 'Tue', 3),
(827, 8, '2016-05-11', 'Wed', 4),
(828, 8, '2016-05-12', 'Thu', 5),
(829, 8, '2016-05-13', 'Fri', 6),
(830, 8, '2016-05-14', 'Sat', 7),

-- only Mon-Wed included, two weeks plus partial third week
(910, 9, '2016-05-02', 'Mon', 2),
(911, 9, '2016-05-03', 'Tue', 3),
(912, 9, '2016-05-04', 'Wed', 4),
(915, 9, '2016-05-09', 'Mon', 2),
(916, 9, '2016-05-10', 'Tue', 3),
(917, 9, '2016-05-11', 'Wed', 4),
(930, 9, '2016-05-16', 'Mon', 2),
(931, 9, '2016-05-17', 'Tue', 3),

-- only Thu-Sun included, three weeks
(1013,10,'2016-05-05', 'Thu', 5),
(1014,10,'2016-05-06', 'Fri', 6),
(1015,10,'2016-05-07', 'Sat', 7),
(1016,10,'2016-05-08', 'Sun', 1),
(1018,10,'2016-05-12', 'Thu', 5),
(1019,10,'2016-05-13', 'Fri', 6),
(1020,10,'2016-05-14', 'Sat', 7),
(1021,10,'2016-05-15', 'Sun', 1),
(1023,10,'2016-05-19', 'Thu', 5),
(1024,10,'2016-05-20', 'Fri', 6),
(1025,10,'2016-05-21', 'Sat', 7),
(1026,10,'2016-05-22', 'Sun', 1),

-- only Tue for first three weeks, then only Thu for the next three weeks
(1111,11,'2016-05-03', 'Tue', 3),
(1116,11,'2016-05-10', 'Tue', 3),
(1131,11,'2016-05-17', 'Tue', 3),
(1123,11,'2016-05-19', 'Thu', 5),
(1124,11,'2016-05-26', 'Thu', 5),
(1125,11,'2016-06-02', 'Thu', 5),

-- one week, then one week gap, then one week
(1210,12,'2016-05-02', 'Mon', 2),
(1211,12,'2016-05-03', 'Tue', 3),
(1212,12,'2016-05-04', 'Wed', 4),
(1213,12,'2016-05-05', 'Thu', 5),
(1214,12,'2016-05-06', 'Fri', 6),
(1215,12,'2016-05-16', 'Mon', 2),
(1216,12,'2016-05-17', 'Tue', 3),
(1217,12,'2016-05-18', 'Wed', 4),
(1218,12,'2016-05-19', 'Thu', 5),
(1219,12,'2016-05-20', 'Fri', 6);

SELECT ID, ContractID, dt, dowChar, dowInt
FROM @Src
ORDER BY ContractID, dt;


DECLARE @Dst TABLE (ContractID int, StartDT date, EndDT date, DayCount int, WeekDays varchar(255));
INSERT INTO @Dst (ContractID, StartDT, EndDT, DayCount, WeekDays) VALUES
(1, '2016-05-02', '2016-05-13', 10, 'Mon,Tue,Wed,Thu,Fri,'),
(2, '2016-05-05', '2016-05-17',  9, 'Mon,Tue,Wed,Thu,Fri,'),
(3, '2016-05-02', '2016-05-16',  7, 'Mon,Wed,Fri,'),
(4, '2016-05-02', '2016-05-06',  5, 'Mon,Tue,Wed,Thu,Fri,'),
(4, '2016-05-10', '2016-05-13',  4, 'Tue,Wed,Thu,Fri,'),
(5, '2016-05-02', '2016-05-06',  5, 'Mon,Tue,Wed,Thu,Fri,'),
(5, '2016-05-10', '2016-05-20',  9, 'Mon,Tue,Wed,Thu,Fri,'),
(6, '2016-05-05', '2016-05-17',  9, 'Mon,Tue,Wed,Thu,Fri,'),
(6, '2016-06-06', '2016-06-17', 10, 'Mon,Tue,Wed,Thu,Fri,'),
(7, '2016-05-02', '2016-05-13', 12, 'Sun,Mon,Tue,Wed,Thu,Fri,Sat,'),
(8, '2016-04-30', '2016-05-14', 15, 'Sun,Mon,Tue,Wed,Thu,Fri,Sat,'),
(9, '2016-05-02', '2016-05-17',  8, 'Mon,Tue,Wed,'),
(10,'2016-05-05', '2016-05-22', 12, 'Sun,Thu,Fri,Sat,'),
(11,'2016-05-03', '2016-05-17',  3, 'Tue,'),
(11,'2016-05-19', '2016-06-02',  3, 'Thu,'),
(12,'2016-05-02', '2016-05-06',  5, 'Mon,Tue,Wed,Thu,Fri,'),
(12,'2016-05-16', '2016-05-20',  5, 'Mon,Tue,Wed,Thu,Fri,');

SELECT ContractID, StartDT, EndDT, DayCount, WeekDays
FROM @Dst
ORDER BY ContractID, StartDT;

答案比较

真正的表@Src有不同的403,555行。所有答案都会产生正确的结果（至少对于我的数据而言），并且所有答案都相当快，但它们的最优性不同。生成的间隔越少越好。出于好奇，我将运行时间包括在内。主要关注的是正确和最佳的结果，而不是速度（除非花费太长时间 - 我在 10 分钟后停止了 Ziggy Crueltyfree Zeitgeister 的非递归查询）。15,857ContractIDs

+--------------------------------------------------------+-----------+---------+
|                         Answer                         | Intervals | Seconds |
+--------------------------------------------------------+-----------+---------+
| Ziggy Crueltyfree Zeitgeister                          |     25751 |    7.88 |
| While loop                                             |           |         |
|                                                        |           |         |
| Ziggy Crueltyfree Zeitgeister                          |     25751 |    8.27 |
| Recursive                                              |           |         |
|                                                        |           |         |
| Michael Green                                          |     25751 |   22.63 |
| Recursive                                              |           |         |
|                                                        |           |         |
| Geoff Patterson                                        |     26670 |    4.79 |
| Weekly gaps-and-islands with merging of partial weeks  |           |         |
|                                                        |           |         |
| Vladimir Baranov                                       |     34560 |    4.03 |
| Daily, then weekly gaps-and-islands                    |           |         |
|                                                        |           |         |
| Mikael Eriksson                                        |     35840 |    0.65 |
| Weekly gaps-and-islands                                |           |         |
+--------------------------------------------------------+-----------+---------+
| Vladimir Baranov                                       |     25751 |  121.51 |
| Cursor                                                 |           |         |
+--------------------------------------------------------+-----------+---------+

7 个回答

Voted

Ezequiel Tolnay · Answer 1 · 2016-04-27T21:09:13+08:00

This one uses a recursive CTE. Its result is identical to the example in the question. It was a nightmare to come up with... The code includes comments to ease through its convoluted logic.

SET DATEFIRST 1 -- Make Monday weekday=1

DECLARE @Ranked TABLE (RowID int NOT NULL IDENTITY PRIMARY KEY,                   -- Incremental uninterrupted sequence in the right order
                       ID int NOT NULL UNIQUE, ContractID int NOT NULL, dt date,  -- Original relevant values (ID is not really necessary)
                       WeekNo int NOT NULL, dowBit int NOT NULL);                 -- Useful to find gaps in days or weeks
INSERT INTO @Ranked
SELECT ID, ContractID, dt,
       DATEDIFF(WEEK, '1900-01-01', DATEADD(DAY, 1-DATEPART(dw, dt), dt)) AS WeekNo,
       POWER(2, DATEPART(dw, dt)-1) AS dowBit
FROM @Src
ORDER BY ContractID, WeekNo, dowBit

/*
Each evaluated date makes part of the carried sequence if:
  - this is not a new contract, and
    - sequence started this week, or
    - same day last week was part of the sequence, or
    - sequence started last week and today is a lower day than the accumulated weekdays list
  - and there are no sequence gaps since previous day
(otherwise it does not make part of the old sequence, so it starts a new one) */

DECLARE @RankedRanges TABLE (RowID int NOT NULL PRIMARY KEY, WeekDays int NOT NULL, StartRowID int NULL);

WITH WeeksCTE AS -- Needed for building the sequence gradually, and comparing the carried sequence (and previous day) with a current evaluated day
( 
    SELECT RowID, ContractID, dowBit, WeekNo, RowID AS StartRowID, WeekNo AS StartWN, dowBit AS WeekDays, dowBit AS StartWeekDays
    FROM @Ranked
    WHERE RowID = 1 
    UNION ALL
    SELECT RowID, ContractID, dowBit, WeekNo, StartRowID,
           CASE WHEN StartRowID IS NULL THEN StartWN ELSE WeekNo END AS WeekNo,
           CASE WHEN StartRowID IS NULL THEN WeekDays | dowBit ELSE dowBit END AS WeekDays,
           CASE WHEN StartRowID IS NOT NULL THEN dowBit WHEN WeekNo = StartWN THEN StartWeekDays | dowBit ELSE StartWeekDays END AS StartWeekDays
    FROM (
        SELECT w.*, pre.StartWN, pre.WeekDays, pre.StartWeekDays,
               CASE WHEN w.ContractID <> pre.ContractID OR     -- New contract always break the sequence
                         NOT (w.WeekNo = pre.StartWN OR        -- Same week as a new sequence always keeps the sequence
                              w.dowBit & pre.WeekDays > 0 OR   -- Days in the sequence keep the sequence (provided there are no gaps, checked later)
                              (w.WeekNo = pre.StartWN+1 AND (w.dowBit-1) & pre.StartWeekDays = 0)) OR -- Days in the second week when less than a week passed since the sequence started remain in sequence
                         (w.WeekNo > pre.StartWN AND -- look for gap after initial week
                          w.WeekNo > pre.WeekNo+1 OR -- look for full-week gaps
                          (w.WeekNo = pre.WeekNo AND                            -- when same week as previous day,
                           ((w.dowBit-1) ^ (pre.dowBit*2-1)) & pre.WeekDays > 0 -- days between this and previous weekdays, compared to current series
                          ) OR
                          (w.WeekNo > pre.WeekNo AND                                   -- when following week of previous day,
                           ((-1 ^ (pre.dowBit*2-1)) | (w.dowBit-1)) & pre.WeekDays > 0 -- days between this and previous weekdays, compared to current series
                          )) THEN w.RowID END AS StartRowID
        FROM WeeksCTE pre
        JOIN @Ranked w ON (w.RowID = pre.RowID + 1)
        ) w
) 
INSERT INTO @RankedRanges -- days sequence and starting point of each sequence
SELECT RowID, WeekDays, StartRowID
--SELECT *
FROM WeeksCTE
OPTION (MAXRECURSION 0)

--SELECT * FROM @RankedRanges

DECLARE @Ranges TABLE (RowNo int NOT NULL IDENTITY PRIMARY KEY, RowID int NOT NULL);

INSERT INTO @Ranges       -- @RankedRanges filtered only by start of each range, with numbered rows to easily find the end of each range
SELECT StartRowID
FROM @RankedRanges
WHERE StartRowID IS NOT NULL
ORDER BY 1

-- Final result putting everything together
SELECT rs.ContractID, rs.dt AS StartDT, re.dt AS EndDT, re.RowID-rs.RowID+1 AS DayCount,
       CASE WHEN rr.WeekDays & 64 > 0 THEN 'Sun,' ELSE '' END +
       CASE WHEN rr.WeekDays & 1 > 0 THEN 'Mon,' ELSE '' END +
       CASE WHEN rr.WeekDays & 2 > 0 THEN 'Tue,' ELSE '' END +
       CASE WHEN rr.WeekDays & 4 > 0 THEN 'Wed,' ELSE '' END +
       CASE WHEN rr.WeekDays & 8 > 0 THEN 'Thu,' ELSE '' END +
       CASE WHEN rr.WeekDays & 16 > 0 THEN 'Fri,' ELSE '' END +
       CASE WHEN rr.WeekDays & 32 > 0 THEN 'Sat,' ELSE '' END AS WeekDays
FROM (
    SELECT r.RowID AS StartRowID, COALESCE(pos.RowID-1, (SELECT MAX(RowID) FROM @Ranked)) AS EndRowID
    FROM @Ranges r
    LEFT JOIN @Ranges pos ON (pos.RowNo = r.RowNo + 1)
    ) g
JOIN @Ranked rs ON (rs.RowID = g.StartRowID)
JOIN @Ranked re ON (re.RowID = g.EndRowID)
JOIN @RankedRanges rr ON (rr.RowID = re.RowID)

Another strategy

This one should be significantly faster than the previous one because it doesn't rely on the slow limited recursive CTE in SQL Server 2008, although it implements more or less the same strategy.

There is a WHILE loop (I couldn't devise a way to avoid it), but goes for a reduced number of iterations (the highest number of sequences (minus one) on any given contract).

It's a simple strategy, and could be used for sequences either shorter or longer than a week (replacing any occurrence of the constant 7 for any other number, and the dowBit calculated from MODULUS x of DayNo rather than DATEPART(wk)) and up to 32.

SET DATEFIRST 1 -- Make Monday weekday=1

-- Get the minimum information needed to calculate sequences
DECLARE @Days TABLE (ContractID int NOT NULL, dt date, DayNo int NOT NULL, dowBit int NOT NULL, PRIMARY KEY (ContractID, DayNo));
INSERT INTO @Days
SELECT ContractID, dt, CAST(CAST(dt AS datetime) AS int) AS DayNo, POWER(2, DATEPART(dw, dt)-1) AS dowBit
FROM @Src

DECLARE @RangeStartFirstPass TABLE (ContractID int NOT NULL, DayNo int NOT NULL, PRIMARY KEY (ContractID, DayNo))

-- Calculate, from the above list, which days are not present in the previous 7
INSERT INTO @RangeStartFirstPass
SELECT r.ContractID, r.DayNo
FROM @Days r
LEFT JOIN @Days pr ON (pr.ContractID = r.ContractID AND pr.DayNo BETWEEN r.DayNo-7 AND r.DayNo-1) -- Last 7 days
GROUP BY r.ContractID, r.DayNo, r.dowBit
HAVING r.dowBit & COALESCE(SUM(pr.dowBit), 0) = 0

-- Update the previous list with all days that occur right after a missing day
INSERT INTO @RangeStartFirstPass
SELECT *
FROM (
    SELECT DISTINCT ContractID, (SELECT MIN(DayNo) FROM @Days WHERE ContractID = d.ContractID AND DayNo > d.DayNo + 7) AS DayNo
    FROM @Days d
    WHERE NOT EXISTS (SELECT 1 FROM @Days WHERE ContractID = d.ContractID AND DayNo = d.DayNo + 7)
    ) d
WHERE DayNo IS NOT NULL AND
      NOT EXISTS (SELECT 1 FROM @RangeStartFirstPass WHERE ContractID = d.ContractID AND DayNo = d.DayNo)

DECLARE @RangeStart TABLE (ContractID int NOT NULL, DayNo int NOT NULL, PRIMARY KEY (ContractID, DayNo));

-- Fetch the first sequence for each contract
INSERT INTO @RangeStart
SELECT ContractID, MIN(DayNo)
FROM @RangeStartFirstPass
GROUP BY ContractID

-- Add to the list above the next sequence for each contract, until all are added
-- (ensure no sequence is added with less than 7 days)
WHILE @@ROWCOUNT > 0
  INSERT INTO @RangeStart
  SELECT f.ContractID, MIN(f.DayNo)
  FROM (SELECT ContractID, MAX(DayNo) AS DayNo FROM @RangeStart GROUP BY ContractID) s
  JOIN @RangeStartFirstPass f ON (f.ContractID = s.ContractID AND f.DayNo > s.DayNo + 7)
  GROUP BY f.ContractID

-- Summarise results
SELECT ContractID, StartDT, EndDT, DayCount,
       CASE WHEN WeekDays & 64 > 0 THEN 'Sun,' ELSE '' END +
       CASE WHEN WeekDays & 1 > 0 THEN 'Mon,' ELSE '' END +
       CASE WHEN WeekDays & 2 > 0 THEN 'Tue,' ELSE '' END +
       CASE WHEN WeekDays & 4 > 0 THEN 'Wed,' ELSE '' END +
       CASE WHEN WeekDays & 8 > 0 THEN 'Thu,' ELSE '' END +
       CASE WHEN WeekDays & 16 > 0 THEN 'Fri,' ELSE '' END +
       CASE WHEN WeekDays & 32 > 0 THEN 'Sat,' ELSE '' END AS WeekDays
FROM (
    SELECT r.ContractID,
           MIN(d.dt) AS StartDT,
           MAX(d.dt) AS EndDT,
           COUNT(*) AS DayCount,
           SUM(DISTINCT d.dowBit) AS WeekDays
    FROM (SELECT *, COALESCE((SELECT MIN(DayNo) FROM @RangeStart WHERE ContractID = rs.ContractID AND DayNo > rs.DayNo), 999999) AS DayEnd FROM @RangeStart rs) r
    JOIN @Days d ON (d.ContractID = r.ContractID AND d.DayNo BETWEEN r.DayNo AND r.DayEnd-1)
    GROUP BY r.ContractID, r.DayNo
    ) d
ORDER BY ContractID, StartDT

Mikael Eriksson · Answer 2 · 2016-04-26T23:52:58+08:00

不完全是您正在寻找的东西，但您可能会感兴趣。

该查询使用逗号分隔的字符串为每周使用的天数创建周。然后它在中找到使用相同模式的连续几周的岛屿Weekdays。

with Weeks as
(
  select T.*,
         row_number() over(partition by T.ContractID, T.WeekDays order by T.WeekNumber) as rn
  from (
       select S1.ContractID,
              min(S1.dt) as StartDT,
              max(S1.dt) as EndDT,
              datediff(day, 0, S1.dt) / 7 as WeekNumber, -- Number of weeks since '1900-01-01 (a monday)'
              count(*) as DayCount,
              stuff((
                    select ','+S2.dowChar
                    from @Src as S2
                    where S2.ContractID = S1.ContractID and
                          S2.dt between min(S1.dt) and max(S1.dt)
                    order by S2.dt
                    for xml path('')
                    ), 1, 1, '') as WeekDays
       from @Src as S1
       group by S1.ContractID, 
                datediff(day, 0, S1.dt) / 7
       ) as T
)
select W.ContractID,
       min(W.StartDT) as StartDT,
       max(W.EndDT) as EndDT,
       count(*) * W.DayCount as DayCount,
       W.WeekDays
from Weeks as W
group by W.ContractID,
         W.WeekDays,
         W.DayCount,
         W.rn - W.WeekNumber
order by W.ContractID,
         min(W.WeekNumber);

结果：

ContractID  StartDT    EndDT      DayCount    WeekDays
----------- ---------- ---------- ----------- -----------------------------
1           2016-05-02 2016-05-13 10          Mon,Tue,Wed,Thu,Fri
2           2016-05-05 2016-05-06 2           Thu,Fri
2           2016-05-09 2016-05-13 5           Mon,Tue,Wed,Thu,Fri
2           2016-05-16 2016-05-17 2           Mon,Tue
3           2016-05-02 2016-05-13 6           Mon,Wed,Fri
3           2016-05-16 2016-05-16 1           Mon
4           2016-05-02 2016-05-06 5           Mon,Tue,Wed,Thu,Fri
4           2016-05-10 2016-05-13 4           Tue,Wed,Thu,Fri
5           2016-05-02 2016-05-06 5           Mon,Tue,Wed,Thu,Fri
5           2016-05-10 2016-05-13 4           Tue,Wed,Thu,Fri
5           2016-05-16 2016-05-20 5           Mon,Tue,Wed,Thu,Fri
6           2016-05-05 2016-05-06 2           Thu,Fri
6           2016-05-09 2016-05-13 5           Mon,Tue,Wed,Thu,Fri
6           2016-05-16 2016-05-17 2           Mon,Tue
6           2016-06-06 2016-06-17 10          Mon,Tue,Wed,Thu,Fri
7           2016-05-02 2016-05-08 7           Mon,Tue,Wed,Thu,Fri,Sat,Sun
7           2016-05-09 2016-05-13 5           Mon,Tue,Wed,Thu,Fri
8           2016-04-30 2016-05-01 2           Sat,Sun
8           2016-05-02 2016-05-08 7           Mon,Tue,Wed,Thu,Fri,Sat,Sun
8           2016-05-09 2016-05-14 6           Mon,Tue,Wed,Thu,Fri,Sat
9           2016-05-02 2016-05-11 6           Mon,Tue,Wed
9           2016-05-16 2016-05-17 2           Mon,Tue
10          2016-05-05 2016-05-22 12          Thu,Fri,Sat,Sun
11          2016-05-03 2016-05-10 2           Tue
11          2016-05-17 2016-05-19 2           Tue,Thu
11          2016-05-26 2016-06-02 2           Thu

ContractID = 2 shows what the difference in the result is compared to what you want. The first and last week will be treated as separate periods since WeekDays is different.

Geoff Patterson · Answer 3 · 2016-04-28T10:52:35+08:00

I ended up with an approach that yields the optimal solution in this case and I think will do well in general. The solution is quite lengthy, however, so it would be interesting to see if someone else has a different approach that is more concise.

Here is a script that contains the full solution.

And here is an outline of the algorithm:

Pivot the data set so that there is a single row representing each week
Compute the islands of weeks within each ContractId
Merge any adjacent weeks that fall within the same ContractId and have same WeekDays
For any single weeks (not yet merged) where the previous grouping is on the same island and the WeekDays of the single week matches a leading subset of the WeekDays of the previous grouping, merge into that previous grouping
For any single weeks (not yet merged) where the next grouping is on the same island and the WeekDays of the single week matches a trailing subset of the WeekDays of the next grouping, merge into that next grouping
For any two adjacent weeks on the same island where neither has been merged, merge them together if they are both partial weeks that can be combined (e.g., "Mon,Tue,Wed,Thu," and "Wed,Thu,Sat,")
For any remaining single weeks (not yet merged), if possible split the week into two parts and merge both parts, the first part into the previous grouping on the same island, and the second part into the following grouping on the same island

Ezequiel Tolnay · Answer 4 · 2016-04-24T19:27:37+08:00

我无法理解将周与间隔分组或将周与周末分组背后的逻辑（例如，当周末连续两周时，周末去哪一周？）。

以下查询产生所需的输出，只是它只对连续的工作日进行分组，并将星期几分组 Sun-Sat（而不是 Mon-Sun）。虽然不完全是您想要的，但也许这可以为不同的策略提供一些线索。天的分组来自这里。使用的窗口函数应该适用于 SQLServer 2008，但我没有那个版本来测试它是否真的可以。

WITH 
  mysrc AS (
    SELECT *, RANK() OVER (PARTITION BY ContractID ORDER BY DT) AS rank
    FROM @Src
    ),
  prepos AS (
    SELECT s.*, pos.ID AS posid
    FROM mysrc s
    LEFT JOIN mysrc pos ON (pos.ContractID = s.ContractID AND pos.rank = s.rank+1 AND (pos.DowInt = s.DowInt+1 OR pos.DowInt = 2 AND s.DowInt=6))
    ),
  grped AS (
    SELECT TOP 100 *, (SELECT COUNT(CASE WHEN posid IS NULL THEN 1 END) FROM prepos WHERE contractid = p.contractid AND rank < p.rank) as grp
    FROM prepos p
    ORDER BY ContractID, DT
    )
SELECT ContractID, min(dt) AS StartDT, max(dt) AS EndDT, count(*) AS DayCount,
       STUFF( (SELECT ', ' + dowchar
               FROM (
                 SELECT TOP 100 dowint, dowchar 
                 FROM grped 
                 WHERE ContractID = g.ContractID AND grp = g.grp 
                 GROUP BY dowint, dowchar 
                 ORDER BY 1
                 ) a 
               FOR XML PATH(''), TYPE).value('.','varchar(max)'), 1, 2, '') AS WeekDays
FROM grped g
GROUP BY ContractID, grp
ORDER BY 1, 2

结果

+------------+------------+------------+----------+-----------------------------------+
| ContractID | StartDT    | EndDT      | DayCount | WeekDays                          |
+------------+------------+------------+----------+-----------------------------------+
| 1          | 2/05/2016  | 13/05/2016 | 10       | Mon, Tue, Wed, Thu, Fri           |
| 2          | 5/05/2016  | 17/05/2016 | 9        | Mon, Tue, Wed, Thu, Fri           |
| 3          | 2/05/2016  | 2/05/2016  | 1        | Mon                               |
| 3          | 4/05/2016  | 4/05/2016  | 1        | Wed                               |
| 3          | 6/05/2016  | 9/05/2016  | 2        | Mon, Fri                          |
| 3          | 11/05/2016 | 11/05/2016 | 1        | Wed                               |
| 3          | 13/05/2016 | 16/05/2016 | 2        | Mon, Fri                          |
| 4          | 2/05/2016  | 6/05/2016  | 5        | Mon, Tue, Wed, Thu, Fri           |
| 4          | 10/05/2016 | 13/05/2016 | 4        | Tue, Wed, Thu, Fri                |
| 5          | 2/05/2016  | 6/05/2016  | 5        | Mon, Tue, Wed, Thu, Fri           |
| 5          | 10/05/2016 | 20/05/2016 | 9        | Mon, Tue, Wed, Thu, Fri           |
| 6          | 5/05/2016  | 17/05/2016 | 9        | Mon, Tue, Wed, Thu, Fri           |
| 6          | 6/06/2016  | 17/06/2016 | 10       | Mon, Tue, Wed, Thu, Fri           |
| 7          | 2/05/2016  | 7/05/2016  | 6        | Mon, Tue, Wed, Thu, Fri, Sat      |
| 7          | 8/05/2016  | 13/05/2016 | 6        | Sun, Mon, Tue, Wed, Thu, Fri      |
| 8          | 30/04/2016 | 30/04/2016 | 1        | Sat                               |
| 8          | 1/05/2016  | 7/05/2016  | 7        | Sun, Mon, Tue, Wed, Thu, Fri, Sat |
| 8          | 8/05/2016  | 14/05/2016 | 7        | Sun, Mon, Tue, Wed, Thu, Fri, Sat |
| 9          | 2/05/2016  | 4/05/2016  | 3        | Mon, Tue, Wed                     |
| 9          | 9/05/2016  | 10/05/2016 | 2        | Mon, Tue                          |
+------------+------------+------------+----------+-----------------------------------+

Vladimir Baranov · Answer 5 · 2016-04-27T16:44:37+08:00

For the sake of completeness, here is a two-pass gaps-and-islands approach that I tried myself before asking this question.

As I was testing it on the real data I found few cases when it was producing incorrect results and fixed it.

Here is the algorithm:

Generate islands of consecutive dates (CTE_ContractDays, CTE_DailyRN, CTE_DailyIslands) and calculate a week number for each starting and ending date of an island. Here week number is calculated assuming that Monday is the first day of the week.
If the schedule has non-sequential dates within the same week (like in example 3), the previous stage will create several rows for the same week. Group rows to have only one row per week (CTE_Weeks).
For each row from the previous stage build a comma-separated list of week days (CTE_FirstResult).
Second pass of gaps-and-islands to group consecutive weeks with the same WeekDays (CTE_SecondRN, CTE_Schedules).

It handles well cases when there is no disruption in the weekly patterns (1, 7, 8, 10, 12). It handles well cases when pattern has non-sequential days (3).

But, unfortunately, it generates extra intervals for partial weeks (2, 3, 5, 6, 9, 11).

WITH
CTE_ContractDays
AS
(
    SELECT
         S.ContractID
        ,MIN(S.dt) OVER (PARTITION BY S.ContractID) AS ContractMinDT
        ,S.dt
        ,ROW_NUMBER() OVER (PARTITION BY S.ContractID ORDER BY S.dt) AS rn1
        ,DATEDIFF(day, '2001-01-01', S.dt) AS DayNumber
        ,S.dowChar
        ,S.dowInt
    FROM
        @Src AS S
)
,CTE_DailyRN
AS
(
    SELECT
        DayNumber - rn1 AS WeekGroupNumber
        ,ROW_NUMBER() OVER (
            PARTITION BY
                ContractID
                ,DayNumber - rn1
            ORDER BY dt) AS rn2
        ,ContractID
        ,ContractMinDT
        ,dt
        ,rn1
        ,DayNumber
        ,dowChar
        ,dowInt
    FROM CTE_ContractDays
)
,CTE_DailyIslands
AS
(
    SELECT
        ContractID
        ,ContractMinDT
        ,MIN(dt) AS MinDT
        ,MAX(dt) AS MaxDT
        ,COUNT(*) AS DayCount
        -- '2001-01-01' is Monday
        ,DATEDIFF(day, '2001-01-01', MIN(dt)) / 7 AS WeekNumberMin
        ,DATEDIFF(day, '2001-01-01', MAX(dt)) / 7 AS WeekNumberMax
    FROM CTE_DailyRN
    GROUP BY
        ContractID
        ,rn1-rn2
        ,ContractMinDT
)
,CTE_Weeks
AS
(
    SELECT
        ContractID
        ,ContractMinDT
        ,MIN(MinDT) AS MinDT
        ,MAX(MaxDT) AS MaxDT
        ,SUM(DayCount) AS DayCount
        ,WeekNumberMin
        ,WeekNumberMax
    FROM CTE_DailyIslands
    GROUP BY
        ContractID
        ,ContractMinDT
        ,WeekNumberMin
        ,WeekNumberMax
)
,CTE_FirstResult
AS
(
    SELECT
        ContractID
        ,ContractMinDT
        ,MinDT
        ,MaxDT
        ,DayCount
        ,CA_Data.XML_Value AS DaysOfWeek
        ,WeekNumberMin AS WeekNumber
        ,ROW_NUMBER() OVER(PARTITION BY ContractID ORDER BY MinDT) AS rn1
    FROM
        CTE_Weeks
        CROSS APPLY
        (
            SELECT CAST(CTE_ContractDays.dowChar AS varchar(8000)) + ',' AS dw
            FROM CTE_ContractDays
            WHERE
                    CTE_ContractDays.ContractID = CTE_Weeks.ContractID
                AND CTE_ContractDays.dt >= CTE_Weeks.MinDT
                AND CTE_ContractDays.dt <= CTE_Weeks.MaxDT
            GROUP BY
                CTE_ContractDays.dowChar
                ,CTE_ContractDays.dowInt
            ORDER BY CTE_ContractDays.dowInt
            FOR XML PATH(''), TYPE
        ) AS CA_XML(XML_Value)
        CROSS APPLY
        (
            SELECT CA_XML.XML_Value.value('.', 'VARCHAR(8000)')
        ) AS CA_Data(XML_Value)
)
,CTE_SecondRN
AS
(
    SELECT 
        ContractID
        ,ContractMinDT
        ,MinDT
        ,MaxDT
        ,DayCount
        ,DaysOfWeek
        ,WeekNumber
        ,rn1
        ,WeekNumber - rn1 AS SecondGroupNumber
        ,ROW_NUMBER() OVER (
            PARTITION BY
                ContractID
                ,DaysOfWeek
                ,DayCount
                ,WeekNumber - rn1
            ORDER BY MinDT) AS rn2
    FROM CTE_FirstResult
)
,CTE_Schedules
AS
(
    SELECT
        ContractID
        ,MIN(MinDT) AS StartDT
        ,MAX(MaxDT) AS EndDT
        ,SUM(DayCount) AS DayCount
        ,DaysOfWeek
    FROM CTE_SecondRN
    GROUP BY
        ContractID
        ,DaysOfWeek
        ,rn1-rn2
)
SELECT
    ContractID
    ,StartDT
    ,EndDT
    ,DayCount
    ,DaysOfWeek AS WeekDays
FROM CTE_Schedules
ORDER BY
    ContractID
    ,StartDT
;

Result

+------------+------------+------------+----------+------------------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |           WeekDays           |
+------------+------------+------------+----------+------------------------------+
|          1 | 2016-05-02 | 2016-05-13 |       10 | Mon,Tue,Wed,Thu,Fri,         |
|          2 | 2016-05-05 | 2016-05-06 |        2 | Thu,Fri,                     |
|          2 | 2016-05-09 | 2016-05-13 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|          2 | 2016-05-16 | 2016-05-17 |        2 | Mon,Tue,                     |
|          3 | 2016-05-02 | 2016-05-13 |        6 | Mon,Wed,Fri,                 |
|          3 | 2016-05-16 | 2016-05-16 |        1 | Mon,                         |
|          4 | 2016-05-02 | 2016-05-06 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|          4 | 2016-05-10 | 2016-05-13 |        4 | Tue,Wed,Thu,Fri,             |
|          5 | 2016-05-02 | 2016-05-06 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|          5 | 2016-05-10 | 2016-05-13 |        4 | Tue,Wed,Thu,Fri,             |
|          5 | 2016-05-16 | 2016-05-20 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|          6 | 2016-05-05 | 2016-05-06 |        2 | Thu,Fri,                     |
|          6 | 2016-05-09 | 2016-05-13 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|          6 | 2016-05-16 | 2016-05-17 |        2 | Mon,Tue,                     |
|          6 | 2016-06-06 | 2016-06-17 |       10 | Mon,Tue,Wed,Thu,Fri,         |
|          7 | 2016-05-02 | 2016-05-13 |       12 | Sun,Mon,Tue,Wed,Thu,Fri,Sat, |
|          8 | 2016-04-30 | 2016-05-14 |       15 | Sun,Mon,Tue,Wed,Thu,Fri,Sat, |
|          9 | 2016-05-02 | 2016-05-11 |        6 | Mon,Tue,Wed,                 |
|          9 | 2016-05-16 | 2016-05-17 |        2 | Mon,Tue,                     |
|         10 | 2016-05-05 | 2016-05-22 |       12 | Sun,Thu,Fri,Sat,             |
|         11 | 2016-05-03 | 2016-05-10 |        2 | Tue,                         |
|         11 | 2016-05-17 | 2016-05-19 |        2 | Tue,Thu,                     |
|         11 | 2016-05-26 | 2016-06-02 |        2 | Thu,                         |
|         12 | 2016-05-02 | 2016-05-06 |        5 | Mon,Tue,Wed,Thu,Fri,         |
|         12 | 2016-05-16 | 2016-05-20 |        5 | Mon,Tue,Wed,Thu,Fri,         |
+------------+------------+------------+----------+------------------------------+

Cursor-based solution

I converted my C# code into a cursor-based algorithm, just to see how it compares to other solutions on real data. It confirms that it is much slower than other set-based or recursive approaches, but it generates an optimal result.

CREATE TABLE #Dst_V2 (ContractID bigint, StartDT date, EndDT date, DayCount int, WeekDays varchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS);

SET NOCOUNT ON;

DECLARE @VarOldDateFirst int = @@DATEFIRST;
SET DATEFIRST 7;

DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
    SELECT
        ContractID
        ,dt
        ,dowChar
        ,dowInt
    FROM #Src AS S
    ;

OPEN @VarCursor;

DECLARE @CurrContractID bigint = 0;
DECLARE @Currdt date;
DECLARE @CurrdowChar char(3);
DECLARE @CurrdowInt int;


DECLARE @VarCreateNewInterval bit = 0;
DECLARE @VarTempDT date;
DECLARE @VarTempdowInt int;

DECLARE @LastContractID bigint = 0;
DECLARE @LastStartDT date;
DECLARE @LastEndDT date;
DECLARE @LastDayCount int = 0;
DECLARE @LastWeekDays varchar(255);
DECLARE @LastMonCount int;
DECLARE @LastTueCount int;
DECLARE @LastWedCount int;
DECLARE @LastThuCount int;
DECLARE @LastFriCount int;
DECLARE @LastSatCount int;
DECLARE @LastSunCount int;


FETCH NEXT FROM @VarCursor INTO @CurrContractID, @Currdt, @CurrdowChar, @CurrdowInt;
SET @iFS = @@FETCH_STATUS;
IF @iFS = 0
BEGIN
    SET @LastContractID = @CurrContractID;
    SET @LastStartDT = @Currdt;
    SET @LastEndDT = @Currdt;
    SET @LastDayCount = 1;
    SET @LastMonCount = 0;
    SET @LastTueCount = 0;
    SET @LastWedCount = 0;
    SET @LastThuCount = 0;
    SET @LastFriCount = 0;
    SET @LastSatCount = 0;
    SET @LastSunCount = 0;
    IF @CurrdowInt = 1 SET @LastSunCount = @LastSunCount + 1;
    IF @CurrdowInt = 2 SET @LastMonCount = @LastMonCount + 1;
    IF @CurrdowInt = 3 SET @LastTueCount = @LastTueCount + 1;
    IF @CurrdowInt = 4 SET @LastWedCount = @LastWedCount + 1;
    IF @CurrdowInt = 5 SET @LastThuCount = @LastThuCount + 1;
    IF @CurrdowInt = 6 SET @LastFriCount = @LastFriCount + 1;
    IF @CurrdowInt = 7 SET @LastSatCount = @LastSatCount + 1;
END;

WHILE @iFS = 0
BEGIN

    SET @VarCreateNewInterval = 0;

    -- Contract changes -> start new interval
    IF @LastContractID <> @CurrContractID
    BEGIN
        SET @VarCreateNewInterval = 1;
    END;

    IF @VarCreateNewInterval = 0
    BEGIN
        -- check days of week
        -- are we still within the first week of the interval?
        IF DATEDIFF(day, @LastStartDT, @Currdt) > 6
        BEGIN
            -- we are beyond the first week, check day of the week
            -- have we seen @CurrdowInt before?
            -- we should start a new interval if this is the new day of the week that didn't exist in the first week
            IF @CurrdowInt = 1 AND @LastSunCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 2 AND @LastMonCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 3 AND @LastTueCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 4 AND @LastWedCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 5 AND @LastThuCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 6 AND @LastFriCount = 0 SET @VarCreateNewInterval = 1;
            IF @CurrdowInt = 7 AND @LastSatCount = 0 SET @VarCreateNewInterval = 1;

            IF @VarCreateNewInterval = 0
            BEGIN
                -- check the gap between current day and last day of the interval
                -- if the gap between current day and last day of the interval
                -- contains a day of the week that was included in the interval before,
                -- we should create new interval
                SET @VarTempDT = DATEADD(day, 1, @LastEndDT);
                WHILE @VarTempDT < @Currdt
                BEGIN
                    SET @VarTempdowInt = DATEPART(WEEKDAY, @VarTempDT);

                    IF @VarTempdowInt = 1 AND @LastSunCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 2 AND @LastMonCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 3 AND @LastTueCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 4 AND @LastWedCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 5 AND @LastThuCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 6 AND @LastFriCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;
                    IF @VarTempdowInt = 7 AND @LastSatCount > 0 BEGIN SET @VarCreateNewInterval = 1; BREAK; END;

                    SET @VarTempDT = DATEADD(day, 1, @VarTempDT);
                END;
            END;
        END;
        -- else
        -- we are still within the first week, so we can add this day to the interval
    END;

    IF @VarCreateNewInterval = 1
    BEGIN
        -- save the new interval into the final table
        SET @LastWeekDays = '';
        IF @LastSunCount > 0 SET @LastWeekDays = @LastWeekDays + 'Sun,';
        IF @LastMonCount > 0 SET @LastWeekDays = @LastWeekDays + 'Mon,';
        IF @LastTueCount > 0 SET @LastWeekDays = @LastWeekDays + 'Tue,';
        IF @LastWedCount > 0 SET @LastWeekDays = @LastWeekDays + 'Wed,';
        IF @LastThuCount > 0 SET @LastWeekDays = @LastWeekDays + 'Thu,';
        IF @LastFriCount > 0 SET @LastWeekDays = @LastWeekDays + 'Fri,';
        IF @LastSatCount > 0 SET @LastWeekDays = @LastWeekDays + 'Sat,';

        INSERT INTO #Dst_V2 
            (ContractID
            ,StartDT
            ,EndDT
            ,DayCount
            ,WeekDays)
        VALUES
            (@LastContractID
            ,@LastStartDT
            ,@LastEndDT
            ,@LastDayCount
            ,@LastWeekDays);

        -- init the new interval
        SET @LastContractID = @CurrContractID;
        SET @LastStartDT = @Currdt;
        SET @LastEndDT = @Currdt;
        SET @LastDayCount = 1;
        SET @LastMonCount = 0;
        SET @LastTueCount = 0;
        SET @LastWedCount = 0;
        SET @LastThuCount = 0;
        SET @LastFriCount = 0;
        SET @LastSatCount = 0;
        SET @LastSunCount = 0;
        IF @CurrdowInt = 1 SET @LastSunCount = @LastSunCount + 1;
        IF @CurrdowInt = 2 SET @LastMonCount = @LastMonCount + 1;
        IF @CurrdowInt = 3 SET @LastTueCount = @LastTueCount + 1;
        IF @CurrdowInt = 4 SET @LastWedCount = @LastWedCount + 1;
        IF @CurrdowInt = 5 SET @LastThuCount = @LastThuCount + 1;
        IF @CurrdowInt = 6 SET @LastFriCount = @LastFriCount + 1;
        IF @CurrdowInt = 7 SET @LastSatCount = @LastSatCount + 1;

    END ELSE BEGIN

        -- update last interval
        SET @LastEndDT = @Currdt;
        SET @LastDayCount = @LastDayCount + 1;
        IF @CurrdowInt = 1 SET @LastSunCount = @LastSunCount + 1;
        IF @CurrdowInt = 2 SET @LastMonCount = @LastMonCount + 1;
        IF @CurrdowInt = 3 SET @LastTueCount = @LastTueCount + 1;
        IF @CurrdowInt = 4 SET @LastWedCount = @LastWedCount + 1;
        IF @CurrdowInt = 5 SET @LastThuCount = @LastThuCount + 1;
        IF @CurrdowInt = 6 SET @LastFriCount = @LastFriCount + 1;
        IF @CurrdowInt = 7 SET @LastSatCount = @LastSatCount + 1;
    END;


    FETCH NEXT FROM @VarCursor INTO @CurrContractID, @Currdt, @CurrdowChar, @CurrdowInt;
    SET @iFS = @@FETCH_STATUS;
END;

-- save the last interval into the final table
IF @LastDayCount > 0
BEGIN
    SET @LastWeekDays = '';
    IF @LastSunCount > 0 SET @LastWeekDays = @LastWeekDays + 'Sun,';
    IF @LastMonCount > 0 SET @LastWeekDays = @LastWeekDays + 'Mon,';
    IF @LastTueCount > 0 SET @LastWeekDays = @LastWeekDays + 'Tue,';
    IF @LastWedCount > 0 SET @LastWeekDays = @LastWeekDays + 'Wed,';
    IF @LastThuCount > 0 SET @LastWeekDays = @LastWeekDays + 'Thu,';
    IF @LastFriCount > 0 SET @LastWeekDays = @LastWeekDays + 'Fri,';
    IF @LastSatCount > 0 SET @LastWeekDays = @LastWeekDays + 'Sat,';

    INSERT INTO #Dst_V2
        (ContractID
        ,StartDT
        ,EndDT
        ,DayCount
        ,WeekDays)
    VALUES
        (@LastContractID
        ,@LastStartDT
        ,@LastEndDT
        ,@LastDayCount
        ,@LastWeekDays);
END;

CLOSE @VarCursor;
DEALLOCATE @VarCursor;

SET DATEFIRST @VarOldDateFirst;

DROP TABLE #Dst_V2;

Geoff Patterson · Answer 6 · 2016-05-03T06:44:09+08:00

I was a little bit surprised that the Vladimir's cursor solution was so slow, so I also tried to optimize that version. I did confirm that using a cursor was very slow for me as well.

However, at the cost of using undocumented functionality in SQL Server by appending to a variable while processing a rowset, I was able to create a simplified version of this logic that yields the optimal result and executes much faster than both the cursor and my original solution. So use at your own risk, but I'll present the solution in case it's of interest. It would also be possible to update the solution to use a WHILE loop from one to the maximum row number, seeking to the next row number at each iteration of the loop. This would stick to fully documented and reliable functionality, but would violate the (somewhat artificial) stated constraint of the problem that WHILE loops are not allowed.

Note that if using SQL 2014 was allowed, it is likely that a natively-compiled stored procedure that loops over the row numbers and access each row number in a memory-optimized table would be an implementation of this same logic that would run more quickly.

Here is the full solution, including expanding the trial data set out to about a half million rows. The new solution completes in about 3 seconds and in my opinion is a lot more concise and readable than the previous solution I offered. I'll break out the three steps involved here:

Step 1: pre-processing

We first add a row number to the data set, in the order we will process the data. While doing so, we also convert each dowInt into a power of 2 so that we can use a bitmap to represent which days have been observed in any given grouping:

IF OBJECT_ID('tempdb..#srcWithRn') IS NOT NULL
    DROP TABLE #srcWithRn
GO
SELECT rn = IDENTITY(INT, 1, 1), ContractId, dt, dowInt,
    POWER(2, dowInt) AS dowPower, dowChar
INTO #srcWithRn
FROM #src
ORDER BY ContractId, dt
GO
ALTER TABLE #srcWithRn
ADD PRIMARY KEY (rn)
GO

Step 2: Looping through the contract days in order to identify new groupings

We next loop over the data, in order by row number. We compute only the list of row numbers that form the boundary of a new grouping, then output those row numbers into a table:

DECLARE @ContractId INT, @RnList VARCHAR(MAX), @NewGrouping BIT = 0, @DowBitmap INT = 0, @startDt DATE
SELECT TOP 1 @ContractId = ContractId, @startDt = dt, @RnList = ',' + CONVERT(VARCHAR(MAX), rn), @DowBitmap = DowPower
FROM #srcWithRn
WHERE rn = 1

SELECT 
    -- New grouping if new contract, or if we're observing a new day that we did
    -- not observe within the first 7 days of the grouping
    @NewGrouping = CASE
        WHEN ContractId <> @ContractId THEN 1
        WHEN DATEDIFF(DAY, @startDt, dt) > 6
            AND @DowBitmap & dowPower <> dowPower THEN 1
        ELSE 0
        END,
    @ContractId = ContractId,
    -- If this is a newly observed day in an existing grouping, add it to the bitmap
    @DowBitmap = CASE WHEN @NewGrouping = 0 THEN @DowBitmap | DowPower ELSE DowPower END,
    -- If this is a new grouping, reset the start date of the grouping
    @startDt = CASE WHEN @NewGrouping = 0 THEN @startDt ELSE dt END,
    -- If this is a new grouping, add this rn to the list of row numbers that delineate the boundary of a new grouping
    @RnList = CASE WHEN @NewGrouping = 0 THEN @RnList ELSE @RnList + ',' + CONVERT(VARCHAR(MAX), rn) END 
FROM #srcWithRn
WHERE rn >= 2
ORDER BY rn
OPTION (MAXDOP 1)

-- Split the list of grouping boundaries into a table
IF OBJECT_ID('tempdb..#newGroupingRns') IS NOT NULL
    DROP TABLE #newGroupingRns
SELECT splitListId AS rn
INTO #newGroupingRns
FROM dbo.f_delimitedIntListSplitter(SUBSTRING(@RnList, 2, 1000000000), DEFAULT)
GO
ALTER TABLE #newGroupingRns
ADD PRIMARY KEY (rn)
GO

Step 3: Computing final results based on the row numbers of each grouping boundary

We then compute the final groupings by using the boundaries identified in the loop above to aggregate all dates that fall into each grouping:

IF OBJECT_ID('tempdb..#finalGroupings') IS NOT NULL
    DROP TABLE #finalGroupings
GO
SELECT MIN(s.ContractId) AS ContractId,
    MIN(dt) AS StartDT,
    MAX(dt) AS EndDT,
    COUNT(*) AS DayCount,
    CASE WHEN MAX(CASE WHEN dowChar = 'Sun' THEN 1 ELSE 0 END) = 1 THEN 'Sun,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Mon' THEN 1 ELSE 0 END) = 1 THEN 'Mon,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Tue' THEN 1 ELSE 0 END) = 1 THEN 'Tue,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Wed' THEN 1 ELSE 0 END) = 1 THEN 'Wed,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Thu' THEN 1 ELSE 0 END) = 1 THEN 'Thu,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Fri' THEN 1 ELSE 0 END) = 1 THEN 'Fri,' ELSE '' END + 
    CASE WHEN MAX(CASE WHEN dowChar = 'Sat' THEN 1 ELSE 0 END) = 1 THEN 'Sat,' ELSE '' END AS WeekDays
INTO #finalGroupings
FROM #srcWithRn s
CROSS APPLY (
    -- For any row, its grouping is the largest boundary row number that occurs at or before this row
    SELECT TOP 1 rn AS groupingRn
    FROM #newGroupingRns grp
    WHERE grp.rn <= s.rn
    ORDER BY grp.rn DESC
) g
GROUP BY g.groupingRn
ORDER BY g.groupingRn
GO

Michael Green · Answer 7 · 2016-05-03T02:47:03+08:00

Discussion will follow the code.

declare @Helper table(
    rn tinyint,
    dowInt tinyint,
    dowChar char(3));
insert @Helper
values  ( 1,1,'Sun'),
        ( 2,2,'Mon'),
        ( 3,3,'Tue'),
        ( 4,4,'Wed'),
        ( 5,5,'Thu'),
        ( 6,6,'Fri'),
        ( 7,7,'Sat'),
        ( 8,1,'Sun'),
        ( 9,2,'Mon'),
        (10,3,'Tue'),
        (11,4,'Wed'),
        (12,5,'Thu'),
        (13,6,'Fri'),
        (14,7,'Sat');



with MissingDays as
(
    select
        h1.rn as rn1,
        h1.dowChar as StartDay,
        h2.rn as rn2,
        h2.dowInt as FollowingDayInt,
        h2.dowChar as FollowingDayChar
    from @Helper as h1
    inner join @Helper as h2
        on h2.rn > h1.rn
    where h1.rn < 8
    and h2.rn < h1.rn + 8
)
,Numbered as
(
    select
        a.*,
        ROW_NUMBER() over (partition by a.ContractID order by a.dt) as rn
    from #Src as a
)
,Incremented as
(
    select
        b.*,
        convert(varchar(max), b.dowChar)+',' as WeekDays,
        b.dt as IntervalStart
    from Numbered as b
    where b.rn = 1

    union all

    select
        c.*,
        case
            when
                (DATEDIFF(day, d.IntervalStart, c.dt) > 6)      -- interval goes beyond 7 days
            and (
                    (d.WeekDays not like '%'+c.dowChar+'%')     -- the new week day has not been seen before
                or 
                    (DATEDIFF(day, d.dt, c.dt) > 7)
                or 
                    (
                        (DATEDIFF(day, d.dt, c.dt) > 1)
                        and
                        (
                        exists( select
                                    e.FollowingDayChar
                                from MissingDays as e
                                where e.StartDay = d.dowChar
                                and rn2 < (select f.rn2 from MissingDays as f
                                            where f.StartDay = d.dowChar
                                            and f.FollowingDayInt = c.dowInt)
                                and d.WeekDays like '%'+e.FollowingDayChar+'%'
                            )
                        )
                    )
                )
            then convert(varchar(max),c.dowChar)+','
            else
                case
                    when d.WeekDays like '%'+c.dowChar+'%'
                    then d.WeekDays
                    else d.WeekDays+convert(varchar(max),c.dowChar)+','
                end
        end,
        case
            when
                (DATEDIFF(day, d.IntervalStart, c.dt) > 6)      -- interval goes beyond 7 days
            and (
                    (d.WeekDays not like '%'+c.dowChar+'%')     -- the new week day has not been seen before
                or
                    (DATEDIFF(day, d.dt, c.dt) > 7)             -- there is a one week gap
                or 
                    (
                        (DATEDIFF(day, d.dt, c.dt) > 1)         -- there is a gap..
                        and
                        (
                        exists( select                          -- .. and the omitted days are in the preceeding interval
                                    e.FollowingDayChar
                                from MissingDays as e
                                where e.StartDay = d.dowChar
                                and rn2 < (select f.rn2 from MissingDays as f
                                            where f.StartDay = d.dowChar
                                            and f.FollowingDayInt = c.dowInt)
                                and d.WeekDays like '%'+e.FollowingDayChar+'%'
                            )
                        )
                    )
                )
            then c.dt
            else d.IntervalStart
        end
    from Numbered as c
    inner join Incremented as d
    on d.ContractID = c.ContractID
    and d.rn = c.rn - 1
)
select
    g.ContractID,
    g.IntervalStart as StartDT,
    MAX(g.dt) as EndDT,
    COUNT(*) as DayCount,
    MAX(g.WeekDays) as WeekDays
from Incremented as g
group by
    g.ContractID,
    g.IntervalStart
order by
    ContractID,
    StartDT;

@Helper is to cope with this rule:

If the gap between current day and last day of the interval contains a day of the week that was included in the interval before, we should create new interval

It allows me to list day names, in day number order, between any two given days. This is used when deciding if a new interval should start. I populate it with two week's worth of values to make wrapping around a weekend easier to code.

There are cleaner ways to implement this. A full "dates" table would be one. There's probably a clever way with day number and modulo arithmetic, too.

The CTE MissingDays is to generate a list of day names between any two given days. It is handled in this clunky way because the recursive CTE (following) does not allow aggregates, TOP(), or other operators. This is inelegant, but it works.

CTE Numbered is to enforce a known, gap-free sequence on the data. It avoids a lot of comparisons later.

CTE Incremented is where the action happens. In essence I use a recursive CTE to step through the data and enforce the rules. The row number generated in Numbered (above) is used to drive the recursive processing.

The seed of the recursive CTE simply gets the first date for each ContractID and initialises values that will be used to decide if a new interval is required.

Deciding if a new interval should start requires the current interval's start date, day list and the length of any gap in the calendar dates. These may be reset or carried forward, depending on the decision. Hence the recursive part is verbose and a little repetitive, as we have to decide whether to start a new interval for more than one column value.

The decision logic for columns WeekDays and IntervalStart should have the same decision logic - it can be cut-and-pasted between them. If the logic for starting a new interval were to change this is the code to alter. Ideally it would be abstracted, therefore; doing this in a recursive CTE may be challenging.

The EXISTS() clause is the outfall of not being able to use aggregate functions in a recursive CTE. All it does is see if the days falling within a gap are in the current interval already.

There is nothing magic about the nesting of the logic clauses. If it is clearer in another conformation, or using nested CASEs, say, there is no reason to keep it this way.

The final SELECT is to give the output in the format desired.

Having the PK on Src.ID is not useful for this method. A clustered index on (ContractID,dt) would be nice, I think.

There are a few rough edges. The days are not returned in dow sequence, but in the calendar sequence they appear in the source data. Everything to do with @Helper is klunky and could be smoothed. I like the idea of using one bit per day and using binary functions instead of LIKE. Separating some of the auxilliary CTEs into temp table with proper indexes would undoubtedly help.

One of the challenges with this is that a "week" does not align with a standard calendar, but rather is driven by the data, and resets when it is determined that a new interval should begin. A "week", or at least an interval, can be from one day long to spanning the entire dataset.

For interests sake, here's the estimated costs against Geoff's sample data (thanks for that!) after various changes:

                                             estimated cost

My submission as is w/ CTEs, Geoff's data:      791682
Geoff's data, cluster key on (ContractID, dt):   21156.2
Real table for MissingDays:                      21156.2
Numbered as table UCI=(ContractID, rn):             16.6115    26s elapsed.
                  UCI=(rn, ContractID):             41.9845    26s elapsed.
MissingDays as refactored to simple lookup          16.6477    22s elapsed.
Weekdays as varchar(30)                             13.4013    30s elapsed.

The estimated and actual number of rows differ wildly.

The plan has a table spoo, likely as a result of the recursive CTE. Most of the action is in a worktable coming off that:

Table 'Worktable'.   Scan count       2, logical reads 4 196 269, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'MissingDays'. Scan count 464 116, logical reads   928 232, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Numbered'.    Scan count 484 122, logical reads 1 475 467, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Just the way recursive's implemented, I guess!

将每日计划分组到 [开始日期; 结束日期] 与工作日列表的间隔

答案比较

Another strategy

连接到 PostgreSQL 服务器：致命：主机没有 pg_hba.conf 条目

如何让sqlplus的输出出现在一行中？

选择具有最大日期或最晚日期的日期

如何列出 PostgreSQL 中的所有模式？

列出指定表的所有列

如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

你如何mysqldump特定的表？

使用 psql 列出数据库权限

如何从 PostgreSQL 中的选择查询中将值插入表中？

如何使用 psql 列出所有数据库和表？

将每日计划分组到 [开始日期; 结束日期] 与工作日列表的间隔

答案比较

7 个回答

Another strategy

相关问题