我需要在两个系统之间转换数据。
第一个系统将日程表存储为一个简单的日期列表。计划中包含的每个日期都是一行。日期顺序中可能存在各种间隙(周末、公共假期和较长的停顿,一周中的某些日子可能会被排除在日程表之外)。完全不能有空档,甚至可以包括周末。时间表最长可达 2 年。通常是几周之久。
这是一个简单的计划示例,该计划跨越两周,不包括周末(下面的脚本中有更复杂的示例):
+----+------------+------------+---------+--------+
| ID | ContractID | dt | dowChar | dowInt |
+----+------------+------------+---------+--------+
| 10 | 1 | 2016-05-02 | Mon | 2 |
| 11 | 1 | 2016-05-03 | Tue | 3 |
| 12 | 1 | 2016-05-04 | Wed | 4 |
| 13 | 1 | 2016-05-05 | Thu | 5 |
| 14 | 1 | 2016-05-06 | Fri | 6 |
| 15 | 1 | 2016-05-09 | Mon | 2 |
| 16 | 1 | 2016-05-10 | Tue | 3 |
| 17 | 1 | 2016-05-11 | Wed | 4 |
| 18 | 1 | 2016-05-12 | Thu | 5 |
| 19 | 1 | 2016-05-13 | Fri | 6 |
+----+------------+------------+---------+--------+
ID
是唯一的,但不一定是顺序的(它是主键)。每个合约中的日期都是唯一的(在 上有唯一的索引(ContractID, dt)
)。
第二个系统将时间表存储为带有作为时间表一部分的工作日列表的间隔。每个间隔由其开始日期和结束日期(包括)和日程表中包含的工作日列表定义。在这种格式中,您可以有效地定义重复的每周模式,例如周一至周三,但是当模式被打乱时(例如公共假期)就会变得很痛苦。
上面的简单示例如下所示:
+------------+------------+------------+----------+----------------------+
| ContractID | StartDT | EndDT | DayCount | WeekDays |
+------------+------------+------------+----------+----------------------+
| 1 | 2016-05-02 | 2016-05-13 | 10 | Mon,Tue,Wed,Thu,Fri, |
+------------+------------+------------+----------+----------------------+
[StartDT;EndDT]
属于同一合约的区间不应重叠。
我需要将来自第一个系统的数据转换为第二个系统使用的格式。目前,我正在 C# 中为单个给定合同在客户端解决此问题,但我想在服务器端的 T-SQL 中执行此操作,以便在服务器之间进行批量处理和导出/导入。最有可能的是,它可以使用 CLR UDF 来完成,但在这个阶段我不能使用 SQLCLR。
这里的挑战是使间隔列表尽可能短且人性化。
例如,这个时间表:
+-----+------------+------------+---------+--------+
| ID | ContractID | dt | dowChar | dowInt |
+-----+------------+------------+---------+--------+
| 223 | 2 | 2016-05-05 | Thu | 5 |
| 224 | 2 | 2016-05-06 | Fri | 6 |
| 225 | 2 | 2016-05-09 | Mon | 2 |
| 226 | 2 | 2016-05-10 | Tue | 3 |
| 227 | 2 | 2016-05-11 | Wed | 4 |
| 228 | 2 | 2016-05-12 | Thu | 5 |
| 229 | 2 | 2016-05-13 | Fri | 6 |
| 230 | 2 | 2016-05-16 | Mon | 2 |
| 231 | 2 | 2016-05-17 | Tue | 3 |
+-----+------------+------------+---------+--------+
应该变成这样:
+------------+------------+------------+----------+----------------------+
| ContractID | StartDT | EndDT | DayCount | WeekDays |
+------------+------------+------------+----------+----------------------+
| 2 | 2016-05-05 | 2016-05-17 | 9 | Mon,Tue,Wed,Thu,Fri, |
+------------+------------+------------+----------+----------------------+
,不是这个:
+------------+------------+------------+----------+----------------------+
| ContractID | StartDT | EndDT | DayCount | WeekDays |
+------------+------------+------------+----------+----------------------+
| 2 | 2016-05-05 | 2016-05-06 | 2 | Thu,Fri, |
| 2 | 2016-05-09 | 2016-05-13 | 5 | Mon,Tue,Wed,Thu,Fri, |
| 2 | 2016-05-16 | 2016-05-17 | 2 | Mon,Tue, |
+------------+------------+------------+----------+----------------------+
我试图对gaps-and-islands
这个问题应用一种方法。我试着分两次做。在第一遍中,我找到了简单连续天的岛屿,即岛屿的末端是天序列中的任何间隙,无论是周末、公共假期还是其他什么。对于每个这样找到的岛,我建立一个以逗号分隔的 distinct 列表WeekDays
。在第二遍中,我小组通过查看周数序列中的差距或WeekDays
.
使用这种方法,每个部分周都会以一个额外的间隔结束,如上所示,因为即使周数是连续的,也会WeekDays
发生变化。此外,一周内可能存在定期间隔(参见ContractID=3
示例数据,其中仅包含 的数据Mon,Wed,Fri,
),并且这种方法会在这样的时间表中为每一天生成单独的间隔。从好的方面来说,如果日程安排完全没有任何间隙(参见ContractID=7
包含周末的示例数据),它会生成一个间隔,在这种情况下,开始周或结束周是否部分无关紧要。
请查看下面脚本中的其他示例,以更好地了解我所追求的。您可以看到,周末经常被排除在外,但一周中的任何其他日子也可能被排除在外。仅在示例 3Mon
中,Wed
并且Fri
是计划的一部分。此外,还可以包括周末,如示例 7。该解决方案应平等对待一周中的所有日子。一周中的任何一天都可以包含在日程表中或排除在日程表之外。
要验证生成的间隔列表是否正确描述了给定的时间表,您可以使用以下伪代码:
- 循环遍历所有间隔
- 对于每个间隔,循环遍历开始日期和结束日期(包括)之间的所有日历日期。
- 对于每个日期,检查其星期几是否列在
WeekDays
. 如果是,则该日期包含在日程表中。
希望这可以阐明在什么情况下应该创建新的间隔。在示例 4 和 52016-05-09
中,从日程表的中间删除了一个星期一 ( ),这样的日程表不能由单个间隔表示。在示例 6 中,时间表中有很长的间隔,因此需要两个间隔。
间隔代表计划中的每周模式,当模式被中断/更改时,必须添加新的间隔。例 11 前三周有一个模式Tue
,然后这个模式变为Thu
。因此,我们需要两个间隔来描述这样的时间表。
我目前使用的是 SQL Server 2008,所以解决方案应该在这个版本中工作。如果 SQL Server 2008 的解决方案可以使用更高版本的功能进行简化/改进,那将是一个奖励,请同时展示它。
我有一个Calendar
表格(日期列表)和Numbers
表格(从 1 开始的整数列表),所以如果需要,可以使用它们。也可以创建临时表并有多个查询分阶段处理数据。但是,算法中的阶段数必须固定,游标和显式WHILE
循环都不行。
示例数据和预期结果的脚本
-- @Src is sample data
-- @Dst is expected result
DECLARE @Src TABLE (ID int PRIMARY KEY, ContractID int, dt date, dowChar char(3), dowInt int);
INSERT INTO @Src (ID, ContractID, dt, dowChar, dowInt) VALUES
-- simple two weeks (without weekend)
(110, 1, '2016-05-02', 'Mon', 2),
(111, 1, '2016-05-03', 'Tue', 3),
(112, 1, '2016-05-04', 'Wed', 4),
(113, 1, '2016-05-05', 'Thu', 5),
(114, 1, '2016-05-06', 'Fri', 6),
(115, 1, '2016-05-09', 'Mon', 2),
(116, 1, '2016-05-10', 'Tue', 3),
(117, 1, '2016-05-11', 'Wed', 4),
(118, 1, '2016-05-12', 'Thu', 5),
(119, 1, '2016-05-13', 'Fri', 6),
-- a partial end of the week, the whole week, partial start of the week (without weekends)
(223, 2, '2016-05-05', 'Thu', 5),
(224, 2, '2016-05-06', 'Fri', 6),
(225, 2, '2016-05-09', 'Mon', 2),
(226, 2, '2016-05-10', 'Tue', 3),
(227, 2, '2016-05-11', 'Wed', 4),
(228, 2, '2016-05-12', 'Thu', 5),
(229, 2, '2016-05-13', 'Fri', 6),
(230, 2, '2016-05-16', 'Mon', 2),
(231, 2, '2016-05-17', 'Tue', 3),
-- only Mon, Wed, Fri are included across two weeks plus partial third week
(310, 3, '2016-05-02', 'Mon', 2),
(311, 3, '2016-05-04', 'Wed', 4),
(314, 3, '2016-05-06', 'Fri', 6),
(315, 3, '2016-05-09', 'Mon', 2),
(317, 3, '2016-05-11', 'Wed', 4),
(319, 3, '2016-05-13', 'Fri', 6),
(330, 3, '2016-05-16', 'Mon', 2),
-- a whole week (without weekend), in the second week Mon is not included
(410, 4, '2016-05-02', 'Mon', 2),
(411, 4, '2016-05-03', 'Tue', 3),
(412, 4, '2016-05-04', 'Wed', 4),
(413, 4, '2016-05-05', 'Thu', 5),
(414, 4, '2016-05-06', 'Fri', 6),
(416, 4, '2016-05-10', 'Tue', 3),
(417, 4, '2016-05-11', 'Wed', 4),
(418, 4, '2016-05-12', 'Thu', 5),
(419, 4, '2016-05-13', 'Fri', 6),
-- three weeks, but without Mon in the second week (no weekends)
(510, 5, '2016-05-02', 'Mon', 2),
(511, 5, '2016-05-03', 'Tue', 3),
(512, 5, '2016-05-04', 'Wed', 4),
(513, 5, '2016-05-05', 'Thu', 5),
(514, 5, '2016-05-06', 'Fri', 6),
(516, 5, '2016-05-10', 'Tue', 3),
(517, 5, '2016-05-11', 'Wed', 4),
(518, 5, '2016-05-12', 'Thu', 5),
(519, 5, '2016-05-13', 'Fri', 6),
(520, 5, '2016-05-16', 'Mon', 2),
(521, 5, '2016-05-17', 'Tue', 3),
(522, 5, '2016-05-18', 'Wed', 4),
(523, 5, '2016-05-19', 'Thu', 5),
(524, 5, '2016-05-20', 'Fri', 6),
-- long gap between two intervals
(623, 6, '2016-05-05', 'Thu', 5),
(624, 6, '2016-05-06', 'Fri', 6),
(625, 6, '2016-05-09', 'Mon', 2),
(626, 6, '2016-05-10', 'Tue', 3),
(627, 6, '2016-05-11', 'Wed', 4),
(628, 6, '2016-05-12', 'Thu', 5),
(629, 6, '2016-05-13', 'Fri', 6),
(630, 6, '2016-05-16', 'Mon', 2),
(631, 6, '2016-05-17', 'Tue', 3),
(645, 6, '2016-06-06', 'Mon', 2),
(646, 6, '2016-06-07', 'Tue', 3),
(647, 6, '2016-06-08', 'Wed', 4),
(648, 6, '2016-06-09', 'Thu', 5),
(649, 6, '2016-06-10', 'Fri', 6),
(655, 6, '2016-06-13', 'Mon', 2),
(656, 6, '2016-06-14', 'Tue', 3),
(657, 6, '2016-06-15', 'Wed', 4),
(658, 6, '2016-06-16', 'Thu', 5),
(659, 6, '2016-06-17', 'Fri', 6),
-- two weeks, no gaps between days at all, even weekends are included
(710, 7, '2016-05-02', 'Mon', 2),
(711, 7, '2016-05-03', 'Tue', 3),
(712, 7, '2016-05-04', 'Wed', 4),
(713, 7, '2016-05-05', 'Thu', 5),
(714, 7, '2016-05-06', 'Fri', 6),
(715, 7, '2016-05-07', 'Sat', 7),
(716, 7, '2016-05-08', 'Sun', 1),
(725, 7, '2016-05-09', 'Mon', 2),
(726, 7, '2016-05-10', 'Tue', 3),
(727, 7, '2016-05-11', 'Wed', 4),
(728, 7, '2016-05-12', 'Thu', 5),
(729, 7, '2016-05-13', 'Fri', 6),
-- no gaps between days at all, even weekends are included, with partial weeks
(805, 8, '2016-04-30', 'Sat', 7),
(806, 8, '2016-05-01', 'Sun', 1),
(810, 8, '2016-05-02', 'Mon', 2),
(811, 8, '2016-05-03', 'Tue', 3),
(812, 8, '2016-05-04', 'Wed', 4),
(813, 8, '2016-05-05', 'Thu', 5),
(814, 8, '2016-05-06', 'Fri', 6),
(815, 8, '2016-05-07', 'Sat', 7),
(816, 8, '2016-05-08', 'Sun', 1),
(825, 8, '2016-05-09', 'Mon', 2),
(826, 8, '2016-05-10', 'Tue', 3),
(827, 8, '2016-05-11', 'Wed', 4),
(828, 8, '2016-05-12', 'Thu', 5),
(829, 8, '2016-05-13', 'Fri', 6),
(830, 8, '2016-05-14', 'Sat', 7),
-- only Mon-Wed included, two weeks plus partial third week
(910, 9, '2016-05-02', 'Mon', 2),
(911, 9, '2016-05-03', 'Tue', 3),
(912, 9, '2016-05-04', 'Wed', 4),
(915, 9, '2016-05-09', 'Mon', 2),
(916, 9, '2016-05-10', 'Tue', 3),
(917, 9, '2016-05-11', 'Wed', 4),
(930, 9, '2016-05-16', 'Mon', 2),
(931, 9, '2016-05-17', 'Tue', 3),
-- only Thu-Sun included, three weeks
(1013,10,'2016-05-05', 'Thu', 5),
(1014,10,'2016-05-06', 'Fri', 6),
(1015,10,'2016-05-07', 'Sat', 7),
(1016,10,'2016-05-08', 'Sun', 1),
(1018,10,'2016-05-12', 'Thu', 5),
(1019,10,'2016-05-13', 'Fri', 6),
(1020,10,'2016-05-14', 'Sat', 7),
(1021,10,'2016-05-15', 'Sun', 1),
(1023,10,'2016-05-19', 'Thu', 5),
(1024,10,'2016-05-20', 'Fri', 6),
(1025,10,'2016-05-21', 'Sat', 7),
(1026,10,'2016-05-22', 'Sun', 1),
-- only Tue for first three weeks, then only Thu for the next three weeks
(1111,11,'2016-05-03', 'Tue', 3),
(1116,11,'2016-05-10', 'Tue', 3),
(1131,11,'2016-05-17', 'Tue', 3),
(1123,11,'2016-05-19', 'Thu', 5),
(1124,11,'2016-05-26', 'Thu', 5),
(1125,11,'2016-06-02', 'Thu', 5),
-- one week, then one week gap, then one week
(1210,12,'2016-05-02', 'Mon', 2),
(1211,12,'2016-05-03', 'Tue', 3),
(1212,12,'2016-05-04', 'Wed', 4),
(1213,12,'2016-05-05', 'Thu', 5),
(1214,12,'2016-05-06', 'Fri', 6),
(1215,12,'2016-05-16', 'Mon', 2),
(1216,12,'2016-05-17', 'Tue', 3),
(1217,12,'2016-05-18', 'Wed', 4),
(1218,12,'2016-05-19', 'Thu', 5),
(1219,12,'2016-05-20', 'Fri', 6);
SELECT ID, ContractID, dt, dowChar, dowInt
FROM @Src
ORDER BY ContractID, dt;
DECLARE @Dst TABLE (ContractID int, StartDT date, EndDT date, DayCount int, WeekDays varchar(255));
INSERT INTO @Dst (ContractID, StartDT, EndDT, DayCount, WeekDays) VALUES
(1, '2016-05-02', '2016-05-13', 10, 'Mon,Tue,Wed,Thu,Fri,'),
(2, '2016-05-05', '2016-05-17', 9, 'Mon,Tue,Wed,Thu,Fri,'),
(3, '2016-05-02', '2016-05-16', 7, 'Mon,Wed,Fri,'),
(4, '2016-05-02', '2016-05-06', 5, 'Mon,Tue,Wed,Thu,Fri,'),
(4, '2016-05-10', '2016-05-13', 4, 'Tue,Wed,Thu,Fri,'),
(5, '2016-05-02', '2016-05-06', 5, 'Mon,Tue,Wed,Thu,Fri,'),
(5, '2016-05-10', '2016-05-20', 9, 'Mon,Tue,Wed,Thu,Fri,'),
(6, '2016-05-05', '2016-05-17', 9, 'Mon,Tue,Wed,Thu,Fri,'),
(6, '2016-06-06', '2016-06-17', 10, 'Mon,Tue,Wed,Thu,Fri,'),
(7, '2016-05-02', '2016-05-13', 12, 'Sun,Mon,Tue,Wed,Thu,Fri,Sat,'),
(8, '2016-04-30', '2016-05-14', 15, 'Sun,Mon,Tue,Wed,Thu,Fri,Sat,'),
(9, '2016-05-02', '2016-05-17', 8, 'Mon,Tue,Wed,'),
(10,'2016-05-05', '2016-05-22', 12, 'Sun,Thu,Fri,Sat,'),
(11,'2016-05-03', '2016-05-17', 3, 'Tue,'),
(11,'2016-05-19', '2016-06-02', 3, 'Thu,'),
(12,'2016-05-02', '2016-05-06', 5, 'Mon,Tue,Wed,Thu,Fri,'),
(12,'2016-05-16', '2016-05-20', 5, 'Mon,Tue,Wed,Thu,Fri,');
SELECT ContractID, StartDT, EndDT, DayCount, WeekDays
FROM @Dst
ORDER BY ContractID, StartDT;
答案比较
真正的表@Src
有不同的403,555
行。所有答案都会产生正确的结果(至少对于我的数据而言),并且所有答案都相当快,但它们的最优性不同。生成的间隔越少越好。出于好奇,我将运行时间包括在内。主要关注的是正确和最佳的结果,而不是速度(除非花费太长时间 - 我在 10 分钟后停止了 Ziggy Crueltyfree Zeitgeister 的非递归查询)。15,857
ContractIDs
+--------------------------------------------------------+-----------+---------+
| Answer | Intervals | Seconds |
+--------------------------------------------------------+-----------+---------+
| Ziggy Crueltyfree Zeitgeister | 25751 | 7.88 |
| While loop | | |
| | | |
| Ziggy Crueltyfree Zeitgeister | 25751 | 8.27 |
| Recursive | | |
| | | |
| Michael Green | 25751 | 22.63 |
| Recursive | | |
| | | |
| Geoff Patterson | 26670 | 4.79 |
| Weekly gaps-and-islands with merging of partial weeks | | |
| | | |
| Vladimir Baranov | 34560 | 4.03 |
| Daily, then weekly gaps-and-islands | | |
| | | |
| Mikael Eriksson | 35840 | 0.65 |
| Weekly gaps-and-islands | | |
+--------------------------------------------------------+-----------+---------+
| Vladimir Baranov | 25751 | 121.51 |
| Cursor | | |
+--------------------------------------------------------+-----------+---------+
This one uses a recursive CTE. Its result is identical to the example in the question. It was a nightmare to come up with... The code includes comments to ease through its convoluted logic.
Another strategy
This one should be significantly faster than the previous one because it doesn't rely on the slow limited recursive CTE in SQL Server 2008, although it implements more or less the same strategy.
There is a
WHILE
loop (I couldn't devise a way to avoid it), but goes for a reduced number of iterations (the highest number of sequences (minus one) on any given contract).It's a simple strategy, and could be used for sequences either shorter or longer than a week (replacing any occurrence of the constant 7 for any other number, and the
dowBit
calculated from MODULUS x ofDayNo
rather thanDATEPART(wk)
) and up to 32.不完全是您正在寻找的东西,但您可能会感兴趣。
该查询使用逗号分隔的字符串为每周使用的天数创建周。然后它在 中找到使用相同模式的连续几周的岛屿
Weekdays
。结果:
ContractID = 2
shows what the difference in the result is compared to what you want. The first and last week will be treated as separate periods sinceWeekDays
is different.I ended up with an approach that yields the optimal solution in this case and I think will do well in general. The solution is quite lengthy, however, so it would be interesting to see if someone else has a different approach that is more concise.
Here is a script that contains the full solution.
And here is an outline of the algorithm:
ContractId
ContractId
and have sameWeekDays
WeekDays
of the single week matches a leading subset of theWeekDays
of the previous grouping, merge into that previous groupingWeekDays
of the single week matches a trailing subset of theWeekDays
of the next grouping, merge into that next grouping我无法理解将周与间隔分组或将周与周末分组背后的逻辑(例如,当周末连续两周时,周末去哪一周?)。
以下查询产生所需的输出,只是它只对连续的工作日进行分组,并将星期几分组 Sun-Sat(而不是 Mon-Sun)。虽然不完全是您想要的,但也许这可以为不同的策略提供一些线索。天的分组来自这里。使用的窗口函数应该适用于 SQLServer 2008,但我没有那个版本来测试它是否真的可以。
结果
For the sake of completeness, here is a two-pass
gaps-and-islands
approach that I tried myself before asking this question.As I was testing it on the real data I found few cases when it was producing incorrect results and fixed it.
Here is the algorithm:
CTE_ContractDays
,CTE_DailyRN
,CTE_DailyIslands
) and calculate a week number for each starting and ending date of an island. Here week number is calculated assuming that Monday is the first day of the week.CTE_Weeks
).CTE_FirstResult
).WeekDays
(CTE_SecondRN
,CTE_Schedules
).It handles well cases when there is no disruption in the weekly patterns (1, 7, 8, 10, 12). It handles well cases when pattern has non-sequential days (3).
But, unfortunately, it generates extra intervals for partial weeks (2, 3, 5, 6, 9, 11).
Result
Cursor-based solution
I converted my C# code into a cursor-based algorithm, just to see how it compares to other solutions on real data. It confirms that it is much slower than other set-based or recursive approaches, but it generates an optimal result.
I was a little bit surprised that the Vladimir's cursor solution was so slow, so I also tried to optimize that version. I did confirm that using a cursor was very slow for me as well.
However, at the cost of using undocumented functionality in SQL Server by appending to a variable while processing a rowset, I was able to create a simplified version of this logic that yields the optimal result and executes much faster than both the cursor and my original solution. So use at your own risk, but I'll present the solution in case it's of interest. It would also be possible to update the solution to use a
WHILE
loop from one to the maximum row number, seeking to the next row number at each iteration of the loop. This would stick to fully documented and reliable functionality, but would violate the (somewhat artificial) stated constraint of the problem thatWHILE
loops are not allowed.Note that if using SQL 2014 was allowed, it is likely that a natively-compiled stored procedure that loops over the row numbers and access each row number in a memory-optimized table would be an implementation of this same logic that would run more quickly.
Here is the full solution, including expanding the trial data set out to about a half million rows. The new solution completes in about 3 seconds and in my opinion is a lot more concise and readable than the previous solution I offered. I'll break out the three steps involved here:
Step 1: pre-processing
We first add a row number to the data set, in the order we will process the data. While doing so, we also convert each dowInt into a power of 2 so that we can use a bitmap to represent which days have been observed in any given grouping:
Step 2: Looping through the contract days in order to identify new groupings
We next loop over the data, in order by row number. We compute only the list of row numbers that form the boundary of a new grouping, then output those row numbers into a table:
Step 3: Computing final results based on the row numbers of each grouping boundary
We then compute the final groupings by using the boundaries identified in the loop above to aggregate all dates that fall into each grouping:
Discussion will follow the code.
@Helper
is to cope with this rule:It allows me to list day names, in day number order, between any two given days. This is used when deciding if a new interval should start. I populate it with two week's worth of values to make wrapping around a weekend easier to code.
There are cleaner ways to implement this. A full "dates" table would be one. There's probably a clever way with day number and modulo arithmetic, too.
The CTE
MissingDays
is to generate a list of day names between any two given days. It is handled in this clunky way because the recursive CTE (following) does not allow aggregates, TOP(), or other operators. This is inelegant, but it works.CTE
Numbered
is to enforce a known, gap-free sequence on the data. It avoids a lot of comparisons later.CTE
Incremented
is where the action happens. In essence I use a recursive CTE to step through the data and enforce the rules. The row number generated inNumbered
(above) is used to drive the recursive processing.The seed of the recursive CTE simply gets the first date for each ContractID and initialises values that will be used to decide if a new interval is required.
Deciding if a new interval should start requires the current interval's start date, day list and the length of any gap in the calendar dates. These may be reset or carried forward, depending on the decision. Hence the recursive part is verbose and a little repetitive, as we have to decide whether to start a new interval for more than one column value.
The decision logic for columns
WeekDays
andIntervalStart
should have the same decision logic - it can be cut-and-pasted between them. If the logic for starting a new interval were to change this is the code to alter. Ideally it would be abstracted, therefore; doing this in a recursive CTE may be challenging.The
EXISTS()
clause is the outfall of not being able to use aggregate functions in a recursive CTE. All it does is see if the days falling within a gap are in the current interval already.There is nothing magic about the nesting of the logic clauses. If it is clearer in another conformation, or using nested CASEs, say, there is no reason to keep it this way.
The final
SELECT
is to give the output in the format desired.Having the PK on
Src.ID
is not useful for this method. A clustered index on(ContractID,dt)
would be nice, I think.There are a few rough edges. The days are not returned in dow sequence, but in the calendar sequence they appear in the source data. Everything to do with @Helper is klunky and could be smoothed. I like the idea of using one bit per day and using binary functions instead of
LIKE
. Separating some of the auxilliary CTEs into temp table with proper indexes would undoubtedly help.One of the challenges with this is that a "week" does not align with a standard calendar, but rather is driven by the data, and resets when it is determined that a new interval should begin. A "week", or at least an interval, can be from one day long to spanning the entire dataset.
For interests sake, here's the estimated costs against Geoff's sample data (thanks for that!) after various changes:
The estimated and actual number of rows differ wildly.
The plan has a table spoo, likely as a result of the recursive CTE. Most of the action is in a worktable coming off that:
Just the way recursive's implemented, I guess!