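I have a table with an ID column, a parent column and a child column. So for each record I know which record comes before it in the chain and which comes after it, but from a single record I can't tell where in the chain it sits, or what the first or last link of its chain is.
For demonstration purposes, assume we have the following setup. I realise this setup has a number of design problems, but it is what I have to work with.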
```sql
CREATE TABLE Relationships (
ID VARCHAR(4),
ParentID VARCHAR(4),
ChildID VARCHAR(4)
)
-- Insert root entries with their children
INSERT INTO Relationships (ID, ParentID, ChildID)
VALUES ('0001', '', '0003'), ('0002', '', '0004')
-- Now add further entries for each relationship chain
INSERT INTO Relationships(ID, ParentID, ChildID)
VALUES('0003', '0001', '0005'), ('0005', '0003', '0006'), ('0006', '0005', '0007'), ('0007', '0006', ''),
('0004', '0002', '')
--Now we have two chains of 0001 -> 0003 -> 0005 -> 0006 -> 0007 and 0002 -> 0004
```
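With a recursive CTE like the one below, I can work out how every record relates to the others in its chain and what position it holds in that chain.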
```sql
WITH RelationshipChain AS (
SELECT ID, ParentID, ChildID, 0 AS Seq, ID AS RootID
FROM Relationships WHERE ParentID = ''
UNION ALL
SELECT r2.ID, r2.ParentID, r2.ChildID, rc.Seq + 1 AS Seq, rc.RootID AS RootID
FROM Relationships r2
INNER JOIN RelationshipChain rc ON rc.ChildID = r2.ID
)
SELECT * FROM RelationshipChain
ORDER BY RootID, Seq
```
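With about 2 million records this runs in about 30 seconds, which is pretty good, but if I also try to include the last link of each chain, the run time goes up fourfold. At the moment I'm doing it like this: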
```sql
WITH RelationshipChain AS (
SELECT ID, ParentID, ChildID, 0 AS Seq, ID AS RootID
FROM Relationships WHERE ParentID = ''
UNION ALL
SELECT r2.ID, r2.ParentID, r2.ChildID, rc.Seq + 1 AS Seq, rc.RootID AS RootID
FROM Relationships r2
INNER JOIN RelationshipChain rc ON rc.ChildID = r2.ID
)
SELECT *
FROM RelationshipChain rc
CROSS APPLY (
SELECT MAX(Seq) AS FinalSeq FROM RelationshipChain WHERE rc.RootID = RootID
) AS b
CROSS APPLY (
SELECT ID AS LastChild FROM RelationshipChain WHERE b.FinalSeq = Seq AND rc.RootID = RootID
) AS c
ORDER BY RootID, Seq
```
Is there a more efficient way to do this?
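I don't have your dataset, so I can't test whether this is better, but it feels better. After building the chains, we reverse them to find the "childest" item in each chain, then join that back to the original chains.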
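This is one way to write that idea in T-SQL, not the original answer's code. It assumes the `Relationships` table from the question; the temp table `#Chain` and the index `IX_Chain_Root_Seq` are hypothetical names. Materialising the chains first matters because SQL Server does not cache a CTE's result: every reference to `RelationshipChain` in the question's query is expanded and evaluated again. `ROW_NUMBER() ... ORDER BY Seq DESC` does the "reversing", so row 1 of each chain is its childest item, and that row is joined back onto every row of the chain.

```sql
-- Sketch only: assumes the Relationships table from the question.
-- #Chain and IX_Chain_Root_Seq are hypothetical names.

-- 1. Build every chain once and store it, so the recursion runs a single time.
WITH RelationshipChain AS (
    SELECT ID, ParentID, ChildID, 0 AS Seq, ID AS RootID
    FROM Relationships
    WHERE ParentID = ''
    UNION ALL
    SELECT r2.ID, r2.ParentID, r2.ChildID, rc.Seq + 1 AS Seq, rc.RootID AS RootID
    FROM Relationships r2
    INNER JOIN RelationshipChain rc ON rc.ChildID = r2.ID
)
SELECT ID, ParentID, ChildID, Seq, RootID
INTO #Chain
FROM RelationshipChain
-- 0 removes the default limit of 100 recursion levels; only safe if the
-- data contains no cycles, otherwise the recursion never ends.
OPTION (MAXRECURSION 0);

-- Optional (assumption): order the temp table by chain and position.
CREATE CLUSTERED INDEX IX_Chain_Root_Seq ON #Chain (RootID, Seq);

-- 2. "Reverse" each chain: number its rows from the end, so rn = 1 is the
--    childest item, then join that row back onto every row of its chain.
WITH Reversed AS (
    SELECT RootID,
           Seq AS FinalSeq,
           ID  AS LastChild,
           ROW_NUMBER() OVER (PARTITION BY RootID ORDER BY Seq DESC) AS rn
    FROM #Chain
)
SELECT c.ID, c.ParentID, c.ChildID, c.Seq, c.RootID, r.FinalSeq, r.LastChild
FROM #Chain c
INNER JOIN Reversed r
    ON r.RootID = c.RootID
   AND r.rn = 1
ORDER BY c.RootID, c.Seq;

-- DROP TABLE #Chain; when finished (it is also dropped when the session ends).
```

The output has the same columns as the question's `SELECT *` (ID, ParentID, ChildID, Seq, RootID, FinalSeq, LastChild). On SQL Server 2012 or later, a swapped-in alternative that avoids the self-join is to select `MAX(Seq) OVER (PARTITION BY RootID) AS FinalSeq` and `FIRST_VALUE(ID) OVER (PARTITION BY RootID ORDER BY Seq DESC) AS LastChild` directly from `#Chain`; which version is faster on 2 million rows is something only a test on the real data will show.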
Results for the sample data:
Indexes will help performance as well, in particular on `ID`, which the recursive step joins on.
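As one sketch of what that could mean here (the index names are hypothetical, and whether each index pays off depends on the real data and execution plan): the recursive member joins `rc.ChildID = r2.ID`, so an index keyed on `ID` lets each level seek instead of scan, and a filtered index on the root rows serves the anchor member's `WHERE ParentID = ''`.

```sql
-- Hypothetical index names; assumes the Relationships table from the question.

-- Recursive member joins on rc.ChildID = r2.ID: seek on ID and cover the
-- other two columns so no lookups back into the table are needed.
CREATE NONCLUSTERED INDEX IX_Relationships_ID
    ON Relationships (ID)
    INCLUDE (ParentID, ChildID);

-- Anchor member filters on ParentID = '' (the chain roots): a filtered
-- index stores only those rows.
CREATE NONCLUSTERED INDEX IX_Relationships_Roots
    ON Relationships (ParentID)
    INCLUDE (ID, ChildID)
    WHERE ParentID = '';
```

If `ID` is known to be unique, making it the clustered primary key would serve the recursive join just as well; the question doesn't say whether it is.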