背景
我正在尝试建立一个“访问”序列,其中如果在基本相同的地方General_Location
(
因此,如果在所有 A(n) 个位置都属于“A”的Location
序列中检测到动物,则前 6 次检测将作为访问 1 (@A),接下来作为访问 2 (@B),然后作为访问 3 (@D),然后访问 4 (返回 @A)。
A1, A2, A3, A3, A3, A1, B2, D4, A2
General_Location
由于LAG
并且在 SQL Server 2008R2 中不可用(在ing 子句LEAD
中也不可用),我正在尝试按照SQL Authority 博客条目中所述的解决方法。UNBOUNDED PRECEDING
PARTITION
我遇到了以下内存问题(更不用说计算时间了):
WITH s AS (
SELECT
RANK() OVER (PARTITION BY det.Technology, det.XmitID ORDER BY DetectDate ASC, ReceiverID ASC) as DetID,
COALESCE(TA.AnimalID, det.Technology+'-'+cast(da.XmitID AS nvarchar), 'BSVALUE999') as AnimalID,
det.Technology, det.XmitID, DetectDate, det.location as Location, RL.General_Location as GLocation, ReceiverID
FROM
Detections_with_Location as det JOIN
Receiver_Locations as RL
ON det.Location=RL.Location LEFT OUTER JOIN
Tagged_Animal as TA
ON det.AnimalID=TA.AnimalID
)
INSERT INTO ##ttOrder_det (AnimalID, Technology, XmitID, DD, Location, GLocation, ReceiverID, DetID, PrevDD, BinInc)
SELECT
s1.AnimalID, --was a coalesce
s1.Technology, s1.XmitID, s1.DetectDate, s1.Location, s1.GLocation, s1.ReceiverID,
s1.DetID,
sLAG.DetectDate,
CASE WHEN sLAG.DetectDate IS NULL
THEN 1
ELSE CASE WHEN sLAG.GLocation = s1.GLocation
THEN 0
ELSE 1
END
END AS BinInc
FROM s as s1
LEFT OUTER JOIN s AS sLAG ON
s1.DetID = sLAG.DetID + 1 AND
s1.AnimalID= sLAG.AnimalID --and s.Technology=sLAG.Technology and s.XmitID=sLAG.XmitID;
正如各种用户(@MartinSmith、@Frisbee)已经提到或暗示的那样,使用AnimalID
不是 的完整主键Tagged_Animal
,也不是在唯一约束中定义的。但是,表中的行数A.AnimalID=B.AnimalID AND A.TagSN<B.TagSN
(当前)为零。为了使这个查询健壮,我必须强制它是唯一的(或者只是从 PK 中删除 TagSN)。
表和索引定义
##ttOrder_det(临时表)
当前在填充表格之前创建了索引;我正在进行测试,在填充临时表后将NONCLUSTERED
非索引创建转移到一个位置。UNIQUE
CREATE TABLE ##ttOrder_det (
AnimalID nvarchar(50) not null,
Technology varchar(25) not null,
XmitID int not null,
DD DateTime not null,
[Location] [nvarchar](255) NULL,
[GLocation] nvarchar(255) NULL,
PrevDD DateTime NULL,
ReceiverID int not null,
DetID int NOT NULL,
BinInc int NULL,
BinNum int NULL,
CONSTRAINT [PK_ttRDA] PRIMARY KEY CLUSTERED
([AnimalID] ASC, [DD] ASC, ReceiverID ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE NONCLUSTERED INDEX NIX_F on ##ttOrder_det (AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_VTC ON ##ttOrder_det (ReceiverID ASC, AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_TCD ON ##ttOrder_det (AnimalID ASC, DD ASC);
CREATE NONCLUSTERED INDEX NIX_R ON ##ttOrder_det (DetID ASC);
CREATE NONCLUSTERED INDEX NIX_A ON ##ttOrder_det (GLocation ASC);
CREATE NONCLUSTERED INDEX NIX_DD ON ##ttOrder_det (DD, PrevDD);
CREATE UNIQUE INDEX UX_CTR ON ##ttOrder_det (AnimalID ASC, DetID ASC);
CREATE NONCLUSTERED INDEX NIX_Bi ON ##ttOrder_det (BinInc ASC);
CREATE NONCLUSTERED INDEX NIX_CT ON ##ttOrder_det (XmitID ASC, Technology ASC);
标记_动物
CREATE TABLE [dbo].[Tagged_Animal](
[DateTagged] [datetime] NULL,
[AnimalID] [nvarchar](50) NOT NULL,
[TagSN] [nvarchar](50) NOT NULL,
[XmitID] [int] NULL,
[Technology] [varchar](25) NULL,
[Animal_SubType] [nvarchar](50) NULL,
[Species] [nvarchar](30) NULL,
[StudyID] [nvarchar](50) NULL,
[Weight] [float] NULL,
[Length] [int] NULL,
[Length_Type] [nvarchar](50) NULL,
[Date_Released] [datetime] NULL,
[Release_Location] [nvarchar](50) NULL,
[Lat] [float] NULL,
[Lon] [float] NULL,
[Course_Dist_km] [float] NULL,
[Sex] [nvarchar](255) NULL,
[Life_Stage] [nvarchar](255) NULL,
[Marking_Method] [nvarchar](255) NULL,
[Tag_Type] [varchar](30) NULL,
[Notes] [nvarchar](255) NULL,
CONSTRAINT [PK_tbl_Tagged_Animal] PRIMARY KEY CLUSTERED
(
[AnimalID] ASC,
[TagSN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [I_TF_TagCode] ON [dbo].[Tagged_Animal]
(
[XmitID] ASC,
[Technology] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Detections_with_Location
CREATE TABLE [dbo].[Detections_with_Location](
[AnimalID] [nvarchar](50) NOT NULL,
[XmitID] [int] NOT NULL,
[Technology] [varchar](25) NOT NULL,
[DetectDate] [datetime] NOT NULL,
[ReceiverID] [int] NOT NULL,
[Data] [float] NULL,
[Units] [varchar](50) NULL,
[Location] [nvarchar](255) NULL,
[StartD] [datetime] NULL,
[StopD] [datetime] NULL,
[fname] [nvarchar](255) NULL,
[notes] [nvarchar](max) NULL,
CONSTRAINT [PK_dlwk] PRIMARY KEY CLUSTERED
(
[ReceiverID] ASC,
[Technology] ASC,
[XmitID] ASC,
[DetectDate] ASC,
[AnimalID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NIX_VTC] ON [dbo].[Detections_with_Location]
(
[ReceiverID] ASC,
[AnimalID] ASC,
[XmitID] ASC,
[Technology] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NIX_TCpi] ON [dbo].[Detections_with_Location]
(
[XmitID] ASC,
[Technology] ASC
)
INCLUDE ( [DetectDate],
[ReceiverID],
[Data],
[Units],
[Location]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NIX_TCD] ON [dbo].[Detections_with_Location]
(
[AnimalID] ASC,
[XmitID] ASC,
[Technology] ASC,
[DetectDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NIX_F] ON [dbo].[Detections_with_Location]
(
[AnimalID] ASC
)
INCLUDE ( [XmitID],
[Technology],
[DetectDate],
[ReceiverID],
[Data],
[Units],
[Location]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NIX_DSS] ON [dbo].[Detections_with_Location]
(
[ReceiverID] ASC,
[Location] ASC,
[StartD] ASC,
[StopD] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Receiver_Locations
CREATE TABLE [dbo].[Receiver_Locations](
[Region] [nvarchar](50) NULL,
[Location_Long] [nvarchar](50) NULL,
[Location] [nvarchar](50) NOT NULL,
[Lat] [float] NULL,
[Lon] [float] NULL,
[Altitude] [float] NULL,
[Elevation] [float] NULL,
[RiverKm] [float] NULL,
[LocationType] [nvarchar](50) NULL,
[General_Location] [nvarchar](50) NULL,
[Nearest_Access] [nvarchar](50) NULL,
[Responsible_Agent] [nvarchar](50) NULL,
[Agent_Phone] [nvarchar](255) NULL,
[Agent_Email] [nvarchar](255) NULL,
CONSTRAINT [PK_tbl_Receiver_Locations] PRIMARY KEY CLUSTERED
(
[Location] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
桌子尺寸
Tagged_Animals:
10,000 检测_with_Location: 46+ 百万条目
Receiver_Locations: 800
收到的特定错误
无法为数据库“tempdb”中的对象“dbo.SORT 临时运行存储:140737631617024”分配空间,因为“PRIMARY”文件组已满。通过删除不需要的文件、删除文件组中的对象、向文件组添加其他文件或为文件组中的现有文件设置自动增长来创建磁盘空间。
数据库“tempdb”的事务日志已满。要找出日志中的空间不能被重用的原因,请参阅 sys.databases (tempdb ACTIVE_TRANSACTION) 中的 log_reuse_wait_desc 列
执行批处理时出错。错误消息是:抛出了“System.OutOfMemoryException”类型的异常——(如果
s
在大约 3300 万条记录后直接选择)。
(估计)初始代码的执行计划摘要
INSERT
成本 0%
SEQUENCE
成本 0% 但从 9 个不同的子步骤中提取。这些子步骤(具有典型成本)是Index Insert
(2%),其次是Sort
(8%),然后是Table Spool
(2%)。NIX_A
有 9% 的成本,Index Insert
既没有NIX_TCD
也NIX_F
没有Sort
步骤;Table Spool
forNIX_F
是免费的。费用为 10%
。分发流
还有2% 的成本和 1% 的成本。
对于成本,它似乎增加了 95%,其他步骤花费了 13%,所以很明显某处存在一些舍入“错误”,可能主要在 14% 的序列中 。Clustered Index Insert
Sort
Parallelism
SEQUENCE
Index Insert
Sort
Table Spool
注释/参考
基于SQL 权威博客条目
的 LAG/LEAD 实现
另见此 Stackexchange 线程
我的问题
有什么改进的建议吗?
当我加入副本时,我也可以分区
s
吗?如果我制作
s
一个谨慎的临时表并适当地索引它,情况会有所改善吗?在执行所有插入之后
UNIQUE
在临时表中创建非索引会更有效吗?我假设必须预先创建(因此)索引以防止违反键约束。UNIQUE
PRIMARY KEY
回答我自己的问题之一
- 是的,是的。进一步优化后
- 21 分钟用数据填充临时表
- 1分钟索引
这个过程之前至少进行了 1.5 小时,出错并且没有生成结果表。在我开始摆弄查询逻辑之前,实际上需要 4 个多小时才能出错。
服务器规格:
处理器:Xeon E3-1240 V2 @ 3.4 GHz(4 核/8 线程)
内存:16 GB
分页文件:16 GB on 111 GB SSD(52 GB free)
tempdb + my database on 223 GB SSD(119 GB free)
当前状态
请参阅我发布的解决方案/答案。
我将此作为答案提交,因为我目前正在避免内存不足类型错误以及显着减少运行时间(4 小时以上,以失败告终;现在是 1.25 小时,以成功告终)。但是,我确信在大约 1000 万条记录之后,这个问题可能会再次发生,所以我将不胜感激任何其他旨在提高未来内存效率的评论或答案。
对此的“解决方案”是从临时表的设计中删除不需要的字段,尤其是索引。此外,非约束键的索引创建被推迟到表被填满之后。
为了解决分区不匹配的问题(@MartinSmith 首先指出)
JOIN
,我在子查询中创建了一个字段,然后我将其用于PARTITION
ing 和 forJOIN
ing。Q的代码
快速评论
我改用 using
ROW_NUMBER
而不是RANK
. 这是不确定的,但至少会导致没有“联系”,这将导致LAG
围绕该点的实施中断。联系不应该存在,但这只是针对General_Location
那些非常接近并共同检测相同传输的 s 的未来证明。同样,正如两位用户在上面的评论中指出的那样,我没有使用
Tagged_Animal
表的完整 PK,因此有可能存在JOIN
带有模棱两可的AnimalID
. 然而,目前,两者AnimalID
和TagSN
都是UNIQUE
,尽管不受约束。我计划放弃该
ReceiverID
领域以支持使用Location
,但我有一段时间在同一位置部署了两个接收器(一个接收器被假定丢失但后来被发现)确实同时检测到同一只动物完成任务的后续代码