In the past few days, after 8 weeks of being error free, we've hit this strange error 3 times, and I'm stumped.
This is the error message:
Executing the query "EXEC dbo.MergeTransactions" failed with the following error: "Cannot insert duplicate key row in object 'sales.Transactions' with unique index 'NCI_Transactions_ClientID_TransactionDate'. The duplicate key value is (1001, 2018-12-14 19:16:29.00, 304050920).".
The index we have is not unique. And if you notice, the duplicate key value in the error message doesn't even line up with the index. The strange thing is that if I rerun the proc, it succeeds.
This is the most recent link I could find that describes my problem, but I don't see a solution in it.
A few things about my scenario:
- The proc is updating the TransactionID (part of the primary key) - I think this is what causes the error, but I don't know why. We will be removing that logic.
- Change tracking is enabled on the table
- The transaction is run at read uncommitted isolation

Each table has 45 fields; I mostly listed the ones used in indexes. I am updating the TransactionID (the clustered key) in the update statement (unnecessarily). The strange part is that we went months without any issues until last week. It only happens sporadically, via SSIS.
Table
USE [DB]
GO
/****** Object: Table [sales].[Transactions] Script Date: 5/29/2019 1:37:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sales].[Transactions]') AND type in (N'U'))
BEGIN
CREATE TABLE [sales].[Transactions]
(
[TransactionID] [bigint] NOT NULL,
[ClientID] [int] NOT NULL,
[TransactionDate] [datetime2](2) NOT NULL,
/* snip*/
[BusinessUserID] [varchar](150) NOT NULL,
[BusinessTransactionID] [varchar](150) NOT NULL,
[InsertDate] [datetime2](2) NOT NULL,
[UpdateDate] [datetime2](2) NOT NULL,
CONSTRAINT [PK_Transactions_TransactionID] PRIMARY KEY CLUSTERED
(
[TransactionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE) ON [DB_Data]
) ON [DB_Data]
END
GO
USE [DB]
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[sales].[Transactions]') AND name = N'NCI_Transactions_ClientID_TransactionDate')
begin
CREATE NONCLUSTERED INDEX [NCI_Transactions_ClientID_TransactionDate] ON [sales].[Transactions]
(
[ClientID] ASC,
[TransactionDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [DB_Data]
END
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sales].[DF_Transactions_Units]') AND type = 'D')
BEGIN
ALTER TABLE [sales].[Transactions] ADD CONSTRAINT [DF_Transactions_Units] DEFAULT ((0)) FOR [Units]
END
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sales].[DF_Transactions_ISOCurrencyCode]') AND type = 'D')
BEGIN
ALTER TABLE [sales].[Transactions] ADD CONSTRAINT [DF_Transactions_ISOCurrencyCode] DEFAULT ('USD') FOR [ISOCurrencyCode]
END
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sales].[DF_Transactions_InsertDate]') AND type = 'D')
BEGIN
ALTER TABLE [sales].[Transactions] ADD CONSTRAINT [DF_Transactions_InsertDate] DEFAULT (sysdatetime()) FOR [InsertDate]
END
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sales].[DF_Transactions_UpdateDate]') AND type = 'D')
BEGIN
ALTER TABLE [sales].[Transactions] ADD CONSTRAINT [DF_Transactions_UpdateDate] DEFAULT (sysdatetime()) FOR [UpdateDate]
END
GO
Temp table
Same columns as the mgdata table, including the relevant fields. It also has a non-unique clustered index:
(
[BusinessTransactionID] [varchar](150) NULL,
[BusinessUserID] [varchar](150) NULL,
[PostalCode] [varchar](25) NULL,
[TransactionDate] [datetime2](2) NULL,
[Units] [int] NOT NULL,
[StartDate] [datetime2](2) NULL,
[EndDate] [datetime2](2) NULL,
[TransactionID] [bigint] NULL,
[ClientID] [int] NULL,
)
CREATE CLUSTERED INDEX ##workingTransactionsMG_idx ON #workingTransactions (TransactionID)
It is populated in batches (500k rows at a time), something like this
IF OBJECT_ID(N'tempdb.dbo.#workingTransactions') IS NOT NULL DROP TABLE #workingTransactions;
SELECT <fields>
INTO #workingTransactions
FROM import.Transactions
WHERE ImportRowID BETWEEN @RangeStart AND @RangeEnd -- pseudocode; 500k-row ranges
Primary key
CONSTRAINT [PK_Transactions_TransactionID] PRIMARY KEY CLUSTERED
(
[TransactionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE) ON [Data]
) ON [Data]
Nonclustered index
CREATE NONCLUSTERED INDEX [NCI_Transactions_ClientID_TransactionDate] ON [sales].[Transactions]
(
[ClientID] ASC,
[TransactionDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
Sample update statement
-- updates every field
update t
set
t.transactionid = s.transactionid,
t.[CityCode]=s.[CityCode],
t.TransactionDate=s.[TransactionDate],
t.[ClientID]=s.[ClientID],
t.[PackageMonths] = s.[PackageMonths],
t.UpdateDate = @UpdateDate
FROM #workingTransactions s
JOIN [DB].[sales].[Transactions] t
ON s.[TransactionID] = t.[TransactionID]
WHERE CAST(HASHBYTES('SHA2_256', CONCAT( S.[BusinessTransactionID],'|',S.[BusinessUserID],'|', etc)
<> CAST(HASHBYTES('SHA2_256', CONCAT( T.[BusinessTransactionID],'|',T.[BusinessUserID],'|', etc)
My question is: what is going on under the hood, and what is the solution? For reference, the link above mentions this:

At this point, I have a few theories:

- A bug related to memory pressure or large parallel update plans, but I would expect a different kind of error, and so far I cannot correlate low resources with the time frames in which these isolated and sporadic errors occur.
- A bug in the UPDATE statement or the data is causing an actual duplicate violation on the primary key, and some obscure SQL Server bug reports the error against the wrong index name.
- Dirty reads resulting from read uncommitted isolation are causing a large parallel update plan to double insert. But the ETL developers claim the default read committed is used, and it is hard to determine exactly what isolation level the process actually uses at run time.

I suspect that if I nudge the execution plan as a workaround, perhaps with a MAXDOP (1) hint or a session trace flag to disable spool operators, the error will just go away, but it is unclear how that would affect performance. (A sketch of the kind of hint I mean follows.)
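For illustration only, the kind of hint I have in mind looks like this (a sketch of the proc's update with the column list abbreviated; OPTION (MAXDOP 1) is a documented query hint, but whether it actually helps here is exactly what I don't know):

-- Force a serial plan for this statement only
UPDATE t
SET t.UpdateDate = @UpdateDate -- ...other column assignments as in the proc
FROM #workingTransactions AS s
JOIN [DB].[sales].[Transactions] AS t
    ON s.[TransactionID] = t.[TransactionID]
OPTION (MAXDOP 1);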
Version
Microsoft SQL Server 2017 (RTM-CU13) (KB4466404) - 14.0.3048.4 (X64) Nov 30 2018 12:57:58 Copyright (C) 2017 Microsoft Corporation Enterprise Edition (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: )
It is a bug. The problem is that it only happens occasionally and will be hard to reproduce. Still, your best chance of that is to engage Microsoft Support. Update processing is extraordinarily complex, so this would need a very detailed investigation.
For an example of the complexities involved, take a look at my posts MERGE Bug with Filtered Indexes and Incorrect Results with Indexed Views. Neither of those is directly related to your issue, but they do give a flavour.
Write deterministic updates
That is all rather generic, of course. More usefully, I can say that you should rewrite your current UPDATE statement. As the documentation notes, the results of an UPDATE are undefined when more than one source row qualifies for the same target row. Your UPDATE is not deterministic, and the results are therefore undefined. You should change it so that at most one source row is identified for each target row. Without that change, the result of the update may not reflect any individual source row.
Example
Let me show you an example, using tables loosely modelled on the ones given in the question:
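In outline, the demo tables look something like this (a sketch only; the exact code is in the fiddle linked below, and the names here are shortened):

CREATE TABLE dbo.TransactionsDemo
(
    TransactionID bigint NOT NULL,
    ClientID integer NOT NULL,
    TransactionDate datetime2(2) NOT NULL, -- no NULLs allowed anywhere in the target
    CONSTRAINT PK_TransactionsDemo
        PRIMARY KEY CLUSTERED (TransactionID)
);

CREATE TABLE #WorkingDemo
(
    TransactionID bigint NULL,
    ClientID integer NULL,
    TransactionDate datetime2(2) NULL -- the source allows NULLs, as in the question's temp table
);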
For simplicity, put one row in the target table and four rows in the source table:
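Something like the following (the exact demo values are in the fiddle; these are illustrative):

INSERT dbo.TransactionsDemo
    (TransactionID, ClientID, TransactionDate)
VALUES
    (1, 1, '2019-01-01');

-- Four source rows, all with the same TransactionID, each containing a NULL
INSERT #WorkingDemo
    (TransactionID, ClientID, TransactionDate)
VALUES
    (1, 2, NULL),
    (1, NULL, '2019-02-02'),
    (1, 3, NULL),
    (1, NULL, '2019-03-03');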
All four source rows match the target on TransactionID, so which one will be used if we run an update with a plain join (like the one in the question)? (Updating the TransactionID column is not important to the demo; comment it out if you like.)
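The update, sketched against the demo tables above (again illustrative, not the exact fiddle code):

-- Plain-join update, as in the question: four source rows match one target row
UPDATE t
SET t.TransactionID = s.TransactionID, -- not important to the demo; comment out if you like
    t.ClientID = s.ClientID,
    t.TransactionDate = s.TransactionDate
FROM #WorkingDemo AS s
JOIN dbo.TransactionsDemo AS t
    ON t.TransactionID = s.TransactionID;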
The first surprise is that the UPDATE statement completes without an error, despite the target table not allowing nulls in any column (all of the candidate rows contain nulls). The important point is that the result is undefined, and in this case it produces a result that matches none of the source rows:
db<>fiddle demo
More details: The ANY aggregate is broken
The update should be written so that it would succeed if it were written as the equivalent MERGE statement, which does check for attempts to update the same target row more than once. I do not generally recommend using MERGE directly, because it has been subject to so many implementation bugs and typically has worse performance. As a bonus, you may well find that rewriting your current update to be deterministic makes the occasional error problem go away as well. The product bug will still exist for people who write non-deterministic updates, of course. (A sketch of both ideas follows.)
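To make both points concrete, here is a sketch using the demo names from above (illustrative only, not the exact code from the fiddle or from the question's proc). On the demo data, the MERGE form fails with "The MERGE statement attempted to UPDATE or DELETE the same row more than once" instead of silently picking an arbitrary row, and the ROW_NUMBER() rewrite keeps exactly one source row per target key; the tie-break rule shown is purely illustrative:

-- Equivalent MERGE: errors when more than one source row matches a target row
MERGE dbo.TransactionsDemo AS t
USING #WorkingDemo AS s
    ON t.TransactionID = s.TransactionID
WHEN MATCHED THEN
    UPDATE SET
        t.ClientID = s.ClientID,
        t.TransactionDate = s.TransactionDate;

-- One way to make the question's update deterministic: number the source rows
-- per target key and join only the first one (the ORDER BY is an illustrative
-- tie-break; substitute whatever rule makes business sense)
WITH OneRowPerTarget AS
(
    SELECT s.*,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY s.TransactionID
                    ORDER BY s.TransactionDate DESC)
    FROM #workingTransactions AS s
)
UPDATE t
SET t.ClientID = s.ClientID,
    t.TransactionDate = s.TransactionDate,
    t.UpdateDate = @UpdateDate
    -- other column assignments as in the proc; note there is no update of t.TransactionID
FROM OneRowPerTarget AS s
JOIN [DB].[sales].[Transactions] AS t
    ON t.TransactionID = s.TransactionID
WHERE s.rn = 1;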