我面临一个奇怪的问题,其中我分析了一个每 3 分钟运行一次的存储过程,并且给 CPU 带来了负载。我发现有许多选择语句是此过程的一部分,并且它们都在进行全表扫描(读取表中的所有页面)。因此,我在测试环境中对它们进行了测试,并使用非聚集索引来支持它。相关供应商也确认了这一点,他们同意进行更改。我昨天将它们部署到生产环境并检查了这些查询的逻辑读取,并在新索引之后进行了交叉检查,验证它具有积极影响并且逻辑读取下降了 1/10。
部署此索引后,每 3 分钟运行一次的程序立即开始失败。手动执行以检查问题,发现它每次都会导致死锁,除了像 1 小时内的 1 或 2 次。我完全不知道为什么索引会导致死锁?理想情况下,索引应该解决死锁,但它恰恰相反。
我指的表是一个堆,没有聚集主键,而是有非聚集主键。我已得到各方同意从 NC 更改为集群,但是它通过 PK-FK 关系与多个表链接,并且需要停机时间,因此它目前处于暂停状态。
我已经捕获了死锁图,并且还设置了 sp_blitzlock。应用程序查询和这个过程之间似乎发生了死锁,但是我不明白这个索引是如何导致它的,以及当我回滚这个索引时,它工作顺利并且没有死锁。
死锁图如下:
<deadlock>
<victim-list>
<victimProcess id="processef645b848" />
</victim-list>
<process-list>
<process id="processef645b848" taskpriority="0" logused="0" waitresource="PAGE: 6:1:965 " waittime="3661" ownerId="303318514" transactionname="INSERT" lasttranstarted="2020-11-02T12:18:12.793" XDES="0x31fbb78e0" lockMode="S" schedulerid="1" kpid="8564" status="suspended" spid="1246" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2020-11-02T12:18:00.500" lastbatchcompleted="2020-11-02T12:18:00.500" lastattention="1900-01-01T00:00:00.500" clientapp="SQLAgent - TSQL JobStep (Job 0x417A2365E91D1647B2C225CA23D84860 : Step 1)" hostname="DB_Server" hostpid="3628" loginname="SQL_Agent_Login" isolationlevel="read committed (2)" xactid="303318514" currentdb="7" currentdbname="DB_Name_1" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" stmtend="2888" sqlhandle="0x02000000c984e814ec51be0d03e3852ee0b755da518273d00000000000000000000000000000000000000000">
unknown </frame>
<frame procname="DB_Name_1.dbo.Procedure_Name" line="171" stmtstart="13412" stmtend="13482" sqlhandle="0x03000700151ddc58cd73c00013ac000001000000000000000000000000000000000000000000000000000000">
EXECUTE (@EXEC_IMMEDIATE_VAR) </frame>
<frame procname="adhoc" line="1" sqlhandle="0x01000700d070832b90a421810300000000000000000000000000000000000000000000000000000000000000">
EXEC Procedure_Name 'DB_User_Application' </frame>
</executionStack>
<inputbuf>
EXEC Procedure_Name 'DB_User_Application' </inputbuf>
</process>
<process id="processe1435468" taskpriority="0" logused="996" waitresource="PAGE: 6:1:88348 " waittime="3841" ownerId="303318578" transactionname="user_transaction" lasttranstarted="2020-11-02T12:18:13.007" XDES="0x255f023b0" lockMode="IX" schedulerid="1" kpid="4120" status="suspended" spid="1271" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2020-11-02T12:18:13.017" lastbatchcompleted="2020-11-02T12:18:13.007" lastattention="1900-01-01T00:00:00.007" clientapp="Vendor_Name" hostname="APP_Server_Name" hostpid="1296" loginname="DB_User_Application" isolationlevel="read committed (2)" xactid="303318578" currentdb="6" currentdbname="DB_Name_2" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" stmtstart="720" stmtend="1560" sqlhandle="0x020000001fdba70fc3e8a833920a732fe1c19b282e28ac1c0000000000000000000000000000000000000000">
unknown </frame>
<frame procname="adhoc" line="1" stmtend="1190" sqlhandle="0x02000000f228391fab59e520bc237b6a919f53ced0b1ac290000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
update Deadlock_Table set Error_Code = '000',TIMEOUT_NETWORK_ID=NULL , var32_32='XXXX', var32_15='000', var32_22='1', var32_04='IB', var32_05='XXX', var256_01='Processed OK', var32_09='XXX', var32_14='048', var64_01='XXXXX', var32_06='02112020121802', var32_03='XXX', var32_10='XXXX', var32_07='XXXX', var32_02='XXX', var32_01='XXX', Node_Id='APP_Server_Name', Message_Id='XXXX', End_Point_Id='XXXXX' where Log_Id=XXXXX and Receive_Time='XXXXX' </inputbuf>
</process>
</process-list>
<resource-list>
<pagelock fileid="1" pageid="965" dbid="6" subresource="FULL" objectname="DB_Name_2.dbo.Deadlock_Table" id="lock4d151f900" mode="IX" associatedObjectId="72057594079739904">
<owner-list>
<owner id="processe1435468" mode="IX" />
</owner-list>
<waiter-list>
<waiter id="processef645b848" mode="S" requestType="wait" />
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="88348" dbid="6" subresource="FULL" objectname="DB_Name_2.dbo.Deadlock_Table" id="lock4e1da7d00" mode="S" associatedObjectId="72057594079739904">
<owner-list>
<owner id="processef645b848" mode="S" />
</owner-list>
<waiter-list>
<waiter id="processe1435468" mode="IX" requestType="wait" />
</waiter-list>
</pagelock>
</resource-list>
</deadlock>
下面是表的 DDL:
CREATE TABLE [dbo].[Deadlock_Table](
[Log_id] [int] IDENTITY(1,1) NOT NULL,
[Receive_time] [varchar](15) NOT NULL,
[A] [int] NOT NULL,
[VAR32_01] [varchar](32) NULL,
[VAR32_02] [varchar](32) NULL,
[VAR32_03] [varchar](32) NULL,
[VAR32_04] [varchar](32) NULL,
[VAR32_05] [varchar](32) NULL,
[VAR32_06] [varchar](32) NULL,
[VAR32_07] [varchar](32) NULL,
[VAR32_08] [varchar](32) NULL,
[VAR32_09] [varchar](32) NULL,
[VAR32_10] [varchar](32) NULL,
[VAR32_11] [varchar](32) NULL,
[VAR32_12] [varchar](32) NULL,
[VAR32_13] [varchar](32) NULL,
[VAR32_14] [varchar](32) NULL,
[VAR32_15] [varchar](32) NULL,
[VAR32_16] [varchar](32) NULL,
[VAR32_17] [varchar](32) NULL,
[VAR32_18] [varchar](32) NULL,
[VAR32_19] [varchar](32) NULL,
[VAR32_20] [varchar](32) NULL,
[VAR32_21] [varchar](32) NULL,
[VAR32_22] [varchar](32) NULL,
[VAR32_23] [varchar](32) NULL,
[VAR32_24] [varchar](32) NULL,
[VAR32_25] [varchar](32) NULL,
[VAR32_26] [varchar](32) NULL,
[VAR32_27] [varchar](32) NULL,
[VAR32_28] [varchar](32) NULL,
[VAR32_29] [varchar](32) NULL,
[VAR32_30] [varchar](32) NULL,
[VAR32_31] [varchar](32) NULL,
[VAR32_32] [varchar](32) NULL,
[VAR32_33] [varchar](32) NULL,
[VAR32_34] [varchar](32) NULL,
[VAR32_35] [varchar](32) NULL,
[VAR32_36] [varchar](32) NULL,
[VAR32_37] [varchar](32) NULL,
[VAR32_38] [varchar](32) NULL,
[VAR32_39] [varchar](32) NULL,
[VAR32_40] [varchar](32) NULL,
[VAR32_41] [varchar](32) NULL,
[VAR32_42] [varchar](32) NULL,
[VAR32_43] [varchar](32) NULL,
[VAR32_44] [varchar](32) NULL,
[VAR32_45] [varchar](32) NULL,
[VAR32_46] [varchar](32) NULL,
[VAR32_47] [varchar](32) NULL,
[VAR32_48] [varchar](32) NULL,
[VAR32_49] [varchar](32) NULL,
[VAR32_50] [varchar](32) NULL,
[VAR32_51] [varchar](32) NULL,
[VAR32_52] [varchar](32) NULL,
[VAR32_53] [varchar](32) NULL,
[VAR32_54] [varchar](32) NULL,
[VAR32_55] [varchar](32) NULL,
[VAR32_56] [varchar](32) NULL,
[VAR32_57] [varchar](32) NULL,
[VAR32_58] [varchar](32) NULL,
[VAR32_59] [varchar](32) NULL,
[VAR32_60] [varchar](32) NULL,
[VAR32_61] [varchar](32) NULL,
[VAR32_62] [varchar](32) NULL,
[VAR32_63] [varchar](32) NULL,
[VAR32_64] [varchar](32) NULL,
[VAR64_01] [varchar](64) NULL,
[VAR64_02] [varchar](64) NULL,
[VAR64_03] [varchar](64) NULL,
[VAR64_04] [varchar](64) NULL,
[VAR64_05] [varchar](64) NULL,
[VAR64_06] [varchar](64) NULL,
[VAR64_07] [varchar](64) NULL,
[VAR64_08] [varchar](64) NULL,
[VAR64_09] [varchar](64) NULL,
[VAR64_10] [varchar](64) NULL,
[VAR64_11] [varchar](64) NULL,
[VAR64_12] [varchar](64) NULL,
[VAR64_13] [varchar](64) NULL,
[VAR64_14] [varchar](64) NULL,
[VAR64_15] [varchar](64) NULL,
[VAR64_16] [varchar](64) NULL,
[VAR64_17] [varchar](64) NULL,
[VAR64_18] [varchar](64) NULL,
[VAR64_19] [varchar](64) NULL,
[VAR64_20] [varchar](64) NULL,
[VAR64_21] [varchar](64) NULL,
[VAR64_22] [varchar](64) NULL,
[VAR64_23] [varchar](64) NULL,
[VAR64_24] [varchar](64) NULL,
[VAR64_25] [varchar](64) NULL,
[VAR64_26] [varchar](64) NULL,
[VAR64_27] [varchar](64) NULL,
[VAR64_28] [varchar](64) NULL,
[VAR64_29] [varchar](64) NULL,
[VAR64_30] [varchar](64) NULL,
[VAR64_31] [varchar](64) NULL,
[VAR64_32] [varchar](64) NULL,
[VAR128_01] [varchar](128) NULL,
[VAR128_02] [varchar](128) NULL,
[VAR128_03] [varchar](128) NULL,
[VAR128_04] [varchar](128) NULL,
[VAR128_05] [varchar](128) NULL,
[VAR128_06] [varchar](128) NULL,
[VAR128_07] [varchar](128) NULL,
[VAR128_08] [varchar](128) NULL,
[VAR128_09] [varchar](128) NULL,
[VAR128_10] [varchar](128) NULL,
[VAR128_11] [varchar](128) NULL,
[VAR128_12] [varchar](128) NULL,
[VAR128_13] [varchar](128) NULL,
[VAR128_14] [varchar](128) NULL,
[VAR128_15] [varchar](128) NULL,
[VAR128_16] [varchar](128) NULL,
[VAR256_01] [varchar](256) NULL,
[VAR256_02] [varchar](256) NULL,
[VAR256_03] [varchar](256) NULL,
[VAR256_04] [varchar](256) NULL,
[VAR256_05] [varchar](256) NULL,
[VAR256_06] [varchar](256) NULL,
[VAR256_07] [varchar](256) NULL,
[VAR256_08] [varchar](256) NULL,
[VAR512_01] [varchar](512) NULL,
[VAR512_02] [varchar](512) NULL,
[VAR512_03] [varchar](512) NULL,
[VAR512_04] [varchar](512) NULL,
[VAR1024_01] [varchar](1024) NULL,
[VAR1024_02] [varchar](1024) NULL,
[E] [varchar](20) NULL,
[M] [varchar](40) NULL,
[E] [varchar](50) NULL,
[N] [varchar](40) NULL,
[TN] [int] NULL,
[T] [numeric](1, 0) NULL,
[D] [numeric](1, 0) NULL,
CONSTRAINT [XPKtable_name] PRIMARY KEY NONCLUSTERED
(
[Log_id] ASC,
[Receive_time] ASC
)
目前,除了第三列(比如 A)的主键之外,它只有一个索引。
我提出的索引如下:
CREATE NONCLUSTERED INDEX [IX_Deadlock_Table] ON [dbo].[Deadlock_Table]
(
var32_02 ,
D,
T
)
GO
感谢您阅读如此冗长的问题,非常感谢您在解决此僵局方面的投入。万一我错过了一些重要的细节,请将它们放在评论部分,我会添加它们。
更新语句的应用查询计划 --> https://www.brentozar.com/pastetheplan/?id=r14x3TCOD
在游标内运行的查询是多个插入语句,它们基本上是涉及该死锁表与其他表连接的选择语句,它们如下:
INSERT INTO Table_Name (SELECT LOG_ID, RECEIVE_TIME, N, E, CASE WHEN TRAN_SUMMARY = 1 AND DIRTY_FLAG = 1 THEN 1 ELSE 0 END,null,null,null,var32_07,null,var32_02,var32_01,var32_20,Coalesce(null,var32_10,var32_11,var32_21),var32_21,null,null,Coalesce(var32_31,var32_12,var32_13),null,null,null,null,null,convert(varchar(2),N),null,1,1,null,null,null,null,null,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,var32_17,var32_19,var32_18,'system','system' FROM deadlock_table A, ADPTR B WHERE A.ADPTR_ID = B.ADPTR_ID AND B.N =4 AND (A.TRAN_SUMMARY = 0 OR (A.TRAN_SUMMARY = 1 AND A.DIRTY_FLAG = 1)) AND var32_02 IN (SELECT B.CODE_ID FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 4) AND var32_02 NOT IN ('0044')
INSERT INTO Table_Name (SELECT LOG_ID, RECEIVE_TIME, N, E, CASE WHEN TRAN_SUMMARY = 1 AND DIRTY_FLAG = 1 THEN 1 ELSE 0 END,var32_25,var32_10,var32_04,var32_24,null,var64_19,var32_14,var32_18,var64_11,var64_12,var64_02,var32_42,var32_20,var32_45,null,null,null,null,convert(varchar(2),N),var32_35,1,1,null,var32_43,var32_38,var32_39,null,var32_27,NULL,NULL,var32_19,NULL,NULL,var32_22,NULL,NULL,NULL,NULL,var64_22,var32_30,var64_23,'system','system' FROM deadlock_table A, ADPTR B WHERE A.ADPTR_ID = B.ADPTR_ID AND B.N =10 AND (A.TRAN_SUMMARY = 0 OR (A.TRAN_SUMMARY = 1 AND A.DIRTY_FLAG = 1)) AND var64_19 IN (SELECT B.CODE_ID FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 10)
INSERT INTO Table_Name (SELECT LOG_ID, RECEIVE_TIME, N, E, CASE WHEN TRAN_SUMMARY = 1 AND DIRTY_FLAG = 1 THEN 1 ELSE 0 END,var32_25,var32_10,var32_04,var32_24,null,var64_19,var32_14,var32_18,var64_11,var64_12,var64_02,var32_42,var32_20,var32_45,null,null,null,null,var32_62,var32_35,1,1,null,var32_43,var32_38,var32_39,null,var32_27,NULL,NULL,var32_19,NULL,NULL,var32_22,var1024_02,NULL,NULL,NULL,'system','system' FROM deadlock_table A, ADPTR B WHERE A.ADPTR_ID = B.ADPTR_ID AND B.N =11 AND (A.TRAN_SUMMARY = 0 OR (A.TRAN_SUMMARY = 1 AND A.DIRTY_FLAG = 1)) AND var64_19 IN (SELECT B.CODE_ID FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 11)
INSERT INTO Table_Name (SELECT LOG_ID, RECEIVE_TIME, N, E, CASE WHEN TRAN_SUMMARY = 1 AND DIRTY_FLAG = 1 THEN 1 ELSE 0 END,var32_25,var32_10,var32_04,var32_24,null,var64_19,var32_14,var32_18,var64_11, case when var64_19 = '64' then var64_10 else var64_12 end as TO_ACCOUNT_NUM, var64_02,var32_42,var32_20,var32_45,null,null,null,null,var32_62,var32_35,1,1,null,var32_43,var32_38,var32_39,null,var32_27,NULL,NULL,var32_19,NULL,NULL,var32_22,var1024_02,var64_17,var64_21,var64_20,'system','system' FROM deadlock_table A, ADPTR B WHERE A.ADPTR_ID = B.ADPTR_ID AND B.N =12 AND (A.TRAN_SUMMARY = 0 OR (A.TRAN_SUMMARY = 1 AND A.DIRTY_FLAG = 1)) AND (var64_19 IN (SELECT B.CODE_ID FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 12) or var64_19 in (SELECT B.CODE_NAME FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 12))
INSERT INTO Table_Name (SELECT LOG_ID, RECEIVE_TIME, N, E, CASE WHEN TRAN_SUMMARY = 1 AND DIRTY_FLAG = 1 THEN 1 ELSE 0 END,null,null,null,var32_44,null,var32_02,var32_46,null,null,null,null,null,null,null,null,null,null,null,convert(varchar(2),N),null,1,1,null,null,null,null,null,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,'system','system' FROM deadlock_table A, ADPTR B WHERE A.ADPTR_ID = B.ADPTR_ID AND B.N =13 AND (A.TRAN_SUMMARY = 0 OR (A.TRAN_SUMMARY = 1 AND A.DIRTY_FLAG = 1)) AND var32_02 IN (SELECT B.CODE_ID FROM TRN_CODE A , GETCODEMAPPING B WHERE A.TRAN_CODE = B.CODE_NAME AND B.N = 13)
没有我的索引的第一个选择语句的查询计划(以其当前形式) - https://www.brentozar.com/pastetheplan/?id=BkDnpbktw
创建索引后选择语句的查询计划 - https://www.brentozar.com/pastetheplan/?id=r1SCeGktD
以下是索引创建前后的逻辑读取:
Table 'deadlock_table'. Scan count 5, logical reads 326203, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'deadlock_table'. Scan count 58, logical reads 12055, physical reads 0, read-ahead reads 23, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL 版本
版本:Microsoft SQL Server 2014 (SP3) (KB4022619) - 12.0.6024.0 (X64) Sep 7 2018 01:37:51 Enterprise Edition:Windows NT 6.3 (Build 9600:) 上基于内核的许可(64 位)(管理程序) )
使用匿名计划和僵局很难详细回答您的问题,而不是关于僵局中涉及的代理工作的大量信息。但总的来说,这个问题的答案可能有助于您找出问题所在:
添加索引可能会导致查询以与没有这些索引时不同的顺序访问数据。
在您的情况下,应用程序查询具有以下模式:
dbo.deadlock_table
在该行所在的页面上获取 IX 锁(当然还有实际行上的 X 锁)添加新的 NC 索引后,代理作业查询似乎正在以相反的顺序请求这些堆页面上的 S 锁。
问题是堆没有定义的顺序。假设您在代理作业查询中有 RID 查找(即新的 NC 索引不是覆盖索引),那么它本质上是在整个堆中“随机”发出 S 锁。应用程序查询更新的行虽然在 NC PK 中彼此相邻,但实际上也可以随机分布在整个堆中。
如果没有 NC 索引,代理作业查询对堆的访问可能更可预测(分配有序扫描)或与 PK 对齐(扫描 NC PK + RID 查找),因此您能够避免这些死锁。
在我看来,根据现有信息,一些可行的选择是: