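I have deployed a SQL Server 2019 instance on Kubernetes and enabled Change Data Capture (CDC) on it.
Periodically I see a "Non-yielding Scheduler" error in the logs, after which the database starts consuming all of its allocated CPU and stops responding to queries.
There is no sign of resource shortage before the problem occurs, and the database is not under heavy load, since this is a development setup. Both Kubernetes and the database are stored on two SSDs in RAID, so I doubt that read/write limits are the cause.
Before CDC was enabled, the database ran without problems.
Image: 2019-CU25-ubuntu-20.04 (env: PID - Developer, AGENT_ENABLED - true)
Logs: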
getspinlock pre-Sleep(): spid 0, 1790 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 1685 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 1467 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 3232 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 3124 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 2904 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 4531 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 4276 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 3827 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 692 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 691 yields on lock type "XDESMGR" (adr 00000010029316C0)
2024-03-23 14:53:13.66 Server Using 'dbghelp.dll' version '4.0.5'
getspinlock pre-Sleep(): spid 0, 667 yields on lock type "XDESMGR" (adr 00000010029316C0)
2024-03-23 14:53:14.15 Server ***Unable to get thread context for spid 0
2024-03-23 14:53:14.16 Server * *******************************************************************************
2024-03-23 14:53:14.16 Server *
2024-03-23 14:53:14.16 Server * BEGIN STACK DUMP:
2024-03-23 14:53:14.16 Server * 03/23/24 14:53:14 spid 468
2024-03-23 14:53:14.16 Server *
2024-03-23 14:53:14.16 Server * Non-yielding Scheduler
2024-03-23 14:53:14.17 Server *
2024-03-23 14:53:14.17 Server * *******************************************************************************
2024-03-23 14:53:14.17 Server Stack Signature for the dump is 0x0000000000000014
getspinlock pre-Sleep(): spid 0, 5171 yields on lock type "XDESMGR" (adr 00000010029316C0)
2024-03-23 14:53:20.87 Server External dump process return code 0x20000001.
External dump process returned no errors.
2024-03-23 14:53:20.88 Server Process 0:0:0 (0x570) Worker 0x000000101E398160 appears to be non-yielding on Scheduler 0. Thread creation time: 13355613019271. Approx Thread CPU Used: kernel 100 ms, user 38760 ms. Process Utilization 12%. System Idle 0%. Interval: 70011 ms.
getspinlock pre-Sleep(): spid 0, 4854 yields on lock type "XDESMGR" (adr 00000010029316C0)
2024-03-23 14:53:25.96 Server Process 0:0:0 (0x168) Worker 0x000000101A354160 appears to be non-yielding on Scheduler 15. Thread creation time: 13355563362779. Approx Thread CPU Used: kernel 0 ms, user 37360 ms. Process Utilization 12%. System Idle 0%. Interval: 77297 ms.
getspinlock pre-Sleep(): spid 0, 4403 yields on lock type "XDESMGR" (adr 00000010029316C0)
2024-03-23 14:53:30.97 Server Process 0:0:0 (0x88) Worker 0x00000010389FE160 appears to be non-yielding on Scheduler 9. Thread creation time: 13355657814870. Approx Thread CPU Used: kernel 70 ms, user 35910 ms. Process Utilization 12%. System Idle 0%. Interval: 77313 ms.
getspinlock pre-Sleep(): spid 0, 1243 yields on lock type "XDESMGR" (adr 00000010029316C0)
getspinlock pre-Sleep(): spid 0, 1246 yields on lock type "XDESMGR" (adr 00000010029316C0)
.....
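How can I resolve this issue or trace its cause?
What I tried: I tried to resolve the "Non-yielding Scheduler" error by raising the resource limits and by setting the minimum and maximum amount of memory SQL Server can consume. Despite these adjustments, the problem persisted with no improvement.
What I expected: I expected that adjusting SQL Server's resource limits and memory settings would mitigate the issue, letting the database run without triggering the "Non-yielding Scheduler" error and the CPU exhaustion that follows it.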
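As a first step in tracing the cause, it can help to condense the error log into a few numbers. The following is a minimal Python sketch, not an official tool: it assumes the log lines have exactly the shape shown above, and the function name `summarize_errorlog` and the `sample` list are hypothetical names introduced only for illustration. It totals the reported spinlock yields per lock type and records the longest non-yielding interval per scheduler. Summing the yield counts is only a rough indicator of contention; the sketch makes no claim about how SQL Server accumulates those counters internally.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log excerpt in the question.
SPIN_RE = re.compile(
    r'getspinlock pre-Sleep\(\): spid (\d+), (\d+) yields on lock type "([^"]+)"'
)
NONYIELD_RE = re.compile(
    r'appears to be non-yielding on Scheduler (\d+)\..*?Interval: (\d+) ms'
)


def summarize_errorlog(lines):
    """Return (total yields per lock type, longest interval in ms per scheduler)."""
    yields_by_lock = Counter()
    longest_interval = {}
    for line in lines:
        match = SPIN_RE.search(line)
        if match:
            # group(3) is the lock type, group(2) the reported yield count
            yields_by_lock[match.group(3)] += int(match.group(2))
            continue
        match = NONYIELD_RE.search(line)
        if match:
            scheduler = int(match.group(1))
            interval_ms = int(match.group(2))
            previous = longest_interval.get(scheduler, 0)
            longest_interval[scheduler] = max(previous, interval_ms)
    return yields_by_lock, longest_interval


# Three lines copied from the log above, used as sample input.
sample = [
    'getspinlock pre-Sleep(): spid 0, 1790 yields on lock type "XDESMGR" (adr 00000010029316C0)',
    'getspinlock pre-Sleep(): spid 0, 1685 yields on lock type "XDESMGR" (adr 00000010029316C0)',
    '2024-03-23 14:53:20.88 Server Process 0:0:0 (0x570) Worker 0x000000101E398160 '
    'appears to be non-yielding on Scheduler 0. Thread creation time: 13355613019271. '
    'Approx Thread CPU Used: kernel 100 ms, user 38760 ms. Process Utilization 12%. '
    'System Idle 0%. Interval: 70011 ms.',
]

if __name__ == "__main__":
    yields, intervals = summarize_errorlog(sample)
    print(dict(yields))
    print(intervals)
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_errorlog(open(path, encoding="utf-8", errors="replace"))`, where `path` points to a copy of the errorlog file.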
Microsoft SQL Server 2019 (RTM-CU25) (KB5033688) - 15.0.4355.3 (X64) Jan 30 2024 17:02:22
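I applied AlwaysLearning's suggestion to my MS SQL Server 2019 instance.
Here is what I did:
Created an mssql.conf file with the following content:
Placed this file in the /var/opt/mssql/ directory as part of the deployment process.
After monitoring the system for nearly a month and a half, the issue has not recurred, so I believe this suggestion resolves the problem.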