
Question

Greg

Asked: 2023-06-15 07:43:01 +0800 CST

SQL Perf - Why does the query perform a clustered index scan instead of using the defined nonclustered index?


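I have a query that performs a clustered index scan on a very large table, and in some cases that scan causes a timeout. I need help understanding why it does not use the nonclustered index that is defined.

Here is the query: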

DECLARE @StartDate datetime = '2023-03-16 00:00:00';

DECLARE @TerminalIds [dbo].[udtBigInt]; -- user defined table with a BIGINT col
INSERT INTO @TerminalIds ([Id])
SELECT [EquipmentId]
FROM #mechanicsTerminal;

SELECT [DataRecId]
    , [RawData]
    , [RecordingTime]
    , [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;

Here is the table definition:

CREATE TABLE [dbo].[Data](
    [DataRecId] [bigint] IDENTITY(1,1) NOT NULL,
    [RawData] [nvarchar](max) NOT NULL,
    [CreatedDateUTC] [datetime] NOT NULL,
    [RecordingTime] [datetime] NOT NULL,
    [EquipmentId] [bigint] NOT NULL,
    [DataSetId] [uniqueidentifier] NULL,
    [SourceType] [nvarchar](50) NULL,
    [Name] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED ([DataRecId] ASC)
);
GO
ALTER TABLE [dbo].[Data] WITH CHECK ADD CONSTRAINT [chk_Data_RawData] CHECK ((ISJSON([RawData]) = (1)))
GO

Here are the indexes:

CREATE INDEX [nc_Data_DataSetId_includes] 
ON [dbo].[Data] ( [DataSetId] ) INCLUDE ( [DataRecId], [RawData], [RecordingTime]);
GO
CREATE INDEX [nc_Data_EquipmentId_includes] 
ON [dbo].[Data] ( [EquipmentId] ) INCLUDE ( [DataSetId], [RawData]);
GO
CREATE INDEX [nc_Data_EquipmentId_RecordingTime_Name_includes] 
ON [dbo].[Data] ( [EquipmentId], [RecordingTime], [Name] ) INCLUDE ( [DataRecId], [RawData]);
GO

Here is the actual execution plan:

https://www.brentozar.com/pastetheplan/?id=B1oq7TDD3

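With this particular data, the query executes in under a second.

However, there is a case where @TerminalIds contains only three records and [dbo].[Data] has no matching records, and the query never completes. Here is the plan after 45 seconds: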

https://www.brentozar.com/pastetheplan/?id=rJJMRavDn

What I have tried:

Updating statistics and recompiling the main procedure
Using an INNER JOIN to @TerminalIds instead of a subquery in the IN clause
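
For reference, the two attempts above would have looked roughly like the following. This is a reconstruction under assumptions, not the exact statements that were run: the procedure name dbo.usp_GetTerminalData is hypothetical, and @StartDate and @TerminalIds are assumed to be declared in the same batch, as in the query above.

```sql
-- Attempt 1: refresh statistics on the large table, then mark the
-- calling procedure so that it is recompiled on its next execution.
UPDATE STATISTICS [dbo].[Data];
EXEC sp_recompile N'dbo.usp_GetTerminalData';  -- hypothetical procedure name

-- Attempt 2: join to the table variable instead of using IN (subquery).
SELECT d.[DataRecId]
    , d.[RawData]
    , d.[RecordingTime]
    , d.[EquipmentId]
FROM [dbo].[Data] AS d
INNER JOIN @TerminalIds AS t
    ON t.[Id] = d.[EquipmentId]
WHERE d.[RecordingTime] >= @StartDate
ORDER BY d.[DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
```

Unlike the IN form, the join returns a row once per matching Id, so it produces duplicates if @TerminalIds ever holds the same Id twice.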

1 Answer


Charlieface · Answer 1 · 2023-06-15T09:37:09+08:00

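The problem is that you have an inequality on RecordingTime, but you are using OFFSET FETCH ordered by DataRecId. The server seems to think that the ordering matters more and will cut down the row count faster, so it resorts to scanning the primary key index in the hope of finding those 50 rows quickly. When few or no rows match, that scan has to read the whole table before it can stop.

You can force it to read the nc_Data_EquipmentId_RecordingTime_Name_includes index by rewriting the query like this: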

SELECT DataRecId
    , RawData
    , RecordingTime
    , EquipmentId
FROM (
    SELECT TOP (1000000000) *
    FROM dbo.Data
    WHERE EquipmentId IN (SELECT Id FROM @TerminalIds)
      AND RecordingTime >= @StartDate
    ORDER BY
      EquipmentId,
      RecordingTime
) t
ORDER BY DataRecId DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
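
A different way to get a similar effect, not the rewrite shown above but a swapped-in technique, is a FORCESEEK table hint that names the index and its seek columns. The sketch below assumes a SQL Server version that supports the parameterized form of FORCESEEK and that @StartDate and @TerminalIds are declared as in the question. Hints override the optimizer, and the statement fails with a "could not produce a query plan" error if no seek plan is possible, so treat it as something to test rather than a guaranteed fix.

```sql
SELECT d.DataRecId
    , d.RawData
    , d.RecordingTime
    , d.EquipmentId
FROM dbo.Data AS d
    -- Ask for a seek on the (EquipmentId, RecordingTime) key columns
    -- of the existing nonclustered index.
    WITH (FORCESEEK (nc_Data_EquipmentId_RecordingTime_Name_includes (EquipmentId, RecordingTime)))
WHERE d.EquipmentId IN (SELECT Id FROM @TerminalIds)
  AND d.RecordingTime >= @StartDate
ORDER BY d.DataRecId DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
```

With this shape the plan should seek once per terminal Id and then sort only the matching rows by DataRecId, which is cheap when few or no rows match.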

Assuming, as I suspect, that you are using an ORM such as Entity Framework, you probably want something like this:

context.Data
  .Where(d => TerminalIds.Contains(d.EquipmentId) && d.RecordingTime >= StartDate)
  .OrderBy(d => d.EquipmentId)
  .ThenBy(d => d.RecordingTime)
  .Take(1000000000)
  .OrderByDescending(d => d.DataRecId)
  .Take(50)

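If you can drop the requirement to order by DataRecId in the first place, you will probably improve the chances of the right index being chosen significantly.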
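
As a rough illustration of that point, and assuming the caller can accept the newest rows by RecordingTime rather than by DataRecId (a change in semantics, since the two orders are not guaranteed to agree), the query could be ordered on a column that is a key of the nonclustered index. @StartDate and @TerminalIds are assumed to be declared as in the question.

```sql
SELECT DataRecId
    , RawData
    , RecordingTime
    , EquipmentId
FROM dbo.Data
WHERE EquipmentId IN (SELECT Id FROM @TerminalIds)
  AND RecordingTime >= @StartDate
-- Ordering by RecordingTime removes the incentive to walk the clustered
-- index in DataRecId order; this returns the 50 most recent recordings,
-- which may differ from the 50 highest DataRecId values.
ORDER BY RecordingTime DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
```

Whether the optimizer then chooses nc_Data_EquipmentId_RecordingTime_Name_includes still depends on statistics and row estimates, so it is worth checking the actual plan.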

You should also add a primary key to the table type. This will remove the sort and the spool from the plan.

DROP TYPE dbo.udtBigInt;
CREATE TYPE dbo.udtBigInt AS TABLE (Id bigint PRIMARY KEY);
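
Two side effects of the primary key are worth keeping in mind. A type that is still referenced by a procedure or function cannot be dropped until those references are removed, and the key rejects duplicate and NULL values. A minimal sketch of populating the redefined type, assuming #mechanicsTerminal may contain repeated or NULL EquipmentId values (the question does not say either way):

```sql
DECLARE @TerminalIds dbo.udtBigInt;

-- DISTINCT avoids a primary key violation if the same EquipmentId
-- appears more than once; the IS NOT NULL filter avoids inserting
-- NULL into the key column. Both are defensive assumptions.
INSERT INTO @TerminalIds (Id)
SELECT DISTINCT EquipmentId
FROM #mechanicsTerminal
WHERE EquipmentId IS NOT NULL;
```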
