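From what I have read, an index seek is preferred over an index scan most of the time, so I have been running some experiments.
I have a query that performs 993 reads when it uses an index scan (measured with SQL Profiler). When it uses an index seek, it takes 44,347 reads. Either something is wrong or I am misunderstanding something.
Here is the query that uses the index scan: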
select t5.Id as t5Id
from table1 t1
left join table2 t2 on t2.Table1Id = t1.Id
left join table3 t3 on t3.Table2Id = t2.Id
left join table4 t4 on t4.Table3Id = t3.Id
left join table5 t5 on t5.Table4Id = t4.Id
Here is the query that uses the index seek:
select t5.Id as t5Id
from table1 t1
left join table2 t2 on t2.Table1Id = t1.Id
left join table3 t3 on t3.Table2Id = t2.Id
left join table4 t4 on t4.Table3Id = t3.Id
left join table5 t5 WITH (FORCESEEK) on t5.Table4Id = t4.Id
The tables are simple. At the end, I populate them with some dummy data so the problem is easy to reproduce.
CREATE TABLE [dbo].[table1](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_table1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[table2](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Table1Id] [bigint] NOT NULL,
CONSTRAINT [PK_table2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[table2] WITH CHECK
ADD CONSTRAINT [FK_table2_table1Id] FOREIGN KEY([table1Id])
REFERENCES [dbo].[table1] ([Id])
GO
ALTER TABLE [dbo].[table2]
CHECK CONSTRAINT [FK_table2_table1Id]
GO
CREATE NONCLUSTERED INDEX [IdxTable2_FKTable1Id] ON [dbo].[table2]
(
[Table1Id] ASC
)
INCLUDE ( [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE TABLE [dbo].[table3](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Table2Id] [bigint] NOT NULL,
CONSTRAINT [PK_table3] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[table3] WITH CHECK
ADD CONSTRAINT [FK_table3_table2Id] FOREIGN KEY([table2Id])
REFERENCES [dbo].[table2] ([Id])
GO
ALTER TABLE [dbo].[table3]
CHECK CONSTRAINT [FK_table3_table2Id]
GO
CREATE NONCLUSTERED INDEX [IdxTable3_FKTable2Id] ON [dbo].[table3]
(
[Table2Id] ASC
)
INCLUDE ( [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE TABLE [dbo].[table4](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Table3Id] [bigint] NOT NULL,
CONSTRAINT [PK_table4] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[table4] WITH CHECK
ADD CONSTRAINT [FK_table4_table3Id] FOREIGN KEY([table3Id])
REFERENCES [dbo].[table3] ([Id])
GO
ALTER TABLE [dbo].[table4]
CHECK CONSTRAINT [FK_table4_table3Id]
GO
CREATE NONCLUSTERED INDEX [IdxTable4_FKTable3Id] ON [dbo].[table4]
(
[Table3Id] ASC
)
INCLUDE ( [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE TABLE [dbo].[table5](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Table4Id] [bigint] NOT NULL,
[Description] [nvarchar](2000) NOT NULL,
CONSTRAINT [PK_table5] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[table5] WITH CHECK
ADD CONSTRAINT [FK_table5_table4Id] FOREIGN KEY([table4Id])
REFERENCES [dbo].[table4] ([Id])
GO
ALTER TABLE [dbo].[table5]
CHECK CONSTRAINT [FK_table5_table4Id]
GO
CREATE NONCLUSTERED INDEX [IdxTable5_FKTable4Id] ON [dbo].[table5]
(
[Table4Id] ASC
)
INCLUDE ( [Id], [Description]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
set nocount on
DECLARE @i INT = 0;
DECLARE @j INT = 0;
DECLARE @k INT = 0;
DECLARE @l INT = 10;
DECLARE @m INT = 0;
declare @table1Id bigint
declare @table2Id bigint
declare @table3Id bigint
declare @table4Id bigint
begin tran
WHILE @i < 10
BEGIN
    INSERT INTO [dbo].[table1] ([Name]) VALUES (cast(@i as nvarchar(10)))
    SELECT @table1Id = SCOPE_IDENTITY()
    WHILE @j < 10
    BEGIN
        INSERT INTO [dbo].[table2] ([Table1Id]) VALUES (@table1Id)
        SELECT @table2Id = SCOPE_IDENTITY()
        WHILE @k < 10
        BEGIN
            INSERT INTO [dbo].[table3] ([Table2Id]) VALUES (@table2Id)
            SELECT @table3Id = SCOPE_IDENTITY()
            WHILE @l > 0
            BEGIN
                INSERT INTO [dbo].[table4] ([Table3Id]) VALUES (@table3Id)
                SELECT @table4Id = SCOPE_IDENTITY()
                WHILE @m < 10
                BEGIN
                    INSERT INTO [dbo].[table5] ([Table4Id], [Description]) VALUES (@table4Id, 'Not so long description')
                    SET @m = @m + 1;
                END;
                SET @m = 0;
                SET @l = @l - 1;
            END;
            SET @l = 10;
            SET @k = @k + 1;
        END;
        SET @k = 0;
        SET @j = @j + 1;
    END;
    SET @j = 0;
    SET @i = @i + 1;
END;
commit
First, let's estimate the number of reads needed for an index scan of IdxTable5_FKTable4Id:
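The query used for this estimate is not shown in this thread. One way to get such an estimate, as a sketch that assumes the tables live in the current database under the dbo schema, is to count the pages at each level of the index with sys.dm_db_index_physical_stats. A full scan reads roughly the leaf-level pages.

```sql
-- Sketch (not the original author's query): page counts per level of
-- IdxTable5_FKTable4Id. index_level = 0 is the leaf level, which a full
-- index scan has to read.
SELECT i.name AS index_name,
       ps.index_level,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.table5'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE i.name = N'IdxTable5_FKTable4Id'
ORDER BY ps.index_level;
```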
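On my system, the result of that query suggests that SQL Server needs about 900 reads to read the index in full. To test this, I will run a simple query that SQL Server implements most efficiently as an index scan. The query below needs the Id column of every row in table5. SQL Server can get all of the data the query needs just by reading the index. Because every row is needed, no work is wasted here.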
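The test query itself is missing from this thread. A minimal sketch of such a test might look like the following. The index hint is an assumption added here so that the scan is certain to use IdxTable5_FKTable4Id rather than the clustered index, and SET STATISTICS IO reports the logical reads in the Messages output.

```sql
-- Sketch: read the Id of every row in table5 from the nonclustered index
-- and report the logical reads. The INDEX hint is an assumption, not part
-- of the original post.
SET STATISTICS IO ON;

SELECT t5.Id
FROM dbo.table5 AS t5 WITH (INDEX (IdxTable5_FKTable4Id));

SET STATISTICS IO OFF;
```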
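Now let's consider your first test query, the one without a hint. On my system, the outer table of the final hash match is estimated at 9657 rows. The query optimizer decides that an index scan of IdxTable5_FKTable4Id is a good enough plan. This is the query I ran: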
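The query is not reproduced in this thread. Presumably it is the unhinted query from the question, so a sketch of the test would be that query wrapped in SET STATISTICS IO (the wrapper is an assumption about how the reads were measured):

```sql
-- Sketch: the unhinted query from the question, with logical reads reported
-- per table in the Messages output.
SET STATISTICS IO ON;

SELECT t5.Id AS t5Id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.Table1Id = t1.Id
LEFT JOIN table3 t3 ON t3.Table2Id = t2.Id
LEFT JOIN table4 t4 ON t4.Table3Id = t3.Id
LEFT JOIN table5 t5 ON t5.Table4Id = t4.Id;

SET STATISTICS IO OFF;
```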
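The outer table of the hash join actually has 10000 rows. However, because this is a hash join, SQL Server still has to scan all 100000 rows in the index, even though only 10000 rows are needed. That is why this query takes the same 900 logical reads as the first query. You could argue that this is wasted effort on the part of the query optimizer. Would it be more efficient to use an index seek to fetch only the 10000 needed rows from the index?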
First, let's estimate the number of reads required. Your index has a depth of 3:
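The query that produced the depth is not shown here. One way to check it, assuming the dbo schema, is the IndexDepth property of INDEXPROPERTY:

```sql
-- Sketch: number of levels in the B-tree of IdxTable5_FKTable4Id.
SELECT INDEXPROPERTY(
           OBJECT_ID(N'dbo.table5'),
           N'IdxTable5_FKTable4Id',
           'IndexDepth') AS index_depth;
```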
I will also enable TF 8744, which disables prefetching for the Nested Loops operator, to get cleaner results for this demo.
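I know that the outer table has 10000 rows, so my estimate for the number of logical reads is 10000 * (3 + 1) = 40000. For each row, that is 3 reads for the index depth plus 1 read to fetch the data.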
This is the query I tested:
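The tested query is missing from this thread. A sketch, assuming it is the FORCESEEK query from the question and that TF 8744 is applied per query through QUERYTRACEON (which normally requires sysadmin permissions), could look like this:

```sql
-- Sketch: the FORCESEEK query from the question. QUERYTRACEON 8744 is an
-- assumption about how the trace flag was enabled for the test.
SET STATISTICS IO ON;

SELECT t5.Id AS t5Id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.Table1Id = t1.Id
LEFT JOIN table3 t3 ON t3.Table2Id = t2.Id
LEFT JOIN table4 t4 ON t4.Table3Id = t3.Id
LEFT JOIN table5 t5 WITH (FORCESEEK) ON t5.Table4Id = t4.Id
OPTION (QUERYTRACEON 8744);

SET STATISTICS IO OFF;
```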
That is very close to the estimate of 40000.
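What have we learned here? For this index, each index seek costs about 4 reads, and a scan has a fixed cost of about 900 reads. That means that, from an IO point of view, index seeks are only more efficient when you fetch a small fraction of the data from the index. Otherwise, it is more efficient to fetch all of the data with an index scan, even if not all of it is needed to return correct results for the query.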
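The break-even point implied by those two numbers can be worked out directly. This is only back-of-the-envelope arithmetic using the rounded figures above (900 reads per scan, 4 reads per seek); the variable names are made up for illustration.

```sql
-- Sketch: how many seeks cost as much IO as one full scan of the index,
-- using the approximate figures from this answer.
DECLARE @scan_reads     int = 900;  -- approximate cost of one full index scan
DECLARE @reads_per_seek int = 4;    -- approximate cost of one index seek

SELECT @scan_reads / @reads_per_seek AS break_even_seeks;  -- 225
```

With these figures, the seek plan only wins on logical reads when the outer input of the join has fewer than roughly 225 rows.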
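For a final test, let's try fetching only the first 1000 rows from the original test query. On my system, the query optimizer naturally chooses an index seek, even without a hint. This is the query I ran: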
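That query is not reproduced in this thread either. A sketch, assuming it is the unhinted query from the question with TOP (1000) added:

```sql
-- Sketch: first 1000 rows of the original query, no hints. Whether the
-- optimizer picks a seek here depends on the system and its statistics.
SET STATISTICS IO ON;

SELECT TOP (1000) t5.Id AS t5Id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.Table1Id = t1.Id
LEFT JOIN table3 t3 ON t3.Table2Id = t2.Id
LEFT JOIN table4 t4 ON t4.Table3Id = t3.Id
LEFT JOIN table5 t5 ON t5.Table4Id = t4.Id;

SET STATISTICS IO OFF;
```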
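For that query, the index seek can be considered more efficient than the index scan. Note that the outer table has 100 rows, so 100 seeks are performed. A single seek can return more than one row. That is why the logical reads come to nearly 400 rather than 4000.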
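It is not simply a question of "scan versus seek". SQL Server can do either one well or badly, depending on other factors. A seek might be executed 900 times and return one row each time, whereas a scan might return all 900 rows in a single pass. In that case, the scan performs better.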
You should look at the execution plans and become familiar with how to read them.
https://stackoverflow.com/questions/758912/how-to-read-an-execution-plan-in-sql-server
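If you are not using the graphical plan in Management Studio, one way to capture the actual execution plan for a single statement is SET STATISTICS XML. This is a sketch using the hinted query from the question; the plan comes back as an extra XML result.

```sql
-- Sketch: return the actual execution plan as XML alongside the results.
SET STATISTICS XML ON;

SELECT t5.Id AS t5Id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.Table1Id = t1.Id
LEFT JOIN table3 t3 ON t3.Table2Id = t2.Id
LEFT JOIN table4 t4 ON t4.Table3Id = t3.Id
LEFT JOIN table5 t5 WITH (FORCESEEK) ON t5.Table4Id = t4.Id;

SET STATISTICS XML OFF;
```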
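I saw the same thing while doing some optimization work. Creating some covering indexes sped up the queries but increased the logical reads by roughly a factor of 10.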
There is a very good article at http://www.dbsophic.com/learn-more/sql-server-articles/53-tip-comparing-db-sql-server-logical-reads-what-they-really-tell
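According to Ami Levin, the reason is that well-designed indexes lead to index seeks with nested loop joins instead of index scans with hash joins. Apparently SQL Server does not count hash probes as logical reads, which means that comparing logical reads between two different plan shapes can really be comparing apples and oranges.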
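To see the effect on the tables from the question, you could force each join type with a query-level hint and compare the reported reads. This is only a sketch: the two-table query and the hints are my own choice for illustration, and the numbers will differ per system.

```sql
-- Sketch: same result set, two plan shapes. Compare the logical reads
-- reported for table5 in each case.
SET STATISTICS IO ON;

-- Hash join: table5 is typically read once by a scan.
SELECT t5.Id
FROM dbo.table4 AS t4
LEFT JOIN dbo.table5 AS t5 ON t5.Table4Id = t4.Id
OPTION (HASH JOIN);

-- Nested loops: table5 is typically accessed once per row of table4.
SELECT t5.Id
FROM dbo.table4 AS t4
LEFT JOIN dbo.table5 AS t5 ON t5.Table4Id = t4.Id
OPTION (LOOP JOIN);

SET STATISTICS IO OFF;
```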