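There are a few things in my query that I'm not sure how to resolve.
First, the definitions:
The courier services table. It holds one record.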
CREATE TABLE [dbo].[CS](
[ServiceID] [int] IDENTITY(1,1) NOT NULL,
[CSID] [nvarchar](6) NULL,
[CSDescription] [varchar](50) NULL,
[OperatingDays] [int] NULL,
[DefaultService] [bit] NULL,
CONSTRAINT [CourierServices_PK] PRIMARY KEY CLUSTERED
(
[ServiceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[CS] ON
INSERT [dbo].[CS] ([ServiceID], [CSID], [OperatingDays], [DefaultService])
VALUES (1, N'RM48', 2, 1)
SET IDENTITY_INSERT [dbo].[CS] OFF
SET ANSI_PADDING ON
GO
/****** Object: Index [ix_CourierServices] Script Date: 19/04/2017 14:27:03 ******/
CREATE NONCLUSTERED INDEX [ix_CourierServices] ON [dbo].[CS]
(
[CSID] ASC,
[DefaultService] ASC,
[OperatingDays] ASC
)
INCLUDE ( [CSDescription]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The calendar database and table, with code written by the genius Jim Horn:
CREATE TABLE [dbo].[days](
[PKDate] [date] NOT NULL,
[calendar_year] [smallint] NULL,
[calendar_quarter] [tinyint] NULL,
[calendar_quarter_desc] [varchar](10) NULL,
[calendar_month] [tinyint] NULL,
[calendar_month_name_long] [varchar](30) NULL,
[calendar_month_name_short] [varchar](10) NULL,
[calendar_week_in_year] [tinyint] NULL,
[calendar_week_in_month] [tinyint] NULL,
[calendar_day_in_year] [smallint] NULL,
[calendar_day_in_week] [tinyint] NULL,
[calendar_day_in_month] [tinyint] NULL,
[dmy_name_long] [varchar](30) NULL,
[dmy_name_long_with_suffix] [varchar](30) NULL,
[day_name_long] [varchar](10) NULL,
[day_name_short] [varchar](10) NULL,
[continuous_year] [tinyint] NULL,
[continuous_quarter] [smallint] NULL,
[continuous_month] [smallint] NULL,
[continuous_week] [smallint] NULL,
[continuous_day] [int] NULL,
[description] [varchar](100) NULL,
[is_weekend] [tinyint] NULL,
[is_holiday] [tinyint] NULL,
[is_workday] [tinyint] NULL,
[is_event] [tinyint] NULL,
PRIMARY KEY CLUSTERED
(
[PKDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Index [ix_days] Script Date: 19/04/2017 14:38:47 ******/
CREATE NONCLUSTERED INDEX [ix_days] ON [dbo].[days]
(
[PKDate] ASC
)
INCLUDE ( [is_weekend],
[is_holiday],
[is_workday],
[is_event]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now, I'm running a query that references both tables, as in this bit of code:
Select
OID
,case when
Cast(o.[CreationDate] as time) > '16:00:00'
then (select top 1 [PKDate] from [calendar].[dbo].days
where is_weekend <> 1 and is_holiday <>1 and
PKDate > cast(o.[CreationDate] as date)
order by PKDate asc)
else (select top 1 [PKDate] from [calendar].[dbo].days
where is_weekend <> 1 and is_holiday <>1 and
PKDate >= Cast(o.[CreationDate] as date)
order by PKDate asc)
end OperatingDate
,case when
Cast(o.[CreationDate] as time) > '16:00:00'
then (select top 1 [PKDate] from [calendar].[dbo].days
where is_weekend <> 1 and is_holiday <>1 and
PKDate > dateadd(day,isnull(
(select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1)
,2)+1,Cast(o.[CreationDate] as date))
order by PKDate asc)
else (select top 1 [PKDate] from [calendar].[dbo].days
where is_weekend <> 1 and is_holiday <>1 and
PKDate > dateadd(day,isnull(
(select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1)
,2), Cast(o.[CreationDate] as date))
order by PKDate asc)
end EstimatedDeliveryDate
,(select dateadd(day,3,o.[CreationDate])) DeliveryDate
From o
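Now, the question concerns the index scans and the number of executions: why is it 2 billion? Or 6 billion? Admittedly, the whole query outputs 1.7 million rows, but that doesn't explain the crazy numbers shown in the query plan: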
https://www.brentozar.com/pastetheplan/?id=H1iahxHAe
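If I could flatten all of those scans I could reduce the query time significantly, but first: how do I interpret these numbers to find a solution?
The days table contains 7.6k rows (covering 2000-2020).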
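Let's start by looking at the upper right part of the plan. That part calculates the OperatingDate column. Since we return 1.72 M rows for the outer row set, we can expect about 1.72 M index seeks against ix_days. That is indeed what happens. There are 478k rows for which Cast(o.[CreationDate] as time) > '16:00:00', so the CASE statement sends 478k seeks to one branch and the rest to the other.

Note that the index you have is not the most efficient one possible for this query. We can only do a seek predicate against PKDate. The rest of the filters are applied as a residual predicate. This means that the seek might traverse many rows before it finds a match. I assume that most days in your calendar table aren't weekends or holidays, so it may not make a practical difference for this query. However, you could define an index on is_weekend, is_holiday, PKDate. That should let you seek immediately to the first row that you want. To make this point clearer, let's go through a simple example: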
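As a minimal sketch of such an index and a seek against it: the index name ix_days_workday_seek and the sample @CreationDate value are made up for illustration, and the rewrite from "<> 1" to "= 0" assumes that is_weekend and is_holiday only ever hold 0, 1 or NULL. Whether the optimizer actually chooses this index depends on your data and SQL Server version, so check the plan.

```sql
-- Run in the calendar database. Hypothetical index name.
-- The two flag columns come first so that PKDate is ordered within each flag combination.
CREATE NONCLUSTERED INDEX [ix_days_workday_seek] ON [dbo].[days]
(
    [is_weekend] ASC,
    [is_holiday] ASC,
    [PKDate] ASC
);
GO

-- Made-up sample value standing in for o.[CreationDate].
DECLARE @CreationDate datetime = '2017-04-14T17:30:00';

-- Assumption: the flags only hold 0, 1 or NULL, so "= 0" selects the same
-- rows as "<> 1". Equality on both leading keys lets all three columns act
-- as seek predicates, so TOP (1) can stop at the first row the seek reaches.
SELECT TOP (1) d.[PKDate]
FROM [dbo].[days] AS d
WHERE d.is_weekend = 0
  AND d.is_holiday = 0
  AND d.PKDate > CAST(@CreationDate AS date)
ORDER BY d.PKDate ASC;
```

With the original "<> 1" filters the same index may still help, but the inequality is harder to turn into a single range seek, which is why the sketch uses equality.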
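Let's move on to the more interesting part, the branch that calculates the EstimatedDeliveryDate column. I'll only cover one half of it. I suspect that what you wanted the optimizer to do is compute the isnull((select top 1 [operatingdays] from [dbo].[CS] where DefaultService = 1), 2) expression as a scalar and use its value to do an index seek against ix_days.

Unfortunately, the optimizer doesn't do that. Instead it applies a row goal against the index and does a scan. For each row returned from the scan, it checks whether the value matches the filter against [dbo].[CS]. The scan stops as soon as it finds a matching row. SQL Server estimated that it would only pull back 3.33 rows on average from the scan before it found a match. If that were true, you would see about 1.5 million executions against [dbo].[CS]. Instead, the query performed 2 billion executions against that table, so the estimate was off by a factor of more than 1000.

As a general rule, you should carefully examine any scan on the inner side of a nested loop. Of course, there are some queries for which that is exactly what you want. And just because you get a seek doesn't mean that the query will be efficient. For example, if a seek returns many rows, there may not be much difference from doing a scan. You didn't post the full query here, but I'll go over a few ideas that might help.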
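The subquery against [dbo].[CS] is a little odd. It is nondeterministic because it uses TOP without ORDER BY. However, the table itself has 1 row, and the subquery returns the same value for every row from o. If possible, I would save the value of this query into a local variable and use that in the query instead. That should save you a total of 8 billion scans against [dbo].[CS], and I'd expect to see index seeks instead of scans against ix_days. I was able to mock up some data on my machine. In the resulting plan we now have all seeks, and those seeks shouldn't process too many extra rows. However, the real query may be more complicated than this, so perhaps you can't use a variable.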
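A sketch of the local-variable approach, assuming the query runs in a batch where a variable can be declared. The variable name @OperatingDays and the ORDER BY [ServiceID] tiebreaker are additions for illustration and are not in the question; "o" is the rowset from the question, whose real source is not shown there.

```sql
-- Evaluate the lookup once, instead of once per candidate calendar row.
-- ORDER BY [ServiceID] is an added tiebreaker that makes TOP (1) deterministic.
DECLARE @OperatingDays int =
    ISNULL((SELECT TOP (1) [OperatingDays]
            FROM [dbo].[CS]
            WHERE [DefaultService] = 1
            ORDER BY [ServiceID]), 2);

SELECT
    o.OID
    ,CASE WHEN CAST(o.[CreationDate] AS time) > '16:00:00'
        -- Orders created after 16:00 get one extra day.
        THEN (SELECT TOP (1) d.[PKDate]
              FROM [calendar].[dbo].[days] AS d
              WHERE d.is_weekend <> 1 AND d.is_holiday <> 1
                AND d.PKDate > DATEADD(day, @OperatingDays + 1, CAST(o.[CreationDate] AS date))
              ORDER BY d.PKDate ASC)
        ELSE (SELECT TOP (1) d.[PKDate]
              FROM [calendar].[dbo].[days] AS d
              WHERE d.is_weekend <> 1 AND d.is_holiday <> 1
                AND d.PKDate > DATEADD(day, @OperatingDays, CAST(o.[CreationDate] AS date))
              ORDER BY d.PKDate ASC)
     END AS EstimatedDeliveryDate
FROM o;
```

Because the right-hand side of the PKDate comparison no longer contains a subquery, the optimizer has a plain expression it can use as a seek predicate.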
Let's say I write a different filter condition that doesn't use TOP. Instead I'll use MIN. SQL Server is able to process that subquery in a more efficient way, because TOP can prevent certain query transformations. With the MIN version, the plan only does around 1.5 million scans against the CS table. We also get a much more efficient index seek against the ix_days index, which is able to use the results of the subquery.

Of course, I'm not saying that you should rewrite your code to use that. It'll probably return incorrect results. The important point is that you can get the index seeks that you want with a subquery. You just need to write your subquery in the right way.
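The exact subquery used for that test is not shown, so the following is only a guess at its shape: a hypothetical version of one CASE branch in which the lookup against [dbo].[CS] uses MIN instead of TOP. It returns the same value as the TOP 1 version only if at most one row has DefaultService = 1 (or all such rows share the same OperatingDays), which is an assumption about your data.

```sql
SELECT
    o.OID
    ,(SELECT TOP (1) d.[PKDate]
      FROM [calendar].[dbo].[days] AS d
      WHERE d.is_weekend <> 1 AND d.is_holiday <> 1
        AND d.PKDate > DATEADD(day,
              -- MIN instead of TOP 1: an aggregate without GROUP BY always
              -- yields exactly one row, which the optimizer can treat as a scalar.
              ISNULL((SELECT MIN(cs.[OperatingDays])
                      FROM [dbo].[CS] AS cs
                      WHERE cs.[DefaultService] = 1), 2),
              CAST(o.[CreationDate] AS date))
      ORDER BY d.PKDate ASC) AS EstimatedDeliveryDate
FROM o;  -- "o" is the rowset from the question
```

Whether this produces the seek described above depends on the SQL Server version and statistics, so compare the actual plans before relying on it.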
For one more example, let's assume that you absolutely need to keep the TOP operator in the subquery. It might be possible to add a redundant filter against PKDate to get better performance. I'm going to assume that the results of the subquery are non-negative and small. That means a query with the extra filter is equivalent, and it changes the plan to use seeks.
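A sketch of what such a redundant filter could look like for one CASE branch, under the stated assumption that OperatingDays is never negative. In that case DATEADD can only move the date forward, so any row that passes the second PKDate filter also passes the first, and the result is unchanged.

```sql
SELECT
    o.OID
    ,(SELECT TOP (1) d.[PKDate]
      FROM [calendar].[dbo].[days] AS d
      WHERE d.is_weekend <> 1 AND d.is_holiday <> 1
        -- Redundant lower bound: contains no subquery, so it can serve as
        -- the starting point of an index seek on PKDate.
        AND d.PKDate > CAST(o.[CreationDate] AS date)
        -- Original filter, kept as-is (TOP 1 subquery included).
        AND d.PKDate > DATEADD(day,
              ISNULL((SELECT TOP (1) [OperatingDays]
                      FROM [dbo].[CS]
                      WHERE [DefaultService] = 1), 2),
              CAST(o.[CreationDate] AS date))
      ORDER BY d.PKDate ASC) AS EstimatedDeliveryDate
FROM o;  -- "o" is the rowset from the question
```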
It's important to realize that the seeks may return more than just one row. The important point is that SQL Server can start seeking at o.[CreationDate]. If there's a large gap in the dates, then the index seek will process many extra rows and the query will not be as efficient.

You are getting those numbers from the nested loops join.
Your plan contains one example of how you get 2B records, and another of how you get 5B+.
A few links on how to avoid large nested loops joins:
What both other answers try to convey, but don't quite manage to (only partly because they assume I understand exactly what they say), is this:
With the query written the way it was in the question, the observed performance was inevitable.
While it was fancy and its purpose was mostly easy to see, it was simply too heavy for the optimizer to work magic on it. It wasn't quite the nested loop problem SqlWorldWide indicated, but the subqueries simply had to be executed for each row, and since they were index seeks and scans they multiplied, and multiplied... and multiplied.
What I ended up having was this:
In addition to streamlining the query - which still is not optimal - I've also reworked the calendar.dbo.days table's indexes. Dropped the constraint (which I really didn't have to, but what the hell, it might cause more problems further down the line) and added this:
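I admit this is mostly so that I can make fuller use of the calendar table (did I mention that Jim Horn is a genius?), but as people see my accounts, they want more and more things stored in... everywhere.
So, at the end of the day, while it is important to look at every aspect of a query (logic, indexes, predicates and so on), sometimes the only sensible way to improve it is to change the code. In my case, the full query (several inserts, updates and CTEs) now completes in about 2 minutes, compared with 15 minutes before.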