Given a (simplified) stored procedure such as this:
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
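If the Sale table is large, the SELECT can take a long time to execute, apparently because the optimizer cannot optimize it due to the local variable. We tested running the SELECT part with variables and then with hard-coded dates, and the execution time went from ~9 minutes to ~1 second.
We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8 weeks, etc.), so the input parameter is just @endDate, and @startDate is calculated inside the procedure.
The question is: what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?
The possibilities we came up with are shown below. Are any of these best practice, or is there another way?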
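Use a wrapper procedure to turn the variables into parameters.
Parameters don't affect the optimizer the same way local variables do.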
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
EXECUTE DateRangeProc @startDate, @endDate
END
CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
Use parameterized dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
DECLARE @sql NVARCHAR(4000) = N'
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
'
DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
Use "hard-coded" dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
DECLARE @sql NVARCHAR(4000) = N'
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate''
'
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
EXECUTE sp_executesql @sql
END
Use the DATEADD() function directly.
I'm not keen on this because calling functions in the WHERE clause also affects performance.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END
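Use an optional parameter.
I'm not sure whether assigning to parameters has the same problem as assigning to variables, so this might not be an option. I don't really like this solution, but I'm including it for completeness.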
CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
SET @startDate = DATEADD(DAY, -6, @endDate)
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
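- UPDATE -
Thanks for the suggestions and comments. After reading them I ran some timing tests with the various approaches. I'm adding the results here as a reference.
Run 1 is without a plan. Run 2 is immediately after run 1 with exactly the same parameters, so it will use the plan from run 1.
The NoProc times are for running the SELECT queries manually in SSMS, outside a stored procedure.
TestProc1-7 are the queries from the original question.
TestProcA-B are based on a suggestion from Mikael Eriksson. The column in the database is a DATE, so I tried passing the parameter as a DATETIME and running with implicit casting (testProcA) and explicit casting (testProcB).
TestProcC-D are based on a suggestion from Kenneth Fisher. We already use a date lookup table for other things, but we don't have one with a specific column for each period range. The variation I tried still uses BETWEEN, but applies it to the smaller lookup table and joins to the larger table. I'm going to investigate further whether we can use specific lookup tables; our periods are fixed, but there are quite a few different ones.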
Total rows in Sale table: 136,424,366
                   Run 1 (ms)       Run 2 (ms)
Procedure          CPU     Elapsed  CPU     Elapsed  Comment
NoProc constants   6567    62199    2870    719      Manual query with constants
NoProc variables   9314    62424    3993    998      Manual query with variables
testProc1          6801    62919    2871    736      Hard coded range
testProc2          8955    63190    3915    979      Parameter and variable range
testProc3          8985    63152    3932    987      Wrapper procedure with parameter range
testProc4          9142    63939    3931    977      Parameterized dynamic SQL
testProc5          7269    62933    2933    728      Hard coded dynamic SQL
testProc6          9266    63421    3915    984      Use DATEADD on DATE
testProc7          2044    13950    1092    1087     Dummy parameter
testProcA          12120   61493    5491    1875     Use DATEADD on DATETIME without CAST
testProcB          8612    61949    3932    978      Use DATEADD on DATETIME with CAST
testProcC          8861    61651    3917    993      Use lookup table, Sale first
testProcD          8625    61740    3994    1031     Use lookup table, Sale last
Here's the test code.
------ SETUP ------
IF OBJECT_ID(N'testDimDate', N'U') IS NOT NULL DROP TABLE testDimDate
IF OBJECT_ID(N'testProc1', N'P') IS NOT NULL DROP PROCEDURE testProc1
IF OBJECT_ID(N'testProc2', N'P') IS NOT NULL DROP PROCEDURE testProc2
IF OBJECT_ID(N'testProc3', N'P') IS NOT NULL DROP PROCEDURE testProc3
IF OBJECT_ID(N'testProc3a', N'P') IS NOT NULL DROP PROCEDURE testProc3a
IF OBJECT_ID(N'testProc4', N'P') IS NOT NULL DROP PROCEDURE testProc4
IF OBJECT_ID(N'testProc5', N'P') IS NOT NULL DROP PROCEDURE testProc5
IF OBJECT_ID(N'testProc6', N'P') IS NOT NULL DROP PROCEDURE testProc6
IF OBJECT_ID(N'testProc7', N'P') IS NOT NULL DROP PROCEDURE testProc7
IF OBJECT_ID(N'testProcA', N'P') IS NOT NULL DROP PROCEDURE testProcA
IF OBJECT_ID(N'testProcB', N'P') IS NOT NULL DROP PROCEDURE testProcB
IF OBJECT_ID(N'testProcC', N'P') IS NOT NULL DROP PROCEDURE testProcC
IF OBJECT_ID(N'testProcD', N'P') IS NOT NULL DROP PROCEDURE testProcD
GO
CREATE TABLE testDimDate
(
DateKey DATE NOT NULL,
CONSTRAINT PK_DimDate_DateKey UNIQUE NONCLUSTERED (DateKey ASC)
)
GO
DECLARE @dateTimeStart DATETIME = '2000-01-01'
DECLARE @dateTimeEnd DATETIME = '2100-01-01'
;WITH CTE AS
(
--Anchor member defined
SELECT @dateTimeStart FullDate
UNION ALL
--Recursive member defined referencing CTE
SELECT FullDate + 1 FROM CTE WHERE FullDate + 1 <= @dateTimeEnd
)
SELECT
CAST(FullDate AS DATE) AS DateKey
INTO #DimDate
FROM CTE
OPTION (MAXRECURSION 0)
INSERT INTO testDimDate (DateKey)
SELECT DateKey FROM #DimDate ORDER BY DateKey ASC
DROP TABLE #DimDate
GO
-- Hard coded date range.
CREATE PROCEDURE testProc1 AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
END
GO
-- Parameter and variable date range.
CREATE PROCEDURE testProc2(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Parameter date range.
CREATE PROCEDURE testProc3a(@startDate DATE, @endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Wrapper procedure.
CREATE PROCEDURE testProc3(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
EXEC testProc3a @startDate, @endDate
END
GO
-- Parameterized dynamic SQL.
CREATE PROCEDURE testProc4(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate'
DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
EXEC sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
GO
-- Hard coded dynamic SQL.
CREATE PROCEDURE testProc5(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate'''
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
EXEC sp_executesql @sql
END
GO
-- Explicitly use DATEADD on a DATE.
CREATE PROCEDURE testProc6(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDate) AND @endDate
END
GO
-- Dummy parameter.
CREATE PROCEDURE testProc7(@endDate DATE, @startDate DATE = NULL) AS
BEGIN
SET NOCOUNT ON
SET @startDate = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Explicitly use DATEADD on a DATETIME with implicit CAST for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcA(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDateTime) AND @endDateTime
END
GO
-- Explicitly use DATEADD on a DATETIME but CAST to DATE for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcB(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN CAST(DATEADD(DAY, -1, @endDateTime) AS DATE) AND CAST(@endDateTime AS DATE)
END
GO
-- Use a date lookup table, Sale first.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcC(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale J INNER JOIN testDimDate D ON D.DateKey = J.SaleDate WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
-- Use a date lookup table, Sale last.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcD(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM testDimDate D INNER JOIN Sale J ON J.SaleDate = D.DateKey WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
------ TEST ------
SET STATISTICS TIME OFF
DECLARE @endDate DATE = '2012-12-10'
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
DECLARE @sql NVARCHAR(4000)
DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
SELECT
procedures.name,
procedures.object_id
FROM sys.procedures
WHERE procedures.name LIKE 'testProc_'
ORDER BY procedures.name ASC
OPEN _cursor
DECLARE @name SYSNAME
DECLARE @object_id INT
FETCH NEXT FROM _cursor INTO @name, @object_id
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = CASE (SELECT COUNT(*) FROM sys.parameters WHERE object_id = @object_id)
WHEN 0 THEN @name
WHEN 1 THEN @name + ' ''@endDate'''
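-- Arguments are passed by position: for testProc7, declared as (@endDate, @startDate), the first value binds to @endDate.
WHEN 2 THEN @name + ' ''@startDate'', ''@endDate'''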
END
SET @sql = REPLACE(@sql, '@name', @name)
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NVARCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NVARCHAR(10), @endDate, 126))
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
RAISERROR('Run 2: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
FETCH NEXT FROM _cursor INTO @name, @object_id
END
CLOSE _cursor
DEALLOCATE _cursor
Parameter sniffing is your friend almost all of the time, and you should write your queries so that it can be used. Parameter sniffing helps build the plan for you using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query are not optimal for the queries to come.
The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed, so what SQL Server has to deal with here is a known value for @endDate and an unknown value for @startDate. That leaves SQL Server guessing at 30% of the rows returned for the filter on @startDate, combined with whatever the statistics tell it for @endDate. If you have a big table with a lot of rows, that could give you a scan operation where you would benefit most from a seek.
Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled, so it can use known values for both @endDate and @startDate.
Both your dynamic queries lead to the same thing: the values are known at compile time.
The one with a default null value is a bit special. The values known to SQL Server at compile time are a known value for @endDate and null for @startDate. Using a null in a between will give you 0 rows, but SQL Server always guesses at 1 in those cases. That might be a good thing in this case, but if you call the stored procedure with a large date interval where a scan would have been the best choice, it may end up doing a bunch of seeks.
I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use, and there is something strange with it as well.
First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered a runtime constant.
And I would think that DATEADD is evaluated when the query is compiled, so that you would get a good estimate of the number of rows returned. But it is not so in this case.
SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012), so in your case the estimate will be the number of rows that are registered on @endDate. Why it does that I don't know, but it has to do with the use of the datatype DATE. Shift to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME but not for DATE.
So to summarize this rather lengthy answer, I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the query, without the hassle of using dynamic SQL.
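If a second procedure is more than you want to maintain, a statement-level OPTION (RECOMPILE) hint should also let the optimizer see the current values, local variables included, at the cost of a compile on every execution. A minimal sketch, reusing the WeeklyProc from the question and assuming a Value column as in the question's test code:

```sql
-- Sketch only: the question's WeeklyProc with a statement-level recompile hint.
-- SUM(Value) is an assumption standing in for the question's "-- Stuff" column list.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);

    SELECT SUM(Value)
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
    OPTION (RECOMPILE);  -- this statement is compiled on every execution
END
```

Whether the extra compile time is acceptable depends on how often the procedure runs; for a weekly or monthly report it is usually negligible next to the query itself.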
PS:
In comments you got two suggestions. OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of rows returned, and OPTION (RECOMPILE) will make SQL Server see the parameter values, since the query is recompiled every time.
OK, I have two possible solutions for you.
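First, I'm wondering if this will allow for increased parameterization. I haven't had a chance to test it, but it might work.
The other option takes advantage of the fact that you are using fixed time frames. First create a DateLookup table. Something like this
Fill it in for every date between now and the next century. This is only ~36500 rows, so a fairly small table. Then change your query like this
Obviously this is just an example and could certainly be written better, but I've had a lot of luck with this type of table. Particularly since it is a static table and can be indexed like crazy.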
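As a rough sketch of the lookup-table idea: the table layout, the fill range, and the names DateLookup, DateKey and WeeklyProcLookup are all assumptions made for illustration, and SUM(Value) is borrowed from the question's test code.

```sql
-- Hypothetical lookup table: one row per calendar day.
CREATE TABLE DateLookup
(
    DateKey DATE NOT NULL PRIMARY KEY CLUSTERED
);
GO

-- Fill 100 years of dates (36,525 rows) from an assumed start date.
DECLARE @firstDate DATE = '2000-01-01';
DECLARE @lastDate  DATE = DATEADD(DAY, -1, DATEADD(YEAR, 100, @firstDate));

WITH Numbers AS
(
    -- Cross-joining a system catalog view is just a convenient row source.
    SELECT TOP (DATEDIFF(DAY, @firstDate, @lastDate) + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO DateLookup (DateKey)
SELECT DATEADD(DAY, n, @firstDate)
FROM Numbers;
GO

-- Apply the range filter to the small table and join to the large one.
CREATE PROCEDURE WeeklyProcLookup(@endDate DATE)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);

    SELECT SUM(S.Value)
    FROM DateLookup AS D
    INNER JOIN Sale AS S ON S.SaleDate = D.DateKey
    WHERE D.DateKey BETWEEN @startDate AND @endDate;
END
GO
```

The range predicate now sits on a table of a few tens of thousands of rows, so a poor estimate there costs far less than the same misestimate on Sale.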
(It's 2020 now, and I am surprised and disappointed that the SQL language still doesn't have a built-in predicate-builder syntax to allow WHERE clauses to be constructed in a safe and verifiable manner without resorting to dynamic SQL. In many applications this is moot because the ORM will handle query generation, and I am in love with Entity Framework and Linq-to-Entities, but this isn't available for people needing to write queries by hand.)
In my case, I had a Multi-Statement Table-Valued Function that had a single SELECT query with WHERE clauses that I needed to disable or enable at runtime, and using the @param IS NULL OR [Col] = @param "trick" didn't work because it was generating suboptimal execution plans. The query was rather complicated, with a load of JOINs as well, but the WHERE clauses I wanted to customize were in the outer query, something like this:
What I did was move the complicated FROM into a VIEW or Inline Table-Valued Function, and then create a tree of IF statements for each combination of predicates. It's big, but it isn't complicated, and it does mean the optimal query plan is always generated.
If you have a lot of queries like this, you could use T4 to generate the IF branches for each set of parameters (I find myself having to use T4 to generate repetitive SQL anyway, because T-SQL doesn't have built-in support for macros).
So I currently have this instead:
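A minimal sketch of that IF-tree shape, for two optional filters. Every object name here is hypothetical: dbo.SearchOrders, the view dbo.OrderDetailsView (standing in for the complicated FROM and JOINs), and the CustomerId and Status columns are not taken from the answer.

```sql
-- Hypothetical multi-statement TVF: one branch per combination of optional predicates.
CREATE FUNCTION dbo.SearchOrders
(
    @customerId INT         = NULL,
    @status     VARCHAR(20) = NULL
)
RETURNS @result TABLE
(
    OrderId    INT         NOT NULL,
    CustomerId INT         NOT NULL,
    Status     VARCHAR(20) NOT NULL
)
AS
BEGIN
    IF @customerId IS NULL AND @status IS NULL
        -- No filters at all.
        INSERT INTO @result (OrderId, CustomerId, Status)
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView;
    ELSE IF @customerId IS NOT NULL AND @status IS NULL
        -- Customer filter only.
        INSERT INTO @result (OrderId, CustomerId, Status)
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE CustomerId = @customerId;
    ELSE IF @customerId IS NULL AND @status IS NOT NULL
        -- Status filter only.
        INSERT INTO @result (OrderId, CustomerId, Status)
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE Status = @status;
    ELSE
        -- Both filters.
        INSERT INTO @result (OrderId, CustomerId, Status)
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE CustomerId = @customerId
          AND Status = @status;

    RETURN;
END
```

Each branch is a separate statement with a plain WHERE clause, so each gets its own plan instead of one plan that has to cover every combination. The number of branches doubles with every optional predicate, which is where generating them with T4 starts to pay off.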
Fantasy time:
I just wish the SQL language design team would add built-in macros and/or a predicate-builder. There's no reason why something like this couldn't exist:
And if we had the ability to define a list of columns as a macro that'd make it even sweeter:
...but I can dream :/