Given a (simplified) stored procedure such as this:
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
    SELECT
        -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
END
If the Sale table is large, the SELECT can take a long time to execute, apparently because the optimizer can't optimize due to the local variable. We tested running the SELECT part with variables, then hard-coded the dates, and the execution time went from ~9 minutes to ~1 second.
We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8 weeks, etc.), so the input parameter is just @endDate and @startDate is calculated inside the procedure.
The question is: what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?
The possibilities we came up with are shown below. Are any of these best practice, or is there another way?
Use a wrapper procedure to turn the variables into parameters.
Parameters don't affect the optimizer the same way local variables do.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
    EXECUTE DateRangeProc @startDate, @endDate
END
GO
CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
    SELECT
        -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
END
Use parameterized dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
    DECLARE @sql NVARCHAR(4000) = N'
        SELECT
            -- Stuff
        FROM Sale
        WHERE SaleDate BETWEEN @startDate AND @endDate
    '
    DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
    EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
Use "hard-coded" dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
    -- The placeholders are quoted so the substituted dates become date literals.
    DECLARE @sql NVARCHAR(4000) = N'
        SELECT
            -- Stuff
        FROM Sale
        WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate''
    '
    SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
    SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
    EXECUTE sp_executesql @sql
END
Use the DATEADD() function directly.
I'm not keen on this because calling functions in the WHERE also affects performance.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    SELECT
        -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END
Use an optional parameter.
I'm not sure if assigning to parameters would have the same problem as assigning to variables, so this might not be an option. I don't really like this solution, but I'm including it for completeness.
CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
    SET @startDate = DATEADD(DAY, -6, @endDate)
    SELECT
        -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
END
-- Update --
Thanks for the suggestions and comments. After reading them, I ran some timing tests with the various approaches. I'm adding the results here for reference.
Run 1 is without a plan. Run 2 is immediately after run 1 with exactly the same parameters, so it will use the plan from run 1.
The NoProc times are for running the SELECT queries manually in SSMS outside a stored procedure.
TestProc1-7 are the queries from the original question.
TestProcA-B are based on the suggestion from Mikael Eriksson. The column in the database is a DATE, so I tried passing the parameter as a DATETIME and running with implicit casting (testProcA) and explicit casting (testProcB).
TestProcC-D are based on the suggestion from Kenneth Fisher. We already use a date lookup table for other things, but we don't have one with a specific column for each period range. The variation I tried still uses BETWEEN, but does it on the smaller lookup table and joins to the larger table. I'm going to investigate further whether we can use specific lookup tables; although our periods are fixed, there are quite a few different ones.
Total rows in Sale table: 136,424,366
                  Run 1 (ms)      Run 2 (ms)
Procedure         CPU    Elapsed  CPU    Elapsed  Comment
NoProc constants  6567   62199    2870   719      Manual query with constants
NoProc variables  9314   62424    3993   998      Manual query with variables
testProc1         6801   62919    2871   736      Hard-coded range
testProc2         8955   63190    3915   979      Parameter and variable range
testProc3         8985   63152    3932   987      Wrapper procedure with parameter range
testProc4         9142   63939    3931   977      Parameterized dynamic SQL
testProc5         7269   62933    2933   728      Hard-coded dynamic SQL
testProc6         9266   63421    3915   984      Use DATEADD on DATE
testProc7         2044   13950    1092   1087     Dummy parameter
testProcA         12120  61493    5491   1875     Use DATEADD on DATETIME without CAST
testProcB         8612   61949    3932   978      Use DATEADD on DATETIME with CAST
testProcC         8861   61651    3917   993      Use lookup table, Sale first
testProcD         8625   61740    3994   1031     Use lookup table, Sale last
Here is the test code.
------ SETUP ------
IF OBJECT_ID(N'testDimDate', N'U') IS NOT NULL DROP TABLE testDimDate
IF OBJECT_ID(N'testProc1', N'P') IS NOT NULL DROP PROCEDURE testProc1
IF OBJECT_ID(N'testProc2', N'P') IS NOT NULL DROP PROCEDURE testProc2
IF OBJECT_ID(N'testProc3', N'P') IS NOT NULL DROP PROCEDURE testProc3
IF OBJECT_ID(N'testProc3a', N'P') IS NOT NULL DROP PROCEDURE testProc3a
IF OBJECT_ID(N'testProc4', N'P') IS NOT NULL DROP PROCEDURE testProc4
IF OBJECT_ID(N'testProc5', N'P') IS NOT NULL DROP PROCEDURE testProc5
IF OBJECT_ID(N'testProc6', N'P') IS NOT NULL DROP PROCEDURE testProc6
IF OBJECT_ID(N'testProc7', N'P') IS NOT NULL DROP PROCEDURE testProc7
IF OBJECT_ID(N'testProcA', N'P') IS NOT NULL DROP PROCEDURE testProcA
IF OBJECT_ID(N'testProcB', N'P') IS NOT NULL DROP PROCEDURE testProcB
IF OBJECT_ID(N'testProcC', N'P') IS NOT NULL DROP PROCEDURE testProcC
IF OBJECT_ID(N'testProcD', N'P') IS NOT NULL DROP PROCEDURE testProcD
GO
CREATE TABLE testDimDate
(
DateKey DATE NOT NULL,
CONSTRAINT PK_DimDate_DateKey UNIQUE NONCLUSTERED (DateKey ASC)
)
GO
DECLARE @dateTimeStart DATETIME = '2000-01-01'
DECLARE @dateTimeEnd DATETIME = '2100-01-01'
;WITH CTE AS
(
--Anchor member defined
SELECT @dateTimeStart FullDate
UNION ALL
--Recursive member defined referencing CTE
SELECT FullDate + 1 FROM CTE WHERE FullDate + 1 <= @dateTimeEnd
)
SELECT
CAST(FullDate AS DATE) AS DateKey
INTO #DimDate
FROM CTE
OPTION (MAXRECURSION 0)
INSERT INTO testDimDate (DateKey)
SELECT DateKey FROM #DimDate ORDER BY DateKey ASC
DROP TABLE #DimDate
GO
-- Hard coded date range.
CREATE PROCEDURE testProc1 AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
END
GO
-- Parameter and variable date range.
CREATE PROCEDURE testProc2(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Parameter date range.
CREATE PROCEDURE testProc3a(@startDate DATE, @endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Wrapper procedure.
CREATE PROCEDURE testProc3(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
EXEC testProc3a @startDate, @endDate
END
GO
-- Parameterized dynamic SQL.
CREATE PROCEDURE testProc4(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate'
DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
EXEC sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
GO
-- Hard coded dynamic SQL.
CREATE PROCEDURE testProc5(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate'''
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
EXEC sp_executesql @sql
END
GO
-- Explicitly use DATEADD on a DATE.
CREATE PROCEDURE testProc6(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDate) AND @endDate
END
GO
-- Dummy parameter.
CREATE PROCEDURE testProc7(@endDate DATE, @startDate DATE = NULL) AS
BEGIN
SET NOCOUNT ON
SET @startDate = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Explicitly use DATEADD on a DATETIME with implicit CAST for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcA(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDateTime) AND @endDateTime
END
GO
-- Explicitly use DATEADD on a DATETIME but CAST to DATE for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcB(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN CAST(DATEADD(DAY, -1, @endDateTime) AS DATE) AND CAST(@endDateTime AS DATE)
END
GO
-- Use a date lookup table, Sale first.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcC(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale J INNER JOIN testDimDate D ON D.DateKey = J.SaleDate WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
-- Use a date lookup table, Sale last.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcD(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM testDimDate D INNER JOIN Sale J ON J.SaleDate = D.DateKey WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
------ TEST ------
SET STATISTICS TIME OFF
DECLARE @endDate DATE = '2012-12-10'
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
DECLARE @sql NVARCHAR(4000)
DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
SELECT
procedures.name,
procedures.object_id
FROM sys.procedures
WHERE procedures.name LIKE 'testProc_'
ORDER BY procedures.name ASC
OPEN _cursor
DECLARE @name SYSNAME
DECLARE @object_id INT
FETCH NEXT FROM _cursor INTO @name, @object_id
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = CASE (SELECT COUNT(*) FROM sys.parameters WHERE object_id = @object_id)
WHEN 0 THEN @name
WHEN 1 THEN @name + ' ''@endDate'''
WHEN 2 THEN @name + ' ''@startDate'', ''@endDate'''
END
SET @sql = REPLACE(@sql, '@name', @name)
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NVARCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NVARCHAR(10), @endDate, 126))
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
RAISERROR('Run 2: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
FETCH NEXT FROM _cursor INTO @name, @object_id
END
CLOSE _cursor
DEALLOCATE _cursor
Parameter sniffing is your friend almost all of the time, and you should write your queries so that it can be used. Parameter sniffing helps build the plan using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query are not optimal for the queries to come.
The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed, so the values that SQL Server has to deal with in your original procedure are a known value for @endDate and an unknown value for @startDate. That leaves SQL Server guessing at 30% of the rows returned for the filter on @startDate, combined with whatever the statistics tell it for @endDate. If you have a big table with a lot of rows, that could give you a scan operation where you would benefit most from a seek.
Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled, so it can use known values for both @endDate and @startDate.
Both your dynamic queries lead to the same thing: the values are known at compile time.
The one with a default null value is a bit special. The values known to SQL Server at compile time are a known value for @endDate and null for @startDate. Using a null in a BETWEEN will give you 0 rows, but SQL Server always guesses at 1 in those cases. That might be a good thing in this case, but if you call the stored procedure with a large date interval where a scan would have been the best choice, it may end up doing a bunch of seeks.
I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use, and there is something strange with it as well.
First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered runtime constant.
And I would think that DATEADD is evaluated when the query is compiled, so that you would get a good estimate of the number of rows returned. But it is not so in this case.
SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012), so in your case the estimate will be the number of rows that is registered on @endDate. Why it does that I don't know, but it has to do with the use of the data type DATE. Shift to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME, not for DATE.
So, to summarize this rather lengthy answer, I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the query, without the hassle of using dynamic SQL.
PS: In comments you got two suggestions. OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of rows returned, and OPTION (RECOMPILE) will make SQL Server see the parameter values since the query is recompiled every time.
OK, I have two possible solutions for you.
First, I'm wondering if this will allow for better parameterization. I haven't had a chance to test it, but it might work.
The other option takes advantage of the fact that you are using fixed time frames. First create a DateLookup table. Something like this
Populate it for every date between now and the next century. That is only ~36,500 rows, so a fairly small table. Then change your query like this
Obviously this is just an example and could certainly be written better, but I have had a lot of luck with this type of table, particularly because it is a static table and can be indexed like crazy.
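A minimal sketch of what such a lookup table and query could look like. The column names (EndDate, WeekStartDate), the 2000 to 2100 date range, and the SUM(Value) aggregate are assumptions made for illustration and are not taken from this answer. Whether the join actually gets a better row estimate than the local-variable version would need to be checked against the real data.

```sql
-- Hypothetical lookup table: one row per calendar date, holding the
-- precomputed start of the 7-day window that ends on that date.
CREATE TABLE DateLookup
(
    EndDate       DATE NOT NULL PRIMARY KEY,
    WeekStartDate DATE NOT NULL
)
GO
-- Populate one row per day with a recursive CTE (assumed range 2000 to 2100).
DECLARE @first DATE = '2000-01-01'
DECLARE @last DATE = '2100-01-01'
;WITH Dates AS
(
    SELECT @first AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM Dates WHERE d < @last
)
INSERT INTO DateLookup (EndDate, WeekStartDate)
SELECT d, DATEADD(DAY, -6, d)
FROM Dates
OPTION (MAXRECURSION 0)
GO
-- The procedure filters the lookup table on the parameter only;
-- the start of the range comes from the table, not from a local variable.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    SELECT SUM(S.Value)
    FROM DateLookup L
    INNER JOIN Sale S ON S.SaleDate BETWEEN L.WeekStartDate AND L.EndDate
    WHERE L.EndDate = @endDate
END
GO
```

With this in place, EXEC WeeklyProc '2012-12-10' would sum the sales from 2012-12-04 through 2012-12-10. Other fixed periods (month, 8 weeks) could get their own start-date columns in the same table.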
(It's 2020 now, and I am surprised and disappointed that the SQL language still doesn't have a built-in predicate builder syntax to allow WHERE clauses to be constructed in a safe and verifiable manner without resorting to Dynamic SQL. Though in many applications this is moot because the ORM will handle query generation, and I am in love with Entity Framework and Linq-to-Entities - but this isn't available for people needing to write queries by hand).
In my case, I had a Multi-Statement Table-Valued Function that had a single SELECT query with WHERE clauses that I needed to disable or enable at runtime, and using the @param IS NULL OR [Col] = @param "trick" didn't work because it was generating suboptimal execution plans. The query was rather complicated with a load of JOINs as well, but the WHERE clauses I wanted to customize were in the outer query, something like this:
What I did was move the complicated FROM into a VIEW or Inline Table-Valued Function, and then created a tree of IF statements for each combination of predicates. It's big - but it isn't complicated, and it does mean the optimal query plan is always generated.
If you have a lot of queries like this, you could use T4 to generate the IF branches for each set of parameters (I find myself having to use T4 to generate repetitive SQL anyway, because T-SQL doesn't have built-in support for macros).
So I currently have this instead:
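The general shape of such an IF tree can be sketched as follows. This is a hypothetical illustration and not the code from this answer: dbo.OrderDetailsView, dbo.GetOrders, the column names, and the two optional parameters are all invented, with the view standing in for the complicated FROM and JOIN logic.

```sql
-- Hypothetical multi-statement TVF with two optional filters.
-- Each branch contains a query with only the predicates that apply,
-- so no branch needs the "@param IS NULL OR [Col] = @param" pattern.
CREATE FUNCTION dbo.GetOrders(@customerId INT = NULL, @status INT = NULL)
RETURNS @result TABLE
(
    OrderId    INT NOT NULL,
    CustomerId INT NOT NULL,
    Status     INT NOT NULL
)
AS
BEGIN
    IF @customerId IS NULL AND @status IS NULL
        INSERT INTO @result
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
    ELSE IF @customerId IS NOT NULL AND @status IS NULL
        INSERT INTO @result
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE CustomerId = @customerId
    ELSE IF @customerId IS NULL AND @status IS NOT NULL
        INSERT INTO @result
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE Status = @status
    ELSE
        INSERT INTO @result
        SELECT OrderId, CustomerId, Status
        FROM dbo.OrderDetailsView
        WHERE CustomerId = @customerId AND Status = @status
    RETURN
END
GO
```

A call such as SELECT * FROM dbo.GetOrders(42, DEFAULT) would take the second branch. The number of branches doubles with each optional predicate, which is where generating them (for example with T4, as mentioned above) starts to pay off.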
Fantasy time:
I just wish the SQL language design team would add built-in macros and/or a predicate-builder. There's no reason why something like this couldn't exist:
And if we had the ability to define a list of columns as a macro that'd make it even sweeter:
...but I can dream :/