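I am trying to tune a query that calls the same table-valued function (TVF) on 20 columns.
The first thing I did was convert the scalar function into an inline table-valued function.
Is using CROSS APPLY the best-performing way to execute the same function on multiple columns in a query?
A simplified example: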
SELECT Col1 = A.val
      ,Col2 = B.val
      ,Col3 = C.val
      --do the same for other 17 columns
      ,Col21
      ,Col22
      ,Col23
FROM t
CROSS APPLY dbo.function1(Col1) A
CROSS APPLY dbo.function1(Col2) B
CROSS APPLY dbo.function1(Col3) C
--do the same for other 17 columns
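Are there any better alternatives?
The same function can be called in multiple queries, against X number of columns.
Here is the function: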
CREATE FUNCTION dbo.ConvertAmountVerified_TVF
(
@amt VARCHAR(60)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH cteLastChar
AS(
SELECT LastChar = RIGHT(RTRIM(@amt), 1)
)
SELECT
AmountVerified = CAST(RET.Y AS NUMERIC(18,2))
FROM (SELECT 1 t) t
OUTER APPLY (
SELECT N =
CAST(
CASE
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0)-1
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQR', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQR', 0)-1
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'pqrstuvwxy', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'pqrstuvwxy', 0)-1
ELSE
NULL
END
AS VARCHAR(1))
FROM
cteLastChar L
) NUM
OUTER APPLY (
SELECT N =
CASE
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0) >0
THEN 0
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQRpqrstuvwxy', 0) >0
THEN 1
ELSE 0
END
FROM cteLastChar L
) NEG
OUTER APPLY(
SELECT Amt= CASE
WHEN NUM.N IS NULL
THEN @amt
ELSE
SUBSTRING(RTRIM(@amt),1, LEN(@amt) - 1) + Num.N
END
) TP
OUTER APPLY(
SELECT Y = CASE
WHEN NEG.N = 0
THEN (CAST(TP.Amt AS NUMERIC) / 100)
WHEN NEG.N = 1
THEN (CAST (TP.Amt AS NUMERIC) /100) * -1
END
) RET
) ;
GO
In case anyone is interested, here is the scalar function version that I inherited:
CREATE FUNCTION dbo.ConvertAmountVerified
(
@amt VARCHAR(50)
)
RETURNS NUMERIC (18,3)
AS
BEGIN
-- Declare the return variable here
DECLARE @Amount NUMERIC(18, 3);
DECLARE @TempAmount VARCHAR (50);
DECLARE @Num VARCHAR(1);
DECLARE @LastChar VARCHAR(1);
DECLARE @Negative BIT ;
-- Get Last Character
SELECT @LastChar = RIGHT(RTRIM(@amt), 1) ;
SELECT @Num = CASE @LastChar collate latin1_general_cs_as
WHEN '{' THEN '0'
WHEN 'A' THEN '1'
WHEN 'B' THEN '2'
WHEN 'C' THEN '3'
WHEN 'D' THEN '4'
WHEN 'E' THEN '5'
WHEN 'F' THEN '6'
WHEN 'G' THEN '7'
WHEN 'H' THEN '8'
WHEN 'I' THEN '9'
WHEN '}' THEN '0'
WHEN 'J' THEN '1'
WHEN 'K' THEN '2'
WHEN 'L' THEN '3'
WHEN 'M' THEN '4'
WHEN 'N' THEN '5'
WHEN 'O' THEN '6'
WHEN 'P' THEN '7'
WHEN 'Q' THEN '8'
WHEN 'R' THEN '9'
---ASCII
WHEN 'p' Then '0'
WHEN 'q' Then '1'
WHEN 'r' Then '2'
WHEN 's' Then '3'
WHEN 't' Then '4'
WHEN 'u' Then '5'
WHEN 'v' Then '6'
WHEN 'w' Then '7'
WHEN 'x' Then '8'
WHEN 'y' Then '9'
ELSE ''
END
SELECT @Negative = CASE @LastChar collate latin1_general_cs_as
WHEN '{' THEN 0
WHEN 'A' THEN 0
WHEN 'B' THEN 0
WHEN 'C' THEN 0
WHEN 'D' THEN 0
WHEN 'E' THEN 0
WHEN 'F' THEN 0
WHEN 'G' THEN 0
WHEN 'H' THEN 0
WHEN 'I' THEN 0
WHEN '}' THEN 1
WHEN 'J' THEN 1
WHEN 'K' THEN 1
WHEN 'L' THEN 1
WHEN 'M' THEN 1
WHEN 'N' THEN 1
WHEN 'O' THEN 1
WHEN 'P' THEN 1
WHEN 'Q' THEN 1
WHEN 'R' THEN 1
---ASCII
WHEN 'p' Then '1'
WHEN 'q' Then '1'
WHEN 'r' Then '1'
WHEN 's' Then '1'
WHEN 't' Then '1'
WHEN 'u' Then '1'
WHEN 'v' Then '1'
WHEN 'w' Then '1'
WHEN 'x' Then '1'
WHEN 'y' Then '1'
ELSE 0
END
-- Add the T-SQL statements to compute the return value here
if (@Num ='')
begin
SELECT @TempAmount=@amt;
end
else
begin
SELECT @TempAmount = SUBSTRING(RTRIM(@amt),1, LEN(@amt) - 1) + @Num;
end
SELECT @Amount = CASE @Negative
WHEN 0 THEN (CAST(@TempAmount AS NUMERIC) / 100)
WHEN 1 THEN (CAST (@TempAmount AS NUMERIC) /100) * -1
END ;
-- Return the result of the function
RETURN @Amount
END
Sample test data:
SELECT dbo.ConvertAmountVerified('00064170') -- 641.700
SELECT * FROM dbo.ConvertAmountVerified_TVF('00064170') -- 641.700
SELECT dbo.ConvertAmountVerified('00057600A') -- 5760.010
SELECT * FROM dbo.ConvertAmountVerified_TVF('00057600A') -- 5760.010
SELECT dbo.ConvertAmountVerified('00059224y') -- -5922.490
SELECT * FROM dbo.ConvertAmountVerified_TVF('00059224y') -- -5922.490
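FIRST: it should be mentioned that the absolute fastest way to get the desired results is to migrate the data, using either new columns or a new table.

New-column approach:
1. Add new columns, named {name}_new, to the table with the DECIMAL(18, 3) datatype.
2. Do a one-time migration of the data from the old VARCHAR columns to the DECIMAL columns.
3. Rename the old columns to {name}_old.
4. Rename the new columns to {name}.

New-table approach:
1. Create a new table, named {table_name}_new, using the DECIMAL(18, 3) datatype.
2. Do a one-time migration of the data into the new DECIMAL-based table.
3. Rename the old table so that it ends in _old.
4. Remove _new from the name of the new table.

THAT BEING SAID: you can get rid of a lot of that code, as it is largely unnecessary duplication. Also, there are at least two bugs that cause the output to sometimes be incorrect, or to sometimes throw an error. These bugs were copied into Joe's code, as it produces the same results (including the errors) as the O.P.'s code. For example: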
These values produce a correct result:
These values produce an incorrect result:
This value produces an error:
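As a hedged illustration only: tracing the question's dbo.ConvertAmountVerified_TVF by hand suggests inputs along the following lines. The three literals are assumptions chosen for this sketch, not the values originally listed here, and the comments describe what a reading of the code implies rather than measured output. The block relies on both functions from the question already existing.

```sql
-- Trailing 'A' is found in '{ABCDEFGHI}' at position 2, so it maps to digit 1,
-- which agrees with the scalar function.
SELECT * FROM dbo.ConvertAmountVerified_TVF('00057600A');

-- Trailing 'J' is found in 'JKLMNOPQR' at position 1, so the TVF maps it to
-- digit 0, whereas the scalar function maps 'J' to 1.
SELECT dbo.ConvertAmountVerified('00057600J');
SELECT * FROM dbo.ConvertAmountVerified_TVF('00057600J');

-- Trailing '}' is found in '{ABCDEFGHI}' at position 11, so the TVF computes 10,
-- which does not fit into VARCHAR(1); the later CAST to NUMERIC is then
-- expected to fail.
SELECT * FROM dbo.ConvertAmountVerified_TVF('00057600}');
```

Comparing the scalar function and the TVF side by side for each trailing character is a quick way to confirm or refute these readings on a real instance.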
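Comparing all 3 versions against 448,740 rows using SET STATISTICS TIME ON;, they all ran in just over 5000 ms of elapsed time. But for CPU time, the results were:

SETUP: DATA
The following creates a table and populates it. This should create the same data set across all systems running SQL Server 2017, since they will have the same rows in spt_values. This helps provide a basis of comparison for other people testing on their own systems, since randomly generated data would factor into timing differences across systems, or even between tests on the same system if the sample data is regenerated. I started with the same 3-column table as Joe did, but used the sample values from the question as a template to come up with a variety of numeric values, each appended with every possible trailing character option (including no trailing character). This is also why I forced the collation on the columns: I didn't want the fact that I am using a binary-collation instance to unfairly negate the effect of using the COLLATE keyword to force a different collation in the TVF. The only difference should be in the ordering of rows in the table.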
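A minimal sketch of that kind of deterministic setup, under stated assumptions: the table name dbo.OverpunchTestData, the single Amount column, the multiplier 31, and the Latin1_General_100_CI_AS collation are all hypothetical choices for illustration, and this will not reproduce the 448,740-row data set described above. It relies on master.dbo.spt_values, an undocumented system table, exposing the numbers 0 through 2047 under type 'P'.

```sql
-- Hypothetical test table; the collation is forced on the column so that the
-- instance's default collation does not influence the comparison.
CREATE TABLE dbo.OverpunchTestData
(
    ID     INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Amount VARCHAR(50) COLLATE Latin1_General_100_CI_AS NOT NULL
);

-- Every generated number is combined with every trailing-character option,
-- including "no trailing character" (the empty string).
INSERT INTO dbo.OverpunchTestData (Amount)
SELECT RIGHT('00000000' + CAST(v.number * 31 AS VARCHAR(10)), 8) + s.Suffix
FROM master.dbo.spt_values v
CROSS JOIN (VALUES
    (''),
    ('{'), ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H'), ('I'),
    ('}'), ('J'), ('K'), ('L'), ('M'), ('N'), ('O'), ('P'), ('Q'), ('R'),
    ('p'), ('q'), ('r'), ('s'), ('t'), ('u'), ('v'), ('w'), ('x'), ('y')
) s(Suffix)
WHERE v.type = 'P';
```

Because nothing here is random, rerunning the script on another instance with the same spt_values contents should yield the same set of values.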
SETUP: TVF
Please note:
- It uses a binary (i.e. _BIN2) collation, which is faster than a case-sensitive collation because it does not need to account for any linguistic rules.
- Unless there was a good reason to change from VARCHAR(50) to VARCHAR(60), and from NUMERIC(18,3) to NUMERIC(18,2) (a good reason would be "they were wrong"), I would stick with the original signature / types.
- The numeric literals end with a decimal point: 100., -1., and 1.. This was not in my original version of this TVF (in the history of this answer), but I noticed some CONVERT_IMPLICIT calls in the XML execution plan (since 100 is an INT but the operation needs to be NUMERIC / DECIMAL), so I just took care of that ahead of time.
- The character is created with the CHAR() function rather than by passing a string version of a number (e.g. '2') into a CONVERT function (which is what I was originally doing, again in the history). This appears to be ever so slightly faster. Only a few milliseconds, but still.

TEST
Please note that I had to filter out rows ending with } as that caused the O.P.'s and Joe's TVFs to error. While my code handles the } correctly, I wanted to be consistent with what rows were being tested across the 3 versions. This is why the number of rows generated by the setup query is slightly higher than the number I noted above the test results for how many rows were being tested.

CPU time is only slightly lower when uncommenting the --@Dummy =, and the ranking among the 3 TVFs is the same. But interestingly enough, when uncommenting the variable, the rankings change a little:

Not sure why the O.P.'s code would perform so much better in this scenario (whereas my and Joe's code only improved marginally), but it does seem consistent across many tests. And no, I did not look at execution plan differences as I don't have time to investigate that.
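For readers who want to repeat this kind of comparison, a test harness along these lines is one possibility. It is a sketch under assumptions: dbo.OverpunchTestData and its Amount column are hypothetical names, and the TVF called is the one from the question. Assigning to @Dummy discards the result set so that client rendering time is not measured; commenting out the "@Dummy =" part returns the rows instead.

```sql
SET STATISTICS TIME ON;

DECLARE @Dummy NUMERIC(18, 3);

SELECT @Dummy = x.AmountVerified          -- remove "@Dummy =" to return rows
FROM dbo.OverpunchTestData t              -- hypothetical test table
CROSS APPLY dbo.ConvertAmountVerified_TVF(t.Amount) x
WHERE t.Amount NOT LIKE '%}';             -- skip rows that make this TVF error

SET STATISTICS TIME OFF;
```

Running the same statement once per function version, and reading the CPU time and elapsed time lines from the Messages output, gives figures that can be compared across versions on the same machine.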
EVEN FASTERER
I have completed testing of the alternate approach, and it does provide a slight but definite improvement over what is shown above. The new approach uses SQLCLR, and it appears to scale better. I found that when adding a second column to the query, the T-SQL approach doubled in time. But when adding additional columns using a SQLCLR scalar UDF, the time went up, but not by the same amount as the single-column timing. Maybe there is some initial overhead in invoking the SQLCLR method (not associated with the overhead of the initial loading of the App Domain and of the Assembly into the App Domain) because the timings were (elapsed time, not CPU time):
So it's possible that the timing (of dumping to a variable, not returning the result set) has a 200 ms - 250 ms overhead and then 750 ms - 800 ms per instance time. CPU timings were: 950 ms, 1750 ms, and 2400 ms for 1, 2, and 3 instances of the UDF, respectively.
C# CODE
I originally used SqlDecimal as the return type, but there is a performance penalty for using that as opposed to SqlDouble / FLOAT. Sometimes FLOAT has issues (due to it being an imprecise type), but I verified against the T-SQL TVF via the following query and no differences were detected:

TEST
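I'll start by putting some test data into a table. I have no idea what your real data looks like, so I just used sequential integers:
Selecting all of the rows with result sets turned off provides a baseline:
If a similar query with the function call takes more time, then we have a rough estimate of the function's overhead. Here is what I get when calling your TVF as-is:
So the function needs about 40 seconds of CPU time to process 6.5 million rows. Multiply that by 20 and it is 800 seconds of CPU time. I noticed two things in your function code:
- Unnecessary use of OUTER APPLY. CROSS APPLY will give you the same results, and for this query it will avoid a bunch of unnecessary joins. That can save a little time. It mostly depends on whether the full query goes parallel. I don't know anything about your data or query, so I'm just testing with MAXDOP 1. In that case I'm better off with CROSS APPLY.
- There are a lot of CHARINDEX calls when you are just searching for one character against a small list of matching values. You can use the ASCII() function and a little bit of math to avoid all of the string comparisons.
Here is a different way to write the function:
On my machine, the new function is significantly faster:
There are probably some additional optimizations available, but my gut says that they won't amount to much. Based on what your code is doing, I can't see how you would get further improvement by calling your function in a different way. It's just a bunch of string operations. Calling the function 20 times per row will be slower than calling it once, but the definition is already inlined.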
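The ASCII() suggestion above can be sketched roughly as follows. This is not the function that was timed in this answer: the name dbo.ConvertAmountVerified_AsciiSketch and the derived-table aliases are hypothetical, and the code-point ranges are taken from the mapping in the question's scalar function ('A'-'I' = 65-73, 'J'-'R' = 74-82, 'p'-'y' = 112-121, '{' = 123, '}' = 125). Because ASCII codes already distinguish upper and lower case, no COLLATE clause is needed.

```sql
CREATE FUNCTION dbo.ConvertAmountVerified_AsciiSketch (@amt VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT AmountVerified = CAST(
               d.Multiplier
               * CAST(CASE
                          -- No overpunch character: use the trimmed value as-is.
                          WHEN d.Digit IS NULL THEN t.Trimmed
                          -- Swap the trailing character for the digit it encodes.
                          ELSE LEFT(t.Trimmed, LEN(t.Trimmed) - 1) + CHAR(48 + d.Digit)
                      END AS NUMERIC(18, 0))
               / 100.
           AS NUMERIC(18, 3))
    FROM (SELECT Trimmed = RTRIM(@amt)) t
    CROSS APPLY (SELECT Code = ASCII(RIGHT(t.Trimmed, 1))) c
    CROSS APPLY (
        SELECT Digit = CASE
                           WHEN c.Code BETWEEN 65 AND 73   THEN c.Code - 64   -- A-I => 1-9
                           WHEN c.Code BETWEEN 74 AND 82   THEN c.Code - 73   -- J-R => 1-9
                           WHEN c.Code BETWEEN 112 AND 121 THEN c.Code - 112  -- p-y => 0-9
                           WHEN c.Code IN (123, 125)       THEN 0             -- { and }
                       END,
               Multiplier = CASE
                                WHEN c.Code BETWEEN 74 AND 82
                                  OR c.Code BETWEEN 112 AND 121
                                  OR c.Code = 125 THEN -1.
                                ELSE 1.
                            END
    ) d
);
```

It can be called in the same way as the TVF in the question, for example `SELECT * FROM dbo.ConvertAmountVerified_AsciiSketch('00059224y');`, and checked against the scalar function using the sample values from the question.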
Alternatively, you can create one permanent table. This is a one-time creation.
Then the TVF:
Using @Joe's example,
-- It takes 30 s
If possible, the amount can also be formatted at the UI level. This is the best option. Otherwise, you could share your original query. Or, if possible, also store the formatted value in the table.
Try to use the following
instead
One variant, using an auxiliary table:
A test query:
As a variant, you can also try to use a temporary auxiliary table #LastCharLink or a table variable @LastCharLink (but it can be slower than a real or temporary table).
And use it as
or
Then you can also create a simple inline function and put all the conversions into it
And then use this function as
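The auxiliary-table idea might look roughly like this. It is a sketch under assumptions: the table name dbo.LastCharLink echoes the #LastCharLink name mentioned above, but its columns (LastChar, Digit, Multiplier) are hypothetical, and the mapping rows are copied from the scalar function in the question. The binary collation on LastChar keeps 'P' and 'p' distinct.

```sql
CREATE TABLE dbo.LastCharLink
(
    LastChar   CHAR(1) COLLATE Latin1_General_BIN2 NOT NULL PRIMARY KEY,
    Digit      CHAR(1)  NOT NULL,
    Multiplier SMALLINT NOT NULL
);

INSERT INTO dbo.LastCharLink (LastChar, Digit, Multiplier)
VALUES ('{', '0',  1), ('A', '1',  1), ('B', '2',  1), ('C', '3',  1), ('D', '4',  1),
       ('E', '5',  1), ('F', '6',  1), ('G', '7',  1), ('H', '8',  1), ('I', '9',  1),
       ('}', '0', -1), ('J', '1', -1), ('K', '2', -1), ('L', '3', -1), ('M', '4', -1),
       ('N', '5', -1), ('O', '6', -1), ('P', '7', -1), ('Q', '8', -1), ('R', '9', -1),
       ('p', '0', -1), ('q', '1', -1), ('r', '2', -1), ('s', '3', -1), ('t', '4', -1),
       ('u', '5', -1), ('v', '6', -1), ('w', '7', -1), ('x', '8', -1), ('y', '9', -1);

-- Test query against the sample values from the question.
-- Rows without a matching trailing character keep their value and a positive sign.
SELECT t.Col1,
       AmountVerified = CAST(
           ISNULL(l.Multiplier, 1)
           * CAST(CASE
                      WHEN l.LastChar IS NULL THEN RTRIM(t.Col1)
                      ELSE LEFT(RTRIM(t.Col1), LEN(t.Col1) - 1) + l.Digit
                  END AS NUMERIC(18, 0))
           / 100.
       AS NUMERIC(18, 3))
FROM (VALUES ('00064170'), ('00057600A'), ('00059224y')) t(Col1)
LEFT JOIN dbo.LastCharLink l
       ON l.LastChar = RIGHT(RTRIM(t.Col1), 1) COLLATE Latin1_General_BIN2;
-- Expected, per the sample data in the question: 641.700, 5760.010, -5922.490
```

For 20 columns this means 20 joins to the same small table, so whether it beats an inline function is something to measure rather than assume.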