我遇到一个奇怪的查询案例,尽管具有良好的匹配索引,但执行时间仍然很长。问题是非聚集索引中前两列的值在所有行中均为 NULL。(对于上下文,在存档旧业务流程中的旧数据后,它就变成了这样)。查询在这两列上使用相等和非 NULL 值的 WHERE 子句,因此 SQL Server 应该很容易使用索引查找+查找来准备一个好的计划,尤其是在查找的数据不存在的情况下,因此估计为 0 或 1 行。
不幸的是,查询对表中的更改非常敏感,甚至对统计信息自动更新或数据倾斜也不敏感。我怀疑这可能是优化器中的一个错误,它无法使用单步统计直方图,其中 NULL 是范围内的唯一值。不过,我不确定我是否探索了所有选项,因此我希望有人看看它。生产服务器是 2017,但我已在 2022 中准备了这个重现,并且这两个版本中的行为是相同的。
生产中的表包含 3500 万行和大约 60 列,并且在 1 小时内更新(主要是插入+更新)大约 3000 行,因此写入流量不多,主要是读取。
我是否遗漏了什么?或者这真的是一个错误?
先感谢您。
use tempdb;
go
-- The simplest repro of the issue.
-- Just in case you wondered if it's enabled. It's not needed, but included for your reference.
alter database tempdb set auto_create_statistics on;
alter database tempdb set auto_update_statistics on;
drop table if exists t1;
create table t1 (
ID int identity(1,1) not null
primary key clustered,
c2 int null,
c3 varchar(16) null,
c4 int null,
c5 varchar(128) null,
-- in prod there are ~50 more columns
);
-- The actual prod index has 4 columns with explicit ID PK at the end.
create index ix_test on t1 (c2, c3, c4, ID);
-- The prod table contains ~35 million rows. The issue occurs with much smaller
-- number of rows so I only use 1 million here to not waste your time playing with it.
--
-- It happens that in prod all rows have NULL values in c2 and c3.
insert into t1 (c4, c5)
select top (1000*1000) crypt_gen_random(2), 'a'
from sys.all_columns ac1, sys.all_columns ac2;
-- At this point the query uses the index (see the plan).
declare @0 int = 100; -- value that may or may not exist
select ID, c2, c3, c4, c5 -- and all remaining columns in prod
from t1
where c2 = @0 and c3 = 'x' -- 'x' is a value that may or may not exist
order by c2, c3, c4, ID; -- this is the exact order of the index
go
-- But with "option (recompile)" it does the full scan of the clustered index
-- and has awful estimates.
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
go
-- The same issue also occurs when not using parameters.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
-- Now let's update statistics to be as great as possible.
update statistics t1 with fullscan, maxdop=1;
-- After which it picks the right index and have great estimates with or
-- without "option (recompile)".
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
-- Insert only a tiny bit of more data. It does not trigger auto update of the
-- statistics.
insert into t1 (c4, c5)
select top (10) 1, crypt_gen_random(2)
from sys.all_columns ac1, sys.all_columns ac2;
-- And now it again does the full scan even when not using the hint nor the
-- @parameter.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID;
go
-- If the above gave you seek+lookup, this one will not:
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
-- When using the index hint it works fine, but has awful estimates.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (table hint (t1, index (ix_test)));
-- I would expect it to just always use the index seek + lookup. The query
-- should always be fast. It's easy to determine that there's no data
-- satisfying the filter.
-- When you check the stats histogram, there's none.
declare @stats_id int = (
select stats_id from sys.stats
where [object_id] = object_id('t1') and name = 'ix_test');
select * from sys.dm_db_stats_histogram(object_id('t1'), @stats_id);
/* In SSMS you can see that there's only 1 step:
Statistics for INDEX 'ix_test'.
-------------------------------------------------------------------------------------------------------------------
Name Updated Rows Rows Sampled Steps Density Average Key Length String Index
-------------------------------------------------------------------------------------------------------------------
ix_test Sep 9 2024 4:59PM 1000000 1000000 1 0 8 NO 1000000 0
All Density Average Length Columns
--------------------------------------------
1 0 c2
1 0 c2, c3
1.525879E-05 4 c2, c3, c4
1E-06 8 c2, c3, c4, ID
Histogram Steps
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
----------------------------------------------------------------------
0 1000000 0 1
so the optimizer has the knowledge that all rows have NULL values, and
that there's nothing else in the table, it just is unable to use that
information in some cases.
*/
编辑:这是使用时的查询计划OPTION (RECOMPILE)
:
这是没有提示的计划。我忽略了它已被参数化的事实,尽管在字符串中传递了文字值: