Endrju提出的问题 -dba

Endrju

Asked: 2024-09-10 00:09:55 +0800 CST

当前两个索引列的所有行都为 NULL 值时，计划不稳定

我遇到一个奇怪的查询案例，尽管具有良好的匹配索引，但执行时间仍然很长。问题是非聚集索引中前两列的值在所有行中均为 NULL。（对于上下文，在存档旧业务流程中的旧数据后，它就变成了这样）。查询在这两列上使用相等和非 NULL 值的 WHERE 子句，因此 SQL Server 应该很容易使用索引查找+查找来准备一个好的计划，尤其是在查找的数据不存在的情况下，因此估计为 0 或 1 行。

不幸的是，查询对表中的更改非常敏感，甚至对统计信息自动更新或数据倾斜也不敏感。我怀疑这可能是优化器中的一个错误，它无法使用单步统计直方图，其中 NULL 是范围内的唯一值。不过，我不确定我是否探索了所有选项，因此我希望有人看看它。生产服务器是 2017，但我已在 2022 中准备了这个重现，并且这两个版本中的行为是相同的。

生产中的表包含 3500 万行和大约 60 列，并且在 1 小时内更新（主要是插入+更新）大约 3000 行，因此写入流量不多，主要是读取。

我是否遗漏了什么？或者这真的是一个错误？

先感谢您。

use tempdb;
go

-- The simplest repro of the issue.

-- Just in case you wondered if it's enabled. It's not needed, but included for your reference.
alter database tempdb set auto_create_statistics on;
alter database tempdb set auto_update_statistics on;

drop table if exists t1;

create table t1 (
  ID int identity(1,1) not null
      primary key clustered,
  c2 int null,
  c3 varchar(16) null,
  c4 int null,
  c5 varchar(128) null,
  -- in prod there are ~50 more columns
);


-- The actual prod index has 4 columns with explicit ID PK at the end.
create index ix_test on t1 (c2, c3, c4, ID);

-- The prod table contains ~35 million rows. The issue occurs with much smaller
-- number of rows so I only use 1 million here to not waste your time playing with it.
--
-- It happens that in prod all rows have NULL values in c2 and c3.
insert into t1 (c4, c5)
select top (1000*1000) crypt_gen_random(2), 'a'
from sys.all_columns ac1, sys.all_columns ac2;

-- At this point the query uses the index (see the plan).
declare @0 int = 100;      -- value that may or may not exist
select ID, c2, c3, c4, c5  -- and all remaining columns in prod
from t1
where c2 = @0 and c3 = 'x' -- 'x' is a value that may or may not exist
order by c2, c3, c4, ID;   -- this is the exact order of the index
go

-- But with "option (recompile)" it does the full scan of the clustered index
-- and has awful estimates.
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
go

-- The same issue also occurs when not using parameters.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- Now let's update statistics to be as great as possible.
update statistics t1 with fullscan, maxdop=1;

-- After which it picks the right index and have great estimates with or
-- without "option (recompile)".
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- Insert only a tiny bit of more data. It does not trigger auto update of the
-- statistics.
insert into t1 (c4, c5)
select top (10) 1, crypt_gen_random(2)
from sys.all_columns ac1, sys.all_columns ac2;

-- And now it again does the full scan even when not using the hint nor the
-- @parameter.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID;
go

-- If the above gave you seek+lookup, this one will not:
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- When using the index hint it works fine, but has awful estimates.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (table hint (t1, index (ix_test)));

-- I would expect it to just always use the index seek + lookup. The query
-- should always be fast. It's easy to determine that there's no data
-- satisfying the filter.

-- When you check the stats histogram, there's none.
declare @stats_id int = (
  select stats_id from sys.stats
  where [object_id] = object_id('t1') and name = 'ix_test');
select * from sys.dm_db_stats_histogram(object_id('t1'), @stats_id);

/* In SSMS you can see that there's only 1 step:
Statistics for INDEX 'ix_test'.
-------------------------------------------------------------------------------------------------------------------
   Name             Updated     Rows   Rows Sampled   Steps  Density  Average Key Length  String Index
-------------------------------------------------------------------------------------------------------------------
ix_test   Sep 9 2024 4:59PM  1000000        1000000       1        0                   8            NO  1000000  0

 All Density  Average Length         Columns
--------------------------------------------
           1               0              c2
           1               0          c2, c3
1.525879E-05               4      c2, c3, c4
       1E-06               8  c2, c3, c4, ID

Histogram Steps
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
----------------------------------------------------------------------
                       0  1000000                    0               1

so the optimizer has the knowledge that all rows have NULL values, and
that there's nothing else in the table, it just is unable to use that
information in some cases.
*/

编辑：这是使用时的查询计划OPTION (RECOMPILE)：

这是没有提示的计划。我忽略了它已被参数化的事实，尽管在字符串中传递了文字值：

Endrju

Asked: 2024-09-09 20:19:34 +0800 CST

计划指南是否考虑空间？

我正在阅读sp_create_plan_guide的文档，但发现很难理解。我的问题是：

匹配时是否考虑空格？
白色区域和空白区域之间有区别吗？
那么前导空格和尾随空格怎么样？
这里自豪地介绍了针对文档创建 Git 问题的选项在哪里？

我这里遗漏了什么？提前致谢！

当前两个索引列的所有行都为 NULL 值时，计划不稳定

计划指南是否考虑空间？

连接到 PostgreSQL 服务器：致命：主机没有 pg_hba.conf 条目

如何让sqlplus的输出出现在一行中？

选择具有最大日期或最晚日期的日期

如何列出 PostgreSQL 中的所有模式？

列出指定表的所有列

如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

你如何mysqldump特定的表？

使用 psql 列出数据库权限

如何从 PostgreSQL 中的选择查询中将值插入表中？

如何使用 psql 列出所有数据库和表？

Endrju's questions