SQL Server - 使用聚集索引时如何存储数据页

Question

Endrju

Asked: 2024-09-10 00:09:55 +0800 CST2024-09-10 00:09:55 +0800 CST 2024-09-10 00:09:55 +0800 CST

当前两个索引列的所有行都为 NULL 值时，计划不稳定

772

我遇到一个奇怪的查询案例，尽管具有良好的匹配索引，但执行时间仍然很长。问题是非聚集索引中前两列的值在所有行中均为 NULL。（对于上下文，在存档旧业务流程中的旧数据后，它就变成了这样）。查询在这两列上使用相等和非 NULL 值的 WHERE 子句，因此 SQL Server 应该很容易使用索引查找+查找来准备一个好的计划，尤其是在查找的数据不存在的情况下，因此估计为 0 或 1 行。

不幸的是，查询对表中的更改非常敏感，甚至对统计信息自动更新或数据倾斜也不敏感。我怀疑这可能是优化器中的一个错误，它无法使用单步统计直方图，其中 NULL 是范围内的唯一值。不过，我不确定我是否探索了所有选项，因此我希望有人看看它。生产服务器是 2017，但我已在 2022 中准备了这个重现，并且这两个版本中的行为是相同的。

生产中的表包含 3500 万行和大约 60 列，并且在 1 小时内更新（主要是插入+更新）大约 3000 行，因此写入流量不多，主要是读取。

我是否遗漏了什么？或者这真的是一个错误？

先感谢您。

use tempdb;
go

-- The simplest repro of the issue.

-- Just in case you wondered if it's enabled. It's not needed, but included for your reference.
alter database tempdb set auto_create_statistics on;
alter database tempdb set auto_update_statistics on;

drop table if exists t1;

create table t1 (
  ID int identity(1,1) not null
      primary key clustered,
  c2 int null,
  c3 varchar(16) null,
  c4 int null,
  c5 varchar(128) null,
  -- in prod there are ~50 more columns
);


-- The actual prod index has 4 columns with explicit ID PK at the end.
create index ix_test on t1 (c2, c3, c4, ID);

-- The prod table contains ~35 million rows. The issue occurs with much smaller
-- number of rows so I only use 1 million here to not waste your time playing with it.
--
-- It happens that in prod all rows have NULL values in c2 and c3.
insert into t1 (c4, c5)
select top (1000*1000) crypt_gen_random(2), 'a'
from sys.all_columns ac1, sys.all_columns ac2;

-- At this point the query uses the index (see the plan).
declare @0 int = 100;      -- value that may or may not exist
select ID, c2, c3, c4, c5  -- and all remaining columns in prod
from t1
where c2 = @0 and c3 = 'x' -- 'x' is a value that may or may not exist
order by c2, c3, c4, ID;   -- this is the exact order of the index
go

-- But with "option (recompile)" it does the full scan of the clustered index
-- and has awful estimates.
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);
go

-- The same issue also occurs when not using parameters.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- Now let's update statistics to be as great as possible.
update statistics t1 with fullscan, maxdop=1;

-- After which it picks the right index and have great estimates with or
-- without "option (recompile)".
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- Insert only a tiny bit of more data. It does not trigger auto update of the
-- statistics.
insert into t1 (c4, c5)
select top (10) 1, crypt_gen_random(2)
from sys.all_columns ac1, sys.all_columns ac2;

-- And now it again does the full scan even when not using the hint nor the
-- @parameter.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID;
go

-- If the above gave you seek+lookup, this one will not:
declare @0 int = 100;
select ID, c2, c3, c4, c5
from t1
where c2 = @0 and c3 = 'x'
order by c2, c3, c4, ID
option (recompile);

-- When using the index hint it works fine, but has awful estimates.
select ID, c2, c3, c4, c5
from t1
where c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID
option (table hint (t1, index (ix_test)));

-- I would expect it to just always use the index seek + lookup. The query
-- should always be fast. It's easy to determine that there's no data
-- satisfying the filter.

-- When you check the stats histogram, there's none.
declare @stats_id int = (
  select stats_id from sys.stats
  where [object_id] = object_id('t1') and name = 'ix_test');
select * from sys.dm_db_stats_histogram(object_id('t1'), @stats_id);

/* In SSMS you can see that there's only 1 step:
Statistics for INDEX 'ix_test'.
-------------------------------------------------------------------------------------------------------------------
   Name             Updated     Rows   Rows Sampled   Steps  Density  Average Key Length  String Index
-------------------------------------------------------------------------------------------------------------------
ix_test   Sep 9 2024 4:59PM  1000000        1000000       1        0                   8            NO  1000000  0

 All Density  Average Length         Columns
--------------------------------------------
           1               0              c2
           1               0          c2, c3
1.525879E-05               4      c2, c3, c4
       1E-06               8  c2, c3, c4, ID

Histogram Steps
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
----------------------------------------------------------------------
                       0  1000000                    0               1

so the optimizer has the knowledge that all rows have NULL values, and
that there's nothing else in the table, it just is unable to use that
information in some cases.
*/

编辑：这是使用时的查询计划OPTION (RECOMPILE)：

这是没有提示的计划。我忽略了它已被参数化的事实，尽管在字符串中传递了文字值：

3 个回答

Voted

Paul White · Answer 1 · 2024-09-10T02:54:47+08:00

-- When you check the stats histogram, there's none.

这是中的一个错误sys.dm_db_stats_histogram。DBCC SHOW_STATISTICS正确显示 NULL 行（这是 SSMS 使用的）。的原始问题sys.dm_db_stats_histogram是它从未显示额外的 NULL 步骤。他们修复了这个问题，但如果这是唯一的步骤，NULL它仍然不会显示该步骤。他们对修复这个问题没有表现出任何兴趣。

...非聚集索引中的前 2 列在所有行中均为 NULL。（对于上下文，在将旧业务流程中的旧数据存档后，它就变成了这样）。查询在这两列上使用相等和非 NULL 值的 WHERE 子句，因此 SQL Server 应该很容易使用索引查找 + 查找来准备一个好的计划，尤其是在查找的数据不存在的情况下，因此估计为 0 或 1 行。

是的，但那只是一个疯狂的场景。同样，创建一个带有前导列的索引也很奇怪。直方图（仅基于第一列构建）几乎毫无用处。是的，当统计数据刚构建时，它给出了一个很好的估计（SQL Server 从不估计 0 行，除了一个没有人关心的狭义情况）。

因此优化器知道所有行都有 NULL 值，并且表中没有其他内容，只是在某些情况下无法使用该信息。

当对基础表进行修改时，SQL Server 会记录此事实并使用它来修改直方图及其一般选择性估计。它对这些修改中使用的值一无所知。

它应该假设什么？新值全部为 NULL？也许，但事实并非如此。没有理由仅仅因为现有值为 NULL 就认为新值也将为 NULL。我们在这里牢牢地陷入了猜测的境地。如果 SQL Server 假设所有未来值都落在现有直方图内，其他人会抱怨这个假设。

另一个建模假设是，用户通常会编写查询来查找存在的数据。这可能是也可能不总是一个好的假设，但事实就是如此。您正在寻找您知道不存在的数据。您的查询根本不符合模型。

总而言之，难怪您得到的计划和估算不稳定。您向优化器提出了疯狂的问题，但没有给它提供有用的信息。SQL Server 不是魔术，无法猜测它在疯狂的情况下运行。设计或支持索引和查询的某些方面需要更改。

如果你不能做其他任何事情，至少告诉优化器这些列始终为 NULL：

ALTER TABLE dbo.t1
    WITH CHECK ADD
    CONSTRAINT 
        [CK dbo.t1 c2 IS NULL]
    CHECK (c2 IS NULL);

ALTER TABLE dbo.t1
    WITH CHECK ADD
    CONSTRAINT 
        [CK dbo.t1 c3 IS NULL]
    CHECK (c3 IS NULL);

由于根本不需要访问表，因此您的所有计划看起来都会像这样：

不管怎样，您的脚本总是会为我生成一个查找加查找计划，并使用兼容级别 160 和默认基数估计器在 SQL Server 2022 CU14 上进行测试。

您始终可以使用FORCESEEK提示，直接在查询中或通过查询存储或计划指南。我不知道如果您获得了搜索加查找计划，为什么您会关心基数估计，但也许问题中缺少该信息。

无论如何，你的问题的答案是否定的，这不是 SQL Server 错误。

Andrea B. · Answer 2 · 2024-09-10T21:19:50+08:00

Andrea B.

2024-09-10T21:19:50+08:002024-09-10T21:19:50+08:00

您可以创建仅包含 NOT NULL 值的部分索引：

create index ix_test on t1 (c2, c3, c4, ID) where c2 is not null and c3 is not null;

该索引将非常小（并且速度快），并且仅在插入非空值的新记录时才会更新。

为了让优化器使用该索引，您需要在要优化的查询中重复索引条件：

select ID, c2, c3, c4, c5
from t1
where c2 is not null and c3 is not null
and c2 = 100 and c3 = 'x'
order by c2, c3, c4, ID;

3

Endrju · Answer 3 · 2024-09-11T16:42:23+08:00

Best Answer

Endrju

2024-09-11T16:42:23+08:002024-09-11T16:42:23+08:00

我不愿意认可自己的答案，但这次我认为我是对的。

经过 3 天的摸索，我得出结论：SQL Server 无法处理单步统计信息。在所有行中将列 c2 的值更新为 -1，将 c3 的值更新为“a”后，问题仍然存在，因此它不是 NULL 特有的。

在表格变大之前，这不是问题。但是当表格很大时，您就会开始注意到它。

在 c2 中添加仅 1000 行且包含一些非 NULL 值的行后，问题就完全消失了。在生产系统中，我们取消存档了这少量的旧数据，更新了统计数据，然后就好了！计划很稳定，并且有 seek+lookup，估计行数约为 1，正如预期的那样。

附注：PostgreSQL 16 和 SQLite 可以很好地处理这种情况。尝试过高达 3000 万的大小。

谢谢大家的回答和评论。

-1

当前两个索引列的所有行都为 NULL 值时，计划不稳定

连接到 PostgreSQL 服务器：致命：主机没有 pg_hba.conf 条目

如何让sqlplus的输出出现在一行中？

选择具有最大日期或最晚日期的日期

如何列出 PostgreSQL 中的所有模式？

列出指定表的所有列

如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

你如何mysqldump特定的表？

使用 psql 列出数据库权限

如何从 PostgreSQL 中的选择查询中将值插入表中？

如何使用 psql 列出所有数据库和表？

当前两个索引列的所有行都为 NULL 值时，计划不稳定

3 个回答

相关问题