Question

Stark

Asked: 2024-10-08 00:20:06 +0800 CST

Azure SQL: hierarchyid performance problem

(Sorry for the long post; this is my first post on SE.)

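I have been using the HierarchyID type in various applications for about 10 years, and overall I have been happy with its performance. I occasionally come across reports of poor HierarchyID performance, and I have always assumed they were caused by misconfigured indexes or by queries that could be optimized. Now I have run into the problem myself.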

Azure SQL, with the database compatibility level set to 160 (SQL Server 2022).

I have the following table:

create table [Dim]
(
    [Key] [int] not null,
    [Path] [hierarchyid] null,
    [Name] [nvarchar](256) null,
    [ParentKey] [int] null,
    [SortOrder] [int] null,
    [IsLeaf] [bit] null,
    constraint PK_Dim primary key clustered ([Key]),
    index IX_Dim_Name unique ([Name]) include ([Path]),
    index IX_Dim_ParentKey unique ([ParentKey], [SortOrder]),
    index IX_Dim_Path ([Path]),
    index IX_Dim_Leaf ([IsLeaf], [Path])
)
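
For readers who want to reproduce the setup, here is a minimal sketch of how [Path] could be populated from [ParentKey] and [SortOrder]. It is not taken from the post. It assumes that root rows have a null [ParentKey], that [SortOrder] is a non-null, non-negative integer, and that each node's path is its parent's path with the node's own SortOrder appended; the CTE name Tree and its PathStr column are hypothetical.

```sql
-- Sketch under the assumptions above: build the textual path top-down,
-- then parse it into a hierarchyid.
with Tree as
(
    -- Anchor: root rows (assumed to have no parent).
    select d.[Key],
           cast('/' + cast(d.[SortOrder] as varchar(12)) + '/' as varchar(4000)) as PathStr
    from [Dim] d
    where d.[ParentKey] is null

    union all

    -- Recursive step: append the child's SortOrder to the parent's path.
    select c.[Key],
           cast(t.PathStr + cast(c.[SortOrder] as varchar(12)) + '/' as varchar(4000))
    from [Dim] c
    inner join Tree t on c.[ParentKey] = t.[Key]
)
update d
set [Path] = hierarchyid::Parse(t.PathStr)
from [Dim] d
inner join Tree t on t.[Key] = d.[Key];
```

With a maximum depth of 8, the default recursion limit of 100 is not a concern here.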

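The table models a parent-child hierarchy. The Path column reflects the hierarchy and is built from the parent's Path and the current member's SortOrder. The test data set contains 10,010 rows, with a maximum hierarchy depth of 8. We run the following query: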

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[Path]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[Path]
    from [Dim] d
    inner join [Dim] p on (d.[Path].IsDescendantOf(p.[Path]) = 1)
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[Path].IsDescendantOf(r.[Path]) = 1
        and d.[IsLeaf] = 1);

The query takes 14 seconds to run and returns 3,053 rows. The subquery returns 1,298 rows. Execution plan: https://www.brentozar.com/pastetheplan/?id=SydK3qIk1l

The plan looks much as I expected, except for the index scan on IX_Dim_Leaf, which reads 6,861,228 rows.

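Now suppose we add one more column to the table, holding the string representation of the Path column, together with two corresponding indexes on the new column: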

alter table [Dim] add [PathStr] varchar(256);
go
update [Dim] set [PathStr] = [Path].ToString();
create index [IX_Dim_PathStr] on [Dim] ([PathStr]);
create index [IX_Dim_LeafStr] on [Dim] ([IsLeaf], [PathStr]);
go

Then we rewrite the query to use PathStr instead of Path:

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[PathStr]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[PathStr]
    from [Dim] d
    inner join [Dim] p on (d.[PathStr] like p.[PathStr] + '%')
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[PathStr] like r.[PathStr] + '%'
    and d.[IsLeaf] = 1);

The new query runs in 0.064 seconds. Execution plan: https://www.brentozar.com/pastetheplan/?id=Sy5oT5Uyye

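Because the new column is not included in the IX_Dim_Name index and the '%' string has to be appended, this plan is actually more complex than the first one. The big difference is in the outer index scan, which reads only 3,053 rows instead of 6.8 million.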

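It makes no sense to me that a string column outperforms a HierarchyID column, which in theory is optimized for exactly this kind of hierarchical query. Am I doing something wrong, or is SQL Server simply unable to handle HierarchyID in a subquery, so that we should stick to string columns?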

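Note: storing the result of the subquery in a table variable and then joining the table variable to the Dim table actually performs somewhat better with HierarchyID, but unfortunately that is not an option here.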
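
The table-variable variant mentioned in the note might look roughly like the sketch below. The post does not show the actual code, so the variable name @r and its column types are assumptions chosen to match the Dim table and the return types of row_number() (bigint) and GetLevel() (smallint).

```sql
-- Sketch only: materialize the descendants of 'A8' first,
-- then join the materialized rows to the leaf members.
declare @r table
(
    [SortOrder] bigint not null,
    [Key] int not null,
    [Name] nvarchar(256) null,
    [Level] smallint null,
    [Path] hierarchyid null
);

insert into @r ([SortOrder], [Key], [Name], [Level], [Path])
select row_number() over (order by d.[Path]),
       d.[Key],
       d.[Name],
       d.[Path].GetLevel(),
       d.[Path]
from [Dim] d
inner join [Dim] p on d.[Path].IsDescendantOf(p.[Path]) = 1
where p.[Name] = 'A8';

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from @r r
inner join [Dim] d on d.[Path].IsDescendantOf(r.[Path]) = 1
    and d.[IsLeaf] = 1;
```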

Edit: following Charlieface's suggestion below, I also tried this query:

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[Path]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[Path]
    from [Dim] d
    inner join [Dim] p on (d.[Path].IsDescendantOf(p.[Path]) = 1)
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[Path].GetAncestor(1) = r.[Path]
        and d.[IsLeaf] = 1);

Execution time is 38 ms (execution plan). It seems that only IsDescendantOf() is affected.

1 Answer

Charlieface · Answer 1 · 2024-10-13T21:29:55+08:00

The problem, as you noticed, is that the index seek is not really a seek. It returns every row with IsLeaf = 1, and those make up a large part of the table.

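It seems that IsDescendantOf is not using the optimized access path. The only way I have found to make it work properly is to remove the derived table (subquery) and replace row_number with dense_rank.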

select
  d2.[Key],
  dense_rank() over (order by d.Path) AS SortOrder,
  d.[Key],
  d.Name,
  d.Path.GetLevel() as Level
from Dim d
inner join Dim p on d.Path.IsDescendantOf(p.Path) = 1
inner join Dim d2 on d2.Path.IsDescendantOf(d.Path) = 1
        and d2.IsLeaf = 1
where p.Name = 'A8';

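I am not sure why this happens, as it looks like something the optimizer should be able to work out easily. The row_number is not what causes it: I got the same result even after removing it along with the GetLevel() call. I suspect there may be an optimizer bug here.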

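If you still want to use a derived table or CTE, one workaround I found is to use GetAncestor(1) instead, which gets the direct parent. This still seems to have an optimized path.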

select
  d.[Key],
  r.SortOrder,
  r.[Key],
  r.Name,
  r.Level
from (
    select
      row_number() over (order by d.Path) as SortOrder,
      d.[Key],
      d.Name,
      d.Path.GetLevel() as Level,
      d.Path
    from Dim d
    inner join Dim p on d.Path.GetAncestor(1) = p.Path
    where p.Name = 'A8'
) r
inner join Dim d on d.Path.GetAncestor(1) = r.Path
        and d.IsLeaf = 1;
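
Note that the two predicates are not interchangeable: IsDescendantOf matches the node itself and descendants at any depth, while GetAncestor(1) = parent matches direct children only. The small self-contained sketch below illustrates the difference. The sample paths and the aliases v, n, Node, IsDescendant and IsDirectChild are made up for illustration and do not come from the Dim table.

```sql
-- Compare the two predicates against a fixed parent '/1/'.
declare @parent hierarchyid = hierarchyid::Parse('/1/');

select n.PathStr,
       n.Node.IsDescendantOf(@parent) as IsDescendant,
       case when n.Node.GetAncestor(1) = @parent then 1 else 0 end as IsDirectChild
from (
    select v.PathStr, cast(v.PathStr as hierarchyid) as Node
    from (values ('/1/'), ('/1/2/'), ('/1/2/3/'), ('/2/')) as v(PathStr)
) n;
```

Here '/1/', '/1/2/' and '/1/2/3/' all count as descendants of '/1/', but only '/1/2/' is a direct child. The GetAncestor(1) workaround therefore returns the same rows as the original query only when the members being matched sit exactly one level below their join partner.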

One more thing: you still end up with a Key Lookup. You can avoid it by adding Name as an included column on the IX_Dim_Path index.

    index IX_Dim_Path ([Path]) include (Name),
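
If the table already exists, the same change could be applied in place instead of editing the table definition. This is a sketch that assumes IX_Dim_Path currently exists as defined in the question:

```sql
-- Rebuild the existing index with Name as an included column.
create index IX_Dim_Path on [Dim] ([Path]) include ([Name])
with (drop_existing = on);
```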

db<>fiddle
