
Stark

Asked: 2024-10-08 00:20:06 +0800 CST

Azure SQL: hierarchyid performance issue

(Sorry for the long post; this is my first post on SE.)

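I have been using the HierarchyID type in various applications for about 10 years, and overall I have been happy with its performance. I have occasionally come across reports of poor HierarchyID performance and always assumed they were caused by misconfigured indexes or by queries that could be optimized. Now I have run into the problem myself.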

Azure SQL, with the database compatibility level set to 160 (SQL Server 2022).

I have the following table:

create table [Dim]
(
    [Key] [int] not null,
    [Path] [hierarchyid] null,
    [Name] [nvarchar](256) null,
    [ParentKey] [int] null,
    [SortOrder] [int] null,
    [IsLeaf] [bit] null,
    constraint PK_Dim primary key clustered ([Key]),
    index IX_Dim_Name unique ([Name]) include ([Path]),
    index IX_Dim_ParentKey unique ([ParentKey], [SortOrder]),
    index IX_Dim_Path ([Path]),
    index IX_Dim_Leaf ([IsLeaf], [Path])
)

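The table is meant to model a parent-child hierarchy. The Path field reflects the hierarchy and is derived from the parent's Path and the current member's SortOrder. The test data set contains 10,010 records, with a maximum hierarchy depth of 8. We run the following query: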
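For readers who want to reproduce the setup, here is a minimal sketch of how a Path column of this kind could be populated. It is an assumption about the loading step, not the script actually used here: it assumes root rows have a NULL ParentKey and that SortOrder is a non-negative integer that is unique under each parent. The CTE name [Tree] and its string column [PathText] are hypothetical.

```sql
-- Sketch only: rebuild [Path] from [ParentKey] and [SortOrder].
-- Assumptions (not from the original post): roots have ParentKey is null,
-- and SortOrder is a non-negative integer, unique under each parent.
with [Tree] as (
    -- Anchor: root members get the path '/<SortOrder>/'.
    select d.[Key],
           cast('/' + cast(d.[SortOrder] as varchar(10)) + '/' as varchar(4000)) as [PathText]
    from [Dim] d
    where d.[ParentKey] is null
    union all
    -- Recursive step: append the child's SortOrder to the parent's path.
    select c.[Key],
           cast(t.[PathText] + cast(c.[SortOrder] as varchar(10)) + '/' as varchar(4000))
    from [Dim] c
    inner join [Tree] t on c.[ParentKey] = t.[Key]
)
update d
set d.[Path] = hierarchyid::Parse(t.[PathText])  -- canonical string form, e.g. '/1/3/2/'
from [Dim] d
inner join [Tree] t on t.[Key] = d.[Key];
```

With a maximum depth of 8 the default recursion limit of 100 is not a concern.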

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[Path]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[Path]
    from [Dim] d
    inner join [Dim] p on (d.[Path].IsDescendantOf(p.[Path]) = 1)
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[Path].IsDescendantOf(r.[Path]) = 1
        and d.[IsLeaf] = 1);

The query takes 14 seconds to execute and returns 3,053 rows. The subquery returns 1,298 rows. Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=SydK3qIk1l

It looks more or less as I expected, except for the index scan on IX_Dim_Leaf, which reads 6,861,228 rows.

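Now, suppose we add an extra column to the table that holds the string representation of the Path field, together with two corresponding indexes on the new field: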

alter table [Dim] add [PathStr] varchar(256);
go
update [Dim] set [PathStr] = [Path].ToString();
create index [IX_Dim_PathStr] on [Dim] ([PathStr]);
create index [IX_Dim_LeafStr] on [Dim] ([IsLeaf], [PathStr]);
go

Then we rewrite the query to use PathStr instead of Path:

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[PathStr]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[PathStr]
    from [Dim] d
    inner join [Dim] p on (d.[PathStr] like p.[PathStr] + '%')
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[PathStr] like r.[PathStr] + '%'
    and d.[IsLeaf] = 1);

The new query executes in 0.064 seconds. Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=Sy5oT5Uyye

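Because the new field is not included in the IX_Dim_Name index and the '%' string has to be appended, this plan is actually more complex than the first one. The biggest difference, however, is in the outer index scan, which reads only 3,053 rows instead of 6.8 million.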

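It makes no sense to me that a string field outperforms the HierarchyID field, which in theory is optimized for exactly this kind of hierarchical query. Am I doing something wrong, or is SQL Server simply unable to handle HierarchyID in a subquery, so that we should stick with string fields?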

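Note: storing the result of the subquery in a table variable and then joining that table variable to the Dim table actually performs somewhat better when using HierarchyID, but unfortunately that is not an option here.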
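For reference, the table-variable variant described in the note might look roughly like the sketch below. The variable name @r and the column types are assumptions chosen to match the subquery (row_number() returns bigint, GetLevel() returns smallint); the two statements otherwise reuse the original query unchanged.

```sql
-- Sketch only: materialize the subquery first, then join it to [Dim].
-- @r and its column types are illustrative assumptions.
declare @r table (
    [SortOrder] bigint      not null,  -- row_number() returns bigint
    [Key]       int         not null,
    [Name]      nvarchar(256) null,
    [Level]     smallint    null,      -- GetLevel() returns smallint
    [Path]      hierarchyid null
);

-- Step 1: all descendants of the member named 'A8' (including 'A8' itself).
insert into @r ([SortOrder], [Key], [Name], [Level], [Path])
select row_number() over (order by d.[Path]), d.[Key], d.[Name], d.[Path].GetLevel(), d.[Path]
from [Dim] d
inner join [Dim] p on (d.[Path].IsDescendantOf(p.[Path]) = 1)
where p.[Name] = 'A8';

-- Step 2: for each stored member, find the leaf members at or below it.
select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from @r r
inner join [Dim] d on (d.[Path].IsDescendantOf(r.[Path]) = 1
        and d.[IsLeaf] = 1);
```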

Edit: following Charlieface's suggestion below, I also tried this query:

select d.[Key], r.[SortOrder], r.[Key], r.[Name], r.[Level]
from (
    select row_number() over (order by d.[Path]) as [SortOrder], d.[Key], d.[Name], d.[Path].GetLevel() as [Level], d.[Path]
    from [Dim] d
    inner join [Dim] p on (d.[Path].IsDescendantOf(p.[Path]) = 1)
    where p.[Name] = 'A8'
) r
inner join [Dim] d on (d.[Path].GetAncestor(1) = r.[Path]
        and d.[IsLeaf] = 1);

The execution time is 38 ms (execution plan). It seems that only IsDescendantOf() has the problem.
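One caveat when comparing the two timings: GetAncestor(1) = r.[Path] matches only the direct children of r.[Path], whereas IsDescendantOf(r.[Path]) = 1 matches the node itself and descendants at every depth, so the two joins are not guaranteed to return the same rows. A small self-contained sketch with made-up paths (the variables @p and @t are hypothetical) shows the difference:

```sql
-- Sketch only: compare the two predicates on four made-up paths.
declare @p hierarchyid = cast('/1/' as hierarchyid);
declare @t table ([Path] hierarchyid not null);

insert into @t ([Path])
values (cast('/1/' as hierarchyid)),
       (cast('/1/2/' as hierarchyid)),
       (cast('/1/2/3/' as hierarchyid)),
       (cast('/2/' as hierarchyid));

select t.[Path].ToString() as [PathStr],
       t.[Path].IsDescendantOf(@p) as [IsDescendant],
       case when t.[Path].GetAncestor(1) = @p then 1 else 0 end as [IsDirectChild]
from @t t
order by t.[Path];

-- PathStr    IsDescendant  IsDirectChild
-- /1/        1             0   (a node is a descendant of itself)
-- /1/2/      1             1
-- /1/2/3/    1             0   (grandchild, not a direct child)
-- /2/        0             0
```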
