Question

crokusek

Asked: 2023-12-06 09:58:11 +0800 CST

How to allow predicate pushdown into a view that uses GROUP BY

We have a table, ECom.McProductToVendorProductCodeMap, that has a multi-column primary key (PK).

A view then wraps that table to compute metrics, grouping by the first two columns of that PK:

ALTER view ECom.McProductToVendorProductMd5SourceView
as
select ClientAppPrivateLabelId,
       BrandId, 
       convert(nvarchar(32), HashBytes('MD5', 
              string_agg(
                  convert(varchar(max), MaterialNumber + ',' + VendorProductCode + ',' + convert(varchar(30), VendorProductStatusId)),    -- sense any MaterialNumber/VendorProductCode/Status changes
                  ',') within group (order by MaterialNumber)
          ), 2) as Md5,
       Count(*) as Count,
       max(ModifiedUtc) as ModifiedUtc
  from ECom.McProductToVendorProductCodeMap 
 group by ClientAppPrivateLabelId, BrandId

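Now, if we select from the view directly using those two columns as predicates, the plan performs an index seek on both columns (19k rows; the tooltip shows a "Seek Predicate" on both columns):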

select * from ECom.McProductToVendorProductMd5SourceView
where ClientAppPrivateLabelId = 101 and BrandId = 3

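However, when we join to the same view using the same two predicates, the plan seeks only on ClientAppPrivateLabelId and not on BrandId. A loop join hint does not help, and neither does replacing the join with CROSS APPLY.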

select IsNull(convert(smallint, Value), 0) as BrandId 
  into #Brands 
  from string_split('2,3', ',');    
    
select ClientAppPrivateLabelId, b.BrandId, Md5, Count, ModifiedUtc
  from #Brands b
 inner loop join ECom.McProductToVendorProductMd5SourceView m
    on m.BrandId = b.BrandId
   and m.ClientAppPrivateLabelId = 101;

Apart from the windowed calculation, the view shown above is simple.

Why is BrandId not used? The base table defines BrandId as a non-nullable smallint.

Paste the Plan: https://www.brentozar.com/pastetheplan/?id=ryZWp86Hp

Update #1 (12/5/2023)

Converted the view to a table-valued function (TVF):

alter function ECom.McProductToVendorProductMd5(
   @pBrandId smallint,
   @pClientAppPrivateLabelId smallint
)
returns table as 
return
select ClientAppPrivateLabelId,
       BrandId, 
       convert(nvarchar(32), HashBytes('MD5', 
              string_agg(
                  -- Sense any MaterialNumber/VendorProductCode/Status changes
                  convert(varchar(max), MaterialNumber + ',' + VendorProductCode + ',' + convert(varchar(30), VendorProductStatusId)),    
                  ',') within group (order by MaterialNumber)
          ), 2) as Md5,
       Count(*) as Count,
       max(ModifiedUtc) as ModifiedUtc
  from ECom.McProductToVendorProductCodeMap m
 where m.BrandId = @pBrandId
   and m.ClientAppPrivateLabelId = @pClientAppPrivateLabelId
 group by ClientAppPrivateLabelId, BrandId

and adjusted the query to use it via CROSS APPLY:

select ClientAppPrivateLabelId, b.BrandId, Md5, Count, ModifiedUtc
  from #Brands b
 cross apply ECom.McProductToVendorProductMd5(b.BrandId, @pCaplId) m;

Same problem: https://www.brentozar.com/pastetheplan/?id=SJnRODaBT

It uses a merge join instead of seeking on BrandId.

1 Answer

Paul White · Answer 1 · 2023-12-06T20:04:43+08:00

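SQL Server is very keen to rewrite an apply as a join before optimization starts, and it is pretty good at that. It is less good at transforming a join into an apply, which is what you want here.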

So, when you write a join, it stays a join. When you write an apply, it is transformed to a join.
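As a rough illustration of that early rewrite, the two forms below are logically equivalent for this query, so writing the apply form alone does not change what the optimizer starts from. This is only a sketch that assumes the question's #Brands table and view exist; the literal 101 stands in for the client label id.

```sql
-- Form 1: written as an apply, correlated on B.BrandId.
SELECT MA.ClientAppPrivateLabelId, B.BrandId, MA.Md5, MA.[Count], MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY
(
    SELECT M.*
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE M.ClientAppPrivateLabelId = 101
      AND M.BrandId = B.BrandId
) AS MA;

-- Form 2: the equivalent join, which is the shape the apply is rewritten to.
SELECT M.ClientAppPrivateLabelId, B.BrandId, M.Md5, M.[Count], M.ModifiedUtc
FROM #Brands AS B
JOIN ECom.McProductToVendorProductMd5SourceView AS M
    ON M.BrandId = B.BrandId
   AND M.ClientAppPrivateLabelId = 101;
```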

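There is no hint to avoid the initial rewrite from apply to join, although undocumented trace flag 9114 does this. Behaviours previously enabled by undocumented trace flags have eventually surfaced as USE HINT options, so perhaps that will change one day.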
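For experimentation only, the trace flag can be scoped to a single statement with the documented QUERYTRACEON query hint. The sketch below assumes the question's #Brands table and view exist and uses 101 as a placeholder client label id. Trace flag 9114 is undocumented and unsupported, and QUERYTRACEON normally requires sysadmin membership, so treat this as a way to inspect the plan on a test system rather than something to deploy.

```sql
-- Test-only sketch: keep the apply as written by enabling
-- undocumented trace flag 9114 for this one statement.
SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY
(
    SELECT M.*
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE M.ClientAppPrivateLabelId = 101
      AND M.BrandId = B.BrandId
) AS MA
OPTION (QUERYTRACEON 9114);
```

Compare the resulting plan with the unhinted one to check whether the seek now covers both ClientAppPrivateLabelId and BrandId.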

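To work around this in the meantime, write the join as an apply and use OUTER APPLY or a redundant OFFSET to prevent the optimizer from transforming the apply into a join.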

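In principle, SQL Server is capable of rewriting an outer apply, or an apply containing OFFSET/TOP, as a join. It does not do so for OFFSET/TOP specifically because people have commonly used them in the past to avoid the transformation to a join. Outer apply is less amenable to the transformation, but it can happen.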

Outer apply

SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
OUTER APPLY
(
    SELECT
        M.* 
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE
        M.ClientAppPrivateLabelId = 101
        AND M.BrandId = B.BrandId
) AS MA;

Redundant offset

SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY
(
    SELECT
        M.* 
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE
        M.ClientAppPrivateLabelId = 101
        AND M.BrandId = B.BrandId
    ORDER BY
        M.ClientAppPrivateLabelId,
        M.BrandId
        OFFSET 0 ROWS
) AS MA;
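Since the answer groups OFFSET and TOP together, a TOP-based variant of the same idea might look like the sketch below. It assumes the question's #Brands table and view, and it assumes that a TOP with the maximum bigint value blocks the apply-to-join rewrite in the same way as OFFSET 0 ROWS; check the execution plan on your own build before relying on it.

```sql
-- Sketch: a redundant TOP in place of the redundant OFFSET.
-- 9223372036854775807 is the largest bigint, so no rows are discarded.
SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY
(
    SELECT TOP (9223372036854775807)
        M.*
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE
        M.ClientAppPrivateLabelId = 101
        AND M.BrandId = B.BrandId
) AS MA;
```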

If you want to encapsulate this in a function, one possible implementation is:

CREATE OR ALTER FUNCTION ECom.McProductToVendorProductMd5
(
    @pBrandId integer,
    @pClientAppPrivateLabelId integer
)
RETURNS table
AS
RETURN 
    SELECT
        M.ClientAppPrivateLabelId,
        M.BrandId,
        Md5 = 
            CONVERT(char(32),
                HASHBYTES(N'MD5',
                    STRING_AGG(CSV.cols, ',')
                        WITHIN GROUP (ORDER BY M.MaterialNumber)), 
                2),
        [Count] = COUNT_BIG(*),
        ModifiedUtc = MAX(M.ModifiedUtc)
    FROM 
        ECom.McProductToVendorProductCodeMap AS M
    CROSS APPLY 
    (
        VALUES
        (
            CONVERT(varchar(max),
                CONCAT_WS(',', M.MaterialNumber, M.VendorProductCode, M.VendorProductStatusId))
        )
    ) AS CSV (cols)
    WHERE
        M.BrandId = @pBrandId
        AND M.ClientAppPrivateLabelId = @pClientAppPrivateLabelId
    GROUP BY
        M.ClientAppPrivateLabelId,
        M.BrandId
    ORDER BY
        M.ClientAppPrivateLabelId
        OFFSET 0 ROWS;
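A call to that function could then look like the following sketch, which mirrors the CROSS APPLY from the question. It assumes the #Brands temporary table from the question exists, and it uses the literal 101 in place of the question's @pCaplId variable.

```sql
-- Usage sketch: one function call per row of #Brands.
-- The OFFSET 0 ROWS inside the function is what is meant to
-- keep this apply from being rewritten as a join.
SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY ECom.McProductToVendorProductMd5(B.BrandId, 101) AS MA;
```

Because this is a CROSS APPLY, brands with no matching rows are omitted from the result; use OUTER APPLY if those brands should still appear with NULLs.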
