Create a plan guide to cache (lazy spool) a CTE result

Question

孔夫子 · Asked: 2012-10-27 17:07:09 +0800 CST

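I usually create plan guides by first constructing a query that uses the correct plan, and then copying that plan to the similar query that does not. However, this is sometimes tricky, especially when the queries are not exactly the same. What is the right way to create plan guides from scratch?

SQLKiwi has mentioned drawing up plans in SSIS. Is there a way, or a useful tool, to help lay out a good plan for SQL Server?

The specific instance in question is this CTE: SQLFiddle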

with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other;

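Is there any way to make the result come up with exactly 3 distinct guids and no more? I hope to be able to answer questions better in the future by including plan guides with CTE-type queries that are referenced multiple times, to overcome some SQL Server CTE quirks.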

5 Answers

Paul White · Answer 1 · 2012-10-28T22:47:55+08:00

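"Is there any way to make the result come up with exactly 3 distinct guids and no more? I hope to be able to answer questions better in the future by including plan guides with CTE-type queries that are referenced multiple times, to overcome some SQL Server CTE quirks."

Not today. Non-recursive common table expressions (CTEs) are treated as in-line view definitions and are expanded into the logical query tree at each place they are referenced (just like regular view definitions) before optimization. The logical tree for your query is: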

LogOp_OrderBy COL: Union1007 ASC COL: Union1015 ASC
    LogOp_Project COL: Union1006 COL: Union1007 COL: Union1014 COL: Union1015
        LogOp_Join
            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const

            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const

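Notice the two View Anchors and the six calls to the intrinsic function newid before optimization even starts. Nevertheless, many people consider that the optimizer ought to be able to recognize that the expanded subtrees were originally a single referenced object, and simplify accordingly. There have also been several Connect requests to allow explicit materialization of a CTE or derived table.

A more general implementation would have the optimizer consider materializing arbitrary common expressions to improve performance (CASE with a subquery is another example where problems can occur today). Microsoft Research published a paper (PDF) on this back in 2007, but it remains unimplemented to date. For the time being, we are limited to explicit materialization using things like table variables and temporary tables.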
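As a minimal sketch of the explicit materialization mentioned above, the three rows can be written to a temporary table once and then referenced twice. The table name #cte is an assumption chosen for illustration and does not come from the original query. Because newid() is evaluated only during the single insert, both sides of the cross join read the same three stored values:

```sql
-- Sketch only: #cte is a hypothetical name for the materialized rows.
CREATE TABLE #cte (guid uniqueidentifier NOT NULL, other integer NOT NULL);

-- newid() runs three times here, once per row, and never again.
INSERT #cte (guid, other)
SELECT NEWID(), 1 UNION ALL
SELECT NEWID(), 2 UNION ALL
SELECT NEWID(), 3;

-- Both references read the stored rows instead of re-expanding the expression.
SELECT a.guid, a.other, b.guid AS guidb, b.other AS otherb
FROM #cte AS a
CROSS JOIN #cte AS b
ORDER BY a.other, b.other;

DROP TABLE #cte;
```

A table variable could be used in the same way; which of the two is preferable depends on the surrounding workload.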

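"SQLKiwi has mentioned drawing up plans in SSIS. Is there a way, or a useful tool, to help lay out a good plan for SQL Server?"

This was just wishful thinking on my part, and went well beyond the idea of modifying plan guides. In principle, it is possible to write a tool that manipulates show plan XML directly, but without specific optimizer instrumentation, using the tool would likely be a frustrating experience for the user (and for the developer, come to think of it).

In the particular context of this question, such a tool would still be unable to materialize the CTE contents in a way that could be used by multiple consumers (to feed both inputs of the cross join in this case). The optimizer and execution engine do support multi-consumer spools, but only for specific purposes, none of which could be made to apply to this particular example.

"While I'm not certain, I have a fairly strong hunch that the RelOps can be followed (Nested Loops, Lazy Spool) even if the query is not exactly the same as the plan - for example, if you add 4 and 5 to the CTE, it still continues to use the same plan (seemingly - tested on SQL Server 2012 RTM Express)."

There is a reasonable amount of flexibility here. The broad shape of the XML plan is used to guide the search for a final plan (though many attributes are ignored completely, e.g. the partitioning type on exchanges), and the normal search rules are considerably relaxed too. For example, early pruning of alternatives based on cost considerations is disabled, the explicit introduction of cross joins is allowed, and scalar operations are ignored.

There are too many details to go into in depth, but the placement of Filters and Compute Scalars cannot be forced, and predicates of the form column = value are generalized, so a plan containing X = 1 or X = @X can be applied to a query containing X = 502 or X = @Y. This particular flexibility can help greatly in finding a natural plan to force.

In the specific example, the constant Union All can always be implemented as a Constant Scan; the number of inputs to the Union All does not matter.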

孔夫子 · Answer 2 · 2012-10-28T02:21:06+08:00

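There is no way (in SQL Server versions up to 2012) to reuse a single spool for both occurrences of the CTE. Details can be found in SQLKiwi's answer. Further below are two ways to materialize the CTE twice, which is unavoidable given the nature of the query. Both options result in a net distinct guid count of 6.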
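One way to check the net distinct count for a given form of the query is to capture its output and count the guids across both columns. This is only a sketch: the #result table name is hypothetical, and adding INTO changes the statement, so the optimizer is free to choose a different plan from the one the plain SELECT gets. No particular output is implied here.

```sql
-- Capture the cross join output in a hypothetical temp table, #result.
WITH cte(guid, other) AS (
  SELECT NEWID(), 1 UNION ALL
  SELECT NEWID(), 2 UNION ALL
  SELECT NEWID(), 3)
SELECT a.guid, a.other, b.guid AS guidb, b.other AS otherb
INTO #result
FROM cte AS a
CROSS JOIN cte AS b;

-- UNION removes duplicates, so this counts distinct guids across both columns.
SELECT COUNT(*) AS distinct_guids
FROM (
  SELECT guid FROM #result
  UNION
  SELECT guidb FROM #result
) AS g;

DROP TABLE #result;
```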

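The link in Martin's comment, to a post on Quassnoi's site about plan-guiding a CTE, was a partial inspiration for this question. It describes a way to materialize a CTE for use in a correlated subquery, which is referenced only once although the correlation could cause it to be evaluated multiple times. This does not apply to the query in the question.

Option 1 - Plan Guide

Taking hints from SQLKiwi's answer, I have pared the guide down to the bare minimum that still does the job. For example, the ConstantScan nodes list only 2 rows of scalar operators, which is enough to expand to any number.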

;with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other
OPTION(USE PLAN
N'<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.2100.60" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1600" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.0444433" StatementText="with cte(guid,other) as (&#xD;&#xA;  select newid(),1 union all&#xD;&#xA;  select newid(),2 union all&#xD;&#xA;  select newid(),3&#xD;&#xA;select a.guid, a.other, b.guid guidb, b.other otherb&#xD;&#xA;from cte a&#xD;&#xA;cross join cte b&#xD;&#xA;order by a.other, b.other;&#xD;&#xA;" StatementType="SELECT" QueryHash="0x43D93EF17C8E55DD" QueryPlanHash="0xF8E3B336792D84" RetrievedFromCache="true">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan NonParallelPlanReason="EstimatedDOPIsOne" CachedPlanSize="96" CompileTime="13" CompileCPU="13" CompileMemory="1152">
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="157240" EstimatedPagesCached="1420" EstimatedAvailableDegreeOfParallelism="1" />
            <RelOp AvgRowSize="47" EstimateCPU="0.006688" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1600" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.0444433">
              <OutputList>
                <ColumnReference Column="Union1163" />
              </OutputList>
              <Warnings NoJoinPredicate="true" />
              <NestedLoops Optimized="false">
                <RelOp AvgRowSize="27" EstimateCPU="0.000432115" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0117335">
                  <OutputList>
                    <ColumnReference Column="Union1080" />
                    <ColumnReference Column="Union1081" />
                  </OutputList>
                  <MemoryFractions Input="0" Output="0" />
                  <Sort Distinct="false">
                    <OrderBy>
                      <OrderByColumn Ascending="true">
                        <ColumnReference Column="Union1081" />
                      </OrderByColumn>
                    </OrderBy>
                    <RelOp AvgRowSize="27" EstimateCPU="4.0157E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Constant Scan" NodeId="2" Parallel="false" PhysicalOp="Constant Scan" EstimatedTotalSubtreeCost="4.0157E-05">
                      <OutputList>
                        <ColumnReference Column="Union1080" />
                        <ColumnReference Column="Union1081" />
                      </OutputList>
                      <ConstantScan>
                        <Values>
                          <Row>
                            <ScalarOperator ScalarString="newid()">
                              <Intrinsic FunctionName="newid" />
                            </ScalarOperator>
                            <ScalarOperator ScalarString="(1)">
                              <Const ConstValue="(1)" />
                            </ScalarOperator>
                          </Row>
                          <Row>
                            <ScalarOperator ScalarString="newid()">
                              <Intrinsic FunctionName="newid" />
                            </ScalarOperator>
                            <ScalarOperator ScalarString="(2)">
                              <Const ConstValue="(2)" />
                            </ScalarOperator>
                          </Row>
                        </Values>
                      </ConstantScan>
                    </RelOp>
                  </Sort>
                </RelOp>
                <RelOp AvgRowSize="27" EstimateCPU="0.0001074" EstimateIO="0.01" EstimateRebinds="0" EstimateRewinds="39" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Lazy Spool" NodeId="83" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="0.0260217">
                  <OutputList>
                    <ColumnReference Column="Union1162" />
                    <ColumnReference Column="Union1163" />
                  </OutputList>
                  <Spool>
                    <RelOp AvgRowSize="27" EstimateCPU="0.000432115" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Sort" NodeId="84" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0117335">
                      <OutputList>
                        <ColumnReference Column="Union1162" />
                        <ColumnReference Column="Union1163" />
                      </OutputList>
                      <MemoryFractions Input="0" Output="0" />
                      <Sort Distinct="false">
                        <OrderBy>
                          <OrderByColumn Ascending="true">
                            <ColumnReference Column="Union1163" />
                          </OrderByColumn>
                        </OrderBy>
                        <RelOp AvgRowSize="27" EstimateCPU="4.0157E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Constant Scan" NodeId="85" Parallel="false" PhysicalOp="Constant Scan" EstimatedTotalSubtreeCost="4.0157E-05">
                          <OutputList>
                            <ColumnReference Column="Union1162" />
                            <ColumnReference Column="Union1163" />
                          </OutputList>
                          <ConstantScan>
                            <Values>
                              <Row>
                                <ScalarOperator ScalarString="newid()">
                                  <Intrinsic FunctionName="newid" />
                                </ScalarOperator>
                                <ScalarOperator ScalarString="(1)">
                                  <Const ConstValue="(1)" />
                                </ScalarOperator>
                              </Row>
                              <Row>
                                <ScalarOperator ScalarString="newid()">
                                  <Intrinsic FunctionName="newid" />
                                </ScalarOperator>
                                <ScalarOperator ScalarString="(2)">
                                  <Const ConstValue="(2)" />
                                </ScalarOperator>
                              </Row>
                            </Values>
                          </ConstantScan>
                        </RelOp>
                      </Sort>
                    </RelOp>
                  </Spool>
                </RelOp>
              </NestedLoops>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>'
);

Option 2 - Remote Scan

By increasing the cost of the query and introducing a remote scan, the result is materialized.

with cte(guid,other) as (
  select *
  from OPENQUERY([TESTSQL\V2012], '
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3') x)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other;

wBob · Answer 3 · 2012-10-28T02:08:04+08:00

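Seriously, you can't cut XML execution plans from scratch. Creating them with SSIS is science fiction. Yes, it's all XML, but they come from different universes. Looking at Paul's blog post on the topic, he says "much in the way SSIS allows ..." so perhaps you have misunderstood? I don't think he is saying "use SSIS to create plans", but rather "wouldn't it be great to be able to create plans using a drag-and-drop interface like SSIS". Maybe, for a very simple query, you could just about manage it, but it's a stretch and possibly even a waste of time. Busy-work, you might say.

If I'm creating a plan for a USE PLAN hint or a plan guide, I have a couple of approaches. For example, I might remove records from tables (e.g. on a copy of the database) to influence the statistics and encourage the optimizer to make a different decision. I have also used table variables in place of all the tables in the query, so the optimizer thinks each table contains 1 record. Then, in the generated plan, I replace all the table variables with the original table names and swap it in as the plan. Another option is to use the WITH STATS_STREAM option of UPDATE STATISTICS to spoof statistics, which is the method used when cloning statistics-only copies of databases, e.g.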

UPDATE STATISTICS 
    [dbo].[yourTable]([PK_yourTable]) 
WITH 
    STATS_STREAM = 0x0100etc, 
    ROWCOUNT = 10000, 
    PAGECOUNT = 93

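I have spent some time fiddling with XML execution plans in the past, and I found that in the end SQL Server just goes "I'm not using that" and runs the query how it wants to anyway.

For your specific example, I'm sure you know you could use set rowcount 3 or TOP 3 in the query to get that result, but I guess that's not your point. The correct answer really would be: use a temp table. I would upvote that :) An incorrect answer would be "spend hours or even days cutting your own custom XML execution plan, in which you try to trick the optimizer into doing a lazy spool for the CTE, which might not work anyway, and which would look clever but be impossible to maintain".

I'm not trying to be unconstructive there, it's just my opinion - hope that helps.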

wBob · Answer 4 · 2016-01-15T07:00:20+08:00

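"Is there any way..."

Finally, in SQL 2016 CTP 3.0 there is a way, sort of :)

Using the trace flag and extended event detailed by Dmitry Pilugin here, you can (somewhat arbitrarily) fish out three unique guids from an intermediate stage of query execution.

Note: this code is NOT intended for production or any serious use as far as CTE plan forcing goes. It is merely a light-hearted look at a new trace flag and a different way of doing things: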

-- Configure the XEvents session; with ring buffer target so we can collect it
CREATE EVENT SESSION [query_trace_column_values] ON SERVER 
ADD EVENT sqlserver.query_trace_column_values
ADD TARGET package0.ring_buffer( SET max_memory = 2048 )
WITH ( MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF , STARTUP_STATE = OFF )
GO

-- Start the session
ALTER EVENT SESSION [query_trace_column_values] ON SERVER
STATE = START;
GO

-- Run the query, including traceflag
DBCC TRACEON(2486);
SET STATISTICS XML ON;
GO

-- Original query
;with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other
option ( recompile )
go

SET STATISTICS XML OFF;
DBCC TRACEOFF(2486);
GO

DECLARE @target_data XML

SELECT @target_data = CAST( target_data AS XML )
FROM sys.dm_xe_sessions AS s 
    INNER JOIN sys.dm_xe_session_targets AS t ON t.event_session_address = s.address
WHERE s.name = 'query_trace_column_values'


--SELECT @target_data td

-- Arbitrarily fish out 3 unique guids from intermediate stage of the query as collected by XEvent session
;WITH cte AS
(
SELECT
    n.c.value('(data[@name = "row_id"]/value/text())[1]', 'int') row_id,
    n.c.value('(data[@name = "column_value"]/value/text())[1]', 'char(36)') [guid]
FROM @target_data.nodes('//event[data[@name="column_id"]/value[. = 1]][data[@name="row_number"]/value[. < 4]][data[@name="node_name"]/value[. = "Nested Loops"]]') n(c)
)
SELECT *
FROM cte a
    CROSS JOIN cte b
GO

-- Stop the session
ALTER EVENT SESSION [query_trace_column_values] ON SERVER
STATE = STOP;
GO

-- Drop the session
IF EXISTS ( select * from sys.server_event_sessions where name = 'query_trace_column_values' )
DROP EVENT SESSION [query_trace_column_values] ON SERVER 
GO

Tested on version (CTP3.2) - 13.0.900.73 (x64), just for fun.

wBob · Answer 5 · 2012-10-28T05:35:15+08:00

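I found that trace flag 8649 (force a parallel plan) induced this behaviour for the left guid column on my 2008, R2 and 2012 instances. I didn't need the flag on SQL 2005, where the CTE behaved correctly. I tried using the plan generated on SQL 2005 in the later instances, but it would not validate.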

with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other
option ( querytraceon 8649 )

Using the hint, using a plan guide that includes the hint, or using the plan generated by the hinted query in a USE PLAN hint all work.
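The plan guide variant can be sketched roughly as follows. The guide name is hypothetical, and the @stmt text has to match the batch the application submits character for character (including line breaks), so treat this as an outline under those assumptions rather than a tested recipe:

```sql
-- Sketch only: attach the trace flag hint to the unhinted query through a plan guide.
-- The guide name is hypothetical; @stmt must match the submitted batch text exactly.
EXEC sys.sp_create_plan_guide
    @name = N'cte_newid_parallel_guide',
    @stmt = N'with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other',
    @type = N'SQL',
    @module_or_batch = NULL,   -- NULL: the @stmt text is treated as the whole batch
    @params = NULL,
    @hints = N'OPTION (QUERYTRACEON 8649)';

-- Remove the guide again when finished experimenting.
EXEC sys.sp_control_plan_guide
    @operation = N'DROP',
    @name = N'cte_newid_parallel_guide';
```

With the guide in place, the query is submitted without any OPTION clause and the hint is applied at compile time if the text matches.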
