SQL Server - 使用聚集索引时如何存储数据页

Question

Martin Smith

Asked: 2016-09-16 08:50:52 +0800 CST2016-09-16 08:50:52 +0800 CST 2016-09-16 08:50:52 +0800 CST

为什么在这些计划中，（相同的）1000 次搜索在唯一索引上的估计成本不同？

772

在下面的查询中，两个执行计划估计在唯一索引上执行 1,000 次查找。

查找是由同一源表上的有序扫描驱动的，因此看起来应该最终以相同的顺序查找相同的值。

两个嵌套循环都有<NestedLoops Optimized="false" WithOrderedPrefetch="true">

有谁知道为什么这个任务在第一个计划中的成本是 0.172434 而在第二个计划中是 3.01702？

（这个问题的原因是第一个查询被建议给我作为优化，因为计划成本明显低得多。实际上在我看来它好像做了更多的工作但我只是试图解释差异.. .)

设置

CREATE TABLE dbo.Target(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL);

CREATE TABLE dbo.Staging(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL); 

INSERT INTO dbo.Target
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1,  
     master..spt_values v2;

INSERT INTO dbo.Staging
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1;

查询 1 _{“粘贴计划”链接}

WITH T
     AS (SELECT *
         FROM   Target AS T
         WHERE  T.KeyCol IN (SELECT S.KeyCol
                             FROM   Staging AS S))
MERGE T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES(S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol;

查询 2 _{“粘贴计划”链接}

MERGE Target T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES( S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol;

查询 1

查询 2

以上是在 SQL Server 2014 (SP2) (KB3171021) - 12.0.5000.0 (X64) 上测试的

@Joe Obbish在评论中指出，更简单的复制是

SELECT *
FROM staging AS S 
  LEFT OUTER JOIN Target AS T 
    ON T.KeyCol = S.KeyCol;

对比

SELECT *
FROM staging AS S 
  LEFT OUTER JOIN (SELECT * FROM Target) AS T 
    ON T.KeyCol = S.KeyCol;

对于 1,000 行的临时表，以上两个仍然具有相同的计划形状，带有嵌套循环，并且没有派生表的计划看起来更便宜，但是对于 10,000 行的临时表和与上面相同的目标表，成本差异确实改变了计划形状（完整扫描和合并连接看起来比昂贵的搜索更具吸引力）显示这种成本差异可能会产生影响，而不仅仅是使比较计划变得更加困难。

4 个回答

Voted

Paul White · Answer 1 · 2016-09-18T00:41:25+08:00

有谁知道为什么这个任务在第一个计划中的成本是 0.172434 而在第二个计划中是 3.01702？

一般而言，嵌套循环连接下的内部搜索在假设随机 I/O 模式的情况下进行计算。后续访问有一个简单的基于替换的减少，考虑到所需页面已经被前一次迭代带入内存的可能性。此基本评估产生标准（较高）成本。

还有另一个成本计算输入，Smart Seek Costing，关于它的细节知之甚少。我的猜测（现阶段仅此而已）是 SSC 试图更详细地评估内部寻道 I/O 成本，可能通过考虑本地排序和/或要获取的值的范围。谁知道。

For example, the first seeking operation brings in not just the requested row, but all rows on that page (in index order). Given the overall access pattern, fetching the 1000 rows in 1000 seeks requires only 2 physical reads, even with read-ahead and prefetching disabled. From that perspective, the default I/O costing represents a significant overestimate, and the SSC-adjusted cost is closer to reality.

It seems reasonable to expect that SSC would be most effective where the loop drives an index seek more or less directly, and the join outer reference is the basis of the seeking operation. From what I can tell, SSC is always attempted for suitable physical operations, but most often produces no downward adjustment when the seek is separated from the join by other operations. Simple filters are one exception to this, perhaps because SQL Server can often push these into the data access operator. In any case, the optimizer has pretty deep support for selections.

It is unfortunate that the Compute Scalar for the subquery outer projections seems to interfere with SSC here. Compute Scalars are usually relocated above the join, but these ones have to stay where they are. Even so, most normal Compute Scalars are pretty transparent to optimization, so this is a bit surprising.

Regardless, when the physical operation PhyOp_Range is produced from a simple selection on an index SelIdxToRng, SSC is effective. When the more complex SelToIdxStrategy (selection on a table to an index strategy) is employed, the resulting PhyOp_Range runs SSC but results in no reduction. Again, it seems that simpler, more direct operations work best with SSC.

I wish I could tell you exactly what SSC does, and show the exact calculations, but I don't know those details. If you want to explore the limited trace output available for yourself, you can employ undocumented trace flag 2398. An example output is:

Smart seek costing (7.1) :: 1.34078e+154 , 0.001

That example relates to memo group 7, alternative 1, showing a cost upper bound, and a factor of 0.001. To see cleaner factors, be sure to rebuild the tables without parallelism so the pages are as dense as possible. Without doing that, the factor is more like 0.000821 for your example Target table. There are some fairly obvious relationships there, of course.

SSC can also be disabled with undocumented trace flag 2399. With that flag active, both costs are the higher value.

Mikael Eriksson · Answer 2 · 2016-09-17T03:44:43+08:00

不确定这是一个答案，但评论有点长。造成差异的原因纯粹是我的猜测，也许可以引起其他人的思考。

使用执行计划简化查询。

SELECT S.KeyCol, 
       S.OtherCol,
       T.*
FROM staging AS S 
  LEFT OUTER JOIN Target AS T 
    ON T.KeyCol = S.KeyCol;

SELECT S.KeyCol, 
       S.OtherCol,
       T.*
FROM staging AS S 
  LEFT OUTER JOIN (
                  SELECT *
                  FROM Target
                  ) AS T 
    ON T.KeyCol = S.KeyCol;

这些实际上可能导致相同执行计划的等效查询之间的主要区别在于计算标量运算符。我不知道为什么它必须在那里，但我想这是优化器可以优化派生表的范围。

我的猜测是，计算标量的存在是造成第二个查询的 IO 成本的原因。

从优化器内部：计划成本核算

第一行的 CPU 成本计算为 0.0001581，后续行的 CPU 成本计算为 0.000011。
...
0.003125 的 I/O 成本正好是 1/320 – 反映了模型假设磁盘子系统每秒可以执行 320 次随机 I/O 操作
...
成本计算组件足够智能，可以识别磁盘子系统的总数需要从磁盘调入的页面永远不能超过存储整个表所需的页面数。

在我的例子中，该表需要 5618 页，要从 1000000 行中获取 1000 行，估计所需的页数为 5.618，IO 成本为 0.015625。

两个查询接缝的 CPU 成本相同，0.0001581 * 1000 executions = 0.1581.

因此，根据上面链接的文章，我们可以计算出第一次查询的成本为 0.173725。

假设我关于计算标量如何使 IO 成本变得一团糟的说法是正确的，它可以计算为 3.2831。

不完全是计划中显示的内容，但它就在附近。

Steven Hibble · Answer 3 · 2016-09-20T11:04:14+08:00

(This would be better as a comment to Paul's answer, but I don't have enough rep yet.)

I wanted to provide the list of trace flags (and a couple DBCC statements) I used to come to a near-conclusion, in case it will be helpful to investigate similar discrepancies in the future. All of these should not be used on production.

First, I had a look at the Final Memo to see what physical operators were being used. They certainly look the same according to the graphical execution plans. So, I used trace flags 3604 and 8615, the first directs output to the client and the second reveals the Final Memo:

SELECT S.*, T.KeyCol
FROM Staging AS S
      LEFT OUTER JOIN Target AS T
       ON T.KeyCol = S.KeyCol
OPTION(QUERYTRACEON 3604, -- Output client info
       QUERYTRACEON 8615, -- Shows Final Memo structure
       RECOMPILE);

Tracing back from the Root Group, I found these nearly identical PhyOp_Range operators:

PhyOp_Range 1 ASC 2.0 Cost(RowGoal 0,ReW 0,ReB 999,Dist 1000,Total 1000)= 0.175559(Distance = 2)
PhyOp_Range 1 ASC 3.0 Cost(RowGoal 0,ReW 0,ReB 999,Dist 1000,Total 1000)= 3.01702(Distance = 2)

The only obvious difference to me was the 2.0 and 3.0, which refer to their respective "memo group 2, original" and "memo group 3, original". Checking the memo, these refer to the same thing - so no differences revealed yet.

Second, I looked into a whole mess of trace flags that proved fruitless to me - but have some interesting content. I lifted most from Benjamin Nevarez. I was looking for clues as to optimization rules that were applied in one case and not the other.

 SELECT S.*, T.KeyCol
 FROM Staging AS S
      LEFT OUTER JOIN Target AS T
        ON T.KeyCol = S.KeyCol
 OPTION (QUERYTRACEON 3604, -- Output info to client
         QUERYTRACEON 2363, -- Show stats and cardinality info
         QUERYTRACEON 8675, -- Show optimization process info
         QUERYTRACEON 8606, -- Show logical query trees
         QUERYTRACEON 8607, -- Show physical query tree
         QUERYTRACEON 2372, -- Show memory utilization info for optimization stages 
         QUERYTRACEON 2373, -- Show memory utilization info for applying rules
         RECOMPILE );

Third, I looked at which rules were applied for our PhyOp_Ranges that look so similar. I used a couple trace flags mentioned by Paul in a blog post.

SELECT S.*, T.KeyCol
FROM Staging AS S
      LEFT OUTER JOIN (SELECT KeyCol
                      FROM Target) AS T
       ON T.KeyCol = S.KeyCol
OPTION (QUERYTRACEON 3604, -- Output info to client
        QUERYTRACEON 8619, -- Show applied optimization rules
        QUERYTRACEON 8620, -- Show rule-to-memo info
        QUERYTRACEON 8621, -- Show resulting tree
        QUERYTRACEON 2398, -- Show "smart seek costing"
        RECOMPILE );

从输出中，我们看到直接JOIN应用此规则来获取我们的PhyOp_Range运算符：Rule Result: group=7 2 <SelIdxToRng>PhyOp_Range 1 ASC 2 (Distance = 2)。subselect 改为应用此规则：Rule Result: group=9 2 <SelToIdxStrategy>PhyOp_Range 1 ASC 3 (Distance = 2)。这也是您看到与每个规则关联的“智能搜索成本核算”信息的地方。对于直接JOIN- 这是输出（对我来说）Smart seek costing (7.2) :: 1.34078e+154 , 0.001：. 对于子选择，这是输出：Smart seek costing (9.2) :: 1.34078e+154 , 1。

最后，我无法得出太多结论 - 但保罗的回答弥补了大部分差距。我想查看有关智能搜索成本核算的更多信息。

Hannah Vernon · Answer 4 · 2016-09-17T05:58:47+08:00

这也不是真正的答案 - 正如 Mikael 指出的那样，很难在评论中讨论这个问题......

有趣的是，如果将子查询(select KeyCol FROM Target)转换为内联 TVF，您会看到计划及其成本与简单的原始查询相同：

CREATE FUNCTION dbo.cs_test()
RETURNS TABLE
WITH SCHEMABINDING
AS 
RETURN (
    SELECT KeyCol FROM dbo.Target
    );

/* "normal" variant */
SELECT S.KeyCol, s.OtherCol, T.KeyCol 
FROM staging AS S 
    LEFT OUTER JOIN Target AS T ON T.KeyCol = S.KeyCol;

/* "subquery" variant */
SELECT S.KeyCol, s.OtherCol, T.KeyCol 
FROM staging AS S 
    LEFT OUTER JOIN (SELECT KeyCol FROM Target) AS T ON T.KeyCol = S.KeyCol;

/* "inline-TVF" variant */
SELECT S.KeyCol, s.OtherCol, T.KeyCol 
FROM staging AS S 
    LEFT OUTER JOIN dbo.cs_test() t ON s.KeyCol = t.Keycol

查询计划（pastetheplan 链接）：

推论使我相信成本核算引擎对这种类型的子查询可能产生的潜在影响感到困惑。

例如，以下内容：

SELECT S.KeyCol, s.OtherCol, T.KeyCol 
FROM staging AS S 
    LEFT OUTER JOIN (
        SELECT KeyCol = CHECKSUM(NEWID()) 
        FROM Target
        ) AS T ON T.KeyCol = S.KeyCol;

你会怎么花这个钱？查询优化器选择一个与上面的“子查询”变体非常相似的计划，包含一个计算标量（pastetheplan.com 链接）：

计算标量与上面显示的“子查询”变体具有完全不同的成本，但是它仍然只是一个猜测，因为查询优化器无法先验地知道返回的行数可能是多少。该计划对左外连接使用散列匹配，因为行估计是不可知的，因此设置为目标表中的行数。

除了我同意 Mikael 在他的回答中所做的工作之外，我没有得出很好的结论，并且希望其他人能提出更好的答案。

为什么在这些计划中，（相同的）1000 次搜索在唯一索引上的估计成本不同？

设置

查询 1 _{“粘贴计划”链接}

查询 2 _{“粘贴计划”链接}

查询 1

查询 2

连接到 PostgreSQL 服务器：致命：主机没有 pg_hba.conf 条目

如何让sqlplus的输出出现在一行中？

选择具有最大日期或最晚日期的日期

如何列出 PostgreSQL 中的所有模式？

列出指定表的所有列

如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

你如何mysqldump特定的表？

使用 psql 列出数据库权限

如何从 PostgreSQL 中的选择查询中将值插入表中？

如何使用 psql 列出所有数据库和表？

为什么在这些计划中，（相同的）1000 次搜索在唯一索引上的估计成本不同？

设置

查询 1 “粘贴计划”链接

查询 2 “粘贴计划”链接

查询 1

查询 2

4 个回答

相关问题

查询 1 _{“粘贴计划”链接}

查询 2 _{“粘贴计划”链接}