SQL Server - 使用聚集索引时如何存储数据页

Question

Paul White

Asked: 2014-12-18 06:33:48 +0800 CST2014-12-18 06:33:48 +0800 CST 2014-12-18 06:33:48 +0800 CST

每组检索 n 行

772

我经常需要从结果集中的每个组中选择一些行。

例如，我可能想列出每个客户最近的“n”个最高或最低的订单值。

在更复杂的情况下，要列出的行数可能因组而异（由分组/父记录的属性定义）。这部分绝对是可选的/额外的功劳，并不是为了阻止人们回答。

在 SQL Server 2005 及更高版本中解决这些类型问题的主要选项是什么？每种方法的主要优点和缺点是什么？

AdventureWorks 示例（为清楚起见，可选）

列出表中最近的五个交易日期和 ID TransactionHistory，每个产品以字母从 M 到 R（含）开头。
再次相同，但n每个产品都有历史行，其中是Product 属性n的五倍。DaysToManufacture
同样，对于每个产品都需要一个历史行的特殊情况（最近的单个条目TransactionDate， tie-break on TransactionID.

6 个回答

Voted

Rob Farley · Answer 1 · 2014-12-23T03:28:40+08:00

Let's start with the basic scenario.

If I want to get some number of rows out of a table, I have two main options: ranking functions; or TOP.

First, let's consider the whole set from Production.TransactionHistory for a particular ProductID:

SELECT h.TransactionID, h.ProductID, h.TransactionDate
FROM Production.TransactionHistory h
WHERE h.ProductID = 800;

This returns 418 rows, and the plan shows that it checks every row in the table looking for this - an unrestricted Clustered Index Scan, with a Predicate to provide the filter. 797 reads here, which is ugly.

使用“残差”谓词进行昂贵的扫描

So let's be fair to it, and create an index that would be more useful. Our conditions call for an equality match on ProductID, followed by a search for the most recent by TransactionDate. We need the TransactionID returned too, so let's go with: CREATE INDEX ix_FindingMostRecent ON Production.TransactionHistory (ProductID, TransactionDate) INCLUDE (TransactionID);.

Having done this, our plan changes significantly, and drops the reads down to just 3. So we're already improving things by over 250x or so...

改进计划

Now that we've levelled the playing field, let's look at the top options - ranking functions and TOP.

WITH Numbered AS
(
SELECT h.TransactionID, h.ProductID, h.TransactionDate, ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) AS RowNum
FROM Production.TransactionHistory h
WHERE h.ProductID = 800
)
SELECT TransactionID, ProductID, TransactionDate
FROM Numbered
WHERE RowNum <= 5;

SELECT TOP (5) h.TransactionID, h.ProductID, h.TransactionDate
FROM Production.TransactionHistory h
WHERE h.ProductID = 800
ORDER BY TransactionDate DESC;

$两个计划 - 基本 TOP\RowNum$

You will notice that the second (TOP) query is much simpler than the first, both in query and in plan. But very significantly, they both use TOP to limit the number of rows actually being pulled out of the index. The costs are only estimates and worth ignoring, but you can see a lot of similarity in the two plans, with the ROW_NUMBER() version doing a tiny amount of extra work to assign numbers and filter accordingly, and both queries end up doing just 2 reads to do their work. The Query Optimizer certainly recognises the idea of filtering on a ROW_NUMBER() field, realising that it can use a Top operator to ignore rows that aren't going to be needed. Both these queries are good enough - TOP isn't so much better that it's worth changing code, but it is simpler and probably clearer for beginners.

So this work across a single product. But we need to consider what happens if we need to do this across multiple products.

The iterative programmer is going to consider the idea of looping through the products of interest, and calling this query multiple times, and we can actually get away with writing a query in this form - not using cursors, but using APPLY. I'm using OUTER APPLY, figuring that we might want to return the Product with NULL, if there are no Transactions for it.

SELECT p.Name, p.ProductID, t.TransactionID, t.TransactionDate
FROM 
Production.Product p
OUTER APPLY (
    SELECT TOP (5) h.TransactionID, h.ProductID, h.TransactionDate
    FROM Production.TransactionHistory h
    WHERE h.ProductID = p.ProductID
    ORDER BY TransactionDate DESC
) t
WHERE p.Name >= 'M' AND p.Name < 'S';

The plan for this is the iterative programmers' method - Nested Loop, doing a Top operation and Seek (those 2 reads we had before) for each Product. This gives 4 reads against Product, and 360 against TransactionHistory.

申请计划

Using ROW_NUMBER(), the method is to use PARTITION BY in the OVER clause, so that we restart the numbering for each Product. This can then be filtered like before. The plan ends up being quite different. The logical reads are about 15% lower on TransactionHistory, with a full Index Scan going on to get the rows out.

WITH Numbered AS
(
SELECT p.Name, p.ProductID, h.TransactionID, h.TransactionDate, ROW_NUMBER() OVER (PARTITION BY h.ProductID ORDER BY h.TransactionDate DESC) AS RowNum
FROM Production.Product p
LEFT JOIN Production.TransactionHistory h ON h.ProductID = p.ProductID
WHERE p.Name >= 'M' AND p.Name < 'S'
)
SELECT Name, ProductID, TransactionID, TransactionDate
FROM Numbered n
WHERE RowNum <= 5;

ROW_NUMBER 计划

Significantly, though, this plan has an expensive Sort operator. The Merge Join doesn't seem to maintain the order of rows in TransactionHistory, the data must be resorted to be able to find the rownumbers. It's fewer reads, but this blocking Sort could feel painful. Using APPLY, the Nested Loop will return the first rows very quickly, after just a few reads, but with a Sort, ROW_NUMBER() will only return rows after a most of the work has been finished.

Interestingly, if the ROW_NUMBER() query uses INNER JOIN instead of LEFT JOIN, then a different plan comes up.

ROW_NUMBER() 与 INNER JOIN

This plan uses a Nested Loop, just like with APPLY. But there's no Top operator, so it pulls all the transactions for each product, and uses a lot more reads than before - 492 reads against TransactionHistory. There isn't a good reason for it not to choose the Merge Join option here, so I guess the plan was considered 'Good Enough'. Still - it doesn't block, which is nice - just not as nice as APPLY.

The PARTITION BY column that I used for ROW_NUMBER() was h.ProductID in both cases, because I had wanted to give the QO the option of producing the RowNum value before joining to the Product table. If I use p.ProductID, we see the same shape plan as with the INNER JOIN variation.

WITH Numbered AS
(
SELECT p.Name, p.ProductID, h.TransactionID, h.TransactionDate, ROW_NUMBER() OVER (PARTITION BY p.ProductID ORDER BY h.TransactionDate DESC) AS RowNum
FROM Production.Product p
LEFT JOIN Production.TransactionHistory h ON h.ProductID = p.ProductID
WHERE p.Name >= 'M' AND p.Name < 'S'
)
SELECT Name, ProductID, TransactionID, TransactionDate
FROM Numbered n
WHERE RowNum <= 5;

But the Join operator says 'Left Outer Join' instead of 'Inner Join'. The number of reads is still just under 500 reads against the TransactionHistory table.

在 p.ProductID 而不是 h.ProductID 上的 PARTITION BY

Anyway - back to the question at hand...

We've answered question 1, with two options that you could pick and choose from. Personally, I like the APPLY option.

To extend this to use a variable number (question 2), the 5 just needs to be changed accordingly. Oh, and I added another index, so that there was an index on Production.Product.Name that included the DaysToManufacture column.

WITH Numbered AS
(
SELECT p.Name, p.ProductID, p.DaysToManufacture, h.TransactionID, h.TransactionDate, ROW_NUMBER() OVER (PARTITION BY h.ProductID ORDER BY h.TransactionDate DESC) AS RowNum
FROM Production.Product p
LEFT JOIN Production.TransactionHistory h ON h.ProductID = p.ProductID
WHERE p.Name >= 'M' AND p.Name < 'S'
)
SELECT Name, ProductID, TransactionID, TransactionDate
FROM Numbered n
WHERE RowNum <= 5 * DaysToManufacture;

SELECT p.Name, p.ProductID, t.TransactionID, t.TransactionDate
FROM 
Production.Product p
OUTER APPLY (
    SELECT TOP (5 * p.DaysToManufacture) h.TransactionID, h.ProductID, h.TransactionDate
    FROM Production.TransactionHistory h
    WHERE h.ProductID = p.ProductID
    ORDER BY TransactionDate DESC
) t
WHERE p.Name >= 'M' AND p.Name < 'S';

And both plans are almost identical to what they were before!

可变行

Again, ignore the estimated costs - but I still like the TOP scenario, as it is so much more simple, and the plan has no blocking operator. The reads are less on TransactionHistory because of the high number of zeroes in DaysToManufacture, but in real life, I doubt we'd be picking that column. ;)

One way to avoid the block is to come up with a plan that handles the ROW_NUMBER() bit to the right (in the plan) of the join. We can persuade this to happen by doing the join outside the CTE.

WITH Numbered AS
(
SELECT h.TransactionID, h.ProductID, h.TransactionDate, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate DESC) AS RowNum
FROM Production.TransactionHistory h
)
SELECT p.Name, p.ProductID, t.TransactionID, t.TransactionDate
FROM Production.Product p
LEFT JOIN Numbered t ON t.ProductID = p.ProductID
    AND t.RowNum <= 5 * p.DaysToManufacture
WHERE p.Name >= 'M' AND p.Name < 'S';

The plan here looks simpler - it's not blocking, but there's a hidden danger.

加入 CTE 之外

Notice the Compute Scalar that's pulling data from the Product table. This is working out the 5 * p.DaysToManufacture value. This value isn't being passed into the branch that's pulling data from the TransactionHistory table, it's being used in the Merge Join. As a Residual.

偷偷残留！

So the Merge Join is consuming ALL the rows, not just the first however-many-are-needed, but all of them and then doing a residual check. This is dangerous as the number of transactions increases. I'm not a fan of this scenario - residual predicates in Merge Joins can quickly escalate. Another reason why I prefer the APPLY/TOP scenario.

In the special case where it's exactly one row, for question 3, we can obviously use the same queries, but with 1 instead of 5. But then we have an extra option, which is to use regular aggregates.

SELECT ProductID, MAX(TransactionDate)
FROM Production.TransactionHistory
GROUP BY ProductID;

A query like this would be a useful start, and we could easily modify it to pull out the TransactionID as well for tie-break purposes (using a concatenation which would then be broken down), but we either look at the whole index, or we dive in product by product, and we don't really get a big improvement on what we had before in this scenario.

But I should point out that we're looking at a particular scenario here. With real data, and with an indexing strategy that may not be ideal, mileage may vary considerably. Despite the fact that we've seen that APPLY is strong here, it can be slower in some situations. It rarely blocks though, as it has a tendency to use Nested Loops, which many people (myself included) find very appealing.

I haven't tried to explore parallelism here, or dived very hard into question 3, which I see as a special case that people rarely want based on the complication of concatenating and splitting. The main thing to consider here is that these two options are both very strong.

I prefer APPLY. It's clear, it uses the Top operator well, and it rarely causes blocking.

Aaron Bertrand · Answer 2 · 2014-12-18T06:55:41+08:00

在 SQL Server 2005 及更高版本中执行此操作的典型方法是使用 CTE 和窗口函数。对于每个组的前 n 个，您可以简单地使用ROW_NUMBER()一个PARTITION子句，并在外部查询中对其进行过滤。因此，例如，每个客户的前 5 个最近订单可以这样显示：

DECLARE @top INT;
SET @top = 5;

;WITH grp AS 
(
   SELECT CustomerID, OrderID, OrderDate,
     rn = ROW_NUMBER() OVER
     (PARTITION BY CustomerID ORDER BY OrderDate DESC)
   FROM dbo.Orders
)
SELECT CustomerID, OrderID, OrderDate
  FROM grp
  WHERE rn <= @top
  ORDER BY CustomerID, OrderDate DESC;

你也可以这样做CROSS APPLY：

DECLARE @top INT;
SET @top = 5;

SELECT c.CustomerID, o.OrderID, o.OrderDate
FROM dbo.Customers AS c
CROSS APPLY 
(
    SELECT TOP (@top) OrderID, OrderDate 
    FROM dbo.Orders AS o
    WHERE CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) AS o
ORDER BY c.CustomerID, o.OrderDate DESC;

使用 Paul 指定的附加选项，假设 Customers 表有一列指示每个客户要包含多少行：

;WITH grp AS 
(
   SELECT CustomerID, OrderID, OrderDate,
     rn = ROW_NUMBER() OVER
     (PARTITION BY CustomerID ORDER BY OrderDate DESC)
   FROM dbo.Orders
)
SELECT c.CustomerID, grp.OrderID, grp.OrderDate
  FROM grp 
  INNER JOIN dbo.Customers AS c
  ON grp.CustomerID = c.CustomerID
  AND grp.rn <= c.Number_of_Recent_Orders_to_Show
  ORDER BY c.CustomerID, grp.OrderDate DESC;

再次，使用CROSS APPLY并合并添加的选项，即客户的行数由客户表中的某些列决定：

SELECT c.CustomerID, o.OrderID, o.OrderDate
FROM dbo.Customers AS c
CROSS APPLY 
(
    SELECT TOP (c.Number_of_Recent_Orders_to_Show) OrderID, OrderDate 
    FROM dbo.Orders AS o
    WHERE CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) AS o
ORDER BY c.CustomerID, o.OrderDate DESC;

请注意，这些将根据数据分布和支持索引的可用性而有所不同，因此优化性能和获得最佳计划实际上取决于本地因素。

就个人而言，我更喜欢 CTE 和窗口解决方案而不是CROSS APPLY/TOP因为它们更好地分离逻辑并且更直观（对我而言）。一般来说（在这种情况下和我的一般经验中），CTE 方法会产生更有效的计划（下面的示例），但这不应被视为普遍真理 - 您应该始终测试您的场景，特别是如果索引已更改或数据出现明显偏差。

AdventureWorks 示例 - 没有任何更改

列出表中最近的五个交易日期和 ID TransactionHistory，每个产品以字母从 M 到 R（含）开头。

-- CTE / OVER()

;WITH History AS
(
  SELECT p.ProductID, p.Name, t.TransactionID, t.TransactionDate,
    rn = ROW_NUMBER() OVER 
    (PARTITION BY t.ProductID ORDER BY t.TransactionDate DESC)
  FROM Production.Product AS p
  INNER JOIN Production.TransactionHistory AS t
  ON p.ProductID = t.ProductID
  WHERE p.Name >= N'M' AND p.Name < N'S'
)
SELECT ProductID, Name, TransactionID, TransactionDate
FROM History 
WHERE rn <= 5;

-- CROSS APPLY

SELECT p.ProductID, p.Name, t.TransactionID, t.TransactionDate
FROM Production.Product AS p
CROSS APPLY
(
  SELECT TOP (5) TransactionID, TransactionDate
  FROM Production.TransactionHistory
  WHERE ProductID = p.ProductID
  ORDER BY TransactionDate DESC
) AS t
WHERE p.Name >= N'M' AND p.Name < N'S';

这两者在运行时指标上的比较：

在此处输入图像描述

CTE/OVER()计划：

在此处输入图像描述

CROSS APPLY计划：

在此处输入图像描述

CTE 计划看起来更复杂，但实际上效率更高。很少关注估计的成本百分比数字，而是关注更重要的实际观察，例如更少的读取和更短的持续时间。我也在没有并行性的情况下运行这些，这没有区别。运行时指标和 CTE 计划（CROSS APPLY计划保持不变）：

在此处输入图像描述

再次相同，但n每个产品都有历史行，其中是Product 属性n的五倍。DaysToManufacture

这里需要非常小的更改。对于 CTE，我们可以在内部查询中添加一列，并在外部查询上进行过滤；对于CROSS APPLY，我们可以在相关性里面进行计算TOP。你会认为这会给CROSS APPLY解决方案带来一些效率，但在这种情况下不会发生这种情况。查询：

-- CTE / OVER()

;WITH History AS
(
  SELECT p.ProductID, p.Name, p.DaysToManufacture, t.TransactionID, t.TransactionDate,
    rn = ROW_NUMBER() OVER 
    (PARTITION BY t.ProductID ORDER BY t.TransactionDate DESC)
  FROM Production.Product AS p
  INNER JOIN Production.TransactionHistory AS t
  ON p.ProductID = t.ProductID
  WHERE p.Name >= N'M' AND p.Name < N'S'
)
SELECT ProductID, Name, TransactionID, TransactionDate
FROM History 
WHERE rn <= (5 * DaysToManufacture);

-- CROSS APPLY

SELECT p.ProductID, p.Name, t.TransactionID, t.TransactionDate
FROM Production.Product AS p
CROSS APPLY
(
  SELECT TOP (5 * p.DaysToManufacture) TransactionID, TransactionDate
  FROM Production.TransactionHistory
  WHERE ProductID = p.ProductID
  ORDER BY TransactionDate DESC
) AS t
WHERE p.Name >= N'M' AND p.Name < N'S';

运行时结果：

在此处输入图像描述

并行 CTE/OVER()计划：

在此处输入图像描述

单线程 CTE/OVER()计划：

在此处输入图像描述

CROSS APPLY计划：

在此处输入图像描述

同样，对于每个产品都需要一个历史行的特殊情况（最近的单个条目TransactionDate， tie-break on TransactionID.

再次，这里的小改动。在 CTE 解决方案中，我们添加TransactionID到OVER()子句，并将外部过滤器更改为rn = 1. 对于CROSS APPLY，我们将其更改TOP为TOP (1)，并添加TransactionID到内部ORDER BY。

-- CTE / OVER()

;WITH History AS
(
  SELECT p.ProductID, p.Name, t.TransactionID, t.TransactionDate,
    rn = ROW_NUMBER() OVER 
    (PARTITION BY t.ProductID ORDER BY t.TransactionDate DESC, TransactionID DESC)
  FROM Production.Product AS p
  INNER JOIN Production.TransactionHistory AS t
  ON p.ProductID = t.ProductID
  WHERE p.Name >= N'M' AND p.Name < N'S'
)
SELECT ProductID, Name, TransactionID, TransactionDate
FROM History 
WHERE rn = 1;

-- CROSS APPLY

SELECT p.ProductID, p.Name, t.TransactionID, t.TransactionDate
FROM Production.Product AS p
CROSS APPLY
(
  SELECT TOP (1) TransactionID, TransactionDate
  FROM Production.TransactionHistory
  WHERE ProductID = p.ProductID
  ORDER BY TransactionDate DESC, TransactionID DESC
) AS t
WHERE p.Name >= N'M' AND p.Name < N'S';

运行时结果：

在此处输入图像描述

并行 CTE/OVER()计划：

在此处输入图像描述

单线程 CTE / OVER() 计划：

在此处输入图像描述

CROSS APPLY计划：

在此处输入图像描述

开窗函数并不总是最好的选择（试一试COUNT(*) OVER()），这并不是解决每组 n 行问题的唯一两种方法，但在这种特定情况下 - 给定架构、现有索引和数据分布 -在所有有意义的账户中，CTE 的表现都更好。

AdventureWorks 示例 - 可以灵活地添加索引

但是，如果您添加一个支持索引，类似于Paul 在评论中提到的索引，但第 2 列和第 3 列是有序的DESC：

CREATE UNIQUE NONCLUSTERED INDEX UQ3 ON Production.TransactionHistory 
  (ProductID, TransactionDate DESC, TransactionID DESC);

实际上，您会得到更有利的计划，并且CROSS APPLY在所有三种情况下，指标都会翻转以支持该方法：

在此处输入图像描述

如果这是我的生产环境，我可能会对这种情况下的持续时间感到满意，并且不会费心进一步优化。

这在不支持APPLYorOVER()子句的 SQL Server 2000 中更加丑陋。

ypercubeᵀᴹ · Answer 3 · 2014-12-18T07:47:27+08:00

In DBMS, like MySQL, that do not have window functions or CROSS APPLY, the way to do this would be to use standard SQL (89). The slow way would be a triangular cross join with aggregate. The faster way (but still and probably not as efficient as using cross apply or the row_number function) would be what I call the "poor man's CROSS APPLY". It would be interesting to compare this query with the others:

Assumption: Orders (CustomerID, OrderDate) has a UNIQUE constraint:

DECLARE @top INT;
SET @top = 5;

SELECT o.CustomerID, o.OrderID, o.OrderDate
  FROM dbo.Customers AS c
    JOIN dbo.Orders AS o
      ON  o.CustomerID = c.CustomerID
      AND o.OrderID IN
          ( SELECT TOP (@top) oi.OrderID
            FROM dbo.Orders AS oi
            WHERE oi.CustomerID = c.CustomerID
            ORDER BY oi.OrderDate DESC
          )
  ORDER BY CustomerID, OrderDate DESC ;

For the extra problem of customized top rows per group:

SELECT o.CustomerID, o.OrderID, o.OrderDate
  FROM dbo.Customers AS c
    JOIN dbo.Orders AS o
      ON  o.CustomerID = c.CustomerID
      AND o.OrderID IN
          ( SELECT TOP (c.Number_of_Recent_Orders_to_Show) oi.OrderID
            FROM dbo.Orders AS oi
            WHERE oi.CustomerID = c.CustomerID
            ORDER BY oi.OrderDate DESC
          )
  ORDER BY CustomerID, OrderDate DESC ;

Note: In MySQL, instead of AND o.OrderID IN (SELECT TOP(@top) oi.OrderID ...) one would use AND o.OrderDate >= (SELECT oi.OrderDate ... LIMIT 1 OFFSET (@top - 1)). SQL-Server added FETCH / OFFSET syntax in 2012 version. The queries here were adjusted with IN (TOP...) to work with earlier versions.

Solomon Rutzky · Answer 4 · 2014-12-26T01:39:48+08:00

I took a slightly different approach, mainly to see how this technique would compare to the others, because having options is good, right?

The Testing

Why don't we start by just looking at how the various methods stacked up against each other. I did three sets of tests:

The first set ran with no DB modifications
The second set ran after an index was created to support TransactionDate-based queries against Production.TransactionHistory.
The third set made a slightly different assumption. Since all three tests ran against the same list of Products, what if we cached that list? My method uses an in-memory cache while the other methods used an equivalent temp table. The supporting index created for the second set of tests still exists for this set of tests.

Additional test details:

The tests were run against AdventureWorks2012 on SQL Server 2012, SP2 (Developer Edition).
For each test I labeled whose answer I took the query from and which particular query it was.
I used the "Discard results after execution" option of Query Options | Results.
Please note that for the first two sets of tests, the RowCounts appear to be "off" for my method. This is due my method being a manual implementation of what CROSS APPLY is doing: it runs the initial query against Production.Product and gets 161 rows back, which it then uses for the queries against Production.TransactionHistory. Hence, the RowCount values for my entries are always 161 more than the other entries. In the third set of tests (with caching) the row counts are the same for all methods.
I used SQL Server Profiler to capture the stats instead of relying on the execution plans. Aaron and Mikael already did a great job showing the plans for their queries and there is no need to reproduce that information. And the intent of my method is to reduce the queries to such a simple form that it wouldn't really matter. There is an additional reason for using Profiler, but that will be mentioned later.
Rather than using the Name >= N'M' AND Name < N'S' construct, I chose to use Name LIKE N'[M-R]%', and SQL Server treats them the same.

The Results

No Supporting Index

This is essentially out-of-the-box AdventureWorks2012. In all cases my method is clearly better than some of the other, but never as good as the top 1 or 2 methods.

Test 1 测试 1 结果-无索引
Aaron's CTE is clearly the winner here.

Test 2 测试 2 结果 - 无索引
Aaron's CTE (again) and Mikael's second apply row_number() method is a close second.

Test 3 测试 3 结果-无索引
Aaron's CTE (again) is the winner.

Conclusion
When there is no supporting index on TransactionDate, my method is better than doing a standard CROSS APPLY, but still, using the CTE method is clearly the way to go.

With Supporting Index (no Caching)

For this set of tests I added the obvious index on TransactionHistory.TransactionDate since all of the queries sort on that field. I say "obvious" since most other answers also agree on this point. And since the queries are all wanting the most recent dates, the TransactionDate field should be ordered DESC, so I just grabbed the CREATE INDEX statement at the bottom of Mikael's answer and added an explicit FILLFACTOR:

CREATE INDEX [IX_TransactionHistoryX]
    ON Production.TransactionHistory (ProductID ASC, TransactionDate DESC)
    WITH (FILLFACTOR = 100);

Once this index is in place, the results change quite a bit.

Test 1 测试 1 结果-带支持指数
This time it is my method that comes out ahead, at least in terms of Logical Reads. The CROSS APPLY method, previously the worst performer for Test 1, wins on Duration and even beats the CTE method on Logical Reads.

Test 2 Test 2 Results-with supporting index
This time it is Mikael's first apply row_number() method that is the winner when looking at Reads, whereas previously it was one of the worst performers. And now my method comes in at a very close second place when looking at Reads. In fact, outside of the CTE method, the rest are all fairly close in terms of Reads.

Test 3 Test 3 Results-with supporting index
Here the CTE is still the winner, but now the difference between the other methods is barely noticeable compared to the drastic difference that existed prior to creating the index.

Conclusion
The applicability of my method is more apparent now, though it is less resilient to not having proper indexes in place.

With Supporting Index AND Caching

For this set of tests I made use of caching because, well, why not? My method allows for using in-memory caching that the other methods cannot access. So to be fair, I created the following temp table that was used in place of Product.Product for all references in those other methods across all three tests. The DaysToManufacture field is only used in Test Number 2, but it was easier to be consistent across the SQL scripts to use the same table and it didn't hurt to have it there.

CREATE TABLE #Products
(
    ProductID INT NOT NULL PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL,
    DaysToManufacture INT NOT NULL
);

INSERT INTO #Products (ProductID, Name, DaysToManufacture)
    SELECT  p.ProductID, p.Name, p.DaysToManufacture
    FROM    Production.Product p
    WHERE   p.Name >= N'M' AND p.Name < N'S'
    AND    EXISTS (
                    SELECT  *
                    FROM    Production.TransactionHistory th
                    WHERE   th.ProductID = p.ProductID
                );

ALTER TABLE #Products REBUILD WITH (FILLFACTOR = 100);

Test 1 Test 1 Results-with supporting index AND caching
All methods seem to benefit equally from caching, and my method still comes out ahead.

Test 2 Test 2 Results-with supporting index AND caching
Here we now see a difference in the lineup as my method comes out barely ahead, only 2 Reads better than Mikael's first apply row_number() method, whereas without the caching my method was behind by 4 Reads.

Test 3 Test 3 Results-with supporting index AND caching
Please see update towards the bottom (below the line). Here we again see some difference. The "parameterized" flavor of my method is now barely in the lead by 2 Reads compared to Aaron's CROSS APPLY method (with no caching they were equal). But the really strange thing is that for the first time we see a method that is negatively affected by the caching: Aaron's CTE method (which was previously the best for Test Number 3). But, I am not going to take credit where it is not due, and since without the caching Aaron's CTE method is still faster than my method is here with the caching, the best approach for this particular situation appears to be Aaron's CTE method.

Conclusion Please see update towards the bottom (below the line)
Situations that make repeated use of the results of a secondary query can often (but not always) benefit from caching those results. But when caching is a benefit, using memory for said caching has some advantage over using temporary tables.

The Method

Generally

I separated the "header" query (i.e. getting the ProductIDs, and in one case also the DaysToManufacture, based on the Name starting with certain letters) from the "detail" queries (i.e. getting the TransactionIDs and TransactionDates). The concept was to perform very simple queries and not allow the optimizer to get confused when JOINing them. Clearly this is not always advantageous as it also disallows the optimizer from, well, optimizing. But as we saw in the results, depending on the type of query, this method does have its merits.

The difference between the various flavors of this method are:

Constants: Submit any replaceable values as inline constants instead of being parameters. This would refer to ProductID in all three tests and also the number of rows to return in Test 2 as that is a function of "five times the DaysToManufacture Product attribute". This sub-method means that each ProductID will get its own execution plan, which can be beneficial if there is a wide variation in data distribution for ProductID. But if there is little variation in the data distribution, the cost of generating the additional plans will likely not be worth it.
Parameterized: Submit at least ProductID as @ProductID, allowing for execution plan caching and reuse. There is an additional test option to also treat the variable number of rows to return for Test 2 as a parameter.
Optimize Unknown: When referencing ProductID as @ProductID, if there is wide variation of data distribution then it is possible to cache a plan that has a negative effect on other ProductID values so it would be good to know if using this Query Hint helps any.
Cache Products: Rather than querying the Production.Product table each time, only to get the exact same list, run the query once (and while we are at it, filter out any ProductIDs that aren't even in the TransactionHistory table so we don't waste any resources there) and cache that list. The list should include the DaysToManufacture field. Using this option there is a slightly higher initial hit on Logical Reads for the first execution, but after that it is only the TransactionHistory table that is queried.

Specifically

Ok, but so, um, how is it possible to issue all of the sub-queries as separate queries without using a CURSOR and dumping each result set to a temporary table or table variable? Clearly doing the CURSOR / Temp Table method would reflect quite obviously in the Reads and Writes. Well, by using SQLCLR :). By creating a SQLCLR stored procedure, I was able to open a result set and essentially stream the results of each sub-query to it, as a continuous result set (and not multiple result sets). Outside of the Product information (i.e. ProductID, Name, and DaysToManufacture), none of the sub-query results had to be stored anywhere (memory or disk) and just got passed through as the main result set of the SQLCLR stored procedure. This allowed me to do a simple query to get the Product info and then cycle through it, issuing very simple queries against TransactionHistory.

And, this is why I had to use SQL Server Profiler to capture the statistics. The SQLCLR stored procedure did not return an execution plan, either by setting the "Include Actual Execution Plan" Query Option, or by issuing SET STATISTICS XML ON;.

For the Product Info caching, I used a readonly static Generic List (i.e. _GlobalProducts in the code below). It seems that adding to collections does not violate the readonly option, hence this code works when the assembly has a PERMISSON_SET of SAFE :), even if that is counter-intuitive.

The Generated Queries

The queries produced by this SQLCLR stored procedure are as follows:

Product Info

Test Numbers 1 and 3 (no Caching)

SELECT prod1.ProductID, prod1.Name, 1 AS [DaysToManufacture]
FROM   Production.Product prod1
WHERE  prod1.Name LIKE N'[M-R]%';

Test Number 2 (no Caching)

;WITH cte AS
(
    SELECT prod1.ProductID
    FROM   Production.Product prod1 WITH (INDEX(AK_Product_Name))
    WHERE  prod1.Name LIKE N'[M-R]%'
)
SELECT prod2.ProductID, prod2.Name, prod2.DaysToManufacture
FROM   Production.Product prod2
INNER JOIN cte
        ON cte.ProductID = prod2.ProductID;

Test Numbers 1, 2, and 3 (Caching)

;WITH cte AS
(
    SELECT prod1.ProductID
    FROM   Production.Product prod1 WITH (INDEX(AK_Product_Name))
    WHERE  prod1.Name LIKE N'[M-R]%'
    AND    EXISTS (
                SELECT *
                FROM Production.TransactionHistory th
                WHERE th.ProductID = prod1.ProductID
                  )
)
SELECT prod2.ProductID, prod2.Name, prod2.DaysToManufacture
FROM   Production.Product prod2
INNER JOIN cte
        ON cte.ProductID = prod2.ProductID;

Transaction Info

Test Numbers 1 and 2 (Constants)

SELECT TOP (5) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = 977
ORDER BY th.TransactionDate DESC;

Test Numbers 1 and 2 (Parameterized)

SELECT TOP (5) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC
;

Test Numbers 1 and 2 (Parameterized + OPTIMIZE UNKNOWN)

SELECT TOP (5) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC
OPTION (OPTIMIZE FOR (@ProductID UNKNOWN));

Test Number 2 (Parameterized Both)

SELECT TOP (@RowsToReturn) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC
;

Test Number 2 (Parameterized Both + OPTIMIZE UNKNOWN)

SELECT TOP (@RowsToReturn) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC
OPTION (OPTIMIZE FOR (@ProductID UNKNOWN));

Test Number 3 (Constants)

SELECT TOP (1) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = 977
ORDER BY th.TransactionDate DESC, th.TransactionID DESC;

Test Number 3 (Parameterized)

SELECT TOP (1) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC, th.TransactionID DESC
;

Test Number 3 (Parameterized + OPTIMIZE UNKNOWN)

SELECT TOP (1) th.TransactionID, th.TransactionDate
FROM   Production.TransactionHistory th
WHERE  th.ProductID = @ProductID
ORDER BY th.TransactionDate DESC, th.TransactionID DESC
OPTION (OPTIMIZE FOR (@ProductID UNKNOWN));

The Code

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class ObligatoryClassName
{
    private class ProductInfo
    {
        public int ProductID;
        public string Name;
        public int DaysToManufacture;

        public ProductInfo(int ProductID, string Name, int DaysToManufacture)
        {
            this.ProductID = ProductID;
            this.Name = Name;
            this.DaysToManufacture = DaysToManufacture;

            return;
        }
    }

    private static readonly List<ProductInfo> _GlobalProducts = new List<ProductInfo>();

    private static void PopulateGlobalProducts(SqlBoolean PrintQuery)
    {
        if (_GlobalProducts.Count > 0)
        {
            if (PrintQuery.IsTrue)
            {
                SqlContext.Pipe.Send(String.Concat("I already haz ", _GlobalProducts.Count,
                            " entries :)"));
            }

            return;
        }

        SqlConnection _Connection = new SqlConnection("Context Connection = true;");
        SqlCommand _Command = new SqlCommand();
        _Command.CommandType = CommandType.Text;
        _Command.Connection = _Connection;
        _Command.CommandText = @"
   ;WITH cte AS
   (
     SELECT prod1.ProductID
     FROM   Production.Product prod1 WITH (INDEX(AK_Product_Name))
     WHERE  prod1.Name LIKE N'[M-R]%'
     AND    EXISTS (
                     SELECT *
                     FROM Production.TransactionHistory th
                     WHERE th.ProductID = prod1.ProductID
                   )
   )
   SELECT prod2.ProductID, prod2.Name, prod2.DaysToManufacture
   FROM   Production.Product prod2
   INNER JOIN cte
           ON cte.ProductID = prod2.ProductID;
";

        SqlDataReader _Reader = null;

        try
        {
            _Connection.Open();

            _Reader = _Command.ExecuteReader();

            while (_Reader.Read())
            {
                _GlobalProducts.Add(new ProductInfo(_Reader.GetInt32(0), _Reader.GetString(1),
                                                    _Reader.GetInt32(2)));
            }
        }
        catch
        {
            throw;
        }
        finally
        {
            if (_Reader != null && !_Reader.IsClosed)
            {
                _Reader.Close();
            }

            if (_Connection != null && _Connection.State != ConnectionState.Closed)
            {
                _Connection.Close();
            }

            if (PrintQuery.IsTrue)
            {
                SqlContext.Pipe.Send(_Command.CommandText);
            }
        }

        return;
    }


    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void GetTopRowsPerGroup(SqlByte TestNumber,
        SqlByte ParameterizeProductID, SqlBoolean OptimizeForUnknown,
        SqlBoolean UseSequentialAccess, SqlBoolean CacheProducts, SqlBoolean PrintQueries)
    {
        SqlConnection _Connection = new SqlConnection("Context Connection = true;");
        SqlCommand _Command = new SqlCommand();
        _Command.CommandType = CommandType.Text;
        _Command.Connection = _Connection;

        List<ProductInfo> _Products = null;
        SqlDataReader _Reader = null;

        int _RowsToGet = 5; // default value is for Test Number 1
        string _OrderByTransactionID = "";
        string _OptimizeForUnknown = "";
        CommandBehavior _CmdBehavior = CommandBehavior.Default;

        if (OptimizeForUnknown.IsTrue)
        {
            _OptimizeForUnknown = "OPTION (OPTIMIZE FOR (@ProductID UNKNOWN))";
        }

        if (UseSequentialAccess.IsTrue)
        {
            _CmdBehavior = CommandBehavior.SequentialAccess;
        }

        if (CacheProducts.IsTrue)
        {
            PopulateGlobalProducts(PrintQueries);
        }
        else
        {
            _Products = new List<ProductInfo>();
        }


        if (TestNumber.Value == 2)
        {
            _Command.CommandText = @"
   ;WITH cte AS
   (
     SELECT prod1.ProductID
     FROM   Production.Product prod1 WITH (INDEX(AK_Product_Name))
     WHERE  prod1.Name LIKE N'[M-R]%'
   )
   SELECT prod2.ProductID, prod2.Name, prod2.DaysToManufacture
   FROM   Production.Product prod2
   INNER JOIN cte
           ON cte.ProductID = prod2.ProductID;
";
        }
        else
        {
            _Command.CommandText = @"
     SELECT prod1.ProductID, prod1.Name, 1 AS [DaysToManufacture]
     FROM   Production.Product prod1
     WHERE  prod1.Name LIKE N'[M-R]%';
";
            if (TestNumber.Value == 3)
            {
                _RowsToGet = 1;
                _OrderByTransactionID = ", th.TransactionID DESC";
            }
        }

        try
        {
            _Connection.Open();

            // Populate Product list for this run if not using the Product Cache
            if (!CacheProducts.IsTrue)
            {
                _Reader = _Command.ExecuteReader(_CmdBehavior);

                while (_Reader.Read())
                {
                    _Products.Add(new ProductInfo(_Reader.GetInt32(0), _Reader.GetString(1),
                                                  _Reader.GetInt32(2)));
                }

                _Reader.Close();

                if (PrintQueries.IsTrue)
                {
                    SqlContext.Pipe.Send(_Command.CommandText);
                }
            }
            else
            {
                _Products = _GlobalProducts;
            }

            SqlDataRecord _ResultRow = new SqlDataRecord(
                new SqlMetaData[]{
                    new SqlMetaData("ProductID", SqlDbType.Int),
                    new SqlMetaData("Name", SqlDbType.NVarChar, 50),
                    new SqlMetaData("TransactionID", SqlDbType.Int),
                    new SqlMetaData("TransactionDate", SqlDbType.DateTime)
                });

            SqlParameter _ProductID = new SqlParameter("@ProductID", SqlDbType.Int);
            _Command.Parameters.Add(_ProductID);
            SqlParameter _RowsToReturn = new SqlParameter("@RowsToReturn", SqlDbType.Int);
            _Command.Parameters.Add(_RowsToReturn);

            SqlContext.Pipe.SendResultsStart(_ResultRow);

            for (int _Row = 0; _Row < _Products.Count; _Row++)
            {
                // Tests 1 and 3 use previously set static values for _RowsToGet
                if (TestNumber.Value == 2)
                {
                    if (_Products[_Row].DaysToManufacture == 0)
                    {
                        continue; // no use in issuing SELECT TOP (0) query
                    }

                    _RowsToGet = (5 * _Products[_Row].DaysToManufacture);
                }

                _ResultRow.SetInt32(0, _Products[_Row].ProductID);
                _ResultRow.SetString(1, _Products[_Row].Name);

                switch (ParameterizeProductID.Value)
                {
                    case 0x01:
                        _Command.CommandText = String.Format(@"
   SELECT TOP ({0}) th.TransactionID, th.TransactionDate
   FROM   Production.TransactionHistory th
   WHERE  th.ProductID = @ProductID
   ORDER BY th.TransactionDate DESC{2}
   {1};
", _RowsToGet, _OptimizeForUnknown, _OrderByTransactionID);

                        _ProductID.Value = _Products[_Row].ProductID;
                        break;
                    case 0x02:
                        _Command.CommandText = String.Format(@"
   SELECT TOP (@RowsToReturn) th.TransactionID, th.TransactionDate
   FROM   Production.TransactionHistory th
   WHERE  th.ProductID = @ProductID
   ORDER BY th.TransactionDate DESC
   {0};
", _OptimizeForUnknown);

                        _ProductID.Value = _Products[_Row].ProductID;
                        _RowsToReturn.Value = _RowsToGet;
                        break;
                    default:
                        _Command.CommandText = String.Format(@"
   SELECT TOP ({0}) th.TransactionID, th.TransactionDate
   FROM   Production.TransactionHistory th
   WHERE  th.ProductID = {1}
   ORDER BY th.TransactionDate DESC{2};
", _RowsToGet, _Products[_Row].ProductID, _OrderByTransactionID);
                        break;
                }


                _Reader = _Command.ExecuteReader(_CmdBehavior);

                while (_Reader.Read())
                {
                    _ResultRow.SetInt32(2, _Reader.GetInt32(0));
                    _ResultRow.SetDateTime(3, _Reader.GetDateTime(1));

                    SqlContext.Pipe.SendResultsRow(_ResultRow);
                }
                _Reader.Close();
            }

        }
        catch
        {
            throw;
        }
        finally
        {
            if (SqlContext.Pipe.IsSendingResults)
            {
                SqlContext.Pipe.SendResultsEnd();
            }

            if (_Reader != null && !_Reader.IsClosed)
            {
                _Reader.Close();
            }

            if (_Connection != null && _Connection.State != ConnectionState.Closed)
            {
                _Connection.Close();
            }

            if (PrintQueries.IsTrue)
            {
                SqlContext.Pipe.Send(_Command.CommandText);
            }
        }


    }
}

The Test Queries

There is not enough room to post the tests here so I will find another location.

The Conclusion

For certain scenarios, SQLCLR can be used to manipulate certain aspects of queries that cannot be done in T-SQL. And there is the ability to use memory for caching instead of temp tables, though that should be done sparingly and carefully as the memory does not get automatically released back to the system. This method is also not something that will help ad hoc queries, though it is possible to make it more flexible than I have shown here simply by adding parameters to tailor more aspects of the queries being executed.

UPDATE

Additional Test
My original tests that included a supporting index on TransactionHistory used the following definition:

ProductID ASC, TransactionDate DESC

I had decided at the time to forgo including TransactionId DESC at the end, figuring that while it might help Test Number 3 (which specifies tie-breaking on the most recent TransactionId--well, "most recent" is assumed since not explicitly stated, but everyone seems to agree on this assumption), there likely wouldn't be enough ties to make a difference.

But, then Aaron retested with a supporting index that did include TransactionId DESC and found that the CROSS APPLY method was the winner across all three tests. This was different than my testing which indicated that the CTE method was best for Test Number 3 (when no caching was used, which mirrors Aaron's test). It was clear that there was an additional variation that needed to be tested.

I removed the current supporting index, created a new one with TransactionId, and cleared the plan cache (just to be sure):

DROP INDEX [IX_TransactionHistoryX] ON Production.TransactionHistory;

CREATE UNIQUE INDEX [UIX_TransactionHistoryX]
    ON Production.TransactionHistory (ProductID ASC, TransactionDate DESC, TransactionID DESC)
    WITH (FILLFACTOR = 100);

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

I re-ran Test Number 1 and the results were the same, as expected. I then re-ran Test Number 3 and the results did indeed change:

Test 3 Results-with supporting index (with TransactionId DESC)
The above results are for the standard, non-caching test. This time, not only does the CROSS APPLY beat the CTE (just as Aaron's test indicated), but the SQLCLR proc took the lead by 30 Reads (woo hoo).

Test 3 Results-with supporting index (with TransactionId DESC) AND caching
The above results are for the test with caching enabled. This time the CTE's performance is not degraded, though the CROSS APPLY still beats it. However, now the SQLCLR proc takes the lead by 23 Reads (woo hoo, again).

Take Aways

There are various options to use. It is best to try several as they each have their strengths. The tests done here show a rather small variance in both Reads and Duration between the best and worst performers across all tests (with a supporting index); the variation in Reads is about 350 and Duration is 55 ms. While the SQLCLR proc did win in all but 1 test (in terms of Reads), only saving a few Reads usually isn't worth the maintenance cost of going the SQLCLR route. But in AdventureWorks2012, the Product table has only 504 rows and TransactionHistory has only 113,443 rows. The performance difference across these methods probably becomes more pronounced as the row counts increase.
虽然这个问题专门针对获取一组特定的行，但不应忽视性能的最大因素是索引，而不是特定的 SQL。在确定哪种方法真正最好之前，需要有一个好的索引。
这里发现的最重要的教训不是关于 CROSS APPLY、CTE 和 SQLCLR：而是关于测试。不要假设。从几个人那里获得想法并尽可能多地测试场景。

Mikael Eriksson · Answer 5 · 2014-12-23T07:36:14+08:00

APPLY TOP or ROW_NUMBER()? What could there possibly be more to say on that matter?

A short recap of the differences and to really keep it short I will only show the plans for option 2 and I have added the index on Production.TransactionHistory.

create index IX_TransactionHistoryX on 
  Production.TransactionHistory(ProductID, TransactionDate)

The row_number() query:.

with C as
(
  select T.TransactionID,
         T.TransactionDate,
         P.DaysToManufacture,
         row_number() over(partition by P.ProductID order by T.TransactionDate desc) as rn
  from Production.Product as P
    inner join Production.TransactionHistory as T
      on P.ProductID = T.ProductID
  where P.Name >= N'M' and
        P.Name < N'S'
)
select C.TransactionID,
       C.TransactionDate
from C
where C.rn <= 5 * C.DaysToManufacture;

在此处输入图像描述

The apply top version:

select T.TransactionID, 
       T.TransactionDate
from Production.Product as P
  cross apply (
              select top(cast(5 * P.DaysToManufacture as bigint))
                T.TransactionID,
                T.TransactionDate
              from Production.TransactionHistory as T
              where P.ProductID = T.ProductID
              order by T.TransactionDate desc
              ) as T
where P.Name >= N'M' and
      P.Name < N'S';

在此处输入图像描述

The main difference between these are that apply top filters on the top expression below the nested loops join where row_number version filters after the join. That means there are more reads from Production.TransactionHistory than really is necessary.

If there only existed a way to push the operators responsible for enumerating rows down to the lower branch before the join then row_number version might do better.

So enter apply row_number() version.

select T.TransactionID, 
       T.TransactionDate
from Production.Product as P
  cross apply (
              select T.TransactionID,
                     T.TransactionDate
              from (
                   select T.TransactionID,
                          T.TransactionDate,
                          row_number() over(order by T.TransactionDate desc) as rn
                   from Production.TransactionHistory as T
                   where P.ProductID = T.ProductID
                   ) as T
              where T.rn <= cast(5 * P.DaysToManufacture as bigint)
              ) as T
where P.Name >= N'M' and
      P.Name < N'S';

在此处输入图像描述

As you can see apply row_number() is pretty much the same as apply top only slightly more complicated. Execution time is also about the same or bit slower.

So why did I bother to come up with an answer that is no better than what we already have? Well, you have one more thing to try out in the real world and there actually is a difference in reads. One that I don't have an explanation for*.

APPLY - ROW_NUMBER
(961 row(s) affected)
Table 'TransactionHistory'. Scan count 115, logical reads 230, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

APPLY - TOP
(961 row(s) affected)
Table 'TransactionHistory'. Scan count 115, logical reads 268, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

While I'm at it i might as well throw in a second row_number() version that in certain cases might be the way to go. Those certain cases would be when you expect you actually need most of the rows from Production.TransactionHistory because here you get a merge join between Production.Product and the enumerated Production.TransactionHistory.

with C as
(
  select T.TransactionID,
         T.TransactionDate,
         T.ProductID,
         row_number() over(partition by T.ProductID order by T.TransactionDate desc) as rn
  from Production.TransactionHistory as T
)
select C.TransactionID,
       C.TransactionDate
from C
 inner join Production.Product as P
      on P.ProductID = C.ProductID
where P.Name >= N'M' and
      P.Name < N'S' and
      C.rn <= 5 * P.DaysToManufacture;

在此处输入图像描述

To get the above shape without a sort operator you also have to change the supporting index to order by TransactionDate descending.

create index IX_TransactionHistoryX on 
  Production.TransactionHistory(ProductID, TransactionDate desc)

^* Edit: The extra logical reads are due to the nested loops prefetching used with the apply-top. You can disable this with undoc'd TF 8744 (and/or 9115 on later versions) to get the same number of logical reads. Prefetching could be an advantage of the apply-top alternative in the right circumstances. - Paul White

Kris Gruttemeyer · Answer 6 · 2014-12-18T07:16:43+08:00

我通常使用 CTE 和窗口函数的组合。您可以使用以下内容来实现此答案：

;WITH GiveMeCounts
AS (
    SELECT CustomerID
        ,OrderDate
        ,TotalAmt

        ,ROW_NUMBER() OVER (
            PARTITION BY CustomerID ORDER BY 
            --You can change the following field or sort order to whatever you'd like to order by.
            TotalAmt desc
            ) AS MySeqNum
    )
SELECT CustomerID, OrderDate, TotalAmt
FROM GiveMeCounts
--Set n per group here
where MySeqNum <= 10

For the extra credit portion, where different groups may want to return different numbers of rows, you could use a separate table. Lets say using geographic criteria such as state:

+-------+-----------+
| State | MaxSeqnum |
+-------+-----------+
| AK    |        10 |
| NY    |         5 |
| NC    |        23 |
+-------+-----------+

In order to achieve this where the values may be different you would need to join your CTE to the State table similar to this:

SELECT [CustomerID]
    ,[OrderDate]
    ,[TotalAmt]
    ,[State]
FROM GiveMeCounts gmc
INNER JOIN StateTable st ON gmc.[State] = st.[State]
    AND gmc.MySeqNum <= st.MaxSeqNum

每组检索 n 行

AdventureWorks 示例 - 没有任何更改

AdventureWorks 示例 - 可以灵活地添加索引

The Testing

The Results

No Supporting Index

With Supporting Index (no Caching)

With Supporting Index AND Caching

The Method

Generally

Specifically

The Generated Queries

Product Info

Transaction Info

The Code

The Test Queries

The Conclusion

UPDATE

连接到 PostgreSQL 服务器：致命：主机没有 pg_hba.conf 条目

如何让sqlplus的输出出现在一行中？

选择具有最大日期或最晚日期的日期

如何列出 PostgreSQL 中的所有模式？

列出指定表的所有列

如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

你如何mysqldump特定的表？

使用 psql 列出数据库权限

如何从 PostgreSQL 中的选择查询中将值插入表中？

如何使用 psql 列出所有数据库和表？

每组检索 n 行

6 个回答

AdventureWorks 示例 - 没有任何更改

AdventureWorks 示例 - 可以灵活地添加索引

The Testing

The Results

No Supporting Index

With Supporting Index (no Caching)

With Supporting Index AND Caching

The Method

Generally

Specifically

The Generated Queries

Product Info

Transaction Info

The Code

The Test Queries

The Conclusion

UPDATE

相关问题