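Note: this question is purely academic; it's meant to help improve my understanding of SQL Server performance.
Given a main table that is related to one or more other tables, how would you determine the best way to query the main table for its records, including an indicator of whether related records exist in the related tables?
For example, suppose we have a Person table and want a list of all people along with an indicator of whether they have children (in this example, Person is reused as the related table):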
create table Person
(
Id bigint not null constraint pk_Person primary key clustered
, ParentId bigint null constraint fk_Person_Parent foreign key references Person(Id)
, FirstName nvarchar(256) not null
, LastName nvarchar(256) not null
)
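To try the queries without the fiddle, the table needs a few rows. The sketch below inserts a handful of hypothetical people (the names and ids are invented for illustration and are not the fiddle's data), assuming the Person table above has just been created and is empty. Parents are inserted in a separate statement before their children, so the self-referencing fk_Person_Parent constraint is always satisfied.

```sql
-- Hypothetical sample rows, for illustration only.
-- Statement 1: top-level people (no parent).
insert into Person (Id, ParentId, FirstName, LastName)
values (1, null, N'Anna', N'Bell')
     , (2, null, N'Bob',  N'Berry')
     , (3, null, N'Carl', N'Cole');

-- Statement 2: children, which reference parents that now exist.
insert into Person (Id, ParentId, FirstName, LastName)
values (4, 1, N'Ben',  N'Bell')   -- child of Anna Bell
     , (5, 1, N'Beth', N'Bell')   -- child of Anna Bell
     , (6, 3, N'Cara', N'Cole');  -- child of Carl Cole
```

With these rows and @LastName = 'Be%', each of the example queries should return four people (Anna Bell, Bob Berry, Ben Bell, Beth Bell), with only Anna Bell flagged as having children.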
We could run any of the following queries to check for the existence of related children:
--variables for restricting our result set, just to keep things interesting
declare @LastName nvarchar(256) = 'Be%'
, @FirstName nvarchar(256) = null
Example 1
-- fairly straightforward, but requires grouping to account for the
-- potential of a parent having multiple kids (which I don't care about here),
-- which could be adding some inefficiency.
select parent.Id
, parent.FirstName
, parent.LastName
, case when max(child.Id) is null then 0 else 1 end HasChildren
from Person parent
left outer join Person child --1:n
on child.ParentId = parent.Id
where (@LastName is null or parent.LastName like @LastName)
and (@FirstName is null or parent.FirstName like @FirstName)
group by parent.Id, parent.FirstName, parent.LastName --resolve 1:n
Example 2
-- avoid the need to group the results by first getting
-- a single child per parent.
-- may be inefficient because we get children for all parents
-- even if we filter for only a few parents.
select parent.Id
, parent.FirstName
, parent.LastName
, coalesce(child.hasChildren, 0) HasChildren
from Person parent
left outer join --1:? (0 or 1)
(
select distinct parentId, 1 hasChildren
from Person
where parentId is not null --not sure if this adds value
) child
on child.ParentId = parent.Id
where (@LastName is null or LastName like @LastName)
and (@FirstName is null or FirstName like @FirstName)
--group by removed since we're 1:?
Example 3
-- same as #2 except we limit the child results to those
-- related to the parents we're interested in, storing
-- them in a CTE to avoid querying for the same parent data
-- in both the inner query and the outer query.
-- Getting a bit silly now, but could this overcome some inefficiencies?
;with parentCTE as (
select Id, FirstName, LastName
from Person
where (@LastName is null or LastName like @LastName)
and (@FirstName is null or FirstName like @FirstName)
)
select parentCTE.Id
, parentCTE.FirstName
, parentCTE.LastName
, coalesce(child.hasChildren, 0) HasChildren
from parentCTE
left outer join --1:? (0 or 1)
(
select distinct parentId, 1 hasChildren
from Person
where parentId in --reduce the amount of data we return here based on the records we're interested in
(
select Id
from parentCTE
)
) child
on child.ParentId = parentCTE.Id
Example 4
-- back to a simple one; just check for children on our parents
-- but this time having brought back the full parent set.
-- may be inefficient because we're querying the table once per
-- matching parent to check for children.
select parent.Id
, parent.FirstName
, parent.LastName
, coalesce((select top 1 1 from Person child where child.parentId = parent.Id),0) HasChildren
from Person parent
where (@LastName is null or LastName like @LastName)
and (@FirstName is null or FirstName like @FirstName)
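Example 4's correlated select top 1 1 subquery is really asking an existence question. A common way to state that directly is EXISTS inside a CASE; the sketch below is otherwise the same as Example 4, with the filter variables redeclared so the batch runs on its own. SQL Server typically evaluates this kind of expression as a semi-join that can stop at the first matching child, but whether it reads fewer pages than Example 4 depends on the plan chosen, so it is worth measuring with the same statistics output rather than assuming.

```sql
-- Same filters as the examples above; redeclared so this batch is self-contained.
declare @LastName nvarchar(256) = 'Be%'
      , @FirstName nvarchar(256) = null;

-- Variant of Example 4: ask only whether at least one child row exists.
-- No GROUP BY or DISTINCT is needed because each parent yields exactly one row.
select parent.Id
     , parent.FirstName
     , parent.LastName
     , case when exists (select 1
                         from Person child
                         where child.ParentId = parent.Id)
            then 1 else 0 end HasChildren
from Person parent
where (@LastName is null or parent.LastName like @LastName)
  and (@FirstName is null or parent.FirstName like @FirstName);
```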
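I'm looking for information on how to better understand the trade-offs involved in this kind of scenario, rather than simply being told which one (e.g. example 3) is the best. Pointers to articles that would help my understanding are also welcome.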
Related SQL Fiddle: http://sqlfiddle.com/#!6/edc17/3
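Example 4 has the fewest scans and reads:
Example 1
Table 'Person'. Scan count 9, logical reads 27, physical reads 0, ...
Example 2
Table 'Person'. Scan count 9, logical reads 27, physical reads 0, ...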
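Example 3
SQL Server parse and compile time: CPU time = 7 ms, elapsed time = 7 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
(8 row(s) affected)
Table 'Person'. Scan count 9, logical reads 41, physical reads 0, ...
(9 row(s) affected)
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.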
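Example 4
SQL Server parse and compile time: CPU time = 3 ms, elapsed time = 3 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
(8 row(s) affected)
Table 'Person'. Scan count 3, logical reads 26, physical reads 0, ...
(11 row(s) affected)
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.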
You want as few scans as possible: the more times the table is scanned, the longer the query takes.
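Scan counts, logical reads and timings like the ones above come from SQL Server's session statistics. A minimal sketch for reproducing the comparison between Example 1 and Example 4, assuming the Person table from the question: turn on SET STATISTICS IO and SET STATISTICS TIME, run both queries in one batch, and read the per-query figures in the Messages output. Looking at the actual execution plans alongside these numbers shows which operator is responsible for each scan.

```sql
-- Report per-table scan counts / logical reads and CPU / elapsed times
-- in the Messages output for each statement that follows.
set statistics io on;
set statistics time on;

declare @LastName nvarchar(256) = 'Be%'
      , @FirstName nvarchar(256) = null;

-- Example 1: left join, then group by to collapse multiple children.
select parent.Id, parent.FirstName, parent.LastName
     , case when max(child.Id) is null then 0 else 1 end HasChildren
from Person parent
left outer join Person child
    on child.ParentId = parent.Id
where (@LastName is null or parent.LastName like @LastName)
  and (@FirstName is null or parent.FirstName like @FirstName)
group by parent.Id, parent.FirstName, parent.LastName;

-- Example 4: correlated subquery evaluated per matching parent.
select parent.Id, parent.FirstName, parent.LastName
     , coalesce((select top 1 1 from Person child
                 where child.ParentId = parent.Id), 0) HasChildren
from Person parent
where (@LastName is null or parent.LastName like @LastName)
  and (@FirstName is null or parent.FirstName like @FirstName);

-- Restore the session defaults.
set statistics io off;
set statistics time off;
```

On a table this small, a difference of a few logical reads says little about how the queries will scale, so repeating the comparison with more rows gives a better picture of the trade-offs.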