Question

Rachel

Asked: 2012-07-27 04:44:36 +0800 CST

What's the most efficient way to get the minimum of multiple columns on SQL Server 2005?


I'm in a situation where I want to get the minimum value from 6 columns.

So far I've found three ways to do this, but I'm concerned about their performance and would like to know which one performs better.

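The first method is to use a big CASE statement. Here's an example with 3 columns, based on the example in the link above. My CASE statement would be much longer, since I'll be looking at 6 columns.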

Select Id,
       Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
            When Col2 <= Col3 Then Col2 
            Else Col3
            End As TheMin
From   MyTable
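For six columns the pattern stays the same: the first WHEN checks Col1 against every later column, the next checks Col2 against the columns after it, and so on, with the last column as the ELSE. Rather than typing that by hand, it can be generated. The sketch below is a hypothetical Python helper (`build_case_min` is not part of the question) that builds the expression and runs it through SQLite's in-memory database against the sample data given further down. SQLite is used only to keep the example self-contained, and the expression assumes the columns never contain NULL, since any `<=` against NULL is unknown and falls through to the next branch.

```python
import sqlite3

COLUMNS = [f"Col{i}" for i in range(1, 7)]

# Sample data from the question: Id followed by Col1..Col6.
SAMPLE_ROWS = [
    (1, 3, 4, 0, 2, 1, 5),
    (2, 2, 6, 10, 5, 7, 9),
    (3, 1, 1, 2, 3, 4, 5),
    (4, 9, 5, 4, 6, 8, 9),
]


def build_case_min(columns, alias="TheMin"):
    """Build a CASE in the question's style: each WHEN tests one column with <=
    against every column after it; the last column is the ELSE.
    Assumes the columns never contain NULL."""
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    whens = []
    for i, col in enumerate(columns[:-1]):
        conditions = " And ".join(f"{col} <= {other}" for other in columns[i + 1:])
        whens.append(f"When {conditions} Then {col}")
    body = "\n       ".join(whens)
    return f"Case {body}\n       Else {columns[-1]}\n       End As {alias}"


def min_via_case(rows, columns=COLUMNS):
    """Load rows into an in-memory SQLite table and run the generated CASE."""
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} INTEGER" for c in columns)
    conn.execute(f"CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, {col_defs})")
    placeholders = ", ".join("?" * (len(columns) + 1))
    conn.executemany(f"INSERT INTO MyTable VALUES ({placeholders})", rows)
    sql = f"Select Id,\n       {build_case_min(columns)}\nFrom   MyTable\nOrder By Id"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(build_case_min(COLUMNS))
    print(min_via_case(SAMPLE_ROWS))  # [(1, 0), (2, 2), (3, 1), (4, 4)]
```

The ordering is what keeps the chain correct: if a column is not <= every column after it, the overall minimum must lie further right, so each later WHEN only needs to look at the columns to its right.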

The second option is to use the UNION operator with multiple SELECT statements. I would put this in a UDF that accepts an Id parameter.

select Id, dbo.GetMinimumFromMyTable(Id)
from MyTable

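and the function:

create function dbo.GetMinimumFromMyTable(@id int)
returns int  -- assumes Col1, Col2 and Col3 are int
as
begin
    return
    (
        select min(col)
        from
        (
            select col1 [col] from MyTable where Id = @id
            union all
            select col2 from MyTable where Id = @id
            union all
            select col3 from MyTable where Id = @id
        ) as t
    )
end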
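Because the function is scalar and takes a single Id, the outer query calls it once per row, and each call runs three separate lookups against MyTable. The hypothetical Python sketch below mimics that call shape with SQLite, which has no T-SQL functions: `get_minimum_for_id` stands in for `dbo.GetMinimumFromMyTable`, and a counter records how many times it is invoked. It illustrates the shape of the work only and says nothing about SQL Server's actual plan.

```python
import sqlite3

# Same three-way UNION ALL as the UDF body, parameterised on one Id.
UDF_BODY = """
select min(col)
from
(
    select Col1 as col from MyTable where Id = :id
    union all
    select Col2 from MyTable where Id = :id
    union all
    select Col3 from MyTable where Id = :id
) as t
"""


def make_table(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table MyTable (Id integer primary key, Col1 int, Col2 int, Col3 int)")
    conn.executemany("insert into MyTable values (?, ?, ?, ?)", rows)
    return conn


def get_minimum_for_id(conn, row_id):
    """Stand-in for dbo.GetMinimumFromMyTable(@id): one UNION ALL query per call."""
    return conn.execute(UDF_BODY, {"id": row_id}).fetchone()[0]


def select_with_udf(conn):
    """Mimic `select Id, dbo.GetMinimumFromMyTable(Id) from MyTable`."""
    calls = 0
    result = []
    for (row_id,) in conn.execute("select Id from MyTable order by Id").fetchall():
        result.append((row_id, get_minimum_for_id(conn, row_id)))
        calls += 1
    return result, calls


if __name__ == "__main__":
    conn = make_table([(1, 3, 4, 0), (2, 2, 6, 10), (3, 1, 1, 2), (4, 9, 5, 4)])
    print(select_with_udf(conn))  # ([(1, 0), (2, 2), (3, 1), (4, 4)], 4)
```

With N rows in the outer query this issues N separate UNION ALL queries, each of which touches the table three times.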

The third option I found was to use the UNPIVOT operator, which I didn't even know existed until now:

with cte (ID, Col1, Col2, Col3)
as
(
    select ID, Col1, Col2, Col3
    from TestTable
)
select cte.ID, Col1, Col2, Col3, TheMin from cte
join
(
    select
        ID, min(Amount) as TheMin
    from 
        cte 
        UNPIVOT (Amount for AmountCol in (Col1, Col2, Col3)) as unpvt
    group by ID
) as minValues
on cte.ID = minValues.ID
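Conceptually, UNPIVOT turns each row into one (ID, AmountCol, Amount) row per listed column, and the GROUP BY then takes the minimum per ID. A minimal pure-Python model of those two steps follows; the helper names are hypothetical, and the model mirrors the fact that SQL Server's UNPIVOT drops NULL values.

```python
def unpivot(rows, id_col, value_cols):
    """Yield one (id, column_name, value) tuple per listed column.
    NULLs (None) are skipped, as SQL Server's UNPIVOT does."""
    for row in rows:
        for col in value_cols:
            if row[col] is not None:
                yield row[id_col], col, row[col]


def min_per_id(rows, id_col, value_cols):
    """Equivalent of: select id, min(value) from <unpivoted rows> group by id."""
    mins = {}
    for row_id, _col, value in unpivot(rows, id_col, value_cols):
        if row_id not in mins or value < mins[row_id]:
            mins[row_id] = value
    return mins


if __name__ == "__main__":
    rows = [
        {"ID": 1, "Col1": 3, "Col2": 4, "Col3": 0},
        {"ID": 2, "Col1": 2, "Col2": 6, "Col3": 10},
        {"ID": 3, "Col1": None, "Col2": None, "Col3": None},
    ]
    print(min_per_id(rows, "ID", ["Col1", "Col2", "Col3"]))  # {1: 0, 2: 2}
```

ID 3 disappears because every one of its columns is NULL, so it produces no unpivoted rows; the join back to the CTE in the query above would drop that row as well.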

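I'm concerned about the performance impact these queries will have on the database, because of the size of the table and how often it is queried and updated.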

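This query will actually be used in a join to a table with a few million records, but the records returned will be narrowed down to around a hundred at a time. It will run many times throughout the day, and the 6 columns I'm querying are updated frequently (they contain daily stats). I don't think there are any indexes on the 6 columns I'm querying.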

Which of these methods performs better when trying to get the minimum of multiple columns? Or is there a better method that I don't know of?

I'm using SQL Server 2005.

Sample Data & Results

If my data contained records like this:

Id  Col1  Col2  Col3  Col4  Col5  Col6
1   3     4     0     2     1     5
2   2     6     10    5     7     9
3   1     1     2     3     4     5
4   9     5     4     6     8     9

The end result should be

Id  TheMin
1   0
2   2
3   1
4   4

7 Answers


Rachel · Answer 1 · 2012-07-27T06:08:56+08:00

I tested the performance of all 3 methods, and here's what I found:

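1 record: no noticeable difference
10 records: no noticeable difference
1,000 records: no noticeable difference
10,000 records: the UNION subquery was a little slower. The CASE WHEN query was a little faster than the UNPIVOT query.
100,000 records: the UNION subquery was significantly slower, but the UNPIVOT query became a little faster than the CASE WHEN query.
500,000 records: the UNION subquery was still significantly slower, but the UNPIVOT query became much faster than the CASE WHEN query.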

So the end result seems to be:

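For smaller record sets there doesn't seem to be enough of a difference to matter. Use whatever is easiest to read and maintain.
Once you get into larger record sets, the UNION ALL subquery starts to perform poorly compared to the other two methods.
The CASE statement performs best up to a certain point (in my case, around 100k rows), after which the UNPIVOT query becomes the best-performing query.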

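The actual point at which one query becomes better than another will probably vary with your hardware, database schema, data, and current server load, so be sure to test on your own system if you're concerned about performance.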
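In that spirit, here is a minimal, hypothetical harness for comparing two formulations: the question's CASE and the nested CASE that Gulli Meel proposes further down. It uses SQLite only so the sketch is self-contained; the table layout, row count, and random data are assumptions, and SQLite timings say nothing about SQL Server 2005. The useful part is the shape: confirm every candidate returns identical rows before comparing the best of several runs.

```python
import random
import sqlite3
import time

CASE_SQL = """
select Id,
       case when Col1 <= Col2 and Col1 <= Col3 then Col1
            when Col2 <= Col3 then Col2
            else Col3
       end as TheMin
from MyTable order by Id
"""

NESTED_CASE_SQL = """
select Id,
       case when Col1 <= Col2 then case when Col1 <= Col3 then Col1 else Col3 end
            when Col2 <= Col3 then Col2
            else Col3
       end as TheMin
from MyTable order by Id
"""


def build_table(n_rows, seed=0):
    """In-memory table of random, non-NULL integers (an assumption for the demo)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("create table MyTable (Id integer primary key, Col1 int, Col2 int, Col3 int)")
    conn.executemany(
        "insert into MyTable values (?, ?, ?, ?)",
        ((i, rng.randint(0, 100), rng.randint(0, 100), rng.randint(0, 100)) for i in range(n_rows)),
    )
    return conn


def time_query(conn, sql, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, rows from the last run)."""
    best, rows = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


if __name__ == "__main__":
    conn = build_table(100_000)
    t_case, r_case = time_query(conn, CASE_SQL)
    t_nested, r_nested = time_query(conn, NESTED_CASE_SQL)
    assert r_case == r_nested, "formulations disagree; fix that before timing"
    print(f"CASE: {t_case:.4f}s  nested CASE: {t_nested:.4f}s")
```

Taking the best of several runs reduces noise from other activity on the machine, which matters when, as here, the tests run against a live system.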

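I also ran some tests with Mikael's answer; however, it was slower than all 3 of the other methods tried here for most record set sizes. The only exception was that it did better than the UNION ALL query for very large record sets. I like the fact that it shows the column name in addition to the minimum value.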

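I'm not a DBA, so I may not have optimized my tests and may have missed something. I was testing with actual live data, so that may have affected the results. I tried to account for that by running each query several times, but you never know. I'd definitely be interested if someone ran a clean test of this and shared their results.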

Mikael Eriksson · Answer 2 · 2012-07-27T05:51:10+08:00

I don't know what's fastest, but you could try something like this.

declare @T table
(
  Col1 int,
  Col2 int,
  Col3 int,
  Col4 int,
  Col5 int,
  Col6 int
)

insert into @T values(1, 2, 3, 4, 5, 6)
insert into @T values(2, 3, 1, 4, 5, 6)

select T4.ColName, T4.ColValue
from @T as T1
  cross apply (
                select T3.ColValue, T3.ColName
                from (
                       select row_number() over(order by T2.ColValue) as rn,
                              T2.ColValue,
                              T2.ColName
                       from (
                              select T1.Col1, 'Col1' union all
                              select T1.Col2, 'Col2' union all
                              select T1.Col3, 'Col3' union all
                              select T1.Col4, 'Col4' union all
                              select T1.Col5, 'Col5' union all
                              select T1.Col6, 'Col6'
                            ) as T2(ColValue, ColName)
                     ) as T3
                where T3.rn = 1
              ) as T4

Result:

ColName ColValue
------- -----------
Col1    1
Col3    1
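The CROSS APPLY above numbers each row's six (value, name) pairs by value and keeps number 1. A small Python model of that per-row logic follows; the helper is hypothetical. It assumes no NULLs (in SQL Server, NULLs sort first in an ascending ORDER BY, so the T-SQL query would report a NULL column as the minimum), and note that on ties Python's `min` returns the first pair in column order, whereas ROW_NUMBER() over a non-unique ORDER BY picks an arbitrary winner.

```python
COLUMNS = [f"Col{i}" for i in range(1, 7)]


def min_with_column(row, columns=COLUMNS):
    """Return (column_name, value) for the smallest value in `row`.

    Assumes no NULLs. Ties go to the earliest column listed here;
    ROW_NUMBER() in the T-SQL version gives no such guarantee."""
    return min(((name, row[name]) for name in columns), key=lambda pair: pair[1])


if __name__ == "__main__":
    data = [(1, 2, 3, 4, 5, 6), (2, 3, 1, 4, 5, 6)]  # the two rows inserted into @T above
    for values in data:
        print(min_with_column(dict(zip(COLUMNS, values))))
    # ('Col1', 1)
    # ('Col3', 1)
```

If ties matter, the T-SQL version can be made deterministic by adding the column name as a tiebreaker, i.e. `order by T2.ColValue, T2.ColName`.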

If you're not interested in which column has the minimum value, you can use this instead.

declare @T table
(
  Id int,
  Col1 int,
  Col2 int,
  Col3 int,
  Col4 int,
  Col5 int,
  Col6 int
)

insert into @T
select 1,        3,       4,       0,       2,       1,       5 union all
select 2,        2,       6,      10,       5,       7,       9 union all
select 3,        1,       1,       2,       3,       4,       5 union all
select 4,        9,       5,       4,       6,       8,       9

select T.Id, (select min(T1.ColValue)
              from (
                      select T.Col1 union all
                      select T.Col2 union all
                      select T.Col3 union all
                      select T.Col4 union all
                      select T.Col5 union all
                      select T.Col6
                    ) as T1(ColValue)
             ) as ColValue
from @T as T

A simplified unpivot query:

select Id, min(ColValue) as ColValue
from @T
unpivot (ColValue for Col in (Col1, Col2, Col3, Col4, Col5, Col6)) as U
group by Id

Jon Seigel · Answer 3 · 2012-07-27T05:58:17+08:00


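Add a persisted computed column that uses a CASE statement to perform the logic you need.

The minimum value will then always be efficiently available when you need to join (or do anything else) based on that value.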

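The value is recomputed every time any of the source values changes (INSERT/UPDATE/MERGE). I'm not saying this is necessarily the best solution for the workload; I'm just offering it as a solution, like the other answers. Only the OP can determine which one is best for the workload.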
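The idea can be tried out without SQL Server by using SQLite's stored generated columns, which play a similar role to a PERSISTED computed column: the engine writes the value whenever the row is inserted or updated, and queries read it like any other column. This is an analogy, not T-SQL. It assumes SQLite 3.31 or later (where generated columns were introduced) and a three-column table for brevity; in SQL Server the column would be declared as `TheMin AS (<CASE expression>) PERSISTED`.

```python
import sqlite3

# Requires SQLite >= 3.31.0 for GENERATED ALWAYS AS (...) STORED.
DDL = """
create table MyTable (
    Id integer primary key,
    Col1 int not null,
    Col2 int not null,
    Col3 int not null,
    -- Stored (persisted) column: recomputed by the engine on every INSERT/UPDATE.
    TheMin int generated always as (
        case when Col1 <= Col2 then case when Col1 <= Col3 then Col1 else Col3 end
             when Col2 <= Col3 then Col2
             else Col3
        end
    ) stored
)
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.execute("insert into MyTable (Id, Col1, Col2, Col3) values (1, 3, 4, 0)")
    before = conn.execute("select TheMin from MyTable where Id = 1").fetchone()[0]
    # Changing a source column makes the engine recompute TheMin.
    conn.execute("update MyTable set Col3 = 7 where Id = 1")
    after = conn.execute("select TheMin from MyTable where Id = 1").fetchone()[0]
    conn.close()
    return before, after


if __name__ == "__main__":
    print(demo())  # (0, 3)
```

The trade-off is the one described here: writes pay for the computation once, and every read (including a join on TheMin) gets the stored value.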


Gulli Meel · Answer 4 · 2012-07-27T20:44:25+08:00

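Your CASE statement is not efficient. You are doing 5 comparisons in the worst case and 2 in the best case, whereas finding the minimum of n values should take at most n-1 comparisons.

For each row you are doing 3.5 comparisons on average instead of 2, so it takes more CPU time and is slow. Try your tests again using the following CASE statement. It uses only 2 comparisons per row and should be more efficient than both unpivot and union all.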

Select Id, 
       Case 
           When Col1 <= Col2 then case when Col1 <= Col3 Then Col1  else col3 end
            When  Col2 <= Col3 Then Col2  
            Else Col3 
            End As TheMin 
From   YourTableNameHere
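The comparison-count argument can be explored with a small model. The Python below counts `<=` evaluations for the question's three-column CASE and for the nested CASE above, over every ordering of three distinct values, and checks that both return the true minimum. It assumes left-to-right short-circuit evaluation of AND and of the WHEN branches, which SQL Server does not formally guarantee, so treat the counts as illustrative rather than as what the engine actually does.

```python
from itertools import permutations


class ComparisonCounter:
    """Evaluates a <= b and counts how many times it was called."""

    def __init__(self):
        self.count = 0

    def le(self, a, b):
        self.count += 1
        return a <= b


def question_case(c1, c2, c3, cmp):
    # Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
    #      When Col2 <= Col3 Then Col2 Else Col3 End
    if cmp.le(c1, c2) and cmp.le(c1, c3):
        return c1
    if cmp.le(c2, c3):
        return c2
    return c3


def nested_case(c1, c2, c3, cmp):
    # Case When Col1 <= Col2 Then Case When Col1 <= Col3 Then Col1 Else Col3 End
    #      When Col2 <= Col3 Then Col2 Else Col3 End
    if cmp.le(c1, c2):
        return c1 if cmp.le(c1, c3) else c3
    if cmp.le(c2, c3):
        return c2
    return c3


def comparison_counts(case_fn):
    """Comparisons used for every ordering of three distinct values, sorted.
    Also checks that the variant really returns the minimum."""
    counts = []
    for c1, c2, c3 in permutations((1, 2, 3)):
        cmp = ComparisonCounter()
        assert case_fn(c1, c2, c3, cmp) == min(c1, c2, c3)
        counts.append(cmp.count)
    return sorted(counts)


if __name__ == "__main__":
    print("question's CASE:", comparison_counts(question_case))
    print("nested CASE:    ", comparison_counts(nested_case))
```

Under these assumptions the nested form always uses exactly n - 1 = 2 comparisons, because each branch carries the current winner forward instead of re-testing it.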

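In your case the union all method is wrong, because you get the minimum not per row but for the whole table. It also won't be efficient, since you scan the same table 3 times. When the table is small the I/O won't make much difference, but for large tables it will. Don't use that method.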

Unpivot is good; you can also try a manual unpivot by cross joining your table with (select 1 union all select 2 union all select 3). It should be as efficient as unpivot.
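A sketch of that manual unpivot, extended to six columns and run against the question's sample data. SQLite is used only to keep the example self-contained; the SELECT itself is plain SQL and is expected to carry over to SQL Server with the table name changed, but that is an assumption to verify. Each row is cross joined to the numbers 1 to 6, a CASE picks the matching column, and MIN per Id does the rest.

```python
import sqlite3

MANUAL_UNPIVOT_SQL = """
select t.Id,
       min(case n.i when 1 then t.Col1 when 2 then t.Col2 when 3 then t.Col3
                    when 4 then t.Col4 when 5 then t.Col5 else t.Col6 end) as TheMin
from MyTable as t
cross join (select 1 as i union all select 2 union all select 3
            union all select 4 union all select 5 union all select 6) as n
group by t.Id
order by t.Id
"""

# Sample data from the question: Id followed by Col1..Col6.
SAMPLE_ROWS = [
    (1, 3, 4, 0, 2, 1, 5),
    (2, 2, 6, 10, 5, 7, 9),
    (3, 1, 1, 2, 3, 4, 5),
    (4, 9, 5, 4, 6, 8, 9),
]


def manual_unpivot_min(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table MyTable (Id integer primary key, "
                 "Col1 int, Col2 int, Col3 int, Col4 int, Col5 int, Col6 int)")
    conn.executemany("insert into MyTable values (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(MANUAL_UNPIVOT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(manual_unpivot_min(SAMPLE_ROWS))  # [(1, 0), (2, 2), (3, 1), (4, 4)]
```

Unlike the `<=` CASE chains, MIN here simply skips NULL columns, which matches how UNPIVOT treats them.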

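If you don't have space issues, the best solution is a persisted computed column. It will add 4 bytes to each row (I assume you're using the int type), which in turn will increase the size of the table.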

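However, if space and memory are an issue in your system and CPU is not, don't make it persisted; use a plain (non-persisted) computed column with the CASE statement instead. It will make the code simpler.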

Jesse Adam · Answer 5 · 2015-09-18T17:07:05+08:00

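A CASE statement for 6 dates. To handle fewer dates, copy the true branch of the first CASE statement. The worst case is when Date1 is the lowest value and the best case is when Date6 is the lowest, so put the date most likely to be lowest in Date6. I wrote this because of the restrictions on computed columns.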

CASE WHEN Date1 IS NULL OR Date1 > Date2 THEN
        CASE WHEN Date2 IS NULL OR Date2 > Date3 THEN
            CASE WHEN Date3 IS NULL OR Date3 > Date4 THEN
                CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN
                    CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                        Date6
                    ELSE
                        Date5
                    END
                ELSE
                    CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN
                        Date6
                    ELSE
                        Date4
                    END
                END
            ELSE
                CASE WHEN Date3 IS NULL OR Date3 > Date5 THEN
                    CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                        Date6
                    ELSE
                        Date5
                    END
                ELSE
                    CASE WHEN Date3 IS NULL OR Date3 > Date6 THEN
                        Date6
                    ELSE
                        Date3
                    END
                END
            END
        ELSE
            CASE WHEN Date2 IS NULL OR Date2 > Date4 THEN
                CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN
                    CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                        Date6
                    ELSE
                        Date5
                    END
                ELSE
                    CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN
                        Date6
                    ELSE
                        Date4
                    END
                END
            ELSE
                CASE WHEN Date2 IS NULL OR Date2 > Date5 THEN
                    CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                        Date6
                    ELSE
                        Date5
                    END
                ELSE
                    CASE WHEN Date2 IS NULL OR Date2 > Date6 THEN
                        Date6
                    ELSE
                        Date2
                    END
                END
            END
        END
ELSE
    CASE WHEN Date1 IS NULL OR Date1 > Date3 THEN
        CASE WHEN Date3 IS NULL OR Date3 > Date4 THEN
            CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN
                CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                    Date6
                ELSE
                    Date5
                END
            ELSE
                CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN
                    Date6
                ELSE
                    Date4
                END
            END
        ELSE
            CASE WHEN Date3 IS NULL OR Date3 > Date5 THEN
                CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                    Date6
                ELSE
                    Date5
                END
            ELSE
                CASE WHEN Date3 IS NULL OR Date3 > Date6 THEN
                    Date6
                ELSE
                    Date3
                END
            END
        END
    ELSE
        CASE WHEN Date1 IS NULL OR Date1 > Date4 THEN
            CASE WHEN Date4 IS NULL OR Date4 > Date5 THEN
                CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                    Date6
                ELSE
                    Date5
                END
            ELSE
                CASE WHEN Date4 IS NULL OR Date4 > Date6 THEN
                    Date6
                ELSE
                    Date4
                END
            END
        ELSE
            CASE WHEN Date1 IS NULL OR Date1 > Date5 THEN
                CASE WHEN Date5 IS NULL OR Date5 > Date6 THEN
                    Date6
                ELSE
                    Date5
                END
            ELSE
                CASE WHEN Date1 IS NULL OR Date1 > Date6 THEN
                    Date6
                ELSE
                    Date1
                END
            END
        END
    END
END
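Each level of that CASE keeps the current winner and compares it with the next date: a NULL winner is beaten by anything (`IS NULL OR ... >`), while a NULL challenger never wins, because comparing a date with NULL is unknown and falls to the ELSE. The same rule written as a loop, as a hypothetical Python model with `None` standing in for NULL:

```python
from datetime import date


def lowest_date(*dates):
    """Mirror the nested CASE: walk left to right, keeping the current winner.
    The challenger takes over when the winner is NULL (None) or strictly greater.
    A NULL challenger never takes over, since `winner > NULL` is UNKNOWN in SQL."""
    winner = dates[0]
    for challenger in dates[1:]:
        if winner is None or (challenger is not None and winner > challenger):
            winner = challenger
    return winner


if __name__ == "__main__":
    print(lowest_date(date(2015, 3, 1), None, date(2014, 1, 1),
                      date(2016, 1, 1), None, date(2015, 1, 1)))  # 2014-01-01
```

The result is the smallest non-NULL date, or NULL only when all six dates are NULL.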

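If you found this page just looking to compare dates and aren't concerned about performance or compatibility, you can use a table value constructor, which can be used anywhere a subselect is allowed (SQL Server 2008 and later):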

Lowest =    
(
    SELECT MIN(TVC.d) 
    FROM 
    (
        VALUES
            (Date1), 
            (Date2), 
            (Date3), 
            (Date4), 
            (Date5), 
            (Date6)
    ) 
    AS TVC(d)
)
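One detail worth knowing: MIN is an aggregate, so it ignores NULLs, which gives the same "lowest non-NULL date" result as the CASE above without any IS NULL checks. The snippet below demonstrates that with SQLite, which also accepts a VALUES list as a derived table, although its columns are named column1, column2 and so on instead of being aliased as `TVC(d)`. Passing the six dates as parameters and storing them as ISO-8601 strings (which sort in date order) are assumptions made to keep the example self-contained.

```python
import sqlite3

LOWEST_SQL = """
select min(TVC.column1)
from (values (?), (?), (?), (?), (?), (?)) as TVC
"""


def lowest_of_six(conn, dates):
    """Return MIN over the six values; NULLs (None) are skipped by the aggregate."""
    return conn.execute(LOWEST_SQL, dates).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(lowest_of_six(conn, ("2015-03-01", None, "2014-01-01", "2016-01-01", None, "2015-01-01")))  # 2014-01-01
    print(lowest_of_six(conn, (None,) * 6))  # None
```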

NoChance · Answer 6 · 2012-07-27T05:46:51+08:00


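I'd guess the first option is the fastest (even though it doesn't look very pretty from a programming point of view!). That's because it processes exactly N rows (where N is the table size) and doesn't have to search or sort the way methods 2 and 3 do.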

A test with a large sample should prove the point.

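Another option to consider (as if you need more!) is to create a materialized view over your table, if your table has hundreds of thousands of rows or more. That way the minimum is calculated when a row changes, and the whole table doesn't have to be processed on every query. In SQL Server, materialized views are called indexed views.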


Ravi · Answer 7 · 2015-11-25T01:23:34+08:00


Create table #temp
   (
    id int identity(1,1),
    Name varchar(30),
    Year1 int,
    Year2 int,
    Year3 int,
    Year4 int
   )

   Insert into #temp values ('A' ,2015,2016,2014,2010)
   Insert into #temp values ('B' ,2016,2013,2017,2018)
   Insert into #temp values ('C' ,2010,2016,2014,2017)
   Insert into #temp values ('D' ,2017,2016,2014,2015)
   Insert into #temp values ('E' ,2016,2016,2016,2016)
   Insert into #temp values ('F' ,2016,2017,2018,2019)
   Insert into #temp values ('G' ,2016,2017,2020,2019)

   Select *, Case 
                 when Year1 >= Year2 and Year1 >= Year3 and Year1 >= Year4 then Year1
                 when Year2 >= Year3 and Year2 >= Year4 and Year2 >= Year1 then Year2
                 when Year3 >= Year4 and Year3 >= Year1 and Year3 >= Year2 then Year3
                 when Year4 >= Year1 and Year4 >= Year2 and Year4 >= Year3 then Year4  
                 else Year1 end as maxscore  
                 from #temp
