I'm working on a DW reporting table that is going to become very large. For simplicity, I'll show the table as follows:
BigTable
--------
TableID INT IDENTITY NOT NULL,
CompanyName NVARCHAR(100) NOT NULL
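Every query will filter on CompanyName. In other words, each query reads within one company's slice of the data (a logical partition, not a physical one).
Because the table may contain more than a billion rows, and the data is distributed very evenly across companies, querying by company should be as fast as possible. I'm at the stage of setting up some tests, but before doing so I thought I'd ask whether this would be a waste of time.
My idea is to determine whether data retrieval would be faster if each company's slice of data were stored contiguously on disk by the clustered index, compared with simply using a nonclustered index that covers CompanyName.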
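Example 1: This is the variant in which the IDENTITY column is the PK but is not CLUSTERED. CompanyName and TableID together form the clustered index, so the data will be ordered on disk by company.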
CREATE TABLE [dbo].[BigTable](
[TableID] [int] IDENTITY(1,1) NOT NULL,
[CompanyName] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_BigTable] PRIMARY KEY NONCLUSTERED
(
[TableID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CLUSTERED_ByCompanyName_TableID] ON [dbo].[BigTable]
(
[CompanyName] ASC,
[TableID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
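Example 2: This is the conventional way of creating the table, with a clustered PK on TableID and a covering nonclustered index on CompanyName.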
CREATE TABLE [dbo].[BigTable](
[TableID] [int] IDENTITY(1,1) NOT NULL,
[CompanyName] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_BigTable] PRIMARY KEY CLUSTERED
(
[TableID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ByCompanyName] ON [dbo].[BigTable]
(
[CompanyName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
Does anyone know whether there would be any performance improvement from using the first example instead of the second?
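Edit: I'm leaning toward putting the clustered index on the company. TableID is just an auto-increment field used as the PK in case rows ever need to be referenced uniquely. My feeling is that a clustered index seek/scan is faster than a nonclustered index seek/scan.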
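One way to check that intuition before committing is to load the same data into both layouts and compare logical reads and elapsed time. The sketch below assumes two hypothetical copies of the table, dbo.BigTable_ByCompany (Example 1's indexes) and dbo.BigTable_ById (Example 2's indexes), both holding identical rows plus the FieldA column used in the query further down. DBCC DROPCLEANBUFFERS empties the buffer pool and requires sysadmin, so run it only on a test server.

```sql
-- Hypothetical tables: dbo.BigTable_ByCompany uses Example 1's layout,
-- dbo.BigTable_ById uses Example 2's; both are assumed to contain the same
-- rows plus a numeric FieldA column.
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports CPU and elapsed time
GO

-- Cold-cache run against Example 1 (test server only).
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT SUM(FieldA) AS a, COUNT(*) AS b
FROM dbo.BigTable_ByCompany
WHERE CompanyName = N'NABISCO';
GO

-- Cold-cache run against Example 2.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT SUM(FieldA) AS a, COUNT(*) AS b
FROM dbo.BigTable_ById
WHERE CompanyName = N'NABISCO';
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Comparing the "logical reads" figures in the Messages output, together with the actual execution plans, should show whether Example 2 falls back to key lookups or a full clustered index scan once FieldA is needed. That is where Example 1's contiguous per-company rows would be expected to pay off.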
I'd expect that you could easily partition or shard on something like a company ID.
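If physical partitioning is wanted on top of Example 1, a sketch might look like the following. The simplified schema has no company-ID column, so this partitions on CompanyName instead; the table, function, scheme and index names and the boundary values N'H' and N'P' are all hypothetical placeholders.

```sql
-- Hypothetical boundaries: with RANGE RIGHT each boundary value belongs to the
-- partition on its right, so names < N'H' go to partition 1,
-- N'H' <= name < N'P' to partition 2, and the rest to partition 3.
CREATE PARTITION FUNCTION pf_CompanyName (nvarchar(100))
AS RANGE RIGHT FOR VALUES (N'H', N'P');
GO

-- All partitions on PRIMARY for simplicity; real layouts often spread
-- partitions across filegroups.
CREATE PARTITION SCHEME ps_CompanyName
AS PARTITION pf_CompanyName ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.BigTable_Partitioned (
    TableID     int IDENTITY(1,1) NOT NULL,
    CompanyName nvarchar(100)     NOT NULL
);
GO

-- Same clustered key as Example 1, built on the partition scheme
-- (this moves the table's rows into the partitions).
CREATE UNIQUE CLUSTERED INDEX CX_BigTable_Partitioned
ON dbo.BigTable_Partitioned (CompanyName, TableID)
ON ps_CompanyName (CompanyName);
GO

-- A unique index whose key lacks the partitioning column cannot be aligned,
-- so the PK on TableID is placed on PRIMARY as a non-aligned index.
ALTER TABLE dbo.BigTable_Partitioned
ADD CONSTRAINT PK_BigTable_Partitioned PRIMARY KEY NONCLUSTERED (TableID)
ON [PRIMARY];
GO
```

For a single-company query, the leading CompanyName key of the clustered index already limits the read to that company's rows, so partitioning here would mostly buy manageability (per-partition maintenance, partition switching) rather than faster seeks. Because the PK in this sketch is non-aligned, switching partitions would first require disabling or dropping it.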
The basic queries take the form:
SELECT
SUM(FieldA) OVER (PARTITION BY ...) a,
COUNT(1) OVER (PARTITION BY...) b
...
FROM
BigTable
WHERE
CompanyName = 'NABISCO'
GROUP BY
....
ORDER BY
....
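The query shape above reads FieldA, which the simplified schema omits. Assuming FieldA is a numeric column (an assumption), Example 2's IX_ByCompanyName does not contain it, so each matching row needs a key lookup into the clustered index unless the optimizer chooses a full scan instead. A sketch of making that nonclustered index genuinely covering with INCLUDE, followed by a concrete aggregate-only instance of the query (the index name is hypothetical):

```sql
-- Assumes dbo.BigTable (Example 2 layout) also has a numeric FieldA column.
-- Storing FieldA at the leaf level of the nonclustered index lets the query
-- below be answered from this index alone, without key lookups.
CREATE NONCLUSTERED INDEX IX_ByCompanyName_InclFieldA
ON dbo.BigTable (CompanyName)
INCLUDE (FieldA);
GO

-- A concrete instance of the query shape, using plain aggregates.
SELECT CompanyName,
       SUM(FieldA) AS a,
       COUNT(*)    AS b
FROM dbo.BigTable
WHERE CompanyName = N'NABISCO'
GROUP BY CompanyName;
GO
```

With the included column, Example 2 can seek directly to one company's entries in the nonclustered index, which should narrow the gap to Example 1 at the cost of storing FieldA a second time.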