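What is a good strategy for choosing the clustered index on a fact or transaction table? I am using SQL Server 2019.
I have picked a generic sales table (FactSales) with the following properties:
- No identity surrogate key
- A composite primary key made up of 4 fields (all INT)
- About 300 million rows
- It is loaded continuously as each check is closed, so each load is for a single DateOfSaleKey, StoreKey and CheckNumber (and contains many SaleItemKey values)
I can see 5 options (though there may be more), which I have scripted below; each has pros and cons.
Please let me know which you would choose, and why.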
CREATE TABLE dbo.FactSales
(
DateOfSaleKey INT NOT NULL,
StoreKey INT NOT NULL,
CheckNumber INT NOT NULL, -- not unique across stores
SaleItemKey INT NOT NULL,
CashierKey INT NOT NULL,
TerminalKey INT NOT NULL,
SaleTypeKey INT NOT NULL,
TimeSlotKey INT NOT NULL,
TransactionTypeKey INT NOT NULL,
SaleTime DATETIME NOT NULL,
SalesQuantity INT NOT NULL,
SalesNet DECIMAL (16, 8) NOT NULL,
SalesGross DECIMAL (16, 8) NULL,
VAT DECIMAL (16, 8) NOT NULL,
DiscountQuantity INT NOT NULL,
Discount DECIMAL (16, 8) NOT NULL,
VoidQuantity INT NOT NULL,
Void DECIMAL (16, 8) NOT NULL,
RefundQuantity INT NOT NULL,
Refund DECIMAL (16, 8) NOT NULL
)
ALTER TABLE dbo.FactSales ADD CONSTRAINT PK_FactSales PRIMARY KEY NONCLUSTERED (DateOfSaleKey, StoreKey, CheckNumber, SaleItemKey)
-- OPTION #1: add a surrogate key (identity) and make that the clustered index
-- unique, narrow and always increasing, but unnecessary column
ALTER TABLE dbo.FactSales ADD SalesKey INT IDENTITY NOT NULL
CREATE UNIQUE CLUSTERED INDEX CX_FactSales ON dbo.FactSales (SalesKey)
-- OPTION #2: make the primary key also the clustered index:
-- unique, but wide
ALTER TABLE dbo.FactSales DROP CONSTRAINT PK_FactSales
ALTER TABLE dbo.FactSales ADD CONSTRAINT PK_FactSales PRIMARY KEY CLUSTERED (DateOfSaleKey, StoreKey, CheckNumber, SaleItemKey)
-- OPTION #3: base the clustered index on how the data is inserted
-- optimised for inserting new data, but not unique
CREATE CLUSTERED INDEX CX_FactSales ON dbo.FactSales (DateOfSaleKey, StoreKey)
-- OPTION #4: base the clustered index on how the data is selected
-- optimised for inserting new data and some reports, but not unique and getting wider
CREATE CLUSTERED INDEX CX_FactSales ON dbo.FactSales (DateOfSaleKey, StoreKey, SaleItemKey)
-- OPTION #5: base the clustered index on how the data is selected -- more selective (so it covers more reports)
-- optimised for inserting new data and more reports, but not unique and even wider
CREATE CLUSTERED INDEX CX_FactSales ON dbo.FactSales (DateOfSaleKey, StoreKey, SaleItemKey, CheckNumber)
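I would generally go with a clustered columnstore index. It gives you the best compression, the fastest scans, automatic rowgroup elimination on every column, and per-column scanning and caching.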
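As a rough sketch of what that could look like on the table from the question: the index name CCI_FactSales is made up for illustration, and this assumes the primary key stays NONCLUSTERED as in the original script, so uniqueness is still enforced by a B-tree index alongside the columnstore.

```sql
-- Sketch only: CCI_FactSales is a hypothetical name.
-- Assumes dbo.FactSales has no clustered index yet (none of OPTION #1-#5 applied)
-- and that PK_FactSales remains a NONCLUSTERED primary key.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;
```

Because the table is loaded one check at a time, small inserts like these go into delta rowgroups first and are compressed into columnstore format later, so it is worth keeping an eye on rowgroup health under this load pattern.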
My second choice would be to make the PK the clustered index (your Option #2).
Note that you can partition on any one of these columns, and it is usually not the leading column. StoreKey might be a good choice here.
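A minimal sketch of partitioning the clustered PK on StoreKey might look like the following. The names pf_StoreKey and ps_StoreKey, the boundary values, and the use of the PRIMARY filegroup are all assumptions for illustration; real boundaries would depend on how your store keys are distributed.

```sql
-- Sketch only: function/scheme names and boundary values are hypothetical.
CREATE PARTITION FUNCTION pf_StoreKey (INT)
    AS RANGE RIGHT FOR VALUES (100, 200, 300);

-- Assumption: every partition is placed on the PRIMARY filegroup.
CREATE PARTITION SCHEME ps_StoreKey
    AS PARTITION pf_StoreKey ALL TO ([PRIMARY]);

-- Rebuild the PK as a clustered index on the partition scheme.
-- StoreKey is part of the key, which a partitioned unique index requires.
ALTER TABLE dbo.FactSales DROP CONSTRAINT PK_FactSales;

ALTER TABLE dbo.FactSales ADD CONSTRAINT PK_FactSales
    PRIMARY KEY CLUSTERED (DateOfSaleKey, StoreKey, CheckNumber, SaleItemKey)
    ON ps_StoreKey (StoreKey);
```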