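I load data into a data warehouse.
Currently, for convenience and speed, I have a script that disables the foreign-key constraints and indexes before the data is loaded. There is a large window for the load, so I don't need to worry about users accessing the data while it runs, but I don't want to affect unrelated data in other tables in the database.
I did some research here and elsewhere to come up with this script, but I'd like to know whether I'm overlooking something that could lead to sub-optimal performance, whether I'm missing something important (I don't know... computed columns or something?), or whether I'm doing things in the wrong order.
Any advice on making this robust and performant is welcome.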
Disable constraints and indexes
Edit: I removed the WHILE loops, which commenters helped me realize were redundant.
Declare @schema nvarchar(128) = N'dbo';
Declare @sql nvarchar(max) = N'';
-- 1. Indices
-- Select a list of indexes in the schema and generate statements to disable them.
Select @sql = @sql + 'ALTER INDEX ' + QuoteName(idx.name) + ' ON ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' DISABLE;' + CHAR(13)
From sys.indexes As idx
Join sys.objects As obj On idx.object_id = obj.object_id
Where ((obj.type = 'U' And idx.type in (2,6)) -- Non-clustered index/columnstore on a table
Or obj.type = 'V') -- All indexes on indexed views
And obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name, idx.name;
Execute sp_executesql @sql;
-- Reset the buffer so the index statements above are not executed a second time.
Set @sql = N'';
-- 2. Foreign-key constraints
-- Build a list of foreign keys constraints in the schema and generate statements to disable the constraint checking.
Select @sql = @sql + 'ALTER TABLE ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' NOCHECK CONSTRAINT ' + QuoteName(fk.name) + ';' + CHAR(13)
From sys.foreign_keys As fk
Join sys.objects As obj On fk.parent_object_id = obj.object_id
Where obj.schema_id = (Select schema_id From sys.schemas Where name = @schema);
Execute sp_executesql @sql;
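A quick sanity check after the disable script, before the load starts, can confirm that only objects in the intended schema were touched. This is a minimal sketch that reads the same catalog views the script uses; the `@schema` value is assumed to match the one above.

```sql
Declare @schema nvarchar(128) = N'dbo'; -- assumption: same schema as the disable script

-- Indexes in the schema that are currently disabled
Select obj.name As table_name, idx.name As index_name, idx.type_desc
From sys.indexes As idx
Join sys.objects As obj On obj.object_id = idx.object_id
Where obj.schema_id = Schema_Id(@schema)
  And idx.is_disabled = 1
Order By obj.name, idx.name;

-- Foreign keys in the schema that are currently disabled
Select obj.name As table_name, fk.name As constraint_name
From sys.foreign_keys As fk
Join sys.objects As obj On obj.object_id = fk.parent_object_id
Where obj.schema_id = Schema_Id(@schema)
  And fk.is_disabled = 1
Order By obj.name, fk.name;
```

Running the same two queries without the schema filter before and after the script is one way to verify that other schemas were left alone.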
Enable constraints, rebuild indexes, and update statistics
Declare @schema nvarchar(128) = 'dbo';
Declare @sql nvarchar(max) = N'';
-- 1. Indices
-- Build a list of tables in the schema and generate statements to enable the indices on them.
Select @sql = @sql + 'ALTER INDEX ' + QuoteName(idx.name) + ' ON ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' REBUILD' + iif(idx.type = 6, ' WITH (MAXDOP = 1);', ' WITH (FILLFACTOR = 100);') + CHAR(13)
From sys.indexes idx
Join sys.objects obj ON obj.object_id = idx.object_id
Where ((obj.type = 'U' And idx.type in (2,6)) -- Non-clustered index/columnstore on a table
Or obj.type = 'V') -- All indexes on indexed views
And obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
And idx.is_disabled = 1 -- Don't rebuild indexes that are already online
And idx.is_hypothetical = 0 -- Don't rebuild hypothetical indexes!
Order By iif(idx.type = 6, 1, 2), obj.name, idx.name;
Execute sp_executesql @sql;
-- Reset the buffer so the index rebuilds above are not executed a second time.
Set @sql = N'';
-- 2. Foreign-key constraints
-- Build a list of foreign keys constraints in the schema and generate statements to enable them with checking.
Select @sql = @sql + 'ALTER TABLE ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' WITH CHECK CHECK CONSTRAINT ' + QuoteName(fk.name) + ';' + CHAR(13)
From sys.foreign_keys fk
Join sys.objects obj ON obj.object_id = fk.parent_object_id
Where obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name, fk.name;
Execute sp_executesql @sql;
-- Reset the buffer so the earlier statements are not executed again.
Set @sql = N'';
-- 3. Statistics
-- Build a list of tables in the schema and generate statements to update the statistics on them.
Select @sql = @sql + 'UPDATE STATISTICS ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' WITH COLUMNS;' + CHAR(13)
From sys.objects obj
Where obj.type = 'U' -- User defined
AND obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name;
Execute sp_executesql @sql;
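Once the enable script has run, the catalog can be queried for foreign keys that are still disabled or that SQL Server does not trust. The following is only a sketch under the assumption that the schema is the same `dbo` used above; an empty result suggests every foreign key was re-enabled and validated.

```sql
Declare @schema nvarchar(128) = N'dbo'; -- assumption: same schema as the enable script

-- Foreign keys that are still disabled, or enabled without having been validated
Select obj.name As table_name,
       fk.name As constraint_name,
       fk.is_disabled,
       fk.is_not_trusted
From sys.foreign_keys As fk
Join sys.objects As obj On obj.object_id = fk.parent_object_id
Where obj.schema_id = Schema_Id(@schema)
  And (fk.is_disabled = 1 Or fk.is_not_trusted = 1)
Order By obj.name, fk.name;
```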
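Some things you may not have considered:

For performance: use ONLINE = OFF for the rebuilds. With no users on the tables during the load window, an offline rebuild takes a single table-level lock and skips the extra bookkeeping of an online build, which may improve performance. Also consider the SORT_IN_TEMPDB = ON option to improve rebuild performance.

In your case, disabling the foreign keys before loading the data is a bit worrying, because your script has no error handling (TRY .. CATCH blocks) around the step that enables them again. After the data is loaded, how do you check the referential integrity of the database? Simply enabling the foreign keys does not guarantee referential integrity. At a minimum, you should run DBCC CHECKCONSTRAINTS and make sure it comes back clean. Reference: Foreign keys and their states.

Another point I see is double work: what is the point of updating statistics after rebuilding the indexes? Remember that rebuilding an index updates the statistics on the columns associated with that index.

I highly recommend reading The Data Loading Performance Guide and the various techniques mentioned in my answer.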
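To make the suggestions above concrete, here is a rough sketch for a single table. The object names (dbo.FactSales, IX_FactSales_DateKey, FK_FactSales_DimDate) are hypothetical and only stand in for whatever the generated statements would contain; THROW assumes SQL Server 2012 or later, which the IIF calls in the question already require.

```sql
-- Hypothetical names throughout: dbo.FactSales, IX_FactSales_DateKey, FK_FactSales_DimDate
Begin Try
    -- Offline rebuild that sorts in tempdb, as suggested above
    Alter Index [IX_FactSales_DateKey] On [dbo].[FactSales]
        Rebuild With (ONLINE = OFF, SORT_IN_TEMPDB = ON, FILLFACTOR = 100);

    -- Re-enable the foreign key and validate the existing rows
    Alter Table [dbo].[FactSales]
        With Check Check Constraint [FK_FactSales_DimDate];
End Try
Begin Catch
    -- Report which step failed, then re-raise so the load job is marked as failed
    Declare @msg nvarchar(2048) = Error_Message();
    Print N'Post-load step failed: ' + @msg;
    Throw;
End Catch;

-- Reports rows that violate any constraint on the table, enabled or disabled;
-- no rows returned means the check came back clean.
DBCC CHECKCONSTRAINTS (N'dbo.FactSales') With ALL_CONSTRAINTS;
```

If the CATCH block re-raises the error, the rest of the batch does not run, so the DBCC check is only reached when the rebuild and the constraint check both succeeded.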
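Watch out for hypothetical indexes. If a tuning tool crashed and left them behind without dropping them, your script will turn them into real indexes, and you will end up with 30 or 40 indexes on a single table, with performance on that table riddled with blocking and deadlocks.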
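One way to find leftover hypothetical indexes before running any rebuild is to query sys.indexes directly. This is a sketch only: it lists them together with a DROP INDEX statement to review, and deliberately does not execute anything.

```sql
-- List hypothetical indexes left behind by tuning tools, with a DROP statement to review
Select Schema_Name(obj.schema_id) As table_schema,
       obj.name As table_name,
       idx.name As index_name,
       N'DROP INDEX ' + QuoteName(idx.name)
           + N' ON ' + QuoteName(Schema_Name(obj.schema_id))
           + N'.' + QuoteName(obj.name) + N';' As drop_statement
From sys.indexes As idx
Join sys.objects As obj On obj.object_id = idx.object_id
Where idx.is_hypothetical = 1
Order By table_schema, table_name, index_name;
```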
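Here are my final scripts after incorporating the feedback from the different respondents (I just wanted to collect everything in one place for future readers):
Disable constraints and indexes
Rebuild indexes, enable constraints, and update statistics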