The log file drive for one of my production databases filled up, and I noticed the following messages in the SQL Server error log.
2022-02-13 03:16:24.500 spid72 Error: 9002, Severity: 17, State: 4.
2022-02-13 03:16:24.500 spid72 The transaction log for database 'huge_database' is full due to 'ACTIVE_TRANSACTION'.
2022-02-13 03:16:26.060 spid72 Error: 9002, Severity: 17, State: 4.
2022-02-13 03:16:26.060 spid72 The transaction log for database 'huge_database' is full due to 'ACTIVE_TRANSACTION'.
2022-02-13 03:16:26.070 spid72 Error: 3314, Severity: 21, State: 3.
2022-02-13 03:16:26.070 spid72 During undoing of a logged operation in database 'huge_database', an error occurred at log record ID (2766:550754:254). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2022-02-13 03:16:26.090 spid72 Database huge_database was shutdown due to error 3314 in routine 'XdesRMReadWrite::RollbackToLsn'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
2022-02-13 03:16:26.090 spid72 Error: 3314, Severity: 21, State: 5.
2022-02-13 03:16:26.090 spid72 During undoing of a logged operation in database 'huge_database', an error occurred at log record ID (2678:51796:1). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2022-02-13 03:16:32.900 spid48s Starting up database 'huge_database'.
2022-02-13 03:16:37.980 spid48s Recovery of database 'huge_database' (20) is 0% complete (approximately 89573 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2022-02-13 03:16:57.980 spid48s Recovery of database 'huge_database' (20) is 0% complete (approximately 60835 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
...
2022-02-13 04:26:15.160 spid36s Recovery of database 'huge_database' (20) is 85% complete (approximately 721 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
...
2022-02-13 04:36:35.230 spid36s Recovery of database 'huge_database' (20) is 99% complete (approximately 1 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2022-02-13 04:36:35.790 spid36s 1 transactions rolled back in database 'huge_database' (20:0). This is an informational message only. No user action is required.
2022-02-13 04:36:35.790 spid36s Recovery is writing a checkpoint in database 'huge_database' (20). This is an informational message only. No user action is required.
2022-02-13 04:36:36.850 spid36s Recovery completed for database huge_database (database ID 20) in 4804 second(s) (analysis 4881 ms, redo 1196931 ms, undo 3600721 ms.) This is an informational message only. No user action is required.
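So it looks like the database log drive filled up and the database then went through recovery, which took over an hour. Why was recovery needed in this case? I tried to reproduce it but failed. Here is my code: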
CREATE DATABASE MyDatabase
ON
(NAME = MyDatabase_Data,
FILENAME = 'f:\mssql\data\MyDatabase_Data.mdf',
SIZE = 10MB,
MAXSIZE = 1000MB,
FILEGROWTH = 5MB)
LOG ON
(NAME = MyDatabase_Log,
FILENAME = 'U:\data\MyDatabase_Log.ldf',
SIZE = 5MB,
MAXSIZE = 500MB,
FILEGROWTH = 1MB);
GO
USE MyDatabase;
GO
CREATE TABLE MyTable (
ID INT IDENTITY(1,1) PRIMARY KEY,
Data VARCHAR(MAX) NOT NULL
);
GO
BEGIN TRANSACTION;
DECLARE @i INT = 0;
WHILE @i < 1000000
BEGIN
INSERT INTO MyTable (Data) VALUES (REPLICATE('A', 8000));
SET @i = @i + 1;
END;
-- Don't commit the transaction
-- COMMIT TRANSACTION;
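When I ran my code, I got the following in SSMS:
Msg 9002, Level 17, State 4, Line 30
The transaction log for database 'MyDatabase' is full due to 'ACTIVE_TRANSACTION' and the holdup lsn is (39:24:1).
The SQL Server error log contains the following: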
2023-08-18 02:59:54.200 spid84 Error: 17053, Severity: 16, State: 1.
2023-08-18 02:59:54.200 spid84 U:\data\MyDatabase_Log.ldf: Operating system error 112(There is not enough space on the disk.) encountered.
2023-08-18 02:59:55.200 spid84 Error: 9002, Severity: 17, State: 4.
2023-08-18 02:59:55.200 spid84 The transaction log for database 'MyDatabase' is full due to 'ACTIVE_TRANSACTION' and the holdup lsn is (39:24:1).
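There was no recovery process. Why? How can I simulate my production database problem?
Update:
By the way, I could still access the production database while it was recovering. It showed the same state in sys.databases as the other databases. Is this expected? I thought a database in recovery could not be accessed.
Update 2:
During undoing of a logged operation in database 'huge_database', an error occurred at log record ID (2678:51796:1).
This error sounds like SQL Server did not reserve enough space to roll back. I remember Paul Randal saying that SQL Server always reserves some log space for rollback. If so, why did this happen?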
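There was a problem undoing the transaction, which left the database in an inconsistent state; in fact, the database should have moved to the suspect state. From the errors you can see that SQL Server restarted the database automatically.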
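As Dan Guzman said, Enterprise edition has "fast recovery", which allows the database to open to users once redo has completed and locks have been taken for all outstanding undo items. You can see when this happens in the log, where phase 1 is analysis, phase 2 is redo, and phase 3 is undo.
Note that the "..." in your original data is where the database became available, i.e. some time after "2022-02-13 03:16:57.980".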
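If it helps to confirm this on your side, here is a minimal sketch that checks the edition of the instance and the state that sys.databases reports for the database. It assumes the database name from your log ('huge_database'). With fast recovery I would expect the state to read ONLINE while undo is still running, which would match what you saw, but treat that as something to verify rather than a guarantee.

```sql
-- Fast recovery is an Enterprise edition feature, so check the edition first.
SELECT SERVERPROPERTY('Edition') AS edition;

-- State as reported by sys.databases; run this while recovery is in its undo phase.
SELECT name,
       state_desc
FROM sys.databases
WHERE name = N'huge_database';  -- database name taken from the posted error log
```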
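You did not hit the same problem. In your production example the changes could not be undone; in your repro nothing failed, the log simply filled up.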
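One way to see that the repro is only a "log full" condition is to look at why the log cannot be reused and which transaction is the oldest one still open. This is a rough sketch that assumes the repro database name from your script (MyDatabase) and that you run it from a second session while the first one still holds its transaction.

```sql
-- Why can the log not be truncated? For the repro this should point at the open transaction.
SELECT name,
       log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDatabase';

-- Oldest active transaction in the repro database, if there is one.
DBCC OPENTRAN (N'MyDatabase') WITH TABLERESULTS, NO_INFOMSGS;
```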
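The log manager makes an estimate of how much space to reserve for each transaction, without knowing at that point whether it is too little, too much, or just right. On top of that, some extra space, called "critical reserve" log space, is held back just in case. In some situations, if the reservation is too small and gets overrun, the log can run out of reserved space and an exception can occur while performing certain operations, such as rolling back a transaction. It should not happen often, but it can happen.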
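You can watch the reservation for an open transaction through the transaction DMVs. The sketch below is only an illustration: it assumes you run it in the repro database (MyDatabase) from a second session while the insert loop is running. It shows the bytes used and reserved per transaction, not the critical reserve itself, which as far as I know is not exposed in these views.

```sql
USE MyDatabase;  -- repro database from the question; change as needed
GO

-- Log bytes used and reserved (for a possible rollback) per active transaction.
SELECT st.session_id,
       dt.transaction_id,
       dt.database_transaction_begin_time,
       dt.database_transaction_log_record_count,
       dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_bytes_reserved
FROM sys.dm_tran_database_transactions AS dt
LEFT JOIN sys.dm_tran_session_transactions AS st
       ON st.transaction_id = dt.transaction_id
WHERE dt.database_id = DB_ID();

-- Overall log usage for the current database.
SELECT total_log_size_in_bytes,
       used_log_space_in_bytes,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;
```

Comparing the reserved bytes with the free space left in the log as the loop runs gives a feel for how much room a rollback is expected to need.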