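I have AlwaysOn set up with two data nodes and one witness. I ran into a problem where my primary server restarted unexpectedly.
These are the issues I faced during that time:
Because the PRIMARY replica restarted abruptly, the DB went into RECOVERY mode.
The database took about 1 hour to recover.
While the original primary replica was recovering, the secondary server (now PRIMARY because of the FAILOVER) hit timeouts and was noticeably slow.
Looking at the logs, I can see entries about roll forward and rollback happening for the database. But first I would like to know: why does my DB take so long to recover?
Also, I would like some input on whether adding one more secondary node to this setup would actually help.
Adding the error logs: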
2015-10-12 16:20:26.30 spid31s The recovery LSN (6821:15912:1) was identified for the database with ID 12. This is an informational message only. No user action is required.
2015-10-12 16:23:56.81 spid44s Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:23:57.28 spid44s Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:23:57.28 spid44s Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:24:17.32 spid44s Recovery of database 'A' (11) is 31% complete (approximately 44 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:24:50.53 spid6s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [Z:\SQLData\A\A.mdf] in database [A] (11). The OS file handle is 0x0000000000000B9. The offset of the latest long I/O is: 0x0000041135800
2015-10-12 16:24:55.46 spid44s Recovery of database 'A' (11) is 53% complete (approximately 51 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:25:15.50 spid44s Recovery of database 'A' (11) is 85% complete (approximately 13 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:25:49.69 spid44s 3033 transactions rolled forward in database 'A' (11:0). This is an informational message only. No user action is required.
2015-10-12 16:25:50.15 spid44s Recovery completed for database A (database ID 11) in 290 second(s) (analysis 480 ms, redo 87801 ms, undo 0 ms.) This is an informational message only. No user action is required.
2015-10-12 16:25:52.35 spid44s CHECKDB for database 'A' finished without errors on 2012-10-27 22:19:16.470 (local time). This is an informational message only; no user action is required.
2015-10-12 16:42:57.67 spid24s AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:57.67 spid24s The recovery LSN (216726:2384:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:42:57.91 spid24s AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:57.91 spid24s The recovery LSN (216726:2384:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid24s Error: 35278, Severity: 17, State: 1.
2015-10-12 16:42:58.53 spid24s Availability database 'A', which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid29s Nonqualified transactions are being rolled back in database A for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid38s AlwaysOn Availability Groups connection with primary database terminated for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:58.79 spid29s Starting up database 'A'.
2015-10-12 16:49:45.32 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:49:45.78 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:49:45.78 spid29s Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:05.80 spid29s Recovery of database 'A' (11) is 35% complete (approximately 37 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:25.82 spid29s Recovery of database 'A' (11) is 68% complete (approximately 18 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s The recovery LSN (216726:2386:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s Error: 35286, Severity: 16, State: 1.
2015-10-12 16:50:44.66 spid29s Using the recovery LSN (216726:2384:1) stored in the metadata for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:53:12.38 spid29s Error: 35278, Severity: 17, State: 1.
2015-10-12 16:53:12.38 spid29s Availability database 'A', which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2015-10-12 16:53:12.38 spid29s Nonqualified transactions are being rolled back in database A for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.
2015-10-12 16:53:12.40 spid31s AlwaysOn Availability Groups connection with primary database terminated for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:53:14.45 spid29s Starting up database 'A'.
2015-10-12 17:05:44.30 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 17:05:44.76 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 17:05:44.76 spid29s Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:04.88 spid29s Recovery of database 'A' (11) is 31% complete (approximately 45 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:24.92 spid29s Recovery of database 'A' (11) is 65% complete (approximately 21 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:45.55 spid29s AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 17:06:45.55 spid29s The recovery LSN (216726:19027:80) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 17:06:48.97 spid29s 3034 transactions rolled forward in database 'A' (11:0). This is an informational message only. No user action is required.
2015-10-12 17:06:49.04 spid29s Recovery completed for database A (database ID 11) in 710 second(s) (analysis 470 ms, redo 60412 ms, undo 0 ms.) This is an informational message only. No user action is required.
2015-10-12 17:06:49.07 spid20s AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 17:06:49.08 spid20s The recovery LSN (216726:19027:80) was identified for the database with ID 11. This is an informational message only. No user action is required.
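The problem may lie in the error you are seeing: Error: 35278. When this occurs, you may observe the database staying in recovery for a long time.
This can have several causes, most commonly a long-running transaction.
The timeouts you experienced may be caused by the traffic sent between the replicas to recover the database and bring it back, but are you sure the timeouts are not being caused by some other issue?
I would be interested to know what your backup strategy is for this database, and when the last full backup ran before this failover. I recently ran into this in a test environment where no backups had been taken. After a full backup and a subsequent log backup, failover happened without issue, the error did not appear, and recovery was very quick.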
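To see where the hour went, it may help to summarize the error log rather than read it line by line. The following is a minimal Python sketch, not a definitive tool: it counts Error 35278 restarts, measures the time between each "Starting up database" message and the first recovery progress message that follows it, and compares the total in each "Recovery completed" message with the sum of its analysis, redo, and undo phases. The function name `summarize`, the regular expressions, and the idea of treating that gap as a useful metric are my own assumptions. The sample lines are copied from the log in the question.

```python
import re
from datetime import datetime

# Leading timestamp and spid of an error log line, e.g.
# "2015-10-12 16:42:58.79 spid29s Starting up database 'A'."
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+(.*)$")

# The "Recovery completed" message with its per-phase timings.
DONE = re.compile(
    r"Recovery completed for database (\S+) \(database ID \d+\) "
    r"in (\d+) second\(s\) "
    r"\(analysis (\d+) ms, redo (\d+) ms, undo (\d+) ms\.\)"
)


def summarize(lines):
    """Summarize restarts and recovery timings (hypothetical helper)."""
    resync_restarts = 0   # number of "Error: 35278" lines
    startup_gaps = []     # seconds from startup to first progress message
    completions = []      # one dict per "Recovery completed" message
    pending_start = None  # timestamp of the latest unmatched startup

    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = m.group(2)

        if msg.startswith("Error: 35278"):
            resync_restarts += 1
        elif msg.startswith("Starting up database"):
            pending_start = when
        elif msg.startswith("Recovery of database") and pending_start is not None:
            startup_gaps.append((when - pending_start).total_seconds())
            pending_start = None
        else:
            done = DONE.search(msg)
            if done:
                total_s, analysis, redo, undo = (int(done.group(i)) for i in range(2, 6))
                phases_s = (analysis + redo + undo) / 1000.0
                completions.append({
                    "database": done.group(1),
                    "total_s": total_s,
                    "phases_s": phases_s,
                    # Time reported in the total but not in any phase.
                    "unaccounted_s": total_s - phases_s,
                })

    return {
        "resync_restarts": resync_restarts,
        "startup_gaps_s": startup_gaps,
        "completions": completions,
    }


# Lines copied from the error log in the question.
SAMPLE_LOG = [
    "2015-10-12 16:42:58.53 spid24s Error: 35278, Severity: 17, State: 1.",
    "2015-10-12 16:42:58.79 spid29s Starting up database 'A'.",
    "2015-10-12 16:49:45.32 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.",
    "2015-10-12 16:53:12.38 spid29s Error: 35278, Severity: 17, State: 1.",
    "2015-10-12 16:53:14.45 spid29s Starting up database 'A'.",
    "2015-10-12 17:05:44.30 spid29s Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.",
    "2015-10-12 17:06:49.04 spid29s Recovery completed for database A (database ID 11) in 710 second(s) (analysis 470 ms, redo 60412 ms, undo 0 ms.) This is an informational message only. No user action is required.",
]

if __name__ == "__main__":
    for key, value in summarize(SAMPLE_LOG).items():
        print(key, value)
```

On these sample lines the sketch reports two restarts, with roughly 407 and 750 seconds passing between startup and the first progress message. The final recovery also reports a total far larger than its three phases combined. The script says nothing about the cause of that extra time; it only makes it easier to see which part of the timeline to investigate.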