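We had a power outage at our data center last week, and when our dual PIX 515Es running IOS 7.0(8) (configured with a failover cable) came back up, they were in a failed-over state: the secondary unit is active and the primary unit is standby. I've tried "failover reset", "failover active", and "failover reload-standby", as well as reloading both units in various orders, and they won't return to primary/active, secondary/standby. The only thing I haven't tried is driving to the data center and doing a hard reboot, which I'm loath to do.
I've read through how failover works on the Cisco Secure Firewall, and it looks like this should be straightforward.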
show failover output on the primary:
Failover On
Cable status: Normal
Failover unit Primary
Failover LAN Interface: N/A - Serial-based failover enabled
Unit Poll frequency 15 seconds, holdtime 45 seconds
Interface Poll frequency 15 seconds
Interface Policy 1
Monitored Interfaces 2 of 250 maximum
Version: Ours 7.0(8), Mate 7.0(8)
Last Failover at: 02:52:05 UTC Mar 10 2010
This host: Primary - Standby Ready
Active time: 0 (sec)
Interface outside (x.x.x.165): Normal
Interface inside (y.y.y.3): Normal
Other host: Secondary - Active
Active time: 897045 (sec)
Interface outside (x.x.x.164): Normal
Interface inside (y.y.y.4): Normal
Stateful Failover Logical Update Statistics
Link : Unconfigured.
show failover output on the secondary:
Failover On
Cable status: Normal
Failover unit Secondary
Failover LAN Interface: N/A - Serial-based failover enabled
Unit Poll frequency 15 seconds, holdtime 45 seconds
Interface Poll frequency 15 seconds
Interface Policy 1
Monitored Interfaces 2 of 250 maximum
Version: Ours 7.0(8), Mate 7.0(8)
Last Failover at: 02:03:04 UTC Feb 28 2010
This host: Secondary - Active
Active time: 896925 (sec)
Interface outside (x.x.x.164): Normal
Interface inside (y.y.y.4): Normal
Other host: Primary - Standby Ready
Active time: 0 (sec)
Interface outside (x.x.x.165): Normal
Interface inside (y.y.y.3): Normal
Stateful Failover Logical Update Statistics
Link : Unconfigured.
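For anyone who wants to check the role assignment without reading the whole output by eye, here is a minimal sketch in Python. It only parses pasted `show failover` text like the output above; it does not talk to the firewall, and the function names (`parse_failover_roles`, `primary_is_active`) are made up for illustration.

```python
import re

# Matches the "This host:" / "Other host:" lines of `show failover` output,
# e.g. "This host: Primary - Standby Ready".
HOST_LINE = re.compile(
    r"^\s*(This|Other) host:\s*(Primary|Secondary)\s*-\s*(.+?)\s*$"
)


def parse_failover_roles(output):
    """Return {"This": (unit, state), "Other": (unit, state)} from the text."""
    roles = {}
    for line in output.splitlines():
        match = HOST_LINE.match(line)
        if match:
            which, unit, state = match.groups()
            roles[which] = (unit, state)
    return roles


def primary_is_active(roles):
    """True when the unit labelled Primary reports the Active state."""
    return any(
        unit == "Primary" and state == "Active"
        for unit, state in roles.values()
    )


# Lines copied from the primary's output above.
sample = """\
This host: Primary - Standby Ready
Active time: 0 (sec)
Other host: Secondary - Active
Active time: 897045 (sec)
"""

roles = parse_failover_roles(sample)
print(roles)
print("Primary active:", primary_is_active(roles))
```

With the output above, both units agree on the picture: the primary is Standby Ready and the secondary is Active.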
I'm seeing the following in my syslog:
Mar 10 03:05:00 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command.
Mar 10 03:05:09 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reload-standby' command.
Mar 10 03:05:12 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed.
Mar 10 03:05:12 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed.
Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed.
Mar 10 03:06:09 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down.
Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed.
Mar 10 03:06:10 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up.
Mar 10 03:06:10 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed.
Mar 10 03:06:23 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready.
Mar 10 03:06:23 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready.
Mar 10 03:06:24 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready.
Mar 10 03:07:05 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command.
Mar 10 03:07:31 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover active' command.
Mar 10 03:08:04 fw1 %PIX-5-611103: User logged out: Uname: enable_1
Mar 10 03:08:04 fw1 %PIX-6-315011: SSH session from admin1_int on interface inside for user "pix" terminated normally
Mar 10 03:08:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed.
Mar 10 03:08:39 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed.
Mar 10 03:09:10 fw1 %PIX-6-605005: Login permitted from admin1_int/36891 to inside:192.168.4.4/ssh for user "pix"
Mar 10 03:09:23 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command.
Mar 10 03:09:38 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed.
Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down.
Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed.
Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up.
Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed.
Mar 10 03:09:52 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready.
Mar 10 03:09:52 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready.
Mar 10 03:09:53 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready.
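I'm not sure how to interpret that syslog data. The primary doesn't even seem to attempt to become active. When I reload the units one at a time, my connections are preserved, so it doesn't look like I have an actual hardware failure. Is there anything I can query (IOS or SNMP) to check for hardware problems?
Any ideas? My IOS-fu is weak.
Thanks for any help, Aaron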
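One way to make the syslog excerpt easier to read is to keep only the state-change lines (message IDs 720027 and 720028, as they appear above) and drop the rest. The following is a rough Python sketch under that assumption; the regular expression is fitted to the line format in this excerpt, and `state_changes` is a hypothetical helper name, not part of any Cisco tool.

```python
import re

# Fitted to lines such as:
# Mar 10 03:05:12 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed.
STATE_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+%PIX-6-7200(?:27|28):\s+"
    r"\((?P<unit>VPN-\w+)\)\s+HA status callback:\s+"
    r"(?P<who>My|Peer) state (?P<state>.+?)\.$"
)


def state_changes(lines):
    """Return (time, host, unit, who, state) for each state-change line."""
    changes = []
    for line in lines:
        match = STATE_LINE.match(line.strip())
        if match:
            changes.append(
                (match["time"], match["host"], match["unit"],
                 match["who"], match["state"])
            )
    return changes


# Lines copied from the syslog excerpt above.
log = [
    "Mar 10 03:05:00 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command.",
    "Mar 10 03:05:12 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed.",
    "Mar 10 03:06:23 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready.",
    "Mar 10 03:06:24 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready.",
]

for change in state_changes(log):
    print(change)
```

Filtered this way, the excerpt shows the peer going from Failed to Standby Ready after each reset or reload, and the primary only ever reporting Standby Ready.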
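Please do NOT use the no failover command that natacado mentions. Instead, use the no failover active command on the secondary (currently active) firewall. The first command turns failover off; the second relinquishes the active role to the other firewall in the HA pair. If you run failover active instead, run it on the primary (currently standby) firewall. I don't believe the PIX offers a feature that lets the primary automatically preempt once it is ready to handle traffic again.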
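To spell out which command goes on which unit, here is a small Python sketch of the advice in this answer. It is only a lookup of the two manual options described above, not a tool that connects to a device, and `commands_to_restore_primary` is a name invented for this example.

```python
def commands_to_restore_primary(active_unit):
    """Return the manual options from this answer as (unit, command) pairs.

    Each pair is an alternative, not a sequence: run ONE of them on the
    unit named in the pair.
    """
    if active_unit == "Primary":
        # Primary is already active, so there is nothing to hand back.
        return []
    if active_unit == "Secondary":
        return [
            # Give up the active role from the unit that currently holds it.
            ("Secondary", "no failover active"),
            # Or claim the active role from the unit that should hold it.
            ("Primary", "failover active"),
        ]
    raise ValueError("active_unit must be 'Primary' or 'Secondary'")


for unit, command in commands_to_restore_primary("Secondary"):
    print("on the", unit, "unit:", command)
```

Note that plain no failover appears in neither option, since that command disables failover rather than moving the active role.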
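Please post your failover configuration ("show run failover"). Alternatively, try enabling preemption (you will need to specify manually which unit is primary and which is secondary).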
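At least on ASA5500-series devices, you need to run the following command on the VPN-Primary:
no failover
This should also apply to a PIX with a relatively recent OS. Essentially, you can think of failover as a command that tells the unit to try to make the secondary unit the active one, and, like many configuration commands, no failover removes that action.
FWIW, the only way we were able to resolve this was to physically power off both firewalls and then bring them back up in the correct order. None of the suggestions above fixed the problem for me. Thanks to everyone for your time and help, though.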