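I have the following environment:
- a master
- several satellites assigned to the master
- many agents assigned to the satellites, and some agents assigned directly to the master (without a satellite).
All systems are up and running, and the PKI setup is complete. Most of the default checks (apt, disk, cpu) are running as well, and I can see their current status on the master. I have now started implementing custom checks (for example check_eth, to monitor network traffic). I deployed the script to all hosts and defined the command on all hosts: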
object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]
  arguments = {
    "-w" = {
      value = "$eth_warning$"
      description = "Percent free/used when to warn"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      description = "Percent free/used when critical"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      description = "Given network interface"
      required = true
    }
  }
  vars.eth_interface = "enp0s31f6"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}
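I can run the script manually on every host. On the master, I see check results for the satellites and for all hosts assigned directly to the master. On all hosts with parent=satellite, however, the status is UNKNOWN. That is my problem... why?
The zone and endpoint objects look like this: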
# master: /etc/icinga2/zones.conf
object Endpoint "monitor.domain" {
}
object Zone "master" {
  endpoints = [ "monitor.domain" ]
}
object Endpoint "satellite1.domain" {
  host = "<ip>"
  port = "<port>"
}
object Zone "satellite1.domain" {
  parent = "master"
  endpoints = [ "satellite1.domain" ]
}
The satellite configuration looks like this:
# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf
object Host "satellite1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "master"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}
object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}
...
The zones, including the endpoints, inside the satellite zone are also defined on the master:
# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
  parent = "satellite1.domain"
  endpoints = [ "agent1.domain" ]
}
object Endpoint "agent1.domain" {
  host = "<ip>"
  port = "<port>"
}
Now the command is applied to the hosts (this is also defined on the master):
# master: /etc/icinga2/zones.d/satellite1.domain/services.conf
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "satellite1.domain"
}
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "agent1.domain"
}
What am I missing?
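Ah, now I have found the problem.
The check command definition contains a default value for eth_interface that names an interface present on the satellites and the master. The virtual machines, however, use a different interface. If I remove the default variables from the check command and set them on each host object instead, everything works fine.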
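A minimal sketch of that fix, reusing the objects from the question: the vars defaults are dropped from the CheckCommand and set on the host instead. The interface name "ens18" is a hypothetical placeholder for whatever the virtual machine actually uses, and the thresholds are the values from the original command definition.

```
object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]
  arguments = {
    "-w" = {
      value = "$eth_warning$"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      required = true
    }
  }
  // no vars.* defaults here any more; each host supplies its own values
}

object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name

  // "ens18" is an assumed example; use the interface this host really has
  vars.eth_interface = "ens18"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}
```

Because all three arguments are marked required, every host that gets the "Network Traffic" service needs these three variables once the defaults are gone. Icinga 2's documented macro evaluation order looks up service and host custom variables before command variables, so keeping the defaults on the CheckCommand and overriding only vars.eth_interface on the hosts that differ should work as well, although that has not been verified against this setup.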