Among the plethora of files that make up our Nagios server, there is this service check for load:
define service{
    use                     generic-service
    name                    check-load
    hostgroup_name          nrpe-hosts,!webnodes,!build-cluster
    notification_options    c,r
    service_description     NRPE - Load
    check_command           check_nrpe!check_load
    contacts                irc
}
And there are two contacts:
define contact{
    contact_name                    irc
    alias                           ircbot
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r,f
    service_notification_options    w,u,c,r,f
    service_notification_commands   notify-by-epager
    host_notification_commands      host-notify-by-epager
    pager                           [email protected]
}
define contact {
    contact_name                    pagerduty
    alias                           PagerDuty Pseudo-Contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    u,c,r
    host_notification_options       d,r
    service_notification_commands   notify-service-by-pagerduty
    host_notification_commands      notify-host-by-pagerduty
    pager                           lol-no
}
Edit: Also, the template the service inherits from:
define service{
    name                    generic-service
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   3
    retry_check_interval    1
    notification_interval   0
    notification_period     24x7
    notification_options    w,c,r
    register                0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
Edit 2: And the notification command definition, just for the doubters ;):
# 'notify-by-epager' command definition
define command{
    command_name    notify-by-epager
    command_line    /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nInfo: $SERVICEOUTPUT$\nDate: $LONGDATETIME$" | /bin/mail -s "$NOTIFICATIONTYPE$: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
}
Edit 3: And the host definition:
define host{
    host_name       vmprod1
    alias           vmprod1.example.com
    address         192.1.1.123
    use             generic-host
    hostgroups      nrpe-hosts,vm-hosts,vm-prod,dellraid-hosts
    contact_groups  example,example-pager
}
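This is the only check with the service description "NRPE - Load". From my reading, it should alert only the irc contact, not the pagerduty contact. Yet I received over 100 "NRPE - Load" alerts in PagerDuty last month.
What am I missing?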
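To repay my gratitude, I'll answer my own question. It turns out that services implicitly inherit from their hosts, so the service check above ends up with the contact it sets explicitly (irc) plus the contact groups inherited from the host definition (example,example-pager).
A simple fix to the service check did the trick: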
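The exact change is not reproduced here, so the following is only a minimal sketch of one way to do it, not necessarily what was actually deployed. It assumes, per my reading of the Nagios object inheritance documentation, that a service only picks up the host's contact_groups when it does not set contact_groups itself. The contact group name irc-only is hypothetical and does not appear in the original configuration.

```
# Hypothetical contact group containing only the irc contact
define contactgroup{
    contactgroup_name   irc-only        ; assumed name, not from the original config
    alias               IRC notifications only
    members             irc
}

define service{
    use                     generic-service
    name                    check-load
    hostgroup_name          nrpe-hosts,!webnodes,!build-cluster
    notification_options    c,r
    service_description     NRPE - Load
    check_command           check_nrpe!check_load
    contact_groups          irc-only    ; set explicitly so the host's contact_groups are not inherited
}
```

With contact_groups set on the service itself, the host's example,example-pager groups should no longer apply to this check. Running the Nagios pre-flight check (nagios -v against the main config file) before reloading is a reasonable way to confirm the new definitions parse.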