Monit

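The versions covered in this section are considerably older than the latest release, so some of the content may no longer match current information.

For details, see the official documentation.

This section was contributed by Kev. 🙂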


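Installation

The monit version installed via apt-get (5.6) is outdated and lacks many features, so install the deb package of the latest release at the time of writing (5.11) directly.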

wget http://mirrors.kernel.org/ubuntu/pool/universe/m/monit/monit_5.11-1_amd64.deb
sudo dpkg -i monit_5.11-1_amd64.deb


Configuration

/etc/monit/conf.d/setting.cfg
set httpd port 2812 and
    use address 0.0.0.0
    allow 0.0.0.0/0
    allow admin:admin
 
set mailserver smtp.gmail.com port 587 
    username "MYUSER" password "MYPASSWORD"
    using tlsv1
 
set mail-format {
     from: MYUSER@gmail.com
  subject: monit alert --  $EVENT $SERVICE
  message: $EVENT Service $SERVICE
                Date:        $DATE
                Action:      $ACTION
                Host:        $HOST
                Description: $DESCRIPTION
 
           Your faithful employee,
           Monit
}
 
set alert MYUSER@gmail.com

/etc/monit/conf.d/ubuntu.cfg
check host 192.168.1.111 with address 192.168.1.111
 
  if failed
    ping
  then
    alert
 
  if failed
    port 2375
    proto http
    request "/info"
    status 200
  then
    alert


Management

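# Check the configuration file syntax
sudo monit -t
# Reload the configuration file
sudo service monit reload
# Show detailed status of the monitored services
sudo monit status
# Show a status summary of the monitored services
sudo monit summary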

Web client: http://admin:admin@localhost:2812


Development

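  • The open-source monit ships only a simple web client and does not provide a RESTful API.
  • The commercial M/Monit offers a range of advanced features, including a RESTful API, but it is expensive.
  • monit can call the HipChat API to report program status in real time: see the HipChat documentation, #send_room_notification.

Based on an analysis of how M/Monit works, you can add one setting to monit's configuration file to capture the XML data submitted by the agent (that is, monit).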

/etc/monit/monitrc
set mmonit http://admin:admin@localhost:2811

Capture the XML data with ncat:

ncat -vlk 2811

Capture the XML data with nomit:

mmonit.py
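# Python 2 script: nomit handles the XML that monit posts to the "set mmonit" URL.
import BaseHTTPServer
import xml.dom.minidom
import lxml.objectify
import nomit

class ExampleHandler(nomit.MonitXmlHandler):
    def handle_unparsed(self, s):
        # s is the raw XML string; pretty-print it
        print xml.dom.minidom.parseString(s).toprettyxml()

    def handle_parsed(self, o):
        # o is the lxml.objectify tree built from the same payload
        print lxml.objectify.dump(o)

# Listen on the host and port given in "set mmonit"
server = BaseHTTPServer.HTTPServer(("127.0.0.1", 2811), ExampleHandler)
server.serve_forever()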
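
If nomit is not available, the same capture can be approximated with the Python 3 standard library alone. This is a minimal sketch, not a replacement for M/Monit: the names CollectorHandler, pretty_xml, and make_server are hypothetical, and it assumes monit sends the XML as an uncompressed POST body and is satisfied with an empty 200 response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import xml.dom.minidom


def pretty_xml(body):
    """Return an indented rendering of a raw XML payload (bytes or str)."""
    return xml.dom.minidom.parseString(body).toprettyxml()


class CollectorHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the collector that monit posts to."""

    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(pretty_xml(body))
        # Assumption: an empty 200 response is enough for the agent.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def make_server(host="127.0.0.1", port=2811):
    """Build a server bound to the address used in "set mmonit"."""
    return HTTPServer((host, port), CollectorHandler)
```

Calling make_server().serve_forever() starts the listener on 127.0.0.1:2811, the same address used in the "set mmonit" line above.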

The captured XML data looks like this:

capture.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<monit id="7f6467146ee2bdd9aa05b6f018b8c89a" incarnation="1418632062" version="5.10">
  <server>
    <uptime>246</uptime>
    <poll>120</poll>
    <startdelay>0</startdelay>
    <localhostname>localhost</localhostname>
    <controlfile>/etc/monit/monitrc</controlfile>
    <httpd>
      <address>0.0.0.0</address>
      <port>2812</port>
      <ssl>0</ssl>
    </httpd>
    <credentials>
      <username>admin</username>
      <password>admin</password>
    </credentials>
  </server>
  <platform>
    <name>Linux</name>
    <release>3.13.0-24-generic</release>
    <version>#46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014</version>
    <machine>x86_64</machine>
    <cpu>4</cpu>
    <memory>12205828</memory>
    <swap>12480508</swap>
  </platform>
  <services>
    <service name="192.168.1.111">
      <type>4</type>
      <collected_sec>1418632191</collected_sec>
      <collected_usec>16483</collected_usec>
      <status>0</status>
      <status_hint>0</status_hint>
      <monitor>1</monitor>
      <monitormode>0</monitormode>
      <pendingaction>0</pendingaction>
      <icmp>
        <type>Ping</type>
        <responsetime>0.000</responsetime>
      </icmp>
      <port>
        <hostname>192.168.1.111</hostname>
        <portnumber>2375</portnumber>
        <request><![CDATA[/info]]></request>
        <protocol>HTTP</protocol>
        <type>TCP</type>
        <responsetime>0.003</responsetime>
      </port>
    </service>
    <service name="localhost">
      <type>5</type>
      <collected_sec>1418632191</collected_sec>
      <collected_usec>37063</collected_usec>
      <status>0</status>
      <status_hint>0</status_hint>
      <monitor>1</monitor>
      <monitormode>0</monitormode>
      <pendingaction>0</pendingaction>
      <system>
        <load>
          <avg01>5.08</avg01>
          <avg05>4.90</avg05>
          <avg15>4.82</avg15>
        </load>
        <cpu>
          <user>5.0</user>
          <system>3.9</system>
          <wait>47.1</wait>
        </cpu>
        <memory>
          <percent>15.2</percent>
          <kilobyte>1861684</kilobyte>
        </memory>
        <swap>
          <percent>4.4</percent>
          <kilobyte>560988</kilobyte>
        </swap>
      </system>
    </service>
  </services>
  <servicegroups/>
</monit>
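
Once captured, the payload can be reduced to the few fields a dashboard or alert script needs. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the capture; the function name summarize_services is hypothetical, and the choice of fields is only an illustration based on the element names visible above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the capture above; a real payload arrives as bytes over HTTP.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<monit id="7f6467146ee2bdd9aa05b6f018b8c89a" incarnation="1418632062" version="5.10">
  <services>
    <service name="192.168.1.111">
      <type>4</type>
      <status>0</status>
      <monitor>1</monitor>
    </service>
    <service name="localhost">
      <type>5</type>
      <status>0</status>
      <monitor>1</monitor>
      <system>
        <load>
          <avg01>5.08</avg01>
        </load>
      </system>
    </service>
  </services>
</monit>
"""


def summarize_services(xml_bytes):
    """Return one dict per <service> element in a monit XML payload."""
    root = ET.fromstring(xml_bytes)
    summary = []
    for svc in root.findall("./services/service"):
        summary.append({
            "name": svc.get("name"),
            "status": int(svc.findtext("status")),
            "monitored": svc.findtext("monitor") == "1",
            # Only the system service carries a load average in the capture;
            # findtext returns None for services without one.
            "load_avg01": svc.findtext("./system/load/avg01"),
        })
    return summary


for row in summarize_services(SAMPLE):
    print(row)
```

Both services in the capture report a status of 0; reading that value as "no failed tests" is an assumption drawn from the capture and should be checked against the Monit documentation.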

Call the HipChat API:

http 'https://api.hipchat.com/v2/room/XXX/notification?auth_token=XXXXXX' notify:=true message_format=text color=red message="..........alert.........."

hipchat.sh
#!/bin/bash
# -*- coding: utf-8 -*-
#
# send monit alert to hipchat
#
 
ROOM=XXX
TOKEN="XXXXXX"
URL="https://api.hipchat.com/v2/room/$ROOM/notification?auth_token=$TOKEN"
MSG="[$MONIT_HOST] $MONIT_SERVICE - $MONIT_DESCRIPTION"
DATA="{\"notify\": true, \"message_format\": \"text\", \"color\": \"red\", \"message\": \"$MSG\"}"
 
curl -H 'content-type: application/json' --data-binary "$DATA" "$URL"
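
The shell script assembles the JSON body by string interpolation, so a double quote or backslash in $MONIT_DESCRIPTION would produce invalid JSON. As a hedged alternative, the sketch below builds the same request in Python and lets json.dumps handle the escaping. The function name build_request is hypothetical; the URL, payload fields, and MONIT_* variables are the ones used in hipchat.sh.

```python
import json
import os
import urllib.request


def build_request(room, token, env=os.environ):
    """Build the HipChat notification request from monit's MONIT_* variables."""
    message = "[{}] {} - {}".format(
        env.get("MONIT_HOST", ""),
        env.get("MONIT_SERVICE", ""),
        env.get("MONIT_DESCRIPTION", ""),
    )
    payload = {
        "notify": True,
        "message_format": "text",
        "color": "red",
        "message": message,
    }
    url = "https://api.hipchat.com/v2/room/{}/notification?auth_token={}".format(
        room, token
    )
    # json.dumps escapes quotes and backslashes in the message.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the notification; ROOM and TOKEN are placeholders exactly as in the shell version.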


Useful Features

/etc/monit/conf.d/restart.cfg
# restart a long-running program (after one week)
check process xxx with pidfile /path/to/xxx.pid
    restart program = "/usr/sbin/service xxx restart"
    if uptime > 7 days then restart

/etc/monit/conf.d/notify.cfg
# alert when the program is frozen (no logging)
check file yyy with path /path/to/yyy.log
    if timestamp > 1 hour then alert

/etc/monit/conf.d/autossh.cfg
# kill the autossh client if the connection is lost
check host pi with address 127.0.0.1
  if failed
    port 31415 protocol ssh
  then
    exec "/usr/bin/pkill -u guest sshd"
      as uid guest and gid guest


Troubleshooting

$ sudo monit status
Socket error -- Connection refused
Error connecting to the monit daemon
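
The monit command-line client talks to the daemon through its embedded HTTP server, so this error appears when that server is not enabled or not reachable from localhost. Enable it in the main configuration file:

/etc/monit/monitrc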
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'


Jovi Meng 2017/09/06 08:56