2012
07.28

A common way to hook an external script into Zabbix is the UserParameter directive. These checks have a limited amount of time to return their result (30 seconds at most); otherwise the Agent simply kills them and no data is returned at all. If you didn’t take this condition into account (using .nodata() in your trigger expressions), actual problems might go undetected… In practice the deadline could be even shorter: you don’t want the Agent to spend too much time waiting for unresponsive services.
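To make the deadline idea concrete, here is a minimal sketch of a check that always returns a value, even when the service behind it hangs. The helper name with_deadline and the fallback value of 0 are assumptions made for illustration; they are not part of Zabbix or of the script further down.

```ruby
require 'timeout'

# Hypothetical helper: run a check under a hard deadline and fall back to a
# default value instead of returning nothing at all.
def with_deadline(seconds, fallback = 0)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end

# A check that finishes in time keeps its own result...
fast = with_deadline(1) { 1 }
# ...while one that overruns the deadline is reported as a failure (0).
slow = with_deadline(0.1) { sleep 1; 1 }
puts "fast=#{fast} slow=#{slow}"
```

The deadline you pick here should stay below the Agent’s own timeout, so the script gets a chance to report the failure itself.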

The script below is a simple parallel HTTP/HTTPS monitor. It spawns up to the given number of threads, fetches each supplied URL, and looks for a matching string in the response. The Parallel gem for Ruby makes it incredibly simple to implement such a scheme. When all the checks have completed, their results are submitted back to Zabbix in one go with a single call to zabbix_sender.

Why is running in parallel important? Because waiting for a host to reply, or for a connection attempt to time out, is just a matter of, well, waiting. Your CPU is not really busy and can do some other work before the host decides to reply. To put it another way: if you can afford to use 10 threads to monitor 10 hosts with a 30-second response time, your whole check “run” will take 30 seconds. With a single thread, the same run will take 5 minutes…
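The arithmetic above can be written down as a small estimate. This is only a sketch: worst_case_seconds is a hypothetical helper, and it assumes every host takes the full timeout to answer and that checks run in “waves” of at most one host per thread.

```ruby
# Hypothetical helper: worst-case duration of a full run when every host
# takes the whole timeout to answer. Checks run in waves of at most
# `threads` hosts at a time.
def worst_case_seconds(hosts, threads, timeout)
  waves = (hosts.to_f / threads).ceil
  waves * timeout
end

puts worst_case_seconds(10, 10, 30)  # 10 threads, one wave    => 30
puts worst_case_seconds(10, 1, 30)   # single thread, 10 waves => 300
```

By the same estimate, 25 hosts on 10 threads would need three waves, or 90 seconds in the worst case, so the thread count and the per-check timeout are worth sizing against whatever deadline you have.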

Here’s the script; we use it to check the availability of about 25 management interfaces (iLO or IPMI) in our Hadoop cluster.

Oh, one more thing: mind the Mutex. In a multi-threaded program, access to shared data must always be coordinated…
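Before the full script, here is the Mutex pattern in isolation. It is a minimal sketch using plain Ruby threads rather than the Parallel gem, and collect_in_threads is a hypothetical name chosen for the example: several threads append to one shared array, and every append goes through the same lock.

```ruby
# Hypothetical helper: several threads push into one shared array, with
# every push wrapped in the same Mutex.
def collect_in_threads(thread_count, per_thread)
  results = []
  lock = Mutex.new

  threads = thread_count.times.map do |n|
    Thread.new do
      per_thread.times do |i|
        # Only one thread at a time may touch the shared array
        lock.synchronize { results << [n, i] }
      end
    end
  end

  # Wait for every worker before reading the results
  threads.each(&:join)
  results
end

puts collect_in_threads(10, 100).size  # => 1000
```

The script below does the same thing with its semaphore variable around results.push.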

#!/usr/bin/env ruby
require 'rubygems'
require 'parallel'
require 'timeout'
require 'net/http'
 
MaxThreads = 10
MaxTime    = 30

checks = [
    {:key => 'fetch.bmc.slave123', :uri => 'http://192.168.123.123/page/login.html',  :match => 'STR_LOGIN_PASSWORD'},
    {:key => 'fetch.bmc.slave124', :uri => 'http://192.168.123.124/xmldata?item=All', :match => 'ProLiant'}
]

semaphore = Mutex.new
results = []

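# Each check fetches its URL and records 1 if the body matches, 0 otherwise.
checker = lambda do |check|
    begin
        Timeout::timeout(MaxTime) do
            response = Net::HTTP.get_response(URI(check[:uri]))
            # /m lets "." match newlines; in Ruby, /s selects an encoding instead
            found = response.body =~ /#{check[:match]}/m
            semaphore.synchronize { results.push({:key => check[:key], :v => (found ? 1 : 0)}) }
        end
    rescue
        # Connection errors and timeouts count as a failed check
        semaphore.synchronize { results.push({:key => check[:key], :v => 0}) }
    end
end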
 
ZabbixSender        = File.join(File.dirname(__FILE__), 'zabbix_sender')
ZabbixSenderCmdLine = "#{ZabbixSender} -z 192.168.123.10 -s 'Zabbix Server' -i -"

Parallel.each(checks, :in_threads => MaxThreads, &checker)

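# One "<host> <key> <value>" line per result; "-" as host tells
# zabbix_sender to use the host name given with -s.
data = ''
results.each do |i|
   data << "- #{i[:key]} #{i[:v]}\n"
end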

Timeout::timeout(MaxTime) do
    IO.popen(ZabbixSenderCmdLine, :mode => 'w+', :external_encoding => Encoding::ASCII_8BIT) do |file|
        file.write data
    end
end