Zabbix SNMP monitoring

Simple Network Management Protocol (SNMP) may not be as simple as the name suggests, but it's a de facto standard for many appliances and applications. It's not just ubiquitous; it's often the only sensible way to extract monitoring information from a network switch, disk array enclosure, UPS battery, and so on.

The basic architecture layout for SNMP monitoring is actually straightforward. Every monitored host or appliance runs an SNMP agent. This agent can be queried by any probe (whether it's just a command-line program to do manual queries or a monitoring server such as Zabbix) and will send back information on any metric it has made available or even change certain predefined settings on the host itself as a response to a set command from the probe. Furthermore, the agent is not just a passive entity that responds to the get and set commands but can also send warnings and alarms as SNMP traps to a predefined host when some specific conditions arise.

Things get a little more complicated when it comes to metric definitions. Unlike a regular Zabbix item, or a metric in most other monitoring systems, an SNMP metric is part of a huge hierarchy, a tree of metrics that spans hardware vendors and software implementers across all of the IT landscape. This means that every metric has to be uniquely identified with some kind of code. This unique metric identifier is called an object identifier (OID) and identifies both the object and its position in the SNMP hierarchy tree.

OIDs and their values are the actual content that is passed in the SNMP messages. While this is most efficient from a network traffic point of view, OIDs need to be translated into something usable and understandable by humans as well. This is done using a distributed database called Management Information Base (MIB). MIBs are essentially text files that describe a specific branch of the OID tree, with a textual description of its OIDs, their data types, and a human-readable string identifier.

MIBs let us know, for example, that OID 1.3.6.1.2.1.1.3 refers to the system uptime of whatever machine the agent is running on. Its value is expressed as an integer, in hundredths of a second and can generally be referred to as sysUpTime. The following diagram shows this:

[Diagram: Zabbix SNMP monitoring]
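
To make the hundredths-of-a-second unit concrete, here is a minimal shell sketch that converts a raw TimeTicks value into hours, minutes, and seconds. The ticks_to_hms function name is made up for this illustration, and the sample value is the uptime that appears in the snmpwalk output later in this chapter.

```shell
#!/bin/sh
# Convert an SNMP TimeTicks value (hundredths of a second) to HH:MM:SS.cc.
# ticks_to_hms is a hypothetical helper, not part of Zabbix or net-snmp.
ticks_to_hms() {
    ticks=$1
    secs=$((ticks / 100))   # whole seconds
    cs=$((ticks % 100))     # remaining hundredths
    printf '%02d:%02d:%02d.%02d\n' \
        $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60)) "$cs"
}

ticks_to_hms 8609925   # prints 23:54:59.25
```

Hours are not folded into days here, so an uptime of more than a day simply shows more than 24 hours.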

As you can see, this is quite different from the way Zabbix agent items work in terms of the connection protocol, item definition, and organization. Nevertheless, Zabbix provides facilities to translate from SNMP OIDs to Zabbix items. If you compiled SNMP support into the server, it will be able to create SNMP queries natively, and with the help of a couple of supporting tools, it will also be able to process SNMP traps.

This is, of course, an essential feature if you need to monitor appliances that only support SNMP and on which you have no way of installing a native agent: network appliances in general (switches, routers, and so forth), disk array enclosures, and so on. But the following may be reasons for you to actually choose SNMP as the main monitoring protocol in your network and completely dispense with Zabbix agents:

  • You may not need many complex or custom metrics apart from what is already provided by an operating system's SNMP OID branch. You, most probably, have already set up SNMP monitoring for your network equipment, and if you just need simple metrics, such as uptime, CPU load, free memory, and so on, from your average host, it might be simpler to rely on SNMP for it as well instead of the native Zabbix agent. This way, you will never have to worry about agent deployment and updates—you just let the Zabbix server contact the remote SNMP agents and get the information you need.
  • The SNMP protocol and port numbers are well known by virtually all the products. If you need to send monitoring information across networks, it might be easier to rely on the SNMP protocol instead of the Zabbix one. This could be because traffic on the UDP ports 161 and 162 is already permitted or because it might be easier to ask a security administrator to allow access to a well-known protocol instead of a relatively more obscure one.
  • SNMP Version 3 features built-in authentication and security. This means that, contrary to the Zabbix protocol, as you have already seen in Chapter 2, Distributed Monitoring, SNMPv3 messages will have integrity, confidentiality, and authentication. While Zabbix does support all three versions of SNMP, it's strongly advised that you use Version 3 wherever possible because it's the only one with real security features. In contrast, Version 1 and 2 only have a simple string sent inside a message as a very thin layer of security.

While there may be good reasons to use SNMP monitoring as much as possible in your Zabbix installation, there are still a couple of strong reasons to stick with the Zabbix agent:

  • The Zabbix agent has a few, very useful built-in metrics that would need custom extensions if implemented through an SNMP agent. For example, if you want to monitor a log file, with automatic log rotation support, and skip old data, you just need to specify the logrt[] key for a Zabbix active item. The same thing applies if you want to monitor the checksum, the size of a specific file, or the Performance Monitor facility of the Windows operating system, and so on. In all these cases, the Zabbix agent is the most immediate and simple choice.
  • The Zabbix agent has the ability to discover many kinds of resources that are available on the host and report them back to the server, which will, in turn, automatically create items and triggers and destroy them when the said resources are not available anymore. This means that with the Zabbix agent, you will be able to let the server create the appropriate items for every host's CPU, mounted filesystem, number of network interfaces, and so on. While it's possible to define low-level discovery rules based on SNMP, it's often easier to rely on the Zabbix agent for this kind of functionality.

So, once again, you have to balance the different features of each solution in order to find the best match for your environment. But generally speaking, you could make the following broad assessments: if you have simple metrics but need strong security, go with SNMP v3; if you have complex monitoring or automated discovery needs and can dispense with strong security (or are willing to work harder to get it, as explained in Chapter 2, Distributed Monitoring), go with the Zabbix agent and protocol.

That said, there are a couple of aspects worth exploring when it comes to Zabbix SNMP monitoring. We'll first talk about simple SNMP queries and then about SNMP traps.

SNMP queries

An SNMP monitoring item is quite simple to configure. The main point of interest is that while the server will use the SNMP OID that you provided to get the measurement, you'll still need to define a unique name for the item and, most importantly, a unique item key. Keep in mind that an item key is used in all of Zabbix's expressions that define triggers, calculated items, actions, and so on. So, try to keep it short and simple, while easily recognizable. As an example, let's suppose that you want to define a metric for the incoming traffic on network port number 3 of an appliance. The OID would be 1.3.6.1.2.1.2.2.1.10.3, while you could call the key something similar to port3.ifInOctets, as shown in the following screenshot:

[Screenshot: SNMP queries]
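
Since every row of the interface table shares the same column prefix, key and OID pairs for several ports can be generated mechanically. The following sketch assumes, as in the example above, that the port number is the same as the interface index reported by the appliance, which is not guaranteed on every device. The port_item helper and the key naming scheme are only illustrations.

```shell
#!/bin/sh
# Prefix of the ifInOctets column; the trailing number selects the interface.
IF_IN_OCTETS=1.3.6.1.2.1.2.2.1.10

# port_item is a hypothetical helper: print a suggested item key and the
# OID to poll for a given port number (assumed equal to the interface index).
port_item() {
    printf 'key=port%s.ifInOctets oid=%s.%s\n' "$1" "$IF_IN_OCTETS" "$1"
}

for port in 1 2 3; do
    port_item "$port"
done
```

The last line printed is key=port3.ifInOctets oid=1.3.6.1.2.1.2.2.1.10.3, which matches the item described above.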

If you don't already have your SNMP items defined in a template, an easy way to get them is using the snmpwalk tool to directly query the host that you need to monitor and get information about the available OIDs and their data types.

For example, the following command is used to get the whole object tree from the appliance at 10.10.15.19:

$ snmpwalk -v 3 -l AuthPriv -u user -a MD5 -A auth -x DES -X priv -m ALL 10.10.15.19

Tip

You need to substitute the user string with the username for the SNMP agent, auth with the authentication password for the user, priv with the privacy password, MD5 with the appropriate authentication protocol, and DES with the privacy protocol that you defined for the agent. Please remember that the authentication password and the privacy password must be at least eight characters long.

The SNMP agent on the host will respond with a list of all its OIDs. The following is a fragment of what you could get:

HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (8609925) 23:54:59.25
HOST-RESOURCES-MIB::hrSystemDate.0 = STRING: 2013-7-28,9:38:51.0,+2:0
HOST-RESOURCES-MIB::hrSystemInitialLoadDevice.0 = INTEGER: 393216
HOST-RESOURCES-MIB::hrSystemInitialLoadParameters.0 = STRING: "root=/dev/sda8 ro"
HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 2
HOST-RESOURCES-MIB::hrSystemProcesses.0 = Gauge32: 172
HOST-RESOURCES-MIB::hrSystemMaxProcesses.0 = INTEGER: 0
HOST-RESOURCES-MIB::hrMemorySize.0 = INTEGER: 8058172 KBytes
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Physical memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Virtual memory
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: Memory buffers
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: Cached memory
HOST-RESOURCES-MIB::hrStorageDescr.8 = STRING: Shared memory
HOST-RESOURCES-MIB::hrStorageDescr.10 = STRING: Swap space
HOST-RESOURCES-MIB::hrStorageDescr.35 = STRING: /run
HOST-RESOURCES-MIB::hrStorageDescr.37 = STRING: /dev/shm
HOST-RESOURCES-MIB::hrStorageDescr.39 = STRING: /sys/fs/cgroup
HOST-RESOURCES-MIB::hrStorageDescr.53 = STRING: /tmp
HOST-RESOURCES-MIB::hrStorageDescr.56 = STRING: /boot

Let's say that we are interested in the system's memory size. To get the full OID for it, we will reissue the snmpwalk command using the f and n options of the -O switch. These tell snmpwalk to display the full OIDs in a numeric format. We will also limit the query to the OID we need, as taken from the previous output:

$ snmpwalk -v 3 -l AuthPriv -u user -a MD5 -A auth -x DES -X priv -m ALL -O fn 10.10.15.19 HOST-RESOURCES-MIB::hrMemorySize.0
.1.3.6.1.2.1.25.2.2.0 = INTEGER: 8058172 KBytes

And there we have it. The OID we need to put in our item definition is 1.3.6.1.2.1.25.2.2.0.
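
If you need to do this for more than a handful of metrics, the numeric OID can be cut out of each output line with plain shell parameter expansion. This is only a sketch: oid_from_walk_line is a hypothetical helper, and it assumes the "OID = TYPE: value" line layout shown in the previous output.

```shell
#!/bin/sh
# oid_from_walk_line is a hypothetical helper: given one line of
# "snmpwalk -O fn" output, print the OID without its leading dot.
oid_from_walk_line() {
    line=$1
    oid=${line%% = *}          # drop everything from the first " = " onwards
    printf '%s\n' "${oid#.}"   # drop the leading dot, if any
}

oid_from_walk_line '.1.3.6.1.2.1.25.2.2.0 = INTEGER: 8058172 KBytes'
# prints 1.3.6.1.2.1.25.2.2.0
```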

SNMP traps

SNMP traps are a bit of an oddball if compared to all the other Zabbix item types. Unlike other items, SNMP traps do not report a simple measurement but an event of some type. In other words, they are the result of a kind of check or computation made by the SNMP agent and sent over to the monitoring server as a status report. An SNMP trap can be issued every time a host is rebooted, an interface is down, a disk is damaged, or a UPS has lost power and is keeping the servers up using its battery.

This kind of information contrasts with Zabbix's basic assumption, that is, an item is a simple metric not directly related to a specific event. On the other hand, there may be no other way to be aware of certain situations if not through an SNMP trap either because there are no related metrics (consider, for example, the event the server is being shut down) or because the appliance's only way to convey its status is through a bunch of SNMP objects and traps.

So, traps are of relatively limited use to Zabbix as you can't do much more than build a simple trigger out of every trap and then notify about the event (not much point in graphing a trap or building calculated items on it). Nevertheless, they may prove essential for a complete monitoring solution.

To manage SNMP traps effectively, Zabbix needs a couple of helper tools: the snmptrapd daemon, to actually handle connections from the SNMP agents, and a kind of script to correctly format every trap and pass it to the Zabbix server for further processing.

The snmptrapd process

If you have compiled SNMP support into the Zabbix server, you should already have the complete SNMP suite installed, which contains the SNMP daemon, the SNMP trap daemon, and a bunch of utilities, such as snmpwalk and snmptrap.

If it turns out that you don't actually have the SNMP suite installed, the following command should take care of the matter:

# yum install net-snmp net-snmp-utils

Just as the Zabbix server has a bunch of daemon processes that listen on the TCP port 10051 for incoming connections (from agents, proxies, and nodes), snmptrapd is the daemon process that listens on the UDP port 162 for incoming traps coming from remote SNMP agents.

Once installed, snmptrapd reads its configuration options from an snmptrapd.conf file, which can be usually found in the /etc/snmp/ directory. The bare minimum configuration for snmptrapd requires only the definition of a community string in the case of versions 1 and 2 of SNMP, which is as follows:

authCommunity log public

Alternatively, the definition of a user and a privacy level in the case of SNMP Version 3 is as follows:

createUser -e ENGINEID user MD5 auth DES priv

Tip

You need to create a separate createUser line for every remote Version 3 agent that will send traps. You also need to substitute all the user, auth, priv, MD5, and DES strings with what you have already configured on the agent, as explained in the previous note. Most importantly, you need to set the correct ENGINEID for every agent. You can get it from the agent's configuration itself.

With this minimal configuration, snmptrapd will limit itself to logging the trap to syslog. While it would be possible to extract this information and send it to Zabbix, it's easier to tell snmptrapd how it should handle the traps. While the daemon has no processing capabilities of its own, it can execute any command or application by either using the traphandle directive or leveraging its embedded perl functionality. The latter is more efficient as the daemon won't have to fork a new process and wait for its execution to finish, so it's the recommended one if you plan to receive a significant number of traps. Keep in mind that handing traps over to a handler requires the execute authorization type in addition to log (for example, authCommunity log,execute public). Then, just add the following line to snmptrapd.conf:

perl do "/usr/local/bin/zabbix_trap_receiver.pl";

Tip

You can get the zabbix_trap_receiver script from the Zabbix sources. It's located in misc/snmptrap/zabbix_trap_receiver.pl.

Once it is restarted, the snmptrapd daemon will execute the perl script of your choice to process every trap received. As you can probably imagine, your job doesn't end here. You still need to define how to handle the traps in your script and find a way to send the resulting work over to your Zabbix server. We'll discuss both of these aspects in the following section.

The perl trap handler

The perl script included in the Zabbix distribution works as a translator from an SNMP trap format to a Zabbix item measurement. For every trap received, it will format it according to the rules defined in the script and will output the result in a log file. The Zabbix server will, in turn, monitor the said log file and process every new line as an SNMP trap item, basically matching the content of the line to any trap item defined for the relevant host. Let's see how it all works by looking at the perl script itself and illustrating its logic:

#!/usr/bin/perl

#
# Zabbix
# Copyright (C) 2001-2013 Zabbix SIA
#
#########################################
#### ABOUT ZABBIX SNMP TRAP RECEIVER ####
#########################################


# This is an embedded perl SNMP trapper receiver designed for
# sending data to the server.
# The receiver will pass the received SNMP traps to Zabbix server
# or proxy running on the
# same machine. Please configure the server/proxy accordingly.
#
# Read more about using embedded perl with Net-SNMP:
#       http://net-snmp.sourceforge.net/wiki/index.php/Tut:Extending_snmpd_using_perl

This first section contains just the licensing information and a brief description of the script. Nothing that's worth mentioning, except a simple reminder—check that your perl executable is correctly referenced in the first line, or change it accordingly. The following section is more interesting, and if you are happy with the script's default formatting of SNMP traps, it may also be the only section that you will ever need to customize:

#################################################
#### ZABBIX SNMP TRAP RECEIVER CONFIGURATION ####
#################################################

$SNMPTrapperFile = '/tmp/zabbix_traps.tmp';
$DateTimeFormat = '%H:%M:%S %Y/%m/%d';

Just set $SNMPTrapperFile to the path of the file that you wish the script to log its trap to, and set the SNMPTrapperFile option in your zabbix_server.conf file to the same value. While you are at it, also set StartSNMPTrapper to 1 in zabbix_server.conf so that the server will start monitoring the said file.
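
With the script defaults shown above, the matching lines in zabbix_server.conf would look something like the following. This is a sketch that assumes you kept the default path; adjust it if you changed $SNMPTrapperFile in the script.

```
# Must be the same file that zabbix_trap_receiver.pl writes to
SNMPTrapperFile=/tmp/zabbix_traps.tmp
# Start the process that reads the trap file
StartSNMPTrapper=1
```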

$DateTimeFormat, on the other hand, is the strftime format that the script uses to timestamp every trap it writes to the file. Most of the time, the default value is fine, but take the time to check it and change it as needed.

The following section contains the actual logic of the script. Notice how the bulk of the logic is contained in a subroutine called zabbix_receiver. This subroutine will be called and executed towards the end of the script but is worth examining in detail:

###################################
#### ZABBIX SNMP TRAP RECEIVER ####
###################################
use Fcntl qw(O_WRONLY O_APPEND O_CREAT);
use POSIX qw(strftime);
sub zabbix_receiver
{
        my (%pdu_info) = %{$_[0]};
        my (@varbinds) = @{$_[1]};

The snmptrapd daemon will execute the script and pass the trap that it just received. The script will, in turn, call its subroutine, which will immediately distribute the trap information into two lists—the first argument is assigned to the %pdu_info hash and the second one to the @varbinds array:

# open the output file
unless (sysopen(OUTPUT_FILE, $SNMPTrapperFile,O_WRONLY|O_APPEND|O_CREAT, 0666))
  {
    print STDERR "Cannot open [$SNMPTrapperFile]:$!\n";
    return NETSNMPTRAPD_HANDLER_FAIL;
  }

Here, the script will open the output file or fail gracefully if it somehow cannot. The next step consists of extracting the hostname (or IP address) of the agent that sent the trap. This information is stored in the %pdu_info hash we defined previously:

# get the host name
my $hostname = $pdu_info{'receivedfrom'} || 'unknown';
if ($hostname ne 'unknown') {
  $hostname =~ /\[(.*?)\].*/;
  $hostname = $1 || 'unknown';
}
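
The extraction step can be tried outside snmptrapd. The following shell sketch uses sed to pull out the text between the first pair of square brackets and falls back to unknown when there is none. The trap_sender name is hypothetical, and the sample receivedfrom string is an assumed example of the transport description, not output captured from a real trap.

```shell
#!/bin/sh
# trap_sender is a hypothetical helper: print the address found between the
# first "[" and the next "]" of a receivedfrom string, or "unknown".
trap_sender() {
    host=$(printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*/\1/p')
    printf '%s\n' "${host:-unknown}"
}

trap_sender 'UDP: [192.168.1.10]:45237->[192.168.1.1]:162'   # prints 192.168.1.10
trap_sender 'no brackets here'                               # prints unknown
```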

Now, we are ready to build the actual SNMP trap notification message. Zabbix uses the first part of the output both to recognize the presence of a new trap (by looking for the ZBXTRAP string) and to know which of the monitored hosts the trap refers to. Keep in mind that the IP address or hostname set here must match the SNMP address value in the host configuration as set using the Zabbix frontend. This value must be set even if it's identical to the main IP/hostname for a given host. Once the Zabbix server has identified the correct host, it will discard this part of the trap notification:

# print trap header
#       timestamp must be placed at the beginning of the first line (can be omitted)
#       the first line must include the header "ZBXTRAP [IP/DNS address] "
#       * IP/DNS address is used to find the corresponding SNMP trap items
#       * this header will be cut during processing (will not appear in the item value)
printf OUTPUT_FILE "%s ZBXTRAP %s\n", strftime($DateTimeFormat, localtime), $hostname;

After the notification header, the script will output the rest of the trap as received from the SNMP agent:

# print the PDU info
print OUTPUT_FILE "PDU INFO:\n";
foreach my $key(keys(%pdu_info))
{
  printf OUTPUT_FILE "  %-30s %s\n", $key, $pdu_info{$key};
}

The foreach loop in the previous code iterates over the %pdu_info hash, and its printf statement outputs every key-value pair:

# print the variable bindings:
print OUTPUT_FILE "VARBINDS:\n";
foreach my $x (@varbinds)
{
  printf OUTPUT_FILE "  %-30s type=%-2d value=%s\n", $x->[0], $x->[2], $x->[1];
}
close (OUTPUT_FILE);
return NETSNMPTRAPD_HANDLER_OK;
}

The second printf statement, printf OUTPUT_FILE "  %-30s type=%-2d value=%s\n", $x->[0], $x->[2], $x->[1];, will output the contents of the @varbinds array one by one. This array is the one that contains the actual values reported by the trap. Once done, the log file is closed and the subroutine returns NETSNMPTRAPD_HANDLER_OK to signal that the trap was handled. All that is left is to register the subroutine with snmptrapd:

NetSNMP::TrapReceiver::register("all", \&zabbix_receiver) or
        die "failed to register Zabbix SNMP trap receiver\n";
print STDOUT "Loaded Zabbix SNMP trap receiver\n";

The last few lines of the script set the zabbix_receiver subroutine as the actual trap handler and give feedback about its correct setup. Once the trap handler starts populating the trap file (/tmp/zabbix_traps.tmp in the preceding configuration), you need to define the corresponding Zabbix items.
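
To get a feel for what ends up in the file, the following shell sketch prints one entry using the same printf formats as the script and then picks out the host from the ZBXTRAP header with awk. All the sample values (timestamp, addresses, and the varbind) are invented for illustration, and sample_trap and trap_hosts are hypothetical helper names. A real entry will contain more PDU fields and varbinds.

```shell
#!/bin/sh
# sample_trap: print one made-up trap entry in the layout the script writes.
sample_trap() {
    printf '%s ZBXTRAP %s\n' '09:38:51 2013/07/28' '192.168.1.10'
    printf 'PDU INFO:\n'
    printf '  %-30s %s\n' 'receivedfrom' 'UDP: [192.168.1.10]:45237->[192.168.1.1]:162'
    printf 'VARBINDS:\n'
    printf '  %-30s type=%-2d value=%s\n' \
        'SNMPv2-MIB::snmpTrapOID.0' 6 'OID: SNMPv2-MIB::coldStart'
}

# trap_hosts: print the word that follows ZBXTRAP on every header line.
trap_hosts() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "ZBXTRAP") print $(i + 1) }'
}

sample_trap               # show the entry itself
sample_trap | trap_hosts  # prints 192.168.1.10
```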

As you've already seen, the first part of the log line is used by the Zabbix server to match a trap with its corresponding host. The rest of the trap is matched against the regular expressions of that host's SNMP trap items, and its contents are added to the history of every matching item. This means that if you wish to have a startup trap item for a given host, you'll need to configure an SNMP trap item with an snmptrap["coldStart"] key, as shown in the following screenshot:

[Screenshot: The perl trap handler]

From now on, you'll be able to see the contents of the trap in the item's data history.
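
The matching step can be approximated on the command line. The sketch below uses grep -E as a stand-in for the regular expression matching done by the Zabbix server, an assumption that should hold for simple patterns such as coldStart. The trap_matches helper and the sample trap text are made up for this illustration.

```shell
#!/bin/sh
# trap_matches is a hypothetical helper: succeed if the trap text ($2)
# matches the regular expression ($1) taken from an snmptrap[] item key.
trap_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Invented trap text, loosely following the VARBINDS layout of the script.
trap='VARBINDS: SNMPv2-MIB::snmpTrapOID.0 type=6 value=OID: SNMPv2-MIB::coldStart'

if trap_matches 'coldStart' "$trap"; then
    echo 'stored in snmptrap["coldStart"]'
else
    echo 'no matching item'
fi
```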
