Simple Network Management Protocol (SNMP) may not be as simple as its name suggests, but it is a de facto standard for many appliances and applications. It is not just ubiquitous: it is often the only sensible way to extract monitoring information from a network switch, disk array enclosure, UPS battery, and so on.
The basic architecture layout for SNMP monitoring is actually straightforward. Every monitored host or appliance runs an SNMP agent. This agent can be queried by any probe (whether it's just a command-line program to do manual queries or a monitoring server such as Zabbix) and will send back information on any metric it has made available or even change certain predefined settings on the host itself as a response to a set command from the probe. Furthermore, the agent is not just a passive entity that responds to the get and set commands but can also send warnings and alarms as SNMP traps to a predefined host when some specific conditions arise.
Things get a little more complicated when it comes to metric definitions. Unlike a regular Zabbix item, or an item in any other monitoring system, an SNMP metric is part of a huge hierarchy, a tree of metrics that spans hardware vendors and software implementers across the whole IT landscape. This means that every metric has to be uniquely identified with some kind of code. This unique metric identifier is called an object identifier (OID), and it identifies both the object and its position in the SNMP hierarchy tree.
OIDs and their values are the actual content that is passed in SNMP messages. While this is most efficient from a network traffic point of view, OIDs also need to be translated into something that humans can use and understand. This is done using a distributed database called the Management Information Base (MIB). MIBs are essentially text files that describe a specific branch of the OID tree, with a textual description of its OIDs, their data types, and a human-readable string identifier.
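To make the idea of OID translation concrete, here is a minimal sketch that treats a MIB as a lookup table from numeric OID prefixes to names. The `MIB` dictionary and the `translate` function are hypothetical names made up for this illustration, and the two entries are borrowed from examples used later in this section; real MIB files carry far more information (data types, descriptions, and so on).

```python
# A toy stand-in for a MIB: numeric OID prefixes mapped to symbolic names.
# Only two entries are included, purely for illustration.
MIB = {
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.25.2.2": "hrMemorySize",
}


def translate(oid: str) -> str:
    """Replace the longest known numeric prefix of `oid` with its name."""
    parts = oid.strip(".").split(".")
    # Try the longest prefix first, so the most specific entry wins.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in MIB:
            return ".".join([MIB[prefix]] + parts[end:])
    return ".".join(parts)  # unknown OID: return it unchanged


print(translate("1.3.6.1.2.1.1.3.0"))      # sysUpTime.0
print(translate(".1.3.6.1.2.1.25.2.2.0"))  # hrMemorySize.0
```

The trailing `.0` is the instance suffix of a scalar object, which is why it survives the translation untouched.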
MIBs let us know, for example, that OID 1.3.6.1.2.1.1.3 refers to the system uptime of whatever machine the agent is running on. Its value is expressed as an integer, in hundredths of a second and can generally be referred to as sysUpTime. The following diagram shows this:
As you can see, this is quite different from the way Zabbix agent items work in terms of connection protocol, item definition, and organization. Nevertheless, Zabbix provides facilities to translate from SNMP OIDs to Zabbix items. If you compiled the server with SNMP support, it will be able to create SNMP queries natively, and with the help of a couple of supporting tools, it will also be able to process SNMP traps.
This is, of course, an essential feature if you need to monitor appliances that only support SNMP and on which you cannot install a native agent: network appliances in general (switches, routers, and so forth), disk array enclosures, and so on. But the following may be reasons for you to actually choose SNMP as the main monitoring protocol in your network and completely dispense with Zabbix agents:
161 and 162 is already permitted, or because it might be easier to ask a security administrator to allow access to a well-known protocol instead of a relatively more obscure one.
logrt[] key for a Zabbix active item. The same thing applies if you want to monitor the checksum or the size of a specific file, the Performance Monitor facility of the Windows operating system, and so on. In all these cases, the Zabbix agent is the most immediate and simple choice.
So, once again, you have to balance the different features of each solution in order to find the best match for your environment. But generally speaking, you could make the following broad assessments: if you have simple metrics but need strong security, go with SNMP v3; if you have complex monitoring or automated discovery needs and can dispense with strong security (or are willing to work harder to get it, as explained in Chapter 2, Distributed Monitoring), go with the Zabbix agent and protocol.
That said, there are a couple of aspects worth exploring when it comes to Zabbix SNMP monitoring. We'll first talk about simple SNMP queries and then about SNMP traps.
An SNMP monitoring item is quite simple to configure. The main point of interest is that while the server will use the SNMP OID that you provided to get the measurement, you'll still need to define a unique name for the item and, most importantly, a unique item key. Keep in mind that an item key is used in all of Zabbix's expressions that define triggers, calculated items, actions, and so on. So, try to keep it short and simple, yet easily recognizable. As an example, suppose that you want to define a metric for the incoming traffic on network port number 3 of an appliance. The OID would be 1.3.6.1.2.1.2.2.1.10.3, while you could call the key something similar to port3.ifInOctets, as shown in the following screenshot:
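The pairing of OID and item key described above can also be sketched in code. The helper below is hypothetical (Zabbix itself offers no such function) and assumes that the port number is the same as the interface index used by the appliance's SNMP agent, which is common but not guaranteed.

```python
# Base OID of the ifInOctets column in the standard interface table.
IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10"


def port_traffic_item(port: int) -> dict:
    """Return the name, key and OID for the inbound-traffic item of `port`."""
    if port < 1:
        raise ValueError("interface indexes start at 1")
    return {
        "name": f"Incoming traffic on port {port}",  # hypothetical item name
        "key": f"port{port}.ifInOctets",             # short, unique item key
        "snmp_oid": f"{IF_IN_OCTETS}.{port}",        # column OID + interface index
    }


item = port_traffic_item(3)
print(item["snmp_oid"])  # 1.3.6.1.2.1.2.2.1.10.3
print(item["key"])       # port3.ifInOctets
```

Generating keys from one pattern like this keeps them predictable, which helps when the same key has to be typed into trigger expressions later.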
If you don't already have your SNMP items defined in a template, an easy way to get them is using the snmpwalk tool to directly query the host that you need to monitor and get information about the available OIDs and their data types.
For example, the following command is used to get the whole object tree from the appliance at 10.10.15.19:
$ snmpwalk -v 3 -l AuthPriv -u user -a MD5 -A auth -x DES -X priv -m ALL 10.10.15.19
You need to substitute the user string with the username for the SNMP agent, auth with the authentication password for the user, priv with the privacy password, MD5 with the appropriate authentication protocol, and DES with the privacy protocol that you defined for the agent. Please remember that the authentication password and the privacy password must each be at least eight characters long.
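If you run such queries from a script, it can help to assemble the command line in one place and check the passwords before calling the tool. The following is only a sketch: `snmpwalk_command` is a hypothetical helper, the credentials are placeholders, and the length check assumes the Net-SNMP minimum of eight characters.

```python
import shlex


def snmpwalk_command(host, user, auth_pass, priv_pass,
                     auth_proto="MD5", priv_proto="DES", oid=None):
    """Build the argument list for an SNMPv3 snmpwalk query with auth and privacy."""
    for label, secret in (("authentication", auth_pass), ("privacy", priv_pass)):
        if len(secret) < 8:
            raise ValueError(f"{label} password must be at least 8 characters")
    argv = ["snmpwalk", "-v", "3", "-l", "authPriv", "-u", user,
            "-a", auth_proto, "-A", auth_pass,
            "-x", priv_proto, "-X", priv_pass,
            "-m", "ALL", host]
    if oid is not None:
        argv.append(oid)  # optionally restrict the walk to one subtree
    return argv


# Placeholder credentials, for illustration only.
cmd = snmpwalk_command("10.10.15.19", "user", "authpass1", "privpass1")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with passwords that contain spaces or special characters.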
The SNMP agent on the host will respond with a list of all its OIDs. The following is a fragment of what you could get:
HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (8609925) 23:54:59.25
HOST-RESOURCES-MIB::hrSystemDate.0 = STRING: 2013-7-28,9:38:51.0,+2:0
HOST-RESOURCES-MIB::hrSystemInitialLoadDevice.0 = INTEGER: 393216
HOST-RESOURCES-MIB::hrSystemInitialLoadParameters.0 = STRING: "root=/dev/sda8 ro"
HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 2
HOST-RESOURCES-MIB::hrSystemProcesses.0 = Gauge32: 172
HOST-RESOURCES-MIB::hrSystemMaxProcesses.0 = INTEGER: 0
HOST-RESOURCES-MIB::hrMemorySize.0 = INTEGER: 8058172 KBytes
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Physical memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Virtual memory
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: Memory buffers
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: Cached memory
HOST-RESOURCES-MIB::hrStorageDescr.8 = STRING: Shared memory
HOST-RESOURCES-MIB::hrStorageDescr.10 = STRING: Swap space
HOST-RESOURCES-MIB::hrStorageDescr.35 = STRING: /run
HOST-RESOURCES-MIB::hrStorageDescr.37 = STRING: /dev/shm
HOST-RESOURCES-MIB::hrStorageDescr.39 = STRING: /sys/fs/cgroup
HOST-RESOURCES-MIB::hrStorageDescr.53 = STRING: /tmp
HOST-RESOURCES-MIB::hrStorageDescr.56 = STRING: /boot
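Output of this shape is regular enough to be split into fields by a script. The sketch below assumes one object per line in the MIB::name.index = TYPE: value layout shown in the fragment; `parse_line` and `timeticks_to_seconds` are hypothetical helper names, and lines in other layouts (for example, values with no type prefix) are not handled.

```python
import re

# MIB::object.index = TYPE: value
LINE_RE = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<name>\w+)\.(?P<index>\d+) = (?P<type>\w+): (?P<value>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one line of symbolic snmpwalk output into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized snmpwalk line: {line!r}")
    return match.groupdict()


def timeticks_to_seconds(value: str) -> float:
    """Convert a Timeticks value such as '(8609925) 23:54:59.25' to seconds."""
    ticks = int(value[value.index("(") + 1:value.index(")")])
    return ticks / 100  # one tick is a hundredth of a second


row = parse_line(
    "HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (8609925) 23:54:59.25"
)
print(row["name"], row["type"])            # hrSystemUptime Timeticks
print(timeticks_to_seconds(row["value"]))  # 86099.25
```

The 86099.25 seconds correspond to the 23:54:59.25 that snmpwalk prints next to the raw tick count.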
Let's say that we are interested in the system's memory size. To get the full OID for it, we will reissue the snmpwalk command using the fn option for the -O switch. These will tell snmpwalk to display the full OIDs in a numeric format. We will also limit the query to the OID we need, as taken from the previous output:
$ snmpwalk -v 3 -l AuthPriv -u user -a MD5 -A auth -x DES -X priv -m ALL -O fn 10.10.15.19 HOST-RESOURCES-MIB::hrMemorySize.0
.1.3.6.1.2.1.25.2.2.0 = INTEGER: 8058172 KBytes
And there we have it. The OID we need to put in our item definition is 1.3.6.1.2.1.25.2.2.0.
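The numeric output line can be taken apart in the same spirit. This short sketch uses a hypothetical `parse_numeric_line` helper to strip the leading dot from the OID and to turn the KBytes value into bytes, on the assumption that one KByte here means 1024 bytes.

```python
def parse_numeric_line(line: str):
    """Split '.OID = TYPE: value' into (oid, type, value)."""
    oid, _, rest = line.partition(" = ")
    snmp_type, _, value = rest.partition(": ")
    return oid.lstrip("."), snmp_type, value


oid, snmp_type, value = parse_numeric_line(
    ".1.3.6.1.2.1.25.2.2.0 = INTEGER: 8058172 KBytes"
)
print(oid)                      # 1.3.6.1.2.1.25.2.2.0
kbytes = int(value.split()[0])  # drop the "KBytes" unit suffix
print(kbytes * 1024)            # memory size in bytes
```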
SNMP traps are a bit of an oddball compared to all the other Zabbix item types. Unlike other items, SNMP traps do not report a simple measurement but an event of some type. In other words, they are the result of a kind of check or computation made by the SNMP agent and sent over to the monitoring server as a status report. An SNMP trap can be issued every time a host is rebooted, an interface goes down, a disk is damaged, or a UPS has lost power and is keeping the servers up using its battery.
This kind of information contrasts with Zabbix's basic assumption that an item is a simple metric not directly related to a specific event. On the other hand, there may be no other way to be aware of certain situations except through an SNMP trap, either because there are no related metrics (consider, for example, the event of a server being shut down) or because the appliance's only way to convey its status is through a bunch of SNMP objects and traps.
So, traps are of relatively limited use to Zabbix as you can't do much more than build a simple trigger out of every trap and then notify about the event (not much point in graphing a trap or building calculated items on it). Nevertheless, they may prove essential for a complete monitoring solution.
To manage SNMP traps effectively, Zabbix needs a couple of helper tools: the snmptrapd daemon, to actually handle connections from the SNMP agents, and a kind of script to correctly format every trap and pass it to the Zabbix server for further processing.
If you have compiled SNMP support into the Zabbix server, you should already have the complete SNMP suite installed, which contains the SNMP daemon, the SNMP trap daemon, and a bunch of utilities, such as snmpwalk and snmptrap.
If it turns out that you don't actually have the SNMP suite installed, the following command should take care of the matter:
# yum install net-snmp net-snmp-utils
Just as the Zabbix server has a bunch of daemon processes that listen on the TCP port 10051 for incoming connections (from agents, proxies, and nodes), snmptrapd is the daemon process that listens on the UDP port 162 for incoming traps coming from remote SNMP agents.
Once installed, snmptrapd reads its configuration options from an snmptrapd.conf file, which can usually be found in the /etc/snmp/ directory. The bare minimum configuration for snmptrapd requires only the definition of a community string in the case of versions 1 and 2 of SNMP, which is as follows:
authCommunity log public
Alternatively, the definition of a user and a privacy level in the case of SNMP Version 3 is as follows:
createUser -e ENGINEID user MD5 auth DES priv
You need to create a separate createUser line for every remote Version 3 agent that will send traps. You also need to substitute all the user, auth, priv, MD5, and DES strings with what you have already configured on the agent, as explained in the previous note. Most importantly, you need to set the correct ENGINEID for every agent. You can get it from the agent's configuration itself.
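With more than a handful of Version 3 agents, writing these lines by hand becomes error-prone. The sketch below renders one createUser line per agent from a list of dictionaries; `create_user_lines`, the dictionary layout, the engine IDs, and the passwords are all made up for the example.

```python
def create_user_lines(agents):
    """Render one createUser directive per SNMPv3 agent."""
    template = ("createUser -e {engine_id} {user} {auth_proto} {auth_pass} "
                "{priv_proto} {priv_pass}")
    return "\n".join(template.format(**agent) for agent in agents)


# Hypothetical agents; engine IDs and passwords are placeholders.
agents = [
    {"engine_id": "0x8000000001020304", "user": "user", "auth_proto": "MD5",
     "auth_pass": "authpass1", "priv_proto": "DES", "priv_pass": "privpass1"},
    {"engine_id": "0x8000000001020305", "user": "user", "auth_proto": "MD5",
     "auth_pass": "authpass2", "priv_proto": "DES", "priv_pass": "privpass2"},
]
print(create_user_lines(agents))
```

The output can be appended to snmptrapd.conf. Since the file then contains passwords, it is worth restricting its permissions.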
With this minimal configuration, snmptrapd will do nothing more than log each trap to syslog. While it is possible to extract this information and send it to Zabbix, it's easier to tell snmptrapd how it should handle the traps. While the daemon has no processing capabilities of its own, it can execute any command or application by either using the trapHandle directive or leveraging its embedded perl functionality. The latter is more efficient as the daemon won't have to fork a new process and wait for its execution to finish, so it's the recommended one if you plan to receive a significant number of traps. Just add the following line to snmptrapd.conf:
perl do "/usr/local/bin/zabbix_trap_receiver.pl";
Once it is restarted, the snmptrapd daemon will execute the perl script of your choice to process every trap received. As you can probably imagine, your job doesn't end here: you still need to define how to handle the traps in your script and find a way to send the resulting work over to your Zabbix server. We'll discuss both of these aspects in the following section.
The perl script included in the Zabbix distribution works as a translator from an SNMP trap format to a Zabbix item measurement. For every trap received, it will format it according to the rules defined in the script and will output the result in a log file. The Zabbix server will, in turn, monitor the said log file and process every new line as an SNMP trap item, basically matching the content of the line to any trap item defined for the relevant host. Let's see how it all works by looking at the perl script itself and illustrating its logic:
#!/usr/bin/perl
#
# Zabbix
# Copyright (C) 2001-2013 Zabbix SIA
#
#########################################
#### ABOUT ZABBIX SNMP TRAP RECEIVER ####
#########################################
# This is an embedded perl SNMP trapper receiver designed for
# sending data to the server.
# The receiver will pass the received SNMP traps to Zabbix server
# or proxy running on the
# same machine. Please configure the server/proxy accordingly.
#
# Read more about using embedded perl with Net-SNMP:
# http://net-snmp.sourceforge.net/wiki/index.php/Tut:Extending_snmpd_using_perl
This first section contains just the licensing information and a brief description of the script. There is nothing worth mentioning here, except a simple reminder: check that your perl executable is correctly referenced in the first line, or change it accordingly. The following section is more interesting, and if you are happy with the script's default formatting of SNMP traps, it may also be the only section that you will ever need to customize:
#################################################
#### ZABBIX SNMP TRAP RECEIVER CONFIGURATION ####
#################################################
$SNMPTrapperFile = '/tmp/zabbix_traps.tmp';
$DateTimeFormat = '%H:%M:%S %Y/%m/%d';
Just set $SNMPTrapperFile to the path of the file that you wish the script to log its traps to, and set the SNMPTrapperFile option in your zabbix_server.conf file to the same value. While you are at it, also set StartSNMPTrapper to 1 in zabbix_server.conf so that the server will start monitoring the said file.
$DateTimeFormat, on the other hand, should match the format of the actual SNMP traps you receive from the remote agents. Most of the time, the default value is correct, but take the time to check it and change it as needed.
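To see what the default format string actually produces, you can feed the same directives to any strftime implementation. The snippet below uses Python's datetime module with an arbitrary sample date; the `DATE_TIME_FORMAT` name is just a stand-in for the Perl variable.

```python
from datetime import datetime

DATE_TIME_FORMAT = "%H:%M:%S %Y/%m/%d"  # same directives as the script's default

sample = datetime(2013, 7, 28, 9, 38, 51)  # arbitrary example moment
stamp = sample.strftime(DATE_TIME_FORMAT)
print(stamp)  # 09:38:51 2013/07/28

# The same format string parses the timestamp back into the original value.
print(datetime.strptime(stamp, DATE_TIME_FORMAT) == sample)  # True
```

Note that the time comes first and the date second, with zero-padded fields.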
The following section contains the actual logic of the script. Notice how the bulk of the logic is contained in a subroutine called zabbix_receiver. This subroutine will be called and executed towards the end of the script but is worth examining in detail:
###################################
#### ZABBIX SNMP TRAP RECEIVER ####
###################################
use Fcntl qw(O_WRONLY O_APPEND O_CREAT);
use POSIX qw(strftime);
sub zabbix_receiver
{
    my (%pdu_info) = %{$_[0]};
    my (@varbinds) = @{$_[1]};
The snmptrapd daemon will execute the script and pass it the trap that it just received. The script will, in turn, call its subroutine, which will immediately split the trap information into two structures: the first argument is assigned to the %pdu_info hash and the second one to the @varbinds array:
    # open the output file
    unless (sysopen(OUTPUT_FILE, $SNMPTrapperFile, O_WRONLY|O_APPEND|O_CREAT, 0666))
    {
        print STDERR "Cannot open [$SNMPTrapperFile]:$!\n";
        return NETSNMPTRAPD_HANDLER_FAIL;
    }
Here, the script will open the output file or fail gracefully if it somehow cannot. The next step consists of extracting the hostname (or IP address) of the agent that sent the trap. This information is stored in the %pdu_info hash we defined previously:
    # get the host name
    my $hostname = $pdu_info{'receivedfrom'} || 'unknown';
    if ($hostname ne 'unknown') {
        # keep only the first address enclosed in square brackets
        $hostname =~ /\[(.*?)\].*/;
        $hostname = $1 || 'unknown';
    }
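The extraction step is easier to follow with a concrete input. The sketch below mirrors the same idea in Python; `extract_host` is a hypothetical name, and the sample string assumes that the receivedfrom value looks like "UDP: [source]:port->[destination]", which may differ on your system.

```python
import re


def extract_host(received_from: str) -> str:
    """Return the first bracketed address in `received_from`, or 'unknown'."""
    match = re.search(r"\[(.*?)\]", received_from or "")
    if match and match.group(1):
        return match.group(1)
    return "unknown"


print(extract_host("UDP: [127.0.0.1]:41070->[127.0.0.1]"))  # 127.0.0.1
print(extract_host(""))                                     # unknown
```

The non-greedy `.*?` matters here: a greedy match would run on to the last closing bracket and swallow the port and the destination address as well.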
Now, we are ready to build the actual SNMP trap notification message. The first part of the output will be used by Zabbix to recognize the presence of a new trap (by looking for the ZBXTRAP string) and to know which of the monitored hosts the trap refers to. Keep in mind that the IP address or hostname set here must match the SNMP address value in the host configuration as set using the Zabbix frontend. This value must be set even if it's identical to the main IP/hostname for a given host. Once the Zabbix server has identified the correct host, it will discard this part of the trap notification:
    # print trap header
    # timestamp must be placed at the beginning of the first line (can be omitted)
    # the first line must include the header "ZBXTRAP [IP/DNS address] "
    # * IP/DNS address is the used to find the corresponding SNMP trap items
    # * this header will be cut during processing (will not appear in the item value)
    printf OUTPUT_FILE "%s ZBXTRAP %s\n", strftime($DateTimeFormat, localtime), $hostname;
After the notification header, the script will output the rest of the trap as received from the SNMP agent:
    # print the PDU info
    print OUTPUT_FILE "PDU INFO:\n";
    foreach my $key(keys(%pdu_info))
    {
        printf OUTPUT_FILE " %-30s %s\n", $key, $pdu_info{$key};
    }
The foreach loop in the previous code iterates over the %pdu_info hash, and its printf statement outputs every key-value pair:
    # print the variable bindings:
    print OUTPUT_FILE "VARBINDS:\n";
    foreach my $x (@varbinds)
    {
        printf OUTPUT_FILE " %-30s type=%-2d value=%s\n", $x->[0], $x->[2], $x->[1];
    }
    close (OUTPUT_FILE);
    return NETSNMPTRAPD_HANDLER_OK;
}
The second printf statement, printf OUTPUT_FILE " %-30s type=%-2d value=%s\n", $x->[0], $x->[2], $x->[1];, will output the contents of the @varbinds array one by one. This array is the one that contains the actual values reported by the trap. Once done, the log file is closed and the subroutine returns NETSNMPTRAPD_HANDLER_OK to signal success. The remainder of the script is as follows:
NetSNMP::TrapReceiver::register("all", \&zabbix_receiver) or
    die "failed to register Zabbix SNMP trap receiver\n";
print STDOUT "Loaded Zabbix SNMP trap receiver\n";
The last few lines of the script set the zabbix_receiver subroutine as the actual trap handler and give feedback about its correct setup. Once the trap handler starts populating the zabbix_traps.tmp log file, you need to define the corresponding Zabbix items.
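To get a feel for what ends up in the log file, the following sketch reproduces the layout written by the subroutine, using the same format strings. The `format_trap` function is an illustrative Python rendering rather than part of Zabbix, and the trap data (addresses, variable binding, type code) is invented for the example.

```python
from datetime import datetime


def format_trap(hostname, pdu_info, varbinds, when, fmt="%H:%M:%S %Y/%m/%d"):
    """Render a trap record: header line, PDU info, then variable bindings."""
    lines = [f"{when.strftime(fmt)} ZBXTRAP {hostname}", "PDU INFO:"]
    for key, value in pdu_info.items():
        lines.append(" %-30s %s" % (key, value))
    lines.append("VARBINDS:")
    # Each binding is (oid, value, type code), printed as oid, type, value.
    for oid, value, type_code in varbinds:
        lines.append(" %-30s type=%-2d value=%s" % (oid, type_code, value))
    return "\n".join(lines) + "\n"


# Invented sample data, for illustration only.
record = format_trap(
    "10.10.15.19",
    {"receivedfrom": "UDP: [10.10.15.19]:41070->[10.10.15.1]"},
    [("SNMPv2-MIB::snmpTrapOID.0", "OID: SNMPv2-MIB::coldStart", 6)],
    datetime(2013, 7, 28, 9, 38, 51),
)
print(record, end="")
```

The first line of the record carries the timestamp, the ZBXTRAP marker, and the address of the sending host. Everything after it is the trap body.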
As you've already seen, the first part of the log line is used by the Zabbix server to match a trap with its corresponding host. The second part is matched against the regular expressions of that host's SNMP trap items, and its contents are added to the history of every matching item. This means that if you wish to have a startup trap item for a given host, you'll need to configure an SNMP trap item with an snmptrap["coldStart"] key, as shown in the following screenshot:
From now on, you'll be able to see the contents of the trap in the item's data history.
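The matching logic described in this section can be sketched in a few lines. This is only an illustration of the idea, not how the Zabbix server is implemented: `split_record`, `matching_keys`, and the `items` dictionary are hypothetical, and the sample record is invented.

```python
import re


def split_record(record: str):
    """Return (host, body) for one logged trap, or None if the header is missing."""
    header, _, body = record.partition("\n")
    match = re.search(r"ZBXTRAP (\S+)", header)
    if match is None:
        return None
    return match.group(1), body


def matching_keys(record: str, items: dict) -> list:
    """List the item keys of the trap's host whose regexp matches the trap body.

    `items` maps a host address to {item key: regexp}; a hypothetical
    structure used only for this illustration.
    """
    parsed = split_record(record)
    if parsed is None:
        return []
    host, body = parsed
    return [key for key, pattern in items.get(host, {}).items()
            if re.search(pattern, body)]


record = (
    "09:38:51 2013/07/28 ZBXTRAP 10.10.15.19\n"
    "PDU INFO:\n"
    " receivedfrom                   UDP: [10.10.15.19]:41070->[10.10.15.1]\n"
    "VARBINDS:\n"
    " SNMPv2-MIB::snmpTrapOID.0      type=6  value=OID: SNMPv2-MIB::coldStart\n"
)
items = {"10.10.15.19": {'snmptrap["coldStart"]': "coldStart",
                         'snmptrap["linkDown"]': "linkDown"}}
print(matching_keys(record, items))  # ['snmptrap["coldStart"]']
```

A trap from an address that has no entry in `items` simply matches nothing, which is one way to picture why the SNMP address configured for the host has to agree with the address written in the header.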