The Zabbix log file format

One of the first places we should check when there's an unexplained issue is log files. This is not just a Zabbix-specific thing; log files are great. Sometimes. Other times, they do not help, but we will discuss some other options for when log files do not provide the answer. To be able to find the answer, though, it is helpful to know some basics about the log file format. The Zabbix log format is as follows:

PPPPPP:YYYYMMDD:HHMMSS.mmm

Here, PPPPPP is the process ID, space-padded to six characters; YYYYMMDD is the current date; HHMMSS is the current time; and mmm is the millisecond part of the timestamp. The colons and the dot are literal characters. This prefix is followed by a space and then by the actual log message. Here's an example log entry:

10372:20151223:134406.865 database is down: reconnecting in 10 seconds

If there's a line in the log file without this prefix, it is most likely coming from an external source, such as a script, or maybe from some library, such as Net-SNMP.
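To make the format concrete, here is a minimal Python sketch that splits a line into its PID, timestamp, and message, and returns None for lines that lack the prefix. The names PREFIX and parse_line are made up for this illustration and are not part of Zabbix.

```python
import re
from datetime import datetime

# PID (possibly space-padded), date, time with milliseconds, one space, message.
PREFIX = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line has no Zabbix prefix."""
    match = PREFIX.match(line.rstrip("\n"))
    if match is None:
        # Probably output from an external script or a library.
        return None
    pid, date, clock, millis, message = match.groups()
    timestamp = datetime.strptime(f"{date} {clock}.{millis}", "%Y%m%d %H%M%S.%f")
    return int(pid), timestamp, message


if __name__ == "__main__":
    print(parse_line("10372:20151223:134406.865 database is down: reconnecting in 10 seconds"))
    print(parse_line("a line without the prefix"))
```

Lines for which parse_line returns None are the ones worth checking against external scripts and libraries.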

During startup, output similar to the following will be logged:

3737:20181208:111546.489 Starting Zabbix Server. Zabbix 4.0.2 (revision 87228).
3737:20181208:111546.489 ****** Enabled features ******
3737:20181208:111546.489 SNMP monitoring: YES
3737:20181208:111546.489 IPMI monitoring: YES
3737:20181208:111546.489 Web monitoring: YES
3737:20181208:111546.489 VMware monitoring: YES
3737:20181208:111546.489 SMTP authentication: YES
3737:20181208:111546.489 Jabber notifications: YES
3737:20181208:111546.489 Ez Texting notifications: YES
3737:20181208:111546.489 ODBC: YES
3737:20181208:111546.489 SSH2 support: YES
3737:20181208:111546.489 IPv6 support: YES
3737:20181208:111546.489 TLS support: YES
3737:20181208:111546.489 ******************************
3737:20181208:111546.489 using configuration file: /etc/zabbix/zabbix_server.conf
3737:20181208:111546.500 current database version (mandatory/optional): 04000000/04000003
3737:20181208:111546.500 required mandatory version: 04000000

The first line prints out the daemon type and version. Depending on how it was compiled, it might also include the current SVN revision number. A list of the compiled-in features follows. This is very useful for determining whether you should expect SNMP, IPMI, or VMware monitoring to work at all. Then, the path to the configuration file in use is shown, which is helpful when we want to figure out whether the file we changed was the correct one. In the server and proxy log files, both the current and the required database versions are present; we discussed those in Chapter 20, Zabbix Maintenance.
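If the feature list needs to be checked on many hosts, it can be extracted with a short script. The following Python sketch collects the feature block into a dictionary; the function name enabled_features is hypothetical, and the sketch assumes that a feature that was not compiled in is reported as NO rather than left out.

```python
import re

# A feature line: the usual prefix, the feature name, a colon, then YES or NO.
FEATURE = re.compile(r"^\s*\d+:\d{8}:\d{6}\.\d{3} (.+?):\s+(YES|NO)\s*$")


def enabled_features(lines):
    """Map feature name to True/False, using the last feature block in lines."""
    features = {}
    inside = False
    for line in lines:
        if "****** Enabled features ******" in line:
            inside = True
            features = {}  # a newer startup replaces an older one
            continue
        if inside:
            match = FEATURE.match(line)
            if match is None:
                inside = False  # the closing row of asterisks ends the block
                continue
            features[match.group(1)] = match.group(2) == "YES"
    return features


if __name__ == "__main__":
    # The path is an assumption; use the LogFile value from your configuration.
    with open("/var/log/zabbix/zabbix_server.log") as log:
        for name, enabled in enabled_features(log).items():
            print(f"{name}: {'YES' if enabled else 'NO'}")
```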

After the database versions, the internal process startup messages can be found:

3737:20181208:111546.507 server #0 started [main process]
3747:20181208:111546.517 server #6 started [timer #1]
3748:20181208:111546.518 server #7 started [http poller #1]
3743:20181208:111546.518 server #2 started [alerter #1]
3744:20181208:111546.518 server #3 started [alerter #2]
3745:20181208:111546.518 server #4 started [alerter #3]
3749:20181208:111546.519 server #8 started [discoverer #1]
3750:20181208:111546.529 server #9 started [history syncer #1]
3746:20181208:111546.529 server #5 started [housekeeper #1]
3742:20181208:111546.529 server #1 started [configuration syncer #1]
3769:20181208:111546.529 server #28 started [trapper #5]
3771:20181208:111546.531 server #30 started [alert manager #1]
3754:20181208:111546.532 server #13 started [escalator #1]
3756:20181208:111546.533 server #15 started [proxy poller #1]
3757:20181208:111546.535 server #16 started [self-monitoring #1]
3758:20181208:111546.535 server #17 started [task manager #1]
3761:20181208:111546.535 server #20 started [poller #3]
3764:20181208:111546.546 server #23 started [unreachable poller #1]
3765:20181208:111546.556 server #24 started [trapper #1]
3755:20181208:111546.558 server #14 started [snmp trapper #1]
3763:20181208:111546.558 server #22 started [poller #5]
3772:20181208:111546.570 server #31 started [preprocessing manager #1]
3766:20181208:111546.570 server #25 started [trapper #2]
3751:20181208:111546.572 server #10 started [history syncer #2]
3753:20181208:111546.572 server #12 started [history syncer #4]
3759:20181208:111546.572 server #18 started [poller #1]
3762:20181208:111546.584 server #21 started [poller #4]
3767:20181208:111546.594 server #26 started [trapper #3]
3768:20181208:111546.596 server #27 started [trapper #4]
3770:20181208:111546.598 server #29 started [icmp pinger #1]
3752:20181208:111546.599 server #11 started [history syncer #3]
3760:20181208:111546.599 server #19 started [poller #2]
3774:20181208:111547.136 server #33 started [preprocessing worker #2]
3773:20181208:111547.162 server #32 started [preprocessing worker #1]
3775:20181208:111547.162 server #34 started [preprocessing worker #3]

There will be many more lines like these; the output here is trimmed. This might help verify that the expected number of processes of some type has been started. When looking at log file contents, it is not always obvious which process logged a specific line, and this is where the startup messages can help. If we see a line such as the following, we can find out which process logged it:

21974:20151231:184520.117 Zabbix agent item "vfs.fs.size[/,free]" on host "A test host" failed: another network error, wait for 15 seconds

We can do that by looking for the startup message with the same PID:

# grep 21974 zabbix_server.log | grep started
21974:20151231:184352.921 server #8 started [unreachable poller #1]

If more than one line is returned (the number can also appear elsewhere in a line, and a PID can be reused after a restart), apply common sense to pick out the right startup message.

This demonstrates that hosts are deferred to the unreachable poller after the first network failure.
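The same lookup can be scripted when there are many lines to attribute. The following Python sketch matches only real startup messages, picks the latest one for a PID that is not newer than the line in question, and also counts the started processes by type. All function names are invented for this illustration, and count_by_type assumes that the lines cover a single daemon startup.

```python
import re
from collections import Counter

# Startup message: prefix, daemon type, process number, name in square brackets.
STARTED = re.compile(
    r"^\s*(\d+):(\d{8}:\d{6}\.\d{3}) \w+ #(\d+) started \[(.+?)\]\s*$"
)


def startup_messages(lines):
    """Yield (pid, timestamp_text, process_name) for each startup message."""
    for line in lines:
        match = STARTED.match(line)
        if match:
            pid, stamp, _number, name = match.groups()
            yield int(pid), stamp, name


def who_logged(lines, pid, stamp):
    """Name of the process that owned pid at stamp, or None if unknown."""
    best = None
    for started_pid, started_at, name in startup_messages(lines):
        # Timestamps are fixed-width, so comparing them as text is chronological.
        if started_pid == pid and started_at <= stamp:
            if best is None or started_at >= best[0]:
                best = (started_at, name)
    return best[1] if best else None


def count_by_type(lines):
    """Count started processes per type, ignoring the trailing ' #N'."""
    counts = Counter()
    for _pid, _stamp, name in startup_messages(lines):
        counts[re.sub(r" #\d+$", "", name)] += 1
    return counts


if __name__ == "__main__":
    # The file name is taken from the grep example above.
    with open("zabbix_server.log") as log:
        lines = list(log)
    print(who_logged(lines, 21974, "20151231:184520.117"))
    print(count_by_type(lines))
```

Comparing the counts with the Start* parameters in the configuration file is a quick way to confirm that the expected number of processes of each type was started.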

But what if the log file has been rotated and the original startup messages are lost? Besides more advanced detective work, there's a simple method, provided that the daemon is still running. We will look at that method a bit later in this chapter, when we discuss runtime process status.
