If you’ve made it this far, it is assumed that you’ve decided to join the ranks of the Ganglia user base. Congratulations! We’ll have your Ganglia-user conspiracy to conquer the world kit shipped immediately. Until it arrives, feel free to read through this chapter, in which we show you how to install and configure the various Ganglia components. In this chapter, we cover the installation and configuration of Ganglia 3.1.x for some of the most popular operating systems, but these instructions should apply to later versions as well.
As mentioned earlier, Ganglia is composed of three components: gmond, gmetad, and gweb. In this first section, we’ll cover the installation and basic setup of each component.
gmond stands for Ganglia Monitoring Daemon. It’s a lightweight service that must be installed on each node from which you want to have metrics collected. This daemon performs the actual metrics collection on each host using a simple listen/announce protocol to share the data it gleans with its peer nodes in the cluster. Using gmond, you can collect a lot of system metrics right out of the box, such as CPU, memory, disk, network, and data about active processes.
gmond installation is straightforward, and the libraries it depends upon are installed by default on most modern Linux distributions (as of this writing, those libraries are libconfuse, pkgconfig, PCRE, and APR). Ganglia packages are available for most Linux distributions, so if you are using the package manager shipped with your distribution (which is the suggested approach), resolving the dependencies should not be problematic.
The Ganglia components are available in a prepackaged binary format for most Linux distributions. We’ll cover the two most popular types here: .deb- and .rpm-based systems.
To install gmond on a Debian-based Linux distribution, execute:
user@host:$ sudo apt-get install ganglia-monitor
You’ll find that some RPM-based distributions ship with Ganglia packages in the base repositories, and others require you to use special-purpose package repositories, such as the Red Hat project’s EPEL (Extra Packages for Enterprise Linux) repository. If you’re using an RPM-based distro, you should search in your current repositories for the gmond package:
user@host:$ yum search ganglia-gmond
If the search fails, chances are that Ganglia is not shipped with your RPM distribution. Red Hat users need to install Ganglia from the EPEL repository. The following examples demonstrate how to add the EPEL repository to Red Hat 5 and Red Hat 6.
If you need to add the EPEL repository, be sure to take careful note of the distro version and architecture you are running and match it to that of the EPEL you’re adding.
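As a quick check before choosing an EPEL package, the following sketch prints the architecture and release string of the local machine. It assumes a Red Hat-style system where /etc/redhat-release exists; on other systems the second function prints nothing. The function names are made up for illustration.

```shell
# Print the machine architecture reported by the kernel (for example i686 or x86_64).
# show_arch is a hypothetical helper name.
show_arch() {
    uname -m
}

# Print the distribution release string, if the Red Hat-style release file is present.
# show_release is a hypothetical helper name.
show_release() {
    if [ -r /etc/redhat-release ]; then
        cat /etc/redhat-release
    fi
}

show_arch
show_release
```

An i686 result would typically correspond to an i386 repository path, whereas an x86_64 result would call for the x86_64 path instead.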
For Red Hat 5.x:
user@host:$ sudo rpm -Uvh http://mirror.ancl.hawaii.edu/linux/epel/5/i386/epel-release-5-4.noarch.rpm
For Red Hat 6.x:
user@host:$ sudo rpm -Uvh http://mirror.chpc.utah.edu/pub/epel/6/i386/epel-release-6-7.noarch.rpm
Finally, to install gmond, type:
user@host:$ sudo yum install ganglia-gmond
gmond compiles and runs fine on Mac OS X; however, at the time of this writing, there are no prepackaged binaries available. OS X users must therefore build Ganglia from source. Refer to the following instructions, which work for the latest Mac OS X Lion. For other versions of Mac OS X, the dependencies might vary. Please refer to Ganglia’s website for further information.
Several dependencies must be satisfied before building and installing Ganglia on OS X. These are, in the order they should be installed:
Xcode >= 4.3
MacPorts (requires Xcode)
libconfuse (requires MacPorts)
pkgconfig (requires MacPorts)
PCRE (requires MacPorts)
APR (requires MacPorts)
Xcode is a collection of development tools and an Integrated Development Environment (IDE) for OS X. You will find Xcode for download at Apple’s developer tools website or on the Mac OS X installation disc.
MacPorts is a collection of build instructions for popular open source software for OS X. It is architecturally identical to the venerable FreeBSD Ports system. To install MacPorts, download the installation disk image from the MacPorts website. MacPorts for Mac OS X Lion is here. If you’re using Snow Leopard, the download is located here. For older versions, please refer here for documentation and download links.
Once MacPorts is installed and working properly, use it to install libconfuse, pkgconfig, PCRE, and APR:
$ sudo port install libconfuse pkgconfig pcre apr
After satisfying the previously listed requirements, you are ready to proceed with the installation. Please download the latest Ganglia source release.
Change to the directory where the source file has been downloaded. Uncompress the tar-gzip file you have just downloaded:
$ tar -xvzf ganglia-major.minor.release.tar.gz
On Mac OS X 10.5+, you need to apply a patch so that gmond builds successfully. For further details on the patch, please visit the website. Download the patch file, copy it to the root of the build directory, and run the patch:
$ cd ganglia-major.minor.release
$ patch -p0 < patch-file
Assuming that you installed MacPorts under the default installation directory (/opt/local), export MacPorts’ bin directory to your PATH and run the configure script, specifying the location of lib/ and include/ as options:
$ export PATH=$PATH:/opt/local/bin
$ ./configure LDFLAGS="-L/opt/local/lib" CPPFLAGS="-I/opt/local/include"
$ make
$ sudo make install
Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install OpenCSW and its pkgutil tool on Solaris, and then use the tool to install the package:
$ pkgutil -i CSWgangliaagent
The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, such as:
$ svcadm enable cswgmond
Because Ganglia is an open source project, it is possible to compile a runnable binary executable of the gmond agent on virtually any platform with a C compiler.
The Ganglia project uses the autotools build system to detect the tools available on most Linux and UNIX-like environments and build the binaries.
The autotools build system is likely to have support for many other platforms that are not explicitly documented in this book. Please start by reading the INSTALL file in the source tree, and also look online for tips about Ganglia or generic tips about using autotools projects in your environment.
gmetad (the Ganglia Meta Daemon) is the service that collects metric data from other gmetad and gmond sources and stores their state to disk in RRD format. It also provides a simple query mechanism for collecting specific information about groups of machines and supports hierarchical delegation, making possible the creation of federated monitoring domains.
The requirements for installing gmetad on Linux are nearly the same as gmond, except for the addition of RRDtool, which is required to store and display time-series data collected from other gmetad or gmond sources.
Once again, you are encouraged to take advantage of the prepackaged binaries available in the repository of your Linux distribution; we provide instructions for the two most popular formats next.
To install gmetad on a Debian-based Linux distribution, execute:
user@host:$ sudo apt-get install gmetad
Compared to gmond, gmetad has additional software dependencies.
As mentioned in the earlier gmond installation section, an EPEL repository must be installed if the base repositories don’t provide gmetad. Refer to gmond to add the EPEL repository. Once you’re ready, type:
user@host:$ sudo yum install ganglia-gmetad
There are only two functional differences between building gmond and gmetad on OS X. First, gmetad has one additional software dependency (RRDtool), and second, you must pass the --with-gmetad option to the configure script, because only gmond is built by the default Makefile.
Following is the list of requirements that must be satisfied before you can build gmetad on Mac OS X:
Xcode >= 4.3
MacPorts (requires Xcode)
libconfuse (requires MacPorts)
pkgconfig (requires MacPorts)
PCRE (requires MacPorts)
APR (requires MacPorts)
RRDtool (requires MacPorts)
Refer to OS X for instructions on installing Xcode and MacPorts. Once you have those sorted out, install the following packages to satisfy the requirements:
$ sudo port install libconfuse pkgconfig pcre apr rrdtool
Once those packages have been installed, proceed with the Ganglia installation by downloading the latest Ganglia version.
Uncompress and extract the tarball you have just downloaded:
$ tar -xvzf ganglia-major.minor.release.tar.gz
Successfully building Ganglia 3.1.2 on OS X 10.5 requires that you apply the patch detailed here. Download the patch file and copy it to the root of the extracted Ganglia source tree, then apply it:
$ cd ganglia-major.minor.release
$ patch -p0 < patch-file
Assuming that you installed MacPorts under the default installation directory (/opt/local), export MacPorts’ bin directory to your PATH and run the configure script, specifying the location of lib/ and include/ as options:
$ export PATH=$PATH:/opt/local/bin
$ ./configure --with-gmetad LDFLAGS="-L/opt/local/lib" CPPFLAGS="-I/opt/local/include"
$ make
$ sudo make install
Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install OpenCSW and its pkgutil tool on Solaris, and then use the tool to install the package:
$ pkgutil -i CSWgangliagmetad
The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, as in:
$ svcadm enable cswgmetad
Ganglia wouldn’t be complete without its web interface: gweb (Ganglia Web). After collecting several different metrics in order to evaluate how our cluster is performing, we certainly need a visual representation, preferably as graphs on the Web. gweb fills this gap. gweb is a PHP frontend that displays all the data stored by gmetad in your browser. Please see the “Demos” section here for live demos of the web frontend.
As of Ganglia 3.4.0, the web interface is a separate distribution tarball maintained in a separate source code repository. The release cycle and version numbers of gweb are no longer in lockstep with the release cycle and version numbers of the gmond and gmetad daemons.
Ganglia developers support gweb 3.4.0 with all versions of gmond/gmetad version 3.1.x and higher. Future versions of gweb may require a later version of gmond/gmetad. It’s recommended to check the installation documentation for exact details whenever installing or upgrading gweb.
The frontend, as already mentioned, is a web application. This book covers gweb versions 3.4.x and later, which may not be available to all distributions, requiring more work to get it installed. Before proceeding, please review the requirements to install gweb:
Apache Web Server
PHP 5.2 or later
PHP JSON extension installed and enabled
If you are installing from the repositories, the installation is pretty straightforward. Requirements will be automatically satisfied, and within a few commands you should be able to play with the web interface.
To install gweb on a Debian-based Linux distribution, execute the following command as root or as a user with elevated privileges:
root@host:# apt-get install apache2 php5 php5-json
This command installs Apache and PHP 5 to satisfy gweb’s dependencies, in case you don’t already have them installed. You might have to enable the PHP JSON module as well. To check whether it is enabled, execute this command:
root@host:# grep ^extension=json.so /etc/php5/conf.d/json.ini
and if the module is not enabled, enable it with the following command:
root@host:# echo 'extension=json.so' >> /etc/php5/conf.d/json.ini
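The check and the append can be combined so that the line is added only once, however many times the step is repeated. This is a minimal sketch; the function name enable_json is made up for illustration, and the path in the usage comment is the one used above.

```shell
# Append the JSON extension line to a PHP ini file only if it is not already there.
# enable_json is a hypothetical helper name, not part of PHP or Ganglia.
enable_json() {
    ini_file="$1"
    # -x matches whole lines, -F treats the pattern as a fixed string, -q suppresses output.
    if ! grep -qxF 'extension=json.so' "$ini_file" 2>/dev/null; then
        echo 'extension=json.so' >> "$ini_file"
    fi
}

# Usage (as root): enable_json /etc/php5/conf.d/json.ini
```

Because the function tests for the exact line first, running it twice leaves a single extension line in the file.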
You are ready to download the latest gweb. Once it’s downloaded, extract it and edit the Makefile to install gweb:
root@host:# tar -xvzf ganglia-web-major.minor.release.tar.gz
root@host:# cd ganglia-web-major.minor.release
Edit the Makefile and set the DESTDIR and APACHE_USER variables. On Debian-based distros, the default settings are the following:
# Location where gweb should be installed to
DESTDIR = /var/www/html/ganglia2
APACHE_USER = www-data
...
This means that gweb will be available to the user here. You can change this to whichever name you want. Finally, run the following command:
root@host:# make install
If no errors are shown, gweb is successfully installed. Skip to Configuring Ganglia for further information on gweb settings.
Installing gweb on an RPM-based distribution is very similar to installing gweb on a Debian-based distribution. Start by installing Apache and PHP 5:
root@host:# yum install httpd php
You also need to enable the JSON extension for PHP. It’s already included in PHP 5.2 or later. Make sure it’s enabled by checking the contents of the /etc/php.d/json.ini file. You should have something similar to the following listing:
extension=json.so
Download the latest gweb. Once it’s downloaded, extract it and edit the Makefile to install gweb 2:
root@host:# tar -xvzf ganglia-web-major.minor.release.tar.gz
root@host:# cd ganglia-web-major.minor.release
Edit the Makefile and set the DESTDIR and APACHE_USER variables. On RPM-based distros, the default settings are:
# Location where gweb should be installed to
DESTDIR = /var/www/html/ganglia2
APACHE_USER = apache
...
This means that gweb will be available here. You can change this to whichever name you want. Finally, run:
root@host:# make install
If no errors are shown, gweb is successfully installed. Skip to Configuring Ganglia for further information on gweb settings.
If you need to install gweb on Mac OS X, you have to follow a slightly different approach than if you were installing in Linux. Again, there isn’t any binary package for Mac OS X, leaving you with the option of downloading the source from the website. Before downloading, you have to make sure that your Mac OS X has shipped with a few of the requirements. That’s what this section is about.
First off, an HTTP server is required, and chances are good that your Mac OS X installation was shipped with Apache Web Server. You can also install it via MacPorts, but this approach is not covered here. To verify your Apache installation, go to System Preferences → Sharing. Turn Web Services on if it is off. Make sure it’s running by typing http://localhost in your browser. You should see a test page. You can also load Apache via Terminal by typing:
$ sudo launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist
PHP is also required to run gweb. PHP is shipped with Mac OS X, but it’s not enabled by default. To enable it, edit the httpd.conf file and uncomment the line that loads the php5_module.
$ cd /etc/apache2
$ sudo vim httpd.conf
Search for the following line, uncomment (strip the #) it, and save the file:
# LoadModule php5_module libexec/apache2/libphp5.so
Restart Apache:
$ sudo launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist
$ sudo launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist
Now that you have satisfied the requirements, it’s time to download and install gweb 2. Please download the latest release. Once you have finished, change to the directory where the file is located and extract its content. Next, cd to the extraction directory:
$ tar -xvzf ganglia-web-major.minor.release.tar.gz
$ cd ganglia-web-major.minor.release
This next step really depends on how Apache Web Server is set up on your system. You need to find out where Apache serves its pages from or, more specifically, its DocumentRoot. Of course, the following location isn’t the only possibility, but for clarity’s sake, we will work with the default settings. So here, we’re using /Library/WebServer/Documents:
$ grep -i documentroot /etc/apache2/httpd.conf
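The grep above also prints commented-out lines that mention DocumentRoot. As a rough sketch, the active directive alone can be pulled out with awk; the function name document_root is hypothetical, and the sketch assumes the path contains no spaces.

```shell
# Print the value of the first active DocumentRoot directive in an Apache config file.
# document_root is a hypothetical helper name; paths containing spaces are not handled.
document_root() {
    awk 'tolower($1) == "documentroot" {
        gsub(/"/, "", $2)   # strip the surrounding double quotes
        print $2
        exit
    }' "$1"
}

# Usage: document_root /etc/apache2/httpd.conf
```

Comment lines are skipped because their first field starts with #, so only the directive itself matches.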
Edit the Makefile found in the tarball. Insert the location of your Apache’s DocumentRoot and the name of the user that Apache runs as. On Mac OS X Lion, the settings are:
# Location where gweb should be installed
DESTDIR = /Library/WebServer/Documents/ganglia2
APACHE_USER = _www
...
This means that gweb will be available to the user here. You can change this to whichever name you want. Finally, run:
$ sudo make install
If no errors are shown, Ganglia Web is successfully installed. Read the next sections to configure Ganglia prior to running it for the first time.
Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install OpenCSW and its pkgutil tool on Solaris, and then use the tool to install the package:
$ pkgutil -i CSWgangliaweb
The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, as in:
$ svcadm enable cswapache
The following subsections document the configuration specifics of each Ganglia component. The default configuration shipped with Ganglia “just works” in most environments with very little additional configuration, but we want to let you know what other options are available in addition to the default. We would also like you to understand how the choice of a particular option may affect Ganglia deployment in your environment.
gmond, summarized in Chapter 1, is installed on each host that you want to monitor. It interacts with the host operating system to obtain metrics and shares the metrics it collects with other hosts in the same cluster. Every gmond instance in the cluster knows the value of every metric collected by every host in the same cluster and by default provides an XML-formatted dump of the entire cluster state to any client that connects to gmond’s port.
gmond’s default topology is a multicast mode, meaning that all nodes in the cluster both send and receive metrics, and every node maintains an in-memory database—stored as a hash table—containing the metrics of all nodes in the cluster. This topology is illustrated in Figure 2-1.
Of particular importance in this diagram is the disparate nature of the gmond daemon. Internally, gmond’s sending and receiving halves are not linked (a fact that is emphasized in Figure 2-1 by the dashed vertical line). gmond does not talk to itself—it only talks to the network. Any local data captured by the metric modules are transmitted directly to the network by the sender, and the receiver’s internal database contains only metric data gleaned from the network.
This topology is adequate for most environments, but in some cases it is desirable to specify a few specific listeners rather than allowing every node to receive (and thereby waste CPU cycles to process) metrics from every other node. More detail about this architecture is provided in Chapter 3.
The use of “deaf” nodes, as illustrated in Figure 2-2, eliminates the processing overhead associated with large clusters. The deaf and mute parameters exist to allow some gmond nodes to act as special-purpose aggregators and relays for other gmond nodes. Mute means that the node does not transmit; it will not even collect information about itself but will aggregate the metric data from other gmond daemons in the cluster. Deaf means that the node does not receive any metrics from the network; it will not listen to state information from multicast peers, but if it is not muted, it will continue sending out its own metrics for any other node that does listen.
The use of multicast is not required in any topology. The deaf/mute topology can be implemented using UDP unicast, which may be desirable when multicast is not practical or preferred (see Figure 2-3).
Further, it is possible to mix and match the deaf/mute and default topologies to create a system architecture that better suits your environment. The only topological requirements are:
At least one gmond instance must receive all the metrics from all nodes in the cluster.
Periodically, gmetad must poll the gmond instance that holds the entire cluster state.
In practice, however, nodes not configured with any multicast connectivity do not need to be deaf; it can be useful to configure such nodes to send metrics to themselves using the address 127.0.0.1 so that they will keep a record of their own metrics locally. This makes it possible to make a TCP probe of any gmond and retrieve an XML dump of its own agent state while troubleshooting.
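Such a TCP probe can be post-processed with ordinary text tools. The sketch below lists the host names found in an XML dump read from standard input. It assumes that each host appears as a HOST element whose first attribute is NAME, and the function name list_hosts is made up for illustration.

```shell
# Read a gmond XML dump on standard input and print one host name per line.
# list_hosts is a hypothetical helper; it assumes elements of the form <HOST NAME="..." ...>.
list_hosts() {
    grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="//; s/"$//'
}

# Usage, assuming netcat (nc) is installed and gmond listens on port 8649:
#   nc 127.0.0.1 8649 | list_hosts
```

On a node that only sends metrics to itself, the list would be expected to contain just that node.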
For a more thorough discussion of topology and scalability considerations, see Chapter 3.
You can generate a default configuration file for gmond by running the following command:
user@host:$ gmond -t
The configuration file is composed of sections, enclosed in curly braces, that fall roughly into two logical categories. The sections in the first category deal with host and cluster configuration; those in the second category deal with the specifics of metrics collection and scheduling.
All section names and attributes are case insensitive. The following attributes, for example, are all equivalent:
name
NAME
Name
NaMe
Some configuration sections are optional; others are required. Some may be defined in the configuration file multiple times; others must appear only once. Some sections may contain subsections.
The include directive can be used to break up the gmond.conf file into multiple files for environments with large complex configurations. The include directive supports the use of typeglobs. For example, the line:
include ('/etc/ganglia/conf.d/*.conf')
would instruct gmond to load all files in /etc/ganglia/conf.d/ that ended in “.conf”.
To get gmond up and running quickly just to poke around, all you should need to set is the name attribute in the “cluster” section of the default configuration file.
The configuration file is parsed using libconfuse, a third-party API for configuration files. The normal rules of libconfuse file format apply. In particular, boolean values can be set using yes, true, and on for a positive value and their opposites, no, false, and off for a negative value. Boolean values are not handled in a case-sensitive manner.
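For instance, a minimal edit for a first test run might look like the excerpt below. The cluster name is a made-up placeholder, and the globals lines exist only to illustrate that the boolean spellings are interchangeable; attributes that are not shown are assumed to keep their default values.

```
/* gmond.conf excerpt: only the lines that differ from the defaults */
cluster {
  name = "my-test-cluster" /* placeholder name, choose your own */
}

globals {
  daemonize = yes  /* same as true or on */
  mute = off       /* same as no or false */
  deaf = FALSE     /* case does not matter */
}
```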
There are eight sections that deal with the configuration of the host itself.
The globals section configures the general characteristics of the daemon itself. It should appear only once in the configuration file. The following is the default globals section from Ganglia 3.3.1:
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 0 /*secs */
}
daemonize (boolean): When true, gmond will fork and run in the background. Set this value to false if you’re running gmond under a daemon manager such as daemontools.
setuid (boolean): When true, gmond will set its effective UID to the UID of the user specified by the user attribute. When false, gmond will not change its effective user.
debug_level (integer value): When set to zero (0), gmond will run normally. A debug_level greater than zero will result in gmond running in the foreground and outputting debugging information. The higher the debug_level, the more verbose the output.
max_udp_msg_len (integer value): This value is the maximum size that one packet sent by gmond will contain. It is not a good idea to change this value.
mute (boolean): When true, gmond will not send data, regardless of any other configuration directive. “Mute” gmond nodes are only mute when it comes to other gmond daemons. They still respond to queries from external pollers such as gmetad.
deaf (boolean): When true, gmond will not receive data, regardless of any other configuration directives. In large grids with thousands of nodes per cluster, or carefully optimized HPC grids, in which every CPU cycle spent on something other than the problem is a wasted cycle, “normal” compute nodes are often configured as deaf in order to minimize the overhead associated with aggregating cluster state. In these instances, dedicated nodes are set aside to be mute. In such a setup, the performance metrics of the mute nodes aren’t measured because those nodes aren’t a computationally relevant portion of the grid. Their job is to aggregate, so their performance data would pollute that of the functional portion of the cluster.
allow_extra_data (boolean): When false, gmond will not send the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This value might be useful if you are using your own frontend and would like to save some bandwidth.
host_dmax (integer value in seconds): Stands for “delete max.” When set to 0, gmond will never delete a host from its list, even when a remote host has stopped reporting. If host_dmax is set to a positive number, gmond will flush a host after it has not heard from it for host_dmax seconds.
host_tmax (integer value in seconds): Stands for “timeout max.” Represents the maximum amount of time that gmond should wait between updates from a host. Because messages may get lost in the network, gmond will consider the host as being down if it has not received any messages from it after four times this value.
cleanup_threshold (integer value in seconds): Minimum amount of time before gmond will clean up expired data.
gexec (boolean): When true, gmond will announce the host’s availability to run gexec jobs. This approach requires that gexecd be running on the host and the proper keys have been installed.
send_metadata_interval (integer value in seconds): Establishes the interval at which gmond will send or resend the metadata packets that describe each enabled metric. This directive by default is set to 0, which means that gmond will send the metadata packets only at startup and upon request from other gmond nodes running remotely. If a new machine running gmond is added to a cluster, it needs to announce itself and inform all other nodes of the metrics that it currently supports. In multicast mode, this isn’t a problem, because any node can request the metadata of all other nodes in the cluster. However, in unicast mode, a resend interval must be established. The interval value is the minimum number of seconds between resends.
module_dir (path; optional): Indicates the directory where the metric collection modules are found. If omitted, defaults to the value of the compile-time option --with-moduledir. This option, in turn, defaults to a subdirectory named ganglia in the directory where libganglia will be installed. To discover the default value in a particular gmond binary, generate a sample configuration file by running:
# gmond -t
For example, on a 32-bit Intel-compatible Linux host, the default is usually /usr/lib/ganglia.
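To make the host_tmax and host_dmax timing concrete, the following sketch works through the arithmetic for the default values shown in the globals section above. The variable names simply mirror the attribute names; nothing here is read from a real gmond.

```shell
# Default values taken from the globals section shown earlier.
host_tmax=20      # seconds between expected updates from a host
host_dmax=86400   # seconds of silence before a host is deleted

# A host is considered down after four times host_tmax without a message.
down_after=$((host_tmax * 4))

echo "host considered down after ${down_after} seconds of silence"
echo "host deleted after ${host_dmax} seconds ($((host_dmax / 3600)) hours) of silence"
```

With the defaults, a silent host is treated as down after 80 seconds and removed after 24 hours.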
Each gmond daemon will report information about the cluster in which it resides using the attributes defined in the cluster section. The default values are the string "unspecified"; the system is usable with the default values. This section may appear only once in the configuration file. Following is the default cluster section:
cluster {
  name = "unspecified"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
The attributes in the cluster section directly correspond to the attributes in the CLUSTER tag in the XML output from gmond.
name (text): Specifies the name of the cluster. When the node is polled for an XML summary of cluster state, this name is inserted in the CLUSTER element. The gmetad polling the node uses this value to name the directory where the cluster data RRD files are stored. It supersedes a cluster name specified in the gmetad.conf configuration file.
owner (text): Specifies the administrators of the cluster.
latlong (text): Specifies the latitude and longitude GPS coordinates of this cluster on earth.
url (text): Intended to refer to a URL with information specific to the cluster, such as the cluster’s purpose or usage details.
The name attribute specified in the cluster section does not place this host into a cluster. The multicast address and the UDP port determine whether a host is in the cluster. The name attribute acts just as an identifier when polling.
The host section provides information about the host running this instance of gmond. Currently, only the location string attribute is supported. The default host section is:
host {
  location = "unspecified"
}
location (text): The location of the host in a format relative to the site, although rack,U[,blade] is often used.
UDP send and receive channels establish how gmond nodes talk to each other. Clusters are defined by UDP communication channels, which is to say, that a cluster is nothing more than some number of gmond nodes that share the same send and/or receive channels.
By default, every node in a gmond cluster multicasts its own metric data to its peers via UDP and listens for similar UDP multicasts from its peers. This is easy to set up and maintain: every node in the cluster shares the same multicast address, and new nodes are automatically discovered. However, as we mentioned in the previous section on deaf and mute nodes, it is sometimes desirable to specify individual nodes by their unicast address.
For this reason, any number of gmond send and receive channels may be configured to meet the needs of your particular environment. Each configured send channel defines a new way that gmond will advertise its metrics, and each receive channel defines a way that gmond will receive metrics from other nodes. Channels may be either unicast or multicast and either IPv4 or IPv6.
Note that a gmond node should not be configured to contribute metrics to more than one Ganglia cluster, nor should you attempt to receive metrics for more than one cluster.
UDP channels are created using the udp_(send|recv)_channel sections. Following is the default UDP send channel:
udp_send_channel {
  #bind_hostname = yes
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
bind_hostname (boolean; optional, for multicast or unicast): Tells gmond to use a source address that resolves to the machine’s hostname.
mcast_join (IP; optional, for multicast only): When specified, gmond will create a UDP socket and join the multicast group specified by the IP. This option creates a multicast channel and is mutually exclusive with host.
mcast_if (text; optional, for multicast only): When specified, gmond will send data from the specified interface (eth0, for example).
host (text or IP; optional, for unicast only): When specified, gmond will send data to the named host. This option creates a unicast channel and is mutually exclusive with mcast_join.
port (number; optional, for multicast or unicast): The port number to which gmond will send data. If it’s not set, port 8649 is used by default.
ttl (number; optional, for multicast or unicast): The time-to-live. This setting is particularly important for multicast environments, as it limits the number of hops over which the metric transmissions are permitted to propagate. Setting this to a value higher than necessary could result in metrics being transmitted across WAN connections to multiple sites or even out into the global Internet.
Following is the default UDP receive channel:
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
mcast_join (IP; optional, for multicast only): When specified, gmond will listen for multicast packets from the multicast group specified by the IP. If you do not specify multicast attributes, gmond will create a unicast UDP server on the specified port.
mcast_if (text; optional, for multicast only): When specified, gmond will listen for data on the specified interface (eth0, for example).
bind (IP; optional, for multicast or unicast): When specified, gmond will bind to the local address specified.
port (number; optional, for multicast or unicast): The port number from which gmond will receive data. If not set, port 8649 is used by default.
family (inet4|inet6; optional, for multicast or unicast): The IP version, which defaults to inet4. If you want to bind the port to an inet6 port, specify inet6 in the family attribute. Ganglia will not allow IPv6=>IPv4 mapping (for portability and security reasons). If you want to listen on both inet4 and inet6 for a particular port, define two separate receive channels for that port.
acl (ACL definition; optional, for multicast or unicast): An access control list may be specified for fine-grained access control to a receive channel. See Access control for details on ACL syntax.
TCP Accept Channels establish the means by which gmond nodes report the cluster state to gmetad or other external pollers. Configure as many of them as you like. The default TCP Accept Channel is:
tcp_accept_channel { port = 8649 }
bind
(IP; optional) When specified, gmond will bind to the local address specified.
port
(number) The port number on which gmond will accept connections.
family
(inet4|inet6; optional) The IP version, which defaults to inet4. If you want to bind the port to an inet6 port, you need to specify inet6 in the family attribute. Ganglia will not allow IPv6=>IPv4 mapping (for portability and security reasons). If you want to listen on both inet4 and inet6 for a particular port, define two separate accept channels for that port.
interface
(text; optional) When specified, gmond will listen for data on the specified interface (eth0, for example).
acl
(ACL definition; optional) An access control list (discussed in the following section) may be specified for fine-grained access control to an accept channel.
The udp_recv_channel and tcp_accept_channel directives can contain an Access Control List (ACL). This list allows you to specify addresses and address ranges from which gmond will accept or deny connections. Following is an example of an ACL:
acl {
  default = "deny"
  access {
    ip = 192.168.0.0
    mask = 24
    action = "allow"
  }
  access {
    ip = ::ff:1.2.3.0
    mask = 120
    action = "deny"
  }
}
The syntax should be fairly self-explanatory to anyone with a passing familiarity with access control concepts. The default attribute defines the default policy for the entire ACL. Any number of access blocks may be specified that list hostnames or IP addresses and associate allow or deny actions with those addresses. The mask attribute defines a subnet mask in CIDR notation, allowing you to specify address ranges instead of individual addresses. Note that when access blocks conflict, the first match wins.
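To make the first-match rule concrete, here is a minimal Python sketch of how such an ACL could be evaluated. It is not gmond's implementation: the function name and the tuple representation of access blocks are assumptions for illustration, and the second rule is a hypothetical overlapping range added to show a conflict.

```python
import ipaddress


def acl_action(client_ip, access_rules, default="deny"):
    """Return the action of the first rule whose network contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for ip, mask, action in access_rules:
        # strict=False tolerates an ip value that has host bits set.
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        # Membership is always False when the IP versions differ.
        if addr in network:
            return action
    return default


rules = [
    ("192.168.0.0", 24, "allow"),  # from the example ACL
    ("192.168.0.0", 16, "deny"),   # hypothetical wider range that overlaps it
]

print(acl_action("192.168.0.17", rules))  # both rules match; the first wins
print(acl_action("192.168.5.17", rules))  # only the second rule matches
print(acl_action("10.1.2.3", rules))      # no rule matches; default applies
```

Because evaluation stops at the first match, the order of the access blocks matters: swapping the two rules above would deny 192.168.0.17.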
sFlow is an industry standard technology for monitoring high-speed switched networks. Originally targeted at embedded network hardware, sFlow agents now exist for general-purpose operating systems as well as popular applications such as Tomcat, memcached, and the Apache Web Server. gmond can be configured to act as a collector for sFlow agents on the network, packaging the sFlow agent data so that it may be transparently reported to gmetad. Further information about sFlow interoperability is provided in Chapter 8. The entire sFlow section is optional. Following is the default sFlow configuration:
#sflow {
#  udp_port = 6343
#  accept_vm_metrics = yes
#  accept_jvm_metrics = yes
#  multiple_jvm_instances = no
#  accept_http_metrics = yes
#  multiple_http_instances = no
#  accept_memcache_metrics = yes
#  multiple_memcache_instances = no
#}
udp_port
(number; optional) The port on which gmond will accept sFlow data.
The remaining configuration parameters deal with application-specific sFlow data types. See Chapter 8 for details.
The modules section contains the parameters that are necessary to load a metric module. Metric modules are dynamically loadable shared object files that extend the available metrics gmond is able to collect. Much more information about extending gmond with modules can be found in Chapter 5.
Each modules section must contain at least one module subsection. The module subsection is made up of five attributes. The default configuration contains every module available in the default installation, so you should not have to change this section unless you’re adding new modules. The configuration for an imaginary example_module is provided here:
modules {
  module {
    name = "example_module"
    language = "C/C++"
    enabled = yes
    path = "modexample.so"
    params = "An extra raw parameter"
    param RandomMax {
      value = 75
    }
    param ConstantValue {
      value = 25
    }
  }
}
name
(text) The name of the module as determined by the module structure if the module was developed in C/C++. Alternatively, the name can be the name of the source file if the module has been implemented in an interpreted language such as Python.
language
(text; optional) The source code language in which the module was implemented. Defaults to “C/C++” if unspecified. Currently, only C, C++, and Python are supported.
enabled
(boolean; optional) Allows a metric module to be easily enabled or disabled through the configuration file. If the enabled directive is not included in the module configuration, the enabled state will default to yes.
If a module that has been disabled contains a metric that is still listed as part of a collection group, gmond will produce a warning message but will continue to function normally by ignoring the metric.
path
(text) The path from which gmond is expected to load the module (C/C++ compiled dynamically loadable module only). If the value of path does not begin with a forward slash, the value will be appended to that of the module_path attribute from the globals section.
params
(text; optional) Used to pass a string parameter to the module initialization function (C/C++ module only). Multiple parameters can be passed to the module’s initialization function by including one or more param sections. Each param section must be named and contain a value directive.
The collection_group entries specify the metrics that gmond will collect, as well as how often gmond will collect and broadcast them. You may define as many collection groups as you wish. Each collection group must contain at least one metric section.
These are logical groupings of metrics based on common collection intervals. The groupings defined in gmond.conf do not affect the groupings used in the web interface, nor is it possible to use this mechanism to specify the names of the groups for the web interface. An excerpt from the default configuration follows:
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
}
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
}
collect_once
(boolean) Some metrics are said to be “nonvolatile” in that they will not change between reboots. This includes metrics such as the OS type or the number of CPUs installed in the system. These metrics need to be collected only once at startup and are configured by setting the collect_once attribute to yes. This attribute is mutually exclusive with collect_every.
collect_every
(seconds) This value specifies the polling interval for the collection group. In the previous example, the cpu_user and cpu_system metrics will be collected every 20 seconds.
time_threshold
(seconds) The maximum amount of time that can pass before gmond sends all metrics specified in the collection_group to all configured udp_send_channels.
name
(text) The name of an individual metric as defined within the metric collection module. Typically, each loaded module defines several individual metrics. An alternative to name is name_match. By using the name_match parameter instead of name, it is possible to use a single definition to configure multiple metrics that match a regular expression. The Perl-compatible regular expression (pcre) syntax is used (e.g., name_match = "multicpu_([a-z]+)([0-9]+)").
You can get a list of the available metric names by running gmond with an -m switch.
value_threshold
(number) Each time a metric value is collected, the new value is compared with the last measured value. If the difference between the last value and the current value is greater than the value_threshold, the entire collection group is sent to the udp_send_channels defined. The units denoted by the value vary according to the metric module. For CPU stats, for example, the value represents a percentage, and network stats interpret the value as a raw number of bytes.
Any time a value_threshold is surpassed by any single metric in a collection group, all metrics in that collection group are sent to every UDP send channel.
title
(text) A user-friendly title for the metric for use on the web frontend.
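The interplay between time_threshold and value_threshold can be summarized in a few lines of Python. This is a simplified sketch of the rules described above, not gmond's actual scheduling code; the function name and the tuple layout of the samples are assumptions made for the example.

```python
def should_send(now, last_sent, time_threshold, samples):
    """Decide whether a whole collection group should be sent.

    samples is a list of (previous_value, new_value, value_threshold)
    tuples; value_threshold is None for metrics that define none.
    """
    # Rule 1: the maximum time between sends has elapsed.
    if now - last_sent >= time_threshold:
        return True
    # Rule 2: any single metric moved by more than its value_threshold.
    for previous, new, value_threshold in samples:
        if value_threshold is not None and abs(new - previous) > value_threshold:
            return True
    return False


# cpu_user and cpu_system with value_threshold = 1.0, as in the excerpt.
steady = [(12.0, 12.4, 1.0), (3.0, 3.2, 1.0)]
spike = [(12.0, 14.5, 1.0), (3.0, 3.2, 1.0)]

print(should_send(now=40, last_sent=20, time_threshold=90, samples=steady))
print(should_send(now=40, last_sent=20, time_threshold=90, samples=spike))
print(should_send(now=110, last_sent=20, time_threshold=90, samples=steady))
```

In the second call only cpu_user crosses its threshold, yet the function returns True for the group as a whole, which matches the behavior described for value_threshold.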
gmetad, the Ganglia Meta Daemon, is installed on the host that will collect and aggregate the metrics collected by the hosts running gmond. By default, gmetad will collect and aggregate these metrics in RRD files, but it is possible to configure gmetad to forward metrics to external systems such as Graphite instead.
gmetad listens on TCP port 8651 for connections from remote gmetad instances and will provide an XML dump of the grid state to authorized hosts. It also responds to interactive requests on TCP port 8652. The interactive facility allows simple subtree and summation views of the grid state XML tree. gweb uses this interactive query facility to present information that doesn’t fit naturally in RRD files, such as OS version.
The simplest topology is a single gmetad process polling one or more gmond instances, as illustrated in Figure 2-4.
Redundancy/high availability is a common requirement and is easily implemented. Figure 2-5 shows an example in which two (redundant) gmetads poll multiple gmonds in the same cluster. The gmetads will poll node2 only if they are unable to poll node1 successfully. Both gmetads are always polling (active-active clustering).
gmetad is not limited to polling gmond: a gmetad can poll another gmetad to create a hierarchy of gmetads. This concept is illustrated in Figure 2-6.
gmetad, by default, writes all metric data directly to RRD files on the filesystem, as illustrated in Figure 2-4.
In large installations in which there is an IO constraint, rrdcached acts as a buffer between gmetad and the RRD files, as illustrated in Figure 2-7.
The gmetad.conf configuration file is composed of single-line attributes and their corresponding values. Attribute names are case insensitive, but their values are not. The following attributes, for example, are all equivalent:
name
NAME
Name
NaMe
Most attributes are optional; others are required. Some may be defined in the configuration file multiple times; others must appear only once.
The data_source attribute is the heart of gmetad configuration. Each data_source line describes either a gmond cluster or a gmetad grid from which this gmetad instance will collect information. gmetad is smart enough to automatically make the distinction between a cluster and a grid, so the data_source syntax is the same for either. If gmetad detects that the data_source refers to a cluster, it will maintain a complete set of round robin databases for the data source. If, however, gmetad detects that the data_source refers to a grid, it will maintain only summary RRDs.
Setting the scalable attribute to off overrides this behavior and forces gmetad to maintain a full set of RRD files for grid data sources.
The following examples, excerpted from the default configuration file, are valid data sources:
data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
data_source "another source" 1.3.4.8:8655 1.3.4.8
Each data_source is composed of three fields. The first is a string that uniquely identifies the source. The second field is a number that specifies the polling interval for the data_source in seconds; it may be omitted, as in the third example above. The third is a space-separated list of hosts from which gmetad may poll the data. The addresses may be specified as IP addresses or DNS hostnames and may optionally be suffixed by a colon followed by the port number where the gmond tcp_accept_channel is to be found. If no port number is specified, gmetad will attempt to connect to tcp/8649.
gmetad will check each specified host in order, taking the status data from the first host to respond, so it’s not necessary to specify every host in a cluster in the data_source definition. Two or three are usually sufficient to ensure that data is collected in the event of a node failure.
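The field layout of a data_source line can be illustrated with a short Python parser. This is a sketch for explanation only, not gmetad's parser: the function name is hypothetical, it handles hostnames and IPv4 addresses only, and it returns None for an omitted polling interval rather than guessing gmetad's default.

```python
import shlex

DEFAULT_GMOND_PORT = 8649  # used when a host has no ":port" suffix


def parse_data_source(line):
    """Split a data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # honors the quotes around the source name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = None
    # The polling interval is a bare number; anything else is a host.
    if rest[0].isdigit():
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for entry in rest:
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_GMOND_PORT))
    return name, interval, hosts


print(parse_data_source(
    'data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655'))
print(parse_data_source('data_source "another source" 1.3.4.8:8655 1.3.4.8'))
```

The host list is returned in the order written because gmetad tries the hosts in order and uses the first one that responds.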
The following attributes affect the functioning of the gmetad daemon itself:
gridname
(text) A string that uniquely identifies the grid for which this gmetad instance is responsible. This string should be different from the one set in gmond. The one set in gmond.conf (at cluster { name = "XXX" }) is used in the CLUSTER tag that wraps all the hosts that particular gmond instance has collected. The gridname attribute will wrap all data sources specified in a GRID tag, which could be thought of as a collection of clusters defined in the data_source.
authority
(URL) The authority URL for this grid. Used by other gmetad instances to locate graphs for this instance’s data sources. By default, this value points to “http://hostname/ganglia/”.
trusted_hosts
(text) A space-separated list of hosts with which this gmetad instance is allowed to share data. localhost is always trusted.
all_trusted
(on|off) Set this value to on to override the trusted_hosts attribute and allow data sharing with any host.
setuid_username
(UID) The name of the user gmetad will set the UID to after launch. This defaults to nobody.
setuid
(on|off) Set this to off to disable setuid.
xml_port
(number) The gmetad listen port. This value defaults to 8651.
interactive_port
(number) The gmetad interactive listen port. This value defaults to 8652.
server_threads
(number) The number of simultaneous connections allowed to connect to the listen ports. This value defaults to 4.
case_sensitive_hostnames
(1|0) In earlier versions of gmetad, the RRD files were created with case-sensitive hostnames, but this is no longer the case. Legacy users who wish to continue to use RRD files created by Ganglia versions before 3.2 should set this value to 1. Since Ganglia 3.2, this value has defaulted to 0.
Several attributes affect the creation and handling of RRD files.
RRAs
(text) These specify custom Round Robin Archive values. The default is (with a “step size” of 15 seconds):
"RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"
The full details of an RRA specification are contained in the manpage for rrdcreate(1).
umask
(number) Specifies the umask to apply to created RRD files and the directory structure containing them. It defaults to 022.
rrd_rootdir
(path) Specifies the base directory where the RRD files will be stored on the local filesystem.
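The default RRAs value listed above is easier to read once each archive is converted into a resolution and a retention period. An RRA specification has the form RRA:CF:xff:steps:rows, where each row consolidates steps primary data points. The short Python sketch below does the arithmetic for the defaults, using the 15-second step size quoted in the text; the function name is an assumption for illustration.

```python
STEP_SECONDS = 15  # the "step size" quoted in the text

DEFAULT_RRAS = [
    "RRA:AVERAGE:0.5:1:5856",
    "RRA:AVERAGE:0.5:4:20160",
    "RRA:AVERAGE:0.5:40:52704",
]


def rra_retention(spec, step=STEP_SECONDS):
    """Return (seconds_per_row, total_seconds) for one RRA specification."""
    _rra, _cf, _xff, steps, rows = spec.split(":")
    seconds_per_row = int(steps) * step
    return seconds_per_row, seconds_per_row * int(rows)


for spec in DEFAULT_RRAS:
    per_row, total = rra_retention(spec)
    print(f"{spec}: one row per {per_row}s, {total / 86400:.1f} days retained")
```

With a 15-second step, the three default archives keep roughly one day of data at 15-second resolution, 14 days at 1-minute resolution, and 366 days at 10-minute resolution.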
It is possible to export all the metrics collected by gmetad to Graphite, an external open source metrics storage and visualization tool, by setting the following attributes.
carbon_server
(address) The hostname or IP of a remote carbon daemon.
carbon_port
(number) The carbon port number, which defaults to 2003.
graphite_prefix
(text) Graphite uses dot-separated paths to organize and refer to metrics, so it is probably desirable to prefix the metrics from gmetad with something descriptive like datacenter1.gmetad, so Graphite will organize them appropriately.
carbon_timeout
(number) The number of milliseconds gmetad will wait for a response from the Graphite server. This setting is important because gmetad’s carbon sender is not threaded and will block waiting on a response from a down carbon daemon. Defaults to 500.
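To see why a descriptive graphite_prefix matters, it helps to look at what a carbon daemon receives. Carbon's plaintext protocol takes one line per sample in the form "path value timestamp". The Python sketch below builds such a line; the path layout (prefix, cluster, host, metric) and the replacement of dots in hostnames are assumptions of this example rather than a description of gmetad's exact output.

```python
import time


def carbon_line(prefix, cluster, host, metric, value, timestamp=None):
    """Format one sample in carbon's plaintext form: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    # Dots separate path components in Graphite, so dots inside a hostname
    # are replaced to keep the host as one component (an assumption here).
    safe_host = host.replace(".", "_")
    path = ".".join([prefix, cluster, safe_host, metric])
    return f"{path} {value} {timestamp}\n"


line = carbon_line("datacenter1.gmetad", "cluster1", "web01.example.org",
                   "cpu_user", 12.5, timestamp=1700000000)
print(line, end="")
```

Everything before the first space becomes the metric's location in Graphite's tree, so the prefix determines the top-level folders under which all Ganglia metrics appear.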
As mentioned previously, gmetad listens on TCP port 8652 (by default) for interactive queries. The interactive query functionality enables client programs to get XML dumps of the state of only the portion of the Grid in which they’re interested.
Interactive queries are performed via a text protocol (similar to SMTP or HTTP). Queries are hierarchical and begin with a forward slash (/). For example, the following query returns an XML dump of the entire grid state:
/
To narrow the query result, specify the name of a cluster:
/cluster1
To narrow the query result further, specify the name of a host in the cluster:
/cluster1/host1
Queries may be suffixed with a filter to modify the type of metric information returned by the query (as of this writing, summary is the only filter available). For example, you can request only the summary metric data from cluster1 like so:
/cluster1?filter=summary
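The queries above can also be issued from a script. The following Python sketch builds a query string and sends it to the interactive port. It assumes that a query is terminated by a newline and that gmetad closes the connection once the reply is complete; the function names and defaults are chosen for this example.

```python
import socket


def build_query(cluster=None, host=None, summary=False):
    """Build an interactive query path such as '/cluster1/host1'."""
    if host and not cluster:
        raise ValueError("a host can only be named inside a cluster")
    query = "/" + "/".join(part for part in (cluster, host) if part)
    if summary:
        query += "?filter=summary"
    return query


def query_gmetad(query, address="localhost", port=8652, timeout=5.0):
    """Send one query to gmetad's interactive port and return the XML reply."""
    chunks = []
    with socket.create_connection((address, port), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\n")
        while True:
            data = conn.recv(65536)
            if not data:  # the peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


print(build_query())
print(build_query("cluster1", "host1"))
print(build_query("cluster1", summary=True))
```

On a host running gmetad, query_gmetad(build_query("cluster1", summary=True)) would return the summary XML for cluster1; the cluster name is, of course, site specific.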
Of the three components that make up Ganglia, gweb is both the most configurable and the least in need of configuration. In fact, there is no need to change anything whatsoever in gweb’s default configuration file to get up and running with a fully functional web UI.
Although gweb itself requires no configuration to speak of, some web server configuration is necessary to get gweb up and running. Any web server with PHP support will do the job, and although web server configuration is beyond the scope of this book, the Apache Web Server is such a common choice that we have included an example of a virtual host configuration for Apache. Assuming that gweb is installed in /var/www/html/ganglia2 on a host that resolves to myganglia.example.org, the following configuration should get you started with Apache:
NameVirtualHost *:80
<VirtualHost *:80>
  ServerName myganglia.example.org
  ServerAlias myganglia
  DocumentRoot /var/www/html/ganglia2
  # Other directives here
</VirtualHost>
This is, of course, a simplistic example; the Apache documentation covers virtual hosts in more depth.
gweb is configured by way of the conf.php file. In fact, this file overrides and extends the default configuration set in conf_default.php. conf.php is located in the web root directory. This file is well documented, and as of this writing there are more than 80 options, so we won’t cover them all, but we will cover some of the more important ones and note some option categories, just so you’re aware they’re there.
The file, as its name suggests, is itself a PHP script made up of variable assignments. Unlike the other configuration files, assignments might span multiple lines. Attribute names are themselves keys in gweb’s $conf data structure, so they are case sensitive, and look like PHP array assignments. The following line, for example, informs gweb of the location of the RRDtool binary:
$conf['rrdtool'] = "/usr/bin/rrdtool";
All attributes in the file are required, and some may be defined multiple times; others must appear only once. Some values are derived from other values. For example, the rrds attribute is derived from gmetad_root:
$conf['rrds'] = "${conf['gmetad_root']}/rrds";
Attributes in this category affect gweb’s functional parameters—its own home directory, for example, or the directories in which it will check for RRDs or templates. These are rarely changed by the user, but a few of them bear mentioning.
templates
(path) Specifies the directory in which gweb will search for template files. Templates are like a skin for the site that can alter its look and feel.
graphdir
(path) Specifies the directory where the user may drop JSON definitions of custom graphs. As described in the next chapter, users may specify custom report graphs in JSON format and place them in this directory, and they will appear in the UI.
rrds
(path) Specifies the directory where the RRD files are to be found.
As described in Chapter 7, various Nagios integration features may be set in gweb’s conf.php. Collectively, these enable Nagios to query metric information from gweb instead of relying on remote execution systems such as Nagios Service Check Acceptor (NSCA) and Nagios Remote Plugin Executor (NRPE).
gweb may be configured to limit the number of graphs it displays at once (max_graphs) and to use a specified number of columns for the grid and host views. There are also a number of boolean options that affect the default behavior of the UI when it is first launched, such as metric_groups_initially_collapsed.
The conf.php file defines numerous settings that modify the functional attributes of the graphs drawn in the UI. For example, you may change the colors used to plot the values in the built-in load report graph, change the default colors used in all the graphs, and even define custom time ranges.
Attributes in this category include the following:
auth_system
(readonly|enabled|disabled) gweb includes a simple authorization system to selectively allow or deny individual users access to specific parts of the application. This system is enabled by setting auth_system to enabled. For more information on the authorization features in gweb, see Chapter 4.
Here are some advanced features:
rrdcached_socket
(path) Specifies the path to the rrdcached socket. rrdcached is a high-performance caching daemon that lightens the load associated with writing data to RRDs by caching and combining the writes. More information may be found in Appendix A.
graph_engine
(rrdtool|graphite) gweb can use Graphite instead of RRDtool as the rendering engine used to generate the graphs in the UI. This approach requires you to install patched versions of whisper and the Graphite webapp on your gweb server.
Now that Ganglia is installed and configured, it’s time to get the various daemons started, verify that they’re functional, and ensure that they can talk to each other.
Starting the processes in a specific order is not necessary; however, if the daemons are started in the order recommended here, there won’t be a delay waiting for metadata to be retransmitted to the UDP aggregator and users won’t get error pages or incomplete data from the web server:
If you’re using the UDP unicast topology, start the UDP aggregator nodes first. This ensures that the aggregator nodes will be listening when the other nodes send their first metadata transmission.
Start gmetad instances at the lowest level of the hierarchy (in other words, gmetad instances that don’t poll any other gmetad instances).
Work up the hierarchy starting any other gmetad instances.
Start the Apache web servers. Web servers are started after gmetad; otherwise, the PHP scripts can’t contact gmetad and the users see errors about port 8652.
If gmetad is configured to use rrdcached, it is essential for rrdcached to be running before gmetad is started.
gmond and gmetad both listen on TCP sockets for inbound connections. To test whether gmond is operational on a given host, telnet to gmond’s TCP port:
user@host:$ telnet localhost 8649
In reply, gmond should output an XML dump of metric data. If the gmond is deaf or mute, it may return a rather empty XML document, with just the CLUSTER tag. gmetad may be likewise tested with telnet like so:
user@host:$ telnet localhost 8651
A functioning gmetad will respond with an XML dump of metric data.
See Chapter 6 for a more comprehensive list of techniques for validating the state of the processes.
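When checking many hosts, it can be handy to inspect the XML dump programmatically rather than by eye. The Python sketch below summarizes a dump by cluster and host. The sample document is hand-written for this example, and the HOST and METRIC element names and the NAME attribute are assumptions about the dump format beyond the CLUSTER tag mentioned above.

```python
import xml.etree.ElementTree as ET

# A hand-written stand-in for the XML that gmond returns on its TCP port.
SAMPLE_DUMP = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="cluster1">
    <HOST NAME="node1"><METRIC NAME="cpu_num" VAL="8"/></HOST>
    <HOST NAME="node2"/>
  </CLUSTER>
</GANGLIA_XML>"""


def summarize_dump(xml_text):
    """Map each cluster name to {host name: number of metrics reported}."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = {
            host.get("NAME"): len(host.findall("METRIC"))
            for host in cluster.iter("HOST")
        }
        summary[cluster.get("NAME")] = hosts
    return summary


print(summarize_dump(SAMPLE_DUMP))
```

A cluster that maps to an empty dictionary corresponds to the "deaf or mute" case: the CLUSTER tag is present, but no hosts have been heard from.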
Firewall problems are common with new Ganglia installations that span network subnets. Here, we’ve collected the firewall requirements of the various daemons together to help you avoid interdaemon communication problems:
gmond uses multicast by default, so clusters that span a network subnet need to be configured with unicast senders and listeners as described in the previous topology sections. If the gmond hosts must traverse a firewall to talk to each other, allow udp/8649 in both directions. For multicast, support for the IGMP protocol must also be enabled in the intermediate firewalls and routers.
gmond listens for connections from gmetad on TCP port 8649. If gmetad must traverse a firewall to reach some of the gmond nodes, allow tcp/8649 inbound to a few gmond nodes in each cluster.
gmetad listens for connections on TCP ports 8651 and 8652. The former port is analogous to gmond’s 8649, while the latter is the “interactive query” port to which specific queries may be sent. These ports are used by gweb, which is usually installed on the same host as gmetad, so unless you’re using some of the advanced integration features, such as Nagios integration, or have custom scripts querying gmetad, you shouldn’t need any firewall ACLs for gmetad.
gweb runs in a web server, which usually listens on ports 80 and 443 (if you enable SSL). If the gweb server is separated from the end users by a firewall (a likely scenario), allow inbound tcp/80 and possibly tcp/443 to the gweb server.
If your Ganglia installation uses sFlow agents and the sFlow agents must traverse a firewall to reach their gmond listener, allow inbound udp/6343 to the gmond listener.
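A quick way to confirm that the TCP rules above are in place is to attempt a connection to each port from the far side of the firewall. The Python sketch below does this for the default TCP ports named in this section; the function names and the dictionary of ports are assumptions for illustration. UDP ports such as 8649 and 6343 cannot be verified with a simple connect and are not covered.

```python
import socket

# Default TCP ports named in the text.
GANGLIA_TCP_PORTS = {
    "gmond": 8649,
    "gmetad xml": 8651,
    "gmetad interactive": 8652,
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timed out), or unresolvable
        return False


def check_ganglia_ports(host):
    """Report which of the default Ganglia TCP ports accept connections."""
    return {
        name: tcp_port_open(host, port)
        for name, port in GANGLIA_TCP_PORTS.items()
    }
```

Running check_ganglia_ports against a gmetad host from the gweb server, for example, shows at a glance whether ports 8651 and 8652 are reachable; a False entry points to either a stopped daemon or a blocking firewall rule.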