Chapter 2. Installing and Configuring Ganglia

Dave Josephsen

Frederiko Costa

Daniel Pocock

Bernard Li

If you’ve made it this far, it is assumed that you’ve decided to join the ranks of the Ganglia user base. Congratulations! We’ll have your Ganglia-user conspiracy to conquer the world kit shipped immediately. Until it arrives, feel free to read through this chapter, in which we show you how to install and configure the various Ganglia components. In this chapter, we cover the installation and configuration of Ganglia 3.1.x for some of the most popular operating systems, but these instructions should apply to later versions as well.

Installing Ganglia

As mentioned earlier, Ganglia is composed of three components: gmond, gmetad, and gweb. In this first section, we’ll cover the installation and basic setup of each component.

gmond

gmond stands for Ganglia Monitoring Daemon. It’s a lightweight service that must be installed on each node from which you want to have metrics collected. This daemon performs the actual metrics collection on each host using a simple listen/announce protocol to share the data it gleans with its peer nodes in the cluster. Using gmond, you can collect a lot of system metrics right out of the box, such as CPU, memory, disk, network, and data about active processes.

Requirements

gmond installation is straightforward, and the libraries it depends upon are installed by default on most modern Linux distributions (as of this writing, those libraries are libconfuse, pkgconfig, PCRE, and APR). Ganglia packages are available for most Linux distributions, so if you are using the package manager shipped with your distribution (which is the suggested approach), resolving the dependencies should not be problematic.

Linux

The Ganglia components are available in a prepackaged binary format for most Linux distributions. We’ll cover the two most popular types here: .deb- and .rpm-based systems.

Debian-based distributions

To install gmond on a Debian-based Linux distribution, execute:

user@host:# sudo apt-get install ganglia-monitor

RPM-based distributions

You’ll find that some RPM-based distributions ship with Ganglia packages in the base repositories, and others require you to use special-purpose package repositories, such as the Red Hat project’s EPEL (Extra Packages for Enterprise Linux) repository. If you’re using an RPM-based distro, you should search in your current repositories for the gmond package:

user@host:$ yum search ganglia-gmond

If the search fails, chances are that Ganglia is not shipped with your RPM distribution. Red Hat users need to install Ganglia from the EPEL repository. The following examples demonstrate how to add the EPEL repository to Red Hat 5 and Red Hat 6.

Note

If you need to add the EPEL repository, be sure to take careful note of the distro version and architecture you are running and match it to that of the EPEL you’re adding.

For Red Hat 5.x:

user@host:# sudo rpm -Uvh http://mirror.ancl.hawaii.edu/linux/epel/5/i386/epel-release-5-4.noarch.rpm

For Red Hat 6.x:

user@host:# sudo rpm -Uvh http://mirror.chpc.utah.edu/pub/epel/6/i386/epel-release-6-7.noarch.rpm

Finally, to install gmond, type:

user@host:# sudo yum install ganglia-gmond

OS X

gmond compiles and runs fine on Mac OS X; however, at the time of this writing, there are no prepackaged binaries available. OS X users must therefore build Ganglia from source. Refer to the following instructions, which work for the latest Mac OS X Lion. For other versions of Mac OS X, the dependencies might vary. Please refer to Ganglia’s website for further information.

Several dependencies must be satisfied before building and installing Ganglia on OS X. These are, in the order they should be installed:

  • Xcode >= 4.3

  • MacPorts (requires Xcode)

  • libconfuse (requires MacPorts)

  • pkgconfig (requires MacPorts)

  • PCRE (requires MacPorts)

  • APR (requires MacPorts)

Xcode is a collection of development tools and an Integrated Development Environment (IDE) for OS X. You will find Xcode at Apple’s developer tools website for download or on the Mac OS X installation disc.

MacPorts is a collection of build instructions for popular open source software for OS X. It is architecturally identical to the venerable FreeBSD Ports system. To install MacPorts, download the installation disk image from the MacPorts website. MacPorts for Mac OS X Lion is here. If you’re using Snow Leopard, the download is located here. For older versions, please refer here for documentation and download links.

Once MacPorts is installed and working properly, use it to install libconfuse, pkgconfig, PCRE, and APR:

$ sudo port install libconfuse pkgconfig pcre apr

After satisfying the previously listed requirements, you are ready to proceed with the installation. Please download the latest Ganglia source release.

Change to the directory where the source file has been downloaded. Uncompress the tar-gzip file you have just downloaded:

$ tar -xvzf ganglia-major.minor.release.tar.gz

On Mac OS X 10.5+, you need to apply a patch so that gmond builds successfully. For further details on the patch, please visit the website. Download the patch file, copy it to the root of the build directory, and run the patch:

$ cd ganglia-major.minor.release 
$ patch -p0 < patch-file

Assuming that you installed MacPorts under the default installation directory (/opt/local), export MacPorts’ bin directory to your PATH and run the configure script, specifying the location of lib/ and include/ as options:

$ export PATH=$PATH:/opt/local/bin 
$ ./configure LDFLAGS="-L/opt/local/lib" CPPFLAGS="-I/opt/local/include"

Compile and install Ganglia:

$ make 
$ sudo make install

Solaris

Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install the OpenCSW. Run the pkgutil tool on Solaris, and then use the tool to install the package:

$ pkgutil -i CSWgangliaagent

The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, such as:

$ svcadm enable cswgmond

Other platforms

Because Ganglia is an open source project, it is possible to compile a runnable binary executable of the gmond agent on virtually any platform with a C compiler.

The Ganglia project uses the autotools build system to detect the tools available on most Linux and UNIX-like environments and build the binaries.

The autotools build system is likely to have support for many other platforms that are not explicitly documented in this book. Please start by reading the INSTALL file in the source tree, and also look online for tips about Ganglia or generic tips about using autotools projects in your environment.

gmetad

gmetad (the Ganglia Meta Daemon) is the service that collects metric data from other gmetad and gmond sources and stores their state to disk in RRD format. It also provides a simple query mechanism for collecting specific information about groups of machines and supports hierarchical delegation, making possible the creation of federated monitoring domains.

Requirements

The requirements for installing gmetad on Linux are nearly the same as gmond, except for the addition of RRDtool, which is required to store and display time-series data collected from other gmetad or gmond sources.

Linux

Once again, you are encouraged to take advantage of the prepackaged binaries available in the repository of your Linux distribution; we provide instructions for the two most popular formats next.

Debian-based distributions

To install gmetad on a Debian-based Linux distribution, execute:

user@host:# sudo apt-get install gmetad

Note

Compared to gmond, gmetad has additional software dependencies.

RPM-based distributions

As mentioned in the earlier gmond installation section, an EPEL repository must be installed if the base repositories don’t provide gmetad. Refer to gmond to add the EPEL repository. Once you’re ready, type:

user@host:# sudo yum install ganglia-gmetad

OS X

There are only two functional differences between building gmond and gmetad on OS X. First, gmetad has one additional software dependency (RRDtool), and second, you must include the --with-gmetad option to the configure script, because only gmond is built by the default Makefile.

Following is the list of requirements that must be satisfied before you can build gmetad on Mac OS X:

  • Xcode >= 4.3

  • MacPorts (requires Xcode)

  • libconfuse (requires MacPorts)

  • pkgconfig (requires MacPorts)

  • PCRE (requires MacPorts)

  • APR (requires MacPorts)

  • RRDtool (requires MacPorts)

Refer to OS X for instructions on installing Xcode and MacPorts. Once you have those sorted out, install the following packages to satisfy the requirements:

$ sudo port install libconfuse pkgconfig pcre apr rrdtool

Once those packages have been installed, proceed with the Ganglia installation by downloading the latest Ganglia version.

Uncompress and extract the tarball you have just downloaded:

$ tar -xvzf ganglia-major.minor.release.tar.gz

Successfully building Ganglia 3.1.2 on OS X 10.5 requires that you apply the patch detailed here. Download the patch file and copy it to the root of the extracted Ganglia source tree, then apply it:

$ cd ganglia-major.minor.release
$ patch -p0 < patch-file

Assuming that you installed MacPorts under the default installation directory (/opt/local), export MacPorts’ bin directory to your PATH and run the configure script, specifying the location of lib/ and include/ as options:

$ export PATH=$PATH:/opt/local/bin 
$ ./configure --with-gmetad LDFLAGS="-L/opt/local/lib" CPPFLAGS="-I/opt/local/include"

Compile and install Ganglia:

$ make
$ sudo make install

Solaris

Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install the OpenCSW. Run the pkgutil tool on Solaris, and then use the tool to install the package:

$ pkgutil -i CSWgangliagmetad

The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, as in:

$ svcadm enable cswgmetad

gweb

Ganglia wouldn’t be complete without its web interface: gweb (Ganglia Web). After collecting several different metrics to evaluate how our cluster is performing, we need a visual representation of them, preferably as graphs on the Web. gweb fills this gap: it is a PHP frontend that displays the data stored by gmetad in your browser. Please see the “Demos” section here for live demos of the web frontend.

Requirements

As of Ganglia 3.4.0, the web interface is a separate distribution tarball maintained in a separate source code repository. The release cycle and version numbers of gweb are no longer in lockstep with those of the gmond and gmetad daemons.

Ganglia developers support gweb 3.4.0 with all versions of gmond/gmetad version 3.1.x and higher. Future versions of gweb may require a later version of gmond/gmetad. It’s recommended to check the installation documentation for exact details whenever installing or upgrading gweb.

The frontend, as already mentioned, is a web application. This book covers gweb versions 3.4.x and later, which may not be available to all distributions, requiring more work to get it installed. Before proceeding, please review the requirements to install gweb:

  • Apache Web Server

  • PHP 5.2 or later

  • PHP JSON extension installed and enabled

Linux

If you are installing from the repositories, the installation is pretty straightforward. Requirements will be automatically satisfied, and within a few commands you should be able to play with the web interface.

Debian-based distributions

To install gweb on a Debian-based Linux distribution, execute the following command as root or as a user with elevated privileges:

root@host:# apt-get install apache2 php5 php5-json

This command installs Apache and PHP 5, which gweb depends on, in case you don’t already have them installed. You might have to enable the PHP JSON module as well. To check whether it is enabled, execute this command:

root@host:# grep ^extension=json.so /etc/php5/conf.d/json.ini

and if the module is not enabled, enable it with the following command:

root@host:# echo 'extension=json.so' >> /etc/php5/conf.d/json.ini

You are ready to download the latest gweb. Once it’s downloaded, extract the archive and edit the Makefile to install gweb:

root@host:# tar -xvzf ganglia-web-major.minor.release.tar.gz
root@host:# cd ganglia-web-major.minor.release

Edit Makefile and set DESTDIR and APACHE_USER variables. On Debian-based distros, the default settings are the following:

# Location where gweb should be installed to
DESTDIR = /var/www/html/ganglia2
APACHE_USER = www-data
...

This means that gweb will be available to the user here. You can change to whichever name you want. Finally, run the following command:

root@host:# make install

If no errors are shown, gweb is successfully installed. Skip to Configuring Ganglia for further information on gweb settings.

RPM-based distributions

Installing gweb on an RPM-based distribution is very similar to installing gweb on a Debian-based distribution. Start by installing Apache and PHP 5:

root@host:# yum install httpd php

You also need to enable the JSON extension for PHP. It’s already included in PHP 5.2 or later. Make sure it’s enabled by checking the content of the /etc/php.d/json.ini file. You should have something similar to the following listing:

extension=json.so

Download the latest gweb. Once downloaded, extract the archive and edit the Makefile to install gweb 2:

root@host:# tar -xvzf ganglia-web-major.minor.release.tar.gz
root@host:# cd ganglia-web-major.minor.release

Edit Makefile and set the DESTDIR and APACHE_USER variables. On RPM-based distros, the default settings are:

# Location where gweb should be installed to
DESTDIR = /var/www/html/ganglia2
APACHE_USER = apache
...

This means that gweb will be available here. You can change to whichever name you want. Finally, run:

root@host:# make install

If no errors are shown, gweb is successfully installed. Skip to Configuring Ganglia for further information on gweb settings.

OS X

If you need to install gweb on Mac OS X, you have to follow a slightly different approach than if you were installing on Linux. Again, there isn’t any binary package for Mac OS X, leaving you with the option of downloading the source from the website. Before downloading, you have to make sure that your Mac OS X installation meets a few requirements. That’s what this section is about.

First off, an HTTP server is required, and chances are good that your Mac OS X installation was shipped with Apache Web Server. You can also install it via MacPorts, but this approach is not covered here. It is your choice. In order to verify your Apache installation, go to System Preferences > Sharing. Turn Web Services on if it is off. Make sure it’s running by typing http://localhost in your browser. You should see a test page. You can also load Apache via Terminal by typing:

$ sudo launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist

PHP is also required to run gweb. PHP is shipped with Mac OS X, but it’s not enabled by default. To enable, edit the httpd.conf file and uncomment the line that loads the php5_module.

$ cd /etc/apache2
$ sudo vim httpd.conf

Search for the following line, uncomment it (strip the #), and save the file:

# LoadModule php5_module libexec/apache2/libphp5.so

Restart Apache:

$ sudo launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist
$ sudo launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist

Now that you have satisfied the requirements, it’s time to download and install gweb 2. Please download the latest release. Once you have finished, change to the directory where the file is located and extract its content. Next, cd to the extraction directory:

$ tar -xvzf ganglia-web-major.minor.release.tar.gz
$ cd ganglia-web-major.minor.release

This next step really depends on how Apache Web Server is set up on your system. You need to find out where Apache serves its pages from or, more specifically, its DocumentRoot. Of course, the following location isn’t the only possibility, but for clarity’s sake, we will work with the default settings. So here, we’re using /Library/WebServer/Documents:

$ grep -i documentroot /etc/apache2/httpd.conf

Edit the Makefile found in the tarball. Insert the location of your Apache’s DocumentRoot and the name of the user that Apache runs as. On Mac OS X Lion, the settings are:

# Location where gweb should be installed
DESTDIR = /Library/WebServer/Documents/ganglia2
APACHE_USER = _www
...

This means that gweb will be available to the user here. You can change this to whichever name you want. Finally, run:

$ sudo make install

If no errors are shown, Ganglia Web is successfully installed. Read the next sections to configure Ganglia prior to running it for the first time.

Solaris

Convenient binary packages for Solaris are distributed in the OpenCSW collection. Follow the standard procedure to install the OpenCSW. Run the pkgutil tool on Solaris, and then use the tool to install the package:

$ pkgutil -i CSWgangliaweb

The default location for the configuration files on Solaris (OpenCSW) is /etc/opt/csw/ganglia. You can now start and stop all the Ganglia processes using the normal SMF utility on Solaris, as in:

$ svcadm enable cswapache

Configuring Ganglia

The following subsections document the configuration specifics of each Ganglia component. The default configuration shipped with Ganglia “just works” in most environments with very little additional configuration, but we want to let you know what other options are available in addition to the default. We would also like you to understand how the choice of a particular option may affect Ganglia deployment in your environment.

gmond

gmond, summarized in Chapter 1, is installed on each host that you want to monitor. It interacts with the host operating system to obtain metrics and shares the metrics it collects with other hosts in the same cluster. Every gmond instance in the cluster knows the value of every metric collected by every host in the same cluster and by default provides an XML-formatted dump of the entire cluster state to any client that connects to gmond’s port.

Topology considerations

gmond’s default topology is a multicast mode, meaning that all nodes in the cluster both send and receive metrics, and every node maintains an in-memory database—stored as a hash table—containing the metrics of all nodes in the cluster. This topology is illustrated in Figure 2-1.

Figure 2-1. Default multicast topology

Of particular importance in this diagram is the disparate nature of the gmond daemon. Internally, gmond’s sending and receiving halves are not linked (a fact that is emphasized in Figure 2-1 by the dashed vertical line). gmond does not talk to itself—it only talks to the network. Any local data captured by the metric modules are transmitted directly to the network by the sender, and the receiver’s internal database contains only metric data gleaned from the network.

This topology is adequate for most environments, but in some cases it is desirable to specify a few specific listeners rather than allowing every node to receive (and thereby waste CPU cycles to process) metrics from every other node. More detail about this architecture is provided in Chapter 3.

The use of “deaf” nodes, as illustrated in Figure 2-2, eliminates the processing overhead associated with large clusters. The deaf and mute parameters exist to allow some gmond nodes to act as special-purpose aggregators and relays for other gmond nodes. Mute means that the node does not transmit; it will not even collect information about itself but will aggregate the metric data from other gmond daemons in the cluster. Deaf means that the node does not receive any metrics from the network; it will not listen to state information from multicast peers, but if it is not muted, it will continue sending out its own metrics for any other node that does listen.

Figure 2-2. Deaf/mute multicast topology

The use of multicast is not required in any topology. The deaf/mute topology can be implemented using UDP unicast, which may be desirable when multicast is not practical or preferred (see Figure 2-3).

Figure 2-3. UDP unicast topology

Further, it is possible to mix and match the deaf/mute and default topologies to create a system architecture that better suits your environment. The only topological requirements are:

  1. At least one gmond instance must receive all the metrics from all nodes in the cluster.

  2. Periodically, gmetad must poll the gmond instance that holds the entire cluster state.

In practice, however, nodes not configured with any multicast connectivity do not need to be deaf; it can be useful to configure such nodes to send metrics to themselves using the address 127.0.0.1 so that they will keep a record of their own metrics locally. This makes it possible to probe any gmond over TCP for an XML dump of its own agent state while troubleshooting.
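As a rough illustration of such a probe, the following Python sketch connects to a gmond TCP port, reads the XML dump until the daemon closes the connection, and lists the metrics reported for each host. The port (8649), the HOST and METRIC element names, and the NAME and VAL attribute names are assumptions based on a typical gmond setup; the function names are hypothetical. Check all of them against the output of your own gmond.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="127.0.0.1", port=8649, timeout=5.0):
    """Read the XML dump that gmond writes to a connecting TCP client.

    The default port is an assumption; use the port of your TCP accept channel.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return {cluster name: {host name: {metric name: value}}}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = {}
        for host in cluster.iter("HOST"):
            hosts[host.get("NAME")] = {
                metric.get("NAME"): metric.get("VAL")
                for metric in host.iter("METRIC")
            }
        summary[cluster.get("NAME")] = hosts
    return summary
```

On a host running gmond, summarize(fetch_gmond_xml()) would return one entry per cluster, which makes it easy to see whether a node holds only its own metrics or the state of the whole cluster.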

Note

For a more thorough discussion of topology and scalability considerations, see Chapter 3.

Configuration file

You can generate a default configuration file for gmond by running the following command:

user@host:$ gmond -t

The configuration file is composed of sections, enclosed in curly braces, that fall roughly into two logical categories. The sections in the first category deal with host and cluster configuration; those in the second category deal with the specifics of metrics collection and scheduling.

All section names and attributes are case insensitive. The following attributes, for example, are all equivalent:

name NAME Name NaMe

Some configuration sections are optional; others are required. Some may be defined in the configuration file multiple times; others must appear only once. Some sections may contain subsections.

The include directive can be used to break up the gmond.conf file into multiple files for environments with large, complex configurations. The include directive supports glob patterns (wildcards). For example, the line:

include ('/etc/ganglia/conf.d/*.conf')

would instruct gmond to load all files in /etc/ganglia/conf.d/ that end in “.conf”.

gmond.conf: Quick Start

To get gmond up and running quickly just to poke around, all you should need to set is the name attribute in the “cluster” section of the default configuration file.

The configuration file is parsed using libconfuse, a third-party API for configuration files. The normal rules of libconfuse file format apply. In particular, boolean values can be set using yes, true, and on for a positive value and their opposites, no, false, and off for a negative value. Boolean values are not handled in a case-sensitive manner.
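The boolean rule described above can be sketched in a few lines of Python. This is only an illustration of the accepted spellings, not libconfuse's own code, and the function name parse_bool is hypothetical.

```python
TRUE_WORDS = {"yes", "true", "on"}
FALSE_WORDS = {"no", "false", "off"}


def parse_bool(text):
    """Interpret a boolean value, ignoring case, as described in the text."""
    value = text.strip().lower()
    if value in TRUE_WORDS:
        return True
    if value in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean value: {text!r}")
```

Under this rule, deaf = YES, deaf = True, and deaf = on all mean the same thing.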

There are eight sections that deal with the configuration of the host itself.

Section: globals

The globals section configures the general characteristics of the daemon itself. It should appear only once in the configuration file. The following is the default globals section from Ganglia 3.3.1:

globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 0 /*secs */
}

daemonize (boolean)

When true, gmond will fork and run in the background. Set this value to false if you’re running gmond under a daemon manager such as daemontools.

setuid (boolean)

When true, gmond will set its effective UID to the UID of the user specified by the user attribute. When false, gmond will not change its effective user.

debug_level (integer value)

When set to zero (0), gmond will run normally. A debug_level greater than zero will result in gmond running in the foreground and outputting debugging information. The higher the debug_level, the more verbose the output.

max_udp_msg_len (integer value)

This value is the maximum size that one packet sent by gmond will contain. It is not a good idea to change this value.

mute (boolean)

When true, gmond will not send data, regardless of any other configuration directive. “Mute” gmond nodes are only mute when it comes to other gmond daemons. They still respond to queries from external pollers such as gmetad.

deaf (boolean)

When true, gmond will not receive data, regardless of any other configuration directives. In large grids with thousands of nodes per cluster, or carefully optimized HPC grids, in which every CPU cycle spent on something other than the problem is a wasted cycle, “normal” compute nodes are often configured as deaf in order to minimize the overhead associated with aggregating cluster state. In these instances, dedicated nodes are set aside to be mute. In such a setup, the performance metrics of the mute nodes aren’t measured because those nodes aren’t a computationally relevant portion of the grid. Their job is to aggregate, so their performance data would pollute that of the functional portion of the cluster.
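To make the effect of the two flags concrete, here is a small Python model of a cluster in which every node shares a single channel. It is a simplification for illustration only (it ignores multiple channels and packet loss), and the names Node and cluster_state_holders are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    deaf: bool = False  # deaf nodes do not receive metrics
    mute: bool = False  # mute nodes do not send metrics


def cluster_state_holders(nodes):
    """Map each listening node to the set of hosts whose metrics it holds."""
    # Only non-mute nodes put metrics on the network.
    senders = {node.name for node in nodes if not node.mute}
    # Only non-deaf nodes store what they hear from the network.
    return {node.name: set(senders) for node in nodes if not node.deaf}


nodes = [
    Node("compute1", deaf=True),
    Node("compute2", deaf=True),
    Node("aggregator", mute=True),
]
holders = cluster_state_holders(nodes)
```

In this model only the mute aggregator holds any state, and that state covers the two compute nodes but not the aggregator itself, which matches the setup described above.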

allow_extra_data (boolean)

When false, gmond will not send the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This value might be useful if you are using your own frontend and would like to save some bandwidth.

host_dmax (integer_value in seconds)

Stands for “delete max.” When set to 0, gmond will never delete a host from its list, even when a remote host has stopped reporting. If host_dmax is set to a positive number, gmond will flush a host after it has not heard from it for host_dmax seconds.

host_tmax (integer_value in seconds)

Stands for “timeout max.” Represents the maximum amount of time that gmond should wait between updates from a host. Because messages may get lost in the network, gmond will consider the host as being down if it has not received any messages from it after four times this value.
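The interaction between host_tmax and host_dmax can be sketched as follows. The function is a hypothetical illustration of the rules stated above, using the default values from the globals section; whether gmond compares with a strict or an inclusive inequality is an assumption here.

```python
def host_status(seconds_since_last_report, host_tmax=20, host_dmax=86400):
    """Classify a host from the time elapsed since it last reported."""
    # host_dmax = 0 means a host is never deleted from the list.
    if host_dmax > 0 and seconds_since_last_report > host_dmax:
        return "expired"
    # A host is considered down after four times host_tmax without a message.
    if seconds_since_last_report > 4 * host_tmax:
        return "down"
    return "up"
```

With the defaults, a host that has been silent for more than 80 seconds is treated as down, and it is removed after a full day of silence.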

cleanup_threshold (integer_value in seconds)

Minimum amount of time before gmond will clean up expired data.

gexec (boolean)

When true, gmond will announce the host’s availability to run gexec jobs. This approach requires that gexecd be running on the host and the proper keys have been installed.

send_metadata_interval (integer_value in seconds)

Establishes the interval at which gmond will send or resend the metadata packets that describe each enabled metric. This directive by default is set to 0, which means that gmond will send the metadata packets only at startup and upon request from other gmond nodes running remotely. If a new machine running gmond is added to a cluster, it needs to announce itself and inform all other nodes of the metrics that it currently supports. In multicast mode, this isn’t a problem, because any node can request the metadata of all other nodes in the cluster. However, in unicast mode, a resend interval must be established. The interval value is the minimum number of seconds between resends.

module_dir (path; optional)

Indicates the directory where the metric collection modules are found. If omitted, defaults to the value of the compile-time option: --with-moduledir. This option, in turn, defaults to a subdirectory named ganglia in the directory where libganglia will be installed. To discover the default value in a particular gmond binary, generate a sample configuration file by running:

# gmond -t

For example, in a 32-bit Intel-compatible Linux host, the default is usually at /usr/lib/ganglia.

Section: cluster

Each gmond daemon will report information about the cluster in which it resides using the attributes defined in the cluster section. The default values are the string "unspecified"; the system is usable with the default values. This section may appear only once in the configuration file. Following is the default cluster section:

cluster {
  name = "unspecified"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

Note

The attributes in the cluster section directly correspond to the attributes in the CLUSTER tag in the XML output from gmond.

name (text)

Specifies the name of the cluster. When the node is polled for an XML summary of cluster state, this name is inserted in the CLUSTER element. The gmetad polling the node uses this value to name the directory where the cluster data RRD files are stored. It supersedes a cluster name specified in the gmetad.conf configuration file.

owner (text)

Specifies the administrators of the cluster.

latlong (text)

Specifies the latitude and longitude GPS coordinates of this cluster on earth.

url (text)

Intended to refer to a URL with information specific to the cluster, such as the cluster’s purpose or usage details.

Note

The name attribute specified in the cluster section does not place this host into a cluster. The multicast address and the UDP port determine which cluster a host belongs to. The name attribute acts just as an identifier when polling.

Section: host

The host section provides information about the host running this instance of gmond. Currently, only the location string attribute is supported. The default host section is:

host {
  location = "unspecified"
}

location (text)

The location of the host. The format is site-specific, although rack,U[,blade] is often used.

Section: UDP channels

UDP send and receive channels establish how gmond nodes talk to each other. Clusters are defined by UDP communication channels, which is to say that a cluster is nothing more than some number of gmond nodes that share the same send and/or receive channels.

By default, every node in a gmond cluster multicasts its own metric data to its peers via UDP and listens for similar UDP multicasts from its peers. This is easy to set up and maintain: every node in the cluster shares the same multicast address, and new nodes are automatically discovered. However, as we mentioned in the previous section on deaf and mute nodes, it is sometimes desirable to specify individual nodes by their unicast address.

For this reason, any number of gmond send and receive channels may be configured to meet the needs of your particular environment. Each configured send channel defines a new way that gmond will advertise its metrics, and each receive channel defines a way that gmond will receive metrics from other nodes. Channels may be either unicast or multicast and either IPv4 or IPv6.

Note that a gmond node should not be configured to contribute metrics to more than one Ganglia cluster, nor should you attempt to receive metrics for more than one cluster.

UDP channels are created using the udp_(send|recv)_channel sections. Following is the default UDP send channel:

udp_send_channel {
  #bind_hostname = yes 
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

bind_hostname (boolean; optional, for multicast or unicast)

Tells gmond to use a source address that resolves to the machine’s hostname.

mcast_join (IP; optional, for multicast only)

When specified, gmond will create a UDP socket and join the multicast group specified by the IP. This option creates a multicast channel and is mutually exclusive with host.

mcast_if (text; optional, for multicast only)

When specified, gmond will send data from the specified interface (eth0, for example).

host (text or IP; optional, for unicast only)

When specified, gmond will send data to the named host. This option creates a unicast channel and is mutually exclusive with mcast_join.

port (number; optional, for multicast and unicast)

The port number to which gmond will send data. If it’s not set, port 8649 is used by default.

ttl (number; optional, for multicast or unicast)

The time-to-live. This setting is particularly important for multicast environments, as it limits the number of hops over which the metric transmissions are permitted to propagate. Setting it higher than necessary could result in metrics being transmitted across WAN connections to multiple sites or even out into the global Internet.
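The relationship between mcast_join, host, and port can be checked mechanically. The following Python sketch is not part of Ganglia; it models a send channel as a plain dictionary and classifies it according to the rules above. The function names and the unicast hostname are hypothetical, and treating a channel with neither mcast_join nor host as an error is an assumption of this sketch rather than documented gmond behavior.

```python
import ipaddress

DEFAULT_PORT = 8649  # port used when the channel does not set one


def classify_send_channel(channel):
    """Return 'multicast' or 'unicast' for a dict of udp_send_channel attributes."""
    has_mcast = "mcast_join" in channel
    has_host = "host" in channel
    if has_mcast and has_host:
        # The two attributes are mutually exclusive.
        raise ValueError("mcast_join and host are mutually exclusive")
    if not (has_mcast or has_host):
        # Assumption of this sketch: a channel needs a destination.
        raise ValueError("a send channel needs either mcast_join or host")
    if has_mcast:
        group = ipaddress.ip_address(channel["mcast_join"])
        if not group.is_multicast:
            raise ValueError(f"{group} is not a multicast address")
        return "multicast"
    return "unicast"


def effective_port(channel):
    """Return the port the channel will send to, applying the default."""
    return int(channel.get("port", DEFAULT_PORT))


default_channel = {"mcast_join": "239.2.11.71", "port": 8649, "ttl": 1}
unicast_channel = {"host": "collector.example.org"}  # hypothetical host

for channel in (default_channel, unicast_channel):
    print(classify_send_channel(channel), effective_port(channel))
```

A check like this is only a convenience for reviewing configuration by hand; gmond itself remains the authority on what it accepts.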

Following is the default UDP receive channel:

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
mcast_join (IP; optional, for multicast only)

When specified, gmond will listen for multicast packets from the multicast group specified by the IP. If you do not specify multicast attributes, gmond will create a unicast UDP server on the specified port.

mcast_if (text; optional, for multicast only)

When specified, gmond will listen for data on the specified interface (eth0, for example).

bind (IP; optional, for multicast or unicast)

When specified, gmond will bind to the local address specified.

port (number; optional, for multicast or unicast)

The port number on which gmond will receive data. If not set, port 8649 is used by default.

family (inet4|inet6; optional, for multicast or unicast)

The IP version, which defaults to inet4. If you want to bind the port to an inet6 port, specify inet6 in the family attribute. Ganglia will not allow IPv6=>IPv4 mapping (for portability and security reasons). If you want to listen on both inet4 and inet6 for a particular port, define two separate receive channels for that port.

acl (ACL definition; optional, for multicast or unicast)

An access control list may be specified for fine-grained access control to a receive channel. See Access control for details on ACL syntax.

Section: TCP Accept Channels

TCP Accept Channels establish the means by which gmond nodes report the cluster state to gmetad or other external pollers. Configure as many of them as you like. The default TCP Accept Channel is:

tcp_accept_channel {
   port = 8649
}
bind (IP; optional)

When specified, gmond will bind to the local address specified.

port (number)

The port number on which gmond will accept connections.

family (inet4|inet6; optional)

The IP version, which defaults to inet4. If you want to bind the port to an inet6 port, you need to specify inet6 in the family attribute. Ganglia will not allow IPv6=>IPv4 mapping (for portability and security reasons). If you want to listen on both inet4 and inet6 for a particular port, define two separate accept channels for that port.

interface (text; optional)

When specified, gmond will listen for data on the specified interface (eth0, for example).

acl (ACL definition; optional)

An access control list (discussed in the following section) may be specified for fine-grained access control to an accept channel.

Access control

The udp_recv_channel and tcp_accept_channel directives can contain an Access Control List (ACL). This list allows you to specify addresses and address ranges from which gmond will accept or deny connections. Following is an example of an ACL:

acl {
    default = "deny"
    access {
      ip = 192.168.0.0
      mask = 24
      action = "allow"
    }
    access {
      ip = ::ff:1.2.3.0
      mask = 120
      action = "deny"
    }
}

The syntax should be fairly self-explanatory to anyone with a passing familiarity with access control concepts. The default attribute defines the default policy for the entire ACL. Any number of access blocks may be specified that list hostnames or IP addresses and associate allow or deny actions to those addresses. The mask attribute defines a subnet mask in CIDR notation, allowing you to specify address ranges instead of individual addresses. Notice that in case of conflicting ACLs, the first match wins.
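To make the first-match-wins rule concrete, here is a minimal Python sketch that evaluates an ACL shaped like the example above. It is an illustration only, not gmond's implementation: the function names are hypothetical, and it handles IP addresses but not hostnames.

```python
import ipaddress


def build_acl(default, rules):
    """Compile (ip, mask, action) tuples, kept in configuration order."""
    compiled = [
        (ipaddress.ip_network(f"{ip}/{mask}", strict=False), action)
        for ip, mask, action in rules
    ]
    return default, compiled


def acl_decision(acl, address):
    """Return the action of the first matching access block, else the default."""
    default, compiled = acl
    addr = ipaddress.ip_address(address)
    for network, action in compiled:
        # An IPv4 address never matches an IPv6 block, and vice versa.
        if addr.version == network.version and addr in network:
            return action
    return default


acl = build_acl("deny", [
    ("192.168.0.0", 24, "allow"),
    ("::ff:1.2.3.0", 120, "deny"),
])

for candidate in ("192.168.0.17", "192.168.1.17", "::ff:1.2.3.9"):
    print(candidate, acl_decision(acl, candidate))
```

Because the blocks are checked in order, a broad allow placed before a narrower deny hides the deny entirely, so more specific blocks belong first.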

Optional section: sFlow

sFlow is an industry standard technology for monitoring high-speed switched networks. Originally targeted at embedded network hardware, sFlow agents now exist for general-purpose operating systems as well as popular applications such as Tomcat, memcached, and the Apache Web Server. gmond can be configured to act as a collector for sFlow agents on the network, packaging the sFlow agent data so that it may be transparently reported to gmetad. Further information about sFlow interoperability is provided in Chapter 8. The entire sFlow section is optional. Following is the default sFlow configuration:

#sflow {
# udp_port = 6343
# accept_vm_metrics = yes
# accept_jvm_metrics = yes
# multiple_jvm_instances = no
# accept_http_metrics = yes
# multiple_http_instances = no
# accept_memcache_metrics = yes
# multiple_memcache_instances = no
#}
udp_port (number; optional)

The port on which gmond will accept sFlow data.

The remaining configuration parameters deal with application-specific sFlow data types. See Chapter 8 for details.

Section: modules

The modules section contains the parameters that are necessary to load a metric module. Metric modules are dynamically loadable shared object files that extend the available metrics gmond is able to collect. Much more information about extending gmond with modules can be found in Chapter 5.

Each modules section must contain at least one module subsection. The module subsection is made up of five attributes. The default configuration contains every module available in the default installation, so you should not have to change this section unless you’re adding new modules. The configuration for an imaginary example_module is provided here:

modules {
     module {
       name = "example_module"
       language = "C/C++"
       enabled = yes
       path = "modexample.so"
       params = "An extra raw parameter"
       param RandomMax {
         value = 75
       }
       param ConstantValue {
         value = 25
       }
    }
}
name (text)

The name of the module as determined by the module structure if the module was developed in C/C++. Alternatively, the name can be the name of the source file if the module has been implemented in an interpreted language such as Python.

language (text; optional)

The source code language in which the module was implemented. Defaults to “C/C++” if unspecified. Currently, only C, C++, and Python are supported.

enabled (boolean; optional)

Allows a metric module to be easily enabled or disabled through the configuration file. If the enabled directive is not included in the module configuration, the enabled state will default to yes.

Note

If a module that has been disabled contains a metric that is still listed as part of a collection group, gmond will produce a warning message but will continue to function normally by ignoring the metric.

path (text)

The path from which gmond is expected to load the module (C/C++ compiled dynamically loadable module only). If the value of path does not begin with a forward slash, the value will be appended to that of the module_path attribute from the globals section.

params (text; optional)

Used to pass a string parameter to the module initialization function (C/C++ module only). Multiple parameters can be passed to the module’s initialization function by including one or more param sections. Each param section must be named and contain a value directive.

Section: collection_group

The collection_group entries specify the metrics that gmond will collect, as well as how often gmond will collect and broadcast them. You may define as many collection groups as you wish. Each collection group must contain at least one metric section.

These are logical groupings of metrics based on common collection intervals. The groupings defined in gmond.conf do not affect the groupings used in the web interface, nor is it possible to use this mechanism to specify the names of the groups for the web interface. An excerpt from the default configuration follows:

collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
}
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
}
collect_once (boolean)

Some metrics are said to be “nonvolatile” in that they will not change between reboots. This includes metrics such as the OS type or the number of CPUs installed in the system. These metrics need to be collected only once at startup and are configured by setting the collect_once attribute to yes. This attribute is mutually exclusive with collect_every.

collect_every (seconds)

This value specifies the polling interval for the collection group. In the previous example, the cpu_user and cpu_system metrics will be collected every 20 seconds.

time_threshold (seconds)

The maximum amount of time that can pass before gmond sends all metrics specified in the collection_group to all configured udp_send_channels.

name (text)

The name of an individual metric as defined within the metric collection module. Typically, each loaded module defines several individual metrics. An alternative to name is name_match. By using the name_match parameter instead of name, it is possible to use a single definition to configure multiple metrics that match a regular expression. The Perl-compatible regular expression (pcre) syntax is used (e.g., name_match = "multicpu_([a-z]+)([0-9]+)").

Note

You can get a list of the available metric names by running gmond with an -m switch.

value_threshold (number)

Each time a metric value is collected, the new value is compared with the last measured value. If the difference between the last value and the current value is greater than the value_threshold, the entire collection group is sent to the udp_send_channels defined. The units denoted by the value vary according to the metric module. For CPU stats, for example, the value represents a percentage, and network stats interpret the value as a raw number of bytes.

Note

Any time a value_threshold is surpassed by any single metric in a collection group, all metrics in that collection group are sent over every configured UDP send channel.
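The interplay between collect_every, time_threshold, and value_threshold is easier to see in a small simulation. The Python sketch below is a simplified model, not gmond's scheduler: it evaluates both thresholds only at collection time, assumes the first collection is always announced, and uses made-up cpu_user readings.

```python
def group_should_send(samples, seconds_since_last_send, time_threshold):
    """Decide whether a collection group should be announced.

    samples is a list of (previous_value, current_value, value_threshold).
    """
    if seconds_since_last_send >= time_threshold:
        return True  # too long since the last announcement
    # One metric exceeding its threshold triggers the whole group.
    return any(abs(current - previous) > threshold
               for previous, current, threshold in samples)


def simulate(readings, collect_every, time_threshold, value_threshold):
    """Return the times (in seconds) at which a one-metric group is sent."""
    send_times = [0]  # assumption: the first collection is announced
    last_send = 0
    previous = readings[0]
    for index, current in enumerate(readings[1:], start=1):
        now = index * collect_every
        sample = [(previous, current, value_threshold)]
        if group_should_send(sample, now - last_send, time_threshold):
            send_times.append(now)
            last_send = now
        previous = current
    return send_times


# Hypothetical cpu_user readings taken every 20 seconds.
cpu_user = [10.0, 10.2, 10.4, 13.0, 13.1, 13.2, 13.3, 13.4, 13.5]
print(simulate(cpu_user, collect_every=20, time_threshold=90,
               value_threshold=1.0))
```

With these readings the group is sent at 0, 60, and 160 seconds: the jump to 13.0 exceeds the value_threshold at 60 seconds, and after that the values stay quiet until the time_threshold forces a send.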

title (text)

A user-friendly title for the metric for use on the web frontend.

gmetad

gmetad, the Ganglia Meta Daemon, is installed on the host that will collect and aggregate the metrics collected by the hosts running gmond. By default, gmetad will collect and aggregate these metrics in RRD files, but it is possible to configure gmetad to forward metrics to external systems such as Graphite instead.

gmetad listens on TCP port 8651 for connections from remote gmetad instances and will provide an XML dump of the grid state to authorized hosts. It also responds to interactive requests on TCP port 8652. The interactive facility allows simple subtree and summation views of the grid state XML tree. gweb uses this interactive query facility to present information that doesn’t fit naturally in RRD files, such as OS version.

gmetad topology

The simplest topology is a single gmetad process polling one or more gmond instances, as illustrated in Figure 2-4.

Figure 2-4. Basic gmetad topology

Redundancy/high availability is a common requirement and is easily implemented. Figure 2-5 shows an example in which two (redundant) gmetads poll multiple gmonds in the same cluster. The gmetads will poll node2 only if they are unable to poll node1 successfully. Both gmetads are always polling (active-active clustering).

Figure 2-5. gmetad topology for high availability (active-active)

gmetad is not limited to polling gmond: a gmetad can poll another gmetad to create a hierarchy of gmetads. This concept is illustrated in Figure 2-6.

Figure 2-6. gmetad hierarchical topology

gmetad, by default, writes all metric data directly to RRD files on the filesystem, as illustrated in Figure 2-4.

In large installations in which there is an IO constraint, rrdcached acts as a buffer between gmetad and the RRD files, as illustrated in Figure 2-7.

Figure 2-7. gmetad with rrdcached

gmetad.conf: gmetad configuration file

The gmetad.conf configuration file is composed of single-line attributes and their corresponding values. Attribute names are case insensitive, but their values are not. The following attributes, for example, are all equivalent:

name NAME Name NaMe

Most attributes are optional; others are required. Some may be defined in the configuration file multiple times; others must appear only once.

The data_source attribute

The data_source attribute is the heart of gmetad configuration. Each data_source line describes either a gmond cluster or a gmetad grid from which this gmetad instance will collect information. gmetad is smart enough to automatically make the distinction between a cluster and a grid, so the data_source syntax is the same for either. If gmetad detects that the data_source refers to a cluster, it will maintain a complete set of round robin databases for the data source. If, however, gmetad detects that the data_source refers to a grid, it will maintain only summary RRDs.

Setting the scalable attribute to off overrides this behavior and forces gmetad to maintain a full set of RRD files for grid data sources.

The following examples, excerpted from the default configuration file, are valid data sources:

data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
data_source "another source" 1.3.4.8:8655  1.3.4.8

Each data_source is composed of three fields. The first is a string that uniquely identifies the source. The second is an optional number that specifies the polling interval for the data_source in seconds; if it is omitted, as in the third example, gmetad polls every 15 seconds. The third is a space-separated list of hosts from which gmetad may poll the data. The addresses may be specified as IP addresses or DNS hostnames and may optionally be suffixed by a colon followed by the port number where the gmond tcp_accept_channel (or, for a grid, the remote gmetad's xml_port) is to be found. If no port number is specified, gmetad will attempt to connect to tcp/8649.

Note

gmetad will check each specified host in order, taking the status data from the first host to respond, so it’s not necessary to specify every host in a cluster in the data_source definition. Two or three are usually sufficient to ensure that data is collected in the event of a node failure.
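As an illustration of the data_source format and the in-order failover just described, the following Python sketch parses a data_source line and returns the data from the first host that answers. It is not gmetad's parser: the function names are hypothetical, IPv6 literals are not handled, and the 15-second default interval is taken from the comments in the stock gmetad.conf, so treat it as an assumption.

```python
import shlex
import socket

DEFAULT_INTERVAL = 15  # seconds; assumed from the stock gmetad.conf comments
DEFAULT_PORT = 8649    # used when a host has no :port suffix


def parse_data_source(line):
    """Split a data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # honors the quoted source name
    if len(tokens) < 3 or tokens[0].lower() != "data_source":
        raise ValueError("not a data_source line")
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # the polling interval is optional
        interval, rest = int(rest[0]), rest[1:]
    if not rest:
        raise ValueError("data_source needs at least one host")
    hosts = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        hosts.append((host, int(port)) if sep else (item, DEFAULT_PORT))
    return name, interval, hosts


def tcp_probe(host, port, timeout=5.0):
    """Connect to host:port and return everything sent before close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def first_responder(hosts, probe=tcp_probe):
    """Try hosts in order; return ((host, port), data) from the first success."""
    for host, port in hosts:
        try:
            return (host, port), probe(host, port)
        except OSError:
            continue  # unreachable or refused: fall through to the next host
    return None


print(parse_data_source(
    'data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655'))
```

The probe argument exists so that the failover logic can be exercised without a network, for example by passing a function that raises ConnectionRefusedError for the first host.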

gmetad daemon behavior

The following attributes affect the functioning of the gmetad daemon itself:

gridname (text)

A string that uniquely identifies the grid for which this gmetad instance is responsible. This string should be different from the one set in gmond. The one set in gmond.conf (at cluster { name = "XXX" }) is used in the CLUSTER tag that wraps all the hosts that particular gmond instance has collected. The gridname attribute will wrap all data sources specified in a GRID tag, which could be thought of as a collection of clusters defined in the data_source.

authority (URL)

The authority URL for this grid. Used by other gmetad instances to locate graphs for this instance’s data sources. By default, this value points to “http://hostname/ganglia/”.

trusted_hosts (text)

A space-separated list of hosts with which this gmetad instance is allowed to share data. Localhost is always trusted.

all_trusted (on|off)

Set this value to on to override the trusted_hosts attribute and allow data sharing with any host.

setuid_username (UID)

The name of the user gmetad will set the UID to after launch. This defaults to nobody.

setuid (on|off)

Set this to off to disable setuid.

xml_port (number)

The gmetad listen port. This value defaults to 8651.

interactive_port (number)

The gmetad interactive listen port. This value defaults to 8652.

server_threads (number)

The number of simultaneous connections allowed on the listen ports. This value defaults to 4.

case_sensitive_hostnames (1|0)

In earlier versions of gmetad, the RRD files were created with case-sensitive hostnames, but this is no longer the case. Legacy users who wish to continue to use RRD files created by Ganglia versions before 3.2 should set this value to 1. Since Ganglia 3.2, this value has defaulted to 0.

RRDtool attributes

Several attributes affect the creation and handling of RRD files.

RRAs (text)

These specify custom Round Robin Archive values. The default is (with a “step size” of 15 seconds):

"RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"

The full details of an RRA specification are contained in the manpage for rrdcreate(1).
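It can help to work out how much history each default archive holds. In an RRA definition of the form RRA:CF:xff:steps:rows, each row consolidates steps primary data points, so an archive spans steps × rows × step-size seconds. The short Python sketch below does that arithmetic for the defaults shown above; the function name is hypothetical.

```python
def rra_coverage(rra, step_seconds=15):
    """Return (consolidation function, seconds per row, total seconds kept)."""
    kind, cf, _xff, steps, rows = rra.split(":")
    if kind != "RRA":
        raise ValueError("not an RRA definition")
    resolution = int(steps) * step_seconds  # seconds covered by one row
    return cf, resolution, resolution * int(rows)


DEFAULT_RRAS = [
    "RRA:AVERAGE:0.5:1:5856",
    "RRA:AVERAGE:0.5:4:20160",
    "RRA:AVERAGE:0.5:40:52704",
]

for rra in DEFAULT_RRAS:
    cf, resolution, total = rra_coverage(rra)
    print(f"{rra}: one {cf} row per {resolution}s, "
          f"{total / 86400:.2f} days retained")
```

With a 15-second step, the defaults keep roughly one day of data at 15-second resolution, 14 days at one-minute resolution, and 366 days at ten-minute resolution.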

umask (number)

Specifies the umask to apply to created RRD files and the directory structure containing them. It defaults to 022.

rrd_rootdir (path)

Specifies the base directory where the RRD files will be stored on the local filesystem.

Graphite support

It is possible to export all the metrics collected by gmetad to Graphite, an external open source metrics storage and visualization tool, by setting the following attributes.

carbon_server (address)

The hostname or IP of a remote carbon daemon.

carbon_port (number)

The carbon port number, which defaults to 2003.

graphite_prefix (text)

Graphite uses dot-separated paths to organize and refer to metrics, so it is probably desirable to prefix the metrics from gmetad with something descriptive like datacenter1.gmetad, so Graphite will organize them appropriately.

carbon_timeout (number)

The number of milliseconds gmetad will wait for a response from the Graphite server. This setting is important because gmetad’s carbon sender is not threaded and will block waiting on a response from a down carbon daemon. Defaults to 500.
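To see why a descriptive graphite_prefix matters, consider how a metric might be named once it reaches carbon. Carbon's plaintext protocol expects one line per data point in the form "path value timestamp". The Python sketch below builds such a line; the cluster.host.metric ordering and the replacement of dots and spaces with underscores are illustrative assumptions, not a statement of exactly how gmetad composes its paths, and the names used are made up.

```python
def sanitize(component):
    """Keep a name from adding unintended levels to the dot-separated path."""
    return component.replace(".", "_").replace(" ", "_")


def carbon_line(prefix, cluster, host, metric, value, timestamp):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>'."""
    parts = [prefix] if prefix else []
    # Assumed layout for illustration: prefix.cluster.host.metric
    parts += [sanitize(cluster), sanitize(host), sanitize(metric)]
    return f"{'.'.join(parts)} {value} {int(timestamp)}\n"


print(carbon_line("datacenter1.gmetad", "my cluster", "web01.example.org",
                  "cpu_user", 12.5, 1700000000), end="")
```

Without a prefix, metrics from several gmetad instances would land at the top level of the Graphite tree, where clusters with the same name would collide.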

gmetad interactive port query syntax

As mentioned previously, gmetad listens on TCP port 8652 (by default) for interactive queries. The interactive query functionality enables client programs to get XML dumps of the state of only the portion of the Grid in which they’re interested.

Interactive queries are performed via a text protocol (similar to SMTP or HTTP). Queries are hierarchical and begin with a forward slash (/). For example, the following query returns an XML dump of the entire grid state:

/

To narrow the query result, specify the name of a cluster:

/cluster1

To narrow the query result further, specify the name of a host in the cluster:

/cluster1/host1

Queries may be suffixed with a filter to modify the type of metric information returned by the query (as of this writing, summary is the only filter available). For example, you can request only the summary metric data from cluster1 like so:

/cluster1?filter=summary
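The query forms above are simple enough to generate from a script. The following Python sketch builds a query string and, separately, sends it to the interactive port and reads the reply. The function names are hypothetical, and the sketch assumes that gmetad expects the query to be terminated by a newline and closes the connection once it has sent its response.

```python
import socket


def build_query(cluster=None, host=None, summary=False):
    """Compose an interactive query such as /cluster1/host1."""
    if host and not cluster:
        raise ValueError("a host can only be named within a cluster")
    path = "/" + "/".join(part for part in (cluster, host) if part)
    if summary:
        path += "?filter=summary"
    return path


def query_gmetad(query, host="localhost", port=8652, timeout=10.0):
    """Send one query to the interactive port and return the raw XML bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the query is a single newline-terminated line.
        sock.sendall(query.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_query())
print(build_query("cluster1", "host1"))
print(build_query("cluster1", summary=True))
```

On a host running gmetad, query_gmetad(build_query("cluster1", summary=True)) would fetch the same summary shown in the last example.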

gweb

Of the three components that make up Ganglia, gweb is both the most configurable and the least in need of configuration. In fact, there is no need to change anything whatsoever in gweb’s default configuration file to get up and running with a fully functional web UI.

Apache virtual host configuration

Although gweb itself requires no configuration to speak of, some web server configuration is necessary to get gweb up and running. Any web server with PHP support will do the job, and although web server configuration is beyond the scope of this book, the Apache Web Server is such a common choice that we have included an example of a virtual host configuration for Apache. Assuming that gweb is installed in /var/www/html/ganglia2 on a host that resolves to myganglia.example.org, the following configuration should get you started with Apache:

NameVirtualHost *:80

<VirtualHost *:80>
   ServerName myganglia.example.org
   ServerAlias myganglia
   
   DocumentRoot /var/www/html/ganglia2

   # Other directives here
</VirtualHost>

This is, of course, a simplistic example. For further reading on the subject, we recommend the Apache HTTP Server documentation on virtual hosts.

gweb options

gweb is configured by way of the conf.php file, which overrides and extends the default configuration set in conf_default.php. conf.php is located in the web root directory. This file is well documented, and as of this writing there are more than 80 options, so we won’t cover them all, but we will cover some of the more important ones and make note of some option categories, just so you’re aware they’re there.

The file, as its name suggests, is itself a PHP script made up of variable assignments. Unlike the other configuration files, assignments might span multiple lines. Attribute names are themselves keys in gweb’s $conf data structure, so they are case sensitive, and look like PHP array assignments. The following line, for example, informs gweb of the location of the RRDtool binary:

$conf['rrdtool'] = "/usr/bin/rrdtool";

All attributes in the file are required, and some may be defined multiple times; others must appear only once. Some values are derived from other values. For example, the rrds attribute is derived from gmetad_root:

$conf['rrds'] = "${conf['gmetad_root']}/rrds";
Application settings

Attributes in this category affect gweb’s functional parameters—its own home directory, for example, or the directories in which it will check for RRDs or templates. These are rarely changed by the user, but a few of them bear mentioning.

templates (path)

Specifies the directory in which gweb will search for template files. Templates are like a skin for the site that can alter its look and feel.

graphdir (path)

Specifies the directory where the user may drop JSON definitions of custom graphs. As described in the next chapter, users may specify custom report graphs in JSON format and place them in this directory, and they will appear in the UI.

rrds (path)

Specifies the directory where the RRD files are to be found.

As described in Chapter 7, various Nagios integration features may be set in gweb’s conf.php. Collectively, these enable Nagios to query metric information from gweb instead of relying on remote execution systems such as Nagios Service Check Acceptor (NSCA) and Nagios Remote Plugin Executor (NRPE).

Look and feel

gweb may be configured to limit the number of graphs it displays at once (max_graphs) and to use a specified number of columns for the grid and host views. There are also a number of boolean options that affect the default behavior of the UI when it is first launched, such as metric_groups_initially_collapsed.

The conf.php file defines numerous settings that modify the functional attributes of the graphs drawn in the UI. For example, you may change the colors used to plot the values in the built-in load report graph, change the default colors used in all the graphs, and even define custom time ranges.

Security

Attributes in this category include the following:

auth_system (readonly|enabled|disabled)

gweb includes a simple authorization system to selectively allow or deny individual users access to specific parts of the application. This system is enabled by setting auth_system to enabled. For more information on the authorization features in gweb, see Chapter 4.

Advanced features

Here are some advanced features:

rrdcached_socket (path)

Specifies the path to the rrdcached socket. rrdcached is a high-performance caching daemon that lightens the load associated with writing data to RRDs by caching and combining the writes. More information may be found in Appendix A.

graph_engine (rrdtool|graphite)

gweb can use Graphite instead of RRDtool as the rendering engine used to generate the graphs in the UI. This approach requires you to install patched versions of whisper and the Graphite webapp on your gweb server. More information can be found here.

Postinstallation

Now that Ganglia is installed and configured, it’s time to get the various daemons started, verify that they’re functional, and ensure that they can talk to each other.

Starting Up the Processes

Starting the processes in a specific order is not necessary; however, if the daemons are started in the order recommended here, there won’t be a delay waiting for metadata to be retransmitted to the UDP aggregator and users won’t get error pages or incomplete data from the web server:

  1. If you’re using the UDP unicast topology, start the UDP aggregator nodes first. This ensures that the aggregator nodes will be listening when the other nodes send their first metadata transmission.

  2. Start all other gmond instances.

  3. If you’re using rrdcached, start all rrdcached instances.

  4. Start gmetad instances at the lowest level of the hierarchy (in other words, gmetad instances that don’t poll any other gmetad instances).

  5. Work up the hierarchy starting any other gmetad instances.

  6. Start the Apache web servers. Web servers are started after gmetad; otherwise, the PHP scripts can’t contact gmetad and the users see errors about port 8652.

Remember rrdcached

If gmetad is configured to use rrdcached, it is essential for rrdcached to be running before gmetad is started.

Testing Your Installation

gmond and gmetad both listen on TCP sockets for inbound connections. To test whether gmond is operational on a given host, telnet to gmond’s TCP port:

user@host:$ telnet localhost 8649

In reply, gmond should output an XML dump of metric data. If the gmond is deaf or mute, it may return a rather empty XML document, with just the CLUSTER tag. gmetad may be likewise tested with telnet like so:

user@host:$ telnet localhost 8651

A functioning gmetad will respond with an XML dump of metric data.

See Chapter 6 for a more comprehensive list of techniques for validating the state of the processes.
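The telnet test can also be scripted. The Python sketch below reads the XML dump from a daemon's TCP port and counts the metrics reported for each host. It is a rough check rather than a full validator: the function names are hypothetical, and the sample document in the usage lines is a made-up, heavily abbreviated stand-in for real gmond output.

```python
import socket
import xml.etree.ElementTree as ET


def read_xml_dump(host="localhost", port=8649, timeout=10.0):
    """Connect to gmond (or gmetad on 8651) and return the XML it sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Map each cluster name to {host name: number of metrics reported}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        summary[cluster.get("NAME")] = {
            host.get("NAME"): len(host.findall("METRIC"))
            for host in cluster.findall("HOST")
        }
    return summary


# Made-up sample standing in for a real dump.
sample = (b'<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">'
          b'<CLUSTER NAME="my cluster">'
          b'<HOST NAME="node1"><METRIC NAME="cpu_num" VAL="4"/></HOST>'
          b'</CLUSTER></GANGLIA_XML>')
print(summarize(sample))
```

On a live system, summarize(read_xml_dump()) performs the same check against the local gmond; a cluster that maps to an empty dictionary corresponds to the nearly empty document a deaf or mute gmond may return.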

Firewalls

Firewall problems are common with new Ganglia installations that span network subnets. Here, we’ve collected the firewall requirements of the various daemons together to help you avoid interdaemon communication problems:

  1. gmond uses multicast by default, so clusters that span a network subnet need to be configured with unicast senders and listeners as described in the previous topology sections. If the gmond hosts must traverse a firewall to talk to each other, allow udp/8649 in both directions. For multicast, support for the IGMP protocol must also be enabled in the intermediate firewalls and routers.

  2. gmond listens for connections from gmetad on TCP port 8649. If gmetad must traverse a firewall to reach some of the gmond nodes, allow tcp/8649 inbound to a few gmond nodes in each cluster.

  3. gmetad listens for connections on TCP ports 8651 and 8652. The former port is analogous to gmond’s 8649, while the latter is the “interactive query” port to which specific queries may be sent. These ports are used by gweb, which is usually installed on the same host as gmetad, so unless you’re using some of the advanced integration features, such as Nagios integration, or have custom scripts querying gmetad, you shouldn’t need any firewall ACLs for gmetad.

  4. gweb runs in a web server, which usually listens on ports 80 and 443 (if you enable SSL). If the gweb server is separated from the end users by a firewall (a likely scenario), allow inbound tcp/80 and possibly tcp/443 to the gweb server.

  5. If your Ganglia installation uses sFlow agents and the sFlow agents must traverse a firewall to reach their gmond listener, allow inbound udp/6343 to the gmond listener.
