This chapter introduces topics and considerations necessary for large Puppet deployments.
A node terminus is a node data provider for the Puppet server. A node terminus can do the following:

- Provide the environment, classes, and parameters for a node
- Override the environment supplied by the node

You can use one of the following two types of node terminus, each of which is described shortly:

- An external node classifier (ENC)
- An LDAP server
ENCs are rarely necessary, as the global and environment data providers have completely eclipsed the more limited data that ENCs can provide. Use an ENC only when you cannot retrieve the data through Puppet Lookup.
You can use a node terminus when using puppet apply (without a Puppet server). It is configured in a similar way, as shown in the next section.
An external node classifier is a program that provides environment, class, and variable assignments for nodes. The ENC is executed by the exec node terminus of the Puppet server, prior to selecting the node's environment, or by the puppet apply implementation.

The node classifier can be any type of program. Use the following configuration settings in the [master] section of /etc/puppetlabs/puppet/puppet.conf:
[master]
node_terminus = exec
external_nodes = /path/to/node_classifier
If you are using puppet apply, then you need to place these configuration options in the [user] section of the configuration file.

The program will receive a single command-line argument: the certname of the node (which defaults to the fully qualified domain name). The program should output YAML format with parameters and classes hashes, and an optional environment string:
environment (optional)
    Overrides the environment configuration value of the agent.
parameters (optional if the classes hash is defined)
    Node parameters, made available as top-scope variables (e.g., $::parameter).
classes (optional if the parameters hash is defined)
    Classes to assign to the node, with any class parameters nested beneath each class name.

The output should be in the same YAML format as used by Hiera. Following is example output showing both single-level and multilevel parameter and class names:
$ /path/to/node_classifier client.example.com
---
environment: test
parameters:
  selinux: disabled
  time:
    timezone: PDT
    summertime: true
classes:
  ntp:
  dns:
    search: example.com
    timeout: 2
    options: rotate
  puppet::agent:
    status: running
    enabled: true
The hash of values indented beneath each class name is supplied as parameter input for the class. In the preceding example, we have passed status = running and enabled = true as input parameters to the puppet::agent class.
Node parameters are declared under the top-level parameters hash; Puppet class parameters are declared underneath the class name.

The node classifier must exit with a success status (0) and must provide either a parameters or classes hash in the output. If it fails to deliver a successful response, the catalog build process will be aborted.
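Because a nonzero exit or missing hashes will abort the catalog build, it is worth smoke-testing a classifier before putting it in front of the Puppet server. The following sketch uses an inline stand-in classifier; substitute the path to your real script.

```shell
# Smoke-test a classifier: run it for a node and verify that it
# exits 0 and emits at least one of the required hashes.
# The classifier function here is a stand-in for /path/to/node_classifier.
classifier() { printf -- '---\nclasses: {}\nparameters: {}\n'; }

output=$(classifier client.example.com)
status=$?

# Require success status plus a classes or parameters key in the YAML.
if [ "$status" -eq 0 ] && printf '%s\n' "$output" | grep -Eq '^(classes|parameters):'; then
  result=ok
else
  result=fail
fi
echo "$result"
```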
If no parameters or classes should be assigned, then return an empty hash for one or both keys:
$ /path/to/node_classifier client.example.com
---
classes: {}
parameters: {}
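To make the contract concrete, here is a minimal ENC sketch as a shell script. The certname-to-environment mapping and the class list are hypothetical placeholders; a real classifier would query an inventory system, CMDB, or similar data source.

```shell
#!/bin/sh
# Hypothetical minimal ENC. Receives the node certname as the first
# argument and prints YAML classification to stdout.
certname="${1:-unknown}"

# Placeholder logic: map hostnames ending in .test.example.com to the
# test environment; everything else gets production.
case "$certname" in
  *.test.example.com) environment="test" ;;
  *)                  environment="production" ;;
esac

# Emit the YAML document. An empty hash is valid for either key.
# The script must finish with exit status 0 for the server to
# accept the classification.
cat <<EOF
---
environment: ${environment}
parameters: {}
classes:
  ntp:
EOF
```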
An LDAP server can be queried as a node terminus to provide the same node data as an ENC. Store Puppet attributes in LDAP only if you are comfortable with LDAP configuration and have the rights to modify the LDAP schemas available on the server.
To enable LDAP as a node terminus, add the following configuration settings to /etc/puppetlabs/puppet/puppet.conf:
[master]
node_terminus = ldap
ldapserver = ldapserver.example.com
ldapport = 389
ldaptls = true
ldapssl = false
ldapuser = readuser
ldappassword = readonly
ldapbase = ou=Hosts,dc=example,dc=com
To add the schema to your server, download the Puppet LDAP schema from PuppetLabs LDAP Schema on GitHub and install it on your LDAP server according to the documentation for your specific LDAP server.
Then you need to add LDAP entries for each Puppet node, which again will be specific to your LDAP implementation and the tools available to you. A Puppet node entry in LDAP similar to the one specified for the ENC would look like this:
dn: cn=client,ou=Hosts,dc=example,dc=com
objectClass: device
objectClass: puppetClient
objectClass: top
cn: client
environment: test
puppetClass: dns
puppetClass: puppet::agent
puppetVar: selinux=disabled
LDAP does not provide the flexibility to supply multilevel variables or parameters for classes. However, if you have additional attributes associated with the host, such as the IP address from the ipHost objectClass, these will be available as top-scope variables for the node.
You can create a default entry to be used when an exact match is not found in LDAP:
dn: cn=default,ou=Hosts,dc=example,dc=com
objectClass: device
objectClass: puppetClient
objectClass: top
environment: production
For this to work you must also add the following line to the named environment’s manifests/site.pp file:
[vagrant@puppetserver ~]$ echo "node default {}" >> /etc/puppetlabs/code/environments/production/manifests/site.pp
You can find more details at “The LDAP Node Classifier” on the Puppet docs site.
You don’t need to build your own node classifier from scratch. There are many community-provided examples you can utilize, or use as a starting point for development. Here are just a few examples, chosen mostly for diversity of their approach:
One example simply reads the node's environment from a Hiera lookup and outputs it. This is useful because Hiera cannot override the environment supplied by a node, but an ENC can. This allows you to store everything in Hiera. You'll need to modify this ENC to use environment-based pathnames.

There are a lot of ENCs on GitHub that read their data from a YAML file, or even call Hiera directly. It's really not clear to me that these provide any value, as classes and parameters could be more easily specified in Hiera directly. To me, the value of an ENC comes when you need to query a data source outside of Hiera. The one and only advantage an ENC has over Hiera is the ability to override the environment requested by the node. Perhaps there is value in Hiera-backed ENC lookups that never applied to the sites at which I've worked.
It could make sense if there are different teams managing separate Hiera data sources, although this may be better served by environment-specific data sources, as described in “Using Custom Backends in Environments”.
The good news about Puppet is that replicating servers is easy—in fact, trivially easy. Given the same module code, the same Hiera data, and access to the same resources used by ENCs or data lookups in modules, two different Puppet servers will render the same catalog for the same node every time.
The only problems managing multiple Puppet servers are deciding how to synchronize the files, and whether or not to centralize the signing of TLS certificates.
There are several ways to implement server redundancy with Puppet. Let’s review each of them.
One deployment strategy implements a Puppet server for each group of nodes. This can be simple and efficient when different teams manage different servers and nodes. Each Puppet server acts as its own certificate authority—which is by far the easiest way to bring up new Puppet agents. Each team validates and authorizes the nodes that connect to its server.
In this configuration, the Puppet servers can host the same Puppet modules, a distinct set of Puppet modules and configurations, or a combination thereof. This provides ultimate freedom to the team managing each group of nodes.
You can achieve redundancy by using a hybrid configuration with the other techniques described next, or by simply restoring the Puppet server vardir to another node and pointing the server's name at it.
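The restore approach can be as simple as archiving the vardir on the primary and unpacking it on the standby. The sketch below demonstrates the mechanism with temporary directories standing in for the real vardir and standby host; on an actual server you would archive the server's vardir path for your installation.

```shell
# Demonstration of the archive-and-restore idea, using temp dirs in
# place of the real vardir and standby node.
primary=$(mktemp -d)    # stands in for the primary's vardir
standby=$(mktemp -d)    # stands in for the standby node's vardir
archive=$(mktemp)

# Simulate some server state on the primary.
mkdir -p "$primary/puppetserver"
echo "state" > "$primary/puppetserver/example.dat"

# Archive on the primary...
tar -C "$primary" -czf "$archive" .

# ...and restore on the standby (in production, after copying the
# archive across and before pointing the server's name at this node).
tar -C "$standby" -xzf "$archive"
```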
Another solid deployment strategy implements a single Puppet CA server for the entire organization. Configure the Puppet agent to submit all CSRs to this one CA server, using the following parameters in the [agent] section of /etc/puppetlabs/puppet/puppet.conf:
[agent]
ca_server = puppet-ca.example.com
server = puppetserver01.example.com
With this configuration, every server and agent will get certificates from the single Puppet CA. Nodes point at different servers for all normal Puppet transactions.
This is much easier than using an external CA, because the Puppet agent will request and install the certificates for you. The only complication is that you'll need to sync all approved agent certificates from the CA server down to the normal Puppet servers.
You can utilize a Network File System (NFS) share for the server's TLS directory (/var/opt/puppetlabs/puppetserver/ssl), or you can use something like rsync to keep the directories in sync.
If you are utilizing NFS, you may want to also share the /var/opt/puppetlabs/puppetserver/reports directory so that all client reports are centrally stored.
If you have many nodes in a single site, it may be necessary to put multiple Puppet servers behind a load balancer. Configure Puppet much as you would configure any other TCP load-balanced service and it will work fine.
The servers themselves must each have the same Puppet modules and Hiera data stored on them, or they must mount the data from the same place. I have found that NFS works well for this purpose, as there is generally not a high number of reads, and the files end up in the server’s file cache. You would want to share the following directories between servers:
The list of directories may change over time.
There are three different ways to synchronize data on globally disparate Puppet servers (naturally, each option has its own benefits and trade-offs):
A very common deployment model has each Puppet server check out the modules and data from a source code repository (which we'll cover in "Managing Environments with r10k"). This can be done on a schedule, triggered by a post-commit handler, by manual authorization, or by automated triggers from orchestration tools such as MCollective.
This technique has the advantage of fairly quick synchronization of changes to all servers.
In some environments, it is desirable to progressively deploy changes through staging and canary (“in a coal mine”) environments. In these situations, automation schedulers like Jenkins are used to progressively roll the changes from one environment to the other.
This technique enables a cautious, slow-roll deployment strategy.
Which one is best for your needs? Well, it depends on your needs and your testing model.
The single source pull model is simple to configure, requires very few components, and promotes change quickly. It can also promote a breaking change quickly, which will create large-scale outages in a very short period of time.
The rolling push of change through stages requires a nontrivial investment in services, systems, and custom automation to deploy. Once built, these systems can help identify problems long before they reach production. Depending on the automation created, the deployment automation can identify the breaking committer, open trouble tickets, roll back the changes, and even bake and butter your bread.
In reality, very few organizations use only one approach or the other. Most situations have a hybrid solution that meets their most important needs. For example:
- Nodes can be moved between stages of a progressive rollout by changing their environment configuration variable.

There are two different ways to distribute Puppet nodes globally (each option has its own benefits and trade-offs):
Which one is best for your needs? Well, it depends. There is no silver bullet and no single answer, as the problems one group will have won’t match others. If you are distributing many or large files using Puppet, latency will be very important. I’ve worked in environments where it was absolutely necessary to keep all nodes in sync within 20 seconds of a release. File synchronization latency created problems.
I’ve worked in other environments where the Puppet configuration was small without any significant file transfer. It also wasn’t essential that a node picked up changes more than once or twice a day, so a single cluster of Puppet servers handled 2,000 small remote sites. This was very easy to manage and maintain, and met the needs and expectations for that client.
Global deployments with fast node convergence requirements are usually best served with the hybrid of the choices previously shown. The process for building the hybrid solution requires two steps:
This improves access and latency for nodes by placing a well-connected Puppet server in their region. It limits the problems around synchronization time by pushing the files to a limited set of well-connected centers.
When managing nodes that may lose connectivity to a remote Puppet server, you can decide whether or not the client should run Puppet with the last cached catalog. There are two parameters that control this. The defaults are shown here:
# these are the default values
[agent]
usecacheonfailure = true
use_cached_catalog = false
usecacheonfailure determines whether the agent falls back to the previously cached catalog when it fails to retrieve a new one. use_cached_catalog tells the agent to always apply the cached catalog without requesting a new one. With the default configuration, the agent will ask the Puppet server for a new catalog but fall back to the last cached catalog if the Puppet server cannot be contacted.
Which of these choices best meets your needs? Only you can tell. Each of them works to solve one set of problems. If you’re not sure which selection is best, simply make a choice. If you build the servers as suggested in this book, it won’t be difficult to migrate the servers to a different deployment strategy later. Changing from one configuration to another isn’t usually that difficult either.
However, the best thing about these choices is that none of them is exclusive. You can combine each and any of these techniques to solve more requirements.
Perhaps you’ve come up with a better choice, or you’ve seen a problem I didn’t mention here? I’d love to hear about it.
Before you move on to the next chapter, I’d like to remind you of best practices for managing Puppet servers:
And finally, remember that Puppet master is at the end of the road. It is only useful to help test your modules with Puppet 4, or to utilize an existing Rack infrastructure. Prepare to migrate to Puppet Server in the future.
In this chapter, you’ve created a small learning environment with Puppet Server (or a Puppet master), that you can use for testing Puppet manifests. You’ve learned the following about Puppet servers: