Chapter 8. Node Classification

The term node classification refers to both the tools and the process used to supply a role, individual classes, and node parameters that customize the catalog for a node.

Classifiers take many forms. Large sites typically employ a database or console for node classification. Smaller sites will often use Hiera or the Puppet language to classify nodes. The best solution is always site specific.

In this chapter, we look at a number of classification strategies. The goal of this chapter is not to redocument all of the available classifiers, but to look at the available options and specific considerations and best practices behind each one.

What Data Should the Node Classifier Provide?

As we discussed in Chapter 1, node classification provides data used to customize the model to produce a catalog specific to a node. Let’s take a quick look at the kind of data that is commonly set with a node classifier.

Roles and Profiles

Node classification’s original and most common purpose is to declare classes to be applied on the node. This selection can be direct and explicit or indirect through the assignment of roles and profiles.

Tip

A role doesn’t need to be a class. The role could be used to alter the Hiera hierarchy appropriately for inclusion of classes from a Hiera array.

The recommended approach is to assign a single role class, as discussed in Chapter 7; however, a classifier can also assign individual classes or profiles a-la-carte if necessary. Direct class or profile assignment remains common in environments where responsibility for Puppet is split between multiple teams, such as when security has direct ownership of some Puppet data, or in environments that offer self-service node classification.

Node-Specific Data

You can use node-specific data provided by an ENC or node facts to customize the Hiera hierarchy or Puppet catalog. The most common use case for node parameters is to modify the Hiera hierarchy used for lookup. Some of the more commonly used values include the following:

  • Hardware details such as node type, CPUs, interfaces, and storage

  • Tier, such as whether it’s a prod or dev node

  • Site, the physical or virtual location of the node

  • Contact information for node maintenance

Although data can be provided by either source, there are two points of concern for selection of the data source:

Information ownership

Even though node information can be provided by either node facts or an ENC, you should take care to avoid having an external data source attempt to provide values that the node knows better, such as how many CPUs are available on the node. Databases will always have drift or replication issues that delay information updates. Details of the node's hardware and other fixed attributes should therefore come from node facts.

Security controls

Node facts are provided by the node and can be altered by someone with superuser access to the node, whether approved or malicious. ENCs provide a centralized data source for security-sensitive values in environments in which superuser access to the node is shared, intentionally or otherwise.

Node Statements

Node statements are one of the oldest methods of classification for Puppet. They are simple classifiers based on the client certificate name, which makes them suitable for only the smallest sites.

The main benefit of node statements is their simplicity. They are available in a basic Puppet installation with no other infrastructure, as demonstrated in the following example:

node 'db011.example.com' {
  include roles::database
}

Node statements allow assignment of a block of code based on the node’s name, which defaults to the certificate name, which defaults to the node’s fully qualified domain name (FQDN). Typically, node statements are combined with semantic hostnames so that the purpose of a node can be derived from its name.

Tip

Name-based assignment won’t work well for any site that tracks nodes using serial number, MAC address, or other unique attributes. However, sites with complex tracking needs will have infrastructure available for an ENC to query, thus removing this need.

Although simple and easy to use, node statements have a large number of limitations:

  • The node statement will match only a single value (usually the node FQDN).

  • Only one matching node statement will be applied.

  • Node statement blocks cannot use parameters except those provided by an ENC (which would obviate the need for node statements).

  • Variables declared in the node statement are not available to top-level code outside the node statement (manifests in the manifests/ directory of the environment).

  • Variables used in the node statement are evaluated immediately and thus cannot be overridden by data lookups—a feature that’s possible only in the delayed evaluation of declared classes.

  • Node statements are all-or-nothing—enabling them requires that every node have a matching node statement (or node default definition).

As this list shows, node statements have a number of major limitations. Except when building small demonstration or isolated test environments, any of the other classifiers presented in this chapter will prove to be much more flexible and powerful.

Node Statement Matching

For review (because we touch on it in the sections that follow), nodes are matched by node statements in the following order:

  1. Complete match
  2. Regex match
  3. Partial match
  4. Default statement applied

Only the first match from this list is applied.

Default node statement

If any node statements exist, every node must match a node statement, or the catalog compilation will fail. Therefore, if you utilize even a single node statement, make certain to have a node default statement that will be applied to nonmatching nodes, as shown in the following:

node default {
  # this can be empty
}

Regular expression node statements

Node statements permit the use of regex statements to match groups of nodes. This is commonly used to classify nodes using a semantic hostname pattern. This can be useful for easily adding nodes of the same type without having to create node statements for each new node. The following example shows the code for this:

node /^web\d+[13579]/ {
  # odd-numbered web servers are Blue pod
}

Because the regex language can be difficult to read, always put a comment in or above a regex explaining what you expect the regex to match.

Warning

The behavior when multiple regex node statements match the same node is not guaranteed. One will be selected apparently randomly.

Because regexes are checked only after a complete name match is not found, create a node statement for a node’s complete name to exclude it from a group that matches a regex.
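For example, a complete-name statement pulls a single host out of a regex-matched group (the hostname and role shown here are illustrative):

node 'web13.example.com' {
  # complete match wins over any regex match; classify this node separately
  include 'roles::maintenance'
}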

Partial node match

Node statements permit the use of partial FQDN statements to match nodes. During this step, Puppet will strip the rightmost sections of the name (up to the next period) and then check for a complete match. This is commonly used to classify nodes using a hostname pattern where the domain name could differ, as shown here:

node 'puppet-server' {
  # will match 'puppet-server.example.com', 'puppet-server.example.net', etc
}

As a partial match intends to match against a missing part of the data, these statements are inherently unreadable. Always put a comment in or above the match reminding the reader that it is intended to match multiple domains. Otherwise, it might be either misunderstood as a mistake and fixed or copied when creating another node because the person copying it doesn’t realize the consequences of a partial match.

Replacing Node Inheritance

Puppet versions prior to 4.0 allowed node statements to inherit from one another. You should avoid using node inheritance, even with older versions of Puppet. This was a source of significant difficulty and was made obsolete by the roles and profiles design pattern even for older versions of Puppet.

Let’s review an example conversion of node inheritance to the roles and profiles design pattern.

Node classification with inheritance

The most common inheritance design pattern is to specify a base node, a series of node statements that extend the base with specific services (such as an Apache node, a Tomcat node, a MySQL node), and a node that performs the final service specific configuration tasks, as shown in Example 8-1.

Warning

This older syntax has been deprecated and no longer works in any supported version of Puppet.

Example 8-1. Node inheritance example
# Base used by numerous profiles
node base {
  include 'ntp'
  include 'security'
}

# Web node profile extends base
node web inherits base {
  include 'apache'
  include 'php'
}

# Database node profile extends base
node db inherits base {
  include 'mysql'
}

# These nodes use existing node groups
node 'www1.example.com', 'www2.example.com' inherits web { }
node 'db1.example.com', 'db2.example.com' inherits db { }

# This node inherits web, but has to add db manually
node www-dev1.example.com inherits web {
  include 'mysql'
}

Even though this example was greatly simplified to keep it on a single page, in practice this often becomes a large sprawling mess of inconsistent updates. Besides becoming difficult to maintain, it has a few major limitations:

  • Node definitions do not support multiple inheritance, so it isn’t possible to create web and db profiles and include them both.

  • Inheritance also affects variable lookup scope, causing node assignment changes to affect code evaluation unpredictably.

There is nothing worse than variable values and resource defaults that leak in unpredictable ways. Seriously, you’d be staring at the same code applied to two identical nodes trying to figure out why they got different catalogs. It was always a rabbit-chasing exercise and never lent itself to being fixed permanently. That’s why this approach has been abandoned.

Node classification using roles and profiles

The roles and profiles design pattern allows you to assign multiple profiles to the node without changing the code’s scope or parent module. You can easily convert any node inheritance design to the roles and profiles pattern by following these steps:

  1. Convert each node inheritance statement (something named inherits name) to a profile.

  2. Convert each node matching statement (something that matched a given certname) to a role.

  3. Replace the node matching statement with a role selection.

So, the first thing we do is convert each inherited node statement to a profile:

class profiles::base {
  include 'security'
  include 'ntp'
}

class profiles::web {
  include 'apache'
  include 'php'
}

class profiles::db {
  include 'mysql'
}

Next, we convert each matching node statement to a role:

class roles::webserver {
  include 'profiles::base'
  include 'profiles::web'
}

class roles::devserver {
  include 'profiles::base'
  include 'profiles::web'
  include 'profiles::db'
}

With this approach you can simply apply roles::devserver to a node, making use of any other profiles. There’s no linear map of single inclusion to maintain or work around. Each role implements a site-specific configuration by declaratively including the relevant profiles.

Finally, we use node statements to assign roles to the nodes to complete the conversion:

node 'www1.example.com', 'www2.example.com' {
  include 'roles::webserver'
}

node 'www-dev1.example.com' {
  include 'roles::devserver'
}

In several easy steps, you’ve just converted the old node inheritance pattern into the modern roles and profiles best practice. Further evolutions, like removing kitchen-sink base profiles, will be easy to accomplish after you finish this migration.

Node Parameters Within Node Blocks

When using node statements, node properties such as the location and tier of a host can be set as global variables within the node block. Because ENC properties are presented as top-scope variables, this offers a seamless way to transition to an ENC in the future.

If node properties are set this way, avoid adding any code that might rely on these properties outside the node statement. The rest of your site manifests are at a higher scope than the node statement, so node properties will not be visible outside the node scope. These properties will be visible to classes declared within the node scope, but not to classes declared at the top scope. This scoping restriction applies to Hiera lookups performed outside of the node scope, as well.

The practical implication of variables declared within the node scope is that top scope can be used for only very basic operations, such as setting resource defaults. You can no longer rely on it to include a base set of classes.
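To illustrate (the $tier and $site values and the role name are hypothetical), node properties set this way look like the following:

node 'www1.example.com' {
  # node properties, visible only within node scope
  $tier = 'prod'
  $site = 'dc1'
  include 'roles::webserver'
}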

Fact-Based Classification

Facts are a simple way to pass node-provided data to a classifier and declared classes. They appear within the $facts[] hash and are available for interpolation in both the Hiera hierarchy and Puppet language modules.

Fact-Based Role Assignment

It’s possible to declare classes in a manifest based on a fact supplied by the node, as demonstrated in the following:

if $facts['role'] {
  include $facts['role']
}

Because the include command accepts arrays as an input, a-la-carte profile provisioning can also be supported using this method:

if $facts['profiles'] {
  include $facts['profiles']
}

Although fact-based assignment can be powerful for environments where nodes self-select the Puppet model, it decentralizes node management in a way that will reduce your ability to restructure your code or change your interfaces in the future. This will also complicate attempts to migrate to other classification systems. This is only practical in environments in which Puppet is a smaller piece of a large node orchestration platform.

Security and Fact-Selected Roles or Profiles

Role and profile declaration from facts permits node owners to generate a catalog for any node type in your infrastructure. If you use this approach, you should carefully consider what data might be exposed.

Warning

This approach moves security and node classification out to the edges, to the nodes themselves.

For example, imagine a case in which Hiera stores database credentials for your web application. If a user was able to add the web application profile to their catalog, they could easily recover these credentials. Virtually any security-sensitive data is vulnerable to such attacks.

Mitigating the risk by using trusted facts

If you carefully control how node certificates are signed, you can employ the example shown in Example 6-2 to utilize a role stored in the node’s signed Puppet certificate. This would be a very effective way to decentralize role assignment without an ENC.

If you make use of certificate authorities with autosign enabled, you’ll gain nothing from this change. Anybody can generate a certificate with any role they want and get it signed.
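As a minimal sketch (assuming the role was placed in the pp_role certificate extension when the CSR was generated, as in Example 6-2), the role can be read from the trusted data hash rather than from a node-supplied fact:

if $trusted['extensions']['pp_role'] {
  include "roles::${trusted['extensions']['pp_role']}"
}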

Fact-Based Hiera Classification

A more powerful and more common approach is to classify nodes using Hiera data selected by node-provided facts. You can use this approach instead of, or in combination with, an ENC.

The implementation of a Hiera node classifier is fairly simple. A list of classes is stored in Hiera, a lookup is performed, and the classes returned by the lookup are applied to a node. This is typically used with a certname level in the hierarchy so that classification can be managed on a per-node basis.

Fact-based classification using Hiera is accomplished by the following steps:

  1. Utilize node facts to adjust the Hiera hierarchy appropriately.

  2. List classes to be declared in Hiera data.

  3. Include classes from the Hiera data in a manifest.

Customizing Hiera hierarchy using facts

This topic was already covered in depth in Chapter 6, but let’s refer to the following example that uses both default and custom facts:

---
version: 5
hierarchy:
  - name: "Node-specific data"
    path: "fqdn/%{facts.fqdn}.yaml"

  - name: "Role data"
    path: "roles/%{facts.role}.yaml"

  - name: "Site data"
    path: "sites/%{facts.site}.yaml"

  - name: "OS-specific data"
    path: "os/%{facts.os.family}.yaml"

Listing classes in Hiera data

We can add classes to the preceding example at multiple levels of the hierarchy.

  1. Classes for the specific node in the fqdn/ data file

  2. Classes for the node’s role (from a custom fact) in the roles/ data file

  3. Classes for the node’s site (from a custom fact) in the sites/ data file

  4. Classes for every node of a given OS family in the os/ data file

Following is the basic role definition in YAML format:

---
# classes for the webserver role
classes:
  - 'profiles::base'
  - 'profiles::web'

Declaring classes from Hiera data

The typical approach is to use the include() function call on the results of a Hiera lookup of classes, like so:

lookup('classes', Array, 'unique', []).include

Always supply a default value in case the lookup() call fails.

Node Parameters with Hiera Classification

Nodes are commonly given node-specific parameters during classification. It would appear that a circular dependency is created by storing these values in Hiera. Hierarchy lookups are based on these values, but these values are supposed to be sourced from Hiera lookups. How do you address this problem?

There are two solutions to this problem:

  • Use only facts to customize the Hiera hierarchy.

  • Provide the node values early in the classes being applied.

Let’s review how this would work:

  1. The class that retrieves these node parameters uses automatic parameter lookup or explicit lookup() calls to retrieve the node parameters.

  2. The Hiera data file containing node data is read from a Hiera hierarchy level that uses node facts (such as certname).

  3. Hierarchy levels utilizing the node data will be available after the node data values are read.

Let’s look at an example that ties this self-referential lookup together:

---
version: 5
hierarchy:
  - name: "Node-specific data"
    path: "fqdn/%{facts.fqdn}.yaml"

  - name: "Role data"
    path: "roles/%{node_data::role}.yaml"

Within the data file accessible from node facts, we can define values to be used within the hierarchy:

---
node_data::role: 'webserver'

Finally, we use a module to source the node values and make them available for the Hiera hierarchy:

class node_data(
  String $role,
) { }

class roles::any_role() {
  require 'node_data'
}

Understanding the time-order nature of the lookups makes what would appear to be circular references easy to resolve.

Avoiding Node Data in Manifests

You should avoid using free-standing manifests (such as site.pp) to source and manipulate node data. Although this appears easy, you are creating an implicit interface that must be used by every module. Top-level manifests are shared by every node and every module in that environment. Any changes to that manifest will affect every node’s catalog. You will find it difficult if not impossible to adapt this interface as needs change.

Tip

A node data module can be selected by a role, and it can be versioned. This allows multiple node data mechanisms to coexist.

The separation of concerns (SoC) philosophy makes it clear that a module whose purpose is to look up node data is the correct approach. The KISS philosophy applies here, as well. Contain all your application logic in your roles and profiles, and limit node classification data to one or more modules specific to that goal.

By creating a module that provides the interface for node data acquisition, you can tie the interface to the version of the module. You can test a new module and node data interface without breaking modules that depend on the older interface. Doing node data mapping in a single module (rather than spreading it across manifests and classes) keeps it simple to debug and easy to understand. This relegates your classification logic to the responsibility of determining which roles and profiles should be applied.

Serverless Classification

In a serverless Puppet environment, where the Puppet codebase and data are synchronized to every node, the node is responsible for self-classification. You can do this using any of the techniques used with a server or by passing the node classification directly on the command line:

$ puppet apply -e 'include roles::webserver'

This kind of implementation requires that each node is configured knowing what role to request. Using node facts to customize the Hiera hierarchy as discussed in “Fact-Based Hiera Classification” provides a significantly more flexible and powerful implementation for self-classification. 

ENCs

There are many ENCs available for use with Puppet. We review a few of them here and examine what makes them an appropriate choice.

The most important consideration for a node classifier is the benefit it provides to the teams managing the nodes; for example:

  • GUI for node management

  • Node inventory and reporting features

  • APIs for automated node management

  • Role-based access control (RBAC)

  • Change management and history

These features enable self-service provisioning and oversight by users who are not suited to, or capable of, managing Puppet code.

What Data Can an ENC Provide?

The ENC can provide three types of data about a node. Let’s take a quick look at these data types and how you can best use them in node classification.

Class list (node role)

The most common purpose of an ENC is to assign classes to a node. As the usage of ENCs has evolved with practice, the recommended approach is to assign a single role class as described in Chapter 7; however, a classifier can also assign individual classes a-la-carte if so desired.

Node parameters

There are often a number of node-specific properties that cannot be provided by node facts because they might not be available or knowable by the node. As with node facts, you can use these parameters for Hiera hierarchy and Puppet catalog customization. The more commonly used values include the following:

  • Tier, such as whether it’s a prod or dev node

  • Site, the physical or virtual location of the node

  • The state of the node for deployment purposes

  • Contact information for node maintenance

You’ll notice that most of this information is tied to inventory or resource management.

Puppet environment

The node classifier can override the Puppet environment requested by the node. This allows Puppet Enterprise or infrastructure management tools like Foreman to dynamically assign groups of nodes without making configuration changes on each node.

Warning

It’s important to be aware that the ENC-supplied Puppet environment is authoritative. If the classifier supplies an environment, there is no way to override it from the node. This can complicate the use of Puppet environments for ad hoc testing.

There is no best-practice choice about whether to assign environments from the ENC, because the right answer differs greatly depending on your infrastructure and the tools available. Concerns about environment management and how to handle testing are discussed in depth in Chapter 9.
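All three data types are returned together: an exec-terminus ENC simply prints a YAML hash with optional classes, parameters, and environment keys, along these lines (values illustrative):

---
classes:
  - 'roles::webserver'
parameters:
  site: 'dc1'
  tier: 'prod'
environment: 'production'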

Puppet Management Consoles

Some ENCs are tightly integrated with Puppet and thus offer features above and beyond basic node management. Let’s take a brief moment to review a few of the consoles available for Puppet.

Puppet Enterprise console

The Puppet Enterprise product offers the following capabilities:

  • Provides both an ENC and a Puppet-supported console for Puppet node management. This is one of the prime differentiators between Puppet Community and Enterprise versions.

  • Authorizes users based on granular RBAC that integrates cleanly with Active Directory and Lightweight Directory Access Protocol (LDAP) authentication.

  • Uses a group model for node classification: classes and node parameters can be assigned to every node, to a group of nodes, to a subgroup, or directly to a single node. Additionally, a rules-based node classifier enables classification based on node facts or exported resources to assign classes.

  • Generates a console overview of nodes in the infrastructure, including last run status and inventory. Node events can be inspected. Reports can be generated based on data from the Puppet convergence reports.

  • Provides a robust RESTful API for managing nodes, making it fairly simple to integrate with external automation systems and node inventories.

  • Includes a GUI for ad hoc execution and exploration of nodes using Puppet. The Puppet agent on a node can be invoked at any time from the Console.

With Puppet version 3, there were significant differences between the way Puppet Enterprise and Open Source Puppet were deployed to your infrastructure. Puppet 4 Unified Packaging uses the same basic installation layout as Puppet Enterprise, making conversion from and use of open source code much simpler.

We encourage you to consider licensing Puppet Enterprise. It’s a great product, and purchasing a license helps to support ongoing development of Puppet for both Enterprise and Open Source users.

Puppet Dashboard

The Puppet Dashboard was an early proof of concept for what evolved to become the Puppet Enterprise console. Development was discontinued years ago, and community support has stopped as the ENC features no longer matched well with Puppet feature development. You should consider one of the other console solutions instead.

Inventory and Infrastructure Management ENCs

The following products are infrastructure management frameworks that happen to provide node data to Puppet through the ENC interface. Although these are less tightly integrated with Puppet, they provide features and functionality both before a node is provisioned and beyond what Puppet can provide to a node.

Foreman

Foreman is a complete lifecycle management tool for physical and virtual servers. In addition to providing an external node classifier for Puppet, Foreman provides inventory management, physical machine provisioning (PXE), virtual machine provisioning, cloud provisioning, and reporting services.

Its node classification system is robust. Foreman supplies a group-based class and node parameter inheritance model, similar to the model offered by Puppet Enterprise. Classes are autodiscovered but you can hide them. Parameterized class assignment is supported. You can pass array and hash data structures to the node. Unlike Puppet Enterprise, Foreman does not currently provide a rules-based classifier.

Because Puppet node management is just a part of what Foreman does, configuration is somewhat more complex than Puppet Enterprise or any Puppet-oriented ENC solutions. Foreman has a strong focus on provisioning, and provides PXEboot automation, DNS management, DHCP management, and an IPAM solution. As a result, its RESTful API is somewhat heavier than other solutions, though it is well documented and straightforward to use. Even though most of the components are optional, a full-featured environment management solution needs to integrate with your hypervisors, cloud providers, DNS, DHCP, TFTP, and Puppet infrastructure.

Foreman is a RedHat–supported project that works with community contributors, much as Puppet does. Upstream, it’s incorporated into Katello and sold as part of the Red Hat Satellite product.

Cobbler

Cobbler was originally used as part of Red Hat Satellite provisioning services but was replaced by the more full-featured Foreman. Cobbler is a good choice if you’re looking for a simple, lightweight provisioning system and ENC. It is not as full-featured as the other solutions, but it does provide platform-, group-, and node-based classification. You must supplement Cobbler with another console for reporting.

Cobbler’s strength is that it’s fast, well understood, and very simple to set up. Cobbler integrates with DNS and DHCP by writing templates for each. Its APIs are fairly straightforward, and KOAN provides a simple method to provision KVM and Xen virtual machines. It maintains a node inventory and provides an ENC for Puppet, but does not offer reporting features.

It stores node data in a simple JSON document, and supports a number of replication topologies out of the box. Using replication, you can easily scale out your DHCP and DNS infrastructure, as well. The storage backend is pluggable, and MongoDB, CouchDB, and MySQL are also supported out of the box.

Cobbler is a good choice if you need a simple provisioning solution for a small environment without node reporting requirements.

Build your own ENC

You can use virtually anything that stores node data as an ENC. The Puppet node terminus is fairly simple to extend using Puppet’s plugin architecture, but if you’re uncomfortable with that approach you can simply write an external node classifier to be invoked by the exec terminus.
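As a sketch of how small such a script can be, the following hypothetical exec-terminus classifier maps a semantic hostname prefix to a role; the role mapping and the site parameter are invented for illustration. Puppet invokes the script with the node's certname as its only argument and reads a YAML hash from stdout.

```python
#!/usr/bin/env python3
# Minimal exec-terminus ENC sketch. Puppet passes the certname as the
# only argument and expects a YAML hash on stdout.
# The role mapping and the site parameter below are hypothetical.
import sys

# Map a semantic hostname prefix (the letters before the digits) to a role.
ROLE_MAP = {
    'www': 'roles::webserver',
    'db':  'roles::database',
}

def classify(certname):
    """Return the YAML classification document for a certname."""
    hostname = certname.split('.')[0]
    prefix = hostname.rstrip('0123456789')
    role = ROLE_MAP.get(prefix, 'roles::base')
    return (
        "---\n"
        "classes:\n"
        f"  - '{role}'\n"
        "parameters:\n"
        "  site: 'example'\n"
    )

if __name__ == '__main__':
    if len(sys.argv) > 1:
        print(classify(sys.argv[1]))
```

Puppet would call such a script via the node_terminus = exec and external_nodes settings; a real classifier would query an inventory system rather than a hard-coded map.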

Although we generally advise against creating anything from scratch when a suitable solution is publicly available, node classification logic can be incredibly site specific. The creation of an external classifier to use an existing configuration management database or inventory management database could be done fairly easily.

If you do decide to build your own, be sure to consider the demand load from Puppet and your reliability requirements. An ENC lookup failure will prevent the node catalog from being built. Here are some things to consider in an ENC design:

  • The data source should be designed for availability at least equal to that of the Puppet servers.

  • The data source must be capable of handling spiky bursts of traffic such as an orchestrated push, or every node restarting at the same time.

  • Read-only replicas should be available to handle traffic while maintenance is performed on the database.

If you do implement your own ENC, it’s a good idea to review Puppet’s internal indirectors and termini and consider how your classifier can use them. In particular, the facts indirector and the report indirector can be queried for information about the nodes, such as the agent_specified_environment fact available on newer releases of Puppet.

Your console can also perform queries against PuppetDB, which makes a huge amount of node data available via the API. If your console relies on PuppetDB, be sure that it does so in a way that meets your availability and failover requirements because this approach can potentially create another point of failure.

Summary

Node classification provides data used to customize the Puppet catalog for a node. You can implement node classification in many different and overlapping ways, including node facts, ENCs, and node statements.

There is no single best-practice choice for a node classifier, because it depends on your specific needs and the tools available. The appropriate choice of a node classifier will vary or change over time depending on the infrastructure available to manage the nodes.

Here are the key takeaways from this chapter:

  • A node classifier provides node-specific data not available from node facts.

  • You can use node parameters to customize the Hiera hierarchy for data lookups.

  • You can source classification data from existing inventory or infrastructure management tools.

  • Assignment using node statements is suitable only for small environments without other infrastructure.
