Chapter 6. Hiera Data

Hiera is a multilayered, hierarchical key/value lookup system for Puppet that provides data used to inform the code. In this chapter, we look at best practices for configuring Hiera backends, designing your hierarchy, and integrating Hiera into your Puppet code. We briefly review the features and functionality of Hiera before covering best practices.

Separating Code and Data

Hiera’s most important capability and its greatest value come from its ability to separate your code and data. Separation of code and data is a fairly old idea as far as programming goes. You generally cannot use code that contains data without modification, whereas code that acts based on input data can be utilized repeatedly for different purposes. This abstraction makes it possible to assemble functionality from many reusable components rather than building a large monolithic codebase with a single purpose.

Meanwhile, the structure and storage of data is a well-studied and understood subject. There are frameworks and skill sets devoted to data management. Separating the code from the data allows the developer to focus on the code and data specialists to focus on data management.

There are many different guidelines and rules, but we use the following three simple baselines for clarity. When considering where some information belongs, compare it to these guidelines:

  • Information that differs for each resource (user name, UID, etc.) is data.

  • Code describes how to do something, whereas data describes what should be done.

  • Data provides content, and code implements behavior.

Global, Environment, and Module Data

Hiera v4/v5 provides three independent layers where you can find data. Every query will proceed through the layers in the following order:

Global

A hierarchy of site-specific data shared by all Puppet environments. Only globally shared values should exist here.

Environment

A hierarchy of site-specific data for the Puppet environment. You will place the vast majority of data here.

Module

A unique hierarchy for component-specific data in the module, usually only platform-specific data or module default values.

Each layer has its own unique hierarchy for data lookups. This provides considerable flexibility for management of data by different teams, as discussed in “Designing the Hiera Hierarchy”.

Tip

Until the module layer was added to Hiera, it was necessary for module code to contain default values. The module layer of Hiera v4+ does away with that limitation, allowing for the creation of completely dataless modules.

Each query will proceed through the layers in order, checking each level of the hierarchy in each layer. You can configure each key lookup to return results using four different merge strategies:

first

The lookup returns the first value found for the key and performs no merging. This is the default strategy.

unique

The lookup will return array values found from every layer, deduplicated and flattened into a single array.

hash

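The lookup combines the hashes found under the query key in every level of every layer into a single hash. When the same key appears in more than one hash, the value from the higher-priority level wins; nested hashes are not merged.

deep

Like hash, but when the same key holds a hash in more than one level, those nested hashes are merged recursively.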

Select the appropriate merge strategy based on your data structure.
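To make the differences concrete, here is a small Python model of the four strategies. It is a simplified sketch, not Puppet's implementation: each function receives the values already found for one key, ordered from highest to lowest priority, and the function names are our own. The deep merge shown here only recurses into nested hashes; Puppet's deep merge also handles arrays and accepts extra options that this sketch leaves out.

```python
from typing import Any, Iterator


def _flatten(value: Any) -> Iterator[Any]:
    """Yield scalars from arbitrarily nested lists."""
    if isinstance(value, list):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def merge_first(found: list[Any]) -> Any:
    """first: the highest-priority value wins; nothing is merged."""
    return found[0]


def merge_unique(found: list[Any]) -> list[Any]:
    """unique: flatten every array (or scalar) into one list, dropping duplicates."""
    result: list[Any] = []
    for value in found:
        for item in _flatten(value):
            if item not in result:
                result.append(item)
    return result


def merge_hash(found: list[dict]) -> dict:
    """hash: combine top-level keys; on a collision the higher-priority value wins."""
    result: dict = {}
    for value in reversed(found):  # lowest priority first, so higher levels overwrite
        result.update(value)
    return result


def _deep(high: dict, low: dict) -> dict:
    result = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = _deep(value, result[key])  # recurse into nested hashes
        else:
            result[key] = value  # higher priority wins for everything else
    return result


def merge_deep(found: list[dict]) -> dict:
    """deep: like hash, but nested hashes are merged recursively."""
    result: dict = {}
    for value in found:  # highest priority first
        result = _deep(result, value)
    return result
```

With levels = [{"ntp": {"iburst": True}}, {"ntp": {"iburst": False, "driftfile": "/d"}}], merge_hash(levels) keeps only the higher-priority ntp hash, whereas merge_deep(levels) also picks up driftfile from the lower level.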

Hiera Backends

Hiera can source data from built-in or custom backends, creating a pluggable interface between Puppet and any given data source. Hiera’s built-in backends include YAML-, JSON-, and HOCON-formatted data files. Custom backends exist for HTTP sources, MySQL databases, MongoDB, HashiCorp Vault, and many other sources. It’s relatively simple to create a custom backend for almost any data source.

Data that differs for each resource should be stored in a Hiera datastore. Depending on the need, here are some recommended guidelines:

  • You could store data that tends to change less frequently in YAML, JSON, or HOCON text files.

  • You could store data that changes often in service registries or databases with low-latency lookup.

  • Data that must be computed can be queried from functions and programs.

Note

We discuss the built-in text file backends in “The Built-In Backends”, and custom backends for database and service registries in “Custom Hiera Backends”.

Designing the Hiera Hierarchy

Each layer has its own hierarchy for data lookups. Each level of the hierarchy can select a different backend as a data source. This flexibility makes Hiera capable of sourcing data from innumerable sources.

The design of your hierarchy plays a key role in the organization of your data. There are several key considerations when designing a hierarchy. An ideal design should do the following:

  • Reduce or eliminate duplication of data across the hierarchy.

  • Reduce or eliminate the need to pass data between different teams.

  • Group data according to the supplier of the data.

  • Facilitate auditing and debugging.

Although many of these concerns revolve around standard data management topics, we touch on concerns specific to Puppet in the sections that follow.

Variable Interpolation

The Hiera hierarchy contains interpolation tokens to select the appropriate data source for the current node. An interpolation token is special syntax that will be replaced with the value named. Hiera will recognize the interpolation token, look up the value, and then replace the token with the value found—just like interpolating a variable in a string.

You can interpolate any node fact or Puppet variable to customize a level in the hierarchy, as shown in Example 6-1. This flexibility can adjust hierarchies to suit each node, but it can also create problems if not designed well. In this section, we review best practices relating to interpolation of data in the hierarchy.

Node-provided facts

The most common interpolation is the use of node-provided facts to select the data source.

Example 6-1. Interpolate node facts
hierarchy:
  - name: "Per-node data"
    path: "nodes/%{facts.hostname}.yaml"

The hierarchy level shown in this example will search a filename specific to the hostname provided by the node.

Tip

Hiera uses the %{variable} token for interpolation and a period to access hash values. $facts['os']['family'] in Puppet language would be %{facts.os.family} in Hiera.
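The following Python sketch models how a token such as %{facts.os.family} is resolved by walking the scope one dotted segment at a time. The scope data is hypothetical, and the sketch substitutes an empty string for any token it cannot resolve; it does not cover every feature of Hiera's interpolation syntax.

```python
import re
from typing import Any

# Matches tokens such as %{facts.os.family} or %{trusted.certname}.
TOKEN = re.compile(r"%\{([^}]+)\}")


def interpolate(template: str, scope: dict[str, Any]) -> str:
    """Replace each %{a.b.c} token by walking the scope hash segment by segment."""

    def resolve(match: re.Match) -> str:
        value: Any = scope
        for segment in match.group(1).split("."):
            if not isinstance(value, dict) or segment not in value:
                return ""  # unresolvable in this sketch
            value = value[segment]
        return str(value)

    return TOKEN.sub(resolve, template)


# Hypothetical node scope.
scope = {
    "facts": {"os": {"family": "RedHat"}, "hostname": "web01"},
    "trusted": {"certname": "web01.example.com"},
}
```

For example, interpolate("os/%{facts.os.family}.yaml", scope) yields the path os/RedHat.yaml.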

Use the facts hash to retrieve variables

Simple facts are available as top-level variables in Puppet (e.g., $hostname) but might be removed in the future. Sourcing facts from the $facts hash as shown in Example 6-1 guarantees it is a node-provided fact, and documents the source of data to the reader.

In contrast, %{some_name} could be a node fact, an ENC parameter, or a manifest variable. You’ll need to analyze the code to determine the value’s origin.

Viewing a node’s facts

You can run the facter command to see what facts are gathered about the node. You can also use facter -p or puppet facts print to get all of those facts plus the custom facts installed by Puppet modules. The -p option might disappear in future versions.

Use trusted facts when available

Node-supplied facts are sufficient for most use cases, but because the node itself reports them, they can be a security concern. You should prefer any fact available in the $trusted hash over node-supplied facts. There are few trusted facts, but they cover most essential use cases.

Example 6-2 demonstrates the use of the safer server-validated node certificate name.

Example 6-2. Interpolated trusted facts
hierarchy:
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"

Any custom attributes from the node’s signed certificate are available in the $trusted['extensions'] hash. For example, if the certificate was built to contain a customer name, you can retrieve it and use it in the hierarchy as shown in the following code:

hierarchy:
  - name: "Per-customer data"
    path: "customers/%{trusted.extensions.customer}.yaml"

To learn more about storing custom attributes in the node’s certificate, refer to the documentation.

Remove environment interpolation

Hiera v3 (used in Puppet 3) had only a single layer, and thus only a single hierarchy. This led many people to develop complex environment interpolation in the global hierarchy to access environment and module data. Now that Hiera has separate layers for environment and module data, we can remove that unnecessary complexity.

Label node classifier-provided data

The hierarchy can and often should make use of data provided by a node classifier such as Puppet Enterprise Console or Foreman. Which values are provided by the ENC is specific to the implementation. Unless labeled clearly, they are indistinguishable from variables set in Puppet code.

To make the data source apparent to the viewer, place the ENC-provided data in its own bespoke hash, as shown in the following code:

hierarchy:
  - name: "Site data"
    path: "sites/%{foreman.location}.yaml"

Avoid using Puppet variables

The hierarchy can make use of variables set by Puppet modules and manifests. You should avoid this in the global and environment hierarchies, because it ties the deployment or an environment to a specific module or manifest. It is also parse-order dependent, so any queries performed before the variable is defined will not resolve to the correct path.

This is useful and effective only when you use it to customize lookups within a certain module. For example, a service profile might interpolate one of its mandatory parameters into the hierarchy of its own module layer, so that only queries made within that profile are affected.

To make the data source readily apparent to the viewer, place module-provided data in its own bespoke hash, as shown in the following:

hierarchy:
  - name: "Data from service module"
    path: "services/%{service.cluster}.yaml"

As discussed in Chapter 1, use only variables explicitly documented as public by the module author. Otherwise, code refactoring of that module might rename, eliminate, or change the purpose and behavior of the variable used. It’s best to avoid using module-sourced values whenever possible.

Use explain to debug lookup interpolation

As discussed in Chapters 3 and 7, it’s easiest and most intuitive to follow variable references when those references strictly flow upward from your node’s facts and the ENC data. By interpolating module-specific variables into higher-level Hiera hierarchies, the flow of data through your site becomes somewhat cyclical and difficult to debug and understand.

The puppet lookup --explain command obtains node facts, either for the node named with --node or from a file passed with --facts, and provides a detailed analysis of the Hiera hierarchy, showing in which level of which layer a specific value was found. This is an incredibly powerful and useful tool. The following example shows it in action:

$ puppet lookup --node node_name --explain classes --merge unique
Searching for "classes"
  Global Data Provider (hiera configuration version 5)
    Using configuration "/etc/puppetlabs/puppet/hiera.yaml"
    Hierarchy entry "Per-node data"
      Path "/etc/puppetlabs/code/hiera_global/nodes/testy.example.om.yaml"
        Original path: "nodes/%{trusted.certname}.yaml"
        Path not found
...each level of each layer will be shown...

If you follow the preceding recommendations, the puppet lookup command makes debugging trivial. If any part of the hierarchy is interpolated from Puppet code, however, those values won’t be available to puppet lookup, and you’ll be forced to read and analyze the code yourself to figure out what happened.

You can read about how to use the explain feature at Using Puppet Lookup and the Puppet Lookup man page.

Facts are arbitrary values supplied by the node, and by default they are presented as top-level variables. Any top-level variable can easily be spoofed unless it’s defined explicitly by your Puppet code or ENC.

Global variables and data from your ENC have priority over facts. However, values not supplied by your ENC can be spoofed using facts. For example, suppose your hierarchy uses a tier parameter from your ENC to provide production-specific values. If there is any case in which an agent can invoke a run without the ENC supplying a tier parameter, a malicious user might be able to recover data from that hierarchy by setting a static tier fact.
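A minimal Python model of the tier example shows how the gap opens. The resolver below is a hypothetical simplification of Puppet's variable scoping, assuming only what the text states: an ENC parameter takes priority over a fact of the same name, and a fact fills in whenever the ENC omits the value.

```python
from typing import Any, Optional


def top_level_variable(name: str, enc_params: dict[str, Any],
                       facts: dict[str, Any]) -> Optional[Any]:
    """Resolve a top-scope name: an ENC parameter shadows a fact of the same
    name, but if the ENC omits it, the node-supplied fact fills the gap."""
    if name in enc_params:
        return enc_params[name]
    return facts.get(name)


def tier_data_path(enc_params: dict[str, Any], facts: dict[str, Any]) -> str:
    """Build a hypothetical "tier/%{tier}.yaml" hierarchy path."""
    return f"tier/{top_level_variable('tier', enc_params, facts)}.yaml"


# The ENC supplies tier, so the node's own claim is ignored.
normal = tier_data_path({"tier": "dev"}, {"tier": "prod"})
# The ENC omits tier for this run, so the node-controlled fact decides.
spoofed = tier_data_path({}, {"tier": "prod"})
```

The safer pattern is to read such a value only from a source the node cannot influence, such as the ENC data or the $trusted hash.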

You should use caution with Hiera interpolation functions, as well. Although Hiera does provide the lookup() recursive lookup function, you should not use it as a hierarchy interpolation token because doing so might create an infinite recursive lookup loop.

Design Guidelines

Let’s review some common design goals and how best to achieve them. These examples show YAML data files, but the principles apply regardless of which backend is used.

Most specific to general—top to bottom

Because first-found answers override lower answers, the Hiera hierarchy should always query node-specific files first, with each successive level containing more general data. The final level should contain fallback values common to all nodes. This allows overrides to be applied at the appropriate level for clarity and debugging. Example 6-3 shows a simple, flat hierarchy demonstrating the idea.

Example 6-3. Flat, data-focused hierarchy
hierarchy:
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"

  - name: "Per-site data"
    path: "sites/%{facts.site}.yaml"

  - name: "Per-OS defaults"
    path: "os/%{facts.os.family}.yaml"

  - name: "Common data"
    path: "common.yaml"

In this hierarchy, values for every node are applied in the common level. Values that are specific to the operating system (OS), site, and node are stored in their specific files. The first level is the most specific, and the final level is the most general. Even though most hierarchies will have more levels, this concept holds true regardless.
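As a sketch of first-found resolution, the Python below walks the levels of Example 6-3 in order and stops at the first level that defines the key. The level contents are hypothetical, and the model ignores layers and merge strategies.

```python
from typing import Any

MISSING = object()  # sentinel: no level defined the key

# Hypothetical data for the four levels of Example 6-3, most specific first.
levels: list[tuple[str, dict[str, Any]]] = [
    ("Per-node data", {"ntp::servers": ["10.0.0.1"]}),
    ("Per-site data", {}),
    ("Per-OS defaults", {"ntp::package": "ntp"}),
    ("Common data", {"ntp::servers": ["pool.ntp.org"], "ntp::package": "chrony"}),
]


def first_found(key: str, levels: list[tuple[str, dict[str, Any]]]) -> tuple[str, Any]:
    """Return (level name, value) from the first level that defines key."""
    for name, data in levels:
        if key in data:
            return name, data[key]
    return "", MISSING
```

Here the node-level ntp::servers value overrides the common one, while ntp::package comes from the OS level because no more specific level sets it.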

Avoid unnecessary deep hierarchies

In our experience, it’s best to keep your hierarchy as flat as possible. In a flat hierarchy, levels of the hierarchy are near the root and have no children of their own. A flat hierarchy tends to improve data organization and tends to reduce the amount of duplicated data you need to manage.

A deep hierarchy is one shaped like a tree, where each node in the hierarchy is a child of a higher-priority node. The preceding example was a flat hierarchy with a clear separation of data types. The clearest sign of a problematic deep hierarchy is unrelated data nested in subdirectories beneath other data. The following type of hierarchy will create many layers of duplication:

hierarchy:
  - name: "Node data"
    path: "%{facts.tier}/%{facts.location}/%{facts.os.family}/%{facts.fqdn}.yaml"

  - name: "OS data"
    path: "%{facts.tier}/%{facts.location}/%{facts.os.family}.yaml"

  - name: "Location data"
    path: "%{facts.tier}/%{facts.location}.yaml"

A deep hierarchy will increase data redundancy the deeper you look. In the preceding example, there is likely to be a lot of commonality in your os.family data, but because this is nested under location and tier, it will be duplicated in each location tree. If you have three tiers (dev, stage, prod) and four locations (America, Europe, Asia, Africa), you’ll end up duplicating the same value in 12 places. Larger structures could have hundreds of duplicates.
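The duplication arithmetic can be checked by enumerating the files each design produces. This Python sketch uses the three tiers and four locations from the text, plus two hypothetical OS families, and counts how many files would hold the RedHat defaults.

```python
from itertools import product

tiers = ["dev", "stage", "prod"]
locations = ["America", "Europe", "Asia", "Africa"]
os_families = ["RedHat", "Debian"]  # hypothetical

# Deep design: "%{tier}/%{location}/%{os.family}.yaml"
deep_files = [f"{t}/{l}/{o}.yaml" for t, l, o in product(tiers, locations, os_families)]
# Flat design: "os/%{os.family}.yaml"
flat_files = [f"os/{o}.yaml" for o in os_families]

# How many files must carry the RedHat defaults in each design?
copies_deep = sum(path.endswith("/RedHat.yaml") for path in deep_files)
copies_flat = sum(path.endswith("/RedHat.yaml") for path in flat_files)
```

The deep layout needs 3 x 4 = 12 copies of the same OS data, while the flat layout needs one.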

Warning

When data is duplicated across many locations, there are more chances of inconsistent updates. An error in one copy might not be noticed during testing, simply because you’re unlikely to test the data for every possible combination of choices.

Deep hierarchies should contain data relevant to the top level

The benefit of deep hierarchies is that they facilitate flexibility in your values. Because nothing is shared between branches of the structure, you are free to set whatever value you want in each branch.

For cases in which a deep hierarchy is necessary, the subsections should be ordered from most specific to most general. This takes advantage of Hiera’s first-found lookup order and reduces duplicate data. Here’s what the code looks like:

hierarchy:
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"

  - name: "Per-site data"
    paths:
      - "sites/%{facts.site}/%{facts.service}/%{facts.cluster}.yaml"
      - "sites/%{facts.site}/%{facts.service}.yaml"
      - "sites/%{facts.site}.yaml"

  - name: "Per-OS defaults"
    path: "os/%{facts.os.family}.yaml"

There are benefits and drawbacks to both approaches, but there’s a simple golden rule for evaluating the need for depth in a hierarchy: the branches should be refinements of their parent. If the data is not related to the data higher on the branch, it should be stored in a more general location.

Useful Hierarchy Levels

There are a few other common levels that you might want to add to your hierarchy, based on your needs. These levels (and any of the levels listed earlier) are by no means mandatory, but they can be useful to solve a few specific problems.

The accounts hierarchies

Most organizations keep user account data separate from service data. Account data rarely needs to be platform, location, or tier specific, and it is often managed by a completely different team.

The example that follows will handle not only project-specific and shared-account lists, but also the few cases in which an account might be platform specific: your Linux root and Windows administrator accounts will have platform-specific properties.

Example 6-4. Accounts hierarchy level
  - name: "Account data"
    paths:
      - "accounts/%{facts.project}.yaml"
      - "accounts/%{facts.os.family}.yaml"
      - "accounts/common.yaml"

Data in the accounts hierarchy can be stored as serialized data in a form ready to be consumed by the create_resources() function or a defined type you’ve created for account management. For more details, see “Converting Serialized Hiera Data into Resource Declarations”.

Packages hierarchy level

Some people create complex hierarchies to track package information for each OS. We recommend that you place this data as close to the relevant branch as possible. The following hierarchy levels show a powerful way to do this:

  - name: "Service data"
    paths:
      - "os/%{facts.service}/%{facts.os.family}.yaml"
      - "os/%{facts.service}/common.yaml"

  - name: "OS data"
    paths:
      - "os/%{facts.os.family}/common.yaml"
      - "os/%{facts.os.family}/packages.yaml"

You can use this hierarchy with the create_resources() function call to install OS base packages and service-specific packages. You don’t need to write platform-specific code for that purpose since the platform differences (data) are entirely contained in Hiera.

If multiple modules need to install the same packages, you can store the packages in a common hash and then create a list of packages used by each module. This ensures that all necessary packages are installed without platform-specific data (package names) in the code.
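The following Python sketch illustrates the idea with hypothetical data: a shared hash maps logical package names to platform-specific ones, and each module lists only logical names. The key names (common_packages, module_packages, profile::web) are illustrative assumptions, not a convention Puppet defines.

```python
# Hypothetical shared hash: logical package -> name for each OS family.
common_packages = {
    "apache": {"RedHat": "httpd", "Debian": "apache2"},
    "vim": {"RedHat": "vim-enhanced", "Debian": "vim"},
}

# Hypothetical per-module lists that reference logical names only.
module_packages = {
    "profile::web": ["apache", "vim"],
    "profile::base": ["vim"],
}


def packages_for(modules: list[str], os_family: str) -> list[str]:
    """Resolve the platform-specific package names needed by the given modules."""
    names: list[str] = []
    for module in modules:
        for logical in module_packages[module]:
            name = common_packages[logical][os_family]
            if name not in names:  # two modules needing vim yield one package
                names.append(name)
    return names
```

No platform-specific name appears in the module lists; switching os_family is enough to change every package name.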

Team hierarchy level

In some cases, it can be useful to have a hierarchy to manage data based on the team that owns a particular node or the service it provides. This tends to be common in large environments that provide Infrastructure as a Service (IaaS) and allow self-service provisioning. This can be a useful hierarchy level for storing team-specific security data, tags for data gathering, and monitoring alarm recipient information.

Team is a property that is difficult to determine programmatically, and will usually be maintained as a node property in your configuration management database or determined from node configuration and provided as a fact as shown here:

  - name: "Team owner"
    path: "project/%{facts.project}/team.yaml"

This hierarchy has some overlap with the accounts hierarchy. Keep account data out of the owner hierarchy. Instead, use the owner hierarchy to list what accounts should be added to a particular host.

Eliminating Data

Not all data belongs in Hiera. If data is managed in another system, it should be output from that system in JSON format for Hiera, or you can use a custom backend to directly query it.

The following list contains common kinds of data stored and managed with Hiera. We’ll review what the alternative approaches are and when it makes sense to source this data elsewhere.

Hierarchy design

We already discussed hierarchy design at length. It’s important to emphasize here that a good design will limit, not cause, data duplication. Any design that makes copypasta problems more likely needs to be revisited.

Package management

A common use for Hiera is to store package version information. If your organization already uses a package management solution such as Pulp, Katello, or Spacewalk, use Puppet to configure that data source rather than recreate the same data in Hiera.

User management

Users are another common source of data lists in Hiera. Deploying a directory service provides security and usability benefits over maintaining user accounts as static data in Hiera.

Service discovery

Service information tends to make up the bulk of data in Hiera, and it tends to create the most change requests in a complex environment. Questions such as “What servers are currently available in my web pool?” are better answered by service registries such as Consul or etcd.

Accessing Hiera

Puppet provides three approaches to retrieving data from Hiera:

  • Automatic parameter lookups

  • The lookup() function call

  • The puppet lookup command-line utility

All three of these approaches can make exactly the same queries. Let’s review how each one is used.

Automatic Parameter Lookups

Puppet automatically queries Hiera for every class parameter that is not given a value explicitly through a resource-style class declaration, as demonstrated here:

class example(
  String $version  = 'installed',
  String $ensure   = 'running',
  Boolean $enabled = true,
) {
  #...
}

include 'example'

In this code example, we declare our example class with three parameters. When we declare the class without providing values, Puppet will perform an automatic lookup for each parameter in the namespace of the class requesting it (e.g., example::version). It falls back to the default values provided only if the Hiera lookups find no value.
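The precedence just described can be summarized in a few lines of Python. This is a conceptual model under our own naming, not how Puppet is implemented: a value from a resource-style declaration wins, then the namespaced Hiera key, then the default written in the class.

```python
from typing import Any, Optional

# Defaults taken from the example class above.
example_defaults = {"version": "installed", "ensure": "running", "enabled": True}


def resolve_class_params(class_name: str, defaults: dict[str, Any],
                         hiera_data: dict[str, Any],
                         declared: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    declared = declared or {}
    params: dict[str, Any] = {}
    for param, default in defaults.items():
        key = f"{class_name}::{param}"  # e.g. example::version
        if param in declared:           # resource-style declaration wins
            params[param] = declared[param]
        elif key in hiera_data:         # automatic lookup of class::param
            params[param] = hiera_data[key]
        else:                           # fall back to the class default
            params[param] = default
    return params
```

With hypothetical Hiera data {"example::version": "1.2.3"}, only version changes; ensure and enabled keep their defaults.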

A major advantage of automatic parameter lookups is that they force a tight coupling between your data keys and the code that consumes those keys. Each parameter is keyed in the namespace of the module. Puppet would perform a Hiera lookup for the example::version, example::ensure, and example::enabled keys. We can see at a glance that these keys are used by the example class to set the named parameters.

Automatic parameter lookups provide a simple guarantee: if a class has parameters, you can set the value of those parameters in Hiera simply by adding the correct key/value pair to your Hiera data. Explicit lookups offer no such guarantee, and often result in the creation of arbitrary, sometimes undocumented, key names in your hierarchy as developers try to share keys between multiple classes.

Defined types

Defined types do not support automatic parameter lookups. This is because classes are singletons and use only a single set of parameters. Defined types (and any other resource type) can be called with unique parameters multiple times.

You can still get data from Hiera by interpolating a parameter within a lookup key, as shown here:

define example_module::example_type(
  # $title is available in parameter defaults, so each instance gets its own keys
  $id      = lookup("example_module::service_hash::${title}::id"),
  $version = lookup("example_module::service_hash::${title}::version"),
) {
  #...
}
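To show how each instance ends up with its own keys, this Python sketch builds the same key strings from a title and reads them from hypothetical flat Hiera data. Like a lookup() call without a default, it fails (here with a KeyError) when an instance has no data.

```python
# Hypothetical flat keys in Hiera, one set per defined-type title.
hiera_data = {
    "example_module::service_hash::web::id": 100,
    "example_module::service_hash::web::version": "2.4",
    "example_module::service_hash::db::id": 200,
    "example_module::service_hash::db::version": "10.1",
}


def example_type_params(title: str) -> dict:
    """Mirror the parameter defaults: interpolate the title into each key."""
    return {
        param: hiera_data[f"example_module::service_hash::{title}::{param}"]
        for param in ("id", "version")
    }
```

Declaring example_module::example_type twice, with the titles web and db, would therefore pick up two independent sets of values.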

Hiera Function Calls

If you want to explicitly retrieve a specific value from another namespace in Hiera, use the lookup() function call.

Key naming conventions

When performing explicit Hiera lookups, it’s a good idea to follow the same key naming conventions as the automatic parameter lookup system. To have profile-specific class parameters, you could use the following style:

class profiles::apache {
  class { 'apache':
    docroot => lookup('profiles::apache::docroot'),
  }
}

This approach helps keep the key in Hiera tightly associated with the location that calls the key. Being able to quickly identify and review the context in which a value is being used is invaluable when attempting to update the key. In this example, we named the key as if it were a parameter of the profiles::apache class, whose value is then passed to the docroot parameter of the Apache module. Even though no such parameter exists, the intent of the key and the context in which it is used are clear at a glance in our Hiera data. For a contrasting example, look at this key and value combination:

wordpress_docroot: "/var/www/wordpress"

Do you notice in this case how much context and clarity of purpose is lost by a simple change of key name? Although the variable appears to be named clearly, it would be difficult to determine where the value is being used without performing an exhaustive search of the code. With the profiles::apache::docroot name, it’s immediately clear which class is using the variable, making it trivial to track down the code it might affect.

By carefully selecting key names, you can significantly ease troubleshooting and maintenance of your hierarchy data.

Hash and array data merges

When retrieving a value from Hiera using the lookup() function call or automatic parameter lookup, Hiera will normally return the first value found. For queries of Hash and Array values, you can choose a merge behavior. When a merge strategy is specified, Hiera will retrieve keys and values from every level of each hierarchy and merge them into a single data structure. The merge strategy selected controls how the results are combined.

Data-driven class assignment

You can pass a Hiera lookup that retrieves a unique array of class names to the include() function for Hiera-driven class assignment:

lookup('classes', Array, 'unique', []).include

This code looks up the classes key in every level of each layer of the hierarchy, flattens the results into a single array, and removes duplicates. The results will be exactly the same as calling include class_name for each array entry.
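The following Python sketch models the unique merge behind that lookup with hypothetical classes arrays, listed in lookup order. It flattens nested arrays, keeps the first occurrence of each class name, and returns an empty list when nothing is found, mirroring the [] default.

```python
from typing import Any, Iterator

# Hypothetical "classes" values found at each level, in lookup order.
found_in_order: list[Any] = [
    ["profile::base"],
    ["role::web"],
    ["profile::base", ["profile::monitoring"]],
]


def flatten(value: Any) -> Iterator[Any]:
    """Yield class names from arbitrarily nested arrays."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def classes_to_include(found: list[Any]) -> list[str]:
    """Model of lookup('classes', Array, 'unique', []): flatten, then dedupe."""
    result: list[str] = []
    for name in flatten(found):
        if name not in result:
            result.append(name)
    return result
```

Each name in the result would then be passed to include, so profile::base is declared once even though two levels list it.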

Automatic parameter lookup ensures that each class’ parameters will also be looked up from Hiera. This allows for great flexibility with design of your roles and profiles.

Converting Serialized Hiera Data into Resource Declarations

Earlier in this chapter, we mentioned that in some cases it makes sense to store resource definitions in Hiera. Here’s an example of how you can accomplish that.

  1. A hash of data for each resource needs to be available in Hiera.

  2. An automatic or explicit lookup call to get that data must be made.

  3. The hash must be iterated over by create_resources() or each() to declare resources with each hash entry.

Here’s an example hash of packages to be installed or removed:

packages:
  emacs:
    ensure: absent
    tag: base
  vim-enhanced:
    ensure: present
    tag: base

packages is a hash data structure inside Hiera. Each key in the hash is a resource title. The values under that key are the attributes for the package resource. This data structure serializes the title and attributes for the package resource type. The following code example will iterate over this data structure:

# Default to an empty hash so nodes without package data don't fail the lookup
$packages = lookup('packages', Hash, 'first', {})
unless empty($packages) {
  create_resources('package', $packages)
}

The create_resources() function call creates package resources to install Vim and remove Emacs based on the data in Hiera.
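Conceptually, create_resources() turns each hash entry into one resource declaration. The Python sketch below renders that mapping as Puppet-style text purely for illustration; the real function adds resources to the catalog rather than producing text.

```python
# The packages hash from the Hiera data above, as Python data.
packages = {
    "emacs": {"ensure": "absent", "tag": "base"},
    "vim-enhanced": {"ensure": "present", "tag": "base"},
}


def create_resources(type_name: str, resources: dict) -> list[str]:
    """Render one declaration per hash entry: key = title, value = attributes."""
    declarations = []
    for title, attributes in resources.items():
        body = ", ".join(f"{attr} => {value}" for attr, value in attributes.items())
        declarations.append(f"{type_name} {{ '{title}': {body} }}")
    return declarations
```

An empty hash produces no declarations, which is why the Puppet code can safely skip the call when no package data exists.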

This example demonstrates the use of the create_resources() function call. Although this can empower DRY development principles, you can also use it to store code in data, which violates the KISS and SoC principles of software development. For a discussion of these concerns, see “The create_resources() function”.

Warning

We have seen many cases in which people build wacky, overcomplicated abstractions that do nothing but make their code difficult to understand. It’s a nightmare to maintain, and it’s a nightmare for their successors to decipher.

Interpolation in Your Data

Variable interpolation of Hiera data is somewhat safer than variable interpolation in your hierarchy configuration. The data indexed under a module key should be used only by the module, so problems will not be created if the module isn’t included in the node’s catalog.

Puppet variables can be interpolated into values, either directly or via the scope() Hiera function call. Use variable interpolation to do handy value adjustment like this:

my::source: "https://example.com/pkg/myapp-%{my::version}.%{facts.os.architecture}.rpm"

In this example, we interpolate the app version and the node’s architecture to select the appropriate RPM package name.
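A small Python model of value interpolation, assuming a hypothetical class variable my::version and a hypothetical os.architecture fact, shows how the tokens in a value like my::source are expanded. It raises KeyError for unknown names instead of modeling Hiera's handling of undefined values.

```python
import re

TOKEN = re.compile(r"%\{([^}]+)\}")


def interpolate_value(value: str, variables: dict, facts: dict) -> str:
    """Replace %{...} tokens: names starting with "facts." walk the facts
    hash; anything else is treated as a Puppet variable name."""

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("facts."):
            node = facts
            for segment in name.split(".")[1:]:
                node = node[segment]
            return str(node)
        return str(variables[name])

    return TOKEN.sub(resolve, value)


source = interpolate_value(
    "https://example.com/pkg/myapp-%{my::version}.%{facts.os.architecture}.rpm",
    variables={"my::version": "1.4.2"},        # hypothetical class variable
    facts={"os": {"architecture": "x86_64"}},  # hypothetical node facts
)
```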

Hiera also supports a small set of interpolation functions in values. These enable dynamic lookup features such as inline lookups with lookup() and referencing other values with alias().

The Built-In Backends

Hiera v5 includes four built-in backends. Usage of these backends is well documented, so this section instead focuses on what makes each backend appropriate for different situations.

YAML

YAML Ain’t Markup Language (YAML) is a human-friendly data serialization standard for all programming languages. YAML is one of the most user-friendly and least syntax-heavy data formats, for the following reasons:

  • It provides a simple syntax for creating complex data structures.

  • It uses indentation to determine the data grouping.

  • It allows comments at every layer.

  • It supports every data type native to Puppet.

  • YAML import and export is available in nearly every programming language.

  • YAML values can reference other values, avoiding duplication of data.

  • YAML can be tested for validity, and will throw an error if a reference is broken.

YAML’s native support for comments and data references makes it the most human-readable format available. The reference syntax makes it very clear that a particular key is being consumed elsewhere in the YAML data. This greatly simplifies management of large and complex data files. These benefits make it the best choice for beginners as well as for most situations in which data is not generated programmatically.

YAML has a few major disadvantages over other backends:

  • YAML’s loose format is parsed somewhat slower than the JSON backend. If you have a huge amount of data or a large number of nodes consuming your Hiera data, you might see a performance gain by switching to JSON.

  • YAML supports several string delimiters, and multiple folding methods. Flexibility for multiple valid formats is seen (by some) as less readable than strict formats.

  • YAML that is loaded and dumped will rarely match the original file. References might be converted to strings, and comments will be dropped.

  • YAML’s syntax can also cause surprises for people who aren’t very familiar with the markup language. For example, quoting strings is optional; however, the string true or ON must be enclosed in quotes to avoid being interpreted as a Boolean true value.

JSON

JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format that has long outgrown its JavaScript origins to become a de facto standard. Many applications use it as both a data-interchange format and as a configuration language. JSON is supported natively by Hiera and can be used instead of or in addition to the YAML backend. Following are some of the advantages of JSON:

  • JSON’s syntax is simple and very strict. This makes it one of the fastest data sources for code to parse.

  • JSON’s popularity makes it easy to find or develop tools for managing JSON documents.

  • JSON that is loaded and dumped will contain exactly the same data as the original file, because there are no comments or references to lose.

Outside of the ease of writing out and reading JSON programmatically, JSON has many disadvantages vis-a-vis other backends:

  • Every string and key in JSON must be quoted, and every structure must be enclosed in braces or brackets, making it difficult for a human to read.

  • JSON does not allow comments.

  • JSON supports only basic data structures: Number, String, Boolean, Array, Hash, and null (this could be considered a security feature).

  • JSON does not provide the ability to reference one value from another value.

  • JSON’s strict syntax makes human editing errors, and the resulting frustration, more common.

  • The extra syntax required to escape special characters can be difficult to learn and read.

Because of these limitations, JSON is most suitable for programmatic export of data for use by Puppet. Humans find manual maintenance of JSON data files frustrating and error-prone.
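The following Ruby snippet sketches programmatic export. It turns data taken from some other system into the body of a Hiera JSON data file, and then checks that parsing and regenerating the text with the same formatter reproduces it exactly. The keys and values are hypothetical. The text is kept in memory, although a real exporter would write it to the environment's data directory:

```ruby
require 'json'

# Hypothetical values pulled from another system, such as an inventory database.
exported = {
  'ntp::servers' => ['ntp1.example.com', 'ntp2.example.com'],
  'profile::web::max_clients' => 256,
  'profile::web::tls_enabled' => true
}

# Generate the data file body; a real exporter would write this text
# to a file such as data/common.json (path shown only as an example).
json_text = JSON.pretty_generate(exported)
puts json_text

# With no comments or references to lose, parsing and regenerating
# with the same formatter reproduces the text exactly.
regenerated = JSON.pretty_generate(JSON.parse(json_text))
puts regenerated == json_text # true
```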

HOCON

Human-Optimized Config Object Notation (HOCON) is a superset of JSON and Java Properties that aims to utilize JSON’s semantic structure while being easy to use as a human-editable config file format. HOCON provides many advantages over JSON:

  • HOCON can parse any valid JSON file.

  • It allows comments.

  • It can reference other values, avoiding duplication of data.

  • It can be retrieved as a flat properties list, like Java properties.

  • The specification is far more forgiving, so there are many ways to represent different types of data.

HOCON has a few disadvantages when compared to JSON:

  • HOCON’s friendly format is parsed more slowly than JSON.

  • The specification’s flexibility allows disparate ways to represent the same data.

  • It is less common, so there are fewer language bindings for integrations.

HOCON is the configuration file format used by Puppet Server and many other newer Puppet tools and features.

eYAML

Encrypted YAML (eYAML) is the only encryption backend built into Puppet. It uses standard YAML format files with plain-text keys and encrypted values, allowing use of the same format and tools as used for unencrypted data. We cover eYAML implementation and security considerations in more detail in “Encrypted Key/Value Storage”.

Custom Hiera Backends

You can add custom Hiera backends to any level of the hierarchy. A large number of custom backends exist for Hiera; we cover just a few of them here.

Hiera backends are actually fairly simple to write. If your specific needs are not satisfied by an existing backend, it’s straightforward to create something that will. For more information on how to do that, check out “Writing new data backends”.
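To show the scale involved, here is a sketch of the parsing logic for a hypothetical backend that reads simple `key = value` text files. In Hiera 5, a `data_hash` backend is a Puppet function written in Ruby. It receives an options hash, which includes the resolved `path`, along with a lookup context, and it returns a hash of every key in the data source. The sketch leaves out the wrapper that registers the function with Puppet so that it runs as plain Ruby. The file format and the `kv_data_hash` name are assumptions. The sketch raises an error on a malformed line instead of skipping it, in keeping with the warning about masking backend failures later in this section:

```ruby
require 'tempfile'

# Hypothetical data_hash-style body: read the file named by
# options['path'] and return every key/value pair it contains.
def kv_data_hash(options)
  data = {}
  File.foreach(options.fetch('path')) do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#') # skip blanks and comments

    key, value = line.split('=', 2).map(&:strip)
    # Fail loudly on bad input rather than silently dropping the key.
    raise ArgumentError, "malformed line: #{line.inspect}" if value.nil? || key.empty?

    data[key] = value
  end
  data
end

# Usage: write a small data file and load it.
file = Tempfile.new(['common', '.kv'])
file.write(<<~KV)
  # Common settings
  ntp::server = ntp1.example.com

  profile::web::port = 8080
KV
file.flush

result = kv_data_hash('path' => file.path)
puts result.inspect # note: this toy format returns every value as a String
file.close!
```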

Before writing a custom backend, review the use case to ensure that your needs aren’t better met by an ENC or a service discovery tool. We’ve outlined some of the more common backends in this section.

Database and NoSQL Engines

Several database and NoSQL backends exist for Hiera:

Database and NoSQL backends allow Hiera to query existing databases and NoSQL stores, with all the benefits and drawbacks that entails. This approach makes data from other applications and systems available without having to export it as JSON and transfer it to the Puppet parser.

Text file formats for Hiera data can be easily stored in version-control repositories to track the history of changes. Changes to the files can trigger automatic testing of the data. Database systems might not provide a way to audit who made a change, why the change was made, and what the configuration was before the change was made. This can be a major drawback compared to using the built-in text formats as a data store.

If the catalog depends on data from the database, an unavailable backend will cause catalog compilation to fail. Because Hiera is a read-only query system, the database backend can utilize read-only replicas to reduce the risk of service disruption.

Warning

Do not mask exceptions or otherwise allow a Puppet catalog to be built without results from a failed Hiera backend. A failed catalog compilation prevents an incorrect configuration, which could cause a service disruption, from being applied.
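The read-only replicas mentioned above and the rule against masking failures fit together well. The following minimal sketch tries each replica in turn and raises an error if none of them answers. This keeps a backend outage from being mistaken for missing data. Plain Ruby lambdas stand in for the replicas, whereas a real backend would issue database queries. The class and method names are hypothetical:

```ruby
# Hypothetical error raised when no replica can answer a lookup.
class BackendUnavailable < StandardError; end

# Ask each read-only replica in order and return the first answer.
# If every replica fails, raise instead of returning nil, so catalog
# compilation fails rather than proceeding with missing data.
def query_replicas(replicas, key)
  errors = []
  replicas.each do |replica|
    begin
      return replica.call(key)
    rescue StandardError => e
      errors << e.message
    end
  end
  raise BackendUnavailable, "all replicas failed for #{key}: #{errors.join('; ')}"
end

# Usage with one failed replica and one healthy replica.
down    = ->(_key) { raise IOError, 'connection refused' }
healthy = ->(key) { { 'profile::db::max_connections' => 200 }[key] }

puts query_replicas([down, healthy], 'profile::db::max_connections') # 200
```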

Even though a database can be updated every second, never forget that Puppet won’t see the changes until the next convergence interval for your environment (30 minutes by default).

Service Discovery Backends

Service registration and discovery services are highly responsive to service changes, and contain near-instantaneous information about node and service availability. The ability to query service discovery services makes it possible to get data directly from the services for use in Puppet catalogs rather than duplicating or exporting it for use by Hiera. Here are just a few of the service discovery backends available:

You should not use any of these Hiera backends as a replacement for the services themselves. Data they provide to Puppet takes effect only on the next Puppet convergence interval for your environment (30 minutes by default), which makes it suitable only for applications that tolerate eventual convergence.

All the same limitations and concerns regarding database backends for Hiera apply to these tools, as well. Make sure that you understand how service failure will affect your Puppet runs and compiled catalog.

If you use Hiera to query a service discovery tool, plan for the impact that this will have on Puppet reporting and simulation. Service discovery tends to create a certain amount of configuration churn. If you don’t have a way to quiesce changes induced by service discovery, it can become somewhat difficult to compare the results of two simulated runs because it might be unclear if a change is induced by code or by service inventory.
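One partial mitigation is to normalize discovery results before they reach Puppet. With normalization in place, a reordered or duplicated entry in the service's response does not show up as a configuration change. The following sketch assumes that a discovery query returns a list of `host:port` strings and that a hypothetical upstream template consumes them. It does not quiesce genuine membership changes, so those still need their own handling when you compare simulated runs:

```ruby
# Sort and deduplicate members so output depends only on membership.
def normalize_members(members)
  members.uniq.sort
end

# Hypothetical rendering of a load-balancer upstream block.
def render_upstream(members)
  normalize_members(members).map { |m| "server #{m};" }.join("\n")
end

# Two polls of the same service: same members, different order, one duplicate.
poll_a = ['10.0.0.12:8080', '10.0.0.11:8080', '10.0.0.13:8080']
poll_b = ['10.0.0.13:8080', '10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.11:8080']

puts render_upstream(poll_a) == render_upstream(poll_b) # true
puts render_upstream(poll_a)
```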

Encrypted Key/Value Storage

The following backends use public key cryptography to encrypt Hiera data:

  • hiera-eyaml

  • hiera-gpg

These backends are incredibly useful when managing passwords and other sensitive data with Hiera. The primary benefit of a cryptographic backend is that you can safely store sensitive data in a Git repository without having to severely restrict access to that repository or accept risks of the data leaking.

With hiera-gpg, the entire YAML file is stored in encrypted form. With hiera-eyaml, the YAML is stored in plain-text form, and only the values are encrypted. Because of this, hiera-eyaml is much better suited to revision control; changes to a value affect only the lines in your data files associated with the value—no other lines are changed. In contrast, updating any value with hiera-gpg will result in the entire file changing, which will tend to break git blame and git diff.
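You can simulate the difference in diff behavior in a few lines of Ruby. The following sketch uses symmetric AES-256-CBC with a random IV for both approaches, so it reproduces neither hiera-eyaml's PKCS#7 format nor GPG. Its only purpose is to show why encrypting each value separately keeps unrelated lines stable, while encrypting the whole file does not. The key names and values are hypothetical:

```ruby
require 'openssl'

# Stand-in key material for the demo only.
KEY = OpenSSL::Random.random_bytes(32)

# Encrypt with AES-256-CBC and a fresh random IV; return strict Base64.
def encrypt(plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-cbc').encrypt
  cipher.key = KEY
  iv = cipher.random_iv
  [iv + cipher.update(plaintext) + cipher.final].pack('m0')
end

# Whole-file style: the entire document becomes one ciphertext,
# wrapped at 60 columns like an ASCII-armored file.
def whole_file(data)
  doc = data.map { |k, v| "#{k}: #{v}" }.join("\n")
  encrypt(doc).scan(/.{1,60}/)
end

# Count line positions that differ between two versions of a file.
def changed_lines(before, after)
  [before.size, after.size].max.times.count { |i| before[i] != after[i] }
end

data = { 'db_user' => 'app', 'db_password' => 'old-secret', 'api_token' => 'abc123' }
updated = data.merge('db_password' => 'new-secret')

# Per-value style: one encrypted value per line; editing one value
# re-encrypts only that line and leaves the others untouched.
per_value = data.to_h { |k, v| [k, "#{k}: ENC[#{encrypt(v)}]"] }
before_per_value = per_value.values
per_value['db_password'] = "db_password: ENC[#{encrypt(updated['db_password'])}]"
after_per_value = per_value.values

puts changed_lines(before_per_value, after_per_value)     # 1
puts changed_lines(whole_file(data), whole_file(updated)) # every armored line changes
```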

Because these backends use public-key cryptography, you can give the public key to anyone to encrypt values for storage. This allows contributors to add encrypted values to your data without giving them the ability to see other encrypted values.
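Ruby's OpenSSL bindings show the public/private split directly. This sketch uses raw RSA with OAEP padding rather than the PKCS#7 envelopes that hiera-eyaml produces, whose stored values look like `ENC[PKCS7,...]`. The key pair and the secret are generated only for demonstration:

```ruby
require 'openssl'

# Generated here only for the demo. In practice the private key stays on
# the Puppet servers and only the public key (PEM) is handed out.
private_key = OpenSSL::PKey::RSA.new(2048)
public_pem  = private_key.public_key.to_pem

# A contributor who holds only the public key encrypts a new value.
contributor_key = OpenSSL::PKey::RSA.new(public_pem)
ciphertext = contributor_key.public_encrypt('s3cr3t-db-password', OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
stored = [ciphertext].pack('m0') # Base64 text, safe to commit

# Only the private key holder can recover the plain text.
plaintext = private_key.private_decrypt(stored.unpack1('m0'), OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
puts plaintext # s3cr3t-db-password

# The public key alone cannot decrypt anything.
contributor_can_decrypt =
  begin
    contributor_key.private_decrypt(ciphertext, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
    true
  rescue OpenSSL::PKey::PKeyError
    false
  end
puts contributor_can_decrypt # false
```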

Encrypted Hiera is not filesystem security

When working with either of these backends, it’s important to understand how the data is handled. Hiera’s encrypted backends solve the problem of storing secrets in shared source-code repositories. They do not keep the value encrypted in all parts of the process.

Warning

hiera-eyaml and hiera-gpg only encrypt data in the hierarchy. The data is decrypted during lookup and inserted into the node’s catalog in plain-text form.

When a request is made to Hiera, the value is decrypted and returned to Puppet for use in the catalog build. In many cases, the decrypted value will be inserted into a catalog, either as a class parameter, a resource parameter, or the contents of a file.

There are many situations in which your encrypted data is stored to disk and available in decrypted form, including at least the following:

  • The node caches the catalog, including the decrypted values, to disk.

  • Puppet might write the decrypted values into the files it manages.

  • The values might be passed to commands that Puppet executes, and are available in memory.

In short, you cannot keep encrypted Hiera values secret from someone who has root or administrator access to the node that uses the decrypted values.
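A toy model makes the point concrete. After the backend decrypts a value at compile time, that value is just a string inside the catalog. The following structure is a simplified, hypothetical stand-in for a compiled catalog, not Puppet's actual catalog format:

```ruby
require 'json'

# Value as returned by an encrypted backend during catalog compilation:
# by this point it has already been decrypted.
db_password = 's3cr3t-db-password'

# Simplified, hypothetical catalog containing one file resource.
catalog = {
  'resources' => [
    {
      'type'       => 'File',
      'title'      => '/etc/myapp/db.conf',
      'parameters' => { 'content' => "password=#{db_password}\n" }
    }
  ]
}

# Serializing the catalog, as a node does when it caches it to disk,
# carries the secret along in plain text.
cached = JSON.generate(catalog)
puts cached.include?(db_password) # true
```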

This is not a complete data-security solution. Use these tools to facilitate revision control of your sensitive data, but do not expect them to provide anything more than protection from people reading the secrets from your source tree. If the value is wrapped in Puppet’s Sensitive data type, it will not be logged or included unencrypted in the Puppet report. However, it’s not possible to prevent a user with access to the target node’s signed Puppet client certificate from retrieving the decrypted values.

Tip

There is a wide variety of modules offering data encryption, with different trade-offs. However, unless the application that uses the data can decrypt it itself, the value will end up being used or stored in plain text somewhere on the node.

Summary

In this chapter, we explored best practices relating to Hiera hierarchies and data. Proper backend selection, effective use of each layer, and careful design of the environment hierarchy can greatly simplify site maintenance while providing great flexibility in the source of data.

Here are the takeaways from this chapter:

  • Consider your backend selection based on how you maintain the data.

  • YAML provides a user-friendly format for human-edited files.

  • Be aware of the security risks associated with variable interpolation.

  • Reduce the size of your hierarchy as much as possible.

  • Build appropriate abstraction layers, but avoid the trap of overabstraction.

  • Enforce strong naming conventions in your Hiera data.

  • Avoid explicit Hiera lookups in your modules unless absolutely necessary.

  • Avoid designing module interfaces around variable interpolation.

  • Hiera data can be used to identify or inform the roles and profiles used on your nodes.
