Chapter 14. Improving the Module

There are many additional features that can be used in a module. This chapter covers the following:

  • Validating input data to ensure it conforms to expectations
  • Providing data to modules using Hiera
  • Classes and subclasses within the module
  • Definitions that can be reused for processing multiple items
  • Utilizing other modules within your own
  • Documenting modules

Let’s get started.

Validating Input with Data Types

Puppet 4 introduced a new type system that can validate input parameters. This improves code quality and readability.

In older versions of Puppet, it was common to perform input validation like this:

class puppet(
  # input parameters and default values for the class
  $version = 'latest',
  $status  = 'running',
  $enabled = true,
  $datetime = '',
) {
  validate_string( $version )
  validate_string( $status )
  validate_bool( $enabled )
  # can't validate or convert timestamps
...resources defined below...

While this looks easy with three variables, it could consume pages of code if there are a lot of input variables. In my experience with larger modules, it wasn’t uncommon for the first resource defined in a manifest to be down below line 180.

Puppet 4 allows you to define the type when declaring the parameter now, which both shortens the code and improves its readability. When declaring the parameter, simply add the type before the variable name. This declares the parameter and adds validation for the input on a single line.

I think you’ll agree the following is significantly more readable:

class puppet(
  # input parameters and default values for the class
  String $version = 'latest',
  Enum['running','stopped'] $status  = 'running',
  Boolean $enabled = true,
  Timestamp $datetime = Timestamp.new(),
) {
...class resources...
}

It is not necessary for you to declare a type. If you are passing in something that can contain multiple types of data, simply leave the definition without a type as shown in the previous chapter. A parameter without an explicit type defaults to a type named Any.

Best Practice

Use explicit data types for all input parameters. Avoid accepting ambiguous values that must be introspected before use.

You can (and should) also declare types for lambda block parameters:

split( $facts['interfaces'], ',' ).each |String $interface| {
  ...lambda block...
}

Valid Types

The type system is hierarchical, where you can allow multiple types to match by defining at a higher level in the tree. If you are familiar with Ruby object types, this will look very familiar:

  • Data accepts any of:

    • Scalar accepts any of:

      • Boolean (true or false)

      • String (ASCII and UTF8 characters)

        • Enum (a specified list of String values)
        • Pattern (a subset of strings that match a given Regexp)
      • Numeric accepts either:

        • Float (numbers with periods; fractions)
        • Integer (whole numbers without periods)
      • Regexp (a regular expression)
      • SemVer (a semantic version)
      • Timespan (duration in seconds with nanosecond precision)
      • Timestamp (seconds since the epoch with nanosecond precision)
    • Undef (a special type that only accepts undefined values)
  • SemVerRange (a contiguous range of SemVer versions)

  • Collection accepts any of:

    • Array (list containing Data data types)
    • Hash (Scalar keys associated with Data values)
  • CatalogEntry accepts any of:

    • Resource (built-in, custom, and defined resources)
    • Class (named manifests that are not executed until called)

As almost every common data value falls under Data, use it to accept any of these types. When you are testing types, $type =~ Data will also match a Collection that contains only Scalar values.

There are a few special types that can match multiple values:

Variant
A special type that matches any of a specified list of types
Optional
A special type that matches a specific type, or no value
NotUndef
A special type that will match any type, but not an undefined value
Tuple
An Array with specific data types in specified positions
Struct
A Hash with specified types for the key/value pairs
Iterable
A special type that matches any Type or Collection that can be iterated over (new in Puppet 4.4)
Iterator
A special type that produces a single value of an Iterable type, used by chained iterator functions to act on elements individually instead of copying the entire Iterable type (new in Puppet 4.4)

There are a few types you will likely never use, but you might see them in error messages.

Default
A special type to match the default: key in case and selector statements
Callable
A special type that holds a lambda block
Runtime
Refers to the running interpreter (e.g., Ruby)

All of these types are considered type Any. There’s no point in declaring this type because you’d be saying that it accepts anything, when most of the time you want Data.

Best Practice

Use Scalar for parameters where the values can be multiple types. In most circumstances, you are expecting a Scalar data type rather than one of the Puppet internal objects.

Validating Values

The type system allows for validation of not only the parameter type, but also of the values within structured data types. For example:

Integer[13,19]           # teenage years
Array[Float]             # an array containing only Float numbers
Array[ Array[String] ]   # an array containing only Arrays of String values
Array[ Numeric[-10,10] ] # an array of Integer or Float values between -10 and 10

For structured data types, the range parameters indicate the size of the structured type, rather than a value comparison. You can check both the content type and the size of a Collection (Array or Hash) by following the content type with the minimum and maximum size:

Array[String,0,10]      # an array of 0 to 10 string values
Hash[String,Any,2,4]    # a hash with 2 to 4 pairs of string keys with any value

For even more specificity about key and value types for a hash, a Struct specifies exactly which data types are valid for the keys and values.

# This is a Hash with two keys: a short string and a floating-point value
Struct[{
  day  => String[1,8],      # value of 'day' is a string 1-8 characters long
  temp => Float[-100,100],  # value of 'temp' is floating-point Celsius
}]

# This is a hash that accepts only three well-known ducks as the value of 'duck'
Struct[{
  duck  => Enum['Huey','Dewey','Louie'],
  loved => Boolean,
}]

Like what a Struct does for a Hash, you can specify types of data for an Array using the Tuple type. The tuple can list specific types in specific positions, along with a specified minimum and optional maximum number of entries:

# an array with three integers followed by one string (explicit)
Tuple[Integer,Integer,Integer,String]

# an array with two integers followed by 0-2 strings (length 2-4)
Tuple[Integer,Integer,String,2,4]

# an array with 1-3 integers, and 0-5 strings
Tuple[Integer,Integer,Integer,String,1,8]

That last one is pretty hard to understand, so let’s break it down:

  • Integer in position 1 (required minimum).
  • Integers in positions 2–3 are optional (above minimum length).
  • String in position 4 is optional (above minimum length).
  • The final type (String) may be repeated 4 more times, up to the maximum of 8 entries.

As you can see, the ability to be specific for many positions in an Array makes Tuple a powerful type for well-structured data.
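To see how the position and length rules interact, a short sketch like the following can be run with puppet apply. The $shape variable name is only for illustration; the type is the last Tuple shown above.

```puppet
# Store the type in a variable so it can be reused in several tests.
$shape = Tuple[Integer,Integer,Integer,String,1,8]

notice( [1] =~ $shape )                  # true: meets the minimum length of 1
notice( [1, 2, 3, 'a', 'b'] =~ $shape )  # true: strings fill positions 4 and up
notice( [1, 'a'] =~ $shape )             # false: position 2 must be an Integer
notice( [] =~ $shape )                   # false: below the minimum length
```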

The Variant and Optional types allow you to specify valid alternatives:

Variant[Integer,Float]         # the same as Numeric type
Variant[String,Array[String]]  # a string or an array of strings
Variant[String,Undef]              # a string or nothing
Optional[String]                   # same as the previous line
Optional[Variant[String,Integer]]  # string, integer, or nada
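As a rough sketch of how these two types might be used for class parameters, the hypothetical class below accepts either one package name or a list of names, plus an optional version. The class name example::packages and its parameter names are assumptions made for this illustration.

```puppet
# Hypothetical class: accepts one package name or a list of them.
class example::packages(
  Variant[String,Array[String]] $names,
  Optional[String] $ensure_version = undef,
) {
  # Normalize a single string into a one-element array.
  $name_list = $names ? {
    String  => [$names],
    default => $names,
  }

  # Fall back to 'installed' when no version was supplied.
  $ensure = $ensure_version ? {
    undef   => 'installed',
    default => $ensure_version,
  }

  package { $name_list:
    ensure => $ensure,
  }
}
```

Because the types are checked when the class is declared, the body only has to handle the two shapes that the Variant allows.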

You can also check the size of the value of a given type:

String[12]            # a String at least 12 characters long
String[1,40]          # a String between 1 and 40 characters long
Array[Integer,3]      # an Array of at least 3 integers
Array[String,1,5]     # an Array with 1 to 5 strings

You can use all of these together in combination:

# An array of thumbs up or thumbs down values
Array[ Enum['thumbsup','thumbsdown'] ]

# An array of thumbs up, thumbs down, or integer from 1 to 5 values
Array[ Variant[ Integer[1,5], Enum['thumbsup','thumbsdown'] ] ]

Testing Values

In addition to defining the type of parameters passed into a class, defined type, or lambda, you can perform explicit tests against values in a manifest. Use the =~ operator to compare a value against a type to determine if the value matches the type declaration. For instance, if a value could be one of several types, you could determine the exact type so as to process it correctly:

if( $input_value =~ String ) {
  notice( "Received string ${input_value}" )
}
elsif( $input_value =~ Integer ) {
  notice( "Received integer ${input_value}" )
}

The match operator can inform you if a variable can be iterated over:

if( $variable =~ Iterable ) {
  $variable.each() |$value| {
    notice( $value )
  }
}
else {
  notice( $variable )
}

You can determine if a version falls within an allowed SemVerRange using the match operator:

if( $version =~ SemVerRange('>=4.0.0 <5.0.0') ) {
  notice('This version is permitted.')
}
else {
  notice('This version is not acceptable.')
}

You can also determine if a type is available within a Collection with the in operator:

if( String in $array_of_values ) {
  notice('Found a string in the list of values.')
}
else {
  notice('No strings found in the list of values.')
}

The with() function can be useful for type checking as well:

with($password) |String[12] $secret| {
  notice( "The secret '${secret}' is a sufficiently long password." )
}

You can likewise test value types using case and selector expressions:

case $input_value {
  Integer: { notice( "Input plus ten equals ${$input_value + 10}" ) }
  String:  { notice('Input was a string, unable to add ten.') }
}

You can test against Variant and Optional types as well:

if( $input =~ Variant[ String, Array[String] ] ) {
  notice( 'Values are strings.' )
}
if( $input =~ Optional[Integer] ) {
  notice( 'Value is a whole number or undefined.' )
}

A type compares successfully against its exact type, and parents of its type, so the following statements will all return true:

  'text' =~ String  # exact
  'text' =~ Scalar  # Strings are children of Scalar type
  'text' =~ Data    # Scalars are a valid Data type
  'text' =~ Any     # all types are children of Any

If you don’t like the default messages displayed when the catalog build fails due to a type mismatch, you can customize your own by using the assert_type() function with a lambda block:

assert_type(String[12], $password) |$expected, $actual| {
  fail "Passwords less than 12 chars are easily cracked. (provided: ${actual})"
}

Comparing Strings with Regular Expressions

You can evaluate strings against regular expressions to determine if they match using the same =~ operator. For instance, if you are evaluating filenames to determine which ones are Puppet manifests, the following example would be useful:

$manifests = $filenames.filter |$filename| {
  $filename =~ /\.pp$/
}

Puppet uses the Ruby core Regexp class for matching purposes. The following options are supported:

i (case insensitive)
Ignore upper/lowercase when matching strings.
m (multiline mode)
Allow matching newlines with the . wildcard.
x (free spacing)
Ignore whitespace and comments in the pattern.

Puppet doesn’t support options after the final slash—instead, use the (?<options>:<pattern>) syntax to set options:

$input =~ /(?i:food)/            # will match Food, FOOD, etc.
$input =~ /(?m:fire.flood)/      # will match "fire\nflood"
$input =~ /(?x:fo \w $)/         # will match fog but not food or "fo g"
$input =~ /(?imx:fire . flood)/  # will match "Fire\nFlood"

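You can match against multiple regular expressions, given either as strings or as Regexp literals, with the Pattern type. Each string is converted to a regular expression, so an unanchored 'bob' also matches 'bobby':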

$placeholder_names = $victims.filter |$name| {
  $name =~ Pattern['alice','bob','eve','^(?i:j.* doe)',/^(?i:j.* roe)/]
}

A String on the righthand side of the matching operator is converted to a Regexp. This allows you to use interpolated variables in the regular expression:

$drink_du_jour = 'coffee'
$you_drank =~ "^${drink_du_jour}$"  # true if they drank coffee

Matching a Regular Expression

You can also compare to determine if something is a Regexp type:

/foo/ =~ Regexp     # true
'foo' =~ Regexp     # false

This allows for input validation:

if $input =~ Regexp {
  notify { 'Input was a regular expression': }
}

You can compare to see if the regular expressions are an identical match:

/foo/ =~ Regexp[/foo/]   # true
/foo/ =~ Regexp[/foo$/]  # false

The Regexp type can use variable interpolation, by placing another variable within a string that is converted to the regular expression to match. Because this string requires interpolation, backslashes must be escaped:

  $nameregexp =~ Regexp["${first_name} [\\w\\-]+"]

The preceding example returns true if $nameregexp is a regular expression that looks for the first name input followed by another word.

Revising the Module

Now that you’ve taken a tour of data type validation, let’s take a look at how the Puppet module could be revised to validate each parameter:

class puppet(
  String $server                    = 'puppet.example.com',
  String $version                   = 'latest',
  Enum['running','stopped'] $status = 'running',
  Boolean $enabled                  = true,
  String $common_loglevel           = 'warning',
  Optional[String] $agent_loglevel  = undef,
  Optional[String] $apply_loglevel  = undef,
) {

As written, each parameter value is now tested to ensure it contains the expected data type.

Note

Instead of type String, it would be more specific to use the following for each log level:

Enum['debug','info','notice','warning','err','alert',
  'emerg','crit','verbose']

I didn’t do this here due to page formatting reasons.

Looking Up Input from Hiera

As discussed in Chapter 11, Hiera provides a configurable, hierarchical mechanism to look up input data for use in manifests.

Retrieve Hiera data values using the lookup() function call, like so:

  $version = lookup('puppet::version')

One of the special features of Puppet classes is automatic parameter lookup. Explicit hiera() or lookup() function calls are unnecessary. Instead, list the parameters in Hiera within the module’s namespace.

As we are still testing our module, let’s go ahead and define those values now in data for this one node at /etc/puppetlabs/code/hieradata/hostname/client.yaml:

---
classes:
  - puppet

puppet::version: 'latest'
puppet::status: 'stopped'
puppet::enabled: false

Without any function calls, these values will be provided to the puppet module as parameter input, overriding the default values provided in the class declaration.

Naming Parameter Keys Correctly

Given that Hiera uses the :: separator to define the data hierarchy, you might think that it would be easier to define the input parameters as hash entries underneath the module. And yes, I agree that the following looks very clean:

puppet:
  version: 'latest'
  status: 'stopped'
  enabled: false

Unfortunately, it does not work. The key for an input parameter must match the complete string of the module namespace plus the parameter name. You must write the file using the example shown immediately before this section.

Warning
You cannot define input parameters as hash keys under the module name.

Using Array and Hash Merges

By default, automatic parameter lookup will use the first strategy for lookup of Hiera data, meaning that the first value found will be returned. There are two ways to retrieve merged results of arrays and hashes from the entire hierarchy of Hiera data:

  • Define a default merge strategy for the value with the lookup_options hash from the module data provider.
  • Use the merge parameter of lookup() to override the default strategy.

An explicit lookup() function call will override the merge strategy specified in the module data. We will cover how to adjust the merge policy for module lookups in “Binding Data Providers in Modules”. This section will document how to perform explicit lookup calls.

In order to retrieve merged results of arrays or hashes from the entire hierarchy of Hiera data, utilize the lookup() function with the merge parameter. A complete example of the function with merge, default value, and value type checking parameters is shown as follows:

$userlist = lookup({
  name          => 'users::users2add',
  value_type    => Array[String],
  default_value => [],
  merge         => 'unique',
})

The following merge strategies are currently supported:

first
Returns the first value found in priority order (called priority in older Hiera versions).
unique
Returns a flattened array of all unique values found (called array in older Hiera versions).
hash
Returns a hash containing the highest priority key and its values (called native in older Hiera versions). Ignores unique values of lower-priority keys.
deep
Returns a hash containing every key and all values from every level of the hierarchy. Merges the values of unique lower-priority keys with higher-priority values.
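The difference between hash and deep is easiest to see with nested data. The sketch below assumes a hypothetical key, users::accounts, defined at two levels of the hierarchy; the data is shown in comments, and the results follow from the descriptions above.

```puppet
# Assumed Hiera data (hypothetical key users::accounts):
#
#   hostname/client.yaml (higher priority)
#     users::accounts:
#       jill:
#         shell: /bin/zsh
#       jo:
#         uid: 1002
#
#   common.yaml (lower priority)
#     users::accounts:
#       jill:
#         shell: /bin/bash
#         uid: 1001

# 'hash' merges top-level keys only. The value for jill comes entirely
# from the higher-priority file, so her uid from common.yaml is dropped.
$shallow = lookup({ name => 'users::accounts', merge => 'hash' })
# jill => { shell => '/bin/zsh' }, jo => { uid => 1002 }

# 'deep' also merges the nested hashes, so jill keeps her uid.
$merged = lookup({ name => 'users::accounts', merge => 'deep' })
# jill => { shell => '/bin/zsh', uid => 1001 }, jo => { uid => 1002 }
```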

Understanding Lookup Merge

The lookup() function will merge values from multiple levels of the Hiera hierarchy. For example, a users module for creating user accounts might expect data laid out like so:

users::home_path: '/home'
users::default_shell: '/bin/bash'
users::users2add:
  - jill
  - jack
  - jane

If you wanted to add one user to a given host, you could create an override file for that host with just the user’s name:

users::users2add:
  - jo

By default the automatic parameter lookup would find this entry and return it, making jo be the only user created on the system:

[vagrant@client hieradata]$ puppet lookup users::users2add
---
- jo

To merge all the answers together, you could instead look up all unique Hiera values, as shown here:

[vagrant@client hieradata]$ puppet lookup --merge unique users::users2add
---
- jo
- jill
- jack
- jane

Apply this same merge option in the Puppet manifest to create an array of all users, as shown here:

class users(
  # input parameters and default values for the class
  $home_path     = '/home',
  $default_shell = '/bin/bash',
) {

  $userlist = lookup({ name => 'users::users2add', merge => 'unique' })

The $userlist variable will be assigned a unique, flattened array of values from all priority levels.
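As one possible way to use the merged list, the class could iterate over it and declare a user resource for each name. This is only a sketch: the user attributes chosen here are assumptions, not part of the module described in this chapter.

```puppet
class users(
  String $home_path     = '/home',
  String $default_shell = '/bin/bash',
) {
  # Merge the usernames from every level of the hierarchy.
  $userlist = lookup({
    name          => 'users::users2add',
    value_type    => Array[String],
    default_value => [],
    merge         => 'unique',
  })

  # Declare one user resource per merged name.
  $userlist.each |String $username| {
    user { $username:
      ensure     => present,
      home       => "${home_path}/${username}",
      shell      => $default_shell,
      managehome => true,
    }
  }
}
```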

Debugging Lookup

Sometimes a lookup does not return the value you expect. To see how the lookup is finding the values in the hierarchy, add the --explain option to your puppet lookup command.

[vagrant@client hieradata]$ puppet lookup --explain users::users2add
Searching for "users::users2add"
  Global Data Provider (hiera configuration version 5)
    Using configuration "/etc/puppetlabs/puppet/hiera.yaml"
    Hierarchy entry "yaml"
      Path "/etc/puppetlabs/code/hieradata/hostname/client.yaml"
        Original path: "hostname/%{facts.hostname}"
        Path not found
      Path "/etc/puppetlabs/code/hieradata/common.yaml"
        Original path: "common"
        Path not found
...iterates over every hierarchy layer...

Specifying Merge Strategy in Data

The unfortunate aspect of the previous example is that you have to split out the parameter assignment to an explicit lookup() call, which doesn’t read well.

It is possible to specify the default merge strategy on a per-parameter basis. Do this by creating a lookup_options hash in your data source with the full parameter name as a key. The value should be a hash of lookup options, exactly the same as used in the lookup() call shown in “Using Array and Hash Merges”.

lookup_options:
  users::userlist:
    merge: unique

This allows you to simplify the class parameter lookup shown in the preceding section back to a single location:

class users(
  # input parameters and default values for the class
  $home_path     = '/home',
  $default_shell = '/bin/bash',
  $userlist      = [],
) {

Adding a lookup_options hash to the data allows module authors to set a default merge strategy and other options for automatic parameter lookup. The user of the module can override the module author by declaring a lookup_options hash key in the global or environment data, which are evaluated at a higher priority.

Warning
You cannot do a lookup() query to retrieve the lookup_options hash. It is accessible only to the lookup() function and automatic parameter lookup.

Replacing Direct Hiera Calls

Direct Hiera queries utilize only the original (global) lookup scope. Replace all hiera() queries with lookup() queries to make use of environment and module data providers.

If a module does direct Hiera queries, such as this:

# Format: hiera( key, default )
$status = hiera('specialapp::status', 'running')

replace them with one of the following two variants, depending on whether extra options like default values and type checking are necessary. The simplest form is identical to the original hiera() function with a single parameter.

# Format: lookup( key )
# Simple example assumes type Data (anything), no default value
$status = lookup('specialapp::status')

The more complex form allows you to pass in a hash with optional attributes that define how the data is retrieved and validated. Here are some examples:

# Perform type checking on the value
# Provide a default value if the lookup doesn't succeed
$status = lookup({
  name          => 'specialapp::status',
  value_type    => Enum['running','stopped'],
  default_value => 'running',
})

Here is another example, which performs an array merge of all values from every level of the hierarchy:

$userlist = lookup({
  name          => 'users2add',
  value_type    => Array[String],
  merge         => 'unique',
})

Here are some example replacements for the older Hiera functions:

# replaces hiera_array('specialapp::id', [])
lookup({
  name          => 'specialapp::id',
  merge         => 'unique',
  default_value => [],
  value_type    => Array[Data],
})

# replaces hiera_hash('specialapp::users', {})
lookup({
  name          => 'specialapp::users',
  merge         => 'hash',
  default_value => {},
  value_type    => Hash[String,Data],
})

The lookup() function will accept an array of attribute names. Each name will be looked up in order until a value is found. Only the result for the first name found is returned, although that result could contain merged values:

lookup( ['specialapp::users', 'specialapp::usernames', 'specialapp::users2add'], {
  merge => 'unique'
})

The lookup() function can also pass names to a lambda and return the result it provides, as shown here:

# Create a default value on request
$salt = lookup('security::salt') |$key| {
  # ignore key; derive a fallback salt from the node's FQDN
  # (fqdn_rand is seeded per node, so the value is stable between runs)
  "${fqdn_rand(1000000000)}"
}

Building Subclasses

When building a module you may find yourself with several different related components, some of which may not be utilized on every system. For example, our Puppet class should be able to configure both the Puppet agent and a Puppet server. In situations like this, it is best to break your module up with subclasses.

Each subclass is named within the scope of the parent class. For example, it would make sense to use puppet::agent as the name for the class that configures the Puppet agent.

Each subclass should be a separate manifest file, stored in the manifests directory of the module, and named for the subclass followed by the .pp extension. For example, our Puppet module could be expanded to have the following classes:

Class name        Filename
puppet            manifests/init.pp
puppet::agent     manifests/agent.pp
puppet::server    manifests/server.pp

As our module currently only configures the Puppet agent, let’s go ahead and move all resources from the puppet class into the puppet::agent class. When we are done, the files might look like this:

# manifests/init.pp
class puppet(
  # common variables for all Puppet classes
  String $version  = 'latest',
  String $loglevel = 'warning',
) {
  # no resources in this class
}

# manifests/agent.pp
class puppet::agent(
  # input parameters specific to agent subclass
  Enum['running','stopped'] $status = 'running',
  Boolean $enabled,             # required parameter
)
inherits puppet {

  # all of the resources previously defined
}

# manifests/server.pp
class puppet::server() {
  # we'll write this in Part III of the book
}

Best Practice

Any time you would need an if/then block in a module to handle different needs for different nodes, use subclasses instead for improved readability.

One last change will be to adjust Hiera to reflect the revised class name:

# /etc/puppetlabs/code/hieradata/common.yaml
classes:
  - puppet::agent

puppet::common_loglevel: 'info'
puppet::version: 'latest'
puppet::agent::status: 'stopped'
puppet::agent::enabled: false
puppet::agent::server: 'puppetmaster.example.com'

With these small changes we have now made it possible for a node to have the Puppet agent configured, or the Puppet server configured, or both.
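To show how a subclass can use the common values from its parent, here is a minimal sketch of what the agent class body might contain. The package and service names are assumptions for illustration, and $enabled is given a default here so the sketch can be declared without data.

```puppet
# manifests/agent.pp (sketch; package and service names are assumptions)
class puppet::agent(
  Enum['running','stopped'] $status = 'running',
  Boolean $enabled                  = true,
) inherits puppet {
  package { 'puppet-agent':
    ensure => $puppet::version,   # common value from the base class
  }

  service { 'puppet':
    ensure  => $status,
    enable  => $enabled,
    require => Package['puppet-agent'],
  }
}
```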

Tip
Remember that each parameter key must be prefixed with the full class name, such as puppet::agent::status. You cannot define parameters as hash keys under the module name.

Creating New Resource Types

Puppet classes are what’s known as singletons. No matter how many places they are called with include or require() functions, only one copy of the class exists in memory. Only one set of parameters is used, and only one set of scoped variables exists.

There will be times that you may want to declare Puppet resources multiple times with different input parameters each time. For that purpose, create a defined resource type. Defined resource types are implemented by manifests that look almost exactly like subclasses:

  • They are placed in the manifests/ directory of a module.
  • They are named within the module namespace exactly like subclasses.
  • The filename should match the type name, and end with the .pp suffix.
  • They begin with parentheses that define parameters that are accepted.

Like the core Puppet resources, and unlike classes, defined resource types can be called over and over again. This makes them suitable for use within the lambda block of an iterator. We’ll use an iterator in our puppet class in the next section. To demonstrate the idea now, here’s an example from a users class.

# modules/users/manifests/create.pp
define users::create(
  String $comment,
  Integer $uid,
  Optional[Integer] $gid = undef,
) {
  user { $title:
    uid     => $uid,
    gid     => $gid ? { undef => $uid, default => $gid },
    comment => $comment,
  }
}

# modules/users/manifests/init.pp
class users( Array[Hash] $userlist = [] ) {
  $userlist.each |$user| {
    users::create { $user['name']:
      uid     => $user['uid'],
      comment => $user['comment'],
    }
  }
}

The create defined type will be declared once for every user in the array provided to the users module. Unlike a class, the defined type sees fresh input parameters each time it is called.

You might notice that this defined type utilizes a variable that wasn’t listed in the parameters. Just like a core resource, a defined resource type receives two parameters that aren’t named in the parameter list:

$title
The resource title used when the defined resource was declared.
$name
Defaults to $title but can be overridden in the declaration.

These attributes should be declared exactly the same way as any core Puppet resource.

Tip
Defined resource types are also called defined types, or just plain defines in many places.
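A defined type can also be declared directly, just like a core resource. The sketch below declares users::create twice with different parameters, which would not be possible with a class; the usernames and ID values are made up for illustration.

```puppet
# Each declaration gets its own $title and its own parameter values.
users::create { 'jill':
  uid     => 1001,
  comment => 'Jill Doe',
}

users::create { 'jack':
  uid     => 1002,
  gid     => 100,
  comment => 'Jack Doe',
}
```

In the first declaration no gid is supplied, so the defined type is meant to fall back to the uid value for the group.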

Understanding Variable Scope

Modules may only declare variables within the module’s namespace (also called scope). This is very important to remember when using subclasses within a module, as each subclass has its own scope.

class puppet::agent {
   # these two definitions are the same and will produce an error
   $status = 'running'
   $::puppet::agent::status = 'running'

A module may not create variables within the top scope or another module’s scope. Any of the following declarations will cause a build error:

class puppet {
  # FAIL: can't declare top-scope variables
  $::version = '1.0.1'

  # FAIL: Can't declare variables in another class
  $::mcollective::version = '1.0.1'

  # FAIL: no, not even in the parent class
  $::puppet::version = '1.0.1'

Variables can be prefaced with an underscore to indicate that they should not be accessed externally:

# variable that shouldn't be accessed outside the current scope
$_internalvar = 'something'

# deprecated: don't access underscore-prefaced variables out of scope
notice( $::mymodule::_internalvar )

This is currently polite behavior rather than enforced; however, external access to internal variables will be removed in a future version of Puppet.

Using Out-of-Scope Variables

While you cannot change variables in other scopes, you can use them within the current scope:

   notice( $variable )                # from current, parent, node, or top scope
   notice( $::variable )              # from top scope
   notice( $::othermodule::variable ) # from a specific scope

The first invocation could return a value from an in-scope variable, or the same variable name defined in the parent scope, the node scope, or the top scope. A person would have to search the module to be certain a local scope variable wasn’t defined. Furthermore, a declaration added to the manifest above this could assign a value different from what you intended to use. The latter forms are explicit and clear about the source.

Best Practice

Always refer to out-of-scope variables with the explicit $:: root prefix for clarity.
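The ambiguity of the unqualified form can be sketched with a small manifest. The class and variable names below are hypothetical.

```puppet
# Top scope (for example, in site.pp). All names here are hypothetical.
$color = 'blue'

class paint {
  $color = 'red'              # local variable in the paint scope

  notice( $color )            # red: the local variable hides top scope
  notice( $::color )          # blue: explicitly top scope
  notice( $::paint::color )   # red: explicitly this class's scope
}

include paint
```

If the local assignment were removed, the first notice would fall through to the top-scope value, which is exactly the kind of silent change the explicit prefix avoids.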

Understanding Top Scope

Top-scope variables are defined from the following crazy mishmash of places, which can confuse and baffle you when you’re trying to debug problems in a module:

  • Facts submitted by the node
  • Variables declared in manifests outside of a module
  • Variables declared in the parameters block of an ENC’s result
  • Variables declared at top scope within Hiera
  • Variables set by the Puppet agent or Puppet server

In my own perfect world, top-scope variables would cease to exist and be replaced entirely by hashes of data from each source. That said, top-scope variables are used in many places, and many Forge modules, and some of them have no other location from which to gather the data. If you are debugging a module, you’ll have to evaluate all of these sources to determine where a value came from.

Follow these rules to simplify debugging within your environment:

  • Always use client-supplied facts from the $facts[] hash.
  • When using a Puppet server, enable trusted_server_facts and use the server-validated facts available in the $server_facts[] and $trusted[] hashes.

By following these rules you can safely assume that any top-scope variable was set by Hiera or an ENC’s results.
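As a brief sketch of these rules in practice, the following reads values from the three hashes rather than from top-scope variables. It assumes trusted_server_facts is enabled so that $server_facts is populated.

```puppet
# Client-supplied fact, read from the $facts hash
notice( "OS family: ${facts['os']['family']}" )

# Server-validated data from the node's certificate
notice( "Certificate name: ${trusted['certname']}" )

# Fact supplied by the Puppet server itself
notice( "Server version: ${server_facts['serverversion']}" )
```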

Best Practice

Avoid defining top-scope variables. Declare all variables in a module or role namespace.

Understanding Node Scope

Node scope is a special type of scope where variables could be defined that look like top-scope variables but are specific to a node assignment.

As discussed in “Assigning Modules to Nodes”, it was previously common to declare classes within node blocks. It was possible to declare variables within the node block, which would override top-scope variables if you were using the variable name without the $:: prefix.

It is generally considered best practice to avoid node blocks entirely and to assign classes to nodes using Hiera data, as documented in the section mentioned. However, it remains possible in Puppet 4 to declare node definitions, and to declare variables within the node definitions. These variables would be accessible as $variable, and hide the values defined in the top scope.
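The hiding behavior described here can be sketched as follows. The variable and class names are hypothetical.

```puppet
$tier = 'production'    # top scope

class showtier {
  notice( $tier )       # staging when declared from the node block below
  notice( $::tier )     # production: always the top-scope value
}

node default {
  $tier = 'staging'     # node scope hides the top-scope value
  include showtier
}
```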

Best Practice

Avoid using node-scope variables. It’s never fun to debug a module when you have to comb through the entire environment to determine where a value is being declared from.

Understanding Parent Scope

Parent scope for variables is the scope of the class which the current class inherits (for example, when a subclass inherits from the base class, as shown in “Building Subclasses”):

class puppet::agent() inherits puppet {
 ...
}

Previous versions of Puppet would use the class that declared the current class as the parent class. This caused significant confusion and inconsistent results when multiple modules/classes would declare a common class dependency.

Tracking Resource Defaults Scope

As discussed in Chapter 6, it is possible to declare attribute defaults for a resource type.

It can be useful to change those defaults within a module or a class. To change them within a class scope, define the default within the class. To change them for every class in a module, define them in a defaults class and inherit it from every class in the module.

It’s not uncommon to place module defaults in the params class, as it is inherited by every class of the module to provide default values:

class puppet::params() {
  # Default values
  $attribute = 'default value'
   
  # Resource defaults
  Package {
    install_options => '--enablerepo=epel',
  }
}

As with variables, if a resource default is not declared in the class, it will use a default declared in the parent scope, the node scope, or the top scope. Unlike variables, parent scope is selected through dynamic scoping rules. This means that the parent class will be the class which declared this class if the class does not inherit from another class. Read that sentence twice, carefully.

  • If a class inherits from another class, then the parent scope is the inherited class.
  • Otherwise, the parent scope is the class that declared it.

As classes are singletons and thus instantiated only once, this means the parent scope changes depending on which class declares this class first. This can change as you add and remove modules from your environment.
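
The following sketch shows how such a bleed can happen. Both class names are hypothetical, and neither class inherits from another:

class profile::hardened {
  # Resource default declared in this class scope
  File {
    mode => '0600',
  }

  # If this is the first declaration of example_motd, the File default
  # above also applies to the file resources inside it
  include example_motd
}

class example_motd {
  file { '/etc/motd':
    ensure  => file,
    content => "Welcome\n",
    # no mode attribute, so a default from the declaring scope can apply
  }
}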

Tip
Make all classes with resources inherit from another class in the module to ensure that no resource defaults bleed in from another module. This defensive technique prevents unexpected attribute values from being applied to the module’s resources.

Avoiding Resource Default Bleed

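Puppet 4 provides the ability to implement named resource defaults that never bleed into other modules. It involves combining two techniques:

  • Storing the default attribute values in a hash and applying them with the splat (*) attribute
  • Declaring those values in a default: body, which applies only to the resources in the same resource declaration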

By combining these techniques together, you can create resource defaults that can be applied by name. Let’s use this technique with the same default values shown in the previous section:

$package_defaults = {
  'ensure'          => 'present',
  'install_options' => '--enablerepo=epel',
}

# Resource defaults
package {
  default:
    * => $package_defaults
  ;

  'puppet-agent':
     ensure => 'latest'
  ;
}

This works exactly as if a Package {} default was created, but it will apply only to the resources that specifically use that hash for the defaults. This allows you to have multiple resource defaults, and select the appropriate one by name.
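
As a sketch of selecting defaults by name, the following declares two hashes and applies each to a different group of packages. The hash names and the package names are assumptions made for illustration:

$stable_defaults = { 'ensure' => 'present' }
$latest_defaults = { 'ensure' => 'latest' }

# These packages use the "stable" defaults
package {
  default:
    * => $stable_defaults,
  ;
  ['curl', 'wget']:
    # use all defaults
  ;
}

# These packages use the "latest" defaults
package {
  default:
    * => $latest_defaults,
  ;
  ['puppet-agent', 'openssl']:
    # use all defaults
  ;
}

Each default: body affects only the titles listed in the same declaration, so neither hash can leak into the other group.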

Best Practice

Use the default: resource title to set resource defaults based upon a hash of attribute values. This is more readable, less surprising, and will never bleed over to something you didn’t expect.

Redefining Variables

In previous versions of Puppet, you could also use $sumtotal += 10 to declare a local variable based on a computation of a variable in a parent, node, or top scope. This reads an awful lot like a redefinition of a variable, which as you might recall is not possible within Puppet. This was removed in Puppet 4 to be more consistent.

This kind of redeclaration is now handled with straightforward assignment like so:

  $sumtotal = $sumtotal + 10

This appears to be a variable redefinition. However, the = operator actually creates a variable in the local scope with a value computed from the higher scope variable as modified by the operands.

To avoid confusion, I won’t use this syntax. I always use one of the following forms instead:

  # clearly access top-scope variable
  $sumtotal = $::sumtotal + 10          

  # clearly access parent-scope variable
  $sumtotal = $::parent::sumtotal + 10

  # even more clear by not using the same name
  $local_sumtotal = $::sumtotal + 10

Warning
Node-scope variables can only be accessed using the unqualified variable name and dynamic scope lookup. This is an excellent reason to avoid node-scope variables.

Calling Other Modules

In “Building Subclasses”, we split up the module into separate subclasses for the Puppet agent and Puppet server. A complication of this split is that both the Puppet agent and Puppet server read the same configuration file, puppet.conf. Both classes would modify this file, and restart their services if the configuration changes.

Let’s review two different ways to deal with this complication. Both solutions have classes depend on another module to handle configuration changes. Each presents different ways to deal with the complications of module dependencies, and thus we cover both solutions to demonstrate different tactics.

Sourcing a Common Dependency

One way to solve this problem would be to create a third subclass named _config. This class would contain a template for populating the configuration file with settings for both the agent and server. In this scenario, each of the classes could include the _config class. This would work as shown here:

# manifests/_config.pp
class puppet::_config(
  Hash $common = {},  #   [main] params empty if not available in Hiera
  Hash $agent = {},   #  [agent] params empty if not available in Hiera
  Hash $user = {},    #   [user] params empty if not available in Hiera
  Hash $server = {},  # [master] params empty if not available in Hiera
) {
  file { 'puppet.conf':
    ensure  => file,
    path    => '/etc/puppetlabs/puppet/puppet.conf',
    owner   => 'root',
    group   => 'wheel',
    mode    => '0644',
    content => epp(
     'puppet/puppet.conf.epp',                  # template file
     { 'agent' => $agent, 'server' => $server } # hash of config params
    ),
  }
}

This example shows a common practice of naming classes that are used internally by a module with a leading underscore.

Best Practice

Name classes, variables, and types that should not be called directly by other modules with a leading underscore.

You may notice that the file resource doesn’t require the agent or server packages, nor notify the Puppet agent or Puppet server services. This is because the Puppet agent and server are managed by separate classes. One or the other may not be declared for a given node.1 Don’t define relationships with resources that might not exist in the catalog.

Warning
It is only safe to depend on resources from classes that are guaranteed to be available in the catalog.

Instead, we’ll modify the agent class to include the config class, and depend on the file resource it provides:

# manifests/agent.pp
class puppet::agent(
  String $version  = 'latest',
  String $status   = 'running',
  Boolean $enabled = true,
) {
  # Include the class that defines the config
  include puppet::_config

  # Install the Puppet agent
  package { 'puppet-agent':
    ensure => $version,
    before => File['puppet.conf'],
    notify => Service['puppet'],
  }

  # Manage the Puppet service
  service { 'puppet':
    ensure    => $status,
    enable    => $enabled,
    subscribe => [ Package['puppet-agent'], File['puppet.conf'] ],
  }
}

Create a server class with the same dependency structure. As you might remember from the preceding section, each class is a singleton: the configuration class will be called only once, even though it is included by both classes. If the puppet::server class is defined with the same dependencies as the puppet::agent class, the before and subscribe attributes shown will ensure that resource evaluation will happen in the following order on a node that utilizes either or both classes:

  1. Packages will be installed:
       • puppet::agent: The Puppet agent package would be installed.
       • puppet::server: The Puppet server package would be installed.
  2. puppet::_config: The Puppet configuration file would be written out.
  3. The services will be started:
       • puppet::agent: The Puppet agent service would be started.
       • puppet::server: The Puppet server service would be started.
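
A server class with the same structure might look like the following sketch. It assumes the package and the service are both named puppetserver; the $version parameter is also an assumption made for illustration:

# manifests/server.pp
class puppet::server(
  String $version  = 'latest',
  String $status   = 'running',
  Boolean $enabled = true,
) {
  # Include the class that defines the config
  include puppet::_config

  # Install the Puppet server before the configuration file is written
  package { 'puppetserver':
    ensure => $version,
    before => File['puppet.conf'],
  }

  # Restart the server when the package or the configuration changes
  service { 'puppetserver':
    ensure    => $status,
    enable    => $enabled,
    subscribe => [ Package['puppetserver'], File['puppet.conf'] ],
  }
}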

Using a Different Module

The previous example showed a way to solve a problem within a single Puppet module, where you control each of the classes that manages a common dependency. Sometimes there will be a common dependency shared across Puppet modules maintained by different groups, or perhaps even sometimes entirely outside of Puppet.

The use of templates requires the ability to manage the entire file. Even when using modules that can build a file from multiple parts, such as puppetlabs/concat on the Puppet Forge, you must define the entirety of the file within the Puppet catalog.

The following alternative approach utilizes a module to make individual line or section changes to a file without any knowledge of the remainder of the file:

# manifests/agent.pp
class puppet::agent(
  String $status   = 'running',
  Boolean $enabled = true,
  Hash $config     = {},
) {
  # Write each agent configuration option to the puppet.conf file
  $config.each |$setting,$value| {
    ini_setting { "agent $setting":
      ensure  => present,
      path    => '/etc/puppetlabs/puppet/puppet.conf',
      section => 'agent',
      setting => $setting,
      value   => $value,
      require => Package['puppet-agent'],
    }
  }
}

This shorter, simpler definition uses a third-party module to update the Puppet configuration file in a nonexclusive manner. In my opinion, this is significantly more flexible than the common dependency module shown in the previous example.
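
For illustration, the class could be declared with a hash of settings as in this sketch. It assumes the puppetlabs/inifile module is installed and that Package['puppet-agent'] is declared elsewhere in the catalog; the setting values are only examples:

class { 'puppet::agent':
  config => {
    'runinterval' => '30m',
    'report'      => 'true',
  },
}

Each key becomes one ini_setting resource, such as Ini_setting['agent runinterval'], and lines in puppet.conf that are not named in the hash are left untouched.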

Ordering Dependencies

When using other modules, it will be necessary to use ordering metaparameters to ensure that the dependencies are fulfilled before resources that require those dependencies are evaluated. Some tricky problems come about when you try to utilize ordering metaparameters between classes maintained by different people.

In this section, we’ll cover strategies for safe ordering of dependencies between modules.

Depending on Entire Classes

A primary problem with writing classes that depend on resources in other classes comes from this requirement:

  • You must know the name of the resource(s).

If you are using a module that depends on an explicitly named resource in another module, you are at risk of breaking when the dependency module is refactored and resource titles are changed.

As discussed in “Understanding Variable Scope”, classes and defined types have their own unique scope. Each instance of a class or defined type becomes a container for the variables and resources declared within it. This means you can set dependencies on the entire container.

Best Practice

Whenever possible, treat other modules as black boxes and depend on the entire class, rather than “peeking in” to depend on specific resources.

If a resource defines a dependency with a class or type, it will form the same relationship with every resource inside the container. For example, say that we want the Puppet service to be started after the rsyslog daemon is already up and running. As you might imagine, the rsyslog module has a set of resources similar to those in our puppet::agent class:

class rsyslog {
  package { ... }
  file { ... }
  service { ... }
}

Rather than setting a dependency on one of these resources, we can set a dependency on the entire class:

  # Manage the Puppet service
  service { 'puppet':
    ensure    => $status,
    enable    => $enabled,
    subscribe => Package['puppet-agent'],
    require   => Class['rsyslog'],
  }

With the preceding configuration, someone can refactor and change the rsyslog module without breaking this module.
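
The same relationship can also be written outside the resource with a chaining arrow. This is a sketch that assumes Service['puppet'] is declared elsewhere in the catalog and that the rsyslog class takes no required parameters:

include rsyslog

# Every resource in the rsyslog class is applied before the Puppet service
Class['rsyslog'] -> Service['puppet']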

Placing Dependencies Within Optional Classes

There is another difficulty with ordering dependencies that you may run into:

  • The resource you notify or subscribe to must exist in the catalog.

This obviously comes into play when you are writing a module that may be used with or without certain other modules. It’s easy if the requirement is absolute: you simply include or require the dependency class. But if not having the class is a valid configuration, then it becomes tricky.

Let’s use, for example, the puppet module you are building. It is entirely valid for a user to install Puppet server with it, but not to run or even configure the Puppet agent. If you use a puppet::_config class, the File['puppet.conf'] resource cannot safely notify Service['puppet'], because that service won’t be in the catalog if the puppet::agent class wasn’t included. The catalog build will abort, and animals will scatter in fright.

You could explicitly declare the puppet::agent class, and force everyone who doesn’t want to run the agent to define settings to disable it. However, a more flexible approach would be to have the optional service subscribe to the file resource, which is always included:

  # Manage the Puppet service
  service { 'puppet':
    ensure    => $status,
    enable    => $enabled,
    subscribe => File['puppet.conf'],
  }

By placing the notification dependency within the optional class, you have solved the problem of ensuring that the resources exist in the catalog. If the puppet::agent class is not included on a node, the dependency doesn’t exist, and no animals were harmed when Puppet applied the resources.

Notifying Dependencies from Dynamic Resources

The same rule discussed before has a much trickier application when ordering dependencies of dynamic resources:

  • You must know the name of the resource(s).

This comes into play when you are writing a module that depends on resources that are dynamically generated. Using the puppetlabs/inifile module to modify the configuration file defines each configuration setting as a unique resource within the class:

  # Write each agent configuration option to the puppet.conf file
  $config.each |$setting,$value| {
    ini_setting { "agent $setting":
      ensure  => present,
      ...
    }
  }

Because each setting is a unique resource, the package and service resources can’t use before or subscribe attributes, as the config settings list can change. In this case, it is best to reverse the logic. Use the dynamic resource’s require and notify attributes to require the package and notify the service resources.

Here’s an example that places ordering metaparameters on the dynamic INI file resources for the Puppet configuration file:

  # Write each agent configuration option to the puppet.conf file
  $config.each |$setting,$value| {
    ini_setting { "agent $setting":
      ensure  => present,
      path    => '/etc/puppetlabs/puppet/puppet.conf',
      section => 'agent',
      setting => $setting,
      value   => $value,
      require => Package['puppet-agent'],
      notify  => Service['puppet'],
    }
  }

In this form, we’ve moved the ordering attributes into the dynamic resources to target the well-known resources. The service no longer needs to know in advance the list of resources that modify the configuration file. If any of the settings are changed in the file, it will notify the service.

Solving Unknown Resource Dependencies

The really tricky problems come about when you are facing all of these problems together:

  • You must know the name of the resource(s).
  • The resource you notify or subscribe to must exist in the catalog.

This obviously comes into play when you are writing a module that depends on resources dynamically generated by a different class. If that class can be used without your dependent class, then it cannot send a notify event, as your resource may not exist in the catalog.

Warning
Likewise, if your class depends on a module from the Puppet Forge, you can’t modify their class without having to maintain patches to that module going forward.

The solution is what I call an escrow refresh resource, which is a well-known static resource that will always be available to notify and subscribe to.

Let’s return to the Puppet module you are building for an example. It is entirely valid for a user to install Puppet with it, but not to run or configure the Puppet agent. Changes to the Puppet configuration file can happen within the agent class. As the agent service is defined in the same class, it can safely notify the service.

However, changes to the Puppet configuration file can happen in the main puppet class and modify the configuration parameters in [main]. As these parameters will affect the Puppet agent service, the Puppet agent will need to reload the configuration file.

However, the base puppet class can be used without the puppet::agent subclass, so the inifile configuration resources cannot notify the agent service, as that service might not exist in the catalog. Likewise, the Puppet agent service cannot depend on a dynamically generated set of configuration parameters.

In this situation, the base puppet class creates an escrow refresh resource to which it will submit notifications that the Puppet configuration file has changed. With this combination of resource definitions, the following sequence takes place:

  1. Each dynamic resource notifies the escrow refresh if it changes the file.
  2. On receiving a refresh event, the escrow resource does something negligible.
  3. Services that subscribe to the escrow resource receive refresh events.

Implement this by creating a resource which does something that succeeds. It need not do anything in particular, as it only serves as a well-known relay for refresh events:

  # refresh escrow that optional resources can subscribe to
  exec { 'puppet-configuration-has-changed':
    command     => '/bin/true',
    refreshonly => true,
  }

Adjust the dynamic resources to notify the escrow resource if they change the configuration file:

  # Write each main configuration option to the puppet.conf file
  $config.each |$setting,$value| {
    ini_setting { "main $setting":
      ensure  => present,
      path    => '/etc/puppetlabs/puppet/puppet.conf',
      section => 'main',
      setting => $setting,
      value   => $value,
      require => Package['puppet-agent'],
      notify  => Exec['puppet-configuration-has-changed'],
    }
  }

Declare the agent service to subscribe to the escrow refresh resource:

  # Manage the Puppet service
  service { 'puppet':
    ensure    => $status,
    enable    => $enabled,
    subscribe => Exec['puppet-configuration-has-changed'],
  }

The only difficulty with this pattern is that the escrow resource must be defined by the class on which the optional classes depend. This may require you to submit a request to the maintainer of a Forge module to add in an escrow resource for you to subscribe to.
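
From the point of view of a wrapper module, subscribing could look like this sketch. The class name and the service name are hypothetical; only the escrow resource title comes from the example above:

# a hypothetical add-on that must restart when puppet.conf changes
class puppet_addon {
  # The base class declares the escrow resource
  include puppet

  service { 'puppet-addon':
    ensure    => running,
    enable    => true,
    subscribe => Exec['puppet-configuration-has-changed'],
  }
}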

Best Practice

If you maintain a module that manages a configuration file upon which other services may depend, or for which plugins or add-ons exist, create an escrow resource to which wrapper modules can subscribe.

Containing Classes

In most situations, each class declaration stands independent. While a class can include another class, the included class is declared at the same level as the calling class: they are both instances of the Class type. Ordering metaparameters are used to control which classes are processed in which order.

As classes are peers, no class contains any other class. In almost every case, this is exactly how you want class declaration to work. This allows freedom for any class to set dependencies and ordering against any other class.

However, there is also a balance where one class should not be tightly tied to the internals of another class. It can be useful to allow other classes to declare ordering metaparameters that refer to the parent class, yet ensure that any necessary subclasses are processed at the same time.

For example, a module may have a base class that declares only common variables. All resources might be declared in package and service subclasses. A module that sets a dependency on the base class would not achieve the intended goal of being evaluated after the service is started:

service { 'dependency':
  ensure => running,
  require => Class['only_has_variables'],
}

Rather than require the module to set dependencies on each subclass of the module, declare that each of the subclasses is contained within the main class:

class application(
  Hash[String, Any] $globalvars = {},
) {
  # Ensure that ordering includes subclasses
  contain application::package
  contain application::service
}

With this definition, any class that references the application class need not be aware of the subclasses it contains.
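
For instance, a dependent resource in another module could then be written as in this sketch. The service name is hypothetical:

service { 'dependency':
  ensure  => running,
  require => Class['application'],
}

Because the subclasses are contained, this service is applied after the resources in application::package and application::service, not merely after the variables in the base class.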

Creating Reusable Modules

In this section, we’ll talk about ways to ensure your module can be used successfully by others, and even yourself in different situations. Even if you don’t plan to share your modules with anyone, the ideas in this section will help you build better modules that you won’t kick yourself for later.

Avoiding Fixed Values in Attribute Values

Many, many examples in this book have placed hard values in resources in the name of simplicity, to make the resource language easy to read for learning purposes. Unfortunately, this is a terrible idea when building manifests for production use. You’ll find yourself changing paths, changing values, and adding if/else sequences as the code is deployed in more places.

Best Practice

Use variables for resource attribute values. Set the values in a params class, Hiera, or another data source.

To give you a clear example, visualize a module that installs and configures the Apache httpd server. The following would be a valid definition for a virtualhost configuration file on CentOS 7:

file { '/etc/httpd/conf.d/virtualhost.conf':
  ensure  => file,
  owner   => 'apache',
  group   => 'apache',
  mode    => '0644',
  source  => 'puppet:///modules/apache_vhost/virtualhost.conf',
  require => Package['httpd'],
  notify  => Service['httpd'],
}

Then someone wants to use that module on an Ubuntu server. The problem is, nearly everything in that definition is wrong. The package name, the service name, the file location, and the file owners are all different on Ubuntu. It’s much better to write that resource as follows:

package { 'apache-httpd-package':
  ensure => present,
  name   => $apache_httpd::package_name,
}

file { 'virtualhost.conf':
  ensure  => file,
  path    => "${apache_httpd::sites_directory}/virtualhost.conf",
  owner   => $apache_httpd::username,
  group   => $apache_httpd::groupname,
  mode    => '0644',
  source  => 'puppet:///modules/apache_vhost/virtualhost.conf',
  require => Package['apache-httpd-package'],
  notify  => Service['apache-httpd-service'],
}

service { 'apache-httpd-service':
  name   => $apache_httpd::service_name,
  ensure => 'running',
}

Then utilize your Hiera hierarchy and place the following variables in the os/redhat.yaml file:

apache_httpd::username: apache
apache_httpd::groupname: apache
apache_httpd::package_name: httpd
apache_httpd::service_name: httpd
apache_httpd::sites_directory: /etc/httpd/conf.d

If this service is ever deployed on Ubuntu, you can redefine those variables for that platform in the os/debian.yaml file—zero code changes to your module:

apache_httpd::username: www-data
apache_httpd::groupname: www-data
apache_httpd::package_name: apache2
apache_httpd::service_name: apache2
apache_httpd::sites_directory: /etc/apache2/sites-enabled

Likewise, if you find yourself using the module for an Apache instance installed in an alternate location, you can simply override those values for that particular hostname in your Hiera hierarchy, such as hostname/abitoddthisone.yaml.

Ensuring Fixed Values for Resource Names

If you were looking carefully at the preceding section, you might have noticed that I didn’t take advantage of the resource’s ability to take the resource name from the title. Here, have another look:

package { 'apache-httpd-package':
  ensure => present,
  name   => $apache_httpd::package_name,
}

Wouldn’t it be much easier and less code to inherit the value like this?

package { $apache_httpd::package_name:
  ensure => present,
}

It would be simpler, but only the first time I used this class. The resource’s name would vary from operating system to operating system. If I ever created a wrapper class for this, or depended on it in another class, I’d have to look up which value contains the resource and where it was defined. And then I’m scattering variable names from this class throughout another class.

Here’s an example of what a class that needs to install its configuration files before the Apache service starts would have to put within its own manifest:

# poor innocent class with no knowledge of Apache setup
file { 'config-file-for-other-service':
 ...
 require => Package[ $apache_httpd::package_name ],
 notify  => Service[ $apache_httpd::service_name ],
}

Worse, if you refactor the Apache class and rename the variables, it will break every module that referred to this resource.

Best Practice

Use static names for resources to which wrapper classes may need to refer with ordering metaparameters.

In this situation, it’s much better to explicitly define a static title for the resource, and declare the package or service name by passing a variable to the name attribute. Then wrapper and dependent classes can safely depend on the resource name:

# poor innocent class with no knowledge of Apache setup
file { 'config-file-for-other-service':
 ...
 require => Package['apache-httpd-package'],
 notify  => Service['apache-httpd-service'],
}

Defining Defaults in a Params Manifest

As discussed previously in this book, parameters can be declared in the class definition with a default value. Continuing with our Apache module example, you might define the base class like so:

class apache_httpd (
  String $package_name = 'httpd',
  String $service_name = 'httpd',

Then you could define the default values in Hiera operating system overrides. However, this would require everyone who uses your module to install Hiera data to use your module. To avoid that, you’d have to muddy up the class with case blocks or selector expressions:

String $package_name = $facts['os']['family'] ? {
   'RedHat'  => 'httpd',
   'Debian'  => 'apache2',
   default   => 'apache',
}
String $service_name = $facts['os']['family'] ? { ... }
String $user_name = $facts['os']['family'] ? { ... }

A much cleaner design is to place all conditions on platform-dependent values in another manifest named params.pp.

Best Practice

Place all conditional statements around operating system and similar data in a params class. Inherit from the params class and refer to it for all default values.

Here’s an example:

class apache_httpd (
  String $package_name = $apache_httpd::params::package_name,
  String $service_name = $apache_httpd::params::service_name,
  String $user_name    = $apache_httpd::params::user_name,
) inherits apache_httpd::params {

This makes your class clear and easy to read, hiding away all the messy per-OS value selection. Furthermore, this design allows Hiera values to override the OS-specific values if desired.
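
A params manifest for this module might look like the following sketch. The Debian values and the decision to fail on other platforms are assumptions made for illustration:

# manifests/params.pp
class apache_httpd::params {
  # Select platform-specific defaults in one place
  case $facts['os']['family'] {
    'RedHat': {
      $package_name = 'httpd'
      $service_name = 'httpd'
      $user_name    = 'apache'
    }
    'Debian': {
      $package_name = 'apache2'
      $service_name = 'apache2'
      $user_name    = 'www-data'
    }
    default: {
      fail("Unsupported OS family: ${facts['os']['family']}")
    }
  }
}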

Note

In many cases, the params manifest can be replaced with data in modules, as described in “Binding Data Providers in Modules”.

Best Practices for Module Improvements

Let’s review some of the best practices for module development we covered in this chapter:

  • Declare parameters with explicit types for data validation.
  • Validate each parameter for expected values in the manifest.
  • Place each subclass and defined type in a separate manifest file.
  • Avoid using top or node scope variables.
  • Use the contain() function to wrap subclasses for simple dependency management.
  • Use variables instead of fixed values for resource attributes.
  • Use static names for resources which other modules may depend on.
  • Create escrow resources for wrapper and extension modules to subscribe to.

You can find more detailed guidelines in the Puppet Labs Style Guide.

Reviewing Module Improvements

Modules provide an independent namespace for reusable blocks of code that configure or maintain something. A module can create new resource types that can be independently used in other modules.

In this chapter, we have covered how to configure a module to:

  • Create reusable Puppet manifests that accept external data
  • Provide data to modules using Hiera
  • Validate input data to ensure it conforms to expectations
  • Utilize other modules to provide dependencies
  • Share new types and subclasses with discrete functionality
  • Relay refresh events through escrow resources to wrapper modules

This chapter has reviewed the features and functionality you can utilize within modules. The next chapter will discuss how to create plugins that extend modules with less common functionality.

1 The astute reader might point out that Puppet couldn’t possibly configure the Puppet server if the Puppet agent isn’t installed—a unique situation for only a Puppet module. This concept would be valid for any other module that handles both the client and server configurations.
