If you are using Puppet, you will be quite happy with the level of control MCollective gives you. MCollective enables new ways of using Puppet that simply aren’t possible with daemonized, cron-run, or even command-line usage of Puppet.
The first thing we need to do is install the MCollective Puppet agent. Installation is identical to the agents we installed in Chapter 4. Since we know you have Puppet installed, we’ll dispense with the command-line installation and show you how to do it with Puppet.
node nodename {
  mcollective::plugin::agent { 'puppet': }   # for servers
  mcollective::plugin::client { 'puppet': }  # for clients
}
If you use Hiera, you can install the agent simply by listing the puppet agent in the mcollective::plugin::agents hash. In this example we set dependencies on the puppet agent to ensure that the Puppet client is installed on the host.
mcollective::plugin::agents:
  puppet:
    version: latest
    dependencies:
      - Package[%{hiera('puppet::client::package_name')}]
      - Service[%{hiera('puppet::client::service_name')}]
mcollective::plugin::clients:
  puppet:
    version: latest
This is obviously a bit redundant, since Puppet is enforcing this policy and we already know Puppet is installed, but it makes for a good example: the MCollective puppet agent can’t function without Puppet installed.
Once you have installed the agent and restarted mcollectived (which the puppet module does for you) you can query and run puppet from any client which has the puppet client installed. The first thing you should do is confirm which systems have the MCollective Puppet agent installed.
$ mco puppet count

Total Puppet nodes: 3

Nodes currently enabled: 3
Nodes currently disabled: 0

Nodes currently doing puppet runs: 0
Nodes currently stopped: 3

Nodes with daemons started: 1
Nodes without daemons started: 2
Daemons started but idling: 1

$ mco puppet summary
Summary statistics for 3 nodes:

                 Total resources: ▄▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 17.0
           Out Of Sync resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 0.0
                Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 0.0
               Changed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 0.0
Config Retrieval time (seconds): ▇▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 1.8
        Total run-time (seconds): ▇▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  min: 0.0    max: 2.3
  Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇  min: 221.0  max: 2.5k
You’ll notice that these puppet runs are very fast and involve few resources; the minimal test environment for the mcollective module provided in this book uses only a handful. A production setup will usually have much longer run times, with thousands or tens of thousands of resources involved.
During maintenance you may want to disable the puppet agent on certain nodes. When you disable the agent, you can supply a message letting others know what you are doing.
$ mco puppet disable --with-identity heliotrope message="Disk replacement"

 * [ ==================================================> ] 1 / 1

Summary of Enabled:
   disabled = 1

Finished processing 1 / 1 hosts in 85.28 ms

$ mco puppet runonce --with-identity heliotrope

 * [ ==================================================> ] 1 / 1

heliotrope                               Request Aborted
   Puppet is disabled: 'Disk replacement'
   Summary: Puppet is disabled: 'Disk replacement'

Finished processing 1 / 1 hosts in 84.22 ms
Re-enabling the puppet agent on the node is just as easy as disabling it.
$ mco puppet enable --with-identity heliotrope
* [ ==================================================> ] 1 / 1
Summary of Enabled:
enabled = 1
Finished processing 1 / 1 hosts in 84.36 ms
You can easily apply these same commands to enable or disable the puppet agent on nodes matching any filter criteria as discussed in Filters.
The MCollective puppet agent provides a powerful tool for controlling Puppet runs. If you examine the help output for puppet, you’ll find many familiar options for controlling runs from the command line with puppet agent or puppet apply:
$ mco help puppet
Application Options
--force Bypass splay options when running
--server SERVER Connect to a specific server or port
--tags, --tag TAG Restrict the run to specific tags
--noop Do a noop run
--no-noop Do a run with noop disabled
--environment ENVIRONMENT Place the node in a specific environment for this run
--splay Splay the run by up to splaylimit seconds
--no-splay Do a run with splay disabled
--splaylimit SECONDS Maximum splay time for this run if splay is set
--ignoreschedules Disable schedule processing
--rerun SECONDS When performing runall do so repeatedly
with a minimum run time of SECONDS
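To make the splay options above concrete, here is a minimal sketch of what a splay delay amounts to. This is illustrative Ruby, not MCollective’s actual code, and the method name is hypothetical: each node simply waits a random number of seconds, up to the splay limit, before running.

```ruby
# Hypothetical helper: pick a random delay of up to splaylimit seconds.
# With --no-splay (enabled: false) the node runs immediately.
def splay_delay(splaylimit, enabled: true)
  enabled ? rand(0..splaylimit) : 0
end
```

Even a modest splay window spreads thousands of nodes out so they don’t all request catalogs at the same instant.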
The simplest invocation is naturally to run puppet immediately on one system.
$ mco puppet runonce --with-identity heliotrope

 * [ ==================================================> ] 1 / 1

Finished processing 1 / 1 hosts in 193.99 ms

$ mco puppet status --with-identity heliotrope

 * [ ==================================================> ] 1 / 1

   heliotrope: Currently idling; last completed run 02 seconds ago

Summary of Applying:
   false = 1
Summary of Daemon Running:
   running = 1
Summary of Enabled:
   enabled = 1
Summary of Idling:
   true = 1
Summary of Status:
   idling = 1

Finished processing 1 / 1 hosts in 86.43 ms
What if you needed to run puppet instantly on every CentOS host to fix the sudoers file? Notice in the output here that one of these hosts had puppet agent running, and the other did not. However both ran puppet when we asked them to.
$ mco puppet runonce --tags=sudo --with-fact operatingsystem=CentOS

 * [ ==================================================> ] 2 / 2

Finished processing 2 / 2 hosts in 988.26 ms

$ mco puppet status --wf operatingsystem=CentOS

 * [ ==================================================> ] 2 / 2

   geode: Currently stopped; last completed run 1 minutes 52 seconds ago
   heliotrope: Currently idling; last completed run 2 minutes 21 seconds ago

Summary of Applying:
   false = 2
Summary of Daemon Running:
   stopped = 1
   running = 1
Summary of Enabled:
   enabled = 2
Summary of Idling:
   true = 1
   false = 1
Summary of Status:
   stopped = 1
   idling = 1

Finished processing 2 / 2 hosts in 42.17 ms
How about prompting puppet to update immediately on every host in your environment? If you are using only local manifests, you can trigger a run affecting thousands of hosts. In most server-based environments the puppet servers won’t be able to handle every client checking in for a fresh catalog all at the same time. Likewise, you may want to slow roll a large number of hosts to prevent too many of them being out of service for a major change.
Run puppet on all servers, just one at a time:
$ mco puppet runall 1
2014-02-10 23:14:00: Running all nodes with a concurrency of 1
2014-02-10 23:14:00: Discovering enabled Puppet nodes to manage
2014-02-10 23:14:03: Found 3 enabled nodes
2014-02-10 23:14:06: geode schedule status: Signalled the running Puppet Daemon
2014-02-10 23:14:06: 2 out of 3 hosts left to run in this iteration
2014-02-10 23:14:09: Currently 1 node applying the catalog; waiting for less than 1
2014-02-10 23:14:13: Currently 1 node applying the catalog; waiting for less than 1
2014-02-10 23:14:17: heliotrope schedule status: Signalled the running Puppet Daemon
2014-02-10 23:14:18: 1 out of 3 hosts left to run in this iteration
...etc
Run puppet on all webservers, up to five at a time:
$ mco puppet runall 5 --with-identity /^webd/
Note that runall is like batch, except that instead of waiting for a sleep time, it waits for one of the puppet daemons to complete its run before it starts another. If you didn’t mind some potential overlap, you could always use the batch options instead:
$ mco puppet runonce --batch 10 --batch-sleep 60 --tags ntp
The mcollective puppet agent is so powerful that you can make arbitrary changes based on Puppet’s Resource Abstraction Layer (RAL). For example, if you wanted to ensure the httpd service was stopped on a given host, you could do the following.
$ mco puppet resource service httpd ensure=stopped --with-identity geode
* [ ==================================================> ] 1 / 1
geode
Changed: true
Result: ensure changed 'running' to 'stopped'
Summary of Changed:
Changed = 1
Finished processing 1 / 1 hosts in 630.99 ms
You can obviously limit this in all the ways specified in Filters. For example, you probably only want to do this on hosts where apache is not being managed by puppet:
$ mco puppet resource service httpd ensure=stopped --wc '!apache'
You could also fix the root alias on hosts:
$ mco puppet resource mailalias root recipient=[email protected]
This section documents some extremely powerful controls. Enabling the Puppet RAL allows direct, instantaneous and arbitrary access to any Puppet Resource Type it knows how to affect. Read carefully through the next section for how to protect yourself.
By default no resources can be controlled from mcollective. The feature is enabled in the mcollective agent but it has an empty whitelist. Consider this feature a really powerful shotgun. The whitelist protects you and everyone who depends upon that foot you are aiming at. Be careful.
These are the default configuration options:
plugin.puppet.resource_allow_managed_resources = true
plugin.puppet.resource_type_whitelist = none
If you want to allow resource control, you would need to edit the mcollective/server.cfg file with either a whitelist or a blacklist of resources which can be controlled.
plugin.puppet.resource_type_whitelist = package,service
plugin.puppet.resource_type_blacklist = exec
MCollective does not allow you to mix whitelists and blacklists.
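To illustrate how these settings interact, here is a rough sketch of the gate logic in Ruby. This is not the agent’s actual code; the method name and structure are hypothetical, but it follows the rules described: a whitelist of none blocks everything, and the two lists cannot be combined.

```ruby
# Illustrative sketch of the resource-type gate, not the agent's real code.
# whitelist/blacklist are comma-separated strings, as in server.cfg.
def resource_allowed?(type, whitelist: nil, blacklist: nil)
  raise ArgumentError, 'cannot mix whitelist and blacklist' if whitelist && blacklist
  return false if whitelist == 'none'            # the shipped default
  return whitelist.split(',').include?(type) if whitelist
  return !blacklist.split(',').include?(type) if blacklist
  false                                          # no list configured: deny
end
```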
By default, no resource defined in the Puppet catalog can be controlled from MCollective, so as to prevent MCollective from making a change against the Puppet policy. Alternate options sent for a resource in the Puppet catalog will most likely simply be overwritten the next time Puppet runs. In the worst case, well… sorry about the foot.
To allow MCollective to alter resources under Puppet’s control, enable the following setting.
plugin.puppet.resource_allow_managed_resources = true
Here we'll go through some common errors you might encounter with MCollective and Puppet interaction.
If you are unable to match a host using the --with-class filter option, the first thing to do is get an inventory of the node with mco inventory hostname. If you find that the inventory does not list any classes for a host, then the classes.txt file that mcollectived is trying to read is not being written to by Puppet.
The classes.txt file is written out by the Puppet agent during each run. In the [agent] section of puppet.conf is a variable classfile. This defaults to $statedir/classes.txt, and $statedir defaults to $vardir/state. MCollective defaults to the same location as Puppet does on every platform.

However, this variable can be overridden in both puppet.conf and mcollective/server.cfg. If you do not see Puppet classes in the output of an inventory request for a puppetized node, you should check the following two values and ensure that they match up.
$ sudo puppet apply --configprint classfile
/var/lib/puppet/state/classes.txt

$ grep classesfile /etc/mcollective/server.cfg

$ mco rpc rpcutil get_config_item item=classesfile -I heliotrope

heliotrope
   Property: classesfile
   Value: /var/lib/puppet/state/classes.txt
If the classfile from Puppet matches what is above, then mcollective doesn’t need an override in server.cfg. If any different value is found, you may want to set them explicitly to match in both files.
# /etc/puppet/puppet.conf
[agent]
    classfile = $statedir/classes.txt

# /etc/mcollective/server.cfg
classesfile = /var/lib/puppet/state/classes.txt
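As a sketch of what mcollectived does with this file, the following Ruby (illustrative only; the method name is hypothetical, not the actual plugin code) reads classes.txt and answers a class-membership query the way a --with-class filter would:

```ruby
# Hypothetical helper: Puppet writes one class name per line to
# classes.txt; a class filter is a simple membership check against it.
def node_has_class?(classfile, klass)
  return false unless File.readable?(classfile)  # no file: Puppet never ran
  File.readlines(classfile).map(&:strip).include?(klass)
end
```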
If you are unable to match a host using the --with-fact filter option, the first thing to do is get an inventory of the node with mco inventory hostname. If you find that the inventory does not list any facts for a server, then the facts.yaml file that mcollectived is trying to read is not being written to by Facter or Puppet.
For MCollective to know about facts, there needs to be a parameter named plugin.yaml defined in mcollective’s server.cfg. The value of this parameter should be a filename that lists the server’s facts in YAML format, usually /etc/mcollective/facts.yaml.
# /etc/mcollective/server.cfg
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml
The target for the plugin.yaml parameter can include multiple filenames, separated by a colon on Unix systems or a semicolon on Windows servers. If the facts do not show up after restarting mcollectived, then the most likely problem is the formatting of the YAML within the file.
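When you suspect the YAML formatting, a quick sanity check like the following can confirm the file parses into a flat hash of fact names to values. This is an illustrative helper, not part of MCollective:

```ruby
require 'yaml'

# Hypothetical check: a usable facts file must be valid YAML and
# must parse to a Hash (fact name => value), not a bare scalar/list.
def valid_facts_file?(path)
  facts = YAML.load_file(path)
  facts.is_a?(Hash)
rescue Psych::SyntaxError, Errno::ENOENT
  false  # unparseable or missing file: mcollectived gets no facts
end
```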
The most basic way to collect system facts was described in Facts. A more elegant and flexible solution which can use Puppet-generated facts or values was documented in Sharing Facts with Puppet. It doesn’t matter how you generate your system facts, as long as they are written in YAML format to the listed file.
Confirm that one of the following is configured to write out facts to this file.
You can install Facts plugins other than YAML from the Puppet Labs Forge, GitHub, or other repositories as discussed in Finding Community Plugins. You can also build your own as documented in Creating Other Types of Plugins.
There is a plugin named mcollective-facter-facts on the Puppet Labs GitHub. This agent can be slow to run, as it invokes Facter for each evaluation. The YAML plugin used above to load facts from a YAML-format text file works much better.
If you are unable to match a host using the --with-identity or -I filter option, the first thing to do is confirm that mcollectived is running on the server. This is the most likely reason for a failed response by name.
The next step is to check and see what the configured identity in the server configuration might be:
$ grep identity /etc/mcollective/server.cfg
#identity=
In this situation, the identity is not hardcoded in the server configuration, so we’ll have to look elsewhere.
The default identity for the node is the output of the hostname command. If you are using Puppet, we can query Puppet for its certname, which we can use as a fact to find the node identity.
# on the server
$ sudo puppet apply --configprint certname
heliotrope.example.net

# on the client
$ mco rpc rpcutil get_config_item item=identity --wf clientcert=heliotrope.example.net

heliotrope
   Property: identity
   Value: heliotrope
No, that’s not a misprint. The configuration variable certname is provided by Puppet as the Facter fact clientcert. No idea why the inconsistency; it’s just how Puppet is.
Likewise, you can use any other fact or class as previously described to locate the node. For example, there are only two CentOS hosts in my test lab.
$ mco rpc rpcutil get_config_item item=identity --wf operatingsystem=CentOS
Discovering hosts using the mc method for 2 second(s) .... 2
* [ =================================================> ] 2 / 2
geode
Property: identity
Value: geode
heliotrope
Property: identity
Value: heliotrope
Summary of Value:
geode = 1
heliotrope = 1
Finished processing 2 / 2 hosts in 18.38 ms
If you want an MCollective node to think of itself with a different name, then set identity in server.cfg:

# server.cfg
identity = iambatman
If you are using configuration management like any sane person, you can have the variable set from the configuration management’s knowledge of itself. For example, here’s a Puppet template fragment to ensure the MCollective node knows itself by the Puppet certificate name, rather than the output of hostname:

# server.cfg.erb
identity = <%= scope.lookupvar('::clientcert') -%>
The most common source of node name confusion is based around the use of node names or FQDNs in the hostname of a system. For example, you can set a node’s hostname to either a simple name or you can include the domain.
$ grep HOSTNAME /etc/sysconfig/network    # RedHat location
HOSTNAME=heliotrope

$ hostname
heliotrope

$ hostname -f
heliotrope.example.net
With this setup, the MCollective identity was heliotrope while the Puppet certname was heliotrope.example.net. You can resolve the mismatch by changing /etc/sysconfig/network on RedHat-derived systems, /etc/hostname on Debian-derived systems, or /etc/rc.conf on *BSD systems. Or you can leave it alone, so long as you understand the difference.
The lack of matching between Puppet and MCollective does not create any explicit problems. My test setup uses short node names (e.g. “heliotrope”) for MCollective while Puppet always uses the FQDN of the host.
Absolutely nothing breaks out of the box by having different identities in Puppet and MCollective, it only affects how you might write your custom plugins. In the author’s personal opinion if you have many hosts with entirely unique hostnames across your DNS domains, you can save a lot of typing by leaving the domain name off of the hostname. Other people have different opinions drawn from their experiences. Your Mileage May Vary (YMMV).
Many of the weirder problems observed on the mailing list end up being due to the clients and servers having a different idea of what time it is. Before you take any other debugging steps, ensure that your systems have a consistent view of the time.
client$ date +%s
1396310400

server$ date +%s
1396310402
Allowing for the difference in time taken for you to run the commands on these two systems, they should be within a few seconds. In modern NTP time-sync 1/100th of a second is a considerable gap, so most systems should be easily within the same second.
The reason this is important is due to how messages are constructed. Every MCollective message sent out contains the current timestamp and a ttl to indicate how long the message is valid.
{
:msgtime => 1396310400,
:ttl => 60,
:filter => {
"fact" => [{:fact=>"operatingsystem", :value=>"Debian"}],
"agent" => ["package"]
},
:senderid => "desktop.example.net",
:msgtarget => "/topic/mcollective.discovery.command",
:agent => 'discovery',
:collective => 'mcollective',
:body => body,
:hash => "88dd360f13614b7db83616ba49deb130",
:requestid => "70141ca8a465954706a51ef6a7d4914e"
}
In the situation described by this packet, the request is valid from 1396310400 until 1396310460. If your server receives a request from a client too far in the past, then the request will be ignored because the TTL has already expired. Even weirder problems can occur with clients in the future, from the server’s perspective. It is absolutely essential that all of the systems in the collective have a consistent view of the time.
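The validity window described above can be sketched in a few lines of Ruby. This is illustrative logic, not MCollective’s actual implementation: a request whose msgtime is in the server’s future, or whose msgtime plus ttl has already passed, gets rejected.

```ruby
# Hypothetical check: a request is valid only while the server's clock
# falls inside the window [msgtime, msgtime + ttl].
def message_valid?(msgtime, ttl, now = Time.now.to_i)
  now >= msgtime && now <= msgtime + ttl
end
```

With the packet above, msgtime 1396310400 and ttl 60 give a window that closes at 1396310460; a server whose clock reads 1396310461 silently drops the request.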
We aren’t talking about time zones here. Computers track time in UTC and display it to you in the timezone offset you have configured in your preferences. All time is stored and compared in UTC, as represented above. The commands above show the UTC epoch time: seconds since January 1, 1970 UTC.
If you know how to translate that number back to Pacific Standard Time, then you’ll know the exact minute I wrote this particular chapter.