Chapter 12. Continuous Chaos

With every chaos experiment that you write and run, you increase your chances of finding evidence of dark debt that you can learn from and use to improve your system. Your chaos experiments will start out as explorations of your system; ways to ask yourself, “If this happens, I think the system will survive…or will it?” You’ll gradually build a catalog of experiments for your system that explores a selection of your Hypothesis Backlog, helping you build trust and confidence that you’re proactively exploring and surfacing weaknesses before they affect your users.

Some of your chaos experiments will then graduate into a different phase of their lives. The first phase of an experiment’s life, as just described, is about finding evidence of system weaknesses. It’s about exploring and uncovering that dark debt inherent in all complex sociotechnical systems. Over time, you will choose to overcome some or all of the weaknesses that your automated chaos experiments have surfaced evidence for. At that point, a chaos experiment enters the second phase of its life: it becomes a chaos test.

A chaos experiment is exploration; a chaos test is validation. Whereas a chaos experiment seeks to surface weaknesses and is celebrated when a deviation is found,1 a chaos test validates that previously found weaknesses have been overcome.

There’s more good news: a chaos experiment and a chaos test look exactly the same. Only the interpretation of the results is different. Instead of being a scientific exploration to find evidence of weaknesses, the goal has become to validate that those weaknesses seem to have been overcome. If a chaos engineer celebrates when evidence from a chaos experiment or Game Day shows that a new weakness may have been found, they will celebrate again when no evidence of that weakness is found after that same experiment is run as a chaos test once system improvements have been put into place.

Over time you will build catalogs of hypotheses, chaos experiments (Game Days and automated experiments), and chaos tests (always automated). You’ll share those experiments with others and demonstrate, through the contribution model (see “Specifying a Contribution Model”), what areas you are focusing on to improve trust and confidence in your system…but there is one more thing you could do to really turn those chaos tests into something powerful.

Chaos tests enable an additional chaos engineering superpower: they enable the potential for “continuous chaos.”

What Is Continuous Chaos?

Continuous chaos means that you have regularly scheduled—often frequent—executions of your chaos tests. Usually chaos tests, rather than chaos experiments, are scheduled, because the intent is to validate that a weakness has not returned. The more frequently you schedule your chaos tests to run, the more often you can validate that a transient condition has not caused the weakness to return.

A continuous chaos environment is made up of these three elements:

Scheduler

Responsible for controlling when a chaos test can and should be executed

Chaos runtime

Responsible for executing the experiment

Chaos tests catalog

The collection of experiments that have graduated into being tests with a high degree of trust and confidence (see “Continuous Chaos Needs Chaos Tests with No Human Intervention” for more on this)

Figure 12-1 shows how these three concepts work together in a continuous chaos environment.

An image of the key concepts in a Continuous Chaos Environment.
Figure 12-1. The key parts of a continuous chaos environment

So far in this book you’ve been using the Chaos Toolkit as your chaos runtime, and you’ve been building up a collection of chaos experiments that are ready to be run as chaos tests; now it’s time to slot the final piece into place by adding scheduled, continuous chaos to your toolset.

Scheduling Continuous Chaos Using cron

Since the Chaos Toolkit provides a CLI through the chaos command, you can hook it up to your cron scheduler.2

Creating a Script to Execute Your Chaos Tests

We won’t go into all the details of how to use cron here,3 but it is one of the simplest ways of scheduling chaos tests to run as part of your own continuous chaos environment. Before the chaos command can run, the Python virtual environment into which your Chaos Toolkit and its extensions are installed must be activated. To do this, create a runchaos.sh file and add the following, which activates your chaostk virtual environment and then runs the chaos --help command to show that everything is working:

#!/bin/bash

source ~/.venvs/chaostk/bin/activate  # (1)

export LANG="en_US.UTF-8" # Needed currently for the click library
export LC_ALL="en_US.UTF-8" # Needed currently for the click library

chaos --help

deactivate  # (2)

(1) Activate the Python virtual environment where the Chaos Toolkit and any necessary extensions are installed.

(2) Deactivate the Python virtual environment at the end of the run. This is only included to show that you could activate and deactivate different virtual environments with different installations of the Chaos Toolkit and extensions depending on your experiment’s needs.

Save the runchaos.sh file and then make it executable:

$ chmod +x runchaos.sh

Now when you run this script you should see:

$ ./runchaos.sh
Usage: chaos [OPTIONS] COMMAND [ARGS]...

Options:
  --version           Show the version and exit.
  --verbose           Display debug level traces.
  --no-version-check  Do not search for an updated version of the
                      chaostoolkit.
  --change-dir TEXT   Change directory before running experiment.
  --no-log-file       Disable logging to file entirely.
  --log-file TEXT     File path where to write the command's log.  [default:
                      chaostoolkit.log]
  --settings TEXT     Path to the settings file.  [default:
                      /Users/russellmiles/.chaostoolkit/settings.yaml]
  --help              Show this message and exit.

Commands:
  discover  Discover capabilities and experiments.
  info      Display information about the Chaos Toolkit environment.
  init      Initialize a new experiment from discovered capabilities.
  run       Run the experiment loaded from SOURCE, either a local file or a...
  validate  Validate the experiment at PATH.

You can now add as many chaos run commands to the runchaos.sh script as you need to execute each of those chaos tests sequentially when the script is run. For example:

#!/bin/bash

source  ~/.venvs/chaostk/bin/activate

export LANG="en_US.UTF-8" # Needed currently for the click library
export LC_ALL="en_US.UTF-8" # Needed currently for the click library

chaos run /absolute/path/to/experiment/experiment.json
# Include as many more chaos tests as you like here!

deactivate
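As the catalog grows, listing every file by hand can get tedious. One possible variation, sketched below, loops over a directory of experiment files instead. The directory layout, the run_chaos_catalog function name, and the CHAOS_CMD override are illustrative assumptions rather than anything the Chaos Toolkit prescribes, and the sketch assumes that chaos run exits with a non-zero status when a test fails or deviates:

```shell
#!/bin/bash
# Sketch: run every experiment file in a catalog directory, one after another.
# Assumptions (not from the Chaos Toolkit itself): the catalog is a single
# directory of *.json files, and `chaos run` exits non-zero on failure/deviation.

CHAOS_CMD="${CHAOS_CMD:-chaos}"   # overridable, so the loop can be tried with a stand-in

run_chaos_catalog() {
    local catalog_dir="$1"
    local failures=0
    local experiment

    for experiment in "$catalog_dir"/*.json; do
        # If the directory holds no .json files the pattern stays literal; skip it.
        [ -e "$experiment" ] || continue

        echo "Running chaos test: $experiment"
        if ! "$CHAOS_CMD" run "$experiment"; then
            echo "Chaos test reported a problem: $experiment" >&2
            failures=$((failures + 1))
        fi
    done

    echo "Chaos tests with problems: $failures"
    [ "$failures" -eq 0 ]   # non-zero return if any test had a problem
}
```

You could call run_chaos_catalog /absolute/path/to/catalog between the source and deactivate lines of runchaos.sh. A failing test does not stop the loop, so one returning weakness does not hide the results of the tests that follow it.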

This script will work well if your experiment files are always available locally. If that is not the case, another option is to direct the Chaos Toolkit to load the experiment from a URL.4 You can do this by amending your runchaos.sh file with URL references in your chaos run commands:

#!/bin/bash

source ~/.venvs/chaostk/bin/activate

export LANG="en_US.UTF-8" # Needed currently for the click library
export LC_ALL="en_US.UTF-8" # Needed currently for the click library

# Placeholder URL: substitute the address where your experiment file is hosted
chaos run https://example.com/path/to/experiment.json
# Include as many more chaos tests as you like here!

deactivate
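The earlier callout noted that different chaos tests might need different Chaos Toolkit installations. A minimal sketch of one way to handle that is to run each command in a subshell that activates the virtual environment it needs, so that nothing leaks into the next test and no explicit deactivate is required. The run_in_venv helper and the chaostk-other environment name are hypothetical:

```shell
#!/bin/bash
# Sketch: run a command inside a named Python virtual environment.
# run_in_venv is a hypothetical helper, not part of the Chaos Toolkit.

run_in_venv() {
    local venv_dir="$1"
    shift

    # The parentheses start a subshell, so the activation (PATH changes and
    # so on) disappears again as soon as the command has finished.
    (
        source "$venv_dir/bin/activate" || exit 1
        "$@"
    )
}

# Example calls (the second environment and path are assumptions):
#   run_in_venv ~/.venvs/chaostk chaos run /absolute/path/to/experiment/experiment.json
#   run_in_venv ~/.venvs/chaostk-other chaos run /absolute/path/to/other/experiment.json
```

The helper returns the exit status of the command it ran, or a non-zero status if the virtual environment could not be activated.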

Adding Your Chaos Tests Script to cron

Now you can schedule a task with cron by adding an entry into your system’s crontab (cron table). To open up the crontab file, execute the following:

$ crontab -e

This will open the file in your terminal’s default editor. Add the following line to execute your runchaos.sh script every minute:

*/1 * * * * /absolute/path/to/script/runchaos.sh

Save the file and exit, and you should see the message “crontab: installing new crontab.” Now just wait; if everything is working correctly, your chaos tests will be executed every minute by cron.
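Running every minute raises a practical question that the schedule alone does not answer: what happens if a run of chaos tests takes longer than a minute? cron will start the next run regardless. The sketch below is one hedged way to skip a run while the previous one is still going, using a lock directory. The run_with_lock name and the lock path are assumptions made for illustration:

```shell
#!/bin/bash
# Sketch: skip a scheduled run if the previous one has not finished yet.
# mkdir either creates the directory or fails, so it can double as a simple lock.

run_with_lock() {
    local lock_dir="$1"
    shift

    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "Previous chaos test run still in progress; skipping this one." >&2
        return 0
    fi

    local status=0
    "$@" || status=$?    # remember the result of the wrapped command

    rmdir "$lock_dir"    # release the lock for the next scheduled run
    return "$status"
}

# Example call (both paths are placeholders):
#   run_with_lock /tmp/runchaos.lock /absolute/path/to/script/runchaos.sh
```

If a run is killed before it reaches rmdir, the lock directory stays behind and has to be removed by hand, so treat this as a starting point rather than a complete solution.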

Scheduling Continuous Chaos with Jenkins

A very common choice is to schedule your chaos tests to be executed every time there’s been a change to the target system.5 To work toward that, you’ll install the popular open source Jenkins continuous integration and delivery pipeline tool and add your chaos tests to that environment as an additional deployment stage.

Grabbing a Copy of Jenkins

First you need to get a Jenkins server running, and the simplest way to do that is to download and install it locally for your operating system.6 Once Jenkins has been downloaded, installed, and unlocked and is ready for work, you should see the Jenkins home screen shown in Figure 12-2.

An image of Jenkins ready for use.
Figure 12-2. Jenkins installed and ready for use

Adding Your Chaos Tests to a Jenkins Build

You are now all set to tell Jenkins how to run your chaos tests. From the Jenkins home screen, click “create new jobs” (see Figure 12-2). You’ll then be asked what type of Jenkins job you’d like to create. Select “Freestyle project” and give it a name such as “Run Chaos Tests” (see Figure 12-3).

An image of link to click to create a new Freestyle project for your Chaos Tests.
Figure 12-3. Create a new Jenkins freestyle project for your chaos tests

Once you’ve clicked OK to create your new project, you’ll be presented with a screen where you can configure the job. There’s a lot you could complete here to make the most of Jenkins, but for our purposes you’re going to do the minimum to be able to execute your chaos tests.

Navigate down the page to the “Build” section and click the “Add build step” button (see Figure 12-4), and then select “Execute shell.”

An image of button to click to add a new build step.
Figure 12-4. Adding a new build step

You’ll be asked to specify the shell command that you want Jenkins to execute. You’ll be reusing the runchaos.sh script that you created earlier, so simply enter the full path to your runchaos.sh file and then click “Save” (Figure 12-5).

An image of text to enter to run the runchaos.sh script.
Figure 12-5. Invoking the runchaos.sh shell script

You’ll now be returned to your new Run Chaos Tests job page. To test that everything is working, click the “Build Now” link; you should see a new build successfully completed in the Build History pane (Figure 12-6).

An image of a successful build in the Build History pane.
Figure 12-6. A successful single execution of your Run Chaos Tests job

You can see the output of running your chaos tests by clicking the build execution link (i.e., the job number) and then the “Console Output” link (Figure 12-7).

An image of the console output when running chaos tests from inside Jenkins.
Figure 12-7. Console output from your chaos tests
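When you read the console output, bear in mind how Jenkins decides whether a build succeeded: an “Execute shell” step is judged by the exit status of what it ran. Because your script ends with deactivate, its exit status reflects that last command rather than the chaos tests. The following sketch, which assumes that chaos run exits with a non-zero status when a test fails or deviates, shows one way to carry the result past the cleanup step. The run_then_cleanup helper is hypothetical:

```shell
#!/bin/bash
# Sketch: run a command, always run a cleanup step afterward, and report
# the command's exit status rather than the cleanup's.

run_then_cleanup() {
    local cleanup="$1"
    shift

    local status=0
    "$@" || status=$?    # remember a failure instead of stopping here

    "$cleanup"           # for example: deactivate
    return "$status"
}

# Example use at the end of the script (after the virtual environment is activated):
#   run_then_cleanup deactivate chaos run /absolute/path/to/experiment/experiment.json
#   exit $?
```

With the status passed through like this, a chaos test that finds its weakness has returned would show up as a failed build in the Build History pane instead of a successful one.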

Great! You now have Jenkins executing your chaos tests. However, your clicking the “Build Now” button is hardly “continuous.” To enable continuous chaos, you need to add an appropriate build trigger.

Scheduling Your Chaos Tests in Jenkins with Build Triggers

You can trigger your new Run Chaos Tests Jenkins job in a number of different ways, including triggering on the build success of other projects. For our purposes, you can see some continuous chaos in action by simply triggering the job on a schedule, just as you did earlier with cron. In fact, Jenkins scheduled builds are specified with a cron-style pattern that accepts the same expression you used in your crontab, so let’s do that now.

From your Run Chaos Tests job home page, click “Configure” and then go to the “Build Triggers” tab (see Figure 12-8). Select “Build periodically” and then enter the same cron pattern that you used earlier when editing the crontab file, which was:

*/1 * * * *

Figure 12-8 shows what your completed build trigger should look like.

An image of the build trigger configuration.
Figure 12-8. Configuring your Run Chaos Tests job to be triggered every minute

Now when you go back to your job’s home page you should see new executions of your chaos tests being run every minute!

Summary

The progression from manual Game Days to automated chaos experiments to chaos tests and continuous chaos is now complete. By building a continuous chaos environment, you can search for and confidently surface weaknesses as often as needed, without long delays between Game Days.

But your journey into chaos engineering is only just beginning.

Chaos engineering never stops; as long as a system is being used, you will find value in exploring and surfacing evidence of weaknesses in it. Chaos engineering is never done, and this is a good thing! As a chaos engineer, you know that the real value of chaos engineering is in gaining evidence of system weaknesses as early as possible, so that you and your team can prepare for them and maybe even overcome them. As a mind-set, a process, a set of techniques, and a set of tools, chaos engineering is a part of your organization’s resilience engineering capability, and you are now ready to play your part in that capability. Through the establishment of the learning loops that chaos engineering supports, everyone can be a chaos engineer and contribute to the reliability of your systems.

Good luck, and happy chaos engineering!

1 Maybe “celebrate” is too strong a term for your reaction to finding potential evidence of system weaknesses, but that is the purpose of a chaos experiment.

2 If you’re running on Windows there are a number of other options, such as Task Scheduler.

3 Check out bash Cookbook by Carl Albing and JP Vossen (O’Reilly) for more on using cron to schedule tasks.

4 The specified URL must be reachable from the machine that the chaos run command will be executed on.

5 Possibly even as part of the choice to roll back during a blue-green deployment.

6 If an instance of Jenkins is already available, please feel free to use that existing installation.
