
5. Testing

Moshe Zadka
Belmont, CA, USA

Too often, code used for automating systems does not receive the same attention to testing as application code. DevOps teams are often small and under tight deadlines. Such code is also hard to test since it is meant to automate large systems, and proper isolation for testing is non-trivial.

However, testing is one of the best ways to increase code quality. It makes code more maintainable in many ways and lowers defect rates. This is important for DevOps code, where a defect can often mean a total system outage, since the code touches all the parts of the system.

5.1 Unit Testing

Unit tests serve several distinct purposes. It is important to keep these purposes in mind, as the resulting pressures on the unit tests are sometimes at odds.

The first purpose is as an API usage example. This is sometimes summarized with the somewhat-inaccurate term test-driven development, and sometimes with another somewhat-inaccurate phrase: the unit tests are the documentation.

Test-driven development means writing the unit tests before the logic, but it usually has little impact on the final source code commit, which contains both the unit tests and the logic, unless care is taken to preserve the original branch-wise commit history.

However, what does show up in the commit is the unit tests as ways to exercise the API. It is, ideally, not the only documentation of the API. However, it serves as a useful reference-of-last-resort; at the very least, you know that the unit tests call the API correctly and get the results they expect.

You also want to be confident that the logic expressed in the code does the right thing. This is often done with regression tests, which make sure that a bug detected by someone is truly fixed. However, since the code developer is aware of the potential edge cases and trickier flows, they can often add such a test before the bug makes it out into an externally observed code change. Either way, such a confidence-increasing test looks exactly like a regression test.

A final reason is to avoid incorrect future changes. This is different from the regression test in that often, the case being tested is straightforward for the code as is, and other tests already cover the flows involved. However, some potential optimizations or other natural changes might break this case, so including it helps a future maintenance programmer.

When writing a test, it is important to think about which of those goals it is meant to accomplish. A good test accomplishes more than one.

All tests have two potential impacts.
  • Make the code better by helping with future maintenance work

  • Make the code worse by making future maintenance work harder

Every test does some of both. A good test does more of the first; a bad test does more of the second. One way to reduce the bad impact is to consider if the test is testing something that the code promises to do. If the answer is no, it is valid to change the code in some way that breaks the test but does not cause any bugs. This means the test must be changed or discarded.

When writing tests, it is important to test the actual contract of the code.

Here is an example.
def write_numbers(fout):
    fout.write("1 ")
    fout.write("2 ")
    fout.write("3 ")

This function writes a few numbers into a file.

A bad test might look like this:
class DummyFile:
    def __init__(self):
        self.written = []
    def write(self, thing):
        self.written.append(thing)
def test_write_numbers():
    fout = DummyFile()
    write_numbers(fout)
    assert fout.written == ["1\n", "2\n", "3\n"]

This is a bad test because it checks for a promise that write_numbers never made—that each write only writes one line.

A future refactor might look like this:
def write_numbers(fout):
    fout.write("1 2 3 ")

This keeps the code correct; all users of write_numbers still get correct files, but it breaks the test.

A slightly more sophisticated approach is to concatenate the strings written.
class DummyFile:
    def __init__(self):
        self.written = []
    def write(self, thing):
        self.written.append(thing)
def test_write_numbers():
    fout = DummyFile()
    write_numbers(fout)
    assert_that("".join(fout.written), is_("1 2 3 "))

Note that this test works before and after the hypothetical optimization I suggested. However, this still tests more than the implied contract of write_numbers. After all, the function is supposed to operate on files; it might use another method to write.

The test would break if you modified write_numbers as follows.
def write_numbers(fout):
    fout.writelines(["1 ",
                     "2 ",
                     "3 "]

A good test only breaks if there is a bug in the code. However, this code still works for the users of write_numbers, which means the maintenance now involves unbreaking a test: pure overhead.

Since the contract is to be able to write to file objects, it is best to supply a file object. In this case, Python has a ready-made one.
def test_write_numbers():
    fout = io.StringIO()
    write_numbers(fout)
    assert_that(fout.getvalue(), is_("1\n2\n3\n"))

In some cases, this requires writing a custom fake. The concept of fakes and how to write them is covered later.

You learned about the implicit contract of write_numbers. Since it had no documentation, you could not know the original programmer's intent. This is, unfortunately, common, especially in internal code used only by other pieces of the project. Of course, it is better to clearly document programmer intent. In the face of a lack of clear documentation, however, it is important to make reasonable assumptions about the implicit contract.

Earlier, the assert_that and is_ functions verified that the values were what you expected. Those functions come from the hamcrest library. This library, ported from Java, allows specifying properties of structures and checks that they are satisfied.

Using the pytest test runner to run unit tests makes it possible to use regular Python operators with the assert keyword and get useful test failures. However, this binds the tests to a specific runner, and only a specific set of assertions is treated specially to produce useful error messages.

Hamcrest is an open-ended library. While it has built-in assertions for the usual things (equality, comparisons, sequence operations, and more), it also allows you to define specific assertions. Those come in handy when handling complicated data structures, such as those returned from APIs, or when only specific assertions can be guaranteed by the contract (for example, the first three characters can be arbitrary but must be repeated somewhere inside the string).

This allows you to test the exact contract of the function. It is another tool for avoiding over-testing: testing implementation details that can change, requiring the test to change even though no real users have been broken. This is crucial for three reasons.

One is straightforward; time spent updating tests that could have been avoided is time wasted. DevOps teams are usually small, and there is little room to waste resources.

The second is that getting used to changing tests when they fail is a bad habit. When behavior changes because of a bug and a test fails, people assume that updating the test, rather than fixing the bug, is the right thing to do.

Finally, and most importantly, combining those two lowers the return on investment on unit testing and, worse, the perceived return on investment. As a result, there is organizational pressure to spend less time writing tests. Bad tests that test implementation details are the single biggest cause for the meme that writing unit tests for DevOps code is not worth it.

For example, let’s assume you have a function where all you can assert confidently is that the result has to be divisible by one of the arguments.
from hamcrest.core import base_matcher
class DivisibleBy(base_matcher.BaseMatcher):
    def __init__(self, factor):
        self.factor = factor
    def _matches(self, item):
        return (item % self.factor) == 0
    def describe_to(self, description):
        description.append_text('number divisible by')
        description.append_text(repr(self.factor))
def divisible_by(num):
    return DivisibleBy(num)

This example uses PyHamcrest, a third-party package for writing test assertions. The package can be installed in a virtual environment using pip install pyhamcrest.

By convention, you wrap the matcher constructor in a function. This is usually useful if you want to convert the argument to a matcher, which in this case would not make sense.
def test_scale():
    result = scale_one(3, 7)
    assert_that(result,
                any_of(divisible_by(3),
                       divisible_by(7)))
You get an error like the following.
Expected: (number divisible by 3 or number divisible by 7)
     but: was <17>

It lets you test exactly what the contract of scale_one promises; in this case, it would scale up one of the arguments by an integer factor.

The emphasis on the importance of testing precise contracts is not accidental. This emphasis, which is a skill that is possible to learn and has principles that are possible to teach, makes unit tests into something that accelerates the process of writing code rather than making it slower.

This misconception is much of the reason people have an aversion to unit tests, seeing them as something that wastes time for DevOps engineers, which leads to a lot of poorly tested code that is foundational for business processes such as the deployment of software. Properly applying principles of high-quality unit testing leads to a more reliable foundation for operational code.

5.2 Mocks, Stubs, and Fakes

Typical DevOps code has outsized effects on the operating environment. Indeed, this is almost the definition of good DevOps code. It replaces a significant amount of manual work. Testing DevOps code needs to be done carefully. You cannot simply spin up a few hundred virtual machines for each test run.

Automating operations means writing code that can significantly impact production systems if run haphazardly. When testing the code, it is worthwhile to have as few of these side effects as possible. Even if you have high-quality staging systems, sacrificing one every time there is a bug in operational code would lead to a lot of wasted time. It is important to remember that unit tests run on the worst code produced; the act of running them, and fixing bugs, means even code committed into feature branches is likely to be in better condition.

Because of that, you often try to run unit tests against a fake system. It is important to classify what you mean by fake and how it impacts unit tests and code design. It is worthwhile thinking about how to test the code well before writing it.

Test doubles are the neutral term for things that substitute for the systems not under test. Fakes, mocks, and stubs usually have a more precise meaning, although, in casual conversation, they are used interchangeably.

The most authentic test double is a verified fake. A verified fake fully implements the interface of the system not under test, though often in a simplified way: perhaps implemented less efficiently, often without touching any external operating system resources. The verified refers to the fact that the fake has its own tests, verifying that it does indeed implement the interface.

An example of a verified fake in tests is using a memory-only SQLite database instead of a file-based one. Since SQLite has its own tests, this is a verified fake. You can be confident it behaves like a real SQLite database.
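As a minimal sketch, suppose some code under test only needs a DB-API connection; the count_users function and the users table here are made up for illustration.
import sqlite3

def count_users(connection):
    # Code under test: only needs a DB-API connection.
    (count,) = connection.execute("SELECT COUNT(*) FROM users").fetchone()
    return count

def test_count_users():
    # A verified fake: a real, memory-only SQLite database.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (name TEXT)")
    connection.execute("INSERT INTO users VALUES ('some-user')")
    assert count_users(connection) == 1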

Below the verified fake is the fake. A fake implements an interface, but often in a form rudimentary enough that the implementation is simple and not worth the effort to test.

For example, it is possible to create an object with the same interface as subprocess.Popen, but that never actually runs the process. Instead, it simulates a process that consumes all standard input, outputs some predetermined content into standard output, and exits with a predetermined code.

This object, if simple enough, might be a stub. A stub is a simple object that answers with predetermined data, always the same, holding almost no logic. This makes it easy to write, but it constrains the tests it can support.
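A minimal sketch of such a stub might look like the following, assuming the code under test only calls communicate() and reads returncode.
class StubPopen:
    """A stub with part of the subprocess.Popen interface.

    It never runs a real process; it pretends to consume all standard
    input and answers with predetermined output and a predetermined
    exit code.
    """
    def __init__(self, output=b"", returncode=0):
        self._output = output
        self.returncode = returncode
    def communicate(self, input=None):
        # Ignore the input; return canned standard output and empty
        # standard error.
        return self._output, b""
    def wait(self, timeout=None):
        return self.returncode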

An inspector, or a spy, is an object that attaches to a test double and monitors the calls. Often, part of the contract of a function is that it calls some method with specific values. An inspector records the calls and can be used in assertions to make sure the right calls get the right arguments.

You get a mock if you combine an inspector with a stub or a fake. Since this means that the stub or fake has more functionality than the original (at least, whatever is needed to check the recording), this can lead to some side effects. However, the simplicity and immediacy of creating mocks often compensate by making the testing code simpler.
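A hand-rolled version of this combination, shown here as a sketch, records calls while answering with canned data; the unittest.mock module, used later in this chapter, automates exactly this.
import types

class SpyRunner:
    """A stub that also records how it was called (a simple mock)."""
    def __init__(self, canned_stdout=""):
        self.calls = []
        self.canned_stdout = canned_stdout
    def __call__(self, *args, **kwargs):
        # Inspector part: record the call for later assertions.
        self.calls.append((args, kwargs))
        # Stub part: always answer with the same predetermined data.
        return types.SimpleNamespace(stdout=self.canned_stdout, returncode=0)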

5.3 Testing Files

The filesystem is, in many ways, the most important thing about a Unix system. While the slogan everything is a file falls short of describing modern systems, the filesystem is still at the heart of most operations.

The filesystem has several properties that are worthwhile to consider when testing file manipulation code.

First, filesystems tend to be robust. While bugs in filesystems are not unknown, they are rare, far between, and usually only triggered by extreme conditions or an unlikely combination of conditions.

Next, filesystems tend to be fast. Consider that unpacking a source tarball, a routine operation, quickly creates many small files (several kilobytes). This is a combination of fast system call mechanisms and sophisticated cache semantics when reading or writing files.

Filesystems also have a curious fractal property; except for some esoteric operations, a sub-subdirectory supports the same semantics as the root directory.

Finally, filesystems have a very thick interface. Some of it is built into Python; consider that the module system reads files directly. There are also third-party C libraries that use their own internal wrappers to access the filesystem, and there are several ways to open files even in Python, such as the built-in open function and the low-level os.open call.

5.3.1 Testing with Subdirectories

For most file manipulation code, faking out or mocking the filesystem is a low return on investment. The investment is considerable: since a function could switch to low-level file manipulation operations, making sure you are only testing its contract would require reimplementing a significant portion of Unix file semantics. The return is low: using the filesystem directly is fast, reliable, and, as long as the code allows you to pass an alternative root path, almost side-effect free.

The best way to design file manipulation code is to allow passing in such a root path argument, even if the default is /. Given such design, the best way to test is to create a temporary directory, populate it appropriately, call the code, and garbage collect it.

If you create the temporary directory using Python's built-in tempfile module, you can configure the tox runner to put the temporary file inside of tox's built-in temporary directory. This keeps the general filesystem clean and is usually compatible with whatever version-control ignore file is already used to ignore tox artifacts.
[tox]
skipsdist = True
[testenv]
setenv =
    TMPDIR = {envtmpdir}
commands =
    python -c
        'import os,sys;os.makedirs(sys.argv[1], exist_ok=True)'
        {envtmpdir}
    # Rest of testing commands go here.
    python -c
        'import os,sys;print(os.stat(sys.argv[1]))'
        {envtmpdir}

Creating the temporary directory is important since Python's tempfile only uses the environment variable when it points to a real directory. Some versions of tox create the directory automatically, while others do not. It is best to ensure it exists to avoid unpleasant surprises.

As an example, you write tests for a function that looks for .js files and renames them as .py.
def javascript_to_python_1(dirname):
    for fname in os.listdir(dirname):
        if fname.endswith('.js'):
            os.rename(os.path.join(dirname, fname),
                      os.path.join(dirname, fname[:-3] + '.py'))
This function uses the os.listdir call to find the file names, and then renames them with os.rename.
def javascript_to_python_2(dirname):
    for fname in glob.glob(os.path.join(dirname, "*.js")):
        os.rename(fname, fname[:-3] + '.py')
This function uses the glob.glob function to filter by wildcard all the files that match the *.js pattern.
def javascript_to_python_3(dirname):
    for path in pathlib.Path(dirname).iterdir():
        if path.suffix == '.js':
            path.rename(path.parent.joinpath(path.stem + '.py'))
The function uses the built-in pathlib module (new in Python 3) to iterate over the directory and find its children. The real function under test might use any of these implementations.
def javascript_to_python(dirname):
    return random.choice([javascript_to_python_1,
                          javascript_to_python_2,
                          javascript_to_python_3])(dirname)

Since you cannot be sure which implementation the function uses, you are left with only one choice: test the actual contract.

To write a test, you define some helper code. In a real project, this code lives in a dedicated module, possibly named something like helpers_for_tests. This module would be tested with its own unit tests.

You first create a context manager for the temporary directory. This ensures that the temporary directory is cleaned up.
@contextlib.contextmanager
def get_temp_dir():
    temp_dir = tempfile.mkdtemp()
    try:
        yield temp_dir
    finally:
        shutil.rmtree(temp_dir)
Since this test needs to create many files, and you do not care about their contents, define a helper method.
def touch(fname, content=''):
    with open(fname, 'a') as fpin:
        fpin.write(content)
Now with the help of these functions, you can finally write a test.
def test_javascript_to_python_simple():
    with get_temp_dir() as temp_dir:
        touch(os.path.join(temp_dir, 'foo.js'))
        touch(os.path.join(temp_dir, 'bar.py'))
        touch(os.path.join(temp_dir, 'baz.txt'))
        javascript_to_python(temp_dir)
        assert_that(set(os.listdir(temp_dir)),
                    is_({'foo.py', 'bar.py', 'baz.txt'}))

For a real project, you would write more tests, many of them possibly using the get_temp_dir and touch helpers.
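For example, a sketch of one more such test covers a directory with no .js files; any of the three implementations should leave it untouched.
def test_javascript_to_python_no_js_files():
    with get_temp_dir() as temp_dir:
        touch(os.path.join(temp_dir, 'bar.py'))
        touch(os.path.join(temp_dir, 'baz.txt'))
        javascript_to_python(temp_dir)
        assert_that(set(os.listdir(temp_dir)),
                    is_({'bar.py', 'baz.txt'}))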

If you have a function that is supposed to check a specific path, you can have it take an argument to relativize its paths.

For example, let’s say you want a function to analyze the Debian installation paths and give you a list of all domains you download packages from.
def _analyze_debian_paths_from_file(fpin):
    for line in fpin:
        line = line.strip()
        if not line:
            continue
        line = line.split('#', 1)[0]
        parts = line.split()
        if parts[0] != 'deb':
            continue
        if parts[1][0] == '[':
            del parts[1]
        parsed = hyperlink.URL.from_text(parts[1])
        yield parsed.host

A naive approach would be to test _analyze_debian_paths_from_file. However, it is an internal function and has no contract. The implementation can change, perhaps reading the files and then scanning all strings, or possibly breaking up this function and letting the top level handle the line loop.

Instead, you want to test the public API.
def analyze_debian_paths():
    for fname in os.listdir('/etc/apt/sources.list.d'):
        with open(os.path.join('/etc/apt/sources.list.d', fname)) as fpin:
            yield from _analyze_debian_paths_from_file(fpin)

However, you cannot control the /etc/apt/sources.list.d directory without root privileges. Even with root privileges, letting each test run control such a sensitive directory would be a risk. Additionally, many continuous integration systems are not designed for running tests with root privileges for good reasons, making this a problematic approach.

Instead, you can generalize the function a little bit, which means intentionally expanding the official, public API of the function to allow testing. This is definitely a trade-off.

However, the expansion is minimal. All you need is an explicit directory in which to work. In return, you get to simplify the testing requirements while avoiding any kind of patching, which inevitably starts poking at private implementation details.
def analyze_debian_paths(relative_to='/'):
    sources_dir = os.path.join(relative_to, 'etc/apt/sources.list.d')
    for fname in os.listdir(sources_dir):
        with open(os.path.join(sources_dir, fname)) as fpin:
            yield from _analyze_debian_paths_from_file(fpin)
Now, using the same helpers as before, you can write a simple test for this.
def test_analyze_debian_paths():
    with get_temp_dir() as root:
        touch(os.path.join(root, 'foo.list'),
              content='deb http://foo.example.com\n')
        ret = list(analyze_debian_paths(relative_to=root))
        assert_that(ret, is_(['foo.example.com']))

Again, in a real project, you would write more than one test and try to make sure many more cases are covered. Those could be built using the same techniques.
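For instance, a second test (a sketch, with made-up hosts) could cover bracketed options, trailing comments, and non-deb lines.
def test_analyze_debian_paths_complex():
    with get_temp_dir() as root:
        touch(os.path.join(root, 'bar.list'),
              content=(
                  'deb [arch=amd64] http://bar.example.com/debian stable main # mirror\n'
                  'deb-src http://src.example.com/debian stable main\n'
              ))
        ret = list(analyze_debian_paths(relative_to=root))
        assert_that(ret, is_(['bar.example.com']))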

It is a good habit to add a relative_to parameter to any function that accesses specific paths.

5.3.2 Accelerating Tests with eatmydata

Unix operating systems try to be efficient with writes to disk. After data has been given to the OS to write, it tries to optimize when to write it to disk to minimize performance impact. This comes with a downside; some data might not be written to disk if the operating system crashes midstream.

Some applications cannot afford this risk. In these cases, there are ways to make sure the OS writes pending data to the disk. The name for this operation is sync (from synchronize).

There are a handful of system calls that expose sync-related functionality. The most common is fsync, wrapped by Python's os module as os.fsync(). These system calls trade performance in favor of correctness. In general, this is a good trade-off to make. When running tests, however, if the OS crashes midstream, the test results are suspect anyway, and the tests need to run again.

Because of that, when testing code that uses os.fsync(), it would be nice to be able to turn this functionality off. The risk of the data being lost is worth it.

The libeatmydata library and the eatmydata executable are tools to turn off the sync functionality in programs. They are not Python-specific, which means that they help even if the sync is deep inside a C library that is linked to the code under test.

For eatmydata to show its value, the code must use fsync. The following is a code example.
import datetime
import os

last = datetime.datetime.now()
with open("foo.txt", "w") as fpout:
    for i in range(1, 101):
        fpout.write("X")
        fpout.flush()
        os.fsync(fpout.fileno())
        if i % 10 == 0:
            current = datetime.datetime.now()
            print("Done", i, round((current-last).total_seconds(), 2))
            last = current
This code writes only a hundred bytes to a file. It does so extremely inefficiently. On a reasonably modern computer, the output might look like the following.
$ python write_stuff.py
Done 10 0.35
Done 20 0.34
Done 30 0.34
Done 40 0.34
Done 50 1.41
Done 60 0.34
Done 70 0.44
Done 80 0.46
Done 90 1.27
Done 100 0.46

Even in the best iterations, 10 bytes take more than a third of a second to write. As is typical for fsync(), this is not merely slow but variable. The worst iteration takes 1.41 seconds to write those ten bytes.

How much can eatmydata speed it up?
$ eatmydata python write_stuff.py
Done 10 0.0
Done 20 0.0
Done 30 0.0
Done 40 0.0
Done 50 0.0
Done 60 0.0
Done 70 0.0
Done 80 0.0
Done 90 0.0
Done 100 0.0

It is so fast that the rounding rounds it down to 0. Without rounding, numbers look like 4.2e-05—over 8000 times faster.

Note that this code is intentionally worst-case for fsync(). In general, the speed-up is not as dramatic.

It is possible to use eatmydata as eatmydata tox when running the entire tox test suite. However, this requires all developers to remember to use it. It is even better if tox does it.
[testenv]
allowlist_externals  =  eatmydata
commands =
    eatmydata  python  write_stuff.py

Note that it is important to add eatmydata to allowlist_externals for this to work correctly.

At the time of writing, tox deprecates running commands that are neither installed in the virtual environment nor explicitly allowed as external commands. A future version will disable that completely.

Another way of enabling eatmydata from tox relies on understanding its internal mechanism.

The way fsync() is disabled relies on an interesting feature in how the OS runs programs. Most programs are dynamically linked. This means they get functions from the standard C library, like fsync(), by looking for them after they start, not at compile time.

When the LD_PRELOAD variable is set, the dynamic linker loads a dynamic library before loading the libraries that were originally linked against it. This means that a function in the library that LD_PRELOAD points at overrides those from any explicitly linked libraries, including the standard C library.

The eatmydata executable sets the LD_PRELOAD variable to libeatmydata.so and then runs the command. With the right configuration, tox can do the same and skip a layer of abstraction.
[testenv]
setenv =
    LD_PRELOAD  =  libeatmydata.so
commands =
    python write_stuff.py

This style of configuration can be useful in many places. For example, environment variables are easy to turn on or off conditionally, depending on the environment. This allows some environments to run the command as is (perhaps where eatmydata is not available) and lets others use it.

5.3.3 Accelerating Tests with tmpfs

tmpfs is an in-memory filesystem. In other words, when it is mounted, it does not have a backing hard-drive store. All files on a tmpfs are gone when the operating system reboots or crashes.

One way to mount tmpfs is by using containers. Many CI/CD systems either run the build and test steps in containers or can be configured to do so.

The exact steps to mount tmpfs into a container depend on how the containers are running. In Kubernetes, this is done by an emptyDir volume with emptyDir.medium set to Memory. When using Docker or nerdctl to run containers, this is done with the --tmpfs MOUNT_DIRECTORY argument.

Assume a container has a mounted tmpfs as /app/tmpdir. One way to accelerate any test that writes to files, but especially one that uses fsync() heavily, is to make sure the files are written inside /app/tmpdir.

To show how to do this, the preceding code needs to become slightly more sophisticated. The code always writes to a file named foo.txt in the current directory.

A more typical way of writing file handling code is to avoid hardcoding. Most code is written to accept file names as parameters. This code is similar to the preceding code, written as a function that accepts a file name.
def write_to_file(fname):
    last = datetime.datetime.now()
    with open(fname, "w") as fpout:
        for i in range(1, 101):
            fpout.write("X")
            fpout.flush()
            os.fsync(fpout.fileno())
            if i % 10 == 0:
                current = datetime.datetime.now()
                print("Done", i, round((current-last).total_seconds(), 2))
                last = current
A typical way of testing such code is to use the tempfile module. This module creates temporary files, which is ideal for tests.
def test_writer():
    with tempfile.NamedTemporaryFile() as fp:
        write_stuff.write_to_file(fp.name)
        data = fp.read()
    raise ValueError(len(data))

To have clearer output, this test is made to fail. The final raise in the last line allows the test output to be more thorough, as pytest adds debugging output.

Often tox.ini is written to force temporary files into the .tox directory. Since tox cleans the temporary directory between runs, this helps make sure there are no ugly leftovers, even in cases where code crashes catastrophically.
[testenv]
deps = pytest
setenv =
    TEMP = {envtmpdir}
commands =
    pytest test_write_stuff.py
When running tox, this can take a while.
$ tox
...
test_write_stuff.py F                                                [100%]
================================ FAILURES =================================
_______________________________ test_writer _______________________________
    def test_writer():
        with tempfile.NamedTemporaryFile() as fp:
            write_stuff.write_to_file(fp.name)
            data = fp.read()
>       raise ValueError(len(data))
E       ValueError: 100
test_write_stuff.py:9: ValueError
--------------------------- Captured stdout call --------------------------
Done 10 0.54
Done 20 1.27
Done 30 0.5
Done 40 0.46
Done 50 1.34
Done 60 0.76
Done 70 0.56
Done 80 1.37
Done 90 0.59
Done 100 1.32
...

As expected, this fsync() heavy code takes a long time to write one hundred bytes.

If the tmpfs is mounted on /app/tmpdir, it is possible to force the tempfile module to use that. The trick is to know the order of the environment variables used in tempfile. The TMPDIR environment variable takes precedence.
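A quick way to check which directory tempfile will use, assuming /app/tmpdir is the mount point, is a small sketch like the following.
import os
import tempfile

os.environ["TMPDIR"] = "/app/tmpdir"  # hypothetical tmpfs mount point
tempfile.tempdir = None  # discard the cached default so it is recomputed
print(tempfile.gettempdir())  # prints /app/tmpdir if that directory exists

Setting the variable when invoking tox has the same effect.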
$ TMPDIR=/app/tmpdir tox
...
test_write_stuff.py F                                                [100%]
================================= FAILURES ================================
_______________________________ test_writer _______________________________
    def test_writer():
        with tempfile.NamedTemporaryFile() as fp:
            write_stuff.write_to_file(fp.name)
            data = fp.read()
>       raise ValueError(len(data))
E       ValueError: 100
test_write_stuff.py:9: ValueError
--------------------------- Captured stdout call --------------------------
Done 10 0.0
Done 20 0.0
Done 30 0.0
Done 40 0.0
Done 50 0.0
Done 60 0.0
Done 70 0.0
Done 80 0.0
Done 90 0.0
Done 100 0.0
...

Even though the fsync() system call is not intercepted by a library like eatmydata, the speed rises dramatically. Since the tmpfs filesystem is memory-only, fsync() does nothing on it and takes little time.

5.4 Testing Processes

Testing process-manipulation code is often a subtle endeavor, full of trade-offs. In theory, process-running code has a thick interface with the operating system. You learned about the subprocess module, but you can directly use the os.spawn* functions or the os.fork and os.exec* functions. Likewise, the standard output/input communication mechanism can be implemented in many ways, including the Popen abstraction or directly manipulating file descriptors with os.pipe and os.dup.

Process-manipulation code can also be some of the most fragile. Running external commands depends on the behavior of those commands as a starting point. The interprocess communication means that the flow is inherently concurrent. It is too easy to make the tests rely on ordering assumptions that are not always true. Those mistakes can lead to flaky tests that pass most of the time but fail under seemingly random circumstances.

Those ordering assumptions can sometimes hold more often on development machines or unloaded machines, which means bugs might only be exposed in production, or possibly only in production under extreme circumstances.

This is one of the reasons the chapter about using processes concentrated on ways to reduce concurrency and keep things more sequential. It is worthwhile to carefully design process code to be reliably testable. This design pressure often pushes the code to be simpler and more reliable.

If the code only uses subprocess.run without taking advantage of exotic parameters, it is possible to use a simplified form of a pattern called dependency injection to make it testable. In this case, dependency injection is just a fancy way of saying passing parameters to a function. Consider the following function.
def error_lines(container_name):
    ret_value = subprocess.run(
        ["docker", "logs", container_name],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in ret_value.stdout.splitlines():
        if 'error' in line:
            yield line
This function is unpleasant to test. Advanced patching can replace subprocess.run, but this would be error-prone and rely on implementation details. Instead, dependency injection explicitly elevates that implementation detail into being a part of the contract.
def error_lines(runner, container_name):
    ret_value = runner(
        ["docker", "logs", container_name],
    )
    for line in ret_value.stdout.splitlines():
        if 'error' in line:
            yield line

Now that runner is part of the official interface, testing becomes easier. This might seem a trivial change, but it is deeper than it looks. In some sense, error_lines has constrained its interface to process running.

The new version can be tested with unittest.mock.
def test_error_lines():
    runner = mock.MagicMock()
    runner.return_value.stdout = textwrap.dedent("""
    hello
    error: 5 is not 6
    goodbye
    """)
    lines = list(error_lines(runner, "cool-container"))
    assert lines == ["error: 5 is not 6"]
    args, kwargs = runner.call_args
    assert kwargs == {}
    assert len(args) == 1
    [single_arg] = args
    assert single_arg == ["docker", "logs", "cool-container"]
    assert_that(lines, is_(["error: 5 is not 6"]))

The textwrap.dedent() function is useful in tests that need to create a multi-line string. This makes the code look nicer and indentation compatible without changing its values.

Note that this test does not restrict itself to only checking the contract. The code for error_lines could have run, for example, docker logs -- <container_name>.

The test can slowly improve, if the implementation does change, until fidelity between the test and real life is achieved.

For example, the test can be modified to support -- as an argument separator.
def test_error_lines():
    runner = mock.MagicMock()
    runner.return_value.stdout = textwrap.dedent("""
    hello
    error: 5 is not 6
    goodbye
    """)
    lines = list(error_lines(runner, "cool-container"))
    assert lines == ["error: 5 is not 6"]
    args, kwargs = runner.call_args
    assert kwargs == {}
    assert len(args) == 1
    [single_arg] = args
    command, rest = single_arg[:2], single_arg[2:]
    assert command == ["docker", "logs"]
    if rest[0] == "--":
    del rest[0]
assert rest == ["cool-container"]

This still works with both the old version of the code and the post-modification code. Fully emulating Docker is not realistic or worthwhile. However, this approach slowly improves the accuracy of the test with no downsides.

If a significant amount of the code interfaces, for example, with Docker, there is often a mini-Docker emulator that can be factored out. Using higher-level abstractions for process running helps with this sort of approach.
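As a sketch of such a factored-out emulator, the following fake runner only understands docker logs and can be reused by any test that needs it.
import subprocess

class FakeDockerRunner:
    """A mini-Docker emulator standing in for the runner parameter."""
    def __init__(self, logs_by_container):
        self.logs_by_container = logs_by_container
    def __call__(self, argv):
        if argv[:2] != ["docker", "logs"]:
            raise ValueError("unsupported command", argv)
        return subprocess.CompletedProcess(
            args=argv,
            returncode=0,
            stdout=self.logs_by_container[argv[-1]],
        )

def test_error_lines_with_fake_docker():
    runner = FakeDockerRunner({"cool-container": "hello\nerror: 5 is not 6\n"})
    lines = list(error_lines(runner, "cool-container"))
    assert lines == ["error: 5 is not 6"]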

Because processes are so hard to test, it is good to use process running only when necessary. Especially when porting over shell scripts to Python, often a good idea when they grow in complexity, it is good to substitute long pipelines with in-memory data processing.

Especially when factoring the code the right way, with the data processing as a simple, pure function that takes an argument and returns a value, the bulk of the code becomes much easier to test.

Imagine, for example, the following pipeline.
ps aux | grep conky | grep -v grep | awk '{print $2}' | xargs kill

This kills all processes that have conky in their names.

Here is a way to refactor the code to make it easier to test.
def get_pids(lines):
    for line in lines:
        if 'conky' not in line:
            continue
        parts = line.split()
        pid_part = parts[1]
        pid = int(pid_part)
        yield pid
def ps_aux(runner):
    ret_value = runner(["ps", "aux"])
    return ret_value.stdout.splitlines()
def kill(pids, *, killer):
    for pid in pids:
        killer(pid, signal.SIGTERM)
def main():
    runner = functools.partial(
        subprocess.run,
        capture_output=True,
        text=True,
        check=True,
    )
    killer = os.kill
    kill(get_pids(ps_aux(runner)), killer=killer)

Note how the most complicated code is now in a pure function: get_pids. Hopefully, this means most bugs are there. Unit tests for get_pids can find those.
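For example, a unit test for get_pids only needs a few sample ps aux lines; the lines below are made up.
def test_get_pids():
    lines = [
        "USER   PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND",
        "moshe  123  0.0  0.1  10000  2000 ?   S    10:00 0:00 conky -d",
        "moshe  456  0.0  0.1  10000  2000 ?   S    10:00 0:00 bash",
    ]
    assert list(get_pids(lines)) == [123]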

The code that is harder to unit test, ps_aux, has an ad hoc dependency injection. It is also simpler, so less testing is needed.

The main logic is in functions that do data processing. Testing those requires supplying data structures and observing the return value.

This moves potential bugs from the system-related code to pure logic code. The system-related code requires more effort to unit test than the pure logic code.

This does not reduce the bugs. It does move them to where they can be caught with unit tests. Writing these unit tests and making sure they pass reduces the total number of bugs in a shipped version.

With the input argument to subprocess.run, almost all process-manipulation code can be written like this.

For example, the following code makes a (potentially empty) commit and checks that the commit message is correct. It assumes it is running in a valid git directory.
def empty_hello_commit(runner):
    runner(
        ["git", "commit", "--allow-empty", "-F", "-"],
        input="hello world ",
    )
    ret_value = runner(
        ["git", "log", "-n", "1"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = iter(ret_value.stdout.splitlines())
    for line in lines:
        if line == "":
            break
    if next(lines).strip() != "hello world":
        raise ValueError("commit failed", ret_value.stdout)

While it would have been easier to use git commit -m, this shows how to feed standard input into commands. In this case, git commit -F - expects a commit message on standard input.

The main code needs to initialize a runner.
runner = functools.partial(subprocess.run,
    capture_output=True,
    text=True,
    check=True,
)
empty_hello_commit(runner)

The empty_hello_commit function can be tested with similar techniques. It accepts a runner function as a parameter.
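A sketch of such a test, using unittest.mock as before, only needs the fake git log output to contain a blank line followed by the commit message.
import textwrap
from unittest import mock

def test_empty_hello_commit():
    runner = mock.MagicMock()
    # The "git log -n 1" call reads this; the commit message comes
    # after the blank line.
    runner.return_value.stdout = textwrap.dedent("""\
        commit 0000000000000000000000000000000000000000
        Author: Some Author <author@example.com>

            hello world
        """)
    empty_hello_commit(runner)
    [(commit_args, commit_kwargs), (log_args, log_kwargs)] = runner.call_args_list
    assert commit_args[0][:2] == ["git", "commit"]
    assert commit_kwargs["input"].strip() == "hello world"
    assert log_args[0][:2] == ["git", "log"]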

5.5 Testing Networking

When writing network code that deals with lower-level concepts, such as sockets, it is useful to write them in a test-friendly way. Since the creation of the socket object is separate from its usage, a lot of mileage can be gotten out of writing functions that accept socket objects and create them outside.

To simulate extreme conditions and see if the code can work despite them, you might want to write a dedicated class as a socket fake. To reduce boilerplate when writing the class, the following code uses the attrs library. This library can be installed in a virtual environment using pip install attrs.
import attr
@attr.s
class FakeSimpleSocket:
    _chunk_size = attr.ib()
    _received = attr.ib(init=False, factory=list)
    _to_send = attr.ib()
    def connect(self, addr):
        pass
    def send(self, blob):
        actually_sent = blob[:self._chunk_size]
        self._received.append(actually_sent)
        return len(actually_sent)
    def recv(self, max_size):
        chunk_size = min(max_size, self._chunk_size)
        received, self._to_send = (self._to_send[:chunk_size],
                                   self._to_send[chunk_size:])
        return received

This allows you to control the size of chunks. An extreme test would be to use a chunk_size of 1. This means bytes go out one at a time and come in one at a time. No real network would be this bad, but a unit test allows you to simulate more extreme conditions than any reasonable network.

This fake is useful for testing networking code. For example, this code does some ad hoc HTTP to get a result.
import json
def get_get(sock):
    sock.connect(('httpbin.org', 80))
    sock.send(b'GET /get HTTP/1.0\r\nHost: httpbin.org\r\n\r\n')
    res = sock.recv(1024)
    return json.loads(res.decode('ascii').split('\r\n\r\n', 1)[1])
if __name__ == '__main__':
    # Sample code to exercise get_get
    import socket
    print(get_get(socket.socket()))
This code has a subtle bug in it. You can uncover the bug with a simple unit test using the socket fake.
def test_get_get():
    result = dict(url='http://httpbin.org/get')
    headers = b'HTTP/1.0 200 OK\r\nContent-Type: application/json\r\n\r\n'
    output = headers + json.dumps(result).encode("ascii")
    fake_sock = FakeSimpleSocket(to_send=output, chunk_size=1)
    value = get_get(fake_sock)
    assert_that(value, is_(result))

This test would fail: get_get assumes a good-quality network connection, while the fake simulates a bad one. The test would succeed if you changed chunk_size to 1024.

You could run the test in a loop, testing chunk sizes from 1 to 1024. In a real test, you would also check the sent data and send invalid results to see the response. The important thing is that none of those things need setting up clients or servers or trying to realistically simulate bad networks.
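One sketch of such a loop uses pytest's parametrize; with the buggy get_get above, the small chunk sizes fail while 1024 passes.
import json
import pytest

@pytest.mark.parametrize("chunk_size", [1, 2, 16, 1024])
def test_get_get_chunked(chunk_size):
    result = dict(url='http://httpbin.org/get')
    headers = b'HTTP/1.0 200 OK\r\nContent-Type: application/json\r\n\r\n'
    output = headers + json.dumps(result).encode("ascii")
    fake_sock = FakeSimpleSocket(to_send=output, chunk_size=chunk_size)
    value = get_get(fake_sock)
    assert_that(value, is_(result))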

5.6 Testing HTTP Clients

Code that uses httpx as a web client can be made test-friendly by having most of it accept a httpx.Client as an argument. This is a good practice since it also allows adding support for complicated networking setups and configuring optimization parameters in one place.

A reductive example, using httpbin.org, can use the /put endpoint and the fact that it returns the data as the data parameter. The following code does not accomplish much, but it is an example of the skeleton of a function that accepts an httpx.Client as a parameter.
def put_httpbin(client):
    resp = client.put("https://httpbin.org/put", json=dict(a=1, b=2))
    resp.raise_for_status()
    resp_value = resp.json()
    print("debug", resp_value)
    data = json.loads(resp_value["data"])
    return data["a"] + data["b"]
Assuming httpbin.org works correctly, it always returns 3. It is possible to write a naive test that runs it against the real httpbin.org.
def test_put_httpbin_real():
    with httpx.Client() as client:
        value = put_httpbin(client)
    assert value == 4
This test fails since it compares the output to 4, not 3. Running this failing test with pytest shows the print output ("debug"...) in the function.
--------------------------- test_put_httpbin_real ---------------------------
    def test_put_httpbin_real():
        with httpx.Client() as client:
            value = put_httpbin(client)
>       assert value == 4
E       assert 3 == 4
httpbin_httpx.py:36: AssertionError
--------------------------- Captured stdout call --------------------------
debug {'args': {}, 'data': '{"a": 1, "b": 2}', ...

This test works correctly if the right value is used in the comparison. Running the tests against httpbin.org does come with its share of problems.

Some CI/CD environments limit communications with the outside Internet. The httpbin.org site might be down, causing tests to fail. Alternatively, running too many of these tests, perhaps due to a flurry of PRs, could lead to the endpoint denying the originating IP of the CI/CD system because of a sudden load spike.

One way to solve this is to use the fact that httpx.Client can run against in-process WSGI applications. Any Python web framework that can produce WSGI applications can be used to write a local emulation.

The following code uses Pyramid to build a minimal WSGI application, which is enough to test the client.
from pyramid import response, config
def return_put_data(request):
    data = request.body.decode("ascii")
    resp_value = json.dumps(dict(data=data)).encode("ascii")
    return response.Response(resp_value, content_type="application/json")
def make_app():
    with config.Configurator() as cfg:
        cfg.add_route('put', '/put')
        cfg.add_view(return_put_data, route_name='put')
        app = cfg.make_wsgi_app()
    return app

Notice that it is not a perfect emulation. It does not return the args or other fields that httpbin.org returns, only the data field.

This code makes it possible to write a test that checks the client against the local emulation.
def test_put_httpbin_fake():
    with httpx.Client(app=make_app()) as client:
        value = put_httpbin(client)
    assert value == 4

This is the same as the test_put_httpbin_real test, except the app parameter is passed to the httpx.Client constructor. This parameter causes all HTTP calls that the client makes to be sent directly to the application.

This test causes the same failure as the real test.
--------------------------- test_put_httpbin_fake ---------------------------
    def test_put_httpbin_fake():
        with httpx.Client(app=make_app()) as client:
            value = put_httpbin(client)
>       assert value == 4
E       assert 3 == 4
httpbin_httpx.py:30: AssertionError
--------------------------- Captured stdout call --------------------------
debug {'data': '{"a": 1, "b": 2}'}

Note that the debug output is different. This request has been made against the in-process WSGI application, not the real httpbin.org website.

This technique does require reconstructing a server version of any API the client uses. The implementation does not have to be complete, just enough to exercise the code and emulate any interesting edge-cases the real server has.

Because httpx.Client accepts any WSGI application, any Python web framework can be used to write the server emulation. Many Python web frameworks, including Flask, Pyramid, and Django, are popular and well-documented.

This makes writing such server emulations easier since both project-generated documentation and user-generated documentation are abundant. This expertise can be tapped for writing HTTP client tests, especially when working in a team that already uses Python web frameworks for a different reason.
