Chapter 5. Control Development, Test, Acceptance, and Production Environments

Testing shows the presence, not the absence of bugs.

Edsger Dijkstra, Software Engineering Techniques

Best Practice:

  • Create separate environments for various stages in the development pipeline.

  • Keep these environments as similar as possible and strictly control the progression of code from one environment to the next.

  • This improves speed and predictability of development, because defects found in each environment can be diagnosed clearly and fixed without disturbing the development flow.

Imagine that in your development team, a sprint-specific Definition of Done is defined and a version control system is in place. The developers can now write code and quickly merge their versions. You notice that development of new features proceeds quickly, but testing, acceptance testing, and production monitoring take a long time to perform and validate. A fair amount of code that was tested successfully in the test environment still fails during acceptance testing. And the team struggles when it has to fix defects in code that was written long ago.

These are the kinds of issues that may be caused by inconsistencies between different environments: Development, Test, Acceptance, and Production (DTAP for short). Let us briefly review these four environments:

  • Development, in which developers modify source code. This environment is optimized for developer productivity.

  • Test, in which the system is tested in various ways (e.g., validating whether technical requirements are met). It is optimized for testing the developers’ work efficiently and in combination.

  • Acceptance, in which it is verified whether user needs are met. It is optimized for mimicking the production environment as realistically as possible.

  • Production, where the system is made available to its users. It therefore should be optimized for operation—that is, it needs to be secure, reliable, and perform well.

By controlling this “DTAP street” (the pipeline from development through production), you are in a better position to interpret problems, in particular by ruling out inconsistent environments as a cause.

Controlling DTAP means defining, agreeing on, and standardizing three main characteristics:

  • The configuration of different environments.

  • The transfer from one environment to another.

  • Responsibilities of each environment—that is, which activities are performed where (notably, whether fixes for issues found in later stages always need to be retested in earlier stages).

Making assumptions about the behavior of different environments causes trouble. Consider an example in which bugs are found in production after acceptance testing. If you can only guess that the acceptance and production environments are “fairly similar” in configurations, version numbers, etc., it will be a lot harder to find the underlying causes of bugs.

Take careful notice: for this best practice to work, the organization needs to be involved. An IT department, for example, may be reluctant to give up control over environments. Yet separating environments is a crucial step, and failing to do it properly will jeopardize progress later!

Motivation

Controlling your DTAP street is useful for at least the following reasons:

  • It clarifies the responsibilities of the different development phases, avoiding undesirable behavior (such as fixing bugs directly in the acceptance environment).

  • It makes the effort required in each development phase predictable, and thus supports planning.

  • It allows identifying bottlenecks and problems in the development process early.

  • It reduces dependence on key personnel.

Consider that without defined environments, testing, acceptance, and pushing to production “occur somewhere, sometime.” The DTAP environments are undefined (Figure 5-1).

Figure 5-1. Uncontrolled DTAP

In this situation it is rather likely that the environments are set up and configured differently. They might share resources (or not), they might use similar test data, and they might be configured properly. Who knows? In fact, we often see that nobody can answer these questions with certainty.

In many organizations the availability of these environments is an issue, especially for acceptance environments. Often, the acceptance environment has to be reserved and test runs planned well in advance. Similarly, the test environment is often shared with other systems. Without a clear separation of capacity, it is hard to tell whether performance test results reflect code changes or the environment itself.

In a highly controlled DTAP street (Figure 5-2), the environments’ configuration and setup are predictable and consistent with agreements. Notably, the test, acceptance, and production environments should be as similar as possible. Separation between environments may be technical (e.g., a transfer between different servers) or formal (a handover), and it may be physical (different servers) or virtualized (different instances on one physical server).

Figure 5-2. Controlled DTAP

Controlled DTAP Clarifies Responsibilities Between Development Phases

Different development environments should separate concerns, just like good software should separate concerns in its implementation. Let us discuss the typical boundaries of different environments:

  • A separation between development and test environments distinguishes clearly which code is ready for testing and which code is under development. Unit tests are commonly performed locally by developers, but these ought to be repeated in a test environment (typically managed by a Continuous Integration server).

  • The separation between test and acceptance environments is needed to avoid wasting acceptance-testing time on code that is insufficiently tested and not yet ready. The acceptance environment should be as similar to the production environment as possible in order to obtain realistic results.

  • The separation between acceptance and production is needed to prevent insufficiently verified code from going to production, that is, code for which it has not been confirmed that the system behaves the way it is supposed to. Typical examples are performance or reliability issues due to implementation or configuration flaws in the production environment.

Controlled DTAP Allows for Good Predictions

When you have a clear separation of responsibilities in the DTAP street, you can meaningfully measure the time spent in each phase. The more consistent the environments are with each other, the better you can compare those measurements. This is especially useful for predicting and estimating the time-to-market of new functionality: a clear separation of environment responsibilities enables accurate estimates of the lead time of each development phase (typically the four DTAP phases, but the division can be more fine-grained).

Controlled DTAP Reveals Development Bottlenecks and Explains Problems More Easily

When you have an overview of the time it takes to develop new features, you can track the time between the introduction of a bug and its discovery, and the time it takes to resolve it. See Figure 5-3 for a simple visualization of such a feedback cycle.

Figure 5-3. DTAP feedback cycles

By measuring, you can verify the difference in effort between issues identified early and issues identified late in the development process. Clearly, issues found in production take more time to fix than those found in the test environment. Therefore, you want to keep the feedback loop as short as possible. This is why effective automated testing makes a huge difference: it identifies issues early and effortlessly.

Controlled DTAP Reduces Dependence on Key Personnel

With more consistency between environments, less specialist knowledge is required to fix problems or set up infrastructure. This is generally true for standardization: consistency means less dependence on assumptions and specialist knowledge.

This is of particular interest because different specialties are involved: development, testing (including test data, scenarios, and scripting), and infrastructure. Consider how much expertise is necessary when none of these is standardized. Not all of this expertise may reside in the development team, so there may be dependencies on other teams, other departments, or even other companies (when infrastructure or hosting is outsourced).

How to Apply the Best Practice

Usually, the production environment is already clearly distinguished from the other environments in terms of organization. This is because often another department (operations) or another organization (hosting provider) is contracted to provide and manage the system’s operation. To get the most out of the distinction between environments, you should:

  • Separate concerns and restrict access to the different environments: developers should not, by default, be able to access the production environment, for example. For the test and acceptance environments, the distinction is less strict, e.g., when work is performed by cross-functional (DevOps) teams.

  • Only push code into the next environment when all tests have succeeded: when a test fails, the code should quickly return to the development environment so that the developers can fix the underlying cause. For consistency, bug fixes should be made in the development environment only. This seems evident, but we often see bug fixes being made in the acceptance environment and then backported (merged into an earlier version). See also Figure 5-3, and the promotion-gate sketch after this list.

  • Have the test environments resemble production as much as possible. This includes at least uniformity in versions, (virtualized) hardware configuration, and the configured usage of frameworks, libraries, and software, as well as representative test data and scenarios. A particular point of interest is the actual test data. The ideal, most realistic dataset is an anonymized production copy, but this is often restricted by security requirements, or infeasible because of the production database size or other technical issues. For systems in which data integrity is especially important, one solution is to use an older backup whose data has lost its sensitivity.
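As an illustration of the second point in the list above, the following minimal sketch enforces a promotion gate: code moves to the next environment only when the current environment’s tests pass. The environment names and the ./run-tests.sh command are assumptions made for the example, not a standard tool.

    import subprocess
    import sys

    # The ordered DTAP street; names are illustrative.
    PIPELINE = ["development", "test", "acceptance", "production"]

    def tests_pass(environment: str) -> bool:
        """Run the test suite configured for this environment (hypothetical script)."""
        return subprocess.run(["./run-tests.sh", environment]).returncode == 0

    def promote(version: str, current: str) -> None:
        """Promote a version to the next environment only if all tests succeed."""
        if not tests_pass(current):
            # A failing test sends the code back to development, not onward.
            sys.exit(f"{version}: tests failed in {current}; "
                     "fix in development and push through the street again")
        nxt = PIPELINE[PIPELINE.index(current) + 1]
        print(f"{version}: all tests passed in {current}, promoting to {nxt}")

    promote("release-1.4.2", "test")

In practice such a gate usually lives in deployment tooling rather than a standalone script, but the rule it encodes is the same: a failed test means back to development, never onward.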

Measuring the DTAP Street in Practice

Suppose that you have taken steps to clearly separate your DTAP environments. You now want to identify the effect on productivity: the effort the team needs to implement new features. As team lead, you could therefore come up with the following GQM (Goal-Question-Metric) model:

  • Goal A: To understand productivity by gaining insight into the effort spent in the different DTAP phases.

    • Question 1: What is the average time spent in each development phase for a particular feature?

      • Metric 1: Time spent on realizing features/issues/user stories, divided by DTAP phase. This could be the average time spent from development to acceptance, as measured by a Continuous Integration server (see Chapter 7). The time spent on the final push to production is typically registered manually. At the start you may find much effort concentrated in the acceptance environment. Gradually you may expect the time spent in the test and acceptance environments to decrease, as tests improve and processes run more smoothly. Ideally, the maximum amount of time is spent in development producing new features (instead of fixing bugs that were introduced earlier).

    • Question 2: How many features pass through the testing environment but fail in the acceptance environment?

      • Metric 2: Percentage of features/issues/user stories rejected during acceptance testing. You could take the average for each sprint. Expect a downward trend. Investigate acceptance test rejections to see whether human error, unclear expectations, configuration differences, or unclear requirements played a role. This metric indicates how well code is developed and tested, but also signals how well the test and acceptance environments are aligned.

    • Question 3: How long are the feedback cycles on average?

      • Metric 3: Time between checking erroneous code into version control and the discovery of the issue. This timeline can be determined by analyzing the bug and sifting through version control. Tracing the issue is clearly easier if you follow the version control guideline of making specific commits with issue IDs (refer back to “Commit Specifically and Regularly”). Expect this metric to follow the feature phase time, with a declining trend. (A sketch of computing Metrics 1 and 2 follows this list; Metric 3 can be computed analogously from commit and discovery timestamps.)
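To make Metrics 1 and 2 concrete, here is a minimal sketch, assuming a hypothetical export of issue records with the date each DTAP phase started and an acceptance verdict; the field names and data are illustrative, not a tracker’s actual API.

    from datetime import date
    from statistics import mean

    # Hypothetical issue records: the date each DTAP phase started, plus
    # whether the feature passed acceptance testing on the first attempt.
    issues = [
        {"id": "FOO-12", "development": date(2024, 3, 1), "test": date(2024, 3, 4),
         "acceptance": date(2024, 3, 5), "accepted": True},
        {"id": "FOO-15", "development": date(2024, 3, 2), "test": date(2024, 3, 7),
         "acceptance": date(2024, 3, 8), "accepted": False},
    ]

    # Metric 1: average time spent per phase (development and test shown here).
    dev_days = [(i["test"] - i["development"]).days for i in issues]
    test_days = [(i["acceptance"] - i["test"]).days for i in issues]
    print(f"average development phase: {mean(dev_days):.1f} days")
    print(f"average test phase: {mean(test_days):.1f} days")

    # Metric 2: percentage of features rejected during acceptance testing.
    rejected = sum(1 for i in issues if not i["accepted"])
    print(f"acceptance rejection rate: {100 * rejected / len(issues):.0f}%")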

This model is initially designed to obtain a baseline: the initial value from which you can measure improvement. Consider the first metric: the time spent in each phase, combined with the duration of feedback cycles, tells you how smoothly development moves through the DTAP street. In combination these metrics can help you understand where improvements are possible. For example, when issues seem to take a long time to resolve, an underlying problem could be that team members are unevenly distributed over the different environments. There may be too many developers compared to the number of testers, so that the testers cannot keep up with the work, resulting in a longer phase time in the testing environment.

For this, you can use the following GQM model:

  • Goal B: To understand whether the team is properly staffed/distributed over the DTAP environments.

    • Question 4: Is the development capacity balanced with the testing capacity?

      • Metric 4a: Number of work items for testers. Ideally, the amount of work is stable and in line with the testing capacity; that is, the metric moves toward a stable “standard backlog.” If the backlog keeps rising, the testers apparently cannot keep up with the load. That may have different causes (e.g., a lack of test capacity, a lack of test automation, or unclear requirements).

      • Metric 4b: Workload of different roles (developers, testers, infrastructure specialists). Depending on the team’s exact composition, the workload distribution might be uneven. This can be measured by monitoring actual working hours (as recorded in your time registration). Comparable to the number of work items for testers, a growing number of (over)hours signals that a backlog is building up. You do have to assume that team members log their hours consistently and accurately.

    • Question 5: Are issues resolved at a faster pace than they are created?

      • Metric 5: The number of created issues minus the number of resolved issues, averaged per week. This excludes closed issues that are unresolved. A negative value signifies that the backlog is shrinking and that the team can manage the issue load. Expect a downward trend that stabilizes at a point where the number of created issues is, on average, about the same as the number of resolved issues. That signifies that the influx of issues can be managed, even when taking peaks and troughs into consideration. In Figure 5-4, the surfaces signify whether more issues are resolved than created (green) or whether the backlog is growing (red); a sketch of computing this metric per week follows the figure.

Figure 5-4. Created versus resolved issues per day
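Here is a minimal sketch of Metric 5, assuming a hypothetical list of issue events exported from the tracker; the event list and its format are illustrative.

    from collections import Counter
    from datetime import date

    # Hypothetical issue events: (date, "created" or "resolved").
    events = [
        (date(2024, 3, 4), "created"), (date(2024, 3, 5), "created"),
        (date(2024, 3, 6), "resolved"), (date(2024, 3, 12), "created"),
        (date(2024, 3, 13), "resolved"), (date(2024, 3, 14), "resolved"),
    ]

    # Net created-minus-resolved issues, aggregated per ISO week.
    net_per_week = Counter()
    for day, kind in events:
        year, week, _ = day.isocalendar()
        net_per_week[(year, week)] += 1 if kind == "created" else -1

    for (year, week), net in sorted(net_per_week.items()):
        trend = "backlog growing" if net > 0 else "backlog shrinking or stable"
        print(f"{year}-W{week:02d}: net {net:+d} ({trend})")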

With this GQM model you can determine whether a bottleneck exists in your staffing: when testing work is piling up, you may need more testers or different/better tests. Conversely, when testers are underoccupied, you may want to increase the number of developers. Over time you can determine norms that count as “business as usual”; for example, a 1:1 ratio between development and testing effort.

Consider that it is advantageous to split functionality into small parts so that they reach the testing and acceptance phases more quickly. In terms of automation and tooling, you can also improve the speed and predictability with which code moves through the development pipeline by doing the following:

  • Automate your tests to speed up the two testing phases (see Chapter 6)

  • Implement Continuous Integration to improve the speed and reliability of building, integrating, and testing code modifications (see Chapter 7)

  • Automate deployment (see Chapter 8)

These practices are discussed in the following chapters. Notice that test automation is discussed before Continuous Integration because an important advantage of Continuous Integration servers is that they kick off tests automatically.

Common Objections to DTAP Control Metrics

Common objections to controlling the DTAP street are that it slows down development and that distinguishing test from acceptance environments is unnecessarily complex.

Objection: A Controlled DTAP Street Is Slow

“Controlling DTAP will actually slow down our development process because we need more time to move our code between environments.”

There may be some overhead in moving code from the development environment to the test environment, and from the test environment to the acceptance environment. But you “win back” that time when bugs arise: analyzing bugs is faster when you know in which phase they occur, because that gives you information about their causes. Moreover, these transitions are well suited for automation.

The pitfall of controlling different environments is ending up with a classical “nothing gets in” kind of pipeline, with formal tollgates or entry criteria (e.g., a rigid interpretation of service management approaches such as ITIL).

Objection: There Is No Need to Distinguish Test and Acceptance Environments

“We can do all the testing, including acceptance testing, in one environment, so it is unnecessarily complex to create a separate acceptance environment.”

An acceptance environment requires an investment to make it resemble the production environment as much as possible. But this has several advantages. First, you can better distinguish integration issues (essentially technical in nature) from acceptance issues (which typically have a deeper cause). Second, it separates responsibilities in testing: when acceptance testing starts, all technical tests should have passed, which provides extra certainty that the system will behave well. Acceptance tests can then make assumptions about the system’s technical behavior, which narrows their scope somewhat. Of course, to what extent those assumptions hold depends on whether the configuration of the test environment is sufficiently representative of production.

Metrics Overview

As a recap, Table 5-1 shows an overview of the metrics discussed in this chapter, with their corresponding goals.

Table 5-1. Summary of metrics and goals in this chapter
Metric # in text | Metric description                                          | Corresponding goal
DTAP 1           | Average feature phase time                                   | DTAP phase effort
DTAP 2           | Percentage of features that fail during acceptance testing  | DTAP phase effort
DTAP 3           | Feedback cycle time                                          | DTAP phase effort
DTAP 4a          | Number of work items for testing                             | Team distribution and workload
DTAP 4b          | Workload of different roles                                  | Team distribution and workload
DTAP 5           | Created versus resolved issues                               | Team distribution and workload

With a controlled DTAP street, you are in a good position to shorten feedback cycles and delivery times. The next three chapters are aimed at achieving those goals through automated testing (Chapter 6), Continuous Integration (Chapter 7), and automated deployment (Chapter 8).
