After PI planning, the teams on the ART are set to work. They’ll look at the outputs of Continuous Exploration and the features selected for the PI and carry them to the next stage of the Continuous Delivery Pipeline, Continuous Integration (CI).
In this chapter, we will explore the following activities in the CI stage of the Continuous Delivery Pipeline:
We will also see how the processes described in the Continuous Delivery Pipeline intersect with automation in a CI/Continuous Deployment (CD) pipeline during the CI stage.
Let’s join the ART now as they develop a solution as defined by the features created during Continuous Exploration.
Teams on an ART work in markedly different ways from those that developed products using waterfall methodologies. An emphasis on Lean thinking and a focus on the system dictate new ways of working.
We will examine the following aspects of engineering practices that are used by Agile teams today:
These practices strive to allow for a continuous flow of work, ready for the next stage of building.
An important part of the Lean flow we picked up on in Chapter 4, Leveraging Lean Flow to Keep the Work Moving, was to keep batch sizes small. A feature, as presented to us before PI planning, is a large batch of work, meant to be completed by the end of the PI. In order to ensure a smooth flow of work, the feature must be broken down into smaller batches of work.
User stories often describe small pieces of desired user functionality meant to be delivered by the end of a sprint or iteration, which is commonly two weeks. They are commonly phrased in a user-voice form that briefly explains who the story is for and what the desired functionality and intended value are. An example follows of the user-voice form:
As a customer, I want an itemized receipt of services emailed monthly so that I can understand and organize my spending.
An essential part of the story is its acceptance criteria. The acceptance criteria outline the correct behavior of the story and are the way the team determines that the story is complete.
Acceptance criteria can be written using the Gherkin format. This helps outline pre-conditions, inputs, and the desired behavior and outputs. These are described in clauses that begin with GIVEN, WHEN, and THEN.
The following example of acceptance criteria for our story is written in the Gherkin format. Note how it describes the initial conditions, inputs, and outputs:
| Initial conditions | GIVEN I have configured the notification date… |
| Input | …WHEN the notification date passes… |
| Output or desired behavior | …THEN I receive an email notification containing my itemized receipts. |
Table 11.1: Acceptance criteria in the Gherkin format
Enabler stories can also come from features. These stories do not provide direct user value but allow for easier development and architectural capability for future user stories, creating future business value. SAFe® outlines the following four types of enablers.
Splitting a feature into user and enabler stories can be done in a variety of ways. The following methods can be used to create stories that can be completed in a sprint.
Note that splitting a large story further may be required so that the story can be completed and delivered by the end of the sprint. The preceding methods can be used to split larger stories into smaller stories.
Although teams can choose how they develop their stories, high-performing teams have found that working together rather than solo produces higher-quality products and enables effective knowledge sharing, creating stronger, more collaborative teams.
In this section, we’ll discuss two practices that allow teams to collaboratively develop products together, promoting better quality and stronger team cohesion: pair programming and mob programming or swarming.
Pair programming is a practice that originated with Extreme Programming (XP). Instead of two developers working separately at two computers, the pair works at a single shared computer, exchanging ideas back and forth while simultaneously writing and reviewing the code.
With two developers working together, the following patterns emerge for how they collaborate.
Pair programming has proven to be an effective way of working together. Code written during a pair programming session is frequently reviewed and debugged, resulting in higher-quality code. Knowledge is shared between developers, creating faster learning for novices or those new to the code base. If the code breaks, there is also more than a single developer with an understanding of the code that can help with repairs.
A common misconception is that pair programming requires twice the effort or resources. This belief is not supported by studies on the effectiveness of pair programming, including one conducted at the University of Utah (https://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF) that found that while development costs increased by 15%, defects discovered at later stages decreased by 15%, and the same functionality was delivered in fewer lines of code, a sign of better design quality.
Mob programming can be considered pair programming taken to the highest level. Instead of a pair of developers, the entire team sits in front of a single computer and its controls. The team works on the same thing, at the same time, in the same space, and on the same computer.
A typical pattern for this is a variation of the driver/navigator pattern. One person on the team has control of the computer, typing and creating the code or other pieces of work. The other members of the team review and guide the driver as navigators. After some time (usually 10 minutes), the controls are rotated to another member of the team. The rotation continues until all members of the team have had an opportunity to act as the driver.
Mob programming benefits the entire team. Knowledge sharing of the code is applied to the entire team, instead of a pair of developers. Communication is easier with the entire team present. Decisions are made with the most current and relevant information.
During this time, not only is the product being developed, but the means of ensuring that the product is high quality are also being developed simultaneously. This is a change from traditional development, where tests were created and run after the code was developed. The change is often referred to as shifting left, as illustrated in the following representation of the development process. It is one of the important practices in SAFe and is described in more detail in the SAFe article on Built-In Quality (https://www.scaledagileframework.com/built-in-quality/):
Figure 11.1 – Comparison of testing with “shift left” (© Scaled Agile, Inc., All Rights Reserved)
In the preceding diagram, we see on the left that traditional testing may test stories and features long after the stories and features were originally conceived. This delayed feedback may take as long as 3 to 6 months, which may be too late to know whether we are moving in the right direction.
With the diagram on the right, we see that we can accelerate this feedback by using Test-Driven Development (TDD) and Behavior-Driven Development (BDD) tests to evaluate whether the behaviors of the feature and story are what is desired. Ideally, these tests should be automated so that they can be run repeatedly and quickly.
Another thing we can see from the preceding diagram is that there are many levels of tests, some of which should be run repeatedly with the help of automation, and others that may take some time or can only be run manually. How do we know which tests to automate and which tests should be run frequently?
Mike Cohn described the levels of testing as a “testing pyramid” in his book Succeeding with Agile. He initially described the pyramid with the following three levels from bottom to top.
Other types of testing can be added and applied to the testing pyramid. This allows us to view the testing pyramid as follows:
Figure 11.2 – Test pyramid
Note that at the bottom of the pyramid, unit tests are both the quickest to execute and the cheapest to run. It makes sense to automate their execution in the pipeline and frequently run them at every commit into version control, ideally during the build phase.
As you move further up the pyramid, the tests gradually take longer to execute and are more expensive. Those tests may not be run as frequently as unit tests. They may be executed through automation, but only upon entering the testing phase. Examples of these types of tests include story testing from BDD, integration testing, performance testing, and security testing.
The tests at the top of the pyramid take the longest to execute and are also the most expensive to run. These are mostly manual tests. These tests may be run just before release. Examples of testing here include user acceptance testing and exploratory testing done by the customer.
Most tests verify either proper code functionality and correctness or the correct behavior of the story and feature. The primary methods of creating tests to measure these criteria are TDD and BDD. Let’s look at how these tests are developed.
TDD is a practice derived from XP. With TDD, you practice the following flow:
This flow is repeated as new functionality is developed. A graphical representation of this flow is as follows:
Figure 11.3 – TDD (https://en.wikipedia.org/wiki/Test-driven_development#/media/File:TDD_Global_Lifecycle.png licensed under CC BY-SA)
The tests usually written using TDD are unit tests: small, easily executed tests designed to verify the correct functionality of a code module. Broader tests use BDD to verify the systemic behavior of features and stories. Let’s look at BDD now.
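The red-green-refactor flow of TDD can be sketched using Python’s built-in unittest module. The receipt-totaling function below is hypothetical, invented purely to illustrate the cycle:

```python
import unittest

# Step 1 (red): write a failing test first, for a hypothetical
# receipt-totaling helper that does not exist yet.
class TestReceiptTotal(unittest.TestCase):
    def test_total_sums_line_items(self):
        self.assertEqual(total([("storage", 5.00), ("compute", 12.50)]), 17.50)

    def test_total_of_empty_receipt_is_zero(self):
        self.assertEqual(total([]), 0)

# Step 2 (green): write the simplest code that makes the tests pass.
def total(line_items):
    return sum(price for _, price in line_items)

# Step 3 (refactor): clean up the code while keeping the tests green,
# then repeat the cycle for the next piece of functionality.
```

Running the test suite before writing `total` produces the expected failure (red); running it again after the implementation shows it passing (green).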
BDD is often seen as an extension of TDD, but while TDD looks to verify the correct behavior of individual code functions and components, BDD strives to verify the correct behavior of the system as an executable specification expressed in features and stories. One application of BDD was seen earlier in this chapter when we created the acceptance criteria for the story.
Looking at correct systemic behavior involves three perspectives that work together to bring their points of view on what is specified, what is developed, and what is tested as correct. The three perspectives are as follows:
BDD brings these three perspectives together using specifications. These specifications are written in a Domain-Specific Language (DSL) that employs natural language syntax so that technical and non-technical people can collaboratively develop the specification. One of these DSLs is Gherkin, which divides behavior into the following three clauses:
Multiple GIVEN, WHEN, and THEN clauses may be joined together using AND to indicate multiple conditions, inputs, and behaviors, respectively.
The specification, using the DSL, can become several artifacts. Product owners and product management create the acceptance criteria for features and stories with the other members of the development teams. The creation of acceptance criteria can be seen as the discovery of the desired systemic behavior.
The next phase of creating the specification is formulation. In this phase, developers and testers work together to create acceptance tests. They can take the acceptance criteria as written and elaborate specific criteria in each clause, including allowable initial conditions and values to measure for inputs and outputs so that the specification for a specific scenario becomes a test. Ideally, acceptance tests are written in the same DSL as the acceptance criteria.
We can create an automated test by taking the acceptance criteria for our story and adding specific pre-conditions, input, and desired output or behavior. Let’s look at our previously seen acceptance criteria converted into a test in the following table:
| Acceptance Criteria | Test |
| --- | --- |
| GIVEN I have configured the notification date… | Given the date state is not x… |
| …WHEN the notification date passes… | …when it’s one business day after the date state has changed to x… |
| …THEN I receive an email notification containing my itemized receipts. | …then send an email notification to all users with xxx content. |
Table 11.2 – Conversion of acceptance criteria into a test
The last phase of the specification is automation. The acceptance test, written in the DSL, can be executed in a tool that allows for automated testing. Acceptance tests written in Gherkin can be executed by tools such as Cucumber, JBehave, Lettuce, Behave, and Behat.
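The shape of an automated acceptance test can be sketched in plain Python, without committing to any particular BDD tool. The `NotificationService` class here is a hypothetical system under test, and the GIVEN-WHEN-THEN comments map each step of the scenario to executable code, much as a step-definition file would:

```python
from datetime import date, timedelta

# Hypothetical system under test: a service that emails itemized
# receipts once the configured notification date has passed.
class NotificationService:
    def __init__(self):
        self.notification_date = None
        self.sent_emails = []

    def configure(self, notification_date):
        self.notification_date = notification_date

    def run_daily_job(self, today):
        # Send the receipt only after the configured date has passed
        if self.notification_date and today > self.notification_date:
            self.sent_emails.append("itemized receipt")

# GIVEN I have configured the notification date...
service = NotificationService()
service.configure(date(2024, 3, 31))

# WHEN the notification date passes...
service.run_daily_job(today=date(2024, 3, 31) + timedelta(days=1))

# THEN I receive an email notification containing my itemized receipts.
assert "itemized receipt" in service.sent_emails
```

A BDD tool such as Behave or Cucumber would bind these same steps to the Gherkin text itself, so the specification and the test remain a single artifact.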
Version control software allows for multiple developers on a team to develop in parallel on the same code base, test scripts, or other bodies of text without interference from other developers’ changes. Each change in the version control system is recorded. Changes are consolidated using merge operations to bring together and resolve changes in the bodies of work.
Important practices for the use of version control include the following ideas:
Other best practices of version control that take place during the build process will be identified in our next section.
The features and stories are not the only criteria teams have to consider when developing new capabilities for their product. Non-Functional Requirements (NFRs) are qualities that may impact every feature and story, acting as a constraint or limitation. These NFRs may deal with security, compliance, performance, scalability, and reliability, among other things.
Two practices are performed to ensure compliance with some NFRs: designing for operations and threat modeling. Let’s take a look at these practices.
The collaboration between development and operations is a hallmark of the DevOps movement. Ensuring that operations can easily examine system resources is something that is easily incorporated in the early stages of development rather than added as an afterthought.
A key part of ensuring capabilities are present for proper maintenance of the product is application telemetry. The product as a system must allow for the easy measurement of system resources, including how an application uses resources such as server memory and storage. In addition to system measurements, application telemetry should also allow for the measurement of business data, used as a leading indicator to validate the benefit hypothesis.
Other considerations include ensuring that changes brought by new features can be easily rolled back or that fixes can be rolled forward through the Continuous Delivery Pipeline. When doing this, be aware of components that may represent the state of the system, such as a database. These may not be easily rolled back.
Moving toward a DevSecOps approach requires a shift-left mindset toward security. This mindset allows for including security concerns in the design and development stages of the Continuous Delivery Pipeline, which gives a more holistic view of the product.
We first saw that threat modeling was part of architecting the system in Chapter 10, Continuous Exploration and Finding New Features. As part of threat modeling during CI, we may be asking the following questions:
Developing toward DevSecOps is then based on the countermeasures and mitigation steps identified during the assessment.
As development changes are completed, they must be integrated with the current product as it stands and be tested. This may start the incorporation of automation into a CI/CD pipeline. Next, let’s examine how entry into the CI/CD pipeline begins with a build process.
The CI/CD pipeline can be triggered by version control system actions such as saving a new change as a commit. Before the changes are accepted by the version control system, they should go through a testing process to ensure they will not adversely affect the current code base. This process of testing and version control integration is an important part of the CI process where practices are divided into a version control perspective and a testing perspective.
Let’s look at each of these perspectives and the practices within.
Good version control practices ensure that changes introduced into version control are evaluated through testing before they are saved and merged with the existing code base. This ensures that the code base is robust, without changes that may prevent the code base from being built or packaged correctly.
Version control practices can further be divided into three types that help ensure a robust code base as changes come in. Let’s look at what these practices are in detail.
The practice of CI has its origins in optimizing the build process through automation by a build script or a CI tool. Saving a change to version control through a commit operation sets off a chain of the following steps:
The preceding steps are outlined in the following diagram:
Figure 11.4 – CI automation
When performing the preceding chain of steps, teams eventually figured out that successful code integration relied on the following factors:
CI results in the following outputs, regardless of success or failure:
CI tools such as Jenkins, GitLab pipelines, and GitHub Actions form the basis of the automation that allows the CI of code to occur.
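The chain of build steps can be sketched as a simple pipeline runner: each step executes in order, and the first failure halts the pipeline and blocks the merge. The step names and checks below are illustrative stand-ins, not tied to any particular CI tool:

```python
# Minimal sketch of a CI chain triggered by a commit: run each step in
# order and stop at the first failure.
def run_pipeline(change, steps):
    results = []
    for name, step in steps:
        ok = step(change)
        results.append((name, "passed" if ok else "failed"))
        if not ok:
            break  # a failed step halts the pipeline and blocks the merge
    return results

# Illustrative checks standing in for compile, unit tests, and static analysis.
steps = [
    ("build", lambda c: "code" in c),
    ("unit tests", lambda c: c.get("tests_pass", False)),
    ("static analysis", lambda c: "password" not in c.get("code", "")),
]

good_change = {"code": "def f(): return 1", "tests_pass": True}
results = run_pipeline(good_change, steps)
```

A real CI tool adds triggering on commit, isolated build agents, and reporting, but the core control flow is this fail-fast sequence.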
Multiple developers in a single team or even multiple teams (such as the team of teams that is present in an ART) often must work on the same code base that is saved in version control. Version control systems allow for parallel development of the same code base by allowing changes to be branched out. A developer or team could work on a branch without their change affecting the rest of the team or ART until it was ready to merge and be shared with other developers or teams.
Figure 11.5 – Branching structure example
In the preceding illustration, Team 1 and Team 2 have their own branches of the code base, based on different versions of the main branch (commonly referred to as the trunk). A developer on Team 1 has created a change (D1) and merged it back into the Team 1 branch as a change, T1-2. With separate team branches, how do we know that a necessary change from Team 2 is visible and can be used by Team 1?
Another problem occurs as Team 1 and Team 2 develop on their branches without receiving updates from the trunk, waiting to merge until they release or until the end of the sprint. Keeping track of the many changes from multiple teams and resolving a large number of merge conflicts becomes an enormous challenge.
To keep things simple and ensure changes are visible to all teams, we want to avoid branches that are permanent or long-lived. Developers and teams may form branches to allow for parallel development, but when ready, they must merge back to the main branch, deleting the short-lived branch in the process. This process is known as trunk-based development. The following diagram highlights the process of trunk-based development:
Figure 11.6 – Branching structure with trunk-based development
Trunk-based development allows for easier merge operations to occur since a merge to the main branch is happening on each validated change instead of a group of changes.
With trunk-based development, we are merging changes to the main branch of the code base as often as possible. This main branch is used by all the teams on the ART. Since the integrity of the main branch is vital for multiple teams, how do we ensure that any errant changes don’t break the current code base?
To ensure a robust code base, we must follow a gated commit process where before a change is allowed to merge with the main branch, it must successfully pass the build and test process. Additional measures, such as a review of the change, may also be taken.
In Git-based environments, Git servers from Bitbucket, GitLab, and GitHub define gated commits as pull requests or merge requests that allow for closer scrutiny when a merge operation is requested.
We saw that build processes rely on testing to ensure that changes to a code base do not adversely affect the functionality or security of the product. These tests are important because, ideally, the build process is performed on every saved change; if the build succeeds, the next step is merging the change to the main branch, which is visible to the team, or to multiple teams in the case of the ART.
The build process involves two types of tests that are run against a potential new version of the code base: automated unit testing and static analysis for application security.
Let’s take a deeper look at each type of test run as part of the build process.
Unit tests are often written at the same time as code, if not beforehand. These unit tests may be run by an individual developer on their workstation while the code is being developed. If that’s the case, why run them again as part of the build process?
The main idea of CI is to ensure a standard, reliable process. Automation through a CI/CD pipeline ensures that this occurs for all developers. Adding unit testing during the build process on an automated CI/CD pipeline ensures that the unit tests are run on every developer’s code change every time.
It’s also important to make sure that any updated unit tests are part of the same change to the code base in version control. This ensures that code changes are validated against the correct tests, preventing a situation where the CI/CD pipeline stops because of stale or incorrect tests. Collaborative development between those creating the code and those creating the tests is required to ensure this situation doesn’t occur.
Static analysis is a process by which a tool scans the text of the code base, including the potential code change that is being checked in, to find specific text patterns. These text patterns can be used to identify the following issues:
The analysis is performed without the need for executing the application. Because of this, static analysis is an efficient means of checking for problems in the build process.
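The idea of scanning source text for patterns without executing it can be illustrated with Python’s standard library `ast` module. This toy analyzer looks for one pattern, a string literal assigned to a suspicious variable name, which is a simplified stand-in for the kinds of rules real static analysis tools apply:

```python
import ast

# Names that suggest a hardcoded secret; illustrative, not exhaustive.
SUSPICIOUS_NAMES = {"password", "secret", "api_key"}

def find_hardcoded_secrets(source):
    """Scan source text (without executing it) for string literals
    assigned to suspicious variable names."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and target.id.lower() in SUSPICIOUS_NAMES
                        and isinstance(node.value, ast.Constant)
                        and isinstance(node.value.value, str)):
                    findings.append((node.lineno, target.id))
    return findings

sample = "user = 'admin'\npassword = 'hunter2'\n"
findings = find_hardcoded_secrets(sample)  # flags line 2
```

Because the code is analyzed as a syntax tree rather than run, the check is fast and safe to perform on every commit.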
As we saw in Chapter 3, Automation for Efficiency and Quality, static analysis can fall into the following two categories:
Our application has passed the first set of tests, but is it ready for the rigors of a production environment? To answer that question, we look to perform system-level testing. Our next section examines the practices that enable system-level testing.
At this point, we have performed tests on individual pieces of code and ensured the correct functionality while maintaining security. Here, we start to integrate the new changes of code with the existing code base and evaluate the system as a whole by testing the system end to end.
The practices that allow for true end-to-end testing of the system will be examined in the upcoming sections. Let’s dive in.
System-level testing should be performed in an environment that resembles the production environment as closely as possible. Testing in such an environment provides higher confidence that the solution will work when actually released into production. The more similarities a test environment has with the production environment, the fewer variables come into play when problems are found and troubleshooting for the root cause begins.
A key factor in ensuring the equivalence between test environments and the production environment is the use of configuration management. With configuration management, key resources such as the operating system version, versions of key drivers, and versions of applications are recorded in a text-based configuration file. Ideally, the configuration file is maintained in version control with labels that indicate the version of the solution and its application in the test environments and production environment.
Because the cost of allocating exact duplicates of production resources may be prohibitive, the key to maintaining equivalence is matching the exact versions of resources rather than the exact number of resources.
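The principle of matching versions while ignoring instance counts can be sketched as a drift check over configuration data. The environment records below are hypothetical examples of what a version-controlled configuration file might capture:

```python
# Hypothetical environment configuration, as it might be recorded in a
# version-controlled configuration file.
prod = {"os": "ubuntu-22.04", "db_driver": "9.4.1", "app": "2.7.0", "instances": 12}
test = {"os": "ubuntu-22.04", "db_driver": "9.4.1", "app": "2.7.0", "instances": 2}

def version_drift(env_a, env_b, ignore=("instances",)):
    """Return the set of keys whose values differ between two
    environments, deliberately ignoring instance counts."""
    keys = (set(env_a) | set(env_b)) - set(ignore)
    return {k for k in keys if env_a.get(k) != env_b.get(k)}

drift = version_drift(prod, test)  # empty set: the environments are equivalent
```

Here the test environment has fewer instances than production, yet the drift check still reports equivalence, because every recorded version matches.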
System-level testing can encompass a variety of test levels, many of which can be automated. When looking at tests at these various levels, which should be automated, and which should be run frequently?
In addition to the testing pyramid mentioned earlier, we may need to consider who needs to understand the test results. A second consideration is whether the test verifies that the solution meets requirements or helps developers see whether their design approach is correct.
The Agile testing matrix looks at the various kinds of tests and organizes them from these considerations. The following diagram depicts the Agile testing matrix as seen in the SAFe article on Agile Testing (https://www.scaledagileframework.com/agile-testing/):
Figure 11.7 – Agile testing matrix (© Scaled Agile, Inc., All Rights Reserved)
We can see from the preceding diagram that the first consideration looks at the perspective of either the business or the technology. Developers look at the technology tests to ensure the correct functionality and proper operation of the solution. End users look at the business-facing tests to ensure an understanding of the solution and the validation of the benefit hypothesis.
We can also see the second consideration: whether the test informs the complete solution or the implementation. Tests that guide development assist in TDD and BDD approaches where the test is written first. Tests that critique the product look to see whether the solution complies with user requirements.
With two areas of concern within each of the two considerations, we can divide tests into the following four quadrants:
We will see that tests in Q3 are done during CD in Chapter 12, Continuous Deployment to Production. Tests in Q4 are done during Release on Demand as mentioned in Chapter 13, Releasing on Demand to Realize Value.
A key part of ensuring the similarities between a testing environment and the production environment is the data used to test the solution. Using data that could be found in production environments allows for more realistic test outcomes, leading to higher confidence in the solution.
Realistic test data can come from either synthetic data or real production data. Production-based test data may come from a backup of production data restored into the test environment; any information considered private should be removed from it.
Synthetic test data is fake data created by a data generation tool such as DATPROF Privacy and Gretel. It offers the advantage of not requiring an anonymization step to redact private information.
Regardless of whether the data is anonymized production data or synthetic data, the test data should be maintained in version control, using artifact repository software for large binary-based data.
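A minimal sketch of synthetic data generation using only the standard library follows; dedicated tools such as those mentioned above offer far richer, schema-aware generation. The customer record fields are invented for illustration:

```python
import random
import string

def synthetic_customers(n, seed=42):
    """Generate n fake customer records. The seed makes runs
    reproducible, which matters for repeatable tests."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        rows.append({
            "customer_id": 1000 + i,
            "email": f"{name}@example.com",  # fake, so no anonymization step is needed
            "monthly_spend": round(rng.uniform(5.0, 500.0), 2),
        })
    return rows

data = synthetic_customers(3)
```

Because the data never contained real personal information, it can be shared freely across test environments and checked into version control.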
Service virtualization allows for test environments to behave like production environments even when the test environment is missing resources available in production. The production environment may have crucial dependencies on key components that are impossible to copy because of the following factors:
Systems made up of components that communicate together using well-known interfaces can take advantage of service virtualization to simulate the behavior of one or more components. If the virtualized service is a database, it can return synthetic test data.
Components that are simulated in test environments are called virtual assets. Virtual assets are created by tools that measure the true component’s behavior by the following methods:
Once the virtual asset is created, it takes its place in the test environment. An illustration of the difference between the production environment and the test environment with virtual assets is as follows:
Figure 11.8 – Production vs. test environment
Popular tools for creating virtual assets include SoapUI and ReadyAPI from Smartbear, MockLab, Parasoft Virtualize, and WireMock.
An important difference to consider: while service virtualization may seem similar to mocking or stubbing a component, the two are not the same. A mock component or stub may be added during development when a component is not ready for release. A mock object typically returns only one type of output, a success message, so that the development of other components is not impeded. Service virtualization, in contrast, reproduces proper behavior across a wide variety of scenarios.
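The record-and-replay idea behind a virtual asset can be sketched in a few lines. The billing service and its endpoints below are hypothetical; note that the virtual asset replays error scenarios as well as successes, which is exactly what distinguishes it from a success-only mock:

```python
# Sketch of a virtual asset: recorded request/response pairs from the
# real component are replayed in the test environment.
class VirtualAsset:
    def __init__(self):
        self.recordings = {}

    def record(self, request, response):
        # Capture the real component's observed behavior
        self.recordings[request] = response

    def respond(self, request):
        # Replay recorded behavior, or fail loudly for unknown traffic
        return self.recordings.get(request, {"status": 501, "body": "not recorded"})

billing = VirtualAsset()
billing.record("GET /receipts/42", {"status": 200, "body": "itemized receipt"})
billing.record("GET /receipts/99", {"status": 404, "body": "no such customer"})
```

Real service virtualization tools build these recordings by observing live traffic or interface definitions, but the replay principle is the same.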
Environments with virtual assets should be maintained in configuration management tools. The configuration files and interface definition files for virtual assets should be kept in version control alongside the application, with labels identifying their role as test assets for specific versions of the application.
As we perform end-to-end system testing, we need to remember the constraints our system has, which we have previously identified as NFRs. NFRs affect every story and feature, acting as a constraint that must be heeded. Qualities such as security, performance, reliability, and scalability, among other things, should be examined through testing to verify that these constraints aren’t broken.
The testing of NFRs is often automated, involving specialized testing tools. Agile teams on the ART often work with the system team to ensure that the tooling is established to perform testing for NFRs as part of the end-to-end testing.
After the portfolio of tests is performed on our change, we may want one more opportunity to see whether our change is ready to be deployed to production. We place our changes into a staging environment, a stand-in for the production environment for a final examination before deploying changes to production. Let’s look at the activities involved in deploying changes to staging.
We may want verification that we can deploy the change to a production-like environment and verify that our solution still works. To enable this last look, we employ certain practices.
Let’s look at these practices in depth.
A staging environment is a facsimile of the production environment, which has several uses throughout the PI. It is the place where demonstrations of the system as it currently stands are performed in the system demo. User acceptance testing can be performed in this environment, which is as close to production as possible.
As the changes to the product are being developed, the staging environment shows the state of change before deployment to production. At the very least, changes to the staging environment happen every sprint or iteration for the ART. More frequent changes are allowed as long as the build process and end-to-end testing are completed successfully.
A staging environment may also act as an alternative environment for production in a configuration known as blue/green deployment, which may allow for easy rollback in the event of production failures. Let’s take a look at this configuration now.
In a blue/green deployment, you have two identical environments. One is the production environment serving live traffic, while the other sits idle on standby.
The idle environment receives the latest change where thorough testing occurs. At the appropriate time, the change is released by making the idle environment live and the other environment idle. This transition is illustrated by the following graphic:
Figure 11.9 – Blue/green deployment release of a new version
If problems are discovered, a switch back can be made to roll back the changes. This transition back and forth is easy for stateless systems. Otherwise, blue/green deployments must be carefully architected so that components that store state, such as databases, are not corrupted during the transition.
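The release-and-rollback switch can be sketched as a router that points live traffic at one of two identical environments. The environment names and versions below are illustrative:

```python
# Sketch of a blue/green switch: releasing flips the live pointer to the
# idle environment, and rolling back flips it again.
class BlueGreenRouter:
    def __init__(self):
        self.environments = {"blue": "v1.0", "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version):
        # Stage and thoroughly test the new version in the idle environment
        self.environments[self.idle] = version

    def release(self):
        self.live = self.idle  # flip: idle becomes live

    def rollback(self):
        self.live = self.idle  # flip back (safe only for stateless components)

router = BlueGreenRouter()
router.deploy_to_idle("v1.1")
router.release()
```

Because both environments remain fully provisioned, the flip in either direction is nearly instantaneous; the hard part, as noted above, is any state that cannot simply be switched.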
At the end of every sprint or iteration in the PI, after each team’s iteration review, the teams get together to integrate their efforts. Working with the system team, they demonstrate the current state of the product as it stands so far in the PI in the staging environment. Business owners, customers, and other key stakeholders of the ART are present at this demonstration to view progress and supply feedback. This event provides fast feedback on the efforts of the ART so far.
Note that the system demo does not prevent the deployment of changes into production. That may continue to happen automatically as part of CD (which we will visit in the next chapter), but feedback may prevent the release of the change to customers until changes resulting from the feedback make their way into production and are released on demand.
Successful testing in a staging environment gives us confidence that our change has the correct functionality and is robust enough in a production environment, but the only true way to prove that is to deploy our change into the actual production environment.
In this chapter, we continued our discovery of the Continuous Delivery Pipeline by looking at CI, the part that implements the features created in Continuous Exploration. Features are divided into more digestible stories. Development of not only the product but also the tests to verify the product begins. Security and designing for operation concerns are included in the development.
The build phase introduces automation into the pipeline. When a commit to version control occurs, unit tests are run to ensure continued, correct functionality. The build is also scanned for coding errors and to find security vulnerabilities. If everything is correct, the commit will be allowed to merge with the main branch, or trunk, of the version control repository.
A successful build can trigger further testing in a testing environment that may be similar to the production environment. Here, system-level, end-to-end testing happens to guard against any production failures. The testing here is as automated as it can be. Accurate test data and service virtualization may offer a reasonable facsimile to a production environment for testing.
When building and testing are complete, the change may find itself in a staging environment, a copy of the production environment, or one-half of a blue/green deployment. A staging environment is also a place where changes are shown during a system demo, an event where the ART receives feedback on the development of the system at the end of each sprint or iteration.
After making its way to the staging environment, we must move our changes into production. That happens in CD, which we will explore in our next chapter.