After PI planning, the teams on the ART are set to work. They’ll look at the outputs of Continuous Exploration and the features selected for the PI and carry them to the next stage of the Continuous Delivery Pipeline, Continuous Integration (CI).
In this chapter, we will explore the following activities in the CI stage of the Continuous Delivery Pipeline:
We will also see how the processes described in the Continuous Delivery Pipeline intersect with automation in a CI/Continuous Deployment (CD) pipeline during the CI stage.
Let’s join the ART now as they develop a solution as defined by the features created during Continuous Exploration.
Teams on an ART work in markedly different ways from those that developed products using waterfall methodologies. An emphasis on Lean thinking and a focus on the system dictate new ways of working.
We will examine the following aspects of engineering practices that are used by Agile teams today:
These practices strive to allow for a continuous flow of work, ready for the next stage of building.
An important part of the Lean flow we picked up on in Chapter 4, Leveraging Lean Flow to Keep the Work Moving, was to keep batch sizes small. A feature, as presented to us before PI planning, is a large batch of work, meant to be completed by the end of the PI. In order to ensure a smooth flow of work, the feature must be broken down into smaller batches of work.
User stories often describe small pieces of desired user functionality meant to be delivered by the end of a sprint or iteration, which is commonly two weeks. They are commonly phrased in a user-voice form that briefly explains who the story is for and what the desired functionality and intended value are. An example follows of the user-voice form:
As a customer, I want an itemized receipt of services emailed monthly so that I can understand and organize my spending.
An essential part of the story is its acceptance criteria. The acceptance criteria outline the correct behavior of the story and are the way the team determines that the story is complete.
Acceptance criteria can be written using the Gherkin format. This helps outline pre-conditions, inputs, and the desired behavior and outputs. These are described in clauses that begin with GIVEN, WHEN, and THEN.
The following example of acceptance criteria for our story is written in the Gherkin format. Note how it describes the initial conditions, inputs, and outputs:
| Initial conditions | GIVEN I have configured the notification date… |
| Input | …WHEN the notification date passes… |
| Output or desired behavior | …THEN I receive an email notification containing my itemized receipts. |
Table 11.1: Acceptance criteria in the Gherkin format
Enabler stories can also come from features. These stories do not provide direct user value but allow for easier development and architectural capability for future user stories, creating future business value. SAFe® outlines the following four types of enablers.
Splitting a feature into user and enabler stories can be done in a variety of ways. The following methods can be used to create stories that can be completed in a sprint.
Note that splitting a large story further may be required so that the story can be completed and delivered by the end of the sprint. The preceding methods can be used to split larger stories into smaller stories.
Although teams can choose how they develop their stories, high-performing teams have found that working together rather than solo produces higher-quality products and enables effective knowledge sharing, creating stronger, more collaborative teams.
In this section, we’ll discuss two practices that allow teams to collaboratively develop products together, promoting better quality and stronger team cohesion: pair programming and mob programming or swarming.
Pair programming is a practice that originated with Extreme Programming (XP). Instead of two developers working separately at two computers, the pair works at a single shared computer, exchanging ideas back and forth while simultaneously writing and reviewing the code.
With two developers working together, the following patterns emerge for how they collaborate.
Pair programming has proven to be an effective way of working together. Code written during a pair programming session is frequently reviewed and debugged, resulting in higher-quality code. Knowledge is shared between developers, creating faster learning for novices or those new to the code base. If the code breaks, there is also more than a single developer with an understanding of the code that can help with repairs.
A common misconception is that pair programming requires twice the effort or resources. This belief is not supported by studies on the effectiveness of pair programming, including one conducted at the University of Utah (https://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF) that found that while development costs increased by 15%, defects discovered at later stages decreased by 15%, and the same functionality was delivered in fewer lines of code, a sign of better design quality.
Mob programming can be considered pair programming taken to the highest level. Instead of a pair of developers, the entire team sits in front of a single computer and its controls. The team works on the same thing, at the same time, in the same space, and on the same computer.
A typical pattern for this is a variation of the driver/navigator pattern. One person on the team has control of the computer, typing and creating the code or other pieces of work. The other members of the team review and guide the driver as navigators. After some time (usually 10 minutes), the controls are rotated to another member of the team. The rotation continues until all members of the team have had an opportunity to act as the driver.
Mob programming benefits the entire team. Knowledge sharing of the code is applied to the entire team, instead of a pair of developers. Communication is easier with the entire team present. Decisions are made with the most current and relevant information.
During this time, not only is the product being developed, but the means of ensuring that the product is high quality are also being developed simultaneously. This is a change from traditional development, where tests were created and run after the code was developed. The change is often referred to as shifting left, as illustrated in the following representation of the development process. It is one of the important practices in SAFe and is described in more detail in the SAFe article on Built-In Quality (https://www.scaledagileframework.com/built-in-quality/):
Figure 11.1 – Comparison of testing with “shift left” (© Scaled Agile, Inc., All Rights Reserved)
In the preceding diagram, we see on the left that traditional testing may test stories and features long after the stories and features were originally conceived. This delayed feedback may take as long as 3 to 6 months, which may be too late to know whether we are moving in the right direction.
With the diagram on the right, we see that we can accelerate this feedback by using Test-Driven Development (TDD) and Behavior-Driven Development (BDD) tests to evaluate whether the behaviors of the feature and story are what is desired. Ideally, these tests should be automated so that they can be run repeatedly and quickly.
Another thing we can see from the preceding diagram is that there are many levels of tests, some of which should be run repeatedly with the help of automation, and others that may take some time or can only be run manually. How do we know which tests to automate and which tests should be run frequently?
Mike Cohn described the levels of testing as a “testing pyramid” in his book Succeeding with Agile. He initially described the pyramid with the following three levels from bottom to top.
Other types of testing can be added and applied to the testing pyramid. This allows us to view the testing pyramid as follows:
Figure 11.2 – Test pyramid
Note that at the bottom of the pyramid, unit tests are both the quickest to execute and the cheapest to run. It makes sense to automate their execution in the pipeline and frequently run them at every commit into version control, ideally during the build phase.
As you move further up the pyramid, the tests gradually take longer to execute and are more expensive. Those tests may not be run as frequently as unit tests. They may be executed through automation, but only upon entering the testing phase. Examples of these types of tests include story testing from BDD, integration testing, performance testing, and security testing.
The tests at the top of the pyramid take the longest to execute and are also the most expensive to run. These are mostly manual tests. These tests may be run just before release. Examples of testing here include user acceptance testing and exploratory testing done by the customer.
Most tests verify either proper code functionality and correctness or the correct behavior of the story and feature. The primary methods of creating tests to measure these criteria are TDD and BDD. Let’s look at how these tests are developed.
TDD is a practice derived from XP. With TDD, you practice the following flow:
This flow is repeated as new functionality is developed. A graphical representation of this flow is as follows:
Figure 11.3 – TDD (https://en.wikipedia.org/wiki/Test-driven_development#/media/File:TDD_Global_Lifecycle.png licensed under CC BY-SA)
The tests usually written using TDD are unit tests: small, easily executed tests designed to verify the correct functionality of a code module. Broader tests use BDD to verify the systemic behavior of features and stories. Let’s look at BDD now.
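The red-green-refactor flow of TDD can be sketched using Python’s built-in unittest module. The receipt-totaling function below is hypothetical, invented purely to illustrate the cycle:

```python
import unittest

# Step 1 (red): write a failing test first, for a hypothetical
# receipt-totaling helper that does not exist yet.
class TestReceiptTotal(unittest.TestCase):
    def test_total_sums_line_items(self):
        self.assertEqual(total([("storage", 5.00), ("compute", 12.50)]), 17.50)

    def test_total_of_empty_receipt_is_zero(self):
        self.assertEqual(total([]), 0)

# Step 2 (green): write the simplest code that makes the tests pass.
def total(line_items):
    return sum(price for _, price in line_items)

# Step 3 (refactor): clean up the code while keeping the tests green,
# then repeat the cycle for the next piece of functionality.
```

Running the test suite before writing `total` produces the expected failure (red); running it again after the implementation shows it passing (green).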
BDD is often seen as an extension of TDD, but while TDD looks to verify the correct behavior of individual code functions and components, BDD strives to verify the correct behavior of the system as an executable specification expressed in features and stories. One application of BDD was seen earlier in this chapter when we created the acceptance criteria for the story.
Looking at correct systemic behavior involves three perspectives that work together to bring their points of view on what is specified, what is developed, and what is tested as correct. The three perspectives are as follows:
BDD brings these three perspectives together using specifications. These specifications are written in a Domain-Specific Language (DSL) that employs natural language syntax so that technical and non-technical people can collaboratively develop the specification. One of these DSLs is Gherkin, which divides behavior into the following three clauses:
Multiple GIVEN, WHEN, and THEN clauses may be joined together using AND to indicate multiple conditions, inputs, and behaviors, respectively.
The specification, using the DSL, can become several artifacts. Product owners and product management create the acceptance criteria for features and stories with the other members of the development teams. The creation of acceptance criteria can be seen as the discovery of the desired systemic behavior.
The next phase of creating the specification is formulation. In this phase, developers and testers work together to create acceptance tests. They can take the acceptance criteria as written and elaborate specific criteria in each clause, including allowable initial conditions and values to measure for inputs and outputs so that the specification for a specific scenario becomes a test. Ideally, acceptance tests are written in the same DSL as the acceptance criteria.
We can create an automated test by taking the acceptance criteria for our story and adding specific pre-conditions, input, and desired output or behavior. Let’s look at our previously seen acceptance criteria converted into a test in the following table:
| Acceptance Criteria | Test |
| --- | --- |
| GIVEN I have configured the notification date… | Given the date state is not x… |
| …WHEN the notification date passes… | …when it’s one business day after the date state has changed to x… |
| …THEN I receive an email notification containing my itemized receipts. | …then send an email notification to all users with xxx content. |
Table 11.2 – Conversion of acceptance criteria into a test
The last phase of the specification is automation. The acceptance test, written in the DSL, can be executed in a tool that allows for automated testing. Acceptance tests written in Gherkin can be executed by tools such as Cucumber, JBehave, Lettuce, Behave, and Behat.
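The shape of an automated acceptance test can be sketched in plain Python, without committing to any particular BDD tool. The `NotificationService` class here is a hypothetical system under test, and the GIVEN-WHEN-THEN comments map each step of the scenario to executable code, much as a step-definition file would:

```python
from datetime import date, timedelta

# Hypothetical system under test: a service that emails itemized
# receipts once the configured notification date has passed.
class NotificationService:
    def __init__(self):
        self.notification_date = None
        self.sent_emails = []

    def configure(self, notification_date):
        self.notification_date = notification_date

    def run_daily_job(self, today):
        # Send the receipt only after the configured date has passed
        if self.notification_date and today > self.notification_date:
            self.sent_emails.append("itemized receipt")

# GIVEN I have configured the notification date...
service = NotificationService()
service.configure(date(2024, 3, 31))

# WHEN the notification date passes...
service.run_daily_job(today=date(2024, 3, 31) + timedelta(days=1))

# THEN I receive an email notification containing my itemized receipts.
assert "itemized receipt" in service.sent_emails
```

A BDD tool such as Behave or Cucumber would bind these same steps to the Gherkin text itself, so the specification and the test remain a single artifact.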
Version control software allows for multiple developers on a team to develop in parallel on the same code base, test scripts, or other bodies of text without interference from other developers’ changes. Each change in the version control system is recorded. Changes are consolidated using merge operations to bring together and resolve changes in the bodies of work.
Important practices for the use of version control include the following ideas:
Other best practices of version control that take place during the build process will be identified in our next section.
The features and stories are not the only criteria teams have to consider when developing new capabilities for their product. Non-Functional Requirements (NFRs) are qualities that may impact every feature and story, acting as a constraint or limitation. These NFRs may deal with security, compliance, performance, scalability, and reliability, among other things.
Two practices are performed to ensure compliance with some NFRs: designing for operations and threat modeling. Let’s take a look at these practices.
The collaboration between development and operations is a hallmark of the DevOps movement. Ensuring that operations can easily examine system resources is something that is easily incorporated in the early stages of development rather than added as an afterthought.
A key part of ensuring capabilities are present for proper maintenance of the product is application telemetry. The product as a system must allow for the easy measurement of system resources, including how an application uses resources such as server memory and storage. In addition to system measurements, application telemetry should also allow for the measurement of business data, used as a leading indicator to validate the benefit hypothesis.
Other considerations include ensuring that changes brought by new features can be easily rolled back or that fixes can be rolled forward through the Continuous Delivery Pipeline. When doing this, be aware of components that may represent the state of the system, such as a database. These may not be easily rolled back.
Moving toward a DevSecOps approach requires a shift-left mindset toward security. This mindset allows for including security concerns in the design and development stages of the Continuous Delivery Pipeline, which gives a more holistic view of the product.
We first saw that threat modeling was part of architecting the system in Chapter 10, Continuous Exploration and Finding New Features. As part of threat modeling during CI, we may be asking the following questions:
Developing toward DevSecOps is then based on the countermeasures and mitigation steps identified during the assessment.
As development changes are completed, they must be integrated with the current product as it stands and be tested. This may start the incorporation of automation into a CI/CD pipeline. Next, let’s examine how entry into the CI/CD pipeline begins with a build process.
The CI/CD pipeline can be triggered by version control system actions such as saving a new change as a commit. Before the changes are accepted by the version control system, they should go through a testing process to ensure they will not adversely affect the current code base. This process of testing and version control integration is an important part of the CI process where practices are divided into a version control perspective and a testing perspective.
Let’s look at each of these perspectives and the practices within.
Good version control practices ensure that changes introduced into version control are evaluated through testing before they are saved and merged with the existing code base. This ensures that the code base is robust, without changes that may prevent the code base from being built or packaged correctly.
Version control practices can further be divided into three types that help ensure a robust code base as changes come in. Let’s look at what these practices are in detail.
The practice of CI has its origins in optimizing the build process through automation by a build script or a CI tool. Saving a change to version control through a commit operation sets off a chain of the following steps:
The preceding steps are outlined in the following diagram:
Figure 11.4 – CI automation
When performing the preceding chain of steps, teams eventually figured out that successful code integration relied on the following factors:
CI results in the following outputs, regardless of success or failure:
CI tools such as Jenkins, GitLab pipelines, and GitHub Actions form the basis of the automation that allows the CI of code to occur.
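The chain of build steps can be sketched as a simple pipeline runner: each step executes in order, and the first failure halts the pipeline and blocks the merge. The step names and checks below are illustrative stand-ins, not tied to any particular CI tool:

```python
# Minimal sketch of a CI chain triggered by a commit: run each step in
# order and stop at the first failure.
def run_pipeline(change, steps):
    results = []
    for name, step in steps:
        ok = step(change)
        results.append((name, "passed" if ok else "failed"))
        if not ok:
            break  # a failed step halts the pipeline and blocks the merge
    return results

# Illustrative checks standing in for compile, unit tests, and static analysis.
steps = [
    ("build", lambda c: "code" in c),
    ("unit tests", lambda c: c.get("tests_pass", False)),
    ("static analysis", lambda c: "password" not in c.get("code", "")),
]

good_change = {"code": "def f(): return 1", "tests_pass": True}
results = run_pipeline(good_change, steps)
```

A real CI tool adds triggering on commit, isolated build agents, and reporting, but the core control flow is this fail-fast sequence.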
Multiple developers in a single team or even multiple teams (such as the team of teams that is present in an ART) often must work on the same code base that is saved in version control. Version control systems allow for parallel development of the same code base by allowing changes to be branched out. A developer or team could work on a branch without their change affecting the rest of the team or ART until it was ready to merge and be shared with other developers or teams.
Figure 11.5 – Branching structure example
In the preceding illustration, Team 1 and Team 2 have their own branches of the code base, based on different versions of the main branch (commonly referred to as the trunk). A developer on Team 1 has created a change (D1) and merged it back into the Team 1 branch as a change, T1-2. With separate team branches, how do we know that a necessary change from Team 2 is visible and can be used by Team 1?
Another problem occurs as Team 1 and Team 2 develop on their branches without receiving updates from the trunk, waiting to merge until they release or until the end of the sprint. Keeping track of the many changes from multiple teams and resolving a large number of merge conflicts becomes an enormous challenge.
To keep things simple and ensure changes are visible to all teams, we want to avoid branches that are permanent or long-lived. Developers and teams may form branches to allow for parallel development, but when ready, they must merge back to the main branch, deleting the short-lived branch in the process. This process is known as trunk-based development. The following diagram highlights the process of trunk-based development:
Figure 11.6 – Branching structure with trunk-based development
Trunk-based development allows for easier merge operations to occur since a merge to the main branch is happening on each validated change instead of a group of changes.
With trunk-based development, we are merging changes to the main branch of the code base as often as possible. This main branch is used by all the teams on the ART. Since the integrity of the main branch is vital for multiple teams, how do we ensure that any errant changes don’t break the current code base?
To ensure a robust code base, we must follow a gated commit process where before a change is allowed to merge with the main branch, it must successfully pass the build and test process. Additional measures, such as a review of the change, may also be taken.
In Git-based environments, Git servers from Bitbucket, GitLab, and GitHub define gated commits as pull requests or merge requests that allow for closer scrutiny when a merge operation is requested.
We saw that build processes rely on testing to ensure that changes to a code base do not adversely affect the functionality or security of the product. These tests are important because, ideally, the build process is performed on every saved change; if the build succeeds, the next step is merging the change to the main branch, which is visible to the team, or to multiple teams in the case of the ART.
The build process involves two types of tests that are run against a potential new version of the code base: automated unit testing and static analysis for application security.
Let’s take a deeper look at each type of test run as part of the build process.
Unit tests are often written at the same time as code, if not beforehand. These unit tests may be run by an individual developer on their workstation while the code is being developed. If that’s the case, why run them again as part of the build process?
The main idea of CI is to ensure a standard, reliable process. Automation through a CI/CD pipeline ensures that this occurs for all developers. Adding unit testing during the build process on an automated CI/CD pipeline ensures that the unit tests are run on every developer’s code change every time.
It’s also important to make sure that any updated unit tests are part of the same change to the code base in version control. This ensures that code changes are validated against the correct tests, preventing a situation where the CI/CD pipeline stops because of stale or incorrect tests. Collaborative development between those creating the code and those creating the tests is required to ensure this situation doesn’t occur.
Static analysis is a process by which a tool scans the text of the code base, including the potential code change that is being checked in, to find specific text patterns. These text patterns can be used to identify the following issues:
The analysis is performed without the need for executing the application. Because of this, static analysis is an efficient means of checking for problems in the build process.
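The idea of scanning source text for patterns without executing it can be illustrated with Python’s standard library `ast` module. This toy analyzer looks for one pattern, a string literal assigned to a suspicious variable name, which is a simplified stand-in for the kinds of rules real static analysis tools apply:

```python
import ast

# Names that suggest a hardcoded secret; illustrative, not exhaustive.
SUSPICIOUS_NAMES = {"password", "secret", "api_key"}

def find_hardcoded_secrets(source):
    """Scan source text (without executing it) for string literals
    assigned to suspicious variable names."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and target.id.lower() in SUSPICIOUS_NAMES
                        and isinstance(node.value, ast.Constant)
                        and isinstance(node.value.value, str)):
                    findings.append((node.lineno, target.id))
    return findings

sample = "user = 'admin'\npassword = 'hunter2'\n"
findings = find_hardcoded_secrets(sample)  # flags line 2
```

Because the code is analyzed as a syntax tree rather than run, the check is fast and safe to perform on every commit.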
As we saw in Chapter 3, Automation for Efficiency and Quality, static analysis can fall into the following two categories:
Our application has passed the first set of tests, but is it ready for the rigors of a production environment? To answer that question, we look to perform system-level testing. Our next section examines the practices that enable system-level testing.
At this point, we have performed tests on individual pieces of code and ensured the correct functionality while maintaining security. Here, we start to integrate the new changes of code with the existing code base and evaluate the system as a whole by testing the system end to end.
The practices that allow for true end-to-end testing of the system will be examined in the upcoming sections. Let’s dive in.
System-level testing should be performed in an environment that resembles the production environment as closely as possible. Testing in such an environment provides higher confidence that the solution will work when actually released into production. The more similarities a test environment has with the production environment, the fewer variables come into play when problems are found and troubleshooting for the root cause begins.
A key factor in ensuring the equivalence between test environments and the production environment is the use of configuration management. With configuration management, key resources such as the operating system version, versions of key drivers, and versions of applications are recorded in a text-based configuration file. Ideally, the configuration file is maintained in version control with labels that indicate the version of the solution and its application in the test environments and production environment.
Because the cost of allocating exact duplicates of production resources may be prohibitive, the key to maintaining equivalence is matching the exact versions of resources rather than the exact number of resources.
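The principle of matching versions while ignoring instance counts can be sketched as a drift check over configuration data. The environment records below are hypothetical examples of what a version-controlled configuration file might capture:

```python
# Hypothetical environment configuration, as it might be recorded in a
# version-controlled configuration file.
prod = {"os": "ubuntu-22.04", "db_driver": "9.4.1", "app": "2.7.0", "instances": 12}
test = {"os": "ubuntu-22.04", "db_driver": "9.4.1", "app": "2.7.0", "instances": 2}

def version_drift(env_a, env_b, ignore=("instances",)):
    """Return the set of keys whose values differ between two
    environments, deliberately ignoring instance counts."""
    keys = (set(env_a) | set(env_b)) - set(ignore)
    return {k for k in keys if env_a.get(k) != env_b.get(k)}

drift = version_drift(prod, test)  # empty set: the environments are equivalent
```

Here the test environment has fewer instances than production, yet the drift check still reports equivalence, because every recorded version matches.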
System-level testing can encompass a variety of test levels, many of which can be automated. When looking at tests at these various levels, which should be automated, and which should be run frequently?
In addition to the testing pyramid mentioned earlier, we may need to consider who needs to understand the test results. A second consideration is whether the test verifies that the solution meets requirements or helps developers see whether their design approach is correct.
The Agile testing matrix looks at the various kinds of tests and organizes them from these considerations. The following diagram depicts the Agile testing matrix as seen in the SAFe article on Agile Testing (https://www.scaledagileframework.com/agile-testing/):
Figure 11.7 – Agile testing matrix (© Scaled Agile, Inc., All Rights Reserved)
We can see from the preceding diagram that the first consideration looks at the perspective of either the business or the technology. Developers look at the technology tests to ensure the correct functionality and proper operation of the solution. End users look at the business-facing tests to ensure an understanding of the solution and the validation of the benefit hypothesis.
We can also see the second consideration: whether the test informs the complete solution or the implementation. Tests that guide development assist in TDD and BDD approaches where the test is written first. Tests that critique the product look to see whether the solution complies with user requirements.
With two areas of concern within each of the two considerations, we can divide tests into the following four quadrants:
We will see that tests in Q3 are done during CD in Chapter 12, Continuous Deployment to Production. Tests in Q4 are done during Release on Demand as mentioned in Chapter 13, Releasing on Demand to Realize Value.
A key part of ensuring the similarities between a testing environment and the production environment is the data used to test the solution. Using data that could be found in production environments allows for more realistic test outcomes, leading to higher confidence in the solution.
Realistic test data can come from either synthetic data or real production data. Production-based test data may come from a backup of production data restored into the test environment; any information considered private should be removed from it.
Synthetic test data is fake data created by a data generation tool such as DATPROF Privacy and Gretel. It offers the advantage of not requiring an anonymization step to redact private information.
Regardless of whether the data is anonymized production data or synthetic data, the test data should be maintained in version control, using artifact repository software for large binary-based data.
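A minimal sketch of synthetic data generation using only the standard library follows; dedicated tools such as those mentioned above offer far richer, schema-aware generation. The customer record fields are invented for illustration:

```python
import random
import string

def synthetic_customers(n, seed=42):
    """Generate n fake customer records. The seed makes runs
    reproducible, which matters for repeatable tests."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        rows.append({
            "customer_id": 1000 + i,
            "email": f"{name}@example.com",  # fake, so no anonymization step is needed
            "monthly_spend": round(rng.uniform(5.0, 500.0), 2),
        })
    return rows

data = synthetic_customers(3)
```

Because the data never contained real personal information, it can be shared freely across test environments and checked into version control.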
Service virtualization allows for test environments to behave like production environments even when the test environment is missing resources available in production. The production environment may have crucial dependencies on key components that are impossible to copy because of the following factors:
Systems made up of components that communicate together using well-known interfaces can take advantage of service virtualization to simulate the behavior of one or more components. If the virtualized service is a database, it can return synthetic test data.
Components that are simulated in test environments are called virtual assets. Virtual assets are created by tools that measure the true component’s behavior by the following methods:
Once the virtual asset is created, it takes its place in the test environment. An illustration of the difference between the production environment and the test environment with virtual assets is as follows:
Figure 11.8 – Production vs. test environment
Popular tools for creating virtual assets include SoapUI and ReadyAPI from Smartbear, MockLab, Parasoft Virtualize, and WireMock.
An important difference to consider: while service virtualization may seem similar to mocking or stubbing a component, the two are not the same. A mock component or stub may be added during development when a component is not ready for release. A mock object typically returns only one type of output, a success message, so that the development of other components is not impeded. Service virtualization, in contrast, reproduces proper behavior across a wide variety of scenarios.
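The record-and-replay idea behind a virtual asset can be sketched in a few lines. The billing service and its endpoints below are hypothetical; note that the virtual asset replays error scenarios as well as successes, which is exactly what distinguishes it from a success-only mock:

```python
# Sketch of a virtual asset: recorded request/response pairs from the
# real component are replayed in the test environment.
class VirtualAsset:
    def __init__(self):
        self.recordings = {}

    def record(self, request, response):
        # Capture the real component's observed behavior
        self.recordings[request] = response

    def respond(self, request):
        # Replay recorded behavior, or fail loudly for unknown traffic
        return self.recordings.get(request, {"status": 501, "body": "not recorded"})

billing = VirtualAsset()
billing.record("GET /receipts/42", {"status": 200, "body": "itemized receipt"})
billing.record("GET /receipts/99", {"status": 404, "body": "no such customer"})
```

Real service virtualization tools build these recordings by observing live traffic or interface definitions, but the replay principle is the same.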
Environments with virtual assets should be maintained in configuration management tools. The configuration files and interface definition files for virtual assets should be kept in version control alongside the application, with labels identifying their role as test assets for specific versions of the application.
As we perform end-to-end system testing, we need to remember the constraints our system has, which we have previously identified as NFRs. NFRs affect every story and feature, acting as a constraint that must be heeded. Qualities such as security, performance, reliability, and scalability, among other things, should be examined through testing to verify that these constraints aren’t broken.
The testing of NFRs is often automated, involving specialized testing tools. Agile teams on the ART often work with the system team to ensure that the tooling is established to perform testing for NFRs as part of the end-to-end testing.
After the portfolio of tests is performed on our change, we may want one more opportunity to see whether our change is ready to be deployed to production. We place our changes into a staging environment, a stand-in for the production environment for a final examination before deploying changes to production. Let’s look at the activities involved in deploying changes to staging.
We may want verification that we can deploy the change to a production-like environment and verify that our solution still works. To enable this last look, we employ certain practices.
Let’s look at these practices in depth.
A staging environment is a facsimile of the production environment, which has several uses throughout the PI. It is the place where demonstrations of the system as it currently stands are performed in the system demo. User acceptance testing can be performed in this environment, which is as close to production as possible.
As the changes to the product are being developed, the staging environment shows the state of change before deployment to production. At the very least, changes to the staging environment happen every sprint or iteration for the ART. More frequent changes are allowed as long as the build process and end-to-end testing are completed successfully.
A staging environment may also act as an alternative environment for production in a configuration known as blue/green deployment, which may allow for easy rollback in the event of production failures. Let’s take a look at this configuration now.
In a blue/green deployment, you have two identical environments. One is the production environment serving live traffic, while the other sits idle on standby.
The idle environment receives the latest change where thorough testing occurs. At the appropriate time, the change is released by making the idle environment live and the other environment idle. This transition is illustrated by the following graphic:
Figure 11.9 – Blue/green deployment release of a new version
If problems are discovered, a switch back can be made to roll back the changes. This transition back and forth is easy for stateless systems. Otherwise, blue/green deployments must be carefully architected so that components that store state, such as databases, are not corrupted during the transition.
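The release-and-rollback switch can be sketched as a router that points live traffic at one of two identical environments. The environment names and versions below are illustrative:

```python
# Sketch of a blue/green switch: releasing flips the live pointer to the
# idle environment, and rolling back flips it again.
class BlueGreenRouter:
    def __init__(self):
        self.environments = {"blue": "v1.0", "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version):
        # Stage and thoroughly test the new version in the idle environment
        self.environments[self.idle] = version

    def release(self):
        self.live = self.idle  # flip: idle becomes live

    def rollback(self):
        self.live = self.idle  # flip back (safe only for stateless components)

router = BlueGreenRouter()
router.deploy_to_idle("v1.1")
router.release()
```

Because both environments remain fully provisioned, the flip in either direction is nearly instantaneous; the hard part, as noted above, is any state that cannot simply be switched.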
At the end of every sprint or iteration in the PI, after each team’s iteration review, the teams get together to integrate their efforts. Working with the system team, they demonstrate the current state of the product as it stands so far in the PI in the staging environment. Business owners, customers, and other key stakeholders of the ART are present at this demonstration to view progress and supply feedback. This event provides fast feedback on the efforts of the ART so far.
Note that the system demo does not prevent the deployment of changes into production. That may continue to happen automatically as part of CD (which we will visit in the next chapter), but feedback may prevent the release of the change to customers until changes resulting from the feedback make their way into production and are released on demand.
Successful testing in a staging environment gives us confidence that our change has the correct functionality and is robust enough in a production environment, but the only true way to prove that is to deploy our change into the actual production environment.
In this chapter, we continued our discovery of the Continuous Delivery Pipeline by looking at CI, the part that implements the features created in Continuous Exploration. Features are divided into more digestible stories. Development of not only the product but also the tests to verify the product begins. Security and designing for operation concerns are included in the development.
The build phase introduces automation into the pipeline. When a commit to version control occurs, unit tests are run to ensure continued, correct functionality. The build is also scanned for coding errors and to find security vulnerabilities. If everything is correct, the commit will be allowed to merge with the main branch, or trunk, of the version control repository.
A successful build can trigger further testing in a testing environment that may be similar to the production environment. Here, system-level, end-to-end testing happens to guard against any production failures. The testing here is as automated as it can be. Accurate test data and service virtualization may offer a reasonable facsimile to a production environment for testing.
When building and testing are complete, the change may find itself in a staging environment, a copy of the production environment, or one-half of a blue/green deployment. A staging environment is also a place where changes are shown during a system demo, an event where the ART receives feedback on the development of the system at the end of each sprint or iteration.
After making its way to the staging environment, we must move our changes into production. That happens in CD, which we will explore in our next chapter.