12
Case Study

A saying in the military field, attributed to General Omar Bradley, states that “amateurs talk strategy; professionals talk logistics”. This maxim also applies to the world of software testing, and even more so to system-of-systems testing. In the previous chapters, we described many aspects related to testing, each in isolation. The interactions between the actors and the processes make the logistical aspects as important as – if not more important than – the methodological aspects in achieving the desired level of quality.

Below is a case study involving various development models and functional areas. It will allow us to see how the topics of the preceding chapters interact in the context of system-of-systems testing. This case study merges various clients that the author has had the pleasure of working with, so it is not representative of any one client or industry. The proposals and suggestions for implementing test activities are the author’s and not the choices of the companies that served as inspiration.

A common point is the difficulty of having all the elements necessary to control the quality of the systems-of-systems to be delivered. The most advanced techniques, resources, testing tools and environments are necessary, but they are constrained by the reality of limited budgets, very short deadlines, siloed visions and limited short-term objectives.

12.1. Case study: improvement of an existing complex system

This case study does not relate directly to a system-of-systems, but to complex systems produced for a very large industrial company, present in many countries, tracking objects by telemetry and interfacing with different ERPs depending on the country or continent. The integration of these proprietary systems with software packages (ERP, CRM and WMS) may justify the use of the term system-of-systems.

12.1.1. Context and organization

12.1.1.1. Context

The organization of the company reflects the organization of sales along geographical lines, by continent: EMEA (Europe, Middle East and Africa), NAM and SAM (North America and South America) and APAC (Asia-Pacific). An international matrix organization brings together the types of products supported and marketed, and the development and quality resources are allocated according to the annual needs of each geographical area and each type of product.

All of the software is a few years old, having been initially developed sequentially and then having – in varying proportions – taken the turn towards agility, with three-week sprints. The latest deliveries of many products revealed a significant number of failures during the UAT (User Acceptance Testing) phase. It was therefore decided to implement automated testing activities in order to achieve the organization’s long-term objective, which is to implement continuous integration, continuous delivery and continuous testing (DevOps CI/CD/CT).

The objectives are multiple:

– improve quality without negative effect on delivery times;

– set up processes that can be followed regardless of the development team;

– design metrics and KPIs to measure the increase in the level of software quality;

– limit costs (e.g. automated tools, test environments, etc.) and hiring given the shortage of adequate manpower;

– set up end-to-end (E2E) tests to ensure that an order entered on one system is correctly processed by the other systems, from its manufacture/production to its delivery and storage at customers (including real-time tracking of objects and pallets), invoicing and storage of information for statistical purposes.

The tests in progress at the time of our intervention are limited to functional tests via the user interface; some of these tests are automated with a “no code” type capture/replay test tool, and grouped in an Azure DevOps type framework. Development activities are dispatched between Europe (Paris and Madrid), North America (USA and Canada) and South America (Argentina) as well as Asia (Singapore and Jakarta).

At the time of our involvement with this client, the documentation available to determine the actions and processes of the SQA team was mainly a collage of recommendations found on the Internet or taken from the ISTQB Foundation Level. Since the development framework and a “no code” capture/replay tool had already been chosen, the monitoring and reporting elements were limited to what these tools and this framework made it possible to obtain, without focusing on the real needs of the stakeholders.

12.1.1.2. Organization of sprints and impacts

The sprints of each project were organized as described in Figure 12.1. This organization included a sprint (sprint 1) in which new features were developed; these features were tested in a test environment during the next sprint (sprint 2), the anomalies identified could be corrected from sprint 3 onwards, and the regression tests – manual or automated – were carried out from sprint 3. This solution made it possible to have a repository of regression tests close to the needs of the project. The duration of the sprints varied between two and three weeks. A grouping of several sprints, in the spirit of the SAFe philosophy, made it possible to manage release deliveries to customers and users.

As the applications evolved, automation fell behind: the growing complexity of the applications delayed the automation work. This resulted in an increase in the number of manual regression tests and therefore in the duration of manual regression testing, which eventually took up the entire duration of a sprint.


Figure 12.1 Case study: project organization

It is easy to note that the testing activities for a sprint’s developments only started in the following sprint, or even two sprints later, thus going against one of the seven basic principles of testing, namely early testing. The result was that (1) anomalies were found late and corrected even later; (2) the defect remediation workload not being factored into the sprint, many anomalies were not fixed, increasing technical debt; and (3) the automated tests only covered regression and therefore had little chance of discovering anomalies.

12.1.2. Risks, characteristics and business domains

12.1.2.1. Risks and challenges

The challenges for the SQA (Software Quality Assurance) team are:

– Ensure the quality of software that is several years old and still under development, by guaranteeing the absence of regressions and an increase in quality, both for new developments and for older developments carrying technical debt.

– Be recognized as specialists, both in the functional areas (production, delivery, monitoring and maintenance of the products and services marketed) and in the organizational and relational aspects of working with the development teams, including the provision of quality assurance services (functional testing, anomaly monitoring, root cause identification, continuous improvement, etc.) and the achievement of defined objectives.

– Improve the previously designed tests, whether automated or not. Indeed, a large number of tests did not include a step comparing the results obtained with the expected results and merely ensured that the test case executed without crashing; the majority of the tests required manual actions (e.g. selecting data or creating search criteria) and therefore the continuous presence of a human being to supervise their execution.

– Generate traceability of test cases from and to Epics, Features, User Stories and other PBIs (Product Backlog Items) in order to allow measurement of requirements coverage. It should be noted that this will have to take into account old developments which could not be documented.

– Generate a subset of automated tests – such as smoke tests – that can be run without human supervision, for example, at night and/or on virtual machines, to guarantee the quality of the products delivered. This would allow SQA team members – during the day – to design tests and analyze the tests that ran overnight. Automated tests should be able to run at night so as not to tie up the SQA team’s workstations. Wherever possible, these automated tests should be modular and cover end-to-end functional activities across all systems. This implies that the test automation itself must be demonstrably reliable, which means considering the periodic injection of faults in order to verify that reliability.

– Implement DevOps-style end-to-end automation in order to accelerate the delivery of new functionalities and make it more reliable. This must cover the automatic verification of anomaly corrections (focused regression tests) as well as the identification of possible side effects, both local (within the same functionality or the same component) and transverse (impacts affecting several features).

– Extend testing beyond purely functional verifications via user interfaces to focus on the most significant risks, namely, the processing of production, transport, storage and distribution flows of products, mainly processed via APIs. This involves understanding the business operation of each application and how the various applications interact within the system-of-systems and defining E2E tests that can cover several separate systems. Initially, it was decided to cover only APIs exposed to the outside and not all APIs.

– Propose methods for improving the design and developments in order to limit the introduction of defects in the code (e.g. via techniques such as specification reviews, TDD, code reviews and static analyses). This should go as far as the design of BIT (Built-In Tests) and CBIT (Continuous Built-In Tests) whose objective is – during production – to allow the application to determine its operating state and to alert problem monitoring processes.

– Propose areas for improving or extending the quality of software in order to test computer security and measure performance. In the context of API testing, this includes, for example, identifying and responding to DOS or DDOS type attacks, as well as checking for malformed messages or transactions. In the context of performance tests, it will be necessary to process both transactions by user interfaces and transactions from objects tracked via telemetry.

A major risk is rejection by the various teams – development, business, etc. – of the recommendations for improvement, whether because they are too cumbersome to implement, or too costly, or introduce delays in deliveries.

Finally, it will be necessary to consider that the improvements can never all be carried out at the same time, that some of these improvements have dependencies and that the systems are being converted from a legacy architecture on proprietary systems to a cloud architecture with software packages and redesigns with more powerful languages, as well as centralized systems administration.

12.1.2.2. Characteristics to cover

Identification of the current state:

– quality objectives determined by the business;

– quality level of the tests;

– software quality level;

– definition of targets, KPIs and metrics.

The quality characteristics (see ISO 25010) covered were mainly functional. The test documentation (Test Policy and Test Strategy) did not reflect what was being done or what would achieve the desired objectives. In order to ensure that the objectives of the new SQA cell met the wishes of the business and production teams, brainstorming sessions were set up to identify the expectations and the means to be implemented.

Existing tests, previously developed by inexperienced testers, quickly demonstrated their limitations (e.g. need for specific test data, lack of independence or modularity). It was therefore necessary to analyze them and replace them as needed.

12.1.2.3. Business domains

The business areas to be covered related to the three major system businesses:

– production and monitoring thereof;

– bulk distribution and transport;

– distribution of reusable components (therefore, their delivery and return).

For each business, on-board equipment (e.g. Android tablets), remote components (IoT telemetry modules) and heavier equipment (PC type and heavy clients) interacted with servers. Each business application was configurable and the behavior differed according to continents and/or countries. In addition, the systems were linked, among other things, to software packages such as ERP, CMS and WMS, which could be different depending on the continent, generating significant combinations.

12.1.3. Approach and environment

The initial approach focused on functional testing performed through user interfaces only. The long-term objective is to move towards DevSecOps. A “no code” test tool (not requiring coding) was chosen for the automation. Investments having already been made, it was not possible to change to another tool.

Limiting the approach to user interfaces alone did not allow us to address the IoT aspects, which formed the majority of the functionalities of the applications. An adaptation of the testing approach was therefore necessary to cover performance (monitoring was carried out only in production), security (handled by a separate team) and also the APIs, which formed the basis of the functionalities provided by the systems.

The approach could not be disruptive: it had to continue to provide service for current applications and improve this service on an ongoing basis.

In addition, testing – and development – was done in silos, so we had to think about how to combine testing in such a way as to allow end-to-end – automated – validation of features for the entire organization.

12.1.3.1. Approach

Imposing quality improvement by force never works; we have to raise awareness of the need and take small steps, while maintaining a long-term vision. The long-term vision included:

– orientation towards DevOps, therefore continuous development, continuous integration and continuous testing, while allowing traditional approaches to be retained where necessary;

– greatest possible automation of tests in order to be able to launch them without human supervision (e.g. at night);

– need to ensure that the functionalities delivered to customers do not suffer from regressions, whether these are regressions due to changes in the tested application or its execution environment (DBMS, OS, etc.);

– need to harmonize the tests whatever the application to allow a tester to take over from a colleague or to support them without the need for long training;

– design of keyword-driven tests on each of the applications of the system-of-systems, to allow functional E2E tests on the system-of-systems. This implied that tests could be recursive, could be run from another system and could centralize execution results (a minimal sketch of such a keyword engine follows this list);

– automatic execution of API and message system tests, so as to verify the Backend without having to go through the Frontend (or the user interface);

– the need to consider a test environment that could completely simulate production and all applications, to test end-to-end, if necessary with ERP, CMS or WMS. This would allow for performance testing and security testing, in addition to actual End-to-End testing.
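
As an illustration of the keyword-driven orientation mentioned in this list, the following minimal sketch shows one possible way such a keyword engine could be organized. The keyword names, the CSV suite format and the actions are assumptions made for the illustration; they do not represent the client’s actual tooling.

```python
# Minimal keyword-driven test engine sketch (illustrative only; the keyword
# names, CSV format and actions below are assumptions, not the client's tooling).
import csv

def login(user, password):
    print(f"logging in as {user}")            # placeholder for a real UI/API call

def create_order(product, quantity):
    print(f"creating order: {quantity} x {product}")

def check_order_status(expected):
    print(f"checking that order status is '{expected}'")

# Each keyword maps to an executable action; other systems of the
# system-of-systems could expose the same table format so that an E2E suite
# can chain keywords across systems and centralize the results.
KEYWORDS = {
    "login": login,
    "create_order": create_order,
    "check_order_status": check_order_status,
}

def run_suite(path):
    """Execute a suite described as CSV rows of: keyword, arg1, arg2, ..."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue                       # skip blank lines and comments
            keyword, *args = [cell.strip() for cell in row]
            KEYWORDS[keyword](*args)           # dispatch the keyword to its action

if __name__ == "__main__":
    run_suite("smoke_suite.csv")               # hypothetical suite file
```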

The approach was carried out at two levels:

– at the managerial level, to replace the Test Policy and the Global Test Strategy which did not reflect the objectives of the organization, and therefore propose a long-term vision that can be accepted by all;

– at the level of each project, to continue existing activities and propose improvements that would go in the same direction as the Global Testing Strategy.

12.1.3.1.1. Bottom-up approach

The approach implemented on each of the projects was to measure the defects (taking at least six months of history to have a representative sample), to seek the root causes of these defects – as far as possible – then to group these defects according to various axes (e.g. main functionalities, technical component and sub-component, etc.). This made it possible for the design teams to pinpoint the components that deserved to be redeveloped (those that concentrated the defects), to make the team managers aware of the need to measure their velocity and the ability of their teams to simultaneously develop new functions and correct defects to avoid an increase in technical debt.
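
As an illustration of this defect-grouping step, a minimal sketch follows. The CSV export and its field names (component, main functionality, root cause) are assumptions; a real analysis would use whatever fields the defect tracker actually provides.

```python
# Sketch of the defect-grouping step (field names and the CSV export are
# assumptions; the idea is simply to count defects per axis over ~6 months).
import csv
from collections import Counter
from datetime import datetime, timedelta

CUTOFF = datetime.now() - timedelta(days=180)   # at least six months of history

def load_defects(path):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if datetime.fromisoformat(row["created"]) >= CUTOFF:
                yield row

def group(defects, axis):
    """Count defects along one analysis axis (component, functionality, root cause...)."""
    return Counter(d[axis] for d in defects)

if __name__ == "__main__":
    defects = list(load_defects("defects_export.csv"))    # hypothetical export
    for axis in ("component", "main_functionality", "root_cause"):
        print(axis, group(defects, axis).most_common(5))   # hotspots worth redeveloping
```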

At the level of the development and test framework used, a reflection was carried out on each test project in order to determine the best organization of the directories, while keeping in mind the need to adapt whatever the development cycle. Ease of understanding for newcomers was also considered. The solution that emerged was a grouping by major functionalities and sub-systems, with an identification of PBIs and sprints to identify what was specific to each sprint, while retaining the ability to more easily identify the side effects associated with the same functional areas.

An update of the “Test Policy” and “Test Strategy” documents was started to obtain documents that can generically cover systems-of-systems and hybrid traditional/Agile approaches.

12.1.3.1.2. Top-down approach

The proposed approach made managers aware of their perceived needs, through brainstorming sessions to identify areas for improvement. The results of the bottom-up approach above were used to show the current state of play and raise awareness among all stakeholders. The main difficulty with this approach was, on the one hand, finding a time slot suitable for all the participants and, on the other hand, maintaining the interest of all the participants during the meeting.

12.1.3.2. API testing

API testing became a necessary activity to cover the verification and validation needs of the messages exchanged between applications (IoT). We had the opportunity to focus on one application and a limited number of APIs, which allowed us to use these APIs as pilots and set up coherent processes that could later be used to automate API testing on this application and on others.

It was then considered to use the exchange files received and generated by these APIs as data sources for testing the APIs of other applications of the system-of-systems, in order to set up end-to-end tests between the various systems.
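
The following sketch illustrates this chaining principle: the output of one system’s exposed API is reused as input for another system’s API in an end-to-end check. The endpoints, payloads and field names are hypothetical, and real tests would also have to handle the asynchronous nature of the exchanges (e.g. by polling).

```python
# Hypothetical end-to-end API check: reuse the output of system A as the input
# of system B (URLs, payloads and field names are invented for illustration).
import requests

def test_order_flows_from_erp_to_wms():
    # Step 1: create an order through system A's exposed API.
    order = {"product": "PALLET-42", "quantity": 10, "country": "FR"}
    resp_a = requests.post("https://erp.example.test/api/orders", json=order, timeout=30)
    assert resp_a.status_code == 201
    order_id = resp_a.json()["id"]

    # Step 2: feed the generated identifier to system B and verify that the
    # asynchronous exchange eventually produced a consistent record
    # (a real test would poll or wait for the message to be processed).
    resp_b = requests.get(f"https://wms.example.test/api/shipments/{order_id}", timeout=30)
    assert resp_b.status_code == 200
    assert resp_b.json()["status"] in ("RECEIVED", "PLANNED")
```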

12.1.3.3. Implementation

We involved the SQA team as quality ambassadors. This meant:

– To identify the level of software quality, for example, by showing the number of anomalies not corrected to date (the backlog of anomalies) as well as the cumulative curves of anomalies discovered and closed. These curves, put in relation with the dates of the planned delivery milestones and the capacity of the production teams to correct the anomalies in the duration of the planned sprints, made it possible to highlight the significant technical debt on certain software.

– To identify the timescales for correcting anomalies, taking into account that an anomaly detected in one sprint can only be corrected in the following sprint and retested in the sprint after that, unless a strong decision is made in favor of quick anomaly correction, or refactoring and cleanup sprints are implemented every three development sprints.

– To ensure that the members of the SQA team were competent both in the technical aspects of testing (methodologies, techniques and tools), in the aspects of development and interaction between Development and Tests, of fault injection according to the development phases, and also in the transverse functional aspects in order to quickly identify the impacts on the other applications of the complex information system. In addition, the members of the SQA team had to be quality ambassadors and realize that they were unable to test their applications exhaustively, but that they had to rely on the goodwill of the development teams. This involved important relational skills.

– To identify areas for improvement and find the right arguments to justify their implementation. This includes identifying the need for dedicated test environments, implementing processes for publishing new versions without destroying the version previously installed on the SQA workstations, and managing data versions and their storage in order to quickly build dedicated environments for each of the countries to be covered, given the differences in processing, regulations and settings.

– To raise awareness that quality could not be added at the end of the project, but had to be present throughout the development cycle, and had to be measured in order to identify each of the development phases where improvements could be implemented. This required – for the members of the SQA team – a change in the way they perceived their tasks, among other things to take a step back from the anomalies found in order to determine whether the corrective effort – and the continuous improvement of the processes to implement – was worth the investment.

– Finally, to offer a range of software quality improvement services, from their design to their maintenance and their end of life (obsolescence and replacement by another system). These services had to be pragmatic and cost-effective, modular and able to be implemented without involving major modifications that would lead us into a dead end.

The implementation was proposed in three phases: short, medium and long term. The short-term phase implemented risk-based testing with the use of all the testing techniques mentioned in Chapter 9, grouping according to business functionality, and functional test automation documented and shared between all applications. The medium-term phase was based on the previous one and developed API tests and messages exchanged between applications. This revolved around the implementation of test engines driven by keywords (KDT – Keyword-Driven Tests) to allow the execution of test suites in a similar way regardless of the system or application. The principle included the reuse of output data from one application to use as input to another application. Thus, a consistent test environment could be developed even though the interactions were asynchronous. The longer-term phase, depending on whether or not it was possible to have a complete test environment – allowing step-type testing – relied on risk analysis to design tests on quality characteristics that were not adequately covered.

12.1.3.4. Environments

Initially, the test environments were limited, for each project, to a single environment shared with the teams in charge of the UATs carried out by the customers and their representatives. This did not make it possible to clearly separate the failures found by the UATs from those found by SQA. This resulted in a negative view of SQA actions, which did not detect defects quickly enough (at least to the liking of users and their representatives). In addition, the lack of a dedicated environment forced development teams to share the SQA environment to simulate production environments in order to reproduce certain anomalies. This temporarily blocked the execution of test campaigns.

Two activities were carried out in parallel to limit these points:

– set up a process for publishing builds that is as automated as possible in order to avoid problems with creating incomplete builds;

– determine the possibility of having environments dedicated to SQA and entirely managed by SQA. The objective here was – in addition to controlling the software versions – to control the configuration data allowing each country to be simulated.

A second aspect concerning the environments focused on the needs for E2E tests which involved the use of all the applications present in production, including CRM, ERP and WMS. Initially, integration tests were done with mock objects, which made it impossible to ensure end-to-end operation. Setting up and maintaining end-to-end environments – including synchronizing data between systems – would have become extremely expensive and complex, but would provide a significantly greater level of trust than originally existed.

Here too, it is necessary to measure whether the expected benefit is worth the investment. That is, is the number of anomalies that can be found large enough to justify the workload and the amounts invested? This can be determined by analyzing the root causes of production anomalies and the frequency of these root causes. In general, each anomaly originates in some phase of the development chain; identifying the root cause, and therefore the phase in which the anomaly was introduced, makes it possible to focus improvement efforts on that phase.

The needs analysis led to the proposal to copy the databases and settings information for each of the countries/continents, in order – by a set of backups/restores – to update the test environments to meet the needs of each campaign.

A third aspect relates to IT security and the rules imposed by the group on the use of test environments. As long as the test environments were controlled by the organization, certain tests (e.g. API testing) could be executed. On the contrary, as soon as the environments were managed by the group, the rules and directives of the group applied. The specificities and needs of the tests were not taken into account, among other things because they could represent behavior similar to a cybersecurity threat. This generated significant delays.

12.1.3.5. Deployment

The deployment of new versions and releases in test or qualification environments was also an element to improve: manual deployment, including all stages of verification, traceability, etc. took about 4 hours. Manual actions were the source of errors often requiring the deployment to be restarted. Each deployment in an environment (e.g. TEST, INT or PPROD) had to be checked to identify possible regressions. As the deployments were configured according to the countries (e.g. selection of settings and translations), these deployments required human intervention.

The objective was to reduce deployment times and secure them. Securing was carried out by testing SQL queries and conversion scripts, while the reduction in duration benefited from the automation of actions and the implementation of settings that can be used during automated tests and allowing night execution without human intervention.

12.1.3.6. Defect fixing focus

The volume of defects to be corrected, the capacity limits of the development teams and the very short deadlines forced us to look for solutions to optimize the actions of the DEV and TEST teams. Among the solutions considered:

– Focus the corrections on the components with the most failures. This was relatively easy to identify from the anomaly statistics. It should be noted that the software components were the same regardless of the country, the only difference being the configuration. Anomalies were focused by country and system and acceptance test sequences were also done by country to limit the number of environments. Statistically, everything suggested that, if the anomalies were processed by country, regressions would appear in the modules that had just been corrected.

– Focus corrections according to their priorities and severity, focusing first on RPNs less than or equal to 4 (priority or severity ≤ 2). This solution forces developers to analyze anomalies one by one and to make corrections in different modules of the system. The focus of anomalies by RPN, however, does not correspond directly to the planned content of the new version of the system, so could be considered as a sub-optimal activity. The technical debt included a significant number of old anomalies.

– Ensure that the anomalies identified in the TEST environment were really regressions and not elements that worked the same way in production. This was necessary because the approved changes were limited to a few very limited evolutions. The continuous improvement of applications – by the detection of improvable elements by the test team – was outside the scope of new versions.

– Identify the velocity of the development team to determine how quickly the defect backlog could be resolved, assuming no new defects are detected. This velocity information requires statistical data over previous sprints. In the absence of such information, a rough rule-of-thumb estimate (e.g. one anomaly corrected per day and per person) allows some anticipation (see the small calculation sketch below). This reinforces the need to measure – effectively and continuously – the basic KPIs to have relevant data on which to base decisions.
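
As an illustration of this kind of rough estimate, the following sketch (with invented numbers) computes how many sprints would be needed to clear a backlog under the stated assumption of one correction per developer per day.

```python
# Back-of-the-envelope backlog estimate (all numbers are illustrative).
import math

backlog = 205            # open defects
developers = 4           # people available for corrections
fixes_per_day = 1.0      # rough assumption: one fix per developer per day
sprint_days = 10         # two-week sprint

days_needed = backlog / (developers * fixes_per_day)
print(f"~{math.ceil(days_needed)} working days, "
      f"i.e. ~{math.ceil(days_needed / sprint_days)} sprints, "
      "assuming no new defects arrive")
```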

Anomaly triage and cleanup activities were required to focus remediation efforts based on business needs. We were confronted here with the contradictory needs of certain customers/users who either did not understand the philosophy underlying the applications or wanted to implement solutions specific to their needs.

12.1.3.7. Versions parallelization

Version and configuration management presented a particular challenge. On the one hand, the applications and projects did not have the same objectives or the same way of managing versions: some projects delivered a common version of the software, but with parameterization or configuration files specific to each customer, which impacted the behavior and the user interfaces. On the other hand, the mode of development and delivery of applications varied, with specific versions (branches) by country or by customer. A rapid merger of the branches was planned in order to manage only one major branch, but as long as this merger was not carried out, several branches coexisted.

Another important element in version management relates to the availability of dedicated environments. In general, there was a production version (which we will call VersionN) in the PROD environment, a version corresponding to the sprint being qualified (VersionN+1) in the SQA environment, and a version in the DEV environment corresponding to the one under development (VersionN+2). When an anomaly was identified in PROD, and in the absence of a dedicated environment, it became necessary to stop the VersionN+1 tests and install VersionN on the SQA environment. VersionN+1 then had to be reinstalled on the SQA environment once the analysis of the anomaly was complete.

For another application, there was a VersionNE in production for Europe, a VersionNE+1 undergoing UAT (User Acceptance Testing) for Europe, a VersionNA+1 version under development and testing for America – a branch containing specificities for America in addition to the specificities of the Europe branch – and developments of a future version VersionNE+2 relating to developments specific to the Europe branch. The number of environments dedicated to testing (SQA environments) was limited for cost reasons, so the testing of each version had to be properly anticipated to make the best use of the environments. We could consider this limited number of environments as bottlenecks in the sense of production management (Goldratt and Cox 2001) and apply the same optimization rules as those proposed.

12.1.3.8. Risk management

Risk management had not been implemented, which had the impact that tests and developments were not properly prioritized, with testers sometimes focusing on points of detail while neglecting more important elements (either for the project or for the product).

It was decided to define a risk calculation method (impacts and frequency) based on five frequency levels, five impact levels and grouping the risks into five categories, each with its level of effort.

| Condition associated with probability | Probability | Estimation | Value |
|---|---|---|---|
| Event will almost certainly happen | 90+% | Very high probability | 1 |
| Event will not happen only under optimistic conditions | 70–89% | High probability | 2 |
| Event might or might not happen under normal circumstances; no evidence one way or the other | 31–69% | Average probability | 3 |
| Event will only happen in pessimistic circumstances | 11–30% | Low probability | 4 |
| Event will only happen in very pessimistic circumstances | 0–10% | Very low probability | 5 |

The impact was estimated according to three axes applicable only to the application or the project (system level, not system-of-systems level), as described below.

| Cost | Features | Schedule | Severity/impact |
|---|---|---|---|
| More than 25% additional cost, seriously out of budget | Inability to deliver the application or system, or to meet acceptance criteria | More than 25% delay or inability to achieve system-of-systems functionality | 1 Critical |
| 10%–25% above budget, well over budget | Inability to achieve a key criterion and no circumvention | 10%–25% late or unable to reach a major milestone | 2 High (major) |
| 5%–10% over budget, limited budget overrun | Inability to meet a criterion and possibility of circumvention | 5%–10% delay without the need for major rescheduling | 3 Average (important) |
| Additional cost of less than 5% but significant impact on the budget | Inability to reach a criterion but negotiable with the customer/user | Less than 5% delay or delay that can be contained | 4 Low (minor) |
| No budget impact | No impact on functionality or performance | No impact on milestones or delivery date | 5 Very low (no impact) |

RPN# is calculated by multiplying the probability value by the severity/impact value, giving values ranging from 1 to 25. These values are grouped into five risk levels, each associated with a level of testing effort and the testing techniques to be implemented, as described in the following table (a small calculation sketch follows the table).

| Risk level | RPN# | Planned effort | Techniques to implement |
|---|---|---|---|
| I | 1–5 | Design and code reviews, static analysis; extensive testing: full feature coverage with multiple tests for each technique | Equivalence partition (EP), boundary value analysis (BVA), path coverage (STT), exploratory tests (ET), automated regression testing |
| II | 6–10 | Design and code reviews, static analysis; extensive tests: coverage of all features with a test for each feature | Equivalence partition (EP), path coverage (STT), exploratory tests (ET), manual regression tests |
| III | 11–15 | Peer reviews or static code analysis; superficial tests: partial coverage of the most important features; no regression testing | Path coverage (STT), exploratory tests (ET) |
| IV | 16–20 | Peer reviews; opportunistic testing: if we get into the feature (for whatever reason), it will be tested; no regression testing | Exploratory tests (ET) |
| V | 21–25 | No reviews; no tests: only report defects; no regression testing | No tests |
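
To make the mechanics concrete, here is a minimal sketch of the RPN calculation and its mapping to the five risk levels of the table above; the function name and the abbreviated effort labels are assumptions made for illustration only.

```python
# RPN calculation as described above: probability value (1-5) x severity value (1-5),
# then bucketed into the five risk levels of the table (effort labels shortened).
RISK_LEVELS = [
    (5,  "I",   "reviews + static analysis, extensive testing, automated regression"),
    (10, "II",  "reviews + static analysis, one test per feature, manual regression"),
    (15, "III", "peer reviews or static analysis, superficial tests, no regression"),
    (20, "IV",  "peer reviews, opportunistic exploratory testing only"),
    (25, "V",   "no reviews, no tests, only report defects"),
]

def risk_level(probability: int, severity: int):
    rpn = probability * severity            # both values range from 1 (most critical) to 5
    for upper_bound, level, effort in RISK_LEVELS:
        if rpn <= upper_bound:
            return rpn, level, effort
    raise ValueError("probability and severity must be between 1 and 5")

print(risk_level(probability=1, severity=2))   # -> (2, 'I', '... extensive testing ...')
```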

12.1.3.9. Shift left

We have previously seen the value of finding defects soon after their introduction. This aspect was also identified here, and the activities of the test teams were extended to reviews of specifications, requirements and code.

For this, it was necessary to:

– Train testers in business needs, which required several targeted training sessions. A solution used to circumvent this was to recall how to implement the essential testing techniques and to define the level of testing effort to be implemented according to the identified risks.

– Train testers in reading programs and code, mainly in order to understand the results produced by code analysis tools.

12.1.3.10. Test conditions and exploratory testing

Some Proof of Concept (POC) type applications required testing, but did not justify the implementation of detailed and/or automated test cases. It was decided to implement:

– a Test Plan clearly defining the test objectives and the proposed approach;

– test conditions like exploratory test charters or agreements describing what was interesting to test from a functional point of view, which test techniques to use and what maximum test duration to consider.

The definition of these test conditions allowed, on other larger projects, the test teams to submit to the business experts what they planned to test, in order to ensure that it corresponded – in terms of priorities and objectives – to what the business wanted to see covered.

12.1.3.11. UAT test scenario

It quickly became necessary to design end-to-end scenarios representing business activities, in order to anticipate the business scenarios that were going to be executed during the acceptance tests. This was intended to group – within one or more scenarios – test cases executed separately, and thus ensure a verification similar to what the acceptance tests were going to validate.

12.1.4. Resources, tools and personnel

12.1.4.1. Environmental resources

In order to implement the testing activities on this system-of-systems, we first needed to test each of the systems separately. As each system was separate and independent from the others, with its own development teams – front and back – and its own user representatives, this required one test environment per system. In general, four separate environments existed for a software product: the DEV environment, the TEST environment – sometimes called SQA – the PPROD environment – sometimes called UAT – and the production environment.

12.1.4.2. Automation and tools

The tests were initially carried out manually – then with an automated test tool – in the TEST environment and then, about a month before production, the “Golden Build” or “Release Candidate” version was pushed to the PPROD environment. This allowed the independence between environments to be controlled, including for the test data. The tests were grouped into four sets:

– manual regression tests;

– automated regression testing;

– testing of new functionalities (evolutions);

– tests of defect corrections.

This solution allowed new test cases to be developed in parallel with the design of new features during the sprint, and then manually executed at the start of the next sprint.

12.1.4.3. Personnel

The main challenge came from the need to have profiles with several qualities simultaneously:

– mastery of test techniques to design and automate functional tests;

– understanding of the business (long to acquire) to identify the combinations of actions that are important for users;

– development skills to identify the design processes that need to be improved, in view of the root causes of the anomalies identified;

– behavioral skills such as the capacity for listening, diplomacy, reflection, synthesis and proposal, in order to be real ambassadors of quality with the teams.

Since a good knowledge of the framework used and a mastery of English were also necessary, it was very difficult to find methodical, curious, rigorous and meticulous staff able both to design and execute manual and automated tests, analyze the results obtained and convey the right messages to the production and management teams.

In order to compensate for potential one-off increases in workload and to avoid loss of information in the event of departures, several people were assigned to each project, one with the role of main contact (assigned at 80% of their workload), the others acting as backups (e.g. assigned at 20% or 50% of their workload). This solution made it possible to continue to provide a service throughout the project, even in the event of occasional absences.

Code reviews and static code analyses were assigned to the development teams, with processes adapted to include these activities. There was therefore a difference between the “role” and the “title”, in that the activity of checking and/or validating the code was carried out by people who did not have the “title” of tester.

12.1.4.4. Continuous improvement

Continuous improvement aspects are extremely important in order to constantly provide greater added value to projects. This involves analyzing each element and asking how a failure identified in production could have been identified during testing, how a task could have been carried out in less time or with more efficiency.

This requires, on the part of the test manager, and also of the testers, constant questioning: “How can we do better?” The following questions will then relate to the added value of the action envisaged compared to the absence of any improvement action: if improving a 2-hour task requires 4 hours of work, it is not worth it. If, on the contrary, the 2-hour task is repeated several times a month, then it may be worth it.

In an Agile framework, during each progress monitoring meeting (Daily Standup or Weekly Planning) as well as more formally during Sprint Retrospectives, these questions should be asked, and solutions proposed. In a sequential environment, continuous improvement must be considered by the test manager in the Activity Control activity.

12.1.5. Deliverables, reporting and documentation

The deliverables generated by the test framework initially related only to the execution of test cases, without providing synthetic information on the level of quality, nor on the trends of detection/correction of anomalies. Also, traceability of test cases from/to requirements and features was not available.

Regarding test case documentation, the use of Gherkin (with Given, When, Then) did not include enough information to allow a good understanding by new testers on the project. It was therefore necessary to add information to the framework to describe the test cases. This activity slowed down execution and required revisiting – with an experienced tester – the test cases to be automated, resulting in a loss of time.

Execution reports previously focused on the test cases executed and the anomalies detected. There was no information related to the features or components developed. The execution reports were therefore improved to arrive at a common model (a minimal sketch of some of these indicators follows the list) covering:

– test execution, with identification of failed test cases, comparison with test execution in the previous sprint;

– coverage of PBIs and functionalities, insofar as the information was available;

– information on the defect backlog, average defect remediation time (often longer than two sprints), percentage of automated test cases and non-automated test cases.
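
As an illustration, here is a minimal sketch of how some of these indicators could be computed from raw execution and defect data; the data shapes and field names are assumptions, not the actual framework’s export format.

```python
# Minimal sketch of the common report model (the data shapes are assumptions;
# only a few of the indicators listed above are computed here).
from statistics import mean
from datetime import date

def sprint_report(results, previous_failed, defects):
    failed = [t for t in results if t["status"] == "failed"]
    fixed = [d for d in defects if d.get("closed")]
    return {
        "failed_tests": len(failed),
        "delta_vs_previous_sprint": len(failed) - previous_failed,
        "defect_backlog": sum(1 for d in defects if not d.get("closed")),
        "mean_fix_time_days": mean((d["closed"] - d["opened"]).days for d in fixed) if fixed else None,
        "automation_rate": sum(1 for t in results if t["automated"]) / len(results),
    }

# Tiny illustrative dataset.
results = [{"status": "passed", "automated": True},
           {"status": "failed", "automated": False}]
defects = [{"opened": date(2023, 1, 2), "closed": date(2023, 2, 6)},
           {"opened": date(2023, 2, 1), "closed": None}]
print(sprint_report(results, previous_failed=3, defects=defects))
```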

Periodic meetings between representatives of the test teams, the project teams and the development teams took place (with a frequency varying from one to three times a week, up to daily meetings) to follow the progress of the tests, corrections and deliveries of new versions, on the one hand, and to take decisions on anomalies (triage meetings), on the other.

12.1.6. Planning and progress

Project schedules and budgets were also impacted by decisions at group level. Among the long-term decisions having a major impact on the system-of-systems, we have:

– The replacement of individual or specialized servers and systems to merge them into a cloud-based architecture. This decision, which we will name FUSIT, aimed to merge the various architectures within a reduced number of cloud systems and to standardize the management of the systems into a single global and international entity.

– The ALIGN project, to replace the existing specialized ERPs with a single ERP integrating all the functionalities of each of the various existing ERPs, such as the tracking of containers, pallets and components, integration with WMS (Warehouse Management Systems), specificities related to certain countries or organizations, etc.

The constraints generated by the FUSIT project focused on access authorizations to test servers and more particularly to API tests and exchange tests between applications. Indeed, the automated testing activities resembled, for the cybersecurity team, the actions that hackers could perform. Opening ports, APIs, and other microservices to be tested required long and tedious administrative requests. In the same vein, the needs for additional servers, to cover the specific needs of different countries (therefore different configurations and behaviors), also required administrative justifications rarely related to the development and deployment deadlines of the systems.

The constraints generated by the ALIGN project focused more on the possible scrapping of systems that needed to be replaced. As the documentation of the systems was not up to date, the specificities of each country created tensions and questions for the team members. Priorities have therefore evolved and varied over time.

12.1.6.1. Planning

Each project was managed separately, with its own sprint durations, functional content and delivery dates. Formal coordination between projects seemed to be lacking, and no project management tools were used. In order to meet deadlines, certain defects identified as minor were not corrected but pushed back from release to release, increasing the backlog of defects on each of the projects (on certain projects the defect backlog was multiplied by more than five in 12 months, going from 36 to 205, while on others it had “only” roughly doubled, from 34 to 62, over the same period). On some projects, the average defect correction time increased from an average of 20 days to more than 75 days in one year, while on others it remained fairly stable (around 24 days, i.e. two sprints).

None of the projects used formal project management tools that would have helped coordinate projects and properly manage resource contention. The workload allocation forecasts were made with tables and spreadsheets, focusing mainly on the allocation of resources according to the allocated budgets and the priorities of the moment. It was more of a follow-up than an anticipation.

The projects mainly used Kanban boards, one per project team, even though several system versions were handled by the same team.

12.1.6.2. Progress

The team consisted of five engineers, only two of whom had advanced experience. Each engineer was specialized in one system, with limited experience of the other systems. In the event of absence or leave, a system could be left without SQA resources. It was therefore decided to assign each engineer to two applications, one where the engineer was assigned most of the time and the other where the engineer was assigned for a shorter weekly period, with smoothing over the month. In addition, certain skills were not evenly distributed within the team, which regularly required engineers to be reassigned to other activities or projects, depending on their knowledge or skills.

As part of the improvement and modernization of the applications, the production teams had to replace the Janus components with Telerik components in the user interfaces. These replacements had not been mentioned to the SQA team, and their impact was only observed during the deployment of an application: all automated tests interfacing with Janus components stopped working. The impact was several hundred automated regression test cases that needed to be modified to replace the Janus components with the correct Telerik components. The workload included reviewing all automated test cases to determine which ones were impacted – presumably all of them – and the level of impact (number of Janus components to replace). Each Janus component had to be identified, its Telerik replacement retrieved and substituted, and the automated test cases re-run to verify that the changes had not introduced any side effects in the test or verification results. This load was added to the normal load planned for a sprint, that is, the design of test cases to cover developments and corrections as well as the execution of regression tests. Obviously, this had a big impact on how several sprints went, especially since the team was already loaded at almost 100%.

In addition to the concerns related to the inability to anticipate significant workloads to respond to technical developments (e.g. Janus replaced by Telerik), it is also necessary to take into account all the maintenance needs of the systems as well as the test loads to be anticipated for aspects of obsolescence of components, applications and systems. In a system-of-systems, these evolutions are extremely frequent and numerous, so they must be planned and anticipated as much as possible.


Figure 12.2 Daily assignment tracking (see www.iste.co.uk/homes/systems2.zip)

No project management software was used; only spreadsheets (Google Sheets and Excel), together with an HR management tool for the engineers’ time tracking sheets, were used for:

– anticipating project activities and expenses over the year;

– monitoring the loads actually used by project (on a day by day basis);

– monitoring the loads used by project (on an hourly basis).

In Figure 12.2, we see a two-week view for the various members of the SQA team on the various projects running simultaneously. We can note that the LIM and CAN projects are planned only for the first quarter while the COB and A7F projects are planned only for the second half (quarters 3 and 4).

The first column (Prio) defines the priority for the organization, so the applications and systems to be staffed in priority.

12.1.6.3. Exchanges and interactions

Daily meetings were established between the members of the SQA team and made it possible to share information on the concerns and problems encountered by each. This allowed the other members of the team to propose solutions or alternatives, based on their experiences, thus allowing a comparison of the effectiveness and relevance of each before a final choice. The role of the manager was to state the problem, listen to the proposed solutions – some more relevant than others – and expose the constraints that could impact one solution or another.

The respect of the engineers towards each other, based on their functional and/or technical knowledge, allowed good cooperation and the implementation of efficient and profitable solutions. It was important for the manager to let the participants express themselves, try solutions, and measure the effectiveness of these solutions before deciding and explaining the reasons for this decision. Some of the manager’s choices were to reassign engineers on the fly to one project and then to another to make the best use of their skills and to allow training for other engineers. It was a way of implementing the CRM technique.

12.1.7. Logistics and campaigns

Since the test workload – mainly manual tests – was significant, the duration of the test campaigns was affected. Some defect corrections, and even some evolutions, were deployed in the test environments even though the campaigns had already started. This way of doing things allowed more flexibility for the DEV team, but introduced regressions that were not identified, because the tests that could have identified them had already been executed and – in the absence of an impact analysis – were not re-executed prior to deployment to UATs.

Formal tracking of the versions of the components delivered and included in a test campaign is therefore an important element in identifying the causes of regressions. Without a detailed impact analysis, the solution is to repeat the tests with each new delivery of a component.

Maintenance: it should be noted that the maintenance of test cases, mainly automated test cases, is a significant load in case of modification of user interfaces.

Stability of environments: any modification in the test environment, whether modification of the OS or modification – correction or evolution – of the code invalidates the results of the tests carried out previously insofar as regressions may appear.

12.1.8. Test techniques

The testing techniques implemented varied according to the level of risk identified, as defined in the risk table presented earlier (see section 12.1.3.8).

The techniques used were:

– equivalence partitioning (EP);

– boundary value analysis (BVA);

– path testing and state and transitions testing (STT);

– exploratory testing (ET) with charters and timeboxes.

The choice of these techniques comes mainly from their ease of understanding. It is possible, when these techniques are correctly mastered, to add other techniques according to the needs and the types of anomalies to be found.

12.1.9. Conclusions and return on experience

Hindsight enables us to identify things that “should” have been done, but in the heat of the moment, we seldom have the luxury of time to decide on a solution. We end up trying to find the “least bad” solution that we can implement given the current constraints. The activities described in this case study, and the conclusions or lessons learned, are those proposed by the author with the limited information known at the time, with the aim of continuously improving the results and not finishing at a dead end. It is evident that other techniques could have been implemented.

12.1.9.1. Comments and documentation

As with development projects, it is important to facilitate understanding of the functional and technical testing environment. This includes documenting test cases and scenarios to save time when designing and maintaining tests, and also to allow better learning of the tests planned for an application. This applies regardless of the framework and language used. It is not reasonable to think that the tests or the scenarios will be clear enough to be self-documenting. This requires conscious effort when designing scenarios and test cases.

The use of frameworks where test cases are linked to requirements (or PBI – Product Backlog Items) is often not sufficient to manage several levels of traceability at the same time (e.g. component, feature in addition to Epic, Feature and User Story), so the ability to add parameterizable fields to components and bugs is a way to increase documentation. These fields can also be used when creating anomaly reports, thus allowing functional traceability in addition to technical traceability.

For long-term systems development, the “documentation” aspect is important, even though the Agile Manifesto values working software over comprehensive documentation. Given the size of systems-of-systems and the complexity of the interactions between the systems that compose them, documentation is critical.

12.1.9.2. Architecture

To correctly identify the interactions between the various components of a complex system or a system-of-systems, it is necessary to understand the architecture of these systems. This understanding makes it possible to identify the components related to each other – which makes it possible to identify possible side effects – and to ensure, during the build, that all the necessary components have been correctly identified and included.

Any evolution of the architecture going towards a more important – or less important – coupling can have an impact on the tests and the modularity of these. It is therefore important to ensure that the architecture documentation is up to date and communicated to the test engineers.

12.1.9.3. Planning and tracking

The dependencies between activities and the large number of impacted activities will create a “snowballing” effect that will lead to constant replanning of the tasks. Similarly, comparison between the initial planning (the very first one) and the revised planning should be retained to ascertain the impact of changes.

It is important to avoid changes in measurement units, and it is necessary to take into account the differences of abilities between experienced and less experienced individuals.

Kanban boards and Burndown charts focus on one project at a time. This can be a problem if an engineer is assigned to several projects simultaneously.

Assignment of one individual to several projects increases the time lost by this person, among other things, for their participation in standup meetings as well as for remembering all the ins and outs of each project. On the contrary, it provides a backup when the primary test actor for that project is unavailable.

12.1.9.4. Training and knowledge sharing

Any project can be affected by the unexpected. The Covid-19 epidemic reminded us of this recently. Therefore, it is necessary to guard against this. One way to do this is to:

– share knowledge between test engineers in order to have several engineers available – in case of need or increased load – on the same project;

– develop for each test engineer the knowledge of test techniques to provide more design patterns for testing and find different types of defects; these test techniques are described in numerous works such as Koomen (2008) and the advanced syllabuses of the ISTQB;

– develop the business knowledge of test engineers in order to provide useful test cases for the business. This is even more important for E2E testing.

12.1.9.5. Defects backlog

It is important not to let the defect backlog grow, among other things to facilitate the monitoring of new anomalies and to avoid having to search the entire backlog to check whether a defect is a duplicate. Limiting the defect backlog also ensures that the defects can all be addressed within a single sprint – a refactoring sprint, for example – or spread over development sprints. The correction of a large number of defects cannot be done within a single sprint and will have to be spread over several sprints; the time to correct defects will increase, and customers and users will conclude that the DEV teams are not focusing on the needs of the business.

12.1.9.6. Acceptance criteria

Differences in understanding between the different actors of the project have shown that it is imperative to have exhaustive acceptance criteria, whether one works in an Agile or in a traditional development mode. It is not up to the test engineer to determine what is or is not valid in the behaviors of a system.

12.1.9.7. Test techniques

Limiting yourself to functional tests via the user interface and hoping to correctly test all messages, services and APIs is wishful thinking: management rules unknown to the test engineers can impact these. It is necessary to perform automated tests of the inter-system exchange mechanisms (e.g. APIs, microservices, message processing, remote processing, etc.).

12.1.9.8. Test reliability and reviews

The results of the test campaigns must be reliable and faithfully represent reality. However, poorly designed test cases can produce false negatives that go unnoticed. As test cases must be able to be read, understood and executed by other engineers, it is necessary that these tests are understandable and reliable. This involves checklists (to ensure checks are done correctly, results are reported correctly, etc.) and peer reviews – carried out between test engineers – to achieve consistency in the writing of tests across systems.

12.1.9.9. Reporting and synthesis

We have different reports depending on the teams:

– The test team mainly focuses on:

- the design of test cases for the evolutions and corrections present in the sprint to be tested;

- the automation of previously defined and validated test cases;

- the execution of manual or automated test cases as soon as possible;

- the creation and follow-up of the anomalies identified during the execution of the tests.

– The DEV team mainly focuses on:

- fixing defects;

- providing evolutions of system functionalities.

– The business teams (Product Experts, Product Owners, etc.) focus on the operation of the system in isolation and then within the framework of the system-of-systems into which it is introduced.

12.1.9.10. Workload estimation

The development of systems is not limited to the creation of the system alone, but must also consider the creation, growth, maintenance and updating of the system, even the disposal and replacement of the system. For a system-of-systems, corrections and maintenance will have long-term impacts and the test load will vary depending on the size of the system, the quality of the documentation and the maintainability of the code.

The use of “industry standard” values will never correspond to the context of our organization, system and applications: the constraints are specific to our domain, and the level of professionalism of our DEV team and our test team is certainly not the same as these pseudo “industry standards”.

12.1.9.11. Environments

Several separate environments should be considered and maintained:

– a DEV environment;

– a system test environment, one per version of the system in the event of different branches being developed simultaneously;

– at least one end-to-end test environment to verify the operation of the developed system within the system-of-systems.
