2 Structure-Based Testing

During the development of any non-trivial program, software structure is almost always created that cannot be determined from top-level software specifications.

—Michael Dale Herring

The second chapter of the Advanced Level Syllabus – Technical Test Analyst is concerned with structure-based (also known as white-box or code-based) test techniques. This chapter of the syllabus contains eight sections:

1. Introduction

2. Condition Testing

3. Decision Condition Testing

4. Modified Condition/Decision Testing (MC/DC)

5. Multiple Condition Testing

6. Path Testing

7. API Testing

8. Selecting a Structure-Based Technique

In addition, we will cover several other topics that we feel are either required to understand these ISTQB mandated topics or extend into closely related topics that we feel a well-prepared technical test analyst should comprehend. Several of these were covered in the Foundation syllabus, but we feel it is important to go into a little more depth on these topics to help you better understand the material in this Advanced syllabus. These additional topics are as follows:

  • Control Flow Testing Theory

  • Statement Coverage Testing

  • Decision Coverage Testing

  • Loop Coverage Testing

2.1 Introduction

Learning objectives

Recall of content only.

Structure-based testing (also known as white-box testing) was briefly discussed in the Foundation level syllabus. Clearly, this is a topic that we must dig into more deeply in this book.

Structure-based testing uses the internal structure of the system as a test basis for deriving dynamic test cases. In other words, we are going to use information about how the system is designed and built to derive our tests.

The question that should come to mind is why. We have all kinds of specification-based (black-box) testing methods to choose from. Why do we need more? We don’t have time or resources to spare for extra testing, do we?

Well, consider a world-class, outstanding system test team using all black-box and experience-based techniques. Suppose they go through all of their testing, using decision tables, state-based tests, boundary analysis, and equivalence classes. They do exploratory and attack-based testing and error guessing and use checklist-based methods. After all that, have they done enough testing? Perhaps for some. But research has shown that, even with all of that testing and all of that effort, they may have missed a few things.

There is a really good possibility that as much as 70 percent of all of the code that makes up the system might never have been executed once! Not once!


Figure 2–1 Advanced syllabus testing techniques

How can that be? Well, a good system is going to have a lot of code that is only there to handle the unusual, exceptional conditions that may occur. The happy path is often fairly straightforward to build—and test. And, if every user were an expert, and no one ever made mistakes, and everyone followed the happy path without deviation, we would not need to worry so much about testing the rest. If systems did not sometimes go down, and networks sometimes fail, and databases get busy and stuff didn’t happen …

But, unfortunately, many people are novices at using software, and even experts forget things. And people do make mistakes and multi-strike the keys and look away at the wrong time. And virtually no one follows only the happy path without stepping off it occasionally. Stuff happens. And, the software must be written so that when weird stuff happens, it does not roll over and die.

To handle these less likely conditions, developers design systems and write code to survive the bad stuff. That makes most systems convoluted and complex. Levels of complexity are placed on top of levels of complexity; the resulting system is usually hard to test well. We have to be able to look inside so we can test all of that complexity.

In addition, black-box testing is predicated on having models that expose the behaviors and list all requirements. Unfortunately, no matter how complete those models are, not all behaviors and requirements are going to be visible to the testers. Requirements are often changed on the fly, features added, changed, or removed. Functionality often requires the developers to build “helper” functionality to be able to deliver features. Internal data flows that have asynchronous timing triggers often occur between hidden devices, invisible to black-box testers. Finally, malicious code—since it was probably not planned for in the specifications—will not be detected using black-box techniques. We must look under the covers to find it.

Much of white-box testing is involved with coverage, making sure we have tested everything we need to based on the context of project needs. Using white-box testing on top of black-box testing allows us to measure the coverage we got and add more testing when needed to make sure we have tested all of the important complexity we should. In this chapter, we are going to discuss how to design, create, and execute white-box testing.

2.1.1 Control Flow Testing Theory

Our first step into structural testing will be to discuss a technique called control flow testing. Control flow testing is done through control flow graphs, which provide a way to abstract a code module to better understand what it does. Control flow graphs give us a visual representation of the structure of the code. The algorithm for all control flow testing consists of converting a section of the code into a control graph and then analyzing the possible paths through the graph. There are a variety of techniques that we can apply to decide just how thoroughly we want to test the code. Then we can create test cases to test to that chosen level.

If there are different levels of control flow testing we can choose, we need to come up with a differentiator that helps us decide which level of coverage to choose. How much should we test? Possible answers range from no testing at all (a complete laissez-faire approach) to total exhaustive testing, hitting every possible path through the software. The end points are actually a little silly; no one is going to (intentionally) build a system and send it out without running it at least once. At the other end of the spectrum, exhaustive testing would require a near infinite amount of time and resources. In the middle, however, there is a wide range of coverage that is possible.

We will look at different reasons for testing to some of these different levels of coverage later in this chapter. In this section, we just want to discuss what these levels of control flow coverage are named.

At the low end of control flow testing, we have statement coverage. Common industry synonyms for statement are instruction and code. They all mean the same thing: have we exercised, at one time or another, every single line of executable code in the system? That would give us 100 percent statement coverage. It is possible to test less than that; people not doing white-box testing do it all the time, but they just don't measure it. Remember, thorough black-box testing without doing any white-box testing may total less than 30 percent statement coverage, a number often discussed at conferences urging more white-box testing.

To calculate the total statement coverage actually achieved, divide the number of statements that have actually been exercised by your testing by the total number of executable statements. For example, if there are 1,000 total executable statements and you have executed 500 of them in your testing, you have achieved 50 percent statement coverage. This same calculation may be done for each different level of coverage we discuss in this chapter.
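Expressed as a formula, the calculation is simply the executed statements divided by the total executable statements, times 100. As a trivial sketch in C (the function name is ours, not from any coverage tool):

double statement_coverage_pct(int statements_executed, int statements_total)
{
    /* e.g., statement_coverage_pct(500, 1000) returns 50.0 */
    return 100.0 * (double)statements_executed / (double)statements_total;
}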

The next step up in control flow testing is called decision (or branch) coverage. This is determined by the total number of all of the outcomes of all of the decisions in the code that we have exercised. We will honor the ISTQB decision to treat branch testing and decision testing as synonyms. There are very slight differences between decision testing and branch testing, but those differences are insignificant at the level at which we will examine them.1

Then we have condition coverage, where we ensure that we evaluate each atomic condition that makes up a decision at least once, and multiple condition coverage, where we test all possible combinations of outcomes for individual atomic conditions inside all decisions.

We will add a level of coverage called loop coverage, not discussed by ISTQB but we think interesting anyway.

If this sounds complex, well, it is a bit. In the upcoming pages, we will explain what all of these terms and all the concepts mean. It is not as confusing as it might be—we just need to take it one step at a time and all will be clear. Each successive technique essentially adds more coverage than could be achieved by the previous technique.

2.1.2 Building Control Flow Graphs

Before we can discuss control flow testing, we must define how to create a control flow graph. In the next few paragraphs, we will discuss the individual pieces that make up control flow graphs.


Figure 2–2 The process block

In Figure 2–2, we see the process block. Graphically, it consists of a node (bubble or circle) with one path leading to it and one path leading from it. Essentially, this represents a chunk of code that executes sequentially: that is, no decisions are made inside of it. The flow of execution reaches the process block, executes through that block of code in exactly the same way each time, and then exits, going elsewhere.

This concept is essential to understanding control flow testing. Decisions are the most significant part of the control flow concept; individual lines of code where no decisions are made do not affect the control flow and thus can be effectively ignored. The process block has no decisions made inside it. Whether the process block has one line or a million lines of code, we only need one test to execute it completely. The first line of code executes, the second executes, the third … right up to the millionth line of code. Excepting the possibility of an error (e.g., differing data may cause a divide by zero), there is no deviation no matter how many different test cases are run. Entry is at the top, exit is at the bottom, and every line of code executes every time.


Figure 2–3 The junction point

The second structure we need to discuss is called a junction point, seen in Figure 2–3. This structure may have any number of different paths leading into it, but only one path leading out. No matter how many different paths we have throughout a module, eventually they must converge—or exit the module. Again, no decisions are made in this block; all roads lead to it with only one road out.


Figure 2–4 Two decision points

A decision point is very important; indeed, it’s key to the concept of control flow. A decision point is represented as a node with one input and two or more possible outputs. Its name describes the action inside the node: a decision as to which way to exit is made and control flow continues out that path while ignoring all of the other possible choices. In Figure 2–4, we see two decision points: one with two outputs, one with five outputs.

How is the choice of output path made? Each programming language has a number of different ways of making decisions. In each case, a logical decision, based on comparing specific values, is made. We will discuss these later; for now, it is sufficient to say a decision is made and control flow continues one way and not others.

Note that these decision points force us to have multiple tests, at least one test for each way to make the decision differently, changing the way we traverse the code. In a very real way, it is the decisions that a computer can make that make it interesting, useful, and complex to test.

The next step is to combine these three relatively simple structures into useful control flow graphs.


Figure 2–5 A simple control flow graph

In Figure 2–5, we have a small module of code. Written in C, this module consists of a routine to calculate a factorial. Just to refresh your memory, a factorial is the product of all positive integers less than or equal to n and is designated mathematically as n!. 0! is a special case that is explicitly defined to be equal to 1. The following are the first three factorials:

1! = 1

2! = 1 * 2 ==> 2

3! = 1 * 2 * 3 ==> 6   etc.
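Figure 2–5 itself is not reproduced here. As a rough guide to the walkthrough that follows, a factorial routine with the same shape (one if/else decision plus one for loop) might look like the following C sketch; the line numbers cited below refer to the original figure, not to this sketch.

#include <stdio.h>

long factorial(int n)
{
    int i;
    long f;

    if (n < 0) {                   /* the decision the text describes at line 7       */
        printf("Error: n must be non-negative\n");
        f = -1;                    /* then-branch: report the error and flag it       */
    } else {
        f = 1;                     /* else-branch: initialize the accumulator         */
        for (i = 1; i <= n; i++) { /* the loop decision the text describes at line 12 */
            f = f * i;             /* multiply up the factorial                       */
        }
    }
    return f;                      /* both paths converge here; the function ends     */
}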

To create a control flow diagram from this code, we do the following:

1. The top process block contains the code up to and including line 6. Note there are no decisions, so all lines go into the same process block. By combining multiple lines of code where there is no decision made into a single process block, we simplify the graph. Note that we could conceivably have drawn a separate process block for each line of code.

2. At line 7 there is a reserved word in the C language: if. This denotes that the code is going to make a decision. Note that the decision can go one of two ways, but not both. If the decision resolves to TRUE, the left branch is taken and lines 8 and 9 are executed and then the thread jumps to line 17.

3. On the other hand, if the decision at line 7 resolves to FALSE, the thread jumps to line 10 to execute the else clause.

4. Line 11 sets a value and then falls through to line 12 where another decision is made. This line has the reserved word for, which means we may or may not loop.

5. At line 12, a decision is made in the for statement itself, using the second expression inside the parentheses: (i <= n). If this evaluates to TRUE, the loop will fire, causing the code to go to line 13, where it calculates the value of f and then goes right back to the loop at line 12.

6. If the for loop evaluates to FALSE, the thread falls through to line 15 and thence to line 17.

7. Once the thread gets to line 17, the function ends at line 18.

To review the control flow graph: there are six process blocks, two decision blocks (lines 7 and 12), and two junction points (lines 12 and 17).

2.1.3 Statement Coverage

Now that we have discussed control flow graphing, let’s take a look at the first level of coverage we mentioned earlier, statement coverage (also called instruction or code coverage). The concept is relatively easy to understand: executable statements are the basis for test design selection. To achieve statement coverage, we pick test data that forces the thread of execution to go through each line of code that the system contains.

The first edition of this book was challenged by some who did not believe that statement coverage is a control flow design technique. We have decided that for logical consistency, and for the understanding of those who are just learning this subject, we will discuss statement coverage under control flow testing whether or not it strictly belongs here.

To calculate the current level of statement coverage that we have attained, divide the number of code statements that have been executed during the testing by the number of code statements in the entire module/system; if the quotient is equal to 1, we have 100 percent statement coverage.

The bug hypothesis is pretty much as expected; bugs can lurk in code that has not been executed.

Statement-level coverage is considered the least effective of all control flow techniques. In order to reach this modest level of coverage, a tester must come up with enough test cases to force every line to execute at least once. While this does not sound too onerous, it must be done purposefully. One good practice for an advanced technical test analyst is to make sure the developers they work with are familiar with the concepts we are discussing here. Achieving (at least) statement coverage can (and should) be done while unit testing.

There is certainly no guarantee that even college-educated developers learned how important this coverage level is. Jamie had an undergraduate computer science major and earned a master’s degree in computer science from reputable colleges and yet was never exposed to any of this information until after graduation, when he started reading test books and attending conferences. Rex can attest that the concepts of white-box coverage—indeed, test coverage in general—were not discussed when he got his degree in computer science and engineering at UCLA.

The Institute of Electrical and Electronics Engineers (IEEE), in the unit test standard ANSI 87B (1987), stated that statement coverage was the minimum level of coverage that should be acceptable. Boris Beizer, in his seminal work, Software Testing Techniques (ITC Press 1990), has a slightly more inflammatory take on it: “… testing less than this for new software is unconscionable and should be criminalized.”

Also from that book, Beizer lays out some rules of common sense:

1. Not testing a piece of code leaves a residue of bugs in the program in proportion to the size of the untested code and the probability of bugs.

2. The high-probability paths are always thoroughly tested if only to demonstrate that the system works properly. If you have to leave some code untested at the unit level, it is more rational to leave the normal, high-probability paths untested, because someone else is sure to exercise them during integration testing or system testing.

3. Logic errors and fuzzy thinking are inversely proportional to the probability of the path’s execution.

4. The subjective probability of executing a path as seen by the routine’s designer and its objective execution probability are far apart. Only analysis can reveal the probability of a path, and most programmers’ intuition with regard to path probabilities is miserable.

5. The subjective evaluation of the importance of a code segment as judged by its programmer is biased by aesthetic sense, ego, and familiarity. Elegant code might be heavily tested to demonstrate its elegance or to defend the concept, whereas straightforward code might be given cursory testing because “How could anything go wrong with that?”

Have we convinced you that this is important? Let’s look at how to do it.


Figure 2–6 Statement coverage example

Looking at the same code and control flow graph we looked at earlier, Figure 2–6 shows the execution of our factorial function when an input value less than zero is tested. The gray arrows show the path taken through the code with this test. Note that we have covered some of the lines of code, but not all. Based on the graph, we still have a way to go to reach statement coverage.

Following the black arrows, we see the result of testing with a value greater than 0. At this point, the if in line 7 resolves to FALSE and so we go down the untested branch. Because the input value is at least 1, the loop will fire at least once (more times if the input is greater than 1). When i increments and becomes larger than n (due to the third phrase in the parentheses, i++), the loop will stop and the thread of execution will fall through to line 15, line 17, and out.

Note that you do not really need the control flow graph to calculate this coverage; you could simply look at the code and see the same thing. However, for many people, it is much easier to read the graph than the code.

At this point, we have achieved statement coverage. If you are unsure, go back and look at Figure 2–6 again. Every node (and line of code) has been covered by at least one arrow. We have run two test cases (n<0, n>0). At this point, we might ask ourselves if we are done testing. Doing more testing than we need wastes resources and time. While it might be wonderful to always test enough that no bugs could possibly remain, doing so (even if it were possible) would make software so expensive that no one could afford it.

Tools are available to determine if your testing has achieved 100 percent statement coverage. Coverage tools were introduced in the ISTQB Foundation level syllabus.

The real question we should ask is, Did we test enough for the context of our project? And the answer, of course, is that it depends on the project. Doing less testing than necessary will increase the risk of really bad things happening to the system in production. Every test group walks a balance beam in making the “do we need more testing?” decision.

Maybe we need to see where more coverage might come from and why we might need it. Possibly we can do a little more testing and get a lot better coverage.

Figure 2–7 shows a really simple piece of code and a matching control flow graph. Test case 1 is represented by the gray arrows; the result of running this single test with the given values is full 100 percent statement coverage. Notice that on line 2 we have an if statement that compares a to b. Since the input values (a = 3, b = 2) make this decision evaluate to TRUE, line 3 (z = 12) will be executed, and the code will fall out of the if statement to line 4, where the final line of code will execute and set the variable Rep to 6 by dividing 72 by 12.


Figure 2–7 Where statement coverage fails
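The code of Figure 2–7 is likewise not reproduced here; a sketch consistent with the surrounding description (the names z and Rep come from the text, while the wrapper function is our assumption) would be:

int divide_example(int a, int b)
{
    int z = 0;       /* z starts at zero                                    */
    int Rep;

    if (a > b)       /* the decision on line 2                              */
        z = 12;      /* line 3: executed only when a > b                    */
    Rep = 72 / z;    /* line 4: divides by zero when the branch was skipped */
    return Rep;
}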

A few moments ago, we asked if statement coverage was enough testing. Here, we can see a potential problem with stopping at only statement coverage. Notice that if we pass in the values shown as test 2 (a = 2, b = 3), we have a different decision made at the conditional on line 2. In this case, a is not greater than b (i.e., the decision resolves to FALSE) and line 3 is not executed. No big deal, right? We still fall out of the if to line 4. There, the calculation 72/z is performed, exactly the way we would expect. However, there is a nasty little surprise at this point. Since z was not reset to 12 inside the branch, it remained set to 0. Therefore, expanding the calculation performed on line 4, we have 72 divided by 0.

Oops!

You might remember from elementary school that anything divided by 0 is not defined. Seriously, in our mathematics, it is simply not allowed. So what should the computer do at this point? If there is an exception handler somewhere in the execution stack, it should fire, unwinding all of the calculations that were made since it was set. If there is no exception handler, the system may crash—hard! The processor that our system is running on likely has a “hard stop” built into its micro-code for this purpose. In reality, most operating systems are not going to let the CPU crash with a divide by zero failure—but the OS will likely make the offending application disappear like a bad dream.

But, you might say, we had statement coverage when we tested! This just isn’t fair! As we noted earlier, statement coverage by itself is a bare minimum coverage level that is almost guaranteed to miss defects. We need to look at another, higher level of coverage that can catch this particular kind of defect.

2.1.4 Decision Coverage

The next strongest level of structural coverage is called decision (or branch) coverage.

Rather than looking at individual statements, this level of coverage looks at the decisions themselves. Every decision has the possibility of being resolved as either TRUE or FALSE. No other possibilities. Binary results, TRUE or FALSE. For those who point out that the switch statement can make more than two decisions, well, conceptually that appears to be true. The switch statement is a complex set of decisions, often built as a table by the compiler. The generated code, however, is really just a number of binary compares that continue sequentially until either a match is found or the default condition is reached. Each atomic decision in the switch statement is still a comparison between two values that evaluates either TRUE or FALSE.
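To make that concrete, here is a sketch (our own example, not one from the syllabus) of a C switch statement alongside the chain of binary decisions it is logically equivalent to:

int compute_bonus(char grade)
{
    int bonus;
    switch (grade) {                  /* looks like a multi-way decision ...    */
    case 'A': bonus = 3; break;
    case 'B': bonus = 2; break;
    default:  bonus = 0; break;
    }
    return bonus;
}

int compute_bonus_as_ifs(char grade)  /* ... but is logically a chain of binary compares */
{
    if (grade == 'A')
        return 3;
    else if (grade == 'B')
        return 2;
    else
        return 0;
}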

To get to the 100 percent decision level of coverage, every decision made by the code must be resolved both ways at one time or another, TRUE and FALSE. That means—at minimum—two test cases must be run, one with data that causes the evaluation to resolve TRUE and a separate test case where the data causes the decision to resolve FALSE. If you omit one test or the other, then you do not achieve 100 percent decision coverage.

Our bug hypothesis was proved out in Figure 2–7. An untested branch may leave a land mine in the code that can cause a failure even though every line was executed at least once. The example we went over may seem too simplistic to be true, but this is exactly what often happens. The failure to set (or reset) a value in the conditional causes a problem later on in the code.

Note that if we execute each decision both ways, giving us decision coverage, it guarantees statement coverage in the same code. Therefore, decision coverage is said to be stronger than statement coverage. Statement coverage, as the weakest coverage, does not guarantee anything beyond each statement being executed at least once.

As before, we could calculate the exact level of decision coverage by dividing the number of decision outcomes tested by the total number of decision outcomes in the module/system. For this book, we are going to speak as if we always want to achieve full—that is 100 percent—decision coverage. In real life, your mileage might vary. There are tools available to measure decision coverage. However, we have found that some of the tools that claim to calculate decision coverage do not work correctly; you should verify the correctness of the tool you are using if it matters to your organization.

Having defined this higher level of coverage, let’s dig into it.

The ability to take one path rather than another is what really makes a computer powerful. Each computer program likely makes millions of decisions a minute. But exactly what is a decision?

As noted earlier, each predicate eventually must resolve to one of two values, TRUE or FALSE. As seen in our code, this might be a really simple expression: if (a > b) or if (n < 0). However, we often need to consider more complex expressions in a decision. In fact, a decision can be arbitrarily complex, as long as it eventually resolves to either TRUE or FALSE. The following is a legal expression that resolves to a single Boolean value:

(a>b) || (x+y==-1) && ((d) != TRUE)

This expression has three subexpressions in it, joined by the Boolean operators || and &&. The first is an OR operator and the second is an AND operator. In C, the AND operator has higher precedence than the OR operator, so the expression is grouped as (a>b) || ((x+y==-1) && ((d)!=TRUE)); only parentheses would change that grouping. The operands themselves are evaluated left to right at runtime. First, (a>b) will evaluate to either TRUE or FALSE based on the actual runtime values. Next, the subexpression (x+y==-1) will evaluate to either TRUE or FALSE based on the runtime values, and its result will be ANDed with the result of the last subexpression, ((d)!=TRUE). Finally, that AND result is ORed with the value of (a>b) to resolve the entire predicate.2

While this expression appears to be ugly, the compiler will evaluate the entire predicate—step-by-step—to inform us whether it is a legal expression or not. This actually points out something every developer should do: write code that is understandable! Frankly, we would not insult the term understandable by claiming this expression is!
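A quick way to convince yourself of that grouping is a small sketch using arbitrary values of our own choosing; the implicitly and explicitly parenthesized forms always agree:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    int a = 5, b = 3, x = 0, y = -1;
    bool d = false;

    /* && binds tighter than ||, so the first line groups exactly like
       the explicitly parenthesized second line.                        */
    bool implicit = (a > b) || (x + y == -1) && ((d) != true);
    bool grouped  = (a > b) || ((x + y == -1) && ((d) != true));

    printf("%d %d\n", implicit, grouped);   /* prints "1 1"             */
    return 0;
}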

There are a variety of different decisions that a programming language can make.

Table 2–1 C language decision constructs


Here, in Table 2–1, you can see a short list of the decision constructs that the programming language C uses. Other languages (including C++, C#, and Java) use similar constructs. The commonality between all of these (and all of the other decision constructs in all imperative programming languages) is that each makes a decision that can go only two ways: TRUE or FALSE.


Figure 2–8 Decision coverage example

Looking back at our original example, with just two test cases we did not achieve decision coverage, even though we did attain statement coverage. We never had a test in which the for loop was reached but did not execute even once. Remember, a for loop evaluates an expression and decides whether to loop or not to loop based on the result.

In order to get decision coverage, we need to test with the value 0 inputted, as shown in Figure 2–8. When 0 is entered, the first decision evaluates to FALSE, so we take the else path. At line 12, the predicate (is 1 less than or equal to 0) evaluates to FALSE, so the loop is not taken. Recall that earlier, we had tested with a value greater than 0, which did cause the loop to execute. Now we have achieved decision coverage; it took three test cases.

At the risk of being pedantic: when the test input was greater than 0, causing the loop to fire, the loop did eventually run down and stop executing. When discussing a loop, there will always be a time that the loop ends and falls out (or else we have an infinite loop, which brings other problems). That exit after the final iteration does not count as not taking the loop. We need a test that never takes the loop at all to achieve decision coverage.

That brings us back to the same old question. Having tested to the decision level of coverage, have we done enough testing?

Consider the loop itself. We executed it zero times when we inputted 0. We executed it an indeterminate number of times when we inputted a value greater than zero. Is that sufficient testing?

Well, not surprisingly, there is another level of coverage called loop coverage. We will look at that next.

2.1.5 Loop Coverage

While not discussed in the ISTQB Advanced syllabus, loop testing should be considered an important control flow, structural test technique. If we want to completely test a loop, we would need to test it zero times (i.e., did not loop), one time, two times, three times, all the way up to n times where it hits the maximum it will ever loop. Bring your lunch; it might be an infinite wait!

Like other exhaustive testing techniques, full loop testing does not provide a reasonable amount of coverage. Different theories have been offered that try to prescribe how much testing we should give a loop. The basic minimum tends to be two tests, zero times through the loop and one time through the loop (giving us decision coverage). Others add a third test, multiple times through the loop, although they do not specify how many times. We prefer a stronger standard; we suggest that you try to test the loop zero and one time and then test the maximum number of times it is expected to cycle (if you know how many times that is likely to be). In a few moments, we will discuss Boris Beizer's standard, which is even more stringent.

We should be clear about loop testing. Some loops could execute an infinite number of times; each time through the loop creates one or more extra paths that could be tested. In the Foundation syllabus, a basic principle of testing was stated: “Exhaustive testing is impossible.” Loops, and the possible infinite variations they can lead to, are the main reason exhaustive testing is impossible. Since we cannot test all of the ways loops will execute, we want to be pragmatic here, coming up with a level of coverage that gives us the most information in the least amount of tests. Our bug hypothesis is pretty clear; failing to test the loop (especially when it does not loop even once) may cause bugs to be shipped to production.

In the next few paragraphs, we will discuss how to get loop coverage in our factorial example. Here in Figure 2–9, following the gray arrows, we satisfy the first leg of loop coverage, zero iterations. Entering a zero, as we discussed earlier, causes the loop to be skipped without executing even once. In line 12, the for loop condition evaluates as (1 is less than or equal to 0), or FALSE. This causes us to drop out of the loop without executing it.


Figure 2–9 Loop coverage 0 and 1 times through

By entering a 1 (following the black arrows), we get the loop to execute just once. Upon entering line 12, the condition evaluates to (1 is less than or equal to 1), or TRUE. The loop executes and i is incremented. After the loop (line 13) is executed, the condition is evaluated again, (2 is less than or equal to 1), or FALSE. This causes us to drop out of the loop.

Zero and one time through the loop are relatively easy to come up with. How about the maximum times through? Each loop is likely to have a different way of figuring that out, and for some loops, it will be impossible to determine.

In this code, we have a monotonically increasing value which makes it easy to calculate the greatest number of loops possible. The maximum size of the data type used in the collector variable, f, will allow us to calculate the maximum number of iterations. We need to compare the size of the calculated factorial against the maximum integer size (the actual data type would have to be confirmed in the code).

In Table 2–2, we show a table with factorial values.

Table 2–2 Factorial values

n     n!
0     1
1     1
2     2
3     6
4     24
5     120
6     720
7     5,040
8     40,320
9     362,880
10    3,628,800
11    39,916,800
12    479,001,600
13    6,227,020,800

Assuming a signed 32-bit integer being used to hold the calculation, the maximum value that can be stored is 2,147,483,647. An input of 12 should give us a value of 479,001,600. An input value of 13 would cause an overflow (6,227,020,800). If the programmer used an unsigned 32-bit integer with a maximum size of 4,294,967,295, notice that the same number of iterations would be allowed; an input of 13 would still cause an overflow.
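A quick way to confirm that boundary is the following sketch of ours; it uses 64-bit arithmetic for the check itself so the overflow is visible rather than silent:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    long long f = 1;
    for (int n = 1; n <= 13; n++) {
        f *= n;
        printf("%2d! = %13lld %s\n", n, f,
               f > INT_MAX ? "<- would overflow a signed 32-bit int" : "");
    }
    return 0;
}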

In Figure 2–10, we would go ahead and test the input of 12 and check the expected output of the function.


Figure 2–10 Loop coverage max times through

It should be noted that our rule for loop coverage does not deal comprehensively with negative testing. Remember that negative testing is checking invalid values to make sure the failure they cause is graceful, giving us a meaningful error message and allowing us to recover from the error. We probably want to test the value 13 to make sure the overflow is handled gracefully. This is consistent with the concept of boundary value testing discussed in the ISTQB Foundation level syllabus.

Boris Beizer, in his book Software Testing Techniques, had suggestions for how to test loops extensively. Note that he is essentially covering the three-point boundary values of the loop variable with a few extra tests thrown in.

1. If possible, test a value that is one less than the expected minimum value the loop can take. For example, if we expect to loop with the control variable going from 0 to 100, try -1 and see what happens.

2. Try the minimum number of iterations—usually zero iterations. Occasionally, there will be a positive number as the minimum.

3. Try one more than the minimum number.

4. Try to loop once (this test case may be redundant and should be omitted if it is).

5. Try to loop twice (this might also be redundant).

6. A typical value. Beizer always believes in testing what he often calls a nominal value. Note that the number of loops, from one to max, is actually an equivalence set. The nominal value is often an extra test that we tend to leave out.

7. One less than the maximum value.

8. The maximum number of iterations.

9. One more than the maximum value.

The most important differentiation between Beizer’s guidelines and our loop coverage described earlier is that he advocates negative testing of loops. Frankly, it is hard to argue against this thoroughness. Time and resources, of course, are always factors that we must consider when testing. We would argue that his guidelines are useful and should be considered when testing mission-critical or safety-critical software.

Finally, one of the banes of testing loops is when they are nested inside each other. We will end this topic with Beizer’s advice for reducing the number of tests when dealing with nested loops.

1. Starting at the innermost loop, set all outer loops to minimum iteration setting.

2. Test the boundary values for the innermost loop as discussed earlier.

3. If you have done the outermost loop already, go to step 5.

4. Continue outward, one loop at a time until you have tested all loops.

5. Test the boundaries of all loops simultaneously. That is, set all to 0 loops, 1 loop, maximum loops, 1 more than maximum loops.

Beizer points out that practicality may not allow hitting each one of these values at the same time for step 5. As guidelines go, however, these will ensure pretty good testing of looping structures.
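As an illustration of those steps, consider the following sketch (the run() helper, the loop bounds, and the specific values are our own hypothetical choices; the comments map the calls to Beizer's steps):

#include <stdio.h>

static void run(int outer_count, int inner_count)
{
    for (int i = 0; i < outer_count; i++) {        /* outer loop */
        for (int j = 0; j < inner_count; j++) {    /* inner loop */
            printf("i=%d j=%d\n", i, j);
        }
    }
}

int main(void)
{
    /* Steps 1-2: hold the outer loop at the minimum that still reaches
       the inner loop, and drive the inner loop through its boundaries. */
    run(1, 0); run(1, 1); run(1, 2); run(1, 5);

    /* Step 4: work outward; hold the inner loop at a minimum and
       drive the outer loop through its boundaries.                    */
    run(0, 1); run(2, 1); run(5, 1);

    /* Step 5: test the boundaries of all loops simultaneously.        */
    run(0, 0); run(1, 1); run(5, 5);
    return 0;
}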

2.1.6 Hexadecimal Converter Exercise

In Figure 2–11, you’ll find a C program, targeted for a UNIX platform, that accepts a string with hexadecimal characters (among other unwanted characters). It ignores the other characters and converts the hexadecimal characters to a numeric representation. If a Ctrl-C is inputted, the last digit that was converted is removed from the buffer. Please note that we removed the #includes to simplify the code; clearly those would need to be added to actually compile this for the UNIX platform.

If you test with the input strings “24ABd690BBCcc” and “ABCdef1234567890”, what level of coverage will you achieve?

What input strings could you add to achieve statement and branch coverage? Would those be sufficient for testing this program?

The answers are shown in the next section.


Figure 2–11 Hexadecimal converter code

2.1.7 Hexadecimal Converter Exercise Debrief

The strings “24ABd690BBCcc” and “ABCdef1234567890” do not achieve any specified level of coverage that we have discussed in this book.

To get statement and decision coverage, you would need to add the following:

  • At least one string containing a signal (Ctrl-C) to execute the signal() and pophdigit() code

  • At least one string containing a non-hex digit

  • At least one empty string to cause the while loop on line 17 to not loop

To get loop coverage, we would have to add this:

  • One test that inputs the maximum number of hex digits that can be stored (based on the datatype of nhex)

There are also some interesting test cases based on placement that we would likely run. In at least one of these we would expect to find a bug (hint, hint):

  • A string containing only a signal (Ctrl-C)

  • A string containing a signal at the first position

  • A string with more hex digits than can be stored

  • A string with more signals than hex digits

2.2 Condition Coverage

Learning objectives

TTA-2.2.1 (K2) Understand how to achieve condition coverage and why it may be less rigorous testing than decision coverage.

So far we have looked at statement coverage (have we exercised every line of code?), decision coverage (have we exercised each decision both ways?), and loop coverage (have we exercised the loop enough?).

As Ron Popeil, the purveyor of many a fancy gadget in all-night infomercials, used to say, “But wait! There’s more!”

While we have exercised the decision both ways, we have yet to discuss how the decision is made. Earlier, when we discussed decisions, you saw that they could be relatively simple, such as

A > B

or arbitrarily complex, such as

(a>b) || (x+y==-1) && ((d) != TRUE)

If we input data to force the entire condition to TRUE, then we go one way; force it to FALSE and we go the other way. At this point, we have achieved decision coverage. But is that enough testing?

Could there be bugs hiding in the evaluation of the condition itself? The answer, of course, is a resounding yes! And, if we know there might be bugs there, we might want to test more.

Our next level of coverage is called condition coverage. The basic concept is that, when a decision is made by a complex expression that eventually evaluates to TRUE or FALSE, we want to make sure each atomic condition is tested both ways, TRUE and FALSE.

Here is an alternate definition, which we find slightly easier to understand:

An atomic condition is defined as “the simplest form of code that can result in a TRUE or FALSE outcome.”3

Our bug hypothesis is that defects may lurk in untested atomic conditions, even though the full decision has been tested both ways. As always, test data must be selected to ensure that each atomic condition actually be forced TRUE and FALSE at one time or another. Clearly, the more complex a decision expression is, the more test cases we will need to execute to achieve this level of coverage.

Once again, we could evaluate this coverage for values less than 100 percent, by dividing the number of Boolean operand values executed by the total number of Boolean operand values there are. But we won’t. Condition coverage, for this book, will only be discussed as 100 percent coverage.

Note an interesting corollary to this coverage. Decision and condition coverage will always be exactly the same when all decisions are made by simple atomic expressions. For example, for the conditional, if (a == 1) {}, condition coverage will be identical to decision coverage.

Here, we’ll look at some examples of complex expressions, all of which evaluate to TRUE or FALSE.

x

The first, shown above, has a single Boolean variable, x, which might evaluate to TRUE or FALSE. Note that in some languages, it does not even have to be a Boolean variable. For example, if this were the C language, it could be an integer, because any non-zero value constitutes a TRUE value and zero is deemed to be FALSE. The important thing to notice is that it is an atomic condition and it is also the entire expression.

D && F

The second, shown above, has two atomic conditions, D and F, which are combined together by the AND operator to determine the value of the entire expression.

(A || B) && (C == D)

The third is a little tricky. In this case, A and B are both atomic conditions, which are combined together by the OR operator to calculate a value for the subexpression (A || B). Because A and B are both atomic conditions, the subexpression (A || B) cannot be an atomic condition. However, (C == D) is an atomic condition because it cannot be broken down any further. That makes a total of three atomic conditions: A, B, (C == D).

(a>b)||(x+y==-1)&&((d)!=TRUE)

In the last complex expression, shown above, there are again three atomic conditions. The first, (a>b), is an atomic condition that cannot be broken down further. The second, (x+y == -1), is an atomic condition following the same rule. In the last subexpression, (d!=TRUE) is the atomic condition.

Just for the record, if we were to see the last expression in actual code during a code review, we would jump all over the expression with both feet. Unfortunately, it is an example of the way some people program.

In each of the preceding examples, we would need to come up with sufficient test cases that each of these atomic conditions was tested where they evaluated both TRUE and FALSE. Should we test to that extent, we would achieve 100 percent condition coverage. That means the following:

  • x, D, F, A, and B would each need a test case where it evaluated to TRUE and one where it evaluated to FALSE.

  • (C==D) needs two test cases.

  • (a>b) needs two test cases.

  • (x+y==-1) needs two test cases.

  • ((d)!=TRUE) needs two test cases.

Surely that must be enough testing. Maybe, maybe not. Consider the following pseudo code:

if (A && B) then
    {Do something}
else
    {Do something else}

In order to achieve condition coverage, we need to ensure that each atomic condition goes both TRUE and FALSE in at least one test case each.

Test 1: A == FALSE, B == TRUE resolves to FALSE

Test 2: A == TRUE, B == FALSE resolves to FALSE

Assume that our first test case has the values inputted to make A equal to FALSE and B equal to TRUE. That makes the entire expression evaluate to FALSE, so we execute the else clause. For our second test, we reverse that so A is set to TRUE and B is set to FALSE. That evaluates to FALSE so we again execute the else clause.

Do we now have condition coverage? A was set both TRUE and FALSE, as was B. Sure, we have achieved condition coverage. But ask yourself, do we have decision coverage? The answer is no! At no time, in either test case, did we force the predicate to resolve to TRUE. Condition coverage does not automatically guarantee decision coverage.

That is an important point. We made the distinction that decision coverage was stronger than statement coverage because 100 percent decision coverage implied 100 percent statement coverage (but not vice versa). In the 2011 ISTQB Foundation level syllabus, section 4.4.3 has the following sentence: “There are stronger levels of structural coverage beyond decision coverage, for example, condition coverage and …”

As we showed in the previous example, condition coverage is not stronger than decision coverage because 100 percent condition coverage does not imply 100 percent decision coverage. That makes condition coverage—by itself—not very interesting for structural testing. However, it is still an important test design technique; we just need to extend the technique a bit to make it stronger.

We will look at three more levels of coverage that will make condition coverage stronger—at the cost of more testing, of course.

2.3 Decision Condition Coverage

Learning objectives

TTA-2.3.1 (K3) Write test cases by applying the decision condition testing test design technique to achieve a defined level of coverage.

The first of these we will call decision condition coverage. This level of coverage is just a combination of decision and condition coverage (to solve the shortcoming of condition only coverage pointed out in the preceding section). The concept is that we need to achieve condition level coverage where each atomic condition is tested both ways, TRUE and FALSE, and we also need to make sure that we achieve decision coverage by ensuring that the overall predicate be tested both ways, TRUE and FALSE.

To test to this level of coverage, we ensure that we have condition coverage, and then make sure we evaluate the full decision both ways. The bug hypothesis should be clear from the preceding discussion; not testing for decision coverage even if we have condition coverage may allow bugs to remain in the code.

Full decision condition coverage guarantees condition, decision, and statement coverage and thereby is stronger than all three.

Going back to the previous example, where we have condition coverage but not decision coverage for the predicate (A && B), we already had two test cases as follows:

A is set to FALSE and B is set to TRUE.

A is set to TRUE and B is set to FALSE.

We can add a third test case:

Both A and B are set to TRUE.

We now force the predicate to evaluate to TRUE, so we now have decision coverage also. An alternative would be to select our original test cases a bit more wisely, as follows:

A and B are both set to TRUE.

A and B are both set to FALSE.

In this case, we can achieve full decision condition coverage with only two test cases.
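A minimal check of those two wisely chosen tests (a sketch of our own) shows that each atomic condition takes both values and that the decision itself also resolves both ways:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    bool tests[2][2] = { { true, true }, { false, false } };

    for (int i = 0; i < 2; i++) {
        bool A = tests[i][0], B = tests[i][1];
        /* prints: A=1 B=1 -> (A && B)=1
                   A=0 B=0 -> (A && B)=0  */
        printf("A=%d B=%d -> (A && B)=%d\n", A, B, A && B);
    }
    return 0;
}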

Whew! Finally, with all of these techniques, we have done enough testing. Right?

Well, maybe.

2.4 Modified Condition/Decision Coverage (MC/DC)

Learning objectives

TTA-2.4.1 (K3) Write test cases by applying the modified condition/decision coverage (MC/DC) testing test design technique to achieve a defined level of coverage.

There is another, stronger level of coverage that we must discuss. This one is called modified condition/decision coverage (usually abbreviated to MC/DC or sometimes MCDC).

This level of coverage is considered stronger than those we have covered (so far) because we add other factors to what we were already testing in decision condition coverage. Like decision condition coverage, MC/DC requires that each atomic condition be tested both ways and that decision coverage must be satisfied. It then adds two more constraints:

1. For each atomic condition, there is at least one test case where the entire predicate evaluates to FALSE only because that specific atomic condition evaluated to FALSE.

2. For each atomic condition, there is at least one test case where the entire predicate evaluates to TRUE only because that specific atomic condition evaluated to TRUE.

There has been some discussion about the correct definition for MC/DC coverage. For example, the standard DO-178C4 (discussed below) has a slightly looser definition, as follows:

Every point of entry and exit in the program has been invoked at least once, every condition in a decision in the program has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect a decision’s outcome. A condition is shown to independently affect a decision’s outcome by: (1) varying just that condition while holding fixed all other possible conditions, or (2) varying just that condition while holding fixed all other possible conditions that could affect the outcome.

The definition we are using appears to be slightly more restrictive and so we will use it; this will ensure that when we are using it for safety- and mission-critical testing, it will cover all requirements for MC/DC coverage. By using our version of the definition rather than that in the syllabus or DO-178C, we supply coverage that’s at least as good, and it simplifies the way we can create our tests, using the neutral-value design technique.

Note that, in order to use our definition, we need to deal with the unary negation operator. Assume we have the following predicate:

A && (!B)

According to rule 1, if A is TRUE, the predicate must evaluate to TRUE. Also, if B is TRUE, then the predicate must evaluate to TRUE. However, if B is TRUE, then (!B) evaluates to FALSE and the predicate evaluates to FALSE. The only way we can use our “neutral value” design method (as seen below) is to deem that the atomic condition is the variable and the unary operator together. That is, (!B) is deemed to be the atomic condition that, when it evaluates TRUE (i.e., B is FALSE), fulfills the MC/DC definition by forcing the predicate TRUE. This definition fits within the glossary definition.

For either definition, when the specified atomic condition is toggled from one state to its opposite, while keeping all other atomic conditions fixed, the entire predicate will also toggle to its opposite value. Interestingly enough, this tends to be a lot less testing than it sounds like. In fact, in most cases, this level of coverage can usually be achieved in (N + 1) test cases, where (N) is the total number of atomic conditions in the predicate.

Many references use the term MC/DC as a synonym for exhaustive testing. However, this level of testing does not rise to the level of exhaustive testing, which would include far more than just predicate testing (running every loop every possible number of times, for example). MC/DC does, however, exercise each complex predicate in a number of ways that should result in finding certain types of subtle defects.

MC/DC coverage may not be useful to all testers. It will likely be most useful to those who are testing mission- or safety-critical software. According to the most prevalent legend, it was originally created at Boeing for use when certain programming languages were to be used in safety-critical software development. It is the required level of coverage under the standard FAA DO-178C5 when the software is judged to fall into the Level A category (where there are likely to be catastrophic consequences in case of failure).

Avionics systems—which are covered by DO-178C—are a perfect example of why MC/DC coverage is so important. At one time, some software professionals believed that safety-critical software should be tested to a multiple condition level of coverage. That would require that every single possible combination of atomic conditions be tested—requiring essentially 2^n different test cases. See the comparison table in section 2.5 for the advantage MC/DC coverage has over multiple condition coverage.

There are several different algorithms that have been advanced to define the specific test cases needed to achieve MC/DC-level coverage. Perhaps the simplest to use is the test design technique using neutral values.6 This method works quite well with our definition of MC/DC coverage.

A neutral value for an atomic condition is defined as one that has no effect on the eventual outcome evaluation of the expression. The operator used in joining two atomic conditions determines exactly what value will be neutral. For example, suppose we are evaluating two atomic conditions, A and B. Keeping it simple, suppose we have two possible operators, AND and OR:

A AND B

A OR B

If A was the atomic condition we were interested in evaluating in our MC/DC test design, we would like the output result to be TRUE when A is TRUE and FALSE when A is FALSE. What value of B can we use that will enable that set of values? This is the value we call a neutral, as it has no effect on the evaluation.

If the operator is an AND, then the neutral value will be TRUE. That is, if we set B to the neutral value, we control the output completely by toggling A.

(A AND B) → (TRUE AND TRUE == TRUE), (FALSE AND TRUE == FALSE)

On the other hand, if the operator is OR, the neutral value must evaluate to FALSE.

(A OR B) → (TRUE OR FALSE == TRUE), (FALSE OR FALSE == FALSE)

Likewise, if we are interested in evaluating the value B, the value of A would be set to neutral in the same way. By using this concept of neutral substitutes, we can build a table for coming up with sufficient test values to achieve MC/DC coverage.
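In code terms, the idea is simply this (a minimal sketch): with the neutral value in place of the other operand, the operand of interest alone drives the result.

#include <stdbool.h>

/* Neutral value for AND is TRUE: the result tracks x exactly. */
bool and_with_neutral(bool x) { return x && true; }

/* Neutral value for OR is FALSE: the result again tracks x.   */
bool or_with_neutral(bool x)  { return x || false; }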

Let us consider a non-trivial predicate and then determine which test cases we would need to exercise to achieve full MC/DC coverage.

(A OR B) AND (C AND D)

The first step is to build a table with three columns and as many rows as there are atomic conditions plus one (the extra row is used as a header). That means, in this case, we start with three columns and five rows, as shown in Table 2–3.

Table 2–3 Non-trivial predicate (1)


The second column will be used for when the predicate evaluates to TRUE and the third column will be used for when the predicate evaluates to FALSE. Each of the rows below the header represents a specific atomic condition as shown. Finally, each cell shown at the junction of an output column and an atomic condition is set up so that the neutral values are shown as dashes. In the second column (representing a TRUE output) and the row for the atomic condition A, the contents represent a value:

TRUE Neutral Neutral Neutral

If the values substituted for the neutrals are indeed neutral, then cell (2,2) should contain values that will evaluate to TRUE, and cell (2,3) should contain values that evaluate to FALSE.

The next step (as seen in Table 2–4) is to substitute the correct values for the neutrals in each cell, basing them on the operators that are actually used in the predicate. This can be confusing because of the parentheses. Look carefully at the explanation given for each cell to achieve the possible combinations of inputs that can be used.

Table 2–4 Non-trivial predicate (2)

  • Cell (2,2): (TRUE OR B) AND (C AND D)

    B is ORed to A, therefore its neutral value is FALSE.

    (C AND D) is ANDed to (A OR B), and therefore neutral is TRUE.

    Since (C AND D) must be TRUE, both C and D must be set to TRUE.

  • Cell (2,3): (FALSE OR B) AND (C AND D)

    B is ORed to A, therefore its neutral value is FALSE.

    (C AND D) is ANDed to (A OR B), and therefore neutral is TRUE.

    Since (C AND D) must be TRUE, both C and D must be set to TRUE.

  • Cell (3,2): (A OR TRUE) AND (C AND D)

    A is ORed to B, therefore its neutral value is FALSE.

    (C AND D) is ANDed to (A OR B), and therefore neutral is TRUE.

    Since (C AND D) must be TRUE, both C and D must be set to TRUE.

  • Cell (3,3): (A OR FALSE) AND (C AND D)

    A is ORed to B, therefore its neutral value is FALSE.

    (C AND D) is ANDed to (A OR B), and therefore neutral is TRUE.

    Since (C AND D) must be TRUE, both C and D must be set to TRUE.

  • Cell (4,2): (A OR B) AND (TRUE AND D)

    D is ANDed to C, therefore its neutral value is TRUE.

    (A OR B) is ANDed to (C AND D), therefore neutral is TRUE.

    (A OR B) can achieve a value of TRUE in three different ways. It turns out not to matter which of those you use.

    (F OR T)

    (T OR F)

    (T OR T)

  • Cell (4,3): (A OR B) AND (FALSE AND D)

    D is ANDed to C, therefore its neutral value is TRUE.

    (A OR B) is ANDed to (C AND D), therefore neutral is TRUE.

    (A OR B) can achieve a value of TRUE in three different ways.

    It turns out not to matter which of those you use.

    (F OR T)

    (T OR F)

    (T OR T)

  • Cell (5,2): (A OR B) AND (C AND TRUE)

    C is ANDed to D, therefore its neutral value is TRUE.

    (A OR B) is ANDed to (C AND D), therefore neutral is TRUE.

    (A OR B) can achieve a value of TRUE in three different ways.

    It turns out not to matter which of those you use.

    (F OR T)

    (T OR F)

    (T OR T)

  • Cell (5,3): (A OR B) AND (C AND FALSE)

    C is ANDed to D, therefore its neutral value is TRUE.

    (A OR B) is ANDed to (C AND D), therefore neutral is TRUE.

    (A OR B) can achieve a value of TRUE in three different ways.

    It turns out not to matter which of those you use.

    (F OR T)

    (T OR F)

    (T OR T)

Okay, that is a mouthful. We have made it look really complex, but it is not. In truth, when you do this yourself, it will take a very short time (assuming you do not have to write it all out in text form for a book).

There is one final step you must take, and that is to remove redundant test cases (shown in Table 2–5). This is where it might get a little complex. When substituting values for (A OR B) in the final two rows, note that you have a choice whether to use (T OR T), (T OR F), or (F OR T). All three expressions evaluate to TRUE, which is the required neutral value.

Our first try was to use (T OR T) in all four cells where we could make that substitution. When we started to remove redundant test cases, we found that we could remove only two, leaving six test cases that we must run to get MC/DC coverage. Since the theory of MC/DC says we will often end up with (N + 1), or five test cases, we wondered if we could try alternate values, which might allow us to remove the sixth test case and get down to the theoretical number of five. By substituting the logical equivalent of (T OR F) in cells (4,2) and (5,2), we were able to remove three redundant test cases, as can be seen in Table 2–5.

Table 2–5 Test cases for non-trivial predicate

Image

This leaves us five (N + 1) tests, as follows:

Test 1: A = TRUE, B = FALSE, C = TRUE, D = TRUE

Test 2: A = FALSE, B = TRUE, C = TRUE, D = TRUE

Test 3: A = FALSE, B = FALSE, C = TRUE, D = TRUE

Test 4: A = TRUE, B = TRUE, C = FALSE, D = TRUE

Test 5: A = TRUE, B = TRUE, C = TRUE, D = FALSE
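
If you would like to check the neutral-value reasoning mechanically, a short program can do it. The sketch below (our own illustration, not part of the syllabus) evaluates the predicate while holding the other conditions at their neutral values and toggling only the condition of interest; we have arbitrarily used (T OR F) whenever (A OR B) must be TRUE, which is one of the three valid choices mentioned above, so the resulting rows may not match Table 2–5 cell for cell.

#include <stdio.h>
#include <stdbool.h>

/* The predicate from the text: (A OR B) AND (C AND D) */
static bool predicate(bool a, bool b, bool c, bool d)
{
    return (a || b) && (c && d);
}

int main(void)
{
    /* For each atomic condition, hold the others at neutral values and
       toggle only the condition of interest; the predicate's outcome
       should follow that condition (first value TRUE, second FALSE).   */
    printf("A: %d %d\n", predicate(true,  false, true,  true),
                         predicate(false, false, true,  true));
    printf("B: %d %d\n", predicate(false, true,  true,  true),
                         predicate(false, false, true,  true));
    printf("C: %d %d\n", predicate(true,  false, true,  true),
                         predicate(true,  false, false, true));
    printf("D: %d %d\n", predicate(true,  false, true,  true),
                         predicate(true,  false, true,  false));
    return 0;
}

Each line should print 1 0, showing that the condition of interest alone drives the outcome when the other conditions are held at their neutral values.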

2.4.1 Complicating Issues: Short-Circuiting

These two issues affect both MC/DC and multiple condition testing (which we will discuss next). In both of these design techniques, we are trying to test a complex predicate with multiple test cases. However, the actual number of test cases needed may be complicated by two different issues: short-circuiting and multiple occurrences of an individual atomic condition.

First, we will look at short-circuiting. Well, you might ask, what does that term mean?

Some programming languages are defined in such a way that Boolean expressions may be resolved by the compiler without evaluating every subexpression, depending on the Boolean operator that is used. The idea is to reduce execution time by skipping some nonessential calculations.

Table 2–6 Truth table for A || B

Image

Consider what happens when a runtime system that has been built using a short-circuiting compiler resolves the Boolean expression (A || B) (see Table 2–6). If A by itself evaluates to TRUE, then it does not matter what the value of B is. At execution time, when the runtime system sees that A is TRUE, it does not even bother to evaluate B. This saves execution time.

C, C++, and Java are examples of languages that define their built-in && and || operators to short-circuit (in C++, the guarantee is lost if the operators are overloaded). Standard Pascal does not provide for short-circuiting, but Delphi (object-oriented Pascal) has a compiler option that controls whether Boolean expressions are short-circuited or fully evaluated.

Note that OR and AND short-circuit in different ways. Consider the expression (A || B). When the Boolean operator is an OR (||), it will short-circuit when the first term is TRUE because the second term does not matter. If the Boolean operator were an AND (&&), it would short-circuit when the first term was FALSE because no value of the second term could prevent the expression from evaluating to FALSE.

Short-circuiting may cause a serious defect condition when it is used. If an atomic condition is supplied by the output of a called function and it is short-circuited out of evaluation, then any side effects that might have occurred through its execution are lost. Oops! For example, assume that the actual predicate, instead of

(A || B)

is

(A || funcB())

Further suppose that a side effect of funcB() is to initialize a data structure that is to be used later (not necessarily a good programming practice). Elsewhere in the code, the developer could easily assume that the data structure has already been initialized, since they knew the conditional containing the function call was expected to have already executed. When they try to use the uninitialized data structure, it causes a failure at runtime. To quote Robert Heinlein, for testers, “there ain’t no such thing as a free lunch.”
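
To make the failure mode concrete, here is a minimal sketch (our own; funcB() and the initialized flag are invented for illustration) of how a short-circuited call can silently skip a side effect:

#include <stdbool.h>

static bool initialized = false;

/* funcB() reports whether its data is usable, but it also has a side
   effect: it initializes a data structure on its first call.          */
static bool funcB(void)
{
    /* ... set up the data structure ... */
    initialized = true;
    return true;
}

void example(bool A)
{
    if (A || funcB()) {
        /* When A is TRUE, a short-circuiting evaluation never calls
           funcB(), so 'initialized' is still false at this point.     */
    }
    /* Any later code that assumes the data structure was initialized
       will fail at runtime, exactly as described above.               */
}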

On the other hand, short-circuiting may be used to help avoid serious defects. Consider a pointer p that may or may not be NULL. Short-circuiting allows this common code to work:

if ((p != NULL) && (*p > 0))

If short-circuiting were not used and p actually evaluated to NULL, this code would crash when p was dereferenced. Jamie has always been wary of such code, but it is widely used. Rather than depend on the predicate's evaluation order at all, Jamie would write this code instead:

if (p) {
    if (*p > 0) {
        /* safe to dereference and use *p here */
    }
}

If the B expression is never evaluated, the question then becomes, Can we still achieve MC/DC coverage? The answer seems to be maybe. Our research shows that a number of organizations have weighed in on this subject in different ways; unfortunately, it is beyond the scope of this book to cover all of the different opinions. If your project is subject to regulatory statutes, and the development group is using a language or compiler that short-circuits, our advice is to research exactly how that regulatory body wants you to deal with that short-circuiting.

2.4.2 Complicating Issues: Coupling

The second issue to be discussed is when there are multiple occurrences of a condition in an expression. Consider the following pseudo-code predicate:

A || (!A && B)

In this example, A and !A are said to be coupled. They cannot be varied independently, as the MC/DC coverage rule says they must. So how does a test analyst test this predicate and still achieve the required coverage? As with short-circuiting, there are (at least) two approaches to this problem.

One approach is called unique cause MC/DC. In this approach, the term condition in the definition of MC/DC is deemed to mean uncoupled condition and the question becomes moot.

The other approach is called masking MC/DC, and it permits more than one atomic condition to vary at once but requires analyzing the logic of the predicate on a case-by-case basis to ensure that only one atomic condition of interest influences the decision. Once again, our suggestion is to follow whatever rules are imposed upon you by the regulatory body that can prevent the release of your system.
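
A quick enumeration makes the coupling visible. The sketch below (our own illustration) shows that A || (!A && B) produces exactly the same outcomes as A || B, and that there is no way to toggle A without toggling !A at the same time, which is why the unique-cause interpretation of MC/DC struggles with such predicates:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    /* Enumerate the coupled predicate A || (!A && B) for every input
       combination. Note that it behaves exactly like A || B, and that
       the two occurrences of A can never be varied independently.     */
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            bool coupled = a || (!a && b);
            bool simple  = a || b;
            printf("A=%d B=%d -> coupled=%d, A||B=%d\n", a, b, coupled, simple);
        }
    }
    return 0;
}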

Wow! This is really down in the weeds. Should you test to the MC/DC level? Well, if your project needed to follow the FAA DO-178C standard, and the particular software was of Level A criticality, the easy answer is yes, you would need to test to this level. Level A criticality means that, if the software were to fail, catastrophic results would likely ensue. If you did not test to this level, you would not be able to sell your software or incorporate it into any module for the project. We will discuss this standard and others that might affect your level of coverage later in the chapter.

As always, context matters. The amount of testing should be commensurate with the amount of risk.

2.5 Multiple Condition Coverage

Learning objectives

TTA-2.5.1 (K3) Write test cases by applying the multiple condition testing test design technique to achieve a defined level of coverage.

In this section, we come to the natural culmination of all of the previously discussed control flow coverage techniques. Each one gave us a little higher level of coverage, usually by adding more tests. The final control flow coverage level to discuss is called multiple condition coverage. The concept is to test every possible combination of atomic conditions in a decision! This one is truly exhaustive testing as far as the combinations of atomic conditions that a predicate can take on. Go through every possible permutation—and test them all.

Creating a set of tests does not take much analysis. Create a truth table with every combination of atomic conditions, TRUE and FALSE, and come up with values that test each one. The number of tests grows exponentially with the number of atomic conditions: 2ⁿ tests, where n is the number of atomic conditions (assuming no short-circuiting).

The bug hypothesis is really pretty simple also—bugs could be anywhere, lurking in some esoteric combinations of atomic conditions that are possible but unlikely!

Once again, we could conceivably measure the amount of multiple condition coverage by calculating the theoretical possible numbers of permutations we could have and dividing those into the number that we actually test. However, instead we will simply discuss the hypothetical perfect: 100 percent coverage.

A simple predicate can be tested without creating a truth table. For example, to get multiple condition coverage of

if (A AND B) then

we can calculate the number of tests needed (2² = 4) and simply enumerate through them as follows, giving us four tests to run:

A = TRUE, B = TRUE

A = TRUE, B = FALSE

A = FALSE, B = TRUE

A = FALSE, B = FALSE
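
For predicates with more atomic conditions, writing out the 2ⁿ rows by hand quickly becomes tedious; a short loop can generate the full truth table instead. The following sketch (ours, shown here for the simple (A AND B) predicate) counts from 0 to 2ⁿ - 1 and pulls each condition's value out of the bits:

#include <stdio.h>

int main(void)
{
    const int n = 2;                    /* number of atomic conditions (A, B) */

    /* Each value of row is one combination; its bits supply the values. */
    for (int row = 0; row < (1 << n); row++) {
        int a = (row >> 1) & 1;
        int b = row & 1;
        printf("A=%s, B=%s -> %s\n",
               a ? "TRUE" : "FALSE",
               b ? "TRUE" : "FALSE",
               (a && b) ? "TRUE" : "FALSE");
    }
    return 0;
}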

Let’s use a non-trivial predicate to see how many different test cases we are looking at, assuming no short-circuiting. Here is a conditional decision that might be used:

if (A && B && (C || (D && E))) then …

We can devise the truth table shown in Table 2–7. Since there are five atomic conditions, we would expect to see 2⁵, or 32, unique rows in the table.

Table 2–7 Truth table

Image

Theoretically, we would then create a unique test case for every row.

Why theoretically? Well, we might have the same issues that popped up earlier, the concept of short-circuiting and coupling. Depending on the order of the Boolean operators, the number of test cases that are interesting (i.e., achievable in a meaningful way) will differ when we have a short-circuiting compiler.

Let’s assume that we are using a compiler that does short-circuit predicate evaluation. Again, here is our predicate:

(A && B && (C || (D && E)))

Necessary test cases, taking into consideration short-circuiting, are seen in Table 2–8. Note that we started off with 32 different rows depicting all of the possible combinations that five atomic conditions could take, as shown in Table 2–7.

Table 2–8 Tests (1)

Image

The first thing we see is that 16 of those rows start with A being set to FALSE. Because of the AND operator, if A is FALSE, the other four terms will all be short-circuited out and never evaluated. We must test this once, in test 1, but the other 15 rows (18–32) are dropped as being redundant testing.

Likewise, if B is set to FALSE, even with A TRUE, the other terms are short-circuited and dropped. Since there are eight rows where this occurs (9–16), seven of those are dropped and we are left with test 2.

In test 3, if D is set to FALSE, the E is never evaluated.

In tests 4 and 5, each term matters and hence is included.

In test 6, once C evaluates to TRUE, D and E no longer matter and are short-circuited.

Out of the original possible 32 tests, only 6 of them are interesting.
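
One practical way to confirm this kind of analysis is to instrument each atomic condition so that it leaves a trace when it is actually evaluated. The sketch below is our own illustration: the eval() helper is invented, and the concrete input rows are our reading of the analysis above (Table 2–8 itself is not reproduced here).

#include <stdio.h>
#include <stdbool.h>

static char evaluated[8];   /* names of the conditions actually evaluated */
static int  count;

/* Wrap each atomic condition so that evaluating it records its name. */
static bool eval(char name, bool value)
{
    evaluated[count++] = name;
    return value;
}

static void run(bool a, bool b, bool c, bool d, bool e)
{
    count = 0;
    bool result = eval('A', a) && eval('B', b) &&
                  (eval('C', c) || (eval('D', d) && eval('E', e)));
    evaluated[count] = '\0';
    printf("A=%d B=%d C=%d D=%d E=%d -> %d  evaluated: %s\n",
           a, b, c, d, e, result, evaluated);
}

int main(void)
{
    run(false, true,  true,  true,  true );  /* test 1: only A is evaluated   */
    run(true,  false, true,  true,  true );  /* test 2: A and B               */
    run(true,  true,  false, false, true );  /* test 3: E is never evaluated  */
    run(true,  true,  false, true,  false);  /* test 4: every term matters    */
    run(true,  true,  false, true,  true );  /* test 5: every term matters    */
    run(true,  true,  true,  true,  true );  /* test 6: D and E short-circuit */
    return 0;
}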

For our second example, we have the same five atomic conditions arranged in a different logical configuration, as follows:

((A || B) && (C || D)) && E

In this case, we again have five different atomic conditions and so we start with the same 32 rows possible as shown in Table 2–7. Notice that anytime E is FALSE, the expression is going to evaluate to false. Since there are 16 different rows where E is FALSE, you might think that we would immediately get rid of 15 of them. While a smart compiler may be able to figure that out, most won’t because the short-circuiting normally goes from left to right.

This predicate results in a different number of tests, as seen in Table 2–9.

Table 2–9 Tests (2)

Image

For test 1, we have both A and B set to FALSE. That makes C, D, and E short-circuit because the AND operators ensure that they cannot affect the outcome. We lose seven redundant cases right there.

For test 2, we lose a single test case because, with C and D both evaluating to FALSE, the whole of ((A || B) && (C || D)) evaluates to FALSE, so E is short-circuited.

With tests 3 and 4, we must evaluate the expression completely because the parenthesized portion evaluates to TRUE, making E the condition that determines the final outcome.

For both tests 5 and 6, the D atomic condition is short-circuited because C is TRUE; however, we still need to evaluate E because the entire parentheses evaluates to TRUE. You might ask how can we short-circuit D but not E? Remember that the compiler will output code to compute values in parentheses first; therefore, the calculation inside the parentheses can be short-circuited, but the expressions outside the parentheses are not affected.

Likewise, for tests 7, 8, 9, 10, and 11, we short-circuit B since A is TRUE. For test 7, we can ignore E because the subexpression inside the parentheses evaluates to FALSE. Tests 8 and 9 must evaluate to the end because the calculation in the parentheses evaluates to TRUE. And finally, 10 and 11 also short-circuit D.

Essentially, when we’re testing to multiple condition coverage with a short-circuiting language, each subexpression must be considered to determine whether short-circuiting can be applied or not. Of course, this is an exercise that can be done during the analysis and design phase of the project when we are not on the critical path (before the code is delivered into test). Every test case that can be thrown out because of this analysis of short-circuiting will save us time when we are on the critical path (i.e., actually executing test cases).

Finally, as promised, we will give you a real-life comparison of the difference between MC/DC coverage and multiple condition coverage. The following table, Table 2–10, is drawn from a NASA paper7 that discusses MC/DC techniques. The specific software discussed is written in Ada; it shows all the Boolean expressions with n conditions in five different line replaceable units (LRUs) that, for DO-178C purposes, are rated to be Level A. The five LRUs came from five different airborne systems from 1995.

Table 2–10 Comparing MC/DC with MC coverage

Image
Image

The first row is split into columns, each representing the number of atomic conditions that occur in the expression. To keep the table reasonably sized, larger numbers of atomic conditions are put into ranges; for example, the last column shows expressions that had between 36 and 76 unique atomic conditions in them. Airborne systems can obviously get rather complicated!

The second row shows the number of times (in the five LRU systems) that each different-sized expression was seen. For example, there were 16,491 times that expressions were simple (i.e., containing only a single atomic condition). On the other hand, there were 219 times that an expression contained between 6 and 10 atomic conditions inclusive.

The third row shows the number of test cases that would be required to achieve multiple condition coverage considering the number of atomic conditions in the expression. So, if there were 16 atomic conditions, it would require 2¹⁶ (65,536) tests to achieve coverage, and if there were 20, it would require 2²⁰ (1,048,576) separate test cases. Luckily, there are only 36 separate expressions requiring that much testing!

The last row estimates how many tests would be needed to achieve MC/DC coverage for those same predicates. Since we can usually achieve MC/DC coverage with n + 1 tests, the given values use that calculation. In some cases, it might take one or two more tests to achieve coverage.

As you can see in the table, expressions with up to five atomic conditions are the most common (19,960 of 20,256 predicates), and they could conceivably be tested to multiple condition coverage with modest effort. For more complex predicates, however, the number of test cases required to achieve multiple condition coverage grows exponentially. Since the likelihood of defects is probably much higher in more complex predicates, the value of MC/DC coverage testing should be apparent.
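
The contrast is easy to see with a few lines of code. This sketch (ours) prints, for each number of atomic conditions, the 2ⁿ tests that multiple condition coverage demands next to the roughly n + 1 tests that MC/DC typically needs:

#include <stdio.h>

int main(void)
{
    /* Multiple condition coverage needs 2^n tests; MC/DC needs about n + 1. */
    for (int n = 1; n <= 20; n++) {
        printf("%2d conditions: multiple condition = %9ld   MC/DC ~ %2d\n",
               n, 1L << n, n + 1);
    }
    return 0;
}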

We will further discuss the DO-178C standard and others that require specific coverage later in this chapter.

We are not sure when multiple condition testing might actually be used in today's world. MC/DC seems to be the method used when the software is safety- or mission-critical. We do know that, at one time, multiple condition testing was used for systems that were supposed to work for decades without fail; telephone switches were so tested back in the 1980s.

2.5.1 Control Flow Exercise

In Table 2–11 is a snippet of Delphi code that Jamie wrote for a test management tool. This code controlled whether to allow a drag-and-drop action of a test case artifact to continue or whether to return an error and disallow it.

  • Tests are leaf nodes in a hierarchical feature/requirement/use case tree.

  • Each test is owned by a feature, a requirement, or a use case.

  • Tests cannot own other tests.

  • Tests can be copied, moved, or cloned from owner to owner.

  • If a test is dropped on another test, it will be copied to the parent of the target test.

  • This code was designed to prevent a test from being copied to its own parent, unless the intent was to clone it.

  • This code was critical … and buggy

Using this code snippet:

1. Determine the total number of tests needed to achieve decision condition testing.

2. Determine the total number of tests required to achieve MC/DC coverage (assuming no short-circuiting).

3. Determine the total number of tests needed for multiple condition coverage (assuming no short-circuiting).

4. If the compiler is set to short-circuit, which of those multiple condition tests are actually needed?

Table 2–11 Code snippet

Image

The answers are shown in the following section.

2.5.2 Control Flow Exercise Debrief

We find it is almost always easier to rewrite this kind of predicate using simple letters as variable names than to do the analysis using the long names. The long names are great for documenting the code—just hard to use in an analysis.

If (

(DDSourceNode.StringData2 = ’Test’)

AND (DDTgtNode.StringData2 = ’Test’)

AND (DDCtrlPressed OR DDShiftPressed)

AND (DDSourceNode.Parent = DDTgtNode.Parent)

AND (NOT DoingDuplicate)) Then Begin

This code translates to

if ((A) AND (B) AND (C OR D) AND (E) AND (NOT F)) then

Where

A = DDSourceNode.StringData2 = ’Test’

B = DDTgtNode.StringData2 = ’Test’

C = DDCtrlPressed

D = DDShiftPressed

E = DDSourceNode.Parent = DDTgtNode.Parent

F = DoingDuplicate

1. Determine the tests required for decision condition coverage.

Remember, to achieve decision condition level coverage, we need two separate rules to apply:

1. Each atomic condition must be tested both ways.

2. Decision coverage must be satisfied.

It is a matter then of making sure we have both of those covered.

Table 2–12 Decision condition coverage

Image

Tests 1 and 2 in Table 2–12 evaluate each of the atomic conditions both ways and give us decision coverage. If this looks strange, remember that the last atomic condition that will be calculated is actually !F. But when we pass in the actual variable value in the test case, it will be just F (as represented in the table).

2. Determine the total number of tests required to achieve MC/DC coverage (assuming no short-circuiting) and define them.

There are three rules to modified condition/decision coverage:

1. Each atomic condition must be evaluated both ways.

2. Decision coverage must be satisfied.

3. Each atomic condition must affect the outcome independently when the other conditions are held steady: a TRUE atomic condition results in a TRUE output, a FALSE atomic condition results in a FALSE output.

The first step is to create a neutrals table, three columns and (N + 1) rows.

Table 2–13 Initial neutrals table

Image

The following are neutral values for the six atomic conditions in the table:

A: A is ANDed to B, therefore its neutral value is TRUE.

B: B is ANDed to A, therefore its neutral value is TRUE.

C: C is ORed to D, therefore its neutral value is FALSE.

D: D is ORed to C, therefore its neutral value is FALSE.

(C OR D): this expression is ANDed to B, therefore its neutral value is TRUE and may be achieved three ways:

(T,T), (T,F), or (F,T)

E: E is ANDed to the expression (C OR D), therefore its neutral value is TRUE.

F: (NOT F) is ANDed to E, so the neutral value of (NOT F) is TRUE; because of the inversion, the neutral value of F itself is FALSE.

Substituting these values into Table 2–13 creates Table 2–14. Then, simply eliminate the duplicates.

Table 2–14 Completed MC/DC Table

Image

There are two details that will help you understand these results.

1. The theory says that we need to have a test case for each atomic condition such that if the atomic condition goes TRUE, it makes the entire predicate go TRUE, and that if it toggles FALSE, then the predicate goes FALSE. Looking at the F atomic condition (row 7), this appears to not be fulfilled. When F goes FALSE, the predicate goes TRUE and vice versa. That is because the F variable is negated with a unary operator: that is, (NOT F). The actual atomic condition in this case is (NOT F), so the rule still is met.

2. For rows 2 (A), 3 (B), 6 (E), and 7 (F), the (C OR D) expression must evaluate to TRUE. We have the choice of three separate value sets that can be used to achieve a TRUE value: (T,T), (T,F), or (F,T). In order to reduce the number of test cases to (N + 1), you often need to try different combinations. In this case, it took us seven minutes to find combinations that allowed us to reduce the 12 possible test cases to 7.

3. Determine the total number of tests needed for multiple condition coverage.

With 6 different atomic conditions, we can build a truth table to determine the number of multiple condition tests that we would need without short-circuiting. Table 2–15 contains the truth tables covering the six variables; hence 2⁶, or 64, tests would be needed for multiple condition coverage.

Table 2–15 Truth tables for 6 values

Image
4. If the compiler is set to short-circuit, which of those tests are actually needed?

If the compiler generated short-circuiting code, the actual number of test cases would be fewer. We try to do this in a pattern based on the rules of short-circuiting. That is, the compiler generates code to evaluate from left to right subject to the general rules of scoping and parentheses. As soon as the outcome is determined, it stops evaluating.

Table 2–16 Short-circuited test cases

Image

Therefore, we would actually need to run 9 test cases to achieve multiple condition coverage when the compiler short-circuits.
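
If you want to double-check this count, you can enumerate all 64 combinations and record which atomic conditions a short-circuiting evaluation actually touches; the number of distinct evaluations is the number of tests worth running. The sketch below is our own illustration (the eval() helper is invented); it should report 9, matching Table 2–16.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

static char trace[16];
static int  pos;

/* Record the name and value of every condition that is actually evaluated. */
static bool eval(char name, bool value)
{
    trace[pos++] = name;
    trace[pos++] = value ? 'T' : 'F';
    return value;
}

int main(void)
{
    char seen[64][16];
    int  distinct = 0;

    for (int row = 0; row < 64; row++) {
        bool a = (row >> 5) & 1, b = (row >> 4) & 1, c = (row >> 3) & 1;
        bool d = (row >> 2) & 1, e = (row >> 1) & 1, f = row & 1;

        pos = 0;
        (void)(eval('A', a) && eval('B', b) &&
               (eval('C', c) || eval('D', d)) &&
               eval('E', e) && !eval('F', f));
        trace[pos] = '\0';

        bool new_trace = true;                 /* keep only distinct traces */
        for (int i = 0; i < distinct; i++) {
            if (strcmp(seen[i], trace) == 0) { new_trace = false; break; }
        }
        if (new_trace) strcpy(seen[distinct++], trace);
    }
    printf("Distinct short-circuited evaluations: %d\n", distinct);
    return 0;
}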

2.6 Path Testing

Learning objectives

TTA-2.6.1 (K3) Write test cases by applying the path testing test design technique.

We have looked at several different control flow coverage schemes. But there are other ways to design test cases using the structure of the system as the test basis. We have covered three main ways of approaching white-box test design so far:

1. Statement testing where the statements themselves drove the coverage

2. Decision/branch testing where the decision predicate (as a whole) drove the coverage

3. Condition, decision condition, modified condition/decision coverage, and multiple condition coverage, all of which looked at subexpressions and atomic conditions of a particular decision predicate to drive the coverage

Now we are going to discuss path testing. Rather than concentrating merely on control flows, path testing is going to try to identify interesting paths through the code that we may want to test. Frankly, in some cases we will get some of the same tests we got through control flow testing. On the other hand, we might come up with some other interesting tests.

To try to get the terminology right, consider the following definition from Boris Beizer’s book Software Testing Techniques:

A path through the software is a sequence of instructions or statements that starts at an entry, junction, or decision and ends at another, or possibly the same, junction, decision, or exit. A path may go through several junctions, processes, or decisions one or more times.

When we discuss path testing, we start out by dismissing one commonly held but bad idea: brute-force testing, that is, testing every independent path through the system. Got infinity? That's how long it would take for any non-trivial system. Loops are the predominant problem here; every loop is a potential black hole, where each different number of iterations creates yet another set of paths. We would need infinite time and infinite resources. Well, we said it was a bad idea...

Perhaps we can subset the infinite paths to come up with a smaller number of possible test cases that are interesting and useful. There are several ways of doing this. We will concentrate on two.

2.6.1 Path Testing via Flow Graphs

The first method that we will discuss comes from Boris Beizer; in his book Software Testing Techniques, he posits that statement and branch coverage of the source code, by itself, is a very poor indicator of how thoroughly the code has actually been tested.

Assume that you have outlined a set of tests that achieve statement coverage of the source code for a module. Now run that code through the compiler to generate object code that is executable on the platform of choice.

It is logical to assume that running those defined tests against the generated code would achieve statement-level coverage—that is, every generated statement of object code would be executed. However, that does not necessarily follow. In fact, depending on the number of decisions in the source code, the testing done with those tests might end up covering less than 75 percent of the actual executable statements in the object code. Depending on the programming language and the compiler used, decision/branch coverage of the source code might not generate even statement coverage at the object level. Or, as Beizer puts it:

The more we learn about testing, the more we realize that statement and branch coverage are minimum floors below which we dare not fall, rather than ceilings to which we should aspire.

We will not discuss why this disparity of coverage exists when dealing with compiled code. It is well beyond the scope of what we hope to achieve with this book. If you are interested in this topic, we suggest investigating compiler theory and design books that are available.

Image

Figure 2–12 Control flow graph for hexadecimal converter

Beizer’s answer to this coverage issue is to use control flow graphs to visually pick interesting (and perhaps some uninteresting) paths through the code. There must be sufficient paths selected to supply both statement and decision/branch coverage (remember, he considers that to be the absolute minimum testing). Beizer does not prescribe how many paths should be selected for testing over and above full branch coverage. Instead, he states two general rules for selecting the paths:

1. Many simple paths are better than fewer complicated paths.

2. There is no conceptual problem executing the same code more than once.

The first step in this path coverage technique is to create a control flow graph for the code we want to test. In the interest of simplicity, we will use the code for the hexadecimal converter, already introduced earlier in this chapter (Figure 2–11).

We have fashioned this control flow graph a bit differently than previous flow graphs in this chapter to allow us to better explain how we can use it for path testing.

Note that each code segment between objects in the flow graph is lettered such that we can denote a path by stringing those letters together. The test case would require input data to force the decisions to take the paths desired. For example, a possible path that a tester could test, from beginning to end, might be

abdegkmehkmejkmnoqr

[a] The path starts at the beginning (line 7)
[b] Decide TRUE at the if() statement (line 13); state: executing in the foreground⁹
[d] Proceed out of the if() statement (line 16)
[e] Decide TRUE for the while() predicate (line 17); input: C
[g] Go into the handler that deals with (A..F)
[k] Proceed out of the handler (line 43)
[m] Proceed back to the while() loop (line 17)
[e] Decide TRUE for the while() predicate (line 17); input: d
[h] Go into the handler that deals with (a..f)
[k] Proceed out of the handler (line 43)
[m] Proceed back to the while() loop (line 17)
[e] Decide TRUE for the while() predicate (line 17); input: R
[j] Go into the handler that deals with a non-hex character
[k] Proceed out of the handler (line 43)
[m] Proceed back to the while() loop (line 17)
[n] Decide FALSE for the while() predicate; input: EOF
[o] Proceed to the end of the while() (line 44)
[q] Decide FALSE at the if() statement (line 45); state: there are hex digits
[r] Proceed to the end of the function

This set of path segments would be driven by the inputs (or states of the system) noted with each decision segment. The expected output would be as follows:

“Got 2 hex digits: cd”

Once we have a control flow diagram, we still need to have some way to determine coverage. To that end, we can create a table to store the paths and facilitate the analysis as to which tests we find interesting and want to run. Our example is shown in Table 2–17. The table should include the paths we decide to test, the decisions that are made in those tests, and the actual path segments that are covered in each test.

The decision columns help us determine whether we have decision coverage given the paths we have chosen. After filling in the paths we decide are interesting, we know that we have decision coverage if each decision column has at least one TRUE (T) and one FALSE (F) entry in it. Notice that we have a switch statement in line 18 that has four possible paths through it (f, g, h, and j). Each of those columns must have at least one entry showing that the path segment has been traversed or we have not achieved decision coverage through that construct.

Likewise, we can ensure that we have statement coverage by ensuring that each code segment has been checked (with an x in this case) in at least one path instance.

For example, path number one in Table 2–17 has been entered to show the route as illustrated earlier. In this case, decision 13 has been marked T, decision 17 has been marked T (even though it eventually goes F to fall out of the while() loop), decision 45 has been marked F (because we did have hex digits), and the switch statement has been marked showing three segments covered, so we have partial coverage of the switch statement at this point. The fact that some segments have been traversed multiple times is not recorded and does not matter for the purposes of this analysis.

Table 2–17 Control flow for hex converter

Image

Despite the fact that we have a candidate test, we don’t know whether it is a “good” test based on Boris Beizer’s theories. Beizer recommends an algorithm that takes into account his two general rules of selecting paths. Because his algorithm is salient and succinct, we will quote it completely.

1. Pick the simplest, functionally sensible entry/exit path.

2. Pick additional paths as small variations from previous paths. Pick paths that do not have loops rather than paths that do. Favor short paths over long paths, simple paths over complicated paths, and paths that make sense over paths that don’t.

3. Pick additional paths that have no obvious functional meaning only if it’s necessary to provide coverage. But ask yourself first why such paths exist at all. Why wasn’t coverage achieved with functionally sensible paths?

4. Be comfortable with your chosen paths. Play your hunches and give your intuition free reign as long as you achieve C1 + C2. (100 percent statement and branch coverage)

5. Don’t follow rules slavishly—except for coverage.

Note that we likely break rule 1 with our first path as shown in Table 2–17. It is not the shortest possible path; while it is functionally sensible, it loops several times, which also breaks rule 2.

For actually testing this module, we have filled out what we believe are a reasonable set of paths to start to test in Table 2–18, using Beizer’s algorithm.

Table 2–18 Completed control flow table

Image

Test 1: Test a single digit (0..9) and print it out.

Test 2: Test a single capitalized hex digit (A..F) and print it out.

Test 3: Test a single noncapitalized digit (a..f) and print it out.

Test 4: Test a non-hex digit and show there are no hex digits to convert.

Test 5: Test entering EOF as first input while running the application in the background.

Are these enough tests? We would say not, but they are a good start because they do give us both statement and branch coverage. Test cases 2, 3, and 4 are each a small variation in the previous test case. Test 5 is there only to give us the final coverage that we need.

Note that tests running the application in the background would be interesting, feeding input via a device or file. We know this because we see specific provisions for handling signals in the module. Tests selecting non-trivial strings to test would also be interesting. The five tests just shown should be seen as only the start.

One test that is not included in this basic set is one that tests the signal action itself. The first if() statement (line 13) is setting up the ability for the system to respond to a Ctrl^C asynchronously. When a Ctrl^C is inputted through the keyboard, and the signal is not set to be ignored (i.e., the application is running in the foreground), it removes the last hex digit that was inputted. This allows a user to correct a mistake if they entered an incorrect value. Interestingly enough, a serious defect exists with this signal-handling code. This points out very well that you cannot blindly perform white-box testing based solely on our ideas of coverage. You must understand all of the possible ways the code may be executed, both synchronously and asynchronously.
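
For readers who have not seen the idiom, the foreground check and handler registration described here typically look something like the following sketch; this is our own illustration of the pattern, not the book's actual converter code.

#include <signal.h>

static void on_interrupt(int sig)
{
    (void)sig;
    /* remove the last hex digit that was entered (sketch only) */
}

int main(void)
{
    /* Install the Ctrl^C handler only if SIGINT is not already being
       ignored - that is, only when the program runs in the foreground.
       This mirrors the structure described for line 13 of the converter. */
    if (signal(SIGINT, SIG_IGN) != SIG_IGN) {
        signal(SIGINT, on_interrupt);
    }
    /* ... rest of the hexadecimal converter ... */
    return 0;
}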

Of course, this particular defect should have been found—before we ever got to dynamic testing—through a code inspection, but that is covered in Chapter 5.

Beizer recommends a low-tech, graphical way to come up with the interesting paths to test. Create the control graph and then print off copies of it and use a highlighter to mark a single path through each copy. Enter each of those paths into the table. As you do so, the next path you create could cover some of the decisions or path segments that were not already covered. Continue until comfortable with the amount of testing—which should be governed by the context of the project, the application, and the test plan.

Using this approach, it is certainly conceivable that some of the paths will be impossible or uninteresting or have no functional meaning. Beizer addresses that issue in rule 3 of his algorithm.

As long-time developers and testers, we have often written and tested code that sometimes requires unusual looking tests to achieve a level of coverage. Some of those times, the structure was necessary to be able to achieve the needed functionality in an elegant way. Other times, we find that we have locked in on a bone-headed structure that was not needed and could have (should have) been rewritten.

In general, having test analysts criticizing the programmers’ choice of structure is not a recommendation that we are always comfortable making. However, in today’s world of Agile testers working side by side with programmers and mission- and safety-critical software, perhaps a discussion of that attitude might be in order.

2.6.2 Basis Path Testing

The second method we will look at comes from Thomas McCabe and his theory of cyclomatic complexity. His entirely sensible suggestion was to test the structure of the code to a reasonable amount, and he used mathematics and graph theory to suggest how.

McCabe called his technique basis path testing. In December 1976, he published his paper “A Complexity Measure”10 in the IEEE Transactions on Software Engineering. He theorized that any software module has a small number of unique, independent paths through it (if we exclude iterations). He called these basis paths.

His theory suggests that the structure of the code can be tested by executing through this small number of paths and that all the infinity of different paths actually consists of simply using and reusing the basis paths.

A basis path is defined as a linearly independent path through the module. No iterations are allowed for these paths. When you’re dealing with looping structures, the loop predicate will generally be visited twice: once TRUE (to travel through the loop once) and once FALSE (to skip the loop). The basis path set is the smallest number of basis paths that cover the structure of the code.

The theory states that creating a set of tests that cover these basis paths, the basis set, will guarantee us both statement and decision coverage. The basis path set has also been called the minimal set of paths needed for coverage.

Cyclomatic complexity is what Thomas McCabe called this theory of basis paths. The term cyclomatic is used to refer to an imaginary loop from the end of a module of code back to the beginning of it. How many times would you need to cycle through that outside loop until the structure of the code has been completely covered? This cyclomatic complexity number is the number of loops needed and—not coincidentally—the number of test cases we need to cover the set of basis paths.

The cyclomatic complexity depends not on the size of the module but on the number of decisions that are in its structure.

McCabe, in his paper, pointed out that the higher the complexity, the more likely there will be a higher bug count. Some subsequent studies have shown such a correlation; modules with the highest complexity tend to also contain the highest number of defects.11 Since the more complex the code, the harder it is to understand and maintain, a reasonable argument can be made to keep the level of complexity down. McCabe’s suggestion was to split larger, more-complex modules into more and smaller, less-complex modules. This complexity issue is not quite as straightforward as we just said; there are nuances to this that we will discuss in Chapter 3 when we discuss static analysis using control flows. For now, however, let’s discuss the nuts and bolts of measuring cyclomatic complexity.

We can measure cyclomatic complexity by creating a directed control flow graph. This works well for small modules; we will show an example later. Realistically, however, tools will generally be used to measure module complexity.

In our directed control flow graph, we have nodes (bubbles) to represent entries to the module, exits from the module, and decisions made in the module. Edges (arrows) represent non-branching code statements which do not add to the complexity. It is essential to understand that, if the code does not branch, it has a complexity of one. That is, even if we had a million lines of code with no decisions in it, it could be tested with a single test case.

In general, the higher the complexity, the more test cases we need to cover the structure of the code. If we cover the basis path set, we are guaranteed 100 percent of both statement and decision coverage.

Image

Figure 2–13 Cyclomatic complexity example

On the left side of Figure 2–13 we have a function to calculate the greatest common divisor of two numbers using Euclid’s algorithm.

On the right side of the figure, you see the two methods of calculating McCabe’s cyclomatic complexity metric. Seemingly the simplest is perhaps the “enclosed region” calculation. The four enclosed regions (R1, R2, R3, R4), represented by R in the upper equation, are found in the diagram by noting that each decision (bubble) has two branches that—in effect—enclose a region of the graph. We say “seemingly” the simplest because it really depends on how the graph is drawn as to how easy it is to see the enclosed regions.

The other method of calculation involves counting the edges (arrows) and the nodes (bubbles) and applying those values to the calculation E - N + 2, where E is the number of edges and N is the number of nodes.

Now, this is all simple enough for a small, modest method like this. For larger functions, drawing the graph and doing the calculation from it can be really painful. So, a simple rule of thumb is this: count the branching and looping constructs and add 1. The if, for, while, and do/while constructs each count as one. For switch/case constructs, each case block counts as one. In if and ladder if constructs, the final else does not count. For switch/case constructs, the default block does not count. This is a rule of thumb, but it usually seems to work. In the sample code, there are three if statements and one while statement: 3 + 1 + 1 = 5.
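
The figure's exact listing is not reproduced here, but a subtraction-based version of Euclid's algorithm with the same decision structure (three if statements and one while loop) might look like the following sketch; counting its decisions gives 3 + 1 + 1 = 5, matching the calculations in the figure.

/* A sketch of a GCD function with three if statements and one while loop,
   giving a cyclomatic complexity of 3 + 1 + 1 = 5.                        */
int gcd(int x, int y)
{
    if (x <= 0) {            /* decision 1 */
        return -1;           /* reject invalid input */
    }
    if (y <= 0) {            /* decision 2 */
        return -1;
    }
    while (x != y) {         /* decision 3: the loop predicate */
        if (x > y) {         /* decision 4 */
            x = x - y;
        } else {
            y = y - x;
        }
    }
    return x;                /* x == y == greatest common divisor */
}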

Note that entry and exit nodes are not counted when calculating cyclomatic complexity.

When we introduced McCabe’s theory of cyclomatic complexity a bit ago, we mentioned basis paths and basis tests. Figure 2–14 shows the basis paths and basis tests on the right side.

Image

Figure 2–14 Basis tests from directed flow graph

The number of basis paths is equal to the cyclomatic complexity. You construct the basis paths by starting with the most obvious path through the diagram, from enter to exit. Beizer claims that this likely corresponds to the normal path. Then add another path that covers a minimum number of previously uncovered edges, repeating this process until all edges have been covered at least once.

Following the IEEE and ISTQB definition that a test case substantially consists of input data and expected output data, the basis tests are the inputs and expected results associated with each basis path. The basis tests will correspond with the tests required to achieve 100 percent branch coverage. This makes sense, since complexity increases anytime more than one edge leaves from a node in a directed control flow diagram. In this kind of a control flow diagram, a situation where there is “more than one edge coming from a node” represents a branching or looping construct where a decision is made.

What use is that tidbit of information? Well, suppose you were talking to a programmer about their unit testing. You ask how many different sets of inputs they used for testing. If they tell you a number that is less than the McCabe cyclomatic complexity metric for the code they are testing, it’s a safe bet they did not achieve branch coverage.

One more thing on cyclomatic complexity. Occasionally, you might hear someone argue that the actual formula for McCabe’s cyclomatic complexity is as follows:

C = E - N + P

or

C = E - N + 2P

We list it as this:

C = E - N + 2

A careful reading of the original paper does show all three formulae. However, the directed graph that accompanies the former version of the formula (E - N + P) also shows that there is an edge drawn from the exit node to the entrance node that is not there when McCabe uses the latter equation (E - N + 2P). This edge is drawn differently than the other edges—as a dashed line rather than a solid line, showing that while he is using it theoretically in the math proof, it is not actually there in the code. This extra edge is counted when it is shown, and it must be added when not shown (as in our example). The P value stands for the number of connected components.12 In our examples, we do not have any connected components, so by definition, P = 1. To avoid confusion, we abstract out the P and simply use 2 (P equal to 1 plus the missing theoretical line, connecting the exit to the entrance node). The mathematics of this is beyond the scope of this book; suffice it to say, unless an example directed graph contains an edge from the exit node to the enter node, the formula that we used is correct.

We’ll revisit cyclomatic complexity later when we discuss static analysis in Chapter 3.

2.6.3 Cyclomatic Complexity Exercise

The following C code function loads an array with random values. Originally, this was a called function in a real program; we have simplified it as a main program to make it easier to understand. As it stands, it is not a very useful program.

Table 2–19 Example code

Image

1. Create a directed control flow graph for this code.

2. Using any of the methods given in the preceding section, calculate the cyclomatic complexity.

3. List the basis tests that could be run.

The answers are in the following section.

2.6.4 Cyclomatic Complexity Exercise Debrief

The directed control flow graph should look like Figure 2–15. Note that the edges D1 and D2 are labeled; D1 is where the if conditional in line 14 evaluates to TRUE, and D2 is where it evaluates to FALSE.

Image

Figure 2–15 Cyclomatic complexity exercise

We could use three different ways to calculate the cyclomatic complexity of the code, as shown in the box on the right in Figure 2–15.

First, we could calculate the number of test cases by the region method. Remember, a region is an enclosed space. The first region can be seen on the left side of the image: the curved line goes from B to E, and the region is enclosed by that line and the nodes (and edges between them) B-C-E. The second region is bounded by the top edge that goes from C to D and the edge D1. The third region is bounded by that same top edge, C to D, and the edge D2. Here is the formula:

C = # Regions + 1

C = 3 + 1

C = 4

The second way to calculate cyclomatic complexity uses McCabe’s cyclomatic complexity formula. Remember, we count up the edges (lines between bubbles) and the nodes (the bubbles themselves), as follows:

C = E - N + 2

C = 7 - 5 + 2

C = 4

Finally, we could use our rule-of-thumb measure, which usually seems to work. Count the number of places where decisions are made and add 1. So, in the code itself, we have line 8 (an if() statement), line 10 (a while() loop), and line 14 (an if() statement).

C = # decisions + 1

C = 3 + 1

C = 4

In each case, the cyclomatic complexity is equal to 4. That means our basis set of tests would also number 4. The following test cases would cover the basis paths.

1. ABE

2. ABCE

3. ABCD(D1)CE

4. ABCD(D2)CE

2.7 API Testing

Learning objectives

TTA-2.7.1 (K2) Understand the applicability of API testing and the kinds of defects it finds.

From the Advanced Technical Test Analyst syllabus:

An Application Programming Interface (API) is code which enables communication between different processes, programs and/or systems. APIs are often utilized in a client/server relationship where one process supplies some kind of functionality to other processes.

In the old days of computing, APIs did not exist because they were not needed. A central mainframe computer did the processing, using punched cards and paper tape for input and storage. The idea that a single person might sit down and interact with the system in real time was ludicrous.

Then came dumb terminals, allowing multiple individuals to interact with the system in limited ways concurrently. All of the actual processing was still done on the central processor, so there was no need (or place) to offload processing. Mini computers allowed organizations that could not afford the big iron to buy some limited processing power; dumb terminals still ruled the day with all processing done in the central CPU.

As processors started to get smaller and more powerful, the demand for more than time sharing began to develop. Jamie remembers listening to a rant by a real old-time computer programmer complaining that “today’s computer users are spoiled. Someday everyone will want their own powerful computer on their own desk.”

Word processing, data analysis, and decentralized data access: people did indeed want access to their “stuff” that time sharing alone could not satisfy.

As UNIX and other similar mini-computer operating systems were introduced, the possibility of personal computing became a reality. You still did not have a computer on your desk at home; that came a lot later. A small group could have access to a computer network that was not tethered directly to a mainframe; each node in the network had its own processor.

This distributed computing came with its own issues, however. Processing large amounts of data or performing large, complex calculations could not be done on these smaller processors without loading them down such that they no longer supported users interactively. One popular solution that was devised was the ability to request that some of the processing be done on remote processors. This was the advent of the remote procedure call (RPC), which is the direct progenitor of the API.

A number of networks were created that allowed a person working on a single CPU to spread some of their processing to other CPUs that were not currently busy. The ability to distribute some of the processing tasks was not without its own difficulties; if someone turned off their local computer while it was doing remote processing, or wanted to interact with it directly, the remote process might get kicked off without completing its work.

There are really two different kinds of APIs that software engineers deal with. Local APIs are often used to interact with the operating system, databases, and I/O or other processes performing work on the local processor. For example, if a programmer wants to open a file, allocate some heap memory, start a transaction on the database, show some data on screen, or pick up a mouse movement, they make API calls in their programming language to perform the task. Often, the programmer does not even know they are calling various APIs because the programming language often hides the direct calls.

While those APIs are certainly useful (and sometimes really complex), in this book we are more interested in an alternate definition. The implementation of requests for remote processing can also be called APIs. One phrase we have seen when dealing with this kind of an API is object exchange protocol. Essentially, the API implements a protocol that specifies a way for a local process to request that a task be done from a remote location and defines the way the results of that task are returned to the local process. Included in the protocol are techniques for specifying the task to be done, the way to marshal the data to pass to the remote process, the way that information is actually conveyed to the remote location, the way the results are passed back, and the way those results should be interpreted.

As technical test analysts, it is not required that we understand all of the technical details in the implementation of these APIs. We do, however, have to understand how to test them in representative environments and in realistic ways.
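
As a flavor of what such a test can look like at its simplest, here is a sketch using libcurl to call a remote API and check the HTTP status it returns. The endpoint URL and the expected status code are assumptions made purely for illustration; a real API test suite would also validate the payload, error cases, timeouts, and security behavior.

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;
    long http_code = 0;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl == NULL) {
        return 1;
    }

    /* Hypothetical shipping-rate endpoint used only for illustration. */
    curl_easy_setopt(curl, CURLOPT_URL, "https://api.example.com/rates?zip=80302");
    res = curl_easy_perform(curl);                  /* send the request */

    if (res == CURLE_OK) {
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
        printf("HTTP status: %ld (%s)\n", http_code,
               http_code == 200 ? "PASS" : "FAIL");
    } else {
        printf("Request failed: %s\n", curl_easy_strerror(res));
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}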

APIs offer organization-to-organization connections that promote integration between them. Rather than having every group reinvent the wheel programmatically, cooperation between some aspects of different organizations can save money and resources for all involved. In some cases, a particular business may supply some kind of up-to-date information to other organizations to leverage its own business. For example, Amazon might leverage APIs from UPS, FedEx, and USPS, each of which allows immediate access to shipping rates and delivery time frames. This helps Amazon service its customers and helps the shipping companies compete more effectively for the business. In the old days, Amazon would likely use a printed rate schedule—updated periodically by the shipping companies—to review shipping information. With the use of APIs, a shipping company can run sales and give special rates to good customers, in effect trying to maximize its market share.

As noted earlier, there are no free lunches. Using APIs means that your organization is assuming all risks pertaining to those APIs. Should they fail, the damage is to your organization because the user does not care if it was a remote API that failed. They are going to blame the organization that they were dealing with directly. You can protest and point your finger at the real source of the original error, but it will not matter to the damaged user.

Those same connections that can provide synergy between organizations can also provide pain for all groups involved. In our example, an Amazon customer who is charged twice as much as expected, or does not receive the two-day delivery they expected, is not going to be aware that the real problem was caused by failure of the shipper’s API; they are going to blame Amazon.

Over the last 15 years, the use of APIs has increased astronomically among business organizations, especially on the Internet. As originally envisioned, the Internet was predominantly a static environment. A user requested a particular site by typing in a URL (or clicking a link) in their browser, and the website would send a static screen of information to that browser. In January 1999, Darcy DiNucci, an electronic information design consultant, prognosticated:

“The Web we know now, which loads into a browser window in essentially static screenfuls, is only an embryo of the Web to come. The first glimmerings of Web 2.0 are beginning to appear, and we are just starting to see how that embryo might develop. The Web will be understood not as screenfuls of text and graphics but as a transport mechanism, the ether through which interactivity happens. It will [...] appear on your computer screen, [...] on your TV set [...] your car dashboard [...] your cell phone [...] hand-held game machines [...] maybe even your microwave oven.”13

Most would say she understated the evolution of the Web. Because an image is worth a thousand words, here is the preceding quote presented graphically:

Image

Figure 2–16 The evolution of the Web

An organization is likely to use private, partner, and public APIs. Very often a business organization will use APIs to tie together the disparate systems after merging with another company. This can reduce the amount of work needed—rather than trying to merge all of the systems together physically.

Large organizations led the way in creating APIs. Some large enterprises have so many complex and interlocking APIs that they have to develop a way to manage them, developing a new architecture called the enterprise service bus (ESB).

In Figure 2–17, the exponential growth of public APIs can be seen.14

Image

Figure 2–17 The growth of public APIs

In 2013, it was estimated that there were over 10,000 publicly available APIs published, more than twice as many as in 2011. It was also estimated that businesses worldwide have millions of private APIs. Looking forward, it is only going to get more intense, with estimates of 30,000 public APIs by 2016 and even estimates of 1,000,000 by 2017.15 We may be coming to a time when most communication between machine and machine or human and machine will involve APIs. And they all need to be tested.

These figures are not included to startle readers. For testers, this should be seen as a huge opportunity. APIs are not easy to test, which means there is a growing need for skilled technical test analysts (since the heavy use of tools and automation is required to test APIs effectively). The fastest growing segment of computing worldwide is mobile computing. It should not be a surprise to find that mobile environments are substantially run by multiple layers of APIs.

When an API fails, an organization can be exposed to high levels of risk. Often APIs reveal security, performance, reliability, portability, and many other types of risks. Risks accrue to both the organization providing an API and the organizations consuming it.

Research performed by tool vendor Parasoft along with the Gartner group in 2014¹⁶ found the following:

  • Over 80 percent of those responding to the survey claimed they had stopped using one or more APIs because they were too buggy.

  • Top impacts of API issues reported:

    Development delay

    Increased customer support service costs

    Time to market delays

    Testing delays

    Lost business

  • 90 percent of respondents report that one or more APIs failed to meet their expectations.

  • 68 percent encountered reliability or functional issues.

  • 42 percent encountered security issues.

  • 74 percent encountered performance issues.

  • The most severe integrity issues encountered with APIs:

    61 percent reliability

    22.2 percent security

    16.7 percent performance

Clearly there may be problems with APIs that require intensive testing. The following issues are among those that must be considered when preparing to test APIs:

  • Security issues. This can be a huge problem with the use of APIs because they often have great access to the core data and functional areas of an organization’s computing center. The following problems can occur:

    Worm injections and other payload-based attacks.

    Ineffective authentication.

    Insufficient levels of encryption.

    Weak access control, allowing unauthorized access.

    Cloud issues. If the APIs are located in the cloud, security issues might even be worse since there is often little effective control over the physical network security.

  • Unexpected misuse. This may often be innocent due to poor documentation or programming errors by a consumer of the API. However, it might also be malicious, which leads us back to security testing.

  • Performance issues. There is a wide range of problems that may be detected through proper performance testing:

    How is the API projected to be used?

    What will happen if it suddenly becomes popular?

    Does the API exhibit high availability for worldwide 24/7 usage?

    Are there times when the usage might peak (e.g., a sports website delivering on-demand statistics during the final game of the World Cup)?

  • Portability issues. Your APIs will likely need to work across a variety of layers. They might be called directly, or they might be called by other APIs, which in turn are called by still other APIs controlled by an ESB.

  • Documentation. This must be tested for correctness. Since a wide range of developers may need to interact with your APIs, the documentation must provide sufficient information for all of them, and its quality is critical: data order, data types, formatting, and even capitalization must be correct.

  • Usability. The consumers of APIs are expected to be software developers; can they actually learn and understand how to use the APIs successfully? Clearly, this will partially depend on the documentation mentioned above.

  • Testing. Those using your API will likely want to test it with their own layers of APIs. This may require special environments (sandboxes) for them to play in. Do those work correctly without interfering with the production environment?

  • Timing issues. APIs are often developed by separate groups. Trying to make sure the schedules of all the participants are coherent is often like herding cats. That means that change becomes a constant during the development phase. One group makes some changes, and then those changes ripple through all of the development teams like a tsunami.

  • Discovery. The API is liable to manipulate resources, creating, listing, updating, and deleting them. Are those operations available and correct for all of the ways they may be performed?

  • Compliance. Many industries require that software be compliant with various regulations and standards. Regulations may come from governmental and quasi-governmental entities such as the FDA or EPA. Standards may come from international organizations or industry groups. For example, if the software processes credit or debit cards, it must be PCI compliant. Systems that handle health information in any way are subject to HIPAA compliance. APIs may be subject to service-level agreements (SLAs) or other technical agreements. Failure to meet such agreements may cost the organization a great deal in penalties or fines.

  • Error handling. Have you tested all the possible error conditions that may occur? Often errors are generated by arguments being passed in incorrectly (e.g., out of order) or out of range (see the sketch following this list).

  • Change management. Does your API depend on other APIs? When those change, how will you know? In addition, the data schema for an API is critical. Even a small change to a schema—for example, changing an integer into a real—may cause defects and failures to ripple far downstream. And changes are often rampant.

  • Failure management. If (and when) your API fails in production, how will you know it? When (and if) outside organizations stop using your API, will you be able to figure out why and when it occurred?

  • Asynchronicity. Because APIs are very loosely coupled, timing glitches and lost and duplicated transactions are a very common failure mode.

  • Combinatorial testing. Because APIs are often used in layers—both horizontally and vertically—there turns out to be many different ways to call them. Have you tested all of the boundary values concurrently?

  • Protocols used. Each protocol and messaging type through which the API is exposed must be covered by the tests.
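
Here is the sketch promised under the error-handling bullet: a minimal negative test written in C. Everything in it is invented for illustration (the function getAccountBalance(), its parameters, the error code, and the stub standing in for the real service); the point is simply that every documented error condition deserves an explicit test that checks both the returned error code and the untouched output parameter.

#include <assert.h>
#include <stdio.h>

#define ERR_INVALID_ACCOUNT  (-2)

/* Stub standing in for the real API so the sketch compiles on its own;
   in a real project this would be the deployed service or a test double. */
static int getAccountBalance(long accountId, double *balance)
{
    if (accountId <= 0)
        return ERR_INVALID_ACCOUNT;
    *balance = 100.0;   /* canned value for the happy path */
    return 0;
}

/* Negative test: an out-of-range argument must yield the documented error
   code and must not modify the caller's output parameter. */
int main(void)
{
    double balance = -999.0;                     /* sentinel value */
    int rc = getAccountBalance(-1L, &balance);   /* -1 is out of range */
    assert(rc == ERR_INVALID_ACCOUNT);
    assert(balance == -999.0);                   /* output left untouched */
    printf("invalid-account negative test passed\n");
    return 0;
}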

Experts note that the amount of time and resources needed to adequately test APIs is often the same as that needed to analyze and develop them. This can cause friction with managers who are used to much smaller test efforts relative to the development effort.

A recent study performed by a popular automation tool vendor17 found that the typical application under test had an average of 30 dependencies. Those dependencies include third-party applications, mainframe and mini-computers, enterprise resource planning packages (ERPs), databases, and APIs (private, partner, and public). In this interconnected world, the rules of testing are changing. Testing no longer can simply be done during the SDLC and then dropped, waiting for the next cycle.

Many enterprise applications are becoming increasingly interdependent, using APIs from private, partner, and public entities. The success of an end-to-end transaction increasingly becomes a function of the workings of the composite whole, with its integrity dependent on its weakest links.

Given all of the risks described above, how does an organization survive, or maybe even thrive, when testing APIs?

Automation of testing is an absolute requirement. Manual techniques simply do not allow enough testing to mitigate the risks noted above, given that a single API by itself may take hundreds of tests that might need to be repeated daily (or more often) as environments evolve, changes to data structures occur, and dependent APIs from partner and public sources mutate. This is especially true since often the APIs must be monitored in production to solve issues there as soon as they crop up (continuous regression testing in production).
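
To make "automation" concrete at the lowest level, here is a minimal sketch in C that uses the widely available libcurl library to call a single HTTP endpoint and turn the status code into a pass/fail exit code. The URL is a placeholder, and a real suite would run hundreds of such data-driven checks on every build and as production monitors.

#include <stdio.h>
#include <curl/curl.h>

/* One automated API check: request a placeholder endpoint and verify the
   HTTP status code. The exit code makes the check usable by any test
   runner or monitoring job. */
int main(void)
{
    long status = 0;
    CURLcode rc;
    CURL *curl;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl == NULL)
        return 2;                                   /* harness failure */

    curl_easy_setopt(curl, CURLOPT_URL, "https://api.example.com/v1/orders/42");
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);     /* HEAD request; we only want the status */
    rc = curl_easy_perform(curl);
    if (rc == CURLE_OK)
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);

    curl_easy_cleanup(curl);
    curl_global_cleanup();

    printf("HTTP status: %ld\n", status);
    return (rc == CURLE_OK && status == 200) ? 0 : 1;   /* pass/fail */
}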

Not only must each API be tested alone (often using simulators and/or mock objects), but the APIs also need to be tested together in end-to-end test scenarios. While this can be a huge effort, the alternative is to have untested scenarios actually cause failures in production.

The automation tools that are needed for API testing must have intuitive interfaces, automating as much of the test development as possible as well as the execution of the tests. The protocols used to define the APIs tend to be well developed and documented (if not fully mature). That allows many of the tools in the market today to deliver facilities for creating multiple tests to cover the protocols. This helps the technical test analyst create more and better tests in less time.

Tests on many platforms are required. The automation tools must be able to test across ESBs, databases, architectures, protocols, and messaging types. In order to service so many different platforms, virtualization is also an important requirement. Many APIs limit the number of times you can call them in a given period, or charge for each use. That means a tool must be able to simulate, on the fly (or at least with minimal manual effort), those resources that cannot be exercised directly.

In many ways, what we are describing here sounds a lot like unit testing. However, API testing is not unit testing. Despite the technical details we have discussed, API testing is still essentially black-box testing. Setting up the tests certainly requires technical skills, but at the end of the setup, a figurative button is pushed and the test runs, passing or failing. The reason for a failure is usually not discernable by running a debugger as a developer would in a unit test.

Unit tests are often trying to ensure that a chunk of code works in isolation. API testing must be designed to test the full functionality of the system—although, as noted earlier, we may need to use stubs and drivers as part of our test harness to allow for missing pieces. That makes the API tests much more thorough in their testing of broad scenarios and data flows.

The data used for API testing is a critical part of the overall picture. The data source should be static enough that running the tests cannot change it. For example, a common issue with regression testing via automation is that the data used for the testing tends to evolve as the tests run. If a regression set runs only partially, or must be reset, the data has changed and must be refreshed. In API testing, allowing the data to change during the testing would place an undue amount of pressure on the test analysts to maintain it. Consider that some APIs may call other APIs, themselves called by yet other APIs. Trying to make sure that a changing data store is coherent would quickly become an impossible task.

To conclude this topic, we have heard API testing—for all of the reasons mentioned—described as similar to juggling chainsaws while seated on a unicycle that is standing on a medicine ball. Attention to detail and great intestinal fortitude is required to succeed. All in a day’s work for most technical test analysts…

2.8 Selecting a Structure-Based Technique

Learning objectives

TTA-2.8.1 (K4) Select an appropriate structure-based technique according to a given project situation.

After looking at all of these structure-based techniques, we need to consider one more topic. How much of this stuff do we need to do on our project?

As with so many other topics in software testing, the correct answer is, It depends! The context of the software—how, where, when, how much, and who—will determine how deep we need to get into structural testing.

The more important the software, the more an organization will likely lose when a failure occurs.

The higher the risk—and cost—of failure, the more testing is needed. The more testing needed, the higher the cost in time and resources.

Somewhere, these three related but disjoint imperatives will come to a single sweet spot that will define the amount of testing we need to do. At least that was what we were told when we started testing. Often we find that illumination comes only in the rear-view mirror.

Sometimes the amount of testing, and the techniques we use, is defined not by the testers but by the legal standards or guidelines to which we are required to adhere. In this section, we will look at some standards that may force our choices.

The British Standards Institute produces the BS 7925/2 standard. It has two main sections, test design techniques and test measurement techniques. For test design, it reviews a wide range of techniques, including black-box, white-box, and others. It covers the following black-box techniques that were also covered in the Foundation syllabus:

  • Equivalence partitioning

  • Boundary value analysis

  • State transition testing

It also covers a black-box technique called cause-effect graphing, which is a graphical version of a decision table, and a black-box technique called syntax testing.

It covers the following white-box techniques that were also covered in the Foundation syllabus:

  • Statement testing

  • Branch and decision testing

It also covers some additional white-box techniques that were covered only briefly or not at all in the Foundation syllabus:

  • Data flow testing

  • Branch condition testing

  • Branch condition combination testing (similar to decision condition testing, discussed earlier)

  • Modified condition decision testing

  • Linear Code Sequence and Jump (LCSAJ) testing18

Rounding out the list are two sections, “Random Testing” and “Other Testing Techniques.” Random testing was not covered in the Foundation syllabus, but we’ll talk about the use of randomness in relation to reliability testing in a later chapter. The section on other testing techniques doesn’t provide any examples but merely talks about rules on how to select them.

You might be thinking, "Hey, wait a minute: that was too fast. Which of those do I need to know for the Advanced Level Technical Test Analyst exam?" The answer is in two parts. First, you need to know any test design technique that was covered in the Foundation syllabus. Such techniques may be covered on the Advanced Level Technical Test Analyst exam. Second, we have covered in detail, earlier in this chapter, all the white-box test design techniques that might be on the Advanced Level Technical Test Analyst exam.

BS 7925/2 provides one or more coverage metrics for each of the test measurement techniques. These are covered in the measurement part of the standard. The choice of organization for this standard is curious indeed because there is no clear reason why the coverage metrics weren’t covered at the same time as the design techniques.

However, from the point of view of the ISTQB fundamental test process, perhaps it is easier that way. For example, our entry criteria might require some particular level of test coverage, as it would if we were testing safety-critical avionics software subject to the United States Federal Aviation Administration’s standard DO-178C. (We touched on that standard when we covered MC/DC coverage earlier.) So, during test design, we’d employ the required test design techniques. During test implementation, we’d use the test measurement techniques to ensure adequate coverage.

In addition to these two major sections, this document also includes two annexes. Annex B brings the dry material in the first two major sections to life by showing an example of applying them to realistic situations. Annex A covers process considerations, which is perhaps closest to our area of interest here. It discusses the application of the standard to a test project, following a test process given in the document. To map that process to the ISTQB fundamental test process, we can say the following:

  • Test analysis and design along with test implementation in the ISTQB process is equivalent to test specification in the BS 7925/2 process.

  • BS 7925/2 test execution, logically enough, corresponds to test execution in the ISTQB process. Note that the ISTQB process includes test logging as part of test execution, while BS 7925/2 has a separate test recording process.

  • Finally, BS 7925/2 has checking for test completion as the final step in its process. That corresponds roughly to the ISTQB’s evaluating test criteria and reporting.

Finally, as promised, let’s complete our discussion about the DO-178C standard. This standard is promulgated by the United States Federal Aviation Administration (US FAA). As noted, it covers avionics software systems. In Europe, it’s called ED-12B. The standard assigns a criticality level, based on the potential impact of a failure. Based on the criticality level, a certain level of white-box test coverage is required, as shown in Table 2–20.

Table 2–20 FAA DO-178C mandated coverage

Image

Let us explain Table 2–20 a bit more thoroughly:

Criticality level A, or Catastrophic, applies when a software failure can result in a catastrophic failure of the system. A catastrophic failure is defined as:

Failure conditions which would result in multiple fatalities, usually with the loss of the airplane.

For software with such criticality, the standard requires modified condition/decision, decision, and statement coverage. Interestingly, for this level of criticality, structural coverage may only be applied to the source code if it can be shown that the source maps to the compiled object code. We discussed the possibility of the compiled code not having a one-to-one relationship with the source code earlier, when we discussed path coverage.

Criticality level B, or Hazardous and Severe, applies when a software failure can result in a hazardous, severe, or major failure of the system. The standard defines this level of criticality as:

Failure conditions which would reduce the capability of the airplane or the ability of the flight crew to cope with adverse operating conditions to the extent that there would be:

(1) A large reduction in safety margins or functional capabilities,

(2) Physical distress or excessive workload such that the flight crew cannot be relied upon to perform their tasks accurately or completely, or

(3) Serious or fatal injuries to a relatively small number of the occupants other than the flight crew.

For software with such criticality, the standard requires decision and statement coverage.

Criticality level C, or Major, applies when a software failure can result in a major failure of the system. The standard defines this as:

Failure conditions which would reduce the capability of the airplane or the ability of the crew to cope with adverse operating conditions to the extent that there would be, for example, a significant reduction in safety margins or functional capabilities, a significant increase in crew workload or in conditions impairing crew efficiency, or discomfort to flight crew, or physical distress to passengers or cabin crew, possibly including injuries.

For software with such criticality, the standard requires only statement coverage.

Criticality level D, or Minor, applies when a software failure can only result in a minor failure of the system. The standard defines this as:

Failure conditions which would not significantly reduce airplane safety, and which would involve crew actions that are well within their capabilities. Minor failure conditions may include, for example, a slight reduction in safety margins or functional capabilities, a slight increase in crew workload, such as, routine flight plan changes, or some physical discomfort to passengers or cabin crew.

For software with such criticality, the standard does not require any level of coverage.

Finally, criticality level E, or No Effect, applies when a software failure cannot have an effect on the system. The standard defines this as:

Failure conditions that would have no effect on safety; for example, failure conditions that would not affect the operational capability of the airplane or increase crew workload.

These categories make a certain amount of sense. You should be more concerned about software that affects flight safety, such as rudder and aileron control modules, than about software that doesn’t, such as video entertainment systems. However, there is a risk of using a one-dimensional white-box measuring stick to determine how much confidence we should have in a system. Coverage metrics are a measure of confidence, it’s true, but we should use multiple coverage metrics, both white-box and black-box. Rest assured that DO-178C does require that requirements-based black-box testing be done in addition to the structural testing we have called out here. It also includes a raft of other QA techniques, including requirements and design reviews, unit testing, and static analysis.

Just to make sure we have hit all of the terminology correctly, the United States FAA DO-178C standard bases the extent of testing—measured in terms of white-box code coverage—on the potential impact of a failure. That makes DO-178C a risk-aware testing standard.

Another interesting example of how risk management, including quality risk management, plays into the engineering of complex and/or safety-critical systems is found in the IEC 61508 standard, which is mentioned in the Advanced syllabus. This standard applies to embedded software that controls systems with safety-related implications, as you can tell from its title, “Functional safety of electrical/electronic/programmable electronic safety-related systems.”

The standard is very much focused on risks. Risk analysis is required. It considers two primary factors as determining the level of risk: likelihood and impact. During a project, we are to reduce the residual level of risk to a tolerable level, specifically through the application of electrical, electronic, or software improvements to the system.

The standard has an inherent philosophy about risk. It acknowledges that we can’t attain a level of zero risk—whether for an entire system or even for a single risk item. It says that we have to build quality, especially safety, in from the beginning, not try to add it at the end. Thus we must take defect-preventing actions like requirements, design, and code reviews.

As we were working on the second edition of this book, a news article was published* claiming that hackers could indeed “hack the satellite communications equipment on passenger jets through the WiFi and inflight entertainment systems.” The cyber security researcher, Ruben Santamarta, claims that the entertainment systems are wide open and vulnerable. Representatives for the company that manufactures the satellite equipment claim that such a hacker must have physical access to the equipment, so corruption is unlikely since “…there are strict requirements restricting access to authorized personnel only.” This does go to show, however, that even with the mandated coverage such as shown by Table 2–20, security is rarely as good as we might hope, and technical test analysts need to be ever vigilant for the possibility of a breach.

* http://www.reuters.com/article/2014/08/04/us-cybersecurity-hackers-airplanes-idUSKBN0G40WQ20140804

The standard also insists that we know what constitutes tolerable and intolerable risks and that we take steps to reduce intolerable risks. When those steps are testing steps, we must document them, including a software safety validation plan, software test specifications, software test results, software safety validation, verification report, and software functional safety report.

The standard addresses the author-bias problem. As discussed in the Foundation syllabus, this is the problem with self-testing, the fact that you bring the same blind spots and bad assumptions to testing your own work that you brought to creating that work. So the standard calls for tester independence, indeed insisting on it for those performing any safety-related tests. And since testing is most effective when the system is written to be testable, that’s also a requirement.

The standard has a concept of a safety integrity level (or SIL), which is based on the likelihood of failure for a particular component or subsystem. The safety integrity level influences a number of risk-related decisions, including the choice of testing and QA techniques.

Some of these test design techniques we have already discussed in this chapter; others fall under the Advanced Test Analyst purview and are discussed in Rex’s book Advanced Software Testing - Vol. 1: Guide to the ISTQB Advanced Certification as an Advanced Test Analyst. A few of the topics, such as dynamic analysis, performance testing, static analysis, and test automation, we will cover in upcoming chapters.

Again, depending on the safety integrity level, the standard might require various levels of testing. These levels include module testing, integration testing, hardware-software integration testing, safety requirements testing, and system testing.

If a level of testing is required, the standard states that it should be documented and independently verified. In other words, the standard can require auditing or outside reviews of testing activities. In addition, continuing on with that theme of “guarding the guards,” the standard also requires reviews for test cases, test procedures, and test results along with verification of data integrity under test conditions.

The standard requires the use of structural test design techniques such as discussed in this chapter. Structural coverage requirements are implied, again based on the safety integrity level. (This is similar to DO-178C.) Because the desire is to have high confidence in the safety-critical aspects of the system, the standard requires complete requirements coverage not once but multiple times, at multiple levels of testing. Again, the level of test coverage required depends on the safety integrity level.

Structure-Based Testing Exercise

Using the code in Figure 2–18, answer the following questions:

1. How many test cases are needed for basis path coverage?

2. If we wanted to test this module to the level of multiple condition coverage (ignoring the possibility of short-circuiting), how many test cases would we need?

3. If this code were in a system that was subject to FAA/DO178C and was rated at Level A criticality, how many test cases would be needed for the first if() statement alone?

4. To achieve only statement coverage, how many test cases would be needed?

Image

Figure 2–18 Code for structure-based exercise

2.8.1 Structure-Based Testing Exercise Debrief

1. How many test cases are needed for basis path coverage?

The control graph should appear as shown in Figure 2–19.

The number of test cases we need for cyclomatic complexity can be calculated in three different ways. Well, maybe. The region method clearly is problematic because of the way the graph is drawn. This is a common problem when drawing directed control flow graphs; there is a real art to it, especially when working under time pressure.

Image

Figure 2–19 Directed control flow graph

Let’s try McCabe’s cyclomatic complexity formula:

C = E - N + 2

C = 15 - 9 + 2

C = 8

Alternately, we could try the rule of thumb method for finding cyclomatic complexity: count the decisions and add one. There are decisions made in the following lines:

1. Line 3: if

2. Line 7: if (remember, the else in line 10 is not a decision)

3. Line 12: while

4. Line 14: if

5. Line 16: if

6. Line 23: else if (this is really just an if—this is often called a ladder if construct)

7. Line 25: if

Therefore, the calculation is as follows:

C = 7 + 1

C = 8

All this means we have eight different test cases, as follows:

1. ABI

2. ABCI

3. ABCDI

4. ABCDEFI

5. ABCDEGI

6. ABCDEGHI

7. ABCDEFDI

8. ABCDEGHDI

At this point, you’ll need to develop test cases that will force execution through those paths. When developing tests with McCabe’s technique, you will often find that some paths are not achievable. For example, path ABCDI, in which the body of the while() loop is never executed, should be impossible, since the value of branch is initialized to root, which we already know is not equal to zero at this point in the code. You can try to use a debugger to force the flow to occur, but no ordinary set of input values will work.

2. If we wanted to test this module to the level of multiple condition coverage (ignoring the possibility of short-circuiting), how many test cases would we need?

The first conditional statement looks like this:

(! (isReq() && isBeq() && (isNort() || (isFlat() && isFreq()))))

Each of these functions would likely be a private method of the class we are working with (ObjTree). We can rewrite the conditional to make it easier to understand.

(!(A && B && (C || (D && E))))

A good way to start is to come up with a truth table that covers all the possibilities that the atomic conditions can take on. With five atomic conditions, there are 32 possible combinations, as shown in Table 2–21. Of course, you could just calculate that as 2⁵. But since we also want to discuss how many test cases we would need if this code were written in a language that does short-circuit Boolean evaluation (it is!), we’ll show the full table here.

Table 2–21 Truth table for five atomic conditions

Image

Thirty-two separate test cases would be needed. Note that only 5 of these 32 combinations make the inner expression evaluate to TRUE. That evaluation is then negated (see the negation operator in the code), so only those 5 test cases would survive the first if() and move on to line 7 of the code. We would still need to achieve decision coverage on the other conditionals in the code (more on that later).
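
If you would rather let the computer grind out the table, the following C sketch (ours, not part of the exercise) enumerates all 32 combinations of the rewritten condition and counts the TRUE outcomes; it confirms the count of 5 given above.

#include <stdio.h>

/* Enumerate all 32 combinations of the five atomic conditions and evaluate
   A && B && (C || (D && E)). C still short-circuits the && and ||, but
   because every operand is a plain variable the computed truth value is the
   same as a full evaluation. */
int main(void)
{
    int trueCount = 0;
    for (int i = 0; i < 32; i++) {
        int A = (i >> 4) & 1, B = (i >> 3) & 1, C = (i >> 2) & 1;
        int D = (i >> 1) & 1, E = i & 1;
        int result = A && B && (C || (D && E));
        if (result) trueCount++;
        printf("%d%d%d%d%d -> %d\n", A, B, C, D, E, result);
    }
    printf("TRUE outcomes: %d of 32\n", trueCount);   /* prints 5 */
    return 0;
}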

3. If this code were in a system that was subject to FAA/DO178C and was rated at Level A criticality, how many test cases would be needed for the first if() statement alone?

The standard that we must meet mentions five different levels of criticality.

Table 2–22 FAA/DO178C software test levels

Image

This means that we must achieve MC/DC level coverage for the first if() statement. Note that there are no other compound conditional statements in the other decisions, so we can ignore them; decision coverage will cover them.

In order to achieve MC/DC coverage, we must ensure the following:

A. Each atomic condition is evaluated both ways (T/F).

B. Decision coverage must be satisfied.

C. Each atomic condition must be able to affect the outcome independently while the other atomic conditions are held constant.

The expression we are concerned with follows:

(!(A && B && (C || (D && E))))

We will start out by ignoring the leading negation, shown by the exclamation mark at the beginning. We don’t know about you, but inversion logic always gives us a headache. Since we are inverting the entire expression (check out the parentheses to see that), we can simply work with the inner expression; negating it flips every outcome but does not change which pairs of tests show that an atomic condition independently affects the result. First, let’s figure out how to make sure each atomic condition can affect the outcome independently. We will create a template table showing the individual atomic conditions and then work with that to identify neutral values for the tests.

Table 2–23 Template for MC/DC test creation

Image

Let’s find the neutral values we will use.

(A && B && (C || (D && E)))

A: A is ANDed with B, therefore its neutral value is T.

B: B is ANDed with A, therefore its neutral value is T.

C: C is ORed with (D && E), therefore its neutral value is F.

D: D is ANDed with E, therefore its neutral value is T.

E: E is ANDed with D, therefore its neutral value is T.

Note that when we want (C || (D && E)) to equal T, we can use the following combinations:

T F F

T T T

T T F

T F T

F T T

Likewise, if we want to make it equal to F, we can use the following combinations:

F T F

F F T

F F F

We may need to substitute these variations to get down to the theoretical minimum number of tests, which is 6 (n+1). Substituting these values into Table 2–23 gives us Table 2–24.

Table 2–24 Test values for MC/DC

Image

It took us less than 4 minutes to come up with these values. To double check:

Cell (2,2) → T. But if A is changed to F → F
Cell (2,3) → F. But if A is changed to T → T
Cell (3,2) → T. But if B is changed to F → F
Cell (3,3) → F. But if B is changed to T → T
Cell (4,2) → T. But if C is changed to F → F
Cell (4,3) → F. But if C is changed to T → T
Cell (5,2) → T. But if D is changed to F → F
Cell (5,3) → F. But if D is changed to T → T
Cell (6,2) → T. But if E is changed to F → F
Cell (6,3) → F. But if E is changed to T → T
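
The independence pairs can also be found mechanically. The following C sketch (again ours, not part of the exercise) walks the full truth table of the rewritten condition and, for each atomic condition, prints one pair of rows that differ only in that condition yet produce different outcomes, which is exactly the evidence MC/DC requires.

#include <stdio.h>

/* The decision from the exercise, rewritten with A through E; the leading
   negation is ignored here because it flips every outcome equally and so
   does not change which pairs demonstrate independent effect. */
static int decision(int A, int B, int C, int D, int E)
{
    return A && B && (C || (D && E));
}

int main(void)
{
    const char names[] = "ABCDE";
    for (int bit = 4; bit >= 0; bit--) {          /* bit 4 = A ... bit 0 = E */
        for (int i = 0; i < 32; i++) {
            int j = i ^ (1 << bit);               /* flip only this condition */
            int r1 = decision((i >> 4) & 1, (i >> 3) & 1, (i >> 2) & 1,
                              (i >> 1) & 1, i & 1);
            int r2 = decision((j >> 4) & 1, (j >> 3) & 1, (j >> 2) & 1,
                              (j >> 1) & 1, j & 1);
            if (r1 != r2) {                       /* independent effect shown */
                printf("%c: row %2d (%d) vs row %2d (%d)\n",
                       names[4 - bit], i, r1, j, r2);
                break;                            /* one pair per condition is enough */
            }
        }
    }
    return 0;
}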

4. To achieve only statement coverage, how many test cases would be needed?

Okay, so this one is relatively simple! You could trace through the code and determine the number. Or, you could notice that the way it is written, statement coverage is going to be the same as decision coverage. How do we know that?

The first if() (line 3) needs to be tested both ways to get statement coverage. When TRUE, it does a quick return, which must be executed. When FALSE, we get to go further in the code.

The second if() (line 7), third (line 14), fourth (line 16), and sixth (line 25) each has a statement in both the TRUE and FALSE directions.

The else if() (line 23) is the else for the if() (line 14) and has an else of its own, both of which have code that needs to execute for statement coverage.

The only confusing piece of this answer is the while() (line 12). The way the code is written, the condition should never be able to evaluate to FALSE. That means we would have to use a debugger to change the value of branch to test it.

Since we have already gone through getting decision coverage in answer 3, we will accept that answer.

2.9 A Final Word on Structural Testing

Before we leave structural testing, we should note that Boris Beizer made a sharp argument, seemingly against doing white-box structure-based testing, in his book Software Testing Techniques; we used his second edition, published in 1990. We think he expresses it better than we can, so it is included here:

Path testing is more effective for unstructured rather than structured code.

Statistics indicate that path testing by itself has limited effectiveness for the following reasons:

  • Planning to cover does not mean you will cover—especially when bugs are present.

  • It does not show totally wrong or missing functionality

  • Interface errors between modules will not show up in unit testing

  • Database and data-flow errors may not be caught

  • Incorrect interaction with other modules will not be caught in unit testing

  • Not all initialization errors can be caught by control flow testing

  • Requirements and specification errors will not be caught in unit testing.

After going through all of those problems with structural testing, Beizer goes on to say:

Creating the flowgraph, selecting a set of covering paths, finding input data values to force those paths, setting up the loop cases and combinations—it’s a lot of work. Perhaps as much work as it took to design the routine and certainly more work than it took to code it. The statistics indicate that you will spend half your time testing it and debugging; presumably that time includes the time required to design and document test cases. I would rather spend a few quiet hours in my office doing test design than twice those hours on the test floor debugging, going half deaf from the clatter of a high-speed printer that’s producing massive dumps, the reading of which will make me half blind. Furthermore, the act of careful, complete, systematic test design will catch as many bugs as the act of testing. … The test design process, at all levels, is at least as effective at catching bugs as is running the test designed by that process.

2.10 Sample Exam Questions

1. Which of the following statements correctly represents the relative strength of different structural levels of testing?

A. 100 percent condition coverage guarantees 100 percent statement coverage.

B. 100 percent statement coverage guarantees 100 percent branch coverage but not 100 percent decision coverage.

C. 100 percent condition coverage does not guarantee 100 percent branch coverage.

D. 100 percent decision condition coverage does not guarantee 100 percent statement coverage.

2. Given the following predicate, which test would achieve decision condition coverage for this construct of code with a minimum number of tests? Assume a test contains 3 factors—values for A, B, and C. T == TRUE, F == FALSE.

while ((A || B) && (C < 100)) {
   D++;
}

A. (T,T,98); (F,T,101); (T,F,100)

B. (T,T,99); (F,F,0)

C. (T,F,579); (F,T,-63)

D. (T,T,0); (F,F,50); (T,F,100)

3. Using the following neutral table, select which set of tests gives MC/DC level coverage for the illustrated predicate. Assume no short-circuiting by the compiler.

Image

A. (T,T,F,F)(F,F,T,T)(F,T,F,T)(T,F,F,F)(F,F,T,F)

B. (T,T,T,T)(F,F,F,F)(T,T,F,F)(F,F,T,T)

C. (T,T,F,F)(T,F,T,T)(F,T,F,T)(T,T,T,F)(T,T,F,T)

D. (F,T,F,T)(F,T,F,F)(F,T,T,T)(T,F,T,F)

4. You have been asked to test your C code to either MC/DC or multiple condition-level coverage. You have a specific piece of code, shown below. How many test cases would it take to test to MC/DC-level coverage? How many would it take to test to multiple condition-level coverage? Assume no short-circuiting by the compiler.

if ((!(a&&b)) || (c > d)) {
   z = func(a,b,c,d);
}
else {
   z = func(b,a,d,c);
}

A. MC/DC: 5, multiple condition: 16

B. MC/DC: 4, multiple condition: 8

C. MC/DC: 3, multiple condition: 4

D. MC/DC: 5, multiple condition: 8

5. The following C code function will allow a browser to connect to a given website.


#include<windows.h>
#include<wininet.h>
#include<stdio.h>
int main()
{


   HINTERNET Initialize,Connection,File;
   DWORD dwBytes;
   char ch;
   Connection = InternetConnect(Initialize,"www.xxx.com",
                INTERNET_DEFAULT_HTTP_PORT,NULL,NULL, 
                INTERNET_SERVICE_HTTP,0,0);


   File = HttpOpenRequest(Connection,NULL,"/index.html",
          NULL,NULL,NULL,0,0);


   if(HttpSendRequest(File,NULL,0,NULL,0))
   {
      while(InternetReadFile(File,&ch,1,&dwBytes))
      {
         if(dwBytes != 1)break;
         putchar(ch);
      }
   }
   InternetCloseHandle(File);
   InternetCloseHandle(Connection);
   InternetCloseHandle(Initialize);
   return 0;
}

What is the minimum number of test cases that would be required to achieve statement coverage for this code?

A. 1

B. 2

C. 4

D. 6

6. Given the following snippet of code, which of the following values for the variable Counter will give loop coverage with the fewest test cases?

...
for (i=0; i<=Counter; i++) {
   Execute some statements;
}

A. (-13, 0,1,795)

B. (-1,0,1)

C. (0,1,1000)

D. (-7,0,500)

7. In a module of code you are testing, you are presented with the following if() statement. How many different test cases would you need to achieve multiple condition coverage (assume no short-circuiting by the compiler)?

if (A && B || (Delta < 1) && (Up < Down) || (Right >= Left)) {
   Execute some statements;
}
else {
   Execute some statements;
}

A. 24

B. 32

C. 48

D. 128

8. The following code snippet reads through a file and determines whether the numbers contained are prime or not.

Table 2–25 Code snippet

1  Read (Val); 
2  While NOT End of File Do 
3       Prime := TRUE; 
4       For Holder := 2 TO Val DIV 2 Do 
5            If Val - (Val DIV Holder)*Holder= 0 Then 
6                 Write (Holder, ` is a factor of', Val); 
7                 Prime := FALSE; 
8            Endif; 
9       Endfor; 
10      If Prime = TRUE Then 
11           Write (Val , ` is prime'), 
12      Endif; 
13      Read (Val); 
14  Endwhile; 
15  Write('End of run')

Calculate the cyclomatic complexity of the code.

A. 3

B. 5

C. 7

D. 9

9. Given Beizer’s theories of path coverage, the flow below (Figure 2–20), and the tests already listed, which of the following paths through the code would best complete the set of tests required?

Image

Figure 2–20 Flow chart

Test case 1: a,e,g

Test case 2: a,b,d,e,g

Test case 3: a,b,d,e,f

A. a,b,c,b,d,e,f

B. a,e,f

C. a,b,c,b,c,e,g

D. a,b,c,e,f

10. Which of the following API quality risks is primarily due to loose coupling?

A. Security risks

B. Lost transactions

C. Low availability

D. Incorrect parameter marshalling

11. You are testing code that controls the radar system in a Boeing 777. The code is part of the completely redundant backup system and hence is judged to be Level B criticality according to standard ED-12B. To which two of the following levels of structural coverage would it be necessary to test this code?

A. 100 percent MC/DC coverage

B. 100 percent branch coverage

C. 100 percent statement coverage

D. 100 percent condition decision coverage

E. Multiple condition coverage
