Chapter 3. Exploratory Testing in the Small

“Any sufficiently advanced bug is indistinguishable from a feature.”

Rich Kulawiec

So You Want to Test Software?

The above quote is one of my favorites and captures much of the complexity of software testing in a single sentence. If we can’t tell features from bugs, how can we possibly do a good job of testing? If the product’s specification and documentation aren’t good enough to tell bugs from features, isn’t testing impossible? If the symptoms of failure are so subtle as to evade both automated and manual attempts to expose them, isn’t testing useless?

Imagine the following job description and ask yourself whether you would apply for it:

Software tester wanted. Position requires comparing an insanely complicated, poorly documented product to a nonexistent or woefully incomplete specification. Help from original developers will be minimal and given grudgingly. Product will be used in environments that vary widely with multiple users, multiple platforms, multiple languages, and other requirements yet unknown but just as important. We’re not quite sure how to define them, but security and performance are paramount, and post release failures are unacceptable and could cause us to go out of business.

Okay, so it’s tongue-in-cheek, but it is close enough to the mark that I bet anyone who has been in the industry long enough can appreciate its accuracy. For those of you lucky enough that this seems like a foreign concept, you have my congratulations.

Testing such a complex product as software against incomplete expectations for nebulous quality concerns seems like an impossible ambition. Indeed, the lack of good information makes testing a lot harder than it has to be, and all testers suffer from this. However, we’ve been testing software for several decades, and notwithstanding the bugs shown in Chapter 1, “The Case for Software Quality,” that software has managed to change the world. Clearly, there is a lot we know about testing.

So what is it that software testers actually do when they approach this impossible task? Well, the first step is to appreciate the enormity and complexity of testing. Approaching it lightly and assuming that it will be easy is a really great way to fail. Admitting that no matter what you do, it will be inadequate is the right beginning attitude. Testing is infinite; we’re never really done, so we must take care to prioritize tasks and do the most important things first. The goal is to get to the point that when the software is released, everything we have not done is less important than everything we have done. If we achieve this, we help minimize the risk of releasing too early.

Testing is about making choices. It’s about understanding the complexities involved in running tests and analyzing the information available to help us choose between the many possible variables inherent in the testing process. This chapter is about those choices in the small. It covers the little choices that exploratory testers make as they explore an application’s functionality, from how to decide which inputs to enter into a text box and how to interpret error messages, to understanding the relationship between prior inputs and those you might choose to enter later. In subsequent chapters, we discuss larger issues of exploration, but we first need to acquire the tools necessary to make the small decisions wisely.

The one nice thing about exploratory testing in the small is that there isn’t a lot of information necessary to perform these tasks. In-the-small testing is really about encapsulating testing experience and expertise with knowledge of how software is composed and how it executes in its operational environment so that we can make good choices during testing. These are very tactical techniques meant to solve small problems that every tester faces many times every day. They are not intended as a complete testing regime or even particularly useful for overall test case design. Those in-the-large issues are presented in the next two chapters.

The information presented in this chapter breaks choices into five specific properties of software that an exploratory tester must reason about as she tests: inputs, state, code paths, user data, and execution environment. Even taken individually, each of these presents a testing problem too large to solve with finite resources. Taken as a whole, the process of testing is mind-bogglingly enormous. Thankfully, there is a great deal of guidance about how to approach this problem, and this chapter presents a collection of specific tactics that describe this guidance and how to use it to make in-the-small testing decisions.

Testing Is About Varying Things

Testers are tasked with answering questions such as the following:

• Will the software work as designed?

• Will the software perform the functions for which the user bought it?

• Will the software perform these functions fast enough, secure enough, robust enough, and so on?

Testers achieve these tasks by putting the software in some specific operational environment and then applying input that in some way mimics expected usage. This is where the trouble starts and the whole problem of infinity hits well-meaning testers square in the face. There are too many inputs to apply them all. There are too many environments to replicate them all. In fact, the number of variables of which there are “too many” is disturbing. This is why testing is about varying things. We must identify all the things that can be varied during a test and make sure we make smart decisions when we choose specific variations and exclude (out of necessity) other variations.

This reduces testing to a task of selecting a subset of inputs (and environments, etc.), applying those, and then trying to infer that this is somehow good enough. Eventually, the software has to ship, and testing can no longer impact the shipping code. We have a finite time to perform what is ultimately an infinite task. Clearly, our only hope lies in how we select those things that we can vary. We select the right ones, and we help ensure a good product. We select the wrong ones, and users are likely to experience failures and hate our software. A tester’s task is crucial and impossible at the same time!

One can easily see that completely ad hoc testing is clearly not the best way to go about testing. Testers who learn about inputs, software environments, and the other things that can be varied during a test pass will be better equipped to explore their application with purpose and intent. This knowledge will help them test better and smarter and maximize their chances of uncovering serious design and implementation flaws.

User Input

Imagine testing a huge application like Microsoft Office or a feature-rich website like Amazon. There are so many possible inputs and input combinations we could potentially apply that testing them all simply isn’t an option.

It turns out that it is even harder than it seems. Wherever testers turn, infinite possibilities hit them head-on. The first of these infinite sets is inputs.

What You Need to Know About User Input

What is an input? A general definition might be something like this:

An input is a stimulus generated from an environment that causes the application under test to respond in some manner.

This is very informal but good enough for our purposes. The key point is that an input originates from outside the application and causes some code in the application to execute. A user clicking a button is an input, but typing text into a text box is not until that text is actually passed to the application and the application gets the opportunity to process it.1 Inputs must cause the software to execute and respond in some manner (including the null response).

1 This is assuming that the text box is separate from the application under test and can be legitimately viewed as a preprocessor of inputs. Of course, you may specifically want to test the functionality of the text box, in which case everything you type into it is an atomic input. It all depends on how you view the scope of the application.

Inputs generally fall into two categories: atomic input and abstract input. Things such as button clicks, strings, and the integer value 4 are atomic inputs; they are irreducible, single events. Some atomic inputs are related, and it is helpful to treat them as abstract input for the purposes of test selection. The integer 4 and the integer 2048 are both specific values (that is, atomic input). However, a tester could have chosen 5 or 256 to enter instead. It makes better sense to talk about such inputs in abstract terms so that they can be reasoned about as a whole. We can, for example, talk about an abstract input length for which we could enter any of the atomic values from 1 to 32768.

Variable input requires abstraction because of the large number of possible values that variable input can assume. Positive integers, negative integers, and character strings (of any significant length) are all practically infinite in that during a given testing cycle, we cannot apply them all. Without such exhaustive testing, we cannot ensure the software will process them all correctly.2

2 Treating two or more atomic inputs the same is known as equivalence classing those inputs. The idea is that there is no reason to submit the atomic input 4 and then separately submit 2, because they are in the same equivalence class. If you test one, you don’t need to test the other. I once heard a consultant claim that equivalence classing for testing was a myth (or maybe it was an illusion, I don’t recall). His claim was that you can’t tell whether 2 and 4 are the same or different until you apply them both. From a completely black box point of view, this is technically true. But common sense would have to be completely abandoned to actually plan your tests around such a narrow view of the world. Why not check the source code and find out for sure? If the inputs cause the same code path to be executed, and both fit into their target data structures, they can be treated as equivalent for testing purposes. Don’t allow stubbornness to force you into testing the same paths over and over without any real hope of finding a bug or exploring new territory.

Any specific application accepts an arbitrarily large number of atomic inputs. Applying them all is unlikely, so from an input point of view, testing is about selecting a subset of possible inputs, applying them, and hoping that the subset will cause all the failures to surface and that we can assume the software is good enough when the other, previously untested inputs are submitted by actual users. To do this well, software testers must hone their skills in selecting one input as a better test than another input. In this and subsequent chapters, we talk about strategies to accomplish this.
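The equivalence-class idea from the footnote above can be sketched in a few lines of code. This is an illustration rather than a prescription; the field’s bounds and the class names are hypothetical:

```python
def equivalence_classes(values, classifier):
    """Group atomic inputs into classes; one member stands in for the rest."""
    classes = {}
    for v in values:
        classes.setdefault(classifier(v), []).append(v)
    return classes

# Hypothetical classifier for a field that accepts lengths from 1 to 32768:
def classify_length(n):
    if n < 1:
        return "below range"
    if n > 32768:
        return "above range"
    return "in range"

classes = equivalence_classes([0, 1, 2, 4, 2048, 32768, 32769], classify_length)
# Pick one representative per class instead of testing every atomic value.
representatives = {name: members[0] for name, members in classes.items()}
```

Note that the classifier encodes an assumption about the implementation; as the footnote suggests, checking the source to confirm that all in-range values really take the same code path is what makes the classing defensible.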

But it gets harder than that. If all we had to worry about was the selection of a set of atomic inputs, testing would be much easier than it actually is. Two additional problems complicate input selection far more.

The first is the fact that inputs can team up on software to cause it to fail. The combination of two or more inputs can often cause an application to fail even when the software has no problem with those inputs individually. You may perform a search for CDs just fine. You may independently perform a search for videos just fine. But when you search for both CDs and videos, the software goes pear-shaped. Testers must be able to identify which inputs interact with one another and ensure they appear together in a single test case to have confidence that these behaviors are properly tested.

Finally, inputs can also cause problems depending on the order in which they are applied. Inputs a and b can be sequenced ab, ba, aa, or bb. We could also apply three or more consecutive inputs, which creates even more sequences (aaa, aab, aba, ...). And when more than two inputs are involved, there are even more sequence choices. If we leave out any specific sequence, it may very well be the one that causes a failure. We can order a book and check out, or we can order two books and check out, or we may choose to check out and then add to our order and check out a second time. There are too many options to contemplate (much less test) them all. Testers must be able to enumerate likely input sequences and ensure they get tested to have confidence that the software is ready for real users. Again, this is a topic we turn to in this and subsequent chapters.
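The sequence explosion described above is easy to demonstrate. A minimal sketch, where the inputs `a` and `b` are placeholders for real operations such as “order a book” and “check out”:

```python
from itertools import product

def input_sequences(inputs, length):
    """Enumerate every ordering of the given inputs at a fixed sequence length."""
    return ["".join(seq) for seq in product(inputs, repeat=length)]

pairs = input_sequences("ab", 2)    # aa, ab, ba, bb
triples = input_sequences("ab", 3)  # aaa, aab, aba, ... (8 in all)
```

With just three distinct inputs and sequences of length four, the count is already 81; real applications have far more inputs and no fixed upper bound on sequence length, which is why enumerating the *likely* sequences matters.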

How to Test User Input

The cursor sits in a text box, happily blinking away waiting for an input to be entered. Every tester faces this situation many times during the course of a day’s testing. What do you do? What strategy do you employ to decide on one input over another? What are all the considerations? It never ceases to amaze me that there is no one place a new tester can go to learn these strategies. It also amazes me that I can ask 10 testers what they would do and get 12 different answers. It’s time these considerations get documented, and this is my attempt to do so.

The place to begin is to realize that your software is not special. There is a temptation for testers to imagine that the software they test is somehow different from every other set of bits ever assembled into a contiguous binary. This simply is not the case. All software, from operating systems, APIs, device drivers, memory-resident programs, embedded applications, and system libraries, to web applications, desktop applications, form-based UIs, and games, performs four basic tasks: It accepts input, produces output, stores data, and performs computation.

Applications may exist in vastly different operational environments. Inputs may be constructed and transmitted to them in very different ways. Timing may be more of an issue in some types of applications than others, but all software is fundamentally the same, and it is this core similarity that I address in this book. Readers must take this general information and apply it to their own application using the specific rules that govern how their application accepts input and interacts with its environment.

Personally, I have tested weapons systems for the U.S. government, real-time security monitors and antivirus engines, cellular phone switches, operating systems from top to bottom, web applications, desktop applications, large server apps, console and desktop game software, and many other apps that time has expunged from my memory. I am presenting the core considerations that apply to them all and will leave the actual application of the techniques in the capable hands of my readers.

Legal or Illegal Input?

One of the first distinctions to be made is positive versus negative testing. Are you trying to make sure the application works correctly, or are you specifically trying to make it fail? There are good reasons to do a lot of both types of testing, and for some application domains, negative testing is particularly important, so it helps to have a strategy to think through which good or bad values to test.

The first way that testers can slice this problem is based on what the developers think constitutes an illegal input. Developers have to create this partition very precisely, and they usually do so by writing error-handling code for what they see as illegal inputs. The decisions they make about how and when to create error handlers need to be tested.

It is good to keep in mind that most developers don’t like writing error code. Writing error messages is rarely cited as the reason people are attracted to computer science. Developers want to write functional code, the code that serves as the reason people want to use the software in the first place. Often, error-handling code is overlooked or quickly (and carelessly) written. Developers simply want to get back to writing “real” functional code as quickly as possible, and testers must not overlook this area of applications, because developers’ attitude toward it often ensures it is ripe with bugs.

Imagine developers writing functional code to receive an input. They may immediately see the need to check the input for validity and legality, and therefore they must either (a) stop writing the functional code and take care of the error handler, or (b) insert a quick comment (for example, “insert error code here”) and decide to come back to it later. In the former case, their brains have to context-switch from writing functional code to writing the error routine and then back again. This is a distracting process and creates an increased potential for getting it wrong. In the latter case, it isn’t unheard of to never get back to writing the error code at all, as developers are busy people. More than once I have seen such “to do” comments left in published and released software!

Developers have three basic mechanisms to define error handlers: input filters, input checks, and exceptions. Here’s how they work from a tester’s point of view.

Input Filters

Input filters are mechanisms to prevent illegal input from reaching the application’s mainline functional code. In other words, an input filter is written to keep bad input out of an application so that there is no need for the developer to worry about illegal values. If an input reaches the application, it is assumed to be a good input, and no further checks need to be made because input can be processed without worry. When performance is an issue, this is often the technique developers employ.

Input filters don’t produce error messages (that’s how they are distinguished from input checks, which are described next); instead, they quietly filter out illegal input and pass legal input to the application in question. For example, a GUI panel that accepts integer inputs may completely ignore any character or alphabetic input and only display the numeric input that is typed into its fields (see Figure 3.1).

Figure 3.1. This dialog box from PowerPoint allows the user to type numeric input only, thus filtering out invalid characters.

image

Also, the so-called list box or drop-down box is a type of input filter, in that it allows only valid inputs to be selected (see Figure 3.2). There is a clear advantage to developers because they can now write code without any further checks on input to complicate matters.

Figure 3.2. Another way to filter inputs is to allow users to choose from a predefined list of valid values.

image
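To make the distinction concrete, here is one way a numeric input filter might be implemented. The function is an illustrative sketch, not PowerPoint’s actual code; the telltale sign of a filter is that it produces no error message:

```python
def numeric_filter(keystrokes):
    """Silently discard anything that isn't a digit, so the application
    behind the filter only ever sees numeric input. No error message is
    produced -- that is what distinguishes a filter from an input check."""
    return "".join(ch for ch in keystrokes if ch.isdigit())
```

A tester probing such a filter would type mixed input such as `12abc3` and watch whether the stray characters are dropped quietly, as intended, or slip through to the mainline code.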

From a testing point of view, we need to check a couple of things regarding input filters:

• First, did the developers get it right? If the developers partitioned the space of legal versus illegal input incorrectly, serious bugs can be the result. Imagine mistakenly putting an illegal input in the legal category. This would allow an illegal value past the software’s only line of defense (assuming that no further checks are made). If a tester suspects that this is the case, she must write a bug report so that the code gets fixed.3 The opposite is also serious: Putting a legal input into the illegal category will cause denial of service and serious user frustration, because what the user is trying to do is perfectly legitimate and the software prevents them from doing it.

3 In cases where developers don’t see this as a problem, you might have to do a little more testing to convince them otherwise. Once an illegal input is in the system, apply inputs that make the software use that illegal input as often and in as many ways as possible to force any potential bugs to surface. This way you can strengthen your bug report with some more detailed information that exposes what bad things can happen when illegal inputs get processed.

• Second, can the filter be bypassed? If there is any other way to get input into the system, or to modify the inputs after they are in the system, the filter is useless, and developers will need to implement further error checking. This is a great bug to find before release because serious security side effects can be the result. Figure 3.3 shows a modified quantity value (achieved by editing the HTML source of this particular web page) which, if it is not checked further, will result in the user being charged a negative amount, thus ripping off the seller who uses this software for their online store.

Figure 3.3. Bypassing input constraints can be dangerous, as this negative number in the Quantity field shows. This technique is demonstrated in Chapter 3 of How to Break Web Software.

image
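Because any client-side filter can be bypassed this way, the application must re-validate input on the receiving side. A hypothetical sketch of the server-side check that would have caught the negative quantity:

```python
def validate_quantity(raw_value):
    """Re-check a quantity on the server; never assume the client filter ran.
    Returns the quantity, or None if the input must be rejected."""
    try:
        quantity = int(raw_value)
    except (TypeError, ValueError):
        return None   # non-numeric input slipped past the filter
    if quantity < 1:
        return None   # negative or zero quantities would rip off the seller
    return quantity
```

The tester’s job is to find every path by which `raw_value` can arrive (form post, edited HTML, direct request) and confirm that a check like this sits behind all of them.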

Input Checks

Input checks are part of the mainline code of an application and are implemented as IF/THEN/ELSE structures (or CASE, SELECT structures or lookup tables). They accept an input, and IF it is valid, THEN allow it to be processed, ELSE produce an error message and stop processing. The telltale sign of an input-check implementation is the presence of an error message that is generally descriptive and accurately reflects the nature of the invalidity of the input in question.

The error message here is key to the exploratory tester, and my advice is that each error message should be read carefully for mistakes and for clues to the mind of the developer. Error messages often describe fairly exact reasons why the input was invalid and how to fix it. This will give us new ideas for additional test input to drive other types of error messages to occur and, perhaps, cases that should result in error but do not.

The key difference between an input check and an exception (which is covered next) is that the input check is located immediately after the input is read from an external source. The code for reading the input has as its successor an IF statement that checks the input for validity. Therefore, the error message itself can be very precise: “Negative numbers are not allowed” is such a precise message; it tells the user exactly what was wrong with the input in question. When error messages are more general, it is an indication that an exception handler is being used. The topic of exceptions is tackled next.
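An input check, then, sits immediately after the input is read and can word its error message precisely. A minimal sketch; the field and the messages are hypothetical:

```python
def check_copies_field(raw_input):
    """Input check: IF the value is valid THEN accept it,
    ELSE return a precise error message and stop processing."""
    try:
        value = int(raw_input)
    except ValueError:
        return None, "Please enter a whole number."
    if value < 0:
        return None, "Negative numbers are not allowed."
    return value, None
```

Each branch’s message hands the exploratory tester a clue: seeing “Negative numbers are not allowed” invites trying zero, fractions, and very large values to see whether those cases were partitioned as carefully.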

Exception Handlers

Exception handlers are like input checks, but instead of checks on individual inputs, exception handlers are checks on anything that fails in an entire routine. Exception handlers are located at the end of a program or in a separate file entirely and handle any specified errors that are raised while the software is executing. This means that input violations are handled, but so is any other failure that can occur, including memory violations and so forth. By their very nature, exceptions handle a variety of failure scenarios, not just illegal inputs.

This means that when an error message is produced as a result of an exception being raised, it will be much more general than the specific wording possible for input checks. Because the exception could be raised by any line of code in the failing routine, for any number of reasons, it’s difficult for the message to say anything more than “an error has occurred”; the handler code can’t distinguish the exact nature of the problem.

Whenever a tester encounters such an open-ended, general error message, the best advice is to continue to test the same function. Reapply the input that caused the exception or vary it slightly in ways that may also cause a failure. Run other test cases through the same function. Tripping the exception over and over is likely to cause the program to fail completely.
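The contrast with an input check is visible in code. In this hypothetical routine, a single catch-all handler guards every line, so the message cannot say what actually went wrong:

```python
def total_order(order):
    """One exception handler covers the whole routine: bad input, a missing
    key, or arithmetic trouble all yield the same generic message, because
    the handler cannot tell the failures apart."""
    try:
        subtotal = sum(item["price"] * item["qty"] for item in order["items"])
        discount = subtotal * order["discount_rate"]
        return subtotal - discount, None
    except Exception:
        return None, "An error has occurred."
```

Seeing that generic message, the tester follows the advice above: re-run the same inputs, vary them slightly, and push other test cases through the routine, since every failure path funnels into the same poorly differentiated handler.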

Illegal inputs should be either ignored or result in some error message (either in a popup dialog or written to an error log file or some reserved area of the UI). Legal inputs should be processed according to the specification and produce the appropriate response. Any deviation from this and you’ve found a legitimate bug.

Normal or Special Input?

There are normal inputs, and then there are special inputs. A normal input has no special formatting or meaning and is readily digestible by the software under test. Normal inputs are those that the developers plan for and usually the ones that real users favor. Special inputs can come about through some extraordinary circumstance or by complete coincidence or accident. For example, a user may mean to type Shift-c into a text field but accidentally type Ctrl-c instead. Shift-c is an example of a normal input, the capital C character. But Ctrl-c has a completely different meaning assigned by, for example, the Windows operating system, to be copy or even cancel. Pressing Ctrl-c or some other special character in an input field can sometimes cause unexpected or even undesirable behavior.

All Ctrl characters, Alt, and Esc sequences are examples of special characters, and it is a good idea to test a sampling of these characters in your application and report undesirable behavior as bugs. Testers can also install special fonts that end users are likely to use and test different languages this way. Some character sets, such as Unicode and other multibyte encodings, can cause software to fail if it has been improperly localized to certain languages. A good place to start is to look at your product’s documentation and find out what languages it supports; then install the language packs and font libraries that will enable you to test those special inputs.

Another source of special characters comes from the platform on which your application is running. Every operating system, programming language, browser, runtime environment, and so forth has a set of reserved words that it treats as special cases. Windows, for example, has a set of reserved device names such as LPT1, COM1, AUX. When these are typed into fields where a filename is expected, applications often hang or crash outright. Depending on the container in which your application runs, the special characters you type in input fields may be interpreted by the container or by your application. The only way to find out for sure is to research the associated special characters and apply them as test input.
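A small generator of such special inputs is easy to keep at hand. The Windows reserved device names below are real (and the list is not exhaustive — COM2 through COM9 and LPT2 through LPT9 are reserved too); the helper around them is an illustrative sketch:

```python
# Windows reserves these device names regardless of case or extension.
RESERVED_DEVICE_NAMES = ["CON", "PRN", "AUX", "NUL", "COM1", "LPT1"]

def special_filename_inputs():
    """Yield reserved names in several forms worth typing into filename fields."""
    for name in RESERVED_DEVICE_NAMES:
        yield name               # bare reserved name
        yield name.lower()       # the reservation is case-insensitive
        yield name + ".txt"      # still reserved with an extension appended
```

Feeding each of these into every field where a filename is expected is a quick, repeatable pass that has historically hung or crashed a surprising number of applications.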

Default or User-Supplied Input?

Leaving text fields blank is an easy way to test. But as easy as it is for the tester, the same cannot be said of the software under test. Indeed, just because the tester didn’t do anything doesn’t mean the software doesn’t have some hard work to do.

Leaving data-entry fields empty or passing null parameters to an API requires that the software execute its default case. Often, these default cases are overlooked or poorly thought out. They are also routinely overlooked in unit testing, so the last line of defense is the manual tester.

Developers have to deal with nonexistent input because they cannot trust that all users will enter non-null values all the time. Users can skip fields either because they don’t see them or don’t realize they require a value. If there are a lot of data-entry fields (like those on web forms that ask the user for billing address, shipping address, and other personal information), the error message may also change depending on which field was left blank. This, too, is important to test.

But there is more to explore beyond just leaving fields blank. Whenever a form has prepopulated values, these are what I call developer-assigned defaults. For example, you may see the value ALL in a print form field for the number of pages to print. These represent what developers think are the most likely values a reasonable user will enter. We need to test these assumptions, and we need to ensure that no mistakes were made when the developers selected those values as defaults.

The first thing to try when you see developer-assigned defaults is to delete the default value and leave the field blank. (This is often a scenario developers don’t think of. Because they took the time to assign a default, they don’t imagine a scenario where they have to deal with a missing value.) Then start experimenting with other values around the default. If it’s a numeric field, try adding one, try subtracting one. If it is a string, try changing a few values at the front of the string, try changing some at the tail of the string, try adding characters, try deleting characters. Try different strings of the same length and so on and so forth.
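That experiment loop can be written down as a small value generator. A sketch, with the mutation rules taken straight from the paragraph above; the details (which character to change, which extension of the idea to pursue) are the tester’s choice:

```python
def variants_around(default):
    """Produce test values near a developer-assigned default: the deleted
    (blank) case first, then small mutations of the default itself."""
    if isinstance(default, int):
        return ["", default - 1, default, default + 1]
    s = str(default)
    return ["",            # delete the default and leave the field blank
            "X" + s[1:],   # change a character at the front
            s[:-1] + "X",  # change a character at the tail
            s + "X",       # add a character
            s[:-1],        # delete a character
            s]             # the default itself
```

For the print field prepopulated with ALL, this yields the blank case plus XLL, ALX, ALLX, AL, and ALL — each probing a different assumption the developer made when choosing the default.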

Input fields with default values prepopulated are often coded differently than fields that are presented with no default value at all. It pays to spend some extra time testing them.

Using Outputs to Guide Input Selection

This section has so far been about how to select inputs, and up to now all the techniques have been based on choosing inputs according to their desirable (or even undesirable) properties. In other words, some properties (type, size, length, value, and so forth) make them good as test input. Another way to select an input is to consider the output that it might (or should) generate when it is applied.

In many ways, this is akin to the behavior of a teenager trying to get permission from his parents to attend a party. The teen knows that there are two possible answers (outputs), yes or no, and he asks permission in such a way as to ensure his parents will favor the former output. Clearly, “Can I go to a wild and unsupervised rave?” is inferior to “May I join a few friends at Joey’s?” How one frames the question has a lot to do with determining the answer.

This concept applies to software testing, as well. The idea is to understand what response you want the software to provide, and then apply the inputs that you think will cause that specific output.

The first way many testers accomplish this is to list all the major outputs for any given feature and make sure they craft test input that produces those outputs. Organizing the input/output pairs into a matrix is a common way of making sure that all the interesting situations are covered.
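Such a matrix needn’t be elaborate. A hypothetical sketch for a search feature, pairing each crafted input (and the context it needs) with the output it is meant to produce:

```python
# Each row pairs a crafted input situation with its intended output.
io_matrix = [
    ({"term": "jazz", "catalog_has_matches": True},  "result list"),
    ({"term": "jazz", "catalog_has_matches": False}, "empty-results page"),
    ({"term": "",     "catalog_has_matches": True},  "error message"),
]

# Coverage check: every major output should appear at least once.
outputs_covered = {expected for _, expected in io_matrix}
```

Reviewing `outputs_covered` against the feature’s full list of major outputs makes gaps visible: any output with no row in the matrix is a response the test pass will never force the software to produce.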

At the highest level of abstraction, a tester can focus on generating illegal outputs or legal outputs, but most of the former will overlap the technique described earlier concerning the generation of error messages. Some such overlap is unavoidable, but you should try to focus on varying legal outputs as much as possible to ensure that new functionality and scenarios are covered.

This is a very proactive way of thinking about outputs; testers determine in advance what output they want the application to generate, and then explore scenarios that generate the desired response. A second way of thinking about outputs is more reactive but can be very powerful: Observe outputs, and then choose inputs that force the output to be recalculated or otherwise modified.

When the software generates some response for the first time, it is often the default case: Many internal variables and data structures get initialized from scratch the first time an output is generated. However, the second (or subsequent) time a response is generated, many variables will have preexisting values based on the prior usage. This means that we are testing a completely new path. The first time through, we tested the ability to generate an output from an uninitialized state; the second time through, we test the ability to generate the output from a previously initialized state. These are different tests, and it is not uncommon for one to pass where the other fails.

A derivative of the reactive output test is to find outputs that persist. Persistent outputs are often calculated and then displayed on the screen or stored in a file that the software will read at some later time. If these values can be changed, it is important to change them and their properties (size, type, and so forth) to test regeneration of the values on top of a prior value. Run tests to change each property you can identify.

The complexity of input selection is only the first of the technical challenges of software testing. As inputs are continually applied to software, internal data structures get updated, and the software accumulates state information in the form of values of internal variables. Next we turn to this problem of state and how it complicates software testing.

State

The fact that any, some, or all inputs can be “remembered” (that is, stored in internal data structures) means that we can’t just get by with selecting an input without taking into account all the inputs that came before it. If we apply input a, and it changes the state of the application under test, then applying input a again cannot be said to be the same test. The application’s state has changed, and the outcome of applying input a could be very different. State impacts whether an application fails every bit as much as input does. Apply an input in one state and everything is fine; apply that same input in another state and all bets are off.

What You Need to Know About Software State

One way to think about software state is that we have to take the context created by the accumulation of all prior inputs into account when we select future inputs. Inputs cause internal variables to change values, and it is the combination of all these possible values that comprises the state space of the software. This leads us to an informal definition of state as follows:

A state of software is a coordinate in the state space that contains exactly one value for every internal data structure.

The state space is the cross-product of the possible values of every internal variable. This yields an astronomically large number of different internal states that govern whether the software produces the right or wrong response to an input.
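A back-of-the-envelope sketch makes the point; the variables and their value counts below are invented, but even four of them produce a cross-product no test effort could enumerate:

```python
import math

# Hypothetical internal variables and the number of distinct values each
# can hold. The names and cardinalities are illustrative only.
variable_cardinalities = {
    "logged_in": 2,           # boolean flag
    "cart_item_count": 101,   # 0..100 items
    "discount_code": 1000,    # 1,000 recognized codes
    "session_age_sec": 86_400,  # seconds elapsed in a day-long session
}

# The state space is the cross-product of every variable's possible values.
state_space_size = math.prod(variable_cardinalities.values())
print(state_space_size)  # over 17 billion states from just four variables
```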

The math is not encouraging. In theory, we must apply every atomic input (which we’ve established is a very large number) for every state of the application (which is an even larger number). This isn’t even possible for small applications, much less medium to large ones. If we enumerated this as a state machine where inputs cause a transition from one state to the next, we’d require a forest of paper to draw it.

For the hypothetical shopping example, we’d have to ensure that we can perform the “checkout” input for every possible combination of shopping cart entries. Clearly, we can treat many cart combinations as functionally equivalent,4 and we can focus on boundary cases such as an empty cart, without having to test every single instance of a shopping cart. We discuss such strategies in later chapters.

4 We’re back to the concept of equivalence classes again. If they really are an illusion, software testers are in some very serious trouble.

How to Test Software State

Software state comes about through the application’s interaction with its environment and its history of receiving inputs. As inputs are applied and stored internally, the software’s state changes. It’s these changes we want to test. Is the software updating its state properly? Does the state of the application cause some inputs to exhibit faulty behavior? Is the software getting into states it should never be in? The following are the major considerations for testing input and state interactions.

The input domain for software is infinite, as discussed in the previous section; input variables, input combinations, and input sequences all contribute to this difficulty. The added dimension of state complicates a tester’s life even further. Software state comes about because of the software’s ability to “remember” prior inputs and accumulate their effects. State can be thought of as encapsulating the input history: it is how the software remembers what users did on previous occasions when they used it.

Because software state comes about through the successive application of input, testing it requires multiple test cases and successively executing, terminating, and re-executing the software. Software states are visible to testers if we take the time to notice how our input affects the system. If we enter some inputs and later see the values we entered, those inputs are stored internally and have become part of the state of the application. If the software uses the inputs in some calculation and that calculation can be repeated, the inputs must be stored internally, too.

Software state is another way of describing the sum total of prior inputs and outputs that the software “remembers.” State is either temporary, in that it is remembered only during a single execution of the app and forgotten when the application is terminated, or persistent, in that it is stored in a database or file and accessible to the application in subsequent executions. This is often referred to as the scope of data, and testing that the scope is correctly implemented is an important test.5

5 Getting the scope of data wrong has security implications. Imagine entering an input that represents a credit card number, which is supposed to be scoped only for single use. We must re-execute the application to test that the number is not incorrectly scoped as a persistent piece of state.
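The scope test in the footnote can be sketched in a few lines. This is a hedged illustration, not a real application: the class, fields, and file format are all invented, and the “re-execution” is simulated by reading the persistent store back.

```python
import json
import os
import tempfile

# Sketch of a scope-of-data check; all names are invented. Session state
# should be forgotten at exit; only state meant to persist reaches the file.

class App:
    def __init__(self, store_path):
        self.store_path = store_path
        self.session = {}      # temporary scope: lives only for this execution
        self.persistent = {}   # persistent scope: written to disk at exit

    def enter_credit_card(self, number):
        self.session["card"] = number   # single-use by design

    def set_preference(self, key, value):
        self.persistent[key] = value    # meant to survive restarts

    def shutdown(self):
        with open(self.store_path, "w") as f:
            json.dump(self.persistent, f)  # only persistent state is saved

# First execution: enter a card number and a preference, then terminate.
path = os.path.join(tempfile.mkdtemp(), "store.json")
run1 = App(path)
run1.enter_credit_card("4111-1111-1111-1111")
run1.set_preference("currency", "EUR")
run1.shutdown()

# "Re-execute" and verify the card number was not incorrectly persisted.
with open(path) as f:
    restored = json.load(f)
assert "card" not in restored         # temporary state must not leak
assert restored["currency"] == "EUR"  # persistent state must survive
print("scope test passed")
```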

Much of the data that is stored either temporarily or persistently cannot be seen directly and must be inferred based on its influence on software behavior. If the same input causes two completely different behaviors, the state of the application must have been different in the two cases. For example, imagine the software that controls a telephone switch. The input “answer the phone” (the act of picking up the receiver on a land line or pressing the answer button on a cell phone) can produce completely different behaviors depending on the state of the software:

• If the phone isn’t registered to a network, there is no response or an error response.

• If the phone is not ringing, a dial tone is generated (in the case of a land line) or a redial list is presented (in the case of a cell phone).

• If the phone is ringing, a voice connection to the caller is made.

Here the state is the status of the network (registered or unregistered) and the status of the phone (ringing or idle). These values combined with the input we apply (answering the phone) determine what response or output is generated. Testers should attempt to enumerate and test as many combinations as are reasonable given their time and budget restraints and based on expectations of risk to the end user.
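The state/input table above can be sketched as code; the function name and return strings are invented for illustration. The same input (“answer the phone”) maps to different outputs depending on the two state variables, and enumerating their combinations is exactly the testing task described:

```python
# Sketch of the telephone-switch example; names are invented. One input,
# two state variables, and the output depends on their combination.

def answer_phone(registered, ringing):
    if not registered:
        return "error"             # unregistered: no response or an error
    if ringing:
        return "voice connection"  # ringing: connect to the caller
    return "dial tone"             # idle: dial tone (or redial list on a cell)

# Enumerate every combination of the two state variables under the one input.
for registered in (False, True):
    for ringing in (False, True):
        print((registered, ringing), "->", answer_phone(registered, ringing))
```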

The relationship between input and state is crucial and a difficult aspect of testing, both in the small and in the large. Because the former is the subject of this chapter, consider the following advice:

Use state information to help find related inputs.

It is common practice to test input combinations. If two or more inputs are related in some manner, they should be tested together. If we are testing a website that accepts coupon codes that shouldn’t be combined with sale prices, we need to apply inputs that create a shopping cart with sale items and also enter a coupon code for that order. If we only test the coupon code on shopping carts without sale items, this behavior goes untested, and the owners of the site may end up losing money. We must observe the effects on the state (the shopping cart items and their price) as we test to notice this behavior and determine whether developers got it right. Once we determine that we have a group of related inputs and state data (in this example, sale items, coupon codes, and the shopping cart), we can methodically work through combinations of them to ensure we cover the important interactions and behaviors.
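Working methodically through the combinations can be as simple as crossing the related inputs; the cart and coupon values below are invented stand-ins for the sale-item example:

```python
import itertools

# Hedged sketch of methodically covering related inputs. Each (cart, coupon)
# pair is one exploratory test; the interesting interaction is a coupon
# applied to a cart that contains sale items.

cart_variants = ["empty", "regular_items", "sale_items", "mixed"]
coupon_variants = [None, "SAVE10"]

test_cases = list(itertools.product(cart_variants, coupon_variants))
for cart, coupon in test_cases:
    print(cart, coupon)

print(len(test_cases))  # 8 combinations to work through
```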

Use state information to identify interesting input sequences.

When an input updates state information, successive applications of that same input cause a series of updates to state. If the state is accumulating in some way, we have to worry about overflow. Can too many values be stored? Can a numeric value grow too large? Can a shopping cart become too full? Can a list of items grow too large? Try to spot accumulating state in the application you are testing, and repeatedly apply any and all inputs that impact that accumulation.
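The “apply it repeatedly and watch for overflow” tactic can be sketched as follows; the cart class and its limit are invented, but the pattern of hammering one accumulating input until a boundary appears is the point:

```python
# Sketch of probing accumulating state: repeatedly apply the same input and
# watch for a limit. The Cart class and its limit are invented.

class Cart:
    MAX_ITEMS = 1000

    def __init__(self):
        self.items = []

    def add(self, item):
        if len(self.items) >= self.MAX_ITEMS:
            raise OverflowError("cart is full")
        self.items.append(item)

cart = Cart()
overflow_at = None
for i in range(2000):  # hammer the accumulating input past any sane limit
    try:
        cart.add(f"item-{i}")
    except OverflowError:
        overflow_at = i
        break

print(overflow_at)  # 1000: repetition found the boundary
```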

Code Paths

As inputs are applied and state is accumulated in the application under test, the application itself is executing line after line of code as its programming dictates. A sequence of code statements makes a path through the software. Informally, a code path is a sequence of code statements beginning with invocation of the software and ending with a specific statement often representing termination of the software.

There is a substantial amount of possible variation in code paths. Each simple branching structure (for example, the IF/THEN/ELSE statement) causes two possible branches, requiring that testers create tests to execute the THEN clause and separately the ELSE clause. Multibranch structures (for example, CASE or SELECT statements) create three or more possible branches. Because branching structures can be nested one inside the other and sequenced so that one can follow another, the actual number of paths can be very large for complex code.
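The arithmetic behind “very large” is simple: sequential branching structures multiply. The numbers below are illustrative, with one factor per structure (2 for an IF/THEN/ELSE, k for a k-way CASE or SELECT):

```python
import math

# Back-of-the-envelope path counting for sequential branch structures.
# Each factor is the branch count of one structure along the path.

def paths(branch_factors):
    """Number of distinct code paths through a sequence of branch structures."""
    return math.prod(branch_factors)

print(paths([2] * 10))      # ten if/else statements in a row: 1,024 paths
print(paths([2, 3, 2, 5]))  # mixed if/else and case statements: 60 paths
print(paths([2] * 30))      # thirty two-way branches: over a billion paths
```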

Testers must be aware of such branching opportunities and understand which inputs cause the software to go down one branch as opposed to another. It isn’t easy to do, particularly without access to the source code or to tools that map inputs to code coverage. And the paths that are missed may very well be those with bugs.

Branches are only one type of structure that increases the number of code paths. Loops make the number of paths truly unbounded. Unbounded loops execute until the loop condition evaluates to false, and often this condition is itself based on user input. For example, users determine when to stop adding items to their shopping cart before they proceed to checkout: Only then have they left the shopping loop and continued on to the checkout code.

There are a number of specific strategies for gaining coverage of code paths that are explored throughout this book.

User Data

Whenever software is expected to interact with large data stores, such as a database or complex set of user files, testers have the unenviable task of trying to replicate those data stores in the test lab. The problem is simple enough to state: Create data stores with specific data as similar as possible to the type of data we expect our users to have. However, actually achieving this is devilishly difficult.

In the first place, real user databases evolve over months and years as data is added and modified, and they can grow very large. Testers are restricted by a testing phase that may last only a few days or weeks, so populating a test data store must happen on a much shorter time scale.

In the second place, real user data often contains relationships and structure that testers have no knowledge of and no simple way of inferring. It is often this complexity that causes the software that worked fine in the test lab to break when it has to deal with real user data.

In the third place, there is the problem of access to storage space. Large data stores often require expensive data centers that are simply not accessible to testers because of the sheer cost involved. Whatever testers do has to be done in a short period of time and on much smaller byte-scales than what will happen in the field after release.

An astute tester may observe that a simple solution for all this complexity would be to use an actual user database, perhaps by arranging a mutually beneficial relationship with a specific beta customer and testing the application while it is connected to their real data source. However, testers must use real data with great care. Imagine an application that adds and removes records from a database. Tests (particularly automated ones) that remove records would be problematic for the owners of the database. Testers must now do extra work to restore the database to its original form or work on some expendable copy of it.
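The “expendable copy” approach can be sketched in a few lines; the paths and the sample destructive test below are invented. Destructive tests run only against a copy of the real store, which is discarded afterward:

```python
import os
import shutil
import tempfile

# Sketch of running destructive tests against an expendable copy of a real
# data store. Paths and the sample test are invented for illustration.

def run_destructive_tests(real_db_path, destructive_test):
    workdir = tempfile.mkdtemp()
    copy_path = os.path.join(workdir, "expendable.db")
    shutil.copyfile(real_db_path, copy_path)  # tests touch only the copy...
    try:
        destructive_test(copy_path)
    finally:
        shutil.rmtree(workdir)                # ...which is discarded afterward

# Stand-in for a beta customer's database.
real = os.path.join(tempfile.mkdtemp(), "real.db")
with open(real, "w") as f:
    f.write("customer records")

def wipe(path):
    open(path, "w").close()  # a destructive test: deletes every record

run_destructive_tests(real, wipe)
print(open(real).read())  # "customer records" -- the original survives
```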

Finally, another complication (as though we hadn’t enough complications already) surfaces to cause us angst when we deal with real customer data: privacy.

Customer databases often contain information that is sensitive in some way or even contains PII (personally identifiable information). In an age of online fraud and identity theft, this is a serious matter that you do not want to expose to your test team. Any use of real customer data must provide for careful handling of PII.
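One careful-handling option is to pseudonymize PII fields before the data ever reaches the test lab. The sketch below is an assumption-laden illustration (field names and the record are invented); a stable hash preserves relationships between records without exposing the underlying values:

```python
import hashlib

# Sketch of pseudonymizing PII before test use. Field names and the sample
# record are invented for illustration.

PII_FIELDS = {"name", "email", "card_number"}

def pseudonymize(record):
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # A stable hash keeps relationships intact (the same customer
            # always maps to the same token) without exposing the value.
            masked[field] = hashlib.sha256(value.encode()).hexdigest()[:12]
        else:
            masked[field] = value
    return masked

record = {"name": "Ada Lovelace", "email": "ada@example.com", "zip": "90210"}
masked = pseudonymize(record)
print(masked)  # name and email become tokens; zip is untouched
```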

This makes both having and lacking real customer data problematic!

Environment

Even if it were possible to test every code path with every user input, state combination, and user data, software could still fail the moment it is installed in an environment it has never seen before. This is because the environment itself is an input source, and therefore testers need to ensure they test as many potential user environments as practical before release.

What’s in an environment? Well, it’s different depending on what type of application you are testing. In general, it is the operating system and how it is configured; it is the various applications that coexist on that operating system and might interact with the application under test; it is any other driver, program, file, or setting that may directly or indirectly affect the application and how it reacts to input. It’s also the network that the application is connected to, along with its available bandwidth, performance, and so forth. Anything that can affect the behavior of the application under test is part of the environment we must consider during testing.

Unlike user data, which is passive in its effect on software (data sits and waits for the software under test to manipulate it), the environment interacts with our software actively and programmatically. It provides input to our software and consumes its output. The environment consists not only of resources like the Windows Registry but also of installed applications that interact with shared components. The sheer number of such variations is beyond our reach to re-create. Where are we going to get all the machines needed to re-create those customer environments? And if we had the hardware, how would we select the right subset of environments to include in our testing? We can’t test them all, but any tester who has been around has experienced a test case that runs fine on one machine and causes a failure on another. The environment is crucial, and it is devilishly difficult to test.6

6 Environment variation and testing is not covered in this book. However, Chapters 4 and 5 as well as Appendixes A and B of How to Break Software treat this topic at length.

Conclusion

Software testing is complicated by an overload of variation possibilities, from inputs and code paths to state, stored data, and the operational environment. Indeed, whether one chooses to address this variation in advance of any testing by writing test plans or by an exploratory approach that allows planning and testing to be interleaved, complete coverage is an impossible task. No matter how you ultimately do testing, software is simply too complex to test completely.

However, exploratory techniques have the advantage that they encourage testers to plan as they test and to use information gathered during testing to affect the actual way testing is performed. This is a key advantage over plan-first methods. Imagine trying to predict the winner of the Super Bowl or Premier League before the season begins. This is difficult to do before you see how the teams are playing, how they are handling the competition, and whether key players can avoid injury. The information that comes in as the season unfolds holds the key to predicting the outcome with any amount of accuracy. The same is true of software testing, and exploratory testing embraces this by attempting to plan, test, and replan in small ongoing increments guided by full knowledge of all past and current information about how the software is performing and the clues it yields during testing.

Testing is complex, but effective use of exploratory techniques can help tame that complexity and contribute to the production of high-quality software.

Exercises

1. Suppose an application takes a single integer as input. What is the range of possible atomic inputs when the integer is a 2-byte signed integer? What if it is a 2-byte unsigned integer? What about a 4-byte integer?

2. As in question 1, suppose an application takes a single integer as input. Can you get by with just entering the integer 148 and assuming that if the software works when 148 is entered it will work when any integer is entered? If you answer yes, explain why. If you answer no, specify at least two conditions that will cause the software to behave differently if given 148 or another valid integer.

3. For question 2, what other values might you enter besides integers? Why would you want to do this?

4. Describe a case in which the combination of inputs can cause software to fail. In other words, each input taken on its own does not expose a failure, but when the inputs are combined, the software fails.

5. Imagine a web application that requests shipping information from a customer, including name, address, and several other fields. Most often, the information is entered in bulk, and then the information is checked only after the user clicks a Next or Submit button. Is this an example of an input check or an exception handler? Justify your answer.

6. Describe a case in which the order of inputs can cause software to fail. In other words, if you apply the sequence of inputs in one order and the software works, changing the order in which the inputs are applied might cause the software to fail.

7. Suppose you have a situation where software fails after it has been left running continuously for a long period of time. Would you ascribe this failure to input, state, code paths, environment, or data? Explain your reasoning.
