Chapter 7. Touring and Testing’s Primary Pain Points

“One man’s crappy software is another man’s full time job.”

Jessica Gaston

The Five Pain Points of Software Testing

Never in the history of mankind has society depended so completely on a product that is often so deeply flawed. Not only does software control our systems of government, law enforcement, banking, defense, communication, transportation, and energy, but it also holds the key to the computationally intensive solutions that will one day remake this planet. How will we tame our economic markets, achieve clean energy, or control our changing climate without the computing power of software? Beyond the innovative spirit of the human mind, is there a single tool that is more important to the future of mankind than software?

Yet software, more than any other product in history, is famous for its ability to fail. Newsreels overflow with stories of stranded ships, region-wide power failures, malfunctioning medical devices, exploding spacecraft, financial loss, and even human death. Minor inconveniences are so commonplace that it is a joke at Microsoft that employees act as the help desk for all their nontechnical friends and family. Computers and the software that makes them do useful things are a wonder, but their complexity is too much for the way we develop software today.

It is testing that the industry relies on as the check-and-balance between innovation and dependability. The complex nature of software development and the fallibility of the humans who write code combine to virtually guarantee the introduction of errors. But as a process to manage and minimize these errors, testing has some serious drawbacks. The five most concerning pain points are the subject of this chapter, and we must solve these issues to have any hope that software of the future will be any better than the software of today.

This chapter comes last, after the tours have been thoroughly discussed, to lead the reader into using the tours to relieve these pain points. The five pain points are as follows:

- Aimlessness
- Repetitiveness
- Transiency
- Monotony
- Memorylessness

Each is discussed in order next.

Aimlessness

Much has been written about the evils of a life without purpose, but it is tests without purpose, and the aimlessness of much of our modern testing practice, that create a major testing pain point. Testing is not simply something that we can just go do. It requires planning, preparation, strategy, and adaptable tactics to be performed successfully. But far too many software organizations ignore proper preparation in favor of just doing it. Testing is too important to treat so casually.

When I was a professor at Florida Tech, I taught software testing, and one semester my class was much too large for my liking. I decided to run an experiment that would scare off a few of the less-serious students. On the first day, I convened the class in a computer lab and instructed the students to identify an application to test and work in pairs to test it. I gave them no further instruction on how to carry out such testing, but as an incentive, I told them that if they impressed me with their technique, they could stay in the class. If they did not, I would see that they were automatically dropped (not something I intended to do, but the threat was sufficient for my purpose).

I prowled the lab, which had the effect of increasing the tension in the room, and occasionally stopped a pair of students and demanded to know how they intended to find a bug. Each time I posed such a question, I got some variation of “not sure, doc, we’re just hoping it fails.” Finally, some astute student would realize that those answers weren’t working and get a bit closer to something that indicated strategy. In fact, I remember the exact statement that caused me to admit the first pair of students and send them on their way: “We’re going through all the text boxes entering long strings, hoping to find a place where they are not checking string length.”1

1 They found one, too. See attack 4 on page 35 of How to Break Software.

Bingo! Perhaps this is not the best or most important strategy, but it is a strategy, and as such it helps counter aimlessness. Software testers are far too often without strategy and specific goals as they run tests. When testing manually, they wander the app in an ad hoc manner. And when they write automation, it is often simply because they know how to write it; whether the automation will find worthwhile bugs, stand the test of time, and justify the cost of maintenance isn’t part of the picture.
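To make that strategy concrete, here is a minimal sketch of the long-string attack in Python. Everything in it is hypothetical: validate() stands in for however your application accepts form input, and the 64-character limit stands in for whatever maximum the spec documents.

```python
# A toy harness for the students' strategy: push every text field past its
# documented length limit and flag any field that accepts over-long input.

MAX_LEN = 64  # hypothetical documented limit for every field


def validate(field: str, value: str) -> bool:
    """Stand-in for the application under test; 'comment' lacks a length check."""
    if field == "comment":
        return True  # the planted bug: no length check at all
    return len(value) <= MAX_LEN


def long_string_attack(fields, lengths=(63, 64, 65, 1024, 65536)):
    """Probe each field at and beyond the boundary; report suspects."""
    suspects = []
    for field in fields:
        for n in lengths:
            if validate(field, "A" * n) and n > MAX_LEN:
                suspects.append((field, n))  # accepted input it should reject
                break
    return suspects


if __name__ == "__main__":
    print(long_string_attack(["name", "email", "comment"]))
    # [('comment', 65)] -- the missing length check the students were hunting
```

The point is not the code but the stated, checkable goal behind it: every run either finds a missing length check or builds evidence that the checks are there.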

This aimless nature of software testing must stop. How often will test managers provide the meaningless advice of “just go and test it” before we create a better way? Stop this aimless process now, one team at a time.

I know, it’s easier said than done. After all, there are an infinite number of tests possible for even the simplest of applications. But I argue that there are a finite number of testing goals.

Define What Needs to Be Tested

Software testing usually occurs by partitioning an application into sections on a component basis (defined by structural boundaries like code files and assemblies) or a feature basis (specific functionality of a component) and assigning individual testers or teams of testers to a component or feature. Many companies I have worked with have feature teams or even assign separate test leads to each large component.

But such a partitioning does not really support good testing practice. Users don’t care about components and features, and if we want to find bugs that real users are likely to stumble upon, we will benefit by following their lead. Users care about capabilities and use various components and features to exercise a desired capability. If we test according to capabilities, we can more closely align our testing to real-world usage.

For example, I can choose a set of capabilities to test as an ensemble or focus on a single one. I can purposefully explore the capabilities across a number of features or stick to the capabilities of a single feature. I can exercise capabilities that cross component boundaries or choose to stay within a component. I can rely on architecture documentation or written specs to build a component/feature/capability map to guide my testing and ensure that I cover the interesting combinations to the extent possible. Focusing on more fine-grained capabilities rather than higher-level notions of components and features puts me in a position to better understand what needs to be tested. To test a feature, I must exercise its capabilities in varying degree and order. By explicitly calling out the capabilities, I make this job more purposeful and can more easily account for progress and coverage.
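One possible shape for such a map is sketched below. The structure and all the component, feature, and capability names are my own invented illustration; the point is that once capabilities are enumerated explicitly, coverage becomes a simple query.

```python
# A component -> feature -> capabilities map with a record of what has
# actually been exercised. All names are illustrative.

cap_map = {
    "Editor": {
        "Find/Replace": {"find text", "replace text", "match case"},
        "Spell Check": {"flag misspelling", "suggest correction"},
    },
    "File I/O": {
        "Save": {"save new file", "overwrite existing", "save read-only"},
    },
}

covered = set()  # (component, feature, capability) triples a test has touched


def record(component, feature, capability):
    """Mark a capability as exercised by some test or tour."""
    covered.add((component, feature, capability))


def uncovered():
    """Capabilities the map says exist but no test has touched yet."""
    return [
        (comp, feat, cap)
        for comp, feats in cap_map.items()
        for feat, caps in feats.items()
        for cap in caps
        if (comp, feat, cap) not in covered
    ]


record("Editor", "Find/Replace", "find text")
print(len(uncovered()))  # 7 of the 8 capabilities remain untested
```

With the map in hand, covering the interesting combinations stops being a feeling and becomes a list you can work down.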

Determine When to Test

The idea of decomposing features into capabilities can help organize a test team to focus on testing real-world concerns of users. In the best case, manual testers should be free to find subtle but important bugs by forcing an application to perform tasks it might be faced with when in the hands of a real user. However, this is possible only to the extent that prior testing has been effective at reducing the overall level of “bug noise.”

Bug noise is what I call the niggling bugs and issues that keep testers from being productive. If all testers are doing is finding shallow technical issues, like an input field that accepts letters when it should allow only numbers, or rediscovering the same bug over and over again, productivity will fall. In the best case, all of these issues have already been found by prior developer testing, unit tests, code reviews, and so forth. If not, a great deal of manual testing effort will be spent finding them, and that means fewer cycles to run more tours and find more subtle but probably more impactful issues.

This means that over time it is important to understand how the bug-finding efforts in each testing phase match the actual bugs being found. At Microsoft, this is part of the bug triage process: For every bug found, explicitly determine when the bug should have been caught. That way we can learn how to focus review effort, unit testing effort, and so forth based on historical bug data. It takes a few project cycles to perfect this process, but it pays off in the long run.
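The bookkeeping behind that triage step can be very modest. The sketch below is my own assumption about how such a record might look, not a description of Microsoft’s actual tooling; it simply tallies, for each phase, the bugs that escaped it.

```python
# Tally 'escapes': bugs caught later than the phase triage says should
# have caught them. Phase names and bug data are illustrative.

from collections import Counter
from dataclasses import dataclass


@dataclass
class Bug:
    bug_id: int
    found_in: str     # phase that actually caught the bug
    should_have: str  # phase triage decided should have caught it


bugs = [
    Bug(101, found_in="manual test", should_have="unit test"),
    Bug(102, found_in="manual test", should_have="code review"),
    Bug(103, found_in="unit test", should_have="unit test"),
]

escapes = Counter(b.should_have for b in bugs if b.found_in != b.should_have)
print(escapes)  # Counter({'unit test': 1, 'code review': 1})
```

A few project cycles of such tallies show exactly where review and unit-testing effort is leaking bugs into later, more expensive phases.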

Determine How to Test

Where the prior point focused on testing phase, this point is about testing type, specifically manual versus automated testing. In Chapter 2, “The Case for Manual Testing,” I spent a great deal of effort describing the differences between the two, and I will not rehash that here. However, within manual testing, it is also useful to classify how certain bugs were found. Was it ad hoc testing, scripted, or exploratory? Was a specific tour responsible for guiding the tester to the bug in question? If so, document this.

Teams that take care to match technique to bug have gone a long way toward answering the question of how. Ultimately, a collective wisdom of sorts will emerge in the group: certain bug types will be linked with certain tours or techniques, and testers will know that “this function/feature is best tested this way.”

This is where the tours come in. Tours are a way to identify higher-level test techniques and, over time, to understand the relationship between tours, the features they are good at testing, and the bugs they are likely to find. As teams establish a base set of simple and advanced tours, they will have the link between feature type and bugs that they can use to make testing far less aimless than before.
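Capturing that link takes very little machinery. The sketch below is my own construction, with tour and feature names used only as illustration; it counts logged bugs by (tour, feature) pair so a team can ask which tour has historically paid off against a given feature.

```python
# Count bugs by (tour, feature) so technique can be matched to target.
# The log entries are illustrative.

from collections import Counter

bug_log = [
    ("Landmark Tour", "Setup Wizard"),
    ("Intellectual Tour", "Find/Replace"),
    ("Intellectual Tour", "Find/Replace"),
    ("Garbage Collector's Tour", "Setup Wizard"),
]

by_pair = Counter(bug_log)


def best_tour_for(feature):
    """The tour with the most recorded bugs against this feature."""
    candidates = {tour: n for (tour, feat), n in by_pair.items() if feat == feature}
    return max(candidates, key=candidates.get)


print(best_tour_for("Find/Replace"))  # Intellectual Tour
```

Even a few dozen entries begin to surface the “this feature is best tested this way” wisdom described above.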

Repetitiveness

We test and then we test some more. As our application grows in features, we run old tests on the existing features and new tests on the new ones. As the product grows over its life cycle, however, the new tests soon become old ones, and all of them eventually become stale.

Even stale tests have their role to play. As bugs are fixed, features and functionality must be retested, and existing test cases are seen as the least expensive way to retest the application. Indeed, it is foolish to waste any test case, and the idea of disposable test assets is a repugnant one for busy and overworked testers. The industry has found test case reuse such a useful paradigm that we’ve given the activity special names such as regression tests or even regression suites (to make them sound more thorough?) to highlight their role and purpose. For an application of any size and longevity, regression test cases can sometimes number in the millions.

Let’s put aside the problem of maintaining such large test suites and focus on the repetition problem. Treading over already well-worn paths and data/environment combinations has only limited utility. Such tests can be useful to verify a bug fix, but not to find new bugs or to test for potential side effects of a code change, and they are of no use whatsoever in testing new or substantially modified features. Worse still is the fact that many testers and developers put unwarranted faith in such tests, particularly when they are large in number. Running a million test cases sounds nice at face value (at least to managers and vice presidents), but it is what’s in those test cases that really matters. Is it good news or bad news that a million-plus members of a regression suite have executed clean? Are there really no bugs, or is the regression suite just incapable of finding the bugs that remain? To understand the difference, we must have a firmer grasp on what testing has already occurred and how our present and future testing will add to the whole.

Know What Testing Has Already Occurred

When a regression suite executes clean, we can’t be sure whether this is good news or bad news. Boris Beizer called this phenomenon the pesticide paradox, and I can frame it no better than he did. If you spray a field of crops with the same pesticide, you will kill a large number of critters, but those that remain are likely to develop strong resistance to the poison. Regression suites and reused test cases are no different. Once a suite of tests finds its prescribed lot of bugs, those bugs that remain will be immune to its future effects. This is the paradox: The more of the same poison you apply, the smaller the percentage of bugs it kills over time.

Farmers need to know what pesticide formula they are using and understand that over time its value decreases. Testers must know what testing has already occurred and understand that reusing the same tired techniques will be of little bug-finding value. This calls for intelligent variation of testing goals and concerns.

Understand When to Inject Variation

Solving the pesticide paradox means that farmers must tinker with their pesticide formula, and for testers it requires injection of variation into test cases. That is a bigger subject than this book covers, but an important part of it is woven throughout the whole tours concept. By establishing clear goal-oriented testing techniques and understanding what types of bugs are found using those techniques, testers can pick and choose techniques that better suit their purpose. They can also vary the techniques, combine the techniques, and apply them in different orders and in different ways. Variation of testing focus is the key, and the methodology in this book provides the tools to achieve a consistent application of effective and ever-changing “pesticide.”
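As a small illustration of what injected variation can look like (my example, not a prescribed method), the sketch below shuffles tour order and pairs each tour with different data on every cycle, seeding the random generator so any formula that exposes a bug can be rerun exactly. The tour and data-set names are invented.

```python
# Plan a reproducible but varied testing session: new tour order and new
# data pairings each cycle, recoverable from the seed alone.

import random

TOURS = ["Landmark", "Intellectual", "Back Alley", "All-Nighter"]
DATA_SETS = ["empty profile", "huge profile", "unicode names", "legacy import"]


def plan_session(seed):
    rng = random.Random(seed)  # seeded: the same seed replays the same plan
    order = rng.sample(TOURS, len(TOURS))  # a fresh shuffle of the tours
    return [(tour, rng.choice(DATA_SETS)) for tour in order]


for seed in (1, 2):
    print(seed, plan_session(seed))  # two different 'pesticide formulas'
```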

Of course, simply changing the formula is a process that can be improved, too. Farmers know that if they match the right pesticide to their specific crop and the bugs they expect to combat, they achieve even more success. Your scenarios and tours are your pesticide. Injecting variation into the scenarios, as described in Chapter 5, “Hybrid Exploratory Testing Techniques,” and using the tours in a variety of orders and with varying data and environments can help ensure that the ever-changing formula is one that potential bugs will never get used to.

Real pesticides have labels that show what crops they are safe for and what critters they target. Can we say the same about our tests? Not yet, but pesticide makers got where they are only by a lot of trial and error and by learning from all those trials. Software testers can and should do a lot of the same.

Transiency

Two communities regularly find bugs: the testers who are paid to find them, and the users who stumble upon them quite by accident. Clearly, the users aren’t doing so on purpose, but through the normal course of using the software to get work (or entertainment, socializing, and so forth) done, failures occur. Often, it is the magic combination of an application interacting with real user data on a real user’s computing environment that causes software to fail. Isn’t it obvious then that testers should endeavor to create such data and environmental conditions in the test lab to find these bugs before the software ships?

Actually, the test community has been diligently attempting to do just that for decades. I call this process bringing the user into the test lab, either in body or in spirit. Indeed, my own Ph.D. dissertation was on the topic of statistical usage testing, and I was nowhere near the first person to think of the idea, as my multipage bibliography will attest. However, there is a natural limit to the success of such efforts. Testers simply cannot be users or simulate their actions in a realistic enough way to find all the important bugs. Unless you actually live in the software, you will miss important issues. And most testers do not live in their software; they are transients, and once the application is shipped, they move on to the next one.

It’s like homeownership. It doesn’t matter how well the house is built. It doesn’t matter how diligent the builder and the subcontractors are during the construction process. The house can be thoroughly inspected during every phase of construction by the contractor, the homeowner, and the state building inspector. There are just some problems that will be found only after the house is occupied for some period of time. It needs to be used, dined in, slept in, showered in, cooked in, partied in, relaxed in, and all the other things homeowners do in their houses. It’s not until the teenager takes an hour-long shower while the sprinklers are running that the septic system is found deficient. It’s not until a car is parked in the garage overnight that we find out the rebar was left out of the concrete slab. And time matters, as well. It takes a few months of blowing light bulbs at the rate of one every other week to discover the glitch in the wiring, and a year has to pass before the nailheads begin protruding from the drywall. How can a home builder or inspector hope to find such issues?

There are some bugs that simply cannot be found until the house is lived in, and software is no different. It needs to be in the hands of real users doing real work with real data in real environments. Those bugs are as inaccessible to testers as nail pops and missing rebar are to home builders.

The tours and other exploratory constructs in this book are of limited value in fighting transiency. Getting users involved in testing will help; getting testers involved with users so that they can create tours that mimic user actions will help, too. But at the end of the day, testers are transients. We can do what we can do and nothing more. It’s good to understand our limitations and plan for the inevitable “punch lists” from our users. Pretending that the project is over when an application is released is simply wrongheaded. There is a warranty period that we are overlooking, and that period is still part of the testing phase. I approach this topic in the next chapter, which explores the future of testing.

Monotony

Testing is boring. Don’t pretend for a moment that you’ve never heard a developer, designer, architect, or someone in another non-quality-assurance role express that sentiment. In fact, few QA people I know wouldn’t at least agree that many aspects of what they do day in and day out are, if not boring, monotonous and uncreative.

As exhilarating as the hunt for bugs is early in one’s career, for many it gets monotonous over time. I see this period of focusing exclusively on the hunt as a rite of passage, an important trial by fire that helps immerse a new tester in testing culture, technique, and mindset. However, if I had to do it for too long as the main focus of my day, I’d go bonkers. This monotony is the reason that many testers leave the discipline for what they see as the more creative pastures of design and development.

This is shortsighted because testing is full of interesting strategic problems that can entertain and challenge: deciding what to test and how to combine multiple features and environmental considerations in a single test; coming up with higher-level test techniques and concepts; and understanding how a set of tests fits into an overall testing strategy. All of these are interesting, strategic problems that often get overlooked in the rush to test and test some more. The tactical part of testing, actually running test cases and logging bugs, is the least interesting part, yet it is the main focus of most testers’ day, week, month, and career.

Smart test managers and test directors need to recognize this and ensure that every tester splits their time between strategy and tactics. Take the tedious and repetitive parts of the testing process and automate them. Tool development is a major creative task at Microsoft, and is well rewarded by the corporate culture.

The hard parts of the testing process, such as deciding what to test, determining test completeness, developing user scenarios, and so forth, offer another creative and interesting outlet. Testers who spend time categorizing tests and developing strategy (the interesting part) are more focused on better testing and thus spend less time running tests (the boring part).

Testing remains an immature science. A thinking person can generate a lot of insights without inordinate effort. By ensuring that testers make time to step back from their testing effort and find insights that will improve their testing, teams will benefit. Not only are such insights liable to improve the overall quality of the testing, but the creative time will improve the morale of the testers involved.

This book addresses this need for creativity using tours as a higher-level representation of test cases. The act of recognizing, documenting, sharing, and perfecting a tours-based approach to testing has been widely cited at Microsoft as a productive, creative, and fun way to do more effective testing.

Memorylessness

I have a habit of doing paired testing at Microsoft, where I sit with another tester and we test an application together. I vividly recall one such session with a tester who had a good reputation among his peers and was idolized by his manager for his prolific bug finding.2 Here’s how the conversation went in preparation for the paired testing session:

2 See my blog entry for “measuring testers” in Appendix C, “An Annotated Transcript of JW’s Microsoft Blog,” to see what I think of counting bugs as a way to measure a tester’s value.

ME: “Okay, we just installed the new build. It has some new code and some bug fixes, so there is a lot to do. Can you give me the rundown on what testing you did on the prior builds so that we can decide where to start?”

HIM: “Well, I ran a bunch of test cases and logged a bunch of bugs.”

ME: “Okay, but what parts did you test? I’d like to start off by testing some places you haven’t covered a great deal.”

And from this I got a blank stare. He was doing a lot of testing, but his memory of where he had been, what he had tested, and what he had missed was nonexistent. Unfortunately, he’s not unique in his lack of attention to his past. I think it is a common trait of modern testing practice.

Testing is a present-oriented task. By this I mean that testers are mostly caught in the moment and don’t spend a lot of time thinking about the future. We plan tests, write them, execute them, analyze them, and quickly forget them after they have been used. We don’t spend a lot of time thinking about how to use them on future versions of our application or even other testing projects.

Test teams are often even worse about thinking toward future efforts. Tests are conceived, run, and discarded. Features are tested without documenting the insights of what worked and what did not work, the good tests versus the bad. When the testing has finished, what has the team really learned?

Even the industry as a whole suffers from this amnesia. How many times has an edit dialog been tested over time, and by how many different testers? How many times has the same function, feature, API, or protocol been tested? What is it about testing that we don’t take collective wisdom seriously?

Often, the memory of good tests and good testing resides in the head of the testers who performed it. However, testers move from project to project, team to team, and company to company far too often for them to be a useful repository of knowledge.

Test cases aren’t a good currency for such memory either. Changes to an application often require expensive test case maintenance, and the pesticide paradox lessens the value of preexisting tests.

Tours are somewhat better because a single tour can represent any number of actual test cases, and if we are diligent about mapping tours to features and to bugs, we will create a ledger for our product that will give the next set of testers a great deal of insight into what we did that worked and what was less effective.
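Such a ledger need not be elaborate. As one hypothetical format (the field names and file layout are my assumptions), each tour session could be appended as a line of JSON recording who toured what, which features were covered, and which bugs were filed, so the memory outlives any individual tester.

```python
# Append-only tour ledger: one JSON object per session. All names are
# illustrative.

import json
import time

LEDGER = "tour_ledger.jsonl"


def log_session(tester, tour, features, bug_ids):
    """Record one tour session so future testers inherit the memory."""
    entry = {
        "when": time.strftime("%Y-%m-%d"),
        "tester": tester,
        "tour": tour,
        "features": features,
        "bugs": bug_ids,
    }
    with open(LEDGER, "a") as f:
        f.write(json.dumps(entry) + "\n")


def sessions():
    """Replay the whole history for the next tester or the next project."""
    with open(LEDGER) as f:
        return [json.loads(line) for line in f]


log_session("alice", "Landmark Tour", ["Save", "Print"], [4711])
print(sessions()[-1]["tour"])  # Landmark Tour
```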

Conclusion

The industry’s approach to manual testing has been either to overprepare by writing scripts, scenarios, and plans in advance or to underprepare by simply proceeding in an ad hoc manner. Software, along with its specs, requirements, and other documentation, changes too much to over-rely on the former and is too important to entrust to the latter. Exploratory testing with tours is a good middle ground. It takes the testing task up a level from simple test cases to a broader class of test strategy and technique.

Having a strategy and a set of prescriptive techniques allows testers to approach their task with much more purpose, directly addressing the problem of aimlessness. Tours also force a more variable approach to test case creation, so that the problems of repetitiveness and monotony are attacked head on. Furthermore, the tours provide a structure to discuss test technique and create a tribal knowledge and testing culture to address both transiency (as much as it can be addressed without real users in the loop) and memorylessness. Tour usage can be tracked, and statistics about their coverage and bug-finding ability can be compiled into more meaningful and actionable reports that testers can learn from and use to improve their future efforts.

Exercises

1. Think about the software you use every day. Write a tour that describes the way you use it.

2. If testers could acquire data from their users and use it during testing, we would likely find more bugs that users would care about. But users are often uncooperative in sharing their data files and databases. Can you list at least three reasons why?

3. How can a tester keep track of what parts of the application have been tested? Can you name at least four things a tester can use as the basis for completeness measures?
