Appendix C. An Annotated Transcript of JW’s Microsoft Blog

“If you can’t say anything nice, don’t say nothing at all.”

Disney’s Thumper (quoting his father)

Into the Blogosphere

As a former professor, I wasn’t thrilled when the blogging revolution occurred. To one used to carefully researched academic papers that required anonymous peer review, technical editing, and editorial approval, blogging seemed an unprofessional and chaotic approach to publishing. Any idiot with an opinion, educated or not, could publish anything he or she wanted.

But the twenty-first century finally caught up with me, and I did a number of guest posts for various Microsoft blogs. When my boss first asked me to start blogging regularly, it was obvious why. We have a product to sell, and he thought my blog would drive a lot of interest.

That was his plan, and part of it worked out well. My blog has drawn a lot of traffic and sits in a place of respect among Microsoft developers (although nowhere near the top of the pile). But I didn't use it to sell anything; I used it like the rest of the idiots in the blogosphere, to spout off about my favorite subject: software quality. I wanted to use it to drive the conversation to a higher level rather than to sell tools, and whether I have succeeded is not for me alone to decide.

I've gotten a lot of input and comments about my blogging. Some are posted on the blog itself at http://blogs.msdn.com/james_whittaker, but most were emailed to me or occurred in hallway conversations here and at conferences and didn't get documented. Some are additive to the subject I was blogging about, and some pointed out just how misguided I was. And a few were complaints that I was portraying my employer in a less-than-glowing light (one such from a corporate VP, no less). I've tried to capture the essence of those comments in this annotated transcript; the posts appear here in the order I wrote them.

Finally, given that I have left Microsoft, the blog on MSDN is likely to disappear, and this will be the only place where these posts are preserved. (Any contemporary context required to help you understand these postings is provided in italic.)

July 2008

Two years before this blog started, I joined Microsoft as a security architect in our core operating system division. Security is not something that is easily talked about, and my colleague Michael Howard had the security space covered so well that I didn’t bother. It was a constant source of annoyance that people were asking me where they could find my blog. Now, it is the consummate irony that I am going around telling people where to find it, even when they don’t ask.

Before We Begin

Okay, here it is. I submit.

I've been bugged about blogging for years. "Where's JW's blog?" "Why doesn't JW blog?"...and so forth and et cetera. Well, the blog is here, and why I haven't been blogging up to now is no longer relevant, so I won't bore you with it. Instead, here's the blog, and I'll do my best to ensure that it's worth the wait.

For those of you familiar with my writing, I plan to update some of my more dated work (history of testing, testing’s ten commandments, and so forth) and preview some of the information that I will be publishing in paper and book form in the future. Specifically, I now (finally) have enough notes to revise my tutorial on manual exploratory testing: How to Break Software and will be embarking on that effort soon. This blog is where I’ll solicit feedback and report on my progress.

For now, here’s an update on what’s happening, testing-wise, for me at Microsoft:

• I am the Architect for Visual Studio Team System — Test Edition. That’s right, Microsoft is upping the ante in the test tools business and I find myself at the center of it. What can you expect? We’ll be shipping more than just modern replacements for tired old testing tools. We’ll be shipping tools to help testers to test: automated assistance for the manual tester; bug reporting that brings developers and testers together instead of driving them apart; and tools that make testers a far more central player in the software development process. I can’t wait!

• I am the Chair of the Quality and Testing Experts Community at Microsoft. This is an internal community of the most senior testing and quality thought leaders in the company. We kicked off the community with record-breaking attendance (the most of any of Microsoft's technical network communities) at our inaugural event this past spring, where some of our longest-tenured testers shared a retrospective of the history of testing at Microsoft, followed by my own predictions for the future of the discipline. It was a lively discussion and underscored the passion for testing that exists at this company. In this quarter's meeting we're doing communal deep dives into the testing-related work that is coming out of Microsoft Research. MSR, the division responsible for Virtual Earth and the Worldwide Telescope, also builds test tools! I can't wait to ship some of this stuff!

• I am representing my division (DevDiv) on a joint project with Windows called a Quality Quest. Our quest is concerned with quality, specifically, what we need to do to ensure that our next generation of platforms and services are so reliable that users take quality for granted. Sounds like I took the blue pill, doesn’t it? Well, you won’t find us dancing around acting like our software is perfect. Anyone who has ever heard me speak (either before or after I joined Microsoft) has seen me break our apps with abandon. In this Quest, we’ll leave no stone unturned to get to the bottom of why our systems fail and what processes or technology can serve to correct the situation.

So here it is: the start of a blog that I hope will allow me to share my testing enthusiasm with a wide variety of folks who both agree and disagree with my strategy and tactics. Perhaps, just perhaps, enough of us will join the dialog to help add to the collective voice of those who just want software to work.

PEST (Pub Exploration and Software Testing)

Anyone who has read Chapter 6 of How to Break Software knows my fondness for mixing testing with pubs. Many of the training and challenge events I designed for my students actually took place in a pub. Somehow the pub atmosphere tore down walls and inhibitions and helped focus the conversation on testing. There were simply none of the usual office distractions to hold people back, and pubs just give me a Zen feeling that few other places can match. Perhaps this effect can be achieved in other settings, but I haven't bothered looking for them. Indeed, the only other place I've ever tried is a soccer pitch, but that blog post can wait. (Let me know if you're interested.)

How wonderful it was to experience a group in England that has formalized it: PEST is Pub Exploration and Software Testing...that's right, a group of visionary (would they be anything else in my mind?) England-based testers who meet monthly (or thereabouts) in a pub to talk testing and challenge each other's knowledge and understanding of the subject of exploratory testing. The end result is clearer-headed (at least after the hangover the next day) thinking about testing, techniques, automation, and many other subjects that they imbibe.

I had the pleasure of joining them July 17 at a pub just outside Bristol. Apparently in a nod to my work, the focus of this PEST was bug finding. They set up a total of four breaking stations: (1) a computer with the PEST website (still under development), (2) a vending machine (released product), (3) a child’s video game (released product), and (4) a machine running an app intentionally seeded with bugs. As attendees filed in (~40 in all), they were given one of 10 different beer mats and people with matching mats were teamed up for exploratory testing sessions. I helped adjudicate one of the stations and rang an old style hotel bell with every verified bug. The same happened at the other stations. Each team tested all four products for identical periods of time in a round-robin fashion, and at the end of the night, prizes were given for the team with the most bugs, the most severe bug, and the best test case.

The only problem is that as a designated passenger (and all the duties that entails on behalf of the designated driver), I was having too much fun to take notes and don't have the official score sheet. Can anyone who attended please report the results for us? However, I remember well that the quote of the night came from Steve Green of Labscape: "It's quite strange actually, testing with other people."

Steve (who clearly excelled in exploratory testing to the point that I'd hire him without further interview), please clarify for us: was the help welcome? As a lone Jedi of the Testing Force...weigh in on the whole paired (or in this case, teamed) versus solo testing debate!

PEST is a fantastic idea. I’m glad I had a ride home after it though.

Measuring Testers

This post stands as one of the most viewed and commented-upon posts I have written. It resonated with a lot of testers inside and outside the company. Mostly the comments were positive, but many testers hated the idea of "being measured at all and in any way whatsoever." But that's what performance reviews are! Sorry, but measuring people is a way of life in the business world; why shouldn't we enter into a discussion on how to go about measuring in a meaningful way? And fundamentally, our bug-finding ability is nothing unless we wield it to reduce the number of bugs that get written. That's the real point of the post anyway: it's not about measuring...it's about improvement.

Yeah, I know...scary subject. But as it is review time here at the empire, this is a subject that has been front and center for both testers and the managers they report to, so I’ve been asked about it a lot. I always give the same advice to test managers, but I’ve done so with much trepidation. However, I suddenly feel better about my answer because I’m in good company.

Before I give it away, let me tell you why I am feeling better about my answer. I came across a quote today while looking at the slides that Jim Larus is using for his keynote tomorrow at ISSTA (the International Symposium on Software Testing and Analysis). The quote captures exactly my advice to managers here at Microsoft who ask me how to rate their SDETs. Moreover, the quote comes from Tony Hoare who is a professional hero of mine and a friend of my mentor Harlan Mills (and a Knight, a Turing Award winner and Kyoto Prize winner). If Tony had said the opposite, I would have a whole lot of apologizing to do to the many test managers I’ve given this advice to. Whenever we disagree, you see, I am always wrong.

So here’s my advice: don’t count bugs, their severity, test cases, lines of automation, number of regressed suites, or anything concrete. It won’t give you the right answer except through coincidence or dumb luck. Throw away your bug finding leader boards (or at least don’t use them to assign bonuses), and don’t ask the other testers in the group to rate each other. They have skin in this game, too.

Instead, measure how much better a tester has made the developers on your team. This is the true job of a tester: we don't ensure better software, we enable developers to build better software. It isn't about finding bugs, because the improvement that causes is only temporary. The true measure of a great tester is that they find bugs, analyze them thoroughly, report them skillfully, and end up creating a development team that understands the gaps in its skill and knowledge. The end result will be developer improvement, and that will reduce the number of bugs and increase their productivity in ways that far exceed simple bug removal.

This is a key point. It’s software developers that build software, and if we’re just finding bugs and assisting their removal, no real lasting value is created. If we take our job seriously enough, we’ll ensure the way we go about it creates real and lasting improvement. Making developers better, helping them understand failures and the factors that cause them will mean fewer bugs to find in the future. Testers are quality gurus and that means teaching those responsible for anti-quality what they are doing wrong and where they could improve.

Here are Tony's exact words:

“The real value of tests is not that they detect bugs in the code, but that they detect inadequacies in the methods, concentration and skill of those who design and produce the code.”

—Tony Hoare 1996

Now replace the word "tests" with "testers" and you end up with a recipe for your career. I imagine I'll be examining this subject more in future posts. Follow the link above to get Jim Larus' take on this as well as a guided tour through some of MSR's test technology, some of which is wide of Tony's mark and some a bit closer.

By the way, note my use of the term “empire” to describe Microsoft. I got a few scathing complaints about this. Funny enough, none of the complaints came from Microsoft employees. Can it be that we actually take the term “empire” as a compliment?

Prevention Versus Cure (Part 1)

I wrote these next five blog posts over a two-day period while sitting at the offices of Stewart Noakes's company TCL in Exeter, England. I had a visa issue that prevented me from taking a scheduled flight to India, so I was stuck in a sunny and warm Exeter, where I hung out with Stewart, drank a lot of ale, and talked about testing almost nonstop. This series on Prevention Versus Cure was a reader favorite, not far behind the Future series. Many readers said the posts were funny. I credit Stewart and delicious English ale for that.

Developer testing, which I call prevention because the more bugs devs find the fewer I have to deal with, is often compared to tester testing, which I call detection. Detection is much like a cure: the patient has gotten sick, and we need to diagnose and treat it before it sneezes all over our users. Users get cranky when they get app snot all over them, and it is advisable to avoid that situation to the extent possible.

Developer testing consists of things like writing better specs, performing code reviews, running static analysis tools, writing unit tests (running them is a good idea too), compilation, and such. Clearly developer testing is superior to detection for the following reasons:

  1. An ounce of prevention is worth a pound of cure. For every bug kept out of the ecosystem, we decrease testing costs and those (censored) testers are costing us a (censored) fortune. (editor note to author: the readers may very well detect your cynicism at this point, suggest tone-down. Author note to editor: I’m a tester and I can only contain my cynicism for a finite period; that period has expired.)
  2. Developers are closer to the bug and therefore can find it earlier in the lifecycle. The less time a bug lives, the cheaper it is to remove. Testers come into the game so late, and that is another reason they cost so much.

Tester testing consists of mainly two activities: automated testing and manual testing. I’ll compare those two in a future post. For now, I just want to talk about prevention versus cure. Are we better to keep software from getting sick or should we focus on disease control and treatment?

Again the answer is obvious: Fire the testers. They come to the patient too late after the disease has run rampant and the cure is costly. What the heck are we thinking hiring these people in the first place?

To be continued.

Users and Johns

Lee Copeland hates this post, and I like Lee. But I think my added insight of the John is funny. In the immortal words of Larry the cable guy, “That’s funny; I don’t care who you are.”

Does anyone out there know who originated the insight that the software industry and the illegal drug trade both call their customers users? Brian Marick was the person I stole it from, but as far as I know he doesn't claim it.

Anyway, it’s an interesting insight. There are so many sweet terms we could use for those creatures who so consistently pay our salary and mortgages. My favorite is client. It has such a nice professional, mysterious ring to it. But perhaps we are in good company with the drug dealers. We get the user addicted to our functionality to the point that they overlook its downside and they come gagging for another fix (uh, version, don’t forget to plug the rush-hole!).

I suppose we should be pleased that it stopped with “user.” I, for one, would quit this industry if we start calling them “johns.” Being associated with the drug dealers is one thing, but pimps? That’s where I draw the line.

Ode to the Manual Tester

This is the post that really cemented my love affair with manual testing. I complained loudly around the halls at Microsoft that Vista suffered because its testing was overautomated. A few good manual testers would have gone a long way. The manual tester provides a brain-in-the-loop that no automation can match. Humans can’t test faster, but they can test smarter. If you’ve read this entire book without understanding the depth of my passion for manual testing, you haven’t really read this book.

Anyone who has ever seen me present knows my fondness for bug demos. I have made the point for years that there is a lot to learn from the mistakes we make, and studying past bugs represents one of the most powerful ways to learn about preventing and detecting new ones. But this post won’t belabor that point. Instead, I want to open a discussion about the various ways in which we treat bugs. And I will end up with a point that many people won’t like: that manual detection beats automation. But let’s not get ahead of ourselves, because that point will be made with more than one caveat.

Bugs are a necessary byproduct of human endeavor. We make mistakes, and software isn't the only human-produced product that is imperfect. So in many ways we are stuck with bugs, but that doesn't mean that prevention techniques aren't important. We can and should try our best not to introduce impurities into the software ecosystem. Failing that, our next line of defense is detection and removal. Clearly detection is inferior to prevention (the whole ounce versus pound debate), but since we humans are stuck with it, we should try to detect as many bugs as possible and as soon as possible.

Developers get the first chance at detection since they are there at the very moment of creation. (The same can be said of architects and designers, so switch those roles in this argument as you will; it does not change the outcome.) In general, the tools of the trade here are first manual inspection of the written code followed by automated static analysis. I have no doubt that developers find and destroy many bugs as they write, review, and refine their code. Another round of bugs is likely found in the process of compilation, linking, and debugging.

The number, type, and relative importance of the bugs found and fixed during these developer rituals remain unknown, but in my opinion, these are the easiest and lowest-hanging fruit in the bug forest. They are the bugs that surface solely on the basis of the code alone. The really complex bugs that require system context, environment context, usage history, and so forth are mostly out of bounds. Simply put, developers can't find most of these kinds of bugs.

Enough bugs escape the net to necessitate the next round of bug finding. Round two is still mainly in the hands of developers (often with tester backup): unit testing and build verification/smoke testing. The key differentiator here is that the software is executing as opposed to just sitting there to be read. This opens the door to a whole new category of bugs as execution context is brought to bear on the problem.

After years of performing, observing, and studying unit test activities, I have to say I am unimpressed. Is there anyone out there who is really good at this? Developers, who are creators at heart, approach it unenthusiastically, and testers generally consider it not to be their job. Lacking clear ownership, the reality is that if the code runs from end to end on whatever input scenarios come to mind first, it gets checked into the build. Again, lacking serious study, we don't know the relative importance of the bugs found during this second phase of detection, but given the fact that so many slip through to the next phase, it can't be working as well as it could. My own opinion is that with such little time actually spent doing unit testing, no real software state gets built up, nor do realistic user scenarios actually get run. Our expectations should be low.

Testers own the third shot at detection. At Microsoft, where I now work, and at the dozens of companies I consulted for before that, it's test automation that reigns supreme. I have to wonder if years ago some phenom SDET at Microsoft created an automation platform, found a boatload of bugs with it, got some huge promotion because of it, and as a result word got out that automation is the way to improve your career. Too bad. Although I salute the many fine automators at this company, we have to face facts that despite all our automation heroics, bugs...and I mean important, customer-found bugs...are slipping through. Bugs that, in my opinion, can't or won't be found with automation.

Automation suffers from many of the context problems that I mentioned earlier (environment, state buildup, and such), but its actual Achilles' heel is its inability to catch most failures. Unless a crash occurs, an exception is thrown, or an assert is triggered, automation won't notice the failure. Granted, automation is important and it finds a lot of bugs that need to be found, but we have to realize that ten thousand test cases a day isn't as good as it sounds if you don't notice when any of them fail.

The only way to catch many of the bugs that make their way to our customers’ desktop is by creating an environment that looks like our customers’ environment, running the software to build up data and state and being there to notice when the software actually fails. Automation can play a role in this, but in 2008, it’s manual testing that is our best weapon. Frankly, I don’t see the balance of power shifting away from the manual tester in the near term. If I am right and manual testing is our best chance to find the most important bugs that put our customers at risk, we should be spending a lot more time thinking about it and perfecting it.

I’d like to hear your opinion. What say you to the prospects for manual testing?

Prevention Versus Cure (Part 2)

I got this comment from a test manager at Intel after I posted this one: “After having a team concentrate almost exclusively on automation and bragging about our 1,500 automated tests, our application crashes the first time fingers hit the keyboard. Manual testing reigns when you want to find bugs customers will see.”

I like this guy.

Ok, re-hire the testers.

Perhaps you’ve noticed but the whole prevention thing isn’t working so well. Failures in software are running rampant. Before I talk about where we should invest our resources to reverse this trend, I want to talk about why prevention fails.

I see a number of problems, not the least of which is that good requirements and specifications seldom get written, and when they do, they often fall out of date as the focus shifts to writing and debugging code. We're working on that problem in Visual Studio Team System, but let's not get ahead of ourselves. The question in front of us now is why prevention fails. It turns out, I have an opinion about this:

The developer-makes-the-worst-tester problem. The idea that a developer can find bugs in their own code is suspect. If they are good at finding bugs, then shouldn't they have known not to write the bugs in the first place? This is why most organizations that care about good software hire a second set of eyes to test it. There's simply nothing like a fresh perspective to detect defects. And there is no replacement for the tester attitude of "how can I break this?" to complement the developer attitude of "how can I build this?"

The software-at-rest problem. Any technique, such as code review or static analysis, that doesn't require the software to actually run necessarily analyzes the software at rest. In general this means techniques based on analyzing the source code, byte code, or the contents of the compiled binary files. Unfortunately, many bugs don't surface until the software is running in a real operational environment. Unless you run the software and provide it with real input, many bugs will simply remain hidden.

The no-data problem. Software needs input and data to execute its myriad code paths. Which code paths actually get executed depends on the inputs applied, the software’s internal state (the values of the data structures and variables), and external influences like databases and data files. It’s often the accumulation of data over time that causes software to fail. This simple fact limits the scope of developer testing which tends to be short in duration...too short to catch these data accumulation errors.
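To make the no-data problem concrete, here is a minimal sketch; the class and its off-by-one bug are hypothetical, invented for this illustration rather than taken from any real product. A short developer test passes because nothing has accumulated yet, while the same code fails once a realistic amount of data builds up.

```python
# Hypothetical example of the "no-data" problem: the bug only appears after
# enough records accumulate, so a short developer test passes.

class SessionLog:
    """Keeps recent entries in a fixed-size buffer (hypothetical example)."""
    MAX_ENTRIES = 1000

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        # Bug: the buffer is allowed to grow one entry too far before trimming,
        # and the trim drops the newest entry instead of the oldest.
        if len(self.entries) > self.MAX_ENTRIES + 1:
            self.entries.pop()          # should be pop(0)

def quick_unit_test():
    log = SessionLog()
    for i in range(10):                 # typical short developer test
        log.add(f"event-{i}")
    assert log.entries[-1] == "event-9" # passes; no data has accumulated yet

def long_running_check():
    log = SessionLog()
    for i in range(5000):               # closer to real accumulated usage
        log.add(f"event-{i}")
    assert log.entries[-1] == "event-4999"  # fails: recent events were dropped

quick_unit_test()        # green
# long_running_check()   # would raise AssertionError once data builds up
```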

Perhaps tools and techniques will one day emerge that allow developers to write code without introducing bugs. Certainly it is the case that narrow classes of bugs, like buffer overflows, can be and have been driven to near extinction by developer techniques. If this trend continues, the need for a great deal of testing will be negated. But we are a very long way, decades in my mind, from realizing that dream. Until then, we need a second set of eyes, running the software in an environment similar to real usage and using data that is as rich as real user data.

Who provides this second set of eyes? Software testers provide this service, using techniques to detect bugs and then skillfully reporting them so that they get fixed. This is a dynamic process of executing the software in varying environments, with realistic data and with as much input variation as can be managed in the short cycles in which testing occurs.

In part 3 of this blog series I will turn my attention to tester testing and talk about whether we should be doing this with automation or with manual testing.

Hail Europe!

I've been accused often of being a Europhile, and I have to admit that it's true. I admire Europe's culture and history, and I generally find its people likable. (Even if they don't always feel the same way about me...I've been told on more than one occasion that my speaking style is a little too forward for more conservative Europeans...I'd believe it except that they keep inviting me back). From a testing point of view, I have to say that, apologies to America and Asia, Europe takes testing to a level of respectability that we have not yet reached.

I had the extreme privilege to speak to a crowd of test practitioners based in the UK last week. The event was hosted by Transition Consulting (TCL) and boasted some of the UK’s top consumers of testing services. A list of those companies is posted here, but you’ll have to scroll down a bit because Stewart Noakes has been an active blogger recently.

One comment that some of my American readers might not like: European test audiences tend to be a lot more aware and a lot more involved in this discipline of testing. Everyone seemed familiar with my writing and the writing of people like Beizer, Kaner, and Bach. I was especially surprised at the discussion of the history of the Kaner and Beizer schools of thought in the early '90s and the general knowledge of both industry and academic testing conferences and publications. There seems to be more eagerness to delve into the field and its history here than I generally see in my own country. These folks are really well read!

Proponents of certification might point to that as a reason since certification of testers seems far more popular in Europe. Does certification help spark people’s passion for testing? Test training in general seems more popular in Europe.

I think it might have something to do with the American bias toward test automation, particularly Microsoft’s. Most in our test community are SDETs and approach testing from a very developer-oriented perspective. They may be less inclined to think of themselves as testers and less inclined to involve themselves in its culture and history. That’s a shame. (Obviously there are many counterexamples at Microsoft but I think this is generally true among the population of tens of thousands of us.)

I am probably going to get in a lot of trouble for this post. But now that I’ve mentioned certification, I have a hankering to blog about that now. I can almost guarantee that what I have to say about certification would draw some fire.

The Poetry of Testing

Okay. I admit it. I was in the pub a little too long to be blogging at this point. But I stand behind everything in this post! This is also one of the first indications of my second Euro-inspired passion: My favorite sport is soccer. Blame it on my kids; I never liked the sport either until they started playing it. Now it’s an addiction, and since the Champions League is a lunchtime event here in Seattle, you can find me and all my foreign friends at the local pub for every single game.

God Save the Queen! (A curious statement...from my American point of view. But given what history has recorded of certain of England’s kings I’ll grant the gender bias. Anyway, Save Her all the same as she presides over a country of such glorious breweries!)

If you haven’t guessed it already, I’m visiting England. I’m also in a pub. (You probably guessed that, too.) And I just met with a half dozen or so local testers who convinced me (with the offer of free pints) to meet for a book signing. I almost never turn down a signing and I never turn down free beer, especially at the current exchange rate.

Upon parting, they urged me to turn our conversation into a blog post. Here it is. I hope it doesn’t embarrass me in the morning.

One of the signature seekers was a developer. When I asked him why he bought my book, he answered that he wanted to make sure his testers weren't successful with the "tricks" I preached in it. He intended to frustrate them by writing code that wouldn't fail that way.

I smiled and told him that if this was a soccer, excuse me...football, game I would rip my shirt off and celebrate my goal. He looked at me funny. I think the testers got it. I bet you do, too.

He went on to describe why developing code was better than testing it. He talked about the challenge of wrestling with the compiler and deftly parrying the attempts of the IDE and the operating system to thwart him on his mission. It was a battle to him, a conquest. He was a Knight, fighting for User and Binary.

It was a great story, and I didn’t get permission to identify him so I won’t, but his passion was fantastic, and the world is a better place because he’s in software development.

But if developers are the fighters, I think of myself and my fellow testers as the bards. Testing, to me, is poetry. As I test software I have visions of inputs mingling with data, some are stored internally; some are used temporarily and discarded. I hear music playing as inputs move through the app and find their way to a data structure or get used in some computation. It helps me to think about the inputs in this way; it helps me understand what the application is doing with the input I give it, and that in turn helps me to think of ways to break it. Every potential sour note represents some possible way the developer may have screwed up. Imagine your app processing input. Listen to the poetry it recites; it will tell you when it’s going to fail.

I find this especially true of testing web apps. I envision in my mind the formation of SQL queries that my inputs cause the application to make. I form impressions in my mind of the HTML traffic that is transmitted from client to server and the response back again. What is the application doing? Where is the data going and with what purpose? These are deep, existential questions worthy of the bard in all testers. And they find bugs. The more I can picture the internal processes going on in the application, the better I am able to understand how the developer might have made a mistake.

The music of the pounce is what makes it all worthwhile. That moment in which it becomes obvious that the software can do nothing but fail. It’s euphoria; the equivalent to scoring a winning goal. But, please, keep your shirt on. That’s a cautionable offense in football, and we don’t want developers to be brandishing yellow cards at us.

Prevention Versus Cure (Part 3)

After this post, I got loads of email from Microsoft testers who “came out of the closet” as manual testing sympathizers. Automation has taken precedence over manual testing at Microsoft much the same as development presides over testing. There’s just something in the genetics of the field that makes us admire coders. But the amount of manual testing that gets done is amazing to me. People don’t talk about it because it doesn’t help their review. But people do it because it helps their software.

Now that the testers are once again gainfully employed, what shall we do with them? Do we point them toward writing test automation or ask them to do manual testing?

First, let’s tackle the pros and cons of test automation. Automated testing carries both stigma and respect.

The stigma comes from the fact that tests are code and writing tests means that the tester is necessarily also a developer. Can a developer really be a good tester? Many can, many cannot, but the fact that bugs in test automation are a regular occurrence means that they will spend significant time writing code, debugging it, and rewriting it. One must wonder how much time they are spending thinking about testing the software as opposed to writing the test automation. It’s not hard to imagine a bias toward the latter.

The respect comes from the fact that automation is cool. One can write a single program that will execute an unlimited number of tests and find bugs. Automated tests can be run and then rerun when the application code has been churned or whenever a regression test is required. Wonderful! Outstanding! How we must worship this automation! If testers are judged based on the number of tests they run, automation will win every time. If they are judged on the quality of tests they run, it's a different matter altogether.

The kicker is that we've been automating for years, decades even, and we still produce software that readily falls down when it gets on the desktop of a real user. Why? Because automation suffers from many of the same problems that other forms of developer testing suffer from: it's run in a laboratory environment, not a real user environment, and we seldom risk automation working with real customer databases because automation is generally not very reliable (it is software after all). Imagine automation that adds and deletes records of a database—what customer in their right mind would allow that automation anywhere near their database? And there is one Achilles' heel of automated testing that no one has ever solved: the oracle problem.

The oracle problem is a nice name for one of the biggest challenges in testing: How do we know that the software did what it was supposed to do when we ran a given test case? Did it produce the right output? Did it do so without unwanted side effects? How can we be sure? Is there an oracle we can consult that will tell us—given a user environment, data configuration and input sequence—that the software performed exactly as it was designed to do? Given the reality of imperfect (or nonexistent) specs, this just is not a reality for modern software testers.

Without an oracle, test automation can only find the most egregious of failures: crashes, hangs (maybe), and exceptions. And the fact that automation is itself software often means that the crash is in the test case and not in the software! Subtle and/or complex failures are missed in their entirety.
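As an illustration of that point, here is a minimal sketch; the discounting function and its bug are hypothetical. The first harness mimics automation with no oracle: it reports nothing unless the code under test throws. The second adds an independent statement of intent and immediately flags the wrong answers.

```python
# Hypothetical sketch of the oracle problem: without an oracle, a test "passes"
# as long as nothing crashes, even when the answer is wrong.

def apply_discount(price, percent):
    # Hypothetical code under test, with a subtle bug: integer division
    # silently drops the fractional part of the discount.
    return price - (price * percent // 100)

def automated_run_without_oracle(cases):
    """Runs the cases but only notices crashes and exceptions."""
    for price, percent in cases:
        try:
            apply_discount(price, percent)   # result is never checked
        except Exception as exc:
            print("FAIL (crash):", price, percent, exc)

def automated_run_with_oracle(cases):
    """Same cases, but with an oracle: an independent statement of intent."""
    for price, percent in cases:
        expected = price * (100 - percent) / 100
        actual = apply_discount(price, percent)
        if abs(actual - expected) > 1e-9:
            print("FAIL (wrong result):", price, percent, actual, "!=", expected)

cases = [(999, 15), (10, 33), (250, 7)]
automated_run_without_oracle(cases)  # silent: no crash, so no failure reported
automated_run_with_oracle(cases)     # flags the rounding bug on every case
```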

So where does that leave the tester? If a tester cannot rely on developer bug prevention or automation, where should she place her hope? The only answer can be in manual testing. That will be the topic of part four of this series.

Back to Testing

As I said earlier I started my Microsoft career in security. I took a lot of flack from my testing readers when I “abandoned” testing for security back in 1999. But I couldn’t help myself. After the nonevent of Y2K, I was looking for the next big bug when David Ladd (who blogs at http://blogs.msdn.com/sdl) introduced me to security. I was so uneducated on security that it was a veritable intellectual playground, and I found my testing skills to be incredibly useful. Anyone who could find security bugs could make serious impact. I wrote my second and third books in the How to Break series during this time, invented a new way to find viruses, and got gobs of funding from a very paranoid U.S. government. But security turned out to be...well read on and you’ll find out.

Since starting this blog a couple weeks ago, I’ve received more comments via email than have been posted on the blog. Many more.

It reminds me of when I was a professor and ended every class with “Anyone have a question?” Silence almost always followed that query only to have students line up after class with questions. There is something about one-on-one interactions that just seems pleasing to people. I tried to take the time to remember the questions, so I could answer them later for the entire class when I thought those answers would be generally helpful.

Well, this is the blogging business, not the teaching business and I wonder how much of any of it is helpful; however, the question that has come most frequently to my inbox is “What made you leave security to come back to testing?” Perhaps the answer has some claim to general interest.

That answer: ignorance.

In fact, ignorance was what sent me the other direction back in 2000 when my friend and colleague David Ladd (who blogs here) tweaked my interest. Ignorance is core to progress in science; Matt Ridley explained it best: "Most scientists are bored by what they have already discovered, it is ignorance that drives them on." When David laid out the wonder of security testing to me (and in that sense I never really left testing), I was hooked. Here was an important problem in a field I knew nearly nothing about. Eight years, two patents, two security books, more than a dozen papers, and two startups later, I have to admit I became a bit bored.

In some ways security is getting easier. Many of the problems with security are of our own creation. Buffer overflows, for example, never had to happen. They were a result of poor implementation of programming languages. Viruses didn't have to happen either, for other reasons. Microsoft and many other companies are changing the game. Better compilers, hardened operating systems, and managed code have made many security problems simply vanish. Virtualization and cloud computing will continue this trend. Ignorance is being replaced with knowledge, and nowhere is that more noticeable than in security.

When I heard Visual Studio was looking for an architect for the test business, I found my juices stirring...the siren call of unbounded ignorance.

Working in security made me realize just how hard testing really is. Testing is not a problem created by humans; it's the nature of the beast. It's part of the very fabric of the computer and the network in their infinite possibilities. In fact, someone wondered in another private exchange if I found much had changed in my eight years "away." "No," was my answer, "and I did not expect to." Security has changed so fundamentally in eight short years that had the situation been reversed and it was security I took a sabbatical from, my skills would likely be suspect. Instead I find myself working on much the same testing problems as I had before.

This is not an indictment of any testing researcher, practitioner, or testing in general: It is a nod to the complexity of the problem. There is a lot of ignorance to keep all of us busy trying to find the right knowledge with which to replace it. But we cannot let the seeming lack of progress deter us from working on one of the loveliest scientific problems of our time.

Thanks for asking.

August 2008

I am still in England at this time; the next is my last post before going back to Washington. So whatever conclusions you draw about the effect of English ale on my writing must end after this next one.

Prevention Versus Cure (Part 4)

Manual testing is human-present testing: a human tester using their brain, their fingers, and their wit to create the scenarios that will cause software either to fail or to fulfill its mission. Manual testing often occurs after all the other types of developer and automated techniques have already had their shot at removing bugs. In that sense, manual testers are on a somewhat unlevel playing field. The easy bugs are gone; the pond has already been fished.

However, manual testing regularly finds bugs and, worse, users (who by definition perform manual testing) find them, too. Clearly there is some power in manual testing that cannot be overlooked. We have an obligation to study this discipline in much more detail...there’s gold in them-thar fingers.

One reason human-present testing succeeds is that it allows the best chance to create realistic user scenarios, using real user data in real user environments and still allow for the possibility of recognizing both obvious and subtle bugs. It’s the power of having an intelligent human in the testing loop.

Perhaps it will be the case that developer-oriented techniques will evolve to the point that a tester is unnecessary. Indeed, this would be a desirable future for software producers and software users alike, but for the foreseeable future, tester-based detection is our best hope at finding the bugs that matter. There is simply too much variation, too many scenarios, and too many possible failures for automation to track it all. It requires a brain-in-the-loop. This is the case for this decade, the next decade, and perhaps a few more after that. We may look to a future in which software just works, but if we achieve that vision, it will be the hard work of the manual testers of this planet that made it all possible.

There are two main types of manual testing.

Scripted manual testing

Many manual testers are guided by scripts, written in advance, that guide input selection and dictate how the software’s results are to be checked for correctness. Sometimes scripts are specific: Enter this value, press this button, check for that result and so forth. Such scripts are often documented in Microsoft Excel tables and require maintenance as features get updated through either new development or bug fixes. The scripts serve a secondary purpose of documenting the actual testing that was performed.
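As a small illustration, here is what one such script might look like once it leaves the Excel table; the login scenario, its steps, and its expected results are invented for this example.

```python
# Hypothetical scripted manual test case: explicit actions paired with
# explicit expected results, executed by hand and kept as documentation.
login_script = [
    {"step": 1, "action": "Open the sign-in page",
     "expected": "Username and password fields are shown"},
    {"step": 2, "action": "Enter a valid username, leave the password empty, press Sign In",
     "expected": "An inline error asks for a password; no request is sent"},
    {"step": 3, "action": "Enter valid credentials, press Sign In",
     "expected": "The dashboard loads with the user's display name in the header"},
]

# The tester works through each row by hand and records the outcome, so the
# script doubles as a record of what was actually tested.
for row in login_script:
    print(f"Step {row['step']}: {row['action']} -> expect: {row['expected']}")
```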

It is often the case that scripted manual testing is too rigid for some applications or test processes, and testers take a less formal approach. Instead of documenting every input, a script may be written as a general scenario that gives some flexibility to the tester while they are running the test. At Microsoft, the folks who manually test Xbox games often do this, so an input would be "interact with the mirror" without specifying exactly the type of interaction the tester must perform.

Exploratory testing

When the scripts are removed entirely, the process is called exploratory testing. A tester may interact with the application in whatever way they want and use the information the application provides to react, change course, and generally explore the application’s functionality without restraint. It may seem ad hoc to some, but in the hands of a skilled and experienced exploratory tester, this technique can be powerful. Advocates would argue that exploratory testing allows the full power of the human brain to be brought to bear on finding bugs and verifying functionality without preconceived restrictions.

Testers using exploratory methods are also not without a documentation trail. Test results, test cases, and test documentation are simply generated as tests are being performed instead of before. Screen capture and keystroke recording tools are ideal for this purpose.
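A minimal sketch of that idea, assuming a simple action log is enough for the documentation trail; the helper and the sample entries are hypothetical.

```python
# Hypothetical exploratory-session logger: the documentation trail is produced
# while the session runs, not written as a script beforehand.
import datetime

session_log = []

def note(action, observation=""):
    """Record an exploratory step the moment it happens."""
    session_log.append({
        "time": datetime.datetime.now().isoformat(timespec="seconds"),
        "action": action,
        "observation": observation,
    })

note("Pasted a 5,000-character string into the search box",
     "Input was truncated silently; no error shown")
note("Navigated back, then forward, then refreshed",
     "Search results reappeared with a stale result count")

for entry in session_log:
    print(entry["time"], "|", entry["action"], "|", entry["observation"])
```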

Exploratory testing is especially suited to modern web application development using agile methods. Development cycles are short, leaving little time for formal script writing and maintenance. Features often evolve quickly so that minimizing dependent artifacts (like test cases) is a desirable attribute. The number of proponents of exploratory testing is large enough that its case no longer needs to be argued so I’ll leave it at that.

At Microsoft, we define several types of exploratory testing. That’s the topic I’ll explore in part five.

If Microsoft Is So Good at Testing, Why Does Your Software Still Suck?

I had no idea what kind of traffic a blog could generate until I wrote this. This is the first blog post I wrote that made the MSDN home page, and man did it generate the hits. My inbox was on fire, and mostly the comments were positive. But I remain convinced that this is the post that got certain execs watching me. Seriously, Microsoft has produced software that we are less than proud of...so has every other software company on the planet. Software is hard to write, harder to test, and hard to get even near to perfect. We desperately need to talk about the pains and be honest about the result so that we can improve what we are doing. The most gratifying part of this post was the mail I got from our competitors. They praised my honesty and admitted their own culpability. This is software, and we are all in this together.

What a question! I only wish I could convey the way that question is normally asked. The tone of voice is either partially apologetic (because many people remember that I was a major ask-er of that same question long before I became an ask-ee), or it’s condescending to the point that I find myself smiling as I fantasize about the ask-er’s computer blue-screening right before that crucial save. (Ok, so I took an extra hit of the Kool-Aid today. It was lime and I like lime.)

After 27 months on the inside I have a few insights. The first few are, I readily concede, downright defensive. But as I’ve come to experience firsthand, true nonetheless. The last one though is really at the heart of the matter: That, talent notwithstanding, testers at Microsoft do have some work to do.

I’m not going down the obvious path: that testing isn’t responsible for quality and to direct the question to a developer/designer/architect instead. (I hate the phrase “you can’t test quality in”; it’s a deflection of blame and as a tester, I take quality directly as my responsibility.)

But I am getting ahead of myself. I’ll take up that baton at the end of this post. Let’s begin with the defensive points:

  1. Microsoft builds applications that are among the world’s most complex. No one is going to argue that Windows, SQL Server, Exchange, and so forth aren’t complex, and the fact that they are in such widespread use means that our biggest competitors are often our own prior versions. We end up doing what we call “brown field” development (as opposed to “green field” or version 1 development) in that we are building on top of existing functionality. That means that testers have to deal with existing features, formats, [and] protocols along with all the new functionality and integration scenarios that make it very difficult to build a big picture test plan that is actually doable. Testing real end-to-end scenarios must share the stage with integration and compatibility tests. Legacy sucks and functionality is only part of it...as testers, we all know what is really making that field brown! Be careful where you step. Dealing with yesterday’s bugs keeps part of our attention away from today’s bugs.

    (Aside: Have you heard that old CS creationist joke: “Why did it take god only seven days to create the universe?” The answer: “No installed base.” There’s nothing to screw up, no existing users to piss off, or prior functionality and crappy design decisions to tiptoe around. God got lucky, us...not so much.)

  2. Our user-to-tester ratio sucks, leaving us hopelessly outnumbered. How many testers does it take to run the same number of test cases that the user base of, say, Microsoft Word can run in the first hour after it is released? The answer: far more than we have or could hire even if we could find enough qualified applicants. There are enough users to virtually ensure that every feature gets used in every way imaginable within the first hour (day, week, fortnight, month, pick any timescale you want and it’s still scary) after release. This is a lot of stress to put our testers under. It’s one thing to know you are testing software that is important. It’s quite another to know that your failure to do so well will be mercilessly exposed soon after release. Testing our software is hard; only the brave need apply.
  3. On a related point, our installed base makes us a target. Our bugs affect so many people that they are newsworthy. There are a lot of people watching for us to fail. If David Beckham wears plaid with stripes to fetch his morning paper, it’s scandalous; if I wore my underpants on the outside of my jeans for a week few people would even notice. (In their defense though, my fashion sense is obtuse enough that they could be readily forgiven for overlooking it.) Becks is a successful man, but when it comes to the “bad with the good” I’m betting he’s liking the good a whole lot more. You’re in good company, David.

    But none of that matters. We’ll take our installed base and our market position any day. No trades offered. But still, we are always ready to improve. I think testers should step up and do a better job of testing quality in. That’s my fourth point.

  4. Our testers don’t play a strong enough role in the design of our apps. We have this “problem” at Microsoft that we have a whole lot of wicked smart people. We have these creatures called technical fellows and distinguished engineers who have really big brains and use them to dream really big dreams. Then they take these big dreams of theirs and convince general managers and VPs (in addition to being smart they are also articulate and passionate) that they should build this thing they dreamt about. Then another group of wicked smart people called program managers start designing the hell out of these dreams and developers start developing the hell out of them and a few dozen geniuses later this thing has a life of its own and then someone asks “how are we going to test this thing” and of course it’s A LITTLE LATE TO BE ASKING THAT QUESTION NOW ISN’T IT?

Smart people who dream big inspire me. Smart people who don’t understand testing and dream big scare the hell out of me. We need to do a better job of getting the word out. There’s another group of wicked smart people at Microsoft, and we’re getting involved a wee bit late in the process. We’ve got things to say and contributions to make, not to mention posteriors to save. There’s a part of our job we aren’t doing as well as we should: pushing testing forward into the design and development process and educating the rest of the company on what quality means and how it is attained.

We can test quality in; we just have to start testing a lot sooner. That means that everyone from TF/DE through the entire pipeline needs to have test as part of their job. We have to show them how to do that. We have to educate these smart people about what quality means and take what we know about testing and apply it not just to binaries/assemblies, but to designs, user stories, specs, and every other artifact we generate. How can it be the case that what we know about quality doesn't apply to these early stage artifacts? It does apply. We need to lead the way in applying it.

I think that ask-ers of the good-tester/crappy-software question would be surprised to learn exactly how we are doing this right now. Fortunately, you'll get a chance because Tara Roth, one of the directors of Test for Office, is speaking at STAR West in November. Office has led the way in pushing testing forward, and she's enjoyed a spot as a leader of that effort. I think you'll enjoy hearing what she has to say.

By the way, Tara kicked butt at STAR.

Prevention Versus Cure (Part 5)

This is the last part of the Prevention versus Cure series and shows some of my early thinking on how to divide exploratory testing into smaller more consumable parts. But if you have read this book, you’ll see that my thinking evolved a great deal. I decided against the freestyle-strategy-feedback model in favor of the in-the-small and in-the-large model that I used in this book. Now you can compare which one you like better.

Okay, we’re getting to the end of this thread and probably the part that most of you have asked about: exploratory testing, particularly how it is practiced at Microsoft.

At Microsoft, we define four types of exploratory testing. This isn't meant as a taxonomy; it's simply for convenience. But it underscores that exploratory testers don't just test: they plan, they analyze, they think, and they use any and all documentation and information at their disposal to make their testing as effective as possible.

Freestyle Exploratory Testing

Freestyle exploratory testing is ad hoc exploration of an application’s features in any order using any inputs without regard to what features have and have not been covered. Freestyle testing employs no rules or patterns; just do it. It’s unfortunate that many people think that all exploratory testing is freestyle, but that undersells the technique by a long shot as we’ll see in the following variations.

One might choose a freestyle test as a quick smoke test to see if any major crashes or bugs can be easily found or to gain some familiarity with an application before moving on to more sophisticated techniques. Clearly, not a lot of preparation goes into freestyle exploratory testing, nor should it. In fact, it’s far more “exploratory” than it is “testing” so expectations should be set accordingly.

There isn’t much experience or information needed to do freestyle exploratory testing. However, combined with the exploratory techniques below, it can become a very powerful tool.

Scenario-Based Exploratory Testing

Traditional scenario-based testing involves a starting point of user stories or documented end-to-end scenarios that we expect our ultimate end user to perform. These scenarios can come from user research, data from prior versions of the application, and so forth, and are used as scripts to test the software. The added element of exploratory testing to traditional scenario testing widens the scope of the script to inject variation, investigation, and alternative user paths.

An exploratory tester who uses a scenario as a guide will often pursue interesting alternative inputs or pursue some potential side effect that is not included in the script. However, the ultimate goal is to complete the scenario so these testing detours always end up back on the main user path documented in the script.
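Here is a rough sketch of that rhythm, with an invented checkout scenario and invented detours; the point is only that exploration branches off the script and then rejoins it.

```python
# Hypothetical scenario-based exploration: the documented scenario is the
# spine, and the tester takes detours but always returns to it.
import random

scenario = [
    "Create a new account",
    "Add three items to the cart",
    "Apply a discount code",
    "Check out with a saved credit card",
    "Confirm the order email arrives",
]

detours = [
    "Resize the browser window to its minimum",
    "Hit the back button and retry the step",
    "Switch the UI language mid-step",
    "Let the session idle until just before timeout",
]

for step in scenario:
    print("SCENARIO:", step)
    if random.random() < 0.5:                    # occasional exploration
        print("  detour:", random.choice(detours))
    # whatever the detour reveals, the tester rejoins the scripted path here
```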

Strategy-Based Exploratory Testing

If one combines the experience, skill, and Jedi-like testing perception of the experienced and accomplished software tester with freestyle testing, one ends up with this class of exploratory testing. It’s freestyle exploration but guided by known bug-finding techniques. Strategy-based exploratory testing takes all those written techniques (like boundary value analysis or combinatorial testing) and unwritten instinct (like the fact that exception handlers tend to be buggy) and uses this information to guide the hand of the tester.

These strategies are the key to being successful; the better the repertoire of testing knowledge, the more effective the testing. The strategies are based on accumulated knowledge about where bugs hide, how to combine inputs and data, and which code paths commonly break. Strategic testing combines the experience of veteran testers with the free-range habits of the exploratory tester.
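As one small, concrete example of such a strategy, here is a sketch of boundary value analysis guiding input selection; the quantity field and its 1..99 range are assumptions made up for the illustration.

```python
# Hypothetical boundary value analysis: given a field's legal range, generate
# the values most likely to expose bugs and let them steer the session.
def boundary_values(lo, hi):
    return [lo - 1, lo, lo + 1, (lo + hi) // 2, hi - 1, hi, hi + 1]

# Assume a "Quantity" field documented as accepting 1..99.
for value in boundary_values(1, 99):
    print(f"Try quantity = {value}")  # 0 and 100 are the interesting illegal cases
```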

Feedback-Based Exploratory Testing

This category of testing starts out freestyle but as soon as test history is built up, the tester uses that feedback to guide future exploration. “Coverage” is the canonical example. A tester consults coverage metrics (code coverage, UI coverage, feature coverage, input coverage, or some combination thereof), and selects new tests that improve that coverage metric. Coverage is only one such place where feedback is drawn. We also look at code churn and bug density, among others.

I think of this as “last time testing”: the last time I visited this state of the application I applied that input, so next time I will choose another. Or the last time I saw this UI control I exercised property A; this time I will exercise property B.

Tools are very valuable for feedback-based testing so that history can be stored, searched, and acted upon in real time.
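As a minimal sketch of what such a tool might store, here is a bit of Python that remembers which inputs have already been tried in each application state and steers the next choice toward something new. The state names and inputs are hypothetical; real tooling would persist this history and share it across a team.

```python
from collections import defaultdict
import random

# Maps an application state (a screen, dialog, or API under test) to the inputs
# that have already been applied there.
history = defaultdict(set)

def pick_input(state, candidate_inputs):
    """Prefer inputs not yet tried in this state (last time I applied that
    input, so this time I will choose another); fall back to a random choice
    once everything has been exercised at least once."""
    untried = [i for i in candidate_inputs if i not in history[state]]
    choice = random.choice(untried or list(candidate_inputs))
    history[state].add(choice)
    return choice

# Hypothetical usage: three inputs worth exercising on a login screen.
print(pick_input("Login", ["empty password", "unicode username", "SQL metacharacters"]))
```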

The Future of Testing (Part 1)

Microsoft has this really cool home of the future built on campus that shows how technology and software will change the way families live and communicate. If you’ve ever been to the “carousel of progress” at Disney World, you have the right picture, except that Microsoft’s is far more modern. (Disney’s was an old exhibit and a picture of the future from a 1960s point of view.) We’ve also made a series of videos about the future of retail, health care, productivity, manufacturing, and the like, and one day I stumbled across these videos. As beautifully done as they are, they represent a very compelling future where computers, RFIDs, and software are everywhere. As a tester, this scared me, and I couldn’t help but wonder: with quality as bad as it is in today’s software, how will we ever manage to test tomorrow’s apps?

Thus began my future quest, and I talked about this with dozens of people around the company and started doing presentations to get input from hundreds more. The result was a keynote presentation at EuroSTAR and this blog series. Again, I updated this vision in this book, but this will help you see how the idea progressed.

Outsourcing. It’s a familiar term and the way a lot of testing gets done here in 2008. However, it wasn’t always so and it’s not liable to be that way in the future either. In this post I will talk about how I think testing will get done in the future and how outsourcing might fundamentally change as a business model for software testing.

In the beginning, very little testing was outsourced. Testing was performed by insourcers, people employed within the same organization that wrote the software. Developers and testers (often the same people performing both tasks) worked side by side to get the software written, tested and out the door.

The vendors’ role in the insourcing days was to provide tools that supported this self service testing. But the vendors’ role soon changed as demand for more than just tools surfaced. Instead of just providing tools to insourcers, vendors emerged that provided testing itself. We call this outsourcing, and it is still the basic model for the way many development shops approach testing: hire it out.

So the first two generations of testing look like this:

[Figure: the first two generations of testing, insourcing followed by outsourcing]

The next logical step in the evolution of testing is for vendors to provide testers, and this is exactly the era we’ve entered with crowdsourcing. Yesterday’s announcement by Utest marks the beginning of this era, and it is going to be very interesting to see it unfold. Will crowdsourcers outperform outsourcers and win this market for the future? Clearly market economics and the crowd’s ability to execute will determine that, but my personal view is that the odds are stacked in favor of the crowd. This is not really an either-or situation but the evolution of the field. The older model will, over time, make way for the newer model. This will be a case of Darwinian natural selection played out in the matter of only a few short years. The fittest will survive, with the timeframe determined by economics and quality of execution.

That gives us the third generation:

[Figure: the third generation of testing, crowdsourcing]

And what about the future? Is there an aggressive gene buried deep in the DNA of our discipline that will evolve crowdsourcing into something even better? I think so, though it is many years and a few technological leaps away. I’ll coin a new term for now just to put a name on this concept: testsourcing.

[Figure: the fourth generation of testing, testsourcing]

Testsourcing cannot be explained, however, without one key technological leap that has yet to occur. That leap is virtualization, and it will be described in part two of this series.

The Future of Testing (Part 2)

For testsourcing to take hold of the future of testing, two key technological barriers must be broken: the reusability of test artifacts and the accessibility of user environments. Let me explain:

Reusability: The reusability of software development artifacts, thanks to the popularization of OO and its derivative technologies in the 1990s, is a given. Much of the software we develop today is composed of preexisting libraries cobbled together into a cohesive whole. Unfortunately, testing is not there yet. The idea that I can write a test case and simply pass it off to another tester for reuse is rare in practice. Test cases are too dependent on my test platform: They are specific to a single application under test; they depend on some tool that other testers don’t have; they require an automation harness, library, network config (and so forth) that cannot be easily replicated by a would-be re-user.

Environment: The sheer number of customer environments needed to perform comprehensive testing is daunting. Suppose I write an application intended to be run on a wide variety of mobile phones. Where do I get all these phones to test my application on them? How do I configure all these phones so they are representative of my intended customers’ phones? And the same thing goes for any other type of application. If I write a web app, how do I account for all the different operating systems, browsers, browser settings, plug-ins, Registry configurations, security settings, machine-specific settings, and potentially conflicting application types?
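To make the scale concrete, here is a minimal sketch of how quickly an environment matrix explodes even with a handful of invented dimensions. The dimension values below are made up purely for illustration.

```python
from itertools import product

# Invented environment dimensions for a hypothetical web app.
operating_systems = ["Windows XP", "Vista", "OS X", "Linux"]
browsers = ["IE7", "IE8", "Firefox", "Safari"]
plug_ins = ["none", "Flash", "Silverlight"]
locales = ["en-US", "de-DE", "ja-JP"]

# Every combination is, in principle, a distinct environment to test.
combos = list(product(operating_systems, browsers, plug_ins, locales))
print(len(combos))  # 4 * 4 * 3 * 3 = 144 environments, before settings and conflicting apps
```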

The answer that is emerging for both of these needs is virtualization, which is steadily becoming cheaper, faster, and more powerful and is being applied to application domains that run the gamut from lab management to IT infrastructure deployment.

Virtualization has great potential to empower the “crowd” for crowdsourcing. Specialized test suites, test harnesses, and test tools can be one-clicked into virtual machines that can be used by anyone, anywhere. Just as software developers of today can reuse the code of their colleagues and forebears, so too will the testers in the crowd be able to reuse test suites and test tools. And just as that reuse has increased the range of applications that a given developer can reliably build, it will increase the types of applications that a tester can test. Virtualization enables the immediate reusability of complicated and sophisticated testing harnesses.

Conveniently, virtualization does the same favor for testers with respect to user environments. A user can simply one-click their entire computer into a virtual machine and make it available to testers via the cloud. If we can store all the videos in the world for instant viewing by anyone, anywhere, then why can’t we do the same with virtual user environments? Virtualization technology is already there (in the case of PCs) or nearly there (in the case of mobile or other specialized environments). We simply need to apply it to the testing problem.

The end result will be the general availability of a wide variety of reusable automated test harnesses and user environments that can be employed by any tester anywhere. This serves to empower the crowd for crowdsourcing, putting them on more than even footing with the outsourcers from a technology standpoint, and since they far outnumber the outsourcers (at least in theory if not yet in practice), the advantage is clearly in favor of this new paradigm.

Market forces will also favor a crowdsourcing model powered by virtualization. User environments will have a cash value as crowd testers will covet them to gain a competitive advantage. Users will be incentivized to click that button to virtualize and share their environment. (Yes, there are privacy implications to this model, but they are solvable.) And since problematic environments will be even more valuable than those that work well, there will be an upside for users who experience intermittent driver and application errors: The test VMs they create will be more valuable...there’s gold in those lemons! Likewise, testers will be incentivized to share out testing assets and make them as reusable as possible. Market forces favor a future with reusable test artifacts and virtualization makes it possible.

So what does this virtualization-powered future mean to the individual tester? Well, fast-forward 20 to 30 years, by which time millions (?) of user environments will have been captured, cloned, stored, and made available. I can envision open libraries of such environments that testers can browse for free or proprietary libraries available by subscription only. Test cases and test suites will enjoy the same treatment and will be licensed for fees commensurate with their value and applicability.

Perhaps there will come a time when there are very few human testers at all; only a few niche and specialized products (or products of extreme complexity like operating systems) will actually require them. For the large majority of development, a single test designer can be hired to pick and choose from the massive number of available test virtual environments and execute them in parallel: millions of person-years of testing wrapped up in a matter of hours because all the automation and end-user configurations are available and ready to use. This is the world of testsourcing.

It’s the end of testing as we currently know it, but it is the beginning of a whole new set of interesting challenges and problems for the test community. And it’s a viable future that doesn’t require more than virtualization technology that either already exists or is on the near-term horizon. It also implies a higher-order effort by testers as we move into a design role (in the case of actually performing testing) or a development role (in the case of building and maintaining reusable test artifacts). No more late-cycle heroics; testers are first-class citizens in this virtualized future.

September 2008

In addition to the Future series, I snuck in a few one-offs. This next one on certification generated a lot of attention. Apparently, certification is making the consultants who do training a lot of money, and my skepticism toward the value of certification was not appreciated. This post generated the first real hate mail as a result of this blog. I was accused of sabotaging the certification movement by implying that Microsoft thought it was nonsense. I did much more though than simply imply it... Most people at Microsoft really do think it’s nonsense!

On Certification

How do you feel about tester certification? I’ve heard all the arguments for and against and looked at the different certifications and their requirements. Frankly, I have not been impressed. My employer doesn’t seem impressed either. I have yet to meet a single tester at Microsoft who is certified. Most don’t even know there is such a thing. They’ve all learned testing the old-fashioned way: by reading all the books and papers they can get their hands on, apprenticing themselves to people at the company who are better at it than they are, and critiquing the gurus and would-be gurus who spout off in person and in print.

Simple logic tells me this: Microsoft has some of the best testers I have ever met. (I mean, seriously, the empire knows their testing and they know their testers. I’ve studied testing and have been credited with more test innovation than perhaps I deserve, but I know this field, and rarely a day goes by that I don’t meet a tester who is a far shot better than I am. I’d love to name some of them here, but invariably I’d leave some out and they’d be pissed. Pissed testers are not easy to deal with so that’s why I haven’t bothered naming them.) So in my experience there is an inverse relationship between certification and testing talent. The same is true of testers at other companies I admire that I meet at conferences and meetings. The really good testers I know and meet just aren’t certified. There is the occasional counterexample, but the generalization holds. (Whether the reverse is true, I have little data with which to form an opinion.)

Let me repeat, this is my experience and experience does not equate to fact. However, the reason I am blogging about this is because I met three office managers/administrators recently who are certified. These three are not testers, but they work around software testers, and they hosted a certification course and thought it would be helpful to sit in and understand what the people around them did day in and day out. They sat the courses, took the exam, and got their certification.

Hmm.

Okay, I’ll grant they are smart, curious, and hard working. But there is more to testing than that triad. They readily admit they know little about computing, even less about software. From the time I spent with them, I didn’t get the impression that they would have made good testers. Their skill lies elsewhere. I doubt they would pass any class I ever taught at Florida Tech, and I imagine they’d find the empire’s training a bit too much for them to digest as well. Yet they aced the certification exam without breaking a sweat.

What am I missing? Isn’t the point of a certification to certify that you can do something? Certify is a really strong word that I am uncomfortable using so lightly. When I hire a certified plumber, I expect said plumber to plumb beyond my uncertified ability. When I hire a certified electrician, I expect that electrician to trivialize the problems that vexed me as an amateur. If I hired a certified tester, I would expect them to test with a similar magnitude of competence and skill. I wonder if an office manager of a plumbing company could so easily get certified to plumb.

Well I checked into it. Plumbers (at least in Seattle) are indeed certified, but they don’t get that certification by taking a course and an exam (although they do both). They serve time apprenticing to a master plumber. You better believe that by the time they get that seal of approval, they can plumb till the cows come home.

I realize testing isn’t plumbing but the word certification gives me pause. It’s a strong word. Is there something more to tester certification that I am missing? Is it simply that you understand the base nomenclature of software or that you can converse with other testers and appear as one of the crowd? Or that you simply sat through a course with enough of an open mind that some of it sunk in? What value does this actually bring to the discipline? Are we any better off because we have these certifications? Are we risking certifying people who really can’t test and thereby water down the entire discipline?

I don’t think these certifications are really certifications at all. It’s just training. Calling it a certification is overselling it by a long shot. In my mind a certification means you have a seal of approval to do something beyond what an amateur/tinkerer can accomplish. Otherwise, what has the certification accomplished?

I am proud of being a tester, and if I seem arrogant to be that way then so be it. What I do and what my compatriots do is beyond a single course that an office manager, no matter how smart, can just pick up.

However, if I am wrong about certification, I’d like to be enlightened. For the life of me, I don’t see the upside.

The Future of Testing (Part 3)

This is my favorite prediction and THUD is a tool we are actively constructing.

So we are now at my third prediction that deals with information and how testers will use information to improve their testing in the future.

Prediction 1: Testsourcing

Prediction 2: Virtualization

Prediction 3: Information

What information do you use to help you test your software? Specs? User manuals? Prior (or competing) versions? Source code? Protocol analyzers? Process monitors? Does the information help? Is it straightforward to use?

Information is at the core of everything we do as software testers. The better our information about what the software is supposed to be doing and how it is doing it, the better our testing can actually be. I find it unacceptable that testers get so little information and none of it is specifically designed to make it easier to do our jobs. I am happy to say that this is changing...rapidly...and that in the near term we will certainly be gifted with the right information at the right time.

I take my inspiration for testing information from video games. In video games, we have the surfacing and use of information darn near perfected. The more information about the game, the players, the opposition, the environment, the better you play and the higher score you achieve. In video games this information is displayed in something called a HUD, or heads-up display. All of a player’s abilities, weapons, and health info are displayed and clickable for immediate use. Likewise, your location in the world is displayed in a small minimap and information about opponents is readily available. (My son used to play Pokémon, in which he had access to a Pokédex, which kept information about all the various species of Pokémon he might encounter in the game...I’d like a Bug-é-dex that did the same for bugs I might encounter.)

But most of the testing world is mired in black box testing without such a rich information infrastructure. Where is our minimap that tells us which screen we are testing and how that screen is connected with the rest of the system? Why can’t I hover over a GUI control and see source code or even a list of properties the control implements (and that I can test)? If I am testing an API, why can’t I see the list of parameter combinations that I and all my fellow testers have already tried? I need all of this quickly and in a concise and readily consumable format that assists my testing rather than shuffling through some SharePoint site or database full of disconnected project artifacts.

My colleague at Microsoft, Joe Allan Muharsky, calls the collection of information that I want so badly a THUD — the Tester’s Heads Up Display — putting the information a tester needs to find bugs and verify functionality in a readily consumable format for software testers. Think of a THUD as a skin that wraps around the application under test and surfaces information and tools that are useful in the context of the application. Few THUDs are in use today, and fewer still contain the right information. In the future, no tester would think of testing without one, just like no gamer could imagine traversing an unpredictable and dangerous world without their HUD.

If this sounds a little like cheating, then so be it. Gamers who add cheats to their HUD have an even bigger advantage over gamers who don’t. And as in-house testers who have access to the source, the protocols, the back-end, front-end, and middleware, we can indeed “cheat.” We can have a massive bug-finding advantage over ordinary black box testers and users. This is exactly the situation we want: to be in a position to find our own bugs faster and more efficiently than anyone else. This is cheating I approve of wholeheartedly, but we’re not currently taking advantage of the information required for the cheats.

In the future, we will. That future will be fundamentally different than the information-starved present in which we are currently working.

The Future of Testing (Part 4)

There is some magic in this prediction that, in retrospect, the world has not yet perfected. But as these are predictions of the future, that seems appropriate. Many people talk about moving testing forward, but they mean simply getting testers involved earlier. From where I sit, we’ve been getting testers involved in spec reviews and the like for decades. That’s moving testers forward, not moving testing forward. What we really need to do is to get testable stuff earlier so that we can ply our trade earlier in the process.

Moving Testing Forward

There is a gap that exists in testing that is eating away at quality, productivity, and the general manageability of the entire development life cycle. It is the gap between when a bug is created and when that same bug is detected. The larger the gap, the more time a bug stays in the system. Clearly that’s bad, but all we’ve done in the past is point out that the longer bugs stay in the system, the more expensive they are to remove.

What we’re going to do in the future is close the gap.

But closing the gap means a fundamental change in the way we do testing. In 2008 a developer can introduce a bug, quite by accident mind you — our development environments do little to discourage that, and few concerted attempts are made to find the bug until the binary is built. We insert bugs and then simply allow them free rein until far too late in the process, when we depend on late-cycle bug-finding heroics to bail us out.

As software testers we provide a valuable set of bug finding and analysis techniques; what we have to do in the future is apply these techniques earlier in the process, far sooner than we do now. There are two main things I foresee that will help us accomplish this. One is simply not waiting for the binary and applying our tests on early development artifacts. The second is building the binary earlier so we can test it earlier.

Let’s take these in order, beginning with “testing on early development artifacts.” During late-cycle heroics we apply any number of bug-finding strategies on the binary through its published interfaces. We take the compiled binary or collection of assemblies, byte code, and such, hook them to our test harnesses, and pummel them with inputs and data until we ferret out enough bugs to have some confidence that quality is good enough. (Perhaps I’ll cover measurement and release criteria in a future blog entry.) But why wait until the binary is ready? Why can’t we apply these test techniques on architecture artifacts?...On requirements and user stories?...On specifications and designs? Can it be possible that all the technology, techniques, and testing wisdom collected over the past half century applies only to an artifact that executes? Why aren’t architectures testable in the same way? Why can’t we apply what we know to designs and storyboards? Well, the answer is that there is no good reason we don’t. I actually think that many progressive groups at Microsoft do apply testing techniques early, and that in the future we’ll figure out how to do this collectively. Testing will begin, not when something becomes testable as is the case now, but the moment there exists something that needs testing. It’s a subtle but important distinction.

“Building the binary earlier” is the second part of this but doing so represents a technological hurdle that needs jumping. In 2008 we write software component by component and we can’t build the whole without each of the parts being ready. This means that testing must wait until all the components achieve some level of completion. Bugs are allowed to sit for days and weeks before testing can be brought to bear on their discovery. Can we substitute partially completed components with virtual ones? Or with stubs that mimic external behavior? Can we build general purpose chameleon components that change their behavior to match the system into which they are (temporarily) inserted? I predict we will...because we must. Virtual and chameleon components will allow testers to apply their detection craft soon after a bug is created. Bugs will have little chance to survive beyond their first breath.
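As a minimal sketch of the stub idea, here is a Python stand-in for a component that hasn’t been written yet: it presents the same interface and mimics the external behavior callers depend on, so testing of the surrounding code can start early. All of the names (the inventory service, its methods, the order function) are hypothetical.

```python
class InventoryServiceStub:
    """Stands in for a not-yet-built inventory component: same interface,
    canned behavior, so callers can be tested before the real code exists."""
    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, sku, qty):
        available = self.stock.get(sku, 0)
        if qty > available:
            raise ValueError("insufficient stock")
        self.stock[sku] = available - qty
        return {"sku": sku, "reserved": qty}

# The order-processing code under test takes the stub in place of the real service.
def place_order(inventory, sku, qty):
    reservation = inventory.reserve(sku, qty)
    return "confirmed" if reservation["reserved"] == qty else "failed"

print(place_order(InventoryServiceStub({"widget": 5}), "widget", 2))  # confirmed
```

A chameleon component, in the sense used above, would go one step further and adapt its canned behavior to whatever system it is temporarily inserted into, but the testing benefit is the same: bugs in the callers can be found the week they are written.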

Testing is too important to wait until the end of the development cycle to start it. Yes, iterative development and agile create testable code earlier (albeit smaller, incomplete functionality), but we still have far too many bugs appearing after release. Clearly what we are doing is not enough. The future must bring the power of testing to bear on early development artifacts and allow us to scaffold together a workable, testable environment long before the code is entirely buildable.

The Future of Testing (Part 5)

Visualization is one area in which we are making a lot of progress in the test tools world. This is an area only a few short years away. Software testing will become much more like playing a video game within two to five years.

Visualization.

What does software look like? Wouldn’t it be helpful if we had a visualization of software that we could use while the software was being constructed or tested? With a single glance we could see that parts of it remain unfinished. Dependencies, interfaces, and data would be easy to see and, one would hope, easier to test. At the very least we could watch the software grow and evolve as it was being built and watch it consume input and interact with its environment as it was being tested.

Other engineering disciplines have such visuals. Consider the folks who make automobiles. Everyone involved in the assembly process can see the car. They can see that it has yet to have bumpers or a steering wheel installed. They can watch it progress down the mechanized line from an empty shell to a fully functional product ready to be driven to a dealer. How much longer until it is complete? Well, it’s forty feet from the end of the line!

The fact that everyone involved in making the car has this shared vision of the product is extremely helpful. They speak in terms they can all understand because every part, every connection, every interface is where it is supposed to be when it is supposed to be there.

Unfortunately, that is not our world. Questions of the sort asked above, “How long until it is complete?” or “What tasks remain undone?”, vex us. This is a problem that 21st century testers will solve.

Architects and developers are already solving it. Visual Studio is replete with diagrams and visualizations from sequence charts to dependency graphs. Testers are solving it, too. Visualization solutions exist within the empire’s walls, ranging from seeing code changes in an Xbox title (objects whose code has churned glow green when rendered and then revert to normal after they have been tested) to identifying untested complexity within the Windows code base (heat maps of code coverage versus code complexity can be viewed in three-dimensional space, leading testers right to the problem areas). The visualizations are stunning, beautiful, and allow testers to determine what needs testing simply by glancing at the visual.
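The arithmetic behind such a heat map can be quite simple. Here is a minimal sketch that ranks functions by complexity weighted by how little of them has been exercised; the function names and metric values are invented for illustration, and a real tool would pull them from a coverage run and a complexity analyzer.

```python
# Hypothetical per-function metrics: cyclomatic complexity and statement coverage (0.0 to 1.0).
metrics = {
    "parse_header": {"complexity": 24, "coverage": 0.35},
    "render_page":  {"complexity": 9,  "coverage": 0.90},
    "retry_logic":  {"complexity": 17, "coverage": 0.10},
}

def risk(m):
    # High complexity that remains largely untested is the hot spot a tester should visit next.
    return m["complexity"] * (1.0 - m["coverage"])

for name, m in sorted(metrics.items(), key=lambda kv: risk(kv[1]), reverse=True):
    print(f"{name:14s} risk = {risk(m):5.1f}")
```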

We need more of this but we need to approach the problem carefully. We can’t simply accept the diagrams provided to us by the UML and modeling crowds. Those visuals are meant to solve other problems that may or may not overlap with the problems we face. Many of the existing visuals were created to serve architects or developers whose needs are different. We need to think this through as testers. We need visuals that map requirements to code, tests to interfaces, code churn to the GUI, and code coverage to controls. Wouldn’t it be nice to launch the app under test and be able to see controls glow with an intensity that reflects the amount of coverage or the number of tests that have touched them? Wouldn’t it be nice to be able to see a graphic animating network utilization or real time database communication? Why shouldn’t we be able to see the network traffic and the SQL queries as they happen? There is much that is going on unseen beneath the covers of our application, and it’s time we surfaced it and leveraged it to improve code quality.

This is an eminently solvable problem and one that many smart people are working on. This is software testing in living color.

October 2008

During the month of October I continued my series on the future of testing and my blog numbers really started to pick up, which brought some front-page exposure on MSDN and that caused even more traffic. I also began to get a lot more requests for my Future of Testing talk around the company, so I found myself talking about this subject more and debating it with a lot of smart Microsofties. This really helped expose some weaknesses and solidify the strengths of the vision. I began to gravitate toward the “information” prediction as the primary one of the eight predictions.

But in this next part, I talk about culture. I’ve never revealed to anyone who the technical fellow/distinguished engineer actually is in the following story. I very much doubt I ever will, but I am still meeting with him regularly about software testing.

The Future of Testing (Part 6)

Testing Culture

A couple of months ago I attended a lecture given by one of the Empire’s cache of technical fellows (maybe he was a distinguished engineer, I am not sure as they look so much alike). Like all our TFs the guy was wicked smart, and as he presented a design for some new product he and his team were building, I had an epiphany.

Evidently epiphanies cause me to display facial expressions akin to one who is passing a kidney stone. The TF noticed (so did the gal sitting next to me, but I don’t want to talk about that) and approached me after the talk. Here’s how that conversation went:

“James,” (he knew my name!) “you seem to have some issue with my design or with the product. I’d love to get your feedback.”

“No, I have no problem with either your product or with your design. My problem is with you.”

“Excuse me?”

“People like you scare me,” I told him. “You spend all your time dreaming about features and enabling scenarios and designing interfaces and protocols. You are in a position of importance and people listen to you and build the stuff you dream about. And you do all this without knowing squat about testing.”

And this was the moment he sought to do the right thing...reach out to test. He invited me to review the design [and] get involved. It’s exactly what you’d expect him to do.

But it is exactly the wrong response.

Having a tester involved in design is better than not having test represented at all. But not much better. Testers will be looking for testability issues. Developers will be looking for implementation issues. Who will be looking at both? Who will be able to decide on the right trade-off? Neither. Getting testers involved in design is only incremental improvement; getting designers (and every other role) involved in test is the future.

Seriously, how is it that the people who build software understand so little about testing? And why have we not tried to fix this before? Are we, as testers, so vested in our current role that we are jealously guarding the keys to our intellectual kingdom? Is testing so arcane and obscure that developers can’t find the answers they seek? Have developers grown so accustomed to handing off this “less interesting” aspect of the process to us that they now take it for granted?

Adding testers to the mix hasn’t worked. Getting them involved earlier hasn’t worked. We have products that have a 1:1 ratio of developers to testers and yet those products are not seen as highly reliable. We also have products with a far “worse” ratio that are clearly better products. I think in the future we will come to see that the separation of roles isn’t working. The separation of roles might even guarantee that testing comes late to the dance and fails to fully leverage its intellectual potential on the product.

The current testing culture and separation of roles is broken and the way to fix it is by merging roles. Quality needs to be everyone’s job. Think of it in Tolkiensian terms: one role to rule them all!

Imagine a world where testing knowledge is contained in each and every contributor’s head. The architects know testing, the designers know testing, the developers know testing, and they apply that knowledge constantly and consistently in everything they do. This doesn’t wipe out the separate testing role; there is something to be said for some amount of test independence; it enables better testing. If each decision made throughout product development asks the right testing questions, then the final system test can reach a level of thoroughness we can only dream about now. If everyone on the project understood testing, imagine what a few dedicated testers could accomplish!

Getting to this testing utopia is going to require a massive cultural change. Testing must reach into academia and the other places where programming is taught. As developers progress in their careers, this education must continue and become more advanced and powerful. We need to get to the point that all project stakeholders understand testing and can’t help but to apply its principles in everything they do. Tools will one day support this as well. One day we will be to the point that untestable software just never gets written, not because some strong tester made it happen, but because everyone on the project made it happen.

Testing is too important to be the “bit at the end” of the process. It is early in the process where design decisions impact testing, and it is there that the solutions lie. It’s also too important to leave in the hands of a single role dedicated to quality assurance. Instead we need a fundamental cultural shift that makes quality everyone’s job and embeds its principles in everything we do.

The Future of Testing (Part 7)

I blew this one. I should have called it “Testing as Design” because that is much more what I meant. The day-to-day activities of testing will move to a higher level with all the core assets such as test environments and reusable test cases available to pick and choose from. But here it is in its original and slightly flawed form.

Testers as Designers

Modern testers play largely a role of late cycle heroics that often goes unappreciated come review and bonus time. When we find the big bug it is because we were supposed to...that’s the expectation. When we miss the big bug, people ask questions. It’s often a case of ignored-if-you-do and damned-if-you-don’t.

This is going to change and it is going to change soon because it must. My friend Roger Sherman (Microsoft’s first companywide director of Test) describes this change as the testing caterpillar becoming a butterfly. According to Roger: Testing’s butterfly is design.

I couldn’t agree more. As testing and test techniques move earlier in the process, testers will do work more similar to software design than software verification. We will place more emphasis on designing quality strategy for all software artifacts and not just the binary. We will spend more time recognizing the need for testing rather than actually executing test cases. We will oversee and measure automation rather than building and debugging it. We will spend more time reviewing the status of pre-existing tests than building new ones. We will become designers and our work will be performed at a higher level of abstraction and earlier in the life cycle.

At Microsoft this role is often that of the test architect and I think most testing jobs are moving in this direction. If you’ve read the first six posts on this Future of Testing thread, then you’ll appreciate the changes that are making this design centric role possible in the first place.

Now this sounds like a nice future but there is a decidedly black lining to this silver cloud. The blackness comes from the types of bugs and the types of testing we are currently good at in 2008. It is no great stretch to say that we are better at finding structural bugs (crashes, hangs, and bugs having to do with the software and its plumbing rather than its functionality) than we are at finding business logic bugs. But the future I’ve painted in this series has any number of technological solutions to structural bugs. That will leave the software tester to deal with business logic bugs, and that is a category of issues that I do not think our entire industry deals with in any organized or intentional fashion.

Finding business logic bugs means that we have to understand the business logic itself. Understanding business logic means far more interaction with customers and competitors; it means steeping ourselves in whatever industry our software operates in; it means not only working earlier in the software life cycle but also involving ourselves with prototypes, requirements, usability, and so forth like we have never done before.

There’s hard work early in the software life cycle that testers aren’t experienced in doing. Performing well up front will mean facing these challenges and being willing to learn new ways of thinking about customers and thinking about quality.

Things are decidedly different at the front end of the assembly line, and it’s a place more and more testers will find themselves as the present makes way for the future.

The Future of Testing (Part 8)

I got a call from our privacy folks after this one. Microsoft takes great pains to protect customer information and to behave in a manner that doesn’t create identity-theft problems and the like. Still, I think that we need to migrate testing into the field through our software. It’s self-testing and self-diagnostic software. Yes there are some privacy implications, but surely we can work through those.

Testing Beyond Release

This is the final part of my series on the future of testing. I hope you’ve enjoyed it. For this post I’ve saved what might be one of the more controversial of my predictions: Namely that in the future we will ship test code with our products and be able to exercise that code remotely. I can see the hackers’ grins and hear the privacy advocates’ indignation already, but I’ll respond to those concerns in a minute.

I was in the Windows org when Vista shipped, and I recall demonstrating it to my then 8-year-old son at home one evening. He plays (and works if you’ll believe that) on computers a great deal, and he really liked the Aero interface, the cool sidebar gadgets, and the speed at which his favorite games (which at that time were Line Rider and Zoo Tycoon) ran really impressed him. I recall thinking “too bad he’s not an industry blogger,” but I digress.

At the end of the demo, he hit me with the question every tester dreads: “Daddy, which part did you do?”

I stopped speaking, which is rare for me, and stammered something unintelligible. How do you tell an 8 year old that you worked for months (I had just started at Microsoft and only got in on Vista toward the end of its cycle) on something and didn’t actually create any of it? I tried my canned answers to this dreaded question (exclamation points required...they help me convince myself that what I am saying has some truth to it):

“I worked on making it better!”

“The fact that it works as well as it does...well that’s me!”

“If it weren’t for us testers, this thing would be a menace to society!”

I am especially fond of that last one. However, all of them ring hollow. How is it that I can work on a product for so long and not be able to point to more than the absence of some of the bugs as my contribution?

I think that’s where this idea came from: that test code should ship with the binary and it should survive release and continue doing its job without the testers being present. This isn’t a lame attempt to give me and my compatriots something to point to for bragging rights, but to provide ongoing testing and diagnostics. Let’s face it; we’re not done testing when the product releases, so why should we stop?

We already do some of this. The Watson technology (the famous “send/don’t send” error reporting for Windows apps) that ships in-process allows us to capture faults when they occur in the field. The next logical step is to be able to do something about them.

Watson captures a fault and snaps an image of relevant debug info. Then some poor sap at the other end of the pipe gets to wade through all that data and figure out a way to fix it via Windows update. This was revolutionary in 2004, still is actually. In 2–5 years it will be old school.

What if that poor sap could run additional tests and take advantage of the testing infrastructure that existed before the software was released? What if that poor sap could deploy a fix and run a regression suite in the actual environment in which the failure occurred? What if that poor sap could deploy a production fix and tell the application to regress itself?

He’d no longer be a poor sap, that’s for sure.

To accomplish this it will be necessary for an application to remember its prior testing and carry along that memory wherever it goes. And that means that the ability to test itself will be a fundamental feature of software of the future. Our job will be to figure out how to take our testing magic and embed it into the application itself. Our reward will be the pleasure of seeing that sparkle in our kids’ eyes when they see that the coolest feature of all is the one we designed!

Oh, and to the hackers and privacy folks: never fear! Hugh Thompson and I warned about including test code in shipping binaries (see Attack 10 in How to Break Software Security) long ago. Since we know how to break it, we’ll be in a great position to get it right.

Speaking of Google

Why is it that every time I use Google in the title of one of my posts, traffic seems to spike? This post was only a dumb announcement but was read more than many others! But given that I am now a Google employee, perhaps it was a premonition.

Actually, it is more like speaking at Google as I am headed to GTAC tomorrow to give the newest version of my Future of Testing talk. Hope to see you there.

I’ve received tons of feedback on my blog posts about the future. So much so that I spent most of this weekend integrating (or stealing, I suppose you could say, depending on your perspective) your insights, corrections, and additions. Thanks to all of you who have discussed these things with me and shared your wisdom.

If you happen to miss GTAC, I’ll be giving a similar but darker version at EuroSTAR entitled The End of Testing As We Know It in The Hague on November 11. Yes I was drinking and listening to REM when I made the presentation.

Both GTAC and EuroSTAR were big successes. I think my EuroSTAR talk benefited a great deal from the trial run and both sparked a lot of discussion. I made some fantastic contacts at Google to boot. Odd how so many of them used to work for Microsoft.

Manual Versus Automated Testing Again

I can’t believe how much mail I got over the whole manual versus automation question, and it’s fairly easy to see why. My Ph.D. dissertation was on model-based testing, and for years I taught and researched test automation. Now my obsession with manual testing is in full gear. It’s not an either-or proposition, but I do believe that manual testing has the extreme advantage of having a human tester’s brain fully engaged during the entire process, whereas automation foregoes that benefit the moment it starts to run.

In my Future series I was accused of supporting both sides of the manual versus automated debate and flip-flopping like an American politician who can’t decide whether to kiss the babies or their moms. Clearly this is not an either-or proposition. But I wanted to supply some clarity in how I think about this.

This is a debate about when to choose one over the other and in which scenarios one can expect manual testing to outperform automated testing and vice versa. I think the simplistic view is that automation is better at regression testing and API testing whereas manual testing is better for acceptance testing and GUI testing. I don’t subscribe to this view at all and think it diverts us from the real issues.

I think the reality of the problem has nothing to do with APIs or GUIs, regression or functional. We have to start thinking about our code in terms of business logic code or infrastructure code. Because that is the same divide that separates manual and automated testing.

Business logic code is the code that produces the results that stakeholders/users buy the product for. It’s the code that gets the job done. Infrastructure code is the code that makes the business logic work in its intended environment. Infrastructure code makes the business logic multiuser, secure, localized, and so forth. It’s the platform goo that makes the business logic into a real application.

Obviously, both types of code need to be tested. Intuitively, manual testing should be better at testing business logic because the business logic rules are easier for a human to learn than they are to teach to a piece of automation. I think intuition is bang-on correct in this situation.

Manual testers excel at becoming domain experts, and they can store very complex business logic in the most powerful testing tool around: their brains. Because manual testing is slow, testers have the time to watch for and analyze subtle business logic errors. Low speed but also low drag.

Automation, on the other hand, excels at low-level details. Automation can detect crashes, hangs, incorrect return values, error codes, tripped exceptions, memory usage, and so forth. It’s high speed but also high drag. Tuning automation to test business logic is very difficult and risky. In my humble opinion I think that Vista got bit by this exact issue: depending so much on automation where a few more good manual testers would have been worth their weight in gold.

So whether you have an API or a GUI, whether you are regressing or testing fresh, the type of testing you choose depends on what type of bug you want to find. There may be special cases, but the majority of the time manual testing beats automated testing in finding business logic bugs, and automated testing beats manual testing in finding infrastructure bugs.

November 2008

This was the month I spoke at EuroSTAR. After my keynote, I was told that another speaker at the conference was quoting me: “James Whittaker sees no need for testers in the future.” So, I felt compelled to set the record straight.

I gave a keynote at EuroSTAR on the future of software testing where I began by painting a picture of the promise of software as an indispensable tool that will play a critical role in solving some of humankind’s most vexing problems. Software, I argued, provides the magic necessary to help scientists find solutions for climate change, alternative energy, and global economic stability. Without software how will medical researchers find cures for complex diseases and fulfill the promise of the human genome project? I made the point that software could very well be the tool that shifts the balance of these hard problems in our favor. But what, I asked by means of a litany of software failures, will save us from software?

Somehow as I painted my predictions of a future of software testing that promises a departure from late-cycle heroics and low quality apps, some people got the impression that I predicted “no more testers.” How one can latch onto a 20-second sound bite while tuning out the remainder of a 45-minute keynote is beyond me. The U.S. elections are over; taking sound bites out of context is no longer in season.

This blog is replete with my biases toward manual testing and my admiration for the manual tester. If you read it and if you managed to listen for more than a couple of minutes during my keynote, you’d have to conclude that I believe that the role of the tester is going to undergo fundamental change. I believe that testers will be more like test designers and that the traditional drudgery of low level details like test case implementation, execution, and validation will be a thing of the past. Testers will work at a higher level and be far more impactful on quality.

I quite imagine that the vast majority of testers who actually listened to my full message will rejoice at such a future. I invite the others to take a second read.

Software Tester Wanted

I cannot believe people actually questioned whether I was joking with this post. Clearly, this description of a tester want ad was a little too close to the mark. I was accused of disrespecting my employer and my discipline with this one. Frankly, I think both the post and the reaction to it are just plain funny.

Software tester wanted. Position requires comparing an insanely complicated, poorly documented product to a nonexistent or woefully incomplete specification. Help from original developers will be minimal and given grudgingly. Product will be used in environments that vary widely with multiple users, multiple platforms, multiple languages, and other such impossibilities yet unknown but just as important. We’re not quite sure what it means, but security and privacy are paramount and post release failures are unacceptable and could cause us to go out of business.

Keeping Testers in Test

This is a sore point for a lot of testers at Microsoft: that many of the best testers move to development and program management. There is a perception that it increases promotion velocity, and it seems that perception is even stronger in other places.

I did a webinar for UTest.com today and got some great questions. One question seemed to really resonate: How do you keep good testers from moving to development?

I hear this question a lot. Many engineers see Test as a training ground for development. A testing job is just a foot in the door for a quick move to development. Sigh.

Let’s be honest, this is not a bad thing. I think that having more developers trained as testers is categorically good. They’ll write fewer bugs, communicate with test better, and generally appreciate the work their test teams do on their behalf. I think the real sadness comes from the fact that Test as a discipline loses so many talented people.

I am not convinced that the folks who leave are really doing so because of the developers’ greener pastures. After all, there is a lot of code to write as a tester and it’s often a freer coding atmosphere. I think people leave because too many test managers are stuck in the past and living just to ship. Everywhere I see testers move to development, I see teams that lack a real innovative spirit, and the converse is most certainly true. The happiest, most content testers are in groups that covet innovators and provide opportunity to invent, investigate, and discover.

Want your testers to stay? Give them the opportunity to innovate. If all you see is test cases and ship schedules, all your testers will see is the door. Can’t say I blame them either.

December 2008

I wasn’t very busy in December blog-wise. So if you aren’t going to write much, then use the titles that draw the readers: Google. Believe it or not, this one also made the front page of MSDN! Talk about a formula that works.

Google Versus Microsoft and the Dev:Test Ratio Debate

Ever since I gave a talk at Google’s GTAC event here in Seattle this past October, I’ve had the chance to interact with a number of Google testers, comparing and contrasting our two companies’ approach to testing. It’s been a good exchange.

Now it seems that Google focuses on testing with an intensity that is in the same general ballpark as ours. We both take the discipline and the people who do it seriously. But I think that there are some insights into the differences that are worth pondering.

Specifically, the disparity between our respective developer-to-tester ratios is worth a deeper look. At Microsoft the d:t ratio varies somewhat from near 1:1 in some groups to double or triple that in others. At Google just the opposite seems to be the case with a single tester responsible for a larger number of bug-writing devs. (Clearly we have that in common!)

So which is better? You tell me, but here are my thoughts (without admission of any guilt on Microsoft’s part or accusations against Google):

  1. 1:1 is good. It shows the importance we place on the test profession and frees developers to think about development tasks and getting the in-the-small programming right. It maximizes the number of people on a project actively thinking about quality. It speeds feature development because much of the last minute perfecting of a program can be done by testers. And it emphasizes tester independence, minimizing the bias that keeps developers from effectively testing their own code.
  2. 1:1 is bad. It’s an excuse for developers to drop all thoughts of quality because that is someone else’s job. Devs can just build the mainline functionality and leave the error checking and boring parts to the testers.

It’s interesting to note that Microsoft testers tend to be very savvy developers and are often just as capable of fixing bugs as they are of finding bugs. But when they do so, do devs really learn from their mistakes when they have someone else cleaning up after them? Are testers, when talented and plentiful, an excuse for devs to be lazy? That’s the other side of this debate:

  1. Many:1 is good. When testers are scarce, it forces developers to take a more active role in quality and increases the testability and initial quality of the code they write. We can have fewer testers because our need is less.
  2. Many:1 is bad. It stretches testers too thin. Developers are creators by nature and you need a certain number of people to take the negative viewpoint or you’re going to miss things. Testing is simply too complicated for such a small number of testers. Developers approach testing with the wrong, creationist attitude and are doomed to be ineffective.

So where’s the sweet spot? Clearly there are application-specific influences in that big server apps require more specialized and numerous testers. But is there some general way to get the mix of testers, developers, unit testing, automated testing, and manual testing right? I think it is important that we start paying attention to how much work there really is in quality assurance and what roles are most impactful and where. Test managers should be trying to find that sweet spot.

January 2009

The year 2008 ended with a now-famous bug from our Zune product. It was the talk of the testing circles at Microsoft. We debated the bug, how it got there, and why it was missed. This post was my take.

The Zune Issue

As you can imagine there is a pretty lively debate going on over the Zune date math issue here in the hallways and on our internal mailing lists. There are plenty of places one can find analyses of the bug itself, like here, but I am more interested in the testing implications.

One take: This is a small bug, a simple comparator that was “greater than” but should have been “greater than or equal to.” It is a classic off-by-one bug, easily found by code review and easily fixed then forgotten. Moreover, it wasn’t a very important bug because its lifespan was only one day every leap year, and it only affected the oldest of our product line. In fact, it wasn’t even our bug; it was in reused code. Testing for such proverbial needles is an endless proposition; blame it on the devs and ask them not to do it again. (Don’t get your knickers in a twist, surely you can detect the sarcasm.)

Another take: This is a big bug; it was in the startup script for the device and thereby affected every user. Moreover, its effect is nothing short of bricking the device, even if only for a day (as it turns out, music is actually a big deal on that specific day). This is a pri-1, sev-1, run-down-the-halls-screaming-about-it kind of bug.

As a tester, can I take any view but the latter? But the bug happened. Now we need to ask: What can we learn from this bug?

Clearly, the code review that occurred on this particular snippet is suspect. In every code review I have ever been part of, a check on every single loop termination condition is a top priority, particularly in code that runs at startup. This is important because loop termination bugs are not easily found in testing. They require a “coming together” of inputs, state, and environment conditions that are not likely to be pulled out of a hat by a tester or cobbled together using unthinking automation.
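For readers who haven’t seen the analyses, here is a hedged reconstruction of the offending logic in Python rather than the driver’s original C; it is consistent with the public write-ups and with the greater-than versus greater-than-or-equal-to description above, but the names and the exact shape of the shipped code are assumptions.

```python
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_from_days(days, year=1980):
    """Convert a count of days since Jan 1, 1980 into a year, the way the
    startup code is described as doing."""
    while days > 365:
        if is_leap_year(year):
            if days > 366:      # BUG: when days == 366 in a leap year, neither
                days -= 366     # branch fires and the loop never terminates.
                year += 1
        else:
            days -= 365
            year += 1
    return year

print(year_from_days(365))   # 1980: fine for ordinary inputs
# year_from_days(366) would hang: Dec 31, 2008 reduces to exactly this case.
# Changing the inner test to "days >= 366" lets the loop terminate.
```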

This brings me to my first point. We testers don’t do a good job of checking on the quality of code reviews and unit testing where this bug could have been more easily found. If I were still a professor I would give someone a Ph.D. for figuring out how to normalize code review results, unit test cases, and system test cases (manual and automated). If we could aggregate these results, we could actually focus system testing away from the parts of the system already covered by upstream “testing.” Testers would, for once, be taking credit for work done by devs, as long as we can trust it.

The reason that system testing has so much trouble dealing with this bug is that the tester would have to recognize that the clock was an input (seems obvious to many, but I don’t think it is a given), devise a way to modify the clock (manually or as part of their automation), and then create the conditions of the last day of a year that contained 366 days. I don’t think that’s a natural scenario to gravitate toward even if you are specifically testing date math. I can imagine a tester thinking about February 29, March 1, and the old and new daylight savings days in both Fall and Spring. But what would make you think to distinguish Dec 31, 2008 as any different from Dec 31, 2007? Y2K seems an obvious year to choose and so would 2017, 2035, 2999, and a bunch of others, but 2008?

This brings me to my second point. During the discussions about this bug on various internal forums, no fewer than a dozen people had ideas about testing for date-related problems that no one else involved in the discussions had thought of. I was struck by a hallway debate between two colleagues who were discussing how they would have found the bug and what other test cases needed to be run for date math issues. Two wicked smart testers who clearly understood the problem date math posed, yet had almost orthogonal approaches to testing it!

The problem with arcane testing knowledge (security, Y2K, and localization all come to mind) is that we share our knowledge by discussing it and explaining to a tester how to do something. “You need to test leap year boundaries” is not an ineffective way of communicating. But it is exactly how we are communicating. What we should be doing is sharing our knowledge by passing test libraries back and forth. I wish the conversation had been: “You need to test leap year boundaries, and here’s my library of test cases that do it.” Or, “Counting days is a dangerous way to implement date math; when you find your devs using that technique, run these specific test cases to ensure they did it right.”
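To be concrete about what “here’s my library” might mean, the following is a hedged sketch of a shareable leap year boundary library. The set_device_clock and device_boots hooks are hypothetical stand-ins for whatever harness the consuming team actually has.

import datetime

# Boundary dates worth sharing as a library rather than as hallway advice.
LEAP_YEAR_BOUNDARIES = [
    datetime.date(2008, 2, 28),   # day before leap day
    datetime.date(2008, 2, 29),   # leap day itself
    datetime.date(2008, 3, 1),    # day after leap day
    datetime.date(2008, 12, 31),  # day 366 of a 366-day year (the Zune case)
    datetime.date(2100, 2, 28),   # century year that is not a leap year
    datetime.date(2000, 2, 29),   # century year that is a leap year
]

def run_leap_year_boundary_tests(set_device_clock, device_boots):
    # The library encodes only the dates and the check; the two callables
    # are supplied by whoever reuses it.
    failures = []
    for boundary in LEAP_YEAR_BOUNDARIES:
        set_device_clock(boundary)
        if not device_boots():
            failures.append(boundary)
    return failures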

The testing knowledge it took to completely cover the domain of this specific date math issue was spread across more heads than the set of folks discussing it. The discussion, while educational and stimulating, isn’t particularly transportable to the test lab. Test cases (or models and abstractions thereof) are transportable, and they are a better way to encapsulate testing knowledge. If we communicated in terms of test cases, we could actually accumulate knowledge and spread it to all corners of the company (we have a lot of apps and devices that do date math) much faster than by sitting around explaining the vagaries of counting time. Someone who didn’t understand the algorithms for counting time could still test it using the test assets of someone who did.

Test cases, reusable and reloadable, are the basis for accumulated knowledge in software testing. Testing knowledge is simply far too distributed across various experts’ heads for any other sharing mechanism to work.

Exploratory Testing Explained

As I got closer to finishing this book, I ramped up my exploratory testing rhetoric and began looking for skeptics who would help me find flaws and improve it. One thing you can say about Microsoft is that we have our share of skeptics. This post came as a result of debating and learning from such skeptics. Even though it is pretty vanilla flavored, it has been a reader favorite.

I just finished talking (actually, the conversation was more like a debate) to a colleague who is an exploratory testing critic and a charter member of the plan-first-or-don’t-bother-testing-at-all society.

I am happy to say, he conceded the usefulness (he would not grant superiority) of exploratory testing. Perhaps I have finally found a useful explanation of the efficacy of exploration. Here’s what I said:

“Software testing is complicated by an overload of variation possibilities, from inputs and code paths to state, stored data, and the operational environment. Indeed, whether one chooses to address this variation in advance of any testing by writing test plans or by an exploratory approach that allows planning and testing to be interleaved, covering it all is an impossible task. No matter how you ultimately do testing, it’s simply too complex to do it completely.

However, exploratory techniques have the distinct advantage that they encourage a tester to plan as they test and to use information gathered during testing to affect the way testing is actually performed. This is a key advantage over plan-first methods. Imagine trying to predict the winner of the Super Bowl or the Premier League before the season begins...this is difficult to do before you see how the teams are playing, how they are handling the competition, and whether key players can avoid injury. The information that comes in as the season unfolds holds the key to predicting the outcome with any accuracy. The same is true of software testing, and exploratory testing embraces this by attempting to plan, test, and replan in small ongoing increments, guided by full knowledge of all past and current information about how the software is performing and the clues it yields in the testing results.

Testing is complex, but effective use of exploratory techniques can help tame that complexity and contribute to the production of high quality software.”

Test Case Reuse

I got an email from Capers Jones on this one urging me to consider reuse of other artifacts like specs and design, and so on. I like getting email from famous people. But dude (may I call you dude, Mr. Jones?), I am a tester. Someone else needs to think about reuse in those spaces.

I’ve given my “future of testing” talk four times (!) this week, and by far the part that generates the most questions is when I prophesy about test case reuse. Given that I answered it differently all four times (sigh), I want to use this space to clarify my own thinking and to add some specifics.

Here’s the scenario: One tester writes a set of test cases and automates them so that she can run them over and over again. They are good test cases, so you decide to run them as well. However, when you do run them, you find they won’t work on your machine. Your tester friend used automation APIs that you don’t have installed on your computer and scripting libraries that you don’t have either. The problem with porting test cases is that they are too specific to their environment.

In the future we will solve this problem with a concept I call environment-carrying tests (nod to Brent Jensen). Test cases of the future will be written in such a way that they will encapsulate their environment needs within the test case using virtualization. Test cases will be written within virtual capsules that embed all the necessary environmental dependencies so that the test case can run on whatever machine you need it to run on.
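As a rough sketch of what such a capsule could look like (the names here are invented for illustration, not an existing API), an environment-carrying test might be a test bundled with a machine-readable declaration of everything it needs, which a runner could use to materialize a virtual machine or container before executing it.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TestCapsule:
    # A test bundled with a declaration of the environment it needs.
    # A runner (not shown) would read os_image, packages, and fixtures,
    # materialize them in a virtualized environment, and then call body.
    name: str
    os_image: str
    packages: List[str] = field(default_factory=list)
    fixtures: List[str] = field(default_factory=list)
    body: Callable[[], bool] = lambda: True

login_capsule = TestCapsule(
    name="login_rejects_empty_password",
    os_image="windows-server-2008",                              # assumed image name
    packages=["ui-automation-lib==2.1", "script-runtime==5.0"],  # hypothetical deps
    fixtures=["test_user_accounts.sql"],
    body=lambda: True,  # the real test logic would go here
)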

The scope of technological advances we need for this to happen is fairly modest. However, the Achilles’ heel of reuse has never been technological so much as economic. The real work required to reuse software artifacts has always fallen on the consumer of the reused artifact and not on its producer. What we need is an incentive for testers to write reusable test cases. So, what if we created a “Testipedia” that stored test cases and paid the contributing tester, or their organization, for contributions? What is a test case worth? A dollar? Ten dollars? More? Clearly they have value, and a database full of them would have enough value that a business could be created to host the database and resell test cases on an as-needed basis. The more worthy a test case, the higher its value, and testers would be incentivized to contribute.

Reusable test cases will have enough intrinsic value that a market for test case converters would likely emerge so that entire libraries of tests could be provided as a service or licensed as a product.

But this is only part of the solution. Having test cases that can be run in any environment is helpful, but we still need test cases that apply to the application we want to test. As it turns out, I have an opinion on this and I’ll blog about it next.

More About Test Case Reuse

We mostly write test cases that are specifically tied to a single application. This shouldn’t come as any big surprise given that we’ve never expected test cases to have any value outside our immediate team. But if we want to complete the picture of reusable test cases that I painted in my last post, we need to write test cases that can be applied to any number of different apps.

Instead of writing a test case for an application, we could move down a level and write them for features instead. There are any number of web applications, for example, that implement a shopping cart, so test cases written for such a feature should be applicable to all such apps. The same can be said of many common features like connecting to a network, making SQL queries to a database, username and password authentication, and so forth. Feature-level test cases are far more reusable and transferable than application-specific test cases.

The more focused we make the scope of the test cases we write, the more general they become. Features are more focused than applications, functions and objects are more focused than features, controls and data types are more focused than functions, and so forth. At a low enough level, we have what I like to call “atomic” test cases. A test atom is a test case that exists at the lowest possible level of abstraction. Perhaps you’d write a set of test cases that simply submits alphanumeric input into a text box control. It does one thing only and doesn’t try to be anything more. You may then replicate this test atom and modify it for different purposes. For example, if the alphanumeric string in question is intended to be a username, then a new test atom that encodes the structure of valid usernames would be refined from the existing atom. Over time, thousands (and hopefully orders of magnitude more) of such test atoms would be collected.

Test atoms can be combined into test molecules. Two alphanumeric string atoms might be combined into a test molecule that tests a username and password dialog box. I can see cases where many independent test authors would build such molecules; over time the best such molecule would win out, and yet the alternatives would still be available. With the proper incentives, test case authors would build any number of molecules that could then be leased or purchased for reuse by application vendors that implement similar functionality.
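Purely to illustrate the atom and molecule idea (the submit_login hook is assumed, not a real control-automation API), an atom might be a single-purpose input generator and a molecule a small composition of atoms:

import random
import string

# Atom: does one thing only -- generate alphanumeric input for a text box control.
def alphanumeric_atom(length=8):
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

# Refined atom: the same idea, constrained to the structure of a valid username.
def username_atom():
    return "user_" + alphanumeric_atom(length=6).lower()

# Molecule: two atoms combined to exercise a username and password dialog box.
# submit_login is a hypothetical hook into whatever UI harness is in use.
def login_molecule(submit_login):
    username = username_atom()
    password = alphanumeric_atom(length=12)
    return submit_login(username, password)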

At some point, enough test atoms and molecules would exist that the need to write new, custom tests would be minimal. I think that something like Wikipedia, a site with user-supplied, user-policed, and user-maintained content, is what the industry would need to store all these tests. Perhaps such a community Testipedia can be constructed, or companies can build their own internal Testipedias for sensitive applications. Either way, a library of environment-carrying (see my last post) test atoms and molecules would have incredible value.

A valuable extension of this idea is to write atoms and molecules in such a way that they understand whether they apply to an application. Imagine highlighting and then dragging a series of ten thousand tests onto an application and having the tests themselves figure out whether they apply, and then run themselves over and over within different environments and configurations.

Ah, but now I am just dreaming.

I’m Back

I got email from a number of former students (it’s so nice that they continue to follow my work even now when I can no longer grade them) who remember my post-vacation intensity. Time to think is very important for those of us in a position to make work for other people with those thoughts!

When you’re on vacation do you think about work? Not thoughts of dread, worry, or angst but reflection, planning, and problem solving. I just did. Last Sunday I awoke in Seattle to freezing temps and a dusting of snow. By midday I was building a sandcastle on Ka’anapali Beach, Maui, in 79 degree sunshine. If that’s not getting away from it all, I don’t know what is.

Yet my mind wasn’t really away. In fact, I thought about work all the time. Given that software was everywhere I looked, it’s not hard to see why. My entire trip was booked online, even the taxi to the airport. Not a single person besides myself took part in the process. Just me...and a load of software.

The taxi cab itself contained software, as did the airplane. The baggage carousel, the espresso machine, the car rental counter (no person there, just a self serve terminal), and even the surveillance camera that watched my son juggle his soccer ball while I packed our bags in the trunk. All alone, except for the software. Even the frozen concoction machine had software that helped it maintain the right temperature. (It broke, incidentally, making me thankful that I am a beer drinker.)

Is it possible for anyone in this field to really get away from it all? (Don’t get me started on the motion sensors that control the air conditioning in the hotel room. I’m all for shutting the air conditioning off when the room is not in use, but apparently sitting still and being cool was not one of their end-to-end scenarios.)

The truth of the matter is that getting away from it all just isn’t necessary for me. I like seeing software in action and I enjoy brooding over problems of testing it. Vacations free my mind from the daily grind and leave my mind to question things that back home I might overlook. Does this make me work obsessed or just indicate that I really like what I do?

Vacations have always been like this for me. When I was a professor, two students who led my research lab, Ibrahim El-Far and Scott Chase, actually tried to avoid me when I returned from a trip, afraid of the work my new insights would bring. They never quite managed it.

Which brings me back to the motion sensor in my room. The problem isn’t so much a poor tester as poor testing guidance. The sensor does exactly what it is designed to do, and testing it against those requirements is what got me into the sit-and-sweat loop. The problem is that no one thought to give it a field trial...what I call “day in the life” testing. Had the tester thought to take the sensor through a 24-hour cycle of usage, they would have identified that problematic ten-hour period (yes, ten; it’s a vacation after all) when motion is low and the desire to be cool is high. But what tool gives such guidance? Modern tools help testers in many ways, but helping them think of good test scenarios isn’t one of them. They help us organize, automate, regress, and so forth, but do they really help us to test?
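A “day in the life” test for that sensor could be as simple as a simulated 24-hour occupancy trace played against the controller; the controller interface below is invented purely to illustrate the scenario.

def day_in_the_life(controller, hours=24, idle_start=22, idle_end=8):
    # Play a 24-hour occupancy trace against a motion-sensor AC controller.
    # controller is a hypothetical object with report_motion(hour) and
    # is_cooling(hour). The quiet overnight stretch (ten hours here) models
    # a guest who is sitting still but still expects the room to stay cool.
    failures = []
    for hour in range(hours):
        idle = hour >= idle_start or hour < idle_end   # the low-motion period
        if not idle:
            controller.report_motion(hour)
        if idle and not controller.is_cooling(hour):
            failures.append(hour)   # the sit-and-sweat loop, caught in the lab
    return failures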

That’s the tool I want. Tomorrow, when I return, I am going to direct someone to build it for me. Ibrahim and Scott, you are off the hook this time.

Of Moles and Tainted Peanuts

There was a full page ad for Jif peanut butter in my morning paper that caught my attention. (For those non-U.S. readers, our nation is experiencing a salmonella bacteria outbreak that has been traced back to contaminated peanuts.) The ad touted Jif’s rigorous testing processes and reassured readers that testing for salmonella was a long-time habit for the Jif company, and we should feel confident in consuming their products.

Now clearly peanut butter is not software. I very much doubt that the processes for making peanut butter have changed much over the past few decades. I also imagine that one batch of peanut butter is about the same as the last batch. I concede that we have a harder problem.

But the term “long-time habit” really caught my eye, because I haven’t seen too many long-time habits established in the testing industry. We plan tests, write test cases, find bugs, report bugs, use tools, run diagnostics, and then we get a new build and start the process all over again. But how much do we learn in the process? How much do we retain from one build to the next? Are we getting better each time we test? Are we getting better purposefully, or are we just getting more experienced? In many ways, the only real repository of historical wisdom (those long-time habits of Jif) is embodied in our tools.

My friend Alan Page likens testing to playing whack-a-mole. You know the one: Chuck a quarter in, plastic moles pop up through a random sequence of holes, and you whack them on the head with a mallet. Whack one, another appears, and even previously whacked moles can pop up again, requiring additional mallet treatment. It’s a never-ending process; just add quarters.

Sound familiar? Testing is whack-a-mole, with developers applying the quarters liberally. Now, defect prevention notwithstanding, we can take a lesson from Jif. They understand that certain risks are endemic to their business, and they’ve designed standard procedures for mitigating those risks. They’ve learned how to detect salmonella, and they have wired those tests into their process.

Have we paid enough attention to our history that we can codify such routine test procedures and require their consistent application?

Clearly software is not peanut butter. Every piece of software is different; Office’s salmonella is likely irrelevant to Windows and vice versa. But that is no excuse to play whack-a-mole with bugs. We have to get better. Perhaps we can’t codify a salmonella testing procedure into a cookbook recipe, but we can start being more proactive with the whole business of learning from our mistakes.

I propose that testers take a break from finding bugs and just take some time to generalize. When the bug pops its head from some coded hole, resist the temptation to whack it. Instead, study it. How did you find it? What was it that made you investigate that particular part of the app? How did you notice the hole and what was happening in the app that caused the bug to pop out? Is the test case that found the bug generalizable to find more bugs similar to the one you are ready to whack? Is there some advice you could pass along to other testers that would help them identify such holes?

In other words, spend part of your time testing the current product you are trying to ship. Spend the rest of the time making sure you learn to test the next product better. There is a way to do this, a metaphor we use here at Microsoft to help.

I’ll discuss the metaphor and how we apply it to form long-time habits, peanut butter style, in my next post.

And that’s where I end this annotated posting and the exact place where this book begins. The metaphor I speak of is the tourist metaphor, which is presented in the pages of this book, primarily Chapter 4, “Exploratory Testing in the Large.”
