6 Test Tools and Automation

The intentions of a tool are what it does. A hammer intends to strike, a vise intends to hold fast, a lever intends to lift. They are what it is made for. But sometimes a tool may have other uses that you don’t know. Sometimes in doing what you intend, you also do what the knife intends, without knowing.

—Philip Pullman

The sixth chapter of the Advanced Technical Test Analyst syllabus is concerned with tools. The 2007 Advanced syllabus (Chapter 9) contained a discussion of test tools that might be used by all testers. In this 2012 syllabus, the tools chapter has been rescoped to discuss only the tools and techniques likely to be used by technical test analysts. We will discuss the following topics:

1. Integration and Information Interchange between Tools

2. Defining the Test Automation Project

3. Specific Test Tools

In addition, we have decided to keep some tool discussions from the first edition of our book that we believe are of interest to technical test analysts, even though they are no longer necessarily in scope for ISTQB.

6.1 Integration and Information Interchange between Tools

Learning objectives

TTA-6.1.1 (K2) Explain technical aspects to consider when multiple tools are used together.

Test tools can and should be made to work together to solve complex test automation tasks. In many organizations, multiple test and development tools are used. We could have a static analysis and unit test tool, a test results reporting tool, a test data tool, a configuration management tool, a defect management tool, and a graphical user interface test execution tool. In such a case, it would be nice to integrate all the test results into our test management tool and add traceability from our tests to the requirements or risks they cover. As a technical test analyst, you should help the test manager plan for, design, and implement integration of disparate test tools.

Integration of test tools can be done automatically, if all the tools have compatible interfaces. If you buy a single vendor’s test tool suite, the tools within the suite should be fully integrated out of the box. (Of course, you should consider not buying test suites that don’t have such self-integration.) That test tool suite might not integrate with your other tools. In the more likely case that some of the interfaces aren’t compatible—or maybe don’t even exist—you should plan to create integration modules to tie the tools together. Such modules are sometimes referred to as glue or middleware. While there are costs associated with building integration modules, doing so provides significant benefits and risk reduction.

First, by actively sharing information across tools, integrity constraints can be put in place to ensure consistency across the tools. Related information that is stored in different places quickly gets out of sync without an active process to keep it in sync. Test results reports that are created using information from disparate and inconsistent repositories will be confusing at best, and more likely plain wrong. For example, integrity constraints can ensure that test case ID numbers stored in two repositories are kept in sync if the master repository of test cases is updated in such a way that test case ID numbers are changed.

Second, if information must be manually merged across tools, not only is that prone to error, it’s also very costly. Rex once had to spend about an hour every day merging data from two different test case tracking systems. With proper tool integration, this data merging happens regularly, automatically, invisibly, and without tester intervention or effort.

When integrating test tools, focus on the data, because getting consistent data across all the tools is the essential element here. The data should be merged in a fully automated fashion, happening on a regular basis and also available on demand if you need to force a synchronization. The repositories should be set up so that no inaccuracies or failed data copies can occur invisibly; in other words, in the rare (and it must be rare) circumstance that one or more pieces of information fails to synchronize, an administrator must be informed. If a network connection or server goes down, the integration facilities should recover automatically, including catching up any missed data synchronizations.
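To make this concrete, here is a minimal Python sketch of what such a synchronization job might look like. The repository objects, their methods, and the notification callback are hypothetical stand-ins rather than any particular tool’s API.

```python
# Minimal sketch of a tool-integration "glue" job. The repository objects,
# the notify_admin callback, and their methods are hypothetical.
import logging
import time

log = logging.getLogger("tool_sync")

def synchronize(master_repo, target_repo, notify_admin, max_retries=3):
    """Copy test case records from the master repository to another tool and verify them."""
    for attempt in range(1, max_retries + 1):
        try:
            mismatched_ids = []
            for record in master_repo.fetch_test_cases():        # hypothetical API
                target_repo.upsert_test_case(record)              # hypothetical API
                copied = target_repo.get_test_case(record["id"])
                if copied != record:                              # integrity check
                    mismatched_ids.append(record["id"])
            if mismatched_ids:
                notify_admin(f"Sync integrity check failed for IDs: {mismatched_ids}")
            return
        except ConnectionError as exc:
            log.warning("Sync attempt %d failed: %s", attempt, exc)
            time.sleep(60)                                        # wait, then catch up
    notify_admin("Tool synchronization failed after all retries")
```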

It’s easy to focus on the user interface across the integrated tools, but the data is the important thing. Yes, it would be nice to have the tools work in a consistent way, but unless you are building custom tools, that will be hard to do after the fact. What’s more important—and should be considered a must-have during tool acquisition—is the ability to ensure that across all tools, data is captured, stored, and presented consistently. Security of the data is also important. Otherwise, people can inadvertently or on purpose damage data in one weakly secured tool, then watch in horror or glee as that damaged data propagates across all the other tools.

If it sounds like we’re describing a small development project, yes, that’s right, we are. As such, it needs planning and design. Rex has found that such projects lend themselves well to iterative approaches such as Agile or spiral life cycle models, but there is a need for an overall design that identifies all the tool touchpoints, their interfaces, and the data to be shared. If you aren’t used to doing this kind of work, you might want to involve a skilled business analyst in your organization. Just jumping in and starting work on integration of tools may turn out to be the start of an open-ended project where work is never done and no one is ever satisfied with the results.

Figure 6–1 is an example of an integrated automated test system built for an insurance company. The system under test—or, more properly, the system of systems under test—is shown in the middle.

On the front end are three main interface types: browsers, legacy Unix-based green-screen applications, and a newer Windows-based consolidated view. The front-end applications communicate through the insurance company’s network infrastructure, and through the Internet, to the iSeries server at the back end. The iSeries, as you might imagine for a well-established regional insurance company, manages a very large repository of customers, policies, claim history, accounts payable and receivable, and the like.

Figure 6–1 Integrated test system example

On the right side of the figure, you see the main elements of the test automation system. For each of the three interface types, we need a driver that will allow us to submit inputs and observe responses. The terminal driver is shown with a dotted line around it because there was some question initially about whether that would be needed. The controller/logger piece uses the drivers to make tests run, based on a repository of scripts, and it logs results of the tests. The test data and scripts are created using tools as well, and the test log analysis is performed with a tool.
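As a rough illustration of the controller/logger idea, the following Python sketch dispatches scripted steps to whichever driver matches the interface type and logs the outcome. The script format and the driver methods are assumptions for illustration only.

```python
# Rough sketch of a controller/logger. The script dictionaries and the driver
# objects (with submit/verify methods) are assumptions for illustration.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("controller")

def run_scripts(scripts, drivers):
    """drivers maps interface names such as "browser", "terminal", "windows" to driver objects."""
    for script in scripts:
        driver = drivers[script["interface"]]        # pick the matching front-end driver
        try:
            for step in script["steps"]:
                driver.submit(step)                  # hypothetical driver call
            passed = driver.verify(script["expected"])
            result = "PASS" if passed else "FAIL"
        except Exception as exc:                     # log the breakage and keep going
            result = f"ERROR: {exc}"
        log.info("%s: %s", script["name"], result)
```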

Notice that all of these elements on the right side of the figure could be present in a single, integrated tool. However, this is a test system design figure, so we set implementation details aside for now. It is a good practice to design what you need first and then find tools that can support it rather than letting the tools dictate how you design your tests. Trust us on this one; we both have the scars to prove it! If you let the tools drive the testing, you can end up not testing important things.

This brings us to the left side and bottom of the figure. In many complex applications, the action on the screens is just a small piece of what goes on. What really matters is data transformations, data storage, data deletion, and other data operations. So, to know whether a test passed or failed, we need to check the data. The data probe allows us to do this.

The pipe is a construct for passing requests to the data probe from the controller and for the data probe to return the results. For example, if starting a particular transaction should add 100 records to a table, then the controller uses one of the applications to start the transaction—through a Windows interface via the Windows driver, say—and then has the data probe watch for 100 records being added. See, it could be that the screen messages report success, but only 90 records are added. So, we need a way to catch those kinds of bugs, and this design does that for us.
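A hedged sketch of that check might look like the following; the probe protocol, the driver call, and the table name are invented for illustration.

```python
# Sketch of the pipe/data-probe check. The probe protocol, driver method, and
# table name are invented; the idea is to verify the data, not just the screen.
def verify_transaction(windows_driver, probe_pipe, transaction, table="CLAIMS", expected_rows=100):
    before = probe_pipe.request({"op": "count_rows", "table": table})   # ask the data probe
    windows_driver.run(transaction)                                     # drive the UI
    after = probe_pipe.request({"op": "count_rows", "table": table})
    added = after - before
    if added != expected_rows:
        # The screen may report success, but the data tells the real story.
        return f"FAIL: expected {expected_rows} new rows, found {added}"
    return "PASS"
```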

In all likelihood, the tool or tools used to implement the right-hand side of this figure would be one or two commercial or freeware tools, integrated together. The data probe and pipe would probably be custom developed.

6.2 Defining the Test Automation Project

While we often think of test automation as meaning specifically the automation of test execution, we can automate other parts of the test process as well. You would be correct in thinking that most of the test automation that happens involves attempts to automate tasks that are tedious or difficult to do manually. These tasks include test and requirements management, defect tracking and workflow, configuration management, and certainly test execution tasks such as regression and performance testing.

Getting the full benefit from a test tool involves not only careful selection and implementation of the tool but also careful ongoing management of it. Too often, an automation project fails because the only guidance the automation team received was, “Get ’er done!”

The success or failure of an automation project can often be determined very early in the project, based on the extent of preparation, the design, and the architecture built before starting to crank out tests. Too often as consultants and practitioners we’ve seen test teams saddle themselves with constraints due to poor design decisions made at the outset of automation. There is only one way to start an automation program: thinking long term. The decisions that you make at the beginning will be with you for years, unless, like many organizations, you paint yourselves into a corner and have to start fresh down the road.

6.2.1 Preparing for a Test Automation Project

Learning objectives

TTA-6.2.1 (K2) Summarize the activities that the technical test analyst performs when setting up a test automation project.

You should plan to use configuration management for all test tool artifacts, including test scripts, test data, and any other outputs of the tool, and remember to link the version numbers of tests and test tools with the version numbers of the items tested with them.

We will use two related terms, architecture and framework. These two terms are often used interchangeably, but they will not be in this book. An architecture is a conceptual way of building an automated system. Architectures we will discuss include record/playback, simple framework, data-driven, and keyword-driven architectures. A framework is a specific set of techniques, modules, tools, and so on that are molded together to create a solution using a particular architecture. We will discuss these differences in detail later in the chapter.

When automating test execution, you should plan to create a proper framework for your automation system. A good framework supports another important aspect of good test execution automation, which is creating and maintaining libraries of tests. With consistent decisions in place about the size of test cases, naming conventions for test cases and test data, interactions with the test environment and such, you can now create a set of reusable test building blocks with your tools. You’ll need libraries in which to store those.

Automated tests are programs for testing other programs. So, as with any program, the level of complexity and the time required to learn how to use the system means that you’ll want to have some documentation in place about how it works, why it was built the way it was, and so forth. Eventually, the original architects will be gone; any knowledge not documented will be long gone too. Documentation doesn’t have to be fancy, but any automated test system of any complexity needs it.

Remember to plan for expansion and maintenance. Failure to think ahead, particularly in terms of how the tests can be maintained, will reduce the scalability of the automated system; that will reduce the possibility of getting positive value on your automation project.

ISTQB has moved the section discussing the business case for automation to the Advanced Test Manager syllabus. While we agree with that decision generally, we believe that a technical test analyst must also be aware of the “why” of automation in order to build a system that will generate the most value for the organization. Therefore, in this edition of our book, we will still include some discussion on the benefits of automation.

Test automation should occur only when there is a strong business case for it, usually one that involves shrinking the test execution period, reducing the overall test effort, and/or covering additional quality risks that could not be covered by manual testing.

When we talk about the benefits of automation, notice that these are benefits compared to the duration, effort, or coverage we would have with manual testing. The value of our automation has to be considered in terms of comparing it to other alternatives we might choose. Importantly, those alternatives must be alternatives that the organization actually would have pursued. In other words, if we automate a large number of tests but they are tests we would not bother to run manually, we should not claim a return on investment in terms of time savings compared to manual execution of those tests. As Rex often says, just because something’s on sale doesn’t mean that it’s a bargain; it’s not if you don’t need it.

In any business case, we have to consider costs, risks, and benefits. Let’s start with the costs. We can think of costs in terms of initial costs and recurring costs. Initial costs include the following:

  • Evaluating and selecting the right tool. Many companies try to shortcut this and they pay the price later, so don’t succumb to the temptation.

  • Purchasing the tool, adapting an open-source tool, or developing your own tool.

  • Learning the tool and how to use it properly. This includes all costs of intraorganizational knowledge transfer and knowledge building, including designing and documenting the test automation architecture.

  • Integrating the tool with your existing test process, other test tools, and your team. Your test processes will have to change. If they don’t change, then what benefit are you actually getting from the tool?

Recurring costs include the following:

  • Maintaining the tool(s) and the test scripts. This issue of test script durability—how long a script lasts before it has to be updated—is huge. Make sure you design your test system architecture to minimize this cost, or to put it another way, to maximize test durability.

  • Paying for ongoing license fees.

  • Paying support fees for the tool.

  • Paying for ongoing training costs.

  • Porting the tests to new platforms (including OS upgrade).

  • Extending the coverage to new features and applications.

  • Dealing with issues that arise in terms of tool availability, constraints, and dependencies.

  • Instituting continuous quality improvement for your test scripts.

It’s a natural temptation to skip thinking about planned quality improvement of the automation system. However, with a disparate team of people doing test automation, not thinking about it guarantees that the tool usage and scripts will evolve in incompatible ways and your reuse opportunities will plummet. Trust us on this; we saw a client waste well over $250,000 and miss a project deadline because there were two automation people creating what was substantially the same tool using incompatible approaches.

In the Foundation syllabus, you’ll remember that there was a recommendation to use pilot projects to introduce automation. That’s a great idea. However, keep in mind that pilot projects based on business cases will often miss important recurring costs, especially maintenance. Pilots are often targeted using only short-term thinking. We suggest using the pilot to examine long-term issues as well.

We can also think of costs in terms of fixed costs and variable costs. Fixed costs are those that we incur no matter how many test cases we want to automate. Tool purchase, training, and licenses are primarily fixed costs. Variable costs are those that vary depending on the number of tests we have. Test script development, test data development, and the like are primarily variable costs.

Due to the very high fixed costs of automation, we will usually have to worry about the scalability of the testing strategy. That is, we usually need to do a lot of testing to amortize the fixed costs and try to earn real value on our investment. The scalability of the system will determine how much investment of time and resources is needed to add and maintain more tests.

When determining the business case, we must also consider risks. The Foundation syllabus discussed the following risks:

  • Dealing with the unrealistic expectations of automation in general. Management often believes that spending money on automation guarantees success: the silver bullet theory.

  • Underestimating the time and resources needed to succeed with automation. Included in this underestimation are initial costs, time, and effort needed to get started and the ongoing costs of maintenance of the assets created by the effort.

  • Overestimation of what automation can do in general. This often manifests itself in management’s desire to lay off manual testers, believing that the automation effectively replaces the need for manual testing.

  • Overreliance on the output of a single tool, misunderstanding all of the components needed that go into a successful automation project.

  • Forgetting that automation consists of a series of processes—the same processes that go into any successful software project.

  • Various vendor issues, including poor support, vendor organizational health, tools (commercial or open-source) becoming orphans, and the inability to adapt to new platforms.

In this Advanced Technical Test Analyst (ATTA) book, we must also consider these risks:

  • Your existing manual testing could be incomplete or incorrect. If you use that as a basis for your automated tests, guess what, you’re just doing the wrong thing faster! You need to double-check manual test cases, data, and scripts before automating because it will be more expensive to fix them later. Automation is not a cure for bad testing, no matter how much management often wants to think so.

  • You produce brittle, hard-to-maintain test scripts, test frameworks, and test data that frequently need updates when the software under test changes. This is the classic test automation bugaboo. Careful design of maintainable, robust, modular test automation architectures, design for test script and data reuse, and other techniques can reduce the likelihood of this happening. When it does happen, it’s a test automation project killer, guaranteed, because the test maintenance work will soon consume all resources available for test automation, bringing progress in automation coverage to a standstill.

  • You experience an overall drop in defect detection effectiveness because everyone is fixated on running the scripted, invariable, no-human-in-the-loop automated tests. Automated tests can be great at building confidence, managing regression risks, and repeating tests the same way, every time. However, the natural exploration that occurs when human testers run test cases doesn’t happen with automated scripts. Automated tests tend to present the ultimate pesticide paradox because they tend to run exactly the same each time. You need to ensure that an adequate mix of human testing is included. Most bugs will still be found via manual testing because regression test bugs, reliability bugs, and performance bugs—which are the main types of bugs found with automated tests—account for a relatively small percentage of the bugs found in software systems.

Now, as you can see, all of these risks can—and should—be managed. There is no reason not to use test automation where it makes sense.

Of course, the reason we incur the costs and accept the risks is to receive benefits. What are the benefits of test automation?

First, it must be emphasized that smart test teams invest—and invest heavily—in developing automated test cases, test data, test frameworks, and other automation support items with an aim of reaping the rewards on repeatable, low-maintenance automated test execution over months and years. When we say “invest heavily” what we mean is that smart test teams do not take shortcuts during initial test automation development and rollout because they know that will reduce the benefits down the road.

Smart test teams are also judicious about which test cases they automate, picking each test case based on the benefit they expect to receive from automating it. Brain-dead approaches like trying to automate every existing manual test case usually end in well-deserved—and expensive—failures.

Once they are in place, we can expect well-designed, carefully chosen automated tests to run efficiently with little effort. Because the cost and duration are low, we can run them at will, pushing up overall coverage and thus confidence upon release.

Given the size of the initial investment, you have to remember that it will take many months, if not years, to pay back the initial costs. Understand that in most cases, there are no shortcuts. If you try to reduce the initial costs of development, you will create a situation where the benefits of automated test execution are zero or less than zero; you can do the math yourself on how long it takes to reach the break-even point in that situation.
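If you do want to sketch that math, here is an illustrative calculation with entirely made-up numbers; the point is the structure of the calculation, not the figures.

```python
# Illustrative break-even arithmetic; every number here is made up.
fixed_cost = 50_000           # tool licenses, training, initial framework build
manual_cost_per_run = 120     # what one manual pass of the regression suite costs
automated_cost_per_run = 10   # machine time plus result analysis per automated pass
maintenance_per_run = 20      # script upkeep attributed to each automated pass

saving_per_run = manual_cost_per_run - automated_cost_per_run - maintenance_per_run
if saving_per_run <= 0:
    print("No break-even point: the automation never pays back its fixed costs")
else:
    runs_to_break_even = fixed_cost / saving_per_run
    print(f"Break-even after roughly {runs_to_break_even:.0f} automated regression runs")
```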

So, above and beyond the benefits of saved time, reduced effort, and better coverage (and thus lower risk), what else do we get from test automation done well?

  • Better predictability of test execution time. If we can start the automated test set, leave for the night, come back in the morning, and find the tests have all run, that’s a very nice feeling, and management loves that kind of thing.

  • The ability to quickly run regression and confirmation tests creates a byproduct benefit. Since we can manage the risk associated with changes to the product better and faster, we can allow changes later in a project than we otherwise might. Now, that’s a double-edged sword, for sure, because it can lead to recklessness, but used carefully, it’s a nice capability to have for emergencies.

  • Since test automation is seen as more challenging and more esteemed than manual testing, many testers and test teams find the chance to work on automated testing rewarding.

  • Because of the late and frequent changes inherent in certain life cycle models, especially in Agile and iterative life cycles, the ability to manage regression risk without ever-increasing effort is an essential reason for using automation. In such life cycles, regression risk is higher due to frequent code changes.

  • Automation is a must for certain test types that cannot be covered manually in any meaningful way. These include performance and reliability testing. With the right automation in place, we can reduce risk by testing these areas.

This section is not intended to give an exhaustive or universal list of costs, risks, and benefits associated with test automation. Be sure to assist the test manager in their analysis of the business case, including these three elements.

6.2.2 Why Automation Projects Fail

Learning objectives

TTA-6.2.3 (K2) Summarize common technical issues that cause automation projects to fail to achieve the planned return on investment.

At the risk of going in a different order than the ISTQB ATTA syllabus specifies, we believe that this section on why automation fails must come before we start discussing automation architectures. The main reason for that is that we use different architectures to solve many of the specific problems we present here.

In this section, we want to talk about some strategies for dealing with the common failures that afflict test execution automation.

The first topic has to be who is participating in the automation project. A person who just knows how to physically operate an automation test tool is not an automator any more than a person who knows a word processing program is an author. In our careers, we have met many people who claim to be automators—especially on their resumes when they are applying for jobs. However, when pressed on how to solve particularly common automation problems, they haven’t a clue. Let us be clear: The automation tool is not an automation solution; it is only the starting point. An automator must know more than how to drive the tool.

In Jamie’s career, he has used almost every popular vendor automation tool—and a great many of the open-source tools as well. He can fairly claim to have succeeded with almost every automation tool at one time or another, but he must also admit to having failed with just about every tool at least once. Rex has built a large number of automated testing tools, some of which were quite successful and some of which were abysmal failures. A person who claims to be an experienced automator but cannot intelligently discuss all of the times that they failed should be avoided. Every good automator that we have met became good by learning from their failures.

The most common question Jamie is asked when teaching a class or speaking to a group of people is, “Which tool do you recommend?” Rex gets this question a lot when talking about tools as well. Behind this question, we think most people are actually asking, “Which tool can we bring in that will guarantee our automation success?” The answer is always the same. There are no “right” tools for every situation!

Consider asking a race car driver, “Which is the best spark plug to use to win a race?” A good answer might be that there are a number of good spark plugs that can be used but none of them will guarantee a win (unless of course the driver is being paid to shill for one specific brand). The fact that the car has spark plugs is certainly essential, but the brand is probably not. And so it is with automation tools.

Purely on-the-fly, seemingly pragmatic automation using any tool can work for a brief time; as problems crop up, the automator can fix them—for a short time. Eventually, such an automation program will fall over from its own weight. This failure is a certainty. The more test cases, the more problems, and the more problems, the more time will be needed to solve them. There are so many problems inherent with automation, a fully drawn-out, strategic plan is the only chance an organization has to succeed. As an automator who has been doing automation for over 20 years, Jamie has never—NEVER—seen an automation program succeed in the long term without a fully planned-out strategy up front. As someone involved in testing and test management for almost 30 years, Rex concurs and can add that explaining this fact to management is often very difficult indeed.

So, here are some of the ingredients needed for a successful test automation strategy. First and foremost, automate functional testing for the long term. Short-term thinking (i.e., we need to get the current project fully automated by next month) will always fail to earn long-term value. Judging from the number of requests we get for exactly this service—get the automation project working to test the current release we are already behind on—this chestnut is almost universally ignored.

Build a maintainable automated test framework. Think of this as the life support system for the tests. We will discuss how to do this in an upcoming section. Remember that the most important test you will ever run in automation is the next one. That is, no matter what happens to the current test—pass, fail, or warning—it means little if the framework—without direct human intervention—cannot get the next test to run. And the next test after that. The framework supports the unattended execution ability of the suite.
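A minimal sketch of that idea, assuming hypothetical test objects and a restart routine, might look like this in Python: every test starts from a known state, and any failure is logged and contained so the next test still runs.

```python
# Bare-bones sketch of a runner that always gets to the next test. The test
# objects and the restart_application routine are hypothetical.
import logging
import traceback

log = logging.getLogger("framework")

def run_suite(tests, restart_application):
    results = {}
    for test in tests:
        try:
            restart_application()            # known starting state for every test
            test.run()
            results[test.name] = "PASS"
        except AssertionError as exc:
            results[test.name] = f"FAIL: {exc}"
        except Exception:
            results[test.name] = "ERROR"     # unexpected breakage: record it and move on
            log.error("Test %s crashed:\n%s", test.name, traceback.format_exc())
    return results
```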

Unless there is an overwhelming reason to do otherwise, only automate those tests that are automatable; that is, they can run substantially unattended and human judgment is not required during test execution to interpret the results. Having said that, we have found sometimes there were good business reasons to build manual/automated hybrids where, at certain points in the execution, a tester intervenes manually to advance the test.

Automate those tests and tasks that would be error prone if done by a person. This includes not only regression testing—which is certainly a high-value target for automation—but also creating and loading test data.

Only automate those test suites and even test cases within test suites where there’s a business case. That means you have to have some idea of how many times you’ll repeat the test case between now and the retirement of the application under test. If the test only needs to be run once in a blue moon, it may be better left as a manual test.

Even though most automated tests involve deliberate, careful, significant effort, be ready to take advantage of easy automation wins where you find them. Pick the low-hanging fruit. For example, if you find that you can use a freeware scripting language to exercise your application in a loop in a way that’s likely to reveal reliability problems, do it. As a good example of this, you can see the case study that Rex and a client wrote about constructing an automated monkey test from freeware components in a matter of a few weeks.1

That said, be careful with your test tool portfolio, both open source and commercial. It’s easy to have that get out of control, especially if everyone downloads their own favorite open-source test tool. Have a careful process for evaluating and selecting test tools, and don’t deviate from that process unless there is a good business reason to do so.

To enable reuse and consistency of automation, make sure to provide guidelines for how to procure tools, how to select tests to automate, how to write (and document) maintainable scripts, and other similar tasks. This should entail a well-thought-out, well-engineered, and well-understood process.

Most test automation tools—at least those for execution—are essentially programming languages with various bells and whistles attached. Typically, we are going to create our testing framework in these languages and then use data or keywords in flat files, XML files, or databases to drive the tests. This supports maintainability. We will discuss this further when we talk about architectures later in this chapter.
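As a small, hedged example of the data-file side of this, the following Python sketch reads rows from a CSV file and feeds them to a test function; the column names and the login_and_check function are assumptions.

```python
# Tiny data-driven sketch: the logic lives in code, the inputs and expected
# results live in a flat file. Column names and login_and_check are assumptions.
import csv

def run_data_driven(csv_path, login_and_check):
    failures = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):     # columns: user, password, expected
            outcome = login_and_check(row["user"], row["password"])
            if outcome != row["expected"]:
                failures.append((row["user"], outcome, row["expected"]))
    return failures
```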

Every tester has run up against the impossibility of testing every possible combination of inputs. This combinatorial explosion cannot be solved via automation, but we are likely to be able to run more combinations with automation than manually.

Some tools also provide the ability to go directly to an application’s application programming interface (API). For example, some test tools can talk directly to the web server at the HTTP and HTTPS interfaces rather than pushing test input through the browser. Tests written to the API level tend to be much more stable than those written to the GUI level because the API tends to evolve much more slowly than the GUI.
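For illustration, an API-level check might look like the sketch below, using Python’s standard library; the URL, endpoint, and expected payload are placeholders rather than a real service.

```python
# Hedged example of testing at the HTTP level rather than through the browser.
# The URL, endpoint, and payload are placeholders, not a real service.
import json
import urllib.request

def check_policy_lookup(base_url="http://localhost:8080"):
    with urllib.request.urlopen(f"{base_url}/api/policies/12345") as response:
        assert response.status == 200, f"unexpected status {response.status}"
        body = json.loads(response.read())
    assert body.get("policyId") == "12345", f"unexpected payload: {body}"
```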

Scripting languages and their capabilities vary widely. Some scripting languages are like general-purpose programming languages. Others are domain specific, like TTCN-3 (used predominately in the conformance testing of communication systems). Some are not domain specific but have features that have made them popular in certain domains, like TCL in the telephony and embedded systems worlds. Many modern tools support widely understood programming languages (e.g., Java, Ruby, and VBA) rather than the custom, special-purpose languages of the early tools (e.g., TSL, SQA Basic, and 4Test).

Not all tools cost money—at least to buy. Some you download off the Internet and some you build yourself.

In terms of open-source test tools, there are lots of them. As with commercial software, the quality varies considerably. We’ve used some very solid open-source test tools, and we’ve also run into tools that would have to be improved to call them garbage.

Even if an open-source tool costs nothing to buy, it will cost time and effort to learn, use, and maintain. So, evaluate open-source tools just as you would commercial tools—rigorously and against your systems, not by running a canned demo. Remember, the canned demo will almost always work, and it establishes nothing more than basic platform compatibility.

In addition to quality considerations, with open-source tools that have certain types of licenses, such as certain Creative Commons licenses and the GNU General Public License (GPL), you might be forced to share enhancements you create. Your company’s management, and perhaps the legal department, will want to know about that if it’s going to happen.

If you can’t find an open-source or commercial tool, you can always build your own. Plenty of organizations do that. You might consider it if the core competencies of your organization include tool building and customized programming. However, it tends to be a very expensive way to go. In one recent engagement, Rex saw a team of three or four test engineers trying to build a test management system (with all the features of commercial test management systems) in the space of a year or so. Rex had to explain that most commercial test management systems represent person-centuries or even person-millennia of accumulated development effort, and so they were unlikely to succeed in any semblance of a realistic time frame.

Sometimes the creation of a custom tool is much easier than Rex’s example in the previous paragraph. However, when building a tool is a reasonable task, there are other risks to deal with. Since one or two people develop these lightweight tools as a side activity, there’s a high risk that when the tool developer leaves, the tool will become an orphan. Make sure all custom tools are fully documented.

Be aware that when testing safety-critical systems, there can be regulatory requirements about the certification of the tools used to test them. These requirements could preclude the use of custom and open-source tools for testing such systems, unless you are ready to perform the certification yourself. Don’t let self-certification scare you away from custom or open-source tools, though. Rex has had a number of clients tell him that this process is relatively straightforward.

We would like to take a moment to discuss the deployment of test tools. Before deploying any tool, try to consider all of its capabilities. Often while doing a tool search for a particular capability, we have found that the organization already had a tool that incorporated that capability but no one considered it. Historically, automation tools have a very high “shelf-ware index.” Many are the times that we have found multiple automation tool sets with valid licenses sitting in the back of the lab closet. Closely related to finding unused tool sets on a shelf is to learn that a currently used tool may be extensible to deliver a needed capability. In other words, the tool currently does not have the sought-after capability, but with a little programming, configuration, and/or extension it could. Of course, in order to achieve that aim, it is essential to first understand how the tool works.

Various tools might require different levels of expertise to use. For example, a person without strong technical skills, including programming, is not likely to be successful using a performance tool. Likewise, without programming skills, the possibility that a person can be a successful automator is negligible. A test management tool should be managed by someone with strong organizational skills. A requirements management tool will tend to work much better when managed by a person with an analyst background. Make sure you match the tool to the person and the person to the tool.

When we use certain tools, we are creating software. Automation and performance tools come immediately to mind. When we create software, the output needs to be managed the way we manage other software. That includes configuration management, reviews and inspections, and testing! It always amazes us how often testers expect the programmers to rigorously manage the software that comes out of the development team while totally ignoring all good software practices for the software that comes out of the test team.

Let’s take that a little further. We are going to discuss creating different architectures for automation later. When we create an architecture for test automation, it should not be done haphazardly. We should have architecture design documents, detailed design documents, and design and code reviews as well as component, integration, and system testing. What we are building is no less of a product than the product we are going to test.

And one final note. We mentioned earlier about auditing the capabilities of the tools you are using. Audit the automation itself. What tests do you already have and what do they test? As consultants, we have often been called in to audit an automation department to find out why they do not seem to be adding value to the test team. We have often found that they have hundreds and hundreds of scripted tests that are not doing good testing. Some of these scripts are poor because they directly automated manual tests without considering whether the tests were actually automatable (not every test is). We have found automated tests that had no way of matching expected with actual results. The assumption was made that if the script did not blow up, it must have passed. That might be clever, but it is not testing.

Earning value with automation is rarely easy and never accidental. Before deploying a tool, it is essential to understand that.

6.2.3 Automation Architectures (Data Driven vs. Keyword Driven)

Learning objectives

TTA-6.2.2 (K2) Summarize the differences between data-driven and keyword-driven automation.

Many years ago, Jamie attended a workshop on automation that was attended by many of the most experienced automators in the country. Several of the attendees got into a somewhat heated discussion as to who invented data-driven and keyword-driven automation concepts. They all claimed that they personally had come up with and developed these techniques.

They finally came to the realization that, indeed, they all had. Independently. Since then Jamie has discovered that there have been many cases in history where multiple people, needing a solution to a particular set of problems, came up with similar solutions.

In the early 1990s, everyone trying to automate testing had a common problem. Automation of testing was a meme complex2 that had spread like wildfire, and there were a lot of tools being developed for it. Unfortunately, the basic capture/replay process just did not work. The basic model of capture/replay was really an unfunny joke. Virtually every person who wanted to be a serious automator, who saw the possibilities in the general idea while failing miserably in the execution, tried to come up with solutions. Many would-be automators fell by the wayside; but many of us persevered and eventually came up with solutions that worked for us. Many of these solutions looked very similar. Looking back, we guess it would have been strange if we did not come up with the same solutions—we all had the same problems with the same tools in roughly the same domain.

In the next few paragraphs, we want to discuss the natural evolution in automation. We call it natural because there was no sudden breakthrough; there was just a step-by-step progression that occurred in many places.

The driver of this natural evolution is traditionally thought of as return on investment (ROI). Test automation may not fit within the traditional concept of an investment, but it must add value to the organization that wants to use it. If done correctly, automation can add value. When done well, it can add great value. But when an automation program is set up or executed poorly, the organization would be better off throwing their money into an active volcano than spending it on automation.

The costs of an automation program are typically very high. Therefore, the value that the program provides must also be very high. One way to provide value is to allow the organization to run many tests it could not otherwise run.

Back in the early days of automation, many automators could create large numbers of tests. Unfortunately, the ability to run the tests effectively was spotty—for a variety of reasons we will discuss later. Having a large number of tests that could not be run successfully was a problem all automators faced.

We needed a solution. The most common solution, as you will see, was a logical progression of architectures. Many automators went from capture/replay to the framework architecture, then to data driven, and some of us went finally to keyword/action word architectures. Our terminology is not definitive; like so much of testing, there is little commonality in naming conventions. Even the ISTQB glossary has a superficial set of definitions when it comes to automation. Therefore, we will try to define our concepts with examples and you can feel free to call the concepts whatever you like.

Incidentally, the evolution of automation is still occurring. Often we have walked into an organization as consultants and found the test group reinventing the automation wheel. If your organization wants to use automation, and you do not bring in an experienced automator who has already walked this long path, you will tend to go through the same evolution, making the same mistakes. However, since the wheel has already been invented, you might consider hiring or renting the expertise. If you do bring in an automator, make sure they know what they are doing. Too often, we have seen people with resumes claiming expertise at automation when the only thing they know is a single capture/playback tool.

In his book Outliers, Malcolm Gladwell suggests that it takes 10,000 hours of doing something to become an expert. Ten thousand hours works out to five years of continuous employment; that is, 40 hours per week, 50 weeks a year, for five years. So, Rex has a rule of thumb that to be considered an expert, an automator should have at least five continuous years performing automation. Jamie would tend to agree, but he would include the requirement that it be good automation for at least four of those years (as compared to five years of fumbling around).

6.2.3.1 The Capture/Replay Architecture

So, let’s take a look at the problem. You buy a tool (or bring in an open-source tool) that does capture/replay. These tools are essentially a wrapper around a programming language (the playback part) and have a mechanism to capture interface actions (keystrokes and mouse actions) and place them in a script using that programming language. At a later date, when you want to run the test again, you submit the script to an execution machine that re-creates the interface actions as if the human tester were still there.

Note that the script that was created essentially encapsulates everything you need—it has both the data and the instructions as to what to do with the data all in one place.

What could go wrong with that?

Capture/playback automation actually is a brilliant idea (other than the huge logic problems involved). It can be used, occasionally successfully, as a short-term solution to a short-term problem. If we need quick and dirty regression testing for a single release, it might work. Jamie once recorded a quick script that could be run multiple times at 3 a.m. to isolate a problem with a remote process. When Jamie came in the next morning, the script had triggered the failure, and they had a good record of it due to the recorded script. If you have a lab full of workstations and want to record a quick and dirty load test to exercise a server, you can do that.

But, if we want a stable, long-term testing solution that works every time we click the go button, well, the capture/replay tool won’t do that. It is our experience, and that of every automator that we have ever spoken to, that the capture/replay architecture is completely worthless as a long-term testing solution.

In Figure 6–2 you see a recorded script from one of the all-time popular capture/replay tools, called WinRunner from Mercury Interactive (currently owned by HP). We have removed the spaces to save room, but this is pretty much what was captured when recording. This [partial] script was generated to exercise a medical software package that was used to allow doctors to prescribe drugs and treatments for patients directly.

Figure 6–2 Partial WinRunner script

First of all, it is a little difficult to read. This script exemplifies why we have code guidelines and standards. But because it is meant to execute directly, maybe we aren’t supposed to be able to read it.

Let’s discuss for a moment how a human being interacts with a computer. After all, we are really trying to simulate a human being when we automate a test.

When a human being wants to interact with a GUI (in this case a Windows application), they sit down, look at the screen, and interact with what they see on the screen using the keyboard and the mouse. As mentioned earlier, GUI objects seen on the screen are generally metaphors: we see files to edit, buttons to press, tree lists to search, menus to select. There is always an active window (the one that will get input); we make it active by clicking on it. There is an active object in the window; when it is active we can tell because it usually changes somehow to let us know. There may be a blinking cursor in it, it may be highlighted, or its color may change. We deal directly with that active object using the keyboard and/or mouse. If the object we want to deal with is not active, we click on or tab to it to make it active. If we don’t see the object, we don’t try to deal with it. If something takes a little too long to react to our manipulation, we wait for it to be ready. If it takes way too long, we report it as an anomaly or defect.

Essentially, a manual test case is an abstraction. No matter how complete, it describes an abstract action that is filtered through the mind and fingertips of the manual tester. Open a file, save a record, enter a password: all of those are abstract ideas that need to be translated. The human tester reads the step in the test procedure and translates the abstract idea to the metaphor on the screen using the mouse and keyboard.

In this script, you see a logical translation of those steps. The first line is identifying the window we want to make active, to interact with. The second line details the control we want to deal with, in this case a specific text box. We type a string into that (the “edit_set”) and then press the Tab key to move to the next control. Step-by-step, we deal with a window, a control, an action. The data is built right into the script.

So where is the problem? The script is a little ugly, but programming languages often are. We don’t expect them to read as if they were to be awarded a Pulitzer Prize in literature. As long as it drives the test to do the right thing, does it matter if it’s ugly? What if it does not work like a human would, however?

A recorded script is completely, 100 percent literal. It tries to do exactly what the tester did—and nothing else. That is really the crux of the problem; it models the human tester completely wrong. Think about what a capture/replay tool is actually saying about the tester it tries to model. The tester is just a monkey who mindlessly does what the manual test case tells them to do. Click here, type there.
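To see just how literal such a recording is, consider this rough Python illustration using the pyautogui library; the coordinates, field order, and data are invented, but the style (fixed clicks, fixed waits, fixed tab order) is the point.

```python
# A rough imitation of a literal recording, written with the pyautogui library;
# coordinates, field order, and data are invented. Note what it assumes: the
# window is up within 3 seconds, the ID field is at a fixed spot, and the
# password field is exactly one Tab away.
import time
import pyautogui

time.sleep(3)                  # hope the login window has appeared by now
pyautogui.click(412, 236)      # click where the ID field happened to be when recorded
pyautogui.write("MR-10045")
pyautogui.press("tab")         # assume the tab order never changes
pyautogui.write("s3cret")
pyautogui.press("enter")       # if anything moved or slowed down, this "test" is already broken
```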

But modeling a tester as a repeatable monkey is not valid. A manual tester—at least one that knows what they are doing—adds important elements to that abstract list of steps we call a test procedure. We can narrow it down to two important characteristics added by the human tester to every line of any test: context and reasonableness.

Regarding context, the tester can see and understand what is going on with the workstation. A recorded script cannot add context other than in a really limited way. Look back at the script in Figure 6–2. It has the tester tab from the ID text field (edt_MR Number) to the password text field (edt_Password). If the tab order had changed, a human would see that and tab again, or pick up the mouse and move to the right place with a single click. The script expects the password text field to forever be one tab after the ID text field. Change kills automation when relying on capture/replay. When a failure occurs, it is often signaled by something out of context. A human being is constantly scanning the entire screen to understand the current context. If something incorrect happens—something out of context—the human sees it, evaluates it, and makes a decision. Is it an anomaly that we need to document but then we can continue on? Is it an unrelated failure that we must stop for? The automation tool has no such contextual capability.

If the scripter puts in a check for a particular thing, and that thing is incorrect, the tool will find it. But nothing else will be found. If the script tries to do something—say, type in the password text field—and it does not find the field, it can report in the log that a control was not found. But that test has just failed, often for a superficial nit that a human could have dealt with gracefully.

The other characteristic added by a human is reasonableness. It is clear that there has to be some kind of timing to an automated script. If the script is told to do something to a control that is not currently visible, it will wait for a short amount of time (typically 3 seconds). If the control does not show in that time, boom, the test just failed! Suppose it is a control on a web page that is slow loading? Fail! Suppose it is a control that is out of view due to scrolling? Fail (with most tools). Suppose the developer changed the tag on the control? Fail. A human can sit and look at the screen. It takes 4 seconds rather than 3 seconds? We’ll wait—and we might just note in the test log that the control took a long time showing up. Not on the screen? A tester will scroll it. Renamed? A tester will find it.

The flip side of reasonableness is that human beings will often see when there is an issue; for example, when the screen does not render correctly but the automation tool misses it. Here is a funny story from Jamie’s early automation career. Jamie had gotten to the point where he was automating the testing of complete dialogs in a Windows 3.1 application. He must have bragged too much because one of the developers decided to teach him a lesson. One small, rarely used modal dialog was modified such that the foreground colors of the dialog were changed to match the background colors. The automation never noticed it—the script testing the dialog worked flawlessly as it identified objects by their listed properties, none of which was color. As a tester, it was Jamie’s duty to at least run a sanity test on the dialog. As an automator, he didn’t. They shipped the code that way. Support said they got very quizzical questions as to why there was a completely empty dialog in the application. Oops! The truth is that some modern automation tools solve some of these problems. Others don’t. There is no capture/replay tool that solves every problem; there are no tools that can always add context and reasonableness except through programming.
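As a hint of what “through programming” can mean, here is a small, generic polling helper; it is a sketch of the idea, not any tool’s built-in feature.

```python
# A small, generic polling helper: wait longer than a hard-coded 3 seconds,
# log slowness as a warning, and only fail when the control truly never appears.
import logging
import time

log = logging.getLogger("framework")

def wait_for(condition, timeout=30.0, poll=0.5, warn_after=3.0):
    """condition is any zero-argument callable that returns True when the control is ready."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if condition():
            elapsed = time.monotonic() - start
            if elapsed > warn_after:
                log.warning("Control appeared, but took %.1f seconds", elapsed)
            return True
        time.sleep(poll)
    return False    # genuinely missing: now failing the test is reasonable
```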

Error recovery is virtually always a problem with capture/replay. Early tools had no ability to recover from an error; many modern tools have a limited ability on their own. So, assume that we do have a failure in a test. A human discerns there was an error, gracefully shuts down the application, restarts it, and moves on to the next test. What does the recorded automation script do in case of a failure? The early ones mainly just stopped. Most long-time automators wish they had a nickel for every morning they came in and the suite was stuck on the second test and had not moved all night. Some of the modern tools can, in limited cases, shut down the system and continue on to the next test. Sometimes... But suppose a dialog box pops up that was unexpected? We’ll see you Monday morning.

6.2.3.2 Evolving from Capture/Replay

The sad but true fact is that change is the cause of most capture/replay failures. Jamie once had an executive rail at him because the automation was always broken. Every time they ran the scripts, the tests failed because the developers had made changes to the system under test that the automation tool could not resolve. (While we were writing this book, Rex had a number of programmers and testers make the exact same complaint about tests created with the tools QuickTest Pro and TestComplete.) Jamie told the executive that he could fix the problem, easy as pie. When the executive asked how, Jamie told him to have the development team stop making changes. No changes, no failures. As you might expect, he did not take Jamie’s advice...

So, the developer changes the order that events are handled. Boom, automation just failed. A human tester: no problem.

A human being sees a control and identifies it by its associated text, its location, or its context. A tool identifies an object by its location, or its associated tag, or by its index among like fields on the screen (from top-left to bottom-right), or possibly by an internal identifier. If the way the control is identified changes at all, the tool likely does not find the control. The control was moved a few pixels? If location was the way the automator identified the control, boom, automation just failed. A human tester tends not to have the same problem.

Timing changes. Boom, automation is likely to fail. A human has no such problem. Dynamic content, where controls may be enabled/disabled or made visible/invisible based on business rules that are unknown until runtime? Boom, automation is likely to fail sometimes. The human tester simply evaluates what is showing at the time and decides on the fly whether it is correct or not.

System context change? Suppose the recorded test saves a file. The next time the test is run, unless the file was physically removed, when the file save occurs, we are likely to get an extra dialog box popping up: “Do you want to overwrite the file?” Boom, the automation just failed. A human would simply click Yes to clear the message and then move on. The more clever the programmers are, putting up reminder messages or asynchronous warning messages, the more it fouls up the automation. Already created that record in the database? Sorry, you can’t create it again.

Frankly, good automators using good processes can minimize these kinds of problems. Working hand in hand with the developers can minimize some. Modern automation tools can minimize some. But, even with the best of everything, you still have testing that just barely limps along. The testing is brittle, just waiting for the next pebble to trip over.

And scalability—the ability to run large numbers of tests without much added cost—is the ultimate capture/replay automation project killer! When the automators have to spend all of their time repairing existing scripts rather than creating new ones, the automation project is already dying.

Let’s work through a theoretical situation, one that every automator who has made the jump from capture/replay to the next step has endured.

You have built a thousand test cases using capture/replay. Each one of those test cases at one time or another has to open a file. Each one of them recorded the same sequence: pulling down the File menu and clicking the Open File menu item.

You get Wednesday’s build and kick off the automation. Each automated test case in turn fails. You analyze the problem and there you find, for no particular reason, that the developer has changed the menu item from Open File to Open. Okay, you grab a cup of coffee and start changing every one of your scripts. Simple fix, really. Open each one up, find every place it says Open File, remove the word File. If you are really smart, you might do a universal change using search-and-replace or a GREP-like tool; watch that, though, because Open File may show up in a variety of places, some of which did not change. Work all night, get all 1,000 edited, kick them off, find the 78 you edited incorrectly, fix those, and by Friday morning all is well with the world. Of course, you did just totally waste two days...

In Monday’s build, the developer decided that change just wasn’t elegant, so he changed it back to Open File. You slowly count to 10 in three languages under your breath to avoid saying something to the programmer you cannot take back.

Scalability is a critical problem for the capture/replay architecture. As soon as you get a non-trivial number of test scripts, you get a non-trivial number of changes that will kill your productivity. The smallest change can—and will—kill your scripts. You can minimize the changes by getting development to stop making changes for spurious reasons, but change is going to come and it is going to kill your scripts, your productivity, and hence your value.

6.2.3.3 The Simple Framework Architecture

Every developer can easily see the solution to this problem. Two generations ago, when spaghetti code was the norm, programmers came up with the idea of decomposition: building callable subroutines for tasks they perform multiple times.

As noted earlier, the automation tool the tester is using has—at its heart—a programming language. It should not be a surprise that programming is the solution to these problems.

To handle the menu change described earlier, you can create a function called OpenFile() and pass in the name of the file you want to open as a parameter. You could even just put the recorded code into the function if you wanted. Then, in each one of your 1,000 scripts, replace the recorded code with the function call, passing in the correct filename.

Oops. You get all this done and the developer has [another] change of heart. You get the new build, every test case fails. Ah, but now you go in and change the body of the function itself, recompile all of the scripts, and voila! They all run. Elapsed time: maybe 10 minutes.
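
To make the idea concrete, here is a minimal sketch in Python. Your tool's scripting language would differ, and select_menu() and type_text() stand in for whatever calls the tool actually provides; the file name is made up.

    def select_menu(path: str) -> None:
        print(f"selecting menu: {path}")      # placeholder for a real tool call

    def type_text(text: str) -> None:
        print(f"typing: {text}")              # placeholder for a real tool call

    def open_file(filename: str) -> None:
        # If the developer renames the menu item again, this is the only place
        # that has to change; the 1,000 calling scripts stay untouched.
        select_menu("File;Open")
        type_text(filename)

    # Each test script now calls the shared function instead of replaying its
    # own recorded keystrokes.
    open_file("fy2012_claims.xls")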

Scalability is an important key to successful automation. We need to run hundreds if not thousands of automated tests to recoup our fixed automation costs, much less get positive value. If the automation team cannot reliably run lots of tests, automation will never add a positive value.

Notice now, however, that this is no longer a complete capture/replay architecture. The architecture is now partially recorded and partially programmed. And there are a lot more failure points than just the Open [File]. We could create lots of different functions for other places liable to change. And, come to think of it, we could do more than just open a file using recorded strokes. As long as we are programming a function, we can make it elegant. Perhaps add error handling with meaningful error messages. If the file is stored on a drive that is not mapped, we can add automatic drive mapping inside the function. If the file is large or remote, we can allow more time for it to open without letting it fail. If it takes too long to open but does finally open, we can put a warning message in the log without failing the test. If something unexpected happens, we can take a snapshot of the screen at the failure point so we have an image for the defect record. We can put multiple tasks in a single function, giving us aggregate functionality. Rather than separate keystrokes, we could have a LogIn() function that brings up a dialog, types in the user ID and password, presses the go button, and checks the results.
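
As a sketch of what such an "elegant" aggregate function might look like, again using hypothetical type_into(), click(), and save_screenshot() helpers in place of real tool calls:

    import logging

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

    def type_into(field: str, value: str) -> None: ...   # stand-ins for tool calls
    def click(button: str) -> None: ...
    def save_screenshot(name: str) -> None: ...

    def log_in(user_id: str, password: str) -> bool:
        """Aggregate function: fill the login dialog, submit it, and handle
        logging and failure evidence so the calling script does not have to."""
        try:
            type_into("UserID", user_id)
            type_into("Password", password)
            click("Go")
            logging.info("login succeeded for %s", user_id)
            return True
        except Exception:
            logging.exception("login failed for %s", user_id)
            save_screenshot("login_failure.png")   # image for the defect record
            return False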

The automator is limited only by imagination. The more often a function is going to be used, the more value there is in making it elegant.

This leads to what we call the Simple Framework architecture. Other people use other names; there does not appear to be any standard name yet. The architecture is defined by decomposing various tasks into callable functions and adding a variety of helper functions that can be called as needed (logging functions, error handling functions, and so on).

We said earlier that we would differentiate between the terms framework and architecture. That becomes a little cloudier when the architecture is named “Simple Framework.” Sorry about that.

The architecture is the conceptual or guiding idea that is used in building the framework. Perhaps this metaphor might help. Consider the architecture to be an automobile. It has four wheels, two or four doors, seats, and an engine. There are many different versions of automobiles, including Saab, Toyota, Chevy, Ford, Dodge, and so on. Each of them is seen as an automobile (as compared to a truck or an airplane, say).

We build an instance of an architecture, calling it a framework. We might build it with a specific tool, building special functionality to make up for any shortcomings that tool might have. We may add special logging for this particular project, special error handling for that. There are several open-source frameworks available that fit certain architectures (e.g., data-driven frameworks, keyword-driven frameworks).

This particular architecture we have named the Simple Framework. The specific details of how you implement it are up to you and your organization. Those decisions should be based on need, skill set, and always—ALWAYS—with an eye toward adding long-term value to the automated testing.

Functionality that is used a lot gets programmed with functions. The more likely a function is to fail, the more time we spend carefully programming error handling and specialized logging. Some stuff that is rarely done might still be recorded using the tool’s facilities. A script may be partially recorded, partially scripted. We might add functionality outside the tool; Jamie likes to add custom-written DLLs to integrate more complex functions into the framework.

In the capture/replay architecture, we could allow anyone with any skill set to record the test scripts. Note that now we need one or more specialists: programming testers that many people just call automators. Without programming skills, the framework does not get built. Without excellent software engineering skills, a framework may get built that is just as failure prone and brittle as the capture/replay architecture was. Because, in the final analysis, we are creating a long-term project: building a software application we call a framework.

There are still some risks that we must consider. Scalability is better than with the capture/replay architecture, but it’s still not great; for each test case, we still have a separate script that must be executed. That may mean thousands and thousands of physical artifacts (scripts) that must be managed. We have seen automation frameworks where the line-of-code count is higher than that of the system under test.

In addition, test data is still directly encoded in each script. That is a problem when we want to run a test that covers the same area with different data.

And we must ask the question, Who is going to write the tests? Too often, we have seen organizations refuse to hire testers who are not also programmers. They insist that every tester must also be able to write automation code.

Frankly, we think this is a huge mistake. A tester may have some programming skills, or they may not. Are you going to fire every tester in your organization who came from support? All the domain experts who don’t know anything about programming? We look at the skill sets of tester and automator as potentially overlapping but not necessarily the same. Not every tester wants to be a programmer—that may even be why they are testers. Testing is much more about risk than it is about programming. If all of our testers are consumed with worrying about the automation architecture, when are they going to be able to think about the risk management tasks that are their real value add?

We believe the best organizational design is to have a test subgroup made up of automation specialists. This would include, like any development team, both designers and programmers (or in a small group, it might be the same person filling both roles). This automation group provides a service to the testers, negotiating on the specific tests that will get automated. Automation is done purposely, with an eye toward adding long-term value. Each project team may have its own automation team, but it is generally our experience that a centralized team, shared between projects, is much more cost effective.

We have solved some of our automation problems with our simple framework architecture, but we are not done yet. We still have some automation risks that we might want to mitigate.

6.2.3.4 The Data-Driven Architecture

The number of scripts in our simple framework architecture that we have to deal with is problematic. As we mentioned earlier, we are going to need a lot of testing to help recover our fixed costs, much less our variable costs. More test cases, more scripts. But there is more overhead (i.e., variable costs) the more scripts we have. And the really annoying thing is that so many of our test cases tend to do the same things, just using different data.

This is the situation that tends to drive automators from the simple framework architecture to the data-driven architecture.

Consider the following scenario. We are testing a critical GUI screen with 25 separate fields. There are a lot of different scenarios that we want to test. We also need to test the error handling to make sure we are not entering bad data into the database. Testing this manually is likely to be ugly, mind-numbing, brain-deadening, soul-sucking testing of the worst sort. Enter all the values. Click OK. Make sure it is accepted. Go back. Enter all the data. Click OK. Go back. Repeat until we want to find a bridge to jump off. This is the exact reason automation was invented, right? But we could easily have 100 to 200 different scripts, one for each different (but similar) test case. That is a lot of potential maintenance.

To automate this in our simple framework architecture, we create a script that first gets us to the right position to start entering data. Then we sequentially fill each field with a value. After all are filled, click the OK button (or whatever action causes the system to evaluate the inputs). If it is a negative test, we expect a specific error message. If it is a positive test, we expect to get... somewhere, defined by the system we are testing. Each script looks substantially the same except for the data.

This is where a new architecture evolved. Most people call it data-driven testing, and Jamie invented it. To be fair, so did just about every single automator in the mid-1990s who had to deal with this kind of scenario. In Jamie’s case, he realized that he could parameterize the data and put it into a spreadsheet, one column per data input. Later he used a database; it does not really matter where you put the data as long as you can access it programmatically. Some automators prefer flat file formats like comma-separated value (CSV). One column per data input, and then we might add one or more extra columns for the expected result or error message. Each row of data represents a single test. We simply create a single script and build into it a mechanism to go get the appropriate row of data. Now, that single script (built so it’s almost identical to all of our framework scripts except for the ability to pick data from a data store) represents any number of actual tests. Have one dozen, two dozen, a hundred rows of data, it doesn’t matter. It is still only one script.

Want a new test? Add a new row of data to the data store. Assuming you built the framework correctly, it will pick up the number of tests dynamically, so there are no other changes needed. Next time the automation runs, the new test is automatically picked up and run.
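
A bare-bones sketch of the mechanism might look like the following, where fill_screen() and submit() stand in for the framework functions that actually drive the GUI, and screen_tests.csv is a made-up data store with one row per test and an expected_message column:

    import csv

    def fill_screen(row: dict) -> None: ...     # stand-in for framework GUI code
    def submit() -> str: ...                    # stand-in; returns the actual message

    def run_data_driven_tests(data_file: str) -> None:
        with open(data_file, newline="") as f:
            for row in csv.DictReader(f):       # one row of data = one test
                expected = row.pop("expected_message")
                fill_screen(row)
                actual = submit()
                if actual == expected:
                    print("PASS")
                else:
                    print(f"FAIL: expected {expected!r}, got {actual!r}")

    run_data_driven_tests("screen_tests.csv")   # add a row to the file, get a new test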

Remember that earlier we said that scalability is an important key to success. Now, to thoroughly test a single screen, we can conceptually have one script and one data store. Compare that to the possibility of 100 to 200 or more scripts just to test that GUI screen.

So now we have a data-driven architecture. Notice that nothing precludes us from having a framework script—or 100—that are not data driven. We might even have a mostly recorded script or two for things that don’t need to be tested repetitively. It takes an automator to build in the ability to pick up data from the data store, to parameterize the functions. Or perhaps the tool might have that ability built in. We might also add some more error handling, better logging, etc.

Jamie has a basic rule of thumb when dealing with automation. Have the automators hide as much as possible of the complexity inside the automation so that testers do not have to worry about it. We want the testers to be concerned about risk, about test analysis and design, and about finding failures. We do not want them worrying how the automation works. Need a completely new scenario? The tester needs to give the automator enough information that they can script it—a good solid manual test procedure is optimal. If it is something they need to test a lot, say so. After that, want a new test, same scenario? Add a row of data to the data store and voila...

At this point, the number of tests is no longer proportional to the number of scripts. And, as Martha Stewart used to say, “That is good.” Scalability of maintenance becomes nonlinear: repairing one script may fix dozens or hundreds of tests.

But we are not yet done. Perhaps we can get rid of scripts altogether.

6.2.3.5 The Keyword-Driven Architecture

Let’s think about a perfect testing world for a second. A tester, sitting on a pillow at home (we said perfect, right?) comes up with the perfect test scenario. She waves her magic wand and, presto chango, the test comes into being. It knows how to run itself. It knows how to report on itself. It can run in conjunction with any other set of tests or it can run alone. It has no technology associated with it, no script to break. If the system changes, it automatically morphs to “do the right thing.”

Okay, the magic wand may be a little bit difficult to achieve. But the rest of it might just be doable—kind of.

We are going to talk about what is now known as keyword-driven (or sometimes action-word) testing. Keyword-driven testing has been described as a way to use a metalanguage that allows a tester to directly automate without knowing anything about programming. But, if the meta thing bothers you, don’t worry about it. We’ll sneak up on it.

Let’s forget about automation for a second. Instead, let’s just look at something we all have seen. Table 6–1 contains a partial manual test procedure—not a full-blown IEEE 829 test procedure specification but a nice minimalist test procedure.

Now, let’s think what we are really seeing in this table. A test procedure step can often be described in three columns. The first column has an abstract task that we want to perform to the system under test—abstract in that it does not tell you how to do it; it is really a placeholder for the knowledge and skill of the manual tester. The tester knows (we hope) how to start the system, how to log in, how to create a record, how to edit a record. That’s why we pay testers rather than train monkeys, right?

Table 6–1 A minimal manual test procedure

Image

The second column is not abstract; it is very tangible. This is the exact data that we will be using. ISTQB says that this data is the test case, along with the expected result in column three. But, note that column three is also kind of abstract. Start up correctly, log in correctly, record created correctly (key returned), change notification. Again, we are expecting the tester to know what the right thing is and how to determine it.

Remember the discussion we had earlier about context and reasonableness? As shown above, a manual test procedure—at least columns one and three—is just an abstract shell into which a manual tester pours context and reasonableness when running the test. Certainly column two is not abstract; that is the concrete data of the test case. With a good tester, we usually do not have to go into excruciating detail for columns one and three; they know the context and what is reasonable for their domain.

Almost every manual test procedure we have ever written kind of looks like this or at least could be written like this. Now imagine that you already have a framework with functions for common items like starting up the system being tested, logging in, creating a new record, and so on. Each framework function has been programmed to contain both context and reasonableness. A script is merely a stylized way to string those executable functions together. So we should ask, “Do we really need a script?”

As you can guess, the answer is no. The script is there for the benefit of the tool, not the tester. If we had a way to make it easy for a tester (with no programming experience needed) to list the tasks they wanted to do in the order they wanted, and to pick up the data they wanted to use for those tasks, we could figure out a way to scan through them and execute them without a formal script.

This is what a keyword language is: a metalanguage (in this case, meta means high level) whose grammar consists of the tasks that a tester wants to execute. It likely does not have the normal structures (loops, conditionals, etc.) that a standard, procedural programming language has—although in some cases those have been built in. It is actually a lot more like SQL: a declarative language rather than a procedural one.

Here you see some framework functions that might already exist from the simple framework architecture (or that could be built completely from the ground up if we are just starting):

  • StartApp(str Appname)

  • Login(str UserID, str Pwd)

  • CreateNewRec(ptr DataRow)

  • EditRec(int RecNum, ptr DataRow)

  • CloseApp(str Appname)

The actual keywords here are <StartApp>, <Login>, <CreateNewRec>, <EditRec>, and <CloseApp>. Notice the keywords are selected to have some kind of domain-inspired meaning to the testers who will be using them. In the parentheses, you see the data parameters that must be passed in. It still looks like a programming language, right? Well, we need to do a little more to make this user friendly.

The reason this entire keyword mechanism exists is to make it easy for non-programming testers to directly build executable test cases. The easier we make the metalanguage to use, the lower we can drive our variable costs, like training and support.
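
For example, a single keyword test might be stored as nothing more than rows of keyword, arguments, and expected result, mirroring the three columns of the manual test procedure. The values below are purely illustrative:

    login_and_create_test = [
        # (keyword,        arguments,                            expected result)
        ("StartApp",       {"app_name": "HELLOCARMS"},           "application started"),
        ("Login",          {"user_id": "jdoe", "pwd": "secret"}, "logged in"),
        ("CreateNewRec",   {"data_row": 17},                     "record created"),
        ("CloseApp",       {"app_name": "HELLOCARMS"},           "application closed"),
    ]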

6.2.4 Creating Keyword-Driven Tables

Learning objectives

TTA-6.2.4 (K3) Create a keyword table based on a given business process.

In our experience, the best way to create a keyword grammar is to mine the manual test cases that already exist. The abstract tasks that will eventually end up as keywords are already likely defined there. The scope of the keywords’ actual functionality will usually come from the knowledge of the actual testers who run those tests. This will ensure that the testers who are going to be tasked with building keyword-driven tests will understand intuitively how to use them.

The actual granularity of the keywords should be debated by the team designing the architecture; that is, how much functionality should a keyword actually entail? There are no absolute answers to the question, but different levels of granularity will require different technical solutions.

For example, at the lowest granularity, you could have keywords that say ClickButton() or TypeEdit(). This would clearly be tied very closely to the interface, and would therefore be problematic when the interface changes. However, there have been keyword tools that consisted of such low-level granularity. Consider how these might be used to automate the login process (see Table 6–2).

Table 6–2 Low-granularity keywords

Image

This scheme would require more columns than we discussed in our previous example because we would be required to identify specific screen objects. Any changes to the interface would likely break the keyword scripts (although, to be fair, there are ways of dealing with that by using GUI maps or other indirection schemes). While this gives ultimate control to the tester, it does not really save them time or allow the abstraction we see as a benefit of the architecture.

Consider how, by setting our granularity level a bit higher, we can make it easier for the testers to use the keywords without knowing anything about the actual interface (i.e., using more abstraction). We could conceivably have a Login() keyword that takes two arguments as seen in Table 6–3.

Table 6–3 Higher-granularity keywords

Image

Note that the data passed in could be indirectly referenced by passing in pointers to data stores.

Using higher granularity may make it easier to abstract certain tasks, but it can certainly be taken too far. For example, Jamie was involved in a keyword project conversion once where the starting point was a simple framework-based architecture. To avoid losing the existing automation during the conversion, he created a keyword as shown in Table 6–4.

Table 6–4 A step too far

Image

This keyword does exactly what you would expect: runs the entire existing script found out on the server. When Jamie checked back with the group 11 months later, they had not graduated to actual keywords yet. It was working, so they did not want to invest further resources.

One thing that can be done is allow aggregation of keywords. Suppose we have a number of tasks involved in creating a record in a database. At times we might want to perform each of these tasks discretely, so we have a different keyword for each task. The following tasks might be covered:

  • CreateNewRecord()

  • AddPersonalInfo()

  • AddBusinessInfo()

  • AddShippingInfo()

  • SaveExistingRecord()

While we may have started with a single high-granularity keyword that encapsulated all of these, it might make business sense to allow each to be a separate keyword. However, we might still want to use them all together at once. Rather than create a whole new keyword, some implementations allow a new keyword to be created through aggregation. A new keyword, CreateAndPopulateRecord(), could be defined as the combination of the previous five keywords. This obviously requires some intelligent finessing of the input data, but it is a reasonable way to allow new keywords to be created.
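
One minimal way to sketch such aggregation (ignoring the data finessing, and using made-up function names for the underlying keywords) is simple composition:

    def create_new_record(data: dict) -> None: ...
    def add_personal_info(data: dict) -> None: ...
    def add_business_info(data: dict) -> None: ...
    def add_shipping_info(data: dict) -> None: ...
    def save_existing_record(data: dict) -> None: ...

    def compose(*steps):
        """Define a new keyword as an ordered sequence of existing keywords."""
        def aggregate(data: dict) -> None:
            for step in steps:
                step(data)          # real code would slice the data per step
        return aggregate

    CreateAndPopulateRecord = compose(
        create_new_record, add_personal_info, add_business_info,
        add_shipping_info, save_existing_record,
    )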

6.2.4.1 Building an Intelligent Front End

Whatever the level of granularity the keywords are going to have, in our mind the most important design choice to make this architecture easily usable is to have an intelligent front end that can lead a tester through the process of building the test. It should have drop-down lists that are contextually loaded with keywords so the user does not need to remember the keyword grammar. Such a front end can be built in Excel (on the low end) or in just about any rapid application development (RAD) language that the automator is comfortable with (Jamie historically tends to use Delphi) on the high end. It is our belief that the better we build the front end, the more we can expect to gain from using it long term.

Consider one way such a front end might work. We start creating a new test with a screen that shows three columns and rows that are all blank. Logically, there are only a few things a tester could do; the most logical step would be to start the system under test. So, the first column of the blank test would have a drop-down list that would include the keyword <StartApp>. If the user clicks on <StartApp>, the front end knows that it takes one argument and therefore prompts the tester to enter that in the second column. The drop-down list in the second column would include all of the applications for which there are keywords available. The list is loaded dynamically as soon as the first column is selected (i.e., <StartApp> is chosen). The existing StartApp() function in the simple framework already knows how to check to make sure the application started correctly, so column three is not strictly needed here.

Moving on to the next keyword (the second row of the test), there are only a small number of logical steps that a tester might take next, so the drop-down list in the first column would automatically load keywords representing those steps. In other words, the next keyword drop-down list is populated with only those keywords that logically could be called based on where the previous keyword left us (assuming that the execution actually succeeded). The keyword <Login> would be one of the possible tasks, so the tester selects that. <Login> takes two parameters, so the front end prompts for the user ID and password to use. The third column drop-down might have all known valid and nonvalid results that the framework function may return. If the expectation is that the login should work, that would be chosen. If this is a negative test, a specific expected error message could be selected.

Each step of the way, the tester is guided and helped by the front end. We do not assume that the tester knows the keywords, nor do we assume that they will remember the argument number or types. We do assume that the tester knows the domain being tested, however. The keywords that are loaded in the column one drop-down list at any given time should be a subset of all the keywords, where the starting point is the ending point of the previous keyword. Or, the drop-down may show a tree of all available keywords possible in a hierarchical structure. The important point we are trying to make is that the more help this front-end tool can give to the test creator, the less training needed and the higher the productivity of the testers.
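
Behind such a front end there is usually just a table of keyword metadata: the parameters each keyword takes and which keywords may legally follow it. A rough sketch, using keyword names from the earlier examples (the structure itself is an assumption, not a standard):

    KEYWORDS = {
        "StartApp":     {"params": ["app_name"],            "next": ["Login"]},
        "Login":        {"params": ["user_id", "password"], "next": ["CreateNewRec", "EditRec", "CloseApp"]},
        "CreateNewRec": {"params": ["data_row"],            "next": ["EditRec", "CloseApp"]},
        "EditRec":      {"params": ["rec_num", "data_row"], "next": ["EditRec", "CloseApp"]},
        "CloseApp":     {"params": ["app_name"],            "next": []},
    }

    def choices_for_next_step(previous_keyword: str | None) -> list[str]:
        """What the first-column drop-down should offer at the next step."""
        if previous_keyword is None:   # blank test: starting the app is the only sensible step
            return ["StartApp"]
        return KEYWORDS[previous_keyword]["next"]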

6.2.4.2 Keywords and the Virtual Machine

This scheme assumes that a keyword always executes successfully to the end so that it leaves the application being tested in the expected state. Of course, any tester can tell you that is not always going to be the case. Errors might occur or the GUI could be changed. This is where the simple framework comes in. If there is any failure during the execution of the keyword, or if the keyword execution does not drive the system to the expected state, the framework must document the occurrence in the log, clean up the environment (including backing out and perhaps shutting down the application being tested), and move the automation suite on to the next test to be run.

This recovery is transparent to the tester. When correctly built, the keyword architecture allows the testers using it to make the assumption that every test will pass, thus simplifying the testing from the viewpoint of the tester. After all, that is exactly how manual tests are written; each test is expected to pass and, if it doesn’t, the manual tester will be able to handle the cleanup and logging tasks.

We might not have figured out the hypothetical magic wand mentioned earlier yet, but the more intelligence automators can add to the framework, the closer we get to it.

If we assume that we can build an intelligent front end to help guide the user, then what we need are keywords. Each keyword should be modeled on a physical task that a tester would be expected to perform, as we had discussed for our manual test procedure. The tester arranges the keywords in the order of execution, exactly the way they did for the manual test procedure. Instead of having a manual tester supply the context and reasonableness during execution, however, the automator supplies it by programming a framework function to do the task. The automator builds into the keyword function the data pickup, error handling, synchronization, expected result handling, and anything else that a manual tester would have handled.

The automator also must build a virtual machine to execute the keywords. The definition of keyword-driven testing in the ISTQB glossary references the word script three times, calling this a scripting technique run by a control script. We believe this definition is too limiting. There is absolutely no reason that we have to use scripts or scripting techniques to build a successful keyword-driven tool. We are going to simply call this process a virtual machine. A given automation framework may use scripts, but the tools that Jamie has built have not always used scripts.

This virtual machine will handle all of the logging tasks, exception handling, failed test handling, and execution of the keywords. Essentially, this virtual machine is the framework. Had we started with a simple framework architecture, much of the complexity needed for the keyword architecture is already there. When started, the framework likely sets up the environment, initializes the log, picks up the first test to be run, picks up the first keyword, executes it, picks up the next keyword, and so on. At any time, a failure causes the virtual machine to back out of the current test, log the error (possibly including taking and storing a snapshot of the screen), clean up the environment, and prepare to run the next test case.
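
In code, the heart of such a virtual machine can be surprisingly small. The sketch below assumes the keyword functions from the simple framework already exist (stubbed here with placeholders) and shows only the dispatch-and-recover loop:

    import logging

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

    def clean_up_environment() -> None: ...     # back out, close the application, etc.
    def save_screenshot(name: str) -> None: ...

    KEYWORD_TABLE = {                           # keyword name -> framework function
        "StartApp": lambda **args: None,
        "Login":    lambda **args: None,
        "CloseApp": lambda **args: None,
    }

    def run_test(test_name: str, steps: list) -> None:
        """Execute one keyword test; on any failure, log it, capture evidence,
        clean up, and leave the suite ready for the next test."""
        for keyword, args in steps:
            try:
                KEYWORD_TABLE[keyword](**args)
            except Exception:
                logging.exception("%s: keyword %s failed", test_name, keyword)
                save_screenshot(f"{test_name}_{keyword}.png")
                clean_up_environment()
                return                          # abandon this test, not the whole run
        logging.info("%s passed", test_name)

    def run_suite(suite: dict) -> None:
        for test_name, steps in suite.items():
            run_test(test_name, steps)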

As part of the front end, there also should be a business process by which a tester can request a new keyword to perform a certain task. Suppose we were automating MS Word testing. A tester might ask for a keyword that allows them to create a table in a document, with row and column count as parameters. Once the tester asks for it, the automator would make the keyword available so that testers could immediately start using it in their test cases.

Note that we have complete separation of the keyword language, which is abstract and descriptive, and the virtual machine, which has all of the actual execution facilities. A new keyword can be created and used weeks before the keyword execution functionality is built into the virtual machine. Certainly the automator would need to create the functionality behind the keyword to make it executable before the tests are to be run.

This ability for functional testers to create keywords well before the functionality that executes them is written fits well into the model we are trying to describe here. After all, manual test cases can be created well before the code is available to run them, following the ISTQB processes of analysis, design, and implementation. All of this can be as complex as needed by the organization. Note that we are discussing building a front end, a virtual execution environment, an exception handler, and all of the other bells and whistles: all of this work means the automators must be good developers. This is a real software system that we are talking about, and it will likely take a small team of automators/programmers to build it.

6.2.4.3 Benefits of the Keyword Architecture

Consider the implications of this architecture. We have now effectively built a truly scalable test system. Suppose you have five automators building keyword systems for several different projects in an organization. There are some serious costs there. But notice that any number of projects with any number of testers can be supported; 10 or 20 or 100 testers could all use the keyword systems. There is no limit to the number of testers, nor is there a limit to the number of tests. Whether we have 1,000 or 100,000 test cases, we still need the same number of automators. Compare that to when we were using scripts and there was a finite number of scripts that a single automator could support.

This architecture also allows us to break away from the GUI. Want to test 3270 or 5250 dumb terminal emulators? UNIX or AIX interfaces? It takes some intelligent programming on the part of the automator, but the sky really is the limit.

Reviewing where we are now:

  • Domain experts (i.e., test analysts) describe the keywords needed and build the test cases using them.

  • Automation experts write the automation code to back the keywords and a mechanism for putting them in order easily (the front end).

  • An unlimited number of testers can be supported by relatively few automators.

  • Test creation is essentially point and click from the front end.

Incidentally, for most of the keyword systems that Jamie has built, he has included the execution module in the front end. This allows anyone to graphically choose what test suites to run and where and when to run them using the same tool as the tests were built in. Some of these also had log browsers built in to provide the ability to walk through the results.

There are several benefits to the keyword architecture that we have not yet addressed.

Throughout this entire discussion of automation, we have stressed what a problem change is. Anytime we are dealing with scripts and the underlying system under test changes, the scripts stop working and they must be repaired. That means the more tests we have, the more effort it takes to fix them. The simple framework architecture and data-driven architectures helped some, but there was still this issue with the scripts.

Manual test procedures, on the other hand, are rarely brittle—at least if we are careful to write them toward the logical end of the detail spectrum rather than as rigid concrete tests with screens, inputs, and outputs all hardcoded into the procedure. As long as we do the same tasks, change does not break them. The manual test procedures—at least when logical rather than concrete—are essentially abstract. Of course, the amount of detail in the test procedure is going to be governed by more than how changeable they need to be. Testing medical, mission-critical, and similar types of software is likely to require extremely detailed procedures.

Consider a common task like opening a file in Microsoft Word. We did it one way in Word 2000 and another way in Word 2007. Each new version of Word changed the way we opened a file; the interface changed radically during that time period. The differences are mostly subtle—insurmountable for capture/replay but easy for manual testers. Any automated function would have to change each time. But a manual test procedure that said, “Open a file in Word” would not need to change; the details are abstracted out.

A keyword test is very much like a manual test procedure. It contains abstract functionality, like <OpenFile>. Well-designed keyword tests are very resistant to change. The interface can change a lot, but the abstract idea of what a step is doing changes very slowly.

With keyword-based testing, our main assets—the keyword tests themselves—are resistant to change. The (perhaps) thousands of test cases do not need to change when the interface changes because they are abstract. The code backing each keyword is certainly likely to change over time, and the automators will need to make those changes. But the main assets, the tests themselves, will likely not need to change. The number of keywords tends to be relatively stable, so the maintenance on them will be relatively small (especially when compared to the amount of testing that is supported).

When you look at the advantages, if an organization is going to start an automation project, keywords are often the way to go. Not every domain is suitable for this kind of automation, but many are.

One disadvantage of capture/replay automation has always been how late in a cycle it can be done. In order to record a script, the system has to be well into system test so a recording can be successfully made. The earlier you record a script, the more likely change will invalidate it. The simple framework architecture mitigates that a little, but by the time the simple framework functions can be completed, plus all of the scripts created, it is again late in the cycle. Data-driven testing still uses scripts so we still have the same problem (to a lesser degree).

But keywords are abstract. As long as we have a good idea what the workflow is going to be for our system, we can have the keywords defined before the system code is written. Our testers could be developing their keyword test cases before the system is delivered, exactly the way they could when they were creating manual test cases and procedures. Conceivably, all keyword automated test cases could be ready on first delivery of the code. Certainly the functionality of the keywords will not be there yet. However, the automators could plausibly have the functionality close to ready based on early releases from development. Then, once the code is delivered, some minor tweaks and the automation might be running—early in system test when it could really make a difference.

Historically, automation has mostly been useful for regression testing during system test (or later). With keyword testing, it is possible to push automation further up in the schedule. It is conceivable that a suite of automated tests could be applied to a piece of functionality the first time it comes into test, provided the automators were supplied with early code releases to write their functions. Of course, in Agile life cycles, people use approaches like acceptance test-driven development (ATDD) and behavior-driven development (BDD) to create automated tests that can be run the first time the functionality is available for testing.

Data-driven techniques can be used to extend the amount of testing also. Rather than inputting specific data into column two, we could input pointers to data stores where multiple data sets are stored. This would effectively give us a data-driven keyword architecture.

There are a number of options available for keyword-driven tools. There are several open-source frameworks available as well as many commercial tools that incorporate keyword technology. Some of these sit on existing automation capture/replay tools, so you can adapt to keywords without losing your investment in frameworks that you have already written. If your organization has the skill set, you might also want to build your own framework. The front end could then be customized for higher productivity and to fit your own needs.

Over his career, Jamie has created a number of these, starting back in the mid-1990s. While they take a lot of time and effort, he has generally found them to be well worth the investment.

Note that the keyword tests themselves, which are really the testware assets for the organization, are completely abstracted from the execution layer—the virtual machine we have discussed. That means that the automators, if they had to, could completely switch automation tools without having to change any tests. Simply rewrite the framework functions that make up the virtual machine and the organization is back in business. Contrast that to all of the architectures we have discussed where all of the tests (scripts) are intimately tied into the tool they were written in.

It is possible to have several specialized automation tools (vendor and/or open source and/or home brewed) all working to support the abstract layer where the value resides. No longer is the automation team held hostage by a particular tool that may become an orphan because a tool company decided to deprecate it.

6.2.4.4 Creating Keyword-Driven Tables Exercise

Refer to the pseudo code recorded script as seen in Figure 6–3. This is the same code we saw in Figure 6–2.

Devise a keyword grammar that could be used to test this portion of the application. Note that you will need to use your imagination a bit to figure out what the recorded script is doing. The important thing is to look for the underlying business logic while reading the actions.

Image

Figure 6–3 Recorded WinRunner script

6.2.4.5 Keyword-Driven Exercise Debrief

The first six lines of this script could be seen as a single action. Note that the following would be a description of a human being (assume a doctor) interacting with the application using the keyboard and mouse:

  • The doctor must have started the application earlier, causing a password dialog screen to pop up. So the first thing is to mouse click on that screen, causing it to become active (line 1).

  • Since line 2 consists of us typing into an edit box (MR_number), we have to assume it was the default control. So we need to type in the value (“MRE5418”).

  • In line 3, we tab from the first edit box to the password edit box.

  • We type in the password in line 4 (“kzisnyixmynxfy”).

  • Notice that in line 5 we are setting the value in another edit box. How did we get there? It may have been a mouse click; it may have been a Return press that was handled directly by the control. It really does not matter. The important part is that it tells us we have a third value that needs to be entered. In line 5, we type into Edit_2 (which is actually the domain edit field) the domain value (“VN00417”).

The sum result is a single logical action. Since keyword names should be self-documenting, we are going to call this one

<LoginDomain>

It will take three arguments:

User ID, Password, and Domain name

This would be where the intelligent front end would need to come in to help the tester. There would have been a <StartApp> keyword that would have initialized the system, leaving it at the screen needed for <LoginDomain> to be called. After startup, the Task To Do drop-down list would contain <LoginDomain>. When selected, it would automatically pop up three edits so the user would see that three arguments must be passed in when using the keyword. After filling them in, the user would go on to enter the next keyword.

Lines 7 and 8 would be performed by the doctor as follows: After the doctor logs in to the domain, some kind of dialog pops up with a question. Since the script shows the user selected No when it popped up, we can reasonably assume that there are two separate paths we could take (Yes or No).

So the keyword we need is going to take a single Boolean argument. In this case, the question that was asked was, Do you want to prescribe a treatment? A logical keyword should show that a decision is being made. Therefore, we will call it

<DoTreatment> or perhaps <DoTreatment?>

This keyword will take a single Boolean argument that the front end would likely show as a check box.

The next step would appear to be redundant but was part of the security on the multilevel system. Lines 9 through 13 are the same actions as 1 through 6 on a different window. This is another layer of security allowing the doctor to get into the section of the system that allows prescribing drugs. Notice that we cannot use the same keyword <LoginDomain> because it is a different window that we are logging in to. So we need another keyword:

<LoginSection>

This will take three arguments: user ID, password, and a Boolean value. Note that in this case we are handling the box that pops up after the login by passing in whether we will click Yes or No. In this case, the message is a nag (informational) message that we want to ignore—hence the Yes answer.

Our keyword script (so far) would look like Table 6–5.

Table 6–5 Possible keyword solution

Image

Finally, notice that we could have conceivably passed four arguments to <LoginDomain>, with the fourth being whether to answer Yes or No to the screen that pops up after we log in to the domain. That would have eliminated the <DoTreatment> keyword. We think this is strictly a matter of taste. We need to balance the amount of things that a single keyword does by the number of keywords we have and the complexity of trying to understand each task.

6.3 Specific Test Tools

While automation is likely to be a very large portion of the tasks for a technical test analyst, there are other tool sets a TTA may be required to use. The following sections describe several different types of tools that may be seen. Although this is not an exhaustive list, it is representative of what we have seen.

6.3.1 Fault Seeding and Fault Injection Tools

Learning objectives

TTA-6.3.1 (K2) Summarize the purpose of tools for fault seeding and fault injection.

Fault seeding and fault injection are different but related techniques.

Fault seeding uses a compiler-like tool to put bugs into the program. This is typically done to check the ability of a set of tests to find such bugs. Of course, the modified version of the program with the bugs is not retained as production code! This technique is sometimes called mutation testing, where an effort is made to generate all possible mutants.3 NASA uses this technique to help in reliability testing. This technique can only be used when it is practical to generate a large number of fault seeded (or “be-bugged”) variants of a program and then to test all those variants (or “mutants”). This generally requires automation not only of the fault seeding but also of the testing of the variants.
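
As a toy illustration of the idea (not a production mutation tool), a fault seeder can be as simple as swapping one operator at a time in the source and rerunning the test suite against each resulting mutant. Here run_tests() and the file name are placeholders:

    MUTATIONS = [("<=", "<"), (">=", ">"), ("==", "!="), ("+", "-")]

    def generate_mutants(source: str):
        """Yield one mutated copy of the source per operator occurrence."""
        for original, replacement in MUTATIONS:
            pos = source.find(original)
            while pos != -1:
                yield source[:pos] + replacement + source[pos + len(original):]
                pos = source.find(original, pos + 1)

    def run_tests(mutant_source: str) -> bool:
        ...                                 # compile and run the suite against the mutant
        return False                        # placeholder result

    with open("module_under_test.py") as f: # hypothetical file name
        source = f.read()
    mutants = list(generate_mutants(source))
    killed = sum(1 for m in mutants if not run_tests(m))
    print(f"{killed} of {len(mutants)} seeded faults were detected by the test suite")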

Fault injection is usually about injecting bad data or events at an interface. This has the effect of forcing the system to execute exception handling code. For example, Rex has a tool that allows him to randomly corrupt file contents. The core logic of that tool is shown in Figure 6–4. Similar techniques can be used to bombard the software with a stream of incorrect inputs, failed services from the network or operating system, unavailable memory or hard disk resources, and so forth.

These types of tools are useful to both developers and technical test analysts.

Image

Figure 6–4 Core loop for a file corrupter utility
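
A minimal version of such a core loop (not the exact code in Figure 6–4) might look like this; the file name is illustrative:

    import random

    def corrupt_file(path: str, byte_flips: int = 10) -> None:
        """Overwrite a few randomly chosen bytes of a file with random values."""
        with open(path, "rb") as f:
            data = bytearray(f.read())
        for _ in range(byte_flips):
            offset = random.randrange(len(data))
            data[offset] = random.randrange(256)
        with open(path, "wb") as f:
            f.write(data)

    corrupt_file("copy_of_input.dat")       # always corrupt a copy, never the original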

6.3.2 Performance Testing and Monitoring Tools

Learning objectives

TTA-6.3.2 (K2) Summarize the main characteristics and implementation issues for performance testing and monitoring tools.

Before performance tools were readily available, Jamie used to simulate performance testing using a technique they called sneaker-net testing. Early Saturday or Sunday morning, a group of people would come into the office to do some testing. The idea was to try to test a server when few (or no) people were on it.

Each person would get set up on a workstation and prepare to connect to the server. Someone would give a countdown, and at go! everyone would press the connection key. The idea was to try to load the server down with as many processes as they could to see what it would do. Sometimes they could cause it to crash; usually they couldn’t. Occasionally, if they could get it to crash once, they could not crash it a second time.

This nonscientific attempt at testing was pretty weak. They could not get meaningful measurements; it was essentially binary: failed | didn’t fail. If they did get a failure, it was rarely repeatable, so they often did not understand what it told them.

Luckily we don’t have to do this anymore; in today’s world, real performance test tools are ubiquitous and relatively cheap compared to even 10 years ago. There are now open-source and leased tools and tools in the cloud, as well as some incredible vendor tools available.

We talked about the testing aspects of performance testing in Chapter 4. In this chapter we want to discuss the tool aspects. For the purposes of this discussion, let’s assume that we have done all of the modeling/profiling of our system, evaluated the test and production environments, and are getting ready to get down to actually doing the testing.

Often, the test system is appreciably smaller than the production system. Be aware that you may need to extrapolate all of your findings before they are actually meaningful. Extrapolation has been termed “black magic” by many testers. If the system model you are working from is flawed, extrapolating it to a full-size system will push the results that much further off. On top of that, there could easily be bottlenecks in the system that will only show up at load levels higher than we can reach in a lab. We can check for that on the system we test, but we have no way of knowing which problems will show up in production, where the hardware is likely different or at least set up differently.

6.3.2.1 Data for Performance Testing

At this point we need to define the data we are going to use, generate the load, and measure the dynamic aspects that we planned on. It is pretty clear that performance testing with poor or insufficient data is not going to be terribly useful; coming up with the right data is a non-trivial task.

We could use production data, but there are a couple of caveats that we need to consider. There are likely laws that are going to prevent us from using certain data. For example, in the United States, testing with any data that is covered by the Health Insurance Portability and Accountability Act (HIPAA) is problematic. Different countries have different privacy laws that might apply; testers must ensure that what they are using for data is not going to get them charged with a crime. There are tools available for data anonymization (also called data scrambling) that can render production data safe to use by changing values to make them still realistic but no longer personally identifiable.

Generally, production data is much bigger than the available data space in the test environment. That raises many questions about how the production data (assuming we are allowed to use it) can be extracted without losing important links or valuable connections between individual data pieces. If we lose the context that surrounds the data, it becomes questionable for testing (since that context might be needed by the system being tested).

Fortunately, there are a variety of tools that can be used to generate the data you need for testing. Depending on the capabilities needed, possibilities range from building your own data using an automation tool to generating it with really pricey vendor tools. Jamie’s suggestion, having been burned several times in his career by underestimating the effort it takes to generate enough useful data, is to plan plenty of time, effort, and resources for the creation of your data.

There are three distinct types of data that will be required.

There is the input data that your virtual users will need to send to the server:

  • User credentials (user IDs and passwords): Reusing credentials during the test may invalidate some of the findings (due to caching and other issues). In general, you should have a separate user ID for each virtual user tested.

  • Search criteria: Part of exercising the server will undoubtedly include searching for stuff. These searches should be realistic, for obvious reasons. Searches could be by name, address, region, invoices, product codes, and so on. Don’t forget wildcard searches if they are allowed. You should be familiar with all of the different kinds of searches your system can do; model and create data for them.

  • Documents: If your system deals with attached documents (including uploading and downloading) then those must be supplied also. Different file types should be tested; once again, if the server is going to do a particular thing in production, it probably should be modeled in test.

Then we have the target data. That would include all of the data in the back end that the server is going to process. This is generally where you get into huge data sets. If the data set is too small, some of the testing will not be meaningful (timing and resources needed for searches and sorts, for example). You will likely need to be able to extrapolate all of your findings (good luck with that) if this data set is appreciably smaller than in production.

Jamie’s experience in performance testing is that the test data is usually smaller than in production. Rex has often seen cases where the amount of data is similar to production. Clearly, either case is possible. If you have lots of data, great. If not, well, you must do the best you can with what you have. It is important that you document any perceived risks in either case.

Note that the backend data will need to support the input data that we discussed earlier.

You will most certainly need to have a process to roll the data back after a test is completed. Remember that we often will run a number of performance tests so that we can average out the results. If we are not testing with the database in the same condition each time, then the results may not be meaningful. Don’t forget to budget in the time it takes to reset the data.

Finally, we have to consider the runtime data. Simply getting an acknowledgment from the server that it has finished is probably not sufficient. To tell whether the server is working correctly, you should validate the return values. That means you need to know beforehand what the returned data is supposed to be. You could conceivably compare the return data manually; don’t forget to bring your lunch! Clearly this is a spot for comparator tools, which, of course, require you to know what is expected.

6.3.2.2 Building Scripts

One good way to get reference data is to run your performance transactions before the actual performance test begins. You are going to check the transactions before using them in a test, aren’t you? Make sure the data is correct, save it off, and then you can use a comparator tool during the actual run. Since we know what kinds of transactions we are going to be testing—we identified them in the modeling step—we now need to script them. For each transaction, we need to identify the metrics we want to collect if we did not do that while modeling the system. Then the scripting part begins.

Some performance tools have a record module that captures the middleware messages that connect the client to the server and places them in a script while the tester runs the transaction scenario by hand from the client. This is what Jamie has most often seen. More complex and time consuming would be to program the transaction directly. This will entail more complexity and effort, but it may be needed if the system is not available early enough to do recording.

Once we have a script, we need to parameterize it. When it was recorded, the actual data used were placed in the script. That data needs to be pulled out and a small amount of programming needs to be done so that in each place the script used a constant value, it now picks the data up from the data store.
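
The parameterization step is conceptually simple: every constant the recorder captured becomes a lookup into the data store. A sketch, with send_search() standing in for the recorded middleware calls and vuser_data.csv as a made-up data file:

    import csv

    def send_search(user_id: str, password: str, search_term: str) -> None:
        ...                                 # stand-in for the recorded middleware calls

    # Recorded version (hardcoded): send_search("jdoe", "secret", "invoice 1234")
    def run_virtual_user(row: dict) -> None:
        send_search(row["user_id"], row["password"], row["search_term"])

    with open("vuser_data.csv", newline="") as f:
        for row in csv.DictReader(f):       # one row of data per virtual user
            run_virtual_user(row)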

Once that’s done, the script itself must be tested. Try running it by itself and then try running multiple instances of it to make sure the data parameterization was done correctly.

6.3.2.3 Measurement Tools

By the time we get to this point, we are almost ready. However, we still need to set up our measuring tools. Most servers and operating systems have a variety of built-in tools for measuring different aspects of the server performance.

There are two types of metrics we need to think about. As a reminder from Chapter 4, the first metrics are response time and throughput measures, which compute how long it takes for the system to respond to stimuli under different levels of load and how many transactions can be processed in a given unit of time. The second set of metrics deals with resource utilization; how many resources are needed to deliver the response and throughput we need.

Response time is generally the amount of time it takes to receive a response after a request is submitted. It is usually measured on the machine (or machines) generating the load, typically by the performance tool itself. Any time spent before the request is submitted and after the response is received is called “think time.” Throughput is also generally measured by the performance tool, by counting how many submitted requests received responses within a given unit of time.
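
In rough terms, the tool computes these two kinds of numbers like this (send_request() stands in for one scripted transaction; think time is left out for simplicity):

    import time

    def send_request() -> None: ...         # stand-in for one scripted transaction

    def measure(request_count: int = 100) -> None:
        response_times = []
        start = time.perf_counter()
        for _ in range(request_count):
            t0 = time.perf_counter()
            send_request()
            response_times.append(time.perf_counter() - t0)   # per-request response time
        elapsed = time.perf_counter() - start
        print(f"average response time: {sum(response_times) / len(response_times):.3f} s")
        print(f"throughput: {request_count / elapsed:.1f} requests per second")

    measure()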

For the resource usage measurements, most are done on the servers. If we are dealing with a Windows server, then monitoring is relatively simple. Perfmon is a Microsoft tool that comes with the server operating system; it allows hundreds of different measurements to be captured.

If you are dealing with Linux or UNIX, there are a bunch of different tools that you might use:

  • pmap: process memory usage

  • mpstat: multiprocessor usage

  • free: memory usage

  • top: dynamic real-time view of process activity on the server

  • vmstat: system activity, hardware, and system information

  • sar: collects and reports system activity

  • iostat: average CPU load and disk activity

For mainframe testing, there are a variety of built-in or add-on tools that can be used.

Explaining how each of these tools works is, unfortunately, out of scope for this book. However, you can find each one and uncover extensive knowledge by performing a web search.

When dealing with these measurements, remember that we are still testing. That means that it is important to compare expected results against the actual values that are returned.

6.3.2.4 Performing the Testing

So now we are all set. We can start our performance test, right? Well, maybe not quite yet. We need to try everything together in a dress rehearsal; that is, we have to smoke test our performance test. It is our experience that all of the different facets almost never work correctly together on the first try. When we start ramping up a non-trivial load, we often start triggering some failures.

This is a great time to have the technical support people at hand. The network expert, database guru, and server specialist should all be handy. When—as inevitably happens—the system stubs its toe and falls over, they can do immediate troubleshooting to find out why. Often it is just a setting on the server or a tweak to the database that is needed and you can get back to the smoke test. Sometimes, of course, especially early in the process, you find a killer failure that requires extensive work. You need to be prepared because that does happen fairly often.

Once you get it all to run seamlessly, you might want to capture some baselines. What kind of response times are we getting with low load, and what did we expect to get? These will come in handy later when we're ramping up the test for real.

There are some extra things to think about when setting up a performance test.

In real life, people do not enter transaction after transaction without delay. They take some time to think about what they are seeing. This kind of think time should have been discussed when we were modeling the system, along with how many virtual users we want to simulate.

Depending on the kind of testing we are doing, we can ramp up the virtual user count in several different ways:

  • The big bang, where we just dump everyone onto the system at once (a spike test scenario)

  • Ramp up and down slowly, throwing all of the different transactions into the mix

  • Delay the start of some users while others are running

The way we ramp up is often decided based on the kind of testing we are doing.
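
The following Python sketch expresses the three ramp-up approaches as simple load profiles, each mapping elapsed time to a number of active virtual users. The user counts and durations are illustrative; commercial tools let you define equivalent profiles through their scenario editors.

```python
# Sketch: three illustrative ways to ramp virtual users over a test run.
# Each function maps elapsed seconds to the number of active virtual users.

TOTAL_VUS = 500          # illustrative rated load
TEST_DURATION = 3600     # illustrative test duration in seconds

def big_bang(elapsed: float) -> int:
    """Spike scenario: everyone arrives at once."""
    return TOTAL_VUS

def slow_ramp(elapsed: float) -> int:
    """Ramp up over the first half of the test, ramp down over the second."""
    half = TEST_DURATION / 2
    if elapsed <= half:
        return int(TOTAL_VUS * elapsed / half)
    return int(TOTAL_VUS * (TEST_DURATION - elapsed) / half)

def staggered(elapsed: float, batch: int = 50, delay: int = 120) -> int:
    """Release users in batches, delaying each batch by a fixed interval."""
    return min(TOTAL_VUS, batch * (int(elapsed // delay) + 1))
```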

Likewise, the duration of the test will also depend on the type of test. We may run it for a fixed amount of time, based on our objectives. Some tests we might want to run until we are out of test data. Other tests we might decide to run until the system fails.

After we shut down the performance test, we still have some work to do. We must go grab all of the measurements that were captured. One good practice is to capture all of the information from the run so we can analyze it later. Inevitably, we forget something on our initial sweep; if we don’t save it off, it will be lost forever.

After all this, draw a deep breath, step back, and compare what you found with what you expected. Did you learn what you wanted to? In other words, did the results show that the requirements are fulfilled? This is not a silly question. Jamie read a fascinating article in a science magazine that discussed how often researchers run an experiment and then don't believe the results because they don't fit their preconceived mind-set. As testers, we are professional pessimists. We need to learn to recognize both a passing test and a failing test.

If the test passed, then you are done. Write the report and go get a tall refreshing beverage; you deserve it. If there are details that weren’t captured...well, welcome to the club. Tomorrow is another day.

6.3.2.5 Performance Testing Exercise

Given the efficiency requirements in the HELLOCARMS system requirements document, determine the actual points of measurement and provide a brief description of how you will measure the following:

1. 040-010-050

2. 040-010-060

3. 040-020-010

The results will be discussed in the next section.

6.3.2.6 Performance Testing Exercise Debrief

040-010-050

Credit-worthiness of a customer shall be determined within 10 seconds of request. 98% or higher of all Credit Bureau Mainframe requests shall be completed within 2.5 seconds of the request arriving at the Credit Bureau.

In this case, we are testing time behavior. Note that this requirement is badly formed in that there are two completely different requirements in one. That being said, we should be able to use the same test to measure both.

The first is 2.5 seconds to complete the Credit Bureau request.

Ideally, this measurement would be taken right at the Credit Bureau Mainframe, but that is probably not possible given its location. Instead, we would have to instrument the Scoring Mainframe and measure the time between sending a transaction request to the Credit Bureau Mainframe and the return of that same transaction. That would not be exact because it does not include transport time, but since we are talking about a rather large time frame (2.5 seconds), it would likely be close enough.

The second is 10 seconds for the determination from the Telephone Banker side.

This measurement could be taken from the client side. Start the timer at the point the Telephone Banker presses the Enter button and stop it at the point the screen informs the banker that it has completed. Note, in this case, that we would infer that the client workstation must be part of the loop, but since it is single threaded (i.e., only doing this one task), we would expect actual client time to be negligible. So, our actual measurement could conceivably be taken from the time the virtual user (VU) sends the transaction to the time the screen is fully loaded. That would allow the performance tool to measure the time.

Clearly, this test would need to be run with different levels of load to make sure there is no degradation at rated load: 2,000 applications per hour in an early release (040-010-110) and later at 4,000 applications per hour (040-010-120). In addition, it would need to be run a fairly long time to get an acceptable data universe to calculate the percentage of transactions that met the requirements.

040-010-060

Archiving the application and all associated information shall not impact the Telephone Banker’s workstation for more than .01 seconds.

Again, we are measuring time behavior.

Because the physical archiving of the record is only a part of this test, this measurement would be made by an automation tool running concurrently with the performance tool. Our assumption is that the way the requirement is worded, we want the Telephone Banker ready to take another call within the specified time period. That means the workstation must reset its windows, clear the data, and so on while the archiving is occurring.

We would have a variety of scenarios that run from cancellation by the customer to declined applications to accepted. We would include all three types of loans, both high and low value. These would be run at random by the automation tool while the performance tool loaded down the server with a variety of loans.

The start of the time measurement will depend on the interface of HELLOCARMS. After a job has completed, the system might be designed to reset itself or the user might be required to signal readiness to start a new job. If the system resets itself, we would start an electronic timer at the first sign of that occurring and stop it when the system gives an indication that it is ready. If the user must initiate the reset by hand, that will be the trigger to start the timer.

040-020-010

Load Database Server to no more than 20% CPU and 25% Resource utilization average rate with peak utilization never more than 80% when handling 4,000 applications per hour.

This requirement is poorly formed. During review, we would hope that we would be able to get clarification on exactly what resources are being discussed when specifying percentages. For the purposes of this exercise, we are going to make the following assumptions:

  • 25% resource utilization will be limited to testing for memory and disk usage.

  • Peak utilization applies to CPU alone.

  • The network has effectively unlimited bandwidth.

This test will be looking at resource utilization, so we would monitor a large number of metrics directly on the server side for all servers:

  • Processor utilization on all servers supplying the horsepower

  • Available memory

  • Available disk space

  • Memory pages/second

  • Processor queue length

  • Context switches per second

  • Queue length and time of physical disk accesses

  • Network packets received errors

  • Network packets outbound errors

The test would be run for an indeterminate time, ramping up slowly and then running at the rated rate (4,000 applications per hour) with occasional forays just above the required rate.

After the performance test had run, we would graph the metrics that had been captured from the database server. Had the server CPU reached 80 percent at any time, we would have to consider the test failed. Averaging over the entire time the test had been running, we would check that memory, disk, and CPU usage on the database server each averaged less than its rated value.
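
As an illustration only, the following Python sketch evaluates a set of captured samples against our reading of this requirement. It assumes the CSV format from the earlier capture sketch and treats 20 percent average CPU, 25 percent average memory and disk, and an 80 percent CPU peak as the pass/fail thresholds.

```python
# Sketch: post-run evaluation of captured samples against 040-020-010.
# Assumes the resource_log.csv format produced by the earlier capture sketch.
import csv

def avg(values):
    return sum(values) / len(values)

def evaluate(path: str = "resource_log.csv") -> bool:
    cpu, mem, disk = [], [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cpu.append(float(row["cpu_pct"]))
            mem.append(float(row["mem_pct"]))
            disk.append(float(row["disk_pct"]))
    passed = (
        avg(cpu) <= 20.0       # average CPU no more than 20%
        and avg(mem) <= 25.0   # average resource (memory) no more than 25%
        and avg(disk) <= 25.0  # average resource (disk) no more than 25%
        and max(cpu) < 80.0    # CPU peak never reaches 80%
    )
    print(f"avg CPU {avg(cpu):.1f}%, peak CPU {max(cpu):.1f}%, "
          f"avg mem {avg(mem):.1f}%, avg disk {avg(disk):.1f}% -> "
          f"{'PASS' if passed else 'FAIL'}")
    return passed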

Note that this test points out a shortcoming of performance testing. In production, the database server could easily be servicing other processes beyond HELLOCARMS. These additional duties could easily cause the resources to have higher utilization than allowed under these requirements.

In performance testing, it is always difficult to make sure we are measuring apples to apples and oranges to oranges. We fear without more information, we might just be measuring an entire fruit salad in this test.

6.3.3 Tools for Web Testing

Learning objectives

TTA-6.3.3 (K2) Explain the general purpose of tools used for web-based testing.

Web tools are another common type of test tool.

A frequent use of these tools is to scan a website for broken or missing hyperlinks. Some tools may also provide a graph of the link tree, the size and speed of downloads, hits, and other metrics. Some of these are used before going live; others can be used to run regular checks on a live website to minimize the time that broken links are exposed or to optimize the user experience.
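
To show how little machinery the core of such a tool requires, here is a minimal Python sketch of a broken-link checker built from the standard library alone. The start URL is illustrative, and real tools add much more: link-tree graphs, download timing, scheduling of regular scans, and so on.

```python
# Sketch: the core of a broken-link checker using only the standard library.
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href targets of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def check_links(start_url: str) -> None:
    page = urllib.request.urlopen(start_url).read().decode("utf-8", "replace")
    parser = LinkExtractor()
    parser.feed(page)
    for link in parser.links:
        target = urljoin(start_url, link)
        try:
            status = urllib.request.urlopen(target).status
            print(f"{status}  {target}")
        except urllib.error.URLError as err:   # covers HTTP errors and dead hosts
            print(f"BROKEN  {target}  ({err})")

# check_links("http://www.example.com/")   # illustrative start page
```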

Some tools can do a form of static analysis on the HTML/XML to check for conformance to standards. This validation is often done against W3C specifications, but some tools also might validate a website against Section 508 (the accessibility standard) of the US Rehabilitation Act of 1973.

Spell checking and validation against specific browsers can also be done by some tools.

There are a wide variety of web testing tools that fall into the category of test automation and/or performance tools:

  • Selenium: An open-source suite of tools that run in several different browsers across different operating systems (a minimal usage sketch follows this list).

  • Latka: An end-to-end functional testing automation tool implemented in Java. It uses XML syntax to define HTTP/HTTPS requests and a set of validations to ensure that the requests were answered correctly.

  • Watij: A Java-based open-source tool that automates functional testing of web applications through a real browser.

  • SlimDog: A simple script-based web testing tool based on HttpUnit.

  • LoadSim: A Microsoft-supplied tool that simulates loads on Microsoft Exchange servers.

  • Sahi: A JavaScript-based record playback tool for browser-based testing.
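
As a hint of what driving a real browser with one of these tools looks like, here is a minimal Selenium WebDriver sketch using the Python bindings. It assumes the selenium package and a matching browser driver are installed; the URL, locators, and expected text are illustrative.

```python
# Minimal Selenium WebDriver sketch (Python bindings). The URL and element
# locators are illustrative placeholders, not a real application under test.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()          # requires geckodriver on the PATH
try:
    driver.get("http://www.example.com/login")                   # illustrative page
    driver.find_element(By.NAME, "username").send_keys("testuser")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.ID, "loginButton").click()
    # Simple oracle: the post-login page should greet the user.
    assert "Welcome" in driver.page_source, "No welcome message after login"
finally:
    driver.quit()
```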

Of course, not all tools used to test website performance are special-purpose tools. For example, ordinary performance tools can be used to drive load to ascertain if servers can reasonably handle expected load.

Many special-purpose security tools have been designed to search for penetration and other security issues endemic to websites. Some of these can be used by a developer while writing code; the tool provides real-time feedback to the programmer by automatically detecting risky code. Other tools can provide a complete security assessment of the website before going live.

A variety of test tools are now available for testing client-side capabilities. For example, JavaScript and Ajax are often used on the client to offload some processing from the server; these can be tested in different browsers to ensure optimum user satisfaction.

An important point to remember about many of these web testing tools is that they perform some testing tasks really well but other testing tasks are either not supported or difficult to do. This is fairly common with open-source tools. The designers of the tool are often interested in solving a particular problem and they design the tool accordingly. While this is certainly not true of every open-source tool, an organization that decides to use open-source tools should expect to mix several tools together to create a total solution.

Web tools are used by both test analysts and technical test analysts. They can use these tools at any point in the life cycle once the website to be analyzed exists.

6.3.4 Model-Based Testing Tools

Learning objectives

TTA-6.3.4 (K2) Explain how tools support the concept of model-based testing.

The phrase model-based testing can, as a practical matter, mean a number of different things. At a high level, you can say that model-based testing occurs when a model (e.g., a state transition diagram, a business process flow, or the web of screen linkages in a user interface) describes the behavior or structure of a system and that model is used to generate functional or nonfunctional test cases. This process can be supported by tools:

  • To generate the tests, either at a logical level or concrete level of detail

  • To execute the tests against the system

  • To do both of these things, either sequentially (generate then execute) or in parallel (just-in-time generation for execution)

Model-based testing is interesting in that it is a practical, real-world testing approach that grew out of academic research, especially on the subject of state-based systems.

When done with tool support, a nice benefit of the technique is that the tool can generate and execute a huge number of tests, as we'll illustrate with an example. If time is limited, some of the tools can prune the test set, though the tester will need to tell the tool the criteria for selecting and rejecting tests. Because of the much larger number of tests, and the automatic (in some cases randomized) generation of input values, these model-based test tools will often find bugs that manual testers would miss.
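
To make the generation step concrete, here is a minimal Python sketch that performs a random walk over a small state-transition model and emits logical test cases. The model of a login dialog is invented for illustration; real model-based testing tools add coverage criteria, pruning, and concrete data generation on top of this idea.

```python
# Sketch: generating logical test cases from a simple state-transition model.
# The model (states, events, next states) is invented for illustration.
import random

# Illustrative model of a login dialog: state -> {event: next_state}
MODEL = {
    "LoggedOut":  {"enter_valid_credentials": "LoggedIn",
                   "enter_bad_credentials": "LoginError"},
    "LoginError": {"retry": "LoggedOut"},
    "LoggedIn":   {"logout": "LoggedOut"},
}

def generate_test(start="LoggedOut", steps=6, seed=None):
    """Random walk through the model, producing a sequence of (state, event) pairs."""
    rng = random.Random(seed)
    state, path = start, []
    for _ in range(steps):
        event = rng.choice(list(MODEL[state]))
        path.append((state, event))
        state = MODEL[state][event]
    return path, state

if __name__ == "__main__":
    for i in range(3):                      # generate a few logical test cases
        path, end_state = generate_test(seed=i)
        print(f"Test {i}: {path} -> ends in {end_state}")
```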

Let’s look at an example of a model-based testing tool developed to do functional and nonfunctional testing on a simple mobile application.4 The application is called ePRO-LOG, an electronic diary for patients to use for data collection in clinical trials and disease management programs. Depending on the treatment, these diaries include over 100 forms that must be traversed to capture information about a particular treatment, such as injection-site inflammation, allergic reactions, and so forth. Not only were there many forms, but all the forms might have to be supported in a dozen or more languages used in several locales. This led to a large and complex collection of configurations. Each configuration must be adequately tested according to FDA regulations. These regulations call for 100 percent verification of requirements with traceability of tests and their results back to the requirements. The company was also concerned about reliability, translation completeness, functionality of UI, and input error checking.

In addition to addressing these testing issues, the company wanted the following benefits:

  • Capture accurate diagrams of the screen flows that had been tested.

  • Reduce tedious, error-prone manual screen verification testing.

  • Reduce duration and effort associated with the testing.

  • Automatically generate tests for any screen flow or translation.

As a first step, we evaluated commercial automated test tools as a potential solution but were unable to find any that would satisfy our requirements. Therefore, a custom tool was built using open-source components. We started with the basic idea of a dumb monkey tool, an unscripted automated test tool that gives input at random. The dumb monkey was implemented in Perl under Cygwin running on a Windows PC. This tool went beyond the typical dumb monkey, though, as it had the ability to check against a model of the system’s behavior, encapsulated in machine-readable requirements specifications. Since we were building a tool we called the monkey, as you can imagine, elements of humor entered the project, as illustrated by the terminology shown in Table 6–6.

Table 6–6 Terms used for the model-based testing tool

Image

The monkey started off working much the same as any dumb monkey tool, sending random inputs to the user interface, forcing a random walk through the diary implemented in the ePRO-LOG application. Because the inputs are random, the tests are diary independent. Three simple test oracles are implemented at this point without any underlying model: check each form for broken links, missing images, and input validation errors. The tests, once started, run for days and test a huge number of input combinations, thus doing long-term reliability and robustness testing as well. At this point, we are still in the realm of typical dumb monkey test tools.
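
For readers who have not seen one, the following Python sketch captures the generic dumb-monkey idea of random inputs checked by simple, model-free oracles. The original ePRO-LOG tool was written in Perl, and the ScreenDriver interface and its method names here are pure assumptions for illustration.

```python
# Sketch of the generic dumb-monkey idea: random inputs plus simple,
# model-free oracles. The driver object and its methods are assumed here;
# they stand in for whatever hooks the tool uses to drive the application.
import random

def monkey_walk(driver, iterations=10_000, seed=42):
    """Randomly press available controls, checking a few model-free oracles."""
    rng = random.Random(seed)
    for step in range(iterations):
        control = rng.choice(driver.available_controls())   # assumed driver API
        driver.press(control)                                # random input
        # Model-free oracles: broken links, missing images, validation errors.
        anomalies = (driver.broken_links()
                     + driver.missing_images()
                     + driver.validation_errors())
        for problem in anomalies:
            print(f"step {step}: anomaly detected: {problem}")
```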

However, the tool captured all the tests performed and results obtained in human-readable documentation, thus supporting FDA audits if necessary. Since the proper translation of each form had to be checked, the capture of this information simplified that process as well. This is not typical of a dumb monkey. A graphical view of how the elements of the tool all fit together is shown in Figure 6–5.

Image

Figure 6–5 ePRO-LOG and the test tool’s subsystem interactions

The aspect that transformed the tool from a dumb monkey to a model-based testing tool, however, was the ability to read machine-readable requirements specifications to determine the correct appearance of each screen. These requirements were machine readable to support another element of FDA auditing of the software development (rather than testing) process, so there was no extra work to create them. The tool had a parser that allowed it to read those specifications as it walked through the application. (The product of this parsing process is referred to as Monkey Chow, since it was a main input to the tool.) If the tool found that the wrong screen was displayed, or that the right screen was displayed but some of the information on the screen was wrong, it would log that as a failure. All of the logging was done in a way that it would support FDA audits, which was a tremendous benefit given the effort required to do that manually.

Image

Figure 6–6 Using the requirements as a model

Figure 6–6 shows the model-based element of the tool. The Presentation Monkey produces a screen flow diagram from the Monkey Droppings file as shown previously in Figure 6–5. This diagram shows what screens were observed during monkey testing. It also produces a screen flow diagram using the requirements specifications instead of the Monkey Droppings file. This diagram shows the expected functional flow of ePRO-LOG. This is fed to a comparator tool that can log any anomalies.

An example is shown in Figure 6–7. The requirements specifications said that the screen flow should go from FormT4 to FormT5 before FormSave, but instead the application went directly from FormT4 to FormSave. In addition, the requirements said the screen flow should go from FormI2 directly to FormSave, but instead the application went from FormI2 to FormI3 before going to FormSave.

Image

Figure 6–7 An example of the monkey finding two anomalies

6.3.5 Tools to Support Component Testing and Build Process

Learning objectives

TTA-6.3.5 (K2) Outline the purpose of tools used to support component testing and the build process.

While often considered the domain of developers or release engineering teams, in some cases we have seen clients assign build and component test automation responsibilities to technical test analysts. Even if another team owns these tools, you should know how they work because you’ll find ways to integrate automated tests into the automated build and component test framework. If you are working in an organization following an Agile life cycle, you will almost certainly encounter these tools. Even if your organization follows a sequential life cycle, you can take advantage of these tools, though you’re more likely to have to take a lead role in establishing the tools in that case.

These tools are essential to support a process referred to as continuous integration. Some of the tools are language specific, but a commonality has grown up around these tools such that the same process can be set up and automated for most modern programming languages. Continuous integration is a prevalent practice commonly associated with Agile life cycles. However, Rex has been involved in projects using continuous integration frameworks since 1992, at which time anyone referring to a colleague as an Agile programmer would probably have been inspired by that programmer’s ability to juggle.

Continuous integration involves programmers checking in code to the configuration management system once that code has achieved certain entry criteria. At the least, those entry criteria should include passing a set of automated unit tests. The best practice is to require static analysis of the code, followed by a code review meeting that examines the code, the unit tests, the unit test results, the code coverage achieved by the unit tests, and the static analysis results.

The code is checked in, merged with any other changing code as necessary, and integrated with the other code in the relevant code branch. At this point, an automated process occurs periodically—at least once a day but often more frequently—to compile, build, test, and perhaps even deploy a new release. This tight loop of check-in/build/test allows the quick detection and repair of bugs, especially the particularly momentum-killing build-breaking bugs. The specifics of the continuous integration tools and process can vary but will include some or all of the following:

  • Using a version control tool, often integrated with a continuous integration tool, to drive the entire process automatically, checking for new or changed code periodically and, if appropriate, triggering the build-and-test cycle

  • Performing a static code analysis (that might be skipped or restricted in terms of scope if the individual units are already subjected to such an analysis prior to check-in)

  • Building the release via whatever compile and link process is relevant for the particular language or languages being used

  • Running the automated unit tests, not only for the new and changed code but for all code, measuring the code coverage achieved by these unit tests, and reporting the test results and coverage metrics

  • If the static analysis, build, and unit tests are successful, installing the release into a test environment, which may be a dedicated environment for this purpose or the general environment used for system testing

  • Running a set of automated integration tests, which should be (but regrettably seldom are) focused on testing the interfaces between the units, and reporting the test results

  • Running higher-level automated tests, such as system tests and feature acceptance tests, which can in some cases include performance and reliability tests, though these test sets may take longer to run and analyze and thus might be run separately from the continuous integration process

The reporting of test results should also be entirely automated, either via updating an intranet site or project wiki or sending email to the project team. In the case of an Agile project, the task board may be updated to display status as well.
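
Continuous integration servers normally drive this loop from their own configuration, but the following Python sketch strings the steps together as a plain script to make the sequence concrete. Every command shown is an assumption standing in for whatever build, analysis, and test commands your project actually uses.

```python
# Sketch: the check-in/build/test loop as a plain script. Real teams would
# drive this from a CI server; every command below is an illustrative placeholder.
import subprocess
import sys

STEPS = [
    ("update from version control", ["git", "pull", "--ff-only"]),
    ("static analysis",             ["make", "lint"]),
    ("compile and link",            ["make", "build"]),
    ("unit tests with coverage",    ["make", "unit-test", "COVERAGE=1"]),
    ("deploy to test environment",  ["make", "deploy-test"]),
    ("automated integration tests", ["make", "integration-test"]),
]

def run_pipeline() -> None:
    for name, command in STEPS:
        print(f"--- {name}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            # A failing step breaks the build and should be reported immediately.
            sys.exit(f"Build broken at step: {name}")
    print("Build green: results can be published to the wiki or mailed to the team.")

if __name__ == "__main__":
    run_pipeline()
```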

If bugs are found during the build-and-test cycle, developers can use debugging tools to chase down the problems. As a technical test analyst, you might be involved in helping to do this task, especially if you have strong programming skills. However, you’ll want to make sure the test manager is aware that you are doing so, because that may distract you from tasks more properly within your scope.

When these processes work well, they can almost entirely eliminate the broken-build or untestable-build problems that plague many test teams. The automated tests also catch some percentage of regression bugs. However, if only automated unit tests are included in the test set, the regression detection abilities will be limited; including higher-level tests in the process greatly increases the regression risk mitigation.

In Figure 6–8, you can see an example of a continuous integration process that Rex and his colleagues created for a client. The four steps of the process are shown in the figure. UCI stands for “unit, component, and integration,” which were the three levels of tests that the programmers were responsible for defining and automating. It proved rather difficult with this team—as it has with most programmer teams Rex has worked with—to get them to create good integration tests. As a technical test analyst, you should help support this process because when specific integration tests aren’t done, system test becomes a de facto big bang integration test level, with all the risk associated with that approach to integration testing. This project happened in early 2001, so you can see that continuous integration is a well-established, time-proven technique.5

Image

Figure 6–8 A continuous integration process with associated tools

6.4 Sample Exam Questions

1. A test team uses a test management tool to capture its test cases, test results, and bug reports. The business analysts use a separate tool to capture business requirements and system requirements. Which of the following tool integration considerations most applies in this situation?

A. Traceability

B. Defect classification

C. Return on investment

D. Probe effect

2. Consider the following costs:

I. Designing and documenting the test automation architecture

II. Ongoing license fees

III. Extending coverage for new features

IV. Hardware and software to run the automation tool

V. Maintenance costs of scripts

VI. Integrating automation processes into the project

Which of the above should be considered as initial costs and which should be seen as recurring costs?

A. All are initial costs.

B. I, II, IV are initial costs; III, V, and VI are recurring costs.

C. I, IV, VI are initial costs; II, III, and V are recurring costs.

D. I, III, and VI are initial costs; II, V, and VI are recurring costs.

3. Your organization has just spent over one million dollars on a capture/replay tool set to introduce automation to an ongoing project. You have been named the lead automator for the automation project. Which of the following tasks should be considered the most important for you to deal with right away?

A. Begin building scripts to start accumulating automation assets.

B. Start training all testers on how to use the automation tool set.

C. Research and request a test management tool to manage the automation.

D. Select the automation architecture you are going to pursue.

4. Select the reason an organization may choose to build a keyword-driven architecture instead of a data-driven architecture.

A. To allow automation of functional tests

B. To allow multiple tests to use the same script

C. To allow automation of regression tests

D. To allow for better use of technically trained testers

5. Which of the following are possible reasons that automation projects fail to deliver value?

I. Testers often do not know how to write code.

II. Stakeholders’ expectations are often unrealistic.

III. Excessive interface changes from build to build.

IV. Underestimating the amount of time and resources needed.

V. Manual testing processes are too immature.

VI. Tool used does not work well in test environment.

VII. Failure to design an architecture that anticipates change.

A. II, III, IV, V, VI

B. I, III, IV, V, VII

C. None of these

D. All of these

6. Which of the following points of information about your organization would tend to make keyword-driven automation a desirable automation method rather than using a straight data-driven methodology?

A. Almost all of the testers on your test team have backgrounds in programming.

B. The systems you are testing have radical interface changes at least three times a year.

C. Most of your testers came from the user or business community.

D. Your organization has a limited budget for purchasing tools.

7. As the lead automator, you have been tasked with creating the design for the keyword-driven architecture. You have decided that, because so many of the testers who will be creating tests are highly experienced using the current interface, you will allow very granular keywords (e.g., ClickButton(btnName), TypeEdit(edtName, Text), etc.). Which of the following may be the worst disadvantage of allowing this level of granularity?

A. Testers may have problems using the low-level keywords.

B. Keyword tests will become vulnerable to interface changes.

C. Keyword tests will likely become too long to understand.

D. The framework that supports execution will be too brittle.

8. Which of the following statements captures the difference between fault seeding and fault injection?

A. Fault seeding involves corrupting inputs, data, and events, while fault injection involves systematically introducing defects into the code.

B. Fault injection involves corrupting inputs, data, and events, while fault seeding involves systematically introducing defects into the code.

C. Fault injection and fault seeding are the same; both involve corrupting inputs, data, and events.

D. Fault injection and fault seeding are the same; both involve systematically introducing defects into the code.

9. You are in the analysis and design phase of your performance testing project. You have evaluated the production and test environments. You have created the data to be used and built and parameterized the scripts. You have set up all of the monitoring applications and notified the appropriate support personnel so they are ready to troubleshoot problems. Which of the following tasks, had it not been done, would surely invalidate all of your testing?

A. Ensure that the test environment is identical to the production environment.

B. Model the system to learn how it’s actually used.

C. Purchase or rent enough virtual user licenses to match peak usage.

D. Bring in experienced performance testers to train all of the participants.

10. You are senior technical test analyst for a test organization that is rapidly falling behind the curve; each release, you are less able to perform all of the testing tasks that are needed by your web project. You have very little budget for tools or people, and the time frame for the project is about to be accelerated. The testers in the group tend to have very little in the way of technical skills. Currently, 100 percent of your testing is manual, with about 15 percent of that being regression testing. Which of the following decisions might help you catch up to the curve?

A. Allow the testers to use open-source tools to pick low-hanging fruit.

B. Put a full automation project into place and try to automate all testing.

C. Find an inexpensive requirements/test management tool to roll out.

D. Build your own automation tool so it does not cost anything.

11. Which of the following is unique about the way model-based testing tools support testing compared to typical test execution tools?

A. The tools can automatically generate test results logs.

B. The tools can compare actual and expected results for anomalies.

C. The tools can generate the expected results of the test from a model.

D. The tools can generate the model from any type of requirements specification.

12. Which of the following is a way that component testing and build tools directly benefit testers?

A. The programmer can step through code to find defects.

B. The tools automatically update the Agile task board.

C. The testers don’t have to worry about regression of existing features.

D. The incidence of broken or untestable builds is reduced.
