Chapter Fourteen
Evaluating Instructional and Noninstructional Interventions

In Chapter 2, we discussed the process for conducting a needs assessment, which helps the instructional design professional scope out a problem or opportunity that can be addressed by an instructional or noninstructional intervention. We then covered designing and utilizing learning assessments, identifying the desired outcomes and aligning these with the overall goals of the instructional intervention. These chapters provide the foundation for evaluation which, while conducted after the intervention, should be established early in the ID process when business and performance needs are identified, desired behavior changes are determined, and learning objectives and measurement strategies are set. This chapter summarizes the evaluation strategies and tactics used to answer the critically important question—what difference did it make? The answer to this question provides insights into potential improvements that can be made and valuable information to share with a variety of stakeholders who want to know if their investments are yielding worthy returns.

In this chapter, we clarify assumptions about formative and summative evaluation, define key terms associated with these activities, and include a case study that dramatizes issues that can arise when developing a formative evaluation plan. We also describe the steps used to establish a formative evaluation plan and approaches to implementing that plan. This chapter also describes summative evaluation and covers various post-intervention evaluation models.

Purpose and Definitions of Evaluation

Instructional designers often hold the belief that their work is not finished until the targeted audience can learn from the material. The fourth edition of Mastering concentrated primarily on formative evaluation, which “involves gathering information on adequacy and using this information as a basis for further development” (Seels and Richey 1994, 57). This ensures that the instruction is sound before it is deployed to large numbers of intended learners. This edition provides more complete coverage of summative evaluation, which “involves gathering information on adequacy and using this information to make decisions about utilization” (Seels and Richey 1994, 57). It helps to determine the results of the instruction post-implementation. Evaluation in all its forms has figured prominently in instructional design practice as decision makers demand increasing accountability (Rothwell, Lindholm, and Wallick 2003).

According to The Standards (Koszalka, Russ-Eft, and Reiser 2013, 55), even though the competency and two performance statements associated with evaluation are classified as advanced, “even a novice instructional designer should be familiar with the need for evaluation and revision prior to any dissemination of instructional products and programs.” High accountability is now being demanded for all or most forms of instruction, and evaluation skills are critical for all designers to possess. As organizations have tightened their budgets, the need to prove that training is a worthwhile investment has become more and more important. The performance statements associated with this competency indicate that instructional designers should be able to (Koszalka, Russ-Eft, and Reiser 2013, 56): “(a) Design evaluation plans (advanced); (b) Implement formative evaluation plans (essential); (c) Implement summative evaluation plans (essential); and (d) Prepare and disseminate evaluation report (advanced).”

Stakeholders in Evaluation

Stakeholders are those who are interested in the results of an evaluation effort and can range from front-line employees to senior-level executives, including the CEO (Phillips and Phillips 2010a, 2010b). The interests and objectives of stakeholders can vary and can even compete. Table 14.1 lists stakeholders and their potential interest(s) in the outcomes of the evaluation.

Table 14.1 Interest in Evaluation by Stakeholder

Stakeholder / Key Questions or Interests

Instructional designers
  • Did the learning work as intended?
  • Did learners respond favorably to the learning?
  • What improvements can be made for next time?

Instructors or facilitators
  • Were learners satisfied with the experience?
  • Did learners achieve the intended outcomes?
  • How can I improve my delivery?

Learners
  • Did my knowledge or skill improve?
  • Am I more productive or effective in my job?

Managers of learners
  • Did my people acquire new knowledge or skills?
  • Are they more productive or effective in their jobs?
  • Did the benefits received outweigh the “cost” of participating?

Executives or sponsors
  • Are learners more productive?
  • Are learners demonstrating behaviors that will further our strategic objectives?
  • Are learners adding greater value than prior to participating?

Data Collection Approaches

When an instructional designer engages in evaluation, it is similar to a scientist conducting research. Many of the same principles, strategies, and techniques apply. Research has many purposes, but “there are two basic purposes for research: to learn something, or to gather evidence” (Taflinger 2011). Like the researcher, the instructional designer involved in evaluation efforts works to frame important questions and systematically attempts to answer those questions. The ability to collect and analyze data is central to the role of the evaluator.

Sources of Evaluation Data

There are many data sources that can be tapped into for evaluation. The data may be quantitative (numeric), such as survey responses on a 5-point scale, or qualitative (non-numeric), such as the results of an interview. Some of the most commonly utilized sources of data include: the learners themselves, managers, subject experts, program sponsors, and what Allison Rossett (1987, 25) refers to as extant data or the “stuff that companies collect that represents the results of employee performance.” Examples of extant data include organizational performance data or reports (sales, profitability, productivity, quality), engagement or satisfaction surveys, and talent data or reports (retention, performance ratings, promotion, mobility, exit interviews).

Use of a single data source can lead to inaccurate evaluation results because the perspective being considered may be highly subjective or limited. It is therefore important to use multiple data sources whenever feasible and practical. Doing so can increase the validity of the data, a subject addressed later in this chapter.

Data Collection Methods

Similar to the recommendation of tapping into multiple data sources when conducting an evaluation, we also recommend using multiple data collection methods when possible. Methods are how data are collected from source(s). Multiple sources and methods will help to ensure the designer does not have gaps in the data collected, thereby providing more valid information upon which to make decisions. Below are several data collection methods.

  1. Interviews. The interview is a commonly used method to gather feedback and input directly from key sources. To ensure consistency of approach, interview protocols may be developed to articulate questions to be posed to the interviewee. Some of the advantages of the interview method include directly hearing responses from the interviewee, the ability to observe body language, and the opportunity to pose follow-up questions to dig deeper into a response. It's helpful when interviews are recorded to aid with transcription and later analysis, but this has the potential downside of causing discomfort for the interviewee. Interviews yield qualitative data, sometimes voluminous amounts, which require skill and expertise to analyze effectively.
  2. Focus Groups. The focus group is similar to the interview because it is typically conducted in-person or through virtual technologies. Unlike an interview, often conducted with one individual, a focus group involves multiple people (8–12 is common). It's an effective way to gain multiple perspectives relatively quickly and on a wide range of topics or questions of interest. Having a skilled facilitator helps keep the conversation moving, ensures full participation, and avoids groupthink (when individuals, especially those with dissenting opinions, withhold their views to conform to the perceived will of the group, a will often falsely represented by whoever talks the most or the loudest). A skilled facilitator will be on the watch for groupthink and other dysfunctional behaviors and take actions to prevent or deal effectively with them. Sometimes a “scribe” is used to capture key points and responses from the focus group, or it may be recorded and later transcribed.
  3. Observation. Observing someone in their work environment is a labor-intensive data collection method whereby a trained observer watches learners or performers at work to determine if behaviors, performance, interactions, or other variables have changed or improved. Obtrusive observations, where the observer is present and visible to the performer or subject, provide a direct and first-hand means by which to gather data and ask questions during the observation. In unobtrusive observation, the observer's presence is not known or not highly visible to the research subject, creating a more natural environment with less likelihood of the Hawthorne Effect (behavior changing simply because extra attention is being paid). Quantifying observational data is sometimes aided with tools like checklists, flow charting tools, or frequency counts.
  4. Surveys or Questionnaires. These are data collection methods that can yield quantitative and qualitative responses depending on the intended outcomes and research design used. Questions with rating scales yield numeric data that can be analyzed, while open-ended questions allow the respondent to provide narrative comments. Issues of validity and reliability make it important to possess, or have access to, expertise in survey design. In the past, surveys were mainly conducted using paper and pencil, but today many software packages and online tools such as Survey Monkey, Zoomerang, and Poll Everywhere make the mass collection of data more efficient. Many even come with basic analysis and reporting tools, and also provide raw data that can be imported into more sophisticated data analytic software such as SPSS or Excel (a brief sketch of such an analysis appears after this list). An example of a survey that contains qualitative and quantitative questions is shown in Exhibit 14.1.
  5. Tests. Tests can include knowledge tests used to assess acquisition of information, such as the essential elements of a company's balance sheet or detailed knowledge of the organization's products. Skill tests are used to assess whether behaviors, such as conducting an effective performance review, have improved. Sometimes evaluators use pretests to determine knowledge or skills prior to the learning intervention to establish a baseline. Post-tests are used following learning to measure the acquisition of knowledge or skill, ideally compared to performance on the pretest. Criterion-referenced tests (CRT) establish objective standards of performance, typically clarified to learners so they can visualize success and attain it through practice, feedback, and eventual mastery. Norm-referenced tests (NRT) compare performance against others who have taken the test rather than against an established standard of performance. Similar to surveys, constructing tests and writing effective questions is a specialized skill, and deep expertise is important to ensure validity and reliability.
  6. Extant Data Review. It's helpful to leverage organizational data that already exists and use it for evaluation purposes. Sometimes extant data is easy to obtain, such as when organizational records are well-kept and easily accessible in a weekly sales report or quarterly review of customer survey results. In other cases, the evaluator may need to comb through paper files to pull together and make sense of important but poorly maintained and sometimes inaccurate data. There are often numerous sources of extant data that could be leveraged for evaluation. The challenge becomes narrowing the scope to the data that will be most valuable to the key stakeholders, data that's believed to be accurate and valid, and data that's easy to access and analyze.
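
To illustrate the survey analysis mentioned in item 4, the following is a minimal sketch, in Python, of summarizing 5-point rating-scale responses once they have been exported from a survey tool. The ratings and field names are hypothetical rather than drawn from an actual instrument, and open-ended comments would be handled separately through qualitative analysis.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 5-point ratings exported from a survey tool
# (1 = strongly disagree ... 5 = strongly agree).
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 4, 5]

summary = {
    "n": len(ratings),
    "mean": round(mean(ratings), 2),
    "median": median(ratings),
    # Percentage of respondents choosing 4 or 5 (a "top-two-box" score).
    "top_two_box_pct": round(100 * sum(r >= 4 for r in ratings) / len(ratings), 1),
    "distribution": dict(sorted(Counter(ratings).items())),
}

print(summary)
```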

Formative Evaluation

Evaluation that happens at the end of an intervention is summative evaluation. However, the intervention should also be evaluated throughout the design and development process. This evaluation is called formative evaluation and it helps to pinpoint adjustments that must be made during the design process, so desired results are more likely to be achieved.

Assumptions about Formative Evaluation

Instructional designers make three fundamental assumptions when evaluating instructional materials and methods. First, they view evaluation as primarily a formative process. This assumption rests on the belief that instructional materials and methods should be evaluated—and revised—prior to widespread use to increase their instructional effectiveness. In this way, it is hoped that learner confusion will be minimized.

Second, instructional designers assume that evaluation means placing value on something. Evaluation is not objective and empirical; rather, it rests heavily on human judgment and human decisions. Human judgment reflects the individual values of instructional designers and the groups they serve.

Third, instructional designers expect to collect and analyze data as part of the evaluation process. To determine how well instructional materials and methods work, instructional designers must try them out. It is then possible, based on actual experience with learners, to make useful revisions to the materials.

Defining Terms Associated with Formative Evaluation

Before undertaking a formative evaluation, instructional designers should take the time to familiarize themselves with at least two key terms: formative product evaluation and formative process evaluation. However, instructional designers should also minimize the use of this special terminology. Operating managers or clients will only be confused or turned off by it.

  1. Formative Product Evaluation. The term formative product evaluation means appraising instructional materials during preparation. Its key purposes are to provide instructional designers with descriptive and judgmental information about the value of instruction. Descriptive information outlines the value of instructional components. Judgmental information assesses how much learning results from the instructional materials when used with learners and places a value on those results.
  2. Formative Process Evaluation. Formative process evaluation is related to formative product evaluation and means the appraisal of instructional methods, including how planned learning experiences are delivered or facilitated. Like product evaluation, it provides both descriptive and judgmental information about planned learning experiences.

Developing a Formative Evaluation Plan

Instructional designers should develop a formative evaluation plan that focuses attention on the instructional materials. There are seven steps in developing a formative evaluation plan. We will describe them in the following sections.

Step 1: Determining Purpose, Objectives, Audience, and Subject

The first step of formative evaluation is to determine the purpose, objectives, audience, and subject. Answer the question, why is this evaluation being conducted? How much is the focus solely on the quality of the instructional materials or methods, and how much is it on other issues, such as the following (Kirkpatrick 1996):

  • How much will the targeted learners enjoy the instructional materials, content, or methods?
  • How much will the participants learn?
  • How much impact will the learning experience have on the participants' job performance?
  • How much impact will the planned learning experience have on the organization?

As part of the first step, clarify the desired results of the formative evaluation. For each purpose identified, establish measurable objectives for the evaluation. In this way, instructional designers help themselves and others assess the results against what was intended.

In addition, consider who wants the evaluation and why. Is it being conducted primarily for the benefit of instructional designers, senior leaders, key decision makers, immediate supervisors of the targeted learners, or some combination of all these groups? Always clarify who will review the results of the formative evaluation and what information they need from it. This will help to identify what to evaluate and how to present the findings.

Identify who will participate in the formative evaluation. Will the evaluation be focused on representative targeted learners only, or will it also focus on learners with special needs or low abilities? Subject-matter specialists? Representatives of the supervisors of targeted trainees? Their managers? Senior leaders? There are reasons to target formative evaluation to each group of subjects, depending on the purpose and objectives of the evaluation.

Step 2: Assessing Information Needs

The second step in conducting formative evaluation is to assess the information needs of the targeted audiences. Precisely what information is sought from the results of the formative evaluation? Usually, the targeted audiences will provide important clues about information needs:

  • Instructional designers will usually be interested in how they can revise instructional materials or delivery methods to make them more effective for learners.
  • Key decision makers will usually be interested in how well the materials meet previously identified instructional needs and solve human performance problems. They may also want to assess how much and what kind of financial or managerial support is necessary to ensure instructional success or on-the-job application of what was learned.
  • Immediate supervisors of targeted learners will usually be interested in familiarizing themselves with the instructional content so they can hold learners accountable on their jobs for applying what they learned.
  • Representatives of the targeted learners may be interested in how easy or difficult the instructional materials are and how test results will be used. In addition, consider the extent to which each group might be interested in determining how well instructional materials and methods convey the content, allow participants to apply what they learn, measure accomplishment, and demonstrate learner achievement of performance objectives.

Step 3: Considering Proper Protocol

The third step in conducting a formative evaluation is to consider proper protocol. Several questions about the protocol of conducting formative evaluation should be answered:

  • How much do the targeted audiences expect to be consulted about a formative evaluation before, during, and after it is conducted?
  • What permissions are necessary to carry out the study?
  • Whose permissions are necessary?
  • What formal or informal steps are necessary to secure the permissions to conduct a formative evaluation, select subjects, collect data, and feed back results?

Protocol is affected by five key factors: (1) the decision makers' experience with formative evaluation, (2) labels, (3) timing, (4) participation, and (5) method of evaluation.

The decision makers' experience with formative evaluation is the first factor influencing protocol. If the decision makers have had no experience with formative evaluation, instructional designers should take special care to lay the foundation for it by describing to the key stakeholders what it is and why it is necessary. If decision makers have had experience with formative evaluation, determine what mistakes (if any) were made in previous evaluative efforts so repeating them can be avoided. Common mistakes may include forgetting to secure the necessary permissions, forgetting to feed back information about evaluation results to decision makers, and forgetting to use the results in a visible way to demonstrate that the evaluation was worth the time and effort.

Labels are a second factor affecting protocol. Avoid using the imposing term “formative evaluation” with anyone other than instructional designers, since it may only create confusion. Try more descriptive labels such as walkthroughs, rehearsals, tryouts, or executive previews.

Timing is a third factor affecting protocol. Is it better to conduct a formative evaluation at certain times in the month or year than at other times, due to predictable work cycles or work schedules? Make sure that formative evaluations will not be carried out when they conflict with peak workloads or other events, like a company board meeting or an earnings call, which may make it difficult for key stakeholders to approve or participate.

The participation of key stakeholders is a fourth factor affecting protocol. How essential is it to obtain permission from a few key individuals before conducting a formative evaluation? If essential, who are they? How is their permission secured? How much time should be allowed for obtaining the permissions?

The method of evaluation is the fifth and final factor affecting protocol. Given the organization's culture, should some instruments, methods of data collection, or analysis be used instead of others?

Instructional designers should never underestimate the importance of protocol. If protocol is forgotten, instructional designers can lose support for the instructional effort before it begins. Remember that any instructional experience is a change effort, and formative evaluation, like needs assessment, offers a valuable opportunity to build support for change. But if proper protocol is violated, it can work against success: the audiences will focus attention on the violation, not on the instructional materials or methods.

Step 4: Describing the Population to Be Studied and Selecting the Subjects

The fourth step in conducting formative evaluation is to describe the population for study and to select participants. Always describe from the outset the population to be studied. Usually, instructional materials or methods should be tried out with a sample, typically chosen at random, from the targeted group of learners. But take care to precisely clarify the learners with whom the materials will be used. Should participants in formative evaluation be chosen on the basis of any specialized information such as situation-related characteristics, decision-related characteristics, or learner-related characteristics?

Sometimes, it may be appropriate to try out instructional materials or methods with such specialized populations as exemplars (the top performers), veterans (the most experienced), problem performers (the lowest performers), novices (the least experienced), high-potential workers (those with great, but as yet unrealized, performance capabilities), or disabled workers. Formative evaluations conducted with each group will yield specialized information about how to adapt instructional materials to account for unique needs rather than taking a one-size-fits-all approach.

Once the learners have been identified, select a random sample. Use automated human resource information systems for that chore. If a specialized population is sought for the study, other methods of selecting a sample may be substituted. These could include announcements to employees or supervisors, word-of-mouth contact with supervisors, or appeals to unique representatives. If specialized methods of selecting participants for formative evaluation are used, consider the protocol involved in contacting possible participants, gaining their cooperation, securing permission from their immediate supervisors or union representatives, and getting approval for any time off the job that may be necessary.
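
As a minimal sketch of the random selection step, assuming the targeted population has already been exported from a human resource information system as a list of employee identifiers (the identifiers and sample size below are hypothetical), the draw might look like this in Python:

```python
import random

# Hypothetical roster of targeted learners exported from an HRIS.
targeted_learners = [f"EMP{num:04d}" for num in range(1, 201)]  # 200 employees

# A fixed seed lets the selection be reproduced and documented later.
random.seed(14)
pilot_group = random.sample(targeted_learners, k=12)  # 12 tryout participants

print(pilot_group)
```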

Step 5: Identifying Other Variables of Importance

The fifth step in conducting a formative evaluation is to identify other variables of importance. Ask these questions to identify the variables:

  1. What settings should be used for the formative evaluation?
  2. What program issues are particularly worth pretesting before widespread delivery of instruction?
  3. Should the formative evaluation focus solely on instructional issues, or should it also address (and to what extent) other important but noninstructional issues such as equipment needs, staff needs, financial resources required, facility requirements, and noninstructional needs of participants?
  4. What positive but postinstructional outcomes of the planned learning experience can be anticipated? What negative postinstructional outcomes can be anticipated?
  5. What estimates should be made about expected costs of the instructional program?
  6. How accurate are the prerequisites previously identified?

Step 6: Formulating a Study Design

The sixth step in conducting a formative evaluation is to create an evaluation design. The central question is this: How should the formative evaluation be conducted?

An evaluation design is comparable, in many respects, to a research design (Campbell and Stanley 1966), except that its purpose is to judge instructional materials and methods rather than make new discoveries. An evaluation design is the “plan of attack”—the approach to be used in carrying out the evaluation. In formulating a design, be sure to (1) define key terms; (2) clarify the purpose and objectives of the evaluation; (3) provide a logical structure or series of procedures for assessing instructional materials and methods; (4) identify the evaluation's methodologies, such as surveys, trial runs or rehearsals, and interviews; (5) identify populations to be studied and means by which representative subjects will be selected; and (6) summarize key standards by which the instructional materials and methods will be judged.

Step 7: Formulating a Management Plan to Guide the Study

The seventh and final step in conducting a formative evaluation is to formulate a management plan, a detailed schedule of procedures, events, and tasks to be completed to implement the evaluation design. A management plan should specify due dates and descriptions of the tangible products resulting from the evaluation. It should also clarify how information will be collected, analyzed, and interpreted in the evaluation.

The importance of a management plan should be obvious. When a team is conducting a formative evaluation, the efforts of team members must be coordinated. A management plan helps avoid the frustration that results when team members are unsure of what must be done, who will perform each step, and where and when the steps will be performed.

There are two ways to establish a management plan. One way is to prepare a complete list of the tasks to be performed, preferably in the sequence in which they are to be performed. This list should be complete and detailed, since this task-by-task management plan becomes the basis for dividing up the work of instructional designers, establishing timetables and deadlines, holding staff members accountable for their segments of project work, and (later) assessing individual and team effort.

A second way is to describe the final work product of the project and the final conditions existing on project completion. What should the final project report contain? Who will read it? What will happen because of it? How much and what kind of support will exist in the organization to facilitate the successful introduction of the solution? Ask team members to explore these and similar questions before the formative evaluation plan is finalized, using their answers to organize the steps to achieve the final results.

Four Major Approaches to Conducting Formative Evaluation

Although there are many ways to conduct formative evaluation (Bachman 1987; Chernick 1992; Chinien and Boutin 1994; Dick and King 1994; Gillies 1991; Heideman 1993; Russell and Blake 1988; Tessmer 1994; Thiagarajan 1991), four major approaches will be discussed here. Each has its own unique advantages and disadvantages. These approaches may be used separately or in combination. These include:

  1. Expert reviews.
  2. Management or executive rehearsals.
  3. Individualized pretests and pilot tests.
  4. Group pretests and pilot tests.

We will describe each approach briefly.

Expert Reviews

There are two kinds of expert reviews: (1) those focusing on the content of instruction and (2) those focusing on delivery methods. Most instructional designers associate expert reviews with content evaluation. Expert reviews focusing on content are, by definition, conducted by subject-matter experts (SMEs), individuals whose education or experience regarding the instructional content cannot be disputed. Expert reviews ensure that the instructional package, often prepared by instructional design experts (IDEs) who may not be versed in the specialized subject, follows current or desired work methods or state-of-the-art thinking on the subject.

A key advantage of the expert review is that it ensures that materials are current, accurate, and credible. Expert reviews may be difficult and expensive to conduct if “experts” on the subject cannot be readily located or accessed.

Begin an expert review by identifying experts from inside or outside the organization. Do that by accessing automated human resource information systems (skill inventories) if available, contacting key management personnel, or conducting surveys. Identify experts outside the organization by asking colleagues, accessing automated sources such as the Association for Talent Development's Membership Information Service, or compiling a bibliography of recent printed works on the subject and then contacting authors.

Once the experts have been identified, prepare a list of specific, open-ended questions for them to address about the instructional materials. Prepare a checklist in advance to ensure that all questions you want answers to are considered and answered thoroughly. See Exhibit 14.2.

Expert reviews are rarely conducted in group settings; rather, each expert prepares an independent review. The results are then compiled and used by instructional designers to revise instructional materials. Expert reviews that focus on delivery methods are sometimes more difficult to conduct than expert reviews focusing on content. The reason: experts on delivery methods are not that easy to find. One good approach is to ask “fresh” instructional designers, those who have not previously worked on the project, to review instructional materials for the delivery methods used.

For each problematic issue the reviewers identify, ask them to note its location in the instructional materials and suggest revisions. Another good approach is to ask experienced instructors or tutors to review an instructional package. If the package is designed for group-paced, instructor-led delivery, offer a dress rehearsal and invite experienced instructors to evaluate it. If the package is designed for individualized, learner-paced delivery, ask an experienced tutor to try out the material.

Management or Executive Rehearsals

Management or executive rehearsals differ from expert reviews. They build support by involving key stakeholders in the preparation and review of instructional materials prior to widespread delivery. In a management rehearsal, an experienced instructor describes to supervisors and managers of the targeted learners what content is covered by the instructional materials and how they are to be delivered. No attempt is made to “train” the participants in the rehearsal; rather, the focus is on familiarizing them with its contents so they can provide support to and hold their employees accountable for on-the-job application.

To conduct a management or executive rehearsal, begin by identifying and inviting key managers to a briefing of the materials. Some instructional designers prefer to limit invitations to job categories, such as top managers or middle managers. Others prefer to offer several rehearsals with various participants.

Prepare a special agenda for the rehearsal. Make it a point to cover at least the following eight aspects: (1) the purpose of the instructional materials; (2) the performance objectives; (3) the business needs, human performance problems, challenges, or issues addressed by the instruction; (4) a description of targeted learners; (5) evidence of need; (6) an overview of the instructional materials; (7) steps taken so far to improve the instruction; and (8) steps that members of this audience can take to encourage application of the learning in the workplace.

Individualized Pretests and Pilot Tests

Individualized pretests, conducted onsite or offsite, are another approach to formative evaluation. Frequently recommended as a starting point for trying out and improving draft instructional materials, they focus on learners' responses to instructional materials and methods, rather than those of experts or managers. Most appropriate for individualized instructional materials, they are useful because they yield valuable information about how well the materials will work with the targeted learners. However, pretests and pilot tests have their drawbacks: they can be time consuming, they require learners to take time away from work, and they may pose difficulties for supervisors and co-workers in today's lean-staffed, right-sized organizations.

Individualized pretests are intensive “tryouts” of instructional materials by one learner. They are conducted to find out just how well one participant fares with the instructional materials. A pretest is usually held in a nonthreatening or off-the-job environment, such as in a corporate training classroom or learning center. Instructional designers should meet with one person chosen randomly from a sample of the target population. Begin the session by explaining that the purpose of the pretest is not to “train” or evaluate the participant but, instead, to test the material. Then deliver the material one-on-one. Each time the participant encounters difficulty, encourage the person to stop and point it out. Note these instances for future revision. Typically, instructional designers should direct their attention to the following three issues: (1) How much does the participant like the material? (2) How much does the participant learn (as measured by tests)? (3) What concerns does the participant express about applying what he or she has learned on the job? Use the notes from this pretest to revise the instructional materials.

The individualized pilot test is another approach to formative evaluation. It is usually conducted after the pretest, and focuses on participants' reactions to instructional materials in a setting comparable to that in which the instruction is to be delivered. Like pretests, pilot tests provide instructional designers with valuable information about how well the instructional materials work with representatives from the group of targeted learners. However, their drawbacks are similar to those for pretests: they can be time consuming, and they require learners to take time away from work.

Conduct a pilot test in a field setting, one resembling the environment in which the instructional materials are used. Proceed exactly as for a pretest with the following six steps: (1) select one person at random from a sample of the target population; (2) begin by explaining that the purpose of the pilot test is not to train or evaluate the participant but to test the material; (3) progress through the material with the participant in a one-to-one delivery method; (4) note each instance in which the participant encounters difficulty with the material; (5) focus attention on how much the participant likes the material, how much the participant learns as measured by tests, and what concerns the participant raises about applying on the job what he or she has learned; and (6) use the notes from the pilot test to revise instructional materials prior to widespread use.

Group Pretests and Pilot Tests

Group pretests resemble individualized pretests but are used to try out group-paced, instructor-led instructional materials. Their purpose is to find out just how well a randomly selected group of participants from the targeted learner group fares with the instructional materials. Held in an off-the-job environment, such as in a corporate training classroom or learning center, the group pretest is handled precisely the same way as an individualized pretest.

A group pilot test resembles an individualized pilot test but is delivered to a group of learners from the targeted audience, not to one person at a time. Typically the next step following a group pretest, it focuses on participants' reactions to instructional materials in a field setting, just like its individualized counterpart. Administer attitude surveys to the learners about the experience, and use written or computerized assessments or demonstration tests to measure learning. Realize in this process that a relationship exists between attitudes about instruction and subsequent on-the-job application (Dixon 1990).

Using Approaches to Formative Evaluation

Each approach to formative evaluation is appropriate under certain conditions. Use an expert review to double-check the instructional content and the recommended delivery methods. Use a management or executive rehearsal to build support for instruction, familiarize key stakeholders with its contents, and establish a basis for holding learners accountable on the job for what they learned off the job. Use individualized pretests and pilot tests to gain experience with, and improve, individualized instructional materials prior to widespread delivery; use group pretests and pilot tests to serve the same purpose in group-paced, instructor-led learning experiences.

Providing Feedback from Formative Evaluation

One final issue to consider when conducting formative evaluation is how to provide feedback to key stakeholders about the study and its results. The shorter the report, the better. One good format is to prepare a formal report with an attached, and much shorter, one- to two-page executive summary to make it easier for the reader.

The report should usually describe the study's purpose, key objectives, limitations, and any special issues. It should also detail the study methodology (including methods of sample selection) and instruments prepared and used during the study, and should summarize the results. Include copies of the instructional materials reviewed, or at least summaries. Then describe the study's results, including descriptions of how well learners liked the material, how much they learned as measured by tests, what barriers to on-the-job application of the instruction they identified, and what revisions will be made to the materials.

Formative product evaluation results are rarely presented to management, since their primary purpose is to guide instructional designers in improving instructional materials. However, instructional designers can feed back the results of formative evaluation to management as a way of encouraging management to hold employees accountable on the job for what they learned.

Summative Evaluations

Summative evaluation involves gathering information about a learning intervention after it has been deployed. It helps the instructional designer and other key decision makers identify what worked and what didn't work, determine value, and report on the difference made because of the solution. Besides identifying improvements, summative evaluation also helps to determine next steps such as accelerating the deployment to reach learners more quickly, expanding deployment to reach more learners, and sometimes discontinuing the intervention if results are deemed insufficient relative to the costs.

Kirkpatrick's Four Levels

Beginning in 1959, Donald Kirkpatrick introduced his now famous “Four Levels” framework (Kirkpatrick 1959, 1960). Still today, this is the most widely used framework for thinking about and conducting learning evaluation in organizations. Level 1 focuses on learner satisfaction, level 2 evaluates acquisition of new knowledge or skill, level 3 examines learning transfer from the classroom to the workplace, and finally, level 4 determines the impact of the intervention on organizational or business outcomes. With each successive “level” of evaluation, starting with level 1, the focus moves from the individual to the organizational impact of the intervention. Each level yields different insights and is important for various reasons. Rigor, resource intensity, sophistication, and expense also increase with each successive level. The frequency with which the levels are employed within organizations decreases as you ascend from level 1 through level 4. Various stakeholders place greater or lesser importance on the different levels. The time at which each level of evaluation is used also differs. Levels 1 and 2 occur during or immediately after the intervention is complete, whereas level 3 and level 4 evaluations can be conducted days, months, or even years after the intervention. This section will detail each of the four levels and some of the key applications and considerations to apply them effectively.

  1. Level 1. This level of evaluation attempts to measure learner satisfaction (“did they like it?”). This form of evaluation is used most frequently in organizations. The most common means of gathering level 1 data is via a post-delivery survey, often and somewhat sarcastically referred to as a “smile sheet.” The nickname is not meant to minimize the value of level 1 evaluations; they are often the lifeblood of professional facilitators. Many instructors can barely wait to receive their evaluation results after delivery is complete, because the results provide almost instantaneous feedback about how well (or poorly) they did during the session. Facilitators can gain valuable insights into their performances, and training managers can hold facilitators accountable through level 1 feedback. Facilitator effectiveness is only one of several dimensions that can be assessed through this form of evaluation. Feedback on facilities and room set-up, preprogram communications and logistics, and food quality and service provides important insights for training managers and operational staff. Instructional designers are highly interested in level 1 data as it relates to attainment of instructional/performance objectives, participant materials, activities and exercises, content, length, and flow. Data on all of these can be collected via a level 1 evaluation.
  2. Level 1 evaluations are relatively easy to administer and yield many useful insights. In the past, administration was typically via a paper-based survey that learners completed at the end of the program. Today, paper-based evaluations may still be used, but in many organizations they have been replaced by online surveys using commonly available survey software applications such as Zoomerang or Survey Monkey, or more powerful integrated software platforms like Metrics That Matter (www.knowledgeadvisors.com).
  3. A recent development in level 1 evaluation is to incorporate the concept of Net Promoter Score (NPS) (see www.netpromoter.com/why-net-promoter/know). NPS is a marketing and customer loyalty concept pioneered by Fred Reichheld (2006), who states that NPS “is based on the fundamental perspective that every company's customers can be divided into three categories: Promoters are loyal enthusiasts who keep buying from a company and urge their friends to do the same. Passives are satisfied but unenthusiastic customers who can be easily wooed by the competition. Detractors are unhappy customers trapped in a bad relationship” (19). A single question asks about the respondent's willingness to refer or recommend the company, product, or service to a colleague or friend. Based on the number of respondents who fall into each category, an NPS score is calculated and can be tracked over time (a brief sketch of the calculation appears after this list). Besides the primary quantitative question, respondents are also asked to provide qualitative feedback stating the reason for the score and what the company can do to improve. Mattox (2013) has conducted research and introduced strategies and approaches for incorporating NPS into learning evaluation. While NPS is growing in popularity, it is not without its critics and skeptics (Thalheimer 2013).
  4. Despite their widespread use and obvious benefits, level 1 evaluations have limitations. Level 1 feedback is highly subjective because it reports how the learners felt about the experience. Learner satisfaction, like customer satisfaction, is of critical importance, but a long-running fable in the learning field tells of a study that found a stronger correlation between learner satisfaction and the size of the donuts served the morning of the program than with any other variable: the larger the donuts, the higher the level 1 evaluation scores! (See Exhibit 14.3 for a sample level 1 evaluation.)
  5. Level 2. The purpose of Level 2 evaluations is to ascertain knowledge or skill acquisition (“did they learn it?”). While level 1 evaluation is used the most frequently, level 2 evaluation is also common. Testing is the most common form of level 2 evaluation. Knowledge tests attempt to evaluate a learner's increase in awareness or understanding of a new concept, idea, or information. They are often used at the end of a learning program, but can also be embedded throughout. Sometimes a pretest is administered to determine baseline knowledge before learning begins. A multiple choice test at the end of a Know Your Security online compliance training program is an example of a knowledge test to determine a person's understanding of the role they play in protecting their organization from threats such as theft of intellectual property. Tests can include various types of questions such as multiple choice, true/false, matching, fill-in-the-blank, or written and can be paper-based or electronic. There is typically a “correct” answer against which learners are graded.
  6. A skill-oriented test attempts to measure the acquisition of, or proficiency in, a behavior. While a skill typically has one or more knowledge components, a skill test focuses on demonstrating an observable skill. A driver's test is not only a rite of passage for most teenagers, it's an example of a skill test. Observation, discussed earlier, is often used with this form of evaluation so evaluators can watch for predetermined expected behaviors (braking, maneuvering, accelerating, hand position, parallel parking) and how well or poorly they are exhibited. Testing to obtain a driver's license is a good example of combining tests: there's a knowledge test to demonstrate an understanding of the “rules of the road” and an in-vehicle skill test to demonstrate essential driving abilities.
  7. In organizations, testing is used in the learning process to certify individuals as capable of performing. Capability to perform and actual on-the-job performance are two different things, a distinction covered shortly. Organizations often use testing during and after learning interventions for employees who will enter new roles. Sometimes certification in a job role is required, and even when it is not mandatory, the intent is to ensure a worker is fully prepared as they onboard and perform in their new role. In many organizations testing is used for compliance, sometimes required by government regulators. The validity of the tests used and the maintenance of records are essential to prove compliance and avoid potentially heavy penalties or worse. Organizations like Questionmark (www.questionmark.com) and Knowledge Advisors (www.knowledgeadvisors.com), through their Metrics That Matter platform, provide sophisticated technology-based assessment and testing databases, tools, and mobile applications. (See Exhibit 14.4 for a sample level 2 evaluation.)
  8. Level 3. While levels 1 and 2 primarily occur during or immediately following learning, level 3 in Kirkpatrick's framework occurs post-program and attempts to provide insight into how learning is applied on the job, answering the question “did they apply it?” It helps to measure the transfer of learning that occurred between the learning event and some future point in time. The actual timeframe for conducting a level 3 evaluation can vary depending on several factors, including the knowledge or skill involved, the opportunities afforded the learner to apply what they learned, the motivation to apply what was learned, and the level of interest and urgency key decision makers have in seeing the results of the level 3 evaluation.
  9. Level 3 evaluation can use several of the data collection methods reviewed earlier in this chapter including interviews, surveys, and observations. The learners themselves can self-report, through a survey, the extent to which they've applied the newly acquired knowledge or skill. Likewise, the direct manager, peers, and even customers who interact with learners can provide level 3 feedback on what they've seen the learner apply. Direct observation, by a training professional, manager, or peer, is another means by which to evaluate transfer of learning. Consider the short vignette below.

    As a new call center representative, Leslie went through three weeks of intensive training to learn both the service process of her new company, 800 Service Direct, and “soft” skills such as dealing with an irate customer, active listening, and responding with empathy. After the training, Leslie fielded calls on her own. Her supervisor, Marco, used a common call center technology known as “double jacking,” whereby he could listen in on her calls. Using an observation checklist, Marco took notes during the phone calls, and after each call he and Leslie reviewed and discussed what she did as well as how she could improve. Over time, Leslie's ability to handle calls, even the proverbial “curve balls,” increased to a point where she was fully proficient and able to handle calls entirely on her own.

  10. This vignette illustrates a direct observation (using a standardized checklist) approach to evaluating the transfer of learning, but it goes a step beyond. It not only evaluates the degree to which Leslie applied what she learned during her three weeks of training, it also served as a coaching tool, supporting an iterative process through which Marco could assist her in moving from a lower level of competence to full proficiency. This is an example of a level 3 evaluation conducted relatively quickly after the initial training because there was a need to have Leslie fully performing in her new role quickly. The “costs” associated with incompetence were too great, up to and including the potential loss of valued customers. Further, this learning lends itself well to almost real-time level 3 evaluation because the skills and knowledge in this case are an essential, if not complete, part of job performance. Other forms of learning, like a workshop on Creativity and Innovation, may lend themselves to level 3 evaluation that occurs many months following the learning event itself.
  11. Level 4. Also occurring postlearning, level 4 evaluation is the least frequently used level among the four, due to the time, cost, and resources typically involved and the expertise needed to do it effectively. Level 4 evaluation attempts to measure the impact of the learning (“Did it make a difference?”). This level goes beyond applying learning to determine the results or impact of that application. In the call center scenario described above, level 4 metrics might include things like customer retention, reduced call time, or increased sales. Level 4 is about organizational or business metrics that relate to quality, productivity, efficiency, sales, profitability, cycle time, turnover, performance, and other, typically quantifiable, measures.
  12. It's difficult to conduct a level 4 evaluation when key metrics and baseline measurements are not identified and integrated up-front, during the needs assessment phase of the instructional design process. Designing and deploying a Building High Performance Teams program and then, after the fact, saying “Let's do a level 4 evaluation on it” is a foolhardy endeavor for the professional instructional designer. It is better to plan for, design, and set the stage for level 4 evaluation from the beginning of the project, by asking the customer questions about the quantifiable results that are desired but not being achieved.
  13. One of the issues that may arise when conducting level 4 evaluations is isolating the impact the learning had on the level 4 metrics compared to the impact of many other potential variables (the economy, the job market, a new organizational strategy, a change of leadership, new technology, or better tools and equipment). It is easy for others to poke holes in level 4 evaluation studies by pointing to a myriad of factors, other than or besides learning, that could have influenced the results. The wise instructional designer will engage closely with others, such as business leaders, finance experts, and other subject experts, when embarking on this evaluation endeavor because these individuals are well versed in, and actually “own,” the organizational metrics under examination.
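
Returning to the Net Promoter Score discussion under level 1, the calculation itself is simple to sketch. The following Python snippet assumes the standard 0–10 recommendation scale (9–10 promoters, 7–8 passives, 0–6 detractors) and uses hypothetical responses; it illustrates the arithmetic rather than prescribing an implementation.

```python
def net_promoter_score(responses):
    """Compute NPS from 0-10 'would you recommend?' ratings.

    Promoters score 9-10, passives 7-8, detractors 0-6;
    NPS = %promoters - %detractors, a value between -100 and +100.
    """
    n = len(responses)
    promoters = sum(r >= 9 for r in responses)
    detractors = sum(r <= 6 for r in responses)
    return round(100 * (promoters - detractors) / n)

# Hypothetical post-program responses from 20 learners.
responses = [10, 9, 9, 8, 8, 7, 10, 9, 6, 5, 9, 10, 8, 7, 9, 4, 10, 8, 9, 7]
print(net_promoter_score(responses))  # 35 (50% promoters - 15% detractors)
```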

Alternatives to Kirkpatrick

Despite its widespread adoption in organizations, the Kirkpatrick model is not without its critics. Some have lamented that it was introduced nearly five decades ago and has changed little since that time. Others suggest that it is too linear or that it overemphasizes training when multiple interventions may be used and needed in a performance improvement situation. Some have gone beyond merely being critical of Kirkpatrick and have proposed enhancements or alternative approaches; two of these, Phillips's (2011) ROI Model and Brinkerhoff's (2005) Success Case Method, are described below.

The Phillips ROI Model

The Phillips (2011) ROI model extends beyond Kirkpatrick's fourth level and adds a fifth level, which attempts to calculate the return-on-investment (ROI) of the intervention. ROI is a financial calculation that quantifies the financial value of the impact measures identified in level 4 (such as sales or productivity) relative to the costs of the intervention. The ROI formula is: ROI (%) = (Net Program Benefits / Program Costs) × 100, where net program benefits are the program's monetary benefits minus its costs.
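
As a minimal sketch with invented figures (the dollar amounts below are hypothetical, not drawn from an actual study), the calculation works as follows:

```python
def roi_percent(program_benefits, program_costs):
    """Phillips-style ROI: net benefits divided by costs, expressed as a percentage."""
    net_benefits = program_benefits - program_costs
    return 100 * net_benefits / program_costs

# Hypothetical example: level 4 impact measures valued at $220,000
# against program costs of $100,000.
print(roi_percent(220_000, 100_000))  # 120.0
```

An ROI of 120 percent in this example means that, after recovering its costs, the program returned an additional $1.20 for every dollar invested.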

Success Case Method

Robert Brinkerhoff (2010) proposed an alternative to Kirkpatrick's framework and called it the Success Case Method (SCM). This approach looks beyond traditional training interventions and also recognizes that many variables may be at play for performance and results. Brinkerhoff (2005) asserts that “Performance results can't be achieved by training alone; therefore training should not be the object of evaluation” (87).

The Success Case approach takes more of a holistic and systemic approach and suggests that the following questions be addressed:

  • How well is an organization using learning to improve performance?
  • What organizational processes/resources are in place to support performance improvement?
  • What needs to be improved?
  • What organizational barriers stand in the way of performance improvement? (Brinkerhoff 2005, 88)

Brinkerhoff's approach to evaluation “combines the ancient craft of storytelling with more current evaluation approaches of naturalistic inquiry and case study” (91). Given its more qualitative nature, data collection methods associated with the SCM may include interviews, observations, document review, and surveys.

As the name suggests, the Success Case Method attempts to identify individuals and/or groups who have applied their learning and who achieved positive organizational outcomes or results. Besides identifying successful examples, the SCM attempts to answer why they succeeded (what enabled success?). And while less glamorous, the method can also look at examples of unsuccessful applications or outcomes and then attempt to pinpoint the reasons for this lack of success (what were the barriers or challenges encountered?). Both dimensions can be useful in identifying stories that help to paint a picture and determine the worth or value of the intervention and how it can be improved going forward.

Reporting Evaluation Results

An evaluation report can take a variety of formats and is used to capture the results so they can be communicated to various stakeholders. The following are questions that can help the instructional designer think through the best format to use for the report:

  • Is the evaluation report being requested by someone, or is the designer initiating it? In some situations the instructional designer is asked to provide an evaluation report (a reactive situation). In other circumstances the designer assumes responsibility for creating and sharing a report (a proactive approach). When an evaluation report is requested, the designer solicits input from the requestor about what is of highest interest and importance, and this insight guides the focus of what is included in the report. In proactive cases, the designer must try to anticipate what would be most valuable and useful to the recipient and use that to guide the report's content. Even when being proactive, there is usually nothing preventing the designer from having a direct conversation with the potential stakeholder(s) to explain the intent and solicit information about what they would find valuable.
  • Who is the recipient of the evaluation report, and what is their interest in the results? Return to the list of possible stakeholders and the key questions these individuals may be interested in. It may be useful to separate primary from secondary stakeholders. Primary stakeholders are the main recipients of the evaluation report and will make important decisions, such as whether or not to continue funding. Though it can be difficult, an effective designer will “walk in the shoes” of the primary recipient and get well grounded in the information that person is most likely to be interested in, so it can be presented in a way that is easy to interpret and act upon. Secondary stakeholders are still important, but the level of rigor and customization of the report is typically less than that afforded to primary stakeholders.
  • What is my purpose in sharing this evaluation information? An evaluation report can serve many purposes, and the wise designer will be intentional in his or her approach to achieve the maximum impact. One purpose may be to inform the stakeholder about the intervention itself; the report raises awareness of work that may otherwise not be visible or widely recognized, serving as a marketing or communication tool. Another purpose is to describe the impact of the intervention and justify the investment. Some stakeholders may ask “What happened as a result of that program?” or “What return did we get out of that initiative?” Armed with an evaluation report, a designer is equipped to answer those questions if asked, or can even preempt them by sharing the report proactively with key stakeholders. A third purpose is to stimulate action. Actions might be internal to the learning organization, such as adjusting the design or recruiting new facilitators. External actions could include expanding the deployment of a program to a different business unit or globally, decreasing funding, or deciding whether to outsource facilitation.
  • What preference does the stakeholder have regarding how the evaluation results are presented? Evaluation results can be shared with stakeholders in a variety of ways. An evaluation report can be delivered with no opportunity for discussion or interaction with the stakeholder, or it may be sent as a preread and then discussed in an in-person or virtual meeting. Another approach is to present the evaluation results to the stakeholder using a presentation program such as PowerPoint or Prezi. Again, knowing the style, preferences, and expectations of the audience in advance helps to guide the manner in which the report is delivered.
  • What level of depth or detail does the stakeholder wish to receive? Similar to the manner in which the report is presented, another factor is the level of detail desired by the stakeholder. Just because a stakeholder is a senior executive does not automatically mean that the report should stay high level; some senior leaders want to understand the specifics or may have questions about the details. An effective designer will seek to understand the needs of the stakeholder and then select what is presented in the report based on this insight. As described in the next section, an executive summary pulls together the key elements of the evaluation into a succinct snapshot that may be sufficient for some audiences, while an appendix can contain data, details, or backup material.

Creating the Report

This section describes a more traditional or formal written format that an evaluation report might take, summarizing the contents and sequencing typically found in such reports (Torres et al. 2005). This framework can be expanded or condensed, and sections can be removed, depending on the audience and intent of the report. The framework can be used to create a stand-alone report, or the contents can be converted into other presentation formats such as PowerPoint as the basis for verbally communicating select information to key stakeholders. Below are the main elements of an evaluation report.

  • Title page: Includes the report title, date, the author who conducted the evaluation, and the recipient of the report.
  • Executive summary: Includes the overall purpose and objectives of the evaluation, methodology used, and primary findings and recommendations or next steps.
  • Introduction/overview: Includes the background and context of the initiative, target audience, goals and objectives, methodology used (evaluation approach and data collection methods and sources), and any limitations of the evaluation.
  • Findings and results: Includes level 1 (participant reaction and satisfaction), level 2 (learning), level 3 (behavior change or transfer of learning), level 4 (impact), and level 5 (return on investment) evaluation results, or uses other organizing methods to present the results objectively (e.g., key questions, data collected, themes discovered, etc.). Findings should balance positive and suboptimal results. Metrics, visuals (charts, graphs, tables), and qualitative themes should be included, along with any barriers and enablers to achieving the objectives.
  • Recommendations and next steps: Includes the implications of the findings and results and summarizes the conclusions that can be logically drawn. Once recommendations are presented, any next steps or actions that follow from these conclusions are shared.
  • Appendix: Additional details related to any of the above items, such as participant information, program materials, tests, surveys or other evaluation tools, and any other background information.
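
As a minimal sketch of how these elements might be assembled into a working draft, the example below builds a plain-text report skeleton from the sections listed above; the function name, the placeholder text, and the sample title, author, and recipient are invented for this illustration.

    # Build a plain-text evaluation report skeleton from the standard sections.
    SECTIONS = [
        "Executive Summary",
        "Introduction/Overview",
        "Findings and Results",
        "Recommendations and Next Steps",
        "Appendix",
    ]

    def build_skeleton(title, author, recipient):
        # Title-page details come first, followed by each section with a placeholder.
        lines = [title, f"Prepared by: {author}", f"Prepared for: {recipient}", ""]
        for section in SECTIONS:
            lines.append(section.upper())
            lines.append("[Content to be added]")
            lines.append("")
        return "\n".join(lines)

    # Hypothetical usage.
    print(build_skeleton("Sales Onboarding Program Evaluation", "L. Designer", "VP of Sales"))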

Disseminating the Report

Once the report is created, the next step is to distribute it to stakeholders in the way most likely to ensure that the information is reviewed, processed, and acted upon. The timing of the dissemination is one consideration. If the report is distributed too long after the intervention, its value and relevance may be diminished. Likewise, if it is sent out at a time when it competes with other priorities, such as the end or beginning of a business cycle, it may not get the attention it deserves from stakeholders. The method of dissemination, mentioned earlier, is ideally matched to the needs and desires of the key recipient(s).

Sometimes using multiple methods addresses the varied needs of the audience and reinforces the key messages being conveyed. For example, a short preread could be sent prior to an in-person meeting that incorporates a presentation and discussion, which is then followed by a full report. Media refers to the vehicles used to distribute the report. A traditional approach is to use word-processing software such as Microsoft Word to create a print-based report that can be printed or distributed electronically. Other software, such as PowerPoint, can be used to create an evaluation report that incorporates graphics and animation. More sophisticated web-based tools support a multimedia approach that lets the recipient engage with the material in an interactive and dynamic manner rather than in a one-directional way.
