Chapter Thirteen
Designing Learning Assessments

According to Instructional Design Competencies: The Standards (2013), instructional designers should be able to “design learning assessment (advanced). This is an advanced competency and it includes three performance statements, two are essential and one is advanced” (Koszalka, Russ-Eft, and Reiser 2013, 54). The performance statements are: “(a) identify the learning processes and outcomes to be measured (essential); (b) construct reliable and valid methods of assessing learning and performance (advanced); (c) ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies (essential)” (Koszalka, Russ-Eft, and Reiser 2013, 54).

This chapter provides additional information about this competency. It offers advice on how to identify the learning processes and outcomes to be measured, construct reliable and valid methods of assessing learning and performance, and ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies.

Introduction

Instructional designers should usually develop performance measurements during or immediately following the preparation of performance objectives. Measurements of all kinds—sometimes called metrics—have been commanding attention in recent years. One reason has been growing demand by stockholders and stakeholders for accountability from all organizational levels. Another reason is that instructional designers are being held accountable for showing results for whatever training investments are made by their organizations.

What Are Performance Measurements?

Performance measurements are various means established by instructional designers for monitoring learner achievement. Paper-and-pencil tests are perhaps the most common. Test items may be developed directly from performance objectives before instructional materials are prepared. In this way, accountability for results is built into instruction early on.

However, paper-and-pencil testing is not the only way to assess learner achievement. Other methods may also be used. For instance, trainees can be observed on the job as they perform the tasks they have learned. Computerized skills assessment is also becoming common, as is portfolio analysis, in which work samples are assessed.

Why Are Performance Measurements Important?

Performance measurements become benchmarks that, with performance objectives (discussed in the previous chapter), provide guidance to prepare instructional programs. They help answer an age-old question about every instructional experience: “What should be taught?” (Egan 1978, 72). They are important for three major reasons. First, they ensure economical choice of instructional content. Establishing performance measurements is part of the preliminary work to be completed before instructional materials are developed; it helps identify the content that should be included and the success level expected of learners upon completion of instruction. Second, performance measurements provide a basis for learner accountability to ensure that learner progress toward predetermined performance goals can be monitored during and after instruction. Third, performance measurements can help link up learner achievement to organizational strategic plans.

Identifying What Learning Processes and Outcomes to Measure

Instructional designers should be capable of developing tests, written questionnaires, interviews, and other methods of measuring performance. The performance measures should be written and correspond to performance objectives, rely on methods of measuring learning outcomes, comply with time and instructional constraints, and meet requirements for validity and reliability. Instructional designers should be able to develop performance measurements when furnished with necessary information on the characteristics of learners, the settings in which they are expected to perform, constraints on performance and instructional development, instructional objectives, and plans for analyzing needs and evaluating results.

Stated simply, instructional designers should be able to answer two basic questions before they prepare instructional materials: (1) what should be measured? and (2) how should it be measured? To answer the first question, instructional designers should determine the purpose of the measurement and focus on methods of measuring instruction. To answer the second question, they should be able to design instruments—and write items for the instruments—to achieve the intended purpose.

Deciding on the Purpose

Once performance objectives have been written based on work requirements, instructional designers should decide:

  • What purpose will guide their performance measurement efforts?
  • What performance measurement methods should be used to assess learners' progress?
  • How should performance be measured?

Instructional designers should always begin by clarifying their purposes for measuring performance. There are at least four possible purposes (Kirkpatrick and Kirkpatrick 2006):

  1. Participant reaction. How much do participants enjoy what they are learning? How much do they enjoy the instructional methods used?
  2. Participant learning. How well are participants meeting performance objectives? How well have they learned?
  3. On-the-job performance change. How much change is evident on the job, based on what participants have learned? How well has the learning transferred from the instructional to the application environment?
  4. Organizational impact. How has the organization been affected by the results of an instructional experience?

Determining Sources of Information

After determining the purpose of performance measurement, instructional designers should next determine the sources of information used in measurement. There are three major sources of information. Performance objectives are the first. They should provide clues about what to measure because each objective must contain a measurable criterion for assessment. To measure performance, instructional designers should consider how well learners have met the criterion set forth in each objective. Each objective should be directly tied to meeting job-related learning needs. Hence, measuring objectives provides information about how well learning needs are being met by instruction.
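
To make the link concrete, the brief sketch below (in Python, with invented field names and an invented objective; nothing here comes from the Standards) shows one way to record a performance objective with its measurable criterion so that each assessment item can be traced back to the objective it measures:

from dataclasses import dataclass

@dataclass
class PerformanceObjective:
    """A performance objective with its measurable criterion (invented fields)."""
    objective_id: str
    behavior: str   # what the learner must be able to do
    condition: str  # the circumstances under which the behavior is performed
    criterion: str  # the measurable standard of acceptable performance

@dataclass
class AssessmentItem:
    """An assessment item explicitly linked to the objective it measures."""
    item_id: str
    objective_id: str  # traceability back to the objective
    prompt: str

objective = PerformanceObjective(
    objective_id="OBJ-1",
    behavior="calculate a customer refund",
    condition="given a completed return form",
    criterion="with 100 percent accuracy on three consecutive cases",
)

item = AssessmentItem(
    item_id="ITEM-1",
    objective_id=objective.objective_id,
    prompt="Using the attached return form, calculate the refund due.",
)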

Learner (worker) performance is the second source of information. Since instruction is—or should be—intended to improve individual performance in the workplace, information about what to measure should result from analysis of worker responsibilities, work standards, historical patterns of experienced workers' performance problems on the job, and forecasts of likely future job changes. Using job descriptions, performance appraisal data, work standards, and such other information as emerges from the results of participant reaction sheets, instructional designers should be able to develop performance measures linked directly to successful job performance.

Stakeholder preferences are the third source of information. Stakeholders are people who have a vested interest in instructional outcomes. Consider, for instance, what top managers and other interested parties want to know about instruction or its results. Often, instructional designers find that two key questions merit special consideration when measuring instruction or its results: (1) who wants to know? and (2) what do they want to know? A third question that may be addressed is, why do they want to know? Some instructional designers find it helpful to consult a menu of general questions about performance measures when deciding what to measure. Rae (1986, 9–10) developed such a menu, shown below, that remains useful.

  • Content of instruction: Is it relevant and in step with the instructional needs? Is it up-to-date?
  • Method of instruction: Were the methods used the most appropriate ones for the subject? Were the methods used the most appropriate for the learning styles of the participants?
  • Amount of learning: What was the material of the course? Was it new to the learner? Was it useful, although not new to the learner, as confirmation or revision material?
  • Instructor skills: Did the instructor have the attitude and skill to present the material in a way that encouraged learning?
  • Length and pace of instruction: Given the material essential to learning, were the length and pace of the instruction adequate? Were some aspects of instruction labored and others skimped?
  • Objectives: Did the instruction satisfy its declared objectives? Was the learner given the opportunity to satisfy any personal objectives, and was this need welcomed? Were personal objectives satisfied?
  • Omissions: Were any essential aspects omitted from the learning event? Was any material included that was not essential to the learning?
  • Learning transfer: How much of the learning is likely to be put into action when the learner returns to work? If it is to be a limited amount only, or none, why is this? What factors will deter or assist the transfer of learning?
  • Accommodation: If course accommodation is within the control of the instructor or relates to the instructional event, was the hotel, conference center, or training center suitable? Was the accommodation acceptable? Were the meals satisfactory?
  • Relevance: Was this course/seminar/conference/workshop/tutorial/coaching assignment/project the most appropriate means of presenting a learning opportunity?
  • Application of learning: Which aspects of your work now include elements that result directly from the learning event? Which new aspects of work have you introduced because of your learning? Which aspects of your previous work have you replaced or modified because of the learning? Which aspects of your learning have you not applied? Why not?
  • Efficiency: How much more efficient or effective are you in your work because of the instructional experience? Why or why not?
  • Hindsight: With the passage of time and attempts to apply the learning, are there any amendments you would wish to make to the training you received?
In short, instructional designers should select appropriate sources of information for performance measurement based on learner characteristics, the resources and constraints of the setting, statements of performance objectives, and the needs assessment, analysis, or evaluation plan.

Deciding How to Measure

When deciding how to measure performance, instructional designers should apply the same classic criteria that Newstrom and Lilyquist (1979) have suggested in selecting a data collection method for needs assessment. The following five issues may warrant consideration:

  1. Learner involvement: How much learner involvement is desired or feasible?
  2. Management involvement: How much management involvement is desired or feasible?
  3. Time required: How much time is available for measurement?
  4. Cost: How much is the organization willing to spend to measure performance?
  5. Relevant quantifiable data: How important is it for instructional designers to devise quantifiable measurements directly linked to on-the-job performance?

Different methods of measuring performance earn high, moderate, or low ratings on each of these criteria. It is usually necessary to identify priorities—that is, determine which one is the most important, second most important, and so on.
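
To make such priority setting concrete, the sketch below uses a simple weighted-scoring approach in Python; the weights and ratings are entirely hypothetical and are not drawn from Newstrom and Lilyquist:

# Hypothetical importance weights for the five criteria (higher = more important).
criteria_weights = {
    "learner_involvement": 2,
    "management_involvement": 1,
    "time_required": 3,
    "cost": 3,
    "quantifiable_data": 4,
}

# Illustrative 1-5 ratings of three candidate measurement methods on each criterion.
method_ratings = {
    "paper-and-pencil test": {"learner_involvement": 4, "management_involvement": 2,
                              "time_required": 4, "cost": 4, "quantifiable_data": 5},
    "on-the-job observation": {"learner_involvement": 5, "management_involvement": 4,
                               "time_required": 2, "cost": 2, "quantifiable_data": 4},
    "portfolio analysis": {"learner_involvement": 5, "management_involvement": 3,
                           "time_required": 2, "cost": 3, "quantifiable_data": 3},
}

def weighted_score(ratings):
    """Sum each criterion rating multiplied by its importance weight."""
    return sum(criteria_weights[criterion] * rating
               for criterion, rating in ratings.items())

# Rank the candidate methods from highest to lowest weighted score.
for method, ratings in sorted(method_ratings.items(),
                              key=lambda pair: weighted_score(pair[1]), reverse=True):
    print(f"{method}: {weighted_score(ratings)}")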

An Overview of Steps in Preparing Instruments

Having decided on a purpose (what is to be measured) and a measurement method (how it will be measured), instructional designers are then ready to develop measurement instruments. Instruments may be classified into three general types: (1) questionnaires, interview guides or schedules, observation forms, simulations, and checklists; (2) criterion-referenced tests; and (3) others. There are 10 basic steps to be taken when preparing a measurement instrument:

  1. Clarifying the purpose of measurement and selecting an instrument.
  2. Giving the instrument a descriptive title.
  3. Conducting background research.
  4. Drafting or modifying items.
  5. Sequencing—or reviewing the sequence of—items.
  6. Trying out the instrument on a small group representative of the learner population.
  7. Revising the instrument based on the small-group tryout.
  8. Testing the instrument on a larger group.
  9. Using the instrument—but establishing a means of tracking experience with it.
  10. Revising the instrument—or items—periodically.

These steps are summarized in the following paragraphs.

Step 1: Clarifying the Purpose of Measurement and Selecting an Instrument

Instructional designers should develop performance measurements by thinking through exactly why they are measuring instruction and, more important, what results they wish to achieve. Performance objectives are one starting point, since one purpose of measurement should usually be to determine how well learners have met instructional objectives by the end of the instructional experience. Instructional designers should ask themselves, among other questions, this one: How can I find out whether these results are being achieved during the instructional experience and whether they were achieved following the instructional experience? At this point they can select or prepare an instrument well suited to helping answer this question.

Step 2: Giving the Instrument a Descriptive Title

If performance will be measured using an instrument developed by someone else, instructional designers should consider the title to see if it accurately describes what they wish to measure. If the instrument will be tailor-made, the title should be chosen with great care. The reason: by selecting a title, instructional designers focus their thinking on exactly what will be measured.

Step 3: Conducting Background Research

Instructional designers can often save themselves considerable time and effort by locating previously prepared instruments. One way to do that is to network with other instructional designers to find out whether they have developed instruments for similar purposes. In addition, instructional designers can sometimes successfully track down elusive instruments or research studies by using specialized reference guides. Tests in print can be located through the impressive library of the Educational Testing Service in Princeton, New Jersey, which maintains a collection of 10,000 tests.

Background research on instrumentation will rarely be a complete waste of time. Even when instructional designers cannot locate instruments that measure exactly what they want, they may still locate examples that will stimulate new ideas about item layout or item sequence.

When previously prepared instruments are found, instructional designers should decide whether to use them as they are or modify them to meet special needs. If previously prepared instruments can be easily modified, instructional designers can reduce the time and effort to prepare and validate an instrument. But if efforts to locate instruments or research are to no avail, then it will be necessary to prepare a tailor-made instrument. Begin instrument development by addressing several important questions: Who will be measured? Who will conduct the measurement? What will be measured? When will the measurement occur? Where will the measurement be conducted? How will the measurement be conducted?

Step 4: Drafting or Modifying Items

Relying on instructional objectives or other sources as a starting point, instructional designers should next decide what questions to ask to measure the changes wrought by the instructional experience. If a previously prepared instrument was located, each item must be reviewed to ensure that it is appropriate. Drafting original items or questions for interviews, questionnaires, observation forms, simulations, or checklists is a highly creative activity. Items or questions may be generated using focus groups or other creative methods.

When drafting items, instructional designers should consider item format. Item format refers to the way performance is measured. Questionnaires or interview guides, for instance, may rely on open-ended items, closed-ended items, or some combination. Open-ended items produce qualitative or essay responses. The question “What do you feel you have learned in this instructional experience?” is an open-ended item. Closed-ended items produce quantifiable responses. Respondents asked to “rate how much you feel you learned during this instructional experience on a scale from 1 to 5, with 1 representing ‘very little’ and 5 representing ‘very much,’” are answering a closed-ended item. An instrument relies on a combination when it contains both open-ended and closed-ended items.
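
A minimal sketch of these two item formats, using hypothetical Python classes to show how an open-ended item differs from a closed-ended item with fixed response categories, follows:

from dataclasses import dataclass, field

@dataclass
class OpenEndedItem:
    """Yields a qualitative, free-text response."""
    prompt: str

@dataclass
class ClosedEndedItem:
    """Yields a quantifiable response chosen from fixed categories."""
    prompt: str
    scale_labels: dict = field(default_factory=dict)  # anchor labels for a rating scale

open_item = OpenEndedItem(
    prompt="What do you feel you have learned in this instructional experience?"
)

closed_item = ClosedEndedItem(
    prompt="Rate how much you feel you learned during this instructional experience.",
    scale_labels={1: "very little", 5: "very much"},
)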

Open-ended items are frequently used in conducting exploratory measurement studies. While the information they yield is difficult to quantify and analyze, the responses may be used to establish response categories for closed-ended instruments. Closed-ended items are frequently utilized in analytical measurement studies. Although the information they produce is easily quantified and analyzed, it can sometimes mislead if respondents are not given response categories that match what they believe. When that happens, respondents will select the closest approximation and reply accordingly. Item format has a different, although related, meaning for observation forms, simulations, or checklists. These instruments are usually designed around observable behaviors associated with the instructional objectives or competent on-the-job performance. Instructional designers may prepare these instruments to count the frequencies of a behavior (How often did the learner do something?), assess the quality of a behavior (How well did the learner perform?), or both. The instrument user may exercise considerable flexibility in identifying what behavior to count or assess. Alternatively, the user may not exercise flexibility in assessing behaviors, because categories are predefined or methods of assessment have been provided on the instrument itself.

Item format has yet another meaning regarding tests. Developing criterion-referenced tests poses a challenge somewhat different from developing questionnaires, interviews, simulations, or other measurement instruments. Test preparation is an entire field of its own. When developing criterion-referenced tests, “the verb component of the instructional objective indicates the form that a test item should take” (Kemp 1985, 161). Examples of behaviors specified in instructional objectives and appropriately matched test item formats are shown in Table 13.1.

Table 13.1 Behaviors Specified in Instructional Objectives and Corresponding Test Items

  1. Essay
     Example: “What are the chief advantages and disadvantages of the essay format as a test item?”
     Description: A type of test item requiring a learner to respond in essay format. This type of item is appropriate for assessing higher levels of cognition—such as analysis, synthesis, and evaluation.
     Behaviors (verbs specified in the instructional objective): Construct, Define, Develop, Discuss, Generate, Locate, Solve, State
  2. Fill-in-the-blank
     Example: “The ________-in-the-blank is a type of test item.”
     Description: A type of test item requiring the learner to fill in the blank with an appropriate word or phrase. Scoring can be objective because the required response is quite specific—often only one word is correct.
     Behaviors: Construct, Define, Identify, Locate, Solve, State
  3. Completion
     Example: “A type of test item that requires the completion of a sentence is called the __________.”
     Description: A type of test item that closely resembles the fill-in-the-blank type, except that the learner is asked to complete a sentence stem.
     Behaviors: Construct, Define, Develop, Discuss, Generate, Identify, Locate, Solve, State
  4. Multiple-choice
     Example: “A type of test item requiring the learner to choose from more than one possible answer is the (a) multiple-choice; (b) essay; (c) completion.”
     Description: Kemp (1985, 162) calls multiple-choice “the most useful and versatile type of objective testing.” Learners must choose between three and five options or alternatives as the answer to a question.
     Behaviors: Discriminate, Identify, Locate, Select, Solve
  5. True-false
     Example: “A true-false test item is less versatile than a multiple-choice one.” True-False
     Description: A type of test item in which learners are asked to determine whether a statement is true or false.
     Behaviors: Discriminate, Locate, Select, Solve
  6. Matching
     Example: (See the example below.)
     Description: A type of test item in which learners are asked to match up items in one column with items in another column.
     Behaviors: Discriminate, Locate, Select
  7. Project
     Example: “Write an essay question to describe ten steps in preparing an assessment instrument.”
     Description: A type of test in which learners are asked to demonstrate the ability to perform a task they have (presumably) learned through participation in an instructional experience.
     Behaviors: Construct, Develop, Generate, Locate, Solve

Source: Taken from W. Rothwell and H. Kazanas, Mastering the Instructional Design Process: A Systematic Approach (4th ed.) (San Francisco: Pfeiffer, 2008), 201–202.
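
Because the table amounts to a lookup from objective verbs to candidate item formats, it can be expressed directly as a data structure. The partial sketch below (Python, covering only a few of the verbs in Table 13.1) is one illustration:

# Partial mapping of objective verbs to candidate test-item formats (see Table 13.1).
VERB_TO_ITEM_FORMATS = {
    "construct": ["essay", "fill-in-the-blank", "completion", "project"],
    "discriminate": ["multiple-choice", "true-false", "matching"],
    "identify": ["fill-in-the-blank", "completion", "multiple-choice"],
    "select": ["multiple-choice", "true-false", "matching"],
    "solve": ["essay", "fill-in-the-blank", "completion", "multiple-choice",
              "true-false", "project"],
}

def candidate_formats(objective_verb):
    """Return candidate test-item formats for the verb used in a performance objective."""
    return VERB_TO_ITEM_FORMATS.get(objective_verb.lower(), [])

print(candidate_formats("Discriminate"))  # ['multiple-choice', 'true-false', 'matching']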

Step 5: Sequencing—or Reviewing the Sequence of—Items

One choice is to sequence items in a logical order based on work tasks. Another choice is to sequence items according to a learning hierarchy.

Step 6: Trying Out the Instrument on a Small Group Representative of the Learner Population

Sometimes called instrument pretesting, this step should not be confused with learner pretesting. If possible, instructional designers should select a sample of people representative of the learner population to participate in the instrument pretest and ask for their help in identifying wording that is unclear or otherwise inappropriate. Instructional designers should explain the instrument items to the group rather than ask them to answer the questions. Their responses should be noted for use during the next step.

Step 7: Revising the Instrument Based on the Small-Group Tryout

If a complete revision is necessary, which should rarely be the case, another small group should be selected for a second instrument pretest. Otherwise, instructional designers should revise items, based on their notes from the previous step, to improve clarity.

Step 8: Testing the Instrument on a Larger Group

The next step is a field test of the instrument on a larger group under conditions closely resembling those in which the instrument will later be used. The results of the field test should be noted.

Step 9: Using the Instrument—But Establishing a Means of Tracking Experience with It

Instructional designers should use the instrument but should also establish a way of tracking future experience with it. The results must be monitored. If tests are administered, instructional designers should periodically conduct item analysis to determine what questions the learners are missing and how often they are missing them. If questionnaires or interviews are used to measure performance, instructional designers must note the response patterns they receive to determine whether questions are yielding useful answers. If instructional designers are using structured observation, they should periodically review the categories they initially created.
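
A minimal sketch of that kind of item analysis, assuming test results are stored as a simple table of 1s (correct) and 0s (missed), might look like this in Python:

# Each row is one learner's results; each column is one test item (1 = correct, 0 = missed).
responses = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

num_learners = len(responses)
num_items = len(responses[0])

for item_index in range(num_items):
    missed = sum(1 - row[item_index] for row in responses)
    print(f"Item {item_index + 1}: missed by {missed} of {num_learners} learners "
          f"({missed / num_learners:.0%})")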

Step 10: Revising the Instrument—or Specific Items—Periodically

As performance measurements are made using instruments, instructional designers gain experience. They can take advantage of that experience by periodically revising the instrument, or items on it. Revisions should also be made whenever changes are made to performance objectives or when new performance objectives are added.

Other Methods of Measuring Performance

Apart from questionnaires, interviews, simulations, and checklists, other methods may measure participant reactions, participant learning, on-the-job performance change, or organizational impact. However, not every method is appropriate for every purpose. These methods (note we do not call them items or instruments) include advisory committees, external assessment centers, attitude surveys, group discussions, exit interviews, and performance appraisals.

An advisory committee is a group comprising stakeholders in instructional experiences. A committee may be established as standing (permanent and formal) or ad hoc (temporary and informal). One way to use an advisory committee is to ask its members to observe an instructional experience and assess how well they feel its objectives are achieved. Another way is to direct results of participant tests or other measures to committee members for interpretation.

An external assessment center is a method of measuring individual knowledge and skills through an extended simulation of job or group work. It could be used—although it would admittedly be expensive to do so—to determine what measurable change resulted from an instructional experience.

An attitude survey is usually intended to assess individual perceptions about working conditions, coworkers, work tasks, and other issues. It could determine people's perceptions of what changes or how much change resulted from instructional experiences.

A group discussion is a meeting. It could identify measurement issues or assess a group's perceptions about what changes or how much change occurred because of an instructional experience.

An exit interview is a meeting with an employee just prior to the individual's departure from an organization, department, or work unit. Sometimes, exit interviews may be combined with questionnaires mailed to terminating employees some time after they leave the organization. Exit interviews may identify measurement issues or assess an individual's perceptions about what changes or how much change occurred because of an instructional experience.

A performance appraisal is an assessment of an individual's job-related activities and results over a predetermined time frame. It could document a supervisor's perceptions of what changes or how much change occurred because of an individual's participation in an instructional experience.

Constructing Reliable and Valid Methods of Assessing Learning and Performance

Three issues should be considered in any effort to assess or evaluate people. One is reliability; one is validity; and one is credibility. The Standards covers the first two. The third is also worthy of consideration because it is perhaps the most practical of all.

Reliability refers to the consistency of measures. It means that an assessment consistently measures what it is supposed to. There are several categories of reliability:

  • Interrater reliability examines how much agreement exists between several raters.
  • Test-retest reliability examines how consistent scores are from one measurement opportunity to another.
  • Intermethod reliability examines how consistently scores are maintained when different approaches or instruments are used for the measurement.
  • Internal consistency reliability examines consistency in results across items in the same test.
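
As one illustration of the internal consistency category, the sketch below computes Cronbach's alpha from a small set of invented item scores; it is a rough example, not a substitute for proper psychometric analysis:

from statistics import pvariance

# Invented scores: rows are learners, columns are items on the same test.
scores = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
]

k = len(scores[0])  # number of items
item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
total_variance = pvariance([sum(row) for row in scores])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")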

Validity refers to how accurately a measure corresponds to the real world. Does an assessment measure what it is supposed to measure? Many forms of validity exist. Among them:

  • Construct validity answers the question “how well does an assessment really measure what it is supposed to measure based on the theory?”
  • Convergent validity answers the question “how strongly is an assessment associated with other measures with which it should, in theory, correlate?”
  • Discriminant validity answers the question “how well does a measure discriminate from other measures that are supposed to be unrelated?”
  • Content validity looks at the content of an assessment to answer the question “how representative is an assessment of behaviors it is supposed to measure?”
  • Representation validity looks at how well an assessment can be conducted. Is it practical and possible to measure?
  • Face validity refers to how well an assessment appears to measure what it is supposed to measure.
  • Criterion validity has to do with correlating an assessment and a criterion variable. How well does an assessment compare to other measures of the same thing?
  • Concurrent validity refers to how much the assessment correlates with other assessments of the same thing.
  • Predictive validity answers the questions “how well will an assessment predict the later occurrence of what it measures?” and “will an assessment predict that something will happen?”
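
As a rough illustration of criterion-related validity, the sketch below correlates assessment scores with a criterion measure such as supervisors' performance ratings; both data sets are invented for the example:

from statistics import correlation  # available in Python 3.10+

# Invented data for ten learners: end-of-course assessment scores and
# supervisors' later job-performance ratings (the criterion measure).
assessment_scores = [72, 85, 60, 90, 78, 66, 88, 74, 81, 69]
performance_ratings = [3.1, 4.2, 2.8, 4.6, 3.8, 3.0, 4.4, 3.5, 4.0, 3.2]

r = correlation(assessment_scores, performance_ratings)
print(f"Criterion-related validity coefficient (Pearson r): {r:.2f}")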

Validity and reliability are important. Most forms of assessment must be tested for their reliability or validity, and that is especially important to do if employment decisions will be made because of a worker's scores on an assessment. Often, competent statistical and psychometric consulting help must be used to ensure that all tests of validity and reliability are performed properly. That is especially true if assessment results may be challenged on the basis of employment discrimination.

Credibility refers to the trustworthiness of the assessment or evaluation method or tool.

Stated another way, do people such as managers believe in the assessment, and does a high or low score mean something to them? In many organizations, instructional designers may find that credibility with managers and workers is far more important for daily actions and decisions than statistical tests of reliability and validity.

Ensuring the Assessment Is Aligned with Instructional Goals, Anticipated Learning Outcomes, and Instructional Strategies

To ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies, instructional designers should develop simple yet elegant strategies to double-check what they have designed with:

  • Strategists
  • Learners
  • Other instructional designers on the team.

A good time to do that is during formative evaluation, assuming it is conducted. If it is not, then answering questions about these issues should be integrated with pilot tests or rapid prototyping and field-based testing efforts. It is as simple as asking stakeholders such questions as:

  • How well are assessments aligned with the organization's strategic goals?
  • How well are assessments aligned with the desired learning outcomes?
  • How well are assessments aligned with instructional strategies?
  • What can be done to improve alignment?
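
One lightweight way to support those questions is an alignment check that flags any objective lacking a matching assessment or instructional strategy. The sketch below uses hypothetical identifiers and is only one possible approach:

# Hypothetical links from each performance objective to the assessments and
# instructional strategies intended to address it.
alignment = {
    "OBJ-1": {"assessments": ["quiz-1"], "strategies": ["demonstration", "guided practice"]},
    "OBJ-2": {"assessments": [], "strategies": ["case study"]},
    "OBJ-3": {"assessments": ["simulation-1"], "strategies": []},
}

for objective_id, links in alignment.items():
    gaps = []
    if not links["assessments"]:
        gaps.append("no assessment")
    if not links["strategies"]:
        gaps.append("no instructional strategy")
    status = "aligned" if not gaps else "gap: " + ", ".join(gaps)
    print(f"{objective_id}: {status}")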