According to Instructional Design Competencies: The Standards (2013), instructional designers should be able to (Koszalka, Russ-Eft, and Reiser 2013, 54) “design learning assessment (advanced). This is an advanced competency and it includes three performance statements, two are essential and one is advanced.” They include: “(a) identify the learning processes and outcomes to be measured (essential); (b) construct reliable and valid methods of assessing learning and performance (advanced); (c) ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies (essential)” (Koszalka, Russ-Eft, and Reiser 2013, 54).
This chapter provides additional information about this competency. It offers advice on how to identify the learning processes and outcomes to be measured, construct reliable and valid methods of assessing learning and performance, and ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies.
Instructional designers should usually develop performance measurements during or immediately following the preparation of performance objectives. Measurements of all kinds—sometimes called metrics—have been commanding attention in recent years. One reason has been growing demand by stockholders and stakeholders for accountability from all organizational levels. Another reason is that instructional designers are being held accountable for showing results for whatever training investments are made by their organizations.
Performance measurements are various means established by instructional designers for monitoring learner achievement. Paper-and-pencil tests are perhaps the most common. Test items may be developed directly from performance objectives before instructional materials are prepared. In this way, accountability for results is built into instruction early on.
However, paper-and-pencil testing is not the only way to assess learner achievement. Other methods may also be used. For instance, trainees can be observed on the job as they perform the tasks they have learned. Computerized skills assessment is also becoming common, as is portfolio analysis, in which work samples are assessed.
Performance measurements become benchmarks that, with performance objectives (discussed in the previous chapter), provide guidance to prepare instructional programs. They help answer an age-old question about every instructional experience: “What should be taught?” (Egan 1978, 72). They are important for three major reasons. First, they ensure economical choice of instructional content. Establishing performance measurements is part of the preliminary work to be completed before instructional materials are developed; it helps identify the content that should be included and the success level expected of learners upon completion of instruction. Second, performance measurements provide a basis for learner accountability to ensure that learner progress toward predetermined performance goals can be monitored during and after instruction. Third, performance measurements can help link up learner achievement to organizational strategic plans.
Instructional designers should be capable of developing tests, written questionnaires, interviews, and other methods of measuring performance. The performance measures should be written and correspond to performance objectives, rely on methods of measuring learning outcomes, comply with time and instructional constraints, and meet requirements for validity and reliability. Instructional designers should be able to develop performance measurements when furnished with necessary information on the characteristics of learners, the settings in which they are expected to perform, constraints on performance and instructional development, instructional objectives, and plans for analyzing needs and evaluating results.
Stated simply, instructional designers should be able to answer two basic questions before they prepare instructional materials: (1) what should be measured? and (2) how should it be measured? To answer the first question, instructional designers should determine the purpose of the measurement and focus on methods of measuring instruction. To answer the second question, they should be able to design instruments—and write items for the instruments—to achieve the intended purpose.
Once performance objectives have been written based on work requirements, instructional designers should decide:
Instructional designers should always begin by clarifying their purposes for measuring performance. There are at least four possible purposes (Kirkpatrick and Kirkpatrick 2006):
After determining the purpose of performance measurement, instructional designers should next determine the sources of information used in measurement. There are three major sources of information. Performance objectives are the first. They should provide clues about what to measure because each objective must contain a measurable criterion for assessment. To measure performance, instructional designers should consider how well learners have met the criterion set forth in each objective. Each objective should be directly tied to meeting job-related learning needs. Hence, measuring objectives provides information about how well learning needs are being met by instruction.
Learner (worker) performance is the second source of information. Since instruction is—or should be—intended to improve individual performance in the workplace, information about what to measure should result from analysis of worker responsibilities, work standards, historical patterns of experienced workers' performance problems on the job, and forecasts of likely future job changes. Using job descriptions, performance appraisal data, work standards, and such other information as emerges from the results of participant reaction sheets, instructional designers should be able to develop performance measures linked directly to successful job performance.
Stakeholder preferences are the third source of information. Stakeholders are people who have a vested interest in instructional outcomes. Consider, for instance, what top managers and other interested parties want to know about instruction or its results. Often, instructional designers find that two key questions merit special consideration when measuring instruction or its results: (1) who wants to know? and (2) what do they want to know? A third question that may be addressed is, why do they want to know? Some instructional designers find it helpful to consult a menu of general questions about performance measures when deciding what to measure. Rae (1986, 9–10) developed such a menu, shown below, that still remains useful.
Issue | Questions |
Content of instruction | Is it relevant and in step with the instructional needs? Is it up-to-date? |
Method of instruction | Were the methods used the most appropriate ones for the subject? Were the methods used the most appropriate for the learning styles of the participants? |
Amount of learning | Was the material of the course new to the learner? Was it useful, although not new to the learner, as confirmation or revision material? |
Instructor skills | Did the instructor have the attitude and skill to present the material in a way that encouraged learning? |
Length and pace of instruction | Given the material essential to learning, were the length and pace of the instruction adequate? Were some aspects of instruction labored and others skimped? |
Objectives | Did the instruction satisfy its declared objectives? Was the learner given the opportunity to satisfy any personal objectives? Was this need welcomed? Were personal objectives satisfied? |
Omissions | Were any essential aspects omitted from the learning event? Was any material included that was not essential to the learning? |
Learning transfer | How much of the learning is likely to be put into action when the learner returns to work? If it is to be a limited amount only or none, why is this? What factors will deter or assist the transfer of learning? |
Accommodation | If course accommodation is within the control of the instructor or relates to the instructional event, he or she may wish to ask whether the hotel, conference center, or training center was suitable. Was the accommodation acceptable? Were the meals satisfactory? |
Relevance | Was this course/seminar/conference/workshop/tutorial/coaching assignment/project the most appropriate means of presenting a learning opportunity? |
Application of learning | Which aspects of your work now include elements which result directly from the learning event? Which new aspects of work have you introduced because of your learning? Which aspects of your previous work have you replaced or modified because of the learning? Which aspects of your learning have you not applied? Why not? |
Efficiency | How much more efficient or effective are you in your work because of the instructional experience? Why or why not? |
Hindsight | With the passage of time and attempts to apply the learning, are there any amendments you would wish to make to the training you received? |
When deciding how to measure performance, instructional designers should apply the same classic criteria that Newstrom and Lilyquist (1979) have suggested in selecting a data collection method for needs assessment. The following five issues may warrant consideration:
Different methods of measuring performance earn high, moderate, or low ratings on each of these criteria. It is usually necessary to identify priorities—that is, determine which one is the most important, second most important, and so on.
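Ranking criteria and rating candidate methods against them can be sketched as a simple weighted-rating exercise. The criterion names, weights, and method ratings below are illustrative placeholders only, not drawn from Newstrom and Lilyquist:

```python
# Map the high/moderate/low ratings discussed above to numeric scores.
RATING = {"high": 3, "moderate": 2, "low": 1}

def score_method(ratings, weights):
    """Weighted score for one measurement method.

    ratings: criterion name -> 'high' / 'moderate' / 'low'
    weights: criterion name -> priority weight (higher = more important)
    """
    return sum(RATING[ratings[c]] * w for c, w in weights.items())

# Illustrative criteria, weights, and ratings (not from the source).
weights = {"cost": 2, "time": 1, "accuracy": 3}
interviews = {"cost": "low", "time": "moderate", "accuracy": "high"}
questionnaires = {"cost": "high", "time": "high", "accuracy": "moderate"}

print(score_method(interviews, weights))      # 1*2 + 2*1 + 3*3 = 13
print(score_method(questionnaires, weights))  # 3*2 + 3*1 + 2*3 = 15
```

With these illustrative weights, the questionnaire's low cost and speed outweigh the interview's higher accuracy; shifting the weights shifts the choice, which is why identifying priorities matters.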
Having decided on a purpose (what is to be measured) and a measurement method (how it will be measured), instructional designers are then ready to develop measurement instruments. Instruments may be classified into three general types: (1) questionnaires, interview guides or schedules, observation forms, simulations, and checklists; (2) criterion-referenced tests; and (3) others. There are 10 basic steps to be taken when preparing a measurement instrument:
These steps are summarized in the following paragraphs.
Instructional designers should develop performance measurements by thinking through exactly why they are measuring instruction and, more important, what results they wish to achieve. Performance objectives are one starting point, since one purpose of measurement should usually be to determine how well learners have met instructional objectives by the end of the instructional experience. Instructional designers should ask themselves, among other questions, this one: How can I find out whether these results are being achieved during the instructional experience and whether they were achieved following the instructional experience? At this point they can select or prepare an instrument well suited to helping answer this question.
If performance will be measured using an instrument developed by someone else, instructional designers should consider the title to see if it accurately describes what they wish to measure. If the instrument will be tailor-made, the title should be chosen with great care. The reason: by selecting a title, instructional designers focus their thinking on exactly what will be measured.
Instructional designers can often save themselves considerable time and effort by locating previously prepared instruments. One way to do that is to network with other instructional designers to find out whether they have developed instruments for similar purposes. In addition, instructional designers can sometimes successfully track down elusive instruments or research studies by using specialized reference guides. Tests in print can be located through the impressive library of the Educational Testing Service in Princeton, New Jersey, which maintains a collection of 10,000 tests.
Background research on instrumentation will rarely be a complete waste of time. Even when instructional designers cannot locate instruments that measure exactly what they want, they may still locate examples that will stimulate new ideas about item layout or item sequence.
When previously prepared instruments are found, instructional designers should decide whether to use them as they are or modify them to meet special needs. If previously prepared instruments can be easily modified, instructional designers can reduce the time and effort to prepare and validate an instrument. But if efforts to locate instruments or research are to no avail, then it will be necessary to prepare a tailor-made instrument. Begin instrument development by addressing several important questions: Who will be measured? Who will conduct the measurement? What will be measured? When will the measurement occur? Where will the measurement be conducted? How will the measurement be conducted?
Relying on instructional objectives or other sources as a starting point, instructional designers should next decide what questions to ask to measure the changes wrought by the instructional experience. If a previously prepared instrument was located, each item must be reviewed to ensure that it is appropriate. Drafting original items or questions for interviews, questionnaires, observation forms, simulations, or checklists is a highly creative activity. Generate items or questions using focus groups or other creative methods.
When drafting items, instructional designers should consider item format. Item format refers to the way performance is measured. Questionnaires or interview guides, for instance, may rely on open-ended items, closed-ended items, or some combination. Open-ended items produce qualitative or essay responses. The question “What do you feel you have learned in this instructional experience?” is an open-ended item. Closed-ended items produce quantifiable responses. Respondents asked to “rate how much you feel you learned during this instructional experience on a scale from 1 to 5, with 1 representing ‘very little’ and 5 representing ‘very much,’” are answering a closed-ended item. An instrument relies on a combination when it contains both open-ended and closed-ended items.
Open-ended items are frequently used in exploratory measurement studies. Although the information they yield is difficult to quantify and analyze, the responses can help establish response categories for later closed-ended instruments. Closed-ended items are frequently used in analytical measurement studies. Although the information they produce is easily quantified and analyzed, it can sometimes mislead when respondents are not offered response categories that match their views; when that happens, respondents will select an approximation of what they believe and reply accordingly. Item format has a different, although related, meaning for observation forms, simulations, or checklists. These instruments are usually designed around observable behaviors associated with the instructional objectives or with competent on-the-job performance. Instructional designers may prepare these instruments to count the frequency of a behavior (How often did the learner do something?), assess the quality of a behavior (How well did the learner perform?), or both. The instrument user may exercise considerable flexibility in identifying which behaviors to count or assess. Alternatively, the user may exercise little flexibility, because categories are predefined or methods of assessment have been provided on the instrument itself.
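Quantifying closed-ended responses of the kind described above is straightforward. The following is a minimal Python sketch, assuming a 1-to-5 rating item; the function name and output fields are illustrative:

```python
from collections import Counter
from statistics import mean

def summarize_ratings(responses, scale=(1, 5)):
    """Summarize closed-ended rating responses: count, mean, and
    frequency of each scale point. Out-of-range values are discarded."""
    lo, hi = scale
    valid = [r for r in responses if lo <= r <= hi]
    freq = Counter(valid)
    return {
        "n": len(valid),
        "mean": round(mean(valid), 2),
        "distribution": {k: freq.get(k, 0) for k in range(lo, hi + 1)},
    }

# Example: eight learners rate "how much you feel you learned" from 1 to 5.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
print(summarize_ratings(ratings))
```

Open-ended responses, by contrast, must first be coded into categories before any such tallying is possible.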
Item format has yet another meaning regarding tests. Developing criterion-referenced tests poses a challenge somewhat different from developing questionnaires, interviews, simulations, or other measurement instruments. Test preparation is an entire field of its own. When developing criterion-referenced tests, “the verb component of the instructional objective indicates the form that a test item should take” (Kemp 1985, 161). Examples of behaviors specified in instructional objectives and appropriately matched test item formats are shown in Table 13.1.
Table 13.1 Behaviors Specified in Instructional Objectives and Corresponding Test Items
Type of test item | Brief description of test-item format | Behavior (verb specified in the instructional objective) |
1. Essay (Example: "What are the chief advantages and disadvantages of the essay format as a test item?") | A type of test item requiring a learner to respond in essay format. This type of item is appropriate for assessing higher levels of cognition, such as analysis, synthesis, and evaluation. | Construct, Define, Develop, Discuss, Generate, Locate, Solve, State |
2. Fill-in-the-blank (Example: "The ________-in-the-blank is a type of test item.") | A type of test item requiring the learner to fill in the blank with an appropriate word or phrase. Scoring can be objective because the required response is quite specific; often only one word is correct. | Construct, Define, Identify, Locate, Solve, State |
3. Completion (Example: "A type of test item that requires the completion of a sentence is called the ________.") | A type of test item that closely resembles the fill-in-the-blank type, except that the learner is asked to complete a sentence stem. | Construct, Define, Develop, Discuss, Generate, Identify, Locate, Solve, State |
4. Multiple-choice (Example: "A type of test item requiring the learner to choose from more than one possible answer is the (a) multiple-choice; (b) essay; (c) completion.") | Kemp (1985, 162) calls multiple-choice "the most useful and versatile type of objective testing." Learners choose among three to five options or alternatives as the answer to a question. | Discriminate, Identify, Locate, Select, Solve |
5. True-false (Example: "A true-false test item is less versatile than a multiple-choice one." True-False) | A type of test item in which learners are asked to determine whether a statement is true or false. | Discriminate, Locate, Select, Solve |
6. Matching (See the example below.) | A type of test item in which learners are asked to match up items in one column with items in another column. | Discriminate, Locate, Select |
7. Project (Example: "Write an essay question to describe ten steps in preparing an assessment instrument.") | A type of test in which learners are asked to demonstrate the ability to perform a task they have (presumably) learned through participation in an instructional experience. | Construct, Develop, Generate, Locate, Solve |
Source: Taken from W. Rothwell and H. Kazanas, Mastering the Instructional Design Process: A Systematic Approach (4th ed.) (San Francisco: Pfeiffer, 2008), 201–202.
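The verb-to-format pairings in Table 13.1 amount to a small lookup table, and some designers encode them that way when drafting items. The sketch below is hypothetical and abridged to a few verbs; it is not a tool from the source:

```python
# Abridged lookup built from the pairings in Table 13.1.
# Covering only a handful of verbs for illustration.
VERB_TO_FORMATS = {
    "construct": ["essay", "fill-in-the-blank", "completion", "project"],
    "define": ["essay", "fill-in-the-blank", "completion"],
    "discriminate": ["multiple-choice", "true-false", "matching"],
    "generate": ["essay", "completion", "project"],
    "select": ["multiple-choice", "true-false", "matching"],
}

def suggest_formats(objective_verb):
    """Return candidate item formats for an objective's verb,
    or an empty list if the verb is not in the (abridged) table."""
    return VERB_TO_FORMATS.get(objective_verb.lower(), [])

print(suggest_formats("Discriminate"))  # ['multiple-choice', 'true-false', 'matching']
```

A lookup like this only narrows the choices; the designer still selects the format best suited to the content, the learners, and the testing conditions.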
One choice is to sequence items in a logical order based on work tasks. Another choice is to sequence items according to a learning hierarchy.
Sometimes called instrument pretesting, this step should not be confused with learner pretesting. If possible, instructional designers should select a sample of people representative of the learner population to participate in the instrument pretest and ask for their help in identifying wording that is unclear or otherwise inappropriate. Instructional designers should explain the instrument items to the group rather than ask them to answer the questions. The group's responses should be noted for use during the next step.
If a complete revision is necessary, which should rarely be the case, another small group should be selected for a second instrument pretest. Otherwise, instructional designers should revise items, based on their notes from the previous step, to improve clarity.
The next step is a field test of the instrument on a larger group under conditions closely resembling those in which the instrument will later be used. The results of the field test should be noted.
Instructional designers should use the instrument but should also establish a way of tracking future experience with it. The results must be monitored. If tests are administered, instructional designers should periodically conduct item analysis to determine what questions the learners are missing and how often they are missing them. If questionnaires or interviews are used to measure performance, instructional designers must note the response patterns they receive to determine whether questions are yielding useful answers. If instructional designers are using structured observation, they should periodically review the categories they initially created.
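The item analysis described above can be approximated with a short script. This is a minimal sketch, assuming dichotomously scored (0/1) test items and using a simple upper-versus-lower-group discrimination index; operational testing programs would rely on more rigorous psychometric tools:

```python
def item_analysis(responses):
    """Per-item difficulty and discrimination for dichotomous scores.

    responses: one list per learner, each a list of 0/1 item scores.
    Difficulty = proportion of learners answering the item correctly.
    Discrimination = proportion correct in the top third of total
    scores minus proportion correct in the bottom third.
    """
    n = len(responses)
    n_items = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, n // 3)
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(n_items):
        difficulty = sum(r[i] for r in responses) / n
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        results.append({"item": i + 1,
                        "difficulty": round(difficulty, 2),
                        "discrimination": round(discrimination, 2)})
    return results

# Example: six learners, three items (illustrative data).
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in item_analysis(data):
    print(row)
```

Items that nearly everyone misses (very low difficulty values) or that fail to separate stronger from weaker learners (discrimination near zero or negative) are candidates for revision in the next step.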
As performance measurements are made using instruments, instructional designers gain experience. They can take advantage of that experience by periodically revising the instrument, or items on it. Revisions should also be made whenever changes are made to performance objectives or when new performance objectives are added.
Apart from questionnaires, interviews, simulations, and checklists, other methods may measure participant reactions, participant learning, on-the-job performance change, or organizational impact. However, not every method is appropriate for every purpose. These methods (note we do not call them items or instruments) include advisory committees, external assessment centers, attitude surveys, group discussions, exit interviews, and performance appraisals.
An advisory committee is a group comprising stakeholders in instructional experiences. A committee may be established as standing (permanent and formal) or ad hoc (temporary and informal). One way to use an advisory committee is to ask its members to observe an instructional experience and assess how well they feel its objectives are achieved. Another way is to direct results of participant tests or other measures to committee members for interpretation.
An external assessment center is a method of measuring individual knowledge and skills through an extended simulation of job-related individual or group work. It could be used—although it would admittedly be expensive to do so—to determine what measurable change resulted from an instructional experience.
An attitude survey is usually intended to assess individual perceptions about working conditions, coworkers, work tasks, and other issues. It could determine people's perceptions of what changes or how much change resulted from instructional experiences.
A group discussion is a meeting. It could identify measurement issues or assess a group's perceptions about what changes or how much change occurred because of an instructional experience.
An exit interview is a meeting with an employee just prior to the individual's departure from an organization, department, or work unit. Sometimes, exit interviews may be combined with questionnaires mailed to terminating employees some time after they leave the organization. Exit interviews may identify measurement issues or assess an individual's perceptions about what changes or how much change occurred because of an instructional experience.
A performance appraisal is an assessment of an individual's job-related activities and results over a predetermined time frame. It could document a supervisor's perceptions of what changes or how much change occurred because of an individual's participation in an instructional experience.
Three issues should be considered in any effort to assess or evaluate people: reliability, validity, and credibility. The Standards covers the first two. The third is also worthy of consideration because it is perhaps the most practical of all.
Reliability refers to the consistency of measures. A reliable assessment measures consistently, yielding similar results across repeated administrations, raters, or equivalent items. There are several categories of reliability:
Validity refers to how accurately a measure corresponds to the real world: does an assessment measure what it is supposed to measure? Many forms of validity exist. Among them:
Validity and reliability are important. Most forms of assessment should be tested for reliability and validity, and that is especially important if employment decisions will be made on the basis of a worker's scores on an assessment. Often, competent statistical and psychometric consulting help must be used to ensure that all tests of validity and reliability are performed properly. That is especially true if assessment results may be challenged on the basis of employment discrimination.
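As one concrete illustration, internal consistency, a commonly reported reliability estimate, is often computed as Cronbach's alpha. The sketch below assumes numerically scored items and is no substitute for the psychometric consulting the text recommends:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for internal-consistency reliability.

    scores: one list per learner, each a list of numeric item scores.
    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals),
    where k is the number of items. Values near 1 suggest the items
    measure consistently; low or negative values suggest they do not.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Example: six learners, three 0/1-scored items (illustrative data).
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(data), 2))
```

A low alpha like the one this toy data produces would, in practice, prompt item revision or expert review before the instrument saw operational use.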
Credibility refers to the trustworthiness of the assessment or evaluation method or tool.
Stated another way, do people such as managers believe in the assessment, and does a high or low score mean something to them? In many organizations, instructional designers may find that credibility with managers and workers is far more important for daily actions and decisions than statistical tests of reliability and validity.
To ensure that assessment is aligned with instructional goals, anticipated learning outcomes, and instructional strategies, instructional designers should develop simple yet elegant strategies to double-check what they have designed with:
A good time to do that is during formative evaluation, assuming it is conducted. If it is not, then answering questions about these issues should be integrated with pilot tests or rapid prototyping and field-based testing efforts. It is as simple as asking stakeholders such questions as: