CHAPTER 3

Item Scaling

The first step in collecting unbiased, reliable, and valid survey data is having an unbiased, reliable, and valid survey instrument. The type of scale you use sets the foundation of the survey instrument itself, and often determines the kind of statistical analysis you will conduct.

A scale is a series of item anchors that progress in value or magnitude and offer a meaningful, quantifiable distinction among responses to a survey question. In other words, a scale is a continuous spectrum or series of categories. The purpose of scaling is to represent, usually quantitatively, an item’s, a person’s, or an event’s place in the spectrum.

The four common types of scales in business research are as follows:

  • Nominal scale: The simplest type of scale. The numbers or letters assigned to objects serve as labels for identification or classification.
  • Ordinal scale: This scale arranges objects or alternatives according to their magnitude. A typical ordinal scale in business asks respondents to rate brands, companies, experiences, etc., as “excellent,” “good,” “fair,” or “poor.” Although we know that “excellent” is better than “good,” unfortunately with ordinal scaling, it’s difficult to tell by how much.
  • Interval scale: Interval scales not only indicate order, but they also measure order (or distance) in units of equal intervals. The location of the zero point in interval scaling is arbitrary. A classic example of an interval scale is the Fahrenheit temperature scale. For most behavioral business research, interval scales are typically the best form of measurement.
  • Ratio scale: Ratio scales have an absolute zero (e.g., money, or weight). The absolute zero represents a point on the scale where there is an absence of the given attribute.

The four scale types are depicted in Exhibit 3.1.
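One practical consequence of the scale type is which summary statistic is meaningful for the data. The sketch below is illustrative only: pairing the mode with nominal data, the median with ordinal data, and the mean with interval and ratio data is a common textbook convention rather than something stated in this chapter, and the function name `summarize` and the sample data are hypothetical.

```python
from statistics import mean, median, mode

# Assumed convention (not from this chapter): the summary statistic
# usually considered meaningful for each scale type.
SUMMARY_BY_SCALE = {
    "nominal": mode,     # labels only: report the most frequent category
    "ordinal": median,   # ordered categories: report the middle rank
    "interval": mean,    # equal intervals: averaging is meaningful
    "ratio": mean,       # equal intervals plus an absolute zero
}

def summarize(values, scale_type):
    """Return a summary of `values` appropriate to `scale_type`."""
    return SUMMARY_BY_SCALE[scale_type](values)

# Hypothetical data: ordinal ratings coded 1 = poor ... 4 = excellent
ratings = [4, 3, 3, 2, 4, 3, 1]
print(summarize(ratings, "ordinal"))               # median rating
print(summarize(["A", "B", "A", "C"], "nominal"))  # most frequent label
```

Keeping the mapping in one dictionary makes the analyst's assumption explicit, so it can be changed if a project treats Likert-style data as interval.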

The Likert scale, which is an ordinal scale, is one of the most common scales you will come across in survey research. The original Likert scale was developed in 1932 by Rensis Likert, an American psychologist who was interested in measuring people’s opinions or attitudes on a variety of items. He developed a 5-point, bipolar agreement scale as a result. This scale has been used ever since, and is probably the most widely used scale aside from the dichotomous yes/no scale. Often, if you are asked to create a questionnaire that uses a rating scale, the requestor likely wants responses on a Likert-style scale. When using the Likert-style scale, one of the first questions to ask is how many anchor points you want to include. The most common numbers of anchor points are 5 and 7, and the anchors usually range from Disagree (or Strongly Disagree) to Agree (or Strongly Agree). Examples of Likert scale anchors are given in Exhibit 3.2.

Exhibit 3.2

Likert Scales

1 = Strongly unfavorable to the concept

2 = Somewhat unfavorable to the concept

3 = Neutral

4 = Somewhat favorable to the concept

5 = Strongly favorable to the concept

or

1 = Extremely unfavorable to the concept

2 = Strongly unfavorable to the concept

3 = Somewhat unfavorable to the concept

4 = Neutral

5 = Somewhat favorable to the concept

6 = Strongly favorable to the concept

7 = Extremely favorable to the concept

or

1 = Agree

2 = Neutral

3 = Disagree

There are trade-offs between choosing a 5-point scale and a 7-point scale. The 7-point scale provides more choices for the respondent and a more detailed response set for the analyst, which allows for more variability in responses; however, more choices may, in turn, lengthen the time to complete the survey. Response options that are too detailed may require more thought and time to complete: “Was I really somewhat satisfied or just satisfied?” Furthermore, the additional two anchors of a 7-point scale may not contribute substantially to your data, depending on the purpose or intent of the questions you are asking.

On the other hand, having too few choices (i.e., 3 anchors or fewer) can be frustrating to the respondents, especially if none of the anchors truly fits their actual opinion. A 3-point scale is displayed at the bottom of Exhibit 3.2. Here, you can see how limiting the smaller scale is in terms of understanding the respondents’ true answers.

Each survey respondent is given a summated score (i.e., the sum of their ratings for all items) based on their responses to the survey. This summated score serves as the “final score” for each respondent. On some scales, you may have items that are actually reversed from the normal direction of the scale based on the question you’re asking. These are called reverse scored items. You will need to reverse the response value for each of these items before summing the total. When using reverse scored items, you will need to recalculate the directionality of the item to ensure it is appropriately scored. In Exhibit 3.3, Item 1 has a negative response associated with a higher numeric scale value; however, Item 2 has a negative response associated with a lower numeric scale value. The scoring of Item 1 would have to be reversed before it could be summed with Item 2.
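The reverse-scoring and summing steps can be sketched in a few lines. This is a minimal illustration, assuming a 5-point scale coded 1 to 5; the function names, item names, and ratings are hypothetical and are not taken from Exhibit 3.3.

```python
def reverse_score(value, scale_min=1, scale_max=5):
    """Flip a rating so the high and low ends of the scale swap."""
    return scale_min + scale_max - value

def summated_score(responses, reversed_items, scale_min=1, scale_max=5):
    """Sum one respondent's ratings after reversing the flagged items.

    responses: dict mapping item name -> rating
    reversed_items: set of item names that are reverse scored
    """
    total = 0
    for item, rating in responses.items():
        if item in reversed_items:
            rating = reverse_score(rating, scale_min, scale_max)
        total += rating
    return total

# Hypothetical respondent on a 5-point scale; item1 is reverse scored.
respondent = {"item1": 5, "item2": 2, "item3": 4}
print(summated_score(respondent, reversed_items={"item1"}))  # 1 + 2 + 4 = 7
```

Reversal uses the scale's own endpoints, so the same helper works for a 7-point scale by passing `scale_max=7`.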

Bipolar questions offer positive, neutral, and negative choices. Whenever you are dealing with bipolar answers, it is important to have an odd number of choices, no matter how large the scale, so that a neutral midpoint exists. This becomes even more important when using a Likert scale with qualitative data. With an even number of choices, the respondent is forced to decide whether they lean more toward the side of agreement or disagreement for each item. Exhibit 3.4 illustrates bipolar and unidirectional Likert scales.

In Exhibit 3.4, the first bipolar scale is designed to allow respondents to agree, disagree, or remain neutral about their level of satisfaction with their recent bonus check. The second unidirectional scale is designed to only capture positive responses. In this instance, it is probably safe to say everyone surveyed would feel at least somewhat positive to get a surprise bonus check for $5,000; so, the scale only begins at a neutral response. These unidirectional scales are easily converted to quantitative scales if they are designed correctly, which allows for a rigorous analysis.

If the bipolar scale only had 4 points, we would not have been able to see neutral responses; however, often less rigorous surveyors will simply average the responses to artificially create a neutral response, as shown in Exhibit 3.5. This is inaccurate and should be avoided.

What Is to Be Measured?

Researchers have the benefit of selecting their own measuring system; however, the first question they should ask themselves is “What should I measure?” Answering this question is not as simple as it may seem. Defining the concept you want to measure and figuring out how best to measure it are both complex tasks, in part because there is often more than one way of measuring a single concept. Further, true measurement of concepts requires a process of precisely assigning scores or numbers to the attributes of people or objects. Precise measurement in business research requires a careful conceptual definition, an operational definition, and a system of consistent rules for assigning numbers or scales.

  • Conceptual definitions: Before the measurement process can occur, the researcher must identify and define the concepts relevant to the problem. A concept (or construct) is a generalized idea about a class of objects, attributes, occurrences, or processes. Concepts such as brand loyalty, personality, and so on, present noteworthy challenges in terms of definition and measurement.
  • Operational definitions: Concepts must be operationalized to be accurately measured. An operational definition gives meaning to a concept by specifying how that concept is specifically defined in the context of the current research. It specifies what the investigator must do to measure the concept under investigation.
  • Rules of Measurement: A rule is a guide instructing us what to do. An example of a measurement rule might be as follows: “Assign the numerals 1 through 7 to individuals according to how brand loyal they are. If the individual is an extremely brand loyal individual, assign a 1; if the individual demonstrates no brand loyalty, assign a 7.” Operational definitions help the researcher specify the rules for assigning numbers.

Attitude Measurement

One of the most common reasons for collecting survey data is to measure employee or consumer attitudes. Due to the importance of this application, the next few pages are dedicated to this topic.

To measure attitude effectively, we first need to operationalize the term “attitude.” An attitude is an enduring and consistent disposition, thought, or feeling about someone or something (a person, event, or object) that is expressed in a person’s consequent behavior or manner.

Attitudes are composed of three components:

  • Affective component: Reflects an individual’s general feelings toward an object.
  • Cognitive component: Represents one’s awareness of, and knowledge about, an object.
  • Behavioral component: Reflects behavioral expectations, such as buying intentions.

Attitudes are considered latent constructs, or variables that are not directly observable, but measurable by an indirect means, such as verbal expression or overt behavior. Obtaining verbal statements from respondents generally requires that the respondent perform a task such as ranking, rating, sorting, or making a choice or a comparison.

The most common techniques for measuring attitudes include the following:

  • Ranking: Requires that the respondent rank order a small number of objects in overall performance, based on some characteristic or stimulus. Consumers often rank order their preferences. An ordinal scale may be used to determine respondent attitudes toward a set of objects or attributes by asking the respondents to rank order them from most preferred to least preferred. See Exhibit 3.6.

  • In paired comparisons, the respondents are presented with two objects at a time and are asked to pick the one they prefer. Ranking objects with respect to one attribute is not difficult if only a few products are compared, but as the number of items n increases, the number of comparisons grows quadratically, as n(n−1)/2. If the number of comparisons is too great, respondents may experience survey fatigue and eventually fail to make careful and meaningful discriminations among objects.
  • Rating: Asks the respondent to estimate the magnitude of a characteristic, or quality, that an object possesses. The respondent’s position on a scale is where he or she would rate an object. See Exhibit 3.7.

  • Sorting: The respondent might be presented with several concepts typed on cards and be asked to arrange the cards into several piles, or otherwise classify or categorize the concepts. Sorting tasks require that respondents indicate their attitudes or beliefs by arranging items. A commonly used sorting technique is R.H. Bruskin’s Association–Identification–Measurement (A.I.M.) technique, which measures how well customers associate and identify elements of advertising with a product.

The Constant Sum Sorting Scale is a technique wherein the respondents are asked to allocate a constant sum of units, such as points, dollars, chips, or chits among the stimulus objects according to some specified criterion. In other words, a Constant Sum Sorting Scale is a scaling technique that involves the assignment of a fixed number of units to each attribute of the object, reflecting the importance a respondent attaches to a given object. This Constant Sum Sorting Scale works best with respondents having a higher education level; the results will approximate interval measures. An example of a Constant Sum Sorting Scale is given in Exhibit 3.8.

Exhibit 3.8

Constant Sum Sorting Scale

Divide 100 points among the following brands according to your preference for each brand:

Brand A _______________

Brand B _______________

Brand C _______________
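A constant sum response is only usable if the allocated units add up to the required total, so it can help to check each response before analysis. The following is a minimal sketch under that assumption; the function names and the sample allocation are hypothetical.

```python
def validate_constant_sum(allocation, total=100):
    """Check that a respondent's allocation is usable.

    allocation: dict mapping brand name -> points assigned
    Returns True when every value is non-negative and the values
    add up to `total`.
    """
    if any(points < 0 for points in allocation.values()):
        return False
    return sum(allocation.values()) == total

def preference_shares(allocation):
    """Convert points to each brand's share of the total."""
    total = sum(allocation.values())
    return {brand: points / total for brand, points in allocation.items()}

# Hypothetical response to a 100-point question about three brands
response = {"Brand A": 50, "Brand B": 30, "Brand C": 20}
if validate_constant_sum(response):
    print(preference_shares(response))
```

Responses that fail the check would typically be flagged for follow-up rather than silently rescaled, though that choice is left to the researcher.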

  • Choice: Choosing between two or more alternatives is another type of measurement. It is assumed that the chosen object is preferred over the other. See Exhibit 3.9.
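To see how quickly a paired-comparison task grows, the count n(n−1)/2 can be checked by enumerating the pairs directly. This is a small illustrative sketch; the function name and the brand labels are hypothetical.

```python
from itertools import combinations

def paired_comparisons(objects):
    """Return every unordered pair a respondent would have to judge."""
    return list(combinations(objects, 2))

brands = ["A", "B", "C", "D"]   # hypothetical objects
pairs = paired_comparisons(brands)
print(len(pairs))   # n*(n-1)/2 = 4*3/2 = 6

# The count grows quickly as objects are added.
for n in (5, 10, 20):
    print(n, n * (n - 1) // 2)
```

With 20 objects a respondent would face 190 judgments, which illustrates why paired comparisons are usually limited to a small set of objects.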

Of the techniques for measuring attitudes listed above, Rating Scales are perhaps the most common form of measuring attitudes. Some examples of rating scales are as follows.

  • Simple attitude rating scale: In its most basic form, attitude scaling asks that an individual agree with a statement or respond to a single question. This type of self-rating scale classifies respondents into one of two categories, giving it the same limited properties of a nominal scale; however, such scales are beneficial for questionnaires that are extremely long, and for various other reasons. Many simplified scales are merely checklists.
  • Continuum attitude rating scale: Most attitude theorists believe that attitudes vary along a continuum. The purpose of an attitude scale is to find out an individual’s position on the continuum. These simple scales do not allow for making fine distinctions in attitudes, but several other scales have been developed that do provide more precise measurement.
  • Category rating scale: A category scale is a more sensitive measure than a scale having only two response categories since it provides more information. The way a question is worded is an extremely important factor in the usefulness of these rating scales.
  • Summated rating scale: The Likert scale is an extremely popular means for measuring attitudes using a summated approach. With the Likert scale, respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully constructed statements toward the attitudinal object. Their responses to these questions are summed.

To measure attitude, researchers assign scores or weights, which are not printed on the questionnaire, to the answers. Strong agreement, for example, might indicate the most favorable attitudes on whatever question or statement is being presented about an object or topic. For this response, the weight of five would be assigned to indicate “Strongly Agree.” If, in this same question set, a negative question or statement were presented about the same object or topic, the weights would then be reversed and a response of “Strongly Disagree” would be assigned the weight of five. The total score is the summation of the weights assigned to an individual’s total responses.

In the Likert procedure, many statements must first be generated to assess or describe a certain construct. Once the initial item list has been created, an item analysis is then performed to determine which of those initial items are the strongest. The strongest items are retained for the final scale. The purpose of the item analysis is to ensure the items retained are the strongest predictors of positive or negative responses, and therefore, truly discriminate among those with positive and negative attitudes. Items that are poor, because they lack clarity or elicit mixed response patterns, are eliminated from the final statement list. Questions with no variation are also removed. This step in the design of a questionnaire is too often neglected by business researchers, but is essential to truly ensuring your questionnaire is as strong as possible.
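The chapter does not name a specific statistic for the item analysis. One common choice, assumed here, is the corrected item-total correlation: each item is correlated with the sum of the remaining items, and items with no variation or a weak correlation are dropped. The cutoff of 0.3, the function names, and the pilot data below are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def item_analysis(data, min_r=0.3):
    """Return the names of the items worth retaining.

    data: dict mapping item name -> list of ratings (one per respondent,
          already reverse scored where needed)
    min_r: assumed cutoff for the corrected item-total correlation
    """
    retained = []
    for item, ratings in data.items():
        if len(set(ratings)) == 1:
            continue  # no variation: the item cannot discriminate
        # Total of all *other* items for each respondent
        rest = [
            sum(data[other][i] for other in data if other != item)
            for i in range(len(ratings))
        ]
        if pearson(ratings, rest) >= min_r:
            retained.append(item)
    return retained

# Hypothetical pilot data: 5 respondents, 4 candidate items
pilot = {
    "q1": [5, 4, 2, 1, 3],
    "q2": [4, 5, 1, 2, 3],
    "q3": [3, 3, 3, 3, 3],   # no variation
    "q4": [1, 5, 4, 2, 5],   # mixed response pattern
}
print(item_analysis(pilot))  # ['q1', 'q2']
```

Here q3 is removed because every respondent gave the same rating, and q4 is removed because its ratings are unrelated to the total of the other items.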

Exhibit 3.10 provides examples of wording on rating scales for various attributes.

Sometimes, the results are displayed graphically to provide a quick overall profile of the findings, as shown in Exhibit 3.11. In this example, an 11-point Likert rating instrument was used to compare the attitudes of consumers to two different airlines (indicated by either a solid or broken line). The horizontal lines indicate that there were 14 questions, across which the two airline profiles are graphed. The ticks on the lines correspond to the rating values from one to eleven (left to right). Positive attitudes are indicated on the left; conversely, negative attitudes are indicated on the right.

  • Semantic differential: The semantic differential is a series of attitude scales. This popular technique for measuring attitude consists of the identification of a person, work group, company, brand, store, or other subject, followed by a series of 7-point bipolar rating scales. Bipolar adjectives such as “good” and “bad” serve as anchors on either end (or pole) of the scale. The respondent makes repeated judgments of the construct under investigation on each of the scales. Business researchers have found the semantic differential versatile and have modified the use of the scale for business applications. Replacing the bipolar adjectives with descriptive phrases is an adaptation in image studies.

    A weight is assigned to each position on the rating scale. Traditionally, scores are 7, 6, 5, 4, 3, 2, 1 or +3, +2, +1, 0, −1, −2, −3. Many business researchers assume that the semantic differential provides interval data, but some critics argue that the data has only ordinal properties since the weights are arbitrary.
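    The two traditional weightings carry the same information: subtracting the midpoint from a 7-to-1 score gives the corresponding +3 to −3 score. A minimal sketch, assuming an odd number of scale positions; the function name and the sample profile are hypothetical.

```python
def to_centered(score, points=7):
    """Map a 1..points score to a scale centered on zero.

    Assumes `points` is odd, so that a true midpoint exists.
    """
    midpoint = (points + 1) // 2
    return score - midpoint

# Hypothetical profile of one brand on three bipolar scales
# (7 = most favorable pole, 1 = least favorable pole)
profile = {"good-bad": 6, "modern-old-fashioned": 4, "friendly-unfriendly": 2}
centered = {scale: to_centered(s) for scale, s in profile.items()}
print(centered)  # the values are 2, 0, and -2
```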

  • Numerical scales: Numerical scales have numbers as response options—rather than “semantic space” or verbal descriptions. If the scale items have five response positions, the scale is called a 5-point numerical scale. The numerical scale utilizes bipolar adjectives in the same manner as the semantic differential.
  • Stapel scales: Modern versions of the Stapel scale place a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives. For example, you may have a scale that has one extreme labeled as “Low Quality,” and at another extreme, the scale labeled as “High Quality.” The advantages and disadvantages, as well as the results of a Stapel scale, are very similar to those for a semantic differential; however, the Stapel scale tends to be easier to conduct and administer.
  • Graphic rating scales: A graphic rating scale presents respondents with a graphic continuum. The respondents can choose any point on the continuum to indicate their attitude. Typically, the respondent’s score is determined by measuring the length (in millimeters) from one end of the continuum to the point marked by the respondent.

The graphic scale has the advantage of allowing the researchers to choose any interval they wish for the purposes of scoring. The disadvantage of the graphic scale is that there are no standard answers.
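Scoring a graphic rating scale amounts to converting the measured distance into whatever interval the researcher has chosen. The sketch below assumes a printed line of 100 mm and a default scoring interval of 0 to 10; both values, and the function name, are illustrative assumptions rather than anything prescribed by the chapter.

```python
def graphic_score(mark_mm, line_mm=100.0, scale_min=0.0, scale_max=10.0):
    """Convert a mark's distance from the left end into a score.

    mark_mm: distance of the respondent's mark from the left end
    line_mm: total printed length of the continuum (assumed 100 mm)
    scale_min, scale_max: the scoring interval chosen by the researcher
    """
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark lies outside the printed line")
    fraction = mark_mm / line_mm
    return scale_min + fraction * (scale_max - scale_min)

print(graphic_score(73.0))                            # scored on 0 to 10
print(graphic_score(73.0, scale_min=1, scale_max=7))  # scored on 1 to 7
```

The same mark yields different numbers under different intervals, which is why the chosen interval should be reported alongside the results.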

A frequently used variation on the graphic scale design is the scale ladder; this and other picture or graphic response options enhance communication with respondents.

  • Thurstone equal-appearing interval scale: The construction of the Thurstone scale is a rather complex process that requires two stages. The first stage is a ranking operation performed by judges, who assign scale values to attitudinal statements. The second stage consists of asking subjects to respond to the attitudinal statements.
  • Behavioral differential scales: To measure the behavioral component of an attitude requires specific scales. The behavioral component of an attitude involves the behavioral expectation of an individual toward an attitude object. Category scales to measure the behavioral component of an attitude ask a respondent the “likelihood of” or “intention to” perform some future action (e.g., make a purchase). The wording of statements used in these cases often includes phrases such as “I would recommend,” “I would write,” or “I would buy,” to indicate behavioral tendencies. An example of this type of scale is the behavioral differential scale (see Exhibit 3.12). The behavioral differential instrument has been developed for measuring the behavioral intentions of subjects toward any object or category of objects. A description of the object to be judged is placed on the top of a page, and the subjects indicate their behavioral intentions toward this object on a series of scales.

A table summary comparison of the rating scales is provided in Exhibit 3.13.

Types of Survey Questions

  • Open-Ended Response Question

A question that poses some problem and asks the respondent to answer in his or her own words.

What things do you like most about your job?

What names of local banks can you think of offhand?

What comes to mind when you look at this advertisement?

Do you think that there are some ways in which life in the United States is getting worse? Please explain why you feel this way.

  • Fixed-Alternative Question

A question in which the respondent is given specific limited alternative responses and asked to choose the one closest to his or her own viewpoint.

Did you work overtime and/or did you work at more than one job this past week?

Yes ____ No ____

Compared to ten years ago, would you say that the quality of most products made in Japan is higher, about the same, or not as good?

Higher ____ About the same ____ Not as good ____

How much of your shopping for clothing and household items do you do in discount stores? Would you say:

All ____

Most ____

About half ____

About one-quarter ____

Less than one-quarter ____

In management, is there a useful distinction between what is legal and what is ethical?

Yes ____    No ____

In Aesop’s fable “The Ant and the Grasshopper,” the ant spent his time working and planning for the future, while the grasshopper lived for the moment and enjoyed himself. Which are you more like?

The ant ____ The grasshopper ____

  • Simple-Dichotomy Question

A question that requires the respondent to choose one of two dichotomous alternatives.

Did you make any long-distance calls last week?

Yes ____ No ____

  • Determinant-Choice Question

A type of fixed-alternative question that requires a respondent to choose one (and only one) response from among several possible alternatives.

Please give us some information about your flight. In which section of the aircraft did you sit?

First class ____ Business class ____ Coach class ____

  • Graphic Rating Scale

A measure of attitude consisting of a graphic continuum that allows respondents to rate an object by choosing any point on the continuum.

Please evaluate each attribute in terms of how important it is to you by placing an “X” at the position on the horizontal line that most reflects your feelings.

Seating comfort    Not Important ____________________ Very Important

In-flight meal     Not Important ____________________ Very Important

Air fare           Not Important ____________________ Very Important

  • Graphic Rating Scale emphasizing Pictorial Visual Communications

  • Stapel Scale

For this question, select the positive numbers for words you think accurately describe the supervisor. Larger positive numbers indicate greater accuracy of the word in describing the supervisor. Select the negative numbers for words you think do not accurately describe the supervisor. Larger negative numbers indicate less accuracy of the word in describing the supervisor. Therefore, select positive numbers for words that you think are very accurate, and select negative numbers for words that you think are very inaccurate.

  • Frequency-Determination Question

A type of fixed-alternative question that asks about the general frequency of an occurrence.

How frequently do you watch the television channel, MTV?

Every day ____

5–6 times a week ____

2–4 times a week ____

Once a week ____

Less than once a week ____

Never ____

  • Behavioral Questions

Below are examples of behavioral questions. Typical behavioral questions include “I” statements or phrases.

I would write a letter to my Congressman or other government official in support of this company if it were in a dispute with the government.

  • Extremely likely
  • Very likely
  • Somewhat likely
  • Neither likely nor unlikely (i.e., about 50–50 chance)
  • Somewhat unlikely
  • Very unlikely
  • Extremely unlikely

How likely is it that you will change jobs in the next six months?

  • I definitely will change.
  • I probably will change.
  • I might change.
  • I probably will not change.
  • I definitely will not change.

The U.S. Census Bureau has used a scale of subjective probabilities, ranging from 100 for “absolutely certain” to 0 for “absolutely no chance,” to measure expectations. Management researchers have used the following similar subjective probability scale to estimate the chance of job candidates accepting a position, if they were offered:

____ 100% (Absolutely certain) I will accept

____ 90% (Almost sure) I will accept

____ 80% (Very big chance) I will accept

____ 70% (Big chance) I will accept

____ 60% (Not so big a chance) I will accept

____ 50% (About even) I will accept

____ 40% (Smaller chance) I will accept

____ 30% (Small chance) I will accept

____ 20% (Very small chance) I will accept

____ 10% (Almost certainly not) I will accept

____ 0% (Certainly not) I will accept

Selecting a Measurement Scale

There is no single “best” scale that applies to all research projects. The scale you choose will be a function of the nature of the attitudinal object being measured, the manager’s defined problem, and/or the linkages to other choices that have already been made (e.g., telephone survey vs. mail survey). There are several issues that will be helpful to consider:

  • Is a ranking, sorting, rating, or choice technique best? The answer to this question is largely determined by the definition of the problem, and especially by the desired type of statistical analysis.
  • Should a monadic or comparative scale be used? If a scale is other than a ratio scale, the researcher must decide whether to use a standard of comparison or not. A monadic rating scale uses no such comparison; it asks a respondent to rate a single concept in isolation. A comparative rating scale asks a respondent to rate a concept in comparison with a benchmark. In many cases, “the ideal situation” presents a reference for comparison with the actual situation.
  • What type of category labels, if any, will be used for the rating scale? We have discussed verbal labels, numerical labels, and unlisted choices. The maturity and educational levels of the respondents and the required statistical analysis will influence this decision.
  • How many scale categories or response positions are required to accurately measure an attitude? The researcher must determine the number of meaningful positions that is best for each specific project.
  • Should a balanced or unbalanced rating scale be used? The fixed-alternative format may be balanced, with a neutral or indifferent point at the center of the scale, or unbalanced. Unbalanced scales are used when responses are expected to be skewed to one end of the scale; an unbalanced scale may eliminate this “end piling.”
  • Should respondents be given a forced-choice scale or a nonforced-choice scale? In many situations, a respondent has not formed an attitude toward a concept, and simply cannot provide an answer. If many respondents in the sample are expected to be unaware of the attitudinal object under investigation, this problem may be eliminated by using a nonforced-choice scale that provides a “no opinion” category. The argument for forced choice is that people really do have attitudes, even if they are unfamiliar with the attitudinal object.
  • Should a single measure or an index measure be used? The researcher’s conceptual definition will be helpful in making this choice. The researcher has many scaling options. The choice is generally influenced by what is planned for the later stages of the research project.