Chapter 2

Types of Variables and Measurement and Accuracy Scales

Abstract

This chapter highlights the importance of defining variable measurement scales when designing research and when treating and analyzing data. The differences between metric or quantitative variables and nonmetric or qualitative variables are established, as well as the circumstances in which each type of variable should be used, based on the research objectives. Each type of variable calls for the most suitable statistical treatment.

Keywords

Types of variables; Metric or quantitative variable; Nonmetric or qualitative variable; Scales of measurement; Nominal; Ordinal; Scales of accuracy; Discrete; Continuous; Dichotomous or binary

And God said: π, i, 0, and 1, and the Universe was created.

Leonhard Euler

2.1 Introduction

A variable is a characteristic of the population (or sample) being studied, and it is possible to measure, count, or categorize it.

The type of variable collected is crucial in the calculation of descriptive statistics and in the graphical representation of results, as well as in the selection of the statistical methods that will be used to analyze the data.

According to Freund (2006), statistical data are the raw material of statistical research and arise whenever measurements are taken or observations are recorded.

This chapter discusses the existing types of variables (metric or quantitative and nonmetric or qualitative), as well as their respective scales of measurement (nominal and ordinal for qualitative variables, and interval and ratio for quantitative variables). Classifying the types of variables based on the number of categories and scales of accuracy is also discussed (binary and polychotomous for qualitative variables and discrete and continuous for quantitative variables).

2.2 Types of Variables

Variables can be classified as nonmetric, also known as qualitative or categorical, or metric, also known as quantitative (Fig. 2.1). Nonmetric or qualitative variables represent the characteristics of an individual, object, or element that cannot be measured or quantified. The answers are given in categories. In contrast, metric or quantitative variables represent the characteristics of an individual, object, or element that result from a count (a finite or countable set of values) or from a measurement (any value within an interval of real numbers).

Fig. 2.1 Types of variables.

2.2.1 Nonmetric or Qualitative Variables

As we will study in Chapter 3, the representation of the characteristics of nonmetric or qualitative variables can be done through frequency distribution tables or in a graphical way, without having to calculate the measures of position, dispersion, and shape. The only exception is the mode, a measure that provides the variable's most frequent value, and it can also be applied to nonmetric variables.

Imagine that a questionnaire will be used to collect data on family income from a sample of consumers, based on certain salary ranges. Table 2.1 shows the variable categories.

Table 2.1

Family Income Ranges × Social Class

| Class | Minimum Wage Salaries (MWS) | Family Income ($) |
|-------|-----------------------------|-------------------|
| A | Above 20 MWS | Above $ 15,760.00 |
| B | From 10 to 20 MWS | From $ 7880.00 to $ 15,760.00 |
| C | From 4 to 10 MWS | From $ 3152.00 to $ 7880.00 |
| D | From 2 to 4 MWS | From $ 1576.00 to $ 3152.00 |
| E | Up to 2 MWS | Up to $ 1576.00 |

Note that both variables are qualitative, since the data are represented by ranges. However, researchers very commonly classify them incorrectly, especially when the variable has numerical values in the data. In this case, it is only possible to calculate the frequencies, and not summary measures such as the mean and standard deviation.

The frequencies obtained for each income range can be seen in Table 2.2.

Table 2.2

Frequencies × Family Income Ranges

| Frequencies | Family Income ($) |
|-------------|-------------------|
| 10% | Above $ 15,760.00 |
| 18% | From $ 7880.00 to $ 15,760.00 |
| 24% | From $ 3152.00 to $ 7880.00 |
| 36% | From $ 1576.00 to $ 3152.00 |
| 12% | Up to $ 1576.00 |

A common error found in papers that use qualitative variables represented by numbers is the calculation of the sample mean, or any other summary measure. Typically, the researcher takes the midpoint of each range and assumes that this value corresponds to the actual mean of the consumers in that range. However, since the data are not necessarily distributed uniformly or symmetrically around the midpoint of each range, this assumption is often violated.

For us to be able to calculate summary measures such as the mean and standard deviation, the variable being studied must necessarily be quantitative.
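To make this pitfall concrete, the following minimal Python sketch works with the frequencies from Table 2.2. The names `income_ranges`, `modal_range`, and `midpoint_mean` are illustrative assumptions, as is the lower limit of 0 for the lowest range. It suggests that the frequencies and the modal range are always well defined, whereas a midpoint-based mean cannot even be formed without inventing an upper limit for the open-ended top range.

```python
# Relative frequencies from Table 2.2; None marks an open-ended upper limit.
income_ranges = [
    # (lower limit, upper limit, relative frequency)
    (15760.00, None, 0.10),       # Above $ 15,760.00
    (7880.00, 15760.00, 0.18),
    (3152.00, 7880.00, 0.24),
    (1576.00, 3152.00, 0.36),
    (0.00, 1576.00, 0.12),        # Up to $ 1576.00 (lower limit of 0 is assumed)
]


def modal_range(ranges):
    """Return the (lower, upper) limits of the most frequent range."""
    lower, upper, _ = max(ranges, key=lambda r: r[2])
    return lower, upper


def midpoint_mean(ranges):
    """Midpoint-based 'mean'. It is only defined when every range is bounded,
    and it assumes observations are centered within each range."""
    if any(upper is None for _, upper, _ in ranges):
        raise ValueError("open-ended range has no midpoint")
    return sum(freq * (lower + upper) / 2 for lower, upper, freq in ranges)


print(modal_range(income_ranges))      # the mode is always available
try:
    midpoint_mean(income_ranges)
except ValueError as err:
    print("mean not computable:", err)
```

Even when every range is bounded, the value returned by `midpoint_mean` depends on the centering assumption and should not be reported as the sample mean.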

2.2.2 Metric or Quantitative Variables

Quantitative variables can be represented graphically (line charts, scatter plots, histograms, stem-and-leaf plots, and boxplots), through measures of position or location (mean, median, mode, quartiles, deciles, and percentiles), through measures of dispersion or variability (range, average deviation, variance, standard deviation, standard error, and coefficient of variation), or through measures of shape such as skewness and kurtosis, as we will study in Chapter 3.

These variables can be discrete or continuous. Discrete variables take on a finite or countable set of values that frequently come from a count, for example, the number of children in a family (0, 1, 2…). Conversely, continuous variables take on values in an interval of real numbers, for example, an individual's weight or income.

Imagine a dataset with 20 people's names, age, weight and height, as shown in Table 2.3.

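Table 2.3

Dataset With Information on 20 People

| Name | Age (Years) | Weight (kg) | Height (m) |
|------|-------------|-------------|------------|
| Mariana | 48 | 62 | 1.60 |
| Roberta | 41 | 56 | 1.62 |
| Luiz | 54 | 84 | 1.76 |
| Leonardo | 30 | 82 | 1.90 |
| Felipe | 35 | 76 | 1.85 |
| Marcelo | 60 | 98 | 1.78 |
| Melissa | 28 | 54 | 1.68 |
| Sandro | 50 | 70 | 1.72 |
| Armando | 40 | 75 | 1.68 |
| Heloisa | 24 | 50 | 1.59 |
| Julia | 44 | 65 | 1.62 |
| Paulo | 39 | 83 | 1.75 |
| Manoel | 22 | 68 | 1.78 |
| Ana Paula | 31 | 56 | 1.66 |
| Amelia | 45 | 60 | 1.64 |
| Horacio | 62 | 88 | 1.77 |
| Pedro | 24 | 80 | 1.92 |
| Joao | 28 | 75 | 1.80 |
| Marcos | 49 | 92 | 1.76 |
| Celso | 54 | 66 | 1.68 |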

The data are available in the file VarQuanti.sav. To classify the variables in SPSS (Fig. 2.2), click on Variable View. Note that the variable Name is qualitative (a string), and it is measured on a nominal scale (column Measure). On the other hand, the variables Age, Weight, and Height are quantitative (Numeric), and their level of measurement is Scale. The scales of measurement will be studied in more detail in Section 2.3.

Fig. 2.2 Classification of the variables.
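Outside SPSS, the same distinction can be sketched in a few lines of Python. The example below uses the first five rows of Table 2.3. The helper `classify` is hypothetical and mimics only the string versus numeric split seen in Variable View, not SPSS itself.

```python
from statistics import mean, stdev

# First five rows of Table 2.3: name, age (years), weight (kg), height (m).
people = [
    ("Mariana", 48, 62, 1.60),
    ("Roberta", 41, 56, 1.62),
    ("Luiz", 54, 84, 1.76),
    ("Leonardo", 30, 82, 1.90),
    ("Felipe", 35, 76, 1.85),
]
columns = ["Name", "Age", "Weight", "Height"]


def classify(values):
    """Hypothetical helper: all strings -> 'nominal', otherwise 'scale'."""
    return "nominal" if all(isinstance(v, str) for v in values) else "scale"


for index, column in enumerate(columns):
    values = [row[index] for row in people]
    level = classify(values)
    if level == "scale":
        # Summary measures only make sense for the quantitative columns.
        print(column, level, round(mean(values), 2), round(stdev(values), 2))
    else:
        print(column, level)
```

A rule based on storage type alone would misclassify numerically coded categories, such as the Country codes discussed in Section 2.3.1, which is why the scale of measurement has to be set explicitly by the researcher.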

2.3 Types of Variables × Scales of Measurement

Variables can also be classified according to the level or scale of measurement. Measurement is the process of assigning numbers or labels to objects, people, states, or events, in accordance with specific rules, to represent the quantities or qualities of the attributes. A rule is a guide, a method, or a command that tells the researcher how to measure the attribute. A scale is a set of symbols or numbers, based on a rule, that applies to individuals or to their behaviors or attitudes. An individual's position on the scale is based on whether this individual has the attribute that the scale must measure.

There are several taxonomies in the literature for classifying the scales of measurement of all types of variables (Stevens, 1946; Hoaglin et al., 1983). We will use Stevens's classification because it is simple and widely used, and because its nomenclature is adopted in statistical software.

According to Stevens (1946), the scales of measurement of nonmetric, categorical or qualitative variables can be classified as nominal and ordinal, while the metric or quantitative variables are classified as interval and ratio scales (or proportional), as shown in Fig. 2.3.

Fig. 2.3 Types of variables × scales of measurement.

2.3.1 Nonmetric Variables—Nominal Scale

The nominal scale classifies the units into classes or categories regarding the characteristic represented, not establishing any magnitude or order relationship. It is called nominal because the categories are only differentiated by their names.

We can assign numerical labels to the variable categories, but arithmetic operations such as addition, subtraction, multiplication, and division on these numbers are not meaningful. The nominal scale only allows operations based on counting. For instance, we can count the number of elements in each class or apply hypothesis tests regarding the distribution of the population units across the classes. Thus, most of the usual statistics, such as the mean and standard deviation, do not make sense for qualitative variables on a nominal scale.

As examples of nonmetric variables on nominal scales, we can mention professions, religion, color, marital status, geographic location, or country of origin.

Imagine a nonmetric variable related to the country of origin of 10 large multinational companies. To represent the categories of the variable Country of origin, we can use numbers, assigning value 1 to the United States, 2 to the Netherlands, 3 to China, 4 to the United Kingdom, and 5 to Brazil, as shown in Table 2.4. In this case, the numbers are only labels or tags to help identify and classify objects.

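Table 2.4

Companies and Country of Origin

| Company | Country of Origin |
|---------|-------------------|
| Exxon Mobil | 1 |
| JP Morgan Chase | 1 |
| General Electric | 1 |
| Royal Dutch Shell | 2 |
| ICBC | 3 |
| HSBC Holdings | 4 |
| PetroChina | 3 |
| Berkshire Hathaway | 1 |
| Wells Fargo | 1 |
| Petrobras | 5 |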

This scale of measurement is known as a nominal scale, that is, the numbers are arbitrarily assigned to the object categories, without any kind of order. To represent the behavior of nominal data, we can use descriptive statistics such as frequency distribution tables, bar or pie charts, or the calculation of the mode (Chapter 3).
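As a rough illustration of what is and is not meaningful for nominal codes, the short Python sketch below counts the Country codes from Table 2.4. The variable names are illustrative assumptions. It computes absolute and relative frequencies and the mode, and then shows that an average of the codes can be calculated mechanically even though it carries no meaning.

```python
from collections import Counter
from statistics import mean

# Country codes from Table 2.4, in row order.
country_codes = [1, 1, 1, 2, 3, 4, 3, 1, 1, 5]

absolute = Counter(country_codes)
relative = {code: count / len(country_codes) for code, count in absolute.items()}
mode_code, mode_count = absolute.most_common(1)[0]

print(absolute)      # absolute frequency of each code
print(relative)      # relative frequency of each code
print(mode_code)     # most frequent code

# Python will happily average the codes, but the result is meaningless:
# it would change if the labels had been assigned in a different order.
meaningless = mean(country_codes)
```

Swapping the codes of any two countries leaves the frequencies and the modal country unchanged, but alters `meaningless`, which is one way to see that the mean is not a valid summary here.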

Now, we will discuss how to define labels for qualitative variables on a nominal scale by using the SPSS software (Statistical Package for the Social Sciences). After that, we will be able to construct absolute and relative frequencies tables and charts.

Before generating the dataset, let's define the characteristics of the variables being studied in Variable View. In order to do that, click on the respective tab in the lower left corner of the Data Editor, or double-click on the column var.

The first variable, called Company, is a string, that is, its data are inserted as characters or letters. It was established that the maximum number of characters of the respective variable would be 18. In the column Measure, the scale of measurement of the variable Company is defined, which is nominal.

The second variable, called Country, is numerical, since its data are inserted as numbers. However, the numbers are only used to categorize or label the objects, so the scale of measurement of this variable is also nominal (Fig. 2.4).

Fig. 2.4 Defining the variable characteristics in Variable View.

To insert the data from Table 2.4, we are going to go back to Data View. The information must be typed as shown in Fig. 2.5 (the columns represent the variables and the rows represent the observations or individuals).

Fig. 2.5 Inserting the data found in Table 2.4 into Data View.

Since the variable Country is represented by numbers, it is necessary to assign labels to each variable category, as shown in Table 2.5.

Table 2.5

Categories Assigned to the Countries

| Categories | Country |
|------------|---------|
| 1 | United States |
| 2 | The Netherlands |
| 3 | China |
| 4 | The United Kingdom |
| 5 | Brazil |

In order to do that, we must click on Data → Define Variable Properties… and select the variable Country, according to Figs. 2.6 and 2.7.

Fig. 2.6 Defining labels for each nominal variable category.
Fig. 2.7 Selecting the nominal variable Country.

Since the nominal scale of measurement of the variable Country has already been defined in the column Measure in Variable View, we can see that it already appears correctly in Fig. 2.8. Defining the labels for each category must be done at this moment, and it can also be seen in the same figure.

Fig. 2.8 Defining the labels for the variable Country.

The dataset is now displayed with the label names assigned, as shown in Fig. 2.9. By clicking on the Value Labels button, located on the toolbar, it is possible to toggle between the numerical values of the nominal or ordinal variable and their respective labels.

Fig. 2.9 Dataset with labels.
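The idea behind value labels can also be sketched outside SPSS. The plain-Python analogue below stores the codes from Table 2.4 and the labels from Table 2.5 separately and joins them only for display. The names `value_labels`, `companies`, and `with_labels` are illustrative assumptions and do not correspond to any SPSS command.

```python
# Value labels from Table 2.5.
value_labels = {
    1: "United States",
    2: "The Netherlands",
    3: "China",
    4: "The United Kingdom",
    5: "Brazil",
}

# (Company, Country code) pairs from Table 2.4.
companies = [
    ("Exxon Mobil", 1),
    ("JP Morgan Chase", 1),
    ("General Electric", 1),
    ("Royal Dutch Shell", 2),
    ("ICBC", 3),
    ("HSBC Holdings", 4),
    ("PetroChina", 3),
    ("Berkshire Hathaway", 1),
    ("Wells Fargo", 1),
    ("Petrobras", 5),
]


def with_labels(rows, labels):
    """Replace each numeric code with its label; unknown codes raise KeyError."""
    return [(company, labels[code]) for company, code in rows]


for company, country in with_labels(companies, value_labels):
    print(f"{company:<20}{country}")
```

Keeping the codes and the labels apart means the stored data stay compact while the display remains readable, which is the same trade-off the Value Labels toggle offers.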

Having structured the dataset, it is possible to construct absolute and relative frequencies tables and charts on SPSS.

The descriptive statistics to represent the behavior of a single qualitative variable and of two qualitative variables will be studied in Chapters 3 and 4, respectively.

2.3.2 Nonmetric Variables—Ordinal Scale

A nonmetric variable on an ordinal scale classifies the units into classes or categories regarding the characteristic being represented, establishing an order between the units of the different categories. An ordinal scale is a scale on which data is shown in order, determining a relative position of the classes according to one direction. Any set of values can be assigned to the variable categories, as long as the order between them is respected.

As in the nominal scale, arithmetic operations (sums, subtractions, multiplications, and divisions) between these values do not make any sense. Thus, the application of the usual descriptive statistics is as limited as it is for nominal variables. Since the scale numbers are only meant to classify the categories, the descriptive statistics that can be used for ordinal data are frequency distribution tables, charts (including bar and pie charts), and the mode, as we will study in Chapter 3.

Examples of ordinal variables include consumers' opinion and satisfaction scales, educational level, social class, age group, etc.

Imagine a nonmetric variable called Classification that measures a group of consumers' preference regarding a certain wine brand. The definition of labels for each ordinal variable category can be found in Table 2.6. Value 1 is assigned to the worst classification, value 2 to the second worst, and so on, until value 5, which is the best classification, as shown in this table.

Table 2.6

Consumers' Classification of a Certain Wine Brand

| Value | Label |
|-------|-------|
| 1 | Very bad |
| 2 | Bad |
| 3 | Average |
| 4 | Good |
| 5 | Very good |

Instead of using a scale from 1 to 5, we could have assigned any other numerical scale, as long as the order of classification was respected. Thus, the numerical values do not represent a score of the product's quality; they are only meant to classify it. Consequently, the difference between two values does not represent the difference in the attribute analyzed. These scales of measurement are known as ordinal scales.
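This point can be checked with a small Python sketch. It compares the coding from Table 2.6 with a second, purely hypothetical coding (10, 20, 35, 70, 100) chosen only because it is increasing. Both preserve the order of the categories, yet the gaps between codes differ, which is why differences should not be interpreted.

```python
# Two codings for the categories of Table 2.6. The second one is a
# hypothetical alternative, chosen only because its values are increasing.
labels = ["Very bad", "Bad", "Average", "Good", "Very good"]
coding_a = dict(zip(labels, [1, 2, 3, 4, 5]))
coding_b = dict(zip(labels, [10, 20, 35, 70, 100]))


def ranking(coding):
    """Return the labels sorted from worst to best under a given coding."""
    return sorted(coding, key=coding.get)


# Both codings produce the same order of categories...
same_order = ranking(coding_a) == ranking(coding_b)

# ...but the gaps between neighboring categories depend on the coding.
gap_a = (coding_a["Good"] - coding_a["Average"], coding_a["Bad"] - coding_a["Very bad"])
gap_b = (coding_b["Good"] - coding_b["Average"], coding_b["Bad"] - coding_b["Very bad"])

print(same_order)
print(gap_a, gap_b)
```

Under the first coding the two gaps look equal, while under the second they do not, although the underlying opinions are identical in both cases.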

Fig. 2.10 shows the characteristics of the variables being studied in Variable View on SPSS. The variable Customer is a string (its data are inserted as characters or letters) with a nominal scale of measurement. On the other hand, the variable Classification is numerical (numerical values were assigned to represent the variable categories) with an ordinal scale of measurement.

Fig. 2.10 Defining the variable characteristics in Variable View.

The procedure for defining labels for qualitative variables on an ordinal scale is the same as the one already presented for nominal variables.

2.3.3 Quantitative Variable—Interval Scale

According to Stevens's classification (1946), metric or quantitative variables have data on an interval or ratio scale.

Besides ordering the units based on the characteristic being measured, the interval scale has a constant unit of measure. The origin or zero point of this scale of measurement is arbitrary, and it does not express an absence of quantity.

A classic example of an interval scale is temperature measured in Celsius (°C) or in Fahrenheit (°F). The choice of the zero temperature is arbitrary, and equal temperature differences are determined by identifying equal expansion volumes in the liquid inside the thermometer. Hence, the interval scale allows us to infer differences between the units being measured. However, we cannot state that a value in a specific interval of the scale is a multiple of another one. For instance, assume that two objects are measured at 15°C and 30°C, respectively. Measuring the temperature allows us to determine how much hotter one object is than the other. However, we cannot state that the object at 30°C is twice as hot as the one at 15°C.

The interval scale is invariant under positive linear transformations, that is, transformations of the form y = a + bx with b > 0. So, an interval scale can be transformed into another through a positive linear transformation. Transforming degrees Celsius into degrees Fahrenheit (°F = 32 + 1.8 × °C) is an example of such a transformation.
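The temperature example can be verified numerically. The minimal Python sketch below converts the two temperatures from the text to Fahrenheit. The function name `celsius_to_fahrenheit` is an illustrative assumption. Differences survive the change of scale up to a constant factor, while ratios do not.

```python
def celsius_to_fahrenheit(celsius):
    """Positive linear transformation: F = 32 + (9 / 5) * C."""
    return 9 * celsius / 5 + 32


cold_c, hot_c = 15, 30
cold_f = celsius_to_fahrenheit(cold_c)
hot_f = celsius_to_fahrenheit(hot_c)

# Differences are preserved up to the constant factor 9/5.
diff_c = hot_c - cold_c
diff_f = hot_f - cold_f

# Ratios are not preserved, so "twice as hot" has no meaning on this scale.
ratio_c = hot_c / cold_c
ratio_f = hot_f / cold_f

print(cold_f, hot_f)
print(diff_c, diff_f)
print(ratio_c, round(ratio_f, 2))
```

The ratio is 2 in Celsius but roughly 1.46 in Fahrenheit for the same two objects, so any statement about multiples depends on the arbitrary choice of zero.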

Most descriptive statistics can be applied to data on an interval scale, except statistics that rely on ratios, such as the coefficient of variation.

2.3.4 Quantitative Variable—Ratio Scale

Analogous to the interval scale, the ratio scale orders the units based on the characteristic measured and has a constant unit of measure. On the other hand, the origin (or point zero) is unique and value zero expresses the absence of quantity. Therefore, it is possible to know if a value in a specific interval of the scale is a multiple of another.

Equal ratios between values of the scale correspond to equal ratios between the units measured. Thus, ratio scales are invariant under positive proportional transformations (y = bx with b > 0). For example, if one unit is 1 m high and another is 3 m high, we can say that the latter is three times as high as the former.

Among the scales of measurement, the ratio scale is the most complete, because it allows us to use all arithmetic operations. In addition to this, all the descriptive statistics can be applied to the data of a variable expressed on a ratio scale.
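A brief Python sketch can illustrate why ratio-based statistics suit the ratio scale but not the interval scale. It uses the two heights from the text and, as a labeled assumption, three hypothetical temperatures. The helper `coefficient_of_variation` is illustrative and uses the sample standard deviation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Ratio scale: the two heights from the text, in meters and in centimeters.
heights_m = [1.0, 3.0]
heights_cm = [100 * h for h in heights_m]
ratio_m = heights_m[1] / heights_m[0]
ratio_cm = heights_cm[1] / heights_cm[0]        # same ratio in both units

cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)     # unchanged by the unit change

# Interval scale: hypothetical temperatures in Celsius and in Fahrenheit.
temps_c = [10.0, 20.0, 30.0]
temps_f = [9 * c / 5 + 32 for c in temps_c]
cv_c = coefficient_of_variation(temps_c)
cv_f = coefficient_of_variation(temps_f)         # differs from cv_c

print(ratio_m, ratio_cm)
print(round(cv_m, 4), round(cv_cm, 4))
print(round(cv_c, 4), round(cv_f, 4))
```

The coefficient of variation of the heights does not depend on the unit, whereas for the temperatures it changes with the scale chosen, so it describes the scale rather than the data.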

Examples of variables whose data can be on the ratio scale include income, age, how many units of a certain product were manufactured, and distance traveled.

2.4 Types of Variables × Number of Categories and Scales of Accuracy

Qualitative or categorical variables can also be classified based on the number of categories: (a) dichotomous or binary (dummies), when they only take on two categories; (b) polychotomous, when they take on more than two categories.

On the other hand, metric or quantitative variables can also be classified based on the scale of accuracy: discrete or continuous.

This classification can be seen in Fig. 2.11.

Fig. 2.11 Qualitative variables × Number of categories and Quantitative variables × Scales of accuracy.

2.4.1 Dichotomous or Binary Variable (Dummy)

A dichotomous or binary variable (dummy) can only take on two categories, and the values 0 or 1 are assigned to these categories. Value 1 is assigned when the characteristic of interest is present in the variable and value 0 if otherwise. As examples, we have: smokers (1) and nonsmokers (0), a developed country (1) and an underdeveloped country (0), vaccinated patients (1) and nonvaccinated patients (0).

Multivariate dependence techniques have as their main objective to specify a model that can explain and predict the behavior of one or more dependent variables through one or more explanatory variables. Many of these techniques, including the simple and multiple regression analysis, binary and multinomial logistic regression, regression for count data, and multilevel modeling, among others, can easily and coherently be applied with the use of nonmetric explanatory variables, as long as they are transformed into binary variables that represent the categories of the original qualitative variable. In this regard, a qualitative variable with n categories, for example, can be represented by (n − 1) binary variables.

For instance, imagine a variable called Evaluation, expressed by the categories good, average, or bad. Thus, two binary variables may be necessary to represent the original variable, depending on the researcher’s objectives, as shown in Table 2.7.

Table 2.7

Defining Binary Variables (Dummies) for the Variable Evaluation

| Evaluation | Dummy D1 | Dummy D2 |
|------------|----------|----------|
| Good | 0 | 0 |
| Average | 1 | 0 |
| Bad | 0 | 1 |

Further details about the definition of dummy variables in confirmatory models will be discussed in Chapter 13, including the presentation of the operations necessary to generate them on software such as Stata.
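In the meantime, the coding in Table 2.7 can be sketched in Python. The function `dummy_code` and the list of responses are hypothetical and serve only as an illustration. The sketch maps a variable with n categories to (n − 1) binary variables, with the reference category receiving zeros in all of them.

```python
def dummy_code(categories, reference):
    """Map each category to (n - 1) 0/1 dummies; the reference gets all zeros."""
    others = [c for c in categories if c != reference]
    return {c: tuple(int(c == other) for other in others) for c in categories}


# "Good" is the reference category; D1 flags "Average" and D2 flags "Bad",
# which reproduces the coding in Table 2.7.
coding = dummy_code(["Good", "Average", "Bad"], reference="Good")

evaluations = ["Bad", "Good", "Average", "Good"]    # hypothetical responses
encoded = [coding[e] for e in evaluations]

print(coding)
print(encoded)
```

Choosing a different reference category changes which rows are all zeros but not the information carried by the dummies, so the choice can follow the researcher's objectives.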

2.4.2 Polychotomous Variable

A qualitative variable can take on more than two categories and, in this case, it is called polychotomous. As examples, we can mention social classes (lower, middle, and upper) and educational levels (elementary school, high school, college, and graduate school).

2.4.3 Discrete Quantitative Variable

As described in Section 2.2.2, discrete quantitative variables take on a finite or countable set of values that frequently come from a count, for example, the number of children in a family (0, 1, 2…), the number of senators elected, or the number of cars manufactured in a certain factory.

2.4.4 Continuous Quantitative Variable

Continuous quantitative variables, on the other hand, are those whose possible values lie in an interval of real numbers and result from a metric measurement, for example, weight, height, or an individual's salary (Bussab and Morettin, 2011).

2.5 Final Remarks

Once treated and analyzed through suitable statistical techniques, data are transformed into information and can support the decision-making process.

These data can be metric (quantitative) or nonmetric (categorical or qualitative). Metric data represent the characteristics of an individual, object, or element that result from a count or measurement (patients' weight, age, and interest rate, among other examples). In the case of nonmetric data, these characteristics cannot be measured or quantified (answers such as yes or no, and educational levels, among others).

According to Stevens (1946), the scales of measurement of nonmetric, categorical or qualitative variables can be classified as nominal and ordinal, while the metric or quantitative variables are classified on interval and ratio scales (or proportional).

A lot of data can be collected in a metric as well as in a nonmetric way. Assume that we wish to assess the quality of a certain product. In order to do that, scores from 1 to 10 can be assigned to certain attributes, and a Likert scale can be defined from information that has already been established. In general, and whenever possible, questions should be defined in a quantitative way, so that the researcher does not lose information.

For Fávero et al. (2009), generating the questionnaire and defining the scales of measurement of the variables will depend on several aspects, including the research objectives, the modeling to be adopted to achieve such objectives, the average time to apply the questionnaire, and how the data will be collected. A dataset can present variables on metric and on nonmetric scales; it does not need to be restricted to only one type of scale. This combination can lead to interesting research and, together with suitable modeling, can generate information aimed at assisting the decision-making process.

The type of variable collected is crucial in the calculation of descriptive statistics and in the graphical representation of results, as well as in the selection of the statistical methods that will be used to analyze the data.

2.6 Exercises

1) What is the difference between qualitative and quantitative variables?
2) What are scales of measurement and what are the main types of scales? What are the differences between them?
3) What is the difference between discrete and continuous variables?
4) Classify the variables below according to the following scales: nominal, ordinal, binary, discrete, or continuous.
    a. A company's revenue.
    b. A performance rank: good, average, and bad.
    c. Time to process a part.
    d. Number of cars sold.
    e. Distance traveled in km.
    f. Municipalities in the Greater Sao Paulo.
    g. Family income ranges.
    h. A student's grades: A, B, C, D, O, or R.
    i. Hours worked.
    j. Region: North, Northeast, Center-West, South, and Southeast.
    k. Location: Sao Paulo or Seoul.
    l. Size of the organization: small, medium, and large.
    m. Number of bedrooms.
    n. Classification of risk: high, average, speculative, substantial, in moratorium.
    o. Married: yes or no.
5) A researcher wishes to study the impact of physical aptitude on the improvement of productivity in an organization. How would you describe the binary variables to be included in this model, so that the variable physical aptitude could be represented? The possible variable categories are: (a) active and healthy; (b) acceptable (could be better); (c) not good enough; (d) sedentary.

References

Bussab W.O., Morettin P.A. Estatística básica. 7th ed. São Paulo: Saraiva; 2011.

Fávero L.P., Belfiore P., Silva F.L., Chan B.L. Análise de dados: modelagem multivariada para tomada de decisões. Rio de Janeiro: Campus Elsevier; 2009.

Freund J.E. Estatística aplicada: economia, administração e contabilidade. 11th ed. Porto Alegre: Bookman; 2006.

Stevens S.S. On the theory of scales of measurement. Science. 1946;103(2684):677–680.