Chapter 14

Binary and Multinomial Logistic Regression Models

Abstract

This chapter presents the binary and multinomial logistic regression models, establishing the circumstances under which each can be used. The objective is to estimate the probability of occurrence of an event by means of the maximum likelihood method. The results of the statistical tests pertinent to the logistic models are evaluated, and confidence intervals of the model parameters are constructed for the purpose of prediction. The chapter also covers sensitivity analysis and the interpretation of the sensitivity curve, the ROC curve, the cutoff concept, overall model efficiency, sensitivity, and specificity. The binary and multinomial regression models are also prepared in Microsoft Office Excel®, Stata Statistical Software®, and IBM SPSS Statistics Software®, and their results are interpreted.

Keywords

Binary logistic regression; Multinomial logistic regression; Probability of event occurrence; Odds; Estimation by maximum likelihood; Cutoff; Sensitivity analysis; Overall model efficiency; Sensitivity; Specificity; Excel; Stata and SPSS

In the fields of observation, chance favors only the mind that is prepared.

Louis Pasteur

14.1 Introduction

The logistic regression models, even though quite useful and easy to apply, remain underused in many areas of human knowledge. Although the development of software and the increase in computer processing capability have made their application more direct, many researchers are still unaware of their usefulness and, above all, of the conditions for their correct use.

Unlike the traditional regression technique estimated by the ordinary least squares method, in which the dependent variable is quantitative and certain assumptions must be met, as we studied in the previous chapter, the logistic regression techniques are used when the phenomenon to be studied (outcome variable) is qualitative and, therefore, represented by one or more dummy variables, depending on the number of possible answers (categories) of this dependent variable.

Imagine, for example, that a researcher is interested in evaluating the probability of heart attacks in financial market executives, based on their physical characteristics (weight, waistline), their eating habits, and their health habits (physical exercise, smoking). A second researcher wants to evaluate the chance that consumers who acquire durable goods in a given period will go into default, as a function of the income, marital status, and educational level of each. Notice that heart attack and default are the dependent variables in the two cases, and the corresponding events may or may not occur as a function of the explanatory variables inserted into the respective models; each phenomenon is therefore represented by a qualitative dichotomous variable. Our intent is to estimate the probability of occurrence of these phenomena and, therefore, we will use the binary logistic regression.

Imagine now that a third researcher is interested in studying the probability that small- and medium-sized companies obtain credit, as a function of their financial and operational characteristics. It is known that each company can receive unrestricted credit, restricted credit, or no credit at all. In this case, the dependent variable that represents the phenomenon is also qualitative, but offers three possible answers (categories). Therefore, to estimate the probability of occurrence of each of the proposed alternatives, we should use the multinomial logistic regression.

Thus, if a phenomenon under study presents itself by means of two, and only two, categories, it will be represented by a single dummy variable. The first category will be the reference and indicate the event of noninterest (dummy = 0), the other category will indicate the event of interest (dummy = 1), and we will be dealing with the binary logistic regression technique. On the other hand, if the phenomenon under study presents more than two categories as occurrence possibilities, we must initially define the reference category and then estimate the multinomial logistic regression model.

When the phenomenon to be studied is a qualitative variable, estimation by means of the ordinary least squares method, as studied in the previous chapter, is not viable, since this dependent variable does not have a meaningful mean and variance and, therefore, there is no way to minimize the sum of the squared error terms without generating an incoherent, arbitrary weighting. Since this dependent variable is entered into modeling software by typing in the values that represent each of the answer possibilities, it is common for researchers to forget to define the category labels that correspond to each of the entered values; an unwary or beginning researcher may then estimate the model by means of least squares regression, and even obtain outputs, since the software will interpret the dependent variable as being quantitative. This serious mistake is, unfortunately, more common than one would think! The binary and multinomial logistic regression techniques are elaborated based on estimation by maximum likelihood, to be studied in Sections 14.2.1 and 14.3.1, respectively.

Analogous to what was discussed in the previous chapter, the logistic regression models are defined based on the underlying theory and the experience of the researcher, in such a way that it is possible to estimate the desired model, analyze the obtained results by means of statistical tests, and prepare predictions.

In this chapter, we will cover the binary and multinomial logistic regression models, with the following objectives: (1) introduce the concepts of logistic regression, (2) present estimation by maximum likelihood, (3) interpret the obtained results and prepare predictions, and (4) present the application of these techniques in Excel, Stata, and SPSS. First, the solution to an example will be worked out in Excel simultaneously with the presentation of the concepts and its manual solution. After introducing the concepts, the procedures for preparing the technique in Stata and SPSS will be presented, maintaining the standard adopted in the book.

14.2 The Binary Logistic Regression Model

The binary logistic regression model has, as its main objective, the study of the probability of occurrence of an event defined by Y, which presents itself in a qualitative, dichotomous form (Y = 1 to describe the occurrence of the event of interest and Y = 0 to describe the occurrence of the non-event), based on the behavior of explanatory variables. In this way, we can define a vector of explanatory variables, with respective estimated parameters, in the following way:

Z_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}  (14.1)

where Z is known as the logit, α represents the constant, βj (j = 1, 2, …, k) are the estimated parameters for each explanatory variable, Xj are the explanatory variables (metric or dummies), and the subscript i represents each sample observation (i = 1, 2, …, n, where n is the size of the sample). It is important to highlight that Z does not represent the dependent variable, denominated Y; our present objective is to define the expression of the probability pi of occurrence of the event of interest for each observation, as a function of the logit Zi, that is, as a function of the parameters estimated for each explanatory variable. To do this, we should define the concept of an event's chance of occurrence, also known as odds, in the following way:

\text{odds}(Y_i = 1) = \frac{p_i}{1 - p_i}  (14.2)

Imagine that we are interested in studying the event "passing the calculus course." If, for example, the probability that a given student passes this course is 80%, their chance of passing will be 4 to 1 (0.8/0.2 = 4). If the probability that another student passes the same course is 25%, given that they studied much less than the first student, their chance of passing will be 1 to 3 (0.25/0.75 = 1/3). Even though, in daily language, we tend to use the terms chance and odds as synonyms for probability, the concepts are different!

The binary logistic regression defines the Z logit as the natural logarithm of odds, such that:

\ln[\text{odds}(Y_i = 1)] = Z_i  (14.3)

from which comes:

\ln\left(\frac{p_i}{1 - p_i}\right) = Z_i  (14.4)

Since our intent is to define an expression for the probability of occurrence of the event under study as a function of the logit, we can mathematically isolate pi based on Expression (14.4) in the following manner:

\frac{p_i}{1 - p_i} = e^{Z_i}  (14.5)

p_i = (1 - p_i) \cdot e^{Z_i}  (14.6)

p_i \cdot (1 + e^{Z_i}) = e^{Z_i}  (14.7)

And, therefore, we have that:

Probability of occurrence of the event:

p_i = \frac{e^{Z_i}}{1 + e^{Z_i}} = \frac{1}{1 + e^{-Z_i}}  (14.8)

Probability of occurrence of the non-event:

1 - p_i = 1 - \frac{e^{Z_i}}{1 + e^{Z_i}} = \frac{1}{1 + e^{Z_i}}  (14.9)

Obviously, the sum of Expressions (14.8) and (14.9) is equal to 1.

Based on Expression (14.8), we can elaborate a table with the values of p as a function of the values of Z. Since Z varies from −∞ to +∞, we will, for teaching purposes only, use integer values between −5 and +5. Table 14.1 gives these values.

Table 14.1

Probability of Occurrence of an Event (p) as a Function of the Logit Z

Z_i     p_i = 1/(1 + e^{-Z_i})
−5      0.0067
−4      0.0180
−3      0.0474
−2      0.1192
−1      0.2689
 0      0.5000
 1      0.7311
 2      0.8808
 3      0.9526
 4      0.9820
 5      0.9933

Based on Table 14.1, we can prepare a graph of p = f(Z), as presented in Fig. 14.1. By means of this graph, we see that the estimated probabilities, as a function of the different values assumed by Z, are situated between 0 and 1, which was guaranteed when we imposed that the logit be equal to the natural logarithm of the odds. As such, given the parameters estimated in the model and the value of each of the explanatory variables for a given observation i, we can calculate the value of Zi and, by means of the logistic curve presented in Fig. 14.1 (also known as the S curve, or sigmoid), estimate the probability of occurrence of the event under study for this observation.

Fig. 14.1 Graph of p = f(Z).
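For readers who wish to reproduce these values outside Excel, a minimal Python sketch of Expression (14.8) follows (the printed values mirror Table 14.1):

import numpy as np

def logistic(z):
    # Probability of occurrence of the event as a function of the logit Z (Expression 14.8)
    return 1.0 / (1.0 + np.exp(-z))

for z in range(-5, 6):
    print(f"Z = {z:+d}  ->  p = {logistic(z):.4f}")   # e.g., Z = 0 -> p = 0.5000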

Based on Expressions (14.1) and (14.8), we can define the general expression for the estimated probability of occurrence of an event that presents itself in a dichotomous form, for an observation i, in the following way:

p_i = \frac{1}{1 + e^{-(\alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki})}}  (14.10)

What the binary logistic regression estimates, therefore, is not the predicted value of the dependent variable, but rather the probability of occurrence of the event under study for each observation. We now move on to the estimation of the logit parameters by means of an example prepared initially in Excel.

14.2.1 Estimation of the Binary Logistic Regression Model by Maximum Likelihood

We will now present the concepts pertinent to estimation by maximum likelihood using an example similar to the one developed throughout the previous chapter. However, now the dependent variable will be qualitative and dichotomous.

Imagine that our curious professor, who has already considerably explored the effects of certain explanatory variables on the travel time of a group of students to school by means of the multiple regression technique, is now interested in investigating whether these same explanatory variables influence the probability of a student arriving late to class. In other words, the phenomenon to be studied presents only two categories (arriving late to class or not) and the event of interest refers to arriving late.

To this end, the professor surveyed 100 students at the school where he teaches, asking each of them whether they had arrived late that day. The professor also asked about the distance traveled (in kilometers), the number of traffic lights through which each went, the time of day when the trip was made (morning or afternoon), and the driving style each considers themselves to have (calm, moderate, or aggressive). Part of the prepared dataset is shown in Table 14.2.

Table 14.2

Example: Late (Yes or No) × Distance Traveled, Number of Traffic Lights, Time of Day for the Trip to School, and Driving Style

Student        Arrived Late      Distance Traveled       Number of Traffic     Time of Day    Driving Style
               to School (Yi)    to School (km) (X1i)    Lights—sem (X2i)      (X3i)          (X4i)
Gabriela       No                12.5                    7                     morning        calm
Patricia       No                13.3                    10                    morning        calm
Gustavo        No                13.4                    8                     morning        moderate
Leticia        No                23.5                    7                     morning        calm
Luiz Ovidio    No                9.5                     8                     morning        calm
Leonor         No                13.5                    10                    morning        calm
Dalila         No                13.5                    10                    morning        calm
Antonio        No                15.4                    10                    morning        calm
Julia          No                14.7                    10                    morning        calm
Mariana        No                14.7                    10                    morning        calm
Filomena       Yes               12.8                    11                    afternoon      aggressive
Estela         Yes               1.0                     13                    morning        calm

For the dependent variable, since the event of interest refers to arriving late, this category will present values equal to 1, and the category not arriving late will present values equal to 0.

Following what was defined in the previous chapter in relation to qualitative explanatory variables, the reference category of the variable corresponding to the time of day will be afternoon, that is, the cells in the dataset with this value will assume values equal to 0, and the cells with the category morning will assume values equal to 1. The driving style variable, in turn, should be transformed into two dummies (variable style2 for the moderate category and style3 for the aggressive category), since we have defined the calm category as the reference.

As such, Table 14.3 presents part of the final dataset to be used for the estimation of our binary logistic regression model.

Table 14.3

Substitution of Qualitative Variable Categories With Respective Dummy Variables

Student        Arrived Late (Yi)    Distance (km) (X1i)    Traffic Lights—sem (X2i)    per (X3i)    style2 (X4i)    style3 (X5i)
Gabriela       0                    12.5                   7                           1            0               0
Patricia       0                    13.3                   10                          1            0               0
Gustavo        0                    13.4                   8                           1            1               0
Leticia        0                    23.5                   7                           1            0               0
Luiz Ovidio    0                    9.5                    8                           1            0               0
Leonor         0                    13.5                   10                          1            0               0
Dalila         0                    13.5                   10                          1            0               0
Antonio        0                    15.4                   10                          1            0               0
Julia          0                    14.7                   10                          1            0               0
Mariana        0                    14.7                   10                          1            0               0
Filomena       1                    12.8                   11                          0            0               1
Estela         1                    1.0                    13                          1            0               0

Note: Yi is the dummy for arriving late (Yes = 1; No = 0); per is the time-of-day dummy (morning = 1; afternoon = 0); style2 and style3 are the driving style dummies for the moderate and aggressive categories, respectively.

The complete dataset can be accessed by means of the Late.xls file.

In this way, the logit whose parameters we wish to estimate is defined in the following way:

Z_i = \alpha + \beta_1 \, dist_i + \beta_2 \, sem_i + \beta_3 \, per_i + \beta_4 \, style2_i + \beta_5 \, style3_i

and the estimated probability that a given student arrives late can be written in the following way:

p_i = \frac{1}{1 + e^{-(\alpha + \beta_1 \, dist_i + \beta_2 \, sem_i + \beta_3 \, per_i + \beta_4 \, style2_i + \beta_5 \, style3_i)}}

Since it does not make sense to define an error term for each observation, given that the dependent variable presents itself in a dichotomous form, there is no way to estimate the equation parameters by minimizing the sum of squared residuals, as we did when estimating traditional regression models. In this case, therefore, we will use the likelihood function, based on which the maximum likelihood estimation will be elaborated. Estimation by maximum likelihood is the most popular parameter estimation technique for logistic regression models.

Due to this fact, it is also important to mention, in relation to the assumptions studied for regression models estimated by ordinary least squares, that the researcher should only be concerned with the assumption of the absence of multicollinearity among the explanatory variables when estimating logistic regression models.

In the binary logistic regression, the dependent variable follows a Bernoulli distribution; in other words, the fact that a given observation i has presented the event of interest or not can be considered a Bernoulli trial, in which the probability of occurrence of the event is pi and the probability of occurrence of the non-event is (1 − pi). In general, we can write the probability of occurrence of Yi, with Yi equal to 1 or 0, as:

p(Y_i) = p_i^{Y_i} \cdot (1 - p_i)^{1 - Y_i}  (14.11)

For a sample with n observations, we can define the likelihood function as being:

L = \prod_{i=1}^{n} \left[ p_i^{Y_i} \cdot (1 - p_i)^{1 - Y_i} \right]  (14.12)

from which comes, based on Expressions (14.8) and (14.9), that:

L = \prod_{i=1}^{n} \left[ \left( \frac{e^{Z_i}}{1 + e^{Z_i}} \right)^{Y_i} \cdot \left( \frac{1}{1 + e^{Z_i}} \right)^{1 - Y_i} \right]  (14.13)

Since, in practice, it is more convenient to work with the logarithm of the likelihood function, we arrive at the following function, also known as the log likelihood function:

LL = \sum_{i=1}^{n} \left\{ Y_i \cdot \ln\left( \frac{e^{Z_i}}{1 + e^{Z_i}} \right) + (1 - Y_i) \cdot \ln\left( \frac{1}{1 + e^{Z_i}} \right) \right\}  (14.14)

And now, a question must be asked: What are the values of the logit parameters that maximize the LL value of Expression (14.14)? This important question is the main key to the estimation of binary logistic regression models by maximum likelihood, and it can be answered using optimization tools, so as to estimate the parameters α, β1, β2, …, βk based on the following objective function:

LL = \sum_{i=1}^{n} \left\{ Y_i \cdot \ln\left( \frac{e^{Z_i}}{1 + e^{Z_i}} \right) + (1 - Y_i) \cdot \ln\left( \frac{1}{1 + e^{Z_i}} \right) \right\} = \max  (14.15)
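Before turning to Solver, it may help to see the same maximization outside Excel. The sketch below uses scipy to minimize the negative of LL; it assumes that X (an n × 5 matrix with the columns dist, sem, per, style2, and style3) and y (the n-vector of 0/1 outcomes) have already been built from the dataset:

import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, X, y):
    alpha, betas = params[0], params[1:]
    z = alpha + X @ betas                 # logit Z_i of Expression (14.1)
    log_p = -np.log1p(np.exp(-z))         # ln(p_i), written in a numerically stable form
    log_1mp = -np.log1p(np.exp(z))        # ln(1 - p_i)
    return -np.sum(y * log_p + (1 - y) * log_1mp)   # -LL of Expression (14.14)

# Maximizing LL is equivalent to minimizing -LL; we start from the null point.
result = minimize(negative_log_likelihood, x0=np.zeros(6), args=(X, y), method="BFGS")
print("LLmax =", -result.fun)             # should approach -29.06568 for this dataset
print("alpha, beta1, ..., beta5 =", result.x)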

We will solve this problem with the Excel Solver tool using the data from our example. To do so, we should open the LateMaximumLikelihood.xls file, which will help in the calculation of the parameters.

In this file, besides the dependent variable and the explanatory variables, three new variables were created, corresponding to the logit Zi, to the probability of occurrence of the event of interest pi, and to the log likelihood function LLi for each observation, respectively. Table 14.4 shows part of these data when the parameters α, β1, β2, β3, β4, and β5 are equal to 0.

Table 14.4

Calculation of LL When α = β1 = β2 = β3 = β4 = β5 = 0

Student        Yi    X1i     X2i    X3i    X4i    X5i    Zi    pi     LLi = Yi · ln(pi) + (1 − Yi) · ln(1 − pi)
Gabriela       0     12.5    7      1      0      0      0     0.5    −0.69315
Patricia       0     13.3    10     1      0      0      0     0.5    −0.69315
Gustavo        0     13.4    8      1      1      0      0     0.5    −0.69315
Leticia        0     23.5    7      1      0      0      0     0.5    −0.69315
Luiz Ovidio    0     9.5     8      1      0      0      0     0.5    −0.69315
Leonor         0     13.5    10     1      0      0      0     0.5    −0.69315
Dalila         0     13.5    10     1      0      0      0     0.5    −0.69315
Antonio        0     15.4    10     1      0      0      0     0.5    −0.69315
Julia          0     14.7    10     1      0      0      0     0.5    −0.69315
Mariana        0     14.7    10     1      0      0      0     0.5    −0.69315
Filomena       1     12.8    11     0      0      1      0     0.5    −0.69315
Estela         1     1.0     13     1      0      0      0     0.5    −0.69315
Sum            LL = \sum_{i=1}^{100} [Y_i \cdot \ln(p_i) + (1 - Y_i) \cdot \ln(1 - p_i)] = −69.31472

Fig. 14.2 presents part of the data in the LateMaximumLikelihood.xls file; some cells are hidden, since the number of observations is equal to 100.

Fig. 14.2 Data from the LateMaximumLikelihood.xls file.

As we can see, when α = β1 = β2 = β3 = β4 = β5 = 0, the value of the sum of the log likelihood function is equal to −69.31472. However, there is an optimal combination of parameter values such that the objective function presented in Expression (14.15) is satisfied, that is, such that the value of the sum of the log likelihood function is the maximum possible.

Following the logic proposed by Belfiore and Fávero (2012), we will now open the Excel Solver tool. The objective function is in cell J103, which is our target cell and should be maximized. Besides this, the parameters α, β1, β2, β3, β4, and β5, whose values are in cells M3, M5, M7, M9, M11, and M13, respectively, are the changing cells. The Solver window will be as shown in Fig. 14.3.

Fig. 14.3 Solver—maximization of the sum of the log likelihood function.

By clicking on Solve and then OK, we will obtain the optimal solution to this optimization problem. Table 14.5 shows part of the obtained data.

Table 14.5

Values Obtained When Maximizing LL

Student        Yi    X1i     X2i    X3i    X4i    X5i    Zi           pi        LLi = Yi · ln(pi) + (1 − Yi) · ln(1 − pi)
Gabriela       0     12.5    7      1      0      0      −11.73478    0.00001   −0.00001
Patricia       0     13.3    10     1      0      0      −3.25815     0.03704   −0.03774
Gustavo        0     13.4    8      1      1      0      −7.42373     0.00060   −0.00060
Leticia        0     23.5    7      1      0      0      −9.31255     0.00009   −0.00009
Luiz Ovidio    0     9.5     8      1      0      0      −9.62856     0.00007   −0.00007
Leonor         0     13.5    10     1      0      0      −3.21411     0.03864   −0.03940
Dalila         0     13.5    10     1      0      0      −3.21411     0.03864   −0.03940
Antonio        0     15.4    10     1      0      0      −2.79572     0.05756   −0.05928
Julia          0     14.7    10     1      0      0      −2.94987     0.04974   −0.05102
Mariana        0     14.7    10     1      0      0      −2.94987     0.04974   −0.05102
Filomena       1     12.8    11     0      0      1      5.96647      0.99744   −0.00256
Estela         1     1.0     13     1      0      0      2.33383      0.91164   −0.09251
Sum            LL = \sum_{i=1}^{100} [Y_i \cdot \ln(p_i) + (1 - Y_i) \cdot \ln(1 - p_i)] = −29.06568

Then, the maximum possible value of the sum of the log likelihood function is LLmax = −29.06568. The solution to this problem generated the following parameter estimates:

  • α = − 30.202
  • β1 = 0.220
  • β2 = 2.767
  • β3 = − 3.653
  • β4 = 1.346
  • β5 = 2.914

and, as such, the Zi logit can be written as follows:

Z_i = -30.202 + 0.220 \, dist_i + 2.767 \, sem_i - 3.653 \, per_i + 1.346 \, style2_i + 2.914 \, style3_i

Fig. 14.4 presents part of the results obtained by modeling the LateMaximumLikelihood.xls file.

Fig. 14.4 Parameter estimation from the maximization of LL by Solver.

And, therefore, the estimated probability expression for a student to arrive late can be written in the following way:

p_i = \frac{1}{1 + e^{-(-30.202 + 0.220 \, dist_i + 2.767 \, sem_i - 3.653 \, per_i + 1.346 \, style2_i + 2.914 \, style3_i)}}

Some interesting questions can now be posed:

What is the average estimated probability of arriving late to school when traveling 17 kilometers and going through 10 traffic lights, making the trip in the morning and having a driving style considered aggressive?

On average, by how much does the chance of arriving late to school change if a route 1 kilometer longer is adopted, maintaining the remaining conditions constant?

Does a student considered aggressive present, on average, a higher chance of arriving late than one considered calm? If so, by how much is this chance increased, maintaining the remaining conditions constant?

Before answering these important questions, we need to verify whether all the estimated parameters are statistically significant at a given confidence level. If this is not the case, we will need to re-estimate the model so that it presents only statistically significant parameters, making the elaboration of inferences and predictions possible.

Therefore, having estimated the parameters of the event occurrence probability equation by maximum likelihood, we now begin the study of the overall statistical significance of the obtained model, as well as of the statistical significance of each parameter, analogous to what was done when studying traditional regression models in the previous chapter. It is important to mention that, in the Appendix of this chapter, we briefly present the probit regression models, which can be used as an alternative to the binary logistic regression models in those cases where the probability of occurrence curve of a given event adjusts more adequately to the cumulative distribution function of the standard normal distribution.

14.2.2 General Statistical Significance of the Binary Logistic Regression Model and Each of Its Parameters

If, for example, we prepare a linear graph of our dependent variable (late) as a function of the variable referring to the number of traffic lights (sem), we notice that the model estimates are not able to adjust satisfactorily to the behavior of the dependent variable, since it is a dummy. The graph in Fig. 14.5A presents this behavior. On the other hand, if the binary logistic regression model is prepared and the estimated probabilities of arriving late for each observation in our sample are plotted, specifically as a function of the number of traffic lights through which each student goes, we notice that the adjustment is much more adequate to the behavior of the dependent variable (S curve), with estimated values limited to between 0 and 1 (Fig. 14.5B).

Fig. 14.5 Linear and logistic adjustments of the dependent variable as a function of the sem variable. (A) Linear adjustment and (B) logistic adjustment.

Therefore, since the dependent variable is qualitative, it makes no sense to discuss the percentage of its variance explained by the predictor variables. In other words, in logistic regression models there is no coefficient of determination R2 as in traditional regressions estimated by the ordinary least squares method. However, many researchers present, in their work, a coefficient known as the McFadden pseudo R2, whose expression is given as:

\text{pseudo } R^2 = \frac{-2 LL_0 - (-2 LL_{max})}{-2 LL_0}  (14.16)

Its usefulness is quite limited, being restricted to cases where the researcher is interested in comparing two or more distinct models, given that one of the many existing criteria for model choice is that of the higher McFadden pseudo R2.

In our example, as we have already discussed in the previous section and calculated by means of Excel Solver, LLmax, which is the maximum possible value of the sum of the logarithmic likelihood function, is equal to − 29.06568.

LL0, in turn, represents the maximum possible value of the sum of the log likelihood function for the so-called null model, in other words, a model that presents only the constant α and no explanatory variables. By means of the same procedure performed in the previous section, now using the LateMaximumLikelihoodNullModel.xls file, we obtain LL0 = −67.68585. Figs. 14.6 and 14.7 show the Solver window and part of the results obtained by modeling in this file, respectively.

Fig. 14.6 Solver—maximization of the sum of the log likelihood function for the null model.
Fig. 14.7 Estimating parameters through LL maximization by Solver—null model.

Then, based on Expression (14.16), we obtain:

\text{pseudo } R^2 = \frac{-2 \cdot (-67.68585) - [-2 \cdot (-29.06568)]}{-2 \cdot (-67.68585)} = 0.5706

As we discussed, a higher McFadden pseudo R2 can be used as a criterion to choose one model over another. However, as we will study in Section 14.2.4, there is another, more adequate criterion for choosing the best model, which refers to a greater area under the receiver operating characteristic (ROC) curve.

Many researchers also use the McFadden pseudo R2 as a performance indicator for a chosen model, independent of the comparison with other models. However, its interpretation demands much care, and there is at times the inevitable temptation to erroneously associate it with the percentage of variance of the dependent variable. As we will study in Section 14.2.4, the best performance indicator for a binary logistic regression model is the overall model efficiency, which is defined based on the determination of a cutoff, a concept that will be studied in the same section.

Even though the usefulness of the McFadden pseudo R2 is limited, software such as Stata and SPSS calculate and present it in their respective outputs, as we will see in Sections 14.4 and 14.5, respectively.

Analogous to the procedure presented in the previous chapter, we will first study the overall statistical significance of the proposed model. The χ2 test provides the means to verify this significance, since its null and alternative hypotheses, for a general logistic regression model, are:

  • H0: β1 = β2 = … = βk = 0
  • H1: there is at least one βj ≠ 0, respectively

While the F-test is used for regression models where the dependent variable presents itself quantitatively, which generates a decomposition of the variance (ANOVA table), as studied in the previous chapter, the χ2 test is more adequate for models estimated by the maximum likelihood method, such as the logistic regression models.

The χ2 test offers the researcher an initial verification of the significance of the proposed model since, if all the estimated βj (j = 1, 2, …, k) parameters are statistically equal to 0, changes in the X variables will not influence the probability of occurrence of the event under study in any way. The χ2 statistic has the following expression:

\chi^2 = -2 \cdot (LL_0 - LL_{max})  (14.17)

Returning to our example, we have that:

\chi^2_{5\ \text{d.f.}} = -2 \cdot [-67.68585 - (-29.06568)] = 77.2403

For 5 degrees of freedom (the number of explanatory variables considered in the model, that is, the number of β parameters), we have, by means of Table D in the Appendix, that χc2 = 11.070 (critical χ2 for 5 degrees of freedom at the 5% significance level). In this way, since the calculated χ2 is χcal2 = 77.2403 > χc2 = 11.070, we can reject the null hypothesis that all the βj (j = 1, 2, …, 5) parameters are statistically equal to zero. Hence, at least one X variable is statistically significant to explain the probability of occurrence of the event under study, and we have a binary logistic regression model that is statistically significant for the purpose of prediction.
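A short scipy sketch reproduces both the statistic and its significance level (the values are those of the example):

from scipy.stats import chi2

ll_null, ll_max, df = -67.68585, -29.06568, 5
chi2_cal = -2 * (ll_null - ll_max)        # Expression (14.17): 77.2403
print(chi2.ppf(0.95, df))                 # critical value at 5%: 11.070
print(chi2.sf(chi2_cal, df))              # P-value (right-tail area), far below 0.05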

Software such as Stata and SPSS does not offer χc2 for the defined degrees of freedom at a given significance level. However, it offers the significance level of χcal2 for these degrees of freedom. As such, instead of analyzing whether χcal2 > χc2, we should check whether the significance level of χcal2 is lower than 0.05 (5%) in order to proceed with the regression analysis. As such:

  • If the P-value (or Sig. χcal2 or Prob. χcal2) < 0.05, there is at least one βj ≠ 0.

The χcal2 significance level can be obtained in Excel by means of the command Formulas → Insert Function → DIST.QUI (CHIDIST in English-language versions of Excel), which will open the dialog box seen in Fig. 14.8.

Fig. 14.8 Obtaining the χ2 significance level (Insert Function command).

Analogous to the F-test, the χ2 test evaluates the joint significance of the explanatory variables, without defining which of the variables considered in the model are statistically significant to influence the probability of occurrence of the event.

In this way, it is necessary for the researcher to evaluate whether each of the binary logistic regression model parameters is statistically significant and, in this sense, the Wald z statistic provides the statistical significance of each parameter considered in the model. The z nomenclature refers to the fact that the distribution of this statistic is standard normal. The Wald z test hypotheses for α and for each βj (j = 1, 2, …, k) are:

  • H0: α = 0
  • H1: α ≠ 0
  • H0: βj = 0
  • H1: βj ≠ 0, respectively

The expressions for the calculation of the Wald z statistic for each α and βj parameter are given by:

z_\alpha = \frac{\alpha}{s.e.(\alpha)} \qquad z_{\beta_j} = \frac{\beta_j}{s.e.(\beta_j)}  (14.18)

where s.e. refers to the standard error of each parameter under analysis. Given the complexity of the standard error calculation for each parameter, we will not perform it at this time; we recommend, however, reading Engle (1984). The s.e. values for each parameter, for our example, are:

  • s.e. (α) = 9.981
  • s.e. (β1) = 0.110
  • s.e. (β2) = 0.922
  • s.e. (β3) = 0.878
  • s.e. (β4) = 0.748
  • s.e. (β5) = 1.179

Then, as we have already calculated the parameter estimates, we have that:

z_\alpha = \frac{\alpha}{s.e.(\alpha)} = \frac{-30.202}{9.981} = -3.026

z_{\beta_1} = \frac{\beta_1}{s.e.(\beta_1)} = \frac{0.220}{0.110} = 2.000

z_{\beta_2} = \frac{\beta_2}{s.e.(\beta_2)} = \frac{2.767}{0.922} = 3.001

z_{\beta_3} = \frac{\beta_3}{s.e.(\beta_3)} = \frac{-3.653}{0.878} = -4.161

z_{\beta_4} = \frac{\beta_4}{s.e.(\beta_4)} = \frac{1.346}{0.748} = 1.799

z_{\beta_5} = \frac{\beta_5}{s.e.(\beta_5)} = \frac{2.914}{1.179} = 2.472

After obtaining the Wald z statistics, the researcher can use the standard normal distribution table to obtain the critical values for a given significance level and check whether each test rejects the null hypothesis or not.
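Instead of consulting the table, the two-tailed P-value of each Wald z statistic can be computed directly; a short sketch with scipy, using the coefficients and standard errors above:

import numpy as np
from scipy.stats import norm

coef = np.array([-30.202, 0.220, 2.767, -3.653, 1.346, 2.914])   # alpha, beta1, ..., beta5
se = np.array([9.981, 0.110, 0.922, 0.878, 0.748, 1.179])
z = coef / se                             # Wald z statistics (Expression 14.18)
p = 2 * norm.sf(np.abs(z))                # two-tailed P-values
print(np.round(z, 3))                     # only beta4 falls between -1.96 and 1.96
print(np.round(p, 4))                     # hence only beta4 has a P-value above 0.05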

For the 5% level of significance, we have, by means of Table E in the Appendix, that zc = − 1.96 for the lower tail (probability for the lower tail of 0.025 for the two-tailed distribution) and zc = 1.96 for the upper tail (probability for the upper tail also of 0.025 for the two-tailed distribution).

The zc values for the 5% significance level can be obtained in Excel by means of the command Formulas → Insert Function → NORM.S.INV, where the researcher should type in a probability of 2.5% to obtain the zc for the lower tail, and 97.5% to obtain the zc for the upper tail, as shown in Figs. 14.9 and 14.10, respectively.

Fig. 14.9 Obtaining zc for the lower tail (Insert Function command).
Fig. 14.10 Obtaining zc for the upper tail (Insert Function command).

Only the Wald z statistic of the β4 parameter presented a value between −1.96 and 1.96, which indicates, at the 5% significance level, that for this case the null hypothesis was not rejected, that is, this parameter cannot be considered statistically different from zero.

As in the case of the χ2 test, statistical packages also offer the significance levels of the Wald z statistics, which facilitates the decision, such that, at the 95% confidence level (5% significance level), we have:

  • If the P-value (or Sig. zcal or Prob. zcal) < 0.05 for α, then α ≠ 0.
  • If the P-value (or Sig. zcal or Prob. zcal) < 0.05 for a given explanatory variable X, then its β ≠ 0.

As such, since −1.96 < zβ4 = 1.799 < 1.96, we know that the P-value of the Wald z statistic of the style2 variable will be greater than 0.05.

The nonrejection of the null hypothesis for the β4 parameter, at the 5% significance level, indicates that the corresponding style2 variable is not statistically significant to increase or decrease the probability of arriving late to school in the presence of the other explanatory variables and, therefore, should be excluded from the final model.

At this time, we will perform a manual exclusion of this variable so as to obtain the final model. However, it is important to remember that the manual exclusion of a variable can cause another, initially significant, variable to come to present a nonsignificant parameter, and this problem tends to worsen as the number of explanatory variables in the dataset grows. The opposite can also occur, i.e., it is not recommended to perform the simultaneous manual exclusion of two or more variables whose parameters, at first sight, do not show themselves to be statistically different from zero, since a given β parameter can become statistically different from zero after the exclusion of another variable. Fortunately, these phenomena do not occur in this example, and so we opt to manually exclude the style2 variable. This will be confirmed when we estimate the binary logistic regression model by means of the Stepwise procedure in Stata (Section 14.4) and SPSS (Section 14.5).

Therefore, we will open the LateMaximumLikelihoodFinalModel.xls file. Notice that the calculation of the logit Zi no longer takes into account the parameter of the style2 variable, which was excluded from the model. Figs. 14.11 and 14.12 show the Solver window and part of the results obtained by modeling this last file, respectively.

Fig. 14.11 Solver—maximization of the sum of the log likelihood function for the final model.
Fig. 14.12 Estimating parameters through LL maximization by Solver—final model.

Then, for the final model, we have LLmax = −30.80079. Before proceeding to the definition of the final expression of the occurrence probability of the event under study, we need to verify whether the newly estimated model (final model) presents a loss in the quality of adjustment in relation to the complete model estimated with all the explanatory variables. To do this, the likelihood-ratio test, which compares the adjustment of the complete model with that of the final model, can be used, presenting the following expression:

\chi^2_{1\ \text{d.f.}} = -2 \cdot (LL_{\text{final model}} - LL_{\text{complete model}})  (14.19)

For our example data, we have that:

\chi^2_{1\ \text{d.f.}} = -2 \cdot [-30.80079 - (-29.06568)] = 3.4702

Then, for 1 degree of freedom, we have, by means of Table D in the Appendix, that χc2 = 3.841 (critical χ2 for 1 degree of freedom at the 5% significance level). This way, since the calculated χ2 is χcal2 = 3.4702 < χc2 = 3.841, we do not reject the null hypothesis of the likelihood-ratio test, that is, the estimation of the final model with the exclusion of the style2 variable did not alter the quality of the adjustment, at the 5% significance level, which makes this model preferable to the complete model estimated with all the explanatory variables.

In Sections 14.4 and 14.5 we will present, by means of Stata and SPSS, respectively, another quite usual test to verify the quality of adjustment of the final model, known as the Hosmer-Lemeshow test. Dividing the dataset into 10 groups according to the deciles of the probabilities estimated by the final model for each observation, this test evaluates, by means of a χ2 test, whether there are significant differences between the observed and expected frequencies of events in each of the 10 groups; if such differences are not statistically significant, at a given significance level, the estimated model does not present problems in relation to the quality of the proposed adjustment.
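The Hosmer-Lemeshow statistic is straightforward to compute once the estimated probabilities are available; a minimal sketch, assuming numpy arrays y (observed 0/1 outcomes) and p (estimated probabilities) for the sample:

import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    # Form g groups according to the deciles of the estimated probabilities
    order = np.argsort(p)
    h = 0.0
    for idx in np.array_split(order, g):
        observed = y[idx].sum()           # observed number of events in the group
        expected = p[idx].sum()           # expected number of events = sum of p_i
        n_g = len(idx)
        h += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return h, chi2.sf(h, g - 2)           # the statistic follows chi2 with g - 2 d.f.

h_stat, p_value = hosmer_lemeshow(y, p)   # P-value > 0.05 indicates adequate adjustment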

That said, we return to the analysis of the results of the final estimated model. The solution to this new problem generated the following final parameter estimates:

  • α = − 30.935
  • β1 = 0.204
  • β2 = 2.920
  • β3 = − 3.776
  • β5 = 2.459

with the respective standard errors:

  • s.e. (α) = 10.636
  • s.e. (β1) = 0.101
  • s.e. (β2) = 1.011
  • s.e. (β3) = 0.847
  • s.e. (β5) = 1.139

and the following Wald z statistics:

z_\alpha = \frac{\alpha}{s.e.(\alpha)} = \frac{-30.935}{10.636} = -2.909

z_{\beta_1} = \frac{\beta_1}{s.e.(\beta_1)} = \frac{0.204}{0.101} = 2.020

z_{\beta_2} = \frac{\beta_2}{s.e.(\beta_2)} = \frac{2.920}{1.011} = 2.888

z_{\beta_3} = \frac{\beta_3}{s.e.(\beta_3)} = \frac{-3.776}{0.847} = -4.458

z_{\beta_5} = \frac{\beta_5}{s.e.(\beta_5)} = \frac{2.459}{1.139} = 2.159

with all values of zcal < − 1.96 or > 1.96 and, therefore, with P-values for the Wald z statistics < 0.05.

The final model also presents the following statistics:

\text{pseudo } R^2 = \frac{-2 \cdot (-67.68585) - [-2 \cdot (-30.80079)]}{-2 \cdot (-67.68585)} = 0.5449

\chi^2_{4\ \text{d.f.}} = -2 \cdot [-67.68585 - (-30.80079)] = 73.77012 > \chi^2_{c,\ 4\ \text{d.f.}} = 9.48773

As such, we can write the Zi logit as follows:

Z_i = -30.935 + 0.204 \, dist_i + 2.920 \, sem_i - 3.776 \, per_i + 2.459 \, style3_i

with the final estimated probability expression that student i will arrive late to school:

p_i = \frac{1}{1 + e^{-(-30.935 + 0.204 \, dist_i + 2.920 \, sem_i - 3.776 \, per_i + 2.459 \, style3_i)}}

These parameters and respective statistics can also be obtained by means of the Stepwise procedure when estimating the binary logistic regression model in Stata and SPSS.

Based on the estimated probability function, a curious researcher could, for example, wish to prepare a graph of the estimated probabilities of each student arriving late to school (column H in the final model file in Excel) as a function of the number of traffic lights through which each must go on the route (column D in Excel). Fig. 14.13 presents this graph and, unlike the graph in Fig. 14.5B, which offers a logistic adjustment to the observed values (only values equal to 0 or 1 for the dependent variable), this new graph presents a logistic probability adjustment.

Fig. 14.13 Logistic probability adjustment as a function of the sem variable.

Based on Fig. 14.13, which also presents the logistic curve adjusted to the cloud of points that represents the estimated probabilities for each observation, we can see that, while the probability of arriving late to school is very low when going through up to 8 traffic lights along the route, it becomes quite high when the student is obliged to go through 11 or more traffic lights during the trip.

Deepening the analysis of the probability function, we can return to our three important questions, answering each one at a time:

What is the average estimated probability of arriving late to school when traveling 17 kilometers and going through 10 traffic lights, making the trip in the morning and having a driving style considered aggressive?

Using the last probability expression and substituting the provided values in this equation, we will have:

p = \frac{1}{1 + e^{-[-30.935 + 0.204 \cdot (17) + 2.920 \cdot (10) - 3.776 \cdot (1) + 2.459 \cdot (1)]}} = 0.603

Then, the average estimated probability of arriving late to school is, under the provided conditions, equal to 60.3%.

On average, by how much does the chance of arriving late to school change if a route 1 kilometer longer is adopted, maintaining the remaining conditions constant?

To answer this question, we should resort to Expression (14.3), which can be written as follows:

\text{odds}(Y_i = 1) = e^{Z_i}  (14.20)

such that, maintaining the remaining conditions constant, the factor by which the chance of arriving late to school is multiplied when adopting a route 1 kilometer longer is:

e^{\beta_1} = e^{0.204} = 1.226

Then, the chance is multiplied by a factor of 1.226, that is, maintaining the remaining conditions constant, the chance of arriving late to school when adopting a route 1 kilometer longer is, on average, 22.6% higher.

Does a student considered aggressive present, on average, a higher chance of arriving late than one considered calm? If so, by how much is this chance increased, maintaining the remaining conditions constant?

Since β5 is positive, we can state that the probability of a student considered aggressive arriving late to school is higher than that of a student considered calm, a fact that is also proven when we analyze the chance, given that, if β5 > 0, then eβ5 > 1, that is, the chance of arriving late will be higher for a student with an aggressive driving style than for a calm one. This proves, once again, that being aggressive behind the wheel gets you nowhere!

Maintaining the remaining conditions constant, the factor by which the chance of arriving late to school is multiplied when being aggressive behind the wheel, in relation to being calm, is given as:

e^{\beta_5} = e^{2.459} = 11.693

Then, the chance is multiplied by a factor of 11.693, that is, maintaining the remaining conditions constant, the chance of arriving late to school when being aggressive behind the wheel, in relation to being calm, is, on average, 1069.3% higher.
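The three answers can be verified numerically with a few lines of Python, using the final model coefficients:

import numpy as np

alpha, b_dist, b_sem, b_per, b_style3 = -30.935, 0.204, 2.920, -3.776, 2.459

# Question 1: dist = 17 km, sem = 10, morning (per = 1), aggressive (style3 = 1)
z = alpha + b_dist * 17 + b_sem * 10 + b_per * 1 + b_style3 * 1
print(1 / (1 + np.exp(-z)))               # approximately 0.603

# Questions 2 and 3: odds multipliers from Expression (14.20)
print(np.exp(b_dist))                     # approximately 1.226 per additional kilometer
print(np.exp(b_style3))                   # approximately 11.693, aggressive vs. calm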

It is worth commenting that there are no differences in the probability of arriving late to school between students considered moderate and calm, given that the β4 parameter (referring to the moderate category) presents itself as statistically equal to zero at the 5% significance level.

As we can see, these calculations always use the average estimates for the parameters. We now embark on the study of the confidence intervals for these parameters.

14.2.3 Construction of the Confidence Intervals of the Parameters for the Binary Logistic Regression Model

The confidence intervals for the coefficients of Expression (14.10), for parameters α and βj (j = 1, 2, …, k), at the 95% confidence level, can be written as follows:

\alpha \pm 1.96 \cdot [s.e.(\alpha)] \qquad \beta_j \pm 1.96 \cdot [s.e.(\beta_j)]  (14.21)

where, as we have seen, 1.96 is the zc for the 95% confidence level (5% significance level).

As such, we can prepare Table 14.6, which gives the estimated parameter coefficients of the probability expression for the event of interest in our example, with the respective standard errors, Wald z statistics, and confidence intervals at the 5% significance level.

Table 14.6

Calculation of the Confidence Intervals of the Parameter Coefficients

Parameter               Coefficient    Standard Error (s.e.)    z         95% Confidence Interval
                                                                          Lower        Upper
α (constant)            −30.935        10.636                   −2.909    −51.782      −10.088
β1 (dist variable)      0.204          0.101                    2.020     0.006        0.402
β2 (sem variable)       2.920          1.011                    2.888     0.938        4.902
β3 (per variable)       −3.776         0.847                    −4.458    −5.436       −2.116
β5 (style3 variable)    2.459          1.139                    2.159     0.227        4.691

Note: the interval limits are given by the coefficient ± 1.96 · s.e.

This table is equal to the one we will obtain when estimating the model in Stata and SPSS by means of the Stepwise procedure. Based on the parameter confidence intervals, we can write the lower (minimum) and upper (maximum) limit expressions for the estimated probability that a student i arrives late to school, with 95% confidence. As such, we will have:

p_i^{min} = \frac{1}{1 + e^{-(-51.782 + 0.006 \, dist_i + 0.938 \, sem_i - 5.436 \, per_i + 0.227 \, style3_i)}}

p_i^{max} = \frac{1}{1 + e^{-(-10.088 + 0.402 \, dist_i + 4.902 \, sem_i - 2.116 \, per_i + 4.691 \, style3_i)}}

Based on Expression (14.20), the confidence interval of the chance of occurrence of the event of interest, for each parameter βj (j = 1, 2, …, k), at the 95% confidence level, can be written in the following way:

e^{\beta_j \pm 1.96 \cdot [s.e.(\beta_j)]}  (14.22)

Notice that we did not present the expression of the chance confidence interval for the parameter α, since it only makes sense to discuss the change in the chance of occurrence of the event under study when a given explanatory variable of the model is altered by one unit, maintaining the remaining conditions constant.

For the data in our example and based on the values of Table 14.6, we will, then, prepare Table 14.7, which presents the confidence intervals of the chance (odds) of occurrence for an event of interest for each parameter βj.

Table 14.7

Calculation of the Confidence Intervals of the Chance (Odds) for Each Parameter βj

Parameter               Chance (Odds)    95% Chance Confidence Interval
                        e^{βj}           e^{βj − 1.96 · s.e.(βj)}    e^{βj + 1.96 · s.e.(βj)}
β1 (dist variable)      1.226            1.006                       1.495
β2 (sem variable)       18.541           2.555                       134.458
β3 (per variable)       0.023            0.004                       0.120
β5 (style3 variable)    11.693           1.254                       109.001

These values can also be obtained by means of Stata and SPSS, as we will show in Sections 14.4 and 14.5, respectively.
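Both tables follow directly from Expressions (14.21) and (14.22); a short sketch that reproduces them:

import numpy as np

names = ["alpha", "dist", "sem", "per", "style3"]
coef = np.array([-30.935, 0.204, 2.920, -3.776, 2.459])
se = np.array([10.636, 0.101, 1.011, 0.847, 1.139])

lower, upper = coef - 1.96 * se, coef + 1.96 * se         # Expression (14.21)
for n, c, lo, hi in zip(names, coef, lower, upper):
    line = f"{n}: coef CI = [{lo:.3f}, {hi:.3f}]"
    if n != "alpha":                                      # no odds interval for the constant
        line += f", odds = {np.exp(c):.3f}, odds CI = [{np.exp(lo):.3f}, {np.exp(hi):.3f}]"
    print(line)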

According to what was discussed in the previous chapter, if the confidence interval for a given parameter contains zero (or if the chance interval contains 1), the parameter will be considered statistically equal to zero at the confidence level with which the researcher is working. If this happens with the parameter α, it is recommended that nothing be altered in the modeling, since this fact is due to the use of small samples, and a larger sample will solve the problem. On the other hand, if the confidence interval of a parameter βj contains zero, the corresponding variable will be excluded from the final model when estimation is done by the Stepwise procedure. Even though it was not shown here, the confidence interval of the parameter estimated for the style2 variable contains zero since, as discussed, its zcal value was situated between −1.96 and 1.96 and, therefore, this variable was excluded from the final model.

As was also discussed, the rejection of the null hypothesis for a given β parameter, at a specified significance level, indicates that the corresponding X variable is significant to explain the probability of occurrence of the event of interest and, consequently, should remain in the final model. We can, therefore, conclude that the decision to exclude a given X variable from a logistic regression model can be made by means of the direct analysis of the Wald z statistic of its respective β parameter (if −zc < zcal < zc → P-value > 0.05 → we cannot reject that the parameter is statistically equal to zero) or by means of the analysis of the confidence interval (if it contains zero). Box 14.1 presents the criteria for inclusion or exclusion of the βj (j = 1, 2, …, k) parameters in logistic regression models.

Box 14.1

Decision to Include βj Parameters in Logistic Regression Models

Parameter    Wald z Statistic                      z Test (P-Value)            Confidence Interval       Decision
             (for significance level α)            (for significance level α)
βj           −zc α/2 < zcal < zc α/2               P-value > sig. level α      Contains zero             Exclude parameter from model
             zcal > zc α/2 or zcal < −zc α/2       P-value < sig. level α      Does not contain zero     Maintain parameter in model

Obs.: Most common in the applied social sciences is the adoption of the significance level α = 5%.

14.2.4 Cutoff, Sensitivity Analysis, Overall Model Efficiency, Sensitivity, and Specificity

Having estimated the probability model for the occurrence of the event, we will now define the concept of the cutoff, based on which it will be possible to classify the observations in our example according to the probabilities estimated for each of them. We return to the estimated probability expression of the final model:

p_i = \frac{1}{1 + e^{-(-30.935 + 0.204 \, dist_i + 2.920 \, sem_i - 3.776 \, per_i + 2.459 \, style3_i)}}

Having calculated the pi values by means of the LateMaximumLikelihoodFinalModel.xls file, we will prepare a table with some observations from our sample. Table 14.8 gives the pi values for 10 randomly chosen observations, solely for teaching purposes.

Table 14.8

pi Values for 10 Observations

Observation    pi
Adelino        0.05444
Carolina       0.67206
Cristina       0.55159
Eduardo        0.81658
Cintia         0.64918
Raimundo       0.05340
Emerson        0.04484
Raquel         0.56702
Rita           0.85048
Leandro        0.46243

A cutoff is defined by the researcher so that the observations can be classified as a function of their calculated probabilities; it is used when there is the desire to prepare predictions of the occurrence of the event for observations not present in the sample, based on the probabilities estimated for the observations present in the sample.

Thus, if a given observation not present in the sample presents a probability of occurrence of the event higher than the defined cutoff, the incidence of the event is expected and the observation will, therefore, be classified as an event. On the other hand, if its probability is lower than the defined cutoff, the incidence of the non-event is expected and the observation will, therefore, be classified as a non-event.

In general, we can stipulate the following criteria:

  • If pi > cutoff → the i observation should be classified as an event.
  • If pi < cutoff → the i observation should be classified as a non-event.

Since the probability expression is estimated based on the observations present in the sample, the classification of other observations not initially present in the sample relies on the behavioral consistency of the estimators; therefore, for inferential purposes, the sample should be significant and representative of the population behavior, as with any confirmatory model.1

The cutoff allows the researcher to compare the real incidence of the event for each observation with the expectation that each observation will, in fact, present the event. This being done, it will be possible to evaluate the model hit rate based on the actual observations present in the sample and, by inference, assume that this hit rate is maintained when there is the desire to evaluate the incidence of the event for other observations not present in the sample (prediction).

Based on the data from the observations presented in Table 14.8, and choosing, for example, a cutoff of 0.5, we can define that:

  • If pi > 0.5 → the i observation should be classified as an event.
  • If pi < 0.5 → the i observation should be classified as a non-event.

Table 14.9 gives, for each of the 10 randomly chosen observations, the real occurrence of the event and its respective classification based on the cutoff definition.

Table 14.9

Real Event Occurrence and Classification for 10 Observations With Cutoff = 0.5

Observation    Event    pi         Classification (Cutoff = 0.5)
Adelino        No       0.05444    No
Carolina       No       0.67206    Yes
Cristina       No       0.55159    Yes
Eduardo        No       0.81658    Yes
Cintia         No       0.64918    Yes
Raimundo       No       0.05340    No
Emerson        No       0.04484    No
Raquel         No       0.56702    Yes
Rita           Yes      0.85048    Yes
Leandro        Yes      0.46243    No

Now we can prepare a new classification table, still based only on these 10 observations, so as to evaluate whether the observations were correctly classified with a cutoff of 0.5 (Table 14.10).

Table 14.10

Classification Table for 10 Observations (Cutoff = 0.5)

                           Real Occurrence of the Event    Real Occurrence of the Non-event
Classified as event        1                               5
Classified as non-event    1                               3

In other words, of these 10 observations, only one was an event and presented a probability higher than 0.5, that is, it was an event and was in fact classified as such (correctly classified). Three other observations were also classified correctly, that is, they were not events and were not classified as events. On the other hand, six observations were classified incorrectly: one was an event but presented a probability lower than 0.5 and, therefore, was not classified as an event, and the other five were not events but presented estimated probabilities higher than 0.5 and, consequently, were classified as events.

For our sample of 100 observations, we can elaborate Table 14.11, which gives the complete classification for the 0.5 cutoff. This table can also be obtained by modeling in Stata and SPSS.

Table 14.11

Classification Table for the Complete Sample (Cutoff = 0.5)

                           Real Occurrence of the Event    Real Occurrence of the Non-event
Classified as event        56                              11
Classified as non-event    3                               30

For the complete sample, we see that 86 observations were correctly classified for a cutoff of 0.5: 56 were events and were in fact classified as such, and another 30 were non-events and were not classified as events with this cutoff. However, 14 observations were incorrectly classified: 3 were events but were not classified as such, and 11 were not events but were classified as if they had been.

This analysis, known as sensitivity analysis, generates classifications that depend on the choice of the cutoff. Further ahead, we will alter the cutoff so as to show that the quantities of observations classified as events or non-events change.

At this time, we will define the concepts of overall model efficiency, sensitivity, and specificity.

The overall model efficiency (OME) corresponds to the percentage of classification hits for a determined cutoff. For our example, the overall model efficiency is calculated as follows:

OME = \frac{56 + 30}{100} = 0.8600

For the 0.5 cutoff, 86.00% of the observations are classified correctly. As mentioned in Section 14.2.2, the overall model efficiency, for a given cutoff, is much more adequate for evaluating model performance than the McFadden pseudo R2, since the dependent variable presents itself in a dichotomous qualitative way.

Sensitivity is the percentage of hits, for a given cutoff, considering only the observations that are, in fact, events. Then, in our example, the denominator for calculating sensitivity is 59, and its expression is given as:

\text{Sensitivity} = \frac{56}{59} = 0.9492

As such, for a cutoff of 0.5, 94.92% of the observations that are events are classified correctly.

Specificity, on the other hand, refers to the percentage of hits, for a given cutoff, considering only the observations that are not events. In our example, the expression is given as:

\text{Specificity} = \frac{30}{41} = 0.7317

As such, 73.17% of the observations that are non-events are classified correctly, that is, they present an estimated probability of occurrence of the event lower than 50% for a cutoff of 0.5.
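For any cutoff, the three measures can be computed directly from the observed outcomes and estimated probabilities; a minimal sketch, again assuming numpy arrays y and p:

import numpy as np

def classification_metrics(y, p, cutoff=0.5):
    predicted = (p > cutoff).astype(int)                  # 1 = classified as event
    hits_event = np.sum((predicted == 1) & (y == 1))      # events correctly classified
    hits_nonevent = np.sum((predicted == 0) & (y == 0))   # non-events correctly classified
    ome = (hits_event + hits_nonevent) / len(y)           # overall model efficiency
    sensitivity = hits_event / np.sum(y == 1)
    specificity = hits_nonevent / np.sum(y == 0)
    return ome, sensitivity, specificity

# For cutoff = 0.5, this reproduces 0.8600, 0.9492, and 0.7317 for the example data.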

Obviously, overall model efficiency, sensitivity, and specificity change when the cutoff value is changed. Table 14.12 presents a new classification for the sample observations, considering a cutoff of 0.3. In this case, we have the following classification criteria:

  • If pi > 0.3 → the i observation should be classified as event.
  • If pi < 0.3 → the i observation should be classified as non-event.

Table 14.12

Classification Table for the Complete Sample (Cutoff = 0.3)

                           Real Occurrence of the Event    Real Occurrence of the Non-event
Classified as event        57                              13
Classified as non-event    2                               28

Overall model efficiency: 0.8500
Sensitivity: 0.9661
Specificity: 0.6829

Comparing these values with those obtained for a cutoff of 0.5, we see, in this case (cutoff of 0.3), that while sensitivity presents a small increase, specificity is reduced somewhat more dramatically, which results, overall, in a reduction of the overall model efficiency percentage.

Now, let’s alter the cutoff once again, which will be, for our example, 0.7. For this new situation, we have the following classification criteria:

  • If pi > 0.7 → the i observation should be classified as event.
  • If pi < 0.7 → the i observation should be classified as non-event.

Table 14.13 shows this new classification, with the calculations for overall model efficiency, sensitivity, and specificity.

Table 14.13

Classification Table for the Complete Sample (Cutoff = 0.7)

                           Real Occurrence of the Event    Real Occurrence of the Non-event
Classified as event        47                              5
Classified as non-event    12                              36

Overall model efficiency: 0.8300
Sensitivity: 0.7966
Specificity: 0.8780

In this case, we see yet another behavior: while sensitivity presents a considerable reduction, specificity increases. We can even see that the hit rate for the events becomes lower than the hit rate for the non-events. The overall model efficiency with a 0.7 cutoff, however, also presents a reduction in relation to the model with a cutoff of 0.5.

This sensitivity analysis can be done with any cutoff value between 0 and 1, which allows the researcher to define a cutoff that meets their prediction objectives. If, for example, the objective is to maximize the overall model efficiency, a given cutoff can be used which, as we know, can generate nonmaximized values of sensitivity or specificity. If, on the other hand, the objective is to maximize sensitivity, that is, the hit rate for the events, a cutoff can be defined that will not necessarily maximize the overall model efficiency. Finally, if there is the desire to maximize the hit rate for the observations that are non-events (specificity), yet another cutoff can be defined.

In other words, the sensitivity analysis is prepared based on the theory underlying each study and takes into consideration the researcher's choices in terms of predicting event occurrence for observations not present in the sample, being, therefore, a managerial and strategic analysis of the phenomenon under investigation. A cutoff sweep of the kind sketched below makes these tradeoffs explicit.
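A minimal sketch of such a sweep follows, assuming hypothetical arrays y (observed 0/1 outcomes) and p_hat (probabilities estimated by the final binary logistic model):

```python
import numpy as np

def cutoff_metrics(y, p_hat, cutoff):
    """Overall efficiency, sensitivity, and specificity for one cutoff."""
    pred = (p_hat > cutoff).astype(int)           # classify as event if p_i > cutoff
    events, non_events = (y == 1), (y == 0)
    sensitivity = (pred[events] == 1).mean()      # hit rate among events
    specificity = (pred[non_events] == 0).mean()  # hit rate among non-events
    overall = (pred == y).mean()
    return overall, sensitivity, specificity

# Hypothetical example data; in practice, y and p_hat come from the estimated model
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
p_hat = np.array([0.9, 0.6, 0.4, 0.2, 0.1, 0.55, 0.8, 0.3])

for c in (0.3, 0.5, 0.7):
    print(c, cutoff_metrics(y, p_hat, c))
```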

In academic work and in management reports from diverse organizations, it is common for sensitivity analysis graphs to be presented and discussed. The most common are those known as the sensitivity curve and the ROC curve, which have distinct purposes. While the sensitivity curve is a graph that presents the sensitivity and specificity values as a function of the different cutoff values, the ROC curve is a graph that presents the variation of sensitivity as a function of (1 − specificity).

We will present the sensitivity curve (Fig. 14.14) and the ROC curve (Fig. 14.15) for the data calculated in our example. Even though they are not complete, since only three cutoff values have been used (0.3, 0.5, and 0.7), these curves already allow some analyses to be formed.

Fig. 14.14 Sensitivity curve for three cutoff values.
Fig. 14.15 ROC curve for three cutoff values.

By means of the sensitivity curve, we can see that it is possible to define a cutoff that matches sensitivity with specificity, that is, the cutoff that causes the rate of correct predictions for the observations that will be events to be equal to the rate of correct predictions for the observations that will not be events. It is important to mention, however, that this cutoff does not guarantee that the overall model efficiency is the maximum possible.

Besides this, the sensitivity curve allows the researcher to evaluate the tradeoff between sensitivity and specificity as the cutoff is altered, since, in many cases, as has been discussed, the objective of the prediction could be to increase the rate of hits for the observations that will be events without a considerable loss in the rate of hits for those that are not events.

The ROC curve shows the actual behavior of the tradeoff between sensitivity and specificity by bringing, on the abscissa axis, the values of (1 − specificity), presenting a convex format in relation to the point (0, 1). As such, a model with a greater area below the ROC curve presents greater overall prediction efficiency, combining all of the cutoff possibilities, and its choice should therefore be preferred over another model with a smaller area below the ROC curve. In other words, if a researcher wants, for example, to include new explanatory variables in the model, a comparison of the overall performance of the models can be prepared based on the area below the ROC curve: the greater its convexity in relation to the point (0, 1), the greater its area (higher sensitivity and higher specificity) and, consequently, the better the estimated model for prediction purposes. Fig. 14.16 presents an illustration of this concept.

Fig. 14.16 Model choice criteria with greater area below the ROC curve.
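The area below the ROC curve can also be computed directly from observed outcomes and predicted probabilities. Below is a minimal sketch using scikit-learn, assuming the same hypothetical y and p_hat arrays as in the previous sketch:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical observed outcomes and predicted probabilities
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
p_hat = np.array([0.9, 0.6, 0.4, 0.2, 0.1, 0.55, 0.8, 0.3])

fpr, tpr, thresholds = roc_curve(y, p_hat)  # fpr = 1 - specificity; tpr = sensitivity
auc = roc_auc_score(y, p_hat)               # area below the ROC curve
print(auc)
```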

According to Swets (1996), the ROC curve has this name because it compares the alteration of two model operating characteristics (sensitivity and specificity). It was first used by engineers during the Second World War in studies to detect enemy objects in battle. It was next introduced in psychology to investigate the perceptual detection of determined stimuli and is, today, widely used in the field of medicine, such as radiology, and in different fields of applied social science, such as economics and finance. In these latter fields, it is used considerably in risk and credit management and in estimating the probability of default.

In Sections 14.4 and 14.5, we will present the sensitivity and ROC curves by means of Stata and SPSS, respectively, with all cutoff value possibilities between 0 and 1 for the final estimated model, including the calculation of the respective area below the ROC curve.

14.3 The Multinomial Logistic Regression Model

When the dependent variable that represents the phenomenon under study is qualitative, but offers more than two possible answers (categories), we should use the multinomial logistic regression to estimate the occurrence probabilities for each alternative. To do this, we must first define the reference category.

Imagine a situation in which the dependent variable presents itself in a qualitative form with three possible answer categories (0, 1, or 2). If the chosen reference category is category 0, we will have two other event possibilities in relation to this category, which will be represented by categories 1 and 2; as such, two explanatory variable vectors will be defined with their respective estimated parameters, that is, two logits, as follows:

$$Z_{i1} = \alpha_1 + \beta_{11}X_{1i} + \beta_{21}X_{2i} + \ldots + \beta_{k1}X_{ki} \qquad (14.23)$$

$$Z_{i2} = \alpha_2 + \beta_{12}X_{1i} + \beta_{22}X_{2i} + \ldots + \beta_{k2}X_{ki} \qquad (14.24)$$

where the logit number now appears in the subscript of each parameter to be estimated.

Then, generically, if the dependent variable that represents the phenomenon under study presents M answer categories, the number of estimated logits will be (M − 1) and, based on them, we can estimate the occurrence probability for each of the categories. The general expression of the logit Zim (m = 0, 1, …, M − 1) for a model where the dependent variable assumes M answer categories is:

$$Z_{im} = \alpha_m + \beta_{1m}X_{1i} + \beta_{2m}X_{2i} + \ldots + \beta_{km}X_{ki} \qquad (14.25)$$

where Zi0 = 0 and, therefore, e^(Zi0) = 1.

Until now, in this chapter, we have been working with two categories and, consequently, only one Zi logit. In this way, the probabilities of the occurrence of a non-event and an event were calculated, respectively, by means of the following expressions:

Probability of occurrence of the non-event:

$$1 - p_i = \frac{1}{1 + e^{Z_i}} \qquad (14.26)$$

Probability of occurrence of the event:

$$p_i = \frac{e^{Z_i}}{1 + e^{Z_i}} \qquad (14.27)$$

Now for three categories, and based on Expressions (14.23) and (14.24), we can estimate the probability of occurrence for reference category 0 and the occurrence probabilities of the two distinct events represented by categories 1 and 2. As such, the expressions for these probabilities can be written in the following way:

Probability of occurrence for category 0 (reference):

$$p_{i0} = \frac{1}{1 + e^{Z_{i1}} + e^{Z_{i2}}} \qquad (14.28)$$

Probability of occurrence for category 1:

$$p_{i1} = \frac{e^{Z_{i1}}}{1 + e^{Z_{i1}} + e^{Z_{i2}}} \qquad (14.29)$$

Probability of occurrence for category 2:

$$p_{i2} = \frac{e^{Z_{i2}}}{1 + e^{Z_{i1}} + e^{Z_{i2}}} \qquad (14.30)$$

such that the sum of the occurrence probabilities of the events represented by the distinct categories will always be 1.

In their complete form, Expressions (14.28)–(14.30) can be written as:

$$p_{i0} = \frac{1}{1 + e^{(\alpha_1+\beta_{11}X_{1i}+\beta_{21}X_{2i}+\ldots+\beta_{k1}X_{ki})} + e^{(\alpha_2+\beta_{12}X_{1i}+\beta_{22}X_{2i}+\ldots+\beta_{k2}X_{ki})}} \qquad (14.31)$$

$$p_{i1} = \frac{e^{(\alpha_1+\beta_{11}X_{1i}+\beta_{21}X_{2i}+\ldots+\beta_{k1}X_{ki})}}{1 + e^{(\alpha_1+\beta_{11}X_{1i}+\beta_{21}X_{2i}+\ldots+\beta_{k1}X_{ki})} + e^{(\alpha_2+\beta_{12}X_{1i}+\beta_{22}X_{2i}+\ldots+\beta_{k2}X_{ki})}} \qquad (14.32)$$

$$p_{i2} = \frac{e^{(\alpha_2+\beta_{12}X_{1i}+\beta_{22}X_{2i}+\ldots+\beta_{k2}X_{ki})}}{1 + e^{(\alpha_1+\beta_{11}X_{1i}+\beta_{21}X_{2i}+\ldots+\beta_{k1}X_{ki})} + e^{(\alpha_2+\beta_{12}X_{1i}+\beta_{22}X_{2i}+\ldots+\beta_{k2}X_{ki})}} \qquad (14.33)$$

In general, for a model where the dependent variable assumes M answer categories, we can write the probability expression pim (m = 0, 1, …, M − 1) as follows:

$$p_{im} = \frac{e^{Z_{im}}}{\sum_{m=0}^{M-1} e^{Z_{im}}} \qquad (14.34)$$
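Expression (14.34) is straightforward to evaluate numerically once the logits are known. A minimal sketch, assuming hypothetical logit values for one observation with three categories (recalling that Zi0 = 0 for the reference category):

```python
import numpy as np

# Hypothetical logits for one observation: Z_i0 = 0 (reference), Z_i1, Z_i2
Z = np.array([0.0, 1.2, -0.4])

# Expression (14.34): p_im = exp(Z_im) / sum over m of exp(Z_im)
p = np.exp(Z) / np.exp(Z).sum()
print(p, p.sum())  # probabilities for categories 0, 1, 2; they sum to 1
```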

Analogous to the procedure developed in Sections 14.2.1–14.2.3, we will now estimate the parameters of Expressions (14.23) and (14.24) by using an example. We will also evaluate the general statistical significance of the model and of its parameters, as well as estimate their confidence intervals at a determined significance level. To do so, we will once again use Excel.

14.3.1 Estimation of the Multinomial Logistic Regression Model by Maximum Likelihood

We will present the concepts pertinent to estimation of a multinomial logistic regression by maximum likelihood using an example similar to that developed in the previous section.

Now, imagine that our tireless professor is no longer interested only in studying whether or not students arrive late to school. He now wants to know whether the students arrive late to their first or to their second class. In other words, the professor is now interested in investigating whether some variables relative to the route taken influence the probability of not arriving late, arriving late to the first class, or arriving late to the second class. The dependent variable now has three categories: not arriving late, arriving late to the first class, and arriving late to the second class.

The professor thus surveyed the same 100 students of the school where he lectures; however, the research was done on another day. Since some students were a little tired of answering so many questions, the professor, besides the variable referring to the phenomenon under study, decided to ask only about the distance traveled (dist) and the number of traffic lights (sem) each went through that day on the way to school. Part of the dataset can be found in Table 14.14.

Table 14.14

Example: Late (No, Yes to First Class or Yes to Second Class) × Distance Traveled and Number of Traffic Lights

Columns: Yi = arrived late to school (No = 0; Yes to first class = 1; Yes to second class = 2); X1i = distance traveled to school (km); X2i = number of traffic lights (sem).

Student        Yi   X1i    X2i
Gabriela       2    20.5   15
Patricia       2    21.3   18
Gustavo        2    21.4   16
Leticia        2    31.5   15
Luiz Ovidio    2    17.5   16
Leonor         2    21.5   18
Dalila         2    21.5   18
Antonio        2    23.4   18
Julia          2    22.7   18
Mariana        2    22.7   18
Rodrigo        1    16.0   16
Estela         0    1.0    13
…

As we can see, the dependent variable now has three distinct values, which are nothing more than labels referring to each of the three answer categories (M = 3). It is unfortunately common for beginning researchers to prepare multiple regression models, for example, assuming that the dependent variable is quantitative just because its column presents numbers. As we discussed in the previous section, this is a serious mistake!

The complete dataset for this new example can be found in the LateMultinomial.xls file.

The expressions for the logits we wish to estimate are, therefore:

$$Z_{i1} = \alpha_1 + \beta_{11}\,dist_i + \beta_{21}\,sem_i$$

$$Z_{i2} = \alpha_2 + \beta_{12}\,dist_i + \beta_{22}\,sem_i$$

which refer to events 1 and 2, respectively, presented in Table 14.14. Notice that the event represented by the label 0 refers to the reference category.

Then, based on Expressions (14.31)–(14.33), we can write the estimated occurrence probability expressions for the event corresponding to each category of the dependent variable. Thus, we have:

$$p_{i0} = \frac{1}{1 + e^{(\alpha_1+\beta_{11}dist_i+\beta_{21}sem_i)} + e^{(\alpha_2+\beta_{12}dist_i+\beta_{22}sem_i)}}$$

$$p_{i1} = \frac{e^{(\alpha_1+\beta_{11}dist_i+\beta_{21}sem_i)}}{1 + e^{(\alpha_1+\beta_{11}dist_i+\beta_{21}sem_i)} + e^{(\alpha_2+\beta_{12}dist_i+\beta_{22}sem_i)}}$$

$$p_{i2} = \frac{e^{(\alpha_2+\beta_{12}dist_i+\beta_{22}sem_i)}}{1 + e^{(\alpha_1+\beta_{11}dist_i+\beta_{21}sem_i)} + e^{(\alpha_2+\beta_{12}dist_i+\beta_{22}sem_i)}}$$

where pi0, pi1, and pi2 represent the probability that a student i will not arrive late (category 0), the probability that a student i will arrive late to the first class (category 1), and the probability that a student i will arrive late to the second class (category 2), respectively.

To estimate the parameters of the probability expressions, we will again use estimation by maximum likelihood. Generically, in the multinomial logistic regression, where the dependent variable follows a multinomial distribution, an observation i can occur in a determined event of interest, given M possible events; therefore, the occurrence probability pim (m = 0, 1, …, M − 1) for this specific event can be written in the following manner:

$$p(Y_{im}) = \prod_{m=0}^{M-1}(p_{im})^{Y_{im}} \qquad (14.35)$$

For a sample with n observations, we can define the likelihood function in the following way:

$$L = \prod_{i=1}^{n}\prod_{m=0}^{M-1}(p_{im})^{Y_{im}} \qquad (14.36)$$

from which comes, based on Expression (14.34), that:

$$L = \prod_{i=1}^{n}\prod_{m=0}^{M-1}\left(\frac{e^{Z_{im}}}{\sum_{m=0}^{M-1}e^{Z_{im}}}\right)^{Y_{im}} \qquad (14.37)$$

Analogous to the procedure adopted when studying the binary logistic regression, we will here work with the logarithmic likelihood function, which leads us to the following function, also known as log likelihood function:

$$LL = \sum_{i=1}^{n}\sum_{m=0}^{M-1}\left[(Y_{im})\ln\left(\frac{e^{Z_{im}}}{\sum_{m=0}^{M-1}e^{Z_{im}}}\right)\right] \qquad (14.38)$$

And, therefore, we can ask an important question: given M categories of the dependent variable, what are the values of the parameters of the logits Zim (m = 0, 1, …, M − 1) represented by Expression (14.25) that cause the LL value of Expression (14.38) to be maximized? This fundamental question is the main key to the estimation of the parameters of the multinomial logistic regression model by the maximum likelihood method, and it can be answered with the use of numerical optimization tools, so as to solve the problem with the following objective function:

$$LL = \sum_{i=1}^{n}\sum_{m=0}^{M-1}\left[(Y_{im})\ln\left(\frac{e^{Z_{im}}}{\sum_{m=0}^{M-1}e^{Z_{im}}}\right)\right] = \max \qquad (14.39)$$

Returning to our example, we will solve this problem using the Excel Solver tool. To do this, we should open the LateMultinomialMaximumLikelihood.xls file, which will help in the parameter calculation.
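For readers who prefer to check this maximization outside Excel, a minimal sketch in Python follows. It assumes hypothetical arrays X (columns dist and sem) and y (labels 0, 1, 2) standing in for the full LateMultinomial.xls dataset:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical illustrative rows; in practice, the 100 observations of dist,
# sem, and the 0/1/2 outcome would be loaded from the LateMultinomial dataset
X = np.array([[20.5, 15], [21.3, 18], [16.0, 16], [1.0, 13], [5.0, 14], [17.5, 16]])
y = np.array([2, 2, 1, 0, 0, 2])

def negative_ll(theta, X, y):
    # theta = [alpha1, beta11, beta21, alpha2, beta12, beta22]
    n = X.shape[0]
    Z = np.zeros((n, 3))                 # Z_i0 = 0 for the reference category
    Z[:, 1] = theta[0] + X @ theta[1:3]  # logit Z_i1
    Z[:, 2] = theta[3] + X @ theta[4:6]  # logit Z_i2
    # log of Expression (14.34), computed stably via log-sum-exp
    z_max = Z.max(axis=1, keepdims=True)
    log_p = Z - z_max - np.log(np.exp(Z - z_max).sum(axis=1, keepdims=True))
    return -log_p[np.arange(n), y].sum() # negative of LL in Expression (14.38)

res = minimize(negative_ll, x0=np.zeros(6), args=(X, y), method="BFGS")
print(res.x, -res.fun)                   # parameter estimates and LLmax
```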

In this file, besides the dependent variable and the explanatory variables, three variables Yim (m = 0, 1, 2) were created, referring to the three categories of the dependent variable. This procedure should be done so as to validate Expression (14.35). These variables were created based on the criteria presented in Table 14.15.

Table 14.15

Criteria for Creation of Variables Yim (m = 0, 1, 2)

Yi   Yi0   Yi1   Yi2
0    1     0     0
1    0     1     0
2    0     0     1
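These indicator variables are simply a one-hot encoding of the dependent variable, which can be generated in one line. A sketch, assuming a hypothetical label vector y:

```python
import numpy as np

y = np.array([2, 2, 1, 0])   # hypothetical Yi labels
Y = np.eye(3, dtype=int)[y]  # columns Yi0, Yi1, Yi2, following Table 14.15
print(Y)
```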

Besides this, six other new variables were also created, corresponding to the logits Zi1 and Zi2, the probabilities pi0, pi1, and pi2, and the logarithmic likelihood function LLi of each observation, respectively. Table 14.16 shows part of the data when all parameters are equal to 0.

Table 14.16

Calculation of LL When α1 = β11 = β21 = α2 = β12 = β22 = 0

Student       Yi   Yi0  Yi1  Yi2  X1i    X2i   Zi1   Zi2   pi0    pi1    pi2    LLi = Σm (Yim)·ln(pim)
Gabriela      2    0    0    1    20.5   15    0     0     0.33   0.33   0.33   −1.09861
Patricia      2    0    0    1    21.3   18    0     0     0.33   0.33   0.33   −1.09861
Gustavo       2    0    0    1    21.4   16    0     0     0.33   0.33   0.33   −1.09861
Leticia       2    0    0    1    31.5   15    0     0     0.33   0.33   0.33   −1.09861
Luiz Ovidio   2    0    0    1    17.5   16    0     0     0.33   0.33   0.33   −1.09861
Leonor        2    0    0    1    21.5   18    0     0     0.33   0.33   0.33   −1.09861
Dalila        2    0    0    1    21.5   18    0     0     0.33   0.33   0.33   −1.09861
Antonio       2    0    0    1    23.4   18    0     0     0.33   0.33   0.33   −1.09861
Julia         2    0    0    1    22.7   18    0     0     0.33   0.33   0.33   −1.09861
Mariana       2    0    0    1    22.7   18    0     0     0.33   0.33   0.33   −1.09861
Rodrigo       1    0    1    0    16.0   16    0     0     0.33   0.33   0.33   −1.09861
Estela        0    1    0    0    1.0    13    0     0     0.33   0.33   0.33   −1.09861
…
Sum: LL = Σ(i=1..100) Σ(m=0..2) (Yim)·ln(pim) = −109.86123

Exclusively for teaching purposes, we present the calculation of LL for an observation where Yi = 2 and where all parameters are equal to zero:

$$LL_1 = \sum_{m=0}^{2}\left[(Y_{1m})\ln(p_{1m})\right] = (Y_{10})\ln(p_{10}) + (Y_{11})\ln(p_{11}) + (Y_{12})\ln(p_{12}) = (0)\ln(0.33) + (0)\ln(0.33) + (1)\ln(0.33) = -1.09861$$

Fig. 14.17 presents part of the data present in the LateMultinomialMaximumLikelihood.xls file.

Fig. 14.17 Data from LateMultinomialMaximumLikelihood.xls file.

As discussed in Section 14.2.1, we should also find here an optimum combination of parameter values, such that the objective function presented in Expression (14.39) is obeyed, that is, such that the sum of the logarithmic likelihood function is the maximum possible. We again resort to Excel Solver to solve this problem.

The objective function is in cell M103, which will be our destination cell and should be maximized. The parameters α1, β11, β21, α2, β12, and β22, whose values are in cells P3, P5, P7, P9, P11, and P13, respectively, are the variable cells. The Solver window will be as shown in Fig. 14.18.

Fig. 14.18 Solver—maximization of the sum of the logarithmic likelihood function for the multinomial logistic regression model.

In clicking on Solve and then OK, we will obtain the optimum solution to the optimization problem. Table 14.17 shows part of the data obtained.

Table 14.17

Values Obtained for the Maximization of LL for the Multinomial Logistic Regression Model

Student       Yi   Yi0  Yi1  Yi2  X1i    X2i   Zi1         Zi2         pi0       pi1       pi2       LLi = Σm (Yim)·ln(pim)
Gabriela      2    0    0    1    20.5   15    3.37036     3.23816     0.01799   0.52341   0.45860   −0.77959
Patricia      2    0    0    1    21.3   18    8.82883     12.78751    0.00000   0.01873   0.98127   −0.01891
Gustavo       2    0    0    1    21.4   16    5.54391     7.10441     0.00068   0.17346   0.82586   −0.19133
Leticia       2    0    0    1    31.5   15    9.51977     15.10301    0.00000   0.00375   0.99625   −0.00375
Luiz Ovidio   2    0    0    1    17.5   16    3.36367     2.89778     0.02082   0.60162   0.37756   −0.97402
Leonor        2    0    0    1    21.5   18    8.94064     13.00323    0.00000   0.01691   0.98308   −0.01706
Dalila        2    0    0    1    21.5   18    8.94064     13.00323    0.00000   0.01691   0.98308   −0.01706
Antonio       2    0    0    1    23.4   18    10.00281    15.05262    0.00000   0.00637   0.99363   −0.00639
Julia         2    0    0    1    22.7   18    9.61149     14.29758    0.00000   0.00914   0.99086   −0.00918
Mariana       2    0    0    1    22.7   18    9.61149     14.29758    0.00000   0.00914   0.99086   −0.00918
Rodrigo       1    0    1    0    16.0   16    2.52511     1.27985     0.05852   0.73104   0.21044   −0.31329
Estela        0    1    0    0    1.0    13    −10.87168   −23.58594   0.99998   0.00002   0.00000   −0.00002
…
Sum: LL = Σ(i=1..100) Σ(m=0..2) (Yim)·ln(pim) = −24.51180

The maximum value possible for the logarithmic likelihood function is LLmax = − 24.51180. The solution to this problem generated the following parameter estimates:

  • α1 = − 33.135
  • β11 = 0.559
  • β21 = 1.670
  • α2 = − 62.292
  • β12 = 1.078
  • β22 = 2.895

and, in this way, the logits Zi1 and Zi2 can be written as follows:

$$Z_{i1} = -33.135 + 0.559\,dist_i + 1.670\,sem_i$$

$$Z_{i2} = -62.292 + 1.078\,dist_i + 2.895\,sem_i$$

Fig. 14.19 presents part of the results obtained by modeling the LateMultinomialMaximumLikelihood.xls file.

Fig. 14.19 Estimating parameters through LL maximization by Solver—multinomial logistic regression model.

Based on the expressions of the logits Zi1 and Zi2, we can write the expressions of the occurrence probabilities for each of the categories of the dependent variable as follows:

Probability of a student i not arriving late (category 0):

$$p_{i0} = \frac{1}{1 + e^{(-33.135+0.559\,dist_i+1.670\,sem_i)} + e^{(-62.292+1.078\,dist_i+2.895\,sem_i)}}$$

Probability of a student i arriving late to the first class (category 1):

$$p_{i1} = \frac{e^{(-33.135+0.559\,dist_i+1.670\,sem_i)}}{1 + e^{(-33.135+0.559\,dist_i+1.670\,sem_i)} + e^{(-62.292+1.078\,dist_i+2.895\,sem_i)}}$$

Probability of a student i arriving late to the second class (category 2):

$$p_{i2} = \frac{e^{(-62.292+1.078\,dist_i+2.895\,sem_i)}}{1 + e^{(-33.135+0.559\,dist_i+1.670\,sem_i)} + e^{(-62.292+1.078\,dist_i+2.895\,sem_i)}}$$

Having estimated by maximum likelihood the parameters of the occurrence probability equations for each of the categories of the dependent variable, we can prepare the classification of the observations and define the overall efficiency of the multinomial logistic regression model. Different from the binary logistic regression, where the classification is prepared based on the definition of a cutoff, in the multinomial logistic regression the classification of each observation is based on the highest probability among those calculated (pi0, pi1, or pi2). As such, since observation 1 (Gabriela) presented pi0 = 0.018, pi1 = 0.523, and pi2 = 0.459, we should classify it as category 1; that is, it is expected that Gabriela will arrive late to the first class. However, we can see that this student actually arrived late to the second class and, therefore, for this case, we did not obtain a hit.
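In code, this classification rule is simply the category with the highest estimated probability. A minimal sketch, assuming a hypothetical matrix P whose columns hold pi0, pi1, and pi2:

```python
import numpy as np

# Hypothetical estimated probabilities (rows: observations; columns: categories 0, 1, 2)
P = np.array([[0.018, 0.523, 0.459],    # Gabriela: classified as category 1
              [0.000, 0.019, 0.981]])   # Patricia: classified as category 2

predicted_category = P.argmax(axis=1)   # the highest probability wins
print(predicted_category)               # [1 2]
```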

Table 14.18 presents the classification for our complete sample, with emphasis on the hits for each category of the dependent variable, highlighting as well the overall model efficiency (overall percentage of hits).

Table 14.18

Classification Table for Complete Sample

Observed                        Classified as Did Not   Classified as Late    Classified as Late    Percentage of
                                Arrive Late             to First Class        to Second Class       Hits (%)
Did not arrive late             47                      2                     0                     95.9
Arrived late to first class     1                       12                    3                     75.0
Arrived late to second class    0                       5                     30                    85.7
Overall model efficiency                                                                            89.0

By analyzing Table 14.18, we can see that the model presents an overall percentage of hits of 89.0%. The model presents its highest percentage of hits (95.9%) for the students who did not arrive late to class. On the other hand, for the students who arrived late to the first class, the model presents a lower percentage of hits (75.0%).

We now go on to the study of the general statistical significance of the obtained model, as well as the statistical significance of the actual parameters, as we did in Section 14.2.

14.3.2 General Statistical Significance of the Multinomial Logistic Regression Model and Each of Its Parameters

As in the binary logistic regression studied in Section 14.2, multinomial logistic regression modeling also offers statistics referent to the McFadden pseudo R2 and to χ2, whose calculations are given based on Expressions (14.16) and (14.17), respectively, given again here:

$$\text{pseudo }R^2 = \frac{-2LL_0 - (-2LL_{max})}{-2LL_0} \qquad (14.40)$$

$$\chi^2 = -2(LL_0 - LL_{max}) \qquad (14.41)$$

While the McFadden pseudo R², as discussed in Section 14.2.2, is quite limited in terms of information regarding model fit, though it can be used when the researcher is interested in comparing distinct models, the χ² statistic allows a verification test to be performed on the proposed model, since, if all estimated parameters βjm (j = 1, 2, …, k; m = 1, 2, …, M − 1) are statistically equal to 0, changes in the explanatory variables will not influence in any way the occurrence probabilities of the events represented by the categories of the dependent variable. The null and alternative hypotheses of the χ² test for a general multinomial logistic regression model are:

  • H0: β11 = β21 = … = βk1 = β12 = β22 = … = βk2 = … = β1,M−1 = β2,M−1 = … = βk,M−1 = 0
  • H1: there is at least one βjm ≠ 0

Returning to our example, we have that LLmax, which is the maximum possible value of the sum of the logarithmic likelihood function, is equal to − 24.51180. To calculate LL0, which represents the maximum possible value of the sum of the logarithmic likelihood function for a model that only presents the constants α1 and α2 and no explanatory variable, we will again use Solver, by means of the LateMultinomialMaximumLikelihoodNullModel.xls file. Figs. 14.20 and 14.21 show the Solver window and part of the results obtained by modeling in this file, respectively.

Fig. 14.20 Solver—maximization of the sum of the logarithmic likelihood function for the multinomial logistic regression null model.
Fig. 14.21 Estimating parameters through LL maximization by Solver—multinomial logistic regression null model.

Based on the null model, we have LL0 = − 101.01922 and, as such, we can calculate the following statistics:

$$\text{pseudo }R^2 = \frac{-2(-101.01922) - [-2(-24.51180)]}{-2(-101.01922)} = 0.7574$$

$$\chi^2_{4\ \text{d.f.}} = -2[-101.01922 - (-24.51180)] = 153.0148$$

For 4 degrees of freedom (the number of β parameters, since there are two explanatory variables and two logits), we have, by means of Table D in the Appendix, that χc² = 9.488 (critical χ² for 4 degrees of freedom at the 5% significance level). Since the calculated statistic χcal² = 153.0148 > χc² = 9.488, we can reject the null hypothesis that all βjm (j = 1, 2; m = 1, 2) parameters are statistically equal to zero. Hence, at least one X variable is statistically significant for explaining the occurrence probability of at least one of the events under study. In the same way as discussed in Section 14.2.2, we can define the following criterion:

  • If P-value (or Sig. χcal2 or Prob. χcal2) < 0.05, there is at least one βjm ≠ 0.
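This criterion can be checked numerically. A short sketch using scipy, with the LL values obtained for our example:

```python
from scipy.stats import chi2

ll_0, ll_max, df = -101.01922, -24.51180, 4

pseudo_r2 = (-2 * ll_0 - (-2 * ll_max)) / (-2 * ll_0)  # Expression (14.40): 0.7574
chi2_stat = -2 * (ll_0 - ll_max)                       # Expression (14.41): 153.0148
p_value = chi2.sf(chi2_stat, df)                       # P(chi2 with 4 d.f. > statistic)

print(pseudo_r2, chi2_stat, p_value)  # p_value far below 0.05: reject H0
```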

Besides the general statistical significance of the model, it is necessary to verify the statistical significance of each parameter by means of the respective Wald z statistics, whose null and alternative hypotheses are, for the parameters αm (m = 1, 2, …, M − 1) and βjm (j = 1, 2, …, k; m = 1, 2, …, M − 1):

  • H0: αm = 0
  • H1: αm ≠ 0
  • H0: βjm = 0
  • H1: βjm ≠ 0, respectively

The Wald z statistics are obtained based on Expression (14.18); however, maintaining the pattern of Section 14.2.2, we will not demonstrate the calculation of the standard error of each parameter, which, for our example, are:

  • s.e. (α1) = 12.183
  • s.e. (β11) = 0.243
  • s.e. (β21) = 0.577
  • s.e. (α2) = 14.675
  • s.e. (β12) = 0.302
  • s.e. (β22) = 0.686

Then, as we have already estimated the parameters, we have that:

$$z_{\alpha_1} = \frac{\alpha_1}{s.e.(\alpha_1)} = \frac{-33.135}{12.183} = -2.720$$

$$z_{\beta_{11}} = \frac{\beta_{11}}{s.e.(\beta_{11})} = \frac{0.559}{0.243} = 2.300$$

$$z_{\beta_{21}} = \frac{\beta_{21}}{s.e.(\beta_{21})} = \frac{1.670}{0.577} = 2.894$$

$$z_{\alpha_2} = \frac{\alpha_2}{s.e.(\alpha_2)} = \frac{-62.292}{14.675} = -4.244$$

$$z_{\beta_{12}} = \frac{\beta_{12}}{s.e.(\beta_{12})} = \frac{1.078}{0.302} = 3.570$$

$$z_{\beta_{22}} = \frac{\beta_{22}}{s.e.(\beta_{22})} = \frac{2.895}{0.686} = 4.220$$

As we can see, all calculated Wald z statistics present values lower than zc = −1.96 or greater than zc = 1.96 (critical values at the 5% significance level, with probabilities of 0.025 in the lower tail and in the upper tail).
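The corresponding two-sided P-values can be obtained from the standard normal distribution. A short sketch, using the coefficients and standard errors listed above:

```python
import numpy as np
from scipy.stats import norm

coef = np.array([-33.135, 0.559, 1.670, -62.292, 1.078, 2.895])
se = np.array([12.183, 0.243, 0.577, 14.675, 0.302, 0.686])

z = coef / se                          # Wald z statistics
p = 2 * norm.sf(np.abs(z))             # two-sided P-values
print(np.round(z, 3), np.round(p, 4))  # all P-values below 0.05
```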

As such, we can see, for our example, that the criteria:

  • If P-value (or Sig. zcal or Prob. zcal) < 0.05 for αm, αm ≠ 0
  • and
  • If P-value (or Sig. zcal or Prob. zcal) < 0.05 for βjm, βjm ≠ 0

are obeyed. In other words, the dist and sem variables are statistically significant, at the 95% confidence level, to explain the differences in the probabilities of arriving late to the first class and to the second class in relation to not arriving late. The expressions for these probabilities are those estimated in Section 14.3.1 and presented at its end.

As such, based on the final estimated probability models, we can propose three interesting questions, as we did in Section 14.2.2:

What is the average estimated probability of arriving late to the first class after traveling 17 kilometers and going through 15 traffic lights?

Since arriving late to the first class is category 1, we should use the estimated probability expression pi1. As such, for this situation, we have that:

$$p_1 = \frac{e^{[-33.135+0.559(17)+1.670(15)]}}{1 + e^{[-33.135+0.559(17)+1.670(15)]} + e^{[-62.292+1.078(17)+2.895(15)]}} = 0.722$$

Then, the average estimated probability of arriving late to the first class is, under the informed conditions, equal to 72.2%.

On average, how much does one alter the chance of arriving late to the first class, in relation to not arriving late to school, in adopting a route that is 1 kilometer longer, maintaining the remaining conditions constant?

To answer this question, we will again resort to Expression (14.3), which can be written as follows:

$$\text{odds}_{Y_i=1} = e^{Z_{i1}} \qquad (14.42)$$

such that, maintaining the remaining conditions constant, the chance of arriving late to the first class in relation to not arriving late to school, when adopting a route that is 1 kilometer longer, is:

$$\text{odds}_{Y_i=1} = e^{0.559} = 1.749$$

Then, the chance is multiplied by a factor of 1.749, or rather, maintaining the other conditions constant, the chance of arriving late to the first class in relation to not arriving late, by adopting a route that is 1 kilometer longer, is, on average, 74.9% higher. In multinomial logistic regression models, chance (odds ratio) is also known as relative risk ratio.

On average, how much does one alter the chance of arriving late to the second class, in relation to not arriving late to school, in going through 1 more traffic light, maintaining the remaining conditions constant?

In this case, since the event of interest refers to the category of arriving late to the second class, the chance expression becomes:

$$\text{odds}_{Y_i=2} = e^{2.895} = 18.081$$

Then, the chance is multiplied by a factor of 18.081, or rather, maintaining the remaining conditions constant, the chance of arriving late to the second class in relation to not arriving late to school, in going through 1 more traffic light in the route to school, is, on average, 1708.1% higher.
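A short sketch verifies these three answers numerically, hard-coding the parameter estimates obtained in Section 14.3.1:

```python
import numpy as np

# Estimated logit parameters from Section 14.3.1
a1, b11, b21 = -33.135, 0.559, 1.670   # logit Z_i1 (late to first class)
a2, b12, b22 = -62.292, 1.078, 2.895   # logit Z_i2 (late to second class)

dist, sem = 17, 15
z1 = a1 + b11 * dist + b21 * sem
z2 = a2 + b12 * dist + b22 * sem

p1 = np.exp(z1) / (1 + np.exp(z1) + np.exp(z2))
print(round(p1, 2))           # roughly 0.72: probability of arriving late to the first class

print(round(np.exp(b11), 3))  # 1.749: chance multiplier per extra kilometer (category 1)
print(round(np.exp(b22), 3))  # 18.081: chance multiplier per extra traffic light (category 2)
```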

As we can see, these calculations always use the average parameter estimates. As we did in Section 14.2, we will now go on to the study of the confidence intervals for these parameters.

14.3.3 Construction of the Confidence Intervals of the Parameters for the Multinomial Logistic Regression Model

The confidence intervals for the estimated parameters in a multinomial logistic regression are also calculated by means of Expression (14.21) presented in Section 14.2.3. Then, at the 95% confidence level, they can be defined, for parameters αm (m = 1, 2, …, M − 1) and βjm (j = 1, 2, …, k; m = 1, 2, …, M − 1) in the following way:

$$\alpha_m \pm 1.96[s.e.(\alpha_m)] \qquad \beta_{jm} \pm 1.96[s.e.(\beta_{jm})] \qquad (14.43)$$

where 1.96 is the critical value zc for the 5% significance level.

For the data in our example, Table 14.19 presents the estimated coefficients of parameters αm (m = 1, 2) and βjm (j = 1, 2; m = 1, 2) of the occurrence probability expressions in the events of interest, with the respective standard errors, the Wald z statistics, and the confidence intervals for the 5% significance level.

Table 14.19

Confidence Interval Calculation for the Multinomial Logistic Regression Parameters

Parameter             Coefficient   Standard Error (s.e.)   z        95% CI Lower   95% CI Upper
α1 (constant)         −33.135       12.183                  −2.720   −57.014        −9.256
β11 (dist variable)   0.559         0.243                   2.300    0.082          1.035
β21 (sem variable)    1.670         0.577                   2.894    0.539          2.800
α2 (constant)         −62.292       14.675                  −4.244   −91.055        −33.529
β12 (dist variable)   1.078         0.302                   3.570    0.486          1.671
β22 (sem variable)    2.895         0.686                   4.220    1.550          4.239

Note: the interval limits are given by αm ± 1.96·[s.e.(αm)] and βjm ± 1.96·[s.e.(βjm)].

As we already know, no confidence interval contains zero and, based on their values, we can write the lower (minimum) and upper (maximum) limits of the estimated occurrence probabilities for each of the categories of the dependent variable.

Confidence Interval (95%) of estimated probability of student i to not arrive late (category 0):

$$p_{i0_{min}} = \frac{1}{1 + e^{(-57.014+0.082\,dist_i+0.539\,sem_i)} + e^{(-91.055+0.486\,dist_i+1.550\,sem_i)}}$$

$$p_{i0_{max}} = \frac{1}{1 + e^{(-9.256+1.035\,dist_i+2.800\,sem_i)} + e^{(-33.529+1.671\,dist_i+4.239\,sem_i)}}$$

Confidence Interval (95%) of the estimated probability student i arrives late to first class (category 1):

$$p_{i1_{min}} = \frac{e^{(-57.014+0.082\,dist_i+0.539\,sem_i)}}{1 + e^{(-57.014+0.082\,dist_i+0.539\,sem_i)} + e^{(-91.055+0.486\,dist_i+1.550\,sem_i)}}$$

$$p_{i1_{max}} = \frac{e^{(-9.256+1.035\,dist_i+2.800\,sem_i)}}{1 + e^{(-9.256+1.035\,dist_i+2.800\,sem_i)} + e^{(-33.529+1.671\,dist_i+4.239\,sem_i)}}$$

Confidence Interval (95%) of the estimated probability student i arrives late to second class (category 2):

$$p_{i2_{min}} = \frac{e^{(-91.055+0.486\,dist_i+1.550\,sem_i)}}{1 + e^{(-57.014+0.082\,dist_i+0.539\,sem_i)} + e^{(-91.055+0.486\,dist_i+1.550\,sem_i)}}$$

$$p_{i2_{max}} = \frac{e^{(-33.529+1.671\,dist_i+4.239\,sem_i)}}{1 + e^{(-9.256+1.035\,dist_i+2.800\,sem_i)} + e^{(-33.529+1.671\,dist_i+4.239\,sem_i)}}$$

Analogous to what was prepared in Section 14.2.3, we can define the confidence interval expression for the chances (odds or relative risk ratios) of occurrence of each of the events represented by subscript m (m = 1, 2, …, M − 1) in relation to the occurrence of the event represented by category 0 (reference), for each parameter βjm (j = 1, 2, …, k; m = 1, 2, …, M − 1), at the 95% confidence level, in the following way:

$$e^{\beta_{jm} \pm 1.96[s.e.(\beta_{jm})]} \qquad (14.44)$$
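A brief numpy sketch, using the coefficients and standard errors from Table 14.19, reproduces both the parameter intervals of Expression (14.43) and the odds intervals of Expression (14.44):

```python
import numpy as np

# Coefficients and standard errors of the beta_jm parameters from Table 14.19
names = ["b11 (dist)", "b21 (sem)", "b12 (dist)", "b22 (sem)"]
coef = np.array([0.559, 1.670, 1.078, 2.895])
se = np.array([0.243, 0.577, 0.302, 0.686])

lower, upper = coef - 1.96 * se, coef + 1.96 * se                    # Expression (14.43)
odds_lo, odds, odds_hi = np.exp(lower), np.exp(coef), np.exp(upper)  # Expression (14.44)

for row in zip(names, odds_lo, odds, odds_hi):
    print(row)  # e.g., b22 (sem): roughly 4.71, 18.08, 69.36
```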

For the data in our example, and based on the values calculated in Table 14.19, we will prepare Table 14.20, which represents the confidence intervals for the chances (odds or relative risk ratios) of occurrence for each of the events in relation to the reference event for each parameter βjm (j = 1, 2; m = 1, 2).

Table 14.20

Calculation of the Confidence Intervals of the Chances (Odds or Relative Risk Ratios) for Each Parameter βjm

Event                          Parameter             Chance (Odds) e^βjm   95% CI Lower   95% CI Upper
Arrive late to first class     β11 (dist variable)   1.749                 1.085          2.817
Arrive late to first class     β21 (sem variable)    5.312                 1.715          16.453
Arrive late to second class    β12 (dist variable)   2.939                 1.625          5.318
Arrive late to second class    β22 (sem variable)    18.081                4.713          69.363

Note: the interval limits are given by e^(βjm − 1.96·[s.e.(βjm)]) and e^(βjm + 1.96·[s.e.(βjm)]).

These values will also be obtained by means of Stata modeling, to be presented in the next section.
