23.6 Estimation of Hierarchical Linear Models in SPSS

${performance}_{tjk} = π_{0 jk} + π_{1 jk} \cdot {year}_{jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + b_{01 k} \cdot {gender}_{jk} + r_{0 jk}$

$π_{1 jk} = b_{10 k} + b_{11 k} \cdot {gender}_{jk} + r_{1 jk}$

$b_{00 k} = γ_{000} + γ_{001} \cdot {texp}_{k} + u_{00 k}$

$b_{01 k} = γ_{010}$

$b_{10 k} = γ_{100} + γ_{101} \cdot {texp}_{k} + u_{10 k}$

$b_{11 k} = γ_{110},$

which results in the following expression:

$\begin{array}{l} {performance}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + γ_{010} \cdot {gender}_{jk} + γ_{001} \cdot {texp}_{k} \\ + γ_{110} \cdot {gender}_{jk} \cdot {year}_{jk} + γ_{101} \cdot {texp}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk} \end{array}$

To estimate this model, it is necessary to create one more variable (texpyear), which corresponds to the product of texp and year. Thus, let’s type the following command:

gen texpyear = texp*year

Therefore, we can estimate the model proposed by typing the following command:

xtmixed performance year gender texp genderyear texpyear || school: year || student: year, var nolog reml

whose outputs can be found in Fig. 23.42.

Even though the estimates of the fixed effects parameters and of the random effects variances are significant, at a significance level of 0.05, it is necessary to study the structure of the variance-covariance matrix of the random effects (u_00k, u_10k and r_0jk, r_1jk). Based on the outputs found in Fig. 23.42, we have:

• Random effects variance-covariance matrix for level school

$var \left[\begin{array}{c} u_{00 k} \\ u_{10 k} \end{array}\right] = \left[\begin{array}{cc} 87.994 & 0 \\ 0 & 0.263 \end{array}\right]$

• Random effects variance-covariance matrix for level student

$var \left[\begin{array}{c} r_{0 jk} \\ r_{1 jk} \end{array}\right] = \left[\begin{array}{cc} 337.627 & 0 \\ 0 & 3.092 \end{array}\right]$

To store the results of this estimation, we have to type:

estimates store finalindependent

Since we did not specify any covariance structure for these error terms, in the preparation of the command xtmixed, Stata assumes that this structure is independent, that is, that cov(u_00k, u_10k) = 0 and that cov(r_0jk, r_1jk) = 0. Nevertheless, we can generalize these matrices’ structure by allowing u_00k and u_10k to be correlated, and r_0jk and r_1jk to be correlated too. In order to do that, in the command xtmixed, it is necessary to add the term covariance(unstructured) to the random effects components of levels school and student, such that:

xtmixed performance year gender texp genderyear texpyear || school: year, covariance(unstructured) || student: year, covariance(unstructured) var nolog reml

which generates the outputs seen in Fig. 23.43.

The fixed effects parameter estimates are extremely close to those obtained when estimating the model that assumes an independent structure for the random effects variance-covariance matrices (Fig. 23.42).

Regarding the random effects parameters, all the estimates are significant at a significance level of 0.05, except for the estimates of var(u_10k) and cov(u_00k, u_10k), which are statistically significant at a significance level of 0.10, since their respective | z | > 1.64, the critical value of the standard normal distribution that results in a significance level of 0.10. For educational purposes, we will use a confidence level of 90% to continue the analysis.
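
The critical values mentioned above can be checked with a short sketch. It assumes two-sided tests on a standard normal statistic; the function name is illustrative and is not part of Stata or SPSS.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha: float) -> float:
    """Critical |z| of a two-sided test at significance level alpha."""
    # Half of alpha lies in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z_10 = two_sided_critical_z(0.10)  # significance level of 0.10
z_05 = two_sided_critical_z(0.05)  # significance level of 0.05
print(round(z_10, 3), round(z_05, 3))
```

An estimate whose | z | lies between these two values is significant at 0.10 but not at 0.05, which is the situation described for var(u_10k) and cov(u_00k, u_10k).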

Thus, considering that cov(u_00k, u_10k) and cov(r_0jk, r_1jk) are statistically different from zero, based on the outputs in Fig. 23.43 we can write that:

• Random effects variance-covariance matrix for level school

$var \left[\begin{array}{c} u_{00 k} \\ u_{10 k} \end{array}\right] = \left[\begin{array}{cc} 88.737 & - 3.185 \\ - 3.185 & 0.255 \end{array}\right]$

• Random effects variance-covariance matrix for level student

$var \left[\begin{array}{c} r_{0 jk} \\ r_{1 jk} \end{array}\right] = \left[\begin{array}{cc} 350.913 & - 13.251 \\ - 13.251 & 3.258 \end{array}\right]$

Researchers will also obtain these matrices if they type the following command right after the last estimation:

estat recovariance

whose outputs can be found in Fig. 23.44.

Although the estimates of the random effects covariances are statistically different from zero in both levels of the analysis, researchers who wish to confirm that this last model is more suitable than the one with independent error terms just need to run a likelihood-ratio test to compare both estimations.

For this purpose, first, let’s type the following command, regarding the estimation with unstructured random effects:

estimates store finalunstructured

Next, we can type the command to carry out the abovementioned test:

lrtest finalunstructured finalindependent

The result can be seen in Fig. 23.45.

This χ² statistic for the test, with 2 degrees of freedom, can also be obtained through the following expression:

$χ_{2}^{2} = -2 \cdot {LL}_{r-ind} - (-2 \cdot {LL}_{r-unstruc}) = \{-2 \cdot (-7{,}419.679) - [-2 \cdot (-7{,}376.715)]\} = 85.93,$

which results in a Sig. χ₂² = 0.000 < 0.05. Therefore, we can state that the structure of the random effects variance-covariance matrices can be considered unstructured in this example. That is, we can consider that error terms u_00k and u_10k are correlated (cov(u_00k, u_10k) ≠ 0) and that error terms r_0jk and r_1jk are correlated too (cov(r_0jk, r_1jk) ≠ 0).
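
The arithmetic of this likelihood-ratio test can be reproduced with a minimal sketch. It uses the two restricted log-likelihood values quoted above and the closed-form survival function of the χ² distribution with 2 degrees of freedom, exp(−x/2); the function name is illustrative.

```python
import math


def lr_test_2df(ll_restricted: float, ll_general: float) -> tuple[float, float]:
    """Likelihood-ratio statistic and its p-value for 2 degrees of freedom."""
    stat = -2.0 * ll_restricted - (-2.0 * ll_general)
    # For a chi-square distribution with 2 df, P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Independent structure (restricted) versus unstructured (general).
stat, p_value = lr_test_2df(-7419.679, -7376.715)
print(round(stat, 2), p_value < 0.05)
```

The two degrees of freedom correspond to the two covariances, cov(u_00k, u_10k) and cov(r_0jk, r_1jk), that the unstructured model estimates in addition to the independent one.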

We have arrived at our final model, with the following specification:

$\begin{array}{l} {performance}_{tjk} = 54.734 + 4.516 \cdot {year}_{jk} - 14.702 \cdot {gender}_{jk} + 1.179 \cdot {texp}_{k} \\ + 0.652 \cdot {gender}_{jk} \cdot {year}_{jk} - 0.057 \cdot {texp}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk} \end{array}$

Next, we can obtain the expected BLUPs (best linear unbiased predictions) of random effects u_10k, u_00k, r_1jk, and r_0jk of our final model by typing:

predict u10final u00final r1final r0final, reffects

which generates four new variables in the dataset: u10final, u00final, r1final, and r0final. They correspond to the slope and intercept random effects of level school and to the slope and intercept random effects of level student, respectively. The following command, whose outputs can be found in Fig. 23.46, displays the descriptions of these random effects:

desc u10final u00final r1final r0final

Fig. 23.46 Description of random effects u_10k, u_00k, r_1jk, and r_0jk.

Besides, we can also obtain the expected values of each student’s school performance in each of the periods monitored, by typing the following command:

predict yhatstudent, fitted level(student)

which defines the variable yhatstudent, which can also be obtained through the following command:

gen yhatstudent = 54.73435 + 4.515641*year - 14.70213*gender + 1.178656*texp + .6518855*genderyear - .0566496*texpyear + u00final + u10final*year + r0final + r1final*year

which corresponds to the expression:

$\begin{array}{l} {\widehat{performance}}_{{student}_{jk}} = 54.734 + 4.516 \cdot {year}_{jk} - 14.702 \cdot {gender}_{jk} + 1.179 \cdot {texp}_{k} \\ + 0.652 \cdot {gender}_{jk} \cdot {year}_{jk} - 0.057 \cdot {texp}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} \end{array}$

If researchers type the following command, they will obtain the expected values of each student’s school performance in each of the periods monitored; however, without considering the random effects in the level student:

predict yhatschool, fitted level(school)

which defines the variable yhatschool in the dataset, which can also be obtained through the following command:

gen yhatschool = 54.73435 + 4.515641*year - 14.70213*gender + 1.178656*texp + .6518855*genderyear - .0566496*texpyear + u00final + u10final*year

which corresponds to the expression:

$\begin{array}{l} {\widehat{performance}}_{{school}_{k}} = 54.734 + 4.516 \cdot {year}_{jk} - 14.702 \cdot {gender}_{jk} + 1.179 \cdot {texp}_{k} \\ + 0.652 \cdot {gender}_{jk} \cdot {year}_{jk} - 0.057 \cdot {texp}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} \end{array}$

Error terms e_tjk can be obtained by typing the command predict etjk, res (which is equivalent to performance - yhatstudent).
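
How the two fitted values and the residual relate to each other can be sketched in a few lines. The fixed effects coefficients are the ones reported above; the values of year, gender, texp, the random effects, and the observed performance in the example are hypothetical and serve only to illustrate the arithmetic.

```python
# Fixed effects estimates of the final model (taken from the text above).
FIXED = {
    "cons": 54.73435,
    "year": 4.515641,
    "gender": -14.70213,
    "texp": 1.178656,
    "genderyear": 0.6518855,
    "texpyear": -0.0566496,
}


def fixed_part(year: float, gender: float, texp: float) -> float:
    """Prediction that uses the fixed effects only."""
    return (
        FIXED["cons"]
        + FIXED["year"] * year
        + FIXED["gender"] * gender
        + FIXED["texp"] * texp
        + FIXED["genderyear"] * gender * year
        + FIXED["texpyear"] * texp * year
    )


def yhat_school(year, gender, texp, u00, u10):
    """Fixed part plus the school-level random intercept and slope."""
    return fixed_part(year, gender, texp) + u00 + u10 * year


def yhat_student(year, gender, texp, u00, u10, r0, r1):
    """School-level prediction plus the student-level random effects."""
    return yhat_school(year, gender, texp, u00, u10) + r0 + r1 * year


# Hypothetical observation: none of these values come from the dataset.
obs = dict(year=2, gender=1, texp=10)
school_re = dict(u00=1.5, u10=-0.2)
student_re = dict(r0=3.0, r1=0.5)

fixed_only = fixed_part(**obs)
at_school = yhat_school(**obs, **school_re)
at_student = yhat_student(**obs, **school_re, **student_re)
residual = 70.0 - at_student  # e_tjk for a hypothetical observed score of 70
print(round(fixed_only, 3), round(at_school, 3), round(at_student, 3))
```

The sketch mirrors the structure of the gen commands: yhatschool adds only u00final and u10final to the fixed part, whereas yhatstudent also adds r0final and r1final.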

Therefore, at this moment, we are able to conclude the analysis. We have seen that students’ school performance follows a linear trend throughout time, that there is a significant variance of intercepts and slopes between those who study at the same school and between those who study at different schools, and that students’ gender is significant to explain part of this variation. Professors’ years of teaching experience at each school (level-3 variable) also explain part of the discrepancies in the annual school performance between students from different schools.

The following command, typed after the command sort student year, generates a chart (Fig. 23.47) with the predicted values of school performance throughout time for the first 50 students in the sample (yhatstudent), in which we can see different intercepts and slopes throughout time for different students.

sort student year

graph twoway connected yhatstudent year if student <= 50, connect(L)

Finally, a more inquisitive researcher, aiming at questioning the superiority of multilevel models in relation to traditional regression models estimated through OLS, whenever there are datasets with nested structures, decides to construct a chart. Through this chart, it is possible to compare the predicted school performance values generated by this three-level hierarchical modeling (HLM3) to those generated by an estimation through OLS, for all the students in the sample, in each of the periods analyzed, using the same explanatory variables year, gender, texp, genderyear, and texpyear. Obviously, there are only fixed effects in the estimation through OLS.

Thus, the following sequence of commands is typed, which generates the chart seen in Fig. 23.48:

quietly reg performance year gender texp genderyear texpyear
predict yhatreg
graph twoway mspline yhatreg performance || mspline yhatstudent performance || lfit performance performance ||, legend(label(1 "OLS") label(2 "HLM3") label(3 "Observed Values"))

The dotted line at 45 degrees shows the observed school performance values of each one of the students in the sample in each of the periods analyzed (performance × performance). By using the chart in Fig. 23.48, we can clearly see the superiority of our linear trend model with explanatory variables and random intercepts and slopes in levels 2 and 3 (complete HLM3 model) over the multiple linear regression model estimated through OLS with the same explanatory variables. This demonstrates the importance of considering the random effects components whenever there are nested data structures.

In a consolidated way, Table 23.5 shows the general commands in Stata for preparing a two-level hierarchical linear model with clustered data, and a three-level hierarchical linear model with repeated measures, as studied in Sections 23.5.1 and 23.5.2, respectively. This is a broad topic and new intermediate models can always be estimated by researchers, based on their research objectives and on the constructs proposed.

Table 23.5

Hierarchical Modeling, Intermediate Models (Multilevel Step-Up Strategy) and Commands in Stata
Modeling	Intermediate Model	Command in Stata
Two-Level Hierarchical Linear Model with Clustered Data	Null Model (Nonconditional Model)	xtmixed Y || var(level 2):
	Random Intercepts Model	xtmixed Y X || var(level 2):
	Random Intercepts and Slopes Model	xtmixed Y X || var(level 2): X
	Random Intercepts and Slopes Model and Correlated Error Terms	xtmixed Y X || var(level 2): X, covariance(unstructured)
Three-Level Hierarchical Linear Model with Repeated Measures	Null Model (Nonconditional Model)	xtmixed Y || var(level 3): || var(level 2):
	Linear Trend Model with Random Intercepts	xtmixed Y t || var(level 3): || var(level 2):
	Linear Trend Model with Random Intercepts and Slopes	xtmixed Y t || var(level 3): t || var(level 2): t
	Linear Trend Model with Random Intercepts and Slopes and Level-2 Variable	xtmixed Y t X Xt || var(level 3): t || var(level 2): t
	Linear Trend Model with Random Intercepts and Slopes and Level-2 and Level-3 Variables	xtmixed Y t X W Xt Wt WXt || var(level 3): t || var(level 2): t
	Linear Trend Model with Random Intercepts and Slopes and Level-2 and Level-3 Variables and Correlated Error Terms	xtmixed Y t X W Xt Wt WXt || var(level 3): t, covariance(unstructured) || var(level 2): t, covariance(unstructured)

Note: Considering a level-2 variable X, a level-3 variable W (whenever there is one), and t as the temporal variable. In addition to this, Y refers to the dependent variable. In all the cases, the term that corresponds to the estimation method was omitted. As discussed previously, while the estimation method adopted by Stata up to Version 12 is the restricted maximum likelihood (reml) by default, the default method becomes the maximum likelihood (mle) after Version 13.

After having made these considerations and having respected the multilevel step-up strategy throughout this entire section, at this moment, let’s estimate the same models in SPSS. In order to give researchers the opportunity to compare both software packages, the procedures and routines for estimating the models are presented, as well as the logic with which the outputs are generated.

23.6 Estimation of Hierarchical Linear Models in SPSS

Now, let’s go step by step through the preparation of our examples in IBM SPSS Statistics Software. The use of the images in this section has been authorized by the International Business Machines Corporation©.

At this moment, the main objective is to give researchers the opportunity to use multilevel modeling techniques in SPSS. Every time we present an output, we will mention the respective result obtained when preparing the techniques in Stata, so that researchers can compare them and, thus, decide which software package to use, based on the characteristics of each one and on how accessible they are.

23.6.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in SPSS

Going back to the example used in Section 23.5.1, let’s remember that our professor collected data on the school performance (grades from 0 to 100 plus a bonus for participation in class) of 2,000 students from 46 schools. He also collected data on the number of hours spent studying per week (level-1 explanatory variable), the type of school (public or private), and professors’ teaching experience (in years) at each school (level-2 explanatory variables). The complete dataset is in the file PerformanceStudentSchool.sav.

Maintaining the logic presented here, initially, let’s estimate the null model, as follows:

Null Model

${performance}_{ij} = γ_{00} + u_{0 j} + r_{ij}$

Even though it is possible to estimate multilevel models using the option Analyze → Mixed Models in SPSS, based on point-and-click procedures, in this section, we have chosen to estimate the models through syntax, to provide a better comparison for the estimations elaborated in Section 23.5.1, and to facilitate the understanding of how to include variables into fixed and random effects components. In order to do that, with the file PerformanceStudentSchool.sav open, we must click on File → New → Syntax. For the null model, we must type the following syntax in the window that will open:

MIXED performance
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT
  /RANDOM = INTERCEPT | SUBJECT(school) .

where the first line (MIXED)⁴ only shows the dependent variable performance, and the two lines after that (METHOD and PRINT) determine, respectively, the estimation method adopted (in this case, restricted maximum likelihood estimation, or REML) and the outputs to be presented: SOLUTION requests the estimates of the fixed effects with their corresponding standard errors, and TESTCOV requests the significance tests of the covariance parameters. Finally, in the last two lines (FIXED and RANDOM), in addition to the intercept term, the variables that will be a part of the fixed and random effects components, respectively, can be specified. The term SUBJECT inserted after the vertical bar | identifies the group variable that corresponds to level 2 (in our case, the variable school).

Fig. 23.49 shows the window in SPSS with the inclusion of the syntax that corresponds to the null model, highlighting the button Run Selection that will have to be clicked so that the multilevel modeling can be estimated.

Next, in Fig. 23.50, the outputs generated by SPSS are presented.

Initially, we can see that the output Model Dimension shows the number of levels considered in the modeling (in this case, 2), and the number of parameters estimated (in this case, 3, including the error term). The term Variance Components informs us that a variance-covariance matrix structure with independent random effects is being considered.

In Information Criteria, the value -2 Restricted Log Likelihood is presented, which corresponds to − 2 times the maximum value obtained for the logarithm of the restricted likelihood function to estimate the model parameters. We can see that the output in SPSS shows that − 2 ∙ LL_r = 17,504.04, which is exactly equal to − 2 times the value presented in Stata (Fig. 23.13), since − 2 ∙(− 8752.02) = 17,504.04.

Next, in Fixed Effects, the estimation of parameter γ₀₀ is presented (fixed effect), which corresponds to the average of students’ expected school performance (horizontal line estimated in the null model, or general intercept). We can see that the estimation of γ₀₀ = 61.049 corresponds to the one obtained in Fig. 23.13 in the estimation of the null model in Stata.

Finally, the estimates of the variance components of the level-1 and level-2 error terms (random effects) are presented (Covariance Parameters). Here, we can also verify that the outputs correspond to the ones obtained in Stata, since the estimates are τ₀₀ = 135.779 (Intercept [subject = school]) and σ² = 347.562 (Residual). Nevertheless, note that, unlike Stata, SPSS displays the z statistics of the estimates of the error terms’ variances directly, with their respective significance levels. Thus, for the data in our example, we can see that there is variability in the school performance of students from different schools, since Sig. z τ₀₀ < 0.05 (if the confidence level is defined at 95%).

Based on the intraclass correlation, which is calculated below, we can see that approximately 28% of the total school performance variance is due to differences between schools.

$rho = \frac{τ_{00}}{τ_{00} + σ^{2}} = \frac{135.779}{135.779 + 347.562} = 0.281$
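
As a quick check of this calculation, the intraclass correlation of the null model can be reproduced with a minimal sketch that uses the two variance estimates reported in Covariance Parameters. The function name is illustrative.

```python
def icc_two_level(tau00: float, sigma2: float) -> float:
    """Share of the total variance that is due to differences between groups."""
    return tau00 / (tau00 + sigma2)


# Null model: between-school variance and residual variance.
icc_null = icc_two_level(tau00=135.779, sigma2=347.562)
print(round(icc_null, 3))
```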

In order to maintain the logic presented in Section 23.5.1, at this moment, let’s estimate the random intercepts model, including the explanatory variable hours, as follows:

Random Intercepts Model

${performance}_{ij} = γ_{00} + γ_{10} \cdot {hours}_{ij} + u_{0 j} + r_{ij}$

The syntax to estimate this model in SPSS is:

MIXED performance WITH hours
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT hours
  /RANDOM = INTERCEPT | SUBJECT(school) .

where all the explanatory variables that researchers want must be inserted after the term WITH in the first line of the syntax. After we run it, we arrive at the main outputs shown in Fig. 23.51.

Fig. 23.51 (A) Main outputs of the random intercepts model. (B) Fixed Effects. (C) Covariance Parameters.

These outputs correspond to the ones presented in Fig. 23.15 (Stata) and, through them, we can see that there is statistical significance in the estimations of the variances of error terms τ₀₀ = 19.125 and σ² = 31.764, which result in the following intraclass correlation:

$rho = \frac{τ_{00}}{τ_{00} + σ^{2}} = \frac{19.125}{19.125 + 31.764} = 0.376$

Thus, there is an increase in the proportion of the variance component of the intercept in relation to the null model. This favors the decision to include the variable hours to study the school performance behavior when comparing the schools.

Therefore, now, our model starts to have the following specification:

${performance}_{ij} = 0.534 + 3.252 \cdot {hours}_{ij} + u_{0 j} + r_{ij}$

where the fixed effect of the intercept, now, corresponds to the average expected school performance, between schools, of the students who, for some reason, do not study (hours_ij = 0). The slope allows us to state that one more hour spent studying per week, on average, makes the expected mean school performance, between schools, increase 3.252 points, and this parameter is statistically significant.⁵

At this moment, let’s insert slope random effects into our multilevel model that, by maintaining the intercept random effects, will start to have the following expression:

Random Intercepts and Slopes Model

${performance}_{ij} = γ_{00} + γ_{10} \cdot {hours}_{ij} + u_{0 j} + u_{1 j} \cdot {hours}_{ij} + r_{ij}$

The new syntax is:

MIXED performance WITH hours
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT hours
  /RANDOM = INTERCEPT hours | SUBJECT(school) .

which generates the outputs shown in Fig. 23.52.

Analogously, these outputs correspond to the ones shown in Fig. 23.18 (Stata).

We can see that the parameter and variance estimates in the random intercepts and slopes model are identical to the ones obtained for the model that only had random intercepts (Fig. 23.51). This occurs because the estimate of variance τ₁₁ (hours [subject = school]) is statistically equal to zero, which makes the value obtained of − 2 ∙ LL_r the same as the one shown in Fig. 23.51.

Hence, applying a likelihood-ratio test would offer an output that would obviously favor the use of a random intercepts model, since the significance level Sig. χ₁² (12,744.329 − 12,744.329 = 0) = 1.000 > 0.05, as shown in Fig. 23.19.

If researchers wish to generalize the structure of the random effects variance-covariance matrix, allowing u_0j and u_1j to be correlated, they just need to estimate the model parameters using the term COVTYPE(UN) at the end of the RANDOM line of the last syntax. It will become:

MIXED performance WITH hours
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT hours
  /RANDOM = INTERCEPT hours | SUBJECT(school) COVTYPE(UN) .

where the term COVTYPE(UN) considers that there is an unstructured variance-covariance matrix. This model’s outputs are not presented here. However, a likelihood-ratio test to compare the estimations of random intercepts and slopes models with independent and correlated error terms u_0j and u_1j will show that the structure of the variance-covariance matrix between u_0j and u_1j can be considered independent, similar to what is shown in Fig. 23.23.

Since the structure of the random effects variance-covariance matrix can be considered independent and the random intercepts model is the most suitable, let’s now estimate the complete final model, which has the following specification:

Complete Final Model

$\begin{array}{l} {performance}_{ij} = γ_{00} + γ_{10} \cdot {hours}_{ij} + γ_{01} \cdot {texp}_{j} + γ_{02} \cdot {priv}_{j} \\ + γ_{11} \cdot {priv}_{j} \cdot {hours}_{ij} + u_{0 j} + r_{ij} \end{array}$

Note that we go straight to the last estimation obtained in Section 23.5.1. The syntax to estimate the model is:

MIXED performance WITH hours texp priv
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT hours texp priv priv*hours
  /RANDOM = INTERCEPT | SUBJECT(school)
  /SAVE = PRED FIXPRED .

where the last line contains the term SAVE = PRED FIXPRED, which generates two new variables in the dataset, PRED_1 and FXPRED_1. The former corresponds to the predicted values of school performance per student (yhat in Stata), with random intercepts components u_0j. The latter refers to the predicted values of school performance only resulting from the fixed effects component. The outputs generated are shown in Fig. 23.53.

Fig. 23.53 (A) Main outputs of the final complete model with random intercepts. (B) Fixed Effects. (C) Covariance Parameters.

The expected BLUPs (best linear unbiased predictions) of our final model’s random effects u_0j can be obtained through the following syntax:

COMPUTE blups = PRED_1-FXPRED_1.

which generates a new variable in the dataset, called blups, equal to the variable u0final defined in the estimation of this model in Stata.
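
The logic of this subtraction can be illustrated with a small sketch. The rows below are hypothetical and only mimic the two variables saved by SPSS; because the model has random intercepts only, the difference between the two predictions is the same for every student of a given school.

```python
from collections import defaultdict

# Hypothetical rows standing in for the dataset after /SAVE = PRED FIXPRED.
rows = [
    {"school": 1, "PRED_1": 52.50, "FXPRED_1": 50.00},
    {"school": 1, "PRED_1": 61.75, "FXPRED_1": 59.25},
    {"school": 2, "PRED_1": 43.00, "FXPRED_1": 44.50},
    {"school": 2, "PRED_1": 70.25, "FXPRED_1": 71.75},
]

# Same operation as COMPUTE blups = PRED_1-FXPRED_1.
for row in rows:
    row["blups"] = row["PRED_1"] - row["FXPRED_1"]

# Collect the distinct values per school: one value each is expected.
blups_by_school = defaultdict(set)
for row in rows:
    blups_by_school[row["school"]].add(row["blups"])

for school, values in sorted(blups_by_school.items()):
    print(school, sorted(values))
```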

These outputs correspond to the ones presented in Fig. 23.25 (Stata). With significant estimates of the variances of the random effects and of the fixed effects parameters, at a confidence level of 95% (except for the estimate of the parameter of the combined variable hours*priv, significant at a confidence level of 90%), we obtain the following expression of the model proposed:

$\begin{array}{l} {performance}_{ij} = - 2.710 + 3.281 \cdot {hours}_{ij} + 0.866 \cdot {texp}_{j} - 5.610 \cdot {priv}_{j} \\ - 0.080 \cdot {priv}_{j} \cdot {hours}_{ij} + u_{0 j} + r_{ij} \end{array}$

constructed with the inclusion of level-1 and level-2 explanatory variables and through a multilevel step-up strategy. Hence, we can conclude that there are differences in the school performance behavior between students from the same schools and from different schools. These differences are due to the number of hours each student spends studying per week, to the type of school (public or private), and to the professors’ teaching experience (in years) at each school.

Next and in SPSS too, let’s study an example with a three-level hierarchical linear model with repeated measures.

23.6.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in SPSS

In this section, we are going back to the example used in Section 23.5.2. Bear in mind that our professor managed to get data on the school performance (grades from 0 to 100) throughout four years (level-1 temporal variable) of 2000 students from 15 schools. He also collected data on each student’s gender (level-2 explanatory variable), and on professors’ years of teaching experience in each of the schools (level-3 explanatory variable). The complete dataset is presented in the file PerformanceTimeStudentSchool.sav.

It is important to mention that SPSS takes considerably longer than Stata to process estimations of multilevel models, mainly for three or more levels.

Maintaining the logic presented in Section 23.5.2, initially, let’s estimate the null model, as follows:

Null Model

${performance}_{tjk} = γ_{000} + u_{00 k} + r_{0 jk} + e_{tjk}$

For this null model, we must type the following routine in the syntax window:

MIXED performance
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT
  /RANDOM = INTERCEPT | SUBJECT(student)
  /RANDOM = INTERCEPT | SUBJECT(school) .

where the first line (MIXED) only shows the dependent variable performance, and the two lines after that (METHOD and PRINT) determine, respectively, the estimation method adopted (in this case, restricted maximum likelihood estimation, or REML) and that the estimates of the fixed effects with their corresponding standard errors be presented in the outputs. In the following line (FIXED), the variables that will be a part of the fixed effects component can be specified, in addition to the intercept term. Finally, in the last two lines of the routine (RANDOM), besides the intercept terms, the variables that will be part of the random effects components in the different levels of the analysis can be specified. The term SUBJECT inserted after the vertical bar | identifies the group variable that corresponds to each level (in our case, student for level 2 and school for level 3).

Fig. 23.54 shows the outputs generated by SPSS.

We will not analyze all the outputs of the model generated once again, because they are identical to the ones shown in Fig. 23.34, obtained in the estimation of this null model in Stata.

Nevertheless, we can see that the estimation of parameter γ₀₀₀ (Fixed Effects) is equal to 68.714, which corresponds to the average of the students’ expected annual school performance (horizontal line estimated in the null model, or general intercept).

Besides, we know that the estimations of the error terms’ variances (Covariance Parameters) τ_u000 = 180.194 (Intercept [subject = school]), τ_r000 = 325.799 (Intercept [subject = student]), and σ² = 41.649 (Residual) are statistically different from zero, at a significance level of 0.05. This fact allows us to state that there is significant variability in the school performance throughout the four years of the analysis, there is significant variability in the school performance, throughout time, between students of the same school, and there is significant variability in the school performance, throughout time, between students from different schools.

Both intraclass correlations, which correspond to levels 2 and 3 of the analysis, can be calculated as follows:

• Level-2 intraclass correlation

${rho}_{student ∣ school} = corr (Y_{tjk}, Y_{t'jk}) = \frac{τ_{u 000} + τ_{r 000}}{τ_{u 000} + τ_{r 000} + σ^{2}} = \frac{180.194 + 325.799}{180.194 + 325.799 + 41.649} = 0.924$

• Level-3 intraclass correlation

${rho}_{school} = corr (Y_{tjk}, Y_{t'j'k}) = \frac{τ_{u 000}}{τ_{u 000} + τ_{r 000} + σ^{2}} = \frac{180.194}{180.194 + 325.799 + 41.649} = 0.329$

Thus, the correlation between the annual school performances, for the same school, is equal to 32.9% (rho_school), and the correlation between the annual school performances, for the same student of a certain school, is equal to 92.4% (rho_{student | school}).
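
Both intraclass correlations of the three-level null model can be reproduced with a minimal sketch that uses the variance estimates reported in Covariance Parameters. The function name and the dictionary keys are illustrative.

```python
def icc_three_level(tau_school: float, tau_student: float, sigma2: float) -> dict:
    """Level-3 and level-2 intraclass correlations of a three-level null model."""
    total = tau_school + tau_student + sigma2
    return {
        # Same school, different students.
        "school": tau_school / total,
        # Same student of a given school, different years.
        "student_in_school": (tau_school + tau_student) / total,
    }


icc = icc_three_level(tau_school=180.194, tau_student=325.799, sigma2=41.649)
print(round(icc["school"], 3), round(icc["student_in_school"], 3))
```

The level-2 correlation is always at least as large as the level-3 correlation, since its numerator also includes the between-student variance.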

In order to maintain the same logic presented in Section 23.5.2, at this moment, let’s estimate the linear trend model with random intercepts and slopes, including the variable year (repeated measure) as an explanatory variable into level 1, as follows:

Linear Trend Model with Random Intercepts and Slopes

${performance}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk}$

The syntax to estimate this model in SPSS is:

MIXED performance WITH year
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /FIXED = INTERCEPT year
  /RANDOM = INTERCEPT year | SUBJECT(student)
  /RANDOM = INTERCEPT year | SUBJECT(school) .

where all the explanatory variables that researchers want must be inserted after the term WITH in the first line of the syntax. After nine iterations and a few processing minutes, we arrived at the main outputs shown in Fig. 23.55.

Fig. 23.55 (A) Main outputs of the linear trend model with random intercepts and slopes. (B) Fixed Effects. (C) Covariance Parameters.

These outputs correspond to the ones shown in Fig. 23.39. Through them, we can see that the estimated parameters of the fixed and random effects components are statistically different from zero, at a significance level of 0.05. This supports the statement that students’ school performance follows a linear trend throughout time, and that there is a significant variance of intercepts and slopes between those who study at the same school and between those who study at different schools⁶. By using the level-2 intraclass correlation, calculated below, we estimate that the random effects of students and schools form approximately 99% of the total variance of the residuals!

$\begin{array}{l} {rho}_{student ∣ school} = corr (Y_{tjk}, Y_{t' jk}) = \frac{τ_{u 000} + τ_{u 100} + τ_{r 000} + τ_{r 100}}{τ_{u 000} + τ_{u 100} + τ_{r 000} + τ_{r 100} + σ^{2}} \\ = \frac{224.343 + 0.560 + 374.285 + 3.157}{224.343 + 0.560 + 374.285 + 3.157 + 3.868} = 0.994 \end{array}$
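The arithmetic of this intraclass correlation can be checked with a short Python sketch. The variable names are ours (hypothetical), and the values are the variance components quoted in the expression above:

```python
# Variance components quoted in the text for the linear trend model
# with random intercepts and slopes (variable names are illustrative).
tau_u000 = 224.343  # school level, random intercepts
tau_u100 = 0.560    # school level, random slopes
tau_r000 = 374.285  # student level, random intercepts
tau_r100 = 3.157    # student level, random slopes
sigma2 = 3.868      # level-1 residual variance

# Sum of the random effects variances at levels 2 and 3.
random_effects_variance = tau_u000 + tau_u100 + tau_r000 + tau_r100

# Level-2 intraclass correlation: share of the total variance
# that is due to the random effects of students and schools.
rho_student_school = random_effects_variance / (random_effects_variance + sigma2)

print(round(rho_student_school, 3))  # 0.994
```

The result reproduces the value of 0.994 reported above.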

At this moment, our model starts to have the following specification:

${performance}_{tjk} = 57.858 + 4.343 \cdot {year}_{jk} + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk}$
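To illustrate how the fixed effects component of this expression can be read, the following Python sketch evaluates it for a few values of year, with all random terms set to zero. The years chosen (1 to 4) are only an assumption for illustration, and the function name is hypothetical:

```python
# Fixed effects estimates quoted in the expression above.
GAMMA_000 = 57.858  # intercept
GAMMA_100 = 4.343   # expected yearly change in performance


def fixed_part(year):
    """Expected performance from the fixed effects component only
    (random terms u and r, and the error e, are set to zero)."""
    return GAMMA_000 + GAMMA_100 * year


# Illustrative years only; the actual range of year comes from the dataset.
for year in (1, 2, 3, 4):
    print(year, round(fixed_part(year), 3))
```

Each additional year adds 4.343 points to the expected performance, which is the linear trend captured by γ_100.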

Finally, let’s investigate if level-2 and level-3 variables gender and texp also explain the variation in the annual school performance between students. After some intermediate analyses, let’s move on to estimate the following complete three-level model:

Linear Trend Model with Random Intercepts and Slopes, Level-2 Variable gender, and Level-3 Variable texp (Complete Model)

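$\begin{array}{l} {performance}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + γ_{010} \cdot {gender}_{jk} + γ_{001} \cdot {texp}_{k} \\ + γ_{110} \cdot {gender}_{jk} \cdot {year}_{jk} + γ_{101} \cdot {texp}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk} \end{array}$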

To estimate this model, let’s generalize the structure of the random effects variance-covariance matrices, allowing (u_00k, u_10k) and (r_0jk, r_1jk) to be correlated (unstructured variance-covariance matrices). In order to do that, we must insert the term COVTYPE(UN) at the end of the RANDOM lines, making the syntax in SPSS become:

MIXED performance WITH year gender texp

/METHOD = REML

/PRINT = SOLUTION TESTCOV

/FIXED = INTERCEPT year gender texp gender*year texp*year

/RANDOM = INTERCEPT year | SUBJECT(student) COVTYPE(UN)

/RANDOM = INTERCEPT year | SUBJECT(school) COVTYPE(UN)

/SAVE = PRED FIXPRED RESID .

where the last line now shows the term SAVE = PRED FIXPRED RESID, which generates three new variables in the dataset: PRED_1, FXPRED_1, and RESID_1. They correspond, respectively, to the predicted values of the school performance per student (yhatstudent in Stata), to the predicted values of the school performance resulting only from the fixed effects component, and to the error terms e_tjk.

After five iterations and a few processing minutes, we arrived at the outputs shown in Fig. 23.56.

These outputs correspond to the ones shown in Fig. 23.43 (Stata). Through them, we can see that all the parameters estimated for the fixed effects component are statistically different from zero, at a significance level of 0.05. In relation to the parameters of the random effects components, the estimates of the variance of u_10k and of cov(u_00k, u_10k) are statistically significant only at a significance level of 0.10, while all the others are significant at a significance level of 0.05. Thus, considering that cov(u_00k, u_10k) and cov(r_0jk, r_1jk) are statistically different from zero, we can write:

• Random effects variance-covariance matrix for level school

$var [\begin{array}{c} u_{00 k} \\ u_{10 k} \end{array}] = [\begin{array}{cc} 88.734 & - 3.185 \\ - 3.185 & 0.255 \end{array}]$

• Random effects variance-covariance matrix for level student

$var [\begin{array}{c} r_{0 jk} \\ r_{1 jk} \end{array}] = [\begin{array}{cc} 350.913 & - 13.251 \\ - 13.251 & 3.257 \end{array}]$
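As a complement, the covariances in these two matrices can be converted into correlations between random intercepts and random slopes, which are easier to interpret. The Python sketch below is our own illustration (the function name is hypothetical) and uses only the values shown in the matrices above:

```python
import math


def implied_correlation(var_intercept, var_slope, covariance):
    """Correlation between random intercepts and random slopes
    implied by a 2 x 2 variance-covariance matrix."""
    return covariance / math.sqrt(var_intercept * var_slope)


# Level school: var(u_00k), var(u_10k), cov(u_00k, u_10k)
corr_school = implied_correlation(88.734, 0.255, -3.185)

# Level student: var(r_0jk), var(r_1jk), cov(r_0jk, r_1jk)
corr_student = implied_correlation(350.913, 3.257, -13.251)

print(round(corr_school, 2), round(corr_student, 2))
```

Both correlations are negative (roughly -0.67 for schools and -0.39 for students), which suggests that schools and students with higher initial performance tend to show smaller yearly growth.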

Therefore, the expression of our final model has the following specification⁷:

si146_e

constructed with the inclusion of level-1 and level-2 explanatory variables and through a multilevel step-up strategy.

Therefore, we can conclude that students’ school performance follows a linear trend over time. In addition, there is significant variance of intercepts and slopes between those who study at the same school and between those who study at different schools. Students’ gender is significant to explain part of this variation. Professors’ years of teaching experience at each school also explain part of the discrepancies in the annual school performance between students from different schools.

Similar to Table 23.5, presented at the end of Section 23.5, Table 23.6 consolidates the general estimation routines, in SPSS, for multilevel models.

Table 23.6

Hierarchical Modeling, Intermediate Models (Multilevel Step-Up Strategy) and Routines in SPSS
Modeling	Intermediate Model	Routine in SPSS
Two-Level Hierarchical Linear Model with Clustered Data	Null Model (Nonconditional Model)	MIXED Y /FIXED = INTERCEPT /RANDOM = INTERCEPT | SUBJECT(level2_var) .
	Random Intercepts Model	MIXED Y WITH X /FIXED = INTERCEPT X /RANDOM = INTERCEPT | SUBJECT(level2_var) .
	Random Intercepts and Slopes Model	MIXED Y WITH X /FIXED = INTERCEPT X /RANDOM = INTERCEPT X | SUBJECT(level2_var) .
	Random Intercepts and Slopes Model and Correlated Error Terms	MIXED Y WITH X /FIXED = INTERCEPT X /RANDOM = INTERCEPT X | SUBJECT(level2_var) COVTYPE(UN) .
Three-Level Hierarchical Linear Model with Repeated Measures	Null Model (Nonconditional Model)	MIXED Y /FIXED = INTERCEPT /RANDOM = INTERCEPT | SUBJECT(level2_var) /RANDOM = INTERCEPT | SUBJECT(level3_var) .
	Linear Trend Model with Random Intercepts	MIXED Y WITH t /FIXED = INTERCEPT t /RANDOM = INTERCEPT | SUBJECT(level2_var) /RANDOM = INTERCEPT | SUBJECT(level3_var) .
	Linear Trend Model with Random Intercepts and Slopes	MIXED Y WITH t /FIXED = INTERCEPT t /RANDOM = INTERCEPT t | SUBJECT(level2_var) /RANDOM = INTERCEPT t | SUBJECT(level3_var) .
	Linear Trend Model with Random Intercepts and Slopes and Level-2 Variable	MIXED Y WITH t X /FIXED = INTERCEPT t X X*t /RANDOM = INTERCEPT t | SUBJECT(level2_var) /RANDOM = INTERCEPT t | SUBJECT(level3_var) .
	Linear Trend Model with Random Intercepts and Slopes and Level-2 and Level-3 Variables	MIXED Y WITH t X W /FIXED = INTERCEPT t X W X*t W*t W*X*t /RANDOM = INTERCEPT t | SUBJECT(level2_var) /RANDOM = INTERCEPT t | SUBJECT(level3_var) .
	Linear Trend Model with Random Intercepts and Slopes and Level-2 and Level-3 Variables and Correlated Error Terms	MIXED Y WITH t X W /FIXED = INTERCEPT t X W X*t W*t W*X*t /RANDOM = INTERCEPT t | SUBJECT(level2_var) COVTYPE(UN) /RANDOM = INTERCEPT t | SUBJECT(level3_var) COVTYPE(UN) .


Note: X is a level-2 variable, W is a level-3 variable (whenever there is one), t is the temporal variable, and Y is the dependent variable. All the commands assume estimation through restricted maximum likelihood (the term /METHOD = REML is omitted).
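Since the routines in Table 23.6 follow a regular pattern, they can also be assembled programmatically. The following Python sketch is only an illustration of that pattern: the function mixed_syntax and its arguments are hypothetical (they are not part of SPSS), and the function merely builds the text of a MIXED command, which would still have to be pasted into an SPSS syntax window.

```python
def mixed_syntax(dep, covariates, fixed, random_terms, level_vars, covtype=None):
    """Build the text of an SPSS MIXED command following the pattern
    of Table 23.6 (hypothetical helper; it does not run SPSS)."""
    first_line = f"MIXED {dep}"
    if covariates:
        first_line += " WITH " + " ".join(covariates)

    # Fixed effects component: intercept plus the chosen terms.
    lines = [first_line, "  /FIXED = " + " ".join(["INTERCEPT"] + fixed)]

    # One RANDOM subcommand per grouping level.
    for level in level_vars:
        line = "  /RANDOM = " + " ".join(["INTERCEPT"] + random_terms)
        line += f" | SUBJECT({level})"
        if covtype:
            line += f" COVTYPE({covtype})"
        lines.append(line)

    return "\n".join(lines) + " ."


# Last row of Table 23.6: level-2 and level-3 variables, correlated random effects.
print(mixed_syntax(
    dep="Y",
    covariates=["t", "X", "W"],
    fixed=["t", "X", "W", "X*t", "W*t", "W*X*t"],
    random_terms=["t"],
    level_vars=["level2_var", "level3_var"],
    covtype="UN",
))
```

Calling the function with empty lists for covariates, fixed, and random_terms and a single grouping variable yields the null model in the first row of the table.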

23.7 Final Remarks

Data mining is a broad theme that is only beginning to be explored in depth in the field of business. This chapter only provides a brief discussion about the concepts, processes, stages, tasks, and the types of methods and techniques it can employ.

In this context, we believe that one of the most recent and relevant modeling techniques within the data-mining environment is multilevel modeling. It allows researchers and managers to assess the relationship between a certain performance variable and one or more predictor variables, which characterize different analysis levels. Moreover, each level is formed by individuals or groups nested into other groups and so on. Since the variables of a given group do not vary between the lower-level groups or individuals nested into that group, it is natural for many studies and constructs to use such models, because many datasets have nested data structures, such as those that simultaneously have student and school, company and country, municipality and state, or real estate and neighborhood characteristics.

Datasets with nested data structures can have many characteristics. The most common are those with absolute nesting, in which there are clustered data or data with repeated measures. In this chapter, we chose to present examples in which datasets are used to estimate two-level hierarchical linear models with clustered data and three-level hierarchical linear models with repeated measures. Based on these examples, we believe researchers will be able to estimate, for example, three-level models with clustered data, or even to consider a higher number of analysis levels resulting from more complex nesting structures.

Multilevel models allow us to identify and analyze individual heterogeneities and the heterogeneities between the groups to which these individuals belong, making it possible to specify random components in each analysis level. This is the main difference from the traditional regression models estimated through OLS, which cannot take the natural nesting of data into account and, consequently, generate biased parameter estimators.

Although many papers use multilevel models only to estimate null models to investigate the variance decomposition of the phenomenon being studied in the different analysis levels, the possibility of including explanatory variables that correspond to the different levels in the fixed and random effects components allows us to investigate possible relationships between these variables and the dependent variable. This makes it possible to establish new research objectives and interesting constructs.

Currently, it is possible to see a growing concern of software and tools manufacturers regarding the processing capability of commands and routines to estimate more complex multilevel models. We cannot forget to mention the important and educational software HLM (Hierarchical Linear and Nonlinear Modeling), produced by Scientific Software International (SSI) and developed by Professors Stephen Raudenbush (University of Michigan), Anthony Bryk (University of Chicago), and Richard Congdon (Harvard University).

To estimate multilevel models, as well as for any other modeling technique, it is necessary for the application to be accompanied by methodological rigor and certain care when analyzing the results, mainly if these are meant for making forecasts. The use of a certain estimation method, to the detriment of another, can help researchers and managers choose the most suitable model, adding value to their research, and allowing new studies on the topic chosen to be carried out.

Discovering implicit and contextual patterns in ever larger volumes of data has become an essential condition for organizations to succeed in competitive environments, and multilevel modeling contributes considerably to the set of techniques available for the data-mining process.

23.8 Exercises

1) The organizers of an international science competition for high school students from 24 countries (j = 1, ..., 24) wish to investigate participants’ performance based on their characteristics and on the characteristics of the countries they come from. Even though the coordinators of the event know that performance results from several factors, such as participants’ dedication and the characteristics of the schools where they study, at this moment they wish to verify whether there is a relationship between the scores obtained in the competition, students’ social status, expressed by their median household income, and the importance their countries give to scientific and technological development, expressed here by investments in research and development. The dataset contains data on the top five students from each country, for a total of 120 participants in the competition (i = 1, ..., 120), which generates a balanced clustered data structure. It can be found in the file Science_Competition.dta. The variables found in this dataset are:

Variable	Description
country	A string variable that identifies the country.
idcountry	Country code j.
resdevel	Country’s investments in research and development, in % of the GDP (Source: World Bank).
idstudent	Student code i.
score	Science score obtained by the student in the competition (0 to 100).
income	Student’s median household income per month (US$).

By using this dataset, we would like you to:

a) Elaborate a table that proves the existence of a balanced clustered data structure of students in countries.

b) Construct charts that allow us to visualize the average score obtained in the science competition by the participants from each country.

c) Given the existence of two analysis levels, with students (level 1) nested into countries (level 2), estimate the following null model:

${score}_{ij} = b_{0 j} + r_{ij}$

$b_{0 j} = γ_{00} + u_{0 j}$

which results in:

${score}_{ij} = γ_{00} + u_{0 j} + r_{ij}$

d) Through the estimation of the null model, is it possible to verify if there is variability in the scores obtained between students from different countries?
e) From the result of the likelihood-ratio test generated, is it possible to reject the null hypothesis that the random intercepts are equal to zero? That is, is it possible to rule out the estimation of a traditional linear regression model for these clustered data?
f) Also based on the estimation of the null model, calculate the intraclass correlation and discuss the result.
g) Construct a chart that has a linear adjustment by OLS, for each country, of each student’s science score behavior based on their median household income.

h) Estimate the following random intercepts model:

${score}_{ij} = b_{0 j} + b_{1 j} \cdot {income}_{ij} + r_{ij}$

$b_{0 j} = γ_{00} + u_{0 j}$

$b_{1 j} = γ_{10},$

which results in:

${score}_{ij} = γ_{00} + γ_{10} \cdot {income}_{ij} + u_{0 j} + r_{ij}$

i) At a significance level of 0.05, discuss the statistical significance of the estimations of fixed and random effects parameters.
j) Construct a bar chart that allows us to visualize random intercept terms u_0j per country.

k) Estimate the following random intercepts and slopes model:

${score}_{ij} = b_{0 j} + b_{1 j} \cdot {income}_{ij} + r_{ij}$

$b_{0 j} = γ_{00} + u_{0 j}$

$b_{1 j} = γ_{10} + u_{1 j}$

which results in:

${score}_{ij} = γ_{00} + γ_{10} \cdot {income}_{ij} + u_{0 j} + u_{1 j} \cdot {income}_{ij} + r_{ij}$

l) Based on the estimations of the random intercepts model and random intercepts and slopes model, run a likelihood-ratio test and discuss the result.

m) Estimate the following multilevel model:

${score}_{ij} = b_{0 j} + b_{1 j} \cdot {income}_{ij} + r_{ij}$

$b_{0 j} = γ_{00} + u_{0 j}$

$b_{1 j} = γ_{10} + γ_{11} \cdot {resdevel}_{j}$

which results in:

${score}_{ij} = γ_{00} + γ_{10} \cdot {income}_{ij} + γ_{11} \cdot {resdevel}_{j} \cdot {income}_{ij} + u_{0 j} + r_{ij}$

n) Present the expression of the last model estimated, with random intercepts and level-1 and level-2 variables.
o) Construct a chart in which it is possible to compare the predicted values of the score obtained in the science competition, generated through this two-level hierarchical modeling (HLM2), to the real values obtained (observed values) by the students of the sample.

2) A firm that rents commercial offices has a portfolio with 277 properties in a certain municipality. Its board of directors would like to find out if there are differences in the rental prices per square meter between properties and in the average rental prices between different districts, throughout time. In order to do that, the marketing team structured a dataset, which can be found in the file Commercial_Properties.dta. It contains the characteristics of these 277 offices that have already been rented (j = 1, ..., 277), whose rental prices were monitored for the last six years (t = 1, ..., 6), and of the 15 municipal districts (k = 1, ..., 15), where these properties are located. The variables found in this dataset are:

Variable	Description
district	District code k.
property	Property code j.
lnp	Natural logarithm of the rental price per square meter (adjusted by the inflation, base year 1).
year	Temporal variable (repeated measure) that corresponds to the period of monitoring (year 1 to 6).
food	Is there a restaurant or food court in the building where the property is located? (No = 0; Yes = 1).
space4	Are there four or more parking spaces? (No = 0; Yes = 1).
valet	Is there valet parking in the building where the property is located? (No = 0; Yes = 1).
subway	Is there a subway station in the district where the property is located? (No = 0; Yes = 1).
violence	Average mortality rate due to external causes in the district where the property is located (per 100,000 inhabitants).

This dataset, in which periods (level 1) are nested into properties (level 2), and these into districts (level 3), is structured according to the logic presented in the following figure:

We would like you to:

a) Elaborate a table that proves the existence of an unbalanced clustered data structure of properties in districts.

b) Elaborate a table that proves the existence of an unbalanced panel data in relation to the property periods of monitoring.

c) Construct a chart that allows us to visualize the temporal evolution of the natural logarithm of the rental price per square meter of the properties under analysis.

d) Construct a chart that allows us to check if there is an approximately linear behavior of the mean of the natural logarithm of the rental price per square meter of the properties during the periods.

e) Construct a chart that has, per municipal district, the temporal evolutions of the means of the natural logarithms of the rental prices per square meter of the properties (linear adjustments through OLS).

f) Given the existence of three analysis levels, with repeated measures (level 1) nested into the properties (level 2), and these nested into the municipal districts (level 3), estimate the following null model:

$ln {(p)}_{tjk} = π_{0 jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + r_{0 jk}$

$b_{00 k} = γ_{000} + u_{00 k}$

which results in:

$ln {(p)}_{tjk} = γ_{000} + u_{00 k} + r_{0 jk} + e_{tjk}$

g) Based on the estimation of the null model, calculate the level-2 and level-3 intraclass correlations and discuss the results.
h) Still through the estimation of the null model, is it possible to state that there is variability in the rental price of the commercial properties throughout the period analyzed, and that there is variability in the rental price, throughout time, between properties in the same district, and between properties located in different districts?
i) From the result of the likelihood-ratio test generated, is it possible to reject the null hypothesis that the random intercepts are equal to zero? That is, is it possible to rule out the estimation of a traditional linear regression model for these data?
j) Estimate the following linear trend model with random intercepts:

$ln {(p)}_{tjk} = π_{0 jk} + π_{1 jk} \cdot {year}_{jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + r_{0 jk}$

$π_{1 jk} = b_{10 k}$

$b_{00 k} = γ_{000} + u_{00 k}$

$b_{10 k} = γ_{100},$

which results in the following expression:

$ln {(p)}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + u_{00 k} + r_{0 jk} + e_{tjk}$

k) At a significance level of 0.05, discuss the statistical significance of the estimations of fixed and random effects parameters.
l) Construct two bar charts that allow us to visualize the random intercepts per district and per property.

m) Estimate the following linear trend model with random intercepts and slopes:

$ln {(p)}_{tjk} = π_{0 jk} + π_{1 jk} \cdot {year}_{jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + r_{0 jk}$

$π_{1 jk} = b_{10 k} + r_{1 jk}$

$b_{00 k} = γ_{000} + u_{00 k}$

$b_{10 k} = γ_{100} + u_{10 k}$

which results in:

$ln {(p)}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk}$

n) Calculate the new level-2 and level-3 intraclass correlations and discuss the results.
o) Run a likelihood-ratio test to compare the estimations of the linear trend models with random intercepts and with random intercepts and slopes.

p) Estimate the following linear trend model with random intercepts and slopes and level-2 variables:

$ln {(p)}_{tjk} = π_{0 jk} + π_{1 jk} \cdot {year}_{jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + b_{01 k} \cdot {food}_{jk} + b_{02 k} \cdot {space4}_{jk} + r_{0 jk}$

$π_{1 jk} = b_{10 k} + b_{11 k} \cdot {valet}_{jk} + r_{1 jk}$

$b_{00 k} = γ_{000} + u_{00 k}$

$b_{01 k} = γ_{010}$

$b_{02 k} = γ_{020}$

$b_{10 k} = γ_{100} + u_{10 k}$

$b_{11 k} = γ_{110},$

which results in the following expression:

$\begin{array}{l} ln {(p)}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + γ_{010} \cdot {food}_{jk} + γ_{020} \cdot {space4}_{jk} + γ_{110} \cdot {valet}_{jk} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk} \end{array}$

q) Present the expression of the last model estimated, with repeated measures, random intercepts and slopes, and level-2 variables.
r) Through this model, is it possible to state that the natural logarithm of the rental price per square meter of the properties follows a linear trend throughout time, and that there is a significant variance of intercepts and slopes between those located in the same district and between those located in different districts? If yes, does the existence of a restaurant or food court, the existence of four or more parking spaces, and the existence of valet parking in the building where the property is located explain part of this variability?
s) Estimate the following linear trend model with random intercepts and slopes and level-2 and level-3 variables:

$ln {(p)}_{tjk} = π_{0 jk} + π_{1 jk} \cdot {year}_{jk} + e_{tjk}$

$π_{0 jk} = b_{00 k} + b_{01 k} \cdot {food}_{jk} + b_{02 k} \cdot {space4}_{jk} + r_{0 jk}$

$π_{1 jk} = b_{10 k} + b_{11 k} \cdot {valet}_{jk} + r_{1 jk}$

$b_{00 k} = γ_{000} + γ_{001} \cdot {subway}_{k} + u_{00 k}$

$b_{01 k} = γ_{010}$

$b_{02 k} = γ_{020}$

$b_{10 k} = γ_{100} + γ_{101} \cdot {subway}_{k} + γ_{102} \cdot {violence}_{k} + u_{10 k}$

$b_{11 k} = γ_{110},$

which results in the following expression:

$\begin{array}{l} ln {(p)}_{tjk} = γ_{000} + γ_{100} \cdot {year}_{jk} + γ_{010} \cdot {food}_{jk} + γ_{020} \cdot {space4}_{jk} + γ_{001} \cdot {subway}_{k} \\ + γ_{110} \cdot {valet}_{jk} \cdot {year}_{jk} + γ_{101} \cdot {subway}_{k} \cdot {year}_{jk} + γ_{102} \cdot {violence}_{k} \cdot {year}_{jk} \\ + u_{00 k} + u_{10 k} \cdot {year}_{jk} + r_{0 jk} + r_{1 jk} \cdot {year}_{jk} + e_{tjk} \end{array}$

t) Present the random effects variance-covariance matrices for the levels district and property.
u) Estimate the same linear trend model with random intercepts and slopes, and level-2 and level-3 variables; however, now considering correlated random effects (u_00k, u_10k) and (r_0jk, r_1jk).

v) Present the random effects variance-covariance matrices for the levels district and property.
w) Run a likelihood-ratio test to compare the estimations of the models with independent and correlated random effects (u_00k, u_10k) and (r_0jk, r_1jk). What can we conclude based on the result of this test?

x) What is the final expression of the multilevel model estimated?
y) Is it possible to state that the existence of subways and the violence rate in the district explain part of the variability of the evolution of the natural logarithm of the rental price per square meter between the properties located in different districts?
z) Construct a chart in which it is possible to compare the predicted values of the natural logarithm of the rental price per square meter generated through this three-level hierarchical modeling (HLM3) to those generated through an estimate by OLS, which uses the same explanatory variables of the model in Item (x) inserted into the fixed effects component (year, food, space4, subway, valet*year, subway*year, and violence*year), and to the real values observed of the natural logarithm of the rental price per square meter of the properties.

Appendix

A.1 Hierarchical Nonlinear Models

As we have already discussed, the generalized linear latent and mixed models (GLLAMM), similar to the generalized linear models (GLM), encompass the hierarchical linear models (HLM) we studied throughout this chapter, and the hierarchical nonlinear models (HNM). The latter apply when, in a nested data structure, the dependent variable is categorical or contains count data, which is why we have chosen to present logistic, Poisson, and negative binomial hierarchical nonlinear models in this Appendix. Fig. 23.57 shows the logic of the generalized linear latent and mixed models, highlighting the models that will be studied from now on.

Fig. 23.57 Generalized linear latent and mixed models, highlighting the hierarchical nonlinear models.

(A) Hierarchical Logistic Models

Analogous to what we studied in Chapter 14, mixed effects logistic regression models can be used whenever the dependent variable is qualitative and dichotomic, the data are found in a certain nested structure (in levels), and there may be clustered data or data with repeated measures. In these situations, researchers can estimate a model aiming at capturing the relationship between the behavior of explanatory variables and the occurrence of the phenomenon being studied, represented by a dichotomic variable (dummy), as well as studying the variance decomposition of the random effects components due to the presence of a multilevel structure.

In this section, we will present a two-level hierarchical logistic model with clustered data. In general and from Expressions (14.10) and (23.23), we can define this model with two analysis levels. The first level offers explanatory variables X₁, ..., X_Q, which refer to each individual i (i = 1, ..., n), and the second level, explanatory variables W₁, ..., W_S that refer to each group j (j = 1, ..., J), invariable for the observations that belong to the same group, as follows:

$Level 1 : p_{ij} = \frac{1}{1 + e^{- (b_{0 j} + b_{1 j} \cdot X_{1 ij} + b_{2 j} \cdot X_{2 ij} + \dots + b_{Qj} \cdot X_{Qij})}}$

(23.45)

where p_ij represents the probability of occurrence of the event we are interested in for each observation i that belongs to a certain group j, and b_qj (q = 0, 1, ..., Q) refer to the level-1 coefficients.

$Level 2 : b_{qj} = γ_{q 0} + \sum_{s = 1}^{S_{q}} γ_{qs} \cdot W_{sj} + u_{qj}$

(23.46)

where γ_qs (s = 0, 1, ..., S_q) refer to the level-2 coefficients, and u_qj are the level-2 random effects, normally distributed, with mean equal to zero and variance τ_qq. Furthermore, the level-1 error terms, assumed to be independent of u_qj, have mean equal to zero and variance π²/3, which is the variance of the standard logistic distribution.

At this moment, let’s present an example. A global study was carried out to investigate whether there are differences between couples who reside in different countries when it comes to traveling abroad for tourism. In order to do that, data on 1,622 couples located in 50 countries were collected, including the average age of each couple and the number of children they have. Part of the dataset is presented in Table 23.7. The complete dataset can be found in the file Tourism.dta.

Table 23.7

Example: Traveling Abroad of Couples (Level 1) Residents in Different Countries (Level 2)
Observation (Couple i - Level 1)	Country j Where the Couple Lives (Level 2)	Traveled Abroad for Tourism in the Last Year (Y_ij)	Couple’s Average Age (X_1ij)	Number of Children (X_2ij)
1	France	Yes	68	2
2	France	Yes	37	0
…
117	France	Yes	54	3
…
1,604	Egypt	No	55	2
1,605	Egypt	No	51	2
…
1,622	Egypt	Yes	39	0


After opening this file, we can type the command desc, which makes it possible to analyze the dataset characteristics, such as, the number of observations, the number of variables, and the description of each one of them. Fig. 23.58 shows this output in Stata.

Fig. 23.58 Description of the Tourism.dta Dataset.

Since the main goal of this Appendix is not to discuss the concepts presented throughout this chapter once again, let’s carry out the following estimation:

$p {(tourism)}_{ij} = \frac{1}{1 + e^{- (b_{0 j} + b_{1 j} \cdot {age}_{ij} + b_{2 j} \cdot {children}_{ij})}}$


$b_{0 j} = γ_{00} + u_{0 j}$

$b_{1 j} = γ_{10}$

$b_{2 j} = γ_{20},$

which results in the random intercepts model:

$p {(tourism)}_{ij} = \frac{1}{1 + e^{- (γ_{00} + γ_{10} \cdot {age}_{ij} + γ_{20} \cdot {children}_{ij} + u_{0 j})}}$


where the variable tourism is dichotomic (dummy), in which values equal to 1 correspond to the couples that traveled abroad for tourism in the last year, and values equal to 0 are the opposite.

To estimate this model in Stata, we must type the following command:

melogit tourism age children || country: , nolog⁸

whose outputs can be found in Fig. 23.59.

Based on this figure, initially, we can see that we have 1622 observations (couples) nested into 50 groups (countries), which characterizes a two-level clustered data structure.

A more inquisitive researcher may verify that the parameter estimations of the fixed and random effects components are identical to the ones that would be obtained through the following command:

meglm tourism age children || country: , family(bernoulli) link(logit) nolog

where the term meglm means multilevel mixed effects generalized linear model. This command requires us to define the family of distributions of the dependent variable, which in this case is Bernoulli, and the canonical link function, which in this situation is the logit.⁹

Moreover, the odds ratios of the fixed effects parameters can also be obtained directly, by typing the term or (odds ratio) at the end of the commands presented.

Given that the independent error terms of u_qj have variance equal to π²/3, we can define the following intraclass correlation:

$rho = \frac{τ_{00}}{τ_{00} + \frac{π^{2}}{3}} = \frac{0.255}{0.255 + \frac{π^{2}}{3}} = 0.072,$


which suggests that approximately 7% of the total variance of the error terms is due to differences in the dependent variable’s behavior between countries. From Stata 13 onward, it is possible to obtain this intraclass correlation directly, by typing the command estat icc right after the estimation of the corresponding model.
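The calculation of this intraclass correlation can be reproduced with the short Python sketch below. The variable names are ours, and the only inputs are the estimate τ₀₀ = 0.255 quoted above and the level-1 variance π²/3:

```python
import math

tau_00 = 0.255                        # variance of random intercepts u_0j (from the text)
level1_variance = math.pi ** 2 / 3    # variance of the standard logistic distribution

# Intraclass correlation of the two-level logistic model.
rho = tau_00 / (tau_00 + level1_variance)

print(round(rho, 3))  # 0.072
```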

Even though Stata does not directly show the result of the z tests with their respective significance levels for the random effects parameters, the fact that the estimate of variance component τ₀₀, which corresponds to random intercepts u_0j, is considerably higher than its standard error suggests that there is significant variation in the behavior of couples who reside in different countries when it comes to traveling abroad for tourism. Statistically, we can see that z = 0.255 / 0.088 = 2.90 > 1.96, where 1.96 is the critical value of the standard normal distribution for a two-tailed significance level of 0.05.
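For readers who wish to check this z statistic, the Python sketch below reproduces it and also computes an approximate two-tailed p-value from the standard normal distribution. The variable names are ours. Since a variance cannot be negative, this Wald-type comparison should be read only as a rough indication, and we present it here as an approximation rather than as an exact test:

```python
import math

tau_00 = 0.255      # estimate of the random intercepts variance (from the text)
std_error = 0.088   # its standard error (from the text)

z = tau_00 / std_error

# Two-tailed p-value under the standard normal distribution:
# P(|Z| > z) = erfc(z / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))      # 2.9
print(p_value < 0.05)   # True
```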

Even though country variables that could explain such behavior, such as cultural, economic, or social characteristics, have not been considered, we are able to verify that, while an increase in age increases the expected probability that couples travel abroad for tourism, ceteris paribus, this probability decreases as the number of children increases, also ceteris paribus. The model estimated has the following expression:

$p {(tourism)}_{ij} = \frac{1}{1 + e^{- (0.439 + 0.015 \cdot {age}_{ij} - 0.424 \cdot {children}_{ij} + u_{0 j})}}$

At the bottom of Fig. 23.59, we can see, from the result of the likelihood-ratio test, that the estimation of this multilevel model is more suitable than the estimation of a traditional binary logistic regression model for the data in our example.

Therefore, we can obtain the expected probability values of the occurrence of the event being studied (traveling abroad for tourism) for each of the couples in the sample. In order to do that, we must type the following command, which generates a new variable (phat) in the dataset:

predict phat

Besides, we can also obtain the error terms u_0j, invariable for couples from the same country. In order to do that, we must type the following command:

predict u0, remeans

which makes the new variable, u0, also be generated in the dataset.

The following command, which generates the outputs seen in Fig. 23.60, shows the values of phat and the error terms u0 only for the couples who reside in Brazil:

list country tourism phat u0 if country == "Brazil"

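Purely for educational purposes, researchers may verify that the variable phat can also be reproduced through the following expression (since generate does not overwrite an existing variable, phat must first be dropped or a different name must be used):

gen phat = (1) / (1 + exp(-(0.4393717 + 0.0150543*age - 0.4239421*children + u0)))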
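To see how this expression behaves, the following minimal Python sketch evaluates it outside Stata. The coefficients are copied from the expression above. The function name expected_probability, the age value of 40, and the choice u0 = 0 are illustrative assumptions, not values taken from the dataset.

```python
import math

# Fixed-effects estimates copied from the gen phat expression in the text.
B0, B_AGE, B_CHILDREN = 0.4393717, 0.0150543, -0.4239421

def expected_probability(age, children, u0=0.0):
    """Logistic transform of the linear predictor plus the country intercept u0."""
    linear_predictor = B0 + B_AGE * age + B_CHILDREN * children + u0
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical couple with age 40, evaluated at u0 = 0 for 0 to 3 children.
for n_children in range(4):
    print(n_children, round(expected_probability(40, n_children), 3))
```

Because the coefficient of children is negative, the printed probabilities fall as the number of children rises. A positive u0 shifts the whole curve upward for the corresponding country.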

Finally, we can construct a chart that shows, as a function of the variable children, the fitted S-shaped curves (sigmoid functions) of the expected probabilities that couples residing in five specific countries, chosen for their different locations around the globe, travel abroad for tourism. This chart, which can be seen in Fig. 23.61, is obtained by typing the following command:

graph twoway scatter phat children || mspline phat children if country =="France" || mspline phat children if country =="United States" || mspline phat children if country =="Japan" || mspline phat children if country =="South Africa" || mspline phat children if country =="Venezuela" ||, legend(label(2 "France") label(3 "United States") label(4 "Japan") label(5 "South Africa") label(6 "Venezuela"))

Fig. 23.61 Adjustments of the expected probabilities that couples who reside in five countries travel abroad for tourism, based on the number of children.

This chart clearly shows how the behavior of couples in relation to traveling abroad for tourism differs from country to country.

(B) Hierarchical Models for Count Data

Analogous to what we studied in Chapter 15, mixed effects regression models for count data can be used when the dependent variable is quantitative but takes only discrete, non-negative values, and when the data have a nested (multilevel) structure, whether as clustered data or as data with repeated measures.

In this section, we will present a hierarchical model for count data with three levels and clustered data. In general, and based on Expressions (15.4), (23.30), and (23.31), we can define this three-level model as follows. The first level contains the level-1 explanatory variables Z₁, ..., Z_P, which refer to units i (i = 1, ..., n). The second level contains the level-2 explanatory variables X₁, ..., X_Q, which refer to units j (j = 1, ..., J) and are constant for the units that belong to the same group j. The third level contains the level-3 explanatory variables W₁, ..., W_S, which refer to units k (k = 1, ..., K) and are constant for the units that belong to the same group k. The model is specified as:

$\text{Level 1: } \ln (λ_{ijk}) = π_{0 jk} + π_{1 jk} \cdot Z_{1 jk} + π_{2 jk} \cdot Z_{2 jk} + \dots + π_{Pjk} \cdot Z_{Pjk}$

(23.47)

where λ is the expected number of occurrences or the estimated average incidence rate of the phenomenon being studied for a certain exposure. π_pjk (p = 0, 1, ..., P) refer to the level-1 coefficients, and Z_pjk is the p-th level-1 explanatory variable for observation i in level-2 unit j and in level-3 unit k.

$\text{Level 2: } π_{pjk} = b_{p 0 k} + \sum_{q = 1}^{Q_{p}} b_{pqk} \cdot X_{qjk} + r_{pjk}$

(23.48)

where b_pqk (q = 0, 1, ..., Q_p) refer to the level-2 coefficients. X_qjk is the q-th level-2 explanatory variable for unit j in the level-3 unit k. r_pjk are the level-2 random effects, assuming, for each unit j, that the vector (r_0jk, r_1jk, ..., r_Pjk)´ follows a multivariate normal distribution with each element having mean zero and variance τ_rπpp.

$\text{Level 3: } b_{pqk} = γ_{pq 0} + \sum_{s = 1}^{S_{pq}} γ_{pqs} \cdot W_{sk} + u_{pqk}$

(23.49)

where γ_pqs (s = 0, 1, ..., S_pq) refer to the level-3 coefficients, W_sk is the s-th level-3 explanatory variable for unit k, and u_pqk are the level-3 random effects, assuming that for each unit k, the vector formed by terms u_pqk follows a multivariate normal distribution with each element having mean zero and variance τ_uπpp.

Imagine that a national study has been carried out to investigate the relationship between the number of traffic accidents and the average amount of alcohol ingested per inhabitant/day (in grams). The study covered municipal districts located throughout Brazil over the last year. It also seeks to find out whether this relationship differs between districts located in different municipalities and in different states of the federation. To that end, data from 1,062 municipal districts located in 234 municipalities in all 27 units of the federation (26 states and the Federal District) were analyzed. Part of the dataset is presented in Table 23.8, and the complete dataset can be found in the file Traffic_Accidents.dta.

Table 23.8 Example: Traffic Accidents in Municipal Districts (Level 1) From Different Municipalities (Level 2) and Different States (Level 3)
State k (Level 3)	Municipality j (Level 2)	Municipal district i (Level 1)	Number of Traffic Accidents in the Last Year (Y_ijk)	Average Amount of Alcohol Ingested per Inhabitant/Day, in Grams (Z_jk)
AC	1	1	9	12.57
AC	2	2	10	13.36
...
AC	3	11	2	12.33
...
TO	231	1,052	2	11.94
TO	231	1,053	3	10.54
...
TO	234	1,062	5	11.74

Fig. 23.62 shows the output generated in Stata when we typed the command desc.

Following the logic presented in Chapter 15, let's initially construct a histogram for the variable accidents, which will be the dependent variable of the model to be proposed. To do so, we must type the following command, which generates the histogram in Fig. 23.63:

hist accidents, discrete freq

Fig. 23.63 Histogram of dependent variable accidents.

As studied in Chapter 15, before estimating models that involve count data, it is useful for researchers to assess whether the mean and variance of the dependent variable are equal, or at least close to one another. This gives them an idea of whether the Poisson model is suitable or whether it will be necessary to estimate a negative binomial model. This preliminary diagnostic can be obtained by typing the following command, whose results can be found in Fig. 23.64:

tabstat accidents, stats(mean var)

Fig. 23.64 Mean and variance of the dependent variable accidents.
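The same mean-versus-variance comparison can be sketched in a few lines of Python. The example below uses only the six accident counts visible in the excerpt of Table 23.8, so its numbers are not the ones reported in Fig. 23.64. The variable names are illustrative assumptions.

```python
import statistics

# The six accident counts visible in the excerpt of Table 23.8 (not the full dataset).
accidents_excerpt = [9, 10, 2, 2, 3, 5]

mean = statistics.mean(accidents_excerpt)
variance = statistics.variance(accidents_excerpt)  # sample variance (n - 1 denominator)

# A variance well above the mean is a first hint of overdispersion.
print(f"mean = {mean:.3f}, variance = {variance:.3f}, ratio = {variance / mean:.2f}")
```

A ratio close to 1 would be compatible with the Poisson assumption of equal mean and variance, whereas a ratio clearly above 1 points toward a negative binomial specification.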

Although the variance of the variable accidents is much higher than its mean, which indicates overdispersion in the data, we will initially estimate a Poisson model for educational purposes. When modeling the number of traffic accidents, one possibility would be to include dummy variables representing municipalities and states in the fixed effects component. Instead, we will treat them as random effects and estimate a multilevel Poisson regression model with three levels and random intercepts. The existence of overdispersion, which would favor the multilevel negative binomial regression model over the Poisson model, will be formally assessed later through a likelihood-ratio test.

Therefore, let’s carry out the following estimation:

$ln ({accidents}_{ijk}) = π_{0 jk} + π_{1 jk} \cdot {alcohol}_{jk}$

$π_{0 jk} = b_{00 k} + r_{0 jk}$

$π_{1 jk} = b_{10 k}$

$b_{00 k} = γ_{000} + u_{00 k}$

$b_{10 k} = γ_{100},$

which results in the random intercepts model:

$ln ({accidents}_{ijk}) = γ_{000} + γ_{100} \cdot {alcohol}_{jk} + u_{00 k} + r_{0 jk}$

where the variable accidents represents the phenomenon being studied. It is quantitative and only has non-negative and discrete values (count data), indicating the incidence of traffic accidents in the last year in the municipal district i located in the municipality j of state k.

To estimate the model proposed in Stata, we must type the following command:

mepoisson accidents alcohol || state: || municipality: , nolog

in which the levels are entered following the same nesting criterion discussed throughout this chapter, that is, from the highest to the lowest level, with the levels separated by ||. The outputs generated are shown in Fig. 23.65.

Fig. 23.65 Outputs of the multilevel Poisson model with random intercepts in Stata.

Based on this figure, initially, we can see the existence of a three-level unbalanced clustered data structure. Besides, the result of the likelihood-ratio test shows that there is significant variability between the districts located in different municipalities and states, which favors the use of the multilevel Poisson model in relation to a traditional Poisson regression model without random effects.

Before moving on, we can type the command estimates store mepoisson, which stores the results of this estimation for future comparison with the ones that will be obtained through the estimation of the negative binomial model. Moreover, we can also type predict lambda, which generates a new variable in the dataset (lambda) that corresponds to the estimated incidence of traffic accidents in the last year in each of the 1,062 municipal districts. Finally, researchers may also add the term irr (incidence rate ratio) at the end of the estimation command, as studied in Chapter 15, so that the fixed effects are reported as incidence rate ratios, that is, as the multiplicative change in the expected number of traffic accidents per year for a one-unit increase in each explanatory variable.

An even more inquisitive researcher may verify that the parameter estimations of the fixed and random effects components are identical to the ones that would be obtained through the following command:

meglm accidents alcohol || state: || municipality: , family(poisson) link(log) nolog

which specifies, for the multilevel mixed-effects generalized linear model (command meglm), that the dependent variable follows a Poisson distribution and that the canonical link function is the logarithm.

Even after the random effects have been taken into account, the number of traffic accidents may still present overdispersion. Thus, we must re-examine the data by estimating a negative binomial model, so that its results can be compared to the ones obtained from the Poisson model. To do so, we must type the following command:

menbreg accidents alcohol || state: || municipality: , nolog

The results obtained are shown in Fig. 23.66.

At the bottom of this figure and from the result of the likelihood-ratio test, we can see that the estimation of this multilevel model is more suitable than the estimation of a traditional negative binomial regression model without random effects for the data in our example. In addition, all the fixed and random effects parameters are statistically different from zero, at a significance level of 0.05.

The estimation of the variances of u_00k and r_0jk resulted in smaller values than the respective values obtained when estimating the multilevel Poisson model (from 0.386 to 0.377 for u_00k and from 0.083 to 0.061 for r_0jk), a fact that is due to the addition of an overdispersion parameter that controls the variability of the data.

In Fig. 23.66, the estimate of lnalpha is also reported. As studied in Chapter 15, remember that alpha (or ϕ), which is the conditional overdispersion of the data, represents the inverse of the shape parameter of the Gamma distribution. For the data in our example, we have $\widehat{alpha} = e^{- 2.258} = 0.105$.
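The back-transformation of lnalpha can be checked with a short Python sketch. It assumes the mean-dispersion form of the negative binomial, in which the conditional variance is mu(1 + alpha·mu). The function name nb_variance and the conditional means used in the loop are hypothetical, not estimates from the example.

```python
import math

LN_ALPHA = -2.258  # lnalpha estimate quoted in the text (Fig. 23.66)
alpha = math.exp(LN_ALPHA)

def nb_variance(mu, alpha):
    """Conditional variance under the mean-dispersion form: mu * (1 + alpha * mu)."""
    return mu * (1.0 + alpha * mu)

print(round(alpha, 3))

# Hypothetical conditional means, to compare against the Poisson case (variance = mean).
for mu in (2.0, 5.0, 10.0):
    print(mu, round(nb_variance(mu, alpha), 2))
```

With alpha equal to zero the variance collapses to the mean, which is the Poisson case. Under this parameterization, any positive alpha makes the variance grow faster than the mean.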

Analogously, the fixed and random effects parameters can also be obtained through the following command:

meglm accidents alcohol || state: || municipality: , family(nbinomial) link(log) nolog

In order to compare the estimations of the multilevel Poisson and negative binomial models, we must run a likelihood-ratio test, by typing the following command:

lrtest mepoisson ., force

where the term mepoisson refers to the stored estimation of the Poisson model. Since we are comparing models obtained from two different estimators (mepoisson and menbreg), we must use the term force when running this likelihood-ratio test. The result of the test can be seen in Fig. 23.67. It shows that the negative binomial model is more suitable, which confirms the existence of overdispersion in the data.

Fig. 23.67 Likelihood-ratio test to compare the estimations of the multilevel Poisson and negative binomial models.

Therefore, the expression of the estimated average number of traffic accidents per year, for a certain municipal district i in a certain municipality j in a state k, is given by:

$u_{ijk} = e^{(0.754 + 0.047 \cdot {alcohol}_{jk} + u_{00 k} + r_{0 jk})}$

where u represents the expected number of occurrences or the estimated average rate of incidence of traffic accidents for one year. In order for these estimated numbers to be generated in the dataset (new variable u), we can type the following command:

predict u

We can also obtain the error terms u_00k (constant for the districts located in the same state) and r_0jk (constant for the districts located in the same municipality). To do so, we must type the following command:

predict u00 r0, remeans

which creates two new variables, u00 and r0, in the dataset.

The following command, which generates the outputs in Fig. 23.68, shows the values of u, u00, and r0 only for the municipal districts in the State of Mato Grosso:

list state municipality accidents u u00 r0 if state =="MT", sepby(municipality)

Through this figure, we can see that the values of u00 are the same for all the municipal districts in the State of Mato Grosso, while the values of r0 are the same for all the districts within each municipality.

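Purely for educational purposes, researchers may verify that the variable u can also be reproduced through the following expression (since generate does not overwrite an existing variable, u must first be dropped or a different name must be used):

gen u = exp(0.7538477 + 0.0466768*alcohol + u00 + r0)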
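The role of each term in this expression can be illustrated with a minimal Python sketch. The coefficients are copied from the expression above. The function name expected_accidents, the alcohol values of 12 and 13 grams, and the random intercepts 0.2 and -0.1 are hypothetical and serve only as an illustration.

```python
import math

# Fixed-effects estimates copied from the gen u expression in the text.
G000, G100 = 0.7538477, 0.0466768

def expected_accidents(alcohol, u00=0.0, r0=0.0):
    """Exponentiate the linear predictor, including state (u00) and municipality (r0) intercepts."""
    return math.exp(G000 + G100 * alcohol + u00 + r0)

# Incidence rate ratio for one extra gram of alcohol per inhabitant/day.
irr = math.exp(G100)
print(round(irr, 3))

# The ratio of expected counts one gram apart equals the IRR, whatever the random intercepts are.
# The values 0.2 and -0.1 below are hypothetical random intercepts.
ratio = expected_accidents(13.0, 0.2, -0.1) / expected_accidents(12.0, 0.2, -0.1)
print(round(ratio, 3))
```

Since the random intercepts enter the exponent additively, they rescale the expected count of every district in a state or municipality by a common factor. They leave the incidence rate ratio of alcohol unchanged.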

Finally, we can construct a chart that compares the fitted values of the traditional and multilevel negative binomial models. This chart, which can be seen in Fig. 23.69, is obtained by typing the following commands:

quietly nbreg accidents alcohol
predict utrad
graph twoway scatter accidents alcohol || mspline utrad alcohol || mspline u alcohol ||, legend(label(2 "Traditional Negative Binomial") label(3 "Multilevel Negative Binomial"))

Fig. 23.69 Adjustments of the estimated number of traffic accidents obtained through the traditional and multilevel negative binomial models, based on the average amount of alcohol ingested per inhabitant/day in the district.
