Appendix B

Log-linear Model—An Introduction

(Source: Knoke and Burke [1] and Course Material by Angela Jeansonne)

Log-linear analysis extends the two-way contingency table: the conditional relationship between two or more discrete, categorical variables is analyzed by taking the natural logarithm of the cell frequencies within the table. The log-linear model states the expected cell frequencies of a cross-tabulation (the $F_{ij}$) as functions of parameters representing characteristics of the categorical variables and their relationships with each other. The general log-linear model does not distinguish between independent and dependent variables; all variables are treated alike as ‘response variables’ whose mutual associations are explored.

The data usually suited to log-linear analysis are contingency tables. Angela Jeansonne provides an example in the class materials. Suppose we are interested in the relationship between sex, heart disease, and body weight. We could take a sample of 200 subjects and record each subject's sex, approximate body weight, and whether or not the subject has heart disease. The continuous variable, body weight, is broken down into two discrete categories: not overweight and overweight. The contingency table containing the data may look like this:

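[Table not reproduced: cross-tabulation of sex by body weight (not overweight, overweight) by heart disease for the 200 subjects]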

According to Jeansonne's class material, the basic strategy in log-linear modeling is to fit models to the observed frequencies in the cross-tabulation of categorical variables. Each model can then be represented by a set of expected frequencies that may or may not resemble the observed frequencies. Models vary in the marginals they fit, and can be described in terms of the constraints they place on the associations or interactions present in the data. The pattern of association among variables can be described by a set of odds and by one or more odds ratios derived from them. Once expected frequencies are obtained, we compare models that are hierarchically related and choose a preferred model, which is the most parsimonious model that fits the data. A model is not chosen if it bears no resemblance to the observed data. The choice of a preferred model is typically based on a formal comparison of the goodness-of-fit statistics of hierarchically related models (models containing higher-order terms also implicitly include all lower-order terms). Ultimately, the preferred model should distinguish the pattern of the variables in the data from sampling variability, thus providing a defensible interpretation.
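The comparison of observed and expected frequencies described above can be sketched in a few lines. The text does not name a specific goodness-of-fit statistic, so the example assumes two common choices, the likelihood-ratio statistic G² and the Pearson chi-square. The counts are hypothetical and are not the data from the heart disease example; the function names are illustrative only.

```python
import math

def odds_ratio(table):
    """Cross-product odds ratio of a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def likelihood_ratio_g2(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )

def pearson_chi2(observed, expected):
    """Pearson chi-square = sum((O - E)^2 / E) over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 counts (assumption, not taken from the source).
observed = [[30, 70], [60, 40]]

# Expected frequencies under a candidate model. Here: no association,
# i.e. (row total * column total) / N for each cell.
expected_independence = [[45.0, 55.0], [45.0, 55.0]]

sample_odds_ratio = odds_ratio(observed)
g2_independence = likelihood_ratio_g2(observed, expected_independence)
x2_independence = pearson_chi2(observed, expected_independence)

# A model that reproduces the observed counts exactly has G^2 = 0.
g2_saturated = likelihood_ratio_g2(observed, observed)

print(f"odds ratio: {sample_odds_ratio:.4f}")
print(f"G^2 (independence): {g2_independence:.4f}")
print(f"Pearson chi-square (independence): {x2_independence:.4f}")
print(f"G^2 (saturated): {g2_saturated:.4f}")
```

A large statistic relative to its degrees of freedom suggests that the simpler model does not resemble the observed data, in which case the more complex model would be preferred.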

Jeansonne then writes out the log-linear model. The following model corresponds to the traditional chi-square test, in which two variables, each with two levels (a 2 × 2 table), are evaluated to see whether an association exists between them:

(B.1) $\ln(F_{ij}) = \mu + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}$

where $\ln(F_{ij})$ is the natural logarithm of the expected cell frequency of the cases for cell ij in the contingency table, µ is the overall mean of the natural logarithm of the expected frequencies, the λ terms each represent ‘effects’ which the variables have on the cell frequencies, A and B are the variables, and i and j index the categories within the variables.
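To make the roles of µ and the λ terms concrete, the sketch below decomposes the log counts of a two-way table into these effects. It assumes sum-to-zero constraints on the λ terms (which is what makes µ the overall mean of the log frequencies); other parameterizations exist. The counts and the function name are hypothetical.

```python
import math

def two_way_effects(table):
    """Split log cell frequencies into mu, lambda^A, lambda^B and lambda^AB,
    assuming each set of lambda terms sums to zero over its index."""
    logs = [[math.log(f) for f in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])

    # mu: overall mean of the log frequencies.
    mu = sum(sum(row) for row in logs) / (n_rows * n_cols)
    # lambda^A_i: row mean of the logs minus mu.
    lam_a = [sum(row) / n_cols - mu for row in logs]
    # lambda^B_j: column mean of the logs minus mu.
    lam_b = [sum(col) / n_rows - mu for col in zip(*logs)]
    # lambda^AB_ij: whatever remains in cell ij after the main effects.
    lam_ab = [
        [logs[i][j] - mu - lam_a[i] - lam_b[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return mu, lam_a, lam_b, lam_ab

# Hypothetical 2 x 2 counts (assumption, not taken from the source).
observed = [[30, 70], [60, 40]]
mu, lam_a, lam_b, lam_ab = two_way_effects(observed)

# Summing all four effects and exponentiating recovers every cell.
rebuilt = [
    [math.exp(mu + lam_a[i] + lam_b[j] + lam_ab[i][j]) for j in range(2)]
    for i in range(2)
]
print(rebuilt)
```

Because all four kinds of effect are retained, the rebuilt table matches the input counts up to floating-point error.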

Jeansonne also states that the above model is considered a saturated model because it includes all possible one-way and two-way effects. Because the saturated model has as many effects as the contingency table has cells, the expected cell frequencies will always exactly match the observed frequencies, with no degrees of freedom remaining (Knoke and Burke, 1980). For example, in a 2 × 2 table there are four cells, and in a saturated model involving two variables there are four effects, $\mu$, $\lambda_i^{A}$, $\lambda_j^{B}$, and $\lambda_{ij}^{AB}$; therefore the expected cell frequencies will exactly match the observed frequencies. Thus, in order to find a more parsimonious model that isolates the effects best demonstrating the data patterns, a non-saturated model must be sought. This can be achieved by setting some of the effect parameters to zero. For instance, if we set the effect parameter $\lambda_{ij}^{AB}$ to zero (i.e., we assume that variable A has no effect on variable B, or vice versa), we are left with the unsaturated model:

(B.2) $\ln(F_{ij}) = \mu + \lambda_i^{A} + \lambda_j^{B}$
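As a rough illustration of the unsaturated model, the sketch below computes expected frequencies for a two-way table with no A-by-B association and checks that their logarithms contain no interaction component. It relies on the standard closed-form result that, without the two-way effect, the expected count in cell ij is (row total × column total) / N; this formula is not stated in the text above. The counts and function names are hypothetical.

```python
import math

def independence_expected(table):
    """Expected frequencies with no A-by-B association:
    (row total * column total) / N for each cell."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def interaction_terms(table):
    """Two-way effects of the log frequencies under sum-to-zero constraints:
    log F_ij minus its row mean and column mean, plus the grand mean."""
    logs = [[math.log(f) for f in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in logs]
    col_means = [sum(col) / n_rows for col in zip(*logs)]
    return [
        [logs[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Hypothetical 2 x 2 counts (assumption, not taken from the source).
observed = [[30, 70], [60, 40]]
expected = independence_expected(observed)

# Dropping the two-way effect frees (I - 1) * (J - 1) degrees of freedom.
residual_df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print("interaction in expected:", interaction_terms(expected))
print("interaction in observed:", interaction_terms(observed))
print("residual degrees of freedom:", residual_df)
```

The expected table keeps the observed row and column totals, while its interaction terms vanish up to floating-point error; the observed table, by contrast, retains nonzero interaction terms.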
