Hour 17. Common R Models


What You’ll Learn in This Hour:

Image How to fit GLM Models

Image How to fit Nonlinear Models

Image How to fit Survival Models

Image How to fit Time Series Models


In Hour 16, “Introduction to R Models and Object Orientation,” we explored the ways in which we can fit and assess statistical models in R. To achieve this, we used a simple linear modeling approach using the lm function. However, as mentioned, R has one of the richest analytic feature sets of any technology available today. In this hour, we’ll extend the ideas of the previous hour to other modeling approaches. Specifically, we’ll look at Generalized Linear Models, Nonlinear Models, Survival Models, and Time Series Models. Along the way, we’ll also look at extensions to each modeling approach provided by R, and see where to access further information on these model types.


Note: Theory versus Code

In this hour, we provide a high-level overview of the theory for each modeling approach and then show how the models can be implemented in R. Consequently, we will not spend too much time on the detailed theory, or on the assessment of model performance, beyond that which helps you understand how methods can be applied to model objects.


Generalized Linear Models

In Hour 16, we used the lm function to fit Linear Models to our data. The “linear” aspect, here, refers to the fitting of a dependent variable against a linear function of independent variables. Here’s an example:

Y = θ0 + θ1X1 + θ2X2 + ... + θNXN + ε

Here, our Dependent Variable (Y) is modeled against N Independent Variables (X1 to XN), with parameters (θ0 to θN) to be estimated by the model-fitting process. With the Linear Model, such as that fit by the lm function, we make a number of assumptions. In particular, we assume that the Dependent Variable (Y) is continuous and Normally distributed. Furthermore, we assume the errors (ε) are independent and identically distributed such that E(ε) = 0 and var(ε) = σ². We also assume that the errors (ε) are Normally distributed with mean 0 and variance σ² for the purposes of tests.

GLM Definition

The Linear Model, described here, can be considered a special case of the Generalized Linear Model (GLM) framework. The GLM approach allows us to fit models where

Image The Dependent Variable may not be continuous and Normally distributed.

Image The variance of the Dependent Variable may depend on the mean.

The GLM framework uses four elements to fit a model:

Image A probability distribution from the exponential family

Image A “linear predictor” to be modeled

Image A “link function” defining how the linear predictor is related to the Dependent Variable

Image A “variance function” explaining how the variance depends on the mean

In the GLM framework, the Dependent Variable (Y) is assumed to be generated from a distribution in the exponential family, a broad class of distributions. A number of common distributions are listed in Table 17.1.

Image

TABLE 17.1 Selection of Distributions from the Exponential Family

The linear predictor is of the following form:

γ = θ0 + θ1X1 + θ2X2 + ... + θNXN

Here, the linear predictor (γ) is linearly related to N Independent Variables (X1 to XN), with parameters (θ0 to θN) to be estimated by the model-fitting process.

The link function (g) is of the format g(μ) = γ and specifies how the linear predictor (γ) is related to the mean of the Dependent Variable, E(Y) = μ.

The variance function (V) explains how the variance of the Dependent Variable, var(Y), depends on its mean (μ), specified as var(Y) = ϕV(μ). The variance function is typically dictated by the selected probability distribution.
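In R, these elements are bundled into “family” objects, which we can inspect directly. As a small sketch (the input values here are chosen purely for illustration), the binomial family exposes its link and variance functions:

> bin <- binomial()      # family object: binomial distribution, default logit link
> bin$linkfun(0.5)       # link function g(mu): log-odds of mu = 0.5 is 0
> bin$variance(0.5)      # variance function V(mu) = mu * (1 - mu), so 0.25 here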

Fitting a GLM

We can use the glm function to fit a Generalized Linear Model (GLM) in R. The key inputs to the glm function are listed in Table 17.2.

Image

TABLE 17.2 Key Inputs to glm

The formula, data, and na.action inputs are similar to the arguments seen with the lm function. Here, the formula describes the linear predictor we wish to model. The family input describes the link and variance function to be applied by the GLM framework. The family argument is typically specified as a character string or function. Some common examples are seen in Table 17.3, with further detail found in the ?family help file.

Image

TABLE 17.3 GLM Family Inputs
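For example, the following calls (shown as a sketch rather than run here) are equivalent ways of requesting a binomial model with its default logit link:

> glm(am ~ wt, data = mtcars, family = "binomial")               # character string
> glm(am ~ wt, data = mtcars, family = binomial)                 # family function
> glm(am ~ wt, data = mtcars, family = binomial(link = "logit")) # family object with explicit link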

Fitting Gaussian Models

In Hour 16, we used the lm function to fit Linear Models to our data. This is, perhaps, the simplest case of the GLM framework, where

Image The probability distribution is Gaussian.

Image The link function is the identity function (because the linear predictor describes the Dependent Variable directly, without transformation).

Thus, we can re-create a model from the previous chapter by instead using the glm function, as shown here:

> lmModel <- lm(mpg ~ wt * hp + factor(cyl), data = mtcars)   # Model fit with lm
> lmModel

Call:
lm(formula = mpg ~ wt * hp + factor(cyl), data = mtcars)

Coefficients:
 (Intercept)            wt            hp  factor(cyl)6  factor(cyl)8         wt:hp
    47.33733      -7.30634      -0.10333      -1.25907      -1.45434       0.02395

> glmModel <- glm(mpg ~ wt * hp + factor(cyl), data = mtcars) # Model fit with glm
> glmModel

Call:  glm(formula = mpg ~ wt * hp + factor(cyl), data = mtcars)

Coefficients:
 (Intercept)            wt            hp  factor(cyl)6  factor(cyl)8         wt:hp
    47.33733      -7.30634      -0.10333      -1.25907      -1.45434       0.02395

Degrees of Freedom: 31 Total (i.e. Null);  26 Residual
Null Deviance:      1126
Residual Deviance: 126.2     AIC: 148.7

We can see that the coefficients of both models match, as do the residuals produced from the models:

> all(signif(resid(lmModel), 10) == signif(resid(glmModel), 10))
[1] TRUE


Note: Default Family

Note that “gaussian” is the default value of the family input, so we do not need to specify it in this example.


The glm Object

As with our earlier lm examples, the glm function returns an object that can be interrogated using a series of standard methods. A number of these standard methods can be seen in Table 17.4.

Image

TABLE 17.4 Common GLM Methods
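As a brief sketch of a few of these methods in use (applied to the glmModel object fitted above):

> deviance(glmModel)       # residual deviance of the fit
> AIC(glmModel)            # Akaike Information Criterion
> head(fitted(glmModel))   # first few fitted values
> head(predict(glmModel))  # predictions on the linear predictor scale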

Detailed Summary

We can see a detailed model summary using the summary function, as shown here:

> summary(glmModel)

Call:
glm(formula = mpg ~ wt * hp + factor(cyl), data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.5309  -1.6451  -0.4154   1.3838   4.4788

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  47.337329   4.679790  10.115 1.67e-10 ***
wt           -7.306337   1.675258  -4.361 0.000181 ***
hp           -0.103331   0.031907  -3.238 0.003274 **
factor(cyl)6 -1.259073   1.489594  -0.845 0.405685
factor(cyl)8 -1.454339   2.063696  -0.705 0.487246
wt:hp         0.023951   0.008966   2.671 0.012865 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 4.852119)

    Null deviance: 1126.05  on 31  degrees of freedom
Residual deviance:  126.16  on 26  degrees of freedom
AIC: 148.71

Number of Fisher Scoring iterations: 2

Diagnostic Plots

We can use the plot function to generate diagnostic plots of our model fit, as seen in Figure 17.1.

> par(mfrow = c(2, 2))
> plot(glmModel)

Image

FIGURE 17.1 Diagnostic plots for GLM

Functions such as coef, resid, and fitted can be used to extract model aspects, as seen in the following example. This includes the creation of a plot of residuals versus fitted values, as seen in Figure 17.2.

> coef(glmModel)    # Model Coefficients
 (Intercept)           wt           hp factor(cyl)6 factor(cyl)8        wt:hp
 47.33732893  -7.30633653  -0.10333117  -1.25907265  -1.45433929   0.02395121
>
> res1 <- resid(glmModel)                        # Extract residuals
> fit1 <- fitted(glmModel)                       # Extract fitted values
> yRange <- c(-1, 1) * max(abs(res1))            # Calculate Y axis Range
> xRange <- range(fit1)                          # Calculate X axis Range
> xRange <- xRange + c(-1, 1) * diff(xRange)/5   # Extend X axis Range
>
> plot(fit1, res1, type = "n",                   # Empty plot with axes specified
+    ylim = yRange, xlim = xRange,
+    xlab = "Fitted Values", ylab = "Residuals",
+    main = "Residuals vs Fitted Values")
> text(fit1, res1, row.names(mtcars), cex=1.2)   # Add text based on car names
> abline(h = 0, lty = 2)                         # Add horizontal reference line at 0

Image

FIGURE 17.2 Plot of residuals versus fitted values for GLM

Logistic Regression

Logistic regression, or Logit Regression, is part of the GLM framework and can be implemented with the glm function. We use logistic regression to model the probability of some event occurring, based on a “dichotomous” Dependent Variable (that is, a variable with two levels specifying whether an event occurred). To achieve this, we model the log odds, so our link function (g) relates the Dependent Variable (Y) to the linear predictor (γ) via the logit function. Thus, g(μ) = log(μ / (1 – μ)). The Variance Function (V) is V(μ) = μ(1 – μ).

Fitting a Logistic Regression

We fit a Logistic Regression using the glm function by specifying the binomial family. Our response variable must contain values 0 and 1 (or FALSE and TRUE). As a simple example, let’s model the am variable from the mtcars data based on wt. Here, we model the log-odds of the car having a manual transmission (am == 1) rather than an automatic transmission (am == 0), given the wt variable. The odds of interest are the ratio of the probability of a manual transmission to that of an automatic one; the log-odds are simply the log of these odds:

> lrModel <- glm(am ~ wt - 1, data = mtcars, family = binomial)
> summary(lrModel)

Call:
glm(formula = am ~ wt - 1, family = binomial, data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9397  -0.8525  -0.7549   1.4023   1.5541

Coefficients:
   Estimate Std. Error z value Pr(>|z|)
wt  -0.2388     0.1166  -2.049   0.0405 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 44.361  on 32  degrees of freedom
Residual deviance: 39.717  on 31  degrees of freedom
AIC: 41.717

Number of Fisher Scoring iterations: 4


Note: Removing the Intercept

We’ve removed the intercept in this example to better understand the resulting model coefficients.



Caution: Modeling Factor Levels

If the Dependent Variable specified is a two-level factor variable, R will model the probability of the second level occurring (so the first level is set as 0, and the second level as 1). If our Dependent Variable is a factor with levels “0” and “1,” this works as expected; however, care should be taken if you are using an unordered factor where the levels are defined alphabetically. For example, in the following, we would be modeling the probability of Y being “Low” instead of “High” because of the default alphabetic ordering of the factor levels:

> lrDf <- data.frame(Y = factor(sample(c("Low", "High"), 10, replace = TRUE)),
+                    X = rpois(10, 3))
> lrObj <- glm(Y ~ X, data = lrDf, family = binomial)     # Logistic Model
> levels(lrDf$Y)                                          # Ordering of levels
[1] "High" "Low"


Predictions from a Logistic Regression

When we use the predict function, we are (by default) predicting on the scale of the linear predictor (that is, we’re not directly predicting the responses). As such, the predict function for our logistic example will return the log-odds of a car having a manual transmission. If we wish to see the predictions on the scale of the response, we set the type input to "response", which instead returns the probabilities.

> newDf <- data.frame(wt = 1:5)
> round(predict(lrModel, newDf), 4) # Log Odds
      1       2       3       4       5
-0.2388 -0.4776 -0.7164 -0.9552 -1.1940
> round(predict(lrModel, newDf, type = "response"), 4)  # Probability
     1      2      3      4      5
0.4406 0.3828 0.3282 0.2778 0.2325
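The two scales are related by the inverse-logit function, available in base R as plogis. As a quick check (a sketch), applying plogis to the log-odds predictions reproduces the probabilities above:

> round(plogis(predict(lrModel, newDf)), 4)   # same values as type = "response"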

Coefficients from a Logistic Regression

As with predictions, the coefficients from a Logistic Regression are reported on the scale of the linear predictor. If we want to interpret the estimated effects as relative odds ratios, we simply exponentiate our coefficients as follows:

> round(coef(lrModel), 3)        # Log-Odds
    wt
-0.239
> round(exp(coef(lrModel)), 3)   # Odds
   wt
0.788

So, for every single-unit increase in Weight, the odds of the car being manual (am = 1) are expected to decrease by about 21% (for example, Weight = 1, Odds = 0.79; Weight = 2, Odds = 0.79^2 = 0.62).
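We can reproduce this calculation directly from the model (a quick sketch; recall that this model has no intercept):

> round(exp(coef(lrModel) * 1:2), 3)   # odds at Weight = 1 and Weight = 2 (~0.79, ~0.62)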


Tip: Confidence Intervals for Coefficients

The confint function will provide confidence intervals for coefficients in a glm (and lm) model. For example, we could provide estimates and confidence intervals for model coefficients on the log-odds scale using the following:

> cbind(coef(lrModel), confint(lrModel))
Waiting for profiling to be done...
             [,1]        [,2]
2.5 %  -0.2388045 -0.48456168
97.5 % -0.2388045 -0.02093423


Poisson Regression

We can use Poisson regression, another example from the GLM framework, to model count data. This way, we can model the number of independent “events” that occur within a fixed “interval.” For a Poisson regression, the link function (g) relates the Dependent Variable (Y) to the linear predictor (γ) via the log function, so g(μ) = log μ. The Variance Function (V) is V(μ) = μ.

Let’s fit a simple Poisson regression using glm. For this example, we’ll use the InsectSprays data frame, which contains counts of insects following treatment with a variety of insecticides (see the ?InsectSprays help file for more information). Before we fit the model, let’s have a look at the data (seen here and in Figure 17.3):

> head(InsectSprays)
  count spray
1    10     A
2     7     A
3    20     A
4    14     A
5    14     A
6    12     A
> plot(factor(InsectSprays$spray), InsectSprays$count,
+      xlab = "Insecticide", ylab = "Insect Count",
+      main = "Insect Count by Insecticide")

Image

FIGURE 17.3 Plot of InsectSprays data

Let’s fit a simple Poisson model of count versus spray with no intercept term. We achieve this with glm by specifying poisson as the family input:

> prModel <- glm(count ~ factor(spray) - 1, data = InsectSprays, family = poisson)
> summary(prModel)

Call:
glm(formula = count ~ factor(spray) - 1, family = poisson, data = InsectSprays)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3852  -0.8876  -0.1482   0.6063   2.6922

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
factor(spray)A  2.67415    0.07581  35.274  < 2e-16 ***
factor(spray)B  2.73003    0.07372  37.032  < 2e-16 ***
factor(spray)C  0.73397    0.20000   3.670 0.000243 ***
factor(spray)D  1.59263    0.13019  12.233  < 2e-16 ***
factor(spray)E  1.25276    0.15430   8.119 4.71e-16 ***
factor(spray)F  2.81341    0.07071  39.788  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2264.808  on 72  degrees of freedom
Residual deviance:   98.329  on 66  degrees of freedom
AIC: 376.59

Number of Fisher Scoring iterations: 5


Note: Including the Intercept

Note that, by suppressing the intercept, all levels of the factor variable are estimated (as opposed to the standard use of contrasts, where the first level would be set as the baseline). If, instead, we included an intercept term, then spray “A” would be set as the baseline and other coefficients would be interpreted in relation to this level:

> summary(glm(count ~ factor(spray), data = InsectSprays, family = poisson))$coef
                  Estimate Std. Error    z value      Pr(>|z|)
(Intercept)     2.67414865  0.0758098 35.2744434 1.448048e-272
factor(spray)B  0.05588046  0.1057445  0.5284477  5.971887e-01
factor(spray)C -1.94017947  0.2138857 -9.0711059  1.178151e-19
factor(spray)D -1.08151786  0.1506528 -7.1788745  7.028761e-13
factor(spray)E -1.42138568  0.1719205 -8.2676928  1.365763e-16
factor(spray)F  0.13926207  0.1036683  1.3433422  1.791612e-01


We can exponentiate the coefficients to see them on the scale of the response (that is, counts). Let’s see the exponentiated coefficients next to the confidence intervals:

> lc <- cbind(Est = coef(prModel), confint(prModel))
Waiting for profiling to be done...
> round(exp(lc), 2)
                 Est 2.5 % 97.5 %
factor(spray)A 14.50 12.45  16.76
factor(spray)B 15.33 13.22  17.66
factor(spray)C  2.08  1.37   3.01
factor(spray)D  4.92  3.77   6.28
factor(spray)E  3.50  2.55   4.67
factor(spray)F 16.67 14.46  19.08
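Because this model estimates one parameter per spray with a log link, the exponentiated coefficients are simply the mean counts for each group. We can confirm this directly (a quick sketch):

> round(tapply(InsectSprays$count, InsectSprays$spray, mean), 2)   # matches "Est" column above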

GLM Extensions

So far we have looked at some Generalized Linear Model examples. Specifically, we have seen an example of a General Linear Model, a Logistic Regression, and a Poisson Regression. There are many related approaches and extensions that may be useful, including the following:

Image We have bypassed the fitting of Analysis of Variance models, which can be achieved with the aov function (see the ?aov help file for details).

Image There are many other distributions supported by glm, which can be seen in the ?family help file.

Image There are many extensions to the glm function itself, such as the glm.nb function from the MASS package, which includes the estimation of the additional parameter “theta” (see the sketch following this list).

Image Extensions such as Generalized Estimating Equations (GEEs) allow for correlations between observations and are implemented in packages such as gee and geepack.

Image Mixed models allow for random effects in the linear predictor and can be fit using packages such as lme4, nlme, and glmm.

Image Generalized Additive Models (GAMs) allow the linear predictor to use smoothing functions applied to the Independent Variables. They are implemented in the gam package.
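As an example of these extensions, the glm.nb function from the MASS package follows the familiar glm pattern. A minimal sketch, reusing the InsectSprays data from earlier (the negative binomial model relaxes the Poisson assumption that the variance equals the mean):

> library(MASS)
> nbModel <- glm.nb(count ~ factor(spray), data = InsectSprays)  # negative binomial fit
> summary(nbModel)    # includes the estimated dispersion parameter "theta"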

Nonlinear Models

The Generalized Linear Modeling approach allows us to fit a range of models where a Dependent Variable is related to a set of Independent Variables in a linear manner. However, R also provides a range of functionality for fitting models where the Dependent Variable is a Nonlinear function of the parameters and one or more Independent Variables.

Nonlinear Regression

The simplest form of Nonlinear model is a Nonlinear regression, which we can fit in R via least-squares estimation using the nls function. For Nonlinear regression,

Y = f0, ..., M, X1,...,N) + ε

Here, our Dependent Variable (Y) is modeled against N Independent Variables (X1 to XN) and M parameters (θ0 to θM) to be estimated by the model-fitting process. We assume the errors (ε) are independent and identically distributed such that E(ε) = 0 and var(ε) = σ². We also assume that the errors (ε) are Normally distributed with mean 0 and variance σ² for the purposes of the tests.

Fitting a Nonlinear Regression

We can fit a Nonlinear model using least squares estimation with the nls function. The primary arguments accepted by nls can be seen in Table 17.5.

Image

TABLE 17.5 Key Inputs to the nls Function

When we fit a Nonlinear model, it is common to define the relationship in terms of a function that accepts independent variables and parameters and returns a response.

As a very simple example, just to illustrate the use of nls, let’s fit our earlier linear model (of mpg vs wt). First, we’ll define a function we can use in our model fit and illustrate the use of the function with two possible sets of input parameters (seen in Figure 17.4):

> linFun <- function(wt, a, b) a + b * wt
> plot(mtcars$wt, mtcars$mpg,
+      main = "Miles per Gallon versus Weight",
+      xlab = "Weight", ylab = "Miles per Gallon")
> lines(1:6, linFun(1:6, a = 40, b = -6), col = "red")
> lines(1:6, linFun(1:6, a = 35, b = -4.5), col = "blue")
> legend("topright", paste("Model", 1:2), fill = c("red", "blue"))

Image

FIGURE 17.4 Plot of miles per gallon versus weight with two candidate models

If we want to fit this as a Nonlinear(!) model, we use the nls function as follows:

> nlsMpg <- nls(mpg ~ linFun(wt, a, b), data = mtcars)
Warning message:
In nls(mpg ~ linFun(wt, a, b), data = mtcars) :
  No starting values specified for some parameters.
Initializing 'a', 'b' to '1.'.
Consider specifying 'start' or using a selfStart model

Here, R warns us that we have not provided starting values for the parameters (a and b). We can provide these as a named list or named vector of inputs. Based on the previous graph, let’s choose a = 40 and b = -5 as suitable starting values for our model:

> nlsMpg <- nls(mpg ~ linFun(wt, a, b), data = mtcars,
+               start = c(a = 40, b = -5))
> nlsMpg
Nonlinear regression model
  model: mpg ~ linFun(wt, a, b)
   data: mtcars
     a      b
37.285 -5.344
 residual sum-of-squares: 278.3

Number of iterations to convergence: 1
Achieved convergence tolerance: 1.765e-09

As you can see, we have successfully fit our model and retrieved the parameters we would have achieved using a linear model (with the lm function):

> coef(nlsMpg)                          # Coefficients from the nls fit
        a         b
37.285126 -5.344472
> coef(lm(mpg ~ wt, data = mtcars))     # Coefficients from the lm fit
(Intercept)          wt
  37.285126   -5.344472

Let’s switch to using a more appropriate example.

Nonlinear Regression of the Puromycin Data

The Puromycin data frame in R contains data on the reaction velocity versus substrate concentration in an enzymatic reaction with Puromycin (an antibiotic). The data contains measurements involving untreated and treated cells. Let’s look at the data before we perform any model fitting, including a plot of the data in Figure 17.5:

> head(Puromycin)      # A look at the data
  conc rate   state
1 0.02   76 treated
2 0.02   47 treated
3 0.06   97 treated
4 0.06  107 treated
5 0.11  123 treated
6 0.11  139 treated
> plot(Puromycin$conc, Puromycin$rate, pch = 21, cex = 1.5,   # Plot the data
+   xlab = "Substrate Concentrations (ppm)",
+   ylab = "Instantaneous reaction rates (counts/min/min)",
+   main = "Instantaneous reaction rates vs Substrate Concentrations",
+   bg = ifelse(Puromycin$state == "treated", "red", "blue"))
> legend("bottomright", c("Treated", "Untreated"), fill = c("red", "blue"))

Image

FIGURE 17.5 Plot of reaction rates versus concentration from the Puromycin data

Let’s attempt to fit a Michaelis-Menten model, one of the best-known models of enzyme kinetics, to this data. Given the preceding plot, we’ll fit separate models for “Treated” and “Untreated.” First, we’ll define the function and look at some possible starting values, overlaid on the previous plot. The output can be seen as Figure 17.6.

> micmen <- function(conc, Vm, K) Vm * conc / (K + conc)      # Define function
> xConcs <- seq(0, 1.1, length = 25)                          # Set of Concentrations
>
> lines(xConcs, micmen(xConcs, 200, 0.1), col = "pink")       # Treated: Vm = 200, K = 0.1
> lines(xConcs, micmen(xConcs, 210, 0.03), col = "pink")      # Treated: Vm = 210, K = 0.03
> lines(xConcs, micmen(xConcs, 210, 0.05), col = "red")       # Treated: Vm = 210, K = 0.05
>
> lines(xConcs, micmen(xConcs, 150, 0.05), col = "lightblue") # Untreated: Vm = 150, K = 0.05
> lines(xConcs, micmen(xConcs, 170, 0.1), col = "lightblue")  # Untreated: Vm = 170, K = 0.1
> lines(xConcs, micmen(xConcs, 165, 0.05), col = "blue")      # Untreated: Vm = 165, K = 0.05

Image

FIGURE 17.6 Plot of reaction rates versus concentration with candidate starting parameters

Based on this, let’s fit Nonlinear models to both the “Treated” and “Untreated” data:

> mmTreat <- nls(rate ~ micmen(conc, Vm, K), data = Puromycin,
+   start = c(Vm = 210, K = 0.05), subset = state == "treated")
> mmUntreat <- nls(rate ~ micmen(conc, Vm, K), data = Puromycin,
+   start = c(Vm = 165, K = 0.05), subset = state == "untreated")
> round(coef(mmTreat), 3)    # Coefficients for Treated data
     Vm       K
212.684   0.064
> round(coef(mmUntreat), 3)  # Coefficients for Untreated data
     Vm       K
160.280   0.048


Tip: Self-Starting Functions

In these examples, we need to specify starting values for our model fit. However, there are a number of “self-starting” functions in R that deduce starting values as part of the modeling process. These functions start with “SS” and can be listed using the following syntax:

> apropos("^SS")
 [1] "SSasymp"     "SSasympOff"  "SSasympOrig" "SSbiexp"
 [5] "SSD"         "SSfol"       "SSfpl"       "SSgompertz"
 [9] "SSlogis"     "SSmicmen"    "SSweibull"

Notice the SSmicmen function, which is a “self-starting” function that implements the Michaelis-Menten model. As such, we could simplify the preceding call as follows:

> nls(rate ~ SSmicmen(conc, Vm, K), data = Puromycin, subset = state == "treated")
Nonlinear regression model
  model: rate ~ SSmicmen(conc, Vm, K)
   data: Puromycin
       Vm         K
212.68371   0.06412
 residual sum-of-squares: 1195

Number of iterations to convergence: 0
Achieved convergence tolerance: 1.93e-06


Making Predictions

We can use the predict function to make predictions from a Nonlinear model and then use the lines function to add the model lines to our plot. The result of this can be seen in Figure 17.7.

> plot(Puromycin$conc, Puromycin$rate, pch = 21, cex = 1.5,
+      xlab = "Substrate Concentrations (ppm)",
+      ylab = "Instantaneous reaction rates (counts/min/min)",
+      main = "Instantaneous reaction rates vs Substrate Concentrations",
+      bg = ifelse(Puromycin$state == "treated", "red", "blue"))
>
> predDf <- data.frame(conc = seq(0, 1.1, length = 25))        # Set of Concentrations
> lines(predDf$conc, predict(mmTreat, predDf), col = "red")    # Model for Treated data
> lines(predDf$conc, predict(mmUntreat, predDf), col = "blue") # Model for Untreated data
> legend("bottomright", c("Treated", "Untreated"), fill = c("red", "blue"))

Image

FIGURE 17.7 Plot of reaction rates versus concentration with Nonlinear model fits

Extended Model

We could extend our example to fit a single model that includes both the treated and untreated data. At the same time, we could add a new parameter to explain the difference in Vm between the two states. The outcome can be seen in Figure 17.8.

> # Add new parameter to our function (vTrt)
> micmen <- function(conc, state, Vm, K, vTrt) {
+   newVm <- Vm + vTrt * (state == "treated")
+   newVm * conc / (K + conc)  # Define function
+ }
> mmPuro <- nls(rate ~ micmen(conc, state, Vm, K, vTrt), data = Puromycin,
+     start = c(Vm = 160, K = 0.05, vTrt = 50))
> summary(mmPuro)

Formula: rate ~ micmen(conc, state, Vm, K, vTrt)

Parameters:
      Estimate Std. Error t value Pr(>|t|)
Vm   166.60396    5.80742  28.688  < 2e-16 ***
K      0.05797    0.00591   9.809 4.37e-09 ***
vTrt  42.02591    6.27214   6.700 1.61e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.59 on 20 degrees of freedom

Number of iterations to convergence: 5
Achieved convergence tolerance: 9.239e-06

>
> plot(Puromycin$conc, Puromycin$rate, pch = 21, cex = 1.5,
+      xlab = "Substrate Concentrations (ppm)",
+      ylab = "Instantaneous reaction rates (counts/min/min)",
+      main = "Instantaneous reaction rates vs Substrate Concentrations",
+      bg = ifelse(Puromycin$state == "treated", "red", "blue"))
> xConc <- seq(0, 1.1, length = 25)                        # Set of Concentrations
> trtPred <- data.frame(conc = xConc, state = "treated")
> untrtPred <- data.frame(conc = xConc, state = "untreated")
>
> lines(xConc, predict(mmPuro, trtPred), col = "red")      # Model for Treated data
> lines(xConc, predict(mmPuro, untrtPred), col = "blue")   # Model for Untreated data
> legend("bottomright", c("Treated", "Untreated"), fill = c("red", "blue"))

Image

FIGURE 17.8 Plot of reaction rates versus concentration with Nonlinear model fit

If we extract the coefficients from our model, we can see the highly significant vTrt variable:

> round(cbind(Est = coef(mmPuro), confint(mmPuro)), 3)
Waiting for profiling to be done...
         Est    2.5%   97.5%
Vm   166.604 154.617 179.252
K      0.058   0.046   0.072
vTrt  42.026  28.957  55.199

Nonlinear Model Extensions

The previous section contained a very simple introduction to the Nonlinear model-fitting features of R. There are a number of extensions, including the following:

Image The gnls function, which additionally allows for correlated errors. For more information, see the ?gnls help file.

Image The gnm package, which fits Generalized Nonlinear models (analogous to the glm function for Nonlinear fits).

Image The nlme package, which provides functionality for fitting Nonlinear Mixed Effects models.

Survival Analysis

Earlier in this hour, you saw how logistic regression can be used to model the probability of an event occurring. Survival analysis, instead, allows us to model the time until an event occurs. For example, Survival analysis is used heavily in the field of medicine to understand the time until an event occurs, such as failure of an organ following transplant or time until death for someone with a terminal disease. We are interested in how a set of covariates may influence the time to event.

The ovarian Data Frame

Throughout this section we’ll use a data frame called ovarian, which contains data from a randomized trial comparing two treatments for ovarian cancer. This data frame can be found in the survival package:

> library(survival)
> head(ovarian)
  futime fustat     age resid.ds rx ecog.ps
1     59      1 72.3315        2  1       1
2    115      1 74.4932        2  1       1
3    156      1 66.4658        2  1       2
4    421      0 53.3644        2  2       1
5    431      1 50.3397        2  1       1
6    448      0 56.4301        1  1       2

The columns from the ovarian data frame are described in Table 17.6.

Image

TABLE 17.6 Columns of the ovarian Data Frame

Censoring

When we are analyzing the time until an event occurs, a particular challenge is that the data may be “censored”: for some observations the event has not yet occurred, so we record the last time at which we know the event had not occurred and flag these observations. Consider a study of how long an organ survives following a transplant. There are three possible outcomes:

Image The organ is still functioning, so the failure of this organ has not yet occurred.

Image The patient died as a result of something other than the organ failing.

Image The organ failed, so the “event” has occurred.

In the first two situations, the time is “censored”: we know the event had not occurred up to that time, but we cannot observe the time of the “event” itself.

In the case of our ovarian data frame, the time and “censor” flag are recorded in the futime and fustat variables.

> aggregate(ovarian$futime, list(State = ovarian$fustat),
+   function(x) c(Min = min(x), Median = median(x), Max = max(x)))
  State  x.Min x.Median  x.Max
1     0  377.0    786.5 1227.0
2     1   59.0    359.0  638.0

Here, the censored times are those with State 0. We can create an object that combines these variables into a single object with the Surv function, as follows:

> ovSurv <- Surv(ovarian$futime, event = ovarian$fustat)
> ovSurv
[1]   59   115   156   421+  431   448+  464   475   477+  563   638   744+  769+  770+
[15]  803+  855+ 1040+ 1106+ 1129+ 1206+ 1227+  268   329   353   365   377+

Note the + suffix for censored values (that is, observations where the event has not yet occurred).
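The Surv function also supports other censoring schemes via its type argument (see the Q&A at the end of this hour). A hedged sketch with made-up times:

> Surv(c(100, 200), event = c(1, 0), type = "left")   # second observation left-censored
> Surv(c(50, 60), c(70, 80), type = "interval2")      # interval-censored observations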

Estimating the Survival Function

Much of Survival analysis is concerned with modeling and estimating the “Survival Function” (S), which provides the probability that an individual will survive beyond a certain time (t). Formally,

S(t) = P(T > t) for t ≥ 0

Consider the graphical representation of an example of a Survival Function shown in Figure 17.9.

Image

FIGURE 17.9 Example of a Survival Function

Note that the probability of surviving past time t = 40 in Figure 17.9 is 39%. There are other characteristics of a Survival Function as t ranges from 0 to ∞, such as the following:

Image The Survival Function is decreasing (or at least is non-increasing).

Image Typically, the probability of surviving past time 0 is 1, so S(0) = 1.

Image As t tends to ∞, the probability of survival tends to 0, so S(∞) = 0.

We can estimate the Survival Function using either non-parametric or parametric approaches.

Kaplan-Meier Estimate

The “Kaplan-Meier” estimator (or “product limit” estimator) is the most popular non-parametric method used to estimate the Survival Function. We can produce a Kaplan-Meier estimate in R using the survfit function. The first argument to the survfit function should be a formula with a survival object (such as the one we produced earlier) on its left-hand side. To estimate a single Survival Function, we specify “1” on the right-hand side, as follows:

> kmOv  <- survfit(ovSurv ~ 1)
> kmOv
Call: survfit(formula = ovSurv ~ 1)


records   n.max n.start  events  median 0.95LCL 0.95UCL
     26      26      26      12     638     464      NA

The survfit function returns an object of class “survfit,” which has a few methods available. The summary method returns the estimated Survival Function along with confidence intervals:

> summary(kmOv)
Call: survfit(formula = ovSurv ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   59     26       1    0.962  0.0377        0.890        1.000
  115     25       1    0.923  0.0523        0.826        1.000
  156     24       1    0.885  0.0627        0.770        1.000
  268     23       1    0.846  0.0708        0.718        0.997
  329     22       1    0.808  0.0773        0.670        0.974
  353     21       1    0.769  0.0826        0.623        0.949
  365     20       1    0.731  0.0870        0.579        0.923
  431     17       1    0.688  0.0919        0.529        0.894
  464     15       1    0.642  0.0965        0.478        0.862
  475     14       1    0.596  0.0999        0.429        0.828
  563     12       1    0.546  0.1032        0.377        0.791
  638     11       1    0.497  0.1051        0.328        0.752

The plot method allows us to produce a graph of the Kaplan-Meier estimate, seen in Figure 17.10.

> plot(kmOv, col = "blue",
+   main = "Kaplan-Meier Plot of Ovarian Data",
+   xlab = "Time (t)", ylab = "Survival Function S(t)")

Image

FIGURE 17.10 Kaplan-Meier plot of ovarian data

Parametric Methods

We can estimate the Survival Function using parametric methods with probability distributions such as the Weibull, Exponential, and Log-Normal. In this case, we use maximum likelihood estimation to estimate the (unknown) parameters of the selected distribution. Let’s use the Weibull distribution to model the Survival Function, such that S(t) = exp(–α * t^γ). We can fit a parametric survival model using the survreg function, which has a dist input for specifying the distribution:

> wbOv <- survreg(ovSurv ~ 1, dist = "weibull")
> summary(wbOv)

Call:
survreg(formula = ovSurv ~ 1, dist = "weibull")
             Value Std. Error      z         p
(Intercept)  7.111      0.293 24.292 2.36e-130
Log(scale)  -0.103      0.254 -0.405  6.86e-01

Scale= 0.902

Weibull distribution
Loglik(model)= -98   Loglik(intercept only)= -98
Number of Newton-Raphson Iterations: 5
n= 26

If we want to plot the line, there are two possible options:

Image Manually transform the parameters into a Weibull curve

Image Use the predict function

Let’s use the predict function, which allows us to produce a number of predictions from a “survreg” object. We can specify “quantile” predictions using type = "quantile", using the p argument to specify the quantiles for which to provide predictions. Because we have no covariates, we need to provide a “dummy” dataset for the newdata argument, as follows:

> pct <- seq(.0,.99,by=.01)                      # Quantiles at which to predict
> dummyDf <- data.frame(1)                       # Dummy dataset
> predOv <- predict(wbOv, newdata = dummyDf,     # Make Quantile predictions
+   type = "quantile", p = pct)
> head(predOv)
[1]  0.00000 19.28838 36.22041 52.46544 68.33554 83.97347

This returns a set of predicted time points for the specified quantiles. We can overlay these predictions onto our Kaplan-Meier plot, the output of which can be seen in Figure 17.11.

> plot(kmOv, col = "blue",
+      main = "Kaplan-Meier Plot of Ovarian Data",
+      xlab = "Time (t)", ylab = "Survival Function S(t)")
> lines(predOv, 1 - pct, col = "red")
> legend("bottomleft", c("Kaplan-Meier", "Weibull"), fill = c("blue", "red"))

Image

FIGURE 17.11 Survival plot of ovarian data with Kaplan-Meier and Weibull

Adding Covariates

We can easily add Independent Variables to the parametric model fit by specifying them on the right-hand side of our formula. Let’s model survival against age using our ovarian data:

> wbOv2 <- survreg(ovSurv ~ age, dist = "weibull", data = ovarian)
> summary(wbOv2)

Call:
survreg(formula = ovSurv ~ age, data = ovarian, dist = "weibull")
              Value Std. Error     z        p
(Intercept) 12.3970     1.4821  8.36 6.05e-17
age         -0.0962     0.0237 -4.06 4.88e-05
Log(scale)  -0.4919     0.2304 -2.14 3.27e-02

Scale= 0.611

Weibull distribution
Loglik(model)= -90   Loglik(intercept only)= -98
    Chisq= 15.91 on 1 degrees of freedom, p= 6.7e-05
Number of Newton-Raphson Iterations: 5
n= 26

Let’s again use the predict function to create estimated Survival curves for a set of different ages. The output can be seen in Figure 17.12.

> ageDf <- data.frame(age = 10*4:6)             # Set of ages for predictions
> theCols <- c("red", "blue", "green")          # Colors to use
> predOv <- predict(wbOv2, newdata = ageDf,     # Make Quantile predictions
+   type = "quantile", p = pct)
> matplot(t(predOv), 1-pct, xlim = c(0, 1200),  # Matrix plot of predicted survival
+   type = "l", lty = 1, col = theCols,
+   main = "Parametric Estimation of Survival Curve by Age",
+   xlab = "Time (t)", ylab = "Survival Function S(t)")
> legend("bottomleft", paste("Age =", ageDf$age), fill = theCols)

Image

FIGURE 17.12 Estimated survival by age

Proportional Hazards

Proportional Hazards regression (or “Cox” regression) provides an excellent framework for modeling time-to-event data when we want to test many independent variables. In particular, Proportional Hazards regression provides a framework for understanding how differing levels of covariates increase the “risk” for a subject.

Proportional Hazards regression focuses on models of the “Hazard” Function (h), which can be considered as the probability of an event during an infinitesimally small period of time, and thus represents the “risk” of an event occurring at a specific point in time given that it hasn’t happened up to that point.

When we introduce Independent Variables into a Proportional Hazards regression, we can consider the Survival Model to have two components:

Image An underlying baseline Hazard Function describing the “risk” over time at baseline levels of covariates

Image The effect parameters describing how the Hazard varies due to other (non-baseline) levels of covariates

For a Proportional Hazards model to be suitable, the “Proportional Hazards condition” must hold, which states that covariates are related to the hazard in a multiplicative sense. We’ll check this assumption later.

To fit a Proportional Hazards model in R, we use the coxph function, and again we define the model to fit as a formula with a survival object on the left-hand side:

> coxModel <- coxph(ovSurv ~ age + factor(rx), data = ovarian)
> summary(coxModel)
Call:
coxph(formula = ovSurv ~ age + factor(rx), data = ovarian)

  n= 26, number of events= 12

                coef exp(coef) se(coef)      z Pr(>|z|)
age          0.14733   1.15873  0.04615  3.193  0.00141 **
factor(rx)2 -0.80397   0.44755  0.63205 -1.272  0.20337
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

            exp(coef) exp(-coef) lower .95 upper .95
age            1.1587      0.863    1.0585     1.268
factor(rx)2    0.4475      2.234    0.1297     1.545

Concordance= 0.798  (se = 0.091 )
Rsquare= 0.457   (max possible= 0.932 )
Likelihood ratio test= 15.89  on 2 df,   p=0.0003551
Wald test            = 13.47  on 2 df,   p=0.00119
Score (logrank) test = 18.56  on 2 df,   p=9.341e-05

The age variable is significant in our model, but not the rx variable. Because the model is based on the hazard, the coefficients of the model can be interpreted in relation to the baseline level for each covariate. In fact, the coefficients returned are the log-hazards relative to the baseline, so the exponentiated coefficients (also reported) are relative risks.

Image For factor variables, the exp(coef) values are the risks relative to the baseline level. So, in our example, the risk in treatment group 2 is approximately 45% of that of group 1.

Image For continuous variables, the exp(coef) values are the risks relative to a unit change in the covariate. So, in our example, the increased risk for a subject 5 years older than another is exp(5 * 0.147) = 2.085.
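Both calculations can be reproduced directly from the fitted model object (a quick sketch):

> round(exp(coef(coxModel)), 3)             # hazard ratios relative to baseline
> round(exp(5 * coef(coxModel)["age"]), 3)  # risk ratio for a 5-year age difference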


Tip: Testing the Proportional Hazards Assumption

We can use the cox.zph function to test the assumption of Proportional Hazards. We look for small p-values as an indication that the proportionality assumption is not met.

> cox.zph(coxModel)
                rho chisq     p
age         -0.0918 0.113 0.736
factor(rx)2  0.2072 0.518 0.472
GLOBAL           NA 0.729 0.695

So, it looks like the assumption holds for our model.


Plotting a Proportional Hazards Model

The plot and survfit functions can be used together to produce survival plots on the basis of a Proportional Hazards model. First of all, we call survfit with our model object. Note that we are including only the significant age variable in this model:

> coxModel <- coxph(ovSurv ~ age, data = ovarian)
> coxSurv <- survfit(coxModel)
> summary(coxSurv)
Call: survfit(formula = coxModel)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   59     26       1    0.988  0.0142        0.961        1.000
  115     25       1    0.974  0.0244        0.927        1.000
  156     24       1    0.955  0.0364        0.886        1.000
  268     23       1    0.933  0.0482        0.844        1.000
  329     22       1    0.897  0.0621        0.783        1.000
  353     21       1    0.862  0.0724        0.732        1.000
  365     20       1    0.824  0.0819        0.678        1.000
  431     17       1    0.775  0.0934        0.612        0.982
  464     15       1    0.724  0.1032        0.548        0.958
  475     14       1    0.673  0.1112        0.487        0.931
  563     12       1    0.596  0.1226        0.398        0.892
  638     11       1    0.520  0.1287        0.321        0.845

Now we can use the plot function to produce our survival curves, as seen in Figure 17.13.

> plot(coxSurv, col = "blue", xlab = "Time (t)",
+      ylab = "Survival Function S(t)",
+      main = "Proportional Hazards Model")

Image

FIGURE 17.13 Estimated survival using Proportional Hazards model

We can provide a new data frame to the survfit function if we want to produce Survival curves for different sets of covariates. For example, let’s produce different Survival curves for the different age values as we did for the parametric model fits. We’ll overlay the original parametric model fits for these age values using dashed lines for comparison. The output can be seen in Figure 17.14.

> coxSurv <- survfit(coxModel, newdata = ageDf)          # Survival curves for age values
> plot(coxSurv, col = theCols, xlab = "Time (t)",        # Plot the survival curves
+      ylab = "Survival Function S(t)",
+      main = "Proportional Hazards Model")
> matlines(t(predOv), 1-pct,                             # Add parametric curves
+   type = "l", lty = 2, col = theCols)
> legend("bottomleft", paste("Age =", ageDf$age), fill = theCols)

Image

FIGURE 17.14 Estimated survival for different ages using Proportional Hazards model

Survival Model Extensions

R provides a rich set of capabilities for the analysis of time-to-event data. The best source of information is the Survival Analysis Task View (https://cran.r-project.org/web/views/Survival.html), which lists over 200 packages related to the study of survival data.

Time Series Analysis

R is used heavily in areas such as quantitative finance and econometrics; unsurprisingly, it provides a wide range of time series analysis functionality. Although a number of packages provide time series analysis capabilities, we will focus here on the functions in the base stats package, which is loaded automatically when we start R. In this section, we will see

Image How to create and manage time series objects

Image How to perform simple decomposition and smoothing

Image How to fit an ARIMA model

Time Series Objects

We can create a time series object in R with the ts function. Once created, these objects can be used in a range of analytic and graphical routines. The ts function accepts a vector or matrix containing the data.

As an example, the website boxofficemojo.com reports daily gross income for film releases. One of the highest grossing films of 2015 was Avengers: Age of Ultron, which grossed over $425m in its first month (May 2015). The daily takings during that first month are as follows:

> ultron <- c(84.4, 56.5, 50.3, 13.2, 13.1, 9.4, 8.6, 21.2, 33.8, 22.7,
+   5.4, 6, 4.3, 4, 10, 17.2, 11.6, 3.4, 3, 2.3, 2.4, 5.4, 8.3, 8, 6.5,
+   1.9, 1.4, 1.4, 2.9, 4.9, 3.6)

If we wanted to create a time series of this data, we could use the ts function. We often specify time series elements such as the “start” date/time of the series, but for this example we’ll simply specify the data and set the frequency to 7 (that is, daily data with a weekly cycle).

> tsUltron <- ts(ultron, frequency = 7)
> tsUltron
Time Series:
Start = c(1, 1)
End = c(5, 3)
Frequency = 7
 [1] 84.4 56.5 50.3 13.2 13.1  9.4  8.6 21.2 33.8 22.7  5.4  6.0  4.3
[14]  4.0 10.0 17.2 11.6  3.4  3.0  2.3  2.4  5.4  8.3  8.0  6.5  1.9
[27]  1.4  1.4  2.9  4.9  3.6

Once we have a time series object created, we can use the plot function to create a simple time series plot, as shown in Figure 17.15.

> plot(tsUltron, main = "Daily Box Office Daily for Avengers: Age of Ultron",
+      xlab = "Week during May 2015", ylab = "Daily Gross ($m)")
> points(tsUltron, pch = 21, bg = "red")

Image

FIGURE 17.15 Time series plot of daily grossing of Avengers: Age of Ultron

If, as in this example, the data decays in a multiplicative fashion, we may want to apply a transformation. For example, let’s apply a log transformation to our series, the result of which can be seen in Figure 17.16.

> plot(log(tsUltron), main = "Daily Box Office Daily for Avengers: Age of Ultron",
+      xlab = "Week during May 2015", ylab = "Log Daily Gross ($m)")
> points(log(tsUltron), pch = 21, bg = "red")

Image

FIGURE 17.16 Time series plot of (logged) daily grossing of Avengers: Age of Ultron


Tip: Selecting a Subset of the Time Series

If we want to subset a time series, we can use the window function. To specify the subset, we need to provide a start and/or end relative to the frequency. So, to select only data for the first week, we request the series up to the seventh element of the first week, as follows:

> window(tsUltron, end = c(1, 7))
Time Series:
Start = c(1, 1)
End = c(1, 7)
Frequency = 7
[1] 84.4 56.5 50.3 13.2 13.1  9.4  8.6


Decomposing Time Series

A common task in the field of time series analysis is decomposition, where we attempt to separate a time series into components. This could include

Image A seasonal element (for example, weekly, monthly, or annually)

Image An overall trend

Image Remaining data not fully explained by the first two elements

We can perform a simple seasonal decomposition in R using the stl function, which uses loess smoothers to decompose a time series into seasonal, trend, and irregular components. Let’s use the stl function to perform a simple decomposition of our Age of Ultron data, which we can graph directly using the plot function. The resulting graphic can be seen in Figure 17.17.

> stlUltron <- stl(log(tsUltron), s.window = "periodic")
> plot(stlUltron, main = "Decomposition of the Ultron Time Series")

Image

FIGURE 17.17 Decomposition of (logged) daily grossing of Avengers: Age of Ultron

The output from the stl function is an object of class “stl.” It includes a time.series element we can query or plot directly:

> window(stlUltron$time.series, end = c(1, 7))
Time Series:
Start = c(1, 1)
End = c(1, 7)
Frequency = 7
           seasonal    trend    remainder
1.000000  0.4330473 3.598952  0.403568367
1.142857  0.8490648 3.441404 -0.256228394
1.285714  0.7104135 3.283857 -0.076264998
1.428571 -0.2510144 3.131462 -0.300230859
1.571429 -0.4588637 2.979068  0.052408283
1.714286 -0.6741455 2.868556  0.046299129
1.857143 -0.6085021 2.758045  0.002219731

We can also use this to remove components from our time series. For example, we could remove the seasonal element from our time series and then plot the remaining data, as seen in Figure 17.18.

> seUltron <- log(tsUltron) - stlUltron$time.series[,"seasonal"]
> plot(seUltron,
+   main = "Logged Daily Box Office Gross (Weekly seasonality removed)",
+   xlab = "Weeks in May 2015", ylab = "Logged Daily Box Office Gross ($m)")

Image

FIGURE 17.18 Logged daily grossing of Avengers: Age of Ultron with seasonality removed


Note: Outlying Value

The large spike in this time series was May 25, 2015, which was Memorial Day, so figures were higher than expected for a Monday.


Smoothing

We may want to perform some smoothing on our time series to provide short-term forecasts. Exponential smoothing techniques apply exponentially decreasing weights to less recent observations and can therefore be a more appropriate approach than using moving averages. However, simple exponential smoothing can only be used for data without systematic trend or seasonality.

The Holt-Winters method can be applied to time series that contain both trend and seasonality. This approach is implemented in R by the HoltWinters function. The primary inputs to the HoltWinters function are described in Table 17.7.

Image

TABLE 17.7 Key Inputs to the HoltWinters Function
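As an aside, setting both the beta and gamma inputs to FALSE reduces HoltWinters to simple exponential smoothing. A minimal sketch using the LakeHuron series (a base R dataset of annual lake levels, so no seasonal component):

> sesHuron <- HoltWinters(LakeHuron, beta = FALSE, gamma = FALSE)  # simple exponential smoothing
> sesHuron$alpha                                                   # estimated smoothing parameter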

Let’s use the Holt-Winters method with our Age of Ultron data. The results are visualized in Figure 17.19.

> hwUltron <- HoltWinters(log(tsUltron))
> plot(hwUltron)

Image

FIGURE 17.19 Holt-Winters filtering of the logged daily box office takings for Avengers: Age of Ultron

Once we have used the Holt-Winters method, we can make predictions using the predict function, which accepts the argument n.ahead to specify the number of predictions to make. We can also specify the argument prediction.interval to request (95% by default) prediction intervals. Because we have the actual values, we have overlaid these too, as shown in Figure 17.20.

> predUltron <- predict(hwUltron, n.ahead = 7,              # Predict 7 days with H-W method
+   prediction.interval = TRUE)
> plot(hwUltron, predUltron, col = "red",                   # Plot data and predictions
+   col.predicted = "blue", col.intervals = "blue",
+   lty.intervals = 2)
> actuals <- c(1.08, 1.26, .97, .95, 1.84, 2.66, 1.84)      # Actual values
> tsActuals <- ts(actuals, frequency = 7, start = c(5, 4))  # Create time series
> lines(log(tsActuals), col = "darkgreen")                  # Add line
> points(log(tsActuals), pch = 4, col = "darkgreen")        # Add points
> legend("bottomleft", c("Original Data", "Holt-Winters Filter", "Actual Data"),
+   fill = c("red", "blue", "darkgreen"))

Image

FIGURE 17.20 Holt-Winters predictions versus actual logged daily box office takings for Avengers: Age of Ultron

Autocorrelations

Although smoothing approaches can provide us with a mechanism for generating short-term forecasts, to understand the mechanisms underlying a time series we must first investigate its autocorrelation, that is, the cross-correlation of the series with lagged values of itself. We can create a plot of the Autocorrelation Function (a “correlogram”) using the acf function in R. We can also create Partial Autocorrelation plots using the pacf function. Both of these plots can be seen in Figure 17.21.

> par(mfrow = c(1, 2))
> acf(log(tsUltron), main = "Autocorrelation")
> pacf(log(tsUltron), main = "Partial Autocorrelation")

Image

FIGURE 17.21 Correlograms of logged daily box office takings for Avengers: Age of Ultron


Tip: The forecast Package

The forecast package provides excellent resources for time series analysis. Among other things, it provides enhanced versions of acf and pacf called Acf and Pacf.


Fitting ARIMA Models

An Autoregressive Integrated Moving Average (or “ARIMA”) Model can be fit to understand and predict time series data. The ARIMA Model consists of three components:

Image AR: Autoregressive

Image I: Integrated (differencing that can be applied)

Image MA: Moving Average

We can fit an ARIMA Model in R using the arima function, which accepts a time series object. We specify the order of the time series using a vector of length three (p, d, q), which specifies

Image p, the AR order

Image d, the degree of differencing

Image q, the MA order

Based on these autocorrelations, let’s fit an ARIMA (1, 0, 1) Model to our time series:

> arimaUltron <- arima(log(tsUltron), order = c(1, 0, 1))
> arimaUltron

Call:
arima(x = log(tsUltron), order = c(1, 0, 1))

Coefficients:
         ar1     ma1  intercept
      0.7627  0.3782     2.1785
s.e. 0.1428  0.1883     0.5470

sigma^2 estimated as 0.3278:  log likelihood = -27.46,  aic = 62.93

We can see a visual representation of the time series fit using the tsdiag function, which produces diagnostic plots for time series fits. Specifically, it will plot standardized residuals, an autocorrelation of the residuals, and p-values from a Portmanteau test. This output is shown in Figure 17.22.

> tsdiag(arimaUltron)

Image

FIGURE 17.22 Diagnostic plots from ARIMA (1, 0, 1) fit

The residuals still exhibit signs of seasonality, which is understandable since we are fitting a non-seasonal ARIMA Model to a time series with seasonality. At this point, we could remove the seasonal component from the time series (for example, using the stl function) and then refit the model. Alternatively, we could fit a seasonal ARIMA Model using the seasonal argument to arima, which also accepts a vector of length 3 (specifying the autoregressive, differencing, and moving average components of the seasonal element of the time series). Let’s fit a seasonal ARIMA model to our data, as seen in Figure 17.23.

> sarimaUltron <- arima(log(tsUltron), order = c(1, 0, 1),
+   seasonal = list(order = c(1, 0, 1)))
> tsdiag(sarimaUltron)

Image

FIGURE 17.23 Diagnostic plots from seasonal ARIMA (1, 0, 1) fit
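Before making predictions, we can compare the seasonal and non-seasonal fits numerically; for example, a quick sketch comparing their AIC values (lower indicates a better trade-off between fit and complexity):

> AIC(arimaUltron, sarimaUltron)   # compare the two ARIMA fits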

Predicting from ARIMA Models

We can predict values from an ARIMA Model using the predict function, which accepts an n.ahead input. Let’s see our model predictions plotted against the real observations. The output can be seen in Figure 17.24.

> predUltron <- predict(sarimaUltron, n.ahead = 7)      # Predict next 7 days with ARIMA model
> plot(log(tsUltron), type = "n",
+      main = "Predictions from ARIMA(1,0,1) Model",
+      ylab = "Logged Daily Box Office Takings",
+      xlab = "Day", xlim = c(1, 6.3), ylim = c(-1, 5))
> lines(log(tsUltron), col = "red")                     # Add original data
> lines(predUltron$pred, col = "blue")                  # Add predictions
> lines(predUltron$pred - 2 * predUltron$se, col = "blue", lty = 2)  # Add errors
> lines(predUltron$pred + 2 * predUltron$se, col = "blue", lty = 2)  # Add errors
> lines(log(tsActuals), col = "darkgreen")              # Add line
> points(log(tsActuals), pch = 4, col = "darkgreen")    # Add line
>
> legend("bottomleft",
+   c("Original Data", "ARIMA Predictions", "Actual Data"),
+   fill = c("red", "blue", "grey"))

Image

FIGURE 17.24 Time series predictions from ARIMA Model


Tip: Covariates

We can add covariates to an ARIMA Model using the xreg input to the arima function.



Note: Time Series Analysis Extensions

The Time Series Task View, found at https://cran.r-project.org/web/views/TimeSeries.html, lists a wider range of packages that allow the user to perform a range of time series tasks and analyses.


Summary

This hour covered a range of modeling approaches that can be used to study different data types. Specifically, we saw how the glm function allows us to fit Generalized Linear Models, looked at the nls function for Nonlinear Model fits, used the survival package to model time-to-event data, and covered a few of the time series analysis capabilities of R. The capabilities seen in this and the previous hour demonstrate only a small portion of the analytic functionality provided by R.

Q&A

Q. Is there a way of fitting Generalized Linear Models on very large data sizes?

A. Although limitations exist, the biglm package provides the function bigglm, which allows out-of-memory Generalized Linear Model fitting.

Q. Can I create my own “self-starting” functions?

A. Yes, the selfStart function can be used to define a self-starting function that can then be used in a function such as nls.

Q. How do I define left or interval censored data?

A. The Surv function allows you to specify left, right, or interval censored data using the time, time2, and type arguments.

Q. Does R provide ARCH time series modeling capabilities?

A. Yes, there are a number of packages (such as fGarch) that implement (G)ARCH models.

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. What argument in glm controls the probability distribution to use?

2. How would you fit a logistic regression?

3. Under what condition would you not have to specify starting values in an nls fit?

4. In which package would you find the coxph function?

5. How would you fit a “seasonal ARIMA” model?

Answers

1. The family argument.

2. You specify a dichotomous response variable and select “binomial” as the distribution.

3. When you are using a “self-starting” modeling function.

4. In the survival package.

5. Using the arima function, specifying the order and seasonal inputs.

Activities

1. Using the mtcars data frame, fit a logistic model of vs versus other variables in the data.

2. For a (Nonlinear) logistic function of circumference versus age from the Orange data frame, either specify the model function directly or use the SSlogis function.

3. Fit a Cox Proportional Hazards regression model to the lung data frame from the survival package.

4. Fit an ARIMA model of the LakeHuron time series.
