Figure 1.1 Location of R installer.
Figure 1.2 Language selection.
Figure 1.3 With modern versions of Windows, this suggestion can be safely ignored.
Figure 1.4 The license agreement must be acknowledged to use R.
Figure 1.5 It is important to choose a destination folder with no spaces in the name.
Figure 1.6 This dialog is used to choose the destination folder.
Figure 1.7 This is a proper destination, with no spaces in the name.
Figure 1.8 It is best to select everything except 32-bit components.
Figure 1.10 Choose the Start Menu folder where the shortcuts will be installed.
Figure 1.13 A progress bar is displayed during installation.
Figure 1.14 Confirmation that installation is complete.
Figure 1.15 Introductory screen for installation on a Mac.
Figure 1.16 Version selection.
Figure 1.17 The license agreement, which must be acknowledged to use R.
Figure 1.18 The license agreement must also be agreed to.
Figure 1.20 The administrator password might be required for installation.
Figure 1.21 A progress bar is displayed during installation.
Figure 1.22 This signals a successful installation.
Figure 2.1 The standard R interface in Windows.
Figure 2.2 The standard R interface on Mac OS X.
Figure 2.3 The general layout of RStudio.
Figure 2.4 Object Name Autocomplete in RStudio.
Figure 2.5 Clicking File >> New Project begins the project creation process.
Figure 2.7 Dialog to choose the location of a new project directory.
Figure 2.8 Dialog to choose an existing directory in which to start a project.
Figure 2.9 Here is the option to choose which type of repository to start a new project from.
Figure 2.11 Clicking Tools >> Options brings up RStudio options.
Figure 2.12 General options in RStudio.
Figure 2.13 Options for customizing the code editing pane.
Figure 2.14 Options for code appearance.
Figure 2.15 These options control the placement of the various panes in RStudio.
Figure 2.16 Options related to packages. The most important is the CRAN mirror selection.
Figure 2.17 This is where to choose whether to use Sweave or knitr and select the PDF viewer.
Figure 3.1 RStudio’s Packages pane.
Figure 3.2 RStudio’s package installation dialog.
Figure 3.3 RStudio’s package installation dialog to install from an archive file.
Figure 7.1 Histogram of diamond carats.
Figure 7.2 Scatterplot of diamond price versus carat.
Figure 7.3 Boxplot of diamond carat.
Figure 7.4 Histogram of diamond carats using ggplot2.
Figure 7.5 Density plot of diamond carats using ggplot2.
Figure 7.6 Simple ggplot2 scatterplot.
Figure 7.7 Scatterplot of diamonds data mapping diamond color to the color aesthetic.
Figure 7.8 Scatterplot faceted by color.
Figure 7.10 Histogram faceted by color.
Figure 7.11 Boxplot of diamond carats using ggplot2.
Figure 7.12 Boxplot of diamond carats by cut using ggplot2.
Figure 7.13 Violin plot of diamond carats by cut using ggplot2.
Figure 7.15 Line plot using ggplot2.
Figure 7.16 Line plot with a separate line for each year.
Figure 12.1 Plot of foreign assistance by year for each of the programs.
Figure 14.1 Plot of random normal variables and their densities, which results in a bell curve.
Figure 14.3 Normal distribution function.
Figure 15.3 ggpairs plot of tips data using both continuous and categorical variables.
Figure 15.5 Histogram of tip amount by sex. Note that neither distribution appears to be normal.
Figure 15.7 Density plot showing the difference of heights of fathers and sons.
Figure 16.3 Histogram of value per square foot for NYC condos. It appears to be bimodal.
Figure 16.9 Coefficient plot for condo value regression.
Figure 16.10 Coefficient plots for models with interaction terms. (a) includes individual variables and the interaction term, while (b) only includes the interaction term.
Figure 17.1 Density plot of family income with a vertical line indicating the $150,000 mark.
Figure 17.4 Coefficient plot for a logistic regression on ACS data.
Figure 17.6 Survival curve for Cox proportional hazards model fitted on bladder data.
Figure 17.8 Andersen-Gill survival curves for bladder2 data.
Figure 18.1 Coefficient plot for condo value data regression in house1.
Figure 18.4 Base graphics plots for residuals versus fitted values.
Figure 18.9 Histogram of the batting average bootstrap. The vertical lines are two standard errors from the original estimate in each direction. They make up the bootstrapped 95% confidence interval.
Figure 19.3 Cross-validation curve for ridge regression fitted on ACS data.
Figure 19.4 Coefficient profile plot for ridge regression fitted on ACS data.
Figure 19.6 Cross-validation curve for glmnet with α = 0.75.
Figure 19.7 Coefficient path for glmnet with α = 0.75.
Figure 20.3 Diamonds data with a number of different smoothing splines.
Figure 20.4 Scatterplot of price versus carat with a regression fitted on a natural cubic spline.
Figure 21.1 GDP for a number of nations from 1960 to 2011.
Figure 21.2 Time series plot of U.S. Per Capita GDP.
Figure 21.4 Plot of the U.S. Per Capita GDP diffed twice.
Figure 21.5 ACF and PACF plots for the residuals of the ideal model chosen by auto.arima.
Figure 21.8 Differenced GDP data.
Figure 21.9 Coefficient plots for VAR model of GDP data for Canada and Japan.
Figure 21.10 Time series plot of AT&T ticker data.
Figure 21.11 Series chart for AT&T.
Figure 21.12 Residual plots from GARCH model on AT&T data.
Figure 21.13 Predictions for GARCH model on AT&T data.
Figure 22.3 Plot of Hartigan’s Rule for a series of different cluster sizes.
Figure 22.4 Confusion matrix for clustering of wine data by cultivars.
Figure 22.5 Gap curves for wine data. The blue curve is the observed within-cluster dissimilarity, and the green curve is the expected within-cluster dissimilarity. The red curve represents the Gap statistic (expected minus observed), and the error bars are the standard deviation of the gap.
Figure 22.8 Hierarchical clustering of wine data.
Figure 22.9 Hierarchical clustering of country information data.
Figure 22.12 Hierarchical clustering of wine data split by the height of cuts.