Addition
matrices, 68
order of operation, 36
Aggregation
in data.table package, 135–138
AICC, 320
Akaike Information Criterion (AIC), 255–257, 259–260
@aliases tag, 382
all.obs option, 196
Ampersands (&) in compound tests, 111
model comparisons, 254
And operator in compound tests, 111–112
Andersen-Gill analysis, 244–245
Angle brackets (<>)
packages, 375
regular expressions, 169
ANOVA. See Analysis of variance (ANOVA)
Ansari-Bradley test, 204
Appending elements to lists, 68
apt-get mechanism, 2
Arguments
C++ code, 385
CSV files, 74
ifelse, 110
package documentation, 380
Arithmetic mean, 187
ARMA model, 315
Asterisks (*)
Markdown, 368
multiple regression, 228
NAMESPACE file, 377
vectors, 44
Attributes for data.frame, 54
Author information
LaTeX documents, 360
packages, 375
@author tag, 382
Autocorrelation, 318
Autoregressive (AR) moving averages, 315–322
Average linkage methods, 352, 355
Axes in nonlinear least squares model, 298
Back ticks (`) with functions, 49
Backslashes (\) in regular expressions, 166
histograms, 84
Bayesian Information Criterion (BIC), 255–257, 259
Beamer mode in LaTeX, 369
Beginning of lines in regular expressions, 167
Bell curve, 171
Bernoulli distribution, 176
BIC (Bayesian Information Criterion), 255–257, 259
Binomial distribution, 176–181, 185–186
Bioconductor, 373
BitBucket repositories, 25, 31
Books, 394
Boxplots
Breakpoints for splines, 302
Byte-compilation for packages, 376
ByteCompile field, 376
cache option for knitr chunks, 365
Calling functions, 49
arguments, 100
C++, 384
conflicts, 33
Carets (^) in regular expressions, 167
Case sensitivity
characters, 40
package names, 384
regular expressions, 162
variable names, 38
Cauchy priors in Bayesian shrinkage, 293–294
Causation vs. correlation, 199
Censored data in survival analysis, 240–241
Centroid linkage methods, 352, 355
Change Install Location option, 9
character data, 40
Charts, 329
chartsnthings site, 393
Chi-Squared distribution, 185–186
Chunks
Markdown, 368
Citations in LaTeX documents, 366
Classification trees, 311
Clusters, 337
registering, 283
code
indenting, 99
running in parallel, 282
Code Editing options, 21
logistic regression, 236
multiple regression, 226–228, 230–231
Collate field for packages, 375–376
Colons (:)
Color
boxplots, 92
LaTeX documents, 362
line graphs, 96
Column index for arrays, 71
Columns
Comma separated files (CSVs), 73–74
comment option, 365
Comments, 46
knitr chunks, 365
package documentation, 381
Comparing
multiple variables, 192
vectors, 46
Compilation in C++
code, 384
Complete linkage methods, 352, 355
complete.obs option, 196
Components, installing, 5
Comprehensive R Archive Network (CRAN), 1, 29, 384
Concatenating strings, 155–156
Conferences, 393
Confidence intervals
GAM, 310
multiple regression, 226
paired two-sample t-tests, 207
Control statements, 105
Converting shapefile objects into data.frame, 349
Correlation and covariance, 191–200
Covariates in simple linear regression, 211
Cox proportional hazards model, 242–244
.cpp files, 386
CRAN (Comprehensive R Archive Network), 1, 29, 384
Cross tables, 149
Cross-validation
CSVs (comma separated files), 73–74
Cubic splines, 302
Curly braces ({})
functions, 99
regular expressions, 166
Data
missing. See Missing data
Data Analysis Using Regression and Multilevel/Hierarchical Models, 50, 291, 394
converting shapefile objects into, 349
Elastic Net, 272
joins, 145
Data Gotham conference, 393
Data meetups, 391
Data munging, 117
Data reshaping, 141
Data structures, 53
Data types, 38
C++ code, 387
character, 40
matrices, 68
Databases, reading from, 75–76
LaTeX documents, 360
packages, 375
DeclareGraphicsExtensions, 360
Degrees of freedom
ANOVA, 215
multiple regression, 225
splines, 300
Delimiters in CSV files, 74
Delta in model comparisons, 258
Dendrograms
hierarchical clustering, 352
Density plots, 87–88, 184, 207
Dependencies in packages, 30
Dependent variables in simple linear regression, 211
Depends field
C++ code, 386
packages, 375
@description tag, 382
Destination in installation, 4–5
@details tag, 382
dev option for knitr chunks, 365
Deviance in model comparisons, 256
Dimensions in K-means algorithm, 339
direction argument, 265
Directories
creating, 18
installation, 4
names, 18
Distance between clusters, 352
Distance metric for K-means algorithm, 337
Distributions. See Probability distributions
Division
matrices, 68
order of operation, 36
Documentation
functions, 49
documentclass, 360
Documents as R resources, 394
Dollar signs ($)
data.frame, 56
multiple regression, 225
regular expressions, 167
%dopar% operator, 284
dot-dot-dot argument (...), 102
DSN connections, 75
Dynamic Documents with R and knitr, 394
dzslides format, 369
echo option for knitr chunks, 365
EDA (Exploratory data analysis), 83, 199, 219
Elements of Statistical Learning: Data Mining, Inference, and Prediction, 394
End of lines in regular expressions, 167
engine option for knitr chunks, 365
Ensemble methods, 312
RStudio. See RStudio overview
Equal to symbol (=)
if and else, 105
variable assignment, 36
Equality of matrices, 68
Esc key in command line commands, 15
eval option for knitr chunks, 365
everything option, 196
@examples tag, 382
Exclamation marks (!) in Markdown, 368
Expected value, 188
Experimental variables in simple linear regression, 211
Exploratory data analysis (EDA), 83, 199, 219
Exponential distribution, 185–186
Exponents, order of operation, 36
@export tag, 382
Extra arguments, 102
Extracting
F-tests
ANOVA, 215
multiple regression, 225
simple linear regression, 214–215
two-sample, 204
factor data type, 40
factors
as.numeric with, 160
Elastic Net, 273
storing, 60
vectors, 48
FALSE value
fig.scap option, 365
fig.show option, 365
fill argument for histograms, 87
Fitted values against residuals plots, 249–251
folder structure, 373
formula interface
ANOVA, 208
Elastic Net, 272
multiple regression, 224, 226, 230
simple linear regression, 213
Formulas for distributions, 185–186
Frontend field for packages, 374
Functions
assigned to objects, 99
C++, 384
conflicts, 33
do.call, 104
documentation, 49
package documentation, 380
return values, 103
g++ compiler, 385
Gamma linear model, 240
GAMs (generalized additive models), 304–310
Gap statistic in K-means algorithm, 343–344
Garbage collection, 38
GARCH (generalized autoregressive conditional heteroskedasticity) models, 327–336
Gaussian distribution, 171–176
gcc compiler, 385
General options for RStudio tools, 20–21
Generalized additive models (GAMs), 304–310
Generalized autoregressive conditional heteroskedasticity (GARCH) models, 327–336
Generalized linear models, 233
miscellaneous, 240
Geometric distribution, 185–186
Git
integration with RStudio, 25–26
selecting, 19
Git/SVN option, 25
GitHub repositories, 25
for bugs, 392
package installation from, 31, 383
README files, 380
Graphics, 83
Greater than symbols (>)
if and else, 105
variable assignment, 37
Groups, 117
Hadoop framework, 117
Hash symbols (#)
comments, 46
Markdown, 368
package documentation, 381
pandoc, 369
header command in pandoc, 369
Heatmaps, 193
Help pages in package documentation, 381
Hierarchical clustering, 352–357
Histograms, 84
bootstrap, 264
multiple regression, 219
Poisson regression, 238
residuals, 253
HTML tables, extracting data from, 80–81
Hypergeometric distribution, 185–186
Hypothesis tests in t-tests, 201–203
IDEs (Integrated Development Environments), 13–14
Images in LaTeX documents, 360
@import tag, 382
Imports field for packages, 375
include option for knitr chunks, 365
Indenting code, 99
Independent variables in simple linear regression, 211
Indexes
arrays, 71
data.table, 129
LaTeX documents, 360
lists, 66
Indicator variables
data.frame, 60
multiple regression, 225
PAM, 345
Inferences
ensemble methods, 312
multiple regression, 216
@inheritParams tag, 382
Innovation distribution, 330
Input variables in simple linear regression, 211
Install dependencies option, 30
install.packages command, 31
Install Packages option, 30
installing packages, 29–32, 383–384
installing R, 2
on Linux, 10
Integers in regular expressions, 166
Integrated Development Environments (IDEs), 13–14
Intel Matrix Kernel Library, 10
Interactivity, 13
Intercepts
multiple regression, 216
simple linear regression, 212–213
Interquartile Range (IQR), 85–86
Introduction to R, 394
Inverse Gaussian linear model, 240
IQR (Interquartile Range), 85–86
Italics in Markdown, 367
Iteration with loops, 113
while, 115
data.table, 149
Joint Statistical Meetings, 393
k-fold cross-validation, 257–258
key columns with join, 144
keys for data.table package, 133–135
knots for splines, 302
L1 penalty, 271
L2 penalty, 271
Lags in autoregressive moving average, 318–319
lambda functions, 279–282, 285–289
Language selection, 3
lasso in Elastic Net, 271, 276, 279, 282
LaTeX program
installing, 359
Leave-one-out cross-validation, 258
Legends in scatterplots, 89
Length
characters, 40
Less than symbols (<)
if and else, 105
variable assignment, 36
letters vector, 70
LETTERS vector, 70
Levels
Elastic Net, 273
LICENSE file, 380
Licenses
SAS, 77
Windows, 3
Line breaks in Markdown, 367
Linear models, 211
simple linear regression, 211–216
LinkingTo field, 386
Links
C++ libraries, 386
hierarchical clustering, 352, 355
linear models, 240
Markdown, 368
Linux
C++ compilers, 385
installation on, 10
Lists
Markdown, 367
Loading
rdata files, 162
log-likelihood in AIC model, 255
Log-normal distribution, 185–186
Logical operators
vectors, 46
Logistic distribution, 185–186
Loops, 113
while, 115
Mac
C++ compilers, 385
downloading R, 1
Machine learning, 304
Machine Learning for Hackers, 394
Machine Learning meetups, 391
Maintainer field for packages, 375
makeCluster function, 283
makeindex, 360
MapReduce paradigm, 117
Maps
heatmaps, 193
Matrices
with cor, 192
Elastic Net, 272
VAR, 324
Matrix Kernel Library (MKL), 10
Mean
ANOVA, 209
bootstrap, 262
normal distribution, 171
various statistical distributions, 185–186
Mean squared error in cross-validation, 258
Measured variables in simple linear regression, 211
Memory in 64-bit versions, 2
Merging
data.table, 149
Minitab format, 77
Minus signs (-) in variable assignment, 36–37
Missing data, 50
apply, 118
cov, 199
mean, 188
NA, 50
NULL, 51
PAM, 346
MKL (Matrix Kernel Library), 10
Model diagnostics, 247
stepwise variable selection, 265–269
Moving average (MA) model, 315
Moving averages, autoregressive, 315–322
Multicollinearity in Elastic Net, 273
Multidimensional scaling in K-means algorithm, 339
Multinomial distribution, 185–186
Multinomial regression, 240
Multiple group comparisons, 207–210
Multiple imputation, 50
Multiple time series in VAR, 322–327
Multiplication
order of operation, 36
Multivariate time series in VAR, 322
na.or.complete option, 196
na.rm argument
mean, 188
standard deviation, 189
NA value
with mean, 188
overview, 50
Name-value pairs for lists, 64
Names
data.frame columns, 58
directories, 18
packages, 384
vectors, 47
names function for data.frame, 54–55
Natural cubic splines, 302
Negative binomial distribution, 185–186
Nested indexing of list elements, 66
NEWS file, 379
Nodes in decision trees, 311–312
Noise
autoregressive moving average, 315
VAR, 324
Nonlinear models, 297
generalized additive model, 304–310
nonlinear least squares model, 297–299
Nonparametric Ansari-Bradley test, 204
Not equal symbols (!=) with if and else, 105
nstart argument, 339
Null hypotheses
paired two-sample t-tests, 207
Numbers in regular expressions, 165–169
Objects, functions assigned to, 99
Octave format, 77
1/mu^2 function, 240
Operations
order, 36
Or operators in compound tests, 111–112
Order of operations, 36
Ordered factors, 48
out.width option, 365
Outcome variables in simple linear regression, 211
Outliers in boxplots, 86
Overdispersion in Poisson regression, 238
Overfitting, 312
p-values
ANOVA, 208
multiple regression, 225
Package field in DESCRIPTION file, 374–377
building, 33
checking and building, 383–384
folder structure, 373
options, 23
submitting to CRAN, 384
uninstalling, 32
unloading, 33
Paired two-sample t-tests, 206–207
pairwise.complete option, 197
PAM (Partitioning Around Medoids), 345–352
Parentheses ()
arguments, 100
compound tests, 111
expressions, 63
functions, 99
if and else, 105
order of operation, 36
regular expressions, 163
Partial autocorrelation, 318–319
Partitioning Around Medoids (PAM), 345–352
Passwords in installation, 9
Patterns, searching for, 161–169
Percent symbol (%) in pandoc, 369
Periods (.)
uses, 99
variable names, 37
Plots
coefficient. See Coefficient plots
scatterplots. See Scatterplots
Plus signs (+) in regular expressions, 169
POSIXct data type, 40
Pound symbols (#)
comments, 46
Markdown, 368
package documentation, 381
pandoc, 369
Prediction in GARCH models, 335
Predictive Analytics meetups, 391
Predictors
Elastic Net, 272
generalized additive models, 304
logistic regression, 233
simple linear regression, 211, 213
Probability distributions, 171
Program Files\R directory, 4
prompt option for knitr chunks, 365
Quantiles
binomial distribution, 181
multiple regression, 225
summary function, 190
Quasibinomial linear model, 240
Quasipoisson family, 239
Question marks (?)
with functions, 49
regular expressions, 169
Quotes (") in CSV files, 74
R-Bloggers site, 393
R CMD commands, 383
R Enthusiasts site, 393
R in Finance conference, 393
R Inferno, 394
R Productivity Environment (RPE), 26–27
Raise to power function, 45
Random numbers
binomial distribution, 176
Random starts in K-means algorithm, 339
Rcmdr interface, 14
RData files
creating, 77
loading, 162
Readability of functions, 99
Reading data, 73
from statistical tools, 77
README files, 380
Real-life resources, 391
books, 394
conferences, 393
documents, 394
Stack Overflow, 392
Twitter, 393
Web sites, 393
Reference Classes system, 377
Registering clusters, 283
Regression
generalized additive models, 304
Regression to the mean, 211
Regression trees, 310
Regularization and shrinkage, 271
Relationships
correlation and covariance, 191–200
simple linear regression, 211–216
Repeating command line commands, 15
Reshaping data, 141
Residual standard error in least squares model, 298
Residual sum of squares (RSS), 254–255
Resources. See Real-life resources
Responses
decision trees, 310
logistic regression, 233
multiple regression, 216–217, 219, 225
Poisson regression, 237
residuals, 247
simple linear regression, 211–213
Return values in functions, 103
Revolution Analytics site, 393
Ridge in Elastic Net, 271, 279
.Rmd files, 369
.Rnw files, 362
Rows
in arrays, 71
bootstrap, 262
data.frame, 53
data.table, 131
with mapply, 120
RPE (R Productivity Environment), 26–27
RSS (residual sum of squares), 254–255
RTools, 385
Run as Administrator option, 3
Running code in parallel, 283
S3 system, 377
@S3method tag, 382
S4 system, 377
s5 slide show format, 369
SAS format, 77
correlation, 192
generalized additive models, 307
splines, 303
scope argument, 265
Scraping web data, 81
Seamless R and C++ Integration with Rcpp, 394
Searches, regular expressions for, 161–169
Secret weapon, 293
Sections in LaTeX documents, 361
@seealso tag, 382
Seeds for K-means algorithm, 338
Semicolons (;) for functions, 100
sep argument, 155
Shapefile objects, converting into data.frame, 349
Shapiro-Wilk normality test, 204
Shortcuts, keyboard, 15
Shrinkage
Elastic Net, 271
Simple linear regression
Single linkage methods, 352, 355
64-bit vs. 32-bit R, 2
Size
binomial distributions, 176–179
lists, 65
sample, 187
Slashes (/) in C++ code, 385–386
Slide show formats, 369
slideous slide show format, 369
Slope in simple linear regression, 212–213
Small multiples, 89
Smoothing functions in GAM, 304
Software license, 3
Split-apply-combine method, 117, 124
SPSS format, 77
Square brackets ([])
arrays, 71
lists, 65
Markdown, 368
vectors, 47
Squared error loss in nonlinear least squares model, 297
Stack Overflow source, 392
Standard deviation
missing data, 189
normal distribution, 171
simple linear regression, 213
Standard error
least squares model, 298
simple linear regression, 213–216
t-tests, 202
start menu shortcuts, 6
startup options, 5
Stata format, 77
Stationarity, 318
Statistical graphics, 83
Statistical tools, reading data from, 77
Stepwise variable selection, 265–269
Strings, 155
stringsAsFactors argument, 75
Submitting packages to CRAN, 384
Subtraction
matrices, 68
order of operation, 36
Suggests field in packages, 375–376
Systat format, 77
t distribution
functions and formulas, 185–186
GARCH models, 330
t-tests, 200
multiple regression, 225
Tab key for autocompleting code, 15
Tables of contents in pandoc, 371
Tensor products, 308
test folder, 374
LaTeX documents, 362
32-bit vs. 64-bit R, 2
Tildes (~) in aggregation, 120
Time series and autocorrelation, 315
autoregressive moving average, 315–322
@title tag, 382
Titles
help files, 381
LaTeX documents, 360
slides, 369
Transposing matrices, 70
Trees
hierarchical clustering, 354
TRUE value
Twitter resource, 393
Type field for packages, 374–375
Types. See Data types
Underscores (_)
Markdown, 367
variable names, 37
Unequal length vectors, 46
Uniform (Continuous) distribution, 185–186
Uninstalling packages, 32
Unloading packages, 33
@useDynLib tag, 382
UseMethod command, 377
useR! conference, 393
User installation options, 9
VAR (vector autoregressive) model, 322–327
Variables, 36
names, 37
relationships between, 211–216
Variance, 189
GARCH models, 327
Poisson regression, 238
t-tests, 203
various statistical distributions, 185–186
Vector autoregressive (VAR) model, 322–327
Vectorized arguments with ifelse, 110
data.frame, 56
factors, 48
multiple regression, 217
sprintf, 157
Version control, 19
Version field for packages, 375
Versions, 2
Vertical lines (|) in compound tests, 111
vim mode, 21
Volatility in GARCH models, 330
Weakly informative priors, 290
Websites
R resources, 393
Welch two-sample t-tests, 203
while loops, 115
White noise
autoregressive moving average, 315
VAR, 324
WiFi hotspot locations, 297–298
Windows
C++ compilers, 385
downloading R, 1
Windows Live Writer, 15
within-cluster dissimilarity, 343
Wrapper functions, 386
Writing R Extensions, 394
X-axes in nonlinear least squares model, 298
Xcode, 385
Y-axes in nonlinear least squares model, 298
y-intercepts
multiple regression, 216
simple linear regression, 212–213
Zero Intelligence Agents site, 393
zypper mechanism, 2