Perhaps the biggest reason for R’s phenomenally ascendant popularity is its collection of user-contributed packages. As of mid-September 2013, there were 4,845 packages available on CRAN (http://cran.r-project.org/web/packages/), written by an estimated 2,000 different people. Odds are good that if a statistical technique exists, it has been written in R and contributed to CRAN. Not only are there an incredibly large number of packages, many are written by the authorities in the field such as Andrew Gelman, Trevor Hastie, Dirk Eddelbuettel and Hadley Wickham.
A package is essentially a library of prewritten code designed to accomplish some task or a collection of tasks. The survival package is used for survival analysis, ggplot2 is used for plotting and sp is for dealing with spatial data.
It is important to remember that not all packages are of the same quality. Some are built to be very robust and are well-maintained, while others are built with good intentions but can fail with unforeseen errors and others still are just plain poor. Even with the best packages, it is important to remember that most were written by statisticians for statisticians, so they may differ from what a computer engineer would expect.
This book will not attempt to provide an exhaustive list of good packages to use because that is constantly changing. However, there are some packages that are so pervasive that they will be used in this book as if they were part of base R. Some of these are ggplot2, reshape2 and plyr by Hadley Wickham; glmnet by Trevor Hastie, Robert Tibshirani and Jerome Friedman; Rcpp by Dirk Eddelbuettel; and knitr by Yihui Xie. We have written a package on CRAN, coefplot, with more to follow.
As with many tasks in R, there are multiple ways to install packages. The simplest is to install them using the GUI provided by RStudio and shown in Figure 3.1. Access the Packages pane shown in this figure either by clicking its tab or by pressing Ctrl+7 on the keyboard.
In the upper-left corner, click the Install Packages button to bring up the dialog in Figure 3.2.
From here simply type the name of a package (RStudio has a nice autocomplete feature for this) and click Install. Multiple packages can be specified, separated by commas. This downloads and installs the desired package, which is then available for use. Selecting the Install dependencies checkbox will automatically download and install all packages that the desired package requires to work. For example, our coefplot package depends on ggplot2, plyr, useful, stringr and reshape2, and each of those may have further dependencies.
An alternative is to type a very simple command into the console:
> install.packages("coefplot")
This will accomplish the same thing as working in the GUI.
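Because install.packages accepts a character vector, a script can check which packages it needs and install only those that are missing. The following is a minimal sketch rather than a standard recipe: the helper name find_missing_packages is hypothetical, and the package names passed to it are only examples.

```r
# Hypothetical helper: return the requested packages that are not yet installed.
find_missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# stats and utils ship with R, so nothing should be reported as missing here.
needed <- c("stats", "utils")
to_install <- find_missing_packages(needed)

# Only contact CRAN when something is actually missing.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Replacing the contents of needed with the packages a project actually uses makes the script safe to rerun, since already installed packages are skipped.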
There has been a movement recently to install packages directly from GitHub or BitBucket repositories, especially to get the development versions of packages. This can be accomplished using devtools.
> require(devtools)
> install_github("jaredlander/coefplot")
If the package being installed from a repository contains source code for a compiled language—generally C++ or FORTRAN—then the proper compilers must be installed. More information is in Section 24.6.
Sometimes there is a need to install a package from a local file, either a zip of a prebuilt package or a tar.gz of package code. This can be done using the installation dialog mentioned before but switching the Install from: option to Package Archive File as shown in Figure 3.3. Then browse to the file and install. Note that this will not install dependencies, and if they are not present the installation will fail. Be sure to install dependencies first.
Similarly to before, this can be accomplished using install.packages.
> install.packages("coefplot_1.1.7.zip")
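Since installing from a file does not pull in dependencies, it can be worth checking for them before attempting the installation. The sketch below assumes the archive from the example above sits in the current working directory and uses the dependency list given earlier for coefplot; the variable names are illustrative. Passing repos = NULL states explicitly that the first argument is a local file rather than a package name to look up on CRAN.

```r
# Dependencies of coefplot as listed earlier in this chapter.
deps <- c("ggplot2", "plyr", "useful", "stringr", "reshape2")

# Which of them are not yet installed?
missing_deps <- setdiff(deps, rownames(installed.packages()))

# Assumed location: the archive sits in the current working directory.
pkg_file <- "coefplot_1.1.7.zip"

if (length(missing_deps) > 0) {
  message("Install these first: ", paste(missing_deps, collapse = ", "))
} else if (file.exists(pkg_file)) {
  # repos = NULL marks the first argument as a local file, not a CRAN name.
  install.packages(pkg_file, repos = NULL)
} else {
  message("Archive not found: ", pkg_file)
}
```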
In the rare instance when a package needs to be uninstalled, it is easiest to click the white X inside a grey circle on the right of the package description in RStudio’s Packages pane shown in Figure 3.1. Alternatively, this can be done with remove.packages where the first argument is a character vector naming the packages to be removed.
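A cautious way to call remove.packages is to first restrict the request to packages that are actually installed, since the function may fail on names it cannot find. This is only a sketch, and the package names below are made up for illustration.

```r
# Hypothetical package names, used only for illustration.
candidates <- c("notARealPackage123", "anotherFakePackage456")

# Keep only the names that are actually installed.
to_remove <- intersect(candidates, rownames(installed.packages()))

# Nothing is removed unless at least one candidate is installed.
if (length(to_remove) > 0) {
  remove.packages(to_remove)
}
```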
Now that packages are installed they are almost ready to use and just need to be loaded first. There are two commands that can be used, either library or require. They both accomplish the same thing, loading the package, but require will return TRUE if it succeeds and FALSE with a warning if it cannot find the package, whereas library stops with an error. This returned value is useful when loading a package from within a function, a practice some consider acceptable and others improper. In general usage there is not much of a difference, so it comes down to personal preference. The argument to either function is the name of the desired package, with or without quotes. So loading the coefplot package would look like:
> require(coefplot)
Loading required package: coefplot
Loading required package: ggplot2
It prints out the dependent packages that get loaded as well. This can be suppressed by setting the argument quietly to TRUE.
> require(coefplot, quietly = TRUE)
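The value returned by require can be used inside a function to react to a missing package instead of halting. The following sketch assumes a hypothetical helper named load_or_report, and the second package name is deliberately made up so that the failure branch is exercised.

```r
# Hypothetical helper: try to load a package and report the outcome
# instead of stopping with an error.
load_or_report <- function(pkg) {
  # character.only = TRUE is needed because pkg holds the name as a string.
  loaded <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (loaded) {
    paste("Package", pkg, "is loaded")
  } else {
    paste("Package", pkg, "is not available")
  }
}

load_or_report("stats")
load_or_report("notARealPackage123")  # a made-up name that should not exist
```

Had library been used in place of require, the second call would have stopped with an error rather than returning a message.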
A package only needs to be loaded when starting a new R session. Once loaded, it remains available until either R is restarted or the package is unloaded, as described in Section 3.2.1.
An alternative to loading a package through code is to select the checkbox next to the package name in RStudio’s Packages pane, seen on the left of Figure 3.1. This will load the package by running the code just shown.
Sometimes a package needs to be unloaded. This is simple enough either by clearing the checkbox in RStudio’s Packages pane or by using the detach function. The function takes the package name preceded by package: all in quotes.
> detach("package:coefplot")
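The effect of loading and detaching can be observed with search, which lists the attached packages. This sketch uses tools, a package that ships with R, only because it is safe to assume it is installed; the variable names are illustrative.

```r
# tools ships with R but is not attached by default in a typical session.
library(tools)
attached_before <- "package:tools" %in% search()  # on the search path now

detach("package:tools")
attached_after <- "package:tools" %in% search()   # no longer attached

attached_before
attached_after
```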
It is not uncommon for functions in different packages to have the same name. For example, coefplot is in both arm (by Andrew Gelman) and coefplot. (This particular instance is because we built coefplot as an improvement on the one available in arm. There are other instances where the names have nothing in common.) If both packages are loaded, the function in the package loaded last will be invoked when calling that function. A way around this is to precede the function with the name of the package, separated by two colons (::).
> arm::coefplot(object)
> coefplot::coefplot(object)
Not only does this call the appropriate function, it also allows the function to be called without even loading the package beforehand.
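The same syntax can be tried with packages that are installed alongside R, which avoids needing arm or coefplot on the machine. In this sketch the file name and the numbers are arbitrary examples.

```r
# tools is installed with R; file_ext() is called here without library(tools).
ext <- tools::file_ext("results.csv")
ext

# The same syntax removes ambiguity when two attached packages share a name.
centre <- stats::median(c(3, 1, 2))
centre
```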
Building a package is one of the more rewarding parts of working with R, especially sharing that package with the community through CRAN. Chapter 24 discusses this process in detail.
Packages make up the backbone of the R community and experience. They are often considered what makes working with R so desirable. This is how the community makes its work, and so many of the statistical techniques, available to the world. With such a large number of packages, finding the right one can be overwhelming. CRAN Task Views (http://cran.r-project.org/web/views/) offers a curated listing of packages for different needs. However, the best way to find a new package might just be to ask the community. Appendix A gives some resources for doing just that.