Hour 2. The R Environment


What You’ll Learn in This Hour:

Environments for writing R code

Basic R syntax

Elements of the RStudio IDE

The premise of an R object

Working with R packages

Getting internal help


At the end of Hour 1, “The R Community,” you installed R and the popular RStudio Desktop IDE. In this hour we start a new R session via RStudio, type some basic commands, and explore the idea of an R “object.” You will be more formally introduced to the concept of an R package, and in the “Activities” section you will load an R package from the CRAN repository containing datasets that supplement the book.

Integrated Development Environments

At the end of the previous hour you installed two pieces of software, R and RStudio Desktop. In this hour we focus on RStudio. The R language can be accessed in many different ways, however. For example, when you installed R, you also installed the R GUI, which for a long time was the way most R users interacted with the language. The RStudio Desktop IDE is therefore not necessary in order to use R, but it certainly helps.

The R GUI

The R GUI is installed with R and provides an environment in which you can work with R interactively via the R console. The R GUI contains a small selection of drop-down menus that allow you to quickly install and load R packages, load workspaces, and access the R manuals. There is also a series of quick-access buttons that include a “Run line or selection” button for working with scripts and a Stop Current Computation button to allow users to cancel submitted statements.

Compared with modern IDEs such as RStudio, the R GUI is beginning to look quite dated. It remains very quick to load, however, and can be useful if all you need to do is open R to run one or two commands. Throughout this hour and the remaining hours, we will access the R language via the far richer RStudio IDE. Many of the features we look at in this hour are also available directly through the R language or via the R GUI. They may, however, have a slightly different name within the R GUI or behave slightly differently.

The RStudio IDE

RStudio is a U.S.-based company that builds tools to assist R users. One such tool is its extremely popular integrated development environment (IDE) for R, called RStudio (see Figure 2.1). The first publicly available version of the RStudio environment was released in 2011 and was made available in both desktop and server formats, with the server version accessed via a web browser. Since then, development has continued at a rapid pace, and the IDE has overtaken many others to become the de facto standard way of interfacing with R.


FIGURE 2.1 The RStudio Environment

Today, RStudio is still open source and available as both a desktop and a server product. Commercial versions of both products are also available for those who require additional features such as security or commercial support.

In Hour 1, you installed the latest version of RStudio Desktop appropriate for your operating system. The RStudio environment consists of four primary panes. You can resize the panes by clicking the dividing line between two panes and dragging it up/down or left/right. To change the layout of the panes, you need to use the menu options: select Tools > Global Options... and then click the Pane Layout button on the left. The structured pane layout is one of the features that sets RStudio apart from the standard R GUI. Panes such as Packages and Environment provide a user interface to core R functionality that new users are typically not aware of. More generally, the RStudio environment has helped make R more accessible to many new users who might previously have been put off by the rather basic-looking R GUI.

The most relevant and useful features within RStudio will be briefly covered within the remainder of this hour. RStudio is an evolving product, and new features are being added all the time. Full documentation is available on the RStudio product website, www.rstudio.com/products/rstudio/, and is accessible via the Help menu within RStudio.

Other Development Environments

The R GUI and RStudio are by no means the only ways to interface with R. Notepad++ is a very popular general-purpose text editor that understands R syntax. You can even use the editor to submit code using the NppToR plug-in available from SourceForge. Similarly, ESS is an add-on package that enhances the Emacs text editor, enabling interaction with R. The highly customizable Vim editor also has an R plug-in.

Eclipse is a very popular development platform maintained by the Eclipse Foundation, which offers support for a number of programming languages. The StatET plug-in enables users to create customized R environments. Eclipse with StatET is particularly useful when working on large projects across multiple languages. Casual users may find it a little too heavyweight for their needs, however. There is also Rattle, an open-source GUI for data mining in R, as well as Tinn-R, an R GUI and development environment for Windows.

The brief list presented here is by no means exhaustive, and you can call R from a number of different applications and environments. For example, you can call R from Excel using a tool called RExcel. Similarly, the major business intelligence vendors all allow users to write extensions in R and provide their own script editors. Oracle, HP, and Teradata all offer the ability to run R within their respective databases. Microsoft announced in May 2015 that it would offer the same functionality in SQL Server 2016.

R Syntax

Basic R syntax loosely resembles that of other scripting languages used for mathematics and statistics, such as MATLAB and Python. In this section, we take a look at the R console and type a few simple commands to see how an interactive R session functions.

The Console

Within both the R GUI and RStudio, you access your R session via the R Console. The console is essentially equivalent to running a command-line R session. Working directly within the R console, you type an R command, and when you press Enter, the result of that command is displayed on the line(s) below.

When you start an R session, you are greeted with an initial start-up message containing information about the version of R you are using, along with a selection of commands that the R Core Development Team would like you to know about (see Figure 2.2). Following the start-up message is the > symbol. This is commonly referred to as the command prompt.


FIGURE 2.2 The R Console


Caution: No Warranty!

Note the “ABSOLUTELY NO WARRANTY” comment in the initial startup message. If things go wrong, there is no one you can pick up the phone and complain to!


A flashing cursor to the right of the command prompt is a sign that R is ready for you to submit a new command for processing. An example of the use of the console for a simple mathematical operation can be seen here:

> 4*5  # A simple command
[1] 20
>

Here, we asked R to evaluate the expression 4*5. The correct answer, 20, was printed on the following line, and we were returned to the command prompt and flashing cursor. The [1] relates to the way R prints vectors; we will look at it more closely in Hour 3, “Single-Mode Data Structures.” Note the use of the # symbol to comment our code. R ignores everything to the right of the first # on a line, unless that # appears inside a quoted character string.
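A # inside quotes is part of the text rather than the start of a comment. The minimal sketch below illustrates this; the object name hashText is purely illustrative.

```r
hashText <- "a # b"   # only this trailing note is a comment
nchar(hashText)       # 5: the # inside the quotes is an ordinary character
```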


Caution: Comment Blocks

There is no multiline comment capability within R, so comment blocks may only be achieved by starting each line of code with a #.


The command prompt reappears once R has finished processing a complete line of code. If we do not provide a complete line of code, we will get a “continuation” prompt, +, as follows:

> 4*  # An incomplete line
+

Often this occurs when a closing bracket or quotation mark is accidentally omitted, though a line can also be broken deliberately. Because R only processes a statement once it is complete, an incomplete line does not necessarily cause a syntax error. If the break was deliberate, or if we know what is needed to complete the line, we can simply type the rest and press Enter. If we have made a more serious error or are unsure what mistake we have made, we can press the Esc key to cancel the statement and return to the standard command prompt.
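Deliberate line breaks are common in scripts when an expression is long. In this small sketch (the object names are illustrative), each first line is incomplete, so R waits for the rest of the expression before evaluating it:

```r
# The first line ends with an operator, so R expects more input
total <- 4 *
  5
total            # 20

# An unclosed parenthesis also keeps the expression open
longSum <- sum(1, 2,
               3, 4)
longSum          # 10
```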

Using the R Console

Let’s type a few commands into the console using the following steps:

1. Open RStudio and wait for the command prompt to appear.

2. Type in a mathematical expression to evaluate, such as 20/4.

3. Press Enter.

The correct result should be displayed after a [1] and you should be returned to the command prompt, >.
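A few other arithmetic expressions you might try at the prompt are sketched below. At the console, each result would be printed after a [1]; the comments give the values you should see.

```r
20 / 4     # division: 5
2^3        # exponentiation: 8
7 %/% 2    # integer division: 3
7 %% 2     # remainder (modulo): 1
```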

Scripting

Professional-level code is rarely, if ever, developed directly in a console or command line. Large volumes of well-structured, readable, and well-documented code should be developed within an R script. The RStudio environment provides an enhanced text editor, shown in Figure 2.3, which can be used to develop R scripts. RStudio refers to this as the Source pane. You can open a script window using File > New File > R Script or via the equivalent buttons or keyboard shortcuts within the application.


FIGURE 2.3 The script editor and console windows

During script development, code from the Source pane can be executed in the console by clicking the Run button at the top of the Source pane, or equivalently with the keyboard shortcut Ctrl+Enter (Windows) or Command+Return (OS X). By default, code is submitted line by line: RStudio submits the entire line on which the cursor sits, wherever on that line the cursor happens to be. By highlighting part of a line or, for that matter, multiple lines, you can choose exactly what is submitted to the console.
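Besides running code line by line, you can run a whole script file with the source function, which is what RStudio's Source button calls in the console. The sketch below writes a two-line script to a temporary file purely for illustration (scriptFile and scriptResult are hypothetical names) and then sources it:

```r
# Create a small script file in a temporary location (illustration only)
scriptFile <- tempfile(fileext = ".R")
writeLines(c("# A tiny script",
             "scriptResult <- 4 * 5"), scriptFile)

# Run every line of the file in the current session
source(scriptFile)
scriptResult   # 20
```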

Many of the examples in this book are brief and will therefore use the R Console directly. However, it is strongly recommended that you store all of the code you generate when working through the book in your own script or series of scripts. The content of the script editor can be written to a file by selecting File > Save As... or by using the quick access Save button at the top of the Source pane. In Hour 7, “Writing Functions: Part I,” we will begin writing functions, and it is almost impossible to do so without using scripts.

R Objects

R is often described as a loosely object-oriented programming language. If you have a background in computer science and have used truly object-oriented languages such as Java, you probably would not consider R to be object-oriented. If, like the authors of this book, you have more of an analytical background, you may find the multiple references to “objects” throughout the R manuals a little off-putting.

We will look closer at object orientation in R during Hour 16, “Introduction to R Models and Object Orientation,” and then again in Hour 21, “Writing R Classes,” and Hour 22, “Formal Class Systems.” To begin with, however, we won’t worry too much about the impact of object orientation in R. All it really means is that everything has a name and can be classified into different types of “objects.” For example, there are “function” objects, “data” objects, and “statistical model” objects. This book will focus first on “data” objects, then move on to the use of specific “function” objects (such as particular graphic and statistical modelling function objects).
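The class function reports what kind of object something is. A quick sketch using objects that ship with R, namely the mean function and the cars dataset from the datasets package:

```r
class(mean)                              # "function": a function object
class(c(1, 2, 3))                        # "numeric": a data object
class(lm(dist ~ speed, data = cars))     # "lm": a statistical model object
```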

R Packages

Sets of R “objects” are held together in “packages,” which are structured elements that store data, functions, and other information. When R is installed, it is distributed with a set of core packages, which can be seen in the “library” subdirectory of the R installation. Only a small subset of the installed packages is actually loaded when you start an R session. This helps reduce the start-up time and avoid a behavior known as masking, which we discuss later in this hour. The Packages pane in RStudio shows you which packages are installed on your machine.
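The information shown in the Packages pane is also available from the console. Here is a minimal sketch using installed.packages (from utils) and .packages (from base); the object names are illustrative.

```r
# Names of every package installed in your libraries
installedPkgs <- rownames(installed.packages())
length(installedPkgs)

# Names of the packages attached in the current session (a much smaller set)
attachedPkgs <- .packages()
attachedPkgs
```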

The Search Path

When an R session begins, a set of “default” packages is loaded into the environment, providing immediate access to the most commonly used R functions and other objects. The list of packages included within the environment is called the R “search path,” which can be viewed using the search function. The physical location of the packages loaded can be viewed using the searchpaths function. These functions are demonstrated in Listing 2.1.

LISTING 2.1 The Search Path


> search()
 [1] ".GlobalEnv"        "tools:rstudio"     "package:stats"
 [4] "package:graphics"  "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"   "Autoloads"
[10] "package:base"
> searchpaths()
 [1] ".GlobalEnv"
 [2] "tools:rstudio"
 [3] "C:/Program Files/R/R-3.1.2/library/stats"
 [4] "C:/Program Files/R/R-3.1.2/library/graphics"
 [5] "C:/Program Files/R/R-3.1.2/library/grDevices"
 [6] "C:/Program Files/R/R-3.1.2/library/utils"
 [7] "C:/Program Files/R/R-3.1.2/library/datasets"
 [8] "C:/Program Files/R/R-3.1.2/library/methods"
 [9] "Autoloads"
[10] "C:/PROGRA~1/R/R-31~1.2/library/base"



Note: Text Wrapping

In Listing 2.1, the output of the search function was printed with three elements on each line, whereas the elements returned by searchpaths were longer, so only one element fit on each line. The number in square brackets gives the position in the search path of the first element on that line.



Note: RStudio Tools

The "tools:rstudio" item is unique to RStudio. It contains many hidden objects used by the RStudio IDE. The average R user will never touch any of the objects within this item.


Listing Objects

Each loaded package contains (possibly many) R objects that can be accessed, and R provides functions for listing them. One such function is objects. To use it, you call objects, specifying the position on the search path of the package whose objects you wish to list. Alternatively, you can use the “package:[packageName]” form produced by running search(). For example, if you want to see the names of the objects contained within the graphics package, you can run either of these lines:

objects(4)                     # Assumes that graphics is 4th in the search path
objects("package:graphics")    # Assumes nothing about the search path

The ls.str function provides a listing of the objects in a package together with a short view of each object (usually the arguments if the object is a function). You call ls.str in the same way as objects, using either the position of the package in the search path or the text produced by running search().
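As a short sketch, the following lists the graphics package by name (which, unlike a numeric position, does not depend on the search path) and then uses the pattern argument of ls.str to narrow the structural view to a single function, boxplot:

```r
# All exported object names in the graphics package
graphicsObjects <- objects("package:graphics")
head(graphicsObjects)
"boxplot" %in% graphicsObjects     # TRUE

# A short structural view of matching objects (here, just boxplot)
ls.str("package:graphics", pattern = "^boxplot$")
```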


Tip: Find Hidden Objects

When you list package objects in this manner, you list only those objects that the package developer has chosen to expose to the user.

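If you also want objects whose names begin with a dot, set the all.names argument of the objects function to TRUE. Objects that the developer has not exported are not part of the attached package at all; to see them, list the package namespace instead, for example objects(asNamespace("graphics"), all.names = TRUE).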


The R Workspace

Not all the items in the search path refer to R packages. In particular, the first item returned by both search and searchpaths was ".GlobalEnv". This refers to what is known as the “Global Environment” (or “workspace”), which is a storage box for objects that you create during your R session. These might be data that you read into R or functions that you write yourself. To begin with, it is empty, but you can easily create your own objects. The standard method for assigning a name to an object is to use the < and - characters to create an arrow (<-). To the left of the arrow you specify the name of the new object you wish to create. To the right you specify the value that the object will take. Here is an example:

> x <- 3*4
> x
[1] 12


Note: Dynamic Typing

R is a “dynamically typed” programming language. This means that you do not have to declare the type (or class) of an object before you assign it a value, and the same name can later hold a value of a different type. One consequence is that R code tends to be quicker to write but slower to run than code in statically typed languages such as Java and C.
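A minimal sketch of dynamic typing, using an illustrative object named value: the same name holds a number and then a character string, with no declaration in between.

```r
value <- 3
class(value)     # "numeric"
value <- "three" # no declaration needed to store text instead
class(value)     # "character"
```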


Instead of the left arrow, you can use the = sign. Some would argue that the left arrow makes it clear that a new object is being created, whereas others would argue that the = sign is more consistent with assignment in other programming languages. In most situations, there is very little difference, but experienced R package developers tend to use the left arrow, and this is what we will use for the examples throughout this book.
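At the top level the two forms behave the same. One place where they differ is inside a function call, where = names an argument while <- performs an assignment as a side effect. A short sketch (all object names are illustrative):

```r
first <- 5
second = 5
identical(first, second)   # TRUE: both forms assign at the top level

sum(values <- 1:10)        # 55, and also creates 'values' in the workspace
exists("values")           # TRUE
sum(others = 1:10)         # 55, but '=' here names an argument; nothing is assigned
exists("others")           # FALSE
```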


Note: Assigning to the Right

The assignment arrow works both ways. For example, you can create a variable, x, that has the value 9 by typing 9 -> x. Very few people actually use a right arrow to assign, however. It is generally considered good practice to avoid using it.


Object Naming

R object names can be practically any length and can be made up of any combination of letters, numbers, and the . and _ characters. The only real restriction is that a name cannot start with a number or an underscore (nor with a dot followed by a number). Objects whose names begin with a dot are hidden from default listings but remain accessible. It is important to note that R is a case-sensitive language; therefore, an object named myObject is completely different from one named myobject.
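The sketch below demonstrates both points with illustrative names: two objects whose names differ only in case, and a dot-prefixed object that the default objects listing leaves out.

```r
myObject <- 1
myobject <- 2
myObject == myobject      # FALSE: the names differ only in case

.hiddenObject <- 3
".hiddenObject" %in% objects()                   # FALSE: hidden from the default listing
".hiddenObject" %in% objects(all.names = TRUE)   # TRUE
.hiddenObject                                    # 3: still accessible by name
```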


Note: Naming objects with quotes

Strictly speaking, it is possible to start an object name with a number or underscore. It is also possible to include spaces. However, these forms of naming are generally discouraged. To create such a non-standard name, we must enclose it in one of three types of quotes: single quotes, '; double quotes, "; or backticks, `. Quotes of any type work on the left-hand side of an assignment, but backticks are needed to refer to the object afterward, and the standard convention in R is to use backticks throughout when naming objects in this way.
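A minimal sketch of non-standard names (all names here are illustrative). Note that typing "quoted name" on its own would simply return the character string, which is why backticks are needed to refer to the object:

```r
`2ndAttempt` <- "starts with a number"
`my object` <- 42
`my object` + 1          # 43

# Quotes also work on the left-hand side of an assignment...
"quoted name" <- 7
# ...but backticks are needed to refer to the object afterward
`quoted name`            # 7
```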


There is no widely adopted object-naming convention among R users. Throughout this book we will predominantly use a convention known as “camelCase,” because this is the convention that applies to most cases within the Mango Solutions coding standards. The camelCase convention specifies that each new word within an object’s name, excluding the first, should start with a capital letter. A variant of the convention is also discussed within Google’s R Style Guide, which is a great starting point for anyone looking for styling tips to help ensure professional-level R code.


Tip: Removing Objects

It is possible to remove objects from the workspace using the rm function, for example, rm(x).

The objects and ls functions default to the first item in the search path (that is, the Global Environment). You can therefore delete every object in the Global Environment using rm(list=objects()) or rm(list=ls()).
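A short sketch of both forms, run at the top level so that objects() refers to the Global Environment (tempA and tempB are illustrative names). Names beginning with a dot are not returned by objects() by default, so they survive the second call.

```r
tempA <- 1
tempB <- 2
rm(tempA)
exists("tempA")        # FALSE: tempA has gone

# Remove everything listed by objects() in the Global Environment
rm(list = objects())
length(objects())      # 0
```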


The Working Directory

In R, the working directory is the default directory from which you import files, and to which you write information. A thorough understanding of how to query and change the working directory is essential in order to collaborate and/or share code effectively. If a codebase is well structured and relative file paths (as opposed to absolute file paths) are used throughout, then setting the working directory need only occur once right at the start of an R session.


Tip: Navigating the File System

The R function list.files can also be used to list all the files and folders within a particular directory, returning either file/directory names alone or full file paths.


You can view the current working directory using the getwd function, and change the working directory using the setwd function. RStudio allows the working directory to be updated via the Session > Set Working Directory menu item. It can also be set via the Files pane.

Note the use of the forward slash (/) in the directory paths specified in Listing 2.2. Every time R reads a backslash (\) in a character string, it skips onto the next character and tries to interpret the pair as what is known as an “escape sequence.” This can be painful when you’re copying directory paths from Windows Explorer. The simple solution is to replace every backslash with either a forward slash or a double backslash (\\). This includes paths to servers. For example, a Windows path of \\server would become \\\\server or //server in R.

LISTING 2.2 A Working Directory


> # Print the current working directory
> getwd()
[1] "C:/Users/username/Desktop/STY"
> # Change the current working directory using an absolute path
> setwd("C:/Users/username/Desktop")
> getwd()
[1] "C:/Users/username/Desktop"
> # Change the current working directory using a relative path
> setwd("STY")
> getwd()
[1] "C:/Users/username/Desktop/STY"
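To experiment without touching your real folders, the following sketch creates a hypothetical STY folder inside R's temporary directory (tempdir). It switches into that folder using first an absolute and then a relative path, writes and lists a file with list.files, and finally restores the original working directory.

```r
originalDir <- getwd()                      # remember where we started

# Create a throwaway folder called STY inside the session's temp directory
sandbox <- file.path(tempdir(), "STY")
dir.create(sandbox, showWarnings = FALSE)

setwd(tempdir())                            # absolute path
setwd("STY")                                # relative path: STY inside tempdir()
sandboxDir <- getwd()

writeLines("hello", "notes.txt")            # written to the working directory
filesHere <- list.files()                   # includes "notes.txt"

setwd(originalDir)                          # put things back as they were
```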


The backslash itself is known as an escape character. An escape character has a special place in programming because it changes the behavior of subsequent characters, assuming the escape sequence is known. The double backslash (\\) is one such escape sequence in R. We will explore some other useful escape sequences, such as \n (newline) and \t (tab), in later hours.
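The sketch below shows that the two-character sequence \\ inside a string stands for a single backslash. It then converts a Windows-style path to forward slashes with the base function chartr. The path and object name are illustrative.

```r
winPath <- "C:\\Users\\username\\Desktop"   # each \\ is one backslash
nchar("\\")                                 # 1
cat(winPath, "\n")                          # prints C:\Users\username\Desktop
chartr("\\", "/", winPath)                  # "C:/Users/username/Desktop"
cat("one\ttwo\n")                           # \t is a tab, \n a newline
```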

Saving Workspace Objects

The collection of objects in the Global Environment that you create during an R session is held in memory during the session. When you close R, you must choose whether to save these objects to disk for use at a later date or to discard them.

When a user decides to quit RStudio (and hence close their R session), they are presented with a dialog box similar to the one shown in Figure 2.4, asking them if they would like to “Save workspace image to ~/.RData.” The options presented are Save, Don’t Save, and Cancel. Selecting Save will create an .RData file within the current working directory. This is a compressed format that R can use to regenerate the objects within your Global Environment. RStudio automatically saves an .Rhistory file containing a list of all the commands typed during the R session. This file is visible in RStudio via the History pane.


FIGURE 2.4 To save or not save?


Tip: Saving Large Objects

The save function can be used at any time during an R session. For example, it can be used to create custom .RData files containing objects you specify directly. The save function, along with its counterpart load, is great for working with very large datasets because loading objects stored as .RData files can be an order of magnitude faster than reading the same data from a CSV file or other formats.
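A minimal sketch of the save/load round trip. The data frame is randomly generated for illustration, and the file goes to a temporary location; load restores the objects under their original names and returns those names invisibly.

```r
set.seed(42)
bigData <- data.frame(id = 1:100000, value = rnorm(100000))

rdataFile <- tempfile(fileext = ".RData")   # illustrative location
save(bigData, file = rdataFile)             # write just this object to disk

rm(bigData)                                 # remove it from the workspace
loadedNames <- load(rdataFile)              # restore it; returns the object names
loadedNames                                 # "bigData"
nrow(bigData)                               # 100000
```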


In a professional environment it is common to work on multiple projects, each with its own directory structure. RStudio allows the creation of projects via a button in the top-right corner of the IDE. When you create a new project within a specified directory, RStudio stores some information relating to your project within that directory. Creating a new project restarts the R session and sets the working directory to the project directory. When you return to a project after closing RStudio, any files you had open are reopened, enabling you to continue where you left off. Projects are not unique to RStudio: tools such as Eclipse with StatET offer a slightly richer project setup, allowing you to associate a particular version of R with your project.

Using R Packages

The base R distribution consists of approximately 30 R packages classified as either “core” (otherwise known as “base”) or “recommended.” The packages that make up the base R distribution contain a huge amount of functionality. However, the success of R has largely been due to the contribution of several thousand authors who have chosen to submit new functionality via additional R packages.

The main repository for R packages is CRAN, for which the number of R packages passed 7,000 in 2015. There is also a specialist repository for R developers called R-Forge; however, an increasing number of authors are choosing to share development versions of their packages on the more general-purpose GitHub. In addition to these primary repositories, the field of bioinformatics has its own repository known as Bioconductor, which “provides tools for the analysis and comprehension of high-throughput genomic data.” The Bioconductor community is very strong and even maintains its own conference, BioC.

Finding the Right Package

The CRAN repository is growing at an incredible rate. When I began teaching R courses in 2011, there were fewer than 2,000 packages on CRAN. In 2015, the number of packages passed 7,000. The R Core Development Team is constantly looking for ways to limit the number of packages, and the formation of the R Consortium may bring some control to the situation. However, at present, there is no standard way of finding the right package. A good starting point is CRAN’s Task Views, shown in Figure 2.5.


FIGURE 2.5 CRAN Task Views

At the time of writing, there are 33 Task Views. Each is manually maintained by members of the R community with a special interest in the topic that their Task View covers. There is no higher-level classification of views, so the views themselves are quite diverse and a great deal of overlap occurs between the various Task Views. This is to be expected given that there is no requirement that an R package should focus on a single topic. Conversely, not every package on CRAN appears in a Task View.

A drawback of the open-source nature of CRAN is the duplication of effort that occurs when two independent developers attempt to solve the same problem. This has resulted in several packages that attempt to do the same thing, just in slightly different ways. Ensuring better collaboration on such projects in the future is one of the primary goals of the R Consortium. The aim of CRAN Task Views is to tell you what is available, not to try to rank the packages in any way. Finding the right package via CRAN can therefore be a bit of a challenge!

In 2012, RStudio began maintaining its own CRAN mirror and publishing download logs of all the packages downloaded from it. Because the popular RStudio environment defaults to downloading from this mirror, these logs give a good indication of which packages are the most popular. Gábor Csárdi’s METACRAN (http://www.r-pkg.org/) summarizes the RStudio download logs in a more interactive, user-friendly manner. Alternatively, just search for blog posts discussing the popularity of R packages; there are plenty! Many of the popular general-purpose packages are discussed in this book.

Installing an R Package

The Packages pane in RStudio provides a user-friendly interface for installing and loading R packages. When you install an R package, you essentially create a directory on your machine. Once installed, the package remains on your machine until you choose to delete it.


Tip: Removing Packages

You can delete packages from your system using the remove.packages function.


When you install your first R package, you may be asked if you wish to create your own local library. A library is just a name for a collection of R packages. Local libraries are particularly useful when you are logged in to your operating system as a standard user and do not have the admin privileges needed to create new files within your R installation. If you have a local library, you may notice that the Packages pane in RStudio is divided into “User Library” and “System Library” to show where the packages are installed.

The quickest way to install an R package in RStudio is to navigate to the Packages pane and click the Install button. This opens the pop-up window shown in Figure 2.6, which can install packages either from CRAN or from local files.


FIGURE 2.6 The Install Packages window


Tip: Local Libraries

You can ask R which libraries it is using with the .libPaths function. The same function can also be used to point R at different local libraries. The system library cannot be changed, but you can create as many local libraries as you like.

If you don’t specify the package location when loading a package, R will look through each library in turn to try to find a package with the name you specified.
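The sketch below shows the libraries R will search, in order, and where one package was actually found. It uses .libPaths (base) and find.package (base); the exact paths printed will depend on your installation.

```r
.libPaths()                      # the libraries R searches, in order
find.package("stats")            # the library folder where stats was found
```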


Installing from CRAN

To install from CRAN, you need to ensure the Install From field shown in Figure 2.6 is pointing to CRAN. If you were using R on the command line or through the R GUI, you would first have to choose your CRAN mirror. RStudio does this for you, however, so you don’t have to worry about choosing a mirror. If you are connected to the Internet and your firewall allows it, you simply need to start typing the name of the package you wish to install in the Packages field, and RStudio will autocomplete the rest for you. Note that if you have multiple libraries, you can choose which one to install to, though RStudio defaults to a local library if you have one.


Caution: Package Quality

A package must pass many checks to make it onto CRAN. It is therefore natural to assume that being on CRAN is a sign of package quality. Although this is partly true, packages downloaded from CRAN have not necessarily been fully tested, or developed in a “valid” environment. Only the “core” and “recommended” packages have been tested by the R Core Development Team.


To save yourself some effort, we recommend leaving the Install Dependencies box checked unless you are concerned about what might be installed onto your system. For one thing, your package will fail to load unless the dependencies are installed. Therefore, if you don’t leave this box checked, you will have to manually install each dependency separately. Bear in mind that some of the more popular packages can have 10 or more dependencies.

Note that the Install Packages tool generates a line of code in the R Console that calls the R function install.packages. This function resides in the utils package, which is loaded by default when you start R. It is possible to call this function directly in any R session.
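A hedged sketch of calling install.packages directly, using ggplot2 only as an example package name. The install is guarded so it runs only in an interactive session and only if the package is not already installed, since it needs Internet access. The last line confirms that install.packages lives in the utils namespace.

```r
# Example only: install ggplot2 and its dependencies from CRAN.
# Guarded so it runs only interactively and only if the package is missing.
pkg <- "ggplot2"
if (interactive() && !requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, dependencies = TRUE)
}

# install.packages lives in the utils package
environmentName(environment(install.packages))   # "utils"
```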

Installing from a Package Archive File (Binary)

CRAN is the primary package repository for R users, though it is not the exclusive repository. Many commercial organizations build their own utility packages for internal use and may instead distribute package binaries over an intranet. The term “binary” refers to a package that has been built into an archive (a “.zip” on Windows, a “.tgz” on OS X), ready for installation. When you install a package directly from CRAN, the appropriate package binary is chosen for your operating system; it is downloaded to a temporary location and then “unpacked” and installed. When you install manually from a binary, you are simply skipping the CRAN piece and pointing directly to the binary for R to unpack. It is important to note that binaries are constructed in order to be unpacked by R, and you should never try to install a package that you have unzipped yourself.

Installing from Source

Since R is open source, the source code is always available to use and is distributed as a “.tar.gz” file. In addition to installing from a package binary, we may also install directly from the package source. Linux users have to install from source, though Windows and OS X users usually won’t have a need to until they start building their own packages. There are other occasions when it can be useful, but installing from source takes a lot longer than installing from a binary and may require additional tools. For example, Windows users need to install a version of Rtools that is appropriate for their R version. Instructions for installing Rtools can be found in the Appendix.

To install from source using the RStudio GUI, Linux users simply need to follow the instructions above for installing a package archive file. On Windows or OS X, we first need to download the “.tar.gz” file locally and then install the package as we would a local package binary. Regardless of our operating system, we can also install directly from the console by adding the type = "source" argument when running the install.packages function.
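The console equivalents for local archives can be sketched as follows. The archive names are hypothetical placeholders, and each install runs only if the file exists. Setting repos = NULL tells install.packages to treat its first argument as a local file rather than a package name to look up in a repository.

```r
# Hypothetical file names; substitute the archives you actually have
binaryArchive <- "myPackage_1.0.zip"      # a Windows binary (".tgz" on OS X)
sourceArchive <- "myPackage_1.0.tar.gz"   # a source archive

# repos = NULL: install from a local file, not from a repository
if (file.exists(binaryArchive)) {
  install.packages(binaryArchive, repos = NULL, type = "binary")
}
if (file.exists(sourceArchive)) {
  install.packages(sourceArchive, repos = NULL, type = "source")
}
```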


Tip: Installing from GitHub

The devtools package contains a function, install_github, that installs a package directly from its GitHub repository. You can also use install.packages to install packages from repositories other than CRAN by supplying their addresses via the repos argument.


Loading an R Package

When you start R, only a subset of your installed packages is actually loaded for use within the R session. This helps to reduce startup time and to avoid a behavior called masking. To access the functionality of other installed packages, you must load them into the session. The Packages pane in RStudio lists all the packages that your R session is aware of. To load one of them, you simply check the box next to its name. Checking the box runs a line of R code that calls the library function, which you can also call directly from the R console.
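The following sketch shows what checking a box does behind the scenes. It uses splines only because that package ships with every R installation but is not attached in a default session.

```r
# splines is installed with R but is not attached at startup
"package:splines" %in% search()   # FALSE in a default session

# Load (attach) it: the same call the Packages pane runs for you
library(splines)

# The package now appears on the search path
"package:splines" %in% search()   # TRUE
```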

When developing reusable, production-level code, it is best to avoid untraceable “point-and-click” actions as much as possible. It is standard practice to place calls to the library function at the top of an R script so that other users can run your code. If R cannot find the specified package, library produces an error. The require function is an alternative to library that instead issues a warning and returns FALSE if a package is not present, allowing more control over the behavior of the script, for example, “do this, but only if package X has been successfully loaded.” We will look more closely at errors, warnings, and control flow when we discuss writing R functions in Hour 7, “Writing Functions: Part I,” and Hour 8, “Writing Functions: Part II.” In a professional development environment, checking that the right packages are available is only half the battle. Errors may still occur due to differences in package versions or operating systems, but we’ll come to that later!
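Here is a minimal sketch of the difference between library and require when a package is missing. The name "notARealPackage" is a deliberately nonexistent, hypothetical package; the if/else branch is the kind of control flow covered properly in Hours 7 and 8.

```r
# library() stops with an error when a package is missing,
# so we catch the error here to keep the script running
lib_result <- tryCatch(
  library("notARealPackage"),
  error = function(e) "library() raised an error"
)
lib_result

# require() issues a warning and returns FALSE instead
has_pkg <- suppressWarnings(require("notARealPackage"))

if (has_pkg) {
  message("Package loaded; running the optional step")
} else {
  message("Package not available; skipping the optional step")
}
```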

Package Dependencies

When you’re developing packages, it is highly unlikely you will need to write every function from scratch; more likely, you will use one or more functions defined within another package. Rather than copy all the relevant code into your own package, you simply specify a “dependency” on the other package. This avoids duplication and ensures that bugs need only be fixed in a single location. When you load an R package with a dependency, the dependency is also loaded and, if it is listed in the Depends field of the package’s DESCRIPTION file, added to the search path. Note that this means the dependency must also be installed on your machine.
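A package’s declared dependencies are recorded in its DESCRIPTION file, which you can read from the console. This sketch uses stats4 simply because it is installed with R; any installed package name would work.

```r
# Read the DESCRIPTION metadata of an installed package
desc <- packageDescription("stats4")

# Packages listed under Depends are attached alongside it (NULL if none)
desc$Depends

# Packages listed under Imports are loaded for internal use but not attached
desc$Imports
```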

Masking

Masking occurs when two or more “environments” on the search path contain objects with the same name. Whenever we refer to an object by typing its name, R looks for that object in each environment on the search path in turn, starting with the Global Environment. As soon as R finds an object with that name, it stops searching. Any other objects with the same name further down the search path are therefore hidden, or “masked.”
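The search order can be seen directly by creating an object whose name clashes with one in the base package. This sketch uses pi, and the find function from utils, which lists every search-path environment holding an object of a given name.

```r
# Create an object called pi in the Global Environment
pi <- 3

# Both copies exist: ".GlobalEnv" is listed first, then "package:base"
find("pi")

# R stops at the first match, so base's pi is masked
pi   # 3
```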

We can delete objects from our own workspace, but we cannot delete objects from R packages, only mask them. If you inadvertently mask an object, you can simply clone your object under a different name and use rm to delete the original object from your workspace, thereby unmasking the hidden object.
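The clone-and-remove approach looks like this in practice. The name my_pi is just an illustrative choice.

```r
pi <- 3        # accidentally masks base::pi
my_pi <- pi    # keep our value under a different name
rm(pi)         # delete the masking object from the workspace

pi             # base::pi is visible again: 3.141593
my_pi          # 3
```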


Tip: Ensuring the Right Object Is Used

Masking is much less of a problem than most new users perceive it to be. This is largely due to package namespaces, which we will look at more closely in Hour 19, “Package Building,” and Hour 20, “Advanced Package Building.” To avoid any potential masking issues, it is possible to reference an object within a package directly by using the [packageName]::[objectName] syntax—for example, base::pi.
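As a small illustration of the :: syntax, the sketch below deliberately masks the median function from the stats package and then reaches past the mask.

```r
# Deliberately mask stats::median with our own function
median <- function(x) "not the median you wanted"

median(c(1, 5, 9))          # calls our version
stats::median(c(1, 5, 9))   # always calls the stats version: 5
```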


Internal Help

The help function can be used to display help on a function or indeed on any documented R object. RStudio lets users navigate R’s help files via the Help tab. If the phrase you search for exactly matches the name of an R object available in your current session, the help file for that object is returned. Otherwise, RStudio searches your package libraries (including packages that are not loaded) for possible help pages.


Note: Help from the Console

The RStudio Help pane simply provides wrappers for functionality contained within the utils package. A general search of all help files can be carried out using either the help.search function or its shorthand, ??. Similarly, if you know the name of the object you require help with, you can use the help function or its shorthand, ?.
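The console equivalents are sketched below. Both functions return an object; here we store it, and typing the object’s name (or running ?mean or ??"standard deviation" directly) displays the page or the search results.

```r
# Help on a known topic: equivalent to ?mean
mean_help <- help("mean")

# Search all installed help files: equivalent to ??"standard deviation"
search_hits <- help.search("standard deviation")

# Type mean_help or search_hits at the console to display them
```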


The help files can be a little daunting if you are unfamiliar with the standard terminology, as demonstrated in Figure 2.7, which shows the help file for the mean function referring to terms such as “objects,” “vectors,” and “methods” in several places.


FIGURE 2.7 The help page for the mean function

There is a standard set of fields that package maintainers are encouraged to complete, though few are actually required. For example, to publish a package on CRAN, you must pass what is known as an “R CMD check,” which requires that all the code in the Examples section of each help file runs successfully. However, it is also possible to pass the check by not including an Examples section at all!

Summary

In this hour we looked at the available development environments for R, focusing on the RStudio environment. We looked closely at the makeup of the language and saw how R is constructed from a number of core and recommended packages that can be extended by downloading additional packages from a repository such as CRAN. In the “Workshop” and “Activities” sections, you will load RStudio, begin using the R console, and install your first R package.

In the next two hours we will look at the standard data objects that are the building blocks of the R language, beginning with vectors and working through to R’s data frame structure. You will learn how to create, combine, and subset these structures.

Q&A

Q. I created an object named x using the syntax x <- 5, but when I tried the line X + 2 I got “Error: object ‘X’ not found.” Is that right?

A. If you have been using a language such as SAS, this may seem odd but it is correct. R is case sensitive, so x and X are not the same thing.
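A quick way to confirm this in your own session is sketched below; the error is caught so that the script keeps running.

```r
x <- 5
x + 2          # 7

# X (capital) is a different name, and no such object exists
exists("X")    # FALSE
tryCatch(X + 2, error = function(e) conditionMessage(e))
```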

Q. A colleague sent me an R package as a .zip file, but after unzipping the file I found that I could not install the package. Why is this?

A. R packages are commonly distributed as binaries, which on Windows are “.zip” files. Unless you want to build the package from source yourself, you need to provide R with the binary file itself, which means keeping it zipped up.

Q. Is it possible to install two different versions of the same package to different libraries? If so, what happens when I try to load them?

A. It is entirely possible to install different versions of the same package to different libraries. Unless you specify exactly which one you are loading, R will load the one highest up the library path. Thankfully, you can only load one version of a package at a time. If you do try to load a package that has already been loaded, R does not produce an error or warning, so our advice is to be careful!
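You can inspect the library path, and check which copy of a package R is using, from the console. In the commented line at the end, both the package name and the library path are hypothetical.

```r
# Library locations, searched in this order when loading a package
.libPaths()

# The folder of the copy of stats that R is using, and its version
find.package("stats")
packageVersion("stats")

# To load from one specific library, pass lib.loc (hypothetical example):
# library(myPackage, lib.loc = "C:/R/old-library")
```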

Q. Is it possible to have multiple versions of R installed? If so, how are the package libraries affected?

A. You can have as many versions of R installed on your machine as you like, which is great if you work in a heavily regulated environment and need to ensure you can exactly reproduce results from a time when you were working with an earlier version of R. RStudio lets you switch between R versions via the Tools > Global Options... menu, though you will need to restart RStudio for the change to take effect.

The system library is associated with your version of R and is therefore automatically updated to use the new versions of the core and recommended R packages when you switch to a new version of R. User libraries also default to a version-specific location, so there is little risk of accidentally using packages built for a different version of R. On the flip side, this means that each time you install a new version of R, you will need to reinstall your favorite packages for that version as well.
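To see where these libraries live for the version of R you are running, you can print the following. The exact paths differ by operating system; the version number in the default user library path is typical rather than guaranteed.

```r
# The system library belonging to this version of R
.Library

# The default user library; its path usually ends in the R version
# (for example ".../4.3"), so each R version gets its own
Sys.getenv("R_LIBS_USER")

# The version of R currently running
R.version.string
```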

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. True or false? You must install RStudio in order to work interactively with R.

2. Which of the following is not used for assignment in R?

A. <-

B. _

C. ->

D. =

3. What does the line objects(4) tell you?

4. What is the difference between installing and loading an R package?

5. What is the difference between an .Rhistory and an .RData file?

6. What is masking?

Answers

1. False. There are many ways of working interactively with R, though RStudio is the most popular.

2. The answer is B. However, you might be surprised to learn that prior to R, underscores were used for assignment in S.

3. The line objects(4) produces a list of objects that are contained within the fourth item in the search path. In the example used in this hour, this was graphics, though that might not always be the case. As new packages are loaded, the position of packages in the search path can change.
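Because positions can shift, it is often safer to name the search-path entry rather than rely on its number, as this sketch shows.

```r
search()                          # the full search path
search()[4]                       # the name of the fourth entry
head(objects(4))                  # first few objects in that entry

# Naming the entry avoids depending on its position
head(objects("package:datasets"))
```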

4. Installing an R package creates a permanent directory on your machine. Typically, you only install a package once for a version of R. Loading a package enables you to actually use it within the R session.
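The distinction can be checked from the console. This sketch uses tools, which ships with R but is not attached in a default session; system.file returns an empty string when a package is not installed.

```r
# Installed? system.file() returns "" if the package is not installed
nzchar(system.file(package = "tools"))   # TRUE: tools ships with R

# Loaded and attached? Attached packages appear on the search path
"package:tools" %in% search()            # FALSE until library(tools) is called
```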

5. An .Rhistory file contains a list of commands that were executed during an R session (or sessions). An .RData file stores R objects and can be used to re-create Global Environment objects from a previous R session.

6. Masking occurs when two or more “environments,” typically packages, contain an object with the same name. When you type that name into the console, R finds the object that is higher up the search path. Any objects that are not found are hidden, or “masked.”

Activities

1. Start an R session by opening RStudio.

2. Print the search path for your R session.

3. List all objects from the “datasets” package using the objects function.

4. Use the Packages pane to install the mangoTraining package from CRAN.

5. Load the mangoTraining package into the R session.

6. List the objects the mangoTraining package contains.
