R is a popular statistical software package, primarily because it is freely available at www.r-project.org. As a result, many instructors as well as many of the more sophisticated statistical practitioners are switching to it. We have found that using R makes sense with graduate students who are already familiar with statistical methodology, especially those students with some experience using more sophisticated statistical software packages such as SAS. We personally recommend using less sophisticated and fully supported statistical software packages such as Minitab and SAS-JMP for undergraduates and those new to formal statistical analysis. However, we realize that some instructors prefer to use R even for these less sophisticated students. As a result, we created this appendix to introduce some of the basics of R.
According to the project's webpage:
R is a very sophisticated statistical software environment, even though it is freely available. The contributors include many of the top researchers in statistical computing. In many ways, it reflects the very latest statistical methodologies. On the other hand, the contributors truly form a community that is quite fluid. It can take quite a bit of work to keep current with the latest features of R. The help documentation with the basic releases is really of limited value. Of course, in many ways, you get what you pay for!
R itself is a high-level programming language. Most of its commands are pre-written functions. It does have the ability to run loops and call other routines, for example, in C. Since it is primarily a programming language, it often presents challenges to novice users.
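As a minimal sketch of what this looks like in practice, the following code defines a small function that uses a loop and compares it with one of R's pre-written functions. The names loop_sum and values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not from the text): add up a numeric vector
# with an explicit for loop.
loop_sum <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

values <- c(1.5, 2.5, 4)

print(loop_sum(values))  # result from the explicit loop
print(sum(values))       # same result from the built-in sum() function
```

In most cases the pre-written function is both shorter and faster than the explicit loop.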
The best way to understand R is through examples. We present here some of the R code used throughout the text. We can illustrate many of the basic features of data entry and data manipulation with the vapor pressure data set in Exercise 5.2. The data are:
Temp | vp |
273 | 4.6 |
283 | 9.2 |
293 | 17.5 |
303 | 31.8 |
313 | 55.3 |
323 | 92.5 |
333 | 149.4 |
343 | 233.7 |
353 | 355.1 |
363 | 525.8 |
373 | 760.0 |
The brute force way to enter the data uses the c() function:
temp <- c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373)
vp <- c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
To check your data entry, you can use the print() function. In our case,
print(temp)
print(vp)
The resulting output is:
> print(temp)
[1] 273 283 293 303 313 323 333 343 353 363 373
> print(vp)
[1]   4.6   9.2  17.5  31.8  55.3  92.5 149.4 233.7 355.1 525.8 760.0
For small data sets, the brute force approach works well. For larger data sets, we recommend the read.table() function, which reads a text file with the data arranged in columns. Generally, the first row is a “header” giving the variable names. Let vapor.txt be such a file for the vapor pressure data. The first step is to change the working directory for R to the directory that contains the data file. You can do this from the File menu. The following command then reads the data file and places the data into the object vapor.
vapor <- read.table("vapor.txt", header=TRUE, sep="")
To check the contents of vapor, we can use the print() function. The resulting output is:
> print(vapor)
   temp    vp
1   273   4.6
2   283   9.2
3   293  17.5
4   303  31.8
5   313  55.3
6   323  92.5
7   333 149.4
8   343 233.7
9   353 355.1
10  363 525.8
11  373 760.0
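The following is a self-contained sketch of the same steps. Because we cannot assume that vapor.txt exists on the reader's machine, the code first writes a three-row stand-in file to a temporary location. The names data_file and vapor_demo are hypothetical.

```r
# Write a small stand-in data file so the example runs anywhere.
# Assumption: a temporary path replaces vapor.txt in the working directory.
data_file <- tempfile(fileext = ".txt")
writeLines(c("temp vp",
             "273 4.6",
             "283 9.2",
             "293 17.5"), data_file)

# getwd() reports the current working directory;
# setwd("some/directory") would change it from the command line.
print(getwd())

# header = TRUE takes the variable names from the first row;
# sep = "" splits the fields on any white space.
vapor_demo <- read.table(data_file, header = TRUE, sep = "")

print(vapor_demo)
str(vapor_demo)  # shows the column names and their types
```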
If we read the data from a file, then we cannot refer to the temperatures as temp even though temp was the name of the column in the original data file; rather, we must also specify the object that contains it. The following command prints the temp column of the vapor object.
> print(vapor$temp)
[1] 273 283 293 303 313 323 333 343 353 363 373
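The $ operator is one of several ways to reach a column. The sketch below rebuilds the vapor object with data.frame() so that it runs without the data file, and then shows a few equivalent forms of access.

```r
# Rebuild the vapor object directly (stands in for the read.table() call).
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)

print(vapor$temp)      # column by name with $
print(vapor[["vp"]])   # column by name with [[ ]]
print(vapor[1:3, ])    # first three rows, all columns

# with() evaluates an expression inside the object, so the
# column names can be used without the vapor$ prefix.
mean_temp <- with(vapor, mean(temp))
print(mean_temp)
```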
Basic physical chemistry suggests modeling the natural log of the vapor pressure as a linear function of the inverse of the temperature. The following commands create the inverse of the temperatures and then print them.
> inv_temp <- 1/vapor$temp
> print(inv_temp)
[1] 0.003663004 0.003533569 0.003412969 0.003300330 0.003194888 0.003095975
[7] 0.003003003 0.002915452 0.002832861 0.002754821 0.002680965
The log() function generates the natural log. The following commands create the natural log of the vapor pressures and then print them.
> log_vp <- log(vapor$vp)
> print(log_vp)
[1] 1.526056 2.219203 2.862201 3.459466 4.012773 4.527209 5.006627 5.454038
[9] 5.872399 6.264921 6.633318
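As a sketch of where these transformed variables lead, R's lm() function can fit the straight-line model of the log vapor pressure on the inverse temperature. The code enters the data directly so that it is self-contained, and the object name fit is an arbitrary choice.

```r
temp <- c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373)
vp <- c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)

inv_temp <- 1 / temp   # inverse temperature
log_vp <- log(vp)      # natural log of the vapor pressure

# Least squares fit of log_vp = b0 + b1 * inv_temp.
fit <- lm(log_vp ~ inv_temp)

print(coef(fit))                # estimated intercept and slope
print(summary(fit)$r.squared)   # proportion of variation explained
```

Calling summary(fit) on its own prints the full table of estimates, standard errors, and test statistics.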
Another useful command for regression analysis is the sqrt() function, which is applied to an object in exactly the same way as the log() function.
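A brief sketch with the first three vapor pressures shows that sqrt(), like log(), works on every element of a vector at once. The log10() function is included only for comparison and is not otherwise used in this appendix.

```r
vp <- c(4.6, 9.2, 17.5)

sqrt_vp <- sqrt(vp)     # element-wise square root
log10_vp <- log10(vp)   # base-10 log; log() itself is the natural log

print(sqrt_vp)
print(log10_vp)
```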
R does generate plots, but it takes a great deal of work to make good-looking plots. The basic plot function is plot(x,y), where x is the object on the x-axis and y is the object on the y-axis. The following command generates the scatter plot of vapor pressure against temperature.
> plot(vapor$temp,vapor$vp)
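Some of the extra work goes into labels and titles. The sketch below relies on the fact that plot() takes the horizontal-axis variable as its first argument, adds axis labels and a title, and saves the result to a PDF file. The label text, the plotting symbol, and the temporary file plot_file are our own choices for illustration.

```r
temp <- c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373)
vp <- c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)

# Assumption: the plot is written to a temporary PDF file;
# omit pdf() and dev.off() to draw on the screen instead.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

plot(temp, vp,
     xlab = "Temperature",        # x-axis label
     ylab = "Vapor pressure",     # y-axis label
     main = "Vapor pressure data",
     pch = 19)                    # solid circles as the plotting symbol

invisible(dev.off())              # close the file so it is written to disk
```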
The write.table() function generates an output data file that can be used with other plotting software. The following code appends the inverse temperatures and the natural logs of the vapor pressures to the original data to form a new object vapor2 and then creates the output data file vapor_output.txt.
> vapor2 <- cbind(vapor,inv_temp,log_vp)
> write.table(vapor2,"vapor_output.txt")
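One way to check an output file is to read it back in. The following sketch does so with a three-row stand-in for vapor2, written to a temporary file instead of vapor_output.txt. The names vapor2_demo, out_file, and check are hypothetical, and the row.names and quote arguments are optional settings that leave the row numbers and quotation marks out of the file.

```r
# Three-row stand-in for the vapor2 object.
vapor2_demo <- data.frame(temp = c(273, 283, 293),
                          vp = c(4.6, 9.2, 17.5))
vapor2_demo$inv_temp <- 1 / vapor2_demo$temp
vapor2_demo$log_vp <- log(vapor2_demo$vp)

# Assumption: a temporary file stands in for vapor_output.txt.
out_file <- tempfile(fileext = ".txt")
write.table(vapor2_demo, out_file, row.names = FALSE, quote = FALSE)

# Read the file back in to confirm its contents.
check <- read.table(out_file, header = TRUE)
print(check)
```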
R does a very nice job manipulating matrices. This textbook, however, uses statistical software to perform the matrix calculations “under the hood,” so to speak. The text does show the matrix formulations of the procedures we discuss. However, we do not expect students to perform these calculations directly. As a result, we consider an introduction to the matrix manipulations within R beyond our scope. As appropriate, the text does give the basic R code to perform analyses. We leave it to the course instructor to present the details of the matrix manipulations within R.
R Commander is an add-on package to R. It also is freely available. It provides an easy-to-use user interface, much like Minitab and JMP, to the parent R product. R Commander makes it much more convenient to use R; however, it does not provide much flexibility in its analysis. For example, R Commander does not allow the user to use the externally studentized residual for the residual plots. R Commander is a good way for users to get familiar with R. Ultimately, however, we recommend the use of the parent R product.