Getting started with dplyr

To start, I will create an R script called dplyr_intro.R and set up my R environment. First, set your working directory to the ch7 project folder. Next, read the fuel economy dataset (https://catalog.data.gov/dataset/consumer-price-index-average-price-data) into a dataframe as follows:

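## point R at the project folder (replace with your own path)
setwd("path/to/your/project/folder")
## read the CSV file into a dataframe
vehicles <- read.csv("data/vehicles.csv")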

The next step is to load the dplyr and tibble packages. In R, you load an installed package using the library() function. The following lines load the dplyr package and the tibble package:

library('dplyr')
library('tibble')

I will start with the select() function. The select() function picks out one or more columns from a dataframe and returns a new dataframe containing only those columns. Its first argument is the dataframe itself. The remaining arguments are the names of the columns to be selected from that dataframe.

Though I won't be using them here, dplyr provides a few functions to make it possible to select columns as a function of their name or position. These include starts_with(), ends_with(), contains(), matches() and num_range(). These functions can be helpful when working with datasets that contain a large number of columns. Details are available in the documentation at http://dplyr.tidyverse.org/reference/select.html.
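As a rough illustration of how these helpers behave, the following sketch applies two of them to a small made-up dataframe. The toy dataframe and its column names (mpg_city, mpg_highway) are invented for this example and are not taken from the fuel economy dataset:

```r
library(dplyr)

## a made-up dataframe; these column names are hypothetical
toy <- data.frame(
  make = c("Audi", "Toyota"),
  model = c("A4", "Camry"),
  mpg_city = c(24, 28),
  mpg_highway = c(32, 39),
  year = c(2015, 2016)
)

## keep every column whose name begins with "mpg"
toy.mpg <- select(toy, starts_with("mpg"))

## keep every column whose name contains "way"
toy.way <- select(toy, contains("way"))

print(names(toy.mpg))
print(names(toy.way))
```

The first call should keep both mpg columns, while the second should keep only mpg_highway. The helpers are written inside the select() call in place of explicit column names.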

With the fuel economy dataset, one use of the select() function might be to create a dataframe that just contains the product details for each car. In the following continuation of dplyr_intro.R, I use the select() function to select the make, model, and year columns from the vehicles dataframe: 

vehicles.product <- select(vehicles,make,model,year)
print(vehicles.product)

Running the previous lines should create a dataframe containing just the product information and produce a printout of the dataframe as follows:

You can make the printout a bit more elegant by converting the result to a tibble object as follows:

## the same thing as a tibble
vehicles.product.as.tibble <- as_tibble(
  select(vehicles,make,model,year)
)
print(vehicles.product.as.tibble)

Printing out the tibble version of the dataframe, as is done in the previous lines, will produce a much cleaner result, as follows:

Now that the product information for each of the data entries is available, it may be helpful to arrange the rows in a particular order. Arranging the order of the rows could be particularly helpful if the data will need to be viewed or processed manually.

The arrange() function arranges the rows in a particular order by column. The following code will arrange the vehicle product data first by make, then by model, and then by year:

vehicles.product.arranged <- as_tibble(
  arrange(vehicles.product,make,model,year)
)
print(vehicles.product.arranged)

The arranged version of the product data is organized in alphabetical order by make and model, and then by year, as follows:
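The real printout depends on the contents of the dataset. As a small self-contained sketch of the same ordering, the following code sorts a handful of made-up rows (the values are invented for illustration). It also shows desc(), a dplyr function not used elsewhere in this section, which reverses the sort order of a single column:

```r
library(dplyr)

## made-up rows, deliberately out of order
toy <- data.frame(
  make = c("Toyota", "Audi", "Toyota", "Audi"),
  model = c("Corolla", "A4", "Camry", "A4"),
  year = c(2010, 2012, 2011, 2009),
  stringsAsFactors = FALSE
)

## sort by make, then model, then year (all ascending)
toy.sorted <- arrange(toy, make, model, year)

## same, but with the most recent year first within each make and model
toy.sorted.desc <- arrange(toy, make, model, desc(year))

print(toy.sorted)
print(toy.sorted.desc)
```

In both results, the two Audi A4 rows come first, followed by the Toyota Camry and then the Toyota Corolla. Only the order of the two A4 rows differs between the results.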

So far, I've performed two operations on the original fuel economy data. First, I selected the columns relevant to the vehicle product information. Then I arranged the rows by make, model, and year. Conducting these steps required two separate chunks of code and two separate variables. The following is what the code looks like so far:

## select the product info for each car
vehicles.product <- as_tibble(
  select(vehicles,make,model,year)
)

## arrange the rows of the vehicle product data
vehicles.product.arranged <- as_tibble(
  arrange(vehicles.product,make,model,year)
)

This may seem fairly legible, but when there are several consecutive processing steps, the code can become quite hard to follow. In the following section, I will introduce a new syntax for chaining operations together, which makes longer sequences of steps more legible.
