Creating graphs with dates

Sometimes you may wish to create a time series graph with dates along the horizontal axis. We can create such graphs using qplot. Let's try plotting a time series graph. We use the economics dataset that ships with ggplot2 (see http://docs.ggplot2.org/current/economics.html) and plot the population against the date. Let's see the first six rows using head():

head(economics)

Here are the first six rows of the data:

        date   pce    pop psavert uempmed unemploy
1 1967-06-30 507.8 198712     9.8     4.5     2944
2 1967-07-31 510.9 198911     9.8     4.7     2945
3 1967-08-31 516.7 199113     9.0     4.6     2958
4 1967-09-30 513.3 199311     9.8     4.9     3143
5 1967-10-31 518.5 199498     9.7     4.7     3066
6 1967-11-30 526.2 199657     9.4     4.8     3018

Now we look at the last six rows using the following command:

tail(economics)

The output is as follows:

[Figure: the last six rows of the economics dataset, as printed by tail(economics)]

We can see that the economics dataset runs from 1967 to 2007 and stores its dates with hyphens separating the year, month, and day. We wish to plot certain variables by date. Before we plot, note that R's default date format is year-month-day. For example, let's extract the first date in the economics dataset:

economics$date[1]

The output is as follows:

[1] "1967-06-30"

You can use the as.Date() function to tell R how to read a date written in another format. For example, November 3, 2011 may be written as 03/11/2011 (day/month/year). R does not recognize this string as a date until we describe its format. First, let's store the string in R:

date1 <- "03/11/2011" 
date1

The output obtained is as follows:

[1] "03/11/2011"

We cannot use this character string directly as a date, but we can convert it to R's Date class by telling as.Date() how the string is laid out. Enter the following syntax:

date1B <- as.Date(date1, "%d/%m/%Y")
date1B

Now the output is:

[1] "2011-11-03" 

Note the percent signs. The lowercase m stands for the month as a number, the lowercase d stands for the day of the month, and the uppercase Y stands for the four-digit year. Other examples may involve the lowercase y (a two-digit year), the lowercase b (an abbreviation of the name of the month; for example, Mar), or the uppercase B, which refers to the full name of the month. You can convert other formats to the necessary format using as.Date() and these percent codes. For example, you can use the following syntax:

as.Date('12MAR89',format='%d%b%y')
 [1] "1989-03-12"

Now use the following syntax:

as.Date('August 11, 1987',format='%B %d, %Y')
 [1] "1987-08-11"

In these examples, you can see that we recast the given date to the preferred format for R by instructing R how to interpret each component of the given date.
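The conversion also works in the other direction. As a small sketch (the names stamps, d, gap, back, and iso are illustrative and do not come from the text), as.Date() accepts a whole vector of strings, the resulting Date objects support arithmetic, and format() writes them back out using the same percent codes:

```r
# Parse a vector of day/month/year strings into Date objects.
stamps <- c("03/11/2011", "25/12/2011")
d <- as.Date(stamps, format = "%d/%m/%Y")

# Subtracting two dates gives the number of days between them.
gap <- as.numeric(d[2] - d[1])

# format() turns Date objects back into strings in any layout.
back <- format(d, "%d/%m/%Y")  # the original day/month/year layout
iso  <- format(d, "%Y-%m-%d")  # R's default year-month-day layout

print(gap)
print(iso)
```

Numeric codes such as %d, %m, and %Y are used here because month names (%b and %B) depend on the locale of the R session.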

Let's create our graph, placing date as the first argument inside the qplot() command. Enter the following syntax:

qplot(date, pop, data=economics, geom="line", col = I("red"), size = I(2))

You will get this graph:

[Figure: line graph of pop against date for the full economics dataset, drawn in red]

The graph has horizontal axis labels for every decade. For datasets spanning shorter periods of time, qplot may produce default axis labels for each year or even for each month.

Now, let's plot against a particular set of dates that are labeled appropriately. We will select only the data dated after June 1, 2006. We use the subset() command and the comparison operator > to select our set of dates:

econdata <- subset(economics, date > as.Date("2006-6-1"))
econdata

We get the following output:

          date    pce    pop psavert uempmed unemploy
469 2006-06-30 9338.9 299801    -1.7     8.2     7228
470 2006-07-31 9352.7 300065    -1.5     8.4     7116
471 2006-08-31 9348.5 300326    -1.0     8.1     6912
472 2006-09-30 9376.0 300592    -0.8     8.0     6715
473 2006-10-31 9410.8 300836    -0.9     8.2     6826
474 2006-11-30 9478.5 301070    -1.1     7.3     6849
475 2006-12-31 9540.3 301296    -0.9     8.1     7017
476 2007-01-31 9610.6 301481    -1.0     8.1     6865
477 2007-02-28 9653.0 301684    -0.7     8.5     6724
478 2007-03-31 9705.0 301913    -1.3     8.7     6801
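Because Date objects compare like numbers, the same idea extends to date windows. The following is a minimal sketch using a small made-up data frame (toy, after_june, and in_window are illustrative names, and the values are invented):

```r
# A small illustrative data frame with month-end dates (invented values).
toy <- data.frame(
  date  = as.Date(c("2006-04-30", "2006-05-31", "2006-06-30", "2006-07-31")),
  value = c(10, 20, 30, 40)
)

# Keep rows strictly after June 1, 2006, using subset() and >.
after_june <- subset(toy, date > as.Date("2006-06-01"))

# Keep rows inside a closed date window using logical indexing.
in_window <- toy[toy$date >= as.Date("2006-05-01") &
                 toy$date <= as.Date("2006-06-30"), ]

print(after_june)
print(in_window)
```

Note that > excludes the cutoff date itself; use >= if rows dated exactly on the cutoff should be kept.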

Now let's create our graph, a line graph in red with the line width set to 2, using the following syntax:

qplot(date, pop, data=econdata, geom="line", col = I("red"), size = I(2))

Here is the output graph:

[Figure: line graph of pop against date for econdata, drawn in red]

So far, we have plotted one variable (pop). The variables are stored in separate columns (one variable per column), whereas qplot needs all of the values we wish to plot in a single column, with a second column identifying the variable. So, how do we plot two or more of the variables on the same graph? We use the melt() function (provided within the reshape package) to rearrange the data into a format that qplot can use. The reshape package provides functions that enable you to recast data into formats that are suitable for qplot and ggplot. The melt() function creates one new column that stores the variable names and another that stores their values.

To use the functions provided within reshape, first install the package by entering install.packages("reshape") on the command line. Then, load the package using the library() command:

library(reshape)

Now we use the melt() command:

dat <- melt(econdata, id = "date")
head(dat)

The output is as follows:

        date variable  value
1 2006-06-30      pce 9338.9
2 2006-07-31      pce 9352.7
3 2006-08-31      pce 9348.5
4 2006-09-30      pce 9376.0
5 2006-10-31      pce 9410.8
6 2006-11-30      pce 9478.5
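To see what this wide-to-long rearrangement involves, here is a rough base R sketch of the same idea. It is not how melt() is implemented; it only imitates the shape of the result. The data frame wide holds two rows and two variables copied from the econdata output, and the names wide, measure, and long are illustrative:

```r
# Wide data: one column per measured variable.
wide <- data.frame(
  date     = as.Date(c("2006-06-30", "2006-07-31")),
  pop      = c(299801, 300065),
  unemploy = c(7228, 7116)
)

# Every column except the identifier is a measured variable.
measure <- setdiff(names(wide), "date")

# Build one date/variable/value block per measured column,
# then stack the blocks on top of each other.
long <- do.call(rbind, lapply(measure, function(v) {
  data.frame(date = wide$date, variable = v, value = wide[[v]])
}))
long$variable <- factor(long$variable, levels = measure)

print(long)
```

Each date now appears once per variable, which is the layout that lets a single value column be plotted and colored by variable.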

Note that the names of all the variables are now stacked in a single column called variable, with the corresponding values in the column called value. It makes sense to plot population and unemployment together, because both are counts of people (recorded in thousands), whereas the other variables are measured in different units. Therefore, we subset for these two variables only. We use the logical operator for OR (the vertical line) to include data for pop and unemploy together:

datsub <- subset(dat, variable == "pop" |  variable == "unemploy") 

datsub

You will get the following output:

         date variable  value
11 2006-06-30      pop 299801
12 2006-07-31      pop 300065
13 2006-08-31      pop 300326
14 2006-09-30      pop 300592
15 2006-10-31      pop 300836
16 2006-11-30      pop 301070
17 2006-12-31      pop 301296
18 2007-01-31      pop 301481
19 2007-02-28      pop 301684
20 2007-03-31      pop 301913
41 2006-06-30 unemploy   7228
42 2006-07-31 unemploy   7116
43 2006-08-31 unemploy   6912
44 2006-09-30 unemploy   6715
45 2006-10-31 unemploy   6826
46 2006-11-30 unemploy   6849
47 2006-12-31 unemploy   7017
48 2007-01-31 unemploy   6865
49 2007-02-28 unemploy   6724
50 2007-03-31 unemploy   6801
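When more than two variables are wanted, chaining OR conditions becomes unwieldy. One possible alternative, sketched here on a tiny stand-in data frame (long_toy and keep are illustrative names; the values are the first-row figures from the econdata output), is the %in% operator:

```r
# A tiny stand-in for the melted data: one row per variable.
long_toy <- data.frame(
  variable = c("pce", "pop", "psavert", "unemploy"),
  value    = c(9338.9, 299801, -1.7, 7228)
)

# The variables we want to keep.
keep <- c("pop", "unemploy")

# Chained OR conditions, as in the text.
sub_or <- subset(long_toy, variable == "pop" | variable == "unemploy")

# The same selection written with %in%.
sub_in <- subset(long_toy, variable %in% keep)

print(identical(sub_or, sub_in))
```

Both forms select the same rows; %in% simply scales better as the list of variables grows.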

Now we make the variables of this object visible to R by name using attach():

attach(datsub)

Now let's use qplot to plot the two series, mapping a color to each variable:

qplot(date, value, data = datsub, geom = "point", size = I(3), color = variable)

Here is our graph:

[Figure: scatterplot of value against date for pop and unemploy, colored by variable]

These two series are of different magnitudes, but at least we have included them on the same graph. Note that the date axis is labeled at quarterly intervals (every three months), with each label giving the month.

Navigate to http://docs.ggplot2.org/current/, and refer to scale_x_date for examples of plotting multiple time series on a single graph.

One last example will suffice to illustrate the formatting options available through qplot. We load the scales library in order to access various date formatting functions. The scales library enables us to choose the format we want for labels on our time series graphs. For example, we may wish to provide axis labels in the format month/day. We use scale_x_date() to do this job:

library(scales)

W <- qplot(date, value, data = datsub, geom = "point", size = I(3), color = variable)

W + scale_x_date(labels = date_format("%m/%d"))

The graph looks like this:

[Figure: the same scatterplot with date axis labels in month/day format]

Our graph includes dates (quarterly) according to the required format: month/day.
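The spacing of the labels can be controlled as well as their format. The sketch below assumes a version of ggplot2 in which scale_x_date() accepts the date_breaks and date_labels arguments; it uses ggplot() rather than qplot(), and the names recent and p as well as the upper date bound are illustrative choices, not part of the example above:

```r
library(ggplot2)

# A short slice of the economics data. The upper bound is an
# illustrative assumption that keeps the slice small in any
# version of the dataset.
recent <- subset(economics,
                 date > as.Date("2006-06-01") & date < as.Date("2007-04-01"))

# One label per month, written as month/day.
p <- ggplot(recent, aes(x = date, y = pop)) +
  geom_point(size = 3) +
  scale_x_date(date_breaks = "1 month", date_labels = "%m/%d")

p
```

Here date_labels takes the same percent codes used with as.Date(), so no separate formatting function is needed.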
