Sometimes, you may wish to create a time series graph with dates along the horizontal axis. We can create such graphs using qplot. We use the economics dataset supplied with ggplot2 (see http://docs.ggplot2.org/current/economics.html) and plot the population against the date. Let's see the first six rows using head():
head(economics)
Here are the first six rows of the data:
        date   pce    pop psavert uempmed unemploy
1 1967-06-30 507.8 198712     9.8     4.5     2944
2 1967-07-31 510.9 198911     9.8     4.7     2945
3 1967-08-31 516.7 199113     9.0     4.6     2958
4 1967-09-30 513.3 199311     9.8     4.9     3143
5 1967-10-31 518.5 199498     9.7     4.7     3066
6 1967-11-30 526.2 199657     9.4     4.8     3018
Now we look at the last six rows using the following command:
tail(economics)
The output is as follows:
We can see that the economics dataset runs from the year 1967 to 2007 and contains dates in a particular format (hyphens separate the year, month, and day). We wish to plot certain variables by date. However, before we plot, note that R likes dates in the format year-month-day. For example, let's extract the first date in the economics dataset:
economics$date[1]
The output is as follows:
[1] "1967-06-30"
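As a brief aside, the following sketch (base R only, with variable names of our own choosing) suggests why it helps to hold dates as Date objects rather than as plain text. A string already in year-month-day form converts without a format argument, and the result supports arithmetic and comparison.

```r
# Hypothetical variable names; the date matches the first row of economics.
d <- as.Date("1967-06-30")   # year-month-day needs no format argument
class(d)                     # "Date"
next_day <- d + 1            # adding 1 moves forward by one day
next_day > d                 # TRUE: Date objects can be compared
format(next_day)             # "1967-07-01"
```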
You can use the as.Date() function to ensure that R understands a particular format. For example, November 3, 2011 may be expressed as 03/11/2011. However, R does not yet understand this format. Let's read this format into R.
date1 <- "03/11/2011"
date1
The output obtained is as follows:
[1] "03/11/2011"
We cannot use this format directly, but we can express the date in the format in which R likes dates. Enter the following syntax:
date1B <- as.Date(date1, "%d/%m/%Y")
date1B
Now the output is:
[1] "2011-11-03"
Note the percentage signs. The lowercase m stands for the month, the lowercase d stands for the day of the month, and finally the uppercase Y stands for the four-digit year. Other examples may involve the lowercase b (an abbreviation of the name of the month; for example, Mar), the uppercase B, which refers to the full name of the month, or the lowercase y, which refers to a two-digit year. You can convert other formats to the necessary format using as.Date() and percentage signs. For example, you can use the following syntax:
as.Date('12MAR89',format='%d%b%y')
[1] "1989-03-12"
Now use the following syntax:
as.Date('August 11, 1987',format='%B %d, %Y')
[1] "1987-08-11"
In these examples, you can see that we recast the given date to the preferred format for R by instructing R how to interpret each component of the given date.
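To underline how much depends on the format string, here is a minimal sketch in base R (the variable names are ours). The same text yields two different dates depending on whether we tell R the day or the month comes first, and format() converts a Date back into text. Keep in mind that %b and %B match month names in the current locale, so the sketch sticks to numeric fields.

```r
# Same text, two interpretations, depending on the format string.
txt <- "03/11/2011"
day_first   <- as.Date(txt, format = "%d/%m/%Y")  # 3 November 2011
month_first <- as.Date(txt, format = "%m/%d/%Y")  # 11 March 2011
format(day_first)     # "2011-11-03"
format(month_first)   # "2011-03-11"
# format() goes the other way, turning a Date back into text.
format(day_first, "%d/%m/%Y")  # "03/11/2011"
```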
Let's create our graph, placing date as the first argument inside the qplot() command. Enter the following syntax:
qplot(date, pop, data=economics, geom="line", col = I("red"), size = I(2))
You will get this graph:
The graph has horizontal axis labels for every decade. For datasets spanning shorter periods of time, qplot may produce default axis labels for each year or even for each month.
Now, let's plot against a particular set of dates that are labeled appropriately. We will select only data pertaining to 2006-6-1 and after. We use the subset() command and the comparison operator > to select our set of dates:
econdata <- subset(economics, date > as.Date("2006-6-1"))
econdata
We get the following output:
          date    pce    pop psavert uempmed unemploy
469 2006-06-30 9338.9 299801    -1.7     8.2     7228
470 2006-07-31 9352.7 300065    -1.5     8.4     7116
471 2006-08-31 9348.5 300326    -1.0     8.1     6912
472 2006-09-30 9376.0 300592    -0.8     8.0     6715
473 2006-10-31 9410.8 300836    -0.9     8.2     6826
474 2006-11-30 9478.5 301070    -1.1     7.3     6849
475 2006-12-31 9540.3 301296    -0.9     8.1     7017
476 2007-01-31 9610.6 301481    -1.0     8.1     6865
477 2007-02-28 9653.0 301684    -0.7     8.5     6724
478 2007-03-31 9705.0 301913    -1.3     8.7     6801
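The date comparison inside subset() can also be written with ordinary logical indexing. The sketch below uses a small made-up data frame (not the economics data) and hypothetical names, only to show that both forms keep the same rows.

```r
# Toy data frame (made-up values) standing in for economics.
toy <- data.frame(
  date  = as.Date(c("2006-04-30", "2006-05-31", "2006-06-30", "2006-07-31")),
  value = c(10, 20, 30, 40)
)
cutoff <- as.Date("2006-06-01")
by_subset  <- subset(toy, date > cutoff)   # the form used in the text
by_bracket <- toy[toy$date > cutoff, ]     # equivalent logical indexing
nrow(by_subset)                            # 2
all(by_subset$date == by_bracket$date)     # TRUE: both keep the same rows
```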
Now let's create our graph, a red line graph drawn with a size of 2 (thicker than the default), using the following syntax:
qplot(date, pop, data=econdata, geom="line", col = I("red"), size = I(2))
Here is the output graph:
So far, we have plotted one variable (pop). However, the variables are stored in separate columns (one variable to each column), whereas qplot needs the values of all the variables we wish to plot in a single column, with a second column identifying the variable to which each value belongs. So, how do we plot two or more of the variables on the same graph? To create graphs of two or more variables in our dataset (pce, pop, psavert, and so on), we use the melt() function (provided within the reshape package) in order to configure the data into a format that qplot can use. The reshape package provides functions that enable you to recast data into formats that are suitable for qplot and ggplot. The melt() function creates a new column that stores the variable names. To use the functions provided within reshape, first install the reshape package by entering install.packages("reshape") on the command line. Then, load the reshape package using the library() command:
library(reshape)
Now we use the melt() command:
dat <- melt(econdata, id = "date")
head(dat)
The output is as follows:
        date variable  value
1 2006-06-30      pce 9338.9
2 2006-07-31      pce 9352.7
3 2006-08-31      pce 9348.5
4 2006-09-30      pce 9376.0
5 2006-10-31      pce 9410.8
6 2006-11-30      pce 9478.5
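To make the reshaping less mysterious, here is a rough base R sketch of the same wide-to-long idea. It is not how melt() is implemented; it uses a toy data frame with made-up values and names of our own choosing, and simply stacks every non-id column into a single value column alongside a variable column.

```r
# Toy wide data (made-up values): one column per variable.
wide <- data.frame(
  date     = as.Date(c("2006-06-30", "2006-07-31")),
  pop      = c(100, 101),
  unemploy = c(7, 8)
)
measure <- setdiff(names(wide), "date")   # every column except the id
long <- data.frame(
  date     = rep(wide$date, times = length(measure)),
  variable = factor(rep(measure, each = nrow(wide)), levels = measure),
  value    = unlist(wide[measure], use.names = FALSE)
)
long   # four rows: two dates for pop, then two dates for unemploy
```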
Note that the variable names are now stacked in a single column named variable, with the corresponding values in the column named value. It makes sense to plot both population and unemployment together, because they are related variables and because the other variables exist on completely different scales. Therefore, we subset for these two variables only. We use the logical operator for OR (the vertical line) to include data for pop and unemploy together:
datsub <- subset(dat, variable == "pop" | variable == "unemploy")
datsub
You will get the following output:
         date variable  value
11 2006-06-30      pop 299801
12 2006-07-31      pop 300065
13 2006-08-31      pop 300326
14 2006-09-30      pop 300592
15 2006-10-31      pop 300836
16 2006-11-30      pop 301070
17 2006-12-31      pop 301296
18 2007-01-31      pop 301481
19 2007-02-28      pop 301684
20 2007-03-31      pop 301913
41 2006-06-30 unemploy   7228
42 2006-07-31 unemploy   7116
43 2006-08-31 unemploy   6912
44 2006-09-30 unemploy   6715
45 2006-10-31 unemploy   6826
46 2006-11-30 unemploy   6849
47 2006-12-31 unemploy   7017
48 2007-01-31 unemploy   6865
49 2007-02-28 unemploy   6724
50 2007-03-31 unemploy   6801
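When more than a couple of variables are wanted, chaining OR conditions becomes unwieldy. One possible alternative, sketched here on a toy long-format data frame with made-up values, is the %in% operator, which tests membership in a vector of names.

```r
# Toy long-format data (made-up values) with three variables.
long <- data.frame(
  variable = c("pce", "pop", "unemploy", "pop"),
  value    = c(1, 2, 3, 4)
)
with_or <- subset(long, variable == "pop" | variable == "unemploy")
with_in <- subset(long, variable %in% c("pop", "unemploy"))
nrow(with_in)                         # 3
all(with_or$value == with_in$value)   # TRUE: both keep the same rows
```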
Now we make the columns of this object visible to R by name using attach(). This step is optional here, because we also pass data = datsub to qplot():
attach(datsub)
Now let's use qplot to plot the two series as points, mapping a color to each variable:
qplot(date, value, data = datsub, geom = "point", size = I(3), group = variable, color = variable)
Here is our graph:
These two series are of different magnitudes, but at least we have included them on the same graph. Note that the date axis is labeled at quarterly intervals, with each label giving the month.
Navigate to http://docs.ggplot2.org/current/, and refer to scale_x_date for examples of plotting multiple time series on a single graph.
One last example will suffice to illustrate the formatting options available through qplot. We load the scales package in order to access various date formatting functions. The scales package enables us to choose the format we want for labels on our time series graphs. For example, we may wish to provide axis labels in the format month/day. We use scale_x_date() to do this job:
library(scales)
W <- qplot(date, value, data = datsub, geom = "point", size = I(3), group = variable, color = variable)
W + scale_x_date(labels = date_format("%m/%d"))
The graph looks like this:
Our graph includes dates (quarterly) according to the required format: month/day.
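The format string passed to date_format() uses the same percentage codes as as.Date(). As a small sketch in base R, and assuming the axis labels are produced by formatting each tick date in this way, we can preview what a given pattern yields. The tick dates below are hypothetical, chosen at quarter boundaries.

```r
# Hypothetical tick positions at the start of three successive quarters.
ticks <- as.Date(c("2006-07-01", "2006-10-01", "2007-01-01"))
format(ticks, "%m/%d")   # "07/01" "10/01" "01/01"
format(ticks, "%b %Y")   # month abbreviation and year; text depends on locale
```

Trying patterns this way before adding them to a plot is a quick check that the labels will read as intended.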