Scatterplot matrices of bivariate data help identify relationships between variables in a dataset. We can create scatterplot matrices using pairs() with a formula: a tilde sign, followed by the desired variables joined with plus signs:
pairs(~WEIGHT_1 + WEIGHT_2 + HEIGHT, data=T, main="Scatterplot Matrix of Medical Data")
This syntax gives the following matrix:
If we want a smooth curve (LOWESS) in each bivariate plot, we include the argument panel=panel.smooth:
pairs(~WEIGHT_1 + WEIGHT_2 + HEIGHT, data=T, main="Scatterplot Matrix of Medical Data", panel = panel.smooth)
The matrix obtained is as follows:
We see a strong linear relationship between WEIGHT_1 and WEIGHT_2, and curved relationships between those variables and HEIGHT.
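To try the same calls without the medical dataset, here is a minimal sketch that substitutes R's built-in mtcars data frame. The columns mpg, wt, and hp are stand-ins chosen for illustration, not the variables used above. The sketch also prints the pairwise correlations, which give a rough numeric check on what the panels suggest.

```r
# Stand-in data: the built-in mtcars data frame replaces the dataset T.
# The column names below are illustrative, not the chapter's variables.
vars <- c("mpg", "wt", "hp")

# Formula interface: the tilde introduces the variables, plus signs join them.
# panel.smooth adds a LOWESS curve to every off-diagonal panel.
pairs(~ mpg + wt + hp, data = mtcars,
      main = "Scatterplot Matrix of mtcars",
      panel = panel.smooth)

# Pairwise Pearson correlations as a numeric companion to the plots.
cor_matrix <- cor(mtcars[, vars])
print(round(cor_matrix, 2))
```

A correlation near 1 or -1 points to a strong linear relationship, while a clearly bent LOWESS curve suggests that a straight line may not describe the pair well.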
Why not create functions to draw graphs? Here's a function for histograms of vectors of data with standard titles and labels. It allows you to add numbers to the title and axis labels and choose the color. We use the function() command to set up a function that provides the attributes of our choice (for example, the title, the axis labels, and the color). Enter the following function on the R command line:
nicehist <- function(x, k, col) {
  hist(x,
       main = paste("HISTOGRAM_", k, sep = ""),
       xlab = paste("VALUES_", k, sep = ""),
       ylab = paste("COUNTS_", k, sep = ""),
       col = col)
}
f <- c(3,2,5,4,3,2,7,6,5,7,8,6,4,5,6)
Let's include 3 in the title and create a red histogram. Enter the following syntax:
nicehist(f, 3, "red")
The histogram obtained is as follows:
Let's include 99 in the title and create a light purple histogram.
nicehist(f, 99, "#FFCCFF")
The histogram now looks like this:
You can see that creating a function is a good idea if you have to create many similar plots and need to save time. Of course, you can create more complex functions to create other types of graphs.
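As one possible step toward a more complex function, the sketch below adds a default color, passes extra arguments through to hist(), and returns the histogram object so that the bin counts can be inspected. The name nicehist2 and the helper label() are hypothetical and used only for illustration.

```r
# Hypothetical variant of the histogram function above:
#   col has a default value, and ... forwards extra arguments to hist().
nicehist2 <- function(x, k, col = "grey", ...) {
  # Small helper that appends k to a label prefix.
  label <- function(prefix) paste0(prefix, k)
  h <- hist(x,
            main = label("HISTOGRAM_"),
            xlab = label("VALUES_"),
            ylab = label("COUNTS_"),
            col = col, ...)
  invisible(h)  # return the "histogram" object without printing it
}

f <- c(3, 2, 5, 4, 3, 2, 7, 6, 5, 7, 8, 6, 4, 5, 6)

# breaks is passed through ... to hist() as a suggested number of bins.
h <- nicehist2(f, 3, col = "red", breaks = 5)

# Every observation falls in exactly one bin, so the counts sum to length(f).
total <- sum(h$counts)
```

Returning the object invisibly keeps the console quiet while still letting you reuse the bin boundaries (h$breaks) and counts (h$counts) later.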
Here is a function that I wrote to plot error bars on your graphs. Copy and paste it into R.
ploterrors <- function(w, z, err) {
  # Lower and upper ends of each error bar
  zmin <- z - err
  zmax <- z + err
  # Half-width of each hat, as a fraction of the range of w
  HATWIDTH <- 0.012
  HAT <- HATWIDTH * (max(w) - min(w))
  for (k in seq_along(z)) {
    # Vertical bar from the point down to zmin and up to zmax
    lines(c(w[k], w[k]), c(z[k], zmin[k]), lwd = 0.8)
    lines(c(w[k], w[k]), c(z[k], zmax[k]), lwd = 0.8)
    # Horizontal hats at the bottom and top of the bar
    lines(c(w[k] - HAT, w[k] + HAT), c(zmin[k], zmin[k]), lwd = 0.8)
    lines(c(w[k] - HAT, w[k] + HAT), c(zmax[k], zmax[k]), lwd = 0.8)
  }
}
Note that you can change the width parameter of the hat from my preferred value of 0.012 to some other number that gives you your preferred width. You can also vary the line width by changing lwd from 0.8 to your chosen value. Now, we will set up some data and a set of errors, and then we will plot:
X <- c(1,2,3,4,5,6,7,8)
Y <- c(1,2,3.4,5.6,7.8,10.3, 15.7, 18.3)
ERROR <- c(0.5, 1.2, 0.23, 2.21, 1.43, 1.28 , 2.18, 1.41)
plot(X, Y, xlab = "X VALUES", ylab = "Y VALUES", pch = 16)
lines(X, Y)
Now, we add the error bars by calling the function with the horizontal and vertical axis variables and the vector of errors:
ploterrors(X, Y, ERROR)
We get the following graph that includes the data and error bars:
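As an alternative sketch, not the function given above, base R's arrows() can draw the same kind of bar in a single vectorized call. The example also sets ylim from the bar ends, since plot() otherwise computes the vertical limits from Y alone and a bar that extends past the largest or smallest value may be clipped. The variable y_limits and the hat length of 0.05 inches are assumptions made for this illustration.

```r
X <- c(1, 2, 3, 4, 5, 6, 7, 8)
Y <- c(1, 2, 3.4, 5.6, 7.8, 10.3, 15.7, 18.3)
ERROR <- c(0.5, 1.2, 0.23, 2.21, 1.43, 1.28, 2.18, 1.41)

# Widen the vertical axis so that it spans every bar end.
y_limits <- range(c(Y - ERROR, Y + ERROR))

plot(X, Y, xlab = "X VALUES", ylab = "Y VALUES", pch = 16, ylim = y_limits)
lines(X, Y)

# code = 3 draws a head at both ends of each segment;
# angle = 90 flattens each head into a horizontal hat.
# length is the hat size in inches (0.05 is an arbitrary choice).
arrows(X, Y - ERROR, X, Y + ERROR,
       angle = 90, code = 3, length = 0.05, lwd = 0.8)
```

One design difference: the hat width here is fixed in inches rather than scaled to the range of the horizontal variable, so it does not change when the data range changes.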