Creating histograms

In ggplot2, we create histograms with geom_histogram(). A histogram records frequencies for a continuous variable by dividing its range into bins of a particular width. Using the medical dataset (held in the data frame T), the following syntax creates a basic histogram of patient height with the bin width set to 10 cm. As before, you can read the data by copying and pasting it from the code file for this chapter. The syntax is as follows:

ggplot(T, aes(x=HEIGHT)) + geom_histogram(binwidth=10)  
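If the chapter's code file is not at hand, the same call can be tried on a small stand-in data frame. The sketch below is only an illustration: the data frame patients and its simulated values are hypothetical, and only the column names HEIGHT, GENDER, and ETH mirror those used in this chapter. Storing ETH as integer codes is also an assumption. The last two lines use layer_data() to inspect the bins that geom_histogram() computed, which is one way to check that each bin really spans 10 cm.

```r
library(ggplot2)

# Hypothetical stand-in for the medical dataset (simulated values, not the book's data)
set.seed(42)
patients <- data.frame(
  HEIGHT = round(rnorm(200, mean = 170, sd = 10)),   # heights in cm
  GENDER = sample(c("F", "M"), 200, replace = TRUE),
  ETH    = sample(1:3, 200, replace = TRUE)           # assumed to be integer codes
)

# Basic histogram with 10 cm bins, mirroring the call in the text
p <- ggplot(patients, aes(x = HEIGHT)) +
  geom_histogram(binwidth = 10)
print(p)

# layer_data() returns one row per computed bin, including its edges and count
bins <- layer_data(p)
print(unique(bins$xmax - bins$xmin))
```

The same patients data frame can be substituted for T in the later calls of this section to experiment with the fill and position arguments.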

Within a histogram, we may wish to identify subgroups of the population using different colors. In our example, we map GENDER to the fill aesthetic so that each gender gets its own fill color, and we take the colors from a scheme provided by scale_fill_brewer(). Again, we use a bin width of 10 cm:

ggplot(T, aes(x=HEIGHT, fill=GENDER)) + geom_histogram(binwidth=10) + scale_fill_brewer(type = "div", palette = 4) 

Here is what the histogram looks like:

[Figure: histogram of HEIGHT with bars filled by GENDER, bin width 10 cm]

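Essentially, we have two histograms stacked on top of each other, because geom_histogram() stacks the bars of each group by default. This information is very useful, but perhaps a better alternative is to produce a grouped histogram, with the bars placed side by side, using the argument position = "dodge":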

ggplot(T, aes(x=HEIGHT, fill=GENDER)) + geom_histogram(position="dodge", binwidth=10) + scale_fill_brewer(type = "qual", palette = 2) 

This syntax gives the following histogram:

[Figure: grouped (dodged) histogram of HEIGHT by GENDER, bin width 10 cm]

This grouped histogram has an attractive appearance and presents the information effectively. However, the bins look as though they represent 5 cm each. In fact, each bin represents 10 cm, but the histogram draws one bar per gender side by side within each bin. Let's try a similar example, this time partitioning by ETH (a three-level categorical variable) and using a different color palette from scale_fill_brewer(). To achieve this graph, we wrap ETH in factor() so that it is treated as a discrete variable and each of its three levels gets its own bar and fill color:
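One way to see that dodging leaves the bins at 10 cm is to tabulate the counts directly. The following base R sketch uses simulated, hypothetical data (not the chapter's dataset), and the names heights, gender, and breaks are introduced here purely for illustration. Each row of the resulting table is one 10 cm bin, and each column corresponds to one of the side-by-side bars drawn within that bin. The bin edges are aligned to multiples of 10 for readability, which need not match the edges ggplot2 chooses.

```r
# Simulated heights and genders (hypothetical values, for illustration only)
set.seed(1)
heights <- round(rnorm(100, mean = 170, sd = 10))
gender  <- sample(c("F", "M"), 100, replace = TRUE)

# 10 cm bin edges covering the observed range (one extra bin guards the maximum)
breaks <- seq(floor(min(heights) / 10) * 10,
              ceiling(max(heights) / 10) * 10 + 10,
              by = 10)

# Assign each height to a left-closed 10 cm bin, then cross-tabulate by gender
bin    <- cut(heights, breaks = breaks, right = FALSE)
counts <- table(bin, gender)
print(counts)
```

Reading across a row gives the heights of the two bars that share a bin, which is why the plot appears to have twice as many bars as bins.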

ggplot(T, aes(x=HEIGHT, fill=factor(ETH))) + geom_histogram(position="dodge", binwidth=10) + scale_fill_brewer(type = "qual", palette = 6) 

Our histogram looks like this:

[Figure: grouped (dodged) histogram of HEIGHT by factor(ETH), bin width 10 cm]

Again, the bin width remains at 10 cm, but now we have three bars within each bin. The use of scale_fill_brewer() has allowed us to make effective and attractive histograms in which subgroups are identified by color.
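The numeric palette arguments used in this section can be hard to read at a glance. According to the ggplot2 documentation, a number indexes into the list of ColorBrewer palettes of the given type. The sketch below is an illustration rather than part of the chapter's code file. It prints the colors behind type = "qual", palette = 6 and compares them with the palette named "Set1", which is listed sixth among the qualitative palettes. It assumes the scales and RColorBrewer packages are available, which is normally the case when ggplot2 is installed.

```r
library(scales)        # brewer_pal() builds the palettes used by scale_fill_brewer()
library(RColorBrewer)  # brewer.pal() looks a palette up by name

# Three colors from the sixth qualitative palette, as in the ETH example
by_index <- brewer_pal(type = "qual", palette = 6)(3)

# Three colors from the palette named "Set1"
by_name <- brewer.pal(3, "Set1")

print(by_index)
print(identical(by_index, by_name))
```

If the two vectors match, scale_fill_brewer(palette = "Set1") is a more readable way to request the same colors.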
