Using ggplot2-like plots

ggplot2 is an R library for data visualization that is popular among R users. The main idea of ggplot2 is that a visualization is built up from layers. Like a painter, we start with an empty canvas and then gradually add layers of paint. Usually, we interface with R code from Python with rpy2 (I discuss several interoperability options in Chapter 11 of my book Python Data Analysis). However, if we only want to use ggplot2, it is more convenient to use the pyggplot library.

In this recipe, we will visualize population growth for three countries using Worldbank data retrievable through pandas. The data consists of various indicators and related metadata. The spreadsheet at http://api.worldbank.org/v2/en/topic/19?downloadformat=excel (retrieved July 2015) has descriptions of the indicators.

I think that we can consider the Worldbank dataset to be static; however, similar datasets change often enough to keep an analyst busy almost full time. Changing the name of an indicator could break the code, so I decided to cache the data via the joblib library. The joblib library is related to scikit-learn, and we will discuss it in more detail in Chapter 9, Ensemble Learning and Dimensionality Reduction. Unfortunately, this approach has some limitations; in particular, we are not able to pickle all Python objects.
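To make the caching idea and its pickling limitation concrete, here is a minimal sketch that uses only the standard library pickle module. It is not joblib and not the code from this recipe; the helper names cached_call and is_picklable, the fake_download function, and the numbers it returns are all made up for illustration.

```python
import os
import pickle
import tempfile


def cached_call(cache_path, fetch):
    """Return the pickled result at cache_path, calling fetch() only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    result = fetch()
    with open(cache_path, 'wb') as f:
        pickle.dump(result, f)
    return result


def is_picklable(obj):
    """Report whether pickle can serialize obj (not every Python object qualifies)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


calls = []


def fake_download():
    """Hypothetical stand-in for a slow download; the values are invented."""
    calls.append(1)
    return {'indicator': 'pop_grow', 'values': [1.2, 1.1, 1.0]}


cache_file = os.path.join(tempfile.mkdtemp(), 'pop_grow.pkl')
first = cached_call(cache_file, fake_download)   # miss: calls fake_download
second = cached_call(cache_file, fake_download)  # hit: reads the pickle file
```

The second call never reaches fake_download, which is the behavior we want when the remote source might change. The limitation shows up when a result cannot be serialized: is_picklable(lambda x: x) returns False, so such an object could not be cached this way.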

Getting ready

First, you need R with ggplot2 installed. If you do not plan to use ggplot2 seriously, you may want to skip this recipe altogether. The homepage of R is http://www.r-project.org/ (retrieved July 2015). The documentation of ggplot2 is at http://docs.ggplot2.org/current/index.html (retrieved July 2015). You can install pyggplot with pip; I used pyggplot-23. To install joblib, visit https://pythonhosted.org/joblib/installing.html (retrieved July 2015). I have joblib 0.8.4 via Anaconda.

How to do it...

  1. The imports are as follows:
    import pyggplot
    from dautil import data
  2. Load the data with the following code:
    dawb = data.Worldbank()
    pop_grow = dawb.get_name('pop_grow')
    df = dawb.download(indicator=pop_grow, start=1984, end=2014)
    df = dawb.rename_columns(df, use_longnames=True)
  3. The following line initializes pyggplot with the pandas DataFrame object we created:
    p = pyggplot.Plot(df)
  4. Add a bar chart with the following line:
    p.add_bar('country', dawb.get_longname(pop_grow), color='year')
  5. Flip the chart so that the bars point to the right and render:
    p.coord_flip()
    p.render_notebook()
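The steps above follow the layering idea from the introduction: each call records one more piece of the plot, and nothing is drawn until the render call. The toy class below only mimics that call pattern. LayeredPlot and its methods are hypothetical names, not the pyggplot API, and the rows are invented.

```python
class LayeredPlot:
    """Toy stand-in (hypothetical, not pyggplot) that records layers until render()."""

    def __init__(self, rows):
        self.rows = rows
        self.layers = []

    def add_bar(self, x, y, color=None):
        # Record a bar layer and the column-to-aesthetic mapping it uses.
        self.layers.append(('bar', {'x': x, 'y': y, 'color': color}))
        return self

    def coord_flip(self):
        # Record a coordinate transformation; it applies to the whole plot.
        self.layers.append(('coord_flip', {}))
        return self

    def render(self):
        # A real library would draw here; this sketch just lists the layer names.
        return [name for name, _ in self.layers]


# Invented rows, only to give the constructor something to hold.
rows = [{'country': 'A', 'growth': 1.0, 'year': 2014}]
p = LayeredPlot(rows)
p.add_bar('country', 'growth', color='year')
p.coord_flip()
layer_names = p.render()
```

Because the layers are stored in order, we can add or drop a step (for example, the flip) without touching the others.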

Refer to the following plot for the end result:

The code is in the using_ggplot.ipynb file in this book's code bundle.
