Plotting using matplotlib

This section provides a brief introduction to plotting in pandas using matplotlib. The matplotlib API is imported using the standard convention, as shown in the following command:

In [1]: import matplotlib.pyplot as plt 

Series and DataFrame have a plot method, which is simply a wrapper around plt.plot. Here, we will examine how to plot simple combinations of the sine and cosine functions. Suppose we wish to plot the following functions over the interval -pi to pi:

  • f(x) = cos(x) + sin(x)
  • g(x) = sin(x) - cos(x)

We first sample the interval, evaluate both functions on it, and collect the results in a DataFrame indexed by x:

    In [51]: import numpy as np
             import pandas as pd
    In [52]: X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
    
    In [54]: f, g = np.cos(X) + np.sin(X), np.sin(X) - np.cos(X)
    In [61]: f_ser = pd.Series(f)
             g_ser = pd.Series(g)
    
    
    In [31]: plotDF=pd.concat([f_ser,g_ser],axis=1)
             plotDF.index=X
             plotDF.columns=['sin(x)+cos(x)','sin(x)-cos(x)']
             plotDF.head()
    Out[31]:
               sin(x)+cos(x)  sin(x)-cos(x)
    -3.141593      -1.000000       1.000000
    -3.116953      -1.024334       0.975059
    -3.092313      -1.048046       0.949526
    -3.067673      -1.071122       0.923417
    -3.043033      -1.093547       0.896747
    5 rows × 2 columns

We can now plot the DataFrame with its plot() method and display the figure with plt.show():

    In [94]: plotDF.plot()
             plt.show()
    
We can apply a title to the plot as follows:

    In [95]: plotDF.columns=['f(x)','g(x)']
             plotDF.plot(title='Plot of f(x)=sin(x)+cos(x), g(x)=sin(x)-cos(x)')
             plt.show()

The following is the output of the preceding command:

 Plotting time series data using matplotlib
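Because plot() hands back a matplotlib Axes object, the figure can be refined further with ordinary matplotlib calls. The following is a minimal sketch of that idea, rebuilding the same two curves; the axis label text and the zero reference line are illustrative choices, not part of the original example:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the two curves from the text on the interval [-pi, pi].
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
plotDF = pd.DataFrame(
    {"f(x)": np.cos(X) + np.sin(X), "g(x)": np.sin(X) - np.cos(X)},
    index=X,
)

# For a single plot, DataFrame.plot returns a matplotlib Axes.
ax = plotDF.plot(title="Plot of f(x)=sin(x)+cos(x), g(x)=sin(x)-cos(x)")

# Illustrative refinements (assumed labels, not from the original text).
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.axhline(0, color="gray", linewidth=0.5)  # reference line at y = 0
```

As before, calling plt.show() afterwards displays the figure.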

We can also plot the two series (functions) separately in different subplots, using the following command:

    In [96]: plotDF.plot(subplots=True, figsize=(6,6))
             plt.show()

The following is the output of the preceding command:

Plotting some more time series data using matplotlib
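With subplots=True, plot() returns an array of Axes, one per column, so each panel can be adjusted individually. The sketch below assumes the same two curves as above; the grid and the label text are illustrative additions:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same two curves as in the text, sampled on [-pi, pi].
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
plotDF = pd.DataFrame(
    {"f(x)": np.cos(X) + np.sin(X), "g(x)": np.sin(X) - np.cos(X)},
    index=X,
)

# One subplot per column; the return value is an array of Axes.
axes = plotDF.plot(subplots=True, figsize=(6, 6))

# Label each panel with its column name and add a grid (illustrative).
for ax, name in zip(axes, plotDF.columns):
    ax.set_ylabel(name)
    ax.grid(True)

# The panels share the x axis, so only the bottom one needs an x label.
axes[-1].set_xlabel("x (radians)")
```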

There is a lot more to using the plotting functionality of matplotlib within pandas. For more information, take a look at the documentation at http://pandas.pydata.org/pandas-docs/dev/visualization.html.

It is often quite useful to visualize all the variables of a multivariate time series together. Let's plot all the variables of the following dataset in a single figure. Note that the date column is the index here:

 Occupancy dataset

The subplots option of plot(), which builds on matplotlib's subplot feature, lets us plot all of the variables at once, one panel per column:

    import matplotlib.dates as mdates

    # plot_ts is the occupancy DataFrame shown above, indexed by date
    axes = plot_ts.plot(figsize=(20,10), title='Timeseries Plot', subplots=True,
                        layout=(plot_ts.shape[1],1), xticks=plot_ts.index)
    # Get current axis from plot
    ax = plt.gca()
    # Set frequency of xticks
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
    # Format dates in xticks
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    plt.show()

The following is the output:

Time series plot for all the variables in the occupancy dataset
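Since the occupancy data itself is not reproduced here, the following self-contained sketch applies the same idea to a synthetic stand-in. The DataFrame demo_ts, its column names, and its values are hypothetical and exist only for illustration. The sketch also passes x_compat=True, which asks pandas to leave the date axis in matplotlib's own units; this is an assumption about what makes the matplotlib.dates locators behave predictably on a regularly spaced index, not something the original example does:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for a multivariate time series: 14 daily readings.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-02-02", periods=14, freq="D")
demo_ts = pd.DataFrame(
    {
        "sensor_a": rng.normal(21.0, 0.5, len(idx)),
        "sensor_b": rng.normal(27.0, 2.0, len(idx)),
        "sensor_c": rng.normal(600.0, 50.0, len(idx)),
    },
    index=idx,
)

# One panel per column, stacked vertically. With layout given, the
# returned array of Axes is two-dimensional: (n_columns, 1).
axes = demo_ts.plot(
    subplots=True,
    layout=(demo_ts.shape[1], 1),
    figsize=(10, 6),
    title="Timeseries Plot",
    x_compat=True,  # keep matplotlib date units on the x axis
)

# Apply the tick locator and date format to every panel explicitly.
for ax in axes.ravel():
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Rotate the date labels so they do not overlap.
plt.gcf().autofmt_xdate()
```

Looping over axes.ravel() rather than relying on plt.gca() makes it explicit which panels receive the formatting; calling plt.show() afterwards displays the result.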