Chapter 3. Graphics with matplotlib

This chapter explores matplotlib, a Python library for the production of publication-quality graphs. In this chapter, the following topics will be discussed:

  • Two-dimensional plots using the plot function and setting up line widths, colors, and styles
  • Plot configuration and annotation
  • Three-dimensional plots
  • Animations

Being a Python library, matplotlib consists of a hierarchy of classes, and it is possible to code using it in the usual object-oriented style. However, matplotlib also supports an interactive mode. In this mode, the graphs are constructed step by step, adding and configuring one component at a time. We emphasize the second approach since it is designed for the rapid production of graphs. The object-oriented style will be explained whenever it is needed or leads to better results.
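
To make the contrast between the two styles concrete, here is a minimal sketch that draws the same curve both ways. It assumes explicit imports (rather than the pylab names used in the rest of the chapter) and an off-screen backend, so that it can also run outside the notebook. The sample data is arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 50)

# Interactive style: each call acts on the current figure
plt.figure()
plt.plot(x, x**2)
plt.title('Interactive style')

# Object-oriented style: the figure and axes are explicit objects
fig, ax = plt.subplots()
ax.plot(x, x**2)
ax.set_title('Object-oriented style')
```

In the notebook, both versions should render the same graph. The difference is only in whether the axes object is named explicitly.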

Note

The sense in which the word interactive is used in this context is somewhat different from what is understood today. Graphs produced by matplotlib are not interactive in the sense that the user can manipulate the graphs once they have been rendered in the notebook. Instead, the terminology comes from the time when matplotlib was used mostly in command-line mode, and each new line of code modified the existing plots. Curiously, the software that was the original inspiration for matplotlib still uses a command line-based interface.

The plot function

The plot() function is the workhorse of the matplotlib library. In this section, we will explore the line-plotting and formatting capabilities included in this function.

To make things a bit more concrete, let's consider the formula for logistic growth, as follows:

$N(t) = \frac{a}{b + c e^{-rt}}$

This model is frequently used to represent growth that shows an initial exponential phase and is eventually limited by some factor. Examples include a population in an environment with limited resources, and new products or technological innovations, which initially attract a small and quickly growing market but eventually reach a saturation point.

A common strategy to understand a mathematical model is to investigate how it changes as the parameters defining it are modified. Let's say we want to see what happens to the shape of the curve when the parameter b changes.

To be able to do what we want more efficiently, we are going to use a function factory. This way, we can quickly create logistic models with arbitrary values for r, a, b, and c. Run the following code in a cell:

def make_logistic(r, a, b, c):
    def f_logistic(t):
        return a / (b + c * exp(-r * t))
    return f_logistic

The function factory pattern takes advantage of the fact that functions are first-class objects in Python. This means that functions can be treated as regular objects: they can be assigned to variables, stored in lists and dictionaries, and play the role of arguments and/or return values in other functions.

In our example, we define the make_logistic() function, whose output is itself a Python function. Notice how the f_logistic() function is defined inside the body of make_logistic() and then returned in the last line.
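
The following sketch illustrates what "first-class" means in practice. It is a scalar-only variant that uses math.exp so that it runs without pylab. The names make_logistic_scalar, models, and evaluate are chosen for this illustration only and are not used elsewhere in the chapter.

```python
import math

def make_logistic_scalar(r, a, b, c):
    # Scalar-only variant of the factory: math.exp does not accept arrays
    def f_logistic(t):
        return a / (b + c * math.exp(-r * t))
    return f_logistic

# Functions are ordinary objects, so they can be stored in a dictionary
models = {b: make_logistic_scalar(0.15, 20.0, b, 15.0) for b in (2.0, 3.0, 4.0)}

def evaluate(func, t):
    # A function received as an argument by another function
    return func(t)

# At t = 0 the exponential equals 1, so the value is a / (b + c) = 20 / 17
value_at_zero = evaluate(models[2.0], 0.0)
```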

Let's now use the function factory to create three functions representing logistic curves, as follows:

r = 0.15
a = 20.0
c = 15.0
b1, b2, b3 = 2.0, 3.0, 4.0
logistic1 = make_logistic(r, a, b1, c)
logistic2 = make_logistic(r, a, b2, c)
logistic3 = make_logistic(r, a, b3, c)

In the preceding code, we first fix the values of r, a, and c, and define three logistic curves for different values of b. The important point to notice is that logistic1, logistic2, and logistic3 are functions. So, for example, we can use logistic1(2.5) to compute the value of the first logistic curve at the time 2.5.

We can now plot the functions using the following code:

tmax = 40
tvalues = linspace(0, tmax, 300)
plot(tvalues, logistic1(tvalues)) 
plot(tvalues, logistic2(tvalues)) 
plot(tvalues, logistic3(tvalues))

The first line in the preceding code sets the maximum time value, tmax, to be 40. Then, we define the set of times at which we want the functions evaluated with the assignment, as follows:

tvalues = linspace(0, tmax, 300)

The linspace() function is very convenient to generate points for plotting. The preceding code creates an array of 300 equally spaced points in the interval from 0 to tmax. Note that, unlike other functions such as range() and arange(), linspace() includes the right endpoint of the interval by default. (To exclude the right endpoint, use the endpoint=False option.)
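
The endpoint behavior is easiest to see on a small array. This sketch assumes NumPy imported as np (instead of the pylab names) and uses only five points, so the values are easy to check by hand.

```python
import numpy as np

# Right endpoint included (the default): 0, 10, 20, 30, 40
with_endpoint = np.linspace(0, 40, 5)

# Right endpoint excluded: 0, 8, 16, 24, 32
without_endpoint = np.linspace(0, 40, 5, endpoint=False)

# arange() takes a step instead of a count and never includes the stop value
half_open = np.arange(0, 40, 10)
```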

After defining the array of time values, the plot() function is called to graph the curves. In its most basic form, it plots a single curve in a default color and line style. In this usage, the two arguments are two arrays. The first array gives the horizontal coordinates of the points being plotted, and the second array gives the vertical coordinates. A typical example will be the following function call:

plot(x,y)

The variables x and y must refer to NumPy arrays (or any Python iterable values that can be converted into an array) and must have the same dimensions. The points plotted have coordinates as follows:

x[0], y[0]
x[1], y[1]
x[2], y[2]

The three calls to plot() shown earlier produce the following plot, displaying the three logistic curves:

[Figure: the three logistic curves]

You may have noticed that before the graph is displayed, there is a line of text output that looks like the following:

[<matplotlib.lines.Line2D at 0x7b57c50>]

This is the return value of the last call to the plot() function, which is a list (here with a single element) of objects of the Line2D type. One way to prevent the output from being shown is to enter None as the last line in the cell. Alternatively, we can assign the return value of the last call in the cell to a dummy variable:

_dummy_ = plot(tvalues, logistic3(tvalues))
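
The returned list is more than a nuisance: its elements can be used to adjust a curve after it has been drawn. The sketch below assumes explicit imports and an off-screen backend, and the names lines and line are chosen only for this illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 40, 300)
lines = plt.plot(t, 20.0 / (2.0 + 15.0 * np.exp(-0.15 * t)))

line = lines[0]              # the single Line2D object in the returned list
line.set_linewidth(3.0)      # change the width after the curve was created
line.set_color('DarkGreen')  # change the color in the same way
```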

The plot() function supports plotting several curves in the same function call. Change the contents of the cell to read as follows and run it again:

tmax = 40
tvalues = linspace(0, tmax, 300)
plot(tvalues, logistic1(tvalues), 
     tvalues, logistic2(tvalues), 
     tvalues, logistic3(tvalues))

This form saves some typing but turns out to be a little less flexible when it comes to customizing line options. Notice that the text output produced now is a list with three elements:

[<matplotlib.lines.Line2D at 0x9bb6cc0>,
 <matplotlib.lines.Line2D at 0x9bb6ef0>,
 <matplotlib.lines.Line2D at 0x9bb9518>]

This output can be useful in some instances. For now, we will stick with using one call to plot() for each curve, since it produces code that is clearer and more flexible.

Let's now change the line options in the plot and set the plot bounds. Change the contents of the cell to read as follows:

plot(tvalues, logistic1(tvalues), 
     linewidth=1.5, color='DarkGreen', linestyle='-') 
plot(tvalues, logistic2(tvalues), 
     linewidth=2.0, color='#8B0000', linestyle=':') 
plot(tvalues, logistic3(tvalues), 
     linewidth=3.5, color=(0.0, 0.0, 0.5), linestyle='--')
axis([0, tmax, 0, 11.])
None

Running the preceding code produces the following plot:

[Figure: the three logistic curves with customized line widths, colors, and line styles]

The options set in the preceding code are as follows:

  • The first curve is plotted with a line width of 1.5, with the HTML color of DarkGreen, and a solid-line style
  • The second curve is plotted with a line width of 2.0, colored with the RGB value given by the hexadecimal string #8B0000, and a dotted-line style
  • The third curve is plotted with a line width of 3.5, colored with the RGB components, (0.0, 0.0, 0.5), and a dashed-line style

Notice that there are different ways of specifying a fixed color: an HTML color name, a hexadecimal string, or a tuple of floating-point values. In the last case, the entries in the tuple represent the intensity of the red, green, and blue colors, respectively, and must be floating-point values between 0.0 and 1.0. A complete list of HTML color names can be found at http://www.w3schools.com/html/html_colornames.asp.
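
As a quick check that the three notations describe the same kind of value, the sketch below converts each one to an RGB tuple. It assumes a matplotlib version that provides matplotlib.colors.to_rgb (2.0 or later), and it uses the color name darkred, which corresponds to the hexadecimal string #8B0000.

```python
from matplotlib.colors import to_rgb

named = to_rgb('darkred')             # HTML color name
hexadecimal = to_rgb('#8B0000')       # hexadecimal string for the same color
components = to_rgb((0.0, 0.0, 0.5))  # an RGB tuple is returned unchanged
```

Each result is a tuple of three floating-point values between 0.0 and 1.0.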

Line styles are specified by a symbolic string. The allowed values are shown in the following table:

Symbol string         Line style
'-'                   Solid (the default)
'--'                  Dashed
':'                   Dotted
'-.'                  Dash-dot
'None', ' ', or ''    Not displayed
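
One way to compare the visible styles is to draw one line per style. This is a sketch under the same assumptions as before (explicit imports, off-screen backend). The vertical shift is an arbitrary choice that keeps the lines apart.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 20)
styles = ['-', '--', ':', '-.']
lines = []
for shift, style in enumerate(styles):
    # Shift each line vertically so the four styles can be compared
    line, = plt.plot(x, x + shift, linestyle=style, color='black')
    lines.append(line)
```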

After the calls to plot(), we set the graph bounds with the function call:

axis([0, tmax, 0, 11.])

The argument to axis() is a four-element list that specifies, in this order, the minimum and maximum values of the horizontal coordinates, and the minimum and maximum values of the vertical coordinates.
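
The bounds set by axis() can be read back one axis at a time with xlim() and ylim(), which is a convenient way to confirm the order of the four values. The sketch assumes explicit imports and an off-screen backend, and the plotted segment is arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 40], [0, 10])
plt.axis([0, 40, 0, 11.0])   # horizontal bounds first, then vertical bounds

# Called with no arguments, xlim() and ylim() return the current bounds
x_bounds = plt.xlim()
y_bounds = plt.ylim()
```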

It may seem non-intuitive that the bounds for the variables are set after the plots are drawn. In the interactive mode, matplotlib remembers the state of the graph being constructed, and graphics objects are updated in the background after each command is issued. The graph is only rendered when all computations in the cell are done so that all previously specified options take effect. Note that starting a new cell clears all the graph data. This interactive behavior is part of the matplotlib.pyplot module, which is one of the components imported by pylab.

Besides drawing a line connecting the data points, it is also possible to draw markers at specified points. Change the graphing commands as indicated in the following code snippet, and then run the cell again:

plot(tvalues, logistic1(tvalues), 
     linewidth=1.5, color='DarkGreen', linestyle='-',
     marker='o', markevery=50, markerfacecolor='GreenYellow',
     markersize=10.0) 
plot(tvalues, logistic2(tvalues), 
     linewidth=2.0, color='#8B0000', linestyle=':',
     marker='s', markevery=50, markerfacecolor='Salmon',
     markersize=10.0)  
plot(tvalues, logistic3(tvalues), 
     linewidth=2.0, color=(0.0, 0.0, 0.5), linestyle='--',
     marker = '*', markevery=50, markerfacecolor='SkyBlue',
     markersize=12.0)
axis([0, tmax, 0, 11.])
None

Now, the graph will look as shown in the following figure:

[Figure: the three logistic curves with markers]

The only difference from the previous code is that now we added options to draw markers. The following are the options we use:

  • The marker option specifies the shape of the marker. Shapes are given as symbolic strings. In the preceding examples, we use 'o' for a circular marker, 's' for a square, and '*' for a star. A complete list of available markers can be found at http://matplotlib.org/api/markers_api.html#module-matplotlib.markers.
  • The markevery option specifies a stride within the data points for the placement of markers. In our example, we place a marker at every 50th data point.
  • The markerfacecolor option specifies the fill color of the marker.
  • The markersize option specifies the size of the marker. The size is given in points.
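
To see which data points receive a marker, the stride can be reproduced with plain array indexing. With an integer stride, markers are placed starting from the first data point. The names stride, marked_indices, and marked_times are chosen for this sketch.

```python
import numpy as np

tvalues = np.linspace(0, 40, 300)
stride = 50   # the value passed as markevery in the plot

# Indices 0, 50, 100, ..., 250 of the 300 data points
marked_indices = np.arange(0, len(tvalues), stride)

# Horizontal positions at which the markers appear
marked_times = tvalues[marked_indices]
```

With 300 points and a stride of 50, each curve therefore carries six markers.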

There are a large number of other options that can be applied to lines in matplotlib. A complete list is available at http://matplotlib.org/api/artist_api.html#module-matplotlib.lines.

Adding a title, labels, and a legend

The next step is to add a title and labels for the axes. Just before the None line, add the following three lines of code to the cell that creates the graph:

title('Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r))
xlabel('$t$')
ylabel('$N(t)=a/(b+ce^{-rt})$')

In the first line, we call the title() function to set the title of the plot. The argument can be any Python string. In our example, we use a formatted string:

title('Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r))

We use the format() method of the string class. The format specifiers are placed between braces, as in {:5.2f}, which specifies a floating-point format with a minimum width of five characters and two digits after the decimal point. Each format specifier is then associated sequentially with one of the arguments of the method. Some of the details of string formatting are covered in Appendix B, A Brief Review of Python, and the full documentation is available at https://docs.python.org/2/library/string.html.
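
The effect of the width can be checked without drawing anything. The sketch below formats the same parameter values used in the plot. The variable names label, padded, and unpadded are chosen for this illustration.

```python
a, c, r = 20.0, 15.0, 0.15

# The full title string, with each value formatted as {:5.2f}
label = 'Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r)

# A width of five pads short numbers on the left: 0.15 becomes ' 0.15'
padded = '{:5.2f}'.format(r)

# Without a width, no padding is added
unpadded = '{:.2f}'.format(r)
```

This padding is why the title shows a space before the value of r but not before the values of a and c.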

The axis labels are set in the calls:

xlabel('$t$')
ylabel('$N(t)=a/(b+ce^{-rt})$')

As with the title() function, the xlabel() and ylabel() functions accept any Python string. Note that in the '$t$' and '$N(t)=a/(b+ce^{-rt})$' strings, we use LaTeX to format the mathematical formulas. This is indicated by the dollar signs, $...$, in the string.

After the addition of a title and labels, our graph looks like the following:

[Figure: the logistic growth plot with a title and axis labels]

Next, we need a way to identify each of the curves in the picture. One way to do that is to use a legend, which is indicated as follows:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)])

The legend() function accepts a list of strings. Each string is associated with a curve in the order they are added to the plot. Notice that we are again using formatted strings.

Unfortunately, the preceding code does not produce great results. The legend, by default, is placed in the top-right corner of the plot, which, in this case, hides part of the graph. This is easily fixed using the loc option in the legend function, as shown in the following code:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)], loc='upper left')

Running this code, we obtain the final version of our logistic growth plot, as follows:

[Figure: the final logistic growth plot, with the legend in the upper-left corner]

The legend location can be any of the strings: 'best', 'upper right', 'upper left', 'lower left', 'lower right', 'right', 'center left', 'center right', 'lower center', 'upper center', and 'center'. It is also possible to specify the location of the legend precisely with the bbox_to_anchor option. To see how this works, modify the code for the legend as follows:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)],  bbox_to_anchor=(0.9,0.35))

Notice that the bbox_to_anchor option, by default, uses a coordinate system that is not the same as the one we specified for the plot. The x and y coordinates of the box in the preceding example are interpreted as fractions of the width and height, respectively, of the plot area (the axes). A little trial and error is necessary to place the legend box precisely where we want it. Note that the legend box can be placed outside the plot area. For example, try the coordinates (1.32,1.02).

The legend() function is quite flexible and has quite a few other options that are documented at http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.legend.
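
The legend calls in this section pass the labels as a list. An alternative, shown here only as a sketch, is to attach a label to each curve through the label option of plot() and then call legend() without a list. The sketch assumes explicit imports and an off-screen backend, and the names leg and legend_texts are chosen for this illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 40, 300)
plt.figure()
for b in (2.0, 3.0, 4.0):
    # Attach the legend text to each curve as it is drawn
    plt.plot(t, 20.0 / (b + 15.0 * np.exp(-0.15 * t)),
             label='b={:5.2f}'.format(b))

leg = plt.legend(loc='upper left')   # picks up the labels set above
legend_texts = [entry.get_text() for entry in leg.get_texts()]
```

Keeping the label next to the data makes it harder for the legend entries to fall out of order when curves are added or removed.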

Text and annotations

In this subsection, we will show how to add annotations to plots in matplotlib. We will build a plot demonstrating the fact that the tangent to a curve must be horizontal at the highest and lowest points. We start by defining the function associated with the curve and the set of values at which we want the curve to be plotted, which is shown in the following code:

f = lambda x:  (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)
xvalues = linspace(0, 5, 200)

The first line in the preceding code uses a lambda expression to define the f() function. We use this approach here because the formula for the function is a simple, one-line expression. The general form of a lambda expression is as follows:

lambda <arguments> : <return expression>

This expression by itself creates an anonymous function that can be used in any place that a function object is expected. Note that the return value must be a single expression and cannot contain any statements.
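
For comparison, the sketch below writes the same function both as a lambda expression and as a regular def, and then passes the lambda directly to map(). The names f_def and values are chosen for this illustration.

```python
# Lambda form, as used in the text
f = lambda x: (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)

# Equivalent named function
def f_def(x):
    return (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)

# A lambda can be used anywhere a function object is expected
values = list(map(f, [0.0, 1.0, 2.0]))
```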

The formula for the function may seem unusual, but it was chosen by trial-and-error and a little bit of calculus so that it produces a nice graph in the interval from 0 to 5. The xvalues array is defined to contain 200 equally spaced points on this interval.

Let's create an initial plot of our curve, as shown in the following code:

plot(xvalues, f(xvalues), lw=2, color='FireBrick')
axis([0, 5, -1, 8])
grid()
xlabel('$x$')
ylabel('$f(x)$')
title('Extreme values of a function')
None # Prevent text output

Most of the code in this segment is explained in the previous section. The only new bit is that we use the grid() function to draw a grid. Used with no arguments, the grid coincides with the tick marks on the plot. Like everything else in matplotlib, grids are highly customizable. Check the documentation at http://matplotlib.org/1.3.1/api/pyplot_api.html#matplotlib.pyplot.grid.

When the preceding code is executed, the following plot is produced:

[Figure: initial plot of f(x) on the interval from 0 to 5, with a grid]

Note that the curve has a highest point (maximum) and a lowest point (minimum). These are collectively called the extreme values of the function on the displayed interval (the function actually grows without bound as x becomes large). We would like to mark these on the plot with annotations. We will first store the relevant points as follows:

x_min = 3.213
f_min = f(x_min)
x_max = 0.698
f_max = f(x_max)
p_min = array([x_min, f_min])
p_max = array([x_max, f_max])
print(p_min)
print(p_max)
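
Hard-coded coordinates like these can be estimated roughly by sampling the function on a dense grid and picking the smallest and largest samples. The following is only a sketch of that idea, not the method used to obtain the values above. The grid size and the names grid, samples, x_low, and x_high are assumptions made for this illustration.

```python
import numpy as np

f = lambda x: (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)

# Dense sample of the plotted interval (the grid size is an arbitrary choice)
grid = np.linspace(0, 5, 2001)
samples = f(grid)

x_low = grid[np.argmin(samples)]    # rough location of the lowest point
x_high = grid[np.argmax(samples)]   # rough location of the highest point
```

The accuracy of this estimate is limited by the grid spacing, which is enough for placing annotations on a graph.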

The variables x_min and f_min are defined to be (approximately) the coordinates of the lowest point in the graph. Analogously, x_max and f_max represent the highest point. Don't be concerned with how these points were found. For the purposes of graphing, even a rough approximation by trial and error would suffice. In Chapter 5, Advanced Computing with SciPy, Numba, and NumbaPro, we will see how to solve this kind of problem accurately via SciPy. Now, add the following code to the cell that draws the plot, right below the title() command:

arrow_props = dict(facecolor='DimGray', width=3, shrink=0.05, 
              headwidth=7)
delta = array([0.1, 0.1])
offset = array([1.0, .85])
annotate('Maximum', xy=p_max+delta, xytext=p_max+offset,
         arrowprops=arrow_props, verticalalignment='bottom',
         horizontalalignment='left', fontsize=13)
annotate('Minimum', xy=p_min-delta, xytext=p_min-offset,
         arrowprops=arrow_props, verticalalignment='top',
         horizontalalignment='right', fontsize=13)

Run the cell to produce the plot shown in the following diagram:

[Figure: the curve with arrows and labels marking the maximum and the minimum]

In the code, we start by assigning the variables arrow_props, delta, and offset, which will be used to set the arguments in the calls to annotate(). The annotate() function adds a textual annotation to the graph with an optional arrow indicating the point being annotated. The first argument of the function is the text of the annotation. The next two arguments give the locations of the arrow and the text:

  • xy: This is the point being annotated and will correspond to the tip of the arrow. We want this to be the maximum/minimum points, p_min and p_max, but we add/subtract the delta vector so that the tip is a bit removed from the actual point.
  • xytext: This is the point where the text will be placed as well as the base of the arrow. We specify this as offsets from p_min and p_max using the offset vector.

All other arguments of annotate() are formatting options:

  • arrowprops: This is a Python dictionary containing the arrow properties. We predefine the dictionary, arrow_props, and use it here. Arrows can be quite sophisticated in matplotlib, and you are directed to the documentation for details.
  • verticalalignment and horizontalalignment: These specify how the text is aligned relative to the position given by xytext.
  • fontsize: This signifies the size of the text. Text is also highly configurable, and the reader is directed to the documentation for details.

The annotate() function has a large number of options. For complete details of what is available, consult the documentation at http://matplotlib.org/1.3.1/api/pyplot_api.html#matplotlib.pyplot.annotate.

We now want to add a comment for what is being demonstrated by the plot by adding an explanatory textbox. Add the following code to the cell right after the calls to annotate():

bbox_props = dict(boxstyle='round', lw=2, fc='Beige')
text(2, 6, 'Maximum and minimum points\nhave horizontal tangents',
     bbox=bbox_props, fontsize=12, verticalalignment='top')

The text() function is used to place text at an arbitrary position in the plot. The first two arguments are the position of the textbox, and the third argument is a string containing the text to be displayed. Notice the use of '\n' to indicate a line break. The other arguments are configuration options. The bbox argument is a dictionary with the options for the box. If omitted, the text will be displayed without any surrounding box. In the example code, the box is a rectangle with rounded corners, with a border width of 2 points and the face color Beige.

As a final detail, let's add the tangent lines at the extreme points. Add the following code at the top of the cell:

plot([x_min-0.75, x_min+0.75], [f_min, f_min],
     color='RoyalBlue', lw=3)
plot([x_max-0.75, x_max+0.75], [f_max, f_max],
     color='RoyalBlue', lw=3)

Since the tangents are segments of straight lines, we simply give the coordinates of the endpoints. Placing the code for the tangents at the top of the cell causes them to be plotted first, so that the graph of the function is drawn on top of the tangents. This is the final result:

[Figure: the final plot with annotations, an explanatory textbox, and tangent lines at the extreme points]

The examples we have seen so far only scratch the surface of what is possible with matplotlib. The reader should read the matplotlib documentation for more examples.
