© Magnus Lie Hetland 2017

Magnus Lie Hetland, Beginning Python, 10.1007/978-1-4842-0028-5_21

21. Project 2: Painting a Pretty Picture

Magnus Lie Hetland

(1)Trondheim, Norway

In this project, you’ll learn how you can create graphics in Python. More specifically, you’ll be creating a PDF file with graphics helping you visualize data that you read from a text file. While you could get such functionality from a regular spreadsheet, Python gives you much more power, as you’ll see when you get to the second implementation and automatically download your data from the Internet.

In the previous chapter, we looked at HTML and XML—and here is another acronym, which I guess you’re probably familiar with: PDF, short for Portable Document Format. PDF is a format created by Adobe that can represent any kind of document with graphics and text. The PDF file is not really editable (as, say, a Microsoft Word file would be), but there is reader software freely available for most platforms, and the PDF file should look the same no matter which reader you use or which platform you are on (as opposed to HTML, with which the correct fonts may not be available, you would normally have to ship pictures as separate files, and so on).

What’s the Problem?

Python is excellent for analyzing data. With its file-handling and string-processing facilities , it’s probably easier to create some form of report from a data file than to create something similar in your average spreadsheet, especially if what you want to do requires some complicated programming logic.

You have seen (in Chapter 3) how you can use string formatting to get pretty output—for example, if you want to print numbers in columns. However, sometimes plain text just isn’t enough. (As they say, a picture is worth a thousand words.) In this project, you’ll learn the basics of the ReportLab package, which enables you to create graphics and documents in the PDF format (and a few other formats) almost as easily as you created plain text earlier.

As you play with the concepts in this project, I encourage you to find some application that is interesting to you. I have chosen to use data about sunspots (from the Space Weather Prediction Center, a part of the US National Oceanic and Atmospheric Administration) and to create a line diagram from these data.

The program should be able to do the following:

  • Download a data file from the Internet

  • Parse the data file and extract the interesting parts

  • Create PDF graphics based on the data

As in the previous project, these goals might not be fully met by the first prototype.

Useful Tools

The crucial tool in this project is the graphics-generating package. Quite a few such packages are available; I have chosen ReportLab because it is easy to use and has extensive functionality for both graphics and document generation in PDF. If you want to go beyond the basics, you might also want to consider the PYX graphics package ( http://pyx.sf.net ), which is really powerful and has support for TEX-based typography.

To get the ReportLab package, go to the official web site at http://www.reportlab.org . There you will find the software, documentation, and samples. You can either download the library from the site or install it via pip. When you have done this, you should be able to import the reportlab module, as follows:

>>> import reportlab
>>>
Note

Although I show you how some ReportLab features work in this project, much more functionality is available. To learn more, I suggest you obtain the manuals available on the ReportLab web site. They are quite readable and are much more comprehensive than this one chapter could possibly be.

Preparations

Before you start programming, you’ll need some data with which to test your program. I have chosen (quite arbitrarily) to use data about sunspots, available from the web site of the Space Weather Prediction Center ( http://www.swpc.noaa.gov ). You can find the data I use in my examples at ftp://ftp.swpc.noaa.gov/pub/weekly/Predict.txt .

This data file is updated weekly and contains information about sunspots and radio flux. (Don’t ask me what that means.) Once you have this file, you’re ready to start playing with the problem.

Here is a part of the file to give you an idea of how the data look:

#         Predicted Sunspot Number And Radio Flux Values
#                     With Expected Ranges
#
#         -----Sunspot Number------  ----10.7 cm Radio Flux----
# YR MO   PREDICTED    HIGH    LOW   PREDICTED    HIGH    LOW
#--------------------------------------------------------------
2016 03        30.9    31.9    29.9       96.9    97.9    95.9
2016 04        30.5    32.5    28.5       96.1    97.1    95.1
2016 05        30.4    33.4    27.4       94.9    96.9    92.9
2016 06        30.3    35.3    25.3       93.2    96.2    90.2
2016 07        30.2    35.2    25.2       91.6    95.6    87.6
2016 08        30.0    36.0    24.0       90.3    94.3    86.3
2016 09        29.8    36.8    22.8       89.5    94.5    84.5
2016 10        30.0    37.0    23.0       88.9    94.9    82.9
2016 11        30.1    38.1    22.1       88.1    95.1    81.1
2016 12        30.5    39.5    21.5       87.8    95.8    79.8

First Implementation

In this first implementation, let’s just put the data into our source code, as a list of tuples. That way, it’s easily accessible. Here is an example of how you can do it:

data = [
    # Year Month Predicted High Low
    (2016, 03, 30.9, 31.9, 29.9),
    (2016, 04, 30.5, 32.5, 28.5),
    # Add more data here
    ]

With that out of the way, let’s see how you can turn the data into graphics.

Drawing with ReportLab

ReportLab consists of many parts and enables you to create output in several ways. The most basic module for generating PDFs is pdfgen. It contains a Canvas class with several low-level methods for drawing. To draw a line on a Canvas called c, you call the c.line method, for example.

We’ll use the more high-level graphics framework (in the package reportlab.graphics and its submodules), which will enable us to create various shape objects and to add them to a Drawing object that you can later output to a file in PDF format.

Listing 21-1 shows a sample program that draws the string “Hello, world!” in the middle of a 100 × 100-point PDF figure. (You can see the result in Figure 21-1.) The basic structure is as follows: You create a drawing of a given size, you create graphical elements (in this case, a String object) with certain properties, and then you add the elements to the drawing. Finally, the drawing is rendered into PDF format and saved to a file.

A326949_3_En_21_Fig1_HTML.jpg
Figure 21-1. A simple ReportLab figure

Listing 21-1. A Simple ReportLab Program (hello_report.py)
from reportlab.graphics.shapes import Drawing, String
from reportlab.graphics import renderPDF


d = Drawing(100, 100)
s = String(50, 50, 'Hello, world!', textAnchor='middle')


d.add(s)

renderPDF.drawToFile(d, 'hello.pdf', 'A simple PDF file')

The call to renderPDF.drawToFile saves your PDF file to a file called hello.pdf in the current directory.

The main arguments to the String constructor are its x and y coordinates and its text. In addition, you can supply various attributes (such as font size, color, and so on). In this case, I’ve supplied a textAnchor, which is the part of the string that should be placed at the point given by the coordinates.

Constructing Some PolyLines

To create a line diagram (a graph) of the sunspot data, you need to create some lines. In fact, you need to create several lines that are linked. ReportLab has a special class for this: PolyLine.

A PolyLineis created with a list of coordinates as its first argument. This list is of the form [(x0, y0), (x1, y1), ...], with each pair of x and y coordinates making one point on the PolyLine. See Figure 21-2 for a simple PolyLine.

A326949_3_En_21_Fig2_HTML.jpg
Figure 21-2. PolyLine([(0, 0), (10, 0), (10, 10), (0, 10)])

To make a line diagram, one polyline must be created for each column in the data set. Each point in these polylines will consist of a time (constructed from the year and month) and a value (which is the number of sunspots, taken from the relevant column). To get one of the columns (the values), list comprehensions can be useful.

pred = [row[2] for row in data]

Here, pred (for “predicted”) will be a list of all the values in the third column of the data. You can use a similar strategy for the other columns. (The time for each row would need to be calculated from both the year and month, for example, year + month/12.)

Once you have the values and the timestamps, you can add your polylines to the drawing like this:

drawing.add(PolyLine(list(zip(times, pred)), strokeColor=colors.blue))

It isn’t necessary to set the stroke color, of course, but it makes it easier to tell the lines apart. (Note how zip is used to combine the times and values into a list of tuples.)

Writing the Prototype

You now have what you need to write your first version of the program. The source code is shown in Listing 21-2.

Listing 21-2. The First Prototype for the Sunspot Graph Program (sunspots_proto.py)
from reportlab.lib import colors
from reportlab.graphics.shapes import *
from reportlab.graphics import renderPDF


data = [
#    Year  Month   Predicted    High   Low
    (2007, 8,      113.2,       114.2, 112.2),
    (2007, 9,      112.8,       115.8, 109.8),
    (2007, 10,     111.0,       116.0, 106.0),
    (2007, 11,     109.8,       116.8, 102.8),
    (2007, 12,     107.3,       115.3,  99.3),
    (2008, 1,      105.2,       114.2,  96.2),
    (2008, 2,      104.1,       114.1,  94.1),
    (2008, 3,      99.9,        110.9,  88.9),
    (2008, 4,      94.8,        106.8,  82.8),
    (2008, 5,      91.2,        104.2,  78.2),
    ]


drawing = Drawing(200, 150)

pred = [row[2]-40 for row in data]
high = [row[3]-40 for row in data]
low = [row[4]-40 for row in data]
times = [200*((row[0] + row[1]/12.0) - 2007)-110 for row in data]


drawing.add(PolyLine(list(zip(times, pred)), strokeColor=colors.blue))
drawing.add(PolyLine(list(zip(times, high)), strokeColor=colors.red))
drawing.add(PolyLine(list(zip(times, low)), strokeColor=colors.green))


drawing.add(String(65, 115, 'Sunspots', fontSize=18, fillColor=colors.red))
renderPDF.drawToFile(drawing, 'report1.pdf', 'Sunspots')

As you can see, I have adjusted the values and timestamps to get the positioning right. The resulting drawing is shown in Figure 21-3.

A326949_3_En_21_Fig3_HTML.jpg
Figure 21-3. A simple sunspot graph

Although it is pleasing to have made a program that works, there is clearly still room for improvement.

Second Implementation

So, what did we learn from the prototype? We have figured out the basics of how to draw stuff with ReportLab. We have also seen how we can extract the data in a way that works well for drawing the graph. However, there are some weaknesses in the program. To position things properly, I had to add some ad hoc modifications to the values and timestamps. And the program doesn’t actually get the data from anywhere (or, rather, it “gets” the data from a list inside the program itself, rather than reading it from an outside source).

Unlike Project 1 (in Chapter 20), the second implementation won’t be much larger or more complicated than the first. It will be an incremental improvement that uses some more appropriate features from ReportLab and actually fetches its data from the Internet.

Getting the Data

As you saw in Chapter 14, you can fetch files across the Internet with the standard module urllib. Its function urlopen works in a manner quite similar to open but takes a URL instead of a file name as its argument. When you have opened the file and read its contents, you need to filter out what you don’t need. The file contains empty lines (consisting of only whitespace) and lines beginning with some special characters (# and :). The program should ignore these. (See the sample file fragment in the section “Preparations” earlier in this chapter.)

Assuming that the URL is kept in a variable called URL and that the variable COMMENT_CHARS has been set to the string '#:', you can get a list of rows (as in our original program) like this:

data = []
for line in urlopen(URL).readlines():
    line = line.decode()
    if not line.isspace() and not line[0] in COMMENT_CHARS:
        data.append([float(n) for n in line.split()])

The preceding code will include all the columns in the data list, although you aren’t particularly interested in the ones pertaining to radio flux. However, those columns will be filtered out when you extract the columns you really need (as you did in the original program).

Note

If you are using a data source of your own (or if, by the time you read this, the data format of the sunspot file has changed), you will, of course, need to modify this code accordingly.

Using the LinePlot Class

If you thought getting the data was surprisingly simple, drawing a prettier line plot isn’t much of a challenge either. In a situation like this, it’s best to thumb through the documentation (in this case, the ReportLab docs) to see if a feature that can do what you need already exists so you don’t need to implement it all yourself. Luckily, there is just such a thing: the LinePlot class from the module reportlab.graphics.charts.lineplots. Of course, you could have looked for this to begin with, but in the spirit of rapid prototyping, you just used what was at hand to see what you could do. Now it’s time to go one step further.

The LinePlot is instantiated without any arguments, and then you set its attributes before adding it to the Drawing. The main attributes you need to set are x, y, height, width, and data. The first four should be self-explanatory; the latter is simply a list of point lists, each a list of tuples, like the one you used in your PolyLines.

To top it off, let’s set the stroke color of each line. The final code is shown in Listing 21-3. The resulting figure (which will, of course, look quite a bit different with different input data) is shown in Figure 21-4.

A326949_3_En_21_Fig4_HTML.jpg
Figure 21-4. The final sunspot graph

Listing 21-3. The Final Sunspot Program (sunspots.py)
from urllib.request import urlopen
from reportlab.graphics.shapes import *
from reportlab.graphics.charts.lineplots import LinePlot
from reportlab.graphics.charts.textlabels import Label
from reportlab.graphics import renderPDF


URL = 'ftp://ftp.swpc.noaa.gov/pub/weekly/Predict.txt'
COMMENT_CHARS = '#:'


drawing = Drawing(400, 200)
data = []
for line in urlopen(URL).readlines():
    line = line.decode()
    if not line.isspace() and line[0] not in COMMENT_CHARS:
        data.append([float(n) for n in line.split()])


pred = [row[2] for row in data]
high = [row[3] for row in data]
low = [row[4] for row in data]
times = [row[0] + row[1]/12.0 for row in data]


lp = LinePlot()
lp.x = 50
lp.y = 50
lp.height = 125
lp.width = 300
lp.data = [list(zip(times, pred)),
           list(zip(times, high)),
           list(zip(times, low))]
lp.lines[0].strokeColor = colors.blue
lp.lines[1].strokeColor = colors.red
lp.lines[2].strokeColor = colors.green


drawing.add(lp)

drawing.add(String(250, 150, 'Sunspots',
            fontSize=14, fillColor=colors.red))


renderPDF.drawToFile(drawing, 'report2.pdf', 'Sunspots')

Further Exploration

Many graphics and plotting packages are available for Python. One good alternative to ReportLab is PYX, which I mentioned earlier in this chapter. Using either ReportLab or PYX (or some other package), you could try to incorporate automatically generated graphics into a document (perhaps generating parts of that as well). You could use some of the techniques from Chapter 20 to add markup to the text. If you want to create a PDF document, then Platypus, a part of ReportLab, is useful for that. (You could also integrate the PDF graphics with some typesetting system such as LATEX.) If you want to create web pages, there are ways of creating pixmap graphics (such as GIF or PNG) using Python as well—just do a web search on the topic.

If your primary goal is to plot data (which is what we did in this project), you have many alternatives to ReportLab and PYX. One good option is Matplotlib/pylab ( http://matplotlib.org ), but a lot of other similar packages are available.

What Now?

In the first project, you learned how to add markup to a plain-text file by creating an extensible parser. In the next project, you’ll learn about analyzing marked-up text (in XML) by using parser mechanisms that already exist in the Python standard library. The goal of the project is to use a single XML file to specify an entire web site, which will then be generated automatically (with files, directories, added headers, and footers) by your program. The techniques you learn in the next project will be applicable to XML parsing in general, and given the ubiquity of XML, that can’t hurt.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset