3.6. First Python Application

Now that we are familiar with the syntax, style, variable assignment, and memory allocation, it is time to look at a more complex example of Python programming. This program uses several constructs that we have not yet introduced, but we believe that Python is simple and elegant enough that the reader can draw the appropriate conclusions by examining the code.

The source file we will be looking at is fgrepwc.py, named in honor of the two Unix utilities of which this program is a hybrid. fgrep is a simple string searching command. It reads a text file line by line and outputs any line in which the search string appears. Note that a string may appear more than once on a line. wc is another Unix command; this one counts the number of characters, words, and lines of an input text file.

Our version does a little of both. It requires a search string and a filename, and outputs all lines with a match and concludes by displaying the total number of matching lines found. Because a string may appear more than once on a line, we have to state that the count is a strict number of lines that match rather than the total number of times a search string appears in a text file. (One of the exercises at the end of the chapter requires the reader to “upgrade” the program so that the output is the total number of matches.)

One other note before we take a look at the code: The normal convention for source code in this text is to leave out all comments, and place the annotated version on the CD-ROM. However, we will include comments for this example to aid you as you explore your first longer Python script with features we have yet to introduce.

We now introduce fgrepwc.py, found below as Listing 3.1, and provide analysis immediately afterward.

Listing 3.1. File Find (fgrepwc.py)

This application looks for a search word in a file and displays each matching line as well as a summary of how many matching lines were found.

1  #!/usr/bin/env python
2
3  "fgrepwc.py -- searches for string in text file"
4
5  import sys
6  import string
7
8  # print usage and exit
9  def usage():
10     print "usage:  fgrepwc [ -i ] string file"
11     sys.exit(1)
12
13 # does all the work
14 def filefind(word, filename):
15
16     # reset count of matching lines
17     count = 0
18
19     # can we open file? if so, return file handle
20     try:
21         fh = open(filename, 'r')
22
23     # if not, exit
24     except:
25         print filename, ":",sys.exc_info()[1]
26         usage()
27
28     # read all file lines into list and close
29     allLines = fh.readlines()
30     fh.close()
31
32     # iterate over all lines of file
33     for eachLine in allLines:
34
35         # search each line for the word
36         if string.find(eachLine, word) > -1:
37             count = count + 1
38             print eachLine,
39
40     # when complete, display line count
41     print count
42
43 # validates arguments and calls filefind()
44 def checkargs():
45
46     # check args; 'argv' comes from 'sys' module
47     argc = len(sys.argv)
48     if argc != 3:
49         usage()
50
51     # call fgrepwc.filefind() with args
52     filefind(sys.argv[1], sys.argv[2])
53
54 # execute as application
55 if __name__ == '__main__':
56     checkargs()

Lines 1–3

The Unix startup line is followed by the module documentation string. If you import the fgrepwc module from another module, this string can be accessed with fgrepwc.__doc__. This is a key feature because it makes previously static text information available in a dynamic execution environment. Run-time access of this kind is usually the only use of the documentation string, although it can double as a comment conveniently located at the top of a file. (We invite the reader to take a look at the documentation string at the commencement of the cgi module in the standard library for a serious example of module documentation.)
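As a minimal sketch of how a documentation string is retrieved at run time, the following uses the standard string module and a hypothetical function named greet(), which is not part of fgrepwc.py. Unlike Listing 3.1, the sketch calls print() with parentheses and a single argument, on the assumption that this form is accepted by your interpreter.

```python
import string

def greet():
    "return a fixed greeting"     # documentation string of the hypothetical greet()
    return "hello"

# a module's docstring is the first string literal in its source file
module_doc = string.__doc__

# functions carry a documentation string in the same way
function_doc = greet.__doc__

print(function_doc)
```

The same attribute lookup, fgrepwc.__doc__, would apply to our script once it has been imported.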

Lines 5–6

We've already seen the sys and string modules. The sys module contains mostly variables and functions that represent interaction between the Python interpreter and the operating system. You will find items in here such as the command-line arguments, the exit() function, the module search path (sys.path, which is initialized in part from the PYTHONPATH environment variable), the standard files, and information on errors.

The string module contains practically every function you'll need in processing strings, such as integer conversion via atoi() (and related functions), various string variables, and other string manipulation functions.

The main motivation to provide modules to import is to keep the language small, light, fast, and efficient, and bring in only software that you need to get the job done. Plug'n'play with only the modules you need. Perl and Java have a similar setup, importing modules, packages, and the like, and to a certain extent so do C and C++ with the inclusion of header files.

Lines 8–11

We declare a function called usage() here which has no arguments/parameters. The purpose of this function is to simply display a message to the user indicating the proper command-line syntax with which to initiate the script, and exit the program with the exit() function, found in the sys module. We also mentioned that in the Python namespace, calling a function from an imported module requires a “fully-qualified” name. All imported variables and functions have the following formats: module.variable or module.function(). Thus we have sys.exit().

An alternative from-import statement allows the import of specific functions or variables from a module, bringing them into the current namespace. If this method of importing is used, only the attribute name is necessary.

For example, if we wanted to import only the exit() function from sys and nothing else, we could use the following replacement:

    from sys import exit

Then in the usage() function, we would call exit(1) and leave off the “sys.” prefix. One final note about exit(): The argument to sys.exit() serves the same purpose as the argument to the C exit() function: it is the return value handed back to the calling program, usually a command-line shell. With that said, we point out that this “protocol” of printing usage and exiting applies only to command-line driven applications.
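The from-import variant and the exit status can be sketched together as follows. This is an illustration under stated assumptions rather than a change to Listing 3.1: it relies on the fact that sys.exit() works by raising the SystemExit exception, and it uses the "except ... as" spelling and parenthesized print(), which assume a reasonably recent interpreter. The names err and status are ours.

```python
from sys import exit    # only exit() is brought into the current namespace

def usage():
    print("usage:  fgrepwc [ -i ] string file")
    exit(1)             # no "sys." prefix needed

# sys.exit() raises SystemExit; catching it here lets us inspect
# the status value without actually ending the interpreter
try:
    usage()
except SystemExit as err:
    status = err.code

print(status)
```

In a real script we would not catch SystemExit; the value 1 would simply be returned to the shell that launched the program.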

In web-based applications, this would not be the preferred way to quit a running program, because the calling web browser expects a valid HTML response. For web applications, it is more appropriate to output an error message formatted in HTML so that end-users can correct their input. In short, a web application should not terminate with an error. Exiting the program sends a system or browser error to the user, which is incorrect behavior, and the responsibility for avoiding it falls on the web application developer.

The same theory applies to GUI-based applications, which should not “crash out” of their executing window. The correct way to handle errors in such applications is to bring up an error dialog and notify the user and perhaps allow for a parameter change which may rectify the situation.

Lines 13–41

The core part of our Python program is the filefind() function. filefind() takes two parameters: the word the user is searching for, and the name of the file to search.

A counter is kept to track the total number of successful matches (the number of lines that contain the word). The next step is to open the file. The try-except construct is used to “catch” errors which may occur when attempting to open the file. One of Python's strengths is its ability to let the programmer handle errors and take appropriate action rather than simply exiting the program. This results in a more robust application and a more acceptable way of programming. Chapter 10 is devoted to errors and exceptions.
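To see the try-except construct in isolation, consider the sketch below. The helper open_or_report() and the file path are hypothetical, and the sketch names IOError explicitly instead of using the bare except of Listing 3.1, so that only file-related failures are caught; that is our choice for the illustration, not the listing's.

```python
import sys

def open_or_report(filename):
    # hypothetical helper, not part of fgrepwc.py
    try:
        return open(filename, 'r')
    except IOError:
        # sys.exc_info()[1] is the exception currently being handled
        print("%s : %s" % (filename, sys.exc_info()[1]))
        return None

# a path we assume does not exist on your system
fh = open_or_report("/no/such/directory/missing.txt")
```

Because the error is handled, the program continues after the failed open() and can decide for itself what to do next.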

Barring any errors, the goal of this part of the function is to open a file, read all of its lines into a buffer that can be processed later, and close the file. We took a sneak peek at files earlier, but to recap, the open() built-in function returns a file object, or file handle, on which all subsequent operations, such as readlines() and close(), are performed.
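The open, read, and close sequence can be tried on a small scratch file, as in the following sketch. The temporary file and its two lines of text are made up for the illustration, and the tempfile module is assumed to be available in your installation.

```python
import os
import tempfile

# create a scratch file containing two lines of sample text
fd, path = tempfile.mkstemp()
os.close(fd)
fh = open(path, 'w')
fh.write("first line\nsecond line\n")
fh.close()

# the same three steps used in filefind(): open, read everything, close
fh = open(path, 'r')
allLines = fh.readlines()
fh.close()

os.remove(path)             # clean up the scratch file
print(len(allLines))
```

Each element of the list returned by readlines() keeps its trailing newline character. That is why line 38 of Listing 3.1 ends its print statement with a comma: the comma suppresses the extra newline that print would otherwise add.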

The final part of the function involves iterating through each line, looking for the target word. Searching is accomplished using the find() function from the string module. find() returns the starting character position (index) if there is a match, or -1 if the string does not appear in the line. All successful matches are tallied and matching lines are displayed to the user.
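The return values of find() are easy to check interactively. The sketch below uses the find() string method rather than the string module function of Listing 3.1, on the assumption that your Python version provides string methods; the sample line is our own.

```python
line = "the quick brown fox\n"

hit = line.find("quick")      # index at which the match starts
miss = line.find("slow")      # -1 signals that there is no match

# a match at the very start of the line has index 0, which is why
# the listing compares against -1 instead of testing for a true value
at_start = line.find("the")

print(hit)
print(miss)
print(at_start)
```

Here hit is 4, miss is -1, and at_start is 0.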

filefind() concludes by displaying the total number of matching lines that were found.

Lines 43–52

The last function in our program is checkargs(), which does exactly two things: it checks for the correct number of command-line arguments and calls filefind() to do the real work. The command-line arguments are stored in the sys.argv list. The first element is the program name, the second is the string we are looking for, and the last is the name of the file to search.
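The layout of the argument list can be sketched with a hypothetical helper, parse_args(), that receives a list shaped like sys.argv. The helper and the sample arguments are ours; they only illustrate the check performed by checkargs().

```python
def parse_args(argv):
    # hypothetical helper: returns (word, filename), or None if the
    # list does not hold exactly a program name plus two arguments
    if len(argv) != 3:
        return None
    return argv[1], argv[2]

# shaped like sys.argv for:  fgrepwc.py Python notes.txt
good = parse_args(["fgrepwc.py", "Python", "notes.txt"])

# one argument short, so the check fails
bad = parse_args(["fgrepwc.py", "Python"])

print(good)
print(bad)
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, makes the check easy to exercise without launching the script from a shell.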

Lines 54–56

This is the special code we alluded to earlier: the code that determines (based on __name__) which course of action to take depending on whether this script was imported or executed directly. With the boilerplate if statement, we can be sure that checkargs() is not executed if this module is imported, nor would we want it to be. If it were executed on import, it would most likely exit anyway, because the check for the command-line arguments would fail. If the code did not have the if statement and the main body of code consisted of just the single line to call checkargs(), then checkargs() would be executed whether this module was imported or executed directly.
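The role of __name__ can be observed with the short sketch below. The checkargs() shown here is only a stand-in that returns a string, not the function from Listing 3.1, and imported_name is a name we introduce for the illustration.

```python
import string

# when a module is imported, its __name__ is the module's own name
imported_name = string.__name__

def checkargs():
    # stand-in for the real checkargs() in Listing 3.1
    return "checkargs() called"

# when a file is run directly, its __name__ is '__main__'
if __name__ == '__main__':
    print(checkargs())

print(imported_name)
```

Saved to a file and run directly, the sketch calls checkargs(); imported from another module, it defines the function but does not call it.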

One final note regarding fgrepwc.py. This script was created to run from the command-line. Some work would be required, specifically interface changes, if you wanted to execute this from a GUI or web-based environment.

The example we just looked at was fairly complex, but hopefully it was not a complete mystery, with the help of our comments in this section as well as any previous programming experience you may have brought. In the next chapter, we will take a closer look at Python objects, the standard data types, and how we can classify them.
