© Magnus Lie Hetland 2017

Magnus Lie Hetland, Beginning Python, 10.1007/978-1-4842-0028-5_11

11. Files and Stuff

Magnus Lie Hetland

(1) Trondheim, Norway

So far, we’ve mainly been working with data structures that reside in the interpreter itself. What little interaction our programs have had with the outside world has been through input and print. In this chapter, we go one step further and let our programs catch a glimpse of a larger world: the world of files and streams. The functions and objects described in this chapter will enable you to store data between program invocations and to process data from other programs.

Opening Files

You can open files with the open function, which lives in the io module but is automatically imported for you. It takes a file name as its only mandatory argument and returns a file object. Assuming that you have a text file (created with your text editor, perhaps) called somefile.txt stored in the current directory, you can open it like this:

>>> f = open('somefile.txt')

You can also specify the full path to the file, if it’s located somewhere else. If it doesn’t exist, however, you’ll see an exception traceback like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'

If you wanted to create the file by writing text to it, this isn’t entirely satisfactory. The solution is found in the second argument to open.

File Modes

If you use open with only a file name as a parameter, you get a file object you can read from. If you want to write to the file, you must state that explicitly, supplying a mode. The mode argument to the open function can have several values, as summarized in Table 11-1.

Table 11-1. Most Common Values for the Mode Argument of the open Function

Value   Description
'r'     Read mode (default)
'w'     Write mode
'x'     Exclusive write mode
'a'     Append mode
'b'     Binary mode (added to other mode)
't'     Text mode (default, added to other mode)
'+'     Read/write mode (added to other mode)

Explicitly specifying read mode has the same effect as not supplying a mode string at all. The write mode enables you to write to the file and will create the file if it does not exist. The exclusive write mode goes further and raises a FileExistsError if the file already exists. If you open an existing file in write mode, the existing contents will be deleted, or truncated, and writing starts afresh from the beginning of the file; if you’d rather just keep writing at the end of the existing file, use append mode.
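
To make the difference between these modes concrete, here is a small sketch. The temporary directory and the file name demo.txt are my own choices for illustration, made so that the experiment can't clobber any real file.

```python
import os
import tempfile

# A throwaway directory keeps the demonstration away from real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as f:   # 'w' creates the file
        f.write('first\n')

    with open(path, 'w') as f:   # 'w' again truncates the old contents
        f.write('second\n')

    with open(path, 'a') as f:   # 'a' keeps the contents, writes at the end
        f.write('third\n')

    try:
        open(path, 'x')          # 'x' refuses to touch an existing file
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False

    with open(path) as f:        # default mode: read
        contents = f.read()

print(repr(contents))
print(exclusive_failed)
```

Only the text from the second and third writes survives, because the second open in write mode threw away what the first one had written.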

The '+' can be added to any of the other modes to indicate that both reading and writing are allowed. So, for example, 'r+' can be used when opening a text file for reading and writing. (For this to be useful, you will probably want to use seek as well; see the sidebar “Random Access” later in this chapter.) Note that there is an important difference between 'r+' and 'w+': the latter will truncate the file, while the former will not.
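
The difference between 'r+' and 'w+' is easy to see in a quick experiment. The following is just a sketch; the temporary directory and the file name are assumptions made for the sake of the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('Hello, World!')

    # 'r+' leaves the existing contents alone.
    with open(path, 'r+') as f:
        before = f.read()
        f.seek(0)                  # go back to the beginning of the file
        f.write('J')               # overwrite the first character
        f.seek(0)
        after_r_plus = f.read()

    # 'w+' truncates the file as soon as it is opened.
    with open(path, 'w+') as f:
        after_w_plus = f.read()

print(repr(before))
print(repr(after_r_plus))
print(repr(after_w_plus))
```

With 'r+' the old text is still there and can be overwritten in place; with 'w+' there is nothing left to read.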

The default mode is 'rt', which means your file is treated as encoded Unicode text. Decoding and encoding are then performed automatically. The default encoding is platform dependent, although it is UTF-8 on many systems. Other encodings and Unicode error-handling strategies may be set using the encoding and errors keyword arguments. (See Chapter 1 for more on Unicode.) There is also some automatic translation of newline characters. By default, lines are ended by '\n'. Other line endings ('\r' or '\r\n') are automatically replaced on reading. On writing, '\n' is replaced by the system’s default line ending (os.linesep).
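
Here is a minimal sketch of the encoding and errors arguments at work. The file name, the sample word, and the choice of Latin-1 are all assumptions of mine, picked only because the resulting bytes are not valid UTF-8.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'latin1.txt')  # hypothetical file name

    # Write the text using Latin-1 rather than the default encoding.
    with open(path, 'w', encoding='latin-1') as f:
        f.write('smørbrød')

    # Reading with the same encoding gives back the original text.
    with open(path, encoding='latin-1') as f:
        decoded = f.read()

    # Reading as UTF-8 would normally raise UnicodeDecodeError here;
    # errors='replace' substitutes U+FFFD for each undecodable byte.
    with open(path, encoding='utf-8', errors='replace') as f:
        replaced = f.read()

round_trip_ok = (decoded == 'smørbrød')
has_replacements = ('\ufffd' in replaced)
```

Stating the encoding explicitly, as in the first two calls, is often the safer habit when a file may travel between machines.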

Normally, Python uses what is called universal newline mode, where any valid newline ('\n', '\r\n', or '\r') is recognized, for example, by the readlines method, discussed later. If you wish to keep this mode but want to prevent automatic translation to and from '\n', you can supply an empty string to the newline keyword argument, as in open(name, newline=''). If you want to specify that only '\r' or '\r\n' is to be treated as a valid line ending, supply your preferred line ending instead. In this case, the line ending is not translated when reading, but '\n' will be replaced by the proper line ending when writing.
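
The effect of the newline argument on reading can be sketched as follows. I write the file in binary mode first so that the exact line endings on disk are known; the file name and contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mixed.txt')  # hypothetical file name

    # Three lines, each with a different line ending.
    with open(path, 'wb') as f:
        f.write(b'one\r\ntwo\rthree\n')

    with open(path) as f:                   # default: translate to '\n'
        translated = f.readlines()

    with open(path, newline='') as f:       # recognize all, translate none
        untranslated = f.readlines()

    with open(path, newline='\r\n') as f:   # only '\r\n' ends a line
        crlf_only = f.readlines()

print(translated)
print(untranslated)
print(crlf_only)
```

In the last case, the lone '\r' and the final '\n' are treated as ordinary characters, so the file is split into two lines rather than three.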

If your file contains nontextual, binary data, such as a sound clip or image, you certainly wouldn’t want any of these automatic transformations to be performed. In that case, you simply use binary mode ('rb', for example) to turn off any text-specific functionality.
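
As a small illustration (with a made-up file name and a made-up four-byte payload), binary mode hands back exactly the bytes that were written, including byte values that look like line endings:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.bin')  # hypothetical file name

    # Bytes 13 and 10 are '\r' and '\n'; text mode might translate them.
    payload = bytes([0, 13, 10, 255])

    with open(path, 'wb') as f:
        written = f.write(payload)   # number of bytes written

    with open(path, 'rb') as f:
        data = f.read()              # a bytes object, not a str

print(written)
print(data == payload)
```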

There are a few other, slightly more advanced optional arguments as well, for controlling buffering and working more directly with file descriptors. See the Python documentation, or run help(open) in the interactive interpreter, to find out more.

The Basic File Methods

Now you know how to open files. The next step is to do something useful with them. In this section, you learn about some basic methods of file objects and about some other file-like objects, sometimes called streams. A file-like object is simply one supporting a few of the same methods as a file, most notably either read or write or both. The objects returned by urlopen (see Chapter 14) are a good example of this. They support methods such as read and readline, but not methods such as write and isatty, for example.
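
One file-like object that is handy for experiments is io.StringIO from the standard library, which keeps its contents in memory. It isn't covered in this section, so take the following as a sketch; the function count_words is a hypothetical helper of my own.

```python
import io

def count_words(stream):
    """Count the words in anything that has a read method."""
    return len(stream.read().split())

# An in-memory stream standing in for a real file.
fake_file = io.StringIO('Your mother was a hamster')
print(count_words(fake_file))
```

Because count_words relies only on read, it should work just as well with an object returned by open.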

Reading and Writing

The most important capabilities of files are supplying and receiving data. If you have a file-like object named f, you can write data with f.write and read data with f.read. As with most Python functionality, there is some flexibility in what you use as data, but the basic classes used are str and bytes, for text and binary mode, respectively.

Each time you call f.write(string), the string you supply is written to the file after those you have written previously.

>>> f = open('somefile.txt', 'w')
>>> f.write('Hello, ')
7
>>> f.write('World!')
6
>>> f.close()

Notice that I call the close method when I’m finished with the file. You’ll learn more about it in the section “Closing Files” later in this chapter. Reading is just as simple. Just remember to tell the stream how many characters (or bytes, in binary mode) you want to read. Here’s an example (continuing where I left off):

>>> f = open('somefile.txt', 'r')
>>> f.read(4)
'Hell'
>>> f.read()
'o, World!'

First I specify how many characters to read (4), and then I simply read the rest of the file (by not supplying a number). Note that I could have dropped the mode specification from the call to open because 'r' is the default.

Piping Output

In a shell such as bash, you can write several commands after one another, linked together with pipes, as in this example:

$ cat somefile.txt | python somescript.py | sort

This pipeline consists of three commands.

  • cat somefile.txt: This command simply writes the contents of the file somefile.txt to standard output (sys.stdout).

  • python somescript.py: This command executes the Python script somescript.py. The script presumably reads from its standard input and writes the result to standard output.

  • sort: This command reads all the text from standard input (sys.stdin), sorts the lines alphabetically, and writes the result to standard output.

But what is the point of these pipe characters (|), and what does somescript.py do? The pipes link up the standard output of one command with the standard input of the next. Clever, eh? So you can safely guess that somescript.py reads data from its sys.stdin (which is what cat somefile.txt writes) and writes some result to its sys.stdout (which is where sort gets its data).

A simple script (somescript.py) that uses sys.stdin is shown in Listing 11-1. The contents of the file somefile.txt are shown in Listing 11-2.

Listing 11-1. Simple Script That Counts the Words in sys.stdin
# somescript.py
import sys
text = sys.stdin.read()
words = text.split()
wordcount = len(words)
print('Wordcount:', wordcount)

Listing 11-2. A File Containing Some Nonsensical Text
Your mother was a hamster and your
father smelled of elderberries.

Here are the results of cat somefile.txt | python somescript.py:

Wordcount: 11

Reading and Writing Lines

Actually, what I’ve been doing until now is a bit impractical. I could just as well be reading in the lines of a stream as reading letter by letter. You can read a single line (text from where you have come so far, up to and including the first line separator you encounter) with the readline method. You can use this method either without any arguments (in which case a line is simply read and returned) or with a nonnegative integer, which is then the maximum number of characters that readline is allowed to read. So if some_file.readline() returns 'Hello, World!\n', then some_file.readline(5) returns 'Hello'. To read all the lines of a file and have them returned as a list, use the readlines method.

The method writelines is the opposite of readlines: give it a list (or, in fact, any sequence or iterable object) of strings, and it writes all the strings to the file (or stream). Note that newlines are not added; you need to add those yourself. Also, there is no writeline method because you can just use write.
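
A short sketch of these three methods together might look like this. The file name and the two sample lines are assumptions; note how the newlines are added by hand before the strings are passed to writelines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'lines.txt')  # hypothetical file name
    lines = ['Hello, World!', 'Goodbye, World!']

    with open(path, 'w') as f:
        # writelines adds no newlines, so I supply them myself.
        f.writelines(line + '\n' for line in lines)

    with open(path) as f:
        partial = f.readline(5)     # at most five characters
        rest = f.readline()         # the remainder of the first line
        remaining = f.readlines()   # all the lines that are left

print(repr(partial))
print(repr(rest))
print(remaining)
```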

Closing Files

You should remember to close your files by calling their close method. Usually, a file object is closed automatically when you quit your program (and possibly before that), and not closing files you have been reading from isn’t really that important. However, closing those files can’t hurt and might help to avoid keeping the file uselessly “locked” against modification in some operating systems and settings. It also avoids using up any quotas for open files your system might have.

You should always close a file you have written to because Python may buffer (keep stored temporarily somewhere, for efficiency reasons) the data you have written, and if your program crashes for some reason, the data might not be written to the file at all. The safe thing is to close your files after you’re finished with them. If you want to reset the buffering and make your changes visible in the actual file on disk but you don’t yet want to close the file, you can use the flush method. Note, however, that flush might not allow other programs running at the same time to access the file because of locking considerations that depend on your operating system and settings. Whenever you can conveniently close the file, that is preferable.
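
The effect of flush can be observed by checking the size of the file on disk. This is only a sketch: the file name is made up, and how much Python buffers before writing is an implementation detail, so the size before the flush is printed but not something to rely on.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'log.txt')  # hypothetical file name
    f = open(path, 'w')
    try:
        f.write('buffered text')
        # Often 0 here, since the text may still sit in Python's buffer.
        size_before = os.path.getsize(path)
        f.flush()
        # After flushing, the data has been handed over to the system.
        size_after = os.path.getsize(path)
    finally:
        f.close()

print(size_before)
print(size_after)
```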

If you want to be certain that your file is closed, you could use a try/finally statement with the call to close in the finally clause.

# Open your file here
f = open('somefile.txt', 'w')
try:
    # Write data to your file
    f.write('Hello, World!')
finally:
    # This runs whether or not the write succeeded
    f.close()

There is, in fact, a statement designed specifically for this kind of situation—the with statement.

with open("somefile.txt") as somefile:
    do_something(somefile)

The with statement lets you open a file and assign it to a variable name (in this case, somefile). You then read from or write to your file (and, perhaps, do other things) in the body of the statement, and the file is automatically closed when the end of the statement is reached, even if that is caused by an exception.
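
To convince yourself that the file really is closed when an exception occurs, you could try something like the following sketch. The temporary directory and the RuntimeError are my own additions for the demonstration; the closed attribute of the file object tells you whether it has been closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'somefile.txt')

    try:
        with open(path, 'w') as somefile:
            somefile.write('partial data')
            # Simulate a failure in the middle of the with body.
            raise RuntimeError('something went wrong')
    except RuntimeError:
        pass

    # The with statement closed the file on the way out.
    closed_after_error = somefile.closed

    with open(path) as somefile:
        saved = somefile.read()

print(closed_after_error)
print(repr(saved))
```

Closing the file also flushes it, which is why the text written before the exception ends up on disk.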

Using the Basic File Methods

Assume that somefile.txt contains the text in Listing 11-3. What can you do with it?

Listing 11-3. A Simple Text File
Welcome to this file
There is nothing here except
This stupid haiku

Let’s try the methods you know, starting with read(n).

>>> f = open(r'C:\text\somefile.txt')
>>> f.read(7)
'Welcome'
>>> f.read(4)
' to '
>>> f.close()

Next up is read():

>>> f = open(r'C:\text\somefile.txt')
>>> print(f.read())
Welcome to this file
There is nothing here except
This stupid haiku
>>> f.close()

Here’s readline():

>>> f = open(r'C:\text\somefile.txt')
>>> for i in range(3):
        print(str(i) + ': ' + f.readline(), end='')
0: Welcome to this file
1: There is nothing here except
2: This stupid haiku
>>> f.close()

And here’s readlines():

>>> import pprint
>>> pprint.pprint(open(r'C:\text\somefile.txt').readlines())
['Welcome to this file\n',
'There is nothing here except\n',
'This stupid haiku']

Note that I relied on the file object being closed automatically in this example. Now let’s try writing, beginning with write(string).

>>> f = open(r'C:\text\somefile.txt', 'w')
>>> f.write('this\nis no\nhaiku')
16
>>> f.close()

After running this, the file contains the text in Listing 11-4.

Listing 11-4. The Modified Text File
this
is no
haiku

Finally, here’s writelines(list):

>>> f = open(r'C:\text\somefile.txt')
>>> lines = f.readlines()
>>> f.close()
>>> lines[1] = "isn't a\n"
>>> f = open(r'C:\text\somefile.txt', 'w')
>>> f.writelines(lines)
>>> f.close()

After running this, the file contains the text in Listing 11-5.

Listing 11-5. The Text File, Modified Again
this
isn't a
haiku

Iterating over File Contents

Now you’ve seen some of the methods file objects present to us, and you’ve learned how to acquire such file objects. One of the common operations on files is to iterate over their contents, repeatedly performing some action as you go. There are many ways of doing this, and you can certainly just find your favorite and stick to that. However, others may have done it differently, and to understand their programs, you should know all the basic techniques.

In all the examples in this section, I use a fictitious function called process to represent the processing of each character or line. Feel free to implement it in any way you like. Here’s one simple example:

def process(string):
    print('Processing:', string)

More useful implementations could do such things as storing data in a data structure, computing a sum, replacing patterns with the re module, or perhaps adding line numbers.
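
As one possible sketch of the last of these ideas, here is a version of process that adds line numbers. The factory function make_numbering_process is a hypothetical name of my own; it is there only so that the counter has somewhere to live between calls.

```python
def make_numbering_process():
    """Return a process function that prefixes each string with a number."""
    count = 0

    def process(string):
        nonlocal count
        count += 1
        # Strip the trailing newline so that print doesn't double it.
        print('{:>3}: {}'.format(count, string.rstrip('\n')))
        return count

    return process

process = make_numbering_process()
process('Welcome to this file\n')
process('There is nothing here except\n')
```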

Also, to try out the examples, you should set the variable filename to the name of some actual file.

One Character (or Byte) at a Time

One of the most basic (but probably least common) ways of iterating over file contents is to use the read method in a while loop. For example, you might want to loop over every character (or, in binary mode, every byte) in the file. You could do that as shown in Listing 11-6. If you’d rather read chunks of several characters or bytes, supply the desired length to read.

Listing 11-6. Looping over Characters with read
with open(filename) as f:
    char = f.read(1)
    while char:
        process(char)
        char = f.read(1)

This program works because when you have reached the end of the file, the read method returns an empty string, but until then, the string always contains one character (and thus has the Boolean value true). As long as char is true, you know that you aren’t finished yet.

As you can see, I have repeated the assignment char = f.read(1), and code repetition is generally considered a bad thing. (Laziness is a virtue, remember?) To avoid that, we can use the while True/break technique introduced in Chapter 5. The resulting code is shown in Listing 11-7.

Listing 11-7. Writing the Loop Differently
with open(filename) as f:
    while True:
        char = f.read(1)
        if not char: break
        process(char)

As mentioned in Chapter 5, you shouldn’t use the break statement too often (because it tends to make the code more difficult to follow). Even so, the approach shown in Listing 11-7 is usually preferred to that in Listing 11-6, precisely because you avoid duplicated code.
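
The same while True/break loop works for reading larger chunks; you just pass the chunk length to read. In this sketch, the chunk size of 4, the file name, and the list used to collect the chunks are all arbitrary choices of mine.

```python
import os
import tempfile

CHUNK_SIZE = 4   # arbitrary; real programs would use something larger
chunks = []

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'somefile.txt')
    with open(filename, 'w') as f:
        f.write('Hello, World!')

    with open(filename) as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break            # an empty string means end of file
            chunks.append(chunk)

print(chunks)
```

Note that the last chunk may be shorter than the requested length, as it is here.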

One Line at a Time

When dealing with text files, you are often interested in iterating over the lines in the file, not each individual character. You can do this easily in the same way as we did with characters, using the readline method (described earlier, in the section “Reading and Writing Lines”), as shown in Listing 11-8.

Listing 11-8. Using readline in a while Loop
with open(filename) as f:
    while True:
        line = f.readline()
        if not line: break
        process(line)

Reading Everything

If the file isn’t too large, you can just read the whole file in one go, using the read method with no parameters (to read the entire file as a string) or the readlines method (to read the file into a list of strings, in which each string is a line). Listings 11-9 and 11-10 show how easy it is to iterate over characters and lines when you read the file like this. Note that reading the contents of a file into a string or a list like this can be useful for other things besides iteration. For example, you might apply a regular expression to the string, or you might store the list of lines in some data structure for further use.

Listing 11-9. Iterating over Characters with read
with open(filename) as f:
    for char in f.read():
        process(char)

Listing 11-10. Iterating over Lines with readlines
with open(filename) as f:
    for line in f.readlines():
        process(line)

Lazy Line Iteration with fileinput

Sometimes you need to iterate over the lines in a very large file, and readlines would use too much memory. You could use a while loop with readline, of course, but in Python, for loops are preferable when they are available. It just so happens that they are in this case. You can use a method called lazy line iteration—it’s lazy because it reads only the parts of the file actually needed (more or less).

You have already encountered fileinput in Chapter 10. Listing 11-11 shows how you might use it. Note that the fileinput module takes care of opening the file. You just need to give it a file name.

Listing 11-11. Iterating over Lines with fileinput
import fileinput
for line in fileinput.input(filename):
    process(line)
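
If you also want to know which line you are on, the fileinput module keeps track of that for you through fileinput.lineno. The following sketch creates a small file of its own (the file name and contents are made up) and uses fileinput.input as a context manager so that it is closed afterward.

```python
import fileinput
import os
import tempfile

numbered = []

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'somefile.txt')
    with open(filename, 'w') as f:
        f.write('first\nsecond\n')

    with fileinput.input(filename) as lines:
        for line in lines:
            # lineno() is the number of the line that was just read.
            numbered.append((fileinput.lineno(), line))

print(numbered)
```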

File Iterators

It’s time for the coolest (and the most common) technique of all. Files are actually iterable, which means that you can use them directly in for loops to iterate over their lines. See Listing 11-12 for an example.

Listing 11-12. Iterating over a File
with open(filename) as f:
    for line in f:
        process(line)

In these iteration examples, I have used the files as context managers, to make sure my files are closed. Although this is generally a good idea, it’s not absolutely critical, as long as I don’t write to the file. If you are willing to let Python take care of the closing, you could simplify the example even further, as shown in Listing 11-13. Here, I don’t assign the opened file to a variable (like the variable f I’ve used in the other examples), and therefore I have no way of explicitly closing it.

Listing 11-13. Iterating over a File Without Storing the File Object in a Variable
for line in open(filename):
    process(line)

Note that sys.stdin is iterable, just like other files, so if you want to iterate over all the lines in standard input, you can use this form:

import sys
for line in sys.stdin:
    process(line)

Also, you can do all the things you can do with iterators in general, such as converting them into lists of strings (by using list(open(filename))), which would simply be equivalent to using readlines.

>>> f = open('somefile.txt', 'w')
>>> print('First', 'line', file=f)
>>> print('Second', 'line', file=f)
>>> print('Third', 'and final', 'line', file=f)
>>> f.close()
>>> lines = list(open('somefile.txt'))
>>> lines
['First line\n', 'Second line\n', 'Third and final line\n']
>>> first, second, third = open('somefile.txt')
>>> first
'First line\n'
>>> second
'Second line\n'
>>> third
'Third and final line\n'

In this example, it’s important to note the following:

  • I’ve used print to write to the file. This automatically adds newlines after the strings I supply.

  • I use sequence unpacking on the opened file, putting each line in a separate variable. (This isn’t exactly common practice because you usually won’t know the number of lines in your file, but it demonstrates the “iterability” of the file object.)

  • I close the file after having written to it, to ensure that the data is flushed to disk. (As you can see, I haven’t closed it after reading from it. Sloppy, perhaps, but not critical.)

A Quick Summary

In this chapter, you’ve seen how to interact with the environment through files and file-like objects, one of the most important techniques for I/O in Python. Here are some of the highlights from the chapter:

  • File-like objects: A file-like object is (informally) an object that supports a set of methods such as read and readline (and possibly write and writelines).

  • Opening and closing files: You open a file with the open function, by supplying a file name. If you want to make sure your file is closed, even if something goes wrong, you can use the with statement.

  • Modes and file types: When opening a file, you can also supply a mode, such as 'r' for read mode or 'w' for write mode. By appending 'b' to your mode, you can open files as binary files and turn off Unicode encoding and newline substitution.

  • Standard streams: The three standard files (stdin, stdout, and stderr, found in the sys module) are file-like objects that implement the UNIX standard I/O mechanism (also available in Windows).

  • Reading and writing: You read from a file or file-like object using the method read. You write with the method write.

  • Reading and writing lines: You can read lines from a file using readline and readlines. You can write lines with writelines.

  • Iterating over file contents: There are many ways of iterating over file contents. It is most common to iterate over the lines of a text file, and you can do this by simply iterating over the file itself. There are other methods too, such as using readlines, that are compatible with older versions of Python.

New Functions in This Chapter

Function          Description
open(name, ...)   Opens a file and returns a file object

What Now?

So now you know how to interact with the environment through files, but what about interacting with the user? So far we’ve used only input and print, and unless the user writes something in a file that your program can read, you don’t really have any other tools for creating user interfaces. That changes in the next chapter, where I cover graphical user interfaces, with windows, buttons, and the like.
