Chapter 4. Managing Files

As with any well-developed scripting language, Python is well equipped to manage and manipulate files directly. Python includes several built-in functions, as well as additional modules, to help manage files. These functions and modules provide the versatility and power to handle file parsing, data storage and retrieval, filesystem management, and archive management.

It’s not possible to adequately address all the file management features of Python in this book; however, this chapter provides the most common phrases to create and use files, manage files on the filesystem, and archive files for storage or distribution.

Opening and Closing Files

Example . 

file = open(inPath, 'rU')
file = open(outPath, 'wb')
file.close()

To use most of the built-in file functions in Python, you first need to open the file, perform whatever file operations are necessary, and then close it. Python uses the simple open(path [,mode [,buffersize]]) call to open files for both reading and writing. The path is a path string pointing to the file. The mode determines how the file will be opened, as shown in Table 4.1.

Table 4.1. File Modes for Python’s Built-In File Functions

Mode    Description
r       Opens an existing file for reading.
w       Opens a file for writing. If the file already exists, the contents are deleted. If the file does not already exist, a new one is created.
a       Opens a file for appending. The existing contents are kept intact, and new data is written at the end. If the file does not already exist, a new one is created.
r+      Opens an existing file for both reading and writing. The existing contents are kept intact.
w+      Opens a file for both writing and reading. The existing contents are deleted.
a+      Opens a file for both reading and appending. The existing contents are kept intact, and new data is written at the end.
b       Is applied in addition to one of the read, write, or append modes. Opens the file in binary mode.
U       Is applied in addition to the read mode (for example, rU). Applies the “universal” newline translator to the file as it is opened.

The optional buffersize argument specifies which buffering mode should be used when accessing the file. 0 indicates that the file should be unbuffered, 1 indicates line buffering, and any larger positive number indicates a specific buffer size to be used when accessing the file. Buffering improves performance because part of the file is cached in memory. Omitting this argument or specifying a negative number causes the system default buffer size to be used.

After using the file, you should close it by calling the close() method of the file object. This frees up system resources and keeps the file from being held open any longer than necessary.
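The following is a minimal sketch of how the w, a, and r+ modes from Table 4.1 treat existing contents. It is written for Python 3, whereas the listings in this chapter use Python 2 syntax, and it uses the with statement so that each file is closed automatically. The function name demo_modes and the file name modes_demo.txt are assumptions made for illustration only.

```python
import os
import tempfile

def demo_modes(path):
    """Write to path with w, a, and r+ and return the final contents."""
    # 'w' creates the file, or truncates it if it already exists
    with open(path, "w") as f:
        f.write("first\n")
    # 'a' keeps the existing contents and writes at the end
    with open(path, "a") as f:
        f.write("second\n")
    # 'r+' keeps the contents; writing starts at the beginning of the file
    with open(path, "r+") as f:
        f.write("FIRST")
    with open(path, "r") as f:
        return f.read()

# Hypothetical scratch file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    result = demo_modes(os.path.join(tmp, "modes_demo.txt"))
    print(repr(result))
```

Leaving a with block closes the file even if an exception is raised inside it, so no explicit close() call is needed.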

Note

Using the universal newline mode U is extremely useful if you need to deal with files created by applications that are not consistent in managing newline characters. The universal newline mode converts all the different variations (\r, \n, \r\n) to the standard \n character.

inPath = "input.txt"
outPath = "output.txt"

#open() raises IOError on failure rather than
#returning a false value, so catch the exception

#Open a file for reading
try:
    file = open(inPath, 'rU')
    # read from file here (see Reading an Entire File
    # later in this chapter for more info)
    file.close()
except IOError:
    print "Error Opening File."

#Open a file for writing
try:
    file = open(outPath, 'wb')
    # write to file here (see Writing a File later
    # in this chapter for more info)
    file.close()
except IOError:
    print "Error Opening File."

open_file.py

Reading an Entire File

Example . 

buffer += open(filePath, 'rU').read()
inList = open(filePath, 'rU').readlines()
while(1):
    bytes = file.read(5)
    if bytes:
        buffer += bytes

Python provides several methods to read the entire contents of a file. The first is to open the file and call the read() function. This will read the entire contents of the file until an EOF marker is encountered and returns the contents of the file as a string.

Another method to read an entire file is to use the readlines() function. This reads the entire contents of the file, separating each line into individual strings, until an EOF marker is encountered. Once the end of the file is found, a list of strings representing each line is returned.

In the case of very large files, you might want to read only a specific number of bytes at a time. Use the read(bytes) function to read up to that many bytes from the file and return them as a string, which can then be processed more easily. If the end of the file has already been reached, an empty string is returned.

The code in read_file.py demonstrates how to read the entire contents at once, one line at a time, as well as a specific number of bytes from a file.

filePath = "input.txt"

#Read entire file into a buffer
buffer = "Read buffer:\n"
buffer += open(filePath, 'rU').read()
print buffer

#Read lines into a buffer
buffer = "Readline buffer:\n"
inList = open(filePath, 'rU').readlines()
print inList
for line in inList:
    buffer += line
print buffer

#Read bytes into a buffer
buffer = "Read buffer:\n"
file = open(filePath, 'rU')
while(1):
    bytes = file.read(5)
    if bytes:
        buffer += bytes
    else:
        break

print buffer

read_file.py

Read buffer:
Line 1
Line 2
Line 3
Line 4

['Line 1\n', 'Line 2\n', 'Line 3\n', 'Line 4\n']
Readline buffer:
Line 1
Line 2
Line 3
Line 4

Read buffer:
Line 1
Line 2
Line 3
Line 4

Output from read_file.py code
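The fixed-size read loop above can also be expressed as a generator, which is convenient for large files because only one chunk is held in memory at a time. The sketch below is written for Python 3 (the chapter's listings use Python 2 syntax); the function name read_in_chunks and the sample file are assumptions for illustration.

```python
import os
import tempfile
from functools import partial

def read_in_chunks(path, chunk_size=5):
    """Yield successive chunks of at most chunk_size characters."""
    with open(path, "r") as f:
        # iter() keeps calling f.read(chunk_size) until it returns
        # the empty string, which signals the end of the file
        for chunk in iter(partial(f.read, chunk_size), ""):
            yield chunk

# Hypothetical sample file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "w") as f:
        f.write("Line 1\nLine 2\n")
    chunks = list(read_in_chunks(sample))
    print(chunks)
```

Joining the chunks reproduces the original contents, so the caller can process each piece as it arrives or reassemble the whole file.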

Reading a Single Line from a File

Example . 

print linecache.getline(filePath, 1)
print linecache.getline(filePath, 3)
linecache.clearcache()

The linecache module in Python is an extremely useful tool if you need to access specific lines in certain files multiple times. The linecache module caches the lines of a file in memory the first time they are read. Although this does not provide any advantage the first time the file is accessed, it speeds up subsequent accesses considerably.

The getline(filename, lineno) function of the linecache module accepts a filename and line number as its arguments. It then reads the line from the file, caches it in memory for later use, and then returns a string representation of the line. The clearcache() function of the linecache module frees up the cache memory by removing all lines that have been previously read.

import linecache
filePath = "input.txt"

print linecache.getline(filePath, 1)
print linecache.getline(filePath, 3)
linecache.clearcache()

line_cache.py

Line 1

Line 3

Output from line_cache.py code
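One detail worth seeing in code is how getline() behaves when the requested line does not exist. The sketch below, written for Python 3 rather than the Python 2 syntax of the listing above, uses a hypothetical three-line sample file.

```python
import linecache
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "w") as f:
        f.write("Line 1\nLine 2\nLine 3\n")

    # Lines are numbered starting at 1
    first = linecache.getline(sample, 1)
    # A line number past the end yields an empty string, not an exception
    missing = linecache.getline(sample, 99)
    # Release the cached lines once they are no longer needed
    linecache.clearcache()

print(repr(first))
print(repr(missing))
```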

Accessing Each Word in a File

Example . 

file = open(filePath, 'rU')
for line in file:
    for word in line.split():
        wordList.append(word)

A useful tool when processing files is to separate each word in the file and process them one at a time. The words can be individually processed by opening the file, reading each line into a string, and then splitting the strings into words using the split() function.

The program read_words.py shows a simple example of reading a file and processing the words one at a time. The lines in the file are processed one at a time using a for loop. The split() function splits the line into a list of words on any run of whitespace because no other character was passed as the separator argument. Once the words are separated, they can be individually processed into lists, dictionaries, and so on.

filePath = "input.txt"
wordList = []
wordCount = 0

#Read lines into a list
file = open(filePath, 'rU')
for line in file:
    for word in line.split():
        wordList.append(word)
        wordCount += 1
print wordList
print "Total words = %d" % wordCount

read_words.py

['Line', '1', 'Line', '2', 'Line', '3', 'Line', '4']
Total words = 8

Output from read_words.py code
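As one example of processing words into a dictionary, the following sketch counts how often each word occurs. It is written for Python 3 (the listing above is Python 2), and the function name count_words and the sample file are assumptions for illustration.

```python
import os
import tempfile

def count_words(path):
    """Return a dictionary mapping each word in the file to its count."""
    counts = {}
    with open(path, "r") as f:
        for line in f:
            # split() with no argument splits on any run of whitespace
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts

# Hypothetical sample file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "w") as f:
        f.write("Line 1\nLine 2\n")
    word_counts = count_words(sample)
    print(word_counts)
```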

Writing a File

Example . 

file.writelines(wordList)
file.write("\n\nFormatted text:\n")
print >>file,"\t%s Color Adjust" % word

Just as with reading the contents of a file, there are several ways to write data out to a file. The easiest, yet the most dynamic and powerful, is the write(string) function. The write function writes the string argument to the file at the current file pointer. Although the write function itself is relatively simple, the power of Python with regard to string manipulation makes the capabilities of the write function virtually limitless.

Python provides the writelines(sequence) function to save time writing a list of data out to the file. The writelines function typically accepts a list of strings and writes those strings to the file. It does not add line separators, so any newlines must already be part of the strings.

Another option available in Python is to redirect the output of the print statement to a file using the >> redirection operator. This allows you to use the versatility of the print statement to format and write data out to a file.

wordList = ["Red", "Blue", "Green"]
filePath = "output.txt"

#Write a list to a file
#(the U flag is valid only with read modes)
file = open(filePath, 'w')
file.writelines(wordList)

#Write a string to a file
file.write("\n\nFormatted text:\n")

#Print directly to a file
for word in wordList:
    print >>file,"\t%s Color Adjust" % word

file.close()

write_file.py

RedBlueGreen

Formatted text:
       Red Color Adjust
       Blue Color Adjust
       Green Color Adjust

Contents of output.txt file
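Because writelines() writes the strings exactly as given, the words in the output above run together on one line. The sketch below shows one way to put each item on its own line, along with the Python 3 form of printing to a file, print(..., file=f), which replaces the Python 2 >> syntax used in the listing. The function name write_words is an assumption for illustration.

```python
import os
import tempfile

word_list = ["Red", "Blue", "Green"]

def write_words(path, words):
    """Write each word on its own line, then a formatted line per word."""
    with open(path, "w") as f:
        # writelines() adds no separators, so append the newline ourselves
        f.writelines(word + "\n" for word in words)
        # Python 3 form of redirecting print output to a file
        for word in words:
            print("\t%s Color Adjust" % word, file=f)

# Hypothetical output file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "output.txt")
    write_words(out_path, word_list)
    with open(out_path, "r") as f:
        contents = f.read()

print(contents)
```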

Determining the Number of Lines in a File

Example . 

lineCount = len(open(filePath, 'rU').readlines())
print "File %s has %d lines." % (filePath,
lineCount)

When parsing files using Python, it’s useful to know exactly how many lines are contained in the file. The example in file_lines.py shows a simple method to determine the number of lines contained in a file by first opening it, and then using readlines() to generate a list of lines and using the len() function to determine the number of lines in the list.

Note

For large files, using readlines() to generate a list of the lines in a file might be impractical because of the amount of memory and processing time necessary.

filePath = "input.txt"

lineCount = len(open(filePath, 'rU').readlines())
print "File %s has %d lines." % (filePath,
lineCount)

file_lines.py

File input.txt has 4 lines.

Output from file_lines.py code
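For the large-file case mentioned in the note, one alternative is to iterate over the file object and count lines as they go by, so that the full list of lines is never built in memory. This is a sketch in Python 3 (the listing above is Python 2); the function name count_lines and the sample file are assumptions for illustration.

```python
import os
import tempfile

def count_lines(path):
    """Count lines by iterating over the file one line at a time."""
    with open(path, "r") as f:
        # The generator yields 1 per line; only one line is held at a time
        return sum(1 for _ in f)

# Hypothetical four-line sample file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "w") as f:
        f.write("Line 1\nLine 2\nLine 3\nLine 4\n")
    line_count = count_lines(sample)
    print("File has %d lines." % line_count)
```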

Walking the Directory Tree

Example . 

tree = os.walk(path)
for directory in tree:
    printDirectory(directory)

Python provides a powerful directory tree-walking function in the os module. The walk(path) function walks the directory tree and, for each directory in the tree, produces a three-tuple containing (1) the dirpath, (2) a list of dirnames, and (3) a list of filenames.

The tuples can be processed one at a time by iterating over the value returned by walk(). For each tuple, you can access the path to the directory it represents by using index 0 into the tuple. Lists of the subdirectories and files contained in the directory can likewise be accessed using indexes 1 and 2, respectively.

The example in dir_tree.py shows how to use the os.walk(path) function to walk a directory tree and print out a formatted listing of the tree.

import os
path = "/books/python"

def printFiles(dirList, spaceCount):
    for file in dirList:
        print "/".rjust(spaceCount+1) + file

def printDirectory(dirEntry):
    print dirEntry[0] + "/"
    printFiles(dirEntry[2], len(dirEntry[0]))

tree = os.walk(path)
for directory in tree:
    printDirectory(directory)

dir_tree.py

/books/python/
             /Python Proposal.doc
             /Python_Phrasebook_TOC.doc
             /python_schedule.xls
             /template.doc
             /TOC_Notes.doc
/books/python\CH2/
                 /ch2.doc
/books/python\CH2\code/
                      /comp_str.py
                      /end_str.py
                      /eval_str.py
                      /format_str.py
                      /join_str.py
                      /output.txt
                      /replace_str.py
                      /search_str.py
                      /split_str.py
                      /trim_str.py
                      /unicode_str.py
                      /var_str.py
/books/python\CH3/
                 /ch3.doc

Output from dir_tree.py code
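The three-tuple can also be unpacked directly in the for statement, which avoids indexing with 0, 1, and 2. The following sketch is written for Python 3 (the listing above is Python 2) and builds full paths with os.path.join; the function name list_tree and the small sample tree are assumptions for illustration.

```python
import os
import tempfile

def list_tree(root):
    """Return the full path of every file under root, sorted."""
    paths = []
    # Each item is (dirpath, dirnames, filenames) for one directory
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)

# Hypothetical tree: one file at the top and one in a subdirectory
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "CH2"))
    for rel in ("template.doc", os.path.join("CH2", "ch2.doc")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x\n")
    found = [os.path.relpath(p, tmp) for p in list_tree(tmp)]
    print(found)
```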

Renaming Files

Example . 

os.remove(newFileName)
os.rename(oldFileName, newFileName)

A common task when parsing files using Python is to either delete the file or at least rename it once the data has been processed. The easiest way to accomplish this is to use the os.remove(newFile) and os.rename(oldFile, newFile) functions in the os module.

The example in ren_file.py shows how to rename a file by first detecting whether the new filename already exists and, if so, removing the existing file. Once the existing file has been removed, the rename function can be used to rename the file.

import os

oldFileName = "/books/python/CH4/code/output.txt"
newFileName = "/books/python/CH4/code/output.old"

#Old Listing
for file in os.listdir("/books/python/CH4/code/"):
    if file.startswith("output"):
        print file

#Remove file if the new name already exists
if os.access(newFileName, os.F_OK):
    print "Removing " + newFileName
    os.remove(newFileName)

#Rename the file
os.rename(oldFileName, newFileName)

#New Listing
for file in os.listdir("/books/python/CH4/code/"):
    if file.startswith("output"):
        print file

ren_file.py

output.old
output.txt
Removing /books/python/CH4/code/output.old
output.old

Output from ren_file.py code
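The remove-then-rename pattern can be wrapped in a small helper. This sketch is written for Python 3 (the listing above is Python 2) and tests for existence with os.path.exists; the function name rename_replacing and the scratch files are assumptions for illustration.

```python
import os
import tempfile

def rename_replacing(old_name, new_name):
    """Rename old_name to new_name, removing new_name first if it exists."""
    if os.path.exists(new_name):
        os.remove(new_name)
    os.rename(old_name, new_name)

# Hypothetical scratch files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    old_name = os.path.join(tmp, "output.txt")
    new_name = os.path.join(tmp, "output.old")
    with open(old_name, "w") as f:
        f.write("new data\n")
    with open(new_name, "w") as f:
        f.write("stale data\n")

    rename_replacing(old_name, new_name)
    listing = sorted(os.listdir(tmp))
    with open(new_name, "r") as f:
        contents = f.read()

print(listing)
```

In Python 3.3 and later, os.replace(old_name, new_name) overwrites an existing destination in a single call, which may be preferable when the separate existence check is not needed.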

Recursively Deleting Files and Subdirectories

Example . 

for file in dirList:
    os.remove(dirPath + "/" + file)
for dir in emptyDirs:
    os.rmdir(dir)

To recursively delete files and subdirectories in Python, use the walk(path) function in the os module. For a more detailed description of the walk function, refer to the “Walking the Directory Tree” section earlier in this chapter.

The walk function produces a tuple for each directory in the tree that needs to be deleted. To recursively delete a tree, iterate over these tuples and delete each file contained in the files list (the third item in each tuple).

The trick is removing the directories. Because a directory cannot be removed until it is completely empty, the files must first be deleted and then the directories must be removed in reverse order, starting with the deepest subdirectory.

The example in del_tree.py shows how to use the os.walk(path) function to walk a directory tree and delete the files, and then recursively remove the subdirectories.

import os

emptyDirs = []
path = "/trash/deleted_files"

def deleteFiles(dirList, dirPath):
    for file in dirList:
        print "Deleting " + file
        os.remove(dirPath + "/" + file)

def removeDirectory(dirEntry):
    print "Deleting files in " + dirEntry[0]
    deleteFiles(dirEntry[2], dirEntry[0])
    emptyDirs.insert(0, dirEntry[0])

#Enumerate the entries in the tree
tree = os.walk(path)
for directory in tree:
    removeDirectory(directory)

#Remove the empty directories
for dir in emptyDirs:
    print "Removing " + dir
    os.rmdir(dir)

del_tree.py

Deleting files in /trash/deleted_files
Deleting 102.ini
Deleting 103.ini
Deleting 104.ini
Deleting 105.ini
Deleting 106.ini
Deleting 107.ini
Deleting 108.ini
Deleting 109.ini
Deleting files in /trash/deleted_files\Test
Deleting 111.ini
Deleting 114.ini
Deleting 115.ini
Deleting files in /trash/deleted_files\Test\Test2
Deleting 112.ini
Deleting 113.ini
Removing /trash/deleted_files\Test\Test2
Removing /trash/deleted_files\Test
Removing /trash/deleted_files

Output from del_tree.py code
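An alternative to collecting the directories in reverse order is to ask os.walk to visit the tree bottom-up with topdown=False, so each directory is reached only after its subdirectories. The sketch below is written for Python 3 (the listing above is Python 2) and assumes the tree contains no symbolic links; the function name delete_tree and the throwaway tree are assumptions for illustration.

```python
import os
import tempfile

def delete_tree(root):
    """Delete root and everything below it, deepest directories first."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        # Subdirectories were visited and removed earlier,
        # so dirpath is empty at this point
        os.rmdir(dirpath)

# Hypothetical throwaway tree: root/Test/Test2 with one file per level
root = tempfile.mkdtemp()
nested = os.path.join(root, "Test", "Test2")
os.makedirs(nested)
for folder in (root, os.path.join(root, "Test"), nested):
    with open(os.path.join(folder, "sample.ini"), "w") as f:
        f.write("x\n")

delete_tree(root)
print(os.path.exists(root))
```

The standard library's shutil.rmtree(path) function also removes an entire tree in one call and is worth considering when no per-file processing is required.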

Searching for Files Based on Extension

Example . 

for ext in pattern.split(";"):
    extList.append(ext.lstrip("*"))
....
if file.endswith(ext):
    print "/".rjust(spaceCount+1) + file

One of the most common file tasks is searching for files based on extension. The example in find_file.py shows one way to search for files based on a string of extensions. The search is handled by first creating a list of the file extensions by splitting the pattern string using the split() function.

Once the list of extensions is created, walk the directory tree and check to see whether the file’s extension matches one in the list by using the endswith(string) function on the file.

import os
path = "/books/python"
pattern = "*.py;*.doc"

#Print files that match to file extensions
def printFiles(dirList, spaceCount, typeList):
    for file in dirList:
        for ext in typeList:
            if file.endswith(ext):
                print "/".rjust(spaceCount+1) + file
                break

#Print each sub-directory
def printDirectory(dirEntry, typeList):
    print dirEntry[0] + "/"
    printFiles(dirEntry[2], len(dirEntry[0]),
typeList)

#Convert pattern string to list of file extensions
extList = []
for ext in pattern.split(";"):
    extList.append(ext.lstrip("*"))

#Walk the tree to print files
for directory in os.walk(path):
    printDirectory(directory, extList)

find_file.py

/books/python/
             /Python Proposal.doc
             /Python_Phrasebook_TOC.doc
             /template.doc
             /TOC_Notes.doc
/books/python\CH2/
                 /ch2.doc
/books/python\CH2\code/
                      /comp_str.py
                      /end_str.py
                      /eval_str.py
                      /format_str.py
                      /join_str.py
                      /replace_str.py
                      /search_str.py
                      /split_str.py
                      /trim_str.py
                      /unicode_str.py
                      /var_str.py
/books/python\CH3/
                 /ch3.doc

Output from find_file.py code
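The inner loop over extensions can be avoided because endswith() also accepts a tuple of suffixes. The following sketch, written for Python 3 (the listing above is Python 2), returns the matching paths instead of printing them; the function name find_files and the sample files are assumptions for illustration.

```python
import os
import tempfile

def find_files(root, pattern):
    """Return sorted paths under root matching a pattern like '*.py;*.doc'."""
    # Turn "*.py;*.doc" into the tuple (".py", ".doc")
    extensions = tuple(ext.lstrip("*") for ext in pattern.split(";"))
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # endswith() is true if the name ends with any suffix in the tuple
            if name.endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Hypothetical sample files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("end_str.py", "ch2.doc", "output.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("x\n")
    found_names = [os.path.basename(p) for p in find_files(tmp, "*.py;*.doc")]
    print(found_names)
```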

Creating a TAR File

Example . 

tFile = tarfile.open("files.tar", 'w')
files = os.listdir(".")
for f in files:
    tFile.add(f)

The tarfile module, included with Python, provides a set of easy-to-use methods to create and manipulate TAR files. To create a new TAR file, call the open(filename [, mode [, fileobj [, bufsize]]]) function with one of the write modes. Table 4.2 shows the different modes available when opening a TAR file.

Table 4.2. File Modes for Python’s tarfile Module

Mode        Description
r           (Default) Opens a TAR file for reading. If the file is compressed, it will be decompressed.
r:          Opens a TAR file for reading with no compression.
w or w:     Opens a TAR file for writing with no compression.
a or a:     Opens a TAR file for appending with no compression.
r:gz        Opens a TAR file for reading with gzip compression.
w:gz        Opens a TAR file for writing with gzip compression.
r:bz2       Opens a TAR file for reading with bzip2 compression.
w:bz2       Opens a TAR file for writing with bzip2 compression.

Once the TAR file has been opened in write mode, files can be added to it using the add(name [,arcname [, recursive]]) method. The add method adds the file or directory specified in name to the archive. The optional arcname argument enables you to specify what name the file should have inside the archive. The recursive argument accepts a Boolean true or false to determine whether or not to recursively add the contents of directories to the archive.

Note

To open a TAR file for sequential access only, replace the : character in the mode with a | character. The append mode is not available for the sequential access option.

import os
import tarfile

#Create Tar file
tFile = tarfile.open("files.tar", 'w')

#Add directory contents to tar file
files = os.listdir(".")
for f in files:
    tFile.add(f)

#List files in tar
for f in tFile.getnames():
    print "Added %s" % f

tFile.close()

tar_file.py

Added add_zip.py
Added del_tree.py
Added dir_tree.py
Added extract.txt
Added extract_tar.py
Added file_lines.py
Added find_file.py
Added get_zip.py
Added input.txt
Added open_file.py
Added output.old
Added read_file.py
Added read_line.py
Added read_words.py
Added ren_file.py
Added tar_file.py
Added write_file.py

Output from tar_file.py code
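As a variation on the listing above, the following sketch uses the w:gz mode from Table 4.2 and the arcname argument of add() so that only the bare file name is stored in the archive. It is written for Python 3 (the listing above is Python 2); the function name make_tar, the src directory, and the file names are assumptions for illustration.

```python
import os
import tarfile
import tempfile

def make_tar(archive_path, source_dir):
    """Add every regular file in source_dir to a gzip-compressed TAR."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(source_dir)):
            full_path = os.path.join(source_dir, name)
            if os.path.isfile(full_path):
                # arcname stores just the file name, not the full path
                tar.add(full_path, arcname=name)
    # Reopen in the default read mode, which detects the compression
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()

# Hypothetical source directory kept separate from the archive itself
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    for name in ("read_file.py", "input.txt"):
        with open(os.path.join(src, name), "w") as f:
            f.write("sample\n")
    names = make_tar(os.path.join(tmp, "files.tar.gz"), src)
    print(names)
```

Keeping the archive outside the directory being archived avoids adding the partially written archive to itself.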

Extracting a File from a TAR File

Example . 

tFile = tarfile.open("files.tar", 'r')
tFile.extract(f, extractPath)

The tarfile module includes the extract(file [, path]) method to extract files specified by the file argument and place them in the location specified by the path argument. If no path is specified, the current working directory becomes the destination.

The example in extract_tar.py opens the TAR file created in the previous phrase and extracts only the Python files to a directory called /bin/py.

import os
import tarfile

extractPath = "/bin/py"

#Open Tar file
tFile = tarfile.open("files.tar", 'r')

#Extract py files in tar
for f in tFile.getnames():
    if f.endswith(".py"):
        print "Extracting %s" % f
        tFile.extract(f, extractPath)
    else:
        print "%s is not a Python file." % f

tFile.close()

extract_tar.py

Extracting add_zip.py
Extracting del_tree.py
Extracting dir_tree.py
extract.txt is not a Python file.
Extracting extract_tar.py
Extracting file_lines.py
Extracting find_file.py
Extracting get_zip.py
input.txt is not a Python file.
Extracting open_file.py
output.old is not a Python file.
Extracting read_file.py
Extracting read_line.py
Extracting read_words.py
Extracting ren_file.py
Extracting tar_file.py
Extracting write_file.py

Output from extract_tar.py code
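The selective extraction above can be sketched as a reusable function. This version is written for Python 3 (the listing above is Python 2) and iterates over getmembers() so it can skip anything that is not a regular file. Recent Python releases accept an extraction filter argument, so the sketch passes filter="data" only when the running version supports it. The function name extract_python_files and the sample archive are assumptions for illustration.

```python
import os
import tarfile
import tempfile

def extract_python_files(archive_path, dest_dir):
    """Extract only the regular .py members of a TAR file into dest_dir."""
    # Pass the extraction filter only on versions that provide the feature
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".py"):
                tar.extract(member, dest_dir, **extra)
                extracted.append(member.name)
    return extracted

# Build a small hypothetical archive, then extract from it
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dest = os.path.join(tmp, "py")
    os.mkdir(src)
    os.mkdir(dest)
    for name in ("tar_file.py", "input.txt"):
        with open(os.path.join(src, name), "w") as f:
            f.write("sample\n")
    archive = os.path.join(tmp, "files.tar")
    with tarfile.open(archive, "w") as tar:
        for name in sorted(os.listdir(src)):
            tar.add(os.path.join(src, name), arcname=name)

    extracted = extract_python_files(archive, dest)
    on_disk = sorted(os.listdir(dest))

print(extracted)
```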

Adding Files to a ZIP File

Example . 

tFile = zipfile.ZipFile("files.zip", 'w')
files = os.listdir(".")
for f in files:
    tFile.write(f)

The zipfile module, included with Python, provides a set of easy-to-use methods to create and manipulate ZIP files. The ZipFile(filename [, mode [, compression]]) constructor creates or opens a ZIP file depending on the mode specified. The available modes for ZIP files are r, w, and a to read, write, or append, respectively. Using the w mode creates a new ZIP file, or truncates the existing file to zero length if it already exists.

The optional compression argument accepts either the ZIP_STORED (not compressed) or ZIP_DEFLATED (compressed) option to set the default compression when writing files to the archive.

Once the ZIP file has been opened in write mode, files can be added to it using the write(filename [,arcname [, compression]]) method. The write method adds the file specified in filename to the archive. The optional arcname argument enables you to specify what name the file should have inside the archive.

import os
import zipfile

#Create the zip file
tFile = zipfile.ZipFile("files.zip", 'w')

#Write directory contents to the zip file
files = os.listdir(".")
for f in files:
    tFile.write(f)

#List archived files
for f in tFile.namelist():
    print "Added %s" % f


tFile.close()

add_zip.py

Added add_zip.py
Added del_tree.py
Added dir_tree.py
Added extract.txt
Added extract_tar.py
Added files.zip
Added file_lines.py
Added find_file.py
Added get_zip.py
Added input.txt
Added open_file.py
Added output.old
Added read_file.py
Added read_line.py
Added read_words.py
Added ren_file.py
Added tar_file.py
Added write_file.py

Output from add_zip.py code
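The output above includes files.zip because the archive sits in the directory being archived. The following sketch, written for Python 3 (the listing above is Python 2), skips the archive itself and requests ZIP_DEFLATED compression, which relies on the zlib module being available. The function name make_zip and the sample files are assumptions for illustration.

```python
import os
import tempfile
import zipfile

def make_zip(archive_path, source_dir):
    """Add every regular file in source_dir, except the archive, to a ZIP."""
    archive_abs = os.path.abspath(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(source_dir)):
            full_path = os.path.join(source_dir, name)
            # Skip directories and the archive that is being written
            if not os.path.isfile(full_path):
                continue
            if os.path.abspath(full_path) == archive_abs:
                continue
            zf.write(full_path, arcname=name)
        return zf.namelist()

# Hypothetical sample files; the archive lives in the same directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("add_zip.py", "input.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("sample\n")
    names = make_zip(os.path.join(tmp, "files.zip"), tmp)
    print(names)
```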

Retrieving Files from a ZIP File

Example . 

tFile = zipfile.ZipFile("files.zip", 'r')
buffer = tFile.read("ren_file.py")

Retrieving file contents from a ZIP file is easily done using the read(filename) method of the ZipFile object. Once the ZIP file is opened in read mode, the read method is called and the contents of the specified file are returned as a string. Once the contents are returned, they can be added to a list or dictionary, printed to the screen, written to a file, or used in any number of other ways.

The example in get_zip.py opens the ZIP file created in the previous phrase, reads the Python file ren_file.py, prints the contents to the screen, and then writes the contents to a new file called extract.txt.

import os
import zipfile


tFile = zipfile.ZipFile("files.zip", 'r')

#List info for archived file
print tFile.getinfo("input.txt")

#Read zipped file into a buffer
buffer = tFile.read("ren_file.py")
print buffer

#Write zipped file contents to new file
f = open("extract.txt", "w")
f.write(buffer)
f.close()

tFile.close()

get_zip.py

<zipfile.ZipInfo instance at 0x008DCB70>
import os

oldFileName = "/books/python/CH4/code/output.txt"
newFileName = "/books/python/CH4/code/output.old"

#Old Listing
for file in os.listdir("/books/python/CH4/code/"):
    if file.startswith("output"):
        print file

#Remove file if the new name already exists
if os.access(newFileName, os.X_OK):
    print "Removing " + newFileName
    os.remove(newFileName)

#Rename the file
os.rename(oldFileName, newFileName)

#New Listing
for file in os.listdir("/books/python/CH4/code/"):
    if file.startswith("output"):
        print file

Output from get_zip.py code
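Rather than printing the ZipInfo object itself, it is often more useful to read its attributes. The sketch below is written for Python 3, where read() returns bytes that must be decoded to obtain text (the listing above is Python 2); the small archive and its contents are assumptions for illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "files.zip")
    # Build a small hypothetical archive to read back
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("input.txt", "Line 1\nLine 2\n")

    with zipfile.ZipFile(archive, "r") as zf:
        info = zf.getinfo("input.txt")   # ZipInfo record for the member
        data = zf.read("input.txt")      # bytes in Python 3

# Decode the bytes, assuming the member holds UTF-8 text
text = data.decode("utf-8")
print(info.filename, info.file_size)
print(text)
```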
