Chapter 13
IN THIS CHAPTER
Considering local file storage methods
Dealing with file access issues
Performing typical file access tasks
Using file management techniques CRUD style
Chapter 11 gives you a very brief look at localized file management in the “Working with devices” section of the chapter. Now it’s time to look at local files in more detail because you often use local files as part of applications — everything from storing application settings to analyzing a moderately large dataset. In fact, as you may already know, local files were the first kind of data storage that computers used; networks and the cloud came much later. Even on the smallest tablet today, you can still find local files stored in a hard-drive–like environment (although hard drives have come a very long way from those disk packs of old).
After you get past some of the general mechanics of how files are stored, you actually need to start working with them. Developers face a number of issues when working with files. For example, one of the more common problems is that a user can’t access a file because of a lack of rights. Security is a two-edged sword: it protects data by restricting access to it, but it can also keep the right people from accessing that data for the right reasons. This chapter helps you understand various file access issues and demonstrates how to overcome them.
The chapter also discusses Create, Read, Update, and Delete (CRUD), the four actions you can perform on any file for which you have the correct rights. CRUD normally appears in reference to database management, but it applies just as much to any file you might work with.
If you have worked with computers for a while, you know that the operating system handles all the details of working with files. An application requests these services of the operating system. Using this approach is important for security reasons, and it ensures that all applications can work together on the same system. If each application were allowed to perform tasks in a unique manner, the resulting chaos would make it impossible for any application to work.
The reason that operating system and other application considerations are important for the functional programming paradigm is that, unlike other tasks you might perform, file access depends on a nonfunctional, procedural third party. In most cases, you must perform a set of prescribed steps in a specific order to get any work done. As with anything, you can find exceptions, such as the functional operating systems described at http://wiki.c2.com/?PurelyFunctionalOperatingSystem and https://en.wikipedia.org/wiki/House_(operating_system). However, you have to ask yourself whether you’ve ever even heard of these operating systems. You're more likely to need to work with OS X, Linux, or Windows on the desktop and something like Android or iOS on mobile devices.
Files also have specific characteristics associated with them that vary by operating system. However, most operating systems include a creation and last modification date, file size, file type (possibly through the use of a particular file extension), and security access rights with the filename. If you plan to use your application on multiple platforms, which is becoming more common, you must create a plan for interacting with file properties in a consistent manner across platforms if possible.
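To make the idea of consistent, cross-platform file properties concrete, here is a minimal Python sketch. The describe_file function and the scratch filename are hypothetical names used only for illustration. The sketch deliberately sticks to properties that behave the same way on most platforms (size, last modification time, extension, and read/write access) and leaves out the creation date, which operating systems report differently.

```python
import os
import tempfile
from datetime import datetime, timezone

def describe_file(path):
    """Return a few file properties that behave alike on most platforms."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "extension": os.path.splitext(path)[1],
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }

# Usage: create a scratch file in a temporary folder and inspect it.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "MyData.txt")
    # newline="\n" stops Python from translating the line ending,
    # so the file size is the same on every platform.
    with open(sample, "w", newline="\n") as handle:
        handle.write("This is some test data.\n")
    properties = describe_file(sample)

print(properties["size_bytes"], properties["extension"])
```

Collecting the properties in one place like this gives the rest of an application a single, predictable view of a file, whatever the underlying platform reports.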
All the considerations described in this section come into play when performing file access, even with a functional language. However, as you see later, functional languages often rely on the use of monads to perform most file access tasks in a consistent manner across operating systems, as described for any I/O in Chapter 11. By abstracting the process of interacting with files, the functional programming paradigm actually makes things simpler.
A number of common problems arise in accessing files on a system — problems that the functional programming paradigm can’t hide. The most common problem is a lack of rights to access the file. Security issues plague not only the local drive, but every other sort of drive as well, including cloud-based storage. One of the best practices for a developer to follow is to test everything using precisely the same rights that the user will have. Unfortunately, even then you may not find every security issue, but you’ll find the vast majority of them.
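Because a lack of rights is something you can detect but not hide, it helps to report it separately from other failures. The following Python sketch shows one way to do so; the try_read function and its return convention are assumptions made for this illustration, not part of any library.

```python
import os
import tempfile

def try_read(path):
    """Return (text, None) on success or (None, reason) when access fails."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read(), None
    except FileNotFoundError:
        return None, "missing"
    except PermissionError:
        # The file exists, but the current user lacks the rights to read it.
        return None, "no rights"
    except OSError as error:
        # Any other operating system level failure.
        return None, f"other I/O problem: {error}"

# Usage: one file that exists and one that doesn't.
with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "MyData.txt")
    with open(present, "w", encoding="utf-8") as handle:
        handle.write("This is some test data.\n")

    good_text, good_reason = try_read(present)
    bad_text, bad_reason = try_read(os.path.join(folder, "NotThere.txt"))

print(good_reason, bad_reason)
```

Running the same code under the account that the user will actually have is what reveals the "no rights" branch during testing.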
Some access issues are also the result of bad information: fallacies that developers have simply believed without testing. One of these issues is the supposed difference in using the backslash on Windows and the forward slash on Linux and OS X. The truth is that you can use the forward slash on all operating systems, as described at http://blog.johnmuellerbooks.com/2014/03/10/backslash-versus-forward-slash/. All the example code in this chapter uses the forward slash when dealing with paths as a point of demonstration.
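You can check the forward-slash claim without touching the disk. This short Python sketch uses the pure path classes in pathlib, which apply Windows or POSIX path rules on any operating system; the data/reports folder names are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows path rules accept either separator.
forward = PureWindowsPath("data/reports/MyData.txt")
backward = PureWindowsPath("data\\reports\\MyData.txt")
same_on_windows = forward == backward

# POSIX path rules treat only the forward slash as a separator.
posix_forward = PurePosixPath("data/reports/MyData.txt").parts
posix_backward = PurePosixPath("data\\reports\\MyData.txt").parts

print(same_on_windows)
print(forward.parts)
print(posix_forward)
print(len(posix_backward))
```

The forward-slash form splits into the same three parts under both sets of rules, whereas the backslash form is a single odd filename under POSIX rules, which is why the forward slash is the portable choice.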
Often a developer also runs afoul of file property issues. Some of these issues are external to the file, such as mistaking one file type for another. Other issues are internal to the file, such as trying to read a UTF-7 file using code designed for UTF-8 or UTF-16, which are currently more common. Even though you can access a file when facing a property issue, the access doesn’t help because you can’t do anything with the file after you access it. As far as your application is concerned, you still lack access to the file (and in a practical sense, you do, even if you have successfully opened it).
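The following Python sketch illustrates an internal property issue of this kind. It uses a UTF-16 file read as UTF-8 (rather than the UTF-7 case mentioned in the text) because the failure is easy to reproduce; the filename is a scratch name chosen for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "MyData.txt")

    # Store the text as UTF-16.
    with open(path, "w", encoding="utf-16") as handle:
        handle.write("This is some test data.\n")

    # Opening succeeds, but decoding the content as UTF-8 fails.
    try:
        with open(path, "r", encoding="utf-8") as handle:
            handle.read()
        outcome = "read without error"
    except UnicodeDecodeError:
        outcome = "opened, but unreadable as UTF-8"

    # Reading with the matching encoding recovers the text.
    with open(path, "r", encoding="utf-16") as handle:
        recovered = handle.read()

print(outcome)
print(repr(recovered))
```

The open call itself raises no error in the failing case, which is exactly the situation described: the application has the file, yet it can't do anything useful with it.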
Specific language tools also present problems. For example, the message thread at https://github.com/haskell/cabal/issues/447 discusses issues that occur as part of the installation process using Cabal (the utility that ships with Haskell). Imagine installing a new application that you built and then finding that only administrators can use it. Unfortunately, this problem might not show up unless you test your application installation on the right version of Windows. Haskell isn’t alone in this problem; every language comes with special issues that may affect your ability to access files, so constant testing and handling of error reports is an essential part of working with files.
Understanding how the files are stored and knowing the requirements for access are the first two steps in interacting with them. If you have worked with other programming languages, you have likely worked with files in a procedural manner: obtaining a file handle, using it to open the file, and then closing the file handle when finished. The functional programming paradigm must also follow these rules, as demonstrated in Chapter 11, but working in the functional world brings different nuances, as discussed in the sections that follow.
Operating systems generally provide a number of ways of opening files. In the default method, you normally open the file and overwrite the existing content with anything new that you write. When the file doesn’t exist, the operating system automatically creates it for you. The following code shows an example of opening a file for writing and automatically creating that file when it doesn’t exist:
import System.IO as IO

main = do
    -- WriteMode creates the file or overwrites its existing content.
    handle <- openFile "MyData.txt" WriteMode
    hPutStrLn handle "This is some test data."
    hClose handle
The defining factor here is the WriteMode argument. When you use the WriteMode argument, you tell the operating system to create a new file when one doesn't exist or to overwrite any existing content. The Python equivalent to this code is
handle = open("MyData2.txt", "w")
print(handle.write("This is some test data.\n"))
handle.close()
Notice that when using Python, you use the "w" argument to access the write mode. In addition, the write method doesn’t add a line ending for you; you add it manually by using the \n escape. Adding the print function lets you see how many characters Python writes to the file.
When you have an existing file, you can read, append, update, and delete the data it contains. Even though you will create new files when writing an application, most applications spend more time opening existing files in order to manage content in some way. For the application to perform data-management tasks, the file must exist. Even if you think that the file exists, you must verify its presence because the user or another application may have deleted it, or the user may not have followed protocol and created it, or sunspots could have damaged the file directory entry on disk, or …. The list of reasons why the file you thought was there really isn’t can become quite long. The process of data management can become complex because you often perform searches for specific content as well. However, the initial task focuses on simply opening the file.
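One way to verify a file's presence before working with it is sketched here in Python. The open_existing helper is a hypothetical name. The check matters most for the append and write modes, which would otherwise create a missing file silently, and the sketch still guards the open call because the file can disappear between the check and the open.

```python
import tempfile
from pathlib import Path

def open_existing(path, mode="r"):
    """Open path only when it is already a regular file; otherwise return None."""
    candidate = Path(path)
    if not candidate.is_file():
        return None
    try:
        return candidate.open(mode)
    except FileNotFoundError:
        # The file vanished between the check and the open call.
        return None

# Usage: one file that exists and one that doesn't.
with tempfile.TemporaryDirectory() as folder:
    present = Path(folder) / "MyData.txt"
    present.write_text("This is some test data.\n")
    absent = Path(folder) / "Missing.txt"

    handle = open_existing(present)
    first_line = handle.readline()
    handle.close()

    missing_result = open_existing(absent, "a")
    created_by_accident = absent.exists()

print(repr(first_line), missing_result, created_by_accident)
```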
The “Reading data” section of the chapter discusses the task of opening a file to read it, especially when you need to search for specific data. Likewise, writing, updating, and deleting data appear in the “Updating data” section of the chapter. However, the task of appending (adding content to the end of the file) is somewhat different. The following code shows how to append data to a file that already exists:
import System.IO as IO

main = do
    -- AppendMode adds new content after whatever the file already holds.
    handle <- openFile "MyData.txt" AppendMode
    hPutStrLn handle "This is some test data too."
    hClose handle
Except for the AppendMode argument, this code looks much like the code in the previous section. However, no matter how often you run the code in the previous section, the resulting file always contains just one line of text. When you run this example, you see multiple lines, as shown in Figure 13-1.
Python provides the same functionality. The following code shows the Python version, which relies on the "a" (append) mode:
handle = open("MyData2.txt", "a")
print(handle.write("This is some test data too.\n"))
handle.close()
When thinking through the process of dealing with I/O on the local system in the form of files, you have to separate the main components and deal with them individually:
People create acronyms to make remembering something easier. Sometimes those acronyms are unfortunate, as in calling operations on data CRUD. However, the people who work with databases wanted something easy to remember, so data-related tasks became CRUD. Another school of thought called the list of tasks Browse, Read, Edit, Add, and Delete (BREAD), but that particular acronym didn’t seem to stick, even though your daily BREAD might rely on your ability to employ CRUD. This chapter uses CRUD because that seems to be the most popular acronym. You can view CRUD as comprising the following tasks:
Create: Uses an IO monad operation on the combination of a handle and the associated data to place data in the file. This takes place after creating the file using another monad consisting of the IO operating on a combination of the filename and opening mode.
Read: Uses an IO monad operation on the combination of a handle and data location to retrieve specific data. In this case, the data output is the target of the task. When you don’t supply a specific location, the operation assumes either the start of the storage or the current storage location pointer value. (The location pointer is an internally maintained value that indicates the end of the last read location within the storage.)
Search: Uses an IO monad operating on the combination of a handle and search criteria. In this case, the goal is to search for specific data and retrieve a data location based on that search. Some developers view this task as a browse, rather than as a read.
Update: Uses a combination of IO monads.
Delete: Uses IO monads to perform the task.
The concept of reading data isn't merely about obtaining information from a storage container, such as a file. When a person reads a book, a lot more goes on than simple information acquisition, in many cases. Often, the person must search for the appropriate information (unless the intent is to read the entire book) and then track progress during each reading session (unless there is just one session). A computer must do the same. The following example shows how the computer tracks its current position within the file during the read:
import System.IO as IO

main = do
    handle <- openFile "MyData.txt" ReadMode
    -- Read the first line, then ask where the file pointer now sits.
    myData <- hGetLine handle
    position <- hGetPosn handle
    hClose handle
    putStrLn myData
    putStrLn (show position)
Here, the application performs a read using hGetLine, which obtains an entire line of text (up to the line ending). However, the test file contains more than one line of text if you worked through the examples in the previous sections. This means that the file pointer isn’t at the end of the file.
The call to hGetPosn obtains the actual position of the file pointer. The example outputs both the first line of text and the file position, which is reported as {handle: MyData.txt} at position 25 if you used the file from the previous examples. (The value 25 reflects a two-byte line ending, as used on Windows; with a one-byte line ending, the position would be 24.) A second call to hGetLine retrieves the next line of text from the file; if the file contains only two lines, the file pointer is then at the end of the file.
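For comparison, here is a rough Python sketch of the same position tracking, using the tell method of a file object. It builds its own two-line scratch file rather than reusing MyData.txt, and it opens the file in binary mode (an assumption made for clarity) so that the reported positions are plain byte counts on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "MyData.txt")

    # Binary mode keeps the byte counts identical on every platform.
    with open(path, "wb") as handle:
        handle.write(b"This is some test data.\n")
        handle.write(b"This is some test data too.\n")

    with open(path, "rb") as handle:
        first = handle.readline()
        after_first = handle.tell()    # position after the first line
        second = handle.readline()
        after_second = handle.tell()   # position after the second line

print(first, after_first)
print(second, after_second)
```

The position after the first read equals the length of the first line, and after the second read it equals the size of the whole file, which signals that the file pointer has reached the end.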
Of the tasks you can perform with a data container, such as a file, updating is often the hardest because it involves finding the data first and then writing new data to the same location without overwriting any data that isn't part of the update. The combination of the language you use and the operating system does reduce the work you perform immensely, but the process is still error prone. The following code demonstrates one of a number of ways to change the contents of a file. (Note that the two lines beginning with let writeData form a single statement; either keep them on one line in your code file or indent the second line more deeply than the first.)
import System.IO as IO
import Data.Text as DT

-- Display the entire content of the file at filePath.
displayData (filePath) = do
    handle <- openFile filePath ReadMode
    myData <- hGetContents handle
    putStrLn myData
    hClose handle

main = do
    displayData "MyData3.txt"
    contents <- readFile "MyData3.txt"
    -- Replace every "Edit" with "Update" and save the result to a new file.
    let writeData = unpack(replace
            (pack "Edit") (pack "Update") (pack contents))
    writeFile "MyData4.txt" writeData
    displayData "MyData4.txt"
This example shows two methods for opening a file for reading. The first (as defined by the displayData function) relies on a modified form of the code shown in the “Reading data” section, earlier in this chapter. In this case, the example gets the entire contents of the file in a single read using hGetContents. The second version (starting with the second line of the main function) uses readFile, which also obtains the entire content of the file in a single read. This second form is easier to use but provides less flexibility.
The code uses the functions found in Data.Text to manipulate the file content. These functions rely on the Text data type, not the String data type. To convert a String to Text, you must call the pack function, as shown in the code. The reverse operation relies on the unpack function. The replace function provides just one method of modifying the content of a string. You can also rely on mapping to perform certain kinds of replacement, such as this single-character replacement:
let transform = pack contents
DT.map (\c -> if c == '.' then '!' else c) transform
This method relies on a lambda function and provides considerable flexibility for a single-character replacement. The output replaces the periods in the text with exclamation marks by mapping the lambda function to the packed String (which is a Text object) found in transform. Notice how the lambda function examines characters separately, as opposed to the word-level search used in the example.
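The difference between word-level replacement and character-level mapping is easy to see side by side. This Python sketch is only an approximation of the Haskell approach; the replace_word and replace_char helpers and the sample text are made up for the illustration.

```python
def replace_word(text, old, new):
    """Replace every occurrence of the substring old with new."""
    return text.replace(old, new)

def replace_char(text, old, new):
    """Examine each character separately, swapping old for new."""
    return "".join(new if c == old else c for c in text)

contents = "Edit this line. Edit that line."
word_result = replace_word(contents, "Edit", "Update")
char_result = replace_char(contents, ".", "!")

print(word_result)
print(char_result)
```

The first helper searches for a whole substring, whereas the second applies a decision to every character in turn, which mirrors the lambda function mapped over the text.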
After you finish performing data-related tasks, you need to do something with the data storage container. In most cases, that means closing the handle associated with the container. When working with files, some functions, such as readFile and writeFile, perform the task automatically. Otherwise, you close the file manually using hClose.
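Python offers a similar choice between automatic and manual closing. The sketch below, which uses a scratch file in a temporary folder, shows the with statement closing the handle for you and a try/finally block doing the same job by hand.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "MyData.txt")

    # Automatic: the with statement closes the handle when the block ends.
    with open(path, "w") as handle:
        handle.write("This is some test data.\n")
    closed_after_block = handle.closed

    # Manual: close the handle yourself, even if the read raises an error.
    handle = open(path, "r")
    try:
        text = handle.read()
    finally:
        handle.close()
    closed_after_finally = handle.closed

print(closed_after_block, closed_after_finally)
```

The automatic form is usually the safer default because you can't forget the call to close.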
Another potential need may arise. If you use temporary files in your application, you need to remove them. The removeFile function performs this task by deleting the file from the path you supply. However, when working with Haskell, you find the call in System.Directory, not System.IO.
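As a point of comparison, Python keeps its file removal call in the os module rather than with the file object. This minimal sketch creates a temporary file, uses it, and then removes it; the content written to the file is arbitrary.

```python
import os
import tempfile

# mkstemp returns an open OS-level descriptor plus the file's path.
descriptor, temp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(descriptor, "w") as handle:
    handle.write("Scratch data the application no longer needs.\n")

existed_before = os.path.exists(temp_path)
os.remove(temp_path)   # delete the file at the path you supply
existed_after = os.path.exists(temp_path)

print(existed_before, existed_after)
```

Closing the handle before removing the file matters on Windows, where an open file generally can't be deleted.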