Chapter 13

Dealing with Files

IN THIS CHAPTER

  • Considering local file storage methods
  • Dealing with file access issues
  • Performing typical file access tasks
  • Using file management techniques CRUD style

Chapter 11 gives you a very brief look at localized file management in the “Working with devices” section of the chapter. Now it’s time to look at local files in more detail because you often use local files as part of applications — everything from storing application settings to analyzing a moderately large dataset. In fact, as you may already know, local files were the first kind of data storage that computers used; networks and the cloud came much later. Even on the smallest tablet today, you can still find local files stored in a hard-drive–like environment (although hard drives have come a very long way from those disk packs of old).

After you get past some of the general mechanics of how files are stored, you actually need to start working with them. Developers face a number of issues when working with files. For example, one of the more common problems is that a user can’t access a file because of a lack of rights. Security is a two-edged sword: It protects data by restricting access to it, but it can also keep the right people from accessing the data for the right reasons. This chapter helps you understand various file access issues and demonstrates how to overcome them.

The chapter also discusses Create, Read, Update, and Delete (CRUD), the four actions you can perform on any file for which you have the correct rights. CRUD normally appears in reference to database management, but it applies just as much to any file you might work with.

Understanding How Local Files are Stored

If you have worked with computers for a while, you know that the operating system handles all the details of working with files. An application requests these services of the operating system. Using this approach is important for security reasons, and it ensures that all applications can work together on the same system. If each application were allowed to perform tasks in its own way, the resulting chaos would make it impossible for any application to work.

The reason that operating system and other application considerations are important for the functional programming paradigm is that unlike other tasks you might perform, file access depends on a nonfunctional, procedural third party. In most cases, you must perform a set of prescribed steps in a specific order to get any work done. As with anything, you can find exceptions, such as the functional operating systems described at http://wiki.c2.com/?PurelyFunctionalOperatingSystem and https://en.wikipedia.org/wiki/House_(operating_system). However, you have to ask yourself whether you’ve ever even heard of these operating systems. You're more likely to need to work with OS X, Linux, or Windows on the desktop and something like Android or iOS on mobile devices.

Remember Most operating systems use a hierarchical approach to storing files. Each operating system does have differences, such as those discussed between Linux and Windows at https://www.howtogeek.com/137096/6-ways-the-linux-file-system-is-different-from-the-windows-file-system/. However, the fact that Linux doesn’t use locks on files but Windows does really won’t affect your application in most cases. The recursive nature of the functional programming paradigm does work well in locating files and ensuring that files get stored in the right location. Ultimately, the hierarchy used to store files means that you need a path to locate the file on the drive (regardless of whether the operating system specifically mentions the drive).
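To see how recursion suits a hierarchical file system, here is a minimal sketch in Python, the chapter’s other example language. The find_file function is a hypothetical helper, not part of any library: It checks the files in one folder, then calls itself on each subfolder, and skips any folder it lacks the rights to read.

```python
import os
import tempfile

def find_file(root, name):
    """Recursively search the folder tree under root for a file called name.

    Returns the full path of the first match, or None when there is no match.
    """
    try:
        with os.scandir(root) as it:
            entries = list(it)
    except PermissionError:
        # A folder we lack the rights to read is simply skipped.
        return None
    # Check the files at this level first.
    for entry in entries:
        if entry.is_file() and entry.name == name:
            return entry.path
    # Then recurse into each subfolder (without following symbolic links).
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            found = find_file(entry.path, name)
            if found is not None:
                return found
    return None

# Usage: build a small tree in a scratch folder and search it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Documents", "Data"))
open(os.path.join(root, "Documents", "Data", "MyData.txt"), "w").close()
print(find_file(root, "MyData.txt"))
```

The path that comes back is exactly what the application needs in order to open the file later.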

Files also have specific characteristics associated with them that vary by operating system. However, most operating systems include a creation and last modification date, file size, file type (possibly through the use of a particular file extension), and security access rights with the filename. If you plan to use your application on multiple platforms, which is becoming more common, you must create a plan for interacting with file properties in a consistent manner across platforms if possible.
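One way to keep property handling consistent is to rely only on properties that every platform reports in the same way. The following Python sketch uses a hypothetical describe helper built on os.stat. It deliberately leaves out creation time, because st_ctime has historically meant creation time on Windows but metadata-change time on Unix-like systems.

```python
import os
import tempfile
import time

def describe(path):
    """Gather file properties that most operating systems report consistently."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1],  # the file type, by convention
        "size_bytes": info.st_size,
        # Last modification time is portable; creation time is not.
        "modified": time.ctime(info.st_mtime),
    }

# Usage with a scratch file that holds five bytes.
path = os.path.join(tempfile.mkdtemp(), "MyData.txt")
with open(path, "wb") as f:
    f.write(b"hello")
print(describe(path))
```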

All the considerations described in this section come into play when performing file access, even with a functional language. However, as you see later, functional languages often rely on the use of monads to perform most file access tasks in a consistent manner across operating systems, as described for any I/O in Chapter 11. By abstracting the process of interacting with files, the functional programming paradigm actually makes things simpler.

Ensuring Access to Files

A number of common problems arise in accessing files on a system — problems that the functional programming paradigm can’t hide. The most common problem is a lack of rights to access the file. Security issues plague not only the local drive, but every other sort of drive as well, including cloud-based storage. One of the best practices for a developer to follow is to test everything using precisely the same rights that the user will have. Unfortunately, even then you may not find every security issue, but you’ll find the vast majority of them.
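A simple way to test rights is to attempt the real operation and report what went wrong, rather than checking rights first and opening later. The following Python sketch uses a hypothetical check_read_access helper. Run it under an account with the same rights as your users: An administrator or root account can usually read files that ordinary users can’t, which hides the “no rights” case.

```python
import os
import tempfile

def check_read_access(path):
    """Report whether the current account can open path for reading, and if not, why."""
    try:
        with open(path, "rb"):
            return "ok"
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "no rights"

folder = tempfile.mkdtemp()
existing = os.path.join(folder, "MyData.txt")
open(existing, "w").close()
print(check_read_access(existing))                          # ok
print(check_read_access(os.path.join(folder, "Nope.txt")))  # missing
```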

Some access issues are also the result of bad information — fallacies that developers have simply believed without testing. One of these issues is the supposed difference in using the backslash on Windows and the forward slash on Linux and OS X. The truth is that you can use the forward slash on all operating systems, as described at http://blog.johnmuellerbooks.com/2014/03/10/backslash-versus-forward-slash/. All the example code in this chapter uses the forward slash when dealing with paths as a point of demonstration.
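Python’s pathlib module offers a quick way to see forward slashes at work. The following minimal sketch parses paths using Windows rules and POSIX rules on any platform. PureWindowsPath treats the forward slash as a separator, so it splits the path just as it would split one written with backslashes. The sketch only shows how pathlib interprets each path (it never touches the disk), and the paths themselves are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A forward-slash path parsed with Windows rules, and one parsed with POSIX rules.
win = PureWindowsPath("C:/Users/John/MyData.txt")
posix = PurePosixPath("/home/john/MyData.txt")

print(win)             # C:\Users\John\MyData.txt
print(win.parts)       # ('C:\\', 'Users', 'John', 'MyData.txt')
print(win.as_posix())  # C:/Users/John/MyData.txt
print(posix.parts)     # ('/', 'home', 'john', 'MyData.txt')
```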

Often a developer also runs afoul of file property issues. Some of these issues are external to the file, such as mistaking one file type for another. Other issues are internal to the file, such as trying to read a UTF-7 file using code designed for UTF-8 or UTF-16, which are currently more common. Even though you can open a file that has a property issue, opening it doesn’t help because you can’t do anything useful with the content. As far as your application is concerned, you still lack access to the file, and in a practical sense you do, even though you successfully opened it.
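The following Python sketch reproduces this kind of internal property problem, using UTF-16 rather than UTF-7 because it’s easier to produce. The file opens without complaint, but reading it as UTF-8 fails on the very first byte; only the matching encoding works. The file name is made up, and the file lives in a scratch folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "MyData16.txt")

# Write the text as UTF-16, which starts the file with a byte order mark.
with open(path, "w", encoding="utf-16") as f:
    f.write("This is some test data.\n")

# Opening succeeds, but decoding the content with the wrong encoding fails.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    wrong_encoding_failed = False
except UnicodeDecodeError as err:
    wrong_encoding_failed = True
    print("Opened, but unreadable as UTF-8:", err.reason)

# Reading with the matching encoding succeeds.
with open(path, encoding="utf-16") as f:
    text = f.read()
```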

Specific language tools also present problems. For example, the message thread at https://github.com/haskell/cabal/issues/447 discusses issues that occur as part of the installation process using Cabal (the utility that ships with Haskell). Imagine installing a new application that you built and then finding that only administrators can use it. Unfortunately, this problem might not show up unless you test your application installation on the right version of Windows. Haskell isn’t alone in this problem; every language comes with special issues that may affect your ability to access files, so constant testing and handling of error reports is an essential part of working with files.

Interacting with Files

Understanding how files are stored and knowing the requirements for access are the first two steps in interacting with them. If you have worked with other programming languages, you have likely worked with files in a procedural manner: opening the file to obtain a file handle, using the handle to read or write data, and then closing the handle when finished. The functional programming paradigm must also follow these rules, as demonstrated in Chapter 11, but working in the functional world brings different nuances, as discussed in the sections that follow.

Creating new files

Operating systems generally provide a number of ways of opening files. In the default method, you normally open the file and overwrite the existing content with anything new that you write. When the file doesn’t exist, the operating system automatically creates it for you. The following code shows an example of opening a file for writing and automatically creating that file when it doesn’t exist:

import System.IO as IO

main = do
  handle <- openFile "MyData.txt" WriteMode
  hPutStrLn handle "This is some test data."
  hClose handle

The defining factor here is the WriteMode argument. When you use the WriteMode argument, you tell the operating system to create a new file when one doesn't exist or to overwrite any existing content. The Python equivalent to this code is

handle = open("MyData2.txt", "w")
print(handle.write("This is some test data.\n"))
handle.close()

Notice that when using Python, you use the "w" argument to access write mode. In addition, Python’s write method doesn’t add a line ending the way hPutStrLn does; you add it manually by using the \n escape. Wrapping the call in the print function lets you see how many characters Python writes to the file.

Remember As an alternative to using the WriteMode argument, you can use the ReadWriteMode argument when you want to both read from and write to the file. ReadWriteMode creates the file when it doesn’t exist, but unlike WriteMode, it doesn’t truncate an existing file: Writing starts at the beginning of the file and overwrites existing content only as far as the new data extends. To read from the file, of course, the file must contain something to read. The “Reading data” section of the chapter discusses this issue in more detail.
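In Haskell’s ReadWriteMode, an existing file isn’t truncated, so a write replaces characters only as far as the new data reaches. None of Python’s open() mode strings matches that mode exactly: "r+" reads and writes without truncating but requires the file to exist, while "w+" creates the file but truncates it. The following minimal sketch shows the non-truncating behavior with "r+" on a scratch file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "MyData2.txt")
with open(path, "w") as f:
    f.write("This is some test data.\n")

# "r+" opens an existing file for reading and writing without truncating it.
with open(path, "r+") as f:
    f.write("THIS")   # replaces only the first four characters
    f.seek(0)         # move the file pointer back to the start before reading
    result = f.read()

print(result)  # THIS is some test data.
```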

Opening existing files

When you have an existing file, you can read, append, update, and delete the data it contains. Even though you will create new files when writing an application, most applications spend more time opening existing files in order to manage content in some way. For the application to perform data-management tasks, the file must exist. Even if you think that the file exists, you must verify its presence because the user or another application may have deleted it, or the user may not have followed protocol and created it, or sunspots could have damaged the file directory entry on disk, or …. The list of reasons why the file you thought was there really isn’t can become quite long. The process of data management can become complex because you often perform searches for specific content as well. However, the initial task focuses on simply opening the file.
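Verifying that a file is present and opening it can happen in one step. The following Python sketch uses a hypothetical open_existing helper that tries the open and returns None when the file is missing; checking with os.path.exists() first would leave a gap during which another program could delete the file. It accepts only read-style modes, because "w" and "a" would quietly create the missing file instead.

```python
import os
import tempfile

def open_existing(path, mode="r"):
    """Open a file that must already exist; return None when it doesn't."""
    if not mode.startswith("r"):
        raise ValueError("open_existing only supports read modes such as 'r' or 'r+'")
    try:
        return open(path, mode)
    except FileNotFoundError:
        return None

folder = tempfile.mkdtemp()
missing = open_existing(os.path.join(folder, "MyData.txt"))
print(missing)  # None

# Once the file exists, the same call returns an open handle.
with open(os.path.join(folder, "MyData.txt"), "w") as f:
    f.write("This is some test data.\n")
handle = open_existing(os.path.join(folder, "MyData.txt"))
content = handle.read()
handle.close()
```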

The “Reading data” section of the chapter discusses the task of opening a file to read it, especially when you need to search for specific data. Likewise, writing, updating, and deleting data appears in the “Updating data” section of the chapter. However, the task of appending — adding content to the end of the file — is somewhat different. The following code shows how to append data to a file that already exists:

import System.IO as IO

main = do
  handle <- openFile "MyData.txt" AppendMode
  hPutStrLn handle "This is some test data too."
  hClose handle

Except for the AppendMode argument, this code looks much like the code in the previous section. However, no matter how often you run the code in the previous section, the resulting file always contains just one line of text. When you run this example, you see multiple lines, as shown in Figure 13-1.

Screen capture of MyData.txt in a Notepad window with the text: This is some test data. This is some test data too.

FIGURE 13-1: Appending means adding content to the end of a file.

Python provides the same functionality. The following code shows the Python version, which relies on the "a" (append) mode:

handle = open("MyData2.txt", "a")
print(handle.write("This is some test data too.\n"))
handle.close()

Remember Some languages treat appending differently from standard writing: If the file doesn’t exist, the language raises an exception to tell you that you can’t append to a file that doesn’t exist, so you must create the file first. Both Haskell and Python take a better route: Appending also creates a new file when one doesn’t exist.
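The difference between the two modes shows up when you repeat the operation. The following Python sketch writes and appends three times each, to two scratch files whose names are made up for the example. Write mode leaves a single line, while append mode creates its file on the first pass and then keeps adding lines.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
write_path = os.path.join(folder, "Written.txt")
append_path = os.path.join(folder, "Appended.txt")

for _ in range(3):
    # "w" truncates the file every time, so only one line ever survives.
    with open(write_path, "w") as handle:
        handle.write("This is some test data.\n")
    # "a" creates the file on the first pass, then adds to the end.
    with open(append_path, "a") as handle:
        handle.write("This is some test data too.\n")

with open(write_path) as handle:
    write_lines = handle.readlines()
with open(append_path) as handle:
    append_lines = handle.readlines()

print(len(write_lines), len(append_lines))  # 1 3
```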

Manipulating File Content

When thinking through the process of dealing with I/O on the local system in the form of files, you have to separate the main components and deal with them individually:

  • Physicality: The location of the file on the storage system. The operating system can hide this location in some respects, and even create mappings so that a single storage unit actually points to multiple physical drives that aren't necessarily located on the local machine. The fact remains, however, that the file must appear somewhere. Even if the user accesses this file by clicking a convenient icon, the developer must still have some idea of where the file resides, or access is impossible.
  • Container: Data resides in a container of some sort. The container used in this chapter is a file, but it could just as easily be a database or a collection of files within a particular folder. As with physicality, users don’t often see the container used to hold the data except as an abstraction (and sometimes not even that, as in the case of an application that opens a database automatically). Again, the developer must know the properties and characteristics of the container to write a successful application.
  • Data: The data itself is an entity and the one that everyone, including users, is intimately aware of when working with an application. Previous sections of the chapter discuss the other entities in this list. The following sections discuss this final entity, beginning with the Create, Read, Update, and Delete (CRUD) operations associated with data and then looking at two of those operations in closer detail.

Considering CRUD

People create acronyms to make something easier to remember. Sometimes those acronyms are unfortunate, as in calling operations on data CRUD. However, the people who work with databases wanted something easy to remember, so data-related tasks became CRUD. Another school of thought called the list of tasks Browse, Read, Edit, Add, and Delete (BREAD), but that particular acronym didn’t seem to stick, even though your daily BREAD might rely on your ability to employ CRUD. This chapter uses CRUD because that seems to be the most popular acronym. You can view CRUD as comprising the following tasks:

  • Create: Adding new data to storage. Anytime you create new storage, such as a file, you generally create new data as well. Empty storage isn’t useful. The examples in the “Interacting with Files” section, earlier in this chapter, demonstrate creating data in both a new and an existing file. In both cases, the functional programming paradigm uses the IO monad operation on the combination of a handle and the associated data to place data in the file. This takes place after creating the file using another monad consisting of the IO operating on a combination of the filename and opening mode.
  • Read: Reading data within a storage container means to do something with the content that doesn't change it in any way. You can see at least two kinds of read tasks in most applications:
    1. Employ an IO monad operation on the combination of a handle and data location to retrieve specific data. In this case, the data output is the target of the task. When you don’t supply a specific location, the operation assumes either the start of the storage or the current storage location pointer value. (The location pointer is an internally maintained value that indicates the end of the last read location within the storage.)
    2. Employ an IO monad operating on the combination of a handle and search criteria. In this case, the goal is to search for specific data and retrieve a data location based on that search. Some developers view this task as a browse, rather than as a read.
  • Update: When data within the storage container still has value but contains mistakes, it requires an update, which the application performs using the following steps. In this case, you're really looking at a series of IO monads:
    1. Locate the existing data using the combination of a handle and the search expression.
    2. Copy the existing data using the combination of a handle and the data location.
    3. Write the new data using a combination of a handle and the data.
  • Delete: When the data within storage no longer has value, the application deletes the entry. In this case, you rely on the following IO monads to perform the task:
    1. Locate the data to remove using a combination of a handle and a search expression.
    2. Delete the data using a combination of a handle and a data location.
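The four CRUD tasks can be sketched for a simple line-oriented text file. The following Python example treats each line as one record; the helper names (create, read, update, delete, and _write_all) are hypothetical. Update and delete follow the locate, copy, and write steps in the list by reading every record, changing the matching ones, and writing the whole file back. That approach is easy to follow but rewrites the entire file, so it suits small files rather than large datasets.

```python
import os
import tempfile

# Hypothetical helpers that treat each line of a text file as one record.

def _write_all(path, records):
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(record + "\n" for record in records)

def create(path, record):
    # "a" creates the file if needed and adds the record at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")

def read(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def update(path, old, new):
    # Locate the matching records, copy the rest unchanged, and write it all back.
    _write_all(path, [new if record == old else record for record in read(path)])

def delete(path, target):
    # Locate the record to remove and write back everything else.
    _write_all(path, [record for record in read(path) if record != target])

path = os.path.join(tempfile.mkdtemp(), "Records.txt")
create(path, "Edit me")
create(path, "Keep me")
update(path, "Edit me", "Updated")
print(read(path))  # ['Updated', 'Keep me']
delete(path, "Keep me")
print(read(path))  # ['Updated']
```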

Reading data

The concept of reading data isn’t merely about obtaining information from a storage container, such as a file. In many cases, reading a book involves a lot more than simple information acquisition. Often, the person must search for the appropriate information (unless the intent is to read the entire book) and then track progress during each reading session (unless there is just one session). A computer must do the same. The following example shows how the computer tracks its current position within the file during the read:

import System.IO as IO

main = do
  handle <- openFile "MyData.txt" ReadMode
  myData <- hGetLine handle
  position <- hGetPosn handle
  hClose handle
  putStrLn myData
  putStrLn (show position)

Here, the application performs a read using hGetLine, which obtains an entire line of text (up to the newline character, which hGetLine drops from the result). However, if you worked through the examples in the previous sections, the test file contains more than one line of text. This means that the file pointer isn’t at the end of the file.

The call to hGetPosn obtains the actual position of the file pointer. The example outputs both the first line of text and the file position, which is reported as {handle: MyData.txt} at position 25 if you used the file from the previous examples on Windows, where each line ends with a carriage return and a line feed. On Linux or OS X, which end lines with a single line feed, the position is 24. A second call to hGetLine will actually retrieve the next line of text from the file, at which point the file pointer will be at the end of the file.
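Python tracks the file pointer in the same way. The following sketch is a rough counterpart of the Haskell example: readline plays the role of hGetLine, and tell plays the role of hGetPosn. It opens the file in binary mode because, in text mode, Python’s tell returns an opaque number rather than a byte offset. It also writes the two lines as raw bytes, so the newline is a single byte on every platform.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "MyData.txt")
with open(path, "wb") as f:
    f.write(b"This is some test data.\nThis is some test data too.\n")

with open(path, "rb") as f:
    first = f.readline()     # reads up to and including the newline
    position = f.tell()      # byte offset where the next read starts
    second = f.readline()
    at_end = f.tell() == os.path.getsize(path)

print(first, position)  # b'This is some test data.\n' 24
```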

Remember The example shows hGetLine, but Haskell and Python both provide an extensive array of calls to obtain data from a file. In Haskell, for example, you can get a single character by calling hGetChar. You can also peek at the next character in line without moving the file pointer by calling hLookAhead.
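Python has no functions with these exact names, but you can sketch similar behavior. In the following example, get_char and look_ahead are hypothetical helpers: get_char reads one character and advances the position, while look_ahead records the position, reads one character, and then seeks back. One difference from hLookAhead is that look_ahead returns an empty string at the end of the data instead of raising an end-of-file error. The example uses an in-memory io.StringIO stream, which supports the same read, tell, and seek calls as a text file.

```python
import io

def get_char(stream):
    """Read one character and advance the position, similar to hGetChar."""
    return stream.read(1)

def look_ahead(stream):
    """Return the next character without consuming it, similar to hLookAhead."""
    position = stream.tell()
    char = stream.read(1)
    stream.seek(position)   # put the file pointer back where it was
    return char

stream = io.StringIO("This is some test data.\n")
print(look_ahead(stream), look_ahead(stream), get_char(stream), get_char(stream))  # T T T h
```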

Updating data

Of the tasks you can perform with a data container, such as a file, updating is often the hardest because it involves finding the data first and then writing new data to the same location without overwriting any data that isn’t part of the update. The combination of the language you use and the operating system does reduce the work you perform immensely, but the process is still error prone. The following code demonstrates one of a number of ways to change the contents of a file. (The two lines beginning with let writeData are split to fit the page. You can type them as a single line; if you keep them split, indent the second line further than writeData.)

import System.IO as IO
import Data.Text as DT

displayData filePath = do
  handle <- openFile filePath ReadMode
  myData <- hGetContents handle
  putStrLn myData
  hClose handle

main = do
  displayData "MyData3.txt"

  contents <- readFile "MyData3.txt"
  let writeData = unpack (replace
        (pack "Edit") (pack "Update") (pack contents))
  writeFile "MyData4.txt" writeData

  displayData "MyData4.txt"

This example shows two methods for opening a file for reading. The first (as defined by the displayData function) relies on a modified form of the code shown in the “Reading data” section, earlier in this chapter. In this case, the example gets the entire contents of the file in a single read using hGetContents. The second version (starting with the readFile call in the main function) uses readFile, which also obtains the entire content of the file in a single read. This second form is easier to use but provides less flexibility.
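Python offers the same trade-off. In the following sketch, the handle-based read lets you control the handle (seeking, partial reads, and so on) before closing it, while pathlib’s read_text opens, reads, and closes the file in one call. The Sample.txt file and its content are made up for the example.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "Sample.txt"   # hypothetical sample file
path.write_text("Edit this line.\n", encoding="utf-8")

# Handle-based read: you manage the handle and could seek or read partially.
with open(path, encoding="utf-8") as handle:
    via_handle = handle.read()

# Convenience read: opens, reads everything, and closes in a single call.
via_path = path.read_text(encoding="utf-8")

print(via_handle == via_path)  # True
```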

The code uses the functions found in Data.Text to manipulate the file content. These functions rely on the Text data type, not the String data type. To convert a String to Text, you must call the pack function, as shown in the code. The reverse operation relies on the unpack function. The replace function provides just one method of modifying the content of a string. You can also rely on mapping to perform certain kinds of replacement, such as this single-character replacement:

let transform = pack contents
putStrLn (unpack (DT.map (\c -> if c == '.' then '!' else c) transform))

This method relies on a lambda function and provides considerable flexibility for a single-character replacement. The output replaces the periods in the text with exclamation marks by mapping the lambda function over the packed String (which is a Text object) found in transform. Notice how the lambda function examines characters separately, as opposed to the word-level search used in the example.
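For comparison, the following Python sketch performs both kinds of replacement on a made-up string: str.replace works at the word level, much like replace from Data.Text, and mapping a lambda over the characters mirrors the DT.map call.

```python
text = "Edit this line. Edit that line."

# Word-level replacement, similar in spirit to Data.Text's replace.
updated = text.replace("Edit", "Update")

# Character-level mapping with a lambda, similar to the DT.map snippet.
exclaimed = "".join(map(lambda c: "!" if c == "." else c, text))

print(updated)    # Update this line. Update that line.
print(exclaimed)  # Edit this line! Edit that line!
```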

Warning Observe how the example uses one file for input and an entirely different file for output. Haskell’s readFile reads lazily, so the file stays open until your code has consumed its entire contents. If you were to attempt to use readFile on a file and then writeFile on the same file a few lines down, the resulting application would display a “resource busy” type of error message.
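When you do want the updated content to end up under the original name, a common technique is to write to a separate file and then swap it into place. The following Python sketch uses a hypothetical update_in_place helper. It reads the original completely and closes it before writing anything, writes the result to a working copy (the ".tmp" naming scheme is an assumption), and then calls os.replace to move the copy over the original. If writing fails partway, the original file is left untouched.

```python
import os
import tempfile

def update_in_place(path, old, new):
    """Replace old with new in a file by writing a copy and swapping it in."""
    with open(path, encoding="utf-8") as f:
        contents = f.read()              # read everything and close the file
    temp_path = path + ".tmp"            # hypothetical name for the working copy
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(contents.replace(old, new))
    os.replace(temp_path, path)          # swap the finished copy into place

path = os.path.join(tempfile.mkdtemp(), "Sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Edit me.\n")

update_in_place(path, "Edit", "Update")
with open(path, encoding="utf-8") as f:
    result = f.read()
print(result)  # Update me.
```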

Completing File-related Tasks

After you finish performing data-related tasks, you need to do something with the data storage container. In most cases, that means closing the handle associated with the container. When working with files, some functions, such as readFile and writeFile, perform the task automatically. Otherwise, you close the file manually using hClose.
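Python’s with statement fills a role similar to readFile and writeFile here: It closes the handle automatically when the block ends, even if an exception occurs inside it. A minimal sketch using a scratch file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "MyData2.txt")

# The handle is closed automatically when the with block ends.
with open(path, "w") as handle:
    handle.write("This is some test data.\n")

print(handle.closed)  # True
```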

Warning Haskell, like most languages, comes with a few odd calls. For example, when you call hGetContents, the handle you use is semi-closed. A semi-closed handle is almost but not quite closed, which is odd when you think about it. You can't perform any additional reads, nor can you obtain the position of the file pointer. However, calling hClose to fully close the handle is still possible. The odd nature of this particular call can cause problems in your application because the error message will tell you that the handle is semi-closed, but it won’t tell you what that means or define the actual source of the semi-closure.

Another potential need may arise. If you use temporary files in your application, you need to remove them. The removeFile function performs this task by deleting the file from the path you supply. However, when working with Haskell, you find the call in System.Directory, not System.IO.
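In Python, the counterpart to removeFile is os.remove in the os module, and the tempfile module can take care of cleanup for you. The following sketch shows both approaches: removing a named temporary file explicitly, and letting TemporaryDirectory delete a whole scratch folder when its block ends. The file contents are placeholders.

```python
import os
import tempfile

# A named temporary file that the application must remove itself.
fd, temp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("scratch data\n")
os.remove(temp_path)
removed = not os.path.exists(temp_path)

# A scratch folder (and everything in it) deleted automatically on exit.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "scratch.txt"), "w") as handle:
        handle.write("more scratch data\n")
folder_gone = not os.path.exists(folder)
```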
