Chapter 2. Using the Windows File System and Character I/O

The file system and simple terminal I/O are often the first OS features that the developer encounters. Early PC OSs such as MS-DOS did little more than manage files and terminal (or console) I/O, and these resources are also central features of nearly every OS.

Files are essential for the long-term storage of data and programs. Files are also the simplest form of program-to-program communication. Furthermore, many aspects of the file system model apply to interprocess and network communication.

The file copy programs in Chapter 1 introduced the four essential file processing functions:

CreateFile

ReadFile

WriteFile

CloseHandle

This chapter explains these and related functions and also describes character processing and console I/O functions in detail. First, we say a few words about the various file systems available and their principal characteristics. In the process, we’ll see how to use Unicode wide characters for internationalization. The chapter includes an introduction to Windows file and directory management.

The Windows File Systems

Windows natively supports four file systems on directly attached devices, but only the first is important throughout the book, as it is Microsoft’s primary, full-functionality file system. In addition, file systems are supported on devices such as USB drives. The file system choice on a disk volume or partition is specified when the volume is formatted.

1. The NT file system (NTFS) is Microsoft’s modern file system that supports long file names, security, fault tolerance, encryption, compression, extended attributes, and very large files¹ and volumes. Note that diskettes, which are now rare, do not support NTFS.

¹ “Very large” and “huge” are relative terms that we’ll use to describe a file longer than 4GB, which means that you need to use 64-bit integers to specify the file length and positions in the file.

2. The File Allocation Table (FAT and FAT32) file systems are rare on current systems and descend from the original MS-DOS and Windows 3.1 FAT (or FAT16) file systems. FAT32 supported larger disk drives and other enhancements, and the term FAT will refer to both versions. FAT does not support Windows security, among other limitations. FAT is the only supported file system for floppy disks and is often the file system on memory cards.

3. The CD-ROM file system (CDFS), as the name implies, is for accessing information provided on CD-ROMs. CDFS is compliant with the ISO 9660 standard.

4. The Universal Disk Format (UDF), an industry standard, supports DVD drives and will ultimately supplant CDFS. Windows Vista uses the term Live File System (LFS) as an enhancement that allows you to add new files and hide, but not actually delete, files.

Windows provides both client and server support for distributed file systems, such as the Network File System (NFS) and Common Internet File System (CIFS). Windows Server 2003 and 2008 provide extensive support for storage area networks (SANs) and emerging storage technologies. Windows also allows custom file system development.

The file system API accesses all the file systems in the same way, sometimes with limitations. For example, only NTFS supports security. This chapter and the next point out features unique to NTFS as appropriate, but, in general, assume NTFS.

File Naming

Windows supports hierarchical file naming, but there are a few subtle distinctions for the UNIX user and basic rules for everyone.

• The full pathname of a disk file starts with a drive name, such as A: or C:. The A: and B: drives are normally diskette drives, and C:, D:, and so on are hard disks, DVDs, and other directly attached devices. Network drives are usually designated by letters that fall later in the alphabet, such as H: and K:.

• Alternatively, a full pathname can use the Universal Naming Convention (UNC) and start with a double backslash (\\), indicating the global root, followed by a server name and a share name to indicate a path on a network file server. The first part of the pathname, then, is \\servername\sharename.

• The pathname separator is the backslash (\), although the forward slash (/) works in CreateFile and other low-level API pathname parameters. This may be more convenient for C/C++ programmers, although it’s best simply to use backslashes to avoid possible incompatibility.

• Directory and file names cannot contain any ASCII characters with a value in the range 1–31 or any of these characters:

< > : " | ? * \ /

These characters have meaning on command lines, and their occurrences in file names would complicate command line parsing. Names can contain blanks. However, when using file names with blanks on a command line, put each file name in quotes so that the name is not interpreted as naming two distinct files.

• Directory and file names are case-insensitive, but they are also case-retaining, so that if the creation name is MyFile, the file name will show up as it was created, but the file can also be accessed with the name myFILE.

• Normally, file and directory names used as API function arguments can be as many as 255 characters long, and pathnames are limited to MAX_PATH characters (currently 260). You can also specify very long names with an escape sequence, which we’ll describe later.

• A period (.) separates a file’s name from its extension, and extensions (usually two to four characters after the rightmost period in the file name) conventionally indicate the file’s type. Thus, cci.EXE would be an executable file, and cci.C would be a C language source file. File names can contain multiple periods.

A single period (.) and two periods (..), as directory names, indicate the current directory and its parent, respectively.
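
The character rules above can be sketched as a small check. This is a portable C illustration rather than a Windows API call; the function name IsValidNameComponent is hypothetical, and the sketch tests only the character rules listed here (it does not consider length limits or any other naming restriction).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if a single directory or file name
   (not a full path) uses only characters permitted by the rules above. */
bool IsValidNameComponent(const char *name)
{
    /* Reserved punctuation, including both pathname separators. */
    static const char forbidden[] = "<>:\"|?*\\/";
    size_t i;

    if (name == NULL || name[0] == '\0')
        return false;                       /* Reject empty names. */

    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c >= 1 && c <= 31)
            return false;                   /* ASCII control character. */
        if (strchr(forbidden, c) != NULL)
            return false;                   /* Reserved character. */
    }
    return true;                            /* Blanks and periods are allowed. */
}
```

A name such as "My File.txt" passes, whereas "a<b" or "dir\file" does not. A caller would apply the check to each path component separately, because the separators themselves are rejected.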

With this introduction, it is now time to learn more about the Windows functions introduced in Chapter 1.

Opening, Reading, Writing, and Closing Files

The first Windows function described in detail is CreateFile, which opens existing files and creates new ones. This and other functions are described first by showing the function prototype and then by describing the parameters and function operation.

Creating and Opening Files

This is the first Windows function, so we’ll describe it in detail; later descriptions will frequently be much more streamlined as the Windows conventions become more familiar. This approach will help users understand the basic concepts and use the functions without getting bogged down in details that are available on MSDN.

Furthermore, CreateFile is complex with numerous advanced options not described here; we’ll generally mention the more important options and sometimes give very brief descriptions of other options that are used in later chapters and examples.

Chapter 1’s introductory Windows cpW program (Program 1-2) shows a simple use of CreateFile in which there are two calls that rely on default values for most of the parameters shown here.

Parameters

The parameter names illustrate some Windows conventions that were introduced in Chapter 1. The prefix dw describes DWORD (32 bits, unsigned) options containing flags or numerical values. lpsz (long pointer to a zero-terminated string), or, more simply, lp, is for pathnames and other strings, although the Microsoft documentation is not entirely consistent. At times, you need to use common sense or read the documentation carefully to determine the correct data types.

lpName is a pointer to the null-terminated string that names the file, pipe, or other named object to open or create. The pathname is normally limited to MAX_PATH (260) characters, but you can circumvent this restriction by prefixing the pathname with \\?\ and using Unicode characters and strings.² This technique allows functions requiring pathname arguments to use names as long as 32K characters. The prefix is not part of the name. Finally, the LPCTSTR data type is explained in an upcoming section that also describes generic characters and strings; just regard it as a string data type for now.

² Please see the “Interlude: Unicode and Generic Characters” section later in this chapter for more information.

dwAccess specifies the read and write access, using GENERIC_READ and GENERIC_WRITE. Flag values such as READ and WRITE do not exist. The GENERIC_ prefix may seem redundant, but it is necessary to conform with the macro names in the Windows header file, winnt.h. Numerous other constant names may seem longer than necessary, but the long names are easily readable and avoid name collisions with other macros.

These values can be combined with a bit-wise “or” operator (|), so to open a file for read and write access:

GENERIC_READ | GENERIC_WRITE

dwShareMode is a bit-wise “or” combination of:

• 0—The file cannot be shared. Furthermore, not even this process can open a second HANDLE on this file.

• FILE_SHARE_READ—Other processes, including the one making this call, can open this file for concurrent read access.

• FILE_SHARE_WRITE—This allows concurrent writing to the file.

When relevant to proper program operation, the programmer must take care to prevent concurrent updates to the same file location by using locks or other mechanisms. Chapter 3 covers this in more detail.

lpSecurityAttributes points to a SECURITY_ATTRIBUTES structure. Use NULL values with CreateFile and all other functions for now; security is treated in Chapter 15.

dwCreate specifies whether to create a new file, overwrite an existing file, and so on.

• CREATE_NEW—Create a new file. Fail if the specified file already exists.

• CREATE_ALWAYS—Create a new file, or overwrite the file if it already exists.

• OPEN_EXISTING—Open an existing file or fail if the file does not exist.

• OPEN_ALWAYS—Open the file, creating it if it does not exist.

• TRUNCATE_EXISTING—Set the file length to zero, destroying all contents if the specified file exists. dwAccess must specify at least GENERIC_WRITE access. Fail if the file does not exist.

dwAttrsAndFlags specifies file attributes and flags. There are 32 flags and attributes. Attributes are characteristics of the file, as opposed to the open HANDLE, and the attributes are ignored when an existing file is opened. Here are some of the more important attribute and flag values.

• FILE_ATTRIBUTE_NORMAL—This attribute can be used only when no other attributes are set (flags can be set, however).

• FILE_ATTRIBUTE_READONLY—Applications can neither write to nor delete the file.

• FILE_FLAG_DELETE_ON_CLOSE—This is useful for temporary files. Windows deletes the file when the last open HANDLE is closed.

• FILE_FLAG_OVERLAPPED—This flag is important for asynchronous I/O (see Chapter 14).

Several additional flags also specify how a file is processed and help the Windows implementation optimize performance and file integrity.

• FILE_FLAG_RANDOM_ACCESS—The file is intended for random access, and Windows will attempt to optimize file caching.

• FILE_FLAG_SEQUENTIAL_SCAN—The file is for sequential access, and Windows will optimize caching accordingly. These last two access modes are not enforced and are hints to the Windows cache manager. Accessing a file in a manner inconsistent with these access modes may degrade performance.

• FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING are two examples of flags that are useful in some advanced applications.

hTemplateFile is the HANDLE of an open GENERIC_READ file that specifies extended attributes to apply to a newly created file, ignoring dwAttrsAndFlags. Normally, this parameter is NULL. Windows ignores hTemplateFile when an existing file is opened. This parameter can be used to set the attributes of a new file to be the same as those of an existing file.

The two CreateFile instances in cpW (Program 1-2) use default values extensively and are as simple as possible but still appropriate for the task. It could be beneficial to use FILE_FLAG_SEQUENTIAL_SCAN in both cases. (Exercise 2–3 explores this option, and Appendix C shows the performance results.)

Notice that if the file share attributes and security permit it, there can be numerous open handles on a given file. The open handles can be owned by the same process or by different processes. (Chapter 6 describes process management.)

Windows Vista and later versions provide the ReOpenFile function, which returns a new handle with different flags, access rights, and so on, assuming there are no conflicts with existing handles to the same file. ReOpenFile allows you to have different handles for different situations and protect against accidental misuse. For example, a function that updates a shared file could use a handle with read-write access, whereas other functions would use a read-only handle.

Closing Files

Windows has a single all-purpose CloseHandle function to close and invalidate kernel handles³ and to release system resources. Use this function to close nearly all HANDLE objects; exceptions are noted. Closing a handle also decrements the object’s handle reference count so that nonpersistent objects such as temporary files and events can be deleted. Windows will close all open handles on exit, but it is still good practice for programs to close their handles before terminating.

³ It is convenient to use the term “handle,” and the context should make it clear that we mean a Windows HANDLE.

Closing an invalid handle or closing the same handle twice will cause an exception when running under a debugger (Chapter 4 discusses exceptions and exception handling). It is not necessary or appropriate to close the standard device handles, which are discussed in the “Standard Devices and Console I/O” section.

The comparable UNIX functions are different in a number of ways. The UNIX open function returns an integer file descriptor rather than a handle, and it specifies access, sharing, create options, attributes, and flags in the single integer oflag parameter. The options overlap, with Windows providing a richer set.

There is no UNIX equivalent to dwShareMode. UNIX files are always shareable.

Both systems use security information when creating a new file. In UNIX, the mode argument specifies the familiar user, group, and other file permissions.

close is comparable to CloseHandle, but it is not general purpose.

The C library stdio.h functions use FILE objects, which are comparable to handles (for disk files, terminals, tapes, and other devices) connected to streams. The fopen mode parameter specifies whether the file data is to be treated as binary or text. There is a set of options for read-only, update, append at the end, and so on. freopen allows FILE reuse without closing it first. The Standard C library cannot set security permissions.

fclose closes a FILE. Most stdio FILE-related functions have the f prefix.

Reading Files

Assume, until Chapter 14, that the file handle does not have the FILE_FLAG_OVERLAPPED option set in dwAttrsAndFlags. ReadFile, then, starts at the current file position (for the handle) and advances the position by the number of bytes transferred.

The function fails, returning FALSE, if the handle or any other parameters are invalid or if the read operation fails for any reason. The function does not fail if the file handle is positioned at the end of file; instead, the number of bytes read (*lpNumberOfBytesRead) is set to 0.

Parameters

Because of the long variable names and the natural arrangement of the parameters, they are largely self-explanatory. Nonetheless, here are some brief explanations.

hFile is a file handle with FILE_READ_DATA access, a subset of GENERIC_READ access. lpBuffer points to the memory buffer to receive the input data. nNumberOfBytesToRead is the number of bytes to read from the file.

lpNumberOfBytesRead points to the actual number of bytes read by the ReadFile call. This value can be zero if the handle is positioned at the end of file or there is an error, and message-mode named pipes (Chapter 11) allow a zero-length message.

lpOverlapped points to an OVERLAPPED structure (Chapters 3 and 14). Use NULL for the time being.

Writing Files

The parameters are familiar by now. Notice that a successful write does not ensure that the data actually is written through to the disk unless FILE_FLAG_WRITE_THROUGH is specified with CreateFile. If the HANDLE position plus the write byte count exceeds the current file length, Windows will extend the file length.

UNIX read and write are the comparable functions, and the programmer supplies a file descriptor, buffer, and byte count. The functions return the number of bytes actually transferred. A value of 0 on read indicates the end of file; –1 indicates an error. Windows, by contrast, requires a separate transfer count and returns Boolean values to indicate success or failure.

The functions in both systems are general purpose and can read from files, terminals, tapes, pipes, and so on.

The Standard C library fread and fwrite binary I/O functions use object size and object count rather than a single byte count as in UNIX and Windows. A short transfer could be caused by either an end of file or an error; test explicitly with ferror or feof. The library provides a full set of text-oriented functions, such as fgetc and fputc, that do not exist outside the C library in either OS.

Interlude: Unicode and Generic Characters

Before proceeding, we explain briefly how Windows processes characters and differentiates between 8- and 16-bit characters and generic characters. The topic is a large one and beyond the book’s scope, so we only provide the minimum detail required.

Windows supports standard 8-bit characters (type char or CHAR) and wide 16-bit characters (WCHAR, which is defined to be the C wchar_t type). The Microsoft documentation refers to the 8-bit character set as ANSI, but it is actually a misnomer. For convenience, we use the term “ASCII,” which also is not totally accurate.⁴

⁴ The distinctions and details are technical but can be critical in some situations. ASCII codes only go to 127. There are different ANSI code pages, which are configurable from the Control Panel. Use your favorite search engine or search MSDN with a phrase such as “Windows code page 1252” to obtain more information.

The wide character support that Windows provides using the Unicode UTF-16 encoding is capable of representing symbols and letters in all major languages, including English, French, Spanish, German, Japanese, and Chinese.

Here are the normal steps for writing a generic Windows application that can be built to use either Unicode or 8-bit ASCII characters.

1. Define all characters and strings using the generic types TCHAR, LPTSTR, and LPCTSTR.

2. Include the definitions #define UNICODE and #define _UNICODE in all source modules to get Unicode wide characters (ANSI C wchar_t); otherwise, with UNICODE and _UNICODE undefined, TCHAR will be equivalent to CHAR (ANSI C char). The definitions must precede the #include <windows.h> statement and are frequently placed on the compiler command line, in the Visual Studio project properties, or in the project’s stdafx.h file. The first preprocessor variable controls the Windows function definitions, and the second variable controls the C library.

3. Byte buffer lengths—as used, for example, in ReadFile—can be calculated using sizeof (TCHAR).

4. Use the collection of generic C library string and character I/O functions in tchar.h. Representative functions are _fgettc, _itot (for itoa), _stprintf (for sprintf), _tcscpy (for strcpy), _ttoi, _totupper, _totlower, and _ftprintf.⁵ See MSDN for a complete and extensive list. All these definitions depend on _UNICODE. This collection is not complete. memchr is an example of a function without a wide character implementation. New versions are provided in the Examples file as required.

⁵ The underscore character (_) indicates that a function or keyword is provided by Microsoft C, and the letters t and T denote a generic text character. Other development systems provide similar capability but may use different names or keywords.

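5. Constant strings should be in one of three forms: an ordinary 8-bit literal such as "text", a wide literal with the L prefix such as L"text", or a generic literal wrapped in the _T macro such as _T("text"). Use these conventions for single characters as well. The first two forms are ANSI C; the third, the _T macro (equivalently, TEXT and _TEXT), is supplied with the Microsoft C compiler.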

6. Include tchar.h after windows.h to get required definitions for text macros and generic C library functions.

Windows uses Unicode 16-bit characters throughout, and NTFS file names and pathnames are represented internally in Unicode. If the UNICODE macro is defined, wide character strings are required by Windows calls; otherwise, 8-bit character strings are converted to wide characters. Some Windows API functions only support Unicode, and this policy is expected to continue with new functions.

All future program examples will use TCHAR instead of the normal char for characters and character strings unless there is a clear reason to deal with individual 8-bit characters. Similarly, the type LPTSTR indicates a pointer to a generic string, and LPCTSTR indicates, in addition, a constant string. At times, this choice will add some clutter to the programs, but it is the only choice that allows the flexibility necessary to develop and test applications in either Unicode or 8-bit character form so that the program can be easily converted to Unicode at a later date. Furthermore, this choice is consistent with common, if not universal, industry practice.

It is worthwhile to examine the system include files to see how TCHAR and the system function interfaces are defined and how they depend on whether or not UNICODE and _UNICODE are defined. A typical entry uses #ifdef UNICODE to define TCHAR as WCHAR and otherwise defines it as CHAR.
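
The following portable sketch imitates the shape of such a header entry. All of the MY_ names are invented for this illustration so that the fragment compiles without windows.h; the real definitions are in the Windows header files and tchar.h and differ in detail.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the real header entries. Define MY_UNICODE
   before this point to select the wide-character build. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;            /* Wide build. */
#define MY_T(s) L##s                 /* Turns "abc" into L"abc". */
#else
typedef char MY_TCHAR;               /* 8-bit build. */
#define MY_T(s) s
#endif

/* Length in characters of a generic string, whichever width is selected. */
size_t MyTextLength(const MY_TCHAR *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Source code written against MY_TCHAR and MY_T compiles in either mode, which is the property that the generic TCHAR and _T definitions provide.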

Alternative Generic String Processing Functions

String comparisons can use lstrcmp and lstrcmpi rather than the generic _tcscmp and _tcscmpi to account for the specific language and region, or locale, at run time and also to perform word rather than string comparisons. String comparisons simply compare the numerical values of the characters, whereas word comparisons consider locale-specific word order. The two methods can give opposite results for string pairs such as coop/co-op and were/we’re.

There is also a group of Windows functions for dealing with Unicode characters and strings. These functions handle locale characteristics transparently. Typical functions are CharUpper, which can operate on strings as well as individual characters, and IsCharAlphaNumeric. Other string functions include CompareString (which is locale-specific). The generic C library functions (e.g., _tprintf) and the Windows functions will both appear in upcoming examples to demonstrate their use. Examples in later chapters will rely mostly on the generic C library for character and string manipulation, as the C Library has the required functionality, the Windows functions do not add value, and readers will be familiar with the C Library.

The Generic Main Function

Replace the C main function, with its argument list (argv[]), with the macro _tmain. The macro expands to either main or wmain depending on the _UNICODE definition. The _tmain definition is in tchar.h, which must be included after windows.h. A typical main program heading, then, would look like this: int _tmain(int argc, LPTSTR argv[]).

The Microsoft C _tmain function also supports a third parameter for environment strings. This nonstandard extension is also common in UNIX.

Function Definitions

A function such as CreateFile is defined through a preprocessor macro as CreateFileA when UNICODE is not defined and as CreateFileW when UNICODE is defined. The definitions also describe the string parameters as 8-bit or wide character strings. Consequently, compilers will report a source code error, such as an illegal parameter to CreateFile, as an error in the use of CreateFileA or CreateFileW.
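
The same mechanism can be imitated in portable C. In this sketch, CountNameChars and its A and W variants are hypothetical functions invented for illustration; only the pattern of selecting a suffixed function through a macro mirrors what the Windows headers do for CreateFile.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical 8-bit and wide variants of one logical function. */
size_t CountNameCharsA(const char *name)    { return strlen(name); }
size_t CountNameCharsW(const wchar_t *name) { return wcslen(name); }

/* The generic name is only a macro that selects one of the variants. */
#ifdef UNICODE
#define CountNameChars CountNameCharsW
#else
#define CountNameChars CountNameCharsA
#endif
```

Because the generic name disappears during preprocessing, a compiler diagnostic for a bad argument names CountNameCharsA or CountNameCharsW, just as errors in CreateFile calls are reported against CreateFileA or CreateFileW.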

Unicode Strategies

A programmer starting a Windows project, either to develop new code or to enhance or port existing code, can select from four strategies, based on project requirements.

1. 8-bit only. Ignore Unicode and continue to use the char (or CHAR) data type and the Standard C library for functions such as printf, atoi, and strcmp.

2. 8-bit or Unicode with generic code. Follow the earlier guidelines for generic code. The example programs generally use this strategy with the Unicode macros undefined to produce 8-bit code.

3. Unicode only. Follow the generic guidelines, but define the two preprocessor variables. Alternatively, use wide characters and the wide character functions exclusively.

4. Unicode and 8-bit. The program includes both Unicode and ASCII code and decides at run time which code to execute, based on a run-time switch or other factors.

As mentioned previously, writing generic code, while requiring extra effort and creating awkward-looking code, allows the programmer to maintain maximum flexibility. However, Unicode only (Strategy 3) is increasingly common, especially with applications requiring a graphical user interface.

ReportError (Program 2-1) shows how to specify the language for error messages.

Program 2-1 ReportError: Reporting System Call Errors

Example: Error Processing

cpW, Program 1-2, showed some rudimentary error processing, obtaining the DWORD error number with the GetLastError function. A function call, rather than a global error number, such as the UNIX errno, ensures that system errors are unique to the threads (Chapter 7) that share data storage.

The function FormatMessage turns the message number into a meaningful message, in English or one of many other languages, returning the message length.

Program 2-1 shows a useful general-purpose error-processing function, ReportError, which is similar to the C library perror and to err_sys, err_ret, and other functions. ReportError prints a message specified in the first argument and will terminate with an exit code or return, depending on the value of the second argument. The third argument determines whether the system error message should be displayed.

Notice the arguments to FormatMessage. The value returned by GetLastError is used as one parameter, and a flag indicates that the message is to be generated by the system. The generated message is stored in a buffer allocated by the function, and the address is returned in a parameter. There are several other parameters with default values. The language for the message can be set at either compile time or run time. This information is sufficient for our needs, but MSDN supplies complete details.

ReportError can simplify error processing, and nearly all subsequent examples use it. Chapter 4 extends ReportError to generate exceptions.

Program 2-1 introduces the include file Everything.h. As the name implies, this file includes windows.h, other include files, and Environment.h, which has the UNICODE definition.⁶ It also defines commonly used functions, such as ReportError itself. All subsequent examples will use this single include file, which is in the Examples code.

⁶ “Everything” is an exaggeration, of course, but it’s everything we need for most examples, and it’s used in nearly all examples. Additional special-purpose include files are introduced in later chapters.

Notice the call to the function LocalFree near the end of the program, as required by FormatMessage (see MSDN). This function is explained in Chapter 5. Previous book editions erroneously used GlobalFree.

Run 2-2 shows sample ReportError output from a complete program, as do many other screenshots throughout the book.

Standard Devices

Like UNIX, a Windows process has three standard devices for input, output, and error reporting. UNIX uses well-known values for the file descriptors (0, 1, and 2), but Windows requires HANDLEs and provides a function to obtain them for the standard devices.

Parameters

nStdHandle must have one of these values:

• STD_INPUT_HANDLE

• STD_OUTPUT_HANDLE

• STD_ERROR_HANDLE

The standard device assignments are normally the console and the keyboard. Standard I/O can be redirected.

GetStdHandle does not create a new or duplicate handle on a standard device. Successive calls in the process with the same device argument return the same handle value. Closing a standard device handle makes the device unavailable for future use within the process. For this reason, the examples often obtain a standard device handle but do not close it.

Chapter 7’s grepMT example and Chapter 11’s pipe example illustrate GetStdHandle usage.

Parameters

In SetStdHandle, nStdHandle has the same enumerated values as in GetStdHandle. hHandle specifies an open file that is to be the standard device.

There are two reserved pathnames for console input (the keyboard) and console output: "CONIN$" and "CONOUT$". Initially, standard input, output, and error are assigned to the console. It is possible to use the console regardless of any redirection to these standard devices; just use CreateFile to open handles to "CONIN$" or "CONOUT$". The “Console I/O” section at the end of this chapter covers the subject.

Example: Copying Multiple Files to Standard Output

cat, the next example (Program 2-2), illustrates standard I/O and extensive error checking as well as user interaction. This program is a limited implementation of the UNIX cat command, which copies one or more specified files—or standard input if no files are specified—to standard output.

Program 2-2 cat: File Concatenation to Standard Output

Program 2-2 includes complete error handling. Future program listings omit most error checking for brevity, but the Examples contain the complete programs with extensive error checking and documentation. Also, notice the Options function, which is called at the start of the program. This function, included in the Examples file and used throughout the book, evaluates command line option flags and returns the argv index of the first file name. Use Options in much the same way as getopt is used in many UNIX programs.

Run 2-2 shows cat output with and without errors. The error output occurs when a file name does not exist. The output also shows the text that the randfile program generates; randfile is convenient for these examples, as it quickly generates text files of nearly any size. Also, notice that the records can be sorted on the first 8 characters, which will be convenient for examples in later chapters. The “x” character at the end of each line is a visual cue and has no other meaning.

Finally, Run 2-2 shows cat displaying individual file names; this feature is not part of Program 2-2 but was added temporarily to help clarify Run 2-2.

Run 2-2 cat: Results, with ReportError Output

Example: Simple File Encryption

File copying is familiar by now, so Program 2-3 also converts a file byte-by-byte so that there is computation as well as file I/O. The conversion is a modified “Caesar cipher,” which adds a fixed number to each byte (a Web search will provide extensive background information). The program also includes some error reporting. It is similar to Program 1-3 (cpCF), replacing the final call to CopyFile with a new function that performs the file I/O and the byte addition.

Program 2-3 cci: File Encryption with Error Reporting

The shift number and the input and output file names are command line parameters. The program adds the shift to each byte modulo 256, which means that the encrypted file may contain unprintable characters. Furthermore, end of line, end of string, and other control characters are changed. A true Caesar cipher only shifts the letters; this implementation shifts all bytes. You can decrypt the file by running the program again with a shift of 256 minus the original shift or with the negative of the original shift.
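
The byte arithmetic can be sketched in portable C. This is not the book's cci_f function (Program 2-4); ShiftBytes is a hypothetical helper that shows only the modulo 256 addition and why a shift of 256 minus the original, or a negative shift, reverses it.

```c
#include <stddef.h>

/* Hypothetical helper: add shift to every byte, modulo 256, in place. */
void ShiftBytes(unsigned char *data, size_t length, int shift)
{
    /* Normalize the shift into 0..255 so that negative values work. */
    unsigned int s = (unsigned int)(((shift % 256) + 256) % 256);
    size_t i;

    for (i = 0; i < length; i++)
        data[i] = (unsigned char)((data[i] + s) % 256);
}
```

With a shift of 3, the byte 'A' becomes 'D', the byte 0xFF wraps around to 2, and the newline character (10) becomes 13, which illustrates how control characters change. Applying a shift of 253 or of -3 to the result restores the original bytes.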

This program, while simple, is a good base for numerous variations later in the book that use threads, asynchronous I/O, and other file processing techniques.

Program 2-4, immediately after Program 2-3, shows the actual conversion function, and Run 2-3 shows program operation with encryption, decryption, and file comparison using the Windows FC command.

Program 2-4 cci_f: File Conversion Function

Run 2-3 cci: Caesar Cipher Run and Test

Comment: Note that the full Examples code uses the Microsoft C Library function, _taccess, to determine if the file exists. The code comments describe two alternative techniques.

Warning: Future program listings after Program 2-3 omit most, or all, error checking in order to streamline the presentation and concentrate on the logic. Use the full Examples code if you want to copy any of the examples.

Program 2-4 is the conversion function cci_f called by Program 2-3; later, we’ll have several variations of this function.

Performance

Appendix C shows that the performance of the file conversion program can be improved by using such techniques as providing a larger buffer and by specifying FILE_FLAG_SEQUENTIAL_SCAN with CreateFile. Later chapters show more advanced techniques to enhance this simple program.

File and Directory Management

This section introduces the basic functions for file and directory management.

File Management

Windows provides a number of file management functions, which are generally straightforward. The functions described here delete, copy, and rename files. There is also a function to create temporary file names.

File Deletion

You can delete a file by specifying the file name and calling the DeleteFile function. Recall that all absolute pathnames start with a drive letter or a server name.

Copying a File

Copy an entire file using a single function, CopyFile, which was introduced in Chapter 1’s cpCF (Program 1-3) example.

CopyFile copies the named existing file and assigns the specified new name to the copy. If a file with the new name already exists, it will be replaced only if fFailIfExists is FALSE. CopyFile also copies file metadata, such as creation time.

Hard and Symbolic Links

Create a hard link between two files with the CreateHardLink function, which is similar to a UNIX hard link. With a hard link, a file can have two separate names. Note that there is only one file, so a change to the file will be available regardless of the name used to open the file.

The first two arguments, while in the opposite order, are used as in CopyFile. The two file names, the new name and the existing name, must be on the same file system volume, but they can be in different directories. The security attributes, if any, apply to the new file name.

Windows Vista and other NT6 systems support a similar symbolic link function, CreateSymbolicLink, but there is no symbolic link support in earlier Windows systems.

lpSymlinkFileName is the symbolic link that is created to lpTargetFileName. Set dwFlags to 0 if the target is a file, and set it to SYMBOLIC_LINK_FLAG_DIRECTORY if it is a directory. lpTargetFileName is treated as an absolute link if there is a device name associated with it. See MSDN for detailed information about absolute and relative links.

Renaming and Moving Files

There is a pair of functions to rename, or “move,” a file. These functions also work for directories, whereas DeleteFile and CopyFile are restricted to files.

MoveFile fails if the new file already exists; use MoveFileEx to overwrite existing files.

Note: The Ex suffix is common and represents an extended version of an existing function in order to provide additional functionality. Many extended functions are not supported in earlier Windows versions.

The MoveFile and MoveFileEx parameters, especially the flags, are sufficiently complex to require additional explanation:

lpExistingFileName specifies the name of the existing file or directory.

lpNewFileName specifies the new file or directory name, which cannot already exist in the case of MoveFile. A new file can be on a different file system or drive, but new directories must be on the same drive. If NULL, the existing file is deleted. Wildcards are not allowed in file or directory names. Specify the actual name.

dwFlags specifies options as follows:

• MOVEFILE_REPLACE_EXISTING—Use this option to replace an existing file.

• MOVEFILE_WRITE_THROUGH—Use this option to ensure that the function does not return until the copied file is flushed through to the disk.

• MOVEFILE_COPY_ALLOWED—When the new file is on a different volume, the move is achieved with a CopyFile followed by a DeleteFile. You cannot move a file to a different volume without using this flag, and moving a file to the same volume just involves renaming without copying the file data, which is fast compared to a full copy.

• MOVEFILE_DELAY_UNTIL_REBOOT—This flag, which cannot be used in conjunction with MOVEFILE_COPY_ALLOWED, is restricted to administrators and ensures that the file move does not take effect until Windows restarts. Also, if the new file name is null, the existing file will be deleted when Windows restarts.
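The flags are bit values that are combined with a bitwise OR before being passed as dwFlags. The following sketch checks the one restriction mentioned in the list. It is an illustration only: the helper name move_flags_compatible is hypothetical, and the flag values are repeated from the Windows SDK headers just so that the fragment compiles without windows.h.

```c
/* Values as defined in the Windows SDK headers; repeated here only so the
   sketch compiles without windows.h. */
#ifndef MOVEFILE_REPLACE_EXISTING
#define MOVEFILE_REPLACE_EXISTING   0x00000001
#define MOVEFILE_COPY_ALLOWED       0x00000002
#define MOVEFILE_DELAY_UNTIL_REBOOT 0x00000004
#define MOVEFILE_WRITE_THROUGH      0x00000008
#endif

/* Hypothetical helper: returns 1 unless both MOVEFILE_DELAY_UNTIL_REBOOT
   and MOVEFILE_COPY_ALLOWED are set, which is the combination the text
   says is not allowed. */
int move_flags_compatible(unsigned long flags)
{
    unsigned long conflict = MOVEFILE_DELAY_UNTIL_REBOOT | MOVEFILE_COPY_ALLOWED;
    return (flags & conflict) != conflict;
}
```

For example, MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED passes the check, whereas MOVEFILE_DELAY_UNTIL_REBOOT | MOVEFILE_COPY_ALLOWED does not.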

UNIX pathnames do not include a drive or server name; the slash indicates the system root. The Microsoft C library file functions also support drive names as required by the underlying Windows file naming.

UNIX does not have a function to copy files directly. Instead, you must write a small program or call system() to execute the cp command.

unlink is the UNIX equivalent of DeleteFile, except that some UNIX implementations allow a privileged process to unlink a directory; rmdir is the portable way to remove a directory.

rename and remove are in the C library, and rename will fail when attempting to move a file to an existing file name or a directory to a non-empty directory.

Directory Management

Creating or deleting a directory involves a pair of simple functions.

lpPathName points to a null-terminated string with the name of the directory that is to be created or deleted. The security attributes, as with other functions, should be NULL for the time being; Chapter 15 describes file and object security. Only an empty directory can be removed.

A process has a current, or working, directory, just as in UNIX. Furthermore, each individual drive keeps a working directory. Programs can both get and set the current directory. The first function sets the directory.

lpPathName is the path to the new current directory. It can be a relative path or a fully qualified path starting with either a drive letter and colon, such as D:, or a UNC name (such as \\ACCTG_SERVER\PUBLIC).

If the directory path is simply a drive name (such as A: or C:), the working directory becomes the working directory on the specified drive. For example, if the working directories are set in the sequence

then the resulting working directory will be

C:\MSDEV\INCLUDE

The next function returns the fully qualified pathname into a specified buffer.

cchCurDir is the length, in characters rather than bytes, of the buffer for the directory name (the cch prefix denotes a character count, whereas cb denotes a byte count). The length must allow for the terminating null character. lpCurDir points to the buffer to receive the pathname string.

Notice that if the buffer is too small for the pathname, the return value tells how large the buffer should be. Therefore, the test for function failure should test both for zero and for the result being larger than the cchCurDir argument.
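The two-part test can be isolated in a small helper, sketched here in portable C. The names len_status and check_length_result are hypothetical, and unsigned long stands in for the Windows DWORD type so that the fragment compiles without windows.h.

```c
/* Outcome of a call that follows the GetCurrentDirectory convention. */
enum len_status { LEN_FAILED, LEN_TOO_SMALL, LEN_OK };

/* Hypothetical helper: `returned` is the function's return value and
   `cch` is the buffer length, in characters, that was passed in. */
enum len_status check_length_result(unsigned long returned, unsigned long cch)
{
    if (returned == 0)
        return LEN_FAILED;     /* the call failed; GetLastError has the code */
    if (returned > cch)
        return LEN_TOO_SMALL;  /* returned is the required buffer length */
    return LEN_OK;             /* returned is the string length without the null */
}
```

In a Windows program, the first argument would be the value returned by GetCurrentDirectory and the second would be the cchCurDir value passed to it. A LEN_TOO_SMALL result tells the caller how large a buffer to allocate before trying again.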

This method of returning strings and their lengths is common in Windows and must be handled carefully. Program 2-6 illustrates a typical code fragment that performs the logic. Similar logic occurs in other examples. The method is not always consistent, however. Some functions return a Boolean, and the length parameter is used twice; it is set with the length of the buffer before the call, and the function changes the value. LookupAccountName in Chapter 15 is one of the more complex functions in terms of returning results.

An alternative approach, illustrated with the GetFileSecurity function in Program 15-4, is to make two function calls with a buffer memory allocation in between. The first call gets the string length, which is used in the memory allocation. The second call gets the actual string. The simplest approach in this case is to allocate a string holding MAX_PATH characters.

Examples Using File and Directory Management Functions

pwd (Program 2-6) uses GetCurrentDirectory. Example programs in Chapter 3 and elsewhere use other file and directory management functions.

Console I/O

Console I/O can be performed with ReadFile and WriteFile, but it is simpler to use the specific console I/O functions, ReadConsole and WriteConsole. The principal advantages are that these functions process generic characters (TCHAR) rather than bytes, and they also process characters according to the console mode, which is set with the SetConsoleMode function.

Parameters

hConsoleHandle identifies a console input or screen buffer, which must have GENERIC_WRITE access even if it is an input-only device.

dwMode specifies how characters are processed. Each flag name indicates whether the flag applies to console input or output. Five commonly used flags, listed here, control behavior; they are all enabled by default.

• ENABLE_LINE_INPUT—Specify that ReadConsole returns when it encounters a carriage return character.

• ENABLE_ECHO_INPUT—Echo characters to the screen as they are read.

• ENABLE_PROCESSED_INPUT—Process backspace, carriage return, and line feed characters.

• ENABLE_PROCESSED_OUTPUT—Process backspace, tab, bell, carriage return, and line feed characters.

• ENABLE_WRAP_AT_EOL_OUTPUT—Enable line wrap for both normal and echoed output.

If SetConsoleMode fails, the mode is unchanged and the function returns FALSE. GetLastError returns the error code number.
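Because the mode is a set of bit flags, a program builds the dwMode value by ORing together the flags it wants and leaving out the rest. The sketch below computes an input mode with or without echo, as a password prompt might need. The helper name prompt_input_mode is hypothetical, and the flag values are repeated from the Windows SDK headers only so that the fragment compiles without windows.h.

```c
/* Values as defined in the Windows SDK headers; repeated here only so the
   sketch compiles without windows.h. */
#ifndef ENABLE_PROCESSED_INPUT
#define ENABLE_PROCESSED_INPUT 0x0001
#define ENABLE_LINE_INPUT      0x0002
#define ENABLE_ECHO_INPUT      0x0004
#endif

/* Hypothetical helper: build the input mode for a prompt. Line input and
   processed input are always on; echo is on only when `echo` is nonzero. */
unsigned long prompt_input_mode(int echo)
{
    unsigned long mode = ENABLE_LINE_INPUT | ENABLE_PROCESSED_INPUT;
    if (echo)
        mode |= ENABLE_ECHO_INPUT;
    return mode;
}
```

The result would be passed as dwMode to SetConsoleMode along with a console input handle. A program that changes the mode temporarily can first save the current mode with GetConsoleMode and restore it afterward.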

The ReadConsole and WriteConsole functions are similar to ReadFile and WriteFile.

The parameters are nearly the same as with ReadFile. The two length parameters are in terms of generic characters rather than bytes, and lpReserved must be NULL. Never use any of the reserved fields that occur in this and other functions. WriteConsole is now self-explanatory. The next example (Program 2-5) shows how to use ReadConsole and WriteConsole with generic strings and how to take advantage of the console mode.

Program 2-5 PrintMsg: Console Prompt and Print Utility Functions

A process can have only one console at a time. Applications such as the ones developed so far are normally initialized with a console. In many cases, such as a server or GUI application, however, you may need a console to display status or debugging information. There are two simple parameterless functions for this purpose.

FreeConsole detaches a process from its console. Calling AllocConsole then creates a new one associated with the process’s standard input, output, and error handles. AllocConsole will fail if the process already has a console; to avoid this problem, precede the call with FreeConsole.

Note: Windows GUI applications do not have a default console and must allocate one before using functions such as WriteConsole or printf to display on a console. It’s also possible that server processes may not have a console. Chapter 6 shows how to create a process without a console.

There are numerous other console I/O functions for specifying cursor position, screen attributes (such as color), and so on. This book’s approach is to use only those functions needed to get the examples to work and not to wander further than necessary into user interfaces. It is easy to learn additional functions from the MSDN reference material after you see the examples.

For historical reasons, Windows does not support character-oriented terminals in the way that UNIX does, and not all the UNIX terminal functionality is replicated by Windows. For example, UNIX provides functions for setting baud rates and line control functions. Stevens and Rago dedicate a chapter to UNIX terminal I/O (Chapter 11) and one to pseudo terminals (Chapter 19).

Serious Windows user interfaces are, of course, graphical, with mouse as well as keyboard input. The GUI is outside the scope of this book, but everything we discuss works within a GUI application.

Example: Printing and Prompting

The ConsolePrompt function, which appears in PrintMsg (Program 2-5), is a useful utility that prompts the user with a specified message and then returns the user’s response. There is an option to suppress the response echo. The function uses the console I/O functions and generic characters. PrintStrings and PrintMsg are the other entries in this module; they can use any handle but are normally used with standard output or error handles. The first function allows a variable-length argument list, whereas the second one allows just one string and is for convenience only. PrintStrings uses the va_start, va_arg, and va_end functions in the Standard C library to process the variable-length argument list.
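The variable-length argument technique can be sketched in portable C as follows. This is not the PrintStrings listing from Program 2-5: the name print_strings is hypothetical, a C library FILE pointer stands in for the Windows handle, and the sketch assumes that the caller ends the argument list with a null pointer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for PrintStrings: writes each string argument to
   `out` until a NULL pointer ends the list. Returns 1 on success and 0
   if a write fails. */
int print_strings(FILE *out, ...)
{
    va_list args;
    const char *msg;
    int ok = 1;

    va_start(args, out);
    while (ok && (msg = va_arg(args, const char *)) != NULL) {
        if (fputs(msg, out) == EOF)
            ok = 0;
    }
    va_end(args);
    return ok;
}
```

A call might look like print_strings(stdout, "Enter ", "password: ", (const char *)NULL). The cast on the final argument matters because a bare NULL passed through a variable-length list is not guaranteed to have pointer type.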

Example programs will use these functions and the generic C library functions as convenient.

See Run 2-6 after Program 2-6 for sample outputs. Chapters 11 and 15 have examples using ConsolePrompt.

Run 2-6 pwd: Determining the Current Directory

Program 2-6 pwd: Printing the Current Directory

Notice that ConsolePrompt returns a Boolean success indicator. Furthermore, GetLastError will return the error from the function that failed, but it’s important to call ReportError, and hence GetLastError, before the CloseHandle calls.

Also, ReadConsole returns a carriage return and line feed, so the last step is to insert a null character in the proper location over the carriage return. The calling program must provide the maxChar parameter to prevent buffer overflow.
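The termination step can be sketched as a small helper. The name terminate_response is hypothetical, and the sketch uses plain char rather than the generic TCHAR so that it compiles without the Windows headers. It assumes the buffer has room for at least one character beyond the count that was read.

```c
#include <stddef.h>

/* Hypothetical helper: `count` is the number of characters the read
   returned, including any trailing carriage return and line feed.
   Overwrites the carriage return with a null character and returns the
   length of the resulting string. */
size_t terminate_response(char *buf, size_t count)
{
    if (count >= 2 && buf[count - 2] == '\r' && buf[count - 1] == '\n') {
        buf[count - 2] = '\0';
        return count - 2;
    }
    /* No CR-LF pair: terminate after the last character read. The caller
       must have left room in the buffer for this extra character. */
    buf[count] = '\0';
    return count;
}
```

A caller that limits the read to fewer characters than the buffer holds always leaves room for the terminating null, which is one reason the maximum character count is passed in by the calling program.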

Example: Printing the Current Directory

pwd (Program 2-6) implements a version of the UNIX command pwd. The buffer holds MAX_PATH characters, but the program still tests the returned length to illustrate the GetCurrentDirectory return-value convention.

Run 2-6 shows the results, which appear on a single line. The Windows Command Prompt produces the first and last lines, whereas pwd produces the middle line.

Summary

Windows supports a complete set of functions for processing and managing files and directories, along with character processing functions. In addition, you can write portable, generic applications that can be built for either ASCII or Unicode operation.

The Windows functions resemble their UNIX and C library counterparts in many ways, but the differences are also apparent. Appendix B discusses portable coding techniques. Appendix B also has a table showing the Windows, UNIX, and C library functions, noting how they correspond and pointing out some of the significant differences.

Looking Ahead

The next step, in Chapter 3, is to discuss direct file access and to learn how to deal with file and directory attributes such as file length and time stamps. Chapter 3 also shows how to process directories and ends with a discussion of the registry management API, which is similar to the directory management API.

Additional Reading

NTFS and Windows Storage

Inside Windows Storage, by Dilip Naik, is a comprehensive discussion of the complete range of Windows storage options including directly attached and network attached storage. Recent developments, enhancements, and performance improvements, along with internal implementation details, are all described.

Inside the Windows NT File System, by Helen Custer, and Windows NT File System Internals, by Rajeev Nagar, are additional references, as is the previously mentioned Windows Internals: Including Windows Server 2008 and Windows Vista.

Unicode

Developing International Software, by Dr. International (that’s the name on the book), shows how to use Unicode in practice, with guidelines, international standards, and culture-specific issues.

UNIX

Stevens and Rago cover UNIX files and directories in Chapters 3 and 4 and terminal I/O in Chapter 11.

UNIX in a Nutshell, by Arnold Robbins et al., is a useful quick reference on the UNIX commands.

Exercises

2–1. Write a short program to test the generic versions of printf and scanf.

2–2. Modify the CatFile function in cat (Program 2-2) so that it uses WriteConsole rather than WriteFile when the standard output handle is associated with a console.

2–3. CreateFile allows you to specify file access characteristics so as to enhance performance. FILE_FLAG_SEQUENTIAL_SCAN is an example. Use this flag in cci_f (Program 2-4) and determine whether there is a performance improvement for large files, including files larger than 4GB. Also try FILE_FLAG_NO_BUFFERING after reading the MSDN CreateFile documentation carefully. Appendix C shows results on several Windows versions and computers.

2–4. Run cci (Program 2-3) with and without UNICODE defined. What is the effect, if any?

2–5. Compare the information provided by perror (in the C library) and ReportError for common errors such as opening a nonexistent file.

2–6. Test the ConsolePrompt (Program 2-5) function’s suppression of keyboard echo by using it to ask the user to enter and confirm a password.

2–7. Determine what happens when performing console output with a mixture of generic C library and Windows WriteFile or WriteConsole calls. What is the explanation?

2–8. Write a program that sorts an array of Unicode strings. Determine the difference between the word and string sorts by using lstrcmp and _tcscmp. Does lstrlen produce different results from those of _tcslen? The remarks under the CompareString function entry in the Microsoft online help are useful.

2–9. Appendix C provides performance data for file copying and cci conversion using different program implementations. Investigate performance with the test programs on computers available to you. Also, if possible, investigate performance using networked file systems, SANs, and so on, to understand the impact of various storage architectures when performing sequential file access.
