17
FILESYSTEMS

“So, you’re the UNIX guru.” At the time, Randy was still stupid enough to be flattered by this attention, when he should have recognized them as bone-chilling words.
—Neal Stephenson, Cryptonomicon

This chapter teaches you how to use the stdlib’s Filesystem library to perform operations on filesystems, such as manipulating and inspecting files, enumerating directories, and interoperating with file streams.

The stdlib and Boost both contain Filesystem libraries. The stdlib’s Filesystem library grew out of Boost’s, so the two are largely interchangeable. This chapter focuses on the stdlib implementation. If you’re interested in learning more about Boost’s, refer to the Boost Filesystem documentation.

NOTE

The C++ Standard has a history of subsuming Boost libraries. This allows the C++ community to gain experience with new features in Boost before going through the more arduous process of including the features in the C++ Standard.

Filesystem Concepts

Filesystems model several important concepts. The central entity is the file. A file is a filesystem object that supports input and output and holds data. Files exist in containers called directories, which can be nested within other directories. For simplicity, directories are considered files. The directory containing a file is called that file’s parent directory.

A path is a string that identifies a specific file. Paths begin with an optional root name, which is an implementation-specific string, such as C: or //localhost on Windows, followed by an optional root directory, which is another implementation-specific string, such as / on Unix-like systems. The remainder of the path is a sequence of directories separated by implementation-defined separators. Optionally, paths terminate in a non-directory file. Paths can contain the special names “.” and “..”, which mean current directory and parent directory, respectively.

A hard link is a directory entry that assigns a name to an existing file, and a symbolic link (or symlink) assigns a name to a path (which might or might not exist). A path whose location is specified in relation to another path (usually the current directory) is called a relative path, and a canonical path unambiguously identifies a file’s location, doesn’t contain the special names “.” and “..”, and doesn’t contain any symbolic links. An absolute path is any path that unambiguously identifies a file’s location. A major difference between a canonical path and an absolute path is that a canonical path cannot contain the special names “.” and “..”.
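To make the distinction between relative paths and the special names concrete, here is a minimal sketch. It assumes a C++17 compiler, uses plain assert rather than a test framework, and relies on path::lexically_normal, a member this chapter doesn’t otherwise cover. The path docs/./drafts/../readme.txt and the names collapse_dots and check_path_concepts are made up for illustration. Note that lexical normalization only removes “.” and “..”; unlike a true canonical path, it never consults the filesystem and so doesn’t resolve symlinks.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Collapses "." and ".." purely lexically. The filesystem is never consulted,
// so symbolic links are NOT resolved (a canonical path would resolve them).
fs::path collapse_dots(const fs::path& p) {
  return p.lexically_normal();
}

void check_path_concepts() {
  const fs::path rel{ "docs/./drafts/../readme.txt" };  // hypothetical path
  assert(rel.is_relative());
  assert(!rel.is_absolute());
  // "." disappears, and "drafts/.." cancels out.
  assert(collapse_dots(rel) == fs::path{ "docs/readme.txt" });
}
```

Call check_path_concepts() from main or from a test case; it works whether or not the named files exist.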

WARNING

The stdlib filesystem might not be available if the target platform doesn’t offer a hierarchical filesystem.

std::filesystem::path

The std::filesystem::path is the Filesystem library’s class for modeling a path, and you have many options for constructing paths. Perhaps the two most common are the default constructor, which constructs an empty path, and the constructor taking a string type, which creates the path indicated by the characters in the string. Like all other filesystem classes and functions, the path class resides in the <filesystem> header.

In this section, you’ll learn how to construct a path from a string representation, decompose it into constituent parts, and modify it. In many common system- and application-programming contexts, you’ll need to interact with files. Because each operating system has a unique representation for filesystems, the stdlib’s Filesystem library is a welcome abstraction that allows you to write cross-platform code easily.

Constructing Paths

The path class supports comparison with other path objects and with string objects using the operator==. But if you just want to check whether the path is empty, it offers an empty method that returns a Boolean. Listing 17-1 illustrates how to construct two paths (one empty and one non-empty) and test them.

#include <string>
#include <filesystem>

TEST_CASE("std::filesystem::path supports == and .empty()") {
  std::filesystem::path empty_path; 
  std::filesystem::path shadow_path{ "/etc/shadow" }; 
  REQUIRE(empty_path.empty()); 
  REQUIRE(shadow_path == std::string{ "/etc/shadow" }); 
}

Listing 17-1: Constructing std::filesystem::path

You construct two paths: one with the default constructor and one referring to /etc/shadow. Because you default construct it, the empty method of empty_path returns true. The shadow_path equals a string containing /etc/shadow, because you construct it with the same contents.

Decomposing Paths

The path class contains some decomposition methods that are, in effect, specialized string manipulators that allow you to extract components of the path, for example:

  • root_name() returns the root name.
  • root_directory() returns the root directory.
  • root_path() returns the root path.
  • relative_path() returns a path relative to the root.
  • parent_path() returns the parent path.
  • filename() returns the filename component.
  • stem() returns the filename stripped of its extension.
  • extension() returns the extension.

Listing 17-2 provides the values returned by each of these methods for a path pointing to a very important Windows system library, kernel32.dll.

#include <iostream>
#include <filesystem>

using namespace std;

int main() {
  const filesystem::path kernel32{ R"(C:\Windows\System32\kernel32.dll)" };
  cout << "Root name: " << kernel32.root_name()
    << "\nRoot directory: " << kernel32.root_directory()
    << "\nRoot path: " << kernel32.root_path()
    << "\nRelative path: " << kernel32.relative_path()
    << "\nParent path: " << kernel32.parent_path()
    << "\nFilename: " << kernel32.filename()
    << "\nStem: " << kernel32.stem()
    << "\nExtension: " << kernel32.extension()
    << endl;
}
-----------------------------------------------------------------------
Root name: "C:" 
Root directory: "\\"
Root path: "C:\\"
Relative path: "Windows\\System32\\kernel32.dll"
Parent path: "C:\\Windows\\System32"
Filename: "kernel32.dll" 
Stem: "kernel32" 
Extension: ".dll" 

Listing 17-2: A program printing various decompositions of a path

You construct a path to kernel32 using a raw string literal to avoid having to escape the backslashes. You extract the root name, the root directory, and the root path of kernel32 and output them to stdout. Next, you extract the relative path, which displays the path relative to the root C:\. The parent path is the path of kernel32.dll’s parent, which is simply the directory containing it. Finally, you extract the filename, its stem, and its extension.

Notice that you don’t need to run Listing 17-2 on any particular operating system. None of the decomposition methods require that the path actually point to an existing file. You simply extract components of the path’s contents, not the pointed-to file. Of course, different operating systems will yield different results, especially with respect to the delimiters (which are, for example, forward slashes on Linux).
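Because decomposition never touches the filesystem, you can check it against any path at all. The following sketch assumes a C++17 compiler and uses plain assert. The path /home/randy/archive.tar.gz and the names Parts, split, and check_decomposition are hypothetical. It also shows that extension returns only the part after the last period.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Bundles a few decompositions of a path (illustrative helper type).
struct Parts {
  fs::path parent;
  fs::path filename;
  fs::path stem;
  fs::path extension;
};

Parts split(const fs::path& p) {
  return Parts{ p.parent_path(), p.filename(), p.stem(), p.extension() };
}

void check_decomposition() {
  // Hypothetical path; no such file needs to exist.
  const Parts parts = split("/home/randy/archive.tar.gz");
  assert(parts.parent == fs::path{ "/home/randy" });
  assert(parts.filename == fs::path{ "archive.tar.gz" });
  assert(parts.stem == fs::path{ "archive.tar" });  // only the last extension is stripped
  assert(parts.extension == fs::path{ ".gz" });
}
```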

NOTE

Listing 17-2 illustrates that std::filesystem::path has an operator<< that prints quotation marks at the beginning and end of its path. Internally, it uses std::quoted, a function template in the <iomanip> header that facilitates the insertion and extraction of quoted strings. Because std::quoted also escapes backslashes, each backslash in a path is printed as two. The raw string literal in the source code, by contrast, needs no escaping.
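The quoting behavior is easy to observe by streaming a path into a std::ostringstream and comparing the result with the string method, which returns the unquoted contents. This is a small sketch assuming C++17. The helper names streamed and check_quoting and the two paths are invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Returns what operator<< writes for a path.
std::string streamed(const fs::path& p) {
  std::ostringstream os;
  os << p;
  return os.str();
}

void check_quoting() {
  const fs::path spaced{ "notes/todo list.txt" };  // hypothetical path
  assert(streamed(spaced) == "\"notes/todo list.txt\"");  // quotes are added
  assert(spaced.string() == "notes/todo list.txt");       // string() has none

  const fs::path dosish{ R"(C:\Temp)" };  // one backslash in the path itself
  assert(dosish.string() == R"(C:\Temp)");
  assert(streamed(dosish) == R"("C:\\Temp")");  // operator<< doubles the backslash
}
```

If you want the bare text in your output, stream p.string() instead of p.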

Modifying Paths

In addition to decomposition methods, path offers several modifier methods, which allow you to modify various characteristics of a path:

  • clear() empties the path.
  • make_preferred() converts all the directory separators to the implementation-preferred directory separator. For example, on Windows this converts the generic separator / to the system-preferred separator \.
  • remove_filename() removes the filename portion of the path.
  • replace_filename(p) replaces the path’s filename with that of another path p.
  • replace_extension(p) replaces the path’s extension with that of another path p. Called with no argument, replace_extension() removes the extension portion of the path.

Listing 17-3 illustrates how to manipulate a path using several modifier methods.

#include <iostream>
#include <filesystem>

using namespace std;

int main() {
  filesystem::path path{ R"(C:/Windows/System32/kernel32.dll)" };
  cout << path << endl; 

  path.make_preferred();
  cout << path << endl; 

  path.replace_filename("win32kfull.sys");
  cout << path << endl; 

  path.remove_filename();
  cout << path << endl; 

  path.clear();
  cout << "Is empty: " << boolalpha << path.empty() << endl; 
}
-----------------------------------------------------------------------
"C:/Windows/System32/kernel32.dll"
"C:\\Windows\\System32\\kernel32.dll"
"C:\\Windows\\System32\\win32kfull.sys"
"C:\\Windows\\System32\\"
Is empty: true 

Listing 17-3: Manipulating a path using modifier methods. (Output is from a Windows 10 x64 system.)

As in Listing 17-2, you construct a path to kernel32, although this one is non-const because you’re about to modify it. Next, you convert all the directory separators to the system’s preferred directory separator using make_preferred. Listing 17-3 shows output from a Windows 10 x64 system, so it has converted from slashes (/) to backslashes (\). Using replace_filename, you replace the filename from kernel32.dll to win32kfull.sys. Notice again that the file described by this path doesn’t need to exist on your system; you’re just manipulating the path. Finally, you remove the filename using the remove_filename method and then empty the path’s contents entirely using clear.
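The modifiers behave the same way on any platform when you stick to the generic / separator, so you can verify them without Windows. The sketch below assumes C++17 and plain assert. The path /var/log/app.log and the names with_extension and check_modifiers are illustrative only.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Returns a copy of p whose extension is replaced by ext.
fs::path with_extension(fs::path p, const fs::path& ext) {
  p.replace_extension(ext);
  return p;
}

void check_modifiers() {
  fs::path p{ "/var/log/app.log" };  // hypothetical path
  p.replace_extension(".bak");
  assert(p == fs::path{ "/var/log/app.bak" });

  p.replace_extension();  // no argument: removes the extension
  assert(p == fs::path{ "/var/log/app" });

  p.replace_filename("other.txt");
  assert(p == fs::path{ "/var/log/other.txt" });

  p.remove_filename();  // the trailing separator stays
  assert(p == fs::path{ "/var/log/" });
  assert(!p.has_filename());

  p.clear();
  assert(p.empty());
}
```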

Summary of Filesystem Path Methods

Table 17-1 contains a partial listing of the available methods of path. Note that p, p1, and p2 are path objects and s is a string type (or a stream, for operator<< and operator>>) in the table.

Table 17-1: A Summary of std::filesystem::path Operations

| Operation | Notes |
|---|---|
| path{} | Constructs an empty path. |
| path{ s, [f] } | Constructs a path from the string type s; f is an optional path::format type that defaults to the implementation-defined pathname format. |
| path{ p }, p1 = p2 | Copy construction/assignment. |
| path{ move(p) }, p1 = move(p2) | Move construction/assignment. |
| p.assign(s) | Assigns s to p, discarding current contents. |
| p.append(s), p / s | Appends s to p, including the appropriate separator, path::preferred_separator. |
| p.concat(s), p += s | Appends s to p without including a separator. |
| p.clear() | Erases the contents. |
| p.empty() | Returns true if p is empty. |
| p.make_preferred() | Converts all the directory separators to the implementation-preferred directory separator. |
| p.remove_filename() | Removes the filename portion. |
| p1.replace_filename(p2) | Replaces the filename of p1 with that of p2. |
| p1.replace_extension(p2) | Replaces the extension of p1 with that of p2. |
| p.root_name() | Returns the root name. |
| p.root_directory() | Returns the root directory. |
| p.root_path() | Returns the root path. |
| p.relative_path() | Returns the relative path. |
| p.parent_path() | Returns the parent path. |
| p.filename() | Returns the filename. |
| p.stem() | Returns the stem. |
| p.extension() | Returns the extension. |
| p.has_root_name() | Returns true if p has a root name. |
| p.has_root_directory() | Returns true if p has a root directory. |
| p.has_root_path() | Returns true if p has a root path. |
| p.has_relative_path() | Returns true if p has a relative path. |
| p.has_parent_path() | Returns true if p has a parent path. |
| p.has_filename() | Returns true if p has a filename. |
| p.has_stem() | Returns true if p has a stem. |
| p.has_extension() | Returns true if p has an extension. |
| p.c_str(), p.native() | Returns the native-string representation of p. |
| p.begin(), p.end() | Accesses the elements of a path sequentially as a half-open range. |
| s << p | Writes p into s. |
| s >> p | Reads s into p. |
| p1.swap(p2), swap(p1, p2) | Exchanges the contents of p1 and p2. |
| p1 == p2, p1 != p2, p1 > p2, p1 >= p2, p1 < p2, p1 <= p2 | Lexicographically compares two paths p1 and p2. |
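Two rows of the table are easy to confuse: appending inserts a separator, whereas concatenating doesn’t. The sketch below contrasts them and also walks a path with begin and end by way of a range-based for loop. It assumes C++17. The /usr/lib64/libc.so path and the names elements and check_append_and_concat are made up for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collects the elements of a path in iteration order, using the generic
// ("/"-separated) spelling of each element.
std::vector<std::string> elements(const fs::path& p) {
  std::vector<std::string> out;
  for (const auto& part : p) out.push_back(part.generic_string());
  return out;
}

void check_append_and_concat() {
  fs::path p{ "/usr" };
  p /= "lib";  // append: a separator is inserted
  assert(p == fs::path{ "/usr/lib" });
  p += "64";   // concat: no separator
  assert(p == fs::path{ "/usr/lib64" });

  // Iteration yields the root directory first, then each name in turn.
  const std::vector<std::string> expected{ "/", "usr", "lib64", "libc.so" };
  assert(elements(p / "libc.so") == expected);
}
```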

Files and Directories

The path class is the central element of the Filesystem library, but none of its methods actually interact with the filesystem. Instead, the <filesystem> header contains non-member functions to do this. Think of path objects as the way you declare which filesystem components you want to interact with and think of the <filesystem> header as containing the functions that perform work on those components.

These functions have friendly error-handling interfaces and allow you to break paths into, for example, directory name, filename, and extension. Using these functions, you have many tools for interacting with the files in your environment without having to use an operating system-specific application programming interface.

Error Handling

Interacting with the environment’s filesystem involves the potential for errors, such as files not found, insufficient permissions, or unsupported operations. Therefore, each non-member function in the Filesystem library that interacts with the filesystem must convey error conditions to the caller. These non-member functions provide two options: throw an exception or set an error variable.

Each function has two overloads: one that allows you to pass a reference to a std::error_code and one that omits this parameter. If you provide the reference, the function will set the error_code to an error condition, should one occur. If you don’t provide this reference, the function will throw a std::filesystem::filesystem_error (an exception type inheriting from std::system_error) instead.
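Here is a minimal sketch of both styles, using file_size on a file that presumably doesn’t exist. It assumes C++17. The helper names try_file_size, file_size_throws, and check_error_handling and the filename ccc_no_such_file.bin are hypothetical. If a file with that name happens to exist in your temporary directory, the assertions won’t hold.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Non-throwing style: returns true and fills size on success.
bool try_file_size(const fs::path& p, std::uintmax_t& size) {
  std::error_code ec;
  size = fs::file_size(p, ec);  // on failure, ec is set and nothing is thrown
  return !ec;
}

// Throwing style: returns true if file_size threw a filesystem_error.
bool file_size_throws(const fs::path& p) {
  try {
    (void)fs::file_size(p);
    return false;
  } catch (const fs::filesystem_error& e) {
    // The exception carries the offending path and an error_code.
    return e.path1() == p && static_cast<bool>(e.code());
  }
}

void check_error_handling() {
  // Assumed not to exist (hypothetical name).
  const fs::path missing = fs::temp_directory_path() / "ccc_no_such_file.bin";
  std::uintmax_t size{};
  assert(!try_file_size(missing, size));
  assert(file_size_throws(missing));
}
```

The error_code overloads tend to suit code where a missing file is an ordinary outcome, while the throwing overloads keep the happy path uncluttered.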

Path-Composing Functions

As an alternative to using the constructor of path, you can construct various kinds of paths:

  • absolute(p, [ec]) returns an absolute path referencing the same location as p but where is_absolute() is true.
  • canonical(p, [ec]) returns a canonical path referencing the same location as p.
  • current_path([ec]) returns the current path.
  • relative(p, [base], [ec]) returns a path where p is made relative to base.
  • temp_directory_path([ec]) returns a directory for temporary files. The result is guaranteed to be an existing directory.

Note that current_path supports an overload so you can set the current directory (as in cd or chdir on POSIX). Simply provide a path argument, as in current_path(p, [ec]).

Listing 17-4 illustrates several of these functions in action.

#include <filesystem>
#include <iostream>

using namespace std;

int main() {
  try {
    const auto temp_path = filesystem::temp_directory_path(); 
    const auto relative = filesystem::relative(temp_path); 
    cout << boolalpha
      << "Temporary directory path: " << temp_path
      << "\nTemporary directory absolute: " << temp_path.is_absolute()
      << "\nCurrent path: " << filesystem::current_path()
      << "\nTemporary directory's relative path: " << relative
      << "\nRelative directory absolute: " << relative.is_absolute()
      << "\nChanging current directory to temp.";
    filesystem::current_path(temp_path);
    cout << "\nCurrent directory: " << filesystem::current_path();
  } catch(const exception& e) {
    cerr << "Error: " << e.what(); 
  }
}
-----------------------------------------------------------------------
Temporary directory path: "C:\\Users\\lospi\\AppData\\Local\\Temp\\"
Temporary directory absolute: true
Current path: "c:\\Users\\lospi\\Desktop"
Temporary directory's relative path: "..\\AppData\\Local\\Temp"
Relative directory absolute: false
Changing current directory to temp.
Current directory: "C:\\Users\\lospi\\AppData\\Local\\Temp"

Listing 17-4: A program using several path composing functions. (Output is from a Windows 10 x64 system.)

You construct a path using temp_directory_path, which returns the system’s directory for temporary files, and then use relative to determine its relative path. After printing the temporary path, is_absolute illustrates that this path is absolute. Next, you print the current path and the temporary directory’s path relative to the current path. Because this path is relative, is_absolute returns false. Once you change the path to the temporary path, you then print the current directory. Of course, your output will look different from the output in Listing 17-4, and you might even get an exception if your system doesn’t support certain operations. (Recall the warning at the beginning of the chapter: the C++ Standard allows that some environments might not support some or all of the filesystem library.)
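The relationship between absolute and relative can also be checked as a round trip: making a relative name absolute and then making it relative to the current path again should give back the original name. This is a sketch assuming C++17. The filename ccc-demo-notes.txt is hypothetical and assumed not to be a symlink in your working directory, and check_composition is an invented name.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

void check_composition() {
  const fs::path name{ "ccc-demo-notes.txt" };  // hypothetical; need not exist

  // absolute anchors the name at the current path.
  const fs::path abs = fs::absolute(name);
  assert(abs.is_absolute());
  assert(abs.filename() == name);

  // relative defaults its base to current_path(), undoing the step above.
  const fs::path back = fs::relative(abs);
  assert(back.is_relative());
  assert(back == name);

  // temp_directory_path yields an existing directory.
  const bool temp_is_dir = fs::is_directory(fs::temp_directory_path());
  assert(temp_is_dir);
}
```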

Inspecting File Types

You can inspect a file’s attributes given a path by using the following functions:

  • is_block_file(p, [ec]) determines if p is a block file, a special file in some operating systems (for example, block devices in Linux that allow you to transfer randomly accessible data in fixed-size blocks).
  • is_character_file(p, [ec]) determines if p is a character file, a special file in some operating systems (for example, character devices in Linux that allow you to send and receive single characters).
  • is_regular_file(p, [ec]) determines if p is a regular file.
  • is_symlink(p, [ec]) determines if p is a symlink, which is a reference to another file or directory.
  • is_empty(p, [ec]) determines if p is either an empty file or an empty directory.
  • is_directory(p, [ec]) determines if p is a directory.
  • is_fifo(p, [ec]) determines if p is a named pipe, a special kind of interprocess communication mechanism in many operating systems.
  • is_socket(p, [ec]) determines if p is a socket, another special kind of interprocess communication mechanism in many operating systems.
  • is_other(p, [ec]) determines if p is some kind of file other than a regular file, a directory, or a symlink.

Listing 17-5 uses is_directory and is_regular_file to inspect four different paths.

#include <iostream>
#include <filesystem>

using namespace std;

void describe(const filesystem::path& p) { 
  cout << boolalpha << "Path: " << p << endl;
  try {
    cout << "Is directory: " << filesystem::is_directory(p) << endl; 
    cout << "Is regular file: " << filesystem::is_regular_file(p) << endl; 
  } catch (const exception& e) {
    cerr << "Exception: " << e.what() << endl;
  }
}

int main() {
  filesystem::path win_path{ R"(C:/Windows/System32/kernel32.dll)" };
  describe(win_path); 
  win_path.remove_filename();
  describe(win_path); 

  filesystem::path nix_path{ R"(/bin/bash)" };
  describe(nix_path); 
  nix_path.remove_filename();
  describe(nix_path); 
}

Listing 17-5: A program inspecting four iconic Windows and Linux paths with is_directory and is_regular_file.

On a Windows 10 x64 machine, running the program in Listing 17-5 yielded the following output:

Path: "C:/Windows/System32/kernel32.dll" 
Is directory: false 
Is regular file: true 
Path: "C:/Windows/System32/" 
Is directory: true 
Is regular file: false 
Path: "/bin/bash" 
Is directory: false 
Is regular file: false 
Path: "/bin/" 
Is directory: false 
Is regular file: false 

And on an Ubuntu 18.04 x64 machine, running the program in Listing 17-5 yielded the following output:

Path: "C:/Windows/System32/kernel32.dll" 
Is directory: false 
Is regular file: false 
Path: "C:/Windows/System32/" 
Is directory: false 
Is regular file: false 
Path: "/bin/bash" 
Is directory: false 
Is regular file: true 
Path: "/bin/" 
Is directory: true 
Is regular file: false 

First, you define the describe function, which takes a single path. After printing the path, you also print whether the path is a directory or a regular file. Within main, you pass four different paths to describe:

  • C:/Windows/System32/kernel32.dll
  • C:/Windows/System32/
  • /bin/bash
  • /bin/

Note that the result is operating system specific.
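To get the same answers on every operating system, you can create the files you inspect. The following sketch builds a scratch directory under the temporary directory, classifies a few paths, and cleans up afterward. It assumes C++17 and write access to the temporary directory. The directory name ccc_filetype_demo and the helpers kind and check_file_types are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Classifies a path using the inspection functions.
std::string kind(const fs::path& p) {
  if (fs::is_directory(p)) return "directory";
  if (fs::is_regular_file(p)) return "regular file";
  if (fs::exists(p)) return "other";
  return "missing";
}

void check_file_types() {
  // Hypothetical scratch directory; any stale copy is removed first.
  const fs::path dir = fs::temp_directory_path() / "ccc_filetype_demo";
  fs::remove_all(dir);
  fs::create_directory(dir);
  const fs::path file = dir / "data.txt";
  {
    std::ofstream out{ file };
    out << "x";  // one byte, so the file is not empty
  }

  assert(kind(dir) == "directory");
  assert(kind(file) == "regular file");
  const bool file_is_empty = fs::is_empty(file);
  assert(!file_is_empty);

  fs::remove_all(dir);
  // A missing path is simply "not a directory"; no exception is thrown.
  assert(kind(dir) == "missing");
}
```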

Inspecting Files and Directories

You can inspect various filesystem attributes using the following functions:

  • current_path([p], [ec]), which, if p is provided, sets the program’s current path to p; otherwise, it returns the program’s current path.
  • exists(p, [ec]) returns whether a file or directory exists at p.
  • equivalent(p1, p2, [ec]) returns whether p1 and p2 refer to the same file or directory.
  • file_size(p, [ec]) returns the size in bytes of the regular file at p.
  • hard_link_count(p, [ec]) returns the number of hard links for p.
  • last_write_time(p, [t], [ec]), which, if t is provided, sets p’s last modified time to t; otherwise, it returns the last time p was modified. (t is a std::chrono::time_point.)
  • permissions(p, prm, [ec]) sets p’s permissions. prm is of type std::filesystem::perms, which is an enum class modeled after POSIX permission bits. (Refer to [fs.enum.perms].)
  • read_symlink(p, [ec]) returns the target of the symlink p.
  • space(p, [ec]) returns space information about the filesystem p occupies in the form of a std::filesystem::space_info. This POD contains three fields: capacity (the total size), free (the free space), and available (the free space available to a non-privileged process). All are an unsigned integer type, measured in bytes.
  • status(p, [ec]) returns the type and attributes of the file or directory p in the form of a std::filesystem::file_status. This class contains a type method that accepts no parameters and returns an object of type std::filesystem::file_type, which is an enum class that takes values describing a file’s type, such as not_found, regular, directory. The file_status class also offers a permissions method that accepts no parameters and returns an object of type std::filesystem::perms. (Refer to [fs.class.file_status] for details.)
  • symlink_status(p, [ec]) is like status except that it won’t follow symlinks.
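A few of these functions can be exercised together on a file of known size. This sketch assumes C++17 and write access to the temporary directory. The filename ccc_inspect_demo.txt and the function check_inspection are made up for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

void check_inspection() {
  // Hypothetical scratch file holding exactly five bytes.
  const fs::path file = fs::temp_directory_path() / "ccc_inspect_demo.txt";
  {
    std::ofstream out{ file, std::ios::binary | std::ios::trunc };
    out << "hello";
  }

  assert(fs::exists(file));
  assert(fs::file_size(file) == 5);
  assert(fs::status(file).type() == fs::file_type::regular);
  assert(fs::hard_link_count(file) >= 1);

  // A different spelling of the same location is still the same file.
  const fs::path alias = fs::temp_directory_path() / "." / "ccc_inspect_demo.txt";
  const bool same = fs::equivalent(file, alias);
  assert(same);

  fs::remove(file);
  assert(!fs::exists(file));
  assert(fs::status(file).type() == fs::file_type::not_found);
}
```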

If you’re familiar with Unix-like operating systems, you’ve no doubt used the ls (short for “list”) program many times to enumerate files and directories. On DOS-like operating systems (including Windows), you have the analogous dir command. You’ll use several of these functions later in the chapter (in Listing 17-7) to build your own simple listing program.

Now that you know how to inspect files and directories, let’s turn to how you can manipulate the files and directories your paths refer to.

Manipulating Files and Directories

Additionally, the Filesystem library contains a number of functions for manipulating files and directories:

  • copy(p1, p2, [opt], [ec]) copies files or directories from p1 to p2. You can provide a std::filesystem::copy_options opt to customize the behavior of the copy. This enum class can take several values, including none (report an error if the destination already exists), skip_existing (to keep existing), overwrite_existing (to overwrite), and update_existing (to overwrite if p1 is newer). (Refer to [fs.enum.copy.opts] for details.)
  • copy_file(p1, p2, [opt], [ec]) is like copy except it will generate an error if p1 is anything but a regular file.
  • create_directory(p, [ec]) creates the directory p.
  • create_directories(p, [ec]) is like calling create_directory recursively, so if a nested path contains parents that don’t exist, use this form.
  • create_hard_link(tgt, lnk, [ec]) creates a hard link to tgt at lnk.
  • create_symlink(tgt, lnk, [ec]) creates a symlink to tgt at lnk.
  • create_directory_symlink(tgt, lnk, [ec]) should be used for directories instead of create_symlink.
  • remove(p, [ec]) removes a file or empty directory p (without following symlinks).
  • remove_all(p, [ec]) recursively removes a file or directory p (without following symlinks).
  • rename(p1, p2, [ec]) renames p1 to p2.
  • resize_file(p, new_size, [ec]) changes the size of p (if it’s a regular file) to new_size. If this operation grows the file, zeros fill the new space. Otherwise, the operation trims p from the end.
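The copy options and the directory-creating functions are easiest to understand through their return values. The sketch below assumes C++17 and write access to the temporary directory. The directory name ccc_manip_demo, the filenames, and the function check_manipulation are hypothetical. Each call with a side effect is kept outside assert so the code still does its work when assertions are disabled.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

void check_manipulation() {
  // Hypothetical scratch tree under the temporary directory.
  const fs::path root = fs::temp_directory_path() / "ccc_manip_demo";
  fs::remove_all(root);

  const fs::path nested = root / "a" / "b";
  const bool created = fs::create_directories(nested);  // creates root, a, and b
  assert(created);

  const fs::path src = nested / "src.txt";
  {
    std::ofstream out{ src };
    out << "data";
  }
  const fs::path dst = root / "dst.txt";

  const bool copied = fs::copy_file(src, dst);
  assert(copied);
  // skip_existing leaves the destination alone and reports false.
  const bool skipped = fs::copy_file(src, dst, fs::copy_options::skip_existing);
  assert(!skipped);
  const bool replaced = fs::copy_file(src, dst, fs::copy_options::overwrite_existing);
  assert(replaced);

  const fs::path renamed = root / "renamed.txt";
  fs::rename(dst, renamed);
  assert(!fs::exists(dst));
  assert(fs::exists(renamed));

  const std::uintmax_t removed = fs::remove_all(root);  // returns how many entries it deleted
  assert(removed > 0);
  assert(!fs::exists(root));
}
```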

You can create a program that copies, resizes, and deletes a file using several of these methods. Listing 17-6 illustrates this by defining a function that prints file size and modification time. In main, the program creates and modifies two path objects, and it invokes that function after each modification.

#include <chrono>
#include <iostream>
#include <filesystem>

using namespace std;
using namespace std::filesystem;
using namespace std::chrono;

void write_info(const path& p) {
  if (!exists(p)) {
    cout << p << " does not exist." << endl;
    return;
  }
  const auto last_write = last_write_time(p).time_since_epoch();
  const auto in_hours = duration_cast<hours>(last_write).count();
  cout << p << "\t" << in_hours << "\t" << file_size(p) << "\n";
}

int main() {
  const path win_path{ R"(C:/Windows/System32/kernel32.dll)" };
  const auto reamde_path = temp_directory_path() / "REAMDE";
  try {
    write_info(win_path);
    write_info(reamde_path);

    cout << "Copying " << win_path.filename()
         << " to " << reamde_path.filename() << "\n";
    copy_file(win_path, reamde_path);
    write_info(reamde_path);

    cout << "Resizing " << reamde_path.filename() << "\n";
    resize_file(reamde_path, 1024);
    write_info(reamde_path);

    cout << "Removing " << reamde_path.filename() << "\n";
    remove(reamde_path);
    write_info(reamde_path);
  } catch(const exception& e) {
    cerr << "Exception: " << e.what() << endl;
  }
}
-----------------------------------------------------------------------
"C:/Windows/System32/kernel32.dll"      3657767 720632
"C:\\Users\\lospi\\AppData\\Local\\Temp\\REAMDE" does not exist.
Copying "kernel32.dll" to "REAMDE"
"C:\\Users\\lospi\\AppData\\Local\\Temp\\REAMDE"        3657767 720632
Resizing "REAMDE"
"C:\\Users\\lospi\\AppData\\Local\\Temp\\REAMDE"        3659294 1024
Removing "REAMDE"
"C:\\Users\\lospi\\AppData\\Local\\Temp\\REAMDE" does not exist.

Listing 17-6: A program illustrating several methods for interacting with the filesystem. (Output is from a Windows 10 x64 system.)

The write_info function takes a single path parameter. You check whether this path exists, printing an error message and returning immediately if it doesn’t. If the path does exist, you print a message indicating its last modification time (in hours since epoch) and its file size.

Within main, you create a path win_path to kernel32.dll and a path to a nonexistent file called REAMDE in the filesystem’s temporary file directory at reamde_path. (Recall from Table 17-1 that you can use operator/ to append one path to another.) Within a try-catch block, you invoke write_info on both paths. (If you’re using a non-Windows machine, you’ll get different output. You can modify win_path to an existing file on your system to follow along.)

Next, you copy the file at win_path to reamde_path and invoke write_info on it. Notice that, as opposed to earlier, the file at reamde_path exists and it has the same last write time and file size as kernel32.dll.

You then resize the file at reamde_path to 1024 bytes and invoke write_info. Notice that the last write time increased from 3657767 to 3659294 and the file size decreased from 720632 to 1024.

Finally, you remove the file at reamde_path and invoke write_info, which tells you that the file once again doesn’t exist.

NOTE

How filesystems resize files behind the scenes varies by operating system and is beyond the scope of this book. But you can think of how a resize operation might work conceptually as the resize operation on a std::vector. All the data at the end of the file that doesn’t fit into the file’s new size is discarded by the operating system.
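You can watch both directions of a resize by reading the file back after each call. This sketch assumes C++17 and write access to the temporary directory. The filename ccc_resize_demo.bin and the names slurp and check_resize are invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Reads a whole file into a string, byte for byte.
std::string slurp(const fs::path& p) {
  std::ifstream in{ p, std::ios::binary };
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

void check_resize() {
  // Hypothetical scratch file.
  const fs::path file = fs::temp_directory_path() / "ccc_resize_demo.bin";
  {
    std::ofstream out{ file, std::ios::binary | std::ios::trunc };
    out << "hello world";
  }

  fs::resize_file(file, 5);  // shrinking trims from the end
  assert(slurp(file) == "hello");

  fs::resize_file(file, 8);  // growing pads with zero bytes
  assert(slurp(file) == std::string{ "hello" } + std::string(3, '\0'));
  assert(fs::file_size(file) == 8);

  fs::remove(file);
}
```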

Directory Iterators

The Filesystem library provides two classes for iterating over the elements of a directory: std::filesystem::directory_iterator and std::filesystem::recursive_directory_iterator. A directory_iterator won’t enter subdirectories, but the recursive_directory_iterator will. This section introduces the directory_iterator, but the recursive_directory_iterator is a drop-in replacement and supports all the following operations.

Constructing

The default constructor of directory_iterator produces the end iterator. (Recall that an input end iterator indicates when an input range is exhausted.) Another constructor accepts path, which indicates the directory you want to enumerate. Optionally, you can provide std::filesystem::directory_options, which is an enum class bitmask with the following constants:

  • none directs the iterator to skip directory symlinks. If the iterator encounters a permission denial, it produces an error.
  • follow_directory_symlink follows symlinks.
  • skip_permission_denied skips directories if the iterator encounters a permission denial.

Additionally, you can provide a std::error_code, which, like all other Filesystem library functions that accept an error_code, will set this parameter rather than throwing an exception if an error occurs during construction.

Table 17-2 summarizes these options for constructing a directory_iterator. Note that p is path; d, d1, and d2 are directory_iterator objects; op is directory_options; and ec is error_code in the table.

Table 17-2: A Summary of std::filesystem::directory_iterator Operations

| Operation | Notes |
|---|---|
| directory_iterator{} | Constructs the end iterator. |
| directory_iterator{ p, [op], [ec] } | Constructs a directory iterator referring to the directory p. The argument op defaults to none. If provided, ec receives error conditions rather than throwing an exception. |
| directory_iterator{ d }, d1 = d2 | Copy construction/assignment. |
| directory_iterator{ move(d) }, d1 = move(d2) | Move construction/assignment. |
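As a small illustration of the error_code constructor, the following sketch reports whether a directory can be enumerated without throwing. It assumes C++17 and a readable temporary directory. The helper names can_enumerate and check_iterator_construction and the directory name ccc_no_such_dir_demo are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if a directory_iterator can be constructed for dir.
bool can_enumerate(const fs::path& dir) {
  std::error_code ec;
  const fs::directory_iterator it{
    dir, fs::directory_options::skip_permission_denied, ec };
  (void)it;    // only the error state matters here
  return !ec;  // ec is set instead of an exception being thrown
}

void check_iterator_construction() {
  assert(can_enumerate(fs::temp_directory_path()));
  // Assumed not to exist (hypothetical name).
  assert(!can_enumerate(fs::temp_directory_path() / "ccc_no_such_dir_demo"));
}
```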

Directory Entries

The input iterators directory_iterator and recursive_directory_iterator produce a std::filesystem::directory_entry element for each entry they encounter. The directory_entry class stores a path, as well as some attributes about that path exposed as methods. Table 17-3 lists these methods. Note that de is a directory_entry in the table.

Table 17-3: A Summary of std::filesystem::directory_entry Operations

| Operation | Description |
|---|---|
| de.path() | Returns the referenced path. |
| de.exists() | Returns true if the referenced path exists on the filesystem. |
| de.is_block_file() | Returns true if the referenced path is a block device. |
| de.is_character_file() | Returns true if the referenced path is a character device. |
| de.is_directory() | Returns true if the referenced path is a directory. |
| de.is_fifo() | Returns true if the referenced path is a named pipe. |
| de.is_regular_file() | Returns true if the referenced path is a regular file. |
| de.is_socket() | Returns true if the referenced path is a socket. |
| de.is_symlink() | Returns true if the referenced path is a symlink. |
| de.is_other() | Returns true if the referenced path is something else. |
| de.file_size() | Returns the size of the referenced path. |
| de.hard_link_count() | Returns the number of hard links to the referenced path. |
| de.last_write_time([t]) | If t is provided, sets the last modified time of the referenced path; otherwise, it returns the last modified time. |
| de.status(), de.symlink_status() | Returns a std::filesystem::file_status for the referenced path. |

You can employ directory_iterator and several of the operations in Table 17-3 to create a simple directory-listing program, as Listing 17-7 illustrates.

#include <chrono>
#include <iostream>
#include <filesystem>
#include <iomanip>

using namespace std;
using namespace std::filesystem;
using namespace std::chrono;

void describe(const directory_entry& entry) {
  try {
    if (entry.is_directory()) {
      cout << "           *";
    } else {
      cout << setw(12) << entry.file_size();
    }
    const auto lw_time =
      duration_cast<seconds>(entry.last_write_time().time_since_epoch());
    cout << setw(12) << lw_time.count()
      << " " << entry.path().filename().string()
      << "\n";
  } catch (const exception& e) {
    cout << "Error accessing " << entry.path().string()
         << ": " << e.what() << endl;
  }
}

int main(int argc, const char** argv) {
  if (argc != 2) {
    cerr << "Usage: listdir PATH";
    return -1;
  }
  const path sys_path{ argv[1] };
  cout << "Size         Last Write  Name\n";
  cout << "------------ ----------- ------------\n";
  for (const auto& entry : directory_iterator{ sys_path })
    describe(entry);
}
-----------------------------------------------------------------------
> listdir c:\Windows
Size         Last Write  Name
------------ ----------- ------------
           * 13177963504 addins
           * 13171360979 appcompat
--snip--
           * 13173551028 WinSxS
      316640 13167963236 WMSysPr9.prx
       11264 13167963259 write.exe

Listing 17-7: A file- and directory-listing program that uses std::filesystem::directory_iterator to enumerate a given directory. (Output is from a Windows 10 x64 system.)

NOTE

You should modify the program’s name from listdir to whatever value matches your compiler’s output.

You first define a describe function that takes a directory_entry reference. It checks whether the entry is a directory, printing an asterisk for a directory and the size in bytes for a file. Next, you determine the entry’s last modification time in seconds since the epoch and print it along with the entry’s filename. If any exception occurs, you print an error message and return.

Within main, you first check that the user invoked your program with a single argument and return with a negative number if not. Next, you construct a path using the single argument, print some fancy headers for your output, iterate over each entry in the directory, and pass it to describe.

Recursive Directory Iteration

The recursive_directory_iterator is a drop-in replacement for directory_iterator in the sense that it supports all the same operations but will enumerate subdirectories. You can use these iterators in combination to build a program that computes the size and quantity of files and subdirectories for a given directory. Listing 17-8 illustrates how.

#include <iostream>
#include <filesystem>
#include <iomanip>
#include <string_view>
#include <system_error>

using namespace std;
using namespace std::filesystem;

// Running totals for a directory tree.
struct Attributes {
  Attributes& operator+=(const Attributes& other) {
    this->size_bytes += other.size_bytes;
    this->n_directories += other.n_directories;
    this->n_files += other.n_files;
    return *this;
  }
  size_t size_bytes;
  size_t n_directories;
  size_t n_files;
};

void print_line(const Attributes& attributes, string_view path) {
  cout << setw(14) << attributes.size_bytes
       << setw(7) << attributes.n_files
       << setw(7) << attributes.n_directories
       << " " << path << "\n";
}

// Walks everything beneath directory and tallies files, directories, and bytes.
Attributes explore(const directory_entry& directory) {
  Attributes attributes{};
  for(const auto& entry : recursive_directory_iterator{ directory.path() }) {
      if (entry.is_directory()) {
        attributes.n_directories++;
      } else {
        attributes.n_files++;
        attributes.size_bytes += entry.file_size();
      }
  }
  return attributes;
}

int main(int argc, const char** argv) {
  if (argc != 2) {
    cerr << "Usage: treedir PATH";
    return -1;
  }
  const path sys_path{ argv[1] };
  cout << "Size           Files  Dirs   Name\n";
  cout << "-------------- ------ ------ ------------\n";
  Attributes root_attributes{};
  for (const auto& entry : directory_iterator{ sys_path }) {
    try {
      if (entry.is_directory()) {
        const auto attributes = explore(entry);
        root_attributes += attributes;
        print_line(attributes, entry.path().string());
        root_attributes.n_directories++;
      } else {
        root_attributes.n_files++;
        error_code ec;
        root_attributes.size_bytes += entry.file_size(ec);
        if (ec) cerr << "Error reading file size: "
                     << entry.path().string() << endl;
      }
    } catch(const exception&) {
      // Skip subdirectories that can't be fully explored.
    }
  }
  print_line(root_attributes, argv[1]);
}
-----------------------------------------------------------------------
> treedir C:\Windows
Size           Files  Dirs   Name
-------------- ------ ------ ------------
           802      1      0 C:\Windows\addins
       8267330      9      5 C:\Windows\apppatch
--snip--
   11396916465  73383  20480 C:\Windows\WinSxS
   21038460348 110950  26513 C:\Windows

Listing 17-8: A file- and directory-listing program that uses std::filesystem::recursive_directory_iterator to list the number of files and total size of each of a given path’s subdirectories. (Output is from a Windows 10 x64 system.)

NOTE

You should modify the program’s name from treedir to whatever value matches your compiler’s output.

After declaring the Attributes class for storing accounting data, you define a print_line function that presents an Attributes instance in a user-friendly way alongside a path string. Next, you define an explore function that accepts a directory_entry reference and iterates over it recursively. If the resulting entry is a directory, you increment the directory count; otherwise, you increment the file count and total size.

Within main, you check that the program was invoked with exactly one argument (that is, argc equals 2). If not, you return with an error code -1. You employ a (non-recursive) directory_iterator to enumerate the contents of the target path referred to by sys_path. If an entry is a directory, you invoke explore to determine its attributes, which you subsequently print to the console. You also increment the n_directories member of root_attributes to keep account. If the entry isn’t a directory, you add to the n_files and size_bytes members of root_attributes accordingly.

Once you’ve completed iterating over all sys_path subelements, you print root_attributes as the final line. The final line of output in Listing 17-8, for example, shows that this particular Windows directory contains 110,950 files occupying 21,038,460,348 bytes (about 21 GB) and 26,513 subdirectories.

fstream Interoperation

You can construct file streams (basic_ifstream, basic_ofstream, or basic_fstream) using std::filesystem::path or std::filesystem::directory_entry in addition to string types.
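As a small sketch of this interoperation, the helpers below open an ofstream and an ifstream directly from a path. The names write_text and read_first_line are hypothetical. The second helper passes entry.path() explicitly rather than the directory_entry itself, which is the more conservative choice across implementations.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: replaces the contents of the file at p with text.
bool write_text(const fs::path& p, const std::string& text) {
  std::ofstream out{ p }; // constructed from a path, not a string
  out << text;
  out.flush();
  return out.good();
}

// Hypothetical helper: returns the first line of the file an entry refers to,
// or an empty string if the file can't be opened or is empty.
std::string read_first_line(const fs::directory_entry& entry) {
  std::ifstream in{ entry.path() };
  std::string line;
  std::getline(in, line);
  return line;
}
```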

For example, you can iterate over a directory and construct an ifstream to read each file you encounter. Listing 17-9 illustrates how to check for the magic MZ bytes at the beginning of each Windows portable executable file (a .sys, a .dll, a .exe, and so on) and report any file that violates this rule.

#include <iostream>
#include <fstream>
#include <filesystem>
#include <string>
#include <unordered_set>

using namespace std;
using namespace std::filesystem;

int main(int argc, const char** argv) {
  if (argc != 2) {
    cerr << "Usage: pecheck PATH";
    return -1;
  }
  // Extensions commonly associated with portable executable files.
  const unordered_set<string> pe_extensions{
    ".acm", ".ax",  ".cpl", ".dll", ".drv",
    ".efi", ".exe", ".mui", ".ocx", ".scr",
    ".sys", ".tsp"
  };
  const path sys_path{ argv[1] };
  cout << "Searching " << sys_path << " recursively.\n";
  size_t n_searched{};
  auto iterator = recursive_directory_iterator{ sys_path,
                                 directory_options::skip_permission_denied };
  for (const auto& entry : iterator) {
    try {
      if (!entry.is_regular_file()) continue;
      const auto& extension = entry.path().extension().string();
      const auto is_pe = pe_extensions.find(extension) != pe_extensions.end();
      if (!is_pe) continue;
      // Open the file directly from the entry's path.
      ifstream file{ entry.path() };
      char first{}, second{};
      if (file) file >> first;
      if (file) file >> second;
      if (first != 'M' || second != 'Z')
        cout << "Invalid PE found: " << entry.path().string() << "\n";
      ++n_searched;
    } catch(const exception& e) {
      cerr << "Error reading " << entry.path().string()
           << ": " << e.what() << endl;
    }
  }
  cout << "Searched " << n_searched << " PEs for magic bytes." << endl;
}
----------------------------------------------------------------------
> listing_17_9.exe c:\Windows\System32
Searching "c:\\Windows\\System32" recursively.
Searched 8231 PEs for magic bytes.

Listing 17-9: Searching the Windows System32 directory for Windows portable executable files

In main, you check that the user passed exactly one argument and return an error code if not. You construct an unordered_set containing all the extensions associated with portable executable files, which you’ll use to check file extensions. You use a recursive_directory_iterator with the directory_options::skip_permission_denied option to enumerate all the files in the specified path. You iterate over each entry, skipping over anything that’s not a regular file, and you determine whether the entry is a portable executable by attempting to find its extension in pe_extensions. If the entry doesn’t have such an extension, you skip over the file.

To open the file, you simply pass the path of the entry into the constructor of ifstream. You then use the resulting input file stream to read the first two bytes of the file into first and second. If these first two characters aren’t MZ, you print a message to the console. Either way, you increment a counter called n_searched. After exhausting the directory iterator, you print a message indicating n_searched to the user before returning from main.
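One caveat with the approach above is that formatted extraction with operator>> skips leading whitespace, so it isn’t a strict byte-for-byte read. As a hedged variant (not the listing’s method), the sketch below opens the file in binary mode and uses an unformatted read. The function name has_mz_magic is hypothetical.

```cpp
#include <array>
#include <filesystem>
#include <fstream>
#include <ios>

namespace fs = std::filesystem;

// Hypothetical helper: returns true only if the file at p can be opened and
// its first two bytes are exactly 'M' and 'Z'.
bool has_mz_magic(const fs::path& p) {
  std::ifstream file{ p, std::ios::binary };
  std::array<char, 2> magic{};
  // Unformatted read: no whitespace skipping, no newline translation.
  file.read(magic.data(), static_cast<std::streamsize>(magic.size()));
  return file.gcount() == 2 && magic[0] == 'M' && magic[1] == 'Z';
}
```

With this variant, a file that begins with a space followed by MZ is reported as invalid, whereas the formatted reads would skip the space and accept it.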

Summary

In this chapter, you learned about the stdlib filesystem facilities, including paths, files, directories, and error handling. These facilities enable you to write cross-platform code that interacts with the files in your environment. The chapter culminated with some important operations, directory iterators, and interoperation with file streams.

EXERCISES

17-1. Implement a program that takes two arguments: a path and an extension. The program should search the given path recursively and print any file with the specified extension.

17-2. Improve the program in Listing 17-8 so it can take an optional second argument. If the first argument begins with a hyphen (-), the program reads all contiguous letters immediately following the hyphen and parses each letter as an option. The second argument then becomes the path to search. If the list of options contains an R, perform a recursive directory search. Otherwise, don’t use a recursive directory iterator.

17-3. Refer to the documentation for the dir or ls command and implement as many of the options as possible in your new, improved version of Listing 17-8.

FURTHER READING

  • Windows NT File System Internals: A Developer’s Guide by Rajeev Nagar (O’Reilly, 1997)
  • The Boost C++ Libraries, 2nd Edition, by Boris Schäling (XML Press, 2014)
  • The Linux Programming Interface: A Linux and UNIX System Programming Handbook by Michael Kerrisk (No Starch Press, 2010)