Chapter 14. Directory Operations

Linux, like many other operating systems, uses directories to organize files. Directories (which are just special types of files that contain lists of file names) contain files, as well as other directories, allowing a file hierarchy to be built. All Linux systems have a root directory, known as /, through which (directly or indirectly) you access all the files on the system.

The Current Working Directory

Finding the Current Working Directory

The getcwd() function allows a process to find the name of its current directory relative to the system’s root directory.

#include <unistd.h>
char * getcwd(char * buf, size_t size);

The first parameter, buf, points to a buffer that is filled in with the path to the current working directory. If the current path is larger than size - 1 bytes long (the -1 allows the path to be '\0' terminated), the function returns an error of ERANGE. If the call succeeds, buf is returned; NULL is returned if an error occurs. Although most modern shells maintain a PWD environment variable that contains the path to the current directory, it does not necessarily have the same value a call to getcwd() would return. PWD often includes path elements that are symbolic links to other directories, but getcwd() always returns a path free from symbolic links.

If the current path is unknown (such as at program startup), the buffer that holds the current directory must be dynamically allocated because the current path may be arbitrarily large. Code that properly reads the current path looks like this:

char * buf;
int len = 50;

buf = malloc(len);
while (!getcwd(buf, len) && errno == ERANGE) {
    len += 50;
    buf = realloc(buf, len);
}

Linux, along with many other Unix systems, provides a useful extension to the POSIX getcwd() specification. If buf is NULL, the function allocates a buffer large enough to contain the current path through the normal malloc() mechanism. Although the caller must take care to properly free() the result, using this extension can make code look much cleaner than using a loop, as was shown in the earlier example.
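As a minimal sketch of the extension just described, the following helper wraps getcwd(NULL, 0). The function name current_dir_alloc() is made up for this illustration, and the code assumes a C library (such as glibc) that supports the NULL-buffer extension.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return a malloc()ed copy of the current directory, or NULL on error.
   Relies on the getcwd(NULL, 0) extension; the caller must free() the
   result. (current_dir_alloc is a hypothetical helper name.) */
char * current_dir_alloc(void) {
    char * path = getcwd(NULL, 0);

    if (!path)
        perror("getcwd");

    return path;
}
```

A caller would print or store the returned string and then pass it to free(), with no resizing loop required.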

BSD’s getwd() function is a commonly used alternative to getcwd(), but it suffers from certain defects that led to the development of getcwd().

#include <unistd.h>
char * getwd(char * buf);

Like getcwd(), getwd() fills in buf with the current path, although the function has no idea how large buf is. getwd() never writes more than PATH_MAX (defined through <limits.h>) to the buffer, which allows programs to avoid buffer overruns, but does not give the program any mechanism for finding the correct path if it is longer than PATH_MAX bytes![1] This function is supported by Linux only for legacy applications and should not be used by new applications. Instead, use the correct and more portable getcwd() function.

If the current directory path is displayed to users, it is normally a good idea to check the PWD environment variable. If it is set, it contains the path the user believes is in use (which may contain symbolic links for some of the elements in the path), which is generally what the user would like an application to display. To make this easier, Linux’s C library provides the get_current_dir_name() function, which behaves roughly like this (the library version also checks that PWD still names the current directory before trusting it):

char * get_current_dir_name() {
    char * env = getenv("PWD");

    if (env)
        return strdup(env);
    else
        return getcwd(NULL, 0);
}

The . and .. Special Files

Every directory, including the root directory, includes two special files, called . and .., which are useful in some circumstances. The first, ., is the same as the current directory. This means that the file names somefile and ./somefile are equivalent.

The other special file name, .., is the current directory’s parent directory. For the root directory, .. refers to the root directory itself (because the root directory has no parent).

Both . and .. can be used wherever a directory name can be used. It is common to see symbolic links refer to paths such as ../include/mylib, and file names like /./foo/.././bar/./fubar/../../usr/bin/less are perfectly legal (although admittedly convoluted).[2]

Changing the Current Directory

There are two system calls that change a process’s current directory: chdir() and fchdir().

#include <unistd.h>
int chdir(const char * pathname);
int fchdir(int fd);

The first of these takes the name of a directory as its sole argument; the second takes a file descriptor that is an open directory. In either case, the specified directory is made the current working directory. These functions can fail if their arguments specify a file that is not a directory or if the process does not have proper permissions.
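One common use of fchdir() is returning to a starting directory after working somewhere else. The sketch below shows that pattern; visit_and_return() is a hypothetical name, and the work done inside the target directory is left as a comment.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Change to dir, then return to the starting directory.
   Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int visit_and_return(const char * dir) {
    int saved;
    int rc = 0;

    /* remember where we are by holding the directory open */
    saved = open(".", O_RDONLY);
    if (saved < 0) {
        fprintf(stderr, "open .: %s\n", strerror(errno));
        return -1;
    }

    if (chdir(dir)) {
        fprintf(stderr, "chdir %s: %s\n", dir, strerror(errno));
        close(saved);
        return -1;
    }

    /* ... work relative to dir here ... */

    /* fchdir() gets us back even if the old path has been renamed */
    if (fchdir(saved)) {
        fprintf(stderr, "fchdir: %s\n", strerror(errno));
        rc = -1;
    }

    close(saved);
    return rc;
}
```

Holding a file descriptor rather than saving the path from getcwd() means the return trip does not depend on the original path still being valid.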

Changing the Root Directory

Although the system has a single root directory, the meaning of / may be changed for each process on the system. This is usually done to prevent suspect processes (such as ftp daemons handling requests from untrusted users) from accessing the complete file system. For example, if /home/ftp is specified as the process’s root directory, running chdir("/") will make the process’s current directory /home/ftp, and getcwd() will return / to keep things consistent for the process in question. To ensure security, if the process tries to chdir("/.."), it is left in its / directory (the system-wide /home/ftp directory), just as normal processes that chdir("/..") are left in the system-wide root directory. A process may easily change its current root directory through the chroot() system call. The process’s new root directory path is interpreted with the current root directory in place, so chroot("/") does not modify the process’s current root directory.

#include <unistd.h>
int chroot(const char * path);

Here, the path specifies the new root directory for the process. This system call does not change the current working directory of the process, however. The process can still access files in the current directory, as well as relative to it (that is, ../../directory/file). Most processes that chroot() themselves immediately change their current working directory to be inside the new root hierarchy with chdir("/"), or something similar, and not doing so would be a security problem in some applications.
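The chroot()-then-chdir("/") sequence described above might be wrapped as follows. This is only a sketch: enter_jail() is a hypothetical name, the _DEFAULT_SOURCE definition is a glibc-specific assumption for getting chroot() declared, and the call normally succeeds only for a process running with root privileges.

```c
#define _DEFAULT_SOURCE     /* assumption: asks glibc to declare chroot() */
#include <stdio.h>
#include <unistd.h>

/* Confine the calling process to newroot. Returns 0 on success and -1
   on failure. (enter_jail is a hypothetical helper name.) */
int enter_jail(const char * newroot) {
    if (chroot(newroot)) {
        perror("chroot");
        return -1;
    }

    /* chroot() leaves the working directory alone, so move inside the
       new hierarchy before doing anything else */
    if (chdir("/")) {
        perror("chdir");
        return -1;
    }

    return 0;
}
```

A daemon would typically call something like this early, before it handles any untrusted input, and treat a failure as fatal.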

Creating and Removing Directories

Creating New Directories

Creating new directories is straightforward.

#include <sys/stat.h>
#include <sys/types.h>
int mkdir(const char * dirname, mode_t mode);

The path specified by dirname is created as a new directory with permissions mode (which is modified by the process’s umask). If dirname specifies an existing file or if any of the elements of dirname are not a directory or a symbolic link to a directory, the system call fails.

Removing Directories

Removing a directory is almost exactly the same as removing a file; only the name of the system call is different.

#include <unistd.h>
int rmdir(const char * pathname);

For rmdir() to succeed, the directory must be empty (other than the omnipresent . and .. entries); otherwise, ENOTEMPTY is returned.
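The following sketch puts mkdir() and rmdir() together. Both function names (ensure_dir() and remove_dir()) are invented for this example, and treating an already existing directory as success is a design choice of the sketch, not a property of mkdir().

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create path as a directory unless it already is one. Returns 0 if
   the directory exists on return, -1 otherwise. (Hypothetical helper.) */
int ensure_dir(const char * path) {
    struct stat sb;
    int saved;

    if (!mkdir(path, 0755))         /* the umask may clear some bits */
        return 0;

    /* an existing directory is fine; anything else is an error */
    saved = errno;
    if (saved == EEXIST && !stat(path, &sb) && S_ISDIR(sb.st_mode))
        return 0;

    errno = saved;
    perror("mkdir");
    return -1;
}

/* Remove the directory path, reporting a non-empty directory
   specially. Returns 0 on success, -1 on failure. */
int remove_dir(const char * path) {
    if (!rmdir(path))
        return 0;

    /* POSIX allows either errno value for a non-empty directory */
    if (errno == ENOTEMPTY || errno == EEXIST)
        fprintf(stderr, "%s: directory not empty\n", path);
    else
        perror("rmdir");

    return -1;
}
```

Calling remove_dir() on a directory that still contains a subdirectory fails until the subdirectory itself has been removed.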

Reading a Directory’s Contents

It is common for a program to need a list of the files contained in a directory. Linux provides a set of functions that allow a directory to be handled as an abstract entity to avoid forcing programs to depend on the exact format of directories employed by a file system. Opening and closing directories is straightforward.

#include <dirent.h>
DIR * opendir(const char * pathname);
int closedir(DIR * dir);

opendir() returns a pointer to a DIR data type, which is abstract (just like stdio’s FILE structure) and should not be manipulated outside the C library. As directories may be opened only for reading, it is not necessary to specify what mode the directory is opened with. opendir() succeeds only if the directory exists—it cannot be used to create new directories (use mkdir() for that). Closing a directory can fail only if the dir parameter is invalid.

Once the directory has been opened, directory entries are read sequentially until the end of the directory is reached.

readdir() returns the name of the next file in the directory. Directories are not ordered in any way, so do not assume that the contents of the directory are sorted. If you need a sorted list of files, you must sort the file names yourself. The readdir() function is defined like this:

#include <dirent.h>
struct dirent * readdir(DIR * dir);

A pointer to a struct dirent is returned to the caller. Although struct dirent contains multiple members, the only one that is portable is d_name, which holds the file name of the directory entry. The rest of struct dirent’s members are system specific. The only interesting one of these is d_ino, which contains the inode number of the file.

The only tricky part of this is determining when an error has occurred. Unfortunately, readdir() returns NULL if an error occurs or if there are no more entries in the directory. To differentiate between the two cases, you must check errno. This task is made more difficult by readdir() not changing errno unless an error occurs, which means errno must be set to a known value (normally, 0) before calling readdir() to allow proper error checking. Here is a simple program that writes the names of the files in the current directory to stdout:

/* dircontents.c */

#include <errno.h>
#include <dirent.h>
#include <stdio.h>

int main(void) {
    DIR * dir;
    struct dirent * ent;

    /* "." is the current directory */
    if (!(dir = opendir("."))) {
        perror("opendir");
        return 1;
    }

    /* set errno to 0, so we can tell when readdir() fails */
    errno = 0;
    while ((ent = readdir(dir))) {
        puts(ent->d_name);
        /* reset errno, as puts() could modify it */
        errno = 0;
    }

    if (errno) {
        perror("readdir");
        return 1;
    }

    closedir(dir);

    return 0;
}

Starting Over

If you need to reread the contents of a directory that has already been opened with opendir(), rewinddir() resets the DIR structure so that the next call to readdir() returns the first file in the directory.

#include <dirent.h>
void rewinddir(DIR * dir);
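A typical reason to rewind is a two-pass algorithm. The sketch below counts the entries on the first pass and numbers them on the second; count_entries() and print_numbered() are hypothetical names chosen for this example.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Count the entries remaining in dir, or return -1 on a read error. */
long count_entries(DIR * dir) {
    long n = 0;

    errno = 0;
    while (readdir(dir))
        n++;

    return errno ? -1 : n;
}

/* Print every name in path as "i/total name". Returns the number of
   names printed, or -1 on error. (Hypothetical helper.) */
long print_numbered(const char * path) {
    DIR * dir;
    struct dirent * ent;
    long total, i = 0;

    if (!(dir = opendir(path)))
        return -1;

    total = count_entries(dir);
    if (total < 0) {
        closedir(dir);
        return -1;
    }

    /* start over at the first entry for the second pass */
    rewinddir(dir);

    errno = 0;
    while ((ent = readdir(dir))) {
        printf("%ld/%ld %s\n", ++i, total, ent->d_name);
        errno = 0;      /* printf() could modify it */
    }

    if (errno)
        i = -1;

    closedir(dir);
    return i;
}
```

If files are created or removed between the two passes, the totals may disagree, so code that depends on an exact count should not rely on this approach.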

File Name Globbing

Most Linux users take it for granted that running ls *.c does not tell them all about the file in the current directory called *.c. Instead, they expect to see a list of all the file names in the current directory whose names end with .c. This file-name expansion from *.c to ladsh.c dircontents.c (for example) is normally handled by the shell, which globs all the parameters to programs it runs. Programs that help users manipulate files often need to glob file names, as well. There are two common ways to glob file names from inside a program.

Use a Subprocess

The oldest method is simply to run a shell as a child process and let it glob the file names for you. The standard popen()[3] function makes this easy: just run the command ls *.c through popen() and read the results. Although this may seem crude, it solves the globbing problem with very little code and is highly portable (which is why applications like Perl use this approach).

Here is a program that globs all its arguments and displays all of the matches:

/* popenglob.c */

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, const char ** argv) {
    char buf[1024];
    FILE * ls;
    int result;
    int i;

    strcpy(buf, "ls ");

    for (i = 1; i < argc; i++) {
        /* leave room for the argument, a space, and the '\0' */
        if (strlen(buf) + strlen(argv[i]) + 2 > sizeof(buf)) {
            fprintf(stderr, "arguments too long\n");
            return 1;
        }
        strcat(buf, argv[i]);
        strcat(buf, " ");
    }

    ls = popen(buf, "r");
    if (!ls) {
        perror("popen");
        return 1;
    }

    while (fgets(buf, sizeof(buf), ls))
        printf("%s", buf);

    result = pclose(ls);

    if (!WIFEXITED(result)) return 1;

    return 0;
}

Internal Globbing

If you need to glob many file names, running many subshells through popen() may be too inefficient. The glob() function allows you to glob file names without running any subprocesses, at the price of increased complexity and reduced portability. Although glob() is specified by POSIX.2, many Unix variants do not yet support it.

#include <glob.h>
int glob(const char *pattern, int flags,
         int errfunc(const char * epath, int eerrno), glob_t * pglob);

The first parameter, pattern, specifies the pattern that file names must match. This function understands the *, ?, and [] globbing operators, and optionally also the {, }, and ~ globbing operators, and treats them identically to the standard shells. The final parameter is a pointer to a structure that gets filled in with the results of the glob. The structure is defined like this:

#include <glob.h>

typedef struct {
    size_t gl_pathc;   /* number of paths in gl_pathv */
    char **gl_pathv;   /* list of gl_pathc matched pathnames */
    size_t gl_offs;    /* slots to reserve in gl_pathv for GLOB_DOOFFS */
} glob_t;

The flags are of one or more of the following values bitwise OR’ed together:

GLOB_ERR

glob() returns as soon as an error occurs (if the function cannot read the contents of a directory due to permissions problems, for example) rather than skipping the problem and continuing.

GLOB_MARK

If the pattern matches a directory name, that directory name will have a / appended to it on return.

GLOB_NOSORT

Normally, the returned pathnames are sorted alphabetically. If this flag is specified, they are not sorted.

GLOB_DOOFFS

If set, the first pglob->gl_offs strings in the returned list of pathnames are left empty. This allows glob() to be used while building a set of arguments that will be passed directly to execv().

GLOB_NOCHECK

If no file names match the pattern, the pattern itself is returned as the sole match (usually, no matches are returned). In either case, if the pattern does not contain any globbing operators, the pattern is returned.

GLOB_APPEND

pglob is assumed to be a valid result from a previous call to glob(), and any results from this invocation are appended to the results from the previous call. This makes it easy to glob multiple patterns.

GLOB_NOESCAPE

Usually, if a backslash (\) precedes a globbing operator, the operator is taken as a normal character instead of being assigned its special meaning. For example, the pattern a\* usually matches only a file named a*. If GLOB_NOESCAPE is specified, \ loses this special meaning, and a\* matches any file name that begins with the characters a\. In this case, a\ and a\cd would be matched, but arachnid would not because it does not contain a \.

GLOB_PERIOD

Most shells do not allow glob operators to match files whose names begin with a . (try ls * in your home directory and compare it with ls -a.). The glob() function generally behaves this way, but GLOB_PERIOD allows the globbing operators to match a leading . character. GLOB_PERIOD is not defined by POSIX.

GLOB_BRACE

Many shells (following the lead of csh) expand sequences with braces as alternatives; for example, the pattern “{a,b}” is expanded to “a b”, and the pattern “a{,b,c}” to “a ab ac”. The GLOB_BRACE flag enables this behavior. GLOB_BRACE is not defined by POSIX.

GLOB_NOMAGIC

Acts just like GLOB_NOCHECK except that it appends the pattern to the list of results only if it contains no special characters. GLOB_NOMAGIC is not defined by POSIX.

GLOB_TILDE

Turns on tilde expansion, in which ~ or the substring ~/ is expanded to the path to the current user’s home directory, and ~user is expanded to the path to user’s home directory. GLOB_TILDE is not defined by POSIX.

GLOB_ONLYDIR

Matches only directories, not any other type of file. GLOB_ONLYDIR is not defined by POSIX.
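To make the GLOB_DOOFFS description more concrete, here is a sketch that reserves two slots at the front of gl_pathv and fills them with a command name and an option, producing a vector that can be handed to one of the exec functions. The helper names build_ls_argv() and run_ls() are invented for this example, and GLOB_NOCHECK is added so that a pattern with no matches is passed through unchanged.

```c
#include <glob.h>
#include <stdio.h>
#include <unistd.h>

/* Fill g with the argument vector for "ls -l <matches of pattern>".
   Returns 0 on success or glob()'s nonzero return code on failure.
   (Hypothetical helper.) */
int build_ls_argv(const char * pattern, glob_t * g) {
    int rc;

    g->gl_offs = 2;             /* reserve slots for "ls" and "-l" */
    rc = glob(pattern, GLOB_DOOFFS | GLOB_NOCHECK, NULL, g);
    if (rc)
        return rc;

    /* the reserved slots come first; the matches follow them and the
       list ends with a NULL pointer */
    g->gl_pathv[0] = "ls";
    g->gl_pathv[1] = "-l";

    return 0;
}

/* Replace the current process with ls; returns only on failure. */
int run_ls(const char * pattern) {
    glob_t g;

    if (build_ls_argv(pattern, &g))
        return -1;

    execvp(g.gl_pathv[0], g.gl_pathv);

    /* execvp() returns only if it failed */
    perror("execvp");
    globfree(&g);
    return -1;
}
```

The sketch uses execvp() rather than execv() so that ls is found through PATH; with execv() the first slot would need to hold a full path instead.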

Often, glob() encounters directories to which the process does not have access, which causes an error. Although the error may need to be handled in some manner, if the glob() returns the error (thanks to GLOB_ERR), there is no way to restart the globbing operation where the previous globbing operation encountered the error. As this makes it difficult both to handle errors that occur during a glob() and to complete the glob, glob() allows the errors to be reported to a function of the caller’s choice, which is specified in the third parameter to glob(). It should be prototyped as follows:

int globerr(const char * pathname, int globerrno);

The function is passed the pathname that caused the error and the errno value that resulted from one of opendir(), readdir(), or stat(). If the error function returns nonzero, glob() returns with an error. Otherwise, the globbing operation is continued.

The results of the glob are stored in the glob_t structure referenced by pglob. It includes the following members, which allow the caller to find the matched file names:

gl_pathc

The number of pathnames that matched the pattern

gl_pathv

Array of pathnames that matched the pattern

After the returned glob_t has been used, the memory it uses should be freed by passing it to globfree().

void globfree(glob_t * pglob);

glob() returns GLOB_NOSPACE if it ran out of memory, GLOB_ABORTED (which older glibc headers also call GLOB_ABEND) if a read error caused the function to fail, GLOB_NOMATCH if no matches were found, or 0 if the function succeeded and found matches.
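A caller usually wants to treat "no matches" differently from a real failure. The sketch below shows one way to sort out the return codes; count_matches() is a hypothetical name, and reporting errors on stderr is simply a choice made for the example.

```c
#include <glob.h>
#include <stdio.h>

/* Return the number of paths matching pattern, or -1 on failure.
   (Hypothetical helper.) */
long count_matches(const char * pattern) {
    glob_t g;
    long n;

    switch (glob(pattern, 0, NULL, &g)) {
    case 0:
        n = (long) g.gl_pathc;
        break;
    case GLOB_NOMATCH:
        /* not an error for our purposes */
        n = 0;
        break;
    case GLOB_NOSPACE:
        fprintf(stderr, "glob: out of memory\n");
        n = -1;
        break;
    default:
        /* GLOB_ABORTED, or any implementation-specific code */
        fprintf(stderr, "glob: read error\n");
        n = -1;
        break;
    }

    /* glob() fills in the structure even when it fails */
    globfree(&g);
    return n;
}
```

A pattern without globbing operators, such as /, still counts as one match as long as the file exists.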

To help illustrate glob(), here is a program called globit, which accepts multiple patterns as arguments, globs them all, and displays the result. If an error occurs, a message describing the error is displayed, but the glob operation is continued.

/* globit.c */

#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* This is the error function we pass to glob(). It just displays
   an error and returns success, which allows the glob() to
   continue. */
int errfn(const char * pathname, int theerr) {
    fprintf(stderr, "error accessing %s: %s\n", pathname,
            strerror(theerr));

    /* We want the glob operation to continue, so return 0 */
    return 0;
}

int main(int argc, const char ** argv) {
    glob_t result;
    int i, rc, flags;

    if (argc < 2) {
        printf("at least one argument must be given\n");
        return 1;
    }

    /* set flags to 0; it gets changed to GLOB_APPEND later */
    flags = 0;

    /* iterate over all of the command-line arguments */
    for (i = 1; i < argc; i++) {
        rc = glob(argv[i], flags, errfn, &result);

        /* a read-error return can't happen thanks to errfn */
        if (rc == GLOB_NOSPACE) {
            fprintf(stderr, "out of space during glob operation\n");
            return 1;
        }

        flags |= GLOB_APPEND;
    }

    if (!result.gl_pathc) {
        fprintf(stderr, "no matches\n");
        rc = 1;
    } else {
        for (i = 0; i < result.gl_pathc; i++)
            puts(result.gl_pathv[i]);
        rc = 0;
    }

    /* the glob structure uses memory from the malloc() pool, which
       needs to be freed */
    globfree(&result);

    return rc;
}

Adding Directories and Globbing to ladsh

The evolution of ladsh continues here by adding four new features to ladsh3.c.

  1. The cd built-in, to change directories.

  2. The pwd built-in, to display the current directory.

  3. File name globbing.

  4. Some new messages are displayed to take advantage of strsignal(). These changes are discussed on page 222 of Chapter 12.

Adding cd and pwd

Adding the built-in commands is a straightforward application of chdir() and getcwd(). The code fits into runProgram() right where all the other built-in commands are handled. Here is how the built-in command-handling section looks in ladsh3.c:

    if (!strcmp(newJob.progs[0].argv[0], "exit")) {
        /* this should return a real exit code */
        exit(0);
    } else if (!strcmp(newJob.progs[0].argv[0], "pwd")) {
        len = 50;
        buf = malloc(len);
        while (!getcwd(buf, len) && errno == ERANGE) {
            len += 50;
            buf = realloc(buf, len);
        }
        printf("%s\n", buf);
        free(buf);
        return 0;
    } else if (!strcmp(newJob.progs[0].argv[0], "cd")) {
        /* with no argument, cd goes to the home directory */
        if (!newJob.progs[0].argv[1])
            newdir = getenv("HOME");
        else
            newdir = newJob.progs[0].argv[1];
        if (chdir(newdir))
            printf("failed to change current directory: %s\n",
                    strerror(errno));
        return 0;
    } else if (!strcmp(newJob.progs[0].argv[0], "jobs")) {
        for (job = jobList->head; job; job = job->next)
            printf(JOB_STATUS_FORMAT, job->jobId, "Running",
                    job->text);
        return 0;
    }

Adding File Name Globbing

File name globbing, in which the shell expands the *, [], and ? characters into matching file names, is a bit tricky to implement because of the various quoting methods. The first modification is to build up each argument as a string suitable for passing to glob(). If a globbing character is quoted by a shell quoting sequence (enclosed in double quotes, for example), then the globbing character is prefixed by a \ to prevent glob() from expanding it. Although this sounds tricky, it is easy to do.

Two parts of parseCommand()’s command parsing need to be slightly modified. The " and ' sequences are handled near the top of the loop, which splits a command string into arguments. If we are in the middle of a quoted string and we encounter a globbing character, we quote the globbing character with a \ while parsing it, which looks like this:

        } else if (quote) {
            if (*src == '\\') {
                src++;
                if (!*src) {
                    fprintf(stderr,
                            "character expected after \\\n");
                    freeJob(job);
                    return 1;
                }

                /* in shell, "\'" should yield \' */
                if (*src != quote) *buf++ = '\\';
            } else if (*src == '*' || *src == '?' || *src == '[' ||
                       *src == ']')
                *buf++ = '\\';
            *buf++ = *src;
        } else if (isspace(*src)) {

Only the middle else if and the assignment statement in its body were added to the code. Similar code needs to be added to the handling of characters that occur outside quoted strings. This case is handled at the end of parseCommand()’s main loop. Here is the modified code:

        case '\\':
            src++;
            if (!*src) {
                freeJob(job);
                fprintf(stderr, "character expected after \\\n");
                return 1;
            }
            if (*src == '*' || *src == '[' || *src == ']'
                            || *src == '?')
                *buf++ = '\\';
            /* fallthrough */
        default:
            *buf++ = *src;

The same code was added here to quote the globbing characters.
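The quoting idea can also be shown outside the parser. The following stand-alone sketch escapes a whole string for glob(); it is not part of ladsh, the name escape_glob() is hypothetical, and unlike the parser code it also escapes the backslash itself so that the result always matches the original string literally.

```c
#include <stddef.h>
#include <string.h>

/* Copy src to dst, prefixing *, ?, [, ], and \ with a backslash so
   that glob() treats them literally. Returns 0 on success or -1 if
   dst (of dstlen bytes) is too small. (Hypothetical helper.) */
int escape_glob(const char * src, char * dst, size_t dstlen) {
    size_t used = 0;

    for (; *src; src++) {
        int special = strchr("*?[]\\", *src) != NULL;

        /* room for an optional backslash, the character, and '\0' */
        if (used + special + 2 > dstlen)
            return -1;

        if (special)
            dst[used++] = '\\';
        dst[used++] = *src;
    }

    /* an empty src still needs one byte for the terminator */
    if (used + 1 > dstlen)
        return -1;

    dst[used] = '\0';
    return 0;
}
```

For example, the input a*b? becomes a\*b\?, which glob() (without GLOB_NOESCAPE) matches only against a file literally named a*b?.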

Those two sequences of code ensure that each argument may be passed to glob() without finding unintended matches. Now we add a function, globLastArgument(), which globs the most recently found argument for a child program and replaces it with whatever matches it finds.

To help ease the memory management, a glob_t called globResult, which is used to hold the results of all glob operations, has been added to struct childProgram. We also added an integer, freeGlob, which is nonzero if freeJob() should free the globResult contained in the structure. Here is the complete definition for struct childProgram in ladsh3.c:

struct childProgram {
    pid_t pid;                  /* 0 if exited */
    char ** argv;               /* program name and arguments */
    int numRedirections;        /* elements in redirection array */
    struct redirectionSpecifier * redirections;  /* I/O redirs */
    glob_t globResult;          /* result of parameter globbing */
    int freeGlob;               /* should we free globResult? */
};

The first time globLastArgument() is run for a command string (when argc for the current child is 1), it initializes globResult. For the rest of the arguments, it takes advantage of GLOB_APPEND to add new matches to the end of the existing matches. This prevents us from having to allocate our own memory for globbing because our single glob_t is automatically expanded as necessary.

If globLastArgument() does not find any matches, the quoting characters are removed from the argument. Otherwise, all the new matches are copied into the list of arguments being constructed for the child program.

Here is the complete implementation of globLastArgument(). All the tricky parts are related to memory management; the actual globbing is similar to the globit.c sample program presented earlier in this chapter.

void globLastArgument(struct childProgram * prog, int * argcPtr,
                        int * argcAllocedPtr) {
    int argc = *argcPtr;
    int argcAlloced = *argcAllocedPtr;
    int rc;
    int flags;
    int i;
    char * src, * dst;

    if (argc > 1) {        /* prog->globResult already initialized */
        flags = GLOB_APPEND;
        i = prog->globResult.gl_pathc;
    } else {
        prog->freeGlob = 1;
        flags = 0;
        i = 0;
    }

    rc = glob(prog->argv[argc - 1], flags, NULL, &prog->globResult);
    if (rc == GLOB_NOSPACE) {
        fprintf(stderr, "out of space during glob operation\n");
        return;
    } else if (rc == GLOB_NOMATCH ||
               (!rc && (prog->globResult.gl_pathc - i) == 1 &&
                !strcmp(prog->argv[argc - 1],
                        prog->globResult.gl_pathv[i]))) {
        /* we need to remove whatever \ quoting is still present */
        src = dst = prog->argv[argc - 1];
        while (*src) {
            if (*src != '\\') *dst++ = *src;
            src++;
        }
        *dst = '\0';
    } else if (!rc) {
        argcAlloced += (prog->globResult.gl_pathc - i);
        prog->argv = realloc(prog->argv,
                             argcAlloced * sizeof(*prog->argv));
        memcpy(prog->argv + (argc - 1),
               prog->globResult.gl_pathv + i,
               sizeof(*(prog->argv)) *
                      (prog->globResult.gl_pathc - i));
        argc += (prog->globResult.gl_pathc - i - 1);
    }

    *argcAllocedPtr = argcAlloced;
    *argcPtr = argc;
}

The final changes are the calls to globLastArgument() that need to be made once a new argument has been parsed. The calls are added in two places: when white space is found outside a quoted string and when the entire command string has been parsed. Both of the calls look like this:

globLastArgument(prog, &argc, &argvAlloced);

The complete source code for ladsh3.c is available on the LAD Web site at http://ladweb.net/lad/src/.

Walking File System Trees

There are two functions available that make it easy for applications to look at all of the files in a directory, including files in subdirectories. Recursing through all entries in a tree (such as a file system) is often called walking the structure, and is the reason these two functions are called ftw() and nftw(), which stand for file tree walk and new file tree walk. As the names suggest, nftw() is an enhanced version of ftw().

Using ftw()

#include <ftw.h>

int ftw(const char *dir, ftwFunctionPointer callback, int depth);

The ftw() function starts in the directory dir and calls the function pointed to by callback for every file it finds in that directory and any subdirectories. The function is called for all file types, including symbolic links and directories. The implementation of ftw() opens every directory it finds (using up a file descriptor), and, to improve performance, it does not close them until it is finished reading all of the entries in that directory. This means that it uses as many file descriptors as there are levels of subdirectories. To prevent the application from running out of file descriptors, the depth parameter limits how many file descriptors ftw() will leave open at one time. If this limit is hit, performance slows down, because directories need to be opened and closed repeatedly.

The callback points to a function defined as follows:

int ftwCallbackFunction(const char *file, const struct stat * sb,
                        int flag);

This function is called once for every file in the directory tree, and the first parameter, file, gives the name of the file beginning with the dir passed to ftw(). For example, if the dir was ".", one file name might be "./.bashrc". If "/etc" was used instead, a file name would be "/etc/hosts".

The second argument to the callback, sb, points to a struct stat that resulted from a stat() on the file.[4] The flag argument provides information on the file, and takes one of the following values:

FTW_F

The file is not a symbolic link or a directory.

FTW_D

The file is a directory or a symbolic link pointing to a directory.

FTW_DNR

The file is a directory that the application does not have permission to read (so it cannot be traversed).

FTW_SL

The file is a symbolic link.

FTW_NS

The file is an object on which stat() failed. An example of this would be a file in a directory that the application has read permission for (allowing the application to get a list of the files in that directory) but not execute permission for (preventing the stat() call from succeeding on the files in that directory).

When a file is a symbolic link, ftw() attempts to follow that link and return information on the file it points to (ftw() traverses the same directory multiple times if there are multiple symbolic links to that directory, although it is smart enough to avoid loops). If it is a broken link, however, it is not defined whether FTW_SL or FTW_NS is returned. This is a good reason to use nftw() instead.

If the callback function returns zero, the directory traversal continues. If a nonzero value is returned, the file tree walk ends and that value is returned by ftw(). If the traversal completes normally, ftw() returns zero, and it returns -1 if an error occurs as part of ftw().
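As a small illustration of the callback interface, the sketch below counts the regular files and directories in a tree. The names countCallback() and count_tree() are hypothetical, the limit of 16 open directories is an arbitrary choice, and the results are kept in static variables because ftw() offers no way to pass extra data to the callback.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static long numFiles, numDirs;

/* Called by ftw() once for every file in the tree. */
static int countCallback(const char * file, const struct stat * sb,
                         int flag) {
    (void) sb;      /* unused in this example */

    switch (flag) {
    case FTW_D:
        numDirs++;
        break;
    case FTW_F:
        numFiles++;
        break;
    case FTW_DNR:
        fprintf(stderr, "cannot read directory %s\n", file);
        break;
    case FTW_NS:
        fprintf(stderr, "cannot stat %s\n", file);
        break;
    }

    /* zero tells ftw() to keep going */
    return 0;
}

/* Count the files and directories at and below dir. Returns 0 on
   success, -1 on failure. (Hypothetical helper; not reentrant.) */
int count_tree(const char * dir, long * files, long * dirs) {
    numFiles = numDirs = 0;

    if (ftw(dir, countCallback, 16))
        return -1;

    *files = numFiles;
    *dirs = numDirs;
    return 0;
}
```

The starting directory is itself reported to the callback, so an empty directory yields a directory count of one.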

File Tree Walks with nftw()

The new version of ftw(), nftw() addresses the symbolic link ambiguities inherent in ftw() and includes some additional features. To have nftw() properly defined by the header files, the application needs to define _XOPEN_SOURCE to be 500 or greater. Here is the prototype for nftw():

#define _XOPEN_SOURCE 600
#include <ftw.h>

int nftw(const char *dir, ftwFunctionPointer callback, int depth,
         int flags);

int nftwCallbackFunction(const char *file, const struct stat * sb,
                         int flag, struct FTW * ftwInfo);

Comparing nftw() to ftw() shows a single new parameter, flags. It can be one or more of the following flags bitwise OR’ed together:

FTW_CHDIR

nftw() does not normally change the current directory of the program. When FTW_CHDIR is specified, nftw() changes the current directory to whatever directory is being currently read. In other words, when the callback is invoked, the file name it is passed is always in the current directory.

FTW_DEPTH

By default, nftw() reports a directory name before the files in the directory. This flag causes that order to be reversed, with the contents of a directory being reported before the directory itself.[5]

FTW_MOUNT

This flag prevents nftw() from crossing a file system boundary during the traversal. If you are not sure what a file system is, refer to [Sobell, 2002].

FTW_PHYS

nftw() reports symbolic links themselves rather than following them. A side effect of this is that the callback gets the result of an lstat() call rather than a stat() call.

[5] This flag causes nftw() to provide a depth first search. There is no similar flag for a breadth first search.
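As a rough sketch of how these flags combine, the following walks a tree with FTW_DEPTH, FTW_PHYS, and FTW_MOUNT OR'ed together and remembers what the final callback invocation looked like. The helpers countPostOrder(), rootCameLast(), and record() are hypothetical names used only for this illustration.

```c
#define _XOPEN_SOURCE 600
#include <ftw.h>

static long entries;     /* number of callback invocations */
static int lastFlag;     /* flag passed to the most recent invocation */
static int lastLevel;    /* level passed to the most recent invocation */

static int record(const char * file, const struct stat * sb, int flag,
                  struct FTW * ftwInfo) {
    (void) file;
    (void) sb;
    entries++;
    lastFlag = flag;
    lastLevel = ftwInfo->level;
    return 0;
}

/* Walks dir, reporting directory contents before the directories
   themselves, without following symbolic links or crossing into
   other file systems. Returns the number of entries reported, or
   -1 if nftw() did not complete successfully. */
long countPostOrder(const char * dir) {
    entries = 0;
    lastFlag = lastLevel = -1;
    if (nftw(dir, record, 16, FTW_DEPTH | FTW_PHYS | FTW_MOUNT))
        return -1;
    return entries;
}

/* With FTW_DEPTH, the starting directory should be the last item
   reported, and it should arrive with the FTW_DP flag. */
int rootCameLast(void) {
    return lastLevel == 0 && lastFlag == FTW_DP;
}
```

This ordering is what makes FTW_DEPTH suitable for operations such as recursive removal, where a directory can be dealt with only after everything inside it.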

The flag argument to the callback can take on two new values for nftw(), in addition to the values we have already mentioned for ftw().

FTW_DP

The item is a directory whose contents have already been reported (this can occur only if FTW_DEPTH was specified).

FTW_SLN

The item is a symbolic link that points to a nonexistent file (it is a broken link). This can occur only if FTW_PHYS was not specified; if it was, FTW_SL would be passed.

These extra flag values make the behavior of nftw() for symbolic links well specified. If FTW_PHYS is used, all symbolic links return FTW_SL. Without FTW_PHYS, broken links yield FTW_SLN and other symbolic links give the same result as the target of the link.
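The difference is easy to observe with a dangling link. The sketch below is only an illustration: demoBrokenLink(), noteDangling(), and the /tmp scratch directory are assumptions made for the example, and _XOPEN_SOURCE is raised to 700 solely so that mkdtemp() is declared.

```c
#define _XOPEN_SOURCE 700   /* nftw() needs 500; mkdtemp() needs 700 */
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int danglingFlag;    /* flag reported for the link, or -1 */

static int noteDangling(const char * file, const struct stat * sb,
                        int flag, struct FTW * ftwInfo) {
    (void) sb;
    if (!strcmp(file + ftwInfo->base, "dangling"))
        danglingFlag = flag;
    return 0;
}

/* Creates a scratch directory holding one broken symbolic link and
   walks it twice, once following links and once with FTW_PHYS.
   Stores the flag seen for the link in each mode. Returns 0 on
   success and -1 if the scratch files could not be created. */
int demoBrokenLink(int * logical, int * physical) {
    char dir[] = "/tmp/nftwdemoXXXXXX";
    char link[64];

    if (!mkdtemp(dir)) return -1;
    snprintf(link, sizeof(link), "%s/dangling", dir);
    if (symlink("no-such-target", link)) {
        rmdir(dir);
        return -1;
    }

    danglingFlag = -1;
    nftw(dir, noteDangling, 4, 0);
    *logical = danglingFlag;        /* expected: FTW_SLN */

    danglingFlag = -1;
    nftw(dir, noteDangling, 4, FTW_PHYS);
    *physical = danglingFlag;       /* expected: FTW_SL */

    unlink(link);
    rmdir(dir);
    return 0;
}
```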

The callback for nftw() takes one more argument, ftwInfo. It is a pointer to a struct FTW, which is defined as:

#define _XOPEN_SOURCE 600
#include <ftw.h>

struct FTW {
    int base;
    int level;
};

The base is the offset of the file name within the full path passed to the callback. For example, if the full path passed was /usr/bin/ls, the base would be 9, and file + ftwInfo->base would give the file name ls. The level is the depth of the file relative to the directory in which the walk began; that starting directory is itself at level 0. If ls was found in an nftw() that began in /usr, the level would be 2. If the search began in /usr/bin, the level would instead be 1.
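One way to put both members to work is an indented tree listing, sketched below. printTree() and show() are hypothetical names for this illustration; the two-space indent per level is an arbitrary choice.

```c
#define _XOPEN_SOURCE 600
#include <ftw.h>
#include <stdio.h>

static int deepest;   /* largest level seen during the walk */

/* Prints only the final path component (file + base), indented by
   two spaces for each level below the starting directory. */
static int show(const char * file, const struct stat * sb, int flag,
                struct FTW * ftwInfo) {
    (void) sb;
    (void) flag;
    if (ftwInfo->level > deepest)
        deepest = ftwInfo->level;
    printf("%*s%s\n", ftwInfo->level * 2, "", file + ftwInfo->base);
    return 0;
}

/* Returns the deepest level reached, or -1 if nftw() failed. */
int printTree(const char * dir) {
    deepest = 0;
    if (nftw(dir, show, 16, FTW_PHYS))
        return -1;
    return deepest;
}
```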

Implementing find

The find command searches one or more directory trees for files that match certain characteristics. Here is a simple implementation of find that is built around nftw(). It uses fnmatch()[6] to implement the -name switch, and illustrates many of the flags nftw() understands.

/* find.c */

#define _XOPEN_SOURCE 600

#include <fnmatch.h>
#include <ftw.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char * name = NULL;
int minDepth = 0, maxDepth = INT_MAX;

/* nftw() callback: prints each path that passes the depth and name
   filters */
int find(const char * file, const struct stat * sb, int flags,
         struct FTW * f) {
    if (f->level < minDepth) return 0;
    if (f->level > maxDepth) return 0;
    if (name && fnmatch(name, file + f->base, FNM_PERIOD)) return 0;

    if (flags == FTW_DNR) {
        fprintf(stderr, "find: %s: permission denied\n", file);
    } else {
        printf("%s\n", file);
    }

    return 0;
}

int main(int argc, char ** argv) {
    int flags = FTW_PHYS;
    int i;
    int problem = 0;
    int tmp;
    int rc;
    char * chptr;

    /* look for first command line parameter (which must occur after
       the list of paths) */
    i = 1;
    while (i < argc && *argv[i] != '-') i++;

    /* handle the command line options */
    while (i < argc && !problem) {
        if (!strcmp(argv[i], "-name")) {
            i++;
            if (i == argc)
                problem = 1;
            else
                name = argv[i++];
        } else if (!strcmp(argv[i], "-depth")) {
            i++;
            flags |= FTW_DEPTH;
        } else if (!strcmp(argv[i], "-mount") ||
                   !strcmp(argv[i], "-xdev")) {
            i++;
            flags |= FTW_MOUNT;
        } else if (!strcmp(argv[i], "-mindepth") ||
                   !strcmp(argv[i], "-maxdepth")) {
            i++;
            if (i == argc)
                problem = 1;
            else {
                tmp = strtoul(argv[i++], &chptr, 10);
                if (*chptr)
                    problem = 1;
                else if (!strcmp(argv[i - 2], "-mindepth"))
                    minDepth = tmp;
                else
                    maxDepth = tmp;
            }
        } else {
            /* unrecognized option; without this the loop would
               never advance past it */
            problem = 1;
        }
    }

    if (problem) {
        fprintf(stderr, "usage: find <paths> [-name <str>] "
                "[-mindepth <int>] [-maxdepth <int>]\n");
        fprintf(stderr, "       [-xdev] [-depth]\n");
        return 1;
    }

    if (argc == 1 || *argv[1] == '-') {
        argv[1] = ".";
        argc = 2;
    }

    /* flags still holds FTW_PHYS plus whatever the options added */
    rc = 0;
    i = 1;
    while (i < argc && *argv[i] != '-')
        rc |= nftw(argv[i++], find, 100, flags);

    return rc;
}
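The -name test in the callback depends on two details: the pattern is applied only to the final path component, and FNM_PERIOD requires a leading period in a file name to be matched explicitly. A small sketch isolates that behavior; nameMatches() is a hypothetical helper that finds the final component with strrchr() instead of using the base member of struct FTW.

```c
#include <fnmatch.h>
#include <string.h>

/* Returns 1 if the final component of path matches the shell-style
   pattern, and 0 otherwise. With FNM_PERIOD, a wildcard never
   matches a leading '.' in the file name. */
int nameMatches(const char * pattern, const char * path) {
    const char * base = strrchr(path, '/');

    base = base ? base + 1 : path;
    return fnmatch(pattern, base, FNM_PERIOD) == 0;
}
```

Under these rules, nameMatches("*", "/home/user/.profile") is false while nameMatches(".*", "/home/user/.profile") is true, which mirrors how a shell treats hidden files.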

Directory Change Notification

Applications may wish to know when the contents of a directory change. File managers, for example, may list the contents of a directory in a window and would like to keep that window up-to-date when other programs modify that directory. While the application could recheck the directory at regular intervals, Linux can send a program a signal when a directory is modified, allowing timely updates without the overhead (and delays) of polling.

The fcntl() system call is used to register for notifications of updates to a directory. Recall from Chapter 11 that this system call takes three arguments: the first is the file descriptor we are interested in, the second is the command we want fcntl() to perform, and the final one is an integer value specific to that command. For directory notifications, the first argument is a file descriptor referring to the directory of interest. This is the only case in which a directory should be opened through the normal open() system call instead of opendir(). The command to register for notifications is F_NOTIFY, and the last argument specifies what types of events cause a signal to be sent. It should be one or more of the following flags logically OR’ed together:

DN_ACCESS

A file in the directory has been read from.

DN_ATTRIB

The ownership or permissions of a file in the directory were changed.

DN_CREATE

A new file was created in the directory (this includes new hard links being made to existing files).

DN_DELETE

A file was removed from the directory.

DN_MODIFY

A file in the directory was modified (truncation is a type of modification).

DN_RENAME

A file in the directory was renamed.

To cancel event notification, call fcntl() with a command of F_NOTIFY and a final argument of zero.

Normally, directory notification is automatically canceled after a single signal has been sent. To keep directory notification in effect, the final argument to fcntl() should be OR’ed with DN_MULTISHOT, which causes signals to be sent for all appropriate events until the notification is canceled.
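Registration and cancellation might be wrapped as in the following sketch. watchDirectory() and unwatchDirectory() are hypothetical helpers, not part of any library, and the choice to always add DN_MULTISHOT is an assumption made for the example.

```c
#define _GNU_SOURCE   /* F_NOTIFY and the DN_* flags are Linux-specific */
#include <fcntl.h>
#include <unistd.h>

/* Opens path with open() and asks for a signal on each of the given
   DN_* events until the watch is canceled. Returns the directory's
   file descriptor, or -1 on failure.

   The default signal is SIGIO, whose default action terminates the
   process, so a handler should be installed (or the signal blocked)
   before this is called. */
int watchDirectory(const char * path, unsigned long events) {
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    if (fcntl(fd, F_NOTIFY, events | DN_MULTISHOT) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}

/* Cancels notification by passing an event mask of zero, then
   closes the descriptor. Returns the result of the fcntl() call. */
int unwatchDirectory(int fd) {
    int rc = fcntl(fd, F_NOTIFY, 0);

    close(fd);
    return rc;
}
```

A caller would use something like watchDirectory("/some/dir", DN_CREATE | DN_DELETE) and keep the returned descriptor open for as long as notifications are wanted.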

By default, SIGIO is sent for directory notification. If the application wants to use a different signal for this (it may want to use different signals for different directories, for example), the F_SETSIG command of fcntl() can be used, with the final argument to fcntl() specifying the signal to send for that directory. If F_SETSIG is used (even if the signal specified is SIGIO), the kernel also places the file descriptor for the directory in the si_fd member of the siginfo_t argument of the signal handler,[7] letting the application know which of the directories being monitored was updated.[8] If multiple directories are being monitored and a single signal is used for all of them, it is critical to use a real-time signal to make sure no events are lost.
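Reduced to its essentials, the F_SETSIG mechanism looks roughly like the sketch below. It is only an illustration under several assumptions: demoSiFd() and onChange() are hypothetical names, the scratch directory lives under /tmp, and the kernel is assumed to support directory notification on that file system. The function watches a fresh directory, creates a file in it, and checks that the handler saw the watched descriptor in si_fd.

```c
#define _GNU_SOURCE   /* F_NOTIFY, F_SETSIG, and DN_* are Linux-specific */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t notifiedFd = -1;

/* Registered with SA_SIGINFO so that si_fd is available. */
static void onChange(int sig, siginfo_t * siginfo, void * context) {
    (void) sig;
    (void) context;
    notifiedFd = siginfo->si_fd;
}

/* Returns 1 if the handler reported the watched descriptor, 0 if it
   did not, and -1 if the demonstration could not be set up. */
int demoSiFd(void) {
    char dir[] = "/tmp/dnotifydemoXXXXXX";
    char file[64];
    struct sigaction act;
    int fd, tmp;
    int result = -1;

    if (!mkdtemp(dir)) return -1;
    snprintf(file, sizeof(file), "%s/new", dir);

    act.sa_sigaction = onChange;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);

    if (!sigaction(SIGRTMIN, &act, NULL) &&
            (fd = open(dir, O_RDONLY)) >= 0) {
        /* choose the signal first so SIGIO is never sent */
        if (!fcntl(fd, F_SETSIG, SIGRTMIN) &&
                !fcntl(fd, F_NOTIFY, DN_CREATE)) {
            notifiedFd = -1;
            tmp = open(file, O_CREAT | O_WRONLY, 0600);
            if (tmp >= 0) {
                close(tmp);
                result = (notifiedFd == fd);
                unlink(file);
            }
        }
        close(fd);
    }

    rmdir(dir);
    return result;
}
```

Because the signal here is not blocked, the handler can run at any point after the watch is registered; a program that must not miss or mishandle changes needs signal blocking as well.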

Here is an example program that uses directory change notification to display messages when files are removed from or added to any of the directories it is monitoring (any number of which may be specified on the command line). It registers to receive SIGRTMIN whenever a directory changes, and uses si_fd to discover which directory has been modified. To prevent any race conditions, the program uses queued signals and signal blocking. The only time a signal can be delivered is when sigsuspend() is called on line 203. This ensures that any changes to a directory that occur while that directory is being scanned force a rescan of that directory; otherwise those changes might not be noticed. Using queued signals lets any number of directory changes occur while the program is working; those signals are delivered as soon as sigsuspend() is called again, making sure nothing is forgotten.

  1: /* dirchange.c */
  2:
  3: #define _GNU_SOURCE
  4: #include <dirent.h>
  5: #include <errno.h>
  6: #include <fcntl.h>
  7: #include <signal.h>
  8: #include <stdio.h>
  9: #include <stdlib.h>
 10: #include <string.h>
 11: #include <unistd.h>
 12:
 13: /* We use a linked list to store the names of all of the files in
 14:    each directory. The exists field is used for housekeeping work
 15:    when we check for changes. */
 16: struct fileInfo {
 17:     char * name;
 18:     struct fileInfo * next;
 19:     int exists;
 20: };
 21:
 22: /* This is a global array. It matches file descriptors to directory
 23:    paths, stores a list of files in the directory, and gives a place
 24:    for the signal handler to indicate that the directory needs to be
 25:    rescanned. The last entry has a NULL path to mark the end of the
 26:    array. */
 27:
 28: struct directoryInfo {
 29:     char * path;
 30:     int fd;
 31:     int changed;
 32:     struct fileInfo * contents;
 33: } * directoryList;
 34:
 35: /* This will never return an empty list; all directories contain at
 36:    least "." and ".." */
 37: int buildDirectoryList(char * path, struct fileInfo ** listPtr) {
 38:     DIR * dir;
 39:     struct dirent * ent;
 40:     struct fileInfo * list = NULL;
 41:
 42:     if (!(dir = opendir(path))) {
 43:         perror("opendir");
 44:         return 1;
 45:     }
 46:     errno = 0;   /* readdir() reports errors only through errno */
 47:     while ((ent = readdir(dir))) {
 48:         if (!list) {
 49:             list = malloc(sizeof(*list));
 50:             *listPtr = list;
 51:         } else {
 52:             list->next = malloc(sizeof(*list));
 53:             list = list->next;
 54:         }
 55:
 56:         list->name = strdup(ent->d_name);
 57:         list->next = NULL;   /* keep the list terminated */
 58:     }
 59:
 60:     if (errno) {
 61:         perror("readdir");
 62:         closedir(dir);
 63:         return 1;
 64:     }
 65:
 66:     closedir(dir);
 67:
 68:     return 0;
 69: }
 70:
 71: /* Scans the directory path looking for changes from the previous
 72:    contents, as specified by the *listPtr. The linked list is
 73:    updated to reflect the new contents, and messages are printed
 74:    specifying what changes have occurred. */
 75: int updateDirectoryList(char * path, struct fileInfo ** listPtr) {
 76:     DIR * dir;
 77:     struct dirent * ent;
 78:     struct fileInfo * list = *listPtr;
 79:     struct fileInfo * file, * prev;
 80:
 81:     if (!(dir = opendir(path))) {
 82:         perror("opendir");
 83:         return 1;
 84:     }
 85:
 86:     for (file = list; file; file = file->next)
 87:         file->exists = 0;
 88:
 89:     while ((ent = readdir(dir))) {
 90:         file = list;
 91:         while (file && strcmp(file->name, ent->d_name))
 92:             file = file->next;
 93:
 94:         if (!file) {
 95:             /* new file, add it to the list */
 96:             printf("%s created in %s\n", ent->d_name, path);
 97:             file = malloc(sizeof(*file));
 98:             file->name = strdup(ent->d_name);
 99:             file->next = list;
100:             file->exists = 1;
101:             list = file;
102:         } else {
103:             file->exists = 1;
104:         }
105:     }
106:
107:     closedir(dir);
108:
109:     file = list;
110:     prev = NULL;
111:     while (file) {
112:         if (!file->exists) {
113:             printf("%s removed from %s\n", file->name, path);
114:             free(file->name);
115:
116:             if (!prev) {
117:                 /* removing the head node */
118:                 list = file->next;
119:                 free(file);
120:                 file = list;
121:             } else {
122:                 prev->next = file->next;
123:                 free(file);
124:                 file = prev->next;
125:             }
126:         } else {
127:             prev = file;
128:             file = file->next;
129:         }
130:     }
131:
132:     *listPtr = list;
133:
134:     return 0;
135: }
136:
137: void handler(int sig, siginfo_t * siginfo, void * context) {
138:     int i;
139:
140:     for (i = 0; directoryList[i].path; i++) {
141:         if (directoryList[i].fd == siginfo->si_fd) {
142:             directoryList[i].changed = 1;
143:             return;
144:         }
145:     }
146: }
147:
148: int main(int argc, char ** argv) {
149:     struct sigaction act;
150:     sigset_t mask, sigio;
151:     int i;
152:
153:     /* Block SIGRTMIN. We don't want to receive this anywhere but
154:        inside of the sigsuspend() system call. */
155:     sigemptyset(&sigio);
156:     sigaddset(&sigio, SIGRTMIN);
157:     sigprocmask(SIG_BLOCK, &sigio, &mask);
158:
159:     act.sa_sigaction = handler;
160:     act.sa_flags = SA_SIGINFO;
161:     sigemptyset(&act.sa_mask);
162:     sigaction(SIGRTMIN, &act, NULL);
163:
164:     if (!argv[1]) {
165:         /* no arguments given, fix up argc/argv to look like "." was
166:            given as the only argument */
167:         argv[1] = ".";
168:         argc++;
169:     }
170:
171:     /* each argument is a directory to watch */
172:     directoryList = calloc(argc, sizeof(*directoryList));
173:     directoryList[argc - 1].path = NULL;
174:
175:     for (i = 0; i < (argc - 1); i++) {
176:         directoryList[i].path = argv[i + 1];
177:         if ((directoryList[i].fd =
178:                 open(directoryList[i].path, O_RDONLY)) < 0) {
179:             fprintf(stderr, "failed to open %s: %s\n",
180:                     directoryList[i].path, strerror(errno));
181:             return 1;
182:         }
183:
184:         /* monitor the directory before scanning it the first time,
185:            ensuring we catch files created by someone else while
186:            we're scanning it. If someone does happen to change it,
187:            a signal will be generated (and blocked until we're
188:            ready for it) */
189:         if (fcntl(directoryList[i].fd, F_NOTIFY, DN_DELETE |
190:                        DN_CREATE | DN_RENAME | DN_MULTISHOT)) {
191:             perror("fcntl F_NOTIFY");
192:             return 1;
193:         }
194:
195:         fcntl(directoryList[i].fd, F_SETSIG, SIGRTMIN);
196:
197:         if (buildDirectoryList(directoryList[i].path,
198:                                &directoryList[i].contents))
199:             return 1;
200:     }
201:
202:     while (1) {
203:         sigsuspend(&mask);
204:
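205:         for (i = 0; directoryList[i].path; i++) {
206:             if (!directoryList[i].changed) continue;
207:
208:             /* clear the flag before rescanning; the signal is
209:                blocked here, so the handler cannot set it again
210:                until the next sigsuspend() */
211:             directoryList[i].changed = 0;
212:             if (updateDirectoryList(directoryList[i].path,
213:                                     &directoryList[i].contents))
214:                 return 1;
215:         }
216:     }
217:
218:     return 0;
219: }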


[1] That is right; PATH_MAX is not an actual limit. POSIX considers it indeterminate, which is morally equivalent to “do not use this.”

[2] For the curious, that pathname is equivalent to the much simpler /usr/bin/less.

[3] See page 132 for information on popen().

[4] ftw() has to stat every file it finds to determine whether or not it is a directory, and passing this information to the callback prevents the callback from having to stat the files again in many cases.

[6] The fnmatch() function is described on pages 555-556.

[7] This is exactly the same as the method used for file leases, discussed in Chapter 13.

[8] The signal handler still must be registered with the SA_SIGINFO flag for the file descriptor to reach the signal handler properly.
