Perhaps you've never thought about files as an IPC mechanism before, but they shoulder the lion's share of interprocess communication, far more than all other means combined. When one process deposits its precious data in a file and another process later retrieves that data, those processes have communicated. Files offer something unique among all forms of IPC covered here: like a papyrus scroll recovered after millennia buried in the desert, a file can be unearthed and read long after its writer's personal end.[6] Factor in persistence along with comparative ease of use, and it's no wonder that files remain popular.
Using files to transmit information from the dead past to some unknown future poses few surprises. You write the file to some permanent medium like a disk, and that's about it. (You might tell a web server where to find it, if it contains HTML.) The interesting challenge is when all parties are still alive and trying to communicate with one another. Without some agreement about whose turn it is to have their say, reliable communication is impossible; agreement may be achieved through file locking, which is covered in the next section. In the section after that, we discuss the special relationship that exists between a parent process and its children, which allows related parties to exchange information through inherited access to the same files.
Files certainly have their limitations when it comes to things like remote access, synchronization, reliability, and session management. Other sections of the chapter cover various IPC mechanisms invented to address such limitations.
In a multitasking environment, you need to be careful not to collide with other processes that are trying to use the same file you're using. As long as all processes are just reading, there's no problem, but as soon as even one process needs to write to the file, complete chaos ensues unless some sort of locking mechanism acts as traffic cop.
Never use the mere existence of a filename (that is, -e $file) as a locking indication, because a race condition exists between the test for existence of that filename and whatever you plan to do with it (like create it, open it, or unlink it). See Section 23.2.2 in Chapter 23 for more about this.
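Rather than testing with -e and then acting, you can let the kernel do the test and the creation as one indivisible step. The sketch below is an illustration added here, not the approach described in Chapter 23. It uses sysopen with O_CREAT | O_EXCL, which fails with EEXIST if the file is already there, so of any number of racing callers exactly one wins. The helper name create_exclusively is hypothetical, and the scratch directory comes from the core File::Temp module.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Racy (don't do this): another process can create $file between
# the -e test and the open.
#   unless (-e $file) { open(my $fh, ">", $file) or die ...; ... }

# Hypothetical helper. O_CREAT|O_EXCL makes the kernel test for
# existence and create the file in one atomic step.
sub create_exclusively {
    my ($path) = @_;
    my $fh;
    return $fh if sysopen($fh, $path, O_WRONLY | O_CREAT | O_EXCL);
    return undef if $! == EEXIST;        # somebody else got there first
    die "can't create $path: $!";        # some other failure
}

my $dir    = tempdir(CLEANUP => 1);
my $first  = create_exclusively("$dir/claim");   # creates the file
my $second = create_exclusively("$dir/claim");   # finds it already there
print "first call created it:  ", (defined $first  ? "yes" : "no"), "\n";
print "second call created it: ", (defined $second ? "yes" : "no"), "\n";
```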
Perl's portable locking interface is the flock(HANDLE, FLAGS) function, described in Chapter 29. Perl maximizes portability by using only the simplest and most widespread locking features found on the broadest range of platforms. These semantics are simple enough that they can be emulated on most systems, including those that don't support the traditional syscall of that name, such as System V or Windows NT. (If you're running a Microsoft system earlier than NT, though, you're probably out of luck, as you would be if you're running a system from Apple before Mac OS X.)
Locks come in two varieties: shared (the LOCK_SH flag) and exclusive (the LOCK_EX flag). Despite the suggestive sound of "exclusive", processes aren't required to obey locks on files. That is, flock only implements advisory locking, which means that locking a file does not stop another process from reading or even writing the file. Requesting an exclusive lock is just a way for a process to let the operating system suspend it until all current lockers, whether shared or exclusive, are finished with it. Similarly, when a process asks for a shared lock, it is just suspending itself until there is no exclusive locker. Only when all parties use the file-locking mechanism can a contended file be accessed safely.
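To see that the lock really is only advisory, here's a small sketch that plays all the parties from one script. It assumes a system where flock maps onto the native flock(2) syscall (Linux and the BSDs, for instance). On such systems, two separate opens of the same file hold independent locks even inside one process. Where Perl emulates flock with fcntl, locks belong to the whole process, and this single-process demonstration won't show the conflict. The sketch also borrows the LOCK_NB flag, which makes flock return false instead of waiting.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "can't close $path: $!";

# Party A takes an exclusive lock through its own open of the file.
open(my $locker, ">>", $path) or die "can't open $path: $!";
flock($locker, LOCK_EX)       or die "can't lock $path: $!";

# Party B never asks for a lock, and its write goes through anyway.
open(my $rude, ">>", $path)   or die "can't open $path: $!";
my $wrote = print {$rude} "written without asking\n";
close($rude)                  or die "can't close $path: $!";

# Party C plays by the rules: a nonblocking request is refused
# because A still holds the exclusive lock.
open(my $polite, "<", $path)  or die "can't open $path: $!";
my $granted = flock($polite, LOCK_SH | LOCK_NB);

open(my $check, "<", $path)   or die "can't open $path: $!";
my $contents = do { local $/; <$check> };

print "unlocked write went through: ", ($wrote   ? "yes" : "no"), "\n";
print "polite shared lock granted:  ", ($granted ? "yes" : "no"), "\n";
print "file now holds: $contents";
```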
By default, flock is a blocking operation. That is, if you can't get the lock you want immediately, the operating system suspends your process until you can. Here's how to get a blocking, shared lock, typically used for reading a file:
use Fcntl qw(:DEFAULT :flock);
open(FH, "< filename")  or die "can't open filename: $!";
flock(FH, LOCK_SH)      or die "can't lock filename: $!";
# now read from FH
You can try to acquire a lock in a nonblocking fashion by including the LOCK_NB flag in the flock request. If you can't be given the lock right away, the function fails and immediately returns false. Here's an example:
flock(FH, LOCK_SH | LOCK_NB) or die "can't lock filename: $!";
You may wish to do something besides raising an exception as we did here, but you certainly don't dare do any I/O on the file. If you are refused a lock, you shouldn't access the file until you can get the lock. Who knows what scrambled state you might find the file in? The main purpose of the nonblocking mode is to let you go off and do something else while you wait. But it can also be useful for producing friendlier interactions by warning users that it might take a while to get the lock, so they don't feel abandoned:
use Fcntl qw(:DEFAULT :flock);
open(FH, "< filename")  or die "can't open filename: $!";
unless (flock(FH, LOCK_SH | LOCK_NB)) {
    local $| = 1;
    print "Waiting for lock on filename...";
    flock(FH, LOCK_SH)  or die "can't lock filename: $!";
    print "got it.\n"
}
# now read from FH
Some people will be tempted to put that nonblocking lock into a loop. The main problem with nonblocking mode is that, by the time you get back to checking again, someone else may have grabbed the lock because you abandoned your place in line. Sometimes you just have to get in line and wait. If you're lucky there will be some magazines to read.
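If you'd rather wait in line but not forever, a common idiom (along the lines of one shown in the perlipc manpage) wraps the ordinary blocking flock in an alarm instead of polling. The sketch below assumes a Unix-like system with native flock(2) semantics. It relies on Perl's default deferred signals, under which the blocking call is interrupted and the die from the ALRM handler is caught by the eval. The function name flock_with_timeout is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

# Hypothetical helper: block for the lock, but give up after $seconds.
# Returns 1 if the lock was acquired, 0 on timeout.
sub flock_with_timeout {
    my ($fh, $mode, $seconds) = @_;
    my $got = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        my $ok = flock($fh, $mode);     # ordinary blocking request
        alarm(0);
        $ok;
    };
    alarm(0);                           # in case the eval died early
    die $@ if $@ && $@ ne "timeout\n";  # pass along unexpected errors
    return $got ? 1 : 0;
}

my (undef, $path) = tempfile(UNLINK => 1);

open(my $holder, ">>", $path) or die "can't open $path: $!";
flock($holder, LOCK_EX)       or die "can't lock $path: $!";

# A second, independent open has to wait behind $holder, so this times out.
open(my $waiter, "<", $path)  or die "can't open $path: $!";
my $got = flock_with_timeout($waiter, LOCK_SH, 1);
print "lock acquired within 1s: ", ($got ? "yes" : "no"), "\n";
```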
Locks are on filehandles, not on filenames.[7] When you close the file, the lock dissolves automatically, whether you close the file explicitly by calling close or implicitly by reopening the handle or by exiting your process.
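Because the lock lives with the open handle, closing the handle is enough to let the next process in. The sketch below assumes native flock(2) semantics, where two opens of the same file are independent even within one process, and uses a scratch file from File::Temp. A second handle is refused a nonblocking exclusive lock while the first handle is open. It is granted one right after the first handle is closed.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

open(my $first,  "<", $path) or die "can't open $path: $!";
open(my $second, "<", $path) or die "can't open $path: $!";

flock($first, LOCK_EX) or die "can't lock $path: $!";
my $before = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0;   # refused: $first holds it

close($first) or die "can't close $path: $!";             # the lock dissolves here
my $after  = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0;   # now granted

print "second handle locked while first open:   $before\n";
print "second handle locked after first closed: $after\n";
```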
To get an exclusive lock, typically used for writing, you have to be more careful. You cannot use a regular open for this. If you use an open mode of <, it will fail on files that don't exist yet, and if you use >, it will clobber any files that do. Instead, use sysopen on the file so it can be locked before getting overwritten. Once you've safely opened the file for writing but haven't yet touched it, successfully acquire the exclusive lock and only then truncate the file. Now you may overwrite it with the new data.
use Fcntl qw(:DEFAULT :flock);
sysopen(FH, "filename", O_WRONLY | O_CREAT)
                        or die "can't open filename: $!";
flock(FH, LOCK_EX)      or die "can't lock filename: $!";
truncate(FH, 0)         or die "can't truncate filename: $!";
# now write to FH
If you want to modify the contents of a file in place, use sysopen again. This time you ask for both read and write access, creating the file if needed. Once the file is opened, but before you've done any reading or writing, get the exclusive lock and keep it around your entire transaction. It's often best to release the lock by closing the file because that guarantees all buffers are written before the lock is released.
An update involves reading in old values and writing out new ones. You must do both operations under a single exclusive lock, lest another process read the (imminently incorrect) value after (or even before) you do, but before you write. (We'll revisit this situation when we cover shared memory later in this chapter.)
use Fcntl qw(:DEFAULT :flock);
sysopen(FH, "counterfile", O_RDWR | O_CREAT)
                            or die "can't open counterfile: $!";
flock(FH, LOCK_EX)          or die "can't write-lock counterfile: $!";
$counter = <FH> || 0;       # first time would be undef
seek(FH, 0, 0)              or die "can't rewind counterfile: $!";
print FH $counter+1, "\n"   or die "can't write counterfile: $!";
# next line technically superfluous in this program, but
# a good idea in the general case
truncate(FH, tell(FH))      or die "can't truncate counterfile: $!";
close(FH)                   or die "can't close counterfile: $!";
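To convince yourself that holding the exclusive lock across the whole read-modify-write matters, you can hammer a counter from several processes at once. The sketch below wraps the same steps in a hypothetical bump subroutine using lexical filehandles. It forks a few children that each bump the counter repeatedly, then checks that no increments were lost. It assumes a Unix-like system with a working fork. The children leave through POSIX::_exit so they don't run the parent's File::Temp cleanup.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);
use POSIX ();

my $counterfile = tempdir(CLEANUP => 1) . "/counterfile";

# Hypothetical helper: the read-modify-write steps, all under one LOCK_EX.
sub bump {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "can't open $path: $!";
    flock($fh, LOCK_EX)                      or die "can't write-lock $path: $!";
    my $counter = <$fh> || 0;                # first time would be undef
    seek($fh, 0, 0)                          or die "can't rewind $path: $!";
    print {$fh} $counter + 1, "\n"           or die "can't write $path: $!";
    truncate($fh, tell($fh))                 or die "can't truncate $path: $!";
    close($fh)                               or die "can't close $path: $!";
}

my ($kids, $bumps) = (5, 20);
for (1 .. $kids) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                         # child: bump repeatedly, then leave
        bump($counterfile) for 1 .. $bumps;
        POSIX::_exit(0);
    }
}
1 while wait() != -1;                        # reap every child

open(my $in, "<", $counterfile) or die "can't read $counterfile: $!";
chomp(my $total = <$in>);
print "final count: $total (expected ", $kids * $bumps, ")\n";
```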
You can't lock a file you haven't opened yet, and you can't have a single lock that applies to more than one file. What you can do, though, is use a completely separate file to act as a sort of semaphore, like a traffic light, to provide controlled access to something else through regular shared and exclusive locks on the semaphore file. This approach has several advantages. You can have one lockfile that controls access to multiple files, avoiding the kind of deadlock that occurs when one process tries to lock those files in one order while another process is trying to lock them in a different order. You can use a semaphore file to lock an entire directory of files. You can even control access to something that isn't in the filesystem at all, like a shared memory object or the socket upon which several preforked servers would like to call accept.
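A minimal sketch of the semaphore-file approach follows. The with_lock helper and the .lock filename are hypothetical conventions invented for this illustration. Every cooperating process must agree to take the lock on that one file before touching any file in the directory. Like the DBM example in the next paragraph, the sketch takes locks on a lockfile opened O_RDONLY. That works with native flock(2) but may not under fcntl emulation. Because the lock handle is a lexical, it is closed, and the lock released, even if the code block dies.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding a $mode lock (LOCK_SH or
# LOCK_EX) on $lockfile. The lockfile's contents are never used.
sub with_lock {
    my ($lockfile, $mode, $code) = @_;
    sysopen(my $lock, $lockfile, O_RDONLY | O_CREAT)
        or die "can't open $lockfile: $!";
    flock($lock, $mode) or die "can't lock $lockfile: $!";
    my @result = $code->();          # if this dies, $lock is still closed
    close($lock) or die "can't close $lockfile: $!";   # releases the lock
    return @result;
}

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/.lock";         # one traffic light for the whole directory

# Writers update both files as one unit under the exclusive lock...
with_lock($lockfile, LOCK_EX, sub {
    for my $name (qw(a.txt b.txt)) {
        open(my $out, ">", "$dir/$name") or die "can't write $name: $!";
        print {$out} "version 2\n";
        close($out) or die "can't close $name: $!";
    }
});

# ...so readers holding the shared lock never see one without the other.
my @versions = with_lock($lockfile, LOCK_SH, sub {
    map {
        my $file = "$dir/$_";
        open(my $in, "<", $file) or die "can't read $file: $!";
        chomp(my $line = <$in>);
        $line;
    } qw(a.txt b.txt);
});
print "@versions\n";
```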
If you have a DBM file that doesn't provide its own explicit locking mechanism, an auxiliary lockfile is the best way to control concurrent access by multiple agents. Otherwise, your DBM library's internal caching can get out of sync with the file on disk. Before calling dbmopen or tie, open and lock the semaphore file. If you open the database with O_RDONLY, you'll want to use LOCK_SH for the lock. Otherwise, use LOCK_EX for exclusive access while updating the database. (Again, this only works if all participants agree to pay attention to the semaphore.)
use Fcntl qw(:DEFAULT :flock);
use DB_File;                    # demo purposes only; any db is fine

$DBNAME = "/path/to/database";
$LCK    = $DBNAME . ".lockfile";

# use O_RDWR if you expect to put data in the lockfile
sysopen(DBLOCK, $LCK, O_RDONLY | O_CREAT)
        or die "can't open $LCK: $!";

# must get lock before opening database; the database is tied
# O_RDWR below, so take LOCK_EX (LOCK_SH would do for O_RDONLY)
flock(DBLOCK, LOCK_EX)
        or die "can't LOCK_EX $LCK: $!";

tie(%hash, "DB_File", $DBNAME, O_RDWR | O_CREAT)
        or die "can't tie $DBNAME: $!";
Now you can safely do whatever you'd like with the tied %hash. When you're done with your database, make sure you explicitly release those resources, and in the opposite order that you acquired them:
untie %hash;        # must close database before lockfile
close DBLOCK;       # safe to let go of lock now
If you have the GNU DBM library installed, you can use the standard GDBM_File module's implicit locking. Unless the initial tie contains the GDBM_NOLOCK flag, the library makes sure that only one writer may open a GDBM file at a time, and that readers and writers do not have the database open at the same time.
Whenever you create a child process using fork, that new process inherits all its parent's open filehandles. Using filehandles for interprocess communication is easiest to illustrate by using plain files first. Understanding how this works is essential for mastering the fancier mechanisms of pipes and sockets described later in this chapter.
The simplest example opens a file and starts up a child process. The child then uses the filehandle already opened for it:
open(INPUT, "< /etc/motd")      or die "/etc/motd: $!";
if ($pid = fork) { waitpid($pid,0) }
else {
    defined($pid)               or die "fork: $!";
    while (<INPUT>) { print "$.: $_" }
    exit;   # don't let child fall back into main code
}
# INPUT handle now at EOF in parent
Once access to a file has been granted by open, it stays granted until the filehandle is closed; changes to the file's permissions or to the owner's access privileges have no effect on accessibility. Even if the process later alters its user or group IDs, or the file has its ownership changed to a different user or group, that doesn't affect filehandles that are already open. Programs running under increased permissions (like set-id programs or system daemons) often open a file under their increased rights and then hand off the filehandle to a child process that could not have opened the file on its own.
Although this feature is of great convenience when used intentionally, it can also create security issues if filehandles accidentally leak from one program to the next. To avoid granting implicit access to all possible filehandles, Perl automatically closes any filehandles it has opened (including pipes and sockets) whenever you explicitly exec a new program or implicitly execute one through a call to a piped open, system, or qx// (backticks). The system filehandles STDIN, STDOUT, and STDERR are exempt from this because their main purpose is to provide communications linkage between programs. So one way of passing a filehandle to a new program is to copy the filehandle to one of the standard filehandles:
open(INPUT, "< /etc/motd")      or die "/etc/motd: $!";
if ($pid = fork) { wait }
else {
    defined($pid)               or die "fork: $!";
    open(STDIN, "<&INPUT")      or die "dup: $!";
    exec("cat", "-n")           or die "exec cat: $!";
}
If you really want the new program to gain access to a filehandle other than these three, you can, but you have to do one of two things. When Perl opens a new file (or pipe or socket), it checks the current setting of the $^F ($SYSTEM_FD_MAX) variable. If the numeric file descriptor used by that new filehandle is greater than $^F, the descriptor is marked as one to close. Otherwise, Perl leaves it alone, and new programs you exec will inherit access.
It's not always easy to predict what file descriptor your newly opened filehandle will have, but you can temporarily set your maximum system file descriptor to some outrageously high number for the duration of the open:
# open file and mark INPUT to be left open across execs
{
    local $^F = 10_000;
    open(INPUT, "< /etc/motd")  or die "/etc/motd: $!";
}   # old value of $^F restored on scope exit
Now all you have to do is get the new program to pay attention to the descriptor number of the filehandle you just opened. The cleanest solution (on systems that support this) is to pass a special filename that equates to a file descriptor. If your system has a directory called /dev/fd or /proc/$$/fd containing files numbered from 0 through the maximum number of supported descriptors, you can probably use this strategy. (Many Linux systems have both, but only the /proc version tends to be correctly populated. BSD and Solaris prefer /dev/fd. You'll have to poke around at your system to see which looks better for you.) First, open and mark your filehandle as one to be left open across execs as shown in the previous code, then fork like this:
if ($pid = fork) { wait }
else {
    defined($pid)               or die "fork: $!";
    $fdfile = "/dev/fd/" . fileno(INPUT);
    exec("cat", "-n", $fdfile)  or die "exec cat: $!";
}
If your system supports the fcntl syscall, you may diddle the filehandle's close-on-exec flag manually. This is convenient for those times when you didn't realize back when you created the filehandle that you would want to share it with your children.
use Fcntl qw/F_SETFD/;
fcntl(INPUT, F_SETFD, 0)
    or die "Can't clear close-on-exec flag on INPUT: $!\n";
You can also force a filehandle to be closed across execs:
fcntl(INPUT, F_SETFD, 1)
    or die "Can't set close-on-exec flag on INPUT: $!\n";
You can also query the current status:
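use Fcntl qw/F_SETFD F_GETFD FD_CLOEXEC/;
# fcntl returns "0 but true" (a true value) when the flags are 0,
# so test the FD_CLOEXEC bit rather than the return value itself
printf("INPUT will be %s across execs\n",
    (fcntl(INPUT, F_GETFD, 0) & FD_CLOEXEC) ? "closed" : "left open");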
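Putting $^F and F_GETFD together, the following sketch opens the same scratch file twice, once normally and once with $^F raised, and reports each handle's close-on-exec flag. The cloexec_set helper is hypothetical. It masks the FD_CLOEXEC bit rather than trusting the truth of fcntl's return value, since fcntl returns "0 but true" when the flags are zero. On a typical Unix system, the first handle gets a descriptor above the default $^F of 2 and so is marked close-on-exec, while the second is left open across execs.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD FD_CLOEXEC);
use File::Temp qw(tempfile);

# Hypothetical helper: 1 if $fh is marked close-on-exec, else 0.
sub cloexec_set {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFD, 0);
    die "can't F_GETFD: $!" unless defined $flags;
    return (($flags + 0) & FD_CLOEXEC) ? 1 : 0;   # "0 but true" + 0 == 0
}

my (undef, $path) = tempfile(UNLINK => 1);

open(my $normal, "<", $path) or die "can't open $path: $!";

my $kept;
{
    local $^F = 10_000;                 # no new descriptor exceeds this
    open($kept, "<", $path) or die "can't open $path: $!";
}                                       # $^F back to its old value

printf "normal: fd %d, close-on-exec %s\n",
    fileno($normal), cloexec_set($normal) ? "set" : "clear";
printf "kept:   fd %d, close-on-exec %s\n",
    fileno($kept),   cloexec_set($kept)   ? "set" : "clear";
```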
If your system doesn't support file descriptors named in the filesystem, and you want to pass a filehandle other than STDIN, STDOUT, or STDERR, you can still do so, but you'll have to make special arrangements with that program. Common strategies for this are to pass the descriptor number through an environment variable or a command-line option.
If the executed program is in Perl, you can use open to convert a file descriptor into a filehandle. Instead of specifying a filename, use "&=" followed by the descriptor number.
if (defined($ENV{input_fdno}) && $ENV{input_fdno} =~ /^\d+$/) {
    open(INPUT, "<&=$ENV{input_fdno}")
        or die "can't fdopen $ENV{input_fdno} for input: $!";
}
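Here is the other half of that arrangement, sketched end to end. The parent opens a file so that its descriptor survives exec. It then publishes the descriptor number in the input_fdno environment variable, a private convention between the two programs rather than anything Perl itself knows about. Finally, it runs a second perl that performs the check above and copies the input to its output. The sketch assumes a Unix-like system where the list form of a piped open is available. That form also sidesteps shell quoting of the child's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "hello from a file the child never named\n";
close($tmp) or die "can't close $path: $!";

# Open so the descriptor survives exec (descriptor <= $^F).
my $input;
{
    local $^F = 10_000;
    open($input, "<", $path) or die "can't open $path: $!";
}

# Hand the descriptor number to the child through the environment.
$ENV{input_fdno} = fileno($input);

my $child = <<'CHILD';
if (defined($ENV{input_fdno}) && $ENV{input_fdno} =~ /^\d+$/) {
    open(INPUT, "<&=$ENV{input_fdno}")
        or die "can't fdopen $ENV{input_fdno} for input: $!";
    print while <INPUT>;
}
CHILD

# Run the child with no shell in between and capture what it prints.
open(my $from_kid, "-|", $^X, "-e", $child) or die "can't run child: $!";
my $output = do { local $/; <$from_kid> };
close($from_kid) or die "child failed: $! $?";
print $output;
```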
It gets even easier than that if you're going to be running a Perl subroutine or program that expects a filename argument. You can use the descriptor-opening feature of Perl's regular open function (but not sysopen or three-argument open) to make this happen automatically. Imagine you have a simple Perl program like this:
#!/usr/bin/perl -p
# nl - number input lines
printf "%6d ", $.;
Presuming you've arranged for the INPUT handle to stay open across execs, you can call that program this way:
$fdspec = '<&=' . fileno(INPUT);
system("nl", $fdspec);
or to catch the output:
@lines = `nl '$fdspec'`; # single quotes protect spec from shell
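What makes nl work unmodified is that the <> operator opens each name in @ARGV with the two-argument form of open. A string like <&=3 is therefore taken as "fdopen descriptor 3" rather than as a filename. The sketch below imitates that invocation from inside one script by setting @ARGV directly. (The <<>> operator added in Perl 5.22 uses three-argument open instead, and would look for a file literally named <&=3.)

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "alpha\nbeta\n";
close($tmp) or die "can't close $path: $!";

open(my $input, "<", $path) or die "can't open $path: $!";

# Pretend we were invoked as:  nl '<&=N'
my @numbered;
{
    local @ARGV = ('<&=' . fileno($input));
    while (<>) {               # magic two-argument open of $ARGV[0]
        push @numbered, sprintf("%6d %s", $., $_);
    }
}   # note: ARGV and $input share one descriptor, so closing ARGV closes it

print @numbered;
```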
Whether or not you exec another program, if you use file descriptors inherited across fork, there's one small gotcha. Unlike variables copied across a fork, which actually get duplicate but independent copies, file descriptors really are the same in both processes. If one process reads data from the handle, the seek pointer (file position) advances in the other process, too, and that data won't be read again by either process unless someone seeks back. If they take turns reading, they'll leapfrog over each other in the file. This makes intuitive sense for handles attached to serial devices, pipes, or sockets, since data read from those is consumed as it's read anyway. But this behavior may surprise you with disk files. If this is a problem, reopen any files that need separate tracking after the fork.
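The sketch below makes the shared position visible. It uses sysread and sysseek deliberately. A buffered <$fh> in the child would slurp a whole block into the child's private buffer and muddy the picture. The child reads four bytes. The parent then finds its own position already at 4 and reads the next three bytes, while a handle reopened after the fork starts at the beginning. The sketch assumes a Unix-like system with a working fork and uses File::Temp for the scratch file.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use POSIX ();
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "0123456789";
close($tmp) or die "can't close $path: $!";

open(my $fh, "<", $path) or die "can't open $path: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    sysread($fh, my $buf, 4);          # child consumes "0123"...
    POSIX::_exit(0);                   # skip the parent's cleanup code
}
waitpid($pid, 0);

# ...and the parent's position moved too, because the descriptor is shared.
my $shared_pos = sysseek($fh, 0, SEEK_CUR);
sysread($fh, my $parent_buf, 3);

# Reopening after the fork gives an independent position.
open(my $own, "<", $path) or die "can't open $path: $!";
sysread($own, my $fresh_buf, 3);

printf "parent position after child read: %d\n", $shared_pos;
print "parent then reads: $parent_buf\n";
print "freshly reopened handle reads: $fresh_buf\n";
```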
The fork operator is a concept derived from Unix, which means it might not be implemented correctly on all non-Unix/non-POSIX platforms. Notably, fork works on Microsoft systems only if you're running Perl 5.6 (or better) on Windows 98 (or later). Although fork is implemented via multiple concurrent execution streams within the same program on these systems, these aren't the sort of threads where all data is shared by default; here, only file descriptors are. See also Chapter 17.
[6] Presuming that a process can have a personal end.
[7] Actually, locks aren't on filehandles; they're on the file descriptors associated with the filehandles, since the operating system doesn't know about filehandles. That means that all our die messages about failing to get a lock on filenames are technically inaccurate. But error messages of the form "I can't get a lock on the file represented by the file descriptor associated with the filehandle originally opened to the path filename, although by now filename may represent a different file entirely than our handle does" would just confuse the user (not to mention the reader).