The IPC mechanisms discussed earlier all have one severe restriction: they're designed for communication between processes running on the same computer. (Even though files can sometimes be shared across machines through mechanisms like NFS, locking fails miserably on many NFS implementations, which takes away most of the fun of concurrent access.) For general-purpose networking, sockets are the way to go. Although sockets were invented under BSD, they quickly spread to other forms of Unix, and nowadays you can find a socket interface on nearly every viable operating system out there. If you don't have sockets on your machine, you're going to have tremendous difficulty using the Internet.
With sockets, you can do both virtual circuits (as TCP streams) and datagrams (as UDP packets). You may be able to do even more, depending on your system. But the most common sort of socket programming uses TCP over Internet-domain sockets, so that's the kind we cover here. Such sockets provide reliable connections that work a little bit like bidirectional pipes that aren't restricted to the local machine. The two killer apps of the Internet, email and web browsing, both rely almost exclusively on TCP sockets.
You also use UDP heavily without knowing it. Every time your machine tries to find a site on the Internet, it sends UDP packets to your DNS server asking it for the actual IP address. You might use UDP yourself when you want to send and receive datagrams. Datagrams are cheaper than TCP connections precisely because they aren't connection oriented; that is, they're less like making a telephone call and more like dropping a letter in the mailbox. But UDP also lacks the reliability that TCP provides, making it more suitable for situations where you don't care whether a packet or two gets folded, spindled, or mutilated, or for when you know that a higher-level protocol will enforce some degree of redundancy or fail-softness (which is what DNS does).
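To make the letter-in-the-mailbox analogy concrete, here is a minimal sketch of one datagram crossing the loopback interface. The 127.0.0.1 address, the kernel-chosen port, and the "ping" payload are all assumptions made for illustration; nothing here is a real service.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Receiver: bind a UDP socket to the loopback address. Port 0 asks the
# kernel for any free port (an assumption made to keep the demo local).
my $receiver = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'udp',
) or die "Couldn't create UDP receiver: $@\n";
my $port = $receiver->sockport;

# Sender: no handshake takes place. PeerAddr and PeerPort only record
# the default destination for later send() calls.
my $sender = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'udp',
) or die "Couldn't create UDP sender: $@\n";

$sender->send("ping") or die "send failed: $!\n";

# Each recv() hands back at most one whole datagram.
my $datagram;
defined $receiver->recv($datagram, 1024) or die "recv failed: $!\n";
print "received: $datagram\n";
```

On the loopback interface the datagram reliably shows up, but across a real network the recv could block forever if the packet were lost, which is why datagram programs usually add their own timeouts and retries.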
Other choices are available but far less common. You can use Unix-domain sockets, but they only work for local communication. Various systems support various other non-IP-based protocols. Doubtless these are somewhat interesting to someone somewhere, but we'll restrain ourselves from talking about them somehow.
The Perl functions that deal with sockets have the same names as the corresponding syscalls in C, but their arguments tend to differ for two reasons: first, Perl filehandles work differently from C file descriptors; and second, Perl already knows the length of its strings, so you don't need to pass that information. See Chapter 29 for details on each socket-related syscall.
One problem with ancient socket code in Perl was that people would use hard-coded values for constants passed into socket functions, which destroys portability. Like most syscalls, the socket-related ones quietly but politely return undef when they fail, instead of raising an exception. It is therefore essential to check these functions' return values, since if you pass them garbage, they aren't going to be very noisy about it. If you ever see code that does anything like explicitly setting $AF_INET = 2, you know you're in for big trouble. An immeasurably superior approach is to use the Socket module or the even friendlier IO::Socket module, both of which are standard. These modules provide various constants and helper functions you'll need for setting up clients and servers. For optimal success, your socket programs should always start out like this (and don't forget to add the -T taint-checking switch to the shebang line for servers):
    #!/usr/bin/perl -w
    use strict;
    use sigtrap;
    use Socket;  # or IO::Socket
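As a small illustration of why the return values matter, the sketch below passes socket a deliberately bogus address family (standing in for a wrong hard-coded constant) and then repeats the call with symbolic constants. The value -1 is an assumption chosen only because no system defines it as a valid family, and IPPROTO_TCP from the Socket module is used in place of getprotobyname('tcp') so that the sketch doesn't depend on a protocols database being installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP);

# A bogus address family stands in for a wrong hard-coded constant.
# socket() does not die; it just returns false and sets $!.
my $ok = socket(my $bad, -1, SOCK_STREAM, 0);
print "bogus domain: ", ($ok ? "succeeded?!" : "failed quietly ($!)"), "\n";

# Symbolic constants from Socket, plus an explicit check on the result.
socket(my $good, PF_INET, SOCK_STREAM, IPPROTO_TCP)
    or die "Couldn't create socket: $!\n";
my $fd = fileno($good);
print "PF_INET socket opened on descriptor $fd\n";
close($good);
```

Without the test on $ok, the first call's failure would go unnoticed until some later operation on the handle misbehaved.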
As noted elsewhere, Perl is at the mercy of your C libraries for much of its system behavior, and not all systems support all sorts of sockets. It's probably safest to stick with normal TCP and UDP socket operations. For example, if you want your code to stand a chance of being portable to systems you haven't thought of, don't expect there to be support for a reliable sequenced-packet protocol. Nor should you expect to pass open file descriptors between unrelated processes over a local Unix-domain socket. (Yes, you can really do that on many Unix machines--see your local recvmsg(2) manpage.)
If you just want to use a standard Internet service like mail, news, domain name service, FTP, Telnet, the Web, and so on, then instead of starting from scratch, try using existing CPAN modules for these. Prepackaged modules designed for these include Net::SMTP (or Mail::Mailer), Net::NNTP, Net::DNS, Net::FTP, Net::Telnet, and the various HTTP-related modules. The libnet and libwww module suites both comprise many individual networking modules. Module areas on CPAN you'll want to look at are section 5 on Networking and IPC, section 15 on WWW-related modules, and section 16 on Server and Daemon Utilities.
In the sections that follow, we present several sample clients and servers without a great deal of explanation of each function used, as that would mostly duplicate the descriptions we've already provided in Chapter 29.
Use Internet-domain sockets when you want reliable client-server communication between potentially different machines.
To create a TCP client that connects to a server somewhere, it's usually easiest to use the standard IO::Socket::INET module:
    use IO::Socket::INET;

    $socket = IO::Socket::INET->new(PeerAddr => $remote_host,
                                    PeerPort => $remote_port,
                                    Proto    => "tcp",
                                    Type     => SOCK_STREAM)
        or die "Couldn't connect to $remote_host:$remote_port : $!\n";

    # send something over the socket,
    print $socket "Why don't you call me anymore?\n";

    # read the remote answer,
    $answer = <$socket>;

    # and terminate the connection when we're done.
    close($socket);
A shorthand form of the call is good enough when you just have a host and port combination to connect to, and are willing to use defaults for all other fields:
    $socket = IO::Socket::INET->new("www.yahoo.com:80")
        or die "Couldn't connect to port 80 of yahoo: $!";
To connect using the basic Socket module:
    use Socket;

    # create a socket
    socket(Server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
        or die "Couldn't create socket: $!\n";

    # build the address of the remote machine
    $internet_addr = inet_aton($remote_host)
        or die "Couldn't convert $remote_host into an Internet address: $!\n";
    $paddr = sockaddr_in($remote_port, $internet_addr);

    # connect
    connect(Server, $paddr)
        or die "Couldn't connect to $remote_host:$remote_port: $!\n";

    select((select(Server), $| = 1)[0]);  # enable autoflush (command buffering) on Server

    # send something over the socket
    print Server "Why don't you call me anymore?\n";

    # read the remote answer
    $answer = <Server>;

    # terminate the connection when done
    close(Server);
If you want to close only your side of the connection, so that the remote end gets an end-of-file, but you can still read data coming from the server, use the shutdown syscall for a half-close:
    # no more writing to server
    shutdown(Server, 1);    # Socket::SHUT_WR constant in v5.6
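To see the half-close at work, the following sketch runs a throwaway server and client in one script over the loopback interface. The forked child, the port chosen by the kernel, and the "got N lines" reply are all assumptions invented for the demonstration. The child keeps reading until the client's shutdown delivers end-of-file, and only then sends its answer back over the still-open other direction.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Listener on an ephemeral loopback port (port 0 = let the kernel pick).
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
) or die "Couldn't listen: $@\n";
my $port = $server->sockport;

my $pid = fork;
defined $pid or die "cannot fork: $!\n";

if ($pid == 0) {
    # Child plays the server: read until end-of-file, then report a count.
    my $client = $server->accept or die "accept: $!\n";
    my $lines = 0;
    $lines++ while <$client>;          # ends when the peer half-closes
    print $client "got $lines lines\n";
    close $client;
    exit 0;
}

# Parent plays the client.
close $server;
my $socket = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "Couldn't connect: $@\n";

print $socket "one\n", "two\n", "three\n";
shutdown($socket, 1) or die "shutdown: $!\n";   # 1 = done writing

my $answer = <$socket>;                # the read side is still open
close $socket;
waitpid($pid, 0);
print $answer;                         # prints "got 3 lines"
```

If the client called close instead of shutdown, the server would still see end-of-file, but the client would have no handle left on which to read the reply.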
Here's a corresponding server to go along with it. It's pretty easy with the standard IO::Socket::INET class:
    use IO::Socket::INET;

    $server = IO::Socket::INET->new(LocalPort => $server_port,
                                    Type      => SOCK_STREAM,
                                    Reuse     => 1,
                                    Listen    => 10 )   # or SOMAXCONN
        or die "Couldn't be a tcp server on port $server_port: $!\n";

    while ($client = $server->accept()) {
        # $client is the new connection
    }

    close($server);
You can also write that using the lower-level Socket module:
    use Socket;

    # make the socket
    socket(Server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
        or die "Couldn't create socket: $!\n";

    # so we can restart our server quickly
    setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, 1)
        or die "Couldn't set SO_REUSEADDR: $!\n";

    # build up my socket address
    $my_addr = sockaddr_in($server_port, INADDR_ANY);
    bind(Server, $my_addr)
        or die "Couldn't bind to port $server_port: $!\n";

    # establish a queue for incoming connections
    listen(Server, SOMAXCONN)
        or die "Couldn't listen on port $server_port: $!\n";

    # accept and process connections
    while (accept(Client, Server)) {
        # do something with new Client connection
    }

    close(Server);
The client doesn't need to bind to any address, but the server does. We've specified its address as INADDR_ANY, which means that clients can connect from any available network interface. If you want to sit on a particular interface (like the external side of a gateway or firewall machine), use that interface's real address instead. (Clients can do this, too, but rarely need to.)
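As a sketch of sitting on one particular interface, the code below binds a listener to the loopback address only, so nothing outside the local machine can reach it. INADDR_LOOPBACK and the kernel-assigned port 0 are assumptions picked to keep the example self-contained, and IPPROTO_TCP is used instead of getprotobyname('tcp') to avoid depending on a protocols database. For a real gateway you would pass inet_aton a dotted-quad address belonging to the interface you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

socket(my $listener, PF_INET, SOCK_STREAM, IPPROTO_TCP)
    or die "socket: $!\n";
setsockopt($listener, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt: $!\n";

# Bind to the loopback interface only (INADDR_LOOPBACK is 127.0.0.1)
# rather than to every interface (INADDR_ANY). Port 0 means "any free port".
bind($listener, sockaddr_in(0, INADDR_LOOPBACK))
    or die "bind: $!\n";
listen($listener, SOMAXCONN)
    or die "listen: $!\n";

# Ask the kernel which address and port we actually ended up with.
my ($port, $addr) = sockaddr_in(getsockname($listener));
my $bound_ip = inet_ntoa($addr);
print "listening on $bound_ip port $port\n";
```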
If you want to know which machine connected to you, call getpeername on the client connection. This returns a packed socket address holding the peer's port and IP address, which you'll have to unpack and then translate into a name on your own (if you can):
    use Socket;

    $other_end = getpeername(Client)
        or die "Couldn't identify other end: $!\n";
    ($port, $iaddr)   = unpack_sockaddr_in($other_end);
    $actual_ip        = inet_ntoa($iaddr);
    $claimed_hostname = gethostbyaddr($iaddr, AF_INET);
This is trivially spoofable because the owner of that IP address can set up their reverse tables to say anything they want. For a small measure of additional confidence, translate back the other way again:
    @name_lookup = gethostbyname($claimed_hostname)
        or die "Could not reverse $claimed_hostname: $!\n";
    @resolved_ips = map { inet_ntoa($_) } @name_lookup[ 4 .. $#name_lookup ];
    $might_spoof  = !grep { $actual_ip eq $_ } @resolved_ips;
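The two lookups can be folded into one routine. What follows is only a sketch: forward_confirmed is a hypothetical helper name, and the loopback connection exists purely so that there is a real peer to ask about. Whether the check passes for 127.0.0.1 depends entirely on how the local resolver is configured, so no particular verdict should be expected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use IO::Socket::INET;

# Hypothetical helper: true when the name that reverse lookup claims for
# $iaddr resolves forward to that same address again.
sub forward_confirmed {
    my ($iaddr) = @_;
    my $ip   = inet_ntoa($iaddr);
    my $name = gethostbyaddr($iaddr, AF_INET);
    return 0 unless defined $name;            # no reverse entry at all
    my @lookup = gethostbyname($name) or return 0;
    # Elements 4 onward are packed addresses; keep only 4-byte IPv4 ones.
    my @ips = map  { inet_ntoa($_) }
              grep { length($_) == 4 } @lookup[ 4 .. $#lookup ];
    return scalar(grep { $_ eq $ip } @ips) ? 1 : 0;
}

# A loopback connection made within this one process (demo assumption).
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 1,
) or die "Couldn't listen: $@\n";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $server->sockport,
) or die "Couldn't connect: $@\n";
my $conn = $server->accept or die "accept: $!\n";

my $other_end = getpeername($conn)
    or die "Couldn't identify other end: $!\n";
my ($port, $iaddr) = unpack_sockaddr_in($other_end);
my $actual_ip      = inet_ntoa($iaddr);
my $confirmed      = forward_confirmed($iaddr);

printf "peer %s:%d %s the forward-confirmation check\n",
    $actual_ip, $port, $confirmed ? "passes" : "fails";
```

Even a passing check shows only that the forward and reverse DNS records agree with each other, not that the peer is trustworthy.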
Once a client connects to your server, your server can do I/O both to and from that client handle. But while the server is so engaged, it can't service any further incoming requests from other clients. To avoid getting locked down to just one client at a time, many servers immediately fork a clone of themselves to handle each incoming connection. (Others fork in advance, or multiplex I/O between several clients using the select syscall.)
    REQUEST:
    while (accept(Client, Server)) {
        if ($kidpid = fork) {
            close Client;         # parent closes unused handle
            next REQUEST;
        }
        defined($kidpid)   or die "cannot fork: $!" ;

        close Server;             # child closes unused handle

        select(Client);           # new default for prints
        $| = 1;                   # autoflush

        # per-connection child code does I/O with Client handle
        $input = <Client>;
        print Client "output\n";  # or STDOUT, same thing

        open(STDIN,  "<&Client")    or die "can't dup client: $!";
        open(STDOUT, ">&Client")    or die "can't dup client: $!";
        open(STDERR, ">&Client")    or die "can't dup client: $!";

        # run the calculator, just as an example
        system("bc -l");        # or whatever you'd like, so long as
                                # it doesn't have shell escapes!
        print "done\n";         # still to client

        close Client;
        exit;  # don't let the child back to accept!
    }
This server clones off a child with fork for each incoming request. That way it can handle many requests at once, as long as you can create more processes. (You might want to limit this.) Even if you don't fork, the listen will allow up to SOMAXCONN (usually five or more) pending connections. Each connection uses up some resources, although not as much as a process. Forking servers have to be careful about cleaning up after their expired children (called "zombies" in Unix-speak) because otherwise they'd quickly fill up your process table. The REAPER code discussed in Section 16.1 will take care of that for you, or you may be able to assign $SIG{CHLD} = 'IGNORE'.
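For readers without Section 16.1 at hand, a reaper along these lines is one common way to collect zombies. It is a sketch under assumptions and not necessarily the exact code from that section: the %exit_code bookkeeping hash and the three short-lived children are invented here so the script has something to reap.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # for WNOHANG

my %exit_code;              # pid => exit status of each reaped child

sub REAPER {
    local ($!, $?);         # don't clobber the main program's error state
    # One SIGCHLD may stand for several exited children, so loop until
    # waitpid reports that none are left to collect.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_code{$pid} = $? >> 8;
    }
}
$SIG{CHLD} = \&REAPER;

# Stand-ins for per-connection children: each one exits immediately.
for my $code (1 .. 3) {
    my $pid = fork;
    defined $pid or die "cannot fork: $!\n";
    exit $code if $pid == 0;
}

# Wait (up to about five seconds) for the handler to reap all three.
for (1 .. 50) {
    last if keys(%exit_code) == 3;
    select(undef, undef, undef, 0.1);
}
printf "reaped %d children\n", scalar keys %exit_code;
```

In a real server the accept loop takes the place of the waiting loop at the end. Note that a signal arriving during accept can make it return false with $! set to EINTR, which the loop has to tolerate.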
Before running another command, we connect the standard input and output (and error) up to the client connection. This way any command that reads from STDIN and writes to STDOUT can also talk to the remote machine. Without the reassignment, the command couldn't find the client handle--which by default gets closed across the exec boundary, anyway.
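The alternative to forking mentioned earlier, multiplexing several clients with select, might look like the following sketch, which uses the standard IO::Select wrapper around the select syscall. The two in-process test clients, the loopback address, and the message text are assumptions made so the example can run by itself. It reads with sysread because buffered reads such as <$fh> don't mix safely with select.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 5, Reuse => 1,
) or die "Couldn't listen: $@\n";
my $port = $server->sockport;

# Two test clients living in this same process (demo traffic only).
my @clients;
for my $n (0 .. 1) {
    my $c = IO::Socket::INET->new(PeerAddr => '127.0.0.1', PeerPort => $port)
        or die "Couldn't connect: $@\n";
    print $c "hello from client $n\n";
    push @clients, $c;
}

# One process, one loop: watch the listener and every accepted client.
my $select = IO::Select->new($server);
my @received;
while (@received < 2) {
    my @ready = $select->can_read(5) or die "timed out waiting for I/O\n";
    for my $fh (@ready) {
        if ($fh == $server) {
            # The listener is readable: a new connection is pending.
            my $conn = $server->accept or next;
            $select->add($conn);
        }
        else {
            # A client is readable: take whatever bytes have arrived.
            my $n = sysread($fh, my $buf, 4096);
            if ($n) { push @received, $buf }
            else    { $select->remove($fh); close $fh }   # EOF or error
        }
    }
}
print sort @received;
```

A production server would keep a per-client buffer and split it into lines itself, since a single sysread may return part of a line or several lines at once.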
When you write a networking server, we strongly suggest that you use the -T switch to enable taint checking even if you aren't running setuid or setgid. This is always a good idea for servers and any other program that runs on behalf of someone else (like all CGI scripts), because it lessens the chances that people from the outside will be able to compromise your system. See Section 23.1 in Chapter 23 for much more about all this.
One additional consideration when writing Internet programs: many protocols specify that the line terminator should be CRLF, which can be specified various ways:
"