Although distributed Erlang might be a first step in allowing programs on remote machines to communicate with each other, we sometimes have to rely on lower-level mechanisms and standardized protocols. Sockets allow programs written in any language to exchange data on different computers by exchanging byte streams transmitted using the protocols of the Internet Protocol (IP) Suite.
Whereas sockets are used to create a byte-oriented communication stream between programs possibly running on different machines, ports, which we cover in the next chapter, will do the same for programs running on the same machine. Byte streams, which in Erlang can be viewed as either binaries or integer lists, often follow standards and application-level protocols that allow programs written independently of each other to interact with each other.
Examples of socket-based communication include communication between web browsers and servers, instant messaging (IM) clients, email servers and clients, and peer-to-peer applications. The Erlang distribution itself is based on nodes communicating with each other through sockets.
Erlang can hide the raw packets from the user, providing user-friendly APIs to User Datagram Protocol (UDP) and Transmission Control Protocol (TCP). These are contained in two library modules: gen_udp, for UDP, a connectionless, less reliable, packet-based communication protocol, and gen_tcp, which provides a connection-oriented communication channel. Both of these protocols communicate over IP.
User Datagram Protocol (UDP) is a connectionless protocol. If a UDP packet is sent, and a socket happens to be listening on the other end, it will pick up the packet. UDP provides little error recovery, leaving it up to the application to ensure packet reception and consistency. UDP packets could take different routes, and as a result could be received in a different order from which they were sent. They can also be lost en route, and as the receiving end does not acknowledge their arrival, their loss happens “silently.” Although the protocol might not be reliable, the overhead of using it is small, making it ideal for transmissions in which you would rather drop a packet than wait for it to be re-sent. For example, errors and alarms are often broadcast in the hope that a socket on the other end picks them up.
In Erlang, UDP is implemented in the gen_udp module. Let’s get acquainted with it through an example. Start two Erlang nodes on the same host and make sure you execute the commands in the following order:
1. In the first Erlang node, open a UDP socket on port 1234.
2. In the second Erlang node, open a UDP socket on port 1235.
3. Use the socket in the second node to send the binary <<"Hello World">> to the listening socket 1234 on the local host IP address 127.0.0.1.
4. Use the socket in the second node to send the string "Hello World" to the same IP address and listening socket.
5. In the first node, the process that opened (and owns) the socket should have received both of the "Hello World" messages. Retrieve them using the flush() shell command.
6. Close both sockets and thus free the port numbers.
In the Erlang shell on the first node, the commands and output would look like this:
1> {ok, Socket} = gen_udp:open(1234).
{ok,#Port<0.576>}
2> flush().
Shell got {udp,#Port<0.576>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.576>,{127,0,0,1},1235,"Hello World"}
ok
3> gen_udp:close(Socket).
ok
You should keep in mind that once you’ve opened the socket, you need to send messages from the second node to the first. In the Erlang shell on the second node, the commands would look like this:
1> {ok, Socket} = gen_udp:open(1235).
{ok,#Port<0.203>}
2> gen_udp:send(Socket, {127,0,0,1}, 1234, <<"Hello World">>).
ok
3> gen_udp:send(Socket, {127,0,0,1}, 1234, "Hello World").
ok
4> gen_udp:close(Socket).
ok
Pay special attention to the format of the UDP messages sent to the process that owns the socket, and the fact that it receives both messages as lists, even though the first message was sent as a binary. We will explain all of this when we look at the functions involved in more detail.
If you are trying the example on separate computers, you should replace the local host IP address with the address of the computer to which you want to send messages, and ensure that neither firewall is blocking the relevant ports.
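The two-shell session can also be reproduced inside a single node, which is handy for quick experiments. The following is a minimal sketch, not part of the original example: the module name udp_loopback and the helper wait_packet/1 are made up for illustration, and both sockets are opened on port 0 (letting the operating system pick free ports) instead of 1234 and 1235.

```erlang
-module(udp_loopback).
-export([run/0]).

%% Sends a binary and a string from one UDP socket to another on the
%% loopback interface and returns both payloads as they were delivered.
run() ->
    {ok, Receiver} = gen_udp:open(0),          % defaults: list mode, active
    {ok, Sender}   = gen_udp:open(0),
    {ok, RecvPort} = inet:port(Receiver),      % port chosen by the OS
    ok = gen_udp:send(Sender, {127,0,0,1}, RecvPort, <<"Hello World">>),
    ok = gen_udp:send(Sender, {127,0,0,1}, RecvPort, "Hello World"),
    First  = wait_packet(Receiver),
    Second = wait_packet(Receiver),
    ok = gen_udp:close(Sender),
    ok = gen_udp:close(Receiver),
    {First, Second}.

%% Hypothetical helper: waits up to one second for an active-mode message.
wait_packet(Socket) ->
    receive
        {udp, Socket, _FromIp, _FromPort, Packet} -> Packet
    after 1000 ->
        timeout
    end.
```

Calling udp_loopback:run() should return both payloads as strings, because the receiving socket was opened with the default list option.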
As you can see in Figure 15-1, clients on other hosts send their UDP packets to a listener socket which forwards them to an Erlang process. At any one time, only one process is allowed to receive packets from a particular socket. This process is called the controlling process.
To open a socket, on both the client and the server side, you use the following function calls:
gen_udp:open(Port)
gen_udp:open(Port, OptionList)
The Port is an integer denoting the listening port number of the socket. It is used by clients who need to send messages to the socket. The OptionList contains configuration options which allow you to override the default values. The most useful parameters include:
list
Forwards all messages in the packet as a list of integers, regardless of how they are sent. It is the default value if no option is chosen.
binary
Forwards all messages in the packet as a binary.
{header, Size}
Can be used if packets are being received as binaries. It splits the message into a list of size Size, the header, and the message (a binary). This option was particularly useful before the introduction of bit syntax and pattern matching on binaries, as described in Chapter 9.
Repeating the preceding two-node UDP example, but with the first socket opened using the following,
{ok, Socket} = gen_udp:open(1234,[binary,{header,2}]).
and sending [0,10|"Hello World"] will result in the first message being received as follows:
2> flush().
Shell got {udp,#Port<0.439>,{127,0,0,1},1235,[0,10|<<"Hello World">>]}
ok
In the preceding code, the message is split into the (two-integer) header and the message.
{active, true}
Ensures that all the messages received from the socket are forwarded to the process that owns the socket as Erlang messages of the form {udp, Socket, IP, PortNo, Packet}. Socket is the receiving socket, IP and PortNo are the IP address and port number of the sending socket, and Packet is the message itself. This active mode is the default value when opening a socket.
{active, false}
Sets the socket to passive mode. Instead of being sent, messages from the socket have to be retrieved using the gen_udp:recv/2 and gen_udp:recv/3 calls.
{active, once}
Will forward the first message the socket receives to the process that owns it, but subsequent messages have to be retrieved using the recv functions.
{ip, ip_address()}
Is used when opening a socket on a computer that has several network interfaces defined. This option specifies which of the interfaces the socket should use.
inet6
Will set up the socket for IPv6. inet will set it up for IPv4, which is also the default value.
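To see how the binary and {active, once} options combine, consider the following sketch. It is an illustration under assumptions rather than code from the original example: the module name udp_active_once is invented, and port 0 is used so that the operating system assigns free ports.

```erlang
-module(udp_active_once).
-export([run/0]).

%% The first packet arrives as an Erlang message; after that the socket
%% is passive, so the second packet has to be fetched with recv/3.
run() ->
    {ok, Receiver} = gen_udp:open(0, [binary, {active, once}]),
    {ok, Sender}   = gen_udp:open(0),
    {ok, Port}     = inet:port(Receiver),
    ok = gen_udp:send(Sender, {127,0,0,1}, Port, "first"),
    ok = gen_udp:send(Sender, {127,0,0,1}, Port, "second"),
    First = receive
                {udp, Receiver, _Ip, _PortNo, Packet} -> Packet
            after 1000 ->
                timeout
            end,
    %% Passive retrieval; the Length argument (0) is ignored for UDP.
    {ok, {_, _, Second}} = gen_udp:recv(Receiver, 0, 1000),
    ok = gen_udp:close(Sender),
    ok = gen_udp:close(Receiver),
    {First, Second}.
```

On the loopback interface you would normally expect {<<"first">>, <<"second">>}: both payloads are binaries because of the binary option, even though they were sent as strings. UDP itself gives no ordering guarantee, so across a real network the two could arrive swapped.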
The call to open returns either {ok, Socket} or {error, Reason}, where Socket is the identifier for the socket opened and Reason is one of several POSIX error codes returned as an atom. They are listed in the inet manual page of the Erlang runtime system documentation. The most common errors you will come across are eaddrinuse if the address is already in use, eaddrnotavail if the requested address is not available on the local machine, and eacces if you don’t have permission to open the socket, as is typically the case when you pick a port in a range your OS has reserved.
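The eaddrinuse case is easy to provoke on purpose. The sketch below assumes an invented module name, udp_open_errors, and opens the first socket on port 0 so that it does not depend on any particular port being free.

```erlang
-module(udp_open_errors).
-export([open_twice/0]).

%% Opens a UDP socket, then tries to open a second one on the same port.
%% The second call is expected to fail, as the port is already taken.
open_twice() ->
    {ok, First} = gen_udp:open(0),
    {ok, Port}  = inet:port(First),     % the port the OS assigned
    Result = gen_udp:open(Port),        % same port, default options
    ok = gen_udp:close(First),
    Result.
```

With default options the second open should return {error, eaddrinuse}. Matching on {ok, Socket} alone, as the shell examples do, would crash the caller with a badmatch in that situation, which may or may not be what you want.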
The gen_udp:close(Socket) call closes the socket and frees the port number allocated to it. It returns the atom ok.
If you want to send messages, you use the following function:
gen_udp:send(Socket, Address, Port, Packet)
The Socket is the UDP socket on the local machine from which the message is to be sent. The Address can be entered as a string containing the hostname or IP address, an atom containing the local hostname, or a tuple containing the integers making up the IP address. The Port is the port number on the receiving host, and the Packet is the content of the message, as a sequence of bytes, which can be either a list of integers or a binary.
When the socket is opened in passive mode, the connected process has to explicitly retrieve the packet from the socket using these function calls:
gen_udp:recv(Socket, Length)
gen_udp:recv(Socket, Length, Timeout)
Length is relevant only to the raw transmission mode in TCP, and so it is ignored in this case. If a packet has been received within the timeout, {ok, {Ip, PortNo, Packet}} is returned. If no packet is received within Timeout milliseconds, {error, timeout} will be returned. If the receiving process calls gen_udp:recv when not in passive mode, expect to see the {error, einval} error, which is the POSIX error code denoting an invalid argument.
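Both outcomes of a passive-mode receive can be sketched in a few lines. As before, the module name udp_passive is an assumption made for this illustration, and port 0 lets the operating system choose the ports.

```erlang
-module(udp_passive).
-export([run/0]).

%% Shows a recv/3 that times out, followed by one that returns a packet.
run() ->
    {ok, Receiver} = gen_udp:open(0, [binary, {active, false}]),
    {ok, Sender}   = gen_udp:open(0),
    {ok, Port}     = inet:port(Receiver),
    %% Nothing has been sent yet, so this call gives up after 100 ms.
    Empty = gen_udp:recv(Receiver, 0, 100),
    ok = gen_udp:send(Sender, {127,0,0,1}, Port, <<"ping">>),
    {ok, {FromIp, _FromPort, Packet}} = gen_udp:recv(Receiver, 0, 1000),
    ok = gen_udp:close(Sender),
    ok = gen_udp:close(Receiver),
    {Empty, FromIp, Packet}.
```

The expected result is {{error,timeout},{127,0,0,1},<<"ping">>}. Note that nothing is put in the process mailbox here; the packet stays in the socket buffer until recv is called.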
The most common use of UDP is in the implementation of Simple Network Management Protocol (SNMP). SNMP is a standard often used to monitor devices and systems across IP-based networks. You can read more about the Erlang SNMP application in the documentation provided with the runtime system.
Transmission Control Protocol, or TCP for short, is a connection-oriented protocol allowing peers to exchange streams of data. Unlike UDP, with TCP packet reception is guaranteed and packets are received in the same order they are sent. Common uses of TCP include HTTP requests, peer-to-peer applications, and IM client/server connections. Erlang distribution is built on top of TCP. Just as with UDP, neither the client nor the server has to be implemented in Erlang.
On an architectural level, the main difference between TCP and UDP is that once you’ve opened a socket connection using TCP, it is kept open until either side closes it or it terminates because of an error. When setting up a connection, you would often spawn a new process for every request, keeping it alive for as long as the request is being handled.
How does this work in practice? Say you have a listener process whose task is to wait for incoming TCP requests. As soon as a request comes in, the process that acknowledges the connection request becomes the accept process. There are two mechanisms for defining the accept process:
The first option is to spawn a new process which becomes the accept process, while the listener goes back and listens for a new connection request.
The second option, as shown in Figure 15-2, is to make the listener process the accept process, and spawn a new process which becomes the new listener.
If the socket is opened in active mode, the process that owns the socket will receive messages of the form {tcp, Socket, Packet} where Socket is the receiving socket and Packet is the message itself.
If you are working in passive mode, just like with UDP, you need to use the following:
gen_tcp:recv(Socket, Length)
gen_tcp:recv(Socket, Length, Timeout)
The call will return a tuple of the format {ok, Packet}. In these calls, a nonzero value of Length denotes the number of bytes the socket will wait for before returning the message. If the value is 0, everything available is returned. If the sender socket is closed and fewer than Length bytes have been buffered, they are discarded. The Length option is relevant only if the packet type is raw.
Using passive mode is a good way to ensure that your system does not get flooded with requests. It is a common design pattern to spawn a new process that handles the request for each message received. In extreme cases under heavy sustained traffic, the virtual machine risks running out of memory as the system gets flooded by requests (and hence, processes). By using sockets in passive mode, the underlying TCP buffer can be used to throttle the requests and reject messages on the client side. The best way to know whether you need to throttle on the TCP level and whether memory is an issue during traffic bursts is through extensive stress testing of your system.
Let’s start with a simple example of how you can use TCP sockets. The client, given a host and a binary, opens a socket connection on port 1234. Using the bit syntax, it breaks the binary into chunks of 100 bytes and sends them over in separate packets.
client(Host, Data) ->
    {ok, Socket} = gen_tcp:connect(Host, 1234, [binary, {packet, 0}]),
    send(Socket, Data),
    ok = gen_tcp:close(Socket).
You might recall from the description of binaries in Chapter 9 that the expression <<Chunk:100/binary, Rest/binary>> will bind the first 100 bytes of the binary to Chunk and what remains to Rest. When the binary contains fewer than 100 bytes, pattern matching on the first clause of the send/2 call will fail. Whatever remains of the possibly empty binary will match the second clause, and so its contents are sent to the server, after which the Socket connection is closed.
send(Socket, <<Chunk:100/binary, Rest/binary>>) ->
    gen_tcp:send(Socket, Chunk),
    send(Socket, Rest);
send(Socket, Rest) ->
    gen_tcp:send(Socket, Rest).
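The chunking logic of send/2 can be tried out without any sockets. The following sketch uses a hypothetical module, chunker, whose chunks/2 function takes the chunk size as a parameter (an addition for illustration; the example above hardcodes 100) and returns the pieces that send/2 would have transmitted.

```erlang
-module(chunker).
-export([chunks/2]).

%% Splits Bin into pieces of Size bytes using the same pattern as send/2.
%% The last element is whatever remains, which may be an empty binary.
chunks(Bin, Size) when is_binary(Bin), is_integer(Size), Size > 0 ->
    case Bin of
        <<Chunk:Size/binary, Rest/binary>> ->
            [Chunk | chunks(Rest, Size)];
        _ ->
            [Bin]
    end.
```

For instance, chunker:chunks(<<"Hello Concurrent World">>, 10) returns [<<"Hello Conc">>,<<"urrent Wor">>,<<"ld">>], while chunker:chunks(<<"abcd">>, 2) returns [<<"ab">>,<<"cd">>,<<>>], mirroring the final, possibly empty, send in the second clause of send/2.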
The server side has a listener process waiting for a client connection. When the request arrives, the listener process becomes the accept process and is ready to receive binaries in passive mode. A new listener process is spawned and waits for the next connection request. The accept process continues receiving data from the client, appending it to a list until the socket is closed, after which it saves the data to a file.
server() ->
    {ok, ListenSocket} = gen_tcp:listen(1234, [binary, {active, false}]),
    wait_connect(ListenSocket,0).

%% Blocks until a client connects, spawns the next listener, and then
%% carries on as the accept process for this connection.
wait_connect(ListenSocket, Count) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    spawn(?MODULE, wait_connect, [ListenSocket, Count+1]),
    get_request(Socket, [], Count).

%% Collects data in passive mode (newest chunk first) until the client
%% closes the connection.
get_request(Socket, BinaryList, Count) ->
    case gen_tcp:recv(Socket, 0, 5000) of
        {ok, Binary} ->
            get_request(Socket, [Binary|BinaryList], Count);
        {error, closed} ->
            handle(lists:reverse(BinaryList), Count)
    end.

%% Writes the received chunks to a file named after the connection count.
handle(Binary, Count) ->
    {ok, Fd} = file:open("log_file_"++integer_to_list(Count), write),
    file:write(Fd, Binary),
    file:close(Fd).
Note how the get_request/3 function receives the binary in chunks. Because TCP delivers a stream of bytes, these chunks need not coincide with the 100-byte pieces sent by the client. Once all chunks have been received and the socket is closed, you need to reverse the list in which you stored them, as the first chunk you should be writing is now the last element of the list. You write the chunks to a file, and when done, you close the file, releasing the file descriptor.
To run the example, all you need to do is start the server using tcp:server() and the client using the following:
tcp:client({127,0,0,1}, <<"Hello Concurrent World">>).
You can see that many of the commands are similar to the ones we used in the earlier UDP example. The major difference is the following call:
gen_tcp:listen(PortNumber, Options)
This starts a listener socket, which then waits for incoming connections. The call takes the same options as the call to gen_udp:open/2 described earlier, as well as the following TCP-specific ones:
{active, true}
Ensures that all messages received from the socket are forwarded as Erlang messages to the process that owns the socket. This active mode is the default value when opening a socket.
{active, false}
Sets the socket to passive mode. Messages received from the socket are buffered, and the process must retrieve them through the gen_tcp:recv/2 and gen_tcp:recv/3 calls.
{active, once}
Will set the socket to active mode, but as soon as the first message is received, it sets it to passive mode so that subsequent messages have to be retrieved using the recv functions.
{keepalive, true}
Ensures that the connected socket sends keepalive messages when no data is being transferred. As “close socket” messages can be lost, this option ensures that the socket is closed if no response to the keepalive is received. By default, the flag is turned off.
{nodelay, true}
Will result in the socket immediately sending the packet, no matter how small. By default, this option is turned off and data is instead aggregated and sent in larger chunks.
{packet_size, Integer}
Sets the maximum allowed length of the body. If packets are larger than Integer, the packet is considered invalid.
There are other flags, all of which you can read about in the manual pages of the gen_tcp and inet modules.
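A common way of using {active, once} is to re-arm the socket after every message, so that data arrives as Erlang messages but never faster than the process asks for it. The sketch below illustrates the idea under a few assumptions: the module name tcp_active_once and the helper collect/2 are invented, the option is set with inet:setopts/2 on an already connected socket, and port 0 is used so that the operating system picks a free port. Client and server ends live in the same process purely to keep the example self-contained.

```erlang
-module(tcp_active_once).
-export([run/0]).

run() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port}   = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen, 1000),
    ok = gen_tcp:send(Client, <<"Hello Concurrent World">>),
    ok = gen_tcp:close(Client),
    Data = collect(Server, <<>>),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    Data.

%% Hypothetical helper: asks for exactly one message at a time until the
%% peer closes the connection, concatenating whatever arrives.
collect(Socket, Acc) ->
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Bin} ->
            collect(Socket, <<Acc/binary, Bin/binary>>);
        {tcp_closed, Socket} ->
            Acc
    after 1000 ->
        {timeout, Acc}
    end.
```

Calling tcp_active_once:run() should return <<"Hello Concurrent World">>. Because TCP is a byte stream, the data may arrive in one message or several, which is why collect/2 concatenates rather than assuming a single packet.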
The gen_tcp:listen/2 call returns immediately. It returns a socket identifier, Socket, which is passed to the following functions:
gen_tcp:accept(Socket)
gen_tcp:accept(Socket, TimeOut)
These calls suspend the process until a request to connect is made to that socket on that IP address. TimeOut is a value in milliseconds resulting in {error, timeout} being returned if no attempt is made to connect to that port. Connections are requested through the following call:
gen_tcp:connect(Address, Port, OptionList)
The Address is the IP address of the machine to which you are connecting, and Port is the port number of the corresponding socket. The OptionList is similar to the one defined in the gen_tcp:listen/2 call, containing the gen_udp:open/2 options together with the TCP-specific keepalive, nodelay, and packet_size discussed earlier.
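The interplay of listen, accept, connect, and passive-mode recv can be sketched in one function. This is an illustration under assumptions, not the chapter's example: the module name tcp_roundtrip is invented, port 0 replaces 1234 so that any free port is used, and both ends of the connection are handled by the same process, which works because the operating system queues the connection request until accept is called.

```erlang
-module(tcp_roundtrip).
-export([run/0]).

run() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port}   = inet:port(Listen),
    %% No client has connected yet, so accept/2 gives up after 100 ms.
    NoClient = gen_tcp:accept(Listen, 100),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen, 1000),
    ok = gen_tcp:send(Client, <<"Hello">>),
    %% Raw packet type: wait until exactly 5 bytes are available.
    {ok, Received} = gen_tcp:recv(Server, 5, 1000),
    ok = gen_tcp:close(Client),
    %% The peer has closed, so there is nothing more to read.
    Closed = gen_tcp:recv(Server, 0, 1000),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    {NoClient, Received, Closed}.
```

The expected result is {{error,timeout},<<"Hello">>,{error,closed}}. The accepted socket inherits the binary and passive settings from the listener socket, which is why no options are passed to accept.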
As the socket in the example is running in passive mode, you retrieve the socket messages using calls to the functions gen_tcp:recv/2 and gen_tcp:recv/3. Had the sockets been running in active mode, messages would have been sent to the process in the format {tcp, Socket, Packet} and {tcp_error, Socket, Reason}.
You close the socket using the gen_tcp:close(Socket) call. This can be made on either the client or the server side. In either case, if the socket on the other side is in active mode, the {tcp_closed, Socket} message will be sent to the process that owns it, effectively closing the socket.
The controlling process is generally the process that established a connection through calling one of gen_tcp:accept or gen_tcp:connect. To redirect messages elsewhere and pass the control to another process, the controlling process has to call gen_tcp:controlling_process(Socket, Pid).
In our previous example, the process calling gen_tcp:accept becomes the controlling process, and we spawned a new listener process. If instead we were to spawn a new process that would become the controlling process, with the listener process remaining the same, the code would look like this:
server() ->
    {ok, ListenSocket} = gen_tcp:listen(1234, [binary, {active, false}]),
    wait_connect(ListenSocket,0).

wait_connect(ListenSocket, Count) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    Pid = spawn(?MODULE, get_request, [Socket, [], Count]),
    gen_tcp:controlling_process(Socket, Pid),
    wait_connect(ListenSocket, Count+1).
In recent Erlang/OTP releases, it is possible to have multiple acceptors against the same listener socket. This could be expected to give better throughput than spawning a new acceptor each time. We leave this modification of the example as an exercise for you!
The inet module contains generic functions that will work with sockets regardless of whether you are using TCP or UDP. They provide generic access to the sockets as well as useful library functions. Without going into too much detail about what is available, in this section we will demonstrate the most commonly used functions by showing their use in the shell. If you need more information, you can look it up in the inet module’s manual page. The manual page also contains all of the POSIX error definitions the socket operations will return.
If you need to change your socket options once you’ve started your socket, you would use the call inet:setopts(Socket, OptionList), where OptionList is a list of tagged tuples containing the options described in this chapter together with other, less frequently used ones listed in the inet module’s manual page.
To retrieve the configuration parameters of an existing socket, you would use inet:getopts(Socket, Options) where Options is a list of atoms denoting the option values you are interested in retrieving. The function returns a tagged list where, if the underlying operating system or the socket type you are using does not support that particular option, it will be omitted from the result.
1> {ok, Socket} = gen_udp:open(1234).
{ok,#Port<0.468>}
2> inet:getopts(Socket, [active, exit_on_close, header, nodelay]).
{ok,[{active,true},{exit_on_close,true},{header,0}]}
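Changing options on a live socket and reading them back might look like the following sketch. The module name inet_opts is an assumption for this illustration, as is the use of the mode option (whose value is list or binary) to inspect the delivery format; port 0 again lets the operating system choose a free port.

```erlang
-module(inet_opts).
-export([run/0]).

%% Reads the active and mode options, changes both, and reads them again.
run() ->
    {ok, Socket} = gen_udp:open(0),
    {ok, Before} = inet:getopts(Socket, [active, mode]),
    ok = inet:setopts(Socket, [{mode, binary}, {active, false}]),
    {ok, After}  = inet:getopts(Socket, [active, mode]),
    ok = gen_udp:close(Socket),
    {summary(Before), summary(After)}.

%% Hypothetical helper: picks the two values out of the tagged list.
summary(Opts) ->
    {proplists:get_value(active, Opts), proplists:get_value(mode, Opts)}.
```

The result should be {{true,list},{false,binary}}: the socket starts out with the defaults described earlier (active mode, packets delivered as lists) and ends up passive, delivering binaries.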
Sockets will gather statistics about the data they send and receive. Received counters are prefixed with recv_, and sent counters with send_. They can be retrieved for the following values:
avg
The average size of the packets
cnt
The number of packets that have been sent or received
dvi
The packet size deviation of bytes sent or received by the socket
max
The size of the largest packet
oct
The number of bytes sent or received by the socket
In this example, our UDP socket receives four packets and sends none. The output is:
3> flush().
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
ok
4> inet:getstat(Socket).
{ok,[{recv_oct,44},
     {recv_cnt,4},
     {recv_max,11},
     {recv_avg,11},
     {recv_dvi,0},
     {send_oct,0},
     {send_cnt,0},
     {send_max,0},
     {send_avg,0},
     {send_pend,0}]}
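If you are interested in only a few counters, inet:getstat/2 takes a list of the ones you want. The sketch below reproduces the four-packet scenario in a single node; the module name udp_stats is invented for the illustration, port 0 is used instead of fixed port numbers, and the receiving socket is passive so that the function knows all four packets have been read before asking for the statistics.

```erlang
-module(udp_stats).
-export([run/0]).

%% Sends four 11-byte packets over loopback and returns the receiving
%% socket's packet count, byte count, and largest packet size.
run() ->
    {ok, Receiver} = gen_udp:open(0, [{active, false}]),
    {ok, Sender}   = gen_udp:open(0),
    {ok, Port}     = inet:port(Receiver),
    Four = lists:seq(1, 4),
    lists:foreach(
      fun(_) -> ok = gen_udp:send(Sender, {127,0,0,1}, Port, "Hello World") end,
      Four),
    lists:foreach(
      fun(_) -> {ok, _} = gen_udp:recv(Receiver, 0, 1000) end,
      Four),
    {ok, Stats} = inet:getstat(Receiver, [recv_cnt, recv_oct, recv_max]),
    ok = gen_udp:close(Sender),
    ok = gen_udp:close(Receiver),
    {proplists:get_value(recv_cnt, Stats),
     proplists:get_value(recv_oct, Stats),
     proplists:get_value(recv_max, Stats)}.
```

In line with the shell output above, you would expect {4,44,11}: four packets, 44 bytes in total, none larger than 11 bytes.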
Some of the functions you might find useful and should try in the shell follow. Some of them will return the hostent record, defined in the inet.hrl include file. Remember that you can load record definitions using the shell command rr("../lib/kernel-2.13/include/inet.hrl").
inet:peername(Socket).
inet:gethostname().
inet:getaddr(Host, Family).
inet:gethostbyaddr(Address).
inet:gethostbyname(Name).
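Three of these functions can be exercised without any external network access, as the following sketch shows. The module name inet_lookup is an assumption for the illustration, and a loopback TCP connection on an OS-assigned port (port 0) is set up only so that inet:peername/1 has a connected socket to inspect.

```erlang
-module(inet_lookup).
-export([run/0]).

run() ->
    {ok, Hostname} = inet:gethostname(),            % name of the local host
    %% getaddr/2 also accepts hostnames; a dotted string is used here so
    %% that the result does not depend on the resolver configuration.
    {ok, Addr}     = inet:getaddr("127.0.0.1", inet),
    {ok, Listen}   = gen_tcp:listen(0, [{active, false}]),
    {ok, Port}     = inet:port(Listen),
    {ok, Client}   = gen_tcp:connect({127,0,0,1}, Port, [{active, false}]),
    {ok, Peer}     = inet:peername(Client),         % address of the far end
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Listen),
    {Hostname, Addr, Peer =:= {{127,0,0,1}, Port}}.
```

The first element is the host name as a string, the second should be {127,0,0,1}, and the third should be true, since the peer of the client socket is the listener's address and port.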
Finally, a useful command to know, especially if you are having problems trying to open, send, or receive data from a socket, is inet:i(). It lists all TCP and UDP sockets, including those that the Erlang runtime system uses as well as those you have created.
In our example, we start a distributed Erlang node. Running the command shows us two sockets: the TCP listener socket waiting for inbound connections, and a socket connected to the epmd port mapper daemon:
(bar@Vaio)1> inet:i().
Port Module   Recv Sent Owner    Local Address   Foreign Address State
108  inet_tcp 0    0    <0.62.0> *:54843         *:*             ACCEPTING
110  inet_tcp 4    18   <0.60.0> localhost:54844 localhost:4369  CONNECTED
Port Module   Recv Sent Owner    Local Address   Foreign Address State
ok
This chapter covered the low-level mechanisms on which to build more complex protocols and layers. The Inets application, which comes as part of the OTP distribution, is a container for IP-based protocol implementations. It includes a web server called Inets as well as HTTP and FTP clients. It also has a Trivial File Transfer Protocol (TFTP) client and server. For more information on the Inets application, refer to its user guide and reference manual.
A part of distributed Erlang is the Secure Sockets Layer (SSL) application, providing encrypted communication over sockets. Erlang’s SSL application is based on the open source OpenSSL toolkit. You can read more about this application in the user guides and manuals that come with the Erlang distribution.
If you are interested in reading more about other Internet Protocol implementations, two good books are Internet Core Protocols by Eric Hall (O’Reilly) and TCP Illustrated by W. Richard Stevens (Addison-Wesley Professional Computing Series).
Open a listener socket on your local machine. Start your web browser and send it a request for a web page. Print the contents of the request and study them. How long before the socket connection is closed? What happens if you shut down your browser?
Change your browser proxy settings to point to your local machine on port 1500.[35] Start a listener socket on that port, and accept any connection coming to it. From your web browser, try to download any web page. The request should be forwarded to your socket connection. Sniff the contents of the request and extract the URL of the web page your browser is trying to load.
Using the HTTP client from the Inets application, retrieve the contents of the page you are trying to load and send it unchanged to the open socket connection. Hint: if you are not behind a proxy or firewall, http:start() and http:request("http://www.erlang.org") should do the job. Before using them, however, ensure that you read through the HTTP manual pages that come with the Erlang distribution.
Write a module that contains code for a peer-to-peer transport layer. You will need a process which, when started, either waits for a socket connection to come in on port 1234 or waits for the function peer:connect(IpAddress) to be called. If the latter is called, it will try to connect to port 1234 on that address. Once the connection has been established, you should be able to use the function peer:send(String) to send data to your peer. Log what is sent to file and print it to the shell. The functions you should export are:
peer:start() -> ok | {error, already_started}
peer:connect(IpAddress) -> ok | {error, Reason}
peer:send(String) -> ok | {error, not_connected}
peer:stop() -> ok | {error, not_started}
The tricky part of this exercise, which will require some careful thought, is the fact that your process will be speaking to a copy of itself on another machine. By that, we mean both processes will be running the same code base.
If you are worried about Big Brother watching you, you can encrypt the packets you send using the crypto module.
[35] Depending on the rights of the user under which you are running your Erlang node, you might not be able to open ports that are either reserved or already taken. If that is the case, pick a higher number.