Chapter 15. Socket Programming

Although distributed Erlang might be a first step in allowing programs on remote machines to communicate with each other, we sometimes have to rely on lower-level mechanisms and standardized protocols. Sockets allow programs written in any language to exchange data on different computers by exchanging byte streams transmitted using the protocols of the Internet Protocol (IP) Suite.

Whereas sockets are used to create a byte-oriented communication stream between programs possibly running on different machines, ports, which we cover in the next chapter, will do the same for programs running on the same machine. Byte streams, which in Erlang can be viewed as either binaries or integer lists, often follow standards and application-level protocols that allow programs written independently of each other to interact with each other.

Examples of socket-based communication include communication between web browsers and servers, instant messaging (IM) clients, email servers and clients, and peer-to-peer applications. The Erlang distribution itself is based on nodes communicating with each other through sockets.

Erlang can hide the raw packets from the user, providing user-friendly APIs to User Datagram Protocol (UDP) and Transmission Control Protocol (TCP). These are contained in two library modules: gen_udp, which implements a connectionless, less reliable, packet-based communication protocol, and gen_tcp, which provides a connection-oriented communication channel. Both of these protocols communicate over IP.

User Datagram Protocol

User Datagram Protocol (UDP) is a connectionless protocol. If a UDP packet is sent, and a socket happens to be listening on the other end, it will pick up the packet. UDP provides little error recovery, leaving it up to the application to ensure packet reception and consistency. UDP packets could take different routes, and as a result could be received in a different order from which they were sent. They can also be lost en route, and as the receiving end does not acknowledge their arrival, their loss happens “silently.” Although the protocol might not be reliable, the overhead of using it is small, making it ideal for transmissions in which you would rather drop a packet than wait for it to be re-sent. For example, errors and alarms are often broadcast in the hope that a socket on the other end picks them up.

In Erlang, UDP is implemented in the gen_udp module. Let’s get acquainted with it through an example. Start two Erlang nodes on the same host and make sure you execute the commands in the following order:

  1. In the first Erlang node, open a UDP socket on port 1234.

  2. In the second Erlang node, open a UDP socket on port 1235.

  3. Use the socket in the second node to send the binary <<"Hello World">> to the socket listening on port 1234 on the local host IP address 127.0.0.1.

  4. Use the socket in the second node to send the string "Hello World" to the same IP address and port.

  5. In the first node, the process that opened (and owns) the socket should have received both of the "Hello World" messages. Retrieve them using the flush() shell command.

  6. Close both sockets and thus free the port numbers.

In the Erlang shell on the first node, the commands and output would look like this:

1> {ok, Socket} = gen_udp:open(1234).
{ok,#Port<0.576>}
2> flush().
Shell got {udp,#Port<0.576>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.576>,{127,0,0,1},1235,"Hello World"}
ok
3> gen_udp:close(Socket).
ok

You should keep in mind that once you’ve opened the socket, you need to send messages from the second node to the first. In the Erlang shell on the second node, the commands would look like this:

1> {ok, Socket} = gen_udp:open(1235).
{ok,#Port<0.203>}
2> gen_udp:send(Socket, {127,0,0,1}, 1234, <<"Hello World">>).
ok
3> gen_udp:send(Socket, {127,0,0,1}, 1234, "Hello World").
ok
4> gen_udp:close(Socket).
ok

Pay special attention to the format of the UDP messages sent to the process that owns the socket, and the fact that it receives both messages as lists, even though the first message was sent as a binary. We will explain all of this when we look at the functions involved in more detail.

If you are trying the example on separate computers, you should replace the local host IP address with the address of the computer to which you want to send messages, and ensure that neither firewall is blocking the relevant ports.
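
The same exchange can also be tried from a single node. The following is a minimal sketch, not part of the original example: the module name udp_demo is hypothetical, and it assumes that port 0 may be used to let the operating system pick free ports, which are then looked up with inet:port/1.

```erlang
-module(udp_demo).
-export([run/0]).

%% Opens two UDP sockets in the same node, sends one packet from the
%% second to the first, and returns the payload as it was delivered.
run() ->
  %% Port 0 asks the operating system for any free port.
  {ok, Receiver} = gen_udp:open(0),
  {ok, Sender} = gen_udp:open(0),
  {ok, RecvPort} = inet:port(Receiver),
  {ok, SendPort} = inet:port(Sender),
  ok = gen_udp:send(Sender, {127,0,0,1}, RecvPort, <<"Hello World">>),
  Result =
    receive
      %% Default options: active mode, payload delivered as a list.
      {udp, Receiver, {127,0,0,1}, SendPort, Packet} -> {ok, Packet}
    after 1000 ->
      {error, timeout}
    end,
  ok = gen_udp:close(Sender),
  ok = gen_udp:close(Receiver),
  Result.
```

Calling udp_demo:run() should return {ok,"Hello World"}: the binary that was sent arrives as a list because the receiving socket was opened with the default options.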

As you can see in Figure 15-1, clients on other hosts send their UDP packets to a listener socket which forwards them to an Erlang process. At any one time, only one process is allowed to receive packets from a particular socket. This process is called the controlling process.

Figure 15-1. UDP listener sockets

To open a socket, on both the client and the server side, you use the following function calls:

gen_udp:open(Port)
gen_udp:open(Port, OptionList)

The Port is an integer denoting the listening port number of the socket. It is used by clients who need to send messages to the socket. The OptionList contains configuration options which allow you to override the default values. The most useful parameters include:

list

Forwards all messages in the packet as a list of integers, regardless of how they are sent. It is the default value if no option is chosen.

binary

Forwards all messages in the packet as a binary.

{header, Size}

Can be used if packets are being received as binaries. It splits the message into a list of size Size, the header, and the message (a binary). This option was particularly useful before the introduction of bit syntax and pattern matching on binaries, as described in Chapter 9. Repeating the preceding two-node UDP example, but with the first socket opened using the following,

{ok, Socket} = gen_udp:open(1234,[binary,{header,2}]).

and sending [0,10|"Hello World"] will result in the first message being received as follows:

2> flush().
Shell got {udp,#Port<0.439>,{127,0,0,1},1235,[0,10|<<"Hello World">>]}
ok

In the preceding code, the message is split into the (two-integer) header and the message.

{active, true}

Ensures that all the messages received from the socket are forwarded to the process that owns the socket as Erlang messages of the form {udp, Socket, IP, PortNo, Packet}. Socket is the receiving socket, IP and PortNo are the IP address and port number of the sending socket, and Packet is the message itself. This active mode is the default value when opening a socket.

{active, false}

Sets the socket to passive mode. Instead of being sent, messages from the socket have to be retrieved using the gen_udp:recv/2 and gen_udp:recv/3 calls.

{active, once}

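Will forward the first message the socket receives to the process that owns the socket, and then switch the socket to passive mode. Subsequent messages have to be retrieved using the recv functions, unless the socket is set to {active, once} again.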

{ip, ip_address()}

Is used when opening a socket on a computer that has several network interfaces defined. This option specifies which of the interfaces the socket should use.

inet6

Will set up the socket for IPv6. inet will set it up for IPv4, which is also the default value.
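
To see how the binary and {active, once} options combine, consider the following sketch. It is an illustration under our own assumptions rather than code from the example above: the module name udp_once and the helper collect/3 are hypothetical, and instead of calling the recv functions it re-arms the socket with inet:setopts/2 after each message.

```erlang
-module(udp_once).
-export([run/0]).

%% Receives two packets on a socket opened with {active, once},
%% re-arming the socket after every delivered message.
run() ->
  {ok, Receiver} = gen_udp:open(0, [binary, {active, once}]),
  {ok, Sender} = gen_udp:open(0),
  {ok, Port} = inet:port(Receiver),
  ok = gen_udp:send(Sender, {127,0,0,1}, Port, "first"),
  ok = gen_udp:send(Sender, {127,0,0,1}, Port, "second"),
  Packets = collect(Receiver, 2, []),
  ok = gen_udp:close(Sender),
  ok = gen_udp:close(Receiver),
  Packets.

collect(_Socket, 0, Acc) ->
  lists:reverse(Acc);
collect(Socket, N, Acc) ->
  receive
    {udp, Socket, _IP, _PortNo, Packet} ->
      %% The socket is now passive; ask for exactly one more message.
      ok = inet:setopts(Socket, [{active, once}]),
      collect(Socket, N - 1, [Packet | Acc])
  after 1000 ->
    lists:reverse(Acc)
  end.
```

Because the socket was opened with the binary option, both payloads come back as binaries even though they were sent as strings.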

The call to open returns either {ok, Socket} or {error, Reason}, where Socket is the identifier for the socket opened and Reason is one of several POSIX error codes returned as an atom. They are listed in the inet manual page of the Erlang runtime system documentation. The most common errors you will come across are eaddrinuse if the address is already in use, eaddrnotavail if you are using a port in a range your OS has reserved, and eacces if you don’t have permission to open the socket.
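
A quick way to provoke one of these errors is to open the same port twice. The sketch below is only an illustration (the module name udp_errors and the return values opened and refused are our own). On most systems we would expect the second call to fail with eaddrinuse, but the exact reason depends on the operating system.

```erlang
-module(udp_errors).
-export([open_twice/0]).

%% Tries to open a second UDP socket on a port that is already taken
%% and reports what gen_udp:open/1 returned.
open_twice() ->
  {ok, First} = gen_udp:open(0),
  {ok, Port} = inet:port(First),
  Result =
    case gen_udp:open(Port) of
      {ok, Second} ->
        %% Not expected, but tidy up if the OS allowed it.
        ok = gen_udp:close(Second),
        opened;
      {error, Reason} ->
        %% Reason is a POSIX error code returned as an atom.
        {refused, Reason}
    end,
  ok = gen_udp:close(First),
  Result.
```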

The gen_udp:close(Socket) call closes the socket and frees the port number allocated to it. It returns the atom ok.

If you want to send messages, you use the following function:

gen_udp:send(Socket, Address, Port, Packet)

The Socket is the UDP socket on the local machine from which the message is to be sent. The Address can be entered as a string containing the hostname or IP address, an atom containing the local hostname, or a tuple containing the integers making up the IP address. The Port is the port number on the receiving host, and the Packet is the content of the message, as a sequence of bytes, which can be either a list of integers or a binary.

When the socket is opened in passive mode, the connected process has to explicitly retrieve the packet from the socket using these function calls:

gen_udp:recv(Socket, Length)
gen_udp:recv(Socket, Length, Timeout)

Length is relevant only to the raw transmission mode in TCP, and so it is ignored in this case. If a packet has been received within the timeout, {ok, {Ip, PortNo, Packet}} is returned. If the bytes are not received within Timeout milliseconds {error, timeout} will be returned. If the receiving process calls gen_udp:recv when not in passive mode, expect to see the {error, einval} error, which is the POSIX error code denoting an invalid argument.
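
The passive-mode calls can be tried out with a sketch along these lines. The module name udp_passive is hypothetical, and the timeouts of 1,000 and 100 milliseconds are arbitrary values chosen for the illustration.

```erlang
-module(udp_passive).
-export([run/0]).

%% Retrieves one packet from a passive socket, then shows what
%% happens when nothing else arrives within the timeout.
run() ->
  {ok, Receiver} = gen_udp:open(0, [binary, {active, false}]),
  {ok, Sender} = gen_udp:open(0),
  {ok, Port} = inet:port(Receiver),
  ok = gen_udp:send(Sender, {127,0,0,1}, Port, <<"ping">>),
  %% Length (0 here) is ignored for UDP; wait at most one second.
  {ok, {_IP, _PortNo, Packet}} = gen_udp:recv(Receiver, 0, 1000),
  %% Nothing else was sent, so this call gives up after 100 ms.
  Second = gen_udp:recv(Receiver, 0, 100),
  ok = gen_udp:close(Sender),
  ok = gen_udp:close(Receiver),
  {Packet, Second}.
```

The call udp_passive:run() should return {<<"ping">>,{error,timeout}}.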

The most common use of UDP is in the implementation of Simple Network Management Protocol (SNMP). SNMP is a standard often used to monitor devices and systems across IP-based networks. You can read more about the Erlang SNMP application in the documentation provided with the runtime system.

Transmission Control Protocol

Transmission Control Protocol, or TCP for short, is a connection-oriented protocol allowing peers to exchange streams of data. Unlike UDP, with TCP packet reception is guaranteed and packets are received in the same order they are sent. Common uses of TCP include HTTP requests, peer-to-peer applications, and IM client/server connections. Erlang distribution is built on top of TCP. Just as with UDP, neither the client nor the server has to be implemented in Erlang.

On an architectural level, the main difference between TCP and UDP is that once you’ve opened a socket connection using TCP, it is kept open until either side closes it or it terminates because of an error. When setting up a connection, you would often spawn a new process for every request, keeping it alive for as long as the request is being handled.

How does this work in practice? Say you have a listener process whose task is to wait for incoming TCP requests. As soon as a request comes in, the process that acknowledges the connection request becomes the accept process. There are two mechanisms for defining the accept process:

  • The first option is to spawn a new process which becomes the accept process, while the listener goes back and listens for a new connection request.

  • The second option, as shown in Figure 15-2, is to make the listener process the accept process, and spawn a new process which becomes the new listener.

Figure 15-2. The listener and accept processes

If the socket is opened in active mode, the process that owns the socket will receive messages of the form {tcp, Socket, Packet} where Socket is the receiving socket and Packet is the message itself.

If you are working in passive mode, just like with UDP, you need to use the following:

gen_tcp:recv(Socket, Length)
gen_tcp:recv(Socket, Length, Timeout)

The call will return a tuple of the format {ok, Packet}. In these calls, a nonzero value of Length denotes the number of bytes the socket will wait for before returning the message. If the value is 0, everything available is returned. If the sender socket is closed and fewer than Length bytes have been buffered, they are discarded. The Length option is relevant only if the packet type is raw.
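
The effect of Length can be observed over a loopback connection. The following sketch is our own illustration (the module name tcp_length is hypothetical). It connects a client and a server socket inside one process, which works because the connection request is queued on the listener socket until it is accepted.

```erlang
-module(tcp_length).
-export([run/0]).

%% Sends 11 bytes over a loopback connection and reads them back in
%% two passive-mode calls with different Length values.
run() ->
  Opts = [binary, {packet, raw}, {active, false}],
  {ok, Listen} = gen_tcp:listen(0, Opts),
  {ok, Port} = inet:port(Listen),
  {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, Opts),
  {ok, Server} = gen_tcp:accept(Listen, 1000),
  ok = gen_tcp:send(Client, <<"Hello World">>),
  %% A nonzero Length waits for exactly that many bytes.
  {ok, First} = gen_tcp:recv(Server, 5, 1000),
  %% A Length of 0 returns whatever is available.
  {ok, Rest} = gen_tcp:recv(Server, 0, 1000),
  ok = gen_tcp:close(Client),
  ok = gen_tcp:close(Server),
  ok = gen_tcp:close(Listen),
  {First, Rest}.
```

The first element of the result is always the five bytes <<"Hello">>. The second holds the bytes that were available when the second call was made.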

Note

Using passive mode is a good way to ensure that your system does not get flooded with requests. It is a common design pattern to spawn a new process that handles the request for each message received. In extreme cases under heavy sustained traffic, the virtual machine risks running out of memory as the system gets flooded by requests (and hence, processes). By using sockets in passive mode, the underlying TCP buffer can be used to throttle the requests and reject messages on the client side. The best way to know whether you need to throttle on the TCP level and whether memory is an issue during traffic bursts is through extensive stress testing of your system.

A TCP Example

Let’s start with a simple example of how you can use TCP sockets. The client, given a host and a binary, opens a socket connection on port 1234. Using the bit syntax, it breaks the binary into chunks of 100 bytes and sends them over in separate packets.

client(Host, Data) ->
  {ok, Socket} = gen_tcp:connect(Host, 1234, [binary, {packet, 0}]),
  send(Socket, Data),
  ok = gen_tcp:close(Socket).

You might recall from the description of binaries in Chapter 9 that the expression <<Chunk:100/binary, Rest/binary>> will bind the first 100 bytes of the binary to Chunk and what remains to Rest. When the binary contains fewer than 100 bytes, pattern matching on the first clause of the send/2 call will fail. Whatever remains of the possibly empty binary will match the second clause, and so its contents are sent to the server, after which the Socket connection is closed.

send(Socket, <<Chunk:100/binary, Rest/binary>>) ->
  gen_tcp:send(Socket, Chunk),
  send(Socket, Rest);
send(Socket, Rest) ->
  gen_tcp:send(Socket, Rest).
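
To see what the two clauses of send/2 do without opening a socket, the same pattern can be applied in a pure function. This is a sketch for experimentation only: the module name chunker and the function chunks/2 are hypothetical, and the chunk size is passed as an argument instead of being fixed at 100.

```erlang
-module(chunker).
-export([chunks/2]).

%% Splits Bin into binaries of Size bytes. As in send/2, the final
%% element is whatever remains, which may be shorter or empty.
chunks(Bin, Size) when is_binary(Bin), is_integer(Size), Size > 0 ->
  chunks(Bin, Size, []).

chunks(Bin, Size, Acc) ->
  case Bin of
    <<Chunk:Size/binary, Rest/binary>> ->
      %% At least Size bytes left: peel off one chunk and continue.
      chunks(Rest, Size, [Chunk | Acc]);
    Rest ->
      %% Fewer than Size bytes left (possibly none): last element.
      lists:reverse([Rest | Acc])
  end.
```

For example, chunker:chunks(<<"abcdefgh">>, 3) returns [<<"abc">>,<<"def">>,<<"gh">>], while chunker:chunks(<<"abcdef">>, 3) returns [<<"abc">>,<<"def">>,<<>>], showing the empty binary that the second clause handles when the data divides evenly.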

The server side has a listener process waiting for a client connection. When the request arrives, the listener process becomes the accept process and is ready to receive binaries in passive mode. A new listener process is spawned and waits for the next connection request. The accept process continues receiving data from the client, adding it to a list until the socket is closed, after which it saves the data to a file.

server() ->
  {ok, ListenSocket} = gen_tcp:listen(1234, [binary, {active, false}]),
  wait_connect(ListenSocket,0).

wait_connect(ListenSocket, Count) ->
  {ok, Socket} = gen_tcp:accept(ListenSocket),
  spawn(?MODULE, wait_connect, [ListenSocket, Count+1]),
  get_request(Socket, [], Count).

get_request(Socket, BinaryList, Count) ->
  case gen_tcp:recv(Socket, 0, 5000) of
    {ok, Binary} ->
       get_request(Socket, [Binary|BinaryList], Count);
    {error, closed} ->
       handle(lists:reverse(BinaryList), Count)
  end.

handle(Binary, Count) ->
  {ok, Fd} = file:open("log_file_"++integer_to_list(Count), write),
  file:write(Fd, Binary),
  file:close(Fd).

Note how the get_request/3 function collects the data that the client sent in chunks of up to 100 bytes. Because TCP delivers a byte stream, a single call to gen_tcp:recv/3 with a Length of 0 returns whatever is available, which may be several of those chunks combined. Once all of the data has been received and the socket is closed, you need to reverse the list in which you stored it, as the first chunk you should be writing is now the last element of the list. You write the chunks to a file, and when done, you close the file, releasing its file descriptor.

To run the example, all you need to do is start the server using tcp:server() and the client using the following:

tcp:client({127,0,0,1}, <<"Hello Concurrent World">>).

You can see that many of the commands are similar to the ones we used in the earlier UDP example. The major difference is the following call:

gen_tcp:listen(PortNumber, Options)

This starts a listener socket, which then waits for incoming connections. The call takes the same options as the call to gen_udp:open/2 described earlier, as well as the following TCP-specific ones:

{active, true}

Ensures that all messages received from the socket are forwarded as Erlang messages to the process that owns the socket. This active mode is the default value when opening a socket.

{active, false}

Sets the socket to passive mode. Messages received from the socket are buffered, and the process must retrieve them through the gen_tcp:recv/2 and gen_tcp:recv/3 calls.

{active, once}

Will set the socket to active mode, but as soon as the first message is received, it sets it to passive mode so that subsequent messages have to be retrieved using the recv functions.

{keepalive, true}

Ensures that the connected socket sends keepalive messages when no data is being transferred. As “close socket” messages can be lost, this option ensures that the socket is closed if no response to the keepalive is received. By default, the flag is turned off.

{nodelay, true}

Will result in the socket immediately sending the data, no matter how small the packet. By default, this option is turned off and data is instead aggregated and sent in larger chunks.

{packet_size, Integer}

Sets the maximum allowed length of the body. If packets are larger than Integer, the packet is considered invalid.

There are other flags, all of which you can read about in the manual pages of the gen_tcp and inet modules.

The gen_tcp:listen/2 call returns immediately. On success, it returns {ok, Socket}, where Socket is the identifier of the listener socket, which is passed to the following functions:

gen_tcp:accept(Socket)
gen_tcp:accept(Socket, TimeOut)

These calls suspend the process until a request to connect is made to that socket on that IP address. TimeOut is a value in milliseconds resulting in {error, timeout} being returned if no attempt is made to connect to that port within that time. Connections are requested through the following call:

gen_tcp:connect(Address, Port, OptionList)

The Address is the IP address of the machine to which you are connecting, and Port is the port number of the corresponding socket. The OptionList is similar to the one defined in the gen_tcp:listen/2 call, containing the gen_udp:open/2 options together with the TCP-specific keepalive, nodelay, and packet_size discussed earlier.

As the socket in the example is running in passive mode, you retrieve the socket messages using calls to the functions gen_tcp:recv/2 and gen_tcp:recv/3. Had the sockets been running in active mode, messages would have been sent to the process in the format {tcp, Socket, Packet} and {tcp_error, Socket, Reason}.
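
For comparison, here is a sketch of what receiving in active mode could look like. It is not a rewrite of the example server: the module name tcp_active and the helper loop/2 are hypothetical, the client is a throwaway process spawned in the same node, and the accepted socket is switched to active mode with inet:setopts/2.

```erlang
-module(tcp_active).
-export([run/0]).

%% Accepts one loopback connection and collects everything the
%% client sends as {tcp, Socket, Packet} messages.
run() ->
  {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
  {ok, Port} = inet:port(Listen),
  %% Throwaway client: connects, sends two binaries, and closes.
  spawn(fun() ->
          {ok, S} = gen_tcp:connect({127,0,0,1}, Port, [binary]),
          ok = gen_tcp:send(S, <<"Hello ">>),
          ok = gen_tcp:send(S, <<"World">>),
          ok = gen_tcp:close(S)
        end),
  {ok, Socket} = gen_tcp:accept(Listen, 1000),
  %% From here on, data arrives in the mailbox of this process.
  ok = inet:setopts(Socket, [{active, true}]),
  Data = loop(Socket, []),
  ok = gen_tcp:close(Listen),
  Data.

loop(Socket, Acc) ->
  receive
    {tcp, Socket, Bin} ->
      loop(Socket, [Bin | Acc]);
    {tcp_closed, Socket} ->
      %% The peer closed the connection: join what was collected.
      iolist_to_binary(lists:reverse(Acc));
    {tcp_error, Socket, Reason} ->
      {error, Reason}
  after 2000 ->
    {error, timeout}
  end.
```

The call tcp_active:run() should return <<"Hello World">>. How the bytes are split across individual {tcp, Socket, Packet} messages is not guaranteed, which is why the sketch joins them at the end.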

You close the socket using the gen_tcp:close(Socket) call. This can be made on either the client or the server side. In either case, if the socket on the other side is in active mode, the {tcp_closed, Socket} message will be sent to its controlling process; in passive mode, calls to the recv functions return {error, closed}.

The controlling process is generally the process that established a connection through calling one of gen_tcp:accept or gen_tcp:connect. To redirect messages elsewhere and pass the control to another process, the controlling process has to call gen_tcp:controlling_process(Socket, Pid).

In our previous example, the process calling gen_tcp:accept becomes the controlling process, and we spawned a new listener process. If instead we were to spawn a new process that would become the controlling process, with the listener process remaining the same, the code would look like this:

server() ->
  {ok, ListenSocket} = gen_tcp:listen(1234, [binary, {active, false}]),
  wait_connect(ListenSocket,0).

wait_connect(ListenSocket, Count) ->
  {ok, Socket} = gen_tcp:accept(ListenSocket),
  Pid = spawn(?MODULE, get_request, [Socket, [], Count]),
  gen_tcp:controlling_process(Socket, Pid),
  wait_connect(ListenSocket, Count+1).

In recent Erlang/OTP releases, it is possible to have multiple acceptors against the same listener socket. This could be expected to give better throughput than spawning a new acceptor each time. We leave this modification of the example as an exercise for you!

The inet Module

The inet module contains generic functions that will work with sockets regardless of whether you are using TCP or UDP. They provide generic access to the sockets as well as useful library functions. Without going into too much detail about what is available, in this section we will demonstrate the most commonly used functions by showing their use in the shell. If you need more information, you can look it up in the inet module’s manual page. The manual page also contains all of the POSIX error definitions the socket operations will return.

If you need to change your socket options once you’ve started your socket, you would use the call inet:setopts(Socket, OptionList), where OptionList is a list of tagged tuples containing the options described in this chapter together with other, less frequently used ones listed in the inet module’s manual page.

To retrieve the configuration parameters of an existing socket, you would use inet:getopts(Socket, Options) where Options is a list of atoms denoting the option values you are interested in retrieving. The function returns a tagged list where, if the underlying operating system or the socket type you are using does not support that particular option, it will be omitted from the result.

1> {ok, Socket} = gen_udp:open(1234).
{ok,#Port<0.468>}
2> inet:getopts(Socket, [active, exit_on_close, header, nodelay]).
{ok,[{active,true},{exit_on_close,true},{header,0}]}
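
Changing an option and reading it back can be sketched as follows. The module name opts_demo is hypothetical, and the sketch assumes the mode option (with the values list and binary) described in the inet manual page.

```erlang
-module(opts_demo).
-export([run/0]).

%% Reads the active option, changes two options on the open socket,
%% and reads them back.
run() ->
  {ok, Socket} = gen_udp:open(0),
  {ok, Before} = inet:getopts(Socket, [active]),
  %% Switch to passive mode and binary delivery after the fact.
  ok = inet:setopts(Socket, [{active, false}, {mode, binary}]),
  {ok, After} = inet:getopts(Socket, [active, mode]),
  ok = gen_udp:close(Socket),
  {Before, After}.
```

Here Before should be [{active,true}], the default, while After should report the socket as passive and in binary mode.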

Sockets will gather statistics about the data they send and receive. Received counters are prefixed with recv_, and sent counters with send_. The following statistics can be retrieved:

avg

The average size of the packets

cnt

The number of packets that have been sent or received

dvi

The packet size deviation of bytes sent or received by the socket

max

The size of the largest packet

oct

The number of bytes sent or received by the socket

In this example, our UDP socket receives four packets and sends none. The output is:

3> flush().
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
Shell got {udp,#Port<0.468>,{127,0,0,1},1235,"Hello World"}
ok
4> inet:getstat(Socket).
{ok,[{recv_oct,44},
     {recv_cnt,4},
     {recv_max,11},
     {recv_avg,11},
     {recv_dvi,0},
     {send_oct,0},
     {send_cnt,0},
     {send_max,0},
     {send_avg,0},
     {send_pend,0}]}
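
If you are interested in only a few of the counters, inet:getstat/2 takes a list of the ones to return. The sketch below reproduces the four-packet scenario inside a single node. The module name stat_demo and the helper wait_for/2 are hypothetical names chosen for the illustration.

```erlang
-module(stat_demo).
-export([run/0]).

%% Sends four 11-byte packets to a local socket and returns its
%% received packet count and received byte count.
run() ->
  {ok, Receiver} = gen_udp:open(0),
  {ok, Sender} = gen_udp:open(0),
  {ok, Port} = inet:port(Receiver),
  lists:foreach(
    fun(_) ->
      ok = gen_udp:send(Sender, {127,0,0,1}, Port, "Hello World")
    end,
    lists:seq(1, 4)),
  %% Wait until all four packets have been delivered as messages.
  ok = wait_for(Receiver, 4),
  {ok, Stats} = inet:getstat(Receiver, [recv_cnt, recv_oct]),
  ok = gen_udp:close(Sender),
  ok = gen_udp:close(Receiver),
  {proplists:get_value(recv_cnt, Stats),
   proplists:get_value(recv_oct, Stats)}.

wait_for(_Socket, 0) ->
  ok;
wait_for(Socket, N) ->
  receive
    {udp, Socket, _IP, _PortNo, _Packet} -> wait_for(Socket, N - 1)
  after 1000 ->
    {error, timeout}
  end.
```

The result should be {4,44}, matching the recv_cnt and recv_oct values in the shell output.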

Some of the functions you might find useful and should try in the shell follow. Some of them will return the hostent record, defined in the inet.hrl include file. Remember that you can load record definitions using the shell command rr("../lib/kernel-2.13/include/inet.hrl").

inet:peername(Socket).
inet:gethostname().
inet:getaddr(Host, Family).
inet:gethostbyaddr(Address).
inet:gethostbyname(Name).
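
As a starting point for experimenting, the following sketch calls two of these functions. It is an illustration only: the module name inet_helpers is hypothetical, and a loopback TCP connection is set up solely so that inet:peername/1 has a connected socket to inspect.

```erlang
-module(inet_helpers).
-export([run/0]).

%% Returns the local hostname together with the address and port of
%% the peer of a loopback TCP connection.
run() ->
  {ok, HostName} = inet:gethostname(),
  {ok, Listen} = gen_tcp:listen(0, [{active, false}]),
  {ok, Port} = inet:port(Listen),
  {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [{active, false}]),
  {ok, Server} = gen_tcp:accept(Listen, 1000),
  %% Seen from the client, the peer is the listening address and port.
  {ok, {PeerIp, PeerPort}} = inet:peername(Client),
  ok = gen_tcp:close(Client),
  ok = gen_tcp:close(Server),
  ok = gen_tcp:close(Listen),
  {HostName, PeerIp, PeerPort =:= Port}.
```

The hostname is returned as a string, and the last element of the result is true when the peer port reported for the client socket is the port the listener was opened on.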

Finally, a useful command to know, especially if you are having problems trying to open, send, or receive data from a socket, is inet:i(). It lists all TCP and UDP sockets, including those that the Erlang runtime system uses as well as those you have created.

In our example, we start a distributed Erlang node. Running the command shows us two sockets—the TCP listener socket waiting for inbound connections, and a socket connected to the epmd port mapper daemon:

(bar@Vaio)1> inet:i().
Port Module   Recv Sent Owner    Local Address   Foreign Address State
108  inet_tcp 0    0    <0.62.0> *:54843         *:*             ACCEPTING
110  inet_tcp 4    18   <0.60.0> localhost:54844 localhost:4369  CONNECTED
Port Module   Recv Sent Owner    Local Address   Foreign Address State
ok

Further Reading

This chapter covered the low-level mechanisms on which to build more complex protocols and layers. The Inets application, which comes as part of the OTP distribution, is a container for IP-based protocol implementations. It includes a web server called Inets as well as HTTP and FTP clients. It also has a Trivial File Transfer Protocol (TFTP) client and server. For more information on the Inets application, refer to its user guide and reference manual.

A part of distributed Erlang is the Secure Sockets Layer (SSL) application, providing encrypted communication over sockets. Erlang’s SSL application is based on the open source OpenSSL toolkit. You can read more about this application in the user guides and manuals that come with the Erlang distribution.

If you are interested in reading more about other Internet Protocol implementations, two good books are Internet Core Protocols by Eric Hall (O’Reilly) and TCP/IP Illustrated by W. Richard Stevens (Addison-Wesley Professional Computing Series).

Exercises

Exercise 15-1: Snooping an HTTP Request

Open a listener socket on your local machine. Start your web browser and use it to request a web page from that socket. Print the contents of the request and study them. How long before the socket connection is closed? What happens if you shut down your browser?

Exercise 15-2: A Simple HTTP Proxy

Change your browser proxy settings to point to your local machine on port 1500.[35] Start a listener socket on that port, and accept any connection coming to it. From your web browser, try to download any web page. The request should be forwarded to your socket connection. Sniff the contents of the request and extract the URL of the web page your browser is trying to load.

Using the HTTP client from the Inets application, retrieve the contents of the page you are trying to load and send it unchanged to the open socket connection. Hint: if you are not behind a proxy or firewall, http:start() and http:request("http://www.erlang.org") should do the job. Before using them, however, ensure that you read through the HTTP manual pages that come with the Erlang distribution.

Exercise 15-3: Peer to Peer

Write a module that contains code for a peer-to-peer transport layer. You will need a process which, when started, either waits for a socket connection to come in on port 1234 or waits for the function peer:connect(IpAddress) to be called. If the latter is called, it will try to connect to port 1234 on that address. Once the connection has been established, you should be able to use the function peer:send(String) to send data to your peer. Log what is sent to file and print it to the shell. The functions you should export are:

peer:start() -> ok | {error, already_started}
peer:connect(IpAddress) -> ok | {error, Reason}
peer:send(String) -> ok | {error, not_connected}
peer:stop() -> ok | {error, not_started}

The tricky part of this exercise, which will require some careful thought, is the fact that your process will be speaking to a copy of itself on another machine. By that, we mean both processes will be running the same code base.

If you are worried about Big Brother watching you, you can encrypt the packets you send using the crypto module.



[35] Depending on the rights of the user under which you are running your Erlang node, you might not be able to open ports that are either reserved or already taken. If that is the case, pick a higher number.
