Chapter 17. Networking Basics

Modern distributed computer systems make extensive use of networking. Understanding network communications is an essential part of building such systems, since such communications underlie the use of any network-based service. A protocol is an agreed “language” that two parties (often, nowadays, computer programs) use to communicate with each other. There are two basic flavors of networking, connection-oriented and connectionless, which we cover in “Networking Principles”, each with its own protocols. Both flavors can operate over a wide range of transport mechanisms thanks to the ubiquity of the TCP/IP stack and the socket interface, first devised to support networking in BSD Unix.

Networking splits the overall task of communication into protocol layers, each handling a separate function. Data can be carried between systems in many different ways (over an Ethernet, across a serial link, and so on), and it would needlessly complicate application code to have to handle all the differences. The ISO (International Organization for Standardization) defined a seven-layer model, but this has proved in practice to be unnecessarily complex; TCP/IP uses a four-layer model.

The application code implements the process or application layer, concerned with exchanging messages between two processes potentially running on different computers (although TCP/IP works just as well between local processes). This layer passes messages to the host-to-host or transport layer, concerned with end-to-end communication of messages. The transport layer breaks up longer messages into segments that it passes to the network or Internet layer, responsible for routing the segments (split into chunks known as datagrams) across the required sequence of inter-system “hops.” This layer in turn uses a link or subnetwork layer to pass datagrams between individual systems along the way from source to destination.

To send data, the application passes a chunk of data to the transport layer. This in turn passes the data as segment-sized chunks to the network layer, which uses its knowledge of local network structure to determine which link to use to send the data on its first hop and the size of datagram to split the segments into. Using the appropriate driver, the network layer transmits datagrams as packets across the chosen link. Each intermediate system unwraps network-layer datagrams from link layer packets, and routes them to the next system wrapped in the new link layer protocol for its next hop. Finally, each datagram arrives at its ultimate destination, where the network layer passes it up to the transport layer, which reassembles segments and finally delivers the resulting message to its destination application process.

This chapter covers networking principles, the socket module, and a core subset of the ssl module: enough to let you write some simple networking clients and servers, and, more importantly, properly use networking modules at higher levels of abstraction, covered in Chapter 19, and asynchronous architectures, covered in Chapter 18.

Networking Principles

Connection-oriented protocols are like making a telephone call. You request a connection to a particular network endpoint (equivalent to dialing somebody’s phone number), and your party either answers or doesn’t. If they do, you can talk to them and hear them talking back (simultaneously, if necessary), so you know that nothing is getting lost. At the end of the conversation you both say goodbye and hang up, so it’s obvious something has gone wrong if that finale doesn’t occur (for example, if you just suddenly stop hearing the other party). TCP is the main connection-oriented transport protocol of the Internet, used by web browsers, secure shells, email, and many other applications.

Connectionless or datagram protocols are more like communicating by sending postcards. Mostly, the messages get through, but if anything goes wrong you have to be prepared to cope with the consequences: the protocol doesn’t notify you whether your messages have been received, and messages can arrive out of order. To exchange short messages and get answers, a datagram protocol has less overhead than a connection-oriented one, as long as the overall service can cope with occasional disruptions, such as a Domain Name System (DNS) server failing to respond (most DNS communication is connectionless). UDP is the main connectionless transport protocol for Internet communications.

Nowadays, security is increasingly important: understanding the underlying basis of secure communications helps you ensure that your communications are as secure as they need to be. If this summary dissuades you from trying to implement such technology yourself without a thorough understanding of the issues and risks, it will have served a worthwhile purpose.

All communications across network interfaces exchange strings of bytes (also known as octets in networking parlance). To communicate text, for example, the sender must encode it to bytes, which the receiver must decode.
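
As a minimal sketch of that round trip (the sample text and the choice of UTF-8 are assumptions made for illustration; both sides must simply agree on an encoding):

```python
# Text must be encoded to bytes before sending and decoded after receiving.
text = 'café au lait'
payload = text.encode('utf-8')      # bytes, suitable for a socket's send method
print(len(text), len(payload))      # character count and byte count differ here
received = payload.decode('utf-8')  # what the receiving side does with the bytes
print(received == text)
```

The character count and byte count differ because 'é' occupies two bytes in UTF-8, which is why lengths in network protocols are normally counted in bytes, not characters.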

The Berkeley Socket Interface

Most networking nowadays hinges on sockets. Sockets give access to pipelines between independent endpoints, using a transport layer to move information between the endpoints. The socket concept is general enough that the endpoints can be on the same machine, or on separate computers networked together locally or via a wide-area network. There is no difference in the programming required for these destinations.

The most typical transport layers are UDP (the User Datagram Protocol, for connectionless networking) and TCP (the Transmission Control Protocol, for connection-oriented networking) over a common IP (Internet Protocol) network layer. This combination of protocols, along with the many application protocols that run over them, is collectively known as TCP/IP. The Unix operating system also offers its own flavor of sockets for use between different processes on the same machine; such Unix sockets can be used for either type of networking.

The socket interface was first implemented in the BSD Unix system, and proved to be a very useful mechanism for network interactions, standardizing the structure of networking programs. Sockets are the basis of Python network programming, so it’s important to understand how they work. A good introduction appears in the now somewhat venerable Socket Programming How-To online overview.

The two most common kinds of socket (also known as socket families) are Internet sockets based on TCP/IP communications (available in two flavors to accommodate the modern IPv6 and the more traditional IPv4), and Unix sockets, though other families are also available. Internet sockets allow communication between any two computers that can transmit IP datagrams between them; Unix sockets can only communicate between processes on the same Unix machine.

To support many concurrent Internet sockets, the TCP/IP protocol stack uses endpoints identified by an IP address, a port number and a protocol. The port numbers allow the protocol-handling software to distinguish between different endpoints at the same IP address using the same protocol. A connected socket also associates with a remote endpoint, the counter-party socket to which it is connected and with which it can communicate.

Most Unix sockets are associated with names in the Unix filesystem. On Linux platforms, sockets whose names begin with a zero byte live in a name pool maintained by the kernel. These are useful for communicating with a chroot-jail process, for example, where no filesystem is shared between two processes.

Both Internet and Unix sockets support connectionless and connection-oriented networking, so if you write your programs carefully they can work over either socket family. It is beyond the scope of this book to discuss other socket families, though we should mention that raw sockets, a subtype of the Internet socket family, let you send and receive link-layer packets (for example, Ethernet packets) directly. This is useful for experimental applications, but some operating systems of the Windows family do their best to inhibit access to raw sockets by programmers, on the somewhat specious grounds that doing so discourages “hackers.”

After creating an Internet socket, you can associate (bind) a specific port number with the socket (as long as that port number is not in use by some other socket). This is the strategy many servers use, offering service on so-called well-known port numbers defined by Internet standards as being in the range 1–1,023 (on Unix systems, root privileges are required to gain access to these ports). A typical client is unconcerned with the port number it uses, and so it typically requests an ephemeral port, assigned, and guaranteed to be unique, by the protocol driver. There is no need to bind ephemeral ports.

Imagine two processes on the same computer, both acting as clients to the same remote server. The full association for each socket is (local_IP_address, local_port_number, protocol, remote_IP_address, remote_port_number). When traffic arrives at the server, the destination IP address, destination port number, protocol, and source IP address are exactly equal for both clients. The guarantee of uniqueness for ephemeral port numbers, however, makes it possible for the server to distinguish between traffic from the two clients. This is how TCP/IP protocol multiplexing works, allowing multiple conversations to take place unambiguously between the same two IP addresses.
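
The following sketch illustrates this on a single machine, assuming Python 3 and a usable loopback interface. Two client sockets connect to the same listening endpoint; the address 127.0.0.1 and the use of port 0 (asking the operating system to pick a free port) are choices made for this example only.

```python
import socket

# A listening TCP socket on the loopback interface; port 0 asks the OS
# to choose a free port (an assumption made to keep the sketch portable).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(5)
server_addr = server.getsockname()

# Two clients connect to the same server endpoint.
client_a = socket.create_connection(server_addr)
client_b = socket.create_connection(server_addr)
conn_a, peer_a = server.accept()
conn_b, peer_b = server.accept()

# Both clients see the same remote endpoint...
print(client_a.getpeername() == client_b.getpeername())
# ...but each was given a different ephemeral local port,
print(client_a.getsockname()[1] != client_b.getsockname()[1])
# and those are exactly the peer addresses the server observed.
seen_by_server = sorted([peer_a, peer_b])
used_by_clients = sorted([client_a.getsockname(), client_b.getsockname()])
print(seen_by_server == used_by_clients)

for s in (conn_a, conn_b, client_a, client_b, server):
    s.close()
```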

Socket Addresses

The different types of sockets use different address formats.

Unix socket addresses are strings naming a node in the filesystem (on Linux platforms, strings starting with b'\0' correspond to names in a kernel table).

IPv4 socket addresses are pairs (address, port). The first item is an IPv4 address, the second a port number in the range 1–65,535.

IPv6 socket addresses are four-item (address, port, flowinfo, scopeid) tuples. When providing an address as an argument, the flowinfo and scopeid items can be omitted, though this can sometimes cause problems.
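
A short sketch of what these address tuples look like in practice, assuming Python 3. The loopback addresses and port 0 are choices made for the example, and the IPv6 part is skipped when the host has no usable IPv6 loopback.

```python
import socket

# IPv4: bind to the loopback address and let the OS pick a port (port 0).
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s4:
    s4.bind(('127.0.0.1', 0))
    addr4 = s4.getsockname()
print(addr4)   # a pair (address, port)

# IPv6, only where the platform and host configuration support it.
addr6 = None
if socket.has_ipv6:
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s6:
            s6.bind(('::1', 0))          # flowinfo and scopeid omitted
            addr6 = s6.getsockname()
    except OSError:
        pass                             # no usable IPv6 loopback here
print(addr6)   # a four-item tuple when IPv6 is usable, else None
```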

Client-Server Computing

The pattern we discuss hereafter is usually referred to as client-server networking, where a server listens for traffic on a specific endpoint from clients who require the service. We do not cover peer-to-peer networking, which, lacking any central server, has to include the ability for peers to discover each other. This is often, ironically, achieved by contacting a central server, although discovery protocols such as SSDP allow complete independence from any central authority.

Most, though by no means all, network communication is performed using client-server techniques. The server listens for incoming traffic at a predetermined or advertised network endpoint. In the absence of such input it does nothing, simply sitting there waiting for input from clients. Communication is somewhat different between connectionless and connection-oriented endpoints.

In connectionless networking such as UDP, requests arrive at a server randomly and are dealt with immediately: a response is dispatched to the requester without delay. Each message is handled on its own, usually without reference to any communications that may previously have occurred between the two parties. Connectionless networking is thus well-suited to short-term, stateless interactions such as those required by DNS or network booting.

In connection-oriented networking, the client engages in an initial exchange with the server that effectively establishes a connection across a network pipeline between two processes (sometimes referred to as a virtual circuit), across which the processes can communicate until both have indicated their willingness to end the connection. Serving under these conditions requires a concurrency mechanism (such as threads or processes, covered in Chapter 14, or asynchronous programming, covered in Chapter 18) to handle each incoming connection asynchronously or simultaneously. Without such concurrency, the server would be unable to handle new incoming connections before earlier ones had terminated, since calls to socket methods normally block (meaning they pause the thread calling them until they terminate or time out). Connections are the best way to handle lengthy interactions such as mail exchanges, command-line shell interactions, or the transmission of web content, and offer automatic error detection and correction when TCP is used.

Connectionless client and server structures

The broad logic flow of a connectionless server is as follows:

  1. Create a socket of type socket.SOCK_DGRAM by calling socket.socket.

  2. Associate the socket with the service endpoint by calling the socket’s bind method.

  3. Repeat the following steps ad infinitum:

    1. Request an incoming service datagram from a client by calling the socket’s recvfrom method; this call blocks until a datagram is received.

    2. Compute the result.

    3. Send the result back to the client by calling the socket’s sendto method.

The server spends most of its time in step 3a, awaiting input from clients.
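
The steps above can be sketched as follows, assuming Python 3. The function name serve_udp, its max_requests parameter (there only so the demonstration can end), the upper-casing "service," and the loopback address are all hypothetical choices for this example, not part of the socket module.

```python
import socket
import threading

def serve_udp(sock, max_requests=None):
    """Answer datagrams on an already-bound SOCK_DGRAM socket.

    max_requests is a hypothetical knob so a demo can terminate;
    a real server would normally loop forever (max_requests=None).
    """
    handled = 0
    while max_requests is None or handled < max_requests:
        data, client_addr = sock.recvfrom(1024)   # step 3a: blocks
        result = data.upper()                     # step 3b: compute the result
        sock.sendto(result, client_addr)          # step 3c: reply to the sender
        handled += 1

server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # step 1
server_sock.bind(('127.0.0.1', 0))                               # step 2
server_addr = server_sock.getsockname()

# Serve a single request in a background thread, then probe it.
worker = threading.Thread(target=serve_udp, args=(server_sock, 1), daemon=True)
worker.start()
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
    probe.settimeout(5.0)
    probe.sendto(b'hello', server_addr)
    reply, _ = probe.recvfrom(1024)
worker.join()
server_sock.close()
print(reply)
```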

A connectionless client’s interaction with the server proceeds as follows:

  1. Create a socket of type socket.SOCK_DGRAM by calling socket.socket.

  2. Optionally, associate the socket with a specific endpoint by calling the socket’s bind method.

  3. Send a request to the server’s endpoint by calling the socket’s sendto method.

  4. Await the server’s reply by calling the socket’s recvfrom method; this call blocks until the response is received. It is always necessary to apply a timeout to this call, to handle the case where a datagram goes missing, and either retry or abort the attempt: connectionless sockets do not guarantee delivery.

  5. Use the result in the remainder of the client program’s logic.
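
A minimal sketch of these client steps, assuming Python 3. The helper name udp_request and its timeout and retries defaults are illustrative assumptions. The demonstration sends to a bound socket that never replies, standing in for a lost datagram, so it exercises the timeout path.

```python
import socket

def udp_request(server_addr, payload, timeout=1.0, retries=3):
    """Send payload and wait for one reply, retrying on timeout.

    timeout and retries are illustrative defaults, not recommendations.
    Returns the reply bytes, or None if no reply ever arrived.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:   # step 1
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(payload, server_addr)                # step 3
            try:
                reply, _ = sock.recvfrom(1024)               # step 4
            except socket.timeout:
                continue                                     # no answer: retry
            return reply                                     # step 5
    return None

# A bound socket that never answers stands in for a lost datagram.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as silent:
    silent.bind(('127.0.0.1', 0))
    result = udp_request(silent.getsockname(), b'ping', timeout=0.1, retries=2)
print(result)
```

This sketch accepts a reply from any sender; a more careful client might also compare the address returned by recvfrom with the server's address.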

A single client program can perform several interactions with the same or multiple servers, depending on the services it needs to use. Many such interactions are hidden from the application programmer inside library code. A typical example is the resolution of a hostname to the appropriate network address, which commonly uses the gethostbyname library function (implemented in Python’s socket module). Connectionless interactions normally involve sending a single packet to the server and receiving a single packet in response. The main exception involves streaming protocols, such as RTP, typically layered on top of UDP to minimize latency and delays: in streaming, many datagrams are sent and received.

Connection-oriented client and server structures

The broad flow of logic of a connection-oriented server is as follows:

  1. Create a socket of type socket.SOCK_STREAM by calling socket.socket.

  2. Associate the socket with the appropriate server endpoint by calling the socket’s bind method.

  3. Start the endpoint listening for connection requests by calling the socket’s listen method.

  4. Repeat the following steps ad infinitum:

    1. Await an incoming client connection by calling the socket’s accept method; the server process blocks until an incoming connection request is received. When such a request arrives, a new socket object is created whose other endpoint is the client program.

    2. Create an asynchronous control thread to handle this specific connection, passing it the newly created socket; after which, the main thread continues by looping back to step 4a.

    3. In the new control thread, interact with the client using the new socket’s recv and send methods, respectively, to read data from the client and send data to it. The recv method blocks until data is available from the client (or the client indicates it wishes to close the connection, in which case recv returns an empty result). The send method only blocks when the network software has so much data buffered that communication has to pause until the transport layer has emptied some of its buffer memory. When the server wishes to close the connection, it can do so by calling the socket’s close method, optionally calling its shutdown method first.

The server spends most of its time in step 4a, awaiting connection requests from clients.
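
The following sketch maps those steps onto code, assuming Python 3 and threads as the concurrency mechanism. The names handle_client and serve_tcp, the max_connections limit (present only so the demonstration ends), and the echo behavior are hypothetical choices for this example.

```python
import socket
import threading

def handle_client(conn, addr):
    """Step 4c: echo data back until the client closes its end."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:              # empty result: the client closed
                break
            conn.sendall(data)

def serve_tcp(listener, max_connections=None):
    """Accept loop; max_connections is a hypothetical limit for demos."""
    accepted = 0
    while max_connections is None or accepted < max_connections:
        conn, addr = listener.accept()                        # step 4a
        threading.Thread(target=handle_client,                # step 4b
                         args=(conn, addr), daemon=True).start()
        accepted += 1

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # step 1
listener.bind(('127.0.0.1', 0))                               # step 2
listener.listen(5)                                            # step 3
server_addr = listener.getsockname()

acceptor = threading.Thread(target=serve_tcp, args=(listener, 1), daemon=True)
acceptor.start()

# A quick check from a client socket.
with socket.create_connection(server_addr, timeout=5.0) as client:
    client.sendall(b'ping')
    echoed = b''
    while len(echoed) < 4:            # recv may return fewer bytes than sent
        chunk = client.recv(1024)
        if not chunk:
            break
        echoed += chunk
acceptor.join()
listener.close()
print(echoed)
```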

A connection-oriented client’s overall logic is as follows:

  1. Create a socket of type socket.SOCK_STREAM by calling socket.socket.

  2. Optionally, associate the socket with a specific endpoint by calling the socket’s bind method.

  3. Establish a connection to the server by calling the socket’s connect method.

  4. Interact with the server using the socket’s recv and send methods, respectively, to read data from the server and send data to it. The recv method blocks until data is available from the server (or the server indicates it wishes to close the connection, in which case the recv call returns an empty result). The send method only blocks when the network software has so much data buffered that communications have to pause until the transport layer has emptied some of its buffer memory. When the client wishes to close the connection, it can do so by calling the socket’s close method, optionally calling its shutdown method first.
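
A sketch of this client logic, assuming Python 3. The helper tcp_request and the stand-in one_shot_server (which reverses whatever it receives) are hypothetical and exist only to make the example self-contained. The client uses shutdown to signal the end of its request, one of several possible conventions.

```python
import socket
import threading

def tcp_request(server_addr, payload, timeout=5.0):
    """Send payload, signal end of request, and read the whole reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # step 1
    with sock:
        sock.settimeout(timeout)
        sock.connect(server_addr)                              # step 3
        sock.sendall(payload)                                  # step 4: send
        sock.shutdown(socket.SHUT_WR)     # tell the server we are done sending
        chunks = []
        while True:
            chunk = sock.recv(1024)                            # step 4: receive
            if not chunk:                 # empty result: the server closed
                break
            chunks.append(chunk)
    return b''.join(chunks)

def one_shot_server(listener):
    """Stand-in server for the demo: reverse one request, then close."""
    conn, _ = listener.accept()
    with conn:
        request = b''
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        conn.sendall(request[::-1])

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
threading.Thread(target=one_shot_server, args=(listener,), daemon=True).start()

answer = tcp_request(listener.getsockname(), b'abc')
listener.close()
print(answer)
```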

Connection-oriented interactions tend to be more complex than connectionless ones. Specifically, determining when to read and write data is more complex because inputs must be parsed to determine when a transmission from the other end of the socket is complete. The protocols used in connection-oriented networking have to accommodate this determination; sometimes this is done by indicating the data length as a part of the content, sometimes by more complex methods.
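
One common way of indicating the data length as part of the content is a fixed-size length prefix. The sketch below, assuming Python 3, uses a 4-byte big-endian length header; the helper names send_msg, recv_exactly, and recv_msg and the header size are assumptions of this example, not a standard.

```python
import socket
import struct

def send_msg(sock, payload):
    """Prefix payload with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def recv_exactly(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed mid-message')
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    """Read one length-prefixed message and return its payload."""
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return recv_exactly(sock, length)

# Two messages sent back to back arrive as one byte stream;
# the length prefix lets the receiver split them apart again.
a, b = socket.socketpair()
with a, b:
    send_msg(a, b'first')
    send_msg(a, b'second message')
    messages = [recv_msg(b), recv_msg(b)]
print(messages)
```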

The socket Module

Python’s socket module handles networking with the socket interface. There are minor differences between platforms, but the module hides most of them, making it relatively easy to write portable networking applications.

The module defines four exceptions: the base class socket.error (in v2, a subclass of exceptions.IOError; in v3, a deprecated alias for the built-in OSError), and three exception subclasses, as follows:

herror

socket.herror is raised for hostname-resolution errors—that is, when a name cannot be converted to a network address by the socket.gethostbyname function, or no hostname can be found for a network address by the socket.gethostbyaddr function. The accompanying value is a two-element tuple (h_errno, string) where h_errno is the integer error number returned by the operating system and string is a description of the error.

gaierror

socket.gaierror is raised for addressing errors encountered in the getaddrinfo or getnameinfo functions.

timeout

socket.timeout is raised when an operation takes longer than the timeout limit (established by the module’s setdefaulttimeout function, overridable on a per-socket basis).
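
A brief sketch of catching two of these exceptions, assuming Python 3. To avoid depending on a working DNS setup, the example provokes gaierror by passing the AI_NUMERICHOST flag with a non-numeric hostname (the hostname itself is made up), and provokes timeout by reading from a connected pair on which nothing is ever sent.

```python
import socket

# gaierror: AI_NUMERICHOST forbids name lookups, so a non-numeric host
# fails at once without touching the network.
try:
    socket.getaddrinfo('not-a-number.invalid', 80,
                       flags=socket.AI_NUMERICHOST)
    gai_raised = False
except socket.gaierror as exc:
    gai_raised = True
    print('address error:', exc)

# timeout: nothing is ever sent on this pair, so recv gives up after 0.1 s.
left, right = socket.socketpair()
with left, right:
    right.settimeout(0.1)
    try:
        right.recv(16)
        timed_out = False
    except socket.timeout:
        timed_out = True
print(gai_raised, timed_out)

# In Python 3, all of the module's exceptions derive from OSError.
print(issubclass(socket.gaierror, OSError), issubclass(socket.herror, OSError))
```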

The module also defines a large set of constants. The most important of these are the address families (AF_*) and the socket types (SOCK_*) listed next. In v2 these constants are integers, while in v3 they are members of IntEnum collections. This difference can be disregarded, but, to debug your code, it’s more helpful to see, for example, <SocketKind.SOCK_STREAM: 1> than a simple 1. The module defines many other constants, used to set socket options, but the documentation does not define them fully: to use them you must be familiar with the low-level documentation for the C sockets libraries and system calls.

AF_INET

Used to create sockets of the IPv4 address family.

AF_INET6

Used to create sockets of the IPv6 address family.

AF_UNIX

Used to create sockets of the Unix address family. This constant is only defined on platforms that make Unix sockets available, and so is unavailable to, for example, Windows users.

AF_CAN

Used (v3 only) to create sockets for the Controller Area Network (CAN) address family, not further described in this book, but widely used in automation, automotive, and embedded device applications.

SOCK_STREAM

Used to create connection-oriented sockets, which provide full error detection and correction facilities.

SOCK_DGRAM

Used to create connectionless sockets, which provide best-effort message delivery without connection capabilities or error detection.

SOCK_RAW

Used to create sockets that give direct access to the link-layer drivers, typically used to implement lower-level network features outside the scope of this book.

SOCK_RDM

Used to create reliable connectionless message sockets used in the TIPC protocol, which is outside the scope of this book.

SOCK_SEQPACKET

Used to create reliable connection-oriented message sockets used in the TIPC protocol, which is outside the scope of this book.

The module defines a number of functions to create sockets, manipulate address information, and assist with representing data in a standard way. We do not cover all possibilities in this book, as the socket module documentation is comprehensive; we deal with the ones that are most essential in writing networked applications.

Miscellaneous socket module functions

The socket module contains many functions, but most of them are only useful in specific situations. When communication takes place between network endpoints, the computers at either end might have architectural differences and therefore represent the same data in different ways, and so there are functions to handle translation of a limited number of data types to and from a network-neutral form, for example. Here are a few of the more generally applicable functions:

Table 17-1. Generally useful socket module functions

getaddrinfo

socket.getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)

Takes a host and port, and returns a list of five-item tuples of the form (family, type, proto, canonical_name, sockaddr) that can be used to create a socket connection to a specific service; sockaddr is a socket address in the format appropriate to family (see “Socket Addresses”). In v2 all arguments are positional, but named arguments are accepted in v3. The canonical_name item is an empty string unless the socket.AI_CANONNAME bit is set in the flags argument. When you pass a hostname, rather than an IP address, the function returns a list of tuples, one for each IP address associated with the name.

getdefaulttimeout

socket.getdefaulttimeout()

Returns the default timeout value in seconds for socket operations, or None if no value has yet been set. Some functions also let you specify an explicit timeout.

getfqdn

socket.getfqdn([host])
Returns the fully qualified domain name associated with a hostname or network address (by default, that of the computer on which you call it).

gethostbyaddr

socket.gethostbyaddr(ip_address)

Takes a string containing an IPv4 or IPv6 address and returns a three-item tuple of the form (hostname, aliaslist, ipaddrlist). hostname is the canonical name for the IP, aliaslist a list of alternative names, and ipaddrlist a list of IPv4 and IPv6 addresses.

gethostbyname

socket.gethostbyname(hostname)

Returns a string containing the IPv4 address associated with the given hostname. If called with an IP address, returns that address. This function does not support IPv6: use getaddrinfo for IPv6.

getnameinfo

socket.getnameinfo(sock_addr, flags=0)

Takes a socket address and returns a (host, port) pair of strings. Depending on flags, host is a fully qualified domain name or a numeric address, and port is a service name or a port number.

setdefaulttimeout

socket.setdefaulttimeout(timeout)

Sets sockets’ timeout as a value in floating-point seconds. Newly created sockets operate in the mode determined by the timeout value, as discussed in the next section. Pass timeout as None to cancel the use of timeouts on subsequently created sockets.
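
A small sketch of two of these functions, assuming Python 3. It passes a numeric address to getaddrinfo so that no DNS lookup is needed, and it restores the default timeout afterward; the 2.5-second value is arbitrary.

```python
import socket

# A numeric host needs no DNS lookup; type narrows the results to TCP.
infos = socket.getaddrinfo('127.0.0.1', 80, type=socket.SOCK_STREAM)
for family, socktype, proto, canonname, sockaddr in infos:
    print(family, socktype, proto, repr(canonname), sockaddr)

# The default timeout applies to sockets created after it is set.
previous = socket.getdefaulttimeout()
socket.setdefaulttimeout(2.5)
with socket.socket() as s:
    inherited = s.gettimeout()
socket.setdefaulttimeout(previous)    # restore the earlier default
print(inherited)
```

The last item of each tuple returned by getaddrinfo is a socket address that can be passed directly to a socket's connect method.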

Socket Objects

The socket object is the primary means of network communication in Python. A new socket is also created when a SOCK_STREAM socket accepts a connection, each such socket being used to communicate with the relevant client.

Socket objects and with statements

Every socket object is a context manager, so you can use any socket object in a with statement initial clause to ensure proper termination of the socket at exit from the with statement’s body.
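
For instance, in the following Python 3 sketch (the loopback address and port 0 are arbitrary choices), the socket is closed automatically when the with block ends, which can be observed through its file descriptor:

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(('127.0.0.1', 0))
    fd_inside = s.fileno()    # a valid descriptor while the socket is open
fd_after = s.fileno()         # -1 once the with statement has closed it
print(fd_inside >= 0, fd_after)
```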

There are a number of ways you can create a socket, as detailed in the next section. The socket can operate in different modes, determined by its timeout value, established in one of three ways:

  • By providing the timeout value on creation

  • By calling the socket object’s settimeout method

  • According to the socket module’s default timeout value as returned by the socket.getdefaulttimeout function

The timeout values to establish each mode are as follows:

None

Sets blocking mode. Each operation suspends the process (blocks) until the operation completes, unless the operating system causes an exception to be raised.

0

Sets nonblocking mode. Each operation raises an exception when it cannot be completed immediately, or when an error occurs. Use the selectors module, covered in “The selectors Module”, to find whether an operation can be completed immediately.

>0.0

Sets timeout mode. Each operation blocks until complete, or the timeout elapses (then, a socket.timeout exception is raised), or an error occurs.
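
The three modes can be observed on a connected pair of sockets on which nothing is ever sent, as in this Python 3 sketch (the 0.05-second timeout is arbitrary):

```python
import socket

a, b = socket.socketpair()
with a, b:
    modes = {}

    b.settimeout(None)                      # blocking mode
    modes['blocking'] = b.gettimeout()

    b.settimeout(0)                         # nonblocking mode
    modes['nonblocking'] = b.gettimeout()
    try:
        b.recv(16)                          # nothing to read yet
        nonblocking_error = None
    except BlockingIOError as exc:
        nonblocking_error = type(exc).__name__

    b.settimeout(0.05)                      # timeout mode
    try:
        b.recv(16)
        timed_out = False
    except socket.timeout:
        timed_out = True

print(modes, nonblocking_error, timed_out)
```

A recv call in blocking mode is deliberately not attempted here, since with no data pending it would wait forever.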

Socket creation functions

Socket objects represent network endpoints. There are a number of different functions supplied by the socket module to create a socket:

create_connection

create_connection(address, [timeout, [source_address]])

Creates a socket connected to a TCP endpoint at an address (a (host, port) pair). host can either be a numeric network address or a DNS hostname; in the latter case, name resolution is attempted for both AF_INET and AF_INET6, and then a connection is attempted to each returned address in turn—a convenient way to create client programs using either IPv6 or IPv4 as appropriate.

The timeout argument, if given, specifies the connection timeout in seconds and thereby sets the socket’s mode; when not present, the socket.getdefaulttimeout function is called to determine the value. The source_address argument, if given, must also be a pair (host, port) that the remote socket gets passed as the connecting endpoint. When host is '' or port is 0, the default OS behavior is used.

socket

socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)

Creates and returns a socket of the appropriate address family and type (by default, a TCP socket on IPv4). The protocol number proto is only used with CAN sockets. When you pass the fileno argument, other arguments are ignored: the function returns the socket already associated with the given file descriptor.

The socket does not get inherited by child processes.

socketpair

socketpair([family[, type[, proto]]])

Returns a connected pair of sockets of the given address family, socket type, and (CAN sockets only) protocol. When family is not specified, the sockets are of family AF_UNIX on platforms where the family is available, and otherwise of family AF_INET. When type is not specified, it defaults to SOCK_STREAM.
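
The three creation functions can be compared side by side in a short Python 3 sketch. The loopback listener exists only to give create_connection something to connect to, and the 5-second timeout is arbitrary.

```python
import socket

# socket(): an unconnected TCP/IPv4 socket, used here as a listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)

# create_connection(): create a socket and connect it in one call.
client = socket.create_connection(listener.getsockname(), timeout=5.0)
server_side, client_addr = listener.accept()
client_timeout = client.gettimeout()          # set by the timeout argument
same_endpoint = (client_addr == client.getsockname())

# socketpair(): two already-connected sockets, handy for tests.
left, right = socket.socketpair()
left.sendall(b'hi')
pair_data = right.recv(16)

print(client_timeout, same_endpoint, pair_data)

for s in (client, server_side, listener, left, right):
    s.close()
```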

A socket object s provides the following methods (of which those dealing with connections or requiring a connected socket work only for SOCK_STREAM sockets, while the others work with both SOCK_STREAM and SOCK_DGRAM sockets). In the following table, the exact set of flags available depends on your specific platform; the flags values available are documented on the appropriate Unix manual page for recv(2) or manual page for send(2):

accept

accept()

Blocks until a client establishes a connection to s, which must have been bound to an address (with a call to s.bind) and set to listening (with a call to s.listen). Returns a new socket object, which can be used to communicate with the other endpoint of the connection.

bind

bind(address)

Binds s to a specific address. The form of the address argument depends on the socket’s address family (see “Socket Addresses”).

close

close()

Marks the socket as closed. It does not necessarily close the connection immediately, depending on whether other references to the socket exist. If immediate closure is required, call the s.shutdown method first. The simplest way to ensure a socket is closed in a timely fashion is to use it in a with statement, since sockets are context managers.

connect

connect(address)

Connects to a remote socket at address. The form of the address argument depends on the address family (see “Socket Addresses”).

detach

detach()

Puts the socket object into closed mode without actually closing the underlying file descriptor, and returns that file descriptor, which can be reused for other purposes.

dup

dup()

Returns a duplicate of the socket, not inheritable by child processes.

fileno

fileno()

Returns the socket’s file descriptor.

get_inheritable

get_inheritable() (v3 only)

Returns True when the socket is going to be inherited by child processes. Otherwise, returns False.

getpeername

getpeername()

Returns the address of the remote endpoint to which this socket is connected.

getsockname

getsockname()

Returns the address being used by this socket.

gettimeout

gettimeout()

Returns the timeout associated with this socket.

listen

listen([backlog])

Starts the socket listening for traffic on its associated endpoint. If given, the integer backlog argument determines how many unaccepted connections the operating system allows to queue up before starting to refuse connections.

makefile

makefile(mode, [bufsize]) (v2)

makefile(mode, buffering=None, *, encoding=None, newline=None) (v3)

Returns a file object allowing the socket to be used with file-like operations such as read and write. The mode can be 'r' or 'w', to which 'b' can be added for binary transmissions. The socket must be in blocking mode; if a timeout value is set, unexpected results may be observed if a timeout occurs. Libraries intending to support both v2 and v3 are advised to omit the remaining arguments, which are not well documented and differ between versions.

recv

recv(bufsiz, [flags])

Receive a maximum of bufsiz bytes of data on the socket. Returns the received data.

recvfrom

recvfrom(bufsiz, [flags])

Receive a maximum of bufsiz bytes of data from s. Returns a pair (bytes, address) where bytes is the received data, and address the address of the counter-party socket that sent the data.

recvfrom_into

recvfrom_into(buffer, [nbytes, [flags]])

Receive a maximum of nbytes bytes of data from s, writing it into the given buffer object. Returns a two-element tuple (nbytes, address) where nbytes is the number of bytes received and address is the address of the socket that sent the data.

recv_into

recv_into(buffer, [nbytes, [flags]])

Receive a maximum of nbytes bytes of data from s, writing it into the given buffer object. Returns the number of bytes received.

recvmsg

recvmsg(bufsiz, [ancbufsiz, [flags]])

Receive a maximum of bufsiz bytes of data on the socket and a maximum of ancbufsiz bytes of ancillary (“out-of-band”) data. Returns a four-item tuple (data, ancdata, msg_flags, address), where data is the received data, ancdata is a list of three-item (cmsg_level, cmsg_type, cmsg_data) tuples representing the received ancillary data, msg_flags holds any flags received with the message, and address is the address of the counter-party socket that sent the data (if the socket is connected, this value is undefined, but the sender can be determined from the socket).

send

send(bytes, [flags])

Send the given data bytes over the socket, which must already be connected to a remote endpoint. Returns the number of bytes sent, which should be verified: the call may not transmit all data, in which case transmission of the remainder will have to be separately requested.

sendall

sendall(bytes, [flags])

Send all the given data bytes over the socket, which must already be connected to a remote endpoint. The socket’s timeout value applies to the transmission of all the data, even if multiple transmissions are needed.

sendto

sendto(bytes, address) or

sendto(bytes, flags, address)

Transmit the bytes (s must not be connected) to the given socket address.

sendmsg

sendmsg(buffers, [ancdata, [flags, [address]]])

Send normal and ancillary (out-of-band) data to the connected endpoint. buffers should be an iterable of bytes-like objects. The ancdata argument should be an iterable of (cmsg_level, cmsg_type, cmsg_data) tuples representing the ancillary data, and flags holds flag values documented on the Unix manual page for the send(2) system call. address should only be provided for an unconnected socket, and determines the endpoint to which the data is sent.
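As a small v3 sketch, sendmsg can gather several buffers into one transmission, and recvmsg on the other side returns the four-item tuple described earlier. This assumes a Unix-like platform (sendmsg and recvmsg are not available everywhere) and sends no ancillary data:

```python
import socket

left, right = socket.socketpair()
try:
    # The three buffers are sent in order, as one "gather" write.
    sent = left.sendmsg([b'HEAD:', b'body', b':TAIL'])
    data, ancdata, msg_flags, address = right.recvmsg(1024)
    print(sent, data, ancdata)
finally:
    left.close()
    right.close()
```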

sendfile

sendfile(file, offset=0, count=None)

Send the contents of file object file (which must be open in binary mode) to the connected endpoint. On platforms where os.sendfile is available, it’s used; otherwise, the send call is used. If provided, offset determines the starting byte position in the file from which transmission begins, and count sets the maximum number of bytes to be transmitted. Returns the total number of bytes transmitted.
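The effect of offset and count can be seen in this v3 sketch, which sends part of a small temporary file across a local socket pair. The file contents and the chosen offset and count are arbitrary illustration values:

```python
import socket
import tempfile

# TemporaryFile opens in binary mode ('w+b') by default, as sendfile requires.
with tempfile.TemporaryFile() as f:
    f.write(b'0123456789')
    f.flush()                          # make sure the bytes reach the file
    left, right = socket.socketpair()
    try:
        sent = left.sendfile(f, offset=2, count=5)   # bytes 2 through 6
        chunk = right.recv(1024)
        print(sent, chunk)
    finally:
        left.close()
        right.close()
```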

set_inheritable

set_inheritable(flag) (v3 only)

Determines whether the socket gets inherited by child processes, according to the Boolean value of flag.

setblocking

setblocking(flag)

Determines whether s operates in blocking mode (see “Socket Objects”) according to the Boolean value of flag. s.setblocking(True) is equivalent to s.settimeout(None) and s.setblocking(False) is equivalent to s.settimeout(0.0).

settimeout

settimeout(timeout)

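Establishes the mode of s (see “Socket Objects”) according to the value of timeout: None puts s in blocking mode, 0.0 puts it in nonblocking mode, and any positive number of seconds puts it in timeout mode.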
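The relationship between setblocking and settimeout can be checked with gettimeout, as in this v3 sketch. The UDP socket, the port chosen by the operating system, and the 0.1-second timeout are just convenient values for a quick local experiment:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    s.bind(('127.0.0.1', 0))
    s.setblocking(False)
    nonblocking_timeout = s.gettimeout()   # same state as settimeout(0.0)
    s.setblocking(True)
    blocking_timeout = s.gettimeout()      # same state as settimeout(None)
    s.settimeout(0.1)                      # timeout mode: wait at most 0.1 s
    try:
        s.recvfrom(1024)                   # nobody sends anything to s
        timed_out = False
    except socket.timeout:
        timed_out = True
    print(nonblocking_timeout, blocking_timeout, timed_out)
finally:
    s.close()
```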

shutdown

shutdown(how)

Shuts down one or both halves of a socket connection according to the value of the how argument, as detailed here:

socket.SHUT_RD
No further receive operations can be performed on s.
socket.SHUT_WR
No further send operations can be performed on s.
socket.SHUT_RDWR
No further receive or send operations can be performed on s.
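A common use of shutdown is the “half-close”: one side signals it has nothing more to send while still being able to receive. The following v3 sketch shows this over a local socket pair; the names client and server are ours and merely label the two ends:

```python
import socket

client, server = socket.socketpair()
try:
    client.sendall(b'request')
    client.shutdown(socket.SHUT_WR)    # client will send nothing more
    request = server.recv(1024)
    eof = server.recv(1024)            # empty bytes: the peer closed its sending half
    server.sendall(b'response')        # the other direction still works
    response = client.recv(1024)
    print(request, eof, response)
finally:
    client.close()
    server.close()
```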

A socket object s also has the following attributes:

family An attribute that is s’s socket family
type An attribute that is s’s socket type

A Connectionless Socket Client

Consider a simplistic packet-echo service. Text encoded in UTF-8 (the assumed encoding of most Python source files) is sent to a server, which sends the same information back to the client originating it. In a connectionless service, all the client has to do is send each chunk of data to the defined server endpoint.

The following client works equally well on either v2 or v3, printing slightly different output to stdout in each case (in v2, Unicode text strings are marked as u'...' while bytestrings are not marked; in v3, Unicode text strings are not marked, while bytestrings are marked as b'...'):

# coding: utf-8
from __future__ import print_function
import socket

UDP_IP = '127.0.0.1'
UDP_PORT = 8883
MESSAGE = u"""
This is a bunch of lines, each
of which will be sent in a single
UDP datagram. No error detection
or correction will occur.
Crazy bananas! £€ should go through."""

sock = socket.socket(socket.AF_INET,  # Internet
                     socket.SOCK_DGRAM)  # UDP
server = UDP_IP, UDP_PORT
for line in MESSAGE.splitlines():
    data = line.encode('utf-8')
    sock.sendto(data, server)
    print('Sent', repr(data), 'to', server)
    response, address = sock.recvfrom(1024)  # buffer size: 1024
    print('Recv', repr(response.decode('utf-8')), 'from', address)

Note that the server is only expected to perform a byte-oriented echo function. The client therefore encodes its Unicode data into specific bytestrings, and decodes the bytestring responses received from the server back into Unicode text.
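Because UDP offers no delivery guarantee, a client that calls recvfrom on a blocking socket waits forever if a datagram or its reply is lost. The following v3 sketch shows one possible remedy, a timeout with a few resends. request_with_retry and echo_once are hypothetical names of ours, and the one-shot echo thread exists only to make the example self-contained:

```python
import socket
import threading

def request_with_retry(sock, data, server, timeout=0.5, attempts=3):
    """Send data to server and wait for a reply, resending on timeout."""
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendto(data, server)
        try:
            response, _address = sock.recvfrom(1024)
            return response
        except socket.timeout:
            continue                   # datagram or reply lost: try again
    return None                        # gave up after all attempts

# A one-shot echo server on a free local port, run in a background thread.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_sock.bind(('127.0.0.1', 0))
server_sock.settimeout(5.0)

def echo_once():
    data, addr = server_sock.recvfrom(1024)
    server_sock.sendto(data, addr)

worker = threading.Thread(target=echo_once)
worker.start()
client_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    reply = request_with_retry(client_sock, u'£€'.encode('utf-8'),
                               server_sock.getsockname())
finally:
    worker.join()
    client_sock.close()
    server_sock.close()
print(reply.decode('utf-8'))
```

Note that resending is only safe when handling the same request twice is harmless, as it is for an echo service.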

A Connectionless Socket Server

A server for this service is also quite simple. It binds to its endpoint, receives packets (datagrams) at that endpoint, and returns a packet to the client sending each datagram, with exactly the same data. The server treats all clients equally and does not need to use any kind of concurrency (though this handy characteristic might not hold for a service where request handling takes more time).

The following server works equally well on v2 or v3 (with slightly different output in each version, as explained previously for the client):

from __future__ import print_function
import socket

UDP_IP = '127.0.0.1'
UDP_PORT = 8883

sock = socket.socket(socket.AF_INET,  # Internet
                     socket.SOCK_DGRAM)  # UDP
sock.bind((UDP_IP, UDP_PORT))
print('Serving at', UDP_IP, UDP_PORT)

while True:
    data, addr = sock.recvfrom(1024)  # buffer size is 1024 bytes
    print('Recv', repr(data), 'from', addr)
    sock.sendto(data, addr)
    print('Sent', repr(data), 'to', addr)

This code offers no way to terminate the service other than by interrupting it (typically from the keyboard, with Ctrl-C or Ctrl-Break).

A Connection-Oriented Socket Client

Consider a simplistic connection-oriented “echo-like” protocol: a server lets clients connect to its listening socket, receives arbitrary bytes from them, and sends back to each client the same bytes that client sent to the server, until the client closes the connection. Here’s an example of an elementary test client (fully coded to run equally well in v2 and v3)1:

# coding: utf-8
from __future__ import print_function
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.connect(('localhost', 8881))
    print('Connected to server')
    data = u"""A few lines of text
including non-ASCII characters: €£
to test the operation
of both server
and client."""
    for line in data.splitlines():
        sock.sendall(line.encode('utf-8'))
        print('Sent:', line)
        response = sock.recv(1024)
        print('Recv:', response.decode('utf-8'))
print('Disconnected from server')

Note that the data is text, so it must be encoded with a suitable representation, for which we chose the usual suspect—UTF-8. The server works in terms of bytes (since it is bytes, AKA octets, that travel on the network); the received bytes object gets decoded with UTF-8 back into Unicode text before printing. Any other suitable codec could be used: the key point is that text must be encoded before transmission and decoded after reception. The server, working in terms of bytes, does not even need to know which encoding is being used, except maybe for logging purposes.
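One subtlety worth sketching: a TCP connection carries a stream of bytes, so a chunk returned by recv may end in the middle of a multibyte character. The following v3 sketch simulates such a split by slicing the encoded bytes by hand, and uses an incremental decoder from the standard library's codecs module to reassemble the text. The split point is our arbitrary choice for illustration:

```python
import codecs

text = u'€£'
data = text.encode('utf-8')            # 5 bytes: 3 for the euro sign, 2 for the pound sign
decoder = codecs.getincrementaldecoder('utf-8')()
first = decoder.decode(data[:2])       # incomplete character: nothing decoded yet
rest = decoder.decode(data[2:])        # buffered bytes plus the remainder
print(len(text), len(data), repr(first), repr(rest))
```

Calling data[:2].decode('utf-8') directly would instead raise UnicodeDecodeError, which is why simple examples such as this client keep each message short and decode each response as a whole.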

A Connection-Oriented Socket Server

Here is a simplistic server corresponding to the testing client shown in “A Connection-Oriented Socket Client”, using multithreading via concurrent.futures, covered in “The concurrent.futures Module”, also fully coded to run equally well in v3 and v2 (as long as you’ve installed the concurrent backport—see “The concurrent.futures Module”):

from __future__ import print_function
from concurrent import futures as cf
import socket

def handle(new_sock, address):
    print('Connected from', address)
    while True:
        received = new_sock.recv(1024)
        if not received: break
        s = received.decode('utf-8', errors='replace')
        print('Recv:', s)
        new_sock.sendall(received)
        print('Echo:', s)
    new_sock.close()
    print('Disconnected from', address)

servsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
servsock.bind(('localhost', 8881))
servsock.listen(5)
print('Serving at', servsock.getsockname())

with cf.ThreadPoolExecutor(20) as e:
    try:
        while True:
            new_sock, address = servsock.accept()
            e.submit(handle, new_sock, address)
    except KeyboardInterrupt:
        pass
    finally:
        servsock.close()

This server has its limits. In particular, it runs only 20 threads, so it cannot be simultaneously serving more than 20 clients; any further client trying to connect while 20 are being served waits in servsock’s listening queue, and (should that queue fill up with 5 clients waiting to be accepted) further clients attempting connection get rejected outright. This server is intended just as an elementary example for demonstration purposes, not as a solid, scalable, or secure system.

Transport Layer Security (TLS, AKA SSL)

Transport Layer Security (TLS) is often also known as Secure Sockets Layer (SSL), which was in fact the name of its predecessor protocol. TLS provides privacy and data integrity over TCP/IP, helping you defend against server impersonation, eavesdropping on the bytes being exchanged, and malicious alteration of those bytes. For an introduction to TLS, we recommend the extensive Wikipedia entry.

In Python, you can use TLS via the ssl module of the standard library, documented in detail online. To use ssl well, you need ssl’s rich online docs, as well as a deep and broad understanding of TLS itself (the Wikipedia article, excellent and vast as it is, can only begin to cover this large, difficult subject). In particular, the security considerations section of the online docs must be entirely learned and completely understood, as must the materials at the many links helpfully offered in that section.

If these considerations make it look like perfect implementation of security precautions is a daunting task, that’s because it is. In security, you’re pitting your wits and skills against those of sophisticated attackers who may be more familiar with all the nooks and crannies of the problems involved, since they specialize in finding workarounds and breaking in. Your own focus, by contrast, usually can’t be exclusively on such issues, because you’re trying to provide useful services in your code. It is nevertheless quite risky to treat security as an afterthought: it has to be front and center throughout if you are to win that battle of skills and wits.

That said, we strongly recommend undertaking the above-outlined study of TLS to all readers—the better all developers understand security considerations, the better off we all are (except, we guess, for “black-hat” security-breaker wannabes).

Unless you have acquired a really deep and broad understanding of TLS and Python’s ssl module (in which case, you’ll know what exactly to do—better than we possibly could!), we recommend using an SSLContext instance to hold all details of your use of TLS. Build that instance with the ssl.create_default_context function, add your certificate if needed (it is needed if you’re writing a secure server), then use the instance’s wrap_socket method to wrap (almost2) every socket.socket instance you make into an instance of ssl.SSLSocket—behaving almost identically to the socket object it wraps, but nearly transparently adding security checks and validation “on the side.”

The default TLS contexts strike a good compromise between security and broad usability, and we recommend you stick by them (unless you’re knowledgeable enough to further fine-tune and tighten security for special needs). If you need to support outdated counterparts, ones unable to use the most recent, most secure implementations of TLS, you may feel tempted to learn just enough to relax your security demands; do that at your own risk—we most definitely don’t recommend wandering into such “HC SVNT DRACONES” (here be dragons) lands!

In the following sections, we cover the minimal subset of ssl you need if you just want to follow our recommendations. But, remember, even if that is the case, please also read up on TLS and ssl, just to gain some background knowledge about the intricate issues involved. It may stand you in good stead one day!

SSLContext

The ssl module supplies an ssl.SSLContext class, whose instances hold information about TLS configuration (including certificates and private keys) and offer many methods to set, change, check, and use that information. If you know exactly what you’re doing, you can manually instantiate, set up, and use your own SSLContext instances for your own specialized purposes.

However, we recommend instead that you instantiate an SSLContext using the well-tuned function named ssl.create_default_context with a single argument: ssl.Purpose.CLIENT_AUTH if your code is a server (and thus may need to authenticate clients), or ssl.Purpose.SERVER_AUTH if your code is a client (and thus definitely needs to authenticate servers). If your code is both a client to some servers and a server to other clients (as, for example, some Internet proxies are), then you’ll need two instances of SSLContext, one for each purpose.
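To see what the two purposes imply, this short v3 sketch builds one context of each kind and inspects two of their attributes. The variable names client_ctx and server_ctx are ours:

```python
import ssl

# For code acting as a client: the context must authenticate servers.
client_ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
# For code acting as a server: the context may authenticate clients.
server_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)

client_requires_cert = client_ctx.verify_mode == ssl.CERT_REQUIRED
server_requires_cert = server_ctx.verify_mode == ssl.CERT_REQUIRED
print(client_requires_cert, client_ctx.check_hostname)
print(server_requires_cert, server_ctx.check_hostname)
```

By default, then, the client-side context insists on a valid server certificate and checks the hostname, while the server-side context does not demand certificates from clients.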

For most client-side uses, your SSLContext is ready. If you’re coding a server, or a client for one of the rare servers that require TLS authentication of the clients, you need to have a certificate file and a key file, and add them to the SSLContext instance (so that counter-parties can verify your identity) with code such as, for example:

ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
ctx.load_cert_chain(certfile='mycert.pem', keyfile='mykey.key')

passing the paths to the certificate and key files to the load_cert_chain method. (See the online docs to learn how to obtain key and certificate files.)

Once your context instance ctx is ready, if you’re coding a client, just call ctx.wrap_socket to wrap any socket you’re about to connect to a server, and use the wrapped result (an instance of ssl.SSLSocket) instead of the socket you just wrapped. For example:

sock = socket.socket(socket.AF_INET)
sock = ctx.wrap_socket(sock, server_hostname='www.example.com')
sock.connect(('www.example.com', 443))
# just use `sock` normally from here onwards

Note that, in the client case, you should also pass wrap_socket a server_hostname argument corresponding to the server you’re about to connect to; this way, the connection can verify that the identity of the server you end up connecting to is indeed correct, one absolutely crucial step to any Internet security.

Server-side, don’t wrap the socket that you are binding to an address, listening on, or accepting connections on; just wrap the new socket that accept returns. For example:

sock = socket.socket(socket.AF_INET)
sock.bind(('www.example.com', 443))
sock.listen(5)
while True:
    newsock, fromaddr = sock.accept()
    newsock = ctx.wrap_socket(newsock, server_side=True)
    # deal with `newsock` as usual, shutdown and close it when done

In this case, what you need to pass to wrap_socket is an argument server_side=True, so it knows that you’re on the server side of things.

Again, we recommend the online docs, and particularly the examples, for better understanding of even this simple subset of ssl operations.

1 Note this client example isn’t secure; see “Transport Layer Security (TLS, AKA SSL)” for an introduction to making it secure.

2 We say “almost” because, when you code a server, you don’t wrap the socket you bind, listen on, and accept connections from.
