Chapter 12. Network Scripting

“Tune In, Log On, and Drop Out”

Over the 15 years since this book was first published, the Internet has virtually exploded onto the mainstream stage. It has rapidly grown from a simple communication device used primarily by academics and researchers into a medium that is now nearly as pervasive as the television and telephone. Social observers have likened the Internet’s cultural impact to that of the printing press, and technical observers have suggested that all new software development of interest occurs only on the Internet. Naturally, time will be the final arbiter for such claims, but there is little doubt that the Internet is a major force in society and one of the main application contexts for modern software systems.

The Internet also happens to be one of the primary application domains for the Python programming language. In the decade and a half since the first edition of this book was written, the Internet’s growth has strongly influenced Python’s tool set and roles. Given Python and a computer with a socket-based Internet connection today, we can write Python scripts to read and send email around the world, fetch web pages from remote sites, transfer files by FTP, program interactive websites, parse HTML and XML files, and much more, simply by using the Internet modules that ship with Python as standard tools.

In fact, companies all over the world do: Google, YouTube, Walt Disney, Hewlett-Packard, JPL, and many others rely on Python’s standard tools to power their websites. For example, the Google search engine—widely credited with making the Web usable—makes extensive use of Python code. The YouTube video server site is largely implemented in Python. And the BitTorrent peer-to-peer file transfer system—written in Python and downloaded by tens of millions of users—leverages Python’s networking skills to share files among clients and remove some server bottlenecks.

Many also build and manage their sites with larger Python-based toolkits. For instance, the Zope web application server was an early entrant to the domain and is itself written and customizable in Python. Others build sites with the Plone content management system, which is built upon Zope and delegates site content to its users. Still others use Python to script Java web applications with Jython (formerly known as JPython)—a system that compiles Python programs to Java bytecode, exports Java libraries for use in Python scripts, and allows Python code to serve as web applets downloaded and run in a browser.

In more recent years, new techniques and systems have risen to prominence in the Web sphere. For example, XML-RPC and SOAP interfaces for Python have enabled web service programming; frameworks such as Google App Engine, Django, and TurboGears have emerged as powerful tools for constructing websites; the XML package in Python’s standard library, as well as third-party extensions, provides a suite of XML processing tools; and the IronPython implementation provides seamless .NET/Mono integration for Python code in much the same way Jython leverages Java libraries.

As the Internet has grown, so too has Python’s role as an Internet tool. Python has proven to be well suited to Internet scripting for some of the very same reasons that make it ideal in other domains. Its modular design and rapid turnaround mix well with the intense demands of Internet development. In this part of the book, we’ll find that Python does more than simply support Internet scripting; it also fosters qualities such as productivity and maintainability that are essential to Internet projects of all shapes and sizes.

Internet Scripting Topics

Internet programming entails many topics, so to make the presentation easier to digest, I’ve split this subject over the next five chapters of this book. Here’s this part’s chapter rundown:

  • This chapter introduces Internet fundamentals and explores sockets, the underlying communications mechanism of the Internet. We met sockets briefly as IPC tools in Chapter 5 and again in a GUI use case in Chapter 10, but here we will study them in the depth afforded by their broader networking roles.

  • Chapter 13 covers the fundamentals of client-side scripting and Internet protocols. Here, we’ll explore Python’s standard support for FTP, email, HTTP, NNTP, and more.

  • Chapter 14 presents a larger client-side case study: PyMailGUI, a full-featured email client.

  • Chapter 15 discusses the fundamentals of server-side scripting and website construction. We’ll study basic CGI scripting techniques and concepts that underlie most of what happens in the Web.

  • Chapter 16 presents a larger server-side case study: PyMailCGI, a full-featured webmail site.

Each chapter assumes you’ve read the previous one, but you can generally skip around, especially if you have prior experience in the Internet domain. Since these chapters represent a substantial portion of this book at large, the following sections go into a few more details about what we’ll be studying.

What we will cover

In conceptual terms, the Internet can roughly be thought of as being composed of multiple functional layers:

Low-level networking layers

Mechanisms such as the TCP/IP transport protocol, which deal with transferring bytes between machines but don’t care what they mean

Sockets

The programmer’s interface to the network, which runs on top of lower-level networking layers like TCP/IP and supports flexible client/server models in both IPC and networked modes

Higher-level protocols

Structured Internet communication schemes such as FTP and email, which run on top of sockets and define message formats and standard addresses

Server-side web scripting

Application models such as CGI, which define the structure of communication between web browsers and web servers, run on top of sockets as well, and support the notion of web-based programs

Higher-level frameworks and tools

Third-party systems such as Django, App Engine, Jython, and pyjamas, which leverage sockets and communication protocols, too, but address specific techniques or larger problem domains

This book covers the middle three tiers in this list—sockets, the Internet protocols that run on them, and the CGI model of web-based conversations. What we learn here will also apply to more specific toolkits in the last tier above, because they are all ultimately based upon the same Internet and web fundamentals.

More specifically, in this and the next chapter, our main focus is on programming the second and third layers: sockets and higher-level Internet protocols. We’ll start this chapter at the bottom, learning about the socket model of network programming. Sockets aren’t strictly tied to Internet scripting, as we saw in Chapter 5’s IPC examples, but they are presented in full here because this is one of their primary roles. As we’ll see, most of what happens on the Internet happens through sockets, whether you notice or not.

After introducing sockets, the next two chapters make their way up to Python’s client-side interfaces to higher-level protocols—things like email and FTP transfers, which run on top of sockets. It turns out that a lot can be done with Python on the client alone, and Chapters 13 and 14 will sample the flavor of Python client-side scripting. Finally, the last two chapters in this part of the book then move on to present server-side scripting—programs that run on a server computer and are usually invoked by a web browser.

What we won’t cover

Now that I’ve told you what we will cover in this book, I also want to be clear about what we won’t cover. Like tkinter, the Internet is a vast topic, and this part of the book is mostly an introduction to its core concepts and an exploration of representative tasks. Because there are so many Internet-related modules and extensions, this book does not attempt to serve as an exhaustive survey of the domain. Even in just Python’s own tool set, there are simply too many Internet modules to include each in this text in any sort of useful fashion.

Moreover, higher-level tools like Django, Jython, and App Engine are very large systems in their own right, and they are best dealt with in more focused documents. Because dedicated books on such topics are now available, we’ll merely scratch their surfaces here with a brief survey later in this chapter. This book also says almost nothing about lower-level networking layers such as TCP/IP. If you’re curious about what happens on the Internet at the bit-and-wire level, consult a good networking text for more details.

In other words, this part is not meant to be an exhaustive reference to Internet and web programming with Python—a topic which has evolved between prior editions of this book, and will undoubtedly continue to do so after this one is published. Instead, the goal of this part of the book is to serve as a tutorial introduction to the domain to help you get started, and to provide context and examples which will help you understand the documentation for tools you may wish to explore after mastering the fundamentals here.

Other themes in this part of the book

Like the prior parts of the book, this one has other agendas, too. Along the way, this part will also put to work many of the operating-system and GUI interfaces we studied in Parts II and III (e.g., processes, threads, signals, and tkinter). We’ll also get to see the Python language applied in realistically scaled programs, and we’ll investigate some of the design choices and challenges that the Internet presents.

That last statement merits a few more words. Internet scripting, like GUIs, is one of the “sexier” application domains for Python. As in GUI work, there is an intangible but instant gratification in seeing a Python Internet program ship information all over the world. On the other hand, by its very nature, network programming can impose speed overheads and user interface limitations. Though it may not be a fashionable stance these days, some applications are still better off not being deployed on the Web.

A traditional “desktop” GUI like those of Part III, for example, can combine the feature-richness and responsiveness of client-side libraries with the power of network access. On the other hand, web-based applications offer compelling benefits in portability and administration. In this part of the book, we will take an honest look at the Net’s trade-offs as they arise and explore examples which illustrate the advantages of both web and nonweb architectures. In fact, the larger PyMailGUI and PyMailCGI examples we’ll explore are intended in part to serve this purpose.

The Internet is also considered by many to be something of an ultimate proof of concept for open source tools. Indeed, much of the Net runs on top of a large number of such tools, such as Python, Perl, the Apache web server, the sendmail program, MySQL, and Linux.[42] Moreover, new tools and technologies for programming the Web sometimes seem to appear faster than developers can absorb them.

The good news is that Python’s integration focus makes it a natural in such a heterogeneous world. Today, Python programs can be installed as client-side and server-side tools; used as applets and servlets in Java applications; mixed into distributed object systems like CORBA, SOAP, and XML-RPC; integrated into AJAX-based applications; and much more. In more general terms, the rationale for using Python in the Internet domain is exactly the same as in any other—Python’s emphasis on quality, productivity, portability, and integration makes it ideal for writing Internet programs that are open, maintainable, and delivered according to the ever-shrinking schedules in this field.

Running Examples in This Part of the Book

Internet scripts generally imply execution contexts that earlier examples in this book have not. That is, it usually takes a bit more to run programs that talk over networks. Here are a few pragmatic notes about this part’s examples, up front:

  • You don’t need to download extra packages to run examples in this part of the book. All of the examples we’ll see are based on the standard set of Internet-related modules that come with Python and are installed in Python’s library directory.

  • You don’t need a state-of-the-art network link or an account on a web server to run the socket and client-side examples in this part. Although some socket examples will be shown running remotely, most can be run on a single local machine. Client-side examples that demonstrate protocols such as FTP require only basic Internet access, and email examples expect just POP- and SMTP-capable servers.

  • You don’t need an account on a web server machine to run the server-side scripts in later chapters; they can be run by any web browser. You may need such an account to change these scripts if you store them remotely, but not if you use a locally running web server as we will in this book.

We’ll discuss configuration details as we move along, but in short, when a Python script opens an Internet connection (with the socket module or one of the Internet protocol modules), Python will happily use whatever Internet link exists on your machine, be that a dedicated T1 line, a DSL line, or a simple modem. For instance, opening a socket on a Windows PC automatically initiates processing to create a connection to your Internet provider if needed.

Moreover, as long as your platform supports sockets, you probably can run many of the examples here even if you have no Internet connection at all. As we’ll see, a machine name localhost or "" (an empty string) usually means the local computer itself. This allows you to test both the client and the server sides of a dialog on the same computer without connecting to the Net. For example, you can run both socket-based clients and servers locally on a Windows PC without ever going out to the Net. In other words, you can likely run the programs here whether you have a way to connect to the Internet or not.

Some later examples assume that a particular kind of server is running on a server machine (e.g., FTP, POP, SMTP), but client-side scripts themselves work on any Internet-aware machine with Python installed. Server-side examples in Chapters 15 and 16 require more: to develop CGI scripts, you’ll need to either have a web server account or run a web server program locally on your own computer (which is easier than you may think—we’ll learn how to code a simple one in Python in Chapter 15). Advanced third-party systems like Jython and Zope must be downloaded separately, of course; we’ll peek at some of these briefly in this chapter but defer to their own documentation for more details.

Python Internet Development Options

Although many are outside our scope here, there are a variety of ways that Python programmers script the Web. Just as we did for GUIs, I want to begin with a quick overview of some of the more popular tools in this domain before we jump into the fundamentals.

Networking tools

As we’ll see in this chapter, Python comes with tools that support basic networking, as well as the implementation of custom network servers. This includes sockets, but also the select call for asynchronous servers, as well as higher-order and precoded socket server classes. The standard library modules socket, select, and socketserver support all these roles.
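
In outline, the precoded server classes can be put to work in just a few lines. The following is a minimal sketch, not a production server: the echo handler and the use of port 0 (which asks the operating system for any free port) are illustrative choices for a self-contained local demo:

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Called once per client connection: echo one line back
        line = self.rfile.readline()
        self.wfile.write(line)

# Port 0 asks the OS for any free port; handy for local testing
server = socketserver.ThreadingTCPServer(('localhost', 0), EchoHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as a client of our own server, on the same machine
with socket.create_connection((host, port)) as sock:
    sock.sendall(b'ping\n')
    reply = sock.makefile('rb').readline()

server.shutdown()
print(reply)        # b'ping\n'
```

The ThreadingTCPServer variant spawns a thread per client, so the listening loop itself never blocks on a slow connection.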

Client-side protocol tools

As we’ll see in the next chapter, Python’s Internet arsenal also includes canned support for the client side of most standard Internet protocols—scripts can easily make use of FTP, email, HTTP, Telnet, and more. Especially when wedded to desktop GUIs of the sort we met in the preceding part of this book, these tools open the door to full-featured and highly responsive Web-aware applications.
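
As a representative sketch of the client-side flavor, the following fetches a page over HTTP with urllib.request; to keep it self-contained, it spins up a throwaway local web server rather than assuming a live network link (the page text and handler are illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve one fixed page for any GET request
        body = b'<html><body>Hello network world</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):       # silence per-request logging
        pass

server = HTTPServer(('localhost', 0), Hello)    # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: one call hides sockets, headers, and the HTTP dialog
with urllib.request.urlopen('http://localhost:%d/' % port) as reply:
    page = reply.read()

server.shutdown()
print(page)
```

The other protocol modules (ftplib, poplib, smtplib, and so on) have a similar shape: construct a connection object, call its methods, and let the module manage the socket dialog.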

Server-side CGI scripting

Perhaps the simplest way to implement interactive website behavior, CGI scripting is an application model for running scripts on servers to process form data, take action based upon it, and produce reply pages. We’ll use it later in this part of the book. It’s supported by Python’s standard library directly, is the basis for much of what happens on the Web, and suffices for simpler site development tasks. Raw CGI scripting doesn’t by itself address issues such as cross-page state retention and concurrent updates, but CGI scripts that use devices like cookies and database systems often can.
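
In skeleton form, a CGI script just reads its inputs from the environment and writes an HTTP header line plus HTML reply to standard output. The sketch below simulates the QUERY_STRING a web server would set, and parses it directly with urllib.parse (an illustrative shortcut; real scripts often use higher-level form-processing tools):

```python
import os
from urllib.parse import parse_qs

# Simulate the environment a web server sets for a GET request;
# the field names and values here are purely illustrative
os.environ['QUERY_STRING'] = 'user=Bob&age=40'

form = parse_qs(os.environ['QUERY_STRING'])     # {'user': ['Bob'], ...}
user = form.get('user', ['unknown'])[0]

# A content-type header, a blank line, then the HTML reply page;
# the server relays this stdout text back to the browser
reply = 'Content-type: text/html\n\n<html><body>Hello %s</body></html>' % user
print(reply)
```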

Web frameworks and clouds

For more demanding Web work, frameworks can automate many of the low-level details and provide more structured and powerful techniques for dynamic site implementation. Beyond basic CGI scripts, the Python world is flush with third-party web frameworks such as Django—a high-level framework that encourages rapid development and clean, pragmatic design and includes a dynamic database access API and its own server-side templating language; Google App Engine—a “cloud computing” framework that provides enterprise-level tools for use in Python scripts and allows sites to leverage the capacity of Google’s Web infrastructure; and TurboGears—an integrated collection of tools including a JavaScript library, a template system, CherryPy for web interaction, and SQLObject for accessing databases using Python’s class model.

Also in the framework category are Zope—an open source web application server and toolkit, written in and customizable with Python, in which websites are implemented using a fundamentally object-oriented model; Plone—a Zope-based website builder which provides a workflow model (called a content management system) that allows content producers to add their content to a site; and other popular systems for website construction, including pylons, web2py, CherryPy, and Webware.

Many of these frameworks are based upon the now widespread MVC (model-view-controller) structure, and most provide state retention solutions that wrap database storage. Some make use of the ORM (object relational mapping) model we’ll meet in the next part of the book, which superimposes Python’s classes onto relational database tables, and Zope stores objects in your site in the ZODB object-oriented database we’ll study in the next part as well.

Rich Internet Applications (revisited)

Discussed at the start of Chapter 7, newer and emerging “rich Internet application” (RIA) systems such as Flex, Silverlight, JavaFX, and pyjamas allow user interfaces implemented in web browsers to be much more dynamic and functional than HTML has traditionally allowed. These are client-side solutions, based generally upon AJAX and JavaScript, which provide widget sets that rival those of traditional “desktop” GUIs and provide for asynchronous communication with web servers. According to some observers, such interactivity is a major component of the “Web 2.0” model.

Ultimately, the web browser is a “desktop” GUI application, too, albeit one which is very widely available and which can be generalized with RIA techniques to serve as a platform for rendering other GUIs, using software layers that do not rely on a particular GUI library. In effect, RIAs turn web browsers into extendable GUIs.

At least that’s their goal today. Compared to traditional GUIs, RIAs gain some portability and deployment simplicity, in exchange for decreased performance and increased software stack complexity. Moreover, much as in the GUI realm, there are already competing RIA toolkits today which may add dependencies and impact portability. Unless a pervasive frontrunner appears, using a RIA application may require an install step, not unlike desktop applications.

Stay tuned, though; like the Web at large, the RIA story is still a work in progress. The emerging HTML5 standard, for instance, while likely not to become prevalent for some years to come, may obviate the need for RIA browser plug-ins eventually.

Web services: XML-RPC, SOAP

XML-RPC is a technology that provides remote procedure calls to components over networks. It routes requests over the HTTP protocol and ships data back and forth packaged as XML text. To clients, web servers appear to be simple functions; when function calls are issued, passed data is encoded as XML and shipped to remote servers using the Web’s HTTP transport mechanism. The net effect is to simplify the interface to web servers in client-side programs.

More broadly, XML-RPC fosters the notion of web services—reusable software components that run on the Web—and is supported by Python’s xmlrpc.client module, which handles the client side of this protocol, and xmlrpc.server, which provides tools for the server side. SOAP is a similar but generally heavier web services protocol, available to Python in the third-party SOAPy and ZSI packages, among others.
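
A minimal round trip shows the flavor; the add function, the use of port 0, and the pairing of client and server on localhost are demo assumptions:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server side: publish a function under a name clients can call
server = SimpleXMLRPCServer(('localhost', 0), logRequests=False)
server.register_function(lambda x, y: x + y, 'add')
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the server looks like an ordinary object with methods;
# arguments and results are shipped as XML over HTTP under the hood
proxy = xmlrpc.client.ServerProxy('http://localhost:%d' % port)
result = proxy.add(2, 40)

server.shutdown()
print(result)       # 42
```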

CORBA ORBs

An earlier but comparable technology, CORBA is an architecture for distributed programming, in which components communicate across a network by routing calls through an Object Request Broker (ORB). Python support for CORBA is available in the third-party OmniORB package, as well as the (still available though not recently maintained) ILU system.

Java and .NET: Jython and IronPython

We also met Jython and IronPython briefly at the start of Chapter 7, in the context of GUIs. By compiling Python scripts to Java bytecode, Jython allows them to be used in any context that Java programs can. This includes web-oriented roles, such as applets stored on the server but run on the client when referenced within web pages. The IronPython system also mentioned in Chapter 7 similarly offers Web-focused options, including access to the Silverlight RIA framework and its Moonlight implementation in the Mono system for Linux.

Screen scraping: XML and HTML parsing tools

Though not technically tied to the Internet, XML text often appears in such roles. Because of its other roles, though, we’ll study Python’s basic XML parsing support, as well as third-party extensions to it, in the next part of this book, when we explore Python’s text processing toolkit. As we’ll see, Python’s xml package comes with support for DOM, SAX, and ElementTree style XML parsing, and the open source domain provides extensions for XPath and much more. Python’s html.parser library module also provides an HTML-specific parser, with a model not unlike that of XML’s SAX technique. Such tools can be used in screen scraping roles, to extract the content of web pages fetched with urllib.request tools.
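
For instance, a small html.parser subclass suffices to harvest hyperlinks from a page’s markup; the page string below is a stand-in for content that would normally be fetched with urllib.request:

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    # Collect the targets of all hyperlinks seen while parsing
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links.extend(v for (k, v) in attrs if k == 'href')

page = ('<html><body><a href="http://python.org">Python</a>'
        ' and <a href="spam.html">spam</a></body></html>')
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)     # ['http://python.org', 'spam.html']
```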

Windows COM and DCOM

The PyWin32 package allows Python scripts to communicate via COM on Windows to perform feats such as editing Word documents and populating Excel spreadsheets (additional tools support Excel document processing). Though not related to the Internet itself (and being arguably upstaged by .NET in recent years), the distributed extension to COM, DCOM, offers additional options for distributing applications over networks.

Other tools

Other tools serve more specific roles. Among this crowd are mod_python—a system which optimizes the execution of Python server scripts in the Apache web server; Twisted—an asynchronous, event-driven, networking framework written in Python, with support for a large number of network protocols and with precoded implementations of common network servers; HTMLgen—a lightweight tool that allows HTML code to be generated from a tree of Python objects that describes a web page; and Python Server Pages (PSP)—a server-side templating technology that embeds Python code inside HTML, runs it with request context to render part of a reply page, and is strongly reminiscent of PHP, ASP, and JSP.

As you might expect given the prominence of the Web, there are more Internet tools for Python than we have space to discuss here. For more on this front, see the PyPI package index, linked from http://python.org/, or visit your favorite web search engine (some of which are implemented using Python’s Internet tools themselves).

Again, the goal of this book is to cover the fundamentals in an in-depth way, so that you’ll have the context needed to use tools like some of those above well, when you’re ready to graduate to more comprehensive solutions. As we’ll see, the basic model of CGI scripting we’ll meet here illustrates the mechanisms underlying all web development, whether it’s implemented with bare-bones scripts or advanced frameworks.

Because we must walk before we can run well, though, let’s start at the bottom here, and get a handle on what the Internet really is. The Internet today rests upon a rich software stack; while tools can hide some of its complexity, programming it skillfully still requires knowledge of all its layers. As we’ll see, deploying Python on the Web, especially with higher-order web frameworks like those listed above, is only possible because we truly are “surfing on the shoulders of giants.”

Plumbing the Internet

Unless you’ve been living in a cave for the last decade or two, you are probably already familiar with the Internet, at least from a user’s perspective. Functionally, we use it as a communication and information medium, by exchanging email, browsing web pages, transferring files, and so on. Technically, the Internet consists of many layers of abstraction and devices—from the actual wires used to send bits across the world to the web browser that grabs and renders those bits into text, graphics, and audio on your computer.

In this book, we are primarily concerned with the programmer’s interface to the Internet. This, too, consists of multiple layers: sockets, which are programmable interfaces to the low-level connections between machines, and standard protocols, which add structure to discussions carried out over sockets. Let’s briefly look at each of these layers in the abstract before jumping into programming details.

The Socket Layer

In simple terms, sockets are a programmable interface to connections between programs, possibly running on different computers of a network. They allow data formatted as byte strings to be passed between processes and machines. Sockets also form the basis and low-level “plumbing” of the Internet itself: all of the familiar higher-level Net protocols, like FTP, web pages, and email, ultimately occur over sockets. Sockets are also sometimes called communications endpoints because they are the portals through which programs send and receive bytes during a conversation.

Although often used for network conversations, sockets may also be used as a communication mechanism between programs running on the same computer, taking the form of a general Inter-Process Communication (IPC) mechanism. We saw this socket usage mode briefly in Chapter 5. Unlike some IPC devices, sockets are bidirectional data streams: programs may both send and receive data through them.

To programmers, sockets take the form of a handful of calls available in a library. These socket calls know how to send bytes between machines, using lower-level operations such as the TCP network transmission control protocol. At the bottom, TCP knows how to transfer bytes, but it doesn’t care what those bytes mean. For the purposes of this text, we will generally ignore how bytes sent to sockets are physically transferred. To understand sockets fully, though, we need to know a bit about how computers are named.
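
The entire conversation model can be sketched in a few of these calls. The following pairs a client and a server on the local machine; the port is chosen by the operating system (port 0) and the message text is purely illustrative:

```python
import socket
import threading

# Server side: create a socket, bind it to a port, and listen;
# port 0 lets the OS pick any free port for this demo
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 0))
server.listen(1)
port = server.getsockname()[1]

def serve_one():
    conn, addr = server.accept()            # wait for a client to connect
    conn.sendall(b'Hello network world')    # ship bytes through the portal
    conn.close()

threading.Thread(target=serve_one).start()

# Client side: connect to the server's machine name and port number,
# then receive bytes through the same two-way portal
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', port))
data = client.recv(1024)                    # up to 1024 bytes per call
client.close()
server.close()
print(data)                                 # b'Hello network world'
```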

Machine identifiers

Suppose for just a moment that you wish to have a telephone conversation with someone halfway across the world. In the real world, you would probably need either that person’s telephone number or a directory that you could use to look up the number from her name (e.g., a telephone book). The same is true on the Internet: before a script can have a conversation with another computer somewhere in cyberspace, it must first know that other computer’s number or name.

Luckily, the Internet defines standard ways to name both a remote machine and a service provided by that machine. Within a script, the computer program to be contacted through a socket is identified by supplying a pair of values—the machine name and a specific port number on that machine:

Machine names

A machine name may take the form of either a string of numbers separated by dots, called an IP address (e.g., 166.93.218.100), or a more legible form known as a domain name (e.g., starship.python.net). Domain names are automatically mapped into their dotted numeric address equivalent when used, by something called a domain name server—a program on the Net that serves the same purpose as your local telephone directory assistance service. As a special case, the machine name localhost, and its equivalent IP address 127.0.0.1, always mean the same local machine; this allows us to refer to servers running locally on the same computer as its clients.
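
The mapping from names to addresses is available to scripts directly; a quick sketch (the second result varies by machine):

```python
import socket

# Domain names map to dotted IP addresses; 'localhost' is special-cased
# to the loopback address and resolves without touching the network
addr = socket.gethostbyname('localhost')
print(addr)                   # '127.0.0.1'

# This machine's own name can be fetched and resolved the same way
name = socket.gethostname()
print(name)
```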

Port numbers

A port number is an agreed-upon numeric identifier for a given conversation. Because computers on the Net support a variety of services, port numbers are used to name a particular conversation on a given machine. For two machines to talk over the Net, both must associate sockets with the same machine name and port number when initiating network connections. As we’ll see, Internet protocols such as email and the Web have standard reserved port numbers for their connections, so clients can request a service regardless of the machine providing it. Port number 80, for example, usually provides web pages on any web server machine.

The combination of a machine name and a port number uniquely identifies every dialog on the Net. For instance, an ISP’s computer may provide many kinds of services for customers—web pages, Telnet, FTP transfers, email, and so on. Each service on the machine is assigned a unique port number to which requests may be sent. To get web pages from a web server, programs need to specify both the web server’s Internet Protocol (IP) address or domain name and the port number on which the server listens for web page requests.

If this sounds a bit strange, it may help to think of it in old-fashioned terms. To have a telephone conversation with someone within a company, for example, you usually need to dial both the company’s phone number and the extension of the person you want to reach. If you don’t know the company’s number, you can probably find it by looking up the company’s name in a phone book. It’s almost the same on the Net—machine names identify a collection of services (like a company), port numbers identify an individual service within a particular machine (like an extension), and domain names are mapped to IP numbers by domain name servers (like a phone book).

When programs use sockets to communicate in specialized ways with another machine (or with other processes on the same machine), they need to avoid using a port number reserved by a standard protocol—numbers in the range of 0 to 1023—but we first need to discuss protocols to understand why.

The Protocol Layer

Although sockets form the backbone of the Internet, much of the activity that happens on the Net is programmed with protocols,[43] which are higher-level message models that run on top of sockets. In short, the standard Internet protocols define a structured way to talk over sockets. They generally standardize both message formats and socket port numbers:

  • Message formats provide structure for the bytes exchanged over sockets during conversations.

  • Port numbers are reserved numeric identifiers for the underlying sockets over which messages are exchanged.

Raw sockets are still commonly used in many systems, but it is perhaps more common (and generally easier) to communicate with one of the standard higher-level Internet protocols. As we’ll see, Python provides support for standard protocols, which automates most of the socket and message formatting details.

Port number rules

Technically speaking, socket port numbers can be any 16-bit integer value between 0 and 65,535. However, to make it easier for programs to locate the standard protocols, port numbers in the range of 0 to 1023 are reserved and preassigned to the standard higher-level protocols. Table 12-1 lists the ports reserved for many of the standard protocols; each gets one or more preassigned numbers from the reserved range.

Table 12-1. Port numbers reserved for common protocols

Protocol          Common function  Port number  Python module
----------------  ---------------  -----------  ------------------------
HTTP              Web pages        80           http.client, http.server
NNTP              Usenet news      119          nntplib
FTP data default  File transfers   20           ftplib
FTP control       File transfers   21           ftplib
SMTP              Sending email    25           smtplib
POP3              Fetching email   110          poplib
IMAP4             Fetching email   143          imaplib
Finger            Informational    79           n/a
SSH               Command lines    22           n/a: third party
Telnet            Command lines    23           telnetlib
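
These reserved assignments can also be queried from Python itself. Here is a minimal sketch; the fallback dictionary is just an excerpt of Table 12-1, used because the system services database consulted by socket.getservbyname may be absent on some machines:

```python
import socket

# Fallback excerpt of Table 12-1, used if the system database is unavailable
KNOWN_PORTS = {'http': 80, 'ftp': 21, 'smtp': 25, 'pop3': 110,
               'imap': 143, 'telnet': 23, 'nntp': 119}

def reserved_port(service):
    """Return the standard TCP port number for a protocol name."""
    try:
        return socket.getservbyname(service, 'tcp')   # system services database
    except OSError:
        return KNOWN_PORTS[service]                   # fall back on our excerpt

print(reserved_port('http'))     # 80
print(reserved_port('smtp'))     # 25
```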

Clients and servers

To socket programmers, the standard protocols mean that port numbers 0 to 1023 are off-limits to scripts, unless they really mean to use one of the higher-level protocols. This is both by standard and by common sense. A Telnet program, for instance, can start a dialog with any Telnet-capable machine by connecting to its port, 23; without preassigned port numbers, each server might install Telnet on a different port. Similarly, websites listen for page requests from browsers on port 80 by standard; if they did not, you might have to know and type the HTTP port number of every site you visit while surfing the Net.

By defining standard port numbers for services, the Net naturally gives rise to a client/server architecture. On one side of a conversation, machines that support standard protocols perpetually run a set of programs that listen for connection requests on the reserved ports. On the other end of a dialog, other machines contact those programs to use the services they export.

We usually call the perpetually running listener program a server and the connecting program a client. Let’s use the familiar web browsing model as an example. As shown in Table 12-1, the HTTP protocol used by the Web allows clients and servers to talk over sockets on port 80:

Server

A machine that hosts websites usually runs a web server program that constantly listens for incoming connection requests, on a socket bound to port 80. Often, the server itself does nothing but watch for requests on its port perpetually; handling requests is delegated to spawned processes or threads.

Clients

Programs that wish to talk to this server specify the server machine’s name and port 80 to initiate a connection. For web servers, typical clients are web browsers like Firefox, Internet Explorer, or Chrome, but any script can open a client-side connection on port 80 to fetch web pages from the server. The server’s machine name can also be simply “localhost” if it’s the same as the client’s.

In general, many clients may connect to a server over sockets, whether it implements a standard protocol or something more specific to a given application. And in some applications, the notion of client and server is blurred—programs can also pass bytes between each other more as peers than as master and subordinate. An agent in a peer-to-peer file transfer system, for instance, may at various times be both client and server for parts of files transferred.

For the purposes of this book, though, we usually call programs that listen on sockets servers, and those that connect clients. We also sometimes call the machines that these programs run on server and client (e.g., a computer on which a web server program runs may be called a web server machine, too), but this has more to do with the physical than the functional.

Protocol structures

Functionally, protocols may accomplish a familiar task, like reading email or posting a Usenet newsgroup message, but they ultimately consist of message bytes sent over sockets. The structure of those message bytes varies from protocol to protocol, is hidden by the Python library, and is mostly beyond the scope of this book, but a few general words may help demystify the protocol layer.

Some protocols may define the contents of messages sent over sockets; others may specify the sequence of control messages exchanged during conversations. By defining regular patterns of communication, protocols make communication more robust. They can also minimize deadlock conditions—machines waiting for messages that never arrive.

For example, the FTP protocol prevents deadlock by conversing over two sockets: one for control messages only and one to transfer file data. An FTP server listens for control messages (e.g., “send me a file”) on one port, and transfers file data over another. FTP clients open socket connections to the server machine’s control port, send requests, and send or receive file data over a socket connected to a data port on the server machine. FTP also defines standard message structures passed between client and server. The control message used to request a file, for instance, must follow a standard format.

Python’s Internet Library Modules

If all of this sounds horribly complex, cheer up: Python’s standard protocol modules handle all the details. For example, the Python library’s ftplib module manages all the socket and message-level handshaking implied by the FTP protocol. Scripts that import ftplib have access to a much higher-level interface for FTPing files and can be largely ignorant of both the underlying FTP protocol and the sockets over which it runs.[44]

In fact, each supported protocol is represented in Python’s standard library by either a module package of the same name as the protocol or by a module file with a name of the form xxxlib.py, where xxx is replaced by the protocol’s name. The last column in Table 12-1 gives the module name for some standard protocol modules. For instance, FTP is supported by the module file ftplib.py and HTTP by package http.*. Moreover, within the protocol modules, the top-level interface object is often the name of the protocol. So, for instance, to start an FTP session in a Python script, you run import ftplib and pass appropriate parameters in a call to ftplib.FTP; for Telnet, create a telnetlib.Telnet instance.
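
This naming convention can be verified without opening any network connections at all; a quick sketch:

```python
import ftplib, poplib, smtplib

# Module xxxlib exports a top-level object named after protocol xxx;
# instantiating one with a host name would begin a live protocol session.
for module, name in [(ftplib, 'FTP'), (poplib, 'POP3'), (smtplib, 'SMTP')]:
    assert hasattr(module, name)
    print(module.__name__, '->', name)
```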

In addition to the protocol implementation modules in Table 12-1, Python’s standard library also contains modules for fetching replies from web servers for a web page request (urllib.request), parsing and handling data once it has been transferred over sockets or protocols (html.parser, the email.* and xml.* packages), and more. Table 12-2 lists some of the more commonly used modules in this category.
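
For instance, the email package alone can compose and parse a mail message with no network connection at all. This brief sketch uses the modern EmailMessage class and made-up addresses:

```python
from email.message import EmailMessage
from email import message_from_string

# Compose a message (the addresses here are hypothetical)
msg = EmailMessage()
msg['From'] = 'sender@example.com'
msg['To'] = 'receiver@example.com'
msg['Subject'] = 'testing'
msg.set_content('Hello email world')

text = msg.as_string()               # serialized form, ready for smtplib
reply = message_from_string(text)    # parse it back on the receiving end
print(reply['Subject'])              # testing
```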

Table 12-2. Common Internet-related standard modules

Python modules                 Utility
-----------------------------  -----------------------------------------------
socket, ssl                    Network and IPC communications support (TCP/IP,
                               UDP, etc.), plus SSL secure sockets wrapper
cgi                            Server-side CGI script support: parse input
                               stream, escape HTML text, and so on
urllib.request                 Fetch web pages from their addresses (URLs)
urllib.parse                   Parse URL strings into components, escape URL
                               text
http.client, ftplib, nntplib   HTTP (web), FTP (file transfer), and NNTP (news)
                               client protocol modules
http.cookies, http.cookiejar   HTTP cookie support (data stored on clients at
                               websites' request; server- and client-side tools)
poplib, imaplib, smtplib       POP, IMAP (mail fetch), and SMTP (mail send)
                               protocol modules
telnetlib                      Telnet protocol module
html.parser, xml.*             Parse web page contents (HTML and XML documents)
xdrlib, socket                 Encode binary data portably for transmission
struct, pickle                 Encode Python objects as packed binary data or
                               serialized byte strings for transmission
email.*                        Parse and compose email messages with headers,
                               attachments, and encodings
mailbox                        Process on-disk mailboxes and their messages
mimetypes                      Guess file content types from names, and
                               extensions from types
uu, binhex, base64, binascii,  Encode and decode binary (or other) data
quopri, email.*                transmitted as text (automatic in the email
                               package)
socketserver                   Framework for general Net servers
http.server                    Basic HTTP server implementation, with request
                               handlers for simple and CGI-aware servers

We will meet many of the modules in this table in the next few chapters of this book, but not all of them. Moreover, there are additional Internet modules in Python not shown here. The modules demonstrated in this book will be representative, but as always, be sure to see Python’s standard Library Reference Manual for more complete and up-to-date lists and details.
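
As a small taste of the parsing tools in Table 12-2, here is a sketch that uses html.parser to pull the title out of a page's HTML; the page text is hardcoded here, but could just as well have arrived over a socket:

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside a page's <title> tag."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleGrabber()
parser.feed('<html><head><title>Spam</title></head><body>...</body></html>')
print(parser.title)    # Spam
```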

Socket Programming

Now that we’ve seen how sockets figure into the Internet picture, let’s move on to explore the tools that Python provides for programming sockets with Python scripts. This section shows you how to use the Python socket interface to perform low-level network communications. In later chapters, we will instead use one of the higher-level protocol modules that hide underlying sockets. Python’s socket interfaces can be used directly, though, to implement custom network dialogs and to access standard protocols manually.

As previewed in Chapter 5, the basic socket interface in Python is the standard library’s socket module. Like the os POSIX module, Python’s socket module is just a thin wrapper (interface layer) over the underlying C library’s socket calls. Like Python files, it’s also object-based—methods of a socket object implemented by this module call out to the corresponding C library’s operations after data conversions. For instance, the C library’s send and recv function calls become methods of socket objects in Python.

Python’s socket module supports socket programming on any machine that supports BSD-style sockets—Windows, Macs, Linux, Unix, and so on—and so provides a portable socket interface. In addition, this module supports all commonly used socket types—TCP/IP, UDP, datagram, and Unix domain—and can be used as both a network interface API and a general IPC mechanism between processes running on the same machine.

From a functional perspective, sockets are a programmer’s device for transferring bytes between programs, possibly running on different computers. Although sockets themselves transfer only byte strings, we can also transfer Python objects through them by using Python’s pickle module. Because this module converts Python objects such as lists, dictionaries, and class instances to and from byte strings, it provides the extra step needed to ship higher-level objects through sockets when required.

Python’s struct module can also be used to format Python objects as packed binary data byte strings for transmission, but it is generally limited in scope to objects that map to types in the C programming language. The pickle module supports transmission of larger objects, such as dictionaries and class instances. For other tasks, including most standard Internet protocols, simpler formatted byte strings suffice. We’ll learn more about pickle later in this chapter and book.

Beyond basic data communication tasks, the socket module also includes a variety of more advanced tools. For instance, it has calls for the following and more:

  • Converting bytes to a standard network ordering (ntohl, htonl)

  • Querying machine name and address (gethostname, gethostbyname)

  • Wrapping socket objects in a file object interface (sockobj.makefile)

  • Making socket calls nonblocking (sockobj.setblocking)

  • Setting socket timeouts (sockobj.settimeout)
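
A few of these calls can be tried without any network conversation; a short sketch:

```python
import socket

# Byte-order converters: host order to network (big-endian) order and back
packed = socket.htonl(258)
assert socket.ntohl(packed) == 258

# Machine name query (the result varies from host to host)
print(socket.gethostname())

# Timeouts: blocking calls on this socket raise an exception after 2 seconds
sock = socket.socket()
sock.settimeout(2.0)
print(sock.gettimeout())    # 2.0
sock.close()
```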

Provided your Python was compiled with Secure Sockets Layer (SSL) support, the ssl standard library module also supports encrypted transfers with its ssl.wrap_socket call. This call wraps a socket object in SSL logic, which is used in turn by other standard library modules to support the HTTPS secure website protocol (http.client and urllib.request), secure email transfers (poplib and smtplib), and more. We’ll meet some of these other modules later in this part of the book, but we won’t study all of the socket module’s advanced features in this text; see the Python library manual for usage details omitted here.

Socket Basics

Although we won’t get into advanced socket use in this chapter, basic socket transfers are remarkably easy to code in Python. To create a connection between machines, Python programs import the socket module, create a socket object, and call the object’s methods to establish connections and send and receive data.

Sockets are inherently bidirectional in nature, and socket object methods map directly to socket calls in the C library. For example, the script in Example 12-1 implements a program that simply listens for a connection on a socket and echoes back over a socket whatever it receives through that socket, adding Echo=> string prefixes.

Example 12-1. PP4E\Internet\Sockets\echo-server.py
"""
Server side: open a TCP/IP socket on a port, listen for a message from
a client, and send an echo reply; this is a simple one-shot listen/reply
conversation per client, but it goes into an infinite loop to listen for
more clients as long as this server script runs; the client may run on
a remote machine, or on same computer if it uses 'localhost' for server
"""

from socket import *                    # get socket constructor and constants
myHost = ''                             # '' = all available interfaces on host
myPort = 50007                          # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP socket object
sockobj.bind((myHost, myPort))               # bind it to server port number
sockobj.listen(5)                            # listen, allow 5 pending connects

while True:                                  # listen until process killed
    connection, address = sockobj.accept()   # wait for next client connect
    print('Server connected by', address)    # connection is a new socket
    while True:
        data = connection.recv(1024)         # read next line on client socket
        if not data: break                   # send a reply line to the client
        connection.send(b'Echo=>' + data)    # until eof when socket closed
    connection.close()

As mentioned earlier, we usually call programs like this that listen for incoming connections servers because they provide a service that can be accessed at a given machine and port on the Internet. Programs that connect to such a server to access its service are generally called clients. Example 12-2 shows a simple client implemented in Python.

Example 12-2. PP4E\Internet\Sockets\echo-client.py
"""
Client side: use sockets to send data to the server, and print server's
reply to each message line; 'localhost' means that the server is running
on the same machine as the client, which lets us test client and server
on one machine;  to test over the Internet, run a server on a remote
machine, and set serverHost or argv[1] to machine's domain name or IP addr;
Python sockets are a portable BSD socket interface, with object methods
for the standard socket calls available in the system's C library;
"""

import sys
from socket import *              # portable socket interface plus constants
serverHost = 'localhost'          # server name, or: 'starship.python.net'
serverPort = 50007                # non-reserved port used by the server

message = [b'Hello network world']          # default text to send to server
                                            # requires bytes: b'' or str,encode()
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                # server from cmd line arg 1
    if len(sys.argv) > 2:                   # text from cmd line args 2..n
        message = (x.encode() for x in sys.argv[2:])

sockobj = socket(AF_INET, SOCK_STREAM)      # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))   # connect to server machine + port

for line in message:
    sockobj.send(line)                      # send line to server over socket
    data = sockobj.recv(1024)               # receive line from server: up to 1k
    print('Client received:', data)         # bytes are quoted, was `x`, repr(x)

sockobj.close()                             # close socket to send eof to server

Server socket calls

Before we see these programs in action, let’s take a minute to explain how this client and server do their stuff. Both are fairly simple examples of socket scripts, but they illustrate the common call patterns of most socket-based programs. In fact, this is boilerplate code: most connected socket programs generally make the same socket calls that our two scripts do, so let’s step through the important points of these scripts line by line.

Programs such as Example 12-1 that provide services for other programs with sockets generally start out by following this sequence of calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported from the socket module; using them in combination means “create a TCP/IP socket,” the standard communication device for the Internet. More specifically, AF_INET means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol. The AF_INET/SOCK_STREAM combination is the default because it is so common, but it’s typical to make this explicit.

If you use other names in this call, you can instead create things like UDP connectionless sockets (use SOCK_DGRAM second) and Unix domain sockets on the local machine (use AF_UNIX first), but we won’t do so in this book. See the Python library manual for details on these and other socket module options. Using other socket types is mostly a matter of using different forms of boilerplate code.

sockobj.bind((myHost, myPort))

Associates the socket object with an address—for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (“”), which means the machine that the script runs on (formally, all available local and remote interfaces on the machine), and the port is a number outside the range 0 to 1023 (which is reserved for standard protocols, described earlier).

Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parentheses in this call—for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX). Technically, bind takes a tuple of values appropriate for the type of socket created.
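
The port-collision behavior is easy to demonstrate. In this sketch, port number 0 asks the operating system to choose any free port, an idiom not used by the chapter's examples, which hardcode 50007:

```python
from socket import socket, AF_INET, SOCK_STREAM

first = socket(AF_INET, SOCK_STREAM)
first.bind(('', 0))                    # port 0: let the OS pick a free port
port = first.getsockname()[1]          # which port did we get?

second = socket(AF_INET, SOCK_STREAM)
try:
    second.bind(('', port))            # same port, still in use
    in_use = False
except OSError as exc:                 # Python raises an exception
    in_use = True
    print('bind failed:', exc)

first.close()
second.close()
```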

sockobj.listen(5)

Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which happens only if a server isn’t fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs; the value must be at least 1.

At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine) and falls into an infinite loop—while True (or the equivalent while 1 for older Pythons and ex-C programmers)—waiting for them to arrive:

connection, address = sockobj.accept()

Waits for the next client connection request to occur; when it does, the accept call returns a brand-new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj, but communication with a client happens on connection, the new socket. This call actually returns a two-item tuple—address is the connecting client’s Internet address. We can call accept more than one time, to service multiple client connections; that’s why each call returns a new, distinct socket for talking to a particular client.

Once we have a client connection, we fall into another loop to receive data from the client in blocks of up to 1,024 bytes at a time, and echo each block back to the client:

data = connection.recv(1024)

Reads at most 1,024 more bytes of the next message sent from a client (i.e., coming across the network or IPC connection), and returns it to the script as a byte string. We get back an empty byte string when the client has finished—end-of-file is triggered when the client closes its end of the socket.
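
The end-of-file behavior can be demonstrated without a network, using the socket module's socketpair call to make two already-connected sockets in one process (an IPC convenience, not part of the chapter's scripts):

```python
from socket import socketpair

client, server = socketpair()        # two connected sockets, no network
client.send(b'spam')
client.close()                       # close -> end-of-file at the other end

print(server.recv(1024))             # b'spam'  (the buffered data)
print(server.recv(1024))             # b''      (empty bytes: client is done)
server.close()
```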

connection.send(b'Echo=>' + data)

Sends the latest byte string data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here—the next reply line. Technically this call sends as much data as possible, and returns the number of bytes actually sent. To be fully robust, some programs may need to resend unsent portions or use connection.sendall to force all bytes to be sent.
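
The resend loop that sendall performs internally can be sketched by hand, again using socketpair for a self-contained demonstration:

```python
from socket import socketpair

def send_all(sock, data):
    """Keep calling send until every byte is delivered (what sendall does)."""
    while data:
        sent = sock.send(data)       # returns number of bytes actually sent
        data = data[sent:]           # resend whatever was left behind

a, b = socketpair()
send_all(a, b'Echo=>Hello network world')
print(b.recv(1024))                  # b'Echo=>Hello network world'
a.close()
b.close()
```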

connection.close()

Shuts down the connection with this particular client.

Transferring byte strings and objects

So far we’ve seen calls used to transfer data in a server, but what is it that is actually shipped through a socket? As we learned in Chapter 5, sockets by themselves always deal in binary byte strings, not text. To your scripts, this means you must send and will receive byte strings of type bytes, not str, though you can convert to and from text as needed with the bytes.decode and str.encode methods. In our scripts, we use b'...' bytes literals to satisfy socket data requirements. In other contexts, tools such as the struct and pickle modules return the byte strings we need automatically, so no extra steps are needed.
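
The conversions mentioned here are simple method calls; a quick sketch:

```python
text = 'Hello network world'
data = text.encode()        # str -> bytes, required before a socket send
print(data)                 # b'Hello network world'

back = data.decode()        # bytes -> str, after a socket recv
print(back == text)         # True
```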

For example, although the socket model is limited to transferring byte strings, you can send and receive nearly arbitrary Python objects with the standard library pickle object serialization module. Its dumps and loads calls convert Python objects to and from byte strings, ready for direct socket transfer:

>>> import pickle
>>> x = pickle.dumps([99, 100])        # on sending end... convert to byte strings

>>> x                                  # string passed to send, returned by recv
b'\x80\x03]q\x00(KcKde.'

>>> pickle.loads(x)                    # on receiving end... convert back to object
[99, 100]

For simpler types that correspond to those in the C language, the struct module provides the byte-string conversion we need as well:

>>> import struct
>>> x = struct.pack('>ii', 99, 100)    # convert simpler types for transmission
>>> x
b'\x00\x00\x00c\x00\x00\x00d'
>>> struct.unpack('>ii', x)
(99, 100)

When converted this way, Python native objects become candidates for socket-based transfers. See Chapter 4 for more on struct. We previewed pickle and object serialization in Chapter 1, but we’ll learn more about it and its few pickleability constraints when we explore data persistence in Chapter 17.

In fact there are a variety of ways to extend the basic socket transfer model. For instance, much like os.fdopen and open for the file descriptors we studied in Chapter 4, the socket.makefile method allows you to wrap sockets in text-mode file objects that handle text encodings for you automatically. This call also allows you to specify nondefault Unicode encodings and end-line behaviors in text mode with extra arguments in 3.X just like the open built-in function. Because its result mimics file interfaces, the socket.makefile call additionally allows the pickle module’s file-based calls to transfer objects over sockets implicitly. We’ll see more on socket file wrappers later in this chapter.
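
Here is a short, self-contained sketch of the makefile wrappers; it again uses socketpair so that no network is required:

```python
from socket import socketpair

a, b = socketpair()
wfile = a.makefile('w', encoding='utf8')   # text-mode file over socket a
rfile = b.makefile('r', encoding='utf8')   # text-mode file over socket b

wfile.write('spam\n')                      # str, not bytes: encoded for us
wfile.flush()                              # file wrappers buffer, so flush
print(repr(rfile.readline()))              # 'spam\n'
```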

For our simpler scripts here, hardcoded byte strings and direct socket calls do the job. After talking with a given connected client, the server in Example 12-1 goes back to its infinite loop and waits for the next client connection request. Let’s move on to see what happened on the other side of the fence.

Client socket calls

The actual socket-related calls in client programs like the one shown in Example 12-2 are even simpler; in fact, half of that script is preparation logic. The main thing to keep in mind is that the client and server must specify the same port number when opening their sockets and the client must identify the machine on which the server is running; in our scripts, server and client agree to use port number 50007 for their conversation, outside the standard protocol range. Here are the client’s socket calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Creates a Python socket object in the client program, just like the server.

sockobj.connect((serverHost, serverPort))

Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the string name of the service to be contacted. In the client, we can either specify the name of the remote machine as a domain name (e.g., starship.python.net) or numeric IP address. We can also give the server name as localhost (or the equivalent IP address 127.0.0.1) to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client’s port number must match the server’s exactly. Note the nested parentheses again—just as in server bind calls, we really pass the server’s host/port address to connect in a tuple object.

Once the client establishes a connection to the server, it falls into a loop, sending a message one line at a time and printing whatever the server sends back after each line is sent:

sockobj.send(line)

Transfers the next byte-string message line to the server over the socket. Notice that the default list of lines contains bytes strings (b'...'). Just as on the server, data passed through the socket must be a byte string, though it can be the result of a manual str.encode encoding call or an object conversion with pickle or struct if desired. When lines to be sent are given as command-line arguments instead, they must be converted from str to bytes; the client arranges this by encoding in a generator expression (a call map(str.encode, sys.argv[2:]) would have the same effect).
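
The two encoding schemes mentioned here are equivalent; a small sketch with a hypothetical argument list:

```python
argv = ['echo-client.py', 'localhost', 'spam', 'Spam', 'SPAM']  # hypothetical

gen = (x.encode() for x in argv[2:])           # generator expression form
assert list(gen) == [b'spam', b'Spam', b'SPAM']

mapped = map(str.encode, argv[2:])             # equivalent map form
assert list(mapped) == [b'spam', b'Spam', b'SPAM']
```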

data = sockobj.recv(1024)

Reads the next reply line sent by the server program. Technically, this reads up to 1,024 bytes of the next reply message and returns it as a byte string.

sockobj.close()

Closes the connection with the server, sending it the end-of-file signal.

And that’s it. The server exchanges one or more lines of text with each client that connects. The operating system takes care of locating remote machines, routing bytes sent between programs and possibly across the Internet, and (with TCP) making sure that our messages arrive intact. That involves a lot of processing, too—our strings may ultimately travel around the world, crossing phone wires, satellite links, and more along the way. But we can be happily ignorant of what goes on beneath the socket call layer when programming in Python.

Running Socket Programs Locally

Let’s put this client and server to work. There are two ways to run these scripts—on either the same machine or two different machines. To run the client and the server on the same machine, bring up two command-line consoles on your computer, start the server program in one, and run the client repeatedly in the other. The server keeps running and responds to requests made each time you run the client script in the other window.

For instance, here is the text that shows up in the MS-DOS console window where I’ve started the server script:

C:\...\PP4E\Internet\Sockets> python echo-server.py
Server connected by ('127.0.0.1', 57666)
Server connected by ('127.0.0.1', 57667)
Server connected by ('127.0.0.1', 57668)

The output here gives the address (machine IP name and port number) of each connecting client. Like most servers, this one runs perpetually, listening for client connection requests. This server receives three, but I have to show you the client window’s text for you to understand what this means:

C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost spam Spam SPAM
Client received: b'Echo=>spam'
Client received: b'Echo=>Spam'
Client received: b'Echo=>SPAM'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Shrubbery
Client received: b'Echo=>Shrubbery'

Here, I ran the client script three times, while the server script kept running in the other window. Each client connected to the server, sent it a message of one or more lines of text, and read back the server’s reply—an echo of each line of text sent from the client. And each time a client is run, a new connection message shows up in the server’s window (that’s why we got three). Because the server’s coded as an infinite loop, you may need to kill it with Task Manager on Windows when you’re done testing, because a Ctrl-C in the server’s console window is ignored; other platforms may fare better.

It’s important to notice that client and server are running on the same machine here (a Windows PC). The server and client agree on the port number, but they use the machine names "" and localhost, respectively, to refer to the computer on which they are running. In fact, there is no Internet connection to speak of. This is just IPC, of the sort we saw in Chapter 5: sockets also work well as cross-program communications tools on a single machine.

Running Socket Programs Remotely

To make these scripts talk over the Internet rather than on a single machine and sample the broader scope of sockets, we have to do some extra work to run the server on a different computer. First, upload the server’s source file to a remote machine where you have an account and a Python. Here’s how I do it with FTP to a site that hosts a domain name of my own, learning-python.com; most informational lines in the following have been removed, your server name and upload interface details will vary, and there are other ways to copy files to a computer (e.g., FTP client GUIs, email, web page post forms, and so on—see Tips on Using Remote Servers for hints on accessing remote servers):

C:\...\PP4E\Internet\Sockets> ftp learning-python.com
Connected to learning-python.com.
User (learning-python.com:(none)): xxxxxxxx
Password: yyyyyyyy
ftp> mkdir scripts
ftp> cd scripts
ftp> put echo-server.py
ftp> quit

Once you have the server program loaded on the other computer, you need to run it there. Connect to that computer and start the server program. I usually Telnet or SSH into my server machine and start the server program as a perpetually running process from the command line. The & syntax in Unix/Linux shells can be used to run the server script in the background; we could also make the server directly executable with a #! line and a chmod command (see Chapter 3 for details).

Here is the text that shows up in a window on my PC that is running an SSH session with the free PuTTY client, connected to the Linux server where my account is hosted (again, less a few deleted informational lines):

login as: xxxxxxxx
xxxxxxxx@learning-python.com's password: yyyyyyyy
Last login: Fri Apr 23 07:46:33 2010 from 72.236.109.185
[...]$ cd scripts
[...]$ python echo-server.py &
[1] 23016

Now that the server is listening for connections on the Net, run the client on your local computer multiple times again. This time, the client runs on a different machine than the server, so we pass in the server’s domain or IP name as a client command-line argument. The server still uses a machine name of "" because it always listens on whatever machine it runs on. Here is what shows up in the remote learning-python.com server’s SSH window on my PC:

[...]$ Server connected by ('72.236.109.185', 57697)
Server connected by ('72.236.109.185', 57698)
Server connected by ('72.236.109.185', 57699)
Server connected by ('72.236.109.185', 57700)

And here is what appears in the Windows console window where I run the client. A “connected by” message appears in the server SSH window each time the client script is run in the client window:

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com ni Ni NI
Client received: b'Echo=>ni'
Client received: b'Echo=>Ni'
Client received: b'Echo=>NI'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Shrubbery
Client received: b'Echo=>Shrubbery'

The ping command can be used to get an IP address for a machine’s domain name; either machine name form can be used to connect in the client:

C:\...\PP4E\Internet\Sockets> ping learning-python.com
Pinging learning-python.com [97.74.215.115] with 32 bytes of data:
Reply from 97.74.215.115: bytes=32 time=94ms TTL=47
Ctrl-C

C:\...\PP4E\Internet\Sockets> python echo-client.py 97.74.215.115 Brave Sir Robin
Client received: b'Echo=>Brave'
Client received: b'Echo=>Sir'
Client received: b'Echo=>Robin'
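
Name resolution can also be done directly in Python, without parsing ping's output; the sketch below sticks to localhost so it works without a live Internet link, but any reachable domain name can be passed the same way:

```python
import socket

# Map a machine name to its IPv4 address string, as ping does implicitly;
# given a net link, a name like 'learning-python.com' works here too
addr = socket.gethostbyname('localhost')
print(addr)                               # typically '127.0.0.1'

print(socket.gethostname())               # this machine's own name
```

The resulting address string can be passed to the client in place of a domain name, exactly as in the transcript above.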

This output is perhaps a bit understated—a lot is happening under the hood. The client, running on my Windows laptop, connects with and talks to the server program running on a Linux machine perhaps thousands of miles away. It all happens about as fast as when client and server both run on the laptop, and it uses the same library calls; only the server name passed to clients differs.

Though simple, this illustrates one of the major advantages of using sockets for cross-program communication: they naturally support running the conversing programs on different machines, with little or no change to the scripts themselves. In the process, sockets make it easy to decouple and distribute parts of a system over a network when needed.
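
To see this decoupling in miniature, the following self-contained sketch runs the chapter's echo pattern with both ends in one process, the server in a thread; the port is chosen by the OS here rather than the chapter's fixed 50007, and only the name passed to connect would change if the server moved to a remote machine:

```python
import socket, threading

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('', 0))                        # '' = any interface; 0 = any free port
srv.listen(1)
port = srv.getsockname()[1]              # the port the OS picked

def serve_once():
    conn, addr = srv.accept()            # wait for one client connect
    conn.send(b'Echo=>' + conn.recv(1024))
    conn.close()
    srv.close()

threading.Thread(target=serve_once).start()

# Client side: identical whether the host is 'localhost' or a remote name
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', port))
sock.send(b'Hello network world')
reply = sock.recv(1024)
sock.close()
print(reply)                             # b'Echo=>Hello network world'
```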

Socket pragmatics

Before we move on, there are three practical usage details you should know. First, you can run the client and server like this on any two Internet-aware machines where Python is installed. Of course, to run the client and server on different computers, you need both a live Internet connection and access to another machine on which to run the server.

This need not be an expensive proposition, though; when sockets are opened, Python is happy to initiate and use whatever connectivity you have, be it a dedicated T1 line, wireless router, cable modem, or dial-up account. Moreover, if you don’t have a server account of your own like the one I’m using on learning-python.com, simply run client and server examples on the same machine, localhost, as shown earlier; all you need then is a computer that allows sockets, and most do.

Second, the socket module generally raises exceptions if you ask for something invalid. For instance, trying to connect to a nonexistent server (or unreachable servers, if you have no Internet link) fails:

C:\...\PP4E\Internet\Sockets> python echo-client.py www.nonesuch.com hello
Traceback (most recent call last):
  File "echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort))   # connect to server machine...
socket.error: [Errno 10060] A connection attempt failed because the connected
party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond

Finally, also be sure to kill the server process before restarting it again, or else the port number will still be in use, and you’ll get another exception; on my remote server machine:

[...]$ ps -x
  PID TTY      STAT   TIME COMMAND
 5378 pts/0    S      0:00 python echo-server.py
22017 pts/0    Ss     0:00 -bash
26805 pts/0    R+     0:00 ps -x

[...]$ python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 14, in <module>
    sockobj.bind((myHost, myPort))               # bind it to server port number
socket.error: [Errno 10048] Only one usage of each socket address (protocol/
network address/port) is normally permitted
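
One way to soften this restriction is the SO_REUSEADDR socket option, which lets a restarted server rebind its port immediately instead of failing while the prior socket's shutdown is still pending in the kernel; a minimal sketch (binding an OS-chosen port here to avoid clashing with a live server on 50007):

```python
from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR

sockobj = socket(AF_INET, SOCK_STREAM)
sockobj.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # allow quick rebinds
sockobj.bind(('', 0))                    # a real server would bind 50007
sockobj.listen(5)
port = sockobj.getsockname()[1]
print(port)                              # the port actually bound
sockobj.close()
```

Setting the option before the bind call is what matters; it must be in place when the address is claimed.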

A series of Ctrl-Cs will kill the server on Linux (be sure to type fg to bring it to the foreground first if started with an &):

[...]$ fg
python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 18, in <module>
    connection, address = sockobj.accept()   # wait for next client connect
KeyboardInterrupt

As mentioned earlier, a Ctrl-C kill key combination won't kill the server on my Windows 7 machine, however. To kill the perpetually running server process on Windows, you may need to start Task Manager (e.g., using a Ctrl-Alt-Delete key combination), and then end the Python task by selecting it in the process listbox that appears. Closing the window in which the server is running will also suffice on Windows, but you'll lose that window's command history. You can also usually kill a server on Linux with a kill -9 pid shell command if it is running in another window or in the background, but Ctrl-C requires less typing.

Spawning Clients in Parallel

So far, we’ve run a server locally and remotely, and run individual clients manually, one after another. Realistic servers are generally intended to handle many clients, of course, and possibly at the same time. To see how our echo server handles the load, let’s fire up eight copies of the client script in parallel using the script in Example 12-3; see the end of Chapter 5 for details on the launchmodes module used here to spawn clients and alternatives such as the multiprocessing and subprocess modules.

Example 12-3. PP4E\Internet\Sockets\testecho.py
import sys
from PP4E.launchmodes import QuietPortableLauncher

numclients = 8
def start(cmdline):
    QuietPortableLauncher(cmdline, cmdline)()

# start('echo-server.py')              # spawn server locally if not yet started

args = ' '.join(sys.argv[1:])          # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)  # spawn 8? clients to test the server

To run this script, pass no arguments to talk to a server listening on port 50007 on the local machine; pass a real machine name to talk to a server running remotely. Three console windows come into play in this scheme—the client, a local server, and a remote server. On Windows, the clients’ output is discarded when spawned from this script, but it would be similar to what we’ve already seen. Here’s the client window interaction—8 clients are spawned locally to talk to both a local and a remote server:

C:\...\PP4E\Internet\Sockets> set PYTHONPATH=C:\...\dev\Examples

C:\...\PP4E\Internet\Sockets> python testecho.py

C:\...\PP4E\Internet\Sockets> python testecho.py learning-python.com
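
If you'd rather not depend on the book's launchmodes module, an equivalent spawner can be sketched with the standard subprocess module; echo-client.py is assumed to be in the current directory:

```python
import sys, subprocess

numclients = 8
args = sys.argv[1:]                          # optional server name to forward

procs = []
for i in range(numclients):                  # spawn clients without waiting
    cmd = [sys.executable, 'echo-client.py'] + args
    procs.append(subprocess.Popen(cmd))

for proc in procs:                           # then collect all exit codes
    proc.wait()
print(len(procs), 'clients spawned')
```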

If the spawned clients connect to a server run locally (the first run of the script on the client), connection messages show up in the server’s window on the local machine:

C:\...\PP4E\Internet\Sockets> python echo-server.py
Server connected by ('127.0.0.1', 57721)
Server connected by ('127.0.0.1', 57722)
Server connected by ('127.0.0.1', 57723)
Server connected by ('127.0.0.1', 57724)
Server connected by ('127.0.0.1', 57725)
Server connected by ('127.0.0.1', 57726)
Server connected by ('127.0.0.1', 57727)
Server connected by ('127.0.0.1', 57728)

If the server is running remotely, the client connection messages instead appear in the window displaying the SSH (or other) connection to the remote computer, here, learning-python.com:

[...]$ python echo-server.py
Server connected by ('72.236.109.185', 57729)
Server connected by ('72.236.109.185', 57730)
Server connected by ('72.236.109.185', 57731)
Server connected by ('72.236.109.185', 57732)
Server connected by ('72.236.109.185', 57733)
Server connected by ('72.236.109.185', 57734)
Server connected by ('72.236.109.185', 57735)
Server connected by ('72.236.109.185', 57736)

Preview: Denied client connections

The net effect is that our echo server converses with multiple clients, whether running locally or remotely. Keep in mind, however, that this works for our simple scripts only because the server doesn’t take a long time to respond to each client’s requests—it can get back to the top of the server script’s outer while loop in time to process the next incoming client. If it could not, we would probably need to change the server to handle each client in parallel, or some might be denied a connection.

Technically, client connections would fail after 5 clients are already waiting for the server’s attention, as specified in the server’s listen call. To prove this to yourself, add a time.sleep call somewhere inside the echo server’s main loop in Example 12-1 after a connection is accepted, to simulate a long-running task (this is from file echo-server-sleep.py in the examples package if you wish to experiment):

while True:                                  # listen until process killed
    connection, address = sockobj.accept()   # wait for next client connect
    while True:
        data = connection.recv(1024)         # read next line on client socket
        time.sleep(3)                        # take time to process request
        ...

If you then run this server and the testecho clients script, you’ll notice that not all 8 clients wind up receiving a connection, because the server is too busy to empty its pending-connections queue in time. Only 6 clients are served when I run this on Windows—one accepted initially, and 5 in the pending-requests listen queue. The other two clients are denied connections and fail.

The following shows the server and client messages produced when the server is stalled this way, including the error messages that the two denied clients receive. To see the clients’ messages on Windows, you can change testecho to use the StartArgs launcher with a /B switch at the front of the command line to route messages to the persistent console window (see file testecho-messages.py in the examples package):

C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> echo-server-sleep.py
Server connected by ('127.0.0.1', 59625)
Server connected by ('127.0.0.1', 59626)
Server connected by ('127.0.0.1', 59627)
Server connected by ('127.0.0.1', 59628)
Server connected by ('127.0.0.1', 59629)
Server connected by ('127.0.0.1', 59630)

C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> testecho-messages.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
Client received: b'Echo=>Hello network world'

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort))   # connect to server machine...
socket.error: [Errno 10061] No connection could be made because the target
machine actively refused it

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort))   # connect to server machine...
socket.error: [Errno 10061] No connection could be made because the target
machine actively refused it

Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'

As you can see, with such a sleepy server, 8 clients are spawned, but only 6 receive service, and 2 fail with exceptions. Unless clients require very little of the server’s attention, to handle multiple requests overlapping in time we need to somehow service clients in parallel. We’ll see how servers can handle multiple clients more robustly in a moment; first, though, let’s experiment with some special ports.

Talking to Reserved Ports

It’s also important to know that this client and server engage in a proprietary sort of discussion, and so use the port number 50007 outside the range reserved for standard protocols (0 to 1023). There’s nothing preventing a client from opening a socket on one of these special ports, however. For instance, the following client-side code connects to programs listening on the standard email, FTP, and HTTP web server ports on three different server machines:

C:\...\PP4E\Internet\Sockets> python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('pop.secureserver.net', 110))    # talk to POP email server
>>> print(sock.recv(70))
b'+OK <[email protected]>\r\n'
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('learning-python.com', 21))      # talk to FTP server
>>> print(sock.recv(70))
b'220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------\r\n220-You'
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('www.python.net', 80))           # talk to Python's HTTP server

>>> sock.send(b'GET /\r\n')                        # fetch root page reply
7
>>> sock.recv(70)
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n    "http://'
>>> sock.recv(70)
b'www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html xmlns="http://www.'

If we know how to interpret the output returned by these ports’ servers, we could use raw sockets like this to fetch email, transfer files, and grab web pages and invoke server-side scripts. Fortunately, though, we don’t have to worry about all the underlying details—Python’s poplib, ftplib, and http.client and urllib.request modules provide higher-level interfaces for talking to servers on these ports. Other Python protocol modules do the same for other standard ports (e.g., NNTP, Telnet, and so on). We’ll meet some of these client-side protocol modules in the next chapter.[45]
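
For instance, the manual GET conversation above shrinks to a few calls with http.client; to stay self-contained, this sketch talks to a throwaway http.server instance on localhost rather than a live website:

```python
import http.client, http.server, threading

# Spin up a local web server on an OS-chosen port in a background thread
httpd = http.server.HTTPServer(('localhost', 0),
                               http.server.SimpleHTTPRequestHandler)
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()

# Client side: no raw GET strings or recv loops required
conn = http.client.HTTPConnection('localhost', port)
conn.request('GET', '/')
response = conn.getresponse()
print(response.status)                   # 200 on a successful fetch
body = response.read()                   # the reply's content bytes
conn.close()
httpd.shutdown()
```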

Binding reserved port servers

Speaking of reserved ports, it’s all right to open client-side connections on reserved ports as in the prior section, but you can’t install your own server-side scripts for these ports unless you have special permission. On the server I use to host learning-python.com, for instance, the web server port 80 is off limits (presumably, unless I shell out for a virtual or dedicated hosting account):

[...]$ python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)    # try to bind web port on general server
>>> sock.bind(('', 80))                    # learning-python.com is a shared machine
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in bind
socket.error: (13, 'Permission denied')

Even if run by a user with the required permission, you’ll get the different exception we saw earlier if the port is already being used by a real web server. On computers being used as general servers, these ports really are reserved. This is one reason we’ll run a web server of our own locally for testing when we start writing server-side scripts later in this book—the above code works on a Windows PC, which allows us to experiment with websites locally, on a self-contained machine:

C:\...\PP4E\Internet\Sockets> python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)      # can bind port 80 on Windows
>>> sock.bind(('', 80))                      # allows running server on localhost
>>>

We’ll learn more about installing web servers later in Chapter 15. For the purposes of this chapter, we need to get realistic about how our socket servers handle their clients.

Handling Multiple Clients

The echo client and server programs shown previously serve to illustrate socket fundamentals. But the server model used suffers from a fairly major flaw. As described earlier, if multiple clients try to connect to the server, and it takes a long time to process a given client’s request, the server will fail. More accurately, if the cost of handling a given request prevents the server from returning to the code that checks for new clients in a timely manner, it won’t be able to keep up with all the requests, and some clients will eventually be denied connections.

In real-world client/server programs, it’s far more typical to code a server so as to avoid blocking new requests while handling a current client’s request. Perhaps the easiest way to do so is to service each client’s request in parallel—in a new process, in a new thread, or by manually switching (multiplexing) between clients in an event loop. This isn’t a socket issue per se, and we already learned how to start processes and threads in Chapter 5. But since these schemes are so typical of socket server programming, let’s explore all three ways to handle client requests in parallel here.
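
As a quick preview of the thread-based scheme, the sketch below hands each accepted connection to a new thread, so the accept loop never blocks on a slow client; it serves two self-test clients on an OS-chosen port and then exits:

```python
import socket, threading

def handle_client(conn):                 # per-client thread: echo until eof
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.send(b'Echo=>' + data)
    conn.close()

def dispatcher(srv, count):              # main loop: accept, hand off, repeat
    for _ in range(count):
        conn, addr = srv.accept()
        threading.Thread(target=handle_client, args=(conn,)).start()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('', 0))                        # 0 = let the OS pick a free port
srv.listen(5)
port = srv.getsockname()[1]
threading.Thread(target=dispatcher, args=(srv, 2), daemon=True).start()

replies = []
for msg in (b'spam', b'eggs'):           # two clients, served by two threads
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(('localhost', port))
    c.send(msg)
    replies.append(c.recv(1024))
    c.close()
print(replies)                           # [b'Echo=>spam', b'Echo=>eggs']
```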

Forking Servers

The script in Example 12-4 works like the original echo server, but instead forks a new process to handle each new client connection. Because the handleClient function runs in a new process, the dispatcher function can immediately resume its main loop in order to detect and service a new incoming request.

Example 12-4. PP4E\Internet\Sockets\fork-server.py
"""
Server side: open a socket on a port, listen for a message from a client,
and send an echo reply; forks a process to handle each client connection;
child processes share parent's socket descriptors; fork is less portable
than threads--not yet on Windows, unless Cygwin or similar installed;
"""

import os, time, sys
from socket import *                      # get socket constructor and constants
myHost = ''                               # server machine, '' means local host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow 5 pending connects

def now():                                       # current time on server
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                              # reap any dead child processes
    while activeChildren:                        # else may fill up system table
        pid, stat = os.waitpid(0, os.WNOHANG)    # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):                    # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        reapChildren()                           # clean up exited children now
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)
        else:                                    # else: go accept next connect
            activeChildren.append(childPid)      # add to active child pid list

dispatcher()

Running the forking server

Parts of this script are a bit tricky, and most of its library calls work only on Unix-like platforms. Crucially, it runs on Cygwin Python on Windows, but not standard Windows Python. Before we get into too many forking details, though, let’s focus on how this server arranges to handle multiple client requests.

First, notice that to simulate a long-running operation (e.g., database updates, other network traffic), this server adds a five-second time.sleep delay in its client handler function, handleClient. After the delay, the original echo reply action is performed. That means that when we run a server and clients this time, clients won’t receive the echo reply until five seconds after they’ve sent their requests to the server.

To help keep track of requests and replies, the server prints its system time each time a client connect request is received, and adds its system time to the reply. Clients print the reply time sent back from the server, not their own—clocks on the server and client may differ radically, so to compare apples to apples, all times are server times. Because of the simulated delays, we also must usually start each client in its own console window on Windows (clients will hang in a blocked state while waiting for their reply).

But the grander story here is that this script runs one main parent process on the server machine, which does nothing but watch for connections (in dispatcher), plus one child process per active client connection, running in parallel with both the main parent process and the other client processes (in handleClient). In principle, the server can handle any number of clients without bogging down.

To test, let’s first start the server remotely in a SSH or Telnet window, and start three clients locally in three distinct console windows. As we’ll see in a moment, this server can also be run under Cygwin locally if you have Cygwin but don’t have a remote server account like the one on learning-python.com used here:

[server window (SSH or Telnet)]
[...]$ uname -p -o
i686 GNU/Linux
[...]$ python fork-server.py
Server connected by ('72.236.109.185', 58395) at Sat Apr 24 06:46:45 2010
Server connected by ('72.236.109.185', 58396) at Sat Apr 24 06:46:49 2010
Server connected by ('72.236.109.185', 58397) at Sat Apr 24 06:46:51 2010

[client window 1]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 06:46:50 2010"

[client window 2]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce
Client received: b"Echo=>b'Bruce' at Sat Apr 24 06:46:54 2010"

[client window 3]
C:\...\Sockets> python echo-client.py learning-python.com The Meaning of Life
Client received: b"Echo=>b'The' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'Meaning' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'of' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'Life' at Sat Apr 24 06:46:57 2010"

Again, all times here are on the server machine. This may be a little confusing because four windows are involved. In plain English, the test proceeds as follows:

  1. The server starts running remotely.

  2. All three clients are started and connect to the server a few seconds apart.

  3. On the server, the client requests trigger three forked child processes, which all immediately go to sleep for five seconds (to simulate being busy doing something useful).

  4. Each client waits until the server replies, which happens five seconds after their initial requests.

In other words, clients are serviced at the same time by forked processes, while the main parent process continues listening for new client requests. If clients were not handled in parallel like this, no client could connect until the currently connected client’s five-second delay expired.

In a more realistic application, that delay could be fatal if many clients were trying to connect at once—the server would be stuck in the action we’re simulating with time.sleep, and not get back to the main loop to accept new client requests. With process forks per request, clients can be serviced in parallel.

Notice that we’re using the same client script here (echo-client.py, from Example 12-2), just a different server; clients simply send and receive data to a machine and port and don’t care how their requests are handled on the server. The result displayed shows a byte string within a byte string, because the client sends one to the server and the server sends one back; because the server uses string formatting and manual encoding instead of byte string concatenation, the client’s message is shown as byte string explicitly here.

Other run modes: Local servers with Cygwin and remote clients

Also note that the server is running remotely on a Linux machine in the preceding section. As we learned in Chapter 5, the fork call is not supported on Windows in standard Python at the time this book was written. It does run on Cygwin Python, though, which allows us to start this server locally on localhost, on the same machine as its clients:

[Cygwin shell window]
[C:\...\PP4E\Internet\Sockets]$ python fork-server.py
Server connected by ('127.0.0.1', 58258) at Sat Apr 24 07:50:15 2010
Server connected by ('127.0.0.1', 58259) at Sat Apr 24 07:50:17 2010

[Windows console, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py localhost bright side of life
Client received: b"Echo=>b'bright' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'side' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'of' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'life' at Sat Apr 24 07:50:20 2010"

[Windows console, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:50:22 2010"

We can also run this test on the remote Linux server entirely, with two SSH or Telnet windows. It works about the same as when clients are started locally, in a DOS console window, but here “local” actually means a remote machine you’re using locally. Just for fun, let’s also contact the remote server from a locally running client to show how the server is also available to the Internet at large—when servers are coded with sockets and forks this way, clients can connect from arbitrary machines, and can overlap arbitrarily in time:

[one SSH (or Telnet) window]
[...]$ python fork-server.py
Server connected by ('127.0.0.1', 55743) at Sat Apr 24 07:15:14 2010
Server connected by ('127.0.0.1', 55854) at Sat Apr 24 07:15:26 2010
Server connected by ('127.0.0.1', 55950) at Sat Apr 24 07:15:36 2010
Server connected by ('72.236.109.185', 58414) at Sat Apr 24 07:19:50 2010

[another SSH window, same machine]
[...]$ python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:15:19 2010"
[...]$ python echo-client.py localhost niNiNI!
Client received: b"Echo=>b'niNiNI!' at Sat Apr 24 07:15:31 2010"
[...]$ python echo-client.py localhost Say no more!
Client received: b"Echo=>b'Say' at Sat Apr 24 07:15:41 2010"
Client received: b"Echo=>b'no' at Sat Apr 24 07:15:41 2010"
Client received: b"Echo=>b'more!' at Sat Apr 24 07:15:41 2010"

[Windows console, local machine]
C:\...\Internet\Sockets> python echo-client.py learning-python.com Blue, no yellow!
Client received: b"Echo=>b'Blue,' at Sat Apr 24 07:19:55 2010"
Client received: b"Echo=>b'no' at Sat Apr 24 07:19:55 2010"
Client received: b"Echo=>b'yellow!' at Sat Apr 24 07:19:55 2010"

Now that we have a handle on the basic model, let’s move on to the tricky bits. This server script is fairly straightforward as forking code goes, but a few words about the library tools it employs are in order.

Forked processes and sockets

We met os.fork in Chapter 5, but recall that forked processes are essentially a copy of the process that forks them, and so they inherit file and socket descriptors from their parent process. As a result, the new child process that runs the handleClient function has access to the connection socket created in the parent process. Really, this is why the child process works at all—when conversing on the connected socket, it's using the same socket that the parent's accept call returned. Programs know they are in a forked child process if the fork call returns 0; otherwise, the original parent process gets back the new child's ID.

Exiting from children

In earlier fork examples, child processes usually call one of the exec variants to start a new program in the child process. Here, instead, the child process simply calls a function in the same program and exits with os._exit. It’s imperative to call os._exit here—if we did not, each child would live on after handleClient returns, and compete for accepting new client requests.

In fact, without the exit call, we’d wind up with as many perpetual server processes as requests served—remove the exit call and do a ps shell command after running a few clients, and you’ll see what I mean. With the call, only the single parent process listens for new requests. os._exit is like sys.exit, but it exits the calling process immediately without cleanup actions. It’s normally used only in child processes, and sys.exit is used everywhere else.

Killing the zombies: Don’t fear the reaper!

Note, however, that it’s not quite enough to make sure that child processes exit and die. On systems like Linux, though not on Cygwin, parents must also be sure to issue a wait system call to remove the entries for dead child processes from the system’s process table. If we don’t do this, the child processes will no longer run, but they will consume an entry in the system process table. For long-running servers, these bogus entries may become problematic.

It’s common to call such dead-but-listed child processes zombies: they continue to use system resources even though they’ve already passed over to the great operating system beyond. To clean up after child processes are gone, this server keeps a list, activeChildren, of the process IDs of all child processes it spawns. Whenever a new incoming client request is received, the server runs its reapChildren to issue a wait for any dead children by issuing the standard Python os.waitpid(0,os.WNOHANG) call.

The os.waitpid call attempts to wait for a child process to exit and returns its process ID and exit status. With a 0 for its first argument, it waits for any child process. With the WNOHANG parameter for its second, it does nothing if no child process has exited (i.e., it does not block or pause the caller). The net effect is that this call simply asks the operating system for the process ID of any child that has exited. If any have, the process ID returned is removed both from the system process table and from this script’s activeChildren list.
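
The reaping protocol can be watched in isolation with the short sketch below (Unix-like platforms only, since it relies on os.fork):

```python
import os, time

pid = os.fork()
if pid == 0:                             # child: pretend to serve, then exit
    os._exit(0)

time.sleep(0.5)                          # child is now a zombie awaiting a wait
reaped, status = os.waitpid(0, os.WNOHANG)   # collect it without blocking
print(reaped == pid)                     # True: the dead child was reaped
```

If no child had exited yet, the WNOHANG call would return a process ID of 0 immediately instead of pausing the caller, which is why the server can safely run it on every new connection.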

To see why all this complexity is needed, comment out the reapChildren call in this script, run it on a platform where this is an issue, and then run a few clients. On my Linux server, a ps -f full process listing command shows that all the dead child processes stay in the system process table (shown as <defunct>):

[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094   9990 30778  0 04:34 pts/0    00:00:00 python fork-server.py
5693094  10844  9990  0 04:35 pts/0    00:00:00 [python] <defunct>
5693094  10869  9990  0 04:35 pts/0    00:00:00 [python] <defunct>
5693094  11130  9990  0 04:36 pts/0    00:00:00 [python] <defunct>
5693094  11151  9990  0 04:36 pts/0    00:00:00 [python] <defunct>
5693094  11482 30778  0 04:36 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

When the reapChildren command is reactivated, dead child zombie entries are cleaned up each time the server gets a new client connection request, by calling the Python os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but they will remain only until the next client connection is received (you get only as many zombies as processes served in parallel since the last accept):

[...]$ python fork-server.py &
[1] 20515
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  20515 30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  20777 30778  0 04:43 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash
[...]$
Server connected by ('72.236.109.185', 58672) at Sun Apr 25 04:43:51 2010
Server connected by ('72.236.109.185', 58673) at Sun Apr 25 04:43:54 2010
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  20515 30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  21339 20515  0 04:43 pts/0    00:00:00 [python] <defunct>
5693094  21398 20515  0 04:43 pts/0    00:00:00 [python] <defunct>
5693094  21573 30778  0 04:44 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash
[...]$
Server connected by ('72.236.109.185', 58674) at Sun Apr 25 04:44:07 2010
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  20515 30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  21646 20515  0 04:44 pts/0    00:00:00 [python] <defunct>
5693094  21813 30778  0 04:44 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

In fact, if you type fast enough, you can actually see a child process morph from a real running program into a zombie. Here, for example, a child spawned to handle a new request changes to <defunct> on exit. Its connection cleans up lingering zombies, and its own process entry will be removed completely when the next request is received:

[...]$
Server connected by ('72.236.109.185', 58676) at Sun Apr 25 04:48:22 2010
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  20515 30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  27120 20515  0 04:48 pts/0    00:00:00 python fork-server.py
5693094  27174 30778  0 04:48 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  20515 30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  27120 20515  0 04:48 pts/0    00:00:00 [python] <defunct>
5693094  27234 30778  0 04:48 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash
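
To make the reaping step concrete, here is a hedged sketch of what such a reaper might look like on a Unix-like platform. This is an illustration, not the book's fork-server.py code itself; the reapChildren name simply matches the narrative, and the return value is added for demonstration.

```python
import os

def reapChildren():
    # poll for exited children without blocking; a sketch of the reaping
    # step described in the text (Unix-like platforms only)
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(0, os.WNOHANG)   # 0 means "any child"
        except ChildProcessError:                     # no children remain
            break
        if pid == 0:                                  # children alive, none exited
            break
        reaped.append(pid)                            # this zombie is now gone
    return reaped
```

A forking server would call something like this just before each accept, so zombies never outlive the next client connection.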

Preventing zombies with signal handlers on Linux

On some systems, it’s also possible to clean up zombie child processes by resetting the signal handler for the SIGCHLD signal delivered to a parent process by the operating system when a child process stops or exits. If a Python script assigns the SIG_IGN (ignore) action as the SIGCHLD signal handler, zombies will be removed automatically and immediately by the operating system as child processes exit; the parent need not issue wait calls to clean up after them. Because of that, this scheme is a simpler alternative to manually reaping zombies on platforms where it is supported.

If you’ve already read Chapter 5, you know that Python’s standard signal module lets scripts install handlers for signals—software-generated events. By way of review, here is a brief bit of background to show how this pans out for zombies. The program in Example 12-5 installs a Python-coded signal handler function to respond to whatever signal number you type on the command line.

Example 12-5. PP4E\Internet\Sockets\signal-demo.py
"""
Demo Python's signal module; pass signal number as a command-line arg, and use
a "kill -N pid" shell command to send this process a signal; on my Linux machine,
SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and SIGCHLD handler stays in effect even if
not restored: all other handlers are restored by Python after caught, but SIGCHLD
behavior is left to the platform's implementation; signal works on Windows too,
but defines only a few signal types; signals are not very portable in general;
"""

import sys, signal, time

def now():
    return time.asctime()

def onSignal(signum, stackframe):                # Python signal handler
    print('Got signal', signum, 'at', now())     # most handlers stay in effect
    if signum == signal.SIGCHLD:                 # but sigchld handler is not
        print('sigchld caught')
        #signal.signal(signal.SIGCHLD, onSignal)

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                  # install signal handler
while True: signal.pause()                       # sleep waiting for signals

To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line; this is the shell’s equivalent of Python’s os.kill function available on Unix-like platforms only. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal numbers 10 (reserved for general use) and 9 (the unavoidable terminate signal):

[...]$ python signal-demo.py 10 &
[1] 10141
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  10141 30778  0 05:00 pts/0    00:00:00 python signal-demo.py 10
5693094  10228 30778  0 05:00 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ kill -10 10141
Got signal 10 at Sun Apr 25 05:00:31 2010

[...]$ kill -10 10141
Got signal 10 at Sun Apr 25 05:00:34 2010

[...]$ kill -9 10141
[1]+  Killed                  python signal-demo.py 10

And in the following, the script catches signal 17, which happens to be SIGCHLD on my Linux server. Signal numbers vary from machine to machine, so you should normally use their names, not their numbers. SIGCHLD behavior may vary per platform as well. On my Cygwin install, for example, signal 10 can have a different meaning, and signal 20 is SIGCHLD; there, the script works as shown on Linux here for signal 10, but raises an exception if it tries to install a handler for signal 17 (and Cygwin doesn't require reaping in any event). See the signal module's library manual entry for more details:

[...]$ python signal-demo.py 17 &
[1] 11592
[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094  11592 30778  0 05:00 pts/0    00:00:00 python signal-demo.py 17
5693094  11728 30778  0 05:01 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ kill -17 11592
Got signal 17 at Sun Apr 25 05:01:28 2010
sigchld caught

[...]$ kill -17 11592
Got signal 17 at Sun Apr 25 05:01:35 2010
sigchld caught

[...]$ kill -9 11592
[1]+  Killed                  python signal-demo.py 17
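
The shell's kill command used above has a direct Python counterpart in os.kill, mentioned earlier. As a self-contained sketch (Unix-like platforms only, and not code from the book), a process can even install a handler and signal itself:

```python
import os, signal, time

# send a signal from Python with os.kill, the equivalent of the shell's
# "kill -N pid"; here the process signals itself for a compact demo
caught = []

def onSignal(signum, stackframe):        # Python signal handler
    caught.append(signum)

signal.signal(signal.SIGUSR1, onSignal)  # install handler for SIGUSR1
os.kill(os.getpid(), signal.SIGUSR1)     # like "kill -10 <pid>" on Linux
time.sleep(0.1)                          # give the handler a chance to run
print(caught)
```

After the sleep, the handler has recorded the SIGUSR1 signal number in the caught list.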

Now, to apply all of this signal knowledge to killing zombies, simply set the SIGCHLD signal handler to the SIG_IGN ignore handler action; on systems where this assignment is supported, child processes will be cleaned up when they exit. The forking server variant shown in Example 12-6 uses this trick to manage its children.

Example 12-6. PP4E\Internet\Sockets\fork-server-signal.py
"""
Same as fork-server.py, but use the Python signal module to avoid keeping
child zombie processes after they terminate, instead of an explicit reaper
loop before each new connection; SIG_IGN means ignore, and may not work with
the SIGCHLD child exit signal on all platforms; see Linux documentation for more
about the restartability of a socket.accept call interrupted with a signal;
"""

import os, time, sys, signal
from socket import *                      # get socket constructor and constants
myHost = ''                               # server machine, '' means local host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # up to 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # avoid child zombie processes

def now():                                       # time on server machine
    return time.ctime(time.time())

def handleClient(connection):                    # child process replies, exits
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)             # else: go accept next connect

dispatcher()

Where applicable, this technique is:

  • Much simpler; we don’t need to manually track or reap child processes.

  • More accurate; it leaves no zombies temporarily between client requests.

In fact, only one line is dedicated to handling zombies here: the signal.signal call near the top, to set the handler. Unfortunately, this version is also even less portable than using os.fork in the first place, because signals may work slightly differently from platform to platform, even among Unix variants. For instance, some Unix platforms may not allow SIG_IGN to be used as the SIGCHLD action at all. On Linux systems, though, this simpler forking server variant works like a charm:

[...]$ python fork-server-signal.py &
[1] 3837
Server connected by ('72.236.109.185', 58817) at Sun Apr 25 08:11:12 2010

[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094   3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094   4378  3837  0 08:11 pts/0    00:00:00 python fork-server-signal.py
5693094   4413 30778  0 08:11 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094   3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094   4584 30778  0 08:11 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

Notice how in this version the child process’s entry goes away as soon as it exits, even before a new client request is received; no “defunct” zombie ever appears. More dramatically, if we now start up the script we wrote earlier that spawns eight clients in parallel (testecho.py) to talk to this server remotely, all appear on the server while running, but are removed immediately as they exit:

[client window]
C:\...\PP4E\Internet\Sockets> testecho.py learning-python.com

[server window]
[...]$
Server connected by ('72.236.109.185', 58829) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58830) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58831) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58832) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58833) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58834) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58835) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58836) at Sun Apr 25 08:16:34 2010

[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094   3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094   9666  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9667  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9668  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9670  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9674  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9678  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9681  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9682  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094   9722 30778  0 08:16 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
5693094   3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094  10045 30778  0 08:16 pts/0    00:00:00 ps -f
5693094  30778 30772  0 04:23 pts/0    00:00:00 -bash

And now that I’ve shown you how to use signal handling to reap children automatically on Linux, I should underscore that this technique is not universally supported across all flavors of Unix. If you care about portability, manually reaping children as we did in Example 12-4 may still be desirable.
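
One way to reconcile the two schemes is to probe the platform at startup and fall back when the signal-based trick is refused. The installZombiePolicy helper below is a hypothetical sketch of such logic, not code from the book:

```python
import signal

def installZombiePolicy():
    # prefer the SIGCHLD/SIG_IGN scheme where the platform accepts it;
    # otherwise report that the caller must reap manually with os.waitpid
    if hasattr(signal, 'SIGCHLD'):
        try:
            signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # auto-reap children
            return 'signal'                                # OS discards zombies
        except (OSError, ValueError):
            pass                                           # assignment refused
    return 'manual'    # reap explicitly before each accept, as in Example 12-4

print(installZombiePolicy())
```

A server would run this once before its accept loop and only call its explicit reaper when the result is 'manual'.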

Why multiprocessing doesn’t help with socket server portability

In Chapter 5, we learned about Python’s new multiprocessing module. As we saw, it provides a way to start function calls in new processes that is more portable than the os.fork call used in this section’s server code, and it runs processes instead of threads to work around the thread GIL in some scenarios. In particular, multiprocessing works on standard Windows Python too, unlike direct os.fork calls.

I experimented with a server variant based upon this module to see if its portability might help for socket servers. Its full source code is in the examples package in file multi-server.py, but here are its important bits that differ:

...rest unchanged from fork-server.py...
from multiprocessing import Process

def handleClient(connection):
    print('Child:', os.getpid())                 # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        ...rest unchanged...

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        Process(target=handleClient, args=(connection,)).start()

if __name__ == '__main__':
    print('Parent:', os.getpid())
    sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
    sockobj.bind((myHost, myPort))                   # bind it to server port number
    sockobj.listen(5)                                # allow 5 pending connects
    dispatcher()

This server variant is noticeably simpler too. Like the forking server it’s derived from, this server works fine under Cygwin Python on Windows running as localhost, and would probably work on other Unix-like platforms as well, because multiprocessing forks a process on such systems, and file and socket descriptors are inherited by child processes as usual. Hence, the child process uses the same connected socket as the parent. Here’s the scene in a Cygwin server window and two Windows client windows:

[server window]
[C:\...\PP4E\Internet\Sockets]$ python multi-server.py
Parent: 8388
Server connected by ('127.0.0.1', 58271) at Sat Apr 24 08:13:27 2010
Child: 8144
Server connected by ('127.0.0.1', 58272) at Sat Apr 24 08:13:29 2010
Child: 8036

[two client windows]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 08:13:33 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin
Client received: b"Echo=>b'Brave' at Sat Apr 24 08:13:35 2010"
Client received: b"Echo=>b'Sir' at Sat Apr 24 08:13:35 2010"
Client received: b"Echo=>b'Robin' at Sat Apr 24 08:13:35 2010"

However, this server does not work on standard Windows Python—the whole point of trying to use multiprocessing in this context—because open sockets are not correctly pickled when passed as arguments into the new process. Here’s what occurs in the server windows on Windows 7 with Python 3.1:

C:\...\PP4E\Internet\Sockets> python multi-server.py
Parent: 9140
Server connected by ('127.0.0.1', 58276) at Sat Apr 24 08:17:41 2010
Child: 9628
Process Process-1:
Traceback (most recent call last):
  File "C:Python31libmultiprocessingprocess.py", line 233, in _bootstrap
    self.run()
  File "C:Python31libmultiprocessingprocess.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "C:...PP4EInternetSocketsmulti-server.py", line 38, in handleClient
    data = connection.recv(1024)             # till eof when socket closed
socket.error: [Errno 10038] An operation was attempted on something that is not
a socket

Recall from Chapter 5 that on Windows multiprocessing passes context to a new Python interpreter process by pickling it, and that Process arguments must all be pickleable for Windows. Sockets in Python 3.1 don’t trigger errors when pickled thanks to the class they are an instance of, but they are not really pickled correctly:

>>> from pickle import *
>>> from socket import *
>>> s = socket()
>>> x = dumps(s)
>>> s
<socket.socket object, fd=180, family=2, type=1, proto=0>
>>> loads(x)
<socket.socket object, fd=-1, family=0, type=0, proto=0>
>>> x
b'\x80\x03csocket\nsocket\nq\x00)\x81q\x01N}q\x02(X\x08\x00\x00\x00_io_refsq\x03K\x00X\x07\x00\x00\x00_closedq\x04\x89u\x86q\x05b.'

As we saw in Chapter 5, multiprocessing has other IPC tools such as its own pipes and queues that might be used instead of sockets to work around this issue, but clients would then have to use them, too—the resulting server would not be as broadly accessible as one based upon general Internet sockets.
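
To make the trade-off concrete, here is a minimal sketch of that queue-based style; the worker function and the Echo=> reply format are illustrative stand-ins, not the book's multi-server.py. Queues are transferred to the child correctly, where connected sockets are not:

```python
from multiprocessing import Process, Queue

def worker(inq, outq):
    # child process: receive a request and reply through the other queue;
    # multiprocessing queues cross the process boundary safely on Windows too
    msg = inq.get()
    outq.put(b'Echo=>' + msg)

if __name__ == '__main__':
    inq, outq = Queue(), Queue()
    child = Process(target=worker, args=(inq, outq))
    child.start()
    inq.put(b'Hello queue world')          # pickled and sent to the child
    print(outq.get())                      # → b'Echo=>Hello queue world'
    child.join()
```

The catch, as the text notes, is that clients must speak the same queue protocol; a server built this way is not reachable by arbitrary socket clients.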

Even if multiprocessing did work on Windows, though, its need to start a new Python interpreter would likely make it much slower than the more traditional technique of spawning threads to talk to clients. Coincidentally, that brings us to our next topic.

Threading Servers

The forking model just described works well on Unix-like platforms in general, but it suffers from some potentially significant limitations:

Performance

On some machines, starting a new process can be fairly expensive in terms of time and space resources.

Portability

Forking processes is a Unix technique; as we’ve learned, the os.fork call currently doesn’t work on non-Unix platforms such as Windows under standard Python. As we’ve also learned, forks can be used in the Cygwin version of Python on Windows, but they may be inefficient and not exactly the same as Unix forks. And as we just discovered, multiprocessing won’t help on Windows, because connected sockets are not pickleable across process boundaries.

Complexity

If you think that forking servers can be complicated, you’re not alone. As we just saw, forking also brings with it all the shenanigans of managing and reaping zombies—cleaning up after child processes that live shorter lives than their parents.

If you read Chapter 5, you know that one solution to all of these dilemmas is to use threads rather than processes. Threads run in parallel and share global (i.e., module and interpreter) memory.

Because threads all run in the same process and memory space, they automatically share sockets passed between them, similar in spirit to the way that child processes inherit socket descriptors. Unlike processes, though, threads are usually less expensive to start, and work on both Unix-like machines and Windows under standard Python today. Furthermore, many (though not all) see threads as simpler to program—child threads die silently on exit, without leaving behind zombies to haunt the server.

To illustrate, Example 12-7 is another mutation of the echo server that handles client requests in parallel by running them in threads rather than in processes.

Example 12-7. PP4E\Internet\Sockets\thread-server.py
"""
Server side: open a socket on a port, listen for a message from a client,
and send an echo reply; echoes lines until eof when client closes socket;
spawns a thread to handle each client connection; threads share global
memory space with main thread; this is more portable than fork: threads
work on standard Windows systems, but process forks do not;
"""

import time, _thread as thread           # or use threading.Thread().start()
from socket import *                     # get socket constructor and constants
myHost = ''                              # server machine, '' means local host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow up to 5 pending connects

def now():
    return time.ctime(time.time())               # current time on the server

def handleClient(connection):                    # in spawned thread: reply
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to thread for service
        print('Server connected by', address, end=' ')
        print('at', now())
        thread.start_new_thread(handleClient, (connection,))

dispatcher()

This dispatcher delegates each incoming client connection request to a newly spawned thread running the handleClient function. As a result, this server can process multiple clients at once, and the main dispatcher loop can get quickly back to the top to check for newly arrived requests. The net effect is that new clients won’t be denied service due to a busy server.

Functionally, this version is similar to the fork solution (clients are handled in parallel), but it will work on any machine that supports threads, including Windows and Linux. Let’s test it on both. First, start the server on a Linux machine and run clients on both Linux and Windows:

[window 1: thread-based server process, server keeps accepting
client connections while threads are servicing prior requests]
[...]$ python thread-server.py
Server connected by ('127.0.0.1', 37335) at Sun Apr 25 08:59:05 2010
Server connected by ('72.236.109.185', 58866) at Sun Apr 25 08:59:54 2010
Server connected by ('72.236.109.185', 58867) at Sun Apr 25 08:59:56 2010
Server connected by ('72.236.109.185', 58868) at Sun Apr 25 08:59:58 2010

[window 2: client, but on same remote server machine]
[...]$ python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:10 2010"

[windows 3-5: local clients, PC]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:59 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce
Client received: b"Echo=>b'Bruce' at Sun Apr 25 09:00:01 2010"

C:\...\Sockets> python echo-client.py learning-python.com The Meaning of life
Client received: b"Echo=>b'The' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'Meaning' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'of' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'life' at Sun Apr 25 09:00:03 2010"

Because this server uses threads rather than forked processes, we can run it portably on both Linux and a Windows PC. Here it is at work again, running on the same local Windows PC as its clients; again, the main point to notice is that new clients are accepted while prior clients are being processed in parallel with other clients and the main thread (in the five-second sleep delay):

[window 1: server, on local PC]
C:\...\PP4E\Internet\Sockets> python thread-server.py
Server connected by ('127.0.0.1', 58987) at Sun Apr 25 12:41:46 2010
Server connected by ('127.0.0.1', 58988) at Sun Apr 25 12:41:47 2010
Server connected by ('127.0.0.1', 58989) at Sun Apr 25 12:41:49 2010

[windows 2-4: clients, on local PC]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 12:41:51 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brian
Client received: b"Echo=>b'Brian' at Sun Apr 25 12:41:52 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Bright side of life
Client received: b"Echo=>b'Bright' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'side' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'of' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'life' at Sun Apr 25 12:41:54 2010"

Remember that a thread silently exits when the function it is running returns; unlike the process fork version, we don’t call anything like os._exit in the client handler function (and we shouldn’t—it may kill all threads in the process, including the main loop watching for new connections!). Because of this, the thread version is not only more portable, but also simpler.
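
As the comment at the top of Example 12-7 hints, the higher-level threading module works just as well as _thread for this dispatch pattern. The sketch below substitutes a stand-in handler for the real socket work so it is self-contained; the names and the three-client count are illustrative only:

```python
import threading, time

results = []

def handleClient(ident):
    time.sleep(0.1)                          # simulate a blocking activity
    results.append('handled %s' % ident)     # thread exits silently on return

# spawn one thread per "client", as the dispatcher does per connection
threads = [threading.Thread(target=handleClient, args=(i,)) for i in range(3)]
for t in threads: t.start()                  # all three run in parallel
for t in threads: t.join()                   # a real server loops on accept instead
print(sorted(results))                       # → ['handled 0', 'handled 1', 'handled 2']
```

Note that the handlers simply return to terminate their threads; nothing like os._exit is needed or wanted here.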

Standard Library Server Classes

Now that I’ve shown you how to write forking and threading servers to process clients without blocking incoming requests, I should also tell you that there are tools in the Python standard library that make this process even easier. In particular, the socketserver module defines classes that implement all the flavors of forking and threading servers that you are likely to be interested in.

Like the manually coded servers we’ve just studied, this module’s primary classes implement servers which process clients in parallel (a.k.a. asynchronously) to avoid denying service to new requests during long-running transactions. Their net effect is to automate the top levels of common server code. To use this module, simply create the desired kind of server object imported from it, passing in a handler object with a callback method of your own, as demonstrated in the threaded TCP server of Example 12-8.

Example 12-8. PP4E\Internet\Sockets\class-server.py
"""
Server side: open a socket on a port, listen for a message from a client, and
send an echo reply; this version uses the standard library module socketserver to
do its work; socketserver provides TCPServer, ThreadingTCPServer, ForkingTCPServer,
UDP variants of these, and more, and routes each client connect request to a new
instance of a passed-in request handler object's handle method; socketserver also
supports Unix domain sockets, but only on Unixen; see the Python library manual.
"""

import socketserver, time               # get socket server, handler objects
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
def now():
    return time.ctime(time.time())

class MyClientHandler(socketserver.BaseRequestHandler):
    def handle(self):                           # on each client connect
        print(self.client_address, now())       # show this client's address
        time.sleep(5)                           # simulate a blocking activity
        while True:                             # self.request is client socket
            data = self.request.recv(1024)      # read, write a client socket
            if not data: break
            reply = 'Echo=>%s at %s' % (data, now())
            self.request.send(reply.encode())
        self.request.close()

# make a threaded server, listen/handle clients forever
myaddr = (myHost, myPort)
server = socketserver.ThreadingTCPServer(myaddr, MyClientHandler)
server.serve_forever()

This server works the same as the threading server we wrote by hand in the previous section, but instead focuses on service implementation (the customized handle method), not on threading details. It is run the same way, too—here it is processing three clients started by hand, plus eight spawned by the testecho script we wrote in Example 12-3:

[window 1: server, serverHost='localhost' in echo-client.py]
C:\...\PP4E\Internet\Sockets> python class-server.py
('127.0.0.1', 59036) Sun Apr 25 13:50:23 2010
('127.0.0.1', 59037) Sun Apr 25 13:50:25 2010
('127.0.0.1', 59038) Sun Apr 25 13:50:26 2010
('127.0.0.1', 59039) Sun Apr 25 13:51:05 2010
('127.0.0.1', 59040) Sun Apr 25 13:51:05 2010
('127.0.0.1', 59041) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59042) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59043) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59044) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59045) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59046) Sun Apr 25 13:51:06 2010

[windows 2-4: client, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 13:50:28 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Arthur
Client received: b"Echo=>b'Arthur' at Sun Apr 25 13:50:30 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin
Client received: b"Echo=>b'Brave' at Sun Apr 25 13:50:31 2010"
Client received: b"Echo=>b'Sir' at Sun Apr 25 13:50:31 2010"
Client received: b"Echo=>b'Robin' at Sun Apr 25 13:50:31 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py

To build a forking server instead, just use the class name ForkingTCPServer when creating the server object. The socketserver module has more power than shown by this example; it also supports nonparallel (a.k.a. serial or synchronous) servers, UDP and Unix domain sockets, and Ctrl-C server interrupts on Windows. See Python’s library manual for more details.
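
To see how little code the class-name swap requires, here is a self-contained sketch that starts a ThreadingTCPServer on a spare port and talks to it in the same process; the echoOnce helper and EchoHandler are illustrative names, not code from the book, and on Unix passing socketserver.ForkingTCPServer instead is the one-line change:

```python
import socket, socketserver, threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):                              # one exchange per connection
        data = self.request.recv(1024)
        self.request.send(b'Echo=>' + data)

def echoOnce(payload, ServerClass=socketserver.ThreadingTCPServer):
    # port 0 asks the OS for any free port; pass ForkingTCPServer here
    # on Unix for a process-per-client server instead
    server = ServerClass(('127.0.0.1', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    with socket.create_connection(server.server_address) as conn:
        conn.send(payload)
        reply = conn.recv(1024)
    server.shutdown()                              # stop the serve_forever loop
    server.server_close()
    return reply

print(echoOnce(b'ni!'))                            # → b'Echo=>ni!'
```

The handler class is identical under both server flavors; only the dispatch mechanism changes.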

For more advanced server needs, Python also comes with standard library tools that use those shown here, and allow you to implement in just a few lines of Python code a simple but fully-functional HTTP (web) server that knows how to run server-side CGI scripts. We’ll explore those larger server tools in Chapter 15.

Multiplexing Servers with select

So far we’ve seen how to handle multiple clients at once with both forked processes and spawned threads, and we’ve looked at a library class that encapsulates both schemes. Under both approaches, all client handlers seem to run in parallel with one another and with the main dispatch loop that continues watching for new incoming requests. Because all of these tasks run in parallel (i.e., at the same time), the server doesn’t get blocked when accepting new requests or when processing a long-running client handler.

Technically, though, threads and processes don’t really run in parallel, unless you’re lucky enough to have a machine with many CPUs. Instead, your operating system performs a juggling act—it divides the computer’s processing power among all active tasks. It runs part of one, then part of another, and so on. All the tasks appear to run in parallel, but only because the operating system switches focus between tasks so fast that you don’t usually notice. This process of switching between tasks is sometimes called time-slicing when done by an operating system; it is more generally known as multiplexing.

When we spawn threads and processes, we rely on the operating system to juggle the active tasks so that none are starved of computing resources, especially the main server dispatcher loop. However, there’s no reason that a Python script can’t do so as well. For instance, a script might divide tasks into multiple steps—run a step of one task, then one of another, and so on, until all are completed. The script need only know how to divide its attention among the multiple active tasks to multiplex on its own.

Servers can apply this technique to yield yet another way to handle multiple clients at once, a way that requires neither threads nor forks. By multiplexing client connections and the main dispatcher with the select system call, a single event loop can process multiple clients and accept new ones in parallel (or at least close enough to avoid stalling). Such servers are sometimes called asynchronous, because they service clients in spurts, as each becomes ready to communicate. In asynchronous servers, a single main loop run in a single process and thread decides which clients should get a bit of attention each time through. Client requests and the main dispatcher loop are each given a small slice of the server’s attention if they are ready to converse.

Most of the magic behind this server structure is the operating system select call, available in Python’s standard select module on all major platforms. Roughly, select is asked to monitor a list of input sources, output sources, and exceptional condition sources and tells us which sources are ready for processing. It can be made to simply poll all the sources to see which are ready; wait for a maximum time period for sources to become ready; or wait indefinitely until one or more sources are ready for processing.
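
Before the full server, a small self-contained sketch may help illustrate select's call signature and polling behavior (this is an illustration only, not the book's select-server.py): a listening socket shows up as "readable" once a connection is pending, so accept is guaranteed not to block.

```python
import socket
from select import select

listener = socket.socket()
listener.bind(('127.0.0.1', 0))                # port 0: any free local port
listener.listen(5)

readables, writables, excepts = select([listener], [], [], 0)    # 0 = just poll
print(listener in readables)                   # → False, nothing pending yet

peer = socket.create_connection(listener.getsockname())
readables, writables, excepts = select([listener], [], [], 1.0)  # wait up to 1s
print(listener in readables)                   # → True, a connect is waiting
conn, addr = listener.accept()                 # will not block now
conn.close(); peer.close(); listener.close()
```

The same pattern applies to connected client sockets in the input list: recv on a socket that select reports readable will not pause the server.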

However used, select lets us direct attention to sockets ready to communicate, so as to avoid blocking on calls to ones that are not. That is, when the sources passed to select are sockets, we can be sure that socket calls like accept, recv, and send will not block (pause) the server when applied to objects returned by select. Because of that, a single-loop server that uses select need not get stuck communicating with one client or waiting for new ones while other clients are starved for the server’s attention.

Because this type of server does not need to start threads or processes, it can be efficient when transactions with clients are relatively short-lived. However, it also requires that these transactions be quick; if they are not, it still runs the risk of becoming bogged down waiting for a dialog with a particular client to end, unless augmented with threads or forks for long-running transactions.[46]

A select-based echo server

Let’s see how all of this translates into code. The script in Example 12-9 implements another echo server, one that can handle multiple clients without ever starting new processes or threads.

Example 12-9. PP4E\Internet\Sockets\select-server.py
"""
Server: handle multiple clients in parallel with select. use the select
module to manually multiplex among a set of sockets: main sockets which
accept new client connections, and input sockets connected to accepted
clients; select can take an optional 4th arg--0 to poll, n.m to wait n.m
seconds, or omitted to wait till any socket is ready for processing.
"""

import sys, time
from select import select
from socket import socket, AF_INET, SOCK_STREAM
def now(): return time.ctime(time.time())

myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
if len(sys.argv) == 3:                  # allow host/port as cmdline args too
    myHost, myPort = sys.argv[1], int(sys.argv[2])
numPortSocks = 2                        # number of ports for client connects

# make main sockets for accepting new client requests
mainsocks, readsocks, writesocks = [], [], []
for i in range(numPortSocks):
    portsock = socket(AF_INET, SOCK_STREAM)   # make a TCP/IP socket object
    portsock.bind((myHost, myPort))           # bind it to server port number
    portsock.listen(5)                        # listen, allow 5 pending connects
    mainsocks.append(portsock)                # add to main list to identify
    readsocks.append(portsock)                # add to select inputs list
    myPort += 1                               # bind on consecutive ports

# event loop: listen and multiplex until server process killed
print('select-server loop starting')
while True:
    #print(readsocks)
    readables, writeables, exceptions = select(readsocks, writesocks, [])
    for sockobj in readables:
        if sockobj in mainsocks:                     # for ready input sockets
            # port socket: accept new client
            newsock, address = sockobj.accept()      # accept should not block
            print('Connect:', address, id(newsock))  # newsock is a new socket
            readsocks.append(newsock)                # add to select list, wait
        else:
            # client socket: read next line
            data = sockobj.recv(1024)                # recv should not block
            print('\tgot', data, 'on', id(sockobj))
            if not data:                             # if closed by the clients
                sockobj.close()                      # close here and remv from
                readsocks.remove(sockobj)            # del list else reselected
            else:
                # this may block: should really select for writes too
                reply = 'Echo=>%s at %s' % (data, now())
                sockobj.send(reply.encode())

The bulk of this script is its while event loop at the end that calls select to find out which sockets are ready for processing; these include both main port sockets on which clients can connect and open client connections. It then loops over all such ready sockets, accepting connections on main port sockets and reading and echoing input on any client sockets ready for input. Both the accept and recv calls in this code are guaranteed to not block the server process after select returns; as a result, this server can quickly get back to the top of the loop to process newly arrived client requests and already connected clients’ inputs. The net effect is that all new requests and clients are serviced in pseudoparallel fashion.

To make this process work, the server appends the connected socket for each client to the readables list passed to select, and simply waits for the socket to show up in the selected inputs list. For illustration purposes, this server also listens for new clients on more than one port—on ports 50007 and 50008, in our examples. Because these main port sockets are also interrogated with select, connection requests on either port can be accepted without blocking either already connected clients or new connection requests appearing on the other port. The select call returns whatever sockets in readables are ready for processing—both main port sockets and sockets connected to clients currently being processed.

Running the select server

Let’s run this script locally to see how it does its stuff (the client and server can also be run on different machines, as in prior socket examples). First, we’ll assume we’ve already started this server script on the local machine in one window, and run a few clients to talk to it. The following listing gives the interaction in two such client console windows running on Windows. The first client simply runs the echo-client script twice to contact the server, and the second also kicks off the testecho script to spawn eight echo-client programs running in parallel.

As before, the server simply echoes back whatever text the client sends, though without a sleep pause here (more on this in a moment). Notice how the second client window really runs a script called echo-client-50008 so as to connect to the second port socket in the server; it's the same as echo-client, but with a different hardcoded port number (alas, the original script wasn't designed to accept a port number on the command line):

[client window 1]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:21 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:27 2010"

[client window 2]
C:\...\PP4E\Internet\Sockets> python echo-client-50008.py localhost Sir Galahad
Client received: b"Echo=>b'Sir' at Sun Apr 25 14:51:22 2010"
Client received: b"Echo=>b'Galahad' at Sun Apr 25 14:51:22 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py

The next listing is the sort of output that shows up in the window where the server has been started. The first three connections come from echo-client runs; the rest is the result of the eight programs spawned by testecho in the second client window. We can run this server on Windows, too, because select is available on this platform. Correlate this output with the server's code to see how it runs.

Notice that for testecho, new client connections and client inputs are multiplexed together. If you study the output closely, you'll see that they overlap in time, because all activity is dispatched by the single event loop in the server. In fact, the trace output on the server will probably look a bit different nearly every time it runs. Clients and new connections are interleaved almost at random due to timing differences on the host machines. This happens in the earlier forking and threading servers, too, but there the operating system automatically switches between the execution paths of the dispatcher loop and client transactions.

Also note that the server gets an empty string when the client has closed its socket. We take care to close and delete these sockets at the server right away, or else they would be needlessly reselected again and again, each time through the main loop:

[server window]
C:\...\PP4E\Internet\Sockets> python select-server.py
select-server loop starting
Connect: ('127.0.0.1', 59080) 21339352
        got b'Hello network world' on 21339352
        got b'' on 21339352
Connect: ('127.0.0.1', 59081) 21338128
        got b'Sir' on 21338128
        got b'Galahad' on 21338128
        got b'' on 21338128
Connect: ('127.0.0.1', 59082) 21339352
        got b'Hello network world' on 21339352
        got b'' on 21339352

[testecho results]
Connect: ('127.0.0.1', 59083) 21338128
        got b'Hello network world' on 21338128
        got b'' on 21338128
Connect: ('127.0.0.1', 59084) 21339352
        got b'Hello network world' on 21339352
        got b'' on 21339352
Connect: ('127.0.0.1', 59085) 21338128
        got b'Hello network world' on 21338128
        got b'' on 21338128
Connect: ('127.0.0.1', 59086) 21339352
        got b'Hello network world' on 21339352
        got b'' on 21339352
Connect: ('127.0.0.1', 59087) 21338128
        got b'Hello network world' on 21338128
        got b'' on 21338128
Connect: ('127.0.0.1', 59088) 21339352
Connect: ('127.0.0.1', 59089) 21338128
        got b'Hello network world' on 21339352
        got b'Hello network world' on 21338128
Connect: ('127.0.0.1', 59090) 21338056
        got b'' on 21339352
        got b'' on 21338128
        got b'Hello network world' on 21338056
        got b'' on 21338056

Besides this more verbose output, there's another subtle but crucial difference to notice—a time.sleep call to simulate a long-running task doesn't make sense in the server here. Because all clients are handled by the same single loop, sleeping would pause everything, and defeat the whole point of a multiplexing server. Again, manually multiplexed servers like this one work well when transactions are short; when they are not, long-running requests must be handled specially (with threads or processes, for instance).

Before we move on, here are a few additional notes and options:

select call details

Formally, select is called with three lists of selectable objects (input sources, output sources, and exceptional condition sources), plus an optional timeout. The timeout argument may be a real wait expiration value in seconds (use floating-point numbers to express fractions of a second), a zero value to mean simply poll and return immediately, or omitted to mean wait until at least one object is ready (as done in our server script). The call returns a triple of ready objects—subsets of the first three arguments—any or all of which may be empty if the timeout expired before sources became ready.
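
The three timeout modes can be demonstrated with a connected socket pair standing in for a real client connection (a local sketch, not one of the chapter's example scripts):

```python
import select, socket

a, b = socket.socketpair()                  # stand-in for a client connection

r, w, e = select.select([b], [], [], 0)     # poll: return immediately
print(r)                                    # → [] (nothing ready yet)

r, w, e = select.select([b], [], [], 0.25)  # wait at most 0.25 seconds
print(r)                                    # → [] (timeout expired)

a.send(b'ready')                            # now make b readable
r, w, e = select.select([b], [], [])        # omit timeout: wait till ready
print(b in r)                               # → True

a.close(); b.close()
```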

select portability

Like threading, but unlike forking, this server works in standard Windows Python, too. Technically, the select call works only for sockets on Windows, but also works for things like files and pipes on Unix and Macintosh. For servers running over the Internet, of course, the primary devices we are interested in are sockets.

Nonblocking sockets

select lets us be sure that socket calls like accept and recv won't block (pause) the caller, but it's also possible to make Python sockets nonblocking in general. Call the setblocking method of socket objects to set the socket to blocking or nonblocking mode. For example, given a call like sock.setblocking(flag), the socket sock is set to nonblocking mode if flag is zero and to blocking mode otherwise. All sockets start out in blocking mode, so socket calls may always make the caller wait.

However, when in nonblocking mode, a socket.error exception is raised if a recv socket call doesn’t find any data, or if a send call can’t immediately transfer data. A script can catch this exception to determine whether the socket is ready for processing. In blocking mode, these calls always block until they can proceed. Of course, there may be much more to processing client requests than data transfers (requests may also require long-running computations), so nonblocking sockets don’t guarantee that servers won’t stall in general. They are simply another way to code multiplexing servers. Like select, they are better suited when client requests can be serviced quickly.
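
A brief sketch of the nonblocking alternative follows, again using a local socket pair for illustration. Note that in current Pythons the exception raised is BlockingIOError, a subclass of OSError (for which socket.error is now an alias):

```python
import socket

a, b = socket.socketpair()
b.setblocking(False)              # nonblocking: calls raise instead of pausing

try:
    b.recv(1024)                  # no data yet: raises rather than blocking
except BlockingIOError:           # i.e., socket.error in older notation
    print('socket not ready')

a.send(b'hello')
print(b.recv(1024))               # → b'hello' (data is pending now)

a.close(); b.close()
```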

The asyncore module framework

If you’re interested in using select, you will probably also be interested in checking out the asyncore.py module in the standard Python library. It implements a class-based callback model, where input and output callbacks are dispatched to class methods by a precoded select event loop. As such, it allows servers to be constructed without threads or forks, and it is a select-based alternative to the socketserver module’s threading and forking module we met in the prior sections. As for this type of server in general, asyncore is best when transactions are short—what it describes as “I/O bound” instead of “CPU bound” programs, the latter of which still require threads or forks. See the Python library manual for details and a usage example.

Twisted

For other server options, see also the open source Twisted system (http://twistedmatrix.com). Twisted is an asynchronous networking framework written in Python that supports TCP, UDP, multicast, SSL/TLS, serial communication, and more. It supports both clients and servers and includes implementations of a number of commonly used network services such as a web server, an IRC chat server, a mail server, a relational database interface, and an object broker.

Although Twisted supports processes and threads for longer-running actions, it also uses an asynchronous, event-driven model to handle clients, which is similar to the event loop of GUI libraries like tkinter. It abstracts an event loop, which multiplexes among open socket connections, automates many of the details inherent in an asynchronous server, and provides an event-driven framework for scripts to use to accomplish application tasks. Twisted’s internal event engine is similar in spirit to our select-based server and the asyncore module, but it is regarded as much more advanced. Twisted is a third-party system, not a standard library tool; see its website and documentation for more details.

Summary: Choosing a Server Scheme

So when should you use select to build a server, instead of threads or forks? Needs vary per application, of course, but as mentioned, servers based on the select call generally perform very well when client transactions are relatively short and are not CPU-bound. If they are not short, threads or forks may be a better way to split processing among multiple clients. Threads and forks are especially useful if clients require long-running processing above and beyond the socket calls used to pass data. However, combinations are possible too—nothing is stopping a select-based polling loop from using threads, too.

It’s important to remember that schemes based on select (and nonblocking sockets) are not completely immune to blocking. In Example 12-9, for instance, the send call that echoes text back to a client might block, too, and hence stall the entire server. We could work around that blocking potential by using select to make sure that the output operation is ready before we attempt it (e.g., use the writesocks list and add another loop to send replies to ready output sockets), albeit at a noticeable cost in program clarity.

In general, though, if we cannot split up the processing of a client’s request in such a way that it can be multiplexed with other requests and not block the server’s main loop, select may not be the best way to construct a server by itself. While some network servers can satisfy this constraint, many cannot.

Moreover, select also seems more complex than spawning either processes or threads, because we need to manually transfer control among all tasks (for instance, compare the threaded and select versions of our echo server, even without write selects). As usual, though, the degree of that complexity varies per application. The asyncore standard library module mentioned earlier simplifies some of the tasks of implementing a select-based event-loop socket server, and Twisted offers additional hybrid solutions.

Making Sockets Look Like Files and Streams

So far in this chapter, we’ve focused on the role of sockets in the classic client/server networking model. That’s one of their primary roles, but they have other common use cases as well.

In Chapter 5, for instance, we saw sockets as a basic IPC device between processes and threads on a single machine. And in Chapter 10’s exploration of linking non-GUI scripts to GUIs, we wrote a utility module (Example 10-23) which connected a caller’s standard output stream to a socket, on which a GUI could listen for output to be displayed. There, I promised that we’d flesh out that module with additional transfer modes once we had a chance to explore sockets in more depth. Now that we have, this section takes a brief detour from the world of remote network servers to tell the rest of this story.

Although some programs can be written or rewritten to converse over sockets explicitly, this isn’t always an option; it may be too expensive an effort for existing scripts, and might preclude desirable nonsocket usage modes for others. In some cases, it’s better to allow a script to use standard stream tools such as the print and input built-in functions and sys module file calls (e.g., sys.stdout.write), and connect them to sockets only when needed.

Because such stream tools are designed to operate on text-mode files, though, probably the biggest trick here is fooling them into operating on the inherently binary mode and very different method interface of sockets. Luckily, sockets come with a method that achieves all the forgery we need.

The socket object makefile method comes in handy anytime you wish to process a socket with normal file object methods or need to pass a socket to an existing interface or program that expects a file. The socket wrapper object returned allows your scripts to transfer data over the underlying socket with read and write calls, rather than recv and send. Since the input and print built-in functions use the former method set, they will happily interact with sockets wrapped by this call, too.

The makefile method also allows us to treat normally binary socket data as text instead of byte strings, and has additional arguments such as encoding that let us specify nondefault Unicode encodings for the transferred text—much like the built-in open and os.fdopen calls we met in Chapter 4 do for file descriptors. Although text can always be encoded and decoded with manual calls after binary mode socket transfers, makefile shifts the burden of text encodings from your code to the file wrapper object.
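
For instance, the following sketch (using a local socket pair for brevity) wraps both ends of a connection in text-mode file objects with an explicit Unicode encoding; print and readline then transfer str text, with all the byte-string encoding work handled by the wrappers:

```python
import socket

a, b = socket.socketpair()
wfile = a.makefile('w', encoding='utf-8')   # text-mode wrapper: writes str
rfile = b.makefile('r', encoding='utf-8')   # decodes bytes to str on input

print('spam, spåm, and eggs', file=wfile)   # print writes text to the socket
wfile.flush()                               # text wrappers are buffered!

print(rfile.readline().rstrip())            # → spam, spåm, and eggs

for f in (wfile, rfile, a, b): f.close()
```

Note the explicit flush call: as discussed later in this chapter, makefile wrappers are buffered, so output may not be transferred until flushed or closed.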

This equivalence to files comes in handy any time we want to use software that supports file interfaces. For example, the Python pickle module’s load and dump methods expect an object with a file-like interface (e.g., read and write methods), but they don’t require a physical file. Passing a TCP/IP socket wrapped with the makefile call to the pickler allows us to ship serialized Python objects over the Internet, without having to pickle to byte strings ourselves and call raw socket methods manually. This is an alternative to using the pickle module’s string-based calls (dumps, loads) with socket send and recv calls, and might offer more flexibility for software that must support a variety of transport mechanisms. See Chapter 17 for more details on object serialization interfaces.
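
Here is a minimal sketch of the idea, with a local socket pair again standing in for a network connection; pickle deals in byte strings, so the file wrappers must be opened in binary mode here:

```python
import pickle, socket

a, b = socket.socketpair()
wfile = a.makefile('wb')          # binary-mode wrappers: pickle needs bytes
rfile = b.makefile('rb')

obj = {'spam': [1, 2, 3], 'ham': ('a', 'b')}
pickle.dump(obj, wfile)           # serialize straight onto the socket
wfile.flush()                     # send any buffered bytes now

print(pickle.load(rfile))         # → {'spam': [1, 2, 3], 'ham': ('a', 'b')}

for f in (wfile, rfile, a, b): f.close()
```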

More generally, any component that expects a file-like method protocol will gladly accept a socket wrapped with a socket object makefile call. Such interfaces will also accept strings wrapped with the built-in io.StringIO class, and any other sort of object that supports the same kinds of method calls as built-in file objects. As always in Python, we code to protocols—object interfaces—not to specific datatypes.
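
As a quick illustration of coding to the protocol rather than the type, the following hypothetical report function writes to any object with a write method; an open file, a makefile socket wrapper, or an in-memory io.StringIO object all qualify equally:

```python
import io

def report(stream):                    # hypothetical: any write-method object
    stream.write('line 1\n')
    stream.write('line 2\n')

buf = io.StringIO()                    # in-memory text stream, same protocol
report(buf)
print(buf.getvalue(), end='')          # prints both lines written above
```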

A Stream Redirection Utility

To illustrate the makefile method’s operation, Example 12-10 implements a variety of redirection schemes, which redirect the caller’s streams to a socket that can be used by another process for communication. The first of its functions connects output, and is what we used in Chapter 10; the others connect input, and both input and output in three different modes.

Naturally, the wrapper object returned by socket.makefile can also be used with direct file interface read and write method calls and independently of standard streams. This example uses those methods, too, albeit in most cases indirectly and implicitly through the print and input stream access built-ins, and reflects a common use case for the tool.

Example 12-10. PP4E\Internet\Sockets\socket_stream_redirect.py
"""
###############################################################################
Tools for connecting standard streams of non-GUI programs to sockets that
a GUI (or other) program can use to interact with the non-GUI program.
###############################################################################
"""

import sys
from socket import *
port = 50008            # pass in different port if multiple dialogs on machine
host = 'localhost'      # pass in different host to connect to remote listeners

def initListenerSocket(port=port):
    """
    initialize connected socket for callers that listen in server mode
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(('', port))                     # listen on this port number
    sock.listen(5)                            # set pending queue length
    conn, addr = sock.accept()                # wait for client to connect
    return conn                               # return connected socket

def redirectOut(port=port, host=host):
    """
    connect caller's standard output stream to a socket for GUI to listen
    start caller after listener started, else connect fails before accept
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))                # caller operates in client mode
    file = sock.makefile('w')                 # file interface: text, buffered
    sys.stdout = file                         # make prints go to sock.send
    return sock                               # if caller needs to access it raw

def redirectIn(port=port, host=host):
    """
    connect caller's standard input stream to a socket for GUI to provide
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file = sock.makefile('r')                 # file interface wrapper
    sys.stdin = file                          # make input come from sock.recv
    return sock                               # return value can be ignored

def redirectBothAsClient(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is client to a server: sends msg, receives reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))                # or open in 'rw' mode
    ofile = sock.makefile('w')                # file interface: text, buffered
    ifile = sock.makefile('r')                # two file objects wrap same socket
    sys.stdout = ofile                        # make prints go to sock.send
    sys.stdin  = ifile                        # make input come from sock.recv
    return sock

def redirectBothAsServer(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is server to a client: receives msg, send reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind((host, port))                   # caller is listener here
    sock.listen(5)
    conn, addr = sock.accept()
    ofile = conn.makefile('w')                # file interface wrapper
    ifile = conn.makefile('r')                # two file objects wrap same socket
    sys.stdout = ofile                        # make prints go to sock.send
    sys.stdin  = ifile                        # make input come from sock.recv
    return conn

To test, the script in Example 12-11 defines five sets of client/server functions. It runs the client’s code in process, but deploys the Python multiprocessing module we met in Chapter 5 to portably spawn the server function’s side of the dialog in a separate process. In the end, the client and server test functions run in different processes, but converse over a socket that is connected to standard streams within the test script’s process.

Example 12-11. PP4E\Internet\Sockets\test-socket_stream_redirect.py
"""
###############################################################################
test the socket_stream_redirect.py modes
###############################################################################
"""

import sys, os, multiprocessing
from socket_stream_redirect import *

###############################################################################
# redirected client output
###############################################################################

def server1():
    mypid = os.getpid()
    conn  = initListenerSocket()                     # block till client connect
    file  = conn.makefile('r')
    for i in range(3):                               # read/recv client's prints
        data = file.readline().rstrip()              # block till data ready
        print('server %s got [%s]' % (mypid, data))  # print normally to terminal

def client1():
    mypid = os.getpid()
    redirectOut()
    for i in range(3):
        print('client %s: %s' % (mypid, i))          # print to socket
        sys.stdout.flush()                           # else buffered till exits!

###############################################################################
# redirected client input
###############################################################################

def server2():
    mypid = os.getpid()                              # raw socket not buffered
    conn  = initListenerSocket()                     # send to client's input
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())

def client2():
    mypid = os.getpid()
    redirectIn()
    for i in range(3):
        data = input()                               # input from socket
        print('client %s got [%s]' % (mypid, data))  # print normally to terminal

###############################################################################
# redirect client input + output, client is socket client
###############################################################################

def server3():
    mypid = os.getpid()
    conn  = initListenerSocket()                     # wait for client connect
    file  = conn.makefile('r')                       # recv print(), send input()
    for i in range(3):                               # readline blocks till data
        data = file.readline().rstrip()
        conn.send(('server %s got [%s]\n' % (mypid, data)).encode())

def client3():
    mypid = os.getpid()
    redirectBothAsClient()
    for i in range(3):
        print('client %s: %s' % (mypid, i))          # print to socket
        data = input()                               # input from socket: flushes!
        sys.stderr.write('client %s got [%s]\n' % (mypid, data))  # not redirected

###############################################################################
# redirect client input + output, client is socket server
###############################################################################

def server4():
    mypid = os.getpid()
    sock  = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file  = sock.makefile('r')
    for i in range(3):
        sock.send(('server %s: %s\n' % (mypid, i)).encode())   # send to input()
        data = file.readline().rstrip()                        # recv from print()
        print('server %s got [%s]' % (mypid, data))            # result to terminal

def client4():
    mypid = os.getpid()
    redirectBothAsServer()         # I'm actually the socket server in this mode
    for i in range(3):
        data = input()                               # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))  # print to socket
        sys.stdout.flush()                           # else last buffered till exit!

###############################################################################
# redirect client input + output, client is socket client, server xfers first
###############################################################################

def server5():
    mypid = os.getpid()                              # test 4, but server accepts
    conn  = initListenerSocket()                     # wait for client connect
    file  = conn.makefile('r')                       # send input(), recv print()
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())
        data = file.readline().rstrip()
        print('server %s got [%s]' % (mypid, data))

def client5():
    mypid = os.getpid()
    s = redirectBothAsClient()     # I'm the socket client in this mode
    for i in range(3):
        data = input()                               # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))  # print to socket
        sys.stdout.flush()                           # else last buffered till exit!

###############################################################################
# test by number on command-line
###############################################################################

if __name__ == '__main__':
    server = eval('server' + sys.argv[1])
    client = eval('client' + sys.argv[1])               # client in this process
    multiprocessing.Process(target=server).start()      # server in new process
    client()                                            # reset streams in client
    #import time; time.sleep(5)                         # test effect of exit flush

Run the test script with a client and server number on the command line to test the module’s tools; messages display process ID numbers, and those within square brackets reflect a transfer across streams connected to sockets (twice, when nested):

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 1
server 3844 got [client 1112: 0]
server 3844 got [client 1112: 1]
server 3844 got [client 1112: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 2
client 5188 got [server 2020: 0]
client 5188 got [server 2020: 1]
client 5188 got [server 2020: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 3
client 7796 got [server 2780 got [client 7796: 0]]
client 7796 got [server 2780 got [client 7796: 1]]
client 7796 got [server 2780 got [client 7796: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 4
server 4288 got [client 3852 got [server 4288: 0]]
server 4288 got [client 3852 got [server 4288: 1]]
server 4288 got [client 3852 got [server 4288: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 5
server 6040 got [client 7728 got [server 6040: 0]]
server 6040 got [client 7728 got [server 6040: 1]]
server 6040 got [client 7728 got [server 6040: 2]]

If you correlate this script’s output with its code to see how messages are passed between client and server, you’ll find that print and input calls in client functions are ultimately routed over sockets to another process. To the client functions, the socket linkage is largely invisible.

Text-mode files and buffered output streams

Before we move on, there are two remarkably subtle aspects of the example’s code worth highlighting:

Binary to text translations

Raw sockets transfer binary byte strings, but by opening the wrapper files in text mode, their content is automatically translated to text strings on input and output. Text-mode file wrappers are required if accessed through standard stream tools such as the print built-in that writes text strings (as we’ve learned, binary mode files require byte strings instead). When dealing with the raw socket directly, though, text must still be manually encoded to byte strings, as shown in most of Example 12-11’s tests.

Buffered streams, program output, and deadlock

As we learned in Chapters 5 and 10, standard streams are normally buffered, and printed text may need to be flushed so that it appears on a socket connected to a process’s output stream. Indeed, some of this example’s tests require explicit or implicit flush calls to work properly at all; otherwise their output is either incomplete or absent altogether until program exit. In pathological cases, this can lead to deadlock, with a process waiting for output from another that never appears. In other configurations, we may also get socket errors in a reader if a writer exits too soon, especially in two-way dialogs.

For example, if client1 and client4 did not flush periodically as they do, the only reason that they would work is because output streams are automatically flushed when their process exits. Without manual flushes, client1 transfers no data until process exit (at which point all its output is sent at once in a single message), and client4’s data is incomplete till exit (its last printed message is delayed).

Even more subtly, both client3 and client4 rely on the fact that the input built-in first automatically flushes sys.stdout internally for its prompt option, thereby sending data from preceding print calls. Without this implicit flush (or the addition of manual flushes), client3 would deadlock immediately, as would client4 if its manual flush call were removed (even with input’s flush, removing client4’s manual flush causes its final print message not to be transferred until process exit). client5 behaves the same as client4, because it simply swaps which process binds and accepts and which connects.

In the general case, if we wish to read a program’s output as it is produced, instead of all at once when it exits or as its buffers fill, the program must either call sys.stdout.flush periodically or, if applicable, be run with unbuffered streams via Python’s -u command-line flag described in Chapter 5.
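A minimal sketch of the flush-periodically option, written to a StringIO stand-in so it is self-contained; the function and names here are illustrative:

```python
import io, sys

def report(items, stream=sys.stdout):
    # without the flush calls, a socket- or pipe-connected reader may
    # see nothing until this process exits or the stream buffer fills
    for item in items:
        print('got:', item, file=stream)
        stream.flush()              # push buffered text downstream now

buf = io.StringIO()                 # stand-in for a wrapped socket stream
report(['spam', 'eggs'], stream=buf)
print(buf.getvalue())
```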

Although we can open socket wrapper files in unbuffered mode with a second makefile argument of zero (like normal open), this does not allow the wrapper to run in the text mode required for print and desired for input. In fact, attempting to make a socket wrapper file both text mode and unbuffered this way fails with an exception, because Python 3.X no longer supports unbuffered mode for text files (it is allowed for binary mode only today). In other words, because print requires text mode, buffered mode is also implied for output stream files. Moreover, attempting to open a socket file wrapper in line-buffered mode appears to not be supported in Python 3.X (more on this ahead).

While some buffering behavior may be library and platform dependent, manual flush calls or direct socket access might sometimes still be required. Note that sockets can also be made nonblocking with the setblocking(0) method, but this only avoids wait states for transfer calls and does not address the data producer’s failure to send buffered output.
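To make the nonblocking distinction concrete, here is a sketch using a local socket pair; setblocking(False) is equivalent to setblocking(0), and on recent Pythons the would-block condition surfaces as BlockingIOError (a socket.error in older releases):

```python
import socket

a, b = socket.socketpair()
b.setblocking(False)                # recv now raises instead of waiting

try:
    b.recv(1024)                    # nothing sent yet: would have blocked
    got_error = False
except BlockingIOError:             # i.e., the transfer call did not wait
    got_error = True

a.send(b'ready now')
data = b.recv(1024)                 # succeeds once data has arrived
print(got_error, data)
```

As the text notes, this avoids wait states in the receiver, but does nothing about a sender that is holding data in its own output buffers.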

Stream requirements

To make some of this more concrete, Example 12-12 illustrates how some of these complexities apply to redirected standard streams, by connecting them to text and binary mode files produced by open and accessing them with the print and input built-ins, much as a redirected script might.

Example 12-12. PP4E\Internet\Sockets\test-stream-modes.py
"""
test effect of connecting standard streams to text and binary mode files
same holds true for socket.makefile: print requires text mode, but text
mode precludes unbuffered mode -- use -u or sys.stdout.flush() calls
"""

import sys

def reader(F):
    tmp, sys.stdin = sys.stdin, F
    line = input()
    print(line)
    sys.stdin = tmp

reader( open('test-stream-modes.py') )         # works: input() returns text
reader( open('test-stream-modes.py', 'rb') )   # works: but input() returns bytes

def writer(F):
    tmp, sys.stdout = sys.stdout, F
    print(99, 'spam')
    sys.stdout = tmp

writer( open('temp', 'w') )             # works: print() passes text str to .write()
print(open('temp').read())

writer( open('temp', 'wb') )            # FAILS on print: binary mode requires bytes
writer( open('temp', 'w', 0) )          # FAILS on open: text must be buffered

When run, the last two lines in this script both fail—the second to last fails because print passes text strings to a binary-mode file (never allowed for files in general), and the last fails because we cannot open text-mode files in unbuffered mode in Python 3.X (text mode implies Unicode encodings). Here are the errors we get when this script is run: the first run uses the script as shown, and the second shows what happens if the second to last line is commented out (I edited the exception text slightly for presentation):

C:\...\PP4E\Internet\Sockets> test-stream-modes.py
"""
b'"""
'
99 spam

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 26, in <module>
    writer( open('temp', 'wb') )            # FAILS on print: binary mode...
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 20, in writer
    print(99, 'spam')
TypeError: must be bytes or buffer, not str

C:\...\PP4E\Internet\Sockets> test-stream-modes.py
"""
b'"""
'
99 spam

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 27, in <module>
    writer( open('temp', 'w', 0) )          # FAILS on open: text must be...
ValueError: can't have unbuffered text I/O

The same rules apply to socket wrapper file objects created with a socket’s makefile method—they must be opened in text mode for print and should be opened in text mode for input if we wish to receive text strings, but text mode prevents us from using fully unbuffered file mode altogether:

>>> from socket import *
>>> s = socket()                       # defaults to tcp/ip (AF_INET, SOCK_STREAM)
>>> s.makefile('w', 0)                 # this used to work in Python 2.X
Traceback (most recent call last):
  File "C:\Python31\lib\socket.py", line 151, in makefile
ValueError: unbuffered streams must be binary

Line buffering

Text-mode socket wrappers also accept a buffering-mode argument of 1 to specify line-buffering instead of the default full buffering:

>>> from socket import *
>>> s = socket()
>>> f = s.makefile('w', 1)    # same as buffering=1, but acts as fully buffered!

This appears to be no different than full buffering, and still requires the resulting file to be flushed manually to transfer lines as they are produced. Consider the simple socket server and client scripts in Examples 12-13 and 12-14. The server simply reads three messages using the raw socket interface.

Example 12-13. PP4E\Internet\Sockets\socket-unbuff-server.py
from socket import *           # read three messages over a raw socket
sock = socket()
sock.bind(('', 60000))
sock.listen(5)
print('accepting...')
conn, id = sock.accept()       # blocks till client connect

for i in range(3):
    print('receiving...')
    msg = conn.recv(1024)      # blocks till data received
    print(msg)                 # gets all print lines at once unless flushed

The client in Example 12-14 sends three messages: the first two over a socket wrapper file, and the last using the raw socket. Its manual flush calls are commented out but retained so you can experiment with turning them on, and its sleep calls force the server to wait for data.

Example 12-14. PP4E\Internet\Sockets\socket-unbuff-client.py
import time                            # send three msgs over wrapped and raw socket
from socket import *
sock = socket()                        # default=AF_INET, SOCK_STREAM (tcp/ip)
sock.connect(('localhost', 60000))
file = sock.makefile('w', buffering=1) # default=full buff, 0=error, 1 not linebuff!

print('sending data1')
file.write('spam\n')
time.sleep(5)               # must follow with flush() to truly send now
#file.flush()               # uncomment flush lines to see the difference

print('sending data2')
print('eggs', file=file)    # adding more file prints does not flush buffer either
time.sleep(5)
#file.flush()               # output appears at server recv only upon flush or exit

print('sending data3')
sock.send(b'ham\n')         # low-level byte string interface sends immediately
time.sleep(5)               # received first if don't flush other two!

Run the server in one window first and the client in another (or run the server first in the background in Unix-like platforms). The output in the server window follows—the messages sent with the socket wrapper are deferred until program exit, but the raw socket call transfers data immediately:

C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py
accepting...
receiving...
b'ham\n'
receiving...
b'spam\neggs\n'
receiving...
b''

The client window simply displays “sending” lines 5 seconds apart; its third message appears at the server in 10 seconds, but the first and second messages it sends using the wrapper file are deferred until exit (for 15 seconds) because the socket wrapper is still fully buffered. If the manual flush calls in the client are uncommented, each of the three sent messages is delivered in serial, 5 seconds apart (the third appears immediately after the second):

C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py
accepting...
receiving...
b'spam\n'
receiving...
b'eggs\n'
receiving...
b'ham\n'

In other words, even when line buffering is requested, socket wrapper file writes (and by association, prints) are buffered until the program exits, manual flushes are requested, or the buffer becomes full.

Solutions

The short story here is this: to avoid delayed outputs or deadlock, scripts that might send data to waiting programs by printing to wrapped sockets (or for that matter, by using print or sys.stdout.write in general) should do one of the following:

  • Call sys.stdout.flush periodically to flush their printed output so it becomes available as produced, as shown in Example 12-11.

  • Be run with the -u Python command-line flag, if possible, to force the output stream to be unbuffered. This works for unmodified programs spawned by pipe tools such as os.popen. It will not help with the use case here, though, because we manually reset the stream files to buffered text socket wrappers after a process starts. To prove this, uncomment Example 12-11’s manual flush calls and the sleep call at its end, and run with -u: the first test’s output is still delayed for 5 seconds.

  • Use threads to read from sockets to avoid blocking, especially if the receiving program is a GUI and it cannot depend upon the client to flush. See Chapter 10 for pointers. This doesn’t really fix the problem—the spawned reader thread may be blocked or deadlocked, too—but at least the GUI remains active during waits.

  • Implement their own custom socket wrapper objects which intercept text write calls, encode to binary, and route to a socket with send calls; socket.makefile is really just a convenience tool, and we can always code a wrapper of our own for more specific roles. For hints, see Chapter 10’s GuiOutput class, the stream redirection class in Chapter 3, and the classes of the io standard library module (upon which Python’s input/output tools are based, and which you can mix in custom ways).

  • Skip print altogether and communicate directly with the native interfaces of IPC devices, such as socket objects’ raw send and recv methods—these transfer data immediately and do not buffer data as file methods can. We can either transfer simple byte strings this way or use the pickle module’s dumps and loads tools to convert Python objects to and from byte strings for such direct socket transfer (more on pickle in Chapter 17).
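The custom-wrapper option can be sketched as follows; this is a minimal illustration, not a full file replacement (a complete version would also supply readline, close, and the rest of the file interface), and the class name is ours:

```python
import socket

class SocketWriter:
    # minimal file-like object: intercepts text writes, encodes to
    # binary, and routes to the socket immediately with no buffering
    def __init__(self, sock):
        self.sock = sock
    def write(self, text):
        self.sock.send(text.encode('utf8'))   # encode and send right away
        return len(text)
    def flush(self):
        pass                                  # nothing is ever buffered here

a, b = socket.socketpair()                    # stand-in for a real link
out = SocketWriter(a)
print('spam', 42, file=out)                   # print needs only write()
data = b.recv(1024)
print(data)                                   # arrives without flush calls
```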

The latter option may be more direct (and the redirection utility module also returns the raw socket in support of such usage), but it isn’t viable in all scenarios, especially for existing or multimode scripts. In many cases, it may be most straightforward to use manual flush calls in shell-oriented programs whose streams might be linked to other programs through sockets.
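A pickle round trip over the raw socket interface might be sketched as follows; note that a real protocol would also need to frame its messages, since a large pickle can span multiple recv calls (the object content here is illustrative):

```python
import pickle, socket

a, b = socket.socketpair()                    # stand-in for a real link

obj = {'request': 'getfile', 'name': 'spam.txt', 'blocks': [1, 2, 3]}
a.send(pickle.dumps(obj))                     # object -> byte string -> socket

data = b.recv(4096)                           # small pickle: one recv suffices
result = pickle.loads(data)                   # byte string -> equal object
print(result == obj)
```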

Buffering in other contexts: Command pipes revisited

Also keep in mind that buffered streams and deadlock are general issues that go beyond socket wrapper files. We explored this topic in Chapter 5; as a quick review, the nonsocket Example 12-15 does not fully buffer its output when it is connected to a terminal (output is only line buffered when run from a shell command prompt), but does if connected to something else (including a socket or pipe).

Example 12-15. PP4E\Internet\Sockets\pipe-unbuff-writer.py
# output line buffered (unbuffered) if stdout is a terminal, buffered by default for
# other devices: use -u or sys.stdout.flush() to avoid delayed output on pipe/socket

import time, sys
for i in range(5):
    print(time.asctime())                 # print transfers per stream buffering
    sys.stdout.write('spam\n')            # ditto for direct stream file access
    time.sleep(2)                         # unless sys.stdout reset to other file

Although text-mode files are required for Python 3.X’s print in general, the -u flag still works in 3.X to suppress full output stream buffering. In Example 12-16, using this flag makes the spawned script’s printed output appear every 2 seconds, as it is produced. Not using this flag defers all output for 10 seconds, until the spawned script exits, unless the spawned script calls sys.stdout.flush on each iteration.

Example 12-16. PP4E\Internet\Sockets\pipe-unbuff-reader.py
# no output for 10 seconds unless Python -u flag used or sys.stdout.flush()
# but writer's output appears here every 2 seconds when either option is used

import os
for line in os.popen('python -u pipe-unbuff-writer.py'):    # iterator reads lines
    print(line, end='')                                     # blocks without -u!

Following is the reader script’s output; unlike the socket examples, it spawns the writer automatically, so we don’t need separate windows to test. Recall from Chapter 5 that os.popen also accepts a buffering argument much like socket.makefile, but it does not apply to the spawned program’s stream, and so would not prevent output buffering in this case.

C:\...\PP4E\Internet\Sockets> pipe-unbuff-reader.py
Wed Apr 07 09:32:28 2010
spam
Wed Apr 07 09:32:30 2010
spam
Wed Apr 07 09:32:32 2010
spam
Wed Apr 07 09:32:34 2010
spam
Wed Apr 07 09:32:36 2010
spam

The net effect is that -u still works around the stream buffering issue for connected programs in 3.X, as long as you don’t reset the streams to other objects in the spawned program as we did for socket redirection in Example 12-11. For socket redirections, manual flush calls or replacement socket wrappers may be required.

Sockets versus command pipes

So why use sockets in this redirection role at all? In short, for server independence and networked use cases. Notice how for command pipes it’s not clear who should be called “server” and “client,” since neither script runs perpetually. In fact, this is one of the major downsides of using command pipes like this instead of sockets—because the programs require a direct spawning relationship, command pipes do not support longer-lived or remotely running servers the way that sockets do.

With sockets, we can start client and server independently, and the server may continue running perpetually to serve multiple clients (albeit with some changes to our utility module’s listener initialization code). Moreover, passing in remote machine names to our socket redirection tools would allow a client to connect to a server running on a completely different machine. As we learned in Chapter 5, named pipes (fifos) accessed with the open call support stronger independence of client and server, too, but unlike sockets, they are usually limited to the local machine, and are not supported on all platforms.

Experiment with this code on your own for more insight. Also try changing Example 12-11 to run the client function in a spawned process instead of or in addition to the server, with and without flush calls and time.sleep calls to defer exits; the spawning structure might have some impact on the soundness of a given socket dialog structure as well, which we’ll finesse here in the interest of space.

Despite the care that must be taken with text encodings and stream buffering, the utility provided by Example 12-10 is still arguably impressive—prints and input calls are routed over network or local-machine socket connections in a largely automatic fashion, and with minimal changes to the nonsocket code that uses the module. In many cases, the technique can extend a script’s applicability.

In the next section, we’ll use the makefile method again to wrap the socket in a file-like object, so that it can be read by lines using normal text-file method calls and techniques. This isn’t strictly required in the example—we could read lines as byte strings with the socket recv call, too. In general, though, the makefile method comes in handy any time you wish to treat sockets as though they were simple files. To see this at work, let’s move on.
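As a quick preview of that technique, reading newline-terminated requests through a makefile wrapper can be sketched with a connected socket pair (the message content is illustrative):

```python
import socket

a, b = socket.socketpair()                # stand-in for client/server link
a.send(b'GET spam.txt\nGET eggs.txt\n')   # two newline-terminated requests

reader = b.makefile('r')                  # treat the socket as a text file
req1 = reader.readline()                  # standard file-line tools apply
req2 = reader.readline()
print(req1, req2)
```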

A Simple Python File Server

It’s time for something realistic. Let’s conclude this chapter by putting some of the socket ideas we’ve studied to work doing something a bit more useful than echoing text back and forth. Example 12-17 implements both the server-side and the client-side logic needed to ship a requested file from server to client machines over a raw socket.

In effect, this script implements a simple file download system. One instance of the script is run on the machine where downloadable files live (the server), and another on the machines you wish to copy files to (the clients). Command-line arguments tell the script which flavor to run and optionally name the server machine and port number over which conversations are to occur. A server instance can respond to any number of client file requests at the port on which it listens, because it serves each in a thread.

Example 12-17. PP4E\Internet\Sockets\getfile.py
"""
#############################################################################
implement client and server-side logic to transfer an arbitrary file from
server to client over a socket; uses a simple control-info protocol rather
than separate sockets for control and data (as in ftp), dispatches each
client request to a handler thread, and loops to transfer the entire file
by blocks; see ftplib examples for a higher-level transport scheme;
#############################################################################
"""

import sys, os, time, _thread as thread
from socket import *

blksz = 1024
defaultHost = 'localhost'
defaultPort = 50001

helptext = """
Usage...
server=> getfile.py  -mode server            [-port nnn] [-host hhh|localhost]
client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost]
"""

def now():
    return time.asctime()

def parsecommandline():
    dict = {}                        # put in dictionary for easy lookup
    args = sys.argv[1:]              # skip program name at front of args
    while len(args) >= 2:            # example: dict['-mode'] = 'server'
        dict[args[0]] = args[1]
        args = args[2:]
    return dict

def client(host, port, filename):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    sock.send((filename + '\n').encode())      # send remote name with dir: bytes
    dropdir = os.path.split(filename)[1]       # filename at end of dir path
    file = open(dropdir, 'wb')                 # create local file in cwd
    while True:
        data = sock.recv(blksz)                # get up to 1K at a time
        if not data: break                     # till closed on server side
        file.write(data)                       # store data in local file
    sock.close()
    file.close()
    print('Client got', filename, 'at', now())

def serverthread(clientsock):
    sockfile = clientsock.makefile('r')        # wrap socket in dup file obj
    filename = sockfile.readline()[:-1]        # get filename up to end-line
    try:
        file = open(filename, 'rb')
        while True:
            bytes = file.read(blksz)           # read/send 1K at a time
            if not bytes: break                # until file totally sent
            sent = clientsock.send(bytes)
            assert sent == len(bytes)
    except:
        print('Error downloading file on server:', filename)
    clientsock.close()

def server(host, port):
    serversock = socket(AF_INET, SOCK_STREAM)     # listen on TCP/IP socket
    serversock.bind((host, port))                 # serve clients in threads
    serversock.listen(5)
    while True:
        clientsock, clientaddr = serversock.accept()
        print('Server connected by', clientaddr, 'at', now())
        thread.start_new_thread(serverthread, (clientsock,))

def main(args):
    host = args.get('-host', defaultHost)         # use args or defaults
    port = int(args.get('-port', defaultPort))    # is a string in argv
    if args.get('-mode') == 'server':             # None if no -mode: client
        if host == 'localhost': host = ''         # else fails remotely
        server(host, port)
    elif args.get('-file'):                       # client mode needs -file
        client(host, port, args['-file'])
    else:
        print(helptext)

if __name__ == '__main__':
    args = parsecommandline()
    main(args)

This script isn’t much different from the examples we saw earlier. Depending on the command-line arguments passed, it invokes one of two functions:

  • The server function farms out each incoming client request to a thread that transfers the requested file’s bytes.

  • The client function sends the server a file’s name and stores all the bytes it gets back in a local file of the same name.

The most novel feature here is the protocol between client and server: the client starts the conversation by shipping a filename string up to the server, terminated with an end-of-line character, and including the file’s directory path in the server. At the server, a spawned thread extracts the requested file’s name by reading the client socket, and opens and transfers the requested file back to the client, one chunk of bytes at a time.

Running the File Server and Clients

Since the server uses threads to process clients, we can test both client and server on the same Windows machine. First, let’s start a server instance and execute two client instances on the same machine while the server runs:

[server window, localhost]
C:\...\Internet\Sockets> python getfile.py -mode server
Server connected by ('127.0.0.1', 59134) at Sun Apr 25 16:26:50 2010
Server connected by ('127.0.0.1', 59135) at Sun Apr 25 16:27:21 2010

[client window, localhost]
C:\...\Internet\Sockets> dir /B *.gif *.txt
File Not Found

C:\...\Internet\Sockets> python getfile.py -file testdir\ora-lp4e.gif
Client got testdir\ora-lp4e.gif at Sun Apr 25 16:26:50 2010

C:\...\Internet\Sockets> python getfile.py -file testdir\textfile.txt -port 50001
Client got testdir\textfile.txt at Sun Apr 25 16:27:21 2010

Clients run in the directory where you want the downloaded file to appear—the client instance code strips the server directory path when making the local file’s name. Here the “download” simply copies the requested files up to the local parent directory (the DOS fc command compares file contents):

C:\...\Internet\Sockets> dir /B *.gif *.txt
ora-lp4e.gif
textfile.txt

C:\...\Internet\Sockets> fc /B ora-lp4e.gif testdir\ora-lp4e.gif
FC: no differences encountered

C:\...\Internet\Sockets> fc textfile.txt testdir\textfile.txt
FC: no differences encountered

As usual, we can run server and clients on different machines as well. For instance, here are the sort of commands we would use to launch the server remotely and fetch files from it locally; run this on your own to see the client and server outputs:

[remote server window]
[...]$ python getfile.py -mode server

[client window: requested file downloaded in a thread on server]
C:\...\Internet\Sockets> python getfile.py -mode client
                             -host learning-python.com
                             -port 50001 -file python.exe

C:\...\Internet\Sockets> python getfile.py
                             -host learning-python.com -file index.html

One subtle security point here: the server instance code is happy to send any server-side file whose pathname is sent from a client, as long as the server is run with a username that has read access to the requested file. If you care about keeping some of your server-side files private, you should add logic to suppress downloads of restricted files. I’ll leave this as a suggested exercise here, but we will implement such filename checks in a different getfile download tool later in this book.[47]
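One way to code such a check (a hedged sketch; the share-directory name and policy here are illustrative additions, not part of getfile itself):

```python
import os

SHAREDIR = os.path.realpath('downloads')     # only serve files under here

def allowed(filename):
    # resolve '..' and symlinks before comparing against the share root,
    # so a client-sent path cannot escape the served directory tree
    target = os.path.realpath(os.path.join(SHAREDIR, filename))
    return target.startswith(SHAREDIR + os.sep)

print(allowed('testdir/ora-lp4e.gif'))       # True: inside the tree
print(allowed('../secret/passwords'))        # False: escapes the root
```

A server using this might call allowed on the filename read in serverthread and refuse the transfer when it returns False.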

Adding a User-Interface Frontend

After all the GUI commotion in the prior part of this book, you might have noticed that we have been living in the realm of the command line for this entire chapter—our socket clients and servers have been started from simple DOS or Linux shells. Nothing is stopping us from adding a nice point-and-click user interface to some of these scripts, though; GUI and network scripting are not mutually exclusive techniques. In fact, they can be arguably “sexy” when used together well.

For instance, it would be easy to implement a simple tkinter GUI frontend to the client-side portion of the getfile script we just met. Such a tool, run on the client machine, may simply pop up a window with Entry widgets for typing the desired filename, server, and so on. Once download parameters have been input, the user interface could either import and call the getfile.client function with appropriate option arguments, or build and run the implied getfile.py command line using tools such as os.system, os.popen, subprocess, and so on.

Using row frames and command lines

To help make all of this more concrete, let’s very quickly explore a few simple scripts that add a tkinter frontend to the getfile client-side program. All of these examples assume that you are running a server instance of getfile; they merely add a GUI for the client side of the conversation, to fetch a file from the server. The first, in Example 12-18, uses form construction techniques we met in Chapters 8 and 9 to create a dialog for inputting server, port, and filename information, and simply constructs the corresponding getfile command line and runs it with the os.system call we studied in Part II.

Example 12-18. PP4E\Internet\Sockets\getfilegui-1.py
"""
launch getfile script client from simple tkinter GUI;
could also use os.fork+exec, os.spawnv (see Launcher);
windows: replace 'python' with 'start' if not on path;
"""

import sys, os
from tkinter import *
from tkinter.messagebox import showinfo

def onReturnKey():
    cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' %
                      (content['File'].get(),
                       content['Port'].get(),
                       content['Server'].get()))
    os.system(cmdline)
    showinfo('getfilegui-1', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
content = {}
for label in labels:
    row = Frame(box)
    row.pack(fill=X)
    Label(row, text=label, width=6).pack(side=LEFT)
    entry = Entry(row)
    entry.pack(side=RIGHT, expand=YES, fill=X)
    content[label] = entry

box.title('getfilegui-1')
box.bind('<Return>', (lambda event: onReturnKey()))
mainloop()

When run, this script creates the input form shown in Figure 12-1. Pressing the Enter key (<Return>) runs a client-side instance of the getfile program; when the generated getfile command line is finished, we get the verification pop up displayed in Figure 12-2.

getfilegui-1 in action
Figure 12-1. getfilegui-1 in action
getfilegui-1 verification pop up
Figure 12-2. getfilegui-1 verification pop up

Using grids and function calls

The first user-interface script (Example 12-18) uses the pack geometry manager and row Frames with fixed-width labels to lay out the input form and runs the getfile client as a standalone program. As we learned in Chapter 9, it’s arguably just as easy to use the grid manager for layout and to import and call the client-side logic function instead of running a program. The script in Example 12-19 shows how.

Example 12-19. PP4E\Internet\Sockets\getfilegui-2.py
"""
same, but with grids and import+call, not packs and cmdline;
direct function calls are usually faster than running files;
"""

import getfile
from tkinter import *
from tkinter.messagebox import showinfo

def onSubmit():
    getfile.client(content['Server'].get(),
                   int(content['Port'].get()),
                   content['File'].get())
    showinfo('getfilegui-2', 'Download complete')

box    = Tk()
labels = ['Server', 'Port', 'File']
rownum  = 0
content = {}
for label in labels:
    Label(box, text=label).grid(column=0, row=rownum)
    entry = Entry(box)
    entry.grid(column=1, row=rownum, sticky=E+W)
    content[label] = entry
    rownum += 1

box.columnconfigure(0, weight=0)   # make expandable
box.columnconfigure(1, weight=1)
Button(text='Submit', command=onSubmit).grid(row=rownum, column=0, columnspan=2)

box.title('getfilegui-2')
box.bind('<Return>', (lambda event: onSubmit()))
mainloop()

This version makes a similar window (Figure 12-3), but adds a button at the bottom that does the same thing as an Enter key press—it runs the getfile client procedure. Generally speaking, importing and calling functions (as done here) is faster than running command lines, especially if done more than once. The getfile script is set up to work either way—as program or function library.

getfilegui-2 in action
Figure 12-3. getfilegui-2 in action

Using a reusable form-layout class

If you’re like me, though, writing all the GUI form layout code in those two scripts can seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to me that I decided to write a general-purpose form-layout class, shown in Example 12-20, which handles most of the GUI layout grunt work.

Example 12-20. PP4E\Internet\Sockets\form.py
"""
##################################################################
a reusable form class, used by getfilegui (and others)
##################################################################
"""

from tkinter import *
entrysize = 40

class Form:                                           # add non-modal form box
    def __init__(self, labels, parent=None):          # pass field labels list
        labelsize = max(len(x) for x in labels) + 2
        box = Frame(parent)                           # box has rows, buttons
        box.pack(expand=YES, fill=X)                  # rows has row frames
        rows = Frame(box, bd=2, relief=GROOVE)        # go=button or return key
        rows.pack(side=TOP, expand=YES, fill=X)       # runs onSubmit method
        self.content = {}
        for label in labels:
            row = Frame(rows)
            row.pack(fill=X)
            Label(row, text=label, width=labelsize).pack(side=LEFT)
            entry = Entry(row, width=entrysize)
            entry.pack(side=RIGHT, expand=YES, fill=X)
            self.content[label] = entry
        Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT)
        Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT)
        box.master.bind('<Return>', (lambda event: self.onSubmit()))

    def onSubmit(self):                                      # override this
        for key in self.content:                             # user inputs in
            print(key, '\t=>\t', self.content[key].get())    # self.content[k]

    def onCancel(self):                                      # override if need
        Tk().quit()                                          # default is exit

class DynamicForm(Form):
    def __init__(self, labels=None):
        labels = input('Enter field names: ').split()
        Form.__init__(self, labels)
    def onSubmit(self):
        print('Field values...')
        Form.onSubmit(self)
        self.onCancel()

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        Form(['Name', 'Age', 'Job'])     # precoded fields, stay after submit
    else:
        DynamicForm()                    # input fields, go away after submit
    mainloop()

Compare the approach of this module with that of the form row builder function we wrote in Chapter 10’s Example 10-9. While that example much reduced the amount of code required, the module here is a noticeably more complete and automatic scheme—it builds the entire form given a set of label names, and provides a dictionary with every field’s entry widget ready to be fetched.

Running this module standalone triggers its self-test code at the bottom. Without arguments (and when double-clicked in a Windows file explorer), the self-test generates a form with canned input fields captured in Figure 12-4, and displays the fields’ values on Enter key presses or Submit button clicks:

C:\...\PP4E\Internet\Sockets> python form.py
Age     =>       40
Name    =>       Bob
Job     =>       Educator, Entertainer
Figure 12-4. Form test, canned fields

With a command-line argument, the form class module’s self-test code prompts for an arbitrary set of field names for the form; fields can be constructed as dynamically as we like. Figure 12-5 shows the input form constructed in response to the following console interaction. Field names could be accepted on the command line, too, but the input built-in function works just as well for simple tests like this. In this mode, the GUI goes away after the first submit, because DynamicForm.onSubmit says so:

C:\...\PP4E\Internet\Sockets> python form.py -
Enter field names: Name Email Web Locale
Field values...
Locale  =>       Florida
Web     =>       http://learning-python.com
Name    =>       Book
Email   =>       [email protected]
Figure 12-5. Form test, dynamic fields

And last but not least, Example 12-21 shows the getfile user interface again, this time constructed with the reusable form layout class. We need to fill in only the form labels list and provide an onSubmit callback method of our own. All of the work needed to construct the form comes “for free,” from the imported and widely reusable Form superclass.

Example 12-21. PP4E\Internet\Sockets\getfilegui.py
"""
launch getfile client with a reusable GUI form class;
os.chdir to target local dir if input (getfile stores in cwd);
to do: use threads, show download status and getfile prints;
"""

from form import Form
from tkinter import Tk, mainloop
from tkinter.messagebox import showinfo
import getfile, os

class GetfileForm(Form):
    def __init__(self, oneshot=False):
        root = Tk()
        root.title('getfilegui')
        labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?']
        Form.__init__(self, labels, root)
        self.oneshot = oneshot

    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir?'].get()
        portnumber = self.content['Port Number'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        if localdir:
            os.chdir(localdir)
        portnumber = int(portnumber)
        getfile.client(servername, portnumber, filename)
        showinfo('getfilegui', 'Download complete')
        if self.oneshot: Tk().quit()  # else stay in last localdir

if __name__ == '__main__':
    GetfileForm()
    mainloop()

The form layout class imported here can be used by any program that needs to input form-like data; when used in this script, we get a user interface like that shown in Figure 12-6 under Windows 7 (and similar on other versions and platforms).

Figure 12-6. getfilegui in action

Pressing this form’s Submit button or the Enter key makes the getfilegui script call the imported getfile.client client-side function as before. This time, though, we also first change to the local directory typed into the form so that the fetched file is stored there (getfile stores in the current working directory, whatever that may be when it is called). Here are the messages printed in the client’s console, along with a check on the file transfer; the server is still running above testdir, but the client stores the file elsewhere after it’s fetched on the socket:

C:\...\Internet\Sockets> getfilegui.py
Local Dir?      =>       C:\users\Mark\temp
File Name       =>       testdir\ora-lp4e.gif
Server Name     =>       localhost
Port Number     =>       50001
Client got testdir\ora-lp4e.gif at Sun Apr 25 17:22:39 2010

C:\...\Internet\Sockets> fc /B C:\Users\mark\temp\ora-lp4e.gif testdir\ora-lp4e.gif
FC: no differences encountered

As usual, we can use this interface to connect to servers running locally on the same machine (as done here), or remotely on a different computer. Use a different server name and file paths if you’re running the server on a remote machine; the magic of sockets makes this all “just work” in either local or remote modes.

One caveat worth pointing out here: the GUI is essentially dead while the download is in progress (even screen redraws aren’t handled—try covering and uncovering the window and you’ll see what I mean). We could make this better by running the download in a thread, but since we’ll see how to do that in the next chapter when we explore the FTP protocol, you should consider this problem a preview.
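As a quick preview of that fix, here is one common way to structure it—a sketch only, with a time.sleep stand-in where the blocking getfile.client call would go: run the transfer in a worker thread, and have the worker report its outcome on a queue rather than touching widgets, since tkinter calls should stay in the main thread.

```python
import queue, threading, time

def download(server, port, filename, results):
    """Worker thread: perform the blocking transfer off the GUI thread,
    and report the outcome on a queue instead of updating widgets."""
    try:
        time.sleep(0.1)                        # stand-in for getfile.client(...)
        results.put(('ok', filename))
    except Exception as exc:
        results.put(('error', str(exc)))

results = queue.Queue()
threading.Thread(target=download,
                 args=('localhost', 50001, 'ora-lp4e.gif', results),
                 daemon=True).start()

# In the GUI, the main thread polls the queue instead of blocking:
# def poll():
#     try:
#         status, info = results.get_nowait()
#     except queue.Empty:
#         root.after(100, poll)                # check again in 100 ms
#     else:
#         showinfo('getfilegui', '%s: %s' % (status, info))
# root.after(100, poll)
```

Because only the main thread ever calls tkinter, screen redraws keep happening while the worker runs; we’ll flesh out this producer/consumer pattern when we add threads to transfers in the next chapter.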

In closing, a few final notes: first, I should point out that the scripts in this chapter use tkinter techniques we’ve seen before and won’t go into here in the interest of space; be sure to see the GUI chapters in this book for implementation hints.

Keep in mind, too, that these interfaces just add a GUI on top of the existing script to reuse its code; any command-line tool can be easily GUI-ified in this way to make it more appealing and user friendly. In Chapter 14, for example, we’ll meet a more useful client-side tkinter user interface for reading and sending email over sockets (PyMailGUI), which largely just adds a GUI to mail-processing tools. Generally speaking, GUIs can often be added as almost an afterthought to a program. Although the degree of user-interface and core logic separation varies per program, keeping the two distinct makes it easier to focus on each.

And finally, now that I’ve shown you how to build user interfaces on top of this chapter’s getfile, I should also say that they aren’t really as useful as they might seem. In particular, getfile clients can talk only to machines that are running a getfile server. In the next chapter, we’ll discover another way to download files—FTP—which also runs on sockets but provides a higher-level interface and is available as a standard service on many machines on the Net. We don’t generally need to start up a custom server to transfer files over FTP, the way we do with getfile. In fact, the user-interface scripts in this chapter could be easily changed to fetch the desired file with Python’s FTP tools, instead of the getfile module. But instead of spilling all the beans here, I’ll just say, “Read on.”



[42] There is even a common acronym for this today: LAMP, for the Linux operating system, the Apache web server, the MySQL database system, and the Python, Perl, and PHP scripting languages. It’s possible, and even very common, to put together an entire enterprise-level web server with open source tools. Python users would probably also like to include systems like Zope, Django, Webware, and CherryPy in this list, but the resulting acronym might be a bit of a stretch.

[43] Some books also use the term protocol to refer to lower-level transport schemes such as TCP. In this book, we use protocol to refer to higher-level structures built on top of sockets; see a networking text if you are curious about what happens at lower levels.

[44] Since Python is an open source system, you can read the source code of the ftplib module if you are curious about how the underlying protocol actually works. See the ftplib.py file in the standard source library directory in your machine. Its code is complex (since it must format messages and manage two sockets), but with the other standard Internet protocol modules, it is a good example of low-level socket programming.

[45] You might be interested to know that the last part of this example, talking to port 80, is exactly what your web browser does as you surf the Web: followed links direct it to download web pages over this port. In fact, this lowly port is the primary basis of the Web. In Chapter 15, we will meet an entire application environment based upon sending formatted data over port 80—CGI server-side scripting. At the bottom, though, the Web is just bytes over sockets, with a user interface. The wizard behind the curtain is not as impressive as he may seem!

[46] Confusingly, select-based servers are often called asynchronous, to describe their multiplexing of short-lived transactions. Really, though, the classic forking and threading servers we met earlier are asynchronous, too, as they do not wait for completion of a given client’s request. There is a clearer distinction between serial and parallel servers—the former process one transaction at a time and the latter do not—and “synchronous” and “asynchronous” are essentially synonyms for “serial” and “parallel.” By this definition, forking, threading, and select loops are three alternative ways to implement parallel, asynchronous servers.

[47] We’ll see three more getfile programs before we leave Internet scripting. The next chapter’s getfile.py fetches a file with the higher-level FTP interface instead of using raw socket calls, and its http-getfile scripts fetch files over the HTTP protocol. Later, Chapter 15 presents a server-side getfile.py CGI script that transfers file contents over the HTTP port in response to a request made in a web browser client (files are sent as the output of a CGI script). All four of the download schemes presented in this text ultimately use sockets, but only the version here makes that use explicit.
