Over the 15 years since this book was first published, the Internet has virtually exploded onto the mainstream stage. It has rapidly grown from a simple communication device used primarily by academics and researchers into a medium that is now nearly as pervasive as the television and telephone. Social observers have likened the Internet’s cultural impact to that of the printing press, and technical observers have suggested that all new software development of interest occurs only on the Internet. Naturally, time will be the final arbiter for such claims, but there is little doubt that the Internet is a major force in society and one of the main application contexts for modern software systems.
The Internet also happens to be one of the primary application domains for the Python programming language. In the decade and a half since the first edition of this book was written, the Internet’s growth has strongly influenced Python’s tool set and roles. Given Python and a computer with a socket-based Internet connection today, we can write Python scripts to read and send email around the world, fetch web pages from remote sites, transfer files by FTP, program interactive websites, parse HTML and XML files, and much more, simply by using the Internet modules that ship with Python as standard tools.
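To sample the flavor of these standard tools, here is a quick sketch that builds an email message with the standard library's email package; the addresses and subject line are made-up examples, and the result is text ready to hand to smtplib for actual delivery:

```python
# A small taste of Python's standard Internet tools: composing an
# email message with the email package (addresses are hypothetical)
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'pp4e@example.com'
msg['To'] = 'reader@example.com'
msg['Subject'] = 'Hello from Python'
msg.set_content('Sockets and protocols ahead!')

# The message serializes to standard MIME text, which smtplib
# can ship to an SMTP server for routing around the world
text = msg.as_string()
print(text)
```

We'll put tools like this to real work in Chapters 13 and 14.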
In fact, companies all over the world do: Google, YouTube, Walt Disney, Hewlett-Packard, JPL, and many others rely on Python’s standard tools to power their websites. For example, the Google search engine—widely credited with making the Web usable—makes extensive use of Python code. The YouTube video server site is largely implemented in Python. And the BitTorrent peer-to-peer file transfer system—written in Python and downloaded by tens of millions of users—leverages Python’s networking skills to share files among clients and remove some server bottlenecks.
Many also build and manage their sites with larger Python-based toolkits. For instance, the Zope web application server was an early entrant to the domain and is itself written and customizable in Python. Others build sites with the Plone content management system, which is built upon Zope and delegates site content to its users. Still others use Python to script Java web applications with Jython (formerly known as JPython)—a system that compiles Python programs to Java bytecode, exports Java libraries for use in Python scripts, and allows Python code to serve as web applets downloaded and run in a browser.
In more recent years, new techniques and systems have risen to prominence in the Web sphere. For example, XML-RPC and SOAP interfaces for Python have enabled web service programming; frameworks such as Google App Engine, Django, and TurboGears have emerged as powerful tools for constructing websites; the XML package in Python’s standard library, as well as third-party extensions, provides a suite of XML processing tools; and the IronPython implementation provides seamless .NET/Mono integration for Python code in much the same way Jython leverages Java libraries.
As the Internet has grown, so too has Python’s role as an Internet tool. Python has proven to be well suited to Internet scripting for some of the very same reasons that make it ideal in other domains. Its modular design and rapid turnaround mix well with the intense demands of Internet development. In this part of the book, we’ll find that Python does more than simply support Internet scripting; it also fosters qualities such as productivity and maintainability that are essential to Internet projects of all shapes and sizes.
Internet programming entails many topics, so to make the presentation easier to digest, I’ve split this subject over the next five chapters of this book. Here’s this part’s chapter rundown:
This chapter introduces Internet fundamentals and explores sockets, the underlying communications mechanism of the Internet. We met sockets briefly as IPC tools in Chapter 5 and again in a GUI use case in Chapter 10, but here we will study them in the depth afforded by their broader networking roles.
Chapter 13 covers the fundamentals of client-side scripting and Internet protocols. Here, we’ll explore Python’s standard support for FTP, email, HTTP, NNTP, and more.
Chapter 14 presents a larger client-side case study: PyMailGUI, a full-featured email client.
Chapter 15 discusses the fundamentals of server-side scripting and website construction. We’ll study basic CGI scripting techniques and concepts that underlie most of what happens in the Web.
Chapter 16 presents a larger server-side case study: PyMailCGI, a full-featured webmail site.
Each chapter assumes you’ve read the previous one, but you can generally skip around, especially if you have prior experience in the Internet domain. Since these chapters represent a substantial portion of this book at large, the following sections go into a few more details about what we’ll be studying.
In conceptual terms, the Internet can roughly be thought of as being composed of multiple functional layers:
Low-level transports such as TCP/IP, which deal with transferring bytes between machines but don't care what those bytes mean
The programmer’s interface to the network, which runs on top of physical networking layers like TCP/IP and supports flexible client/server models in both IPC and networked modes
Structured Internet communication schemes such as FTP and email, which run on top of sockets and define message formats and standard addresses
Application models such as CGI, which define the structure of communication between web browsers and web servers, also run on top of sockets, and support the notion of web-based programs
Third-party systems such as Django, App Engine, Jython, and pyjamas, which leverage sockets and communication protocols, too, but address specific techniques or larger problem domains
This book covers the middle three tiers in this list—sockets, the Internet protocols that run on them, and the CGI model of web-based conversations. What we learn here will also apply to more specific toolkits in the last tier above, because they are all ultimately based upon the same Internet and web fundamentals.
More specifically, in this and the next chapter, our main focus is on programming the second and third layers: sockets and higher-level Internet protocols. We’ll start this chapter at the bottom, learning about the socket model of network programming. Sockets aren’t strictly tied to Internet scripting, as we saw in Chapter 5’s IPC examples, but they are presented in full here because this is one of their primary roles. As we’ll see, most of what happens on the Internet happens through sockets, whether you notice or not.
After introducing sockets, the next two chapters make their way up to Python’s client-side interfaces to higher-level protocols—things like email and FTP transfers, which run on top of sockets. It turns out that a lot can be done with Python on the client alone, and Chapters 13 and 14 will sample the flavor of Python client-side scripting. Finally, the last two chapters in this part of the book then move on to present server-side scripting—programs that run on a server computer and are usually invoked by a web browser.
Now that I’ve told you what we will cover in this book, I also want to be clear about what we won’t cover. Like tkinter, the Internet is a vast topic, and this part of the book is mostly an introduction to its core concepts and an exploration of representative tasks. Because there are so many Internet-related modules and extensions, this book does not attempt to serve as an exhaustive survey of the domain. Even in just Python’s own tool set, there are simply too many Internet modules to include each in this text in any sort of useful fashion.
Moreover, higher-level tools like Django, Jython, and App Engine are very large systems in their own right, and they are best dealt with in more focused documents. Because dedicated books on such topics are now available, we’ll merely scratch their surfaces here with a brief survey later in this chapter. This book also says almost nothing about lower-level networking layers such as TCP/IP. If you’re curious about what happens on the Internet at the bit-and-wire level, consult a good networking text for more details.
In other words, this part is not meant to be an exhaustive reference to Internet and web programming with Python—a topic which has evolved between prior editions of this book, and will undoubtedly continue to do so after this one is published. Instead, the goal of this part of the book is to serve as a tutorial introduction to the domain to help you get started, and to provide context and examples which will help you understand the documentation for tools you may wish to explore after mastering the fundamentals here.
Like the prior parts of the book, this one has other agendas, too. Along the way, this part will also put to work many of the operating-system and GUI interfaces we studied in Parts II and III (e.g., processes, threads, signals, and tkinter). We’ll also get to see the Python language applied in realistically scaled programs, and we’ll investigate some of the design choices and challenges that the Internet presents.
That last statement merits a few more words. Internet scripting, like GUIs, is one of the “sexier” application domains for Python. As in GUI work, there is an intangible but instant gratification in seeing a Python Internet program ship information all over the world. On the other hand, by its very nature, network programming can impose speed overheads and user interface limitations. Though it may not be a fashionable stance these days, some applications are still better off not being deployed on the Web.
A traditional “desktop” GUI like those of Part III, for example, can combine the feature-richness and responsiveness of client-side libraries with the power of network access. On the other hand, web-based applications offer compelling benefits in portability and administration. In this part of the book, we will take an honest look at the Net’s trade-offs as they arise and explore examples which illustrate the advantages of both web and nonweb architectures. In fact, the larger PyMailGUI and PyMailCGI examples we’ll explore are intended in part to serve this purpose.
The Internet is also considered by many to be something of an ultimate proof of concept for open source tools. Indeed, much of the Net runs on top of a large number of such tools, such as Python, Perl, the Apache web server, the sendmail program, MySQL, and Linux.[42] Moreover, new tools and technologies for programming the Web sometimes seem to appear faster than developers can absorb them.
The good news is that Python’s integration focus makes it a natural in such a heterogeneous world. Today, Python programs can be installed as client-side and server-side tools; used as applets and servlets in Java applications; mixed into distributed object systems like CORBA, SOAP, and XML-RPC; integrated into AJAX-based applications; and much more. In more general terms, the rationale for using Python in the Internet domain is exactly the same as in any other—Python’s emphasis on quality, productivity, portability, and integration makes it ideal for writing Internet programs that are open, maintainable, and delivered according to the ever-shrinking schedules in this field.
Internet scripts generally imply execution contexts that earlier examples in this book have not. That is, it usually takes a bit more to run programs that talk over networks. Here are a few pragmatic notes about this part’s examples, up front:
You don’t need to download extra packages to run examples in this part of the book. All of the examples we’ll see are based on the standard set of Internet-related modules that come with Python and are installed in Python’s library directory.
You don’t need a state-of-the-art network link or an account on a web server to run the socket and client-side examples in this part. Although some socket examples will be shown running remotely, most can be run on a single local machine. Client-side examples that demonstrate protocols like FTP require only basic Internet access, and email examples expect just POP- and SMTP-capable servers.
You don’t need an account on a web server machine to run the server-side scripts in later chapters; they can be run by any web browser. You may need such an account to change these scripts if you store them remotely, but not if you use a locally running web server as we will in this book.
We’ll discuss configuration details as we move along, but in short, when a Python script opens an Internet connection (with the socket module or one of the Internet protocol modules), Python will happily use whatever Internet link exists on your machine, be that a dedicated T1 line, a DSL line, or a simple modem. For instance, opening a socket on a Windows PC automatically initiates processing to create a connection to your Internet provider if needed.
Moreover, as long as your platform supports sockets, you probably can run many of the examples here even if you have no Internet connection at all. As we’ll see, a machine name localhost or "" (an empty string) usually means the local computer itself. This allows you to test both the client and the server sides of a dialog on the same computer without connecting to the Net. For example, you can run both socket-based clients and servers locally on a Windows PC without ever going out to the Net. In other words, you can likely run the programs here whether you have a way to connect to the Internet or not.
Some later examples assume that a particular kind of server is running on a server machine (e.g., FTP, POP, SMTP), but client-side scripts themselves work on any Internet-aware machine with Python installed. Server-side examples in Chapters 15 and 16 require more: to develop CGI scripts, you’ll need to either have a web server account or run a web server program locally on your own computer (which is easier than you may think—we’ll learn how to code a simple one in Python in Chapter 15). Advanced third-party systems like Jython and Zope must be downloaded separately, of course; we’ll peek at some of these briefly in this chapter but defer to their own documentation for more details.
Although many are outside our scope here, there are a variety of ways that Python programmers script the Web. Just as we did for GUIs, I want to begin with a quick overview of some of the more popular tools in this domain before we jump into the fundamentals.
As we’ll see in this chapter, Python comes with tools that support basic networking, as well as the implementation of custom types of network servers. This includes sockets, but also the select call for asynchronous servers, as well as higher-order and precoded socket server classes. The standard library modules socket, select, and socketserver support all these roles.
As we’ll see in the next chapter, Python’s Internet arsenal also includes canned support for the client side of most standard Internet protocols—scripts can easily make use of FTP, email, HTTP, Telnet, and more. Especially when wedded to desktop GUIs of the sort we met in the preceding part of this book, these tools open the door to full-featured and highly responsive Web-aware applications.
Perhaps the simplest way to implement interactive website behavior, CGI scripting is an application model for running scripts on servers to process form data, take action based upon it, and produce reply pages. We’ll use it later in this part of the book. It’s supported by Python’s standard library directly, is the basis for much of what happens on the Web, and suffices for simpler site development tasks. Raw CGI scripting doesn’t by itself address issues such as cross-page state retention and concurrent updates, but CGI scripts that use devices like cookies and database systems often can.
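At its core, a CGI script simply decodes form inputs passed along by the web server and prints an HTTP reply page to its standard output. The following sketch simulates that round trip with urllib.parse (which performs the same query-string decoding that CGI tools automate); the field names here are made up, and a real script would read the query string from its environment:

```python
# What a CGI script does at its core: decode form inputs sent by the
# web server, then print an HTTP header and HTML reply to stdout.
import urllib.parse

query = 'user=Bob&age=40'            # normally from QUERY_STRING env var
form = urllib.parse.parse_qs(query)  # {'user': ['Bob'], 'age': ['40']}

reply = 'Content-type: text/html\n\n'
reply += '<h1>Hello, %s</h1>' % form['user'][0]
print(reply)
```

Chapter 15 covers the real thing, including form tags, escapes, and state retention.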
For more demanding Web work, frameworks can automate many of the low-level details and provide more structured and powerful techniques for dynamic site implementation. Beyond basic CGI scripts, the Python world is flush with third-party web frameworks such as Django—a high-level framework that encourages rapid development and clean, pragmatic design and includes a dynamic database access API and its own server-side templating language; Google App Engine—a “cloud computing” framework that provides enterprise-level tools for use in Python scripts and allows sites to leverage the capacity of Google’s Web infrastructure; and TurboGears—an integrated collection of tools including a JavaScript library, a template system, CherryPy for web interaction, and SQLObject for accessing databases using Python’s class model.
Also in the framework category are Zope—an open source web application server and toolkit, written in and customizable with Python, in which websites are implemented using a fundamentally object-oriented model; Plone—a Zope-based website builder which provides a workflow model (called a content management system) that allows content producers to add their content to a site; and other popular systems for website construction, including pylons, web2py, CherryPy, and Webware.
Many of these frameworks are based upon the now widespread MVC (model-view-controller) structure, and most provide state retention solutions that wrap database storage. Some make use of the ORM (object relational mapping) model we’ll meet in the next part of the book, which superimposes Python’s classes onto relational database tables, and Zope stores objects in your site in the ZODB object-oriented database we’ll study in the next part as well.
Discussed at the start of Chapter 7, newer and emerging “rich Internet application” (RIA) systems such as Flex, Silverlight, JavaFX, and pyjamas allow user interfaces implemented in web browsers to be much more dynamic and functional than HTML has traditionally allowed. These are client-side solutions, based generally upon AJAX and JavaScript, which provide widget sets that rival those of traditional “desktop” GUIs and provide for asynchronous communication with web servers. According to some observers, such interactivity is a major component of the “Web 2.0” model.
Ultimately, the web browser is a “desktop” GUI application, too, albeit one which is very widely available and which can be generalized with RIA techniques to serve as a platform for rendering other GUIs, using software layers that do not rely on a particular GUI library. In effect, RIAs turn web browsers into extendable GUIs.
At least that’s their goal today. Compared to traditional GUIs, RIAs gain some portability and deployment simplicity, in exchange for decreased performance and increased software stack complexity. Moreover, much as in the GUI realm, there are already competing RIA toolkits today which may add dependencies and impact portability. Unless a pervasive frontrunner appears, using a RIA application may require an install step, not unlike desktop applications.
Stay tuned, though; like the Web at large, the RIA story is still a work in progress. The emerging HTML5 standard, for instance, while likely not to become prevalent for some years to come, may obviate the need for RIA browser plug-ins eventually.
XML-RPC is a technology that provides remote procedural calls to components over networks. It routes requests over the HTTP protocol and ships data back and forth packaged as XML text. To clients, web servers appear to be simple functions; when function calls are issued, passed data is encoded as XML and shipped to remote servers using the Web’s HTTP transport mechanism. The net effect is to simplify the interface to web servers in client-side programs.
More broadly, XML-RPC fosters the notion of web services—reusable software components that run on the Web—and is supported by Python’s xmlrpc.client module, which handles the client side of this protocol, and xmlrpc.server, which provides tools for the server side. SOAP is a similar but generally heavier web services protocol, available to Python in the third-party SOAPy and ZSI packages, among others.
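To make the idea concrete, here is a sketch that runs both ends of an XML-RPC dialog in a single process: a function registered on a server becomes a remote call for clients, with arguments and results shipped as XML over HTTP behind the scenes (the function name here is an arbitrary example):

```python
# XML-RPC in a nutshell: a registered function becomes a remote call.
# Both ends run in one process here; normally they'd be on different
# machines, with a real port number agreed on in advance.
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client, threading

server = SimpleXMLRPCServer(('localhost', 0), logRequests=False)
server.register_function(lambda x, y: x + y, 'add')
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# To the client, the server looks like an object with methods
proxy = xmlrpc.client.ServerProxy('http://localhost:%d' % port)
result = proxy.add(2, 3)
print(result)
server.shutdown()
```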
An earlier but comparable technology, CORBA is an architecture for distributed programming, in which components communicate across a network by routing calls through an Object Request Broker (ORB). Python support for CORBA is available in the third-party OmniORB package, as well as the (still available though not recently maintained) ILU system.
We also met Jython and IronPython briefly at the start of Chapter 7, in the context of GUIs. By compiling Python script to Java bytecode, Jython also allows Python scripts to be used in any context that Java programs can. This includes web-oriented roles, such as applets stored on the server but run on the client when referenced within web pages. The IronPython system also mentioned in Chapter 7 similarly offers Web-focused options, including access to the Silverlight RIA framework and its Moonlight implementation in the Mono system for Linux.
Though not technically tied to the Internet, XML text often appears in such roles. Because of its other roles, though, we’ll study Python’s basic XML parsing support, as well as third-party extensions to it, in the next part of this book, when we explore Python’s text processing toolkit. As we’ll see, Python’s xml package comes with support for DOM, SAX, and ElementTree style XML parsing, and the open source domain provides extensions for XPath and much more. Python’s html.parser library module also provides an HTML-specific parser, with a model not unlike that of XML’s SAX technique. Such tools can be used in screen scraping roles, to extract the content of web pages fetched with urllib.request tools.
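As a preview of that SAX-like callback model, the following sketch scrapes link targets out of a page's HTML with the standard html.parser module; the markup here is an inline example standing in for a page fetched with urllib.request:

```python
# Screen scraping with the standard html.parser module: the parser
# calls our handler for each tag it encounters, SAX style.
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []                          # collect hrefs seen
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links.extend(v for (k, v) in attrs if k == 'href')

page = '<html><body><a href="http://python.org">Python</a></body></html>'
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)
```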
The PyWin32 package allows Python scripts to communicate via COM on Windows to perform feats such as editing Word documents and populating Excel spreadsheets (additional tools support Excel document processing). Though not related to the Internet itself (and being arguably upstaged by .NET in recent years), the distributed extension to COM, DCOM, offers additional options for distributing applications over networks.
Other tools serve more specific roles. Among this crowd are mod_python—a system which optimizes the execution of Python server-scripts in the Apache web server; Twisted—an asynchronous, event-driven, networking framework written in Python, with support for a large number of network protocols and with precoded implementations of common network servers; HTMLgen—a lightweight tool that allows HTML code to be generated from a tree of Python objects that describes a web page; and Python Server Pages (PSP)—a server-side templating technology that embeds Python code inside HTML, runs it with request context to render part of a reply page, and is strongly reminiscent of PHP, ASP, and JSP.
As you might expect given the prominence of the Web, there are more Internet tools for Python than we have space to discuss here. For more on this front, see the PyPI website at http://python.org/, or visit your favorite web search engine (some of which are implemented using Python’s Internet tools themselves).
Again, the goal of this book is to cover the fundamentals in an in-depth way, so that you’ll have the context needed to use tools like some of those above well, when you’re ready to graduate to more comprehensive solutions. As we’ll see, the basic model of CGI scripting we’ll meet here illustrates the mechanisms underlying all web development, whether it’s implemented by bare-bones scripts, or advanced frameworks.
Because we must walk before we can run well, though, let’s start at the bottom here, and get a handle on what the Internet really is. The Internet today rests upon a rich software stack; while tools can hide some of its complexity, programming it skillfully still requires knowledge of all its layers. As we’ll see, deploying Python on the Web, especially with higher-order web frameworks like those listed above, is only possible because we truly are “surfing on the shoulders of giants.”
Unless you’ve been living in a cave for the last decade or two, you are probably already familiar with the Internet, at least from a user’s perspective. Functionally, we use it as a communication and information medium, by exchanging email, browsing web pages, transferring files, and so on. Technically, the Internet consists of many layers of abstraction and devices—from the actual wires used to send bits across the world to the web browser that grabs and renders those bits into text, graphics, and audio on your computer.
In this book, we are primarily concerned with the programmer’s interface to the Internet. This, too, consists of multiple layers: sockets, which are programmable interfaces to the low-level connections between machines, and standard protocols, which add structure to discussions carried out over sockets. Let’s briefly look at each of these layers in the abstract before jumping into programming details.
In simple terms, sockets are a programmable interface to connections between programs, possibly running on different computers of a network. They allow data formatted as byte strings to be passed between processes and machines. Sockets also form the basis and low-level “plumbing” of the Internet itself: all of the familiar higher-level Net protocols, like FTP, web pages, and email, ultimately occur over sockets. Sockets are also sometimes called communications endpoints because they are the portals through which programs send and receive bytes during a conversation.
Although often used for network conversations, sockets may also be used as a communication mechanism between programs running on the same computer, taking the form of a general Inter-Process Communication (IPC) mechanism. We saw this socket usage mode briefly in Chapter 5. Unlike some IPC devices, sockets are bidirectional data streams: programs may both send and receive data through them.
To programmers, sockets take the form of a handful of calls available in a library. These socket calls know how to send bytes between machines, using lower-level operations such as the TCP network transmission control protocol. At the bottom, TCP knows how to transfer bytes, but it doesn’t care what those bytes mean. For the purposes of this text, we will generally ignore how bytes sent to sockets are physically transferred. To understand sockets fully, though, we need to know a bit about how computers are named.
Suppose for just a moment that you wish to have a telephone conversation with someone halfway across the world. In the real world, you would probably need either that person’s telephone number or a directory that you could use to look up the number from her name (e.g., a telephone book). The same is true on the Internet: before a script can have a conversation with another computer somewhere in cyberspace, it must first know that other computer’s number or name.
Luckily, the Internet defines standard ways to name both a remote machine and a service provided by that machine. Within a script, the computer program to be contacted through a socket is identified by supplying a pair of values—the machine name and a specific port number on that machine:
A machine name may take the form of either a string of numbers separated by dots, called an IP address (e.g., 166.93.218.100), or a more legible form known as a domain name (e.g., starship.python.net). Domain names are automatically mapped into their dotted numeric address equivalents when used, by something called a domain name server—a program on the Net that serves the same purpose as your local telephone directory assistance service. As a special case, the machine name localhost, and its equivalent IP address 127.0.0.1, always mean the same local machine; this allows us to refer to servers running locally on the same computer as their clients.
A port number is an agreed-upon numeric identifier for a given conversation. Because computers on the Net support a variety of services, port numbers are used to name a particular conversation on a given machine. For two machines to talk over the Net, both must associate sockets with the same machine name and port number when initiating network connections. As we’ll see, Internet protocols such as email and the Web have standard reserved port numbers for their connections, so clients can request a service regardless of the machine providing it. Port number 80, for example, usually provides web pages on any web server machine.
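Python's socket module exposes both halves of this naming scheme directly, as the following sketch shows; the service lookup depends on the platform's services database, so it is wrapped defensively here:

```python
# Machine names and ports in action: the socket module maps names
# to IP addresses, and service names to reserved port numbers.
import socket

print(socket.gethostbyname('localhost'))       # always the local machine

# Reserved ports can be looked up by protocol name; availability
# depends on the platform's services database (e.g., /etc/services)
try:
    print(socket.getservbyname('http', 'tcp'))  # usually 80
except OSError:
    print('no services database on this platform')
```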
The combination of a machine name and a port number uniquely identifies every dialog on the Net. For instance, an ISP’s computer may provide many kinds of services for customers—web pages, Telnet, FTP transfers, email, and so on. Each service on the machine is assigned a unique port number to which requests may be sent. To get web pages from a web server, programs need to specify both the web server’s Internet Protocol (IP) or domain name and the port number on which the server listens for web page requests.
If this sounds a bit strange, it may help to think of it in old-fashioned terms. To have a telephone conversation with someone within a company, for example, you usually need to dial both the company’s phone number and the extension of the person you want to reach. If you don’t know the company’s number, you can probably find it by looking up the company’s name in a phone book. It’s almost the same on the Net—machine names identify a collection of services (like a company), port numbers identify an individual service within a particular machine (like an extension), and domain names are mapped to IP numbers by domain name servers (like a phone book).
When programs use sockets to communicate in specialized ways with another machine (or with other processes on the same machine), they need to avoid using a port number reserved by a standard protocol—numbers in the range of 0 to 1023—but we first need to discuss protocols to understand why.
Although sockets form the backbone of the Internet, much of the activity that happens on the Net is programmed with protocols,[43] which are higher-level message models that run on top of sockets. In short, the standard Internet protocols define a structured way to talk over sockets. They generally standardize both message formats and socket port numbers:
Message formats provide structure for the bytes exchanged over sockets during conversations.
Port numbers are reserved numeric identifiers for the underlying sockets over which messages are exchanged.
Raw sockets are still commonly used in many systems, but it is perhaps more common (and generally easier) to communicate with one of the standard higher-level Internet protocols. As we’ll see, Python provides support for standard protocols, which automates most of the socket and message formatting details.
Technically speaking, socket port numbers can be any 16-bit integer value between 0 and 65,535. However, to make it easier for programs to locate the standard protocols, port numbers in the range of 0 to 1023 are reserved and preassigned to the standard higher-level protocols. Table 12-1 lists the ports reserved for many of the standard protocols; each gets one or more preassigned numbers from the reserved range.
| Protocol | Common function | Port number | Python module |
|----------|-----------------|-------------|---------------|
| HTTP | Web pages | 80 | http.client |
| NNTP | Usenet news | 119 | nntplib |
| FTP data default | File transfers | 20 | ftplib |
| FTP control | File transfers | 21 | ftplib |
| SMTP | Sending email | 25 | smtplib |
| POP3 | Fetching email | 110 | poplib |
| IMAP4 | Fetching email | 143 | imaplib |
| Finger | Informational | 79 | n/a |
| SSH | Command lines | 22 | n/a: third party |
| Telnet | Command lines | 23 | telnetlib |
To socket programmers, the standard protocols mean that port numbers 0 to 1023 are off-limits to scripts, unless they really mean to use one of the higher-level protocols. This is both by standard and by common sense. A Telnet program, for instance, can start a dialog with any Telnet-capable machine by connecting to its port, 23; without preassigned port numbers, each server might install Telnet on a different port. Similarly, websites listen for page requests from browsers on port 80 by standard; if they did not, you might have to know and type the HTTP port number of every site you visit while surfing the Net.
By defining standard port numbers for services, the Net naturally gives rise to a client/server architecture. On one side of a conversation, machines that support standard protocols perpetually run a set of programs that listen for connection requests on the reserved ports. On the other end of a dialog, other machines contact those programs to use the services they export.
We usually call the perpetually running listener program a server and the connecting program a client. Let’s use the familiar web browsing model as an example. As shown in Table 12-1, the HTTP protocol used by the Web allows clients and servers to talk over sockets on port 80:
A machine that hosts websites usually runs a web server program that constantly listens for incoming connection requests, on a socket bound to port 80. Often, the server itself does nothing but watch for requests on its port perpetually; handling requests is delegated to spawned processes or threads.
Programs that wish to talk to this server specify the server machine’s name and port 80 to initiate a connection. For web servers, typical clients are web browsers like Firefox, Internet Explorer, or Chrome, but any script can open a client-side connection on port 80 to fetch web pages from the server. The server’s machine name can also be simply “localhost” if it’s the same as the client’s.
In general, many clients may connect to a server over sockets, whether it implements a standard protocol or something more specific to a given application. And in some applications, the notion of client and server is blurred—programs can also pass bytes between each other more as peers than as master and subordinate. An agent in a peer-to-peer file transfer system, for instance, may at various times be both client and server for parts of files transferred.
For the purposes of this book, though, we usually call programs that listen on sockets servers, and those that connect clients. We also sometimes call the machines that these programs run on server and client (e.g., a computer on which a web server program runs may be called a web server machine, too), but this has more to do with the physical than the functional.
Functionally, protocols may accomplish a familiar task, like reading email or posting a Usenet newsgroup message, but they ultimately consist of message bytes sent over sockets. The structure of those message bytes varies from protocol to protocol, is hidden by the Python library, and is mostly beyond the scope of this book, but a few general words may help demystify the protocol layer.
Some protocols may define the contents of messages sent over sockets; others may specify the sequence of control messages exchanged during conversations. By defining regular patterns of communication, protocols make communication more robust. They can also minimize deadlock conditions—machines waiting for messages that never arrive.
For example, the FTP protocol prevents deadlock by conversing over two sockets: one for control messages only and one to transfer file data. An FTP server listens for control messages (e.g., “send me a file”) on one port, and transfers file data over another. FTP clients open socket connections to the server machine’s control port, send requests, and send or receive file data over a socket connected to a data port on the server machine. FTP also defines standard message structures passed between client and server. The control message used to request a file, for instance, must follow a standard format.
If all of this sounds horribly complex, cheer up: Python’s standard protocol modules handle all the details. For example, the Python library’s ftplib module manages all the socket and message-level handshaking implied by the FTP protocol. Scripts that import ftplib have access to a much higher-level interface for FTPing files and can be largely ignorant of both the underlying FTP protocol and the sockets over which it runs.[44]
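To give a feel for the level of this interface, here is a brief sketch of an anonymous text download with ftplib; the server name and filename in the usage note are hypothetical stand-ins, not from the book's examples:

```python
from ftplib import FTP          # standard library FTP protocol handler

def fetch_lines(server, filename):
    """Fetch a text file anonymously; the socket and FTP message
    handshaking are all handled by ftplib on our behalf."""
    conn = FTP(server)                       # connect to FTP control port 21
    conn.login()                             # anonymous login unless a user is passed
    lines = []
    conn.retrlines('RETR ' + filename, lines.append)  # data connection made for us
    conn.quit()
    return lines

# usage (requires a reachable FTP server; name is hypothetical):
# print(fetch_lines('ftp.example.com', 'README'))
```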
In fact, each supported protocol is represented in Python’s standard library by either a module package of the same name as the protocol or by a module file with a name of the form xxxlib.py, where xxx is replaced by the protocol’s name. The last column in Table 12-1 gives the module name for some standard protocol modules. For instance, FTP is supported by the module file ftplib.py and HTTP by the package http.*. Moreover, within the protocol modules, the top-level interface object is often the name of the protocol. So, for instance, to start an FTP session in a Python script, you run import ftplib and pass appropriate parameters in a call to ftplib.FTP; for Telnet, create a telnetlib.Telnet instance.
In addition to the protocol implementation modules in Table 12-1, Python’s standard library also contains modules for fetching replies from web servers for a web page request (urllib.request), parsing and handling data once it has been transferred over sockets or protocols (html.parser, the email.* and xml.* packages), and more. Table 12-2 lists some of the more commonly used modules in this category.
We will meet many of the modules in this table in the next few chapters of this book, but not all of them. Moreover, there are additional Internet modules in Python not shown here. The modules demonstrated in this book will be representative, but as always, be sure to see Python’s standard Library Reference Manual for more complete and up-to-date lists and details.
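For instance, html.parser can pick structure out of page text once it has been fetched; a small sketch with a made-up snippet of HTML:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href targets of anchor tags seen in the fed text."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links.extend(value for (name, value) in attrs if name == 'href')

parser = LinkCollector()
parser.feed('<p><a href="http://www.python.org">Python</a></p>')
print(parser.links)                  # ['http://www.python.org']
```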
Now that we’ve seen how sockets figure into the Internet picture, let’s move on to explore the tools that Python provides for programming sockets with Python scripts. This section shows you how to use the Python socket interface to perform low-level network communications. In later chapters, we will instead use one of the higher-level protocol modules that hide underlying sockets. Python’s socket interfaces can be used directly, though, to implement custom network dialogs and to access standard protocols manually.
As previewed in Chapter 5, the basic socket interface in Python is the standard library’s socket module. Like the os POSIX module, Python’s socket module is just a thin wrapper (interface layer) over the underlying C library’s socket calls. Like Python files, it’s also object-based—methods of a socket object implemented by this module call out to the corresponding C library’s operations after data conversions. For instance, the C library’s send and recv function calls become methods of socket objects in Python.
Python’s socket module supports socket programming on any machine that supports BSD-style sockets—Windows, Macs, Linux, Unix, and so on—and so provides a portable socket interface. In addition, this module supports all commonly used socket types—TCP/IP, UDP, datagram, and Unix domain—and can be used as both a network interface API and a general IPC mechanism between processes running on the same machine.
From a functional perspective, sockets are a programmer’s device for transferring bytes between programs, possibly running on different computers. Although sockets themselves transfer only byte strings, we can also transfer Python objects through them by using Python’s pickle module. Because this module converts Python objects such as lists, dictionaries, and class instances to and from byte strings, it provides the extra step needed to ship higher-level objects through sockets when required.
Python’s struct module can also be used to format Python objects as packed binary data byte strings for transmission, but it is generally limited in scope to objects that map to types in the C programming language. The pickle module supports transmission of larger objects, such as dictionaries and class instances. For other tasks, including most standard Internet protocols, simpler formatted byte strings suffice. We’ll learn more about pickle later in this chapter and book.
Beyond basic data communication tasks, the socket module also includes a variety of more advanced tools. For instance, it has calls for the following and more:
Converting bytes to a standard network ordering (ntohl, htonl)
Querying machine name and address (gethostname, gethostbyname)
Wrapping socket objects in a file object interface (sockobj.makefile)
Making socket calls nonblocking (sockobj.setblocking)
Setting socket timeouts (sockobj.settimeout)
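A few of these calls in action, as a quick sketch:

```python
import socket

print(socket.gethostname())                  # this machine's network name
print(socket.htons(80))                      # host-to-network byte order, 16-bit
print(socket.ntohl(socket.htonl(50007)))     # round trip through network order: 50007

sockobj = socket.socket()                    # AF_INET/SOCK_STREAM are the defaults
sockobj.settimeout(5.0)                      # blocking calls now give up after 5 seconds
print(sockobj.gettimeout())                  # 5.0
sockobj.close()
```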
Provided your Python was compiled with Secure Sockets Layer (SSL) support, the ssl standard library module also supports encrypted transfers with its ssl.wrap_socket call. This call wraps a socket object in SSL logic, which is used in turn by other standard library modules to support the HTTPS secure website protocol (http.client and urllib.request), secure email transfers (poplib and smtplib), and more. We’ll meet some of these other modules later in this part of the book, but we won’t study all of the socket module’s advanced features in this text; see the Python library manual for usage details omitted here.
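For illustration, here is a client-side sketch of layering SSL over a plain TCP socket; note that current Pythons favor the ssl.SSLContext interface over the older module-level wrap_socket call, and the host name is just an example:

```python
import socket, ssl

def open_secure(host='www.python.org', port=443):
    """Return a socket wrapped in SSL logic, ready for encrypted transfers."""
    context = ssl.create_default_context()            # sensible cert/protocol defaults
    raw = socket.create_connection((host, port))      # ordinary TCP connection first
    return context.wrap_socket(raw, server_hostname=host)  # SSL layered on top

# usage (requires network access):
# secure = open_secure()
# secure.close()
```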
Although we won’t get into advanced socket use in this chapter, basic socket transfers are remarkably easy to code in Python. To create a connection between machines, Python programs import the socket module, create a socket object, and call the object’s methods to establish connections and send and receive data.
Sockets are inherently bidirectional in nature, and socket object methods map directly to socket calls in the C library. For example, the script in Example 12-1 implements a program that simply listens for a connection on a socket and echoes back over a socket whatever it receives through that socket, adding Echo=> string prefixes.
""" Server side: open a TCP/IP socket on a port, listen for a message from a client, and send an echo reply; this is a simple one-shot listen/reply conversation per client, but it goes into an infinite loop to listen for more clients as long as this server script runs; the client may run on a remote machine, or on same computer if it uses 'localhost' for server """ from socket import * # get socket constructor and constants myHost = '' # '' = all available interfaces on host myPort = 50007 # listen on a non-reserved port number sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP socket object sockobj.bind((myHost, myPort)) # bind it to server port number sockobj.listen(5) # listen, allow 5 pending connects while True: # listen until process killed connection, address = sockobj.accept() # wait for next client connect print('Server connected by', address) # connection is a new socket while True: data = connection.recv(1024) # read next line on client socket if not data: break # send a reply line to the client connection.send(b'Echo=>' + data) # until eof when socket closed connection.close()
As mentioned earlier, we usually call programs like this that listen for incoming connections servers because they provide a service that can be accessed at a given machine and port on the Internet. Programs that connect to such a server to access its service are generally called clients. Example 12-2 shows a simple client implemented in Python.
""" Client side: use sockets to send data to the server, and print server's reply to each message line; 'localhost' means that the server is running on the same machine as the client, which lets us test client and server on one machine; to test over the Internet, run a server on a remote machine, and set serverHost or argv[1] to machine's domain name or IP addr; Python sockets are a portable BSD socket interface, with object methods for the standard socket calls available in the system's C library; """ import sys from socket import * # portable socket interface plus constants serverHost = 'localhost' # server name, or: 'starship.python.net' serverPort = 50007 # non-reserved port used by the server message = [b'Hello network world'] # default text to send to server # requires bytes: b'' or str,encode() if len(sys.argv) > 1: serverHost = sys.argv[1] # server from cmd line arg 1 if len(sys.argv) > 2: # text from cmd line args 2..n message = (x.encode() for x in sys.argv[2:]) sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP/IP socket object sockobj.connect((serverHost, serverPort)) # connect to server machine + port for line in message: sockobj.send(line) # send line to server over socket data = sockobj.recv(1024) # receive line from server: up to 1k print('Client received:', data) # bytes are quoted, was `x`, repr(x) sockobj.close() # close socket to send eof to server
Before we see these programs in action, let’s take a minute to explain how this client and server do their stuff. Both are fairly simple examples of socket scripts, but they illustrate the common call patterns of most socket-based programs. In fact, this is boilerplate code: most connected socket programs generally make the same socket calls that our two scripts do, so let’s step through the important points of these scripts line by line.
Programs such as Example 12-1 that provide services for other programs with sockets generally start out by following this sequence of calls:
sockobj = socket(AF_INET, SOCK_STREAM)
Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported from the socket module; using them in combination means “create a TCP/IP socket,” the standard communication device for the Internet. More specifically, AF_INET means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol. The AF_INET/SOCK_STREAM combination is the default because it is so common, but it’s typical to make this explicit.
If you use other names in this call, you can instead create things like UDP connectionless sockets (use SOCK_DGRAM second) and Unix domain sockets on the local machine (use AF_UNIX first), but we won’t do so in this book. See the Python library manual for details on these and other socket module options. Using other socket types is mostly a matter of using different forms of boilerplate code.
sockobj.bind((myHost, myPort))
Associates the socket object with an address—for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (''), which means the machine that the script runs on (formally, all available local and remote interfaces on the machine), and the port is a number outside the range 0 to 1023 (which is reserved for standard protocols, described earlier).
Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parentheses in this call—for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX). Technically, bind takes a tuple of values appropriate for the type of socket created.
sockobj.listen(5)
Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which happens only if a server isn’t fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs; the value must be at least 1.
At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine) and falls into an infinite loop—while True (or the equivalent while 1 for older Pythons and ex-C programmers)—waiting for them to arrive:
connection, address = sockobj.accept()
Waits for the next client connection request to occur; when it does, the accept call returns a brand-new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj, but communication with a client happens on connection, the new socket. This call actually returns a two-item tuple—address is the connecting client’s Internet address. We can call accept more than once, to service multiple client connections; that’s why each call returns a new, distinct socket for talking to a particular client.
Once we have a client connection, we fall into another loop to receive data from the client in blocks of up to 1,024 bytes at a time, and echo each block back to the client:
data = connection.recv(1024)
Reads at most 1,024 more bytes of the next message sent from a client (i.e., coming across the network or IPC connection), and returns it to the script as a byte string. We get back an empty byte string when the client has finished—end-of-file is triggered when the client closes its end of the socket.
connection.send(b'Echo=>' + data)
Sends the latest byte string data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here—the next reply line. Technically this call sends as much data as possible, and returns the number of bytes actually sent. To be fully robust, some programs may need to resend unsent portions or use connection.sendall to force all bytes to be sent.
connection.close()
Shuts down this client’s connection socket once the dialog ends; the server then returns to its outer loop to accept the next client.
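To make the send contract concrete, here is a sketch of the resend loop that sendall performs internally, written against any send-style callable so it can be tried without a live socket:

```python
def send_all(send, data):
    """Keep calling send until every byte of data has gone out.
    send is any callable with socket-send semantics: it transmits
    some prefix of its argument and returns the number of bytes sent."""
    total = 0
    while total < len(data):
        sent = send(data[total:])       # may send fewer bytes than asked
        total += sent
    return total

# try it with a fake send that moves at most 3 bytes per call
out = bytearray()
fake_send = lambda chunk: out.extend(chunk[:3]) or min(3, len(chunk))
print(send_all(fake_send, b'Echo=>spam'), bytes(out))   # 10 b'Echo=>spam'
```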
So far we’ve seen calls used to transfer data in a server, but what is it that is actually shipped through a socket? As we learned in Chapter 5, sockets by themselves always deal in binary byte strings, not text. To your scripts, this means you must send and will receive bytes strings, not str, though you can convert to and from text as needed with bytes.decode and str.encode methods. In our scripts, we use b'...' bytes literals to satisfy socket data requirements. In other contexts, tools such as the struct and pickle modules return the byte strings we need automatically, so no extra steps are needed.
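The conversions are simple in practice; a quick sketch of the round trip a script performs before a send and after a recv:

```python
reply = 'Echo=>' + 'spam'            # text on the script side
data = reply.encode()                # str -> bytes before a socket send
print(data)                          # b'Echo=>spam'
print(data.decode())                 # bytes -> str after a socket recv
```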
For example, although the socket model is limited to transferring byte strings, you can send and receive nearly arbitrary Python objects with the standard library pickle object serialization module. Its dumps and loads calls convert Python objects to and from byte strings, ready for direct socket transfer:
>>> import pickle
>>> x = pickle.dumps([99, 100])          # on sending end... convert to byte strings
>>> x                                    # string passed to send, returned by recv
b'\x80\x03]q\x00(KcKde.'
>>> pickle.loads(x)                      # on receiving end... convert back to object
[99, 100]
For simpler types that correspond to those in the C language, the struct module provides the byte-string conversion we need as well:
>>> import struct
>>> x = struct.pack('>ii', 99, 100)      # convert simpler types for transmission
>>> x
b'\x00\x00\x00c\x00\x00\x00d'
>>> struct.unpack('>ii', x)
(99, 100)
When converted this way, Python native objects become candidates for socket-based transfers. See Chapter 4 for more on struct. We previewed pickle and object serialization in Chapter 1, but we’ll learn more about it and its few pickleability constraints when we explore data persistence in Chapter 17.
In fact there are a variety of ways to extend the basic socket transfer model. For instance, much like os.fdopen and open for the file descriptors we studied in Chapter 4, the socket.makefile method allows you to wrap sockets in text-mode file objects that handle text encodings for you automatically. This call also allows you to specify nondefault Unicode encodings and end-line behaviors in text mode with extra arguments in 3.X, just like the open built-in function. Because its result mimics file interfaces, the socket.makefile call additionally allows the pickle module’s file-based calls to transfer objects over sockets implicitly. We’ll see more on socket file wrappers later in this chapter.
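The file-wrapper idea can be demonstrated without any network at all by using socket.socketpair, which returns two already-connected sockets (a sketch; socketpair is standard on Unix and available on recent Windows Pythons):

```python
import socket

a, b = socket.socketpair()                    # two connected sockets, no server needed
wfile = a.makefile('w', encoding='utf-8')     # text-mode wrapper: handles encoding
rfile = b.makefile('r', encoding='utf-8')     # and line reads for us

wfile.write('Hello file world\n')             # plain str, not bytes
wfile.flush()                                 # push buffered text onto the socket
print(rfile.readline())                       # 'Hello file world\n'

wfile.close(); rfile.close(); a.close(); b.close()
```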
For our simpler scripts here, hardcoded byte strings and direct socket calls do the job. After talking with a given connected client, the server in Example 12-1 goes back to its infinite loop and waits for the next client connection request. Let’s move on to see what happened on the other side of the fence.
The actual socket-related calls in client programs like the one shown in Example 12-2 are even simpler; in fact, half of that script is preparation logic. The main thing to keep in mind is that the client and server must specify the same port number when opening their sockets and the client must identify the machine on which the server is running; in our scripts, server and client agree to use port number 50007 for their conversation, outside the standard protocol range. Here are the client’s socket calls:
sockobj = socket(AF_INET, SOCK_STREAM)
Creates a Python socket object in the client program, just like the server.
sockobj.connect((serverHost, serverPort))
Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the string name of the service to be contacted. In the client, we can either specify the name of the remote machine as a domain name (e.g., starship.python.net) or numeric IP address. We can also give the server name as localhost (or the equivalent IP address 127.0.0.1) to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client’s port number must match the server’s exactly. Note the nested parentheses again—just as in server bind calls, we really pass the server’s host/port address to connect in a tuple object.
Once the client establishes a connection to the server, it falls into a loop, sending a message one line at a time and printing whatever the server sends back after each line is sent:
sockobj.send(line)
Transfers the next byte-string message line to the server over the socket. Notice that the default list of lines contains bytes strings (b'...'). Just as on the server, data passed through the socket must be a byte string, though it can be the result of a manual str.encode encoding call or an object conversion with pickle or struct if desired. When lines to be sent are given as command-line arguments instead, they must be converted from str to bytes; the client arranges this by encoding in a generator expression (a call to map(str.encode, sys.argv[2:]) would have the same effect).
data = sockobj.recv(1024)
Reads the next reply line sent by the server program. Technically, this reads up to 1,024 bytes of the next reply message and returns it as a byte string.
sockobj.close()
Closes the connection with the server, sending it the end-of-file signal.
And that’s it. The server exchanges one or more lines of text with each client that connects. The operating system takes care of locating remote machines, routing bytes sent between programs and possibly across the Internet, and (with TCP) making sure that our messages arrive intact. That involves a lot of processing, too—our strings may ultimately travel around the world, crossing phone wires, satellite links, and more along the way. But we can be happily ignorant of what goes on beneath the socket call layer when programming in Python.
Let’s put this client and server to work. There are two ways to run these scripts—on either the same machine or two different machines. To run the client and the server on the same machine, bring up two command-line consoles on your computer, start the server program in one, and run the client repeatedly in the other. The server keeps running and responds to requests made each time you run the client script in the other window.
For instance, here is the text that shows up in the MS-DOS console window where I’ve started the server script:
C:\...\PP4E\Internet\Sockets> python echo-server.py
Server connected by ('127.0.0.1', 57666)
Server connected by ('127.0.0.1', 57667)
Server connected by ('127.0.0.1', 57668)
The output here gives the address (machine IP name and port number) of each connecting client. Like most servers, this one runs perpetually, listening for client connection requests. This server receives three, but I have to show you the client window’s text for you to understand what this means:
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost spam Spam SPAM
Client received: b'Echo=>spam'
Client received: b'Echo=>Spam'
Client received: b'Echo=>SPAM'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Shrubbery
Client received: b'Echo=>Shrubbery'
Here, I ran the client script three times, while the server script kept running in the other window. Each client connected to the server, sent it a message of one or more lines of text, and read back the server’s reply—an echo of each line of text sent from the client. And each time a client is run, a new connection message shows up in the server’s window (that’s why we got three). Because the server’s coded as an infinite loop, you may need to kill it with Task Manager on Windows when you’re done testing, because a Ctrl-C in the server’s console window is ignored; other platforms may fare better.
It’s important to notice that client and server are running on the same machine here (a Windows PC). The server and client agree on the port number, but they use the machine names "" and localhost, respectively, to refer to the computer on which they are running. In fact, there is no Internet connection to speak of. This is just IPC, of the sort we saw in Chapter 5: sockets also work well as cross-program communications tools on a single machine.
To make these scripts talk over the Internet rather than on a single machine and sample the broader scope of sockets, we have to do some extra work to run the server on a different computer. First, upload the server’s source file to a remote machine where you have an account and a Python. Here’s how I do it with FTP to a site that hosts a domain name of my own, learning-python.com. Most informational lines in the following have been removed; your server name and upload interface details will vary, and there are other ways to copy files to a computer (e.g., FTP client GUIs, email, web page post forms, and so on—see Tips on Using Remote Servers for hints on accessing remote servers):
C:\...\PP4E\Internet\Sockets> ftp learning-python.com
Connected to learning-python.com.
User (learning-python.com:(none)): xxxxxxxx
Password: yyyyyyyy
ftp> mkdir scripts
ftp> cd scripts
ftp> put echo-server.py
ftp> quit
Once you have the server program loaded on the other computer, you need to run it there. Connect to that computer and start the server program. I usually Telnet or SSH into my server machine and start the server program as a perpetually running process from the command line. The & syntax in Unix/Linux shells can be used to run the server script in the background; we could also make the server directly executable with a #! line and a chmod command (see Chapter 3 for details).
Here is the text that shows up in a window on my PC that is running an SSH session with the free PuTTY client, connected to the Linux server where my account is hosted (again, less a few deleted informational lines):
login as: xxxxxxxx
[email protected]'s password: yyyyyyyy
Last login: Fri Apr 23 07:46:33 2010 from 72.236.109.185
[...]$ cd scripts
[...]$ python echo-server.py &
[1] 23016
Now that the server is listening for connections on the Net, run the client on your local computer multiple times again. This time, the client runs on a different machine than the server, so we pass in the server’s domain or IP name as a client command-line argument. The server still uses a machine name of "" because it always listens on whatever machine it runs on. Here is what shows up in the remote learning-python.com server’s SSH window on my PC:
[...]$
Server connected by ('72.236.109.185', 57697)
Server connected by ('72.236.109.185', 57698)
Server connected by ('72.236.109.185', 57699)
Server connected by ('72.236.109.185', 57700)
And here is what appears in the Windows console window where I run the client. A “connected by” message appears in the server SSH window each time the client script is run in the client window:
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com ni Ni NI
Client received: b'Echo=>ni'
Client received: b'Echo=>Ni'
Client received: b'Echo=>NI'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Shrubbery
Client received: b'Echo=>Shrubbery'
The ping command can be used to get an IP address for a machine’s domain name; either machine name form can be used to connect in the client:
C:\...\PP4E\Internet\Sockets> ping learning-python.com
Pinging learning-python.com [97.74.215.115] with 32 bytes of data:
Reply from 97.74.215.115: bytes=32 time=94ms TTL=47
Ctrl-C

C:\...\PP4E\Internet\Sockets> python echo-client.py 97.74.215.115 Brave Sir Robin
Client received: b'Echo=>Brave'
Client received: b'Echo=>Sir'
Client received: b'Echo=>Robin'
This output is perhaps a bit understated—a lot is happening under the hood. The client, running on my Windows laptop, connects with and talks to the server program running on a Linux machine perhaps thousands of miles away. It all happens about as fast as when client and server both run on the laptop, and it uses the same library calls; only the server name passed to clients differs.
Though simple, this illustrates one of the major advantages of using sockets for cross-program communication: they naturally support running the conversing programs on different machines, with little or no change to the scripts themselves. In the process, sockets make it easy to decouple and distribute parts of a system over a network when needed.
Before we move on, there are three practical usage details you should know. First, you can run the client and server like this on any two Internet-aware machines where Python is installed. Of course, to run the client and server on different computers, you need both a live Internet connection and access to another machine on which to run the server.
This need not be an expensive proposition, though; when sockets are opened, Python is happy to initiate and use whatever connectivity you have, be it a dedicated T1 line, wireless router, cable modem, or dial-up account. Moreover, if you don’t have a server account of your own like the one I’m using on learning-python.com, simply run client and server examples on the same machine, localhost, as shown earlier; all you need then is a computer that allows sockets, and most do.
Second, the socket module generally raises exceptions if you ask for something invalid. For instance, trying to connect to a nonexistent server (or unreachable servers, if you have no Internet link) fails:
C:\...\PP4E\Internet\Sockets> python echo-client.py www.nonesuch.com hello
Traceback (most recent call last):
File "echo-client.py", line 24, in <module>
sockobj.connect((serverHost, serverPort)) # connect to server machine...
socket.error: [Errno 10060] A connection attempt failed because the connected
party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond
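Scripts that want to survive such failures can catch the exception; in Python 3.X, socket.error is an alias for the built-in OSError. A sketch using a reserved .invalid domain name that is guaranteed not to resolve:

```python
import socket

try:
    sockobj = socket.create_connection(('server.nonesuch.invalid', 50007), timeout=2)
except OSError as exc:                  # socket.error is OSError in Python 3.X
    print('Connection failed:', exc)    # DNS failure or timeout lands here
else:
    sockobj.close()
```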
Finally, also be sure to kill the server process before restarting it again, or else the port number will still be in use, and you’ll get another exception; on my remote server machine:
[...]$ ps -x
  PID TTY      STAT   TIME COMMAND
 5378 pts/0    S      0:00 python echo-server.py
22017 pts/0    Ss     0:00 -bash
26805 pts/0    R+     0:00 ps -x

[...]$ python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 14, in <module>
    sockobj.bind((myHost, myPort))         # bind it to server port number
socket.error: [Errno 10048] Only one usage of each socket address (protocol/
network address/port) is normally permitted
A series of Ctrl-Cs will kill the server on Linux (be sure to type fg to bring it to the foreground first if started with an &):
[...]$ fg
python echo-server.py
Traceback (most recent call last):
File "echo-server.py", line 18, in <module>
connection, address = sockobj.accept() # wait for next client connect
KeyboardInterrupt
As mentioned earlier, a Ctrl-C kill key combination won’t kill the server on my Windows 7 machine, however. To kill the perpetually running server process running locally on Windows, you may need to start Task Manager (e.g., using a Ctrl-Alt-Delete key combination), and then end the Python task by selecting it in the process listbox that appears. Closing the window in which the server is running will also suffice on Windows, but you’ll lose that window’s command history. You can also usually kill a server on Linux with a kill -9 pid shell command if it is running in another window or in the background, but Ctrl-C requires less typing.
So far, we’ve run a server locally and remotely, and run individual clients manually, one after another. Realistic servers are generally intended to handle many clients, of course, and possibly at the same time. To see how our echo server handles the load, let’s fire up eight copies of the client script in parallel using the script in Example 12-3; see the end of Chapter 5 for details on the launchmodes module used here to spawn clients, and alternatives such as the multiprocessing and subprocess modules.
import sys
from PP4E.launchmodes import QuietPortableLauncher

numclients = 8
def start(cmdline):
    QuietPortableLauncher(cmdline, cmdline)()

# start('echo-server.py')               # spawn server locally if not yet started

args = ' '.join(sys.argv[1:])           # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)   # spawn 8? clients to test the server
To run this script, pass no arguments to talk to a server listening on port 50007 on the local machine; pass a real machine name to talk to a server running remotely. Three console windows come into play in this scheme—the client, a local server, and a remote server. On Windows, the clients’ output is discarded when spawned from this script, but it would be similar to what we’ve already seen. Here’s the client window interaction—8 clients are spawned locally to talk to both a local and a remote server:
C:\...\PP4E\Internet\Sockets> set PYTHONPATH=C:\...\dev\Examples
C:\...\PP4E\Internet\Sockets> python testecho.py
C:\...\PP4E\Internet\Sockets> python testecho.py learning-python.com
If the spawned clients connect to a server run locally (the first run of the script on the client), connection messages show up in the server’s window on the local machine:
C:\...\PP4E\Internet\Sockets> python echo-server.py
Server connected by ('127.0.0.1', 57721)
Server connected by ('127.0.0.1', 57722)
Server connected by ('127.0.0.1', 57723)
Server connected by ('127.0.0.1', 57724)
Server connected by ('127.0.0.1', 57725)
Server connected by ('127.0.0.1', 57726)
Server connected by ('127.0.0.1', 57727)
Server connected by ('127.0.0.1', 57728)
If the server is running remotely, the client connection messages instead appear in the window displaying the SSH (or other) connection to the remote computer, here, learning-python.com:
[...]$ python echo-server.py
Server connected by ('72.236.109.185', 57729)
Server connected by ('72.236.109.185', 57730)
Server connected by ('72.236.109.185', 57731)
Server connected by ('72.236.109.185', 57732)
Server connected by ('72.236.109.185', 57733)
Server connected by ('72.236.109.185', 57734)
Server connected by ('72.236.109.185', 57735)
Server connected by ('72.236.109.185', 57736)
The net effect is that our echo server converses with multiple clients, whether running locally or remotely. Keep in mind, however, that this works for our simple scripts only because the server doesn’t take a long time to respond to each client’s requests—it can get back to the top of the server script’s outer while loop in time to process the next incoming client. If it could not, we would probably need to change the server to handle each client in parallel, or some might be denied a connection.
Technically, client connections would fail after 5 clients are already waiting for the server’s attention, as specified in the server’s listen call. To prove this to yourself, add a time.sleep call somewhere inside the echo server’s main loop in Example 12-1 after a connection is accepted, to simulate a long-running task (this is from file echo-server-sleep.py in the examples package if you wish to experiment):
while True:                                    # listen until process killed
    connection, address = sockobj.accept()     # wait for next client connect
    while True:
        data = connection.recv(1024)           # read next line on client socket
        time.sleep(3)                          # take time to process request
        ...
If you then run this server and the testecho clients script, you’ll notice that not all 8 clients wind up receiving a connection, because the server is too busy to empty its pending-connections queue in time. Only 6 clients are served when I run this on Windows—one accepted initially, and 5 in the pending-requests listen queue. The other two clients are denied connections and fail.
The following shows the server and client messages produced when the server is stalled this way, including the error messages that the two denied clients receive. To see the clients’ messages on Windows, you can change testecho to use the StartArgs launcher with a /B switch at the front of the command line to route messages to the persistent console window (see file testecho-messages.py in the examples package):
C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> echo-server-sleep.py
Server connected by ('127.0.0.1', 59625)
Server connected by ('127.0.0.1', 59626)
Server connected by ('127.0.0.1', 59627)
Server connected by ('127.0.0.1', 59628)
Server connected by ('127.0.0.1', 59629)
Server connected by ('127.0.0.1', 59630)

C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> testecho-messages.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
/B echo-client.py
Client received: b'Echo=>Hello network world'
Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort))   # connect to server machine...
socket.error: [Errno 10061] No connection could be made because the target
machine actively refused it
Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort))   # connect to server machine...
socket.error: [Errno 10061] No connection could be made because the target
machine actively refused it
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
Client received: b'Echo=>Hello network world'
As you can see, with such a sleepy server, 8 clients are spawned, but only 6 receive service, and 2 fail with exceptions. Unless clients require very little of the server’s attention, to handle multiple requests overlapping in time we need to somehow service clients in parallel. We’ll see how servers can handle multiple clients more robustly in a moment; first, though, let’s experiment with some special ports.
It’s also important to know that this client and server engage in a proprietary sort of discussion, and so use the port number 50007 outside the range reserved for standard protocols (0 to 1023). There’s nothing preventing a client from opening a socket on one of these special ports, however. For instance, the following client-side code connects to programs listening on the standard email, FTP, and HTTP web server ports on three different server machines:
C:\...\PP4E\Internet\Sockets> python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('pop.secureserver.net', 110))    # talk to POP email server
>>> print(sock.recv(70))
b'+OK <[email protected]> '
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('learning-python.com', 21))      # talk to FTP server
>>> print(sock.recv(70))
b'220---------- Welcome to Pure-FTPd [privsep] [TLS] ---------- 220-You'
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('www.python.net', 80))           # talk to Python's HTTP server
>>> sock.send(b'GET /\r\n')                        # fetch root page reply
7
>>> sock.recv(70)
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://'
>>> sock.recv(70)
b'www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.'
If we know how to interpret the output returned by these ports’ servers, we could use raw sockets like this to fetch email, transfer files, and grab web pages and invoke server-side scripts. Fortunately, though, we don’t have to worry about all the underlying details—Python’s poplib, ftplib, http.client, and urllib.request modules provide higher-level interfaces for talking to servers on these ports. Other Python protocol modules do the same for other standard ports (e.g., NNTP, Telnet, and so on). We’ll meet some of these client-side protocol modules in the next chapter.[45]
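As a taste of the higher-level route, here is the same sort of HTTP exchange as the raw-socket session above, but coded with http.client. To keep the sketch self-contained rather than depending on a remote site, it talks to a throwaway local server started in a thread with Python's own http.server module:

```python
import http.client, http.server, threading

# serve the current directory on an ephemeral local port, in a thread
server = http.server.HTTPServer(('127.0.0.1', 0),
                                http.server.SimpleHTTPRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client builds the GET request and parses the reply for us
conn = http.client.HTTPConnection('127.0.0.1', port)
conn.request('GET', '/')
reply = conn.getresponse()
print(reply.status)                     # 200: a directory listing came back
body = reply.read()
conn.close()
server.shutdown()
```

Compare this with the raw `sock.send(b'GET /\r\n')` session: the module composes a proper request line and headers, and hands back status, headers, and body as separate objects.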
Speaking of reserved ports, it’s all right to open client-side connections on reserved ports as in the prior section, but you can’t install your own server-side scripts for these ports unless you have special permission. On the server I use to host learning-python.com, for instance, the web server port 80 is off limits (presumably, unless I shell out for a virtual or dedicated hosting account):
[...]$ python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)    # try to bind web port on general server
>>> sock.bind(('', 80))                    # learning-python.com is a shared machine
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in bind
socket.error: (13, 'Permission denied')
Even if run by a user with the required permission, you’ll get the different exception we saw earlier if the port is already being used by a real web server. On computers being used as general servers, these ports really are reserved. This is one reason we’ll run a web server of our own locally for testing when we start writing server-side scripts later in this book—the following code works on a Windows PC, which allows us to experiment with websites locally, on a self-contained machine:
C:\...\PP4E\Internet\Sockets> python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)    # can bind port 80 on Windows
>>> sock.bind(('', 80))                    # allows running server on localhost
>>>
We’ll learn more about installing web servers later in Chapter 15. For the purposes of this chapter, we need to get realistic about how our socket servers handle their clients.
The echo client and server programs shown previously serve to illustrate socket fundamentals. But the server model used suffers from a fairly major flaw. As described earlier, if multiple clients try to connect to the server, and it takes a long time to process a given client’s request, the server will fail. More accurately, if the cost of handling a given request prevents the server from returning to the code that checks for new clients in a timely manner, it won’t be able to keep up with all the requests, and some clients will eventually be denied connections.
In real-world client/server programs, it’s far more typical to code a server so as to avoid blocking new requests while handling a current client’s request. Perhaps the easiest way to do so is to service each client’s request in parallel—in a new process, in a new thread, or by manually switching (multiplexing) between clients in an event loop. This isn’t a socket issue per se, and we already learned how to start processes and threads in Chapter 5. But since these schemes are so typical of socket server programming, let’s explore all three ways to handle client requests in parallel here.
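For a first look at one of these schemes, here is a rough sketch of a thread-per-client echo server; it is not one of this chapter's numbered examples, and it compresses the echo protocol to its bare minimum, but it shows how the accept loop stays free to greet new clients while handlers run in parallel (the ephemeral port and loopback self-test exist only for the demo):

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM, create_connection

def handleClient(connection):                 # per-client thread: echo back
    while True:
        data = connection.recv(1024)          # till eof when socket closed
        if not data: break
        connection.send(b'Echo=>' + data)
    connection.close()

def dispatcher(sockobj):                      # accept loop: one thread per
    while True:                               # client, then resume at once
        connection, address = sockobj.accept()
        threading.Thread(target=handleClient, args=(connection,),
                         daemon=True).start()

# self-test: serve on an ephemeral loopback port, run one client
sockobj = socket(AF_INET, SOCK_STREAM)
sockobj.bind(('127.0.0.1', 0))                # port 0 = any free port
sockobj.listen(5)
port = sockobj.getsockname()[1]
threading.Thread(target=dispatcher, args=(sockobj,), daemon=True).start()

client = create_connection(('127.0.0.1', port))
client.send(b'Hello network world')
print(client.recv(1024))                      # b'Echo=>Hello network world'
client.close()
```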
The script in Example 12-4 works like the original echo server, but instead forks a new process to handle each new client connection. Because the handleClient function runs in a new process, the dispatcher function can immediately resume its main loop in order to detect and service a new incoming request.
"""
Server side: open a socket on a port, listen for a message from a client,
and send an echo reply; forks a process to handle each client connection;
child processes share parent's socket descriptors; fork is less portable
than threads--not yet on Windows, unless Cygwin or similar installed;
"""

import os, time, sys
from socket import *                       # get socket constructor and constants
myHost = ''                                # server machine, '' means local host
myPort = 50007                             # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)     # make a TCP socket object
sockobj.bind((myHost, myPort))             # bind it to server port number
sockobj.listen(5)                          # allow 5 pending connects

def now():                                 # current time on server
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                        # reap any dead child processes
    while activeChildren:                  # else may fill up system table
        pid, stat = os.waitpid(0, os.WNOHANG)  # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):              # child process: reply, exit
    time.sleep(5)                          # simulate a blocking activity
    while True:                            # read, write a client socket
        data = connection.recv(1024)       # till eof when socket closed
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                          # listen until process killed
    while True:                            # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        reapChildren()                     # clean up exited children now
        childPid = os.fork()               # copy this process
        if childPid == 0:                  # if in child process: handle
            handleClient(connection)
        else:                              # else: go accept next connect
            activeChildren.append(childPid)    # add to active child pid list

dispatcher()
Parts of this script are a bit tricky, and most of its library calls work only on Unix-like platforms. Crucially, it runs on Cygwin Python on Windows, but not standard Windows Python. Before we get into too many forking details, though, let’s focus on how this server arranges to handle multiple client requests.
First, notice that to simulate a long-running operation (e.g., database updates, other network traffic), this server adds a five-second time.sleep delay in its client handler function, handleClient. After the delay, the original echo reply action is performed. That means that when we run a server and clients this time, clients won’t receive the echo reply until five seconds after they’ve sent their requests to the server.
To help keep track of requests and replies, the server prints its system time each time a client connect request is received, and adds its system time to the reply. Clients print the reply time sent back from the server, not their own—clocks on the server and client may differ radically, so to compare apples to apples, all times are server times. Because of the simulated delays, we also must usually start each client in its own console window on Windows (clients will hang in a blocked state while waiting for their reply).
But the grander story here is that this script runs one main parent process on the server machine, which does nothing but watch for connections (in dispatcher), plus one child process per active client connection, running in parallel with both the main parent process and the other client processes (in handleClient). In principle, the server can handle any number of clients without bogging down.
To test, let’s first start the server remotely in an SSH or Telnet window, and start three clients locally in three distinct console windows. As we’ll see in a moment, this server can also be run under Cygwin locally if you have Cygwin but don’t have a remote server account like the one on learning-python.com used here:
[server window (SSH or Telnet)]
[...]$ uname -p -o
i686 GNU/Linux
[...]$ python fork-server.py
Server connected by ('72.236.109.185', 58395) at Sat Apr 24 06:46:45 2010
Server connected by ('72.236.109.185', 58396) at Sat Apr 24 06:46:49 2010
Server connected by ('72.236.109.185', 58397) at Sat Apr 24 06:46:51 2010

[client window 1]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 06:46:50 2010"

[client window 2]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce
Client received: b"Echo=>b'Bruce' at Sat Apr 24 06:46:54 2010"

[client window 3]
C:\...\Sockets> python echo-client.py learning-python.com The Meaning of Life
Client received: b"Echo=>b'The' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'Meaning' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'of' at Sat Apr 24 06:46:56 2010"
Client received: b"Echo=>b'Life' at Sat Apr 24 06:46:57 2010"
Again, all times here are on the server machine. This may be a little confusing because four windows are involved. In plain English, the test proceeds as follows:
The server starts running remotely.
All three clients are started and connect to the server a few seconds apart.
On the server, the client requests trigger three forked child processes, which all immediately go to sleep for five seconds (to simulate being busy doing something useful).
Each client waits until the server replies, which happens five seconds after their initial requests.
In other words, clients are serviced at the same time by forked processes, while the main parent process continues listening for new client requests. If clients were not handled in parallel like this, no client could connect until the currently connected client’s five-second delay expired.
In a more realistic application, that delay could be fatal if many clients were trying to connect at once—the server would be stuck in the action we’re simulating with time.sleep, and not get back to the main loop to accept new client requests. With process forks per request, clients can be serviced in parallel.
Notice that we’re using the same client script here (echo-client.py, from Example 12-2), just a different server; clients simply send and receive data to a machine and port and don’t care how their requests are handled on the server. The result displayed shows a byte string within a byte string, because the client sends one to the server and the server sends one back; because the server uses string formatting and manual encoding instead of byte string concatenation, the client’s message is shown as byte string explicitly here.
Also note that the server is running remotely on a Linux machine in the preceding section. As we learned in Chapter 5, the fork call is not supported on Windows in standard Python at the time this book was written. It does run on Cygwin Python, though, which allows us to start this server locally on localhost, on the same machine as its clients:
[Cygwin shell window]
[C:\...\PP4E\Internet\Sockets]$ python fork-server.py
Server connected by ('127.0.0.1', 58258) at Sat Apr 24 07:50:15 2010
Server connected by ('127.0.0.1', 58259) at Sat Apr 24 07:50:17 2010

[Windows console, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py localhost bright side of life
Client received: b"Echo=>b'bright' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'side' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'of' at Sat Apr 24 07:50:20 2010"
Client received: b"Echo=>b'life' at Sat Apr 24 07:50:20 2010"

[Windows console, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:50:22 2010"
We can also run this test on the remote Linux server entirely, with two SSH or Telnet windows. It works about the same as when clients are started locally, in a DOS console window, but here “local” actually means a remote machine you’re using locally. Just for fun, let’s also contact the remote server from a locally running client to show how the server is also available to the Internet at large—when servers are coded with sockets and forks this way, clients can connect from arbitrary machines, and can overlap arbitrarily in time:
[one SSH (or Telnet) window]
[...]$ python fork-server.py
Server connected by ('127.0.0.1', 55743) at Sat Apr 24 07:15:14 2010
Server connected by ('127.0.0.1', 55854) at Sat Apr 24 07:15:26 2010
Server connected by ('127.0.0.1', 55950) at Sat Apr 24 07:15:36 2010
Server connected by ('72.236.109.185', 58414) at Sat Apr 24 07:19:50 2010

[another SSH window, same machine]
[...]$ python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:15:19 2010"
[...]$ python echo-client.py localhost niNiNI!
Client received: b"Echo=>b'niNiNI!' at Sat Apr 24 07:15:31 2010"
[...]$ python echo-client.py localhost Say no more!
Client received: b"Echo=>b'Say' at Sat Apr 24 07:15:41 2010"
Client received: b"Echo=>b'no' at Sat Apr 24 07:15:41 2010"
Client received: b"Echo=>b'more!' at Sat Apr 24 07:15:41 2010"

[Windows console, local machine]
C:\...\Internet\Sockets> python echo-client.py learning-python.com Blue, no yellow!
Client received: b"Echo=>b'Blue,' at Sat Apr 24 07:19:55 2010"
Client received: b"Echo=>b'no' at Sat Apr 24 07:19:55 2010"
Client received: b"Echo=>b'yellow!' at Sat Apr 24 07:19:55 2010"
Now that we have a handle on the basic model, let’s move on to the tricky bits. This server script is fairly straightforward as forking code goes, but a few words about the library tools it employs are in order.
We met os.fork in Chapter 5, but recall that forked processes are essentially a copy of the process that forks them, and so they inherit file and socket descriptors from their parent process. As a result, the new child process that runs the handleClient function has access to the connection socket created in the parent process. Really, this is why the child process works at all—when conversing on the connected socket, it’s using the same socket that the parent’s accept call returns. Programs know they are in a forked child process if the fork call returns 0; otherwise, the original parent process gets back the new child’s ID.
In earlier fork examples, child processes usually call one of the exec variants to start a new program in the child process. Here, instead, the child process simply calls a function in the same program and exits with os._exit. It’s imperative to call os._exit here—if we did not, each child would live on after handleClient returns, and compete for accepting new client requests.
In fact, without the exit call, we’d wind up with as many perpetual server processes as requests served—remove the exit call and do a ps shell command after running a few clients, and you’ll see what I mean. With the call, only the single parent process listens for new requests. os._exit is like sys.exit, but it exits the calling process immediately without cleanup actions. It’s normally used only in child processes, and sys.exit is used everywhere else.
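The fork-and-_exit pattern can be watched in isolation with a few lines like the following, a sketch separate from the server listings (Unix-like platforms only; the exit code 42 is arbitrary):

```python
import os

childPid = os.fork()                      # copy this process
if childPid == 0:                         # fork returns 0 in the child
    print('child: handling one request')
    os._exit(42)                          # leave immediately, no cleanup
else:                                     # parent gets the new child's ID
    pid, status = os.waitpid(childPid, 0)          # block until child exits
    print('parent: child', pid, 'exited with', os.WEXITSTATUS(status))
```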
Note, however, that it’s not quite enough to make sure that child processes exit and die. On systems like Linux, though not on Cygwin, parents must also be sure to issue a wait system call to remove the entries for dead child processes from the system’s process table. If we don’t do this, the child processes will no longer run, but they will consume an entry in the system process table. For long-running servers, these bogus entries may become problematic.
It’s common to call such dead-but-listed child processes zombies: they continue to use system resources even though they’ve already passed over to the great operating system beyond. To clean up after child processes are gone, this server keeps a list, activeChildren, of the process IDs of all child processes it spawns. Whenever a new incoming client request is received, the server runs its reapChildren function to issue a wait for any dead children, by issuing the standard Python os.waitpid(0, os.WNOHANG) call.
The os.waitpid call attempts to wait for a child process to exit and returns its process ID and exit status. With a 0 for its first argument, it waits for any child process. With the WNOHANG parameter for its second, it does nothing if no child process has exited (i.e., it does not block or pause the caller). The net effect is that this call simply asks the operating system for the process ID of any child that has exited. If any have, the process ID returned is removed both from the system process table and from this script’s activeChildren list.
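The nonblocking flavor of the call can be seen on its own in a sketch like this (Unix-like platforms only; the short polling loop stands in for the server's per-connection reapChildren pass):

```python
import os, time

childPid = os.fork()
if childPid == 0:                          # child: exit right away
    os._exit(0)

pid = 0
while pid == 0:                            # WNOHANG: returns pid 0 until the
    pid, status = os.waitpid(0, os.WNOHANG)    # child is ready to be reaped
    time.sleep(0.01)
print('reaped child', pid)                 # no zombie remains after this
```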
To see why all this complexity is needed, comment out the reapChildren call in this script, run it on a platform where this is an issue, and then run a few clients. On my Linux server, a ps -f full process listing command shows that all the dead child processes stay in the system process table (shown as <defunct>):
[...]$ ps -f
UID PID PPID C STIME TTY TIME CMD
5693094 9990 30778 0 04:34 pts/0 00:00:00 python fork-server.py
5693094 10844 9990 0 04:35 pts/0 00:00:00 [python] <defunct>
5693094 10869 9990 0 04:35 pts/0 00:00:00 [python] <defunct>
5693094 11130 9990 0 04:36 pts/0 00:00:00 [python] <defunct>
5693094 11151 9990 0 04:36 pts/0 00:00:00 [python] <defunct>
5693094 11482 30778 0 04:36 pts/0 00:00:00 ps -f
5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash
When the reapChildren command is reactivated, dead child zombie entries are cleaned up each time the server gets a new client connection request, by calling the Python os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but they will remain only until the next client connection is received (you get only as many zombies as processes served in parallel since the last accept):
[...]$ python fork-server.py &
[1] 20515
[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  20515  30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  20777  30778  0 04:43 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
[...]$ Server connected by ('72.236.109.185', 58672) at Sun Apr 25 04:43:51 2010
Server connected by ('72.236.109.185', 58673) at Sun Apr 25 04:43:54 2010

[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  20515  30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  21339  20515  0 04:43 pts/0    00:00:00 [python] <defunct>
5693094  21398  20515  0 04:43 pts/0    00:00:00 [python] <defunct>
5693094  21573  30778  0 04:44 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
[...]$ Server connected by ('72.236.109.185', 58674) at Sun Apr 25 04:44:07 2010

[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  20515  30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  21646  20515  0 04:44 pts/0    00:00:00 [python] <defunct>
5693094  21813  30778  0 04:44 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
In fact, if you type fast enough, you can actually see a child process morph from a real running program into a zombie. Here, for example, a child spawned to handle a new request changes to <defunct> on exit. Its connection cleans up lingering zombies, and its own process entry will be removed completely when the next request is received:
[...]$ Server connected by ('72.236.109.185', 58676) at Sun Apr 25 04:48:22 2010
[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  20515  30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  27120  20515  0 04:48 pts/0    00:00:00 python fork-server.py
5693094  27174  30778  0 04:48 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  20515  30778  0 04:43 pts/0    00:00:00 python fork-server.py
5693094  27120  20515  0 04:48 pts/0    00:00:00 [python] <defunct>
5693094  27234  30778  0 04:48 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
On some systems, it’s also possible to clean up zombie child processes by resetting the signal handler for the SIGCHLD signal delivered to a parent process by the operating system when a child process stops or exits. If a Python script assigns the SIG_IGN (ignore) action as the SIGCHLD signal handler, zombies will be removed automatically and immediately by the operating system as child processes exit; the parent need not issue wait calls to clean up after them. Because of that, this scheme is a simpler alternative to manually reaping zombies on platforms where it is supported.
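The effect is easy to see in a few lines on Linux: with SIGCHLD set to SIG_IGN, an exited child is reaped by the kernel automatically, so a later waitpid finds no child left to wait for (ChildProcessError is the Python 3.3 and later spelling of the ECHILD error this raises):

```python
import os, time, signal

signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # let the kernel auto-reap

childPid = os.fork()
if childPid == 0:
    os._exit(0)                                 # child exits immediately

time.sleep(0.5)                                 # give the kernel time to reap
autoReaped = False
try:
    os.waitpid(childPid, 0)                     # nothing to reap: ECHILD
except ChildProcessError:
    autoReaped = True
print('auto-reaped:', autoReaped)
```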
If you’ve already read Chapter 5, you know that Python’s standard signal module lets scripts install handlers for signals—software-generated events. By way of review, here is a brief bit of background to show how this pans out for zombies. The program in Example 12-5 installs a Python-coded signal handler function to respond to whatever signal number you type on the command line.
"""
Demo Python's signal module; pass signal number as a command-line arg, and use
a "kill -N pid" shell command to send this process a signal; on my Linux
machine, SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and SIGCHLD handler stays in
effect even if not restored: all other handlers are restored by Python after
caught, but SIGCHLD behavior is left to the platform's implementation; signal
works on Windows too, but defines only a few signal types; signals are not
very portable in general;
"""

import sys, signal, time

def now():
    return time.asctime()

def onSignal(signum, stackframe):               # Python signal handler
    print('Got signal', signum, 'at', now())    # most handlers stay in effect
    if signum == signal.SIGCHLD:                # but sigchld handler is not
        print('sigchld caught')
        #signal.signal(signal.SIGCHLD, onSignal)

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                 # install signal handler
while True: signal.pause()                      # sleep waiting for signals
To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line; this is the shell’s equivalent of Python’s os.kill function, available on Unix-like platforms only. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal numbers 10 (reserved for general use) and 9 (the unavoidable terminate signal):
[...]$ python signal-demo.py 10 &
[1] 10141
[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  10141  30778  0 05:00 pts/0    00:00:00 python signal-demo.py 10
5693094  10228  30778  0 05:00 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
[...]$ kill -10 10141
Got signal 10 at Sun Apr 25 05:00:31 2010
[...]$ kill -10 10141
Got signal 10 at Sun Apr 25 05:00:34 2010
[...]$ kill -9 10141
[1]+  Killed                  python signal-demo.py 10
And in the following, the script catches signal 17, which happens to be SIGCHLD on my Linux server. Signal numbers vary from machine to machine, so you should normally use their names, not their numbers. SIGCHLD behavior may vary per platform as well. On my Cygwin install, for example, signal 10 can have a different meaning, and signal 20 is SIGCHLD—on Cygwin, the script works as shown on Linux here for signal 10, but generates an exception if it tries to install a handler for signal 17 (and Cygwin doesn’t require reaping in any event). See the signal module’s library manual entry for more details:
[...]$ python signal-demo.py 17 &
[1] 11592
[...]$ ps -f
UID        PID   PPID  C STIME TTY      TIME     CMD
5693094  11592  30778  0 05:00 pts/0    00:00:00 python signal-demo.py 17
5693094  11728  30778  0 05:01 pts/0    00:00:00 ps -f
5693094  30778  30772  0 04:23 pts/0    00:00:00 -bash
[...]$ kill -17 11592
Got signal 17 at Sun Apr 25 05:01:28 2010
sigchld caught
[...]$ kill -17 11592
Got signal 17 at Sun Apr 25 05:01:35 2010
sigchld caught
[...]$ kill -9 11592
[1]+  Killed                  python signal-demo.py 17
Now, to apply all of this signal knowledge to killing zombies, simply set the SIGCHLD signal handler to the SIG_IGN ignore handler action; on systems where this assignment is supported, child processes will be cleaned up when they exit. The forking server variant shown in Example 12-6 uses this trick to manage its children.
"""
Same as fork-server.py, but use the Python signal module to avoid keeping
child zombie processes after they terminate, instead of an explicit reaper
loop before each new connection; SIG_IGN means ignore, and may not work with
SIG_CHLD child exit signal on all platforms; see Linux documentation for more
about the restartability of a socket.accept call interrupted with a signal;
"""
import os, time, sys, signal, signal
from socket import * # get socket constructor and constants
myHost = '' # server machine, '' means local host
myPort = 50007 # listen on a non-reserved port number
sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP socket object
sockobj.bind((myHost, myPort)) # bind it to server port number
sockobj.listen(5) # up to 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
# avoid child zombie processes
def now(): # time on server machine
return time.ctime(time.time())
def handleClient(connection): # child process replies, exits
time.sleep(5) # simulate a blocking activity
while True: # read, write a client socket
data = connection.recv(1024)
if not data: break
reply = 'Echo=>%s at %s' % (data, now())
connection.send(reply.encode())
connection.close()
os._exit(0)
def dispatcher(): # listen until process killed
while True: # wait for next connection,
connection, address = sockobj.accept() # pass to process for service
print('Server connected by', address, end=' ')
print('at', now())
childPid = os.fork() # copy this process
if childPid == 0: # if in child process: handle
handleClient(connection) # else: go accept next connect
dispatcher()
Where applicable, this technique is:
Much simpler; we don’t need to manually track or reap child processes.
More accurate; it leaves no zombies temporarily between client requests.
In fact, only one line is dedicated to handling zombies
here: the signal.signal
call
near the top, to set the handler. Unfortunately, this version is
also even less portable than using os.fork
in the first place, because
signals may work slightly differently from platform to platform,
even among Unix variants. For instance, some Unix platforms may
not allow SIG_IGN
to be used as
the SIGCHLD
action at all. On
Linux systems, though, this simpler forking server variant works
like a charm:
[...]$ python fork-server-signal.py &
[1] 3837
Server connected by ('72.236.109.185', 58817) at Sun Apr 25 08:11:12 2010
[...]$ ps -f
UID       PID  PPID  C STIME TTY      TIME     CMD
5693094  3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094  4378  3837  0 08:11 pts/0    00:00:00 python fork-server-signal.py
5693094  4413 30778  0 08:11 pts/0    00:00:00 ps -f
5693094 30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ ps -f
UID       PID  PPID  C STIME TTY      TIME     CMD
5693094  3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094  4584 30778  0 08:11 pts/0    00:00:00 ps -f
5693094 30778 30772  0 04:23 pts/0    00:00:00 -bash
Notice how in this version the child process’s entry goes away as soon as it exits, even before a new client request is received; no “defunct” zombie ever appears. More dramatically, if we now start up the script we wrote earlier that spawns eight clients in parallel (testecho.py) to talk to this server remotely, all appear on the server while running, but are removed immediately as they exit:
[client window]
C:\...\PP4E\Internet\Sockets> testecho.py learning-python.com

[server window]
[...]$ Server connected by ('72.236.109.185', 58829) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58830) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58831) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58832) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58833) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58834) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58835) at Sun Apr 25 08:16:34 2010
Server connected by ('72.236.109.185', 58836) at Sun Apr 25 08:16:34 2010

[...]$ ps -f
UID       PID  PPID  C STIME TTY      TIME     CMD
5693094  3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094  9666  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9667  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9668  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9670  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9674  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9678  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9681  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9682  3837  0 08:16 pts/0    00:00:00 python fork-server-signal.py
5693094  9722 30778  0 08:16 pts/0    00:00:00 ps -f
5693094 30778 30772  0 04:23 pts/0    00:00:00 -bash

[...]$ ps -f
UID       PID  PPID  C STIME TTY      TIME     CMD
5693094  3837 30778  0 08:10 pts/0    00:00:00 python fork-server-signal.py
5693094 10045 30778  0 08:16 pts/0    00:00:00 ps -f
5693094 30778 30772  0 04:23 pts/0    00:00:00 -bash
And now that I’ve shown you how to use signal handling to reap children automatically on Linux, I should underscore that this technique is not universally supported across all flavors of Unix. If you care about portability, manually reaping children as we did in Example 12-4 may still be desirable.
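For contrast, the portable approach polls for exited children with os.waitpid before each new request. The following is only a sketch of the reaper logic, modeled on the chapter's earlier forking server, not a verbatim copy of Example 12-4:

```python
import os

def reapChildren():
    # poll for exited child processes and reap them without blocking;
    # a forking server calls this in its dispatcher loop before each accept
    while True:
        try:
            pid, status = os.waitpid(0, os.WNOHANG)   # 0: any child; don't hang
        except OSError:                               # no children exist at all
            break
        if pid == 0:                                  # children exist, none exited
            break
```

os.waitpid returns (0, 0) when children exist but none have exited yet, and raises OSError when there are no children left to wait on; both cases end the polling loop.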
In Chapter 5, we learned about Python’s new multiprocessing
module. As we saw, it
provides a way to start function calls in new processes that is
more portable than the os.fork
call used in this section’s server code, and it runs processes
instead of threads to work around the thread GIL in some
scenarios. In particular, multiprocessing
works on standard
Windows Python too, unlike direct os.fork
calls.
I experimented with a server variant based upon this module to see if its portability might help for socket servers. Its full source code is in the examples package in file multi-server.py, but here are its important bits that differ:
...rest unchanged from fork-server.py...

from multiprocessing import Process

def handleClient(connection):
    print('Child:', os.getpid())            # child process: reply, exit
    time.sleep(5)                           # simulate a blocking activity
    while True:                             # read, write a client socket
        data = connection.recv(1024)        # till eof when socket closed
        ...rest unchanged...

def dispatcher():                           # listen until process killed
    while True:                             # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        Process(target=handleClient, args=(connection,)).start()

if __name__ == '__main__':
    print('Parent:', os.getpid())
    sockobj = socket(AF_INET, SOCK_STREAM)  # make a TCP socket object
    sockobj.bind((myHost, myPort))          # bind it to server port number
    sockobj.listen(5)                       # allow 5 pending connects
    dispatcher()
This server variant is noticeably simpler too. Like the
forking server it’s derived from, this server works fine under
Cygwin Python on Windows running as localhost, and would probably work on
other Unix-like platforms as well, because multiprocessing
forks a process on such
systems, and file and socket descriptors are inherited by child
processes as usual. Hence, the child process uses the same
connected socket as the parent. Here’s the scene in a Cygwin
server window and two Windows client windows:
[server window]
[C:\...\PP4E\Internet\Sockets]$ python multi-server.py
Parent: 8388
Server connected by ('127.0.0.1', 58271) at Sat Apr 24 08:13:27 2010
Child: 8144
Server connected by ('127.0.0.1', 58272) at Sat Apr 24 08:13:29 2010
Child: 8036

[two client windows]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sat Apr 24 08:13:33 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin
Client received: b"Echo=>b'Brave' at Sat Apr 24 08:13:35 2010"
Client received: b"Echo=>b'Sir' at Sat Apr 24 08:13:35 2010"
Client received: b"Echo=>b'Robin' at Sat Apr 24 08:13:35 2010"
However, this server does not work on
standard Windows Python—the whole point of trying to use multiprocessing
in this context—because
open sockets are not correctly pickled when passed as arguments
into the new process. Here’s what occurs in the server windows on
Windows 7 with Python 3.1:
C:\...\PP4E\Internet\Sockets> python multi-server.py
Parent: 9140
Server connected by ('127.0.0.1', 58276) at Sat Apr 24 08:17:41 2010
Child: 9628
Process Process-1:
Traceback (most recent call last):
File "C:Python31libmultiprocessingprocess.py", line 233, in _bootstrap
self.run()
File "C:Python31libmultiprocessingprocess.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "C:...PP4EInternetSocketsmulti-server.py", line 38, in handleClient
data = connection.recv(1024) # till eof when socket closed
socket.error: [Errno 10038] An operation was attempted on something that is not
a socket
Recall from Chapter 5 that on
Windows multiprocessing
passes
context to a new Python
interpreter process by pickling it, and that Process
arguments must all be pickleable
for Windows. Sockets in Python 3.1 don’t trigger errors when
pickled thanks to the class they are an instance of, but they are
not really pickled correctly:
>>> from pickle import *
>>> from socket import *
>>> s = socket()
>>> x = dumps(s)
>>> s
<socket.socket object, fd=180, family=2, type=1, proto=0>
>>> loads(x)
<socket.socket object, fd=-1, family=0, type=0, proto=0>
>>> x
b'\x80\x03csocket\nsocket\nq\x00)\x81q\x01N}q\x02(X\x08\x00\x00\x00_io_refsq\x03K\x00X\x07\x00\x00\x00_closedq\x04\x89u\x86q\x05b.'
As we saw in Chapter 5,
multiprocessing
has other IPC
tools such as its own pipes and queues that might be used instead
of sockets to work around this issue, but clients would then have
to use them, too—the resulting server would not be as broadly
accessible as one based upon general Internet sockets.
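For reference, here is roughly what a multiprocessing pipe dialog looks like. This is a hypothetical sketch, not one of the book's examples; note that unlike a socket server, both ends of the conversation must be processes spawned by the same program:

```python
import os
from multiprocessing import Process, Pipe

def handleClient(conn):
    # child process: read one request object, send an echo reply
    data = conn.recv()                         # pipes transfer pickled objects
    conn.send('Echo=>%s from pid %s' % (data, os.getpid()))
    conn.close()

def demo():
    parentEnd, childEnd = Pipe()               # two connected Connection objects
    child = Process(target=handleClient, args=(childEnd,))
    child.start()                              # pipe ends pickle correctly,
    parentEnd.send('Hello pipe world')         # unlike connected sockets
    reply = parentEnd.recv()
    child.join()
    return reply

if __name__ == '__main__':
    print(demo())
```

Because Pipe objects are designed for process transfer, they survive the pickling step that breaks sockets on Windows; the tradeoff is that arbitrary Internet clients cannot connect to them.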
Even if multiprocessing
did work on Windows, though, its need to start a new Python
interpreter would likely make it much slower than the more
traditional technique of spawning threads to talk to clients.
Coincidentally, that brings us to our next topic.
The forking model just described works well on Unix-like platforms in general, but it suffers from some potentially significant limitations:
On some machines, starting a new process can be fairly expensive in terms of time and space resources.
Forking processes is a Unix technique; as we’ve learned,
the os.fork
call currently
doesn’t work on non-Unix platforms such as Windows under
standard Python. As we’ve also learned, forks can be used in
the Cygwin version of Python on Windows, but they may be
inefficient and not exactly the same as Unix forks. And as we
just discovered, multiprocessing
won’t help on
Windows, because connected sockets are not pickleable across
process boundaries.
If you think that forking servers can be complicated, you’re not alone. As we just saw, forking also brings with it all the shenanigans of managing and reaping zombies—cleaning up after child processes that live shorter lives than their parents.
If you read Chapter 5, you know that one solution to all of these dilemmas is to use threads rather than processes. Threads run in parallel and share global (i.e., module and interpreter) memory.
Because threads all run in the same process and memory space, they automatically share sockets passed between them, similar in spirit to the way that child processes inherit socket descriptors. Unlike processes, though, threads are usually less expensive to start, and work on both Unix-like machines and Windows under standard Python today. Furthermore, many (though not all) see threads as simpler to program—child threads die silently on exit, without leaving behind zombies to haunt the server.
To illustrate, Example 12-7 is another mutation of the echo server that handles client requests in parallel by running them in threads rather than in processes.
""" Server side: open a socket on a port, listen for a message from a client, and send an echo reply; echoes lines until eof when client closes socket; spawns a thread to handle each client connection; threads share global memory space with main thread; this is more portable than fork: threads work on standard Windows systems, but process forks do not; """ import time, _thread as thread # or use threading.Thread().start() from socket import * # get socket constructor and constants myHost = '' # server machine, '' means local host myPort = 50007 # listen on a non-reserved port number sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP socket object sockobj.bind((myHost, myPort)) # bind it to server port number sockobj.listen(5) # allow up to 5 pending connects def now(): return time.ctime(time.time()) # current time on the server def handleClient(connection): # in spawned thread: reply time.sleep(5) # simulate a blocking activity while True: # read, write a client socket data = connection.recv(1024) if not data: break reply = 'Echo=>%s at %s' % (data, now()) connection.send(reply.encode()) connection.close() def dispatcher(): # listen until process killed while True: # wait for next connection, connection, address = sockobj.accept() # pass to thread for service print('Server connected by', address, end=' ') print('at', now()) thread.start_new_thread(handleClient, (connection,)) dispatcher()
This dispatcher
delegates
each incoming client connection request to a newly spawned thread
running the handleClient
function. As a result, this server can process multiple clients at
once, and the main dispatcher loop can get quickly back to the top
to check for newly arrived requests. The net effect is that new
clients won’t be denied service due to a busy server.
Functionally, this version is similar to the fork
solution (clients are handled in
parallel), but it will work on any machine that supports threads,
including Windows and Linux. Let’s test it on both. First, start the
server on a Linux machine and run clients on both Linux and
Windows:
[window 1: thread-based server process; server keeps accepting
client connections while threads are servicing prior requests]
[...]$ python thread-server.py
Server connected by ('127.0.0.1', 37335) at Sun Apr 25 08:59:05 2010
Server connected by ('72.236.109.185', 58866) at Sun Apr 25 08:59:54 2010
Server connected by ('72.236.109.185', 58867) at Sun Apr 25 08:59:56 2010
Server connected by ('72.236.109.185', 58868) at Sun Apr 25 08:59:58 2010

[window 2: client, but on same remote server machine]
[...]$ python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:10 2010"

[windows 3-5: local clients, PC]
C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:59 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce
Client received: b"Echo=>b'Bruce' at Sun Apr 25 09:00:01 2010"

C:\...\Sockets> python echo-client.py learning-python.com The Meaning of life
Client received: b"Echo=>b'The' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'Meaning' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'of' at Sun Apr 25 09:00:03 2010"
Client received: b"Echo=>b'life' at Sun Apr 25 09:00:03 2010"
Because this server uses threads rather than forked processes, we can run it portably on both Linux and a Windows PC. Here it is at work again, running on the same local Windows PC as its clients; again, the main point to notice is that new clients are accepted while prior clients are being processed in parallel with other clients and the main thread (in the five-second sleep delay):
[window 1: server, on local PC]
C:\...\PP4E\Internet\Sockets> python thread-server.py
Server connected by ('127.0.0.1', 58987) at Sun Apr 25 12:41:46 2010
Server connected by ('127.0.0.1', 58988) at Sun Apr 25 12:41:47 2010
Server connected by ('127.0.0.1', 58989) at Sun Apr 25 12:41:49 2010

[windows 2-4: clients, on local PC]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 12:41:51 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brian
Client received: b"Echo=>b'Brian' at Sun Apr 25 12:41:52 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Bright side of life
Client received: b"Echo=>b'Bright' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'side' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'of' at Sun Apr 25 12:41:54 2010"
Client received: b"Echo=>b'life' at Sun Apr 25 12:41:54 2010"
Remember that a thread silently exits when the function it is
running returns; unlike the process fork
version, we don’t call anything like
os._exit
in the client handler function (and
we shouldn’t—it may kill all threads in the process, including the
main loop watching for new connections!). Because of this, the
thread version is not only more portable, but also simpler.
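As the comment in Example 12-7 suggests, the higher-level threading module can be used instead of _thread. A sketch of the same dispatcher recoded with threading.Thread, with the handler logic otherwise unchanged (the sleep is omitted here for brevity):

```python
import threading, time
from socket import socket, AF_INET, SOCK_STREAM

def now(): return time.ctime(time.time())

def handleClient(connection):                  # runs in a spawned thread
    while True:                                # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()                         # thread exits silently on return

def dispatcher(sockobj):                       # listen until process killed
    while True:                                # wait for next connection,
        connection, address = sockobj.accept() # pass to thread for service
        print('Server connected by', address, 'at', now())
        threading.Thread(target=handleClient, args=(connection,)).start()
```

Threads made with this API can also be marked as daemons if client handlers should not block interpreter exit.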
Now that I’ve shown you how to write forking and threading servers
to process clients without blocking incoming requests, I should also
tell you that there are standard tools in the Python standard
library to make this process even easier. In particular, the
socketserver
module defines
classes that implement all flavors of forking and threading servers
that you are likely to be interested in.
Like the manually-coded servers we’ve just studied, this module’s primary classes implement servers which process clients in parallel (a.k.a. asynchronously) to avoid denying service to new requests during long-running transactions. Their net effect is to automate the top-levels of common server code. To use this module, simply create the desired kind of imported server object, passing in a handler object with a callback method of your own, as demonstrated in the threaded TCP server of Example 12-8.
""" Server side: open a socket on a port, listen for a message from a client, and send an echo reply; this version uses the standard library module socketserver to do its work; socketserver provides TCPServer, ThreadingTCPServer, ForkingTCPServer, UDP variants of these, and more, and routes each client connect request to a new instance of a passed-in request handler object's handle method; socketserver also supports Unix domain sockets, but only on Unixen; see the Python library manual. """ import socketserver, time # get socket server, handler objects myHost = '' # server machine, '' means local host myPort = 50007 # listen on a non-reserved port number def now(): return time.ctime(time.time()) class MyClientHandler(socketserver.BaseRequestHandler): def handle(self): # on each client connect print(self.client_address, now()) # show this client's address time.sleep(5) # simulate a blocking activity while True: # self.request is client socket data = self.request.recv(1024) # read, write a client socket if not data: break reply = 'Echo=>%s at %s' % (data, now()) self.request.send(reply.encode()) self.request.close() # make a threaded server, listen/handle clients forever myaddr = (myHost, myPort) server = socketserver.ThreadingTCPServer(myaddr, MyClientHandler) server.serve_forever()
This server works the same as the threading server we wrote by
hand in the previous section, but instead focuses on service
implementation (the customized handle
method), not on threading details.
It is run the same way, too—here it is processing three clients
started by hand, plus eight spawned by the testecho
script we wrote in Example 12-3:
[window 1: server, serverHost='localhost' in echo-client.py]
C:\...\PP4E\Internet\Sockets> python class-server.py
('127.0.0.1', 59036) Sun Apr 25 13:50:23 2010
('127.0.0.1', 59037) Sun Apr 25 13:50:25 2010
('127.0.0.1', 59038) Sun Apr 25 13:50:26 2010
('127.0.0.1', 59039) Sun Apr 25 13:51:05 2010
('127.0.0.1', 59040) Sun Apr 25 13:51:05 2010
('127.0.0.1', 59041) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59042) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59043) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59044) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59045) Sun Apr 25 13:51:06 2010
('127.0.0.1', 59046) Sun Apr 25 13:51:06 2010

[windows 2-4: client, same machine]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 13:50:28 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Arthur
Client received: b"Echo=>b'Arthur' at Sun Apr 25 13:50:30 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin
Client received: b"Echo=>b'Brave' at Sun Apr 25 13:50:31 2010"
Client received: b"Echo=>b'Sir' at Sun Apr 25 13:50:31 2010"
Client received: b"Echo=>b'Robin' at Sun Apr 25 13:50:31 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py
To build a forking server instead, just use the class name
ForkingTCPServer
when creating
the server object. The socketserver
module has more power than
shown by this example; it also supports nonparallel (a.k.a. serial
or synchronous) servers, UDP and Unix domain sockets, and Ctrl-C
server interrupts on Windows. See Python’s library manual for more
details.
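For instance, a forking version of Example 12-8 might be sketched as follows. This is an illustration only: the handler is trimmed to a one-shot echo for brevity, the serve function is a hypothetical wrapper, and ForkingTCPServer is unavailable on standard Windows Python:

```python
import socketserver

class MyClientHandler(socketserver.BaseRequestHandler):
    def handle(self):                          # runs in a forked child process
        data = self.request.recv(1024)         # one-shot echo, for brevity
        self.request.send(b'Echo=>' + data)
        self.request.close()

def serve(myaddr=('', 50007)):
    # same structure as the threading version: only the class name differs
    server = socketserver.ForkingTCPServer(myaddr, MyClientHandler)
    server.serve_forever()
```

Swapping between threading and forking models this way requires no change to the handler class itself, which is the main payoff of the socketserver class hierarchy.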
For more advanced server needs, Python also comes with standard library tools that use those shown here, and allow you to implement in just a few lines of Python code a simple but fully-functional HTTP (web) server that knows how to run server-side CGI scripts. We’ll explore those larger server tools in Chapter 15.
So far we’ve seen how to handle multiple clients at once with both forked processes and spawned threads, and we’ve looked at a library class that encapsulates both schemes. Under both approaches, all client handlers seem to run in parallel with one another and with the main dispatch loop that continues watching for new incoming requests. Because all of these tasks run in parallel (i.e., at the same time), the server doesn’t get blocked when accepting new requests or when processing a long-running client handler.
Technically, though, threads and processes don’t really run in parallel, unless you’re lucky enough to have a machine with many CPUs. Instead, your operating system performs a juggling act—it divides the computer’s processing power among all active tasks. It runs part of one, then part of another, and so on. All the tasks appear to run in parallel, but only because the operating system switches focus between tasks so fast that you don’t usually notice. This process of switching between tasks is sometimes called time-slicing when done by an operating system; it is more generally known as multiplexing.
When we spawn threads and processes, we rely on the operating system to juggle the active tasks so that none are starved of computing resources, especially the main server dispatcher loop. However, there’s no reason that a Python script can’t do so as well. For instance, a script might divide tasks into multiple steps—run a step of one task, then one of another, and so on, until all are completed. The script need only know how to divide its attention among the multiple active tasks to multiplex on its own.
Servers can apply this technique to yield yet another way to
handle multiple clients at once, a way that requires neither threads
nor forks. By multiplexing client connections and the main
dispatcher with the select
system
call, a single event loop can process multiple clients and accept
new ones in parallel (or at least close enough to avoid stalling).
Such servers are sometimes called asynchronous,
because they service clients in spurts, as each becomes ready to
communicate. In asynchronous servers, a single main loop run in a
single process and thread decides which clients should get a bit of
attention each time through. Client requests and the main dispatcher
loop are each given a small slice of the server’s attention if they
are ready to converse.
Most of the magic behind this server structure is the
operating system select
call,
available in Python’s standard select
module on all major platforms.
Roughly, select
is asked to
monitor a list of input sources, output sources, and exceptional
condition sources and tells us which sources are ready for
processing. It can be made to simply poll all the sources to see
which are ready; wait for a maximum time period for sources to
become ready; or wait indefinitely until one or more sources are
ready for processing.
However used, select
lets
us direct attention to sockets ready to communicate, so as to avoid
blocking on calls to ones that are not. That is, when the sources
passed to select
are sockets, we
can be sure that socket calls like accept
, recv
, and send
will not block (pause) the server
when applied to objects returned by select
. Because of that, a single-loop
server that uses select
need not
get stuck communicating with one client or waiting for new ones
while other clients are starved for the server’s attention.
Because this type of server does not need to start threads or processes, it can be efficient when transactions with clients are relatively short-lived. However, it also requires that these transactions be quick; if they are not, it still runs the risk of becoming bogged down waiting for a dialog with a particular client to end, unless augmented with threads or forks for long-running transactions.[46]
Let’s see how all of this translates into code. The script
in Example 12-9 implements
another echo
server, one that
can handle multiple clients without ever starting new processes or
threads.
""" Server: handle multiple clients in parallel with select. use the select module to manually multiplex among a set of sockets: main sockets which accept new client connections, and input sockets connected to accepted clients; select can take an optional 4th arg--0 to poll, n.m to wait n.m seconds, or omitted to wait till any socket is ready for processing. """ import sys, time from select import select from socket import socket, AF_INET, SOCK_STREAM def now(): return time.ctime(time.time()) myHost = '' # server machine, '' means local host myPort = 50007 # listen on a non-reserved port number if len(sys.argv) == 3: # allow host/port as cmdline args too myHost, myPort = sys.argv[1:] numPortSocks = 2 # number of ports for client connects # make main sockets for accepting new client requests mainsocks, readsocks, writesocks = [], [], [] for i in range(numPortSocks): portsock = socket(AF_INET, SOCK_STREAM) # make a TCP/IP socket object portsock.bind((myHost, myPort)) # bind it to server port number portsock.listen(5) # listen, allow 5 pending connects mainsocks.append(portsock) # add to main list to identify readsocks.append(portsock) # add to select inputs list myPort += 1 # bind on consecutive ports # event loop: listen and multiplex until server process killed print('select-server loop starting') while True: #print(readsocks) readables, writeables, exceptions = select(readsocks, writesocks, []) for sockobj in readables: if sockobj in mainsocks: # for ready input sockets # port socket: accept new client newsock, address = sockobj.accept() # accept should not block print('Connect:', address, id(newsock)) # newsock is a new socket readsocks.append(newsock) # add to select list, wait else: # client socket: read next line data = sockobj.recv(1024) # recv should not block print(' got', data, 'on', id(sockobj)) if not data: # if closed by the clients sockobj.close() # close here and remv from readsocks.remove(sockobj) # del list else reselected else: # this may block: 
should really select for writes too reply = 'Echo=>%s at %s' % (data, now()) sockobj.send(reply.encode())
The bulk of this script is its while
event loop at the end that calls
select
to find out which
sockets are ready for processing; these include both main port
sockets on which clients can connect and open client connections.
It then loops over all such ready sockets, accepting connections
on main port sockets and reading and echoing input on any client
sockets ready for input. Both the accept
and recv
calls in this code are guaranteed
to not block the server process after select
returns; as a result, this server
can quickly get back to the top of the loop to process newly
arrived client requests and already connected clients’ inputs. The
net effect is that all new requests and clients are serviced in
pseudoparallel fashion.
To make this process work, the server appends the connected
socket for each client to the readables
list passed to select
, and simply waits for the socket
to show up in the selected inputs list. For illustration purposes,
this server also listens for new clients on more than one port—on
ports 50007 and 50008, in our examples. Because these main port
sockets are also interrogated with select
, connection requests on either
port can be accepted without blocking either already connected
clients or new connection requests appearing on the other port.
The select
call returns
whatever sockets in readables
are ready for
processing—both main port sockets and sockets connected to clients
currently being processed.
Let’s run this script locally to see how it does its stuff
(the client and server can also be run on different machines, as
in prior socket examples). First, we’ll assume we’ve already
started this server script on the local machine in one window, and
run a few clients to talk to it. The following listing gives the
interaction in two such client console windows running on Windows.
The first client simply runs the echo-client
script twice to contact the
server, and the second also kicks off the testecho
script to spawn eight echo-client
programs running in
parallel.
As before, the server simply echoes back whatever text that
client sends, though without a sleep pause here (more on this in a
moment). Notice how the second client window really runs a script
called echo-client-50008
so as
to connect to the second port socket in the server; it’s the same
as echo-client
, with a
different hardcoded port number; alas, the original script wasn’t
designed to input a port number:
[client window 1]
C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:21 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py
Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:27 2010"

[client window 2]
C:\...\PP4E\Internet\Sockets> python echo-client-50008.py localhost Sir Galahad
Client received: b"Echo=>b'Sir' at Sun Apr 25 14:51:22 2010"
Client received: b"Echo=>b'Galahad' at Sun Apr 25 14:51:22 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py
The next listing is the sort of output that shows up in the
window where the server has been started. The first three
connections come from echo-client
runs; the rest is the result
of the eight programs spawned by testecho
in the second client window. We
can run this server on Windows, too, because select
is available on this platform.
Correlate this output with the server’s code to see how it
runs.
Notice that for testecho
,
new client connections and client inputs are multiplexed together.
If you study the output closely, you’ll see that they overlap in
time, because all activity is dispatched by the single event loop
in the server. In fact, the trace output on the server will
probably look a bit different nearly every time it runs. Clients
and new connections are interleaved almost at random due to timing
differences on the host machines. This happens in the earlier
forking and threading servers, too, but the operating system
automatically switches between the execution paths of the
dispatcher loop and client transactions.
Also note that the server gets an empty string when the client has closed its socket. We take care to close and delete these sockets at the server right away, or else they would be needlessly reselected again and again, each time through the main loop:
[server window]
C:\...\PP4E\Internet\Sockets> python select-server.py
select-server loop starting
Connect: ('127.0.0.1', 59080) 21339352
got b'Hello network world' on 21339352
got b'' on 21339352
Connect: ('127.0.0.1', 59081) 21338128
got b'Sir' on 21338128
got b'Galahad' on 21338128
got b'' on 21338128
Connect: ('127.0.0.1', 59082) 21339352
got b'Hello network world' on 21339352
got b'' on 21339352
[testecho results]
Connect: ('127.0.0.1', 59083) 21338128
got b'Hello network world' on 21338128
got b'' on 21338128
Connect: ('127.0.0.1', 59084) 21339352
got b'Hello network world' on 21339352
got b'' on 21339352
Connect: ('127.0.0.1', 59085) 21338128
got b'Hello network world' on 21338128
got b'' on 21338128
Connect: ('127.0.0.1', 59086) 21339352
got b'Hello network world' on 21339352
got b'' on 21339352
Connect: ('127.0.0.1', 59087) 21338128
got b'Hello network world' on 21338128
got b'' on 21338128
Connect: ('127.0.0.1', 59088) 21339352
Connect: ('127.0.0.1', 59089) 21338128
got b'Hello network world' on 21339352
got b'Hello network world' on 21338128
Connect: ('127.0.0.1', 59090) 21338056
got b'' on 21339352
got b'' on 21338128
got b'Hello network world' on 21338056
got b'' on 21338056
Besides this more verbose output, there's another subtle but
crucial difference to notice: a time.sleep call to simulate a
long-running task doesn't make sense in the server here. Because
all clients are handled by the same single loop, sleeping would
pause everything and defeat the whole point of a multiplexing
server. Again, manual multiplexing servers like this one work well
when transactions are short; when they are not, the transactions
must either be broken up or handled specially.
Before we move on, here are a few additional notes and options:
select call details
Formally, select is called with three lists of selectable
objects (input sources, output sources, and exceptional-condition
sources), plus an optional timeout. The timeout argument may be a
real wait expiration value in seconds (use floating-point numbers
to express fractions of a second), a zero value to mean simply
poll and return immediately, or omitted to mean wait until at
least one object is ready (as done in our server script). The call
returns a triple of ready objects (subsets of the first three
arguments), any or all of which may be empty if the timeout
expired before sources became ready.
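The call's three-list interface and timeout forms can be sketched in a few lines; this uses socket.socketpair, a stand-in assumption for a real network connection (available on Unix, and on Windows as of Python 3.5):

```python
# A minimal sketch of select's three lists and timeout argument;
# socket.socketpair stands in for a real client connection here.
import select, socket

a, b = socket.socketpair()
empty, _, _ = select.select([a], [], [], 0)      # timeout 0: poll, return now
b.send(b'ping')                                  # now a has input pending
ready, _, _ = select.select([a], [], [], 1.0)    # wait up to 1 second
```

Omitting the timeout entirely, as our server script does, would instead block until at least one source becomes ready.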
select portability
Like threading, but unlike forking, this server works in
standard Windows Python, too. Technically, the select call works
only for sockets on Windows, but it also works for things like
files and pipes on Unix and Macintosh. For servers running over
the Internet, of course, the primary devices we are interested in
are sockets.
Nonblocking sockets
select lets us be sure that socket calls like accept and recv
won't block (pause) the caller, but it's also possible to make
Python sockets nonblocking in general. Call the setblocking method
of socket objects to set the socket to blocking or nonblocking
mode. For example, given a call like sock.setblocking(flag), the
socket sock is set to nonblocking mode if the flag is zero or
False and to blocking mode otherwise. All sockets start out in
blocking mode initially, so socket calls may always make the
caller wait.
However, when in nonblocking mode, a socket.error
exception is raised
if a recv
socket call
doesn’t find any data, or if a send
call can’t immediately
transfer data. A script can catch this exception to
determine whether the socket is ready for processing. In
blocking mode, these calls always block until they can
proceed. Of course, there may be much more to processing
client requests than data transfers (requests may also
require long-running computations), so nonblocking sockets
don’t guarantee that servers won’t stall in general. They
are simply another way to code multiplexing servers. Like
select
, they are better
suited when client requests can be serviced quickly.
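The nonblocking protocol described above can be sketched as follows, again with a local socket pair standing in for a networked connection (in Python 3.X, socket.error is an alias for OSError, and the exception actually raised is its BlockingIOError subclass):

```python
# Sketch of nonblocking-mode sockets: recv raises rather than
# waits when no data is present; socketpair stands in for a client.
import select, socket

a, b = socket.socketpair()
a.setblocking(False)             # 0/False: nonblocking; default is blocking
try:
    first = a.recv(1024)         # nothing sent yet: raises instead of pausing
except socket.error:             # OSError in 3.X; raise is BlockingIOError
    first = None

b.send(b'spam')
select.select([a], [], [], 1.0)  # wait until data has actually arrived
second = a.recv(1024)            # ready now: returns without raising
```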
asyncore module framework
If you're interested in using select, you will probably also
be interested in checking out the asyncore.py module in the
standard Python library. It implements a class-based callback
model, where input and output callbacks are dispatched to class
methods by a precoded select event loop. As such, it allows
servers to be constructed without threads or forks, and it is a
select-based alternative to the socketserver module's threading
and forking classes we met in the prior sections. As for this type
of server in general, asyncore is best when transactions are
short: what it describes as "I/O bound" rather than "CPU bound"
programs, the latter of which still require threads or forks. See
the Python library manual for details and a usage example.
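Note that asyncore was later deprecated and eventually removed from the standard library (in Python 3.12); its class-based callback-dispatch idea can be approximated today with the standard selectors module. The following is a rough sketch under that substitution, not asyncore itself:

```python
# The callback-dispatch idea behind asyncore, sketched with the
# newer standard-library selectors module; socketpair stands in
# for an accepted client connection.
import selectors, socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()

def on_readable(conn):                        # run by the event loop
    conn.send(b'Echo=>' + conn.recv(1024))    # echo back what arrived

sel.register(a, selectors.EVENT_READ, on_readable)
b.send(b'Hello network world')

for key, events in sel.select(timeout=1.0):   # one pass of the event loop
    key.data(key.fileobj)                     # dispatch registered callback

reply = b.recv(1024)
```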
For other server options, see also the open source Twisted system (http://twistedmatrix.com). Twisted is an asynchronous networking framework written in Python that supports TCP, UDP, multicast, SSL/TLS, serial communication, and more. It supports both clients and servers and includes implementations of a number of commonly used network services such as a web server, an IRC chat server, a mail server, a relational database interface, and an object broker.
Although Twisted supports processes and threads for
longer-running actions, it also uses an asynchronous,
event-driven model to handle clients, which is similar to
the event loop of GUI libraries like tkinter. It abstracts
an event loop, which multiplexes among open socket
connections, automates many of the details inherent in an
asynchronous server, and provides an event-driven framework
for scripts to use to accomplish application tasks.
Twisted’s internal event engine is similar in spirit to our
select
-based server and
the asyncore
module, but
it is regarded as much more advanced. Twisted is a
third-party system, not a standard library tool; see its
website and documentation for more details.
So when should you use select
to build a server, instead of
threads or forks? Needs vary per application, of course, but as
mentioned, servers based on the select
call generally perform very well
when client transactions are relatively short and are not CPU-bound.
If they are not short, threads or forks may be a better way to split
processing among multiple clients. Threads and forks are especially
useful if clients require long-running processing above and beyond
the socket calls used to pass data. However, combinations are
possible too—nothing is stopping a select-based polling loop from
using threads, too.
It’s important to remember that schemes based on select
(and nonblocking sockets) are not
completely immune to blocking. In Example 12-9, for instance, the
send
call that echoes text back
to a client might block, too, and hence stall the entire server. We
could work around that blocking potential by using select
to make sure that the output
operation is ready before we attempt it (e.g., use the writesocks
list and add another loop to
send replies to ready output sockets), albeit at a noticeable cost
in program clarity.
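The output-readiness workaround just described can be sketched briefly: before sending a reply, ask select's second list whether the socket can be written without blocking (a local socket pair stands in for a connected client here):

```python
# Checking output readiness with select's second list before a
# send, so a reply cannot stall the event loop.
import select, socket

conn, client = socket.socketpair()
writesocks = [conn]                              # sockets with replies pending

_, writeables, _ = select.select([], writesocks, [], 0)
if conn in writeables:                           # safe: send won't block now
    conn.send(b'Echo=>reply')
reply = client.recv(1024)
```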
In general, though, if we cannot split up the processing of a
client’s request in such a way that it can be multiplexed with other
requests and not block the server’s main loop, select
may not be the best way to
construct a server by itself. While some network servers can satisfy
this constraint, many cannot.
Moreover, select
also seems
more complex than spawning either processes or threads, because we
need to manually transfer control among all tasks (for instance,
compare the threaded and select
versions of our echo server, even without write selects). As usual,
though, the degree of that complexity varies per application. The
asyncore standard library module mentioned
earlier simplifies some of the tasks of implementing a select
-based event-loop socket server, and
Twisted offers additional hybrid solutions.
So far in this chapter, we’ve focused on the role of sockets in the classic client/server networking model. That’s one of their primary roles, but they have other common use cases as well.
In Chapter 5, for instance, we saw sockets as a basic IPC device between processes and threads on a single machine. And in Chapter 10’s exploration of linking non-GUI scripts to GUIs, we wrote a utility module (Example 10-23) which connected a caller’s standard output stream to a socket, on which a GUI could listen for output to be displayed. There, I promised that we’d flesh out that module with additional transfer modes once we had a chance to explore sockets in more depth. Now that we have, this section takes a brief detour from the world of remote network servers to tell the rest of this story.
Although some programs can be written or rewritten to converse
over sockets explicitly, this isn’t always an option; it may be too
expensive an effort for existing scripts, and might preclude desirable
nonsocket usage modes for others. In some cases, it’s better to allow
a script to use standard stream tools such as the print
and input
built-in functions and sys
module file calls (e.g., sys.stdout.write
), and connect them to
sockets only when needed.
Because such stream tools are designed to operate on text-mode files, though, probably the biggest trick here is fooling them into operating on the inherently binary mode and very different method interface of sockets. Luckily, sockets come with a method that achieves all the forgery we need.
The socket object makefile
method comes in handy anytime you
wish to process a socket with normal file object methods or need to
pass a socket to an existing interface or program that expects a file.
The socket wrapper object returned allows your scripts to transfer
data over the underlying socket with read
and write
calls, rather than recv
and send
. Since input
and print
built-in functions use the former
methods set, they will happily interact with sockets wrapped by this
call, too.
The makefile
method also
allows us to treat normally binary socket data as text instead of byte
strings, and has additional arguments such as encoding
that let us specify nondefault
Unicode encodings for the transferred text—much like the built-in
open
and os.fdopen
calls we met in Chapter 4 do for file descriptors.
Although text can always be encoded and decoded with manual calls
after binary mode socket transfers, makefile
shifts the burden of text encodings
from your code to the file wrapper object.
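A quick sketch of a text-mode wrapper with an explicit encoding follows; socket.socketpair stands in for a real network connection here:

```python
# Text-mode socket wrappers: makefile encodes str to bytes on the
# way out and decodes on the way in, so print and input just work.
import socket

a, b = socket.socketpair()
wfile = a.makefile('w', encoding='utf-8')    # text wrapper over binary socket
rfile = b.makefile('r', encoding='utf-8')

print('spam', file=wfile)                    # print writes str, not bytes
wfile.flush()                                # wrappers buffer: flush to send
line = rfile.readline().rstrip()
```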
This equivalence to files comes in handy any time we want to use
software that supports file interfaces. For example, the Python
pickle
module’s load
and dump
methods expect an object with a
file-like interface (e.g., read
and
write
methods), but they don’t
require a physical file. Passing a TCP/IP socket wrapped with the
makefile
call to the pickler allows
us to ship serialized Python objects over the Internet, without having
to pickle to byte strings ourselves and call raw socket methods
manually. This is an alternative to using the pickle
module’s string-based calls (dumps
, loads
) with socket send
and recv
calls, and might offer more flexibility
for software that must support a variety of transport mechanisms. See
Chapter 17 for more details on
object serialization interfaces.
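The pickle-over-makefile technique can be sketched with a local socket pair; because pickle streams are binary data, the wrappers are opened in 'wb' and 'rb' modes rather than text mode:

```python
# Shipping a pickled object through file wrappers made by
# makefile; dump serializes straight onto the socket, load
# rebuilds the object on the other side.
import pickle, socket

a, b = socket.socketpair()
wfile = a.makefile('wb')                 # binary mode: pickle writes bytes
rfile = b.makefile('rb')

obj = {'spam': [1, 2, 3]}
pickle.dump(obj, wfile)                  # serialize to the socket wrapper
wfile.flush()                            # push buffered bytes onto the wire
copy = pickle.load(rfile)                # an equal but distinct object
```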
More generally, any component that expects a file-like method
protocol will gladly accept a socket wrapped with a socket object
makefile
call. Such interfaces will
also accept strings wrapped with the built-in io.StringIO
class, and any other sort of
object that supports the same kinds of method calls as built-in file
objects. As always in Python, we code to protocols—object
interfaces—not to specific datatypes.
To illustrate the makefile method's operation, Example 12-10
implements a variety of redirection schemes, which redirect the
caller's streams to a socket that can be used by another process
for communication. The first of its functions connects output, and
is what we used in Chapter 10; the others connect input, or both
input and output, in three different modes.
Naturally, the wrapper object returned by socket.makefile
can also be used with
direct file interface read
and
write
method calls and
independently of standard streams. This example uses those methods,
too, albeit in most cases indirectly and implicitly through the
print
and input
stream access built-ins, and
reflects a common use case for the tool.
"""
###############################################################################
Tools for connecting standard streams of non-GUI programs to sockets that a
GUI (or other) program can use to interact with the non-GUI program.
###############################################################################
"""

import sys
from socket import *
port = 50008               # pass in different port if multiple dialogs on machine
host = 'localhost'         # pass in different host to connect to remote listeners

def initListenerSocket(port=port):
    """
    initialize connected socket for callers that listen in server mode
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(('', port))                    # listen on this port number
    sock.listen(5)                           # set pending queue length
    conn, addr = sock.accept()               # wait for client to connect
    return conn                              # return connected socket

def redirectOut(port=port, host=host):
    """
    connect caller's standard output stream to a socket for GUI to listen
    start caller after listener started, else connect fails before accept
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))               # caller operates in client mode
    file = sock.makefile('w')                # file interface: text, buffered
    sys.stdout = file                        # make prints go to sock.send
    return sock                              # if caller needs to access it raw

def redirectIn(port=port, host=host):
    """
    connect caller's standard input stream to a socket for GUI to provide
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file = sock.makefile('r')                # file interface wrapper
    sys.stdin = file                         # make input come from sock.recv
    return sock                              # return value can be ignored

def redirectBothAsClient(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is client to a server: sends msg, receives reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))               # or open in 'rw' mode
    ofile = sock.makefile('w')               # file interface: text, buffered
    ifile = sock.makefile('r')               # two file objects wrap same socket
    sys.stdout = ofile                       # make prints go to sock.send
    sys.stdin = ifile                        # make input come from sock.recv
    return sock

def redirectBothAsServer(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is server to a client: receives msg, send reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind((host, port))                  # caller is listener here
    sock.listen(5)
    conn, addr = sock.accept()
    ofile = conn.makefile('w')               # file interface wrapper
    ifile = conn.makefile('r')               # two file objects wrap same socket
    sys.stdout = ofile                       # make prints go to sock.send
    sys.stdin = ifile                        # make input come from sock.recv
    return conn
To test, the script in Example 12-11 defines five sets of
client/server functions. It runs the client’s code in process, but
deploys the Python multiprocessing
module we met in Chapter 5 to portably spawn the server
function’s side of the dialog in a separate process. In the end, the
client and server test functions run in different processes, but
converse over a socket that is connected to standard streams within
the test script’s process.
"""
###############################################################################
test the socket_stream_redirect.py modes
###############################################################################
"""

import sys, os, multiprocessing
from socket_stream_redirect import *

###############################################################################
# redirected client output
###############################################################################

def server1():
    mypid = os.getpid()
    conn = initListenerSocket()                       # block till client connect
    file = conn.makefile('r')
    for i in range(3):                                # read/recv client's prints
        data = file.readline().rstrip()               # block till data ready
        print('server %s got [%s]' % (mypid, data))   # print normally to terminal

def client1():
    mypid = os.getpid()
    redirectOut()
    for i in range(3):
        print('client %s: %s' % (mypid, i))           # print to socket
        sys.stdout.flush()                            # else buffered till exits!

###############################################################################
# redirected client input
###############################################################################

def server2():
    mypid = os.getpid()                               # raw socket not buffered
    conn = initListenerSocket()                       # send to client's input
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())

def client2():
    mypid = os.getpid()
    redirectIn()
    for i in range(3):
        data = input()                                # input from socket
        print('client %s got [%s]' % (mypid, data))   # print normally to terminal

###############################################################################
# redirect client input + output, client is socket client
###############################################################################

def server3():
    mypid = os.getpid()
    conn = initListenerSocket()                       # wait for client connect
    file = conn.makefile('r')                         # recv print(), send input()
    for i in range(3):                                # readline blocks till data
        data = file.readline().rstrip()
        conn.send(('server %s got [%s]\n' % (mypid, data)).encode())

def client3():
    mypid = os.getpid()
    redirectBothAsClient()
    for i in range(3):
        print('client %s: %s' % (mypid, i))           # print to socket
        data = input()                                # input from socket: flushes!
        sys.stderr.write('client %s got [%s]\n' % (mypid, data))   # not redirected

###############################################################################
# redirect client input + output, client is socket server
###############################################################################

def server4():
    mypid = os.getpid()
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file = sock.makefile('r')
    for i in range(3):
        sock.send(('server %s: %s\n' % (mypid, i)).encode())   # send to input()
        data = file.readline().rstrip()                        # recv from print()
        print('server %s got [%s]' % (mypid, data))            # result to terminal

def client4():
    mypid = os.getpid()
    redirectBothAsServer()     # I'm actually the socket server in this mode
    for i in range(3):
        data = input()                                # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))   # print to socket
        sys.stdout.flush()                            # else last buffered till exit!

###############################################################################
# redirect client input + output, client is socket client, server xfers first
###############################################################################

def server5():
    mypid = os.getpid()                               # test 4, but server accepts
    conn = initListenerSocket()                       # wait for client connect
    file = conn.makefile('r')                         # send input(), recv print()
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())
        data = file.readline().rstrip()
        print('server %s got [%s]' % (mypid, data))

def client5():
    mypid = os.getpid()
    s = redirectBothAsClient()     # I'm the socket client in this mode
    for i in range(3):
        data = input()                                # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))   # print to socket
        sys.stdout.flush()                            # else last buffered till exit!

###############################################################################
# test by number on command-line
###############################################################################

if __name__ == '__main__':
    server = eval('server' + sys.argv[1])
    client = eval('client' + sys.argv[1])             # client in this process
    multiprocessing.Process(target=server).start()    # server in new process
    client()                                          # reset streams in client
    #import time; time.sleep(5)                       # test effect of exit flush
Run the test script with a client and server number on the command line to test the module’s tools; messages display process ID numbers, and those within square brackets reflect a transfer across streams connected to sockets (twice, when nested):
C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 1
server 3844 got [client 1112: 0]
server 3844 got [client 1112: 1]
server 3844 got [client 1112: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 2
client 5188 got [server 2020: 0]
client 5188 got [server 2020: 1]
client 5188 got [server 2020: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 3
client 7796 got [server 2780 got [client 7796: 0]]
client 7796 got [server 2780 got [client 7796: 1]]
client 7796 got [server 2780 got [client 7796: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 4
server 4288 got [client 3852 got [server 4288: 0]]
server 4288 got [client 3852 got [server 4288: 1]]
server 4288 got [client 3852 got [server 4288: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 5
server 6040 got [client 7728 got [server 6040: 0]]
server 6040 got [client 7728 got [server 6040: 1]]
server 6040 got [client 7728 got [server 6040: 2]]
If you correlate this script’s output with its code to see how
messages are passed between client and server, you’ll find that
print
and input
calls in client functions are
ultimately routed over sockets to another process. To the client
functions, the socket linkage is largely invisible.
Before we move on, there are two remarkably subtle aspects of the example’s code worth highlighting:
Raw sockets transfer binary byte strings, but by
opening the wrapper files in text mode, their content is
automatically translated to text strings on input and
output. Text-mode file wrappers are required if accessed
through standard stream tools such as the print
built-in that writes text
strings (as we’ve learned, binary mode files require byte
strings instead). When dealing with the raw socket directly,
though, text must still be manually encoded to byte strings,
as shown in most of Example 12-11’s
tests.
As we learned in Chapters 5 and 10, standard streams are normally buffered, and printed text may need to be flushed so that it appears on a socket connected to a process’s output stream. Indeed, some of this example’s tests require explicit or implicit flush calls to work properly at all; otherwise their output is either incomplete or absent altogether until program exit. In pathological cases, this can lead to deadlock, with a process waiting for output from another that never appears. In other configurations, we may also get socket errors in a reader if a writer exits too soon, especially in two-way dialogs.
For example, if client1
and client4
did not flush periodically
as they do, the only reason that they would work is because
output streams are automatically flushed when their process
exits. Without manual flushes, client1
transfers no data until
process exit (at which point all its output is sent at once
in a single message), and client4
’s data is incomplete till
exit (its last printed message is delayed).
Even more subtly, both client3
and client4
rely on the fact that the
input
built-in first
automatically flushes sys.stdout
internally for its
prompt option, thereby sending data from preceding print
calls. Without this implicit
flush (or the addition of manual flushes), client3
would experience
deadlock immediately, as would client4
if its manual flush call
was removed (even with input
’s flush, removing client4
’s manual flush causes its
final print
message to
not be transferred until process exit). client5
has this same behavior as
client4
, because it
simply swaps which process binds and accepts and which
connects.
In the general case, if we wish to read a program’s
output as it is produced, instead of all at once when it
exits or as its buffers fill, the program must either call
sys.stdout.flush
periodically, or be run with unbuffered streams by using
Python’s -u
command-line argument of Chapter 5 if applicable.
Although we can open socket wrapper files in
unbuffered mode with a second makefile
argument of zero (like
normal open
), this does
not allow the wrapper to run in the text mode required for
print
and desired for
input
. In fact,
attempting to make a socket wrapper file both text mode and
unbuffered this way fails with an exception, because Python
3.X no longer supports unbuffered mode for text files (it is
allowed for binary mode only today). In other words, because
print
requires text mode,
buffered mode is also implied for output stream files.
Moreover, attempting to open a socket file wrapper in
line-buffered mode appears to not be
supported in Python 3.X (more on this ahead).
While some buffering behavior may be library and
platform dependent, manual flush calls or direct socket
access might sometimes still be required. Note that sockets
can also be made nonblocking with the setblocking(0)
method, but this
only avoids wait states for transfer calls and does not
address the data producer’s failure to send buffered
output.
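The core buffering issue behind these notes can be reproduced in isolation: text printed to a socket wrapper is invisible to the peer until a flush (a local socket pair stands in here for the two communicating processes):

```python
# Demonstrating the flush requirement: a print to a buffered
# socket wrapper does not reach the peer until the wrapper is
# flushed; select probes whether the reader side has data.
import select, socket

a, b = socket.socketpair()
wfile = a.makefile('w')                        # fully buffered text wrapper

print('client: 0', file=wfile)                 # stays in the local buffer
sent_early = bool(select.select([b], [], [], 0.2)[0])

wfile.flush()                                  # now the bytes actually move
sent_after = bool(select.select([b], [], [], 1.0)[0])
```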
To make some of this more concrete, Example 12-12 illustrates how
some of these complexities apply to redirected standard streams,
by attempting to connect them to both text and binary mode files
produced by open, and accessing them with the print and input
built-ins much as a redirected script might.
"""
test effect of connecting standard streams to text and binary mode files
same holds true for socket.makefile: print requires text mode, but text
mode precludes unbuffered mode -- use -u or sys.stdout.flush() calls
"""

import sys

def reader(F):
    tmp, sys.stdin = sys.stdin, F
    line = input()
    print(line)
    sys.stdin = tmp

reader( open('test-stream-modes.py') )         # works: input() returns text
reader( open('test-stream-modes.py', 'rb') )   # works: but input() returns bytes

def writer(F):
    tmp, sys.stdout = sys.stdout, F
    print(99, 'spam')
    sys.stdout = tmp

writer( open('temp', 'w') )       # works: print() passes text str to .write()
print(open('temp').read())

writer( open('temp', 'wb') )      # FAILS on print: binary mode requires bytes
writer( open('temp', 'w', 0) )    # FAILS on open: text must be buffered
When run, the last two lines in this script both fail—the
second to last fails because print
passes text strings to a
binary-mode file (never allowed for files in general), and the
last fails because we cannot open text-mode files in unbuffered
mode in Python 3.X (text mode implies Unicode encodings). Here are
the errors we get when this script is run: the first run uses the
script as shown, and the second shows what happens if the second
to last line is commented out (I edited the exception text
slightly for presentation):
C:\...\PP4E\Internet\Sockets> test-stream-modes.py
"""
b'"""\r\n'
99 spam

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 26, in <module>
    writer( open('temp', 'wb') )      # FAILS on print: binary mode...
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 20, in writer
    print(99, 'spam')
TypeError: must be bytes or buffer, not str

C:\...\PP4E\Internet\Sockets> test-streams-binary.py
"""
b'"""\r\n'
99 spam

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 27, in <module>
    writer( open('temp', 'w', 0) )    # FAILS on open: text must be...
ValueError: can't have unbuffered text I/O
The same rules apply to socket wrapper file objects created
with a socket’s makefile
method—they must be opened in text mode for print
and should be opened in text mode
for input
if we wish to receive
text strings, but text mode prevents us from using fully
unbuffered file mode altogether:
>>> from socket import *
>>> s = socket()                 # defaults to tcp/ip (AF_INET, SOCK_STREAM)
>>> s.makefile('w', 0)           # this used to work in Python 2.X
Traceback (most recent call last):
  File "C:\Python31\lib\socket.py", line 151, in makefile
ValueError: unbuffered streams must be binary
Text-mode socket wrappers also accept a buffering-mode argument of 1
to specify
line-buffering instead of the default full
buffering:
>>> from socket import *
>>> s = socket()
>>> f = s.makefile('w', 1)       # same as buffering=1, but acts as fully buffered!
This appears to be no different than full buffering, and still requires the resulting file to be flushed manually to transfer lines as they are produced. Consider the simple socket server and client scripts in Examples 12-13 and 12-14. The server simply reads three messages using the raw socket interface.
from socket import *             # read three messages over a raw socket

sock = socket()
sock.bind(('', 60000))
sock.listen(5)
print('accepting...')
conn, id = sock.accept()         # blocks till client connect
for i in range(3):
    print('receiving...')
    msg = conn.recv(1024)        # blocks till data received
    print(msg)                   # gets all print lines at once unless flushed
The client in Example 12-14 sends three messages: the first two over a socket wrapper file, and the last using the raw socket. Its manual flush calls are commented out but retained so you can experiment with turning them on, and its sleep calls make the server wait for data.
import time                      # send three msgs over wrapped and raw socket
from socket import *

sock = socket()                  # default=AF_INET, SOCK_STREAM (tcp/ip)
sock.connect(('localhost', 60000))
file = sock.makefile('w', buffering=1)   # default=full buff, 0=error, 1 not line buff!

print('sending data1')
file.write('spam\n')             # must follow with flush() to truly send now
time.sleep(5)
#file.flush()                    # uncomment flush lines to see the difference

print('sending data2')
print('eggs', file=file)         # adding more file prints does not flush buffer either
time.sleep(5)
#file.flush()                    # output appears at server recv only upon flush or exit

print('sending data3')
sock.send(b'ham\n')              # low-level byte string interface sends immediately
time.sleep(5)                    # received first if don't flush other two!
Run the server in one window first and the client in another (or run the server first in the background in Unix-like platforms). The output in the server window follows—the messages sent with the socket wrapper are deferred until program exit, but the raw socket call transfers data immediately:
C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py
accepting...
receiving...
b'ham\n'
receiving...
b'spam\neggs\n'
receiving...
b''
The client window simply displays “sending” lines 5 seconds apart; its third message appears at the server in 10 seconds, but the first and second messages it sends using the wrapper file are deferred until exit (for 15 seconds) because the socket wrapper is still fully buffered. If the manual flush calls in the client are uncommented, each of the three sent messages is delivered in serial, 5 seconds apart (the third appears immediately after the second):
C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py
accepting...
receiving...
b'spam\n'
receiving...
b'eggs\n'
receiving...
b'ham\n'
In other words, even when line buffering is requested, socket wrapper file writes (and by association, prints) are buffered until the program exits, manual flushes are requested, or the buffer becomes full.
The short story here is this: to avoid delayed outputs or
deadlock, scripts that might send data to waiting programs by
printing to wrapped sockets (or for that matter, by using print
or sys.stdout.write
in general) should do
one of the following:
Call sys.stdout.flush
periodically to flush their printed output so it becomes
available as produced, as shown in Example 12-11.
Be run with the -u
Python command-line flag, if possible, to force the output
stream to be unbuffered. This works for unmodified programs
spawned by pipe tools such as os.popen
. It will
not help with the use case here, though,
because we manually reset the stream files to buffered text
socket wrappers after a process starts. To prove this,
uncomment Example 12-11’s manual flush
calls and the sleep call at its end, and run with -u
: the first test’s output is still
delayed for 5 seconds.
Use threads to read from sockets to avoid blocking, especially if the receiving program is a GUI and it cannot depend upon the client to flush. See Chapter 10 for pointers. This doesn’t really fix the problem—the spawned reader thread may be blocked or deadlocked, too—but at least the GUI remains active during waits.
Implement their own custom socket
wrapper objects which intercept text write
calls, encode to binary, and
route to a socket with send
calls; socket.makefile
is
really just a convenience tool, and we can always code a
wrapper of our own for more specific roles. For hints, see
Chapter 10’s GuiOutput
class, the stream
redirection class in Chapter 3, and the classes of the
io
standard library module
(upon which Python’s input/output tools are based, and which
you can mix in custom ways).
Skip print
altogether
and communicate directly with the native interfaces of IPC
devices, such as socket objects’ raw send
and recv
methods—these transfer data
immediately and do not buffer data as file methods can. We can
either transfer simple byte strings this way or use the
pickle
module’s dumps
and loads
tools to convert Python
objects to and from byte strings for such direct socket
transfer (more on pickle
in
Chapter 17).
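The following sketch illustrates this last option with `pickle` plus raw `send` and `recv` calls; the 4-byte length header used to frame each message is an assumption made for this example (the chapter's own scripts use no such framing), but some scheme like it is needed so the receiver knows how many bytes to read:

```python
import pickle, struct

def send_obj(sock, obj):
    """Pickle an object and send it with a 4-byte length prefix."""
    data = pickle.dumps(obj)                       # object -> bytes
    sock.sendall(struct.pack('>I', len(data)) + data)

def recv_obj(sock):
    """Read the length prefix, then exactly that many pickled bytes."""
    head = _recv_exact(sock, 4)
    size = struct.unpack('>I', head)[0]
    return pickle.loads(_recv_exact(sock, size))   # bytes -> object

def _recv_exact(sock, n):
    """Loop until n bytes arrive: recv may return partial data."""
    chunks = b''
    while len(chunks) < n:
        part = sock.recv(n - len(chunks))
        if not part:
            raise EOFError('socket closed early')
        chunks += part
    return chunks
```

Because `sendall` transfers immediately, there is no buffering to flush, and arbitrary Python objects can ride the connection, not just text lines.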
The latter option may be more direct (and the redirection utility module also returns the raw socket in support of such usage), but it isn’t viable in all scenarios, especially for existing or multimode scripts. In many cases, it may be most straightforward to use manual flush calls in shell-oriented programs whose streams might be linked to other programs through sockets.
Also keep in mind that buffered streams and deadlock are general issues that go beyond socket wrapper files. We explored this topic in Chapter 5; as a quick review, the nonsocket Example 12-15 does not fully buffer its output when it is connected to a terminal (output is only line buffered when run from a shell command prompt), but does if connected to something else (including a socket or pipe).
# output line buffered (unbuffered) if stdout is a terminal, buffered by default for
# other devices: use -u or sys.stdout.flush() to avoid delayed output on pipe/socket

import time, sys
for i in range(5):
    print(time.asctime())           # print transfers per stream buffering
    sys.stdout.write('spam\n')      # ditto for direct stream file access
    time.sleep(2)                   # unless sys.stdout reset to other file
Although text-mode files are required for Python 3.X’s
print
in general, the -u
flag still works in 3.X to suppress
full output stream buffering. In Example 12-16, using this flag
makes the spawned script’s printed output appear every 2 seconds,
as it is produced. Not using this flag defers all output for 10
seconds, until the spawned script exits, unless the spawned script
calls sys.stdout
.flush
on each iteration.
# no output for 10 seconds unless Python -u flag used or sys.stdout.flush()
# but writer's output appears here every 2 seconds when either option is used

import os
for line in os.popen('python -u pipe-unbuff-writer.py'):   # iterator reads lines
    print(line, end='')                                    # blocks without -u!
Following is the reader script’s output; unlike the socket
examples, it spawns the writer automatically, so we don’t need
separate windows to test. Recall from Chapter 5 that os.popen
also accepts a buffering
argument much like socket.makefile
, but it does not apply
to the spawned program’s stream, and so would not prevent output
buffering in this case.
C:\...\PP4E\Internet\Sockets> pipe-unbuff-reader.py
Wed Apr 07 09:32:28 2010
spam
Wed Apr 07 09:32:30 2010
spam
Wed Apr 07 09:32:32 2010
spam
Wed Apr 07 09:32:34 2010
spam
Wed Apr 07 09:32:36 2010
spam
The net effect is that -u
still works around the stream buffering issue for connected
programs in 3.X, as long as you don’t reset the streams to other
objects in the spawned program as we did for socket redirection in
Example 12-11. For socket
redirections, manual flush calls or replacement socket wrappers
may be required.
So why use sockets in this redirection role at all? In short, for server independence and networked use cases. Notice how for command pipes it’s not clear who should be called “server” and “client,” since neither script runs perpetually. In fact, this is one of the major downsides of using command pipes like this instead of sockets—because the programs require a direct spawning relationship, command pipes do not support longer-lived or remotely running servers the way that sockets do.
With sockets, we can start client and server independently,
and the server may continue running perpetually to serve multiple
clients (albeit with some changes to our utility module’s listener
initialization code). Moreover, passing in remote machine names to
our socket redirection tools would allow a client to connect to a
server running on a completely different machine. As we learned in
Chapter 5, named pipes (fifos)
accessed with the open
call
support stronger independence of client and server, too, but
unlike sockets, they are usually limited to the local machine, and
are not supported on all platforms.
Experiment with this code on your own for more insight. Also
try changing Example 12-11
to run the client function in a spawned process instead of or in
addition to the server, with and without flush calls and time.sleep
calls to defer exits; the
spawning structure might have some impact on the soundness of a
given socket dialog structure as well, which we’ll finesse here in
the interest of space.
Despite the care that must be taken with text encodings and stream buffering, the utility provided by Example 12-10 is still arguably impressive—prints and input calls are routed over network or local-machine socket connections in a largely automatic fashion, and with minimal changes to the nonsocket code that uses the module. In many cases, the technique can extend a script’s applicability.
In the next section, we’ll use the makefile
method again to wrap the socket
in a file-like object, so that it can be read by lines using
normal text-file method calls and techniques. This isn’t strictly
required in the example—we could read lines as byte strings with
the socket recv
call, too. In
general, though, the makefile
method comes in handy any time you wish to treat sockets as though
they were simple files. To see this at work, let’s move on.
It’s time for something realistic. Let’s conclude this chapter by putting some of the socket ideas we’ve studied to work doing something a bit more useful than echoing text back and forth. Example 12-17 implements both the server-side and the client-side logic needed to ship a requested file from server to client machines over a raw socket.
In effect, this script implements a simple file download system. One instance of the script is run on the machine where downloadable files live (the server), and another on the machines you wish to copy files to (the clients). Command-line arguments tell the script which flavor to run and optionally name the server machine and port number over which conversations are to occur. A server instance can respond to any number of client file requests at the port on which it listens, because it serves each in a thread.
""" ############################################################################# implement client and server-side logic to transfer an arbitrary file from server to client over a socket; uses a simple control-info protocol rather than separate sockets for control and data (as in ftp), dispatches each client request to a handler thread, and loops to transfer the entire file by blocks; see ftplib examples for a higher-level transport scheme; ############################################################################# """ import sys, os, time, _thread as thread from socket import * blksz = 1024 defaultHost = 'localhost' defaultPort = 50001 helptext = """ Usage... server=> getfile.py -mode server [-port nnn] [-host hhh|localhost] client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost] """ def now(): return time.asctime() def parsecommandline(): dict = {} # put in dictionary for easy lookup args = sys.argv[1:] # skip program name at front of args while len(args) >= 2: # example: dict['-mode'] = 'server' dict[args[0]] = args[1] args = args[2:] return dict def client(host, port, filename): sock = socket(AF_INET, SOCK_STREAM) sock.connect((host, port)) sock.send((filename + ' ').encode()) # send remote name with dir: bytes dropdir = os.path.split(filename)[1] # filename at end of dir path file = open(dropdir, 'wb') # create local file in cwd while True: data = sock.recv(blksz) # get up to 1K at a time if not data: break # till closed on server side file.write(data) # store data in local file sock.close() file.close() print('Client got', filename, 'at', now()) def serverthread(clientsock): sockfile = clientsock.makefile('r') # wrap socket in dup file obj filename = sockfile.readline()[:-1] # get filename up to end-line try: file = open(filename, 'rb') while True: bytes = file.read(blksz) # read/send 1K at a time if not bytes: break # until file totally sent sent = clientsock.send(bytes) assert sent == len(bytes) except: print('Error downloading file 
on server:', filename) clientsock.close() def server(host, port): serversock = socket(AF_INET, SOCK_STREAM) # listen on TCP/IP socket serversock.bind((host, port)) # serve clients in threads serversock.listen(5) while True: clientsock, clientaddr = serversock.accept() print('Server connected by', clientaddr, 'at', now()) thread.start_new_thread(serverthread, (clientsock,)) def main(args): host = args.get('-host', defaultHost) # use args or defaults port = int(args.get('-port', defaultPort)) # is a string in argv if args.get('-mode') == 'server': # None if no -mode: client if host == 'localhost': host = '' # else fails remotely server(host, port) elif args.get('-file'): # client mode needs -file client(host, port, args['-file']) else: print(helptext) if __name__ == '__main__': args = parsecommandline() main(args)
This script isn’t much different from the examples we saw earlier. Depending on the command-line arguments passed, it invokes one of two functions:
The server
function farms
out each incoming client request to a thread that transfers the
requested file’s bytes.
The client
function sends
the server a file’s name and stores all the bytes it gets back in
a local file of the same name.
The most novel feature here is the protocol between client and server: the client starts the conversation by shipping a filename string up to the server, terminated with an end-of-line character, and including the file’s directory path in the server. At the server, a spawned thread extracts the requested file’s name by reading the client socket, and opens and transfers the requested file back to the client, one chunk of bytes at a time.
Since the server uses threads to process clients, we can test both client and server on the same Windows machine. First, let’s start a server instance and execute two client instances on the same machine while the server runs:
[server window, localhost]
C:\...\Internet\Sockets> python getfile.py -mode server
Server connected by ('127.0.0.1', 59134) at Sun Apr 25 16:26:50 2010
Server connected by ('127.0.0.1', 59135) at Sun Apr 25 16:27:21 2010

[client window, localhost]
C:\...\Internet\Sockets> dir /B *.gif *.txt
File Not Found

C:\...\Internet\Sockets> python getfile.py -file testdir\ora-lp4e.gif
Client got testdir\ora-lp4e.gif at Sun Apr 25 16:26:50 2010

C:\...\Internet\Sockets> python getfile.py -file testdir\textfile.txt -port 50001
Client got testdir\textfile.txt at Sun Apr 25 16:27:21 2010
Clients run in the directory where you want the downloaded
file to appear—the client instance code strips the server directory
path when making the local file’s name. Here the “download” simply
copies the requested files up to the local parent directory (the DOS
fc
command compares file
contents):
C:\...\Internet\Sockets> dir /B *.gif *.txt
ora-lp4e.gif
textfile.txt

C:\...\Internet\Sockets> fc /B ora-lp4e.gif testdir\ora-lp4e.gif
FC: no differences encountered

C:\...\Internet\Sockets> fc textfile.txt testdir\textfile.txt
FC: no differences encountered
As usual, we can run server and clients on different machines as well. For instance, here are the sort of commands we would use to launch the server remotely and fetch files from it locally; run this on your own to see the client and server outputs:
[remote server window]
[...]$ python getfile.py -mode server

[client window: requested file downloaded in a thread on server]
C:\...\Internet\Sockets> python getfile.py -mode client -host learning-python.com
                         -port 50001 -file python.exe

C:\...\Internet\Sockets> python getfile.py -host learning-python.com -file index.html
One subtle security point here: the server instance code is
happy to send any server-side file whose pathname is sent from a
client, as long as the server is run with a username that has read
access to the requested file. If you care about keeping some of your
server-side files private, you should add logic to suppress
downloads of restricted files. I’ll leave this as a suggested
exercise here, but we will implement such filename checks in a
different getfile
download tool
later in this book.[47]
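One hedged sketch of such logic, assuming downloads should be confined to a single directory (the `safepath` name and `savedir` parameter are illustrative assumptions, not code from the later tool): resolve the requested path and refuse anything that escapes the allowed directory.

```python
import os

def safepath(filename, savedir):
    """Hypothetical check: allow only files inside savedir.
    Resolves symlinks and '..' components before comparing."""
    savedir = os.path.realpath(savedir)
    target = os.path.realpath(os.path.join(savedir, filename))
    if os.path.commonpath([savedir, target]) != savedir:
        raise ValueError('restricted file: %s' % filename)
    return target
```

The server thread could call this before its `open`, sending an error reply instead of file data when the check raises.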
After all the GUI commotion in the prior part of this book, you might have noticed that we have been living in the realm of the command line for this entire chapter—our socket clients and servers have been started from simple DOS or Linux shells. Nothing is stopping us from adding a nice point-and-click user interface to some of these scripts, though; GUI and network scripting are not mutually exclusive techniques. In fact, they can be arguably “sexy” when used together well.
For instance, it would be easy to implement a simple tkinter
GUI frontend to the client-side portion of the getfile
script we just met. Such a tool,
run on the client machine, may simply pop up a window with Entry
widgets for typing the desired
filename, server, and so on. Once download parameters have been
input, the user interface could either import and call the getfile.client
function with appropriate
option arguments, or build and run the implied getfile.py
command line using tools such
as os.system
, os.popen
, subprocess
, and so on.
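For the command-line route, a sketch using `subprocess` might look like the following; building an argument list rather than one shell string sidesteps quoting problems with filenames that contain spaces (the function names here are assumptions standing in for GUI callback code):

```python
import subprocess, sys

def getfile_cmdline(server, port, filename):
    """Build the implied getfile.py client command as an argument list."""
    return [sys.executable, 'getfile.py', '-mode', 'client',
            '-host', server, '-port', str(port), '-file', filename]

def run_getfile(server, port, filename):
    """Run the client as a subprocess; returns its exit status code."""
    return subprocess.run(getfile_cmdline(server, port, filename)).returncode
```

A GUI callback could pass its `Entry` widgets' values straight into `run_getfile` once the form is submitted.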
To help make all of this more concrete, let’s very quickly
explore a few simple scripts that add a tkinter frontend to the
getfile
client-side program.
All of these examples assume that you are running a server
instance of getfile
; they
merely add a GUI for the client side of the conversation, to fetch
a file from the server. The first, in Example 12-18, uses form
construction techniques we met in Chapters 8 and 9 to create a dialog for
inputting server, port, and filename information, and simply
constructs the corresponding getfile
command line and runs it with
the os.system
call we studied
in Part II.
""" launch getfile script client from simple tkinter GUI; could also use os.fork+exec, os.spawnv (see Launcher); windows: replace 'python' with 'start' if not on path; """ import sys, os from tkinter import * from tkinter.messagebox import showinfo def onReturnKey(): cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' % (content['File'].get(), content['Port'].get(), content['Server'].get())) os.system(cmdline) showinfo('getfilegui-1', 'Download complete') box = Tk() labels = ['Server', 'Port', 'File'] content = {} for label in labels: row = Frame(box) row.pack(fill=X) Label(row, text=label, width=6).pack(side=LEFT) entry = Entry(row) entry.pack(side=RIGHT, expand=YES, fill=X) content[label] = entry box.title('getfilegui-1') box.bind('<Return>', (lambda event: onReturnKey())) mainloop()
When run, this script creates the input form shown in Figure 12-1. Pressing the Enter key
(<Return>
) runs a
client-side instance of the getfile
program; when the generated
getfile
command line is
finished, we get the verification pop up displayed in Figure 12-2.
The first user-interface script (Example 12-18) uses the pack
geometry manager and row Frames
with fixed-width labels to
lay out the input form and runs the getfile
client as a standalone program.
As we learned in Chapter 9,
it’s arguably just as easy to use the grid
manager for layout and to import
and call the client-side logic function instead of running a
program. The script in Example 12-19 shows how.
""" same, but with grids and import+call, not packs and cmdline; direct function calls are usually faster than running files; """ import getfile from tkinter import * from tkinter.messagebox import showinfo def onSubmit(): getfile.client(content['Server'].get(), int(content['Port'].get()), content['File'].get()) showinfo('getfilegui-2', 'Download complete') box = Tk() labels = ['Server', 'Port', 'File'] rownum = 0 content = {} for label in labels: Label(box, text=label).grid(column=0, row=rownum) entry = Entry(box) entry.grid(column=1, row=rownum, sticky=E+W) content[label] = entry rownum += 1 box.columnconfigure(0, weight=0) # make expandable box.columnconfigure(1, weight=1) Button(text='Submit', command=onSubmit).grid(row=rownum, column=0, columnspan=2) box.title('getfilegui-2') box.bind('<Return>', (lambda event: onSubmit())) mainloop()
This version makes a similar window (Figure 12-3), but adds a button at the
bottom that does the same thing as an Enter key press—it runs the
getfile
client procedure.
Generally speaking, importing and calling functions (as done here)
is faster than running command lines, especially if done more than
once. The getfile
script is set
up to work either way—as program or function library.
If you’re like me, though, writing all the GUI form layout code in those two scripts can seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to me that I decided to write a general-purpose form-layout class, shown in Example 12-20, which handles most of the GUI layout grunt work.
""" ################################################################## a reusable form class, used by getfilegui (and others) ################################################################## """ from tkinter import * entrysize = 40 class Form: # add non-modal form box def __init__(self, labels, parent=None): # pass field labels list labelsize = max(len(x) for x in labels) + 2 box = Frame(parent) # box has rows, buttons box.pack(expand=YES, fill=X) # rows has row frames rows = Frame(box, bd=2, relief=GROOVE) # go=button or return key rows.pack(side=TOP, expand=YES, fill=X) # runs onSubmit method self.content = {} for label in labels: row = Frame(rows) row.pack(fill=X) Label(row, text=label, width=labelsize).pack(side=LEFT) entry = Entry(row, width=entrysize) entry.pack(side=RIGHT, expand=YES, fill=X) self.content[label] = entry Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT) Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT) box.master.bind('<Return>', (lambda event: self.onSubmit())) def onSubmit(self): # override this for key in self.content: # user inputs in print(key, ' => ', self.content[key].get()) # self.content[k] def onCancel(self): # override if need Tk().quit() # default is exit class DynamicForm(Form): def __init__(self, labels=None): labels = input('Enter field names: ').split() Form.__init__(self, labels) def onSubmit(self): print('Field values...') Form.onSubmit(self) self.onCancel() if __name__ == '__main__': import sys if len(sys.argv) == 1: Form(['Name', 'Age', 'Job']) # precoded fields, stay after submit else: DynamicForm() # input fields, go away after submit mainloop()
Compare the approach of this module with that of the form row builder function we wrote in Chapter 10’s Example 10-9. While that example much reduced the amount of code required, the module here is a noticeably more complete and automatic scheme—it builds the entire form given a set of label names, and provides a dictionary with every field’s entry widget ready to be fetched.
Running this module standalone triggers its self-test code at the bottom. Without arguments (and when double-clicked in a Windows file explorer), the self-test generates a form with canned input fields captured in Figure 12-4, and displays the fields’ values on Enter key presses or Submit button clicks:
C:\...\PP4E\Internet\Sockets> python form.py
Age     =>      40
Name    =>      Bob
Job     =>      Educator, Entertainer
With a command-line argument, the form class module’s
self-test code prompts for an arbitrary set of field names for the
form; fields can be constructed as dynamically as we like. Figure 12-5 shows the input form
constructed in response to the following console interaction.
Field names could be accepted on the command line, too, but the
input
built-in function works
just as well for simple tests like this. In this mode, the GUI
goes away after the first submit, because DynamicForm.onSubmit
says so:
C:\...\PP4E\Internet\Sockets> python form.py -
Enter field names: Name Email Web Locale
Field values...
Locale => Florida
Web => http://learning-python.com
Name => Book
Email => [email protected]
And last but not least, Example 12-21 shows the getfile
user interface again, this time
constructed with the reusable form layout class. We need to fill
in only the form labels list and provide an onSubmit
callback method of our own. All
of the work needed to construct the form comes “for free,” from
the imported and widely reusable Form
superclass.
""" launch getfile client with a reusable GUI form class; os.chdir to target local dir if input (getfile stores in cwd); to do: use threads, show download status and getfile prints; """ from form import Form from tkinter import Tk, mainloop from tkinter.messagebox import showinfo import getfile, os class GetfileForm(Form): def __init__(self, oneshot=False): root = Tk() root.title('getfilegui') labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?'] Form.__init__(self, labels, root) self.oneshot = oneshot def onSubmit(self): Form.onSubmit(self) localdir = self.content['Local Dir?'].get() portnumber = self.content['Port Number'].get() servername = self.content['Server Name'].get() filename = self.content['File Name'].get() if localdir: os.chdir(localdir) portnumber = int(portnumber) getfile.client(servername, portnumber, filename) showinfo('getfilegui', 'Download complete') if self.oneshot: Tk().quit() # else stay in last localdir if __name__ == '__main__': GetfileForm() mainloop()
The form layout class imported here can be used by any program that needs to input form-like data; when used in this script, we get a user interface like that shown in Figure 12-6 under Windows 7 (and similar on other versions and platforms).
Pressing this form’s Submit button or the Enter key makes
the getfilegui
script call the
imported getfile.client
client-side function as before. This time, though, we also first
change to the local directory typed into the form so that the
fetched file is stored there (getfile
stores in the current working
directory, whatever that may be when it is called). Here are the
messages printed in the client’s console, along with a check on
the file transfer; the server is still running above testdir
, but the client stores the file
elsewhere after it’s fetched on the socket:
C:\...\Internet\Sockets> getfilegui.py
Local Dir?      =>      C:\users\Mark\temp
File Name       =>      testdir\ora-lp4e.gif
Server Name     =>      localhost
Port Number     =>      50001
Client got testdir\ora-lp4e.gif at Sun Apr 25 17:22:39 2010

C:\...\Internet\Sockets> fc /B C:\Users\mark\temp\ora-lp4e.gif testdir\ora-lp4e.gif
FC: no differences encountered
As usual, we can use this interface to connect to servers running locally on the same machine (as done here), or remotely on a different computer. Use a different server name and file paths if you’re running the server on a remote machine; the magic of sockets makes this all “just work” in either local or remote modes.
One caveat worth pointing out here: the GUI is essentially dead while the download is in progress (even screen redraws aren’t handled—try covering and uncovering the window and you’ll see what I mean). We could make this better by running the download in a thread, but since we’ll see how to do that in the next chapter when we explore the FTP protocol, you should consider this problem a preview.
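As a preview of that fix, a hedged sketch of the threaded approach follows (an assumption made for illustration, not the code we will meet there): the download runs in a worker thread, and the result comes back through a queue that the GUI's event loop can poll without blocking.

```python
import threading, queue

def start_download(work, done_queue):
    """Hypothetical helper: run a blocking job (e.g. a wrapped
    getfile.client call) in a daemon thread, and report success
    or failure back through a queue the GUI polls with after()."""
    def worker():
        try:
            result = work()
            done_queue.put(('ok', result))
        except Exception as exc:
            done_queue.put(('error', exc))
    threading.Thread(target=worker, daemon=True).start()
```

With this structure, the tkinter mainloop keeps handling redraws while the transfer proceeds; only the worker thread blocks on the socket.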
In closing, a few final notes: first, I should point out that the scripts in this chapter use tkinter techniques we’ve seen before and won’t go into here in the interest of space; be sure to see the GUI chapters in this book for implementation hints.
Keep in mind, too, that these interfaces just add a GUI on top of the existing script to reuse its code; any command-line tool can be easily GUI-ified in this way to make it more appealing and user friendly. In Chapter 14, for example, we’ll meet a more useful client-side tkinter user interface for reading and sending email over sockets (PyMailGUI), which largely just adds a GUI to mail-processing tools. Generally speaking, GUIs can often be added as almost an afterthought to a program. Although the degree of user-interface and core logic separation varies per program, keeping the two distinct makes it easier to focus on each.
And finally, now that I’ve shown you how to build user
interfaces on top of this chapter’s getfile
, I should also say that they
aren’t really as useful as they might seem. In particular,
getfile
clients can talk only
to machines that are running a getfile
server. In the next chapter,
we’ll discover another way to download files—FTP—which also runs
on sockets but provides a higher-level interface and is available
as a standard service on many machines on the Net. We don’t
generally need to start up a custom server to transfer files over
FTP, the way we do with getfile
. In fact, the user-interface
scripts in this chapter could be easily changed to fetch the
desired file with Python’s FTP tools, instead of the getfile
module. But instead of spilling
all the beans here, I’ll just say, “Read on.”
[42] There is even a common acronym for this today: LAMP, for the Linux operating system, the Apache web server, the MySQL database system, and the Python, Perl, and PHP scripting languages. It’s possible, and even very common, to put together an entire enterprise-level web server with open source tools. Python users would probably also like to include systems like Zope, Django, Webware, and CherryPy in this list, but the resulting acronym might be a bit of a stretch.
[43] Some books also use the term protocol to refer to lower-level transport schemes such as TCP. In this book, we use protocol to refer to higher-level structures built on top of sockets; see a networking text if you are curious about what happens at lower levels.
[44] Since Python is an open source system, you can read the
source code of the ftplib
module if you are curious about how the underlying protocol
actually works. See the ftplib.py file in
the standard source library directory in your machine. Its code
is complex (since it must format messages and manage two
sockets), but with the other standard Internet protocol modules,
it is a good example of low-level socket programming.
[45] You might be interested to know that the last part of this example, talking to port 80, is exactly what your web browser does as you surf the Web: followed links direct it to download web pages over this port. In fact, this lowly port is the primary basis of the Web. In Chapter 15, we will meet an entire application environment based upon sending formatted data over port 80—CGI server-side scripting. At the bottom, though, the Web is just bytes over sockets, with a user interface. The wizard behind the curtain is not as impressive as he may seem!
[46] Confusingly, select
-based servers are often called
asynchronous, to describe their
multiplexing of short-lived transactions. Really, though, the
classic forking and threading servers we met earlier are
asynchronous, too, as they do not wait for completion of a given
client’s request. There is a clearer distinction between serial
and parallel servers—the former process one transaction at a
time and the latter do not—and “synchronous” and “asynchronous”
are essentially synonyms for “serial” and “parallel.” By this
definition, forking, threading, and select
loops are three alternative
ways to implement parallel, asynchronous servers.
[47] We’ll see three more getfile
programs before we leave
Internet scripting. The next chapter’s
getfile.py fetches a file with the
higher-level FTP interface instead of using raw socket calls,
and its http-getfile scripts fetch files
over the HTTP protocol. Later, Chapter 15 presents a server-side
getfile.py CGI script that transfers file
contents over the HTTP port in response to a request made in a
web browser client (files are sent as the output of a CGI
script). All four of the download schemes presented in this text
ultimately use sockets, but only the version here makes that use
explicit.