Chapter 17. Internet Client Programming

In an earlier chapter, we took a look at low-level networking communication protocols using sockets. This type of networking is at the heart of most of the client/server protocols that exist on the Internet today. These protocols include those for transferring files (FTP, SCP, etc.), reading Usenet newsgroups (NNTP), sending e-mail (SMTP), and downloading e-mail from a server (POP3, IMAP). They work much like the client/server examples in the earlier chapter on socket programming. The only difference is that we have now taken lower-level protocols like TCP/IP and created newer, more specific protocols on top of them to implement the higher-level services we just described.

17.1 What Are Internet Clients?

Before we take a look at these protocols, we first must ask, “What is an Internet client?” To answer this question, we simplify the Internet to a place where data are exchanged, and this interchange is made up of someone offering a service and a user of such services. You will hear the term “producer-consumer” in some circles (although this phrase is generally reserved for conversations on operating systems). Servers are the producers, providing the services, and clients consume the offered services. For any one particular service, there is usually only one server (process, host, etc.) and more than one consumer. We previously examined the client/server model, and although we do not need to create Internet clients with the low-level socket operations seen earlier, the model is an accurate match.

Here, we will look specifically at three of these Internet protocols (FTP, NNTP, and POP3) and write clients for each. What you should take away afterward is the ability to recognize how similar the APIs of all of these protocols are (this is by design, as keeping interfaces consistent is a worthy cause) and, most importantly, the ability to create real clients of these and other Internet protocols. And even though we are only highlighting these three specific protocols, at the end of this chapter, you should feel confident enough to write clients for just about any Internet protocol.

17.2 Transferring Files

17.2.1 File Transfer Internet Protocols

One of the most popular Internet activities is file exchange. It happens all the time. There have been many protocols to transfer files over the Internet, with some of the most popular including the File Transfer Protocol (FTP), the Unix-to-Unix Copy Protocol (UUCP), and of course, the Web’s Hypertext Transfer Protocol (HTTP). We should also include the remote (Unix) file copy command rcp (and now its more secure and flexible cousins scp and rsync).

HTTP, FTP, and scp/rsync are still quite popular today. HTTP is primarily used for Web-based file download and accessing Web services. It generally doesn’t require clients to have a login and/or password on the server host to obtain documents or service. The majority of all HTTP file transfer requests are for Web page retrieval (file downloads).

On the other hand, scp and rsync require a user login on the server host. Clients must be authenticated before file transfers can occur, and files can be sent (upload) or retrieved (download). Finally, we have FTP. Like scp/rsync, FTP can be used for file upload or download; and like scp/rsync, it employs the Unix multi-user concepts of usernames and passwords: FTP clients must use the login/password of existing users. However, FTP also allows anonymous logins. Let us now take a closer look at FTP.

17.2.2 File Transfer Protocol (FTP)

The File Transfer Protocol was specified by the late Jon Postel and Joyce Reynolds in Internet Request for Comments (RFC) 959, published in October 1985. It is primarily used to download publicly accessible files in an anonymous fashion. It can also be used to transfer files between two machines, especially in cases where you’re using a Unix system for file storage or archiving and a desktop or laptop PC for work. Before the Web became popular, FTP was one of the primary methods of transferring files on the Internet, and one of the only ways to download software and/or source code.

As described previously, one must have a login/password for accessing the remote host running the FTP server. The exception is anonymous logins, which are designed for guest downloads. These permit clients who do not have accounts to download files. The server’s administrator must set up an FTP server with anonymous logins in order for these to occur. In these cases, the “login” of an unregistered user is called “anonymous,” and the password is generally the e-mail address of the client. This is akin to a public login and access to directories that were designed for general consumption as opposed to logging in and transferring files as a particular user. The list of available commands via the FTP protocol is also generally more restrictive than that for real users.

The protocol is diagrammed below in Figure 17-1 and works as follows:

  1. Client contacts the FTP server on the remote host
  2. Client logs in with username and password (or “anonymous” and e-mail address)
  3. Client performs various file transfers or information requests
  4. Client completes the transaction by logging out of the remote host and FTP server

Figure 17-1. FTP Clients and Servers on the Internet. The client and server communicate using the FTP protocol on the command or control port while data is transferred using the data port.

image

Of course, this is generally how it works. Sometimes the entire transaction is terminated before it is completed, for example when one of the two hosts crashes or some other network connectivity issue disconnects them. For inactive clients, FTP connections will generally time out after 15 minutes (900 seconds) of inactivity.

Under the covers, it is good to know that FTP uses only TCP (see earlier chapter on network programming)—it does not use UDP in any way. Also, FTP may be seen as a more “unusual” example of client/server programming because both the clients and the servers use a pair of sockets for communication: one is the control or command port (port 21), and the other is the data port (sometimes port 20).

We say “sometimes” because there are two FTP modes, Active and Passive, and the server’s data port is only 20 for Active mode. After the server sets up 20 as its data port, it “actively” initiates the connection to the client’s data port. For Passive mode, the server is only responsible for letting the client know where its random data port is, and the client must initiate the data connection. As you can see in this mode, the FTP server is taking a more “passive” role in setting up the data connection. Finally, there is now support for a new Extended Passive Mode to support version 6 Internet Protocol (IPv6) addresses—see RFC 2428.
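To make the Passive-mode handshake concrete, here is a minimal sketch of how a client could decode the server's reply to the PASV command. It assumes the common RFC 959 reply form, "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)", and the function name parse_pasv_reply is hypothetical. Python's ftplib performs this step for you, so the sketch is for illustration only.

```python
import re


def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 Entering Passive Mode' reply.

    Assumes the usual "(h1,h2,h3,h4,p1,p2)" layout; servers are not
    strictly required to format the text this way.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a passive-mode reply: %r" % reply)
    numbers = [int(n) for n in match.groups()]
    # The first four numbers are the bytes of the IPv4 address.
    host = ".".join(str(n) for n in numbers[:4])
    # The port arrives as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port
```

The client would then open its own data connection to the returned host and port, which is what makes the server's role "passive."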

Python has support for most Internet protocols, including FTP. Other supported client libraries can be found at http://docs.python.org/lib/internet.html. Now let’s take a look at just how easy it is to create an Internet client with Python.

17.2.3 Python and FTP

So how do we write an FTP client using Python? What we just described in the previous section covers it pretty much. The only additional work required is to import the appropriate Python module and make the appropriate calls in Python. So let us review the protocol briefly:

  1. Connect to server
  2. Log in
  3. Make service request(s) (and hopefully get reply[ies])
  4. Quit

When using Python’s FTP support, all you do is import the ftplib module and instantiate the ftplib.FTP class. All FTP activity will be accomplished using your object, i.e., logging in, transferring files, and logging out.

Here is some Python pseudocode:

image
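The four steps can also be sketched as a small function. This is an illustration under stated assumptions rather than the listing from the original text: the function name list_remote_names, the default password, and the factory parameter (which lets you substitute a stand-in class when no network is available) are all hypothetical.

```python
from ftplib import FTP


def list_remote_names(host, user="anonymous", passwd="guest@example.com",
                      factory=FTP):
    """Run one complete FTP session and return the names in the login directory.

    'factory' is an assumption added for illustration; by default it is
    ftplib.FTP, which connects as soon as it is given a host name.
    """
    ftp = factory(host)              # 1. connect to server
    try:
        ftp.login(user, passwd)      # 2. log in
        names = ftp.nlst()           # 3. make a service request
    finally:
        ftp.quit()                   # 4. quit, even if a step above failed
    return names
```

Calling list_remote_names() with the name of a real FTP host that permits anonymous logins would return the file names in its top-level directory.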

Soon we will look at a real example, but for now, let us familiarize ourselves with methods from the ftplib.FTP class, which you will likely use in your code.

17.2.4 ftplib.FTP Class Methods

We outline the most popular methods in Table 17.1. The list is not comprehensive (see the source code for the class itself for all methods), but the ones presented here are those that make up the “API” for FTP client programming in Python. In other words, you don’t really need to use the others, as they are either utility or administrative functions or are used internally by the API methods.

Table 17.1. Methods for FTP Objects

image

image

The methods you will most likely use in a normal FTP transaction include login(), cwd(), dir(), pwd(), stor*(), retr*(), and quit(). There are more FTP object methods not listed in the table which you may find useful. Please see the Python documentation for detailed information on FTP objects:

http://python.org/docs/current/lib/ftp-objects.html

17.2.5 Interactive FTP Example

Using FTP with Python is so simple that you do not even have to write a script. You can do it all from the interactive interpreter and see the action and output in real time. This is a sample session we did years ago when there was still an FTP server running at python.org:

image

17.2.6 Client Program FTP Example

We mentioned previously that an example script is not even necessary since you can run a session interactively and not get lost in any code. We will try anyway. For example, let us say you wanted a piece of code that downloads the latest copy of Bugzilla from the Mozilla Web site. Example 17.1 is what we came up with. We are attempting an application here, but even so, you can probably run this one interactively, too. Our application uses the FTP library to download the file and includes some error-checking.

Example 17.1. FTP Download Example (getLatestFTP.py)

This program is used to download the latest version of a file from a Web site. You can tweak it to download your favorite application.

image

It is not automated, however; it is up to you to run it whenever you want to perform the download, or if you are on a Unix-based system, you can set up a “cron” job to automate it for you. Another issue is that it will break if either the file or directory names change.

If no errors occur when we run our script, we get the following output:

image

Line-by-Line Explanation

Lines 1–9

The initial lines of code import the necessary modules (mainly to grab exception objects) and set a few constants.

Lines 11–44

The main() function consists of several steps: create an FTP object and attempt to connect to the FTP server (lines 12-17), returning (and quitting) on any failure. We then attempt to log in as “anonymous” and bail if that fails (lines 19-25). The next step is to change to the distribution directory (lines 27-33), and finally, we try to download the file (lines 35-44).

On lines 35-36, we pass a callback to retrbinary() that should be executed for every block of binary data downloaded. This is the write() method of a file object we create to write out the local version of the file. We are depending on the Python interpreter to adequately close our file after the transfer is done and not to lose any of our data. Although more convenient, your author usually tries to not use this style, because the programmer should be responsible for freeing resources directly allocated rather than depending on other code. In this case, we should save the open file object to a variable, say loc, and then pass loc.write in the call to ftp.retrbinary(). After the transfer has completed, we would call loc.close(). If for some reason we are not able to save the file, we remove the empty file to avoid cluttering up the file system (line 40). We should put some error-checking around that call to os.unlink(FILE) in case the file does not exist. Finally, to avoid another pair of lines (lines 43-44) that close the FTP connection and return, we use an else clause (lines 35–42).
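The more explicit style recommended above, where the program itself closes the local file and guards the cleanup, might look like the following sketch. The function name download and its parameters are hypothetical. It assumes ftp is an already connected and logged-in ftplib.FTP object (or anything offering the same cwd() and retrbinary() methods).

```python
import os


def download(ftp, dirname, filename, dest):
    """Fetch dirname/filename over a logged-in FTP object into the path dest."""
    ftp.cwd(dirname)
    try:
        # The with block closes the local file as soon as the transfer
        # ends, instead of relying on the interpreter to do it later.
        with open(dest, "wb") as loc:
            ftp.retrbinary("RETR %s" % filename, loc.write)
    except Exception:
        # Remove the partial or empty file, tolerating the case where
        # it was never created, then let the caller see the error.
        try:
            os.unlink(dest)
        except OSError:
            pass
        raise
```

Re-raising the exception leaves the decision about reporting the failure and closing the connection to the caller.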

Lines 46–47

This is the usual idiom for running a standalone script.

17.2.7 Miscellaneous FTP

Python supports both Active and Passive modes. Note, however, that in Python 2.0 and before, Passive mode was off by default; in Python 2.1 and later, it is on by default.
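Switching modes is a single method call on the FTP object. The short sketch below creates an unconnected object (no host is given, so no network traffic occurs) purely to show the call. The variable name ftp is arbitrary.

```python
from ftplib import FTP

ftp = FTP()            # no host given, so no connection is made yet
ftp.set_pasv(False)    # request Active mode for later transfers
# ... connect(), login(), and file transfers would normally go here ...
ftp.set_pasv(True)     # switch back to Passive mode, the default
```

Active mode often fails when the client sits behind a firewall or address translator, because the server must be able to connect back to the client. That is one reason to leave the default alone unless a server requires otherwise.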

image

Here is a list of typical FTP clients:

Command-line client program: This is where you execute FTP transfers by running an FTP client program such as /bin/ftp, or NcFTP, which allows users to interactively participate in an FTP transaction via the command line.

GUI client program: Similar to a command-line client program except it is a GUI application like WsFTP and Fetch.

Web browser: In addition to using HTTP, most Web browsers (also referred to as clients) can also speak FTP. The first directive in a URL/URI is the protocol, e.g., “http://blahblah.” This tells the browser to use HTTP as a means of transferring data from the given Web site. By changing the protocol, one can make a request using FTP, as in “ftp://blahblah.” It looks almost exactly like a URL that uses HTTP. (Of course, the “blahblah” can expand to the expected “host/path?attributes” after the protocol directive “ftp://”.) Because of the login requirement, users can add their logins and passwords (in clear text) into their URL, e.g., “ftp://user:passwd@host/path?attr1=val1&attr2=val2...”.

Custom application: A program you write that uses FTP to transfer files. It generally does not allow the user to interact with the server as the application was created for specific purposes.

All four types of clients can be created using Python. We used ftplib above to create our custom application, but you can just as well create an interactive command-line application. On top of that, you can even bring a GUI toolkit such as Tk, wxWidgets, GTK+, Qt, MFC, and even Swing into the mix (by importing their respective Python [or Jython] interface modules) and build a full GUI application on top of your command-line client code. Finally, you can use Python’s urllib module to parse and perform FTP transfers using FTP URLs. At its heart, urllib imports and uses ftplib making urllib another client of ftplib.
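As a small illustration of working with FTP URLs, the sketch below pulls one apart into the pieces an FTP client needs. It uses urllib.parse, which is where the URL-parsing part of urllib lives in Python 3. The function name split_ftp_url and the fallback to an “anonymous” login are assumptions made for this example.

```python
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Break an ftp:// URL into the values needed to open a session."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an FTP URL: %r" % url)
    return {
        "host": parts.hostname,
        "port": parts.port or 21,                 # 21 is the FTP control port
        "user": parts.username or "anonymous",    # assumed fallback
        "password": parts.password,               # None if the URL has none
        "path": parts.path,
    }
```

Keep in mind that a password embedded in a URL travels and is stored in clear text, as noted in the list above.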

FTP is not only useful for downloading client applications to build and/or use, but it can also be helpful in your everyday job if it involves moving files between systems. For example, let us say you are an engineer or a system administrator needing to transfer files. It is an obvious choice to use the scp or rsync commands when crossing the Internet boundary or pushing files to an externally visible server. However, moving extremely large logs or database files between internal machines on a secure network in that manner carries a penalty: the overhead of security checks, encryption, compression/decompression, etc. If all you want is a simple FTP application that moves files for you quickly after hours, Python is a great way to build it!

You can read more about FTP in the FTP Protocol Definition/Specification (RFC 959) at ftp://ftp.isi.edu/in-notes/rfc959.txt as well as on the http://www.networksorcery.com/enp/protocol/ftp.htm Web page. Other related RFCs include 2228, 2389, 2428, 2577, 2640, and 4217. To find out more about Python’s FTP support, you can start here: http://python.org/docs/current/lib/module-ftplib.html.

17.3 Network News

17.3.1 Usenet and Newsgroups

The Usenet News System is a global archival “bulletin board.” There are newsgroups for just about any topic, from poems to politics, linguistics to computer languages, software to hardware, planting to cooking, finding or announcing employment opportunities, music and magic, breaking up or finding love. Newsgroups can be general and worldwide or targeted toward a specific geographic region.

The entire system is a large global network of computers that participate in sharing Usenet postings. Once a user uploads a message to his or her local Usenet computer, it will then be propagated to other adjoining Usenet computers, and then to the neighbors of those systems, until it’s gone around the world and everyone has received the posting. Postings will live on Usenet for a finite period of time, either dictated by a Usenet system administrator or the posting itself via an expiration date/time.

Each system has a list of newsgroups that it “subscribes” to and only accepts postings of interest—not all newsgroups may be archived on a server. Usenet news service is dependent on which provider you use. Many are open to the public while others only allow access to specific users, such as paying subscribers, or students of a particular university, etc. A login and password are optional, configurable by the Usenet system administrator. The ability to post or download-only is another parameter configurable by the administrator.

17.3.2 Network News Transfer Protocol (NNTP)

The method by which users can download newsgroup postings or “articles” or perhaps post new articles is called the Network News Transfer Protocol (NNTP). It was authored by Brian Kantor (UC San Diego) and Phil Lapsley (UC Berkeley) in RFC 977, published in February 1986. The protocol has since then been updated in RFC 2980, published in October 2000.

As another example of client/server architecture, NNTP operates in a fashion similar to FTP; however, it is much simpler. Rather than having a whole set of different port numbers for logging in, data, and control, NNTP uses only one standard port for communication, 119. You give the server a request, and it responds appropriately, as shown in Figure 17-2.

Figure 17-2. NNTP Clients and Servers on the Internet. Clients mostly read news but may also post. Articles are then distributed as servers update each other.

image

17.3.3 Python and NNTP

Based on your experience with Python and FTP above, you can probably guess that there is an nntplib and an nntplib.NNTP class that you need to instantiate, and you would be right. As with FTP, all we need to do is to import that Python module and make the appropriate calls in Python. So let us review the protocol briefly:

  1. Connect to server
  2. Log in (if applicable)
  3. Make service request(s)
  4. Quit

Look somewhat familiar? It should, because it’s practically a carbon copy of using the FTP protocol. The only change is that the login step is optional, depending on how an NNTP server is configured.

Here is some Python pseudocode to get started:

image

Typically, once you log in, you will choose a newsgroup of interest and call the group() method. It returns the server reply, a count of the number of articles, the ID of the first and last articles, and superfluously, the group name again. Once you have this information, you will then perform some sort of action such as scroll through and browse articles, download entire postings (headers and body of article), or perhaps post an article.
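The 5-tuple returned by group() can be unpacked as in the sketch below. The function name describe_group is hypothetical, and server is assumed to be any object whose group() method behaves like that of nntplib.NNTP. The sketch deliberately does not import nntplib, since that module was removed from the standard library in Python 3.13.

```python
def describe_group(server, group_name):
    """Select a newsgroup and return the pieces of the server's answer."""
    rsp, count, first, last, name = server.group(group_name)
    # Depending on the library version, the numbers may arrive as
    # strings or as integers, so normalize them before use.
    return {
        "reply": rsp,
        "count": int(count),
        "first": int(first),
        "last": int(last),
        "name": name,
    }
```

The "first" and "last" values are what you would use next to decide which articles to browse or download.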

Before we take a look at a real example, let’s introduce some of the more popular methods of the nntplib.NNTP class.

17.3.4 nntplib.NNTP Class Methods

As in the previous section outlining the ftplib.FTP class methods, we will not show you all methods of nntplib.NNTP, just the ones you need in order to create an NNTP client application.

As with the FTP objects table in the previous segment, there are more NNTP object methods not described here. To avoid clutter, we list only the ones we think you would most likely use. For the rest, we again refer you to the Python Library Reference.

17.3.5 Interactive NNTP Example

Here is an interactive example of how to use Python’s NNTP library. It should look similar to the interactive FTP example. (The e-mail addresses have been changed for privacy reasons.)

When connecting to a group, you get a 5-tuple back from the group() method as described in Table 17.2.

Table 17.2. Methods for NNTP Objects

image

image

17.3.6 Client Program NNTP Example

For our NNTP client example, we are going to try to be more adventurous. It will be similar to the FTP client example in that we are going to download the latest of something—this time it will be the latest article available in the Python language newsgroup, comp.lang.python.

Once we have it, we will display (up to) the first 20 lines in the article, and on top of that, (up to) the first 20 meaningful lines of the article. By that, we mean lines of real data, not quoted text (which begin with “>” or “|”) or even quoted text introductions like “In article <...>, ... wrote:”.

Finally, we are going to handle blank lines intelligently. We will display one blank line when we see one in the article, but if there is more than one consecutive blank line, we show only the first of the set. Only lines with real data are counted toward the “first 20 lines,” so it is possible to display a maximum of 39 lines of output: 20 real lines of data interleaved with 19 blank ones.

If no errors occur when we run our script, we may see something like this:

image

Example 17.2. NNTP Download Example (getFirstNNTP.py)

This downloads and displays the first “meaningful” (up to 20) lines of the most recently available article in comp.lang.python, the Python newsgroup.

image

image

This output was generated from the original newsgroup posting, which looks like this:

image

image

Of course, the output will always be different since articles are always being posted. No two executions will result in the same output unless your news server has not been updated with another article since you last ran the script.

Line-by-Line Explanation

Lines 1–9

This application starts with a few import statements and some constants, much like the FTP client example.

Lines 11–40

In the first section, we attempt to connect to the NNTP host server and bail if it fails (lines 13-24). Line 15 is commented out deliberately; if your server requires authentication (with login and password), uncomment this line and merge it into line 14. This is followed by trying to load up the specific newsgroup. Again, it will quit if that newsgroup does not exist, is not archived by this server, or if authentication is required (lines 26-40).

Lines 42–55

In the next part we get some headers to display (lines 42-51). The ones that have the most meaning are the author, subject, and date. This data is retrieved and displayed to the user. Each call to the xhdr() method requires us to give the range of articles to extract the headers from. We are only interested in a single message, so the range is “X-X” where X is the last message number.

xhdr() returns a 2-tuple consisting of a server response (rsp) and a list of the headers in the range we specify. Since we are only requesting this information for one message (the last one), we just take the first element of the list (hdr[0]). That data item is a 2-tuple consisting of the article number and the data string. Since we already know the article number (because we give it in our range request), we are only interested in the second item, the data string (hdr[0][1]).
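The indexing described here can be sketched as a small helper. The function name header_for_article is hypothetical, and server is assumed to be any object with an xhdr() method that returns the 2-tuple just described.

```python
def header_for_article(server, header, article_number):
    """Fetch one header (such as 'subject') for a single article."""
    # A range of "X-X" selects exactly one article.
    span = "%s-%s" % (article_number, article_number)
    rsp, hdr = server.xhdr(header, span)
    # hdr is a list of (article number, data string) pairs; with a
    # one-article range it holds a single pair, and we want its string.
    return hdr[0][1]
```

Calling it three times, once each for the author, subject, and date headers, yields the values displayed to the user.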

The last part is to download the body of the article itself (lines 53-55). It consists of a call to the body() method, a display of the first 20 or fewer meaningful lines (as defined at the beginning of this section), a logout from the server, and the end of execution.

Lines 57–80

The core piece of processing is done by the displayFirst20() function (lines 57-80). It takes the set of lines making up the article body and does some preprocessing like setting our counter to 0, creating a generator expression that lazily iterates through our (possibly large) set of lines making up the body, and “pretends” that we have just seen and displayed a blank line (more on this later; lines 59-61). When we strip the line of data, we only remove the trailing whitespace (rstrip()) because leading spaces may be intended lines of Python code.

One criterion we have is that we should not show any quoted text or quoted text introductions. That is what the big if statement on lines 65-71 is for (together with line 64). We perform this check only if the line is not blank (line 63). We lowercase the line so that our comparisons are case-insensitive (line 64).

If a line begins with “>” or “|,” it means it is usually a quote. We make an exception for lines that start with “>>>” since it may be an interactive interpreter line, although this does introduce a flaw that a triply-old message (one quoted three times for the fourth responder) is displayed. (One of the exercises at the end of the chapter is to remove this flaw.) Lines that begin with “in article ...”, and/or end with “writes:” or “wrote:”, both with trailing colons ( : ), are also quoted text introductions. We skip all these with the continue statement.

Now to address the blank lines. We want our application to be smart. It should show blank lines as seen in the article, but it should be smart about it. If there is more than one blank line consecutively, only show the first one so the user does not see unnecessarily excessive lines, scrolling useful information off the screen. We should also not count any blank lines in our set of 20 meaningful lines. All of these requirements are taken care of in lines 72-78.

The if statement on line 72 says to only display the line if the last line was not blank, or if the last line was blank but now we have a non-blank line. In other words, if we fall through and we print the current line, it is because it is either a line with data or a blank line as long as the previous line was not blank. Now the other tricky part: if we have a non-blank line, count it and set the lastBlank flag to False since this line was not empty (lines 74-76). Otherwise, we have just seen a blank line so set the flag to True.

Now back to the business on line 61 ... we set the lastBlank flag to True because if the first real (non-introductory or quoted) line of the body is a blank, we do not want to display it ... we want to show the first real data line!

Finally, if we have seen 20 non-blank lines, then we quit and discard the remaining lines (lines 79-80). Otherwise we would have exhausted all the lines and the for loop terminates normally.
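The filtering rules described in this walkthrough can be condensed into the following sketch. It is a reconstruction from the prose, not the original listing: the name first_meaningful_lines is hypothetical, it yields lines instead of printing them, and it assumes the body has already been decoded into text strings.

```python
def first_meaningful_lines(body, limit=20):
    """Yield up to 'limit' real lines, collapsing runs of blank lines."""
    count = 0
    last_blank = True    # pretend a blank line was just shown
    for raw in body:
        line = raw.rstrip()    # keep leading spaces (possible code indentation)
        if line:
            lower = line.lower()
            # Quoted text starts with ">" or "|", except ">>>", which
            # may be an interactive interpreter prompt.
            quoted = lower.startswith("|") or (
                lower.startswith(">") and not lower.startswith(">>>"))
            # Introductions such as "In article ..." or "... wrote:".
            intro = (lower.startswith("in article")
                     or lower.endswith(("writes:", "wrote:")))
            if quoted or intro:
                continue
        # Show data lines always; show a blank only after a data line.
        if line or not last_blank:
            yield line
            if line:
                count += 1
                last_blank = False
            else:
                last_blank = True
        if count == limit:
            break
```

Because it is a generator, a caller can simply loop over the result and print each line, and the remaining body lines are never examined once the limit is reached.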

17.3.7 Miscellaneous NNTP

You can read more about NNTP in the NNTP Protocol Definition/Specification (RFC 977) at ftp://ftp.isi.edu/in-notes/rfc977.txt as well as on the http://www.networksorcery.com/enp/protocol/nntp.htm Web page. Other related RFCs include 1036 and 2980. To find out more about Python’s NNTP support, you can start here: http://python.org/docs/current/lib/module-nntplib.html.

17.4 Electronic Mail

Electronic mail is both archaic and modern at the same time. For those of us who have been using the Internet since the early days, e-mail seems so “old,” especially compared to newer and more immediate communication mechanisms such as Web-based online chat, instant messaging (IM), and digital telephony applications, i.e., Voice over Internet Protocol (VoIP). The next section gives a high-level overview of how e-mail works. If you are already familiar with this and just want to move on to developing e-mail-related clients in Python, skip to the succeeding sections.

Before we take a look at the e-mail infrastructure, have you ever asked yourself what is the exact definition of an e-mail message? Well, according to RFC 2822, “[a] message consists of header fields (collectively called ‘the header of the message’) followed, optionally, by a body.” When we think of e-mail as users, we immediately think of its contents, whether it be a real message or an unsolicited commercial advertisement (aka spam). However, the RFC states that the body itself is optional and that only the headers are required. Imagine that!

17.4.1 E-mail System Components and Protocols

Despite what you may think, electronic mail (e-mail) actually existed before the modern Internet came around. It actually started as a simple message exchange between mainframe users ... note that there wasn’t even any networking involved as they all used the same computer. Then when networking became a reality, it was possible for users on different hosts to exchange messages. This, of course, was a complicated concept as people used different computers, which used different networking protocols. It was not until the early 1980s that message exchange settled on a single de facto standard for moving e-mail around the Internet.

Before we get into the details, let’s first ask ourselves, how does e-mail work? How does a message get from sender to recipient across the vastness of all the computers accessible on the Internet? To put it simply, there are the originating computer (the sender’s message departs from here) and the destination computer (recipient’s mail server). The optimal solution is if the sending machine knows exactly how to reach the receiving host because then it can make a direct connection to deliver the message. However, this is usually not the case.

The sending computer queries to find another intermediate host who can pass the message along its way to the final recipient host. Then that host searches for the next host who is another step closer to the destination. So in between the originating and final destination hosts are any number of machines called “hops.” If you look carefully at the full e-mail headers of any message you receive, you will see a “passport” stamped with all the places your message bounced to before it finally reached you.
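To inspect that “passport” yourself, you can read the Received: header fields of a message with the standard email package. This is a minimal sketch: the function name list_hops is hypothetical, and it assumes you already have the full message, headers included, as a text string.

```python
from email import message_from_string


def list_hops(raw_message):
    """Return the Received: header values in the order the hops occurred."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])
    # Each host adds its stamp at the top of the message, so the headers
    # read from the last hop back to the first; reverse them to follow
    # the journey from sender to recipient. Folded (multi-line) values
    # are flattened to a single line for easier reading.
    return [" ".join(value.split()) for value in reversed(received)]
```

Printing the result for a message from your own mailbox shows each intermediate host along with the time it handled the message.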

To get a clearer picture, let’s take a look at the components of the e-mail system. The foremost component is the message transport agent (MTA). This is a server process running on a mail exchange host which is responsible for the routing, queuing, and sending of e-mail. These represent all the hosts that an e-mail message bounces from beginning at the source host all the way to the final destination host and all hops in between. Thus they are “agents” of “message transport.”

In order for all this to work, MTAs need to know two things: 1) how to find out the next MTA to forward a message to, and 2) how to talk to another MTA. The first is solved by using a domain name service (DNS) lookup to find the MX (Mail eXchange) of the destination domain. This is not necessarily the final recipient, but rather, the next recipient who can eventually get the message to its final destination. Next, how do MTAs forward messages to other MTAs?

17.4.2 Sending E-mail

In order to send e-mail, your mail client must connect to an MTA, and the only language they understand is a communication protocol. The way MTAs communicate with one another is by using a message transport system (MTS). This protocol must be “known” by a pair of MTAs before they can communicate. As we described at the beginning of this section, such communication was dicey and unpredictable in the early days as there were so many different types of computer systems, each running different networking software. With the added complexity that computers were using both networked transmission as well as dial-up modem, delivery times were unpredictable. In fact, this author has had a message not show up until almost nine months after the message was originally sent! How is that for Internet speed? Out of this complexity rose the Simple Mail Transfer Protocol (SMTP) in 1982, one of the foundations of modern e-mail.

SMTP

SMTP was authored by the late Jonathan Postel (ISI) in RFC 821, published in August 1982. The protocol has since been updated in RFC 2821, published in April 2001. Some well-known MTAs that have implemented SMTP include:

Open Source MTAs

• Sendmail

• Postfix

• Exim

• qmail (freely distributed but not Open Source)

Commercial MTAs

• Microsoft Exchange

• Lotus Notes Domino Mail Server

Note that while they have all implemented the minimum SMTP protocol requirements as specified in RFC 2821, most of them, especially the commercial MTAs, have added even more features to their servers, which go above and beyond the protocol definition.

SMTP is the MTS that is used by most of the MTAs on the Internet for message exchange. It is the protocol used by MTAs to transfer e-mail from (MTA) host to (MTA) host. When you send e-mail, you must connect to an outgoing SMTP server where your mail application acts as an SMTP client. Your SMTP server, therefore, is the first hop for your message.

17.4.3 Python and SMTP

Yes, there is an smtplib and an smtplib.SMTP class to instantiate. Review this familiar story:

  1. Connect to server
  2. Log in (if applicable)
  3. Make service request(s)
  4. Quit

As with NNTP, the login step is optional and only required if the server has SMTP authentication (SMTP-AUTH) enabled. SMTP-AUTH is defined in RFC 2554. And also like NNTP, speaking SMTP only requires communicating with one port on the server; this time, it’s port 25.

Here is some Python pseudocode to get started:

image
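The four steps above can be sketched roughly as follows. This is a minimal sketch written for Python 3, not the original listing. The function name send_once, the credentials tuple, and the smtp_factory parameter (present only so the function can be exercised without a real server) are illustrative assumptions.

```python
from smtplib import SMTP


def send_once(host, sender, recipients, message,
              credentials=None, smtp_factory=SMTP):
    """Connect, optionally log in, send one message, and quit.

    Returns whatever sendmail() returns: a dictionary of refused
    recipients, which is empty when every recipient was accepted.
    """
    conn = smtp_factory(host)            # 1. connect (port 25 by default)
    try:
        if credentials is not None:      # 2. log in, only if SMTP-AUTH is required
            username, password = credentials
            conn.login(username, password)
        # 3. the service request: envelope sender, envelope recipients, message
        return conn.sendmail(sender, recipients, message)
    finally:
        conn.quit()                      # 4. quit, even if sending failed
```

A call such as send_once('smtp.example.com', 'me@example.com', ['you@example.com'], msg) would perform the whole conversation; the host name and addresses here are placeholders, not real servers.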

Before we take a look at a real example, let’s introduce some of the more popular methods of the smtplib.SMTP class.

17.4.4 smtplib.SMTP Class Methods

As in the earlier sections outlining the methods of the FTP and NNTP classes, we won’t show you all methods, just the ones you need in order to create an SMTP client application. For most e-mail sending applications, only two are required: sendmail() and quit().

All arguments to sendmail() should conform to RFC 2822, i.e., e-mail addresses must be properly formatted, and the message body should have appropriate leading headers and contain lines that must be delimited by carriage-return and NEWLINE pairs.

Note that an actual message body is not required. According to RFC 2822, “[the] only required header fields are the origination date field and the originator address field(s),” i.e., “Date:” and “From:”. (On the wire, sendmail() delivers the message using the SMTP commands MAIL FROM, RCPT TO, and DATA.)
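To make the formatting rules concrete, here is a minimal sketch (Python 3) of the smallest message that satisfies them: only the two required headers, each line ending in a carriage-return/NEWLINE pair, and a blank line closing the header section. The helper name minimal_message and the address are hypothetical; email.utils.formatdate() is the standard library call used to produce an RFC 2822 date string.

```python
from email.utils import formatdate


def minimal_message(sender):
    """Build a body-less message with only the required headers."""
    headers = [
        "Date: " + formatdate(localtime=True),   # origination date field
        "From: " + sender,                       # originator address field
    ]
    # Header lines are joined with CRLF; the final CRLF pair supplies the
    # blank line that ends the header section. No body follows.
    return "\r\n".join(headers) + "\r\n\r\n"
```

The resulting string could be passed as the third argument to sendmail(); most real messages would add at least “To:” and “Subject:” headers as well.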

There are a few more methods not described here, but they are not normally required to send an e-mail message. Please see the Python documentation for information on all the SMTP object methods.

Table 17.3. Methods for SMTP Objects

image

17.4.5 Interactive SMTP Example

Once again, we present an interactive example:

image

image

17.4.6 Miscellaneous SMTP

You can read more about SMTP in the SMTP Protocol Definition/Specification (RFC 2821) at ftp://ftp.isi.edu/in-notes/rfc2821.txt as well as on the http://www.networksorcery.com/enp/protocol/smtp.htm Web page. To find out more about Python’s SMTP support, you can start here: http://python.org/docs/current/lib/module-smtplib.html

One of the more important aspects of e-mail which we have not discussed yet is how to properly format Internet addresses as well as e-mail messages themselves. This information is detailed in the Internet Message Format RFC, 2822, and can be downloaded at ftp://ftp.isi.edu/in-notes/rfc2822.txt.

17.4.7 Receiving E-mail

Back in the day, communicating by e-mail on the Internet was limited to university students, researchers, and employees of private industry and commercial corporations. Desktop computers were predominantly still Unix-based workstations. Home users just dialed up on PCs and really didn’t use e-mail. When the Internet began to explode in the mid-1990s, e-mail came home to everyone.

Because it was not feasible for home users to have workstations in their dens running SMTP, a new type of system had to be devised to leave e-mail on an incoming mail host while periodically downloading mail for offline reading. Such a system consists of both a new application and a new protocol to communicate with the mail server.

The application, which runs on a home computer, is called a mail user agent (MUA). An MUA will download mail from a server, perhaps automatically deleting it in the process (or not, leaving the mail on the server to be deleted manually by the user). However, an MUA must also be able to send mail ... in other words, it should also be able to speak SMTP to communicate directly with an MTA when sending mail. We have already seen this type of client in the previous section when we looked at SMTP. How about downloading mail then?

17.4.8 POP and IMAP

The first protocol developed for downloading was the Post Office Protocol. As stated in the original RFC document, RFC 918 published in October 1984, “The intent of the Post Office Protocol (POP) is to allow a user’s workstation to access mail from a mailbox server. It is expected that mail will be posted from the workstation to the mailbox server via the Simple Mail Transfer Protocol (SMTP).” The most recent version of POP is version 3, otherwise known as POP3. POP3, defined in RFC 1939, is still widely used today, and is the basis of our example client below.

Another protocol came a few years after POP, known as the Interactive Mail Access Protocol, or IMAP. The first version was experimental, and it was not until version 2 that its RFC was published, RFC 1064 in July 1988. The current version of IMAP in use today is IMAP4rev1, and it, too, is widely used. In fact, Microsoft Exchange, one of the predominant mail servers in the world today, uses IMAP as its download mechanism. The IMAP4rev1 protocol definition is spelled out in RFC 3501, published in March 2003. The intent of IMAP is to provide a more complete solution to the problem; however, it is more complex than POP. Further discussion of IMAP is beyond the scope of the remainder of this chapter. We refer the interested reader to the aforementioned RFC documents. The diagram in Figure 17-3 illustrates this complex system we know simply as e-mail.

Figure 17-3. E-Mail Senders and Recipients on the Internet. Clients download and send mail via their MUAs, which talk to their corresponding MTAs. E-Mail “hops” from MTA to MTA until it reaches the correct destination.

image

17.4.9 Python and POP3

No surprises here: import poplib and instantiate the poplib.POP3 class; the standard conversation is as expected:

  1. Connect to server
  2. Log in
  3. Make service request(s)
  4. Quit

And the expected Python pseudocode:

image
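As a rough illustration of that conversation, here is a minimal Python 3 sketch, not the original listing. The function name fetch_latest and the pop3_factory parameter (included only so the function can be tried without a real server) are assumptions made for this example.

```python
from poplib import POP3


def fetch_latest(host, username, password, pop3_factory=POP3):
    """Log in, download the highest-numbered message, and quit.

    Returns the message as a list of lines (bytes objects in Python 3),
    or an empty list if the mailbox holds no messages.
    """
    conn = pop3_factory(host)                 # 1. connect (port 110 by default)
    try:
        conn.user(username)                   # 2. log in: name first ...
        conn.pass_(password)                  #    ... then password
        count, _mailbox_size = conn.stat()    # 3. service requests
        if count == 0:
            return []
        _response, lines, _octets = conn.retr(count)
        return lines
    finally:
        conn.quit()                           # 4. quit, which unlocks the mailbox
```

Swapping POP3 for poplib.POP3_SSL as the factory would be one way to run the same steps over a secure connection.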

Before we take a look at a real example, we should mention that there is also a poplib.POP3_SSL class which will perform mail transfer over a secure connection, provided the appropriate credentials are supplied. Let’s take a look at an interactive example as well as introduce the basic methods of the poplib.POP3 class.

17.4.10 Interactive POP3 Example

Here is an interactive example of using Python’s poplib:

image

image

17.4.11 poplib.POP3 Class Methods

The POP3 class has numerous methods to help you download and manage your inbox offline. The most widely used ones are included in Table 17.4.

Table 17.4. Methods for POP3 Objects

image

When logging in, the user() method not only sends the login name to the server, but it also awaits the reply indicating the server is waiting for the user’s password. If pass_() fails due to authentication issues, the exception raised is poplib.error_proto. If it is successful, it gets back a positive reply, e.g., ‘+OK ready’, and the mailbox on the server is locked until quit() is called.
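The login behavior just described might be wrapped as in the sketch below (Python 3). The helper name try_login and its (success, detail) return convention are hypothetical; the only library facts relied on are the user() and pass_() methods and the poplib.error_proto exception.

```python
import poplib


def try_login(conn, username, password):
    """Attempt a POP3 login on an already connected POP3 object.

    Returns (True, "") on success, or (False, detail) when the server
    rejects the login name or the password.
    """
    try:
        conn.user(username)      # server replies that it awaits the password
        conn.pass_(password)     # on success the mailbox is now locked
    except poplib.error_proto as exc:
        return False, str(exc)
    return True, ""
```

Because a successful login locks the mailbox, the caller remains responsible for calling quit() on the connection afterward.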

For the list() method, the msg_list is of the form [‘msgnum msgsiz’,...] where msgnum and msgsiz are the message number and message sizes, respectively, of each message.
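A small sketch of turning those entries into something easier to work with is shown below. It assumes Python 3, where poplib hands back each entry as a bytes object; the helper name parse_listing is made up for this example.

```python
def parse_listing(entries):
    """Convert ['msgnum msgsiz', ...] entries into {msgnum: msgsiz}."""
    sizes = {}
    for entry in entries:
        if isinstance(entry, bytes):        # Python 3 poplib returns bytes
            entry = entry.decode("ascii")
        msgnum, msgsiz = entry.split()
        sizes[int(msgnum)] = int(msgsiz)
    return sizes
```

Given a connected POP3 object, the second element of the tuple returned by list() is what would be passed in, for example parse_listing(conn.list()[1]).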

There are a few other methods not listed here. For the full details, check out the documentation for poplib in the Python Library Reference.

17.4.12 Client Program SMTP and POP3 Example

The example below shows how to use both SMTP and POP3 to create a client that both sends and receives e-mail. What we are going to do is send an e-mail message to ourselves (or some test account) via SMTP, wait for a bit (we arbitrarily chose ten seconds), and then use POP3 to download our message and assert that the sent and received message bodies are identical. Our operation will be a success if the program completes silently, meaning that there should be no output or any errors.

Example 17.3. SMTP and POP3 Example (myMail.py)

This script sends a test e-mail message to the destination address (via the outgoing/SMTP mail server) and, after a short wait, retrieves it from the incoming (POP) mail server. You must change the server names and e-mail addresses to make it work properly.

image


Line-by-Line Explanation

Lines 1–8

This application starts with a few import statements and some constants, much like the other examples in this chapter. The constants here are the outgoing (SMTP) and incoming (POP3) mail servers.

Lines 10–14

These lines represent the preparation of the message contents. We have some mail headers followed by three lines for the message body. The From and To headers represent the message sender and recipient(s). Line 14 puts everything together into a sendable message of headers followed by a message body, all delimited by the RFC 2822-required line delimiters with a blank line separating the two sections.
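The assembly step described here can be sketched as follows. This is an illustration rather than the original listing; the function name build_message and the example.com addresses are placeholders.

```python
def build_message(headers, body_lines):
    """Join header lines and body lines into one sendable string.

    Lines within each section are separated by CRLF, and the two
    sections are separated by a blank line (CRLF CRLF).
    """
    return "\r\n\r\n".join([
        "\r\n".join(headers),
        "\r\n".join(body_lines),
    ])


# Three headers followed by a three-line body
headers = [
    "From: me@example.com",
    "To: me@example.com",
    "Subject: test msg",
]
body = ["xxx", "yyy", "zzz"]
message = build_message(headers, body)
```

Keeping the header and body lists separate, instead of writing one long string, makes it easy to compare the original body against the downloaded one later.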

Lines 16–21

We connect to the outgoing (SMTP) server and send our message. There is another pair of From and To addresses here. These are the “real” e-mail addresses, or the envelope sender and recipient(s). The recipient field should be an iterable. If a string is passed in, it will be transformed into a list of one element. For unsolicited spam e-mail, there is usually a discrepancy between the message headers and the envelope headers.

The third argument to sendmail() is the e-mail message itself. Once it has returned, we log out of the SMTP server and check that no errors have occurred. Then we give the servers some time to send and receive the message.

Lines 23–30

The final part of our application downloads the just-sent message and asserts that its body matches the body we sent. A connection is made to the POP3 server with a username and password. After successful login, a stat() call is made, which returns the number of messages in the mailbox along with the mailbox size. The message count ([0]) is also the number of the most recently delivered message, so retr() is told to download that one.

We look for the blank line separating the headers and message, discard the headers, and compare the original message body with the incoming message body. If they are identical, nothing is displayed and the program ends successfully. Otherwise, the assertion fails and an AssertionError is raised.
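The header-stripping step can be sketched like this. It assumes Python 3, where retr() returns the message as a list of bytes lines; the helper name split_message is hypothetical.

```python
def split_message(lines):
    """Split a downloaded message into (header_lines, body_lines).

    The first empty line marks the end of the headers. If there is no
    empty line, everything is treated as headers and the body is empty.
    """
    try:
        sep = lines.index(b"")
    except ValueError:
        return list(lines), []
    return lines[:sep], lines[sep + 1:]
```

With the body lines in hand, a comparison such as assert body == [b'xxx', b'yyy', b'zzz'] would play the role of the final assertion in the script, given that the original body lines are encoded to bytes first.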

Because of the numerous errors that can occur, we left out all the error-checking for this script so that it is easy on the eyes. One of the exercises at the end of the chapter is to add the error-checking.

Now you have a very good idea of how sending and receiving e-mail works in today’s environment. If you wish to continue exploring this realm of programming expertise, see the next section for other e-mail-related Python modules, which will prove valuable in application development.

17.5 Related Modules

One of Python’s greatest assets is the strength of its networking support in the standard library, particularly the modules oriented toward Internet protocols and client development. Listed below are related modules, first focusing on electronic mail followed by Internet protocols in general.

17.5.1 E-mail

Python features numerous e-mail modules and packages to help you with building an application. Some of them are listed in Table 17.5.

Table 17.5. E-Mail-Related Modules

image

17.5.2 Other Internet Protocols

Table 17.6. Internet Protocol-Related Modules

image

17.6 Exercises

FTP

17-1. Simple FTP Client. Given the FTP examples from this chapter, write a small FTP client program that goes to your favorite Web sites and downloads the latest versions of the applications you use. This may be a script that you run every few months to make sure you’re using the “latest and greatest.” You should probably keep some sort of table with FTP location, login, and password for your convenience.

17-2. Simple FTP Client and Pattern-Matching. Use your solution to the previous exercise as a starting point for creating another simple FTP client that either pushes or pulls a set of files from a remote host using patterns. For example, if you want to move a set of Python or PDF files from one host to another, allow users to enter “*.py” or “doc*.pdf” and only transfer those files whose names match.

17-3. Smart FTP Command-Line Client. Create a command-line FTP application similar to the vanilla Unix /bin/ftp program, but make it a “better FTP client,” meaning it should have additional useful features. You can take a look at the ncFTP application as motivation. It can be found at http://ncftp.com. For example, it has the following features: history, bookmarks (saving FTP locations with login and password), download progress, etc. You may have to implement readline functionality for history and curses for screen control.

17-4. FTP and Multithreading. Create an FTP client that uses Python threads to download files. You can either upgrade your existing Smart FTP client as in the previous problem, or just write a simpler client to download files. This can be either a command-line program where you enter multiple files as arguments to the program, or a GUI where you let the user select one or more files to transfer. Extra credit: Allow patterns, e.g., *.exe. Use individual threads to download each file.

17-5. FTP and GUI. Take your smart FTP client developed above and add a GUI layer on top of it to form a complete FTP application. You may choose from any of the modern Python GUI toolkits.

17-6. Subclassing. Derive ftplib.FTP and make a new class FTP2 where you do not need to give “STOR filename” and “RETR filename” commands with all four (4) retr*() and stor*() methods ... you only need to pass in the filename. You may choose to either override the existing methods or create new ones with a ‘2’ suffix, i.e., retrlines2().

The file Tools/scripts/ftpmirror.py in the Python source distribution is a script that can mirror FTP sites, or portions thereof, using the ftplib module. It can be used as an extended example that applies to this module. The next five problems feature creating solutions that revolve around code like ftpmirror.py. You may use code in ftpmirror.py or implement your own solution with its code as your motivation.

17-7. Recursion. The ftpmirror.py script copies a remote directory recursively. Create a simpler FTP client in the spirit of ftpmirror.py but one that does not recurse by default. Create an “-r” option that tells the application to recursively copy subdirectories to the local filesystem.

17-8. Pattern-Matching. The ftpmirror.py script has an “-s” option that lets users skip files that match the given pattern, e.g., “.exe”. Create your own simpler FTP client or update your solution to the previous exercise so that it lets the user supply a pattern and only copy those files matching that pattern. Use your solution to an earlier problem above as a starting point.

17-9. Recursion and Pattern-Matching. Create an FTP client that integrates both of the previous exercises.

17-10. Recursion and ZIP files. This problem is similar to the first recursion exercise above—instead of copying the remote files to the local filesystem, either update your existing FTP client or create a new one to download remote files and compress them into a ZIP (or TGZ or BZ2) file. This “-z” option allows your users to back up an FTP site in an automated manner.

17-11. Kitchen Sink. Implement a single, final, all-encompassing FTP application that has all the solutions to the exercises above, i.e., “-r”, “-s”, and “-z” options.

NNTP

17-12. Introduction to NNTP. Change Example 17.2 (getLatestNNTP.py) so that instead of the most recent article, it displays the first available article meaningfully.

17-13. Improving Code. Fix the flaw in getLatestNNTP.py where triple-quoted lines show up in the output. This is because we want to display Python interactive interpreter lines but not triple-quoted text. Solve this problem by checking whether the stuff that comes after the “>>>” is real Python code. If so, display it as a line of data; if not, do not display this quoted text. Extra credit: Use your solution to solve another minor problem: leading whitespace is not stripped from the body because it may represent indented Python code. If it really is code, display it; otherwise, it is text so lstrip() that before displaying.

17-14. Finding Articles. Create an NNTP client application that lets the user log in and choose a newsgroup of interest. Once that has been accomplished, prompt the user for keywords to search article Subject lines for. Bring up the list of articles that match the requirement and display them to the user. The user should then be allowed to choose an article to read from that list—display them and provide simple navigation like pagination, etc. If no search field is entered, bring up all current articles.

17-15. Searching Bodies. Upgrade your solution to the previous problem by searching both Subject lines and article bodies. Allow for AND or OR searching of keywords. Also allow for AND or OR searching of Subject lines and article bodies, i.e., keyword(s) must be in Subject lines only, article bodies only, either, or both.

17-16. Threaded Newsreader. This doesn’t mean write a multithreaded newsreader—it means organize related postings into “article threads.” In other words, group related articles together, independent of when the individual articles were posted. All the articles belonging to individual threads should be listed chronologically though. Allow the user to:

(a) select individual articles (bodies) to view, then have the option to go back to the list view or to previous or next article either sequentially or related to the current thread.

(b) allow replies to threads, option to copy and quote previous article, reply to the entire newsgroup via another post. Extra credit: Allow personal reply to individual via e-mail.

(c) permanently delete threads—no future related articles should show up in the article list. For this, you will have to temporarily keep a persistent list of deleted threads so that they don’t show up again. You can assume a thread is dead if no one posts an article with the same Subject line after several months.

17-17. GUI Newsreader. Similar to an FTP exercise above, choose a Python GUI toolkit to implement a complete standalone GUI newsreader application.

17-18. Refactoring. Like ftpmirror.py for FTP, there is a demo script for NNTP: Demo/scripts/newslist.py. Run it. This script was written a long time ago and can use a facelift. For this exercise, you are to refactor this program using features of the latest versions of Python as well as your developing skills in Python to perform the same task but run and complete in less time. This can include using list comprehensions or generator expressions, using smarter string concatenation, not calling unnecessary functions, etc.

17-19. Caching. Another problem with newslist.py is that, according to its author, “I should really keep a list of ignored empty groups and re-check them for articles on every run, but I haven’t got around to it yet.” Make this improvement a reality. You may use the default version as-is or your newly improved one from the previous exercise.

E-MAIL

17-20. Identifiers. The POP3 method pass_() is used to send the password to the server after giving it the login name using user(). Can you give any reasons why you believe this method was named with a trailing underscore, i.e., “pass_()”, instead of just plain old “pass()”?

17-21. IMAP. Now that you are familiar with how POP works, your experience will help you with an IMAP client. Study the IMAP protocol RFC document, and use the Python imaplib module to help you.

The next set of exercises deals with the myMail.py application found in this chapter (Example 17.3).

17-22. E-mail Headers. In myMail.py, the last few lines compared the originally sent body with the body in the received e-mail. Create similar code to assert the original headers. Hint: Ignore newly added headers.

17-23. Error Checking. Add SMTP and POP3 error-checking.

17-24. SMTP and IMAP. Take our simple myMail.py, and add support for IMAP. Extra credit: Support both mail download protocols, letting the user choose which to use.

17-25. E-mail Composition. Further develop your solution to the previous problem by giving the users of your application the ability to compose and send e-mail.

17-26. E-mail Application. Further develop your e-mail application, turning it into something more useful by adding in mailbox management. Your application should be able to read in the current set of e-mail messages in a user’s inbox and display their Subject lines. Users should be able to select messages to view. Extra credit: Add support to view attachments via external applications.

17-27. GUI. Add a GUI layer on top of your solution to the previous problem to make it practically a full e-mail application.

17-28. Elements of SPAM. Unsolicited junk e-mail, or spam, is a very real and significant problem today. There are many good solutions out there, validating this market. We do not want you to (necessarily) reinvent the wheel but we would like you to get a taste of some of the elements of spam.

(a) “mbox” format. Before we can get started, we should convert any e-mail messages you want to work on to a common format, such as the “mbox” format. (There are others that you can use if you prefer.) Once you have several (or all) work messages in mbox format, merge them all into a single file. Hint: see the mailbox module and email package.

(b) Headers. Most of the clues of spam lie in the e-mail headers. (You may wish to use the email package or parse them manually yourself.) Write code that answers questions such as:

– What e-mail client appears to have originated this message? (Check out the X-Mailer header.)

– Is the message ID (Message-ID header) format valid?

– Are there domain name mismatches between the From, Received, and perhaps Return-Path headers? What about domain name and IP address mismatches? Is there an X-Authentication-Warning header? If so, what does it report?

(c) Information Servers. Based on an IP address or domain, servers such as WHOIS, SenderBase.org, etc., may be able to help you identify the location where a piece of bulk e-mail originated. Find one or more of these services and build code to find the country of origin, and optionally the city, network owner name, contact info, etc.

(d) Keywords. Certain words keep popping up in spam. You have no doubt seen them before, and in all of their variations, including using a number resembling a letter, capitalizing random letters, etc. Build a list of frequent words that you have seen definitely tied to spam, and quarantine such messages as possible spam. Extra credit: Develop an algorithm or add keyword variations to spot such trickery in messages.

(e) Phishing. These spam messages attempt to disguise themselves as valid e-mail from major banking institutions or well-known Internet Web sites. They contain links that lure readers to Web sites in an attempt to harvest private and extremely sensitive information such as login names, passwords, and credit card numbers. These fakers do a pretty good job of giving their fraudulent messages an accurate look-and-feel. However, they cannot hide the fact that the actual link that they direct users to does not belong to the company they are masquerading as. Many of them are obvious giveaways, i.e., horrible-looking domain names, raw IP addresses, and even IP addresses in 32-bit integer format rather than in octets. Develop code that can determine whether e-mail that looks like official communication is real or bogus.

Miscellaneous

A list of various Internet protocols, including the three highlighted in this chapter, can be found at http://www.networksorcery.com/enp/topic/ipsuite.htm#Application%20layer%20protocols. A list of specific Internet protocols (currently) supported by Python can be found at http://docs.python.org/lib/internet.html

17-29. Developing Alternate Internet Clients. Now that you have seen four examples of how Python can help you develop Internet clients, choose another protocol with client support in a Python standard library module and write a client application for it.

17-30. *Developing New Internet Clients. Much more difficult: find an uncommon or upcoming protocol without Python support and implement it. Be serious enough that you will consider writing and submitting a PEP to have your module included in the standard library distribution of a future Python release.
