CHAPTER 5
Networking Appliances

All the appliances described here are the building blocks of modern data centers. They enable both the establishment of network boundaries and the deployment of applications. In both physical and virtual form, they also enable Cloud Computing inasmuch as networking is its essential component.

The first critical network appliance, described in Section 5.1, is the Domain Name System (DNS) server. To access any resource on the Internet (a web page, a mailbox, a telephone to receive a call), one ultimately needs to specify the IP address of the resource. An application, however, is not supposed to know it (and rarely does). Instead, it uses a resource locator—a character string specified according to the application-layer naming scheme. The locator is then translated by the DNS into an associated IP address in real time. Beyond supporting names that are easy for humans to remember, this arrangement supports resource mobility and can also be utilized, in support of elasticity, for load balancing—that is, the distribution of work among several servers that perform identical operations. But translation is not the only service the DNS provides—it is also used for service discovery, which makes the DNS infrastructure particularly critical for networking and Cloud Computing.
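To make the translation step concrete, here is a minimal Python sketch that asks the operating system's resolver to translate a host name into socket addresses, which is essentially what an application does behind the scenes; the host name and port used here are only illustrative placeholders.

    import socket

    # Ask the OS resolver (and, behind it, the DNS) to translate a name into
    # one or more socket addresses. The name and port below are only examples.
    for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", 80,
                                                        proto=socket.IPPROTO_TCP):
        print(family, sockaddr)

Note that a single name may map to several addresses; handing different clients different address sets is one simple way in which DNS-based load balancing is realized.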

The second appliance described in this chapter is called a firewall. Firewalls are major implements of security—particularly when it comes to network isolation and protection.

Third comes the discussion of the Network Address Translation (NAT) appliance—one that is as controversial as it is necessary, what with the fast-depleting IPv4 address space. Although NATs are almost exclusively implemented in firewalls, they are different in their function.

We conclude the chapter with a brief review of the load balancer appliance, which in large part enables elasticity of the Cloud. Inasmuch as the space of this manuscript allows, we review different flavors of load balancing, finishing with the load balancing performed solely by DNS servers.

5.1 Domain Name System

The major motivation for developing DNS came from e-mail, or rather its particular implementation. Even though e-mail has existed as a service in one form or another since the mid-1960s, it was Ray Tomlinson1 who, while developing the e-mail package for the TENEX operating system, defined a resource locator of the form <user>@<host>, in which the “@” character separates the ASCII-encoded user name and host name. This form of addressing became standard in the ARPANET in 1972, when e-mail support was incorporated into the File Transfer Protocol (FTP), as codified in RFC 385.2

Of course, for e-mail to function, it was essential that the host name be translated into a proper IP address. Each host kept a copy of the translation file, which apparently was not much of a problem at the time: in 1971, there were only 23 hosts on the ARPANET. But two years later, in 1973, Peter Deutsch wrote in RFC 606:3 “Now that we finally have an official list of host names, it seems about time to put an end to the absurd situation where each site on the network must maintain a different, generally out-of-date, host list for the use of its own operating system or user programs.”

Indeed, that “absurd situation” came to an end. After several RFCs written in response to RFC 606,4 a one-page RFC 625,5 published on March 7, 1974, confirmed that the Stanford Research Institute (now SRI International6) would keep the master copy of the host translation file, called HOSTS.TXT, which would be made available to other hosts. RFC 625 insisted that (1) the file be maintained in ASCII text form and (2) the FTP protocol be used to access the file.

Seven years later, that solution started to look absurd, too, but for a very different reason. The culprit was the e-mail addressing system, now standardized, in which the sender had to spell out the whole relaying path to the receiver.7 Not only was this process cumbersome, but it was practically impossible for a human to find an alternative path when a given path failed.

DNS history was made on January 11, 1982, when 22 Internet engineers participated in a meeting called by Jon Postel8 “to discuss a few specific issues in text mail systems for the ARPA Internet.” RFC 805,9 from which the above quotation is taken, is a remarkable record of the discussion and decisions. The first specific issue mentioned was relaying of e-mail, and the RFC listed several proposals, noting that “One of the interesting ideas that emerged from this discussion was that the ‘user@host’ model of a mailbox identifier should, in principle, be replaced by a ‘unique-id@location-id’ model, where the unique-id would be a globally unique id for this mailbox (independent of location) and the location-id would be advice about where to find the mailbox.” This idea was not pursued because “… it was recognized that the ‘user@host’ model was well established and … so many different elaborations of the ‘user’ field were already in use.”10

There was agreement, however, “that the current ‘user@host’ mailbox identifier should be extended to ‘user@host.domain’, where ‘domain’ could be a hierarchy of domains. In particular, the ‘host’ field would become a ‘location’ field and the structure would read (left to right) from the most specific to the most general.” It was the job of specialized databases—name servers—to store the information related to domains and provide it when queried. The most remarkable part of this vision was (and, in our opinion, still is) the concept of separation of a service from the host that provides that service. Ultimately, this enabled the implementation of the World-Wide Web a decade later, followed by that of IP telephony, IPTV, and, ultimately, the Cloud. With only e-mail as a solid reference point, envisioning a system that would support a multitude of abstract services is an act that fits the definition of a genius in the narrative of Vladimir Nabokov's novel, The Gift [1]: “Genius is an African who dreams up snow.” RFC 882 says: “We should be able to use names to retrieve host addresses, mailbox data, and other as yet undetermined information.”

With respect to the envisioned queries, three separate services were identified, but the agreement was that only one of these services—the one that translated the hostname location-id into the respective IP address—was essential at the moment. It was also recognized that the name servers could return other information (such as that for mail procedures employed at the host), although this information could be obtained in other ways, too.

A critical architectural decision was made on the issue of central vs. distributed implementation of the name query: “It is recognized that having separate servers for each domain has administrative and maintenance advantages, but that a central server may be a useful first step. It is also recognized that each distinct database should be replicated a few times and be available from distinct servers for robust and reliable service.” It was in this document that the term recursive server first appeared, in the context of the following example:

“Suppose that the new mailbox specification is of the form ‘user@host.domain’, e.g., ‘user@F.ISI.IN’. A source host sending mail to this address first queries a name server for the domain IN (giving the whole location ‘F.ISI.IN’). The result of the query is either (1) the final address of the destination host (F.ISI), or (2) the address of a name server for ISI, or (3) the address of a forwarder for ISI. In cases 1 and 3, the source host sends the mail to the address returned. In case 2, the source host queries the ISI name server and … (recursive call to this paragraph).”

With a few major requirements for the system coming out of that meeting, its design principles and implementation considerations were respectively laid out in RFC 88211 and RFC 883,12 both authored by Paul Mockapetris, who took on the task of designing DNS. Both documents13 were published in November 1983, and at the same time Jon Postel published the project management document, RFC 881,14 with the plan both to complete the rollout of DNS in the ARPANET and to discontinue the maintenance of the HOSTS.TXT file a year later—in December 1984.

The first implementation of DNS software, the Berkeley Internet Name Domain (BIND), was released by the University of California at Berkeley. Later versions of BIND were written at Digital Equipment Corporation, and, finally, the development of BIND moved to the Internet Systems Consortium.15 It is now an open-source project.

In the years that followed, the standards evolved along with their implementations, but the changes were incremental, except for one major change concerned with security. Indeed, security was neither a requirement nor a concern initially—scalability was. Hence, security was an add-on, unfortunately, and it was specified as a reaction after a serious vulnerability had been exploited. A security solution, called DNSSEC (which we will discuss in more detail later), was published by the IETF in 2005, although serious problems stood in the way of its deployment. DNSSEC is also only a partial solution. As we will see, a number of threats can be mitigated with firewalls, but much still depends—and will depend—on consistent implementations of security practices by both network providers and their customers. To this end, the Cloud can help implement and enforce security policies consistently.

The rest of this section explains the DNS architecture and operation, and then—as an important aside—introduces the top-level domain classification and the issue of internationalization of domain names. The discussion of security issues and DNSSEC concludes the section.

5.1.1 Architecture and Protocol

The components of the DNS architecture are depicted in Figure 5.1. The component that is closest to the end user is the resolver, which the application process queries. The resolver is expected to be (and typically is) part of an operating system, and therefore no protocol is defined for the interface between the application process and the resolver; the interface is rather a matter of an application programming interface.


Figure 5.1 DNS components.

A resolver, in turn, gets the information the process needs by querying one or more name servers. The responses are stored in a local cache, since it is natural to expect repeated references to resources in a particular domain. The cache keeps the information for a time not to exceed the value of the Time-To-Live (TTL) parameter of the record it stores. When a query arrives at the resolver, it first checks whether it has the record in the cache. If not, it starts querying the name servers using the DNS protocol (which, by the way, operates over UDP, using standard port 5316).
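The caching logic can be illustrated with a minimal sketch; the class and method names below are ours, introduced only for illustration, and the address shown is just an example. A record is served from the cache only while its TTL has not expired.

    import time

    class ResolverCache:
        """A toy resolver cache keyed by (name, record type), honoring the TTL."""

        def __init__(self):
            self._store = {}   # (name, rtype) -> (expiry time, data)

        def put(self, name, rtype, data, ttl):
            self._store[(name.lower(), rtype)] = (time.time() + ttl, data)

        def get(self, name, rtype):
            entry = self._store.get((name.lower(), rtype))
            if entry is None:
                return None                   # not cached: must query a name server
            expiry, data = entry
            if time.time() > expiry:          # TTL expired: discard and re-query
                del self._store[(name.lower(), rtype)]
                return None
            return data

    cache = ResolverCache()
    cache.put("www.example.com", "A", "93.184.216.34", ttl=300)
    print(cache.get("www.example.com", "A"))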

The name servers implement the domain name space and they contain the resource records associated with the names. The DNS was designed to accommodate exponential growth of the name space over time; consequently, the domain name space is implemented as a tree structure. As RFC 1034 puts it, “Conceptually, each node and leaf of the domain name space tree names a set of information, and query operations are attempts to extract specific types of information from a particular set.”

Figure 5.2 elucidates this model further. A name server is responsible for a subset of the domain space, called a zone, and this name server is called an authority (or an authoritative name server) for that zone. In addition to providing zone-related information, a name server provides the addresses of other name servers—those outside its zone.


Figure 5.2 The domain name space tree.

A zone administrator can create other zones, thus establishing another branch in the tree. According to RFC 1034, “the database is partitioned at points where a particular organization wants to take over control of a sub-tree. Once an organization controls its own zone it can unilaterally change the data in the zone, grow new tree sections connected to the zone, delete existing nodes, or delegate new subzones under its zone.”

To ensure reliability, redundancy is introduced by keeping several identical slave name servers in the zone, which are automatically updated by the master server.

The syntax of the domain name, which is an alphanumeric string, corresponds strictly to the domain name space tree in that the locator of every resource spells out the full path from the root of the tree. Each node of the tree has a label (up to 63 bytes in length). Note that the root label is of length zero—it is an empty string. The domain name is a sequence of labels spelled—by a long-established convention stemming from early e-mail addressing—from right to left. Hence the rightmost label always corresponds to the top-level domain.

Let us look at an example, www.stevens.edu, and ask what exactly the label at the very top of the tree is here. (No, it is not “edu”!) Actually, it is an empty string—the root label—as it is supposed to be, by definition, but the above address is technically misspelled, even though existing browsers make up for that. In other words, the correct (or fully qualified) name would have the dot (.) on the right—as in ‘www.stevens.edu.’ The dot is almost invariably omitted now (it looks a bit redundant, and it is), but in the early days this omission was a concern because of a potential security problem raised in RFC 1535.17

Labels are case-insensitive.18 Thus, www.StEVenS.eDU is resolved the same way as www.stevens.edu. By definition, a hostname is a domain name that has at least one IP address associated with it. Not every domain name is a hostname (e.g., neither .com nor .edu is). With domain name internationalization, the way in which domain names are expressed has grown to accommodate multiple languages and their respective alphabets.
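Both points are easy to see programmatically; the following Python lines (purely illustrative) split a fully qualified name into its labels and compare two names case-insensitively.

    fqdn = "www.stevens.edu."        # fully qualified: note the trailing dot
    labels = fqdn.split(".")
    print(labels)                    # ['www', 'stevens', 'edu', ''] -- the final
                                     # empty string is the root label

    # Labels are case-insensitive, so a resolver may simply normalize them.
    print("www.StEVenS.eDU".lower() == "www.stevens.edu".lower())   # True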

We have mentioned the client side of DNS so far only in the context of the resolver, which keeps querying the servers19 until it finds the address. But the need to smooth the network traffic—especially in the presence of short, simple queries—demanded the introduction of a name server that can operate in the “recursive mode” (that is, itself act as a client, querying multiple servers before returning an answer). A recursive name server (depicted in Figure 5.3) also caches answers. Unfortunately, both cache-keeping and open recursive service provision have become enablers of security attacks, which we will review later in this chapter.


Figure 5.3 A recursive name server.

The default (and mandatory-to-implement) mode for a name server is non-recursive. In this mode, it answers queries using only local information. It responds by providing either (a) an answer, (b) an error, or (c) a referral to another server. To use the recursive mode, both clients and the server must agree to it.

To demystify the DNS operation, we will introduce the basics of the DNS protocol.

Figure 5.4 presents the common format of both DNS queries and DNS responses. Aligned on the 16-bit boundary are the main sections of the protocol data unit.


Figure 5.4 The DNS query/response format. Source: RFC 1035.

The most involved section is the Header, which contains the:

  • identification (ID) of the unit, so as to correlate replies with outstanding queries;
  • tag bit (QR), which indicates whether the unit is a query or a response;
  • OPCODE, which specifies the kind of query;20
  • flag (AA), which is meaningful only in responses and indicates whether the response is authoritative;
  • truncation flag (TC), which is set if the message has been truncated because it was longer than permitted on the transmission channel;
  • two flags—Recursion Desired (RD) and Recursion Available (RA)—of which the former is used only in a query, directing the name server to pursue the query recursively, and the latter comes in responses, indicating whether the server actually supports recursive queries;
  • the Z field, which is reserved for future use and expected to be zero;
  • RCODE, which specifies the response code; and
  • QDCOUNT, ANCOUNT, NSCOUNT, and ARCOUNT, integers respectively specifying the number of entries in the subsequent sections (Question, Answer, Authority, and Additional).

The Question section is an array (of size QDCOUNT) of records that contain QNAME (a domain name), QTYPE (more on this when we discuss the resource records), and QCLASS, which for all practical purposes always has a value of IN (for Internet).21
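The header and Question layouts become clearer if one hand-crafts a query. The following sketch, which assumes only the Python standard library, packs the 12-byte header (a random ID, the flags word with the RD bit set, and the four counters) and a single question for an A record of an example name.

    import secrets
    import struct

    def build_query(name: str, qtype: int = 1, qclass: int = 1) -> bytes:
        """Build a DNS query for `name` (QTYPE 1 = A, QCLASS 1 = IN)."""
        query_id = secrets.randbits(16)      # random ID to correlate the reply
        flags = 0x0100                       # QR = 0 (query), RD = 1 (recursion desired)
        header = struct.pack("!HHHHHH", query_id, flags,
                             1,   # QDCOUNT: one entry in the Question section
                             0,   # ANCOUNT
                             0,   # NSCOUNT
                             0)   # ARCOUNT
        qname = b""
        for label in name.rstrip(".").split("."):
            qname += bytes([len(label)]) + label.encode("ascii")
        qname += b"\x00"                     # zero-length root label ends QNAME
        return header + qname + struct.pack("!HH", qtype, qclass)

    # The resulting datagram could be sent over UDP to port 53 of a name server.
    print(build_query("www.example.com").hex())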

The Answer, Authority, and Additional sections are identical in that each contains a set of Resource Records (RRs).

The RR structure is demonstrated in Figure 5.5, which is for the most part self-explanatory.


Figure 5.5 The RR structure. Source: RFC 1035.

The record TYPE is one field that has kept gaining new values. One example is A—for IPv4 addresses, later supplemented by AAAA—for IPv6 addresses. Other examples are Start of Authority (SOA), CERT (for cryptographic certificates), and SRV (for services). The latter, specified by RFC 2782,22 allows one to specify the location of the servers performing a given service. This feature can be very helpful both with e-mail and IP telephony, where a resource locator (explained later in this section) can be used to discover a specific service offered by a domain.
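As an illustration of service discovery via SRV records, the snippet below uses the third-party dnspython library (an assumption on our part; it is not part of the Python standard library) and an example service name following the _service._protocol.domain convention of RFC 2782.

    import dns.resolver   # third-party package "dnspython" (version 2.x API)

    # Each SRV record carries a priority, a weight, a port, and a target host.
    for rr in dns.resolver.resolve("_sip._tcp.example.com", "SRV"):
        print(rr.priority, rr.weight, rr.port, rr.target)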

Playing with the strings may become complex (as all compiler writers know). One problem, depicted in Figure 5.6, is circular dependencies, which arise when the name server named in a delegation lies within the very subdomain being delegated, so that resolving the server's name requires the delegation itself. For such cases, RFC 1035 postulates that glue (i.e., actual addresses) be specified in the referral.


Figure 5.6 Circular dependencies.

The Time-to-Live (TTL) parameter is one interesting example where an early (1983) specification had to be changed. It was found that a 16-bit integer was too small to specify the appropriate time value in seconds, and so the field grew to 32 bits.

Figure 5.7 provides an example of resolving a query to a recursive name server to find an IP address of the domain www.cs.stevens.edu.


Figure 5.7 A sample name resolution.

The process starts from the root, h.root-servers.net [128.63.2.53]. (The significance of the letter “h” in front of the name is explained in the next section.) The response contains a set of referrals to the servers supporting the “.edu” domain. One such referral record is “edu. 172800 IN NS d.edu-servers.net.” (Note the high value of the TTL—the higher the level, the more stable the records are.)

The query to the referred server resolves to the authoritative name servers (later supplemented by their IP addresses as glue) and then to the canonical (i.e., primary) domain name record: www.cs.stevens.edu. 3600 IN CNAME tarantula.srcit.stevens-tech.edu, a record type we encounter here for the first time. This feature is needed to support alias names, as a given domain can provide different services under different names. CNAME merely points to the canonical domain name (in our example, tarantula.srcit.stevens-tech.edu). The next query must use that name. Finally, we get the IP address in the A record (155.246.89.8).
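The resolution sequence of Figure 5.7 can be reproduced in a simplified form. The sketch below again assumes the third-party dnspython library; error handling, CNAME chasing, and the case of referrals without glue are treated only rudimentarily, and the starting address is that of a.root-servers.net.

    import dns.message
    import dns.query
    import dns.rdatatype

    def iterative_resolve(name: str, start_ip: str = "198.41.0.4") -> None:
        """Follow referrals from a root server down to an answer."""
        server = start_ip
        while True:
            query = dns.message.make_query(name, dns.rdatatype.A)
            response = dns.query.udp(query, server, timeout=5)
            if response.answer:                   # final answer (A or CNAME records)
                for rrset in response.answer:
                    print(rrset)
                return
            # Otherwise this is a referral: pick a glue address from the
            # Additional section and query the next server down the tree.
            glue = [rr for rrset in response.additional for rr in rrset
                    if rr.rdtype == dns.rdatatype.A]
            if not glue:
                print("Referral without glue; the NS name itself must be resolved first.")
                return
            server = glue[0].address

    iterative_resolve("www.example.com")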

5.1.2 DNS Operation

Based on what we have reviewed so far, it should be clear that running a top-level domain server is a formidable technical task (what with all the databases that need to be replicated across the borders and kept secure at the same time). But technical complexities aside, the mere fact that domain names can be bought and sold has naturally complicated things.

Once DNS was adopted, Jon Postel founded the Internet Assigned Numbers Authority (IANA),23 which has, among its many tasks, administered the root zone. In 1998, IANA became a department of the newly created Internet Corporation for Assigned Names and Numbers (ICANN),24 which performs, among other services under a US government contract: “(1) the coordination of the assignment of technical protocol parameters including the management of the address and routing parameter area (ARPA) top-level domain; (2) the administration of certain responsibilities associated with Internet DNS root zone management such as generic (gTLD) and country code (ccTLD) Top-Level Domains; [and] (3) the allocation of Internet numbering resources …”

The Top-Level Domain (TLD) names correspond to the DNS root zone. The information about all TLDs is replicated on 13 root servers (or rather root systems, as we will explain shortly) named by the respective letters of the alphabet—“A” through “M.” Each of these servers has a name, starting with this letter, to which the suffix “.root-servers.net” is appended. To initialize the cache of a resolver (or a recursive DNS server), one can obtain a hint file with the appropriate IP addresses from www.internic.net/zones/named.root. The systems, along with their respective operators and the contact IPv4 and IPv6 addresses, are listed in Figure 5.8.


Figure 5.8 Root name systems. Source: Internet Assigned Number Authority, www.iana.org
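The hint file mentioned above is plain ASCII text. As a sketch (assuming outbound web access and using the URL cited in the text), the following lines download it and print the lines carrying the IPv4 addresses of the root servers.

    import re
    import urllib.request

    # URL as cited in the text; the file lists NS, A, and AAAA records for the root.
    HINTS_URL = "https://www.internic.net/zones/named.root"

    with urllib.request.urlopen(HINTS_URL) as resp:
        hints = resp.read().decode("ascii", errors="replace")

    # Keep only the lines with IPv4 glue (record type A) for the root servers.
    for line in hints.splitlines():
        if re.search(r"\sA\s+\d+\.\d+\.\d+\.\d+", line):
            print(line)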

It is a common mistake to think that the servers are single hosts (although this was the case long ago). With one exception,25 they are replicated among different servers around the world, with the total number presently approaching 400. For this reason, the word “server” is a misnomer, and it is much better to refer to a system instead. (The L-system alone has 146 servers covering all continents!) In fact, most root systems operate as networks—they have AS numbers and enter peering agreements. Often, anycast IP addressing is used to find the machine that is the closest to the issuer of the query. There is a fascinating site, www.root-servers.org/, which provides detailed information on the location and operation of the root domain.

There is no master system or master server at the top domain.26 All root systems (or rather, actual hosts within each of them) have the same knowledge about the address records of the name servers authoritative for all existing TLDs. All the hosts are updated from the same set of files maintained by the administering organization, VeriSign Inc. (as a contractor). The DNS root operates under the authority of the National Telecommunications and Information Administration (NTIA), an agency of the US Department of Commerce.

5.1.3 Top-Level Domain Labels

Edu is an example of a TLD, whose use is reserved for accredited colleges and universities. Mil and gov are other examples of top-level domains with strictly controlled registries; these are respectively reserved for the US military and US government agencies. Some domains, such as com or biz (for business), were initially reserved for one purpose (in our example, commercial companies), but ended up being open to all. Ditto for net, which was initially created to designate networks but ended up being used for all kinds of purposes.

The history of the control over top-level domain names became rather tumultuous around 1998—with e-commerce developing and different groups (including a group of IETF engineers and Internet enthusiasts, who thought that the Internet belonged to them) asserting themselves. We have neither the space nor the time to address this history here; suffice it to note that ICANN was created for, among other things, managing TLDs. (The excerpt from the ICANN mission statement cited earlier should become clear now.)

ICANN distinguishes several groups of TLDs. The generic ones (gTLD) include the labels we use most, such as com, net, edu, and so on. The country code TLDs (ccTLDs) are based on the two-character ISO country codes. The infrastructure TLD (under the name arpa, which the reader may remember from the ICANN mission statement) is used mostly for the reverse look-up of IP addresses. The ccTLDs also exist in their respective internationalized versions (which we will discuss shortly), as does a special test domain created for testing internationalization. Finally, RFC 260627 defines domains created for nothing but testing, self-reference, and use as examples:28 “… four domain names are reserved as listed and described below …

  • ‘.test’ is recommended for use in testing of current or new DNS related code;
  • ‘.example’ is recommended for use in documentation or as examples;
  • ‘.invalid’ is intended for use in online construction of domain names that are sure to be invalid and which [sic] it is obvious at a glance are invalid;
  • ‘.localhost’ has traditionally been statically defined in host DNS implementations as having an A record pointing to the loop-back IP address and is reserved for such use.”

As we noted before, some gTLDs (such as .com or .net) ended up registering anyone and anything, and so judging by the label alone it is impossible to infer much about the entity registered under the domain. Such gTLDs are called unsponsored, as opposed to sponsored gTLDs (such as .edu, .mil, or .int—the latter reserved for international organizations created by treaties), which have specific communities or interest groups associated with such domains. It is a community for a sponsored gTLD that defines the policies determining eligibility for registration and enforces such policies.

And even then there are problems. An interesting and often-cited example is a gTLD named .xxx, dedicated to pornography and sponsored by the International Foundation for Online Responsibility (IFFOR). The domain was finally approved by ICANN in 2011 after a long controversy (just six years earlier, the ICANN Board had voted against the approval). The argument for its creation was that if a site is explicitly dedicated to pornography, it is easy to block it when needed (e.g., by parents or a company's IT organization). The counter-argument was that nothing prevented the same material from being distributed on a non-sponsored site (such as .com). The most interesting fact about this domain is that there are at least two examples of registrants whose content has nothing to do with pornography. One is www.kite.xxx, which focuses “100% on kite sports with an eye for product design.” There is speculation that the site was created tongue-in-cheek merely to point out that there is no restriction on the domain; another speculation is that being present in that gTLD is a good attention-getter. Go figure …

But even stranger things happen on that domain! According to the March 19, 2012 article in The Register®29, the domain registered as PopeBenedict.xxx was to promote … Islam. Apparently, this was a clear case of cyber-squatting as “A Turkish cyber-squatter has registered at least a dozen variants of Pope Benedict's name as .xxx internet domains …” All domains were offered for sale. The registrar then reserved the names that might deal with religion (such as anglican.xxx, vatican.xxx, or jewish.xxx) so as to prevent their registration.

Things are not quite so simple with the ccTLDs either. What does it take to register under a particular country name? It turns out that there is no simple answer. Some countries require proof of citizenship for registration (e.g., Albania for TLD .al). Others (e.g., Estonia for .ee or Germany for .de) require only the physical presence in the country of a local administrative contact. Still other countries, such as Eritrea, offer no registry services at all. To make things more confusing, countries can sell their TLDs. A well-known example is Tokelau's sale of its domain, .tk, to a Dutch company, DOT.TK, which, as Darren Pauli reported in his online article, has been host to spammers.30

An important development in the history of domain names is their internationalization, or the ability of users to employ the scripts of their respective languages. There are two aspects to this: (1) the encoding of a native script so that it can be displayed on a monitor by a browser (as in .中国); (2) the interworking of the script with the actual DNS use.

Support for internationalized scripts on terminals and keyboards has been around since the 1970s. There were various proprietary implementations—for computer terminals built by various manufacturers—but eventually a standard encoding scheme called Unicode was developed for this purpose by the Unicode Consortium.31 As for interworking with DNS, apparently the first proposal was published as an IETF Internet Draft in 1996 by M. Dürst32.

Much has been achieved since then, with the major standard for Internationalizing Domain Names in Applications (IDNA) published as RFC 3490.33 The overarching idea is the addition of new features without any modification of the existing infrastructure. In other words, only application programs change—not the DNS system. To this end, the latter uses only ASCII-encoded (not Unicode-encoded) labels.

Figure 5.9 demonstrates the components and illustrates the operation of IDNA.


Figure 5.9 Domain name internationalization components. Source: RFC 3490.

When the user types in the domain name in whatever language or script he or she chooses, it is the job of the application (typically, the browser) to convert the Unicode labels into ASCII-Compatible-Encoding (ACE) strings, and vice versa. The mapping is specified by Punycode, defined in RFC 349234 in 2003 and later updated. The ToASCII algorithm converts the Unicode-encoded string into a unique ASCII-encoded string and appends the result to the four-character prefix (“xn--”), so all IDNA labels start with these four characters. Conversely, the ToUnicode algorithm strips off the ACE prefix and converts the rest into a Unicode-encoded string.

It is the browser's job to perform conversions so that the whole DNS system—starting from the resolver—remains unaffected.35 ToASCII is executed on a string received from the user; ToUnicode is executed on a string received from the resolver.
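Python's standard library happens to include an IDNA (2003) codec, which is enough to illustrate the round trip; the label below is just a common textbook example.

    # ToASCII: Unicode label(s) to the ACE form carrying the "xn--" prefix.
    ace = "bücher.example".encode("idna")
    print(ace)                    # b'xn--bcher-kva.example'

    # ToUnicode: the ACE form back to the Unicode representation.
    print(ace.decode("idna"))     # bücher.example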

A major problem with internationalization (as is typical for all major computing problems) is security, and the specific problem here is spoofing. Even with plain ASCII, some characters (e.g., “l” and “1”) look alike in certain fonts. This could be exploited: one might create a www.paypa1.com site that looks exactly like the www.paypal.com site and lure (for instance, through e-mail) unsuspecting people to it, who will thus divulge their passwords.36

With internationalization, things get more serious. Can the reader see the difference between www.paypal.com and www.paypal.com? The authors certainly can't. Both strings look exactly the same on paper and on the computer screen, which is exactly the problem—the second domain name actually contains a Cyrillic character “a” in place of the same-looking Latin alphabet character. (The character was entered into this manuscript in Unicode.) If we compare the bit strings that correspond to these domain names, we find that they are different. Words that are spelled the same but have different meanings are called homographs. This definition has been broadened in computing to indicate strings that look the same. A Communications of the ACM article [2] reported on an actual homograph attack and warned of the possibilities of such attacks in the internationalized domain name system.
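The difference is invisible on the screen but obvious to a program, as the following sketch shows with the standard unicodedata module; the Cyrillic letter is written with an escape sequence so that it survives copying.

    import unicodedata

    latin   = "paypal.com"
    spoofed = "p\u0430ypal.com"       # U+0430 is CYRILLIC SMALL LETTER A

    print(latin == spoofed)           # False: the bit strings differ
    for ch in spoofed[:2]:
        print(repr(ch), unicodedata.name(ch))
    # 'p' LATIN SMALL LETTER P
    # 'а' CYRILLIC SMALL LETTER A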

Perhaps attacks could be prevented if all browsers forbade the use of combinations of Unicode characters that belong to different alphabets in the same label, but this would be hard to enforce. The solution is for the domain name registrars to forbid the practice of mixing characters from different alphabets by refusing to register domain names that contain such mixes. This suggests that a hierarchy of internationalized domain names be created and placed under the respective authorities.

A separate class of top-level domains was created in the DNS for internationalization: the Internationalized country code Top-Level Domains (IDN ccTLDs).

The ICANN Board approved the establishment of an internationalized top-level domain name working group in December 2006. A little less than three years after that, ICANN started accepting applications for top-level internationalized domain names from representatives of countries and territories. Figure 5.10 provides a few examples of the internationalized country code domain names.


Figure 5.10 Examples of internationalized country code top domain names. Source: Internet Assigned Number Authority, www.iana.org

Going back to our example, the (bad) paypal label should be disallowed for registration in all countries that employ the Cyrillic alphabet. (Just a few examples of such countries are Belarus, Bulgaria, Kazakhstan, Mongolia, Russia, Serbia, and Ukraine. Fortunately, everything seems to work, for the corresponding domain www.xn--pypl-53dc.com still does not exist!)

5.1.4 DNS Security

As is the case with the early Internet development, things were built to be used … by those who had built them. The fact that everything worked was both amazing and sufficient. No one suspected that the system would be abused (and, sure enough, in the early days there was no clear financial gain in abusing it either). As a consequence, DNS—on which all applications depended—was designed with no protection against potential attackers in mind. The idea, perhaps utopian, was that the Internet community would always be benevolent. Jon Postel put forward his famous robustness principle in RFC 760 (published back in 1980): “In general, an implementation should be conservative in its sending behavior, and liberal in its receiving behavior.” The last clause means that if a received PDU is badly formed but can be interpreted, it should be accepted rather than rejected. Unfortunately, this magnanimous attitude has been exploited.

The motivation for exploiting DNS vulnerabilities is simple and similar to the motivation for robbing banks: one does it because this is where the money is. If an attacker can somehow fool a recursive name server (or a resolver) into accepting a bad record pointing to the attacker's site (instead of the bank's site), the attacker can learn a good deal about any bank customer (including his or her password) who visits the attacker's site. Repeating the same trick (which is exactly what software is good at) on many customers and then using the obtained passwords to withdraw money is effectively equivalent to robbing the bank.

The major problem with the original DNS design was (and still is) that the records obtained by resolvers or recursive name servers are not authenticated.37 Ultimately, a response is considered to be valid as long as it matches the QueryID, the requestor's port number, and the original query. If a process (called the man-in-the-middle) can intercept a DNS request, it can respond with whatever records it makes up. The originator of the request cannot observe any difference. Creating a man-in-the-middle is not always possible, so attacks went to the next level of sophistication.

With a cache poisoning attack (thoroughly reviewed in [3]), the attacker guesses the QueryID and the port number (we will explain how in a moment) and then returns the “responses” to popular queries (which need to be guessed, too). As a result, the resolver's cache is “poisoned”—for a long TTL-specified time—with the attacker's IP address. Now, guessing the QueryID is fairly easy as long as the IDs are assigned incrementally. (An attacker can keep issuing its own queries to determine the current value.) The port number can be easy to guess, too, as long as it is not changing. Hence the countermeasure has been to randomize the QueryID (using a pseudo-random number generator) as well as the port numbers, although randomizing the port numbers is not that easy, as we will see later in this chapter when discussing NAT boxes. Still, nothing prevents attackers from sending many “responses” with random QueryIDs, and such attacks have been reported to be effective.
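A minimal illustration of the countermeasure, using the standard library's cryptographically strong generator: both the 16-bit QueryID and the ephemeral source port are drawn at random for every query (the port range shown is merely illustrative).

    import secrets

    def random_query_parameters():
        query_id = secrets.randbits(16)                     # unpredictable 16-bit ID
        src_port = 1024 + secrets.randbelow(65536 - 1024)   # unpredictable source port
        return query_id, src_port

    print(random_query_parameters())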

Furthermore, as late as 2008—long after the IETF published its DNS security extension standard—a new vulnerability was discovered, where the whole zone could be spoofed. Specifically, the attacker can configure a name server claiming that it is authoritative for a given zone. (There is nothing wrong or dangerous with the ability to do so per se, because the name servers higher in the hierarchy won't point to it.) Then the attacker sends “responses” with authority records, which delegate to the authoritative server for the zone to be spoofed. The name of the authoritative server will actually be correct, but the glue will provide the attacker's IP address instead.

There are remedies against this particular attack, too (and so name servers constantly get patched as a reaction to any given attack), but there are surprisingly straightforward solutions to a broad class of DNS security problems. RFC 383338 provides a comprehensive catalogue of the attacks.

Perhaps the simplest solution is to use an authenticated channel between the client and the name server, so that the origin of each record is clear. To this end, an IPsec channel would do the job.

The other, more generic solution is to authenticate each DNS record so that the client can check its origin. In 1997, the IETF developed a solution and published the first standard (RFC 2065, now obsolete) specifying the DNS Security extension (DNSSEC). The work on the standard continued, with the present set of specifications (DNSSEC-bis) contained in RFCs 4033–4035.

The scheme is to use public-key cryptography and verify the chain of trust top down, starting with the authoritative name server for the root. DNSSEC actually provides not only a record's origin authentication but also its integrity assurance (against any modification en route to the client). To enable this, DNSSEC adds new resource record types:

  1. Resource Record Signature (RRSIG);
  2. DNS Public Key (DNSKEY);
  3. Delegation Signer (DS); and
  4. Next Secure (NSEC).

DNSSEC also modifies the message header by adding new flags. Overall, these modifications result in much larger DNS response messages, a fact that can unfortunately be exploited in the denial-of-service attacks discussed later in this chapter.

The purpose of the new resource record types is described in what follows.

The RRSIG record stores the digital signature for the respective DNS RRset. The situation is somewhat complicated by allowing more than one private key to sign a zone's data (as may be required by different algorithms). It is the job of a security-aware resolver to learn a zone's public keys. This can be achieved by configuring a trust anchor (a public key or its hash that serves as a starting point for building the authentication chain to the DNS response). The security-aware resolver can learn the anchor either from its configuration or in the process of normal DNS resolution. It is for the latter purpose that the DNSKEY RR is introduced. With that, a security-aware resolver authenticates zone information by forming an authentication chain from a new DNSKEY to an already known DNSKEY. For this process to work, the resolver must, of course, be configured with at least one trust anchor.
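The following sketch shows one step of such a chain: fetching a zone's DNSKEY RRset with the DO (DNSSEC OK) bit set and verifying that the RRset is signed by one of its own keys. It assumes the third-party dnspython library (with the cryptography package installed); the zone and the recursive server address are illustrative, and a real validator would continue the chain through DS records up to a configured trust anchor.

    import dns.dnssec
    import dns.message
    import dns.name
    import dns.query
    import dns.rdatatype

    zone = dns.name.from_text("example.com")
    request = dns.message.make_query(zone, dns.rdatatype.DNSKEY, want_dnssec=True)
    response = dns.query.udp(request, "8.8.8.8", timeout=5)   # a public recursive server

    # The answer section should now hold the DNSKEY RRset and the RRSIG covering it.
    dnskeys = next(r for r in response.answer if r.rdtype == dns.rdatatype.DNSKEY)
    rrsig   = next(r for r in response.answer if r.rdtype == dns.rdatatype.RRSIG)

    try:
        # Verify the DNSKEY RRset against the keys it contains (the zone apex step).
        dns.dnssec.validate(dnskeys, rrsig, {zone: dnskeys})
        print("DNSKEY RRset signature verified")
    except dns.dnssec.ValidationFailure:
        print("Validation failed")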

The DS RR is used for signing delegations across zone boundaries; it contains a digest of a key-signing DNSKEY of the delegated child zone. Not that complexity always helps security, but RFC 4033 notes that “DNSSEC permits more complex authentication chains, such as additional layers of DNSKEY RRs signing other DNS RRs within a zone.”

In the default mode of operation, the authentication chain is constructed down from the root to the leaf zones, but DNSSEC allows local policies to override the default mode.

So far, we have described how DNSSEC deals with positive responses. In contrast, NSEC RR is used for signing negative responses. RFC 4033 explains: “The NSEC record allows a security-aware resolver to authenticate a negative reply for either name or type non-existence with the same mechanisms used to authenticate other DNS replies … Chains of NSEC records explicitly describe the gaps, or ‘empty space’, between domain names in a zone and list the types of RRsets present at existing names.”

As is true for every chain, a trust chain is as strong as its weakest link, and if this link happens to be at the root, then every chain would be weak. For this reason, the security of the top domain name servers is a major concern. ICANN has resolved this by scripting a ceremony39 for signing the root, where multiple personas with interesting-sounding names (such as Ceremony Administrator, Crypto-Officer, Safe Security Controller, Internal Witness, and so on) are choreographed into performing the required steps. These steps include ensuring that the safe for key storage is initially empty,40 bringing the key-generation equipment into the room, generating and signing the key and producing the certificates, backing up the keys on a smart card, and storing the recovery key material in tamper-proof hardware security modules, which are then placed in various safe-deposit boxes. All of this is done in front of the auditors, and each step is carefully documented in one or another log. Of course, all participants are themselves authenticated through their respective government-issued IDs. Different rooms have different entry permits—not everyone may enter the safe room, for example. Conversely, until the end of the ceremony or a pre-defined break, no one may leave the ceremony room. There are also procedures for annual inventories of the recovery material.

Unfortunately, the DNS system itself can be used in a denial-of-service attack (without breaking its own security), which we will discuss in the next section as part of a family of such attacks.

The importance of DNS to the Internet is hard to overestimate. The notion of “Internet governance” pretty much means the management of the DNS root and the top domain name space. As it happens, some DNS pioneers had a very different idea of Internet governance in the 1990s. The dramatic events that took place then are described in a well-researched monograph [4], whose authors—Professors of Law at Harvard University and Columbia University, respectively—also provide their opinions on the subject of Internet governance.

There has been much controversy surrounding ICANN, but we believe it is important to point out how much ICANN has achieved. To cite [4]: “… [ICANN] has decentralized the sale and distribution of domain names, resulting in a dramatic drop in the price of registration. It has established an effective mechanism for resolving trademark dispute that has diminished the problem of ‘cybersquatting’ … And it has maintained enough stability in the naming and numbering system that people rarely worry about the Internet collapsing.” Who could ask for anything more?

5.2 Firewalls

NIST publication [5] defines a firewall as a program or device that controls “… the flow of network traffic between networks or hosts that employ differing security postures.”

There are three aspects to this definition. First, a firewall does not need to be a “box”—a concept that we will elaborate on further when discussing network function virtualization. Second, as Figure 5.11 demonstrates, a firewall may stand between two networks as well as between a single host and a network. (To this end, a firewall may be supplied by an operating system, or—as we will see later—by a hypervisor, in a purely “soft” form.) Third, a firewall may not necessarily stand only between two different networks; different sections (or zones) of a single network—the sections that respectively employ different security policies—must be guarded separately.


Figure 5.11 Firewalls: (a) a firewall between two networks; (b) a firewall protecting a single host.

The last point, illustrated in Figure 5.12, is essential. It is commonplace that “a firewall is the first line of defense,” but an important question to ask is: who is the attacker? For instance, an attack41 can come from inside an organization, and the first line of defense here is to guard each part of the organization's network according to its respective policies.


Figure 5.12 Interconnecting networks with different security postures.

Firewalls are not there only to keep the “bad traffic” out. As we will see, their other purpose is to prevent inappropriate traffic from leaving the network. The notion of propriety is defined by security policies, and it changes with time and legal requirements. Interestingly, limiting the traffic that leaves a network is not necessarily a matter of secrecy; just as parents are responsible for the bad behavior of their children, so may the organizations in which malevolent traffic originates be held responsible for it.

The early history of firewalls is described in [6]. The firewall technology appeared much later than the data communications technologies reviewed in the earlier chapters of this book—as is the case with most security developments, the improvements in firewalls were reactive. The first firewalls merely separated LANs (and were intentionally built into the routers rather than into the Layer-2 switches so as to terminate broadcast traffic altogether). As [6] observes, these early firewalls were not deployed with security concerns in mind. True to the original meaning of a firewall (the means to prevent the spread of a fire from one room or building to another), the idea was merely to prevent the spread of local network problems, most of which at that time were caused by misconfiguration.

Firewalls as network security engines came to life in the early 1990s. Initially these were routers augmented with the capability to execute filtering rules that restricted certain destination and origin addresses. Meanwhile, research was conducted at Digital Equipment Corporation and AT&T Bell Laboratories on combining packet filtering with application gateways (Figure 5.13). The first commercial firewall is reported to have been sold in 1991.


Figure 5.13 An application gateway.

An application gateway does more than its name (a euphemism by all measures!) implies—it examines the network traffic at the application layer while remaining a part of what is effectively a network-layer device and continuing to examine the network and transport-layer datagrams. Of course, the idea is not only to examine traffic, but also to follow up on the examination by restricting it (i.e., dropping suspicious packets). One other—essential—task that started to be performed by firewalls at that time was logging. Information on potential attacks was and remains invaluable.

As far as Internet architecture purists are concerned, firewalls are an abomination. First of all, the application gateway development was a flagrant violation of the layering principle. Second, it resulted in breaking communications without informing the endpoints, which could not possibly deduce why the traffic between them was lost and therefore would be likely to assume network congestion and act accordingly.42

Yet, the introduction of application-layer firewalls (or application gateways) was widely embraced not only by the network administrators but also by the researchers and architects of the Internet. The authors of [7], one of whom43 was an avid Internet enthusiast—later elected to the Internet Architecture Board and heading the IETF Security Area—wrote in its first (1994) edition: “… We feel that firewalls are an important tool that can minimize the danger, while providing most—but not necessarily all—of the benefits of a network connection.”

Note that the above statement pre-dates the explosive growth of the World-Wide Web, e-commerce, the Internet bubble, and the wholesale movement to IP-based networking by telecommunications providers, banks, newspapers, advertisers, and criminals. To understand the threats addressed then, we need to look at Internet applications ca. the late 1980s. These included

  1. file transfer;
  2. e-mail (which, incidentally, transferred pretty much text only—attachments, still encoded in ASCII, became fashionable later);
  3. remote teletype (pure ASCII text) terminal services (via the telnet protocol);
  4. remote login package (including remote shell execution);
  5. name service (via finger protocol)—what would pass as a poor man's presence package; and
  6. Usenet—a bulletin-board discussion service and precursor to Internet fora.

Even with these services, incredibly simplistic and archaic in today's view, there were plenty of security problems. Some were fairly easy to anticipate (unauthorized access to remote computers via remote login), although still considerably hard both to prevent and prosecute. Abuse by hackers was more or less limited to stealing software and occasional e-mails, although the latter did not have as much effect as it does now because at the time intercompany communications were still carried out mostly in the form of printed memoranda, face-to-face meetings, and telephone calls (which were pretty secure when made with plain old telephony).

These seemingly meager (but as yet unexploited) means of abuse had tremendous effect with the 1988 Internet Worm attack, in which a Cornell University graduate student, who claimed that he had only attempted to determine the size of the Internet, cleverly exploited bugs in the respective implementations of the sendmail and finger protocols in the ubiquitous Berkeley Unix version, as well as the vulnerabilities caused by poor host administration, to infect a huge number of hosts with a program that kept replicating itself. The process running the program was designed to be—and would have been—unnoticed had it not been for the fact that, because of a miscalculation, the program replicated itself too many times, starting new processes and eventually choking a huge number of hosts. The attack is described in great detail in [8]. It caused enough of a sensation to warrant US Congress's request to the Government Accountability Office to investigate the matter. The resulting report [9] makes very interesting reading. It also demystifies the origin of the damage estimates44 floating on the Web. The specific recommendations presented in the report are marked as “not implemented”; however, reaction to the incident has been far-reaching.

One major outcome of the incident was that the Defense Advanced Research Projects Agency charged the Software Engineering Institute (SEI), a federally funded research center at Carnegie Mellon University, with creating the CERT® program45 “to quickly and effectively coordinate communication among experts during security emergencies in order to prevent future incidents and to build awareness of security issues across the internet community.” This program is still very effective in detecting security problems and analyzing product vulnerabilities.

The Internet Worm incident also helped the cause of firewall advocates in vendor companies by hinting that there was going to be a market in security products. This market has surely grown with the advance of the World-Wide Web, as have the attacks on the Internet. Accordingly, different types of firewalls and their respective hybrids have been built to protect new applications and to respond to new threats.

In what follows, we review the motivation (which, in many cases, was to respond to an attack) for introducing a specific firewall technology as well as its brief description. To this end, we discuss

  1. the basics of network perimeter control (including VPNs);
  2. stateless firewalls;
  3. stateful transport-layer firewalls; and
  4. application-layer firewalls.

5.2.1 Network Perimeter Control

It is a well-known practice that each organization that owns a network must have a security policy dictated by the business needs of the organization. The formal expression of such policy and its translation into something on which computers, in general, and firewalls, in particular, can act is the subject of ongoing research and development. For the purposes of this chapter, we merely list the issues that can be translated into the rules to be applied by firewalls.

The first issue is the specification of the entities outside the network that can access the resources (for example, web servers or telephony gateway servers) in the network. For each resource, the respective sets of network entities, users, or user groups allowed to access it must be specified.

The second issue is the specification of the time limit for accessing any given resource. A simple example is the specification of a contiguous time period or a set of contiguous time periods. When the access to resources is charged for, the problem is much more complex.

The third issue is the specification of the operations that are allowed to be performed on resources46 (for instance, whether a certain parameter is read-only, or its value can also be changed). As these may depend on specific users (for instance, a network manager employed by the network owner should be allowed to do much more than a customer), they are also specified as part of the access rule.

The fourth issue is the specification of the type of authentication that is required for a particular user. (We believe that this is by far the most complex subject around which the discipline and the industry of identity management—reviewed further in this book—has been built.)

The fifth issue is the specification of what types of exceptional (or even routine) data need to be logged and what acts should be cause for an immediate alarm.

The above five issues become ten when replicated to specify access from inside the network to the resources outside. We mentioned in the introduction to this section, and it is important to keep in mind, that the term “protection” when applied to a firewall has a dual meaning. Keeping bad packets out of the network is only one part of the problem; an equally important problem is restricting emigration: preventing certain packets from leaving the network. (The latter concept may appear peculiar47 at first sight, but consider the implications of confidential data leaking out of a company or of its employees visiting dangerous sites. Another important example: network providers are expected to prevent IP address spoofing by blocking IP traffic with source addresses not used within their respective networks.)

Policy specifications are then translated into rules. The firewalls must interpret the rules, which is a non-trivial problem studied in the discipline of logic programming. A well-known problem here is feature interaction (a rule addressing one feature may contradict a rule supporting another feature, leading to unanticipated consequences in real-time operation). It is expected that the policy description language will help specify the rules unambiguously and ensure that they can be applied in a non-order-dependent manner.

Again, two sets of policies—and consequently two sets of rules—apply: the policy that controls the admission of inbound traffic specifies ingress filtering; the “emigration” policy that specifies which packets may not leave the network protected by the firewall specifies egress filtering. This distinction is illustrated in Figure 5.14(a).

Diagram on the left shows how a network is protected by a firewall with egress and ingress policy. Diagram on the right shows a carrier's network with an ingress firewall and a private network with an egress firewall. Ingress policy controls the admission of inbound traffic and egress policy specifies which packets may not leave the network.

Figure 5.14 Ingress and egress filtering: (a) interfaces; (b) split CPE.

In effect then, there are two different firewalls that deal with the inbound and outbound traffic, respectively. When a carrier provides the firewall service in the Customer Premises Equipment (CPE), it makes sense to split it, as Figure 5.14(b) demonstrates, so that the ingress firewall is physically located within the carrier's network. Not only is bandwidth in the access network saved that way, but security is enhanced, too, since the carrier is much better equipped to mitigate certain denial-of-service attacks against the enterprise. Both points apply to home networking, which we will discuss later in this book in the context of network function virtualization.

Finally, when speaking of the “network perimeter,” we should return to the VPN discussion of Chapter 4 and recall that a network may actually be an archipelago (cf. Figure 4.2). There are two aspects to that:

  1. The islands of the archipelago must be kept indistinguishable from one another (and from the “mainland”) as far as the security policy is concerned.
  2. Given that each island is surrounded by potentially hostile waters, there is no such thing as one firewall separating “us and them.”

The first aspect is addressed by the fact that each island is indeed protected by one or more firewalls—one for each outside connection, which, of course, must implement the same policy as far as ingress and egress access is concerned. The second aspect is dealt with by employing IPsec to connect the firewalls, so that each tunnel is secure (independent of the underlying network implementation). See Figure 5.15.

Diagram shows how the employment of IPsec to connect the firewalls between private network segments makes a tunnel secure and independent of the underlying network implementations.

Figure 5.15 Layer-3 VPN with firewalls.

Now we are ready to review the services that the firewalls deliver. The simplest and oldest of all such services—a stateless firewall service—only filters IP packets based on the source and destination IP addresses, and, sometimes, on the protocol type.

5.2.2 Stateless Firewalls

To give a good example why it is important to restrict traffic based on the protocol number (and not only on the IP address), we need to introduce, briefly, another protocol—the Internet Control Message Protocol (ICMP), defined in RFC 79248 and updated in several other RFCs, notably for IPv6. ICMP has its own protocol number (which is, in fact, 1); it is not, however, a transport protocol for it does not carry arbitrary end-to-end payload. Its job is mainly to signal reachability problems among routers, although it was cleverly put to good use in applications such as ping (which allows us to determine if a host is present) and traceroute (which finds all the routers on a path to a host). Unfortunately, ICMP was also used in a series of DoS attacks49, starting with the 1997 smurf attack (allegedly perpetrated by a high-school student).

The attack, illustrated in Figure 5.16, is directed against a given server, whose IP address is known. The attacker spoofs this address in an ICMP echo request sent to a block of broadcast addresses. The routers obligingly propagate the request and, when terminating at LANs, helpfully translate it to the link-layer broadcast requests (to which hosts are required to respond). The participating networks are said to act as amplifiers. When all these hosts in the amplifiers “reply” to the server, the latter quickly becomes overwhelmed with processing the traffic, to the point of being useless for anything else. The sheer amount of traffic may also overwhelm the network in which the server is based. To this end, the CERT advisory CA-1998-0150 has noted: “Both the intermediary and victim of this attack may suffer degraded network performance both on their internal networks or on their connection to the Internet … A significant enough stream of traffic can cause serious performance degradation for small and mid-level ISPs that supply service to the intermediaries or victims. Larger ISPs may see backbone degradation and peering saturation.”

Diagram shows a smurf attack, where the attacker spoofs the IP address of a server in an ICMP echo request sent to a block of broadcast addresses. In a smurf attack, both the intermediary and the victim are exposed to degraded network performance.

Figure 5.16 A smurf attack.

The reaction to the smurf attack resulted in changing the router configurations and also modifying the Internet standard not to require (as a default) that packets directed to broadcast addresses actually be forwarded to them. In the absence of these measures, ingress filtering by a carrier, which drops IP packets whose source addresses do not match the network that issued them, prevents the carrier from becoming an amplifier of the attack. This is mandated by RFC 2827,51 which has the status of Best Current Practice. We also note that prevention of spoofing by an egress firewall of the originating carrier is the simplest solution of all (although it would not prevent attacks on hosts within that carrier's network, which would in turn require that all egress firewalls implement this feature). Another measure taken was the restriction of ICMP traffic by network administrators. That should explain why it may be desirable for firewalls to look at the protocol field.
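The source-address validation mandated by RFC 2827 amounts to a very simple check at the network edge. The following Python sketch (with hypothetical assigned prefixes taken from the documentation address ranges) illustrates the idea; a real implementation would, of course, operate on the forwarding path rather than on strings:

# A sketch of RFC 2827-style source-address validation: a border device drops
# any outbound packet whose source address is not from the prefixes it serves.
# The prefix list here is hypothetical.
import ipaddress

ASSIGNED_PREFIXES = [ipaddress.ip_network("198.51.100.0/24"),
                     ipaddress.ip_network("203.0.113.0/24")]

def permit_egress(src_ip: str) -> bool:
    """True only if the source address belongs to the network we serve."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in prefix for prefix in ASSIGNED_PREFIXES)

print(permit_egress("198.51.100.7"))   # True  -- legitimate source
print(permit_egress("192.0.2.1"))      # False -- spoofed source, dropped at the edge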

So much for smurfing, except that the use of ICMP was non-essential; it was merely a device for carrying an attack. The ideas of amplification (through broadcast) of a malicious “request” and reflection by multiple servers “responding” to the request were abstract enough and powerful enough to have become a method for creating numerous denial-of-service attacks.

George Pólya famously said in [10]: “What is the difference between method and device? A method is a device which you used twice.” Sure enough, the same person who wrote the smurf.c program followed up, using the same method, with the fraggle.c (a larger program than the former, but still only 136 lines long). That attack did not use ICMP at all. Instead, it used UDP traffic to ports 7 and 19, which are associated with testing. (Port 7 is connected to the echo service; port 19 to the chargen service, which responds with random strings.)

The effect of the fraggle is similar to that of the smurf. Clearly, disabling these services on all hosts as a default would prevent this specific attack, but that outcome would be too much to expect from system administrators. If the attack has already clogged the network (or just the link) before reaching the firewall, not much can be done. Otherwise, the firewalls can help here by eliminating the traffic destined for these ports—the closer to the perimeter of the network the better—and also by logging the respective activities. (Log analyzers can issue alarms to the proper response teams.) Even more important, the firewalls are always effective at the source of the problem, where the packets with spoofed addresses can be eliminated before causing any harm. Unfortunately, the amplification attacks have not ended with these as described; they seem to have no end, period.

As we mentioned earlier, DNS has been targeted by attacks of this type. In the DNS Amplification attack, illustrated in Figure 5.17, an attacker spoofs requests to open52 recursive resolvers, which in turn flood the target host (whose IP address was spoofed) with the DNS response traffic. The CERT Alert TA13-088A53 provides a detailed description of the attack. Note that the requests can be constructed to demand all possible zone information (by using type “ANY”). Sadly, DNSSEC makes things even worse, for the resulting responses are much larger than those that do not use DNSSEC because of the signatures. Finally, botnets—possibly located in different networks—can be used to further magnify the effect, resulting in a large-scale Distributed Denial-Of-Service (DDOS) attack.

Diagram shows how a DNS is targeted by an amplification attack. The attacker uses a spoofed IP address of the victim and sends a DNS look-up request to vulnerable DNS servers that support open recursive resolution.

Figure 5.17 A reflective DNS attack.

A variation of this type of attack can also involve authoritative name servers that do not provide recursive resolution, but in this case it is possible to mitigate the attack by limiting the response rate.

Again, source IP address verification according to IETF standards is the best protection against this attack, but this requires the cooperation of all providers—the IP packets with spoofed addresses must never be allowed to leave their networks. Here, the egress firewalls are indispensable. To mitigate the reflective DNS attacks, NIST recommends that firewalls keep track of DNS requests on the egress interfaces and never permit the “response” to a non-existing query to enter the network.

The firewalls that provide only packet filtering are called stateless, because they examine each packet without considering the overall traffic context, as opposed to stateful firewalls, which actually follow the state of transport-layer connections. (We have already come close to using stateful firewalls: one can argue that the implementation of the NIST recommendation of the previous paragraph in effect requires a stateful application-layer firewall for it involves keeping track of all outstanding DNS queries.)
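A much simplified Python sketch of that bookkeeping follows; the data structures and field names are our own illustration. The firewall records each DNS query seen on the egress interface and admits an inbound response only if it matches an outstanding query:

# A simplified sketch of the NIST recommendation above: remember DNS queries
# seen on the egress interface and admit only responses that match an
# outstanding query. Field names are illustrative, not a real firewall API.
import time

PENDING = {}          # (client_ip, resolver_ip, query_id) -> time the query left
QUERY_TIMEOUT = 5.0   # seconds an entry stays valid

def saw_outbound_query(client_ip, resolver_ip, query_id):
    PENDING[(client_ip, resolver_ip, query_id)] = time.monotonic()

def admit_inbound_response(client_ip, resolver_ip, query_id) -> bool:
    """Admit a 'response' only if it answers a query we actually forwarded."""
    sent = PENDING.pop((client_ip, resolver_ip, query_id), None)
    return sent is not None and time.monotonic() - sent < QUERY_TIMEOUT

saw_outbound_query("10.0.0.5", "192.0.2.53", 0x1A2B)
print(admit_inbound_response("10.0.0.5", "192.0.2.53", 0x1A2B))  # True: matches a query
print(admit_inbound_response("10.0.0.5", "192.0.2.53", 0x9999))  # False: unsolicited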

5.2.3 Stateful Firewalls

The original motivation for developing these types of firewall was the mitigation of the SYN attack, described in great detail in [11], whose author also wrote (the informational) RFC 498754, which the IETF published in 2007. The attack exploits typical implementations of TCP (specifically, the TCP connection establishment phase).

It is as important to understand what exactly is happening here as why it is happening.55 The relative complexity of the connection establishment mechanism is caused by the necessity to synchronize the initial sequence number values at both the sender and the receiver. The original TCP specification56 explains: “For an association to be established or initialized, the two TCP's must synchronize on each other's initial sequence numbers. Hence the solution requires a suitable mechanism for picking an initial sequence number [ISN], and a slightly involved handshake to exchange the ISN's. A ‘three way handshake’ is necessary because sequence numbers are not tied to a global clock in the network, and TCP's may have different mechanisms for picking the ISN's. The receiver of the first SYN has no way of knowing whether the packet was an old delayed one or not, unless it remembers the last sequence number used on the connection which is not always possible, and so it must ask the sender to verify this SYN.”

Figure 5.18 depicts the exchange that ends up in successful connection establishment. The state of both the initiator and the responder is maintained in three bits (S1, S2, R), respectively indicating whether

Diagram shows the states of an initiator and a responder when a TCP connection is established between them. The state of each party is maintained in three bits: S1, S2, and R, where 0 denotes the negative state and 1 the positive state.

Figure 5.18 TCP connection establishment. Source: RFC 675.

  • the initial SYN with the sequence number was sent (S1 = 1);
  • the sequence number was acknowledged by the other party by sending back, as the ACK parameter, the original value incremented by 1 (S2 = 1); and
  • the SYN was received (R = 1).

The connection is established when both parties reach state (1, 1, 1).

In an unsuccessful exchange, the acknowledgment never arrives, in which case the party that was expecting it returns to state (0, 0, 0) upon expiration of the respective timer.
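The bookkeeping can be expressed in a few lines of Python. The sketch below is a toy model of the (S1, S2, R) bits only; it ignores sequence numbers, timers, and retransmission:

# A toy model of the (S1, S2, R) bookkeeping described above: each side starts
# at (0, 0, 0) and the connection is up when both sides reach (1, 1, 1).
class Endpoint:
    def __init__(self):
        self.s1 = 0   # our SYN (with our ISN) has been sent
        self.s2 = 0   # our ISN has been acknowledged (ISN + 1 came back)
        self.r  = 0   # the peer's SYN has been received

    def established(self):
        return (self.s1, self.s2, self.r) == (1, 1, 1)

initiator, responder = Endpoint(), Endpoint()

initiator.s1 = 1                      # 1. initiator sends SYN
responder.r = 1; responder.s1 = 1     # 2. responder receives it, replies SYN+ACK
initiator.r = 1; initiator.s2 = 1     # 3. initiator receives SYN+ACK, sends ACK
responder.s2 = 1                      # 4. responder receives the final ACK

print(initiator.established(), responder.established())   # True True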

In what follows, we will concentrate on the responding host (which, for our purposes, is a server against which the attack is carried). Maintaining a connection requires memory. The corresponding memory (typically, an array cell) is allocated by a host upon receiving the first SYN, that is—on transition to state (0, 0, 1). The block is naturally supposed to be kept until the connection returns to state (0, 0, 0). Thus, as a minimum, the memory block is kept until the timeout.

If too many SYN packets arrive in a sufficiently short period of time, then the responder will run out of memory, which in turn would result in either a crash or, at the least, an inability to open any new connection. To tie up the server, an attacker can do just that (without ever following up with the final ACKs). RFC 4987 cites an interesting account that the authors of [7] envisioned this attack and wrote a paragraph about it in the 1994 edition, but then decided to remove it.57

The attack was first described in 1996 in what CERT called “underground magazines.” Subsequently, CERT issued advisory CA-1996-21.58 Since the attacks could be effectively carried out in a distributed manner, with the spoofed IP addresses, the Internet community followed up with the requirements for filtering—stressing the need for providers' ingress firewall action. Unfortunately, this measure alone is not sufficient because the attack can be carried out by seemingly legitimate hosts once they are hijacked to become part of a botnet.

Other proposed measures could be carried out only by changing the operating system kernels (where TCP is implemented). One such feature provided the ability to configure the limits of the backlogs (the number of half-opened connections) so that once such a limit is reached, the half-opened connections are closed and their memory is freed, this action possibly combined with ignoring SYN requests. The backlogs could be increased for large servers. The timer values could also be tinkered with.

The problem though is caused by early memory allocation, and a potential solution has to do something about postponing it. One efficient technique (which also hinted at the subsequent firewall implementation) was SYN-caching, in which the actual memory was not allocated until the connection had been established. Of course, the problem here is that it requires a (non-trivial) change in the operating system of a host, but there is more than one operating system. Another technique, called SYN Cookies (described by its author at http://cr.yp.to/syncookies.html), uses cryptographic techniques to encode the state of half-opened connections using the bits reserved for sequence numbers. When the connection completes, the state can be reconstructed in the newly allocated memory slot. The disadvantage of this technique is that certain TCP options become unavailable when it is used (as the bits are taken). As [11] reports, the techniques can be combined in a powerful hybrid.
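To convey the essence of SYN Cookies, here is a simplified Python sketch: the connection four-tuple, a coarse time counter, and a server secret are folded by a hash into the 32-bit initial sequence number, so that no per-connection memory is needed until the final ACK arrives. The bit layout is purely illustrative; real implementations also encode the MSS and use different hash constructions:

# A simplified SYN-cookie sketch: fold the connection 4-tuple, a coarse time
# counter, and a secret into the 32-bit initial sequence number, so no state is
# kept until the final ACK arrives. The exact bit layout here is illustrative.
import hashlib, os, time

SECRET = os.urandom(16)

def make_cookie(src, sport, dst, dport, now=None):
    t = int(now if now is not None else time.time()) >> 6      # 64-second epochs
    data = f"{src}:{sport}:{dst}:{dport}:{t}".encode() + SECRET
    digest = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (digest & 0x00FFFFFF) | ((t & 0xFF) << 24)           # used as the ISN

def check_cookie(cookie, src, sport, dst, dport, now=None):
    """Validate the acknowledged sequence number when the handshake completes."""
    t_now = int(now if now is not None else time.time()) >> 6
    for t in (t_now, t_now - 1):                                # allow one epoch of skew
        data = f"{src}:{sport}:{dst}:{dport}:{t}".encode() + SECRET
        digest = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
        if cookie == ((digest & 0x00FFFFFF) | ((t & 0xFF) << 24)):
            return True
    return False

c = make_cookie("203.0.113.9", 40000, "198.51.100.2", 80)
print(check_cookie(c, "203.0.113.9", 40000, "198.51.100.2", 80))   # True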

An example of a stateful firewall solution is depicted in Figure 5.19.

Diagram shows how a TCP connection is established via a stateful firewall. A session established between an initiator and a responder can be limited to a specific time by a stateful firewall.

Figure 5.19 Stateful firewall (an example of a TCP connection establishment).

The firewall (presumably itself protected with techniques like SYN-caching or SYN Cookies) responds to the initiator of the TCP connection, attempting to establish its own connection before the responder knows anything about the request. If it succeeds, it establishes another connection with the responder, and from then on relays the data between the initiator and the responder, neither of which is aware of a proxy (or middlebox) in the middle.

The proxy arrangement has been carried into application protocols, as we will see, but a stateful firewall does not necessarily have to be a proxy. In a less extreme example, a firewall may merely keep track of all connections (by maintaining a table of all <Source IP address, Destination IP address, Protocol, Source port, Destination port> quintuples), rather than establishing two endpoints. This type of firewall can still do more than a stateless one. For instance, such a firewall can support a policy that no session may last longer than a specified period of time (an important policy if one considers charging for a connection based on its duration, especially when a pre-paid card is used). Keeping track of open connections coincides with the function of NATs, a point we will explore in the next section of this chapter.
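A minimal Python sketch of such quintuple tracking, with a maximum-session-duration policy, might look as follows (the field names and the one-hour limit are arbitrary examples):

# A sketch of quintuple-based connection tracking with a maximum-session-duration
# policy, as described above. Values are illustrative.
import time

MAX_SESSION_SECONDS = 3600
SESSIONS = {}   # (src_ip, dst_ip, proto, src_port, dst_port) -> start time

def on_packet(src_ip, dst_ip, proto, src_port, dst_port) -> str:
    key = (src_ip, dst_ip, proto, src_port, dst_port)
    start = SESSIONS.setdefault(key, time.monotonic())
    if time.monotonic() - start > MAX_SESSION_SECONDS:
        del SESSIONS[key]            # policy: tear down sessions that run too long
        return "drop"
    return "forward"

print(on_packet("10.0.0.5", "198.51.100.2", "tcp", 50000, 443))   # forward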

5.2.4 Application-Layer Firewalls

It has been a tradition in the IETF to publish RFCs of an absurd and humorous nature on April 1.59 To this end, RFC 3093,60 written by Mark Gaynor and Scott Bradner in 2001, proposed the “Firewall Enhancement Protocol (FEP) to allow innovation, without violating the security model of a Firewall.” More specifically, the “proposal” is “to layer any application layer Transmission Control Protocol/User Datagram Protocol (TCP/UDP) packets over the HyperText Transfer Protocol (HTTP) protocol, since HTTP packets are typically able to transit Firewalls.” That was indeed a funny joke! At the time, nothing could look more grotesque than carrying IP datagrams over the transaction-oriented application protocol, which was designed for retrieving fairly small files—called pages—on which the World-Wide Web depended.

We will discuss HTTP later in the book; here we observe—not without amazement—that what appeared absurd in 2001 in fact became the norm three or four years later! Effectively, HTTP started to be used as a transport protocol for remote procedure calls and even for video streaming! The reason was precisely that mentioned in RFC 3093—HTTP could traverse firewalls unhindered. Furthermore, certain peer-to-peer applications (including instant messaging) used port 80—typically reserved for HTTP. All of that, of course, meant that HTTP could no longer traverse firewalls the way it did: the content of the “HTTP messages” had to be analyzed.

A significant threat in terms of malware has been posed by active content (e.g., Java programs), which therefore needs to be examined. But even “normal” HTTP methods have been used for harmful purposes (such as causing buffer overflows), and to deal with this, it was important at least to validate the lengths of the input parameters.
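The simplest checks of this kind are easy to express. The Python sketch below validates only the HTTP request line and header lengths against arbitrary example limits; a production application firewall performs far more elaborate analysis:

# A minimal sketch of the kind of input validation an HTTP application firewall
# performs before anything reaches the server: method whitelist and length
# limits on the request line and headers. Limits here are arbitrary examples.
ALLOWED_METHODS = {"GET", "HEAD", "POST"}
MAX_URI_LEN = 2048
MAX_HEADER_LEN = 8192

def inspect_http_request(raw: bytes) -> bool:
    """Return True if the request passes the basic sanity checks."""
    try:
        head, _, _ = raw.partition(b"\r\n\r\n")
        request_line, *headers = head.split(b"\r\n")
        method, uri, version = request_line.split(b" ", 2)
    except ValueError:
        return False                                  # malformed request line
    if method.decode("ascii", "replace") not in ALLOWED_METHODS:
        return False
    if len(uri) > MAX_URI_LEN:
        return False                                  # e.g., an oversized parameter
    return all(len(h) <= MAX_HEADER_LEN for h in headers)

print(inspect_http_request(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # True
print(inspect_http_request(b"GET /" + b"A" * 5000 + b" HTTP/1.1\r\n\r\n"))             # False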

Hence, application firewalls have been built for HTTP, e-mail, interactions with SQL servers, and voice-over-IP applications. Even a seemingly innocuous data structure specification language, the Extensible Markup Language (XML), presented a need for XML firewalls, chiefly to understand the service specifications that are written in it. For example, the XML-based Web Services Description Language (WSDL) has been developed to describe service programs (to be executed remotely) that are offered by a site.

Analysis of the application protocol payload is called deep packet inspection. But, application firewalls often need to go way beyond just checking the payload. In fact, in many cases, to understand the payload, one needs to know the particular state of the protocol. Hence, application firewalls are often stateful. In the extreme version, an application firewall acts as a proxy in that it terminates the protocol in question and restarts other instances with the entity behind the firewall that it protects. The situation is similar to that depicted in Figure 5.19. Things can become pretty complex, and this complexity can be (and has been) exploited by intruders. The very attacks that could result in taking control of a host may now result in taking control of the firewall, which is not surprising since the firewall is executing the protocol that a protected host was supposed to be executing and looking at the data that a protected host was supposed to look at. Just as Nietzsche warned in [12]: “He who fights with monsters should be careful lest he thereby become a monster. And when thou gaze long into an abyss, the abyss will also gaze into thee.”

It has become common for enterprises to deploy firewalls to separate zones—sometimes even within a single organization. The idea is to establish a fully protected internal network—often called the trusted zone—behind the firewall, while also maintaining another network, called the demilitarized zone (DMZ), which operates under a different policy and is therefore separate from both the outside world and the trusted zone. A typical use case for the DMZ is to host an enterprise's public web server.61

Zoning can be accomplished with a single firewall that has at least three network interfaces, as depicted in Figure 5.20(a). One interface is connected to the outside world; another to the DMZ (so that the traffic to and from the outside world, as well as the traffic to and from the trusted zone, goes through this interface); the third interface is connected to the trusted zone. This arrangement is the least expensive to effect, but it presents dangers (consider misconfiguration or a single point of failure).

The dual-firewall arrangement of Figure 5.20(b) is more expensive, but is considered to be much safer.

Diagram shows how zoning is done with a single firewall and with two firewalls. In the single-firewall arrangement, the firewall has three network interfaces: the first is connected to the outside world, the second to a demilitarized zone, and the third to a trusted zone.

Figure 5.20 Network zoning: (a) with a single firewall; (b) with two firewalls.

As we are going to describe the network address translation function, we note that—although it is distinct from that of the firewall—it is often implemented as part of a firewall.

5.3 NAT Boxes

With a 32-bit IP address field, there are about 4.3 billion IP addresses—clearly not enough to assign to every person's computer, phone, washing machine, toothpick, and what not. This was understood as early as the 1990s, although the idea of assigning IP addresses to things other than computers came later—phones, around 1997; washing machines, around 2003 as part of the Smart Grid plan; and toothpicks … well, we surmise that toothpicks are “things” as in the “Internet of Things.”

IPv4 has stubbornly remained around though, as have the solutions to address the shortage of the IPv4 addressing space. The idea is to reuse the pool of existing IP addresses in a network following a clever scheme.

In this scheme, first the IP address space is divided into two parts—private and public—with the private addresses to be used only inside the network, that is, for intra-network communications. Second, for each IP packet that goes outside, its source IP address is changed to the public IP address. Such address substitution is precisely the function of a NAT box. With that, a NAT box can assign the same public address to all packets that go through it. Hence, if a network has n entities inside, only one IP address—shared by all of them—is used for external communications. Not only does this scheme allow the reuse of IP addresses, but it also anonymizes the network addresses behind the NAT box,62 as depicted in Figure 5.21.

Diagram shows how a NAT box assigns the same public address to all the packets that go through it, regardless of their different private source IP addresses.

Figure 5.21 NAT in a nutshell.

Of course, such a trick could not be pulled off without introducing new (and serious) problems, the first of which is caused by the necessity to translate the incoming packet's destination IP address to the proper private IP address. This problem can be solved in a fairly straightforward way, but the solution itself is problematic for, as we will see, no entity external to a network can initiate communications with any entity within a network. This is a violation of a major Internet principle. In fact, we will see that the scheme has broken a number of Internet principles. But then, the people who considered NAT an abomination have learned (and taught others) how to live with it. As the deployment of NAT has progressed—along with further growth of the number of Internet hosts and further IP address space depletion—so has grown the number of new problems and their solutions. Still, the NAT boxes are evolving while still serving their two major purposes mentioned above: (1) effective management of scarce IP addresses and (2) obfuscation of the internal network structure.

Again, it is hard to overestimate the importance of the latter. We had mentioned this in a footnote earlier, but this subject warrants an expansion. First of all, in an enterprise network (say that of a bank or—for a more striking example—a military facility), there are hosts that for obvious reasons should never be visible to the outside world. Indeed, the firewalls can help here, but the major issue is that such hosts must not be addressable from the Internet, period. Second, given the way the Internet has been designed, not only hosts, but also routers can be reached by their IP addresses. When it comes to carrier networks, the ability to reach their routers through network management interfaces is a dangerous thing. (This issue should not be confused with that of connecting peer routers directly, which is accomplished at the link layer.) The PSTN has stayed fairly secure, in large part owing to the fact that its switches and other network elements were not addressable, and thus accessible, from outside. The ISPs have learned that they had better follow the same model, which NAT enables.

With the advances in home networking (in which there is a NAT function deployed within the home gateway), combined with the growth in enterprise networking, carriers needed to introduce NAT-on-top-of-NAT, which is often called carrier-grade- or large-scale NAT.

In the rest of this section we discuss:

  1. The allocation of private IP addresses;
  2. The architecture and operation of NAT boxes;
  3. The protocols that enable operations (of many existing Internet protocols) in the presence of NATs—the Interactive Connectivity Establishment (ICE) protocol, Session Traversal Utilities for NAT (STUN) protocol, and the Traversal Using Relay NAT (TURN) protocol; and
  4. Large-scale NAT.

5.3.1 Allocation of Private IP Addresses

Definitive guidance on the subject was first issued in 1994, in RFC 1597,63 co-authored by the engineers who represented three essential segments of the Internet industry—software vendors, enterprise networking, and Internet registry.64 Explaining the advantages of introducing the private address space—IP address conservation and operational flexibility—the RFC observes that the (uncoordinated) use of private IP addresses had already taken place and warned about the consequences:

“For a variety of reasons the Internet has already encountered situations where an enterprise that has not been connected to the Internet had used IP address space for its hosts without getting this space assigned from the IANA. In some cases this address space had been already assigned to other enterprises. When such an enterprise later connects to the Internet, it could potentially create very serious problems, as IP routing cannot provide correct operations in the presence of ambiguous addressing. Using private address space provides a safe choice for such enterprises, avoiding clashes once outside connectivity is needed.”

Interestingly, we witnessed just such a clash six years later, when we were in charge of establishing the IETF network for the March 2001 IETF meeting (the first IETF meeting in the 21st century, sponsored by Lucent Technologies). Much expertise was supplied by Bell Labs and Lucent Technologies business units, and the network had passed all the tests within the Bell Labs premises. The network was designed and built to accommodate more than 3,000 hosts, with both wireline and wireless connections. Yet, on its deployment, a strange phenomenon manifested itself: many participants had no outside connection at all, even though the network was multi-homed on two ISPs to ensure reliability.

Here is what had happened. For the internal IETF network, Lucent Technologies used the IP addresses from its private space, but the specific block of IP addresses in question was inherited from AT&T following its trivestiture in 1996. Something obviously fell through the cracks, for someone in AT&T forgot to delete these addresses from its space. Fortunately, the problem was diagnosed and fixed within a couple of hours, but we will always remember the anxiety it caused …

In 1996, the (informational) RFC 159765 was replaced by the best-current-practice RFC 1918,66 which remains in force. As specified, IANA has reserved the following three blocks of IP address space for private networks:

  1. 10.0.0.0–10.255.255.255 (10/8 prefix);
  2. 172.16.0.0–172.31.255.255 (172.16/12 prefix);
  3. 192.168.0.0–192.168.255.255 (192.168/16 prefix).

This is equivalent to the allocation of, respectively, a single Class-A network number, 16 contiguous Class-B network numbers, and 256 contiguous Class-C network numbers.

Any entity can use them in its own domain without asking IANA or anyone else: “An enterprise that decides to use IP addresses out of the address space defined in this document can do so without any coordination with IANA or an Internet registry. The address space can thus be used by many enterprises. Addresses within this private address space will only be unique within the enterprise, or the set of enterprises which choose to cooperate over this space so they may communicate with each other in their own private internet.” Of course, none of these addresses can be used externally, but this is what IANA and the Internet registry have to ensure.
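Incidentally, Python's standard ipaddress module already knows these private blocks, which makes a quick check trivial (note that its is_private property also covers loopback, link-local, and other special-purpose ranges, not only the RFC 1918 blocks):

# A quick check of the private ranges above using Python's standard ipaddress
# module (whose is_private property covers the RFC 1918 blocks, among others).
import ipaddress

for a in ("10.1.2.3", "172.20.0.1", "192.168.1.10", "8.8.8.8"):
    print(a, ipaddress.ip_address(a).is_private)
# 10.1.2.3 True
# 172.20.0.1 True
# 192.168.1.10 True
# 8.8.8.8 False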

In contrast, any entity that is made visible to—and accessible from—the outside world must have an address from the globally unique address space. These addresses can be assigned only by an Internet registry (which, in turn, gets allocated the address space by IANA).

The hosts that are assigned network addresses from the private space are called private hosts; similarly, the hosts that are assigned globally-unique addresses are called public hosts. Figure 5.22 illustrates the arrangement.

Diagram shows a set of public and private hosts in networks A and B. In both the networks, hosts with private IP addresses a, b and c are private hosts. Hosts with globally unique IP address X, in network A, and host with globally unique IP address Y, in network B, are public hosts.

Figure 5.22 Private and public addressing networks A and B.

The private hosts (placed within a punctured oval) in both networks, A and B, have private IP addresses, {a, b, c}, which are meaningful only within the network they are in. (It is important to note that in addition to an ability to communicate among themselves, they can communicate with any host within their respective networks independent of that host's address.) The hosts with IP addresses X and Y, both of which are globally unique, are public. They are addressable by any host on the Internet. In the particular example of Figure 5.22, eight hosts use only five IP addresses.

With a change of its IP address, a host can move from being private to being public (and the other way around).

Naturally, routing information of private networks is not propagated outside. It is also the job of edge routers (or, to be more precise, the firewalls) to ensure that no packet with either a private source or a private destination address (1) leaves the network in which it originated or (2) enters any network. Apropos of (2), an edge router that rejects an incoming packet with a private IP address or routing information associated with such an address is not expected to treat this as a protocol error. It must simply toss the packet out.

In the same vein, the DNS resource records or any other information relevant to a network's internal private addresses must never leave the network. (This is yet another policy to be enforced by firewalls.)

Again, with all the benefits of private addressing for IPv4 space preservation, the major problem with this approach is that the IP addresses were envisioned to be global. The mere fact that private addressing was introduced has signaled a departure from the original Internet vision. A taboo was broken.

5.3.2 Architecture and Operation of the NAT Boxes

It was exactly because many IETF engineers could not agree on how to standardize a broken taboo that the industry went ahead without waiting for a standard. It would be incorrect though to say that the IETF has not worked on NAT; to the contrary, several working groups dealt with, and have been dealing with, NAT. It is just that there is no standard for NAT, although there are standards (described in the next section) on how to live with NAT boxes.

The subtle issue here is that the architecture per se is not necessarily subject to standardization—the protocol is. The adage is that the standard does not have to deal with how a box is built; it merely has to describe the box's behavior. That maxim is incorrect though, and the NAT box is a poster child for demonstrating how the internal structure and behavior may be indivisible.

An informational RFC 163167 published in 1994, while by no means a standard, sheds light on what needs to be implemented.

Let us start with the definition of the minimal function of a NAT box. It has to translate S—a set of private source IP addresses—into only one source IP address, i, but do it so that on receiving a packet with the destination address i, it would translate it back into the unique i′ ∈ S. Obviously, this cannot be achieved without modifying some other field in the IP header. The only field available for modification is the source port number. But if the source port number changes, it must be saved, along with the source IP address. Let us take a look at Figure 5.23.

Diagram shows how a NAT box handles outgoing traffic. The internal source IP address, source port number and external source IP address are the variables of outgoing traffic.

Figure 5.23 A NAT box—outgoing traffic.

Let X and Y be the private source IP address and source port number of an IP packet that arrives at a NAT box from inside. For the future backward translation to work, the NAT box must save the pair (X, Y) in the translation table and transform the pointer to that table entry into the value Y′.68 The packet that leaves the NAT box now has the NAT box's public IP address X′ along with the source port number Y′. But now that the IP header has changed, so must the checksums of both the IP header and the transport-protocol header. (We remind the reader that the source port number is part of the transport-layer header.) Right at this point, another taboo is broken—the layering principle has been violated, as the network-layer entity needs to look at and modify a transport PDU. (A NAT box is essentially performing a router function.)

Not only that, but several Internet application-layer protocols have already violated the layering principle in that they had carried—in the absence of a universal resource locator scheme that was invented and implemented much later than these protocols—the IP addresses within the application protocol payload. To this end, RFC 1631 suggests that “NAT must also look out for ICMP and FTP and modify the places where the IP address appears. There are undoubtedly other places where modifications must be done.” This would be rather a complex task,69 further complicated because of the tradition of specifying the application protocols (such as the protocols for e-mail, multimedia session initiation, and hypertext transfer) as ASCII text! This is how breaking one taboo immediately resulted in breaking quite a few others …

As far as the incoming traffic is concerned, the behavior of the NAT box is uniquely determined, as Figure 5.24 demonstrates.

Diagram shows how a NAT box handles incoming traffic. The destination IP address and destination port number are the variables of incoming traffic.

Figure 5.24 A NAT box—incoming traffic.

Since X′ is a constant associated with the NAT box, only Y′ can be used to determine the original pair (X, Y).
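The following Python sketch captures this minimal translation in both directions; the public address, the port-allocation strategy, and the omission of checksum rewriting and entry expiry are all simplifications of our own:

# A sketch of the minimal translation just described: outbound packets get the
# NAT's public address X' and a fresh port Y' that indexes the (X, Y) pair;
# inbound packets destined for (X', Y') are mapped back. Checksum rewriting and
# entry expiry are omitted.
PUBLIC_IP = "203.0.113.1"         # X' -- the NAT box's public address (example value)
table = {}                        # Y' -> (X, Y)
next_port = 20000                 # arbitrary starting point for allocated ports

def translate_outbound(src_ip, src_port):
    global next_port
    for y_prime, pair in table.items():          # reuse an existing mapping if any
        if pair == (src_ip, src_port):
            return PUBLIC_IP, y_prime
    y_prime, next_port = next_port, next_port + 1
    table[y_prime] = (src_ip, src_port)
    return PUBLIC_IP, y_prime

def translate_inbound(dst_ip, dst_port):
    if dst_ip != PUBLIC_IP or dst_port not in table:
        return None                               # no mapping: the packet is dropped
    return table[dst_port]                        # the original (X, Y)

print(translate_outbound("192.168.1.10", 51515))  # ('203.0.113.1', 20000)
print(translate_inbound("203.0.113.1", 20000))    # ('192.168.1.10', 51515)
print(translate_inbound("203.0.113.1", 20001))    # None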

At this point it should be clear why, with this scheme, no entity behind the NAT box can be contacted from outside before that entity initiates the contact. In fact, as may be expected, this matter is complicated further. The translation scheme that we just presented requires that the size of the table for keeping the (X, Y) pairs be proportional to the number of hosts in the private network. For big carriers, this table is already too big to be implemented practically, and so the number of entries in the table must be smaller than the logical maximum. To support this requirement, the entries must be removed from the table after they have spent some time there—determined by a timing parameter. Implementations vary here, but no matter what is done, it is quite possible for an external interlocutor of an entity in the private network behind the NAT box never to be able to reach it until it restarts the conversation. That necessitates the “keep-alive” routines, which increase the network traffic without carrying any payload.

This problem gets worse as NAT becomes more complex, because of the constant introduction of new features. It is important to note that the translation mechanism is invariant to feature introduction. What changes is what is being stored and for how long.

These features—never standardized, but greatly affecting the behavior of the NAT boxes—are too many to consider here, and their taxonomy alone is overwhelming. For the purposes of this book, we only hint at what happens here.

Let us consider a requirement that a host in the private network may receive messages only from those external hosts with whom it had initiated a conversation. Not only is this requirement reasonable, it is essential to implement in order to protect against reflective denial-of-service attacks discussed in the previous section.

Consider the situation of Figure 5.25. Suppose the private host X starts conversations with the external hosts A and B. Both A and B can, of course, respond because they know the unique port number that allows them to identify X. But nothing prevents a (potentially malicious) host C from learning this port number (for instance, by sniffing the traffic to B or simply by making B disclose this number—it is by no means secret, and is not protected cryptographically). Once C knows this number, it can start sending messages to X. According to requirements, the NAT box must not allow this message to go through, but with the arrangement we just described the NAT box has absolutely no means of doing so!

Diagram shows how a potentially malicious host C establishes a connection with a private host X, via a NAT box, by sniffing the traffic to host B and identifying the unique port number of X. The unique port number of host X is known to only host A and host B.

Figure 5.25 An unsolicited “response.”

The only solution is for the NAT box to store the parameters of each opened session—that is, not only the source IP address and source port number, but also the destination IP address and the destination port number. This would immediately double the size of the table.

More stringent requirements result in saving the protocol number, the timer interval for each session, and other parameters. Unfortunately, all this is still only the tip of the iceberg …

The reader has probably observed that much of this could have been eliminated, had the NAT boxes been considered separate from firewalls; however, the implementations invariably keep the NAT function as part of the firewalls, further blurring the distinction between essentially different functions.

To deal with the complexity, the IETF created the NAT working group, which, however, did not produce a much-needed standard. RFC 266370 further complicates things, introducing multiple “flavors,” including one (Twice NAT) in which DNS is involved. This is definitely an interesting read, but as other NAT-related RFCs had mentioned, RFC 2663 is informational—it specifies no standard.

An excellent analysis of NAT architecture and deployment, along with a review of application protocol interworking with NAT, is contained in an early 2001 monograph [13], but quite a few things—notably those we discuss in the next section—developed later (the development motivated first by IP telephony and later by more general real-time multimedia requirements).

Meanwhile, the IETF followed up with the Behavior Engineering for Hindrance Avoidance (behave) working group, with the objective of creating “documents to enable IPv4/IPv4 and IPv6/IPv4 NATs to function in as deterministic a fashion as possible.” This group, which has concluded, produced about three dozen documents71, enough to fill a small library, thus bringing the volume of literature dedicated to NAT behavior close to that dedicated to human psychiatry …

For a concise history of NAT development, along with the nuanced pro and contra arguments, we highly recommend [14]—an article in the IETF Journal—written by the IETF insider (and inventor of the RSVP protocol), Lixia Zhang. Here is one of Dr. Zhang's observations: “The misjudgment on NAT costs us dearly. While the big debate went on, NAT deployment was rolled out, and the absence of a standard led to a number of different behaviors among various NAT products. A number of new Internet protocols were also developed … during this time … All of their designs were based on the original model of IP architecture, wherein IP addresses are assumed to be globally unique and reachable. When those protocols became ready for deployment, they faced a world that was mismatched with their design. Not only did they have to solve the NAT traversal problem, but also the solution had to deal with a variety of NAT box behaviors.”

The word “behaviors” is key here, for just figuring out the mapping at the NAT box does not guarantee NAT traversal. We have already described informally the differences in behavior. An early (2003) attempt to characterize such behavior was made in the (now obsolete) RFC 348972 (a small Python sketch contrasting these filtering behaviors follows the list):

  • With the full cone NAT, an external host can send a packet to the internal host without prerequisites.
  • With the restricted cone NAT, an external host cannot initiate a conversation with an internal host; it can speak only if it has been spoken to. In other words, the NAT box keeps track of the destination IP addresses to which the internal host had previously sent messages, and allows traffic only from those addresses.
  • With the port-restricted cone NAT, the previous restriction is narrowed to allow responses only through the port on which the conversation was initiated. Here the NAT box keeps track of the pairs (Destination IP address, Destination port number) to which the internal host had previously sent messages, and allows IP packets whose (Source IP address, Source port number) pairs match the above.
  • With the symmetric NAT, the table entries are sessions (Source IP address, Source port number, Destination IP address, Destination port number), and each session is mapped to a different port number.
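As promised, here is a small Python sketch contrasting the filtering behaviors (the mapping behavior of the symmetric NAT, in which each destination gets its own external port, is not modeled). Given the set of destinations an internal host has already contacted, each function decides whether an inbound packet is admitted:

# A sketch contrasting the filtering behaviors above. For one internal mapping,
# each function decides whether an inbound packet from (src_ip, src_port) is
# admitted, given the set of destinations the internal host has already
# contacted.
CONTACTED = {("198.51.100.2", 80), ("198.51.100.2", 443)}   # where the host has sent packets

def full_cone(src_ip, src_port):
    return True                                              # anyone may send once the mapping exists

def restricted_cone(src_ip, src_port):
    return any(ip == src_ip for ip, _ in CONTACTED)          # must have contacted that IP

def port_restricted_cone(src_ip, src_port):
    return (src_ip, src_port) in CONTACTED                   # must have contacted that IP *and* port

probe = ("198.51.100.2", 8080)   # same host, different port
print(full_cone(*probe), restricted_cone(*probe), port_restricted_cone(*probe))
# True True False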

In 2007, the IETF issued a best-current-practice RFC 478773 on NAT behavior as related to unicast UDP packets. The RFC acknowledges though: “The classification of NATs is further complicated by the fact that, under some conditions, the same NAT will exhibit different behaviors. This has been seen on NATs that preserve ports or have specific algorithms for selecting a port other than a free one. If the external port that the NAT wishes to use is already in use by another session, the NAT must select a different port. This results in different code paths for this conflict case, which results in different behavior.” One rather sad observation is that instead of setting the standard for NAT and leading the industry, the IETF ended up trying to keep up with the standardless situation. Larger and larger documents appeared, occasionally “obsoleting” (sic) earlier documents, each of them nearing the length of the TCP specification (by far the most complex one in the Internet). None of these documents makes easy reading either, partly because the state of our understanding of the Wild-West nature of NAT development is commensurate with the state of writing the very documents that describe it …

We will look now at ways to deal with the NAT traversal problem.

5.3.3 Living with NAT

An early mechanism, described back in 1999 (in RFC 2663), is fairly simple, and is still being used. It is called Application Level Gateway (ALG). RFC 2663 describes it as follows:

“Application Level Gateways (ALGs) are application specific translation agents that allow an application on a host in one address realm to connect to its counterpart running on a host in different realm transparently. An ALG may interact with NAT to set up state, use NAT state information, modify application specific payload and perform whatever else is necessary to get the application running across disparate address realms.”

Actually, the idea is very straightforward. The mechanism is exactly the same as that which allows a person who may play chess very badly—in fact, a person who might not even know how to play chess!—to play simultaneously with the best two chess players in the world (say X playing white and Y playing black), and end up either winning at least one game or drawing both. To achieve this, the incompetent player would start the game with Y by repeating the first move of X, then proceed by repeating the move that Y made in the game with X, and so on. In reality, all this means is that X plays with Y. If the game is drawn, that means the incompetent player has drawn both games; if the game is won, it means that the bad player won one game (and lost the other). The power of a middleman!

And this is precisely how an ALG works, as shown in Figure 5.26. A process running on host X always initiates the conversation (because it is inside the private network), thinking it is talking to its interlocutor on host Y. Instead, the ALG inserts itself into the middle of the conversation, by maintaining two processes—one impersonating Y to X and one impersonating X (now equipped with the NAT public address) to Y. At least the ALG can just copy the data, but it can also perform a firewall function, modifying or even censoring the packets. To this end, ALG is a kind of stateful firewall.

Diagram shows how an application level gateway acts as a intermediary between two peers X and Y establishing a communication with each other; the application level gateway impersonates Y to X and impersonates X to Y during the communication.

Figure 5.26 Application-Level Gateway (ALG).

Before addressing other solutions for NAT traversal, we should emphasize that the fact that only a host within a NAT-shielded network can start a conversation with any host outside means that no two hosts located in different NAT-shielded networks can communicate, period. (Once again we see that breaking the end-to-end principle results in a chain of far-reaching consequences.)

As it happens, this was a major problem that the development of IP telephony faced originally (as in the PC-to-PC scenario described in [15]). Interestingly enough, the problem is still very real today—a significant amount of energy in the IETF Real-Time Communication in Web Browsers (RTCWeb) working group74, created in 2011 to eliminate plug-ins and enable multimedia streaming directly between browsers, has been spent on its charter item “Define the solution—protocols and API requirements—for firewall and NAT traversal.”

From the time the development of IP telephony started, the objective of NAT traversal was two-fold: (1) to discover an entity hidden by a NAT box and (2) to establish a pinhole75 for the necessary port in the firewall. The discovery is the hardest problem. It has multiple solutions, of which some may work in some circumstances but not in others.

Let us start with the most straightforward solution, which always works. The drawback is that it is the most expensive of all, and perhaps impractical at scale.

The solution is called Traversal Using Relays around NAT (TURN), and it is described in RFC 5766.76 TURN was originally designed to work with the Session Initiation Protocol (SIP) and Session Description Protocol (SDP),77 as part of IP telephony, but it can be applied more universally, which we will discuss later.

Figure 5.27 presents the idea: make a central (i.e., publicly addressable) server a relay point to connect two NAT-hidden hosts (or peers) in separate networks. Each peer can establish its own connection to the relay server, which will then relay packets so that the peers communicate with each other.

Diagram shows how two NAT-hidden peers are connected using a publicly addressable rendezvous server. Peers X and Y establish connections with the relay server via their NAT boxes; the relay server then relays packets between them.

Figure 5.27 A rendez-vous relay.

Given that the nature of communications is multimedia, it follows that the relay server must have a very high-bandwidth connection to the network, and also high capacity. This requirement can be significantly relaxed though if it is possible for the peers to establish a multimedia transport connection among themselves so that relaying is used only for initial signaling (which needs significantly less of both the bandwidth and processing capacity). Unfortunately, this is not always possible. We will return to this point later.

What is used in real life though are variations of the above model. Traversal Using Relays around NAT (TURN) is the name of the protocol enabling the scheme of Figure 5.28.

Diagram shows how communication is established between a client and two different peers. The client uses TURN commands to create and manipulate a data structure. The peers and the client are connected to different NAT boxes, which are further linked to a TURN server, where the allocation is done.

Figure 5.28 Traversal using relays around NAT (TURN).

It is important to note that the TURN server is not a rendez-vous point. Its operation is asymmetric—rather like that of a telephone switch—in that a TURN client must know the IP address and port number of a peer it wishes to communicate with. (This information can be found through peer discovery via a true rendez-vous server—such as a SIP server, to which a peer would register—or it can be distributed through e-mail or similar out-of-band means.) With that, the reason the TURN server is needed at all is because as long as it is public and known, it has a much better chance of penetrating firewalls than its client. To this end, the client turns to TURN only after it has failed to reach the peer.

The TURN protocol is used only between the client and the TURN server, and it must be initiated by the client. A client can learn the IP address and port number of the TURN service from a configuration or through DNS. (The value of the SRV parameter is TURN.) It is also possible to reach TURN via an anycast address.

The client uses TURN commands to create and manipulate a data structure (called allocation) on the TURN server, which is the switching point for all the peers the client wishes to communicate with. As is the case for many Internet protocols, TURN defines a keep-alive mechanism: the allocation is kept for as long as the client repeats a refresh request within a specified period of time. The client also specifies permissions for peer connections. Now, the TURN server does not establish a connection with any peer—all communications with peers are done over UDP.

The client encapsulates the application data inside a TURN message. The TURN server extracts these data and sends them over UDP. When the peer sends the data in the other direction, the TURN server relays it to the client inside a TURN message.

The issue of addressing is solved by allocation, which contains the transport address (i.e., the [IP address, port] pair) of the client. This is the source address used in all messages to a peer, and thus becomes the destination address in messages from the peer. The TURN message always contains an indication of the peer that has been the source of the message.
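Conceptually, then, an allocation is just relay state kept on behalf of one client: a relayed transport address, a set of peer permissions, and a lifetime that the client must keep refreshing. The Python sketch below models only this bookkeeping, not the TURN message formats:

# A conceptual sketch of the bookkeeping behind a TURN allocation: relayed
# address, per-peer permissions, and a lifetime that the client must refresh.
# This models the state only, not the TURN wire format.
import time

class Allocation:
    def __init__(self, client, relayed_addr, lifetime=600):
        self.client = client                  # (client IP, port) as seen by the server
        self.relayed_addr = relayed_addr      # address/port the server relays from
        self.permissions = set()              # peer IPs the client has allowed
        self.expires = time.monotonic() + lifetime

    def refresh(self, lifetime=600):
        self.expires = time.monotonic() + lifetime

    def relay_from_peer(self, peer_ip, data):
        """Deliver peer data to the client only if permitted and not expired."""
        if time.monotonic() > self.expires or peer_ip not in self.permissions:
            return None
        return ("to", self.client, "from", peer_ip, data)

alloc = Allocation(client=("203.0.113.9", 40000), relayed_addr=("198.51.100.7", 49152))
alloc.permissions.add("192.0.2.33")
print(alloc.relay_from_peer("192.0.2.33", b"hello"))   # relayed to the client
print(alloc.relay_from_peer("192.0.2.99", b"hello"))   # None: no permission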

Back to the original problem: what can an application protocol (such as SIP or SDP), which uses IP addresses and port numbers hard-coded in-line, do in view of NATs, which will change the protocol headers but not the in-line data? It cannot do much unless it finds out exactly what its external IP address and port number are. Incidentally, with the high volume of NAT publications, there seems to be a bit of inconsistency in terminology. Some specifications call such an external transport address a mapped address; others call it a reflexive address. For the rest of this section, we will use the latter term since this is the one used in the protocol designed to solve the problem at hand. The name of this protocol is Session Traversal Utilities for NAT (STUN).

The first version of STUN was published in 2003, in RFC 3489.78 Back then “S” actually stood for “Simple,” and the name of the protocol was Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs). STUN was described as “a lightweight protocol that allows applications to discover the presence and types of NATs and firewalls between them and the public Internet.” It so happened that nothing at all was simple—or at least as simple and straightforward as it used to be in the original Internet design—and five years later the new RFC (RFC 538979) was published. It had “obsoleted” (sic80) RFC 3489, which by then was called the “classic STUN.” The reason for the radical change was that “experience since the publication of RFC 3489 has found that classic STUN simply does not work sufficiently well to be a deployable solution.” To this end, “Classic STUN provided no way to discover whether it would, in fact, work or not, and it provided no remedy in cases where it did not. Furthermore, classic STUN's algorithm for classification of NAT types was found to be faulty, as many NATs did not fit cleanly into the types defined there.” There was also a security vulnerability (the mapped addresses could be spoofed under certain circumstances).

We believe that it was after this experience that the IETF started to discourage the use of the words “simple” and “light” in the names of its protocols. For sure, nothing has ever been simple about NAT. Yet, the idea of STUN is simple, as illustrated by Figure 5.29: to get its reflexive address, an application process running on a host should just send a request to a STUN server, which then responds with the reflexive address in the payload. Indeed, it is as simple as looking into a mirror!

Diagram shows how an application process running on a host retrieves its reflexive address from a STUN server by sending a request via a NAT box. The STUN server returns the reflexive address in the payload.

Figure 5.29 Learning the reflexive address from a STUN server.

What has complicated and ultimately ruined the initial design of STUN is that NAT boxes refuse to behave according to the predicted logic. Some people considered the Internet broken then and there, but others applied realpolitik. RFC 5389 chose to abandon the idea of providing a complete solution, and instead packaged STUN as a tool with several defined “usages” (i.e., specific circumstances where STUN can be applied). In addition, the STUN protocol can now be used over both TCP and UDP as well as the Transport Layer Security (TLS) protocol, which runs on top of TCP.

The protocol supports request/response transactions and indications (which require no response). With both, the single method specified is Binding. In transactions, Binding is used by the client to find out the reflective address it is bound to; in indications it may be used, for example, to keep the binding alive.

An important feature of STUN is that its messages can be multiplexed with those of other protocols. To help with demultiplexing, the message header carries a fixed 32-bit value called the magic cookie (carved out of what used to be part of the transaction ID), which lets a receiver recognize STUN messages with little ambiguity. (There is also an optional FINGERPRINT attribute that aids in the same task.)

An interesting feature of STUN deals with “helpful” NAT boxes that try to detect in-line transport addresses and rewrite them on their own. Hence the obfuscation technique: a STUN server applies an exclusive-or (XOR) bit operation81 to the reflexive address and the magic cookie (the port is XORed with the cookie's most significant 16 bits, the IPv4 address with its full 32-bit value), and it is the result that is transmitted in the XOR-MAPPED-ADDRESS attribute.
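
To make the exchange concrete, the following sketch, written in Python, sends a Binding request over UDP and decodes the XOR-MAPPED-ADDRESS attribute as described above. It assumes a reachable STUN server; the host name stun.example.org and port 3478 are placeholders, and a production client would, of course, also handle retransmissions and the other attributes defined in RFC 5389.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def get_reflexive_address(server=("stun.example.org", 3478), timeout=3.0):
    """Send a STUN Binding request and return the (IP, port) seen by the server."""
    transaction_id = os.urandom(12)
    # Header: message type, message length (no attributes), magic cookie, transaction ID
    request = struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, server)
        data, _ = sock.recvfrom(2048)

    msg_type, msg_len, cookie, tid = struct.unpack("!HHI12s", data[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or tid != transaction_id:
        raise RuntimeError("unexpected STUN response")

    # Walk the attributes (type, length, value; values padded to 32-bit boundaries)
    offset = 20
    while offset < 20 + msg_len:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            # Undo the XOR obfuscation described in the text
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + ((4 - attr_len % 4) % 4)
    raise RuntimeError("no XOR-MAPPED-ADDRESS attribute found")

if __name__ == "__main__":
    print(get_reflexive_address())
```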

STUN addresses security by providing mechanisms for both authentication and integrity checking. The mechanisms are optional—subject to the “usage” selection—and they are based on two credential mechanisms: with the long-term credential mechanism, the user name and password are pre-provisioned; with the short-term mechanism, they are shared through an out-of-band exchange. In either case, authentication amounts to proving possession of the password, and the key used for integrity checking (an HMAC carried in the MESSAGE-INTEGRITY attribute) is derived from the password.
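
As an illustration of the key derivation just described, here is a minimal Python sketch. It follows RFC 5389 (the long-term key is an MD5 hash of user name, realm, and password; the short-term key is the password itself after SASLprep processing, which is omitted here for brevity), and the user name, realm, and password values are made up.

```python
import hashlib
import hmac

def long_term_key(username: str, realm: str, password: str) -> bytes:
    # RFC 5389: key = MD5(username ":" realm ":" SASLprep(password))
    return hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()

def short_term_key(password: str) -> bytes:
    # RFC 5389: key = SASLprep(password)
    return password.encode("utf-8")

def message_integrity(key: bytes, message_so_far: bytes) -> bytes:
    # MESSAGE-INTEGRITY is an HMAC-SHA1 computed over the STUN message up to
    # (but not including) the MESSAGE-INTEGRITY attribute itself, with the
    # header length adjusted to account for it.
    return hmac.new(key, message_so_far, hashlib.sha1).digest()

key = long_term_key("alice", "example.org", "s3cr3t")
print(message_integrity(key, b"...STUN message bytes...").hex())
```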

Just as with other request/response protocols surveyed above, STUN can be used in an amplified distributed denial-of-service attack, but the nature of the attack is different from what we have seen before. Rather than directing (fairly small) faked responses at the victim, an attacker provides multiple clients not with their own reflexive addresses, but with the address of the attack's target. Once the clients (hopeful of receiving their own huge video streams at this address) hand the address to their peers, all the traffic will be directed at the victim. RFC 5389 notes that to effect the attack, an attacker would need to insert itself between the STUN server and multiple clients.

A simpler DoS attack can be carried out to “silence” a client if the attacker—again acting as a man-in-the-middle between the STUN server and the client—provides the latter with a fake transport address. This attack is not specific to STUN, though, since such an attacker could just as well deny other services to the client. The attacker can also change the nature of the attack by providing the client with the attacker's own transport address, so as to eavesdrop on the traffic destined for the client.

A more serious attack is that of an attacker assuming the identity of a client. This, however, is a more generic problem than just that of STUN, and its solution lies in the means of distributing and guarding shared secrets.

So far we have not addressed STUN server discovery, which is the last item in our discussion of STUN. Noting that server discovery is problematic when STUN is multiplexed with other protocols, RFC 5389 defines an optional DNS procedure, as follows: “When a client wishes to locate a STUN server in the public Internet that accepts Binding request/response transactions, the SRV service name is ‘stun’. When it wishes to locate a STUN server that accepts Binding request/response transactions over a TLS session, the SRV service name is ‘stuns.’ STUN usages MAY define additional DNS SRV service names.”
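
A hedged sketch of this discovery procedure is shown below; it assumes the third-party dnspython library and uses example.com as a placeholder domain. A real client would also fall back to the default STUN port when no SRV records exist.

```python
import dns.resolver  # from the dnspython package

def discover_stun_servers(domain: str, secure: bool = False):
    """Return (host, port) pairs for the 'stun' (UDP) or 'stuns' (TLS) SRV service."""
    service = "_stuns._tcp" if secure else "_stun._udp"
    answers = dns.resolver.resolve(f"{service}.{domain}", "SRV")
    # Lower priority values are preferred; weight breaks ties within a priority
    records = sorted(answers, key=lambda r: (r.priority, -r.weight))
    return [(str(r.target).rstrip("."), r.port) for r in records]

print(discover_stun_servers("example.com"))
```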

Open-source implementations of STUN and TURN servers exist, which makes them particularly convenient to deploy in the Cloud.

As a segue to the last topic of this section, let us observe one more complexity associated with the use of NAT boxes. As already observed, the number of entries in a NAT box's translation table cannot exceed 61,440.82 The physical memory limitation is yet another constraint, which can make the number of entries even smaller. One approach to dealing with this problem is to deploy several NAT boxes at the perimeter of the network so that different prefixes of destination IP addresses are served by different NATs.83 This arrangement is reflected in Figure 5.30.

Diagram shows how different NAT boxes are used for different prefixes of destination IP addresses. Network 1 uses NAT box 1 to connect to STUN server 1; similarly, networks 2 and 3 use NAT boxes 2 and 3 to connect to STUN servers 2 and 3, respectively.

Figure 5.30 Different NATs for different paths.

Consequently, a process on a host within a given network may have different reflexive addresses depending on which STUN server reported them, that is, on which NAT box the path to that server traverses.

Now we are ready to describe one definitive mechanism that uses STUN and TURN for NAT traversal. The name of this mechanism is Interactive Connectivity Establishment (ICE). It is by no means universal in that it applies only to a set of so-called offer/answer protocols, which allow two processes to arrive at a common view of a multimedia session using the SDP protocol. SIP is one such protocol, and it serves as the original inspiration for defining the common features of the offer/answer family. For the purposes of this book, we can assume that the protocol in question is indeed SIP.

ICE is defined in RFC 5245.84 Its ultimate objective is to find the two transport addresses best suited for establishing an RTP/RTCP session between two peers. The peers are assumed to communicate with each other through SIP, which has its own means of traversing NATs. (Of course, the transport addresses used for this signaling are likely to differ from those best suited to carrying the media stream over UDP.) The candidate transport addresses are exchanged through SDP. ICE achieves its objective by accumulating the transport addresses, testing them for connectivity, and selecting the best-performing ones.

Altogether there are three types of addresses, as shown in Figure 5.31:

Diagram shows three different address types: the local (transport) address, the server-reflexive address obtained from a STUN server, and the relayed address obtained from a TURN server.

Figure 5.31 Candidate transport addresses (after Figure 2 of RFC 5245).

  1. Local address (i.e., a transport address associated with a host's network interface card);
  2. Server-reflexive address (i.e., an address obtained from a STUN server on the other side of a NAT box); and
  3. Relayed address (obtained from a TURN server in response to an Allocate request).

For each address type there may be multiple candidate addresses for a given peer process. Indeed, in the presence of multiple network interfaces (multi-homing) there are as many local addresses as interfaces. As Figure 5.31 demonstrates, there may be several NAT boxes85 on different paths, and so different STUN servers may supply different server-reflexive addresses. Finally, there may be more than one TURN server and, accordingly, as many relayed addresses.

ICE operates in three steps, as depicted in Figure 5.32.

Diagram shows the operations performed by ICE to establish an RTP/RTCP session between two peers: candidates are collected, sorted, and sent to the peer in an SDP offer; connectivity is tested; and the results are shared.

Figure 5.32 ICE operation.

First, the client that initiates the session collects the candidates and sorts them according to a calculated priority (the computation of this priority is sketched below; see the RFC for the full details). The sorted candidates are then sent to the peer in an SDP offer. The peer performs the same operations, so in the end the sorted candidate lists have been exchanged.
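
For the curious reader, the priority computation of RFC 5245 (Section 4.1.2) is simple enough to sketch in a few lines of Python; the type-preference values below are the RFC's recommended defaults, and the candidate addresses are made up.

```python
# Recommended type-preference values from RFC 5245, Section 4.1.2.2
TYPE_PREFERENCE = {
    "host": 126,
    "peer-reflexive": 110,
    "server-reflexive": 100,
    "relayed": 0,
}

def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type-pref + 2^8 * local-pref + (256 - component-id)"""
    return (TYPE_PREFERENCE[cand_type] << 24) \
        + (local_preference << 8) + (256 - component_id)

candidates = [
    {"type": "relayed", "addr": "192.0.2.15:49152"},
    {"type": "server-reflexive", "addr": "198.51.100.2:61000"},
    {"type": "host", "addr": "10.0.0.5:54321"},
]
for c in candidates:
    c["priority"] = candidate_priority(c["type"])

# Higher priority is better, so host candidates end up at the top of the list
for c in sorted(candidates, key=lambda c: c["priority"], reverse=True):
    print(c["addr"], c["priority"])
```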

At this point, each peer keeps checking connectivity for each pair of transport addresses (STUN running on each peer's host can be used for this purpose) and informs its interlocutor about the check results. In the end, both peers have one or more working pairs.

RFC 5245 carefully considers the optimization options and security considerations, especially for connectivity testing.

5.3.4 Carrier-Grade NAT

So far we have tacitly assumed that NAT boxes are deployed at only one layer. This is no longer the case, though: Internet service providers have faced IPv4 address exhaustion, too, especially with the growth of home networking.

Up until the late 1990s (a history described in [15]), a single home computer would simply dial into the provider's network, at which point it was assigned a temporary IP address. This situation has changed drastically: a single home may now have many devices, and all of them are expected to be online all the time.

The addresses for these devices come from the private address space, and the residential gateway provides the NAT service. (Note that it is not only big enterprises or people's homes that are involved, as it is customary for libraries, hotels, bars, and cafés to provide access to the Internet, too.)

For some very large carrier networks, it has become impossible to assign a public IP address to each residential gateway—or enterprise CPE gateway. The solution, depicted in Figure 5.33, has been to still assign each of them a private-space address and to deploy another NAT box (this time with a public IP address) at the border with the public Internet.

Diagram shows the framework of large-scale (carrier-grade) NAT. The three main sections of the framework are an enterprise network, the carrier network, and a residential network. The CG NAT box has a public IPv4 address and connects to the public Internet; the CPE NAT and the home NAT have carrier-private IPv4 addresses.

Figure 5.33 Carrier-grade (large-scale) NAT.

This type of NAT has been called Carrier-Grade NAT (CG NAT or CGN). As it happens, in an industry known for its love of jargon, one name was not enough. Another name denoting the same object is Large-Scale NAT (LS NAT or LSN). And if this were not enough, other terms have been added—playing on the nature of the translation. Common NAT provides IPv4-to-IPv4 translation, and so it has been given the name NAT 44. Since CG NAT deployment effects IPv4-to-IPv4-to-IPv4 translation, it is also called NAT 444. And so we have three names and four additional acronyms—all of which apply to the same thing. And this is not the end yet: because some NAT boxes also provide IPv4-to-IPv6 translation, we also get the name NAT 46. The authors have not seen NAT 446 or NAT 644 yet, but they expect these names to appear, too, especially because CG NAT boxes may deal with IPv6 addresses. Nobody expects NATs to be deployed when the whole world turns to IPv6, and so the apocalyptic potential NAT 666 may very well signify the end of the Internet.

In fact, RFC 6264,86 while acknowledging that “global IPv6 deployment was slower than originally expected,” proposes “an incremental CGN approach [through tunneling] for IPv6 transition. It can provide IPv6 access services for IPv6 hosts and IPv4 access services for IPv4 hosts while leaving much of a legacy ISP network unchanged during the initial stage of IPv4-to-IPv6 migration.”

It is not only the terminology that gets complex here. One essentially new and urgent matter is the need for a new private address space. Just as public IPv4 addresses may not be reused inside NAT-protected networks, so the private addresses already used by the hosts in home or enterprise networks may not be assigned to the gateways and other entities at the next level of translation.

One alarming practice, which emerged in the first decade of this century, was “address squatting.” This simply means using a yet-unregistered part of the fast-depleting IPv4 address space. As pointed out by ICANN's Leo Vegoda in [16]: “Many organizations have chosen to use unregistered IPv4 addresses in their internal networks and, in some cases, network equipment or software providers have chosen to use unregistered IPv4 addresses in their products or services. In many cases the choice to use these addresses was made because the network operators did not want the administrative burden of requesting a registered block of addresses from a Regional Internet Registry (RIR).”

Ultimately, in 2012, the Internet community agreed on the best current practice here, as published in RFC 6598.87 The practice is to extend the private address space beyond what is specified in RFC 1918 by allocating a dedicated IPv4 prefix (100.64.0.0/10) for the Shared Address Space. Specifically responding to the squatting practice, RFC 6598 forbids numbering the CG-NAT second-level interfaces (e.g., those of gateways) from the pool of “usurped globally unique address space (i.e., squat space),” explaining that when a “Service Provider leaks advertisements for squat space into the global Internet, the legitimate holders of that address space may be adversely impacted, as would those wishing to communicate with them.” And even “if the Service Provider did not leak advertisements for squat space, the Service Provider and its subscribers might lose connectivity to the legitimate holders of that address space.”

But what is left to the service provider then? One of two things:

  • Reusing the private address space carefully, so that the same address is never assigned to two entities of which one is inside and the other outside the gateway; or
  • Using the new shared address space, which is the only workable option “in an unmanaged service, where subscribers provide their own CPE and number their own internal network,” since there the first option cannot be enforced.

The new address space has been specifically designated “with the purpose of facilitating CGN deployment.” Subsequently, the American Registry for Internet Numbers (ARIN) adopted the following policy: “A second contiguous /10 IPv4 block will be reserved to facilitate IPv4 address extension. This block will not be allocated or assigned to any single organization, but is to be shared by Service Providers for internal use for IPv4 address extension deployments until connected networks fully support IPv6. Examples of such needs include: IPv4 addresses between home gateways and NAT444 translators.”

And so the depletion of the IPv4 address space has been postponed once again … This has not happened without cost though. RFC 6598 lists several examples where certain services become impossible to provide with CG NAT.

Some of these services appear to be of the type one could live without—like console gaming (where two subscribers sharing the same public IPv4 address try to connect with one another); others include peer-to-peer applications and video streaming. More important services in this category are geo-location (which would identify the location of the CG NAT rather than that of the subscriber) and the “6-to-4” (i.e., IPv6-to-IPv4) transition mechanism, which requires globally reachable IPv4 addresses.

Now it is easy to follow up on our earlier promise to clarify the enigmatic use case for the DMZ, in which an enterprise places its public web server there. Of the two firewalls in Figure 5.20(b), the NAT function is implemented in the one that separates the trusted zone from the DMZ. No public server can possibly function behind the NAT, and so the DMZ is the only place (still protected by a firewall) in the enterprise where such a server can be located. (Similarly, in the case of Figure 5.20(a), the NAT function is placed so that it does not interfere with the traffic between the DMZ interface and the public access interface.)

We are now ready to move to the next function that is often implemented in firewalls—load balancing.

5.4 Load Balancers

The concept of load balancing is very intuitive. This is how one chooses the most appropriate resource (out of several identical resources) to perform a given function. Imelda Marcos, for example, has been known for her huge collection of shoes (about 3000 pairs), part of which is displayed in the National Museum of the Philippines. We surmise that she would select a pair considering such factors as the time of day, the weather, and the outfit she planned to wear.88

A similar example comes from telephony. We have chosen it because the service is still widely in use and because it illustrates virtually all the uses of load balancing in IP networks. The example is the so-called 800 (also known as Freephone) service, described in detail in [17] and illustrated in Figure 5.34.

Diagram shows how load balancing is performed when choosing a call center with the 800 service.

Figure 5.34 A load balancing example: choosing a call center with the 800 service.

When a person calls a company's customer service, the dialed number indicates a special arrangement: the call is free to the caller, and the called party (in this case, the company) is billed for it instead. In the USA, such numbers are assigned special prefixes—the 800 prefix was the first one to appear when AT&T rolled out the service in the early 1980s—to signal that they should be treated differently from “normal” numbers.

When such a number is dialed, the telephone switch immediately knows that it cannot route the call on its own and that it needs a special instruction to process it.89 Hence, the switch turns for instructions to a computer called the Service Control Point (SCP). (All the signaling data among the switches and the SCP flow through a separate data communications network operating on the Signalling System No. 7 protocols standardized by the ITU-T.)

The SCP, in turn, invokes a custom-written service logic program, which makes the translation. In our example, the program looks at the time of the call to decide which call center should be reached. Since the time of day happens to be 5:30 PM, the program figures that the center in New Jersey will be closed for the day, and therefore the most suitable center is the one in California (where it is 2:30 PM). Hence the route to the on-premises call center switch in California is specified. For calls arriving three hours later, the call center in Bangalore will be specified for the next eight hours, and after that the New Jersey call center again. The sun never sets over the company! Note that the translation is handled dynamically, according to the service logic program. (Also note that this example is by no means that of an antiquated technology—Google has been providing a follow-me variant of this service as part of Google Voice.90)

In this example, each call is assigned to a call center based on the center's availability. But with the Intelligent Network (IN) it is also possible to choose one of several call centers based on the distribution of the load: calls will be routed to a center that has been handling fewer calls than the others. To this end, the calls can be distributed among the call centers statistically, according to assigned weights. In the extreme case, the SCP can be used to protect the telephone network from overload by applying call gapping, that is, dropping a defined percentage of calls (in which case the caller hears a fast “busy” signal).

The load-balancing features described above are pretty much uniform across multiple environments. The rest of this section describes load balancing in a modern server farm, provides a practical example of implementing and deploying load balancing, and discusses the use of DNS for the purposes of load balancing.

5.4.1 Load Balancing in a Server Farm

As it happens, the above set of features is pretty much the same as that used in the modern server farm depicted in Figure 5.35. The servers are processes (each possibly running on a dedicated host) that execute tasks in response to clients' requests. World Wide Web servers (returning web pages) were perhaps the first widely used example, followed by SIP servers (for IP telephony), DNS servers, and so on. A load balancer receives each request and forwards it to the server it selects.

Diagram shows the components of a server farm which consist of a set of back-end servers that execute the requests from a client via a load balancer.

Figure 5.35 A server farm.

The back-end servers don't necessarily have to be in the same location; in fact, they can be geographically dispersed (in which case, of course, their collection won't be called a “farm”). It is not even necessary for the back-end “servers” to be servers at all: load balancers can be implemented in the routers at the edge of a network, in which case they distribute traffic over spare links.

A few major benefits of this arrangement are as follows:

  1. Speeding up processing (using parallelism—all servers are capable of performing every specified task);
  2. Improving reliability (if one or more servers go down, the performance degrades gracefully, being restored when the servers are brought back); and
  3. Supporting scalability (with highly parallel tasks, adding new servers increases performance nearly linearly).

The last two items are essential Cloud Computing characteristics, which makes the usefulness of load balancers in Cloud Computing obvious.

There is a benefit for security, too—somewhat similar to that provided by NAT—in that the internal structure of the back-end operation is hidden from outside. When a load balancer is implemented (as is often the case) within the firewall, it can actually terminate a security session, thus saving the back-end servers from maintaining the security state. More important, load balancers are used to mitigate various denial-of-service attacks, notably the SYN attack—by implementing the SYN cookies and delayed binding.

As we can see already, the nature of a load balancer is dual: it is both a middle-man and a scheduler. The latter is the most essential function of a load balancer, and so we will concentrate on that in the rest of this section.

The first pertinent matter here is choosing a scheduling algorithm. There are services for which a back-end server can be selected randomly, with the proportion of tasks assigned to each back-end server specified statistically (e.g., through weights). The geographic (or network-topological) location of a back-end server can be yet another factor, which may be critical for some applications. A scheduling algorithm can also rely on a feedback mechanism: a load balancer can monitor the load and the health of each server and adjust the task assignments accordingly.
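
To make the notion of statistical (weighted) scheduling concrete, here is a small Python sketch of two common policies, weighted random selection and weighted round-robin; the server names and weights are, of course, arbitrary.

```python
import itertools
import random

SERVERS = {"server1": 3, "server2": 1, "server3": 1}  # name -> weight

def pick_weighted_random(servers=SERVERS):
    """Each request goes to a server with probability proportional to its weight."""
    names, weights = zip(*servers.items())
    return random.choices(names, weights=weights, k=1)[0]

def weighted_round_robin(servers=SERVERS):
    """Cycle through the servers, repeating each one according to its weight."""
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(expanded)

rr = weighted_round_robin()
print([next(rr) for _ in range(10)])                  # deterministic schedule
print([pick_weighted_random() for _ in range(10)])    # statistical distribution
```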

So far, we have tacitly assumed that the type of task suitable for load balancing is a one-time request/response transaction. Yet load balancing has been made to work with session-oriented applications as well. The major issue here is where to keep the state of a session, so that a newly assigned back-end server knows the exact context of the request that has arrived.

Some application protocols—notably HTTP—store the entire state of the session at the client. (Interestingly, this mechanism was apparently developed not with load balancing in mind but to solve a scalability problem: when the number of clients is expected to be very large, it becomes impractical or even impossible to store the server-side state of each session on the server. Instead, the protocol forces the client to store that state and then present it to the server in the next message,91 as depicted in Figure 5.36.)

Diagram shows the server receives request x from a client and the client receives response, <state x> from the server. Finally, the server receives request y, <state x> from the client.

Figure 5.36 Saving session state at the client (a cookie).

The body of the state information is called a cookie. This is a core mechanism enabling many web applications. Consider shopping online—moving from page to page, putting things in your basket, and so on. The server does not keep the history of the browsing. Instead, it returns a cookie to the client, which the latter is expected to present with its next request. Once the server receives the cookie, it restores the state. Incidentally, this explains why a cookie should be cryptographically protected (lest a rogue client fool the server by modifying the cookie to indicate that a payment has already been made when in fact there was none) and encrypted (to protect both the privacy of the customer and the business interests of the online enterprise).
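
The following Python sketch illustrates the point about cryptographic protection: the server attaches a keyed hash (HMAC) to the state it hands out, so a modified cookie is detected on return. It is only a sketch; real deployments would also encrypt the state, rotate keys, and guard against replay.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"server-side secret"  # known only to the server

def make_cookie(state: dict) -> str:
    payload = json.dumps(state, sort_keys=True)
    tag = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "|" + tag

def restore_state(cookie: str) -> dict:
    payload, tag = cookie.rsplit("|", 1)
    expected = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cookie has been tampered with")
    return json.loads(payload)

cookie = make_cookie({"basket": ["book"], "paid": False})
print(restore_state(cookie))                  # the server restores the state

tampered = cookie.replace('"paid": false', '"paid": true')
try:
    restore_state(tampered)
except ValueError as error:
    print("rejected:", error)                 # tampering is detected
```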

There are applications, however, that do not use the cookie mechanism. In this case, in order to enable load balancing, the session state is stored in a central database. This, of course, creates a significant reliability problem, since the database becomes a single point of failure.

There is ongoing research on load balancers, particularly their use in assuring self-healing properties of the network. One interesting approach (demonstrated in [18]) is based on analogies with chemistry. For server load balancing, [19] is a comprehensive monograph.

There is no mystery to load balancing, as the next section will attest.

5.4.2 A Practical Example: A Load-Balanced Web Service

In practice, it is quite simple to create and deploy load balancing in the Cloud using open-source software.92 In the example that follows, we use the World Wide Web service and, more specifically, the Nginx93 server software. We recommend that the reader try this as a practical exercise, using, for example, Amazon's EC2 service.

Our task is to create a website maintained on four identical servers, which are load-balanced by a fifth server, as shown in Figure 5.37. To achieve this, we can create five virtual machines running the Linux operating system as Amazon EC2 instances and deploy the Nginx server on each of them. One machine will use its Nginx server to act as a load balancer for the other four.

Diagram shows an Nginx-based load-balanced web service. It shows an HTTP client browser connected to a load balancer through request or response, and the load balancer is connected to four identical virtual servers.

Figure 5.37 An example of an Nginx-based load-balanced web service.

For each of the four servers, the index.html file (located in the directory /usr/share/nginx/html) should be updated with text identifying the server, as demonstrated in Figure 5.37 (with the string SID being replaced by Server1, Server2, Server3, or Server4).

The only thing left to do (and this is the most interesting thing) is to configure the load balancer. This turns out to be as straightforward as configuring the server. The load balancer's configuration file is located at /etc/nginx/nginx.conf. Figure 5.38 contains the sample code.

In our example, the weight of each server has the value 1, which means that at run time the traffic will be distributed equally among the four servers. The weight values can, however, be assigned arbitrarily, and we recommend that the reader experiment with assigning the servers different proportions of the traffic by setting the weights correspondingly. To cause the new configuration to take effect, one needs to execute the shell command /etc/init.d/nginx reload.
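
One simple way to see the load balancer in action is to fetch the front-end page repeatedly and tally which back-end server answered, relying on the server id placed in each index.html. The Python sketch below does just that; the front-end address is a placeholder for the public address of the balancer instance.

```python
from collections import Counter
from urllib.request import urlopen

BALANCER_URL = "http://203.0.113.10/"   # placeholder for the balancer's public address
SERVER_IDS = ("Server1", "Server2", "Server3", "Server4")

def tally(requests: int = 100) -> Counter:
    counts = Counter()
    for _ in range(requests):
        body = urlopen(BALANCER_URL).read().decode("utf-8", errors="replace")
        for sid in SERVER_IDS:
            if sid in body:
                counts[sid] += 1
                break
    return counts

print(tally())   # with equal weights, expect a roughly even split
```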

5.4.3 Using DNS for Load Balancing

It is possible to achieve load balancing even without employing a dedicated front-end server. The load balancing in this case is performed by the DNS translation function, which returns the IP address of a specific server within the farm. There are two aspects to the DNS-supported load balancing: (1) specifying the addresses of the servers and (2) specifying the load-balancing algorithm.

We will deal with the former aspect first. There are (at least) two ways to specify the addresses, which are presented in Figure 5.39.

In both cases we deal with an abstract service, <service>, which could be any application service (www, ftp, mail, sip, etc.) supported by both the client and the DNS server. Figure 5.39(a) shows the response record set, which translates serv.example.com into a list of four IP addresses. The alternative, depicted in Figure 5.39(b), is to translate serv.example.com into a list of authoritative servers for four zones, each zone being administered by the very server that is to pick up the load.94 The latter approach slows things down, but it has an advantage in dealing with overload: if a server is too busy, it won't respond to the DNS query and so will appear non-existent.


Figure 5.38 Configuring the load balancer.

The left column depicts the alternative-records approach, in which the DNS server is connected to the client through a query/response exchange; the right column shows the alternative-zones approach, which includes the four virtual servers.

Figure 5.39 Load balancing with DNS.

As far as the algorithm for selecting the return order of multiple records is concerned, it is defined—in the BIND implementation95—by the rrset-order specification. Specifically, the ordering attribute may take the values fixed, random, or cyclic (the default). When the ordering is fixed, records are returned exactly in the order in which they are defined; when it is random, the order is random; when it is cyclic, the order is permuted cyclically—resulting in a round-robin schedule.
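
The effect of the cyclic ordering can be observed from the client side with a few lines of Python (assuming the dnspython library and the serv.example.com name used above); note that an intermediate caching resolver may reorder or cache the records, which anticipates the problem discussed next.

```python
import dns.resolver  # from the dnspython package

def first_address(name: str = "serv.example.com") -> str:
    answers = dns.resolver.resolve(name, "A")
    # A naive client simply takes the first record it is given
    return next(iter(answers)).address

for _ in range(4):
    # With cyclic (round-robin) ordering, and no caching in between,
    # successive queries start the list with a different server
    print(first_address())
```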

One problem with DNS-supported load balancing is caching: a cache returns the same answer for the duration specified by the TTL parameter. An extreme way of dealing with this problem is to set the TTL value to 0, but that would increase the load on the DNS servers. The cache problem is perhaps the most serious one to deal with. A number of problems are also caused by server availability (or rather the lack thereof): if a server is unavailable, the DNS server may still return its address. (Associating each server with its own zone, as described earlier, disposes of exactly this problem.) Even when all servers are up, the DNS server, being unaware of the servers' respective loads, may provide only request distribution rather than true load distribution.

To overcome this difficulty, some DNS servers have been enhanced to interact96 with each server to gather the information on its respective load and then make scheduling decisions based on that information. An earlier (ca. late 1990s) implementation of this type of server is the Lbnamed DNS server developed by Rob Riepel of Stanford University. The documentation, software distribution, and accompanying tutorial are available at www.stanford.edu/∼riepel/lbnamed/.

With time, DNS load-balancing capabilities have evolved along the lines of intelligent network processing (cf. Figure 5.34)—guided by dedicated service logic. The latter is invoked at the DNS server when a query is received, and it is capable of taking many factors into account, including the geographic location of the client that issued the request (so as to find the server that is closest to the client).97 (In a private conversation with an Akamai representative at a conference, we were told that “whatever IN considered we have in our DNS servers!”) This confirms once more that solid concepts are invariant to technological fashions—what was good for the PSTN is still good for the Internet.

There are relevant products on the market. For an example98 confirming the above observation, we refer the reader to a White Paper [20].

Notes

References

  1. Nabokov, V.V. (1979) The Gift. Collins Collector's Choice: Five Novels. Collins, London, p. 445.
  2. Gabrilovich, E. and Gontmakher, A. (2002) The homograph attack. Communications of the ACM, 45(2), 128.
  3. Son, S. and Shmatikov, V. (2010) The hitchhiker's guide to DNS cache poisoning. Proceedings of the 6th International ICST Conference on Security and Privacy in Communication Networks (SecureComm), Singapore, September, pp. 466–483.
  4. Goldsmith, J.L. and Wu, T. (2006) Who Controls the Internet? Illusions of a Borderless World. Oxford University Press, Oxford.
  5. Scarfone, K. and Hoffman, P. (2009) Guidelines on Firewalls and Firewall Policy. Recommendations of the National Institute of Standards and Technology, NIST Special Publication 800-41, Revision 1. US Department of Commerce, Gaithersburg, MD.
  6. Avolio, F. (1999) Firewalls and Internet security, the second hundred (Internet) years. The Internet Protocol Journal, 2(2), 24–32.
  7. Bellovin, S.M., Cheswick, W.R., and Rubin, A.D. (2003) Firewalls and Internet Security: Repelling the Wily Hacker, 2nd edn. Addison-Wesley Professional, Boston, MA.
  8. Eichin, M.W. and Rochlis, J.A. (1989) With microscope and tweezers: An analysis of the Internet virus of November 1988. www.mit.edu/people/eichin/virus/main.html (presented at the 1989 IEEE Symposium on Research on Security and Privacy).
  9. United States General Accounting Office (1989) Virus highlights need for improved Internet management. Report to Chairman, Subcommittee on Telecommunications and Finance, Committee on Energy and Commerce, House of Representatives. IMTEC-89-57, June 12. www.gao.gov/assets/150/147892.pdf.
  10. Pólya, G. (1945) How to Solve It. Princeton University Press, Princeton, NJ.
  11. Eddy, W.M. (2006) Defenses against TCP SYN flooding attacks. The Internet Protocol Journal, 9(4), 2–16.
  12. Nietzsche, F.W. (1886) Beyond Good and Evil. Cited from the 2006 Filiquarian Publishing edition, New York, p. 82.
  13. Dutcher, B. (2001) The NAT Handbook. John Wiley & Sons, Inc., New York.
  14. Zhang, L. (2007) A retrospective view of NAT. IETF Journal, 3(2), 14–20.
  15. Faynberg, I., Lu, H.-L., and Gabuzda, L. (2000) Converged Networks and Services: Internetworking IP and the PSTN. John Wiley & Sons, Inc., New York.
  16. Vegoda, L. (2007) Used but unallocated: Potentially awkward/8 assignments. The Internet Protocol Journal, 10(3), 29–33.
  17. Faynberg, I., Gabuzda, L.R., Kaplan, M.P., and Shah, N. (1996) The Intelligent Network Standards: Their Applications to Services. McGraw-Hill, New York.
  18. Meyer, T. and Tschudin, C. (2009) A Self-Healing Load Balancing Protocol and Implementation. Technical Report CS-2009-001, University of Basel.
  19. Bourke, T. (2001) Server Load Balancing. O'Reilly & Associates, Sebastopol, CA.
  20. Elfiq Networks (2012) Application and Service Delivery with the Elfiq iDNS Module. Technical White Paper, Elfiq Inc., Montreal. www.elfiq.com/sites/default/files/elfiq_white_paper_idns_module_v1.63.pdf.