Appendix B
IPv4 PACKET

B.1 Introduction

IP (i.e., IPv4 and IPv6) is a network layer protocol that provides connectionless services to upper-layer protocols (e.g., TCP (Transmission Control Protocol), UDP (User Datagram Protocol), Internet Control Message Protocol (ICMP), Internet Group Management Protocol (IGMP), Open Shortest Path First (OSPF)). The connectionless service is implemented with the basic unit of transport being the datagrams (frequently referred to as packets) that contain the source and destination IP addresses of end-user points and other parameters needed for protocol operation.

The IP layer relies on the underlying network infrastructure for transporting the IP datagrams (packets). This means that IP datagrams are encapsulated (in some cases after segmentation into smaller units) by the frames of the underlying network such as Ethernet, ATM (Asynchronous Transfer Mode), and SONET/SDH (Synchronous Optical Network/Synchronous Digital Hierarchy). Regardless of the physical layer protocol mix, the maximum transmission unit (MTU) sizes of interfaces, and the speed differences of the links in the underlying networks, the IP layer translates these different underlying networks into a common logical IP network that is independent of physical characteristics and differences.

The upper-layer protocols such as TCP, UDP, ICMP, IGMP, and SCTP (Stream Control Transmission Protocol) need not be aware of the hardware, encapsulation methods, and other characteristics of the underlying network. Upper-layer protocols may expect some levels of quality of service (QoS) from the IP layer during data delivery, such as throughput, delay, and delay variation. These are often referred to as QoS parameters to characterize the nature of the data delivery.

In some cases, the upper layers may pass the expected QoS along with the data to the IP layer. The IP layer may support mechanisms that enable it to map the QoS parameters to services provided by the underlying network infrastructure. The underlying network may or may not support capabilities to provide the QoS demanded or expected by the upper layer.

The basic IP service is based on “best effort” delivery of data. That is, IP does not guarantee that packets received will be delivered to the next node or final destination; it only tries its best to reach the next node or destination. IP also does not guarantee that packets arrive in the proper sequence, nor does it protect against the duplication of packets sent or delivered. Upper-layer transport protocols, such as TCP and SCTP, are responsible for handling sequencing, duplication, and other data integrity issues.

The Internet Protocol version 4 (IPv4), described in IETF (Internet Engineering Task Force) publication RFC 791 (September 1981), is the fourth version or iteration of IP when tracing the development of this protocol. IPv4 was the first version of the protocol to be widely deployed and is the network protocol that currently drives the majority of today's enterprise networks, service provider networks, and the Internet.

The previous chapter gives a description of the Ethernet frame format and the different fields that make up a frame. This chapter provides a description of the IPv4 header and its corresponding fields. IPv6, the successor protocol to IPv4, is not discussed in this chapter.

B.2 IPv4 Packet Format

IP receives data segments from upper-layer protocols (e.g., TCP (protocol number 6), UDP (protocol number 17), SCTP (protocol number 132), ICMP (protocol number 1), IGMP (protocol number 2), OSPF (protocol number 89)) and formats them into IP packets. The IP layer receives the data units from the upper-layer protocols and adds its own header information.

The received data from the upper layer constitute the payload in the IP packet. The IP header contains all the information needed to route the packet through the IP network node to node from the source to the destination (unicast transmission) or destinations (multicast transmission). The IP packets created are then encapsulated by the protocol frames of the underlying network (e.g., Ethernet) as illustrated in Figure B.1.

Figure B.1 IPv4 Packet carried by a link-layer frame.

B.2.1 IPv4 Header

The IPv4 packet header is placed at the front of every IPv4 packet created. The header consists of 14 fields, one of which is optional. The header is normally 20 bytes in length, but sometimes can carry additional options in a variable length field located after the Destination Address field (Figure B.2). The fields in the IP header are formatted with the most significant byte written first (Big Endian), and within the byte, the most significant bit is written first. For example, the Version field in the IP header is located in the four most significant bits of the first byte of the packet. The IP header and the fields it contains are described below:

  • Version: This 4 bit field specifies the version number of the Internet Protocol used, which also indicates the format of the IP packet header. The header carries other important information other than the version number of the Internet Protocol used. In the context of the protocol discussed here, this field carries a value of 4. The Version field in IPv6 packets contains the value 6.
  • Internet Header Length (IHL): This 4 bit field specifies the length of the entire IP header (including the length of the data in the Options field, when present). The IHL specifies the length of the IP packet header in 32 bit (4 byte) words – that is, the number of 32 bit words in the IP header (Figure B.2). The minimum value for a valid header is 5, which gives a 20 byte (5 × 32 bits = 160 bits) header (when no options are carried). The maximum IHL value possible is 15, which gives a packet header length of 60 bytes (15 × 32 bits = 480 bits). The IHL value of 15 allows the header to carry a maximum of 40 bytes of IP options.
  • Differentiated Services Code Point (DSCP): This 6 bit field is defined in [RFC2474] and occupies part of the obsoleted 8 bit type of service (TOS) field. The six most significant bits (of the obsoleted TOS field) are used for the DSCP and the remaining 2 bits for the ECN (discussed below). The old TOS field (updated in [RFC1349]) was used to specify the parameters describing the type of service requested by upper layers. The parameters may be utilized by networks to define how packets from an upper layer client should be handled during transport. The “M” bit was added to the TOS field in [RFC1349].
  • Explicit Congestion Notification (ECN): This 2 bit field, defined in [RFC3168], carries information about the state of congestion along the route taken by the IP packet from source to destination. ECN is optional and is only effective when the underlying network supports capabilities that allow for the signaling/notification of network congestion state end-to-end. ECN can be used only when the endpoints and the intermediate network support ECN processing capabilities. When ECN is used, packets are not dropped (by the intermediate network nodes between the endpoints), but instead the ECN fields in the packets are “marked” with congestion notification information.
  • Total Length: This 16 bit field specifies the total length (in bytes) of the entire IP packet, including the IP header and the IP payload. The minimum IP packet length is 20 bytes (i.e., 20 bytes of IP header plus 0 bytes of payload data), and the maximum length is 65,535 bytes (i.e., 2^16 – 1, since only 16 bits are available in this field to specify the total packet length). All network links are expected to be able to handle IP packets of at least 576 bytes, but a more typical packet size is 1500 bytes (the MTU of standard Ethernet). Note that a fragment of an IP packet is considered a complete IP packet in itself by downstream nodes (since it has its own IP header).
  • Identification: This 16 bit field is used to distinguish the fragments of one IP packet from those of another (different) IP packet. If an IP packet is fragmented by a node during transport, all its fragments are assigned the same identification number to enable the destination to identify the original IP packet they belong to. The network node that is the source of the packet sets the identification field to a value that must be unique for that source–destination pair and for the time the packet will be alive/active in the network.
  • Flags: This is a 3 bit field that is used during packet fragmentation. If an IP packet is too large to be transmitted over a link (as dictated by network and link characteristics), the flags indicate whether or not the packet can be fragmented. A fragment is part of an IP packet but is given its own IP header. In the 3 bit field, the MSB is always set to 0. The bits of the Flags field are defined as follows (from high- to low-order bit):
    • 1 bit R (Reserved) Flag: This should be set to 0.
    • 1 bit DF (Don't Fragment) Flag: This controls the fragmentation of the IP packet (0 = Fragment if necessary; 1 = Do not fragment).
    • 1 bit MF (More Fragments) Flag: This indicates if a fragment has additional fragments after it (0 = This is the last fragment; 1 = More fragments follow this fragment).

    When a complete IP packet leaves its source, the source clears the MF (More Fragments) bit to zero and the Fragment Offset field to zero before transmitting the packet.

  • Fragment Offset: This 13 bit field carries an offset that specifies the exact position of the fragment in the original IP Packet. The offset is measured in units of 8 byte blocks (64 bits) and specifies the offset of a particular fragment starting from the leading end of the original unfragmented IP packet. This information allows the receiving endpoint to properly reassemble the fragments to obtain the original IP packet.
  • Time to Live (TTL): This 8 bit field is a timer field (that carries a value) used to track the lifetime of the IP packet as it travels through the network. When the TTL value is decremented down to zero at a network node, the packet is discarded. To avoid packets traveling in loops in the network, every packet is sent with the TTL field set to an initial value, which indicates to the network how many routers (hops) the packet is allowed to cross. Typically, the TTL value is set to an integer value indicating the number of hops a packet can cross. At each hop, this integer value is decremented by 1 and when the value reaches 0, the packet is discarded.
  • Protocol: This 8 bit field specifies the upper-layer protocol that is the recipient of the IP packet payload. It indicates to the IP (Network) layer at the destination host to which next layer protocol the packet payload should be sent, that is, the client protocol. For example, the protocol number of ICMP (Internet Control Message Protocol) is 1, IGMP (Internet Group Management Protocol) is 2, TCP is 6, UDP is 17, and SCTP (Stream Control Transmission Protocol) is 132.
  • Header Checksum: This 16 bit field carries a checksum value computed over the entire IP header (only). This allows a receiving IP node to check if the header is received error-free. The checksum is the 16 bit one's complement of the one's complement sum of all 16 bit words in the IP header (including all IP options), with the checksum field itself taken as zero during the computation. When an IPv4 packet arrives at an IP node (router), the router calculates the checksum of the received header and compares this with the value carried in the checksum field. If the checksum values do not match, the router discards the packet. Modifying any part of the IP header at a node (e.g., TTL) requires recalculation of the IP checksum. Both UDP and TCP have their own checksum fields that allow the data they receive to be checked for errors. This means errors in the IP packet Data field are to be handled by the upper-layer protocol.
  • Source Address: This 32 bit field contains the IP address of the sender (or source) of the IP packet. The source IP address may be changed in transit by a network address translation device.
  • Destination Address: This 32 bit field contains the address of the receiver (or destination) of the IP packet. This address may also be changed in transit by a network address translation device.
  • Options: This is an optional field (0–40 bytes in length), which is used if the value of IHL is greater than 5 (but not greater than 15) 32 bit words. The Options field may contain values for options such as security, record route, and time stamp. The Options field is variable in size and, when used, increases the overall length of the IP header. This means the IHL will only ever be greater than 5 if there are IP header options. The maximum size of the IP options is obtained as follows: (15 – 5) × 32 bits = 320 bits = 40 bytes.
  • Data: The contents of this field are interpreted based on the client protocol (i.e., value carried in the IP header Protocol field) (Figure B.2). Since the IP header is at least 20 bytes, the maximum payload (data carried in Data field) is limited to 65,535 bytes – 20 bytes = 65,515 bytes. If the maximum number of IP options (40 bytes) is carried, the header grows to 60 bytes and the maximum payload becomes 65,535 bytes – 60 bytes = 65,475 bytes. The IPv4 Header is typically followed by an ICMP header, IGMP header, or a Transport Layer (UDP, TCP or SCTP) header, which, in turn is usually followed by client protocol data. The number of bytes in the client protocol data field is the value of the IP Total Length minus the length of any other headers (ICMP or Transport Layer header), minus the IP header length (IHL × 4 bytes, usually 20 bytes). A UDP datagram might have a 20 byte IPv4 packet header, an 8 byte UDP Transport Layer header, and 500 bytes of data, resulting in an IP Total Length of 528 bytes. A variable length “Padding” is used by the IP layer as a redundant filler to ensure that the data in the IP packet start on a 32 bit boundary. Note that the data portion of the IP packet is not included in the IP header checksum calculation.
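The interplay of the Flags and Fragment Offset fields described above can be illustrated with a small Python sketch. It is not taken from any particular IP stack: the function name `fragment_payload` and its parameters are hypothetical, and it assumes every fragment carries a header of the same length (20 bytes by default, i.e., no options).

```python
def fragment_payload(payload_len, mtu, header_len=20):
    """Plan how a payload would be split into fragments for a given MTU.

    Returns a list of (fragment_offset, data_len, more_fragments) tuples,
    where fragment_offset is expressed in 8 byte blocks, as in the header.
    Hypothetical helper; assumes every fragment uses the same header length.
    """
    # Data carried per fragment must be a multiple of 8 bytes (except in the
    # last fragment), because Fragment Offset counts 8 byte blocks.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    if payload_len == 0:
        return [(0, 0, False)]  # unfragmented packet with empty payload

    fragments = []
    offset = 0  # byte offset into the original payload
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more_fragments = offset + size < payload_len  # MF flag
        fragments.append((offset // 8, size, more_fragments))
        offset += size
    return fragments
```

For a 4000 byte payload and a 1500 byte MTU, this plan yields three fragments carrying 1480, 1480, and 1040 bytes, with Fragment Offset values 0, 185, and 370, and with the MF flag cleared only on the last one.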

Figure B.2 IPv4 Packet format.
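As a rough illustration of the header layout, the following Python sketch unpacks the fixed 20 byte part of an IPv4 header and verifies the Header Checksum. The function names and the dictionary keys are assumptions made for this example; only the field positions and sizes come from the description above.

```python
import ipaddress
import struct


def internet_checksum(header: bytes) -> int:
    """16 bit one's complement of the one's complement sum of 16 bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16 bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the IPv4 header fields (network byte order, i.e., Big Endian)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F          # low 4 bits of the first byte
    header_len = ihl * 4          # IHL counts 32 bit words
    if ihl < 5 or len(packet) < header_len:
        raise ValueError("invalid IHL or truncated header")
    return {
        "version": ver_ihl >> 4,  # high 4 bits of the first byte
        "ihl": ihl,
        "dscp": tos >> 2,         # six most significant bits of old TOS byte
        "ecn": tos & 0x03,        # two least significant bits
        "total_length": total_length,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,  # in 8 byte blocks
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "options": packet[20:header_len],
        # Summing a valid header, checksum field included, yields zero.
        "checksum_ok": internet_checksum(packet[:header_len]) == 0,
    }
```

A receiver can verify a header by running the same sum over all header words, including the stored checksum: a result of zero indicates that no error was detected.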

Differentiated Services introduces the notion of per hop behaviors (PHBs) that define how traffic (packets) belonging to a particular behavior aggregate (i.e., packets considered to have the same forwarding characteristics) is handled at an individual network node [RFC3140]. PHBs are not indicated directly in IP packet headers; instead, the DSCP values carried in the header can be used in implementing the PHBs in a network node. The 6 bit DSCP field allows for 64 possible DSCP values, but there is no limit on the number of PHBs a node can implement.

Any given network domain can implement its own mechanisms for defining locally the mapping between DSCP values and PHBs. There are standardized PHBs with recommended DSCP mappings, but network operators may choose to implement suitable alternative mappings.
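The relationship between DSCP values and PHBs can be sketched as a lookup table with local overrides. The code point values below are the recommended ones for the standardized Class Selector (CS), Assured Forwarding (AF), and Expedited Forwarding (EF) PHBs; the function names, the `local_map` parameter, and the choice to fall back to the default PHB for unmapped values are assumptions of this example.

```python
def build_standard_phb_table() -> dict:
    """Recommended DSCP values for the standardized PHBs."""
    # Class Selector code points: CS0 (the default PHB) through CS7.
    table = {8 * n: f"CS{n}" for n in range(8)}
    # Assured Forwarding: four classes, three drop precedences each.
    table.update({8 * c + 2 * d: f"AF{c}{d}"
                  for c in range(1, 5) for d in range(1, 4)})
    table[46] = "EF"  # Expedited Forwarding
    return table


def phb_for_dscp(dscp: int, local_map: dict = None) -> str:
    """Map a 6 bit DSCP value to a PHB name.

    local_map is a hypothetical per-domain override table; unmapped
    DSCP values fall back to the default PHB (an assumption here).
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6 bit value (0-63)")
    if local_map and dscp in local_map:
        return local_map[dscp]
    return build_standard_phb_table().get(dscp, "CS0")
```

Because the mapping is a table local to each node or domain, an operator can redefine any of the 64 code points without changing the packet format.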

B.3 IPv4 Addressing

A MAC address is an identifier associated with a physical or virtual/logical interface on an Ethernet device or on devices using technologies such as Token Ring and FDDI. In the typical case, the MAC address is permanently stored in a network interface adapter to uniquely identify the interface in a network. MAC addresses belong to and are used at the Data Link Layer, while IP addresses belong to and are used at the Network Layer. The IP address of a device interface may change as the device is moved in a network to different IP subnets or VLANs or when it powers up (in a network with a Dynamic Host Configuration Protocol (DHCP) server), but the MAC address remains the same, because it is associated with the device interface.

The 32 bit Source Address field (which contains the IPv4 address of the packet source) may be modified by a NAT (Network Address Translation) device. Also, a source IP address can be all 0s (the unspecified IP address) in certain cases, but it can never be a multicast address (which identifies only a group of multicast receivers). The 32 bit Destination Address field (which contains the IPv4 address of the packet's receiving endpoint) may also be modified by NAT in a reply packet. This returned IP address can be a unicast or multicast IPv4 address, but it can never be all 0s (the unspecified IP address).

For convenience and to make IP addresses more readable, 32 bit IPv4 addresses are often written in the dotted-decimal notation. This consists of splitting the 4 byte address into four groups, each with 1 byte. Each octet is then expressed individually in decimal (taking values from 0 to 255), and the octets are separated by periods (.). Within an address octet, the rightmost bit represents 2^0 (or 1), increasing to the left to the first bit in the octet, which represents 2^7 (or 128). The following are IP addresses expressed both in binary and dotted-decimal formats:

  1. 00110011 11001100 00111100 00111011 = 51.204.60.59
  2. 11010000 01100010 11000000 10101010 = 208.98.192.170
  3. 01110110 00001111 11110000 01010101 = 118.15.240.85
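The conversion between the two formats is mechanical and can be sketched in a few lines of Python. The function names are chosen for this example only, and the binary form is assumed to be four 8 bit groups separated by spaces, as in the list above.

```python
def bits_to_dotted_decimal(bits: str) -> str:
    """Convert 'bbbbbbbb bbbbbbbb bbbbbbbb bbbbbbbb' to dotted decimal."""
    octets = bits.split()
    if len(octets) != 4 or any(len(o) != 8 for o in octets):
        raise ValueError("expected four groups of 8 bits")
    # int(o, 2) weights the rightmost bit as 2**0 and the leftmost as 2**7.
    return ".".join(str(int(o, 2)) for o in octets)


def dotted_decimal_to_bits(address: str) -> str:
    """Convert dotted decimal back to four space-separated 8 bit groups."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four octets in the range 0-255")
    return " ".join(format(p, "08b") for p in parts)
```

Applying `bits_to_dotted_decimal` to the three binary strings above reproduces 51.204.60.59, 208.98.192.170, and 118.15.240.85.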

A 32 bit IPv4 address, in general, is organized in two primary parts: the network prefix and the host identifier or number. Based on this, IPv4 addresses are organized as two groups of bits in the address. The first group contains a portion of the most significant bits and constitutes the network address (or network prefix or network block). This part identifies a whole network or subnet.

The remaining part made up of the least significant bits forms the host identifier. This part specifies a particular interface of a host on that network. This distinction between network prefixes (subnetworks or subnets) specified in the IP address is the basis of traffic routing between IP networks and subnets. The network part (prefixes), which can be of variable length, is also the basis on which IP address allocation policies are developed.

All IP (host) interfaces within a single network or subnet have the same network prefix. This picture gets complicated with the use of supernetting (also called prefix aggregation, route aggregation, or route summarization) [RFC1519], which is not discussed in detail in this book. Each individual interface (host) within the network/subnet also has its own identifier (host identifier/address), that uniquely identifies it in that network, along with its network prefix.

Also, depending on the interface type and the scope of the network, the IP address assigned to it can be either locally or globally unique. Host interfaces that are accessible/visible to other IP nodes or hosts outside their local network (e.g., email servers, web servers, video servers) must be assigned globally unique IP addresses. Host interfaces that are accessible/visible only within their local network must be assigned locally unique IP addresses.

The Internet Assigned Numbers Authority (IANA) is the central numbering authority responsible for assigning IP addresses in addition to other related activities such as root zone management in the Domain Name System (DNS), autonomous system number allocation, and so on. IANA is responsible for ensuring that IP addresses given out are globally unique where required. IANA has also reserved a large IP address space for use by device interfaces that are not accessible/visible outside their local networks.

IANA allocates (blocks of) addresses to the Regional Internet Registries (RIRs) who then pass on the addresses to their customers (Local Internet Registries (LIRs), Internet service providers, end-user organizations, etc.) to carry out the actual allocation to end users. A local Internet registry receives an address allocation from a regional Internet registry and then assigns parts of the allocation to their customers. Most local Internet registries also operate as Internet service providers.

IPv4 address allocation went through a number of historical changes, from the original ARPANET (Advanced Research Projects Agency Network) address allocation scheme to classful networking, to networking with Variable-Length Subnet Mask (VLSM), and then to Classless Inter-Domain Routing (CIDR).

B.3.1 Original ARPANET Addressing Scheme

In this older IP addressing scheme, the 32 bit IP address was organized into two parts: the network identifier and a host identifier. The network identifier (or network number field) was carried in the most significant (or highest order) octet of the 4-octet IP address. The host identifier (or the local address) was carried in the remaining 3 octets of the IP address (these 3 octets were also called the rest field). The network number field (in the most significant 8 bits of an address) specified the particular network a host was attached to, while the local address (rest field) uniquely identified a host connected to that network. This addressing system allowed the creation of a maximum of 256 unique networks.

This addressing scheme was adequate at that time, because only a few large IP networks existed, one of which was the ARPANET (which was assigned the network number 10). However, with the wide and rapid proliferation of IP networking, the construction of local area networks (LANs), subnets, and large individual IP networks (academic, enterprise, and service provider networks), it quickly became clear that this addressing scheme was inadequate and not scalable for future network growth.

B.3.2 Classful Addressing

In this IP addressing system, the high-order octets of the 4-octet (32 bit) IP addresses are organized in various blocks and defined to create a set of classes of networks. This scheme was to address the limitations of the original ARPANET addressing scheme and to provide flexibility in the number of addresses allocated to networks of different sizes. This IP addressing system defined five address classes: Class A, B, C, D, and E [RFC791]. Each address class, coded in the first 4 bits of the 32 bit IP address, defines either a network size, that is, the number of potential hosts requiring unicast addresses (classes A, B, C) or a multicast network (class D).

The classes A, B, and C are given different numbers of bits to accommodate the network identifier. The rest of the bits in an address of a given class are used to identify a host within a network using that class. This means each address class has a different maximum number of host identifiers/addresses that can be assigned to potential hosts.

Class D was designated for IP multicast addressing, while class E was reserved for experimental purposes or future applications. During processing of IP packets, a network node would examine the first few bits of the IP address to determine the class of the address and where the actual network identifier starts and ends.

This classful IP addressing scheme is illustrated in Table B.1. This table shows that each IP address class reserves/specifies a different number of bits for its network identifier (prefix) and host identifier:

  • Class A addresses designate only the first byte for specifying the network prefix and the remaining 3 bytes for the individual host identifiers.
  • Class B addresses designate the first 2 bytes for specifying the network prefix and remaining 2 bytes for the host identifiers.
  • Class C addresses designate the first 3 bytes for specifying the network prefix and the remaining 1 (last) byte for host identifiers.

Table B.1 Classful Addressing

| Class | Leading Bits (Class Identifier) | Size of Network Identifier Field (Bits) | Size of Host Identifier Field (Bits) | Number of Networks | Addresses Per Network | Start Address | End Address |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Class A | 0 | 8 | 24 | 128 (= 2^7) | 16,777,216 (= 2^24) | 0.0.0.0 | 127.255.255.255 |
| Class B | 10 | 16 | 16 | 16,384 (= 2^14) | 65,536 (= 2^16) | 128.0.0.0 | 191.255.255.255 |
| Class C | 110 | 24 | 8 | 2,097,152 (= 2^21) | 256 (= 2^8) | 192.0.0.0 | 223.255.255.255 |
| Class D (multicast) | 1110 | Not defined | Not defined | Not defined | Not defined | 224.0.0.0 | 239.255.255.255 |
| Class E (reserved) | 1111 | Not defined | Not defined | Not defined | Not defined | 240.0.0.0 | 255.255.255.255 |
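The way a node infers the class from the leading bits of an address can be sketched as follows. This is an illustrative Python fragment with hypothetical function names; it tests the leading bits of the first octet in the order given in Table B.1.

```python
def address_class(address: str) -> str:
    """Return the classful address class ('A' to 'E') of a dotted-decimal address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet out of range")
    if first_octet & 0b10000000 == 0b00000000:   # leading bit  0
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                                   # leading bits 1111


def classful_prefix_length(address: str):
    """Network identifier size in bits, or None for classes D and E."""
    return {"A": 8, "B": 16, "C": 24}.get(address_class(address))
```

Since the class is derived from the address alone, no separate mask needs to be carried with a classful address.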

The three IP address classes can be represented in binary format as follows, with an N representing each bit in the network identifier and an h representing each bit in the host identifier:

  1. Class A: 0NNNNNNN hhhhhhhh hhhhhhhh hhhhhhhh
  2. Class B: 10NNNNNN NNNNNNNN hhhhhhhh hhhhhhhh
  3. Class C: 110NNNNN NNNNNNNN NNNNNNNN hhhhhhhh

Each bit (h) in a host identifier can have a 0 or 1 value. For example, if only 3 bits are reserved for specifying the host identifier, then we have the following possible host identifiers:

  1. 000; 001; 010; 011; 100; 101; 110; 111

For each IP address class, if H is the number of host identifier bits, then the maximum number of host identifiers that can be supported by that particular network prefix is 2^H.

  1. Class A Addresses: Maximum number of host identifiers = 2^24 (or 16,777,216), H = 24.
  2. Class B Addresses: Maximum number of host identifiers = 2^16 (or 65,536), H = 16.
  3. Class C Addresses: Maximum number of host identifiers = 2^8 (or 256), H = 8.

The maximum number of usable addresses for addressing individual hosts in each address class, however, is 2^H – 2. The minus 2 here is used to account for the predefined all 0s host identifier part used in a network address and the all 1s host identifier part used in a broadcast address. In a classful network, the mask identifying where that network prefix starts and ends is implicitly derived (inferred) from the IP address itself (i.e., from the leading address bits as shown in Table B.1).

In common network practice, the all-zeros (all 0s) host identifier is reserved for referring to the network itself (the entire network or subnet). The all-ones (all 1s) host identifier is used as the broadcast address in the given network or subnet. These reduce the number of identifiers/addresses available for hosts in a network or subnet by 2. The /31 networks (mask 255.255.255.254) are rarely used and are typically found only on point-to-point links [RFC3021]. Such a link supports only two hosts (the endpoints); therefore, specifying network and broadcast addresses is not necessary.
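The host-count arithmetic can be written down directly. In this Python sketch the function names are hypothetical; the /31 case follows the point-to-point exception mentioned above, and treating a /32 as a single host address is an assumption added for completeness.

```python
def total_addresses(prefix_len: int) -> int:
    """Number of host identifiers, 2**H, for H = 32 - prefix_len host bits."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_len)


def usable_hosts(prefix_len: int) -> int:
    """Usable host addresses: 2**H - 2, with the /31 and /32 special cases."""
    total = total_addresses(prefix_len)
    if prefix_len == 31:
        return 2   # point-to-point link: both addresses are usable
    if prefix_len == 32:
        return 1   # assumption: a single host address
    return total - 2  # exclude the network and broadcast addresses
```

For the three unicast classes (H = 24, 16, and 8), this gives 16,777,214, 65,534, and 254 usable host addresses, respectively.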

B.3.3 IPv4 Variable-Length Subnet Masks (VLSM)

To address the physical, architectural, size, and management limitations encountered when constructing large networks, network designers often segment large networks into smaller networks/subnetworks. Let us assume three hosts connected to one subnet (Subnet Delta) and three other hosts connected to a second subnet (Subnet Gamma). Combined, these six hosts and the two subnets (Delta and Gamma) form a larger network than the individual ones. If we assume that the entire network is assigned the 16 bit network prefix 192.14.0.0 (the same prefix length as a Class B network, although the address 192.14.0.0 itself falls in the Class C range of Table B.1), then each of the six hosts will be assigned an IP address that carries this network prefix.

In addition to sharing the same network prefix (i.e., 192.14), the hosts on each subnet share the same subnet (192.14.Delta.0 and 192.14.Gamma.0). All hosts in the same subnet must have the same subnet identifier/address. Let us assume that Subnet Delta is assigned the IP address 192.14.125.0, while Subnet Gamma is assigned the IP address 192.14.18.0.

The Gamma subnet address 192.14.18.0 can be expressed in binary notation as follows:

  1. 11000000.00001110.00010010.xxxxxxxx

For this network, the first 24 bits in the 32 bit address are used to identify the subnet, making the last 8 bits not significant for network identification. To identify the Gamma subnet, its address can be expressed as 192.14.18.0/24 (or just 192.14.18/24). The /24 in this notation represents the subnet mask (also written as 255.255.255.0).
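The equivalence between a prefix length and a dotted-decimal mask, and the test for whether an address belongs to a subnet, can be sketched with plain integer arithmetic. The function names in this Python fragment are hypothetical.

```python
def to_int(address: str) -> int:
    """Pack a dotted-decimal IPv4 address into a 32 bit integer."""
    a, b, c, d = (int(p) for p in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_to_mask_int(prefix_len: int) -> int:
    """Mask with prefix_len leading 1 bits followed by 0 bits."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def prefix_to_mask(prefix_len: int) -> str:
    """Dotted-decimal form of the mask, e.g. 24 -> 255.255.255.0."""
    mask = prefix_to_mask_int(prefix_len)
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def in_subnet(address: str, network: str, prefix_len: int) -> bool:
    """True if address and network agree on the first prefix_len bits."""
    mask = prefix_to_mask_int(prefix_len)
    return to_int(address) & mask == to_int(network) & mask
```

With these helpers, a host such as 192.14.18.77 matches 192.14.18.0/24 (Subnet Gamma), while a host in Subnet Delta (192.14.125.x) matches only the shorter 192.14.0.0/16 prefix.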

In the past, subnets were created based on address classes. A subnet could have 8 (class A; /8), 16 (class B; /16), or 24 (class C; /24) network identifier bits, corresponding to a maximum of 2^24, 2^16, or 2^8 hosts. As a result, if an entire class B prefix (/16 subnet) is allocated for a network that required only 600 addresses, then 64,936 (2^16 – 600 = 64,936) addresses will be wasted.

The introduction of variable-length subnet masks (VLSMs) allowed IPv4 address spaces to be allocated more efficiently, without the wastage seen when using classful addressing [RFC950, RFC1878]. VLSM allows network designers to allocate the number of addresses required for a particular network more precisely, and to subdivide IPv4 networks into subnets of varying sizes without being constrained by class boundaries or leaving large amounts of address space unused. VLSM was the basis on which Classless Inter-Domain Routing (CIDR), discussed below, was developed.

As an example, let us assume a network with the prefix 192.14.18/24 is divided into two smaller subnets, one consisting of 19 hosts and the other of 47 hosts. To accommodate 19 hosts, the first subnet must have 2^5 (32) host identifiers. Assigning 5 bits to the host identifier results in 27 bits of the 32 bit address being left for the subnet identifier. The IP address of the first subnet then becomes 192.14.18.128/27, which can be expressed in binary notation as follows:

  1. 11000000.00001110.00010010.100xxxxx

To get the “128” in the above address, the “100xxxxx” is converted to “10000000,” which is equal to the decimal value of 128. The subnet mask /27 covers the first 27 most significant bits of the IP address. For the second subnet of 47 hosts, the network must accommodate 2^6 (64) host identifiers. Assigning 6 bits to the host identifier results in 26 bits of the 32 bit address being left for the subnet identifier. The IP address of the second subnet is therefore 192.14.18.64/26, which in binary notation is as follows:

  1. 11000000.00001110.00010010.01xxxxxx

To get the “64” in the above address, the “01xxxxxx” is converted to “01000000,” which is equal to the decimal value of 64. Starting from the larger /24 address block, the network designer assigns additional address bits within it to create the two smaller subnets. With this, the allocated address space is used more efficiently.
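The subnet sizing used in this example can be checked with Python's standard `ipaddress` module. The helper names `prefix_for_hosts` and `plan_is_valid` are hypothetical, and the sketch assumes that each subnet must set aside the all 0s and all 1s host identifiers (it also relies on `subnet_of`, available in Python 3.7 and later).

```python
import ipaddress


def prefix_for_hosts(hosts: int) -> int:
    """Longest prefix whose subnet still offers at least `hosts` usable addresses."""
    host_bits = 2
    # 2**host_bits - 2 usable addresses (network and broadcast excluded).
    while 2 ** host_bits - 2 < hosts:
        host_bits += 1
    return 32 - host_bits


def plan_is_valid(parent: str, subnets: list) -> bool:
    """True if every subnet lies inside parent and no two subnets overlap."""
    block = ipaddress.ip_network(parent)
    nets = [ipaddress.ip_network(s) for s in subnets]
    inside = all(n.subnet_of(block) for n in nets)
    disjoint = all(not a.overlaps(b)
                   for i, a in enumerate(nets) for b in nets[i + 1:])
    return inside and disjoint
```

For 19 and 47 hosts, `prefix_for_hosts` returns 27 and 26, and `plan_is_valid("192.14.18.0/24", ["192.14.18.128/27", "192.14.18.64/26"])` confirms that the two subnets fit inside the /24 block without overlapping.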

B.3.4 IPv4 CIDR

As IP networks grew to accommodate more users, it became apparent that many organizations needed larger address blocks than a class C (/24) network provided. These organizations were, therefore, allocated a class B (/16) address block, which in many cases was much larger than their networks required. Also, as enterprise and service provider networks and the Internet grew rapidly, the pool of unassigned class B addresses (2^14, or about 16,000) was rapidly depleted.

Also, during the early phase of the Internet development, some organizations were allocated address spaces far larger than they actually needed. All these factors among others led to inefficient address allocation and use, as well as poor routing in networks. With the class B addresses seriously on the verge of depletion, a large number of class C addresses were given out. The large number of allocated smaller class C addresses that were used to create (geographically dispersed) networks resulted in very large routing tables in routers. These smaller networks were designed and dispersed such that they offered little opportunity for route aggregation.

CIDR (which is based on VLSM) was designed to address the limitations of classful addressing [RFC1517, RFC1518, RFC1519, RFC4632]. The classful addressing method of allocating the IP address space (combined with how routing of IP packets is done) constrained networks designed with the smaller address classes from being scalable. The CIDR scheme was developed to allow a network designer to flexibly repartition any address space (without being limited by class boundaries) to create a network with a larger or smaller block of addresses to be allocated to users. Other than replacing the inefficient classful addressing method, CIDR slowed the rapid depletion of IPv4 addresses and the growth in routing table sizes on routers in networks.

In classful addressing as discussed above, the network addresses (prefixes) are written in a field that is one or more bytes in length, resulting in the class A, B, or C addresses shown in Table B.1. IP address allocations were therefore based on the byte boundaries of the 4 bytes of an IP address. A full IP address was considered to be the concatenation of an 8, 16, or 24 bit network prefix and a corresponding 24, 16, or 8 bit host identifier. With this, the smallest address allocation was the class C addresses that carried only a maximum of 256 host identifiers. This was often too small for most organizations. The larger class B addresses carried 65,536 host identifiers, which was often too large to be used efficiently by even large networks/organizations.

As enterprise and service provider networks grew, it became increasingly apparent that more flexible and efficient IPv4 addressing methods were needed. With CIDR, an address space is allocated to an organization on any address bit boundary, instead of only on 1 byte boundaries. CIDR allows a network designer to partition a larger network into subnets of various sizes, making it easier to create and size a network to meet the requirements of the organization.

The CIDR notation (which is now the standard way of representing IP addresses) is a syntax for specifying IP addresses and their associated prefixes for the purpose of routing packets in networks. In this notation, a network address or routing prefix in the IP address is written with a suffix indicating the number of bits of the prefix. Examples are, for IPv4, 192.168.18.0/24, and for IPv6, 2001:db8:abcd:0012::0/64.

  • The CIDR notation is derived from the network prefix and its size in the IP address (which is equivalent to the number of consecutive leading 1 bits in the network prefix mask).
  • The IP address is expressed according to the version of IP used (IPv4 or IPv6) and involves using a separator character, the slash ('/') character, in front of the network prefix size expressed as a decimal number. The CIDR notation is constructed by concatenating the network prefix, a slash character, and the number of leading bits of the network prefix expressed as a decimal number.
  • The IP address constructed may represent the address of a single, distinct interface in a network, or the routing/network prefix of an entire network. The maximum size of the network (i.e., number of distinct host/interface identifiers supported) is derived from the number of identifiers that are possible with the remaining, lower order bits after the network prefix. An IP address followed by a slash (/) and a decimal number (e.g., 127.0.0.3/8) indicates a block of addresses using a subnet mask. The following are important features of the CIDR notation:
    • The address 192.168.100.17/24 represents the IPv4 address 192.168.100.17 and its associated network/routing prefix 192.168.100.0. The prefix is equivalently obtained by applying the subnet mask 255.255.255.0 (which has 24 leading 1 bits) to the address 192.168.100.17.
    • The IPv4 address block 192.168.100.0/22 represents the 1024 IPv4 addresses from 192.168.100.0 to 192.168.103.255 (using a 32 – 22 = 10 bit host identifier space). This is equivalent to the address range in binary notation:
      11000000.10101000.01100100.00000000
      to
      11000000.10101000.01100111.11111111
    • The CIDR notation allows for a more compact representation of IPv4 addresses and prefixes than the dot-decimal notation where both the address prefix and the subnet mask are indicated. For example, 192.168.100.0/24 was previously written in the longer form 192.168.100.0/255.255.255.0.
    • The number of host/interface identifiers in a subnet (defined by the network prefix (or mask)) can be calculated as 2^(IP_Address_Size − Net_Prefix_Size), in which the “IP_Address_Size” is 32 for IPv4 (and 128 for IPv6). For example, in IPv4, a prefix size of /20 gives 2^(32 − 20) = 2^12 = 4096 host identifiers.

    With the introduction of CIDR, the process of allocating address blocks to organizations is based on the actual short-term and projected long-term needs of the organizations. The CIDR prefix-based method of representing IP addresses and the associated route aggregation properties allow blocks of addresses to be grouped into single routing table entries. This enables routing to be done more efficiently in the Internet. These address groups (commonly called CIDR blocks) each share a common shorter prefix.
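The features of the CIDR notation listed above can be checked with a short sketch. It uses Python's standard `ipaddress` module; the helper name `identifiers_in_prefix` is an illustrative assumption and not part of any standard.

```python
import ipaddress

# An interface address with its prefix length: the address plus its network.
iface = ipaddress.ip_interface("192.168.100.17/24")
network_24 = iface.network          # 192.168.100.0/24
netmask_24 = iface.netmask          # 255.255.255.0

# A /22 block: 32 - 22 = 10 host bits, so 2**10 = 1024 addresses.
block_22 = ipaddress.ip_network("192.168.100.0/22")
first_addr = block_22.network_address      # 192.168.100.0
last_addr = block_22.broadcast_address     # 192.168.103.255
block_size = block_22.num_addresses        # 1024


def identifiers_in_prefix(prefix_len: int, address_size: int = 32) -> int:
    """Number of host/interface identifiers: 2^(address_size - prefix_len)."""
    return 2 ** (address_size - prefix_len)


# The older dotted-mask form parses to the same network as the CIDR form.
long_form = ipaddress.ip_network("192.168.100.0/255.255.255.0")

print(network_24, netmask_24)
print(first_addr, last_addr, block_size)
print(identifiers_in_prefix(20))    # 4096
print(long_form == ipaddress.ip_network("192.168.100.0/24"))
```

Passing `address_size=128` to the helper gives the corresponding count for an IPv6 prefix.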

CIDR allows for the aggregation of multiple contiguous network prefixes (called route aggregation, route summarization, or prefix aggregation) to create a supernetwork (or supernet). The resulting supernet has a subnet mask (network prefix) that is shorter than the individual subnet masks (prefixes) used in constructing the supernet. The supernet prefixes are advertised by routers as aggregates, thus reducing the number of entries in routing tables. The advantages of CIDR and route aggregation can be summarized as follows:

  • CIDR and supernetting allow the aggregation of routes to multiple smaller networks.
  • This results in smaller routing table sizes in routers, as well as saving memory storage space for the routing tables in the routing devices.
  • With smaller routing tables, routing decisions (in the routing devices) are simplified (faster look-ups and forwarding).
  • With smaller routing tables (and fewer smaller networks visible to the outside networks), routing advertisements to neighboring routers are reduced.
  • The use of supernetting allows a network (supernet) to isolate internal topology changes from other outside routers. This can help to improve the stability of the entire network by limiting the propagation of routing protocol traffic outside the supernet after a network link fails internally.
  • If a routing device advertises only an aggregate/summarized route (from a supernet) to peer routing devices, then it does not need to advertise any changes in the specific smaller subnets within the supernet (summarized route). The route aggregation can significantly reduce any unnecessary routing protocol updates following a topology change within the supernet. This increases the speed of convergence of network state and allows the overall network to be more stable.
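As a minimal sketch of route aggregation, the following example collapses four contiguous /24 prefixes into one supernet using Python's standard `ipaddress` module. The prefixes are hypothetical and chosen only to match the address block used earlier in this section.

```python
import ipaddress

# Four contiguous /24 networks (hypothetical example prefixes).
specifics = [
    ipaddress.ip_network(f"192.168.{third}.0/24") for third in range(100, 104)
]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregates = list(ipaddress.collapse_addresses(specifics))

# Equivalent view: the supernet is the first /24 with two prefix bits removed.
supernet = specifics[0].supernet(prefixlen_diff=2)

print(aggregates)                    # one /22 entry replaces four /24 entries
print(supernet, supernet.netmask)
```

A router advertising only the /22 aggregate hides the four more specific /24 routes from its peers, which is the reduction in routing table entries described above.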

CIDR address blocks are managed by the IANA with assistance from the RIRs. The IANA allocates large, short-prefix CIDR address blocks to the RIRs, which are then responsible for distributing these address blocks to their customers. The RIRs (responsible for address management and allocation in each geographic area, North America, Europe, Africa, etc.) divide these short-prefix CIDR address blocks and allocate the subblocks to the LIRs. Similarly, the LIRs subdivide the address blocks they receive and allocate them to their customers. End-user networks receive address prefixes from the LIRs sized according to their network needs.

B.3.5 Reserved IPv4 Addresses

The following are examples of reserved IPv4 addresses:

  • Private Addresses: These IPv4 addresses can be used within a home, office, campus, company, and enterprise network and are only visible within these networks and not outside [RFC1918]. They are used within such networks when globally routable addresses are not required or obligatory within these networks. Private addresses are not globally managed and delegated by the IANA and RIRs, meaning that they are not allocated to any specific organization and IP packets carrying these addresses cannot be transmitted through the public Internet.

    The following three blocks of the IPv4 address space are reserved by the IANA for private networks:

  • 10.0.0.0/8 (255.0.0.0) Addresses: 10.0.0.0 – 10.255.255.255; 16,777,216 host identifiers (single class A network)
  • 172.16.0.0/12 (255.240.0.0) Addresses: 172.16.0.0 – 172.31.255.255; 1,048,576 host identifiers (16 contiguous class B networks)
  • 192.168.0.0/16 (255.255.0.0) Addresses: 192.168.0.0 – 192.168.255.255; 65,536 host identifiers (256 contiguous class C networks)

    From Table B.1, it can be seen that only parts of the “172” and the “192” address ranges are designated for private use. The remaining addresses in these address classes (Classes B and C) are public and routable on the global Internet.

    Packets with these addresses cannot be routed on the Internet, so such packets are dropped by the routers. In order to communicate with the outside networks, these IP addresses have to be translated to public (routable) IP addresses using a NAT (Network Address Translation) device or a Web Proxy server. A separate range of private addresses was created to control the assignment of the already-limited IPv4 public routable address pool. By using a private address range within a home, office, campus, and similar environments, the demand for routable IPv4 addresses globally decreased significantly. It has also helped delay the exhaustion of routable IPv4 addresses.

  • Loopback Address: The IANA reserved the IP address range 127.0.0.0 – 127.255.255.255 (127.0.0.0/8) for use as a host's self-address, or loopback address [RFC6890]. The loopback address is also known as the localhost address. The loopback (localhost) address is implemented and managed within the host's operating system. An IP packet carrying the loopback address in its source IP address field should never be transmitted outside the host. The loopback address is used within the host to enable the local server and client processes on the host to communicate with each other.

    When a process running on the host generates a packet with the destination IP address set to the loopback address, the operating system loops the packet back to the process without involving the network interface adapter. Data sent to the loopback address are forwarded by the operating system to a virtual network interface within the operating system that turns them around. The loopback address is mostly used for testing (on a single host) how a client–server process works after its implementation.

  • Broadcast Address: The Destination Address field of an IPv4 packet can carry a special IPv4 broadcast address, that is, 255.255.255.255. Packets carrying this address are never forwarded by routing devices to other networks but remain in the broadcast domains (VLANs or subnets) in which they are sourced.
  • Multicast Addresses: IPv4 multicast addresses are identified by the four highest order address bits 1110 as shown in Table B.1. This definition originates from the classful addressing scheme where this group of addresses is designated as class D addresses. The first byte of a multicast address is therefore between 224 and 239, and the range is from 224.0.0.0 to 239.255.255.255. The CIDR prefix of the IPv4 multicast address range is 224.0.0.0/4. Address assignments within this range are specified in [RFC5771].
  • Link-Local Addresses: Link-local addresses are defined in [RFC6890] and range from 169.254.0.0 to 169.254.255.255 (address block 169.254.0.0/16). These addresses are not routable and are only used and valid on links such as a point-to-point connection (between two interfaces) or a single local network segment connected to a host interface. This is because a link-local IPv4 address is not guaranteed to be unique beyond the link on which it is used. Routers therefore do not forward packets carrying link-local addresses. These addresses cannot be carried in the source or destination IP address fields of packets traversing routers.

    The assignment of link-local addresses may be done manually by a network administrator or automatically through mechanisms and procedures in a host's operating system [RFC3927]. If a host is not able to obtain an IP address from a DHCP (Dynamic Host Configuration Protocol) server and it has not been assigned any IP address manually, the host can assign itself an IP address from the range of reserved link-local addresses. In the absence of a DHCP server, the host may randomly choose an IP address from the range of reserved link-local addresses and then check (via ARP) to ensure that no other host has assigned itself the same IP address.

    Once the hosts (on the point-to-point link or connected to the same single network segment) are configured with link-local addresses in the same address range, they can communicate directly (not across a router) with each other. These IP addresses do not allow hosts to communicate with each other when they do not belong/connect to the same physical or logical network link or segment.
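The reserved ranges described above can be tested with simple prefix membership checks. The sketch below uses Python's standard `ipaddress` module; the table `RESERVED_BLOCKS`, its labels, and the function `classify` are illustrative names, and only the blocks named in this section are included.

```python
import ipaddress

# Reserved blocks named in this section (labels are illustrative).
RESERVED_BLOCKS = [
    ("private", ipaddress.ip_network("10.0.0.0/8")),
    ("private", ipaddress.ip_network("172.16.0.0/12")),
    ("private", ipaddress.ip_network("192.168.0.0/16")),
    ("loopback", ipaddress.ip_network("127.0.0.0/8")),
    ("link-local", ipaddress.ip_network("169.254.0.0/16")),
    ("multicast", ipaddress.ip_network("224.0.0.0/4")),
    ("broadcast", ipaddress.ip_network("255.255.255.255/32")),
]


def classify(address: str) -> str:
    """Return the label of the first reserved block containing the address."""
    addr = ipaddress.ip_address(address)
    for label, block in RESERVED_BLOCKS:
        if addr in block:
            return label
    return "other"


for sample in ("10.1.2.3", "172.32.0.1", "127.0.0.1", "169.254.10.20", "239.1.1.1"):
    print(sample, classify(sample))
```

Note that 172.32.0.1 falls outside 172.16.0.0/12 and is therefore reported as "other", which illustrates that only part of the "172" range is private.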

B.4 Address Resolution

Address resolution can be done via one of these methods depending on the format of the original address:

  • ARP (Address Resolution Protocol): In an IP network, ARP is used to map the configured IP address of a device's interface to its corresponding MAC address. ARP is used to obtain the MAC address of an interface whose IP address is already known. Using ARP, a device sends a broadcast packet (ARP request) that is received by all the host interfaces in the broadcast domain (network segment). But only the interface whose IP address is indicated in the ARP request responds to the request by providing its MAC address.

    The MAC address of an interface can be queried given the already known IP address using the ARP for IPv4 or the Neighbor Discovery Protocol (NDP) for IPv6. ARP or NDP is used to translate IP addresses (OSI Layer 3) into Ethernet MAC addresses (OSI Layer 2). A device maintains an ARP table in which it keeps IP address-to-MAC address mappings for the network.

  • DNS (Domain Name System): DNS is a system through which a device can obtain the IP address of another device whose domain name is already known. Hosts on the Internet are usually known by names, for example, www.myself.com, and not primarily by their IP addresses (which are used for identifying network interfaces and routing). To allow domain names to be used for communications over networks, the names have to be translated (or resolved) to IP addresses and vice versa. The translation between IP addresses and domain names is performed by the DNS server. DNS is a distributed and hierarchical naming system that allows name spaces to be delegated to other DNS servers.

B.5 IPv4 Address Exhaustion

The rapid growth of the Internet and its extensive reach drastically increased the number of devices that need unique IP addresses to be able to communicate with others. It came to a point that enterprises and service providers could not continue to give their customers globally unique IPv4 addresses, and at the same time could not get new globally unique IPv4 addresses for expanding their networks. In spite of these factors, the service providers were expected to continue to serve existing customers and accept new customers.

A service provider can accept new customers requiring globally unique addresses only if its allocated IPv4 address space can accommodate them. With the 32 bit (4 byte) IPv4 address field, the total IPv4 address space is limited to 4,294,967,296 (2^32) addresses. At the start of the Internet, the 32 bit IPv4 address space was considered large enough for future needs and so there was little concern about address depletion.

However, the rapid growth of the number of devices (e.g., mobile devices (laptop computers, smart phones, etc.), always-on communication devices (ADSL modems, cable modems), communication-enabled vehicles, and other electronic devices) expanded the demand for extra IPv4 addresses, a situation that was not foreseen at the start of the Internet. As addresses were assigned to this wide range of new users, the number of unassigned addresses decreased. Furthermore, the rising use of Internet-enabled devices and appliances created great concerns that the public IPv4 address space would be depleted sooner rather than later. To address these concerns, the following practices were adopted (in addition to CIDR):

  • Private IP Addresses: A few blocks of IPv4 addresses were designated for private use within private networks (with interfaces not visible to the outside world) so that the demand for public IPv4 addresses could be reduced.
  • NAT (Network Address Translation): A NAT device is a mechanism (with one or a few public (routable) IPv4 addresses) through which multiple hosts with private IP addresses in a network can communicate with devices in the outside world using public IPv4 addresses. Most devices in a private (residential, campus, or enterprise) network are assigned private IP addresses that are not routable on the Internet. It should be noted that when an IP router on the Internet receives an IP packet with a private IP address, it does not forward the packet but drops it. So, in order to communicate with devices with public IP addresses, devices in private networks must use an address translation (NAT) service, which translates between public and private addresses. When a device sends an IP packet out of a private network, the NAT replaces the private source IP address in the packet with the public IP address of the NAT device, and it performs the reverse translation on the destination address of returning packets.
  • DHCP (Dynamic Host Configuration Protocol): DHCP is a protocol through which a host interface in a network is assigned an IP address from a predefined IP address pool the DHCP server maintains. The DHCP server may also provide additional information such as the default gateway (router) for a host, subnet mask, DNS Server IP address, lease time with the assigned IP address, and so on. By using DHCP services, a network administrator can manage the assignment of IP addresses automatically and more efficiently.
  • Proxy Server: To enable users to access the Internet, a network can use a Proxy Server that has a public IP address assigned to its interface. All the hosts in the network send requests to the Proxy Server, which then forwards them to a server on the Internet. The Proxy Server acts on behalf of the hosts to send the requests to the server (somewhere on the Internet) and when it receives responses from the server, the Proxy Server forwards them to the client hosts. This has been an effective method for controlling Internet access in private networks and it facilitates the implementation of web-based policies.
  • Unused Public IP Addresses: These addresses are being reclaimed by the RIRs to be reassigned to new users. By encouraging organizations to implement renumbering of their networks, the RIRs are able to reclaim large blocks of unused IPv4 address space allocated to these organizations during the early development of the Internet. Also, the RIRs are exercising tighter management and control over the allocation of IPv4 address blocks to the LIRs.
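The address replacement performed by a NAT device can be illustrated with a toy translation table. This is only a sketch under stated assumptions: it models a port-translating NAT with a single public address, the class name `ToyNat`, the starting port 40000, and the example addresses are all hypothetical, and real NAT devices also track protocol, timers, and connection state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Toy port-translating NAT; names and the port range are illustrative."""
    public_ip: str
    next_port: int = 40000
    out_map: dict = field(default_factory=dict)  # (private_ip, port) -> public port
    in_map: dict = field(default_factory=dict)   # public port -> (private_ip, port)

    def outbound(self, src_ip: str, src_port: int) -> tuple:
        """Rewrite the private source of an outgoing packet to the public address."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out_map[key])

    def inbound(self, dst_port: int):
        """Map a reply arriving at the public address back to the private host."""
        return self.in_map.get(dst_port)  # None: no mapping, packet is dropped


nat = ToyNat(public_ip="203.0.113.5")
print(nat.outbound("192.168.1.10", 5000))   # private source becomes public
print(nat.inbound(40000))                   # reply is mapped back
```

Because many private hosts share one public address in this model, the demand for public IPv4 addresses is reduced, which is the point made in the list above.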

The various limitations of IPv4 discovered since its inception spurred the development of IPv6 in the 1990s, which has been in various stages of commercial deployment since 2006. The IETF designed IPv6 addressing with the main goal of overcoming the drawbacks of IPv4 addresses. IPv6 improved many of the functionalities of IPv4, providing changes that improved addressing, security, and configuration and maintenance. An IPv6 packet has 128 bit address fields, large enough to allow a far larger number of devices to be assigned IP addresses.

Currently, a majority of the devices running on the Internet still use IPv4 and it is anticipated that the shift to total IPv6 use is far in the future. This means that IPv4 and IPv6 will coexist for many more years, and this coexistence must be transparent to users of either protocol. A number of mechanisms have been provided for IPv6 [RFC2893, RFC7059], which allow IPv4 and IPv6 to coexist until the Internet shifts to only IPv6:

  • Dual IP Stack
  • Tunneling (6to4, 4to6, etc.)
  • NAT Protocol Translation

It is argued that the best long-term solution to IPv4 address depletion is to move to IPv6. The long-term use of IPv4 is still being debated. IPv6 provides a much larger address space that also allows improved route aggregation across the Internet and offers large subnetwork allocations to organizations. Migration to IPv6 is in progress (for communication-enabled vehicles, sensors, Internet-of-Things (IoT) devices, etc.), but complete migration is not expected soon.

Service providers and some enterprises have to make compromises between growing their networks using IPv6 and continuing to serve existing and new IPv4 customers. The technologies and solutions discussed above enable enterprises and service providers to implement mixed IP addressing solutions even as they build IPv6 networks to accommodate new services such as communication-enabled vehicles, sensors, and IoT devices.

B.6 IPv4 Options

If the Internet Header Length (IHL) field of an IP packet is greater than 5 (i.e., it is from 6 to 15 32 bit words), the packet carries Options, which must be processed. The Options field is a variable length field (up to 40 bytes in length), and each option in it is identified by an option-type byte that consists of the following subfields:

  • 1 bit C (Copy) Flag: This indicates if the Options in the IP packet are to be copied into all its fragments (0 = Do not copy; 1 = Copy).
  • 2 bit Class Field: This indicates the class to which the IP Options belong (0 = Control; 1 = Reserved; 2 = Debugging and measurement; 3 = Reserved).
  • 5 bit Option Field: This indicates the type of Options carried in the IP packet. Examples of IP Options are 0 = End of options list; 1 = No operation; 2 = Security; 3 = Loose Source Route (LSRR); 4 = Time stamp; 7 = Record Route; 9 = Strict Source Route (SSRR); 18 = Traceroute; and so on.

The value in the IHL field must include enough extra 32 bit words to hold all the options (plus any padding needed to ensure that the header contains an integer number of 32 bit words). The options field is not often used because of some concerns about network security. Security concerns discourage the use of Loose Source and Record Route and Strict Source and Record Route.
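The subfield layout above can be made concrete with a small decoding sketch. It assumes the three subfields occupy one option-type byte in the order Copy, Class, Option (most significant bit first); the function names `decode_option_type` and `options_length` are illustrative.

```python
def decode_option_type(option_type: int) -> dict:
    """Split the one-byte option type into copy flag, class, and option number."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("option type must fit in one byte")
    return {
        "copy": (option_type >> 7) & 0x1,     # 1 bit
        "class": (option_type >> 5) & 0x3,    # 2 bits
        "number": option_type & 0x1F,         # 5 bits
    }


def options_length(ihl: int) -> int:
    """Bytes of options (including padding) implied by an IHL value of 5 to 15."""
    if not 5 <= ihl <= 15:
        raise ValueError("IHL must be between 5 and 15")
    return (ihl - 5) * 4


print(decode_option_type(0x83))   # copy=1, class=0, number=3 (Loose Source Route)
print(decode_option_type(0x44))   # copy=0, class=2, number=4 (Time stamp)
print(options_length(15))         # 40 bytes, the maximum Options length
```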

B.7 IPv4 Packet Fragmentation and Reassembly

With IPv4, it is possible for routers to fragment packets (split them into multiple smaller units) if required to transmit them on network interfaces that cannot handle larger packets. Packets that are fragmented must be reassembled at the destination node. Fragmentation and reassembly are relatively complex in IPv4, but the process has been simplified in IPv6.

MTU mismatch occurs when the MTU of the output network interface is smaller than the MTU of the input interface. This mismatch can result in packet fragmentation or discard at the device having the mismatch. It is possible for an IP packet to be fragmented at a routing device, and for the fragments (which are carried in whole IP packets with their own headers) to be again fragmented at another routing device. The Identification (fragment ID) field (16 bits) in the IP header identifies the original packet a fragment belongs to. This information is used by the destination node to reassemble the fragmented packet later. Each original packet is assigned a unique Identification value (fragment ID) and every fragment of that packet is assigned the same Identification value.

In IPv4, the sender and any router along the path can fragment an IP packet. Usually, only the destination endpoint reassembles fragments into a complete packet. It is possible for a border firewall (at the edge of a network) to reassemble fragments to obtain the original packets to allow it to enforce security filtering rules. In IPv6, packet fragmentation can only be done by the source node, and a fragmentation extension header is carried in the packet header that contains the information needed for the destination node to reassemble complete packets.

The first bit of the IP header Flags field is reserved and all nodes sourcing a packet must set this bit to zero. The second bit (the DF (Don't Fragment) flag), if set (= 1), indicates that any routing device receiving the packet must not fragment it. Instead, if a packet with DF set reaches a routing device whose output interface cannot handle a packet of that size, that packet is dropped (and an ICMP Destination Unreachable message is sent to the sender). The DF flag can be used by a source node when it wants to send packets to another node but also wants to determine whether the interfaces along the path can forward packets of that size.

The DF flag can be used for Path MTU Discovery [RFC1191], either automatically by features implemented in the host's IP software or manually using diagnostic tools such as ping [RFC792, RFC1122] or Traceroute (IPv4 Option 18) [RFC792]. The third bit (the MF (More Fragments) flag), if set, indicates that more fragments follow the particular fragment carrying it. Packets that have not undergone fragmentation have their MF flag set to zero. Except for the last fragment, all fragments of the same packet have their MF flag set.

The 13 bit Fragment Offset field is used by the destination node when reassembling fragmented packets. The offset is measured in 8 byte blocks and is the offset of a particular fragment measured from the front of the original IP packet. The first fragment of an IP packet (which constitutes the start of the packet) has an offset of 0. The last fragment of a packet carries a nonzero Offset field to allow it to be easily differentiated from a packet that has not been fragmented.
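The three flag bits and the 13 bit Fragment Offset share one 16 bit word of the IPv4 header (bytes 6 and 7). The following sketch decodes that word; the function name `parse_flags_and_offset` and the sample header bytes are assumptions made for illustration.

```python
import struct


def parse_flags_and_offset(header: bytes) -> dict:
    """Decode bytes 6-7 of an IPv4 header: 3 flag bits plus a 13 bit offset."""
    (word,) = struct.unpack("!H", header[6:8])
    offset_units = word & 0x1FFF
    return {
        "reserved": (word >> 15) & 0x1,   # must be zero
        "df": (word >> 14) & 0x1,         # Don't Fragment
        "mf": (word >> 13) & 0x1,         # More Fragments
        "offset_units": offset_units,     # in 8 byte blocks
        "offset_bytes": offset_units * 8,
    }


# Hypothetical 20 byte header in which only the flags/offset word is filled in:
# MF = 1 and an offset of 185 units.
sample = bytes(6) + struct.pack("!H", 0x2000 | 185) + bytes(12)
print(parse_flags_and_offset(sample))
```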

The 13 bit Fragment Offset field can express a maximum offset of (2^13 – 1) × 8 = 65,528 bytes. A fragment at that offset, together with the 20 byte basic IP header, would imply an original packet of at least 65,528 + 20 = 65,548 bytes, which is greater than the maximum IP packet length of 65,535 bytes (i.e., 2^16 – 1). The maximum IP payload is therefore limited to 65,535 − 20 bytes = 65,515 bytes. Dividing the 65,515 byte payload by 8 bytes (the unit of the offset) and rounding down results in a maximum of 8189 offset units.

This means that the Fragment Offset field is limited to a maximum of 8189 actual offset units (and not the 8191 (i.e., 2^13 – 1) that the 13 bit field could otherwise express). An IP fragment carrying a Fragment Offset value of 8189 (i.e., the last fragment) could have a maximum payload of 3 bytes:

  • Maximum IP packet length 65,535 bytes – minimum IP header length 20 bytes – (8189 offset units × 8 bytes per offset unit) = maximum 3 bytes.
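The arithmetic behind these limits can be reproduced in a few lines. The constant and variable names below are illustrative; the values follow directly from the field sizes stated in the text.

```python
OFFSET_BITS = 13
OFFSET_UNIT = 8             # bytes per offset unit
MAX_PACKET = 2**16 - 1      # 65,535 bytes (16 bit Total Length field)
MIN_HEADER = 20             # basic IP header without options

field_max_units = 2**OFFSET_BITS - 1               # 8191
field_max_bytes = field_max_units * OFFSET_UNIT    # 65,528
max_payload = MAX_PACKET - MIN_HEADER              # 65,515
usable_max_units = max_payload // OFFSET_UNIT      # 8189
last_fragment_max = max_payload - usable_max_units * OFFSET_UNIT   # 3

print(field_max_units, field_max_bytes)
print(max_payload, usable_max_units, last_fragment_max)
```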

When a router receives a packet, it performs an IP forwarding table lookup using the IP destination address and determines the outgoing interface to use and that interface's MTU. If the packet size is larger than the interface's MTU (an MTU mismatch) and the DF bit is set to 0, the router may divide the packet into fragments.

The maximum size of each fragment is the MTU of the interface minus the IP header size (rounded down to a multiple of 8 bytes for every fragment except the last). The IP header is 20 bytes minimum (without IP options) and 60 bytes maximum (with IP options). The router formats each fragment into an IP packet, with each fragment-carrying IP packet modified as follows:

  • The total length field of the new IP packet is set to the size of the fragment plus the IP header size.
  • The MF flag in the packet is set to 1 for all fragment-carrying packets except the last one, for which it is set to 0.
  • The fragment offset field in the new IP packet (measured in 8 byte units) is set appropriately as described above.
  • The IP header checksum field in the new IP packet is recalculated (see discussion below).

For an MTU of L bytes and a basic IP header size of 20 bytes, the fragment offsets would be multiples of (L – 20)/8 (assuming L – 20 is divisible by 8; otherwise the data size of each fragment is rounded down to a multiple of 8 bytes). If we take the MTU of the interface to be 1500 bytes and the minimum IP header size of 20 bytes, then the fragment offsets to be carried in the fragment offset field would be multiples of (1500 – 20)/8 = 185, that is, 0, 185, 370, 555, 740, and so on.
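The fragmentation rules described above can be sketched as a function that, given a payload length and an MTU, lists the offset, MF flag, and total length of each fragment. This is a simplified model, not a router implementation: the name `fragment` is illustrative, and it assumes every fragment carries a header of the same size (no options handling).

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20) -> list:
    """Return (offset_units, mf, total_length) for each fragment of a payload."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload_len:
        data = min(max_data, payload_len - sent)
        more = 1 if sent + data < payload_len else 0
        fragments.append((sent // 8, more, data + header_len))
        sent += data
    return fragments


# 4000 bytes of data over an interface with a 1500 byte MTU.
print(fragment(4000, 1500))   # [(0, 1, 1500), (185, 1, 1500), (370, 0, 1060)]
```

The offsets 0, 185, and 370 are the multiples of (1500 – 20)/8 = 185 given in the text.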

To identify an arriving IP packet as a fragment, a receiver checks if at least one of the following conditions is true:

  • The MF flag is set to 1 (which is true for all fragments of a packet except the last).
  • The fragment offset field is nonzero (which is true for all fragments of a packet except the first).

Using the Identification (fragment ID) field, the receiver identifies fragments that belong to the same (original) IP packet so that reassembly can be done. Using both the fragment offset field values and the MF flag, the receiver reassembles the packet from fragments with the same identification field value.

Reassembly may involve placing the fragments in a reassembly buffer, with each new arriving fragment located in the reassembly buffer starting at fragment offset field value × 8 bytes from the front (beginning) of the buffer. When the receiver takes in the last fragment (which has the MF flag set to 0), it can compute the total length of the original data payload by multiplying the offset in the last fragment by 8 and adding the size of the data in the last fragment. After receiving all fragments, the receiver can sequence them in the correct order using their offsets. The reassembled original IP packet is then transferred to the upper-layer protocol for further processing.
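The reassembly procedure can be sketched as follows. The function takes fragments that already share one Identification value, places each at offset × 8 bytes, and uses the fragment with MF = 0 to compute the total length. It is a minimal model under stated assumptions: the name `reassemble` and the tuple layout are illustrative, and overlapping fragments and reassembly timers are not handled.

```python
def reassemble(fragments: list):
    """Rebuild a payload from (offset_units, mf, data) tuples sharing one ID.

    Returns the payload bytes, or None while fragments are still missing.
    """
    total_len = None
    pieces = {}
    for offset_units, mf, data in fragments:
        start = offset_units * 8
        pieces[start] = data
        if mf == 0:
            # Last fragment: its offset plus its size gives the full length.
            total_len = start + len(data)
    if total_len is None:
        return None                      # last fragment not yet received
    buffer = bytearray(total_len)
    covered = 0
    for start in sorted(pieces):
        if start != covered:
            return None                  # gap: a fragment is still missing
        data = pieces[start]
        buffer[start:start + len(data)] = data
        covered = start + len(data)
    return bytes(buffer) if covered == total_len else None


# Fragments may arrive out of order; the offsets restore the sequence.
payload = bytes(i % 251 for i in range(2500))
arrived = [(250, 0, payload[2000:]), (0, 1, payload[:1000]), (125, 1, payload[1000:2000])]
print(reassemble(arrived) == payload)
```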

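If an IP packet carrying 2500 bytes of data (2520 bytes in total with a 20 byte header) is transmitted from a source and is fragmented into packets of at most 1020 bytes (1000 bytes of data plus a 20 byte header), three fragments can be created as follows: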

  • Fragment # 1: MF Flag = 1; total length = 1020; data size = 1000; offset = 0.
  • Fragment # 2: MF Flag = 1; total length = 1020; data size = 1000; offset = 125.
  • Fragment # 3: MF Flag = 0; total length = 520; data size = 500; offset = 250.

In the above fragmentation process, “data size” includes the length of the ICMP, IGMP, or Transport Layer header.

B.8 IP Packets Encapsulated into Ethernet Frames

The maximum length of an IP packet (65,535 bytes) is much larger than the maximum length of an Ethernet frame (1518 bytes, with a payload of 1500 bytes). This means IP packets larger than 1500 bytes must be fragmented and carried in several Ethernet frames. For example, a first estimate of the number of Ethernet frames required to transport an IP packet with the maximum size of 65,535 bytes can be calculated as 65,535 ÷ 1500 = 43.69.

This shows that it takes at least 44 Ethernet frames to transport one IP packet of maximum size across an Ethernet interface. (Because every fragment carries its own IP header of at least 20 bytes, each frame holds at most 1480 bytes of the original 65,515 byte payload, so 45 frames are needed when the headers are counted.) However, this example does not imply that IP packets are always segmented/fragmented before being forwarded over Ethernet. This is because most IP applications are designed to source packets in data blocks smaller than the maximum Ethernet frame size.

B.9 Forwarding IPv4 Packets

IP forwarding typically involves an IP forwarding table lookup, decrementing the TTL count (by one), recalculating the IP header checksum, encapsulating the IP packet in a Data Link Layer frame, recalculating the Data Link layer checksum, and forwarding the frame to the correct output interface. Forwarding table lookups can be done in hardware, as can the decrementing of the TTL, recalculation of the IP header checksum, and the Data Link layer frame rewrites.
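Two of the per-packet steps listed above, decrementing the TTL and recalculating the IP header checksum, can be sketched as follows. The checksum is the standard 16 bit one's complement sum over the header; the function names and the sample header bytes are illustrative, and the forwarding lookup and Data Link Layer rewrite are omitted.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16 bit header words."""
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF


def forward_rewrite(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and recompute the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL expired; the packet would be discarded")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"          # checksum field is zero while summing
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Sample 20 byte header (TTL 0x40, checksum 0xb861) used only for illustration.
original = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
rewritten = forward_rewrite(original)
print(rewritten.hex())
print(ipv4_checksum(rewritten))   # 0 when the stored checksum is valid
```

Summing a header that already contains a correct checksum yields zero, which is how a receiver verifies the header.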

The routing devices also run routing protocols (such as OSPF, RIP, and BGP) allowing them to communicate with other routing devices to generate the information needed to build their routing tables. These routing tables are in turn used to generate the IP forwarding tables that can be used for the IP destination address lookups required to determine the outgoing interface for incoming packets.

B.9.1 CIDR and Routing/Forwarding Table Entries

Allocating blocks of class C addresses was one strategy used to prevent the rapid depletion of class B addresses. However, large class C allocations required many more routing table entries to be maintained in routers. As discussed above, CIDR was introduced to improve both IPv4 address space utilization and routing scalability in the Internet. CIDR allows a block of IP addresses to be aggregated/summarized into a single routing table entry. This consolidation results in a significant reduction in the number of separate routing table entries maintained, particularly, in core routers. The block of IP addresses is consolidated in the routing table entry as follows:

  1. {lowest address in block, supernet mask} or
  2. {lowest address in block, number of common prefix bits}

The start of the address block is the “lowest address in block” and the number of class C addresses in the block is specified by the supernet mask. The supernet mask (or the CIDR mask) contains 1s for the common prefix (i.e., part with identical binary values) for all the addresses, and 0s for the parts of the addresses that have different values.
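
As an illustration of the two forms, the sketch below aggregates four contiguous class C networks into one entry using Python's standard `ipaddress` module. The networks 192.168.0.0/24 through 192.168.3.0/24 are example values chosen here, not taken from the text.

```python
import ipaddress

# Four contiguous class C networks (example values).
class_c_blocks = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregate = list(ipaddress.collapse_addresses(class_c_blocks))
supernet = aggregate[0]

# Form 1: {lowest address in block, supernet mask}
print(supernet.network_address, supernet.netmask)    # 192.168.0.0 255.255.252.0
# Form 2: {lowest address in block, number of common prefix bits}
print(supernet.network_address, supernet.prefixlen)  # 192.168.0.0 22
```

The four /24 entries share their first 22 bits, so they collapse into the single entry 192.168.0.0/22.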

Routes in a CIDR block can be summarized by routing devices in a single route advertisement called an aggregate. The networks or subnets that make up a given CIDR block are said to be more specific with respect to that CIDR block. The common prefix of the more specific addresses (that make up the CIDR block) is longer than that of the CIDR block itself.

With the introduction of CIDR, routing devices perform IP forwarding table lookups using longest-prefix matching (LPM) searches (also called maximum prefix length match). Currently, all routers support CIDR and use LPM lookups to determine the next hop and outgoing interface for a packet. The CIDR mask is used to determine the number of prefix bits that are to be used in the LPM searches. If there exist multiple routes with different prefix lengths in the forwarding table to the destination, the router selects the route with the longest prefix.

This is because each of the IP forwarding table entries may represent a specific subnet, creating a situation where one IP destination address may match more than one entry. When this happens, the most specific of these matching entries, that is, the one with the longest subnet mask, is referred to as the longest prefix match. It is the forwarding table entry that matches the largest number of leading address bits of the destination address.

An IP address can be checked to see if it is part of a CIDR block. The address is considered to match the supernet/CIDR prefix if the leading (i.e., starting) N bits of the address and the supernet/CIDR prefix are identical. Given a 32-bit IPv4 address and an N-bit CIDR prefix, the remaining 32 − N bits are free to take any value, so 2^(32−N) IPv4 addresses match a given N-bit CIDR prefix. A longer CIDR prefix has fewer address matches, while a shorter prefix has more address matches. This also means that a single address lookup can produce multiple CIDR prefix matches, each with a different prefix length.
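
The prefix test described above amounts to masking the address and comparing it with the prefix. The following is a minimal sketch; the helper names `matches_prefix` and `addresses_covered` are made up for this illustration.

```python
import ipaddress

def matches_prefix(address: str, prefix: str) -> bool:
    """Return True if the leading N bits of address equal the N-bit CIDR prefix."""
    addr = int(ipaddress.IPv4Address(address))
    net = ipaddress.IPv4Network(prefix)
    mask = int(net.netmask)  # 1s over the N prefix bits, 0s elsewhere
    return (addr & mask) == int(net.network_address)

def addresses_covered(prefix_len: int) -> int:
    """Number of IPv4 addresses that match an N-bit prefix: 2^(32 - N)."""
    return 2 ** (32 - prefix_len)

print(matches_prefix("192.168.20.18", "192.168.0.0/16"))  # True
print(matches_prefix("10.1.2.3", "192.168.0.0/16"))       # False
print(addresses_covered(16), addresses_covered(28))       # 65536 16
```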

Let us consider the following two entries in an IPv4 forwarding table:

  1. 192.168.20.16/28
  2. 192.168.0.0/16

When a routing device needs to look up the destination address 192.168.20.18, both entries in the forwarding table will match this address, meaning both entries cover this destination address (the /28 entry spans 192.168.20.16 through 192.168.20.31). However, the longest prefix match is the entry 192.168.20.16/28, since the /28 subnet mask is longer than the other mask, /16, making 192.168.20.16/28 the more specific route. The packet is therefore forwarded on the interface associated with that entry.
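
A lookup of this kind can be sketched as a linear scan over a small table. The sketch uses 192.168.20.16/28, the /28 block that contains 192.168.20.18, together with 192.168.0.0/16. The interface names `eth0` and `eth1` and the function name are hypothetical, and real routers use tries or TCAM hardware rather than a linear scan.

```python
import ipaddress

# Forwarding table: prefix -> outgoing interface (interface names are made up).
forwarding_table = {
    ipaddress.ip_network("192.168.20.16/28"): "eth1",
    ipaddress.ip_network("192.168.0.0/16"): "eth0",
}

def longest_prefix_match(destination: str, table: dict):
    """Return (prefix, interface) for the most specific matching entry, or None."""
    dest = ipaddress.ip_address(destination)
    candidates = [net for net in table if dest in net]
    if not candidates:
        return None  # no route to this destination
    best = max(candidates, key=lambda net: net.prefixlen)
    return best, table[best]

print(longest_prefix_match("192.168.20.18", forwarding_table))
print(longest_prefix_match("192.168.99.1", forwarding_table))
```

The first lookup matches both entries and selects the /28; the second matches only the /16.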

B.9.2 TTL Update

The 8-bit TTL field in the IP header is used to prevent packets from being forwarded from router to router indefinitely in a network that has routing loops. The TTL was originally intended to be a lifetime limit for a packet in seconds, but it ended up implemented as a maximum lifetime in the number of hops a packet can traverse, that is, a “hop count.”

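This means that every time a packet traverses a router, the TTL is decremented by 1. If the TTL reaches 0, the packet is dropped. When a packet is dropped, an ICMPv4 nondelivery message (the ICMP Time Exceeded message [RFC792]) is transmitted to the packet sender. This is the mechanism the common Traceroute command relies on: it sends probes with successively larger TTL values and uses the returned Time Exceeded messages to identify each router on the path. (A separate Traceroute IP option, IP Option 18, has also been defined.)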
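
The hop-count behavior can be illustrated with a toy model. This is a simplified sketch, not a packet-level simulation: the router names and the function `forward_along_path` are invented for the example, and each router simply decrements the TTL and drops the packet when it reaches 0.

```python
def forward_along_path(path, ttl):
    """Walk a packet through the routers in path, decrementing the TTL at each hop.

    Returns ("time_exceeded", router) if a router drops the packet,
    or ("delivered", None) if it passes every router.
    """
    for router in path:
        ttl -= 1
        if ttl <= 0:
            # This router drops the packet and reports ICMP Time Exceeded.
            return ("time_exceeded", router)
    return ("delivered", None)

path = ["R1", "R2", "R3"]  # hypothetical routers between source and destination

# Traceroute-style probing: raise the initial TTL by one for each probe.
for initial_ttl in range(1, 5):
    print(initial_ttl, forward_along_path(path, initial_ttl))
```

With three routers on the path, probes with TTL 1, 2, and 3 expire at R1, R2, and R3 respectively, and the probe with TTL 4 is delivered.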

B.9.3 IPv4 Header Checksum Computation

The IPv4 header checksum, as discussed earlier, is a simple mechanism used in an IPv4 packet to protect (only) the header from data corruption [RFC791]. This 16-bit checksum is calculated only over the IP header bytes, and the field in which it is carried is also part of the IP packet header. Each router first verifies the checksum of a received packet and discards the packet if the calculated checksum does not match the received checksum. Because the router then modifies the header, the checksum is recalculated before the packet is forwarded.

The router must update the IP checksum if it modifies or changes any part of the IP header (such as decrementing the TTL or modifying the DSCP bits). The 16-bit IP header checksum field contains the 16-bit ones' complement of the ones' complement sum of all 16-bit words in the header [RFC1812].

Calculating an IPv4 Header Checksum
  • To compute the checksum at a node (including the source), the IP checksum field itself is first set to zero.
  • The IP header is divided into 16-bit words and these words are summed using ones' complement arithmetic (any carry out of the top bit is added back into the sum). The ones' complement of the sum is then taken to obtain the IP checksum.
  • This means that if another node sums the entire IP header, including the checksum, the result should be all 1s (0xFFFF, which represents zero in ones' complement arithmetic) if the header is not corrupted. Complementing this result gives 0.

Verifying an IPv4 Header Checksum
  • At any node that receives the packet, including the final destination, the IP header is verified to see if it is corrupted. This time the node does not replace the checksum value in the header with all zeros (i.e., the original/transmitted IP header checksum is kept in place) when verifying the checksum.
  • To validate the checksum, all 16-bit words in the header are summed, including the transmitted checksum. The ones' complement of this sum should be zero (i.e., the sum itself should be 0xFFFF) if the header is not corrupted.
  • If the result is nonzero, then at least 1 bit in the IP header has been corrupted. However, there are certain multiple-bit errors that can cancel out, and hence some corrupted IP headers can go undetected.
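
The calculation and verification steps above can be sketched as follows. The function names are illustrative, and the 20-byte sample header is an example value chosen here (TTL 0x40, protocol 0x11, source 192.168.0.1, destination 192.168.0.199), not one taken from the text.

```python
def ones_complement_sum(data: bytes) -> int:
    """Ones' complement sum of the 16-bit big-endian words in data."""
    if len(data) % 2:
        raise ValueError("IPv4 header length must be a multiple of 2 bytes")
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def compute_checksum(header: bytes) -> int:
    """Checksum computed with the checksum field (bytes 10-11) set to zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum(zeroed) & 0xFFFF

def verify_checksum(header: bytes) -> bool:
    """Sum the header with the checksum left in place; expect all 1s (0xFFFF)."""
    return ones_complement_sum(header) == 0xFFFF

header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(compute_checksum(header)))  # 0xb861
print(verify_checksum(header))        # True
```

Flipping any single bit in `header` makes `verify_checksum` return False, which corresponds to the "at least 1 bit corrupted" case above.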

Since the TTL is decremented by one at each hop (router), the IP header checksum must be recalculated at each hop. References [RFC1141] and [RFC1624] describe how the IP checksum can be computed incrementally (after, for example, a TTL update).
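
A minimal sketch of such an incremental update follows, using the relation HC' = ~(~HC + ~m + m') given in [RFC1624], where HC is the old checksum, m the old value of the changed 16-bit word, and m' its new value. The function names and the sample header are illustrative assumptions, and the header is assumed to carry a valid checksum on input.

```python
import struct

def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def incremental_update(old_checksum: int, old_word: int, new_word: int) -> int:
    """HC' = ~(~HC + ~m + m') for one changed 16-bit header word."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF

def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of the header with TTL - 1 and an incrementally updated checksum."""
    ttl, proto = header[8], header[9]
    if ttl <= 1:
        raise ValueError("TTL expired; the packet would be dropped instead")
    (old_checksum,) = struct.unpack("!H", header[10:12])
    old_word = (ttl << 8) | proto        # TTL and Protocol share one 16-bit word
    new_word = ((ttl - 1) << 8) | proto
    new_checksum = incremental_update(old_checksum, old_word, new_word)
    return header[:8] + struct.pack("!HH", new_word, new_checksum) + header[12:]

header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(decrement_ttl(header).hex())
```

Only the changed word and the old checksum are touched, so the cost of the update does not depend on the header length.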

The IPv4 header checksum is eliminated in IPv6. It was argued that the checksums provided in Layer 2 protocols such as Ethernet, ATM (Header Error Control (HEC)), and PPP, combined with the checksums supported in upper-layer protocols such as TCP, UDP, SCTP, ICMP, IGMP, and OSPF were sufficient to make including a separate IPv6 header checksum in IPv6 packets unnecessary.
