IP (i.e., IPv4 and IPv6) is a network layer protocol that provides connectionless services to upper-layer protocols (e.g., TCP (Transmission Control Protocol), UDP (User Datagram Protocol), Internet Control Message Protocol (ICMP), Internet Group Management Protocol (IGMP), Open Shortest Path First (OSPF)). The basic unit of transport in this connectionless service is the datagram (frequently referred to as a packet), which carries the source and destination IP addresses of the communicating endpoints and other parameters needed for protocol operation.
The IP layer relies on the underlying network infrastructure for transporting IP datagrams (packets). This means that IP datagrams are encapsulated (in some cases after fragmentation into smaller units) in the frames of the underlying network, such as Ethernet, ATM (Asynchronous Transfer Mode), and SONET/SDH (Synchronous Optical Network/Synchronous Digital Hierarchy). Regardless of the mix of physical layer protocols, the maximum transmission unit (MTU) sizes of interfaces, and the speed differences of links in the underlying networks, the IP layer presents these different networks as a common logical IP network that is independent of their physical characteristics.
The upper-layer protocols such as TCP, UDP, ICMP, IGMP, and SCTP (Stream Control Transmission Protocol) need not be aware of the hardware, encapsulation methods, and other characteristics of the underlying network. Upper-layer protocols may, however, expect some level of quality of service (QoS) from the IP layer during data delivery, in terms of throughput, delay, and delay variation. These quantities, often referred to as QoS parameters, characterize the nature of the data delivery.
In some cases, the upper layers may pass the expected QoS along with the data to the IP layer. The IP layer may support mechanisms that enable it to map the QoS parameters to services provided by the underlying network infrastructure. The underlying network may or may not support the capabilities needed to provide the QoS demanded or expected by the upper layer.
The basic IP service is based on “best effort” delivery of data; that is, IP does not guarantee that a packet will be delivered to the next node or final destination, it only tries its best to deliver it. IP also does not guarantee that packets will arrive in the order in which they were sent, or that they will not be duplicated. Upper-layer transport protocols, such as TCP and SCTP, are responsible for handling sequencing, duplication, and other data integrity issues.
The Internet Protocol version 4 (IPv4), described in IETF (Internet Engineering Task Force) publication RFC 791 (September 1981), is the fourth version of IP in the development history of the protocol. IPv4 was the first version of the protocol to be widely deployed and is the network protocol that currently drives the majority of today's enterprise and service provider networks and the Internet.
The previous chapter gives a description of the Ethernet frame format and the different fields that make up a frame. This chapter provides a description of the IPv4 header and its corresponding fields. IPv6, the successor protocol to IPv4, is not discussed in this chapter.
IP receives data segments from upper-layer protocols (e.g., TCP (protocol number 6), UDP (protocol number 17), SCTP (protocol number 132), ICMP (protocol number 1), IGMP (protocol number 2), OSPF (protocol number 89)) and adds its own header information to form IP packets.
The received data from the upper layer constitute the payload in the IP packet. The IP header contains all the information needed to route the packet through the IP network node to node from the source to the destination (unicast transmission) or destinations (multicast transmission). The IP packets created are then encapsulated by the protocol frames of the underlying network (e.g., Ethernet) as illustrated in Figure B.1.
The IPv4 packet header is placed at the front of every IPv4 packet created. The header consists of 14 fields, one of which is optional. The header is normally 20 bytes in length, but it can carry additional options in a variable length field located after the Destination Address field (Figure B.2). The fields in the IP header are written with the most significant byte first (big-endian, also called network byte order), and within each byte the most significant bit is written first. For example, the Version field occupies the four most significant bits of the first byte of the packet. The IP header and the fields it contains are described below:
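Because the header is laid out in big-endian (network) byte order, its fixed part can be decoded with Python's standard `struct` module using the `!` format prefix. The sketch below is illustrative only: the helper name `parse_ipv4_header` and the sample header values are assumptions, options (present when IHL > 5) are not decoded, and the header checksum is reported but not verified. The protocol table holds only the protocol numbers listed earlier in this section.

```python
import socket
import struct

# Protocol numbers listed earlier in this section.
PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP", 89: "OSPF", 132: "SCTP"}

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20 byte IPv4 header (hypothetical helper; options ignored)."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    # '!' selects network (big-endian) byte order with no padding.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,                 # four most significant bits of byte 0
        "ihl": ver_ihl & 0x0F,                   # header length in 32 bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),         # Don't Fragment flag
        "mf": bool(flags_frag & 0x2000),         # More Fragments flag
        "fragment_offset": flags_frag & 0x1FFF,  # in 8 byte units
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# Example: a header for a 60 byte TCP packet with DF set
# (addresses and Identification value are made up for illustration).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 0x1C46, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.14.18.10"), socket.inet_aton("192.14.125.7"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["ihl"], fields["protocol"], fields["df"])  # 4 5 TCP True
```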
When a complete (unfragmented) IP packet leaves its source, the source sets the MF (More Fragments) bit to zero and the Fragment Offset field to zero before transmitting the packet.
Differentiated Services introduces the notion of per hop behaviors (PHBs), which define how traffic (packets) belonging to a particular behavior aggregate (i.e., packets considered to have the same forwarding characteristics) is handled at an individual network node [RFC3140]. PHBs are not indicated directly in IP packet headers; instead, a network node uses the DSCP value carried in the header to select the PHB it applies. The 6 bit DSCP field allows for 64 possible DSCP values, but there is no limit on the number of PHBs a node can implement.
Any given network domain can implement its own mechanisms for defining locally the mapping between DSCP values and PHBs. There are standardized PHBs with recommended DSCP mappings, but network operators may choose to implement suitable alternative mappings.
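The 6 bit DSCP occupies the upper six bits of the 8 bit field formerly called Type of Service, while the low two bits are used for Explicit Congestion Notification (ECN). A minimal sketch of extracting the DSCP follows; the function name and the `LOCAL_PHB_MAP` dictionary are hypothetical and stand in for whatever DSCP-to-PHB mapping a domain chooses. DSCP 46 is the standardized Expedited Forwarding (EF) code point.

```python
def split_ds_field(tos_byte: int):
    """Return (dscp, ecn) from the 8 bit DS field (formerly Type of Service)."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("the DS field is a single byte")
    dscp = tos_byte >> 2      # upper 6 bits
    ecn = tos_byte & 0b11     # lower 2 bits
    return dscp, ecn

# Six DSCP bits give 2**6 = 64 distinct code points.
codes = {split_ds_field(b << 2)[0] for b in range(64)}
print(len(codes))  # 64

# Hypothetical local policy: each domain may define its own mapping.
LOCAL_PHB_MAP = {46: "EF", 0: "default"}

dscp, ecn = split_ds_field(0xB8)   # 0xB8 = 184 = 46 << 2
print(dscp, ecn, LOCAL_PHB_MAP.get(dscp, "default"))  # 46 0 EF
```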
A MAC address is an identifier associated with a physical or virtual/logical interface on an Ethernet device or on devices using technologies such as Token Ring and FDDI. Typically, the MAC address is permanently stored in a network interface adapter to uniquely identify the interface in a network. MAC addresses belong to and are used at the Data Link Layer, while IP addresses belong to and are used at the Network Layer. The IP address of a device interface may change when the device is moved to a different IP subnet or VLAN, or when it powers up (in a network with a Dynamic Host Configuration Protocol (DHCP) server), but the MAC address remains the same because it is associated with the device interface.
The 32 bit Source Address field (which contains the IPv4 address of the packet source) may be modified by a NAT (Network Address Translation) device. A source IP address can be all 0s (the unspecified address) in certain cases, but it can never be a multicast address (which identifies only a group of multicast receivers). The 32 bit Destination Address field (which contains the IPv4 address of the packet's receiving endpoint) may also be modified by NAT, for example, in a reply packet. The destination address can be a unicast or multicast IPv4 address, but it can never be all 0s (the unspecified address).
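The two address rules above can be expressed with Python's standard `ipaddress` module. This is a sketch that checks only the rules stated in this paragraph (the helper name is hypothetical); real IP stacks apply further restrictions that are not modeled here.

```python
import ipaddress

def check_ipv4_endpoints(src: str, dst: str) -> list:
    """Return the rule violations (if any) for an IPv4 source/destination pair."""
    source = ipaddress.IPv4Address(src)
    destination = ipaddress.IPv4Address(dst)
    problems = []
    if source.is_multicast:            # a group address cannot originate a packet
        problems.append("source is multicast")
    if destination.is_unspecified:     # 0.0.0.0 cannot be a destination
        problems.append("destination is unspecified (0.0.0.0)")
    return problems

print(check_ipv4_endpoints("0.0.0.0", "255.255.255.255"))  # []
print(check_ipv4_endpoints("224.0.0.5", "0.0.0.0"))
# ['source is multicast', 'destination is unspecified (0.0.0.0)']
```

The first call corresponds to a DHCP client that does not yet have an address and sends from 0.0.0.0 to the broadcast address.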
For convenience and readability, 32 bit IPv4 addresses are often written in dotted-decimal notation. The 4 byte address is split into four octets (bytes), each octet is expressed as a separate decimal number (taking values from 0 to 255), and the four numbers are separated by periods (.). Within an octet, the rightmost bit represents 2^0 (or 1), and bit values double toward the left up to the first (leftmost) bit of the octet, which represents 2^7 (or 128). The following are IP addresses expressed both in binary and dotted-decimal formats:
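Converting between a 32 bit value and its dotted-decimal and binary forms is a matter of shifting out one byte at a time, most significant byte first. A minimal sketch follows; the helper names are illustrative and the sample address 192.14.18.130 is an arbitrary example.

```python
def from_dotted_decimal(text: str) -> int:
    """Parse 'a.b.c.d' into a 32 bit integer (most significant octet first)."""
    octets = [int(part) for part in text.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"invalid dotted-decimal address: {text!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value

def to_dotted_decimal(addr: int) -> str:
    """Render a 32 bit integer as four decimal octets separated by periods."""
    if not 0 <= addr < 2**32:
        raise ValueError("not a 32 bit value")
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def to_dotted_binary(addr: int) -> str:
    """Render the same address as four 8 bit binary groups."""
    return ".".join(format((addr >> shift) & 0xFF, "08b") for shift in (24, 16, 8, 0))

addr = from_dotted_decimal("192.14.18.130")
print(to_dotted_binary(addr))   # 11000000.00001110.00010010.10000010
print(to_dotted_decimal(addr))  # 192.14.18.130
```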
A 32 bit IPv4 address is, in general, organized in two primary parts: the network prefix and the host identifier (or host number). The first part consists of a number of the most significant bits and constitutes the network address (also called the network prefix or network block). This part identifies a whole network or subnet.
The remaining part, made up of the least significant bits, forms the host identifier. This part specifies a particular interface of a host on that network. The distinction between the network prefix (subnetwork or subnet) and the host identifier in an IP address is the basis of traffic routing between IP networks and subnets. The network part (prefix), which can be of variable length, is also the basis on which IP address allocation policies are developed.
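Because the network prefix occupies the most significant bits, splitting an address into its two parts only needs a bit mask built from the prefix length. The sketch below assumes the prefix length is known (as in the CIDR notation discussed later); the function name is hypothetical.

```python
import ipaddress

def split_address(address: str, prefix_len: int):
    """Split an IPv4 address into (network prefix, host identifier)."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be 0..32")
    addr = int(ipaddress.IPv4Address(address))
    host_bits = 32 - prefix_len
    host_mask = (1 << host_bits) - 1   # 1s in the low-order host bits
    network = addr & ~host_mask        # clear host bits, keep the prefix
    host = addr & host_mask            # keep only the host bits
    return str(ipaddress.IPv4Address(network)), host

print(split_address("192.14.18.130", 24))  # ('192.14.18.0', 130)
print(split_address("192.14.18.130", 27))  # ('192.14.18.128', 2)
```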
All IP (host) interfaces within a single network or subnet have the same network prefix. This picture gets complicated with the use of supernetting (also called prefix aggregation, route aggregation, or route summarization) [RFC1519], which is not discussed in detail in this book. Each individual interface (host) within the network/subnet also has its own host identifier, which, together with the network prefix, uniquely identifies it in that network.
Also, depending on the interface type and the scope of the network, the IP address assigned to an interface can be either locally or globally unique. Host interfaces that are accessible/visible to IP nodes or hosts outside their local network (e.g., email servers, web servers, video servers) must be assigned globally unique IP addresses. Host interfaces that are accessible/visible only within their local network must be assigned locally unique IP addresses.
The Internet Assigned Numbers Authority (IANA) is the central numbering authority responsible for assigning IP addresses, in addition to other related activities such as root zone management in the Domain Name System (DNS) and autonomous system number allocation. IANA is responsible for ensuring that IP addresses given out are globally unique where required. IANA has also reserved a large IP address space for use by device interfaces that are not accessible/visible outside their local networks.
IANA allocates blocks of addresses to the Regional Internet Registries (RIRs), which then pass the addresses on to their customers (Local Internet Registries (LIRs), Internet service providers, end-user organizations, etc.) that carry out the actual allocation to end users. A local Internet registry receives an address allocation from a regional Internet registry and then assigns parts of the allocation to its customers. Most local Internet registries also operate as Internet service providers.
IPv4 address allocation went through a number of historical changes, from the original ARPANET (Advanced Research Projects Agency Network) address allocation scheme to classful networking, to networking with Variable-Length Subnet Mask (VLSM), and then to Classless Inter-Domain Routing (CIDR).
In this older IP addressing scheme, the 32 bit IP address was organized into two parts: a network identifier and a host identifier. The network identifier (or network number field) was carried in the most significant (highest order) octet of the 4-octet IP address, and the host identifier (or local address) was carried in the remaining 3 octets (also called the rest field). The network number field specified the particular network a host was attached to, while the local address (rest field) uniquely identified a host connected to that network. This addressing system allowed the creation of a maximum of 256 (2^8) unique networks.
This addressing scheme was adequate at the time, because only a few large IP networks existed, one of which was the ARPANET (which was assigned the network number 10). However, with the wide and rapid proliferation of IP networking and the construction of local area networks (LANs), subnets, and large individual IP networks (academic, enterprise, and service provider networks), it quickly became clear that this addressing scheme was inadequate and would not scale with future network growth.
In this IP addressing system, the high-order bits of the 4-octet (32 bit) IP address are used to define a set of network classes. This scheme was intended to address the limitations of the original ARPANET addressing scheme and to provide flexibility in the number of addresses allocated to networks of different sizes. It defined five address classes: Class A, B, C, D, and E [RFC791]. Each address class, identified by the leading bits (up to the first 4 bits) of the 32 bit IP address, defines either a network size, that is, the number of potential hosts requiring unicast addresses (classes A, B, C), or a multicast network (class D).
Classes A, B, and C allocate a different number of bits to the network identifier. The remaining bits in an address of each class are used to identify a host within a network of that class. This means each address class has a different maximum number of host identifiers/addresses that can be assigned to potential hosts.
Class D was designated for IP multicast addressing, while class E was reserved for experimental purposes or future applications. During processing of IP packets, a network node would examine the first few bits of the IP address to determine the class of the address and where the actual network identifier starts and ends.
This classful IP addressing scheme is illustrated in Table B.1. This table shows that each IP address class reserves/specifies a different number of bits for its network identifier (prefix) and host identifier:
Table B.1 Classful Addressing
Class | Leading Bits (Class Identifier) | Size of Network Identifier Field (Bits) | Size of Host Identifier Field (Bits) | Number of Networks | Addresses Per Network | Start Address | End Address |
Class A | 0 | 8 | 24 | 128 (= 2^7) | 16,777,216 (= 2^24) | 0.0.0.0 | 127.255.255.255 |
Class B | 10 | 16 | 16 | 16,384 (= 2^14) | 65,536 (= 2^16) | 128.0.0.0 | 191.255.255.255 |
Class C | 110 | 24 | 8 | 2,097,152 (= 2^21) | 256 (= 2^8) | 192.0.0.0 | 223.255.255.255 |
Class D (multicast) | 1110 | Not defined | Not defined | Not defined | Not defined | 224.0.0.0 | 239.255.255.255 |
Class E (reserved) | 1111 | Not defined | Not defined | Not defined | Not defined | 240.0.0.0 | 255.255.255.255 |
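A classful router inferred the class, and hence the network/host split, from the leading bits alone. A minimal sketch of that test, following Table B.1 (the function name is an assumption, and only the first octet is inspected):

```python
def address_class(address: str) -> str:
    """Return the classful address class (A to E) from the leading bits."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet out of range")
    if first_octet >> 7 == 0b0:       # leading bit 0
        return "A"
    if first_octet >> 6 == 0b10:      # leading bits 10
        return "B"
    if first_octet >> 5 == 0b110:     # leading bits 110
        return "C"
    if first_octet >> 4 == 0b1110:    # leading bits 1110 (multicast)
        return "D"
    return "E"                        # leading bits 1111 (reserved)

for a in ("10.1.2.3", "172.16.0.1", "192.14.18.0", "224.0.0.5", "250.1.1.1"):
    print(a, address_class(a))
```

Under these rules 192.14.18.0 is reported as class C, since its leading bits are 110.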
The three IP address classes can be represented in binary format as follows, with an h representing each bit in the host identifier:
Each bit (h) in a host identifier can have a 0 or 1 value. For example, if only 3 bits are reserved for specifying the host identifier, then we have the following possible host identifiers:
For each IP address class, if H is the number of host identifier bits, then the maximum number of host identifiers that can be supported by that particular network prefix is 2^H.
The maximum number of usable addresses for addressing individual hosts in each address class, however, is 2^H – 2. The minus 2 accounts for the predefined all 0s host identifier used in a network address and the all 1s host identifier used in a broadcast address. In a classful network, the mask identifying where the network prefix starts and ends is implicitly derived (inferred) from the IP address itself (i.e., from the leading address bits as shown in Table B.1).
In common network practice, the all-zeros (all 0s) host identifier is reserved for referring to the network itself (the entire network or subnet), and the all-ones (all 1s) host identifier is used as the broadcast address of that network or subnet. These two reservations reduce the number of identifiers/addresses available for hosts in a network or subnet by 2. /31 networks (mask 255.255.255.254) are rarely used, and then typically only on point-to-point links [RFC3021]. Such a link has only two hosts (the endpoints), so network and broadcast addresses are not needed.
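The 2^H and 2^H – 2 counts, together with the /31 exception, can be checked with a few lines of Python. This is an illustrative sketch: the function name is hypothetical, and as an assumption it rejects /32 rather than modeling single-address routes.

```python
def usable_hosts(prefix_len: int) -> int:
    """Usable host addresses in an IPv4 subnet with the given prefix length."""
    if not 0 <= prefix_len <= 31:
        raise ValueError("prefix length must be 0..31 in this sketch")
    h = 32 - prefix_len            # H, the number of host identifier bits
    if prefix_len == 31:
        return 2                   # point-to-point link [RFC3021]: no network/broadcast
    return 2**h - 2                # drop the all-0s and all-1s host identifiers

# The 2**3 = 8 possible host identifiers when H = 3:
print([format(i, "03b") for i in range(2**3)])
# ['000', '001', '010', '011', '100', '101', '110', '111']

for cls, prefix in (("A", 8), ("B", 16), ("C", 24)):
    print(cls, usable_hosts(prefix))
# prints A 16777214, B 65534, C 254 (one per line)
```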
To address the physical, architectural, size, and management limitations encountered when constructing large networks, network designers often segment large networks into smaller subnetworks (subnets). Let us assume three hosts connected to one subnet (Subnet Delta) and three other hosts connected to a second subnet (Subnet Gamma). Together, these six hosts and the two subnets (Delta and Gamma) form a larger network. If we assume that the entire network is assigned the 16 bit network prefix 192.14.0.0 (i.e., 192.14.0.0/16), then each of the six hosts will be assigned an IP address that carries this network prefix.
In addition to sharing the same network prefix (i.e., 192.14), the hosts on each subnet share the same subnet address (192.14.Delta.0 and 192.14.Gamma.0, respectively). All hosts in the same subnet must have the same subnet identifier/address. Let us assume that Subnet Delta is assigned the IP address 192.14.125.0, while Subnet Gamma is assigned the IP address 192.14.18.0.
The Gamma subnet address 192.14.18.0 can be expressed in binary notation as follows:
11000000.00001110.00010010.00000000
For this network, the first 24 bits of the 32 bit address identify the subnet, and the last 8 bits are not significant for network identification. The Gamma subnet can therefore be written as 192.14.18.0/24 (or just 192.14.18/24), where /24 is the prefix length, corresponding to the subnet mask 255.255.255.0.
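Converting between a prefix length and a dotted-decimal mask amounts to shifting 32 one-bits. The sketch below uses hypothetical helper names and the standard `ipaddress` module only for formatting; it also rejects masks whose 1 bits are not contiguous, since such masks cannot be written in /n form.

```python
import ipaddress

def prefix_to_mask(prefix_len: int) -> str:
    """Convert a prefix length such as 24 into a dotted-decimal subnet mask."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be 0..32")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask))

def mask_to_prefix(mask: str) -> int:
    """Convert a dotted-decimal mask back to a prefix length."""
    value = int(ipaddress.IPv4Address(mask))
    prefix = bin(value).count("1")
    # A valid mask is a run of 1s followed only by 0s.
    if value != (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF:
        raise ValueError(f"non-contiguous mask: {mask}")
    return prefix

print(prefix_to_mask(24))                 # 255.255.255.0
print(mask_to_prefix("255.255.255.224"))  # 27
# The standard library gives the same mask for the Gamma subnet:
print(ipaddress.ip_network("192.14.18.0/24").netmask)  # 255.255.255.0
```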
In the past, subnets were created based on address classes. A subnet could have 8 (class A; /8), 16 (class B; /16), or 24 (class C; /24) network identifier bits, corresponding to a maximum of 2^24, 2^16, or 2^8 host identifiers. As a result, if an entire class B prefix (/16 subnet) was allocated to a network that required only 600 addresses, then 64,936 (2^16 – 600 = 65,536 – 600 = 64,936) addresses would be wasted.
The introduction of variable-length subnet masks (VLSMs) allowed IPv4 address spaces to be allocated more efficiently, without the wastage seen with classful addressing [RFC950, RFC1878]. VLSM allows network designers to subdivide an address space into subnets of varying sizes, each sized more precisely to the number of addresses a particular network requires, instead of being constrained by class boundaries and leaving large parts of the address space unused. VLSM was the basis on which Classless Inter-Domain Routing (CIDR), discussed below, was developed.
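Choosing a VLSM subnet size amounts to finding the smallest number of host bits H such that 2^H – 2 is at least the number of hosts required. A hedged sketch, assuming the usual reservation of the all-0s and all-1s host identifiers (the function name is hypothetical):

```python
def smallest_prefix_for(hosts: int) -> int:
    """Longest prefix whose subnet offers at least `hosts` usable addresses.

    Assumes the 2**H - 2 rule: the all-0s and all-1s host identifiers are reserved.
    """
    if hosts < 1:
        raise ValueError("need at least one host")
    # Smallest H with 2**H >= hosts + 2, computed exactly with int.bit_length().
    host_bits = (hosts + 1).bit_length()
    if host_bits > 32:
        raise ValueError("too many hosts for IPv4")
    return 32 - host_bits

for need in (19, 47, 600):
    print(f"{need} hosts -> /{smallest_prefix_for(need)}")
# 19 hosts -> /27
# 47 hosts -> /26
# 600 hosts -> /22
```

With this rule, the 600-host network mentioned above fits in a /22 (1,024 addresses) instead of a full /16.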
As an example, let us assume a network with the prefix 192.14.18/24 is divided into two smaller subnets, one consisting of 19 hosts and the other of 47 hosts. To accommodate 19 hosts, the first subnet must have 2^5 (32) host identifiers. Assigning 5 bits to the host identifier leaves 27 bits of the 32 bit address for the subnet identifier. The IP address of the first subnet then becomes 192.14.18.128/27, which can be expressed in binary notation as follows:
11000000.00001110.00010010.100xxxxx
To get the “128” in the above address, the “100xxxxx” is converted to “10000000”, which is equal to the decimal value of 128. The subnet mask /27 covers the 27 most significant bits of the IP address. For the second subnet of 47 hosts, the network must accommodate 2^6 (64) host identifiers. Assigning 6 bits to the host identifier leaves 26 bits of the 32 bit address for the subnet identifier. The IP address of the second subnet is therefore 192.14.18.64/26, which in binary notation is as follows:
11000000.00001110.00010010.01xxxxxx
To get the “64” in the above address, the “01xxxxxx” is converted to “01000000”, which is equal to the decimal value of 64. Starting from the shorter /24 prefix (the larger address block), the network designer is able to assign the remaining address bits to create two smaller, non-overlapping subnets (192.14.18.128 to 192.14.18.159 and 192.14.18.64 to 192.14.18.127). In this way, the allocated address space is used more efficiently.
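The two subnets in this example can be reproduced with the standard `ipaddress` module by splitting the /24 into /26 blocks and one /26 further into /27 blocks. Which blocks are picked simply mirrors the example; this is a sketch, not a general allocation algorithm.

```python
import ipaddress

block = ipaddress.ip_network("192.14.18.0/24")

# Split the /24 into four /26 blocks: .0/26, .64/26, .128/26, .192/26.
quarters = list(block.subnets(new_prefix=26))
subnet_47 = quarters[1]                                   # 192.14.18.64/26
subnet_19 = list(quarters[2].subnets(new_prefix=27))[0]   # 192.14.18.128/27

assert not subnet_19.overlaps(subnet_47)
for net in (subnet_19, subnet_47):
    hosts = list(net.hosts())   # excludes the network and broadcast addresses
    print(net, net.netmask, len(hosts), hosts[0], hosts[-1])
# 192.14.18.128/27 255.255.255.224 30 192.14.18.129 192.14.18.158
# 192.14.18.64/26 255.255.255.192 62 192.14.18.65 192.14.18.126
```

The remaining space (192.14.18.0/26, 192.14.18.160/27, and 192.14.18.192/26) stays free for future subnets.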
As IP networks grew to accommodate more users, it became apparent that many organizations needed larger address blocks than a class C (/24) network provided. These organizations were therefore allocated a class B (/16) address block, which in many cases was much larger than their networks required. Also, as enterprise and service provider networks and the Internet grew rapidly, the pool of unassigned class B networks (2^14, or about 16,000) was quickly depleted.
Also, during the early phase of Internet development, some organizations were allocated address spaces far larger than they actually needed. These factors, among others, led to inefficient address allocation and use, as well as poor routing in networks. With class B addresses seriously on the verge of depletion, a large number of class C addresses were given out. The many smaller class C blocks allocated were used to create (geographically dispersed) networks, which resulted in very large routing tables in routers. These smaller networks were designed and dispersed such that they offered little opportunity for route aggregation.
CIDR (which is based on VLSM) was designed to address the limitations of classful addressing [RFC1517, RFC1518, RFC1519, RFC4632]. The classful method of allocating the IP address space (combined with the way IP packets are routed) prevented networks built from the smaller address classes from scaling. CIDR was developed to allow a network designer to flexibly repartition any address space (without being limited by class boundaries) into larger or smaller blocks of addresses to be allocated to users. Besides replacing the inefficient classful addressing method, CIDR slowed both the rapid depletion of IPv4 addresses and the growth of routing tables on routers.
In classful addressing as discussed above, the network addresses (prefixes) are written in a field that is one or more bytes in length, resulting in the class A, B, or C addresses shown in Table B.1. IP address allocations were therefore based on the byte boundaries of the 4 bytes of an IP address. A full IP address was considered to be the concatenation of an 8, 16, or 24 bit network prefix and a corresponding 24, 16, or 8 bit host identifier. With this, the smallest address allocation was the class C addresses that carried only a maximum of 256 host identifiers. This was often too small for most organizations. The larger class B addresses carried 65,536 host identifiers, which was often too large to be used efficiently by even large networks/organizations.
As enterprise and service provider networks grew, it became increasingly apparent that more flexible and efficient IPv4 addressing methods were needed. With CIDR, an address space is allocated to an organization on any address bit boundary, instead of on 1 byte (octet) boundaries. CIDR allows a network designer to partition a larger network into subnets of various sizes, so that each network can be sized appropriately to meet the requirements of the organization.
The CIDR notation (which is now the standard way of representing IP addresses) is a syntax for specifying IP addresses and their associated prefixes for the purpose of routing packets in networks. In this notation, a network address or routing prefix in the IP address is written with a suffix indicating the number of bits of the prefix. Examples are, for IPv4, 192.168.18.0/24, and for IPv6, 2001:db8:abcd:0012::0/64.
With the introduction of CIDR, the process of allocating address blocks to organizations is based on the actual short-term and projected long-term needs of the organizations. The CIDR prefix-based method of representing IP addresses and the associated route aggregation properties allow blocks of addresses to be grouped into single routing table entries. This enables routing to be done more efficiently in the Internet. These address groups (commonly called CIDR blocks) each share a common shorter prefix.
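Route aggregation can be experimented with using `ipaddress.collapse_addresses`, which merges contiguous, properly aligned prefixes into the fewest covering blocks. The 192.14.16.0 to 192.14.20.0 routes below are hypothetical examples.

```python
import ipaddress

# Four contiguous /24 routes that align on a /22 boundary.
routes = [ipaddress.ip_network(f"192.14.{i}.0/24") for i in (16, 17, 18, 19)]
aggregate = list(ipaddress.collapse_addresses(routes))
print(aggregate)  # [IPv4Network('192.14.16.0/22')]

# Contiguity alone is not enough: blocks must also align on the shorter prefix.
shifted = [ipaddress.ip_network(f"192.14.{i}.0/24") for i in (17, 18, 19, 20)]
print([str(n) for n in ipaddress.collapse_addresses(shifted)])
# ['192.14.17.0/24', '192.14.18.0/23', '192.14.20.0/24']

# supernet() moves a single prefix up to a shorter mask.
print(ipaddress.ip_network("192.14.18.0/24").supernet(new_prefix=22))  # 192.14.16.0/22
```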
CIDR allows multiple contiguous network prefixes to be aggregated (called route aggregation, route summarization, or prefix aggregation) to create a supernetwork (or supernet). The resulting supernet has a shorter network prefix (fewer mask bits) than the individual prefixes used to construct it. Routers advertise the supernet prefix as an aggregate, thus reducing the number of entries in routing tables. The advantages of CIDR and route aggregation can be summarized as follows:
CIDR address blocks are managed by the IANA with assistance from the RIRs. The IANA allocates large, short-prefix CIDR address blocks to the RIRs, which are then responsible for distributing them to their customers. The RIRs (responsible for address management and allocation in each geographic area: North America, Europe, Africa, etc.) divide these short-prefix blocks and allocate the subblocks to the LIRs. Similarly, the LIRs subdivide the address blocks they receive and allocate them to their customers. End-user networks receive address prefixes from the LIRs sized according to their network needs.
The following are examples of reserved IPv4 addresses:
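The following three blocks of the IPv4 address space are reserved by the IANA for private networks:
10.0.0.0 to 10.255.255.255 (10.0.0.0/8)
172.16.0.0 to 172.31.255.255 (172.16.0.0/12)
192.168.0.0 to 192.168.255.255 (192.168.0.0/16)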
From Table B.1, it can be seen that only parts of the “172” and the “192” address ranges are designated for private use. The remaining addresses in these address classes (Classes B and C) are public and routable on the global Internet.
Packets with these addresses cannot be routed on the Internet; routers on the public Internet drop them. To communicate with outside networks, these IP addresses have to be translated to public (routable) IP addresses by a NAT (Network Address Translation) device or a web proxy server. A separate range of private addresses was created to conserve the already limited pool of public, routable IPv4 addresses. By using private address ranges in homes, offices, campuses, and similar environments, the global demand for routable IPv4 addresses decreased significantly, which has also helped delay the exhaustion of routable IPv4 addresses.
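A minimal membership test against the three private blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, defined in RFC 1918) might look as follows; the helper name is hypothetical. The standard library's `is_private` attribute is broader than this check, since it also returns True for other special-purpose ranges such as loopback and link-local addresses.

```python
import ipaddress

# The three private IPv4 blocks reserved for private networks (RFC 1918).
PRIVATE_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(address: str) -> bool:
    """True if the address falls inside one of the three private blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)

for a in ("10.0.0.1", "172.31.255.255", "172.32.0.1", "192.168.1.1", "192.14.18.1"):
    print(a, is_rfc1918(a))
```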
When a process running on a host generates a packet with the destination IP address set to the loopback address, the operating system loops the packet back internally without involving the network interface adapter. Data sent to the loopback address are forwarded by the operating system to a virtual network interface within the operating system that turns them around. The loopback address is mostly used for testing, on a single host, how a client–server application works after it has been implemented.
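The loopback path can be observed with an ordinary UDP socket bound to 127.0.0.1, the conventional IPv4 loopback address: a datagram sent to the socket's own address comes straight back without touching a network adapter. A sketch, assuming the host permits binding an ephemeral port on the loopback interface (the function name is hypothetical):

```python
import socket

def loopback_roundtrip(message: bytes) -> bytes:
    """Send a UDP datagram to our own loopback address and read it back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
        sock.settimeout(2.0)                 # fail rather than hang if nothing arrives
        sock.sendto(message, sock.getsockname())
        data, _peer = sock.recvfrom(2048)
    return data

print(loopback_roundtrip(b"hello"))  # b'hello'
```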
The assignment of link-local addresses may be done manually by a network administrator or automatically by mechanisms and procedures in a host's operating system [RFC3927]. If a host cannot obtain an IP address from a DHCP (Dynamic Host Configuration Protocol) server and has not been assigned an IP address manually, it can assign itself an address from a reserved range of link-local addresses. To do so, the host randomly chooses an IP address from that range and then checks (via ARP) that no other host on the link is already using the same address.
Once the hosts (on a point-to-point link or connected to the same network segment) are configured with link-local addresses in the same address range, they can communicate directly with each other (without a router). These addresses do not allow hosts to communicate with each other when they are not connected to the same physical or logical network link or segment.
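The self-assignment procedure can be sketched as follows. The candidate range 169.254.1.0 to 169.254.254.255 is the one RFC 3927 specifies for selection; the `in_use` set is a hypothetical stand-in for ARP probing, which this sketch does not perform on a real network, and the retry limit is an arbitrary assumption.

```python
from __future__ import annotations

import ipaddress
import random

# RFC 3927 selection range (the first and last 256 addresses of 169.254/16 are excluded).
FIRST = int(ipaddress.IPv4Address("169.254.1.0"))
LAST = int(ipaddress.IPv4Address("169.254.254.255"))

def pick_link_local(in_use: set[str], rng: random.Random, max_tries: int = 10) -> str:
    """Pick a random link-local address that no other host claims.

    `in_use` stands in for the replies an ARP probe would elicit on a real
    link; no packets are sent here.
    """
    for _ in range(max_tries):
        candidate = str(ipaddress.IPv4Address(rng.randint(FIRST, LAST)))
        if candidate not in in_use:     # no conflicting claim: keep it
            return candidate
    raise RuntimeError("could not find an unused link-local address")

addr = pick_link_local({"169.254.10.20"}, random.Random(7))
print(addr, ipaddress.ip_address(addr).is_link_local)
```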
Address resolution can be done via one of these methods depending on the format of the original address:
The MAC address of an interface can be obtained from its already known IP address using ARP for IPv4 or the Neighbor Discovery Protocol (NDP) for IPv6. ARP and NDP translate IP addresses (OSI Layer 3) into Ethernet MAC addresses (OSI Layer 2). A device maintains an ARP table in which it keeps the IP address-to-MAC address mappings for its network.
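To illustrate what an ARP query carries, the sketch below packs the 28 byte ARP request body for IPv4 over Ethernet (RFC 826 layout) with `struct`. It only builds the bytes: sending them requires a raw link-layer socket and privileges, which are not shown. The function name and the MAC/IP values are made up for illustration.

```python
import socket
import struct

def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request asking which MAC address owns target_ip."""
    sha = bytes.fromhex(sender_mac.replace(":", ""))
    if len(sha) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                          # hardware type: Ethernet
        0x0800,                     # protocol type: IPv4
        6, 4,                       # hardware and protocol address lengths
        1,                          # operation: 1 = request, 2 = reply
        sha, socket.inet_aton(sender_ip),
        b"\x00" * 6,                # target MAC unknown: that is the question
        socket.inet_aton(target_ip),
    )

request = build_arp_request("00:00:5e:00:53:01", "192.14.18.10", "192.14.18.1")
print(len(request))  # 28
```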
The rapid growth of the Internet and its extensive reach drastically increased the number of devices that need unique IP addresses to communicate with others. Eventually, enterprises and service providers could no longer give their customers globally unique IPv4 addresses, nor could they obtain new globally unique IPv4 addresses for expanding their networks. In spite of this, service providers were expected to continue serving existing customers and to accept new ones.
A service provider can accept new customers requiring globally unique addresses if its allocated IPv4 address space can accommodate them. With the 32 bit (4 byte) IPv4 address field, the total IPv4 address space is limited to 4,294,967,296 (2^32) addresses. At the start of the Internet, the 32 bit IPv4 address space was considered large enough for future needs, so there was little concern about address depletion.
However, the rapid growth in the number of devices (e.g., mobile devices (laptop computers, smartphones, etc.), always-on communication devices (ADSL modems, cable modems), communication-enabled vehicles, and other electronic devices) expanded the demand for IPv4 addresses, a situation that was not foreseen at the start of the Internet. As addresses were assigned to this wide range of new users, the number of unassigned addresses decreased. Furthermore, the rising use of Internet-enabled devices and appliances raised serious concerns that the public IPv4 address space would be depleted sooner rather than later. To address these concerns, the following practices were adopted (in addition to CIDR):
The various limitations of IPv4 discovered since its inception spurred the development of IPv6 in the 1990s; IPv6 has been in various stages of commercial deployment since 2006. The IETF designed IPv6 addressing with the main goal of overcoming the drawbacks of IPv4 addressing. IPv6 improved many of the functions of IPv4, with changes to addressing, security, configuration, and maintenance. IPv6 addresses are 128 bits long, large enough to allow a vastly larger number of devices to be assigned IP addresses.
Currently, a majority of the devices on the Internet still use IPv4, and a complete shift to IPv6 is expected to be far in the future. This means that IPv4 and IPv6 will coexist for many more years, and this coexistence must be transparent to users of either protocol. A number of mechanisms have been defined [RFC2893, RFC7059] that allow IPv4 and IPv6 to coexist until the Internet shifts to IPv6 only:
It is argued that the best long-term solution to IPv4 address depletion is to move to IPv6, although the long-term use of IPv4 is still being debated. IPv6 provides a much larger address space that also allows improved route aggregation across the Internet and offers large subnetwork allocations to organizations. Migration to IPv6 is in progress (e.g., for communication-enabled vehicles, sensors, and Internet-of-Things (IoT) devices), but complete migration is not expected soon.
Service providers and some enterprises have to make compromises between growing their networks using IPv6 and continuing to serve existing and new IPv4 customers. The technologies and solutions discussed above enable enterprises and service providers to implement mixed IP addressing solutions even as they build IPv6 networks to accommodate new services such as communication-enabled vehicles, sensors, and IoT devices.
If the value of the packet's Internet Header Length (IHL) field is greater than 5 (i.e., from 6 to 15, in units of 32 bit words), the packet carries options that must be processed. The Options field is a variable length field (up to 40 bytes in length) and consists of the following subfields:
The value in the IHL field must include enough extra 32 bit words to hold all the options, plus any padding needed to ensure that the header contains an integer number of 32 bit words. The Options field is rarely used because of network security concerns; in particular, these concerns discourage the use of the Loose Source and Record Route and Strict Source and Record Route options.
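The relationship between IHL, header size, and option padding is simple arithmetic: the header is IHL × 4 bytes, so options occupy IHL × 4 – 20 bytes, at most 40 when IHL = 15. A sketch with hypothetical helper names; padding uses zero bytes, as RFC 791 specifies for header padding.

```python
def options_layout(ihl: int):
    """Return (header_bytes, options_bytes) implied by an IHL value."""
    if not 5 <= ihl <= 15:
        raise ValueError("IHL must be between 5 and 15")
    header_bytes = ihl * 4            # IHL counts 32 bit (4 byte) words
    return header_bytes, header_bytes - 20

def pad_options(options: bytes) -> bytes:
    """Zero-pad option bytes so the header ends on a 32 bit boundary."""
    padded = options + b"\x00" * (-len(options) % 4)
    if len(padded) > 40:
        raise ValueError("IPv4 options cannot exceed 40 bytes")
    return padded

print(options_layout(5))    # (20, 0): no options
print(options_layout(15))   # (60, 40): maximum options space
opts = pad_options(b"\x01\x01\x01")   # three 1 byte No Operation options
print(len(opts), 5 + len(opts) // 4)  # 4 6 (padded length, resulting IHL)
```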
With IPv4, routers can fragment packets (split them into multiple smaller units) when they must transmit them on network interfaces that cannot handle the larger packets. Fragmented packets must be reassembled at the destination node. Fragmentation and reassembly are considerably more complex in IPv4, and the process has been simplified in IPv6.
MTU mismatch occurs when the MTU of the output network interface is smaller than the MTU of the input interface. This mismatch can result in packet fragmentation or discard at the device having the mismatch. It is possible for an IP packet to be fragmented at a routing device, and for the fragments (which are carried in whole IP packets with their own headers) to be again fragmented at another routing device. The Identification (fragment ID) field (16 bits) in the IP header identifies the original packet a fragment belongs to. This information is used by the destination node to reassemble the fragmented packet later. Each original packet is assigned a unique Identification value (fragment ID) and every fragment of that packet is assigned the same Identification value.
In IPv4, any router (as well as the sender) can fragment an IP packet. Usually, only the destination endpoint reassembles fragments into a complete packet. However, a border firewall (at the edge of a network) may reassemble fragments to recover the original packets so that it can enforce its security filtering rules. In IPv6, packet fragmentation can be done only by the source node. That node inserts a Fragment extension header carrying the information the destination node needs to reassemble the complete packet.
The first bit of the IP header Flags field is reserved, and all nodes sourcing a packet must set this bit to zero. The second bit, the DF (Don't Fragment) flag, when set (= 1), indicates that any routing device receiving the packet must not fragment it. If a packet with DF set reaches a routing device whose output interface cannot handle a packet of that size, the packet is dropped instead. An ICMP Destination Unreachable message (Fragmentation Needed and DF Set) is then sent to the sender. The DF flag is therefore useful when a source node wants to determine whether packets of a given size can be forwarded along the path to a destination without being fragmented.
The DF flag can be used for Path MTU Discovery [RFC1191]. This can be done automatically by features implemented in the host's IP software, or manually with diagnostic tools such as ping [RFC792, RFC1122] or Traceroute (IPv4 Option 18) [RFC792]. The third bit, the MF (More Fragments) flag, when set indicates that more fragments follow the fragment carrying it. Packets that have not been fragmented have the MF flag set to 0. All fragments of a packet except the last have the MF flag set to 1.
The 13 bit Fragment Offset field is used by the destination node when reassembling fragmented packets. The offset is measured in 8 byte blocks and gives the position of a particular fragment's data relative to the start of the original IP packet's data. The first fragment of an IP packet (which constitutes the start of the packet) has an offset of 0. The last fragment of a packet has the MF flag set to 0 but carries a nonzero Fragment Offset value. This distinguishes it from a packet that has not been fragmented, which has MF = 0 and an offset of 0.
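The Flags and Fragment Offset fields share a single 16 bit word (header bytes 6 and 7). The Python sketch below shows how the bits can be separated, and how MF and the offset together distinguish unfragmented packets from first, middle, and last fragments. The function names `decode_flags_offset` and `classify` are hypothetical, and the sample words are made-up values.

```python
def decode_flags_offset(word: int) -> dict:
    """Split the 16 bit Flags/Fragment Offset word (IPv4 header bytes 6 and 7)."""
    return {
        "reserved": (word >> 15) & 1,   # must be 0
        "DF": (word >> 14) & 1,         # Don't Fragment
        "MF": (word >> 13) & 1,         # More Fragments
        "offset_units": word & 0x1FFF,  # 13 bit offset in 8 byte blocks
    }


def classify(word: int) -> str:
    """Use MF and Fragment Offset to tell fragments from unfragmented packets."""
    f = decode_flags_offset(word)
    if f["MF"] == 0 and f["offset_units"] == 0:
        return "not fragmented"
    if f["offset_units"] == 0:
        return "first fragment"      # MF = 1, offset = 0
    if f["MF"] == 1:
        return "middle fragment"     # MF = 1, offset > 0
    return "last fragment"           # MF = 0, offset > 0


for word in (0x4000, 0x2000, 0x20B9, 0x0172):
    print(hex(word), decode_flags_offset(word), classify(word))
```

For example, 0x20B9 has MF set and an offset of 185 units (1480 bytes). 0x0172 has MF clear and an offset of 370 units, so it is a last fragment.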
The 13 bit Fragment Offset field can express a maximum offset of (2^13 − 1) × 8 = 65,528 bytes. Adding the 20 byte basic IP header gives 65,528 + 20 = 65,548 bytes, which exceeds the maximum IP packet length of 65,535 bytes (i.e., 2^16 − 1). The maximum IP payload is therefore limited to 65,535 − 20 = 65,515 bytes. Dividing this payload by the 8 byte offset unit gives 65,515 ÷ 8 = 8189.375, that is, a maximum of 8189 whole offset units.
This means the Fragment Offset field is limited in practice to a maximum of 8189 offset units, not the 8191 (i.e., 2^13 − 1) that the 13 bit field could represent. An IP fragment carrying a Fragment Offset value of 8189 (necessarily the last fragment) could carry a maximum payload of 65,515 − (8189 × 8) = 65,515 − 65,512 = 3 bytes.
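The Fragment Offset limits can be reproduced with a few lines of Python. The sketch assumes the 20 byte basic header (no options) used in the text.

```python
MAX_PACKET = 2**16 - 1          # 65,535 bytes (16 bit Total Length field)
BASIC_HEADER = 20               # IPv4 header without options
OFFSET_UNIT = 8                 # Fragment Offset counts 8 byte blocks

max_field_offset = (2**13 - 1) * OFFSET_UNIT                 # 65,528 bytes
max_payload = MAX_PACKET - BASIC_HEADER                      # 65,515 bytes
max_offset_units = max_payload // OFFSET_UNIT                # 8189
last_payload = max_payload - max_offset_units * OFFSET_UNIT  # 3 bytes

print(max_field_offset + BASIC_HEADER)  # 65548, more than 65,535
print(max_offset_units, last_payload)   # 8189 3
```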
When a router receives a packet, it performs an IP forwarding table lookup using the destination IP address to determine the outgoing interface and that interface's MTU. If the packet is larger than the interface's MTU and the DF bit is 0, the router may divide the packet into fragments. If the DF bit is 1, the packet is dropped instead.
The maximum amount of data carried in each fragment is the MTU of the interface minus the IP header size. For every fragment except the last, this amount is rounded down to a multiple of 8 bytes, because offsets are expressed in 8 byte units. The IP header is 20 bytes minimum (without options) and 60 bytes maximum (with options). The router places each fragment in its own IP packet, whose header is modified as follows:
For an MTU of L bytes and a basic IP header of 20 bytes, the fragment offsets are multiples of (L − 20)/8, provided L − 20 is a multiple of 8. For example, with an interface MTU of 1500 bytes and the minimum IP header size of 20 bytes, the values carried in the Fragment Offset field are multiples of (1500 − 20)/8 = 185, that is, 0, 185, 370, 555, 740, and so on.
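The fragmentation rule can be sketched in Python as follows. The sketch assumes that every fragment carries a header of the same length (no options that are copied only into the first fragment). The function name `fragment` and the 4000 byte example packet are hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Return (offset_units, fragment_total_length, MF) for each fragment.

    Assumes every fragment carries a header of header_len bytes.
    """
    payload = total_length - header_len
    # Data per fragment: MTU minus header, rounded down to a multiple of 8
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_data, payload - offset)
        more = 1 if offset + size < payload else 0   # MF flag
        fragments.append((offset // 8, size + header_len, more))
        offset += size
    return fragments


# A hypothetical 4000 byte packet sent over a 1500 byte MTU interface
print(fragment(4000, 1500))  # [(0, 1500, 1), (185, 1500, 1), (370, 1040, 0)]
```

With a 1500 byte MTU the offsets step by 185 units, matching the multiples of (1500 − 20)/8 given in the text.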
To identify an arriving IP packet as a fragment, a receiver checks whether at least one of the following conditions is true: the MF flag is set to 1, or the Fragment Offset field is nonzero.
The receiver uses the Identification (fragment ID) field, together with the source address, destination address, and protocol fields, to identify the fragments that belong to the same (original) IP packet so that reassembly can be done. Using both the Fragment Offset values and the MF flag, the receiver then reassembles the packet from the fragments with the same Identification value.
Reassembly may involve placing the fragments in a reassembly buffer, with each new arriving fragment located in the reassembly buffer starting at fragment offset field value × 8 bytes from the front (beginning) of the buffer. When the receiver takes in the last fragment (which has the MF flag set to 0), it can compute the total length of the original data payload by multiplying the offset in the last fragment by 8 and adding the size of the data in the last fragment. After receiving all fragments, the receiver can sequence them in the correct order using their offsets. The reassembled original IP packet is then transferred to the upper-layer protocol for further processing.
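The buffer-based reassembly just described can be sketched in Python. This is a simplified illustration that assumes all fragments passed in already share the same Identification, source, destination, and protocol values. It does not handle overlapping or duplicate fragments, and the function name `reassemble` is hypothetical.

```python
def reassemble(fragments):
    """Reassemble one packet's payload from (offset_units, mf, data) tuples.

    Returns None while data is still missing.
    """
    buffer = {}          # byte position in the original payload -> data
    total = None         # known once the last fragment (MF = 0) arrives
    for offset_units, mf, data in fragments:
        start = offset_units * 8
        buffer[start] = data
        if mf == 0:
            total = start + len(data)   # offset x 8 + size of last fragment
    if total is None:
        return None
    payload = bytearray()
    for start in sorted(buffer):          # put fragments in offset order
        if start != len(payload):
            return None                   # gap (or overlap): not handled here
        payload += buffer[start]
    return bytes(payload) if len(payload) == total else None


# Three fragments arriving out of order (16 + 8 + 3 bytes of payload)
frags = [(3, 0, b"CCC"), (0, 1, b"A" * 16), (2, 1, b"B" * 8)]
print(len(reassemble(frags)))   # 27
print(reassemble(frags[:2]))    # None (middle fragment missing)
```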
If a 2500 byte IP packet is transmitted from a source and is fragmented into chunks (fragments) of 1020 bytes, three fragments can be created as follows:
In the above fragmentation process, “data size” includes the length of the ICMP, IGMP, or Transport Layer header.
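The 2500 byte example can be worked through in Python. The sketch assumes that both the original packet and each fragment carry a 20 byte header without options. Under that assumption, 2480 bytes of data are split into chunks of 1020 − 20 = 1000 bytes (already a multiple of 8). The function name `fragment_table` is hypothetical.

```python
def fragment_table(packet_len: int, header_len: int, fragment_len: int) -> list:
    """List data size, total length, offset and MF for each fragment."""
    data_left = packet_len - header_len               # 2480 bytes here
    chunk = (fragment_len - header_len) // 8 * 8      # 1000 bytes here
    rows, offset = [], 0
    while data_left > 0:
        size = min(chunk, data_left)
        data_left -= size
        rows.append({
            "data_size": size,
            "total_length": size + header_len,
            "offset_units": offset // 8,
            "MF": 1 if data_left > 0 else 0,
        })
        offset += size
    return rows


for number, row in enumerate(fragment_table(2500, 20, 1020), start=1):
    print(number, row)
```

Under these assumptions the three fragments carry 1000, 1000, and 480 bytes of data, with offsets 0, 125, and 250 and MF set on all but the last.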
The maximum length of an IP packet (65,535 bytes) is much larger than the maximum length of an Ethernet frame (1518 bytes, with a payload of 1500 bytes). This means IP packets larger than 1500 bytes must be fragmented and carried in several Ethernet frames. For example, a rough estimate of the number of Ethernet frames required to transport an IP packet of the maximum size of 65,535 bytes is 65,535 ÷ 1500 ≈ 43.69.
This suggests that at least 44 Ethernet frames are needed to transport one maximum-size IP packet across an Ethernet interface. Because each fragment also carries its own 20 byte IP header, only 1480 bytes of each frame carry packet data. The count is therefore 65,515 ÷ 1480 ≈ 44.27, so 45 frames are actually required. However, this example does not imply that IP packets are always fragmented before being forwarded over Ethernet, because most IP applications are designed to source packets in data blocks smaller than the maximum Ethernet frame size.
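The frame count can be checked in Python. The sketch computes the rough figure that ignores per-fragment headers, and the count when every fragment is assumed to carry a 20 byte IP header without options.

```python
import math

MAX_IP_PACKET = 65_535   # bytes
ETH_PAYLOAD = 1500       # Ethernet MTU in bytes
IP_HEADER = 20           # basic IPv4 header, no options

rough = MAX_IP_PACKET / ETH_PAYLOAD                   # about 43.69
data = MAX_IP_PACKET - IP_HEADER                      # 65,515 bytes to carry
per_frame = (ETH_PAYLOAD - IP_HEADER) // 8 * 8        # 1480 data bytes per fragment
frames = math.ceil(data / per_frame)                  # 45

print(round(rough, 2), math.ceil(rough), frames)      # 43.69 44 45
```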
IP forwarding typically involves an IP forwarding table lookup, decrementing the TTL count (by one), recalculating the IP header checksum, encapsulating the IP packet in a Data Link Layer frame, recalculating the Data Link layer checksum, and forwarding the frame to the correct output interface. Forwarding table lookups can be done in hardware, as can the decrementing of the TTL, recalculation of the IP header checksum, and the Data Link layer frame rewrites.
The routing devices also run routing protocols (such as OSPF, RIP, and BGP) allowing them to communicate with other routing devices to generate the information needed to build their routing tables. These routing tables are in turn used to generate the IP forwarding tables that can be used for the IP destination address lookups required to determine the outgoing interface for incoming packets.
Allocating blocks of class C addresses was one strategy used to slow the depletion of class B addresses. However, allocating many class C blocks required many more routing table entries to be maintained in routers. As discussed above, CIDR was introduced to improve both IPv4 address space utilization and routing scalability in the Internet. CIDR allows a block of IP addresses to be aggregated (summarized) into a single routing table entry. This consolidation significantly reduces the number of separate routing table entries maintained, particularly in core routers. The block of IP addresses is consolidated in the routing table entry as follows:
The start of the address block is the “lowest address in block” and the number of class C addresses in the block is specified by the supernet mask. The supernet mask (or the CIDR mask) contains 1s for the common prefix (i.e., part with identical binary values) for all the addresses, and 0s for the parts of the addresses that have different values.
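A supernet mask can be derived from a set of class C networks by keeping only the leading bits common to all of their addresses. The Python sketch below does this with the standard `ipaddress` module. The function name `supernet` and the four example networks are hypothetical.

```python
import ipaddress

def supernet(networks):
    """Return the network whose mask keeps only the bits common to all addresses."""
    nets = [ipaddress.IPv4Network(n) for n in networks]
    lowest = min(int(n.network_address) for n in nets)
    highest = max(int(n.broadcast_address) for n in nets)
    # Bits that differ between the lowest and highest address are not common
    prefix_len = 32 - (lowest ^ highest).bit_length()
    return ipaddress.IPv4Network(f"{ipaddress.IPv4Address(lowest)}/{prefix_len}",
                                 strict=False)


block = ["192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24"]
aggregate = supernet(block)
print(aggregate, aggregate.netmask)  # 192.168.0.0/22 255.255.252.0
```

If the networks do not form a contiguous, aligned block, the common-prefix network also covers addresses outside the set. `ipaddress.collapse_addresses` is a stricter alternative that merges only networks that aggregate exactly.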
Routes in a CIDR block can be summarized by routing devices into a single route advertisement called an aggregate. The networks or subnets that make up a given CIDR block are said to be more specific with respect to that CIDR block, because their prefixes are longer than the prefix of the CIDR block itself.
With the introduction of CIDR, routing devices perform IP forwarding table lookups using longest-prefix matching (LPM) searches (also called maximum prefix length match). Modern routers support CIDR and use LPM lookups to determine the next hop and outgoing interface for a packet. The CIDR mask determines the number of prefix bits used in the LPM search. If the forwarding table contains multiple routes with different prefix lengths that match the destination, the router selects the route with the longest prefix.
This is because each of the IP forwarding table entries may represent a specific subnet, creating a situation where one IP destination address may match more than one entry. When this happens, the most specific of these matching entries, that is, the one with the longest subnet mask, is referred to as the longest prefix match. It is the forwarding table entry that matches the largest number of leading address bits of the destination address.
An IP address can be checked to see whether it is part of a CIDR block. The address matches the supernet (CIDR) prefix if the leading N bits of the address and of the prefix are identical. Given a 32 bit IPv4 address and an N bit CIDR prefix, the remaining 32 − N bits are unconstrained, so up to 2^(32 − N) IPv4 addresses can match a given N bit prefix. A longer CIDR prefix matches fewer addresses, while a shorter prefix matches more. This also means that a single address lookup can produce multiple CIDR prefix matches, each with a different prefix length.
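The N bit prefix test and the 2^(32 − N) count can be expressed directly with bit masks. This is a Python sketch. The function names `matches_prefix` and `addresses_covered` and the example addresses are hypothetical.

```python
import ipaddress

def matches_prefix(address: str, prefix: str, n: int) -> bool:
    """True if the leading n bits of address equal those of prefix."""
    mask = (0xFFFFFFFF << (32 - n)) & 0xFFFFFFFF   # n ones followed by 32 - n zeros
    a = int(ipaddress.IPv4Address(address))
    p = int(ipaddress.IPv4Address(prefix))
    return (a & mask) == (p & mask)


def addresses_covered(n: int) -> int:
    """Number of IPv4 addresses that share a given n bit prefix."""
    return 2 ** (32 - n)


print(matches_prefix("10.1.2.3", "10.1.0.0", 16))    # True
print(matches_prefix("10.2.0.1", "10.1.0.0", 16))    # False
print(addresses_covered(16), addresses_covered(24))  # 65536 256
```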
Let us consider the following two entries in an IPv4 forwarding table:
When a routing device looks up the destination address 192.168.20.10, both entries in the forwarding table match it, meaning both entries cover this destination address. However, the longest prefix match is the entry 192.168.20.15/28, which covers 192.168.20.0 through 192.168.20.15. Its /28 mask is longer than the /16 mask of the other entry, so the interface associated with 192.168.20.15/28 is the more specific route.
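A longest-prefix-match lookup can be sketched in Python with the standard `ipaddress` module. The /16 entry 192.168.0.0/16 and the interface names are assumptions, because the forwarding table entries are not reproduced here. A real router uses specialized data structures (such as tries) rather than the linear scan shown.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return (network, interface) of the most specific matching entry, or None."""
    dest = ipaddress.IPv4Address(destination)
    best = None
    for prefix, interface in table:
        # strict=False turns 192.168.20.15/28 into its network, 192.168.20.0/28
        net = ipaddress.IPv4Network(prefix, strict=False)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best


table = [("192.168.0.0/16", "if0"), ("192.168.20.15/28", "if1")]  # hypothetical
print(longest_prefix_match(table, "192.168.20.10"))  # /28 entry wins -> if1
print(longest_prefix_match(table, "192.168.20.18"))  # outside the /28 -> if0
print(longest_prefix_match(table, "10.0.0.1"))       # None: no route
```

Note that 192.168.20.15/28 spans only 192.168.20.0 to 192.168.20.15, so a destination such as 192.168.20.18 matches the /16 entry alone.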
The 8 bit TTL field in the IP header prevents packets from being forwarded from router to router indefinitely in a network that has routing loops. The TTL was originally intended to be a packet lifetime limit in seconds, but it ended up being implemented as the maximum number of hops a packet can traverse, that is, a “hop count.”
Every time a packet traverses a router, its TTL is decremented by 1. If the TTL reaches 0, the packet is dropped, and an ICMPv4 nondelivery message (an ICMP Time Exceeded message [RFC792]) is sent to the packet's sender. The traceroute utility relies on this mechanism. It sends probe packets with successively larger TTL values and uses the Time Exceeded messages returned by each router to identify the routers along the path.
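The TTL mechanism and the way a TTL-based traceroute uses it can be simulated in a few lines of Python. This is a toy model with no real network I/O. The router names, the destination name, and the functions `send_probe` and `traceroute` are hypothetical.

```python
def send_probe(routers, destination, ttl):
    """Forward one probe; return (responder, message) for the given initial TTL."""
    for router in routers:
        ttl -= 1                                  # each router decrements the TTL
        if ttl == 0:
            return router, "ICMP Time Exceeded"   # dropped, sender is told
    return destination, "reply"                   # probe reached the destination


def traceroute(routers, destination, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = send_probe(routers, destination, ttl)
        hops.append((ttl, responder, message))
        if responder == destination:
            break
    return hops


for hop in traceroute(["R1", "R2", "R3"], "host"):
    print(hop)
```

Each probe with TTL k expires at the k-th router, so the sequence of Time Exceeded senders reveals the path hop by hop.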
The IPv4 header checksum, as discussed earlier, is a simple mechanism that protects (only) the header of an IPv4 packet from data corruption [RFC791]. This 16 bit checksum is calculated only over the IP header bytes, and the field that carries it is itself part of the header. Each router verifies the checksum of a received packet and discards the packet if the calculated checksum does not match the received checksum. After modifying the header, the router recalculates the checksum before forwarding the packet.
The router must update the IP checksum if it modifies or changes any part of the IP header (such as decrementing the TTL and modifying the DSCP bits). The 16 bit IP header checksum field contains the 16 bit ones' complement of the ones' complement sum of all 16 bit words in the header [RFC1812].
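The ones' complement checksum can be computed as in the following Python sketch. The sample header bytes are illustrative values, not taken from the text, and the function name `ipv4_checksum` is hypothetical.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """16 bit ones' complement of the ones' complement sum of the header words."""
    if len(header) % 2:
        raise ValueError("header length must be even")
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20 byte header with its checksum field (bytes 10-11) set to zero
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_checksum(header)
print(hex(checksum))                      # 0xb861

# A receiver sums the header including the checksum; a valid header gives 0
received = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(ipv4_checksum(received))            # 0
```

The same function serves both the sender (checksum field zeroed) and a verifying router (checksum field included, result 0 if the header is intact).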
Since the TTL is decremented by one at each hop (router), the IP header checksum must be recalculated at each hop. References [RFC1141] and [RFC1624] describe how the IP checksum can be computed incrementally (after, for example, a TTL update).
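An incremental update in the style of RFC 1624 (its Eqn. 3, HC' = ~(~HC + ~m + m')) can be sketched in Python. Here m is the old 16 bit header word and m' the new one. The header bytes and the function names are hypothetical, and a full recomputation is included only to cross-check the result.

```python
import struct

def full_checksum(header: bytes) -> int:
    """Recompute the checksum from scratch (checksum field must be zero)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ones_add(a: int, b: int) -> int:
    """16 bit ones' complement addition (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def incremental_update(old_checksum: int, old_word: int, new_word: int) -> int:
    """HC' = ~(~HC + ~m + m') computed in ones' complement arithmetic."""
    s = ones_add(~old_checksum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word)
    return ~s & 0xFFFF


# TTL (0x40) and Protocol (0x11) share one 16 bit header word (bytes 8-9)
old_header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
old_checksum = full_checksum(old_header)                      # 0xb861
new_header = old_header[:8] + bytes([0x3F]) + old_header[9:]  # TTL 64 -> 63
new_checksum = incremental_update(old_checksum, 0x4011, 0x3F11)
print(hex(new_checksum), hex(full_checksum(new_header)))      # 0xb961 0xb961
```

Only the changed word enters the calculation, which is why routers can update the checksum cheaply after decrementing the TTL.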
The IPv4 header checksum is eliminated in IPv6. It was argued that the checksums provided in Layer 2 protocols such as Ethernet, ATM (Header Error Control (HEC)), and PPP, combined with the checksums supported in upper-layer protocols such as TCP, UDP, SCTP, ICMP, IGMP, and OSPF were sufficient to make including a separate IPv6 header checksum in IPv6 packets unnecessary.