This chapter is mostly focused on how the kernel handles the transmission of network packets. We have already glimpsed many crucial data structures of the networking code, so we will just give a brief description of the other side of the story; namely, how a network packet is received.
The main difference between transmitting and receiving is that the kernel cannot predict when a packet will arrive at a network card device. Therefore, the networking code that takes care of receiving the packets runs in interrupt handlers and deferrable functions.
Let’s sketch a typical chain of events occurring when a packet carrying the right hardware address (card identifier) arrives at the network device:
1. The network device saves the packet in a buffer in the device’s memory (the card usually keeps several packets at once in a circular buffer).
2. The network device raises an interrupt.
3. The interrupt handler allocates and initializes a new socket buffer for the packet.
4. The interrupt handler copies the packet from the device’s memory to the socket buffer.
5. The interrupt handler invokes a function (such as the eth_type_trans( ) function for Ethernet and IEEE 802.3) to determine the protocol of the packet encapsulated in the data link frame.
6. The interrupt handler invokes the netif_rx( ) function to notify the Linux networking code that a new packet has arrived and should be processed.
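The device-side part of this chain can be modeled in user space. The following C sketch is a hypothetical illustration, not driver code: struct rx_ring stands in for the card’s circular buffer and struct pkt_buf for a socket buffer; both names, the slot count, and the slot size are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RING_SLOTS 4     /* hypothetical number of slots in the card's ring */
#define SLOT_SIZE  1518  /* hypothetical slot size: one full Ethernet frame */

/* Hypothetical model of the card's on-board circular buffer. */
struct rx_ring {
    uint8_t  data[RING_SLOTS][SLOT_SIZE];
    size_t   len[RING_SLOTS];
    unsigned head;  /* total frames stored by the "card" */
    unsigned tail;  /* total frames drained by the "interrupt handler" */
};

/* Hypothetical stand-in for a socket buffer. */
struct pkt_buf {
    size_t  len;
    uint8_t data[];  /* frame bytes follow the length field */
};

/* Card side: store a frame; returns 0 if it is too big or the ring is full. */
int ring_store(struct rx_ring *r, const void *frame, size_t len)
{
    assert(r != NULL && frame != NULL);
    if (len > SLOT_SIZE || r->head - r->tail == RING_SLOTS)
        return 0;
    unsigned slot = r->head % RING_SLOTS;
    memcpy(r->data[slot], frame, len);
    r->len[slot] = len;
    r->head++;
    return 1;
}

/* Interrupt-handler side: allocate a buffer and copy the oldest pending
   frame into it. Returns NULL if nothing is pending or the allocation
   fails (in that case the frame stays in the ring). */
struct pkt_buf *rx_interrupt(struct rx_ring *r)
{
    assert(r != NULL);
    if (r->head == r->tail)
        return NULL;
    unsigned slot = r->tail % RING_SLOTS;
    struct pkt_buf *pb = malloc(sizeof *pb + r->len[slot]);
    if (pb == NULL)
        return NULL;
    pb->len = r->len[slot];
    memcpy(pb->data, r->data[slot], pb->len);
    r->tail++;
    return pb;
}
```

Because the ring has a fixed number of slots, a card that receives frames faster than the handler drains them simply finds the ring full and must drop frames.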
Of course, the interrupt handler is specific to the network card device. Many device drivers try to be nice to the other devices in the system and move lengthy tasks, such as allocating a socket buffer or copying a packet, to deferrable functions.
The netif_rx( ) function is the main entry point of the receiving code of the networking layer (above the network card device driver). The kernel uses a per-CPU queue for the packets that have been received from the network devices and are waiting to be processed by the various protocol stack layers. The function essentially appends the new packet to this queue and invokes cpu_raise_softirq( ) to schedule the activation of the NET_RX_SOFTIRQ softirq. (Remember that the same softirq can be executed concurrently on several CPUs, hence the reason for the per-CPU queue of received packets.)
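To make the per-CPU queue concrete, here is a minimal C sketch, assuming a fixed CPU count and a plain bitmask for pending softirqs. The names sketch_netif_rx( ), struct rx_queue, and NET_RX_BIT are hypothetical, and the locking done by the real function is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4          /* hypothetical CPU count */
#define NET_RX_BIT     (1u << 0)  /* hypothetical "NET_RX_SOFTIRQ pending" bit */

struct pkt {
    struct pkt *next;
    int id;
};

/* One queue per CPU: softirq handlers running on different CPUs at the
   same time never touch the same queue. */
struct rx_queue {
    struct pkt *head, *tail;
    unsigned pending;  /* per-CPU pending-softirq bitmask */
};

static struct rx_queue rx_queues[SKETCH_NR_CPUS];

/* Conceptual netif_rx(): append to the local CPU's queue and mark the
   receive softirq as pending. The real function also disables local
   interrupts around the queue update; that is omitted here. */
void sketch_netif_rx(int cpu, struct pkt *p)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && p != NULL);
    struct rx_queue *q = &rx_queues[cpu];
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->pending |= NET_RX_BIT;  /* analogous to raising NET_RX_SOFTIRQ */
}
```

Only the queue of the CPU that took the interrupt changes; the other CPUs’ queues and pending bits are left alone.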
The NET_RX_SOFTIRQ softirq is implemented by the net_rx_action( ) function, which essentially executes the following operations:[121]
1. Extracts the first packet from the queue. If the queue is empty, it terminates.
2. Determines the network layer protocol number encoded in the data link layer.
3. Invokes a suitable function of the network layer protocol.
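These operations of net_rx_action( ) amount to a dequeue-and-dispatch loop. The hypothetical C sketch below uses a small table that maps EtherType values (0x0800 for IPv4, 0x0806 for ARP) to handler functions; the table layout and every name in it are illustrative assumptions, not the kernel’s actual registration mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_ARP  0x0806

struct pkt {
    struct pkt *next;
    uint16_t protocol;  /* network layer protocol taken from the frame */
};

typedef void (*proto_handler)(struct pkt *);

static int ipv4_seen, arp_seen, dropped;

static void ipv4_rcv_sketch(struct pkt *p) { (void)p; ipv4_seen++; }
static void arp_rcv_sketch(struct pkt *p)  { (void)p; arp_seen++; }

/* Hypothetical registry mapping protocol numbers to handlers. */
static const struct {
    uint16_t type;
    proto_handler func;
} handlers[] = {
    { ETHERTYPE_IPV4, ipv4_rcv_sketch },
    { ETHERTYPE_ARP,  arp_rcv_sketch  },
};

/* Drain the queue: take the first packet, look up its protocol, and call
   the matching handler; stop when the queue is empty. */
void sketch_net_rx_action(struct pkt **queue)
{
    assert(queue != NULL);
    struct pkt *p;
    while ((p = *queue) != NULL) {
        *queue = p->next;
        proto_handler h = NULL;
        for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
            if (handlers[i].type == p->protocol)
                h = handlers[i].func;
        if (h != NULL)
            h(p);
        else
            dropped++;  /* no handler registered for this protocol */
    }
}
```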
The corresponding function for the IP protocol is named ip_rcv( ), which essentially executes the following actions:
1. Checks the length and the checksum of the packet and discards it if it is corrupted or truncated.
2. Invokes ip_route_input( ), which initializes the destination cache field (a pointer to a dst_entry object) of the socket buffer descriptor. To determine the route followed by the packet, the function looks up the route first in the route cache and then, if the route cache doesn’t include a relevant entry, in the FIB. In this way, the kernel determines whether the packet must be forwarded to another host or simply passed to a protocol of the transport layer.
3. Checks whether any packet sniffing or other input policy is enforced. If so, it handles the packet accordingly; we don’t discuss these topics further.
4. Invokes the input method of the dst_entry object of the packet.
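The first check performed by ip_rcv( ) relies on the standard Internet checksum (RFC 1071): a header is intact when the one’s-complement sum of its 16-bit words, checksum field included, equals 0xFFFF. The C sketch below implements that test together with basic length checks; the function names are hypothetical, and the kernel uses its own optimized routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of big-endian 16-bit
   words, carries folded back in, result complemented. The 32-bit
   accumulator is ample for IP headers (at most 60 bytes). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;  /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sketch of the first ip_rcv() step: returns 1 if an IPv4 packet of len
   bytes has a sane header, 0 if it should be discarded. */
int ipv4_header_ok(const uint8_t *pkt, size_t len)
{
    if (len < 20)
        return 0;                               /* shorter than a minimal header */
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;   /* header length in bytes */
    size_t tot_len = (size_t)pkt[2] << 8 | pkt[3];
    if ((pkt[0] >> 4) != 4 || ihl < 20 || ihl > tot_len || tot_len > len)
        return 0;                               /* wrong version or truncated */
    return inet_checksum(pkt, ihl) == 0;        /* intact header sums to 0xFFFF */
}
```

For example, the header 4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7 passes the check, while flipping any single bit of it makes inet_checksum( ) return a nonzero value.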
If the packet has to be forwarded to another host, the input method is implemented by the ip_forward( ) function; otherwise, it is implemented by the ip_local_deliver( ) function. Let’s follow the latter path.
The ip_local_deliver( ) function takes care of reassembling the original IP datagram, if the datagram has been fragmented along its way. Then the function reads the IP header and determines the type of transport protocol to which the packet belongs. If the transport protocol is TCP, the function ends up invoking tcp_v4_rcv( ); if the transport protocol is UDP, the function ends up invoking udp_rcv( ).
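The local delivery step depends on two fields of the IPv4 header: the flags and fragment-offset word tells whether the datagram is a fragment that needs reassembly, and the Protocol byte (6 for TCP, 17 for UDP) selects the transport handler. A hypothetical C sketch of both tests, with illustrative function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum transport { TRANSPORT_TCP, TRANSPORT_UDP, TRANSPORT_OTHER };

/* A datagram is a fragment if the More Fragments flag (0x2000) is set or
   the 13-bit fragment offset is nonzero; either way it needs reassembly.
   The Don't Fragment flag (0x4000) is ignored by the mask. */
int needs_reassembly(const uint8_t *iph)
{
    assert(iph != NULL);
    uint16_t frag = (uint16_t)(iph[6] << 8 | iph[7]);
    return (frag & 0x3FFF) != 0;
}

/* The Protocol byte (offset 9) selects the transport layer handler. */
enum transport transport_of(const uint8_t *iph)
{
    switch (iph[9]) {
    case 6:  return TRANSPORT_TCP;  /* would lead to tcp_v4_rcv() */
    case 17: return TRANSPORT_UDP;  /* would lead to udp_rcv() */
    default: return TRANSPORT_OTHER;
    }
}

/* The transport header starts right after the IHL * 4 header bytes. */
const uint8_t *transport_header(const uint8_t *iph)
{
    return iph + (iph[0] & 0x0F) * 4;
}
```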
Let’s continue following the UDP path. The udp_rcv( ) function essentially executes the following actions:
1. Invokes the udp_v4_lookup( ) function to find the INET socket to which the UDP datagram has been sent (by looking at the port number inside the UDP header). The kernel keeps INET sockets in a hash table so that the lookup operation is reasonably fast. If the UDP datagram is not associated with a socket, the function discards the packet and terminates.
2. Invokes udp_queue_rcv_skb( ), which in turn invokes sock_queue_rcv_skb( ), to append the packet to a queue of the INET socket (the receive_queue field of the sock object) and to invoke the data_ready method of the sock object.
3. Releases the socket buffer and the socket buffer descriptor.
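A hash table keyed on the destination port keeps the socket lookup in udp_rcv( ) cheap, because only one short chain has to be scanned. This C sketch assumes a 128-bucket table with chaining and a trivial hash (the low bits of the port); struct sketch_sock, udp_lookup_sketch( ), and udp_deliver_sketch( ) are hypothetical, and the queued counter merely stands in for appending to receive_queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HASH_SIZE 128  /* hypothetical bucket count (a power of two) */

/* Hypothetical stand-in for an INET socket bound to a UDP port. */
struct sketch_sock {
    uint16_t local_port;
    struct sketch_sock *next;  /* next socket in the same bucket */
    int queued;                /* stands in for the receive_queue length */
};

static struct sketch_sock *udp_hash[UDP_HASH_SIZE];

static unsigned udp_hashfn(uint16_t port)
{
    return port & (UDP_HASH_SIZE - 1);  /* low bits of the port */
}

void udp_hash_insert(struct sketch_sock *sk)
{
    assert(sk != NULL);
    unsigned b = udp_hashfn(sk->local_port);
    sk->next = udp_hash[b];
    udp_hash[b] = sk;
}

/* Scan only the chain of the bucket the port hashes to. */
struct sketch_sock *udp_lookup_sketch(uint16_t dport)
{
    for (struct sketch_sock *sk = udp_hash[udp_hashfn(dport)]; sk != NULL; sk = sk->next)
        if (sk->local_port == dport)
            return sk;
    return NULL;
}

/* Returns 1 if the datagram was queued, 0 if no socket owns the port
   (the datagram is then discarded). */
int udp_deliver_sketch(uint16_t dport)
{
    struct sketch_sock *sk = udp_lookup_sketch(dport);
    if (sk == NULL)
        return 0;
    sk->queued++;
    return 1;
}
```

Ports 53 and 181 land in the same bucket (181 & 127 == 53), so the chain comparison on local_port is what tells them apart.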
INET sockets implement the data_ready method by means of the sock_def_readable( ) function, which essentially wakes up any process sleeping in the socket’s wait queue (listed in the sleep field of the sock object).
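The data_ready wake-up, together with the blocking read that pairs with it, follows a classic producer/consumer pattern. As a user-space analogy only, the sketch below uses a POSIX mutex and condition variable in place of the kernel’s locks and wait queue; all type and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct dgram {
    struct dgram *next;
    int payload;
};

/* User-space analogy of an INET socket's receive side. */
struct sketch_rx_sock {
    pthread_mutex_t lock;
    pthread_cond_t  readable;     /* plays the role of the wait queue */
    struct dgram   *head, *tail;  /* plays the role of receive_queue */
};

/* Receive path: append the datagram, then wake any sleeping readers
   (the job that data_ready performs for INET sockets). */
void sketch_queue_rcv(struct sketch_rx_sock *s, struct dgram *d)
{
    assert(s != NULL && d != NULL);
    pthread_mutex_lock(&s->lock);
    d->next = NULL;
    if (s->tail != NULL)
        s->tail->next = d;
    else
        s->head = d;
    s->tail = d;
    pthread_cond_broadcast(&s->readable);
    pthread_mutex_unlock(&s->lock);
}

/* Reader: sleep while the queue is empty, then dequeue the first datagram. */
struct dgram *sketch_recv_datagram(struct sketch_rx_sock *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->head == NULL)  /* loop guards against spurious wake-ups */
        pthread_cond_wait(&s->readable, &s->lock);
    struct dgram *d = s->head;
    s->head = d->next;
    if (s->head == NULL)
        s->tail = NULL;
    pthread_mutex_unlock(&s->lock);
    return d;
}
```

A thread calling sketch_recv_datagram( ) on an empty queue sleeps until another thread calls sketch_queue_rcv( ), mirroring a process that sleeps on the socket until a datagram arrives.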
One final step remains: describing what happens when a process reads from the BSD socket that owns our INET socket. The read( ) system call triggers the read method of the file object associated with the socket’s special file. This method is implemented by the sock_read( ) function, which in turn invokes the sock_recvmsg( ) function. The latter function is similar to the sock_sendmsg( ) function described earlier. Essentially, it invokes the recvmsg method of the BSD socket. In turn, this method (inet_recvmsg( )) invokes the recvmsg method of the INET socket; that is, either the tcp_recvmsg( ) or the udp_recvmsg( ) function.
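The chain from sock_recvmsg( ) down to udp_recvmsg( ) is method dispatch through tables of function pointers. The hypothetical C sketch below shows the two indirections (BSD socket operations, then INET protocol operations); every structure and function carries a _sketch suffix because it only imitates the kernel’s objects, and the file-object layer (sock_read( )) is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_sketch;  /* INET socket */

struct proto_sketch {
    int (*recvmsg)(struct sk_sketch *sk, char *buf, size_t len);
};

struct sk_sketch {
    const struct proto_sketch *prot;
    const char *pending;  /* stands in for the queued datagram */
};

struct socket_sketch;  /* BSD socket */

struct proto_ops_sketch {
    int (*recvmsg)(struct socket_sketch *sock, char *buf, size_t len);
};

struct socket_sketch {
    const struct proto_ops_sketch *ops;
    struct sk_sketch *sk;
};

/* Bottom of the chain (the role of udp_recvmsg): copy at most len bytes. */
static int udp_recvmsg_sketch(struct sk_sketch *sk, char *buf, size_t len)
{
    size_t n = strlen(sk->pending);
    if (n > len)
        n = len;
    memcpy(buf, sk->pending, n);
    return (int)n;
}

/* Role of inet_recvmsg: forward to the INET socket's protocol method. */
static int inet_recvmsg_sketch(struct socket_sketch *sock, char *buf, size_t len)
{
    return sock->sk->prot->recvmsg(sock->sk, buf, len);
}

/* Role of sock_recvmsg: invoke the BSD socket's recvmsg method. */
static int sock_recvmsg_sketch(struct socket_sketch *sock, char *buf, size_t len)
{
    assert(sock != NULL && buf != NULL);
    return sock->ops->recvmsg(sock, buf, len);
}

static const struct proto_sketch udp_prot_sketch = { udp_recvmsg_sketch };
static const struct proto_ops_sketch inet_dgram_ops_sketch = { inet_recvmsg_sketch };
```

Swapping udp_prot_sketch for a TCP table would reroute the same sock_recvmsg_sketch( ) call without touching the upper layers, which is the point of the two-level design.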
Finally, the udp_recvmsg( ) function executes the following actions:
1. Invokes the skb_recv_datagram( ) function to extract the first packet from the receive_queue queue of the INET socket and return the address of the corresponding socket buffer descriptor. If the queue is empty, the function blocks the current process (unless the read operation is nonblocking).
2. If the UDP datagram carries a checksum, checks that the message has not been corrupted during the transmission (actually, this step is performed at the same time as Step 3).
3. Copies the payload of the UDP datagram into the User Mode buffer.
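Checksum verification and the copy to the User Mode buffer can share a single pass over the data: each 16-bit word is added to the running one’s-complement sum as it is copied. The C sketch below assumes the caller seeds sum with the partial sum of the UDP pseudo-header and header; the names csum_copy_sketch( ) and csum_fold_sketch( ) are hypothetical, and the kernel relies on its own architecture-optimized checksum-and-copy routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst while accumulating the 32-bit
   one's-complement partial sum of the copied data. A UDP payload
   (at most 65535 bytes) cannot overflow the accumulator. */
uint32_t csum_copy_sketch(uint8_t *dst, const uint8_t *src, size_t len, uint32_t sum)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    size_t i;
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += (uint32_t)src[i] << 8 | src[i + 1];
    }
    if (i < len) {  /* odd trailing byte, zero-padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold carries and complement; a datagram whose received checksum is
   included in the sum is intact when this returns 0. */
uint16_t csum_fold_sketch(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Touching each byte once, instead of checksumming and then copying, halves the memory traffic for the payload, which is why the two steps are merged.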
[121] We omit discussing several special cases, such as when the packet has to be quickly forwarded to another network card device or when the host is acting as a bridge that links two local area networks as if they were a single one.