Chapter 7. Express Data Path

Express Data Path (XDP) is a safe, programmable, high-performance, kernel-integrated packet processor in the Linux network data path that executes BPF programs when the NIC driver receives a packet. This allows XDP programs to make decisions regarding the received packet (drop, modify, or just allow it) at the earliest possible point in time.

The execution point is not the only aspect that makes XDP programs fast; other design decisions play a role in that:

  • There are no memory allocations while doing packet processing with XDP.

  • XDP programs work only with linear, unfragmented packets and have the start and end pointers of the packet.

  • There’s no access to full packet metadata, which is why the input context this kind of program receives will be of type xdp_buff instead of the sk_buff struct you encountered in Chapter 6.

  • Because they are eBPF programs, XDP programs have a bounded execution time, and the consequence of this is that their usage has a fixed cost in the networking pipeline.
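To make the second point concrete, here is a tiny user-space sketch, in plain Python rather than kernel code, of what direct packet access over a linear buffer means: the program gets only a start and an end boundary for the packet, and every access must be proven in-bounds before it is performed, which is exactly the discipline the in-kernel verifier enforces (the function read_u16 is our own illustrative helper, not part of any API):

```python
# Userspace sketch of XDP-style direct packet access: the program sees
# one linear, unfragmented buffer delimited by a start and an end, and
# every read must be bounds-checked before it happens.

def read_u16(packet: bytes, data: int, data_end: int, offset: int):
    """Read a 2-byte field at `offset`, or return None if the access
    would run past data_end -- the check the XDP verifier insists on."""
    start = data + offset
    if start + 2 > data_end:
        return None  # out of bounds: a real program would drop here
    return int.from_bytes(packet[start:start + 2], "big")

eth_frame = bytes(14)  # an Ethernet header is 14 bytes
data, data_end = 0, len(eth_frame)

assert read_u16(eth_frame, data, data_end, 12) == 0     # EtherType: in bounds
assert read_u16(eth_frame, data, data_end, 13) is None  # would read past the end
```

The same shape reappears later in this chapter as the `data + ipsize > data_end` check in the C programs.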

When talking about XDP, it is important to remember that it is not a kernel bypass mechanism; it is designed to be integrated with other kernel components and the internal Linux security model.

Note

The xdp_buff struct is used to present a packet context to a BPF program that uses the direct packet access mechanism provided by the XDP framework. Think of it as a “lightweight” version of the sk_buff.

The difference between the two is that sk_buff also holds and allows you to mingle with the packets’ metadata (proto, mark, type), which is only available at a higher level in the networking pipeline. The fact that xdp_buff is created early and doesn’t depend on other kernel layers is one reason it’s faster to obtain and process packets using XDP. The other reason is that xdp_buff doesn’t hold references to routes, Traffic Control hooks, or other kind of packet metadata like it would with program types that use an sk_buff.

In this chapter we explore the characteristics of XDP programs, the different kinds of XDP programs out there, and how they can be compiled and loaded. After that, to give more context, we discuss real-world use cases for it.

XDP Programs Overview

Essentially, XDP programs make determinations about the received packet and then can edit its content or simply return a result code. The result code determines what happens to the packet in the form of an action: you can drop the packet, transmit it back out the same interface, or pass it up to the rest of the networking stack. Additionally, to cooperate with the network stack, XDP programs can push and pull a packet’s headers; for example, if the current kernel does not support an encapsulation format or a protocol, an XDP program can de-encapsulate the packet or translate the protocol and send the result to the kernel for processing.

But wait, what’s the correlation between XDP and eBPF?

It turns out that XDP programs are controlled through the bpf syscall and loaded using the program type BPF_PROG_TYPE_XDP; the driver hook, in turn, executes BPF bytecode.

An important concept to understand when writing XDP programs is the set of contexts where they can run, called operation modes.

Operation Modes

XDP has three operation modes to accommodate easy testing of functions, vendors’ custom hardware, and commonly built kernels without custom hardware. Let’s go over each of them.

Native XDP

This is the default mode. In this mode, the XDP BPF program is run directly out of the networking driver’s early receive path. When using this mode, it’s important to check whether the driver supports it. You can check that by executing the following command against the source tree of a given kernel version:

# Clone the linux-stable repository
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable

# Checkout the tag for your current kernel version
cd linux-stable
git checkout tags/v4.18

# Check the available drivers
git grep -l XDP_SETUP_PROG drivers/

That produces output like this:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
drivers/net/ethernet/cavium/thunder/nicvf_main.c
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
drivers/net/ethernet/mellanox/mlx4/en_netdev.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/ethernet/netronome/nfp/nfp_net_common.c
drivers/net/ethernet/qlogic/qede/qede_filter.c
drivers/net/netdevsim/bpf.c
drivers/net/tun.c
drivers/net/virtio_net.c

From what we can see, kernel 4.18 supports the following:

  • Broadcom NetXtreme-C/E network driver bnxt

  • Cavium thunderx driver

  • Intel i40e driver

  • Intel ixgbe and ixgbevf drivers

  • Mellanox mlx4 and mlx5 drivers

  • Netronome Network Flow Processor

  • QLogic qede NIC Driver

  • TUN/TAP

  • Virtio

With a clear idea of the native operation mode, we can proceed to see how XDP program duties can be handled directly by network cards using offloaded XDP.

Offloaded XDP

In this mode the XDP BPF program is offloaded directly into the NIC instead of being executed on the host CPU. By pushing execution off the host CPU, this mode can yield even higher performance than native XDP.

We can reuse the kernel source tree we just cloned to check what NIC drivers in 4.18 support hardware offload by looking for XDP_SETUP_PROG_HW:

git grep -l XDP_SETUP_PROG_HW drivers/

That should output something like this:

include/linux/netdevice.h
866:    XDP_SETUP_PROG_HW,

net/core/dev.c
8001:           xdp.command = XDP_SETUP_PROG_HW;

drivers/net/netdevsim/bpf.c
200:    if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
205:    if (bpf->command == XDP_SETUP_PROG_HW) {
560:    case XDP_SETUP_PROG_HW:

drivers/net/ethernet/netronome/nfp/nfp_net_common.c
3476:   case XDP_SETUP_PROG_HW:

Among the drivers, that shows only the Netronome Network Flow Processor (nfp), meaning that it can operate in both modes, supporting hardware offload in addition to native XDP.

Now a good question for yourself might be, what do I do when I don’t have network cards and drivers to try my XDP programs? The answer is easy: generic XDP!

Generic XDP

This is provided as a test mode for developers who want to write and run XDP programs without having the capabilities of native or offloaded XDP. Generic XDP has been supported since kernel version 4.12. You can use this mode, for example, on veth devices—we use this mode in the subsequent examples to show the capabilities of XDP without requiring you to buy specific hardware to follow along.

But who is the actor responsible for the coordination between all of the components and the operation modes? Continue to the next section to learn about the packet processor.

The Packet Processor

The actor that makes it possible to execute BPF programs on XDP packets, and that coordinates their interaction with the network stack, is the XDP packet processor. The packet processor is the in-kernel component that processes packets on the receive (RX) queue directly as they are presented by the NIC. It ensures that packets are readable and writable and allows you to attach post-processing verdicts in the form of packet processor actions. Atomic program updates and new program loads into the packet processor can be done at runtime, without any service interruption to networking and the associated traffic. While operating, XDP can be used in “busy polling” mode, which lets you reserve the CPUs that will deal with each RX queue; this avoids context switches and allows immediate reactivity to packets upon arrival, regardless of IRQ affinities. The alternative is “interrupt driven” mode, which does not reserve a CPU; instead, an interrupt acts as an event medium to inform a CPU that it has to deal with a new packet while still doing its normal processing.

In Figure 7-1 you can see the interaction points between RX/TX, applications, the packet processor, and the BPF programs applied to its packets.

Notice that there are a few squares in Figure 7-1 labeled with a string prefixed by XDP_. Those are the XDP result codes, which we cover next.

Interaction between the XDP packet processor and the network stack; dashed lines represent the packet flow, and the bold solid line represents the loading of the BPF program into the XDP packet processor.
Figure 7-1. The packet processor

XDP result codes (packet processor actions)

After the packet processor makes a decision about a packet, it expresses that decision using one of five return codes, which instruct the network driver on how to handle the packet. Let’s dive into the actions that the packet processor performs:

Drop (XDP_DROP)

Drops the packet. This happens at the earliest RX stage in the driver; dropping a packet simply implies recycling it back into the RX ring queue it just “arrived” on. Dropping the packet as early as possible is key for the denial-of-service (DoS) mitigation use cases. This way, dropped packets use as little CPU processing time and power as possible.

Forward (XDP_TX)

Forwards the packet. This can happen before or after the packet has been modified. Forwarding a packet implies bouncing the received packet page back out the same NIC it arrived on.

Redirect (XDP_REDIRECT)

Similar to XDP_TX in that it transmits the XDP packet, but it does so through another NIC or into a BPF cpumap. In the case of a BPF cpumap, the CPUs serving XDP on the NIC’s receive queues can continue to do so and push the packet for processing by the upper kernel stack to a remote CPU. This is similar to XDP_PASS, but the XDP BPF program can keep serving the incoming high load instead of spending cycles pushing the current packet into the upper layers.

Pass (XDP_PASS)

Passes the packet to the normal network stack for processing. This is equivalent to the default packet handling behavior without XDP. This can be done in one of two ways:

  • Normal receive allocates metadata (sk_buff), receives the packet onto the stack, and steers the packet to another CPU for processing. It allows for raw interfaces to user-space. This can happen before or after the packet has been modified.

  • Generic receive offload (GRO) can perform a receive of large packets and combines packets of the same connection. GRO eventually passes the packet through the “normal receive” flow after processing.

Code error (XDP_ABORTED)

Denotes an eBPF program error and results in the packet being dropped. It is not something a functional program should ever use as a return code; it would be returned, for example, if the program divided by zero. XDP_ABORTED’s value is always zero. It triggers the trace_xdp_exception tracepoint, which can additionally be monitored to detect misbehavior.

These action codes are expressed in the linux/bpf.h header file as follows:

enum xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
};
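Tracing tools often report these actions as raw numbers, so a small translation table is handy when debugging. This sketch simply mirrors the enum above; the helper itself is our own, for illustration:

```python
# Mirrors enum xdp_action from linux/bpf.h: XDP_ABORTED is 0 and the
# rest follow in declaration order.
XDP_ACTIONS = ["XDP_ABORTED", "XDP_DROP", "XDP_PASS", "XDP_TX", "XDP_REDIRECT"]

def action_name(code: int) -> str:
    """Translate a numeric XDP return code into its symbolic name."""
    if 0 <= code < len(XDP_ACTIONS):
        return XDP_ACTIONS[code]
    return "UNKNOWN({})".format(code)

assert action_name(0) == "XDP_ABORTED"  # always zero, as noted above
assert action_name(1) == "XDP_DROP"
assert action_name(9) == "UNKNOWN(9)"
```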

Because XDP actions determine different behaviors and are an internal mechanism of the packet processor, you can look at a simplified version of Figure 7-1 focused on only the return actions (see Figure 7-2).

Interaction between XDP actions triggered by a BPF program and the network stack
Figure 7-2. XDP action codes

An interesting thing about XDP programs is that you usually don’t need to write a loader to load them. Most Linux machines already ship a good loader as part of the ip command. The next section describes how to use it.

XDP and iproute2 as a Loader

The ip command, available in iproute2, can act as a frontend to load XDP programs compiled into an ELF file, and it has full support for maps, map relocation, tail calls, and object pinning.

Because loading an XDP program can be expressed as a configuration of an existing network interface, the loader is implemented as part of the ip link command, which is the one that does network device configuration.

The syntax to load the XDP program is simple:

# ip link set dev eth0 xdp obj program.o sec mysection

Let’s analyze this command parameter by parameter:

ip

This invokes the ip command.

link

Configures network interfaces.

set

Changes device attributes.

dev eth0

Specifies the network device on which we want to operate and load the XDP program.

xdp obj program.o

Loads an XDP program from the ELF file (object) named program.o. The xdp part of this command tells the system to use the native driver when it is available and to fall back to generic XDP otherwise. You can force one mode or the other by using a more specific selector:

  • xdpgeneric to use generic XDP

  • xdpdrv to use native XDP

  • xdpoffload to use offloaded XDP

sec mysection

Specifies the section name mysection containing the BPF program to use from the ELF file; if this is not specified, the section named prog will be used. If no section is specified in the program, you have to specify sec .text in the ip invocation.

Let’s see a practical example.

The scenario is that we have a system with a web server on port 8000 for which we want to block any access to its pages on the public-facing NIC of the server by disallowing all the TCP connections to it.

The first thing that we need is the web server in question; if you don’t already have one, you can start one with python3:

$ python3 -m http.server

After your web server is started, its open port will show up in the list of open sockets reported by ss. As you can see, the web server is bound to any interface, *:8000, so as of now, any external caller with access to our public interfaces can see its content!

$  ss -tulpn
Netid  State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
tcp    LISTEN     0      5      *:8000                *:*
Note

Socket statistics, ss in the terminal, is a command-line utility used to investigate network sockets in Linux. It is effectively a modern version of netstat, and its user experience is similar, meaning that you can pass the same arguments and get comparable results.

At this point, we can inspect the network interfaces on the machine that’s running our HTTP server:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group defau
lt qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP g
roup default qlen 1000
    link/ether 02:1e:30:9c:a3:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 84964sec preferred_lft 84964sec
    inet6 fe80::1e:30ff:fe9c:a3c0/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP g
roup default qlen 1000
    link/ether 08:00:27:0d:15:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe0d:157d/64 scope link
       valid_lft forever preferred_lft forever

Notice that this machine has three interfaces, and the network topology is simple:

lo

This is just the loopback interface for internal communication.

enp0s3

This is the management network tier; administrators will use this interface to connect to the web server to do their operations.

enp0s8

This is the interface open to the public; our web server needs to be hidden from this interface.

Now, before loading any XDP program, we can check open ports on the server from another server that can access its network interface, in our case, with IPv4 192.168.33.11.

You can check open ports on a remote host by using nmap as follows:

# nmap -sS 192.168.33.11
Starting Nmap 7.70 ( https://nmap.org ) at 2019-04-06 23:57 CEST
Nmap scan report for 192.168.33.11
Host is up (0.0034s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
8000/tcp open  http-alt

Good! Port 8000 is right there; at this point we need to block it!

Note

Network Mapper (nmap) is a network scanner that can do host, service, network, and port discovery along with operating system detection. Its main use cases are security auditing and network scanning. When scanning a host for open ports, nmap will try every port in the specified (or full) range.

Our program will consist of a single source file named program.c, so let’s see what we need to write.

It needs to use the IPv4 iphdr and the Ethernet frame ethhdr header structs, as well as protocol constants and other structs. Let’s include the needed headers, as shown here:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>

After the headers are included, we can declare the SEC macro we already met in the previous chapters, used to declare ELF attributes.

#define SEC(NAME) __attribute__((section(NAME), used))

Now we can declare the main entry point for our program, myprogram, and its ELF section name, mysection. Our program takes as input context an xdp_md struct pointer, the BPF equivalent of the in-driver xdp_buff. By using that as the context, we then define the variables we will use next such as the data pointers, the Ethernet, and IP layer structs:

SEC("mysection")
int myprogram(struct xdp_md *ctx) {
  int ipsize = 0;
  void *data = (void *)(long)ctx->data;
  void *data_end = (void *)(long)ctx->data_end;
  struct ethhdr *eth = data;
  struct iphdr *ip;

Because data contains the Ethernet frame, we can now extract the IPv4 header from it. We also check that the offset where we expect the IPv4 header doesn’t go past the end of the packet, so that the static verifier stays happy. When the bounds check fails, we just drop the packet:

  ipsize = sizeof(*eth);
  ip = data + ipsize;
  ipsize += sizeof(struct iphdr);
  if (data + ipsize > data_end) {
    return XDP_DROP;
  }

Now, after all the verifications and setup, we can implement the real logic for the program, which basically drops every TCP packet while allowing anything else:

  if (ip->protocol == IPPROTO_TCP) {
    return XDP_DROP;
  }

  return XDP_PASS;
}

Now that our program is done, we can save it as program.c.

The next step is to compile the ELF file program.o out of our program using Clang. We can do this compilation step outside the target machine because BPF ELF binaries are not platform dependent:

$ clang -O2 -target bpf -c program.c -o program.o

Now back on the machine hosting our web server, we can finally load program.o against the public network interface enp0s8 using the ip utility with the set command, as described earlier:

# ip link set dev enp0s8 xdp obj program.o sec mysection

As you might notice, we select the section mysection as the entry point for the program.

At this stage, if that command returned zero as the exit code with no errors, we can check the network interface to see whether the program has been loaded correctly:

# ip a show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:32
    qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:0d:15:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe0d:157d/64 scope link
       valid_lft forever preferred_lft forever

As you can see, our output for ip a now has a new detail: after the MTU it shows xdpgeneric/id:32, which gives two interesting bits of information:

  • The driver that had been used, xdpgeneric

  • The ID of the XDP program, 32

The last step is to verify that the loaded program is in fact doing what it is supposed to do. We can verify that by executing nmap again on an external machine to observe that port 8000 is no longer reachable:

# nmap -sS 192.168.33.11
Starting Nmap 7.70 ( https://nmap.org ) at 2019-04-07 01:07 CEST
Nmap scan report for 192.168.33.11
Host is up (0.00039s latency).
Not shown: 998 closed ports
PORT    STATE SERVICE
22/tcp  open  ssh

Another test to verify that it all works is to try to access the server through a browser or to make any HTTP request. Any such test should fail when targeting 192.168.33.11 as the destination. Good job, and congratulations on loading your first XDP program!

If you followed all of those steps on a machine that you need to restore to its original state, you can always detach the program and turn off XDP for the device:

# ip link set dev enp0s8 xdp off

Interesting! Loading XDP programs seems easy, doesn’t it?

At least when using iproute2 as the loader, you can skip the part of having to write a loader yourself. In this example, our focus was on iproute2, which already implements a loader for XDP programs. However, the programs are in fact BPF programs, so even if iproute2 can be handy sometimes, you should always remember that you can load your programs using BCC, as shown in the next section, or you can use the bpf syscall directly. Having a custom loader has the advantage of allowing you to manage the lifecycle of the program and its interactions with user-space.

XDP and BCC

Like with any other BPF program, XDP programs can be compiled, loaded, and run using BCC. The following example shows an XDP program that is similar to the one we used for iproute2 but that has a custom user-space loader made with BCC. The loader in this case is needed because we also want to count the number of packets we encounter while dropping TCP packets.

Like before, we create a kernel-space program named program.c first.

In the iproute2 example, our program needed to import the required headers for struct and function definitions related to BPF and protocols. Here we do the same, but we also declare a map of type BPF_MAP_TYPE_PERCPU_ARRAY using the BPF_TABLE macro. The map will contain a packet counter for each IP protocol number, which is the reason for the size 256 (the protocol field is a single byte, so there are only 256 possible values). We use the BPF_MAP_TYPE_PERCPU_ARRAY type because it guarantees atomicity of the counters at the CPU level without locking:

#define KBUILD_MODNAME "program"
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/ip.h>

BPF_TABLE("percpu_array", uint32_t, long, packetcnt, 256);

After that, we declare our main function, myprogram, which takes the xdp_md struct as a parameter. The first thing it needs to contain is the variable declarations for the Ethernet and IPv4 frames:

int myprogram(struct xdp_md *ctx) {
  int ipsize = 0;
  void *data = (void *)(long)ctx->data;
  void *data_end = (void *)(long)ctx->data_end;
  struct ethhdr *eth = data;
  struct iphdr *ip;
  long *cnt;
  __u32 idx;

  ipsize = sizeof(*eth);
  ip = data + ipsize;
  ipsize += sizeof(struct iphdr);

After all the variable declarations are done and we can access both the data pointer, which contains the Ethernet frame, and the ip pointer, which points to the IPv4 header, we can check whether the access would be out of bounds. If it is, we drop the packet. If the memory space is OK, we extract the protocol number into idx and look up the packetcnt array to get the current counter value for that protocol. Then we increment the counter by one. With the counting handled, we can proceed to check whether the protocol is TCP. If it is, we drop the packet without question; otherwise, we allow it:

  if (data + ipsize > data_end) {
    return XDP_DROP;
  }

  idx = ip->protocol;
  cnt = packetcnt.lookup(&idx);
  if (cnt) {
    *cnt += 1;
  }

  if (ip->protocol == IPPROTO_TCP) {
    return XDP_DROP;
  }

  return XDP_PASS;
}

Now let’s write the loader: loader.py.

It is made of two parts: the actual loading logic and the loop that prints the packet counts.

For the loading logic, we open our program by reading the file program.c. With load_func, we instruct the bpf syscall to use the myprogram function as “main” using the program type BPF.XDP. That stands for BPF_PROG_TYPE_XDP.

After the loading, we gain access to the BPF map named packetcnt using get_table.

Warning

Make sure to change the device variable from enp0s8 to the interface you want to work on.

#!/usr/bin/python

from bcc import BPF
import time
import sys

device = "enp0s8"
b = BPF(src_file="program.c")
fn = b.load_func("myprogram", BPF.XDP)
b.attach_xdp(device, fn, 0)
packetcnt = b.get_table("packetcnt")

The remaining part we need to write is the actual loop to print out the packet counts. Without it, our program would already be able to drop packets, but we want to see what’s going on. We have two loops. The outer loop gets keyboard events and terminates when there’s a signal to interrupt the program. When the outer loop breaks, the remove_xdp function is called, and the interface is freed from the XDP program.

Within the outer loop, the inner loop has the duty of getting the values back from the packetcnt map and printing them in the format protocol: counter pkt/s:

prev = [0] * 256
print("Printing packet counts per IP protocol-number, hit CTRL+C to stop")
while 1:
    try:
        for k in packetcnt.keys():
            val = packetcnt.sum(k).value
            i = k.value
            if val:
                delta = val - prev[i]
                prev[i] = val
                print("{}: {} pkt/s".format(i, delta))
        time.sleep(1)
    except KeyboardInterrupt:
        print("Removing filter from device")
        break

b.remove_xdp(device, 0)

Good! Now we can test that program by simply executing the loader with root privileges:

# python loader.py

That will output a line every second with the packet counters:

Printing packet counts per IP protocol-number, hit CTRL+C to stop
6: 10 pkt/s
17: 3 pkt/s
^CRemoving filter from device

We encountered only two types of packets: 6 stands for TCP, and 17 stands for UDP.
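If you want the loader’s output to be friendlier, you could translate the well-known protocol numbers into names before printing. The mapping below covers only a few entries from the IANA protocol-numbers registry; it is our own convenience addition, not part of the BCC API:

```python
# A few well-known IANA IP protocol numbers; the full registry has many more.
PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

def proto_label(num: int) -> str:
    """Return a friendly label such as 'TCP (6)' for the loader output."""
    name = PROTO_NAMES.get(num)
    return "{} ({})".format(name, num) if name else str(num)

# In loader.py's inner loop you could then print:
#   print("{}: {} pkt/s".format(proto_label(i), delta))
assert proto_label(6) == "TCP (6)"
assert proto_label(17) == "UDP (17)"
assert proto_label(42) == "42"
```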

At this point your brain will probably start thinking about ideas and projects for using XDP, and that’s extremely good! But as always, in software engineering if you want to make a good program, it’s important to write tests first—or at least write tests! The next section covers how you can unit-test XDP programs.

Testing XDP Programs

When working on XDP programs, the most difficult part is that in order to test the actual packet flow, you need to reproduce an environment in which all of the components are aligned to provide the correct packets. Although it’s true that with virtualization technologies nowadays, creating a working environment can be an easy task, it’s also true that a complicated setup can limit the reproducibility and programmability of the test environment. In addition to that, when analyzing the performance aspects of high-frequency XDP programs in a virtualized environment, the cost of virtualization makes the test ineffective because it’s much more substantial than the actual packet processing.

Fortunately, kernel developers have a solution. They have implemented a command that can be used to test XDP programs, called BPF_PROG_TEST_RUN.

Essentially, BPF_PROG_TEST_RUN takes an XDP program to execute, along with an input packet and an output packet buffer. When the program is executed, the output packet variable is populated and the XDP return code is returned. This means that you can use the output packet and the return code in your test assertions! This technique can also be used for skb programs.

For the sake of completeness and to make this example simple, we use Python and its unit testing framework.

XDP Testing Using the Python Unit Testing Framework

Writing XDP tests with BPF_PROG_TEST_RUN and integrating them with the Python unit testing framework unittest is a good idea for several reasons:

  • You can load and execute BPF programs using the Python BCC library.

  • Python has one of the best packet crafting and introspection libraries available: scapy.

  • Python integrates with C structs using ctypes.

As said, we need to import all of the needed libraries; that’s the first thing we will do in a file named test_xdp.py:

from bcc import BPF, libbcc
from scapy.all import Ether, IP, raw, TCP, UDP

import ctypes
import unittest


class XDPExampleTestCase(unittest.TestCase):
    SKB_OUT_SIZE = 1514  # mtu 1500 + 14 ethernet size
    bpf_function = None

After all the needed libraries are imported, we can proceed and create a test case class named XDPExampleTestCase. This test class will contain all of our test cases and a member method (_xdp_test_run) that we will use to do assertions and call bpf_prog_test_run.

In the following code you can see what _xdp_test_run looks like:

    def _xdp_test_run(self, given_packet, expected_packet, expected_return):
        size = len(given_packet)

        given_packet = ctypes.create_string_buffer(raw(given_packet), size)
        packet_output = ctypes.create_string_buffer(self.SKB_OUT_SIZE)

        packet_output_size = ctypes.c_uint32()
        test_retval = ctypes.c_uint32()
        duration = ctypes.c_uint32()
        repeat = 1
        ret = libbcc.lib.bpf_prog_test_run(self.bpf_function.fd,
                                           repeat,
                                           ctypes.byref(given_packet),
                                           size,
                                           ctypes.byref(packet_output),
                                           ctypes.byref(packet_output_size),
                                           ctypes.byref(test_retval),
                                           ctypes.byref(duration))
        self.assertEqual(ret, 0)
        self.assertEqual(test_retval.value, expected_return)

        if expected_packet:
            self.assertEqual(
                packet_output[:packet_output_size.value], raw(expected_packet))

It takes three arguments:

given_packet

This is the packet we test our XDP program against; it is the raw packet received by the interface.

expected_packet

This is the packet we expect to receive back after the XDP program processes it; when the XDP program returns XDP_DROP or XDP_ABORTED, we expect this to be None; in all other cases, the packet either remains the same as given_packet or is modified.

expected_return

This is the expected return of the XDP program after processing our given_packet.

Besides the arguments, the body of this method is simple. It does conversion to C types using the ctypes library, and then it calls the libbcc equivalent of BPF_PROG_TEST_RUN, libbcc.lib.bpf_prog_test_run, using as test arguments our packets and their metadata. Then it does all of the assertions based on the results from the test call along with the given values.

After we have that function, we can basically just write test cases by crafting different packets to test how they behave when passing through our XDP program; but before doing that, we need to write a setUp method for our test.

This part is crucial because the setup does the actual load of our BPF program named myprogram by opening and compiling a source file named program.c (that’s the file where our XDP code will be):

    def setUp(self):
        bpf_prog = BPF(src_file=b"program.c")
        self.bpf_function = bpf_prog.load_func(b"myprogram", BPF.XDP)

After the setup is done, the next step is to write the first behavior we want to observe. Without being too imaginative, we want to test that we will drop all TCP packets.

So we craft a packet in given_packet, which is just a TCP packet over IPv4. Then, using our assertion method, _xdp_test_run, we just verify that given our packet, we will get back an XDP_DROP with no return packet:

    def test_drop_tcp(self):
        given_packet = Ether() / IP() / TCP()
        self._xdp_test_run(given_packet, None, BPF.XDP_DROP)

Because that is not enough, we also want to explicitly test that all UDP packets are allowed. We then craft two UDP packets, one for given_packet and one for expected_packet, that are essentially the same. In that way we are also testing that UDP packets are not modified while being allowed with XDP_PASS:

    def test_pass_udp(self):
        given_packet = Ether() / IP() / UDP()
        expected_packet = Ether() / IP() / UDP()
        self._xdp_test_run(given_packet, expected_packet, BPF.XDP_PASS)

To make things a bit more complicated, we decided that this system will then allow TCP packets on the condition that they go to port 9090. When they do, they will also be rewritten to change their destination MAC address to redirect to a specific network interface with address 08:00:27:dd:38:2a.

Here’s the test case to do that. The given_packet has 9090 as a destination port, and we require the expected_packet with the new destination and port 9090 again:

    def test_transform_dst(self):
        given_packet = Ether() / IP() / TCP(dport=9090)
        expected_packet = Ether(dst='08:00:27:dd:38:2a') / \
            IP() / TCP(dport=9090)
        self._xdp_test_run(given_packet, expected_packet, BPF.XDP_TX)

With plenty of test cases, we now write the entry point for our test program, which will just call unittest.main() that then loads and executes our tests:

if __name__ == '__main__':
    unittest.main()

We have now written tests for our XDP program first! Now that we have the test acting as a specific example of what we want to have, we can write the XDP program that implements it by creating a file named program.c.

Our program is simple. It contains just the myprogram XDP function with the logic we just tested. As always, the first thing we need to do is include the needed headers. They are self-explanatory: we have a BPF program that will process TCP/IP flowing over Ethernet:

#define KBUILD_MODNAME "kmyprogram"

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <linux/ip.h>

Again, as with the other programs in this chapter, we need to check offsets and fill variables for the three layers of our packet: ethhdr, iphdr, and tcphdr, respectively, for Ethernet, IPv4, and TCP:

int myprogram(struct xdp_md *ctx) {
  int ipsize = 0;
  void *data = (void *)(long)ctx->data;
  void *data_end = (void *)(long)ctx->data_end;
  struct ethhdr *eth = data;
  struct iphdr *ip;
  struct tcphdr *th;

  ipsize = sizeof(*eth);
  ip = data + ipsize;
  ipsize += sizeof(struct iphdr);
  if (data + ipsize > data_end) {
    return XDP_DROP;
  }

Once we have the values we can implement our logic.

The first thing we do is check whether the protocol is TCP, ip->protocol == IPPROTO_TCP. If it is, we drop the packet with XDP_DROP by default; otherwise, we let it through with XDP_PASS.

Within the TCP branch, we do one more check to see whether the destination port is 9090, th->dest == htons(9090). If it is, we change the destination MAC address at the Ethernet layer and return XDP_TX to bounce the packet back out through the same NIC:

  if (ip->protocol == IPPROTO_TCP) {
    th = (struct tcphdr *)(ip + 1);
    if ((void *)(th + 1) > data_end) {
      return XDP_DROP;
    }

    if (th->dest == htons(9090)) {
      eth->h_dest[0] = 0x08;
      eth->h_dest[1] = 0x00;
      eth->h_dest[2] = 0x27;
      eth->h_dest[3] = 0xdd;
      eth->h_dest[4] = 0x38;
      eth->h_dest[5] = 0x2a;
      return XDP_TX;
    }

    return XDP_DROP;
  }

  return XDP_PASS;
}
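The htons call matters here: TCP carries the destination port in network (big-endian) byte order, so comparing th->dest against the plain literal 9090 would fail on a little-endian host. A quick sketch of the difference:

```python
import struct

# TCP stores the destination port in network (big-endian) byte
# order, which is exactly what htons(9090) produces in program.c.
wire = struct.pack("!H", 9090)  # the 16-bit value as it appears on the wire

print(wire.hex())                    # → 2382
print(struct.unpack("<H", wire)[0])  # read little-endian by mistake: 33315
```

Without the htons, the comparison on a little-endian machine would effectively look for port 33315 instead of 9090.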

Amazing! Now we can just run our tests:

sudo python test_xdp.py

The output of it will just report that the three tests passed:

...
--------------------------------
Ran 3 tests in 4.676s

OK

At this point, breaking things is easier! We can just change the last XDP_PASS to XDP_DROP in program.c and observe what happens:

.F.
======================================================================
FAIL: test_pass_udp (__main__.XDPExampleTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_xdp.py", line 48, in test_pass_udp
    self._xdp_test_run(given_packet, expected_packet, BPF.XDP_PASS)
  File "test_xdp.py", line 31, in _xdp_test_run
    self.assertEqual(test_retval.value, expected_return)
AssertionError: 1 != 2

----------------------------------------------------------------------
Ran 3 tests in 4.667s

FAILED (failures=1)

Our test failed: the status code did not match, and the framework reported the failure. That's exactly what we wanted! We now have an effective testing framework for writing XDP programs with confidence. We can make assertions on specific steps, change them according to the behavior we want to obtain, and then write the matching code that expresses that behavior in the form of an XDP program.

Note

MAC address is short for Media Access Control address. It is a unique identifier, made of six groups of two hexadecimal digits, that every network interface has; it is used in the data link layer (layer 2 of the OSI model) to interconnect devices over technologies like Ethernet, Bluetooth, and WiFi.
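As an aside, the six per-byte assignments to eth->h_dest in program.c can be derived mechanically from the string form of the address. A small sketch (the helper name is ours, not part of any library):

```python
def mac_to_bytes(mac: str) -> list:
    """Split a colon-separated MAC string into the six byte values
    assigned one by one to eth->h_dest in program.c."""
    groups = mac.split(":")
    if len(groups) != 6:
        raise ValueError("a MAC address has exactly six byte groups")
    return [int(group, 16) for group in groups]

print(mac_to_bytes("08:00:27:dd:38:2a"))
# → [8, 0, 39, 221, 56, 42]
```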

XDP Use Cases

While approaching XDP, it is certainly useful to understand the use cases for which it has been employed by various organizations around the globe. This can help you to imagine why using XDP is better than other techniques such as socket filtering or Traffic Control in certain cases.

Let’s begin with a common one: monitoring.

Monitoring

Nowadays, most network monitoring systems are implemented either by writing kernel modules or by accessing proc files from user-space. Writing, distributing, and compiling kernel modules is not a task for everyone; it's a dangerous operation, and modules are not easy to maintain and debug either. However, the alternative might be even worse. To obtain the same kind of information, such as how many packets a card received in a second, you'd need to open and parse a file, in this case /sys/class/net/eth0/statistics/rx_packets. This might seem like a good idea, but it requires a lot of computation just to obtain some simple information, because the open syscall is not cheap in some cases.

So, we need a solution that lets us implement features similar to those of a kernel module without losing performance. XDP is perfect for that, because we can use an XDP program to store the data we want to extract in a map. The map is then consumed by a loader that can ship the metrics to a storage backend, apply algorithms to them, or plot the results in a graph.
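As a minimal sketch of that user-space side, a loader polling such a map could turn two successive counter snapshots into a packets-per-second metric. The function and the numbers here are illustrative, not part of any library:

```python
def packets_per_second(prev_count: int, curr_count: int,
                       interval_s: float) -> float:
    """Derive a rate from two successive snapshots of a
    monotonically increasing packet counter read from a BPF map."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return (curr_count - prev_count) / interval_s

# Two reads of the same map counter, taken two seconds apart:
print(packets_per_second(120_000, 130_000, 2.0))  # → 5000.0
```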

DDoS Mitigation

Being able to see packets at the NIC level ensures that every possible packet is intercepted at the earliest stage, before the system has spent any computing power to figure out whether the packets will be useful to it. In a typical scenario, a BPF map can instruct an XDP program to XDP_DROP packets from a certain source. That list of sources can be generated in user-space after analyzing packets received through another map. Once a packet flowing into the XDP program matches an element of the list, the mitigation occurs: the packet is dropped, and the kernel didn't even have to spend a CPU cycle handling it. This makes the attacker's goal harder to achieve because they are unable to waste any of the system's expensive computing resources.
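A minimal user-space sketch of that matching logic, assuming the blocklist map is keyed by the source address as a network-order u32 (the helper names and the example addresses are illustrative):

```python
import socket
import struct

def ip_to_key(ip: str) -> int:
    """Turn a dotted-quad address into the network-order u32 key a
    blocklist BPF hash map might use."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

# Generated in user-space after analyzing suspicious traffic.
blocklist = {ip_to_key("203.0.113.7")}

def verdict(src_ip: str) -> str:
    # Mirrors the in-kernel check: drop on a blocklist hit.
    return "XDP_DROP" if ip_to_key(src_ip) in blocklist else "XDP_PASS"

print(verdict("203.0.113.7"))   # → XDP_DROP
print(verdict("198.51.100.1"))  # → XDP_PASS
```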

Load Balancing

An interesting use case for XDP programs is load balancing; however, XDP can retransmit packets only on the same NIC where they arrived. This means that XDP is not the best option for implementing a classic load balancer that sits in front of all your servers and forwards traffic to them. However, that does not mean XDP is unsuited to this use case: if we move the load balancing from an external server to the same machines that serve the application, each machine's NIC can be used to do the job.

In that way, we can create a distributed load balancer where each machine hosting the application helps spread the traffic to the appropriate servers.
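To sketch the idea, each node can hash a connection's 4-tuple so that every node independently steers the same flow to the same backend. The backend pool, helper name, and hash choice below are illustrative; a production implementation would keep this state in a BPF map shared with the XDP program and would typically use consistent hashing:

```python
import zlib

# Assumed backend pool; in a real deployment this would live in a
# BPF map shared with the XDP program on every node.
BACKENDS = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]

def pick_backend(src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> str:
    """Hash the flow 4-tuple so every node picks the same backend
    for the same connection."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return BACKENDS[zlib.crc32(flow) % len(BACKENDS)]

# The same flow always lands on the same backend:
print(pick_backend("198.51.100.4", 40521, "192.0.2.1", 80))
```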

Firewalling

When people think of firewalling on Linux, they typically think of iptables or netfilter. With XDP, you can get the same functionality in a completely programmable way, directly in the NIC or its driver. Firewalls are usually expensive machines sitting on top of the network stack, or between nodes, to control what their communication looks like. When using XDP, however, it's immediately clear that because XDP programs are very cheap and fast, we could implement the firewalling logic directly in each node's NIC instead of on a set of dedicated machines. A common use case is an XDP loader that controls a map holding a set of rules, updated through a remote procedure call API. The set of rules in the map is then dynamically passed to the XDP programs loaded onto each specific machine to control what it can receive, from whom, and in which situations.

This alternative doesn’t just make firewalling less expensive; it allows every node to deploy its own level of firewalling without relying on user-space software or the kernel to do that. When this is deployed using offloaded XDP as the operation mode, we obtain the maximum advantage because the processing is not even done by the main node CPU.

Conclusion

What great skills you have now! I promise that XDP will help you think about network flows in a completely different way from now on. Having to rely on tools like iptables or other user-space tools when dealing with network packets is often frustrating and slow. XDP is interesting because it is faster as a result of its direct packet processing capabilities, and because you can write your own logic to deal with the network packets. Because all of that arbitrary code can work with maps and interact with other BPF programs, you have an entire world of possible use cases to invent and explore for your own architectures!

Even though it is not about networking, the next chapter returns to a lot of the concepts covered here and in Chapter 6. Again, BPF is used to filter some conditions based on a given input and to filter what a program can do. Don’t forget that the F in BPF stands for filter!
