In Chapter 1, Linux Networking Constructs and Chapter 2, Configuring and Monitoring Docker Networks, we reviewed common Linux network constructs and covered Docker's native options for container networking. In this recipe, we'll walk through how to manually network a container the same way that Docker does in the default bridge network mode. Understanding how Docker handles network provisioning for containers is a key building block in understanding non-native options for container networking.
In this recipe, we'll be demonstrating the configuration on a single Docker host. It is assumed that this host has Docker installed and that Docker is in its default configuration. In order to view and manipulate networking settings, you'll want to ensure that you have the iproute2 toolset installed. If not present on the system, it can be installed by using the command:
sudo apt-get install iproute2
In order to make network changes to the host, you'll also need root-level access.
In order to manually provision a container's network, we need to explicitly tell Docker not to provision the network at runtime. To do this, we run the container with a network mode of none. For instance, we can start one of the web server containers without any network configuration by using this syntax:
user@docker1:~$ docker run --name web1 --net=none -d
jonlangemak/web_server_1
c108ca80db8a02089cb7ab95936eaa52ef03d26a82b1e95ce91ddf6eef942938
user@docker1:~$
After the container starts, we can check its network configuration using the docker exec subcommand:
user@docker1:~$ docker exec web1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
user@docker1:~$
As you can see, the container doesn't have any interfaces defined besides its local loopback interface. At this point, there is no way to connect to the container; we've essentially created a container in a bubble:
Because we're aiming to mimic the default network configuration, we now need to find a way to connect the container web1 to the docker0 bridge and assign it an IP address from within the bridge's IP allocation (172.17.0.0/16).
With that in mind, the first thing we need to do is create the interfaces that we'll use to connect the container to the docker0 bridge. As we saw in Chapter 1, Linux Networking Constructs, Linux has a networking component named Virtual Ethernet (VETH) pairs, which works well for this purpose. One end of the pair will connect to the docker0 bridge and the other end will connect to the container.
Let's start by creating our VETH pair:
user@docker1:~$ sudo ip link add bridge_end type veth peer name container_end
user@docker1:~$ ip link show
…<Additional output removed for brevity>…
5: container_end@bridge_end: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:43:d8:59:ac:c1 brd ff:ff:ff:ff:ff:ff
6: bridge_end@container_end: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:8b:e7:f8:66:45 brd ff:ff:ff:ff:ff:ff
user@docker1:~$
As expected, we now have two interfaces that are directly associated with each other. Let's now bind one end to the docker0 bridge and turn up the interface:
user@docker1:~$ sudo ip link set dev bridge_end master docker0
user@docker1:~$ sudo ip link set bridge_end up
user@docker1:~$ ip link show bridge_end
6: bridge_end@container_end: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc pfifo_fast master docker0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 72:8b:e7:f8:66:45 brd ff:ff:ff:ff:ff:ff
user@docker1:~$
The next step is to connect the other end of the VETH pair to the container. This is where things get interesting. Docker creates each container in its own network namespace. This means the other end of the VETH pair needs to land in the container's network namespace. The trick is determining what the container's network namespace is. The namespace for a given container can be located in two different ways.
The first way relies on correlating the container's process ID (PID) to a defined network namespace. It's more involved than the second method, but it gives you some useful background on the internals of network namespaces. As you might recall from Chapter 3, User-Defined Networks, by default we can't use the command-line tool ip netns to view Docker-created namespaces. In order to view them, we need to create a symlink that ties the location where Docker stores its network namespaces (/var/run/docker/netns) to the location where ip netns looks (/var/run/netns):
user@docker1:~$ cd /var/run
user@docker1:/var/run$ sudo ln -s /var/run/docker/netns netns
Now if we attempt to list the namespaces, we should see at least one listed in the output:
user@docker1:/var/run$ sudo ip netns list
712f8a477cce
default
user@docker1:/var/run$
But how do we know that this is the namespace associated with this container? To make that determination, we first need to find the PID of the container in question. We can retrieve this information by inspecting the container:
user@docker1:~$ docker inspect web1
…<Additional output removed for brevity>…
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 3156,
"ExitCode": 0,
"Error": "",
"StartedAt": "2016-10-05T21:32:00.163445345Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
…<Additional output removed for brevity>…
user@docker1:~$
Now that we have the PID, we can use the ip netns identify subcommand to find the network namespace name from the PID:
user@docker1:/var/run$ sudo ip netns identify 3156
712f8a477cce
user@docker1:/var/run$
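Scanning the full inspect blob by eye works, but the PID can also be pulled out programmatically. The sketch below parses a saved sample matching the State block shown above; on a live host you would pipe the real docker inspect web1 output instead, or simply use docker inspect --format '{{.State.Pid}}' web1:

```shell
# A minimal sketch: extract the PID from saved `docker inspect` text.
# `inspect_sample` mirrors the State block shown above; on a live host
# you would pipe real `docker inspect web1` output in its place.
inspect_sample='"State": {
    "Status": "running",
    "Running": true,
    "Pid": 3156,
    "ExitCode": 0
},'
pid=$(printf '%s\n' "$inspect_sample" | sed -n 's/.*"Pid": \([0-9]*\),.*/\1/p')
echo "container PID: $pid"
```

The extracted value can then be fed straight to ip netns identify as shown above.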
The second way to find a container network namespace is much easier. We can simply inspect and reference the container's network configuration:
user@docker1:~$ docker inspect web1
…<Additional output removed for brevity>…
"NetworkSettings": {
"Bridge": "",
"SandboxID": "712f8a477cceefc7121b2400a22261ec70d6a2d9ab2726cdbd3279f1e87dae22",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "/var/run/docker/netns/712f8a477cce",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
…<Additional output removed for brevity>…
user@docker1:~$
Notice the field named SandboxKey. The file path references the location where we said Docker stores its network namespaces, and the filename in this path is the name of the container's network namespace. Docker refers to network namespaces as sandboxes, hence the naming convention used.
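Because SandboxKey already embeds the namespace name, a short sketch can derive it directly. The path below is the sample value from the inspect output above; on a live host you could read it with docker inspect --format '{{.NetworkSettings.SandboxKey}}' web1:

```shell
# Sketch: the namespace name is just the final path component of
# SandboxKey. The value below is the sample from the output above.
sandbox_key="/var/run/docker/netns/712f8a477cce"
netns_name=$(basename "$sandbox_key")
echo "$netns_name"
```

This avoids the PID correlation step entirely, which is why it is the easier of the two methods.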
Now that we have the network namespace name, we can build the connectivity between the container and the docker0 bridge. Recall that VETH pairs can be used to connect network namespaces together. In this example, we'll place the container end of the VETH pair in the container's network namespace, which will bridge the container into the default network namespace on the docker0 bridge. To do this, we'll first move the container end of the VETH pair into the namespace we discovered earlier:
user@docker1:~$ sudo ip link set container_end netns 712f8a477cce
We can validate that the VETH pair is in the namespace using the docker exec subcommand:
user@docker1:~$ docker exec web1 ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: container_end@if6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
user@docker1:~$
At this point, we've successfully bridged the container and the default namespace together using a VETH pair, so our connectivity now looks something like this:
However, the container web1
is still lacking any kind of IP connectivity since it has not yet been allocated a routable IP address. Recall in Chapter 1, Linux Networking Constructs, we saw that a VETH pair interface can be assigned an IP address directly. To give the container a routable IP address, Docker simply allocates an unused IP address from the docker0
bridge subnet to the container end of the VETH pair.
user@docker1:~$ sudo ip netns exec 712f8a477cce ip addr add 172.17.0.99/16 dev container_end
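Docker's IPAM handles this allocation bookkeeping automatically; when wiring by hand, you have to find a free address yourself. The sketch below shows one naive way to do it, walking the 172.17.0.x range and skipping a hypothetical list of addresses already in use (Docker's real allocator is considerably more sophisticated):

```shell
# Naive free-address picker for the docker0 subnet (sketch only).
# `used` is a hypothetical list of already-allocated addresses; on a
# real host you would gather these from the running containers.
used="172.17.0.1 172.17.0.2 172.17.0.3"
free=""
i=2
while [ $i -le 254 ]; do
  candidate="172.17.0.$i"
  case " $used " in
    *" $candidate "*) ;;                 # taken, keep looking
    *) free="$candidate"; break ;;
  esac
  i=$((i + 1))
done
echo "first free address: $free"
```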
At this point, we could turn up the interface and we should have reachability to the container from the host. But before we do that, let's make things a little cleaner by renaming the container_end VETH pair to just eth0:
user@docker1:~$ sudo ip netns exec 712f8a477cce ip link set dev container_end name eth0
Now we can turn up the newly named eth0 interface, which is the container side of the VETH pair:
user@docker1:~$ sudo ip netns exec 712f8a477cce ip link set eth0 up
user@docker1:~$ ip link show bridge_end
6: bridge_end@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master docker0 state UP mode DEFAULT group default qlen 1000
    link/ether 86:04:ed:1b:2a:04 brd ff:ff:ff:ff:ff:ff
user@docker1:~$ sudo ip netns exec 712f8a477cce ip link show eth0
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
user@docker1:~$ sudo ip netns exec 712f8a477cce ip addr show eth0
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.99/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8415:2aff:fef7:ef9/64 scope link
       valid_lft forever preferred_lft forever
user@docker1:~$
If we check from the host, we should now have reachability to the container:
user@docker1:~$ ping 172.17.0.99 -c 2
PING 172.17.0.99 (172.17.0.99) 56(84) bytes of data.
64 bytes from 172.17.0.99: icmp_seq=1 ttl=64 time=0.104 ms
64 bytes from 172.17.0.99: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.17.0.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.045/0.074/0.104/0.030 ms
user@docker1:~$
user@docker1:~$ curl http://172.17.0.99
<body>
  <html>
    <h1><span style="color:#FF0000;font-size:72px;">Web Server #1 - Running on port 80</span></h1>
  </body>
</html>
user@docker1:~$
With the connectivity in place, our topology now looks like this:
So, while we have IP connectivity, it only extends to hosts on the same subnet. The last remaining piece is to solve for container connectivity at the host level. For outbound connectivity, the host hides the container's IP address behind the host's interface IP addresses. For inbound connectivity, in the default network mode, Docker uses port mappings to map a random high port on the Docker host's NIC to the container's exposed port.
Solving for outbound connectivity in this case is as simple as giving the container a default route pointing at the docker0 bridge and ensuring that a netfilter masquerade rule covers this traffic:
user@docker1:~$ sudo ip netns exec 712f8a477cce ip route add default via 172.17.0.1
user@docker1:~$ docker exec -it web1 ping 4.2.2.2 -c 2
PING 4.2.2.2 (4.2.2.2): 48 data bytes
56 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=39.764 ms
56 bytes from 4.2.2.2: icmp_seq=1 ttl=50 time=40.210 ms
--- 4.2.2.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 39.764/39.987/40.210/0.223 ms
user@docker1:~$
If you're using the docker0 bridge, as we did in this example, you won't need to add a custom netfilter masquerade rule, because the default masquerade rule already covers the entire subnet of the docker0 bridge:
user@docker1:~$ sudo iptables -t nat -L
…<Additional output removed for brevity>…
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 172.17.0.0/16 anywhere
…<Additional output removed for brevity>…
user@docker1:~$
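You can check for this coverage programmatically as well. The sketch below greps saved sample output matching the POSTROUTING chain above; on a live host you would pipe sudo iptables -t nat -L POSTROUTING in its place:

```shell
# Sketch: does a MASQUERADE rule already cover a given source subnet?
# `nat_sample` mirrors the POSTROUTING output shown above; a live host
# check would read from `iptables -t nat -L POSTROUTING` instead.
nat_sample='Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  172.17.0.0/16        anywhere'
subnet="172.17.0.0/16"
if printf '%s\n' "$nat_sample" | grep -q "MASQUERADE.*$subnet"; then
  result="covered"
else
  result="not covered"
fi
echo "$subnet is $result by an existing masquerade rule"
```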
For inbound services, we'll need to create a custom rule that uses Network Address Translation (NAT) to map a random high port on the host to the exposed service port in the container. We could do that with a rule like this:
user@docker1:~$ sudo iptables -t nat -A DOCKER ! -i docker0 -p tcp -m tcp --dport 32799 -j DNAT --to-destination 172.17.0.99:80
In this case, we NAT port 32799 on the host interface to port 80 on the container. This allows systems on the outside network to access the web server running in web1 via the Docker host's interface on port 32799.
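Pulling the whole recipe together, the manual steps can be collected into a single sketch. Because every command needs root and a live Docker host, this version only echoes what would run (a dry run); the namespace, address, and port values mirror the example above, and you would substitute your own:

```shell
#!/bin/sh
# Dry-run sketch of the full manual wiring sequence from this recipe.
# `run` echoes each command instead of executing it; swap the echo for
# `sudo "$@"` to execute for real on a host with Docker running.
NETNS=712f8a477cce       # container network namespace (from SandboxKey)
IP=172.17.0.99/16        # free address picked from the docker0 subnet
GW=172.17.0.1            # docker0 bridge address
HOSTPORT=32799           # published high port on the host

run() { echo "+ $*"; }

run ip link add bridge_end type veth peer name container_end
run ip link set dev bridge_end master docker0
run ip link set bridge_end up
run ip link set container_end netns "$NETNS"
run ip netns exec "$NETNS" ip addr add "$IP" dev container_end
run ip netns exec "$NETNS" ip link set dev container_end name eth0
run ip netns exec "$NETNS" ip link set eth0 up
run ip netns exec "$NETNS" ip route add default via "$GW"
run iptables -t nat -A DOCKER ! -i docker0 -p tcp -m tcp \
  --dport "$HOSTPORT" -j DNAT --to-destination 172.17.0.99:80
```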
In the end, we have successfully replicated what Docker provides in the default network mode:
This should give you some appreciation for what Docker does behind the scenes. Keeping track of container IP addressing, port allocations for published ports, and the iptables rule set are three of the major things that Docker tracks on your behalf. Given the ephemeral nature of containers, this would be almost impossible to do manually.