Manually networking containers

In Chapter 1, Linux Networking Constructs, and Chapter 2, Configuring and Monitoring Docker Networks, we reviewed common Linux network constructs and covered Docker's native options for container networking. In this recipe, we'll walk through how to manually network a container the same way that Docker does in the default bridge network mode. Understanding how Docker handles network provisioning for containers is a key building block in understanding the non-native options for container networking.

Getting ready

In this recipe, we'll be demonstrating the configuration on a single Docker host. It is assumed that this host has Docker installed and that Docker is in its default configuration. In order to view and manipulate networking settings, you'll want to ensure that you have the iproute2 toolset installed. If it's not present on the system, it can be installed with the following command:

sudo apt-get install iproute2 
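
If you want to confirm that the toolset is present before going any further, the ip utility can report its own version:

user@docker1:~$ ip -V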

In order to make network changes to the host, you'll also need root-level access.

How to do it…

In order to manually provision a container's network, we need to explicitly tell Docker not to provision a container's network at runtime. To do this, we run a container using a network mode of none. For instance, we can start one of the web server containers without any network configuration by using this syntax:

user@docker1:~$ docker run --name web1 --net=none -d 
jonlangemak/web_server_1
c108ca80db8a02089cb7ab95936eaa52ef03d26a82b1e95ce91ddf6eef942938
user@docker1:~$

After the container starts, we can check its network configuration using the docker exec subcommand:

user@docker1:~$ docker exec web1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
user@docker1:~$ 

As you can see, the container doesn't have any interfaces defined besides its local loopback interface. At this point, there is no means to connect to the container. What we've essentially done is create a container in a bubble.
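
You can also confirm from the Docker side that no IP address was allocated. The inspect template below assumes the default NetworkSettings layout and should return an empty value for this container:

user@docker1:~$ docker inspect --format '{{.NetworkSettings.IPAddress}}' web1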

Because we're aiming to mimic the default network configuration, we now need to find a way to connect the container web1 to the docker0 bridge and assign it an IP address from within the bridge's IP allocation (172.17.0.0/16).

With that in mind, the first thing we need to do is create the interfaces that we'll use to connect the container to the docker0 bridge. As we saw in Chapter 1, Linux Networking Constructs, Linux has a networking component named Virtual Ethernet (VETH) pairs, which works well for this purpose. One end of the pair will connect to the docker0 bridge and the other end will connect to the container.

Let's start by creating our VETH pair:

user@docker1:~$ sudo ip link add bridge_end type veth 
peer name container_end
user@docker1:~$ ip link show
…<Additional output removed for brevity>…
5: container_end@bridge_end: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:43:d8:59:ac:c1 brd ff:ff:ff:ff:ff:ff
6: bridge_end@container_end: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:8b:e7:f8:66:45 brd ff:ff:ff:ff:ff:ff
user@docker1:~$

As expected, we now have two interfaces that are directly associated with each other. Let's now bind one end to the docker0 bridge and turn up the interface:

user@docker1:~$ sudo ip link set dev bridge_end master docker0
user@docker1:~$ sudo ip link set bridge_end up
user@docker1:~$ ip link show bridge_end
6: bridge_end@container_end: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc pfifo_fast master docker0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 72:8b:e7:f8:66:45 brd ff:ff:ff:ff:ff:ff
user@docker1:~$
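
If you want to double check that bridge_end is now a port on the docker0 bridge, iproute2 can also filter the interface list by master device. This is just a sanity check; the output should include bridge_end along with any interfaces Docker has already attached to the bridge:

user@docker1:~$ ip link show master docker0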

Note

The state of the interface at this point will show as LOWERLAYERDOWN. This is because the other end of the interface is unbound and still in a down state.

The next step is to connect the other end of the VETH pair to the container. This is where things get interesting. Docker creates each container in its own network namespace. This means the other end of the VETH pair needs to land in the container's network namespace. The trick is determining what the container's network namespace is. The namespace for a given container can be located in two different ways.

The first way relies on correlating the container's process ID (PID) to a defined network namespace. It's more involved than the second method, but it gives you some good background on the internals of network namespaces. As you might recall from Chapter 3, User-Defined Networks, by default we can't use the command-line tool ip netns to view Docker-created namespaces. In order to view them, we need to create a symlink that ties the location where Docker stores its network namespaces (/var/run/docker/netns) to the location where ip netns looks (/var/run/netns):

user@docker1:~$ cd /var/run
user@docker1:/var/run$ sudo ln -s /var/run/docker/netns netns

Now if we attempt to list the namespaces, we should see at least one listed in the output:

user@docker1:/var/run$ sudo ip netns list
712f8a477cce
default
user@docker1:/var/run$

But how do we know that this is the namespace associated with this container? To make that determination, we first need to find the PID of the container in question. We can retrieve this information by inspecting the container:

user@docker1:~$ docker inspect web1
…<Additional output removed for brevity>…
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3156,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2016-10-05T21:32:00.163445345Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
…<Additional output removed for brevity>…
user@docker1:~$ 

Now that we have the PID, we can use the ip netns identify subcommand to find the network namespace name:

user@docker1:/var/run$ sudo ip netns identify 3156
712f8a477cce
user@docker1:/var/run$ 
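
For what it's worth, the two steps above can be combined into a single command by pulling the PID out with an inspect template; this relies on the State.Pid field we saw in the inspect output earlier:

user@docker1:/var/run$ sudo ip netns identify $(docker inspect --format '{{.State.Pid}}' web1)
712f8a477cce
user@docker1:/var/run$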

Note

Even if you choose to use the second method, make sure that you create the symlink so that ip netns works for later steps.

The second way to find a container's network namespace is much easier. We can simply inspect and reference the container's network configuration:

user@docker1:~$ docker inspect web1
…<Additional output removed for brevity>… 
"NetworkSettings": {
            "Bridge": "",
            "SandboxID": "712f8a477cceefc7121b2400a22261ec70d6a2d9ab2726cdbd3279f1e87dae22",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/712f8a477cce",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "", 
…<Additional output removed for brevity>… 
user@docker1:~$

Notice the field named SandboxKey. Its file path points to the location where we said Docker stores its network namespaces. The filename referenced in this path is the name of the container's network namespace. Docker refers to network namespaces as sandboxes, hence the naming convention used.
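
If you'd rather not wade through the full inspect output, the same field can be pulled directly with an inspect template based on the NetworkSettings structure shown above:

user@docker1:~$ docker inspect --format '{{.NetworkSettings.SandboxKey}}' web1
/var/run/docker/netns/712f8a477cce
user@docker1:~$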

Now that we have the network namespace name, we can build the connectivity between the container and the docker0 bridge. Recall that VETH pairs can be used to connect network namespaces together. In this example, we'll be placing the container end of the VETH pair in the container's network namespace. This will bridge the container into the default network namespace on the docker0 bridge. To do this, we'll first move the container end of the VETH pair into the namespace we discovered earlier:

user@docker1:~$ sudo ip link set container_end netns 712f8a477cce

We can validate that the container end of the VETH pair is now in the container's namespace using the docker exec subcommand:

user@docker1:~$ docker exec web1 ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: container_end@if6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
user@docker1:~$
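
As a quick sanity check from the host side, the container_end interface should no longer be visible in the default namespace, so the following command should now report that the device doesn't exist:

user@docker1:~$ ip link show container_end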

At this point, we've successfully bridged the container and the default namespace together using a VETH pair: one end of the pair now lives on the docker0 bridge and the other end sits inside the container's network namespace.

However, the container web1 is still lacking any kind of IP connectivity since it has not yet been allocated a routable IP address. Recall that in Chapter 1, Linux Networking Constructs, we saw that a VETH pair interface can be assigned an IP address directly. To give the container a routable IP address, Docker simply allocates an unused IP address from the docker0 bridge subnet to the container end of the VETH pair. We can do the same manually:

Note

IP address management (IPAM) is a huge advantage of letting Docker manage your container networking for you. Without it, you'll need to track allocations on your own and make sure that you don't assign any overlapping IP addresses.

user@docker1:~$ sudo ip netns exec 712f8a477cce ip 
addr add 172.17.0.99/16 dev container_end
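
We picked 172.17.0.99 by hand here. If you want to lower the risk of colliding with an address Docker has already handed out, you can review the current allocations on the default bridge network before choosing (this assumes the default network name of bridge):

user@docker1:~$ docker network inspect bridge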

At this point, we could turn up the interface and we should have reachability to the container from the host. But before we do that, let's make things a little cleaner by renaming the container_end interface to just eth0:

user@docker1:~$ sudo ip netns exec 712f8a477cce ip link 
set dev container_end name eth0

Now we can turn up the newly named eth0 interface, which is the container side of the VETH pair:

user@docker1:~$ sudo ip netns exec 712f8a477cce ip link 
set eth0 up
user@docker1:~$ ip link show bridge_end
6: bridge_end@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master docker0 state UP mode DEFAULT group default qlen 1000
    link/ether 86:04:ed:1b:2a:04 brd ff:ff:ff:ff:ff:ff
user@docker1:~$ sudo ip netns exec 712f8a477cce ip link show eth0
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
user@docker1:~$ sudo ip netns exec 712f8a477cce ip addr show eth0
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:15:2a:f7:0e:f9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.99/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8415:2aff:fef7:ef9/64 scope link
       valid_lft forever preferred_lft forever
user@docker1:~$

If we check from the host, we should now have reachability to the container:

user@docker1:~$ ping 172.17.0.99 -c 2
PING 172.17.0.99 (172.17.0.99) 56(84) bytes of data.
64 bytes from 172.17.0.99: icmp_seq=1 ttl=64 time=0.104 ms
64 bytes from 172.17.0.99: icmp_seq=2 ttl=64 time=0.045 ms
--- 172.17.0.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.045/0.074/0.104/0.030 ms
user@docker1:~$
user@docker1:~$ curl http://172.17.0.99
<body>
  <html>
    <h1><span style="color:#FF0000;font-size:72px;">Web Server #1 - Running on port 80</span></h1>
</body>
  </html>
user@docker1:~$

With the connectivity in place, the container now sits on the docker0 bridge with a routable IP address and is reachable from the host, much like a natively networked container.

So, while we have IP connectivity, it only extends to hosts on the same subnet. The last remaining piece is to solve for container connectivity at the host level. For outbound connectivity, the host hides the container's IP address behind the host's interface IP addresses. For inbound connectivity, in the default network mode, Docker uses port mappings to map a random high port on the Docker host's NIC to the container's exposed port.

Solving for outbound in this case is as simple as giving the container a default route pointing at the docker0 bridge and ensuring that a netfilter masquerade rule covers this traffic:

user@docker1:~$ sudo ip netns exec 712f8a477cce ip route 
add default via 172.17.0.1
user@docker1:~$ docker exec -it web1 ping 4.2.2.2 -c 2
PING 4.2.2.2 (4.2.2.2): 48 data bytes
56 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=39.764 ms
56 bytes from 4.2.2.2: icmp_seq=1 ttl=50 time=40.210 ms
--- 4.2.2.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 39.764/39.987/40.210/0.223 ms
user@docker1:~$

If you're using the docker0 bridge as we did in this example, you won't need to add a custom netfilter masquerade rule. This is because the default masquerade rule already covers the entire subnet of the docker0 bridge:

user@docker1:~$ sudo iptables -t nat -L
…<Additional output removed for brevity>…
Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  172.17.0.0/16        anywhere
…<Additional output removed for brevity>…
user@docker1:~$
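
If you had instead attached the container to a custom bridge with its own subnet, you would need to add an equivalent masquerade rule yourself. A minimal sketch, assuming a hypothetical bridge named mybridge0 using 172.18.0.0/16, mirrors Docker's own rule:

user@docker1:~$ sudo iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -o mybridge0 -j MASQUERADE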

For inbound services, we'll need to create a custom rule that uses Network Address Translation (NAT) to map a random high port on the host to the exposed service port in the container. We could do that with a rule like this:

user@docker1:~$ sudo iptables -t nat -A DOCKER ! -i docker0 -p tcp -m tcp 
--dport 32799 -j DNAT --to-destination 172.17.0.99:80

In this case, we NAT port 32799 on the host interface to port 80 in the container. This will allow systems on the outside network to access the web server running in web1 via the Docker host's interface on port 32799.
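
To verify the mapping, you can list the rule you just added and then browse to the mapped port from another machine on the network. The otherhost prompt and the <docker-host-ip> placeholder below are stand-ins for your own environment:

user@docker1:~$ sudo iptables -t nat -L DOCKER -n --line-numbers
user@otherhost:~$ curl http://<docker-host-ip>:32799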

In the end, we have successfully replicated the connectivity that Docker provides in the default bridge network mode.

This should give you some appreciation for what Docker does behind the scenes. Container IP addressing, port allocations for published services, and the iptables rule set are three of the major things that Docker tracks on your behalf. Given the ephemeral nature of containers, managing these manually would be almost impossible.
