Disabling outbound masquerading

By default, containers are allowed to access the outside network by masquerading, or hiding, their real IP addresses behind that of the Docker host. This is accomplished through netfilter masquerade rules that hide container traffic behind the Docker host interface used to reach the next hop. We saw a detailed example of this in Chapter 2, Configuring and Monitoring Docker Networks, when we discussed container-to-container connectivity across hosts. While this type of configuration is ideal in many respects, there are some cases where you might prefer to disable outbound masquerading. For instance, if you prefer that your containers have no outbound connectivity at all, disabling masquerading would prevent containers from talking to the outside network. This, however, only breaks the flow because the outside network has no return route to the container; the outbound packets themselves still leave the host. A better option might be to treat containers like any other individual network endpoint and use existing security appliances to define network policy. In this recipe, we'll discuss how to disable IP masquerading as well as how to provide containers with unique IP addressing as they traverse the outside network.

Getting ready

We'll be using a single Docker host in this example. It is assumed that the Docker host used in this lab is in its default configuration. You'll also need access to change Docker service-level settings. In some cases, the changes we make may require you to have root-level access to the system. We'll also be making changes to the network equipment to which the Docker host connects.

How to do it…

You'll recall that IP masquerading in Docker is handled through a netfilter masquerade rule. On a Docker host in its default configuration, we can see this rule by examining the ruleset with iptables:

user@docker1:~$ sudo iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
user@docker1:~$

This rule matches traffic sourced from the docker0 bridge subnet (172.17.0.0/16) and NATs only the traffic that is headed off the host, that is, traffic not exiting back through the docker0 bridge itself. The MASQUERADE target tells the host to source NAT the traffic to the Docker host's next hop interface. That is, if the host has multiple IP interfaces, the container's traffic will be source NATed to whichever interface is used as the next hop. This means that container traffic could potentially be hidden behind different IP addresses based on the Docker host interface and routing table configuration. For instance, consider a Docker host with two interfaces, as shown in the following figure:

[Figure: A Docker host with two interfaces; container traffic is masqueraded behind a different host interface depending on the route to the destination]

In the left-hand side example, traffic is taking the default route since the destination of 4.2.2.2 doesn't match a more specific prefix in the host's routing table. In this case, the host performs a source NAT and changes the source of the traffic from 172.17.0.2 to 10.10.10.101 as it traverses the Docker host to the outside network. However, if the destination matches a more specific route out of the host's second interface, the container traffic will instead be hidden behind the 192.168.10.101 interface, as shown in the example on the right.
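If you want to confirm which host interface, and therefore which masquerade address, a given destination would use, you can ask the kernel's routing table directly. A quick check, assuming the interface layout described above:

user@docker1:~$ ip route get 4.2.2.2

The dev and src fields in the output show the exit interface and the address that the container traffic would be hidden behind for that destination.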

The default behavior of Docker can be changed by manipulating the --ip-masq option. By default, the option is set to true and can be overridden by explicitly setting it to false. We can do this by specifying the option in our Docker systemd drop-in file, as follows:

ExecStart=/usr/bin/dockerd --ip-masq=false
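If you haven't created the drop-in file yet, a minimal sketch looks like the following. The file name and path are assumptions; any .conf file under the docker.service.d directory will do, and the blank ExecStart= line is needed to clear the unit's original ExecStart before redefining it:

user@docker1:~$ sudo mkdir -p /etc/systemd/system/docker.service.d
user@docker1:~$ sudo tee /etc/systemd/system/docker.service.d/docker.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --ip-masq=false
EOF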

Now reload the systemd configuration, restart the Docker service, and check the netfilter ruleset again:

user@docker1:~$ sudo systemctl daemon-reload
user@docker1:~$ sudo systemctl restart docker
user@docker1:~$
user@docker1:~$ sudo iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
user@docker1:~$

Notice that the masquerade rule is now gone. Traffic generated from a container on this host would attempt to route out through the Docker host with its actual source IP address. A tcpdump on the Docker host would capture this traffic exiting the host's eth0 interface with the original container IP address:

user@docker1:~$ sudo tcpdump -n -i eth0 dst 4.2.2.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
09:06:10.243523 IP 172.17.0.2 > 4.2.2.2: ICMP echo request, id 3072, seq 0, length 56
09:06:11.244572 IP 172.17.0.2 > 4.2.2.2: ICMP echo request, id 3072, seq 256, length 56

Since the outside network doesn't know where 172.17.0.0/16 is, this request will never receive a response, effectively preventing the container from communicating with the outside world.
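You can confirm the same behavior from inside a container. A quick check, using the same test image that we'll use later in this recipe:

user@docker1:~$ docker run -it --rm jonlangemak/web_server_1 /bin/bash
root@<container_id>:/# ping 4.2.2.2

No replies will come back because nothing on the outside network has a route to 172.17.0.0/16; press Ctrl+C to stop the ping and exit to remove the test container.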

While this may be a useful means to prevent communication to the outside world, it's not entirely ideal. For starters, you're still allowing the traffic out; the response just won't know where to go as it attempts to return to the source. Also, you've impacted all of the containers, from all networks, on the Docker host. If the docker0 bridge had a routable subnet allocated to it, and the outside network knew where that subnet lived, you could use existing security tooling to make security policy decisions.

For instance, let's assume that the docker0 bridge was allocated a subnet of 172.10.10.0/24 and that we left IP masquerading disabled. We could do this by changing the Docker options to also specify a new bridge IP address:

ExecStart=/usr/bin/dockerd --ip-masq=false --bip=172.10.10.1/24
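After another daemon-reload and restart of the Docker service, you can verify that the bridge picked up the new addressing. A quick check, assuming the default docker0 bridge is still in use:

user@docker1:~$ sudo systemctl daemon-reload
user@docker1:~$ sudo systemctl restart docker
user@docker1:~$ ip addr show docker0 | grep inet

The inet line should now show 172.10.10.1/24 rather than the default 172.17.0.1/16.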

As before, traffic leaving a container and destined for the outside network would be unchanged as it traversed the Docker host. Let's assume a small network topology, such as the one shown in the following figure:

[Figure: Small network topology with the Docker host connected to a switch, and a firewall between the switch and the Internet]

Let's assume a flow from the container to 4.2.2.2. In this case, egress traffic should work inherently:

  • The container generates traffic toward 4.2.2.2 and uses its default gateway, which is the docker0 bridge IP address.
  • The Docker host does a route lookup, fails to find a more specific prefix match, and forwards the traffic to its default gateway, which is the switch.
  • The switch does a route lookup, fails to find a more specific prefix match, and forwards the traffic to its default route, which is the firewall.
  • The firewall does a route lookup, fails to find a more specific prefix match, ensures that the traffic is allowed by policy, performs a hide NAT to a public IP address, and forwards the traffic to its default route, which is the Internet.

So without any additional configuration, egress traffic should reach its destination. The problem is with the return traffic. When the response from the Internet destination gets back to the firewall, it will attempt to determine how to route back to the source. This route lookup will likely fail, causing the firewall to drop the traffic.

Note

In some cases, edge network equipment (the firewall, in this case) routes all private IP addressing back to the inside (the switch, in this case). In those scenarios, the firewall might forward the return traffic to the switch, but the switch won't have a specific return route either, causing the same problem.

In order for this to work, the firewall and the switch need to know how to return the traffic to the specific container. To do this, we need to add specific routes on each device pointing the docker0 bridge subnet back to the docker1 host:

[Figure: Static routes for the docker0 bridge subnet (172.10.10.0/24) added on the switch and the firewall, pointing back toward the docker1 host]
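What these routes look like depends entirely on the equipment in use, so treat the following as a sketch only. On a Linux-based switch or firewall, and assuming docker1's address on the switch-facing network is 10.10.10.101 (substitute the host's real address in your environment), the static route could be added like this:

admin@switch:~$ sudo ip route add 172.10.10.0/24 via 10.10.10.101

The firewall needs an equivalent route for 172.10.10.0/24 as well, with the switch's inside address as the next hop, so that return traffic can make it all the way back to the container.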

Once these routes are in place, containers spun up on the Docker host should have connectivity to outside networks:

user@docker1:~$ docker run -it --name=web1 jonlangemak/web_server_1 /bin/bash
root@132530812e1f:/# ping 4.2.2.2
PING 4.2.2.2 (4.2.2.2): 48 data bytes
56 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=33.805 ms
56 bytes from 4.2.2.2: icmp_seq=1 ttl=50 time=40.431 ms

A tcpdump on the Docker host will show that the traffic is leaving with the original container IP address:

user@docker1:~$ sudo tcpdump -n -i eth0 dst 4.2.2.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:54:42.197828 IP 172.10.10.2 > 4.2.2.2: ICMP echo request, id 3328, seq 0, length 56
10:54:43.198882 IP 172.10.10.2 > 4.2.2.2: ICMP echo request, id 3328, seq 256, length 56

This type of configuration offers the ability to use existing security appliances to decide whether containers can reach certain resources on the outside networks. However, this is also a function of how close the security appliance is to your Docker host. For instance, in this configuration the containers on the Docker host would be able to reach any other network endpoints connected to the switch. The enforcement point (the firewall, in this example) only allows you to limit the container's connectivity to the Internet. In addition, assigning routable IP space to each Docker host might introduce IP assignment constraints at large scale.
