Managing netfilter to Docker integration

By default, Docker performs most of the netfilter configuration for you. It takes care of things such as publishing ports and outbound masquerading, and it allows you to block or allow inter-container connectivity (ICC). However, all of this is optional, and you can tell Docker not to modify or add to any of your existing iptables rules. If you do this, you'll need to generate your own rules to provide similar functionality. This may be appealing if you're already using iptables rules extensively and don't want Docker to automatically make changes to your configuration. In this recipe, we'll discuss how to disable automatic iptables rule generation for Docker and show you how to manually create similar rules.

Getting ready

We'll be using a single Docker host in this example. It is assumed that the Docker host used in this lab is in its default configuration. You'll also need access to change Docker service-level settings. In some cases, the changes we make may require you to have root-level access to the system.

How to do it…

As we've already seen, Docker takes care of a lot of the heavy lifting for you when it comes to network configuration. It also allows you to configure these things on your own if need be. Before we look at doing it ourselves, let's confirm what Docker is actually configuring on our behalf with regard to iptables rules. Let's run the following containers:

user@docker1:~$ docker run -dP --name=web1 jonlangemak/web_server_1
f5b7b389890398588c55754a09aa401087604a8aa98dbf55d84915c6125d5e62
user@docker1:~$ docker run -dP --name=web2 jonlangemak/web_server_2
e1c866892e7f3f25dee8e6ba89ec526fa3caf6200cdfc705ce47917f12095470
user@docker1:~$

Running these containers would yield the following topology:

[Topology diagram: web1 (172.17.0.2) and web2 (172.17.0.3) attached to the docker0 bridge, with the host's eth0 and eth1 interfaces facing the outside network]

Note

The examples given later will not use the host's eth1 interface directly. It is displayed to illustrate how the rules generated by Docker are written in a manner that encompasses all physical interfaces on the Docker host.

As we've mentioned before, Docker uses iptables to handle the following items:

  • Outbound container connectivity (masquerading)
  • Inbound port publishing
  • Container-to-container connectivity

Since we're using the default configuration and we have published ports on both containers, we should be able to see all three of these items configured in iptables. Let's take a look at the NAT table to start with:

Note

In most cases, I prefer to print the rules and interpret them rather than have them listed in formatted columns. There are trade-offs to each approach, but if you prefer the list mode, you can replace the -S with -vL.
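
For example, the equivalent list-mode view of the NAT table, with packet counters, would be:

user@docker1:~$ sudo iptables -t nat -vL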

user@docker1:~$ sudo iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A POSTROUTING -s 172.17.0.3/32 -d 172.17.0.3/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32769 -j DNAT --to-destination 172.17.0.3:80
user@docker1:~$

Let's review the importance of the key lines in the preceding output. The first of these takes care of the outbound hide NAT, or MASQUERADE:

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

The rule is looking for traffic that matches two characteristics:

  • The source IP address must match the IP address space of the docker0 bridge
  • The traffic is not exiting through the docker0 bridge. That is, it's leaving through another interface such as eth0 or eth1

The jump statement at the end specifies a target of MASQUERADE, which will source NAT the container traffic to one of the host's IP interfaces based on the routing table.
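
If you want to see the translation in action, a packet capture on the host's egress interface will show the traffic sourced from the host's IP address rather than the container's. A quick sketch, assuming eth0 is the egress interface:

user@docker1:~$ sudo tcpdump -ni eth0 icmp

While the capture runs, ping an outside address from a container and note that the captured packets carry the host's IP as their source.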

The next two rules provide similar functionality and supply the NAT required for publishing ports on each respective container. Let's examine one of them:

-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80

The rule is looking for traffic that matches three characteristics:

  • The traffic is not entering through the docker0 bridge
  • The traffic is TCP
  • The traffic has a destination port of 32768

The jump statement at the end specifies a target of DNAT and a destination of the container with its real service port (80). Notice that both of these rules are generic in terms of the Docker host's physical interfaces. As we saw earlier, both port publishing and outbound masquerading can occur on any interface on the host unless we specifically limit the scope.
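
For instance, if you were writing the port publishing rule yourself and wanted it to apply to a single physical interface, you could match the inbound interface explicitly. The following is a sketch of such a variant, not a rule that Docker generates:

-A DOCKER -i eth0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80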

The next table we want to review is the filter table:

user@docker1:~$ sudo iptables -t filter -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER -d 172.17.0.3/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER-ISOLATION -j RETURN
user@docker1:~$

Again, you'll note that the chain policy is set to ACCEPT for the default chains. In the case of the filter table, this has a more drastic impact on functionality: everything is allowed unless specifically denied in a rule. In other words, if there were no rules defined, everything would still work. Docker inserts these rules in case your default policy is not set to ACCEPT. Later on, when we manually create the rules, we'll set the default policy to DROP so that you can see the impact the rules have. The preceding rules require a little more explaining, especially if you aren't familiar with how iptables rules work. Let's review the relevant lines one at a time.

The first rule takes care of allowing traffic from the outside network back into the containers. In this case, the rule is specific to instances where the container itself is generating traffic toward, and expecting a response from, the outside network:

-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

The rule is looking for traffic that matches two characteristics:

  • The traffic is exiting through the docker0 bridge
  • The traffic has a connection state of RELATED or ESTABLISHED. This would include sessions that are part of an existing flow or related to it

The jump statement at the end references a target of ACCEPT, which will allow the flow through.
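
If you're curious which container flows the connection tracker currently knows about, and you have the conntrack-tools package installed, you can list them. The source filter here is just an example container IP:

user@docker1:~$ sudo conntrack -L -s 172.17.0.2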

The second rule allows the container's connectivity to the outside network:

-A FORWARD -i docker0 ! -o docker0 -j ACCEPT

The rule is looking for traffic that matches two characteristics:

  • The traffic is entering through the docker0 bridge
  • The traffic is not exiting through the docker0 bridge

This is a very generic way of identifying traffic that came from the containers and is leaving through any interface other than the docker0 bridge. The jump statement at the end references a target of ACCEPT, which will allow the flow through. This rule, in conjunction with the first rule, allows a flow generated from a container toward the outside network to work.

The third rule allows inter-container connectivity:

-A FORWARD -i docker0 -o docker0 -j ACCEPT

The rule is looking for traffic that matches two characteristics:

  • The traffic is entering through the docker0 bridge
  • The traffic is exiting through the docker0 bridge

This is another generic means of identifying traffic that originated from a container on the docker0 bridge and is destined for a target on the docker0 bridge. The jump statement at the end references a target of ACCEPT, which will allow the flow through. This is the same rule that's turned into a DROP target when you disable ICC mode, as we saw in earlier chapters.
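
For reference, when ICC is disabled with --icc=false, Docker writes this rule with a DROP target instead:

-A FORWARD -i docker0 -o docker0 -j DROP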

The last two rules allow the published ports to reach the containers. Let's examine one of them:

-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT

The rule is looking for traffic that matches five characteristics:

  • The traffic is destined to the container whose port was published
  • The traffic is not entering through the docker0 bridge
  • The traffic is exiting through the docker0 bridge
  • The protocol is TCP
  • The port number is 80

This rule specifically allows the published port to work by allowing access to the container's service port (80). The jump statement at the end references a target of ACCEPT, which will allow the flow through.
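
A handy way to confirm that these rules are matching is to watch their packet counters increment while you connect to a published port:

user@docker1:~$ sudo iptables -t filter -vL DOCKER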

Manually creating the required iptables rules

Now that we've seen how Docker automatically handles rule generation, let's walk through an example of how to build this connectivity on our own. To do this, we first need to instruct Docker not to create any iptables rules by setting the --iptables option to false in the Docker systemd drop-in file:

ExecStart=/usr/bin/dockerd --iptables=false
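
A complete drop-in file might look like the following sketch; the path and filename are examples, and the empty ExecStart= line is needed so that systemd clears the packaged default before applying the override:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --iptables=false

After saving the file, reload systemd and restart Docker:

user@docker1:~$ sudo systemctl daemon-reload
user@docker1:~$ sudo systemctl restart docker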

Reloading systemd and restarting the Docker service, as shown above, forces Docker to reread the service parameters. To ensure that you start with a blank slate, either reboot the server or flush all of the iptables rules manually (if you're not comfortable managing iptables rules, rebooting the server is the safest way to clear them out). We'll assume for the rest of the example that we're working with an empty ruleset.
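
If you go the flush route rather than rebooting, clearing both the filter and NAT tables looks like this (note that -F removes rules but does not reset chain policies):

user@docker1:~$ sudo iptables -F
user@docker1:~$ sudo iptables -t nat -F

With the ruleset empty, restart the two containers and confirm that no rules are present: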

user@docker1:~$ docker start web1
web1
user@docker1:~$ docker start web2
web2
user@docker1:~$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
user@docker1:~$

As you can see, there are no iptables rules currently defined. We can also see that our default chain policy in the filter table is set to ACCEPT. Let's now change the default policy in the filter table to DROP for each chain. Along with that, let's also include a rule to allow SSH to and from the host so as not to break our connectivity:

user@docker1:~$ sudo iptables -A INPUT -i eth0 -p tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
user@docker1:~$ sudo iptables -A OUTPUT -o eth0 -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
user@docker1:~$ sudo iptables -P INPUT DROP
user@docker1:~$ sudo iptables -P FORWARD DROP
user@docker1:~$ sudo iptables -P OUTPUT DROP

Let's now check the filter table once again to make sure that the rules were accepted:

user@docker1:~$ sudo iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -o eth0 -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
user@docker1:~$

At this point, the containers web1 and web2 will no longer be able to reach each other:

user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
user@docker1:~$

Note

Depending on your operating system, you might notice that web1 actually is able to ping web2 at this point. The most likely reason for this is that the br_netfilter kernel module has not been loaded. Without this module bridged packets will not be inspected by netfilter. To resolve this, you can manually load the module by using the sudo modprobe br_netfilter command. To make the module load at each boot, you could add it to the /etc/modules file as well. When Docker is managing the iptables ruleset, it takes care of loading the module for you.
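
For example:

user@docker1:~$ lsmod | grep br_netfilter
user@docker1:~$ sudo modprobe br_netfilter
user@docker1:~$ echo 'br_netfilter' | sudo tee -a /etc/modules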

Now, let's start building the ruleset to recreate the connectivity that Docker previously built for us automatically. The first thing we want to do is allow containers inbound and outbound access. We'll do that with these two rules:

user@docker1:~$ sudo iptables -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
user@docker1:~$ sudo iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Although these two rules will allow containers to generate and receive traffic from the outside network, the connectivity still won't work at this point. For it to work, we need to apply the masquerade rule so that the container traffic is hidden behind an interface on the Docker host. If we don't do this, the traffic will never be returned, as the outside network knows nothing about the 172.17.0.0/16 network in which the containers live:

user@docker1:~$ sudo iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

With this in place, the containers will now be able to reach network endpoints on the outside network:

user@docker1:~$ docker exec -it web1 ping 4.2.2.2 -c 2
PING 4.2.2.2 (4.2.2.2): 48 data bytes
56 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=36.261 ms
56 bytes from 4.2.2.2: icmp_seq=1 ttl=50 time=55.271 ms
--- 4.2.2.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 36.261/45.766/55.271/9.505 ms
user@docker1:~$

However, the containers still cannot communicate directly with each other:

user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
user@docker1:~$ docker exec -it web1 curl -S http://172.17.0.3
user@docker1:~$

We need to add one final rule:

user@docker1:~$ sudo iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT

Since traffic between containers both enters and leaves the docker0 bridge, this will allow the inter-container connectivity:

user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
56 bytes from 172.17.0.3: icmp_seq=0 ttl=64 time=0.092 ms
56 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.086 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.086/0.089/0.092/0.000 ms
user@docker1:~$
user@docker1:~$ docker exec -it web1 curl http://172.17.0.3
<body>
  <html>
    <h1><span style="color:#FF0000;font-size:72px;">Web Server #2 - Running on port 80</span>
    </h1>
</body>
  </html>
user@docker1:~$

The only configuration remaining is to provide a mechanism to publish ports. We can do that by first provisioning a destination NAT on the Docker host itself. Even though Docker is not provisioning the NAT rules, it still keeps track of the port allocations on your behalf. At container runtime, if you choose to publish a port, Docker will allocate a port mapping for you even though it is not handling the publishing. It is wise to use the port Docker allocates to prevent overlaps:

user@docker1:~$ docker port web1
80/tcp -> 0.0.0.0:32768
user@docker1:~$ docker port web2
80/tcp -> 0.0.0.0:32769
user@docker1:~$
user@docker1:~$ sudo iptables -t nat -A PREROUTING ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80
user@docker1:~$ sudo iptables -t nat -A PREROUTING ! -i docker0 -p tcp -m tcp --dport 32769 -j DNAT --to-destination 172.17.0.3:80
user@docker1:~$

Using the ports Docker allocated, we can define an inbound NAT rule for each container that translates connections to an external port on the Docker host into the real container IP and service port. Finally, we just need to allow the inbound traffic:

user@docker1:~$ sudo iptables -A FORWARD -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
user@docker1:~$ sudo iptables -A FORWARD -d 172.17.0.3/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT

Once these rules are configured, we can now test the connectivity from outside the Docker host on the published ports:
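
From any machine that can reach the Docker host, a request to each published port should now return the respective container's page; docker1 here stands in for the Docker host's name or IP:

user@test:~$ curl http://docker1:32768
user@test:~$ curl http://docker1:32769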
