By default, Docker performs most of the netfilter configuration for you. It takes care of things such as publishing ports and outbound masquerading, and it allows you to block or allow inter-container communication (ICC). However, this is all optional, and you can tell Docker not to modify or add to any of your existing iptables rules. If you do, you'll need to generate your own rules to provide similar functionality. This may be appealing if you're already using iptables rules extensively and don't want Docker to automatically make changes to your configuration. In this recipe, we'll discuss how to disable automatic iptables rule generation for Docker and show you how to manually create similar rules.
We'll be using a single Docker host in this example. It is assumed that the Docker host used in this lab is in its default configuration. You'll also need access to change Docker service-level settings. In some cases, the changes we make may require you to have root-level access to the system.
As we've already seen, Docker takes care of a lot of the heavy lifting for you when it comes to network configuration. It also gives you the ability to configure these things on your own if need be. Before we do it ourselves, let's confirm what Docker is actually configuring on our behalf with regard to iptables rules. Let's run the following containers:
user@docker1:~$ docker run -dP --name=web1 jonlangemak/web_server_1
f5b7b389890398588c55754a09aa401087604a8aa98dbf55d84915c6125d5e62
user@docker1:~$ docker run -dP --name=web2 jonlangemak/web_server_2
e1c866892e7f3f25dee8e6ba89ec526fa3caf6200cdfc705ce47917f12095470
user@docker1:~$
Running these containers would yield the following topology:
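Based on the addresses and port mappings we'll see in the rules below, the topology looks roughly like this (the docker0 gateway address of 172.17.0.1 is the assumed default):

```
              Docker host (docker1)
                      eth0
                       |
               +-------+-------+
               |    docker0    |  172.17.0.1/16 (assumed default)
               +---+-------+---+
                   |       |
                 web1     web2
             172.17.0.2  172.17.0.3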
As we've mentioned before, Docker uses iptables to handle the following items:

- Outbound masquerading (hide NAT) for container traffic
- Publishing container ports to the outside network
- Blocking or allowing ICC
Since we're using the default configuration and we have published ports on both containers, we should be able to see all three of these items configured in iptables
. Let's take a look at the NAT table to start with:
user@docker1:~$ sudo iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A POSTROUTING -s 172.17.0.3/32 -d 172.17.0.3/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32769 -j DNAT --to-destination 172.17.0.3:80
user@docker1:~$
Let's review the importance of each of the bolded lines in the preceding output. The first bolded line takes care of the outbound hide NAT, or MASQUERADE:
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
The rule is looking for traffic that matches two characteristics:

- Traffic that is sourced from a container on the docker0 bridge (the 172.17.0.0/16 subnet)
- Traffic that is not leaving through the docker0 bridge. That is, it's leaving through another interface such as eth0 or eth1

The jump statement at the end specifies a target of MASQUERADE, which will source NAT the container traffic to one of the host's IP interfaces based on the routing table.
The next two bolded lines provide similar functionality and provide the NAT required for publishing ports on each respective container. Let's examine one of them:
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80
The rule is looking for traffic that matches three characteristics:

- Traffic that is not entering through the docker0 bridge
- Traffic that is using the TCP protocol
- Traffic that has a destination port of 32768

The jump statement at the end specifies a target of DNAT and a destination of the container with its real service port (80). Notice that both of these rules are generic in terms of the Docker host's physical interfaces. As we saw earlier, both port publishing and outbound masquerading can occur on any interface on the host unless we specifically limit the scope.
The next table we want to review is the filter table:
user@docker1:~$ sudo iptables -t filter -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER -d 172.17.0.3/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER-ISOLATION -j RETURN
user@docker1:~$
Again, you'll note that the chain policy is set to ACCEPT for the default chains. In the case of the filter table, this has a more drastic impact on functionality: it means that everything is allowed unless specifically denied by a rule. In other words, if there were no rules defined, everything would still work. Docker inserts these rules in case your default policy is not set to ACCEPT. Later on, when we manually create the rules, we'll set the default policy to DROP so that you can see the impact the rules have. The preceding rules require a little more explanation, especially if you aren't familiar with how iptables rules work. Let's review the bolded lines one at a time.
The first bolded line takes care of allowing traffic from the outside network back into the containers. In this case, the rule is specific to instances where the container itself is generating traffic toward, and expecting a response from, the outside network:
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
The rule is looking for traffic that matches two characteristics:

- Traffic that is leaving through the docker0 bridge
- Traffic that has a connection state of RELATED or ESTABLISHED. This would include sessions that are part of an existing flow or related to it

The jump statement at the end references a target of ACCEPT, which will allow the flow through.
The second bolded line allows the container's connectivity to the outside network:
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
The rule is looking for traffic that matches two characteristics:

- Traffic that is entering through the docker0 bridge
- Traffic that is not leaving through the docker0 bridge

This is a very generic way of identifying traffic that came from the containers and is leaving through any interface other than the docker0 bridge. The jump statement at the end references a target of ACCEPT, which will allow the flow through. This rule, in conjunction with the first rule, allows a flow generated from a container toward the outside network to work.
The third bolded line allows inter-container connectivity:
-A FORWARD -i docker0 -o docker0 -j ACCEPT
The rule is looking for traffic that matches two characteristics:

- Traffic that is entering through the docker0 bridge
- Traffic that is leaving through the docker0 bridge

This is another generic means to identify traffic that originates from a container on the docker0 bridge and is destined for a target on the docker0 bridge. The jump statement at the end references a target of ACCEPT, which will allow the flow through. This is the same rule that's turned into a DROP target when you disable ICC mode, as we saw in earlier chapters.
The last two bolded lines allow the published ports to reach the containers. Let's examine one of them:
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
The rule is looking for traffic that matches five characteristics:

- Traffic that is destined for the container at 172.17.0.2
- Traffic that is not entering through the docker0 bridge
- Traffic that is leaving through the docker0 bridge
- Traffic that is using the TCP protocol
- Traffic that has a destination port of 80

This rule specifically allows the published port to work by allowing access to the container's service port (80). The jump statement at the end references a target of ACCEPT, which will allow the flow through.
Now that we've seen how Docker automatically handles rule generation, let's walk through an example of how to build this connectivity on our own. To do this, we first need to instruct Docker not to create any iptables rules, which we do by setting the --iptables Docker option to false in the Docker systemd drop-in file:
ExecStart=/usr/bin/dockerd --iptables=false
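Pulling those steps together, a minimal sketch of the whole change might look like this (the drop-in path and file name are assumptions; any *.conf drop-in under docker.service.d works). Note the empty ExecStart= line, which systemd requires in order to clear the original value before overriding it:

```shell
# Create a systemd drop-in that starts dockerd with --iptables=false
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --iptables=false
EOF

# Reload systemd's view of the unit and restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
```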
We'll need to reload the systemd drop-in file and restart the Docker service for Docker to reread the service parameters. To ensure that you start with a blank slate, restart the server if possible, or flush all the iptables rules out manually (if you're not comfortable managing iptables rules, the best approach is just to reboot the server to clear them out). We'll assume for the rest of the example that we're working with an empty ruleset. Once Docker is restarted, you can restart your two containers and verify that there are no iptables rules present on the system:
user@docker1:~$ docker start web1
web1
user@docker1:~$ docker start web2
web2
user@docker1:~$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
user@docker1:~$
As you can see, there are no iptables rules currently defined. We can also see that our default chain policy in the filter table is set to ACCEPT. Let's now change the default policy in the filter table to DROP for each chain. Along with that, let's also include rules to allow SSH to and from the host so as not to break our connectivity:
user@docker1:~$ sudo iptables -A INPUT -i eth0 -p tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
user@docker1:~$ sudo iptables -A OUTPUT -o eth0 -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
user@docker1:~$ sudo iptables -P INPUT DROP
user@docker1:~$ sudo iptables -P FORWARD DROP
user@docker1:~$ sudo iptables -P OUTPUT DROP
Let's now check the filter table once again to make sure that the rules were accepted:
user@docker1:~$ sudo iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -o eth0 -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
user@docker1:~$
At this point, the containers web1
and web2
will no longer be able to reach each other:
user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
user@docker1:~$
Depending on your operating system, you might notice that web1 actually is able to ping web2 at this point. The most likely reason is that the br_netfilter kernel module has not been loaded. Without this module, bridged packets will not be inspected by netfilter. To resolve this, you can manually load the module with the sudo modprobe br_netfilter command. To make the module load at each boot, you can add it to the /etc/modules file as well. When Docker is managing the iptables ruleset, it takes care of loading the module for you.
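The fix just described can be sketched as follows (the /etc/modules path assumes a Debian/Ubuntu-style system):

```shell
# Load the br_netfilter module now so bridged traffic hits netfilter
sudo modprobe br_netfilter

# Persist the module across reboots (path assumes Debian/Ubuntu)
echo 'br_netfilter' | sudo tee -a /etc/modules

# Verify that bridged IPv4 traffic is now passed to iptables (should be 1)
sysctl net.bridge.bridge-nf-call-iptables
```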
Now, let's start building the ruleset to recreate the connectivity that Docker previously built for us automatically. The first thing we want to do is allow containers inbound and outbound access. We'll do that with these two rules:
user@docker1:~$ sudo iptables -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
user@docker1:~$ sudo iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
Although these two rules will allow containers to generate and receive traffic from the outside network, the connectivity still won't work at this point. In order for it to work, we need to apply the masquerade rule so that the container traffic is hidden behind an interface on the Docker host. If we don't, the traffic will never be returned, because the outside network knows nothing about the 172.17.0.0/16 network in which the containers live:
user@docker1:~$ sudo iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
With this in place, the containers will now be able to reach network endpoints on the outside network:
user@docker1:~$ docker exec -it web1 ping 4.2.2.2 -c 2
PING 4.2.2.2 (4.2.2.2): 48 data bytes
56 bytes from 4.2.2.2: icmp_seq=0 ttl=50 time=36.261 ms
56 bytes from 4.2.2.2: icmp_seq=1 ttl=50 time=55.271 ms
--- 4.2.2.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 36.261/45.766/55.271/9.505 ms
user@docker1:~$
However, the containers still cannot communicate directly with each other:
user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
user@docker1:~$ docker exec -it web1 curl -S http://172.17.0.3
user@docker1:~$
We need to add one final rule:
sudo iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT
Since traffic between containers both enters and leaves the docker0
bridge, this will allow the inter-container connectivity:
user@docker1:~$ docker exec -it web1 ping 172.17.0.3 -c 2
PING 172.17.0.3 (172.17.0.3): 48 data bytes
56 bytes from 172.17.0.3: icmp_seq=0 ttl=64 time=0.092 ms
56 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.086 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.086/0.089/0.092/0.000 ms
user@docker1:~$
user@docker1:~$ docker exec -it web1 curl http://172.17.0.3
<body>
  <html>
    <h1><span style="color:#FF0000;font-size:72px;">Web Server #2 - Running on port 80</span>
    </h1>
  </body>
  </html>
user@docker1:~$
The only configuration remaining is to provide a mechanism for publishing ports. We can do that by first provisioning a destination NAT on the Docker host itself. Even though Docker is not provisioning the NAT rules, it still keeps track of the port allocations on your behalf. At container runtime, if you choose to publish a port, Docker will allocate a port mapping for you even though it is not handling the publishing. It is wise to use the port Docker allocates to prevent overlaps:
user@docker1:~$ docker port web1
80/tcp -> 0.0.0.0:32768
user@docker1:~$ docker port web2
80/tcp -> 0.0.0.0:32769
user@docker1:~$
user@docker1:~$ sudo iptables -t nat -A PREROUTING ! -i docker0 -p tcp -m tcp --dport 32768 -j DNAT --to-destination 172.17.0.2:80
user@docker1:~$ sudo iptables -t nat -A PREROUTING ! -i docker0 -p tcp -m tcp --dport 32769 -j DNAT --to-destination 172.17.0.3:80
user@docker1:~$
Using the ports Docker allocated, we can define an inbound NAT rule for each container that translates inbound connectivity to an external port on the Docker host to the real container IP and service port. Finally, we just need to allow inbound traffic:
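With more than a couple of containers, typing these DNAT rules by hand gets tedious. As a rough sketch of automating it (the make_dnat_rule helper is our own illustration, not a Docker feature), you could derive the rule from the mapping string that docker port prints:

```shell
make_dnat_rule() {
  mapping="$1"          # e.g. "80/tcp -> 0.0.0.0:32768" from `docker port`
  container_ip="$2"     # e.g. "172.17.0.2"
  cport=${mapping%%/*}  # container service port: everything before the first "/"
  hport=${mapping##*:}  # published host port: everything after the last ":"
  echo "iptables -t nat -A PREROUTING ! -i docker0 -p tcp -m tcp" \
    "--dport ${hport} -j DNAT --to-destination ${container_ip}:${cport}"
}

make_dnat_rule "80/tcp -> 0.0.0.0:32768" "172.17.0.2"
```

The function only prints the rule; you could pipe its output to sudo sh, or eyeball it before applying, depending on how much you trust the input.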
user@docker1:~$ sudo iptables -A FORWARD -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
user@docker1:~$ sudo iptables -A FORWARD -d 172.17.0.3/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
Once these rules are configured, we can now test the connectivity from outside the Docker host on the published ports:
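For example, from another machine with reachability to the Docker host, you might exercise each published port like this (the docker1 hostname is an assumption; substitute your host's address):

```shell
# Print a curl test command for each port Docker allocated above.
# "docker1" is an assumed hostname for the Docker host.
HOST=docker1
for port in 32768 32769; do
  echo "curl http://${HOST}:${port}/"
done
```

Running the printed commands from an outside host should return each container's web page, matching the output we saw earlier with docker exec.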