Kubernetes abstracts networking to enable communication between containers across nodes. The basic unit that makes this possible is the pod, the smallest deployable unit in Kubernetes, whose containers share a context. Containers within a pod can communicate with each other by port over localhost. Kubernetes deploys the pods across the nodes.
Then, how do pods talk to each other?
Kubernetes allocates each pod an IP address in a flat, shared network so that pods can communicate with each other across nodes. There are several ways to implement this; the easiest and most platform-independent is flannel.
Flannel gives each host an IP subnet, which Docker can accept and use to allocate IPs to individual containers. Flannel uses etcd to store the IP mapping information and offers several backend choices for forwarding packets. The simplest backend uses a TUN device to encapsulate IP fragments in UDP packets; the port is 8285 by default.
Flannel also supports an in-kernel VXLAN backend to encapsulate the packets. It can provide better performance than the UDP backend because it does not run in user space. Another popular choice is using advanced routing rules on Google Compute Engine (https://cloud.google.com/compute/docs/networking#routing). We'll use both UDP and VXLAN as examples in this section.
Flanneld is the flannel agent; it watches the information in etcd, allocates the subnet lease on each host, and routes the packets. What we will do in this section is get flanneld up and running and allocate a subnet for each host.
If you're struggling to decide which backend to use, here is a simple performance comparison between UDP and VXLAN. We use qperf (http://linux.die.net/man/1/qperf) to measure packet transfer performance between containers. In our test, one-way TCP streaming bandwidth through the UDP backend was about 0.3x slower than VXLAN when there was some load on the hosts. If you prefer building Kubernetes on the cloud, GCP is the easiest choice.
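The measurement itself can be sketched as follows. The server IP is an assumption, and qperf must be installed on both hosts; the server side runs plain `qperf`, while the client names the tests to run:

```shell
# Sketch of a qperf bandwidth/latency test between two hosts.
# The server address is an assumption; replace it with your host IP.
SERVER=10.42.1.171

# On the server host, run qperf with no arguments to start the listener;
# on the client host, request the TCP bandwidth and latency tests.
server_cmd="qperf"
client_cmd="qperf ${SERVER} tcp_bw tcp_lat"

echo "server: ${server_cmd}"
echo "client: ${client_cmd}"
```

Run the client command from a container on each backend configuration and compare the reported `tcp_bw` figures.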
Before installing flannel, make sure you have the etcd endpoint; flannel needs etcd as its datastore. If Docker is running, stop the Docker service first and delete docker0, the virtual bridge created by Docker:

# stop the docker service
$ service docker stop
# delete docker0
$ ip link delete docker0
Using the etcdctl command we learned in the previous section, insert the desired configuration into etcd under the key /coreos.com/network/config:
| Configuration key | Description |
|---|---|
| `Network` | IPv4 network for flannel to allocate to the entire virtual network |
| `SubnetLen` | The subnet prefix length allocated to each host; the default is `24` |
| `SubnetMin` | The beginning of the IP range for flannel subnet allocation |
| `SubnetMax` | The end of the IP range for flannel subnet allocation |
| `Backend` | Backend choice for forwarding the packets; the default is `udp` |
# insert the desired CIDR for the overlay network flannel creates
$ etcdctl set /coreos.com/network/config '{ "Network": "192.168.0.0/16" }'
By default, flannel will assign IP addresses within 192.168.0.0/16 for the overlay network, with a /24 for each host, but you can also overwrite its default settings and insert them into etcd:
$ cat flannel-config-udp.json
{
  "Network": "192.168.0.0/16",
  "SubnetLen": 28,
  "SubnetMin": "192.168.10.0",
  "SubnetMax": "192.168.99.0",
  "Backend": {
    "Type": "udp",
    "Port": 7890
  }
}
Use the etcdctl command to insert the flannel-config-udp.json configuration:
# insert the key from the json file
$ etcdctl set /coreos.com/network/config < flannel-config-udp.json
Then, flannel will allocate a /28 subnet to each host and only issue subnets between 192.168.10.0 and 192.168.99.0. The backend will still be udp, but the default port is changed from 8285 to 7890.
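A quick sanity check of that subnet arithmetic, as a sketch (flannel's allocator may treat the range boundaries slightly differently):

```shell
# Sanity-check the subnet math from flannel-config-udp.json.
# A /28 gives 2^(32-28) = 16 addresses per host subnet.
SUBNET_LEN=28
ADDRS_PER_SUBNET=$(( 1 << (32 - SUBNET_LEN) ))

# SubnetMin 192.168.10.0 and SubnetMax 192.168.99.0 as 32-bit integers
min=$(( (192 << 24) + (168 << 16) + (10 << 8) + 0 ))
max=$(( (192 << 24) + (168 << 16) + (99 << 8) + 0 ))

# Number of /28 subnets flannel could lease in that range (inclusive)
count=$(( (max - min) / ADDRS_PER_SUBNET + 1 ))
echo "addresses per host subnet: ${ADDRS_PER_SUBNET}"   # 16
echo "assignable host subnets:   ${count}"              # 1425
```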
We can also use VXLAN to encapsulate the packets; use etcdctl to insert the configuration:
$ cat flannel-config-vxlan.json
{
  "Network": "192.168.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}
# insert the key from the json file
$ etcdctl set /coreos.com/network/config < flannel-config-vxlan.json
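Since a malformed value would leave flannel unable to parse its configuration, it can be worth checking that the file is valid JSON before inserting it. A minimal pre-flight sketch, assuming python3 is available and using a /tmp copy of the file:

```shell
# Write the VXLAN configuration (same content as above) and validate it.
cfg=/tmp/flannel-config-vxlan.json
cat > "$cfg" <<'EOF'
{
  "Network": "192.168.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}
EOF

# python3 -m json.tool exits non-zero on malformed JSON
if python3 -m json.tool < "$cfg" > /dev/null 2>&1; then
  echo "valid JSON, safe to insert:"
  echo "etcdctl set /coreos.com/network/config < $cfg"
else
  echo "invalid JSON, fix $cfg before inserting" >&2
fi
```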
You can verify the configuration you set using etcdctl:
$ etcdctl get /coreos.com/network/config
{
  "Network": "192.168.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}
RHEL 7, CentOS 7, and later have an official package for flannel. You can install it via the yum command:
# install the flannel package
$ sudo yum install flannel
After the installation, we have to configure the etcd server in order to use the flannel service:
$ cat /etc/sysconfig/flanneld
# Flanneld configuration options

# etcd url location. Point this to the server where etcd runs
FLANNEL_ETCD="<your etcd server>"

# etcd config key. This is the configuration key that flannel queries
# for address range assignment
FLANNEL_ETCD_KEY="/coreos.com/network"

# Any additional options that you want to pass
#FLANNEL_OPTIONS=""
We should keep flanneld up and running whenever we boot the server. Using systemctl does the trick:
# enable the flanneld service on boot
$ sudo systemctl enable flanneld
# start flanneld
$ sudo service flanneld start
# check whether the service is running
$ sudo service flanneld status
You can always download a binary as an alternative. The official CoreOS flannel release page is https://github.com/coreos/flannel/releases. Choose the package with the Latest release tag; it will always include the latest bug fixes:
# download the flannel package
$ curl -L -O https://github.com/coreos/flannel/releases/download/v0.5.5/flannel-0.5.5-linux-amd64.tar.gz
# extract the package
$ tar zxvf flannel-0.5.5-linux-amd64.tar.gz
# copy flanneld to $PATH
$ sudo cp flannel-0.5.5/flanneld /usr/local/bin
If you used a startup script (systemd) in the etcd section, you will probably describe flanneld the same way:
$ cat /usr/lib/systemd/system/flanneld.service
[Unit]
Description=Flanneld overlay address etcd agent
Wants=etcd.service
After=etcd.service
Before=docker.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/flanneld
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=/usr/bin/flanneld -etcd-endpoints=${FLANNEL_ETCD} -etcd-prefix=${FLANNEL_ETCD_KEY} $FLANNEL_OPTIONS
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
Then, enable the service at boot using sudo systemctl enable flanneld.
Alternatively, you can use a startup script (init) under /etc/init.d/flanneld if you're using an init-based Linux distribution:
#!/bin/bash
# flanneld    This shell script takes care of starting and stopping flanneld
#
# Source function library.
. /etc/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

prog=/usr/local/bin/flanneld
lockfile=/var/lock/subsys/`basename $prog`
After you have sourced the libraries and set the variables, you should implement start, stop, status, and restart for the service. The only thing you need to take care of is adding the etcd endpoint to the command line when the daemon starts:
start() {
  # Start daemon.
  echo -n $"Starting $prog: "
  daemon $prog --etcd-endpoints="<your etcd server>" -ip-masq=true > /var/log/flanneld.log 2>&1 &
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch $lockfile
  return $RETVAL
}

stop() {
  [ "$EUID" != "0" ] && exit 4
  echo -n $"Shutting down $prog: "
  killproc $prog
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f $lockfile
  return $RETVAL
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  status)
    status $prog
    ;;
  restart|force-reload)
    stop
    start
    ;;
  try-restart|condrestart)
    if status $prog > /dev/null; then
      stop
      start
    fi
    ;;
  reload)
    exit 3
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
    exit 2
esac
If flannel gets stuck during startup, check that your etcd endpoint is accessible and that the key listed in FLANNEL_ETCD_KEY exists:
# FLANNEL_ETCD_KEY="/coreos.com/network/config"
$ curl -L http://<etcd endpoint>:2379/v2/keys/coreos.com/network/config
You can also check the flannel logs using sudo journalctl -u flanneld.
After the flannel service starts, you should be able to see the file /run/flannel/subnet.env and the flannel0 bridge in ifconfig.
To ensure flannel works well and transmits the packets from the Docker virtual interface, we need to integrate it with Docker.
Once flanneld is up and running, use the ifconfig or ip command to check whether there is a flannel0 virtual bridge in the interface list:

# check the current ipv4 range
$ ip a | grep flannel | grep inet
    inet 192.168.50.0/16 scope global flannel0
We can see from the preceding example that the subnet lease of flannel0 is 192.168.50.0/16.
When the flanneld service starts, flannel acquires the subnet lease, saves it in etcd, and then writes out the environment variable file /run/flannel/subnet.env by default; you can change the path with the --subnet-file parameter when launching it:

# check the flannel subnet configuration on this host
$ cat /run/flannel/subnet.env
FLANNEL_SUBNET=192.168.50.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=true
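The FLANNEL_MTU value here is not arbitrary: for the UDP backend it is the physical interface MTU minus the outer IP and UDP headers added by encapsulation. A sketch of the arithmetic, assuming a standard 1500-byte Ethernet MTU:

```shell
# The flannel MTU is the physical MTU minus the encapsulation overhead.
# For the UDP backend: outer IPv4 header (20 bytes) + UDP header (8 bytes).
PHYS_MTU=1500        # assumption: a standard Ethernet interface
IP_HDR=20
UDP_HDR=8

FLANNEL_MTU=$(( PHYS_MTU - IP_HDR - UDP_HDR ))
echo "expected FLANNEL_MTU for the UDP backend: ${FLANNEL_MTU}"   # 1472
```

This matches the FLANNEL_MTU=1472 shown in subnet.env above.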
The Docker daemon supports a couple of corresponding parameters. In /run/flannel/subnet.env, flannel has already allocated one subnet along with the suggested MTU and IPMASQ settings. The corresponding Docker parameters are:
| Parameter | Meaning |
|---|---|
| `--bip` | Specify the network bridge IP (`docker0`) |
| `--mtu` | Set the container network MTU (for `docker0` and `veth`) |
| `--ip-masq` | (Optional) Enable IP masquerading |
Import the variables from /run/flannel/subnet.env into the Docker daemon:

# import the environment variables from subnet.env
$ . /run/flannel/subnet.env
# launch the docker daemon with the flannel information
$ docker -d --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
# or, if your docker version is 1.8 or higher, use the daemon subcommand instead
$ docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
Alternatively, you can specify them in OPTIONS in /etc/sysconfig/docker, the Docker configuration file on CentOS:

### in the file /etc/sysconfig/docker
# set the variables in OPTIONS
OPTIONS="--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --ip-masq=${FLANNEL_IPMASQ}"
In the preceding example, ${FLANNEL_SUBNET} is replaced by 192.168.50.1/24 and ${FLANNEL_MTU} by 1472 in /etc/sysconfig/docker.
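To see the substitution concretely, this sketch recreates it against a sample subnet.env written to /tmp, with the values copied from the examples above:

```shell
# Simulate the substitution with a sample subnet.env (the real file
# lives at /run/flannel/subnet.env and is written by flanneld).
cat > /tmp/subnet.env <<'EOF'
FLANNEL_SUBNET=192.168.50.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=true
EOF

# Source the file, then build the Docker daemon options from it
. /tmp/subnet.env
OPTIONS="--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --ip-masq=${FLANNEL_IPMASQ}"
echo "${OPTIONS}"
# --bip=192.168.50.1/24 --mtu=1472 --ip-masq=true
```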
Start the Docker service with service docker start and type ifconfig; you should be able to see the virtual network device docker0 and the IP address allocated to it by flannel. There are now two virtual bridges, flannel0 and docker0, created in the previous steps. Let's take a look at their IP ranges using the ip command:
# check the local IPv4 networks
$ ip -4 a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 10.42.1.171/24 brd 10.42.21.255 scope global dynamic ens160
    inet 192.168.50.0/16 scope global flannel0
    inet 192.168.50.1/24 scope global docker0
The host IP address is 10.42.1.171/24, flannel0 is 192.168.50.0/16, docker0 is 192.168.50.1/24, and the route is set for the full flat IP range:
# check the routes
$ route -n
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.42.1.1       0.0.0.0         UG    100    0        0 ens160
192.168.0.0     0.0.0.0         255.255.0.0     U     0      0        0 flannel0
192.168.50.0    0.0.0.0         255.255.255.0   U     0      0        0 docker0
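This layout can also be checked mechanically: docker0's address must fall inside the flannel network for the flat routing to work. A small sketch using shell arithmetic, with the addresses taken from the preceding output:

```shell
# Verify that docker0's address (192.168.50.1) falls inside the flannel
# network (192.168.0.0/16) using plain shell arithmetic.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

addr=$(ip_to_int 192.168.50.1)   # docker0 address from the output above
net=$(ip_to_int 192.168.0.0)     # flannel Network
mask=$(( (0xFFFFFFFF << (32 - 16)) & 0xFFFFFFFF ))   # /16 netmask

if [ $(( addr & mask )) -eq $(( net & mask )) ]; then
  echo "docker0 is inside the flannel network"
fi
```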
Let's go a little deeper to see how etcd stores the flannel subnet information. You can retrieve the network configuration with the etcdctl command:
# get the network config
$ etcdctl get /coreos.com/network/config
{ "Network": "192.168.0.0/16" }
# show all the subnet leases
$ etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/192.168.50.0-24
The preceding example shows that the network CIDR is 192.168.0.0/16 and there is one subnet lease. Check the value of the key; it is exactly the IP address of eth0 on the host:
# show the value of the key /coreos.com/network/subnets/192.168.50.0-24
$ etcdctl get /coreos.com/network/subnets/192.168.50.0-24
{"PublicIP":"10.42.1.171"}
If you're using a backend other than simple UDP, you will see additional configuration, as follows:
# show the value when using a different backend
$ etcdctl get /coreos.com/network/subnets/192.168.50.0-24
{"PublicIP":"10.97.1.171","BackendType":"vxlan","BackendData":{"VtepMAC":"ee:ce:55:32:65:ce"}}
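The key names under /coreos.com/network/subnets follow a simple convention: the lease CIDR with the slash replaced by a dash. A small sketch (using bash-specific string substitution) deriving the key for the lease above:

```shell
# Derive the etcd key name for a flannel subnet lease:
# 192.168.50.0/24 -> /coreos.com/network/subnets/192.168.50.0-24
subnet_cidr="192.168.50.0/24"
key="/coreos.com/network/subnets/${subnet_cidr/\//-}"
echo "${key}"
# /coreos.com/network/subnets/192.168.50.0-24
```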
The following describes how a packet from Pod1 traverses the overlay network to Pod4. As we discussed before, every pod has its own IP address, and packets are encapsulated so that pod IPs are routable. A packet from Pod1 goes through the veth (virtual network interface) device that connects to docker0 and is routed to flannel0. The traffic is encapsulated by flanneld and sent to the host (10.42.1.172) of the target pod.
Let's perform a simple test by running two individual containers to see whether flannel works well. Assume we have two hosts (10.42.1.171 and 10.42.1.172) with different subnets allocated by flannel against the same etcd backend, and that we have launched a container with docker run -it ubuntu /bin/bash on each host:
Container 1 on host 1 (10.42.1.171):

root@0cd2a2f73d8e:/# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:c0:a8:3a:08
          inet addr:192.168.50.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:c0ff:fea8:3a08/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:8951  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)

root@0cd2a2f73d8e:/# ping 192.168.65.2
PING 192.168.65.2 (192.168.65.2) 56(84) bytes of data.
64 bytes from 192.168.65.2: icmp_seq=2 ttl=62 time=0.967 ms
64 bytes from 192.168.65.2: icmp_seq=3 ttl=62 time=1.00 ms

Container 2 on host 2 (10.42.1.172):

root@619b3ae36d77:/# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:c0:a8:04:0a
          inet addr:192.168.65.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:c0ff:fea8:40a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:8973  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)
We can see that the two containers can communicate with each other using ping. Let's observe the packets using tcpdump on host2; tcpdump is a command-line tool that can dump traffic on a network:
# install tcpdump on host2
$ yum install -y tcpdump
# observe the UDP traffic on host2
$ tcpdump host 10.42.1.172 and udp
11:20:10.324392 IP 10.42.1.171.52293 > 10.42.1.172.6177: UDP, length 106
11:20:10.324468 IP 10.42.1.172.47081 > 10.42.1.171.6177: UDP, length 106
11:20:11.324639 IP 10.42.1.171.52293 > 10.42.1.172.6177: UDP, length 106
11:20:11.324717 IP 10.42.1.172.47081 > 10.42.1.171.6177: UDP, length 106
The traffic between the containers is encapsulated in UDP through port 6177 by flanneld.
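To tally such a capture per sender, here is a small sketch using grep over the sample lines above:

```shell
# Save the sample capture lines (reproduced from the tcpdump output above).
cat > /tmp/capture.txt <<'EOF'
11:20:10.324392 IP 10.42.1.171.52293 > 10.42.1.172.6177: UDP, length 106
11:20:10.324468 IP 10.42.1.172.47081 > 10.42.1.171.6177: UDP, length 106
11:20:11.324639 IP 10.42.1.171.52293 > 10.42.1.172.6177: UDP, length 106
11:20:11.324717 IP 10.42.1.172.47081 > 10.42.1.171.6177: UDP, length 106
EOF

# Count encapsulated packets sent by each host; the source address is the
# one directly after "IP", so matching " IP <addr>." skips destinations.
sent_by_171=$(grep -c ' IP 10\.42\.1\.171\.' /tmp/capture.txt)
sent_by_172=$(grep -c ' IP 10\.42\.1\.172\.' /tmp/capture.txt)
echo "from 10.42.1.171: ${sent_by_171} packets"   # 2
echo "from 10.42.1.172: ${sent_by_172} packets"   # 2
```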
Having set up and examined the overlay network, we now have a good understanding of how flannel works in Kubernetes. Check out the following recipes: