Using the concepts demonstrated in previous chapters, let's walk through the creation and decomposition of a highly available router. In the following example, I've started out with an external provider network and a tenant network named GATEWAY_NET and TENANT_NET, respectively:
Using the router-create command with the --ha=true option, an HA router named MyRouter-HA will be created:
Upon the creation of the HA router, a network namespace will be created on at least two hosts running the Neutron L3 agent. In this demonstration, the L3 agent is running on the controller and both compute nodes. Neutron has been configured to create a network namespace on up to three L3 agents.
In the following screenshot, a router namespace that corresponds to the MyRouter-HA router can be observed on each host:
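The same check can be scripted. The following is a minimal sketch, assuming you have already collected the output of `ip netns` from each host by some means (SSH, a configuration management tool, and so on). The function names, the router ID, and the sample output are hypothetical; the only convention relied upon is that Neutron names router namespaces `qrouter-<router_id>`.

```python
from typing import Dict, List


def has_router_namespace(netns_output: str, router_id: str) -> bool:
    """Return True if `ip netns` output lists the namespace for router_id."""
    expected = "qrouter-" + router_id
    for line in netns_output.splitlines():
        # Some iproute2 versions append "(id: N)" after the namespace name,
        # so only the first whitespace-separated token is compared.
        tokens = line.split()
        if tokens and tokens[0] == expected:
            return True
    return False


def hosts_with_router(outputs_by_host: Dict[str, str], router_id: str) -> List[str]:
    """Return the sorted host names whose output contains the router namespace."""
    return sorted(
        host
        for host, output in outputs_by_host.items()
        if has_router_namespace(output, router_id)
    )


if __name__ == "__main__":
    # Hypothetical router ID and captured `ip netns` output, for illustration only.
    router_id = "00000000-0000-0000-0000-000000000001"
    sample = {
        "controller": "qrouter-" + router_id + "\nqdhcp-example\n",
        "compute01": "qrouter-" + router_id + " (id: 0)\n",
        "compute02": "qrouter-" + router_id + "\n",
    }
    print(hosts_with_router(sample, router_id))
```

If fewer hosts are returned than expected, the L3 agent on the missing host is a reasonable place to start troubleshooting.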
Upon the creation of the first HA router within a tenant, Neutron automatically creates a network reserved for communication between the routers. The network uses the range defined by the l3_ha_net_cidr configuration option in the L3 agent configuration file:
The HA network is not directly associated with the tenant and is not visible to anyone but an administrator, who can see all the networks:
The name of the HA network includes the ID of the tenant who created the router, and the network is used by Neutron for all HA routers created by this tenant in the future.
Both a gateway and internal interface will be attached using the router-gateway-set and router-interface-add commands, respectively. Neutron's router-port-list command reveals the gateway, internal, and HA ports associated with the router:
The three HA ports will be created automatically by Neutron and are used for communication between the router namespaces on the hosts. Inside the network namespaces, we can find the corresponding interfaces, as shown in the following figures:
In the preceding figures, the router namespace on compute02 was elected as the master router, as evidenced by the virtual IP, 169.254.0.1/24, being configured on the ha interface within the namespace. In addition to the ha interface, the qg and qr interfaces are only fully configured on the master router. If more than one router owns the virtual IP or you see the qg and qr interfaces fully configured on more than one router, there may be communication issues between the routers on the HA network.
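The check for more than one master can be automated. The following is a minimal sketch, assuming you have captured the output of `ip -o -4 addr show` from inside the router namespace on each host. The function names, interface names, and sample addresses are hypothetical; only the virtual IP, 169.254.0.1/24, and the `ha-` interface prefix come from the example in this chapter.

```python
from typing import Dict, List

VIP = "169.254.0.1/24"  # virtual IP used in this chapter's example


def holds_vip(ip_addr_output: str, vip: str = VIP) -> bool:
    """Return True if an ha- interface in the output carries the virtual IP."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> inet <address>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            ifname, address = fields[1], fields[3]
            if ifname.startswith("ha-") and address == vip:
                return True
    return False


def find_masters(outputs_by_host: Dict[str, str], vip: str = VIP) -> List[str]:
    """Return the sorted hosts whose router namespace owns the virtual IP."""
    return sorted(h for h, out in outputs_by_host.items() if holds_vip(out, vip))


def ha_status(outputs_by_host: Dict[str, str], vip: str = VIP) -> str:
    """Summarize the state: exactly one master is the healthy case."""
    masters = find_masters(outputs_by_host, vip)
    if not masters:
        return "no master"
    if len(masters) == 1:
        return "ok: " + masters[0]
    return "split-brain: " + ", ".join(masters)


if __name__ == "__main__":
    # Hypothetical captured output; interface names and HA addresses are made up.
    sample = {
        "controller": "9: ha-aaaaaaaa-01    inet 169.254.192.1/18 scope global ha-aaaaaaaa-01",
        "compute01": "9: ha-bbbbbbbb-02    inet 169.254.192.2/18 scope global ha-bbbbbbbb-02",
        "compute02": (
            "9: ha-cccccccc-03    inet 169.254.192.3/18 scope global ha-cccccccc-03\n"
            "9: ha-cccccccc-03    inet 169.254.0.1/24 scope global ha-cccccccc-03"
        ),
    }
    print(ha_status(sample))
```

A result that begins with `split-brain` points to the condition described above, in which the routers cannot exchange VRRP advertisements over the HA network.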
Based on Neutron's configuration of keepalived, the virtual IP is not used as a gateway address and is only used as a standardized address that can fail over to other routers as part of the VRRP failover mechanisms. Neutron does not treat addresses on the qg or qr interfaces as virtual IPs. Along with the virtual IP, the qg and qr interfaces should only be configured and active on the master router at any given time.
Neutron configures keepalived in each namespace so that together the namespaces can act as a single virtual router. A keepalived service runs in each namespace and uses the following configuration file on the underlying host:
/var/lib/neutron/ha_confs/<router_id>/keepalived.conf
A look at the configuration file shows keepalived and VRRP configurations in use:
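To pull the key VRRP settings out of such a file programmatically, a small parser is enough. The following is a minimal sketch: the sample configuration embedded in it is illustrative only, written in standard keepalived syntax rather than copied from a real deployment, and the instance name, interface name, priority, and timer values are assumptions. The helper names are hypothetical as well.

```python
import re
from typing import Dict, List, Optional, Union

# Illustrative keepalived configuration; all values are assumptions except
# the virtual IP, which matches this chapter's example.
SAMPLE_CONF = """\
vrrp_instance VR_1 {
    state BACKUP
    interface ha-0a1b2c3d-4e
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    virtual_ipaddress {
        169.254.0.1/24 dev ha-0a1b2c3d-4e
    }
}
"""


def setting(conf: str, key: str) -> Optional[str]:
    """Return the value of a single-line 'key value' setting, or None."""
    pattern = r"^[ \t]*" + re.escape(key) + r"[ \t]+(\S+)[ \t]*$"
    match = re.search(pattern, conf, re.MULTILINE)
    return match.group(1) if match else None


def virtual_ips(conf: str) -> List[str]:
    """Return the addresses listed in the virtual_ipaddress block."""
    match = re.search(r"virtual_ipaddress\s*\{([^}]*)\}", conf)
    if not match:
        return []
    return [line.split()[0] for line in match.group(1).splitlines() if line.strip()]


def vrrp_summary(conf: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Collect the settings most relevant to failover behavior."""
    return {
        "state": setting(conf, "state"),
        "interface": setting(conf, "interface"),
        "virtual_router_id": setting(conf, "virtual_router_id"),
        "priority": setting(conf, "priority"),
        "virtual_ips": virtual_ips(conf),
    }


if __name__ == "__main__":
    print(vrrp_summary(SAMPLE_CONF))
```

Comparing the summaries produced on each host is a quick way to confirm that every namespace shares the same virtual router ID and virtual IP.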
Under normal circumstances, a node is not likely to fail and router failover is unlikely to occur. A failure scenario can be recreated and failover tested by manually rebooting the node hosting the active router or by putting its physical interfaces in a down state.
Failover actions are logged in the following location within the router namespaces:
/var/lib/neutron/ha_confs/<router_id>/neutron-keepalived-state-change.log
An example of a router going from a backup state to master state once failure is detected can be observed in the following screenshot:
Once the former master router detects that a new master is elected, it moves to a backup state:
In the current Kilo release of OpenStack, there are outstanding bugs in relation to router failover and L2 population that affect both LinuxBridge- and Open vSwitch-based environments. When the l2population and vxlan drivers are used, Neutron programs the forwarding table with manual entries on each host to indicate which VXLAN tunnel to use when communicating with particular addresses.
Normally, when a router fails over to another node, the newly elected master sends a gratuitous ARP to indicate its new location, and MAC address tables or bridge forwarding tables are updated accordingly. Because Neutron is responsible for updating these tables dynamically based on the information in the database, the lack of notification to the Neutron agent when a router fails over means that stale information may remain in the FDB table when a failover occurs. Instances attempting to communicate with their respective gateway may utilize the wrong VXLAN tunnel, among other issues.
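The effect of the missing notification can be illustrated with a toy model. The following is a minimal sketch, not Neutron code: the `ForwardingTable` class, the MAC address, and the tunnel endpoint addresses are all hypothetical. It models a statically programmed table that maps a MAC address to a VXLAN tunnel endpoint and shows that the entry only changes when the agent is told to reprogram it.

```python
from typing import Dict, Optional


class ForwardingTable:
    """Toy model of a statically programmed FDB: MAC address -> tunnel endpoint."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def program(self, mac: str, vtep_ip: str) -> None:
        """Install or overwrite the entry, as an agent would on notification."""
        self._entries[mac] = vtep_ip

    def lookup(self, mac: str) -> Optional[str]:
        """Return the tunnel endpoint used to reach the MAC, if any."""
        return self._entries.get(mac)


def gateway_reachable_after_failover(notify_agent: bool) -> bool:
    """Return True if traffic for the router still uses the correct tunnel."""
    router_mac = "fa:16:3e:00:00:01"  # hypothetical MAC of the router port
    old_vtep = "192.0.2.11"           # hypothetical endpoint of the old master
    new_vtep = "192.0.2.12"           # hypothetical endpoint of the new master

    fdb = ForwardingTable()
    fdb.program(router_mac, old_vtep)  # state before the failover

    # The router fails over to the host behind new_vtep. A gratuitous ARP
    # does not change a manually programmed entry; only the agent can.
    if notify_agent:
        fdb.program(router_mac, new_vtep)

    return fdb.lookup(router_mac) == new_vtep


if __name__ == "__main__":
    print("without notification:", gateway_reachable_after_failover(False))
    print("with notification:", gateway_reachable_after_failover(True))
```

Without the notification, the lookup keeps returning the old endpoint, which corresponds to instances sending traffic for their gateway down the wrong VXLAN tunnel.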
A fix for this issue has been implemented for Open vSwitch in the latest Kilo release. However, the respective LinuxBridge fix remains incomplete. More information on this issue can be seen in the following Neutron bug report: