OpenStack has been designed for highly scalable environments where single points of failure (SPOFs) can be avoided, but sometimes you must build this resilience into your environment yourself. For example, Keystone is a central service underpinning your entire OpenStack environment, so you would run multiple instances of it. Glance is another service that is key to the running of your OpenStack environment. By setting up multiple instances running these services, controlled with Pacemaker and Corosync, we gain resilience to the failure of the nodes running them. Using Pacemaker and Corosync is one way of providing a highly available solution for OpenStack services. This recipe is designed to give you options for your deployments and to allow you to use Pacemaker and Corosync elsewhere in your environment.
For this recipe, we will assume that there are two controller nodes available, both running Glance and Keystone. Installation of Keystone and Glance was covered in the first two chapters of this book.
The first node, controller1, has a host management address of 192.168.100.221. The second node, controller2, has a host management address of 192.168.100.222.
Visit https://github.com/OpenStackCookbook/Controller-Corosync.git for a two-node OpenStack Controller example that accompanies this section.
To install Pacemaker and Corosync on the two servers that will be running OpenStack services such as Keystone and Glance, carry out the following steps.
We begin on controller1 by installing the pacemaker and corosync packages:

sudo apt-get update
sudo apt-get install pacemaker corosync
Next, add entries to the /etc/hosts file to avoid DNS lookups:

192.168.100.221 controller1.book controller1
192.168.100.222 controller2.book controller2
Edit the /etc/corosync/corosync.conf file so that the interface section matches the following:

interface {
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 192.168.100.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
}
By default, the corosync service isn't set to start. To ensure that it starts, edit the /etc/default/corosync file and set START=yes, as follows:

sudo sed -i 's/^START=no/START=yes/g' /etc/default/corosync
Generate an authentication key for the cluster; this key will be shared with the other node:

sudo corosync-keygen
While the corosync-keygen command is waiting for entropy, run the following loop in another terminal to generate disk activity until corosync-keygen completes:

while /bin/true; do
    dd if=/dev/urandom of=/tmp/100 bs=1024 count=100000
    for i in {1..10}; do
        cp /tmp/100 /tmp/tmp_${i}_$RANDOM
    done
    rm -f /tmp/tmp_* /tmp/100
done
Once the corosync-keygen command has finished running and the authkey file has been generated, simply press Ctrl + C to cancel the entropy creation loop.
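As an alternative to the copy loop above, you could install an entropy-gathering daemon such as haveged, which feeds the kernel entropy pool and usually lets corosync-keygen complete without manual intervention:

sudo apt-get install haveged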
We now need to install Pacemaker and Corosync on our second host, controller2. Install the pacemaker and corosync packages as follows:

sudo apt-get update
sudo apt-get install pacemaker corosync
Ensure that the /etc/hosts file has the same entries as on our first host:

192.168.100.221 controller1.book controller1
192.168.100.222 controller2.book controller2
As before, the corosync service isn't set to start by default. To ensure that it starts, edit the /etc/default/corosync file and set START=yes:

sudo sed -i 's/^START=no/START=yes/g' /etc/default/corosync
With the /etc/corosync/corosync.conf file modified and the /etc/corosync/authkey file generated on controller1, we copy these files to the other node (or nodes) in our cluster:

scp /etc/corosync/corosync.conf /etc/corosync/authkey [email protected]:
On controller2, we can now put the same corosync.conf file as used by our first node and the generated authkey file into /etc/corosync:

sudo mv corosync.conf authkey /etc/corosync
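Because the files arrived in the openstack user's home directory, it is worth making sure that the authkey file ends up owned by root with restrictive permissions, matching how corosync-keygen created it. A small precaution along these lines:

sudo chown root:root /etc/corosync/authkey
sudo chmod 0400 /etc/corosync/authkey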
Start the pacemaker and corosync services on both nodes:

sudo service pacemaker start
sudo service corosync start
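Optionally, before querying Pacemaker, you can check that Corosync itself has formed its ring; the corosync-cfgtool utility ships with Corosync and prints the local ring status:

sudo corosync-cfgtool -s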
We can use the crm_mon command to query the cluster status:

sudo crm_mon -1
This will return output in which the important information includes the number of nodes configured, the expected number of nodes, and the list of our two nodes that are online.
We can verify the cluster with the crm_verify command:

sudo crm_verify -L -V
As this is a simple two-node setup without fencing hardware, disable stonith:

sudo crm configure property stonith-enabled=false
Running crm_verify again will now show no errors:

sudo crm_verify -L
As we have only two nodes, we also disable quorum using the following command:

sudo crm configure property no-quorum-policy=ignore
On controller1, we can now configure our services and set up a floating IP address that will be shared between the two servers. In the following command, we've chosen 192.168.100.253 as the floating IP address and a monitoring interval of 5 seconds. To do this, we use the crm command again to configure a resource that we will call FloatingIP:

sudo crm configure primitive FloatingIP ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.253 cidr_netmask=32 \
    op monitor interval=5s
Running crm_mon again, we can now see that the FloatingIP address has been assigned to our controller1 host:

sudo crm_mon -1
The output now shows that we have one resource configured for this setup (our FloatingIP resource).
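You can also confirm this at the interface level on controller1. The IPaddr2 agent adds the floating address as a secondary IP on the bound interface, so it should appear in the ip command's output:

ip addr show | grep 192.168.100.253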
Clients now use the FloatingIP address of 192.168.100.253 to connect to our first node. When we power that node off, this address will move to our second node after 5 seconds of no response from the first node. We can test this floating IP address by executing the following commands from either of the controller hosts:

export OS_TENANT_NAME=cookbook
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_AUTH_URL=https://192.168.100.253:5000/v2.0/
keystone --insecure endpoint-list
This returns the Keystone endpoint list, confirming that the service is reachable through the floating IP address.
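To exercise the failover itself, one simple test is to stop the cluster services on the active node and watch the address move; this is a sketch that simulates a failure less drastically than powering the node off:

# On controller1, simulate a node failure:
sudo service pacemaker stop
sudo service corosync stop

# On controller2, confirm that the FloatingIP resource has moved, then
# repeat the endpoint-list test with the same OS_* variables exported:
sudo crm_mon -1
keystone --insecure endpoint-list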
Making OpenStack services highly available is a complex subject, and there are a number of ways to achieve it. Using Pacemaker and Corosync is a very good solution to this problem. It allows us to configure a floating IP address, assigned to the cluster, that attaches itself to the appropriate node (using Corosync), as well as to control services using resource agents, so that the cluster manager can start and stop services as required to provide a highly available experience to the end user.
We install both Keystone and Glance on the two nodes, each configured appropriately with a remote database backend (such as MySQL with Galera) and with the images available on a shared filesystem or a cloud storage solution. This allows us to place these services under Pacemaker's control so that Pacemaker can monitor them. If a required service becomes unavailable on the active node, Pacemaker can start it on the passive node.
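To make that concrete, the following sketch shows one way to place Keystone and Glance under Pacemaker's control and tie them to the floating IP address. The resource names (p_keystone and p_glance-api) are our own, and the lsb: resource class assumes working /etc/init.d scripts for both services on both nodes; dedicated OCF agents for the OpenStack services exist as an alternative, depending on your distribution:

# A minimal sketch, assuming /etc/init.d/keystone and /etc/init.d/glance-api
# exist on both nodes; resource names p_keystone and p_glance-api are ours.
sudo crm configure primitive p_keystone lsb:keystone op monitor interval=30s
sudo crm configure primitive p_glance-api lsb:glance-api op monitor interval=30s

# Run each service on whichever node currently holds the FloatingIP resource...
sudo crm configure colocation keystone-with-ip inf: p_keystone FloatingIP
sudo crm configure colocation glance-with-ip inf: p_glance-api FloatingIP

# ...and bring up the address before starting the services on that node.
sudo crm configure order ip-before-keystone inf: FloatingIP p_keystone
sudo crm configure order ip-before-glance inf: FloatingIP p_glance-api

With constraints like these in place, a failure of the active node moves the floating IP and the services together, which is what presents a single, stable endpoint to the end user.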