Nova is one of the central services of OpenStack, and it is also one of the largest in terms of lines of code. It is one of the oldest OpenStack projects and has seen a lot of change over the years. Nova also interacts with many of the other OpenStack services, so isolating and troubleshooting problems with Nova can be challenging. In this chapter, we will give you the tips you need to be successful.
When troubleshooting Nova, it helps to follow a series of steps as you seek to isolate the problems you may encounter. In this chapter, we will work through each of the following topics step by step as we troubleshoot Nova:
A successful Nova deployment will have multiple Nova services running, and, in addition, there will be multiple supporting services at play as well. A good first step when troubleshooting is to make sure that each of the services has been successfully initiated. We can check the various Nova services by running this command:
ps -aux | grep nova-
Be sure to include a dash (-), as the Nova services are prefixed with nova-. There are a lot of Nova processes, and in the following sections, we will look at each of these processes. The processes that we will explore are as follows:
nova-api
nova-scheduler
nova-conductor
nova-compute
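The per-service checks in the following sections can be combined into a single pass. The sketch below is one possible way to do that, assuming a POSIX shell; the function name missing_nova_services is hypothetical and is not part of Nova. It reads a process listing on standard input and prints each of the four services that does not appear in it.

```shell
# Print the Nova services that are absent from a process listing read on stdin.
# missing_nova_services is an illustrative helper, not a Nova command.
missing_nova_services() {
    mns_listing=$(cat)
    for mns_svc in nova-api nova-scheduler nova-conductor nova-compute; do
        # grep -q prints nothing; only its exit status is used
        if ! printf '%s\n' "$mns_listing" | grep -q "$mns_svc"; then
            printf '%s\n' "$mns_svc"
        fi
    done
}

# On a live node; the [n] stops grep from matching its own command line:
#   ps aux | grep '[n]ova-' | missing_nova_services
```

Keep in mind that not every service runs on every node. On a controller that does not host instances, for example, nova-compute would be reported as missing even though nothing is wrong.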
The Nova API service is usually run on the controller node. Nova supports an OpenStack API, which is the default, in addition to an AWS EC2 API. A request to port 8774 will be handled by the OpenStack API. A request to port 8773 will be handled by the AWS EC2 API. Nova also supports a metadata service, which will listen on port 8775.
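A quick way to see which of these ports currently has a listener is to inspect the socket table. The following is a minimal sketch, assuming the ss utility from iproute2 is installed and that its -ltn output uses the usual column layout (state first, local address fourth); port_is_listening is a hypothetical helper name.

```shell
# Succeed if the given TCP port shows up as a local listening address
# in `ss -ltn` style output supplied on stdin.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# On a live controller node:
#   for p in 8774 8773 8775; do
#       ss -ltn | port_is_listening "$p" || echo "nothing listening on port $p"
#   done
```

Reading from standard input rather than calling ss directly makes it possible to try the helper against saved output from another host.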
To confirm that the nova-api service is running, execute the following command:
ps -aux | grep nova-api
When nova-api is running as expected, the output from this command will look like the output shown here:
If the Nova API is not running, the preceding ps -aux command will return only the grep command itself and no nova-api processes. Also, when you attempt to use the command-line client with Nova, you may encounter an error like the following one:
This is your first clue to a problem with the nova-api service. Your first course of action will be to attempt to start the service. On Ubuntu systems that use upstart, you can run this command:
start nova-api
Once you start the service, you should make sure that it has actually started and continues running successfully. At this point, you want to run ps -aux | grep nova-api again and make sure that the nova-api process is still up.
If the nova-api process isn't returned in the output, try to start the process manually. When you start an OpenStack process manually, that is, without the init scripts, any errors during startup will be printed to the console. If you are dealing with a process that fails on startup, your log files will most likely be empty, so starting the process manually will provide you with the clues that you need to troubleshoot further. To start the nova-api service manually, execute the following command:
sudo -u nova nova-api --config-file=/etc/nova/nova.conf
As the preceding command is executed, you will see the startup values printed to the console. Toward the end of this output, look for the lines indicating that your APIs have started up:
If you do not see lines similar to the lines shown in the preceding code snippet, you will most likely be staring at an error somewhere in the output. The good news is that this error will provide sufficient information to determine what is stopping the service from starting. While we cannot cover every potential cause within the confines of this book, we will take a look at the following few potential causes.
Suppose that, when starting the nova-api service manually, you see an error like the one shown here:
ERROR nova error: [Errno 98] Address already in use
This error means that something else is already running on port 8774, which Nova uses for the API service. You can further troubleshoot this issue by running this command:
lsof -i :8774
This command will tell you what is running on port 8774. Once you clear this port, you can attempt to start the nova-api service again by running start nova-api. As always, we want to check whether the nova-api process has started successfully by running the ps -aux | grep nova-api command. If the API has not started successfully, we can attempt to start it manually, as we did before, and look for the error output.
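When several sockets are open on the port, the lsof listing can repeat the same process many times. The sketch below condenses that listing to one line per process. It assumes the default lsof -i column layout, with the command name in the first column and the PID in the second; port_owners is a hypothetical helper name.

```shell
# Print "PID COMMAND" once for each process found in `lsof -i` output on stdin.
port_owners() {
    # NR > 1 skips the header row; seen[] suppresses repeated PIDs
    awk 'NR > 1 && !seen[$2]++ { print $2, $1 }'
}

# On a live node:
#   lsof -i :8774 | port_owners
```

With the PID in hand, you can decide whether the process is a stale nova-api instance or an unrelated service that needs to be moved to another port.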
Suppose that, when you attempt to start the nova-api process manually, you receive an error like the one shown here:
An error like the preceding one points to a permission or ownership problem with the Nova configuration file, typically located at /etc/nova/nova.conf:
chmod 644 /etc/nova/nova.conf
chown nova:nova /etc/nova/nova.conf
The Nova configuration file needs to be readable by the Nova user. The preceding chmod and chown commands will set the proper permissions and ownership for this configuration file. After this fix, you can attempt to start the nova-api service again and verify that it is running successfully. If it doesn't start successfully, remember to check the nova-api.log file for clues.
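Before and after changing anything, it can help to look at the current owner and mode of the file. The following is a small sketch, assuming GNU coreutils stat (the -c format option is not available in BSD stat); show_conf_perms is a hypothetical helper name.

```shell
# Print "owner:group mode path" for a file, using GNU stat format codes:
# %U owner name, %G group name, %a octal mode, %n file name.
show_conf_perms() {
    stat -c '%U:%G %a %n' "$1"
}

# On a live node:
#   show_conf_perms /etc/nova/nova.conf
# A more direct test of what the service itself can do:
#   sudo -u nova test -r /etc/nova/nova.conf && echo readable
```

The second form checks readability as the nova user, which also catches problems with the permissions of the parent directory.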
The Nova scheduler service is responsible for selecting the compute node that will host a particular instance. If this service is not operating as expected, you will notice problems when trying to create new instances. To check whether the nova-scheduler service is running, we can use the following command:
ps -aux | grep nova-scheduler
The output of this command should have a line similar to the following one:
If the Nova scheduler service does not start properly, there are a couple of things you should check. The first troubleshooting step is to attempt to start the nova-scheduler service manually. You can do this by running the following command:
sudo -u nova /usr/bin/python /usr/local/bin/nova-scheduler --config-file=/etc/nova/nova.conf
Any errors returned from this command should give you clues as to why the nova-scheduler service isn't starting. One error that you may see here is as follows:
As we've seen before, the Nova configuration file located at /etc/nova/nova.conf needs to be readable by the Nova user. This error will cause problems with several Nova services, including the Nova scheduler. This problem can be resolved if you make sure that the configuration file is readable by the Nova user.
Once the nova-scheduler service is running successfully, you may still discover problems with it. Your troubleshooting process should continue by looking at the nova-scheduler log for clues. The nova-scheduler log is typically located at /var/log/nova/nova-scheduler.log. It is helpful to grep this log for errors by using a command like the one shown here:
grep 'ERROR' /var/log/nova/nova-scheduler.log
The output of this command will list any errors captured in the scheduler log files. There are a few errors to look out for in particular. To operate correctly, the Nova scheduler requires access to the OpenStack message broker and the Nova database.
The preceding error indicates that the nova-scheduler service is not able to connect to the AMQP server. In this instance, you want to make sure that the message broker is running and accessible. If you are using RabbitMQ, you can check its status by running this command:
rabbitmqctl status
When the RabbitMQ service is not running, the output of this command will look similar to the output shown here:
The fix for this problem is to start your message broker. For RabbitMQ, you can use the following command to start the message broker:
service rabbitmq-server start
You can confirm that RabbitMQ has started successfully by running the rabbitmqctl status command again. If RabbitMQ starts successfully, you will see an output similar to the following one:
In the nova-scheduler.log file, you should also see a confirmation that the scheduler was able to successfully connect to the message broker. Look for log lines like the ones shown in this code snippet:
2015-09-27 23:52:28.248 2355 INFO oslo.messaging._drivers.impl_rabbit [req-0c95b20c-a70d-40c8-bb90-deeec2f0cd47 - - - - -] Reconnected to AMQP server on myrabbitserver:5672
2015-09-27 23:52:28.249 2355 INFO oslo.messaging._drivers.impl_rabbit [req-0c95b20c-a70d-40c8-bb90-deeec2f0cd47 - - - - -] Connected to AMQP server on myrabbitserver:5672
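Because a log can contain many connection attempts, the most recent message about the broker is usually the one that matters. The sketch below prints only that line. It assumes that broker-related messages contain the text "AMQP server", as the lines above do; last_amqp_event is a hypothetical helper name.

```shell
# Print the most recent log line that mentions the AMQP server.
last_amqp_event() {
    grep 'AMQP server' "$1" | tail -n 1
}

# On a live node:
#   last_amqp_event /var/log/nova/nova-scheduler.log
```

If the last line reports a successful connection, the scheduler has recovered. Otherwise, the broker is probably still unreachable from this host.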
The Nova compute service needs to be running on each of the compute nodes. You can check whether the service is running by executing the following command:
ps -aux | grep nova-compute
If you find that the nova-compute service is not running, you can start the service by executing the following command on Ubuntu:
start nova-compute
After you attempt to start the nova-compute service, make sure that it is running successfully using the ps -aux command again. If the service does not start and remain running, you should try to start the service manually to check whether there are any errors printed out to the console. Use the following command to start the nova-compute service manually:
sudo -u nova nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf
After executing this command, there will be several log lines containing the startup information for the service. Be on the lookout for any errors or traces printed in this output. If something pops up, use the details of the error to troubleshoot further.
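Startup output can scroll by quickly, so it may be easier to capture it and filter it afterwards. The following is a rough sketch, assuming that problem lines carry an ERROR, CRITICAL, or TRACE level surrounded by spaces, or contain a standard Python traceback header. The helper name startup_problems and the capture path in the usage comment are hypothetical.

```shell
# Keep only the lines of captured startup output that look like problems.
startup_problems() {
    grep -E ' (ERROR|CRITICAL|TRACE) |Traceback \(most recent call last\)'
}

# On a live compute node (stop the service with Ctrl+C once it has settled):
#   sudo -u nova nova-compute --config-file=/etc/nova/nova.conf \
#       --config-file=/etc/nova/nova-compute.conf 2>&1 | tee /tmp/nova-compute-start.out
#   startup_problems < /tmp/nova-compute-start.out
```

The filter is deliberately loose. Treat its output as a list of places to start reading, and go back to the full capture for context.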
The compute service is responsible for interacting with the underlying hypervisor and plays a critical role when manipulating instances in OpenStack. Problems with the Nova compute service can therefore result in multiple issues or errors. For example, if you attempt to launch a new instance without running the nova-compute service, you may see that the instance eventually ends up with an ERROR status. When you run nova list, you may see an output like this:
You will also notice that the status of the instance is ERROR. As demonstrated earlier, the first step is to make sure that the nova-compute service is running. If it is and you are still experiencing problems, there are several reasons why you may find your instance in this state. To find more clues about the root cause, we should begin looking through the Nova logs. When troubleshooting an instance with an ERROR state, you will want to look for errors in any of the following log files:
/var/log/nova/nova-compute.log
/var/log/nova/nova-scheduler.log
/var/log/nova/nova-conductor.log
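To decide which of these files to read first, it can be useful to count the error lines in each. The sketch below does that, assuming the log level appears as the word ERROR surrounded by spaces, as it does in the Nova log lines shown in this chapter; count_log_errors is a hypothetical helper name.

```shell
# Print "<count> <file>" for every log given, or flag files that cannot be read.
count_log_errors() {
    for cle_log in "$@"; do
        if [ -r "$cle_log" ]; then
            # grep -c prints the number of matching lines (0 when there are none)
            printf '%s %s\n' "$(grep -c ' ERROR ' "$cle_log")" "$cle_log"
        else
            printf 'unreadable %s\n' "$cle_log"
        fi
    done
}

# On a live node:
#   count_log_errors /var/log/nova/nova-compute.log \
#       /var/log/nova/nova-scheduler.log /var/log/nova/nova-conductor.log
```

A file reported as unreadable either does not exist on this node or needs elevated privileges, which is a useful hint in itself about which services run where.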
If the nova-compute service is indeed the root cause of the issue, you are likely to find an error in the nova-conductor.log file, similar to the error shown here:
Remember that the nova-compute service is what abstracts the hypervisor in OpenStack. You can run the nova hypervisor-list command to see which hypervisors are available, and this can also give you clues about your compute hosts. For example, if we run nova hypervisor-list when the nova-compute service is down, we may see an output similar to this:
As you can see in this output, the state of the hypervisor is down. This would indicate that we need to look at the nova-compute service to ensure that it is functioning properly.
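In an environment with many compute hosts, scanning this table by eye becomes tedious. The following sketch lists only the hosts whose state is down. It assumes that the client prints a pipe-delimited table with header cells named "Hypervisor hostname" and "State"; the exact columns may differ between client versions, so the columns are located by name rather than by position. The helper name down_hypervisors is hypothetical.

```shell
# Print the hostname of every hypervisor whose State column reads "down".
# Expects a pipe-delimited table, such as CLI table output, on stdin.
down_hypervisors() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^\+/ { next }                  # skip the +----+ border lines
        !state_col {                    # header row: find the columns by name
            for (i = 1; i <= NF; i++) {
                if (trim($i) == "State") state_col = i
                if (trim($i) == "Hypervisor hostname") host_col = i
            }
            next
        }
        trim($state_col) == "down" { print trim($host_col) }
    '
}

# On a live controller node:
#   nova hypervisor-list | down_hypervisors
```

Each hostname printed is a compute node on which the nova-compute service deserves a closer look.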
When the nova-compute service is running, but instances are still ending up with an error state, you can continue troubleshooting by looking at a few more potential causes.
An error like the preceding one in the nova-compute.log file is an indication that you may have a configuration problem on your compute host. Specifically, this error points to the setting of virt_type in /etc/nova/nova-compute.conf. The fix here would be to change virt_type to a value accepted by your hypervisor; in this case, we would change this value to qemu. Remember to restart the nova-compute service whenever you make a change to the nova-compute.conf configuration file.
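A common reason for needing qemu rather than kvm is that the compute host's CPU does not expose hardware virtualization extensions. The sketch below applies that heuristic by looking for the vmx (Intel) or svm (AMD) CPU flag. It is only a hint: the flags can be present while virtualization is disabled in firmware, and the helper name suggest_virt_type is hypothetical.

```shell
# Suggest a virt_type based on the CPU flags in a cpuinfo file.
# Defaults to /proc/cpuinfo; another path can be passed for testing.
suggest_virt_type() {
    svt_cpuinfo=${1:-/proc/cpuinfo}
    # -w matches vmx or svm only as whole words, -q suppresses output
    if grep -Eqw 'vmx|svm' "$svt_cpuinfo"; then
        echo kvm
    else
        echo qemu
    fi
}

# On a live compute node:
#   suggest_virt_type
```

If the function prints qemu on a host that is configured for kvm, the virt_type setting is a likely cause of the failure.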
One of the purposes of the Nova conductor service is to handle all the database interactions on behalf of the compute nodes. This allows us to have a more secure installation, as the compute hosts don't have direct access to the database. To check whether nova-conductor is running, you can use the following command:
ps -aux | grep nova-conductor
If nova-conductor is running, the preceding command will return output like the following:
If the service is not running, you can start it by running the following command:
start nova-conductor
Once you attempt to start the service, be sure to confirm that it is running successfully using the ps -aux command, like we did earlier. If you find that this service has not started correctly, you should attempt to start the service manually and check the console for errors:
sudo -u nova nova-conductor --config-file=/etc/nova/nova.conf
When running the service manually, keep an eye on the console for traces or errors. These errors can help you identify issues that are preventing the Nova conductor from starting.
Since nova-compute has a dependency on nova-conductor, the order in which these services start is important. If you try to start nova-compute before nova-conductor, you will most likely see the following warning in your nova-compute.log file:
The fix for the preceding issue is simply to start nova-conductor and then start nova-compute. If your nova-conductor service is not running, you will have issues when trying to launch a new instance, as you can see in the following screenshot:
In the preceding example, the instance is stuck in the BUILD status with scheduling as the task state. A quick look at the nova-compute.log file will reveal the root of the issue, as shown in the following screenshot:
As the error message indicates, nova-conductor is not running. The fix here is to start the nova-conductor service. Nova can typically recover from this sort of error, allowing your instance to eventually build successfully:
Note that the status of the preceding instance is ACTIVE and the power state is Running. This indicates that the instance has been successfully created. While there are various issues that may cause an instance to build unsuccessfully, you can typically determine the root cause by following these points: