Chapter 4. Compute Nodes

Compute nodes form the resource core of the OpenStack Compute cloud, providing the processing, memory, network and storage resources to run instances.

CPU Choice

The type of CPU in your compute node is a very important choice. First, ensure the CPU supports virtualization by way of VT-x for Intel chips and AMD-v for AMD chips.
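As a quick check on an existing Linux host, the CPU flags in /proc/cpuinfo show whether these extensions are present (vmx for Intel VT-x, svm for AMD-V); a count of 0 means they are either absent or disabled in the BIOS:

    # Count the logical CPUs that advertise hardware virtualization extensions
    $ egrep -c '(vmx|svm)' /proc/cpuinfo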

The number of cores that the CPU has also affects the decision. It’s common for current CPUs to have up to 12 cores. Additionally, if the CPU supports hyper-threading, those 12 cores are presented as 24 logical cores (threads). If you purchase a server that supports multiple CPUs, the number of cores is further multiplied.

Whether you should enable hyper-threading on your CPUs depends upon your use case. We recommend performance testing with your local workload with hyper-threading both enabled and disabled to determine which is more appropriate in your case.
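One way to confirm whether hyper-threading is currently enabled on a node is to compare the thread and core counts reported by lscpu; the output below is illustrative, but a “Thread(s) per core” value of 2 typically indicates hyper-threading is on:

    $ lscpu | grep -E '^(Thread|Core|Socket)'
    Thread(s) per core:    2
    Core(s) per socket:    12
    Socket(s):             2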

Hypervisor Choice

OpenStack Compute supports many hypervisors to various degrees, including KVM, LXC, QEMU, UML, VMware ESX/ESXi, Xen, PowerVM, and Hyper-V.

Probably the most important factor in your choice of hypervisor is your current usage or experience. Aside from that, there are practical concerns to do with feature parity, documentation, and the level of community experience.

For example, KVM is the most widely adopted hypervisor in the OpenStack community. Besides KVM, more deployments run Xen, LXC, VMware, and Hyper-V than the others listed. However, each of these is lacking some feature support, or the documentation on how to use it with OpenStack is out of date.

The best information available to support your choice is found on the Hypervisor Support Matrix (https://wiki.openstack.org/wiki/HypervisorSupportMatrix), and in the configuration reference (http://docs.openstack.org/trunk/config-reference/content/section_compute-hypervisors.html).
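To make the discussion concrete, the hypervisor is selected per compute node in nova.conf. The exact option names vary by release (older releases use libvirt_type under [DEFAULT]; newer ones use virt_type under a [libvirt] section), so treat the following KVM example as a sketch rather than a definitive configuration:

    [DEFAULT]
    compute_driver = libvirt.LibvirtDriver

    [libvirt]
    virt_type = kvm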

It is also possible to run multiple hypervisors in a single deployment using Host Aggregates or Cells. However, an individual compute node can only run a single hypervisor at a time.
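As a sketch of the host aggregate approach, you can group the nodes running one hypervisor into an aggregate, tag it with metadata, and tie a flavor to that metadata so the scheduler places matching instances on those hosts. The aggregate, host, and flavor names below are examples, and the scheduler must have the AggregateInstanceExtraSpecsFilter enabled:

    $ nova aggregate-create kvm-hosts
    $ nova aggregate-add-host kvm-hosts compute-kvm-01
    $ nova aggregate-set-metadata kvm-hosts hypervisor=kvm
    $ nova flavor-key m1.kvm set aggregate_instance_extra_specs:hypervisor=kvm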

Instance Storage Solutions

As part of the procurement for a compute cluster, you must specify some storage for the disks on which the instantiated instances run. There are three main approaches to providing this temporary-style (ephemeral) storage, and it is important to understand the implications of the choice.

They are:

  • Off compute node storage – shared file system

  • On compute node storage – shared file system

  • On compute node storage – non-shared file system

In general, the questions you should be asking when selecting the storage are as follows:

  • What is the platter count you can achieve?

  • Do more spindles result in better I/O despite network access?

  • Which one results in the best cost-performance scenario you’re aiming for?

  • How do you manage the storage operationally?

Off Compute Node Storage – Shared File System

Many operators use separate compute and storage hosts. Compute services and storage services have different requirements: compute hosts typically require more CPU and RAM than storage hosts. Therefore, for a fixed budget, it makes sense to have different configurations for your compute nodes and your storage nodes, with compute nodes invested in CPU and RAM and storage nodes invested in block storage.

Also, if you use separate compute and storage hosts then you can treat your compute hosts as “stateless”. This simplifies maintenance for the compute hosts. As long as you don’t have any instances currently running on a compute host, you can take it offline or wipe it completely without having any effect on the rest of your cloud.

However, if you are more restricted in the number of physical hosts you have available for creating your cloud and you want to be able to dedicate as many of your hosts as possible to running instances, it makes sense to run compute and storage on the same machines.

In this option, the disks storing the running instances are hosted on servers outside of the compute nodes. This approach has several additional advantages:

  • If a compute node fails, instances are usually easily recoverable.

  • Running a dedicated storage system can be operationally simpler.

  • You can scale to any number of spindles.

  • It may be possible to share the external storage for other purposes.

The main downsides to this approach are:

  • Depending on design, heavy I/O usage from some instances can affect unrelated instances.

  • Use of the network can decrease performance.
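A common way to implement this layout, sketched here with example host and path names, is to mount an export from the storage system at the instances directory (/var/lib/nova/instances by default) on every compute node, for example via /etc/fstab:

    # /etc/fstab on each compute node (server and export names are examples)
    storage01:/export/nova-instances  /var/lib/nova/instances  nfs4  defaults  0 0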

On Compute Node Storage – Shared File System

In this option, each nova-compute node is specified with a significant amount of disk space, but a distributed file system ties the disks from each compute node into a single mount. The main advantage of this option is that it scales to external storage when you require additional storage.

However, this option has several downsides:

  • Running a distributed file system can make you lose your data locality compared with non-shared storage.

  • Recovery of instances is complicated by the dependency on multiple hosts.

  • The chassis size of the compute node can limit the number of spindles able to be used in a compute node.

  • Use of the network can decrease performance.
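As one hedged example of this layout, a distributed file system such as GlusterFS can pool a local brick from each compute node into a single volume that every node then mounts at the instances directory. The host, brick, and volume names below are examples:

    # Run once, from any GlusterFS peer (two-node replicated volume as an example)
    $ gluster volume create nova-instances replica 2 \
          compute01:/bricks/nova compute02:/bricks/nova
    $ gluster volume start nova-instances

    # On every compute node
    $ mount -t glusterfs compute01:/nova-instances /var/lib/nova/instances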

On Compute Node Storage – Non-shared File System

In this option, each nova-compute node is specified with enough disks to store the instances it hosts. There are two main reasons why this is a good idea:

  • Heavy I/O usage on one compute node does not affect instances on other compute nodes.

  • Direct I/O access can increase performance.

This has several downsides:

  • If a compute node fails, the instances running on that node are lost.

  • The chassis size of the compute node can limit the number of spindles able to be used in a compute node.

  • Migrations of instances from one node to another are more complicated, and rely on features which may not continue to be developed.

  • If additional storage is required, this option does not scale.

Issues with Live Migration

We consider live migration an integral part of cloud operations. This feature provides the ability to seamlessly move instances from one physical host to another, which is a necessity for performing upgrades that require rebooting compute hosts. However, it works best with shared storage.

Live migration can also be done with non-shared storage, using a feature known as KVM live block migration. While an earlier implementation of block-based migration in KVM and QEMU was considered unreliable, there is a newer, more reliable implementation of block-based live migration as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack. However, none of the authors of this guide have first-hand experience using live block migration.
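With shared storage in place, a live migration is typically triggered through the nova client; block migration adds a flag. The instance and host names below are placeholders:

    # Live migration over shared storage to a chosen destination host
    $ nova live-migration <instance-uuid> <destination-host>

    # KVM live block migration (no shared storage required)
    $ nova live-migration --block-migrate <instance-uuid>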

Choice of File System

If you want to support shared storage live migration, you’ll need to configure a distributed file system.

Possible options include:

  • NFS (default for Linux)

  • GlusterFS

  • MooseFS

  • Lustre

We’ve seen deployments with all, and recommend you choose the one you are most familiar with operating.
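If you choose NFS, a minimal sketch of the server side is an /etc/exports entry that makes the instances directory available to the compute network; the export path and network address are examples, and the export must be writable by the same nova user and group IDs on every compute node:

    # /etc/exports on the NFS server
    /export/nova-instances  10.0.0.0/24(rw,sync,no_subtree_check)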

Overcommitting

OpenStack allows you to overcommit CPU and RAM on compute nodes. This allows you to increase the number of instances you can have running on your cloud, at the cost of reducing the performance of the instances. OpenStack Compute uses the following ratios by default:

  • CPU allocation ratio: 16

  • RAM allocation ratio: 1.5

The default CPU allocation ratio of 16 means that the scheduler allocates up to 16 virtual cores on a node per physical core. For example, if a physical node has 12 cores, the scheduler allocates up to 192 virtual cores to instances (such as 48 instances, where each instance has 4 virtual cores).

Similarly, the default RAM allocation ratio of 1.5 means that the scheduler allocates instances to a physical node as long as the total amount of RAM associated with the instances is less than 1.5 times the amount of RAM available on the physical node.

For example, if a physical node has 48 GB of RAM, the scheduler allocates instances to that node until the sum of the RAM associated with the instances reaches 72 GB (such as nine instances, where each instance has 8 GB of RAM).

You must select the appropriate CPU and RAM allocation ratio for your particular use case.
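These ratios are controlled by two nova.conf options, shown here with their default values; adjust them per compute node (or on the scheduler, depending on your release) to suit your workload:

    [DEFAULT]
    # Schedule up to 16 virtual cores per physical core
    cpu_allocation_ratio = 16.0
    # Schedule instances until their combined RAM reaches 1.5x physical RAM
    ram_allocation_ratio = 1.5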

Logging

Logging is detailed more fully in the section called “Logging”. However, it is an important design consideration to take into account before you commence operations of your cloud.

OpenStack produces a great deal of useful logging information; however, for it to be useful for operational purposes, you should consider having a central logging server to which logs are sent, as well as a log parsing/analysis system (such as Logstash).
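A minimal sketch of such a setup, assuming rsyslog and an example log-server hostname, is to have each OpenStack service log to syslog and have rsyslog on every node forward that facility to the central server:

    # In each service's configuration file, for example nova.conf
    use_syslog = True
    syslog_log_facility = LOG_LOCAL0

    # /etc/rsyslog.d/60-forward.conf on every node (hostname is an example)
    local0.*  @@logserver.example.com:514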

Networking

Networking in OpenStack is a complex, multi-faceted challenge. See Chapter 6.
