In this recipe, we are building an Orchestrator cluster and configuring it for HA. Load-balancing is discussed in a separate recipe in this chapter.
The prerequisites for a cluster are not that hard, but they are important:
In this recipe, I will deploy and configure two fresh Orchestrator installations and configure them into a cluster.
Before we come to the main event, we need to prepare some things:
We now prepare the first node of the cluster:
We are now configuring the cluster settings:
Set the Heartbeat interval (in seconds) to 1 and the Number of failover heartbeats to 5. This setting will make sure that the Orchestrator server fails over after 5 seconds. More about these settings in the How it works... section.
Now we are joining an additional Orchestrator to the cluster; with 7.1 this becomes extremely easy:
You can see which Orchestrator is the active node by looking at the state. The local node is the node you are currently connected to.
When you want your Orchestrator cluster to work properly, you also need to make sure that the Orchestrator VMs are correctly configured in vSphere:
SeparateOrchestrator. We will now simulate a cluster failover:
The push configuration is a new feature in vRO 7.1 and makes the synchronization of clusters much easier. Let's have a look at this:
Since vRO 7.1, the configuration of Orchestrator clusters has become a lot easier. The fact that an additional node is automatically synced to the configuration of the cluster is a massive improvement. The other function that was added is the ability to actively push a configuration to all the nodes, which makes changing clusters easier.
The Orchestrator cluster can function in two ways. The first and easiest is HA mode. This means that we have at least two Orchestrator installations; if one fails, the other continues running. When a workflow is running, Orchestrator saves the state of the workflow to the database before executing each workflow element. This is the same behavior that lets us resume failed workflows or debug them (see the recipe Resuming failed workflows in Chapter 4, Programming Skills).
When one server fails, the new active node picks up the last saved state of the workflow execution and continues it. For a purely HA setup, you set the Number of active nodes to one.
The difference between HA and load-balanced mode is that in load-balanced mode, multiple Orchestrator instances can execute workflows at the same time, meaning that each Orchestrator instance does less work. For load balancing, you need to set the Number of active nodes to more than one, and you should configure a load balancer for round-robin.
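The round-robin scheme that the load balancer applies across the active nodes can be sketched as follows. This is only an illustration of the distribution pattern, not Orchestrator code; the node names are invented, and in practice a dedicated load balancer does this for you:

```javascript
// Minimal round-robin selector, as a load balancer would apply it
// to the active Orchestrator nodes. Node names are invented examples.
function makeRoundRobin(nodes) {
  let next = 0;
  return function pick() {
    const node = nodes[next];
    next = (next + 1) % nodes.length; // wrap around to the first node
    return node;
  };
}

const pick = makeRoundRobin(["vro-node1", "vro-node2"]);
// Successive requests alternate between the active nodes:
console.log(pick()); // vro-node1
console.log(pick()); // vro-node2
console.log(pick()); // vro-node1
```

Each incoming request lands on the next active node in turn, which is why every active instance ends up doing less work.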
You can, of course, use both modes at the same time. For example, if you have four Orchestrator nodes and have configured the Number of active nodes to two, two of the Orchestrators are active and two are in standby. If one of the active nodes fails, one of the standby nodes is brought into active mode. When the failed node becomes available again, it rejoins as a standby node.
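The active/standby bookkeeping described above can be sketched like this. It is a simplified illustration only: the node names are invented, and the real promotion logic is internal to Orchestrator:

```javascript
// Sketch of the active/standby behavior described above,
// with the Number of active nodes set to two (node names invented).
const desiredActive = 2;
const nodes = [
  { name: "vro1", state: "active" },
  { name: "vro2", state: "active" },
  { name: "vro3", state: "standby" },
  { name: "vro4", state: "standby" },
];

function nodeFailed(name) {
  nodes.find(n => n.name === name).state = "failed";
  // Promote standby nodes until the desired active count is restored.
  while (nodes.filter(n => n.state === "active").length < desiredActive) {
    const standby = nodes.find(n => n.state === "standby");
    if (!standby) break; // no standby node left to promote
    standby.state = "active";
  }
}

function nodeRecovered(name) {
  // A recovered node rejoins as standby, not as active.
  nodes.find(n => n.name === name).state = "standby";
}

nodeFailed("vro1");    // vro3 is promoted to active
nodeRecovered("vro1"); // vro1 comes back as a standby node
```

Note that the recovered node does not reclaim its active slot; it simply becomes a spare again.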
The Heartbeat interval (in seconds) is the interval at which an Orchestrator node sends keep-alive signals to all other nodes of the cluster.
The Number of failover heartbeats defines how many keep-alive signals can be missed before a node is declared dead by the other members of the cluster.
You determine the failover time by multiplying the Heartbeat interval (in seconds) by the Number of failover heartbeats.
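For example, with the values used in this recipe:

```javascript
// Failover time = Heartbeat interval (seconds) x Number of failover heartbeats.
const heartbeatInterval = 1;  // seconds between keep-alive signals
const failoverHeartbeats = 5; // missed signals before a node is declared dead
const failoverTime = heartbeatInterval * failoverHeartbeats;
console.log(failoverTime + " seconds"); // 5 seconds
```

A shorter interval or fewer allowed missed heartbeats means faster failover, at the price of being more sensitive to transient network hiccups.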
If you want to use local files in a clustered Orchestrator environment you should use NFS or SMB shares. See the recipe Configuring access to the local filesystem in Chapter 2, Optimizing Orchestrator Configuration.
At the time of writing (vRO 7.1.0), when a node joins a cluster, it automatically takes over the certificate of the primary host. If you reconfigure a node with a different certificate, the cluster will be out of sync. If your security policy doesn't allow SAN certificates, you can run with an unsynced cluster. It's not nice, but it works.
VMware has promised to make sure that in the next release the certificates will not be pushed out automatically, allowing you to create a separate machine account for each node.
When you have more than one active Orchestrator, you need to think about Orchestrator Client usage. Officially, this usage is not supported, but it works anyhow. The problem is that it would be possible for two users (one on each Orchestrator) to modify the same resource (for example, a workflow). This can be worked around by not giving users edit or administrator rights (see the recipe User management in Chapter 7, Interacting with Orchestrator) or by using locks (see the recipe Using the Locking System in Chapter 8, Better Workflows and Optimized Working).
The supported best practice, however, is to test a change on a separate Orchestrator installation and then transfer it to the cluster while only one Orchestrator node is running and the workflow that is to be changed is not in use.
When you want to change content, such as workflows, that is stored on the cluster, you must shut down all but one of the Orchestrator services, change the content on the remaining server, and then restart the other Orchestrator services.
If you are adding a new plugin, you will need to install this plugin on all nodes before restarting the Orchestrator services.
When you want to change the Orchestrator server settings, it's best to stop all but one of the Orchestrator nodes, change the settings, and then restart the others. If you don't, you will end up with an unstable cluster, meaning that the cluster fails over from one node to the other all the time. Try it out...
There are a few more things worth knowing.
When you are writing to logs in your workflow while using clusters, you should use the Server log, not the System log, as the System log is written to the localhost while the Server log is written to the database. Check out the example workflow for this recipe.
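As a sketch of the difference, here is the kind of scriptable-task code involved. Note that Server and System are vRO built-ins; the stubs below only stand in for them so that the snippet can run outside Orchestrator, and the log arrays are invented stand-ins for the database and the local log file:

```javascript
// Stand-ins for the vRO built-ins, so this sketch runs outside Orchestrator.
// Inside a real workflow scriptable task, Server and System already exist.
var database = [];     // simulates the cluster-wide database
var localhostLog = []; // simulates the local file log
var Server = { log: function (msg) { database.push(msg); } };
var System = { log: function (msg) { localhostLog.push(msg); } };

// In a clustered setup, prefer Server.log: it is written to the database,
// so the entry survives a failover and is visible from every node.
Server.log("Step 1 done");

// System.log only writes to the local host the element ran on,
// so the entry is lost from view after a failover to another node.
System.log("Step 1 done");
```

In the real product, the Server log entries show up on the Events tab cluster-wide, while System log entries stay with the individual node.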
If you are looking for pure load-balancing, as in trying to run a process on several Orchestrators in parallel, you could also consider using the AMQP plugin. Have a look at the recipe Working with AMQP in Chapter 10, Built-in Plugins.
In the example package, there is a workflow called 03.01 Cluster Test. For it to work, follow these steps:
You will see that the logs show only the entries that were made after the workflow execution switched to the new host (System.log). The Events tab will show all the log entries (Server.log).
The recipe Working with AMQP in Chapter 10, Built-in Plugins, for alternative workload balancing.
The recipe Configuring the Orchestrator service SSL certificate in Chapter 2, Optimizing Orchestrator Configuration, for creating SSL certificates.
The recipe Load-balancing Orchestrator in this chapter to understand and set up load-balancing.