Using Whirr

Using EMR is not the only way to deploy Hadoop in the cloud. If you prefer more control over the cluster installation and configuration process, you may want to explore other options.

Whirr is an Apache project that was developed to automate setting up and configuring Hadoop clusters in the cloud. Unlike EMR, Whirr can create Hadoop clusters not only on Amazon EC2, but also on other cloud providers. At the time of writing, Whirr supports Amazon EC2 and the Rackspace cloud.

Installing and configuring Whirr

Whirr is not another Hadoop component. It is a collection of Java programs that helps you to automate creating a Hadoop cluster in the cloud. You can download Whirr from the project's website at:

http://www.apache.org/dyn/closer.cgi/whirr/

Whirr doesn't require any special steps to be installed. You can download the archive, unpack it, and start using the whirr binary, which can be found in the bin directory.
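The unpack step can be sketched as a small shell function. This is only an illustration: unpack_whirr is a hypothetical helper, not something shipped with Whirr, and it assumes the downloaded archive is a gzipped tarball with a single top-level directory that contains bin/whirr.

```shell
#!/bin/sh
# Unpack a downloaded Whirr archive and print the path to the whirr launcher.
# Usage: unpack_whirr <archive.tar.gz> <install-dir>
# Assumption: the archive holds one top-level directory with bin/whirr inside.
unpack_whirr() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # Look for the launcher one directory level below the install directory.
    for launcher in "$dest"/*/bin/whirr; do
        if [ -f "$launcher" ]; then
            echo "$launcher"
            return 0
        fi
    done
    echo "bin/whirr not found under $dest" >&2
    return 1
}
```

You could then call it with the path of the archive you downloaded and a directory of your choice, and use the printed path to run Whirr.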

There are several configuration files you need to tune before you can use Whirr to launch clusters:

  1. First, create the ~/.whirr/credentials file by copying the template conf/credentials.sample from the Whirr installation directory. This file contains the credentials that Whirr uses to provision instances with your cloud provider. For Amazon EC2, these are your Access Key ID and Secret Access Key. For the Rackspace cloud, you need to provide your username and API key.
  2. Next, you need to create a configuration file for your cluster. Here is a sample test-hadoop.properties file:
    whirr.cluster-name=testhadoop 
    whirr.instance-templates=1 hadoop-jobtracker,1 hadoop-namenode,5 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
  3. This configuration defines a 7-node cluster on Amazon EC2: one JobTracker node, one NameNode node, and five nodes that each run a DataNode and a TaskTracker. Whirr specifies the cluster layout in a simple way; you list the number and roles of the instances you want in the whirr.instance-templates property. You will also need to generate a dedicated key pair for the cluster setup and point the whirr.private-key-file and whirr.public-key-file properties at it. To launch the cluster with this configuration, run:
    #whirr launch-cluster --config test-hadoop.properties
    
  4. When you are done with the cluster, you can easily decommission it by running:
    #whirr destroy-cluster --config test-hadoop.properties
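
The credentials step from the list above can be sketched as a shell function. install_credentials is a hypothetical helper written for this illustration; it only copies the template and leaves an existing file alone. The second argument, an alternative configuration directory, exists purely to make the sketch easy to try out.

```shell
#!/bin/sh
# Copy Whirr's credentials template into place unless a credentials file
# already exists.
# Usage: install_credentials <whirr-install-dir> [<config-dir>]
# <config-dir> defaults to ~/.whirr, the location named in the text.
install_credentials() {
    whirr_home=$1
    config_dir=${2:-$HOME/.whirr}
    target="$config_dir/credentials"
    if [ -e "$target" ]; then
        echo "$target already exists, leaving it untouched" >&2
        return 0
    fi
    mkdir -p "$config_dir" || return 1
    cp "$whirr_home/conf/credentials.sample" "$target" || return 1
    # The file will hold cloud secrets, so restrict it to the current user.
    chmod 600 "$target"
}
```

After the copy, open the file in an editor and fill in the values for your provider. The template is assumed here to contain PROVIDER, IDENTITY, and CREDENTIAL entries; check your own copy, since the exact names may differ between Whirr versions.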
    

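To double-check how many instances a layout will start, you can sum the leading number of each comma-separated group in the instance templates value. The following is a minimal sketch; count_nodes is a hypothetical helper and not part of Whirr, and it assumes every group has the form "count role+role".

```shell
#!/bin/sh
# Sum the instance counts in a whirr.instance-templates value.
# Each comma-separated group is expected to look like "<count> <role>[+<role>...]".
count_nodes() {
    echo "$1" | tr ',' '\n' | awk '{ total += $1 } END { print total + 0 }'
}

# One JobTracker node, one NameNode node, and five worker nodes.
count_nodes "1 hadoop-jobtracker,1 hadoop-namenode,5 hadoop-datanode+hadoop-tasktracker"   # prints 7
```

Note that roles joined with a plus sign share the same instances, so they do not add to the total.
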
For more information on the available Whirr options, please refer to the project's documentation page at:

http://whirr.apache.org/docs/0.8.1/configuration-guide.html#cloud-provider-config
