Using EMR is not the only way to deploy Hadoop in the cloud. If you prefer more control over the cluster installation and configuration process, you may want to explore other options.
Whirr is an Apache project that was developed to automate setting up and configuring Hadoop clusters in the cloud. Unlike EMR, Whirr can create Hadoop clusters using not only Amazon EC2, but also other cloud providers. As of now, Whirr supports EC2 and Rackspace cloud.
Whirr is not another Hadoop component. It is a collection of Java programs that helps you to automate creating a Hadoop cluster in the cloud. You can download Whirr from the project's website at:
http://www.apache.org/dyn/closer.cgi/whirr/
Whirr doesn't require any special steps to be installed. You can download the archive, unpack it, and start using the whirr binary, which can be found in the bin directory.
There are several configuration files you need to tune before you can use Whirr to launch clusters:
The ~/.whirr/credentials file in your home directory. This file contains the credentials that will be used to provision instances with your cloud provider. For Amazon EC2, these are your Access Key ID and Secret Access Key. If you are using the Rackspace cloud, you will need to provide the username and API key. To create this file, copy the template conf/credentials.sample located in the Whirr installation directory.
The test-hadoop.properties file, which defines the cluster to launch:
whirr.cluster-name=testhadoop
whirr.instance-templates=1 hadoop-jobtracker,1 hadoop-namenode,5 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
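The credentials step can be sketched as a small shell helper. This is only an illustration: the function name install_credentials and its two arguments (the Whirr installation directory and the target directory) are assumptions made for the example, not part of Whirr itself.

```shell
#!/bin/sh
# Sketch: copy the credentials template into the Whirr configuration directory.
# install_credentials is a hypothetical helper, not a Whirr command.
#   $1 - directory where the Whirr archive was unpacked
#   $2 - target directory, normally "$HOME/.whirr"
install_credentials() {
    src="$1/conf/credentials.sample"
    dest="$2/credentials"

    if [ ! -f "$src" ]; then
        echo "template not found: $src" >&2
        return 1
    fi

    mkdir -p "$2"

    # Never overwrite a credentials file that is already in place.
    if [ -e "$dest" ]; then
        echo "keeping existing $dest"
    else
        cp "$src" "$dest"
    fi

    # The file holds secret keys, so restrict access to the owner.
    chmod 600 "$dest"
}
```

It could be called as install_credentials /path/to/whirr "$HOME/.whirr", after which the copied file still has to be edited by hand to add the keys for your provider.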
The number and roles of the instances in the cluster are set by the whirr.instance-templates variable. You will also need to generate a dedicated key pair to be used for the cluster setup. To launch the cluster with this configuration, run:
# whirr launch-cluster --config test-hadoop.properties
To destroy the cluster when you no longer need it, run:
# whirr destroy-cluster --config test-hadoop.properties
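The whole workflow can be sketched as one script that writes the properties file and then optionally generates a key pair, launches, or destroys the cluster. WORK_DIR, ACTION, KEY_FILE, and the function names are hypothetical and exist only for this example. The passphrase-less RSA key is likewise an assumption. Only the two whirr commands and the property values come from the text above.

```shell
#!/bin/sh
# Sketch only: WORK_DIR, ACTION, KEY_FILE and the function names are
# hypothetical helpers, not part of Whirr.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
CONFIG_FILE="$WORK_DIR/test-hadoop.properties"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa_whirr}"

write_cluster_config() {
    # The quoted 'EOF' keeps ${sys:user.home} literal so that Whirr,
    # not the shell, resolves it.
    cat > "$1" <<'EOF'
whirr.cluster-name=testhadoop
whirr.instance-templates=1 hadoop-jobtracker,1 hadoop-namenode,5 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
EOF
}

generate_key_pair() {
    # Create an RSA key pair with an empty passphrase (an assumption for
    # this sketch) unless a key already exists at the given path.
    [ -f "$1" ] || ssh-keygen -t rsa -P '' -f "$1"
}

write_cluster_config "$CONFIG_FILE"
echo "Wrote $CONFIG_FILE"

# Nothing touches the cloud provider unless ACTION is set explicitly.
case "${ACTION:-none}" in
    keygen)  generate_key_pair "$KEY_FILE" ;;
    launch)  whirr launch-cluster --config "$CONFIG_FILE" ;;
    destroy) whirr destroy-cluster --config "$CONFIG_FILE" ;;
    none)    echo "Set ACTION=keygen, launch, or destroy to go further" ;;
esac
```

Saved under a name of your choice, for example cluster.sh, it could be run as ACTION=launch sh cluster.sh. Note that the properties file points at ~/.ssh/id_rsa, so a key generated at a different path, such as the KEY_FILE default used here, requires adjusting the two key-file lines accordingly.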
For more information on the available Whirr options, please refer to the project's documentation page at:
http://whirr.apache.org/docs/0.8.1/configuration-guide.html#cloud-provider-config