Preface

In the last couple of years, Hadoop has become a standard solution for building data integration platforms. Introducing any new technology into a company's data infrastructure stack requires system engineers and database administrators to quickly learn all the aspects of the new component. Hadoop doesn't make this task any easier because it is not a single software product, but it is rather a collection of multiple separate open source projects. These projects need to be properly installed and configured in order to make the Hadoop platform robust and reliable.

Many existing Hadoop distributions provide a simplified way to install Hadoop using some kind of graphical interface. This approach dramatically reduces the amount of time required to go from zero to the fully functional Hadoop cluster. It also simplifies managing the cluster configuration. The problem with an automated setup and configuration is that it actually hides a lot of important aspects about Hadoop components that work together, such as why some components require other components, and which configuration parameters are the most important, and so on.

This book provides a guide to installing and configuring all the main Hadoop components manually. Setting up at least one fully operational cluster by yourself will provide very useful insights into how Hadoop operates under the hood and will make it much easier for you to debug any issues that may arise. You can also use this book as a quick reference to the main Hadoop components and configuration options gathered in one place and in a succinct format. While writing this book, I found myself constantly referring to it when working on real production Hadoop clusters, to look up a specific variable or refresh a best practice when it comes to OS configuration. This habit reassured me that such a guide might be useful to other aspiring and experienced Hadoop administrators and developers.

What this book covers

Chapter 1, Setting Up Hadoop Cluster – from Hardware to Distribution, reviews the main Hadoop components and approaches for choosing and sizing cluster hardware. It also touches on the topic of various Hadoop distributions.

Chapter 2, Installing and Configuring Hadoop, provides step-by-step instructions for installing and configuring the main Hadoop components: NameNode (including High Availability), JobTracker, DataNodes, and TaskTrackers.

Chapter 3, Configuring the Hadoop Ecosystem, reviews configuration procedures for Sqoop, Hive, and Impala.

Chapter 4, Securing Hadoop Installation, provides guidelines to securing various Hadoop components. It also provides an overview of configuring Kerberos with Hadoop.

Chapter 5, Monitoring Hadoop Cluster, guides you to getting your cluster ready for production usage.

Chapter 6, Deploying Hadoop to the Cloud, reviews using Hadoop in virtualized environments, including Elastic MapReduce and using Whirr.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset