Hadoop Cluster Deployment
by Danil Zburivsky
Table of Contents
Hadoop Cluster Deployment
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Setting Up Hadoop Cluster – from Hardware to Distribution
Choosing Hadoop cluster hardware
Choosing the DataNode hardware
Low storage density cluster
High storage density cluster
NameNode and JobTracker hardware configuration
The NameNode hardware
The JobTracker hardware
Gateway and other auxiliary services
Network considerations
Hadoop hardware summary
Hadoop distributions
Hadoop versions
Choosing Hadoop distribution
Cloudera Hadoop distribution
Hortonworks Hadoop distribution
MapR
Choosing OS for the Hadoop cluster
Summary
2. Installing and Configuring Hadoop
Configuring OS for Hadoop cluster
Choosing and setting up the filesystem
Setting up Java Development Kit
Other OS settings
Setting up the CDH repositories
Setting up NameNode
JournalNode, ZooKeeper, and Failover Controller
Hadoop configuration files
NameNode HA configuration
JobTracker configuration
Configuring the job scheduler
JobQueueTaskScheduler
FairScheduler
CapacityTaskScheduler
DataNode configuration
TaskTracker configuration
Advanced Hadoop tuning
hdfs-site.xml
mapred-site.xml
core-site.xml
Summary
3. Configuring the Hadoop Ecosystem
Hosting the Hadoop ecosystem
Sqoop
Installing and configuring Sqoop
Sqoop import example
Sqoop export example
Hive
Hive architecture
Installing Hive Metastore
Installing the Hive client
Installing Hive Server
Impala
Impala architecture
Installing Impala state store
Installing the Impala server
Summary
4. Securing Hadoop Installation
Hadoop security overview
HDFS security
MapReduce security
Hadoop Service Level Authorization
Hadoop and Kerberos
Kerberos overview
Kerberos in Hadoop
Configuring Kerberos clients
Generating Kerberos principals
Enabling Kerberos for HDFS
Enabling Kerberos for MapReduce
Summary
5. Monitoring Hadoop Cluster
Monitoring strategy overview
Hadoop Metrics
JMX Metrics
Monitoring Hadoop with Nagios
Monitoring HDFS
NameNode checks
JournalNode checks
ZooKeeper checks
Monitoring MapReduce
JobTracker checks
Monitoring Hadoop with Ganglia
Summary
6. Deploying Hadoop to the Cloud
Amazon Elastic MapReduce
Installing the EMR command-line interface
Choosing the Hadoop version
Launching the EMR cluster
Temporary EMR clusters
Preparing input and output locations
Using Whirr
Installing and configuring Whirr
Summary
Index