Chapter 4. Securing Hadoop Installation

In this chapter, we will look into the essential topics related to Hadoop security. As you know, Hadoop consists of multiple components and securing a Hadoop cluster means securing each of those components. This makes securing a Hadoop cluster a nontrivial task. In this chapter, we will cover the following topics:

  • Hadoop security overview
  • HDFS security
  • MapReduce security
  • Hadoop Service Level Authorization
  • Hadoop and Kerberos

Hadoop security overview

Originally, Hadoop was designed to operate in a trusted environment. It was assumed that all cluster users could be trusted to present their identity correctly and would not try to obtain more permissions than they were granted. This resulted in the implementation of simple security mode, which is the default authentication system in Hadoop. In simple security mode, Hadoop trusts the operating system to provide the user's identity. Unlike most relational databases, Hadoop has no centralized store of users and privileges, and no user/password concept that would allow it to properly authenticate a user. Instead, Hadoop accepts the user name as reported by the operating system and trusts it without any further checks.

The problem with this model is that it is possible to impersonate another user. For example, a rogue user could use a custom-built HDFS client that, instead of using Java calls to identify the current OS user, simply substitutes the root user, thereby gaining full access to all the data. Simple security mode can still be a reasonable choice in some cases, especially if the cluster runs in a trusted environment and only a few users have access to the system.
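The impersonation problem can be illustrated with a short Python sketch. Hadoop itself is written in Java, so this is only a model of the lookup logic, and the function name is made up for illustration; the `HADOOP_USER_NAME` environment variable, however, is the real override that Hadoop clients honor when simple authentication is in effect:

```python
import getpass
import os


def resolve_simple_auth_user():
    """Model of how a simple-mode client picks its identity:
    an environment override wins, otherwise the OS login name is used.
    No password is ever checked anywhere in this path."""
    user = os.environ.get("HADOOP_USER_NAME")
    if user is None:
        user = getpass.getuser()
    return user


# Any user can claim any identity simply by setting one variable:
os.environ["HADOOP_USER_NAME"] = "hdfs"
print(resolve_simple_auth_user())  # prints: hdfs
```

Since the server side trusts whatever name the client presents, this one-line override is all it takes to act as a privileged user in simple mode.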

For many organizations, such a relaxed approach to authentication is not acceptable. Companies are starting to store sensitive data in their Hadoop clusters and need to be able to isolate data in large multitenant environments. To address this, Hadoop introduced support for Kerberos, a proven authentication protocol. A Kerberos server acts as an external repository of users, allowing each user to authenticate with a password. Users who successfully authenticate with Kerberos are granted a ticket, and they can use Hadoop services for as long as that ticket is valid. Kerberos support was introduced not only for external cluster users, but for all internal services as well: every daemon, such as the NameNode, DataNode, and TaskTracker, must authenticate with Kerberos before it can join the cluster. Kerberos provides a much stronger security mode for Hadoop, but it also introduces additional challenges in configuring and maintaining the cluster.
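Switching a cluster from simple mode to Kerberos starts with two properties in `core-site.xml`. The fragment below is only a minimal sketch; a real deployment also needs per-daemon principal and keytab settings in `hdfs-site.xml` and `mapred-site.xml`, plus a working Kerberos KDC:

```xml
<!-- core-site.xml: minimal fragment enabling Kerberos authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <!-- the default value is "simple" -->
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <!-- also turn on service-level authorization checks -->
  <value>true</value>
</property>
```

Once the daemons are restarted with this configuration, a user first obtains a ticket with `kinit` and can then run Hadoop commands until that ticket expires.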
