Hadoop Service Level Authorization

In addition to the HDFS permissions model and MapReduce job queue administration, you can specify which users and groups can access individual cluster services. This is useful for restricting HDFS access and job submission to a small set of users.

To enable service level authorization, you need to add the following option in the core-site.xml configuration file:

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

Similarly to MapReduce queue privileges, service level ACLs are defined in a separate file called hadoop-policy.xml. CDH provides a sample of this file in the /etc/hadoop/conf directory; by default, it is wide open (all users can access all services).
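The "wide open" default means each ACL in the sample file is set to the wildcard value. An entry in the sample file looks like the following, where the * value grants access to every user and group:

```xml
<property>
  <name>security.client.protocol.acl</name>
  <value>*</value>
</property>
```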

The difference between Service Level Authorization and HDFS or MapReduce permissions lies in the order in which the checks are performed. Service-level checks happen before the user starts communicating with HDFS or the MapReduce service, which makes them useful for blocking specific users or groups from the cluster entirely.

Let's say we want to limit access to HDFS and MapReduce to only the Linux group named hadoopusers. To achieve this, we need to set the following options in hadoop-policy.xml:

<property>
  <name>security.client.protocol.acl</name>
  <value> hadoopusers</value>
</property>
<property>
  <name>security.client.datanode.protocol.acl</name>
  <value> hadoopusers</value>
</property>

The preceding options will prevent users other than those in the hadoopusers group from communicating with HDFS daemons. The format for the values is the same as we used for MapReduce permissions: a space-separated pair of comma-separated lists, users first and groups second. Notice the space character before the "hadoopusers" string; the leading space means that no individual users are listed and only the group applies.
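As an illustration of the full format (the user names alice and bob and the group hadoopadmins here are hypothetical), an ACL that admits two individual users as well as the members of two groups would look like this:

```xml
<property>
  <name>security.client.protocol.acl</name>
  <!-- users list, a space, then groups list -->
  <value>alice,bob hadoopusers,hadoopadmins</value>
</property>
```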

To allow only the members of the hadoopusers group to submit MapReduce jobs, specify the following option:

<property>
  <name>security.job.submission.protocol.acl</name>
  <value> hadoopusers</value>
</property>

There are many more options that you can specify in the hadoop-policy.xml file to limit access to any Hadoop service, including the internal communication protocols. For example, you could allow only the hdfs user to be used for inter-DataNode communication, as well as DataNode-to-NameNode communication. All of these options are outlined in the sample hadoop-policy.xml file, so you can tune them as necessary.
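As a sketch of that last example, restricting the DataNode protocols to the hdfs user could look like the following (the exact set of protocol ACL properties available depends on your Hadoop version, so check your sample hadoop-policy.xml):

```xml
<property>
  <!-- DataNode-to-NameNode communication -->
  <name>security.datanode.protocol.acl</name>
  <value>hdfs</value>
</property>
<property>
  <!-- DataNode-to-DataNode communication (block recovery) -->
  <name>security.inter.datanode.protocol.acl</name>
  <value>hdfs</value>
</property>
```

After changing hadoop-policy.xml, the service ACLs can typically be reloaded without restarting the daemons by running `hadoop dfsadmin -refreshServiceAcl` for HDFS and `hadoop mradmin -refreshServiceAcl` for MapReduce.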
