HDFS security

HDFS mimics the Unix-style filesystem permission model. Each file and directory has an owner, a group, and a set of permissions that allow or deny access for the owner, the group, and all other users. For example:

# hdfs dfs -ls /
drwxr-xr-x   - mapred mapred        0 2013-05-27 04:40 /jobtracker
drwxrwxrwt   - hdfs   supergroup    0 2013-06-08 16:03 /tmp

The /jobtracker directory is owned by the user mapred, and only this user is allowed to write files into it, while every user can read files in it. The /tmp directory, on the other hand, is owned by the hdfs user, but everyone can read and write files there. This mimics the behavior of the Unix /tmp directory.

Note

Note that the sticky bit (the trailing t in the permission string) is also set on the /tmp directory. With the sticky bit set, only the file owner, the directory owner, or the superuser can delete or rename files there.
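To make the permission strings in the listing easier to compare with the octal modes used by chmod, here is a minimal sketch of a converter. The function name mode_to_octal is hypothetical and not part of the hdfs tool; it assumes a ten-character ls-style string and handles only the read, write, execute, and sticky bits.

```shell
# Convert an ls-style mode string (e.g. "drwxrwxrwt") to a four-digit octal mode.
# mode_to_octal is an illustrative helper, not an HDFS command.
mode_to_octal() {
  perms=$(printf '%s' "$1" | cut -c2-10)   # drop the leading file-type character
  special=0
  digits=""
  for start in 1 4 7; do
    # Extract the owner, group, and other triplets in turn.
    triplet=$(printf '%s' "$perms" | cut -c"${start}-$((start + 2))")
    value=0
    case $triplet in r??) value=$((value + 4)) ;; esac
    case $triplet in ?w?) value=$((value + 2)) ;; esac
    # A lowercase t means execute plus sticky; an uppercase T means sticky only.
    case $triplet in ??[xt]) value=$((value + 1)) ;; esac
    digits="${digits}${value}"
  done
  # A trailing t or T in the last position marks the sticky bit.
  case $perms in *[tT]) special=1 ;; esac
  printf '%s%s\n' "$special" "$digits"
}
```

Calling mode_to_octal drwxrwxrwt prints 1777 for /tmp, and mode_to_octal drwxr-xr-x prints 0755 for /jobtracker.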

To manipulate file permissions, HDFS provides commands that are similar to those of the Unix environment. As an example, let's try to create a home directory for the user alice and change the directory ownership:

[root@nn1 ~]# hdfs dfs -mkdir /user/alice/
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

This attempt to create a directory fails with an error because the Linux root user doesn't have write permission on the /user directory in HDFS.

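In CDH, the hdfs user is the equivalent of a superuser, that is, a user with the highest level of privileges. This is because all the HDFS daemons, including the NameNode, run under the hdfs user, and HDFS treats the account that started the NameNode as its superuser.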

Note

It is important to distinguish the user "hdfs" from the hdfs command line tool.

To fix this error, we will switch to the hdfs user instead:

# sudo su - hdfs
$ hdfs dfs -mkdir /user/alice/
$ hdfs dfs -chown alice:alice /user/alice/

You can also use the -chmod command, whose syntax is similar to that of chmod in Linux, to change the access mode of files and directories.
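As a brief illustration of -chmod, the sketch below wraps the command in a small function. The function name restrict_home and the choice of mode 750 are assumptions made for this example; only hdfs dfs -chmod itself comes from the HDFS tool.

```shell
# Restrict a user's HDFS home directory to the owner (full access)
# and the owning group (read and list only).
# restrict_home and the 750 mode are illustrative choices.
restrict_home() {
  user=$1
  hdfs dfs -chmod 750 "/user/${user}"
}
```

Running restrict_home alice as the hdfs user, or as alice herself once she owns the directory, would issue hdfs dfs -chmod 750 /user/alice. As with the Linux command, -chmod also accepts symbolic modes and the -R flag for recursive changes.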

Note

There is no direct connection between OS users and the permissions you assign on HDFS. When you change the ownership of a directory or file, Hadoop doesn't check whether the user actually exists. Be careful to spell usernames correctly, since HDFS will not report a "user doesn't exist" error.
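Because HDFS will not catch a misspelled username, one possible safeguard is to check the name against the local OS before changing ownership. The following is a sketch only: safe_chown is a hypothetical helper, it assumes the group has the same name as the user (as in the alice:alice example), and a user known to this host is not necessarily known to every node in the cluster.

```shell
# Change HDFS ownership only if the user is known to the local OS.
# safe_chown is a hypothetical helper, not part of the hdfs tool.
safe_chown() {
  user=$1
  path=$2
  # id exits with a non-zero status when the user cannot be resolved.
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "user '$user' not found on this host; refusing to chown" >&2
    return 1
  fi
  # Assumes a group with the same name as the user.
  hdfs dfs -chown "${user}:${user}" "$path"
}
```

Called as safe_chown alice /user/alice, the helper runs the -chown command only when alice resolves on the local machine, which catches simple typos such as "alcie".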
