HDFS mimics Unix-style filesystem permissions. Each file and directory has an owner, a group, and a set of permissions that allow or deny access to that file or directory. For example:
# hdfs dfs -ls /
drwxr-xr-x   - mapred mapred          0 2013-05-27 04:40 /jobtracker
drwxrwxrwt   - hdfs   supergroup      0 2013-06-08 16:03 /tmp
You can see that the /jobtracker directory is owned by the user mapred: only this user is allowed to write files into it, while every user can read files from it. On the other hand, while the /tmp directory is owned by the hdfs user, everyone can read and write files there. This mimics the behavior of the Unix-style /tmp directory.
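The trailing t in the /tmp mode string (drwxrwxrwt) is the sticky bit: as on Unix, it prevents users from deleting or renaming files in the directory that they don't own. A minimal sketch of how such a directory could be configured (the leading 1 in the octal mode sets the sticky bit):

```shell
# Open the directory to everyone, but set the sticky bit so users
# can only delete or rename their own files (leading 1 = sticky bit).
$ hdfs dfs -chmod 1777 /tmp
```

Without the sticky bit, the world-writable mode 777 would let any user delete any other user's temporary files.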
To manipulate file permissions, HDFS provides commands similar to those in the Unix environment. As an example, let's create a home directory for the user alice and change the directory's ownership:
[root@nn1 ~]# hdfs dfs -mkdir /user/alice/
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
This attempt to create a directory fails with a Permission denied error because the Linux root user, despite its privileges in the OS, doesn't have write permissions on the /user directory in HDFS.
In CDH, the hdfs user is the equivalent of the superuser, the user with the highest level of privileges. This is because all the HDFS daemons run as the hdfs user.
To fix this error, we will switch to the hdfs user instead:

# sudo su - hdfs
$ hdfs dfs -mkdir /user/alice/
$ hdfs dfs -chown alice:alice /user/alice/
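To confirm the change took effect, you can list the parent directory; a sketch of the listing you should see (replication, size, and timestamp columns elided, as they depend on your cluster):

```shell
# Verify that the new home directory is owned by alice:alice.
$ hdfs dfs -ls /user
# drwxr-xr-x   - alice alice   ...   /user/alice
```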
You can also use the -chmod command, whose syntax is similar to that of the Linux chmod command, to change the access mode for files and directories.
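The -chmod command accepts both the octal and the symbolic mode notations. A minimal local sketch of the two forms, using the ordinary Linux chmod and GNU stat on a scratch file; the same mode arguments can be passed to hdfs dfs -chmod:

```shell
# Illustrates octal vs. symbolic mode notation on a local scratch file;
# the identical mode strings work with 'hdfs dfs -chmod'.
tmpfile=$(mktemp)

chmod 640 "$tmpfile"        # octal: rw-r-----
stat -c '%a' "$tmpfile"     # prints 640 (GNU coreutils stat)

chmod g+w,o+r "$tmpfile"    # symbolic: add group write and other read
stat -c '%a' "$tmpfile"     # prints 664

rm -f "$tmpfile"
```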
There is no direct connection between OS users and the permissions you assign in HDFS. When you change the ownership of a directory or file, Hadoop doesn't check whether the user actually exists. You need to be careful to spell usernames correctly, since HDFS will not report a "user doesn't exist" error.
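For example, a chown with a misspelled username (the hypothetical typo alcie below) succeeds silently, leaving the path owned by a user that doesn't exist anywhere in the OS:

```shell
# 'alcie' is a hypothetical typo for 'alice' -- HDFS accepts it anyway.
$ sudo -u hdfs hdfs dfs -chown alcie:alcie /user/alice/
$ hdfs dfs -ls /user
# drwxr-xr-x   - alcie alcie   ...   /user/alice
```

The only symptom will be Permission denied errors later, when the real alice user tries to write into her own home directory.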