Chapter 18. Security

To understand Hive security, we have to backtrack and understand Hadoop security and the history of Hadoop. Hadoop started out as a subproject of Apache Nutch. At that time and through its early formative years, features were prioritized over security. Security is more complex in a distributed system because multiple components across different machines need to communicate with each other.

Unsecured Hadoop like the versions before the v0.20.205 release derived the username by forking a call to the whoami program. Users are free to change this parameter by setting the hadoop.job.ugi property for FSShell (filesystem) commands. Map and reduce tasks all run under the same system user (usually hadoop or mapred) on TaskTracker nodes. Also, Hadoop components are typically listening on ports with high numbers. They are also typically launched by nonprivileged users (i.e., users other than root).

The recent efforts to secure Hadoop involved several changes, primarily the incorporation of Kerberos authorization support, but also other changes to close vulnerabilities. Kerberos allows mutual authentication between client and server. A client’s request for a ticket is passed along with a request. Tasks on the TaskTracker are run as the user who launched the job. Users are no longer able to impersonate other users by setting the hadoop.job.ugi property. For this to work, all Hadoop components must use Kerberos security from end to end.

Hive was created before any of this Kerberos support was added to Hadoop, and Hive is not yet fully compliant with the Hadoop security changes. For example, the connection to the Hive metastore may use a direct connection to a JDBC database or it may go through Thrift, which will have to take actions on behalf of the user. Components like the Thrift-based HiveService also have to impersonate other users. The file ownership model of Hadoop, where one owner and group own a file, is different than the model many databases have implemented where access is granted and revoked on a table in a row- or column-based manner.

This chapter attempts to highlight components of Hive that operate differently between secure and nonsecure Hadoop. For more information on Hadoop security, consult Hadoop: The Definitive Guide by Tom White (O’Reilly).

Warning

Security support in Hadoop is still relatively new and evolving. Some parts of Hive are not yet compliant with Hadoop security support. The discussion in this section summarizes the current state of Hive security, but it is not meant to be definitive.

For more information on Hive security, consult the Security wiki page https://cwiki.apache.org/confluence/display/Hive/Security. Also, more than in any other chapter in this book, we’ll occasionally refer you to Hive JIRA entries for more information.

Integration with Hadoop Security

Hive v0.7.0 added integration with Hadoop security,[24] meaning, for example, that when Hive sends MapReduce jobs to the JobTracker in a secure cluster, it will use the proper authentication procedures. User privileges can be granted and revoked, as we’ll discuss below.

There are still several known security gaps involving Thrift and other components, as listed on the security wiki page.

Authentication with Hive

When files and directories are owned by different users, the permissions set on the files become important. The HDFS permissions system is very similar to the Unix model, where there are three entities: user, group, and others. Also, there are three permissions: read, write, and execute. Hive has a configuration variable hive.files.umask.value that defines a umask value used to set the default permissions of newly created files, by masking bits:

<property>
  <name>hive.files.umask.value</name>
  <value>0002</value>
  <description>The dfs.umask value for the hive created folders</description>
</property>

Also, when the property hive.metastore.authorization.storage.checks is true, Hive prevents a user from dropping a table when the user does not have permission to delete the underlying files that back the table. The default value for this property is false, but it should be set to true:

<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>true</value>
  <description>Should the metastore do authorization checks against
        the underlying storage for operations like drop-partition (disallow
  the drop-partition if the user in question doesn't have permissions
  to delete the corresponding directory on the storage).</description>
</property>

When running in secure mode, the Hive metastore will make a best-effort attempt to set hive.metastore.execute.setugi to true:

<property>
  <name>hive.metastore.execute.setugi</name>
  <value>false</value>
  <description>In unsecure mode, setting this property to true will
  cause the metastore to execute DFS operations using the client's
  reported user and group permissions. Note that this property must
  be set on both the client and server sides. Further note that its
  best effort. If client sets it to true and server sets it to false,
  client setting will be ignored.</description>
</property>

More details can be found at https://issues.apache.org/jira/browse/HIVE-842, “Authentication Infrastructure for Hive.”

Authorization in Hive

Hive v0.7.0 also added support for specifying authorization settings through HiveQL.[25]

By default, the authorization component is set to false. This needs to be set to true to enable authentication:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
  <description>Enable or disable the hive client authorization</description>
</property>
<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
  <description>The privileges automatically granted to the owner whenever
  a table gets created.An example like "select,drop" will grant select
        and drop privilege to the owner of the table</description>
</property>

By default, hive.security.authorization.createtable.owner.grants is set to null, disabling user access to her own tables. So, we also gave table creators subsequent access to their tables!

Warning

Currently it is possible for users to use the set command to disable authentication by setting this property to false.

Users, Groups, and Roles

Privileges are granted or revoked to a user, a group, or a role. We will walk through granting privileges to each of these entities:

hive> set hive.security.authorization.enabled=true;

hive> CREATE TABLE authorization_test (key int, value string);
Authorization failed:No privilege 'Create' found for outputs { database:default}.
Use show grant to get more details.

Already we can see that our user does not have the privilege to create tables in the default database. Privileges can be assigned to several entities. The first entity is a user: the user in Hive is your system user. We can determine the user and then grant that user permission to create tables in the default database:

hive> set system:user.name;
system:user.name=edward

hive> GRANT CREATE ON DATABASE default TO USER edward;

hive> CREATE TABLE authorization_test (key INT, value STRING);

We can confirm our privileges using SHOW GRANT:

hive> SHOW GRANT USER edward ON DATABASE default;

database        default
principalName   edward
principalType   USER
privilege       Create
grantTime       Mon Mar 19 09:18:10 EDT 2012
grantor edward

Granting permissions on a per-user basis becomes an administrative burden quickly with many users and many tables. A better option is to grant permissions based on groups. A group in Hive is equivalent to the user’s primary POSIX group:

hive> CREATE TABLE authorization_test_group(a int,b int);

hive> SELECT * FROM authorization_test_group;
Authorization failed:No privilege 'Select' found for inputs
{ database:default, table:authorization_test_group, columnName:a}.
Use show grant to get more details.

hive> GRANT SELECT on table authorization_test_group to group edward;

hive> SELECT * FROM authorization_test_group;
OK
Time taken: 0.119 seconds

When user and group permissions are not flexible enough, roles can be used. Users are placed into roles and then roles can be granted privileges. Roles are very flexible, because unlike groups that are controlled externally by the system, roles are controlled from inside Hive:

hive> CREATE TABLE authentication_test_role (a int , b int);

hive> SELECT * FROM authentication_test_role;
Authorization failed:No privilege 'Select' found for inputs
{ database:default, table:authentication_test_role, columnName:a}.
Use show grant to get more details.

hive> CREATE ROLE users_who_can_select_authentication_test_role;

hive> GRANT ROLE users_who_can_select_authentication_test_role TO USER edward;

hive> GRANT SELECT ON TABLE authentication_test_role
    > TO ROLE users_who_can_select_authentication_test_role;

hive> SELECT * FROM authentication_test_role;
OK
Time taken: 0.103 seconds

Privileges to Grant and Revoke

Table 18-1 lists the available privileges that can be configured.

Table 18-1. Privileges

NameDescription

ALL

All the privileges applied at once.

ALTER

The ability to alter tables.

CREATE

The ability to create tables.

DROP

The ability to remove tables or partitions inside of tables.

INDEX

The ability to create an index on a table (NOTE: not currently implemented).

LOCK

The ability to lock and unlock tables when concurrency is enabled.

SELECT

The ability to query a table or partition.

SHOW_DATABASE

The ability to view the available databases.

UPDATE

The ability to load or insert table into table or partition.

Here is an example session that illustrates the use of CREATE privileges:

hive> SET hive.security.authorization.enabled=true;

hive> CREATE DATABASE edsstuff;

hive> USE edsstuff;

hive> CREATE TABLE a (id INT);
Authorization failed:No privilege 'Create' found for outputs
{ database:edsstuff}. Use show grant to get more details.

hive> GRANT CREATE ON DATABASE edsstuff TO USER edward;

hive> CREATE TABLE a (id INT);

hive> CREATE EXTERNAL TABLE ab (id INT);

Similarly, we can grant ALTER privileges:

hive> ALTER TABLE a REPLACE COLUMNS (a int , b int);
Authorization failed:No privilege 'Alter' found for inputs
{ database:edsstuff, table:a}. Use show grant to get more details.

hive> GRANT ALTER ON TABLE a TO USER edward;

hive> ALTER TABLE a REPLACE COLUMNS (a int , b int);

Note that altering a table to add a partition does not require ALTER privileges:

hive> ALTER TABLE a_part_table ADD PARTITION (b=5);

UPDATE privileges are required to load data into a table:

hive> LOAD DATA INPATH '${env:HIVE_HOME}/NOTICE'
    > INTO TABLE a_part_table PARTITION (b=5);
Authorization failed:No privilege 'Update' found for outputs
{ database:edsstuff, table:a_part_table}. Use show grant to get more details.

hive> GRANT UPDATE ON TABLE a_part_table TO USER edward;

hive> LOAD DATA INPATH '${env:HIVE_HOME}/NOTICE'
    > INTO TABLE a_part_table PARTITION (b=5);
Loading data to table edsstuff.a_part_table partition (b=5)

Dropping a table or partition requires DROP privileges:

hive> ALTER TABLE a_part_table DROP PARTITION (b=5);
Authorization failed:No privilege 'Drop' found for inputs
{ database:edsstuff, table:a_part_table}. Use show grant to get more details.

Querying from a table or partition requires SELECT privileges:

hive> SELECT id FROM a_part_table;
Authorization failed:No privilege 'Select' found for inputs
{ database:edsstuff, table:a_part_table, columnName:id}. Use show
grant to get more details.

hive> GRANT SELECT ON TABLE a_part_table TO USER edward;

hive> SELECT id FROM a_part_table;

Warning

The syntax GRANT SELECT(COLUMN) is currently accepted but does nothing.

You can also grant all privileges:

hive> GRANT ALL ON TABLE a_part_table TO USER edward;

Partition-Level Privileges

It is very common for Hive tables to be partitioned. By default, privileges are granted on the table level. However, privileges can be granted on a per-partition basis. To do this, set the table property PARTITION_LEVEL_PRIVILEGE to TRUE:

hive> CREATE TABLE authorize_part (key INT, value STRING)
    > PARTITIONED BY (ds STRING);

hive> ALTER TABLE authorization_part
    > SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");
Authorization failed:No privilege 'Alter' found for inputs
{database:default, table:authorization_part}.
Use show grant to get more details.

hive> GRANT ALTER ON table authorization_part to user edward;

hive> ALTER TABLE authorization_part
    > SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");
hive> GRANT SELECT ON TABLE authorization_part TO USER edward;

hive> ALTER TABLE authorization_part ADD PARTITION (ds='3');

hive> ALTER TABLE authorization_part ADD PARTITION (ds='4');

hive> SELECT * FROM authorization_part WHERE ds='3';

hive> REVOKE SELECT ON TABLE authorization_part partition (ds='3') FROM USER edward;

hive> SELECT * FROM authorization_part WHERE ds='3';
Authorization failed:No privilege 'Select' found for inputs
{ database:default, table:authorization_part, partitionName:ds=3, columnName:key}.
Use show grant to get more details.

hive> SELECT * FROM authorization_part WHERE ds='4';
OK
Time taken: 0.146 seconds

Automatic Grants

Regular users will want to create tables and not bother with granting privileges to themselves to perform subsequent queries, etc. Earlier, we showed that you might want to grant ALL privileges, by default, but you can narrow the allowed privileges instead.

The property hive.security.authorization.createtable.owner.grants determines the automatically granted privileges for a table given to the user who created it. In the following example, rather than granting ALL privileges, the users are automatically granted SELECT and DROP privileges for their own tables:

<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>select,drop</value>
</property>

Similarly, specific users can be granted automatic privileges on tables as they are created. The variable hive.security.authorization.createtable.user.grants controls this behavior. The following example shows how a Hive administrator admin1 and user edward are granted privileges to read every table, while user1 can only create tables:

<property>
  <name>hive.security.authorization.createtable.user.grants</name>
  <value>admin1,edward:select;user1:create</value>
</property>

Similar properties exist to automatically grant privileges to groups and roles. The names of the properties are hive.security.authorization.createtable.group.grants for groups and hive.security.authorization.createtable.role.grants for roles. The values of the properties follow the same format just shown.



[25] See https://issues.apache.org/jira/browse/HIVE-78, “Authorization infrastructure for Hive,” and a draft description of this feature at https://cwiki.apache.org/Hive/languagemanual-auth.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset