To understand Hive security, we have to backtrack and understand Hadoop security and the history of Hadoop. Hadoop started out as a subproject of Apache Nutch. At that time and through its early formative years, features were prioritized over security. Security is more complex in a distributed system because multiple components across different machines need to communicate with each other.
Unsecured Hadoop like the versions before the v0.20.205 release
derived the username by forking a call to the whoami
program. Users are free to change this parameter by setting the hadoop.job.ugi
property for FSShell
(filesystem) commands. Map and reduce
tasks all run under the same system user (usually hadoop
or mapred
) on TaskTracker nodes.
Also, Hadoop components are typically listening on ports with high numbers.
They are also typically launched by nonprivileged users (i.e., users other
than root).
The recent efforts to secure Hadoop involved several changes,
primarily the incorporation of Kerberos
authorization support, but also other changes to close vulnerabilities.
Kerberos allows mutual authentication between client and server. A client’s
request for a ticket is passed along
with a request. Tasks on the TaskTracker are run as the
user who launched the job. Users are no longer able to impersonate other
users by setting the hadoop.job.ugi
property. For this to work, all Hadoop components must use Kerberos security
from end to end.
Hive was created before any of this Kerberos support was added to Hadoop, and Hive is not yet fully compliant with the Hadoop security changes. For example, the connection to the Hive metastore may use a direct connection to a JDBC database or it may go through Thrift, which will have to take actions on behalf of the user. Components like the Thrift-based HiveService also have to impersonate other users. The file ownership model of Hadoop, where one owner and group own a file, is different than the model many databases have implemented where access is granted and revoked on a table in a row- or column-based manner.
This chapter attempts to highlight components of Hive that operate differently between secure and nonsecure Hadoop. For more information on Hadoop security, consult Hadoop: The Definitive Guide by Tom White (O’Reilly).
Security support in Hadoop is still relatively new and evolving. Some parts of Hive are not yet compliant with Hadoop security support. The discussion in this section summarizes the current state of Hive security, but it is not meant to be definitive.
For more information on Hive security, consult the Security wiki page https://cwiki.apache.org/confluence/display/Hive/Security. Also, more than in any other chapter in this book, we’ll occasionally refer you to Hive JIRA entries for more information.
Hive v0.7.0 added integration with Hadoop security,[24] meaning, for example, that when Hive sends MapReduce jobs to the JobTracker in a secure cluster, it will use the proper authentication procedures. User privileges can be granted and revoked, as we’ll discuss below.
There are still several known security gaps involving Thrift and other components, as listed on the security wiki page.
When files and directories are owned by different users, the
permissions set on the files become important. The HDFS permissions system
is very similar to the Unix model, where there are three entities:
user, group, and
others. Also, there are three permissions:
read, write, and
execute. Hive has a configuration variable hive.files.umask.value
that defines a
umask
value used to set the default permissions of
newly created files, by masking bits:
<property>
<name>
hive.files.umask.value</name>
<value>
0002</value>
<description>
The dfs.umask value for the hive created folders</description>
</property>
Also, when the property hive.metastore.authorization.storage.checks
is
true
, Hive prevents a user from
dropping a table when the user does not have permission to delete the
underlying files that back the table. The default value for this property
is false
, but it should be set to
true
:
<property>
<name>
hive.metastore.authorization.storage.checks</name>
<value>
true</value>
<description>
Should the metastore do authorization checks against the underlying storage for operations like drop-partition (disallow the drop-partition if the user in question doesn't have permissions to delete the corresponding directory on the storage).</description>
</property>
When running in secure mode, the Hive metastore will make a
best-effort attempt to set hive.metastore.execute.setugi
to true:
<property>
<name>
hive.metastore.execute.setugi</name>
<value>
false</value>
<description>
In unsecure mode, setting this property to true will cause the metastore to execute DFS operations using the client's reported user and group permissions. Note that this property must be set on both the client and server sides. Further note that its best effort. If client sets it to true and server sets it to false, client setting will be ignored.</description>
</property>
More details can be found at https://issues.apache.org/jira/browse/HIVE-842, “Authentication Infrastructure for Hive.”
Hive v0.7.0 also added support for specifying authorization settings through HiveQL.[25]
By default, the authorization component is set to false
. This needs to be set to true
to enable authentication:
<property>
<name>
hive.security.authorization.enabled</name>
<value>
true</value>
<description>
Enable or disable the hive client authorization</description>
</property>
<property>
<name>
hive.security.authorization.createtable.owner.grants</name>
<value>
ALL</value>
<description>
The privileges automatically granted to the owner whenever a table gets created.An example like "select,drop" will grant select and drop privilege to the owner of the table</description>
</property>
By default, hive.security.authorization.createtable.owner.grants
is set to null
, disabling user access
to her own tables. So, we also gave table creators subsequent access to
their tables!
Currently it is possible for users to use the set
command to disable authentication by
setting this property to false
.
Privileges are granted or revoked to a user, a group, or a role. We will walk through granting privileges to each of these entities:
hive> set hive.security.authorization.enabled=true; hive> CREATE TABLE authorization_test (key int, value string); Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.
Already we can see that our user does not have the privilege to create tables in the default database. Privileges can be assigned to several entities. The first entity is a user: the user in Hive is your system user. We can determine the user and then grant that user permission to create tables in the default database:
hive> set system:user.name; system:user.name=edward hive> GRANT CREATE ON DATABASE default TO USER edward; hive> CREATE TABLE authorization_test (key INT, value STRING);
We can confirm our privileges using SHOW
GRANT
:
hive> SHOW GRANT USER edward ON DATABASE default; database default principalName edward principalType USER privilege Create grantTime Mon Mar 19 09:18:10 EDT 2012 grantor edward
Granting permissions on a per-user basis becomes an administrative burden quickly with many users and many tables. A better option is to grant permissions based on groups. A group in Hive is equivalent to the user’s primary POSIX group:
hive> CREATE TABLE authorization_test_group(a int,b int); hive> SELECT * FROM authorization_test_group; Authorization failed:No privilege 'Select' found for inputs { database:default, table:authorization_test_group, columnName:a}. Use show grant to get more details. hive> GRANT SELECT on table authorization_test_group to group edward; hive> SELECT * FROM authorization_test_group; OK Time taken: 0.119 seconds
When user and group permissions are not flexible enough, roles
can be used. Users are placed into roles
and then roles can be granted privileges. Roles are very flexible,
because unlike groups that are controlled externally by the system,
roles are controlled from inside Hive:
hive
>
CREATE
TABLE
authentication_test_role
(
a
int
,
b
int
);
hive
>
SELECT
*
FROM
authentication_test_role
;
Authorization
failed
:
No
privilege
'Select'
found
for
inputs
{
database
:
default
,
table
:
authentication_test_role
,
columnName
:
a
}
.
Use
show
grant
to
get
more
details
.
hive
>
CREATE
ROLE
users_who_can_select_authentication_test_role
;
hive
>
GRANT
ROLE
users_who_can_select_authentication_test_role
TO
USER
edward
;
hive
>
GRANT
SELECT
ON
TABLE
authentication_test_role
>
TO
ROLE
users_who_can_select_authentication_test_role
;
hive
>
SELECT
*
FROM
authentication_test_role
;
OK
Time
taken
:
0
.
103
seconds
Table 18-1 lists the available privileges that can be configured.
Name | Description |
---|---|
| All the privileges applied at once. |
| The ability to alter tables. |
| The ability to create tables. |
| The ability to remove tables or partitions inside of tables. |
| The ability to create an index on a table (NOTE: not currently implemented). |
| The ability to lock and unlock tables when concurrency is enabled. |
| The ability to query a table or partition. |
| The ability to view the available databases. |
| The ability to load or insert table into table or partition. |
Here is an example session that illustrates the use of
CREATE
privileges:
hive
>
SET
hive
.
security
.
authorization
.
enabled
=
true
;
hive
>
CREATE
DATABASE
edsstuff
;
hive
>
USE
edsstuff
;
hive
>
CREATE
TABLE
a
(
id
INT
);
Authorization
failed
:
No
privilege
'Create'
found
for
outputs
{
database
:
edsstuff
}
.
Use
show
grant
to
get
more
details
.
hive
>
GRANT
CREATE
ON
DATABASE
edsstuff
TO
USER
edward
;
hive
>
CREATE
TABLE
a
(
id
INT
);
hive
>
CREATE
EXTERNAL
TABLE
ab
(
id
INT
);
Similarly, we can grant ALTER
privileges:
hive
>
ALTER
TABLE
a
REPLACE
COLUMNS
(
a
int
,
b
int
);
Authorization
failed
:
No
privilege
'Alter'
found
for
inputs
{
database
:
edsstuff
,
table
:
a
}
.
Use
show
grant
to
get
more
details
.
hive
>
GRANT
ALTER
ON
TABLE
a
TO
USER
edward
;
hive
>
ALTER
TABLE
a
REPLACE
COLUMNS
(
a
int
,
b
int
);
Note that altering a table to add a partition does not require
ALTER
privileges:
hive
>
ALTER
TABLE
a_part_table
ADD
PARTITION
(
b
=
5
);
UPDATE
privileges are required
to load data into a table:
hive
>
LOAD
DATA
INPATH
'${env:HIVE_HOME}/NOTICE'
>
INTO
TABLE
a_part_table
PARTITION
(
b
=
5
);
Authorization
failed
:
No
privilege
'Update'
found
for
outputs
{
database
:
edsstuff
,
table
:
a_part_table
}
.
Use
show
grant
to
get
more
details
.
hive
>
GRANT
UPDATE
ON
TABLE
a_part_table
TO
USER
edward
;
hive
>
LOAD
DATA
INPATH
'${env:HIVE_HOME}/NOTICE'
>
INTO
TABLE
a_part_table
PARTITION
(
b
=
5
);
Loading
data
to
table
edsstuff
.
a_part_table
partition
(
b
=
5
)
Dropping a table or partition requires DROP
privileges:
hive
>
ALTER
TABLE
a_part_table
DROP
PARTITION
(
b
=
5
);
Authorization
failed
:
No
privilege
'Drop'
found
for
inputs
{
database
:
edsstuff
,
table
:
a_part_table
}
.
Use
show
grant
to
get
more
details
.
Querying from a table or partition requires SELECT
privileges:
hive
>
SELECT
id
FROM
a_part_table
;
Authorization
failed
:
No
privilege
'Select'
found
for
inputs
{
database
:
edsstuff
,
table
:
a_part_table
,
columnName
:
id
}
.
Use
show
grant
to
get
more
details
.
hive
>
GRANT
SELECT
ON
TABLE
a_part_table
TO
USER
edward
;
hive
>
SELECT
id
FROM
a_part_table
;
The syntax GRANT
SELECT(COLUMN)
is currently accepted but does nothing.
You can also grant all privileges:
hive
>
GRANT
ALL
ON
TABLE
a_part_table
TO
USER
edward
;
It is very common for Hive tables to be partitioned. By
default, privileges are granted on the table level. However, privileges
can be granted on a per-partition basis. To do this, set the table
property PARTITION_LEVEL_PRIVILEGE
to
TRUE
:
hive
>
CREATE
TABLE
authorize_part
(
key
INT
,
value
STRING
)
>
PARTITIONED
BY
(
ds
STRING
);
hive
>
ALTER
TABLE
authorization_part
>
SET
TBLPROPERTIES
(
"PARTITION_LEVEL_PRIVILEGE"
=
"TRUE"
);
Authorization
failed
:
No
privilege
'Alter'
found
for
inputs
{
database
:
default
,
table
:
authorization_part
}
.
Use
show
grant
to
get
more
details
.
hive
>
GRANT
ALTER
ON
table
authorization_part
to
user
edward
;
hive
>
ALTER
TABLE
authorization_part
>
SET
TBLPROPERTIES
(
"PARTITION_LEVEL_PRIVILEGE"
=
"TRUE"
);
hive
>
GRANT
SELECT
ON
TABLE
authorization_part
TO
USER
edward
;
hive
>
ALTER
TABLE
authorization_part
ADD
PARTITION
(
ds
=
'3'
);
hive
>
ALTER
TABLE
authorization_part
ADD
PARTITION
(
ds
=
'4'
);
hive
>
SELECT
*
FROM
authorization_part
WHERE
ds
=
'3'
;
hive
>
REVOKE
SELECT
ON
TABLE
authorization_part
partition
(
ds
=
'3'
)
FROM
USER
edward
;
hive
>
SELECT
*
FROM
authorization_part
WHERE
ds
=
'3'
;
Authorization
failed
:
No
privilege
'Select'
found
for
inputs
{
database
:
default
,
table
:
authorization_part
,
partitionName
:
ds
=
3
,
columnName
:
key
}
.
Use
show
grant
to
get
more
details
.
hive
>
SELECT
*
FROM
authorization_part
WHERE
ds
=
'4'
;
OK
Time
taken
:
0
.
146
seconds
Regular users will want to create tables and not bother with
granting privileges to themselves to perform subsequent queries, etc.
Earlier, we showed that you might want to grant ALL
privileges, by default, but you can narrow
the allowed privileges instead.
The property hive.security.authorization.createtable.owner.grants
determines the automatically granted privileges for a table given to the
user who created it. In the following example, rather than granting
ALL
privileges, the users are
automatically granted SELECT
and
DROP
privileges for their own
tables:
<property>
<name>
hive.security.authorization.createtable.owner.grants</name>
<value>
select,drop</value>
</property>
Similarly, specific users can be granted automatic privileges on
tables as they are created. The variable
hive.security.authorization.createtable.user.grants
controls this behavior. The
following example shows how a Hive administrator admin1
and user edward
are granted privileges to read every
table, while user1
can only create
tables:
<property>
<name>
hive.security.authorization.createtable.user.grants</name>
<value>
admin1,edward:select;user1:create</value>
</property>
Similar properties exist to automatically grant privileges
to groups and roles. The names of
the properties are hive.security.authorization.createtable.group.grants
for groups and hive.security.authorization.createtable.role.grants
for roles. The values of the properties follow the same format
just shown.
[25] See https://issues.apache.org/jira/browse/HIVE-78, “Authorization infrastructure for Hive,” and a draft description of this feature at https://cwiki.apache.org/Hive/languagemanual-auth.html.