Chapter 11. Security

A NOTE FOR EARLY RELEASE READERS

This will be the 6th chapter of the final book. Please note that the GitHub repo will be made active later on.

If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material, please reach out to the author at [email protected].

Authentication

Authentication is the process of proving an identity to a system, and it is essential to any secure system. Computer systems support a variety of authentication methods, including Kerberos, LDAP, certificate, JWT, and password authentication. It is up to each system to support a particular authentication method. In Presto, clients can authenticate to the Presto cluster by one of four methods:

Kerberos

Password via LDAP

Certificates

JSON Web Token (JWT)

By default, no authentication is configured in Presto, and anyone who can access the Presto coordinator can connect. However, authentication is just one piece. Once a principal is authenticated, it is assigned privileges that determine what it is able to do. What you can do is referred to as authorization and is governed by the SystemAccessControl and ConnectorAccessControl. We describe these in more detail in the authorization section of this chapter. For the purpose of this section, let’s assume that once authenticated, the user can perform any action. By default this is true for anything at the system level, such as querying the system catalog. The defaults vary for each connector, however, depending on the default implementation of its ConnectorAccessControl.

Password and LDAP Authentication

Password authentication is an authentication method you probably use every day. By providing a username and password to the system, you prove that you are who you say you are by providing something you know. Presto supports this basic form of authentication using its password authenticator. The password authenticator receives the username and password credentials from the client, validates them, and creates a principal. It is designed to support custom password authenticators deployed as plugins in Presto. We’ll learn more about plugins in the Developing in Presto {Chapter X}. Currently, the password authenticator distributed with Presto is the LDAP authenticator. However, you could write your own custom password authenticator, such as one that validates credentials against a file or a database.

When using the LDAP authenticator, a user passes a username and password to the Presto coordinator. This can be done from the CLI, the JDBC driver, or any other client that supports passing the username and password. The Presto coordinator then validates these credentials with an external LDAP service and creates the principal from the username. To enable LDAP authentication with Presto, you need to add the following line to the config.properties file on the Presto coordinator:

http-server.authentication.type=PASSWORD

By setting the authentication type to PASSWORD, we are telling the Presto coordinator to use the password authenticator to authenticate. Because other password authenticators could be packaged with Presto in the future, or you may have your own custom password authenticator, we must configure it further. We do that by creating an additional file named password-authenticator.properties in the etc directory.

password-authenticator.name=ldap
ldap.url=ldaps://ldap-server:636
ldap.user-bind-pattern=${USER}@prestosql.io

The password-authenticator.name property specifies that the ldap plugin is used as the password authenticator. If you wrote your own plugin, or if Presto adds another plugin in the future, the name would correspond to that plugin. The remaining lines are configuration properties specific to the chosen password authenticator. In the case of LDAP these are ldap.url and ldap.user-bind-pattern.

Note

Presto requires secure LDAP, referred to as LDAPS. Therefore you need to make sure TLS is enabled on your LDAP server. In addition, make sure the URL for the ldap.url property uses ldaps:// and not ldap://. Because this communication occurs over TLS, you need to import the LDAP server’s TLS certificate into the truststore used by the Presto coordinator. If the LDAP server uses a certificate signed by a CA, you instead need to make sure that the CA chain is in the truststore.

In order to use the LDAP authenticator securely, you also need to configure HTTPS access to the Presto coordinator. This ensures that the password sent from the client is not transmitted in cleartext over an unsecured network. We discuss this in the section on encryption.

How does Presto work with LDAP?

LDAP stands for Lightweight Directory Access Protocol and is an industry-standard application protocol for accessing and managing information in a directory server. The data model in the directory server is a hierarchical data structure that stores identity information. We won’t go into more detail on what LDAP is or how it works. If you are interested in learning more, there is plenty of information in books and on the web, such as https://ldap.com. Let’s take a look at an example directory structure and how Presto can use LDAP to authenticate.

Figure 11-1. Example LDAP Directory Structure storing entries of users.

Given the directory, let’s say we want to authenticate user matt using the Presto CLI.

./presto-cli --user matt --password

Specifying --password causes the CLI to prompt you for the password. When you configured the LDAP password authenticator, you set the user bind pattern:

uid=${USER},OU=people,DC=prestosql,DC=io

In LDAP, there are several operation types for interacting with the directory, such as add, delete, modify, search, and bind. Bind is the operation used to authenticate clients to the directory server, and it is what Presto uses to support LDAP authentication. In order to bind, you need to supply an identity and proof of that identity, such as a password. LDAP allows for different types of authentication, but authentication by user identity (also known as the distinguished name) and password is the only type Presto supports for LDAP authentication.

When the user name is passed to the Presto coordinator from the client (the CLI in our example), Presto substitutes the user name into the bind pattern and sends the result as the security principal, along with the password as the security credentials, in the bind request to LDAP. In our example, the principal used to match the distinguished name in the directory structure is uid=matt,OU=people,DC=prestosql,DC=io. Each entry in the LDAP directory may consist of a number of attributes. One such attribute is userPassword, which the bind operation uses to match the password sent.
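The substitution and bind steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory DIRECTORY dictionary, the sample passwords, and the function names build_bind_dn and simulated_bind are hypothetical stand-ins for a real LDAP server and are not part of Presto.

```python
import hmac

# Hypothetical in-memory stand-in for the LDAP directory:
# distinguished name -> userPassword attribute. A real deployment
# queries an external LDAP server over LDAPS instead.
DIRECTORY = {
    "uid=matt,OU=people,DC=prestosql,DC=io": "matt-secret",
    "uid=martin,OU=people,DC=prestosql,DC=io": "martin-secret",
}

# The bind pattern from the example configuration.
BIND_PATTERN = "uid=${USER},OU=people,DC=prestosql,DC=io"


def build_bind_dn(pattern: str, user: str) -> str:
    """Substitute the user name into the ${USER} placeholder."""
    return pattern.replace("${USER}", user)


def simulated_bind(user: str, password: str) -> bool:
    """Mimic a bind: look up the DN and compare the supplied password."""
    dn = build_bind_dn(BIND_PATTERN, user)
    stored = DIRECTORY.get(dn)
    if stored is None:
        return False  # no entry with this distinguished name
    # Constant-time comparison of the supplied and stored passwords.
    return hmac.compare_digest(stored.encode(), password.encode())


print(build_bind_dn(BIND_PATTERN, "matt"))
# uid=matt,OU=people,DC=prestosql,DC=io
```

A real directory server typically stores a hashed userPassword value rather than plain text; the plain strings here only keep the sketch short.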

Figure 11-2. LDAP Authentication using the bind operation against an external LDAP service

Presto can also further restrict access based on group memberships. We will learn more about the importance of groups in the authorization sections. By using groups, you can assign privileges to a group so that the users in that group inherit all the privileges of that group, rather than having to manage privileges individually. Say, for example, you want to allow only people in the developers group to authenticate with Presto. In our example, we want users matt and martin to be able to authenticate to Presto, but not user justin.

To further restrict users based on group membership, Presto allows you to specify additional properties in the password-authenticator.properties file.

ldap.user-base-dn=OU=people,DC=prestosql,DC=io
ldap.group-auth-pattern=(&(objectClass=inetOrgPerson)(uid=${USER})(memberof=CN=developers,OU=groups,DC=prestosql,DC=io))

In our example, the above filter restricts authentication to users under the base distinguished name who belong to the developers group in LDAP. If user justin tries to authenticate, the bind succeeds since justin is a valid user (assuming he entered the correct password), but he is then filtered out because he does not belong to the developers group.
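As a rough illustration of this second step, the following Python sketch substitutes the user name into the group authorization pattern and then applies an equivalent check against a toy directory. The ENTRIES dictionary, the assumption that justin belongs to no group, and the function names are hypothetical; a real LDAP server evaluates the filter string itself.

```python
DEVELOPERS = "CN=developers,OU=groups,DC=prestosql,DC=io"

# The group authorization pattern from the example configuration.
GROUP_AUTH_PATTERN = (
    "(&(objectClass=inetOrgPerson)(uid=${USER})(memberof=" + DEVELOPERS + "))"
)

# Hypothetical directory entries keyed by uid (assumed group memberships).
ENTRIES = {
    "matt": {"objectClass": "inetOrgPerson", "memberof": {DEVELOPERS}},
    "martin": {"objectClass": "inetOrgPerson", "memberof": {DEVELOPERS}},
    "justin": {"objectClass": "inetOrgPerson", "memberof": set()},
}


def group_filter(user: str) -> str:
    """Build the search filter that would be sent to the LDAP server."""
    return GROUP_AUTH_PATTERN.replace("${USER}", user)


def passes_group_filter(user: str) -> bool:
    """Hand-coded equivalent of the filter, evaluated on the toy entries."""
    entry = ENTRIES.get(user)
    if entry is None:
        return False
    return (
        entry["objectClass"] == "inetOrgPerson"
        and DEVELOPERS in entry["memberof"]
    )


print(group_filter("justin"))
print(passes_group_filter("matt"), passes_group_filter("justin"))
```

The point of the sketch is the ordering: the bind proves the password, and the search with the substituted filter then decides whether the authenticated user is allowed in.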

Authorization

In the previous sections we talked about authentication, the various methods of proving who you are. However, in an environment with many users and sensitive data, you do not want every user who authenticates to be able to access any data. To restrict access based on users, we need to configure authorization, which determines what a user can do. We will first examine the SQL model in Presto in terms of what access controls exist. Then we will look at controlling access to Presto at the system level and connector level.

Connector Access Control

Recall the set of objects Presto exposes in order to query data. A catalog is the configured instance of a connector. A catalog may consist of a set of namespaces called schemas. Finally, the schemas contain a collection of tables with columns, and views on top of tables. Presto supports the SQL standard GRANT statement to grant privileges on tables and views to a user or role, and also to grant a user membership in a role. Today Presto supports a subset of the privileges defined by the SQL standard. In Presto you can grant the following privileges on a table or view:

SELECT

INSERT

DELETE

Let’s look at an example:

presto> GRANT SELECT on hive.ontime.flights TO matt;

In this example, the user running this query grants user matt the SELECT privilege on the table flights in the ontime schema of the hive catalog. Optionally, you can specify WITH GRANT OPTION, which allows the grantee to grant the same privileges to others. You can also specify more than one privilege by separating them with commas, or specify ALL PRIVILEGES to grant SELECT, INSERT, and DELETE on the object.

presto> GRANT SELECT, DELETE ON hive.ontime.flights TO matt WITH GRANT OPTION;

Note

In order to grant privileges, you must possess the same privileges along with the GRANT OPTION. Alternatively, you must be an owner of the table or view, or a member of the role that owns the table or view. At the time of writing this book, there is no way in Presto to alter the owner of an object, so that must be done in the underlying data source. For example, using Apache Hive, you can run the following SQL statement:

ALTER SCHEMA ontime SET OWNER USER matt;

A role consists of a collection of privileges that can be assigned to a user or another role. This makes administering privileges for many users easier. By using roles, you avoid the need to assign privileges directly to users. Instead you assign the privileges to a role, and users assigned to that role inherit those privileges. Using roles to manage privileges is generally the best practice.

Let’s reuse our example of the flights table and use roles.

presto> CREATE ROLE admin;
presto> GRANT SELECT, DELETE ON hive.ontime.flights TO admin;
presto> GRANT admin TO USER matt, USER martin;

Now let’s say you wish to remove privileges from a user. Instead of having to remove every privilege on every object granted to the user, you can simply revoke the role from the user.

presto> REVOKE admin from user matt;

In addition to removing users from a role, you can also revoke privileges from a role so that all users of the role no longer have that privilege.

presto> REVOKE DELETE on hive.ontime.flights FROM admin;

In this example, we’ve revoked the DELETE privilege on the flights table from the admin role. However, the admin role and its members still have the SELECT privilege.

Users may belong to multiple roles, and those roles may have distinct or overlapping privileges. When a user runs a query, Presto examines the privileges that the user has, whether assigned directly or through roles. If you wish to use only the privileges of a single role you belong to, you can use the SET ROLE command. For example, say you belong to both an admin role and a developer role, but want to use only the privileges assigned to the developer role.

presto> SET ROLE developer;

You can also set the role to ALL so that Presto examines your privileges for every role you belong to. Or you can set it to NONE.

Note

As of the writing of this book, only the Hive connector supports roles and grants. Because this depends on the connector implementation, each connector needs to implement the ConnectorAccessControl to support this SQL Standard functionality.

System Access Control

System Access Control enforces authorization at the global Presto level, before authorization at the connector level (Connector Access Control). As we learned in the authentication section of this chapter, a security principal is the entity used to authenticate to Presto. The principal may be an individual user or a service account. Presto also separates the principal used for authentication from the user who is running queries. For example, multiple users may share a principal for authenticating, but run queries as themselves. By default, Presto allows any principal that can authenticate to run queries as anyone else.

$ presto-cli --krb5-principal [email protected] --user bob

This is generally not what you would want in a real environment, and preventing it requires additional configuration. Presto comes with a set of built-in System Access Control plugins that you can configure. We will also learn in {{Chapter X}} that the Presto SPI allows for further customization by writing and providing your own Presto plugin. In this section we discuss the built-in plugins available.

Allow All

This System Access Control plugin, which is the default, allows any authenticated user to do anything. It is the least secure option and is not recommended when deploying in a production environment. While it is the default, you can set it explicitly by creating an access-control.properties file within the etc configuration directory.

access-control.name=allow-all

Read Only

The read-only System Access Control plugin is slightly more secure in that it allows only operations that read data or metadata from Presto. This includes SELECT queries, but not CREATE, INSERT, or DELETE queries. You can enable this plugin by creating an access-control.properties file within the etc configuration directory.

access-control.name=read-only

File Based System Access Control

The file based System Access Control allows you to specify certain access control rules for catalog access by users and what users a principal can identify as. These rules are specified in a file that you maintain.

Using the file-based plugin, you can create access control rules for catalog access by users. When using file-based System Access Control, all access to catalogs is denied unless there is a matching rule for a user that explicitly gives them permission. You can enable this plugin by creating an access-control.properties file within the etc configuration directory.

access-control.name=file
security.config-file=etc/rules.json

You’ll notice there is another configuration property, security.config-file, specifying the location of the file-based rules. It must be a JSON file of a certain format and can exist anywhere Presto has access to, but best practice is to keep it in the same directory as all other configuration files.

{
  "catalogs": [
    {
      "user": "admin",
      "catalog": "system",
      "allow": true
    },
    {
      "catalog": "hive",
      "allow": true
    },
    {
      "user": "alice",
      "catalog": "postgresql",
      "allow": true
    },
    {
      "catalog": "system",
      "allow": false
    }
  ]
}

Rules are examined in order, and the first rule that matches is used. In this example, the admin user is allowed access to the system catalog while all other users are denied. We mentioned earlier that all catalog access is denied by default unless there is a matching rule. The exception is that all users have access to the system catalog by default. Therefore, if you want different behavior, you must override it with an explicit rule, as the last rule in this example does. This example also grants access to the hive catalog to all users, but grants access to the postgresql catalog only to the user alice. While this is useful, access controls are only being enforced at the catalog level.
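The first-match behavior can be modeled with a short Python sketch. It assumes that the user and catalog fields are regular expressions that default to matching everything when omitted, and it models only the explicit rules in the example file (not the built-in default for the system catalog). The function name can_access_catalog is hypothetical and this is not Presto's implementation.

```python
import re

# The catalog rules from the example rules file, in order.
RULES = [
    {"user": "admin", "catalog": "system", "allow": True},
    {"catalog": "hive", "allow": True},
    {"user": "alice", "catalog": "postgresql", "allow": True},
    {"catalog": "system", "allow": False},
]


def can_access_catalog(user: str, catalog: str, rules=RULES) -> bool:
    """Return the decision of the first rule matching user and catalog."""
    for rule in rules:
        # Assumption: an omitted field matches any value.
        user_ok = re.fullmatch(rule.get("user", ".*"), user) is not None
        catalog_ok = re.fullmatch(rule.get("catalog", ".*"), catalog) is not None
        if user_ok and catalog_ok:
            return rule["allow"]
    # No explicit rule matched: access is denied.
    return False


for user, catalog in [("admin", "system"), ("bob", "system"),
                      ("bob", "hive"), ("bob", "postgresql")]:
    print(user, catalog, can_access_catalog(user, catalog))
```

Because evaluation stops at the first match, the order of the rules matters: moving the final deny rule for the system catalog to the top would lock out the admin user as well.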

As we mentioned earlier, by default any authenticated principal can run queries as any user. This is generally not desirable, as it allows users to potentially access data as someone else. If the connector has implemented a Connector Access Control, it means that one can authenticate with a principal and pretend to be another user to access data they should not have access to. Therefore it is important to enforce an appropriate matching between the principal and the Presto user running the queries.

Let’s look at an example where we want to set the user name to that of the LDAP principal.

{
  "catalogs": [
    {
      "allow": true
    }
  ],
  "principals": [
    {
      "principal": "(.*)",
      "principal_to_user": "$1",
      "allow": true
    }
  ]
}

This can be further extended to require users to use exactly their Kerberos principal name. In addition, we can match the user name to a group principal that may be shared.

"principals": [
  {
    "principal": "([^/]+)/?.*@prestosql.io",
    "principal_to_user": "$1",
    "allow": true
  },
  {
    "principal": "[email protected]",
    "user": "alice|bob",
    "allow": true
  }
]
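To make the matching concrete, here is a minimal Python sketch of how such principal rules might be evaluated. It rests on several assumptions: a rule applies when the principal pattern matches the whole principal, principal_to_user is a substitution whose result must equal the requested user, user is a regular expression matched against the requested user, and a rule that does not apply falls through to the next one. The shared principal name group@prestosql.io is a hypothetical placeholder, and Python writes the group reference as \1 where the rules file uses $1.

```python
import re

# Python rendering of the principal rules (hypothetical shared principal).
PRINCIPAL_RULES = [
    {"principal": r"([^/]+)/?.*@prestosql.io",
     "principal_to_user": r"\1", "allow": True},
    {"principal": r"group@prestosql\.io",
     "user": "alice|bob", "allow": True},
]


def can_set_user(principal: str, user: str, rules=PRINCIPAL_RULES) -> bool:
    """Decide whether the authenticated principal may run queries as user."""
    for rule in rules:
        match = re.fullmatch(rule["principal"], principal)
        if match is None:
            continue  # the principal pattern does not apply
        if "user" in rule and re.fullmatch(rule["user"], user):
            return rule["allow"]
        if "principal_to_user" in rule:
            # Substitute captured groups and compare with the requested user.
            if match.expand(rule["principal_to_user"]) == user:
                return rule["allow"]
    return False  # no rule matched: deny


print(can_set_user("alice/host.example.com@prestosql.io", "alice"))
print(can_set_user("alice@prestosql.io", "bob"))
print(can_set_user("group@prestosql.io", "bob"))
```

With these rules a personal principal can only run queries under its own short name, while the shared principal can act only as one of the users listed in its rule.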

Encryption

Encryption is the process of transforming data from a readable form to an unreadable form so that only authorized users are able to transform it back to a readable form. This prevents a malicious attacker who intercepts the data from being able to read it. Presto uses standard cryptographic techniques to encrypt data in motion and at rest.

Plain Text

SSN: 123-45-6789

Encrypted Text

5oMgKBe38tSs0pl/Rg7lITExIWtCITEzIfSVydAHF8Gux1cpnCg=

Encryption of data in motion includes:

Data transfer between the client and the Presto coordinator (Figure 3.A)

Data transfer between nodes within the Presto cluster (Figure 3.B)

Data transfer from the data sources (Figure 3.C)

Encryption of data at rest includes:

Data at rest in the data source (Figure 4.A)

Spilling to disk functionality (Figure 4.B)

Each of these options can be configured in Presto independently. For example, you could configure Presto to encrypt client-to-coordinator communication (Figure 3.A) and inter-cluster communication (Figure 3.B), but leave data-at-rest encryption unconfigured. Or you may choose to configure Presto only for encrypted client-to-coordinator communication. While each combination is possible to configure, there are some combinations that don’t make much sense. For example, configuring only inter-cluster communication but leaving client-to-coordinator communication unencrypted would be an unlikely use case. Later in the chapter we provide a reference architecture as part of best practices.

Figure 11-3. Options in Presto for encryption of data in transit. AWS S3 is just one example of a data source.
Figure 11-4. Options in Presto for encryption of data at rest. AWS S3 is just one example of a data source.

As we’ve learned, external and internal communication in Presto happens exclusively over HTTP. In order to secure communication between the client and coordinator, as well as inter-cluster communication, Presto can be configured to use Transport Layer Security (TLS) on top of HTTP, referred to as HTTPS. TLS is a cryptographic protocol for encrypting data over a network, and HTTPS uses TLS to secure the HTTP protocol. You’re probably most familiar with HTTPS from visiting websites, as most use HTTPS these days. For example, if you’re logged into your online bank account, HTTPS is used to encrypt data between the web server and your web browser. On modern web browsers, you’ll often see a padlock icon indicating that the data transfer is secure and that the server you’ve connected to is who it says it is.

Figure 11-5. Padlock indicating secured communication between oreilly.com and our web browser.

Note

TLS is the successor to Secure Sockets Layer (SSL), and sometimes the terms are used interchangeably. SSL is an older protocol with known vulnerabilities and is considered insecure. Because of the prominence and name recognition of SSL, when someone refers to SSL they are often referring to TLS. We will use the correct term, TLS, when discussing encryption.

Encrypting Presto Client to Coordinator Communication

It’s important to secure the traffic between the client and the coordinator for two reasons. First, credentials can be intercepted: if you are using LDAP authentication, the password is sent in clear text, and with Kerberos authentication, the SPNEGO token could be intercepted as well. Second, any data returned from queries is in plain text unless it is secured over TLS.

Understanding the lower-level details of the TLS handshake and its encryption algorithms is not crucial to understanding how Presto encrypts network traffic. But it is important to understand more about certificates, as you will need to create and configure them for use by Presto. Figure 5 depicts communication between a client web browser and a web server secured over HTTPS. This is exactly how HTTPS communication is established with the Presto coordinator when using a Presto client such as the Presto Web UI or CLI.

Figure 11-7. Secured communication over HTTPS between Presto Clients and the Presto Coordinator

A TLS certificate relies on public-key (asymmetric) cryptography using key pairs:

A public key which is available to anyone.

A private key which is kept private by the owner.

Anyone can use the public key to encrypt messages that can only be decrypted by those who have the private key. Therefore any message encrypted with the public key remains secret so long as the owner of the key pair doesn’t share the private key or have it stolen.

A TLS certificate contains information such as the domain the certificate was issued for, the person or company it was issued to, the public key, and several other items. This information is then hashed and encrypted using a private key. This process of signing the certificate creates the signature that is included in the certificate. The certificate is often signed by a trusted Certificate Authority (CA) such as DigiCert or GlobalSign. These authorities verify that the person requesting a certificate is who they say they are and owns the domain stated in the certificate. The certificate is signed with the authority’s private key. The corresponding public keys are made widely available and are typically installed by default on most operating systems and web browsers.

The process of signing the certificate is important during the TLS handshake to verify authenticity. The client uses the signer’s public key to decrypt the signature and compares the result to a hash of the content in the certificate to make sure it was not tampered with. Now that we understand the basics of TLS, let’s look at how we can encrypt data between the Presto clients and coordinator.
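Before moving on, the sign-and-verify idea can be illustrated with a toy Python sketch. It uses textbook RSA with a small key built from two known Mersenne primes and no padding, so it is far too weak for real use and is not how TLS libraries are implemented; the certificate content strings and function names are hypothetical. The modular inverse via pow(E, -1, m) needs Python 3.8 or later.

```python
import hashlib

# Toy key pair for illustration only (insecure key size, no padding).
P = 2**61 - 1                        # a Mersenne prime
Q = 2**89 - 1                        # another Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent


def digest(content: bytes) -> int:
    """Hash the certificate content and reduce it into the key's range."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Signer side: transform the hash with the private key."""
    return pow(digest(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Client side: recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest(content)


cert_content = b"subject=*.example.com; public-key=(server key bytes)"
signature = sign(cert_content)

print(verify(cert_content, signature))                    # untouched content
print(verify(b"subject=*.attacker.example", signature))   # altered content
```

The client never needs the private key: it only needs the signer's public key, which is why distributing CA certificates in truststores is enough to check certificates issued later.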

In order to enable HTTPS on the Presto coordinator you need to set additional properties in the config.properties file.

http-server.https.enabled: Set this to true to enable HTTPS for Presto. By default this is set to false.
http-server.https.port: Specify the HTTPS port to use. 8443 is a common port number.
http-server.https.keystore.path: Specify the path to the Java keystore file that stores the private key and certificate used by Presto for TLS.
http-server.https.keystore.key: Specify the Java keystore password Presto needs to access the keystore.

Note

Even though we are configuring Presto to use HTTPS, by default HTTP is still enabled as well. Enabling HTTPS does not disable HTTP. If you wish to disable HTTP, you must add http-server.http.enabled=false to your config.properties. However, you may want to keep HTTP enabled until you have completed configuring a secured Presto environment. Testing how or if something works over HTTP may be a good way to debug an issue if you run into complications during configuration.

For example, you might add the following lines to your config.properties.

http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/etc/presto/presto_keystore.jks
http-server.https.keystore.key=slickpassword

Remember to restart the Presto coordinator after you update the properties file.

Creating a Java Keystore and Java Truststore

The Java keytool is a command line tool for creating and managing keystores and truststores. Let’s go over a simple example of creating a Java keystore and truststore. These commands are included in the accompanying Git repository we’ve created for the book. For simplicity, we will use self-signed certificates. In the Git repository we also provide examples that simulate a CA to demonstrate how the certificate chain works.

Let’s first create the keystore to be used by the Presto coordinator. The following keytool command will create a public/private key pair and wrap the public key in a certificate that is self-signed.

$ keytool -genkeypair \
    -alias presto_server \
    -dname "CN=*.example.com" \
    -validity 10000 -keyalg RSA -keysize 2048 \
    -keystore keystore.jks \
    -keypass password \
    -storepass password

The generated keystore.jks file is the one you want to specify in the http-server.https.keystore.path property. The password is the one you want to specify in the http-server.https.keystore.key property.

In this example we’re using what is referred to as a wildcard certificate. We specify the Common Name (CN) to be *.example.com. This certificate can be shared by all the nodes in the Presto cluster, assuming they belong to the same domain. It would work with coordinator.example.com, worker1.example.com, worker2.example.com, and so on. The disadvantage of this approach is that any node under the example.com domain can use the certificate.

You could limit the subdomains by using a Subject Alternative Name (SubjectAltName), where you list the subdomains. This allows you to create a single certificate to be shared by a specific, limited list of hosts such as coordinator.example.com, worker1.example.com, and worker2.example.com. An alternative approach is to create a certificate for each node, which requires you to explicitly define the full domain for each. This adds an administrative burden and makes it challenging to scale a Presto cluster, as new nodes require certificates bound to their full domain.

For simplicity, we’re using a self-signed certificate for the Java keystore. When a Presto client connects to the Presto coordinator, the coordinator sends its certificate so the client can verify its authenticity. A truststore is used to verify authenticity: it contains the Presto coordinator certificate if that certificate is self-signed, or a certificate chain if it is signed by a CA. We’ll discuss later how to use the certificate chain of a CA. Because the keystore also contains the certificate, you could simply copy the keystore to the client machine and use that as the truststore. However, that is not secure, as the keystore also contains the private key, which we want to keep secret. In order to create a custom truststore, we will export the certificate from the keystore and import it into a truststore.

First, on the coordinator where your keystore was created, we export the certificate.

$ keytool -exportcert \
    -alias presto_server \
    -file presto_server.cer \
    -keystore keystore.jks \
    -storepass password

This will create a file presto_server.cer. Next we will create the truststore using this certificate.

$ keytool -importcert \
    -alias presto_server \
    -file presto_server.cer \
    -keystore truststore.jks \
    -storepass password

Since the certificate is self-signed, this keytool command asks whether you want to trust the certificate. We want to trust it, so simply type yes and the truststore.jks file is created. Now you can safely distribute this truststore to any machine from which you wish to connect a client to the Presto coordinator.

Now that we have the Presto coordinator enabled with HTTPS using a keystore, and we’ve created a truststore for the clients, we can securely connect to Presto such that the communication between the client and coordinator is encrypted. Here is an example using the Presto CLI.

$ ./presto-cli \
    --server https://presto-coordinator.example.com:8443 \
    --truststore-path ~/truststore.jks \
    --truststore-password password

Encrypting Inter Presto Cluster Communication

Next, let’s look at how to secure inter-cluster communication. Securing inter-cluster communication means that the communication between the Presto workers and the Presto coordinator uses HTTP over TLS, as we did with client-to-coordinator encryption. While client-to-coordinator communication may be over an untrusted network, the internal Presto cluster is generally deployed on a more secure network, which makes secured inter-cluster communication more of an option than a requirement. However, if you’re concerned about a malicious user being able to get onto the network of the Presto cluster, this communication can be encrypted.

As with securing client-to-coordinator communication, inter-cluster communication also relies on the same keystore. The keystore we created on the coordinator must be distributed to all the Presto worker nodes.

Figure 11-8. Secured communication over HTTPS between the nodes in the Presto Cluster

The TLS handshake that establishes trust between the client and server and creates an encrypted channel works the same way for inter-cluster communication. Communication in the cluster is bi-directional, meaning a node may act as a client sending an HTTPS request to another node, or it may act as a server when it receives a request and presents its certificate to the client for verification. Because a node can act as both, it needs the keystore, which contains both the private key and the public key wrapped in the certificate.

As when we configured the coordinator for secured client-to-coordinator communication, the workers must also enable HTTPS. On the coordinator and each worker node, you need to add the following lines to your config.properties.

http-server.https.enabled=true
http-server.https.port=8443
internal-communication.https.required=true
discovery.uri=https://coordinator.example.com:8443
internal-communication.https.keystore.path=/etc/presto/presto_keystore.jks
internal-communication.https.keystore.key=slickpassword

Remember to restart the Presto workers after you update the properties file. Now you have secured both the internal and external communication against eavesdroppers on the network trying to intercept data from Presto.

Important

Once you have everything working, it’s important to disable HTTP by setting http-server.http.enabled=false in config.properties. Otherwise a user could still connect over HTTP.

Certificate Authority vs. Self-Signed Certificates

When you try out Presto for the first time and work to get it configured securely, it’s easiest to use a self-signed certificate. However, in practice this may not be allowed in your organization, as self-signed certificates are much less secure and are susceptible to attacks in certain situations. Therefore you may need to use a certificate that was digitally signed by a CA.

In this approach you still create a keystore as we discussed earlier. But you then create a Certificate Signing Request (CSR) to send to the CA to be signed. The CA verifies you are who you say you are and issues you a certificate signed by them. The certificate is then imported into your keystore. This CA-signed certificate is presented to the client instead of the original self-signed one.

The more interesting part is the Java truststore. Java provides a default truststore, which may already contain the CA. In that case the certificate presented to the client can be verified using the default truststore. However, using the Java default truststore is sometimes cumbersome, and it may not contain the CA. Or perhaps your organization has its own internal CA for issuing organizational certificates to employees and services. So if you’re using a CA, it’s still recommended that you create your own truststore for Presto to use. However, you can import the CA certificate chain instead of the actual certificates being used for Presto.

A certificate chain is a list of two or more TLS certificates where each certificate in the chain is signed by the next one in the chain. At the top of the chain is the root certificate, which is always self-signed by the CA itself. It is used to sign the downstream certificates, known as intermediate certificates. When you are issued a certificate for Presto, it is signed by an intermediate certificate, which is the first one in the chain. The advantage of this is that if there are multiple certificates for multiple Presto clusters, or if certificates are reissued, you don’t need to reimport them into your truststore each time. A CA reissues an intermediate or root certificate far less often.
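The benefit of trusting the CA chain rather than individual server certificates can be seen in a simplified Python model. This sketch checks only issuer links and truststore membership. Real TLS verification also checks cryptographic signatures, validity periods, and host names, none of which is modeled here, and all certificate names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str   # who the certificate identifies
    issuer: str    # who signed it (same as subject when self-signed)


def is_trusted(chain, truststore) -> bool:
    """chain is ordered leaf first, root last; truststore is a set of Certs."""
    if not chain:
        return False
    # Each certificate must be issued by the next one in the chain.
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
    # The top of the chain must be self-signed.
    if chain[-1].issuer != chain[-1].subject:
        return False
    # At least one certificate in the chain must already be trusted.
    return any(cert in truststore for cert in chain)


root = Cert("Example Root CA", "Example Root CA")
intermediate = Cert("Example Intermediate CA", "Example Root CA")
presto = Cert("CN=*.example.com", "Example Intermediate CA")
reissued = Cert("CN=*.cluster2.example.com", "Example Intermediate CA")

# The truststore holds only the CA certificates, not the Presto certificate.
truststore = {root, intermediate}

print(is_trusted([presto, intermediate, root], truststore))
print(is_trusted([reissued, intermediate, root], truststore))
```

Both the original and the reissued certificate are accepted without any change to the truststore, which is the administrative advantage described above.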

Figure 11-9. Presto using a certificate issued by a CA. The truststore only contains the intermediate and root certificates of the CA. The TLS certificate from the Presto coordinator is verified using this certificate chain in the client truststore

Let’s say you had your Presto certificate signed by a CA. In order for the Presto client to trust it, we need to create a truststore containing the intermediate and root certificates. Just as we imported the Presto self-signed certificate in the earlier example, we now import the CA certificate chain. Each certificate in the truststore needs its own alias.

$ keytool -importcert \
    -alias root_ca \
    -file root-ca.cer \
    -keystore truststore.jks \
    -storepass password
$ keytool -importcert \
    -alias intermediate_ca \
    -file intermediate-ca.cer \
    -keystore truststore.jks \
    -storepass password

Note that there may be more than a single intermediate certificate and we’re using a single one here for simplicity.
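To make the chain-of-trust idea concrete, here is a toy sketch in Python. It is not how Java or Presto verifies certificates: real verification checks cryptographic signatures, validity dates, and more. The Cert class, the is_trusted function, and the host names are hypothetical and exist only for illustration. The sketch models each certificate by its subject and issuer names and walks from the presented certificate up to a self-signed root in the truststore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy certificate: names only, no keys or signatures (illustrative assumption)."""
    subject: str
    issuer: str


def is_trusted(leaf, truststore):
    """Return True if the issuer links lead from leaf to a self-signed root in truststore."""
    by_subject = {cert.subject: cert for cert in truststore}
    current = leaf
    seen = set()
    while current.subject not in seen:
        seen.add(current.subject)
        issuer = by_subject.get(current.issuer)
        if issuer is None:
            return False  # the issuer is not in the truststore
        if issuer.subject == issuer.issuer:
            return True  # reached the self-signed root certificate
        current = issuer
    return False  # a loop of issuers that never reaches a root


root = Cert("Root CA", "Root CA")
intermediate = Cert("Intermediate CA", "Root CA")
truststore = [root, intermediate]

# Two Presto clusters, each with a certificate issued by the same intermediate CA.
cluster_a = Cert("presto-a.example.com", "Intermediate CA")
cluster_b = Cert("presto-b.example.com", "Intermediate CA")
# A certificate issued by a CA that is not in the truststore.
unknown = Cert("presto-c.example.com", "Some Other CA")

print(is_trusted(cluster_a, truststore))
print(is_trusted(cluster_b, truststore))
print(is_trusted(unknown, truststore))
```

Both cluster certificates are accepted even though neither was imported into the truststore, which is why reissuing a Presto certificate does not require touching the truststore as long as the CA chain stays the same.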

Certificate Authentication

One of the authentication methods in Presto is certificate authentication. As we learned in the encryption section, the communication between the client and the Presto coordinator can be secured using TLS. As part of the TLS handshake, the server provides the client with a certificate so that the client can verify that the server is who it says it is. In mutual TLS, the client also provides a certificate to the server as part of the handshake so that the client can be authenticated. The server verifies the client certificate in the same way we have seen the client verify the server certificate. To do so, the server needs a truststore that contains the CA chain or the self-signed certificate used for verification.

Figure 11-10. Secured communication over HTTP between Presto Clients and the Presto Coordinator using mutual TLS
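The difference between one-way and mutual TLS can be sketched with Python’s standard ssl module. This is only an analogy for the concept and not Presto’s Java implementation. The file names in the comments are hypothetical. By default, a TLS server does not ask the client for a certificate, whereas a TLS client always verifies the server. Requiring a client certificate on the server side is what makes the handshake mutual.

```python
import ssl

# A TLS client context verifies the server certificate by default.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_default_mode = client_ctx.verify_mode  # ssl.CERT_REQUIRED

# A TLS server context does not request a client certificate by default.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_default_mode = server_ctx.verify_mode  # ssl.CERT_NONE

# Requiring a client certificate turns the handshake into mutual TLS:
# the server now sends a certificate request and rejects clients that
# cannot present a certificate it is able to verify.
server_ctx.verify_mode = ssl.CERT_REQUIRED

# In a real server you would also load the server key pair and the CA
# certificates used to verify clients (hypothetical file names):
# server_ctx.load_cert_chain("server-cert.pem", "server-key.pem")
# server_ctx.load_verify_locations("client-ca-chain.pem")

print(client_default_mode, server_default_mode, server_ctx.verify_mode)
```

The server-side verification material loaded in the commented lines plays the same role as the truststore on the Presto coordinator.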

To configure the Presto coordinator for mutual TLS authentication, we need to add some properties to the config.properties file. The new properties are the http-server.https.truststore settings and http-server.authentication.type.

http-server.http.enabled=false
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/etc/presto/presto_keystore.jks
http-server.https.keystore.key=slickpassword
http-server.https.truststore.path=/etc/presto/presto_truststore.jks
http-server.https.truststore.key=slickpassword
node.internal-address-source=FQDN
internal-communication.https.required=true
internal-communication.https.keystore.path=/etc/presto/presto_keystore.jks
internal-communication.https.keystore.key=slickpassword
http-server.authentication.type=CERTIFICATE

The property http-server.authentication.type indicates the type of authentication to use. In this case Presto is using certificate authentication. This tells the Presto coordinator to use the full TLS handshake for mutual authentication. In particular, the server side, the Presto coordinator, sends a certificate request message to the client as part of the full TLS handshake, asking it to provide a signed certificate for the coordinator to verify. In addition, we configure the truststore on the coordinator so that it can verify the certificate presented by the client. Let’s use our command to connect from the CLI to Presto.

$ ./presto-cli \
    --server https://presto-coordinator.example.com:8443 \
    --truststore-path ~/truststore.jks \
    --truststore-password password \
    --user matt
presto> select * from system.runtime.nodes;
Error running command: Authentication failed: Unauthorized

You’ll find that authentication failed. This is because the client did not specify a keystore, so it had no client certificate to present to the Presto coordinator for mutual authentication. Let’s modify our command to include a keystore. Note that this keystore is different from the keystore on the Presto cluster. It specifically contains the key pair for the client. Let’s first create our keystore on the client side.

$ keytool -genkeypair \
    -alias presto_server \
    -dname CN=matt \
    -validity 10000 -keyalg RSA -keysize 2048 \
    -keystore client-keystore.jks \
    -keypass password \
    -storepass password

In this example, you’ll see that we set the CN to the user matt. A client certificate like this is most likely either self-signed or issued by an organization’s own internal CA. Let’s specify the client keystore in the CLI command.

$ ./presto-cli \
    --server https://presto-coordinator.example.com:8443 \
    --truststore-path ~/truststore.jks \
    --truststore-password password \
    --keystore-path ~/client-keystore.jks \
    --keystore-password password \
    --user matt
presto> select * from system.runtime.nodes;
Query failed: Access Denied: Authenticated user AuthenticatedUser[username=CN=matt, principal=CN=matt] cannot become user matt

Authentication now succeeds, but authorization fails. Recall that authentication proves who you are, whereas authorization controls what you can do. With certificate authentication, Presto extracts the subject distinguished name from the X.509 certificate. This value is used as the principal and is compared to the user name. Recall that the user name is either specified explicitly, for example with the --user option in the CLI, or taken implicitly from the operating system user name. In this case the user matt is compared to the distinguished name in the certificate, CN=matt, and the two do not match. One workaround is to simply pass --user CN=matt to the CLI, but that is cumbersome. Instead, let’s leverage the built-in file-based system access control we learned about earlier for some customization.

First we need to create the file etc/access-control.properties in the Presto installation directory on the Presto coordinator.

access-control.name=file
security.config-file=/etc/presto/rules.json

Next we need to create the rules.json file on the Presto coordinator at the path specified in the access-control.properties file.

{
  "catalogs": [
    {
      "allow": true
    }
  ],
  "principals": [
    {
      "principal": "CN=(.*)",
      "principal_to_user": "$1",
      "allow": true
    }
  ]
}

In this example, we match the principal against a regex with a capturing group. We then use that capturing group to map the principal to a user. Here the regex matches CN=matt, where matt is the part captured by the group and becomes the user. Once you create these files and restart the Presto coordinator, both certificate authentication and authorization of that subject principal as the user will work.

presto> select * from system.runtime.nodes\G
-[ RECORD 1 ]+--------------------------------------------------------
node_id | i-0779df73d79748087
http_uri | https://coordinator.example.com:8443
node_version | 312
coordinator | true
state | active
-[ RECORD 2 ]+--------------------------------------------------------
node_id | i-0d3fba6fcba08ddfe
http_uri | https://worker-1.example.com:8443
node_version | 312
coordinator | false
state | active
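The principal-to-user rule used above can be sketched in a few lines of Python. This is a minimal illustration of the regex mapping only, not Presto’s implementation. The function names are hypothetical, and the sketch assumes that the principal regex must match the whole principal and that $1 in principal_to_user refers to the first capturing group.

```python
import re


def principal_to_user(principal, principal_regex, substitution):
    """Map a principal to a user name, or return None if the regex does not match."""
    match = re.fullmatch(principal_regex, principal)
    if match is None:
        return None
    # The rules file writes group references as $1; Python templates use \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", substitution)
    return match.expand(template)


def can_become_user(principal, user, rule):
    """Allow the request only if the rule allows it and the mapped user equals the requested user."""
    mapped = principal_to_user(principal, rule["principal"], rule["principal_to_user"])
    return rule["allow"] and mapped == user


rule = {"principal": "CN=(.*)", "principal_to_user": "$1", "allow": True}

print(principal_to_user("CN=matt", rule["principal"], rule["principal_to_user"]))
print(can_become_user("CN=matt", "matt", rule))
print(can_become_user("CN=matt", "alice", rule))
```

Note that (.*) captures everything after CN=, so a distinguished name with more components, such as CN=matt,OU=Engineering, would need a more specific pattern to extract only the common name.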

Hive Connector Security

In {Chapter X} we learned how to use the Hive connector to query data from distributed storage such as HDFS, S3, Microsoft Azure Storage, Google Cloud Storage, and others. Securing the Hive connector involves the following concerns:

Reading encrypted data from the storage

Encrypting data in transit when it is read from the data source into Presto

Authentication to the Hive Metastore used as the catalog for the Hive connector

Hive Connector Authentication

Apache Hadoop comprises many different components such as HDFS, NameNode, Zookeeper, HiveServer, Hive Metastore Service, YARN, and others. We learned earlier in this chapter about Presto’s integration with Kerberos as one of the options for authenticating clients to Presto. Hadoop has largely standardized on Kerberos as its de facto authentication mechanism, and enterprises that secure Hadoop typically do so with Kerberos. This means that any software, whether an internal Hadoop component or something external to Hadoop, needs to be able to authenticate to Hadoop using Kerberos. The Presto Hive connector has Kerberos integration, and it must be configured when you use the Hive connector to query data from HDFS in a secured Hadoop cluster. Recall from {Chapter X} on the Hive connector that Hive is essentially three parts:

Hive runtime

Hive metastore

Hive warehouse table format on HDFS (or other distributed storage)

As we learned, the name Hive connector is a bit of a misnomer because the connector does not use the Hive runtime. Presto replaces the Hive runtime in this stack but still leverages the Hive Metastore Service and knowledge of the Hive table format in order to query data directly on disk. To use those two, Presto needs to authenticate to the Hive Metastore Service and to HDFS. This section focuses on the Presto-specific integration points with Hadoop. For more information about security with Hive or Hadoop, see Programming Hive (O’Reilly) and Hadoop: The Definitive Guide (O’Reilly).
