Chapter 9: An Overview Of Security Aspects

In Chapter 8, Exploring the Admin User Interface, we learned about the admin user interface, which provides several features to help manage the cluster. The admin user interface is also very useful in debugging issues as it provides metrics about the health of the cluster, transactions, and queries.

In this chapter, we will learn about security. The increase in cloud usage has massively expanded the attack surface available to hackers, who now have many ways to compromise a given system and leak its data. Ransomware, phishing, malware, and spyware are some of the most popular attack vectors that are intended to harm your systems and leak their data, either for notoriety or for money. Cloud cybersecurity experts are among the most sought-after professionals in today's world. Since a distributed transactional SQL database will be at the heart of any core infrastructure, it's of the utmost importance to pay attention to the security aspects.

In this chapter, we will start by providing a brief introduction to various aspects of security, followed by a deeper discussion of each of those aspects and various options that are available when using CockroachDB. To do this, we will cover the following topics:

  • Introduction to security concepts
  • Client and node authentication
  • Authorization mechanisms
  • Data encryption at rest and in flight
  • Audit logging
  • RTO and RPO
  • Keeping the network secure
  • Security best practices

Technical requirements

The examples in this chapter require you to have CockroachDB installed. If you still haven't done so, please refer to the Technical requirements section of Chapter 2, How Does CockroachDB Work Internally?.

Introduction to security concepts

Authentication is required for a SQL client that executes queries against a CockroachDB cluster and for nodes in a cluster that communicate with each other. In this section, we will go over some of the available options for client and node authentication.

Authorization is about deciding who can access what resources. In this section, we will discuss users, roles, and configuring privilege access to various schema objects.

Data at rest refers to data that is stored on a physical storage device. Encrypting the data on a storage device renders it unreadable, even when a hacker gets hold of the encrypted files. Data in flight refers to data on the wire as it's being transferred between the client and the CockroachDB cluster, or between the nodes in a CockroachDB cluster. It is important to encrypt the data on the wire as well, since this renders it useless to anyone who manages to sniff the traffic.

Audit logging is a log collection process that keeps track of all the activities that were performed on the data, including the time, the client's device information, the query, the operation, the event, and the user. This information is very useful if a data leak occurs and we want to investigate or ensure our system is compliant with certain standards.

The recovery time objective (RTO) and recovery point objective (RPO) are two important parameters when it comes to disaster recovery and data protection. RTO refers to how soon we can resume operations after a given failure, while RPO refers to the maximum amount of data loss we can tolerate, measured as the time between the last recoverable copy of the data and the failure. We will discuss these terms with examples and learn how to prepare for disaster recovery later in this chapter.

With deployments that span on-premises, multi-cloud, and hybrid cloud environments, network security is key to preventing hackers from gaining access to the data. In this section, we will briefly cover some of the options for providing maximum network security.

In the last section, we will list some security best practices.

In the next section, we will learn about authentication and some of the options we can use to provide authentication in CockroachDB.

Client and node authentication

Authentication is the process of verifying the identity of a system that is making a request. In the context of CockroachDB, this can be a client executing queries on a CockroachDB cluster or the nodes in a cluster that are talking to each other. Authentication can be achieved by using certificates and keys. Let's look at an example. Assume that foo and bar want to talk to each other, and before they start talking, they want to ensure they are indeed talking to each other. First, we must understand the concept of public-private key pairs. Any message that you encrypt with a public key can only be decrypted using its corresponding private key. The pair is unique in that no other key can be used for decryption, and the two keys are necessarily different from each other. The following diagram shows how public key encryption works:

Figure 9.1 – Public key encryption
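
The following OpenSSL commands sketch this flow on the command line; the key and file names (foo.key, foo.pub, msg.txt) are hypothetical placeholders:

# Generate foo's private key and extract the matching public key
$ openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out foo.key
$ openssl pkey -in foo.key -pubout -out foo.pub

# Anyone holding the public key can encrypt a message
$ echo 'hello bar' > msg.txt
$ openssl pkeyutl -encrypt -pubin -inkey foo.pub -in msg.txt -out msg.enc

# Only the holder of the private key can decrypt it
$ openssl pkeyutl -decrypt -inkey foo.key -in msg.enc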

So, going back to our example, foo and bar each have a pair of public and private keys. At the beginning of their communication, their public keys must be exchanged. During this exchange, both parties need a way to know that the public key is coming from the intended party, not an imposter, and that the public key is authentic. For example, if foo shares its public key with bar, then bar should have a way to ensure that it's coming from foo and that the public key belongs to foo. This is where a certificate authority (CA) comes into the picture. The following diagram shows how a digital certificate is generated by the CA once a certificate signing request (CSR) is received from the applicant:

Figure 9.2 – Digital certificate generation by CA

A CA is an entity that issues certificates for other entities. Let's say that baz is a CA. Now, baz can issue a certificate on behalf of foo, which contains foo's public key and some more identity information about foo. This certificate is signed with baz's private key. foo and bar both have the CA's (baz's) public key. So, we now have baz's certificate, which is used to sign foo's and bar's certificates. We also have the certificates of foo and bar that have been signed by the CA, which contain their public keys. When the communication between foo and bar starts, they can exchange these certificates and verify their authenticity using the CA's public key, as both were issued by the same CA. After that, they can securely talk to each other.
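
To make this concrete, here is a minimal sketch of the CSR flow using OpenSSL, with baz acting as the CA. All file names and subject names are hypothetical:

# baz creates a self-signed CA certificate and private key
$ openssl req -new -x509 -days 365 -nodes -subj "/CN=baz" -keyout ca.key -out ca.crt

# foo generates a key pair and a certificate signing request (CSR)
$ openssl req -new -newkey rsa:2048 -nodes -subj "/CN=foo" -keyout foo.key -out foo.csr

# baz signs foo's CSR, producing foo's certificate
$ openssl x509 -req -in foo.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out foo.crt

# bar, or anyone holding ca.crt, can verify that foo's certificate was issued by baz
$ openssl verify -CAfile ca.crt foo.crt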

First, let's look at some of the CockroachDB commands that can be used to generate certificates and keys.

Generating certificates and keys

The following are some of the options you have to generate certificates and keys that can be used for authentication:

  • create-ca: Creates the self-signed CA. This can be used to create and authenticate certificates for all the nodes within the cluster. You can also use an external CA for this.
  • create-node: Creates a certificate and key for a specific node in the cluster. You can specify all the addresses (IP address or hostname) to reach this node.
  • create-client: Creates a certificate and key for a user accessing the cluster from a client. If you have multiple users, you should generate a separate certificate for each user.
  • list: Lists the certificates and keys found in the certificate directory that's passed as an input argument.

The following commands can be used to generate a certificate and key for the CA, node, and client. First, generate the CA certificate and key:

$ cockroach cert create-ca \
--certs-dir=<certs_directory> \
--ca-key=<CA_key_directory>

The following command can be used to generate the node's certificate and key. Any number of hostnames or IP addresses at which the node can be reached may be listed:

$ cockroach cert create-node \
<hostname_1_of_node> \
<hostname_2_of_node> \
--certs-dir=<certs_directory> \
--ca-key=<CA_key_directory>

The following command can be used to generate the client's certificate and key:

$ cockroach cert create-client <user_name> \
--certs-dir=<certs_directory> \
--ca-key=<CA_key_directory>

The following command lists the certificates and keys in a given directory:

$ cockroach cert list \
--certs-dir=<certs_directory>

You can also use the OpenSSL tool to generate the certificates and keys. This tool can be downloaded from https://www.openssl.org/source/. Please refer to the documentation on the aforementioned website to learn how to use this tool.

Next, we will learn how these certificates and keys can be used to provide client and node authentication.

Client authentication

Whenever a SQL client sends a request to a CockroachDB cluster, it's important to authenticate the client before serving the request. In the following subsections, we will look at some of the ways in which we can ensure client authentication.

Password authentication without TLS

This option is what we have used in all of our examples in this book. It's useful in cases where both the client and the server are within a secured perimeter that doesn't need further transport layer security. However, you still need a user and password to identify yourself. Let's learn how to create a password for a user:

  1. First, create a user with a password, as follows:

    > CREATE USER kishen WITH LOGIN PASSWORD 'oldmacdonaldhadafarm(I(Io';

  2. Then, you can use that user from the client:

    $ cockroach sql --user=kishen --insecure
    #
    # Welcome to the CockroachDB SQL shell.
    # All statements must be terminated by a semicolon.
    # To exit, type: \q.
    #
    Enter password:

Next, we will learn how to perform password authentication using transport layer security.

Password authentication with TLS

If you wish to perform password authentication with Transport Layer Security (TLS), you must provide the certificate directory in the input, as follows:

$ cockroach sql --user=<user_name> \
--certs-dir=<certs_directory>

Single sign-on (SSO) authentication

Single sign-on (SSO) is an Enterprise-only feature that requires an external OAuth 2.0 identity provider. The user logs in to the external identity provider from the admin user interface. Once authenticated, the user is redirected to the CockroachDB cluster through a callback URL, and an ID token is used to authorize the callback. This ID token is a security token that contains claims about the user's authentication by the OAuth 2.0 identity provider. The ID token takes the form of a JSON Web Token (JWT), which is a JSON document containing details about the authentication, including the expiry time of the token, a unique subject identifier, and an issuer identifier. CockroachDB can match the ID token to a SQL user and subsequently create a web session for that SQL user. With the web session, the user can access the admin user interface.
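
As a rough sketch, OIDC-based SSO for the admin user interface is configured through the server.oidc_authentication family of cluster settings. The provider URL, client credentials, and cluster host below are placeholders for your own identity provider's values:

$ SET CLUSTER SETTING server.oidc_authentication.provider_url = '<identity_provider_url>';
$ SET CLUSTER SETTING server.oidc_authentication.client_id = '<client_id>';
$ SET CLUSTER SETTING server.oidc_authentication.client_secret = '<client_secret>';
$ SET CLUSTER SETTING server.oidc_authentication.redirect_url = 'https://<cluster_host>:8080/oidc/v1/callback';
$ SET CLUSTER SETTING server.oidc_authentication.enabled = true;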

Generic Security Services API (GSSAPI) using Kerberos authentication

This is an Enterprise-only feature. You would typically need a Kerberos environment, a GSSAPI-compatible PostgreSQL client, a service principal, and a Kerberos client to be installed to provide this authentication mechanism. Please refer to the GSSAPI setup guide at https://www.cockroachlabs.com/docs/stable/gssapi_authentication.html for specific guidelines.

Node authentication

We learned how to generate certificates and keys in the Generating certificates and keys section. Once we've generated the nodes' certificates, we just have to make sure they are configured correctly when we start each node. If they're not, you will see TLS connection errors when the nodes try to talk to each other. You can provide the certificate directory when you start a node like so:

$ cockroach start \
--certs-dir=<certs_directory> \
--store=node1 \
--listen-addr=localhost:26257 \
--http-addr=localhost:8080 \
--join=localhost:26257,localhost:26258,localhost:26259

You can also use a tool such as Vault (https://www.vaultproject.io/) to dynamically generate short-lived certificates. This will make them available for all the nodes in a cluster and automatically rotate the certificates.
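
A minimal manual rotation might look as follows. The --overwrite flag replaces the node certificate on disk, and sending SIGHUP asks a running node to reload the certificates in its certificate directory without a restart; the process-matching pattern here is just an illustration:

# Regenerate the node certificate and key in place
$ cockroach cert create-node <hostname_of_node> \
--certs-dir=<certs_directory> \
--ca-key=<CA_key_directory> \
--overwrite

# Signal the running node to reload its certificates without downtime
$ pkill -SIGHUP -f 'cockroach start'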

In the next section, we will learn about authorization and how to give granular permissions to schema objects.

Authorization mechanisms

Authorization involves controlling access to schema objects and giving users and roles the minimum required permissions and privileges. Authorization becomes critical as the data size, the number of nodes, clients, and clusters, and the variety of use cases grow.

In the context of CockroachDB, the terms user and role can be used interchangeably, as there is no technical distinction between them. Even when executing CockroachDB commands, role and user can be used interchangeably in some cases. An example of this is as follows:

  1. First, we must execute SHOW ROLES:

    $ SHOW ROLES;
      username | options | member_of
    -----------+---------+------------
      admin    |         | {}
      root     |         | {admin}
    (2 rows)

    Time: 13ms total (execution 12ms / network 1ms)

  2. Next, we must execute SHOW USERS:

    $ SHOW USERS;
      username | options | member_of
    -----------+---------+------------
      admin    |         | {}
      root     |         | {admin}
    (2 rows)

    Time: 9ms total (execution 9ms / network 0ms)

As you can see, both the preceding outputs are identical. Henceforth, we will only use the term user for simplicity.

A user can be allowed to perform specific actions on specific schema objects. Also, if a user is a member of another role, it inherits all of that role's privileges.
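
As a sketch, the following statements create a report_readers role with read access to a table and then add a hypothetical user, maria, to that role:

$ CREATE ROLE report_readers;
$ GRANT SELECT ON TABLE startrek.episodes TO report_readers;
$ CREATE USER maria WITH LOGIN PASSWORD '<password>';
$ GRANT report_readers TO maria;

Here, maria can read startrek.episodes without being granted SELECT directly, since she inherits the privilege through her membership in report_readers.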

A user can be created with the CREATE USER command, as shown in the following code:

CREATE USER 'kishen' WITH LOGIN PASSWORD 'oldmacdonaldhadaform(I(I0' VALID UNTIL '2021-12-25';

Make sure that you start your cluster with the certificates directory configured and without the --insecure flag. Otherwise, you will see the following error message:

ERROR: setting or updating a password is not supported in insecure mode
SQLSTATE: 28P01

Next, let's look at giving privileges to specific objects.

Roles

You can create roles that perform specific functions. For example, you can create a role for just creating other roles, like so:

$ CREATE ROLE can_create_role WITH CREATEROLE CREATELOGIN;

When you run SHOW ROLES, the new role should appear:

$ SHOW ROLES;
     username     |             options              | member_of
------------------+----------------------------------+------------
  admin           |                                  | {}
  can_create_db   | CREATEDB, NOLOGIN                | {}
  can_create_role | CREATELOGIN, CREATEROLE, NOLOGIN | {}
  root            |                                  | {admin}
(4 rows)

You have the following options for roles:

  • CREATEROLE: Allows the role to CREATE, ALTER, and DROP non-admin roles
  • LOGIN: Allows the role to log in using client authentication
  • CONTROLJOB: Allows the role to pause, resume, and cancel jobs
  • CONTROLCHANGEFEED: Allows the role to create CHANGEFEED on tables where it has the SELECT privilege
  • CREATEDB: Allows the role to create or rename a database
  • VIEWACTIVITY: Allows the role to execute SHOW STATEMENTS and SHOW SESSIONS, including queries that are executed by other roles
  • CANCELQUERY: Allows the role to cancel a query
  • MODIFYCLUSTERSETTING: Allows the role to modify cluster settings
  • PASSWORD <password>: Allows a role to use a given password to authenticate itself
  • VALID UNTIL <timestamp>: Specifies the timestamp until which a given password is valid

You can prepend the NO keyword to many of these options to take away the corresponding capability. For example, NOCREATEROLE will not allow a role to create or manage non-admin roles. Similarly, NOCREATEDB will not allow a role to create or rename a database.
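
For example, the following statement strips the can_create_role role we created earlier of its ability to manage other roles, while leaving its other options untouched:

$ ALTER ROLE can_create_role WITH NOCREATEROLE;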

Privileges

You can use the GRANT command to give specific privileges to a given role. In the following example, we are providing complete privileges to the admin role on the startrek database:

$ GRANT ALL ON DATABASE startrek TO admin;

Now, you can look at all the grants on the startrek database and see that the admin role has ALL access:

$ SHOW GRANTS ON DATABASE startrek;
  database_name | grantee | privilege_type
----------------+---------+-----------------
  startrek      | admin   | ALL
  startrek      | root    | ALL

Grants can be given at various levels, such as the database, schema, table, and type level. The ALL keyword grants all the privileges that are applicable to a given object.
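
For example, instead of granting ALL on an entire database, the following hypothetical statements grant narrower, table-level privileges and then verify them:

$ GRANT SELECT, INSERT ON TABLE startrek.episodes TO can_create_db;
$ SHOW GRANTS ON TABLE startrek.episodes;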

Privileges can be given for the following operations:

  • CREATE
  • DROP
  • SELECT
  • INSERT
  • DELETE
  • UPDATE
  • USAGE
  • ZONECONFIG
  • CONNECT

You can also manage users on the cluster's SQL Users page in the admin console.

In the next section, we will discuss data encryption at rest and in flight and how to configure the same.

Data encryption at rest and in flight

Encryption is the process of encoding plain text into an alternative unreadable format known as ciphertext. Decryption is the process of decoding the ciphertext back into its original plain text readable format. It is important to encrypt stored data, as well as the data that's being transferred between the client and nodes. In this section, we will learn how to achieve this.

Encryption at rest

Data at rest indicates the data that is stored on a physical storage system, such as a disk. Encryption at rest is an Enterprise-only feature. This feature allows you to encrypt all the files on the physical storage using Advanced Encryption Standard (AES).

Two types of keys are involved:

  • Store keys: These are provided by the user and are used to encrypt data keys.
  • Data keys: These are generated by CockroachDB and are used to encrypt all the files on disk. They are persisted in a registry file and are encrypted using the store key.

You can generate an AES-128 encryption key using the following command:

$ cockroach gen encryption-key -s 128 encr-keys/encryption.key
successfully created AES-128 key: encr-keys/encryption.key

Now, you can pass the encryption key that you generated in the previous step as an input when starting the node:

$ cockroach start \
--insecure \
--store=node4,attrs=encrypt \
--enterprise-encryption=path=node4,key=encr-keys/encryption.key,old-key=plain \
--listen-addr=localhost:26259 \
--http-addr=localhost:8082 \
--join=localhost:26257,localhost:26258,localhost:26259 \
--locality=region=us-west \
--background

Here, when the node is started, it uses the old key to read the data and then rewrites the data using the new key. In this case, old-key=plain indicates that the existing data is unencrypted, as this is the first time encryption is being enabled. If you want to disable encryption again, you can specify key=plain in the encryption configuration.
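
Rotating to a new store key works the same way. As a sketch, the following commands generate a new key and restart the node with it, passing the previous key as old-key so that the existing data can still be read and re-encrypted:

# Generate a new store key
$ cockroach gen encryption-key -s 128 encr-keys/encryption2.key

# Restart the node: new data uses the new key, old data is read with the old key
$ cockroach start \
--insecure \
--store=node4,attrs=encrypt \
--enterprise-encryption=path=node4,key=encr-keys/encryption2.key,old-key=encr-keys/encryption.key \
--listen-addr=localhost:26259 \
--http-addr=localhost:8082 \
--join=localhost:26257,localhost:26258,localhost:26259 \
--locality=region=us-west \
--background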

Now, let's look at encryption in flight.

Encryption in flight

CockroachDB uses TLS 1.2 to encrypt the client-node and inter-node communication. Please refer to the Client and node authentication section to learn how to configure certificates and keys for the TLS. If you don't need encryption for client-node communication, you can use the --accept-sql-without-tls flag when you are starting the node. This lets the node accept connections from clients without TLS.
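
To verify that a client connection is both encrypted and validated against the CA, you can connect with a connection URL that sets sslmode explicitly. This is a sketch; the user name and certificate paths are placeholders:

$ cockroach sql \
--url "postgresql://<user_name>@localhost:26257/defaultdb?sslmode=verify-full&sslrootcert=<certs_directory>/ca.crt"

With sslmode=verify-full, the client refuses to connect unless the server presents a certificate that is signed by the given CA and matches the host it's connecting to.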

In the next section, we will learn how to enable SQL audit logging to monitor activities on a table.

Audit logging

SQL audit logging is an important security feature that you can use to track all the activities that are occurring in a given CockroachDB cluster. Specifically, you can select tables whose activity must be tracked and only enable audit logging on them.

The following information gets logged during auditing:

  • Full query text.
  • The date and time of the query.
  • The client's IP address.
  • The application's name.
  • The user.
  • The event type, which will be SENSITIVE_TABLE_ACCESS. This indicates that it's an event related to SQL audit logging.
  • The name of the table that was queried.

Now, let's look at an example. We will enable audit logging for one of the tables in the default databases. startrek is a database that comes by default with the open source CockroachDB:

$ SHOW DATABASES;
  database_name | owner | primary_region | regions | survival_goal
----------------+-------+----------------+---------+----------------
  defaultdb     | root  | NULL           | {}      | NULL
  postgres      | root  | NULL           | {}      | NULL
  startrek      | root  | NULL           | {}      | NULL
  system        | node  | NULL           | {}      | NULL

Let's enable SQL audit logging for the episodes table, as shown here:

$ ALTER TABLE startrek.episodes EXPERIMENTAL_AUDIT SET READ WRITE;
ALTER TABLE

Now, if you look at one of the latest entries in the CockroachDB SQL audit log, you should see an entry indicating that SQL audit logging has been enabled for the startrek.episodes table. Usually, this SQL audit log can be found under the logs directory, in a file named cockroach-sql-audit.log. The following is an entry from the SQL audit log that shows the ALTER TABLE statement that enabled SQL audit logging:

I211129 01:03:01.155934 25421 8@util/log/event_log.go:32 [n1,client=‹127.0.0.1:58274›,hostnossl,user=root] 3 ={"Timestamp":1638147781150167000,"EventType":"sensitive_table_access","Statement":"‹ALTER TABLE startrek.public.episodes EXPERIMENTAL_AUDIT SET READ WRITE›","Tag":"ALTER TABLE","User":"root","DescriptorID":53,"ApplicationName":"$ cockroach sql","ExecMode":"exec","Age":5.706,"TxnCounter":16,"TableName":"‹startrek.public.episodes›","AccessMode":"rw"}

Now, let's execute a SELECT statement against the same table:

$ SELECT * FROM startrek.episodes;

Now, if you go back to the logs/cockroach-sql-audit.log file once more and look at the latest entries, you should see an entry for the SELECT statement that you executed:

I211129 01:03:38.010291 25421 8@util/log/event_log.go:32 [n1,client=‹127.0.0.1:58274›,hostnossl,user=root] 4 ={"Timestamp":1638147818007255000,"EventType":"sensitive_table_access","Statement":"‹SELECT * FROM "".startrek.episodes›","Tag":"SELECT","User":"root","DescriptorID":53,"ApplicationName":"$ cockroach sql","ExecMode":"exec","NumRows":79,"Age":2.01,"FullTableScan":true,"TxnCounter":18,"TableName":"‹startrek.public.episodes›","AccessMode":"r"}

If you no longer need SQL audit logging for a particular table, it's possible to turn it off, like so:

ALTER TABLE startrek.episodes EXPERIMENTAL_AUDIT SET OFF;

You can confirm that audit logging has been turned off by going through the entries in the SQL audit log file; that is, logs/cockroach-sql-audit.log:

I211129 01:15:32.779808 25421 8@util/log/event_log.go:32 [n1,client=‹127.0.0.1:58274›,hostnossl,user=root] 9 ={"Timestamp":1638148532758348000,"EventType":"sensitive_table_access","Statement":"‹ALTER TABLE startrek.public.episodes EXPERIMENTAL_AUDIT SET OFF›","Tag":"ALTER TABLE","User":"root","DescriptorID":53,"ApplicationName":"$ cockroach sql","ExecMode":"exec","Age":12.184,"TxnCounter":30,"TableName":"‹startrek.public.episodes›","AccessMode":"rw"}

In the next section, we will learn about RTO and RPO, two of the key parameters in defining your stance for data protection, data loss, and business disruption.

RTO and RPO

RTO determines how soon you can recover from a disaster and start serving requests. It's almost impossible to have zero RTO, which means there will be some amount of application downtime whenever things go wrong. RPO determines how much data you can lose during a failure without causing any major business impact.

Technically, having a very low RTO and zero RPO is the dream of any team that manages a database, but it's incredibly hard to achieve. Zero data loss is a hard requirement for many mission-critical applications that can never lose any committed data. Since we are talking about data at scale, with several nodes replicating the data multiple times, nodes sitting in different cloud regions, and infrastructure spread across heterogeneous systems, it's very challenging to achieve desirable RTO and RPO numbers.

CockroachDB offers an Enterprise-only feature that you can use to take full and incremental backups. The frequency of your incremental backups determines your RPO, as you can always restore to the most recent incremental backup. Also, if your data becomes corrupted, you can use a point-in-time restore to bring your cluster back to a specific timestamp. You can also pay extra attention to the storage layer and make it highly redundant and highly available so that there is absolutely no data loss at the storage layer. By combining a full and incremental backup strategy with enough replicas across availability zones and regions, you can get very close to zero RPO.
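
As a rough sketch, such a strategy might look as follows. The storage URL and backup subdirectory are placeholders, the revision_history option is what enables point-in-time restores, and the exact BACKUP and RESTORE syntax may vary between CockroachDB versions:

-- Take a full backup with revision history into a hypothetical S3 bucket
$ BACKUP DATABASE startrek INTO 's3://<bucket>/backups?AWS_ACCESS_KEY_ID=<key>&AWS_SECRET_ACCESS_KEY=<secret>' WITH revision_history;

-- Append an incremental backup to the latest full backup; the schedule of these runs sets your RPO
$ BACKUP DATABASE startrek INTO LATEST IN 's3://<bucket>/backups?AWS_ACCESS_KEY_ID=<key>&AWS_SECRET_ACCESS_KEY=<secret>' WITH revision_history;

-- Restore to a point in time just before the corruption, using a subdirectory reported by SHOW BACKUPS
$ RESTORE DATABASE startrek FROM '<backup_subdirectory>' IN 's3://<bucket>/backups?AWS_ACCESS_KEY_ID=<key>&AWS_SECRET_ACCESS_KEY=<secret>' AS OF SYSTEM TIME '2021-11-29 01:00:00';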

Whenever we have a non-zero RTO, there is some amount of downtime. Depending on the nature of the business, such disruptions can cause a significant loss of income, and they also affect the overall reputation of the company. Luckily, with the cloud, it's relatively easy to provide redundancy in terms of compute, network, memory, and storage. If everything is automated correctly, we should be able to achieve an RTO of a few seconds.

In the next section, we will discuss network security.

Keeping the network secure

Networks are the most popular places for hackers to perform targeted attacks. With today's modern infrastructure comprising on-premises, private cloud, public cloud, and multi-cloud environments, there are ample opportunities for hackers to get into insecure networks. First, we should ensure that all the communication that goes in and out of the CockroachDB cluster is completely secured and encrypted. It's always a good idea to turn on TLS for inter-node and client-node communication. Once the data grows larger, we will typically end up with a dedicated Site Reliability Engineering (SRE) organization that ensures CockroachDB is up and running at all times. We should ensure that the right set of people has the right level of access to the data. DDL statements such as DROP and ALTER should be much more restricted in production. Also, at any given time, only the folks on the production on-call rotation should have access to bastion hosts.

Wherever we have deployed the CockroachDB instances, whether it's directly on a virtual machine, a bare-metal server, or a Docker container, we have to make sure that only the relevant ports can be accessed and that only secured connections are allowed. Also, if you are using a service mesh such as Istio, it already provides out-of-the-box mTLS and TLS termination proxy services.

It's important to authorize only specific application networks to access the CockroachDB cluster. This can be achieved through IP allowlisting and virtual private cloud (VPC) peering. With IP allowlisting, you provide the IP addresses of all the applications that need to access the CockroachDB cluster.
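
On a self-hosted deployment, IP allowlisting can be as simple as host firewall rules. The following sketch uses ufw on Linux with a hypothetical application subnet of 10.0.5.0/24, so that only that subnet can reach the SQL and admin UI ports:

# Allow SQL (26257) and admin UI (8080) traffic only from the application subnet
$ sudo ufw allow from 10.0.5.0/24 to any port 26257 proto tcp
$ sudo ufw allow from 10.0.5.0/24 to any port 8080 proto tcp

# Deny the same ports to everyone else
$ sudo ufw deny 26257/tcp
$ sudo ufw deny 8080/tcp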

In VPC peering, you connect two virtual private clouds so that all the traffic between them can be routed using private IP addresses. The advantages of VPC peering are as follows:

  • Improved network latency as the traffic between two virtual private clouds doesn't have to go through the public internet
  • More secure as the traffic is isolated from the public internet
  • More cost savings as we will avoid using external IP addresses, which reduces the egress cost

In the next section, we will look at some of the security best practices that should be followed to get maximum security.

Security best practices

Let's go over some of the best security practices:

  • Certificate and key rotation: It is important to rotate certificates and keys from time to time and keep their expiration times short. We should also automate certificate and key rotation, which is invaluable whenever we come across an attack. Secret management tools such as Vault make it easy to automate dynamic certificate generation and rotation.
  • Client password: We should ensure that we follow all the necessary rules to generate a very strong password. Weak passwords are easier to predict, which makes them more vulnerable.
  • Planning for disaster recovery: We should be diligent about our backup and restore strategy, ensuring that none of the backups are missed and that they are stored in multiple different regions.
  • Automation: It's always a good practice to automate most of the routine work around security. This will allow us to quickly respond without too much manual intervention when things go wrong.
  • Data encryption: It's important to encrypt data at rest as hackers constantly try to get access to the storage devices.
  • Transport layer security (TLS): Unless both the client and cluster nodes are behind a firewall or within a secure perimeter, it's always a good idea to enable TLS.
  • Secret management: Several tools are available for managing secrets, so please make use of a well-trusted solution rather than reinventing the wheel.
  • Production access: Production access should be highly restricted and should be done on an as-needed basis. We must make sure that only the folks who are currently on production on-call rotation have access, not everyone. Setting up bastion hosts to access SQL clients is a must.
  • Backups and archived data: We must make sure that even the backups and historically archived data are encrypted.
  • Personally identifiable information (PII): We should make sure that PII and other sensitive information doesn't show up in database logs or audit logs.

With that, we have come to the end of this chapter.

Summary

In this chapter, we looked at several aspects of security, including authentication, authorization, encryption, disaster recovery, auditing, and network security. Since a given CockroachDB infrastructure can be spread across on-premises environments and various private and public cloud providers, it's important to ensure maximum security and be prepared to quickly recover in case things go wrong. Hackers and ransomware attacks are increasing every day, so being educated about security and constantly improving our security posture is the only way to prevent attacks.

In the next chapter, we will discuss debugging various performance-related issues in CockroachDB.
