In Chapter 6, How Indexes Work in CockroachDB, we learned what indexes are, how they are useful in improving query performance, the various types of indexes that are supported in CockroachDB, and the best practices while using indexes.
In this chapter, we will go through the syntax for various Structured Query Language (SQL) operations. Though we have learned some of the syntaxes throughout other chapters, it's useful to have all of them in a single place. Throughout this chapter, only commonly used query options are included, and some of the experimental and enterprise-only features have been left out.
The following topics will be covered in this chapter:
We will need at least a single-node CockroachDB cluster to try some of the queries discussed in this chapter. So, please refer to the Technical requirements section of Chapter 2, How Does CockroachDB Work Internally?
If you want to try this on a larger CockroachDB cluster, then please refer to the Working example of fault tolerance at play section of Chapter 5, Fault Tolerance and Auto-Rebalancing, where we create a six-node cluster.
Also, if you are not sure about some SQL operations and how to use them, you can just use the CockroachDB Help option to get more information.
For example, if you want to know about all the options available with CREATE TABLE, you can just try h CREATE TABLE in the SQL client console, as shown in the following code snippet:
root@localhost:26258/test> h CREATE TABLE
You can try h with any of the commands to get more detailed information.
In the first section, we will learn about DDL statements.
DDL statements are mainly responsible for creating, altering, and dropping tables, indexes, and users. DDL statements typically comprise CREATE, ALTER, and DROP operations. In this section, we will go over various DDL operations and their syntax, starting with CREATE.
CREATE is the keyword used for creating something new such as a database, schema, table, view, or user.
A CREATE DATABASE statement accepts the following parameters:
The following databases are created by default and are used internally by CockroachDB:
The following default database is used for default connections:
The following default databases are used for demonstration purposes:
Some of these databases are used internally by CockroachDB, and some are there with the sample schema maintained by Cockroach Labs, so it is advisable not to use any of these preloaded databases and to instead create separate ones for your application.
Let's look at a specific example of creating a database. You can view the code here:
CREATE DATABASE users PRIMARY REGION "us-central1" REGIONS "us-east1", "us-central1", "us-west1" SURVIVE REGION FAILURE;
The preceding statement will create a database with the primary region being "us-central1" and with "us-east1", "us-central1", and "us-west1" database regions.
Next, we will look at the syntax of the CREATE TABLE statement.
The CREATE TABLE statement accepts the following parameters:
For complete options, please check out the CREATE TABLE documentation by Cockroach Labs at the following link: https://www.cockroachlabs.com/docs/stable/create-table.html.
Here is an example of the CREATE TABLE statement being deployed:
CREATE TABLE users (
id INT NOT NULL,
name STRING NOT NULL,
age INT NOT NULL,
PRIMARY KEY (id)
);
Optionally, you can also create a schema. In the naming hierarchy, a cluster can have multiple databases, a database can have multiple schemas, and a schema can contain multiple tables, views, and sequences. A schema can be created using the CREATE SCHEMA <schema_name> statement.
CREATE VIEW is used for creating views. A view is a virtual table that stores the result of a query. Whenever there is a change in the data in the original table, it is automatically updated. Views don't take up additional physical storage space.
A materialized view is similar to a view but it's physically stored separately. When the data is changed in the original table, materialized views don't get updated automatically. You can use REFRESH MATERIALIZED VIEW <view_name> to refresh the contents of a materialized view.
Views are useful for the following reasons:
CREATE VIEW accepts the following parameters:
Let's look at example of how views are helpful. Let's first insert some records into the users table that was created previously, as follows:
INSERT INTO users(id, name, age) values ( 1, 'foo', 13);
INSERT INTO users(id, name, age) values ( 2, 'bar', 24);
INSERT INTO users(id, name, age) values ( 3, 'alice', 14);
INSERT INTO users(id, name, age) values ( 4, 'bob', 29);
> select * from users;
id | name | age
-----+-------+------
1 | foo | 13
2 | bar | 24
3 | alice | 14
4 | bob | 29
(4 rows)
Now, let's say you want to track all users who are in the age group of 12-17.
You can run this query on the users table itself and get the result, as follows:
> SELECT * FROM users where age > 12 AND age < 17;
id | name | age
-----+-------+------
1 | foo | 13
3 | alice | 14
Now, let's say you don't want to run any query directly on the users table. Then, you can create a view with that condition and query the view, like this:
> CREATE VIEW vaccine_big_kids_group
AS SELECT id, name, age
FROM users
WHERE age > 12 AND age <17;
Let's query the view and check the results, as follows:
> select * from vaccine_big_kids_group;
id | name | age
-----+-------+------
1 | foo | 13
3 | alice | 14
(2 rows)
Let's insert one more record that falls under the same age group of 12-17, as follows:
INSERT INTO users(id, name, age) values ( 5, 'john', 15);
Since the view directly refers to the original table, it's automatically updated, as shown here:
> select * from vaccine_big_kids_group;
id | name | age
-----+-------+------
1 | foo | 13
3 | alice | 14
5 | john | 15
(3 rows)
Instead of a view, if you had created a materialized view, you would have had to explicitly refresh in order to get the latest data. Here is an example of creating a materialized view and refreshing it:
> CREATE MATERIALIZED VIEW vaccine_big_kids_group_materialized
AS SELECT *
FROM users
WHERE age > 12 AND age <17;
REFRESH MATERIALIZED VIEW vaccine_big_kids_group_materialized;
CockroachDB also supports session-scoped temporary views, which are automatically dropped at the end of a session. You can use CREATE VIEW TEMP <view_definition> to create a temporary view.
The ALTER statement is used to modify an existing schema object. You can alter the definition of several schema objects such as DATABASE, SCHEMA, TABLE, COLUMN, TYPE, USER, INDEX, VIEW, and so on. We will go over ALTER TABLE and ALTER INDEX in this subsection.
ALTER TABLE takes the following parameters:
ALTER INDEX takes the following parameters:
Here is an example of a column being added:
ALTER TABLE users ADD COLUMN new_column INT;
> SHOW COLUMNS from users;
column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
id | INT8 | false | NULL | | {primary} | false
name | STRING | false | NULL | | {} | false
age | INT8 | false | NULL | | {} | false
new_column | INT8 | true | NULL | | {} | false
(4 rows)
Here is an example of a constraint being added:
> ALTER TABLE users ADD CONSTRAINT age_check CHECK (age > 0);
INSERT INTO users(id, name, age) values ( 0, 'cindy', -1);
ERROR: failed to satisfy CHECK constraint (age > 0:::INT8)
SQLSTATE: 23514
CONSTRAINT: age_check
The DROP statement is used to delete a schema object and all the data within it.
The general syntax for DROP is DROP <SCHEMA TYPE> name.
DROP DATABASE <database_name>;
DROP ROLE <role_name>;
DROP TABLE <table_name>;
DROP TABLE takes the following parameters:
Now, let's look at DML statements in the next section.
DML statements are used for managing data within schema objects. They consist of INSERT, UPDATE, UPSERT, and DELETE statements and are generally referred to as statements that insert, update, or delete the data in a database. We will go over the syntax of INSERT, UPDATE, UPSERT, and DELETE statements used for a table in this section.
The INSERT statement takes the following parameters:
Here is an example of INSERT INTO:
INSERT INTO users(id, name, age) values ( 0, 'cindy', 15);
The UPDATE statement updates the existing rows of a table and takes the following parameters:
Here is an example of the UPDATE statement being deployed, and also the table data before and after the update:
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 24 | NULL
3 | alice | 14 | NULL
4 | bob | 29 | NULL
5 | john | 15 | NULL
(6 rows)
> UPDATE users SET age = 39 WHERE id = 4;
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 24 | NULL
3 | alice | 14 | NULL
4 | bob | 39 | NULL
5 | john | 15 | NULL
(6 rows)
The UPSERT statement inserts rows if they don't violate uniqueness constraints, and it updates rows if the values violate uniqueness constraints. UPSERT only looks at PKs for uniqueness.
The UPSERT statement takes the following parameters:
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 24 | NULL
3 | alice | 14 | NULL
4 | bob | 39 | NULL
5 | john | 15 | NULL
(6 rows)
> UPSERT INTO users(id, name, age) VALUES (2, 'bar', 34);
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 34 | NULL
3 | alice | 14 | NULL
4 | bob | 39 | NULL
5 | john | 15 | NULL
(6 rows)
The DELETE statement deletes rows from a table. It takes the following parameters:
Have a look at the following example:
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 34 | NULL
3 | alice | 14 | NULL
4 | bob | 39 | NULL
5 | john | 15 | NULL
(6 rows)
> DELETE from users where id = 4;
> select * from users;
id | name | age | new_column
-----+-------+-----+-------------
0 | cindy | 15 | NULL
1 | foo | 13 | NULL
2 | bar | 34 | NULL
3 | alice | 14 | NULL
5 | john | 15 | NULL
(5 rows)
In the next section, we will go through some examples of DQL.
DQL is used for reading or querying data or table metadata. SQL statements involving SELECT and SHOW fall under this category.
SELECT is a very commonly used SQL syntax to read table data. It takes the following parameters:
SHOW SQL syntax is commonly used to retrieve the table metadata.
You can use the following queries to list down databases, tables in a database, and columns for a given table respectively:
SHOW DATABASES;
SHOW TABLES;
SHOW COLUMNS FROM <table_name>;
SHOW STATISTICS FOR TABLE <table_name>; is useful for looking at table statistics. This data is used by the cost-based optimizer to improve query performance. This also comes in handy when debugging slow queries.
SHOW can be combined with keywords such as COLUMNS, DATABASES, TABLES, CREATE, RANGES, STATISTICS, TYPES, USERS, ROLES, REGIONS, and so on to retrieve the metadata associated with a given schema object.
Next, we will go over the list of data types supported by CockroachDB.
In this section, we will go through a list of data types that CockroachDB supports. Here is a list of data types and their descriptions:
'{
"id": 12345,
"name": "elan",
"isAlien": "true"
}'
In the next section, we will learn about database constraints and various types of constraints in CockroachDB.
Constraints are rules that are enforced on data columns in a table. Whenever there is any change in data within a table, all the constraints are verified to make sure none is violated. If violated, the changes are rejected with the appropriate error message. Here is a list of column-level constraints:
Here's an example to illustrate this:
CREATE TABLE user (
id INT NOT NULL,
name STRING NOT NULL,
age INT NOT NULL CHECK (age > 18) CHECK (age < 65),
PRIMARY KEY (id)
);
Here's an example to illustrate this:
CREATE TABLE employees (
id INT NOT NULL,
name STRING NOT NULL,
salary FLOAT NOT NULL,
bonus FLOAT DEFAULT 0.0,
PRIMARY KEY (id)
);
Now, if you insert a row without providing any value for the bonus column (for example, INSERT INTO employee (id, name, salary) VALUES (1, 'foo', 10000.00);), CockroachDB inserts the default value 0.0 for the bonus column.
Let's look at an example whereby we create an employees table, as shown in the following code snippet, followed by an employee_info table in which the emp_id column refers to the id column of the employees table:
CREATE TABLE employees (
id INT NOT NULL,
name STRING NOT NULL,
salary FLOAT NOT NULL,
bonus FLOAT DEFAULT 0.0,
PRIMARY KEY (id)
);
Here is an example where the emp_id column references the id column in the previous employees table:
CREATE TABLE employees_info (
emp_id INT NOT NULL,
address STRING NOT NULL,
CONSTRAINT fk_emp_id FOREIGN KEY (emp_id) REFERENCES employees (id)
);
Alternatively, an employees_info table can also be created, as follows:
CREATE TABLE employees_info (
emp_id INT NOT NULL REFERENCES employees(id),
address STRING NOT NULL
);
Here's an example to illustrate this:
CREATE TABLE employees (
id INT NOT NULL,
name STRING NOT NULL,
salary FLOAT NOT NULL,
bonus FLOAT DEFAULT 0.0,
PRIMARY KEY (id)
);
CREATE TABLE user (
id INT NOT NULL,
email STRING UNIQUE,
name STRING NOT NULL,
age INT NOT NULL CHECK (age > 18) CHECK (age < 65),
PRIMARY KEY (id)
);
In the next section, we will go over various types of joins.
Table joins are used to combine data from more than one table based on certain conditions on a certain column or columns.
For example, let's assume the following two tables exist:
CREATE TABLE customers (
id UUID PRIMARY KEY,
name STRING NOT NULL
);
CREATE TABLE purchase_orders (
id UUID PRIMARY KEY,
customer_id UUID,
n_of_items INT,
total_price DECIMAL(10,2)
);
Now, let's look at each of the JOIN types with an example, as follows:
Let's look at the following example involving an inner join between the customers and purchase_orders tables:
SELECT a.id as customer_id, a.name AS customer_name, b.id AS purchase_order_id FROM customers AS a
INNER JOIN purchase_orders AS b ON a.id = b.customer_id;
This returns all customers and their purchase orders that have matching customers.id values with that of purchase_orders.customer_id.
Let's look at the following example involving a left outer join between the customers and purchase_orders tables:
SELECT a.id as customer_id, a.name AS customer_name, b.id AS purchase_order_id FROM customers AS a
LEFT OUTER JOIN purchase_orders AS b ON a.id = b.customer_id;
The preceding query will return all customers, whether they have any purchase orders or not.
Let's look at the following example involving a right outer join between the customers and purchase_orders tables:
SELECT a.id as customer_id, a.name AS customer_name, b.id AS purchase_order_id FROM customers AS a
RIGHT OUTER JOIN purchase_orders AS b ON a.id = b.customer_id;
The preceding query will return all purchase_orders values, whether they have a corresponding customer in the customers table or not.
Have a look at the following example to illustrate this:
SELECT a.id as customer_id, a.name AS customer_name, b.id AS purchase_order_id FROM customers AS a
FULL OUTER JOIN purchase_orders AS b ON a.id = b.customer_id;
The preceding query will return all the rows from both tables.
Next, we will learn about sequences and the syntax for creating a sequence.
Sequences are helpful when you need an auto-increment integer sequence in a table.
Let's look at the following example in which we create a sequence with the default setting:
root@localhost:26258/test> CREATE SEQUENCE customer_id_seq;
CREATE SEQUENCE
Time: 158ms total (execution 158ms / network 0ms)
root@localhost:26258/test> SHOW CREATE customer_id_seq;
table_name | create_statement
------------------+-----------------------------------------------------------------------------------------------------
customer_id_seq | CREATE SEQUENCE public.customer_id_seq MINVALUE 1 MAXVALUE 9223372036854775807 INCREMENT 1 START 1
(1 row)
Time: 136ms total (execution 135ms / network 1ms)
Here is an example of using the previously created sequence in a table column:
CREATE TABLE customers (
id INT,
row_id INT DEFAULT nextval('customer_id_seq'),
name STRING
);
Here, whenever a row is inserted, DEFAULT will call nextval, which will generate increments in the customer_id_seq sequence and use that as the value for the row_id column.
Next, we will go through the benefits of online schema changes.
CockroachDB supports online schema changes, which don't require any downtime. An existing schema can be changed using statements that include ALTER and CREATE INDEX operations.
CockroachDB internally maintains a consistent distributed schema cache, along with consistent table data that works with multiple versions of the schema concurrently. This enables the rolling out of a new schema while the older schema is still being used. CockroachDB backfills the table data into the newer schema without holding locks. So, online schema changes don't affect the current read/write operations on the cluster on a particular table whose schema is being modified.
Here are some of the benefits of online schema changes:
Here are some of the known limitations of schema changes:
Please refer to a complete list of known limitations here: https://www.cockroachlabs.com/docs/stable/known-limitations.html.
Some of these limitations might get fixed in future releases, and there might also be new ones.
With that, we have reached the end of this chapter.
In this chapter, we learned about the SQL syntax for DDLs, DMLs, and DQLs. We also went over other popular features such as indexes, joins, and sequences. We have left out some of the experimental and enterprise-only features. Also, whenever you are not sure about the syntax of a given SQL operation, please use /h to get detailed information about the SQL operation.
In the next chapter, we will go over the admin user interface (UI) and learn how to use it to manage a CockroachDB cluster.