Using Apache Cassandra
Apache Cassandra is an open source, wide-column NoSQL database and the most commonly used database in that category. A keyspace is the container of data in Apache Cassandra, equivalent to a database schema in a relational database. The basic unit of storage is a column family (also called a table). Each record in a table is stored in a row, with the data stored in columns. A column has a name, a value, and an associated timestamp. A column is not required to store a value and may be empty.
Apache Cassandra is based on a flexible schema (also called schema-free or dynamic schema) data model, in which different rows may have different columns and the columns are not required to be specified in advance in a table definition. Apache Cassandra supports data types for column names (called comparators) and column values (called validators) but does not require them to be specified. Validators and comparators may be added or modified after a table (column family) has been defined.
Apache Cassandra provides the Cassandra Query Language (CQL) for CRUD (add, get, update, delete) operations on a table. An Apache Cassandra installation includes cqlsh, an interactive shell from which CQL commands may be run. An official Docker image for Apache Cassandra is available, and in this chapter we shall run Apache Cassandra in a Docker container.
Setting the Environment
The following software is required for this chapter.
As in other chapters, we have used an Amazon EC2 AMI to install Docker and the Docker image. First, SSH to the Amazon EC2 instance, substituting the login user and public IP address of your instance for the placeholders.
ssh -i "docker.pem" <user>@<ec2-public-ip>
Installing Docker is discussed in Chapter 1. Start the Docker service. The following command should output an OK message.
sudo service docker start
Verify that the Docker service has been started. The following command should output active (running) in the Active field.
sudo service docker status
Output from the preceding commands is shown in Figure 6-1.
Figure 6-1. Starting Docker Service and verifying Status
Next, download the latest cassandra Docker image.
sudo docker pull cassandra:latest
List the Docker images downloaded.
sudo docker images
The cassandra image should get listed as shown in Figure 6-2.
Figure 6-2. Listing Docker Image cassandra
Starting Apache Cassandra
The syntax to start a Cassandra instance in a Docker container in detached mode is as follows.
docker run --name some-cassandra -d cassandra:tag
The -d option starts the container in detached mode, which means that the docker run command does not attach an interactive shell even if the -t -i options are specified. Start the Apache Cassandra server process in a Docker container with the following command, in which port 7000 (the inter-node Apache Cassandra cluster communication port) is published with the -p option, the host directory /cassandra/data is mounted with the -v option as the directory in which Apache Cassandra stores data (/var/lib/cassandra/data), and the container name is specified with the --name option as cassandradb.
sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb -d -p 7000:7000 cassandra
A Docker container running an Apache Cassandra server process gets started as shown in Figure 6-3.
Figure 6-3. Starting Docker Container for Apache Cassandra
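A container started in detached mode returns immediately, while the Cassandra server inside it may need some time before it accepts connections. The following is a minimal sketch of a retry helper, assuming a bash shell; the function name wait_until and its arguments are hypothetical and are not part of Docker or Cassandra.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or ATTEMPTS runs are used up.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumes the cassandradb container from this chapter):
# wait_until 30 2 sudo docker exec cassandradb cqlsh -e 'DESCRIBE KEYSPACES' \
#   && echo "Cassandra is accepting CQL connections"
```

The attempt count and delay in the commented example are arbitrary; a slower instance may need larger values.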
List the running Docker containers with the following command.
sudo docker ps
The cassandradb container, which is running an Apache Cassandra server instance, gets listed along with its container id. By default, port 9042 is the client port on which Apache Cassandra listens for client connections, and port 9160 is the Thrift API port, as shown in Figure 6-4.
Figure 6-4. Listing Docker Containers that are Running
Starting the TTY
Start the interactive terminal (tty) with the following command.
sudo docker exec -it cassandradb bash
The tty gets connected and the command prompt is set to user@containerid. If the user is root and the container id is dfade56f871, the command prompt becomes root@dfade56f871 as shown in Figure 6-5.
Figure 6-5. Starting the TTY
Connecting to CQL Shell
The cqlsh terminal is used to connect to an Apache Cassandra instance and run CQL commands. Start the cqlsh terminal with the following command.
cqlsh
A connection gets established to the Test Cluster at 127.0.0.1:9042. The Apache Cassandra version gets output as 2.2.2 and the CQL spec version as 3.3.1. The cqlsh> command prompt gets displayed as shown in Figure 6-6.
Figure 6-6. Connecting the CQL Shell
We started the interactive terminal using the container name, but the tty may also be started using the container id. The cqlsh shell is started with the cqlsh command regardless of how the tty is started.
sudo docker exec -it dfade56f871 bash
cqlsh
The cqlsh> command prompt gets displayed as before as shown in Figure 6-7.
Figure 6-7. Connecting to CQL Shell using the Container ID
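The cqlsh utility can also run a single statement without an interactive session by using its -e option, which may be combined with docker exec from the host. The following is a sketch under stated assumptions: the run_cql function and the DOCKER_CMD variable are hypothetical names introduced here for illustration, and the container is assumed to be running.

```shell
#!/usr/bin/env bash
# DOCKER_CMD is a hypothetical override hook, not a Docker feature;
# it defaults to the "sudo docker" invocation used in this chapter.
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

# run_cql CONTAINER STATEMENT
# Executes one CQL statement inside the container through cqlsh -e.
run_cql() {
  local container=$1 statement=$2
  $DOCKER_CMD exec "$container" cqlsh -e "$statement"
}

# Example (assumes the cassandradb container is running):
# run_cql cassandradb "SELECT release_version FROM system.local;"
```

Setting DOCKER_CMD to echo prints the command that would be run, which is a convenient way to check the quoting before touching the container.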
Creating a Keyspace
A Keyspace is the container of application data and is used to group column families. Replication is set on a per-keyspace basis. The DDL command for creating a Keyspace is as follows.
CREATE KEYSPACE (IF NOT EXISTS)? <identifier> WITH <properties>
By default, a keyspace name is case-insensitive and may consist exclusively of alphanumeric characters, with a maximum length of 32. To make a keyspace name case-sensitive, enclose it in double quotes. The CREATE KEYSPACE statement, which creates a top-level keyspace, supports two properties: replication, which specifies the replication strategy and its options, and durable_writes, which specifies whether the commit log is to be used for updates on the keyspace. The replication property is mandatory. As an example, create a keyspace called CatalogKeyspace with the replication strategy class SimpleStrategy and a replication factor of 3.
CREATE KEYSPACE CatalogKeyspace
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
The CatalogKeyspace keyspace gets created as shown in Figure 6-8.
Figure 6-8. Creating a Keyspace
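The naming rule stated in this section can be checked before a statement is sent to the server. The sketch below, assuming a bash shell, mirrors only the rule as described above (alphanumeric characters, at most 32 of them) and shows the lowercase folding applied to unquoted names; both function names are hypothetical, and the server remains the final authority on which names it accepts.

```shell
#!/usr/bin/env bash
# is_valid_keyspace_name NAME
# Succeeds when NAME is non-empty, alphanumeric only, and at most
# 32 characters long (the limits stated in this section).
is_valid_keyspace_name() {
  local name=$1
  [[ $name =~ ^[A-Za-z0-9]{1,32}$ ]]
}

# fold_keyspace_name NAME
# Prints NAME in lowercase, which is how an unquoted,
# case-insensitive keyspace name is treated.
fold_keyspace_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

if is_valid_keyspace_name "CatalogKeyspace"; then
  fold_keyspace_name "CatalogKeyspace"
fi
```

The folding is why a keyspace created as CatalogKeyspace without quotes is later reported in lowercase.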
Altering a Keyspace
The ALTER KEYSPACE statement is used to alter a keyspace and has the following syntax with the supported properties being the same as for the CREATE KEYSPACE statement.
ALTER KEYSPACE <identifier> WITH <properties>
As an example, alter the CatalogKeyspace keyspace to make the replication factor 1.
ALTER KEYSPACE CatalogKeyspace
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
The replication factor gets set to 1 as shown in Figure 6-9.
Figure 6-9. Altering a Keyspace
Using a Keyspace
The USE statement is used to set the current keyspace and has the following syntax.
USE <identifier>
All subsequent commands are run in the context of the Keyspace set with the USE statement. As an example, set the current Keyspace as CatalogKeyspace.
use CatalogKeyspace;
The cqlsh> command prompt becomes cqlsh:catalogkeyspace> as shown in Figure 6-10. The keyspace name appears in lowercase because an unquoted keyspace name is case-insensitive.
Figure 6-10. Using a Keyspace
Creating a Table
A TABLE is also called a COLUMN FAMILY, and the CREATE TABLE or CREATE COLUMNFAMILY statement is used to create a table (column family).
CREATE ( TABLE | COLUMNFAMILY ) ( IF NOT EXISTS )? <tablename>
'(' <column-definition> ( ',' <column-definition> )* ')'
( WITH <option> ( AND <option>)* )?
For the complete syntax of the CREATE TABLE statement, refer to https://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt. As an example, create a table called catalog with the columns catalog_id, journal, publisher, edition, title, and author, all of type text. Specify the primary key as catalog_id and set the compaction class to LeveledCompactionStrategy.
CREATE TABLE catalog(catalog_id text,journal text,publisher text,edition text,title text,author text,PRIMARY KEY (catalog_id)) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
The catalog table gets created as shown in Figure 6-11.
Figure 6-11. Creating a Table
Adding Table Data
The INSERT DML statement is used to add data into a table and has the following syntax.
INSERT INTO <tablename>
'(' <identifier> ( ',' <identifier> )* ')'
VALUES '(' <term-or-literal> ( ',' <term-or-literal> )* ')'
( IF NOT EXISTS )?
( USING <option> ( AND <option> )* )?
For the complete syntax of the INSERT statement, refer to https://cassandra.apache.org/doc/cql3/CQL.html#insertStmt. As an example, add two rows of data to the catalog table and include the IF NOT EXISTS clause so that a row is added only if a row identified by the primary key does not already exist.
INSERT INTO catalog (catalog_id, journal, publisher, edition,title,author) VALUES ('catalog1','Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Engineering as a Service','David A. Kelly') IF NOT EXISTS;
INSERT INTO catalog (catalog_id, journal, publisher, edition,title,author) VALUES ('catalog2','Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Quintessential and Collaborative','Tom Haunert') IF NOT EXISTS;
As indicated by the [applied] True output, two rows of data get added as shown in Figure 6-12.
Figure 6-12. Adding Table Data
Querying a Table
The SELECT statement, which has the following syntax, is used to query a table.
SELECT <select-clause>
FROM <tablename>
( WHERE <where-clause> )?
( ORDER BY <order-by> )?
( LIMIT <integer> )?
( ALLOW FILTERING )?
For the complete syntax for the SELECT statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt. As an example select all columns from the catalog table.
SELECT * FROM catalog;
The two rows of data added previously get listed as shown in Figure 6-13.
Figure 6-13. Querying Table
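The statements used so far can also be collected in a file and run in one pass with the -f option of cqlsh. The following sketch writes such a file on the host; the file location chosen by mktemp and the /tmp/catalog.cql path inside the container are assumptions made for illustration, and the IF NOT EXISTS clauses are included so that the script can be rerun.

```shell
#!/usr/bin/env bash
# Write the statements to a temporary file. The quoted 'EOF' delimiter
# keeps the shell from expanding anything inside the CQL text.
cql_file=$(mktemp)
cat > "$cql_file" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS CatalogKeyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE CatalogKeyspace;
CREATE TABLE IF NOT EXISTS catalog (
  catalog_id text PRIMARY KEY,
  journal text, publisher text, edition text, title text, author text
);
INSERT INTO catalog (catalog_id, journal, publisher, edition, title, author)
  VALUES ('catalog1', 'Oracle Magazine', 'Oracle Publishing',
          'November-December 2013', 'Engineering as a Service', 'David A. Kelly')
  IF NOT EXISTS;
SELECT * FROM catalog;
EOF
echo "CQL script written to $cql_file"

# To run it (assumes the cassandradb container is running):
# sudo docker cp "$cql_file" cassandradb:/tmp/catalog.cql
# sudo docker exec cassandradb cqlsh -f /tmp/catalog.cql
```

Note that the quotes around string literals must be plain single quotes; typographic quotes copied from formatted text are rejected by cqlsh.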
Deleting from a Table
The DELETE statement is used to delete columns and rows and has the following syntax.
DELETE ( <selection> ( ',' <selection> )* )?
FROM <tablename>
( USING TIMESTAMP <integer>)?
WHERE <where-clause>
( IF ( EXISTS | ( <condition> ( AND <condition> )*) ) )?
For complete syntax for the DELETE statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#deleteStmt. As an example, delete all columns from the row with catalog_id as catalog1.
DELETE catalog_id, journal, publisher, edition, title, author from catalog WHERE catalog_id='catalog1';
Subsequently, query the catalog table with the SELECT statement.
SELECT * FROM catalog;
Column values from the row with catalog_id as catalog1 get deleted, but the row itself, including the primary key column value, does not get deleted even though the primary key catalog_id is listed as one of the columns to delete. A subsequent query lists the primary key column value but lists the values of the other columns as null, as shown in Figure 6-14. To delete an entire row, omit the column list from the DELETE statement.
Figure 6-14. Deleting Table Data
Truncating a Table
The TRUNCATE statement removes all data from a table and has the following syntax.
TRUNCATE <tablename>
As an example, truncate the catalog table. Subsequently, run a query with the SELECT statement.
TRUNCATE catalog;
SELECT * from catalog;
As the output of the query indicates, no data is listed because the TRUNCATE statement has removed all data as shown in Figure 6-15.
Figure 6-15. Truncating a Table
Dropping a Table
The DROP TABLE or DROP COLUMNFAMILY statement is used to drop a table and has the following syntax.
DROP TABLE ( IF EXISTS )? <tablename>
As an example, drop the catalog table.
DROP TABLE IF EXISTS catalog;
If the IF EXISTS clause is not specified and the table does not exist, an error is generated. But with the IF EXISTS clause, an error is not generated as indicated by two consecutively run DROP TABLE statements with the IF EXISTS clause included in Figure 6-16.
Figure 6-16. Dropping a Table
Dropping a Keyspace
The DROP KEYSPACE statement removes the specified keyspace, including the column families in the keyspace and the data in those column families. The keyspace does not have to be empty before being dropped. The statement has the following syntax.
DROP KEYSPACE ( IF EXISTS )? <identifier>
As an example, drop the CatalogKeyspace keyspace.
DROP KEYSPACE IF EXISTS CatalogKeyspace;
If the IF EXISTS clause is not specified and the keyspace does not exist, an error is generated. But with the IF EXISTS clause, an error is not generated as indicated by two consecutively run DROP KEYSPACE statements with the IF EXISTS clause included as shown in Figure 6-17.
Figure 6-17. Dropping a Keyspace
Exiting CQL Shell
To exit the cqlsh shell specify the exit command as shown in Figure 6-18. Subsequently exit the tty with the exit command also.
Figure 6-18. Exiting CQL Shell
Stopping Apache Cassandra
To stop Apache Cassandra, stop the Docker container running the Apache Cassandra server.
sudo docker stop cassandradb
Subsequently, run the following command to list the running containers.
sudo docker ps
The cassandradb container does not get listed as running, as shown in Figure 6-19.
Figure 6-19. Stopping Cassandra DB Docker Container
Starting Multiple Instances of Apache Cassandra
Multiple Docker containers running Apache Cassandra instances may be started, but the container name has to be unique. As an example, start a new Docker container also called cassandradb to run another instance of Apache Cassandra database.
sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb -d -p 7000:7000 cassandra
Because a Docker container with the same name (cassandradb) was already created earlier, an error is generated even though the container has been stopped as shown in Figure 6-20. A container has to be removed with the docker rm command to be able to create a new container with the same name.
Figure 6-20. Duplicate Docker Container name error
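The removal step mentioned above can be sketched as a small helper that stops a container and then removes it so that its name becomes available again. This is a sketch under stated assumptions: remove_container and DOCKER_CMD are hypothetical names introduced for illustration, and only the docker stop and docker rm commands themselves come from Docker.

```shell
#!/usr/bin/env bash
# DOCKER_CMD is a hypothetical override hook, not a Docker feature;
# it defaults to the "sudo docker" invocation used in this chapter.
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

# remove_container NAME
# Stops the container, then removes it so that NAME can be reused.
# Files in a host directory mounted with -v are left in place.
remove_container() {
  local name=$1
  $DOCKER_CMD stop "$name" && $DOCKER_CMD rm "$name"
}

# Example (assumes a container named cassandradb exists):
# remove_container cassandradb
```

After the container is removed, a docker run command with --name cassandradb no longer reports a name conflict.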
Another container with a different name, cassandradb2 for example, may be started.
sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb2 -d -p 7000:7000 cassandra
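Start a third container and, if multiple nodes are to be run in the cluster, specify the CASSANDRA_SEEDS environment variable with the IP addresses of the seed nodes. If Docker reports that host port 7000 is already allocated to another running container, publish a different host port (for example, -p 7001:7000) or omit the -p option.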
sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb3 -d -p 7000:7000 -e CASSANDRA_SEEDS=52.91.214.50,54.86.243.122,54.86.205.95 cassandra
Subsequently, run the following command to list the running containers.
sudo docker ps
The cassandradb2 and cassandradb3 containers get listed as running as shown in Figure 6-21.
Figure 6-21. Running Multiple Docker Containers for Instances of Apache Cassandra
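The value of CASSANDRA_SEEDS used above is a comma-separated list of IP addresses. When the addresses are held in separate variables or come from another command, the list can be assembled as in the following sketch, which assumes a bash shell; the join_seeds function name is hypothetical.

```shell
#!/usr/bin/env bash
# join_seeds IP [IP...]
# Prints the arguments joined with commas, the format used for the
# CASSANDRA_SEEDS variable in this chapter.
join_seeds() {
  local IFS=,
  printf '%s\n' "$*"
}

seeds=$(join_seeds 52.91.214.50 54.86.243.122 54.86.205.95)
echo "CASSANDRA_SEEDS=$seeds"

# The result can then be passed to docker run, for example:
# sudo docker run --name cassandradb3 -d -e CASSANDRA_SEEDS="$seeds" cassandra
```

Declaring IFS as local confines the changed separator to the function, so word splitting in the rest of the script is unaffected.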
Summary
In this chapter we used the Docker image for Apache Cassandra to run Apache Cassandra in a Docker container. We used CQL statements in a cqlsh shell to create a Keyspace, create a table in the Keyspace, and add data to the table. We also ran CQL statements to query a table, delete data from the table, truncate a table, drop a table, and drop a keyspace. We also demonstrated creating multiple Docker containers to run multiple instances of Apache Cassandra. In the next chapter we shall run Couchbase Server in Docker.