CHAPTER 6


Using Apache Cassandra

Apache Cassandra is an open source, wide-column NoSQL database and the most commonly used NoSQL database in its category. The container of data in Apache Cassandra, equivalent to a database schema in a relational database, is a Keyspace. The basic unit of storage is a column family (also called a table); each record in a table is stored in a row, with the data stored in columns. A column has a name, a value, and a timestamp associated with it. A column is not required to store a value and may be empty.

Apache Cassandra is based on a flexible schema (also called schema-free or dynamic schema) data model in which different rows may have different columns and the columns are not required to be pre-specified in a table definition. Apache Cassandra supports data types for column names (called comparators) and column values (called validators), but does not require these data types to be specified. The validators and comparators may be added or modified after a table (column family) has been defined.

Apache Cassandra provides the Cassandra Query Language (CQL) for CRUD (create, read, update, delete) operations on a table. An Apache Cassandra installation includes the cqlsh utility, an interactive shell from which CQL commands may be run. An official Docker image for Apache Cassandra is available, and in this chapter we shall run Apache Cassandra in a Docker container.

  • Setting the Environment
  • Starting Apache Cassandra
  • Starting the TTY
  • Connecting to CQL Shell
  • Creating a Keyspace
  • Altering A Keyspace
  • Using A Keyspace
  • Creating a Table
  • Adding Table Data
  • Querying a Table
  • Deleting from a Table
  • Truncating a Table
  • Dropping A Table
  • Dropping a Keyspace
  • Exiting CQL Shell
  • Stopping Apache Cassandra
  • Starting Multiple Instances of Apache Cassandra

Setting the Environment

The following software is required for this chapter.

  • Docker (version 1.8)
  • Docker image for Apache Cassandra

We have used an Amazon EC2 AMI as in other chapters to install Docker and the Docker image. First, SSH to the Amazon EC2 instance.

ssh -i "docker.pem" [email protected]

Installing Docker is discussed in Chapter 1. Start the Docker service. The following command should output an OK message.

sudo service docker start

Verify that the Docker service has been started. The following command should output active (running) in the Active field.

sudo service docker status

Output from the preceding commands is shown in Figure 6-1.

9781484218297_Fig06-01.jpg

Figure 6-1. Starting Docker Service and verifying Status

Next, download the latest cassandra Docker image.

sudo docker pull cassandra:latest

List the Docker images downloaded.

sudo docker images

The cassandra image should get listed as shown in Figure 6-2.

9781484218297_Fig06-02.jpg

Figure 6-2. Listing Docker Image cassandra

Starting Apache Cassandra

Start the Apache Cassandra server process in a Docker container with the docker run command. In the command used in this section, the inter-node Apache Cassandra cluster communication port is published as 7000, the directory in which Apache Cassandra stores data is /cassandra/data, and the container name is specified with the --name option as cassandradb. The general syntax to start a Cassandra instance in detached mode is as follows.

docker run --name some-cassandra -d cassandra:tag

The -d parameter starts the container in detached mode, so the docker run command does not attach an interactive shell even if the -t -i options are specified. The complete command is as follows.

sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb -d -p 7000:7000  cassandra

A Docker container running an Apache Cassandra server process gets started as shown in Figure 6-3.

9781484218297_Fig06-03.jpg

Figure 6-3. Starting Docker Container for Apache Cassandra

List the running Docker containers with the following command.

sudo docker ps

The cassandradb container, which is running an Apache Cassandra server instance, gets listed. The container id is also listed. By default, port 9042 is the client port on which Apache Cassandra listens for client connections. Port 9160 is the Thrift API port, as shown in Figure 6-4.

9781484218297_Fig06-04.jpg

Figure 6-4. Listing Docker Containers that are Running
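A container that is listed as running does not necessarily accept CQL connections yet, because the Apache Cassandra server may still be starting. The following is a minimal sketch of a shell function that polls the container until cqlsh can run a trivial statement. The function name wait_for_cassandra, the default of 30 attempts, and the 2 second pause are assumptions chosen for illustration.

```shell
# Poll until cqlsh inside the container can run a trivial statement.
# Arguments: container name (default cassandradb), attempts (default 30).
wait_for_cassandra() {
  container="${1:-cassandradb}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # cqlsh -e runs one statement and exits; a non-zero exit status
    # means the server is not accepting CQL connections yet.
    if sudo docker exec "$container" cqlsh -e "DESCRIBE KEYSPACES" >/dev/null 2>&1; then
      echo "Cassandra in $container is ready (attempt $i)"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "Cassandra in $container did not become ready" >&2
  return 1
}
```

Calling wait_for_cassandra cassandradb on the host before connecting with cqlsh avoids a connection error when the server needs more time to start.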

Starting the TTY

Start the interactive terminal (tty) with the following command.

sudo docker exec -it cassandradb bash

The tty gets connected and the command prompt gets set to user@containerid. If the user is root and the container id is dfade56f871, the command prompt becomes root@dfade56f871, as shown in Figure 6-5.

9781484218297_Fig06-05.jpg

Figure 6-5. Starting the TTY

Connecting to CQL Shell

The cqlsh terminal is used to connect to an Apache Cassandra instance and run CQL commands. Start the cqlsh terminal with the following command.

cqlsh

A connection gets established to the Test Cluster at 127.0.0.1:9042. The Apache Cassandra version gets output as 2.2.2 and the CQL spec version as 3.3.1. The cqlsh> command prompt gets displayed as shown in Figure 6-6.

9781484218297_Fig06-06.jpg

Figure 6-6. Connecting the CQL Shell

We started the interactive terminal using the container name, but the tty may also be started using the container id. The cqlsh shell is started with the cqlsh command regardless of how the tty is started.

sudo docker exec -it dfade56f871 bash
cqlsh

The cqlsh> command prompt gets displayed as before as shown in Figure 6-7.

9781484218297_Fig06-07.jpg

Figure 6-7. Connecting to CQL Shell using the Container ID

Creating a Keyspace

A Keyspace is the container of application data and is used to group column families. Replication is set on a per-keyspace basis. The DDL command for creating a Keyspace is as follows.

CREATE KEYSPACE (IF NOT EXISTS)? <identifier> WITH <properties>

By default, the keyspace name is case-insensitive and may consist exclusively of alphanumeric characters, with a maximum length of 32. To make a keyspace name case-sensitive, enclose it in double quotes. The CREATE KEYSPACE statement, which creates a top-level keyspace, supports two properties: replication, which specifies the replication strategy and its options, and durable_writes, which specifies whether the commit log is to be used for updates on the keyspace. The replication property is mandatory. As an example, create a keyspace called CatalogKeyspace with the replication strategy class SimpleStrategy and a replication factor of 3.

CREATE KEYSPACE CatalogKeyspace
           WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};

The CatalogKeyspace keyspace gets created as shown in Figure 6-8.

9781484218297_Fig06-08.jpg

Figure 6-8. Creating a Keyspace
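The CREATE KEYSPACE statement can also be generated from a small shell helper, which is convenient when the same keyspace has to be created in several containers. The following is a minimal sketch; the function name create_keyspace_cql is hypothetical, and the sketch assumes the SimpleStrategy replication class used in this section. The IF NOT EXISTS clause from the syntax above makes the statement safe to run more than once.

```shell
# Print a CREATE KEYSPACE statement that uses SimpleStrategy.
# Arguments: keyspace name, replication factor (defaults to 1).
# create_keyspace_cql is an illustrative helper, not part of Cassandra.
create_keyspace_cql() {
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %d};\n" \
    "$1" "${2:-1}"
}
```

Calling create_keyspace_cql CatalogKeyspace 3 prints a statement for the case-insensitive name, while passing the name wrapped in double quotes, as in create_keyspace_cql '"CatalogKeyspace"' 1, prints a statement that preserves the case of the name. The printed statement may be pasted at the cqlsh> prompt.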

Altering A Keyspace

The ALTER KEYSPACE statement is used to alter a keyspace and has the following syntax with the supported properties being the same as for the CREATE KEYSPACE statement.

ALTER KEYSPACE <identifier> WITH <properties>

As an example, alter the CatalogKeyspace keyspace to make the replication factor 1.

ALTER KEYSPACE CatalogKeyspace
          WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

The replication factor gets set to 1 as shown in Figure 6-9.

9781484218297_Fig06-09.jpg

Figure 6-9. Altering a Keyspace

Using A Keyspace

The USE statement is used to set the current keyspace and has the following syntax.

USE <identifier>

All subsequent commands are run in the context of the Keyspace set with the USE statement. As an example, set the current Keyspace as CatalogKeyspace.

use CatalogKeyspace;

The cqlsh> command prompt becomes cqlsh:catalogkeyspace> as shown in Figure 6-10.

9781484218297_Fig06-10.jpg

Figure 6-10. Using a Keyspace

Creating a Table

A table is also called a column family, and either the CREATE TABLE or the CREATE COLUMNFAMILY statement may be used to create a table (column family).

 CREATE ( TABLE | COLUMNFAMILY ) ( IF NOT EXISTS )? <tablename>
                          '(' <column-definition> ( ',' <column-definition> )* ')'
                          ( WITH <option> ( AND <option>)* )?

For the complete syntax of the CREATE TABLE statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt. As an example, create a table called catalog with columns catalog_id, journal, publisher, edition, title, and author, all of type text. Specify the primary key as catalog_id and set the compaction class to LeveledCompactionStrategy.

CREATE TABLE catalog(catalog_id text,journal text,publisher text,edition text,title text,author text,PRIMARY KEY (catalog_id)) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

The catalog table gets created as shown in Figure 6-11.

9781484218297_Fig06-11.jpg

Figure 6-11. Creating a Table

Adding Table Data

The INSERT DML statement is used to add data into a table and has the following syntax.

INSERT INTO <tablename>
                            '(' <identifier> ( ',' <identifier> )* ')'
                     VALUES '(' <term-or-literal> ( ',' <term-or-literal> )* ')'
                     ( IF NOT EXISTS )?
                     ( USING <option> ( AND <option> )* )?

For the complete syntax of the INSERT statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#insertStmt. As an example, add two rows of data to the catalog table and include the IF NOT EXISTS clause so that a row is added only if a row identified by the same primary key does not already exist.

INSERT INTO catalog (catalog_id, journal, publisher, edition,title,author) VALUES ('catalog1','Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Engineering as a Service','David A. Kelly') IF NOT EXISTS;

INSERT INTO catalog (catalog_id, journal, publisher, edition,title,author) VALUES ('catalog2','Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Quintessential and Collaborative','Tom Haunert') IF NOT EXISTS;

As indicated by the [applied] True output, two rows of data get added as shown in Figure 6-12.

9781484218297_Fig06-12.jpg

Figure 6-12. Adding Table Data
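When INSERT statements are generated from a script rather than typed, values that contain a single quote have to be escaped, which CQL does by doubling the quote inside the string literal. The following is a minimal sketch of such a generator for the catalog table; the helper names cql_quote and insert_catalog_cql are hypothetical and are not part of cqlsh.

```shell
# Wrap a value as a CQL string literal, doubling embedded single quotes.
cql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print an INSERT for the catalog table from six positional values:
# catalog_id, journal, publisher, edition, title, author.
insert_catalog_cql() {
  printf 'INSERT INTO catalog (catalog_id, journal, publisher, edition, title, author) VALUES (%s, %s, %s, %s, %s, %s) IF NOT EXISTS;\n' \
    "$(cql_quote "$1")" "$(cql_quote "$2")" "$(cql_quote "$3")" \
    "$(cql_quote "$4")" "$(cql_quote "$5")" "$(cql_quote "$6")"
}
```

For example, cql_quote "O'Reilly" prints 'O''Reilly', and insert_catalog_cql called with the six values of the first row prints a statement equivalent to the first INSERT above.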

Querying a Table

The SELECT statement, which has the following syntax, is used to query a table.

SELECT <select-clause>
                  FROM <tablename>
                  ( WHERE <where-clause> )?
                  ( ORDER BY <order-by> )?
                  ( LIMIT <integer> )?
                  ( ALLOW FILTERING )?

For the complete syntax for the SELECT statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt. As an example select all columns from the catalog table.

SELECT * FROM catalog;

The two rows of data added previously get listed as shown in Figure 6-13.

9781484218297_Fig06-13.jpg

Figure 6-13. Querying Table
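A query does not have to select every column or every row, and it does not have to be run from an interactive tty. The following is a minimal sketch that runs a SELECT with a WHERE clause on the primary key from the Docker host; the function name query_catalog_row is hypothetical, and the sketch assumes the cassandradb container and the catalogkeyspace keyspace created earlier in this chapter.

```shell
# Print the title and author of one catalog row without opening a tty.
# Arguments: container name or id, catalog_id value.
query_catalog_row() {
  # -k sets the keyspace (like the USE statement);
  # -e runs a single statement and then exits cqlsh.
  sudo docker exec "$1" cqlsh -k catalogkeyspace \
    -e "SELECT title, author FROM catalog WHERE catalog_id = '$2';"
}
```

Running query_catalog_row cassandradb catalog1 on the host should list only the title and author columns of the row whose catalog_id is catalog1.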

Deleting from a Table

The DELETE statement is used to delete columns and rows and has the following syntax.

DELETE ( <selection> ( ',' <selection> )* )?
                  FROM <tablename>
                  ( USING TIMESTAMP <integer>)?
                  WHERE <where-clause>
                  ( IF ( EXISTS | ( <condition> ( AND <condition> )*) ) )?

For the complete syntax of the DELETE statement refer to https://cassandra.apache.org/doc/cql3/CQL.html#deleteStmt. As an example, delete all columns from the row with catalog_id as catalog1.

DELETE catalog_id, journal, publisher, edition, title, author FROM catalog WHERE catalog_id='catalog1';

Subsequently, query the catalog table with the SELECT statement.

SELECT * FROM catalog;

Column values from the row with catalog_id as catalog1 get deleted, but the row itself, including the primary key column value, does not get deleted even though the primary key catalog_id is listed as one of the columns to delete. A subsequent query lists the primary key column value but lists the values of the other columns as null, as shown in Figure 6-14.

9781484218297_Fig06-14.jpg

Figure 6-14. Deleting Table Data
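The selection in the DELETE syntax is optional. When it is omitted, the whole row identified by the WHERE clause is removed, including its primary key value. The following minimal sketch prints either form of the statement for the catalog table; the function name delete_catalog_cql is hypothetical.

```shell
# Print a DELETE statement for the catalog table.
# With only an id, the statement removes the whole row; any extra
# arguments name the non-key columns whose values should be cleared.
delete_catalog_cql() {
  id="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf "DELETE FROM catalog WHERE catalog_id = '%s';\n" "$id"
  else
    # Join the remaining arguments with ", " to form the selection.
    cols=$(printf '%s, ' "$@")
    printf "DELETE %s FROM catalog WHERE catalog_id = '%s';\n" "${cols%, }" "$id"
  fi
}
```

Here delete_catalog_cql catalog1 prints a statement that deletes the entire row, while delete_catalog_cql catalog1 journal title prints a statement that clears only the journal and title columns.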

Truncating a Table

The TRUNCATE statement removes all data from a table and has the following syntax.

TRUNCATE <tablename>

As an example, truncate the catalog table. Subsequently, run a query with the SELECT statement.

TRUNCATE catalog;
SELECT * from catalog;

As the output of the query indicates, no data is listed because the TRUNCATE statement has removed all data as shown in Figure 6-15.

9781484218297_Fig06-15.jpg

Figure 6-15. Truncating a Table

Dropping A Table

The DROP TABLE or DROP COLUMNFAMILY statement is used to drop a table and has the following syntax.

DROP TABLE ( IF EXISTS )? <tablename>

As an example, drop the catalog table.

DROP TABLE IF EXISTS catalog;

If the IF EXISTS clause is not specified and the table does not exist, an error is generated. But with the IF EXISTS clause, an error is not generated as indicated by two consecutively run DROP TABLE statements with the IF EXISTS clause included in Figure 6-16.

9781484218297_Fig06-16.jpg

Figure 6-16. Dropping a Table

Dropping a Keyspace

The DROP KEYSPACE statement, which has the following syntax, removes the specified keyspace, including the column families in the keyspace and the data in the column families. The keyspace does not have to be empty before being dropped.

 DROP KEYSPACE ( IF EXISTS )? <identifier>

As an example, drop the CatalogKeyspace keyspace.

DROP KEYSPACE IF EXISTS CatalogKeyspace;

If the IF EXISTS clause is not specified and the keyspace does not exist, an error is generated. But with the IF EXISTS clause, an error is not generated as indicated by two consecutively run DROP KEYSPACE statements with the IF EXISTS clause included as shown in Figure 6-17.

9781484218297_Fig06-17.jpg

Figure 6-17. Dropping a Keyspace

Exiting CQL Shell

To exit the cqlsh shell specify the exit command as shown in Figure 6-18. Subsequently exit the tty with the exit command also.

9781484218297_Fig06-18.jpg

Figure 6-18. Exiting CQL Shell

Stopping Apache Cassandra

To stop Apache Cassandra, stop the Docker container running the Apache Cassandra server.

sudo docker stop cassandradb

Subsequently, run the following command to list the running containers.

sudo docker ps

The cassandradb container does not get listed as running, as shown in Figure 6-19.

9781484218297_Fig06-19.jpg

Figure 6-19. Stopping Cassandra DB Docker Container

Starting Multiple Instances of Apache Cassandra

Multiple Docker containers running Apache Cassandra instances may be started, but each container name has to be unique. As an example, try to start a new Docker container, also called cassandradb, to run another instance of the Apache Cassandra database.

sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb -d -p 7000:7000  cassandra

Because a Docker container with the same name (cassandradb) was already created earlier, an error is generated even though the container has been stopped as shown in Figure 6-20. A container has to be removed with the docker rm command to be able to create a new container with the same name.

9781484218297_Fig06-20.jpg

Figure 6-20. Duplicate Docker Container name error
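Removing the old container before reusing its name can be scripted. The following is a minimal sketch; the function name remove_if_exists is hypothetical, and the sketch relies on docker inspect returning a non-zero exit status when no object with the given name exists.

```shell
# Remove a container by name, if one exists, so the name can be reused.
# Argument: container name.
remove_if_exists() {
  # docker inspect fails when nothing with this name exists.
  if sudo docker inspect "$1" >/dev/null 2>&1; then
    # docker rm refuses to remove a running container, so stop it first.
    sudo docker stop "$1" >/dev/null
    sudo docker rm "$1"
  else
    echo "no container named $1"
  fi
}
```

After remove_if_exists cassandradb, the docker run command with --name cassandradb may be run again. Removing the container does not delete the /cassandra/data directory on the host, because that directory is mounted into the container rather than stored in it.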

Another container with a different name, cassandradb2 for example, may be started.

sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb2 -d -p 7000:7000  cassandra

Start a third container and, if multiple nodes are to be run in the cluster, specify the CASSANDRA_SEEDS environment variable with the IP address or addresses of the seed nodes.

sudo docker run -t -i -v /cassandra/data:/var/lib/cassandra/data --name cassandradb3 -d -p 7000:7000 -e CASSANDRA_SEEDS=52.91.214.50,54.86.243.122,54.86.205.95 cassandra

Subsequently, run the following command to list the running containers.

sudo docker ps

The cassandradb2 and cassandradb3 containers get listed as running as shown in Figure 6-21.

9781484218297_Fig06-21.jpg

Figure 6-21. Running Multiple Docker Containers for Instances of Apache Cassandra
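The docker run commands in this section publish the same host port and mount the same host directory for each container. As a minimal sketch, assuming that each standalone instance should instead get its own host port and data directory (Docker cannot publish one host port for two running containers at the same time), the commands may be generated as follows. The function name cassandra_run_cmd and the numbering scheme for names, ports, and directories are assumptions made for illustration.

```shell
# Print a docker run command for the Nth standalone Cassandra instance.
# Instance 1 gets the name cassandradb1, host port 7001 and the host
# directory /cassandra/data1 (an illustrative numbering scheme).
cassandra_run_cmd() {
  n="$1"
  printf 'sudo docker run -d -v /cassandra/data%s:/var/lib/cassandra/data --name cassandradb%s -p %s:7000 cassandra\n' \
    "$n" "$n" "$((7000 + n))"
}

# Print the commands for instances 1 to 3 for review before running them.
for n in 1 2 3; do
  cassandra_run_cmd "$n"
done
```

The sketch only prints the commands, so they can be checked and then run one at a time.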

Summary

In this chapter we used the Docker image for Apache Cassandra to run Apache Cassandra in a Docker container. We used different CQL statements in a cqlsh shell to create a Keyspace, create a table in the Keyspace, and add data to the table. We also ran CQL statements to query a table, delete data from the table, truncate a table, drop a table, and drop a keyspace. We also demonstrated creating multiple Docker containers to run multiple instances of Apache Cassandra. In the next chapter we shall run Couchbase Server in Docker.
