Book Description

Imagine what you could do if scalability wasn’t a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition—updated for Cassandra 4.0—provides the technical details and practical examples you need to put this database to work in a production environment.

Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s nonrelational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.

  • Understand Cassandra’s distributed and decentralized structure
  • Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell
  • Create a working data model and compare it with an equivalent relational model
  • Develop sample applications using client drivers for languages including Java, Python, and Node.js
  • Explore cluster topology and learn how nodes exchange data

Table of Contents

  1. Preface
    1. Why Apache Cassandra?
    2. Is This Book for You?
    3. What’s in This Book?
    4. New for the Second Edition
    5. New for the Third Edition
    6. Conventions Used in This Book
    7. Using Code Examples
    8. O’Reilly Online Learning
    9. How to Contact Us
    10. Acknowledgments
  2. 1. Beyond Relational Databases
    1. What’s Wrong with Relational Databases?
    2. A Quick Review of Relational Databases
      1. Transactions, ACID-ity, and two-phase commit
      2. Schema
      3. Sharding and shared-nothing architecture
    3. Web Scale
    4. The Rise of NoSQL
    5. Summary
  3. 2. Introducing Cassandra
    1. The Cassandra Elevator Pitch
      1. Cassandra in 50 Words or Less
      2. Distributed and Decentralized
      3. Elastic Scalability
      4. High Availability and Fault Tolerance
      5. Tuneable Consistency
      6. Brewer’s CAP Theorem
      7. Row-Oriented
      8. High Performance
    2. Where Did Cassandra Come From?
    3. Is Cassandra a Good Fit for My Project?
      1. Large Deployments
      2. Lots of Writes, Statistics, and Analysis
      3. Geographical Distribution
      4. Hybrid- and Multi-Cloud Deployment
    4. Getting Involved
    5. Summary
  4. 3. Installing Cassandra
    1. Installing the Apache Distribution
      1. Extracting the Download
      2. What’s In There?
    2. Building from Source
      1. Additional Build Targets
    3. Running Cassandra
      1. Setting the environment
      2. Starting the Server
      3. Stopping Cassandra
    4. Other Cassandra Distributions
    5. Running the CQL Shell
    6. Basic cqlsh Commands
      1. cqlsh Help
      2. Describing the Environment in cqlsh
      3. Creating a Keyspace and Table in cqlsh
      4. Writing and Reading Data in cqlsh
    7. Running Cassandra in Docker
    8. Summary
  5. 4. The Cassandra Query Language
    1. The Relational Data Model
    2. Cassandra’s Data Model
      1. Clusters
      2. Keyspaces
      3. Tables
      4. Columns
    3. CQL Types
      1. Numeric Data Types
      2. Textual Data Types
      3. Time and Identity Data Types
      4. Other Simple Data Types
      5. Collections
      6. Tuples
      7. User-Defined Types
    4. Summary
  6. 5. Data Modeling
    1. Conceptual Data Modeling
    2. RDBMS Design
      1. Design Differences Between RDBMS and Cassandra
    3. Defining Application Queries
    4. Logical Data Modeling
      1. Hotel Logical Data Model
      2. Reservation Logical Data Model
    5. Physical Data Modeling
      1. Hotel Physical Data Model
      2. Reservation Physical Data Model
    6. Evaluating and Refining
      1. Calculating Partition Size
      2. Calculating Size on Disk
      3. Breaking Up Large Partitions
    7. Defining Database Schema
      1. Cassandra Data Modeling Tools
    8. Summary
  7. 6. The Cassandra Architecture
    1. Data Centers and Racks
    2. Gossip and Failure Detection
    3. Snitches
    4. Rings and Tokens
    5. Virtual Nodes
    6. Partitioners
    7. Replication Strategies
    8. Consistency Levels
    9. Queries and Coordinator Nodes
    10. Hinted Handoff
    11. Anti-Entropy, Repair, and Merkle Trees
    12. Lightweight Transactions and Paxos
    13. Memtables, SSTables, and Commit Logs
    14. Bloom Filters
    15. Caching
    16. Compaction
    17. Deletion and Tombstones
    18. Managers and Services
      1. Cassandra Daemon
      2. Storage Engine
      3. Storage Service
      4. Storage Proxy
      5. Messaging Service
      6. Stream Manager
      7. CQL Native Transport Server
    19. System Keyspaces
    20. Summary
  8. 7. Designing Applications with Cassandra
    1. Hotel Application Design
      1. Cassandra and Microservice Architecture
      2. Microservice Architecture for a Hotel Application
      3. Identifying Bounded Contexts
      4. Identifying Services
      5. Designing Microservice Persistence
    2. Extending Designs
      1. Secondary Indexes
      2. Materialized Views
    3. Reservation Service: A Sample Microservice
      1. Design Choices for a Java Microservice
    4. Deployment and Integration Considerations
      1. Services, Keyspaces and Clusters
      2. Data Centers and Load Balancing
      3. Interactions Between Microservices
    5. Summary
  9. 8. Application Development with Drivers
    1. DataStax Java Driver
      1. Development Environment Configuration
      2. Connecting to a Cluster
      3. Statements
      4. Simple Statements
      5. Prepared Statements
      6. Query Builder
      7. Object Mapper
      8. Asynchronous Execution
      9. Driver Configuration
      10. Metadata
      11. Debugging and Monitoring
    2. Other Cassandra Drivers
    3. Summary
  10. 9. Writing and Reading Data
    1. Writing
      1. Write Consistency Levels
      2. The Cassandra Write Path
      3. Writing Files to Disk
      4. Lightweight Transactions
      5. Batches
    2. Reading
      1. Read Consistency Levels
      2. The Cassandra Read Path
      3. Read Repair
      4. Range Queries, Ordering and Filtering
      5. Paging
    3. Deleting
    4. Summary
  11. 10. Configuring and Deploying Cassandra
    1. Cassandra Cluster Manager
      1. Creating a Cluster
      2. Adding Nodes to a Cluster
      3. Dynamic Ring Participation
    2. Node Configuration
      1. Seed Nodes
      2. Snitches
      3. Partitioners
      4. Tokens and Virtual Nodes
      5. Network Interfaces
      6. Data Storage
      7. Startup and JVM Settings
    3. Planning a Cluster Deployment
      1. Cluster Topology and Replication Strategies
      2. Sizing Your Cluster
      3. Selecting Instances
      4. Storage
      5. Network
    4. Cloud Deployment
      1. Amazon Web Services
      2. Google Cloud Platform
      3. Microsoft Azure
    5. Summary
  12. 11. Monitoring
    1. Monitoring Cassandra with JMX
    2. Cassandra’s MBeans
      1. Database MBeans
      2. Cluster-Related MBeans
      3. Internal MBeans
    3. Monitoring with nodetool
      1. Getting Cluster Information
      2. Getting Statistics
    4. Virtual Tables
      1. System Virtual Schema
      2. System Views
    5. Metrics
    6. Logging
      1. Examining Log Files
      2. Full Query Logging
    7. Summary
  13. 12. Maintenance
    1. Health Check
    2. Basic Maintenance
      1. Flush
      2. Cleanup
      3. Repair
      4. Rebuilding Indexes
      5. Moving Tokens
    3. Adding Nodes
      1. Adding Nodes to an Existing Data Center
      2. Adding a Data Center to a Cluster
    4. Handling Node Failure
      1. Repairing Failed Nodes
      2. Replacing Nodes
      3. Removing Nodes
    5. Upgrading Cassandra
    6. Backup and Recovery
      1. Taking a Snapshot
      2. Clearing a Snapshot
      3. Enabling Incremental Backup
      4. Restoring from Snapshot
    7. SSTable Utilities
    8. Maintenance Tools
      1. Netflix Priam
      2. DataStax OpsCenter
      3. Cassandra Sidecars
      4. Cassandra Kubernetes Operators
    9. Summary
  14. 13. Performance Tuning
    1. Managing Performance
      1. Setting Performance Goals
      2. Benchmarking and Stress Testing
      3. Monitoring Performance
      4. Analyzing Performance Issues
      5. Tracing
      6. Tuning Methodology
    2. Caching
      1. Key Cache
      2. Row Cache
      3. Counter Cache
      4. Saved Cache Settings
    3. Memtables
    4. Commit Logs
    5. SSTables
    6. Hinted Handoff
    7. Compaction
    8. Concurrency and Threading
    9. Networking and Timeouts
    10. JVM Settings
      1. Memory
      2. Garbage Collection
    11. Summary
  15. 14. Security
    1. Authentication and Authorization
      1. Password Authenticator
      2. Using CassandraAuthorizer
      3. Role-Based Access Control
    2. Encryption
      1. SSL, TLS, and Certificates
      2. Node-to-Node Encryption
      3. Client-to-Node Encryption
    3. JMX Security
      1. Securing JMX Access
      2. Security MBeans
    4. Audit Logging
    5. Summary
  16. 15. Migrating and Integrating
    1. Knowing when to migrate
    2. Adapting the data model
      1. Translating Entities
      2. Translating Relationships
    3. Adapting the application
      1. Refactoring Data Access
      2. Maintaining Consistency
      3. Migrating Stored Procedures
    4. Planning the deployment
    5. Migrating data
      1. Zero Downtime Migration
      2. Bulk Loading
    6. Common Integrations
      1. Managing data flow with Apache Kafka
      2. Searching with Apache Lucene, SOLR, and Elasticsearch
      3. Analyzing data with Apache Spark
    7. Summary