1.2. How Storm fits into the big data picture
Chapter 2. Core Storm concepts
2.1. Problem definition: GitHub commit count dashboard
2.3. Implementing a GitHub commit count dashboard in Storm
3.1. Approaching topology design
3.2. Problem definition: a social heat map
3.3. Precepts for mapping the solution to Storm
3.3.1. Consider the requirements imposed by the data stream
3.4. Initial implementation of the design
3.4.1. Spout: read data from a source
3.4.2. Bolt: connect to an external service
3.4.3. Bolt: collect data in-memory
3.4.4. Bolt: persisting to a data store
3.4.5. Defining stream groupings between the components
3.4.6. Building a topology for running in local cluster mode
3.5.1. Understanding parallelism in Storm
3.5.2. Adjusting the topology to address bottlenecks inherent within the design
3.5.3. Adjusting the topology to address bottlenecks inherent within a data stream
3.6. Topology design paradigms
3.6.1. Design by breakdown into functional components
3.6.2. Design by breakdown into components at points of repartition
3.6.3. Simplest functional components vs. lowest number of repartitions
Chapter 4. Creating robust topologies
4.1. Requirements for reliability
4.2. Problem definition: a credit card authorization system
4.2.1. A conceptual solution with retry characteristics
4.2.2. Defining the data points
4.2.3. Mapping the solution to Storm with retry characteristics
4.3. Basic implementation of the bolts
4.4. Guaranteed message processing
4.4.1. Tuple states: fully processed vs. failed
4.5.1. Degrees of reliability in Storm
4.5.2. Examining exactly once processing in a Storm topology
Chapter 5. Moving from local to remote topologies
5.1.1. The anatomy of a worker node
5.1.2. Presenting a worker node within the context of the credit card authorization topology
5.2. Fail-fast philosophy for fault tolerance within a Storm cluster
5.3. Installing a Storm cluster
5.3.1. Setting up a Zookeeper cluster
5.3.2. Installing the required Storm dependencies to master and worker nodes
5.3.3. Installing Storm to master and worker nodes
5.3.4. Configuring the master and worker nodes via storm.yaml
5.4. Getting your topology to run on a Storm cluster
5.4.1. Revisiting how to put together the topology components
5.4.2. Running topologies in local mode
5.5. The Storm UI and its role in the Storm cluster
5.5.1. Storm UI: the Storm cluster summary
6.1. Problem definition: Daily Deals! reborn
6.2.1. Spout: read from a data source
6.2.2. Bolt: find recommended sales
6.3.1. The Storm UI: your go-to tool for tuning
6.3.2. Establishing a baseline set of performance numbers
6.3.3. Identifying bottlenecks
6.3.4. Spouts: controlling the rate data flows into a topology
6.4. Latency: when external systems take their time
6.5. Storm’s metrics-collecting API
6.5.1. Using Storm’s built-in CountMetric
6.5.2. Setting up a metrics consumer
Chapter 7. Resource contention
7.1. Changing the number of worker processes running on a worker node
7.2. Changing the amount of memory allocated to worker processes (JVMs)
7.3. Figuring out which worker nodes/processes a topology is executing on
7.4. Contention for worker processes in a Storm cluster
7.5. Memory contention within a worker process (JVM)
7.6. Memory contention on a worker node
7.7. Worker node CPU contention
7.8. Worker node I/O contention
8.1. The commit count topology revisited
8.1.1. Reviewing the topology design
8.1.2. Thinking of the topology as running on a remote Storm cluster
8.1.3. How data flows between the spout and bolts in the cluster
8.2. Diving into the details of an executor
8.2.1. Executor details for the commit feed listener spout
8.2.2. Transferring tuples between two executors on the same JVM
8.2.3. Executor details for the email extractor bolt
8.2.4. Transferring tuples between two executors on different JVMs
8.4. Knowing when Storm’s internal queues overflow
8.4.1. The various types of internal queues and how they might overflow
8.4.2. Using Storm’s debug logs to diagnose buffer overflowing
8.5. Addressing internal Storm buffers overflowing
8.5.1. Adjust the production-to-consumption ratio
8.5.2. Increase the size of the buffer for all topologies
9.2. Kafka and its role with Trident
9.3. Problem definition: Internet radio
9.4. Implementing the Internet radio design as a Trident topology
9.4.1. Implementing the spout with a Trident Kafka spout
9.4.2. Deserializing the play log and creating separate streams for each of the fields
9.4.3. Calculating and persisting the counts for artist, title, and tag
9.5. Accessing the persisted counts through DRPC
9.6. Mapping Trident operations to Storm primitives
9.7. Scaling a Trident topology