Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Cover Illustration

Chapter 1. Introducing Storm

1.1. What is big data?

1.1.1. The four Vs of big data

1.1.2. Big data tools

1.2. How Storm fits into the big data picture

1.2.1. Storm vs. the usual suspects

1.3. Why you’d want to use Storm

1.4. Summary

Chapter 2. Core Storm concepts

2.1. Problem definition: GitHub commit count dashboard

2.1.1. Data: starting and ending points

2.1.2. Breaking down the problem

2.2. Basic Storm concepts

2.2.1. Topology

2.2.2. Tuple

2.2.3. Stream

2.2.4. Spout

2.2.5. Bolt

2.2.6. Stream grouping

2.3. Implementing a GitHub commit count dashboard in Storm

2.3.1. Setting up a Storm project

2.3.2. Implementing the spout

2.3.3. Implementing the bolts

2.3.4. Wiring everything together to form the topology

2.4. Summary

Chapter 3. Topology design

3.1. Approaching topology design

3.2. Problem definition: a social heat map

3.2.1. Formation of a conceptual solution

3.3. Precepts for mapping the solution to Storm

3.3.1. Consider the requirements imposed by the data stream

3.3.2. Represent data points as tuples

3.3.3. Steps for determining the topology composition

3.4. Initial implementation of the design

3.4.1. Spout: read data from a source

3.4.2. Bolt: connect to an external service

3.4.3. Bolt: collect data in-memory

3.4.4. Bolt: persisting to a data store

3.4.5. Defining stream groupings between the components

3.4.6. Building a topology for running in local cluster mode

3.5. Scaling the topology

3.5.1. Understanding parallelism in Storm

3.5.2. Adjusting the topology to address bottlenecks inherent within design

3.5.3. Adjusting the topology to address bottlenecks inherent within a data stream

3.6. Topology design paradigms

3.6.1. Design by breakdown into functional components

3.6.2. Design by breakdown into components at points of repartition

3.6.3. Simplest functional components vs. lowest number of repartitions

3.7. Summary

Chapter 4. Creating robust topologies

4.1. Requirements for reliability

4.1.1. Pieces of the puzzle for supporting reliability

4.2. Problem definition: a credit card authorization system

4.2.1. A conceptual solution with retry characteristics

4.2.2. Defining the data points

4.2.3. Mapping the solution to Storm with retry characteristics

4.3. Basic implementation of the bolts

4.3.1. The AuthorizeCreditCard implementation

4.3.2. The ProcessedOrderNotification implementation

4.4. Guaranteed message processing

4.4.1. Tuple states: fully processed vs. failed

4.4.2. Anchoring, acking, and failing tuples in our bolts

4.4.3. A spout’s role in guaranteed message processing

4.5. Replay semantics

4.5.1. Degrees of reliability in Storm

4.5.2. Examining exactly once processing in a Storm topology

4.5.3. Examining the reliability guarantees in our topology

4.6. Summary

Chapter 5. Moving from local to remote topologies

5.1. The Storm cluster

5.1.1. The anatomy of a worker node

5.1.2. Presenting a worker node within the context of the credit card authorization topology

5.2. Fail-fast philosophy for fault tolerance within a Storm cluster

5.3. Installing a Storm cluster

5.3.1. Setting up a Zookeeper cluster

5.3.2. Installing the required Storm dependencies to master and worker nodes

5.3.3. Installing Storm to master and worker nodes

5.3.4. Configuring the master and worker nodes via storm.yaml

5.3.5. Launching Nimbus and Supervisors under supervision

5.4. Getting your topology to run on a Storm cluster

5.4.1. Revisiting how to put together the topology components

5.4.2. Running topologies in local mode

5.4.3. Running topologies on a remote Storm cluster

5.4.4. Deploying a topology to a remote Storm cluster

5.5. The Storm UI and its role in the Storm cluster

5.5.1. Storm UI: the Storm cluster summary

5.5.2. Storm UI: individual Topology summary

5.5.3. Storm UI: individual spout/bolt summary

5.6. Summary

Chapter 6. Tuning in Storm

6.1. Problem definition: Daily Deals! reborn

6.1.1. Formation of a conceptual solution

6.1.2. Mapping the solution to Storm concepts

6.2. Initial implementation

6.2.1. Spout: read from a data source

6.2.2. Bolt: find recommended sales

6.2.3. Bolt: look up details for each sale

6.2.4. Bolt: save recommended sales

6.3. Tuning: I wanna go fast

6.3.1. The Storm UI: your go-to tool for tuning

6.3.2. Establishing a baseline set of performance numbers

6.3.3. Identifying bottlenecks

6.3.4. Spouts: controlling the rate data flows into a topology

6.4. Latency: when external systems take their time

6.4.1. Simulating latency in your topology

6.4.2. Extrinsic and intrinsic reasons for latency

6.5. Storm’s metrics-collecting API

6.5.1. Using Storm’s built-in CountMetric

6.5.2. Setting up a metrics consumer

6.5.3. Creating a custom SuccessRateMetric

6.5.4. Creating a custom MultiSuccessRateMetric

6.6. Summary

Chapter 7. Resource contention

7.1. Changing the number of worker processes running on a worker node

7.1.1. Problem

7.1.2. Solution

7.1.3. Discussion

7.2. Changing the amount of memory allocated to worker processes (JVMs)

7.2.1. Problem

7.2.2. Solution

7.2.3. Discussion

7.3. Figuring out which worker nodes/processes a topology is executing on

7.3.1. Problem

7.3.2. Solution

7.3.3. Discussion

7.4. Contention for worker processes in a Storm cluster

7.4.1. Problem

7.4.2. Solution

7.4.3. Discussion

7.5. Memory contention within a worker process (JVM)

7.5.1. Problem

7.5.2. Solution

7.5.3. Discussion

7.6. Memory contention on a worker node

7.6.1. Problem

7.6.2. Solution

7.6.3. Discussion

7.7. Worker node CPU contention

7.7.1. Problem

7.7.2. Solution

7.7.3. Discussion

7.8. Worker node I/O contention

7.8.1. Network/socket I/O contention

7.8.2. Disk I/O contention

7.9. Summary

Chapter 8. Storm internals

8.1. The commit count topology revisited

8.1.1. Reviewing the topology design

8.1.2. Thinking of the topology as running on a remote Storm cluster

8.1.3. How data flows between the spout and bolts in the cluster

8.2. Diving into the details of an executor

8.2.1. Executor details for the commit feed listener spout

8.2.2. Transferring tuples between two executors on the same JVM

8.2.3. Executor details for the email extractor bolt

8.2.4. Transferring tuples between two executors on different JVMs

8.2.5. Executor details for the email counter bolt

8.3. Routing and tasks

8.4. Knowing when Storm’s internal queues overflow

8.4.1. The various types of internal queues and how they might overflow

8.4.2. Using Storm’s debug logs to diagnose buffer overflowing

8.5. Addressing internal Storm buffers overflowing

8.5.1. Adjust the production-to-consumption ratio

8.5.2. Increase the size of the buffer for all topologies

8.5.3. Increase the size of the buffer for a given topology

8.5.4. Max spout pending

8.6. Tweaking buffer sizes for performance gain

8.7. Summary

Chapter 9. Trident

9.1. What is Trident?

9.1.1. The different types of Trident operations

9.1.2. Trident streams as a series of batches

9.2. Kafka and its role with Trident

9.2.1. Breaking down Kafka’s design

9.2.2. Kafka’s alignment with Trident

9.3. Problem definition: Internet radio

9.3.1. Defining the data points

9.3.2. Breaking down the problem into a series of steps

9.4. Implementing the internet radio design as a Trident topology

9.4.1. Implementing the spout with a Trident Kafka spout

9.4.2. Deserializing the play log and creating separate streams for each of the fields

9.4.3. Calculating and persisting the counts for artist, title, and tag

9.5. Accessing the persisted counts through DRPC

9.5.1. Creating a DRPC stream

9.5.2. Applying a DRPC state query to a stream

9.5.3. Making DRPC calls with a DRPC client

9.6. Mapping Trident operations to Storm primitives

9.7. Scaling a Trident topology

9.7.1. Partitions for parallelism

9.7.2. Partitions in Trident streams

9.8. Summary

Afterword

You’re right, you don’t know that

There’s so much to know

Metrics and reporting

Trident is quite a beast

When should I use Trident?

Abstractions! Abstractions everywhere!

Index

List of Figures

List of Tables

List of Listings
