Afterword

Congratulations, you’ve made it to the end of the book. Where do you go from here? The answer depends on how you got here. If you read the book from start to end, we suggest implementing your own topologies while referring back to various chapters until you feel like you’re “getting the hang of Storm.” We hesitate to say “mastering Storm” as we’re not sure you’ll ever feel like you’re mastering Storm. It’s a powerful and complicated beast, and mastery is a tricky thing.

If you took a more iterative approach to the book, working through it slowly and gaining expertise as you went along, then everything else that follows in this afterword is for you. Don’t worry if you took the start-to-end approach; this afterword will be waiting for you once you feel like you’re getting the hang of Storm. Here are all the things we want you to know as you set off on the rest of your Storm journey without us.

You’re right, you don’t know that

We’ve been using Storm in production for quite a while now, and we’re still learning new things all the time. Don’t worry if you feel like you don’t know everything. Use what you know to get what you need done. You’ll learn more as you go. Analysis paralysis can be a real thing with Storm.

There’s so much to know

We haven’t covered every last nook and cranny of Storm. Dig into the official documentation, join the IRC channel, and join the mailing list. Storm is an evolving project. At the time this book is going to press, it hasn’t even reached version 1.0. If you’re using Storm for business-critical processes, make sure you know how to stay up to date. Here are a couple of things we think you should keep an eye on:

  • Storm on YARN
  • Storm on Mesos

What’s YARN? What’s Mesos? Each is really a book unto itself. For now, let’s just say they’re cluster resource managers that let you share Storm cluster resources with other technologies such as Hadoop. That’s a gross simplification. We strongly advise you to check out YARN and Mesos if you’re planning on running a large Storm cluster in production. There’s a lot of exciting stuff going on in those projects.

Metrics and reporting

The metrics support in Storm is pretty young. We suspect it will grow a lot more robust over time. Additionally, the most recent version of Storm introduced a REST API that allows you to access the information from the Storm UI in a programmatic fashion. That’s not particularly exciting outside of a couple of automation or monitoring scenarios. But it creates a path for exposing more information about what’s going on inside Storm to the outside world in an easily accessible fashion. We wouldn’t be surprised at all if some really cool things were built by exposing still more info via that API.
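To make the monitoring scenario a little more concrete, here is a minimal sketch of polling that API from a script. It rests on several assumptions you should verify against your own cluster: that the Storm UI listens on port 8080, that the cluster summary lives at /api/v1/cluster/summary, and that the response carries slotsTotal and slotsUsed fields. The helper names are ours and purely illustrative.

```python
import json
from urllib.request import urlopen


def summary_url(host: str, port: int = 8080) -> str:
    """Build the cluster summary URL (path and default port are assumptions)."""
    return f"http://{host}:{port}/api/v1/cluster/summary"


def fetch_cluster_summary(host: str, port: int = 8080, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON cluster summary from the Storm UI."""
    with urlopen(summary_url(host, port), timeout=timeout) as response:
        return json.load(response)


def slot_utilization(summary: dict) -> float:
    """Return the fraction of worker slots in use, or 0.0 if none are reported."""
    total = summary.get("slotsTotal", 0)  # assumed field name
    used = summary.get("slotsUsed", 0)    # assumed field name
    return used / total if total else 0.0
```

A monitoring job might call slot_utilization(fetch_cluster_summary("localhost")) on a schedule and raise an alert when the result gets close to 1.0.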

Trident is quite a beast

We spent one chapter on Trident, and a lot of debate went into how much of it we should cover. The options ranged from nothing to several chapters, and we settled on a single chapter to get you going. Why? Well, we first considered not covering Trident at all. You can happily use Storm without ever needing to touch Trident. We don’t consider it a core part of Storm, but one of many abstractions you can build on top of Storm (more on that later). Even so, we were disabused of the notion that we could skip it entirely: every early reviewer brought up Trident as a must-cover topic.

We considered spending three chapters on Trident much like we had three chapters on core Storm (chapters 2 to 4) and introducing it in the same fashion. If we were writing a book on Trident, we would have taken that approach, but large portions of those chapters would have mirrored the content in chapters 2 to 4. Trident is, after all, an abstraction on top of Storm. We settled on a single chapter intro to Trident because we felt that as long as you understood the basics of Trident, everything else would flow from there. There are many more Trident operations we didn’t cover, but they all operate in the same fashion as the ones we did cover. If Trident seems like a better approach than core Storm for your problems, we feel we’ve given you what you need to dig in and start solving with Trident.

When should I use Trident?

Use Trident only when you need to. Trident adds a lot of complexity compared to core Storm. It’s easier to debug problems with core Storm because there are fewer layers of abstraction to get through. Core Storm is also considerably faster than Trident. If you are really concerned about speed, favor core Storm. Why might you need to use Trident?

  • “What” not “how” is very important to you.

    • The important algorithmic details of your computation are hard to follow using core Storm but are very clear using Trident. If your process is all about the algorithm, and it’s hard to see what’s going on with core Storm, maintenance is going to be difficult.
  • You need exactly-once processing.

    • As we discussed in chapter 4, exactly-once processing is very hard to achieve; some would say it’s impossible. We won’t go that far. We will say that there are scenarios where it’s impossible. Even when it is possible, getting it right can be hard. Trident can help you build an exactly-once processing system. You can do that with core Storm as well, but there’s more work involved on your part.
  • You need to maintain state.

    • Again, you can do this with core Storm, but Trident is good at maintaining state, and DRPC provides a nice way to get at that state. If your workload is less about data pipelines (transforming input to output and feeding that output into another data pipeline) and more about creating queryable pools of data, then Trident state with DRPC can help you get there.

Abstractions! Abstractions everywhere!

Trident isn’t the only abstraction that runs on Storm. We’ve seen numerous projects come and go on GitHub that try to build on top of Storm. Honestly, most of them weren’t that interesting. If you do the same type of work in topology after topology, perhaps you too will create your own abstraction over Storm to make that particular workflow easier. The most interesting abstraction over Storm that currently exists is Algebird (https://github.com/twitter/algebird) from Twitter.

Algebird is a Scala library that allows you to write abstract algebra code that can be “compiled” to run on either Storm or Hadoop. Why is that interesting? You can code up various algorithms and then reuse them in both batch and streaming contexts. That’s pretty damn cool if you ask us. Even if you don’t need to write reusable algebras, we suggest you check out the project if you’re interested in building abstractions on top of Storm; you can learn a lot from it.
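If the idea of reusing one algebra in both batch and streaming contexts feels abstract, the following sketch may help. It is not Algebird’s API (Algebird is a Scala library); it’s a small Python illustration with names we made up. The point is that once you describe an aggregation as a monoid (an identity value plus an associative combine function), the same definition can drive a one-shot batch fold and a running streaming total.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity value plus an associative way to combine two values."""
    zero: T
    plus: Callable[[T, T], T]


def batch_sum(monoid: Monoid[T], items: Iterable[T]) -> T:
    """Batch context: fold the whole data set down to a single value."""
    return reduce(monoid.plus, items, monoid.zero)


def stream_sum(monoid: Monoid[T], items: Iterable[T]) -> Iterator[T]:
    """Streaming context: emit the running aggregate after every item."""
    total = monoid.zero
    for item in items:
        total = monoid.plus(total, item)
        yield total


# Two example algebras, each defined once and usable in both contexts.
int_add = Monoid(zero=0, plus=lambda a, b: a + b)
int_max = Monoid(zero=float("-inf"), plus=max)
```

Because both functions depend only on zero and plus, swapping int_add for int_max changes what is computed without touching how the batch or streaming side runs.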

And that really is it from us. Good luck; we’re rooting for you! Sean, Matt, and Peter out.
