Congratulations, you’ve made it to the end of the book. Where do you go from here? The answer depends on how you got here. If you read the book from start to end, we suggest implementing your own topologies while referring back to various chapters until you feel like you’re “getting the hang of Storm.” We hesitate to say “mastering Storm” as we’re not sure you’ll ever feel like you’re mastering Storm. It’s a powerful and complicated beast, and mastery is a tricky thing.
If you took a more iterative approach to the book, working through it slowly and gaining expertise as you went along, then everything else that follows in this afterword is for you. Don’t worry if you took the start-to-end approach; this afterword will be waiting for you once you feel like you’re getting the hang of Storm. Here are all the things we want you to know as you set off on the rest of your Storm journey without us.
We’ve been using Storm in production for quite a while now, and we’re still learning new things all the time. Don’t worry if you feel like you don’t know everything. Use what you already know to get your work done. You’ll learn more as you go. Analysis paralysis can be a real thing with Storm.
We haven’t covered every last nook and cranny of Storm. Dig into the official documentation, join the IRC channel, and join the mailing list. Storm is an evolving project. At the time this book is going to press, it hasn’t even reached version 1.0. If you’re using Storm for business-critical processes, make sure you know how to stay up to date. Here are a couple of things we think you should keep an eye on:
What’s Yarn? What’s Mesos? That’s really a book unto itself. For now, let’s just say they’re cluster resource managers that can allow you to share Storm cluster resources with other technologies such as Hadoop. That’s a gross simplification. We strongly advise you to check out Yarn and Mesos if you are planning on running a large Storm cluster in production. There’s a lot of exciting stuff going on in those projects.
The metrics support in Storm is pretty young. We suspect it will grow a lot more robust over time. Additionally, the most recent version of Storm introduced a REST API that allows you to access the information from the Storm UI in a programmatic fashion. That’s not particularly exciting outside of a couple of automation or monitoring scenarios. But it creates a path for exposing more information about what’s going on inside Storm to the outside world in an easily accessible fashion. We wouldn’t be surprised at all if some really cool things were built by exposing still more info via that API.
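To make the automation and monitoring scenario a little more concrete, here is a minimal sketch of reading cluster information through that REST API. It assumes the Storm UI is reachable at localhost:8080 and that the cluster summary endpoint returns the slotsTotal and slotsUsed fields; the helper names (api_url, fetch_json, slot_utilization, report) are ours for illustration and not part of Storm. Check the REST API documentation for your Storm version before relying on any of it.

```python
import json
from urllib.request import urlopen

# Assumption: the Storm UI listens on localhost:8080; adjust for your cluster.
STORM_UI = "http://localhost:8080"


def api_url(base, path):
    """Join the UI base address with a REST path under /api/v1/."""
    return base.rstrip("/") + "/api/v1/" + path.lstrip("/")


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def slot_utilization(summary):
    """Fraction of worker slots in use, given a parsed cluster summary."""
    total = summary["slotsTotal"]
    return summary["slotsUsed"] / total if total else 0.0


def report(base=STORM_UI):
    """Fetch the cluster summary and describe how full the cluster is."""
    summary = fetch_json(api_url(base, "cluster/summary"))
    return "slots in use: {:.0%}".format(slot_utilization(summary))
```

Calling report() against a running cluster returns a one-line description you could feed into a monitoring check. The parsing logic is kept separate from the network call so that it can be exercised without a live cluster.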
We spent one chapter on Trident. A lot of debate went into how much we should cover Trident. This ranged from nothing to several chapters. We settled on a single chapter to get you going with Trident. Why? Well, we considered not covering Trident at all. You can happily use Storm without ever needing to touch Trident. We don’t consider it a core part of Storm, but one of many abstractions you can build on top of Storm (more on that later). Even so, feedback disabused us of the notion that we could skip it entirely: every early reviewer brought up Trident as a must-cover topic.
We considered spending three chapters on Trident, much like we had three chapters on core Storm (chapters 2 to 4), and introducing it in the same fashion. If we were writing a book on Trident, we would have taken that approach, but large portions of those chapters would have mirrored the content in chapters 2 to 4. Trident is, after all, an abstraction on top of Storm. We settled on a single-chapter intro to Trident because we felt that as long as you understood the basics of Trident, everything else would flow from there. There are many more Trident operations we didn’t cover, but they all operate in the same fashion as the ones we did cover. If Trident seems like a better approach than core Storm for your problems, we feel we’ve given you what you need to dig in and start solving with Trident.
Use Trident only when you need to. Trident adds a lot of complexity compared to core Storm. It’s easier to debug problems with core Storm because there are fewer layers of abstraction to get through. Core Storm is also considerably faster than Trident. If you are really concerned about speed, favor core Storm. Why might you need to use Trident?
Trident isn’t the only abstraction that runs on Storm. We’ve seen numerous projects come and go on GitHub that try to build on top of Storm. Honestly, most of them weren’t that interesting. If you do the same type of work in topology after topology, perhaps you too will create your own abstraction over Storm to make that particular workflow easier. The most interesting abstraction over Storm that currently exists is Algebird (https://github.com/twitter/algebird) from Twitter.
Algebird is a Scala library that allows you to write abstract algebra code that can be “compiled” to run on either Storm or Hadoop. Why is that interesting? You can code up various algorithms and then reuse them in both batch and streaming contexts. That’s pretty damn cool if you ask us. Even if you don’t need to write reusable algebras, we suggest you check out the project if you’re interested in building abstractions on top of Storm; you can learn a lot from it.
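To give a feel for why algebraic code can be reused across batch and streaming, here is a rough sketch of the underlying idea in plain Python. It does not use Algebird or its Scala API; the Monoid class, the word-count example, and every function name below are our own illustration. The point is only that an associative combine operation with an identity value can be folded over a finite data set or applied one record at a time, and both routes arrive at the same answer.

```python
from functools import reduce


class Monoid:
    """An associative combine operation plus an identity element."""

    def __init__(self, zero, plus):
        self.zero = zero
        self.plus = plus


def merge_counts(a, b):
    """Combine two word-count dicts by adding counts key by key."""
    merged = dict(a)  # copy so neither input is mutated
    for key, value in b.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical example algebra: word counts, with the empty dict as identity.
WORD_COUNT = Monoid(zero={}, plus=merge_counts)


def batch_total(monoid, records):
    """Batch style: fold a complete, finite data set in one pass."""
    return reduce(monoid.plus, records, monoid.zero)


def streaming_totals(monoid, records):
    """Streaming style: yield the running total after each arriving record."""
    total = monoid.zero
    for record in records:
        total = monoid.plus(total, record)
        yield total
```

Given the same records, the last value produced by streaming_totals equals the result of batch_total, which is what lets a single definition of the algebra serve both contexts.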
And that really is it from us. Good luck; we’re rooting for you! Sean, Matt, and Peter out.