Foreword

“Backend rewrites are always hard.”

That’s how ours began, with a simple statement from my brilliant and trusted colleague, Keith Bourgoin. We had been working on the original web analytics backend behind Parse.ly for over a year. We called it “PTrack”.

Parse.ly uses Python, so we built our systems atop comfortable distributed computing tools that were handy in that community, such as multiprocessing and celery. Despite our mastery of these, it seemed like every three months, we’d double the amount of traffic we had to handle and hit some other limitation of those systems. There had to be a better way.

So, we started the much-feared backend rewrite. This new scheme to process our data would use small Python processes that communicated via ZeroMQ. We jokingly called it “PTrack3000,” referring to the “Python3000” name given to the future version of Python by the language’s creator, when it was still a far-off pipe dream.

By using ZeroMQ, we thought we could squeeze more messages per second out of each process and keep the system operationally simple. But what this setup gained in operational ease and performance, it lost in data reliability.

Then, something magical happened. BackType, a startup whose progress we had tracked in the popular press,[1] was acquired by Twitter. One of the first orders of business upon being acquired was to publicly release its stream processing framework, Storm, to the world.

1 This article, “Secrets of BackType’s Data Engineers” (2011), was passed around my team for a while before Storm was released: http://readwrite.com/2011/01/12/secrets-of-backtypes-data-engineers.

My colleague Keith studied the documentation and code in detail, and realized: Storm was exactly what we needed!

It even used ZeroMQ internally (at the time) and layered on other tooling for easy parallel processing, hassle-free operations, and an extremely clever data reliability model. Though it was written in Java, it included some documentation and examples for making other languages, like Python, play nicely with the framework. So, with much glee, “PTrack9000!” (exclamation point required) was born: a new Parse.ly analytics backend powered by Storm.

Nathan Marz, Storm’s original creator, spent some time cultivating the community via conferences, blog posts, and user forums.[2] But in those early days of the project, you had to scrape tiny morsels of Storm knowledge from the vast web.

2 Nathan Marz wrote this blog post about his early efforts at evangelizing the project in “History of Apache Storm and lessons learned” (2014): http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html.

Oh, how I wish Storm Applied, the book you’re currently reading, had already been written in 2011. Although Storm’s documentation on its design rationale was very strong, there were no practical guides on making use of Storm (especially in a production setting) when we adopted it. Frustratingly, despite a surge of popularity over the next three years, there were still no good books on the subject through the end of 2014!

No one had put in the significant effort required to detail how Storm components worked, how Storm code should be written, how to tune topology performance, and how to operate these clusters in the real world. That is, until now. Sean, Matthew, and Peter decided to write Storm Applied by leveraging their hard-earned production experience at TheLadders, and it shows. This will, no doubt, become the definitive practitioner’s guide for Storm users everywhere.

Through their clear prose, illuminating diagrams, and practical code examples, you’ll gain as much Storm knowledge in a few short days as it took my team several years to acquire. You will save yourself many stressful firefights, head-scratching moments, and painful code re-architectures.

I’m convinced that with the newfound understanding provided by this book, the next time a colleague turns to you and says, “Backend rewrites are always hard,” you’ll be able to respond with confidence: “Not this time.”

Happy hacking!

ANDREW MONTALENTI

COFOUNDER & CTO, PARSE.LY[3]

3 Parse.ly’s web analytics system for digital storytellers is powered by Storm: http://parse.ly.

CREATOR OF STREAMPARSE, A PYTHON PACKAGE FOR STORM[4]

4 To use Storm with Python, you can find the streamparse project on GitHub: https://github.com/Parsely/streamparse.
