Summary

In this chapter, we introduced a new class of algorithms, frequent pattern mining, and showed you how to deploy them in a real-world scenario. We first discussed the basics of pattern mining and the problems that these techniques can address. In particular, we saw how to apply the three pattern mining algorithms available in Spark: FP-growth, association rules, and PrefixSpan. As a running example, we used clickstream data provided by MSNBC, which also helped us compare the algorithms qualitatively.
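As a reminder of what these algorithms compute, the following is a minimal sketch in plain Python rather than Spark. It counts itemset support by brute-force enumeration, which is not how FP-growth works internally but yields the same frequent itemsets on tiny inputs. The function names and the sample sessions are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute-force enumeration: fine for a toy example, far too slow for real data.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent together) / support(antecedent).
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return itemsets[both] / itemsets[a]


# Hypothetical page categories visited per user session (not the MSNBC data).
sessions = [
    ["frontpage", "news", "sports"],
    ["frontpage", "news"],
    ["frontpage", "weather"],
    ["news", "sports"],
]

freq = frequent_itemsets(sessions, min_support=0.5)
rule_conf = confidence(freq, ["sports"], ["news"])
```

Here `freq` holds the itemsets that occur in at least half of the sessions, and `rule_conf` measures how often a session containing `sports` also contains `news`.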

Next, we introduced the basic terminology and entry points of Spark Streaming and considered a few real-world scenarios. We first discussed how to deploy and evaluate one of the frequent pattern mining algorithms in a streaming context. After that, we addressed the problem of aggregating user session data from raw streaming data. To this end, we had to find a way to mock click data as a stream of events.
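The idea of mocking click data and aggregating it into sessions can be sketched as follows. This is a simplified plain-Python illustration under stated assumptions, not the Spark Streaming code from the chapter: events are assumed to be `(user_id, page)` pairs, the generator stands in for a stream of micro-batches, and all names are hypothetical.

```python
from collections import defaultdict


def mock_click_stream(events, batch_size):
    """Yield events in fixed-size batches, imitating streaming micro-batches."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


def aggregate_sessions(batches):
    """Fold batches of (user_id, page) events into one ordered page list per user."""
    sessions = defaultdict(list)
    for batch in batches:
        for user_id, page in batch:
            sessions[user_id].append(page)
    return dict(sessions)


# Hypothetical raw click events, in arrival order.
events = [
    ("u1", "frontpage"),
    ("u2", "news"),
    ("u1", "sports"),
    ("u2", "weather"),
    ("u1", "news"),
]

user_sessions = aggregate_sessions(mock_click_stream(events, batch_size=2))
```

Because state is carried across batches, each user's pages end up in arrival order even when a session spans several batches. The resulting per-user page sequences are the kind of input a sequential pattern miner such as PrefixSpan expects.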
