Summary

In this chapter, we showed how to develop large-scale machine learning applications from real-time Twitter stream data and graph data. We discussed social network and time-series data analysis. We also developed a recommendation application that uses the content-based collaborative filtering algorithms of Spark MLlib to make movie recommendations for users. These applications can be extended and deployed for other use cases.

It is worth noting that the current implementation of Spark contains only a few algorithms for streaming or network data analysis. However, we can hope that GraphX, for example, will be improved in the future and extended beyond Scala to Java, R, and Python. In the next chapter, we will focus on how to interact with external data sources to make the Spark working environment more diverse.
