In this chapter, we will explore Spark's capability to perform regression tasks with models such as linear regression and support-vector machines (SVMs). We will learn how to compute summary statistics with MLlib and how to discover correlations in datasets using the Pearson and Spearman methods. We will also test our hypotheses on large datasets.
We will cover the following topics:
- Computing summary statistics with MLlib
- Using the Pearson and Spearman methods to discover correlations
- Testing our hypotheses on large datasets
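
As a small preview of why the topics above name two correlation methods, the following is a minimal sketch in plain Python, not the MLlib API. The helper names `pearson`, `ranks`, and `spearman` and the sample data are assumptions made for illustration only. Pearson measures how well a straight line describes the relationship between two variables, while Spearman applies the same formula to the ranks of the values, so it captures any monotonic relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical sample data: y grows monotonically with x, but not linearly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

pearson_r = pearson(xs, ys)
spearman_r = spearman(xs, ys)
print(f"Pearson:  {pearson_r:.4f}")
print(f"Spearman: {spearman_r:.4f}")
```

On this data the Spearman coefficient is 1 because the ordering of `ys` matches the ordering of `xs` exactly, whereas the Pearson coefficient falls below 1 because the relationship is cubic rather than linear.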