Multivariate regression

It is possible to optimize multiple metrics at the same time. While Spark has only a few multivariate analysis tools, more traditional, well-established packages come with Multivariate Analysis of Variance (MANOVA), a generalization of the Analysis of Variance (ANOVA) method. I will cover ANOVA and MANOVA in Chapter 7, Working with Graph Algorithms.

For a practical analysis, we first need to determine whether the target variables are correlated, for which we can use the Spark PCA implementation covered in Chapter 3, Working with Spark and MLlib. If the dependent variables are strongly positively correlated, maximizing one leads to maximizing the other, so we can just maximize the first principal component (and potentially build a regression model on the second component to understand what drives the difference between them).
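To make the correlation check concrete, here is a minimal sketch for the two-target case. It is written in plain Python with no dependencies rather than with the Spark PCA implementation, and the target names and values (revenue, engagement) are made up for illustration. It relies on the fact that for two standardized variables with correlation r, the principal directions are the scaled sum and the scaled difference, with variances 1 + r and 1 - r.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardize(xs):
    """Rescale to zero mean and unit (population) variance."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def two_target_pca(y1, y2):
    """PCA of two standardized targets, using the closed form for a 2x2
    correlation matrix [[1, r], [r, 1]]."""
    r = pearson(y1, y2)
    z1, z2 = standardize(y1), standardize(y2)
    s = 1 / math.sqrt(2)
    sign = 1.0 if r >= 0 else -1.0
    # First component: the direction along which the two targets move together.
    first = [s * (a + sign * b) for a, b in zip(z1, z2)]
    # Second component: what distinguishes one target from the other.
    second = [s * (a - sign * b) for a, b in zip(z1, z2)]
    explained = (1 + abs(r)) / 2  # share of total variance in the first component
    return r, explained, first, second

# Hypothetical observations of two target variables (illustrative only).
revenue = [10.0, 12.0, 15.0, 19.0, 24.0]
engagement = [1.0, 1.3, 1.4, 2.0, 2.3]

r, explained, first, second = two_target_pca(revenue, engagement)
print(f"correlation={r:.3f}, variance explained by first component={explained:.3f}")
```

When the explained share is close to 1, the first component is a reasonable single target to optimize, and the second component can serve as the response of a follow-up regression on what drives the difference.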

If the targets are uncorrelated, building a separate model for each of them can pinpoint the important variables that drive each target and show whether the two sets of variables are disjoint. If they are, we can build two separate models and predict each target independently.
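The per-target approach can be sketched as follows, again in plain Python rather than Spark. The feature names, the data, and the 0.1 threshold are assumptions chosen for illustration, and using coefficient magnitude as a measure of importance is a crude stand-in that only makes sense when the features are on comparable scales. A real analysis would more likely rely on significance tests or regularization.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def drivers(coefs, names, threshold=0.1):
    """Names of features whose coefficient magnitude exceeds the (assumed) threshold."""
    return {name for name, c in zip(names, coefs[1:]) if abs(c) > threshold}

# Hypothetical features and targets (illustrative only).
names = ["x1", "x2", "x3"]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [5.0, 3.0, 6.0, 1.0, 2.0, 4.0]
rows = list(zip(x1, x2, x3))
target_a = [2.0 * a + 0.5 * b + 1.0 for a, b in zip(x1, x2)]  # depends on x1, x2
target_b = [-3.0 * c + 4.0 for c in x3]                       # depends on x3 only

coef_a = fit_ols(rows, target_a)
coef_b = fit_ols(rows, target_b)
drivers_a = drivers(coef_a, names)
drivers_b = drivers(coef_b, names)
print("drivers of A:", sorted(drivers_a))
print("drivers of B:", sorted(drivers_b))
print("disjoint:", drivers_a.isdisjoint(drivers_b))
```

If the two driver sets come out disjoint, as they do by construction in this toy data, the two models can be trained and used independently.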
