Exercises

In the one-hot encoding solution, can you use different classifiers supported in PySpark instead of logistic regression, such as decision tree, random forest, and linear SVM?
In the feature hashing solution, can you try other hash sizes, such as 5,000 and 20,000? What do you observe?
In the feature interaction solution, can you try other interactions, such as C1 and C20?
Can you first use feature interaction and then feature hashing in order to lower the expanded dimension? Are you able to obtain a higher AUC?
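
As a starting point for the last exercise, the idea of interacting two features and then hashing the result into a fixed number of buckets can be sketched in plain Python. This is only a minimal illustration, not the PySpark pipeline itself: the function names, the md5-based hash (a stand-in for whatever hash function the library applies), the sample row values, and the `site_id` column are all assumptions made for the example.

```python
import hashlib


def hash_index(token: str, num_buckets: int) -> int:
    """Map a string token to a bucket index in [0, num_buckets)."""
    # md5 is used only because it is deterministic across runs;
    # it is an assumption, not the hash used by any particular library.
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def interact_then_hash(row: dict, pairs: list, num_buckets: int) -> dict:
    """Return a sparse vector {bucket_index: count} for one row."""
    # One token per original categorical feature, e.g. "C1=1005".
    tokens = [f"{col}={val}" for col, val in row.items()]
    # One extra token per interaction pair, e.g. "C1_x_C20=1005_100084".
    for a, b in pairs:
        tokens.append(f"{a}_x_{b}={row[a]}_{row[b]}")
    # Hash every token into the fixed-size space; collisions add up.
    vector = {}
    for token in tokens:
        idx = hash_index(token, num_buckets)
        vector[idx] = vector.get(idx, 0) + 1
    return vector


# Hypothetical row; the values are made up for illustration.
row = {"C1": "1005", "C20": "100084", "site_id": "1fbe01fe"}
vec = interact_then_hash(row, [("C1", "C20")], num_buckets=10000)
```

However many distinct value combinations the interaction produces, the output dimension stays at `num_buckets`, which is why hashing after interaction keeps the expanded dimension under control. Smaller bucket counts cause more collisions, so the hash size is worth varying when you compare AUC.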
