Further reading

My go-to recommendation for an introduction to feature selection is Ando Saabas's four-part exploration of a broad range of feature selection techniques, which is full of Python code snippets and informed commentary. Start at http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/.

For a discussion of feature selection and engineering that spans the material in chapters 6 and 7, see Alexandre Bouchard-Côté's slides at http://people.eecs.berkeley.edu/~jordan/courses/294-fall09/lectures/feature/slides.pdf. Jeff Howbert's slides at http://courses.washington.edu/css490/2012.Winter/lecture_slides/05a_feature_creation_selection.pdf are also worth reviewing.

Thorough, general discussions of feature creation are scarce: much of the available material covers either dimensionality reduction techniques or feature creation tailored to a specific domain. One way to get a broader sense of the range of possible transformations is to read code documentation. A good place to build on your existing knowledge is Spark ML's feature transformer documentation at https://spark.apache.org/docs/1.5.1/ml-features.html#feature-transformers, which describes a broad set of transformations for numerical and text features. Remember, though, that feature creation is often problem-specific, domain-specific, and highly creative. Once you have learned a range of technical options, the trick lies in figuring out how to apply them to the problem at hand!
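To give a feel for what such transformers do, here is a minimal pure-Python sketch of three simple transformations: thresholding a numeric feature, bucketing it into ranges, and turning text into word n-grams. It is loosely modelled on transformers such as Binarizer, Bucketizer and NGram described in that documentation, but it is not Spark's API; the function names `binarize`, `bucketize` and `ngrams` are hypothetical, and the exact boundary conventions here are assumptions of this sketch.

```python
from bisect import bisect_right
from typing import List, Sequence


def binarize(values: Sequence[float], threshold: float) -> List[float]:
    """Map each value to 1.0 if it is strictly above threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


def bucketize(values: Sequence[float], splits: Sequence[float]) -> List[int]:
    """Return, for each value, the index i of the half-open bucket [splits[i], splits[i+1]).

    Assumes splits is sorted ascending; values outside [splits[0], splits[-1])
    raise ValueError. Use -inf/inf as the outer splits to cover every finite value.
    """
    buckets = []
    for v in values:
        if v < splits[0] or v >= splits[-1]:
            raise ValueError(f"{v} is outside [{splits[0]}, {splits[-1]})")
        # bisect_right finds the first split strictly greater than v.
        buckets.append(bisect_right(splits, v) - 1)
    return buckets


def ngrams(text: str, n: int) -> List[str]:
    """Lower-case, split on whitespace, and join each run of n consecutive tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    print(binarize([0.1, 0.5, 0.9], threshold=0.5))  # [0.0, 0.0, 1.0]
    splits = [float("-inf"), 0.0, 10.0, float("inf")]
    print(bucketize([-3.0, 0.0, 5.0, 12.0], splits))  # [0, 1, 1, 2]
    print(ngrams("Feature creation is creative", 2))
    # ['feature creation', 'creation is', 'is creative']
```

Each function turns raw inputs into new candidate features; deciding which of them actually helps a model remains the problem-specific part.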

For readers interested in hyperparameter optimization, Alice Zheng's posts on Turi's blog are a great place to start: http://blog.turi.com/how-to-evaluate-machine-learning-models-part-4-hyperparameter-tuning.

I also find the scikit-learn documentation to be a useful reference for grid search specifically: http://scikit-learn.org/stable/modules/grid_search.html.
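In its simplest form, grid search scores every combination in the Cartesian product of candidate hyperparameter values and keeps the best one. The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation; `grid_search` and `toy_validation_score` are hypothetical names, and the toy score is an assumed stand-in for what would, in practice, be a cross-validated metric computed by training a model.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

ParamGrid = Dict[str, List[Any]]


def grid_search(param_grid: ParamGrid,
                score: Callable[..., float]) -> Tuple[Dict[str, Any], float]:
    """Evaluate score(**params) for every combination in param_grid; return the best.

    Higher scores are better. On ties, the first combination encountered wins.
    """
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # product() yields every combination of the candidate value lists.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


def toy_validation_score(learning_rate: float, max_depth: int) -> float:
    """Hypothetical stand-in for a validation score; it peaks at (0.1, 4)."""
    return 0.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 4) ** 2


if __name__ == "__main__":
    grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [2, 4, 8]}
    best, _ = grid_search(grid, toy_validation_score)
    print(best)  # {'learning_rate': 0.1, 'max_depth': 4}
```

The number of evaluations is the product of the list lengths (3 × 3 = 9 here), which is why exhaustive grids become expensive as parameters are added. scikit-learn's `GridSearchCV` accepts the same dictionary-of-lists layout as its `param_grid` argument and handles the cross-validation that the toy score only pretends to do.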
