Part 2. Modeling methods

In part 1, we discussed the initial stages of a data science project. After you’ve more precisely defined the questions you want to answer and the scope of the problem you want to solve, it’s time to analyze the data and find the answers. In part 2, we work with powerful modeling methods from statistics and machine learning.

Chapter 6 covers how to identify appropriate modeling methods to address your specific business problem. It also discusses how to evaluate the quality and effectiveness of models that you or others have discovered.

Chapter 7 covers basic linear models: linear regression, logistic regression, and regularized linear models. Linear models are the workhorses of many analytical tasks and are especially helpful for identifying key variables and gaining insight into the structure of a problem. A solid understanding of them is immensely valuable for a data scientist.

Chapter 8 temporarily moves away from the modeling task to cover advanced data preparation with the vtreat package. vtreat prepares messy real-world data for the modeling step. Because understanding how vtreat works requires some understanding of linear models and of model evaluation metrics, it seemed best to defer this topic until part 2.

Chapter 9 covers unsupervised methods: clustering and association rule mining. Unsupervised methods don’t make explicit outcome predictions; instead, they discover relationships and hidden structure in the data.

Chapter 10 touches on some more advanced modeling algorithms. We discuss bagged decision trees, random forests, gradient boosted trees, generalized additive models, and support vector machines.

We illustrate every method we cover by working through a specific data science problem with a nontrivial dataset. Where appropriate, we also discuss additional model evaluation and interpretation procedures that are specific to the methods we cover.

On completing part 2, you’ll be familiar with the most popular modeling methods, and you’ll have a sense of which methods are most appropriate for answering different types of questions.
