Summary

In this chapter, we covered the most important tools that machine learning practitioners use to make sense of their data and to get the most out of it when training a learning algorithm.

Feature engineering was the first tool we covered, and it is the most commonly used one; it's a must-have component of any data science pipeline. Its purpose is to build better representations of your data and thereby increase the predictive power of your model.
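As a minimal sketch of the idea, the hypothetical snippet below derives new columns from raw ones with pandas; the dataset and column names (price, area, rooms) are assumptions made for illustration, not data from this chapter:

```python
import pandas as pd

# Hypothetical raw data; the columns are invented for illustration only.
houses = pd.DataFrame({
    "price": [250_000, 180_000, 320_000],
    "area":  [120.0, 85.0, 150.0],   # square meters
    "rooms": [4, 3, 5],
})

# Feature engineering: derive representations that expose structure the raw
# columns hide, such as price per square meter and area per room.
houses["price_per_m2"] = houses["price"] / houses["area"]
houses["area_per_room"] = houses["area"] / houses["rooms"]

print(houses.head())
```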

We saw how a large number of features can be problematic and lead to worse classifier performance. We also saw that there is an optimal number of features that yields maximum model performance, and that this optimal number depends on the number of data samples/observations you have.
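One common way to probe for that optimum, sketched below, is to score a classifier over a range of feature-subset sizes and keep the size that cross-validates best; the synthetic dataset and the use of SelectKBest are illustrative choices, not the chapter's own experiment:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: only 5 of the 50 features are actually informative.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# Score the classifier for several feature-subset sizes and keep the best.
scores = {}
for k in (2, 5, 10, 20, 50):
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          LogisticRegression(max_iter=1000))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores)
print("best number of features:", best_k)
```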

Subsequently, we introduced one of the most powerful tools, bias-variance decomposition. It breaks down the model's error on the test set and is widely used to diagnose whether poor generalization comes from high bias (underfitting) or high variance (overfitting).
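For squared-error loss, the decomposition takes the standard form below, where f is the true function, f-hat the fitted model, and sigma squared the noise variance; the notation here is ours for reference, not the chapter's:

```latex
\[
\mathbb{E}\!\left[(y-\hat{f}(x))^{2}\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)]-f(x)\right)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x)-\mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
\]
```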

Finally, we went through learning visibility, which answers the question of how much data we need before we can get down to business and do machine learning. The rule of thumb states that we need at least 10 times as many data samples/observations as features; for example, a dataset described by 20 features calls for roughly 200 observations. However, this rule of thumb can be relaxed by using another tool called regularization, which will be addressed in more detail in the next chapter.
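As a rough preview of how regularization relaxes that requirement, the sketch below fits an L2-penalized (ridge) regressor on far fewer samples than ten times the number of features; the synthetic data and the alpha value are illustrative assumptions, not the next chapter's setup:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data that violates the 10x rule of thumb:
# 100 features but only 60 observations.
X, y = make_regression(n_samples=60, n_features=100,
                       n_informative=10, noise=5.0, random_state=0)

# The L2 penalty shrinks the coefficients, which keeps the model usable
# even though there are fewer samples than features.
ridge = Ridge(alpha=10.0)
print("ridge R^2:", cross_val_score(ridge, X, y, cv=5).mean())
```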

Next up, we are going to continue expanding the set of data science tools we can use to derive meaningful analytics from our data, and face some of the everyday problems of applying machine learning.
