Summary

In this chapter, we started with a raw dataset, explored the data, and took the preprocessing steps needed to get it ready for modeling. We performed data type transformations to convert numbers and dates stored as character strings into numeric and date columns, respectively. We also did some feature engineering by breaking the date value into its component parts. After preprocessing, we modeled our data: we created a baseline model and then tuned hyperparameters to improve on our initial score, using early stopping rounds and grid searches to identify the hyperparameter values that produced the best results. After modifying our model based on the results of these tuning procedures, we saw much better performance.
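As a reminder of the preprocessing steps summarized above, the following is a minimal sketch using only the Python standard library. It is not the chapter's own code: the field names (`amount`, `date`), the sample values, and the `preprocess` helper are hypothetical, chosen only to show string-to-numeric conversion, string-to-date conversion, and splitting a date into its component parts.

```python
from datetime import datetime

# Hypothetical raw records: every value arrives as a character string.
raw_rows = [
    {"amount": "1,250.50", "date": "2019-03-14"},
    {"amount": "980", "date": "2019-11-02"},
]


def preprocess(row):
    """Convert string fields to typed values and split the date into parts."""
    # Strip thousands separators before converting to a number.
    amount = float(row["amount"].replace(",", ""))
    # Parse the ISO-style date string (assumed format: YYYY-MM-DD).
    date = datetime.strptime(row["date"], "%Y-%m-%d").date()
    return {
        "amount": amount,
        "year": date.year,
        "month": date.month,
        "day": date.day,
        "weekday": date.weekday(),  # Monday is 0, Sunday is 6
    }


features = [preprocess(row) for row in raw_rows]
```

Each resulting row holds only numeric values, which is the form a tree-based model or a neural network expects as input.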

All of the aspects of machine learning discussed in this chapter will be used in subsequent chapters too. We will need to get our data ready for modeling, and we will need to know how to improve model performance by adjusting a model's settings. In addition, we focused on a decision tree ensemble model in xgboost because our work with neural networks in upcoming chapters will be similar: we will need to consider efficiency and performance, just as we did with xgboost when we adjusted how trees are grown.

This review of machine learning provides the foundation for stepping into deep learning. We begin in the next chapter by installing and exploring the packages used.
