Best practice 1 – completely understanding the project goal

Before we start collecting data, we should make sure that the goal of the project and the business problem are fully understood, as this guides us on which data sources to look into. Sufficient domain knowledge and expertise are also required at this stage. For example, in the previous chapter, Chapter 9, Stock Price Prediction with Regression Algorithms, our goal was to predict future prices of the DJIA index, so we first collected data on its past performance rather than the past performance of an irrelevant European stock. In Chapter 6, Predicting Online Ads Click-through with Tree-Based Algorithms, and Chapter 7, Predicting Online Ads Click-through with Logistic Regression, the business problem was to optimize advertising targeting efficiency, measured by click-through rate, so we collected clickstream data recording who clicked or did not click on which ad on which page, rather than merely how many ads were displayed in a web domain.
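To make the click-through example concrete, the following is a minimal sketch of why click-level records serve that goal while a bare count of displayed ads does not. The records, the field names (user, ad, page, clicked), and the helper click_through_rate are all hypothetical and made up for illustration; they are not the dataset used in Chapter 6 or Chapter 7.

```python
# Hypothetical click-level log: one record per ad impression.
# All values and field names are invented for illustration.
impressions = [
    {"user": "u1", "ad": "ad_a", "page": "home", "clicked": 1},
    {"user": "u2", "ad": "ad_a", "page": "home", "clicked": 0},
    {"user": "u3", "ad": "ad_a", "page": "news", "clicked": 0},
    {"user": "u1", "ad": "ad_b", "page": "news", "clicked": 1},
    {"user": "u2", "ad": "ad_b", "page": "home", "clicked": 0},
]


def click_through_rate(records, key):
    """Return clicks / impressions for each distinct value of `key`."""
    shown = {}
    clicks = {}
    for record in records:
        value = record[key]
        shown[value] = shown.get(value, 0) + 1
        clicks[value] = clicks.get(value, 0) + record["clicked"]
    return {value: clicks[value] / shown[value] for value in shown}


# With click-level data, the target metric can be computed per ad or per page.
ctr_by_ad = click_through_rate(impressions, "ad")
ctr_by_page = click_through_rate(impressions, "page")

# An aggregate display count alone carries no click information,
# so the click-through rate cannot be derived from it.
total_displayed = len(impressions)

print(ctr_by_ad)
print(ctr_by_page)
print(total_displayed)
```

The point of the sketch is only that the choice of data follows from the metric: once the goal is stated as click-through rate, each record needs to say whether a click happened and in what context.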
