Application of machine learning – Kaggle Titanic competition

To illustrate how pandas can help at the start of a machine learning project, we will apply it to a classic problem hosted on the Kaggle website (http://www.kaggle.com). Kaggle is a platform for machine learning competitions. Companies that want to solve predictive analytics problems with their own data post that data on Kaggle and invite data scientists to propose solutions. A competition runs over a period of time, and the competitors' rankings are posted on a leaderboard. In competitions that offer prizes, the top-ranked competitors receive cash prizes when the competition closes.

The problem we will study to illustrate the use of pandas for machine learning with scikit-learn is Titanic: Machine Learning from Disaster, which Kaggle hosts as its classic introductory machine learning problem. Its dataset is supplied raw, with missing values (for example, in the Age and Cabin columns) and text-valued columns such as Sex. This makes pandas very useful for preprocessing and cleaning the data before it is passed as input to a machine learning algorithm implemented in scikit-learn.
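The kind of cleaning described above can be sketched with a few pandas calls. This is a minimal sketch, not the book's method: it assumes the column names of Kaggle's Titanic train.csv (PassengerId, Survived, Pclass, Sex, Age, Fare, Cabin, Embarked), uses a handful of made-up rows instead of the real file, and the helper name preprocess and the cleaning choices (dropping Cabin, median imputation for Age, one-hot encoding Embarked) are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rows laid out like a few columns of Kaggle's Titanic train.csv.
# The values are invented for illustration; they are not real passenger records.
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, None, 26.0, 35.0, None],
    "Fare": [7.25, 71.28, 7.93, 13.00, 53.10],
    "Cabin": [None, "C85", None, None, "C123"],
    "Embarked": ["S", "C", "S", None, "S"],
})


def preprocess(df):
    """Hypothetical helper: turn raw Titanic-style data into numeric features X and target y."""
    df = df.copy()
    # Drop an identifier column and a mostly missing text column (a simplifying choice).
    df = df.drop(columns=["PassengerId", "Cabin"])
    # Fill missing ages with the median age, and missing ports with the most common port.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    # Encode the text columns as numbers: Sex as 0/1, Embarked as one-hot indicator columns.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked", dtype=int)
    # Separate the target column from the features.
    y = df.pop("Survived")
    return df, y


X, y = preprocess(raw)
print(X)
print(y.tolist())
```

After this step every column of X is numeric and has no missing values, so X and y could be handed to a scikit-learn estimator's fit method (for example, a DecisionTreeClassifier). On the real dataset you would load train.csv with pd.read_csv and would likely make different cleaning choices.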
