The Titanic: Machine Learning from Disaster problem

The Titanic dataset consists of the passenger manifest for the doomed voyage, along with various features and an indicator variable recording whether each passenger survived the sinking of the ship. The essence of the problem is to predict, given a passenger and that passenger's associated features, whether the passenger survived the sinking of the Titanic.

The data is split into two datasets: a training dataset of 891 passenger cases and a test dataset of 418 passenger cases.

The training dataset has 12 variables: 11 features and 1 dependent (indicator) variable, Survived, which indicates whether the passenger survived the disaster.

The feature variables are as follows:

PassengerId
Cabin
Sex
Pclass (passenger class)
Fare
Parch (number of parents and children aboard)
Age
SibSp (number of siblings and spouses aboard)
Embarked (port of embarkation)
Name
Ticket

We can use pandas to preprocess the data in the following ways:

Cleaning the data and categorizing some variables
Excluding features that are unlikely to have any bearing on a passenger's survival; for example, Name
Handling missing data
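
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not a definitive pipeline: the sample rows are invented rather than taken from the real manifest, the helper name preprocess is hypothetical, and the specific choices (median imputation for Age, most frequent value for Embarked, 0/1 encoding for Sex) are assumptions made for the example.

```python
import pandas as pd

# A tiny hand-made sample shaped like the Titanic training data.
# These rows are invented for illustration; they are not real records.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Embarked": ["S", "C", None, "S"],
})


def preprocess(df):
    """Return a cleaned copy of df; the input frame is left unchanged."""
    out = df.copy()
    # Exclude identifier-like columns (assumed uninformative in this sketch).
    out = out.drop(columns=["PassengerId", "Name"], errors="ignore")
    # Handle missing data: median for Age, most frequent port for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Categorize: encode Sex as 0/1 and one-hot encode Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"])
    return out


cleaned = preprocess(sample)
```

Returning a copy keeps the raw data available for comparison, which is convenient when experimenting with different imputation or encoding choices.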

Various algorithms can be used to tackle this problem. They are as follows:

Decision trees
Neural networks
Random forests
Support vector machines
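
As a rough sketch of how these four families of algorithms might be compared, the following uses scikit-learn, which is an assumption here since the text above does not name a library. The data is a synthetic stand-in generated by make_classification rather than the Titanic data itself, and all hyperparameters are left at or near their defaults purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a preprocessed, fully numeric feature matrix (X)
# and a binary survived/did-not-survive target (y).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One estimator per algorithm family listed above.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(),
}

# Fit each model on the training split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
```

Because every estimator exposes the same fit and score methods, swapping in the real preprocessed Titanic features would only require replacing X and y.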
