Support vector machine

A Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and regression. It is a discriminative classifier: it draws a boundary between clusters or classes of data, so that new points can be classified according to the side of the boundary on which they fall.

An SVM does not just find a boundary line; it also determines a margin on either side of the boundary, that is, the distance from the boundary to the nearest training points of each class. The SVM algorithm chooses the boundary with the largest possible margin around it.
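
To make the margin idea concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel on a hypothetical six-point dataset (illustrative only, not Titanic data). For a linear SVM the boundary is w·x + b = 0 and the two margin lines are w·x + b = ±1, so the margin width is 2/||w||. The very large C is an assumption chosen to approximate a hard margin, where no training point may fall inside the margin.

```python
import numpy as np
from sklearn import svm

# Hypothetical, linearly separable toy data (not Titanic data)
X = np.array([[-1.0, 0.0], [-2.0, 1.0], [-2.0, -1.0],
              [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for slack, approximating a hard margin
model = svm.SVC(kernel="linear", C=1e6)
model.fit(X, y)

w = model.coef_[0]        # normal vector of the boundary w.x + b = 0
b = model.intercept_[0]   # offset of the boundary
margin = 2.0 / np.linalg.norm(w)  # distance between the lines w.x + b = +1 and -1

print("w =", w, "b =", b)
print("margin width =", margin)
print(model.predict([[-3.0, 0.0], [3.0, 0.0]]))
```

For this data the boundary should come out close to the vertical line through the origin, with a margin width close to 2, because the nearest points of each class sit at distance 1 on either side.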

Support vectors are the training points that lie closest to the boundary and therefore define the largest margin around it: remove these points, and a larger margin may be found. Hence the name: they support the margin around the boundary line. The support vectors alone determine the boundary; removing any of the other training points leaves it unchanged. This is illustrated in the diagram at http://winfwiki.wi-fom.de/images/c/cf/Support_vector_2.png.
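
The claim that only the support vectors determine the boundary can be checked directly. The following is a minimal sketch on the same hypothetical toy data: scikit-learn exposes the fitted support vectors as `support_vectors_` and their row indices as `support_`, and refitting on those rows alone should give the same boundary, up to solver tolerance.

```python
import numpy as np
from sklearn import svm

# Hypothetical, linearly separable toy data (not Titanic data)
X = np.array([[-1.0, 0.0], [-2.0, 1.0], [-2.0, -1.0],
              [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Large C approximates a hard margin
model = svm.SVC(kernel="linear", C=1e6).fit(X, y)
print("support vectors:\n", model.support_vectors_)
print("their row indices:", model.support_)

# Refit using only the support vectors; the other points should not matter
X_sv, y_sv = X[model.support_], y[model.support_]
sv_model = svm.SVC(kernel="linear", C=1e6).fit(X_sv, y_sv)

print("all points:      w =", model.coef_[0], "b =", model.intercept_[0])
print("support vectors: w =", sv_model.coef_[0], "b =", sv_model.intercept_[0])
```

Here the support vectors should include the two points nearest the boundary, (-1, 0) and (1, 0), and both fits should report essentially the same w and b.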

To use the SVM algorithm for classification, we specify a kernel. Here, we consider three of the kernels that scikit-learn offers: linear, poly (polynomial), and rbf (radial basis function).
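
For reference, each kernel computes a similarity between two feature vectors x and z. Following the formulas in scikit-learn's SVC documentation, linear is x·z, poly is (gamma·x·z + coef0)^degree, and rbf is exp(-gamma·||x - z||^2). The sketch below writes them out in NumPy; the function names are illustrative rather than scikit-learn's own, and note that in SVC these parameters default to degree=3, coef0=0.0, and gamma='scale'.

```python
import numpy as np

def linear_kernel(x, z):
    # Plain dot product
    return float(np.dot(x, z))

def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <x, z> + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * squared Euclidean distance between x and z)
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
print(linear_kernel(x, z))                                # 1*2 + 2*0 = 2.0
print(poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=2))  # (2 + 1)**2 = 9.0
print(rbf_kernel(x, z, gamma=0.5))                        # exp(-0.5 * 5) = exp(-2.5)
```

The rbf kernel equals 1 for identical vectors and decays toward 0 as they move apart, which is what lets it draw curved boundaries.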

Next, we import scikit-learn's svm module, which provides the Support Vector Classifier (SVC):

    from sklearn import svm
  

We then instantiate an SVM classifier, fit it to the training data, and build the test design matrix for prediction:

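    import patsy as pt  # assumption: pt refers to the patsy library

    # kernel is one of 'linear', 'poly', or 'rbf'
    model = svm.SVC(kernel=kernel)
    svm_model = model.fit(X_train, y_train)
    # Design matrix for the test data, built with patsy
    X_test = pt.dmatrix(formula, test_df_filled)
    . . .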
  

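The snippet above depends on variables defined elsewhere in run_svm_titanic.py. A self-contained sketch of the same fit-and-predict flow follows, under stated assumptions: pt is patsy, and the tiny train_df and test_df are hypothetical stand-ins for the Titanic data. Unlike the snippet, it builds the test matrix from the training matrix's `design_info` rather than from the formula; this is a patsy feature that guarantees identical columns and does not require a Survived column in the test data.

```python
import numpy as np
import pandas as pd
import patsy as pt
from sklearn import svm

# Hypothetical miniature train/test sets with Titanic-style columns
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "Pclass":   [3, 1, 2, 3, 1, 3, 2, 2],
    "Sex":      ["male", "female", "female", "male",
                 "female", "male", "female", "male"],
})
test_df = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex":    ["female", "male", "male"],
})

formula = "Survived ~ C(Pclass) + C(Sex)"
kernel = "poly"

# Outcome vector and design matrix for the training data
y_train, X_train = pt.dmatrices(formula, train_df, return_type="dataframe")

# Reuse the training design so the test matrix gets the same columns
X_test = pt.dmatrix(X_train.design_info, test_df, return_type="dataframe")

model = svm.SVC(kernel=kernel)
svm_model = model.fit(X_train, np.ravel(y_train))  # ravel: 1-D labels
y_pred = svm_model.predict(X_test).astype(int)     # labels come back as floats
print(y_pred)
```

Changing `formula` or `kernel` (for example, to 'rbf') is all it takes to try another combination.
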
Upon submitting our predictions to Kaggle, we obtained the following scores:

| Formula | Kernel type | Kaggle score |
|---|---|---|
| C(Pclass) + C(Sex) + Fare | poly | 0.71292 |
| C(Pclass) + C(Sex) | poly | 0.76555 |
| C(Sex) | poly | 0.76555 |
| C(Pclass) + C(Sex) + Age + SibSp + Parch | poly | 0.75598 |
| C(Pclass) + C(Sex) + Age + Parch + C(Embarked) | poly | 0.77512 |
| C(Pclass) + C(Sex) + Age + SibSp + Parch + C(Embarked) | poly | 0.79426 |
| C(Pclass) + C(Sex) + Age + SibSp + Parch + C(Embarked) | rbf | 0.7512 |
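
To be scored, predictions go to Kaggle's Titanic competition as a CSV file with a PassengerId column and a Survived column. A minimal sketch with pandas follows, using hypothetical passenger IDs and predictions; in practice, the IDs come from the test file and the predictions from `svm_model.predict`.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs and predictions, for illustration only
passenger_ids = np.array([892, 893, 894])
y_pred = np.array([0.0, 1.0, 0.0])  # SVC returns floats if trained on float labels

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": y_pred.astype(int),  # Kaggle expects 0/1 integers
})

# With no path, to_csv returns the CSV text; pass a filename to write a file
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a path such as "submission.csv" to `to_csv` writes the file that is uploaded to Kaggle.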

 

The code can be seen in its entirety in the following file: run_svm_titanic.py.

The table shows that the SVM with a poly (polynomial) kernel and the formula combining the Pclass, Sex, Age, SibSp, Parch, and Embarked features produces the best result when submitted to Kaggle (0.79426), while the rbf kernel with the same features scores 0.7512. Dropping C(Embarked) from that formula lowers the score to 0.75598, and dropping SibSp lowers it to 0.77512. Surprisingly, then, the embarkation point (Embarked) and whether the passenger traveled alone or with family members (SibSp and Parch) appear to have a material effect on a passenger's chances of survival.

The latter effect was probably due to the women-and-children-first policy on the Titanic.
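
In the Kaggle data, SibSp counts siblings and spouses aboard and Parch counts parents and children aboard, so their sum indicates whether a passenger traveled alone. Below is a minimal sketch of deriving that indicator with pandas; the rows are hypothetical, and the FamilySize and Alone columns are illustrative features, not ones used in the formulas above.

```python
import pandas as pd

# Hypothetical rows with Titanic-style columns
df = pd.DataFrame({
    "SibSp":    [0, 1, 0, 3],
    "Parch":    [0, 0, 2, 1],
    "Embarked": ["S", "C", "Q", "S"],
})

# Number of relatives aboard, and a 0/1 flag for traveling alone
df["FamilySize"] = df["SibSp"] + df["Parch"]
df["Alone"] = (df["FamilySize"] == 0).astype(int)
print(df)
```

Such a column could be added to a patsy formula like any other numeric feature.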
