However, all is not lost with feature selection, and I want to take some space to show you a quick way to begin exploring this matter. It will require some trial and error on your part. Again, the caret
package helps out here, as it will run a cross-validation on a linear SVM based on the kernlab
package.
To do this, we will need to set the random seed, specify the cross-validation method in caret's rfeControl()
function, perform a recursive feature selection with the rfe()
function, and then test how the model performs on the test
set. In rfeControl()
, you will need to specify the function based on the model being used. There are several different functions that you can use. Here we will need lrFuncs
. To see a list of the available functions, your best bet is to explore the documentation with ?rfeControl
and ?caretFuncs
. The code for this example is as follows:
> set.seed(123)
> rfeCNTL = rfeControl(functions = lrFuncs, method = "cv", number = 10)
> svm.features = rfe(train[, 1:7], train[, 8], sizes = c(7, 6, 5, 4),
    rfeControl = rfeCNTL, method = "svmLinear")
To create the svm.features
object, it was important to specify the inputs, the response factor, the number of input features via sizes
, and the linear method from kernlab
, selected with the svmLinear
syntax. Other options are available with this method, such as svmPoly
. No method for a sigmoid kernel is available. Calling the object allows us to see how the various feature sizes perform, as follows:
> svm.features
Recursive feature selection

Outer resampling method: Cross-Validated (10 fold)

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
         4   0.7797 0.4700    0.04969  0.1203
         5   0.7875 0.4865    0.04267  0.1096        *
         6   0.7847 0.4820    0.04760  0.1141
         7   0.7822 0.4768    0.05065  0.1232

The top 5 variables (out of 5):
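To read the resampling table, note that caret stars the subset size with the best cross-validated accuracy. A quick sanity check of those numbers, copied from the output above, confirms the five-feature subset wins:

```r
# Cross-validated accuracy by subset size, taken from the rfe output above
acc <- c(`4` = 0.7797, `5` = 0.7875, `6` = 0.7847, `7` = 0.7822)
names(which.max(acc))  # "5": the starred five-feature subset
```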
Counter-intuitive as it may seem, the five variables perform quite well on their own, slightly better than when skin
and bp
are included. Let's try this out on the test
set, remembering that the accuracy in the full model was 76.2 percent:
> svm.5 <- svm(type ~ glu + ped + npreg + bmi + age, data = train,
    kernel = "linear")
> svm.5.predict <- predict(svm.5, newdata = test[c(1, 2, 5, 6, 7)])
> table(svm.5.predict, test$type)

svm.5.predict No Yes
          No  79  21
          Yes 14  33
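As a quick check, the reduced model's accuracy can be computed directly from the confusion matrix printed above (predictions in rows, truth in columns):

```r
# Rebuild the confusion matrix from the table above and compute accuracy:
# correct predictions sit on the diagonal
conf <- matrix(c(79, 14, 21, 33), nrow = 2,
               dimnames = list(svm.5.predict = c("No", "Yes"),
                               type = c("No", "Yes")))
accuracy <- sum(diag(conf)) / sum(conf)
round(accuracy * 100, 1)  # 76.2
```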
This performed no better than the full model, so we can stick with the full model. You can see through trial and error how this technique can be used to get a simple sense of feature importance. If you want to explore other techniques and methods that you can apply here, and for blackbox techniques in particular, I recommend that you start by reading the work by Guyon and Elisseeff (2003) on this subject.