In the previous chapter on linear regression, we used the glmnet package to perform regularization with ridge regression and the lasso. Since we've seen that it might be a good idea to remove some of our features, we'll try applying the lasso to our dataset and assess the results. First, we'll train a series of regularized models with glmnet(), and then we'll use cv.glmnet() to estimate a suitable value for λ. Finally, we'll examine the coefficients of our regularized model using this λ:
> library(glmnet)
> heart_train_mat <- model.matrix(OUTPUT ~ ., heart_train)[,-1]
> lambdas <- 10 ^ seq(8, -4, length = 250)
> heart_models_lasso <- glmnet(heart_train_mat, heart_train$OUTPUT,
    alpha = 1, lambda = lambdas, family = "binomial")
> lasso.cv <- cv.glmnet(heart_train_mat, heart_train$OUTPUT,
    alpha = 1, lambda = lambdas, family = "binomial")
> lambda_lasso <- lasso.cv$lambda.min
> lambda_lasso
[1] 0.01057052
> predict(heart_models_lasso, type = "coefficients", s = lambda_lasso)
19 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) -4.980249537
AGE          .
SEX          1.029146139
CHESTPAIN2   0.122044733
CHESTPAIN3   .
CHESTPAIN4   1.521164330
RESTBP       0.013456000
CHOL         0.004190012
SUGAR       -0.587616822
ECG1         .
ECG2         0.338365613
MAXHR       -0.010651758
ANGINA       0.807497991
DEP          0.211899820
EXERCISE2    0.351797531
EXERCISE3    0.081846313
FLUOR        0.947928099
THAL6        0.083440880
THAL7        1.501844677
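Rather than reading the dropped features off the printed matrix, we could extract them programmatically. The following is a minimal sketch, not part of glmnet: dropped_features() is a hypothetical helper name, and the toy coefficient matrix simply copies a few rows from the output above so that the example runs on its own.

```r
# Hypothetical helper (not part of glmnet): return the names of the
# features whose coefficient is exactly zero, ignoring the intercept.
dropped_features <- function(coefs) {
  coefs <- as.matrix(coefs)      # also converts a sparse dgCMatrix to dense
  zero <- coefs[, 1] == 0        # lasso sets dropped coefficients to exactly 0
  setdiff(rownames(coefs)[zero], "(Intercept)")
}

# Toy one-column coefficient matrix, copied from a few rows of the output above.
toy_coefs <- matrix(
  c(-4.980249537, 0, 1.029146139, 0, 1.521164330),
  ncol = 1,
  dimnames = list(
    c("(Intercept)", "AGE", "SEX", "CHESTPAIN3", "CHESTPAIN4"),
    "1"
  )
)

dropped <- dropped_features(toy_coefs)
print(dropped)
```

Assuming the objects from the session above are available, the same helper should work on the real model by passing it the result of predict(heart_models_lasso, type = "coefficients", s = lambda_lasso).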
We see that a number of our features have effectively been removed from the model because their coefficients are zero; in the sparse matrix output, these appear as a dot (here AGE, CHESTPAIN3, and ECG1). If we now use this model to measure the classification accuracy on our training and test sets, we observe that in both cases we get slightly better performance. Even though the difference is small, remember that we have achieved this using three fewer features:
> lasso_train_predictions <- predict(heart_models_lasso,
    s = lambda_lasso, newx = heart_train_mat, type = "response")
> lasso_train_class_predictions <-
    as.numeric(lasso_train_predictions > 0.5)
> mean(lasso_train_class_predictions == heart_train$OUTPUT)
[1] 0.8913043
> heart_test_mat <- model.matrix(OUTPUT ~ ., heart_test)[,-1]
> lasso_test_predictions <- predict(heart_models_lasso,
    s = lambda_lasso, newx = heart_test_mat, type = "response")
> lasso_test_class_predictions <-
    as.numeric(lasso_test_predictions > 0.5)
> mean(lasso_test_class_predictions == heart_test$OUTPUT)
[1] 0.925
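The training and test computations above repeat the same two steps: threshold the predicted probabilities at 0.5, then compare the resulting classes with the observed labels. As a minimal sketch, these steps could be wrapped in a small function. The name classification_accuracy() and the toy vectors are assumptions made for illustration; they are not part of glmnet or of the heart dataset.

```r
# Hypothetical helper: convert predicted probabilities to 0/1 classes
# using a threshold, then return the fraction matching the true labels.
classification_accuracy <- function(probs, labels, threshold = 0.5) {
  stopifnot(length(probs) == length(labels))
  predicted <- as.numeric(probs > threshold)  # also flattens an n x 1 matrix
  mean(predicted == labels)
}

# Toy data (made up for this example): four predictions, one of them wrong.
toy_probs  <- c(0.91, 0.20, 0.65, 0.40)
toy_labels <- c(1, 0, 0, 0)

acc <- classification_accuracy(toy_probs, toy_labels)
print(acc)  # 0.75: three of the four thresholded predictions match
```

With the objects from the session above, classification_accuracy(lasso_test_predictions, heart_test$OUTPUT) should reproduce the test accuracy, assuming OUTPUT is stored as numeric 0/1 values as the comparisons above imply.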