Improving our models

Although in this chapter we have built a rudimentary model that matches the performance reported in academic research studies, there is certainly room for improvement. The following are some ideas for improving the model; we leave it to you to implement these suggestions, along with any other tricks or techniques you may know, and see how much performance improves. How high will your performance go?

First and foremost, the current training data has a large number of columns. Some sort of feature selection is almost always performed, particularly for logistic regression and random forest models. For logistic regression, common methods of performing feature selection include:

  • Keeping a certain number of predictors whose coefficients have the largest absolute values
  • Keeping a certain number of predictors with the smallest p-values
  • Using lasso regularization and removing predictors whose coefficients shrink to exactly zero
  • Using a greedy algorithm, such as forward- or backward-stepwise logistic regression, that adds or removes predictors systematically according to a rule
  • Using a brute-force algorithm, such as best subset logistic regression, that tests every combination of predictors for a given subset size
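As an illustration of the lasso approach above, the following sketch uses scikit-learn's `LogisticRegression` with an L1 penalty, which drives the coefficients of uninformative predictors to exactly zero. The synthetic dataset and the regularization strength `C=0.1` are assumptions for the example, not values from the chapter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the chapter's training set (an assumption):
# 20 predictors, of which only 5 carry signal.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# The L1 (lasso) penalty zeroes out coefficients of weak predictors;
# smaller C means stronger regularization and more zeros.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X, y)

# Keep only the predictors whose coefficients survived.
selected = np.flatnonzero(lasso.coef_[0])
X_reduced = X[:, selected]
print(f"Kept {len(selected)} of {X.shape[1]} predictors")
```

The reduced matrix `X_reduced` can then be fed to any downstream model; the same pattern works with `SelectFromModel` if you prefer a pipeline-friendly wrapper.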

For random forests, it is very common to use the variable importances and keep a certain number of predictors with the highest importance; performing a grid search over the hyperparameters is also standard practice.
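Both random forest techniques can be combined, as in this sketch: fit a forest, keep the most important predictors, then grid-search hyperparameters on the reduced data. The dataset, the cutoff of ten predictors, and the parameter grid are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the chapter's data (an assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Fit an initial forest just to obtain variable importances.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Keep the 10 most important predictors (10 is an arbitrary cutoff).
top10 = np.argsort(rf.feature_importances_)[-10:]
X_top = X[:, top10]

# Grid search over a small, illustrative hyperparameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X_top, y)
print(grid.best_params_)
```

In practice you would widen the grid (e.g. `max_features`, `min_samples_leaf`) and choose the importance cutoff by cross-validation rather than fixing it at ten.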

Neural networks have their own improvement techniques:

  • For one thing, more training data almost always helps, particularly with neural networks.
  • The choice of optimization algorithm can factor into the performance.
  • In this example, our neural networks had only one hidden layer. In industry, models with multiple hidden layers are becoming increasingly common (although they take longer to train).
  • The choice of nonlinear activation function may also affect the model's performance.
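The last three points above can be explored together with scikit-learn's `MLPClassifier`, as in this sketch: two hidden layers instead of one, with the optimizer and activation exposed as arguments you can vary. The layer sizes and dataset are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the chapter's data (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # two hidden layers instead of one
    activation="relu",            # try "tanh" or "logistic" as alternatives
    solver="adam",                # the optimizer itself is a tuning knob
    max_iter=500,
    random_state=0,
)
mlp.fit(X_tr, y_tr)
print(f"Test accuracy: {mlp.score(X_te, y_te):.3f}")
```

Swapping `activation` and `solver` while holding everything else fixed is a quick way to see how much these choices matter on your own data.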