Using the models to make predictions

We have finished preprocessing the data and building and scoring the model. The AUC is similar to those reported in previous academic studies that predict ED outcomes (see Cameron et al., 2015 for an example).

The next step would be to save and deploy the model and use it to make live predictions. Fortunately, the classifiers in the scikit-learn library provide several methods for making predictions:

  • The predict() method takes a matrix, X, that contains unlabeled data as input and simply returns the class predictions with no further information.
  • For most classifiers, the predict_proba() method takes a matrix, X, that contains unlabeled data as input and returns the probabilities with which the observations belong to each class. These should add up to 1 for each observation. Some classifiers (LinearSVC, for example) do not provide this method.
  • The predict_log_proba() method is similar to the predict_proba() method except that it returns the log probabilities with which the observations belong to each class.
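As a minimal sketch of these three calls, the following example fits a logistic regression on synthetic data and then predicts on held-back rows. The data from make_classification, the choice of LogisticRegression, and all variable names are assumptions made for illustration; they are not the dataset or model built earlier in the chapter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for already-preprocessed data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_new = X[150:]  # treated here as unlabeled observations

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = clf.predict(X_new)                # one class label per row
probs = clf.predict_proba(X_new)           # one column per class
log_probs = clf.predict_log_proba(X_new)   # natural log of probs

# The columns of probs follow the order of clf.classes_.
print(clf.classes_)
print(labels[:5])
print(probs[:5].round(3))
```

Each row of probs sums to 1, and exponentiating log_probs recovers probs.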

Keep one important fact in mind: when making predictions, the unlabeled data must be preprocessed in exactly the same way as the training data, using the values (such as imputation fills and scaling parameters) that were learned from the training data. This includes:

  • Column additions and deletions
  • Column transformations
  • Imputation of missing values
  • Scaling and centering
  • One-hot encoding

Even a single column that is not preprocessed properly can severely degrade the model's predictions.
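One common way to keep preprocessing identical at training and prediction time is to bundle the preprocessing steps and the classifier into a single scikit-learn Pipeline, and to save that one object. The sketch below assumes a tiny made-up table; the column names (age, heart_rate, arrival_mode), the median imputation, and the file name model.joblib are all hypothetical and do not come from the chapter's dataset.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy training data; column names are illustrative only.
train = pd.DataFrame({
    "age": [34, 71, np.nan, 52, 45, 80, 29, 66],
    "heart_rate": [88, 110, 95, np.nan, 72, 120, 80, 101],
    "arrival_mode": ["walk-in", "ambulance"] * 4,
})
y_train = [0, 1, 0, 1, 0, 1, 0, 1]

numeric = ["age", "heart_rate"]
categorical = ["arrival_mode"]

# Imputation, scaling, and one-hot encoding are fitted on the training data
# only; columns not listed here are dropped (ColumnTransformer's default).
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train, y_train)

# Save the fitted pipeline (preprocessing + classifier) as one object.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# Later, at prediction time: load it and pass raw, unlabeled rows.
loaded = joblib.load(path)
new = pd.DataFrame({
    "age": [60, np.nan],
    "heart_rate": [np.nan, 90],
    "arrival_mode": ["ambulance", "helicopter"],  # second value unseen in training
})
new_probs = loaded.predict_proba(new)
print(new_probs.round(3))
```

Because the loaded pipeline reapplies the medians, scaling parameters, and category encoding learned during fit(), the new rows cannot drift out of step with the training data. Setting handle_unknown="ignore" is one possible design choice: a category that was never seen in training is encoded as all zeros rather than raising an error.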
