Detailed classification reports

The model is trained on the full development set. The scores are computed on the full evaluation set:

              precision    recall  f1-score   support

           0       1.00      0.96      0.98     85296
           1       0.04      0.93      0.08       147

   micro avg       0.96      0.96      0.96     85443
   macro avg       0.52      0.94      0.53     85443
weighted avg       1.00      0.96      0.98     85443
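The macro and weighted averages in the report differ sharply because the two classes are so imbalanced. The macro average gives each class equal weight, so the 0.04 precision of class 1 pulls it down to 0.52, while the weighted average is dominated by the 85,296 samples of class 0. The following is a minimal sketch of that arithmetic using the rounded per-class values from the report; the helper names `macro_average` and `weighted_average` are illustrative and are not part of scikit-learn.

```python
# Per-class rows copied from the report: (precision, recall, f1-score, support).
per_class = {
    0: (1.00, 0.96, 0.98, 85296),
    1: (0.04, 0.93, 0.08, 147),
}

def macro_average(rows, column):
    """Unweighted mean of one metric column across classes."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)

def weighted_average(rows, column):
    """Mean of one metric column, weighted by each class's support."""
    total_support = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total_support

macro_precision = macro_average(per_class, 0)
weighted_precision = weighted_average(per_class, 0)
print("macro precision:    %.2f" % macro_precision)
print("weighted precision: %.2f" % weighted_precision)
```

Because the inputs here are already rounded to two decimals, a recomputed value can differ from the report in the last digit; the report itself is computed from the unrounded scores.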

We use a grid search to find the value of the hyperparameter C that gives the best recall:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def print_gridsearch_scores(x_train_data, y_train_data):
    # Candidate values for the inverse regularization strength C.
    c_param_range = [0.01, 0.1, 1, 10, 100]

    # 5-fold cross-validated grid search, scored by recall.
    clf = GridSearchCV(LogisticRegression(), {"C": c_param_range}, cv=5, scoring='recall')
    clf.fit(x_train_data, y_train_data)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)

    print("Grid scores on development set:")
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

    return clf.best_params_["C"]

We run the grid search on the undersampled training data to find the best parameter set:

best_c = print_gridsearch_scores(X_train_undersample,y_train_undersample)

The output looks like this:

{'C': 0.01}

Grid scores on development set:

0.916 (+/-0.056) for {'C': 0.01}
0.907 (+/-0.068) for {'C': 0.1}
0.916 (+/-0.089) for {'C': 1}
0.916 (+/-0.089) for {'C': 10}
0.913 (+/-0.095) for {'C': 100}
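Each line of the grid scores shows the mean recall over the five cross-validation folds, followed by twice the standard deviation of the fold scores. The following is a minimal sketch of that arithmetic. The fold scores are hypothetical values chosen for illustration, not the folds behind the output above, and `summarize_folds` is an illustrative helper rather than a scikit-learn function. It assumes the population standard deviation, which is what NumPy's `np.std` computes by default.

```python
from statistics import pstdev

def summarize_folds(fold_scores):
    """Return (mean, 2 * population standard deviation) of per-fold scores."""
    mean = sum(fold_scores) / len(fold_scores)
    # pstdev is the population standard deviation (ddof=0).
    spread = 2 * pstdev(fold_scores)
    return mean, spread

# Hypothetical recall scores from a 5-fold cross-validation run.
fold_scores = [0.90, 0.95, 0.88, 0.93, 0.92]
mean, spread = summarize_folds(fold_scores)
print("%0.3f (+/-%0.03f)" % (mean, spread))
```

A wide spread relative to the differences between the means, as in the output above, suggests that the choice among these values of C matters little for recall on this data.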

Next, we create a function that plots a confusion matrix. Setting normalize=True divides each row by its total, so each cell shows a fraction of the true class rather than a raw count:

import itertools

import matplotlib.pyplot as plt
import numpy as np

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    # Normalize before drawing so the colors and the cell text agree.
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=0)
    plt.yticks(tick_marks, classes)

    # Write each cell's value, in white on dark cells and black on light ones.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
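The normalization step in the function divides every row of the matrix by that row's sum. Since rows correspond to true labels, the diagonal of the normalized matrix is the recall of each class. The following is a minimal sketch of that calculation in plain Python; the counts are hypothetical and `normalize_rows` is an illustrative helper, not part of the code above.

```python
def normalize_rows(cm):
    """Divide each row of a confusion matrix by its row total."""
    return [[count / sum(row) for count in row] for row in cm]

# Hypothetical counts: rows are true labels, columns are predicted labels.
cm = [
    [90, 10],  # true class 0: 90 predicted as 0, 10 predicted as 1
    [5, 45],   # true class 1: 5 predicted as 0, 45 predicted as 1
]
normalized = normalize_rows(cm)

# The diagonal of the row-normalized matrix is the per-class recall.
per_class_recall = [normalized[i][i] for i in range(len(cm))]
print(normalized)
print(per_class_recall)
```

This sketch assumes every class has at least one true sample; a row that sums to zero would cause a division by zero.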