Because our dataset is highly skewed (that is, it has a high degree of class imbalance), we cannot use accuracy to evaluate our models. By classifying every instance as non-fraudulent, we would achieve an accuracy of 99.82%, yet this number does not represent acceptable performance, as we would not detect a single fraudulent transaction. Thus, to evaluate our models, we will use recall (the percentage of actual frauds that we detected) and the F1 score, the harmonic mean of recall and precision (the fraction of transactions predicted as fraudulent that were indeed fraudulent).
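To make the argument concrete, the following minimal sketch computes accuracy, recall, precision, and F1 from scratch for a classifier that labels every transaction as non-fraudulent. The labels are synthetic and purely illustrative: we assume 10,000 transactions, of which 18 are frauds, chosen only so that the majority-class accuracy matches the 99.82% figure above. The `evaluate` helper is a hypothetical name and is not part of any library.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, recall, precision and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # frauds missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct non-frauds

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Synthetic, illustrative labels (an assumption, not the real dataset):
# 18 frauds among 10,000 transactions.
y_true = [1] * 18 + [0] * 9982

# A "model" that predicts non-fraud for every transaction.
all_non_fraud = [0] * len(y_true)
baseline = evaluate(y_true, all_non_fraud)

# A hypothetical model that catches 9 of the 18 frauds and raises 9 false alarms.
some_detection = [1] * 9 + [0] * 9 + [1] * 9 + [0] * 9973
detector = evaluate(y_true, some_detection)

print(baseline)
print(detector)
```

The all-negative baseline scores very high on accuracy but has a recall and an F1 score of zero, whereas the second model has slightly lower accuracy and clearly better recall and F1. This is why recall and F1 are the more informative metrics here.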