Evaluating machine learning models

In this section, we will discuss how to evaluate a machine learning model. You should always evaluate a model to determine whether it is ready to perform well consistently, that is, to predict the target for new and future data. Because the target values of future data are unknown, you need to check performance-related metrics of the ML model, such as accuracy, on data for which the target is already known. To do this, you provide a dataset containing the scores generated by a trained model and then evaluate the model to compute a set of industry-standard evaluation metrics.

To evaluate a model appropriately, you need to set aside a sample of data that has been labeled with the target. This sample, held out from the training data source, will be used as the ground truth. As we have already discussed, evaluating the predictive accuracy of an ML model on the same dataset that was used to train it might not be useful.

The reason is that a model can memorize the training data instead of generalizing from it. Thus, when training has finished, you present the held-out observations to the model so that it predicts their target values. You can then compare the predicted values returned by the trained model against the known target values.
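The holdout workflow described above (set aside labeled data, train, predict, compare) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Spark code used elsewhere in the book: `train_test_split`, `fit_majority_classifier`, and the toy dataset are hypothetical, and the "model" simply predicts the most frequent training label so that the focus stays on splitting, predicting, and comparing.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labeled rows and hold out a fraction of them as ground truth."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_majority_classifier(train):
    """Hypothetical toy model: always predicts the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def accuracy(model, test):
    """Compare predictions on held-out rows with their known target values."""
    correct = sum(1 for features, label in test if model(features) == label)
    return correct / len(test)


# Hypothetical labeled data: (features, label) pairs
rows = [((i,), i % 3 == 0) for i in range(100)]
train, test = train_test_split(rows)
model = fit_majority_classifier(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

The model never sees the held-out rows during training, so the accuracy printed at the end estimates performance on unseen data rather than how well the training data was memorized.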

Finally, you might be interested in computing summary metrics that indicate how well the predicted and true values match, such as precision, recall, weighted true positives, weighted true negatives, lift, and so on. In this section, however, we will focus on how to evaluate regression, classification (that is, binary and multiclass classification), and clustering models.
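Lift is one of the summary metrics listed above. A minimal sketch of one common definition, lift at a given fraction of cases ranked by model score, is shown below; the function name `lift_at`, the choice of this particular definition, and the sample scores are assumptions for illustration.

```python
def lift_at(scores, labels, fraction):
    """Lift for the top `fraction` of cases ranked by model score.

    lift = (positive rate among the top-ranked cases) / (positive rate overall)
    labels are 1 for positive and 0 for negative.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not any(labels):
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical model scores and true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.2))  # top 20% are all positive vs. a 30% base rate
```

A lift above 1 means the model concentrates positives among its top-ranked cases better than random selection would; at a fraction of 1.0 the lift is always 1.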

Evaluating a regression model

Assume that you are an online currency trader working on Forex or Fortrade. Right now you have two currency pairs in mind to buy or sell, for example, GBP/USD and USD/JPY. If you look at these two pairs carefully, you'll see that USD is common to both. By observing the historical prices of USD, GBP, and JPY, you can try to predict whether you should open a buy or a sell trade.

This kind of problem can be treated as a typical regression problem. Here, the target variable (the price) is a continuous numeric value that changes over time during market hours. Therefore, to predict prices from the given feature values of a certain currency (that is, USD, GBP, or JPY in this example), we can fit a simple linear regression model.

In this case, the feature values could be the historical prices and some external factors that move the value of a certain currency or currency pair. In this way, the trained model can predict the price of a certain currency.
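As a rough illustration of fitting such a model, the sketch below fits a one-feature ordinary least squares line using the closed-form solution. It assumes, purely hypothetically, that the only feature is the previous day's closing price; the price series is made up, and a real model would use more features (including the external factors mentioned above) and a library such as Spark ML.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical daily closing prices for one currency pair
closes = [1.250, 1.262, 1.241, 1.270, 1.281, 1.265, 1.274]
xs, ys = closes[:-1], closes[1:]  # feature: yesterday's close; target: today's close
intercept, slope = fit_simple_linear_regression(xs, ys)
next_close = intercept + slope * closes[-1]
print(f"predicted next close: {next_close:.4f}")
```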

Regression models (that is, linear or generalized linear regression models) can then be used to score data for which the actual prices are known. Once the predicted prices of the three currencies are available alongside their actual historical prices, we can evaluate the performance of the model by analyzing how much, on average, the predicted prices deviate from the actual prices. In this way, traders can estimate whether the price is likely to go up or down and can potentially earn money on online currency trading platforms such as Forex or Fortrade.
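The "average deviation" idea can be made concrete with the usual regression metrics: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The sketch below computes them in plain Python on made-up prices. Spark ML's RegressionEvaluator accepts the same metric names ("mae", "rmse", "r2"), but this code does not use Spark.

```python
import math


def regression_metrics(actual, predicted):
    """Summarize how far predicted values deviate from actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Hypothetical held-out prices and model predictions
actual = [1.2650, 1.2712, 1.2688, 1.2745]
predicted = [1.2631, 1.2705, 1.2710, 1.2730]
print(regression_metrics(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, which matters when occasional large mispredictions are costly, as they would be in trading.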

Evaluating a binary classification model

As we have already discussed, in a binary classification scenario there are only two possible outcomes for the target variable, for example, {0, 1}, {spam, ham}, {B, N}, {false, true}, {negative, positive}, and so on. Now assume that you are given a dataset of researchers around the world with demographic, socio-economic, and employment variables, and you would like to predict the Ph.D. scholarship amount (that is, the salary level) as a binary variable with the values of, say, {<= $1.5K, >= $4.5K}.

In this particular example, the negative class represents researchers whose monthly salary or scholarship is less than or equal to $1,500. The positive class, on the other hand, represents researchers whose monthly salary is greater than or equal to $4,500.

From the problem scenario, it is clear that this is a binary classification problem. As a result, you would train a model, score the data, evaluate the results, and check how far the predictions deviate from the actual labels. In this type of problem, you would therefore perform an experiment to evaluate the performance of a two-class (that is, binary) logistic regression model, which is one of the most commonly used binary classifiers in ML.
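A minimal sketch of how the scored output of such a binary classifier might be evaluated follows. It counts the confusion-matrix cells at an assumed threshold of 0.5, derives precision, recall, F1, and accuracy, and computes the area under the ROC curve as the probability that a randomly chosen positive is scored above a randomly chosen negative. The labels (1 for >= $4.5K, 0 for <= $1.5K) and scores are hypothetical; Spark's BinaryClassificationEvaluator reports areaUnderROC by default, but this code is plain Python.

```python
def binary_metrics(labels, scores, threshold=0.5):
    """Confusion-matrix metrics and ROC AUC for a binary classifier.

    labels: 1 for the positive class (e.g. >= $4.5K), 0 for the negative class
    scores: the model's estimated probability of the positive class
    """
    if not labels or len(labels) != len(scores):
        raise ValueError("inputs must be non-empty and of equal length")
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC: chance that a random positive outscores a random negative (ties count half)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(labels), "auc": auc}


# Hypothetical true salary classes and predicted probabilities
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(binary_metrics(labels, scores))
```

The threshold-based metrics change if you move the threshold, whereas AUC summarizes ranking quality over all thresholds, which is why both kinds are usually reported.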

Evaluating a multiclass classification model

In Chapter 6, Building Scalable Machine Learning Pipelines, we developed several applications and pipelines. You might remember that we also solved a multiclass classification problem on the OCR dataset using logistic regression and showed the results using Multiclass Metrics. In that example, there were 26 classes for the 26 characters (that is, from A to Z). We had to predict the class (label) of each character and check whether the prediction matched its true label. This kind of problem can be solved using multiclass classification.

Therefore, in this type of problem, you would also perform an experiment to evaluate the performance of a multiclass (that is, more than two classes) logistic regression model.
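For a multiclass model, the evaluation typically starts from a confusion matrix of (true label, predicted label) counts. The plain-Python sketch below derives accuracy, per-class precision and recall, and precision and recall weighted by how often each true label occurs, which are the same kinds of quantities a multiclass metrics report provides. The function name and the letter data are hypothetical stand-ins for the OCR example.

```python
from collections import Counter


def multiclass_metrics(actual, predicted):
    """Accuracy plus per-class and support-weighted precision and recall."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(actual) | set(predicted))
    confusion = Counter(zip(actual, predicted))  # (true, predicted) -> count
    support = Counter(actual)  # how often each true label occurs
    n = len(actual)
    per_class = {}
    for c in classes:
        tp = confusion[(c, c)]
        predicted_as_c = sum(confusion[(t, c)] for t in classes)
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / support[c] if support[c] else 0.0
        per_class[c] = {"precision": precision, "recall": recall}
    return {
        "accuracy": sum(confusion[(c, c)] for c in classes) / n,
        "weighted_precision": sum(per_class[c]["precision"] * support[c] for c in classes) / n,
        "weighted_recall": sum(per_class[c]["recall"] * support[c] for c in classes) / n,
        "per_class": per_class,
    }


# Hypothetical true and predicted letters from an OCR-style task
actual = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
print(multiclass_metrics(actual, predicted))
```

Note that support-weighted recall always equals overall accuracy, so weighted precision and the per-class numbers are usually the more informative parts of the report.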

Evaluating a clustering model

Since clustering models differ significantly from classification and regression models in many respects, evaluating a clustering model yields a different set of statistics and performance-related metrics. The metrics returned when evaluating a clustering model describe how many data points were assigned to each cluster, the amount of separation between clusters, and how tightly the data points are bunched within each cluster.
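The three kinds of clustering statistics just described can be sketched in plain Python for a hard clustering with known centroids: cluster sizes, the within-set sum of squared errors (WSSSE, the quantity Spark MLlib's KMeansModel reports through computeCost) as a tightness measure, and the smallest distance between centroids as a simple separation measure. The choice of minimum centroid distance for separation, the function name, and the sample points are assumptions for illustration; `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter


def clustering_summary(points, assignments, centroids):
    """Cluster sizes, compactness and separation for a hard clustering.

    points: coordinate tuples; assignments: cluster index per point;
    centroids: one coordinate tuple per cluster (at least two clusters).
    """
    if len(centroids) < 2:
        raise ValueError("need at least two clusters to measure separation")
    dists = [math.dist(p, centroids[c]) for p, c in zip(points, assignments)]
    k = len(centroids)
    return {
        # number of points assigned to each cluster
        "sizes": dict(Counter(assignments)),
        # within-set sum of squared errors: lower means tighter clusters
        "wssse": sum(d * d for d in dists),
        # average distance from a point to its own centroid
        "mean_distance_to_centroid": sum(dists) / len(dists),
        # closest pair of centroids: higher means better-separated clusters
        "min_centroid_separation": min(
            math.dist(centroids[i], centroids[j])
            for i in range(k) for j in range(i + 1, k)
        ),
    }


# Hypothetical 2-D points already assigned to two clusters
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
assignments = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(clustering_summary(points, assignments, centroids))
```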

For example, in Chapter 5, Supervised and Unsupervised Learning by Examples, we discussed and solved a clustering problem using Spark ML and MLlib in the Unsupervised Learning with Spark: An example section. In that example, we applied K-means clustering to the Saratoga NY Homes dataset and performed an exploratory analysis based on the price and lot size features to identify possible neighborhoods of houses located in the same area. This kind of problem can be solved and evaluated using a clustering model. However, at the time of writing, Spark did not yet provide a dedicated algorithm for evaluating clustering models.
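Because the text notes that Spark lacked a dedicated clustering evaluator, one option is to compute a standard internal metric yourself. The silhouette coefficient, which the text does not name, is one common choice: it combines within-cluster tightness and between-cluster separation into a single score between -1 and 1. The plain-Python sketch below is O(n²) and only suitable for small samples; the function name and the test points are illustrative assumptions.

```python
import math


def silhouette_score(points, assignments):
    """Mean silhouette coefficient with Euclidean distance, in [-1, 1].

    For each point: a = mean distance to the other members of its own cluster,
    b = lowest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters count as s = 0.
    """
    clusters = {}
    for p, c in zip(points, assignments):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, c in zip(points, assignments):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton cluster: s = 0 by convention
        # own includes p itself at distance 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for label, members in clusters.items()
            if label != c
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Hypothetical points: the first labeling groups nearby points, the second does not
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(silhouette_score(pts, [0, 0, 1, 1]), silhouette_score(pts, [0, 1, 0, 1]))
```

Values near 1 indicate tight, well-separated clusters; values below 0 suggest that many points sit closer to another cluster than to their own.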
