Model evaluation

As we saw in Chapter 7, Model Evaluation, to evaluate the performance of a regression model we can use a Predicted versus Observed (Pr v Ob) plot, as shown here:

[Screenshot: the Predicted versus Observed (Pr v Ob) plot]

We've quickly developed a model that achieves a Pseudo R-squared of 0.744. We did only minimal tuning; we could improve performance further by working with different input variables.

After improving the model, using the training dataset to build it and the validation dataset to evaluate its performance, we need to confirm the model's performance by creating a Predicted versus Observed plot with the test dataset. This helps us detect overfitting.
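
The same check can be reproduced outside the Rattle GUI. The following is a minimal R sketch, assuming a regression model fitted on the training data and a test data frame holding the observed target, cnt (all names here are illustrative):

```r
# Sketch: a Predicted versus Observed check on the held-out test set.
# `model` is assumed to be the regression fitted on the training data,
# and `test` a data frame containing the observed target, cnt
# (both names are illustrative).
pred <- predict(model, newdata = test)

# Squared correlation between predicted and observed values --
# Rattle's Pr v Ob plot reports a Pseudo R-squared of this kind.
pseudo_r2 <- cor(pred, test$cnt)^2

# Predicted versus Observed plot with a 45-degree reference line.
plot(pred, test$cnt, xlab = "Predicted", ylab = "Observed")
abline(0, 1, lty = 2)
```

A large gap between this value on the training and testing datasets is one sign of overfitting.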

A very interesting feature of Rattle is that we can run multiple models and compare their performance. Go to the Model tab and build a Neural Network model. Now return to the Evaluate tab, select both the Linear and Neural Net checkboxes, and press the Execute button to compare the two models, as shown in the following screenshot. As we saw in Chapter 6, Decision Trees and Other Supervised Learning Methods, neural networks are a family of algorithms, inspired by biological neural networks, that are used to approximate functions:

[Screenshot: comparing the Linear and Neural Net models in the Evaluate tab]

Now that we have a model to predict the demand for rental bikes, we want to add demand forecast information to our Qlik Sense application.

The first step is to predict the new values, or forecast the demand. The second and final step is to load the data into Qlik Sense.

Scoring new data

We have three main options for scoring new data:

  • Qlik Sense:
    • We've found the coefficients of the regression; we can apply them in Qlik Sense during data loading, or we can create a measure with the formula.
    • This option has a major disadvantage: we will normally want to re-evaluate the model as new data arrives, and every time the model changes we will have to update the Qlik Sense app.
  • R:
    • We can save our model in Rattle and then load it in R.
    • We can use the predict() function to score new data. This is a good option, but the last option is the easiest of all.
  • Rattle:
    • In Rattle's Evaluate tab, we can use the Score option. With it, we can score the training, validation, or testing dataset, or load a dataset from a CSV file.
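
For the second option, scoring in R might look like the following sketch, assuming the model was saved earlier with saveRDS() (the file and column names are illustrative):

```r
# Sketch: scoring new data in R with a previously saved model.
# Assumes the model was saved earlier with saveRDS(); the file and
# column names are illustrative.
model <- readRDS("bikes_glm.rds")

# New observations to score, for example next week's weather forecast.
newdata <- read.csv("weather_forecast.csv")

# predict() returns the expected rental count for each row.
newdata$glm <- predict(model, newdata = newdata)

write.csv(newdata, "scored_forecast.csv", row.names = FALSE)
```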

In a real case, we would have the weather forecast for the next week and would use Rattle to load that data from a CSV file. In this example, we don't have next week's forecast, so we'll score the testing dataset and the complete dataset instead.

Go to the Evaluate tab; select Score, Testing, and All; and press the Execute button. Rattle will score the testing dataset and write the predictions to a CSV file. This file includes all the original variables plus a new variable, called glm, that contains the predicted value, as shown in the following screenshot. Before finishing, you need to confirm the location and name of the file:

[Screenshot: the Score option in Rattle's Evaluate tab]
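
As a quick sanity check, the scored file can also be read back in R (the filename below is illustrative; use the name you confirmed in the dialog):

```r
# Sketch: inspect the CSV written by Rattle's Score option.
# The filename is illustrative; use the one confirmed in the dialog.
scored <- read.csv("bikes_test_score_all.csv")

# The file keeps the original variables plus the prediction in `glm`,
# so the pseudo R-squared can be recomputed directly from it.
cor(scored$glm, scored$cnt)^2
```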

Now, we will check the performance of our model using Qlik Sense. Create a new Qlik Sense application and load the file generated by Rattle. This file contains the testing dataset, 111 rows; each row holds all the original data plus the predicted value. Create a scatter plot with dteday as the dimension and glm and cnt as measures, as shown here:

[Screenshot: scatter plot of glm and cnt by dteday in Qlik Sense]

We've created a Predicted versus Observed plot with the testing dataset. This chart, and the one we created in Rattle, gives us an idea of the predictive power of our model. Strictly speaking, we don't need this plot, because we already created the same one in Rattle.

Now, come back to Rattle to score the complete dataset. Select the Full dataset and, this time, include just the Identifiers, as shown here:

[Screenshot: scoring the Full dataset with Identifiers only]

Rattle will create a file with 731 observations, each containing the identifier and a new column, called glm, with the predicted value.

We will load this information into our original bikes application. Go to Qlik Sense, open the original application, and drag and drop this second file into it. Qlik Sense will ask whether you want to add the new data or replace the current data; select Add data. After loading the data, open the Data model viewer, as shown in the following screenshot:

[Screenshot: the Data model viewer showing the association between the two tables]

Rattle has created a file that contains two variables: the identifier, ident, and the prediction, glm. When we load this file into Qlik Sense, it creates an association with our original table using the ident field. Finally, create two charts to show the predictive power of our model. In the first chart, I used a line chart with dteday as the dimension and cnt and glm as measures, as shown here:

[Screenshot: line chart of cnt and glm by dteday]

Be careful with these plots: they include the training, validation, and testing datasets. Because we used the training and validation datasets to build the model, this plot doesn't give us a realistic idea of the model's predictive power. The earlier plots, built with the testing dataset alone, gave us a true measure of performance.

In the real world, we would load the weather forecast for the following week into Rattle and score it. By loading historic and forecast data into Qlik Sense, we will be able to create visualizations similar to the following screenshot, which shows historic and forecast data together:

[Screenshot: historic and forecast data shown together]