As we saw in Chapter 7, Model Evaluation, to evaluate the performance of a regression model we can use a plot called the Predicted versus Observed plot (Pr v Ob), as shown here:
We've quickly developed a model that achieves a Pseudo R-squared of 0.744. We've done very little optimization so far; we could improve the performance by working with the different variables.
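For regression models, Rattle's Pseudo R-squared is commonly described as the squared correlation between observed and predicted values. A minimal sketch of that calculation, under that assumption (check Rattle's documentation for the exact formula it uses):

```python
import math

def pseudo_r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values
    (assumed to match Rattle's Pseudo R-squared for regression)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)  # Pearson correlation coefficient
    return r * r

# Toy data: predictions perfectly correlated with observations score 1.0
print(pseudo_r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A value of 1.0 means the predictions line up perfectly with the observations; our model's 0.744 indicates a good but improvable fit.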
After improving the model, using the training dataset to build it and the validation dataset to evaluate it, we need to confirm its performance by creating a Predicted versus Observed plot with the test dataset. This final check helps us detect overfitting.
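Rattle handles this partitioning for us; the split proportions below (70% training, 15% validation, 15% testing) are Rattle's usual defaults and are an assumption here, a sketch of the idea rather than Rattle's internal code:

```python
import random

def partition(rows, seed=42, train=0.70, validate=0.15):
    """Split rows into training/validation/testing subsets
    (70/15/15 assumed, matching Rattle's typical defaults)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)            # reproducible shuffle
    n = len(rows)
    n_train = int(n * train)
    n_val = int(n * validate)
    return (rows[:n_train],                      # build the model here
            rows[n_train:n_train + n_val],       # tune and compare models here
            rows[n_train + n_val:])              # final, untouched check

# 731 daily observations, as in the bikes dataset used in this chapter
train_set, val_set, test_set = partition(range(731))
print(len(train_set), len(val_set), len(test_set))  # 511 109 111
```

Note that a 70/15/15 split of the 731-row bikes dataset leaves 111 rows for testing, which is the size of the scored testing file we will meet later in this section.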
A very interesting feature of Rattle is that we can run multiple models and evaluate their performance side by side. Go to the Model tab and build a Neural Network model. Now, return to the Evaluate tab, select the Linear and Neural Net checkboxes, and press the Execute button to compare the two models, as shown in the following screenshot. As we saw in Chapter 6, Decision Trees and Other Supervised Learning Methods, Neural Networks are a kind of algorithm, inspired by biological neural networks, that is used to approximate functions:
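Rattle does this comparison for us in the GUI; conceptually, it boils down to scoring each candidate model on the same held-out data and comparing an error metric. A minimal sketch with two stand-in models (a mean-value baseline and a simple one-variable linear fit; the data is illustrative, not the bikes dataset):

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: lower is better."""
    return math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / len(observed))

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

xs, ys = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9]   # toy data
linear = fit_linear(xs, ys)
baseline = lambda x: sum(ys) / len(ys)                 # predicts the mean everywhere

for name, model in [("baseline", baseline), ("linear", linear)]:
    print(name, round(rmse(ys, [model(x) for x in xs]), 3))
```

Whichever model scores lower on the validation data is the stronger candidate; Rattle's side-by-side Pr v Ob plots express the same comparison visually.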
Now that we have a model to predict the demand of rental bikes, we want to add demand forecast information to our Qlik Sense application.
The first step is to predict the new values, or forecast the demand. The second and final step is to load the data into Qlik Sense.
We have three main options to score new data, listed as follows:
In a real case, we would have the weather forecast for the next week and would use Rattle to load it from a CSV file. In this example, we don't have next week's forecast, so we'll score the testing dataset and the complete dataset instead.
Go to the Evaluate tab, select Score, Testing, and All, and press the Execute button. Rattle will score the testing dataset for you and will write the predictions to a CSV file. This file will include all the original variables and a new variable called glm with the predicted value, as shown in the following screenshot. Before finishing, you need to confirm the location and name of the file:
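The structure of the file Rattle writes can be sketched as follows: the original rows plus a new glm column holding each row's prediction. The column names and the stand-in model below are illustrative, taken from this example's bikes data, not a fixed Rattle format:

```python
import csv
import io

# Two illustrative rows from the bikes data (dteday, temp, cnt)
rows = [
    {"dteday": "2012-12-29", "temp": 0.25, "cnt": 1796},
    {"dteday": "2012-12-30", "temp": 0.23, "cnt": 1341},
]

# Stand-in for the fitted GLM; the real model uses many more variables
model = lambda r: 1500 + 2000 * r["temp"]

buf = io.StringIO()                              # a real run writes to a .csv file
writer = csv.DictWriter(buf, fieldnames=["dteday", "temp", "cnt", "glm"])
writer.writeheader()
for r in rows:
    # Original columns are copied through; glm carries the prediction
    writer.writerow({**r, "glm": round(model(r), 1)})
print(buf.getvalue())
```

Each row keeps its original values, so downstream tools such as Qlik Sense can compare cnt (observed) against glm (predicted) directly.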
Now, we will check the performance of our model by using Qlik Sense. Create a new Qlik Sense application and load the file generated by Rattle. This file contains the 111 rows of the testing dataset; each row contains all the original data and the predicted value. Create a scatter plot with dteday as the dimension and glm and cnt as measures, as shown here:
We've created a Predicted versus Observed plot with the testing dataset. This chart and the one we created in Rattle give us an idea of the predictive power of our model. Strictly speaking, this plot isn't necessary, because we already created an equivalent one in Rattle.
Now, come back to Rattle to score the complete dataset. Select the Full dataset and, for the report, choose to include just the Identifiers, as shown here:
Rattle will create a file with 731 observations; because we chose to include just the identifiers, each row contains the identifier and a new column with the predicted value, called glm.
We will load this information into our original bikes application. Go to Qlik Sense, open the original application, and drag and drop this second file into the application. Qlik Sense will ask whether you want to add the new data or replace the current data; select Add data. After loading the data, open the Data model viewer, as shown in the following screenshot:
Rattle has created a file that contains two variables: the identifier, ident, and the prediction, glm. When we load this file into Qlik Sense, it creates an association with our original table using the ident field. Finally, create two charts to show the predictive power of our model. In the first chart, I used a line chart with dteday as the dimension and cnt and glm as measures, as shown here:
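Qlik Sense builds this association automatically because the two tables share the ident field; the effect is equivalent to a key-based join, sketched here with illustrative rows (the field names ident, cnt, and glm follow this example):

```python
# Original bikes table (a fragment): one row per day, keyed by ident
original = [
    {"ident": 1, "dteday": "2011-01-01", "cnt": 985},
    {"ident": 2, "dteday": "2011-01-02", "cnt": 801},
]

# Scored file from Rattle: just the identifier and the prediction
predictions = [
    {"ident": 1, "glm": 1001.3},
    {"ident": 2, "glm": 850.7},
]

# Associate the two tables on ident, as Qlik Sense does implicitly
by_ident = {p["ident"]: p["glm"] for p in predictions}
joined = [{**row, "glm": by_ident.get(row["ident"])} for row in original]
print(joined[0])
```

After the association, every chart in the application can use cnt and glm side by side, exactly as our line chart does.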
Be careful with these plots: they include the training, validation, and testing datasets. Because we used the training and validation datasets to build the model, this plot doesn't give us a real idea of the predictive power of our model. The previous plots, created with just the testing dataset, gave us a realistic view of its performance.
In the real world, we would load the weather forecast for the following week into Rattle and score it. By loading both historic and forecast data into Qlik Sense, we can create visualizations similar to the following screenshot, which shows historic and forecast data together: