To build and test our recommendation engines, we can use the same function, Recommender(), merely changing the specification for each technique. To see what the package can do and explore the parameters available for all six techniques, you can examine the registry. Looking at the IBCF entry below, we can see that the default is to find 30 neighbors (k = 30) using the cosine method on centered data, with the missing data not coded as zero:
> recommenderRegistry$get_entries(dataType = "realRatingMatrix")
$IBCF_realRatingMatrix
Recommender method: IBCF
Description: Recommender based on item-based collaborative filtering (real data).
Parameters:
   k method normalize normalize_sim_matrix alpha na_as_zero minRating
1 30 Cosine    center                FALSE   0.5      FALSE        NA

$PCA_realRatingMatrix
Recommender method: PCA
Description: Recommender based on PCA approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha na_as_zero
1         20 Cosine    center                FALSE   0.5      FALSE
  minRating
1        NA

$POPULAR_realRatingMatrix
Recommender method: POPULAR
Description: Recommender based on item popularity (real data).
Parameters: None

$RANDOM_realRatingMatrix
Recommender method: RANDOM
Description: Produce random recommendations (real ratings).
Parameters: None

$SVD_realRatingMatrix
Recommender method: SVD
Description: Recommender based on SVD approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha treat_na
1         50 Cosine    center                FALSE   0.5   median
  minRating
1        NA

$UBCF_realRatingMatrix
Recommender method: UBCF
Description: Recommender based on user-based collaborative filtering (real data).
Parameters:
  method nn sample normalize minRating
1 cosine 25  FALSE    center        NA
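If you only want the method names rather than the full parameter listing, a minimal sketch (output not shown) is to pull the names of the registry entries:

> names(recommenderRegistry$get_entries(dataType = "realRatingMatrix"))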
Here is how you can build the algorithms on the train data. For simplicity, let's use the default algorithm settings. You can adjust the parameter settings by including your changes in the function call as a list. For instance, SVD treats the missing values as the column median; if you wanted the missing values coded as zero, you would need to include param = list(treat_na = "0"), as sketched after the following code block:
> ubcf = Recommender(getData(e, "train"), "UBCF")
> ibcf = Recommender(getData(e, "train"), "IBCF")
> svd = Recommender(getData(e, "train"), "SVD")
> popular = Recommender(getData(e, "train"), "POPULAR")
> pca = Recommender(getData(e, "train"), "PCA")
> random = Recommender(getData(e, "train"), "RANDOM")
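As a sketch of the parameter adjustment described above, and not something run in this analysis, the SVD engine with missing values coded as zero might be built as follows; this assumes the treat_na option shown in the registry accepts "0":

> svd.zero = Recommender(getData(e, "train"), "SVD", param = list(treat_na = "0"))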
Now, using the predict() and getData() functions, we will generate the predicted ratings for the test users from each algorithm, based on the 15 items given as known per user, as follows:
> user_pred = predict(ubcf, getData(e, "known"), type = "ratings")
> item_pred = predict(ibcf, getData(e, "known"), type = "ratings")
> svd_pred = predict(svd, getData(e, "known"), type = "ratings")
> pop_pred = predict(popular, getData(e, "known"), type = "ratings")
> pca_pred = predict(pca, getData(e, "known"), type = "ratings")
> rand_pred = predict(random, getData(e, "known"), type = "ratings")
We will examine the error between the predictions and the unknown portion of the test
data using the calcPredictionAccuracy()
function. The output will consist of RMSE
, MSE
, and MAE
for all the methods. We'll examine UBCF
by itself. After creating the objects for all six methods, we can build a table by creating an object with the rbind()
function and giving names to the rows with the rownames()
function:
> P1 = calcPredictionAccuracy(user_pred, getData(e, "unknown"))
> P1
RMSE  MSE  MAE
 4.5 19.9  3.5
> P2 = calcPredictionAccuracy(item_pred, getData(e, "unknown"))
> P3 = calcPredictionAccuracy(svd_pred, getData(e, "unknown"))
> P4 = calcPredictionAccuracy(pop_pred, getData(e, "unknown"))
> P5 = calcPredictionAccuracy(pca_pred, getData(e, "unknown"))
> P6 = calcPredictionAccuracy(rand_pred, getData(e, "unknown"))
> error = rbind(P1, P2, P3, P4, P5, P6)
> rownames(error) = c("UBCF", "IBCF", "SVD", "Popular", "PCA", "Random")
> error
            RMSE      MSE      MAE
UBCF    4.467276 19.95655 3.496973
IBCF    4.651552 21.63693 3.517007
SVD     5.275496 27.83086 4.454406
Popular 5.064004 25.64414 4.233115
PCA     4.711496 22.19819 3.725162
Random  7.830454 61.31601 6.403661
We can see in the output that the user-based algorithm slightly outperforms IBCF and PCA. It is also noteworthy that a simple algorithm such as the popular-based recommendation does fairly well.
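If you want a rough idea of what calcPredictionAccuracy() is doing, the following minimal sketch assumes that coercing the predictions and the held-out ratings to plain matrices leaves NA wherever an entry is missing, so the squared errors are taken only over items that were both predicted and actually rated:

> p = as(user_pred, "matrix")
> u = as(getData(e, "unknown"), "matrix")
> sqrt(mean((p - u)^2, na.rm = TRUE))  # should be close to the UBCF RMSE above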
There is another way to compare methods using the evaluate()
function. Making comparisons with evaluate()
allows one to examine additional performance metrics as well as performance graphs. As the UBCF and IBCF algorithms performed the best, we will look at them along with the popular-based one.
The first task in this process is to create a list of the algorithms that we want to compare, as follows:
> algorithms = list(POPULAR = list(name = "POPULAR"),
                    UBCF = list(name = "UBCF"),
                    IBCF = list(name = "IBCF"))
> algorithms
$POPULAR
$POPULAR$name
[1] "POPULAR"

$UBCF
$UBCF$name
[1] "UBCF"

$IBCF
$IBCF$name
[1] "IBCF"
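For illustration only, a tuned version of this list would attach a param element to any entry; the nn and k values below are assumptions for the sketch, not settings used in this exercise:

> algorithms.tuned = list(POPULAR = list(name = "POPULAR"),
                          UBCF = list(name = "UBCF", param = list(nn = 50)),
                          IBCF = list(name = "IBCF", param = list(k = 30)))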
You can adjust the parameters with a param element in list(), just as in the preceding sketch. In the next step, you can create the results using evaluate()
and also set up a comparison on a specified number of recommendations. For this example, let's compare the top 5
, 10
, and 15
joke recommendations:
> evlist = evaluate(e, algorithms, n = c(5, 10, 15))
POPULAR run
  1  [0.05sec/1.02sec]
UBCF run
  1  [0.03sec/68.26sec]
IBCF run
  1  [2.03sec/0.86sec]
Note that by executing the command, you will receive output showing how long it took to run each algorithm. We can now examine the performance using the avg()
function:
> avg(evlist)
$POPULAR
      TP    FP     FN     TN precision    recall       TPR        FPR
5  2.092 2.908 14.193 70.807    0.4184 0.1686951 0.1686951 0.03759113
10 3.985 6.015 12.300 67.700    0.3985 0.2996328 0.2996328 0.07769088
15 5.637 9.363 10.648 64.352    0.3758 0.4111718 0.4111718 0.12116708

$UBCF
      TP    FP     FN     TN precision    recall       TPR        FPR
5  2.074 2.926 14.211 70.789    0.4148 0.1604751 0.1604751 0.03762910
10 3.901 6.099 12.384 67.616    0.3901 0.2945067 0.2945067 0.07891524
15 5.472 9.528 10.813 64.187    0.3648 0.3961279 0.3961279 0.12362834

$IBCF
      TP     FP     FN     TN precision     recall        TPR       FPR
5  1.010  3.990 15.275 69.725    0.2020 0.06047142 0.06047142 0.0534247
10 2.287  7.713 13.998 66.002    0.2287 0.15021068 0.15021068 0.1027532
15 3.666 11.334 12.619 62.381    0.2444 0.23966150 0.23966150 0.1504704
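A quick way to read this table: precision is TP / (TP + FP), and because every user receives exactly n recommendations, TP + FP always equals n, so the precision column is simply the average TP divided by n. For POPULAR at n = 5:

> 2.092 / (2.092 + 2.908)
[1] 0.4184

Recall and TPR are averaged per user, so they do not reduce to a ratio of the averaged counts in the same way.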
Note that the performance metrics for POPULAR and UBCF are nearly the same. One could argue that the simpler-to-implement popular-based algorithm is the better choice for model selection. Indeed, what is disappointing about this whole exercise is the anemic TPR; for example, with UBCF at 15 recommendations, only an average of about 5.5 of them were true positives. As mentioned, we can plot and compare the results as receiver operating characteristic (ROC) curves, where you compare TPR against FPR, or as precision/recall curves, as follows:
> plot(evlist, legend="topleft", annotate=TRUE)
The following is the output of the preceding command:
To get the precision/recall curve plot, you only need to specify "prec" in the plot() function:
> plot(evlist, "prec", legend="bottomright", annotate=TRUE)
The output of the preceding command is as follows:
You can clearly see in the plots that the popular-based and user-based algorithms are almost identical, and that both outperform the item-based one. The annotate=TRUE parameter places a number next to each point, corresponding to the number of recommendations that we called for in our evaluation.
This was simple, but what are the actual recommendations from a model for a specific individual? This is quite easy to code as well. First, let's build a user-based recommendation engine on the full dataset. Then, we will find the top five recommendations for the first two raters. We will use the Recommender() function and apply it to the whole dataset, as follows:
> R1 = Recommender(Jester5k, method = "UBCF")
> R1
Recommender of type 'UBCF' for 'realRatingMatrix'
learned using 5000 users.
Now, we just need to get the top five recommendations—in order—for the first two raters and produce them as a list:
> recommend = predict(R1, Jester5k[1:2], n = 5)
> as(recommend, "list")
[[1]]
[1] "j81" "j78" "j83" "j80" "j73"

[[2]]
[1] "j96" "j87" "j89" "j76" "j93"
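If you only need a shorter list from an existing prediction, recommenderlab's bestN() function can trim a topNList; as a minimal sketch (output not shown), keeping the top three would look like this:

> as(bestN(recommend, n = 3), "list")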
It is also possible to see the predicted rating score for specific jokes by specifying type = "ratings" in the predict() syntax and then putting the result into a matrix for review. Let's do this for ten individuals (raters 300 through 309) and three jokes (71 through 73):
> rating = predict(R1, Jester5k[300:309], type = "ratings")
> rating
10 x 100 rating matrix of class 'realRatingMatrix' with 322 ratings.
> as(rating, "matrix")[, 71:73]
             j71         j72        j73
 [1,] -0.8055227 -0.05159179 -0.3244485
 [2,]         NA          NA         NA
 [3,] -1.2472200          NA -1.5193913
 [4,]  4.0659217  4.45316186  4.0651614
 [5,]         NA          NA         NA
 [6,]  1.1233854  1.37527380         NA
 [7,]  0.4938482  0.18357168 -0.1378054
 [8,]  0.2004399  0.58525761  0.2910901
 [9,] -0.5184774  0.03067017  0.2209107
[10,]  0.1480202  0.35858842         NA
The numbers in the matrix are the predicted ratings for the jokes that the individual did not rate, while the NAs mark the jokes that the user has already rated; the known ratings are removed from the prediction.
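As a quick check of this interpretation, you can pull the same slice from the original data (output not shown); the non-NA entries here should correspond to the NA positions in the predicted matrix above:

> as(Jester5k[300:309], "matrix")[, 71:73]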
Our final effort with this data will show how to build recommendations for situations where the ratings are binary, that is, good or bad (1 or 0). We will need to turn the ratings into this binary format, with 5 or greater coded as 1 and less than 5 as 0. This is quite easy to do with recommenderlab using the binarize() function and specifying minRating = 5:
> Jester.bin = binarize(Jester5k, minRating=5)
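As a hedged sanity check of what binarize() just did (output not shown), the number of ones in the binary matrix should equal the number of original ratings of 5 or greater:

> sum(as(Jester.bin, "matrix"))
> sum(as(Jester5k, "matrix") >= 5, na.rm = TRUE)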
Now, we need to keep only the users with enough ratings of one to support the number of items the algorithm will be given for training. For argument's sake, let's go with given = 10, which means retaining users with more than ten positive ratings. The code to create the necessary subset of the data is shown in the following lines:
> Jester.bin = Jester.bin[rowCounts(Jester.bin) > 10]
> Jester.bin
3054 x 100 rating matrix of class 'binaryRatingMatrix' with 84722 ratings.
You will need to create the evaluation scheme with evaluationScheme(). In this instance, we will go with cross-validation. The default number of folds in the function is 10, but we can safely go with k = 5, which will reduce our computation time:
> set.seed(456)
> e.bin = evaluationScheme(Jester.bin, method = "cross-validation", k = 5, given = 10)
For comparison purposes, the algorithms under evaluation will include random, popular, and UBCF:
> algorithms.bin = list("random" = list(name = "RANDOM", param = NULL),
                         "popular" = list(name = "POPULAR", param = NULL),
                         "UBCF" = list(name = "UBCF"))
It is now time to build our model, as follows:
> results.bin = evaluate(e.bin, algorithms.bin, n = c(5, 10, 15))
RANDOM run
  1  [0sec/0.41sec]
  2  [0.01sec/0.39sec]
  3  [0sec/0.39sec]
  4  [0sec/0.41sec]
  5  [0sec/0.4sec]
POPULAR run
  1  [0.01sec/3.79sec]
  2  [0sec/3.81sec]
  3  [0sec/3.82sec]
  4  [0sec/3.92sec]
  5  [0.02sec/3.78sec]
UBCF run
  1  [0sec/5.94sec]
  2  [0sec/5.92sec]
  3  [0sec/6.05sec]
  4  [0sec/5.86sec]
  5  [0sec/6.09sec]
Forgoing the table of performance metrics, let's take a look at the plots:
> plot(results.bin, legend="topleft")
The output of the preceding command is as follows:
> plot(results.bin, "prec", legend="bottomright")
The output of the preceding command is as follows:
The user-based algorithm slightly outperforms the popular-based one, but you can clearly see that they are both superior to any random recommendation. In our business case, it will come down to the judgment of the decision-making team as to which algorithm to implement.
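If the decision-making team wants the numbers behind these curves rather than just the plots, the averaged confusion matrix entries are available through the same avg() function we used earlier (output not shown):

> avg(results.bin)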