A recommender system, arguably one of the classic killer applications of machine learning, is a subclass of information filtering system that seeks to predict the rating or preference that a user would give to an item. The concept of recommender systems has become very common in recent years and has subsequently been applied in many different applications. The most popular applications probably involve products (for example, movies, music, books, and research articles), news, search queries, and social tags. Recommender systems can be classified into four categories, as stated in Chapter 2, Machine Learning Best Practices. These are shown in Figure 10:
From the technical viewpoint, we can further categorize them as follows:
As shown in Figure 11, the model-based recommender system, which uses advanced algorithms such as SVM, LDA, or SVD, is the most robust approach in the recommender system class:
As already mentioned, collaborative filtering techniques are commonly used for recommender systems. However, Spark MLlib currently supports only model-based collaborative filtering. Here, users and products are described by a small set of latent factors, which are then used to predict the missing entries. According to the Spark API reference for collaborative filtering at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html, the Alternating Least Squares (ALS) algorithm is used to learn these latent factors. ALS tackles a non-linear least squares (NLS) problem (see more at https://en.wikipedia.org/wiki/Non-linear_least_squares) by alternating between ordinary least squares solves, and it takes the following parameters:
numBlocks: the number of blocks used to parallelize the computation (the local solves use native LAPACK); set it to -1 to auto-configure
rank: the number of latent factors used when building the model
iterations: the number of ALS iterations to run; more iterations generally give more accurate predictions, up to convergence
lambda: the regularization parameter for the ALS algorithm
implicitPrefs: specifies which feedback to use (the explicit-feedback ALS variant or the variant adapted for implicit feedback data)
alpha: a parameter of the implicit-feedback variant that governs the baseline confidence in preference observations

ALS is an iterative algorithm that models the rating matrix as the product of low-rank user and product factor matrices. These factors are learned by minimizing the reconstruction error of the observed ratings.
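Written out, the learning problem described above can be sketched as the following objective. The notation is introduced here for illustration only: x_u and y_i are the rank-dimensional factors of user u and product i, Ω is the set of observed (user, product) pairs, and n_u and n_i count the ratings of user u and product i. Scaling λ by these counts mirrors the ALS-WR regularization that the Spark MLlib documentation says it applies; treat the exact form as an approximation of what Spark computes, not its source code.

$$
\min_{X,\,Y} \sum_{(u,i)\in\Omega} \left(r_{ui} - x_u^{\top} y_i\right)^2 + \lambda\left(\sum_{u} n_u \lVert x_u\rVert^2 + \sum_{i} n_i \lVert y_i\rVert^2\right)
$$

Holding the product factors fixed turns this into an independent ridge regression for each user, with the closed-form solution below, where Y_{Ω_u} stacks the factors of the products rated by u and r_u holds u's ratings. The product update is symmetric, and ALS simply alternates the two half-steps. Because each user (or product) is solved independently, the work splits naturally across blocks.

$$
x_u = \left(Y_{\Omega_u}^{\top} Y_{\Omega_u} + \lambda n_u I\right)^{-1} Y_{\Omega_u}^{\top} r_u
$$

For implicit feedback (implicitPrefs), the formulation from "Collaborative Filtering for Implicit Feedback Datasets", which the Spark documentation cites, replaces ratings with binary preferences and weights the squared error over all (u, i) pairs by a confidence that alpha controls:

$$
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad c_{ui} = 1 + \alpha\, r_{ui}
$$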
The unknown ratings can then be estimated by multiplying these factors together. The collaborative filtering approach used in Spark MLlib, whether for movie recommendation or any other recommendation task, has proven to deliver high prediction accuracy and scales to billions of ratings on commodity clusters, the kind of scale handled by companies such as Netflix. In this way, a company such as Netflix can recommend movies to its subscribers based on the predicted ratings. The ultimate goal is to increase sales and, of course, customer satisfaction.
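To make the alternating procedure concrete, here is a minimal pure-Python sketch of explicit-feedback ALS on a toy rating set. It is not Spark's implementation: there is no blocking (numBlocks) and no implicit-feedback mode. The ratings, the helper names (`solve`, `update`, `als`, `predict`, `rmse`, `recommend`) and the default values are hypothetical, chosen for illustration. `rank`, `iterations`, and `lam` play the roles of the rank, iterations, and lambda parameters (`lambda` is a reserved word in Python), and `predict` estimates an unknown rating by multiplying the user and item factors together.

```python
import math
import random

# Hypothetical toy data: (user, movie, rating) triples; unlisted pairs are unknown.
RATINGS = [
    ("alice", "m1", 5.0), ("alice", "m2", 3.0), ("alice", "m3", 1.0),
    ("bob", "m1", 4.0), ("bob", "m3", 1.0),
    ("carol", "m2", 3.0), ("carol", "m3", 1.0), ("carol", "m4", 5.0),
    ("dave", "m1", 1.0), ("dave", "m4", 4.0),
]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, rows, rank, lam):
    """One ALS half-step: holding `fixed` factors constant, solve
    (sum y y^T + lam * n * I) x = sum r * y for each row's factor x."""
    new = {}
    for key, entries in rows.items():
        reg = lam * len(entries)  # lambda scaled by this row's rating count (ALS-WR style)
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in entries:
            y = fixed[other]
            for i in range(rank):
                b[i] += r * y[i]
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        new[key] = solve(a, b)
    return new


def als(ratings, rank=2, iterations=20, lam=0.01, seed=42):
    """Learn user and item factors from (user, item, rating) triples."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    users = {u: [rng.random() for _ in range(rank)] for u in by_user}
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # fix item factors, solve for users
        items = update(users, by_item, rank, lam)  # fix user factors, solve for items
    return users, items


def predict(users, items, u, i):
    """Estimated rating: dot product of the user and item factors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


def rmse(users, items, ratings):
    """Root-mean-square reconstruction error on the observed ratings."""
    se = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def recommend(users, items, ratings, user, n=2):
    """Rank the items `user` has not rated by their predicted rating."""
    rated = {i for u, i, _ in ratings if u == user}
    unrated = [i for i in items if i not in rated]
    return sorted(unrated, key=lambda i: predict(users, items, user, i), reverse=True)[:n]


if __name__ == "__main__":
    users, items = als(RATINGS)
    print("training RMSE:", round(rmse(users, items, RATINGS), 3))
    print("predicted rating (bob, m2):", round(predict(users, items, "bob", "m2"), 2))
    print("top picks for bob:", recommend(users, items, RATINGS, "bob"))
```

Each half-step solves one small rank-by-rank system per user or item, which is why the approach parallelizes well; in Spark the same per-row solves are distributed across numBlocks blocks instead of running in a single loop.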
For brevity and due to page limitations, we will not show movie recommendations using the collaborative filtering approach in this chapter. However, a step-by-step example using Spark will be shown in Chapter 9, Advanced Machine Learning with Streaming and Graph Data.
For the time being, interested readers are advised to visit the Spark website for the latest API and code at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html, which presents an example of movie recommendations using the ALS algorithm.