Demystifying recommendation systems

Although the inner workings of recommendation systems may seem intimidating at first, they are actually quite intuitive. Let's take an example of various movies and users. Each user has the option to rate a movie on a scale of 1 to 5. The recommendation system will try to find users with similar preferences to a new user, and will then recommend movies that the new user will probably like, as similar users also like them. Let's take the following simple example, consisting of four users and six movies:

User	Interstellar	2001: A Space Odyssey	The Matrix	Full Metal Jacket	Jarhead	Top Gun
U0	5	4		2	1
U1		1		4	4	3
U2	4		4			1
U3		4	5	5	4

Ratings for each movie from each user

As is evident, each user has rated a number of movies, although not all users watched the same movies and each user liked different movies. If we want to recommend a movie to user two (U2), we must first find the most similar users. We can then make predictions in a k-Nearest Neighbor (k-NN) fashion, using the K most similar users. Of course, we can see that the user probably likes sci-fi films, but we need a quantitative method to measure it. If we treat each user's preferences as a vector, we have four vectors of six elements. We can then compute the cosine between any two vectors. If the vectors align perfectly, the cosine will be 1, indicating a perfect equality. If the vectors are completely opposite, it will be -1, indicating a perfect disagreement between the two users' preferences. The only problem that arises is the fact that not all movies have been rated by each user. We can fill empty entries with zeros, in order to compute the cosine similarities. The following graph shows the cosine similarities between the users:

Cosine similarities between users

We notice that users U0 and U3 exhibit a high level of similarity with U2. The problem is that U0 also exhibits high similarity with U1, although their ratings are complete opposites. This is due to the fact that we fill any non-rated movie with 0, meaning all users who have not watched a movie agree that they do not like it. This can be remedied by first subtracting the mean of each user's ratings from their ratings. This normalizes the values and centers them around 0. Following this, we assign 0 to any movie the user has not yet rated. This indicates that the user is indifferent toward this movie and the user's mean rating is not altered. By computing the centered cosine similarity, we get the following values:

Centered cosine similarities between users

We can now see that U2 is similar to U0 and U3, while U1 and U0 are quite dissimilar. In order to compute a prediction about movies that U2 has not seen, but that the nearest K neighbors have seen, we will compute the weighted average for each movie, using the cosine similarities as weights. We only do this for movies that all similar users have rated, but that the target user has not rated yet. This gives us the following predicted ratings. If we were to recommend a single movie to U2, we would recommend 2001: A Space Odyssey, a sci-fi film, as we speculated earlier:

Interstellar	2001: A Space Odyssey	The Matrix	Full Metal Jacket	Jarhead	Top Gun
-	4.00	-	3.32	2.32	-

Predicted ratings for user U2

This recommendation method is called collaborative filtering. When we search for similar users, as we did in this small example, it is called user-user filtering. We can also apply this method to search for similar items by transposing the ratings table. This is called item-item filtering, and it usually performs better in real-world applications. This is due to the fact that items usually belong to more well-defined categories, when compared to users. For example, a movie can be an action movie, a thriller, a documentary, or a comedy with little overlap between the genres. A user may like a certain mix of those categories; thus, it is easier to find similar movies, rather than similar users.

Table of Contents for Demystifying recommendation systems

Create new playlist

Sign In

Sign Up

Table of Contents for
Demystifying recommendation systems