Let's build a recommendation engine that can recommend movies to a group of users. We will be using data put together by the GroupLens research group at the University of Minnesota.
Follow these steps:
- First, we will import the relevant packages:
import pandas as pd
import numpy as np
- Now, let's load the two datasets: the user ratings (reviews.csv) and the movie titles (movies.csv):
df_reviews = pd.read_csv('reviews.csv')
df_movie_titles = pd.read_csv('movies.csv',index_col=False)
- We merge the two DataFrames on the movie ID:
df = pd.merge(df_reviews, df_movie_titles, on='movieId')
The header of the df DataFrame, after running the preceding code, looks like the following:
The details of the columns are as follows:
- userId: The unique ID of each user
- movieId: The unique ID of each movie
- rating: The rating the user gave the movie, from 1 to 5
- timestamp: The timestamp when the movie was rated
- title: The title of the movie
- genres: The genre of the movie
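As a quick check of these columns, the merge can be tried on two tiny hand-made DataFrames. This is only a sketch: the rows below are invented for illustration and are not taken from the GroupLens data.

```python
import pandas as pd

# Invented stand-ins for reviews.csv and movies.csv (illustrative values only)
df_reviews = pd.DataFrame({
    'userId': [1, 2],
    'movieId': [10, 20],
    'rating': [4.0, 5.0],
    'timestamp': [964982703, 964981247],
})
df_movie_titles = pd.DataFrame({
    'movieId': [10, 20],
    'title': ['Movie A', 'Movie B'],
    'genres': ['Action', 'Comedy'],
})

# Inner join on the shared movieId column
df = pd.merge(df_reviews, df_movie_titles, on='movieId')
print(df.columns.tolist())
```

The merged DataFrame keeps the review columns first, followed by the title and genres columns brought in from the movie table.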
- To look into the summary trends of the input data, let's compute the mean rating and the number of ratings per movie by grouping on the title column and aggregating the rating column.
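The code for this step is not shown here, so the following is a minimal sketch of one way to do it. It assumes the result is stored in a DataFrame named df_ratings with a number_of_ratings column, since both names are used in a later step; the toy df stands in for the merged data and its values are invented.

```python
import pandas as pd

# Invented stand-in for the merged df (illustrative values only)
df = pd.DataFrame({
    'userId': [1, 1, 2, 2, 3],
    'title': ['Avatar (2009)', 'Up (2009)', 'Avatar (2009)',
              'Up (2009)', 'Avatar (2009)'],
    'rating': [4.0, 3.0, 5.0, 4.0, 3.0],
})

# Mean rating per title
df_ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
# Number of ratings per title, aligned on the title index
df_ratings['number_of_ratings'] = df.groupby('title')['rating'].count()
print(df_ratings)
```

Keeping the count next to the mean matters because a high average based on only a handful of ratings is much less trustworthy than one based on many.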
- Let's now prepare data for the recommendation engine. For that, we will transform the dataset into a matrix, which will have the following characteristics:
- Movie titles will be columns.
- userId will be the index.
- Ratings will be the value.
We will use the pivot_table function of the DataFrame to get it done:
movie_matrix = df.pivot_table(index='userId', columns='title', values='rating')
Note that the preceding code will generate a very sparse matrix: each user rates only a small fraction of the movies, so most cells are NaN.
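To make the sparsity concrete, here is a small sketch that builds the same kind of matrix from a few invented ratings and measures the fraction of empty cells. The data and the sparsity variable are illustrative assumptions, not part of the original walkthrough.

```python
import pandas as pd

# Invented ratings: three users, three titles, only four ratings in total
df = pd.DataFrame({
    'userId': [1, 1, 2, 3],
    'title': ['Avatar (2009)', 'Up (2009)', 'Avatar (2009)', 'Heat (1995)'],
    'rating': [4.0, 3.0, 5.0, 2.0],
})

# Users as rows, titles as columns, ratings as values
movie_matrix = df.pivot_table(index='userId', columns='title', values='rating')

# Fraction of user/title pairs that have no rating
sparsity = movie_matrix.isna().to_numpy().mean()
print(movie_matrix)
print(f'Missing cells: {sparsity:.0%}')
```

Even in this tiny example more than half of the cells are empty; on a real ratings dataset the proportion is typically far higher.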
- Now, let's use the matrix that we have created to recommend movies. For that, let's consider a particular user who has watched the movie Avatar (2009). First, we will find all of the users who have rated Avatar (2009):
Avatar_user_rating = movie_matrix['Avatar (2009)']
Avatar_user_rating = Avatar_user_rating.dropna()
Avatar_user_rating.head()
- Now, let's try to suggest movies that correlate with Avatar (2009). For that, we will calculate the correlation of the Avatar_user_rating Series with each column of movie_matrix, as follows:
similar_to_Avatar = movie_matrix.corrwith(Avatar_user_rating)
corr_Avatar = pd.DataFrame(similar_to_Avatar, columns=['correlation'])
corr_Avatar.dropna(inplace=True)
corr_Avatar = corr_Avatar.join(df_ratings['number_of_ratings'])
corr_Avatar.head()
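This produces the following output:
Movies whose ratings correlate strongly with those of Avatar (2009) can be used as recommendations for the user, and the number_of_ratings column indicates how much data supports each correlation.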
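A correlation computed from only a few shared ratings can be misleading, so one common refinement is to keep only movies with enough ratings and then sort by correlation. The sketch below illustrates this on invented data; the top_recommendations helper, its min_ratings threshold, and the toy matrix are assumptions made for illustration and are not part of the original steps.

```python
import numpy as np
import pandas as pd

# Invented ratings matrix: rows are users, columns are titles
movie_matrix = pd.DataFrame(
    {
        'Avatar (2009)': [5.0, 4.0, 2.0, 1.0, np.nan],
        'Movie A': [5.0, 4.0, 2.0, 1.0, np.nan],
        'Movie B': [1.0, 2.0, 4.0, 5.0, 3.0],
        'Movie C': [np.nan, np.nan, np.nan, np.nan, 3.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name='userId'),
)
# Count of non-missing ratings per title
number_of_ratings = movie_matrix.count().rename('number_of_ratings')

# Same steps as above: correlate every title with the Avatar ratings
avatar_ratings = movie_matrix['Avatar (2009)'].dropna()
corr = movie_matrix.corrwith(avatar_ratings).to_frame('correlation')
# Titles with no overlapping raters get NaN and are dropped here
corr = corr.dropna().join(number_of_ratings)


def top_recommendations(corr, seed_title, min_ratings, n=10):
    """Hypothetical helper: filter by rating count, then rank by correlation."""
    candidates = corr.drop(index=seed_title, errors='ignore')
    reliable = candidates[candidates['number_of_ratings'] >= min_ratings]
    return reliable.sort_values('correlation', ascending=False).head(n)


top = top_recommendations(corr, 'Avatar (2009)', min_ratings=4)
print(top)
```

Here Movie C is dropped because none of its raters also rated Avatar (2009), and the seed title itself is excluded so it is not recommended back to the user. A suitable min_ratings value depends on the size of the dataset and would need to be chosen by inspection.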