Practical example – creating a recommendation engine

Let's build a recommendation engine that recommends movies to a set of users. We will use data compiled by the GroupLens research group at the University of Minnesota.

 

Follow these steps:

  1. First, we will import the relevant packages:
import pandas as pd 
import numpy as np
  1. Now, let's load the reviews and movie titles datasets:
df_reviews = pd.read_csv('reviews.csv')
df_movie_titles = pd.read_csv('movies.csv',index_col=False)
  1. We merge the two DataFrames by the movie ID:
df = pd.merge(df_reviews, df_movie_titles, on='movieId')

The header of the df DataFrame, after running the preceding code, looks like the following:

The details of the columns are as follows:

    • userId: The unique ID of each user
    • movieId: The unique ID of each movie
    • rating: The rating given to the movie, from 1 to 5
    • timestamp: The timestamp when the movie was rated
    • title: The title of the movie
    • genres: The genre of the movie
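The load-and-merge steps above can be tried on a small scale. The following is a minimal sketch on toy data: the two DataFrames are hypothetical stand-ins for reviews.csv and movies.csv, and the values in them are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for reviews.csv and movies.csv.
df_reviews = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
    "timestamp": [964982703, 964981247, 964982224, 964983815],
})
df_movie_titles = pd.DataFrame({
    "movieId": [10, 20],
    "title": ["Movie A (2000)", "Movie B (2001)"],
    "genres": ["Action", "Comedy"],
})

# pd.merge performs an inner join by default, so the review of movieId 30,
# which has no entry in the titles table, is dropped from the result.
df = pd.merge(df_reviews, df_movie_titles, on="movieId")
print(df)
```

Because the join is an inner join, reviews that reference a movie missing from the titles table silently disappear, so it can be worth comparing len(df) with len(df_reviews) after merging.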
  1. To look into the summary trends of the input data, let's compute the mean and count of ratings per movie by grouping on the title column and aggregating the rating column.
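One way to compute this summary is sketched below on toy data. The names df_ratings and number_of_ratings are taken from a later step in this example; the toy DataFrame and the exact form of the groupby calls are assumptions, not necessarily the code originally used here.

```python
import pandas as pd

# Hypothetical toy version of the merged df DataFrame.
df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2],
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
})

# Mean rating per movie, indexed by title.
df_ratings = pd.DataFrame(df.groupby("title")["rating"].mean())

# Number of ratings each movie received, aligned on the same title index.
df_ratings["number_of_ratings"] = df.groupby("title")["rating"].count()
print(df_ratings)
```

The count matters as much as the mean: a movie with a perfect average from two ratings is weaker evidence than one with a slightly lower average from hundreds.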

  1. Let's now prepare data for the recommendation engine. For that, we will transform the dataset into a matrix, which will have the following characteristics:
    • Movie titles will be columns.
    • userId will be the index.
    • Ratings will be the values.

We will use the DataFrame's pivot_table function to do this:

movie_matrix = df.pivot_table(index='userId', columns='title', values='rating')

Note that the preceding code will generate a very sparse matrix: most users have rated only a small fraction of the movies, so most cells are NaN.
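To make the sparsity concrete, here is a minimal sketch on toy data that builds the same kind of matrix and measures the fraction of empty cells. The toy DataFrame and the sparsity variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy version of the merged df DataFrame.
df = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C"],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# One row per user, one column per title; unrated cells become NaN.
# If a user rated the same title more than once, pivot_table would
# average those ratings (its default aggfunc is the mean).
movie_matrix = df.pivot_table(index="userId", columns="title", values="rating")

# Fraction of user/movie cells that hold no rating.
sparsity = movie_matrix.isna().to_numpy().mean()
print(movie_matrix)
print(f"Missing fraction: {sparsity:.2f}")
```

Even in this tiny example more than half of the cells are empty; on a real ratings dataset the missing fraction is typically far higher.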

  1. Now, let's use the matrix that we have created to recommend movies. For that, let's consider a particular user who has watched the movie Avatar (2009). First, we will find all of the users who have rated Avatar (2009):
Avatar_user_rating = movie_matrix['Avatar (2009)']
Avatar_user_rating = Avatar_user_rating.dropna()
Avatar_user_rating.head()
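  1. Now, let's try to suggest the movies that correlate with Avatar (2009). For that, we will calculate the correlation of the Avatar_user_rating Series with each column of movie_matrix, as follows:
# Correlate every movie's ratings column with the Avatar ratings
similar_to_Avatar = movie_matrix.corrwith(Avatar_user_rating)
corr_Avatar = pd.DataFrame(similar_to_Avatar, columns=['correlation'])
# Drop movies that share too few raters with Avatar to compute a correlation
corr_Avatar.dropna(inplace=True)
# df_ratings holds the per-movie rating counts from the earlier summary step
corr_Avatar = corr_Avatar.join(df_ratings['number_of_ratings'])
corr_Avatar.head()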

This produces the following output:

This means that movies with a high correlation to Avatar (2009), particularly those backed by a large number of ratings, can be used as recommendations for the user.
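The correlation step can be turned into a ranked list with a little more work. The sketch below is an assumption about a plausible next step, not something shown in the text: it runs the same corrwith approach on toy data, then filters by a hypothetical min_ratings threshold and sorts by correlation. The movie "A" plays the role of Avatar (2009).

```python
import pandas as pd

# Hypothetical toy ratings; "A" plays the role of Avatar (2009).
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "title":  ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B"],
    "rating": [5.0, 5.0, 1.0, 4.0, 4.0, 2.0, 2.0, 2.0, 4.0, 1.0, 1.0],
})

# Number of ratings per movie, used to filter out thinly rated titles.
number_of_ratings = (
    df.groupby("title")["rating"].count().rename("number_of_ratings")
)

# User-by-movie matrix and the ratings of the target movie.
movie_matrix = df.pivot_table(index="userId", columns="title", values="rating")
target = movie_matrix["A"].dropna()

# Correlate every movie column with the target movie's ratings.
corr = movie_matrix.corrwith(target).to_frame(name="correlation").dropna()
corr = corr.join(number_of_ratings)

# Keep movies with enough ratings, drop the target itself, rank by correlation.
min_ratings = 4  # hypothetical threshold, tune for the real dataset
recommendations = (
    corr[corr["number_of_ratings"] >= min_ratings]
    .drop(index="A")
    .sort_values("correlation", ascending=False)
)
print(recommendations)
```

Filtering on the rating count guards against spurious correlations: a movie rated by only a handful of the same users can show a perfect correlation by chance.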
