Using Keras for movie recommendations

In this section, we will use Keras as our deep learning framework to build the models. Keras can be installed with either pip (pip install keras) or conda (conda install -c conda-forge keras). Before building the neural networks, we must first understand our data. The MovieLens dataset consists of roughly 100,000 samples and 4 variables:

  • userId: A numeric index corresponding to a specific user
  • movieId: A numeric index corresponding to a specific movie
  • rating: A value on a 0-5 scale, given in half-star steps (the lowest rating in the data is 0.5)
  • timestamp: The specific time when the user rated the movie

A sample from the dataset is shown in the following table. As is evident, the dataset is sorted by the userId column. If we split the data in this order, the training and test sets would contain different users, which can create overfitting problems in our models. Thus, we will shuffle the data before any split happens. Furthermore, we will not use the timestamp variable in our models, as we do not care about the order in which the movies were rated:

userId   movieId   rating   timestamp
1        1         4        964982703
1        3         4        964981247
1        6         4        964982224
1        47        5        964983815
1        50        5        964982931

A sample from the dataset
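
The preparation steps described above (dropping the timestamp and shuffling before any split) can be sketched with pandas. This is a minimal illustration rather than the book's own code: the helper name load_and_shuffle and the seed parameter are assumptions made for this example, and the CSV text simply repeats the five rows of the sample table. With the full dataset, you would pass the path to the ratings file to the same helper.

```python
import io

import pandas as pd

# The five rows from the sample table, as CSV text. With the real data,
# a file path can be passed to pd.read_csv instead of this in-memory text.
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,1,4,964982703
1,3,4,964981247
1,6,4,964982224
1,47,5,964983815
1,50,5,964982931
"""


def load_and_shuffle(source, seed=0):
    """Read ratings, drop the unused timestamp column, and shuffle the rows.

    `load_and_shuffle` and `seed` are hypothetical names for this sketch.
    """
    ratings = pd.read_csv(source)
    ratings = ratings.drop(columns=["timestamp"])
    # frac=1.0 draws every row exactly once, that is, a random permutation.
    shuffled = ratings.sample(frac=1.0, random_state=seed)
    # Discard the old row labels so the index runs 0..n-1 again.
    return shuffled.reset_index(drop=True)


shuffled = load_and_shuffle(io.StringIO(SAMPLE_CSV))
print(shuffled)
```

Fixing the seed keeps the shuffle reproducible, so that a later train/test split yields the same rows on every run.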

Looking at the distribution of ratings in the following graph, we can see that most movies were rated at 3.5, which is above the middle of the rating scale (2.5). Furthermore, the distribution has a left tail, indicating that most users are generous with their ratings. Indeed, the first quartile of the ratings spans from 0.5 to 3, while the remaining 75% of the ratings lie in the 3-5 range. In other words, users give only about 1 in 4 movies a rating of 3 or lower:

Ratings' distribution
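
Summary statistics such as the first quartile and the per-value counts behind a plot like this can be computed with pandas. The sketch below is only an illustration of the calculation: the values in toy_ratings are made up for this example and are not taken from MovieLens, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy ratings for illustration only; NOT real MovieLens values.
toy_ratings = pd.Series([0.5, 2.0, 3.0, 3.0, 3.5, 3.5, 4.0, 4.0, 4.5, 5.0])

# The first quartile: the value below which about 25% of the ratings fall
# (pandas interpolates linearly between neighbouring values by default).
first_quartile = toy_ratings.quantile(0.25)

# Fraction of ratings at or below the first quartile. Ties at the quartile
# value can push this above 25%, as happens with this toy data.
share_at_most_q1 = (toy_ratings <= first_quartile).mean()

# Number of occurrences of each rating value, ordered by rating. These
# counts are what a bar chart of the distribution would display.
counts = toy_ratings.value_counts().sort_index()

print("first quartile:", first_quartile)
print("share of ratings <= first quartile:", share_at_most_q1)
print(counts)
```

On the real data, the same three calls would be applied to the rating column of the ratings table.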