Creating compelling visualizations using EasyPlot

For this section, I would like to pose a question. As we get older, do we appreciate movies more and rate them higher, appreciate them less and rate them lower, or rate them about the same? What we would like to do is pull together the data, plot it, and see if we can answer that question. So, in this section, we're going to perform a table join to study age against the average movie rating, parse that information into a usable type for plotting, and then plot it. Let's go back to our MovieLens IHaskell notebook and import Graphics.EasyPlot, which is a plotting library.
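The import itself is a single line. As a minimal sketch, assuming the easyplot package is installed and that gnuplot (which EasyPlot wraps) is available on the system:

```haskell
-- Brings plot, the terminal types (for example PNG), Data2D,
-- and options such as Title into scope.
import Graphics.EasyPlot
```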

Because of how I stated the question, this can be done in a single SELECT query. So, let's get started. ageRatings is what we are going to call our result. This is shown in the following example:

ageRatings <- quickQuery db "SELECT users.age, avg(data.rating) FROM data, users WHERE data.userid=users.userid GROUP BY users.age" []

We have run a quickQuery on our database, selecting users.age and the average of data.rating. We're relying on the avg function built into SQLite3 for the averaging. The rows come from data and users, matched where data.userid is equal to users.userid. On its own, that join gives us one row per individual rating, each paired with the age of the user who gave it, so there would be nothing to average for each age. The GROUP BY clause gathers all the ratings from users of the same age into one group, and avg is then computed once per group.
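To make the grouping step concrete, here is a small pure-Haskell sketch of the same idea: collect ratings by age, then average each group. It does not touch the database; the sample data and the averageByAge name are hypothetical, and this is only an illustration of what GROUP BY with avg computes, not how SQLite3 implements it.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sample of joined rows: (user age, rating given by that user).
joined :: [(Int, Double)]
joined = [(25, 4), (25, 3), (40, 5), (40, 4), (40, 3)]

-- Group ratings by age, then average each group, mirroring
-- "GROUP BY users.age" combined with avg(data.rating).
averageByAge :: [(Int, Double)] -> [(Int, Double)]
averageByAge rows =
  [ (age, total / fromIntegral count)
  | (age, (total, count)) <- Map.toList sums ]
  where
    -- For each age, accumulate the running total and the number of ratings.
    sums = Map.fromListWith add [ (age, (r, 1 :: Int)) | (age, r) <- rows ]
    add (t1, c1) (t2, c2) = (t1 + t2, c1 + c2)

main :: IO ()
main = print (averageByAge joined)
-- prints [(25,3.5),(40,4.0)]
```

Each age ends up with exactly one average, which is the shape of result the query returns.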

Next, we need to parse the information out of our ageRatings variable.

We're going to read columns from our ageRatings. But before we can read them directly, we need to massage that dataset a little bit. The query result is a list of rows, so the data we want runs down the columns. You can't just ask for a column in this two-dimensional data structure; you can only ask for a full row. So, what we can do is transpose the data, and then ask for a full row. We pull the first row of the transposed data (index 0) and parse it as a list of Doubles called age. The average ratings are in the second column of ageRatings, and once again you have to transpose and pull a row, this time the second. So, all we do is change that 0 to a 1, and change age to avgRating.
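The transpose-then-read step can be sketched as follows. This is a minimal illustration, assuming the query result has type [[SqlValue]] as returned by HDBC's quickQuery; the columnAsDoubles helper and the sample rows are hypothetical stand-ins, not the code from the original notebook.

```haskell
import Data.List (transpose)
import Database.HDBC (SqlValue, fromSql, toSql)

-- Hypothetical stand-in for the query result: one row per age,
-- each row holding [age, average rating].
sampleRows :: [[SqlValue]]
sampleRows =
  [ [toSql (25 :: Int), toSql (3.5 :: Double)]
  , [toSql (40 :: Int), toSql (4.0 :: Double)]
  ]

-- Hypothetical helper: transpose so that column i becomes row i,
-- then convert every value in that row to a Double.
columnAsDoubles :: Int -> [[SqlValue]] -> [Double]
columnAsDoubles i rows = map fromSql (transpose rows !! i)

main :: IO ()
main = do
  let age       = columnAsDoubles 0 sampleRows  -- first column
      avgRating = columnAsDoubles 1 sampleRows  -- second column
  print age
  print avgRating
```

In the notebook, ageRatings would take the place of sampleRows. Note that the list index operator fails on an empty result, so a more defensive version would check that the query returned rows first.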

Now we need to plot, and for that we will use EasyPlot's plot command.
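A plotting call along these lines might look like the sketch below. The sample values, the output file name, and the choice to wrap the data in Data2D with two empty option lists are all assumptions for illustration; the original command may have been written differently.

```haskell
import Graphics.EasyPlot

-- Hypothetical values standing in for the parsed query columns.
age, avgRating :: [Double]
age       = [20, 30, 40, 50]
avgRating = [3.4, 3.5, 3.6, 3.8]

main :: IO ()
main = do
  -- zip pairs each age with its average rating. Data2D takes a list of
  -- general options, a list of 2D options, and the (x, y) points.
  -- The file name "ageVsRating.png" is an assumption.
  ok <- plot (PNG "ageVsRating.png") (Data2D [] [] (zip age avgRating))
  -- plot returns True when gnuplot ran successfully.
  print ok
```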

So, we have zipped our age and avgRating into a list of pairs, and we have our plot.

We can see a few features here. For instance, the lowest point on the graph appears to be approximately 10-year-olds, with an average rating of less than 3, and the highest rating appears to be approximately 57-year-olds, with an average rating of 4. There's another high point we can see, which is 72-year-olds with a rating of nearly 4. So, we can see that there is an upward slant to the dataset. In other words, based on looking at this graph, as we get older we tend to appreciate movies more. Now, is there any truth to this? Well, to find out we would have to perform something called regression analysis, which is a topic for another section. For now, we're just going to have to speculate based on the direction of the data in this visualization. We could improve this plot by adding a title.
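One way to attach a title, sketched under the same assumptions as before (hypothetical sample values and file name, and a title string of my own choosing), is to put EasyPlot's Title option in the first list given to Data2D and leave the second list empty:

```haskell
import Graphics.EasyPlot

-- Hypothetical values standing in for the parsed query columns.
age, avgRating :: [Double]
age       = [20, 30, 40, 50]
avgRating = [3.4, 3.5, 3.6, 3.8]

main :: IO ()
main = do
  -- The first list carries the Title option; the second list (2D options)
  -- stays blank. Title text and file name are assumptions.
  ok <- plot (PNG "ageVsRatingTitled.png") $
          Data2D [Title "Average Movie Rating by Age"] [] (zip age avgRating)
  print ok
```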

So we have added our title and a blank list, and the title now appears on the graph.

This plot seems to be publication-ready. In our next section, we will discuss kernel density estimation.
