Summary

In this chapter, we used an analysis of the word frequencies of tweets, together with the mathematics of eigenvectors, to build a recommendation engine that helps Twitter users find other users similar to themselves. Along the way, we learned that many words in our language add little to the data analysis process, so we filtered these stop words out of our dataset. We then made the jump from bivariate to multivariate data and explored the tools that linear algebra has to offer. Where previously we had worked with simple lists and a single input and output, we now worked with matrices holding many dimensions of data. We organized our multidimensional dataset with a covariance matrix and measured its skewness using the eigenvalue decomposition process. We learned that not all dimensions of data are equally useful, and we weeded out the less useful ones with Principal Component Analysis, the process by which we kept only the top-ranked eigenvalues and their eigenvectors. Using this lower-dimensional dataset, we built a recommendation engine that searched for the users nearest to a given user under the Euclidean squared distance measure. Finally, we tested it by selecting a user and discovering similar users for that member.
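The pipeline summarized above can be sketched end to end in a few lines of NumPy. This is a minimal illustration, not the chapter's actual code: the user rows, word columns, and counts below are invented toy data, and the stop-word list is reduced to a single word.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are word frequencies.
words = ["the", "data", "python", "eigenvector", "matrix"]
stop_words = {"the"}  # words filtered out before analysis

counts = np.array([
    [9, 4, 7, 1, 2],   # user 0
    [8, 5, 6, 2, 1],   # user 1
    [7, 0, 1, 8, 9],   # user 2
    [9, 1, 0, 7, 8],   # user 3
], dtype=float)

# 1. Filter out the stop-word columns.
keep = [i for i, w in enumerate(words) if w not in stop_words]
X = counts[:, keep]

# 2. Center the data and compute the covariance matrix of the word dimensions.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# 3. Eigenvalue decomposition; keep only the top-k eigenvectors (PCA).
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: ascending order for symmetric matrices
order = np.argsort(eigvals)[::-1]        # rank eigenvalues from largest to smallest
k = 2
top = eigvecs[:, order[:k]]

# 4. Project every user into the lower-dimensional space.
Z = Xc @ top

# 5. Recommend: the nearest other user by Euclidean squared distance.
def nearest(user_idx):
    d2 = np.sum((Z - Z[user_idx]) ** 2, axis=1)
    d2[user_idx] = np.inf                # exclude the user themselves
    return int(np.argmin(d2))

print(nearest(0))   # in this toy data, user 1 has the closest word profile
```

Squared Euclidean distance is used rather than the true distance because `argmin` is unchanged by the square root, so skipping it saves a little work per comparison.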
