Modeling tweet topics

In machine learning and natural language processing, a topic model is a type of statistical model used to discover the abstract topics that occur in a collection of documents. A good example or use case to illustrate this concept is Twitter. Suppose we could analyze an individual's (or an organization's) tweets to discover any overriding trend. Let's look at a simple example.

If you have a Twitter account, you can perform this exercise pretty easily (you can then apply the same process to an archive of tweets you want to focus on and/or model). First, we need to create a tweet archive file.

Under Settings, you can submit a request to receive your tweets in an archive file. Once it's ready, you'll get an email with a link to download it. Save the file locally.

Now that we have a data source to work with, we can move the tweets into a list object (we'll call it x) and then convert that into an R data frame object (df1):
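The original listing is not available here, so the following is only a minimal sketch of this step. The sample tweets and the field names `id` and `text` are hypothetical stand-ins for whatever your archive file contains; in practice you would first read the archive into `x`.

```r
# Hypothetical sample tweets standing in for the contents of the archive file.
# The field names (id, text) are assumptions, not the archive's actual schema.
x <- list(
  list(id = 1, text = "heading south to the carolinas this weekend"),
  list(id = 2, text = "the carolinas never disappoint"),
  list(id = 3, text = "south of here the weather is warmer")
)

# Turn each list element into a one-row data frame, then bind the rows together
df1 <- do.call(rbind, lapply(x, function(tweet) {
  data.frame(id = tweet$id, text = tweet$text, stringsAsFactors = FALSE)
}))

# Show the structure of the resulting data frame
str(df1)
```

Keeping `stringsAsFactors = FALSE` leaves the tweet text as plain character strings, which is the form text-mining functions generally expect.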

With the tweets in a data frame, we can use the R tm package to convert them to a Corpus object (a collection of text documents):
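One way this conversion might look with the tm package is sketched below. The data frame contents and the variable name `c1` are hypothetical; `Corpus()` and `VectorSource()` are tm functions, and the sketch assumes tm is installed.

```r
library(tm)

# Hypothetical data frame of tweets; the text column name is an assumption
df1 <- data.frame(
  text = c("heading south to the carolinas this weekend",
           "the carolinas never disappoint",
           "south of here the weather is warmer"),
  stringsAsFactors = FALSE
)

# Treat each element of the text column as one document in the corpus
c1 <- Corpus(VectorSource(df1$text))

# Print a summary of the corpus
print(c1)
```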

Next, we convert the Corpus to a Document-Term Matrix object with the following code. This creates a mathematical matrix that describes the frequency of terms that occur in a collection of documents, in this case, our collection of tweets:
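As a rough illustration of this step (not necessarily the original listing), tm's `DocumentTermMatrix()` builds the matrix directly from a corpus. The sample tweets and variable names below are hypothetical.

```r
library(tm)

# Hypothetical tweets standing in for the archive contents
tweets <- c("heading south to the carolinas this weekend",
            "the carolinas never disappoint",
            "south of here the weather is warmer")

corpus <- Corpus(VectorSource(tweets))

# Rows are documents (tweets), columns are terms, cells are term counts
dtm <- DocumentTermMatrix(corpus)

# Display the dimensions and a sample of the matrix
inspect(dtm)
```

Each row corresponds to one tweet and each column to one term, so a cell holds the number of times that term appears in that tweet.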

Word clouding

After building a document-term matrix (shown earlier), we can more easily show the importance of the words found within our tweets with a word cloud (also known as a tag cloud). We can do this using the R package wordcloud:
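A word cloud needs a frequency for each term. One plausible way to get it from a document-term matrix is to sum each column, as sketched here; the sample tweets and the name `freq` are hypothetical.

```r
library(tm)

# Hypothetical tweets standing in for the archive contents
tweets <- c("heading south to the carolinas this weekend",
            "the carolinas never disappoint",
            "south of here the weather is warmer")

dtm <- DocumentTermMatrix(Corpus(VectorSource(tweets)))

# Sum each term's column to get its total count across all tweets,
# then sort so the most frequent terms come first
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

head(freq)
```

Converting to a dense matrix with `as.matrix()` is fine for a small tweet archive; for a very large corpus a sparse column sum would likely be a better choice.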

Finally, let's generate the word cloud visual:
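A minimal sketch of the plotting call, assuming the wordcloud package is installed. The frequency values are invented purely for illustration and are not taken from the tweets discussed here.

```r
library(wordcloud)

# Hypothetical term frequencies; in practice these come from the
# document-term matrix built from your own tweets
freq <- c(south = 12, carolinas = 10, beach = 4, weekend = 3, travel = 2)

# Fix the random seed so the layout is reproducible between runs
set.seed(42)

# Draw the cloud: word size scales with frequency, and
# random.order = FALSE places the most frequent words near the center
wordcloud(words = names(freq), freq = freq, min.freq = 1,
          random.order = FALSE)
```

Lowering `min.freq` from its default of 3 matters for small samples, since otherwise rarely used words are dropped from the plot.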

Seems like there may be a theme here! The word cloud shows that south and carolinas are the most prominent words, meaning they occur most often in the tweets.
