Classifying tweets in real time

We can use our model to classify tweets in real time through Twitter's API. To simplify things, we will use tweepy (https://github.com/tweepy/tweepy), a popular wrapper library for the API, which can be installed with pip install tweepy. The first step to accessing Twitter programmatically is to generate the relevant credentials. To do so, navigate to https://apps.twitter.com/ and select Create an app. The application process is straightforward, and applications should be accepted quickly.

Using tweepy's StreamListener, we will define a class that listens for incoming tweets and, as soon as they arrive, classifies them and prints the original text along with the predicted polarity. As a classifier, we will use the voting ensemble we trained earlier. First, we load the required libraries: json, because tweets are received in JSON format; parts of the tweepy library; and the scikit-learn components we used earlier. We also store our API keys in variables:

import json

import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import MultinomialNB
# StreamListener was removed in tweepy 4.0, so this code needs tweepy 3.x
from tweepy import OAuthHandler, Stream, StreamListener

# Please fill in your API keys as strings
consumer_key = "HERE"
consumer_secret = "HERE"
access_token = "HERE"
access_token_secret = "AND HERE"
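Hardcoding keys in a script makes them easy to leak if the file is shared. As an alternative, the following sketch reads the four credentials from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are hypothetical choices for illustration, not something tweepy requires.

```python
import os

# Hypothetical environment-variable names; rename them to suit your setup
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Read the four Twitter credentials from a mapping (the environment by default).

    Raises RuntimeError naming every missing variable, so a misconfigured
    shell fails before any connection to Twitter is attempted.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if var not in environ]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

With the variables exported in your shell, creds = load_credentials() followed by consumer_key = creds["consumer_key"] (and likewise for the other three) replaces the placeholder strings.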

Next, we create and fit our TfidfVectorizer with 30,000 features and n-grams in the [1, 3] range, and train the VotingClassifier on the first 10,000 samples:

# Load the data
data = pd.read_csv('sent140_preprocessed.csv')
data = data.dropna()
# Replicate our voting classifier for 30,000 features and 1-3 n-grams
train_size = 10000
tf = TfidfVectorizer(max_features=30000, ngram_range=(1, 3),
                     stop_words='english')
tf.fit(data.text)
transformed = tf.transform(data.text)
# Train on the first train_size samples only
x_data = transformed[:train_size].toarray()
y_data = data.polarity[:train_size].values
voting = VotingClassifier([('LR', LogisticRegression()),
                           ('NB', MultinomialNB()),
                           ('Ridge', RidgeClassifier())])
voting.fit(x_data, y_data)
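Since no voting argument is passed, the VotingClassifier uses its default hard voting: each base model predicts a label, and the majority label wins. (Soft voting would not work here anyway, because RidgeClassifier does not provide predict_proba.) The following pure-Python sketch illustrates that combination rule. The hard_vote function is a hypothetical helper, not part of scikit-learn; ties go to the smallest label, which is how scikit-learn's documentation describes its tie-breaking (ascending sort order).

```python
from collections import Counter


def hard_vote(per_model_predictions):
    """Combine label predictions from several models by majority vote.

    per_model_predictions holds one list of labels per model, each with one
    label per sample. Returns one combined label per sample.
    """
    n_samples = len(per_model_predictions[0])
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i
        votes = Counter(preds[i] for preds in per_model_predictions)
        top = max(votes.values())
        # Break ties by choosing the smallest of the most-voted labels
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


# Three models (think LR, NB, Ridge) voting on three samples
print(hard_vote([[0, 4, 4], [0, 0, 4], [4, 0, 4]]))  # [0, 0, 4]
```

With three voters and two classes, a tie is impossible, so every tweet receives a clear majority label.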

We then define our StreamClassifier class, which is responsible for listening for incoming tweets and classifying them as they arrive. It inherits from tweepy's StreamListener class. By overriding the on_data method, we can process tweets as they arrive through the stream. Tweets arrive in JSON format, so we first parse them with json.loads(data), which returns a dictionary, and then extract the text using the "text" key. We then extract the features with the fitted vectorizer and use them to predict the tweet's polarity:

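# Define the streaming classifier
class StreamClassifier(StreamListener):
    def __init__(self, classifier, vectorizer, api=None):
        super().__init__(api)
        self.clf = classifier
        self.vec = vectorizer

    # What to do when a message arrives
    def on_data(self, data):
        # Parse the raw JSON string into a dictionary
        json_format = json.loads(data)
        # Skip stream messages that are not tweets (e.g. deletion notices)
        if 'text' not in json_format:
            return True
        # Get the tweet's text
        text = json_format['text']
        # Vectorize with the fitted TfidfVectorizer and predict the polarity
        features = self.vec.transform([text]).toarray()
        print(text, self.clf.predict(features))
        # Returning False here would stop the stream
        return True

    # If an error occurs, print the status code
    def on_error(self, status):
        print(status)
        # 420 signals rate limiting; returning False disconnects the stream
        if status == 420:
            return False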
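The parsing step inside on_data can be tried offline, without a Twitter connection. The sketch below is a hypothetical extract_text helper that assumes the v1.1 streaming payload layout: non-tweet messages lack a "text" key, and tweets longer than 140 characters are marked truncated, with their full text under extended_tweet["full_text"]. The sample messages in the test are made up for illustration.

```python
import json


def extract_text(raw_message):
    """Return the text of a raw streaming message, or None if it is not a tweet.

    Assumes the v1.1 streaming layout: deletion or limit notices have no
    "text" key, and long tweets keep their full text in extended_tweet.
    """
    message = json.loads(raw_message)
    if "text" not in message:
        return None
    extended = message.get("extended_tweet")
    # Prefer the untruncated text when the payload provides it
    if message.get("truncated") and extended and "full_text" in extended:
        return extended["full_text"]
    return message["text"]
```

Inside on_data, text = extract_text(data) followed by an early return True when text is None would replace the key lookup and the guard.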

Finally, we instantiate StreamClassifier, passing the trained voting ensemble and the TfidfVectorizer as arguments, and authenticate using OAuthHandler. To start the stream, we instantiate a Stream object with the OAuthHandler and StreamClassifier objects as parameters and define the keywords we want to track with filter(track=['Trump']). In this case, we track tweets that contain the keyword 'Trump':

# Create the classifier and authentication handlers
classifier = StreamClassifier(classifier=voting, vectorizer=tf)
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Track tweets that contain specific keywords
stream = Stream(auth, classifier)
stream.filter(track=['Trump'])

That's it! The preceding code now tracks every tweet containing the keyword Trump and predicts its sentiment in real time. The following table shows some sample tweets that were classified:

Text | Polarity
RT @BillyBaldwin: Only two things funnier than my brothers impersonation of Trump. Your daughters impersonation of being an honest, decent… | Negative
RT @danpfeiffer: This is a really important article for Democrats to read. Media reports of Trump’s malfeasance is only the start. It's the… | Positive
RT @BillKristol: "In other words, Trump had backed himself, not Mexico, into a corner. They had him. He had to cave. And cave he did. He go… | Positive
RT @SenJeffMerkley: That Ken Cuccinelli started today despite not being nominated is unacceptable. Trump is doing an end run around the Sen… | Negative

Example of tweets being classified
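The listener prints the raw output of predict, which is an array holding a numeric label, while the table above uses the names Negative and Positive. A small formatting step bridges the two. In this sketch, format_row and the label mapping are hypothetical: the original Sentiment140 data encodes negative as 0 and positive as 4, but if your preprocessed CSV uses different values, adjust POLARITY_NAMES accordingly.

```python
# Hypothetical mapping based on the original Sentiment140 encoding
# (0 = negative, 4 = positive); adjust it to match your preprocessed data
POLARITY_NAMES = {0: "Negative", 4: "Positive"}


def format_row(text, predicted, names=POLARITY_NAMES):
    """Format a tweet and its prediction as a 'text | polarity' row."""
    # predict() returns one label per input; we classify one tweet at a time
    label = predicted[0]
    # Fall back to the raw label if it is not in the mapping
    return f"{text} | {names.get(label, str(label))}"
```

Calling print(format_row(text, self.clf.predict(features))) in on_data would then print rows in the same layout as the table.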