Vector embeddings and neural networks

It's time to combine vector embeddings and neural networks to, hopefully, create more robust models. Let's do a simple extension to our workflow by training a neural network as a classifier. We start with the now familiar preprocessing and vector embedding:

library(plyr)
library(dplyr)
library(text2vec)
library(tidytext)
library(caret)
library(tokenizers)
library(stopwords)   # provides the stopwords() used below

imdb <- read.csv("./data/labeledTrainData.tsv",
                 encoding = "utf-8",
                 quote = "",
                 sep = "\t",
                 stringsAsFactors = FALSE)

tokens <- tokenize_words(imdb$review, stopwords = stopwords())
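
A quick look at the first tokenized review lets us check that the stop words were indeed removed (this check is optional):

head(tokens[[1]], 10)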

# Boring boilerplate preprocessing
token_iterator <- itoken(tokens)
vocab <- create_vocabulary(token_iterator)
vocab <- prune_vocabulary(vocab, term_count_min = 5L)
vectorizer <- vocab_vectorizer(vocab)

# Create context and embedding
tcm <- create_tcm(token_iterator, vectorizer, skip_grams_window = 5L)

# Note: newer versions of text2vec (>= 0.6) use `rank` instead of
# `word_vectors_size` and drop the `vocabulary` argument
glove <- GlobalVectors$new(word_vectors_size = 50,
                           vocabulary = vocab,
                           x_max = 10)
wv_main <- glove$fit_transform(tcm,
                               n_iter = 10,
                               convergence_tol = 0.01)
text <- unlist(imdb$review)
text_df <- tibble(line = 1:length(text), text = text)
text_df <- text_df %>%
  unnest_tokens(word, text)
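
Before moving on, it is worth sanity-checking the embedding, for instance by looking at the nearest neighbours of a word under cosine similarity. The query word "awful" below is just an illustrative choice:

# Nearest neighbours of an example word in the fitted embedding
word_vec <- wv_main["awful", , drop = FALSE]
cos_sim <- sim2(x = wv_main, y = word_vec, method = "cosine", norm = "l2")
head(sort(cos_sim[, 1], decreasing = TRUE), 10)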

Let's now also bring in the context vectors, adding them to the main word vectors instead of using the main vectors alone:

wv_context <- glove$components
wv <- as.data.frame(wv_main + t(wv_context))
wv$word <- row.names(wv)
df <- wv %>% inner_join(text_df, by = "word")

We then average the word vectors within each review to create the training matrix:

# Average the 50-dimensional word vectors over each review
df <- df %>%
  group_by(line) %>%
  summarize_all(mean) %>%
  select(1:51)
df$label <- as.factor(imdb$sentiment)
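
Before modeling, a quick sanity check of the training data (one row per review, 50 features plus the label, and the class balance) does not hurt:

# One row per review: 50 averaged embedding dimensions plus the label
dim(df)
table(df$label)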

And finally, we create a baseline neural network model with a single hidden layer:

library(keras)

X <- df[, 2:51]   # the 50 averaged embedding dimensions
y <- df[, 52]     # the sentiment label

# The 0/1 factor is coerced to integer codes 1/2, so to_categorical yields an
# unused leading column; keep only the two columns that encode our labels
y <- to_categorical(y[["label"]])
y <- y[, 2:3]

model <- keras_model_sequential()
model %>%
  layer_dense(units = 20, activation = 'relu', input_shape = c(50)) %>%
  layer_dense(units = 2, activation = 'softmax')

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
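
Before fitting, summary gives a quick view of the layer shapes and parameter counts:

# Inspect the architecture and the number of trainable parameters
summary(model)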

history <- model %>% keras::fit(
  as.matrix(X), y,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)

Calling the fit method triggers the training procedure. Once the training is done, we can see how it went by using the plot function:

plot(history)

This gives us slightly disappointing results:

Figure: Performance of our single-layer neural network with 20 neurons

Why is that disappointing? Well, our mighty neural network with 20 neurons did not improve upon, say, random forests. How can we improve this result? 

It seems that neural networks, or at least feed-forward neural networks, are not of much help in this case: we were not able to find significant improvements by adding more layers. If anything, results worsened with three or more layers, and we even observed overfitting.
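
For reference, a deeper variant looks like the sketch below; the layer sizes are illustrative rather than the exact configurations we tried, and as noted above, such models did not improve on the single hidden layer here:

# A deeper feed-forward variant (illustrative layer sizes)
deeper_model <- keras_model_sequential()
deeper_model %>%
  layer_dense(units = 64, activation = 'relu', input_shape = c(50)) %>%
  layer_dense(units = 32, activation = 'relu') %>%
  layer_dense(units = 16, activation = 'relu') %>%
  layer_dense(units = 2, activation = 'softmax')

deeper_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)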

Occam's razor should prevail. The model you keep should be as simple as possible. A complicated model might be of little use in production because of the technical complications needed to deploy it. Furthermore, when we have no way of interpreting the model, unexpected results might happen when the model is applied to slightly different data from that used for training and testing.

We can save both our vector embedding and model for later use:

write.csv(wv, "./data/wv.csv", row.names = FALSE)
save_model_hdf5(model,"glove_nn.hdf5")
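
Later on, both can be loaded back and used to score reviews. Here is a minimal sketch, assuming new text has already been turned into the same 50-dimensional averaged representation (we simply reuse X for illustration):

# Reload the embedding and the trained network
wv <- read.csv("./data/wv.csv", stringsAsFactors = FALSE)
model <- load_model_hdf5("glove_nn.hdf5")

# Predicted class probabilities and hard labels (0 = negative, 1 = positive)
probs <- model %>% predict(as.matrix(X))
pred_class <- max.col(probs) - 1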