134 6. MULTIMODAL SEQUENTIAL LEARNING
• BPR [137]: This is a Bayesian personalized ranking model, which trains on pairwise items
by maximizing the difference between the posterior probability of the positive samples and
the negative ones.
• CNN-R: This model is a CNN-based recommendation system, which utilizes the CNN
structure to model sequential information. In particular, it first applies different
convolutional kernels to the sequential feature matrix. Specifically, the window size varies from
one to ten, and each kernel size has 32 linear filters. Thereafter, it feeds the obtained
feature map into a max pooling layer followed by a fully connected layer to obtain the interest
embedding. Finally, an MLP predicts the click probability.
• LSTM-R: This model utilizes the LSTM network to model the user’s sequential
information. Having obtained the hidden states, it feeds them into a fully connected layer to
generate the interest representation, and then an MLP module is adopted to predict the
click probability.
• ATRank [203]: It is an attention-based user behavior modeling framework, which cap-
tures the user’s behavior interactions in multiple semantic spaces by the self-attention
mechanism.
• NCF [60]: It is a collaborative filtering-based deep recommendation model, which learns
the user embedding and the item embedding with a shallow network (element-wise prod-
uct between user and item) and a deep network (concatenation of the user and item em-
bedding followed by several MLP layers).
• THACIL [28]: It is a self-attention-based method for micro-video recommendation,
which utilizes a multi-head self-attention layer to capture the long-term correlation
within user behaviors and a two-level (item and category) attention layer to model the
fine-grained profiling of the user interest.
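To make the CNN-R feature extraction step concrete, the following is a minimal sketch of multi-window convolution followed by max pooling over the sequence. It follows only what is stated above (window sizes from one to ten, 32 linear filters per window size). The function names, the random weight initialization, the toy input dimensions, and the omission of the fully connected layer and MLP are assumptions made for illustration, not the implementation used in the experiments.

```python
import random


def make_kernels(dim, max_window=10, n_filters=32, seed=0):
    """Build random linear filters: for each window size w, n_filters vectors of length w * dim.

    The uniform initialization is an illustrative assumption.
    """
    rng = random.Random(seed)
    return {
        w: [[rng.uniform(-0.1, 0.1) for _ in range(w * dim)] for _ in range(n_filters)]
        for w in range(1, max_window + 1)
    }


def conv_max_pool(seq, kernels):
    """Slide every filter over the sequence and keep its maximum response.

    seq is a list of feature vectors (one per clicked item); it must be at
    least as long as the largest window size.
    """
    pooled = []
    for window, filters in kernels.items():
        for weights in filters:
            responses = []
            for start in range(len(seq) - window + 1):
                # Flatten the window of consecutive item vectors into one patch.
                patch = [x for step in seq[start:start + window] for x in step]
                responses.append(sum(w * x for w, x in zip(weights, patch)))
            pooled.append(max(responses))  # max pooling over sequence positions
    return pooled


# Toy input (hypothetical sizes): 12 clicked items, each an 8-dimensional feature vector.
rng = random.Random(1)
sequence = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(12)]
kernels = make_kernels(dim=8)
features = conv_max_pool(sequence, kernels)
print(len(features))  # 10 window sizes x 32 filters = 320
```

In this sketch the pooled vector has one entry per filter, so its length does not depend on the length of the behavior sequence; a fully connected layer would then map it to the interest embedding.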
It is worth mentioning that THACIL and ATRank utilize the same click probability
prediction layer as our model. As to the other methods including CNN-R, LSTM-R, BPR,
and NCF, we fed the interest representations and the embedding of the new micro-video into
the MLP layer to predict the click probability.
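The shared prediction step for these baselines can be sketched as follows. The text states only that the interest representation and the embedding of the new micro-video are fed into an MLP to predict the click probability; the concatenation of the two vectors, the ReLU hidden activation, the sigmoid output, the layer sizes, and all names below are assumptions chosen for illustration.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes, e.g. [8, 16, 1]."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def click_probability(interest, item, params):
    """Score one (user interest, candidate micro-video) pair.

    Assumed design: concatenate the two vectors, apply ReLU hidden layers,
    and squash the final logit with a sigmoid.
    """
    h = interest + item  # list concatenation of the two embeddings
    for weights, bias in params[:-1]:
        h = [max(0.0, v) for v in dense(h, weights, bias)]
    weights, bias = params[-1]
    logit = dense(h, weights, bias)[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 4-dimensional embeddings for one user and one candidate micro-video.
interest = [0.2, -0.1, 0.4, 0.0]
item = [0.3, 0.3, -0.2, 0.1]
params = init_mlp([8, 16, 1])
p = click_probability(interest, item, params)
print(0.0 < p < 1.0)  # True
```

Using the same head for every baseline means that differences in the reported results come from how each method builds the interest representation rather than from the final scoring layer.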
6.5.3 OVERALL COMPARISON
We conducted an empirical study to investigate whether our proposed model can achieve better
recommendation performance. The results of all methods on two datasets are summarized in
Table 6.1. Several observations stand out.
• BPR performs worse than the other baselines since it overlooks the sequential character-
istic of the users’ interest information. It hence fails to exploit the user’s dynamic interest,
revealing the necessity of modeling the historical sequence.