BPR [137]: is is a Bayesian personalized ranking model, which trains on pairwise items
by maximizing the difference between the posterior probability of the positive samples and
the negative ones.
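Concretely, BPR maximizes the BPR-Opt criterion of [137], where $i$ is a clicked (positive) micro-video of user $u$, $j$ a sampled negative one, $\hat{x}_{u,\cdot}$ the predicted preference score, $\sigma$ the sigmoid function, and $\lambda_{\Theta}$ the regularization weight:

$$\max_{\Theta}\; \sum_{(u,i,j)} \ln \sigma\!\left(\hat{x}_{u,i} - \hat{x}_{u,j}\right) - \lambda_{\Theta}\,\lVert\Theta\rVert^{2}.$$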
CNN-R: is model is a CNN-based recommendation system, which utilizes the CNN
structure to model sequential information. In particular, it first applies different convo-
lutional kernels to the sequential feature matrix. Explicitly, the window size varies from
one to ten, and each kernel size has 32 linear filters. ereafter, it feeds the obtained fea-
ture map into the max pooling layer followed by a fully connected layer to obtain interest
embedding. Finally, a MLP is followed to predict the click probability.
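A minimal PyTorch sketch of this baseline follows; only the window sizes and filter counts are stated above, so the feature and hidden dimensions and the exact MLP head are assumptions.

```python
import torch
import torch.nn as nn

class CNNRec(nn.Module):
    """Sketch of CNN-R: parallel 1-D convolutions with window sizes
    1..10 (32 filters each), global max pooling, a fully connected
    layer for the interest embedding, and an MLP head."""
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(feat_dim, 32, kernel_size=w) for w in range(1, 11)
        )
        self.fc = nn.Linear(32 * 10, hidden_dim)   # interest embedding
        self.mlp = nn.Sequential(                  # click-probability head
            nn.Linear(hidden_dim + feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, seq, item):
        # seq: (batch, seq_len, feat_dim); item: (batch, feat_dim)
        x = seq.transpose(1, 2)                    # (batch, feat_dim, seq_len)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        interest = self.fc(torch.cat(feats, dim=1))
        return self.mlp(torch.cat([interest, item], dim=1)).squeeze(1)
```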
LSTM-R: is model utilizes the LSTM network to model the user’s sequential infor-
mation. Having obtained the hidden states, it feeds them into a fully connected layer to
generate the interest representation, and then a MLP module is adopted to predict the
click probability.
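A matching sketch under the same assumed dimensions; using the final hidden state as the sequence summary is also an assumption, as the text does not specify how the hidden states are aggregated.

```python
import torch
import torch.nn as nn

class LSTMRec(nn.Module):
    """Sketch of LSTM-R: an LSTM over the behavior sequence, a fully
    connected layer for the interest representation, and an MLP head."""
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, seq, item):
        # seq: (batch, seq_len, feat_dim); item: (batch, feat_dim)
        _, (h_n, _) = self.lstm(seq)        # final hidden state per sequence
        interest = self.fc(h_n.squeeze(0))  # interest representation
        return self.mlp(torch.cat([interest, item], dim=1)).squeeze(1)
```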
ATRank [203]: This is an attention-based user behavior modeling framework, which captures the interactions among a user's behaviors in multiple semantic spaces via the self-attention mechanism.
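To make the mechanism concrete, here is a minimal single-space sketch of scaled dot-product self-attention over a behavior sequence; ATRank applies several such projections in parallel, one per semantic space, and the projection matrices here are placeholders.

```python
import torch

def self_attention(H, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a behavior sequence
    H of shape (batch, seq_len, dim); W_q/W_k/W_v project it into
    one semantic space (ATRank uses several in parallel)."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v                   # projections
    scores = Q @ K.transpose(1, 2) / (K.size(-1) ** 0.5)  # scaled scores
    return torch.softmax(scores, dim=-1) @ V              # weighted sum
```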
NCF [60]: This is a collaborative filtering-based deep recommendation model, which learns the user and item embeddings with a shallow network (the element-wise product of the user and item embeddings) and a deep network (their concatenation followed by several MLP layers).
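A compact sketch of the two paths; for brevity, a single shared embedding table and small layer sizes are used here, which simplifies the original NeuMF design in [60].

```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    """Sketch of NCF: a shallow path (element-wise product of user and
    item embeddings) and a deep path (concatenation followed by MLP
    layers), fused for the final click probability."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim // 2), nn.ReLU(),
        )
        self.out = nn.Linear(dim + dim // 2, 1)  # fuse shallow + deep paths

    def forward(self, u, i):
        p, q = self.user_emb(u), self.item_emb(i)
        shallow = p * q                            # element-wise product
        deep = self.mlp(torch.cat([p, q], dim=1))  # MLP path
        logit = self.out(torch.cat([shallow, deep], dim=1))
        return torch.sigmoid(logit).squeeze(1)
```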
THACIL [28]: This is a self-attention-based method for micro-video recommendation, which utilizes a multi-head self-attention layer to capture the long-term correlations within user behaviors, and item- and category-level attention layers to model the fine-grained profile of the user's interest.
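The multi-head step can be illustrated with PyTorch's built-in module; the dimensions here are assumptions, and the item- and category-level attention layers are omitted, so this only shows the long-term correlation modeling.

```python
import torch
import torch.nn as nn

# Multi-head self-attention over a user's behavior sequence: each head
# attends in its own subspace, and the head outputs are concatenated.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
behaviors = torch.randn(8, 100, 64)  # (batch, seq_len, feat_dim)
contextualized, weights = attn(behaviors, behaviors, behaviors)
print(contextualized.shape)          # torch.Size([8, 100, 64])
```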
It is worth mentioning that THACIL and ATRank utilize the same click probability prediction layer as our model. As for the other methods, including CNN-R, LSTM-R, BPR, and NCF, we fed the interest representation and the embedding of the new micro-video into an MLP layer to predict the click probability.
6.5.3 OVERALL COMPARISON
We conducted an empirical study to investigate whether our proposed model can achieve better recommendation performance. The results of all methods on the two datasets are summarized in Table 6.1. Several observations stand out.
BPR performs worse than the other baselines since it overlooks the sequential characteristic of the users' interest information. It hence fails to exploit the users' dynamic interests, revealing the necessity of modeling the historical sequence.
Table 6.1: Performance comparison between our proposed model and several state-of-the-art baselines over Datasets III-1 and III-2. Statistical significance over AUC between ALPINE and the best baseline (i.e., THACIL) is determined by a t-test (▲ denotes p-value < 0.01).

Methods  |        Dataset III-1         |        Dataset III-2
         |  AUC    P@50   R@50   F@50   |  AUC    P@50   R@50   F@50
---------+------------------------------+-----------------------------
BPR      | 0.595   0.290  0.387  0.331  | 0.583   0.241  0.181  0.206
LSTM-R   | 0.713   0.316  0.420  0.360  | 0.641   0.277  0.205  0.236
CNN-R    | 0.719   0.312  0.413  0.356  | 0.650   0.287  0.214  0.245
ATRank   | 0.722   0.322  0.426  0.367  | 0.660   0.297  0.221  0.253
NCF      | 0.724   0.320  0.420  0.364  | 0.672   0.316  0.225  0.262
THACIL   | 0.727   0.325  0.429  0.369  | 0.684   0.324  0.234  0.269
ALPINE   | 0.739▲  0.331  0.436  0.376  | 0.713▲  0.300  0.460  0.362
Sequential modeling methods, including LSTM-R, CNN-R, ATRank, and THACIL, surpass the BPR model. This verifies the effectiveness of sequence modeling. Moreover, the self-attention-based models, i.e., ATRank and THACIL, outperform CNN-R and LSTM-R, especially the latter. This reveals that simply utilizing the LSTM network is insufficient to capture the users' dynamic and diverse interest information from a very long sequence. The attention mechanism can implicitly reduce the memorization length by focusing on the key interest information, which is why ATRank and THACIL achieve better performance on both datasets.
While NCF does not model the users' historical information as a sequence, it still achieves promising performance compared with the other baselines. This is probably because maintaining a user embedding matrix and updating it during training improves the interest representation. Moreover, its two operations, the element-wise product and several MLP layers, better model the relationship between users and items.
ALPINE achieves the best performance, substantially surpassing all the baselines. Particularly, ALPINE presents consistent improvements over sequential models like ATRank and THACIL, verifying the importance of memorizing the users' prior interests and of employing the temporal graph-based LSTM network to enhance the interest representation. In addition, our proposed ALPINE exceeds NCF, because NCF randomly initializes the user matrix rather than exploring the users' multi-level interest information. This justifies the effectiveness of our proposed multi-level interest modeling module. Moreover, ALPINE also characterizes the users' uninterested cues, which can further improve the recommendation performance.