6.4. MULTIMODAL SEQUENTIAL LEARNING 131
where $x_t$ is the micro-video embedding at the time step $t$; $h_{t-1}$ and $c_{t-1}$ are, respectively, the hidden state and memory cell at the time step $t-1$, linked by the edge $\langle v^{c}_{t_c-1}, v^{c}_{t_c}\rangle$; and $\bar{h}$ and $\bar{c}$ are the hidden state and memory cell at the time step $\bar{t}$, linked by the edge $\langle v^{\bar{c}}_{t_{\bar{c}}}, v^{c}_{t_c}\rangle$. Therefore, our temporal graph-based LSTM network can simultaneously leverage the user's neighbor and cross-time interest context information to enhance the memorization of diverse interests and further strengthen the interest representation. We can then obtain the user's interested feature sequence $F_{in} = [h_{in,1}, h_{in,2}, \ldots, h_{in,m_c}] \in \mathbb{R}^{d_c \times m_c}$, where $d_c$ is the dimension of each hidden state in $F_{in}$.
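To make the shapes concrete, the following sketch runs a plain (non-graph) LSTM over a toy sequence of micro-video embeddings and stacks the hidden states into $F_{in} \in \mathbb{R}^{d_c \times m_c}$. The neighbor and cross-time graph edges of the chapter's temporal graph-based variant are omitted, and all dimensions, weights, and names here are illustrative assumptions rather than the authors' actual configuration.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One vanilla LSTM step (the chapter's graph edges are omitted)."""
    z = W @ x + U @ h_prev + b                 # pre-activations, shape (4*d_c,)
    i, f, o, g = np.split(z, 4)                # input, forget, output gates + candidate
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)  # updated memory cell
    h = sig(o) * np.tanh(c)                    # updated hidden state
    return h, c

rng = np.random.default_rng(0)
d_c, d_x, m_c = 8, 16, 5                       # hidden dim, embedding dim, sequence length
W = rng.normal(0, 0.1, (4 * d_c, d_x))
U = rng.normal(0, 0.1, (4 * d_c, d_c))
b = np.zeros(4 * d_c)

X = rng.normal(size=(m_c, d_x))                # micro-video embeddings x_1 .. x_{m_c}
h, c = np.zeros(d_c), np.zeros(d_c)
H = []
for x in X:
    h, c = lstm_step(x, h, c, W, U, b)
    H.append(h)
F_in = np.stack(H, axis=1)                     # F_in in R^{d_c x m_c}
print(F_in.shape)                              # (8, 5)
```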
As the user's uninterested points are also dynamic and diverse, we build another temporal graph-based LSTM layer to model the user's $U_n$ sequence and then obtain the uninterested feature sequence of the user, i.e., $F_{un} = [h_{un,1}, h_{un,2}, \ldots, h_{un,m_n}] \in \mathbb{R}^{d_n \times m_n}$, where $d_n$ is the dimension of each hidden state in $F_{un}$.
6.4.2 THE MULTI-LEVEL INTEREST MODELING LAYER
Since there are multiple interactions between a user and a micro-video and they reflect different degrees of the user's interest, we propose a multi-level interest modeling layer to further obtain the enhanced interest representation. As the “like” and “follow” behaviors indicate users' stronger interest compared with the “click” one, we hence utilize the “like” and “follow” information to enhance the interest representation. Particularly, for the user $u$, we set the weighted sum of micro-video representations in $U_l$ and $U_f$ as the user's enhanced interest feature $f_{en}$, formulated as
$$ f_{en} = w_l \sum_{t_l=1}^{m_l} x^{l}_{t_l} + w_f \sum_{t_f=1}^{m_f} x^{f}_{t_f}, \qquad (6.3) $$
where $x^{l}_{t_l}$ is the embedding of micro-video $v^{l}_{t_l}$ in $U_l$, $x^{f}_{t_f}$ is the embedding of micro-video $v^{f}_{t_f}$ in $U_f$, and $w_l$ and $w_f$ are the hyperparameters controlling the weights between “like” and “follow.”
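Eq. (6.3) is simply a behavior-weighted sum of embeddings. A minimal NumPy sketch with illustrative sizes (the matrices `U_like` and `U_follow` and the values of $w_l$, $w_f$ are assumptions for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                     # embedding dimension (illustrative)
U_like   = rng.normal(size=(3, D))         # embeddings x^l of "like"d micro-videos
U_follow = rng.normal(size=(2, D))         # embeddings x^f of "follow"ed micro-videos
w_l, w_f = 0.6, 0.4                        # assumed hyperparameter weights

# Eq. (6.3): f_en = w_l * sum_t x^l_t + w_f * sum_t x^f_t
f_en = w_l * U_like.sum(axis=0) + w_f * U_follow.sum(axis=0)
print(f_en.shape)                          # (16,)
```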
With the enhanced interest representation $f_{en}$, we can construct an embedding matrix $U \in \mathbb{R}^{N \times D}$, i.e., the user matrix, where $N$ and $D$, respectively, denote the number of users and the dimension of the enhanced interest representations. As the users' “like” and “follow” information more precisely indicates the users' interest, we can obtain more accurate interest representations using the user matrix. The user matrix $U$ is updated in the training phase. Moreover, for each user, we utilize an embedding lookup strategy to retrieve the user's enhanced interest representation from the matrix $U$ during the training and testing phases.
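The embedding lookup amounts to row indexing into the user matrix. A small sketch with assumed sizes (in practice $U$ would be a learned parameter, not random):

```python
import numpy as np

N, D = 100, 16                             # number of users, representation dimension (assumed)
U = np.random.default_rng(1).normal(size=(N, D))   # user matrix, updated during training

user_ids = np.array([3, 41, 7])            # a batch of user indices
f_en_batch = U[user_ids]                   # embedding lookup: one row per user
print(f_en_batch.shape)                    # (3, 16)
```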
6.4.3 THE PREDICTION LAYER
Standing on the shoulders of the user's interested feature sequence $F_{in}$, the uninterested feature sequence $F_{un}$, and the enhanced interest representation $f_{en}$, we place a prediction layer to obtain the click probability of the given micro-video $v_{new}$, as shown in Figure 6.4. Specifically, we first feed $F_{in}$ and the embedding of the given micro-video, $x_{new}$, into a vanilla attention layer to obtain the improved interested representation $f_{in}$.

[Figure 6.4: Structure of the prediction layer. The interested feature sequence and the uninterested feature sequence each pass through a vanilla attention layer; their outputs and the enhanced interest representation are fed into MLPs, whose scores are combined into the final prediction $\hat{y}$.]

Formally, the attention layer is defined as follows:
$$ \begin{cases} \alpha_j = \dfrac{\exp\left(f\left(h_{in,j}, x_{new}\right)\right)}{\sum_{j=1}^{m_c} \exp\left(f\left(h_{in,j}, x_{new}\right)\right)}, \\[4pt] f\left(h_{in,j}, x_{new}\right) = h_{in,j}^{T} W x_{new}, \end{cases} \qquad (6.4) $$
where $h_{in,j} \in \mathbb{R}^{d_c}$, $x_{new} \in \mathbb{R}^{D}$, $W \in \mathbb{R}^{d_c \times D}$, and $\alpha_j$ denotes the attention score of the $j$-th interested feature in $F_{in}$. With the attention weight $\alpha_j$, the improved interested representation is computed as follows:
$$ f_{in} = \sum_{j=1}^{m_c} \alpha_j h_{in,j}. \qquad (6.5) $$
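Eqs. (6.4) and (6.5) together are a standard bilinear attention followed by a weighted sum. A NumPy sketch with assumed dimensions (the max-subtraction in the softmax is a routine numerical-stability step, not part of the original formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, D, m_c = 8, 16, 5                     # assumed dimensions
F_in  = rng.normal(size=(d_c, m_c))        # interested feature sequence [h_{in,1..m_c}]
x_new = rng.normal(size=D)                 # embedding of the candidate micro-video
W     = rng.normal(0, 0.1, (d_c, D))       # bilinear attention weight matrix

scores = F_in.T @ (W @ x_new)              # f(h_{in,j}, x_new) = h^T W x_new, Eq. (6.4)
alpha  = np.exp(scores - scores.max())
alpha /= alpha.sum()                       # softmax attention weights, sum to 1
f_in   = F_in @ alpha                      # Eq. (6.5): f_in = sum_j alpha_j h_{in,j}
print(f_in.shape)                          # (8,)
```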
Thereafter, we concatenate the improved interested representation $f_{in}$ and the representation of the new micro-video, $x_{new}$, and then feed the result into a multi-layer perceptron (MLP) network, as follows:
$$ \begin{cases} f_1 = \sigma\left(W_1 \left[f_{in}; x_{new}\right] + b_1\right), \\[2pt] \hat{y}_{in} = W_2 f_1 + b_2, \end{cases} \qquad (6.6) $$
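Eq. (6.6) is a one-hidden-layer MLP over the concatenation $[f_{in}; x_{new}]$. A sketch with assumed sizes, taking $\sigma$ to be the logistic sigmoid (the hidden width $d_1$ and all weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_c, D, d_1 = 8, 16, 32                    # assumed dims: f_in, x_new, MLP hidden width
f_in  = rng.normal(size=d_c)
x_new = rng.normal(size=D)

W1 = rng.normal(0, 0.1, (d_1, d_c + D)); b1 = np.zeros(d_1)
W2 = rng.normal(0, 0.1, (1, d_1));       b2 = np.zeros(1)

# Eq. (6.6): f_1 = sigma(W1 [f_in; x_new] + b1), y_hat_in = W2 f_1 + b2
f1 = sigmoid(W1 @ np.concatenate([f_in, x_new]) + b1)
y_hat_in = (W2 @ f1 + b2)[0]               # scalar interest score for the candidate
```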