6.4. MULTIMODAL SEQUENTIAL LEARNING 131
where $x_t$ is the micro-video embedding at time step $t$, $h_{t-1}$ and $c_{t-1}$ are, respectively, the hidden state and memory cell at time step $t-1$, linked by the edge $\langle v_{t_c-1}^{c}, v_{t_c}^{c} \rangle$, and $h'$ and $c'$ are the hidden state and memory cell at time step $t'$, linked by the edge $\langle v_{t'}^{c'}, v_{t_c}^{c} \rangle$. Therefore, our temporal graph-based LSTM network can simultaneously leverage the user's neighbor and cross-time interest context information to enhance the memorization of diverse interests and further strengthen the interest representation. We can then obtain the user's interested feature sequence $F_{in} = [h_{in,1}, h_{in,2}, \ldots, h_{in,m_c}] \in \mathbb{R}^{d_c \times m_c}$, where $d_c$ is the dimension of each hidden state in $F_{in}$.
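Setting the graph-specific edges aside, the bookkeeping of this layer can be sketched with a plain LSTM cell: each micro-video embedding in turn updates the hidden state and memory cell, and the stacked hidden states form $F_{in} \in \mathbb{R}^{d_c \times m_c}$. Note this is a generic LSTM, not the temporal graph-based variant described above (which additionally merges cross-time neighbor states), and all dimensions are illustrative.

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step; the four gates are stacked as [i, f, o, g]."""
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[:d]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[d:2*d]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*d:3*d]))   # output gate
    g = np.tanh(z[3*d:])                    # candidate memory
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_c, m_c = 8, 4, 5   # embedding dim, hidden dim, sequence length (illustrative)
W = rng.normal(scale=0.1, size=(4 * d_c, d_in))
U = rng.normal(scale=0.1, size=(4 * d_c, d_c))
b = np.zeros(4 * d_c)

xs = [rng.normal(size=d_in) for _ in range(m_c)]  # micro-video embeddings x_1..x_{m_c}
h, c = np.zeros(d_c), np.zeros(d_c)
hidden_states = []
for x in xs:
    h, c = lstm_cell(x, h, c, W, U, b)
    hidden_states.append(h)

# Stack the hidden states column-wise: F_in has shape (d_c, m_c).
F_in = np.stack(hidden_states, axis=1)
print(F_in.shape)  # (4, 5)
```

The uninterested sequence $F_{un}$ below is produced the same way by a second layer of the same type, with its own parameters.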
As the user’s uninterested points are also dynamic and diverse, we build another tempo-
ral graph-based LSTM layer to model the user’s U
n
sequence and then obtain the uninterested
feature sequence of the user, i.e., F
un
D Œh
un;1
; h
un;2
; : : : ; h
un;m
n
2 R
d
n
m
n
, where d
n
is the di-
mension of each hidden state in F
un
.
6.4.2 THE MULTI-LEVEL INTEREST MODELING LAYER
Since there are multiple kinds of interactions between a user and a micro-video, and they reflect different degrees of the user's interest, we propose a multi-level interest modeling layer to further obtain an enhanced interest representation. As the "like" and "follow" behaviors indicate stronger interest than the "click" behavior, we utilize the "like" and "follow" information to enhance the interest representation. Particularly, for the user $u$, we set the weighted sum of the micro-video representations in $U_l$ and $U_f$ as the user's enhanced interest feature $f_{en}$, formulated as
$$f_{en} = w_l \sum_{t_l=1}^{m_l} x_{t_l}^{l} + w_f \sum_{t_f=1}^{m_f} x_{t_f}^{f}, \qquad (6.3)$$
where $x_{t_l}^{l}$ is the embedding of micro-video $v_{t_l}^{l}$ in $U_l$, $x_{t_f}^{f}$ is the embedding of micro-video $v_{t_f}^{f}$ in $U_f$, and $w_l$ and $w_f$ are the hyperparameters controlling the weights between "like" and "follow."
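Equation (6.3) is simply a weighted sum of the embeddings from the two behavior sequences. A minimal sketch, with illustrative dimensions and weight values:

```python
import numpy as np

rng = np.random.default_rng(1)
m_l, m_f, D = 3, 2, 4             # sizes of U_l and U_f, embedding dim (illustrative)
X_l = rng.normal(size=(m_l, D))   # embeddings x^l of micro-videos the user "liked"
X_f = rng.normal(size=(m_f, D))   # embeddings x^f of micro-videos the user "followed"
w_l, w_f = 0.6, 0.4               # hyperparameters weighting "like" vs. "follow"

# Eq. (6.3): f_en = w_l * sum_t x^l_t + w_f * sum_t x^f_t
f_en = w_l * X_l.sum(axis=0) + w_f * X_f.sum(axis=0)
```

Since $w_l$ and $w_f$ are hyperparameters rather than learned weights, they would typically be tuned on a validation set.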
With the enhanced interest representation $f_{en}$, we can construct an embedding matrix $U \in \mathbb{R}^{N \times D}$, i.e., the user matrix, where $N$ and $D$, respectively, denote the number of users and the dimension of the enhanced interest representations. As the user's "like" and "follow" information more precisely indicates the user's interest, we can obtain more accurate interest representations using the user matrix. The user matrix $U$ is updated in the training phase. Moreover, for each user, we utilize an embedding lookup strategy to retrieve the user's enhanced interest representation from the matrix $U$ during the training and testing phases.
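The embedding lookup amounts to gathering rows of the user matrix by user id. A small sketch with hypothetical sizes (in a real model $U$ would be a trainable parameter, e.g., an `nn.Embedding` in PyTorch, updated by backpropagation):

```python
import numpy as np

N, D = 6, 4                       # number of users, representation dim (illustrative)
rng = np.random.default_rng(2)
U_mat = rng.normal(size=(N, D))   # user matrix: row u holds user u's f_en

def lookup(user_ids):
    """Embedding lookup: gather the enhanced interest rows for a batch of users."""
    return U_mat[np.asarray(user_ids)]

batch = lookup([0, 3, 3])         # same id twice -> same row twice
print(batch.shape)  # (3, 4)
```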
6.4.3 THE PREDICTION LAYER
Standing on the shoulders of the user's interested feature sequence $F_{in}$, uninterested feature sequence $F_{un}$, and enhanced interest representation $f_{en}$, we place a prediction layer to obtain the click probability for the given micro-video $v_{new}$, as shown in Figure 6.4. Specifically, we first feed $F_{in}$ and the embedding of the given micro-video, $x_{new}$, into a vanilla attention layer to obtain the