The Temporal Graph-Based LSTM Layer


to the hybrid models, they aim to combine the above two methods within a unified framework. For example, the recommendation model presented in [201] generates multiple ranking lists by exploring different information sources in a multi-task framework. However, traditional video recommendation models assume that users’ interest is static, so they cannot capture users’ dynamic interest.

Recently, many models have been proposed to characterize users’ dynamic preferences. These methods fall into three categories: CNN-based methods [152, 157], recurrent neural network (RNN) based methods [52, 132], and self-attention based methods [28, 203]. As a typical example of the first category, Tuan et al. [157] utilized 3-D CNNs to combine session clicks and content features to generate recommendations. As for RNN-based methods, Quadrana et al. [132] proposed an RNN-based approach for session-based recommendation, which relays and evolves the latent hidden states of the RNNs across user sessions. In [52], the authors proposed a dynamic RNN to model users’ dynamic interest for personalized video recommendation. To overcome the high time consumption and long-sequence restriction of the above methods, the self-attention mechanism has been applied to recommender systems and has achieved impressive performance. For example, Zhou et al. [203] proposed an attention-based user behavior model that considers heterogeneous user behaviors in e-commerce. Although the aforementioned methods have considered users’ dynamic interest and have been successfully applied to video communities, they are inadequate for micro-video communities because of their different characteristics. In particular, micro-video communities continuously route micro-videos to users, and users click the ones that interest them after previewing the thumbnails, whereas traditional video communities tend to display videos of interest based on users’ query information. In addition, users’ interest information in micro-video communities has a multi-level structure.

6.4 MULTIMODAL SEQUENTIAL LEARNING

To address the aforementioned problems, in this chapter, we develop an end-to-end temporAL graPh-guIded recommeNdation systEm, dubbed ALPINE, to route micro-videos. The scheme of our proposed approach is illustrated in Figure 6.2. Specifically, to model users’ diverse and dynamic interest, we encode users’ click history into a graph where each node refers to a micro-video in the click history and the edge between two nodes stands for their temporal relationship. Based upon this graph, we design a novel long short-term memory (LSTM) network to learn users’ interest representation. Afterward, we estimate the click probability by calculating the similarity between the users’ interest representation and the embedding of the given micro-video. Considering that users’ interest is multi-level, we introduce a user matrix to enhance the user interest modeling by incorporating their “like” and “follow” information. At this step, we obtain a second click probability with respect to users’ more precise interest information. Analogously, since we know the sequence of users’ disliked micro-videos, another temporal graph-based LSTM is built to characterize users’ uninterested information, so that a third click probability, regarding users’ uninterested information, can be estimated based on true negative samples. Finally, the weighted sum of the above three probability scores is set as our final prediction result.

Figure 6.2: Illustration of our proposed ALPINE model. Panel labels: Interested Feature Sequence, Uninterested Feature Sequence, Temporal Graph LSTM (two instances), Enhanced Interest Representation, Multi-level Interest, Item Embedding, Prediction Layer.
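The scoring and fusion steps described above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the ALPINE implementation: the three representations are taken as given vectors (the temporal graph-based LSTM that would produce them is not modeled here), similarity is assumed to be an inner product passed through a sigmoid, the uninterested score is assumed to be one minus the match with the uninterested representation, and the function names, toy vectors, and fusion weights are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def click_probability(interest: Sequence[float], item: Sequence[float]) -> float:
    """Score one micro-video embedding against one interest representation.

    Assumption: similarity is the inner product, squashed by a sigmoid.
    """
    if len(interest) != len(item):
        raise ValueError("interest and item vectors must have the same length")
    return sigmoid(sum(a * b for a, b in zip(interest, item)))


def fuse_scores(
    p_interest: float,
    p_enhanced: float,
    p_uninterested: float,
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # hypothetical weights
) -> float:
    """Weighted sum of the three click-probability scores."""
    w1, w2, w3 = weights
    return w1 * p_interest + w2 * p_enhanced + w3 * p_uninterested


# Toy vectors standing in for learned representations (hypothetical values).
item = [0.5, -0.2, 0.1]
interest = [1.0, 0.0, 2.0]        # from the "interested" sequence
enhanced = [0.8, 0.1, 1.5]        # after adding "like"/"follow" information
uninterested = [-1.0, 0.5, 0.0]   # from the "uninterested" sequence

p1 = click_probability(interest, item)
p2 = click_probability(enhanced, item)
# Assumption: a strong match with the uninterested representation lowers the score.
p3 = 1.0 - click_probability(uninterested, item)
final = fuse_scores(p1, p2, p3)
print(f"p1={p1:.3f} p2={p2:.3f} p3={p3:.3f} final={final:.3f}")
```

Because the example weights sum to one and each score lies in (0, 1), the fused value also stays in that range; in a trained model the weights would be tuned or learned rather than fixed by hand.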

Let $v$ and $u$ denote a micro-video and a user, respectively. We present the user’s historical information as a sequence of micro-videos $U = \{(u, v_t^{j})\}_{t=1}^{m}$, where $j \in \{c, n, l, f\}$ represents the user’s “click,” “not click,” “like,” and “follow” behaviors, respectively, and $m$ is the length of the sequence. As the user’s interest is multi-level, the sequential behaviors can be segmented into four sub-sequences, namely the “click” sequence $U^{c} = \{(u, v_t^{c})\}_{t=1}^{m_c}$, the “not click” sequence $U^{n} = \{(u, v_t^{n})\}_{t=1}^{m_n}$, the “like” sequence $U^{l} = \{(u, v_t^{l})\}_{t=1}^{m_l}$, and the “follow” sequence $U^{f} = \{(u, v_t^{f})\}_{t=1}^{m_f}$, where $m_c + m_n = m$. As such, the micro-video recommendation problem can be formally defined as:
