where $W_1 \in \mathbb{R}^{d'_c \times (d_c + D)}$ and $W_2 \in \mathbb{R}^{1 \times d'_c}$ denote the weight matrices, $b_1 \in \mathbb{R}^{d'_c}$ and $b_2$, respectively, denote the bias vector and the bias value, and $\delta$ denotes the ReLU activation function. $\hat{y}_{in}$ is the click probability calculated from the improved interested representation $f_{in}$.
Similarly, we can obtain the improved uninterested representation $f_{un}$ based on $F_{un}$ and $x_{new}$ using another vanilla attention layer. Afterward, we feed the concatenation of the improved uninterested representation $f_{un}$ and the new micro-video embedding $x_{new}$ into two MLP layers, and obtain the click probability based on the improved uninterested representation, i.e., $\hat{y}_{un}$. Analogously, the click probability based on the enhanced interest representation, i.e., $\hat{y}_{en}$, can be obtained by feeding the concatenation of the enhanced interest representation $f_{en}$ and the new micro-video embedding $x_{new}$ into two MLP layers.
Finally, the weighted sum of the above three probability values is set as our prediction result,
$$\hat{y} = \alpha_1 \hat{y}_{in} + \alpha_2 \hat{y}_{un} + \alpha_3 \hat{y}_{en}, \tag{6.7}$$
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the hyper-parameters controlling the weights of $\hat{y}_{in}$, $\hat{y}_{un}$, and $\hat{y}_{en}$, respectively, and $\hat{y}$ is the final output of our model, denoting the click probability of the given user on the given new micro-video.
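A one-line sketch of the fusion in Eq. (6.7); the concrete $\alpha$ values below are placeholders, since the chapter leaves them as tunable hyper-parameters, and the random tensors stand in for the outputs of the three heads above.

```python
import torch

# Placeholder weights; alpha_1..alpha_3 are tuned hyper-parameters in the chapter.
alpha_1, alpha_2, alpha_3 = 0.4, 0.3, 0.3

# y_in, y_un, y_en would come from the three MLP heads; random stand-ins here.
y_in, y_un, y_en = torch.rand(8), torch.rand(8), torch.rand(8)

y_hat = alpha_1 * y_in + alpha_2 * y_un + alpha_3 * y_en  # Eq. (6.7)
```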
Our method is trained as an end-to-end deep learning model equipped with the sigmoid cross-entropy loss:
$$\mathcal{L}(\hat{y}) = -\big( y \log\left(\sigma(\hat{y})\right) + (1-y)\log\left(1-\sigma(\hat{y})\right) \big), \tag{6.8}$$
where $\sigma$ denotes the sigmoid activation function and $y \in \{0, 1\}$ is the ground truth indicating whether the user clicked the micro-video or not. Besides, the back-propagation through time (BPTT) method is adopted to train our ALPINE model.
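Equation (6.8) is the standard binary cross-entropy with the sigmoid folded in. In PyTorch it corresponds to `binary_cross_entropy_with_logits`, which fuses the sigmoid and the log terms for numerical stability; a minimal sketch (the function name is ours):

```python
import torch
import torch.nn.functional as F

def sigmoid_cross_entropy(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Eq. (6.8): -(y*log(sigmoid(y_hat)) + (1-y)*log(1-sigmoid(y_hat))), averaged."""
    return F.binary_cross_entropy_with_logits(y_hat, y.float())
```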
6.5 EXPERIMENTS
6.5.1 EXPERIMENTAL SETTINGS
Implementation Details. For Dataset III-1, we utilized the 64-d visual embedding to represent each micro-video. For Dataset III-2, the concatenation of the 64-d category embedding and the 64-d visual embedding is set as the micro-video embedding. The length of each user's historical sequence is set to 300: if it exceeds 300, we truncated it; otherwise, we padded it to 300 and masked the padding in the network. We optimized the parameters using Adam with an initial learning rate of 0.001 and a batch size of 2,048.
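A minimal sketch of the sequence preprocessing described above. The function name and the padding id are ours, and since the chapter does not say which end of an over-long history is cut, keeping the most recent items is an assumption.

```python
import torch

MAX_LEN = 300  # history length used in the chapter

def pad_or_truncate(history: list, max_len: int = MAX_LEN):
    """Return (ids, mask): ids padded/truncated to max_len; mask is 1 on real items."""
    history = history[-max_len:]  # assumption: keep the most recent clicks
    n_pad = max_len - len(history)
    ids = torch.tensor(list(history) + [0] * n_pad, dtype=torch.long)  # 0 = assumed pad id
    mask = torch.tensor([1] * len(history) + [0] * n_pad, dtype=torch.bool)
    return ids, mask
```

The stated optimizer settings would then correspond to, e.g., `torch.optim.Adam(model.parameters(), lr=0.001)` with batches of 2,048 such sequences.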
6.5.2 BASELINES
To demonstrate the effectiveness of our proposed ALPINE model, we compared it with the
following state-of-the-art methods.