Taking advantage of the back-propagation strategy, we first calculate $\partial\mathcal{L}_{bpr}/\partial\mathbf{W}_K^{tv}$, $\partial\mathcal{L}_{mod}/\partial\mathbf{W}_K^{tv}$, and $\partial\mathcal{L}_{rec}/\partial\hat{\mathbf{W}}_K^{tv}$ as follows:
\[
\begin{cases}
\dfrac{\partial\mathcal{L}_{bpr}}{\partial\mathbf{W}_K^{tv}} = -\sigma(-m_{ijk})\,\dfrac{\partial(\tilde{\mathbf{v}}_i^t)}{\partial\mathbf{W}_K^{tv}}\,(\tilde{\mathbf{v}}_j^b - \tilde{\mathbf{v}}_k^b),\\[2ex]
\dfrac{\partial\mathcal{L}_{mod}}{\partial\mathbf{W}_K^{tv}} = -\sigma(-z_i^t)\,\dfrac{\partial(\tilde{\mathbf{v}}_i^t)}{\partial\mathbf{W}_K^{tv}}\,\tilde{\mathbf{c}}_i^t,\\[2ex]
\dfrac{\partial\mathcal{L}_{rec}}{\partial\hat{\mathbf{W}}_K^{tv}} = (\hat{\mathbf{v}}_i^t - \mathbf{v}_i^t)\,\dfrac{\partial(\hat{\mathbf{v}}_i^t)}{\partial\hat{\mathbf{W}}_K^{tv}}.
\end{cases}
\tag{3.9}
\]
As $\partial(\tilde{\mathbf{v}}_i^t)/\partial\mathbf{W}_K^{tv}$ and $\partial(\hat{\mathbf{v}}_i^t)/\partial\hat{\mathbf{W}}_K^{tv}$ can be derived from $\hat{\mathbf{v}}_i^t = s(\hat{\mathbf{W}}_K^{tv}\hat{\mathbf{h}}_{K-1}^{tv} + \hat{\mathbf{b}}_K^{tv})$ and $\tilde{\mathbf{v}}_i^t = s(\mathbf{W}_K^{tv}\mathbf{h}_{K-1}^{tv} + \mathbf{b}_K^{tv})$, we can easily access $\partial\mathcal{L}_{bpr}/\partial\mathbf{W}_K^{tv}$, $\partial\mathcal{L}_{mod}/\partial\mathbf{W}_K^{tv}$, and $\partial\mathcal{L}_{rec}/\partial\hat{\mathbf{W}}_K^{tv}$. Then we can iteratively obtain $\partial\mathcal{L}_{bpr}/\partial\mathbf{W}_k^{tv}$ and $\partial\mathcal{L}_{mod}/\partial\mathbf{W}_k^{tv}$, $k = K, \ldots, 1$. Meanwhile, we can obtain $\partial\mathcal{L}_{rec}/\partial\hat{\mathbf{W}}_k^{tv}$ and $\partial\mathcal{L}_{rec}/\partial\mathbf{W}_k^{tv}$, $k = K, \ldots, 1$, in a similar manner. We then employ stochastic gradient descent to optimize the proposed model, where the network parameters can be updated as follows:
\[
\begin{cases}
\mathbf{W}_k^{tv} \leftarrow \mathbf{W}_k^{tv} - \eta\left(\dfrac{\partial\mathcal{L}_{bpr}}{\partial\mathbf{W}_k^{tv}} + \mu\dfrac{\partial\mathcal{L}_{mod}}{\partial\mathbf{W}_k^{tv}} + \gamma\dfrac{\partial\mathcal{L}_{rec}}{\partial\mathbf{W}_k^{tv}} + \lambda\mathbf{W}_k^{tv}\right),\\[2ex]
\hat{\mathbf{W}}_k^{tv} \leftarrow \hat{\mathbf{W}}_k^{tv} - \eta\left(\gamma\dfrac{\partial\mathcal{L}_{rec}}{\partial\hat{\mathbf{W}}_k^{tv}} + \lambda\hat{\mathbf{W}}_k^{tv}\right),
\end{cases}
\tag{3.10}
\]
where $\eta$ is the learning rate, and $\mu$, $\gamma$, and $\lambda$ are the trade-off and regularization coefficients of the overall objective.
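To make the layer-wise update in Eq. (3.10) concrete, here is a minimal NumPy sketch of one update step for a single layer; the gradient arrays, the coefficients mu, gamma, lam, and the learning rate eta stand in for the quantities defined above and are supplied by the caller.

```python
import numpy as np

def update_encoder(W, g_bpr, g_mod, g_rec, eta, mu, gamma, lam):
    """One step of Eq. (3.10) for an encoder matrix W_k^{tv}: all three
    losses plus the regularizer contribute to the gradient."""
    return W - eta * (g_bpr + mu * g_mod + gamma * g_rec + lam * W)

def update_decoder(W_hat, g_rec, eta, gamma, lam):
    """One step of Eq. (3.10) for a decoder matrix W_hat_k^{tv}: only the
    reconstruction loss and the regularizer touch the decoder weights."""
    return W_hat - eta * (gamma * g_rec + lam * W_hat)

# Example with random placeholder gradients:
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 4096))
W = update_encoder(W, *(rng.normal(size=W.shape) for _ in range(3)),
                   eta=0.01, mu=0.1, gamma=0.1, lam=0.001)
```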
3.4 EXPERIMENT
In this part, we conducted extensive experiments to verify our proposed BPR-DAE model on Dataset I by answering the following research questions.
• Does BPR-DAE outperform the state-of-the-art methods?
• What is the contribution of each component of BPR-DAE?
• How does each modality contribute to the compatibility modeling?
3.4.1 EXPERIMENT SETTINGS
In this chapter, we extract the visual and contextual features of fashion items as follows.
Visual Modality. In this work, we took advantage of advanced deep CNNs, which have been proven to be the state-of-the-art models for image representation learning [10, 55, 134]. In particular, we chose the pre-trained ImageNet deep neural network provided by the Caffe software package [51], which consists of five convolutional layers followed by three fully connected layers. We fed the image of each fashion item to the CNN and adopted the fc7 layer output as the visual feature. Therefore, for each item, its visual modality is represented by a 4,096-D vector.
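The chapter's feature extraction was done with Caffe; as an illustrative stand-in, the following sketch obtains an analogous 4,096-D fc7 feature with the pre-trained AlexNet from torchvision, whose five convolutional and three fully connected layers mirror the architecture described above. The image file name is hypothetical.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pre-trained AlexNet: five conv layers followed by fc6, fc7, fc8.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()
# Drop the last classification layer (fc8) so the forward pass stops
# at the 4096-D fc7 activation.
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("fashion_item.jpg").convert("RGB")  # hypothetical file
with torch.no_grad():
    visual_feature = model(preprocess(image).unsqueeze(0)).squeeze(0)
print(visual_feature.shape)  # torch.Size([4096])
```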
Contextual Modality. Considering the short length of such contextual information, we utilized the bag-of-words scheme [50], which has been proven effective in encoding contextual metadata [26]. We first constructed a style vocabulary based on the categories and the words in all the titles in our dataset. As such user-generated metadata is inevitably noisy, we filtered out the categories and words that appeared in fewer than five items, as well as the words with fewer than three characters, which are more likely to be noise. We ultimately obtained a vocabulary of 3,529 phrases, and hence encoded the contextual modality of each fashion item as a 3,529-D Boolean vector.
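A minimal scikit-learn sketch of this encoding: binary=True produces the Boolean presence vector, the token pattern keeps only words of at least three characters, and min_df implements the frequency filter (it is lowered here so the toy corpus survives; the chapter uses a threshold of five items). The example metadata strings are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each "document" concatenates an item's category and title (hypothetical).
item_metadata = [
    "coat wool double-breasted winter coat",
    "dress floral midi summer dress",
    "jeans slim-fit denim blue jeans",
]

vectorizer = CountVectorizer(
    binary=True,                      # Boolean presence/absence, not counts
    token_pattern=r"(?u)\b\w{3,}\b",  # drop words shorter than 3 characters
    min_df=1,                         # set to 5 on the full dataset
)
contextual_features = vectorizer.fit_transform(item_metadata)
print(contextual_features.shape)  # (3, vocabulary size)
```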
We separated the positive pair set $\mathcal{S}$ in Dataset I into three chunks: 80% of the pairs for training, 10% for validation, and 10% for testing, denoted as $\mathcal{S}_{train}$, $\mathcal{S}_{valid}$, and $\mathcal{S}_{test}$, respectively. Then we generated the triplet sets $\mathcal{D}_{\mathcal{S}_{train}}$, $\mathcal{D}_{\mathcal{S}_{valid}}$, and $\mathcal{D}_{\mathcal{S}_{test}}$ according to Eq. (3.5). In particular, for each positive top-bottom pair $(t_i, b_j)$, we randomly sampled $M$ bottoms $b_k$ to construct $M$ triplets $(i, j, k)$, where $b_k \notin \mathcal{B}_i^+$ and $M$ is set as 3.
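A minimal sketch of this negative sampling, assuming positive_pairs holds the (top, bottom) index pairs of S and all_bottoms lists every bottom in the inventory; all names are hypothetical.

```python
import random
from collections import defaultdict

def build_triplets(positive_pairs, all_bottoms, M=3, seed=0):
    """For each positive pair (i, j), sample M bottoms k outside B_i^+
    (the bottoms that have ever been matched with top i)."""
    rng = random.Random(seed)
    matched = defaultdict(set)  # B_i^+ per top i
    for i, j in positive_pairs:
        matched[i].add(j)
    triplets = []
    for i, j in positive_pairs:
        negatives = [k for k in all_bottoms if k not in matched[i]]
        for k in rng.sample(negatives, M):
            triplets.append((i, j, k))
    return triplets
```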
We then adopted the widely used metric AUC (Area Under the ROC Curve) [99], which is defined as
\[
\mathrm{AUC} = \frac{1}{|\mathcal{T}|}\sum_i \frac{1}{|E(i)|}\sum_{(j,k)\in E(i)} \delta(m_{ij} > m_{ik}),
\tag{3.11}
\]
where the evaluation pairs per top $i$ are defined as
\[
E(i) := \{(j,k) \mid (i,j) \in \mathcal{S}_{test} \wedge (i,k) \notin \mathcal{S}\}.
\tag{3.12}
\]
Here $\delta(b)$ is the indicator function that returns one if the argument $b$ is true and zero otherwise.
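The metric can be computed directly from a compatibility scorer; the sketch below assumes score(i, j) returns m_ij, test_pairs is S_test, and all_pairs is the full positive set S, mirroring Eqs. (3.11) and (3.12).

```python
def compute_auc(score, test_pairs, all_pairs, all_bottoms):
    """AUC per Eq. (3.11): for each top i, the fraction of evaluation
    pairs (j, k) in E(i) with m_ij > m_ik, averaged over all tops."""
    tops = {i for i, _ in test_pairs}
    total = 0.0
    for i in tops:
        # E(i) per Eq. (3.12): test positives j, paired with bottoms k
        # that never appear with top i anywhere in S.
        pos = [j for t, j in test_pairs if t == i]
        neg = [k for k in all_bottoms if (i, k) not in all_pairs]
        pairs = [(j, k) for j in pos for k in neg]
        total += sum(score(i, j) > score(i, k) for j, k in pairs) / len(pairs)
    return total / len(tops)
```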
For optimization, we employed stochastic gradient descent (SGD) [3] with the momentum factor set to 0.9. We adopted the grid search strategy to determine the optimal values of the regularization parameters (i.e., $\mu$, $\gamma$, and $\lambda$) among the values $\{10^r \mid r \in \{-5, \ldots, -1\}\}$. In addition, the mini-batch size, the number of hidden units, and the learning rate for all methods were searched in [32, 64, 128, 256, 512, 1024], [128, 256, 512, 1024], and [0.001, 0.01, 0.1], respectively. The proposed model was fine-tuned on the training and validation sets for 30 epochs, and the performance on the testing set was reported. We experimentally found that the proposed model achieves the optimal performance with $K = 1$ hidden layer of 512 hidden units. All the experiments were conducted on a server equipped with four NVIDIA Titan X GPUs.
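The hyperparameter grids above can be enumerated exhaustively; in this sketch, train_and_validate is a hypothetical stand-in for one 30-epoch fine-tuning run that returns the validation AUC.

```python
from itertools import product

def train_and_validate(mu, gamma, lam, batch_size, hidden_units, lr):
    """Hypothetical: train BPR-DAE with these hyperparameters and
    return the AUC on the validation set."""
    return 0.0  # placeholder

reg_grid = [10 ** r for r in range(-5, 0)]  # {10^r | r in {-5, ..., -1}}
batch_grid = [32, 64, 128, 256, 512, 1024]
hidden_grid = [128, 256, 512, 1024]
lr_grid = [0.001, 0.01, 0.1]

best_cfg, best_auc = None, -1.0
for cfg in product(reg_grid, reg_grid, reg_grid,
                   batch_grid, hidden_grid, lr_grid):
    val_auc = train_and_validate(*cfg)
    if val_auc > best_auc:
        best_cfg, best_auc = cfg, val_auc
```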
We first experimentally verified the convergence of the proposed learning algorithm. The changes of the objective function in Eq. (3.8) and the training AUC during one run of the training algorithm are illustrated in Figure 3.4. As we can see, both values change rapidly within the first few epochs and then level off, which demonstrates the convergence of our model.