4. KNOWLEDGE-GUIDED COMPATIBILITY MODELING
In this section, we first introduce the experimental settings and then present the experimental results
together with discussions of the research questions raised above.
4.4.1 EXPERIMENT SETTINGS
In this chapter, we extract the visual and contextual representations of fashion items as follows.
Visual Modality. Regarding the visual modality, similar to our previous work, we adopted
the deep convolutional neural network pre-trained on ImageNet, provided by the Caffe software
package [51], which consists of five convolutional layers followed by three fully connected layers.
We represented the visual modality of each item with the 4096-D output vector of the fc7 layer.
Contextual Modality. In this work, the contextual description of each fashion item refers to
its title and category labels at different granularities. To obtain an effective contextual represen-
tation, instead of traditional linguistic features [106, 107], we adopted the CNN architec-
ture [57], which has achieved compelling performance in various natural language processing
tasks [102]. In particular, we first represented each contextual description as a matrix of con-
catenated word vectors, where each row represents one constituent word, encoded by the
publicly available 300-D word2vec vector. We then deployed a single-channel CNN, con-
sisting of a convolutional layer on top of the concatenated word vectors and a max pooling layer.
In particular, we used four kernels with sizes of 2, 3, 4, and 5, with 100 feature maps each, and
the rectified linear unit (ReLU) as the activation function. Ultimately, we obtained a 400-D
contextual representation for each item.
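The single-channel CNN described above can be sketched as follows. This is a minimal PyTorch sketch under the stated configuration (kernel sizes 2–5, 100 feature maps each, ReLU, max pooling over time, yielding 4 × 100 = 400 dimensions); the class and variable names are ours, not from the original implementation.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Single-channel CNN over stacked word vectors, following the
    configuration described in the text (illustrative sketch)."""

    def __init__(self, embed_dim=300, num_maps=100, kernel_sizes=(2, 3, 4, 5)):
        super().__init__()
        # each kernel spans k consecutive words over the full 300-D embedding
        self.convs = nn.ModuleList(
            nn.Conv2d(1, num_maps, kernel_size=(k, embed_dim))
            for k in kernel_sizes
        )

    def forward(self, x):
        # x: (batch, seq_len, 300) -- the concatenated word2vec vectors
        x = x.unsqueeze(1)                       # add the single channel dim
        pooled = []
        for conv in self.convs:
            h = torch.relu(conv(x)).squeeze(3)   # (batch, 100, seq_len - k + 1)
            pooled.append(h.max(dim=2).values)   # max pool over time -> (batch, 100)
        return torch.cat(pooled, dim=1)          # (batch, 400)

# a title of 10 words, each a 300-D word vector (random stand-in for word2vec)
ctx = TextCNN()(torch.rand(4, 10, 300))          # 400-D contextual representation
```

Max pooling over the time axis makes the representation invariant to the title length, so descriptions of different lengths all map to the same 400-D space.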
We divided the positive pair set S into three chunks: 80% of triplets for training, 10% for
validation, and 10% for testing, denoted as S_train, S_valid, and S_test, respectively. We then
generated the triplet sets D_{S_train}, D_{S_valid}, and D_{S_test} according to Eq. (4.3). For
each positive pair of t_i and b_j, we randomly sampled M bottoms b_k's, and each b_k con-
tributes to a triplet (i, j, k), where b_k ∉ B_i^+ and M is set as 3. We adopted the area under
the ROC curve (AUC) [133] as the evaluation metric. For optimization, we employed stochastic
gradient descent (SGD) [3] with the momentum factor set as 0.9. We adopted the grid search
strategy to determine the optimal values of the two regularization parameters among the values
{10^r | r ∈ {-4, ..., -1}} and [2, 4, 6, 8], respectively. In addition, the mini-batch size, the
number of hidden units, and the learning rate were searched in [32, 64, 128, 256],
[128, 256, 512, 1024], and [0.01, 0.05, 0.1], respectively. The proposed model was fine-tuned
for 40 epochs, and the performance on the testing set was reported. We empirically found that
the proposed model achieves the optimal performance with K = 1 hidden layer of 1,024 hidden
units.
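The negative sampling procedure above can be sketched as follows: for each positive (top, bottom) pair, M = 3 bottoms are drawn from outside the top's positive set B_i^+ to form triplets. This is an illustrative sketch; the data structures and function name are hypothetical, not from the original implementation.

```python
import random

def sample_triplets(positive_pairs, all_bottoms, M=3, seed=0):
    """For each positive pair (top i, bottom j), sample M negative bottoms
    b_k with b_k not in B_i^+, each contributing a triplet (i, j, k).
    (Illustrative sketch with hypothetical data structures.)"""
    rng = random.Random(seed)
    # B_i^+ : the set of bottoms positively paired with top i
    positives = {}
    for i, j in positive_pairs:
        positives.setdefault(i, set()).add(j)
    triplets = []
    for i, j in positive_pairs:
        candidates = [b for b in all_bottoms if b not in positives[i]]
        for k in rng.sample(candidates, M):
            triplets.append((i, j, k))
    return triplets

# toy example: 3 positive pairs over 8 candidate bottoms
pairs = [("t1", "b1"), ("t1", "b2"), ("t2", "b3")]
bottoms = [f"b{n}" for n in range(1, 9)]
triplets = sample_triplets(pairs, bottoms, M=3)
# each positive pair yields M = 3 triplets, so 9 triplets in total
```

Excluding the whole positive set B_i^+ (rather than only the current b_j) ensures no sampled "negative" bottom is actually a known match for the top, which would otherwise inject label noise into the training triplets.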
We first experimentally verified the convergence of the proposed learning scheme. Fig-
ure 4.4 shows the changes of the objective function in Eq. (4.5) and the training AUC over the
iterations of our algorithm. As can be seen, both values change rapidly within the first few epochs
and then gradually stabilize, which demonstrates the convergence of our model.