36 4. KNOWLEDGE-GUIDED COMPATIBILITY MODELING
In this section, we first introduce the experiment settings and then provide the experimental results, as well as a discussion of each of the research questions raised above.
4.4.1 EXPERIMENT SETTINGS
In this chapter, we extract the visual and contextual representations of fashion items as follows.
Visual Modality. Regarding the visual modality, similar to our previous work, we adopted the deep neural network pre-trained on ImageNet provided by the Caffe software package [51], which consists of five convolutional layers followed by three fully connected layers. We represented the visual modality of each item with the 4096-D output vector of the fc7 layer.
Contextual Modality. In this work, the contextual description of each fashion item refers to its title and its category labels at different granularities. To obtain an effective contextual representation, instead of the traditional linguistic features [106, 107], we adopted the CNN architecture [57], which has achieved compelling performance in various natural language processing tasks [102]. In particular, we first represented each contextual description as a matrix of stacked word vectors, where each row corresponds to one constituent word, represented by the publicly available 300-D word2vec vector. We then deployed a single-channel CNN, consisting of a convolutional layer on top of the stacked word vectors and a max pooling layer. In particular, we used four kernel sizes (2, 3, 4, and 5) with 100 feature maps each, and the rectified linear unit (ReLU) as the activation function. Ultimately, we obtained a 400-D contextual representation for each item.
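With the hyperparameters just described (kernel sizes 2–5, 100 feature maps each, ReLU, max pooling over time), the contextual encoder can be sketched in NumPy as follows. The function and variable names are illustrative, and the filters here are random rather than learned end to end as in the chapter:

```python
import numpy as np

def text_cnn_features(word_vecs, kernel_sizes=(2, 3, 4, 5), n_maps=100, seed=0):
    """Single-channel text CNN sketch: for each kernel size, convolve over
    windows of consecutive words, apply ReLU, then max-pool over time.
    Filters are random for illustration; the chapter learns them."""
    rng = np.random.default_rng(seed)
    n, d = word_vecs.shape
    pooled = []
    for h in kernel_sizes:
        W = rng.standard_normal((n_maps, h * d)) * 0.01   # n_maps filters of size h x d
        windows = np.stack([word_vecs[i:i + h].ravel()    # all windows of h words
                            for i in range(n - h + 1)])
        c = np.maximum(windows @ W.T, 0.0)                # convolution + ReLU
        pooled.append(c.max(axis=0))                      # max pooling over time
    return np.concatenate(pooled)                         # 4 kernels x 100 maps = 400-D

# a toy "title" of 8 words with 300-D word2vec-style embeddings
title = np.random.default_rng(1).standard_normal((8, 300))
features = text_cnn_features(title)
print(features.shape)  # (400,)
```

Concatenating the four pooled maps of 100 dimensions each yields exactly the 400-D representation described above.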
We divided the positive pair set S into three chunks: 80% of the pairs for training, 10% for validation, and 10% for testing, denoted as S_train, S_valid, and S_test, respectively. We then generated the triplet sets D_S_train, D_S_valid, and D_S_test according to Eq. (4.3). For each positive pair of t_i and b_j, we randomly sampled M bottoms b_k, each of which contributes a triplet (i, j, k), where b_k ∉ B_i^+ and M is set as 3. We adopted the area under the ROC curve (AUC) [133] as the evaluation metric. For optimization, we employed stochastic gradient descent (SGD) [3] with the momentum factor of 0.9. We adopted the grid search strategy to determine the optimal values of the two regularization parameters among the values {10^r | r ∈ {-4, …, -1}} and [2, 4, 6, 8], respectively. In addition, the mini-batch size, the number of hidden units, and the learning rate were searched in [32, 64, 128, 256], [128, 256, 512, 1024], and [0.01, 0.05, 0.1], respectively. The proposed model was fine-tuned for 40 epochs, and the performance on the testing set was reported. We empirically found that the proposed model achieves the optimal performance with K = 1 hidden layer of 1,024 hidden units.
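The triplet generation described in this paragraph (M = 3 sampled bottoms per positive pair, each negative bottom not paired with the given top) can be sketched as follows; the helper names are illustrative, not the authors' code:

```python
import random

def sample_triplets(positive_pairs, all_bottoms, M=3, seed=0):
    """For each positive pair (t_i, b_j), sample M bottoms b_k that are
    not paired with t_i in the training set, yielding triplets (i, j, k).
    Illustrative sketch of the sampling scheme described above."""
    rng = random.Random(seed)
    paired = {}
    for i, j in positive_pairs:
        paired.setdefault(i, set()).add(j)       # B_i^+ : bottoms paired with top i
    triplets = []
    for i, j in positive_pairs:
        negatives = [b for b in all_bottoms if b not in paired[i]]
        for k in rng.sample(negatives, min(M, len(negatives))):
            triplets.append((i, j, k))
    return triplets

pairs = [(0, 10), (0, 11), (1, 12)]              # (top, bottom) positive pairs
triplets = sample_triplets(pairs, list(range(10, 16)))
print(len(triplets))  # 3 positive pairs x M = 9 triplets
```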
We first experimentally verified the convergence of the proposed learning scheme. Figure 4.4 shows the changes of the objective function in Eq. (4.5) and the training AUC within one run of our algorithm. As we can see, both values first change rapidly within a few epochs and then gradually level off, which demonstrates the convergence of our model.
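For reference, one SGD-with-momentum update (momentum factor 0.9, as used in the experiments) can be sketched on a toy quadratic objective. This is a generic illustration of the optimizer, not the chapter's training loop; lr = 0.05 is one value from the search grid above:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.05, momentum=0.9):
    """One SGD update with momentum: accumulate a decayed velocity of past
    gradients and step along it."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy convex objective f(w) = (w - 3)^2 with gradient 2(w - 3)
w, v = 0.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2.0 * (w - 3.0), v)
print(abs(w - 3.0) < 0.1)  # True: the iterate converges to the minimizer
```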
Figure 4.4: Training loss and the AUC curves. (a) Loss vs. epoch; (b) AUC vs. epoch.
4.4.2 ON MODEL COMPARISON (RQ1)
Due to the sparsity of our dataset, to which matrix factorization-based methods [14, 15, 94, 122] are not applicable, we chose the following content-based compatibility-modeling baselines to evaluate the proposed model AKD-DBPR.
POP: We used the popularity of bottom b_j to measure its compatibility with top t_i. Here the popularity is defined as the number of tops that have been paired with b_j in the training set.
RAND: We randomly assigned the compatibility scores m_ij and m_ik to item pairs.
IBR: We chose the image-based recommendation method proposed by [86], which aims to model the compatibility between objects based on their visual appearance. This method learns a latent style space, where the retrieval of related objects can be performed by traditional nearest-neighbor search. Different from our model, this baseline learns the latent space by a simple linear transformation and considers only the visual information of fashion items.
ExIBR: We adopted the extension of IBR in [108], which is able to handle both the visual
and contextual data of fashion items.
BPR-DAE: We selected the content-based neural scheme introduced by [108], which is
capable of jointly modeling the coherent relation between different modalities of fashion
items and the implicit preference among items via a dual autoencoder network.
DBPR: To get a better understanding of our model, we introduced the baseline DBPR, which is derived from our model by removing the guidance of the teacher network, relying solely on the student network.
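The simplest baseline above, POP, reduces to counting how many tops each bottom has been paired with in the training set. A minimal sketch, assuming positive pairs are (top, bottom) index tuples:

```python
from collections import Counter

def pop_scores(train_pairs):
    """POP baseline: the compatibility score of bottom b_j with any top is
    the number of tops it has been paired with in the training set."""
    return Counter(j for _, j in train_pairs)

train = [(0, 10), (1, 10), (2, 10), (0, 11)]   # (top, bottom) positive pairs
pop = pop_scores(train)
print(pop[10], pop[11], pop[12])  # 3 1 0
```

Unseen bottoms receive a score of zero, which is why POP degenerates toward random ranking on sparse data.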
Since we can choose either the distilled student network p or the teacher network q with
a final projection according to Eq. (4.7) for the testing, we introduced two derivations of our
model: AKD-DBPR-p and AKD-DBPR-q. Here p (q) means to use the final student (teacher)
network to calculate the compatibility between items.
Table 4.2 shows the performance comparison among different approaches. From this ta-
ble, we have the following observations. (1) DBPR outperforms all the other state-of-the-art
pure data-driven baselines, which indicates the superiority of the proposed content-based neu-
ral networks for compatibility modeling. (2) AKD-DBPR-p and AKD-DBPR-q both surpass
DBPR, which validates the benefit of knowledge distillation in the context of compatibility
modeling. To intuitively understand the impact of the rule guidance, we illustrate the comparison between AKD-DBPR and DBPR on several testing triplets in Figure 4.5. As we can see, AKD-DBPR performs especially well in cases where the two given bottoms b_j and b_k both seem visually compatible with the top t_i. Nevertheless, the general knowledge rules may also lead to failed triplets, which could be explained by the fact that not all knowledge rules in the fashion domain are universally applicable to all fashion item pairs.
Table 4.2: Performance comparison among different approaches in terms of AUC
Approach AUC
POP 0.4206
RAND 0.5094
IBR 0.6075
ExIBR 0.7033
BPR-DAE 0.7616
DBPR 0.7704
AKD-DBPR-p 0.7843
AKD-DBPR-q 0.7852
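The AUC values reported here can be computed as the fraction of correctly ranked test triplets, counting ties as one half; a minimal sketch with illustrative names, not the authors' code:

```python
def pairwise_auc(score_pairs):
    """AUC over test triplets: score_pairs holds (m_ij, m_ik) for each
    triplet (i, j, k), where b_j is the ground-truth bottom and b_k a
    sampled negative. A triplet counts as 1 if the positive outranks the
    negative, 0.5 on a tie, 0 otherwise."""
    hits = sum(1.0 if pos > neg else 0.5 if pos == neg else 0.0
               for pos, neg in score_pairs)
    return hits / len(score_pairs)

print(pairwise_auc([(0.9, 0.2), (0.4, 0.7), (0.5, 0.5), (0.8, 0.1)]))  # 0.625
```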
Moreover, to gain a deeper understanding of the rule guidance, we further conducted experiments on each rule. Table 4.3 exhibits the performance of the student network and the teacher network with different rules. Notably, we found that the negative rules (e.g., "no T-shirt + dress") seem to achieve better performance as compared to the positive ones (e.g., "coat + dress"). One possible explanation is that people are more likely to distinguish the incompatible pairs than the compatible ones. In addition, as we can see, rules regarding category show superiority over rules pertaining to other attributes, such as material and color. This may be due to two reasons: (1) the category-related rules are more specific and more widely accepted by the public, and hence have higher rule confidence and provide better guidance to the neural networks; and (2) the category metadata is better structured, cleaner, and more complete as compared to the loose and noisy title descriptions, from which we derived the other attributes (e.g., material and color) of fashion items. Moreover, as to the color-related rules, we found that the rule "black + black" surprisingly out-