Experiment Settings. In our context of matching bottoms for a given top, we only considered the outfits that contain either a top and a bottom, or a coat plus a bottom/dress, where we treated the coat as the "top" and the bottom/dress as the "bottom." As one user may coordinate different shoes or accessories with the same top-bottom pair to make different outfits, we removed the duplicated top-bottom pairs from the dataset, resulting in 217,806 unique top-bottom pairs.
Regarding the evaluation, we adopted the leave-one-out strategy, where we randomly sampled one top-bottom pair for each user and retained it as the testing sample. Then we generated the quadruplet sets $\mathcal{D}_{\text{train}}$, $\mathcal{D}_{\text{valid}}$, and $\mathcal{D}_{\text{test}}$ according to Eq. (6.7), where for each positive top-bottom pair $(t_i, b_j)$ of the user $u_m$, we randomly sampled a negative bottom $b_k$ from the whole bottom dataset (i.e., $\mathcal{B}$) to comprise a quadruplet $(m, i, j, k)$. Finally, we adopted the AUC [133] as the evaluation metric.
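To make the evaluation protocol concrete, the following is a minimal sketch (not the authors' code) of the leave-one-out split, the quadruplet construction of Eq. (6.7), and the AUC computation over sampled positive/negative pairs. The data layout (`user_pairs`, `all_bottoms`) and the `score` function are hypothetical placeholders.

```python
# Illustrative sketch of the leave-one-out split, quadruplet sampling, and AUC.
# user_pairs: dict mapping user m -> list of unique positive (top, bottom) pairs.
# all_bottoms: list of all bottom ids (the set B).
import random

def leave_one_out_split(user_pairs):
    train, test = {}, {}
    for m, pairs in user_pairs.items():
        pairs = list(pairs)
        random.shuffle(pairs)
        test[m] = [pairs[0]]   # one held-out positive pair per user
        train[m] = pairs[1:]   # the remaining pairs are used for training
    return train, test

def build_quadruplets(split, all_bottoms):
    """For each positive pair (t_i, b_j) of user u_m, sample a negative bottom b_k
    from the whole bottom set to form a quadruplet (m, i, j, k)."""
    quads = []
    for m, pairs in split.items():
        for (i, j) in pairs:
            k = random.choice(all_bottoms)
            while k == j:                       # negative must differ from the positive
                k = random.choice(all_bottoms)
            quads.append((m, i, j, k))
    return quads

def auc(quads, score):
    """Fraction of quadruplets in which the positive bottom b_j is ranked above
    the sampled negative b_k for the same user and top."""
    hits = sum(1 for (m, i, j, k) in quads if score(m, i, j) > score(m, i, k))
    return hits / len(quads)
```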
For optimization, we employed the adaptive moment estimation method (Adam) [58]. We adopted the grid search strategy to determine the optimal values for the regularization parameter and the trade-off parameters. In addition, the mini-batch size, the number of hidden units, and the learning rate were searched in [32, 64, 128], [256, 512, 1024], and [0.0005, 0.001, 0.005, 0.01], respectively. The proposed model was fine-tuned for 40 epochs, and the performance on the testing set was reported. We empirically set the number of hidden layers in representation learning to $K = 1$.
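As an illustration of the tuning procedure, below is a minimal grid-search sketch over the hyper-parameter ranges reported above; `train_and_evaluate` is a hypothetical stand-in for fine-tuning the model with Adam for 40 epochs and returning the validation AUC.

```python
# Illustrative grid search over the reported hyper-parameter ranges (not the authors' code).
from itertools import product

batch_sizes    = [32, 64, 128]
hidden_units   = [256, 512, 1024]
learning_rates = [0.0005, 0.001, 0.005, 0.01]

def grid_search(train_and_evaluate):
    """train_and_evaluate is assumed to train the model with Adam for the given
    configuration and return the AUC on the validation set."""
    best_auc, best_config = 0.0, None
    for bs, hu, lr in product(batch_sizes, hidden_units, learning_rates):
        valid_auc = train_and_evaluate(batch_size=bs, hidden_units=hu,
                                       learning_rate=lr, epochs=40)
        if valid_auc > best_auc:
            best_auc, best_config = valid_auc, (bs, hu, lr)
    return best_config, best_auc
```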
6.4.2 ON MODEL COMPARISON (RQ1)
We chose the following state-of-the-art methods as the baselines to evaluate the proposed
model.
POP-T: We used the popularity of the bottom to measure its compatibility with the top, where the popularity is defined as the number of outfits in the training set in which the bottom appears (a sketch of the three non-learning baselines is given after this list).
POP-U: Similarly, in this baseline, we defined the popularity of the bottom as the number of users who interacted with the bottom in the training set.
RAND: We randomly assigned the compatibility scores $m_{ij}$ and $m_{ik}$ between items.
Bi-LSTM: We chose the bidirectional LSTM model in [31], which explores the outfit compatibility by sequentially predicting the next item conditioned on the previous ones. In our context, we adapted Bi-LSTM to deal with an outfit comprising a top and a bottom.
BPR-DAE: We selected the content-based neural scheme introduced by [108] that is capa-
ble of jointly modeling the coherent relation between different modalities of fashion items
and the implicit preference among items via a dual autoencoder network. It is worth noting
that BPR-DAE overlooks the user factor in the compatibility modeling.
BPR-MF: We used the pairwise ranking method introduced in [100], where the latent user-
item relations are captured by the MF method.
VBPR: We adopted the VBPR in [34], which exploits the visual data of fashion items with
the factorization method to recommend an item for the user.
TBPR: We derived TBPR from VBPR by replacing the visual signals with the contextual
modality of fashion items.
VTBPR: We extended VBPR in [34] by further introducing the contextual factor to comprehensively characterize the users' preferences from both the visual and contextual perspectives.
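For concreteness, the three non-learning baselines (POP-T, POP-U, and RAND) can be sketched as follows. This is an illustration rather than the original implementation; the data layout `train_outfits`, a list of (user, top, bottom) training triples, is a hypothetical assumption.

```python
# Illustrative sketch of the non-learning baselines (not the authors' code).
# train_outfits: list of (user, top, bottom) triples observed in the training set.
import random
from collections import Counter

def pop_t_scores(train_outfits):
    """POP-T: score a bottom by the number of training outfits it appears in."""
    return Counter(bottom for _, _, bottom in train_outfits)

def pop_u_scores(train_outfits):
    """POP-U: score a bottom by the number of distinct users who interacted with it."""
    users_per_bottom = {}
    for user, _, bottom in train_outfits:
        users_per_bottom.setdefault(bottom, set()).add(user)
    return {bottom: len(users) for bottom, users in users_per_bottom.items()}

def rand_score(_user, _top, _bottom):
    """RAND: assign a random compatibility score, ignoring all inputs."""
    return random.random()
```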
Table 6.1 shows the performance comparison among different approaches. From this ta-
ble, we have the following observations. (1) BPR-DAE shows superiority over Bi-LSTM, which
implies that the content-based scheme performs better than the sequential model in the general
compatibility modeling between fashion items. (2) VTBPR outperforms VBPR, TBPR, and
BPR-MF, which confirms the advantage of considering both the visual and contextual modali-
ties in the personal preference modeling. Interestingly, we found that TBPR slightly surpasses
VBPR, demonstrating the great potential of contextual data in characterizing users' personal preferences for items. (3) GP-BPR achieves better performance than all the other methods that focus on either the general compatibility modeling or the personal preference modeling, validating the necessity of incorporating both the general item-item compatibility and the user-item preference in the context of personalized clothing matching.
Table 6.1: Performance comparison among different approaches in terms of AUC
Approach AUC
POP-T 0.6042
POP-U 0.5951
RAND 0.5014
Bi-LSTM 0.6739
BPR-DAE 0.7096
BPR-MF 0.7958
VBPR 0.8170
TBPR 0.8190
VTBPR 0.8232
GP-BPR 0.8388
To evaluate the contribution of each modality in our model, we further compared GP-BPR with its two derivatives, GP-BPR-V and GP-BPR-T, which explore only the visual and only the contextual modality of fashion items, respectively. Table 6.2 shows the performance comparison of our model with different modalities. We observed that our model outperforms