6.4. EXPERIMENT 71
Experiment Settings. In our context of matching bottoms for a given top, we only considered the outfits that either contain a top and a bottom, or a coat plus a bottom/dress, where we treated the coat as the "top" and the bottom/dress as the "bottom." As one user may coordinate different shoes or accessories with the same top-bottom pair to make different outfits, we removed the duplicated top-bottom pairs from the dataset, resulting in 217,806 unique top-bottom pairs.
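The deduplication step above can be sketched as follows; the outfit records and field names here are illustrative assumptions, not the actual dataset schema:

```python
# Hypothetical toy outfits: same (top, bottom) pair may recur with
# different shoes/accessories and should count only once.
outfits = [
    {"user": "u1", "top": "t1", "bottom": "b1", "shoes": "s1"},
    {"user": "u1", "top": "t1", "bottom": "b1", "shoes": "s2"},  # duplicate pair
    {"user": "u2", "top": "t2", "bottom": "b3", "shoes": "s1"},
]

# Keep one record per unique (top, bottom) pair, ignoring other items.
unique_pairs = {(o["top"], o["bottom"]) for o in outfits}
print(len(unique_pairs))  # 2
```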
Regarding the evaluation, we adopted the leave-one-out strategy, where we randomly sampled one top-bottom pair for each user and retained it as the testing sample. Then we generated the quadruplet sets D_train, D_valid, and D_test according to Eq. (6.7), where for each positive top-bottom pair (t_i, b_j) of the user u_m, we randomly sampled a negative bottom b_k from the whole bottom set (i.e., B) to comprise a quadruplet (m, i, j, k). Finally, we adopted the AUC [133] as the evaluation metric.
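The quadruplet construction described above can be sketched as a small negative-sampling loop; the user/item IDs and data layout are assumptions for illustration, not the authors' code:

```python
import random

random.seed(0)

bottoms = ["b1", "b2", "b3", "b4"]               # the whole bottom set B
positive = {"u1": [("t1", "b1"), ("t2", "b2")]}  # positive top-bottom pairs per user

def make_quadruplets(positive, bottoms):
    """For each positive pair (t_i, b_j) of user u_m, sample a negative
    bottom b_k from B to form a quadruplet (m, i, j, k)."""
    quads = []
    for user, pairs in positive.items():
        for top, pos_bottom in pairs:
            neg_bottom = random.choice([b for b in bottoms if b != pos_bottom])
            quads.append((user, top, pos_bottom, neg_bottom))
    return quads

quads = make_quadruplets(positive, bottoms)
print(quads)
```

In practice the same sampling would be repeated independently for the training, validation, and testing splits.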
For optimization, we employed the adaptive moment estimation method (Adam) [58]. We adopted the grid search strategy to determine the optimal values for the regularization parameter and the trade-off parameters. In addition, the mini-batch size, the number of hidden units, and the learning rate were searched in [32, 64, 128], [256, 512, 1024], and [0.0005, 0.001, 0.005, 0.01], respectively. The proposed model was fine-tuned for 40 epochs, and the performance on the testing set was reported. We empirically set the number of hidden layers in representation learning to K = 1.
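The grid search over the listed candidate values can be sketched as below; `train_and_evaluate` is a hypothetical stand-in for the real training-and-validation loop, not the authors' implementation:

```python
from itertools import product

batch_sizes = [32, 64, 128]
hidden_units = [256, 512, 1024]
learning_rates = [0.0005, 0.001, 0.005, 0.01]

def train_and_evaluate(batch, hidden, lr):
    # Placeholder: in the real pipeline this would train the model for
    # 40 epochs and return the validation AUC for this configuration.
    return 0.5 + hidden / (batch * lr * 1e6)

# Exhaustively evaluate every configuration and keep the best one.
best = max(product(batch_sizes, hidden_units, learning_rates),
           key=lambda cfg: train_and_evaluate(*cfg))
print(best)
```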
6.4.2 ON MODEL COMPARISON (RQ1)
We chose the following state-of-the-art methods as the baselines to evaluate the proposed
model.
• POP-T: We used the "popularity" of the bottom to measure its compatibility with a top, which is defined as the number of outfits in the training set in which the bottom appeared.
• POP-U: Similarly, in this baseline, we defined the “popularity” of the bottom as the number
of users who once interacted with the bottom in the training set.
• RAND: We randomly assigned the compatibility scores m_ij and m_ik between items.
• Bi-LSTM: We chose the bidirectional LSTM model in [31] which explores the outfit com-
patibility by sequentially predicting the next item conditioned on previous ones. In our con-
text, we adapted Bi-LSTM to deal with an outfit comprising a top and a bottom.
• BPR-DAE: We selected the content-based neural scheme introduced by [108] that is capa-
ble of jointly modeling the coherent relation between different modalities of fashion items
and the implicit preference among items via a dual autoencoder network. It is worth noting
that BPR-DAE overlooks the user factor in the compatibility modeling.
• BPR-MF: We used the pairwise ranking method introduced in [100], where the latent user-
item relations are captured by the MF method.
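The popularity baselines above (POP-T and POP-U) reduce to simple counting; a minimal sketch, with toy training records as assumed inputs:

```python
from collections import Counter

# Hypothetical training data: (top, bottom) outfit pairs and
# (user, bottom) interaction records.
train_pairs = [("t1", "b1"), ("t2", "b1"), ("t3", "b2")]
user_interactions = [("u1", "b1"), ("u2", "b2"), ("u3", "b2")]

# POP-T: score a bottom by how many training outfits it appears in.
pop_t = Counter(bottom for _, bottom in train_pairs)

# POP-U: score a bottom by how many users interacted with it.
pop_u = Counter(bottom for _, bottom in user_interactions)

print(pop_t["b1"], pop_t["b2"])  # 2 1
print(pop_u["b1"], pop_u["b2"])  # 1 2
```

Either count is then used directly as the compatibility score when ranking candidate bottoms for a given top.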