CHAPTER 2
Data Collection
In this chapter, we introduce the three datasets we collected, which correspond to the tasks of
general compatibility modeling, personalized compatibility modeling, and personal wardrobe
creation, respectively.
2.1 DATASET I FOR GENERAL COMPATIBILITY MODELING
Several fashion datasets have been collected for different research purposes, for instance,
WoW [76], Exact Street2Shop [29], and Fashion-136K [49]. However, most of the released
datasets are collected from wild street photos and thus inevitably require a clothing parsing
technique, which remains a great challenge in the computer vision domain [125, 126].
In addition, these datasets lack rich contextual metadata for each fashion item, which makes
it difficult to model the fashion items fully. Therefore, to guarantee the evaluation quality and
facilitate the experiments, we constructed our own dataset, FashionVC, by crawling
outfits created by fashion experts on Polyvore. In particular, we first collected a seed set of
popular outfits on Polyvore, from which we tracked 248 fashion experts. We then crawled the
historical outfits published by these experts and used them to construct the ground truth for
positive item pairs. Considering that improper outfits can be accidentally created by users on
Polyvore, we also set a threshold z = 50 on the number of “likes” for each outfit to
ensure the quality of the positive fashion pairs. Finally, we obtained 20,726 outfits with 14,871
tops and 13,663 bottoms. For each fashion item, we collected its visual image, categories,
and title description.
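The filtering and pairing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' crawler or preprocessing code: the `Outfit` record and its field names are hypothetical, the threshold is assumed to be inclusive (an outfit with exactly 50 likes is kept), and an outfit is assumed to be able to contain more than one top or bottom.

```python
from dataclasses import dataclass
from itertools import product

LIKE_THRESHOLD = 50  # z = 50 from the text; treating it as ">=" is an assumption


@dataclass(frozen=True)
class Outfit:
    """Hypothetical record for one crawled outfit (field names are illustrative)."""
    outfit_id: str
    likes: int
    tops: tuple      # item ids of the tops in the outfit
    bottoms: tuple   # item ids of the bottoms in the outfit


def positive_pairs(outfits, threshold=LIKE_THRESHOLD):
    """Return the set of (top, bottom) pairs that co-occur in well-liked outfits."""
    pairs = set()
    for outfit in outfits:
        if outfit.likes < threshold:
            continue  # drop outfits that may have been created carelessly
        # Every top in a kept outfit is treated as compatible with every bottom in it.
        pairs.update(product(outfit.tops, outfit.bottoms))
    return pairs


# Toy data, invented purely for illustration.
outfits = [
    Outfit("o1", 120, ("t1",), ("b1",)),
    Outfit("o2", 12, ("t2",), ("b2",)),       # below the threshold, discarded
    Outfit("o3", 50, ("t1", "t3"), ("b3",)),  # exactly at the threshold, kept
]
print(sorted(positive_pairs(outfits)))
```

Storing the pairs in a set means that a top and bottom appearing together in several outfits count as a single positive pair; whether the original dataset deduplicates in this way is not stated in the text.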
Table 2.1 lists several examples of fashion items in our dataset. Each fashion item is
associated with an image, a title, and several categories at different levels of granularity.
2.2 DATASET II FOR PERSONALIZED COMPATIBILITY MODELING
Most existing publicly available datasets lack user context, which
makes it intractable to tackle the personalized clothing matching problem. It is worth noting
that although the Amazon dataset [86] contains valuable user contexts, it focuses more
on item recommendation based on user preference and hence lacks the ground truth
regarding the coordination among fashion items. Moreover, the dataset used in [43] contains