literature and take a step forward in micro-video captioning tasks. (5) Unlike image
captioning, visual descriptions of micro-videos are relatively short and the number of
ground-truth descriptions is limited, which makes traditional captioning evaluation metrics,
e.g., BLEU, ROUGE, METEOR, and CIDEr, poorly suited to this task. Therefore, developing new
evaluation metrics tailored to micro-video captioning is a promising direction for future work.
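To make the point about short descriptions concrete, the following is a minimal sketch of
unsmoothed BLEU with uniform n-gram weights, written from the standard definition of the metric.
The function names and the example captions are hypothetical and chosen only for illustration;
this is not the evaluation protocol used in this book. It shows that a three-word caption
scores zero under BLEU-4 even when it matches the reference exactly, because no 4-gram exists.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Modified n-gram precision from BLEU, returned as (matches, total)."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Each candidate n-gram is credited at most as often as it occurs in a reference.
    matches = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matches, sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights and the usual brevity penalty."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matches, total = clipped_precision(candidate, references, n)
        if matches == 0:
            # Without smoothing, a single empty n-gram order zeroes the whole score.
            return 0.0
        log_sum += math.log(matches / total) / max_n
    # Use the reference whose length is closest to the candidate's.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    if len(candidate) > ref_len:
        penalty = 1.0
    else:
        penalty = math.exp(1 - ref_len / len(candidate))
    return penalty * math.exp(log_sum)


# Hypothetical captions, for illustration only.
reference = "dog catches frisbee".split()
paraphrase = "dog grabs frisbee".split()

exact_bleu4 = bleu(reference, [reference])            # exact match, yet no 4-gram exists
exact_bleu3 = bleu(reference, [reference], max_n=3)   # exact match up to trigrams
paraphrase_bleu4 = bleu(paraphrase, [reference])      # no bigram overlap with the reference
```

With a single short reference, one substituted word removes every bigram match, so a reasonable
paraphrase is indistinguishable from an unrelated caption. Smoothing alleviates the zero scores
but does not add the missing references.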
7.3 MICRO-VIDEO THUMBNAIL SELECTION
To keep users engaged, micro-video platforms and publishers must not only improve the quality
of micro-videos but also capture users’ attention quickly [63]. As the most representative
snapshot, the thumbnail summarizes a micro-video visually and gives users their first impression
of it, as shown in Figure 7.4. Moreover, studies report that the thumbnail is a crucial factor
in deciding whether to watch a video or skip to another [13]. In other words, an appealing
thumbnail makes the micro-video more attractive. However, because editing on a smartphone is
inconvenient or because they lack experience, users find it challenging to select a good
thumbnail. Therefore, we argue that micro-video sharing platforms need an automatic thumbnail
selection strategy.
Figure 7.4: Exemplar demonstration of the micro-video thumbnail.
Although several pioneering efforts [53, 76, 93, 104, 194] have jointly considered quality and
representativeness when selecting the thumbnail, they ignore the fact that the thumbnail should
also reflect the publisher’s preference and appeal to the interests of as many users as possible.
Taking this into account raises the following challenges: (1) how to measure the publisher’s
preferences over the different frames extracted from the micro-video; (2) how to calculate the
popularity of each frame according to the distribution of users’ interests on the platform; and
(3) to our knowledge, no suitable dataset exists for studying micro-video thumbnail selection.
To address these challenges, a large number of micro-videos associated with side information
(e.g., comments, publishers’ profiles) are first collected to build a large-scale micro-video dataset