CHAPTER 7
Research Frontiers
In this book, we investigate a set of application-motivated research problems in
micro-video understanding. To solve these problems, we design general principles,
methodologies, and optimizations that jointly learn from multiple correlated modalities of the
given micro-videos, including the textual, visual, acoustic, and social ones. They are empirically
validated on multiple real-world datasets. In particular, we first introduce the proliferation
of micro-video services and identify three practical tasks of micro-video understanding: popu-
larity prediction, venue category estimation, and micro-video routing. Based upon these tasks,
we analyze the unique research challenges of micro-videos that are distinct from traditional
long videos, such as information sparseness, hierarchical structure, low quality, multimodal
sequential data, and a lack of benchmark datasets. To address these problems, we present a
series of multimodal learning methods, consisting of multimodal transductive learning,
multimodal cooperative learning, multimodal transfer learning, and multimodal sequential
learning. These theoretical methods are verified on the three datasets we constructed. To facilitate other
researchers, we have released the code, parameter settings, and the three datasets. We have
to emphasize that learning from multiple modalities of the given micro-videos is still a young
and highly promising research field. There are many unexplored but fruitful future directions
and challenging research issues. We illustrate a few of them here.
7.1 MICRO-VIDEO ANNOTATION
Facing the exponentially growing number of micro-videos, it is important to help users quickly
identify their desired ones. The hashtags associated with micro-videos are typically provided by
uploaders to summarize the content of their posts and attract the attention of followers. Taking
the popular social platform Instagram as an example, as shown in Figure 7.1, the hashtags are
prefixed with the symbol “#” to mark keywords or key topics of a post. The hashtags have been
proved to be useful in many applications, including micro-blog retrieval, event analysis, and
sentiment analysis. Moreover, the tagging service can benefit the stakeholders of micro-video
ecosystems. For users, hashtags make it easier to search for and locate their desired micro-videos.
For post-sharers, concise and concrete hashtags can increase the probability that their micro-videos
are discovered. For platforms, hashtags can make the management of micro-videos (e.g.,
categorization) more convenient. Despite their importance, numerous micro-videos lack
hashtags, or carry hashtags that are inaccurate or incomplete. In light of this, micro-video annotation,