CHAPTER 4
Multimodal Cooperative Learning for Micro-Video Venue Categorization
4.1 BACKGROUND
As is known, geographic information benefits many services, such as location-based search,
recommendation, and social networking. However, in real-world scenarios, few users tag their
micro-videos with specific geographic information due to privacy concerns. Specifically, as
reported in [192], around 98.78% of micro-videos do not have location information. Despite the
significance of such information, it is hard, if not impossible, to infer the specific location
of a micro-video, such as American Airlines Arena in Florida, USA. Instead, we infer the venue
category of a given micro-video, such as “Basketball Court.” Technically speaking, venue
category estimation of micro-videos can be treated as a multi-modal fusion problem and solved
by integrating the geographic cues from the visual, acoustic, and textual modalities of
micro-videos. Motivated by this, in this chapter we propose three different multi-modal
learning models to infer the venue information of micro-videos.
4.2 RESEARCH PROBLEMS
Inferring the venue categories from micro-videos is non-trivial, due to the following challenges.
(1) Heterogeneous multi-modalities. Similar to traditional long videos, like the ones
on YouTube, micro-videos combine textual, visual, and acoustic modalities, which
characterize the video content from multiple complementary views. Although some efforts
have been dedicated to data fusion [103, 116, 148], how to model the relatedness among
multi-modalities and effectively fuse them is still an open research question.
(2) Sparse information. The most prominent attribute of micro-video platforms is that
they thrive on brevity and immediacy. For example, Vine allows users to upload videos of
about 6 s; Snapchat offers its users the option to create 10 s micro-videos; and Viddy
limits the length of its uploaded videos to 30 s. Short length makes video production and
broadcasting easy, downloading quick, and playback smooth on portable devices. However,
in contrast to traditional long videos, micro-videos are comparatively short, thereby