[Figure: y-axis "Frequency" (ticks from 10 to 100,000); x-axis "Hashtags Ordered by Frequency" (ticks from 2.0k to 14.0k); regions labeled "Frequent Hashtags" and "Long-tail Hashtags".]
Figure 7.2: The statistics of hashtag frequency distribution in Instagram.
existing approaches recommend hashtags while ignoring the redundancy among them; obtaining
relevant hashtags while accounting for their inter-dependencies therefore remains difficult.
In the future, we will tackle this task from the following three directions. First, we plan
to construct a knowledge graph to explore hashtag correlations, and leverage existing structural
knowledge to derive proper dependencies between frequent hashtags and long-tail hashtags.
Second, we will introduce a multi-level attention mechanism into the multimodal sequence model
to focus on important cues among the sequential features and the multi-modality features. Lastly,
we expect to simulate how human annotators work and generate diverse and distinct micro-video
annotations.
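One way to read the second direction is as attention applied at two levels: first over the time steps within each modality, then over the pooled modality vectors. The following is a minimal sketch of that idea only, not the authors' model. It assumes plain dot-product attention, a single fixed query vector shared by both levels, and features of equal dimension in every modality; a trained model would learn its projections and queries. All names and the toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, query):
    """Dot-product attention: pool vectors, weighted by similarity to the query."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def multi_level_attention(modalities, query):
    """Level 1 attends over time steps within each modality;
    level 2 attends over the resulting per-modality vectors."""
    names = list(modalities)
    pooled, step_weights = [], {}
    for name in names:
        vector, weights = attend(modalities[name], query)
        pooled.append(vector)
        step_weights[name] = weights
    fused, modality_weights = attend(pooled, query)
    return fused, step_weights, dict(zip(names, modality_weights))


# Toy 2-dimensional features (made up): one list of step vectors per modality.
modalities = {
    "visual": [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
    "acoustic": [[0.0, 0.5], [0.5, 0.0]],
    "textual": [[0.2, 0.2]],
}
query = [1.0, 0.0]
fused, step_weights, modality_weights = multi_level_attention(modalities, query)
```

Here `step_weights` shows which time steps matter within each modality and `modality_weights` shows which modality dominates the fused representation, so the two kinds of cues mentioned above can be inspected separately.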
7.2 MICRO-VIDEO CAPTIONING
Micro-video captioning aims to automatically generate textual descriptions for micro-videos.
Some examples can be found in Figure 7.3. Because it combines computer vision and natural
language processing techniques to represent video content, micro-video captioning shows great
potential in helping visually impaired people better understand visual content. Moreover, it
plays a vital role in searching micro-videos and answering questions about their content. As
users tend to submit queries and ask questions about micro-video clips through text-based
keywords, a better content descriptor can promote user satisfaction with, and loyalty to,
micro-video platforms. However, current micro-video systems (e.g., Vine, Instagram, Kuaishou,
TikTok) lack these content descriptions, which degrades the performance of micro-video
retrieval and question-answering systems. Besides, some user-annotated captions do not
adequately describe the micro-video content. Therefore, it is crucial to develop micro-video
captioning approaches that automatically generate concise and accurate video descriptions.
Figure 7.3: Example of micro-video caption.
Although micro-video captioning is an important research task in the literature, several
challenges and open directions remain:
(1) With the fast development of DNNs, applying more powerful models and learning techniques
(e.g., graph neural networks, reinforcement learning) to micro-video captioning is expected to
improve model performance. (2) The salient part of a micro-video normally consists of a short
video clip (e.g., 10 s), which fits well with the attention mechanism. Considering this, how to
utilize the attention mechanism to generate micro-video descriptions will be an important
research problem. (3) Traditional video captioning struggles with the tedious-description
problem due to the limitations of the training corpus. Therefore, novel caption generation will
be a potential direction for the micro-video captioning task. (4) Since the construction of
datasets is a fundamental problem in machine learning and current micro-video datasets lack
such captioning information, more abundant datasets will benefit further related