144 7. RESEARCH FRONTIERS
Figure 7.3: Example of micro-video caption.
micro-video platforms. However, current micro-video systems (e.g., Vine, Instagram, Kuaishou,
TikTok) lack these content descriptions, which degrades the performance of micro-video
retrieval and question-answering systems. Moreover, some user-annotated captions do not
describe the micro-video content adequately. Therefore, it is crucial to develop
micro-video captioning approaches that automatically generate concise and accurate video descriptions.
Although micro-video captioning is an important research task in the literature, several
challenges remain:
(1) With the rapid development of DNNs, applying more powerful models and training
techniques (e.g., graph neural networks, reinforcement learning) to micro-video captioning is
likely to improve model performance. (2) The salient part of a micro-video normally
consists of a short video clip (e.g., 10 s), which fits well with the attention mechanism.
Considering this, how to utilize the attention mechanism to generate micro-video descriptions
will be an important research problem. (3) Traditional video captioning suffers from
tedious descriptions because of the limitations of the training corpus. Therefore, novel caption
generation will be a potential direction for the micro-video captioning task. (4) Since dataset
construction is a fundamental problem in machine learning and current micro-video datasets
lack such captioning information, more abundant datasets will benefit further related