fusion and late fusion [149]. They, however, fail to account for the relatedness among multiple
modalities. Therefore, it is important to take modality relatedness into consideration.
(2) Interconnected. Heterogeneous features extracted from different modalities reflect different aspects of micro-videos and complement each other. It is therefore beneficial to develop an effective approach for discovering the interconnected patterns shared by all views. However, owing to restrictions imposed by micro-video producers and platforms, the additional information associated with a micro-video, such as its textual description, is often diverse or unstructured, leaving the features of certain views unavailable in many situations. For example, according to our statistics over around 2 million Vine micro-videos, as reported in [24], more than 11% of micro-videos do not provide textual descriptions. In contrast, the micro-video content itself is a steady information source for popularity prediction. Thus, to compensate for this limitation, micro-video content features are treated as an indispensable component of a more descriptive and predictive analysis on the one hand, and on the other hand it is necessary to exploit the complementarity among views to learn the latent interconnected patterns and thereby address the incompleteness problem.
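To make this idea concrete, the following minimal sketch (and it is only a sketch, not the transductive model developed in this chapter) illustrates one simple way to learn a latent representation shared by all views when some views are missing: each view X_m is approximated as H W_m, the mapping W_m is fitted only on samples where view m is observed, and each sample's latent vector is estimated from whichever views it does have. The function name learn_shared_latent and the regularization weight lam are purely illustrative.

# A minimal sketch of a shared latent representation across views with
# missing views, fitted by alternating least squares.
import numpy as np

def learn_shared_latent(views, present, k=32, lam=0.1, n_iters=50, seed=0):
    """views:   list of (n, d_m) arrays, one per modality (rows of absent
                views may hold zeros -- they are ignored via `present`).
       present: list of boolean arrays of length n, True where view m exists.
       Returns H (n, k): the shared latent representation of the n samples."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    H = rng.normal(scale=0.1, size=(n, k))
    Ws = [rng.normal(scale=0.1, size=(k, X.shape[1])) for X in views]
    I_k = np.eye(k)

    for _ in range(n_iters):
        # Update each view's mapping W_m using only the samples where view m
        # is observed: W_m = (H_p^T H_p + lam I)^{-1} H_p^T X_p.
        for m, X in enumerate(views):
            Hp, Xp = H[present[m]], X[present[m]]
            Ws[m] = np.linalg.solve(Hp.T @ Hp + lam * I_k, Hp.T @ Xp)

        # Update each sample's latent vector from its observed views only:
        # h_i = (sum_m x_im W_m^T)(sum_m W_m W_m^T + lam I)^{-1}.
        for i in range(n):
            A = lam * I_k
            b = np.zeros(k)
            for m, X in enumerate(views):
                if present[m][i]:
                    A = A + Ws[m] @ Ws[m].T
                    b = b + Ws[m] @ X[i]
            H[i] = np.linalg.solve(A, b)
    return H

Under this sketch, a micro-video without a textual description still obtains a full latent vector from its visual and acoustic views, which is the sense in which complementary views compensate for an incomplete one.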
(3) Noisy. Various types of noise, originating from external factors in the real world, hide the underlying structure of the observed data. For example, micro-videos are often captured by users with hand-held mobile devices, which easily leads to poor video quality, such as low resolution, wobbly frames, constrained lighting conditions, and background noise. Besides, the textual descriptions attached to micro-videos may be noisy or uncorrelated with the content. These challenges drive us to build a robust model that explores the intrinsic structure embedded in the data by inferring meaningful features and alleviating the impact of noisy ones.
3.3 FEATURE EXTRACTION
It is apparent that both publisher influence and content influence contribute to the popularity of UGCs. In particular, we characterized publisher influence via the social modality, and content influence via the visual, acoustic, and textual modalities. For content influence, we first examined the popular micro-videos in Dataset I and identified three common characteristics of popular online micro-videos. For each characteristic, we then explained the underlying insight and transformed it into a set of features for video representation. Finally, we developed a rich set of popularity-oriented features from each modality of the micro-videos in Dataset I.
3.3.1 OBSERVATIONS
Universal Appeal. The subjects of widely popular micro-videos cannot be something appreciated by only a small group of people. Therefore, the topics and objects contained in micro-videos should be common enough to be interpreted the same way across people and cultures. To capture this characteristic, we extracted Sentence2Vector features from the textual modality and deep object features from the visual one.
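For concreteness, the following is a hedged sketch of how such features could be extracted. The ResNet-50 backbone and the averaged-word-vector stand-in for Sentence2Vector are illustrative assumptions, not necessarily the exact pipeline applied to Dataset I: frame-level deep object descriptors from a pretrained CNN are mean-pooled into one video-level vector, and word vectors are averaged over the description's tokens.

# A hedged sketch of the two extractors named above; backbone choice and the
# averaged-word-vector stand-in for Sentence2Vector are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Deep object features from the visual modality: a pretrained CNN with its
# classification head removed, mean-pooled over sampled frames.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep the 2048-d pooled features
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def deep_object_feature(frame_paths):
    frames = torch.stack([preprocess(Image.open(p).convert("RGB"))
                          for p in frame_paths])
    return backbone(frames).mean(dim=0)    # one 2048-d vector per micro-video

# Sentence-level feature from the textual modality: average pretrained word
# vectors over the description's tokens (a simple stand-in for Sentence2Vector).
def sentence_feature(description, word_vectors, dim=300):
    tokens = [t for t in description.lower().split() if t in word_vectors]
    if not tokens:                         # missing or unmatched description
        return torch.zeros(dim)
    vecs = [torch.as_tensor(word_vectors[t], dtype=torch.float32) for t in tokens]
    return torch.stack(vecs).mean(dim=0)

Any frame-sampling strategy and any pretrained word-embedding table can be plugged in here; the point is simply that the visual descriptor is object-centric and the textual descriptor is sentence-level, matching the two features named above.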