3.3. FEATURE EXTRACTION 21
Emotional Content. People are naturally drawn to things that arouse their emotions. Micro-videos
showing funny animals or lovely babies make people feel the urge to share them to express the
same emotions. As a result, highly emotional micro-videos are more likely to be shared.
Therefore, we extracted textual sentiment and visual sentiment features for each video, as well
as several acoustic features that are widely used in emotion recognition in music [171].
High Quality and Aesthetic Design. When people share information on social networks, they are
actually showing a little piece of themselves to their audience. Therefore, high quality and
aesthetic design of the content, which could reflect the taste of the people who share it, is
another important characteristic of popular micro-videos. Color histogram, aesthetic, and visual
quality features were thus extracted to encode this characteristic. In addition, the acoustic
features we extracted are frequently used in music modeling, which could help to detect music in
the audio track of micro-videos [88].
3.3.2 SOCIAL MODALITY
It is intuitive that micro-videos posted by users who have more followers or a verified account
are more likely to be propagated, and thus tend to reach a larger audience. To characterize the
influence of micro-video publishers, we developed the following publisher-centric features for
micro-videos.
Follower/Followee Count. The number of followers and followees of the given micro-video
publisher.
Loop Count. The total number of loops received by all the posts of the publisher.
Post Count. The number of posts generated by the publisher.
Twitter Verification. A binary value indicating whether the publisher has been verified by
Twitter.²
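The publisher-centric features listed above can be collected into a small numeric vector. The following is a minimal sketch, not the authors' implementation: the `Publisher` record and its field names are hypothetical, and the text does not say whether the counts were scaled or transformed before use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Publisher:
    """Hypothetical record of a publisher's profile statistics."""
    followers: int
    followees: int
    total_loops: int  # loops summed over all posts of the publisher
    posts: int
    verified: bool    # True if the account is verified by Twitter


def social_features(p: Publisher) -> List[float]:
    """Return the five publisher-centric features as a flat vector."""
    return [
        float(p.followers),
        float(p.followees),
        float(p.total_loops),
        float(p.posts),
        1.0 if p.verified else 0.0,  # binary verification flag
    ]
```

For a publisher with 1,200 followers, 300 followees, 54,000 loops, 87 posts, and a verified account, `social_features` returns `[1200.0, 300.0, 54000.0, 87.0, 1.0]`.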
3.3.3 VISUAL MODALITY
Due to the short length of micro-videos, the visual content is usually highly related to a single
theme, which enables us to represent the whole micro-video with only a few key frames.
Accordingly, we extracted the visual features from certain key frames. Mean pooling was then
performed across all the key frames to create a fixed-length vector representation of each
micro-video.
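Mean pooling across key frames amounts to averaging the per-frame feature vectors dimension by dimension. A minimal sketch in plain Python follows; the function name `mean_pool` is an assumption for illustration, and how key frames are selected is outside its scope.

```python
from typing import List, Sequence


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one fixed-length vector."""
    if not frame_features:
        raise ValueError("need at least one key frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frames must share the same feature dimension")
    n = len(frame_features)
    # Average each dimension over the frames.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]
```

The output length equals the per-frame feature dimension regardless of how many key frames a micro-video contributes, which is what makes the representation fixed-length.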
² A Vine account can be verified by Twitter if it is linked to a verified Twitter account.
Color Histogram It has been found that most basic visual features (i.e., intensity and the mean
value of different color channels in HSV space), except color histogram, have little correlation
with popularity [77]. Color histogram correlates strongly with popularity because striking
colors tend to catch users’ eyes. Therefore, we extracted only the color histogram as the basic
visual feature to characterize popular micro-videos. To reduce the size of the color space, we
grouped it into 50 distinct colors, which results in a 50-d vector for each frame.
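One way to obtain a reduced color histogram is to assign every pixel to its nearest color in a fixed palette and count the assignments. The sketch below makes several assumptions that the text does not confirm: the palette is supplied by the caller (a 50-color palette would give a 50-d vector), distance is squared Euclidean in RGB, and the histogram is normalized by the pixel count.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Color], palette: Sequence[Color]) -> List[float]:
    """Normalized histogram of nearest-palette-color assignments."""
    if not pixels or not palette:
        raise ValueError("pixels and palette must be non-empty")
    counts = [0] * len(palette)
    for r, g, b in pixels:
        # Index of the palette color closest to this pixel (first one wins ties).
        nearest = min(
            range(len(palette)),
            key=lambda k: (r - palette[k][0]) ** 2
            + (g - palette[k][1]) ** 2
            + (b - palette[k][2]) ** 2,
        )
        counts[nearest] += 1
    total = len(pixels)
    return [c / total for c in counts]
```

With a three-color palette of pure red, green, and blue, the pixels `(250, 5, 5)`, `(10, 240, 10)`, `(0, 250, 0)`, and `(5, 5, 200)` yield the histogram `[0.25, 0.5, 0.25]`.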
Object Features Studies have shown that the popularity of UGCs is strongly correlated with the
objects contained in the videos [54]. We believe that the presence of certain objects affects
micro-videos’ popularity. For example, micro-videos with “cute dogs” or “beautiful girls” are more
likely to be popular than those with “desks” and “stones.” We thus employed deep convolutional
neural networks (CNNs) [82], a powerful model for image recognition problems [188], to detect
objects in micro-videos. Specifically, we applied the well-trained AlexNet deep neural network
(DNN) provided by the Caffe software package [71] to the input key frames. We took two outputs as
the feature representation of each frame: the 4,096-d output of the fc7 layer, and the output of
the final 1,000-way softmax layer, which is a probability distribution over the 1,000 class
labels predefined in ImageNet. In the end, mean pooling was performed over the frames to generate
a single 4,096-d vector and a single 1,000-d vector for each micro-video.
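The pooling of the two network outputs can be sketched independently of the network itself. The example below assumes that per-frame fc7 activations and pre-softmax class scores have already been obtained from a trained model; the function names and inputs are hypothetical, and no Caffe calls are shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: a probability distribution over classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pool_object_features(
    fc7_per_frame: Sequence[Sequence[float]],
    logits_per_frame: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Mean-pool fc7 activations and softmax outputs over key frames."""
    n = len(fc7_per_frame)
    if n == 0 or n != len(logits_per_frame):
        raise ValueError("need the same non-zero number of frames in both inputs")
    probs = [softmax(l) for l in logits_per_frame]
    # zip(*rows) iterates over feature dimensions (columns).
    fc7_mean = [sum(col) / n for col in zip(*fc7_per_frame)]
    prob_mean = [sum(col) / n for col in zip(*probs)]
    return fc7_mean, prob_mean
```

Because each per-frame softmax output sums to one, the pooled class vector also sums to one and can still be read as a distribution over the class labels. With AlexNet, the two returned vectors would have 4,096 and 1,000 entries, respectively.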
SentiBank Features We performed sentiment analysis on the visual modality because the sentiment
of UGCs has been shown to be strongly correlated with their popularity [54]. In particular, we
extracted the visual sentiment features using a deep CNN model trained on the SentiBank dataset
[11]. SentiBank contains 2,089 concepts, such as “cute girls” and “funny animals,” each of which
invokes specific sentiments. Therefore, after mean pooling over the key frames, each micro-video
is represented by a 2,089-d vector.
Aesthetic Features Aesthetic features are a handful of selected features related to the
principles of the nature and appreciation of beauty, which have been studied and found to be
effective in popularity prediction [36]. Intuitively, micro-videos that are objectively
aesthetic are more likely to be popular. We employed the released tool³ [10] to extract the
following aesthetic features: (a) dark channel feature; (b) luminosity feature; (c) sharpness;
(d) symmetry; (e) low depth of field; (f) white balance; (g) colorfulness; (h) color harmony; and
(i) eye sensitivity, at 3 × 3 grids over each key frame. We then calculated: (a) normalized area
of the dominant object and (b) normalized distances of the centroid of the dominant object with
respect to four stress points at frame level. In the end, we obtained 149-d aesthetic features
for each micro-video.
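Computing a feature over a 3 × 3 grid means splitting each key frame into nine cells and evaluating the feature once per cell. The sketch below illustrates only the grid layout, using mean intensity as a stand-in statistic; it does not reproduce the definitions used by the released tool [10], and the function name is an assumption.

```python
from typing import List, Sequence


def grid_means(gray: Sequence[Sequence[float]], rows: int = 3, cols: int = 3) -> List[float]:
    """Mean intensity of each cell in a rows x cols grid, in row-major order."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    if h < rows or w < cols:
        raise ValueError("frame is smaller than the grid")
    out = []
    for i in range(rows):
        # Integer boundaries so the cells tile the frame without gaps or overlap.
        r0, r1 = i * h // rows, (i + 1) * h // rows
        for j in range(cols):
            c0, c1 = j * w // cols, (j + 1) * w // cols
            cell = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(sum(cell) / len(cell))
    return out
```

Each grid-based feature therefore contributes nine values per frame under this layout, one per cell.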
Visual Quality Assessment Features It is important that the visual quality of popular content is
maintained at an acceptable level, given rising consumer expectations of the quality of
multimedia content delivered to them [140]. In particular, we employed the released tool⁴ to
extract quality features of the micro-videos based on motion and spatio-temporal information,
which have been proven to correlate highly with human visual judgments of quality. This results
in a 46-d feature vector.
³ http://www.ee.columbia.edu/~subh/Software.php
⁴ http://live.ece.utexas.edu/