22 3. MULTIMODAL TRANSDUCTIVE LEARNING
colors tend to catch users’ eyes. Therefore, we extracted only the color histogram as the basic visual
feature to characterize popular micro-videos. To reduce the size of the color space, we grouped it
into 50 distinct colors, which results in a 50-d vector for each frame.
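The 50-color histogram can be illustrated with a minimal sketch. The text does not say how the color space was grouped, so the code below assumes, purely for illustration, that pixels are binned by hue into 50 equal-width intervals; the function name `color_histogram` and the `Pixel` type are hypothetical.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
N_BINS = 50                   # 50 distinct colors, as stated in the text


def color_histogram(pixels: Sequence[Pixel], n_bins: int = N_BINS) -> List[float]:
    """Return a normalized n_bins-d color histogram for one frame.

    Assumption: colors are grouped by hue into equal-width bins. The
    original grouping scheme is not specified in the text.
    """
    counts = [0] * n_bins
    for r, g, b in pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # hue lies in [0, 1); clamp defensively so the index stays in range.
        counts[min(int(hue * n_bins), n_bins - 1)] += 1
    total = len(pixels)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Usage: a toy "frame" with three red pixels and one blue pixel.
frame = [(255, 0, 0)] * 3 + [(0, 0, 255)]
hist = color_histogram(frame)
```

Normalizing by the pixel count keeps the histogram comparable across frames of different resolutions.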
Object Features Prior work has shown that the popularity of UGCs is strongly correlated with the
objects contained in the videos [54]. We believe that the presence of certain objects affects
micro-videos’ popularity. For example, micro-videos with “cute dogs” or “beautiful girls” are more likely
to be popular than those with “desks” and “stones.” We thus employed deep convolutional
neural networks (CNNs) [82], a powerful model for image recognition problems [188], to detect
objects in micro-videos. Specifically, we applied the pre-trained AlexNet deep neural network
(DNN) provided by the Caffe software package [71] to the input key frames. The fc7 layer
outputs a 4,096-d activation vector, and the final 1,000-way softmax layer outputs a probability
distribution over the 1,000 class labels predefined in ImageNet. We treat both outputs as the
feature representation of each frame. In the end, mean pooling was performed over the frames to
generate a single 4,096-d vector and a single 1,000-d vector for each micro-video.
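The mean-pooling step that turns per-frame outputs into one vector per micro-video can be sketched as follows. This is an illustrative sketch only: the per-frame scores are toy values rather than real AlexNet outputs, and the helper names `softmax` and `mean_pool` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one key frame")
    n_frames = len(frame_features)
    # zip(*...) iterates over feature dimensions across all frames.
    return [sum(dim) / n_frames for dim in zip(*frame_features)]


# Usage: three key frames with toy 4-class scores (1,000 classes in the text).
frame_scores = [[2.0, 0.5, 0.1, 0.0], [1.5, 1.5, 0.2, 0.1], [0.3, 2.2, 0.0, 0.4]]
frame_probs = [softmax(s) for s in frame_scores]
video_probs = mean_pool(frame_probs)
```

Because each frame's softmax output sums to one, the pooled vector is still a valid probability distribution over the classes. The same `mean_pool` call applies unchanged to fc7-style activation vectors.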
SentiBank Features We performed sentiment analysis on the visual modality because
the sentiment of UGCs has been shown to be strongly correlated with their popularity [54]. In
particular, we extracted visual sentiment features with a deep CNN model
trained on the SentiBank dataset [11]. SentiBank contains 2,089 concepts, each of which
invokes specific sentiments, such as “cute girls” and “funny animals.” Therefore, after mean pooling
over the key frames, each micro-video is represented by a 2,089-d vector.
Aesthetic Features Aesthetic features are a set of hand-selected features related to the
principles of the nature and appreciation of beauty, which have been studied and found to be
effective in popularity prediction [36]. Intuitively, micro-videos that are aesthetically pleasing are
more likely to be popular. We employed the released tool³ [10] to extract the following aesthetic
features: (a) dark channel feature; (b) luminosity feature; (c) sharpness; (d) symmetry; (e) low
depth of field; (f) white balance; (g) colorfulness; (h) color harmony; and (i) eye sensitivity, over
a 3 × 3 grid on each key frame. We then calculated (a) the normalized area of the dominant object
and (b) the normalized distances of the centroid of the dominant object with respect to four stress
points at the frame level. In the end, we obtained 149-d aesthetic features for each micro-video.
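To make the grid-level and frame-level computations concrete, here is a rough sketch that does not reproduce the released tool or its 149-d layout. It assumes a grayscale frame given as a list of rows, a binary mask marking the dominant object, and stress points located at the four rule-of-thirds intersections. The text does not define the stress points, so that placement, the diagonal normalization, and both function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def grid_means(gray: Sequence[Sequence[float]], rows: int = 3, cols: int = 3) -> List[float]:
    """Mean intensity per grid cell, row-major (a stand-in for a luminosity feature).

    Requires the frame to be at least rows x cols pixels.
    """
    height, width = len(gray), len(gray[0])
    means = []
    for i in range(rows):
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            cell = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def dominant_object_descriptors(mask: Sequence[Sequence[int]]) -> Tuple[float, List[float]]:
    """Return (normalized area, normalized centroid distances to 4 stress points)."""
    height, width = len(mask), len(mask[0])
    pts = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    if not pts:
        return 0.0, [0.0] * 4
    area = len(pts) / (height * width)
    # Centroid in normalized [0, 1] coordinates, using pixel centers.
    cx = sum(x + 0.5 for x, _ in pts) / len(pts) / width
    cy = sum(y + 0.5 for _, y in pts) / len(pts) / height
    # Assumed stress points: the rule-of-thirds intersections.
    stress = [(1 / 3, 1 / 3), (2 / 3, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 2 / 3)]
    diag = math.sqrt(2.0)  # diagonal of the unit frame, used for normalization
    return area, [math.hypot(cx - sx, cy - sy) / diag for sx, sy in stress]


# Usage: a 3 x 3 toy frame and a 6 x 6 mask with a 4 x 4 object in the top-left corner.
cells = grid_means([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
mask = [[1 if x < 4 and y < 4 else 0 for x in range(6)] for y in range(6)]
area, dists = dominant_object_descriptors(mask)
```

In the toy mask, the object's centroid falls exactly on the top-left stress point, so the first distance is zero.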
Visual Quality Assessment Features It is important that the visual quality of popular
content is maintained at an acceptable level, given rising consumer expectations of the quality of
multimedia content delivered to them [140]. In particular, we employed the released tool⁴ to
extract the micro-videos’ quality features based on motion and spatio-temporal information,
which have been proven to correlate highly with human visual judgments of quality. This results
in a 46-d feature vector.
3 http://www.ee.columbia.edu/~subh/Software.php
4 http://live.ece.utexas.edu/