Acoustic Modality

3.3. FEATURE EXTRACTION

Emotional Content. People are naturally drawn to things that arouse their emotions. Micro-videos showing funny animals or lovely babies make people feel an urge to share them in order to express the same emotions. As a result, highly emotional micro-videos are more likely to be shared. Therefore, we extracted textual and visual sentiment features for each video, as well as several acoustic features that are widely used in emotion recognition in music [171].

High Quality and Aesthetic Design. When people share information on social networks, they are actually showing a little piece of themselves to their audience. Therefore, high quality and aesthetic design of the content, which can reflect the taste of the sharer, is another important characteristic of popular micro-videos. Color histogram, aesthetic, and visual quality features were thus extracted to encode this characteristic. In addition, the acoustic features we extracted are frequently used in music modeling, which could help to detect music in the audio track of micro-videos [88].

3.3.2 SOCIAL MODALITY

It is intuitive that micro-videos posted by users who have more followers or a verified account are more likely to be propagated, and thus tend to reach a larger audience. To characterize the influence of micro-video publishers, we developed the following publisher-centric features for micro-videos.

• Follower/Followee Count. The number of followers and followees of the given micro-video publisher.
• Loop Count. The total number of loops received by all the posts of the publisher.
• Post Count. The number of posts generated by the publisher.
• Twitter Verification. A binary value indicating whether the publisher has been verified by Twitter.
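
As a rough illustration, the publisher-centric features listed above could be packed into a single vector as follows. This is a minimal sketch: the Publisher class, its field names, and the ordering of the vector are hypothetical and are not taken from the original work, and no normalization or scaling is applied because the text does not describe any.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Publisher:
    """Hypothetical container for publisher statistics (names are illustrative)."""
    followers: int
    followees: int
    total_loops: int  # loops summed over all posts of the publisher
    posts: int
    verified: bool    # True if the publisher has been verified by Twitter


def social_features(publisher: Publisher) -> List[float]:
    """Return a 5-d vector: followers, followees, loops, posts, verified flag."""
    return [
        float(publisher.followers),
        float(publisher.followees),
        float(publisher.total_loops),
        float(publisher.posts),
        1.0 if publisher.verified else 0.0,  # binary verification feature
    ]
```

For example, social_features(Publisher(1200, 300, 45000, 80, True)) yields a vector whose last entry is the binary verification flag.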

3.3.3 VISUAL MODALITY

Due to the short length of micro-videos, the visual content is usually highly related to a single theme, which enables us to represent the whole micro-video with only a few key frames. Accordingly, we extracted the visual features from selected key frames. Mean pooling was then performed across all the key frames to create a fixed-length vector representation of each micro-video.
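
The mean pooling step can be sketched in a few lines. The function name mean_pool and the list-of-lists input format are assumptions made for illustration; any per-frame feature extractor that returns vectors of equal length would fit.

```python
from typing import List, Sequence


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one fixed-length vector."""
    if not frame_features:
        raise ValueError("at least one key frame is required")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all key frames must have the same feature dimension")
    n = len(frame_features)
    # Element-wise mean over the key frames.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]
```

The output length depends only on the per-frame feature dimension, not on the number of key frames, which is what makes the representation fixed-length.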

Color Histogram. It has been found that most basic visual features (i.e., intensity and the mean value of different color channels in HSV space), except the color histogram, have little correlation with popularity [77]. The color histogram correlates strongly with popularity because striking colors tend to catch users' eyes. Therefore, we extracted only the color histogram as the basic visual feature to characterize popular micro-videos. To reduce the size of the color space, we grouped it into 50 distinct colors, which results in a 50-d vector for each frame.

Footnote (Twitter Verification): A Vine account can be verified by Twitter if it is linked to a verified Twitter account.
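
The 50-color quantization in the Color Histogram paragraph might look like the sketch below. The text does not say how the color space was grouped, so the split used here (10 hue bins by 5 value bins in HSV space) is purely an assumption, as are the function and constant names.

```python
import colorsys
from typing import Iterable, List, Tuple

# Assumed grouping: 10 hue bins x 5 value bins = 50 colors.
# The original grouping scheme is not specified in the text.
H_BINS, V_BINS = 10, 5


def color_histogram(pixels: Iterable[Tuple[int, int, int]]) -> List[float]:
    """Return a normalized 50-d color histogram for 8-bit RGB pixels."""
    hist = [0.0] * (H_BINS * V_BINS)
    count = 0
    for r, g, b in pixels:
        h, _s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or v == 1.0 falls into the last bin.
        h_bin = min(int(h * H_BINS), H_BINS - 1)
        v_bin = min(int(v * V_BINS), V_BINS - 1)
        hist[h_bin * V_BINS + v_bin] += 1.0
        count += 1
    if count == 0:
        raise ValueError("frame contains no pixels")
    return [c / count for c in hist]
```

Normalizing by the pixel count keeps histograms comparable across frames of different resolutions, so they can be averaged over key frames afterwards.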

Object Features. It has been shown that the popularity of UGCs is strongly correlated with the objects contained in the videos [54]. We believe that the presence of certain objects affects the popularity of micro-videos. For example, micro-videos with "cute dogs" or "beautiful girls" are more likely to be popular than those with "desks" and "stones." We thus employed deep convolutional neural networks (CNNs) [82], a powerful model for image recognition problems [188], to detect objects in micro-videos. Specifically, we applied the well-trained AlexNet deep neural network (DNN) provided by the Caffe software package [71] to the input key frames. We took two outputs from AlexNet: the 4,096-d activation of the fc7 layer and the output of the final 1,000-way softmax layer, the latter being a probability distribution over the 1,000 class labels predefined in ImageNet. We treat them as our feature representation of each frame. In the end, mean pooling was performed over the frames to generate a single 4,096-d vector and a single 1,000-d vector for each micro-video.
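
To make the softmax-and-pool step concrete, the toy sketch below turns per-frame class scores into probability distributions and averages them. It does not call Caffe or AlexNet; the 3-class scores in the usage note stand in for the 1,000-way output, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pool_class_probabilities(frame_logits: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-frame softmax outputs into one video-level vector."""
    probs = [softmax(logits) for logits in frame_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]
```

For instance, pool_class_probabilities([[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]) returns a 3-d vector. Because each per-frame vector sums to one, the pooled vector also sums to one and can still be read as a distribution over class labels.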

SentiBank Features. We performed sentiment analysis on the visual modality because the sentiment of UGCs has been proven to be strongly correlated with their popularity [54]. In particular, we extracted the visual sentiment features using the deep CNN model trained on the SentiBank dataset [11]. SentiBank contains 2,089 concepts, each of which invokes specific sentiments, such as "cute girls" and "funny animals." Therefore, after mean pooling over the key frames, each micro-video is represented by a 2,089-d vector.

Aesthetic Features. Aesthetic features are a set of hand-selected features related to the principles of the nature and appreciation of beauty, which have been studied and found to be effective in popularity prediction [36]. Intuitively, micro-videos that are objectively aesthetic are more likely to be popular. We employed the released tool [10] to extract the following aesthetic features: (a) dark channel feature; (b) luminosity feature; (c) sharpness; (d) symmetry; (e) low depth of field; (f) white balance; (g) colorfulness; (h) color harmony; and (i) eye sensitivity, each computed on a 3 × 3 grid over each key frame. We then calculated, at the frame level: (a) the normalized area of the dominant object and (b) the normalized distances of the centroid of the dominant object with respect to four stress points. In the end, we obtained 149-d aesthetic features for each micro-video.
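
The idea of computing a feature per cell of a 3 × 3 grid can be sketched as follows. This is not the released tool: the function name is hypothetical, and the mean Rec. 601 luma used here is only an assumed stand-in for the tool's luminosity feature, whose exact definition is not given in the text.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def grid_mean_luma(frame: Sequence[Sequence[Pixel]], rows: int = 3, cols: int = 3) -> List[float]:
    """Return one mean-luma value in [0, 1] per grid cell, in row-major order."""
    height, width = len(frame), len(frame[0])
    if height < rows or width < cols:
        raise ValueError("frame is smaller than the grid")
    features = []
    for gr in range(rows):
        y0, y1 = gr * height // rows, (gr + 1) * height // rows
        for gc in range(cols):
            x0, x1 = gc * width // cols, (gc + 1) * width // cols
            # Rec. 601 luma weights (assumed stand-in for "luminosity").
            luma = [
                0.299 * r + 0.587 * g + 0.114 * b
                for y in range(y0, y1)
                for (r, g, b) in frame[y][x0:x1]
            ]
            features.append(sum(luma) / len(luma) / 255.0)
    return features
```

Applied to one key frame, this yields a 9-d vector, one value per cell. The same loop structure would apply to any other per-cell statistic.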

Visual Quality Assessment Features. It is important that the visual quality of popular content is maintained at an acceptable level, given rising consumer expectations of the quality of multimedia content delivered to them [140]. In particular, we employed the released tool to extract the quality features of micro-videos based on motion and spatio-temporal information, which have been proven to correlate highly with human visual judgments of quality. This results in a 46-d feature vector.

http://www.ee.columbia.edu/~subh/Software.php

http://live.ece.utexas.edu/
