Related Work

3.3 FEATURE EXTRACTION

3.3.4 ACOUSTIC MODALITY

The acoustic modality usually serves as an important complement to the visual modality in many video-related tasks, such as video classification [175]. In fact, the audio channels embedded in micro-videos may also contribute to their popularity to a large extent. For example, the audio channel may indicate the quality of a given micro-video and convey rich background information about the emotion as well as the scene contained in the micro-video, which significantly affects the popularity of a micro-video. The acoustic information is especially useful in cases where the visual features cannot carry enough information. Therefore, we adopted the following widely used acoustic features: mel-frequency cepstral coefficients (MFCC) [88] and Audio-Six (i.e., Energy Entropy, Signal Energy, Zero Crossing Rate, Spectral Rolloff, Spectral Centroid, and Spectral Flux [171]). These features are frequently used in different audio-related tasks, such as emotion detection and music recognition. We finally obtained a 36-d acoustic feature vector for each micro-video.
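To make the six Audio-Six descriptors more concrete, the following is a minimal sketch of how they are commonly computed per frame and then averaged over a clip. It is not the extraction pipeline used in this work: the function names, the frame length, the number of sub-blocks used for the energy entropy, and the 90% rolloff threshold are all illustrative assumptions, and exact definitions vary across toolkits. A naive DFT from the standard library is used only to keep the sketch self-contained; an FFT routine would be used in practice.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, via a naive O(N^2) DFT (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def frame_features(frame, prev_norm, sample_rate, n_blocks=8, rolloff=0.9):
    """Return ([six features], normalized spectrum) for one frame.

    n_blocks and rolloff are assumed values, not taken from the text.
    """
    n = len(frame)
    # Signal energy: mean squared amplitude.
    energy = sum(x * x for x in frame) / n
    # Zero crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)

    # Energy entropy: Shannon entropy of the energy share of each sub-block.
    size = n // n_blocks
    block_energy = [
        sum(x * x for x in frame[i * size:(i + 1) * size]) for i in range(n_blocks)
    ]
    total = sum(block_energy) or 1.0
    entropy = -sum((e / total) * math.log2(e / total) for e in block_energy if e > 0)

    mag = magnitude_spectrum(frame)
    mag_sum = sum(mag) or 1.0
    freqs = [k * sample_rate / n for k in range(len(mag))]
    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mag)) / mag_sum

    # Spectral rolloff: lowest frequency below which `rolloff` of the magnitude lies.
    cumulative, rolloff_freq = 0.0, freqs[-1]
    for f, m in zip(freqs, mag):
        cumulative += m
        if cumulative >= rolloff * mag_sum:
            rolloff_freq = f
            break

    # Spectral flux: squared change of the normalized spectrum between frames.
    norm = [m / mag_sum for m in mag]
    flux = 0.0 if prev_norm is None else sum((a - b) ** 2 for a, b in zip(norm, prev_norm))
    return [entropy, energy, zcr, rolloff_freq, centroid, flux], norm


def audio_six(signal, sample_rate, frame_len=64):
    """Average the six frame-level features over non-overlapping frames."""
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    totals, prev_norm = [0.0] * 6, None
    for frame in frames:
        feats, prev_norm = frame_features(frame, prev_norm, sample_rate)
        totals = [t + f for t, f in zip(totals, feats)]
    return [t / len(frames) for t in totals]
```

Calling `audio_six(samples, sample_rate)` on a list of float samples returns the six values in the order listed above (Energy Entropy, Signal Energy, Zero Crossing Rate, Spectral Rolloff, Spectral Centroid, Spectral Flux). How these descriptors are combined with MFCC to reach the 36-d vector is not specified in the text, so the sketch does not attempt it.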

3.3.5 TEXTUAL MODALITY

Micro-videos are usually associated with a textual modality in the form of descriptions, such as “when Leo finally gets the Oscar” and “Puppy dog dreams,” which may precisely summarize the micro-videos. Such summaries may convey the topics and sentiment of the micro-videos, which have been proven to be significant in online article popularity prediction [9].

Sentence2Vector. We found that popular micro-videos are sometimes related to the topics of their textual descriptions. This observation motivated us to conduct content analysis over the textual descriptions of micro-videos. Considering the short length of the descriptions, we employed the state-of-the-art textual feature extraction tool Sentence2Vector, which was developed on the basis of the word embedding algorithm Word2Vector [115]. In this way, we extracted 100-d features for the video descriptions.
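As a rough illustration of how a short description can be turned into a fixed-length vector from word embeddings, the sketch below averages per-word vectors into a single 100-d vector. This is a simple stand-in technique, not the algorithm implemented by the Sentence2Vector tool. The function names are hypothetical, and `toy_word_vector` produces deterministic pseudo-random vectors in place of trained embeddings, so the resulting values carry no semantic meaning.

```python
import hashlib
import random

DIM = 100  # matches the 100-d description features mentioned in the text


def toy_word_vector(word, dim=DIM):
    """Deterministic pseudo-random vector standing in for a trained word embedding."""
    seed = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def description_vector(text, dim=DIM):
    """Average the word vectors of a description (a stand-in for sentence embedding)."""
    words = text.lower().split()
    if not words:
        # An empty description maps to the zero vector.
        return [0.0] * dim
    vectors = [toy_word_vector(w, dim) for w in words]
    # zip(*vectors) iterates over dimensions; each column is averaged across words.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

For example, `description_vector("Puppy dog dreams")` returns a list of 100 floats. With real embeddings, descriptions about similar topics would end up close to each other in this space, which is the property the content analysis relies on.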

Textual Sentiment. We also analyzed the sentiment of the text, which has been proven to play an important role in popularity prediction [8]. With the help of the Sentiment Analysis tool in the Stanford CoreNLP tools, we assigned each micro-video a sentiment score ranging from 0 to 4, corresponding to very negative, negative, neutral, positive, and very positive, respectively.
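The five-level score scale can be written down as a small lookup table. The sketch below only encodes the mapping stated in the text; the names `SENTIMENT_LABELS` and `sentiment_label` are hypothetical, and it does not call Stanford CoreNLP itself.

```python
# Index i holds the label for sentiment score i (0 to 4), as described in the text.
SENTIMENT_LABELS = (
    "very negative",
    "negative",
    "neutral",
    "positive",
    "very positive",
)


def sentiment_label(score):
    """Return the label for an integer sentiment score in the range 0 to 4."""
    if not isinstance(score, int) or not 0 <= score < len(SENTIMENT_LABELS):
        raise ValueError("sentiment score must be an integer from 0 to 4")
    return SENTIMENT_LABELS[score]
```

For instance, `sentiment_label(2)` yields `"neutral"`, the midpoint of the scale.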

Sentence2Vector: https://github.com/klb3713/sentence2vec

Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/
