3.6. MULTIMODAL TRANSDUCTIVE LEARNING
modality can be represented as $\mathbf{X}^k \in \mathbb{R}^{(N+M) \times Z_k}$. The popularity of all the videos is denoted by $\mathbf{y} = \{y_1, y_2, \ldots, y_N\}^T \in \mathbb{R}^N$. Let $\mathbf{f} = \{f_1, f_2, \ldots, f_N, f_{N+1}, f_{N+2}, \ldots, f_{N+M}\}^T \in \mathbb{R}^{N+M}$ stand for the predicted popularity of all samples, including the labeled and unlabeled ones. We aim to jointly learn the common space $\mathbf{X}^0 \in \mathbb{R}^{(N+M) \times Z_0}$ shared by multiple modalities and the popularity of the $M$ unlabeled micro-videos.
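To make the notation concrete, the following NumPy sketch builds placeholder feature matrices $\mathbf{X}^k$, the label vector $\mathbf{y}$, and the prediction vector $\mathbf{f}$ with the shapes defined above; the values of $N$, $M$, and $Z_k$ are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 8, 4            # labeled and unlabeled micro-videos (illustrative)
Z = [16, 32, 10, 12]   # per-modality feature dimensions Z_k (illustrative)

# One feature matrix X^k in R^{(N+M) x Z_k} per modality.
X = [rng.standard_normal((N + M, Zk)) for Zk in Z]

# Popularity labels y in R^N exist only for the N labeled videos.
y = rng.random(N)

# Predictions f in R^{N+M} cover both labeled and unlabeled videos.
f = np.zeros(N + M)

print([Xk.shape for Xk in X], y.shape, f.shape)
```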
We present a novel Transductive Multi-modAL Learning approach, TMALL for short, to predict the popularity of micro-videos. As illustrated in Figure 3.1, we first crawl a representative micro-video dataset from Vine and develop a rich set of popularity-oriented features from multiple modalities. We then perform multi-modal learning to predict the popularity of micro-videos, which seamlessly takes the modality relatedness and modality limitation into account by utilizing a common space shared by all modalities. We assume that there exists an optimal common space, which maintains the intrinsic characteristics of micro-videos in their original spaces. In light of this, all modalities are forced to be correlated. Meanwhile, micro-videos with different popularity can be better separated in such an optimal common space than in each single modality, which in a sense alleviates the modality limitation problem. It is worth mentioning that, in this work, we aim to predict how popular a given micro-video will be once its propagation is stable, rather than when the given micro-video will become popular.
Figure 3.1: Micro-video popularity prediction via our proposed TMALL model.
3.6.1 OBJECTIVE FORMULATION
It is apparent that different modalities may contribute distinctive and complementary information about micro-videos. For example, the textual modality gives us hints about the topics of a given micro-video; the acoustic and visual modalities may convey the location and situation of micro-videos, respectively; and the user modality reflects the influence of the micro-video publisher. These clues jointly contribute to the popularity of a micro-video. Obviously, due to the noise and information insufficiency of each modality, it may be suboptimal to conduct learning directly
from each single modality separately. In contrast, we assume that there exists an optimal latent space in which micro-videos can be better described. Moreover, this optimal latent space should maintain the original intrinsic characteristics conveyed by the multiple modalities of the given micro-videos. Therefore, we penalize the disagreement of the normalized Laplacian matrices between the latent space and each modality. In particular, we formalize this assumption as follows. Let $\mathbf{S}^k \in \mathbb{R}^{(N+M) \times (N+M)}$ be the similarity matrix,$^7$ which is computed by the Gaussian similarity function as follows:
$$
\mathbf{S}^k(i,j) =
\begin{cases}
\exp\left(-\dfrac{\left\|\mathbf{x}_i^k - \mathbf{x}_j^k\right\|^2}{2\sigma_k^2}\right), & \text{if } i \neq j,\\[2mm]
0, & \text{if } i = j,
\end{cases}
\tag{3.5}
$$
where $\mathbf{x}_i^k$ and $\mathbf{x}_j^k$ are a pair of micro-videos in the $k$-th modality space. Therein, the radius parameter $\sigma_k$ is simply set as the median of the Euclidean distances over all video pairs in the $k$-th modality. We then derive the corresponding normalized Laplacian matrix as follows:
$$
\mathcal{L}(\mathbf{S}^k) = \mathbf{I} - \mathbf{D}_k^{-\frac{1}{2}} \mathbf{S}^k \mathbf{D}_k^{-\frac{1}{2}},
\tag{3.6}
$$
where $\mathbf{I}$ is an $(N+M) \times (N+M)$ identity matrix and $\mathbf{D}_k \in \mathbb{R}^{(N+M) \times (N+M)}$ is the diagonal degree matrix, whose $(u,u)$-th entry is the sum of the $u$-th row of $\mathbf{S}^k$. Since $\mathbf{S}^k(i,j) \geq 0$, we can derive that $\operatorname{tr}(\mathcal{L}(\mathbf{S}^k)) > 0$. We thus formulate the disagreement penalty between the latent space and the original modalities as
$$
\sum_{k=1}^{K} \left\| \frac{1}{\operatorname{tr}\left(\mathcal{L}(\mathbf{S}^0)\right)} \mathcal{L}(\mathbf{S}^0) - \frac{1}{\operatorname{tr}\left(\mathcal{L}(\mathbf{S}^k)\right)} \mathcal{L}(\mathbf{S}^k) \right\|_F^2,
\tag{3.7}
$$
where $\operatorname{tr}(\mathbf{A})$ is the trace of matrix $\mathbf{A}$ and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. In addition, inspired by [164] and considering that similar micro-videos tend to have similar popularity in the latent common space, we adopt the following regularizer:
$$
\frac{1}{2}\sum_{m=1}^{N+M}\sum_{n=1}^{N+M}\left(\frac{f(\mathbf{x}_m^0)}{\sqrt{\mathbf{D}_0(\mathbf{x}_m^0)}} - \frac{f(\mathbf{x}_n^0)}{\sqrt{\mathbf{D}_0(\mathbf{x}_n^0)}}\right)^{2}\mathbf{S}^0(m,n) = \mathbf{f}^T \mathcal{L}(\mathbf{S}^0)\,\mathbf{f}.
\tag{3.8}
$$
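As a sanity check, the quantities in Eqs. (3.5)–(3.8) can be computed directly in NumPy, and the identity in Eq. (3.8) (the smoothness double sum equals the quadratic form) can be verified numerically on random data. This is a minimal sketch under illustrative sizes, not the authors' implementation:

```python
import numpy as np

def gaussian_similarity(Xk):
    """Eq. (3.5): S^k(i,j) = exp(-||x_i - x_j||^2 / (2 sigma_k^2)), zero diagonal.
    sigma_k is the median Euclidean distance over all pairs, as in the text."""
    n = Xk.shape[0]
    sq = ((Xk[:, None, :] - Xk[None, :, :]) ** 2).sum(-1)   # squared distances
    sigma = np.median(np.sqrt(sq[np.triu_indices(n, k=1)])) # median over pairs
    S = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S

def normalized_laplacian(S):
    """Eq. (3.6): L(S) = I - D^{-1/2} S D^{-1/2}."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(S.sum(axis=1)))
    return np.eye(S.shape[0]) - d_inv_sqrt @ S @ d_inv_sqrt

def disagreement_penalty(L0, Ls):
    """Eq. (3.7): sum_k || L0/tr(L0) - Lk/tr(Lk) ||_F^2."""
    return sum(np.linalg.norm(L0 / np.trace(L0) - Lk / np.trace(Lk), 'fro') ** 2
               for Lk in Ls)

rng = np.random.default_rng(0)
n = 12                                                 # N + M (illustrative)
X0 = rng.standard_normal((n, 5))                       # latent common space
Xs = [rng.standard_normal((n, 8)) for _ in range(3)]   # K = 3 modalities

S0 = gaussian_similarity(X0)
L0 = normalized_laplacian(S0)
Ls = [normalized_laplacian(gaussian_similarity(Xk)) for Xk in Xs]
penalty = disagreement_penalty(L0, Ls)

# Eq. (3.8): the double sum over degree-normalized prediction differences
# equals the quadratic form f^T L(S^0) f.
f = rng.standard_normal(n)
g = f / np.sqrt(S0.sum(axis=1))                        # f(x_m) / sqrt(D_0(x_m))
double_sum = 0.5 * ((g[:, None] - g[None, :]) ** 2 * S0).sum()
quad_form = f @ L0 @ f
assert np.isclose(double_sum, quad_form)
```

Note that since the diagonal of $\mathbf{S}^0$ is zero, $\operatorname{tr}(\mathcal{L}(\mathbf{S}^0))$ equals $N+M$, which the trace normalization in Eq. (3.7) relies on being strictly positive.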
Based upon these formulations, we can define the loss function that measures the empirical error on the training samples. As reported in [123], the squared loss usually yields performance as good as that of other, more complex losses. We thus adopt the squared loss in our algorithm for simplicity and efficiency. In particular, since we do not have the labels for testing samples, we only
$^7$To facilitate the illustration, $k$ ranges from $0$ to $K$.