Multimodal Cooperative Learning for Micro-Video Venue Categorization


56 3. MULTIMODAL TRANSDUCTIVE LEARNING

improvement in the micro-video popularity prediction tasks; (5) as stated in [67], SVR provides a suboptimal learning solution compared to ELM. Accordingly, the results show that ELM achieves better prediction performance than SVR; and (6) although MSNL and TMALL are appropriate for dealing with incomplete data, TLRMVR still outperforms them, thus demonstrating the effectiveness of our approach.

Complexity Discussion. In order to analyze the complexity of TLRMVR, we suppose that the number of samples is larger than the dimension of data, i.e., $(N + M) > (D_1 + D_2 + \cdots + D_K)$. As discussed previously, the main computational cost comes from the following parts.

• nuclear norm calculation in step 3,

• matrix inverse calculation in step 5, and

• solving the Lyapunov equation in step 6.
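
The three operations listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the TLRMVR implementation: the matrices are random placeholders, `n` stands in for N + M, and the Lyapunov solver below assumes a symmetric coefficient matrix, which this excerpt does not confirm. Each routine performs a cubic-cost factorization of an n-by-n matrix.

```python
import numpy as np


def nuclear_norm(X):
    """Nuclear norm = sum of singular values; the SVD dominates the cost."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return float(singular_values.sum())


def solve_lyapunov_symmetric(A, Q):
    """Solve A X + X A^T = Q for X.

    Assumption (for illustration only): A is symmetric and no two of its
    eigenvalues sum to zero, so one eigendecomposition suffices.
    """
    eigvals, V = np.linalg.eigh(A)               # A = V diag(eigvals) V^T
    Q_tilde = V.T @ Q @ V                        # rotate Q into the eigenbasis
    denom = eigvals[:, None] + eigvals[None, :]  # lambda_i + lambda_j
    Y = Q_tilde / denom                          # elementwise solve
    return V @ Y @ V.T                           # rotate back


# Hypothetical toy problem; n plays the role of N + M.
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)  # symmetric positive definite placeholder
Q = np.eye(n)

A_inv = np.linalg.inv(A)     # matrix inverse
X = solve_lyapunov_symmetric(A, Q)

print(nuclear_norm(A))
print(np.allclose(A @ A_inv, np.eye(n)))
print(np.allclose(A @ X + X @ A.T, Q))
```

For a general (non-symmetric) coefficient matrix, a Schur-based solver would be the usual choice; the eigendecomposition route is used here only to keep the sketch dependent on NumPy alone.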

The computational complexity of the nuclear norm is at most $O((N + M)^3)$. The matrix inverse costs $O((N + M)^3)$. Solving the Lyapunov equation typically costs $O((N + M)^3)$. If the algorithm converges within $T$ iterations of its outer loop, the upper bound of the complexity is $O(3T(N + M)^3)$. The simulations of our proposed algorithm were carried out in a MATLAB 7.0.1 environment on a Core 3 Quad 3.6-GHz CPU with 8 GB of RAM. The learning and testing processes over all micro-videos can be accomplished within 1,627 s. The speed bottleneck lies in the number of samples. To handle large-scale datasets, the method of Coppersmith and Winograd [32] accelerates matrix inversion to $O((N + M)^{2.376})$, and Liu et al. [94] offered a more efficient method for the nuclear norm calculation.
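
The bound above can be checked with simple arithmetic. The sketch below counts operations up to constant factors for three kernels per outer iteration; the sample count and iteration count are hypothetical values chosen for illustration, not figures from the experiments, and `total_cost` is a helper name introduced here.

```python
def total_cost(num_samples, num_iterations, exponents=(3.0, 3.0, 3.0)):
    """Operation-count bound (constants dropped) for the outer loop.

    exponents holds one exponent per kernel, in the order:
    nuclear norm, matrix inverse, Lyapunov equation.
    """
    per_iteration = sum(num_samples ** e for e in exponents)
    return num_iterations * per_iteration


n = 1000  # hypothetical value of N + M
T = 50    # hypothetical number of outer iterations

baseline = total_cost(n, T)  # equals 3 * T * n**3
faster_inverse = total_cost(n, T, exponents=(3.0, 2.376, 3.0))

print(baseline)
print(faster_inverse)
print(faster_inverse / baseline)
```

With only the inversion accelerated, the two remaining cubic kernels still dominate, so the ratio tends toward 2/3 as the sample count grows. This suggests why a cheaper nuclear norm computation matters as well.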

3.8 SUMMARY

In this chapter, we first present a novel transductive multi-modal learning method (TMALL) to predict the popularity of micro-videos. In particular, TMALL works by learning an optimal latent common space from the multiple modalities of the given micro-videos, in which the popularity of micro-videos is much more distinguishable. The latent common space is capable of unifying and preserving information from different modalities, and it helps to alleviate the modality limitation problem. To verify our model, we built a benchmark dataset and extracted a rich set of popularity-oriented features to characterize micro-videos from multiple perspectives. By conducting extensive experiments, we draw the following conclusions: (1) the optimal latent common space exists and works; (2) the more modalities we incorporate to learn the common space, the more discriminant it is; and (3) the features extracted to describe the social and content influence are representative.

We also introduce a novel low-rank multi-view embedding framework to alleviate the problems of heterogeneous, interconnected, and noisy data in micro-video popularity prediction. By taking advantage of low-rank representation and multi-view learning, we effectively integrated all heterogeneous features extracted from different views into a common feature subspace and achieved a more robust feature representation for regression analysis. We also designed an effective optimization algorithm to solve the proposed model.
