56 3. MULTIMODAL TRANSDUCTIVE LEARNING
improvement in the micro-video popularity prediction tasks; (5) as stated in [67], SVR provides
a suboptimal learning solution compared to ELM. Accordingly, the results show that ELM
achieves better prediction performance than SVR; and (6) although MSNL and TMALL are
suited to dealing with incomplete data, TLRMVR still outperforms them, demonstrating
the effectiveness of our approach.
Complexity Discussion In order to analyze the complexity of TLRMVR, we suppose that
the number of samples is larger than the dimension of the data, i.e.,
$(N + M) > (D_1 + D_2 + \cdots + D_K)$. As discussed previously, the main computational
cost comes from the following parts:
• nuclear norm calculation in step 3,
• matrix inverse calculation in step 5, and
• solving the Lyapunov equation in step 6.
The computational complexity of the nuclear norm is at most $O((N + M)^3)$. The matrix inverse
costs $O((N + M)^3)$. Solving the Lyapunov equation typically costs $O((N + M)^3)$. If the
algorithm converges within $T$ iterations of its outer loop, the upper bound of the complexity
is $O(3T(N + M)^3)$. The simulations of our proposed algorithm are carried out in a MATLAB
7.0.1 environment running on a Core 3 Quad, 3.6-GHz CPU with 8 GB of RAM. The learning
and testing processes over all micro-videos can be accomplished within 1,627 s. The speed
bottleneck lies in the number of samples. To handle large-scale datasets, the method presented
by Coppersmith and Winograd [32] can accelerate matrix inversion to $O((N + M)^{2.376})$, and
Liu et al. [94] offered a more efficient method for nuclear norm calculation.
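To make the cubic bound concrete, the following minimal sketch evaluates the operation-count
estimate from the complexity discussion for a given number of samples and outer iterations.
The function names and the sample values for N + M and T are illustrative assumptions, not
part of the original experiments, and constant factors hidden by the big-O notation are
ignored.

```python
def cubic_cost(n_samples: int, iterations: int, ops_per_iter: int = 3) -> int:
    """Upper-bound operation count ops_per_iter * T * n^3, where n = N + M.

    ops_per_iter = 3 reflects the three cubic-cost steps per outer iteration:
    nuclear norm, matrix inverse, and Lyapunov equation.
    """
    return ops_per_iter * iterations * n_samples ** 3


def inversion_speedup(n_samples: int, fast_exponent: float = 2.376) -> float:
    """Asymptotic ratio n^3 / n^fast_exponent for the inversion step alone."""
    return n_samples ** (3 - fast_exponent)


if __name__ == "__main__":
    n = 10_000  # hypothetical N + M (assumption, not from the source)
    t = 20      # hypothetical number of outer iterations T (assumption)
    print(f"bound for n={n}, T={t}: {cubic_cost(n, t):.3e} operations")
    growth = cubic_cost(2 * n, t) / cubic_cost(n, t)
    print(f"doubling n multiplies the bound by {growth:.0f}")
    print(f"asymptotic inversion speedup at n={n}: {inversion_speedup(n):.1f}x")
```

The sketch only illustrates why the number of samples is the bottleneck: the bound grows
eightfold whenever N + M doubles, while it grows only linearly in T. The speedup figure is an
asymptotic ratio and says nothing about practical running time.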
3.8 SUMMARY
In this chapter, we first present a novel transductive multi-modal learning method (TMALL)
to predict the popularity of micro-videos. In particular, TMALL works by learning an optimal
latent common space from the multiple modalities of the given micro-videos, in which the
popularity of micro-videos is much more distinguishable. The latent common space is capable of
unifying and preserving information from different modalities, and it helps to alleviate the modality
limitation problem. To verify our model, we built a benchmark dataset and extracted a rich
set of popularity-oriented features to characterize micro-videos from multiple perspectives. By
conducting extensive experiments, we draw the following conclusions: (1) the optimal latent
common space exists and works; (2) the more modalities we incorporate to learn the common
space, the more discriminant it is; and (3) the features extracted to describe the social and content
influence are representative.
We also introduce a novel low-rank multi-view embedding framework to address the problems
of heterogeneous, interconnected, and noisy data in micro-video popularity prediction. By
taking advantage of low-rank representation and multi-view learning, we effectively integrated
all heterogeneous features extracted from different views into a common feature subspace and
obtained a more robust feature representation for regression analysis. We also designed an
effective optimization algorithm to solve the proposed model.