ferent modalities, whereby the information in each modality is insufficient to generate the same
label prediction.
Multiple Kernel Learning [45] leverages a predefined set of kernels corresponding to
different views and learns an optimal linear or nonlinear combination of kernels to boost the
performance. Lanckriet et al. [83] constructed a convex quadratically constrained quadratic program by conically combining the multiple kernels from a library of candidate kernels and
applied the method to several applications. To scale the method to large datasets, Bach et al. [4] reformulated the dual as a second-order cone program and developed a sequential minimal optimization algorithm to obtain the optimal solution. Further, Ying and
Campbell [184] used the metric entropy integrals and pseudo-dimension of a set of candidate
kernels to estimate the empirical Rademacher chaos complexity.
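In its basic form (a standard sketch; the simplex constraint below is the usual convex parameterization rather than the exact setup of any one of the works above), MKL learns a conic combination of $M$ base kernels,
$$k(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} \mu_m \, k_m(\mathbf{x}, \mathbf{x}'), \qquad \mu_m \ge 0, \quad \sum_{m=1}^{M} \mu_m = 1,$$
where the weights $\mu_m$ are optimized jointly with the classifier parameters.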
Subspace learning [161] obtains a latent subspace shared by multiple views by assuming
that the input views are generated from this subspace. The dimensionality of the subspace is lower than that of any input view, so subspace learning alleviates the "curse of dimensionality." Canonical correlation analysis (CCA) [65] can be directly applied to obtain the shared latent subspace by maximizing the correlation between the views. Since CCA captures only linear relations, however, it is ill-suited to real-world datasets exhibiting nonlinearities. To address this limitation, Akaho [1] proposed a kernel variant of CCA, namely KCCA.
Diethe et al. [37] proposed a multi-view Fisher discriminant analysis that exploits label information to find projections that are more informative in supervised learning settings. Recently, Zhai et al. [187] studied multi-view metric learning by constructing embedding projections from multi-view data to a shared subspace. Although subspace learning approaches alleviate the "curse of dimensionality," the dimensionality of the subspace varies with the task.
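For reference, in the two-view case CCA seeks projection directions $\mathbf{w}_x$ and $\mathbf{w}_y$ (notation ours) that maximize the correlation between the projected views,
$$\rho = \max_{\mathbf{w}_x, \mathbf{w}_y} \frac{\mathbf{w}_x^{\top} \Sigma_{xy} \mathbf{w}_y}{\sqrt{\mathbf{w}_x^{\top} \Sigma_{xx} \mathbf{w}_x} \sqrt{\mathbf{w}_y^{\top} \Sigma_{yy} \mathbf{w}_y}},$$
where $\Sigma_{xy}$ is the cross-covariance matrix of the two views; KCCA [1] captures nonlinear correlations by evaluating the same objective in kernel-induced feature spaces.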
Overall, compelling success has been achieved by multi-view learning models on various
problems, such as categorization [145, 150], clustering [19, 51], and multimedia retrieval [91,
92]. However, to the best of our knowledge, limited efforts have been dedicated to applying
multi-view learning in the context of micro-video understanding, which is the major concern of
our work.
3.4.3 LOW-RANK SUBSPACE LEARNING
In recent years, low-rank representation [95, 198–200] has been considered a promising tech-
nique for exploring the latent low-dimensional representation embedded in the original space.
Low-rank subspace learning has been applied to a wide range of machine learning tasks, in-
cluding matrix recovery [210], image classification [106, 197], subspace segmentation [96], and
missing modality recognition [40].
Robust principal component analysis (RPCA) [74] is a popular low-rank matrix recov-
ery method for high-dimensional data processing. This method aims to decompose a data ma-
trix into a low-rank matrix and a sparse error matrix. To promote the discriminative ability of
the original RPCA and improve the robust representation of corrupted data, Chen et al. [22]
presented a novel low-rank matrix approximation method with a structural incoherence con-
straint, which decomposes the raw data into a set of representative bases with associated sparse
error matrices. Based on the principle of self-representation, Liu et al. [95] proposed the low-
rank representation (LRR) method to search for the lowest-rank representation among all the
candidates. To overcome the limitations of LRR in handling unobserved, insufficient, and
extremely noisy data, Liu and Yan [96] further developed an advanced version of LRR, called
latent low-rank representation (LatLRR), for subspace segmentation. Zhang et al. [196] pro-
posed a structured low-rank representation method for image classification, which constructs
a semantic-structured and constructive dictionary by incorporating class label information into
the training stage. Zhou et al. [205] provided a novel supervised and low-rank-based discrim-
inative feature learning method that integrates LatLRR with ridge regression to minimize the
classification error directly.
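As a concrete reference point (these are the standard formulations, with $\lambda > 0$ a trade-off parameter), RPCA decomposes a data matrix $X$ into a low-rank component $L$ and a sparse error $E$,
$$\min_{L, E} \; \|L\|_{*} + \lambda \|E\|_{1} \quad \text{s.t.} \quad X = L + E,$$
while LRR instead seeks the lowest-rank self-representation coefficient matrix $Z$,
$$\min_{Z, E} \; \|Z\|_{*} + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E,$$
where $\|\cdot\|_{*}$ denotes the nuclear norm and the $\ell_{2,1}$ norm encourages column-wise sparse errors.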
To handle data that are generated from multiple views in many real-world applications,
some multi-view low-rank subspace learning methods have been developed to search for a latent low-dimensional common subspace that captures the commonality among all the
views. For example, Xia et al. [176] proposed to construct a transition probability matrix from
each view and then recover a shared low-rank transition probability matrix via low-rank and
sparse decomposition. Liu et al. [101] presented a novel low-rank multi-view matrix completion
(lrMMC) method for multi-label image classification, where a set of basic matrices are learned
by minimizing the reconstruction errors and the rank of the latent common representation. In
the case that the view information of the testing data is unknown, Ding and Fu [39] proposed
a novel low-rank common subspace (LRCS) algorithm in a weakly supervised setting, where
only the view information is employed in the training phase. In [41], a dual low-rank decompo-
sition model was developed to learn a low-dimensional view-invariant subspace. To guide the
decomposition process, two supervised graph regularizers were considered to separate the class
structure and view structure. Li et al. [86] proposed a novel approach, named low-rank discrim-
inant embedding (LRDE), by considering the correlations between views and the geometric
structures contained within each view simultaneously. These multi-view low-rank learning approaches have proven effective when different feature views are complementary to each other.
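To make the shared pattern concrete, a schematic multi-view low-rank objective (our illustrative sketch in the spirit of lrMMC [101], not the exact formulation of any method above) couples view-specific basis matrices $B_v$ with a shared representation $C$,
$$\min_{\{B_v\}, C} \; \sum_{v=1}^{V} \|X_v - B_v C\|_{F}^{2} + \lambda \|C\|_{*},$$
where the nuclear norm on $C$ enforces a low-rank common subspace that captures the commonality among all $V$ views.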
Although low-rank representation enables an effective learning mechanism in exploring
the low-rank structure in noisy datasets [177], only a limited number of low-rank models have been developed to address popularity prediction in social networks. The prediction of video popularity can be cast as a standard regression problem. To the best of our knowledge, one of the works most closely related to our approach is introduced in [202], in which a multi-view low-rank
regression model is presented by imposing low-rank constraints on the multi-view regression
model. However, in that work, the structure and relations among different views were ignored.
To overcome this drawback, we propose to learn a set of view-specific projections by maximizing
the total correlations among views to map multi-view features into a common space. Another