ferent modalities, whereby the information in each modality is insufficient to generate the same
label prediction.
Multiple Kernel Learning [45] leverages a predefined set of kernels corresponding to
different views and learns an optimal linear or nonlinear combination of kernels to boost the
performance. Lanckriet et al. [83] constructed a convex quadratically constrained quadratic program by conically combining the multiple kernels from a library of candidate kernels and
applied the method to several applications. To scale the method to large datasets, Bach et al. [4] reformulated the dual as a second-order cone program and developed a sequential minimal optimization algorithm to obtain the optimal solution. Further, Ying and
Campbell [184] used the metric entropy integrals and pseudo-dimension of a set of candidate
kernels to estimate the empirical Rademacher chaos complexity.
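In its basic form (a standard sketch; the simplex constraint below is the usual convex parameterization rather than the exact setup of any one of the works above), MKL learns a conic combination of $M$ base kernels,
$$k(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} \mu_m \, k_m(\mathbf{x}, \mathbf{x}'), \qquad \mu_m \ge 0, \quad \sum_{m=1}^{M} \mu_m = 1,$$
where the weights $\mu_m$ are optimized jointly with the classifier parameters.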
Subspace learning [161] obtains a latent subspace shared by multiple views by assuming
that the input views are generated from this subspace. The dimensionality of the subspace is lower than that of any input view, so subspace learning alleviates the "curse of dimensionality." Canonical correlation analysis (CCA) [65] can be directly applied to obtain the shared latent subspace by maximizing the correlation between the views. Since CCA captures only linear relations, however, it is ill-suited to real-world datasets exhibiting nonlinearities. To address this limitation, Akaho [1] proposed a kernel variant of CCA, namely KCCA.
Diethe et al. [37] proposed a multi-view Fisher discriminant analysis that exploits label information to find projections that are more informative in supervised learning settings. Recently, Zhai et al. [187] studied multi-view metric learning by constructing embedding projections from multi-view data to a shared subspace. Although subspace learning approaches alleviate the "curse of dimensionality," the dimensionality of the subspace varies with the task.
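For reference, in the two-view case CCA seeks projection directions $\mathbf{w}_x$ and $\mathbf{w}_y$ (notation ours) that maximize the correlation between the projected views,
$$\rho = \max_{\mathbf{w}_x, \mathbf{w}_y} \frac{\mathbf{w}_x^{\top} \Sigma_{xy} \mathbf{w}_y}{\sqrt{\mathbf{w}_x^{\top} \Sigma_{xx} \mathbf{w}_x} \sqrt{\mathbf{w}_y^{\top} \Sigma_{yy} \mathbf{w}_y}},$$
where $\Sigma_{xy}$ is the cross-covariance matrix of the two views; KCCA [1] captures nonlinear correlations by evaluating the same objective in kernel-induced feature spaces.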
Overall, compelling success has been achieved by multi-view learning models on various
problems, such as categorization [145, 150], clustering [19, 51], and multimedia retrieval [91,
92]. However, to the best of our knowledge, limited efforts have been dedicated to applying
multi-view learning in the context of micro-video understanding, which is the major concern of
our work.
3.4.3 LOW-RANK SUBSPACE LEARNING
In recent years, low-rank representation [95, 198–200] has been considered a promising tech-
nique for exploring the latent low-dimensional representation embedded in the original space.
Low-rank subspace learning has been applied to a wide range of machine learning tasks, in-
cluding matrix recovery [210], image classification [106, 197], subspace segmentation [96], and
missing modality recognition [40].
Robust principal component analysis (RPCA) [74] is a popular low-rank matrix recov-
ery method for high-dimensional data processing. This method aims to decompose a data ma-
trix into a low-rank matrix and a sparse error matrix. To promote the discriminative ability of
the original RPCA and improve the robust representation of corrupted data, Chen et al. [22]
presented a novel low-rank matrix approximation method with a structural incoherence con-
straint, which decomposes the raw data into a set of representative bases with associated sparse
error matrices. Based on the principle of self-representation, Liu et al. [95] proposed the low-
rank representation (LRR) method to search for the lowest-rank representation among all the
candidates. To overcome the limitations of LRR in handling unobserved, insufficient, and
extremely noisy data, Liu and Yan [96] further developed an advanced version of LRR, called
latent low-rank representation (LatLRR), for subspace segmentation. Zhang et al. [196] pro-
posed a structured low-rank representation method for image classification, which constructs
a semantic-structured and constructive dictionary by incorporating class label information into
the training stage. Zhou et al. [205] provided a novel supervised and low-rank-based discrim-
inative feature learning method that integrates LatLRR with ridge regression to minimize the
classification error directly.
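As a concrete reference point (these are the standard formulations, with $\lambda > 0$ a trade-off parameter), RPCA decomposes a data matrix $X$ into a low-rank component $L$ and a sparse error $E$,
$$\min_{L, E} \; \|L\|_{*} + \lambda \|E\|_{1} \quad \text{s.t.} \quad X = L + E,$$
while LRR instead seeks the lowest-rank self-representation coefficient matrix $Z$,
$$\min_{Z, E} \; \|Z\|_{*} + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E,$$
where $\|\cdot\|_{*}$ denotes the nuclear norm and the $\ell_{2,1}$ norm encourages column-wise sparse errors.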
To handle data that are generated from multiple views in many real-world applications,
some multi-view low-rank subspace learning methods have been developed to search for a latent low-dimensional common subspace that captures the commonality among all the
views. For example, Xia et al. [176] proposed to construct a transition probability matrix from
each view and then recover a shared low-rank transition probability matrix via low-rank and
sparse decomposition. Liu et al. [101] presented a novel low-rank multi-view matrix completion
(lrMMC) method for multi-label image classification, where a set of basic matrices are learned
by minimizing the reconstruction errors and the rank of the latent common representation. In
the case that the view information of the testing data is unknown, Ding and Fu [39] proposed
a novel low-rank common subspace (LRCS) algorithm in a weakly supervised setting, where
only the view information is employed in the training phase. In [41], a dual low-rank decompo-
sition model was developed to learn a low-dimensional view-invariant subspace. To guide the
decomposition process, two supervised graph regularizers were considered to separate the class
structure and view structure. Li et al. [86] proposed a novel approach, named low-rank discrim-
inant embedding (LRDE), by considering the correlations between views and the geometric
structures contained within each view simultaneously. These multi-view low-rank learning approaches have proven effective when different feature views are complementary to each other.
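To make the shared pattern concrete, a schematic multi-view low-rank objective (our illustrative sketch in the spirit of lrMMC [101], not the exact formulation of any method above) couples view-specific basis matrices $B_v$ with a shared representation $C$,
$$\min_{\{B_v\}, C} \; \sum_{v=1}^{V} \|X_v - B_v C\|_{F}^{2} + \lambda \|C\|_{*},$$
where the nuclear norm on $C$ enforces a low-rank common subspace that captures the commonality among all $V$ views.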
Although low-rank representation enables an effective learning mechanism in exploring
the low-rank structure in noisy datasets [177], only a limited number of low-rank models have been developed to address popularity prediction in social networks. The prediction of video popularity can be cast as a standard regression problem. To the best of our knowledge, one of the works most closely related to our approach is introduced in [202], in which a multi-view low-rank
regression model is presented by imposing low-rank constraints on the multi-view regression
model. However, in that work, the structure and relations among different views were ignored.
To overcome this drawback, we propose to learn a set of view-specific projections by maximizing
the total correlations among views to map multi-view features into a common space. Another