3.4. RELATED WORK
timents in video popularity propagation into account but also reveals more underlying factors
that determine the popularity of a video.
However, the aforementioned studies do not consider the combined impact of hetero-
geneous, interconnected, and noisy data. In contrast, our proposed scheme not only pursues a
solid fusion of heterogeneous multi-view features based on their complementary characteristics
but also exploits the advantages of the low-rank representation to learn robust features from
incomplete and noisy data. As a complement, we propose a novel multi-modal learning scheme
that predicts the popularity of a given micro-video in a timely fashion, even before it is published.
3.4.2 MULTI-VIEW LEARNING
Technically speaking, traditional multimodal fusion approaches consist of early fusion and late
fusion. Early fusion approaches, such as [42, 146], typically concatenate the unimodal features
extracted from each individual modality into a single representation to adapt to the learning set-
ting. One can then devise a classifier, such as a neural network, that treats the overall represen-
tation as its input. However, these approaches generally overlook the fact that each view has its
own specific statistical properties, and they ignore the structural relatedness among views.
Hence, they fail to explore the cross-modal correlations that would strengthen the expressiveness
of each modality and further improve the capacity of the fusion method. Late fusion performs the
learning directly over the unimodal features, and the prediction scores are then fused to predict
the venue category, e.g., by averaging [144], voting [118], or weighting [134]. Although this fusion
method is flexible and easy to implement, it overlooks the correlations in the mixed feature space.
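The contrast between the two paradigms can be sketched as follows (a minimal illustration only; the function names and the choice of three modalities are our assumptions, not taken from the cited works):

```python
import numpy as np

def early_fusion_features(visual, acoustic, textual):
    """Early fusion: concatenate per-modality feature vectors into a
    single representation, which is then fed to one classifier."""
    return np.concatenate([visual, acoustic, textual], axis=-1)

def late_fusion_scores(scores, weights=None):
    """Late fusion: combine the prediction scores produced by separate
    per-modality classifiers. weights=None gives simple averaging;
    otherwise a normalized weighted sum is used."""
    scores = np.asarray(scores, dtype=float)   # shape: (n_modalities, n_classes)
    if weights is None:
        return scores.mean(axis=0)             # averaging-style fusion
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * scores).sum(axis=0) / w.sum()  # weighting-style fusion
```

Note that neither sketch models the cross-modal correlations discussed above: early fusion mixes the feature spaces blindly, while late fusion never lets the modalities interact before the score level.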
In contrast to early and late fusion, multi-view learning, as a newer paradigm, exploits the
correlations between the representations of the information from multiple modalities to improve
the learning performance. It can be classified into three categories: co-training, multiple kernel
learning, and subspace learning.
Co-training [31] is a semi-supervised learning technique that first learns a separate
classifier for each view using the labeled examples. It then maximizes the mutual agreement on
two distinct views of the unlabeled data via alternating training. Many variants have since been
developed. Instead of committing labels for the unlabeled examples, Nigam et al. [125] proposed
a co-EM approach that runs EM in each view and assigns probabilistic labels to the unlabeled
examples. To address regression problems, Zhou and Li [208] employed two k-nearest-neighbor
regressors to label the unknown instances during the learning process. More recently, Yu et
al. [186] proposed a Bayesian undirected graphical model for co-training based on Gaussian
processes. The success of co-training algorithms relies on three assumptions: (a) each view is
sufficient for prediction on its own; (b) the functions learned on the different views are likely to
predict the same labels; and (c) the views are conditionally independent given the label. However,
these assumptions are too strong to satisfy in practice, especially for micro-videos with dif-
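The alternating co-training loop described above can be sketched as follows (a minimal illustration under simplifying assumptions: two feature views, a toy nearest-centroid classifier standing in for the per-view learners, and centroid distance used as the confidence measure; all names are hypothetical):

```python
import numpy as np

class NearestCentroid:
    """Toy stand-in for a per-view classifier: predicts the class of
    the nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

def co_train(Xl1, Xl2, y, Xu1, Xu2, rounds=5, k=2):
    """Alternating co-training on two views. Each round, each view's
    classifier labels its k most confident unlabeled examples (here:
    smallest centroid distance) and adds them, with the predicted
    labels, to the OTHER view's labeled pool."""
    L1, L2, y1, y2 = Xl1, Xl2, y.copy(), y.copy()
    unlabeled = list(range(len(Xu1)))
    for _ in range(rounds):
        if not unlabeled:
            break
        c1 = NearestCentroid().fit(L1, y1)
        c2 = NearestCentroid().fit(L2, y2)
        for clf, Xu_self, from_view in ((c1, Xu1, 1), (c2, Xu2, 2)):
            if not unlabeled:
                break
            idx = np.array(unlabeled)
            d = ((Xu_self[idx][:, None, :] - clf.centroids_[None, :, :]) ** 2).sum(-1)
            chosen = idx[d.min(axis=1).argsort()[:k]]   # most confident examples
            labels = clf.predict(Xu_self[chosen])
            # one view's confident labels augment the other view's pool
            if from_view == 1:
                L2 = np.vstack([L2, Xu2[chosen]]); y2 = np.concatenate([y2, labels])
            else:
                L1 = np.vstack([L1, Xu1[chosen]]); y1 = np.concatenate([y1, labels])
            unlabeled = [i for i in unlabeled if i not in set(chosen.tolist())]
    return NearestCentroid().fit(L1, y1), NearestCentroid().fit(L2, y2)
```

The cross-view label exchange is exactly where assumption (b) bites: if the two views disagree on the unlabeled data, each classifier pollutes the other's training pool.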