74 4. MULTIMODAL COOPERATIVE LEARNING
[Figure 4.5 diagram. Offline component: an initial labeled dataset with textual, visual, and acoustic features feeds tree-guided multi-modal dictionary learning (representation smoothness, modality consistency), producing the dictionaries D_textual, D_visual, and D_acoustic. Online component: an uploaded micro-video is encoded by sparse coding; unlabeled samples go to the classifier for venue category estimation, and labeled samples enhance dictionary learning. Example tree nodes: University, Museum, School, Road, Garden, Bridge, Field.]
Figure 4.5: Scheme of our proposed INTIMATE approach. It consists of an offline dictionary
learning component and an online learning component.
of the traditional dictionary learning framework, we advance it by devising a tree-guided group
lasso that jointly considers the following two principles.
(1) Hierarchical Smoothness. Micro-videos with close labels in the hierarchical tree
should have similar sparse representations.
(2) Structural Consistency. The tree structure is invariant across the textual, visual, and
acoustic modalities.
With the sparse representations, we can estimate the venue categories of
micro-videos with shallow classifiers, such as softmax [30]. Moreover, we develop an online
algorithm to solve the INTIMATE model. If an incoming micro-video is unlabeled, we can
efficiently infer its venue category; otherwise, we will harvest its knowledge to strengthen our
model.
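To make the Hierarchical Smoothness principle more concrete, the following is a minimal sketch of one common form of a tree-guided group lasso penalty. It is an illustration under stated assumptions, not the exact INTIMATE regularizer, which this excerpt does not show. The assumptions are that every tree node defines a group containing the samples whose labels fall under it, that the sparse codes are stored as a K x N matrix, and that each node contributes the sum of per-atom L2 norms over its group. The function name, the toy hierarchy (including the internal node called Education), and the uniform node weights are all hypothetical.

```python
import numpy as np


def tree_group_lasso_penalty(codes, node_groups, node_weights):
    """Sum over tree nodes of weight * (sum of row-wise L2 norms within the node's group).

    codes        : (K, N) array, one sparse code per column (assumed layout).
    node_groups  : list of lists; sample indices whose labels fall under each node.
    node_weights : list of floats, one weight per tree node (illustrative values).
    """
    total = 0.0
    for indices, weight in zip(node_groups, node_weights):
        block = codes[:, indices]                  # codes of the samples under this node
        row_norms = np.linalg.norm(block, axis=1)  # one L2 norm per dictionary atom
        total += weight * row_norms.sum()
    return float(total)


# Toy data: K = 2 atoms, N = 4 micro-videos labeled School, University, Garden, School.
codes = np.array([[3.0, 0.0, 0.0, 4.0],
                  [0.0, 1.0, 2.0, 0.0]])

# Hypothetical hierarchy: Root -> {Education -> {School, University}, Garden}.
node_groups = [
    [0, 1, 2, 3],  # Root
    [0, 1, 3],     # Education (hypothetical internal node)
    [0, 3],        # School
    [1],           # University
    [2],           # Garden
]
node_weights = [1.0] * len(node_groups)

penalty = tree_group_lasso_penalty(codes, node_groups, node_weights)
```

Because each term is an unsquared L2 norm over a group, a penalty of this shape tends to switch an atom on or off for a whole group at once, which is one way to push micro-videos under the same tree node toward similar sparse representations.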
In this part, we first briefly review dictionary learning over mono-modal and multi-modal
data. We then formulate the proposed INTIMATE model. Finally, we optimize the
INTIMATE model via an online algorithm.
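The online behavior described above (infer the venue category of an unlabeled micro-video, otherwise use the labeled one to strengthen the model) can be sketched as a simple routing loop. This is only a toy illustration: a nearest-centroid classifier over precomputed sparse codes stands in for both the INTIMATE dictionary update and the softmax classifier, and the class and function names are hypothetical.

```python
import numpy as np


class OnlineVenueModel:
    """Toy stand-in: nearest-centroid classifier over precomputed sparse codes."""

    def __init__(self, code_dim):
        self.code_dim = code_dim
        self.sums = {}    # label -> running sum of the codes seen with that label
        self.counts = {}  # label -> number of labeled samples seen

    def update(self, code, label):
        # Labeled sample: fold it into the running statistics of its class.
        self.sums[label] = self.sums.get(label, np.zeros(self.code_dim)) + code
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, code):
        # Unlabeled sample: return the label of the closest class centroid.
        if not self.counts:
            return None  # nothing learned yet
        centroids = {lab: self.sums[lab] / self.counts[lab] for lab in self.sums}
        return min(centroids, key=lambda lab: np.linalg.norm(code - centroids[lab]))


def process_stream(model, stream):
    """Route each (code, label) pair: predict when label is None, else update."""
    predictions = []
    for code, label in stream:
        if label is None:
            predictions.append(model.predict(code))
        else:
            model.update(code, label)
    return predictions


stream = [
    (np.array([1.0, 0.0]), "Garden"),
    (np.array([0.0, 1.0]), "Museum"),
    (np.array([0.9, 0.1]), None),  # unlabeled: should be inferred
]
model = OnlineVenueModel(code_dim=2)
predictions = process_stream(model, stream)
```

The point of the sketch is the control flow only: the same stream serves both inference and model refinement, so no separate retraining pass is needed when new labeled micro-videos arrive.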
4.5.1 MULTI-MODAL DICTIONARY LEARNING
Real-world objects are usually described by multiple modalities from different aspects. It is thus natural to extend mono-modal dictionary learning to handle multi-modal data. Suppose we have $N$ samples with $M$ modalities $\{(\mathbf{x}_n^1, \dots, \mathbf{x}_n^M)\}_{n=1}^{N}$, in which $\mathbf{x}_n^m \in \mathbf{X}^m$ $(m = 1, \dots, M)$ denotes the $m$-th modality of the sample $\mathbf{x}_n$, $\mathbf{X}^m \in \mathbb{R}^{D_m \times N}$ denotes the $m$-th modality of the given $N$ data, and $D_m$ denotes the dimension of the $m$-th modality. The sparse representation of the $n$-th sample $\mathbf{A}_n = [\mathbf{a}_n^1, \dots, \mathbf{a}_n^M] \in \mathbb{R}^{K \times M}$ and multi-modal dictionaries $\mathbf{D} = \{\mathbf{D}^1, \dots, \mathbf{D}^M\}$
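As a rough illustration of the notation, the sketch below builds per-modality data matrices of shape (D_m, N) and stacks one sparse code per modality into a (K, M) matrix for a single sample. Several things here are assumptions that the excerpt does not confirm: each dictionary is taken to have shape (D_m, K), the codes are computed independently per modality with a plain lasso objective, and ISTA (a standard proximal-gradient solver) is swapped in for whatever solver the authors use. The modality dimensions and all names are illustrative.

```python
import numpy as np


def soft_threshold(values, threshold):
    """Proximal operator of the L1 norm."""
    return np.sign(values) * np.maximum(np.abs(values) - threshold, 0.0)


def ista_sparse_code(dictionary, sample, sparsity_weight=0.1, n_iters=200):
    """Approximately minimize 0.5*||sample - dictionary @ a||^2 + sparsity_weight*||a||_1."""
    # Step size from the Lipschitz constant: largest singular value squared.
    lipschitz = np.linalg.norm(dictionary, ord=2) ** 2
    code = np.zeros(dictionary.shape[1])
    for _ in range(n_iters):
        gradient = dictionary.T @ (dictionary @ code - sample)
        code = soft_threshold(code - gradient / lipschitz, sparsity_weight / lipschitz)
    return code


rng = np.random.default_rng(0)
N, K = 5, 8                                                  # samples, atoms (illustrative)
modality_dims = {"textual": 6, "visual": 10, "acoustic": 4}  # D_m per modality (illustrative)
M = len(modality_dims)

# X^m has shape (D_m, N): one column per sample.
X = {m: rng.standard_normal((d, N)) for m, d in modality_dims.items()}

# Assumed dictionary shape (D_m, K), with unit-norm atoms.
dictionaries = {}
for m, d in modality_dims.items():
    atoms = rng.standard_normal((d, K))
    dictionaries[m] = atoms / np.linalg.norm(atoms, axis=0)


def encode_sample(n):
    """Return A_n with shape (K, M): one sparse code per modality, stacked as columns."""
    columns = [ista_sparse_code(dictionaries[m], X[m][:, n]) for m in modality_dims]
    return np.stack(columns, axis=1)


A_0 = encode_sample(0)
```

Coding each modality separately ignores the coupling across modalities that the structural consistency principle calls for; the sketch is only meant to fix the shapes of the quantities involved.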