[Figure 4.4: Performance of TRUMANN on Dataset II versus the number of iterations. Left axis: Micro-F1 (%); right axis: objective value.]
with respect to the number of iterations. From this figure, it can be seen that our algorithm
converges quickly.
4.5 MULTIMODAL COMPLEMENTARY LEARNING
It is worth mentioning that the pioneering work in Section 4.4 studied the problem of micro-
video categorization and devised the so-called "TRUMANN" model. However, TRUMANN
explicitly projects all the modalities into the same feature space and represents them with a
unified feature vector. In this way, the TRUMANN model does capture the common information
among modalities, but it may lose some complementary information. For instance, the acoustic
modality may contain an atom such as the "chirping of birds" that can hardly be expressed
by the visual modality. In addition, the TRUMANN model utilizes the tree structure to guide
the learning of a specific classifier rather than representation learning; hence it suits only the
venue estimation task and cannot be applied to other applications. Moreover, the proposed
TRUMANN model is an offline learner, which overlooks the importance of online learning.
To address these problems, we develop an IncremeNtal Tree-guIded Multi-modAl dic-
Tionary lEarning approach, dubbed INTIMATE, to organize micro-videos into a tree tax-
onomy. Our proposed approach is illustrated in Figure 4.5. Specifically, given a set of labeled
micro-videos at the initial offline stage, our model is able to learn a concept-level dictionary for
each modality, which is the basis of micro-video sparse representation. Standing on the shoulder
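As a rough illustration of the offline stage described above, the sketch below learns a dictionary for one modality by alternating sparse coding (a few ISTA iterations) with a least-squares dictionary update. This is a generic dictionary-learning recipe, not the actual INTIMATE formulation; the function names, the ISTA solver, and all parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of the l1 norm (element-wise shrinkage).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def learn_dictionary(X, n_atoms, lam=0.1, n_iter=30, seed=0):
    """Learn a dictionary D and sparse codes A so that X is approximated by D @ A.

    X: (d, n) data matrix; each column holds one micro-video's features
       for a single modality.
    Returns D of shape (d, n_atoms) with unit-norm atoms and A of shape
    (n_atoms, n) holding the sparse codes.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    A = np.zeros((n_atoms, n))
    for _ in range(n_iter):
        # Sparse coding step: a few ISTA iterations with D fixed.
        L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
        for _ in range(10):
            A = soft_threshold(A - (D.T @ (D @ A - X)) / L, lam / L)
        # Dictionary update step: least squares with A fixed, then renormalize.
        D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-8 * np.eye(n_atoms))
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, A
```

In a multimodal setting, this routine would be run once per modality, so that each modality keeps its own dictionary and thereby retains its complementary atoms instead of being forced into a single shared feature space.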