Multimodal Transfer Learning in Micro-Video Analysis

venue of “Pet store,” the colors representing “Kid speaking” and “Whoop” are dark, and

the color representing “Battle cry” is lighter. These observations agree with our common

sense and demonstrate that the attention score can select the discriminative features toward

the venue category.

4.7 SUMMARY

This chapter presents three novel and efficient multimodal learning models for micro-video venue categorization: the multi-task multimodal consistent learning model TRUMANN, the tree-guided multimodal complementary learning strategy INTIMATE, and the neural multimodal cooperative learning model NMCL. Specifically, the TRUMANN model learns a common feature space from multiple heterogeneous modalities while preserving the information of each modality via a disagreement penalty. The INTIMATE model co-regularizes hierarchical smoothness and structure consistency within a unified model to learn high-level sparse representations of micro-videos. Considering timeliness and the limited number of training samples, an online learning algorithm is developed to efficiently and incrementally strengthen the learning performance. The NMCL model sheds light on characterizing and modeling the correlations between modalities, especially the consistent and complementary relations. In this model, we introduced a novel relation-aware attention mechanism to separate the consistent information from the complementary information. Following that, we integrated the consistent information to learn an enhanced consistent vector and supplemented it with the complementary information to enrich this enhanced vector. The experimental results on a publicly accessible dataset validate the efficiency and effectiveness of these three models.
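
To make the cooperative idea behind NMCL more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the chapter. It takes a "host" feature vector and a "guest" feature vector from two modalities, scores each dimension for agreement, splits the guest features into a consistent part and a complementary part with a threshold, adds the consistent part to the host to form an enhanced vector, and then appends the complementary part. The function names, the sigmoid-of-product score, the fixed threshold `tau`, and the additive fusion are all hypothetical choices made for illustration; in the actual model these quantities are learned.

```python
import math


def relation_scores(host, guest):
    """Per-dimension agreement score in (0, 1).

    Assumption: sigmoid of the elementwise product, so dimensions where
    host and guest share a sign score above 0.5. A learned attention
    network would replace this in a real model.
    """
    return [1.0 / (1.0 + math.exp(-h * g)) for h, g in zip(host, guest)]


def cooperative_fuse(host, guest, tau=0.5):
    """Split guest features by relation score, then fuse with the host.

    Returns the enhanced consistent vector concatenated with the
    complementary part of the guest (hypothetical fusion rule).
    """
    scores = relation_scores(host, guest)
    is_consistent = [s >= tau for s in scores]

    # Guest dimensions that agree with the host; the rest are zeroed out.
    consistent = [g if m else 0.0 for g, m in zip(guest, is_consistent)]
    # Guest dimensions that carry information the host does not share.
    complementary = [0.0 if m else g for g, m in zip(guest, is_consistent)]

    # Enhanced consistent vector: host reinforced by the consistent part.
    enhanced = [h + c for h, c in zip(host, consistent)]

    # Enrich the enhanced vector by appending the complementary part.
    return enhanced + complementary


if __name__ == "__main__":
    host = [1.0, 2.0, -1.0]    # e.g., visual features (toy values)
    guest = [1.0, -1.0, -2.0]  # e.g., acoustic features (toy values)
    print(cooperative_fuse(host, guest))
```

With the toy vectors above, the first and third guest dimensions agree in sign with the host and are treated as consistent, while the second is treated as complementary and carried through in the appended half of the output.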
