venue of “Pet store,” the colors representing “Kid speaking” and “Whoop” are dark, and
the color representing “Battle cry” is lighter. These observations agree with common
sense and demonstrate that the attention scores can select the discriminative features for
the venue category.
4.7 SUMMARY
This chapter presents three novel and efficient multimodal learning models for micro-video
venue categorization: the multi-task multimodal consistent learning model TRUMANN, the
tree-guided multimodal complementary learning strategy INTIMATE, and the neural multimodal
cooperative learning model NMCL. Specifically, the TRUMANN model learns a common feature
space from multiple heterogeneous modalities and preserves the information of each modality
via a disagreement penalty. The INTIMATE model co-regularizes hierarchical smoothness and
structure consistency within a unified model to learn high-level sparse representations of
micro-videos. Considering the timeliness and the limited training samples, an online learning
algorithm is developed to strengthen the learning performance efficiently and incrementally.
The NMCL model sheds light on characterizing and modeling the correlations between
modalities, especially the consistent and complementary relations. In this model, we
introduced a novel relation-aware attention mechanism to separate the consistent information
from the complementary information. Following that, we integrated the consistent information
to learn an enhanced consistent vector and supplemented it with the complementary information
to enrich this enhanced vector. The experimental results on a publicly accessible dataset
validate the efficiency and effectiveness of these three models.
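
As a rough illustration of the separate-then-fuse idea behind the relation-aware attention mechanism, the following minimal sketch may help. It is not the NMCL implementation: the cosine-similarity score, the fixed threshold, the mean pooling, and all function names (cosine, split_by_relation, fuse) are assumptions made only for this example, whereas the actual model learns its attention scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_by_relation(host, guest_features, threshold=0.5):
    """Split guest-modality features by their correlation with the host vector.

    Assumption for this sketch: features scoring at or above the threshold are
    treated as consistent with the host, the rest as complementary.
    """
    consistent, complementary = [], []
    for feat in guest_features:
        if cosine(host, feat) >= threshold:
            consistent.append(feat)
        else:
            complementary.append(feat)
    return consistent, complementary


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse(host, guest_features, threshold=0.5):
    """Build an enhanced consistent vector, then append the complementary part."""
    consistent, complementary = split_by_relation(host, guest_features, threshold)
    # Hypothetical pooling choice: average the host with its consistent guests.
    enhanced = mean_vector([host] + consistent, len(host))
    # Hypothetical pooling choice: average the complementary guests.
    supplement = mean_vector(complementary, len(host))
    # List concatenation keeps the two kinds of information in separate slots.
    return enhanced + supplement


host = [1.0, 0.0]
guests = [[2.0, 0.0], [0.0, 3.0]]
fused = fuse(host, guests)
```

Here the first guest feature points in the same direction as the host and is pooled into the enhanced vector, while the second is orthogonal to it and is kept as supplementary information in the concatenated output.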