Cooperative Networks

92 4. MULTIMODAL COOPERATIVE LEARNING

[Figure: network diagram with blocks labeled Concatenation, Full Connection, Consistent β, Complementary α, and Output]

Figure 4.15: Illustration of Cooperative Net. The cooperative nets separate the consistent components from the complementary ones, and yield an augmented feature vector comprising the enhanced consistent vector and the complementary vectors.

$m \in \mathcal{M} = \{v, a, t\}$ denote the modality indicator, and $\mathbf{x}^m \in \mathbb{R}^{D_m}$ denote the $D_m$-dimensional feature vector over the $m$-th modality. In our work, each micro-video is associated with one of $K$ pre-defined venue categories, namely a one-hot label vector $\mathbf{y} \in \mathbb{R}^K$, where $K$ refers to the number of venue categories.
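To make this notation concrete, the following minimal Python sketch builds one feature vector per modality and a one-hot venue label. The per-modality dimensions, the number of categories, and the helper name `one_hot` are hypothetical choices for illustration; the text does not specify them.

```python
# Modality indicators as used in the text.
MODALITIES = ("v", "a", "t")

# Hypothetical feature dimension per modality (not given in the text).
DIMS = {"v": 4, "a": 3, "t": 2}

# Hypothetical number of venue categories.
K = 5


def one_hot(index, num_classes):
    """Return a list of length num_classes with 1.0 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("category index out of range")
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


# x[m] stands for the feature vector of modality m (dummy zeros here).
x = {m: [0.0] * DIMS[m] for m in MODALITIES}

# A micro-video assigned to the venue category with index 2.
y = one_hot(2, K)
```

Exactly one entry of `y` is non-zero, which is what ties each micro-video to a single venue category.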

4.6.1 MULTIMODAL EARLY FUSION

As mentioned above, fusing multimodal information can produce a comprehensive description of micro-videos. Prior studies [47, 166] have demonstrated the effectiveness of the early fusion strategy, which concatenates the features from all modalities into a unified representation. Following that, one can devise a classifier, such as a neural network, treating

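The early fusion strategy described in Section 4.6.1, which concatenates the features from all modalities into a unified representation, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `early_fusion`, the fixed modality order, and the toy feature values are assumptions, not part of the original method description.

```python
def early_fusion(features, order=("v", "a", "t")):
    """Concatenate per-modality feature vectors in a fixed modality order.

    `features` maps a modality indicator to its feature vector (a list of
    floats). The result has length equal to the sum of the input lengths.
    """
    fused = []
    for m in order:
        fused.extend(features[m])
    return fused


# Toy feature vectors with hypothetical dimensions 3, 2, and 1.
features = {"v": [0.1, 0.2, 0.3], "a": [0.4, 0.5], "t": [0.6]}
fused = early_fusion(features)
print(fused)  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Keeping the modality order fixed matters here: a downstream classifier sees only positions in the fused vector, so each position must always correspond to the same modality and feature.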