5.6. EXPERIMENTS 121
[Figure 5.4 plots: (a) Macro-F1 vs. Acoustic Concept; (b) Micro-F1 vs. Acoustic Concept. x-axis: Acoustic Concept# (0 to 313); y-axes: Macro-F1 and Micro-F1.]
Figure 5.4: Performance of our model w.r.t. the number of external acoustic concepts.
will achieve, since it is able to cover a much wider range of the acoustic concepts that appear in
micro-videos.
5.6.5 VISUALIZATION
We conducted experiments to shed some light on the correlation between venue categories and
acoustic concepts. In particular, we calculated the correlations between acoustic concepts and
venue categories by taking inner products between the conceptual distributions and the venue
label vectors of samples. To save space, we visualized part of the correlation matrix as a heat
map, where a lighter color indicates a weaker correlation and a darker color a stronger one, as
shown in Figure 5.5. We can see that almost every selected venue category is tightly related to
several acoustic concepts. Moreover, different venues emphasize different acoustic concepts. For
example, the micro-videos with the venues Italian Restaurant and College (University) have
significant correlations with the onomatopoeia concepts, such as rattle, jingle, and rumble;
meanwhile, several motion concepts, such as scream, running, and clapping, provide clear cues to
infer the venue information of Housing Development, Gym, and Playground, respectively. These
observations agree with our daily experiences and further demonstrate the potential influence of
acoustic information on the task of venue category estimation.
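The inner-product computation described above can be illustrated with a minimal sketch. It assumes one-hot venue label vectors and per-sample conceptual distributions stored as plain lists; the function name `concept_venue_correlation` and the toy data are hypothetical, and no normalization is applied because the text does not specify one.

```python
def concept_venue_correlation(concept_dists, venue_labels):
    """Accumulate venue-by-concept inner products over all samples.

    concept_dists: N lists of length M (assumed per-sample concept distributions).
    venue_labels:  N lists of length K (assumed one-hot venue label vectors).
    Returns a K x M matrix; entry (k, m) is the inner product, taken across
    samples, of the k-th label column and the m-th concept column.
    """
    if not concept_dists or len(concept_dists) != len(venue_labels):
        raise ValueError("need the same non-zero number of samples in both inputs")
    num_venues = len(venue_labels[0])
    num_concepts = len(concept_dists[0])
    corr = [[0.0] * num_concepts for _ in range(num_venues)]
    for dist, label in zip(concept_dists, venue_labels):
        for k, y in enumerate(label):
            if y == 0:
                continue  # this sample does not belong to venue k
            for m, p in enumerate(dist):
                corr[k][m] += y * p
    return corr


# Toy data (hypothetical): 3 samples, 2 venues, 3 acoustic concepts.
concept_dists = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.0, 0.25, 0.75],
]
venue_labels = [
    [1, 0],
    [1, 0],
    [0, 1],
]
corr = concept_venue_correlation(concept_dists, venue_labels)
for row in corr:
    print(row)
```

Each row of `corr` corresponds to one venue category and each column to one acoustic concept, which is the layout a heat map of the matrix would use.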
5.6.6 STUDY OF DARE MODEL (RQ4)
We examine whether our model converges and, if so, how quickly. To answer this question, we plot
the training loss, macro-F1, and micro-F1 with respect to the number of iterations in Figures 5.6a,
5.6b, and 5.6c, respectively. From these three sub-figures, it can be seen that the training loss
of our proposed DARE model decreases quickly within the first 10 iterations, and accordingly
122 5. MULTIMODAL TRANSFER LEARNING
[Figure 5.5 heat maps. Panel labels: (a) Onomatopoeia Concepts; (b) Motion Concepts; (c) Stadium; (d) Motion Concepts. Venue categories shown: College Classroom, Italian Restaurant, Art Museum, Hockey Area, College and University, Event Space, Theater, River, Dog Run, Playground, Pier, Dive Bar, Pool, Housing Development, City Hall, Gym/Fitness Center. Acoustic concepts shown: Rumble, Rattle, Wind, Beep, Clang, Jingle, Boom, Kick, Clapping, Running, Laughter, Whistle, Scream, Singing, Talking, Applause, Girl, Wave, Crowd.]
Figure 5.5: Visualization regarding correlations between venue category and two types of acoustic
concepts.
[Figure 5.6 plots: (a) Training loss vs. iteration; (b) Macro-F1 vs. iteration; (c) Micro-F1 vs. iteration; (d) Micro-F1/Macro-F1 vs. dropout ratio. x-axes: Iteration for (a) to (c), Dropout Ratio ρ for (d); y-axes: Training Loss, Macro-F1, Micro-F1, and Performance.]
Figure 5.6: Convergence and dropout ratio study of our proposed DARE model on Dataset II.
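Since the convergence study tracks both macro-F1 and micro-F1, a small sketch of how the two metrics differ may help. It assumes a single-label, multi-class setting and uses the standard definitions; the function name `macro_micro_f1` and the toy labels are hypothetical and are not taken from the DARE experiments.

```python
from collections import Counter


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no samples given")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro-F1: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro-F1: F1 computed on counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy example (hypothetical labels).
y_true = ["Gym", "Gym", "Gym", "Pier"]
y_pred = ["Gym", "Gym", "Pier", "Pier"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(macro, micro)
```

In this toy case the per-class F1 values are 4/5 for Gym and 2/3 for Pier, so macro-F1 is 11/15, while the pooled counts give a micro-F1 of 0.75. Evaluating such a function after every training iteration would yield curves of the kind summarized in Figures 5.6b and 5.6c.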