120 5. MULTIMODAL TRANSFER LEARNING
Table 5.3: Performance comparison between our model and the baselines on Dataset II (p-value1 and p-value2 are the p-values over micro-F1 and macro-F1, respectively)
Feature Sets   Micro-F1        Macro-F1        p-value1   p-value2
Default        11.40%          0.53%           1.93e-9    1.41e-8
MDL            20.46 ± 0.49%   7.06 ± 0.27%    3.39e-8    2.01e-7
D³L            19.03 ± 0.29%   3.87 ± 0.24%    1.29e-8    2.29e-8
MTDL           20.67 ± 0.29%   6.16 ± 0.24%    4.29e-8    1.94e-8
AlexNet_0      25.95 ± 0.08%   6.04 ± 0.07%    9.81e-7    1.36e-8
AlexNet_1      28.95 ± 0.17%   9.45 ± 0.13%    2.15e-5    1.38e-7
AlexNet_2      29.04 ± 0.17%   10.86 ± 0.18%   4.02e-5    1.24e-6
AlexNet_3      28.55 ± 0.49%   10.65 ± 0.34%   1.91e-4    4.87e-6
TRUMANN        25.27 ± 0.17%   5.21 ± 0.29%    2.46e-7    9.23e-8
Our DARE       31.21 ± 0.22%   16.66 ± 0.30%   -          -
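The large gap between the two metrics in Table 5.3 (e.g., 11.40% micro-F1 versus 0.53% macro-F1 for Default) comes from how each one is averaged. The minimal Python sketch below assumes that each sample carries exactly one venue category. Micro-F1 pools true and false positives over all classes. Macro-F1 averages per-class F1 with equal weight, so categories that are rarely predicted correctly drag it down. The function `f1_scores` and the toy labels are hypothetical illustrations, not the authors' evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    # Micro-F1: pool the counts over all classes, then compute one F1.
    # With exactly one label per sample this equals plain accuracy.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)

    # Macro-F1: compute F1 = 2TP / (2TP + FP + FN) per class, then take
    # the unweighted mean, so every class counts equally.
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical toy data: 8 samples of class "a", one each of "b" and "c";
# the classifier always predicts the majority class "a".
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10
micro_f1, macro_f1 = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro_f1:.3f}, macro-F1 = {macro_f1:.3f}")
# prints: micro-F1 = 0.800, macro-F1 = 0.296
```

In this toy case, always predicting the majority class still yields 80% micro-F1 but only about 30% macro-F1, because each of the two minority classes contributes an F1 of zero.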
The TRUMANN model outperforms the dictionary learning methods on micro-F1 (25.27% versus at most 20.67%), since it considers the hierarchical structure of venue categories; on macro-F1, however, it still trails MDL and MTDL.
AlexNet with at least one hidden layer remarkably outperforms AlexNet_0 and the dictionary learning methods on both metrics. This demonstrates the advantage of deep models.
Within the AlexNet series, deeper is not always better: AlexNet_3 performs slightly worse than AlexNet_2 on both metrics. This is caused by an intrinsic limitation of AlexNet, which we detail in RQ4.
Our proposed model achieves the best performance on both metrics, which justifies its effectiveness. In terms of macro-F1, our model makes noteworthy progress (16.66% versus 10.86% for the strongest baseline, AlexNet_2). This further shows the rationality of preserving similarity by encoding the structural category information. In addition, we conducted pairwise significance tests between our model and each baseline. All p-values are far smaller than 0.05, which indicates that the performance improvement is statistically significant.
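This excerpt does not name the pairwise significance test that was used. As one plausible reading, the sketch below assumes a two-sided paired t-test over matched per-run scores of our model and a baseline. The ± values in Table 5.3 suggest repeated runs, but their number is not stated here. The sketch uses only the Python standard library. It evaluates the Student's t tail probability through the regularized incomplete beta function, using the standard continued-fraction (modified Lentz) evaluation. The names `paired_t_test` and `t_two_sided_p` and the per-run scores in the usage lines are hypothetical. Where SciPy is available, `scipy.stats.ttest_rel` performs the same paired test.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(scores_a, scores_b):
    """Two-sided paired t-test on matched scores; returns (t, p)."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two paired score lists of equal length >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0.0:
        raise ValueError("paired differences have zero variance")
    t = mean / math.sqrt(var / n)
    return t, t_two_sided_p(t, n - 1)


# Hypothetical per-run micro-F1 scores (%), for illustration only.
ours = [31.0, 31.4, 31.2, 30.9, 31.5]
baseline = [29.1, 28.9, 29.2, 28.8, 29.0]
t_stat, p_value = paired_t_test(ours, baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

Pairing the scores run by run removes variation shared by both models in a given run. This is why a paired test is a natural choice when each baseline is evaluated under the same splits as our model.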
5.6.4 EXTERNAL KNOWLEDGE EFFECT (RQ3)
We carried out experiments to study the effect of external sound knowledge on our model. In particular, we varied the number of external acoustic concepts from 0 to 313. Figures 5.4a and 5.4b illustrate the performance of our model as a function of the external data size with respect to macro-F1 and micro-F1, respectively. Both curves rise very quickly. This indicates that transferring external sound knowledge helps boost the categorization accuracy. It also signals that the more external sounds are involved, the better performance we