86 4. MULTIMODAL COOPERATIVE LEARNING
Table 4.4: Comparison between models with CNN-based dictionary learning and our dictionary
learning for the venue category estimation (p-value*: p-value over accuracy)

Models      Accuracy        Micro-F1        P-value*
DPL         4.64 ± 0.24%    4.87 ± 0.28%    3.86e-08
INTIMATE    6.28 ± 0.08%    6.60 ± 0.09%    –
Effectiveness of the Tree Structure
We argued that encoding the tree structure to constrain the sparse representation learning
makes the learned representations more discriminative. In this part, we carried out experiments
to verify the effectiveness of the tree structure both quantitatively and qualitatively.
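Quantitative Analysis: To show the effect of the tree structure on the sparse representation
learning, we compared INTIMATE with a flat model without the tree structure, dubbed INTIMATE⁻,

$$
\min_{D,A} \; \frac{1}{2}\sum_{m=1}^{M}\left\|X^{m}-D^{m}A^{m}\right\|_{F}^{2}
+ \frac{\lambda}{2}\sum_{m=1}^{M}\sum_{c\in C}\left\|A^{m}_{c}\right\|_{2,1}
+ \frac{\gamma}{2}\sum_{m=1}^{M}\left\|A^{m}\right\|_{F}^{2},
\quad \text{s.t.}\ \left\|d^{m}_{k}\right\| \le 1,\ \forall k, m,
\tag{4.38}
$$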
where C is the set of categories. We only took the leaf nodes into consideration and did not
use the hierarchical tree structure to regularize the representation learning. To ensure a fair
comparison, we trained INTIMATE and INTIMATE⁻ on the same offline training set and
reported the final results on the testing set. As in the other experiments, we repeated
this one over ten rounds of training/testing data sampling.
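To make the flat objective of Eq. (4.38) concrete, the following is a minimal sketch of how its value could be evaluated for given dictionaries and codes. It is an illustration under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, the two regularization weights are called `lam` and `gam` here, each code matrix is assumed to hold one column per micro-video, the category block of a code matrix is assumed to be the columns of the samples labeled with that category, and the l2,1 norm is taken as the sum of row-wise l2 norms. The norm constraint on the dictionary atoms is not enforced by this function.

```python
import math


def matmul(D, A):
    """Plain-Python product of D (d x K) and A (K x N), as nested lists."""
    return [[sum(D[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(D))]


def frob_sq(M):
    """Squared Frobenius norm of a nested-list matrix."""
    return sum(v * v for row in M for v in row)


def l21_norm(M):
    """l2,1 norm, assumed here to be the sum of the l2 norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def flat_objective(Xs, Ds, As, labels, lam, gam):
    """Value of the flat (tree-free) objective, summed over modalities.

    Xs, Ds, As: per-modality feature, dictionary, and code matrices.
    labels: leaf category of each sample (column); hypothetical layout.
    lam, gam: regularization weights (names are assumptions).
    """
    total = 0.0
    for X, D, A in zip(Xs, Ds, As):
        DA = matmul(D, A)
        resid = [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, DA)]
        total += 0.5 * frob_sq(resid)            # reconstruction error
        for c in sorted(set(labels)):            # leaf categories only
            cols = [j for j, y in enumerate(labels) if y == c]
            A_c = [[row[j] for j in cols] for row in A]
            total += 0.5 * lam * l21_norm(A_c)   # category-wise group sparsity
        total += 0.5 * gam * frob_sq(A)          # Frobenius penalty on codes
    return total
```

Under these assumptions, two samples that share a category are penalized jointly per dictionary atom, so reusing the same atoms within a category is cheaper than spreading activations across atoms. No term couples different categories, which is the hierarchical information the flat model leaves out.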
The experimental results are shown in Table 4.5. Compared with INTIMATE, the performance of
INTIMATE⁻ drops significantly: accuracy falls from 6.28% to 3.00% and micro-F1 from 6.60% to
3.18%. This is because, although the INTIMATE⁻ baseline takes the class label information into
consideration and learns category-aware sparse representations for each micro-video, it
completely ignores the hierarchical relatedness among categories. This further justifies the
usefulness of encoding the tree structure to learn the sparse representations.
Table 4.5: Comparison between models with and without structure information for the venue
category estimation on Dataset II (p-value*: p-value over accuracy)

Models       Accuracy        Micro-F1        P-value*
INTIMATE⁻    3.00 ± 0.04%    3.18 ± 0.05%    6.86e-15
INTIMATE     6.28 ± 0.08%    6.60 ± 0.09%    –
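The "mean ± deviation" entries and the p-value over accuracy are aggregated from the ten sampling rounds. The text does not state which significance test was used, so the sketch below assumes a paired t-test over per-round accuracies and assumes that the ± term is the sample standard deviation. The helper names are hypothetical, and the per-round numbers in the demo are made up purely for illustration; they are not the experimental data.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-round scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for two score lists.

    a[i] and b[i] are assumed to come from the same sampling round i.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Made-up per-round accuracies (%), for illustration only.
    full = [6.2, 6.3, 6.4, 6.2, 6.3, 6.3, 6.2, 6.4, 6.3, 6.2]
    flat = [3.0, 3.0, 3.1, 2.9, 3.0, 3.0, 3.0, 3.1, 3.0, 2.9]
    print("full model: %.2f ± %.2f" % summarize(full))
    print("flat model: %.2f ± %.2f" % summarize(flat))
    t, dof = paired_t_statistic(full, flat)
    print("t = %.2f with %d degrees of freedom" % (t, dof))
```

The two-sided p-value then follows from the Student's t distribution with the returned degrees of freedom; with SciPy installed, `scipy.stats.ttest_rel(full, flat)` performs the same paired test and returns the p-value directly.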