and ODLSC methods outperformed WDL and SRMTL. The latter two methods leveraged the
raw features to directly train the classifier, whereas the former two learned the sparse represen-
tations of micro-videos before training the classifiers. This justifies the necessity of dictionary
learning and the discriminative power of sparse representation; (2) multi-modal dictionary learning
methods surpass the mono-modal ones. This demonstrates that there is indeed relatedness
among modalities. Appropriately capturing and modeling such relatedness can reinforce the dic-
tionary learning, and hence the discriminative power of the sparse representation; (3) multi-modal dictio-
nary learning methods achieve relatively better results than TRUMMAN. This signals
that not all the modalities of micro-videos share the same space, and it thus may not be optimal
to learn the common space via agreement constraints; (4) MTDL shows its superiority over
MDL. This tells us that, in the supervised setting, jointly minimizing the misclassification
and reconstruction errors (sketched below) yields dictionaries that are adapted to the desired task, and
supervised methods can hence lead to more accurate classification than the unsupervised
ones; (5) our proposed INTIMATE substantially outperforms the others, including MTDL
and MDL. This verifies that harvesting the prior knowledge of the tree structure is useful for more
discriminative dictionary learning toward venue classification; and (6) as to the efficiency of
the offline part, we can see that the cost of our model is relatively lower than that of the others.
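To make point (4) concrete, the following generic formulation illustrates what jointly minimizing reconstruction and misclassification errors looks like in supervised dictionary learning; it is an illustrative sketch rather than the exact objective of MTDL or INTIMATE, and the symbols $\mathbf{X}$ (features), $\mathbf{Y}$ (label matrix), $\mathbf{D}$ (dictionary), $\mathbf{A}$ (sparse codes), $\mathbf{W}$ (linear classifier), and the weights $\alpha$, $\beta$ are introduced only for this example:
\[
\min_{\mathbf{D},\,\mathbf{W},\,\mathbf{A}} \;
\underbrace{\|\mathbf{X} - \mathbf{D}\mathbf{A}\|_F^2}_{\text{reconstruction error}}
\; + \; \alpha \underbrace{\|\mathbf{Y} - \mathbf{W}\mathbf{A}\|_F^2}_{\text{misclassification error}}
\; + \; \beta \|\mathbf{A}\|_1 .
\]
Because the two error terms are minimized jointly, the sparse codes $\mathbf{A}$ must be simultaneously reconstructive and discriminative, which is why the learned dictionary adapts to the classification task rather than to reconstruction alone.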
In addition, we conducted a significance test between our model and each of the base-
lines based on the per-round results of the 10-round cross-validation experiments. We can
see that all the p-values are substantially smaller than 0.05, indicating that the advantage of our
model is statistically significant. Besides, the performance of our model hardly fluctuates across
rounds, demonstrating its stability.
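As a brief illustration of how such a per-round significance test can be computed, the Python sketch below applies a paired t-test to the ten per-round accuracies of two methods; the arrays and values are hypothetical placeholders, not the reported experimental results.

```python
from scipy.stats import ttest_rel

# Hypothetical per-round accuracies over the 10-round cross-validation
# (placeholder numbers only, not the reported results).
ours     = [0.412, 0.405, 0.398, 0.420, 0.415, 0.407, 0.411, 0.403, 0.418, 0.409]
baseline = [0.381, 0.374, 0.369, 0.390, 0.385, 0.377, 0.380, 0.372, 0.388, 0.379]

# Paired t-test: both methods are evaluated on the same folds,
# so the per-round scores form matched pairs.
t_stat, p_value = ttest_rel(ours, baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # p < 0.05 => difference is statistically significant
```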
Parameter Tuning and Sensitivity Analysis
Our model has three key parameters: the number of dictionary atoms K and two tradeoff
parameters. In each of the 10-round experiments, we adopted a grid search strategy to carefully
tune and select the optimal parameters on the training data [123]. Take one experiment as an
example. We first performed a coarse-level grid search within the wide range of [0, 1000]
using an adaptive step size. Once we obtained the approximate scope of each parameter, we then
performed fine tuning within a narrow range using a small step size.
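The coarse-to-fine search can be summarized by the sketch below; the evaluate function, the candidate values, and the refinement width are hypothetical placeholders standing in for the actual training-and-validation procedure, and serve only to illustrate the two-stage strategy.

```python
import numpy as np

def evaluate(k, alpha, beta):
    """Hypothetical stand-in for training the model on the training folds and
    returning validation accuracy for one parameter setting (placeholder only)."""
    return -((k - 150) ** 2) * 1e-5 - (alpha - 0.85) ** 2 - (beta - 1.0) ** 2

def coarse_to_fine(coarse_grid, refine_factor=0.2, steps=11):
    """Coarse search over a wide grid, then a fine search in a narrow
    range around the coarse optimum (illustrative only)."""
    best = max(coarse_grid, key=lambda p: evaluate(*p))
    fine_axes = []
    for v in best:
        width = max(abs(v) * refine_factor, 1e-3)          # narrow range around the coarse optimum
        fine_axes.append(np.linspace(v - width, v + width, steps))
    fine_grid = [(k, a, b) for k in fine_axes[0] for a in fine_axes[1] for b in fine_axes[2]]
    return max(fine_grid, key=lambda p: evaluate(*p))

# Coarse level: wide range with adaptive (geometrically spaced) step sizes.
coarse_vals = [0.01, 0.1, 1, 10, 100, 1000]
coarse_grid = [(k, a, b) for k in [50, 100, 150, 200, 300]
                         for a in coarse_vals for b in coarse_vals]
print(coarse_to_fine(coarse_grid))
```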
Figure 4.7 shows the performance of our model with respect to the three parameters,
obtained by varying one parameter while fixing the others. We can see that the performance of our
model changes only within a small range around the optimal settings. This justifies that our model is
insensitive to the parameters near their optimal settings. We observed that setting
K = 150 and the two tradeoff parameters to 0.85 and 1, respectively, works well for all of our experiments. The procedure for tuning
the parameters of the other competitors is analogous, to ensure a fair comparison.
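The sensitivity analysis underlying Figure 4.7 follows a simple vary-one-fix-others loop, sketched below; the optimal values shown, the value ranges, and the evaluation function are placeholders used only to illustrate the procedure.

```python
import numpy as np

# Hypothetical optimal setting (K and the two tradeoff parameters) and a
# placeholder evaluation function; not the actual model or reported numbers.
optimal = {"K": 150, "tradeoff_1": 0.85, "tradeoff_2": 1.0}

def evaluate(params):
    # Stand-in for training and validating the model with `params`.
    return 0.42 - 1e-6 * (params["K"] - 150) ** 2 \
                - 0.01 * (params["tradeoff_1"] - 0.85) ** 2 \
                - 0.01 * (params["tradeoff_2"] - 1.0) ** 2

# Vary one parameter around its optimum while keeping the other two fixed,
# and record the resulting performance for the sensitivity curves.
for name, opt_val in optimal.items():
    for v in np.linspace(0.5 * opt_val, 1.5 * opt_val, 5):
        params = dict(optimal, **{name: v})
        print(f"{name}={v:.2f}: accuracy={evaluate(params):.4f}")
```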