and $O((2NK^2 + K^3)T)$. Here, $M$ is the number of iterations of the alternating optimization, which, as our analysis above shows, is a small value of less than 10. $N$, $T$, $S$, $K$, and $D$ denote, respectively, the number of micro-videos, venue categories, modalities, the latent dimension, and the total feature dimension over all the modalities. Since we consider only a few modalities, $S$ is very small. In our experimental settings, $K$ and $T$ are on the order of a few hundred, while the total feature dimension $D$ is about 5,000. Therefore, $D^2$ is greater than $K^2 T$. In light of this, we can reduce the time complexity to $O(ND^2)$, which is faster than SVM with its $O(N^3)$ complexity.
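As a quick sanity check of this inequality, consider illustrative magnitudes consistent with the settings above (the concrete values $K = 200$ and $T = 400$ are our assumption for this example):
$$K^2 T = 200^2 \times 400 = 1.6 \times 10^7 \;<\; 2.5 \times 10^7 = 5{,}000^2 = D^2,$$
so the $ND^2$ term indeed dominates the $NK^2T$ term in the overall cost.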
4.4.4 EXPERIMENTS
To validate the effectiveness of the first model, TRUMANN, we conducted several experiments on a server equipped with an Intel(R) Core(TM) i7-4790 CPU at 3.6 GHz, 32 GB of RAM, 8 cores, and a 64-bit Windows 10 operating system. To thoroughly measure our model against the baselines, we employed multiple metrics, namely macro-F1 and micro-F1 [55]. Macro-F1 gives equal weight to each class label in the averaging process, whereas micro-F1 gives equal weight to each instance. Both metrics reach their best value at 1 and their worst at 0.
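To make the distinction concrete, the following minimal Python sketch computes both scores on a toy multi-class labeling; the labels and predictions are fabricated for illustration and are not drawn from Dataset II.

```python
# Toy illustration of macro- vs. micro-averaged F1 (values are made up).
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]   # ground-truth category per instance
y_pred = [0, 0, 1, 0, 1, 0, 2, 1]   # predicted category per instance

# Macro-F1: compute F1 per class, then average, so every class
# contributes equally regardless of its frequency.
macro = f1_score(y_true, y_pred, average="macro")   # ~0.606 here

# Micro-F1: pool the per-instance counts over all classes first, so
# every instance contributes equally and frequent classes dominate.
micro = f1_score(y_true, y_pred, average="micro")   # 0.625 here

print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

The two scores diverge as the class distribution becomes more imbalanced, which is why reporting both gives a fuller picture of performance.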
The experimental results reported in this paper are based on 10-fold cross-validation. In particular, stratified cross-validation [130] was adopted to ensure that all categories contain approximately the same percentage of samples across the training and testing splits. In each round of the 10-fold cross-validation, we split Dataset II into three chunks: 80% of the micro-videos (i.e., 194,505 videos) were used for training, 10% (i.e., 24,313 videos) for validation, and the rest (i.e., 24,313 videos) were held out for testing. The training set was used to adjust the parameters, while the validation set was used to avoid overfitting, i.e., to verify that any performance increase on the training set actually yields an accuracy increase on data that has not been shown to the model before. The testing set was used only for evaluating the final solution, to confirm the actual predictive power of our model with the optimal parameters. Grid search with a small but adaptive step size was employed to select the optimal parameters.
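This protocol can be summarized in a short, runnable Python sketch. The classifier (logistic regression), its parameter C, and the synthetic data below are stand-ins for TRUMANN and Dataset II; only the stratified 80%/10%/10% split and the coarse-to-fine grid search mirror the description above.

```python
# Sketch of the evaluation protocol: stratified 80/10/10 split plus a
# grid search whose step size shrinks around the current best value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic multi-class data standing in for Dataset II.
X, y = make_classification(n_samples=2000, n_classes=4,
                           n_informative=8, random_state=0)

# Stratified splits keep each category's proportion roughly equal
# across the training, validation, and testing chunks.
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

def validate(c):
    """Micro-F1 on the validation set for regularization weight c."""
    model = LogisticRegression(C=c, max_iter=1000).fit(X_tr, y_tr)
    return f1_score(y_va, model.predict(X_va), average="micro")

# Grid search with a small but adaptive step: start with a coarse
# logarithmic grid, then refine around the best value each round.
grid = np.logspace(-3, 3, 7)                   # coarse: 1e-3 ... 1e3
for _ in range(3):
    best = max(grid, key=validate)
    grid = np.linspace(best / 2, best * 2, 5)  # finer steps each round

# The held-out test set is touched only once, with the chosen parameter.
final = LogisticRegression(C=best, max_iter=1000).fit(X_tr, y_tr)
print("test micro-F1:", f1_score(y_te, final.predict(X_te), average="micro"))
```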
Performance Comparison among Models
We carried out experiments on Dataset II to compare the overall effectiveness of our proposed
TRUMANN model with several state-of-the-art baselines.
• SRMTL: The Sparse Graph Regularization Multi-Task Learning method captures the relationships between task pairs and further imposes a sparse graph regularization scheme to enforce related pairs to be close to each other [99].
• regMVMT: This semi-supervised inductive multi-view multi-task learning model considers information from multiple views and learns multiple related tasks simultaneously [190].
Besides, we also compared our model with a variant of the regMVMT method, dubbed reg-