34 3. MULTIMODAL TRANSDUCTIVE LEARNING
were based on 10-fold cross-validation. In each round of the 10-fold cross-validation, we split
Dataset I into two chunks: 90% of the micro-videos were used for training and 10% were used
for testing.
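The splitting procedure above can be sketched as follows; this is a minimal illustration of one round of 10-fold cross-validation, with the function name and random seed chosen for illustration only:

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 10-fold cross-validation:
    in each round, 90% of the videos form the training chunk and the
    remaining 10% form the testing chunk."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)           # randomize before splitting
    folds = [idx[k::10] for k in range(10)]    # 10 roughly equal chunks
    for k in range(10):
        test = folds[k]
        train = [i for f in folds[:k] + folds[k + 1:] for i in f]
        yield train, test
```

Each of the 10 rounds holds out a different chunk, so every micro-video is tested exactly once across the full procedure.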
We report performance in terms of the normalized mean square error (nMSE) [123] between the predicted popularity and the actual popularity. The nMSE is an estimator of the overall deviations between predicted and measured values. It is defined as
\[
\mathrm{nMSE} = \frac{\sum_{i}(p_i - r_i)^2}{\sum_{i} r_i^2},
\tag{3.24}
\]
where $p_i$ is the predicted value and $r_i$ is the target value in the ground truth.
We have three key parameters, as shown in Eq. (3.10). The optimal values of these parameters were carefully tuned on the training data in each of the 10 folds. We employed a grid-search strategy to obtain the optimal parameters over the range from $10^{-5}$ to $10^{2}$ with small but adaptive step sizes. In particular, the step sizes were 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, and 10 for the ranges [0.00001, 0.0001], [0.0001, 0.001], [0.001, 0.01], [0.01, 0.1], [0.1, 1], [1, 10], and [10, 100], respectively. The parameter values corresponding to the best nMSE were used to report the final results. The parameters of the compared systems were tuned analogously to ensure a fair comparison. Taking one fold as an example, we observed that our model reached its optimal performance when the three parameters were set to 1, 0.01, and 100, respectively.
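The adaptive-step grid described above can be generated as follows; this is a sketch of the candidate-value construction only (the function name is illustrative), with the best combination of the three parameters then chosen by exhaustive search over the resulting grid:

```python
def adaptive_grid():
    """Candidate parameter values from 10^-5 to 10^2, using a step size
    that grows by a factor of 10 in each successive range, as described
    in the tuning procedure."""
    ranges = [(1e-5, 1e-4, 1e-5), (1e-4, 1e-3, 1e-4), (1e-3, 1e-2, 1e-3),
              (1e-2, 1e-1, 1e-2), (1e-1, 1.0, 1e-1), (1.0, 10.0, 1.0),
              (10.0, 100.0, 10.0)]
    values = []
    for start, stop, step in ranges:
        v = start
        while v < stop - 1e-12:        # half-open range; next range adds stop
            values.append(round(v, 10))
            v += step
    values.append(100.0)               # close the final range
    return values
```

With 9 candidates per range plus the endpoint 100, the grid has 64 values per parameter; tuning three parameters jointly therefore evaluates 64^3 combinations per fold, which the coarse-to-fine step sizes keep tractable.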
On Model Comparison
To demonstrate the effectiveness of our proposed TMALL model, we carried out experiments
on Dataset I with several state-of-the-art multi-view learning approaches.
• Early_Fusion. The first baseline concatenates the features extracted from the four modalities into a single joint feature vector, on which traditional machine learning models can be applied. In this work, we adopted the widely used support vector regression (SVR) model and implemented it with the help of scikit-learn [130].
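The concatenation step of this baseline can be sketched as below; the four modality names are illustrative placeholders, and in the actual baseline the fused vectors are fed to scikit-learn's SVR:

```python
def early_fusion(visual, acoustic, textual, social):
    """Early fusion: concatenate the per-modality feature vectors of
    each micro-video into a single joint feature vector, on which a
    single regressor (e.g., SVR) is then trained."""
    return [v + a + t + s
            for v, a, t, s in zip(visual, acoustic, textual, social)]
```

The fused representation has dimensionality equal to the sum of the four per-modality dimensionalities, which is the main cost of this simple strategy.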
• Late_Fusion. The second baseline first predicts the popularity of micro-videos from each modality separately via an SVR model, and then linearly integrates the per-modality predictions to obtain the final results.
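The integration step of this baseline can be sketched as follows; the uniform default weights are an assumption for illustration, since in practice the combination weights would be fit on training data:

```python
def late_fusion(per_modality_preds, weights=None):
    """Late fusion: given popularity predictions from one regressor per
    modality, linearly combine them into the final prediction."""
    m = len(per_modality_preds)
    if weights is None:
        weights = [1.0 / m] * m  # uniform weights as a default assumption
    n = len(per_modality_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, per_modality_preds))
            for i in range(n)]
```

Unlike early fusion, this scheme trains one small model per modality, so a weak or noisy modality can simply be down-weighted in the final combination.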
• regMVMT. The third baseline is the regularized multi-view learning model [190]. This model only regulates the relationships among different views within the original space.
• MSNL. The fourth baseline is the multiple social network learning (MSNL) model proposed in [149]. This model takes source confidence and source consistency into consideration.
• MvDA. The fifth baseline is a multi-view discriminant analysis (MvDA) model [75],
which aims to learn a single unified discriminant common space for multiple views by