Table 3.5: Performance comparison with different visual-level feature combinations at predicting micro-video popularity on Dataset I

|           | Color  | Object | Sentiment | Aesthetics | ALL   |
|-----------|--------|--------|-----------|------------|-------|
| Top50     | 0.364  | 0.231  | 0.247     | 0.203      | 0.309 |
| Top100    | 0.325  | 0.229  | 0.231     | 0.194      | 0.280 |
| Top200    | 0.301  | 0.193  | 0.204     | 0.174      | 0.276 |
| Bottom200 | 0.279  | 0.184  | 0.199     | 0.167      | 0.265 |
| Bottom100 | 0.254  | 0.182  | 0.193     | 0.164      | 0.256 |
| Bottom50  | 0.253  | 0.177  | 0.191     | 0.160      | 0.249 |
| nMSE      | 0.975  | 0.967  | 0.969     | 0.971      | 0.934 |
| P-value   | < 0.05 | < 0.05 | < 0.05    | < 0.05     | –     |
Table 3.6: Performance comparison with different view-level feature combinations at predicting micro-video popularity on Dataset I

|           | T + V + A | T + A + S | T + V + S | V + A + S | TLRMVR |
|-----------|-----------|-----------|-----------|-----------|--------|
| Top50     | 0.273     | 0.241     | 0.289     | 0.272     | 0.309  |
| Top100    | 0.241     | 0.201     | 0.250     | 0.227     | 0.280  |
| Top200    | 0.238     | 0.255     | 0.249     | 0.225     | 0.276  |
| Bottom200 | 0.233     | 0.199     | 0.247     | 0.218     | 0.265  |
| Bottom100 | 0.224     | 0.179     | 0.229     | 0.213     | 0.256  |
| Bottom50  | 0.218     | 0.172     | 0.221     | 0.201     | 0.249  |
| nMSE      | 0.979     | 0.970     | 0.958     | 0.955     | 0.934  |
| P-value   | < 0.05    | < 0.05    | < 0.05    | < 0.05    | –      |
Parameter Sensitivity Analysis
Among all the parameters in our proposed objective function, we found that the parameters α and β play significant roles in affecting the prediction results. As shown in Eq. (3.42), the trade-off parameter α balances the effects of the graph regularization and the ridge regression, while the trade-off parameter β controls the effect of the supervised loss term. Therefore, we evaluated different values of α and β to investigate the variation in prediction performance. In this experiment, α and β were selected via a heuristic grid search, with α ranging from 0.05–0.30 at an interval of 0.05 and β ranging from 0.25–1.25 at an interval of 0.25. The nMSE results for various values of α and β are reported in Tables 3.7 and 3.8, respectively.
Table 3.7: Performance comparison with different α on our proposed framework on Dataset I

| α         | 0.05   | 0.10  | 0.15   | 0.20   | 0.25   | 0.30   |
|-----------|--------|-------|--------|--------|--------|--------|
| Top50     | 0.370  | 0.309 | 0.283  | 0.238  | 0.230  | 0.198  |
| Top100    | 0.347  | 0.280 | 0.269  | 0.227  | 0.219  | 0.187  |
| Top200    | 0.330  | 0.276 | 0.251  | 0.212  | 0.205  | 0.175  |
| Bottom200 | 0.309  | 0.265 | 0.241  | 0.204  | 0.197  | 0.168  |
| Bottom100 | 0.298  | 0.256 | 0.231  | 0.196  | 0.189  | 0.162  |
| Bottom50  | 0.294  | 0.249 | 0.227  | 0.193  | 0.186  | 0.159  |
| nMSE      | 0.948  | 0.934 | 0.953  | 0.957  | 0.958  | 0.961  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
Table 3.8: Performance comparison with different β on our proposed framework on Dataset I

| β         | 0.25   | 0.50  | 0.75   | 1.00   | 1.25   |
|-----------|--------|-------|--------|--------|--------|
| Top50     | 0.309  | 0.308 | 0.322  | 0.200  | 0.204  |
| Top100    | 0.305  | 0.279 | 0.294  | 0.185  | 0.189  |
| Top200    | 0.285  | 0.276 | 0.283  | 0.181  | 0.186  |
| Bottom200 | 0.263  | 0.265 | 0.268  | 0.175  | 0.181  |
| Bottom100 | 0.257  | 0.256 | 0.257  | 0.170  | 0.176  |
| Bottom50  | 0.252  | 0.249 | 0.251  | 0.166  | 0.172  |
| nMSE      | 0.949  | 0.934 | 0.950  | 0.962  | 0.968  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 |
As shown in these tables, the best performance is achieved when α = 0.10 and β = 0.50. In fact, when α is set to 0, our proposed method discards the graph regularization term, which easily induces overfitting. If β is set to 0, our proposed method discards the supervised information, which easily yields unsatisfactory results. This conclusion can be verified in Section 4.4.
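To make the protocol concrete, the following is a minimal sketch of such a grid search on synthetic data. It uses a simplified graph-regularized transductive objective as a stand-in for Eq. (3.42) (the actual TLRMVR objective also involves the low-rank multi-view projections), and takes nMSE to be mean squared error normalized by the label variance, one common convention:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, the first n_l of which carry popularity labels.
n, n_l = 200, 100
X = rng.normal(size=(n, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=n)

# Graph Laplacian L built from an RBF affinity over all samples.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())
L = np.diag(W.sum(1)) - W

def nmse_for(alpha, beta):
    """Simplified transductive objective (a stand-in for Eq. (3.42)):
       min_f  alpha * f'Lf + beta * ||f_l - y_l||^2 + ||f||^2
    whose minimizer is f = (alpha*L + beta*J + I)^{-1} (beta * J y),
    with J masking the labeled entries; returns nMSE on the unlabeled part."""
    J = np.zeros((n, n))
    J[np.arange(n_l), np.arange(n_l)] = 1.0
    f = np.linalg.solve(alpha * L + beta * J + np.eye(n), beta * (J @ y))
    resid = f[n_l:] - y[n_l:]
    return (resid ** 2).mean() / y[n_l:].var()  # MSE normalized by variance

# Grid search over the ranges reported in the text.
grid = [(a, b) for a in np.arange(0.05, 0.31, 0.05)
               for b in np.arange(0.25, 1.26, 0.25)]
best_a, best_b = min(grid, key=lambda p: nmse_for(*p))
print(f"best alpha = {best_a:.2f}, beta = {best_b:.2f}, "
      f"nMSE = {nmse_for(best_a, best_b):.3f}")
```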
We also evaluated the influence of the reduced dimension D of the projection matrices. The performance of TLRMVR with D ranging from 10–60 is illustrated in Table 3.9. From the table, we find that the best dimension is 20: too small or too large a dimension leads to suboptimal prediction performance. Taking 20 as the reduced dimension is therefore a reasonable choice, in consideration of the complementary properties of the different views (a code sketch of this sweep follows the table).
Table 3.9: Performance comparison with different reduced dimensions D on our proposed framework on Dataset I

| D         | 10     | 20    | 30     | 40     | 50     | 60     |
|-----------|--------|-------|--------|--------|--------|--------|
| Top50     | 0.319  | 0.308 | 0.318  | 0.316  | 0.316  | 0.297  |
| Top100    | 0.288  | 0.279 | 0.286  | 0.277  | 0.277  | 0.270  |
| Top200    | 0.274  | 0.276 | 0.275  | 0.274  | 0.274  | 0.262  |
| Bottom200 | 0.269  | 0.265 | 0.269  | 0.267  | 0.267  | 0.256  |
| Bottom100 | 0.250  | 0.256 | 0.252  | 0.249  | 0.249  | 0.245  |
| Bottom50  | 0.243  | 0.249 | 0.243  | 0.241  | 0.241  | 0.236  |
| nMSE      | 0.950  | 0.934 | 0.947  | 0.949  | 0.951  | 0.953  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
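A sketch of the dimension sweep follows, with PCA and ridge regression as hypothetical stand-ins for the projection matrices and regressor that TLRMVR learns jointly (the selection logic, not the model, is the point here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))     # stand-in for concatenated multi-view features
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)

scores = {}
for D in (10, 20, 30, 40, 50, 60):
    # Project to D dimensions (PCA fit on all data, for brevity of the sketch).
    Z = PCA(n_components=D).fit_transform(X)
    mse = -cross_val_score(Ridge(), Z, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    scores[D] = mse / y.var()       # normalized MSE

best_D = min(scores, key=scores.get)
print("best D:", best_D, "nMSE:", round(scores[best_D], 3))
```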
Comparison with state-of-the-art methods We compared our proposed scheme with several existing state-of-the-art methods, including multiple linear regression (MLR), Lasso regression, support vector regression (SVR) [147], RegMVMT [190], multi-feature learning via hierarchical regression (MLHR) [181], multiple social network learning (MSNL) [149], multi-view discriminant analysis (MvDA) [75], transductive multi-modal learning (TMALL) [24], and the extreme learning machine (ELM) [67]; a code sketch of the simplest baselines follows the descriptions below.
MLR: Multiple linear regression (MLR) attempts to capture the dependency between two
or more independent variables and a response variable using a linear equation, which is an
extension of classical linear regression.
Lasso: Lasso regression considers both variable selection and regularization to enhance
the prediction performance.
SVR: Support vector regression [147] is a classical regression technique with a maximum
margin criterion. We combined all the features together with an RBF kernel to learn a
non-linear SVR in a high-dimensional kernel-induced feature space.
RegMVMT: RegMVMT [190] is an inductive learning framework addressing the general multi-view learning problem, in which a co-regularization technique is utilized to enforce agreement among the views on unlabeled samples.
MLHR: e multi-feature fusion via hierarchical regression [181] is a semi-supervised
learning method, which has been developed to explore the structural information embed-
ded in data from the view of multi-feature fusion.
MSNL: Multiple social network learning (MSNL) [149] is proposed to address the incomplete data problem by modeling source confidence and source consistency simultaneously.
MvDA: Multi-view discriminant analysis (MvDA) [75] is a multi-view learning model,
which has been developed to search for a latent common space by enforcing the view-
consistency of multi-linear transforms.
TMALL: e transductive multi-modal learning (TMALL) model is presented for pre-
dicting the popularity of micro-videos, in which different modal features can be unified
and preserved in a latent common space to address the insufficient information problems.
ELM: As ELM [68, 154] can embed a wide range of feature mappings, Huang et al. [67] extended ELM to kernel learning and proposed a unified learning mechanism for regression applications with higher scalability and lower computational complexity.
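For reference, a minimal sketch of the three simplest baselines with scikit-learn, run on synthetic stand-in features (the multi-view methods RegMVMT, MLHR, MSNL, MvDA, TMALL, and ELM require their respective published implementations); nMSE is again taken as MSE normalized by the label variance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))      # stand-in for concatenated multi-view features
y = X[:, :8].sum(axis=1) + 0.2 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baselines = {
    "MLR": LinearRegression(),      # multiple linear regression
    "Lasso": Lasso(alpha=0.1),      # L1-regularized variable selection
    "SVR": SVR(kernel="rbf"),       # max-margin regression with an RBF kernel
}
for name, model in baselines.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    nmse = mean_squared_error(y_te, pred) / y_te.var()  # MSE / label variance
    print(f"{name}: nMSE = {nmse:.3f}")
```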
Table 3.10 reports the prediction performance of our proposed method and the other state-of-the-art algorithms. From this table, we have the following observations: (1) our proposed TLRMVR performs the best among all the compared methods; (2) Lasso and MLR perform the worst, as expected, indicating that simple feature selection and linear regression are insufficient to predict the popularity of micro-videos; (3) in contrast to Lasso and MLR, the multi-view algorithms, including RegMVMT, MLHR, MSNL, MvDA, and TMALL, perform comparably well, which can be attributed to their ability to address the multi-view/multi-modal feature fusion problem; and (4) after employing the RBF kernel to deal with multiple features, the SVR model provides a significant improvement over the linear baselines.
Table 3.10: Performance comparison between our proposed method and several state-of-the-art methods on Dataset I

| Methods | nMSE             | P-value  |
|---------|------------------|----------|
| MLR     | 1.442 ± 2.55e-01 | 1.05e-07 |
| Lasso   | 1.568 ± 1.72e-01 | 4.42e-08 |
| SVR     | 0.991 ± 5.00e-02 | 7.36e-06 |
| RegMVMT | 1.058 ± 4.33e-05 | 1.88e-03 |
| MLHR    | 1.167 ± 1.40e-02 | 4.75e-06 |
| MSNL    | 1.098 ± 1.30e-01 | 2.11e-04 |
| MvDA    | 0.982 ± 7.00e-03 | 2.62e-05 |
| TMALL   | 0.979 ± 9.42e-03 | 1.43e-08 |
| ELM     | 0.982 ± 6.68e-05 | 3.71e-07 |
| TLRMVR  | 0.934 ± 7.67e-04 | –        |
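The p-values in Table 3.10 compare each baseline against TLRMVR over repeated runs. The chapter does not restate the exact test here; a paired t-test over per-run nMSE values is one common choice, sketched below with hypothetical numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical per-run nMSE values for one baseline and for TLRMVR,
# e.g., collected over ten random train/test splits.
baseline_runs = np.array([0.98, 0.99, 1.00, 0.97, 0.99,
                          0.98, 1.01, 0.99, 0.98, 1.00])
tlrmvr_runs   = np.array([0.93, 0.94, 0.93, 0.93, 0.94,
                          0.93, 0.95, 0.94, 0.93, 0.94])

t_stat, p_value = stats.ttest_rel(baseline_runs, tlrmvr_runs)
print(f"p-value = {p_value:.2e}")  # p < 0.05: the improvement is significant
```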