Table 3.5: Performance comparison with different visual-level feature combinations at predicting micro-video popularity on Dataset I

|           | Color  | Object | Sentiment | Aesthetics | ALL   |
|-----------|--------|--------|-----------|------------|-------|
| Top50     | 0.364  | 0.231  | 0.247     | 0.203      | 0.309 |
| Top100    | 0.325  | 0.229  | 0.231     | 0.194      | 0.280 |
| Top200    | 0.301  | 0.193  | 0.204     | 0.174      | 0.276 |
| Bottom200 | 0.279  | 0.184  | 0.199     | 0.167      | 0.265 |
| Bottom100 | 0.254  | 0.182  | 0.193     | 0.164      | 0.256 |
| Bottom50  | 0.253  | 0.177  | 0.191     | 0.160      | 0.249 |
| nMSE      | 0.975  | 0.967  | 0.969     | 0.971      | 0.934 |
| P-value   | < 0.05 | < 0.05 | < 0.05    | < 0.05     | –     |
Table 3.6: Performance comparison with different view-level feature combinations at predicting micro-video popularity on Dataset I

|           | T + V + A | T + A + S | T + V + S | V + A + S | TLRMVR |
|-----------|-----------|-----------|-----------|-----------|--------|
| Top50     | 0.273     | 0.241     | 0.289     | 0.272     | 0.309  |
| Top100    | 0.241     | 0.201     | 0.250     | 0.227     | 0.280  |
| Top200    | 0.238     | 0.255     | 0.249     | 0.225     | 0.276  |
| Bottom200 | 0.233     | 0.199     | 0.247     | 0.218     | 0.265  |
| Bottom100 | 0.224     | 0.179     | 0.229     | 0.213     | 0.256  |
| Bottom50  | 0.218     | 0.172     | 0.221     | 0.201     | 0.249  |
| nMSE      | 0.979     | 0.970     | 0.958     | 0.955     | 0.934  |
| P-value   | < 0.05    | < 0.05    | < 0.05    | < 0.05    | –      |
Parameter Sensitivity Analysis
Among all the parameters in our proposed objective function, we found that the parameters α and β play significant roles in affecting the prediction results. As shown in Eq. (3.42), the trade-off parameter α balances the effects of the graph regularization and the ridge regression, while the trade-off parameter β controls the effect of the supervised loss term. Therefore, we evaluated different values of α and β to investigate the variation in prediction performance. In this experiment, α and β were selected via a heuristic grid search, with α ranging from 0.05–0.30 at an interval of 0.05 and β ranging from 0.25–1.25 at an interval of 0.25. The nMSE results for various values of α and β are reported in Tables 3.7 and 3.8, respectively.
Table 3.7: Performance comparison with different α on our proposed framework on Dataset I

| α         | 0.05   | 0.10  | 0.15   | 0.20   | 0.25   | 0.30   |
|-----------|--------|-------|--------|--------|--------|--------|
| Top50     | 0.370  | 0.309 | 0.283  | 0.238  | 0.230  | 0.198  |
| Top100    | 0.347  | 0.280 | 0.269  | 0.227  | 0.219  | 0.187  |
| Top200    | 0.330  | 0.276 | 0.251  | 0.212  | 0.205  | 0.175  |
| Bottom200 | 0.309  | 0.265 | 0.241  | 0.204  | 0.197  | 0.168  |
| Bottom100 | 0.298  | 0.256 | 0.231  | 0.196  | 0.189  | 0.162  |
| Bottom50  | 0.294  | 0.249 | 0.227  | 0.193  | 0.186  | 0.159  |
| nMSE      | 0.948  | 0.934 | 0.953  | 0.957  | 0.958  | 0.961  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
Table 3.8: Performance comparison with different β on our proposed framework on Dataset I

| β         | 0.25   | 0.50  | 0.75   | 1.00   | 1.25   |
|-----------|--------|-------|--------|--------|--------|
| Top50     | 0.309  | 0.308 | 0.322  | 0.200  | 0.204  |
| Top100    | 0.305  | 0.279 | 0.294  | 0.185  | 0.189  |
| Top200    | 0.285  | 0.276 | 0.283  | 0.181  | 0.186  |
| Bottom200 | 0.263  | 0.265 | 0.268  | 0.175  | 0.181  |
| Bottom100 | 0.257  | 0.256 | 0.257  | 0.170  | 0.176  |
| Bottom50  | 0.252  | 0.249 | 0.251  | 0.166  | 0.172  |
| nMSE      | 0.949  | 0.934 | 0.950  | 0.962  | 0.968  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 |
As shown in these tables, the best performance is achieved when α = 0.10 and β = 0.50. In fact, when α is set to 0, our proposed method discards the graph regularization term, which easily induces overfitting. If β is set to 0, our proposed method discards the supervised information, which easily yields unsatisfactory results. This conclusion can be verified in Section 4.4.
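To make the protocol concrete, the following is a minimal sketch of such a grid search on synthetic data. It uses a simplified graph-regularized transductive objective as a stand-in for Eq. (3.42) (the actual TLRMVR objective also involves the low-rank multi-view projections), and takes nMSE to be mean squared error normalized by the label variance, one common convention:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, the first n_l of which carry popularity labels.
n, n_l = 200, 100
X = rng.normal(size=(n, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=n)

# Graph Laplacian L built from an RBF affinity over all samples.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())
L = np.diag(W.sum(1)) - W

def nmse_for(alpha, beta):
    """Simplified transductive objective (a stand-in for Eq. (3.42)):
       min_f  alpha * f'Lf + beta * ||f_l - y_l||^2 + ||f||^2
    whose minimizer is f = (alpha*L + beta*J + I)^{-1} (beta * J y),
    with J masking the labeled entries; returns nMSE on the unlabeled part."""
    J = np.zeros((n, n))
    J[np.arange(n_l), np.arange(n_l)] = 1.0
    f = np.linalg.solve(alpha * L + beta * J + np.eye(n), beta * (J @ y))
    resid = f[n_l:] - y[n_l:]
    return (resid ** 2).mean() / y[n_l:].var()  # MSE normalized by variance

# Grid search over the ranges reported in the text.
grid = [(a, b) for a in np.arange(0.05, 0.31, 0.05)
               for b in np.arange(0.25, 1.26, 0.25)]
best_a, best_b = min(grid, key=lambda p: nmse_for(*p))
print(f"best alpha = {best_a:.2f}, beta = {best_b:.2f}, "
      f"nMSE = {nmse_for(best_a, best_b):.3f}")
```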
We also evaluated the influence of the reduced dimension D of the projection matrices. The performance of TLRMVR with D ranging from 10–60 is illustrated in Table 3.9. From the table, we find that the best dimension is 20: too small or too large a dimension leads to suboptimal prediction performance. Taking 20 as the reduced dimension is therefore a reasonable choice, in consideration of the complementary properties of the different views (a code sketch of this sweep follows the table).
Table 3.9: Performance comparison with different reduced dimensions D on our proposed framework on Dataset I

| D         | 10     | 20    | 30     | 40     | 50     | 60     |
|-----------|--------|-------|--------|--------|--------|--------|
| Top50     | 0.319  | 0.308 | 0.318  | 0.316  | 0.316  | 0.297  |
| Top100    | 0.288  | 0.279 | 0.286  | 0.277  | 0.277  | 0.270  |
| Top200    | 0.274  | 0.276 | 0.275  | 0.274  | 0.274  | 0.262  |
| Bottom200 | 0.269  | 0.265 | 0.269  | 0.267  | 0.267  | 0.256  |
| Bottom100 | 0.250  | 0.256 | 0.252  | 0.249  | 0.249  | 0.245  |
| Bottom50  | 0.243  | 0.249 | 0.243  | 0.241  | 0.241  | 0.236  |
| nMSE      | 0.950  | 0.934 | 0.947  | 0.949  | 0.951  | 0.953  |
| P-value   | < 0.05 | –     | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
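A sketch of the dimension sweep follows, with PCA and ridge regression as hypothetical stand-ins for the projection matrices and regressor that TLRMVR learns jointly (the selection logic, not the model, is the point here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))     # stand-in for concatenated multi-view features
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)

scores = {}
for D in (10, 20, 30, 40, 50, 60):
    # Project to D dimensions (PCA fit on all data, for brevity of the sketch).
    Z = PCA(n_components=D).fit_transform(X)
    mse = -cross_val_score(Ridge(), Z, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    scores[D] = mse / y.var()       # normalized MSE

best_D = min(scores, key=scores.get)
print("best D:", best_D, "nMSE:", round(scores[best_D], 3))
```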
Comparison with state-of-the-art methods We compared our proposed scheme with several existing state-of-the-art methods, including multiple linear regression (MLR), Lasso regression, support vector regression (SVR) [147], RegMVMT [190], multi-feature learning via hierarchical regression (MLHR) [181], multiple social network learning (MSNL) [149], multi-view discriminant analysis (MvDA) [75], transductive multi-modal learning (TMALL) [24], and the extreme learning machine (ELM) [67]; a code sketch of the simplest baselines follows the descriptions below.
MLR: Multiple linear regression (MLR) attempts to capture the dependency between two
or more independent variables and a response variable using a linear equation, which is an
extension of classical linear regression.
Lasso: Lasso regression considers both variable selection and regularization to enhance
the prediction performance.
SVR: Support vector regression [147] is a classical regression technique with a maximum
margin criterion. We combined all the features together with an RBF kernel to learn a
non-linear SVR in a high-dimensional kernel-induced feature space.
RegMVMT: RegMVMT [190] is an inductive learning framework addressing the general multi-view learning problem, in which a co-regularization technique is utilized to enforce agreement among the views on unlabeled samples.
MLHR: e multi-feature fusion via hierarchical regression [181] is a semi-supervised
learning method, which has been developed to explore the structural information embed-
ded in data from the view of multi-feature fusion.
MSNL: Multiple social network learning (MSNL) [149] is proposed to address the incomplete data problem by modeling source confidence and source consistency simultaneously.
MvDA: Multi-view discriminant analysis (MvDA) [75] is a multi-view learning model,
which has been developed to search for a latent common space by enforcing the view-
consistency of multi-linear transforms.
TMALL: e transductive multi-modal learning (TMALL) model is presented for pre-
dicting the popularity of micro-videos, in which different modal features can be unified
and preserved in a latent common space to address the insufficient information problems.
ELM: As ELM [68, 154] can embed a wide range of feature mappings, Huang et al. [67] extended ELM to kernel learning and proposed a unified learning mechanism for regression applications with higher scalability and lower computational complexity.
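For reference, a minimal sketch of the three simplest baselines with scikit-learn, run on synthetic stand-in features (the multi-view methods RegMVMT, MLHR, MSNL, MvDA, TMALL, and ELM require their respective published implementations); nMSE is again taken as MSE normalized by the label variance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))      # stand-in for concatenated multi-view features
y = X[:, :8].sum(axis=1) + 0.2 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baselines = {
    "MLR": LinearRegression(),      # multiple linear regression
    "Lasso": Lasso(alpha=0.1),      # L1-regularized variable selection
    "SVR": SVR(kernel="rbf"),       # max-margin regression with an RBF kernel
}
for name, model in baselines.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    nmse = mean_squared_error(y_te, pred) / y_te.var()  # MSE / label variance
    print(f"{name}: nMSE = {nmse:.3f}")
```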
Table 3.10 reports the prediction performance of our proposed method and the other state-of-the-art algorithms. From this table, we have the following observations: (1) our proposed TLRMVR performs the best among all the compared methods; (2) Lasso and MLR perform the worst, as expected, indicating that simple feature selection and linear regression are insufficient to predict the popularity of micro-videos; (3) in contrast to Lasso and MLR, the multi-view algorithms, including RegMVMT, MLHR, MSNL, MvDA, and TMALL, perform comparably well, which can be attributed to their ability to address the multi-view/multi-modal feature fusion problem; and (4) after employing the RBF kernel to deal with multiple features, the SVR model provides a significant improvement over the linear baselines.
Table 3.10: Performance comparison between our proposed method and several state-of-the-art methods on Dataset I

| Methods | nMSE             | P-value  |
|---------|------------------|----------|
| MLR     | 1.442 ± 2.55e-01 | 1.05e-07 |
| Lasso   | 1.568 ± 1.72e-01 | 4.42e-08 |
| SVR     | 0.991 ± 5.00e-02 | 7.36e-06 |
| RegMVMT | 1.058 ± 4.33e-05 | 1.88e-03 |
| MLHR    | 1.167 ± 1.40e-02 | 4.75e-06 |
| MSNL    | 1.098 ± 1.30e-01 | 2.11e-04 |
| MvDA    | 0.982 ± 7.00e-03 | 2.62e-05 |
| TMALL   | 0.979 ± 9.42e-03 | 1.43e-08 |
| ELM     | 0.982 ± 6.68e-05 | 3.71e-07 |
| TLRMVR  | 0.934 ± 7.67e-04 | –        |
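The p-values in Table 3.10 compare each baseline against TLRMVR over repeated runs. The chapter does not restate the exact test here; a paired t-test over per-run nMSE values is one common choice, sketched below with hypothetical numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical per-run nMSE values for one baseline and for TLRMVR,
# e.g., collected over ten random train/test splits.
baseline_runs = np.array([0.98, 0.99, 1.00, 0.97, 0.99,
                          0.98, 1.01, 0.99, 0.98, 1.00])
tlrmvr_runs   = np.array([0.93, 0.94, 0.93, 0.93, 0.94,
                          0.93, 0.95, 0.94, 0.93, 0.94])

t_stat, p_value = stats.ttest_rel(baseline_runs, tlrmvr_runs)
print(f"p-value = {p_value:.2e}")  # p < 0.05: the improvement is significant
```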