consider the squared loss over the N unlabeled samples to guarantee the learning performance. We ultimately reach our objective function as
$$
\min_{f,\, L(S^0)} \; \sum_{i=1}^{N} \left( y_i - f_i \right)^2 + \lambda\, f^{T} L(S^0)\, f + \mu \sum_{k=1}^{K} \left\| \frac{1}{\mathrm{tr}\left(L(S^0)\right)} L(S^0) - \frac{1}{\mathrm{tr}\left(L(S^k)\right)} L(S^k) \right\|_{F}^{2}, \tag{3.8}
$$
where $\lambda$ and $\mu$ are both nonnegative regularization parameters. To be more specific, $\mu$ penalizes the disagreement between the latent space and the modalities, and $\lambda$ encourages similar popularity to be assigned to similar micro-videos.
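As a concrete illustration, the following is a minimal NumPy sketch that evaluates the objective in Eq. (3.8) for given predictions. The function name `objective` and the argument names `lam` and `mu` (standing for $\lambda$ and $\mu$) are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def objective(f, y, L0, Lks, lam, mu):
    """Evaluate Eq. (3.8): squared loss + smoothness + modality agreement.

    f   : (N,) predicted popularity scores
    y   : (N,) observed popularity scores
    L0  : (N, N) Laplacian of the latent common space, L(S^0)
    Lks : list of K (N, N) modality Laplacians, L(S^k)
    lam, mu : nonnegative regularization parameters
    """
    loss = np.sum((y - f) ** 2)            # squared loss over the N samples
    smooth = lam * f @ L0 @ f              # f^T L(S^0) f: similar videos get similar popularity
    L0_norm = L0 / np.trace(L0)            # trace-normalized latent Laplacian
    agree = mu * sum(
        np.linalg.norm(L0_norm - Lk / np.trace(Lk), "fro") ** 2
        for Lk in Lks                      # disagreement with each modality Laplacian
    )
    return loss + smooth + agree
```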
3.6.2 OPTIMIZATION
To simplify the representation, we first define

$$
\begin{cases}
\tilde{L} = \dfrac{1}{\mathrm{tr}\left(L(S^0)\right)}\, L(S^0), \\[2mm]
\tilde{L}_k = \dfrac{1}{\mathrm{tr}\left(L(S^k)\right)}\, L(S^k).
\end{cases} \tag{3.9}
$$
Therefore, the objective function can be transformed to
$$
\min_{f} \; \sum_{i=1}^{N} \left( y_i - f_i \right)^2 + \mu \sum_{k=1}^{K} \left\| \tilde{L} - \tilde{L}_k \right\|_{F}^{2} + \lambda\, f^{T} \tilde{L} f, \quad \text{s.t.} \;\; \mathrm{tr}\left(L(S^0)\right) = 1. \tag{3.10}
$$
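Note that for a fixed $\tilde{L}$, Eq. (3.10) is quadratic in $f$: the Frobenius term does not involve $f$, so setting the gradient of the remaining terms to zero gives $(I + \lambda \tilde{L}) f = y$ when $\tilde{L}$ is symmetric. Below is a minimal sketch of this update step, under the symbol conventions of Eq. (3.8); the helper name `solve_f` is ours.

```python
import numpy as np

def solve_f(y, L_tilde, lam):
    """Closed-form f update for Eq. (3.10) with L-tilde held fixed.

    Setting the gradient of sum_i (y_i - f_i)^2 + lam * f^T L f to zero
    gives (I + lam * L) f = y, assuming L is symmetric.
    """
    N = y.shape[0]
    return np.linalg.solve(np.eye(N) + lam * L_tilde, y)
```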
Furthermore, to optimize $\tilde{L}$ more efficiently, inspired by the property that $\mathrm{tr}(\tilde{L}_k) = 1$, we let

$$
L(S^0) = \sum_{k=1}^{K} \beta_k \tilde{L}_k, \quad \text{s.t.} \;\; \sum_{k=1}^{K} \beta_k = 1. \tag{3.11}
$$
Consequently, we have

$$
\tilde{L} = \frac{1}{\mathrm{tr}\left(L(S^0)\right)}\, L(S^0) = \sum_{k=1}^{K} \beta_k \tilde{L}_k, \quad \text{s.t.} \;\; \sum_{k=1}^{K} \beta_k = 1. \tag{3.12}
$$
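Since each $\tilde{L}_k$ has unit trace and the $\beta_k$ sum to one, any combination of the form in Eq. (3.12) automatically satisfies $\mathrm{tr}(\tilde{L}) = 1$. A small sketch of this construction follows (all names are our own):

```python
import numpy as np

def combine_laplacians(betas, Lks_norm):
    """Build L-tilde = sum_k beta_k * L-tilde_k as in Eq. (3.12).

    betas    : (K,) mixing coefficients summing to one (may be negative)
    Lks_norm : list of K trace-normalized modality Laplacians
    """
    assert np.isclose(np.sum(betas), 1.0), "betas must sum to one"
    L_tilde = sum(b * Lk for b, Lk in zip(betas, Lks_norm))
    # tr(L_tilde) = sum_k beta_k * tr(L_tilde_k) = sum_k beta_k = 1
    return L_tilde
```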
Interestingly, we find that $\beta_k$ can be treated as the co-related degree between the latent common space and each modality. It is worth noting that we do not impose the constraint $\beta_k \geq 0$, since we want to keep both positive and negative co-relations. A positive coefficient indicates a positive correlation between the modality space and the latent common space, while a negative coefficient reflects a negative correlation, which may be due to noisy data in the modality. The larger $\beta_k$ is, the higher the correlation between the latent space and the $k$-th modality will be.