66 4. MULTIMODAL COOPERATIVE LEARNING
where the $\theta_i$'s are the introduced variables that should satisfy $\sum_{i=1}^{M}\theta_i = 1$, $\theta_i > 0$, and the equality holds for $\theta_i = |b_i|/\|\mathbf{b}\|_1$. Based on this preliminary, we can derive the following inequality:
\[
\left(\sum_{v\in V} e_v \left\|\mathbf{W}_{G_v}\right\|_{2,1}\right)^{2} \le \sum_{k=1}^{K}\sum_{v\in V} \frac{e_v^{2}\left\|\mathbf{w}_{G_v}^{k}\right\|_2^{2}}{q_{k,v}}, \tag{4.10}
\]
where $\sum_{k}\sum_{v} q_{k,v} = 1$, $q_{k,v} \ge 0$, $\forall k, v$, and $\mathbf{w}_{G_v}^{k}$ denotes the $k$-th row vector of the group matrix $\mathbf{W}_{G_v}$. It is worth noting that the equality holds when
\[
q_{k,v} = \frac{e_v \left\|\mathbf{w}_{G_v}^{k}\right\|_2}{\sum_{k=1}^{K}\sum_{v\in V} e_v \left\|\mathbf{w}_{G_v}^{k}\right\|_2}. \tag{4.11}
\]
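To make the bound concrete, here is a small numerical check (sizes and synthetic data are our own illustrative choices) that the right-hand side of Eq. (4.10) matches the squared left-hand side exactly at the minimizing $q_{k,v}$ of Eq. (4.11), and can only grow for any other feasible choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: K rows (tasks/features) and a set V of groups.
K, n_groups, d = 4, 3, 5
e = rng.uniform(0.5, 1.5, size=n_groups)                # group weights e_v
W = [rng.normal(size=(K, d)) for _ in range(n_groups)]  # sub-matrices W_{G_v}

# a[k, v] = e_v * ||w^k_{G_v}||_2, the terms summed on the left-hand side.
a = np.array([[e[v] * np.linalg.norm(W[v][k]) for v in range(n_groups)]
              for k in range(K)])

lhs = a.sum() ** 2                        # (sum_v e_v ||W_{G_v}||_{2,1})^2

# Optimal q from Eq. (4.11): q_{k,v} proportional to e_v ||w^k_{G_v}||_2.
q = a / a.sum()
rhs = np.sum(a ** 2 / q)                  # right-hand side of Eq. (4.10)
assert np.isclose(lhs, rhs)               # equality at the optimal q

# Any other feasible q (non-negative, summing to one) only loosens the bound.
q_bad = np.full_like(a, 1.0 / a.size)
assert np.sum(a ** 2 / q_bad) >= lhs
```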
Thus far, we have theoretically derived that minimizing the original objective with respect to $\mathbf{W}$ is equivalent to minimizing the following convex objective function:
\[
\min_{\mathbf{W},\, q_{k,v}} \frac{1}{2}\left\|\mathbf{Y} - \mathbf{B}\mathbf{W}\right\|_F^2 + \frac{\lambda_1}{2}\sum_{s=1}^{S}\left\|\mathbf{X}_s - \mathbf{A}_s\mathbf{B}\right\|_F^2 + \frac{\lambda_2}{2}\sum_{s=1}^{S}\left\|\mathbf{A}_s\right\|_F^2 + \frac{\lambda_3}{2}\sum_{k=1}^{K}\sum_{v\in V}\frac{e_v^{2}\left\|\mathbf{w}_{G_v}^{k}\right\|_2^{2}}{q_{k,v}}. \tag{4.12}
\]
To facilitate the computation of the derivative of the objective function with respect to $\mathbf{w}_t$ for the $t$-th task, we define a diagonal matrix $\mathbf{Q}^{t} \in \mathbb{R}^{K\times K}$ with the diagonal entries as follows:
\[
\mathbf{Q}_{kk}^{t} = \sum_{\{v\in V \mid t\in G_v\}} \frac{e_v^{2}}{q_{k,v}}. \tag{4.13}
\]
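As a concrete sketch of Eq. (4.13) (the function name and data layout are our own assumptions; each group $G_v$ is stored as the set of task indices it contains):

```python
import numpy as np

def build_Q_t(t, groups, e, q):
    """Diagonal matrix Q^t of Eq. (4.13).

    t      : task index
    groups : list of sets; groups[v] holds the task indices in G_v
    e      : (|V|,) array of group weights e_v
    q      : (K, |V|) array of the variational variables q_{k,v}
    """
    K = q.shape[0]
    diag = np.zeros(K)
    for v, g in enumerate(groups):
        if t in g:                       # sum only over groups containing task t
            diag += e[v] ** 2 / q[:, v]
    return np.diag(diag)
```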
We ultimately have the following objective function:
\[
\min_{\mathbf{W},\,\mathbf{Q}} \frac{1}{2}\sum_{t=1}^{T}\left\|\mathbf{y}_t - \mathbf{B}\mathbf{w}_t\right\|_2^2 + \frac{\lambda_1}{2}\sum_{s=1}^{S}\left\|\mathbf{X}_s - \mathbf{A}_s\mathbf{B}\right\|_F^2 + \frac{\lambda_2}{2}\sum_{s=1}^{S}\left\|\mathbf{A}_s\right\|_F^2 + \frac{\lambda_3}{2}\sum_{t=1}^{T}\mathbf{w}_t^{\top}\mathbf{Q}^{t}\mathbf{w}_t. \tag{4.14}
\]
The alternating optimization strategy is also applicable here. By fixing $\mathbf{Q}^{t}$, taking the derivative of the above formulation with respect to $\mathbf{w}_t$, and setting it to zero, we reach
\[
\mathbf{w}_t = \left(\mathbf{B}^{\top}\mathbf{B} + \lambda_3\mathbf{Q}^{t}\right)^{-1}\mathbf{B}^{\top}\mathbf{y}_t. \tag{4.15}
\]
Once we obtain all the $\mathbf{w}_t$, we can easily compute $\mathbf{Q}^{t}$ based on Eq. (4.11).
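The full alternating scheme of Eqs. (4.13)–(4.15) can be sketched as follows; the synthetic data, the sizes, the example tree nodes, and the fixed 20 iterations are all our own illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, T = 50, 6, 4                       # samples, feature dim, tasks
B = rng.normal(size=(n, K))              # shared representation B
Y = rng.normal(size=(n, T))              # per-task targets y_t (columns of Y)
groups = [{0, 1}, {2, 3}, {0, 1, 2, 3}]  # hypothetical tree nodes G_v (task index sets)
e = np.array([1.0, 1.0, 0.5])            # node weights e_v
lam3 = 0.1                               # trade-off parameter lambda_3

W = np.zeros((K, T))
q = np.full((K, len(groups)), 1.0 / (K * len(groups)))

for _ in range(20):
    # Step 1: fix q, solve Eq. (4.15) for each task's weight vector w_t,
    # with Q^t assembled from Eq. (4.13).
    for t in range(T):
        diag = sum(e[v] ** 2 / q[:, v] for v, g in enumerate(groups) if t in g)
        Q_t = np.diag(diag)
        W[:, t] = np.linalg.solve(B.T @ B + lam3 * Q_t, B.T @ Y[:, t])

    # Step 2: fix W, update q by the closed form of Eq. (4.11);
    # w^k_{G_v} gathers the entries of row k belonging to the tasks in G_v.
    a = np.array([[e[v] * np.linalg.norm(W[k, sorted(g)])
                   for v, g in enumerate(groups)]
                  for k in range(K)])
    a = np.maximum(a, 1e-12)             # floor to keep q strictly positive
    q = a / a.sum()
```

Each step solves its subproblem in closed form, so the objective of Eq. (4.14) is non-increasing across iterations.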
4.4.2 TASK RELATEDNESS ESTIMATION
According to our assumption, the hierarchical tree structure of venue categories plays a pivotal role in boosting the learning performance of our model. Hence, the key issue is how to precisely