where $y_n$ denotes the category label of the sample $\mathbf{x}_n$. We ultimately reach the following objective function:

$$
\min_{\mathbf{A},\,\mathbf{Q}} \;\; \frac{1}{2}\sum_{n=1}^{N}\big\|\mathbf{x}_n-\mathbf{D}\mathbf{a}_n\big\|_F^2
+\frac{\lambda}{2}\sum_{n=1}^{N}\big\|\mathbf{a}_n\big\|_F^2
+\frac{\gamma}{2}\sum_{n=1}^{N}(\mathbf{a}_n)^{\top}\mathbf{Q}_n\mathbf{a}_n. \qquad (4.27)
$$
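For concreteness, the objective in Eq. (4.27) can be evaluated with a few lines of NumPy. This is only an illustrative sketch: the function name is ours, $\lambda$ and $\gamma$ denote the two trade-off parameters, and the matrices $\mathbf{Q}_n$ are assumed to be supplied (their construction follows Eqs. (4.24) and (4.26)).

```python
import numpy as np

def objective(X, D, A, Q, lam, gamma):
    """Evaluate Eq. (4.27).

    X: (d, N) samples, D: (d, K) dictionary, A: (K, N) codes,
    Q: length-N list of (K, K) tree-guided matrices Q_n.
    """
    recon = 0.5 * np.sum((X - D @ A) ** 2)              # 1/2 sum ||x_n - D a_n||^2
    ridge = 0.5 * lam * np.sum(A ** 2)                  # lambda/2 sum ||a_n||^2
    tree = 0.5 * gamma * sum(A[:, n] @ Q[n] @ A[:, n]   # gamma/2 sum a_n^T Q_n a_n
                             for n in range(A.shape[1]))
    return recon + ridge + tree
```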
The alternating optimization strategy is applicable here. By fixing $\mathbf{Q}_n$, taking the derivative of the above formulation with respect to $\mathbf{a}_n$, and setting it to zero, we reach

$$
\begin{cases}
-\mathbf{D}^{\top}(\mathbf{x}_n-\mathbf{D}\mathbf{a}_n)+\lambda\mathbf{a}_n+\gamma\mathbf{Q}_n\mathbf{a}_n=\mathbf{0},\\[2pt]
\big(\mathbf{D}^{\top}\mathbf{D}+\lambda\mathbf{I}+\gamma\mathbf{Q}_n\big)\,\mathbf{a}_n=\mathbf{D}^{\top}\mathbf{x}_n,\\[2pt]
\mathbf{a}_n=\big(\mathbf{D}^{\top}\mathbf{D}+\lambda\mathbf{I}+\gamma\mathbf{Q}_n\big)^{-1}\mathbf{D}^{\top}\mathbf{x}_n.
\end{cases} \qquad (4.28)
$$

Once we obtain all the $\mathbf{a}_n$, we can easily compute $\mathbf{Q}_n$ based on Eqs. (4.24) and (4.26).
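The closed-form solution in Eq. (4.28) amounts to solving a small linear system per sample. Below is a minimal sketch, assuming $\mathbf{D}$, $\mathbf{Q}_n$, and the trade-off parameters are given (the function name is ours).

```python
import numpy as np

def sparse_code(x_n, D, Q_n, lam, gamma):
    """Eq. (4.28): a_n = (D^T D + lam*I + gamma*Q_n)^{-1} D^T x_n."""
    K = D.shape[1]                                   # number of dictionary atoms
    lhs = D.T @ D + lam * np.eye(K) + gamma * Q_n
    # Solving the system is cheaper and more stable than forming the inverse.
    return np.linalg.solve(lhs, D.T @ x_n)
```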
Computing D with A fixed: Fixing $\mathbf{A}$ and denoting the objective in Eq. (4.27) by $\Gamma$, we take the derivative of $\Gamma$ with respect to $\mathbf{D}$ and have

$$
\frac{\partial\Gamma}{\partial\mathbf{D}}=(\mathbf{D}\mathbf{A}-\mathbf{X})\mathbf{A}^{\top}. \qquad (4.29)
$$

By setting Eq. (4.29) to zero, it can be derived that

$$
\begin{cases}
\mathbf{D}\mathbf{A}\mathbf{A}^{\top}-\mathbf{X}\mathbf{A}^{\top}=\mathbf{0},\\[2pt]
\mathbf{D}=\mathbf{X}\mathbf{A}^{\top}\big(\mathbf{A}\mathbf{A}^{\top}\big)^{-1}.
\end{cases} \qquad (4.30)
$$
It is straightforward to see that the above algorithm converges: the objective $\Gamma$ is bounded below and decreases in each iteration, as shown in Figure 4.6. By using Algorithm 4.2, we can learn a set of dictionaries $\mathbf{D}_m$ for each modality of samples $\mathbf{X}_m$, together with their corresponding representations $\mathbf{A}_m$.
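The whole alternating procedure can be summarized as follows. This is only a sketch in the spirit of Algorithm 4.2, not the algorithm verbatim: compute_Q stands in for the construction of $\mathbf{Q}_n$ in Eqs. (4.24) and (4.26), which is not reproduced in this excerpt, the random initialization is our choice, and the small ridge added before the inversion in Eq. (4.30) is a numerical safeguard rather than part of the formulation.

```python
import numpy as np

def learn_dictionary(X, K, lam, gamma, compute_Q, n_iter=10, seed=0):
    """Alternate the code update of Eq. (4.28) and the dictionary update of Eq. (4.30)."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, K))
    A = np.zeros((K, N))
    for _ in range(n_iter):
        # Fix D and Q_n, update every code a_n via Eq. (4.28).
        for n in range(N):
            Q_n = compute_Q(A[:, n])
            lhs = D.T @ D + lam * np.eye(K) + gamma * Q_n
            A[:, n] = np.linalg.solve(lhs, D.T @ X[:, n])
        # Fix A, update the dictionary via Eq. (4.30); the ridge keeps A A^T invertible.
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-8 * np.eye(K))
    return D, A
```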
4.5.4 ONLINE LEARNING
As analyzed in the introduction, the efficient operation and incremental learning of micro-videos deserve our attention. To accomplish this, we present an online learning algorithm (referred to as Algorithm 4.3). Generally speaking, if an incoming sample is labeled, we leverage it to strengthen the dictionary learning. We treat the $\mathbf{D}$ learned over the initial training data as $\mathbf{D}^{(0)}$ and update it to $\mathbf{D}^{(t)}$ at the current time $t$. Otherwise, we compute the sample's sparse representation based on the current dictionaries and classify it into the right venue category.
An Incoming Labeled Sample: At the $t$-th online update, a new sample $\mathbf{x}_t$ with a label $y_t$ is given. We can know which leaf node this micro-video belongs to and then use it to update the dictionaries $\mathbf{D}^{(t-1)}$. From Eq. (4.30), we find that the solution of $\mathbf{D}^{(t)}$ relies on the sparse representation $\mathbf{A}^{(t)}=[\mathbf{A}^{(t-1)},\mathbf{a}_t]$. We thus need to first compute $\mathbf{a}_t$, the representation vector of $\mathbf{x}_t$. However, Eq. (4.28) tells us that $\mathbf{a}_t$ is related to $\mathbf{Q}_t$, which is computed from $\mathbf{A}^{(t)}=[\mathbf{A}^{(t-1)},\mathbf{a}_t]$.
Figure 4.6: Example of the convergence of Algorithm 4.2 (objective value, on a logarithmic scale, over ten iterations).
To address this problem, we first initialize $\mathbf{a}_t$ to obtain a temporary $\mathbf{A}^{(t)}=[\mathbf{A}^{(t-1)},\mathbf{a}_t]$, and then use Eq. (4.26) to compute $\mathbf{Q}_t$. Afterward, we use Eq. (4.28) to compute $\mathbf{a}_t$ with $\mathbf{D}^{(t-1)}$ as the dictionary. We repeat this procedure until we obtain a stable $\mathbf{A}^{(t)}$ for the sample $\mathbf{x}_t$.
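This alternation between $\mathbf{Q}_t$ and $\mathbf{a}_t$ is a short fixed-point loop. The sketch below is illustrative only: compute_Q again stands in for Eq. (4.26), and the tolerance and iteration cap are our assumptions.

```python
import numpy as np

def online_sparse_code(x_t, D_prev, lam, gamma, compute_Q, tol=1e-6, max_iter=50):
    """Alternate Eq. (4.26) and Eq. (4.28) until a_t stabilizes (D^{(t-1)} fixed)."""
    K = D_prev.shape[1]
    a_t = np.zeros(K)                                  # initial guess for a_t
    for _ in range(max_iter):
        Q_t = compute_Q(a_t)                           # Eq. (4.26)
        lhs = D_prev.T @ D_prev + lam * np.eye(K) + gamma * Q_t
        a_new = np.linalg.solve(lhs, D_prev.T @ x_t)   # Eq. (4.28)
        if np.linalg.norm(a_new - a_t) < tol:
            return a_new
        a_t = a_new
    return a_t
```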
To estimate $\mathbf{D}^{(t)}$ with $\mathbf{A}^{(t)}$ fixed, we adopt a procedure similar to the one introduced in [109]. In particular, we sequentially update each column of $\mathbf{D}^{(t)}$. We take the $j$-th column as an example to illustrate the procedure.
We define $\mathbf{d}_j(t)$ as the $j$-th column of $\mathbf{D}^{(t)}$ and set

$$
g\big(\mathbf{D}^{(t)}\big)=\frac{1}{2}\sum_{i=1}^{t}\big\|\mathbf{x}_i-\mathbf{D}^{(t)}\mathbf{a}_i\big\|_2^2. \qquad (4.31)
$$
We then set $\nabla_{\mathbf{d}_j(t)}\, g\big(\mathbf{D}^{(t)}\big)$ to zero, and obtain

$$
\mathbf{d}_j(t)=\frac{\sum_{i=1}^{t} a_{ij}^{\top}\big(\mathbf{x}_i-\widetilde{\mathbf{D}}\widetilde{\mathbf{a}}_i\big)}{\sum_{i=1}^{t} a_{ij}^{\top}a_{ij}}, \qquad (4.32)
$$
where $a_{ij}$ is the $j$-th entry of $\mathbf{a}_i$, $\widetilde{\mathbf{D}}=\mathbf{D}^{(t-1)}\setminus\{\mathbf{d}_j(t-1)\}$ is the dictionary excluding the $j$-th atom, and $\widetilde{\mathbf{a}}_i=\mathbf{a}_i\setminus\{a_{ij}\}$ collects the coefficients corresponding to the atoms of $\widetilde{\mathbf{D}}$.
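A direct implementation of the column update in Eq. (4.32) looks as follows. The names are ours, the samples and codes observed so far are passed in explicitly (the accumulators of Eq. (4.34) later remove this need), and the tiny denominator guard is an added safeguard for atoms that are never used.

```python
import numpy as np

def update_column(j, D_prev, X_seen, A_seen):
    """Eq. (4.32): refresh the j-th atom from the samples seen so far.

    D_prev: (d, K) dictionary D^{(t-1)}, X_seen: (d, t) samples x_1..x_t,
    A_seen: (K, t) codes a_1..a_t.
    """
    a_j = A_seen[j, :]                               # the entries a_{ij}, i = 1..t
    # x_i - D~ a~_i, i.e., residuals with the contribution of atom j removed
    residual = X_seen - D_prev @ A_seen + np.outer(D_prev[:, j], a_j)
    numerator = residual @ a_j                       # sum_i a_{ij} (x_i - D~ a~_i)
    denominator = max(a_j @ a_j, 1e-12)              # sum_i a_{ij}^2, guarded
    return numerator / denominator
```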
Since $\widetilde{\mathbf{D}}\widetilde{\mathbf{a}}_i=\mathbf{D}^{(t-1)}\mathbf{a}_i-\mathbf{d}_j(t-1)\,a_{ij}$, substituting this into Eq. (4.32) gives the additive form of the solution:

$$
\mathbf{d}_j(t)=\frac{\sum_{i=1}^{t} a_{ij}^{\top}\big(\mathbf{x}_i-\mathbf{D}^{(t-1)}\mathbf{a}_i\big)}{\sum_{i=1}^{t} a_{ij}^{\top}a_{ij}}+\mathbf{d}_j(t-1). \qquad (4.33)
$$
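As a quick sanity check on the equivalence of Eqs. (4.32) and (4.33), the snippet below evaluates both forms on random synthetic data, reusing the illustrative update_column sketched above; the two results coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, t, j = 6, 4, 8, 2
X_seen = rng.standard_normal((d, t))
A_seen = rng.standard_normal((K, t))
D_prev = rng.standard_normal((d, K))

d_j_direct = update_column(j, D_prev, X_seen, A_seen)         # Eq. (4.32)

a_j = A_seen[j, :]                                            # Eq. (4.33)
d_j_additive = (X_seen - D_prev @ A_seen) @ a_j / (a_j @ a_j) + D_prev[:, j]

print(np.allclose(d_j_direct, d_j_additive))                  # True
```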
Algorithm 4.3 Our INTIMATE Algorithm
Input:
    Initial input matrices $\{\mathbf{X}_m\}_{m=1}^{M}$;
    Streaming data $\{\ldots,\mathbf{x}_t,\ldots\}$;
    Node assignments $\{\mathcal{G}_v\}_{v=1}^{V}$ with weights $\{e_v\}_{v=1}^{V}$;
    Parameters $\{\lambda,\gamma\}$;
Ensure:
    Discriminant dictionaries $\{\mathbf{D}_m\}_{m=1}^{M}$;
    Sparse codes $\{\mathbf{a}_m^{(t)}\}_{m=1}^{M}$ of $\mathbf{x}_t$ and its label;
1: Initialize $\{\mathbf{D}_m^{(0)}\}_{m=1}^{M}$ and $\{\mathbf{A}_m^{(0)}\}_{m=1}^{M}$ using Algorithm 4.2;
2: for each modality $m$ do
3:     Train the classifier $f_m$ using $\mathbf{A}_m^{(0)}$;
4: end for
5: Initialize $t \leftarrow 1$;
6: for each newly arrived sample $\mathbf{x}^{(t)}$ in the stream do
7:     if $\mathbf{x}^{(t)}$ has a label $y^{(t)}$ then
8:         for each modality $m$ do
9:             Fixing $\mathbf{D}_m^{(t-1)}$, learn $\mathbf{a}_m^{(t)}$ using Eq. (4.28);
10:            Fixing $\mathbf{A}_m^{(t-1)}$ and $\mathbf{a}_m^{(t)}$, update $\mathbf{D}_m^{(t)}$ using Eqs. (4.35) and (4.36);
11:        end for
12:    else if $\mathbf{x}^{(t)}$ is unlabeled then
13:        for each modality $m$ do
14:            Learn the representation $\mathbf{a}_m^{(t)}$ with $\mathbf{D}_m^{(t-1)}$;
15:            Leveraging $\mathbf{a}_m^{(t)}$ and $f_m$, predict its label $y_t^m$;
16:        end for
17:        Based on $\{y_t^m\}$, obtain the final label $y_t$ using Eq. (4.37);
18:    end if
19:    Update $t \leftarrow t+1$;
20: end for
21: return $\mathbf{D}_m \leftarrow \mathbf{D}_m^{(t)}$
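The per-sample dispatch of Algorithm 4.3 is easy to express in code. The sketch below is schematic for a single time step: encode wraps the representation step (e.g., online_sparse_code above), update_dictionary stands in for Eqs. (4.35) and (4.36), and fuse_labels stands in for Eq. (4.37); none of these three are spelled out verbatim in this excerpt.

```python
def process_sample(x_t, y_t, models, encode, update_dictionary, fuse_labels):
    """One step of Algorithm 4.3 over all modalities.

    x_t: dict mapping modality m to its feature vector;
    models: dict mapping m to {'D': dictionary, 'clf': classifier f_m};
    y_t: the label, or None for an unlabeled sample.
    """
    codes = {m: encode(x_t[m], mdl['D']) for m, mdl in models.items()}
    if y_t is not None:                       # labeled: strengthen the dictionaries
        for m, mdl in models.items():
            mdl['D'] = update_dictionary(mdl['D'], x_t[m], codes[m])
        return y_t
    # unlabeled: predict per modality, then fuse into the final venue category
    votes = {m: mdl['clf'].predict(codes[m]) for m, mdl in models.items()}
    return fuse_labels(votes)
```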
We set

$$
\begin{cases}
\mathbf{U}(t)=\big[\mathbf{u}_1(t),\ldots,\mathbf{u}_K(t)\big]=\mathbf{U}(t-1)+\mathbf{a}_t(\mathbf{a}_t)^{\top},\\[2pt]
\mathbf{F}(t)=\big[\mathbf{f}_1(t),\ldots,\mathbf{f}_K(t)\big]=\mathbf{F}(t-1)+\mathbf{x}_t(\mathbf{a}_t)^{\top},\\[2pt]
\mathbf{U}(0)=\sum_{n=1}^{N}\mathbf{a}_n^{(0)}\big(\mathbf{a}_n^{(0)}\big)^{\top},\\[2pt]
\mathbf{F}(0)=\sum_{n=1}^{N}\mathbf{x}_n^{(0)}\big(\mathbf{a}_n^{(0)}\big)^{\top}. \qquad (4.34)
\end{cases}
$$
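The recursions in Eq. (4.34) keep only the sufficient statistics $\mathbf{U}(t)$ and $\mathbf{F}(t)$, so no past sample has to be revisited when the dictionary is refreshed. A minimal bookkeeping sketch follows; note that the per-atom refresh in refresh_atom is our assumption in the spirit of the online scheme of [109], since Eqs. (4.35) and (4.36) lie outside this excerpt.

```python
import numpy as np

class Accumulators:
    """Running statistics of Eq. (4.34): U(t) = sum_i a_i a_i^T, F(t) = sum_i x_i a_i^T."""

    def __init__(self, X0, A0):
        self.U = A0 @ A0.T                  # U(0), a K x K matrix
        self.F = X0 @ A0.T                  # F(0), a d x K matrix

    def update(self, x_t, a_t):
        self.U += np.outer(a_t, a_t)        # U(t) = U(t-1) + a_t a_t^T
        self.F += np.outer(x_t, a_t)        # F(t) = F(t-1) + x_t a_t^T

    def refresh_atom(self, D, j):
        # Assumed column update built on U and F (not given verbatim in the text):
        # d_j <- d_j + (f_j - D u_j) / U_jj, following [109].
        u_j, f_j = self.U[:, j], self.F[:, j]
        return D[:, j] + (f_j - D @ u_j) / max(self.U[j, j], 1e-12)
```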