3. MULTIMODAL TRANSDUCTIVE LEARNING
(a) Illustration of three micro-video pairs, each pair published by two distinct users. The publishers of the videos in the top row are much more famous than those in the bottom row.
(b) Illustration of three micro-video pairs, each pair published by the same user. The videos in the first row are much more acoustically comfortable, visually joyful, and aesthetically beautiful than those in the second row.
(c) Illustration of three popular micro-videos with different textual descriptions, which contain superstar names, hot events, and detailed information, respectively.
Figure 3.2: Comparative illustration of video examples in Dataset I. They respectively justify the importance of the social, the acoustic together with the visual, and the textual modalities. We use three key frames to represent each video.
The construction of $H$ has a time complexity of $O(K^2(N+M))$. Fortunately, $H$ remains the same in each iteration and can therefore be computed offline. The computation of $g$ costs $O(K(N+M)^2)$. In addition, computing the inverses of $H$ and $G$ has complexity $O(K^3)$ and $O((N+M)^3)$, respectively. The computation of $\beta$ in Eq. (3.19) costs $O(K^2)$. Therefore, the speed bottleneck lies in computing the inverse of $G$. In practice, the proposed TMALL model converges very fast, taking fewer than 10 iterations on average. Overall, the learning process over 9,720 micro-videos can be accomplished within 50 s.
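To make this cost breakdown concrete, the following is a minimal Python sketch of the iteration structure it describes. The symbols $H$, $G$, $g$, $\beta$, $K$, $N$, and $M$ come from the text, but their concrete definitions and the update rule of Eq. (3.19) are not reproduced here; the arrays below are random stand-ins with the stated dimensions only, so the sketch illustrates where each complexity term arises rather than the actual TMALL model.

```python
import numpy as np

# Hypothetical stand-ins for H, G, g, and beta with the stated dimensions:
# K latent dimensions, N labeled plus M unlabeled micro-videos.
K, N, M = 20, 500, 200            # toy sizes; the real dataset has 9,720 videos

rng = np.random.default_rng(0)
X = rng.standard_normal((K, N + M))
y = rng.standard_normal(N + M)

# H (K x K) is identical in every iteration, so it is built and inverted once,
# "offline": O(K^2 (N + M)) to construct, O(K^3) to invert.
H = X @ X.T + np.eye(K)
H_inv = np.linalg.inv(H)

beta = np.zeros(K)
for it in range(10):              # the text reports convergence in < 10 iterations
    # G ((N+M) x (N+M)) is handled per iteration; inverting it costs
    # O((N + M)^3) and is the speed bottleneck identified above.
    # (The text implies G, unlike H, is not fixed across iterations; it is
    # static here only to keep the sketch short.)
    G = X.T @ X + np.eye(N + M)
    G_inv = np.linalg.inv(G)

    # Forming g via the K x (N+M) product with G_inv costs O(K (N + M)^2).
    g = (X @ G_inv) @ y

    # Updating beta, as in Eq. (3.19), is only O(K^2) once H_inv and g are known.
    beta = H_inv @ g
```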