124 5. MULTIMODAL TRANSFER LEARNING
the performance also improves rapidly. This demonstrates the rationality of our learning model.
In addition, both the loss and the performance become stable at around 30 iterations. This signals
the convergence of our model and also indicates its efficiency.
The key idea of the dropout technique is to randomly drop units (along with their connec-
tions) from the neural network during training. This prevents units from co-adapting too much.
Figure 5.6d displays the macro-F1 and micro-F1 obtained by varying the dropout ratio. From this fig-
ure, it can be seen that both measurements consistently reach their best values at a dropout
ratio of 0.1. Beyond 0.1, the performance decreases gradually as the dropout ratio in-
creases, which may be caused by insufficient information being retained. We can also see that our
model suffers from overfitting, with relatively lower performance, when the dropout ratio is set to 0.
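The random-masking step described above can be sketched as a small "inverted dropout" function. This is our own NumPy illustration, not the book's implementation; the function name and seeding are assumptions.

```python
import numpy as np

def dropout(x, ratio, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with
    probability `ratio` and rescale the survivors by 1/(1 - ratio)
    so the expected activation is unchanged at inference time."""
    if not training or ratio == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(x.shape) >= ratio   # keep a unit with prob. 1 - ratio
    return x * mask / (1.0 - ratio)

h = np.ones(8)
train_out = dropout(h, ratio=0.1)                  # some units zeroed, rest scaled up
eval_out = dropout(h, ratio=0.1, training=False)   # identity at inference
```

Setting `ratio=0` recovers plain training without dropout, which is the overfitting-prone configuration discussed above.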
We also studied the impact of hidden layers on our DARE model. To save tuning costs,
we applied the same dropout ratio of 0.1 to each hidden layer. The results of
our model with one, two, and three hidden layers are summarized in Table 5.4. Usually, stacking
more hidden layers helps boost performance. However, we notice that our
model achieves the best results across metrics with only one hidden layer. This may be because,
as the authors of AlexNet clarified, the current seven-layer AlexNet structure is optimal and more
layers would lead to worse results. In our work, the abstract features of the visual modality were
extracted by the seven-layer AlexNet. Therefore, stacking more hidden layers in our DARE
model effectively adds more layers on top of AlexNet.
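The depth comparison can be sketched as building MLPs with the three hidden-layer configurations from Table 5.4 and running a forward pass with the shared 0.1 dropout ratio. The 4096-dimensional input (AlexNet fc7-style features) and the 188 output categories are our own assumptions for illustration, not values from the book.

```python
import numpy as np

def build_mlp(in_dim, hidden_dims, out_dim, rng):
    """Create He-initialized weight matrices for an MLP whose hidden
    sizes follow one configuration from Table 5.4, e.g. [1024, 1024]."""
    dims = [in_dim] + list(hidden_dims) + [out_dim]
    return [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
            for a, b in zip(dims[:-1], dims[1:])]

def forward(weights, x, drop_ratio=0.1, rng=None):
    """ReLU MLP forward pass applying the same dropout ratio to each hidden layer."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)                 # hidden layer + ReLU
        mask = rng.random(h.shape) >= drop_ratio   # dropout after each hidden layer
        h = h * mask / (1.0 - drop_ratio)
    return h @ weights[-1]                         # linear output (logits)

rng = np.random.default_rng(42)
x = rng.standard_normal((2, 4096))                 # assumed fc7-like feature batch
logits = {}
for hidden in ([1024], [1024, 1024], [1024, 1024, 1024]):
    weights = build_mlp(4096, hidden, 188, rng)    # 188 categories: assumed
    logits[len(hidden)] = forward(weights, x)
```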
Table 5.4: Performance of DARE with different hidden layers on Dataset II (p-value1 and
p-value2 are, respectively, the p-values over micro-F1 and macro-F1)

Hidden Layers         Micro-F1         Macro-F1         p-value1   p-value2
[1024]                31.21 ± 0.22%    16.66 ± 0.30%    -          -
[1024, 1024]          30.67 ± 0.06%    15.57 ± 0.03%    1.32e-2    3.50e-3
[1024, 1024, 1024]    29.43 ± 0.02%    13.37 ± 0.04%    1.17e-4    1.57e-6
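The p-values in Table 5.4 come from significance tests over repeated runs. As a stdlib-only sketch, the Welch t statistic below is computed from hypothetical per-run micro-F1 scores; the score lists are invented for illustration, and converting t to a p-value requires a t-distribution CDF (e.g. scipy.stats.t.sf), omitted here.

```python
import statistics as st

def welch_t(a, b):
    """Welch's t statistic for two small samples of per-run F1 scores."""
    va, vb = st.variance(a), st.variance(b)       # sample variances
    se = (va / len(a) + vb / len(b)) ** 0.5       # standard error of the mean difference
    return (st.mean(a) - st.mean(b)) / se

# Hypothetical micro-F1 scores over repeated runs, loosely matching the
# means/stds reported for one vs. two hidden layers (illustrative only).
one_layer = [31.0, 31.2, 31.4]
two_layers = [30.60, 30.67, 30.74]
t = welch_t(one_layer, two_layers)
```

A large positive t here favors the one-hidden-layer configuration, consistent with the small p-values in the table.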
5.7 SUMMARY
In this chapter, we studied the task of micro-video category estimation. In particular, we first
performed a user study showing that the acoustic modality conveys useful cues signaling venue
information, yet is of low quality. We then pointed out that the distribution of training samples
over venue categories is extremely unbalanced. To address these problems, we presented a deep
transfer model, which is able to transfer external sound knowledge to strengthen the low-quality
acoustic modality in micro-videos, and also to alleviate the problem of unbalanced training sam-
ples by encoding the category structure information. To justify our model, we constructed
external sound sets with diverse acoustic concepts and released them to facilitate other researchers.
Experimental results on a public benchmark micro-video dataset validate our model.