138 6. MULTIMODAL SEQUENTIAL LEARNING
Table 6.2: Component-wise validation of our proposed ALPINE model over Datasets III-1 and
III-2, disabling one component at a time. Statistical significance over AUC among all
baselines is determined by a t-test (4 denotes p-value < 0.01 and Þ denotes p-value < 0.05).
Methods                  Dataset III-1                  Dataset III-2
                         AUC    P@50   R@50   F@50     AUC    P@50   R@50   F@50
ALPINE_u                 0.737  0.330  0.435  0.375    0.702  0.294  0.454  0.356
ALPINE_m                 0.735  0.329  0.433  0.374    -      -      -      -
ALPINE_um                0.734  0.327  0.432  0.372    -      -      -      -
ALPINE_umg / ALPINE_ug   0.716  0.318  0.426  0.363    0.654  0.291  0.219  0.250
ALPINE                   0.739  0.331  0.436  0.376    0.713  0.300  0.460  0.362
ALPINE surpasses ALPINE_m, indicating that incorporating the user matrix layer
is beneficial for strengthening the interest representation. Moreover, compared with
ALPINE_u, the performance of ALPINE_um consistently drops by 0.3% under all four
metrics, which further reflects the effectiveness of our multi-level interest modeling layer. It is
worth mentioning that Dataset III-2 only contains “click” and “not click” interactions;
therefore, the corresponding results are vacant.
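The 0.3% drop quoted above can be checked directly against the Dataset III-1 rows of Table 6.2. The short sketch below does this arithmetic; it assumes the drop is meant as an absolute difference in percentage points, and the helper name `absolute_drop` is introduced here for illustration only.

```python
# Dataset III-1 rows copied from Table 6.2: (AUC, P@50, R@50, F@50).
METRICS = ("AUC", "P@50", "R@50", "F@50")
alpine_u = (0.737, 0.330, 0.435, 0.375)
alpine_um = (0.734, 0.327, 0.432, 0.372)


def absolute_drop(reference, variant):
    """Per-metric drop in percentage points (scores are on a 0-1 scale)."""
    return {
        name: round((ref - var) * 100, 1)
        for name, ref, var in zip(METRICS, reference, variant)
    }


# Drop of ALPINE_um relative to ALPINE_u on each metric.
drops = absolute_drop(alpine_u, alpine_um)
print(drops)
```

Under this reading, each of the four metrics falls by the same 0.3 points.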
ALPINE_um shows consistent improvements over ALPINE_umg on Dataset
III-1 and ALPINE_ug on Dataset III-2. Specifically, the improvements of ALPINE_um
over these models in terms of AUC are 2.3% on Dataset III-1 and 5.9% on Dataset III-2,
demonstrating the great advantage of our novel temporal graph-based LSTM network in
capturing both dynamic and diverse interests.
6.5.5 JUSTIFICATION OF THE TEMPORAL GRAPH
Apart from achieving superior performance, the key advantage of ALPINE over other
methods is that its temporal graph structure can strengthen the interest representation. To
this end, we carried out experiments over the two datasets to verify the influence of the neighbor
size L of the temporal graph.
In this experiment, we selected the top L similar micro-videos from the graph as neighbors
of the given micro-video rather than only the top one. Specifically, we set the averages
of the top L similar micro-videos’ hidden states and memory cells as h and c in Eq. (6.2),
respectively. The comparison results vs. the neighbor size L are illustrated in Figure 6.6. We
found that the performance consistently drops under different evaluation metrics as L
increases, with AUC dropping most significantly. This may be because much more noise
is introduced when a micro-video is connected with many others. Therefore, in this chapter, we
set L to one.
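The neighbor averaging described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes similarity scores between the given micro-video and its earlier candidates have already been computed, and the function names (`top_l_neighbors`, `mean_vector`, `neighbor_state`) and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def top_l_neighbors(similarities: Sequence[float], l: int) -> List[int]:
    """Return indices of the l most similar candidates, most similar first."""
    if l < 1 or not similarities:
        raise ValueError("need l >= 1 and at least one candidate")
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    return order[:l]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def neighbor_state(similarities: Sequence[float],
                   hidden_states: Sequence[Vector],
                   memory_cells: Sequence[Vector],
                   l: int) -> Tuple[Vector, Vector]:
    """Average the hidden states and memory cells of the top-l neighbors."""
    chosen = top_l_neighbors(similarities, l)
    h = mean_vector([hidden_states[i] for i in chosen])
    c = mean_vector([memory_cells[i] for i in chosen])
    return h, c


# Toy data (assumed values): three earlier micro-videos with 2-d states.
similarities = [0.2, 0.9, 0.5]
hidden_states = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
memory_cells = [[0.0, 0.0], [1.0, 1.0], [3.0, 5.0]]

h_one, c_one = neighbor_state(similarities, hidden_states, memory_cells, l=1)
h_two, c_two = neighbor_state(similarities, hidden_states, memory_cells, l=2)
```

With l=1 the result is simply the state of the single most similar micro-video, which is the setting adopted in this chapter; larger l blends in less similar neighbors, which is one way the extra noise mentioned above could enter.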