Bibliography
[1] S. Akaho. A kernel method for canonical correlation analysis. IMPS, 40(2):263–269,
2006. 15, 26, 93
[2] K. Ashraf, B. Elizalde, F. Iandola, M. Moskewicz, J. Bernd, G. Friedland, and K. Keutzer.
Audio-based multimedia event detection with DNNs and sparse sampling. In ACM ICMR,
pages 611–614, 2015.
DOI: 10.1145/2671188.2749396 110
[3] F. R. Bach. Consistency of the group lasso and multiple kernel learning. In JMLR, vol. 9,
pages 1179–1225, 2008. 65, 77
[4] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality,
and the SMO algorithm. In ICML, pages 6–13, 2004. DOI: 10.1145/1015330.1015424
16, 26
[5] Y. Bae and H. Lee. Sentiment analysis of twitter audiences: Measuring the positive or
negative influence of popular twitterers. Journal of the American Society for Information
Science and Technology, 63(12):2521–2535, 2012. DOI: 10.1002/asi.22768 24
[6] S. Bahrampour, N. M. Nasrabadi, A. Ray, and W. K. Jenkins. Multimodal
task-driven dictionary learning for image classification. IEEE Transactions on Image Processing,
25(1):24–38, 2016. DOI: 10.1109/tip.2015.2496275 62, 82, 118
[7] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and
M. Aly. Video suggestion and discovery for youtube: Taking random walks through the
view graph. In Proc. of the ACM International Conference on World Wide Web,
pages 895–904, 2008. DOI: 10.1145/1367497.1367618 125, 126
[8] J. Berger. Arousal increases social transmission of information. Psychological Science,
22(7):891–893, 2011. DOI: 10.1177/0956797611413294 23
[9] J. Berger and K. L. Milkman. What makes online content viral? Journal of Marketing
Research, 49(2):192–205, 2012. DOI: 10.2139/ssrn.1528077 23
[10] S. Bhattacharya, B. Nojavanasghari, T. Chen, D. Liu, S.-F. Chang, and M. Shah. Towards
a comprehensive computational model for aesthetic assessment of videos. In Proc. of
the ACM Multimedia Conference, pages 361–364, 2013. DOI: 10.1145/2502081.2508119
22
[11] D. Borth, R. Ji, T. Chen, T. M. Breuel, and S. Chang. Large-scale visual sentiment
ontology and detectors using adjective noun pairs. In Proc. of the ACM Multimedia
Conference, pages 223–232, 2013. DOI: 10.1145/2502081.2502282 22
[12] S. Burger, Q. Jin, P. F. Schulam, and F. Metze. Noisemes: Manual annotation of
environmental noise in audio streams. Technical Report CMU-LTI-12-07, pages 1–5, 2012.
15, 111
[13] G. Buscher, E. Cutrell, and M. R. Morris. What do you see when you’re surfing?:
Using eye tracking to predict salient regions of web pages. In Proc. of the SIGCHI
Conference on Human Factors in Computing Systems, pages 21–30, ACM, 2009. DOI:
10.1145/1518701.1518705 145
[14] J.-F. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for
matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010. DOI:
10.1137/080738970 45
[15] S. Cao and N. Snavely. Graph-based discriminative learning for location recognition. In
IEEE CVPR, pages 700–707, 2013. DOI: 10.1109/cvpr.2013.96 60, 61
[16] S. Cappallo, T. Mensink, and C. G. Snoek. Latent factors of visual popularity prediction.
In Proc. of International Conference on Multimedia Retrieval, pages 195–202, 2015. DOI:
10.1145/2671188.2749405 24
[17] D. Castan and M. Akbacak. Segmental-GMM approach based on acoustic concept
segmentation. In SLAM@ INTERSPEECH, pages 15–19, 2013. 110
[18] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. I tube, you tube,
everybody tubes: Analyzing the world’s largest user generated content video system. In
Proc. of ACM SIGCOMM Conference on Internet Measurement, pages 1–14, 2007. DOI:
10.1145/1298306.1298309 24
[19] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via
canonical correlation analysis. In Proc. of the International Conference on Machine Learning,
pages 129–136, ACM, 2009. DOI: 10.1145/1553374.1553391 26
[20] S. Chaudhuri and B. Raj. Unsupervised structure discovery for semantic analysis of audio.
In NIPS, pages 1178–1186, 2012. 110
[21] B.-C. Chen, Y.-Y. Chen, F. Chen, and D. Joshi. Business-aware visual concept discovery
from social media for multimodal business venue recognition. In AAAI, pages 61–68,
2016. 61
[22] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang. Low-rank matrix recovery with structural
incoherence for robust face recognition. In Proc. of IEEE Conference on Computer Vision
and Pattern Recognition, pages 2618–2625, 2012. DOI: 10.1109/cvpr.2012.6247981 26
[23] D. M. Chen, G. Baatz, K. Köser, S. S. Tsai, R. Vedantham, T. Pylvä, K. Roimela,
X. Chen, J. Bach, M. Pollefeys, et al. City-scale landmark identification on mobile
devices. In IEEE CVPR, pages 737–744, 2011. DOI: 10.1109/cvpr.2011.5995610 60, 61
[24] J. Chen, X. Song, L. Nie, X. Wang, H. Zhang, and T.-S. Chua. Micro tells macro:
Predicting the popularity of micro-videos via a transductive model. In Proc. of ACM
International Conference on Multimedia, pages 898–907, 2016. DOI: 10.1145/2964284.2964314
20, 24, 54
[25] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua. Attentive collaborative
filtering: Multimedia recommendation with item- and component-level attention. In Proc.
of the International ACM SIGIR Conference on Research and Development in Information
Retrieval, pages 335–344, 2017. DOI: 10.1145/3077136.3080797 125, 126
[26] J. Chen, J. Zhou, and J. Ye. Integrating low-rank and group-sparse structures for robust
multi-task learning. In ACM KDD, pages 42–50, 2011. DOI: 10.1145/2020408.2020423
69
[27] N. Chen, J. Zhu, and E. P. Xing. Predictive subspace learning for multi-view data: A
large margin approach. In NIPS, pages 361–369, 2010. 61, 63
[28] X. Chen, D. Liu, Z.-J. Zha, W. Zhou, Z. Xiong, and Y. Li. Temporal hierarchical
attention at category- and item-level for micro-video click-through prediction. In
Proc. of the ACM International Conference on Multimedia, pages 1146–1153, 2018. DOI:
10.1145/3240508.3240617 16, 126, 127, 134
[29] J. Choi, G. Friedland, V. Ekambaram, and K. Ramchandran. Multimodal location
estimation of consumer media: Dealing with sparse training data. In IEEE ICME,
pages 43–48, 2012. DOI: 10.1109/icme.2012.141 61
[30] W. Chong, D. Blei, and F.-F. Li. Simultaneous image classification and annotation. In
IEEE Conference on Computer Vision and Pattern Recognition, pages 1903–1910, 2009.
DOI: 10.1109/cvprw.2009.5206800 74
[31] C. M. Christoudias, R. Urtasun, A. Kapoor, and T. Darrell. Co-training
with noisy perceptual observations. In CVPR, pages 2844–2851, 2009. DOI:
10.1109/cvpr.2009.5206572 25
[32] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. In
Proc. of ACM Symposium on Theory of Computing, pages 1–6, 1987. DOI:
10.1016/s0747-7171(08)80013-2 56
[33] D. J. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world’s
photos. In ACM WWW, pages 761–770, 2009. DOI: 10.1145/1526709.1526812 61
[34] P. Cui, Z. Wang, and Z. Su. What videos are similar with you?: Learning a common
attributed representation for video recommendation. In ACM MM, pages 597–606, 2014.
DOI: 10.1145/2647868.2654946 125, 126
[35] A. Culotta, N. K. Ravi, and J. Cutler. Predicting the demographics of twitter users from
website traffic data. In National Conference of the American Association for Artificial
Intelligence, pages 72–78, 2015. 82, 118
[36] S. Dhar, V. Ordonez, and T. L. Berg. High level describable attributes for predicting
aesthetics and interestingness. In Proc. of the IEEE Conference on Computer Vision and
Pattern Recognition, pages 1657–1664, 2011. DOI: 10.1109/cvpr.2011.5995467 22
[37] T. Diethe, D. R. Hardoon, and J. Shawe-Taylor. Multiview Fisher discriminant analysis.
In NIPS, pages 1–8, 2008. 16, 26
[38] W. Ding, Y. Shang, L. Guo, X. Hu, R. Yan, and T. He. Video popularity prediction
by sentiment propagation via implicit network. In Proc. of ACM International on
Conference on Information and Knowledge Management, pages 1621–1630, 2015. DOI:
10.1145/2806416.2806505 24
[39] Z. Ding and Y. Fu. Low-rank common subspace for multi-view learning. In
Proc. of IEEE International Conference on Data Mining, pages 110–119, 2014. DOI:
10.1109/icdm.2014.29 27
[40] Z. Ding, M. Shao, and Y. Fu. Latent low-rank transfer subspace learning for missing
modality recognition. In Proc. of AAAI Conference on Artificial Intelligence,
pages 1192–1198, 2014. 26
[41] Z. Ding and Y. Fu. Robust multi-view subspace learning through dual low-rank
decompositions. In Proc. of AAAI Conference on Artificial Intelligence, pages 1181–1187, 2016.
27
[42] S. K. D’Mello and J. Kory. A review and meta-analysis of multimodal affect detection
systems. ACM Computing Surveys, 47(3):1–36, 2015. DOI: 10.1145/2682899 25
[43] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A
deep convolutional activation feature for generic visual recognition. In ICML,
pages 647–655, 2014. 118
[44] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep
convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence,
38(2):295–307, 2016. DOI: 10.1109/tpami.2015.2439281 85
[45] L. Duan, D. Xu, I. W. Tsang, and J. Luo. Visual event recognition in videos by learning
from web data. TPAMI, 34(9):1667–1680, 2012. DOI: 10.1109/cvpr.2010.5539870 26
[46] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over
learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
DOI: 10.1109/tip.2006.881969 62
[47] F. Feng, L. Nie, X. Wang, R. Hong, and T. S. Chua. Computational social
indicators: A case study of Chinese university ranking. In SIGIR, pages 455–464, 2017. DOI:
10.1145/3077136.3080773 92
[48] A. Ferracani, D. Pezzatini, M. Bertini, and A. Del Bimbo. Item-based video
recommendation: An hybrid approach considering human factors. In Proc. of the
ACM on International Conference on Multimedia Retrieval, pages 351–354, 2016. DOI:
10.1145/2911996.2912066 15, 125, 126
[49] G. Friedland, J. Choi, H. Lei, and A. Janin. Multimodal location estimation on flickr
videos. In Proc. of the ACM SIGMM International Workshop on Social Media, pages 23–28,
2011. DOI: 10.1145/2072609.2072619 4, 61
[50] G. Friedland, O. Vinyals, and T. Darrell. Multimodal location estimation. In ACM MM,
pages 1245–1252, 2010. DOI: 10.1145/1873951.1874197 61
[51] H. Gao, F. Nie, X. Li, and H. Huang. Multi-view subspace clustering. In
IEEE International Conference on Computer Vision, pages 4238–4246, 2015. DOI:
10.1109/iccv.2015.482 26
[52] J. Gao, T. Zhang, and C. Xu. A unified personalized video recommendation via dynamic
recurrent neural networks. In Proc. of the ACM International Conference on Multimedia,
pages 127–135, 2017. DOI: 10.1145/3123266.3123433 126, 127
[53] Y. Gao, T. Zhang, and J. Xiao. Thematic video thumbnail selection. In 16th
IEEE International Conference on Image Processing (ICIP), pages 4333–4336, 2009. DOI:
10.1109/icip.2009.5419128 145
[54] F. Gelli, T. Uricchio, M. Bertini, A. Del Bimbo, and S.-F. Chang. Image popularity
prediction in social media using sentiment and context features. In Proc. of ACM
International Conference on Multimedia, pages 907–910, 2015. DOI: 10.1145/2733373.2806361
22, 24, 42