Multimodal Learning toward Micro-Video Understanding

Liqiang Nie, Shandong University, Jinan, China

Meng Liu, Shandong University, Jinan, China

Xuemeng Song, Shandong University, Jinan, China

Series Editor: Alan C. Bovik, University of Texas, Austin

Micro-videos, a new form of user-generated content, have been spreading widely across various social platforms, such as Vine, Kuaishou, and TikTok. Different from traditional long videos, micro-videos are usually recorded by smart mobile devices at any place within a few seconds. Due to their brevity and low bandwidth cost, micro-videos are gaining increasing user enthusiasm. The blossoming of micro-videos opens the door to the possibility of many promising applications, ranging from network content caching to online advertising. Thus, it is highly desirable to develop an effective scheme for high-order micro-video understanding.

Micro-video understanding is, however, non-trivial due to the following challenges: (1) how to represent micro-videos that only convey one or a few high-level themes or concepts; (2) how to utilize the hierarchical structure of venue categories to guide micro-video analysis; (3) how to alleviate the influence of low quality caused by complex surrounding environments and camera shake; (4) how to model multimodal sequential data, i.e., textual, acoustic, visual, and social modalities, to enhance micro-video understanding; and (5) how to construct large-scale benchmark datasets for analysis. These challenges have been largely unexplored to date.

In this book, we focus on addressing the challenges presented above by proposing some state-of-the-art multimodal learning theories. To demonstrate the effectiveness of these models, we apply them to three practical tasks of micro-video understanding: popularity prediction, venue category estimation, and micro-video routing. Particularly, we first build three large-scale real-world micro-video datasets for these practical tasks. We then present a multimodal transductive learning framework for micro-video popularity prediction. Furthermore, we introduce several multimodal cooperative learning approaches and a multimodal transfer learning scheme for micro-video venue category estimation. Meanwhile, we develop a multimodal sequential learning approach for micro-video recommendation. Finally, we conclude the book and identify future research directions in multimodal learning toward micro-video understanding.

store.morganclaypool.com

About SYNTHESIS

This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis books provide concise, original presentations of important research and development topics, published quickly, in digital and print formats.

Synthesis Lectures on Image, Video, and Multimedia Processing

Series ISSN: 1559-8136


Synthesis Lectures on Image, Video, and Multimedia Processing

Editor: Alan C. Bovik, University of Texas, Austin

The Lectures on Image, Video and Multimedia Processing are intended to provide a unique and groundbreaking forum for the world's experts in the field to express their knowledge in unique and effective ways. It is our intention that the Series will contain Lectures of basic, intermediate, and advanced material depending on the topical matter and the authors' level of discourse. It is also intended that these Lectures depart from the usual dry textbook format and instead give the author the opportunity to speak more directly to the reader, and to unfold the subject matter from a more personal point of view. The success of this candid approach to technical writing will rest on our selection of exceptionally distinguished authors, who have been chosen for their noteworthy leadership in developing new ideas in image, video, and multimedia processing research, development, and education.

In terms of the subject matter for the series, there are few limitations that we will impose other than that the Lectures be related to aspects of the imaging sciences that are relevant to furthering our understanding of the processes by which images, videos, and multimedia signals are formed, processed for various tasks, and perceived by human viewers. These categories are naturally quite broad, for two reasons: First, measuring, processing, and understanding perceptual signals involves broad categories of scientific inquiry, including optics, surface physics, visual psychophysics and neurophysiology, information theory, computer graphics, display and printing technology, artificial intelligence, neural networks, harmonic analysis, and so on. Second, the domain of application of these methods is limited only by the number of branches of science, engineering, and industry that utilize audio, visual, and other perceptual signals to convey information. We anticipate that the Lectures in this series will dramatically influence future thought on these subjects as the Twenty-First Century unfolds.

Multimodal Learning toward Micro-Video Understanding

Liqiang Nie, Meng Liu, and Xuemeng Song

2019

Virtual Reality and Virtual Environments in 10 Lectures

Stanislav Stanković

2015
