CHAPTER 2
Data Collection
In this book, we use three micro-video datasets corresponding to the tasks of popularity prediction, venue category estimation, and micro-video routing, respectively. In this chapter, we detail them one by one.
2.1 DATASET I FOR POPULARITY PREDICTION
The first micro-video data collection, dubbed Dataset I, was crawled from Vine, one of the most prominent micro-video sharing social networks. We chose Vine because, in addition to the historical uploaded micro-videos, it also archives users’ profiles and their social connections.
In particular, we first randomly selected 10 active Vine users from Rankzoo,¹ which provides the top 1,000 active users of Vine, as the seed users. Considering that these seed users may have millions of followers, we retained only the first 1,000 returned followers for each seed user to improve the crawling efficiency. We then adopted the breadth-first strategy to expand our user set by gathering their followers. This was accomplished with the help of the public Vine API.² We terminated the expansion after three layers, by which point we had
harvested a densely connected user set consisting of 98,166 users as well as 120,324 following
relationships among users. For each user, his/her brief profile was crawled, containing full name,
description, location, follower count, followee count, like count, post count, and loop count of
all posted videos. We also collected the timeline (the micro-video posting history, including reposts from others) of each user between July 1 and October 1, 2015. In total, we obtained 1.6 million video postings, covering 303,242 unique micro-videos with a total duration of 499.8 h. Figure 2.1 shows the procedure of the Dataset I collection.
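The breadth-first expansion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: `crawl_followers` and its `fetch_followers` argument are hypothetical names, and `fetch_followers` stands in for whatever call returns a user's follower list (the actual Vine API calls are not shown in the text). The sketch also assumes that the 1,000-follower cap is applied to every expanded user, whereas the text states it only for the seed users, and that "three layers" means expanding users at depths 0, 1, and 2 from the seeds.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def crawl_followers(
    seeds: Iterable[str],
    fetch_followers: Callable[[str], List[str]],  # hypothetical stand-in for an API call
    max_layers: int = 3,
    max_followers: int = 1000,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first expansion of a user set through follower lists.

    Returns the collected users and the (follower, followee) relationships.
    """
    seed_list = list(seeds)
    users: Set[str] = set(seed_list)
    edges: Set[Tuple[str, str]] = set()
    frontier = deque((user, 0) for user in seed_list)

    while frontier:
        user, depth = frontier.popleft()
        if depth >= max_layers:
            # Users in the last layer are kept but not expanded further.
            continue
        # Assumption: the cap applies to every expanded user, not only seeds.
        for follower in fetch_followers(user)[:max_followers]:
            edges.add((follower, user))
            if follower not in users:
                users.add(follower)
                frontier.append((follower, depth + 1))

    return users, edges
```

Passing the follower lookup as a function keeps the traversal logic separate from the data source, so the same routine can be exercised on a small in-memory graph before it is pointed at a live service.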
To measure the popularity of micro-videos, we considered four popularity-related indicators, as shown in Figure 2.2: the number of comments (n_comments), the number of likes (n_likes), the number of reposts (n_reposts), and the number of loops/views (n_loops). Figure 2.3 illustrates the proportion of micro-videos regarding each of the four indicators in our dataset; the distributions differ from one another, and each measures one aspect of popularity. In order to comprehensively and precisely measure the popularity of each micro-video, y_i, we linearly fuse all four indicators as the popularity
¹ https://rankzoo.com/vine_users
² https://github.com/davoclavo/vinepy