
CHAPTER 2

Data Collection

In this book, we use three micro-video datasets, corresponding to the tasks of popularity prediction, venue category estimation, and micro-video routing, respectively. In this chapter, we detail them one by one.

2.1 DATASET I FOR POPULARITY PREDICTION

The first micro-video data collection, dubbed Dataset I, was crawled from Vine, one of the most prominent micro-video sharing social networks. We chose Vine because, in addition to the historically uploaded micro-videos, it also archives users' profiles and their social connections.

In particular, we first randomly selected 10 active Vine users from Rankzoo, which provides the top 1,000 active users of Vine, as the seed users. Considering that these seed users may have millions of followers, we retained only the first 1,000 returned followers for each seed user to improve the crawling efficiency. We then adopted a breadth-first strategy to expand our user set by gathering their followers. This was accomplished with the help of the public Vine API.

We terminated the expansion after three layers of crawling, at which point we had harvested a densely connected user set consisting of 98,166 users and 120,324 following relationships among them. For each user, we crawled a brief profile containing the full name, description, location, follower count, followee count, like count, post count, and loop count of all posted videos. In addition, we collected the timeline (the micro-video posting history, including reposts from others) of each user between July 1 and October 1, 2015. In total, we obtained 1.6 million video postings, comprising 303,242 unique micro-videos with a total duration of 499.8 h. Figure 2.1 shows the procedure of the Dataset I collection.
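The crawling procedure described above can be sketched as a breadth-first expansion over follower lists. This is a minimal illustration rather than the crawler actually used: `crawl_followers` and the `get_followers` callable are hypothetical names standing in for whatever API client returns a user's followers, and the sketch applies the 1,000-follower cap to every expanded user, whereas the text states it only for the seed users.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def crawl_followers(
    seeds: Iterable[str],
    get_followers: Callable[[str], List[str]],  # hypothetical API wrapper
    max_layers: int = 3,
    max_followers: int = 1000,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first expansion of a user set via follower lists.

    Returns (users, edges), where an edge (a, b) means "a follows b".
    """
    seed_list = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    users: Set[str] = set(seed_list)
    edges: Set[Tuple[str, str]] = set()
    queue = deque((user, 0) for user in seed_list)

    while queue:
        user, depth = queue.popleft()
        if depth >= max_layers:
            # Users discovered in the last layer are kept but not expanded.
            continue
        # Assumption: keep only the first `max_followers` returned followers.
        for follower in get_followers(user)[:max_followers]:
            edges.add((follower, user))
            if follower not in users:
                users.add(follower)
                queue.append((follower, depth + 1))

    return users, edges
```

Passing the follower lookup as a callable keeps the traversal logic independent of any particular API client, so the same function can be exercised on a small in-memory graph before being pointed at a real service.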

To measure the popularity of micro-videos, we considered four popularity-related indicators, as shown in Figure 2.2: the number of comments (n_comments), the number of likes (n_likes), the number of reposts (n_reposts), and the number of loops/views (n_loops). Figure 2.3 illustrates the proportion of micro-videos regarding each of the four indicators in our dataset; each distribution is different, and each measures one aspect of popularity. In order to comprehensively and precisely measure the popularity of each micro-video, y, we linearly fuse all four indicators as the popularity

Rankzoo: https://rankzoo.com/vine_users

Vine API: https://github.com/davoclavo/vinepy
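The linear fusion of the four popularity indicators mentioned in the text can be sketched as a weighted sum. The exact formula is not visible in this excerpt, so the equal weights used by default here are an assumption, and `popularity_score` and `INDICATORS` are hypothetical names introduced only for illustration.

```python
from typing import Mapping, Optional

# Indicator names as given in the text.
INDICATORS = ("n_comments", "n_likes", "n_reposts", "n_loops")


def popularity_score(
    counts: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Linearly fuse the four indicators into one popularity value."""
    if weights is None:
        # Assumption: equal weights, i.e. a plain average of the indicators.
        weights = {name: 1.0 / len(INDICATORS) for name in INDICATORS}
    return sum(weights[name] * counts[name] for name in INDICATORS)


video = {"n_comments": 2, "n_likes": 10, "n_reposts": 4, "n_loops": 100}
score = popularity_score(video)
```

Because loop counts are typically much larger than the other three indicators, an unweighted average would be dominated by n_loops; a different weighting or a rescaling of the indicators may be preferable depending on the task.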
