Title: I Tube, You Tube, Everybody Tubes
1I Tube, You Tube, Everybody Tubes
Pablo Rodriguez
Telefonica Research, Barcelona
2YouTube Video Example
3Content is NOT king
4Content Explosion
How to search content?
5Aggregation and Recommendation
Infinite Choice, Overwhelming Confusion
Filters required to connect users with content that appeals to their interests
6Video and Social Networks
- Trends in video services
- Users generate new videos
- Users help each other find videos
- Need to understand users and contents
- Video characteristics in YouTube
- User behavior and potential for recommendations
7Particularities of
- "Bite-size bits for high-speed munching" (Wired magazine, March 2007)
- Plethora of YouTube clones
- UGC is very different
- How different?
8UGC vs. Non-UGC
- Massive production scale
- 15 days in YouTube to produce 120 years' worth of movies in IMDb!
- Extreme publishers
- 1000 uploads over a few years vs. 100 movies over 50 years
- Short video length
- 30 sec to 5 min vs. 100 min movies in LoveFilm
- The rest: consumption patterns
9User Participation/Finding Videos
- Despite Web 2.0 features, user participation remains low
- Only 0.16-0.22% of viewers rate or comment on videos
- 47% of videos have pointers from external sites
- But requests from such sites account for less than 3% of the total views
10Goals and Data
Goals
- Potential for recommendation systems?
- Popularity evolution
- Content duplication
Data
- Crawled YouTube and other UGC systems
- Metadata: video ID, length, views
- 1.6M Entertainment, 250K Science videos
11Part 1: Popularity Distribution
- Static popularity characteristics
- Underlying mechanism
12Pareto Principle
- The 10% most popular videos account for 80% of total views
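The Pareto-style split on this slide can be checked on any list of per-video view counts. The sketch below is a minimal illustration, not the method used for the talk: the function name `top_share` and the sample counts are hypothetical and were chosen so the arithmetic is easy to follow.

```python
def top_share(views, top_fraction=0.10):
    """Return the fraction of all views captured by the most popular videos.

    `views` is a list of per-video view counts; `top_fraction` selects how
    much of the catalogue counts as "popular" (10% by default).
    """
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)
    # Always keep at least one video in the "popular" group.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical view counts for ten videos (not taken from the crawl).
sample_views = [8000, 500, 400, 300, 250, 200, 150, 100, 60, 40]
share = top_share(sample_views)
print(f"Top 10% of videos hold {share:.0%} of views")
```

On a real trace the same function can be called with other values of `top_fraction` to see how quickly the share saturates.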
13Dominant Power-Law Behavior
- Richer-get-richer principle
- If a video has K views, then users will watch the video at a rate proportional to K
- Examples: word frequency, citations of papers, scale of earthquakes, web hits
14UGC Video Distribution
- Straight-line waist, truncated at both ends
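A straight line on a log-log rank plot corresponds to a power law, and its slope can be estimated with an ordinary least-squares fit restricted to the waist. The sketch below is a minimal version under that assumption; `loglog_slope` and its rank-range parameters are hypothetical names, and the input is an ideal synthetic Zipf curve rather than crawl data.

```python
import math


def loglog_slope(views, first_rank=1, last_rank=None):
    """Least-squares slope of log(views) against log(rank).

    Only ranks in [first_rank, last_rank] are used, so the truncated head
    and tail can be excluded from the fit.
    """
    ranked = sorted((v for v in views if v > 0), reverse=True)
    last_rank = len(ranked) if last_rank is None else min(last_rank, len(ranked))
    points = [(math.log(r), math.log(ranked[r - 1]))
              for r in range(first_rank, last_rank + 1)]
    if len(points) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Ideal Zipf curve with exponent 1 (synthetic, for illustration only).
ideal = [1000.0 / rank for rank in range(1, 101)]
slope = loglog_slope(ideal, first_rank=10, last_rank=90)
print(f"fitted slope: {slope:.3f}")
```

On measured data the fitted slope depends on where the waist is taken to start and end, so the rank range should be reported alongside the result.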
15Focusing on Popular Videos
- Why popular videos deviate from power-law?
- Fetch-at-most-once (SOSP 2003)
- Behavior of fetching immutable objects once (cf. visiting popular web sites many times)
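The effect of fetch-at-most-once on the most popular items can be seen in a small simulation: every user draws requests from the same Zipf popularity, and in one variant repeat requests by the same user are not counted. This is a simplified sketch, not the model from the SOSP 2003 work; in particular, dropping repeats (instead of redirecting them to another video) and all parameter values are assumptions.

```python
import random


def simulate_requests(num_videos, num_users, requests_per_user,
                      at_most_once, seed=0):
    """Return per-video view counts; index 0 is the most popular video."""
    rng = random.Random(seed)
    # Zipf popularity with exponent 1: weight of rank r is 1/r.
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    video_ids = range(num_videos)
    views = [0] * num_videos
    for _ in range(num_users):
        picks = rng.choices(video_ids, weights=weights, k=requests_per_user)
        if at_most_once:
            # Assumption: a user's repeat requests are simply dropped.
            picks = set(picks)
        for video in picks:
            views[video] += 1
    return views


NUM_USERS = 200
repeat_views = simulate_requests(1000, NUM_USERS, 50, at_most_once=False)
once_views = simulate_requests(1000, NUM_USERS, 50, at_most_once=True)
print("top video, repeats allowed:", repeat_views[0])
print("top video, fetch-at-most-once:", once_views[0])
```

With fetch-at-most-once the top video can never exceed one view per user, which flattens the head of the rank curve while leaving unpopular videos almost unchanged.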
16Why the Unpopular Tail Falls Off
- Natural shape is curved
- Sampling bias or pre-filters
- Publishers tend to upload interesting videos
- Information filtering or post-filters
- Search results or suggestions favor popular items
17Impact of Post-Filters
- Videos exposed longer to the filtering effect appear more truncated
[Plot: x-axis is video rank]
18Is it Naturally Curved?
- Matlab curve fitting for Science
19Is it Naturally Curved?
- Matlab curve fitting for Science
- Zipf is scale-free, while exponential is scaled
- Underlying mechanism is Zipf, and truncation is due to bottlenecks
20Implication of Our Findings
- "Latent demand for products that is suppressed by bottlenecks in the system" (Chris Anderson, The Long Tail)
- 40% additional views! How?
- Personalized recommendation
- Enriched metadata
- Abundant videos
21Part 2: Popularity Evolution
- Relationship between popularity and age
22Popularity Evolution
- So far, we focused on static popularity
- Now focus on popularity dynamics
- How are requests on any given day distributed across video age?
- 6-day daily trace of Science videos
- Step 1: Group videos requested at least once by age
- Step 2: Count request volume per age group
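The two steps above can be sketched as a small aggregation over one day of the trace. The record layout `(age_days, request_count)`, the bucket boundaries, and the sample numbers are illustrative assumptions; the talk does not specify the trace format.

```python
# Age buckets as (label, upper bound in days). Boundaries are assumptions.
AGE_BUCKETS = [
    ("<= 1 day", 1),
    ("<= 1 week", 7),
    ("<= 1 month", 30),
    ("> 1 month", float("inf")),
]


def bucket_for(age_days):
    """Return the label of the first bucket whose upper bound covers the age."""
    for label, upper in AGE_BUCKETS:
        if age_days <= upper:
            return label


def request_volume_by_age(daily_requests):
    """Share of one day's requests that falls into each age bucket."""
    volume = {label: 0 for label, _ in AGE_BUCKETS}
    for age_days, count in daily_requests:
        if count > 0:                                   # Step 1: requested at least once
            volume[bucket_for(age_days)] += count       # Step 2: sum requests per group
    total = sum(volume.values())
    if total == 0:
        return volume
    return {label: n / total for label, n in volume.items()}


# Hypothetical one-day sample: (video age in days, requests that day).
sample_day = [(0.5, 30), (3, 20), (20, 10), (90, 25), (400, 15), (10, 0)]
shares = request_volume_by_age(sample_day)
for label, share in shares.items():
    print(f"{label:>10}: {share:.0%}")
```

Running the same aggregation on each day of a multi-day trace and comparing the shares shows how stable the age profile is over time.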
23Request Volume Across Age
User preference relatively insensitive to age
--> 80% of requests are on videos older than a month
The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively
24Part 4: Content Duplication
- Level of duplication
- Birth of duplicates
25Content Duplication
- Alias: identical or similar copies of the same content
- Aliases dilute popularity of a single event
- Views distributed across multiple copies
- Difficulty in recommendation and ranking systems
- Test with 51 volunteers
- Find alias using keyword search
- Identified 1,224 aliases for 184 original videos
26The Level of Popularity Dilution
- Popularity diluted by up to a few orders of magnitude
- Often aliases got more requests than the original
- (e.g., an alias got >1000 times more requests)
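Dilution can be quantified per original video once its aliases have been identified. The sketch below assumes view counts are available for the original and for each alias; the function name `dilution`, the two metrics, and the example numbers are hypothetical and do not come from the volunteer study.

```python
def dilution(original_views, alias_views):
    """Summarise how aliases dilute the popularity of one original video.

    Returns the original's share of all views for the content and the ratio
    between the most-viewed alias and the original.
    """
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    if not alias_views:
        raise ValueError("alias_views must be non-empty")
    total = original_views + sum(alias_views)
    return {
        "original_share": original_views / total,
        "max_alias_ratio": max(alias_views) / original_views,
    }


# Hypothetical example: one alias far outgrows the original upload.
example = dilution(1_000, [1_200_000, 300, 50])
print(f"original holds {example['original_share']:.3%} of views")
print(f"largest alias has {example['max_alias_ratio']:.0f}x the original's views")
```

A ranking system that scores each copy separately would see only the original's small share, which is why summing views across aliases matters before ranking.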
27How Late Do Aliases Appear?
- A significant number of aliases appear within one week
- Within the first day of posting the original
video, sometimes you get more than 80 aliases
28Conclusions
- UGC is a new form of video social interaction
- User interaction remains low
- Lots of potential for social recommendations
29Questions?
- Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html