Diversity of User Activity and Content Quality in Online Communities


Title: Diversity of User Activity and Content Quality in Online Communities


1
Diversity of User Activity and Content Quality
in Online Communities
  • Tad Hogg and Gabor Szabo
  • HP Labs

thanks to C. Chan and J. Kittiyachavalit (Essembly),
and M. Brzozowski and D. Wilkinson (HP)
2
online communities
wisdom of crowds
3
Why model online communities?
  • predict
  • e.g., which new content will become popular?
  • design web sites
  • e.g., what to show users?
  • encourage high-quality contributions
  • e.g., what incentives?

4
heterogeneity is pervasive
  • most activity from a few top users
  • most interest in small fraction of content
  • broad, long-tail distributions

typical << average << maximum
5
topics
  • case study: Essembly
  • user activity
  • content ratings

6
What is Essembly?
  • political discussion web site
  • help people identify others with similar views
  • self-organize for political activity

7
Essembly resolves
  • users create resolves
  • e.g., free trade is good for American workers
  • other users vote and comment
  • 4-point scale
  • agree, lean agree, lean against, against

8
Why study Essembly?
  • voting history since start of site
  • modest-sized community
  • can examine all users and content
  • useful to study diversity
  • distinct link semantics
  • friend, ally, nemesis
  • similar diversity as other communities
  • Digg, Wikipedia, ...

9
data set
  • Aug. 2005 to Dec. 2006
  • 15,424 users
  • 24,953 resolves
  • 1.3 million votes
  • networks
  • comments

50 new resolves per day
10
data limitations
  • anonymous
  • no user characteristics
  • e.g., demographics, political party, ...
  • no content of resolves or comments
  • e.g., political topic area
  • environment, economics, foreign aid, ...
  • hence
  • can't test if characteristics explain diversity
  • user privacy vs. research usefulness
  • no info on
  • which resolves users view (but don't vote on)
  • how users find resolves (e.g., via networks)

11
topics
  • case study: Essembly
  • user activity
  • content ratings

12
user activity
4741 active users with at least one action
actions: create a resolve, vote on a resolve,
form a link
13
user model
inactive: no activity for at least 30 days
(conventional, but somewhat arbitrary, definition)
  • how long user is active
  • how often user contributes while active

correlation between activity time and rate: -0.07
model as independent components of user behavior
caveat: users active only a short time have a
larger (negative) correlation: -0.2
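
A minimal sketch of the two components and their correlation. The definitions used here (activity time as last action minus first action, rate as actions per active day) and all function names are assumptions for illustration; the slides do not spell them out.

```python
import math

def activity_time_and_rate(action_days):
    """Return (activity time in days, actions per day) for one user.

    action_days: days (floats) on which the user acted, in any order.
    Activity time is taken as last action minus first action, which is
    an assumption made for this sketch.
    """
    first, last = min(action_days), max(action_days)
    active_time = last - first
    # Users with a single action have zero activity time; no rate defined.
    rate = len(action_days) / active_time if active_time > 0 else float("nan")
    return active_time, rate

def is_inactive(last_action_day, data_end_day, threshold_days=30):
    """A user counts as inactive after threshold_days without any action."""
    return data_end_day - last_action_day >= threshold_days

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applying `pearson` to the per-user activity times and rates would give the kind of coefficient quoted above; a value near zero is what motivates treating the two as independent.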
14
user model
this model: considers whether user votes on a
resolve, not how user voted (agree, ..., disagree)
or comments
note: how users vote correlates with link type
(friend, ally, nemesis): M. Brzozowski et al.,
"Friends and Foes: Ideological Social
Networking", Proc. of CHI 2008
15
user activity: model components
  • activity time
  • activity rate

16
activity time distribution: stretched exponential
for users active at least 1 day
  • diverse time scales for user participation
  • users active a long time less likely to quit in
    next day than new users
  • applies to many online communities (Wilkinson
    2008)
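
A minimal sketch of why a stretched exponential implies that long-time users are less likely to quit the next day. It assumes the standard form S(t) = exp(-(t/tau)^beta) for the probability of still being active after t days; the values of tau and beta are hypothetical, not fitted to Essembly.

```python
import math

def survival(t, tau, beta):
    """Stretched-exponential survival: P(user still active after t days)."""
    return math.exp(-((t / tau) ** beta))

def quit_next_day(t, tau, beta):
    """P(user quits between day t and day t + 1 | still active at day t)."""
    return 1.0 - survival(t + 1, tau, beta) / survival(t, tau, beta)

TAU, BETA = 10.0, 0.5   # hypothetical values; BETA < 1 stretches the tail

for t in (1, 10, 100):
    print(t, round(quit_next_day(t, TAU, BETA), 3))
```

With beta below 1 the printed quit probability falls as t grows; with beta equal to 1 (a plain exponential) it would be the same at every age.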

17
user activity: model components
  • activity time
  • activity rate

18
activity rate distribution: lognormal
actions: create a resolve, vote on a resolve,
form a link
19
user activity
  • activity time
  • activity rate
  • combined model

20
user activity distribution
  • product: (activity time) x (activity rate)

model captures diversity of action counts, but
not bursts of activity (sessions of 3
hours with longer breaks)
4741 active users with at least one action
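
A minimal sketch of the combined model, drawing activity time and activity rate independently and multiplying them. It relies on the fact that a Weibull distribution with shape below 1 has a stretched-exponential survival function; all parameter values are placeholders rather than fitted values.

```python
import random

def sample_action_count(rng, tau=10.0, beta=0.5, mu=0.0, sigma=1.0):
    """Draw one user's total actions as (activity time) x (activity rate).

    All four parameter values are placeholders, not fitted values.
    """
    # Weibull with shape beta < 1 has survival exp(-(t / tau) ** beta),
    # i.e. a stretched exponential.
    active_days = rng.weibullvariate(tau, beta)
    # Lognormal activity rate, in actions per day.
    rate = rng.lognormvariate(mu, sigma)
    return active_days * rate

rng = random.Random(0)
counts = sorted(sample_action_count(rng) for _ in range(10_000))
median = counts[len(counts) // 2]
mean = sum(counts) / len(counts)
print(f"median={median:.1f} mean={mean:.1f} max={counts[-1]:.1f}")
```

The sample median falls well below the mean, which in turn falls well below the maximum, the ordering expected from a broad, long-tailed distribution.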
21
What determines user activity?
  • diversity from two underlying broad
    distributions
  • activity time (stretched exponential)
  • multiple time scales for losing interest in site
  • activity rate (lognormal)
  • multiplicative process leading to activity rate
    heterogeneity
  • open question
  • What user characteristics and community
    properties produce these distributions?
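
A minimal sketch of how a multiplicative process can produce a lognormal distribution. The number of factors and the uniform factor distribution are hypothetical choices; the only point is that the log of a product is a sum, so the central limit theorem applies to the logs.

```python
import math
import random
import statistics

def multiplicative_outcome(rng, n_factors=50):
    """Product of n_factors independent positive random factors."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(0.5, 1.5)   # hypothetical factor distribution
    return value

rng = random.Random(1)
values = [multiplicative_outcome(rng) for _ in range(5_000)]
logs = [math.log(v) for v in values]

# The log of a product is a sum of logs, so the logs are approximately
# normal and the values themselves approximately lognormal.
print("mean of values:  ", statistics.mean(values))
print("median of values:", statistics.median(values))
print("mean, stdev of log(values):", statistics.mean(logs), statistics.stdev(logs))
```

The mean of the values sits far above their median, the right skew typical of a lognormal, while the log-values are roughly symmetric.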

22
activity time: prior interest or experience?
nature
23
How to encourage participation?
  • nature
  • attract users whose interests fit the community
  • expose potential users to site, word of mouth, ...
  • nurture
  • improve rewards of use to keep people engaged
  • top contributor status, niche subgroups, ...

24
topics
  • case study: Essembly
  • user activity
  • content ratings

25
votes on resolves
24953 resolves
similar broad distribution in other online
communities: Digg, Wikipedia, ... (Wilkinson 2008)
26
vote model
  • visibility
  • how easily users find a resolve
  • interestingness
  • probability users who see a resolve vote on it

similar model for Digg (Lerman 2007)
27
content ratings: model components
  • visibility
  • interest

28
visibility: how users find content
  • browse
  • e.g., recent or popular
  • in general and within online network
  • word of mouth
  • from people aware of, and liking, the content
  • e.g., link on a blog
  • search

29
visibility distribution: power-law
  • recency is key factor for visibility in Essembly
  • contrast with controversy (standard deviation of
    votes): not correlated with number of votes

age (number of subsequently introduced resolves)
30
content ratings: model components
  • visibility
  • interest

31
interestingness: how much users like what they see
  • persistent property of resolves
  • resolves consistently get few or many votes
    compared to average at similar age
  • may have time dependence
  • novelty decay (Wu & Huberman 2007)
  • e.g., current news stories (Digg)
  • vs. ideological discussions (e.g., free
    trade)

32
model parameter estimation
  • model
  • visibility based on recency
  • next vote goes to resolve x with relative
    probability r_x f(a_x)
  • r_x is the resolve's interestingness
  • a_x is the resolve's age
  • number of subsequently introduced resolves
  • simultaneously estimate
  • aging visibility function f(a)
  • interestingness for resolves r_1, r_2, ...
  • arbitrary scale factor for f and r
  • we take f(1) = 1
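
A minimal sketch of the vote-allocation rule, in which the next vote goes to resolve x with relative probability r_x f(a_x). The power-law form of f, the exponent, the static set of resolves, and the offset that gives the newest resolve age 1 are all assumptions for illustration; the slides estimate f(a) from data rather than assuming a form.

```python
import random

def aging(a, gamma=1.0):
    """Hypothetical aging function with f(1) = 1; a power law is assumed."""
    return a ** (-gamma)

def next_vote(rng, interest):
    """Pick the index of the resolve that receives the next vote.

    interest[x] is r_x, with resolves listed oldest first. The age of
    resolve x is the number of resolves introduced after it, plus one
    so that the newest resolve has age 1 (an assumption of this sketch).
    """
    n = len(interest)
    weights = [interest[x] * aging(n - x) for x in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]

rng = random.Random(2)
interest = [0.65, 0.01, 0.65, 0.01]   # illustrative r values, oldest first
votes = [0] * len(interest)
for _ in range(10_000):
    votes[next_vote(rng, interest)] += 1
print(votes)
```

Among resolves with equal interestingness the newer one collects more votes, and a low-interest resolve collects few votes even when it is the newest.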

33
interestingness distribution: lognormal
normal distribution fit to log(r) values
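
A minimal sketch of fitting a normal distribution to log(r) values by taking the sample mean and standard deviation of the logs. The r values below are hypothetical, not estimates from the Essembly data.

```python
import math
import statistics

def fit_lognormal(values):
    """Return (mu, sigma) of a normal distribution fit to log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)

# Hypothetical interestingness estimates, not values from the data set.
r_values = [0.01, 0.03, 0.05, 0.1, 0.2, 0.65]
mu, sigma = fit_lognormal(r_values)
print(f"mu={mu:.2f} sigma={sigma:.2f} median r ~ {math.exp(mu):.3f}")
```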
34
growth in number of votes for high and low
interestingness
two examples (log scale): r = 0.65 and r = 0.01
age (number of subsequently introduced resolves)
35
content ratings
  • visibility
  • interest
  • combined model

36
vote distribution
  • sample at different ages from a multiplicative
    process: double Pareto lognormal distribution

Reed & Jorgensen 2004
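
A minimal sketch of sampling a multiplicative process at different ages. It assumes exponentially distributed observation ages and lognormal growth steps, and every parameter value is a placeholder. The excess kurtosis of the log-values is used as a rough check that mixing over ages gives heavier tails than a single lognormal, whose log-values would have excess kurtosis near zero.

```python
import math
import random

def sample_size(rng, growth=0.0, volatility=0.3, mean_age=20.0):
    """Value of a multiplicative process observed at a random age.

    Each step multiplies the value by exp(normal step); the age at
    which the process is observed is exponentially distributed.
    """
    age = rng.expovariate(1.0 / mean_age)
    log_value = 0.0
    for _ in range(int(age)):
        log_value += rng.gauss(growth, volatility)
    return math.exp(log_value)

def excess_kurtosis(xs):
    """Sample excess kurtosis (0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(3)
sizes = [sample_size(rng) for _ in range(20_000)]
logs = [math.log(s) for s in sizes]
print("excess kurtosis of log(sizes):", round(excess_kurtosis(logs), 2))
```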
37
What determines content value?
  • lognormal -> multiplication of factors
  • possible mechanisms
  • rich get richer
  • inherited wealth
  • or a mix of both

38
model: visibility and interest lead to votes
votes increase visibility (popular resolves)
(diagram: visibility and interest -> votes; votes -> visibility)
39
votes -> more votes: rich get richer
  • new votes
  • proportional to number of prior votes
  • with some variation
  • influenced by observed popularity
  • among all users or just friends
  • examples
  • costly to evaluate content personally
  • fashion, latest cool product
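
A minimal sketch of the rich-get-richer mechanism, in which each new vote goes to an item with probability proportional to its current count. The item and vote counts, and starting every item at one, are hypothetical choices; this is a generic preferential-attachment simulation, not the model fitted in the talk.

```python
import random

def rich_get_richer(rng, n_items=100, n_votes=10_000, initial=1):
    """Assign votes one at a time, each with probability proportional
    to the item's current count (every item starts at `initial`)."""
    counts = [initial] * n_items
    for _ in range(n_votes):
        winner = rng.choices(range(n_items), weights=counts, k=1)[0]
        counts[winner] += 1
    return counts

rng = random.Random(4)
counts = sorted(rich_get_richer(rng))
print("min:", counts[0], "median:", counts[len(counts) // 2], "max:", counts[-1])
```

Although all items start out identical, early random differences are amplified, so the final counts spread over a wide range.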

40
match user interests: inherited wealth
  • new votes
  • from matching users' prior interests
  • with some variation
  • e.g. popular vs. niche political topics
  • why a broad distribution?
  • possibly information cascade, confirmation bias
  • M. Shermer, "The Political Brain", Scientific
    American, July 2006
  • S. Bikhchandani et al., "A Theory of Fads ...",
    J. Political Economy 100:992 (1992)

41
topics
  • case study: Essembly
  • user activity
  • content ratings
  • additional behaviors

42
predictions from early behavior
  • model can identify
  • new users likely to be very active
  • new resolves likely to have high interest
  • by factoring
  • web site properties (visibility)
  • user properties (interest in content)
  • also with other sites: Digg, YouTube
  • e.g., Crane & Sornette 2008; Lerman & Galstyan
    2008; Szabo & Huberman 2008

43
number of links per user
  • model: links due to common votes
  • as intended: to link ideologically similar users
  • caveat: linked users also share visibility -> votes

degree distribution
Hogg & Szabo, in Europhysics Letters (to appear)
44
Do active users create interesting resolves?
r vs. user activity rate (actions/day)
r vs. user activity time
1827 active users who introduced at least one
resolve
little correlation between a user's activity
and interestingness of resolves from that user
45
future work and summary
46
distinguishing mechanisms (future work)
  • experiments
  • alter information shown to random groups of users
  • can change both visibility and popularity
    measures
  • e.g., music downloads (Salganik et al., 2006)
  • correlation -> causal factors
  • do votes depend on how users find content?
  • e.g., influence of friends
  • relate to characteristics of content and users

47
summary
  • heterogeneous behavior
  • user activity
  • interest in content
  • model via components of behavior
  • steps toward identifying mechanisms
  • example: political discussion (Essembly)
  • user activity: time on site, activity rate
  • votes: visibility, interestingness
  • experiments to distinguish mechanisms