The Collaborative Organization of Knowledge, D. Spinellis and P. Louridas; Strong Regularities in Online Peer Production, D. Wilkinson


1
The Collaborative Organization of Knowledge, D. Spinellis and P. Louridas
Strong Regularities in Online Peer Production, D. Wilkinson
Harvard University
  • Ziyad Aljarboua
  • Monday, November 10, 2008

2
Intro - Wikipedia
  • Free multilingual encyclopedia launched in 2001
  • Operated by the non-profit Wikimedia Foundation
  • Contains 2,610,291 articles in English and 10
    million in total
  • 236 active language editions
  • Content written by volunteers

3
Intro - Wikipedia
  • Developed by Jimmy Wales and Larry Sanger
  • Time's 2006 list of the world's most influential
    people
  • "Largest and most popular general reference work
    on the internet" (Wikipedia)

Source: Wikipedia
4
Intro - Wikipedia
  • No formal peer review; changes take effect
    immediately
  • New articles are created by registered users but
    can be edited by anyone
  • Redistribution, creation of derivative works, and
    commercial use of content are permitted
  • 25,000 to 60,000 page requests per second
  • 50% of traffic to Wikipedia comes from Google

5
Intro - Wikipedia
[Figure: Wikipedia contributors by country. Source: Wikipedia]
6
Intro - Wikipedia
[Figure: Article count from Jan 2001 to Sep 2007. Source: Wikipedia]
7
Wikipedia - Concerns
  • Michael Scott from The Office: "Wikipedia is the
    best thing ever. Anyone in the world can write
    anything they want about any subject, so you know
    you are getting the best possible information."
  • Quality of articles undermined
  • Bias: content reflects contributors' interests

8
Wikipedia - vandalism
9
The Collaborative Organization of Knowledge
  • Attempts to study Wikipedia's growth: how human
    knowledge is recorded and organized through an
    open collaborative process (in Wikipedia)
  • Examines the relationship between existing articles
    and referenced nonexistent articles
  • How do existing entries foster the development of
    new entries?

10
The Collaborative Organization of Knowledge
  • Examines the recorded evolutionary development of
    Wikipedia's structure through article revisions
    and contributions
  • Motivation: Wikipedia's coverage has not declined
    while its scope has sharply increased.

11
Growth
  • Technologies and open participation policy behind
    rapid growth
  • Edit with no prior authorization
  • Edit history for all pages
  • Watchlist that alerts users to changes in
    their selected pages
  • Ability to revert changes if a page is vandalized
  • Ability to lock entries against revisions
  • Ease of linking to other articles
  • Categorizing articles using markup tags

12
The Study
  • Study processed all material on Wikipedia as of
    February 2006 (485 GB worth of XML documents)
  • Examined all recorded changes (28.2 million
    revisions on 1.9 million pages) and how entries
    were created and linked

13
General findings
  • Reverting is returning a page to a previous version,
    most of the time to undo vandalism
  • 4% of article revisions were reverts
  • Average time to revert a vandalized page is 13
    hours
  • 11% of pages that were reverted at least once had
    been vandalized at least once
  • Most reverted and revised: George W. Bush, with
    28,000 revisions (29,300 reverts and vandalism)
  • 2,441 entries (0.13%) locked
  • 20% of articles were stubs

14
Conclusion 1
  • Creation of new Wikipedia entries is not a random
    process but is related to the references to
    nonexistent articles
  • What drives Wikipedia's growth is the inclusion of
    red links, i.e., references to articles that do not
    exist yet (source: Wikipedia)

15
Conclusion 1
16
Conclusion 1
Mean number of references to a nonexistent
article rose exponentially until the article
was created. Once the article is created, the mean
rises linearly or levels off.
17
Inflationary/deflationary hypothesis
  • Inflationary hypothesis: the number of links to
    nonexistent articles increases at a higher rate
    than new article creation
  • Wikipedia sits at a midpoint between the
    two scenarios (thin coverage vs. decline in
    growth rate)

18
Wikipedia growth
Incomplete articles include nonexistent articles and stubs
19
Wikipedia growth
  • Between 2003 and 2006, the number of entries
    increased from 140,000 to 1.4 million, and the ratio
    of complete to incomplete articles remained roughly
    the same
  • Growth of Wikipedia is partly attributed to the
    splitting of articles (depth in articles
    translates into breadth)
  • Rate of article creation vs. rate of knowledge
    expansion?

20
Wikipedia content
  • A process of adding new articles that depends on
    currently referenced nonexistent articles leads to
    content balance
  • Articles are more likely to be written because
    they are popular (have many references leading to
    them) than because a contributor is interested
  • Don't most references originating from an
    article link to articles similar in
    subject? (assumes knowledge is a fully connected
    graph)

21
Finding 1
  • The process of referencing a nonexistent article and
    the subsequent definition of that article seemed to
    be a collaborative effort.
  • The person who referenced a nonexistent article
    and the person who started the referenced article
    were the same in only 3% of the cases
  • Wikipedia growth is limited by the number of
    contributors, not by individual contributors!

22
Conclusion 2
  • Wikipedia is a scale-free network

23
Scale-Free Network
  • Degree of a node: number of connections to other
    nodes
  • Degree distribution: probability distribution of
    degrees over the entire network
  • For degree j:
  • P(j) = (# nodes with degree j) / (# nodes)
  • i.e., the fraction of all nodes that have degree j

24
Scale-Free Network
  • A network whose degree distribution follows a
    power law
  • i.e., the degree distribution approaches 1/j^s as j
    increases
  • Fraction of nodes with degree j decreases as j
    (number of connections) increases
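
The degree distribution P(j) and the power-law form 1/j^s can be illustrated with a minimal Python sketch. The function names, the toy edge list and the least-squares fit on log-log axes are assumptions made for illustration; they are not taken from the paper, and a log-log fit is only a crude way to estimate an exponent.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Return {j: P(j)} for an undirected graph given as (u, v) pairs.

    Nodes that appear in no edge (degree 0) are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {j: c / n_nodes for j, c in sorted(counts.items())}

def loglog_slope(dist):
    """Least-squares slope of log P(j) against log j (roughly -s for a power law)."""
    xs = [math.log(j) for j in dist]
    ys = [math.log(p) for p in dist.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical toy graph: a hub "A" linked to four other nodes, plus one extra link.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
dist = degree_distribution(edges)
print(dist)
```

On real link data, a roughly straight line on log-log axes with slope near -s would be consistent with a scale-free network.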

25
Scale-Free network
Source: Wikipedia
26
Building the network
  • Models explaining why Wikipedia is scale-free
  • Power laws as the result of an optimization process
  • Power laws as the result of a growth model
    (preferential attachment model)
  • Simple network
  • Wikipedia
  • Expected reference

27
Building the network
28
  • Strong Regularities in Online Peer Production,
    D. Wilkinson

29
Introduction
  • Open source software development, blogs, wikis,
    social networks
  • Some of the most visited websites, and they
    continue to grow
  • Do online peer production systems share common
    macroscopic properties?

30
Objective
  • Describe strong macroscopic regularities in
    people's contributions to peer production systems
    (PPS): distribution of user participation and
    activity per topic
  • Examine basic dynamical rules guiding the evolution
    of PPS
  • Why does the distribution of levels of user
    participation follow a power law?
  • Not a psychological analysis of contributors

31
Methodology
  • Examines 4 different PPSs: Wikipedia, Bugzilla,
    Digg, Essembly
  • Data analyzed are exhaustive: they involve all
    users and contributions

System      Time span   Users     Topics    Contributions
Wikipedia   6y, 10m     5.07M     1.5M      50M
Bugzilla    6y, 7m      111K      357K      3.08M
Digg        3y          1.05M     3.57M     105M
Essembly    1y, 4m      12.04K    24.9K     1.31M
32
PPSs
  • Wikipedia
  • Essembly: social network for individuals to
    discuss and vote on political matters and
    organize to take action
  • Bugzilla: bug-tracking system where developers
    report and collaborate to fix bugs
  • Digg: news aggregator

33
User Participation
  • Power law distribution: a few dedicated members
    account for most activity
  • Focus on inactive users (generality)
  • % inactive
  • Wikipedia: 71% of editors
  • Bugzilla: 95% of commenters
  • Digg: 61% of voters, 56% of submitters
  • Essembly: 83% of voters, 53% of submitters
  • Inactive = no contributions for
  • Digg, Essembly: 3 months
  • Wikipedia, Bugzilla: 6 months

34
User Participation
[Figure panels: Essembly votes, Digg votes, Essembly resolves, Bugzilla comments, Wikipedia edits, Digg submissions]
35
User Contributions
  • Power law exponent is strongly related to the
    system's barrier to contribution (cost of
    contributions)
  • Both active and inactive users have distribution
    of contributions that follows a power law

36
Participation Momentum
  • When do people stop participating?
  • Momentum associated with users' participation
  • Probability of stopping is inversely proportional
    to the number of contributions
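
A small numerical sketch of why this stopping rule produces a power law. It assumes, as a simplification, that the chance of stopping after the k-th contribution is exactly alpha / k for a hypothetical constant alpha; the function name and the value alpha = 0.5 are illustrative choices, not figures from the paper.

```python
import math

def contribution_pmf(alpha, k_max):
    """P(user makes exactly k contributions) for k = 1..k_max,
    assuming the chance of stopping after the k-th contribution is alpha / k."""
    pmf = {}
    still_active = 1.0  # probability of having made it to contribution k
    for k in range(1, k_max + 1):
        stop = min(1.0, alpha / k)
        pmf[k] = still_active * stop
        still_active *= 1.0 - stop
    return pmf

alpha = 0.5  # hypothetical value, chosen only for illustration
pmf = contribution_pmf(alpha, 1000)

# Slope of the distribution on log-log axes between k = 100 and k = 1000.
slope = math.log(pmf[1000] / pmf[100]) / math.log(1000 / 100)
print(slope)  # close to -(alpha + 1) under this assumption
```

Under this assumption the tail behaves like k^-(alpha + 1), so a larger alpha (users more likely to quit early) gives a steeper power law.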

37
Participation Momentum
38
Exponent Significance
  • Probability to contribute proportional to
    contribution cost (exponent)
  • Power law exponent reflects cost to make a
    contribution

39
User Participation
  • Distribution of contribution counts for all users
    (active + inactive) also follows a power law but
    with a smaller exponent

[Figure panels: Inactive users; All users; Inactive users]
40
Activity per topic
  • Contributions per topic (e.g., edits per article)
  • Popular topics attract more users → more edits
  • Results
  • Distribution of contributions per topic is lognormal
  • Lognormal mean and variance depend linearly on
    time for topics where novelty decay is not a
    factor
  • Contributions to a topic increase its visibility
    and popularity
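
A minimal simulation sketch of how multiplicative growth yields a lognormal distribution whose mean and variance grow linearly with time. It assumes each topic's contribution count is multiplied at every time step by a random factor 1 + a + noise; the parameter values, the uniform noise and the function name are illustrative assumptions, not values from the paper.

```python
import math
import random
import statistics

def simulate_log_counts(n_topics, n_steps, a=0.05, noise=0.2, seed=0):
    """Return {t: [log n_i(t) for each topic i]} under multiplicative growth."""
    rng = random.Random(seed)
    logs = [0.0] * n_topics  # every topic starts with n = 1 contribution
    history = {}
    for t in range(1, n_steps + 1):
        for i in range(n_topics):
            # n(t) = n(t-1) * (1 + a + xi), with xi uniform in [-noise, noise]
            logs[i] += math.log(1.0 + a + rng.uniform(-noise, noise))
        history[t] = list(logs)
    return history

history = simulate_log_counts(n_topics=4000, n_steps=40)
for t in (10, 20, 40):
    print(t, statistics.mean(history[t]), statistics.variance(history[t]))
```

Because log n(t) is a sum of t independent increments, it is approximately normal, and both its mean and its variance scale with t; doubling the age of a topic should roughly double both.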

41
Activity per Topic
  • Contributions → popularity → more contributions
    (multiplicative reinforcement mechanism)

[Figure panels: Wikipedia, Essembly, Digg]
42
Activity per Topic
[Figure axes: number of articles vs. log(number of edits); number of resolves vs. log(number of votes)]
43
Activity per Topic
  • Variance and mean depend linearly on age (t) of
    topic

44
Popularity factor: interface design
  • Digg vs. Essembly vs. Wikipedia
  • A small number of topics attract the vast majority
    of contributions (long-tail log dist. plots)

45
Discussion
  • How does the size of a group actively working
    together affect the results?

46
Sources
  • Wikipedia
  • D. Spinellis and P. Louridas, The Collaborative
    Organization of Knowledge
  • D. Wilkinson, Strong Regularities in Online Peer
    Production