Title: The Collaborative Organization of Knowledge (D. Spinellis and P. Louridas); Strong Regularities in Online Peer Production (D. Wilkinson)
Harvard University
- Ziyad Aljarboua
- Monday, November 10, 2008
2. Intro - Wikipedia
- Free multilingual encyclopedia launched in 2001
- Operated by the non-profit Wikimedia Foundation
- Contains 2,610,291 articles in English and 10 million in total
- 236 active language editions
- Content written by volunteers
3. Intro - Wikipedia
- Developed by Jimmy Wales and Larry Sanger
- Wales was named on Time's 2006 list of the world's most influential people
- Largest and most popular general reference work on the internet
Source: Wikipedia
4. Intro - Wikipedia
- No formal peer review; changes take effect immediately
- New articles are created by registered users but can be edited by anyone
- Redistribution, creation of derivative works and commercial use of content are permitted
- 25,000 to 60,000 page requests per second
- 50% of traffic to Wikipedia comes from Google
5. Intro - Wikipedia
Source: Wikipedia
Wikipedia contributors by country
6. Intro - Wikipedia
Source: Wikipedia
Article count from Jan 2001 to Sep 2007
7. Wikipedia - Concerns
- Michael Scott from The Office: "Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information."
- Quality of articles undermined
- Bias: content reflects contributors' interests
8. Wikipedia - vandalism
9. The Collaborative Organization of Knowledge
- Studies Wikipedia's growth: how human knowledge is recorded and organized through an open collaborative process
- Examines the relationship between existing articles and referenced but nonexistent articles
- How do existing entries foster the development of new entries?
10. The Collaborative Organization of Knowledge
- Examines the recorded evolutionary development of Wikipedia's structure through article revisions and contributions
- Motivation: Wikipedia's coverage has not declined while its scope sharply increased
11. Growth
- Technologies and an open participation policy behind the rapid growth:
- Editing without prior authorization
- Edit history for all pages
- Watchlists that alert users to changes in their selected pages
- Ability to revert changes if a page is vandalized
- Ability to lock entries against revisions
- Easy linking to other articles
- Categorizing articles using markup tags
12. The Study
- Processed all material on Wikipedia as of February 2006 (485 GB of XML documents)
- Examined all recorded changes (28.2 million revisions on 1.9 million pages) and how entries were created and linked
13. General findings
- Reverting: returning a page to a previous version, most often to undo vandalism
- 4% of article revisions were reverts
- Average time to revert a vandalized page: 13 hours
- 11% of pages that were reverted at least once had been vandalized at least once
- Most revised and reverted article: George W. Bush, with 28,000 revisions (29,300 reverts and vandalism)
- 2,441 entries (0.13%) locked
- 20% of articles were stubs
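Counting reverts requires deciding when a revision restores an earlier version. A minimal sketch of one common heuristic, assuming each page history is available as a list of full revision texts, oldest first: a revision counts as a revert when its text is identical to an earlier, non-adjacent revision. The function `find_reverts` is hypothetical, and this is not necessarily how the study identified reverts (partial reverts, for instance, are missed).

```python
import hashlib


def find_reverts(revisions: list[str]) -> list[int]:
    """Return indices of revisions that restore an earlier, non-adjacent version.

    Identity-revert heuristic (an assumption, not necessarily the study's method):
    revision i is a revert if its text hash equals that of some revision j < i - 1.
    """
    last_seen: dict[str, int] = {}  # content hash -> most recent revision index with it
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        prev = last_seen.get(digest)
        # Matching the immediately preceding revision is a no-op edit, not a revert.
        if prev is not None and prev < i - 1:
            reverts.append(i)
        last_seen[digest] = i
    return reverts


history = ["Intro", "Intro VANDAL", "Intro", "Intro, expanded"]
reverts = find_reverts(history)
print(reverts, f"{len(reverts) / len(history):.0%} of revisions are reverts")
```

On this toy history the third revision undoes the vandalism, so the sketch reports `[2]` and 25%.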
14. Conclusion 1
- Creation of new Wikipedia entries is not a random process but is related to references to nonexistent articles
- What drives Wikipedia growth is the inclusion of red links, i.e. references to articles that do not exist yet
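The red-link mechanism can be made concrete with a small sketch, assuming the link graph is available as a mapping from each existing article to the titles it links to (a simplification of how Wikipedia actually stores links). The name `red_link_counts` and the toy titles are hypothetical. The count per missing title corresponds to the number of references to a nonexistent article that the study tracked over time.

```python
from collections import Counter


def red_link_counts(links: dict[str, set[str]]) -> Counter:
    """Count references to nonexistent articles ("red links").

    links maps every existing article title to the set of titles it links to;
    any target that is not itself a key is treated as nonexistent.
    """
    existing = set(links)
    counts: Counter = Counter()
    for targets in links.values():
        for target in targets:
            if target not in existing:
                counts[target] += 1
    return counts


# Hypothetical toy link graph.
links = {
    "Graph theory": {"Scale-free network", "Vertex"},
    "Vertex": {"Graph theory"},
    "Network science": {"Scale-free network", "Small-world network"},
}
print(red_link_counts(links).most_common())
```

Under the growth hypothesis above, titles with the highest counts (here "Scale-free network") would be the most likely to be created next.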
15. Conclusion 1
16. Conclusion 1
The mean number of references to a nonexistent article rose exponentially until the article was created. Once the article was created, the mean rose linearly or leveled off.
17. Inflationary/deflationary hypothesis
- Inflationary hypothesis: the number of links to nonexistent articles increases at a higher rate than new articles are created
- Wikipedia sits at a midpoint between the two scenarios (thin coverage under the inflationary one vs. a declining growth rate under the deflationary one)
18. Wikipedia growth
Incomplete articles include nonexistent articles and stubs
19. Wikipedia growth
- Between 2003 and 2006, the number of entries increased from 140,000 to 1.4 million, and the ratio of complete to incomplete articles remained roughly the same
- Growth of Wikipedia is partly attributed to the splitting of articles (depth in articles translates into breadth)
- Rate of article creation vs. rate of knowledge expansion?
20. Wikipedia content
- Adding new articles in response to references to nonexistent articles leads to balanced content
- Articles are more likely to be written because they are popular (have many references leading to them) than because a contributor is interested in the topic
- But don't most references from an article point to articles on similar subjects? (The content-balance claim assumes knowledge is a fully connected graph.)
21. Finding 1
- Referencing a nonexistent article and later creating that article appeared to be a collaborative effort
- The person who referenced a nonexistent article and the person who started the referenced article were the same in only 3% of the cases
- Wikipedia growth is limited by the number of contributors, not by individual contributors!
22. Conclusion 2
- Wikipedia is a scale-free network
23. Scale-Free Network
- Degree of a node: number of connections to other nodes
- Degree distribution: probability distribution of degrees over the entire network
- For degree j: P(j) = (number of nodes with degree j) / (total number of nodes)
- i.e. the fraction of all nodes that have degree j
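The definition of P(j) translates directly into code. A minimal sketch, assuming the network is given as an undirected edge list (Wikipedia's links are directed, so a real analysis would have to pick in-degree, out-degree, or total degree); `degree_distribution` and the toy edges are hypothetical.

```python
from collections import Counter


def degree_distribution(edges: list[tuple[str, str]]) -> dict[int, float]:
    """Return P(j) = (nodes with degree j) / (all nodes) for an undirected graph.

    Assumes no duplicate edges or self-loops; nodes are taken from the edge list,
    so isolated nodes (degree 0) are not counted.
    """
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    nodes_per_degree = Counter(degree.values())
    return {j: count / n_nodes for j, count in sorted(nodes_per_degree.items())}


edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```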
24. Scale-Free Network
- A network whose degree distribution follows a power law
- i.e. P(j) is proportional to 1/j^s for some exponent s as j increases
- The fraction of nodes with degree j decreases as j (number of connections) increases
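The power-law form implies that log P(j) is a straight line in log j with slope -s. A sketch of the simplest, and crudest, way to estimate s from a degree distribution such as P(j) above; `loglog_exponent` is hypothetical, and for real, noisy data maximum-likelihood estimators are generally preferred over log-log regression.

```python
import math


def loglog_exponent(dist: dict[int, float]) -> float:
    """Estimate s in P(j) ~ 1/j**s by least squares on log P(j) versus log j.

    A crude estimator for illustration only.
    """
    points = [(math.log(j), math.log(p)) for j, p in dist.items() if j > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx  # the fitted slope is -s


# Exact (unnormalized) power law with s = 2.5: the fit recovers the exponent.
ideal = {j: j ** -2.5 for j in range(1, 100)}
print(round(loglog_exponent(ideal), 6))  # 2.5
```

Normalization does not matter here: multiplying every P(j) by a constant only shifts the intercept, not the slope.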
25. Scale-Free network
Source: Wikipedia
26. Building the network
- Models explaining why Wikipedia is scale-free:
- Power laws as the result of an optimization process
- Power laws as the result of a growth model (preferential attachment model)
- Simple network
- Wikipedia
- Expected reference
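Preferential attachment is the textbook growth mechanism that produces a power law: new nodes prefer to link to nodes that are already well connected. A minimal simulation of the standard Barabási-Albert process, simplified to one link per new node, is sketched below. It illustrates the general mechanism and is not the specific model the study fitted to Wikipedia; `preferential_attachment` and its parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes: int, seed: int = 0) -> list[tuple[int, int]]:
    """Grow a network one node at a time (Barabasi-Albert style, one edge per node).

    Each new node links to an existing node chosen with probability proportional
    to that node's current degree.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per unit of degree, so a uniform
    # choice from it is a degree-proportional choice of node.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


edges = preferential_attachment(10_000)
degree = Counter(node for edge in edges for node in edge)
print("max degree:", max(degree.values()))
print("fraction of degree-1 nodes:", sum(d == 1 for d in degree.values()) / len(degree))
```

For this one-edge variant, theory predicts P(j) = 4/(j(j+1)(j+2)), so roughly two thirds of nodes should have degree 1 while a few early nodes accumulate very high degree.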
27. Building the network
28. Strong Regularities in Online Peer Production, D. Wilkinson
29. Introduction
- Open source software development, blogs, wikis, social networks
- Some of the most visited websites, and they continue to grow
- Do online peer production systems share common macroscopic properties?
30. Objective
- Describe strong macroscopic regularities in people's contributions to peer production systems (PPSs): the distribution of user participation and activity per topic
- Examine the basic dynamical rules guiding the evolution of PPSs
- Why is the distribution of user participation levels a power law?
- Not a psychological analysis of contributors
31. Methodology
- Examines 4 different PPSs: Wikipedia, Bugzilla, Digg, Essembly
- Data analyzed are exhaustive: they cover all users and all contributions
System      Time span   Users    Topics   Contributions
Wikipedia   6y, 10m     5.07M    1.5M     50M
Bugzilla    6y, 7m      111K     357K     3.08M
Digg        3y          1.05M    3.57M    105M
Essembly    1y, 4m      12.04K   24.9K    1.31M
32. PPSs
- Wikipedia
- Essembly: social network for individuals to discuss and vote on political matters and organize to take action
- Bugzilla: bug-tracking system where developers report bugs and collaborate to fix them
- Digg: news aggregator
33. User Participation
- Power-law distribution: a few dedicated members account for most activity
- Focus on inactive users (generality)
- % of users who are inactive:
- Wikipedia: 71% of editors
- Bugzilla: 95% of commenters
- Digg: 61% of voters, 56% of submitters
- Essembly: 83% of voters, 53% of submitters
- Inactivity threshold (time since last contribution):
- Digg, Essembly: 3 months
- Wikipedia, Bugzilla: 6 months
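The inactivity fractions depend on how "inactive" is defined. A sketch assuming a user is inactive when their most recent contribution falls before the final 3- or 6-month window of the dataset, approximated here as 90 or 182 days; the function name, the day counts, and the sample users are hypothetical.

```python
from datetime import date, timedelta


def inactive_fraction(
    last_contribution: dict[str, date], end_of_data: date, window_days: int
) -> float:
    """Fraction of users with no contribution in the last `window_days` of the data.

    Assumed reading of the slide's thresholds: about 90 days for Digg and
    Essembly, about 182 days for Wikipedia and Bugzilla.
    """
    cutoff = end_of_data - timedelta(days=window_days)
    inactive = sum(1 for last in last_contribution.values() if last < cutoff)
    return inactive / len(last_contribution)


# Hypothetical users and dates.
last_seen = {
    "alice": date(2008, 1, 1),
    "bob": date(2008, 10, 1),
    "carol": date(2007, 6, 1),
    "dave": date(2008, 9, 15),
}
print(inactive_fraction(last_seen, date(2008, 11, 10), window_days=90))  # 0.5
```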
34. User Participation
Figure panels: Essembly votes, Digg votes, Essembly resolves, Bugzilla comments, Wikipedia edits, Digg submissions
35. User Contributions
- The power-law exponent is strongly related to the system's barrier to contribution (cost of contributions)
- Both active and inactive users have distributions of contributions that follow a power law
36. Participation Momentum
- When do people stop participating?
- Momentum is associated with users' participation
- The probability of stopping is inversely proportional to the number of contributions made so far
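A stopping probability inversely proportional to the number of contributions already made is enough, on its own, to produce a power law. Under the assumption that a user stops after their k-th contribution with probability α/k, the fraction still active after k contributions is S(k) = ∏_{i=1}^{k} (1 - α/i), which falls off like k^(-α) for large k; the distribution of total contributions per user then decays like k^(-(α+1)). The sketch computes S(k) exactly and checks its log-log slope; α and the function names are hypothetical.

```python
import math


def survival(k_max: int, alpha: float) -> list[float]:
    """S[k] = fraction of users still contributing after k contributions.

    Model assumption: after their i-th contribution a user stops with
    probability alpha / i (requires 0 < alpha < 1 so that alpha / 1 < 1).
    """
    s = [1.0]
    for i in range(1, k_max + 1):
        s.append(s[-1] * (1 - alpha / i))
    return s


def loglog_slope(s: list[float], k1: int, k2: int) -> float:
    """Slope of log S(k) against log k between k1 and k2."""
    return math.log(s[k2] / s[k1]) / math.log(k2 / k1)


s = survival(10_000, alpha=0.5)
print(loglog_slope(s, 1_000, 10_000))  # close to -0.5
```

A larger α (a higher stopping tendency) gives a steeper slope, which is one way to read the link between contribution cost and the power-law exponent.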
37. Participation Momentum
38. Exponent Significance
- The probability to contribute is governed by the contribution cost (captured by the exponent)
- The power-law exponent reflects the cost of making a contribution
39. User Participation
- The distribution of contribution counts for all users (active + inactive) also follows a power law, but with a smaller exponent
Figure panels: inactive users, all users
40. Activity per topic
- Contributions per topic (e.g. edits per article)
- Popular topics attract more users → more edits
- Results:
- The distribution of contributions per topic is lognormal
- Lognormal mean and variance depend linearly on time for topics where novelty decay is not a factor
- Contributions to a topic increase its visibility and popularity
41. Activity per Topic
- Contributions → popularity → more contributions (multiplicative reinforcement mechanism)
Figure panels: Wikipedia, Essembly, Digg
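The reinforcement loop above is a multiplicative process: each period, a topic's activity is scaled by a random factor. Because logarithms turn products into sums, the log of activity is a sum of independent terms, so by the central limit theorem it is approximately normal (activity is approximately lognormal), with mean and variance both growing linearly in time. A minimal simulation, assuming independent uniform growth factors and ignoring novelty decay; `simulate_log_counts` and its parameters are hypothetical, not Wilkinson's fitted model.

```python
import math
import random
import statistics


def simulate_log_counts(
    n_topics: int, checkpoints: set[int], low: float = 0.9, high: float = 1.3, seed: int = 1
) -> dict[int, list[float]]:
    """Multiplicative reinforcement: each step, every topic's activity is
    multiplied by an independent random factor drawn uniformly from [low, high].

    Returns the log activity of every topic at each checkpoint step.
    low, high and seed are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    logs = [0.0] * n_topics  # all topics start at activity 1, i.e. log 0
    snapshots = {}
    for t in range(1, max(checkpoints) + 1):
        # Multiplying by a factor adds its logarithm to the log activity.
        logs = [x + math.log(rng.uniform(low, high)) for x in logs]
        if t in checkpoints:
            snapshots[t] = list(logs)
    return snapshots


snapshots = simulate_log_counts(2_000, {100, 200})
for t in sorted(snapshots):
    print(t, statistics.mean(snapshots[t]), statistics.pvariance(snapshots[t]))
```

Doubling t should roughly double both the mean and the variance of the log activity, matching the linear dependence on topic age; a histogram of `snapshots[200]` looks approximately normal.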
42. Activity per Topic
Figure axes: number of articles vs. log(number of edits); number of resolves vs. log(number of votes)
43. Activity per Topic
- Variance and mean depend linearly on the age (t) of a topic
44. Popularity factor and interface design
- Digg vs. Essembly vs. Wikipedia
- A small number of topics attracts the vast majority of contributions (long tails in the lognormal distribution plots)
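The concentration claim can be quantified as the share of all contributions captured by the most active topics. A small sketch; the function name and the counts are hypothetical, and real per-topic counts would come from the data sets listed in the methodology table.

```python
def top_share(contributions_per_topic: list[int], top_fraction: float) -> float:
    """Share of all contributions that go to the most active `top_fraction` of topics."""
    ranked = sorted(contributions_per_topic, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # number of topics in the head
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical per-topic counts: one topic dominates a long tail of small ones.
counts = [100, 10, 5, 3, 2, 1, 1, 1, 1, 1]
print(top_share(counts, 0.1))  # 0.8
```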
45. Discussion
- How does the size of a group working together coactively affect the results?
46. Sources
- Wikipedia
- D. Spinellis and P. Louridas, The Collaborative Organization of Knowledge
- D. Wilkinson, Strong Regularities in Online Peer Production