Title: Social Knowledge Dynamics: A Case Study on Modeling Wikipedia
1 Social Knowledge Dynamics: A Case Study on Modeling Wikipedia
The 10th HKBU-CSD Postgraduate Research Symposium
Supervisor: Prof. Jiming Liu
Department of Computer Science, Hong Kong Baptist University
September 2009
2 Outline
- Wikipedia and Social Knowledge Dynamics
- Previous Work on Wikipedia
- Degree distribution
- Reciprocity and feedback loops
- Motifs
- Modeling Wikipedia's Growth
- A model about reference
- A model about degree distribution
- AOC-based Models
- Conclusion
3 Wikipedia
- Anyone can create, edit, and delete articles
- Some properties
  - Each article can be treated as the collective knowledge of a group of users
  - Users can exchange knowledge through talk pages
  - Users with similar knowledge may form communities
  - The underlying structure of some articles may in turn influence users' knowledge
4 Social Knowledge Dynamics
Knowledge is embodied in people gathered in
communities and networks. The road to knowledge
is via people, conversations, connections and
relationships. Knowledge surfaces through dialog,
all knowledge is socially mediated and access to
knowledge is by connecting to people that know or
know who to contact. -- Denham Grey
- Social dynamics
  - How a society of individuals reacts to inner and/or outer changes
  - Global patterns can emerge from even simple individuals
  - Phase transitions, catastrophes, etc.
5 Difficulties and Motivations
- Two levels of difficulty in discovering global emergence from local dynamic models
  - Defining sensible and realistic microscopic models (intact data is needed)
  - The usual problem of inferring macroscopic phenomena from microscopic dynamic models
- Motivations for studying Wikipedia
  - The formation of Wikipedia is a kind of social knowledge dynamics (if articles are treated as knowledge)
  - Intact data are available for download
    - Articles, categories, images and multimedia, talk pages, redirects and broken links, and so on.
6 Related Analysis on Wikipedia
- Treat Wikipedia as a complex network, where articles are the nodes and hyperlinks are the links.
- Degree distribution
- Reciprocity and feedback loops
- Motifs
7 Degree distribution
- Degree measures the number of articles that link into or out of an article
- Meanings of degree
  - Two articles sharing a link reflect some kind of relation in terms of their contents
  - Articles with high degree are more likely to be common knowledge
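The in-degree and out-degree described above can be counted directly from a list of hyperlinks. The following is a minimal sketch, not the method used in the cited studies; the toy link list and the function names `degree_counts` and `degree_histogram` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def degree_counts(links):
    """Return (in_degree, out_degree) Counters for (source, target) links."""
    in_degree = Counter()
    out_degree = Counter()
    # Duplicate links between the same ordered pair are counted once (assumption).
    for source, target in set(links):
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

def degree_histogram(degree):
    """Map each degree value k to the number of articles with that degree.

    Articles with degree 0 do not appear in the Counter, so k = 0 is omitted.
    """
    return Counter(degree.values())

# Hypothetical toy link graph: (linking article, linked article)
links = [
    ("Physics", "Mathematics"),
    ("Chemistry", "Mathematics"),
    ("Biology", "Chemistry"),
    ("Mathematics", "Physics"),
]
in_degree, out_degree = degree_counts(links)
in_hist = degree_histogram(in_degree)
```

Plotting such a histogram on log-log axes is the usual first check for the scale-free behaviour discussed on the next slide.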
8 Observations: Scale-free
The out-degree distribution of the Japanese Wikipedia (adapted from Fig. 3 in [1]).
The in-degree distribution of the Japanese Wikipedia (adapted from Fig. 3 in [1]).
Reference [1]: V. Zlatic, M. Bozicevic, H. Stefancic, and M. Domazet, "Wikipedias: Collaborative Web-based Encyclopedias as Complex Networks," Physical Review E 74, 016615, 2006.
9 Scale-free and Phase Transition
The theory of phase transitions told us loud and
clear that the road from disorder to order is
maintained by the powerful forces of
self-organization and is paved by power laws. It
told us that power laws are the patent signatures
of self-organization in complex
systems. -- A.-L. Barabasi, Linked: The New Science of Networks. Cambridge: Perseus Publishing, 2002.
Similar results can be observed for Wikipedias in other languages. What is the fundamental principle behind this similar type of growth? Preferential attachment?
10 Reciprocity and Feedback Loops
- Reciprocal links are links pointing from node i to node j for which there also exists a link pointing from node j to node i.
- Reciprocity quantifies mutual exchange between two articles.
- Feedback loops: a loop of directed links that starts from and ends at the same node.
The density of the links
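The definition of reciprocal links above can be turned into a small computation. The sketch below returns the plain fraction of reciprocated links and, as an assumption about what the slide's lost formula showed, a density-corrected reciprocity rho = (r - a) / (1 - a), where a = L / (N(N - 1)) is the link density. The function name and the toy graph are hypothetical.

```python
def reciprocity(links):
    """Return (r, rho) for a directed graph given as (source, target) pairs.

    r   = fraction of links whose reverse link also exists
    rho = (r - a) / (1 - a), with a = L / (N * (N - 1)) the link density
          (assumed form of the density-corrected measure)
    Assumes at least one link and a graph that is not complete (a < 1).
    """
    # Drop self-links and duplicate links before counting.
    edges = {(s, t) for s, t in links if s != t}
    nodes = {node for edge in edges for node in edge}
    n_links = len(edges)
    n_nodes = len(nodes)
    reciprocated = sum(1 for (s, t) in edges if (t, s) in edges)
    r = reciprocated / n_links
    density = n_links / (n_nodes * (n_nodes - 1))
    rho = (r - density) / (1 - density)
    return r, rho

# Hypothetical toy graph: A and B link to each other, the other links are one-way.
links = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
r, rho = reciprocity(links)
```

A positive rho would indicate more reciprocal links than expected in a random graph of the same density, and a negative rho fewer.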
11 Feedback Loops in Ecological Systems
- An ecological study [2] observed that the number of feedback loops in the species network is correlated with system lifetime.
Figure panels: state before crash; normal state.
Reference [2]: R. Mehrotra, V. Soni, and S. Jain, "Diversity sustains an evolving network," Journal of the Royal Society Interface, 6(38):793-799, 2009.
12 Motifs
- Motifs [3] are small subgraphs of networks, which are used to systematically study similarity in the local structure of networks.
Questions: Do Wikipedias in different languages share the same functions? Is the formation of social knowledge driven by the same fundamental function?
Reference [3]: R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon, "Superfamilies of evolved and designed networks," Science, 303(5663):1538-1542, 2004.
13 Modeling Reference Growth
At each time step t, a number of entries and r_t references are added. The references are distributed among all entries following a probability
Frequency distribution of the expected and actual number of references added each month to each article (adapted from Fig. 3b in [4]).
The expected number of references added to entry
i at time t is
Reference [4]: D. Spinellis and P. Louridas, "The collaborative organization of knowledge," Communications of the ACM, 51(8):68-73, 2008.
14 Modeling the Degree Distribution
- The model consists of two steps
  - A new node t attaches to the network with m outgoing links. The probability that a given link attaches to some node s is proportional to the in-degree k_i(s) of node s.
  - For every new link, with probability r a reciprocal link is formed from node s to node t.
Comparison of in-degree distributions. Chosen parameters are t = 94094, m = 16.75, r = 0.18 (adapted from [5]).
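The two growth steps above can be simulated in a few lines. This is a rough sketch under stated assumptions, not the implementation from [5]: m is taken as an integer (the fitted value on the slide is fractional), each new node picks m distinct targets, attachment weights use in-degree plus one so that nodes with no incoming links can still be chosen, and the network starts from a small fully connected seed. The function name `grow_network` is hypothetical.

```python
import random

def grow_network(n_nodes, m, r, seed=0):
    """Grow a directed network and return (links, in_degree).

    links is a set of (source, target) pairs; in_degree[s] is the
    in-degree of node s. Requires n_nodes > m + 1.
    """
    rng = random.Random(seed)
    links = set()
    in_degree = [0] * n_nodes

    # Seed network (assumption): m + 1 nodes linked in both directions.
    for s in range(m + 1):
        for t in range(m + 1):
            if s != t:
                links.add((s, t))
                in_degree[t] += 1

    for new in range(m + 1, n_nodes):
        existing = list(range(new))
        # Step 1: preferential attachment on in-degree.
        # The +1 smoothing is an assumption, not part of the slide's model.
        weights = [in_degree[s] + 1 for s in existing]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=weights)[0])
        for s in sorted(targets):
            links.add((new, s))
            in_degree[s] += 1
            # Step 2: with probability r, add the reciprocal link s -> new.
            if rng.random() < r:
                links.add((s, new))
                in_degree[new] += 1
    return links, in_degree

# Small illustrative run; r = 0.18 is the value quoted on the slide.
links, in_degree = grow_network(n_nodes=500, m=3, r=0.18, seed=42)
```

A histogram of `in_degree` from a larger run could then be compared qualitatively with the measured in-degree distribution.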
Reference [5]: V. Zlatic and H. Stefancic, "Model of Wikipedia growth based on information exchange via reciprocal arcs," Physics and Society, 2009.
15 Insufficiency (1)
- The above two models seem to reflect preferential attachment as the principle behind scale-free phenomena
- However, other researchers have shown that selective removal [6] can also produce a scale-free distribution.
- The models for scale-free networks can be divided into two groups
  - Scale-free as the result of an optimization or phase transition process
  - Scale-free as the result of a growth model, such as preferential attachment.
Reference [6]: M. Salathé, R. M. May, and S. Bonhoeffer, "The evolution of network topology by selective removal," Journal of the Royal Society Interface, 2(5):533-536, 2005.
16 Insufficiency (2)
- The above two models are based on simple stochastic processes
- We should realize that the real Wikipedia is driven by social dynamics, including user-user, user-group, and group-group interactions, rather than by simple stochastic processes.
17 AOC-based Models
- Components of Autonomy-Oriented Computing
  - Entities
  - Interactions
  - Behavioral rules
  - Self-organizations
  - Collective regulations
  - Aggregations
Figure labels (AOC applied to Wikipedia): users; interact for a page; behaviors; self-organized groups; feedbacks; relationships.
AOC is used to solve large-scale, dynamically evolving, and/or highly distributed computational problems.
18 Questions
- What are the fundamental behavioral rules (e.g., explicit/implicit optimization objectives) by which entities form the global patterns of Wikipedia?
- How do entities self-organize during the evolution of Wikipedia?
- Do these rules and this self-organization reflect the formation rules of social knowledge and social organization?
19 Three Possible Directions (1)
- Wikipedia as a system
  - As a collaborative system based solely on users' spontaneous actions, what drives its birth, boom, and death?
- Existing results on ecosystems
  - Large randomly assembled ecosystems tend to be less stable as they increase in complexity,
    - where complexity is measured by the connectance and the average interaction strength between species.
  - The typical lifetime of the system increases with the diversity of its components.
20 Three Possible Directions (2)
- Topic evolution on Wikipedia
  - We can treat topic evolution on Wikipedia as a result of user-to-user interactions, or even of interactions among groups of users (as in cultural dynamics).
- Existing work
  - Static data mining (time windows for dynamic data mining)
  - Semantic/content analysis (what is the driving force?)
21 Three Possible Directions (3)
- User community dynamics on Wikipedia
  - Each user may be associated with multiple articles
  - For each article, there will be multiple users acting on it
  - Communities may emerge from entities' local interactions, which may change over time
- Existing work
  - Modularity
  - Linkage-based measurements cannot reflect multiple relationships
22 Three Levels of Consideration
- Describing the structure
  - Such as food webs in ecosystems, neural networks in organisms, etc.
- How the structure influences what happens in the system
  - For example, the food-web structure affects the population dynamics of species
- How the structure changes over time
  - For example, species going extinct will influence the food-web structure
23 Conclusion
- The relation between Wikipedia and social knowledge (motivations)
- The current studies on Wikipedia and their insufficiency
- The possibility of adopting AOC-based modeling
- Three research directions
24 Thanks!