Title: Slide 1
1Social Networks, the case of Wikipedia
- A. Capocci (1), D. Donato (2), S. Leonardi (2), F. Rao (1), L. Salete Buriol (2), V. Zlatic (1), G.C.
(1) CNR-INFM Centre SMC, Dep. of Physics, University Sapienza, Rome, Italy
(2) Dep. of Informatics, University Sapienza, Rome, Italy
PRE 74, 036116 (2006); EPL 81, 28006 (2008)
2INTRODUCTION
4INTRODUCTION
Wikipedia in other languages: you may read and edit articles in many different languages.
- Wikipedia encyclopedia languages with over 100,000 articles: Deutsch (German), Français (French), Italiano (Italian), 日本語 (Japanese), Nederlands (Dutch), Polski (Polish), Português (Portuguese), Svenska (Swedish)
- Wikipedia encyclopedia languages with over 10,000 articles: العربية (Arabic), Български (Bulgarian), Català (Catalan), Česky (Czech), Dansk (Danish), Eesti (Estonian), Español (Spanish), Esperanto, Galego (Galician), עברית (Hebrew), Hrvatski (Croatian), Ido, Bahasa Indonesia (Indonesian), 한국어 (Korean), Lietuvių (Lithuanian), Magyar (Hungarian), Bahasa Melayu (Malay), Norsk bokmål (Norwegian), Norsk nynorsk (Norwegian), Română (Romanian), Русский (Russian), Slovenčina (Slovak), Slovenščina (Slovenian), Српски (Serbian), Suomi (Finnish), Türkçe (Turkish), Українська (Ukrainian), 中文 (Chinese)
- Wikipedia encyclopedia languages with over 1,000 articles: Alemannisch (Alemannic), Afrikaans, Aragonés (Aragonese), Asturianu (Asturian), Azərbaycan (Azerbaijani), Bân-lâm-gú (Min Nan), Беларуская (Belarusian), Bosanski (Bosnian), Brezhoneg (Breton), Чăваш чĕлхи (Chuvash), Corsu (Corsican), Cymraeg (Welsh), Ελληνικά (Greek), Euskara (Basque), فارسی (Persian), Føroyskt (Faroese), Frysk (Western Frisian), Gaeilge (Irish), Gàidhlig (Scots Gaelic), हिन्दी (Hindi), Interlingua, Íslenska (Icelandic), Basa Jawa (Javanese), ქართული (Georgian), ಕನ್ನಡ (Kannada), Kurdî / كوردی (Kurdish), Latina (Latin), Latviešu (Latvian), Lëtzebuergesch (Luxembourgish), Limburgs (Limburgish), Македонски (Macedonian), मराठी (Marathi), Napulitana (Neapolitan), Occitan, Ирон (Ossetic), Plattdüütsch (Low Saxon), Scots, Sicilianu (Sicilian), Simple English, Shqip (Albanian), Sinugboanon (Cebuano), Srpskohrvatski/Српскохрватски (Serbo-Croatian), தமிழ் (Tamil), Tagalog, ภาษาไทย (Thai), Tatarça (Tatar), తెలుగు (Telugu), Tiếng Việt (Vietnamese), Walon (Walloon)
5INTRODUCTION
A Nature investigation examined whether Wikipedia is an authoritative source of information compared with established sources such as Encyclopedia Britannica.
- Among the 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies, the one in Britannica about three.
- On the other hand, Wikipedia articles are on average longer than those of Britannica, which implies a lower rate of errors per unit of text in Wikipedia.
- In a survey of more than 1,000 Nature authors:
- 70% had heard of Wikipedia
- 17% of those consulted it on a weekly basis
- less than 10% help to update it
(Nature 438, 900-901, 2005)
7INTRODUCTION
- Sociological reasons: the encyclopedia collects pages written by a number of independent and heterogeneous individuals. Each of them autonomously decides on the content of the articles, with the only constraint of a fixed layout. This autonomy is a common feature of content creation on the Web. The Wikipedia community of authors is formed by members whose only wish is to make available to the world concepts and topics that they consider meaningful. In some sense, tracing the evolution of the Wikipedia subsets should mirror the development of significant trends within each linguistic community.
- Time information: Wikipedia provides time information associated with nodes. Moreover, it keeps past information: the times of creation and of each modification are recorded for every page in the dataset.
- Independence from external links: Wikipedia articles link mainly to articles in the same dataset.
- Variety of graph sizes: one graph can be collected per language, and the graph sizes vary from a few hundred pages up to half a million pages.
8DATA
We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data due to disk space restrictions: the English dataset of June 2005 takes more than 36 GB compressed, that is about 200 GB expanded.
The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the wikiIT and wikiPT datasets no visits were associated with the pages.
9DATA
- SCC (Strongly Connected Component) includes pages that are mutually reachable by traveling on the graph.
- IN component is the region from which one can reach the SCC.
- OUT component encompasses the pages reached from the SCC.
- TENDRILS are pages reachable from the IN component and not pointing to the SCC or OUT region. TENDRILS also include those pages that point to the OUT region without belonging to any of the other defined regions.
- TUBES connect directly the IN and OUT regions.
- DISCONNECTED regions are those isolated from the rest.
The bow-tie structure, found in the WWW (Broder et al., Comp. Net. 33, 309, 2000).
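The component definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the Wikigraphs: the function names `reachable` and `bow_tie` and the toy edge list are assumptions, and TENDRILS, TUBES and DISCONNECTED pages are lumped together under a single "OTHER" label.

```python
from collections import defaultdict


def reachable(adj, sources):
    """Return every node reachable from `sources` by following `adj`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bow_tie(edges):
    """Split the nodes of a directed graph into SCC, IN, OUT and OTHER."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        fwd[u].append(v)  # links leaving u
        bwd[v].append(u)  # links entering v
        nodes.update((u, v))

    # Largest strongly connected component: the SCC of a seed node is the set
    # of nodes that both reach it and are reached from it. This is simple but
    # quadratic in the worst case, which is fine for a small illustration.
    scc, unassigned = set(), set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        comp = reachable(fwd, [seed]) & reachable(bwd, [seed])
        unassigned -= comp
        if len(comp) > len(scc):
            scc = comp

    in_part = reachable(bwd, scc) - scc   # pages that can reach the SCC
    out_part = reachable(fwd, scc) - scc  # pages reached from the SCC
    other = nodes - scc - in_part - out_part  # tendrils, tubes, disconnected
    return {"SCC": scc, "IN": in_part, "OUT": out_part, "OTHER": other}


# Hypothetical toy graph: a 3-cycle, one IN page, one OUT page, and leftovers.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"),
       ("c", "o"), ("i", "t"), ("x", "y")]
parts = bow_tie(toy)
```

On a real dump, a linear-time SCC algorithm (Tarjan or Kosaraju) would replace the seed-by-seed search.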
10DATA
The measure/size of the Wikigraph for the various
languages.
The percentage of the various components of the
Wikigraph for the various languages.
11DATA
The degree distributions show fat tails that can be approximated by a power-law function of the kind $P(k) \propto k^{-\gamma}$, where the exponent is the same for both in-degree and out-degree.
In the case of the WWW, $2 \le \gamma_{in} \le 2.1$.
Figure: in-degree (empty symbols) and out-degree (filled symbols) distributions for the Wikigraph in English and Portuguese.
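As a rough sketch of how such distributions could be measured from an edge list, the snippet below counts in- and out-degrees and estimates a power-law exponent. The estimator is the standard approximate maximum-likelihood formula for integer data, $\gamma \approx 1 + n / \sum_i \ln\big(k_i / (k_{min} - 1/2)\big)$; it is swapped in here for illustration and is not necessarily how the exponents in the slides were fitted. The function names and the cutoff `k_min` are assumptions.

```python
import math
from collections import Counter


def degree_distributions(edges):
    """Return (P_in, P_out): dicts mapping a degree k to the fraction of nodes."""
    indeg, outdeg = Counter(), Counter()
    nodes = set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    n = len(nodes)
    # A Counter returns 0 for nodes that never appear as a head (or a tail).
    hist_in = Counter(indeg[x] for x in nodes)
    hist_out = Counter(outdeg[x] for x in nodes)
    return ({k: c / n for k, c in hist_in.items()},
            {k: c / n for k, c in hist_out.items()})


def powerlaw_exponent(degrees, k_min):
    """Approximate maximum-likelihood estimate of gamma for degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical three-page example.
p_in, p_out = degree_distributions([("a", "b"), ("a", "c"), ("b", "c")])
```

The estimate depends on the chosen `k_min`, so in practice one would check how stable it is over a range of cutoffs.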
12DATA
As regards the assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
Figure: the average neighbour in-degree, computed along incoming edges, as a function of the in-degree, for the English and Portuguese Wikigraphs.
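The quantity described above can be computed as follows. This is a minimal sketch: the function name and the toy edge list are assumptions. For every page with in-degree k it averages the in-degree of the pages linking to it, then averages the result over all pages with the same k; a flat curve indicates no assortative behaviour.

```python
from collections import defaultdict


def avg_neighbor_indegree(edges):
    """Map each in-degree k to the mean in-degree of in-neighbours."""
    indeg = defaultdict(int)
    preds = defaultdict(list)  # preds[v] = sources of the edges entering v
    for u, v in edges:
        indeg[v] += 1
        preds[v].append(u)

    per_k = defaultdict(list)
    for v, sources in preds.items():
        k = indeg[v]
        # Average along incoming edges; pages never linked to count as 0.
        knn_v = sum(indeg[u] for u in sources) / k
        per_k[k].append(knn_v)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}


# Hypothetical toy graph.
knn = avg_neighbor_indegree([("a", "c"), ("b", "c"), ("c", "d"), ("a", "b")])
```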
13MODEL
- We introduced an evolution rule, similar to other rewiring models already considered.
- At each time step, a vertex is added to the network. It is connected to the existing vertices by M oriented edges; the direction of each edge is drawn at random:
- with probability R1, the edge leaves the new vertex, pointing to an existing one chosen with probability proportional to its in-degree;
- with probability R2, the edge points to the new vertex, and the source vertex is chosen with probability proportional to its out-degree;
- finally, with probability R3 = 1 - R1 - R2, the edge is added between existing vertices: the source vertex is chosen with probability proportional to its out-degree, while the destination vertex is chosen with probability proportional to its in-degree.
See for example Krapivsky, Rodgers and Redner, PRL 86, 5401 (2001).
14MODEL
At each time step one adds a node and M edges. For each edge:
1. with probability R1, the edge leaves the new node and points to an existing node chosen with probability proportional to its in-degree;
2. with probability R2, the edge points to the new node and leaves an existing node chosen with probability proportional to its out-degree;
3. with probability R3 = 1 - R1 - R2, the edge points to an existing node chosen with probability proportional to its in-degree and leaves an existing node chosen with probability proportional to its out-degree.
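The three rules can be turned into a short simulation. The sketch below is one possible reading, under stated assumptions: the function name `simulate`, the two-node seed graph, and the choice to pick targets among vertices that existed before the current step are all illustrative, and multiple edges or self-loops are not filtered out. Picking a random entry from the list of edge heads (or tails) selects a vertex with probability proportional to its in-degree (or out-degree).

```python
import random


def simulate(n_steps, M, R1, R2, seed=0):
    """Grow a directed graph with the three-rule model; return its edge list."""
    rng = random.Random(seed)
    # Assumed seed graph: two vertices linked in both directions, so that
    # both preferential choices are well defined from the first step.
    edges = [(0, 1), (1, 0)]
    tails = [0, 1]  # one entry per edge source: sampling ~ out-degree
    heads = [1, 0]  # one entry per edge target: sampling ~ in-degree

    for new in range(2, n_steps + 2):
        step_edges = []
        for _ in range(M):
            r = rng.random()
            if r < R1:
                # Rule 1: new vertex -> existing vertex (by in-degree).
                step_edges.append((new, rng.choice(heads)))
            elif r < R1 + R2:
                # Rule 2: existing vertex (by out-degree) -> new vertex.
                step_edges.append((rng.choice(tails), new))
            else:
                # Rule 3: existing (by out-degree) -> existing (by in-degree).
                step_edges.append((rng.choice(tails), rng.choice(heads)))
        # Register the step's edges only after all M have been drawn.
        for u, v in step_edges:
            edges.append((u, v))
            tails.append(u)
            heads.append(v)
    return edges


# Parameters quoted in the slides for the English Wikigraph.
graph = simulate(n_steps=200, M=10, R1=0.026, R2=0.091, seed=1)
```

The two degree lists grow by one entry per edge, so memory is linear in the number of edges.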
15MODEL
The parameters have a physical meaning and can be measured on real data. In the English case, for instance, this yields R1 ≈ 0.026 and R2 ≈ 0.091; in the data we have, M ≈ 10.
By approximating the discrete time variation by derivatives with respect to the continuous variable t, one can write and solve the following rate equations for the in- and out-degree:
$\frac{dk_{in}}{dt} = (R_1 + R_3)\,\frac{k_{in}}{t}, \qquad \frac{dk_{out}}{dt} = (R_2 + R_3)\,\frac{k_{out}}{t}$
16MODEL
By solving the rate equations, one obtains the time evolution and, with a little algebra, the distributions of the in- and out-degree.
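The formulas of this slide did not survive in the transcript. The following is a sketch of the standard continuum argument applied to the rate equations of the model. Treating the initial degree of a vertex born at time $t_0$ as a fixed constant $k_0$, and taking birth times as uniform in $[0, t]$, are simplifying assumptions introduced here.

```latex
\begin{align*}
% Rate equations of the model (total in- and out-degree both grow as M t)
\frac{dk_{in}}{dt} &= (R_1 + R_3)\,\frac{k_{in}}{t}, &
\frac{dk_{out}}{dt} &= (R_2 + R_3)\,\frac{k_{out}}{t} \\
% Solution for a vertex born at time t_0 with initial degree k_0
k_{in}(t) &= k_0 \left(\frac{t}{t_0}\right)^{R_1 + R_3}, &
k_{out}(t) &= k_0 \left(\frac{t}{t_0}\right)^{R_2 + R_3} \\
% Uniform birth times: fraction of vertices with in-degree larger than k
P\big(k_{in}(t) > k\big) &= P\!\left(t_0 < t \left(\tfrac{k_0}{k}\right)^{\frac{1}{R_1 + R_3}}\right)
  = \left(\frac{k_0}{k}\right)^{\frac{1}{R_1 + R_3}} \\
% Differentiating with respect to k gives the power-law distributions
P(k_{in}) &\propto k_{in}^{-\gamma_{in}}, &
\gamma_{in} &= 1 + \frac{1}{R_1 + R_3} = 1 + \frac{1}{1 - R_2} \\
P(k_{out}) &\propto k_{out}^{-\gamma_{out}}, &
\gamma_{out} &= 1 + \frac{1}{R_2 + R_3} = 1 + \frac{1}{1 - R_1}
\end{align*}
```

With the values quoted for the English case, R1 ≈ 0.026 and R2 ≈ 0.091, this gives $\gamma_{in} \approx 1 + 1/0.909 \approx 2.10$ and $\gamma_{out} \approx 1 + 1/0.974 \approx 2.03$.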
17CONCLUSION
- We have a structure that resembles the bow-tie of the WWW.
- We have a power-law decay for the degree distributions and also a power-law decay for the number of updates of a single page.
- Preferential attachment in the rewiring seems to be the driving force in the evolution of the system.
- The microscopic structure of rewiring is very different from that of the WWW.
- In principle a user can change any series of edges and add as many pages as wanted. Still, most of the quantities are similar.
18INCOMING CONFERENCE
19INCOMING CONFERENCE
Uri Alon, Joseph Stiglitz (tbc), Alessandro Vespignani, Alain Barrat, Ginestra Bianconi, Dirk Brockmann, Debora Donato, James Fowler, Kwang-Il Goh (tbc), Shlomo Havlin, Dirk Helbing, Matthew O. Jackson, János Kertész (tbc), Amos Maritan, José Fernando Mendes, Luciano Pietronero, Frank Schweitzer, H. Eugene Stanley, Marc Vidal
20Thanks!
http://www.guidocaldarelli.com