Title: Lotkaian Informetrics and applications to social networks
1 Lotkaian Informetrics and applications to social networks
- L. Egghe
- Chief Librarian, Hasselt University; Professor, Antwerp University; Editor-in-Chief, Journal of Informetrics
- leo.egghe_at_uhasselt.be
2 1-dimensional informetrics
- authors in a field
- journals in a field
- articles in a field
- references (or citations) in a field
- borrowings in a library
- websites, hosts, ...
- web citations to a paper
- in- (or out-) links to/from a website
- downloads of an article
3 Growth
- Exponential growth
-
- All new fields grow exponentially
- Otherwise there is S-shaped growth.
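The contrast between the two growth types can be sketched numerically. The logistic curve below is only one common choice of S-shaped model and is an assumption here, as are all parameter values (`c`, `a`, `cap`); the slide does not say which S-shaped function is meant.

```python
def exponential(t, c=1.0, a=1.5):
    """Exponential growth c * a**t with growth rate a > 1 (illustrative values)."""
    return c * a ** t


def logistic(t, cap=1000.0, c=1.0, a=1.5):
    """Logistic (S-shaped) growth: starts at c, grows roughly like c * a**t
    at first, then saturates at the hypothetical ceiling cap."""
    return cap / (1.0 + (cap / c - 1.0) * a ** (-t))


# Print both curves side by side: they agree early on and diverge later.
for t in range(0, 31, 5):
    print(t, round(exponential(t), 1), round(logistic(t), 1))
```

Early values of the two curves are close; the logistic curve then bends over towards `cap` while the exponential keeps growing.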
4 web servers versus time
7 2-dimensional informetrics
- authors in a field (sources)
- articles in a field (items)
- indicating which author has written which papers
- S = set of sources
- I = set of items
- IPP = Information Production Process
8 Examples of IPPs
S (sources)         F    I (items)
Authors             →    Articles
Journals            →    Articles
Articles            →    Citations (to/from)
Books               →    Borrowings
Words (= types)     →    Use of words in a text (= tokens)
Web sites           →    Hyperlinks (in-/out-)
Web sites           →    Web pages
Cities/villages     →    Inhabitants
Employees           →    Their production
Employees           →    Their salaries
9
- size-frequency function f
- for n = 1, 2, 3, ...: f(n) = number of sources with n items
- rank-frequency function g
- for r = 1, 2, 3, ...: g(r) = number of items in the source on rank r
- (sources are ranked in decreasing order of the number of items they have)
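Both functions can be read off directly from a table of items per source. The following is a minimal sketch; the dictionary `production` and its values are made up for illustration.

```python
from collections import Counter


def size_frequency(items_per_source):
    """Map n -> number of sources that have exactly n items."""
    counts = Counter(items_per_source.values())
    return dict(sorted(counts.items()))


def rank_frequency(items_per_source):
    """List whose entry r-1 is the number of items of the source on rank r,
    with sources ranked in decreasing order of their number of items."""
    return sorted(items_per_source.values(), reverse=True)


# Hypothetical data: author -> number of articles
production = {"A": 5, "B": 1, "C": 2, "D": 1, "E": 1, "F": 2}
f = size_frequency(production)
g = rank_frequency(production)
print(f)
print(g)
```

Both descriptions carry the same total number of items: summing n times f(n) over n gives the same value as summing g(r) over r.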
10 Continuous model
- Source densities
- Item densities
11 Lotkaian Informetrics
- The law of Lotka and the law of Zipf
- Lotka (1926): $f(n) = C / n^{\alpha}$. The value $\alpha = 2$ is a turning point in informetrics (see further).
12
- Lotka's law is equivalent to Zipf's law.
- Linguistics: Zipf's law; in econometrics it is called Pareto's law.
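The equivalence can be made explicit in the continuous setting. The derivation below is a sketch under assumptions not stated on the slide: the size-frequency function is $f(j) = C / j^{\alpha}$ for $j \ge 1$ with $\alpha > 1$, there is no upper bound on $j$, and $g$, $B$ and $\beta$ are names chosen here for the rank-frequency function and its parameters.

```latex
% The rank r of a source with item density j is the number of sources
% with density at least j:
r = \int_{j}^{\infty} \frac{C}{x^{\alpha}}\,dx
  = \frac{C}{\alpha - 1}\, j^{\,1-\alpha}
\quad\Longrightarrow\quad
g(r) = j = \frac{B}{r^{\beta}},
\qquad
\beta = \frac{1}{\alpha - 1},
\qquad
B = \left(\frac{C}{\alpha - 1}\right)^{\beta}.
```

Under these assumptions a power law in the sizes corresponds to a power law in the ranks, and $\alpha = 2$ corresponds to $\beta = 1$.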
13 Dependence of G on [...]. Existence of a Groos droop if [...].
14 log-log scale
- decreasing straight line with slope
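On a log-log scale a power law $y = C / x^{\alpha}$ becomes the line $\log y = \log C - \alpha \log x$, so the exponent can be estimated from the slope. The sketch below uses ordinary least squares on the logarithms, which is only one possible fitting method and not necessarily the one behind the figures in this talk; the data are synthetic.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic power law with hypothetical C = 1000 and exponent 2.3
ns = range(1, 51)
fs = [1000.0 / n ** 2.3 for n in ns]
slope, intercept = loglog_slope(ns, fs)
print(slope, math.exp(intercept))
```

For exact power-law data the fitted slope is the negative of the exponent and the exponential of the intercept recovers the constant; real data are noisy, especially in the tail, so the estimate should be read with care.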
15 Rank-frequency distributions for websites
16 The scale-free property
17 Theorem: (i) ⇔ (ii)
- (i) f is continuous, decreasing and scale-free
- (ii) f is a decreasing power function: there exist $C > 0$ and $\alpha > 0$ such that $f(x) = C / x^{\alpha}$,
- i.e. Lotka's law
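The direction from (ii) to (i) is a one-line check. The sketch assumes the usual definition of scale-free (rescaling the argument only rescales the function), which is not spelled out in this transcript; $\lambda$ and $\mu$ are names chosen here.

```latex
% Assumed definition: f is scale-free if for every \lambda > 0 there is a
% \mu > 0 such that f(\lambda x) = \mu f(x) for all x.
f(x) = \frac{C}{x^{\alpha}}
\quad\Longrightarrow\quad
f(\lambda x) = \frac{C}{(\lambda x)^{\alpha}}
             = \lambda^{-\alpha}\, f(x),
\qquad\text{so } \mu = \lambda^{-\alpha}.
```

The converse direction needs the continuity assumption in (i).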
18
- Explanation of Lotka's law based on exponential growth of sources and items (Naranan (1970)) and an interpretation of Lotkaian IPPs as self-similar fractals (Egghe (2005))
- Fractals and fractal dimension
19
- Divide a line piece into 3 equal parts
- ⇒ we need $3 = 3^1$ line pieces of this length to cover the original line piece
- division by 3 ⇒ need $3 = 3^1$ ⇒ dim = 1
20
- Divide the sides of a square into 3 equal parts ⇒ we need $9 = 3^2$ squares with this side length to cover the original square
- division by 3 ⇒ need $9 = 3^2$ ⇒ dim = 2
- The same for a cube:
- division by 3 ⇒ need $27 = 3^3$ ⇒ dim = 3
21 Construction of the triadic Koch curve
22
- For the triadic Koch curve:
- division by 3 ⇒ need $4 = 3^D$ ⇒ dim = D
- with $D = \ln 4 / \ln 3 \approx 1.262$
- The Koch curve is a proper fractal with fractal dimension D.
- Complexity theory, fractal theory: Mandelbrot
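The four cases above follow one rule: if dividing the scale by s requires N pieces, the dimension D solves N = s^D. A minimal sketch (the function name is chosen here for illustration):

```python
import math


def similarity_dimension(pieces, scale):
    """Solve pieces = scale**D for D, i.e. D = ln(pieces) / ln(scale)."""
    return math.log(pieces) / math.log(scale)


# (number of pieces needed, division factor) for each construction
shapes = {"line": (3, 3), "square": (9, 3), "cube": (27, 3), "Koch curve": (4, 3)}
dims = {name: similarity_dimension(n, s) for name, (n, s) in shapes.items()}
for name, d in dims.items():
    print(name, round(d, 4))
```

Only the Koch curve yields a non-integer value, ln 4 / ln 3.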
23 Naranan (Nature, 1970)
- Theorem:
- (i) The number of sources grows exponentially in time t.
- (ii) The number of items in each source grows exponentially in time.
- (iii) The growth rate in (ii) is the same for every source.
- (ii) and (iii) together imply a fixed exponential function for the number of items in each source at time t.
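24
- Then this IPP is Lotkaian, i.e. the law of Lotka applies: if f(p) denotes the number of sources with p items, we have $f(p) = C / p^{\alpha}$, where $\alpha = 1 + \ln a_1 / \ln a_2$, with $a_1$ the growth rate of the sources in (i) and $a_2$ the growth rate of the items in (ii).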
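A short derivation shows how the power law arises from the three assumptions. It is a sketch in a continuous-time setting; the symbols $N(t)$, $c_1$, $c_2$, $a_1$, $a_2$, $T$ and $K$ are names chosen here and do not come from the transcript.

```latex
% (i)        sources:           N(t) = c_1 a_1^{t},  a_1 > 1
% (ii),(iii) items of a source of age \tau:  p = c_2 a_2^{\tau},  a_2 > 1
% At observation time T, a source born at time t has age T - t, so
p = c_2\, a_2^{\,T - t}
\;\Longrightarrow\;
t = T - \frac{\ln (p / c_2)}{\ln a_2},
\qquad
\left|\frac{dt}{dp}\right| = \frac{1}{p \ln a_2}.
% Sources born in [t, t + dt]: N'(t)\,dt = c_1 \ln a_1 \, a_1^{t}\,dt, hence
f(p) = c_1 \ln a_1 \; a_1^{\,T - \ln(p/c_2)/\ln a_2}\; \frac{1}{p \ln a_2}
     = \frac{K}{p^{\,1 + \ln a_1 / \ln a_2}},
% with K collecting all factors that do not depend on p.
```

The exponent depends only on the ratio of the logarithms of the two growth rates, not on the constants $c_1$, $c_2$ or on the observation time $T$.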
25 Egghe (2005) (Book and JASIST)
- (i) The number of line pieces grows exponentially in time t, here proportional to $4^t$.
- (ii), (iii) 1/length of each line piece grows exponentially in time t, with the same growth rate 3 for every line piece. Hence we have growth proportional to $3^t$.
26
- Rephrased in terms of informetrics:
- a (Lotkaian) IPP is a self-similar fractal and its fractal dimension is given by the logarithm of the growth rate of the sources, divided by the logarithm of the growth rate of the items (which can be > or < 1).
- Hence, the exponent $\alpha$ in Lotka's law satisfies the important relation $\alpha = D + 1$, with D this fractal dimension.
- This result was seen earlier by Mandelbrot, but only in the context of (artificial) random texts (hence in linguistics).
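The relation between growth rates, fractal dimension and Lotka exponent can be tried out numerically. The sketch assumes the relation exponent = dimension + 1; the function names and the second pair of growth rates are hypothetical.

```python
import math


def fractal_dimension(source_growth_rate, item_growth_rate):
    """Logarithm of the source growth rate divided by that of the item growth rate."""
    return math.log(source_growth_rate) / math.log(item_growth_rate)


def lotka_exponent(source_growth_rate, item_growth_rate):
    """Assumed relation: Lotka exponent = fractal dimension + 1."""
    return 1.0 + fractal_dimension(source_growth_rate, item_growth_rate)


# Koch-curve analogy: "sources" grow like 4**t, "items" like 3**t
koch_alpha = lotka_exponent(4, 3)
# Hypothetical IPP whose sources grow more slowly than its items: dimension < 1
slow_dim = fractal_dimension(1.1, 1.2)
print(round(koch_alpha, 4), round(slow_dim, 4))
```

When sources grow more slowly than items, the dimension drops below 1 and the exponent below 2; when they grow faster, both lie above these values.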
27 Further applications of Lotkaian Informetrics
- Concentration theory (inequality theory): Lorenz curves (cf. econometrics).
- Egghe (2005) (Book, Chapter IV).
- Fractional modelling of authorship (case of multi-authored articles): determine the number of authors with a given number of articles
- (fractional counting: an author in an m-authored paper receives a score of $1/m$).
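Fractional counting is easy to illustrate on a toy author list. The sketch below uses exact fractions so that scores such as 1/2 + 1/3 are not blurred by rounding; the paper list and author names are invented.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def fractional_scores(papers):
    """Give each author of an m-authored paper a score of 1/m and sum per author."""
    scores = defaultdict(Fraction)  # Fraction() is 0
    for authors in papers:
        share = Fraction(1, len(authors))
        for author in authors:
            scores[author] += share
    return dict(scores)


def fractional_frequency(scores):
    """Map each fractional score q -> number of authors with score q."""
    return dict(sorted(Counter(scores.values()).items()))


# Hypothetical papers, each given as its list of authors
papers = [["Ann"], ["Ann", "Bob"], ["Bob", "Cy", "Dee"], ["Cy"]]
scores = fractional_scores(papers)
freq = fractional_frequency(scores)
print(scores)
print(freq)
```

The scores always add up to the number of papers, which is the main reason for counting this way.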
28 Theoretical and experimental fractional frequency distributions (case of i = 4).
29
- Dynamics of Lotkaian IPPs, described via transformations on the sources and on the items: includes the description of dynamics of networks.
- Relations with 3-dimensional informetrics. See new journal:
- L. Egghe. General evolutionary theory of IPPs and applications to the evolution of networks. Journal of Informetrics 1(2), 115-122, 2007.
30
- Item transformation
- Source transformation
- New rank-frequency function
31
- Theorem: New size-frequency function
- where [...]
32
- Case [...] is an example of linear 3-dimensional informetrics:
- Sources1 → Items1 = Sources2 → Items2
- Examples:
- Webpages → hyperlinks → use of hyperlinks
- Library subject categories → books → borrowings
- See further.
- Back to the general case.
33
- Power law transformations in Lotkaian IPPs
34
- Theorem
- [...] is only dependent on b/c, due to the scale-free nature of Lotkaian systems.
35
- Corollary
- With this, one can study the evolution of an IPP, e.g. a part of the WWW: V. Cothey (2007) confirms the theory, except in one case where non-Lotkaian evolution is found, probably due to automatic creation of web pages (deviation from a social network).
36
- Further application:
- IPPs without low productive sources
- (Egghe and Rousseau (2006))
- Take [...]: sources remain but they grow in number of items
- Now [...]
37
- and [...] (since [...])
- Evolution: decreasing Lotka exponent and no low productive sources
38 Examples
- Country sizes: data from www.gazetteer.de (July 10, 2005), 237 countries: 1.69 (best fit)
- Municipalities in Malta (1997 data), 67 municipalities: 1.12 (best fit)
- Database sizes on the topic "fuzzy set theory" (20 largest databases on this topic) (Hood and Wilson (2003)): 1.09 (best fit)
- Unique documents in databases (20 databases above): 1.33 (best fit).
39
- Application of Lotka's law to the modelling of the cumulative first-citation distribution, i.e. the distribution over time at which an article receives its first citation.
40
- The time t1 at which an article receives its first citation is an important indicator of the visibility of research.
- At t1 the article switches its status from unused to used.
- t1 is a measure of immediacy but, of course, different from the immediacy index (Thomson Scientific).
41
- The distribution of t1 over a group of articles is the topic of the present study. We will study the cumulative first-citation distribution:
- the cumulative fraction of all papers that have, at t1, at least 1 citation.
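The empirical version of this distribution can be computed directly from first-citation times. In the sketch below, time is measured in whole periods after publication and `None` marks a paper that was never cited; both conventions and the sample data are assumptions made for illustration.

```python
def cumulative_first_citation(first_citation_times, horizon):
    """For t = 0..horizon, return the fraction of ALL papers (cited or not)
    that have received their first citation by time t."""
    total = len(first_citation_times)
    cited = [t for t in first_citation_times if t is not None]
    return [sum(1 for c in cited if c <= t) / total for t in range(horizon + 1)]


# Hypothetical first-citation times for 8 papers; None = never cited
times = [1, 2, 2, 5, None, 3, None, 1]
phi = cumulative_first_citation(times, 5)
print(phi)
```

The curve is non-decreasing and levels off at the fraction of papers that are ever cited, here 6 out of 8.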
42
- Rousseau (1994) uses two different differential equations to model two types of graphs: a concave one and an S-shaped one. These equations are not explained and are not linked to any informetric distribution.
43
- In Egghe (2000), I use only 2 elementary informetric tools:
- the density function of citations to an article, t time after its publication (exponential),
- the density function of the number of papers with A citations in total (Lotka) (only ever cited papers are used here).
44
- Normalizing to distributions:
- [...] becomes [...] for an article with A citations in total
- [...] becomes [...], but we will use [...] ($\gamma$ = the fraction of ever cited articles), in order to also include the never cited articles.
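45
- Theorem: $\Phi(t_1) = \gamma \, (1 - a^{t_1})^{\alpha - 1}$, with $\Phi$ the cumulative first-citation distribution, $0 < a < 1$ the rate of the exponential citation density, $\alpha$ the Lotka exponent and $\gamma$ the fraction of ever cited articles. $\Phi$ is
- concave if $1 < \alpha \le 2$
- S-shaped if $\alpha > 2$,
- hence explaining both shapes in one model.
- Note the turning point $\alpha = 2$.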
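The two shapes can be checked numerically. The sketch assumes the model $\Phi(t_1) = \gamma (1 - a^{t_1})^{\alpha - 1}$ with $0 < a < 1$, $\alpha > 1$ and $0 < \gamma \le 1$; this closed form and all parameter values are assumptions made for the example. A curve is classified as S-shaped when its second differences are positive somewhere, meaning it starts out convex.

```python
def phi(t1, gamma, a, alpha):
    """Assumed model of the cumulative first-citation distribution."""
    return gamma * (1.0 - a ** t1) ** (alpha - 1.0)


def shape(gamma, a, alpha, t_max=40.0, steps=400, tol=1e-12):
    """Classify the curve on (0, t_max) using central second differences."""
    h = t_max / steps
    for k in range(1, steps):
        t = h * k
        second = (phi(t + h, gamma, a, alpha)
                  - 2.0 * phi(t, gamma, a, alpha)
                  + phi(t - h, gamma, a, alpha))
        if second > tol:  # a convex stretch exists
            return "S-shaped"
    return "concave"


for alpha in (1.5, 2.0, 3.0):
    print(alpha, shape(gamma=0.8, a=0.8, alpha=alpha))
```

With these parameters the exponents 1.5 and 2.0 give a concave curve and 3.0 gives an S-shaped one; every curve tends to `gamma` for large t1.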
46
- Proof: A first citation is received if
- [...] (*)
- ⇒ Cumulative fraction of all articles that are already cited at time t1:
- [...] (**)
- ⇒ (*) into (**) yields [...]
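One way to fill in the steps of this argument is sketched below. It assumes an exponential citation density with rate $0 < a < 1$, a Lotka density normalized on $A \ge 1$ with exponent $\alpha > 1$, and a fraction $\gamma$ of ever cited articles; the symbols and normalizations are choices made here and may differ from those on the original slides.

```latex
% Citation density of an article with A citations in total:
c(t) = A\,(-\ln a)\,a^{t}
\;\Longrightarrow\;
\int_{0}^{t_1} c(t)\,dt = A\,\bigl(1 - a^{t_1}\bigr).
% A first citation has been received by time t_1 if
A\,\bigl(1 - a^{t_1}\bigr) \ge 1
\;\Longleftrightarrow\;
A \ge A_0 := \frac{1}{1 - a^{t_1}}. \tag{$*$}
% Cumulative fraction of all articles already cited at time t_1:
\Phi(t_1) = \gamma \int_{A_0}^{\infty} \frac{\alpha - 1}{A^{\alpha}}\,dA
          = \gamma\, A_0^{\,1 - \alpha}. \tag{$**$}
% Substituting (*) into (**):
\Phi(t_1) = \gamma\,\bigl(1 - a^{t_1}\bigr)^{\alpha - 1}.
```

Differentiating twice shows that the sign of $\Phi''(t_1)$ equals the sign of $(\alpha - 1)\,a^{t_1} - 1$, which is negative for all $t_1 > 0$ when $\alpha \le 2$ and positive for small $t_1$ when $\alpha > 2$.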
47 Motylev (1981)
48
49 Rousseau (1994)
JACS to JACS data of Rousseau. Time unit: 2 weeks, 4-year period.