Title: Connectivity and the Small World
1Connectivity and the Small World
- Overview
- Background
- de Sola Pool and Kochen
- Random and Biased Networks
- Rapoport's work on diffusion
- Travers and Milgram
- Argument
- Method
- Watts
- Argument
- Findings
- Methods
- Biased Networks
- Reachability Curves
- Calculating L and C
2Connectivity and the Small World
Work on connectivity started by asking the probability that any two
people would know each other. This was extended to the
probability that people could be connected
through paths of 2, 3, ..., k steps.
This was linked to diffusion processes: if people can reach others,
then their diseases can reach them as well, and
we can use the structure of the network to model
the disease.
The reachability structure was captured by comparing observed curves
with those of a random network, which we will do later today.
3Connectivity and the Small World
Travers and Milgram's work on the small world is
responsible for the standard belief that
everyone is connected by a chain of about 6
steps.
Two questions:
1) Given what we know about networks,
what is the longest path (defined by handshakes)
that separates any two people?
2) Is 6 steps a long distance or a short distance?
4Longest Possible Path: Two hermits on opposite sides of the country
OH Hermit - Store Owner - Truck Driver - Manager - Corporate Manager - Corporate President - Congress Rep. - Congress Rep. - Corporate President - Corporate Manager - Manager - Truck Driver - Store Owner - Mt. Hermit
About 12-13 steps.
5What if everyone maximized structural holes?
If associates do not know each other, every contact leads only to new
people. This results in an exponential growth curve, and the chain
reaches the entire planet quickly.
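The arithmetic behind that growth curve can be sketched as follows. This is a minimal illustration, assuming every person has exactly k acquaintances and none of them overlap; the function name, k = 100, and the population figure of 6 billion are illustrative assumptions, not values from the slides.

```python
def steps_to_reach(population, k):
    """Steps needed to reach `population` people when no contacts overlap.

    Assumes each person knows exactly k others and no two of them know
    each other (everyone maximizes structural holes).
    """
    if k < 2:
        raise ValueError("k must be at least 2 for the chain to keep growing")
    reached = 0      # cumulative number of people reached
    frontier = 1     # people newly reached at the previous step (ego at step 0)
    steps = 0
    while reached < population:
        # Ego passes to k contacts; everyone after that passes to k - 1 new ones,
        # because one of their k ties points back to whoever reached them.
        frontier *= k if steps == 0 else k - 1
        reached += frontier
        steps += 1
    return steps


if __name__ == "__main__":
    print(steps_to_reach(6_000_000_000, 100))
```

Under these assumptions the number reached grows roughly as k(k - 1)^(d - 1) at step d, which is why a handful of steps is enough.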
6Random graph theory shows us that we could reach
people quite quickly if ties were random.
What if people know each other randomly?
7Random Reachability: by number of close friends
8Milgram's test: Send a packet from sets of
randomly selected people to a stockbroker in
Boston.
Experimental Setup
Arbitrarily select people from 3 pools:
a) People in Boston
b) Random in Nebraska
c) Stockholders in Nebraska
9Milgram's Findings: Distance to target person,
by sending group.
10Most chains found their way through a small
number of intermediaries.
What do these two findings tell us of the global
structure of social relations?
11Milgram's Findings: Length of completed chains
12Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
Asks why we see the small world pattern and what
implications it has for the dynamical properties
of social systems. His contribution is to show
that globally significant changes can result from
locally insignificant network change.
13Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
Watts says there are 4 conditions that make the
small world phenomenon interesting:
1) The network is large - O(billions)
2) The network is sparse - people are connected to a
small fraction of the total network
3) The network is decentralized - no single star (or small
number of stars)
4) The network is highly clustered -
most friendship circles are overlapping
14Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
Formally, we can characterize a graph through 2
statistics.
- 1) The characteristic path length, L
- The average length of the shortest paths connecting any two actors.
- (Note: this only works for connected graphs.)
- 2) The clustering coefficient, C
- Version 1: the average local density. That is, $C_v$ = ego-network density of actor v, and $C = \frac{1}{n}\sum_v C_v$.
- Version 2: the transitivity ratio. Number of closed triads divided by the number of closed and open triads.
- A small world graph is any graph with a relatively small L and a relatively large C.
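The two statistics can be computed directly from an adjacency list. The sketch below is one plain-Python way to do it, not the method used for the slides; the function names and the toy graph are assumptions for illustration. For Version 2 it counts paths a - v - b centred on each actor v, so a triangle counts three times, which is the usual way the transitivity ratio is computed.

```python
from collections import deque
from itertools import combinations


def path_length(adj):
    """Characteristic path length L: mean shortest path over all pairs.

    Assumes an undirected, connected graph given as {node: set(neighbours)}.
    """
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def local_density_c(adj):
    """Version 1: mean ego-network density C_v over all actors."""
    densities = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            densities.append(0.0)          # convention: no neighbour pairs to tie
            continue
        ties = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        densities.append(ties / (k * (k - 1) / 2))
    return sum(densities) / len(densities)


def transitivity_c(adj):
    """Version 2: closed two-paths divided by all two-paths."""
    closed = connected = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            connected += 1                 # path a - v - b exists
            if b in adj[a]:
                closed += 1                # tie a - b closes the triad
    return closed / connected


# Toy graph: a triangle (1, 2, 3) with one pendant actor (4) attached to 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

if __name__ == "__main__":
    print(path_length(example), local_density_c(example), transitivity_c(example))
```

The two versions of C generally differ: the local-density version weights every actor equally, while the transitivity version gives more weight to high-degree actors.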
15The most clustered graph is Watts's Caveman
graph
16Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
C and L as functions of k for a Caveman graph of
n = 1000
17Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
Compared to random graphs, C is large and L is
long. The intuition, then, is that clustered
graphs tend to have (relatively) long
characteristic path lengths. But the small world
phenomenon rests on just the opposite: high
clustering and short path distances. How is this
so?
18Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
A model for pair formation, as a function of
mutual contacts.
Using this equation, varying α produces networks that
range from completely ordered to completely
random. (Mij is the number of friends in common,
p is a baseline probability of a tie, and k is
the average degree of the graph.)
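The equation itself did not survive on this slide. Watts's α-model is usually written as below; treat this as a reconstruction from the symbols defined above rather than the slide's own rendering. The label $R_{ij}$ (the propensity of i and j to form a tie) is an assumed name.

```latex
R_{ij} =
\begin{cases}
1, & M_{ij} \ge k \\[4pt]
\left[\dfrac{M_{ij}}{k}\right]^{\alpha} (1 - p) + p, & k > M_{ij} > 0 \\[4pt]
p, & M_{ij} = 0
\end{cases}
```

Read this way, a small α means even one mutual friend makes a tie nearly certain (the ordered extreme), while a very large α makes mutual friends almost irrelevant, so ties form at the baseline rate p (the random extreme).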
19Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
A model for pair formation, as a function of
mutual contacts.
20Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
C is large, L is small: SW graphs
21Why does this work? The key is the fraction of shortcuts
in the network.
In a highly clustered, ordered network, a single
random connection will create a shortcut that
lowers L dramatically.
Watts demonstrates that small world graphs occur
in graphs with a small number of shortcuts.
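The effect of shortcuts on L can be checked on a small ordered graph. The sketch below assumes a ring lattice as the ordered network and adds four hand-picked long-range ties; the graph size, degree, and shortcut endpoints are illustrative assumptions, not values from Watts.

```python
from collections import deque


def ring_lattice(n, k):
    """Ordered graph: each node tied to its k nearest neighbours on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_path_length(adj):
    """Average shortest-path length over all pairs (connected graph assumed)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def add_shortcuts(adj, shortcuts):
    """Return a copy of adj with the given long-range ties added."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for a, b in shortcuts:
        new[a].add(b)
        new[b].add(a)
    return new


lattice = ring_lattice(200, 4)
shortcut_graph = add_shortcuts(lattice, [(0, 100), (50, 150), (25, 125), (75, 175)])

if __name__ == "__main__":
    print("L without shortcuts:", round(mean_path_length(lattice), 2))
    print("L with 4 shortcuts: ", round(mean_path_length(shortcut_graph), 2))
```

Four ties out of 400 barely change local clustering, since almost every actor's neighbourhood is untouched, yet they shorten paths between distant parts of the ring.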
22Empirical Examples
1) Movie network (actors through movies): Lo/Lr = 1.22, Co/Cr = 2925
2) Western Power Grid: Lo/Lr = 1.50, Co/Cr = 16
3) C. elegans: Lo/Lr = 1.17, Co/Cr = 5.6
23What are the substantive implications? Return
to the initial interest in connectivity: disease
diffusion.
1) Diseases move more slowly in highly clustered
graphs (fig. 11) - not a new finding.
2) The dynamics are very non-linear, with no clear
pattern based on local connectivity.
Implication: small local changes (shortcuts) can
have dramatic global outcomes (disease diffusion).
24Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
How do we know if an observed graph fits the SW
model?
Random expectations: For basic one-mode networks
(such as acquaintance nets), we can get
approximate random values for L and C as
$L_{random} \approx \ln(n) / \ln(k)$
$C_{random} \approx k / n$
as k and n get large. Note that $C_{random}$ essentially
approaches zero as n increases when k is assumed
fixed. This formula uses the density-based
measure of C.
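These random expectations are easy to turn into the observed-to-random ratios used to judge a small world. A minimal sketch follows; the function names are assumptions, and the observed values in the usage lines are made-up placeholders, not data.

```python
import math


def random_expectations(n, k):
    """Approximate L and C for a random graph with n nodes and mean degree k."""
    l_random = math.log(n) / math.log(k)
    c_random = k / n
    return l_random, c_random


def small_world_ratios(l_obs, c_obs, n, k):
    """Return (L_obs / L_random, C_obs / C_random) for an observed graph."""
    l_random, c_random = random_expectations(n, k)
    return l_obs / l_random, c_obs / c_random


if __name__ == "__main__":
    # Hypothetical observed values, for illustration only.
    l_ratio, c_ratio = small_world_ratios(l_obs=3.6, c_obs=0.5, n=1000, k=10)
    print(l_ratio, c_ratio)
```

A graph fits the small world pattern when the L ratio is close to 1 while the C ratio is much larger than 1.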
25Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
How do we know if an observed graph fits the SW
model?
One problem with using the simple formulas for
most extant data on large graphs is that the
data result from people overlapping in
groups/movies/publications, so some clustering
necessarily results from the assignment to groups.
           G1  G2  G3  G4  G5
Amy         1   0   1   0   0
Billy       0   1   0   1   0
Charlie     0   1   0   1   0
Debbie      1   0   0   0   0
Elaine      1   0   1   0   1
Frank       0   1   0   1   0
George      0   1   0   1   0
. . . . LINES CUT . . . .
William     0   1   0   0   0
Xavier      0   1   0   1   0
Yolanda     1   0   1   0   0
Zanfir      0   1   1   1   1
Total      12  14   9  14   5
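Why group assignment forces clustering can be seen by projecting a person-by-group table onto a person-by-person network. The sketch below uses only the first seven rows visible in the table above (the cut rows are not guessed at) and assumes two people are tied whenever they share at least one group; the function names are illustrative.

```python
from itertools import combinations

# First seven visible rows of the person-by-group table (columns G1..G5).
membership = {
    "Amy":     [1, 0, 1, 0, 0],
    "Billy":   [0, 1, 0, 1, 0],
    "Charlie": [0, 1, 0, 1, 0],
    "Debbie":  [1, 0, 0, 0, 0],
    "Elaine":  [1, 0, 1, 0, 1],
    "Frank":   [0, 1, 0, 1, 0],
    "George":  [0, 1, 0, 1, 0],
}


def project(table):
    """One-mode projection: tie two people if they share any group."""
    adj = {person: set() for person in table}
    for a, b in combinations(table, 2):
        if any(x and y for x, y in zip(table[a], table[b])):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def transitivity(adj):
    """Closed two-paths divided by all two-paths."""
    closed = connected = 0
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            connected += 1
            if b in adj[a]:
                closed += 1
    return closed / connected if connected else 0.0


people = project(membership)

if __name__ == "__main__":
    print(transitivity(people))
```

Every group becomes a clique in the projection, so these rows yield fully closed triads without any tendency for friends of friends to become friends. That is the clustering the simple random formulas fail to account for.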
26Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
How do we know if an observed graph fits the SW
model?
Newman, M. E. J., Strogatz, S. H., and Watts, D.
J. (2001). "Random graphs with arbitrary degree
distributions and their applications." Phys. Rev.
E.
This paper extends the formulas for expected
clustering and path length using a generating
functions approach, making it possible to
calculate E(C,L) for graphs with any degree
distribution. Importantly, this procedure also
makes it possible to account for clustering in a
two-mode graph caused by the distribution of
assignment to groups.
27Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
How do we know if an observed graph fits the SW
model?
Newman, M. E. J., Strogatz, S. H., and Watts, D.
J. (2001). "Random graphs with arbitrary degree
distributions and their applications." Phys. Rev.
E.
Where N is the size of the graph, z1 is the
average number of people 1 step away (degree) and
z2 is the average number of people 2 steps away.
Theoretically, these formulas can be used to
calculate many properties of the network,
including largest component size, based on degree
distributions.
A word of warning: the math in
these papers is not simple. Sharpen your calculus
pencil before reading the paper.
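The formulas on this slide did not survive. For path length, the estimate from Newman, Strogatz and Watts is usually written as L = ln(N / z1) / ln(z2 / z1) + 1, which uses exactly the three quantities defined above. The sketch below assumes that is the formula the slide showed; the function name is illustrative.

```python
import math


def nsw_path_length(n, z1, z2):
    """Expected path length from graph size n and mean numbers of
    first (z1) and second (z2) neighbours.

    Assumed form: L = ln(n / z1) / ln(z2 / z1) + 1.
    Requires z2 > z1 > 0, otherwise the network does not keep expanding.
    """
    if not (z2 > z1 > 0):
        raise ValueError("need z2 > z1 > 0")
    return math.log(n / z1) / math.log(z2 / z1) + 1


if __name__ == "__main__":
    # In a simple random graph z2 is roughly z1 squared, and the estimate
    # reduces to ln(n) / ln(z1), the basic random expectation for L.
    print(nsw_path_length(1000, 10, 100))
```

When clustering makes many second neighbours redundant, z2 falls below z1 squared and the expected path length rises accordingly.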
28Duncan Watts Networks, Dynamics and the
Small-World Phenomenon
How do we know if an observed graph fits the SW
model?
Since C is just the transitivity ratio, there are
a number of good formulas for calculating the
expected value. Using the ratio of complete to
(incomplete + complete) triads, we can use the
expected values from the triad distribution in
PAJEK for a simple graph, or we can use the
expected value conditional on the dyad types (if
we have directed data) using the formulas in SPAN
and WF.
29The line of work most closely related to the
small world is that on biased and random
networks. Recall the reachability curves in a
random graph.
30For a random network, we can estimate the trace
curves with the following equation:
Where Pi is the proportion of the population
newly contacted at step i, Xi is the cumulative
number contacted by step i, and a is the mean
number of contacts people have. This model
describes the reach curves for a random network.
The model is based on a, which (essentially)
tells us how many new people we will reach from
the new people we just contacted. This is based
on the assumption that people's friends know each
other at a simple random rate.
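The equation is missing from this slide. The standard form of Rapoport's tracing recursion, which matches the symbols defined above, is given below as a reconstruction; it treats $X_i$ as a cumulative proportion of the population.

```latex
P_{i+1} = \left(1 - X_i\right)\left(1 - e^{-a P_i}\right),
\qquad
X_{i+1} = X_i + P_{i+1}
```

The first factor is the share of the population not yet reached; the second is the chance that a given person is named by at least one of the newly contacted people when each of them names a contacts at random.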
31- For a real network, people's friends are not
random, but clustered. We can modify the random
equation by adjusting a, such that some portion
of the contacts are random, the rest not. This
adjustment is a bias - i.e., a non-random
element in the model - that gives rise to the
notion of biased networks. People have studied
(mathematically) biases associated with:
- Race (and categorical homophily more generally)
- Transitivity (Friends of friends are friends)
- Reciprocity (i -> j, j -> i)
- There is still a great deal of work to be done in
this area empirically, and it promises to be a
good way of studying the structure of very large
networks.
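The effect of a bias on the reach curve can be illustrated with the random tracing recursion, P(i+1) = (1 - X(i)) * (1 - exp(-a * P(i))), where P is the proportion newly contacted and X the cumulative proportion. The sketch below makes a deliberately crude assumption that is not from the slides: a share `bias` of each person's contacts is redundant (already inside the cluster), so only a * (1 - bias) contacts act as random ones. The function name, parameter names, and starting proportion are also illustrative.

```python
import math


def reach_curve(a, bias=0.0, p0=0.001, steps=10):
    """Cumulative proportion reached at each step of a trace.

    a     : mean number of contacts per person
    bias  : assumed share of contacts that are non-random and redundant
            (0.0 gives the purely random network)
    p0    : proportion of the population in the starting set
    """
    a_effective = a * (1.0 - bias)   # crude adjustment of a for the bias
    newly, cumulative = p0, p0
    curve = [cumulative]
    for _ in range(steps):
        # Share still unreached times the chance of being named by a new contact.
        newly = (1.0 - cumulative) * (1.0 - math.exp(-a_effective * newly))
        cumulative += newly
        curve.append(cumulative)
    return curve


random_net = reach_curve(a=4)
biased_net = reach_curve(a=4, bias=0.5)

if __name__ == "__main__":
    for step, (r, b) in enumerate(zip(random_net, biased_net)):
        print(step, round(r, 3), round(b, 3))
```

With the same mean number of contacts, the biased curve rises more slowly and levels off lower, which is the qualitative gap between random and observed reach curves that biased-network models try to explain.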
32Figure 1. Connectivity Distribution for a large
Jr. High School (Add Health data)
[Figure series: Random graph, Observed]
33How useful are C and L for characterizing a network?
These two graphs both have high C.
34To calculate Average Path Length and Clustering
in UCINET
Clustering Coefficient
- Load the network
- To keep with Watts, make the network symmetric
- Transform > Symmetrize > Maximum
- Note what you saved the graph as
- Calculate clustering coefficient
- Network > Network Properties > Clustering Coefficient
- The local density version is the overall clustering coefficient
- The transitivity version is the weighted clustering coefficient
35To calculate Average Path Length and Clustering
in UCINET
Average Length
- Load the network
- To keep with Watts, make the network symmetric
- Transform > Symmetrize > Maximum
- Note what you saved the graph as
- Calculate Distance
- Network > Cohesion > Distance
- Tools > Statistics > Univariate > Matrix
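For readers without UCINET, the symmetrizing step can be sketched in a few lines. This is not UCINET code; it only illustrates what symmetrizing by the maximum does to an adjacency matrix: a tie is kept between i and j if either one names the other. The function name and the example matrix are assumptions.

```python
def symmetrize_max(matrix):
    """Return a symmetric copy where cell (i, j) is max(m[i][j], m[j][i])."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


# Directed toy network: 0 names 1, 1 names 2, 2 names 0 and 1.
directed = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]

symmetric = symmetrize_max(directed)

if __name__ == "__main__":
    for row in symmetric:
        print(row)
```

After this step the path length and clustering measures treat every tie as mutual, which matches the undirected graphs Watts analyzes.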