Title: An Introduction to
1An Introduction to Social Network Analysis
James Moody, Department of Sociology, The Ohio State University
2Introduction
The world we live in is connected
Jim Moody
3Introduction
These patterns of connection form a social space.
Social network analysis maps and analyzes this
social space.
4Adolescent Social Structure
7Introduction
Yet standard social science analysis methods do
not take this space into account. Moreover, the
complexity of the relational world makes it
impossible (in most cases) to understand this
connectivity using only our intuitive
understanding of a setting.
8Introduction
- Why networks matter
  - Intuitive: information travels through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space.
  - Less intuitive: patterns of inter-actor contact can have effects on the spread of goods or power dynamics that could not be seen by focusing only on individual behavior.
9Introduction
- Social network analysis is
  - a set of relational methods for systematically understanding and identifying connections among actors
  - a body of theory relating to types of observable social spaces and their relation to individual and group behavior.
10Introduction
- Network analysis assumes that
  - How actors behave depends in large part on how they are linked together
    - Example: Adolescents with peers who smoke are more likely to smoke themselves.
  - The success or failure of organizations may depend on the pattern of relations within the organization
    - Example: The ability of companies to survive strikes depends on how product flows through factories and storehouses.
  - (continued)
11Introduction
Network analysis assumes that
- Patterns of relations reflect the power structure of a given setting, and clustering may reflect coalitions within the group
  - Example: Overlapping voting patterns in a coalition government
12Introduction
An information network: Email exchanges within the Reagan White House, early 1980s (source: Blanton, 1995)
13Introduction
Power positions and potential influence
14Overview
15Basic Concepts
- Origins of network analysis
  - Beginning in the 1930s, a systematic approach to theory and research, based on the notion that relations matter, began to emerge
  - In 1934 Jacob Moreno introduced the ideas and tools of sociometry
  - At the end of World War II, Alex Bavelas founded the Group Networks Laboratory at M.I.T.
16Basic Concepts
- From the outset, network analysis has been
  - a. guided by formal theory organized in mathematical terms, and
  - b. grounded in the systematic analysis of empirical data
- It was in the 1970s, when modern discrete combinatorics (esp. graph theory) developed rapidly and powerful computers became readily available, that the study of social networks began to flourish
17Basic Concepts
- Actors are nodes
  - Ideas, Papers, Events, Individuals, Organizations, Nations
- Relations are lines between pairs of nodes
  - Symmetric (shares a room with)
  - Asymmetric (gives an order to)
  - Valued (number of times seen together)
18Basic Concepts
- Network data are familiar to you
- For example
  - Personal, face-to-face contact
  - Telephone contact
  - Email contact
  - Contact through faxes or wires
  - Snail-mail contact
  - Membership in the same organization
  - Attendance at the same meetings
  - Graduates of the same university
19Basic Concepts
For example, you might be tracking the activities
of a number of people in related, but not
identical cases, including meetings they
attended. You may know little of the content of
the event, or what they may have said to each
other, only whether particular people were at the
event. Your data might look like
20Basic Concepts
- 11.19.2001. Meeting at Brussels. Attending: Smith, Johnson, Davis, James, Jackson
- 12.22.2001. Meeting at Paris. Attending: Johnson, James, Jones, Wilson
- 1.12.2001. Meeting in New York. Attending: Jones, Carter, Burns
- 2.14.2001. Meeting in Denver. Attending: Wilson, Burns, Wilf, Newman
(Red bold indicates people who are the focus of an investigation)
21Basic Concepts
While perhaps not immediately apparent when
looking at the list of names, a simple algorithm
reveals connections among these actors.
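One way such an algorithm might look is sketched below. It links every pair of people who attended the same meeting and counts how often each pair co-occurs. The rosters are copied from the example slide; the function name `co_attendance` and the choice to count shared meetings are illustrative assumptions, not the method used in the presentation.

```python
from collections import defaultdict
from itertools import combinations

# Meeting rosters copied from the example slide.
meetings = {
    "Brussels": ["Smith", "Johnson", "Davis", "James", "Jackson"],
    "Paris": ["Johnson", "James", "Jones", "Wilson"],
    "New York": ["Jones", "Carter", "Burns"],
    "Denver": ["Wilson", "Burns", "Wilf", "Newman"],
}

def co_attendance(rosters):
    """Return {frozenset({a, b}): number of meetings a and b both attended}."""
    ties = defaultdict(int)
    for attendees in rosters.values():
        # Every pair of people at the same meeting gets (or strengthens) a tie.
        for a, b in combinations(sorted(set(attendees)), 2):
            ties[frozenset((a, b))] += 1
    return dict(ties)

ties = co_attendance(meetings)

# List the ties, strongest first.
for pair, count in sorted(ties.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" - ".join(sorted(pair)), count)
```

In this sketch Smith and Newman never attend the same meeting, yet they are indirectly linked: Smith met Johnson in Brussels, Johnson met Wilson in Paris, and Wilson met Newman in Denver.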
22Basic concepts
Types of network data
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to (alters)
- May include estimates of connections among alters
23Basic concepts
Types of network data
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of contacts
- Something less than a full account of connections among all pairs of actors in the relevant population
- Example: CDC contact tracing data for STDs
24Basic concepts
Types of network data
3) Complete
- Data on all actors within a particular (relevant) boundary
- Never exactly complete, but boundaries are set
- Example: Coauthorship data among all writers in the social sciences
25Examples linked levels of data
Actor
Key contact
Primary Relation
26Why networks matter
Consider the following (much simplified) scenario
- Probability that actor i passes information to actor j (pij) is a constant over all relations: 0.6
- S & T are connected through the following structure
[Figure: network connecting S and T]
- The probability that S passes the information to T through either path would be 0.09
27Probability of transfer of information over independent paths
- The probability that the information passes from i to j is assumed constant at $p_{ij}$.
- The probability that the information passes through multiple links (i to j, and from j to k) is the joint probability of each link (link 1 and link 2 and so on): $p_{ij}^{d}$, where d is the path distance.
- To calculate the probability of the information passing through multiple paths, use the complement of it not passing through any path. The probability of not passing through path l is $1 - p_{ij}^{d}$, and thus the probability of not passing through any path is $(1 - p_{ij}^{d})^{k}$, where k is the number of paths.
- Thus, the probability of i passing the information to j given k independent paths is
$$P = 1 - (1 - p_{ij}^{d})^{k}$$
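The complement rule described on this slide can be sketched in a few lines. The function name `transfer_probability` and its argument names are illustrative assumptions. The sketch also assumes, as the slide does, that every path has the same length and every link the same probability.

```python
def transfer_probability(p: float, distance: int, paths: int) -> float:
    """Probability that information crosses at least one of `paths`
    independent paths, each `distance` links long, when every link
    passes it on with constant probability `p`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if distance < 1 or paths < 1:
        raise ValueError("distance and paths must be positive integers")
    single_path = p ** distance                 # every link on one path succeeds
    return 1.0 - (1.0 - single_path) ** paths   # complement of every path failing

# Tabulate a few cases for p = 0.6 over path distances 2 through 6.
for k in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, k), 3) for d in range(2, 7)]
    print(f"{k:>2} paths:", row)
```

Under these assumptions, two independent paths of six links each give a probability of roughly 0.09. That is consistent with the value quoted in the earlier scenario, although the transcript does not show that network's layout.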
28Probability of information passing over
non-independent paths
- To get the probability that i passes the
information to j given that paths intersect at 4,
I calculate
Using the independent paths formula.
29Why networks matter
Now consider the following (similar?) scenario
[Figure: second network connecting S and T]
- Every actor but one has the exact same number of contacts
- The category-to-category mixing is identical
- The distance from S to T is the same (7 steps)
- S and T have not changed their behavior
- Their contacts' contacts have the same behavior
- But the probability of the information passing from S to T is 0.148
- Different outcomes & different potentials for intervention
30Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods for Flows and Structures
Conclusions
31Network Flow
- In addition to the simple probability that one actor passes information on to another (pij), two factors affect flow through a network
- Topology
  - the shape, or form, of the network
  - Example: one actor cannot pass information to another unless they are either directly or indirectly connected
- Time
  - the timing of contact matters
  - Example: an actor cannot pass information he has not yet received
32Topology
Two features of the network's shape are known to be important: connectivity and centrality
- Connectivity refers to how actors in one part of the network are connected to actors in another part of the network.
  - Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another.
  - Distance: Given they can be reached, how many steps are they from each other?
  - Number of paths: How many different paths connect each pair?
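Reachability and distance can both be read off a breadth-first search, as in the minimal sketch below. The network and the function name `distances_from` are hypothetical. Counting the number of independent paths needs a flow-based method and is left out of this sketch.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: steps from `source` to every actor it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:          # first visit is the shortest route
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical undirected network: a chain a-b-c-d plus a separate pair e-f.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "e": ["f"], "f": ["e"],
}

dist = distances_from(graph, "a")
print(dist)          # distance in steps to each reachable actor
print("e" in dist)   # reachability: is e reachable from a?
```

An actor that is missing from the returned dictionary cannot be reached from the source at all, which is the reachability question. The stored values answer the distance question.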
33Network topology: reachability
Without full network data, you can't distinguish actors with limited information from those more deeply embedded in a setting.
[Figure: network with actors a, b, and c marked]
34Network topology: distance & number of paths
- Given that ego can reach alter, distance determines the likelihood of information passing from one end of the chain to another.
  - Because information spread is never certain, the probability of transfer decreases over distance.
  - However, the probability of transfer increases with each alternative path connecting pairs of people in the network.
35Network topology: distance & number of paths
Distance is measured by the (weighted) number of relations separating a pair
Actor a is:
- 1 step from 4 actors
- 2 steps from 5
- 3 steps from 4
- 4 steps from 3
- 5 steps from 1
36Network topology: distance & number of paths
Paths are the different routes one can take. Node-independent paths are particularly important.
There are 2 independent paths connecting a and b.
There are many non-independent paths.
[Figure: network with actors a and b marked]
37Probability of information transfer by distance and number of paths, assuming a constant pij of 0.6
[Figure: line chart of probability (y-axis, 0 to 1.2) against path distance (x-axis, 2 to 6), with separate curves for 1, 2, 5, and 10 paths]
38Reachability in Colorado Springs (Sexual contact only)
- High-risk actors over 4 years
- 695 people represented
- Longest path is 17 steps
- Average distance is about 5 steps
- Average person is within 3 steps of 75 other people
- 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths
(Node size = log of degree)
39Network topology: centrality
- Centrality refers to (one dimension of) location, identifying where an actor resides in a network.
  - For example, we can compare actors at the edge of the network to actors at the center.
  - In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders.
40Centrality example
At the local level, we expect people like NSJMP
and NSOLN to have greater access to information
than others in the network. Network analysis
gives us a set of tools to quantify this
difference.
41Centrality example
Actors that appear very different when seen individually are comparable in the global network.
(Node size proportional to betweenness centrality)
42Information flows
Two factors that affect network flows
- Topology
  - the shape, or form, of the network
  - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected
- Time
  - the timing of contacts matters
  - simple example: an actor cannot pass information he has not yet received
43Timing in networks
A focus on contact structure often slights the importance of network dynamics. Time affects networks in two important ways:
1) The structure itself goes through phases that are correlated with information spread
2) The timing of contact constrains information flow
44Sexual Relations among A syphilis outbreak
Changes in Network Structure
Rothenberg et al. map the pattern of sexual contact among youth involved in a syphilis outbreak in Atlanta over a one-year period.
(Syphilis cases in red)
Jan - June, 1995
45Sexual Relations among A syphilis outbreak
July-Dec, 1995
46Sexual Relations among A syphilis outbreak
July-Dec, 1995
47Drug Relations, Colorado Springs, Year 1
Data on drug users in Colorado Springs, over 5
years
48Drug Relations, Colorado Springs, Year 2 Current
year in red, past relations in gray
Data on drug users in Colorado Springs, over 5
years
49Drug Relations, Colorado Springs, Year 3 Current
year in red, past relations in gray
Data on drug users in Colorado Springs, over 5
years
50Drug Relations, Colorado Springs, Year 4 Current
year in red, past relations in gray
Data on drug users in Colorado Springs, over 5
years
51Drug Relations, Colorado Springs, Year 5 Current
year in red, past relations in gray
Data on drug users in Colorado Springs, over 5
years
52What impact does timing have on flow through the
network?
In addition to changes in the shape over time,
contact timing constrains how information can
flow through the network. Consider the
following example
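53A hypothetical contact network
[Figure: contact network among actors A, B, C, D, E, and F; lines are labeled with the contact periods 0 - 1, 2 - 5, 3 - 5, 3 - 7, and 8 - 9]
Numbers above lines indicate contact periods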
54The path graph for the hypothetical contact network
[Figure: path graph among actors A, B, C, D, E, and F]
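The way contact timing restricts flow can be sketched as an earliest-arrival calculation. The contact periods below are hypothetical and are not the ones in the slide's figure, whose line-to-label matching the transcript does not preserve. The sketch also assumes information can cross a tie at any moment within its contact period, in either direction, provided the sender already holds it.

```python
import math

def earliest_arrival(contacts, source, start=0):
    """Earliest time each actor can hold information that `source` has at `start`.

    `contacts` is a list of (u, v, begin, end) tuples: u and v are in contact
    during [begin, end], and information can cross in either direction within
    that window, but only if the sender already holds it.
    """
    arrival = {source: start}
    changed = True
    while changed:                          # relax until no arrival time improves
        changed = False
        for u, v, begin, end in contacts:
            for sender, receiver in ((u, v), (v, u)):
                held_since = arrival.get(sender, math.inf)
                if held_since <= end:       # sender has it before the contact ends
                    when = max(held_since, begin)
                    if when < arrival.get(receiver, math.inf):
                        arrival[receiver] = when
                        changed = True
    return arrival

# Hypothetical contact periods (illustrative only).
contacts = [
    ("A", "B", 2, 5),
    ("B", "C", 3, 7),
    ("C", "E", 8, 9),
    ("D", "B", 0, 1),
]

print(earliest_arrival(contacts, "A"))
print(earliest_arrival(contacts, "E"))
```

In this example A can reach E, because the contacts along the chain occur in the right order, but E cannot reach A, and nobody reaches D, whose only contact ended before anyone else held the information. The implied network is therefore directed even though every contact is mutual.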
55Direct contact network of 8 people in a ring (adjacency matrix cell = number of paths from row to column)
56Implied contact network of 8 people in a ring: All contacts concurrent
57Implied contact network of 8 people in a ring: Mixed Concurrent
[Figure: ring of 8 people with labels 2, 3, 2, 1, 1, 2, 2, 3]
Density: 0.57
58Implied contact network of 8 people in a ring: Serial (1)
[Figure: ring of 8 people with labels 1, 8, 2, 7, 3, 6, 5, 4]
Density: 0.73
59Implied contact network of 8 people in a ring: Serial (2)
[Figure: ring of 8 people with labels 1, 8, 2, 7, 3, 6, 1, 4]
Density: 0.51
60Implied contact network of 8 people in a ring: Serial (3)
[Figure: ring of 8 people with labels 1, 2, 2, 1, 1, 2, 1, 2]
Density: 0.43
61Information flows
Summary
- Topology
  - Information requires connected communication chains
  - Real-world networks are too complex to map these without specialized tools.
- Time
  - Network topology changes over time. This has implications for information flow.
  - Because small changes in relationship timing can have dramatic effects on information flow, it is impossible to know this intuitively.
63Structure of Social Space
Information flows are only one use of networks. It is also possible to characterize the key topological features of any social network. These features include things such as the extent of hierarchy and clustering.
64Structure of Social Space
1) Identify core groups and patterns of relations among groups
  a. embeddedness in groups constrains action
  b. group structure affects stability and resource distribution
2) Locate tensions or inconsistencies in a relational structure that might indicate sources of social change.
65Structure of Social Space
Two features of interest related to network structure
1) Cohesive groups: Sets of people who interact frequently with each other. These are often groups that work together. Groups are often organized into positions within a network that indicate particular roles or access to resources.
2) Hierarchy: Relational structure can identify the leadership positions within a network, through either direction of ties or periphery status.
66Structure of cohesive groups
A cohesive group is a set of actors with more
interaction inside the group than outside the
group, mutually connected through multiple paths.
67Cohesive Group Structure
Immaculate Preparatory High School
68Cohesive Group Structure 3 types of positions
Immaculate Preparatory High School
69Cohesive Group Structure Group member
Immaculate Preparatory High School
70Cohesive Group Structure Group Member
Immaculate Preparatory High School
71Cohesive Group Structure Bridge between groups
Immaculate Preparatory High School
72Cohesive Group Structure Outsider
Immaculate Preparatory High School
73Cohesive Groups: Relevance
- Identify people who bridge important constituencies
  - people who are between groups have a unique ability to control information
- Such actors are said to bridge structural holes; the number of holes an actor bridges gives insight into an actor's power position in the network.
74Hierarchy and network position
Many cohesive groups are embedded within a
hierarchy, which one can map using relational
tools. Changes in the hierarchical position
indicate changes in the power structure.
75Examples of Hierarchical Systems
Linear Hierarchy (all triads transitive)
Simple Hierarchy
Branched Hierarchy
Mixed Hierarchy
76Hierarchy and network position
If you don't know the hierarchy of the network,
asymmetry optimization techniques allow one to
identify levels in a hierarchy
77Hierarchy and network position
If you don't know the hierarchy of the network,
asymmetry optimization techniques allow one to
identify levels in a hierarchy
78Group structure through multiple relations
Start with some basic ideas of what a role is: an exchange of something (support, ideas, commands, etc.) between actors. Thus, we might represent a family as
[Figure: family network of H, W, and three C nodes, linked by the relation "Provides food for"]
(and there are, of course, many other relations inside the family)
79Group structure through multiple relations
The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect?
Structural Equivalence
Two actors are structurally equivalent if they have the same types of ties to the same people.
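For a single relation stored as a 0/1 adjacency matrix, the definition can be checked directly, as in the sketch below. The matrix is a hypothetical "provides food for" relation, and ignoring the tie between the two actors being compared is a convention assumed here rather than something the slides specify.

```python
def structurally_equivalent(adj, i, j):
    """True if actors i and j send ties to, and receive ties from, exactly
    the same third parties in the 0/1 adjacency matrix `adj`."""
    others = [k for k in range(len(adj)) if k not in (i, j)]
    same_out = all(adj[i][k] == adj[j][k] for k in others)   # ties sent
    same_in = all(adj[k][i] == adj[k][j] for k in others)    # ties received
    return same_out and same_in

# Hypothetical "provides food for" relation: parents 0 and 1, children 2 to 4.
adj = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(structurally_equivalent(adj, 0, 1))  # the two parents
print(structurally_equivalent(adj, 2, 3))  # two of the children
print(structurally_equivalent(adj, 0, 2))  # a parent and a child
```

The two parents occupy one position and the children another, so the five actors reduce to two positions. Real data rarely match exactly, which is why similarity-based versions of the idea are used in practice.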
80Structural Equivalence
A single relation
81Structural Equivalence
Graph reduced to positions
82Alternative notions of equivalence
Instead of exact same ties to exact same alters,
you look for nodes with similar ties to similar
types of alters
84Tools, Methods Models
Data Representations
Adjacency Matrix
Graph
Arc List
Node List
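The same network can be moved between these representations mechanically. The sketch below uses a hypothetical four-actor directed network and assumes "node list" means each sender followed by the actors it points to; the function names are illustrative.

```python
# Hypothetical directed network on four actors (0 to 3).
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

def to_arc_list(matrix):
    """One (sender, receiver) pair per nonzero cell."""
    return [(i, j) for i, row in enumerate(matrix)
            for j, cell in enumerate(row) if cell]

def to_node_list(matrix):
    """For each sender, the list of receivers it points to."""
    return {i: [j for j, cell in enumerate(row) if cell]
            for i, row in enumerate(matrix)}

def to_matrix(arcs, n):
    """Rebuild the n x n adjacency matrix from an arc list."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in arcs:
        matrix[i][j] = 1
    return matrix

arcs = to_arc_list(adjacency)
nodes = to_node_list(adjacency)
print(arcs)
print(nodes)
```

The matrix stores n x n cells regardless of how many ties exist, while the arc and node lists grow only with the number of ties, which is why list formats tend to be preferred for large, sparse networks.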
85Tools, Methods Models
Graphical Display
- Benefits
  - Intuitive way to display networks.
  - Helps people see the social space: it is a map.
  - A concise presentation of a great deal of data.
- Costs
  - Lack of standards for how to display can create misleading images.
  - Displays of large networks tend to reveal only the roughest properties of the network
86Tools, Methods Models
Graphical Display Software
- PAJEK
  - Program for analyzing and plotting very large networks
  - Intuitive Windows interface
  - Used for most of the real data plots in this presentation
  - Mainly a graphics program, but is expanding the analytic capabilities
  - Free
  - Available from
87Tools, Methods Models
Graphical Display Software
- Cyram Netminer for Windows
  - Very new, largely untested
  - Price range depends on application
  - Limited to smaller networks O(100)
88Tools, Methods Models
Graphical Display Software
- NetDraw
  - Also very new, but by one of the best known names in network analysis software.
  - Free
  - Limited to smaller networks O(100)
89Tools, Methods Models
Analysis Methods: Descriptive / Measurement
The key text for methods and measurement is Wasserman, Stanley and Katherine Faust. 1994. Social Network Analysis. Cambridge: Cambridge University Press.
The basic network measures use graph theory to
formalize aspects of the network, and always work
from either an adjacency matrix (slow for large
graphs) or an edge/node list.
90Tools, Methods Models
Analysis Methods: Descriptive / Measurement
Properties of interest include
Individual Level
- Degree: Number of contacts for each person. Sum over the row/column of the adjacency matrix.
- Closeness Centrality: Inverse of the distance to every other node in the network. Count path distances from ego to alters.
Sub-group Level
- Group Membership: Which groups are there? Various search algorithms for identifying groups.
- Group Position: Where does a given group fit in the overall flow of relations? Various equivalence algorithms.
Graph Level
- Density: Number of ties present as a percentage of all possible ties.
- Centralization: To what degree are edges focused through a small number of nodes? Various formulas for different centrality indices.
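Three of these measures (degree, closeness centrality, and density) can be computed directly from an adjacency matrix, as in the following sketch. The five-actor network is hypothetical, the closeness function assumes the network is connected, and density is reported as a proportion rather than a percentage.

```python
from collections import deque

# Hypothetical undirected network as a symmetric 0/1 adjacency matrix.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
n = len(adj)

def degree(i):
    """Number of contacts: the row sum of the adjacency matrix."""
    return sum(adj[i])

def closeness(i):
    """Inverse of the summed path distance from i to every other node
    (assumes the network is connected)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:                       # breadth-first search from ego
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 1.0 / sum(dist.values())

def density():
    """Ties present as a share of the n(n-1)/2 possible undirected ties."""
    ties = sum(sum(row) for row in adj) / 2
    return ties / (n * (n - 1) / 2)

print([degree(i) for i in range(n)])
print([round(closeness(i), 3) for i in range(n)])
print(density())
```

In this example actor 2 has both the highest degree and the highest closeness, but the two measures need not agree in general, since closeness depends on the whole network rather than only on direct contacts.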
91Tools, Methods Models
Analysis Methods: Descriptive / Measurement Software
- 1) UCI-NET
  - General network analysis program, runs in Windows
  - Good for computing measures of network topography for single nets
  - Input-output of data is a little clunky, but workable.
  - Not optimal for large networks
  - Available from
    - Analytic Technologies
    - Borgatti_at_mediaone.net
- 2) STRUCTURE
  - A General Purpose Network Analysis Program providing Sociometric Indices, Cliques, Structural and Role Equivalence, Density Tables, Contagion, Autonomy, Power and Equilibria In Multiple Network Systems.
  - DOS interface with somewhat awkward syntax
  - Great for role and structural equivalence models
  - Manual is a very nice, substantive introduction to network methods
  - Available from a link at the INSNA web site
    - http://www.heinz.cmu.edu/project/INSNA/soft_inf.html
92Tools, Methods Models
Analysis Methods: Descriptive / Measurement Software
- 3) NEGOPY
  - Program designed to identify cohesive sub-groups in a network, based on the relative density of ties.
  - DOS-based program; data must be in arc-list format
  - Moving the results back into an analysis program is difficult.
  - Available from
    - William D. Richards
    - http://www.sfu.ca/richards/Pages/negopy.htm
- 4) SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
  - is a collection of IML and Macro programs that allow one to
    - a) create network data structures from nomination data
    - b) import/export data to/from the other network programs
    - c) calculate measures of network pattern and composition
    - d) analyze network models
  - Allows one to work with multiple, large networks
  - Easy to move from creating measures to analyzing data
  - All of the Add Health data are already in SAS
  - Available by sending an email to
    - Moody.77_at_osu.edu
93Tools, Methods Models
Analysis Methods: Statistical Models
There are two general classes of statistical models for networks
1) Models of the network itself. The statistical question is how an observed network fits into the class of all possible random graphs with a given set of topological characteristics. The whole network is the substantive unit of analysis, though technically one works with the dyads from the network.
Examples: p* models (Wasserman and Pattison), MCMC random graph models (Tom Snijders, Mark Handcock)
2) Models of individual behavior that incorporate network characteristics. The statistical question is whether or not network properties affect individual behaviors.
Examples: Network regressive-autoregressive models (Doreian), peer influence models (Friedkin)
94Tools, Methods Models
Analysis Methods: Statistical Models
Exponential Random Graph Models
$$P(X = x) = \frac{\exp\{\theta' z(x)\}}{k(\theta)}$$
where z is a collection of r explanatory variables calculated on x, $\theta$ is a collection of r parameters to be estimated, and k is a normalizing constant that ensures the probability sums to 1. As it turns out, k is incredibly difficult to identify, introducing a number of complexities to the model.
95Exponential Random Graph Model Details Kindly
provided by Mark Handcock, University of
Washington Statistics Department.
102Tools, Methods Models
Analysis Methods Statistical Models
Exponential Random Graph Models
To estimate the model, we work with the conditional probabilities $P(X_{ij} \mid X^{c}_{ij})$ instead of the full graph. This transforms the exponential model to a logit model on the dyads
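In the pseudo-likelihood approach, the predictors for each dyad are usually "change statistics": how much each network statistic changes when the tie from i to j is switched from absent to present, holding the rest of the graph fixed. The sketch below builds those rows for a tiny hypothetical directed graph. The two statistics (number of ties and number of mutual pairs) are an illustrative choice, not the model specified in the presentation.

```python
def edge_count(x):
    """Number of directed ties in the graph."""
    return sum(sum(row) for row in x)

def mutual_count(x):
    """Number of pairs with ties in both directions."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if x[i][j] and x[j][i])

def change_statistics(x, i, j, stats):
    """z(x with tie i->j present) minus z(x with tie i->j absent)."""
    original = x[i][j]
    x[i][j] = 1
    with_tie = [s(x) for s in stats]
    x[i][j] = 0
    without_tie = [s(x) for s in stats]
    x[i][j] = original                 # restore the observed graph
    return [a - b for a, b in zip(with_tie, without_tie)]

# Hypothetical directed graph on three actors.
x = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]

# One row per ordered dyad: (observed tie, [change in edges, change in mutuals]).
rows = [(x[i][j], change_statistics(x, i, j, [edge_count, mutual_count]))
        for i in range(3) for j in range(3) if i != j]
for row in rows:
    print(row)
```

Each row could then be passed to an ordinary logistic regression, with the observed tie as the outcome and the change statistics as predictors. The resulting estimates are approximations, since the dyads are not actually independent.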
103Analysis Methods Statistical Models
Exponential Random Graph Models
Software for analyzing these models is available from
- Logit pseudo-likelihood estimation
  - http://kentucky.psych.uiuc.edu/pstar/index.html (SPSS programs)
  - http://www.sfu.ca/richards/Pages/pspar.html (Program for large graphs)
  - Empirically, these models are tricky to estimate, as the potential result space can easily become degenerate, particularly as z starts to include a more complicated range of dependencies.
- MCMC estimation
  - Ongoing work by Mark Handcock, Tom Snijders and Co.
104Tools, Methods Models
Analysis Methods Statistical Models
Network Effect Models
The question is whether or not being connected to a particular set of people affects an individual's behavior. The key statistical point is that we have abandoned the assumption that our cases are independent. These models originated in spatial statistics, looking at the effect of an adjacent geographic area on outcomes for any given area.
105Basic Peer Influence Model
Formal Model
(1)
(2)
- Y(1) = an N x M matrix of initial opinions on M issues for N actors
- X = an N x K matrix of K exogenous variables that affect Y
- B = a K x M matrix of coefficients relating X to Y
- a = a weight of the strength of endogenous interpersonal influences
- W = an N x N matrix of interpersonal influences
106Basic Peer Influence Model
Formal Model
(1)
This is the basic general linear model. It says
that a dependent variable (Y) is some function
(B) of a set of independent variables (X). At
the individual level, the model says that
Usually, one of the covariates is e, the model
error term.
107Basic Peer Influence Model
(2)
This part of the model taps social influence. It says that each person's final opinion is a weighted average of their own initial opinions and the opinions of those they communicate with (which can include their own current opinions)
108Basic Peer Influence Model
The key to the peer influence part of the model
is W, a matrix of interpersonal weights. W is a
function of the communication structure of the
network, and is usually a transformation of the
adjacency matrix. In general
Various specifications of the model change the
value of wii, the extent to which one weighs
their own current opinion and the relative weight
of alters.
109Basic Peer Influence Model
Formal Properties of the model
If we allow the model to run over t, we can
describe the model as
The model is directly related to spatial
econometric models
Where the two coefficients (a and b) are
estimated directly (See Doreian, 1982, SMR)
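The dynamics described in words on these slides can be simulated in a few lines. The equations themselves did not survive in this transcript, so the sketch assumes the commonly used form Y(t) = a W Y(t-1) + (1 - a) Y(1), which matches the verbal description of a weighted average of initial opinions and others' opinions. The weight matrix, opinions, and function names are hypothetical.

```python
def mat_mul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def peer_influence(y1, w, alpha, steps):
    """Iterate Y(t) = alpha * W Y(t-1) + (1 - alpha) * Y(1), starting at Y(1).

    This update rule is an assumed form of the model, not one confirmed
    by the transcript.
    """
    y = [row[:] for row in y1]
    for _ in range(steps):
        wy = mat_mul(w, y)             # opinions of those each actor listens to
        y = [[alpha * wy[i][j] + (1 - alpha) * y1[i][j]
              for j in range(len(y1[0]))] for i in range(len(y1))]
    return y

# Hypothetical data: three actors, one issue. Each row of W sums to 1,
# so every actor splits attention evenly between the other two.
w = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
y1 = [[1.0], [0.0], [0.5]]

final = peer_influence(y1, w, alpha=0.8, steps=200)
print([round(row[0], 4) for row in final])
```

Under these assumptions the opinions move toward one another without fully converging, because each actor keeps a weight of 1 - a on their own initial position. Setting the diagonal of W to nonzero values would let actors also weigh their own current opinion.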