Title: Tutorial: Analyzing real network data
1) Creating data from survey
- You can download all of the needed files from here: http://www.soc.duke.edu/jmoody77/rwj/wsfiles.htm
- This is (modified) data from one of the Add Health schools. I've changed the data some for security reasons. We'll walk through some of the data coding issues, creating measures and figures, and then running peer influence and structural models on the network.
- Outline
  - From survey to analysis files
  - Exploring the network: visualization
  - Network behavior and peer influence models
    - Network structure as independent variable
    - Peer influence models
    - Dyad similarity models
  - Network structure analyses
    - Clustering for peer groups
    - Block models
  - Statistical models for networks (STATNET)

This is what students filled out in the Add Health in-school survey: one set for male friends, another for female friends. This is the foundation of our data. The result is a nomination data file that looks something like this (actual numbers changed). We want to turn this file into something PAJEK, UCINET, etc. can read. Open netcreate.sas and walk through the logic of the file.

Netcreate.sas uses files from SPAN to create PAJEK files. PAJEK files have a fixed structure that is easy to program for; see the PAJEK support files for details. There are programs that convert Excel or text files to PAJEK format, and UCINET (and STATNET, sort of) can read PAJEK .NET files.
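
To make the fixed structure concrete, here is a minimal Python sketch (not the netcreate.sas logic, which is SAS) that writes a nomination list as a PAJEK arcs file. The function name `write_pajek_net` and the ID values are made up for illustration; the layout assumed is the basic `*Vertices` / `*Arcs` form of a .NET file.

```python
from io import StringIO


def write_pajek_net(nominations, out):
    """Write (sender, receiver) nominations as a Pajek .net arcs list."""
    # Pajek numbers vertices 1..N; the original survey ID is kept as the label.
    ids = sorted({node for pair in nominations for node in pair})
    index = {node_id: i for i, node_id in enumerate(ids, start=1)}
    out.write(f"*Vertices {len(ids)}\n")
    for node_id in ids:
        out.write(f'{index[node_id]} "{node_id}"\n')
    out.write("*Arcs\n")
    for sender, receiver in nominations:
        # Each arc line is: sender number, receiver number, tie value.
        out.write(f"{index[sender]} {index[receiver]} 1\n")
    return index


# Hypothetical nomination pairs (made-up IDs, not Add Health data).
nominations = [(1001, 1002), (1002, 1001), (1003, 1001)]
buffer = StringIO()
vertex_index = write_pajek_net(nominations, buffer)
pajek_text = buffer.getvalue()
print(pajek_text)
```

Keeping the returned `vertex_index` around matters later, because PAJEK output is matched back to students by position rather than by ID.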

2) Exploring the network graphically
I think it's extremely useful to simply play with the network in various ways and get a sense of its shape. This is perhaps PAJEK's most useful feature. Load a network and work through good and bad plots.

- Once you have a network, how do you create a print-ready image?
  - Screen shots (good for .ppt)
  - Export to .ps or FLASH and edit in Illustrator

3) Network Behavior and Peer Influence
We often want to know how some simple features of network position affect students. These are network behavior models, where some measure of network position is used to predict an outcome. One should think carefully about a theoretical model here, since cause is often very difficult to disentangle. Here we'll leave those questions aside and simply look for correlates of network position in behavior. We'll look at:
a) network volume (degree)
b) centrality (closeness)
c) local reciprocity (proportion of ties ego sends that are reciprocated)
We can get most of these from either SAS or PAJEK, though I'm not sure PAJEK can give you node-level reciprocity rates. Paj_nodestatread.sas is the SAS file.
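
As an illustration of the three measures, the following Python sketch computes them from a directed arc list. It is not the Paj_nodestatread.sas code; the function name `node_stats` is hypothetical, and closeness is computed here on symmetrized ties within each node's reachable set, which is one of several common conventions.

```python
from collections import deque


def node_stats(n, arcs):
    """In-degree, out-degree, closeness, and local reciprocity per node."""
    out_nbrs = {i: set() for i in range(n)}
    in_nbrs = {i: set() for i in range(n)}
    for sender, receiver in arcs:
        out_nbrs[sender].add(receiver)
        in_nbrs[receiver].add(sender)
    stats = {}
    for i in range(n):
        # Breadth-first search over symmetrized ties for geodesic distances.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in out_nbrs[u] | in_nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        closeness = (len(dist) - 1) / total if total else 0.0
        sent = out_nbrs[i]
        # Share of ego's sent ties that are returned; undefined if none sent.
        reciprocity = len(sent & in_nbrs[i]) / len(sent) if sent else None
        stats[i] = {
            "indeg": len(in_nbrs[i]),
            "outdeg": len(sent),
            "closeness": closeness,
            "reciprocity": reciprocity,
        }
    return stats


# Toy network: 0 and 1 nominate each other, 0 nominates 2, 3 nominates 0.
stats = node_stats(4, [(0, 1), (1, 0), (0, 2), (3, 0)])
print(stats[0])
```

Node 2 sends no ties, so its reciprocity is left undefined (`None`) rather than set to zero; how to treat such cases in a regression is a substantive choice.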

After running some models, we get the results shown on the slide (table not reproduced here).

Open nodestats1.sas to see how to code these same stats, plus a few more, in SAS.

QAP is an alternative method that doesn't make as many strong assumptions about the model. To use QAP, we can run it in SAS (but it's slow and basic) or export to UCINET (which is fast, sophisticated, and all that jazz). The qapstats.sas file moves the data for us.
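
For readers who want to see the permutation logic itself, here is a bare-bones Python sketch of a QAP correlation test. It is neither the SAS nor the UCINET routine; the function names and the toy matrices are assumptions for illustration. The idea is to correlate the off-diagonal cells of two matrices, then repeatedly relabel the nodes of one matrix (rows and columns together) to build a reference distribution.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def off_diagonal(m, order=None):
    """Off-diagonal cells of m, optionally after relabeling nodes by `order`."""
    n = len(m)
    if order is None:
        order = list(range(n))
    return [m[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def qap_correlation(a, b, n_perm=1000, seed=0):
    """Observed correlation and two-sided permutation p-value."""
    rng = random.Random(seed)
    b_cells = off_diagonal(b)
    observed = pearson(off_diagonal(a), b_cells)
    order = list(range(len(a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Rows and columns of `a` move together, so its structure is preserved.
        if abs(pearson(off_diagonal(a, order), b_cells)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, hits / n_perm


# Toy data: friendship ties and a same-gender indicator matrix.
friends = [[0, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 1, 0],
           [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]]
female = [1, 1, 1, 0, 0]
same_sex = [[int(female[i] == female[j]) for j in range(5)] for i in range(5)]
r_obs, p_value = qap_correlation(friends, same_sex)
print(r_obs, p_value)
```

Because only node labels are shuffled, the dependence among cells in the same row or column is kept intact, which is the reason QAP needs fewer assumptions than a cell-level regression.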

We can also estimate the network autocorrelation model directly. We can get QAD (quick-and-dirty) estimates just by adding the WY term to the base model, which typically performs fairly well. Open peerinfl1.sas to see this routine. Alternatively, UCINET calculates a simple network correlation between any vector (Nx1) and any matrix (NxN) to estimate the bivariate peer effect, and Carter Butts's lnam routine in R (part of the sna package) lets you run a full linear network autocorrelation model.
For stats details, see:
Leenders, T.Th.A.J. (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21-47.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Norwell, MA: Kluwer.
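
The WY term is simply each student's peers' average outcome. A minimal Python sketch, assuming W is the row-normalized friendship matrix (one common choice; Leenders 2002 discusses alternatives), with the function name `peer_mean` and the toy values made up for illustration:

```python
def peer_mean(adj, y):
    """Return WY, where W is the row-normalized version of adj."""
    wy = []
    for row in adj:
        n_friends = sum(row)
        # Students who name no friends get 0.0 here; that is a modeling choice.
        weighted = sum(w * value for w, value in zip(row, y))
        wy.append(weighted / n_friends if n_friends else 0.0)
    return wy


# Toy data: student 0 names 1 and 2, student 1 names 0, student 2 names no one.
adj = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
fights = [2.0, 4.0, 0.0]
wy = peer_mean(adj, fights)
print(wy)  # peers' mean number of fights for each student
```

Adding `wy` as one more column in an ordinary regression of Y on the covariates gives the quick estimate described above; the full model, y = rho*Wy + X*beta + e, treats Wy as endogenous, which is what the lnam routine handles.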

To run the R version, we need to export the data. We can get started using the send2r.mac routine and reshape some of the files. The SAS program sas2r_peerinfl.sas creates the needed external files. The R script lname_example.r is the needed R script. Run the example models.

Result with fights as Y, friendship as W1, and club overlap as W2:

    Call: lnam(y = fights, x = cv, W1 = w1, W2 = clbs)

    Residuals:
        Min       1Q   Median       3Q      Max
    -1.3138  -0.7955  -0.3844   0.3147   3.6792

    Coefficients:
             Estimate   Std. Error  Z value  Pr(>|z|)
    FEMALE  -0.292433   0.144148    -2.029   0.042489
    WHITE    0.160314   0.149228     1.074   0.282692
    S3       0.061595   0.014843     4.150   3.33e-05
    rho1.1   0.379421   0.103426     3.669   0.000244
    rho2.1   0.001573   0.003954     0.398   0.690870

Getting measures from PAJEK: PAJEK has no direct ID link to files. These are simply text files, so sort order matters. The basic routine to get any measure in PAJEK is to create the measure using the dropdown menus, then save the files and read them into SAS, SPSS, or whatever stats program you use. Open the PAJEK files and create in-degree, out-degree, closeness centrality, and reciprocity.
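
Because the link is positional, the safest pattern is to keep the ID list in exactly the order used when the network file was written and zip it against the saved values. A small Python sketch, assuming the saved file is a PAJEK vector file with a `*Vertices N` header followed by one value per line; the function name and the sample values are hypothetical:

```python
def read_pajek_vector(text):
    """Parse the text of a Pajek vector (.vec) file into a list of floats."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices N' header line")
    values = [float(line) for line in lines[1:]]
    if len(values) != int(header[1]):
        raise ValueError("vertex count does not match the number of values")
    return values


# IDs must be in the same order as the vertices in the .net file.
student_ids = [1001, 1002, 1003]
vec_text = "*Vertices 3\n2\n1\n0\n"  # stand-in for the contents of a saved file
indegree = dict(zip(student_ids, read_pajek_vector(vec_text)))
print(indegree)
```

The length check is the only protection against a mismatched file, so it is worth failing loudly there rather than merging misaligned values.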

4) Network Structure: Clustering the network
As part of the description, we often want to identify significant clusters in the network. There are lots of ways to do this; we'll sample a few:
a) Using UCINET's routines
b) Clustering a distance matrix (SAS)
c) The Jiggle routine (SAS, Moody)
d) The Crowds algorithm
e) Using PAJEK's blockmodel routine to fine-tune a peer group model

- Clustering in UCINET
  - I find it simplest to read PAJEK files. The best general routine is then FACTIONS, though it is slow for large (100s of nodes) nets. It is very effective for small nets.
  - In a pinch, CONCOR will often yield reasonable peer groups, and it's faster in UCINET.
- Clustering in SAS
  - We can often get a quick starting point by simply using a hierarchical clustering on the distance matrix. This is a fair place to start for nets in the 100s-of-nodes size range.
  - Two algorithms that work fairly well are Jiggle for large nets and Crowds for smaller nets. Both work by extending the RNM approach of Moody (2001), but Jiggle is faster for large nets, while Crowds includes more checks for particular structures (like biconnected sets) and thus is slower.
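
The hierarchical-clustering starting point can be sketched in a few lines of Python. This is an illustration only, not the SAS code and not Jiggle or Crowds: it assumes geodesic distance on symmetrized ties and average linkage, and both function names are made up.

```python
from collections import deque


def geodesic_matrix(n, edges):
    """All-pairs shortest path lengths on an undirected graph (n if unreachable)."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    dist = [[n] * n for _ in range(n)]  # n exceeds any real path length
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if dist[source][v] == n:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def average_linkage(dist, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[a][b] for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy network: two triangles joined by a single bridging tie (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
groups = average_linkage(geodesic_matrix(6, edges), k=2)
print(groups)
```

The naive pairwise search is fine for a few hundred nodes, which matches the size range mentioned above; larger nets would need a real clustering library.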

Clustering in PAJEK: PAJEK doesn't have a dedicated clustering routine for finding peer groups in nets. But you can coerce the blockmodel routine to find block-diagonal structures (slow) or use some of its neighboring partitions. Keep an eye on this, as I bet they implement Newman's algorithm soon. Let's try running some of these.

Sample results: this is the resulting graph from a Jiggle run on the school net. Note that this is a randomized algorithm, so you will get different results from different runs.

Sample results: this is the resulting graph from a Crowds run on the school net. We end up with smaller clusters and a larger background set. By construction, the clusters must be bi-connected, so they are rounder than those from the prior algorithm.

4) Network Structure: Block modeling a network
Sample results: the most commonly used blockmodel routine is CONCOR, which is simple and fast. The result is a set of nested splits to some pre-specified depth. Here I apply it to the school net, working to a depth of 3 splits (Split 1, Split 2, Split 3).
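
To show what a single CONCOR split does, here is a simplified Python sketch: correlate node profiles, then keep correlating the columns of the resulting correlation matrix until every entry is close to +1 or -1, and split on the sign. It is a teaching version under stated assumptions (profiles are each node's sent ties followed by received ties, diagonal cells are included, and constant profiles get correlation 0), not UCINET's implementation.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def concor_split(adj, max_iter=100, tol=1e-6):
    """Split nodes into two blocks by iterated correlation of tie profiles."""
    n = len(adj)
    # Profile of node i: ties sent (row i) followed by ties received (column i).
    profiles = [list(adj[i]) + [adj[j][i] for j in range(n)] for i in range(n)]
    m = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in m for v in row):
            break
        cols = [[m[r][c] for r in range(n)] for c in range(n)]
        m = [[pearson(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    # Nodes positively correlated with node 0 form one block, the rest the other.
    block_a = [j for j in range(n) if m[0][j] > 0]
    block_b = [j for j in range(n) if m[0][j] <= 0]
    return block_a, block_b


# Toy network: two separate triangles, {0, 1, 2} and {3, 4, 5}.
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
block_a, block_b = concor_split(adj)
print(block_a, block_b)
```

Nested splits to a given depth come from calling the same routine again on the submatrix for each block.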

More in keeping with the spirit of the original block modeling papers, regular equivalence models are less likely to generate block-diagonal models. A simple positional model is the core-periphery model, which searches for a single core in the net. Since we know this net is split into two wings, we'll just look within one of them.

Another simple way to get at positions in a network is to compare nodes across a vector of triad positions. In a directed network, the vector giving the count of each position an actor occupies nicely summarizes the type of role the actor plays in the net.

5) Statistical Models for Networks
The exponential random graph model (ERGM) class is designed to let you model an observed network as a function of local-network, node, and dyad-level features. These models take an exponential-family form.
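
For reference, the general ERGM form as it is commonly written is given below. The notation is an assumption, since the symbols are not defined in the text: $y$ is the observed network, $g(y)$ a vector of network statistics (edges, mutual ties, and so on), $\theta$ the corresponding coefficients, and $\kappa(\theta)$ the normalizing constant summed over all possible networks $\mathcal{Y}$.

```latex
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```

The sum in $\kappa(\theta)$ runs over every possible network on the same nodes, which is why these models are usually estimated by MCMC rather than evaluated directly.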

Source: http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviintroduction.ppt

From Handcock (2006): http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf

Note: this is a very simple dyad independence model.

The dyad-independence model has been extended to other node features.

Lots of other structural features can be included, though not all imply reasonable models.

- The STATNET statistical package in R is the best way to estimate these models.
- We will
  - Walk through exporting our school friendship data from SAS and bringing it into R
  - Specify some simple models
  - Demonstrate getting goodness-of-fit stats on these models
  - Demonstrate simulating from a model
- The set of stats one can add to a model is growing quickly.
- Open statnet_datawrite.sas to see how to create data for export.

Results from a model (takes too long to run in real time!)

    Summary of model fit

    Formula: s_friends ~ edges + mutual + ttriad + nodematch("S3") + nodematch("WHITE") + edgecov(s_clubs, "ovlpec")

    Newton-Raphson iterations: 87
    MCMC sample of size 10000

    Monte Carlo MLE Results:
                            estimate  s.e.       p-value   MCMC s.e.
    edges                   -6.0927   0.1590376  < 1e-04   3.054007
    mutual                   1.7009   0.3217789  < 1e-04   0.716237
    ttriad                   0.4666   0.0003942  < 1e-04   0.006069
    nodematch.S3             1.4469   0.1719817  < 1e-04   0.597009
    nodematch.WHITE          0.9567   0.2931915  0.00110   2.890984
    edgecov.s_clubs.ovlpec   0.2689   0.1585942  0.09001   0.555580

    Null Deviance:     85606.4 on 61752 degrees of freedom
    Residual Deviance:  6867.4 on 61746 degrees of freedom
    Deviance:          78739.0 on     6 degrees of freedom

    AIC: 6879.4    BIC: 6933.6
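
The fit statistics in this output can be checked by hand, which also shows how to read them. The short Python sketch below uses only numbers reported above. It assumes the null model gives every directed dyad a tie probability of 0.5 and that AIC and BIC use the usual deviance-plus-penalty definitions; those assumptions reproduce the reported values but are inferred, not taken from the output itself.

```python
import math

n_dyads = 61752            # null degrees of freedom: directed dyads, n * (n - 1)
n_terms = 6                # edges, mutual, ttriad, two nodematch terms, edgecov
residual_deviance = 6867.4

# Solve n * (n - 1) = n_dyads for the number of students in the network.
n_nodes = round((1 + math.sqrt(1 + 4 * n_dyads)) / 2)

# Deviance of a model in which every possible tie has probability 0.5.
null_deviance = 2 * n_dyads * math.log(2)

aic = residual_deviance + 2 * n_terms
bic = residual_deviance + n_terms * math.log(n_dyads)

# Conditional odds ratio implied by the mutual coefficient.
mutual_odds = math.exp(1.7009)

print(n_nodes, round(null_deviance, 1), round(aic, 1), round(bic, 1))
print(round(mutual_odds, 2))
```

Read this way, the mutual coefficient says that, holding the other terms fixed, a tie that reciprocates an existing nomination has roughly five and a half times the odds of a tie that does not.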