Title: The statistical analysis of personal network data
1The statistical analysis of personal network data
- Part I Cross-sectional analysis
- Part II Dynamic analysis
2A word about quantitative and qualitative approaches
- Quantitative and qualitative approaches play complementary roles in personal network analysis.
- A qualitative pilot study can help to identify important predictors.
- Qualitative analyses can provide insights into the sources of error and temporal instability.
- Quantitative analyses are crucial to determine the statistical effect of characteristics: individuals do not know how, for example, their own constant characteristics influence their network.
3In summary: types of information collected with Egonet
- Information about the respondent (ego), e.g., age, sex, nationality
- Information about the associates (alters) to whom ego is connected, e.g., age, sex, nationality
- Information about the ego-alter pairs, e.g., closeness, frequency and/or means of contact, time of knowing, geographic distance, whether they discuss a certain topic, type of relation (family, friend, neighbour, workmate, etc.)
- Information about the relations among alters as perceived by ego (simply whether they are related or not, or strong/weak/no relation)
4The statistical analysis of personal versus sociocentric networks: what are the differences?
- Whereas sociocentric network researchers often (yet not always) concentrate on a single network, personal network researchers typically investigate a sample of networks.
- The dependency structure of sociocentric networks is complex, which creates the need for specialized social network software. Personal network researchers often hardly use the data on alter-alter relations, so they face a simpler dependency structure...
5Personal network data have a multilevel structure
- E.g., a sample of 20 respondents; for each respondent we collected data on 45 alters, so in total we have a collection of 900 dyads.
[Figure: diagram with the labels "ego" and "alter"]
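As a minimal sketch of this nesting, the 900 dyads can be laid out in a "long" format with one row per ego-alter pair. The field names `ego_id` and `alter_id` are illustrative assumptions, not Egonet's export format.

```python
from itertools import product

N_EGOS = 20      # respondents (level 2), as in the example above
N_ALTERS = 45    # alters elicited per respondent (level 1)

# One row per ego-alter dyad; the field names are hypothetical.
dyads = [
    {"ego_id": ego, "alter_id": alter}
    for ego, alter in product(range(1, N_EGOS + 1), range(1, N_ALTERS + 1))
]

print(len(dyads))  # 900
```

Every row carries its `ego_id`, so the level-2 unit of each dyad stays identifiable in later analyses.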
6Three types of analysis have been used in past
research
- Type I: Aggregated analysis
- Type II: Disaggregated analysis (not okay, forget about it quickly!)
- Type III: Multilevel analysis
7Type 1: Aggregated analysis
- First, aggregate all information to the ego level:
- Compositional variables (aggregated characteristics of alters or ego-alter relations), e.g., percentage of women, average age of the alters, average time of knowing, average closeness
- Structural variables (aggregated characteristics of alter-alter relations), e.g., network size, density of the network, betweenness, number of isolates, cliques
- Then use standard statistical procedures to, e.g.:
- Describe the network composition or structure, or compare them across populations
- Explain the networks (network as a dependent variable)
- Relate the networks to some variable of interest (network as an explanatory variable)
- Statistically correct, provided that you are aware of your level of analysis
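The aggregation step can be sketched as follows. The records, variable names, and tie counts are invented for illustration, and density is taken here as observed alter-alter ties divided by possible alter-alter pairs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical dyad records: (ego_id, alter_sex, closeness on a 1-5 scale).
dyads = [
    ("ego1", "F", 4), ("ego1", "M", 2), ("ego1", "F", 3),
    ("ego2", "M", 5), ("ego2", "M", 1),
]
# Hypothetical number of alter-alter ties reported by each ego.
alter_ties = {"ego1": 2, "ego2": 0}

by_ego = defaultdict(list)
for ego, sex, closeness in dyads:
    by_ego[ego].append((sex, closeness))

def aggregate(ego):
    """Collapse one ego's dyads into ego-level variables."""
    rows = by_ego[ego]
    n = len(rows)
    possible = n * (n - 1) / 2  # number of unordered alter pairs
    return {
        "size": n,
        "pct_women": 100 * sum(sex == "F" for sex, _ in rows) / n,
        "mean_closeness": mean(c for _, c in rows),
        "density": alter_ties[ego] / possible if possible else 0.0,
    }

ego_level = {ego: aggregate(ego) for ego in by_ego}
```

The resulting `ego_level` table has one row per respondent, which is the unit that standard procedures then treat as a case.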
8Example: Effect at network level cannot be interpreted at tie level
10Type 2: Disaggregated analysis
- Disaggregated analysis of dyadic relations (e.g., running a linear regression analysis on the 900 alters) is statistically incorrect, even though it has been done (e.g., Wellman et al., 1997; Suitor et al., 1997).
- Observations of alters are not statistically independent, as standard statistical procedures assume.
- Standard errors are underestimated, and consequently significance is overestimated.
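One way to get a feeling for the size of the problem is the design effect for equal-sized clusters, 1 + (m - 1) * ICC. The slides do not use this formula; it is a standard survey-sampling approximation for a mean, and the intraclass correlation of 0.20 below is an assumed value for illustration only.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

n_dyads, alters_per_ego = 900, 45
icc = 0.20  # assumed intraclass correlation, not an estimate from the study

deff = design_effect(alters_per_ego, icc)   # about 9.8
effective_n = n_dyads / deff                # roughly 92 independent observations
se_inflation = deff ** 0.5                  # factor by which naive SEs are too small
```

Under these assumptions the 900 dyads carry about as much information as 92 independent observations, which is why naive significance tests are too optimistic.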
11Type 3: Multilevel analysis
- Multilevel analysis of dyadic relations
- Multilevel analysis is a generalization of linear regression in which the variance in outcome variables can be analyzed at multiple hierarchical levels. In our case, alters (level 1) are nested within egos/networks (level 2), hence the variance is decomposed into variance between and within networks.
- Software: e.g., MLwiN, HLM, VarCL
- Dependent variable: some characteristic of the dyadic relation (e.g., strength of tie), i.e., networks as the dependent variables. Note: special multilevel models have been developed for discrete dependent variables.
- Explanatory variables can be (among others):
- characteristics of ego (level 2),
- characteristics of alters (level 1),
- characteristics of the ego-alter pairs (level 1).
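A random-intercept model is the simplest instance of this setup. The notation below is an assumption for illustration and is not taken from the slides: i indexes alters (level 1), j indexes egos (level 2), x is a level-1 predictor and z a level-2 predictor.

```latex
Y_{ij} = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2)

\rho = \frac{\tau^2}{\tau^2 + \sigma^2}
```

Here \tau^2 is the variance between networks, \sigma^2 the variance within networks, and \rho the share of the total variance that lies between networks.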
12For a good article about the possibilities of multilevel analysis of personal networks (incl. a quick comparison with aggregated and disaggregated types of analysis), see:
- Van Duijn, M. A. J., Van Busschbach, J. T., & Snijders, T. A. B. (1999). Multilevel analysis of personal networks as dependent variables. Social Networks, 21, 187-209.
13In summary, cross-sectional analysis...
The two types of analysis, even when focusing on the same variable, address different types of questions:
- Multilevel analysis: e.g., what predicts the strength of ties?
- Aggregated analysis: e.g., what predicts the average strength of ties in personal networks?
14Illustration of type I (aggregated analysis): The case of migrants in Spain
- We collected information on about 300 migrants in Catalonia with Egonet (in 2004-2005), from four countries of origin.
- For each respondent, information was collected about:
- Ego (country of origin, years of residence in Spain, sex, age, marital status, level of education, etc.)
- Alters (country of origin, country of living, etc.)
- Ego-alter pairs (closeness, tie strength, type of relation, etc.)
- Relations among alters
15Illustration: The case of migrants in Spain
- Our research questions were:
- Can we distinguish different types of personal networks (profiles) among migrants?
- Can the type of personal network be predicted by the years of residence of a migrant?
- If so, do years of residence still predict network profiles when controlling for other important background characteristics?
16Method
- For each personal network (excluding ego), we first calculated compositional and structural characteristics (aggregate level).
- Then we used the following statistical procedures to analyse the 286 valid cases:
- K-means cluster analysis based on various network characteristics (see next slide), to identify homogeneous groups of networks (network profiles)
- ANOVA to see whether profiles differ in years of residence
- Multinomial logistic regression to predict profile membership from years of residence, controlling for the background variables age, sex, country of origin, and employment
17K-means cluster analysis (SPSS)
- Based on the following network variables (all standardized):
- 1. Proportion of alters whose country of origin is Spain
- 2. Proportion of fellow migrants
- 3. Density
- 4. Network betweenness centralization
- 5. Number of clusters (subgroups) within the network
- 6. Subgroup homogeneity regarding living in Spain
- 7. Average frequency of contact (7-point scale)
- 8. Average closeness (5-point scale)
- 9. Proportion of family in the network
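The procedure (standardize, then cluster) can be sketched in plain Python. This is a bare-bones Lloyd's algorithm, not SPSS's implementation; the six toy networks, the two variables, and the hand-picked starting centers are assumptions for illustration.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores (mean 0, SD 1)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm, starting from the given initial centers."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [mean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical networks: (proportion of Spanish alters, density).
networks = [
    (0.05, 0.80), (0.10, 0.75), (0.08, 0.85),   # dense, few Spanish alters
    (0.50, 0.20), (0.55, 0.25), (0.45, 0.15),   # sparse, many Spanish alters
]
z = standardize(networks)
labels, _ = kmeans(z, centers=[list(z[0]), list(z[3])])
```

Standardizing first keeps variables on wide scales, such as the 7-point contact scale, from dominating the distance over proportions that run from 0 to 1.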
18Results of the cluster analysis
- The five-cluster solution was the most interpretable and reasonably balanced.
- Cluster sizes:
- Profile 1, the scarce network: N = 54
- Profile 2, the dense family network: N = 28
- Profile 3, the multiple subgroups network: N = 73
- Profile 4, the two worlds connected network: N = 75
- Profile 5, the embedded network: N = 50
- Characteristics that contributed most to the cluster partition:
- density
- homogeneity of the subgroups regarding living in Spain
- percentage of Spanish in the network
19Description of profiles
20Profile 1. Scarce network
Color: country of origin (white = foreign, black = Spain). Size: country of living (large = Spain, small = other country).
22Profile 2. Dense family network
24Profile 3. Multiple subgroups network
26Profile 4. Two worlds connected
28Profile 5. Embedded network
29Is the partition related to years of residence?
(ANOVA in SPSS)
- Overall: F(4, 2.67) = 6.634, p < .001
- Per profile: there are two homogeneous subsets that differ significantly in years of residence: profiles 1 and 2 versus profiles 3, 4, and 5.
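For readers who want to see what lies behind the overall F, here is a minimal one-way ANOVA in plain Python. The three groups and their values are invented and are not the study data; with five profiles the between-groups degrees of freedom would be 4.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical years of residence for three profiles (not the study data).
years = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat, df_b, df_w = one_way_anova(years)
```

In this toy example the group means are 2, 3 and 7, giving F = 21 on 2 and 6 degrees of freedom.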
30Is the partition also related to years of
residence when controlled for background
characteristics?
- Multinomial logistic regression (SPSS)
- Age and employment status did not have significant effects.
- Sex and country of origin, however, influenced profile membership significantly; e.g., Senegambians had a higher probability of having a dense family network than others.
- However, even when controlling for these background characteristics, years of residence still predicts cluster membership.
31Conclusion of our illustration
- The network profiles give valuable information about adaptation to a host country.
- The scarce network and the dense family network seem to be transitional networks, whereas the other three seem more settled.
32But...
- In order to investigate whether the networks of
migrants really follow a certain pattern of
change (or multiple patterns depending on for
example country of origin or entry situation), we
need a longitudinal model.
33... and what about the analysis of alter-alter
relations?
- Most researchers are only interested in alter-alter relations to say something about the structure of the personal networks of respondents:
- Use structural measures (density, betweenness, number of cliques, etc.) in an aggregated analysis
- Apply triad census analysis (Kalish & Robins, 2006)
- If you're interested in predicting who is related to whom (among the alters):
- Specify an Exponential Random Graph Model (ERGM) for each network and then run a meta-analysis over the results (cf. Lubbers, 2003; Lubbers & Snijders, 2007)
34ERGMs
- ERGMs are available in, among others, the software StOCNET (where you can find SIENA as well).
- Dependent variable: whether alters are related or not
- Independent variables: characteristics of alters, the relation alters have with ego, the alter-alter pair, and endogenous network characteristics such as transitivity (in the meta-analysis, characteristics of ego can be added as well)
- Type of analysis: apply a common ERGM to each network (leaving ego out), then run a meta-analysis (cf. Lubbers, 2003; Snijders & Baerveldt, 2003; Lubbers & Snijders, 2007).
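The second step, combining per-network estimates, can be illustrated with plain inverse-variance pooling. This is a deliberately simplified fixed-effect sketch, not the procedure of the cited papers, which may also model variation between networks. The estimates and standard errors are invented.

```python
from math import sqrt

def pooled_estimate(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, sqrt(1 / total)

# Hypothetical transitivity estimates from three separately fitted networks.
theta = [0.40, 0.60, 0.50]
se = [0.10, 0.20, 0.10]
pooled, pooled_se = pooled_estimate(theta, se)
```

Networks with more precise estimates (smaller standard errors) receive more weight, so the pooled value here lies closer to 0.40 and 0.50 than to 0.60.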
35Part II. Dynamic analysis
- How do personal networks change over time?
- Data on personal networks are collected in two or
more waves in a panel study
36Interest in dynamic analysis
- Networks at one point in time are snapshots, the results of an untraceable history (Snijders)
- E.g., personal communities in Toronto (Wellman et al.)
- Changes following a focal life event (individual level)
- E.g., transition from high school to university (Degenne & Lebeaux, 2005); childbearing, moving, return to school in midlife (Suitor & Keeton, 1997); retirement (Van Tilburg, 1992); marriage (Kalmijn et al., 2003); divorce (Terhell, Broese Van Groenou, & Van Tilburg, 2007); widowhood (Morgan, Neal, & Carder, 2000); migration (Molina et al.)
- Broader studies of social change: social and cultural changes in countries with dramatic institutional changes
- E.g., post-communism in Finland, Russia (Lonkila, 1998), and Eastern Germany (Völker & Flap, 1995)
37Types of dynamic personal network research
(networks as dependent variables)
- Feld et al. (2007), Field Methods 19, 218-236
43Illustration: The case of migrants in Spain
- Migrants in Catalonia (Barcelona, Vic, Girona).
- We collected information about the personal networks of about 300 migrants (in 2004-2005).
- Sample of 90 individuals for the second wave (1.5 to 2 years later on average).
- The questionnaire at t2 was identical to that at t1, but supplemented with queries about the changes, such as about alters who disappeared from the network.
- For the present illustration, we focus on Argentinean migrants only (part of the interviews; N = 22).
44Type 1: Persistence of ties with alters across time
- Dependent variable: whether or not a tie persists to a subsequent time (dichotomous)
- Explanatory variables: characteristics of ego, alter, the ego-alter pair, and the situation, especially in combination with the initial characteristics of the relationship
- Type of analysis: logistic multilevel analysis
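A random-intercept logistic model is one common form of such an analysis. The notation is an assumption for illustration and does not come from the slides: Y_ij = 1 if the tie between ego j and alter i persists, x is a level-1 predictor (alter or ego-alter pair) and z a level-2 predictor (ego).

```latex
\operatorname{logit} P(Y_{ij} = 1)
  = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + u_{0j},
\qquad u_{0j} \sim N(0, \tau^2)
```

The random intercept u_{0j} captures that some egos keep more of their ties than others, beyond what the predictors explain.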
45Illustration type 1: The case of migrants in Spain
- Cases: 900 alters nested within 20 respondents
- Descriptive: How persistent are ties over time?
- 53% of these alters were nominated again in Wave 2 (N = 473), whereas 47% of the nominations were not repeated (N = 427).
- Explanatory: What predicts the persistence of ties over time?
- Logistic multilevel analysis (see Table 1)
46Table 1. Regression coefficients and standard errors (in brackets) of the logistic multilevel regression model predicting persistence of ties (N = 900).
47Additionally: Differences between dissolved and new ties
- Are the new ties qualitatively better than the broken ones?
- Alters newly nominated in Wave 2 were somewhat more frequently contacted (3.2 versus 2.8 on the frequency of contact scale; t = 5.32, df = 888, p < .001) and somewhat closer (2.9 versus 2.4 on closeness; t = 3.70, df = 888, p < .001) than the alters who were not nominated again in Wave 2.
- Furthermore, new relations were somewhat more often family members (18%) than relations that were broken (12%; χ² = 6.03, df = 1, p < .05). Involution?
48Type 2: Changes in characteristics of persistent ties across time
- Dependent variable: change in some characteristic of the relationship (e.g., change in strength of tie)
- Explanatory variables: characteristics of ego, alter, the ego-alter pair, and the situation, especially in combination with the initial characteristics of the relationship
- Type of analysis: multilevel analysis
49Illustration type 2: The case of migrants in Spain
- Cases: 473 persistent ties
- Descriptive:
- There was a fair amount of change in frequency of contact (M_t1 = 3.50, M_t2 = 2.94; t = 8.231, df = 472, p < .05) and less change in closeness in stable ties (M_t1 = 3.68, M_t2 = 3.87; t = -4.065, df = 472, p < .05).
- Explanatory:
- Multilevel analysis (see Table 2).
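The reported df of 472 (473 ties minus 1) is consistent with a paired-samples t-test, although the slides do not name the test. Under that assumption, the statistic can be sketched as follows with invented scores. As with the disaggregated analysis above, such a test treats ties as independent and is therefore descriptive only.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t statistic (before minus after) and its df."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical contact-frequency scores for five persistent ties.
t1 = [4, 3, 5, 4, 3]
t2 = [3, 3, 4, 2, 3]
t_stat, df = paired_t(t1, t2)
```

A positive t here means the scores were higher at t1 than at t2, matching the sign convention of the frequency-of-contact result above.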
50Table 2. Regression coefficients and standard errors (in brackets) of the multilevel regression model predicting changes in frequency of contact and closeness in stable ties (N = 473).
p < .05
51Type 3: Changes in the size of the network across time
- Dependent variable: change in the number of ties in the personal network
- Explanatory variables: characteristics of ego, of the set of alters, and of the situation, especially in combination with the initial characteristics of the network
- Type of analysis: regression analysis
52Illustration type 3: The case of migrants in Spain
- The size of the network was fixed at 45 alters in
both waves, so this type of analysis cannot be
illustrated with our data.
53Type 4: Changes in overall network characteristics across time
- Dependent variable: change in a compositional or structural variable (e.g., percentage of alters with higher education, density of the network)
- Explanatory variables: characteristics of ego, of the set of alters, and of the situation, especially in combination with the initial characteristics of the network
- Type of analysis: regression analysis
54Illustration type 4: The case of migrants in Spain
- Cases: 22 respondents.
- The network stability of the 22 respondents was on average 53% (SD = 13.6), and varied between 29% and 76% among respondents.
- How do the composition and structure of the networks (the stable and unstable parts together) change over time?
- Descriptive: Overall, the network characteristics hardly changed over time (Table 3). The only characteristics that differed significantly between Wave 1 and 2 were average closeness and betweenness, both of which increased slightly over the years.
- Explanatory: These changes could not be predicted by ego characteristics (using a regression analysis at ego level); the most important predictor of the change was the value of the variable at t1 (regression to the mean).
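Network stability per respondent can be computed as below, assuming it is defined as the share of wave-1 alters that are nominated again in wave 2. The slides do not spell out the definition, and the alter names are invented.

```python
def stability(wave1_alters, wave2_alters):
    """Percentage of wave-1 alters that ego nominates again in wave 2."""
    wave1, wave2 = set(wave1_alters), set(wave2_alters)
    return 100 * len(wave1 & wave2) / len(wave1)

# Hypothetical alter identifiers for one respondent.
w1 = ["ana", "luis", "marta", "pau"]
w2 = ["ana", "marta", "jordi", "nuria"]
print(stability(w1, w2))  # 50.0
```

Computing this once per ego gives an ego-level variable that can enter a regression analysis at the ego level.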
55Table 3. Means and standard deviations of the compositional variables of the personal networks at t1 and t2 (N = 22), correlations between the two waves, and t-tests of differences between the two waves.
p < .05
56Conclusions from the illustration
- There is quite some instability in the personal relations of Argentinean immigrants in Catalonia, most importantly in their peripheral relations.
- Relational characteristics predict the persistence of ties, whereas demographic characteristics of ego affect the flux and flow within their persistent ties.
- These quantitative analyses suggest that important changes in the number of active contacts and/or changes in ties (from 30-70%) are compatible with overall stability in network composition.
57Further analyses
- We will investigate (based on all 90 respondents)
whether persons with different network profiles
at t1 have different patterns of changes in their
networks, indicating different ways of
assimilation to Spain.
58So what about the dynamics of alter-alter
relations?
- ... Let's propose a type 5?
59Type 5: Changes in ties among alters across time
- Dependent variable: whether alters make new ties or break existing ties with other alters across time
- Independent variables: characteristics of alters, the relation alters have with ego, the alter-alter pair, and endogenous network characteristics such as transitivity (in the meta-analysis, characteristics of ego can be added as well)
- Type of analysis: apply a common SIENA model to each network (leaving ego out), then run a meta-analysis (cf. Lubbers, 2003; Snijders & Baerveldt, 2003; Lubbers & Snijders, 2007). A multilevel version of SIENA is on the agenda.
60SIENA makes assumptions which seem to be violated
in personal networks
- It is assumed that people act strategically/rationally within the network, so the network should make sense to them and they should know who the alters are.
- Thoughts on strategic behaviour and robustness:
- Strategic behaviour among alters also occurs in personal networks, e.g., befriending the friends of friends.
- In sociocentric networks, people are also influenced by others outside the network (e.g., out-of-school friends).
- In large sociocentric networks (e.g., an organisation), people do not know all alters either.
61Illustration of type 5: Changes in ties among alters across time
- We are currently applying SIENA to each case.
- In a meta-analysis, we can then investigate whether, for example, a significant tendency towards transitivity among alters is related to more stability in the relations between ego and the alters.
62Case study: Norma's network at t1
63Case study: Norma's network at t2
64Case study: Norma's network at t2 (new contacts depicted in red)
65Case study: SIENA analysis of Norma's network
- In Norma's network, there are 62 actors (28 stable actors, 17 who come and 17 who go). Of the 378 pairs of stable actors, 292 are not related at any moment, 64 are related at both moments, 15 only at t1 and 7 only at t2.
- Statistical results: the following effects were significant (apart from degree):
- Similarity in the frequency of contact between alters: if two alters had about the same frequency of contact with ego, they had a higher probability of having a relation themselves.
- Transitivity: if A and B are related, and B and C as well, then it is likely that A and C also become related (but note that A and C already had a transitive relation via the invisible ego!).
- Alter is family of ego or not: the family members of ego have a lower tendency to contact other alters than the other network members.
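The counts in this case study can be checked with a few lines: 28 stable actors form 28 * 27 / 2 = 378 unordered pairs, and the four tie-change categories add up to that number. The variable names are illustrative.

```python
from math import comb

stable, arriving, leaving = 28, 17, 17
assert stable + arriving + leaving == 62      # actors reported in total

pairs_among_stable = comb(stable, 2)          # unordered alter-alter pairs
never, both_waves, only_t1, only_t2 = 292, 64, 15, 7
assert never + both_waves + only_t1 + only_t2 == pairs_among_stable == 378

# Share of pairs whose tie status changed between the two waves.
changed_share = (only_t1 + only_t2) / pairs_among_stable
```

Only about 6% of the pairs among stable actors changed status, so the model estimates for this single network rest on few observed changes.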
66Sources of change in (personal) networks
- Unreliability due to measurement error
- Inherent instability
- Systemic change
- External change
- Leik & Chalkley (1997), Social Networks 19, 63-74
- Researchers should consider the potential impact of the error sources (measurement error and inherent instability) on the substantive conclusions! E.g., plan a pilot study, supplement with qualitative analyses, or calculate the test-retest reliability of the network and of scales of closeness, etc.
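Test-retest reliability of a scale is often summarized with the Pearson correlation between two administrations. A minimal sketch with invented closeness ratings follows; which reliability coefficient suits a given study is a separate design choice.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical closeness ratings of the same five alters, asked twice.
test = [1, 2, 3, 4, 5]
retest = [2, 2, 3, 5, 5]
r = pearson_r(test, retest)
```

A value close to 1 suggests that differences between waves reflect real change rather than measurement noise.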
68Conclusion
- Multiple statistical methods are available for personal network research, depending on your research interest.
- Combining several methods probably gives the greatest insight.
69Thanks!
- My e-mail: MirandaJessica.Lubbers_at_uab.es