Title: Social Sub-groups
1Social Sub-groups
- Overview
- Background
  - How do we characterize the social structure of a group?
  - Exemplar: Ken Frank and Jeffrey Yasumoto
    - A discussion of how action is situated within and between social groups. Nice application of a group-detection algorithm on interesting data.
  - Linton Freeman
    - UC-Irvine. Long-standing editor of the journal Social Networks. Writes today on the theoretical necessities of a group.
  - Rise of Stat/Math: Modularity from Newman, Porter, Mucha
- Methods: Algorithms, Measures
2Social Sub-groups
Frank & Yasumoto: Action and Structure
...subgroups may define the essential components that contextualize actors' social ties and relations. The predominance of subgroups in the literature, ...leaves unanswered how and why rational actors simultaneously sustain their subgroups and the linkages between them.
3Social Sub-groups
Frank & Yasumoto: Action and Structure
They argue that actors seek social capital, defined as the access to resources through social ties, and emphasize two mechanisms:
a) Reciprocity Transactions: Actors seek to build obligations with others, and thereby gain in the ability to extract resources.
b) Enforceable Trust: Social capital is generated by individual members' disciplined compliance with group expectations. An indirect, group-level effect that comes through the judicious non-use of negative action. (p.646)
4Social Sub-groups
Frank & Yasumoto: Action and Structure
They expect to find evidence of enforceable trust within social subgroups and evidence of reciprocity between such groups. To do so, they must identify primary subgroups within the network, which they do using a density-based criterion. Frank's algorithm iteratively reassigns nodes to subgroups until the parameter capturing in-group density is maximized. The basic model is $\operatorname{logit}(Y_{ij}) = a + b\,g_{ij}$, where $g_{ij}$ indicates whether i and j are assigned to the same group. The goal is to find an assignment of nodes to groups (g) that maximizes fit. This results in a block-diagonal adjacency matrix, where most of the ties fall along the diagonal.
5Relations among the French Financial Elite (as drawn by F&Y)
Relations within groups are weighted more heavily than relations between groups to generate this picture.
Group-weighted MDS
6Relations among the French Financial Elite (as drawn by F&Y)
Treating all edges equally gives a somewhat less clear pattern.
7Relations among the French Financial Elite: Group-to-group density table
8Relations among the French Financial Elite
Given a subgroup structure, how do these groups relate to social capital?
Enforceable trust: Look for acts of hostility. A hostile act was any action on the part of one actor that would deprive another actor of access to resources. Note that these were rare: only 15 overall, likely indicating some level of cohesion in the system as a whole. On the whole, they find that -- net of other focal features and direct ties -- being members of the same sub-group lowers the probability of a negative action between the dyad.
9Relations among the French Financial Elite
They repeat the exercise with positive
support. They find that supportive actions are
better predicted by friendship (reciprocity) than
by subgroup membership. They conclude that this
supports the hypothesis that the potential for
enforceable trust within subgroups reduces the
relative need to pursue social capital through
reciprocity transactions within subgroups.
(p.647) Instead, they find that support occurs
between subgroups.
10Social Sub-groups
Lin Freeman The sociological concept of Group
Focus on collectivities that are relatively small, informal, and involve close personal ties: what we would call "Primary Groups." What (network) structure characterizes such a group?
Goal: Identify (a) non-overlapping groups that allow one to (b) identify internal group structure.
11Social Sub-groups
Lin Freeman The sociological concept of Group
Winship's Model
1) Assign people to equivalence classes that are
hierarchically nested
12Social Sub-groups
Lin Freeman The sociological concept of Group
Winship's Model
To assign people to a class, you must first identify the strength of the relation between each pair. Winship's model says that you define proximity based on interaction such that three conditions hold.
13Social Sub-groups
Lin Freeman The sociological concept of Group
Winship's Model
In words, this means that whatever metric you define, a person is closer to themselves than to anyone else, the relation is symmetric, and triads are transitive (which, given the symmetry condition, means that they are complete). You can then identify partitions by scaling the proximity such that these three conditions are met.
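A minimal Python sketch of checking these three conditions on a proximity matrix. The function name `satisfies_winship`, the "higher means closer" coding, and the reading of transitivity as "x and z are at least as close as the weaker of x-y and y-z" are assumptions made for illustration, not Winship's or Freeman's own notation.

```python
from itertools import permutations

def satisfies_winship(prox):
    """Check a square proximity matrix (higher = closer) against three conditions."""
    nodes = range(len(prox))
    # 1) Everyone is at least as close to themselves as to anyone else.
    self_closest = all(prox[i][i] >= prox[i][j] for i in nodes for j in nodes)
    # 2) The relation is symmetric.
    symmetric = all(prox[i][j] == prox[j][i] for i in nodes for j in nodes)
    # 3) Transitivity at every level of proximity.
    transitive = all(
        prox[x][z] >= min(prox[x][y], prox[y][z])
        for x, y, z in permutations(nodes, 3)
    )
    return self_closest and symmetric and transitive

# Two nested pairs: {0, 1} and {2, 3} are close inside, distant across.
nested = [
    [3, 2, 1, 1],
    [2, 3, 1, 1],
    [1, 1, 3, 2],
    [1, 1, 2, 3],
]
# 0-1 and 1-2 are close, but 0-2 has no proximity at all.
broken = [
    [3, 2, 0],
    [2, 3, 2],
    [0, 2, 3],
]
print(satisfies_winship(nested))  # True
print(satisfies_winship(broken))  # False
```

Cutting a matrix that passes this check at successively higher proximity levels gives the hierarchically nested classes the model calls for.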
14Social Sub-groups
Lin Freeman The sociological concept of Group
Winship's Model
15Social Sub-groups
Lin Freeman The sociological concept of Group
Winship's Model
[Figure: hierarchically nested classes. The total group splits into A-G and H-K; A-G splits into A-C and D-G.]
16Social Sub-groups
Lin Freeman The sociological concept of Group
Granovetter's Model
Proceed exactly as in Winship, but treat intransitivity differently depending on whether the ties are strong or weak.
If x and y are strongly connected, and y and z are strongly connected, then x and z should be at least weakly connected.
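The rule above can be checked mechanically. The sketch below is illustrative only: the coding of ties as 2 (strong), 1 (weak), and 0 (absent) and the function name `g_intransitive_triples` are assumptions, not part of Freeman's or Granovetter's presentation.

```python
from itertools import permutations

def g_intransitive_triples(tie):
    """Return ordered triples (x, y, z) where x-y and y-z are strong but x-z is absent.

    tie[x][y] is 2 for a strong tie, 1 for a weak tie, 0 for no tie.
    """
    nodes = range(len(tie))
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if tie[x][y] == 2 and tie[y][z] == 2 and tie[x][z] == 0
    ]

# 0-1 and 1-2 are strong, 0-2 is weak: allowed.
ok = [[0, 2, 1],
      [2, 0, 2],
      [1, 2, 0]]
# 0-1 and 1-2 are strong, 0-2 is missing: forbidden.
bad = [[0, 2, 0],
       [2, 0, 2],
       [0, 2, 0]]
print(g_intransitive_triples(ok))   # []
print(g_intransitive_triples(bad))  # [(0, 1, 2), (2, 1, 0)]
```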
17Social Sub-groups
Lin Freeman The sociological concept of Group
Granovetter's Model
An example of a graph fitting the prohibition
against G-intransitive relations.
18Social Sub-groups
The Davis - Old South Example
19Social Sub-groups
The Davis - Old South Example: Ties > 2
20Social Sub-groups
The Davis - Old South Example: Ties > 3
21Social Sub-groups
The Davis - Old South Example: Ties > 4
Meets the G-transitivity condition
22Social Sub-groups
The Davis - Old South Example: Ties > 5
Stronger than the G-transitivity condition
23Social Sub-groups
Lin Freeman The sociological concept of Group
Freeman argues that the G-intransitivity model fits the data best for each of the 7 groups he studies. Substantively, the types of groups this model predicts are very similar to those predicted by the general transitivity model, except re-cast as a valued relation. Empirically, if you want to identify groups based on levels like this, you can use PAJEK and walk through the model in just the same way as we did with Old South, or you can use UCINET (or program it; it's not hard).
24Methods How do we identify primary groups in a
network?
- A) Classic graph theoretical methods: cliques and extensions of cliques
  - Cliques
  - k-cores
  - k-plexes
  - Freeman (1992) models
  - K-components (we talked about these already)
- B) Algorithmic methods: search through a network trying to maximize for a particular pattern (i.e., like Frank & Yasumoto)
  - Adjust assignment of actors to groups until a particular pattern of ties (block diagonal, usually) is identified.
  - Standard models
    - Factions (UCINET)
    - KliqueFinder (Frank)
    - RNM/CROWDS/JIGGLE (Moody)
    - Principal component analysis (PCA)
    - Flow models (MCL)
    - Modularity maximization routines
  - General distance clustering methods
25Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
Start with a clique. A clique is defined as a maximal subgraph in which every member of the subgraph is connected to every other member. Cliques are collections of nodes where density = 1.0.
- Properties of cliques
  - Density = 1.0
  - Everyone is connected to n-1 alters
  - Distance between every pair is 1
  - Ratio of within-group ties to between-group ties is infinite
  - All triads are transitive
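As a small illustration of the definition, the sketch below tests whether a set of nodes is a clique by checking density and maximality. The adjacency-dictionary layout and the function names `density` and `is_clique` are assumptions chosen for this example.

```python
from itertools import combinations

def density(adj, members):
    """Share of possible undirected ties among `members` that are present."""
    members = list(members)
    possible = len(members) * (len(members) - 1) / 2
    present = sum(1 for i, j in combinations(members, 2) if j in adj[i])
    return present / possible if possible else 0.0

def is_clique(adj, members):
    """True if every pair is tied and no outsider is tied to all members."""
    members = set(members)
    if density(adj, members) < 1.0:
        return False
    outsiders = set(adj) - members
    # Maximality: nobody outside could be added without breaking density 1.0.
    return not any(members <= adj[v] for v in outsiders)

# A triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(is_clique(adj, ["a", "b", "c"]))  # True
print(is_clique(adj, ["a", "b"]))       # False: c could still be added
```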
26Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
In practice, complete cliques are not very
useful. They tend to overlap heavily and are
limited in their size.
Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See the Moody & White paper on cohesion for a discussion of many of these attempts.
27Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
k-cores: Every person is connected to at least k other people in the group.
Ideally, they would look something like this (here, two 3-cores). However, adding a single tie from A to B would make the whole graph a 3-core.
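One common way to find k-cores is to peel away low-degree nodes until everyone left has at least k ties to the others who remain. The sketch below assumes an undirected network stored as an adjacency dictionary; the names `k_core` and `complete` are hypothetical helpers for this example.

```python
def k_core(adj, k):
    """Return the nodes that survive repeatedly dropping anyone with fewer than k ties."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive

def complete(nodes):
    """Adjacency dictionary for a fully connected set of nodes."""
    return {v: set(nodes) - {v} for v in nodes}

# Two separate 4-person cliques (each a 3-core) plus a pendant node 8 tied to node 0.
adj = {**complete([0, 1, 2, 3]), **complete([4, 5, 6, 7])}
adj[8] = {0}
adj[0].add(8)

print(sorted(k_core(adj, 3)))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(k_core(adj, 4)))  # []
```

Note that the function returns every node belonging to some k-core; the separate cores are the connected components of that set, which is why a single bridging tie is enough to merge two of them.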
28Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
Extensions of this idea include:
K-plex: Every member is connected to at least n-k other people in the graph (recall that in a clique everyone is connected to n-1, so this relaxes that condition).
n-clique: Every person is connected by a path of length n or less (recall that a clique has distance 1).
n-clan: Same as an n-clique, but all paths must be inside the group.
I've never had much luck with any of these methods empirically. Real data is usually too messy for them to work well. You should try them, and gain some intuition for yourself. The place to start is in UCINET.
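To make the k-plex relaxation concrete, here is a short sketch that checks only the degree condition (it does not test maximality). The function name `is_k_plex` and the adjacency-dictionary layout are assumptions for illustration.

```python
def is_k_plex(adj, members, k):
    """True if every member is tied to at least n - k other members (n = group size)."""
    members = set(members)
    n = len(members)
    return all(len(adj[v] & members) >= n - k for v in members)

# A 4-cycle: everyone has 2 ties inside a group of 4.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(is_k_plex(cycle, [0, 1, 2, 3], 2))  # True: 2 >= 4 - 2
print(is_k_plex(cycle, [0, 1, 2, 3], 1))  # False: a 1-plex is a clique
```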
29Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
UCINET will compute all of the best-known graph
theoretic treatments for subgroups
30Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
Consider running different methods on a known
group structure
31Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
32Methods How do we identify primary groups in a
network?
Graph Theoretical Models.
Cliques
33Methods How do we identify primary groups in a
network?
Cliques
The only way to get something meaningful from this is to analyze the clique overlap matrix, which is what the Clique by partition dataset does, using cluster analysis.
34Methods How do we identify primary groups in a
network?
K-Cores
(See example, but in this case it works very
poorly)
35Methods How do we identify primary groups in a
network?
n-Clique (Everyone linked by a path of length n or less)
36Methods How do we identify primary groups in a
network?
n-Clique (Everyone linked by a path of length n or less)
37Methods How do we identify primary groups in a
network?
n-Clan (Everyone linked by a path of length n or less, but the path is INSIDE the group)
38Methods How do we identify primary groups in a
network?
K-plex (each member of a K-plex of size N has at least N-K ties to other members)
39Methods How do we identify primary groups in a
network?
Strategies for identifying primary groups:
Search:
1) Fit measure: Identify a measure of groupness (usually a function of the number of ties that fall within group compared to the number of ties that fall between group).
2) Algorithm to maximize fit: Once we have the index, we need a clever method for searching through the network to maximize the fit. See Jiggle, Factions, etc.
Destroy: Break apart the network in strategic ways, removing the weakest parts first; what's left are your primary groups. See edge betweenness, MCL.
Evade: Don't look directly; instead, find a simpler problem that correlates. Examples: generalized cluster analysis, factor analysis, RNM.
40Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Segregation Index (Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.)
Freeman asked how we could identify segregation
in a social network. Theoretically, he argues,
if a given attribute (group label) does not
matter for social relations, then relations
should be distributed randomly with respect to
the attribute. Thus, the difference between the
number of cross-group ties expected by chance and
the number observed measures segregation.
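A minimal sketch of this logic, under stated assumptions: the network is undirected, the expected number of cross-group ties is the total number of ties times the share of all pairs that cross groups, and the index is scaled as (expected - observed) / expected. The function name `segregation_index` and the toy data are illustrative, not Freeman's own notation.

```python
from itertools import combinations

def segregation_index(edges, group):
    """(expected cross-group ties - observed) / expected, for an undirected network."""
    pairs = list(combinations(list(group), 2))
    cross_pairs = sum(1 for i, j in pairs if group[i] != group[j])
    # Under random mixing, ties land on cross-group pairs in proportion to their share.
    expected = len(edges) * cross_pairs / len(pairs)
    observed = sum(1 for i, j in edges if group[i] != group[j])
    return (expected - observed) / expected

# Two triangles joined by a single bridge; groups match the triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
group = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(segregation_index(edges, group), 2))  # 0.76
```

Values near 1 mean far fewer cross-group ties than chance would produce; values below 0 mean more cross-group ties than expected.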
41Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Consider the (hypothetical) network below. There are two attributes in this network: people with blue eyes or brown eyes, and people who are square or not (they must be hip).
42Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Segregation Index
[Figure: mixing matrices for the two attributes, with Seg = -0.25 and Seg = 0.78]
43Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Segregation Index
One problem with the segregation index is that it is not margin free. That is, if you were to change the distribution of the category of interest (say race) by a constant, but not the core association between race and friendship choice, you could get a different segregation level. One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.
44Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Odds Ratios
The odds ratio tells us how much more likely people in the same group are to nominate each other. You calculate the odds ratio from the number of ties within and between groups and the groups' relative sizes, using the following table:

                Same Group   Different Group
Friends             A              B
Not Friends         C              D

OR = AD / BC
45Odds Ratios
There are 6 hip people and 9 square people in this network. This implies the following numbers of observed and possible ties in the network.

Observed ties:
          Hip   Square   Total
Hip        20      3       23
Square      3     30       33
Total      23     33       56

Possible ties (diagonal: n_i(n_i - 1); off-diagonal: n_i n_j):
          Hip   Square
Hip        30     54
Square     54     72

Friend by group:
               Same   Dif
Friend   Yes     50     6
         No      52   102

OR = (50)(102) / (52)(6) = 16.35
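The arithmetic in this example can be reproduced with a few lines of Python. The function name `same_group_odds_ratio` and the dictionary layout are assumptions for illustration; the counts are the ones given on the slide, treated as directed nominations.

```python
def same_group_odds_ratio(sizes, ties):
    """Odds of a tie within groups relative to the odds of a tie between groups.

    sizes: {group: number of members}
    ties:  {(sender_group, receiver_group): observed directed ties}
    """
    n = sum(sizes.values())
    same_ties = sum(t for (g, h), t in ties.items() if g == h)
    diff_ties = sum(t for (g, h), t in ties.items() if g != h)
    same_possible = sum(m * (m - 1) for m in sizes.values())   # diagonal cells
    diff_possible = n * (n - 1) - same_possible                # off-diagonal cells
    a, b = same_ties, diff_ties                                # friends
    c, d = same_possible - same_ties, diff_possible - diff_ties  # not friends
    return (a * d) / (b * c)

sizes = {"hip": 6, "square": 9}
ties = {
    ("hip", "hip"): 20, ("hip", "square"): 3,
    ("square", "hip"): 3, ("square", "square"): 30,
}
print(round(same_group_odds_ratio(sizes, ties), 2))  # 16.35
```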
46Complete Network Analysis Network Connections
Social Subgroups
Segregation index compared to the odds ratio
[Scatterplot: Friendship Segregation Index against Log(Same-Sex Odds Ratio); r = .95]
47Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
The second problem is that the Segregation index has no clear maximum: if every node is assigned to a single group, the value can be higher than if everyone is assigned to the right group. This means you can't just keep adjusting nodes until you see a best fit, but instead have to look for changes in fit. The modularity score solves this problem by re-organizing the expectation in a way that forces the value to 0 if everyone is in a single group.
48Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
We can also measure the extent to which ties fall within clusters with the modularity score:
$$M = \sum_{s} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right]$$
where s indexes clusters in the network, $l_s$ is the number of lines in cluster s, $d_s$ is the sum of the degrees of the nodes in s, and L is the total number of lines.
M has the advantage of going to 0 if there is only 1 group, which means maximizing the score is sensible.
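A short sketch of the modularity calculation, assuming the standard form in which each cluster contributes its share of lines minus the squared share of degree, (l_s / L) - (d_s / 2L)^2. The function name `modularity` and the toy network are illustrative.

```python
def modularity(edges, cluster):
    """Sum over clusters s of l_s / L - (d_s / 2L)^2 for an undirected edge list."""
    L = len(edges)
    lines_within = {}  # l_s: lines with both ends in cluster s
    degree_sum = {}    # d_s: sum of the degrees of the nodes in cluster s
    for i, j in edges:
        for v in (i, j):
            degree_sum[cluster[v]] = degree_sum.get(cluster[v], 0) + 1
        if cluster[i] == cluster[j]:
            lines_within[cluster[i]] = lines_within.get(cluster[i], 0) + 1
    return sum(
        lines_within.get(s, 0) / L - (d / (2 * L)) ** 2
        for s, d in degree_sum.items()
    )

# Two triangles joined by a single bridge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}

print(round(modularity(edges, two_groups), 3))  # 0.357
print(modularity(edges, one_group))             # 0.0
```

The second call shows the property noted above: putting everyone in a single group gives a score of 0, so a higher score always reflects a better split.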
49Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Modularity Scores Comparison to Segregation Index
comparing values for known solutions
Modularity Score Plotted against Segregation
Index for various nets
50Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Number of groups ?
In-group Density ?
51Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
- Factions in UCINET
  - Multiple options for the exact factor maximized. I recommend either the density or the correlation function, and I would calculate the distance in each case.
- Frank's KliqueFinder (the AJS paper we just read)
- Moody's crowds / Jiggle has elements of this
- Generalized blockmodel in PAJEK
- igraph (R) has a couple of routines of this sort
52Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
Factions in UCI-NET
53Factions in UCI-NET
54Factions in UCI-NET
55Factions in UCI-NET
Reduced BlockMatrix
       1    2    3    4    5    6
      --   --   --   --   --   --
  1   59    1    2   14    1    0
  2    1   54    0    1   12    2
  3    1    2   55    0    1   12
  4    9    1    1   51    0    0
  5    0   12    2    0   62    1
  6    1    0    9    2    0   64
Fit perfectly
56Methods How do we identify primary groups in a
network?
Search Optimize a partition to fit
UCINET
- Biggest drawbacks of FACTIONS are
- SLOW
- Have to specify the number of groups.
57Methods How do we identify primary groups in a
network?
Search Find a cheap indicator, and
cluster/optimize that
- There are many similar approaches; my sense is that the best approaches use a combination of strategies.
  - CROWDS: Starts with an RNM-based clustering, then shifts nodes to maximize fit. Includes sub-loops to merge and re-split groups.
  - JIGGLE: Starts with a PCA on a weighted matrix, then proceeds as in CROWDS (without the group splitting trick).
  - Generalized Blockmodel in PAJEK: Uses a simulated annealing procedure to try to directly maximize fit.
58- PAJEK Generalized Blockmodel
59- PAJEK Generalized Blockmodel
60- PAJEK Generalized Blockmodel
Fits fine, but it's slow!
61This is a direct optimization of Modularity
62Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Cluster analysis
In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are close to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other in some degree. A very good reference is the SAS/STAT manual section called "Introduction to Clustering Procedures" (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm). (See also Wasserman and Faust, though the coverage is spotty.) We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.
63Cluster analysis
Imagine a set of objects (say people) arrayed in
a two dimensional space. You want to identify
groups of people based on their position in that
space. How do you do it?
[Figure: scatterplot of people; axes: How Smart you are, How Cool you are]
64Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Start by choosing a pair of people who are very close to each other (such as 15 & 16) and now treat that pair as one point, with a value equal to the mean position of the two nodes.
65Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Now repeat that process for as long as possible.
66Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
This process is captured in the cluster tree
(called a dendrogram)
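The merge-and-repeat process can be sketched in a few lines of Python. This is a simplified illustration, not the SAS procedure: the function name `agglomerate` is hypothetical, and each merged cluster is placed at the unweighted mean of the two positions it joins, as in the description above.

```python
from math import dist

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    points: {label: (x, y)}. The history, read top to bottom, is the
    information a dendrogram displays.
    """
    clusters = {(label,): xy for label, xy in points.items()}
    history = []
    while len(clusters) > 1:
        keys = list(clusters)
        # Find the closest pair of current clusters.
        a, b = min(
            ((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
            key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]),
        )
        (ax, ay), (bx, by) = clusters.pop(a), clusters.pop(b)
        # Treat the merged pair as one point at the mean position.
        clusters[a + b] = ((ax + bx) / 2, (ay + by) / 2)
        history.append((a, b))
    return history

points = {"p1": (0, 0), "p2": (0, 1), "p3": (5, 5), "p4": (5, 6.5)}
for step in agglomerate(points):
    print(step)
```

Cutting the history before the final merge leaves two clusters, {p1, p2} and {p3, p4}, which is what cutting a dendrogram at a chosen height does.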
67Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
- As with the network cluster algorithms, there are many options for clustering. The three that I use most are:
  - Ward's Minimum Variance -- the one I use almost 95% of the time
  - Average Distance -- the one used in the example above
  - Median Distance -- very similar
- Again, the SAS manual is the best single place I've found for information on each of these techniques.
- Some things to keep in mind:
  - Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.
  - This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.
68Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Cluster analysis
The data in this scatter plot are produced using this code:
data random;
  do i=1 to 20;
    x=rannor(0);
    y=rannor(0);
    output;
  end;
run;
69Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Cluster analysis
Resulting dendrogram
70Resulting cluster solution
Cluster analysis
71Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Cluster analysis
Cluster analysis works by building a distance matrix between each pair of points. In the example above, it used the Euclidean distance, which in two dimensions is simply the physical distance between the points in a plot. It can work on any number of dimensions. To use cluster analysis in a network, we base the distance on the path distance between pairs of people in the network. Consider again the blue-eye hip example.
72Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Cluster analysis
Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0
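A path-distance matrix like this one can be computed from an adjacency list with breadth-first search. The sketch below is a generic illustration on a toy four-node chain, not the blue-eye network itself; the function name `path_distances` is an assumption.

```python
from collections import deque

def path_distances(adj):
    """All-pairs shortest path lengths by breadth-first search (None if unreachable)."""
    nodes = sorted(adj)
    matrix = []
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen[w] = seen[v] + 1
                    queue.append(w)
        matrix.append([seen.get(target) for target in nodes])
    return matrix

# A simple chain 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
for row in path_distances(chain):
    print(row)
```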
73Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
The distance matrix implies a space that nodes
are embedded within. Using something like MDS,
we can represent the space implied by the
distance matrix in two dimensions. This is the
image of the network you would get if you did
that.
74Cluster analysis
When you use variables, the cluster analysis
program generates a distance matrix. We can,
instead use the network distance matrix directly.
If we do that with this example network, we get
the following
75Cluster analysis
76Cluster analysis
In SAS you use two commands to get a cluster analysis. The first does the hierarchical clustering. The second analyzes the cluster output to create the tree.
Example 1. Using variables to define the space (like income and musical taste):
proc cluster data=a method=ave out=clustd std;
  var x y;
  id node;
run;
proc tree data=clustd ncl=5 out=cluvars;
run;
77Cluster analysis
Example 2. Using a pre-defined distance matrix to define the space (as in a social network). You first create the distance matrix (in IML), then use it in the cluster program.
proc iml;
  %include 'c:\moody\sas\programs\modules\reach.mod';
  /* blue eye example */
  mat2=j(15,15,0);
  mat2[1,{2 14 15}]=1;
  /* lines cut here */
  mat2[15,{1 14 2 4}]=1;
  dmat=reach(mat2);
  mattrib dmat format=1.0;
  print dmat;
  id=1:nrow(dmat);
  id=id`;
  ddat=id||dmat;
  create ddat from ddat;   /* creates the dataset */
  append from ddat;
quit;
data ddat (type=dist);   /* tells SAS it is a distance matrix */
  set ddat;
run;
78Cluster analysis
Example 2. Using a pre-defined distance matrix to define the space (as in a social network). Once you have it, the cluster program is just the same.
proc cluster data=ddat method=ward out=clustd;
  id col1;
run;
proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;
proc freq data=netclust;
  tables cluster;
run;
proc print data=netclust;
  var col1 cluster;
run;
79Methods How do we identify primary groups in a
network?
Evade Find a cheap indicator, and
cluster/optimize that
Moody's CROWDS algorithm combines the search approach with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the Segregation index and all of the information from the cluster hierarchy, combining two groups only if doing so improves the segregation fit for both groups.
80Methods How do we identify primary groups in a
network?
Destroy Remove lines/nodes until what is left
over reveals something of interest
The logic behind these algorithms is that you
remove some weak links and see what is left.
Most popular is the edge betweenness algorithm.
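A compact sketch of the edge betweenness idea: score each line by how many shortest paths run over it, delete the highest-scoring line, and repeat until the network falls into more pieces. In this approach the "weak" links are the ones bridging groups. All function names here are hypothetical, and the betweenness routine is a plain Brandes-style pass written for illustration rather than any package's implementation.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness for each undirected edge."""
    score = {}
    for s in adj:
        dist, paths, order = {s: 0}, {s: 1.0}, []
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1 + credit[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                credit[v] += share
    return {e: b / 2 for e, b in score.items()}  # each pair was counted twice

def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for s in adj:
        if s in seen:
            continue
        group, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups

def split_once(adj):
    """Remove highest-betweenness edges until the network gains a component."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:  # no lines left to remove
            break
        v, w = max(scores, key=scores.get)
        adj[v].discard(w)
        adj[w].discard(v)
    return components(adj)

# Two triangles joined by the bridge 2-3.
net = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(g) for g in split_once(net)))  # [[0, 1, 2], [3, 4, 5]]
```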
81Methods How do we identify primary groups in a
network?
Destroy Remove lines/nodes until what is left
over reveals something of interest
UCINET has the MCL algorithm programmed.
82Evade look for something that correlates with
your split
Newman's Leading Eigenvector (in R this is the bottom partition, not the best fit, which aggregates/joins from here)
83Evade look for something that correlates with
your split
The Recursive Neighborhood Means algorithm creates the variables that are then used in the cluster analysis to identify groups.
- Start by randomly assigning every node a value on k variables
- Then calculate the average for each variable for the people each person is tied to
- Repeat this process multiple times
- This results in people who have many ties to each other having similar values on the k random variables. This similarity then gets picked up in a cluster analysis.
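The steps above can be sketched as follows. This is an illustrative reading of the procedure, not Moody's code: the function name, the default of 30 rounds, the fixed seed, and the choice to leave isolates unchanged are all assumptions.

```python
import random

def recursive_neighborhood_means(adj, k=2, rounds=30, seed=0):
    """Give each node k random values, then repeatedly replace each value with
    the mean of that variable among the node's neighbors."""
    rng = random.Random(seed)
    values = {v: [rng.random() for _ in range(k)] for v in adj}
    for _ in range(rounds):
        values = {
            v: ([sum(values[w][d] for w in adj[v]) / len(adj[v]) for d in range(k)]
                if adj[v] else values[v])  # isolates keep their values
            for v in adj
        }
    return values

# Two disconnected triangles: members of each triangle end up with matching values.
net = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
vals = recursive_neighborhood_means(net)
for node in sorted(vals):
    print(node, [round(x, 3) for x in vals[node]])
```

On a connected network the values drift toward a common level if the averaging runs too long, so in practice only a modest number of rounds would be used before passing the k variables to a cluster analysis.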
84Example of the RNM procedure
[Figure panels: Time 1, Time 2, Time 3]
85Example of the RNM procedure
86As an example, consider the process acting on a network known to be clustered, starting with 2 random variables (k = 2). You get something like this, where the nodes are now placed according to their resulting values on the 2 variables.
88The algorithm does a good job uncovering clusters
in fake datasets.
89The algorithm does a good job uncovering clusters
in fake datasets.
90Compared to real data
91Compared to real data