Social Sub-groups - PowerPoint PPT Presentation

Provided by: sociologyr
Learn more at: https://people.duke.edu
1
Social Sub-groups
  • Overview
  • Substantive papers
  • Wayne Baker
  • Social structure in a place where there should be
    none
  • Scott Feld
  • What causes clustering in a network? Opportunity
    and interests
  • Methods
  • Continue discussion of social subgroups
  • Cluster analysis
  • Roles and blockmodels

2
Social Sub-groups
Wayne Baker: The Social Structure of a National
Securities Market
  1) Behavioral assumptions of economic actors
  2) Micro-structure of networks
  3) Macro-structure of networks
  4) Price consequences
Under standard economic assumptions, people
should act rationally and act only on price.
This would result in expansive and homogeneous
(i.e., random) networks. It is, in fact, this
structure that allows microeconomic theory to
predict that prices will settle to an optimal
equilibrium.
3
Baker's Model
4
Baker's Model
He makes two assumptions in contrast to standard
economic assumptions: a) that people do not have
access to perfect information, and b) that some
people act opportunistically.
He then shows how these assumptions change the
underlying mechanisms in the market, focusing on
price volatility as a marker for
uncertainty. The key on the exchange floor is
market makers: people who will keep the process
active and keep trading alive, and thus not hoard
(hoarding lowers profits system-wide).
5
Baker's Model
Micro-networks: Actors should trade extensively
and widely. Why might they not?
  A) Physical factors (noise and distance)
  B) Avoid risk and build trust
Macro-networks: Should be undifferentiated. Why
not?
  A) Large crowds should be more differentiated
     than small crowds. Why?
Price consequences: Markets should clear. They
often don't. Why? Network differentiation
reduces economic efficiency, leading to less
information and more volatile prices.
6
Baker: Uses frequency of exchange to identify the
network (the resulting network figure is not
reproduced in this transcript).
Baker finds that the structure of this network
significantly (and differentially) affects the
price volatility of the network.
7
Baker: Because size is the primary determinant of
clustering in this setting, he concludes that the
standard economic assumption that large markets
are efficient is unwarranted.
8
Scott Feld: Focal Organization of Social Ties
Feld wants to look at the effects of constraint
and opportunity for mixing, to situate relational
activity within a wider context. The contexts
form foci: "A social, psychological, legal or
physical entity around which joint activities are
organized" (p. 1016). People with similar foci
will be clustered together. He contrasts this
with social balance theory, claiming that much of
the clustering attributed to interpersonal
balance processes is really due to focal
clustering. (Note that this is not a theoretically
fair critique, given that balance theory can
easily accommodate non-personal balance factors
such as smoking or group membership, but it is a
good empirical critique: most researchers haven't
properly accounted for foci.)
9
Cluster analysis
  • As with the network cluster algorithms, there are
    many options for clustering. The three that I
    use most are:
      • Ward's Minimum Variance -- the one I use almost
        95% of the time
      • Average Distance -- the one used in the example
        above
      • Median Distance -- very similar
  • Again, the SAS manual is the best single place
    I've found for information on each of these
    techniques.
  • Some things to keep in mind:
      • Units matter. The example above draws together
        pairs horizontally because the range there is
        smaller. Get around this by standardizing your
        data.
      • This is an inductive technique. You can find
        clusters in a purely random distribution of
        points. Consider the following example.

10
Cluster analysis
The data in this scatter plot are produced using
this code:
data random;
  do i=1 to 20;
    x=rannor(0);
    y=rannor(0);
    output;
  end;
run;
11
Cluster analysis
Resulting dendrogram
12
Cluster analysis
Resulting cluster solution
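The point of this example, that clustering always returns groups even when the points are pure noise, can be sketched outside SAS. The following is a minimal Python sketch, not the SAS procedure itself: it assumes plain Euclidean distance and a naive average-linkage merge rule, and the function name `average_linkage` and the choice of three clusters are illustrative assumptions.

```python
import math
import random


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts with every point in its own cluster and repeatedly merges
    the two clusters with the smallest mean pairwise distance.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# 20 points drawn from a standard normal in each dimension:
# there is no real group structure here.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]

clusters = average_linkage(points, n_clusters=3)
for label, members in enumerate(clusters, start=1):
    print(label, sorted(members))
```

The procedure still reports three "clusters", which is why a cluster solution on its own is not evidence that groups exist.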
13
Cluster analysis
Cluster analysis works by building a distance
matrix between each pair of points. In the
example above, it used the Euclidean distance,
which in two dimensions is simply the physical
distance between the points in a plot. It can
work on any number of dimensions. To use
cluster analysis in a network, we base the
distance on the path distance between pairs of
people in the network. Consider again the
blue-eye hip example.
14
Cluster analysis
Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0
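Each entry in the matrix above is the path distance (number of steps along the shortest path) between two people. As a minimal sketch of how such a matrix can be computed, the Python below runs a breadth-first search from every node. The function name `path_distances` and the five-node toy network are assumptions for illustration; it is not the 15-node example network.

```python
from collections import deque


def path_distances(adjacency):
    """Return the matrix of shortest-path lengths for an unweighted graph.

    `adjacency` maps each node to a list of its neighbors.
    Unreachable pairs are reported as None.
    """
    nodes = sorted(adjacency)
    matrix = []
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1  # one step further than u
                    queue.append(v)
        matrix.append([seen.get(target) for target in nodes])
    return matrix


# Hypothetical undirected network with ties 1-2, 1-3, 2-3, 3-4, 4-5.
toy = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

dmat = path_distances(toy)
for row in dmat:
    print(*row)
```

As in the matrix above, the result is symmetric with zeros on the diagonal whenever the ties are undirected.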
15
Cluster analysis
The distance matrix implies a space that nodes
are embedded within. Using something like MDS,
we can represent the space implied by the
distance matrix in two dimensions. This is the
image of the network you would get if you did
that.
16
Cluster analysis
When you use variables, the cluster analysis
program generates a distance matrix. We can
instead use the network distance matrix directly.
If we do that with this example network, we get
the following:
17
Cluster analysis
18
Cluster analysis
In SAS you use two commands to get a cluster
analysis. The first does the hierarchical
clustering. The second analyzes the cluster
output to create the tree.
Example 1. Using variables to define the space
(like income and musical taste):
proc cluster data=a method=ave out=clustd std;
  var x y;
  id node;
run;
proc tree data=clustd ncl=5 out=cluvars;
run;
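The `std` option in the first command standardizes the variables before distances are computed, which is the fix for the "units matter" problem. A minimal Python sketch of that idea follows. The data values, the function names, and the use of the sample standard deviation are all assumptions made for illustration, not output from SAS.

```python
import math
from statistics import mean, stdev


def standardize(column):
    """Rescale a variable to mean 0 and (sample) standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows of coordinates."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def nearest(dmat, i):
    """Index of the closest other point to point i."""
    return min((j for j in range(len(dmat)) if j != i), key=lambda j: dmat[i][j])


# Hypothetical people: income in dollars, musical taste on a 1-5 scale.
income = [20000.0, 22000.0, 61000.0, 64000.0]
taste = [1.0, 5.0, 1.5, 4.5]

d_raw = euclidean_matrix(list(zip(income, taste)))
d_std = euclidean_matrix(list(zip(standardize(income), standardize(taste))))

# In raw units income swamps taste, so person 0 pairs with person 1.
# After standardizing, both variables count, and person 0 pairs with
# person 2, who has nearly the same taste.
print(nearest(d_raw, 0), nearest(d_std, 0))
```

In raw units the dollar scale decides every pairing; after standardizing, a difference in taste carries as much weight as a difference in income.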
19
Cluster analysis
Example 2. Using a pre-defined distance matrix
to define the space (as in a social network). You
first create the distance matrix (in IML), then
use it in the cluster program.
proc iml;
  %include 'c:\moody\sas\programs\modules\reach.mod';
  /* blue eye example */
  mat2=j(15,15,0);
  mat2[1,{2 14 15}]=1;
  /* lines cut here */
  mat2[15,{1 14 2 4}]=1;
  dmat=reach(mat2);
  mattrib dmat format=1.0;
  print dmat;
  id=1:nrow(dmat);
  id=id`;
  ddat=id||dmat;
  create ddat from ddat;   /* creates the dataset */
  append from ddat;
quit;
data ddat (type=dist);     /* tells SAS it is a distance */
  set ddat;                /* matrix */
run;
20
Cluster analysis
Example 2. Using a pre-defined distance matrix
to define the space (as in a social
network). Once you have it, the cluster program
is just the same.
proc cluster data=ddat method=ward out=clustd;
  id col1;
run;
proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;
proc freq data=netclust;
  tables cluster;
run;
proc print data=netclust;
  var col1 cluster;
run;
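To see what clustering directly from a distance matrix involves, here is a minimal Python sketch of Ward's method driven only by pairwise distances. It is not a reimplementation of the SAS procedure: it assumes the distances are squared before merging and uses the standard Lance-Williams update for Ward's criterion. The function name and the six-node toy matrix (two triangles joined by one tie) are illustrative assumptions.

```python
def ward_from_distances(dmat, n_clusters):
    """Agglomerative clustering from a distance matrix using Ward's criterion.

    Works on squared distances and updates them after each merge with the
    Lance-Williams formula, so no coordinates are ever needed.
    """
    n = len(dmat)
    d2 = {i: {j: float(dmat[i][j]) ** 2 for j in range(n) if j != i}
          for i in range(n)}
    members = {i: [i] for i in range(n)}

    while len(members) > n_clusters:
        # Pick the pair of active clusters with the smallest squared distance.
        i, j = min(((a, b) for a in d2 for b in d2[a] if a < b),
                   key=lambda pair: d2[pair[0]][pair[1]])
        ni, nj, dij = len(members[i]), len(members[j]), d2[i][j]
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            # Lance-Williams update for Ward's minimum variance method.
            new = ((ni + nk) * d2[i][k] + (nj + nk) * d2[j][k] - nk * dij) \
                / (ni + nj + nk)
            d2[i][k] = d2[k][i] = new
        # Fold cluster j into cluster i and drop j everywhere.
        members[i] += members.pop(j)
        del d2[j]
        for k in d2:
            d2[k].pop(j, None)
    return sorted(sorted(group) for group in members.values())


# Path distances for a hypothetical network: triangles {0,1,2} and {3,4,5}
# connected by a single tie between nodes 2 and 3.
toy_dmat = [
    [0, 1, 1, 2, 3, 3],
    [1, 0, 1, 2, 3, 3],
    [1, 1, 0, 1, 2, 2],
    [2, 2, 1, 0, 1, 1],
    [3, 3, 2, 1, 0, 1],
    [3, 3, 2, 1, 1, 0],
]

groups = ward_from_distances(toy_dmat, n_clusters=2)
print(groups)
```

On this toy matrix the two-cluster cut recovers the two triangles, which is the behavior the network example relies on.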
21
The CROWDS algorithm combines the density
approach above with an initial cluster analysis
and a routine for determining how many clusters
are in the network. It does so by using the
Segregation index and all of the information from
the cluster hierarchy, combining two groups only
if it improves the segregation fit for both
groups.
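The merge rule, "combine two groups only if it improves the fit for both", can be sketched in a few lines of Python. The fit measure below is a deliberately simple stand-in (within-group tie density minus density of ties to everyone else); it is an assumption for illustration and is not the Segregation index that CROWDS uses. The function names and the toy network are likewise hypothetical.

```python
def density_fit(edges, group, nodes):
    """Stand-in fit score: in-group density minus out-group density.

    NOTE: an illustrative substitute, not the Segregation index.
    """
    group = set(group)
    rest = set(nodes) - group
    edge_set = {frozenset(edge) for edge in edges}
    inside_pairs = len(group) * (len(group) - 1) // 2
    inside = sum(1 for edge in edge_set if edge <= group)
    cross = sum(1 for edge in edge_set if len(edge & group) == 1)
    d_in = inside / inside_pairs if inside_pairs else 0.0
    d_out = cross / (len(group) * len(rest)) if rest else 0.0
    return d_in - d_out


def should_merge(edges, a, b, nodes):
    """Merge only if the combined group fits better than BOTH parts."""
    merged = density_fit(edges, set(a) | set(b), nodes)
    return (merged > density_fit(edges, a, nodes)
            and merged > density_fit(edges, b, nodes))


# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by tie 2-3.
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

print(should_merge(edges, {0, 1}, {2}, nodes))           # completes a triangle
print(should_merge(edges, {0, 1, 2}, {3, 4, 5}, nodes))  # would blur two groups
```

Requiring improvement for both groups is what stops the tree walk from climbing all the way to a single network-wide cluster.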
22
The one other program you should know about is
NEGOPY. NEGOPY is a program that combines
elements of the density-based approach and the
graph-theoretic approach to find groups and
positions. Like CROWDS, NEGOPY assigns people
both to groups and to "outsider" or
"between-group" positions. It also tells you how
many groups are in the network. It's a DOS-based
program, and a little clunky to use, but
NEGWRITE.MOD will translate your data into NEGOPY
format if you want to use it. There are many
other approaches. If you're interested in some
specifically designed for very large networks
(10,000 nodes), I've developed something I call
Recursive Neighborhood Means that seems to work
fairly well.
23
The CROWDS Algorithm
1. Identify members of network bicomponents;
   remove people not included.
2. Cluster the reduced network.
   - Identify optimal number of groups (TREEWALK)
     - For each level of the cluster partition
       tree do (BFS):
       - Move up the tree from smaller to larger
         groups.
       - If the fit for both groups is improved by
         joining them, then do so.
       - If not, then identify group at that level.
     - End TREEWALK.
Do until all groups are identified (GLOBAL LOOP):
3. Evaluate node fit.
   Do until nodes cannot be moved:
     For each identified cluster do (GRPCHECK):
       - Ensure group is a bi-component.
       - Calculate effect on group a of moving
         node j to group a.
       - Calculate effect on j's present group of
         removing j.
       - If there is a positive net gain to moving
         j from own group to a, then do so.
   End.
4. Identify bridging members.
   - If removing j from group a would improve the
     fit of group a, AND assigning j to any other
     group would lower the fit for that group,
     then j is considered a bridge. Place all
     bridges in a separate class.
5. Group check. Check returns to combining groups.
   - If merging groups would improve the fit of
     all groups to be merged, then do so.
   - Evaluate bridges, to be sure that they are
     not bridging two groups that have now merged.
End global loop.