Title: Fundamental Building Blocks of Social Structure
1Fundamental Building Blocks of Social Structure
- Honoring Peter Killworth's contribution to social
network theory
- Southampton, Sept. 28, 2006
2The network scale-up team
- Peter D. Killworth (SOC)
- Christopher McCarty (U Florida)
- Gene A. Shelley (Georgia State U)
- Eugene Johnsen (UC-Santa Barbara)
- H. Russell Bernard (U Florida)
3Some background: "I'll have a go at that"
(Scripps, 1972).
- I asked everyone on a ship to rank order their
interactions with all the others.
- I came to the physics department coffee break and
asked "anybody here want to know the social
structure of a vessel that gets all your data?"
- The ocean-going physicists in the room knew they
weren't supposed to talk to people like me and
didn't even look up.
4- Peter hadn't gotten the memo about social
scientists and said he thought it might be fun.
- And that's what it's been, 34 years and
40-odd papers later.
5How to get at the structure of these data? "Let's
try this"
- Peter applied an algorithm from F.S. Acton's
(then) recent book Numerical Methods that
(Usually) Work ...
- The algorithm had been developed to solve a
traffic problem: how to get from point A to point
B fastest, irrespective of the number of red
lights on the path.
- Visualizing the messy result.
6(No Transcript)
7(No Transcript)
8The prison studies
- We combined numerical methods with ethnography.
- The cliques always made sense, until one day
- Three numerically tied inmates whose connections
made no apparent sense: different crimes, North
and South, rural and urban, Black and White.
- Finally, finally, an artifact.
9Peter: "This is too easy."
- We discovered that physicists don't apply their
models to social structure and anthropologists
don't test the error bounds of their instruments.
- We were half-way on this one, so we started the
accuracy studies.
10How to study accuracy?
- We studied groups whose real communication could
be unobtrusively monitored and whose members we
could ask questions like "So, in the last day,
week, month, who did you talk to in this
group?"
- Deaf people on TTYs
- Ham radio operators in a local network
- An early e-mail group
- An office
- A fraternity
11Half of what people tell you is incorrect
- People don't recall behaviors that did occur and
recall behaviors that didn't occur.
- People aren't lying. They're just terrible
behaviorscopes.
12Extending (or redefining) the problem
- We asked: are the instruments for gathering data
about human behavior producing accurate
measurements of human behavior?
- Others used our data and asked: what do those
instruments produce a valid measurement of?
- Answer: If you ask people who they interact with,
people retrieve who they usually interact with
and report who they ought to interact with, given
everything they already know about their place in
the social structure.
13Next, the small world
- Milgram's famous small-world experiment told us
that there are 5.5 links between any two white
people in the U.S. and exactly one more link
between any white and any black person in the
U.S.
- But these numbers do not tell us anything about
the structure of the society.
14Peter: "Let's find out how the SW actually
operates"
- Show people a list of SW targets, complete with
information about location, occupation,
hobbies, and organizations.
- Ask people to tell us their first link in a
small-world experiment.
- Repeat 500 times and analyze the information
needed by people to make their choice of a first
link.
15The reverse small world experiments
- We ran six of these experiments in the U.S., in
Micronesia, and in Mexico.
- Things that people in the U.S. find useful to the
task (name, location, occupation, hobbies,
organizations) are the same things that people in
other cultures need to know to place someone in
their network.
- For both of us, the cross-cultural regularity
discovered in this series of experiments is among
the most exciting results of our work.
16(No Transcript)
17- We created a similarity matrix between targets:
how many people used the same choice for a given
pair of targets?
- A 2-d MDS shows the enduring influence of Gerhard
Mercator on schooling.
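A minimal sketch of how such a target-by-target similarity count could be built. The data layout (`choices`, a mapping from respondent to the first link chosen for each target) and all names and toy values are hypothetical illustrations, not the team's actual data or code.

```python
from itertools import combinations


def target_similarity(choices):
    """Count, for each pair of targets, how many respondents
    named the same first link for both targets.

    choices: dict mapping respondent -> dict mapping target -> chosen link.
    Returns a dict mapping (target_a, target_b) -> count, with the two
    target names in sorted order. Pairs with no shared choice are omitted.
    """
    counts = {}
    for per_target in choices.values():
        for a, b in combinations(sorted(per_target), 2):
            if per_target[a] == per_target[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Hypothetical toy data: two respondents, three targets.
choices = {
    "r1": {"Boston": "Ann", "New York": "Ann", "Seattle": "Bob"},
    "r2": {"Boston": "Cy", "New York": "Cy", "Seattle": "Cy"},
}
sim = target_similarity(choices)
```

The resulting counts can be arranged as a symmetric matrix and handed to any multidimensional scaling routine.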
18(No Transcript)
19Finding the distribution of c
- Our real objective, though, is to understand the
basic components of social structure.
- One quantity that seems important is the number
of people whom people know.
- We call this c.
20Network size: It's just one number
- From the first, Peter pushed us all to learn more
about the basic quanta.
- How does network size vary, within and across
cultures?
- What's the distribution look like?
- Our first estimate, in 1978, for average network
size in the U.S. was 250.
21(No Transcript)
22Peter: "You have to start somewhere."
- And what was that 250?
- It was the number of people whom the people of
Morgantown, West Virginia, who sat through this
grueling, 8-hour experiment, could call on to be
first links if Milgram had shown up and asked
them to participate in a small world experiment.
23Deriving c from an assumption
- Let t be the size of a population, and let e be
the size of some subpopulation within it.
- We assume that the fractional size
- p = e/t
- of that subpopulation also applies to any
individual's network, other things being equal.
- That is, everyone's network in a society
reflects the distribution of subpopulations in
that society.
24The scale-up method to estimate c
- To test this, we ask a representative sample of
people to tell us how many people they know in
many subpopulations whose sizes are known,
- e.g., diabetics, gun dealers, postal workers,
women named Nicole, men named Michael.
25People answer accurately
- Now, assuming that people can and do answer our
question accurately
26(No Transcript)
27A maximum likelihood estimate of an individual's
network size
$$\hat{c}_i = t \cdot \frac{\sum_{j=1}^{L} m_{ij}}{\sum_{j=1}^{L} e_j}$$
where there are L known subpopulations. (Here i
is the individual, who knows m_ij people in
subpopulation j; e_j is the size of subpopulation
j and t is the size of the whole population.)
Network size is (the sum of all the people
you say you know in some subpopulations of known
size, divided by the total size of those
subpopulations) times the population within which
the subpopulations are embedded.
28The estimates of c are reliable
- This doesn't deal with the big IF, but across 7
surveys in the U.S., average network size is 290
(sd 232, median 231).
- The 290 is not an average of averages. It's a
repeated finding.
- And it's almost certainly not an artifact of the
method.
29Reliability I
- In one survey, we estimated c by asking people
how many people they know in each of 17 relation
categories (people who are in their immediate
family, people who are co-workers, people who
provide a service) and summing.
- The summation method (due to Chris McCarty)
produced a mean for c of 290.
30Reliability II: Change the data
- We changed reported values at or above 5 to a
value of 5 precisely. The mean dropped to 206, a
change of 29%.
- We set values of at least 5 to a uniformly
distributed random value between 5 and 15. We
repeated this random change only for large
subpopulations (with > 1 million members).
- The mean increased to 402, a change of 38% -- in
the opposite direction.
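A rough sketch of this kind of perturbation test, assuming the scale-up estimator (sum of reported counts over sum of subpopulation sizes, times the total population). The function names, the choice of integer random values, and the toy figures are all hypothetical; they are not the survey data.

```python
import random


def scale_up(counts, sizes, total_population):
    """Scale-up estimate of network size from reported counts."""
    return total_population * sum(counts) / sum(sizes)


def cap_reports(counts, cap=5):
    """Replace every report at or above `cap` with exactly `cap`."""
    return [min(m, cap) for m in counts]


def randomize_reports(counts, sizes, rng, low=5, high=15, min_size=0):
    """Replace reports of at least `low` with a uniform random integer
    in [low, high], but only for subpopulations larger than `min_size`.
    Integer draws are an assumption made for this sketch."""
    return [
        rng.randint(low, high) if m >= low and e > min_size else m
        for m, e in zip(counts, sizes)
    ]


# Hypothetical respondent and subpopulation sizes.
sizes = [500_000, 2_000_000, 3_000_000]
counts = [2, 8, 12]
t = 300_000_000

baseline = scale_up(counts, sizes, t)
capped = scale_up(cap_reports(counts), sizes, t)
rng = random.Random(0)
jittered = randomize_reports(counts, sizes, rng, min_size=1_000_000)
```

Capping can only lower the estimate, while the random replacement can move it in either direction, which is the point of the comparison.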
31 Reliability III: Survey clergy
- We surveyed a national sample of 159 members of
the clergy, people who are widely thought to
have large networks.
- Mean c = 598 for the scale-up method
- Mean c = 948 for the summation method
32 290 is not a coincidence
- 1. Two different methods of counting produce the
same result.
- 2. Changing the data produces large changes in
the results.
- 3. People who are widely thought to have large
networks do have large networks.
33Something is going on
- This next slide shows the probability, for two of
our surveys, of knowing no one in each of 29
populations of known size, by the actual size of
those populations.
- The two distributions track, except for the
expected offset.
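To see why this probability should fall as population size rises, here is a sketch under a simplifying assumption that is not stated on the slides: each of a respondent's c network members independently belongs to the subpopulation with probability p = e/t. The function name and numbers are hypothetical.

```python
def prob_know_none(subpop_size, total_population, network_size):
    """Probability of knowing no one in a subpopulation, assuming each
    of `network_size` acquaintances independently falls in it with
    probability p = subpop_size / total_population."""
    p = subpop_size / total_population
    return (1.0 - p) ** network_size


# Hypothetical illustration with c = 290 and a population of 1,000,000.
small = prob_know_none(1_000, 1_000_000, 290)
large = prob_know_none(50_000, 1_000_000, 290)
```

A respondent group with larger networks gives a curve shifted downward at every population size, which is one way to read an offset between two such curves.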
34(No Transcript)
35The distribution of c
- Here is the graph of the distribution of network
size
36(No Transcript)
37Reliability vs. validity
- OK, it's reliable. But if the model works, we
ought to be able to use it to estimate the size
of populations whose sizes are not known.
- Create a maximum likelihood estimate for the size
of an unknown subpopulation based on what all
respondents tell us and our estimates of their
network sizes.
- Roughly speaking, this inverts the previous
formula.
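One way to read "inverting the previous formula" is sketched below: total reports across respondents, divided by the total of their estimated network sizes, times the population size. The exact form of the estimator is an assumption here, and the function name and toy numbers are hypothetical.

```python
def subpopulation_size(reported_counts, network_sizes, total_population):
    """Estimate the size e of a subpopulation of unknown size.

    reported_counts: for each respondent, how many people they say
                     they know in the unknown subpopulation.
    network_sizes:   each respondent's estimated network size c.
    total_population: size t of the whole population.
    """
    if len(reported_counts) != len(network_sizes):
        raise ValueError("need one network size per respondent")
    return total_population * sum(reported_counts) / sum(network_sizes)


# Hypothetical example: three respondents in a population of 1,000,000.
e_hat = subpopulation_size([1, 0, 2], [300, 200, 500], 1_000_000)
```

Three reported acquaintances out of 1,000 network members suggests the subpopulation is about 0.3% of the population.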
38Can we predict what we know?
- Test this by predicting the size of 29
populations of known size.
- The overall result is encouraging.
39r = .79, but note the outliers
40Over- and under-estimation
- The two largest populations are people who have a
twin brother or sister and diabetics.
- These are highly underestimated.
- Without these two outliers, the correlation rises
from r = .79 to r = .94.
- No cheating
41Stigma vs. not newsworthy
- Being a twin or a diabetic is neither
stigmatizing nor newsworthy.
- From Gene Shelley's work, we know that personal
information about close co-workers or business
associates can take a decade or more to be
transmitted ... and in the case of being a twin
or a diabetic, may never be transmitted.
42Another encouraging result
- Charles Kadushin ran a national survey to
estimate the prevalence of crimes in 14 cities,
large and small, across the U.S.
- He asked 17,000 people to report the number of
people they knew who had been victims of six
kinds of crime and the number of people they knew
who used heroin regularly.
43- Here are the estimates for the number of heroin
users in each of the 14 cities, along with the
estimates from the UCR.
44(No Transcript)
45- The fact that we track well with official
estimates means only that we have a much, much
less expensive way to get at these estimates,
not that the estimates are correct.
- And estimates of other crimes in those 14 cities
did not track so well.
46Reliability, validity, and accuracy
- So, while definitely reliable and perhaps valid,
our estimate of network size (and its
distribution) is not sufficiently accurate.
47Compromising assumptions
- 1. Transmission effects: Everyone knows
everything about everyone they know.
- 2. Barrier effects: Everyone in the population
has an equal chance of knowing someone in any
subpopulation.
48The correlation between the mean number of Native
Americans known and the percent of the state
population that is Native American is 0.58, p =
0.0001.
49Network social barriers
- Race (Blacks may know more diabetics than Whites
do.)
- Gender (Men may know more gun dealers than women
do.)
- Even first names are associated with the barrier
effect.
- We address the barrier effect by using a random,
nationally representative sample of respondents.
- However, using the method on specific populations
may still lead to incorrect estimates.
50The transmission effect
- We asked people things about people they knew
and then called up those people to see how much
people really do know about their network members.
51Some things are easy to get right
- 99% know their alters' marital status.
- People know how many children 89% of their alters
have.
- 98% know the employment status of their alters.
52Some things are harder to know
- People say they know the state in which 70% of
their alters were born, but only 57% of the
reports (ego's and alter's) agree on this.
- People don't know the number of siblings their
alters have 52% of the time.
53Some people withdraw
- Gene Shelley found that people who are
HIV-positive withdraw from their network in order
to limit the number of people who know their HIV
status.
- Eugene Johnsen confirmed this by showing that
HIV-positive people have, on average, networks
that are one-third the global average.
54A theory of transmission bias
- Take another look at the comparison of the data
from clergy and others.
- It's likely that you know at least one
Christopher (the probability of knowing NO
Christophers is close to zero).
- Twins are likely to be underreported.
56- Peter said: "Assume that people report correctly
what they know but that what they know is
incorrect."
- What would happen to the jaggedy curve if people
responded honestly to correct information instead
of honestly to incorrect information?
57How to adjust the x-axis rather than the y-axis
in the diagram?
- Suppose that widows don't tell half the people
they know about their being a widow.
- The .013 on the x-axis remains the same but the
number that people would be responding to would
be .013/2.
- To make the x-axis the effective size of that
population, we slide it to the left while the
y-axis remains the same.
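The widow example amounts to multiplying a subpopulation's true fractional size by the share of acquaintances who actually learn of the membership. A minimal sketch, with a hypothetical function name and a `transmission_rate` parameter that is an illustration rather than a measured quantity:

```python
def effective_fraction(true_fraction, transmission_rate):
    """Fractional size that respondents are effectively responding to,
    when only `transmission_rate` (between 0 and 1) of a member's
    acquaintances know about the membership."""
    if not 0.0 <= transmission_rate <= 1.0:
        raise ValueError("transmission_rate must be between 0 and 1")
    return true_fraction * transmission_rate


# The widow example: true fraction .013, half of acquaintances are told.
widows_effective = effective_fraction(0.013, 0.5)
```

With full transmission the point stays where it is; any rate below 1 slides it to the left on the x-axis.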
58The jaggedy line would go
- Of course, we have no idea what the transmission
error might be.
- We do know that if the numbers remain the same on
the y-axis and we make up the effective sizes on
the x-axis, the jaggedy line would go.
59- Peter did this analytically and computed the
predicted distribution of c.
- The next slide shows that we may be on the right
track.
60(No Transcript)
61Peter's (highly) unusual place in the social
sciences
- No. of articles: 154
- In Social Science journals (43)
- Total number of citations: 3194
- In Social Science journals: 456 (14%)
- In non-Social Science journals: 2738 (86%)
62(No Transcript)
63- http://garfield.library.upenn.edu/histcomp/killworth-pd_citing/ (http://tinyurl.com/nmhdc)
- http://garfield.library.upenn.edu/histcomp/killworth-pd_auth/ (http://tinyurl.com/ppr82)