Title: DEPARTMENT OF SOCIOLOGY
1. DEPARTMENT OF SOCIOLOGY
The Assumptions You Don't Realise You Are Making Are The Ones That Will Do You In: Simulation, Social Science and Appropriate Data
Edmund Chattoe-Brown, ecb18_at_le.ac.uk
2. Plan of talk
- Some background.
- A very simple (but revealing) example.
- Sociological data collection methods.
- Simulation methodology and types of simulation.
- A case study: Modelling drug trends.
- Conclusions.
3. Some relevant distinctions
- Not gaming or role playing: Student United Nations.
- Not the post modernist thing (Baudrillard?), whatever that is.
- Instrumental versus descriptive simulation: Not just a technical tool (doing the sums quicker) but a new way of understanding (explaining) social behaviour.
- A social process described as a computer programme rather than a narrative or a statistical model.
- Other disciplines, other approaches: Experiments, content analysis, GIS.
4. Intellectual biography
- Started as a chemist: Follow the scientific method.
- Moved to social science: Was particularly interested in situations where analytical models fail (oligopoly).
- Studied Artificial Intelligence, where simulating something is considered a normal way of supporting the claim you have understood it.
- Became a sociologist because they are allowed to collect many different kinds of data (statistical, observational, cognitive).
5. Current position
- A social simulator since 1994. Have built 5 or 6 different working simulations: Uncommon experience in a new field.
- Trying to simulate social systems, particularly those connected with social decision, networks, communication and innovation.
- Trying to link social scientific data with simulation to build properly falsifiable models.
- Trying to raise the bar institutionally: Getting away from toy models based on unsystematic reading and with a wishful relationship to data.
- Trying to identify and bridge the intellectual gap between mainstream social science and simulation.
6. Spatial segregation (Schelling)
- Agents live on a square grid (like a US city) so each has eight neighbours.
- There are two types of agents (red and green) and some spaces in the grid are vacant. Initially agents and vacancies are distributed randomly.
- All agents decide what to do in the same very simple way.
- Each agent has a preferred proportion (PP) of neighbours of its own kind (a PP of 0.5 means that you want at least 4 neighbours out of 8 to be your own kind, but you would be happy with up to 8, i.e. PP is a minimum).
- If an agent is in a position that satisfies its PP then it does nothing.
- If it is in a position that does not satisfy its PP then it moves to an unoccupied position chosen at random.
- A time period is defined as the time it takes for each agent (chosen in random order) to take a turn at deciding and possibly moving.
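The rules on this slide are concrete enough to run. What follows is a minimal Python sketch under stated assumptions, not the implementation used in the talk: the grid wraps at its edges so that every cell really has eight neighbours, vacant cells count as "not my own kind", and the grid size, vacancy rate, number of periods and the `mean_same_share` clustering measure are all illustrative choices.

```python
import random

EMPTY, RED, GREEN = 0, 1, 2

def make_grid(size, vacancy, rng):
    """Scatter red agents, green agents and vacancies at random."""
    return [[EMPTY if rng.random() < vacancy else rng.choice((RED, GREEN))
             for _ in range(size)] for _ in range(size)]

def same_kind_count(grid, r, c):
    """Number of the eight surrounding cells holding the same type as (r, c).
    Assumption: the grid wraps around at the edges."""
    size = len(grid)
    me = grid[r][c]
    return sum(grid[(r + dr) % size][(c + dc) % size] == me
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0))

def satisfied(grid, r, c, pp):
    """PP is a minimum: at least pp * 8 neighbours must be of the agent's kind."""
    return same_kind_count(grid, r, c) >= pp * 8

def step(grid, pp, rng):
    """One time period: each agent, in random order, decides and possibly moves.
    Returns the number of agents that moved."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    agents = [rc for rc in cells if grid[rc[0]][rc[1]] != EMPTY]
    vacant = [rc for rc in cells if grid[rc[0]][rc[1]] == EMPTY]
    rng.shuffle(agents)
    moved = 0
    for r, c in agents:
        if vacant and not satisfied(grid, r, c, pp):
            i = rng.randrange(len(vacant))
            nr, nc = vacant[i]
            grid[nr][nc], grid[r][c] = grid[r][c], EMPTY
            vacant[i] = (r, c)  # the cell just left becomes available
            moved += 1
    return moved

def mean_same_share(grid):
    """Illustrative clustering measure: average share of the eight
    neighbouring cells that hold an agent's own kind."""
    size = len(grid)
    shares = [same_kind_count(grid, r, c) / 8
              for r in range(size) for c in range(size) if grid[r][c] != EMPTY]
    return sum(shares) / len(shares) if shares else 0.0

def run(size=20, vacancy=0.2, pp=0.5, periods=30, seed=1):
    """Return the clustering measure before and after the simulation."""
    rng = random.Random(seed)
    grid = make_grid(size, vacancy, rng)
    before = mean_same_share(grid)
    for _ in range(periods):
        if step(grid, pp, rng) == 0:  # everyone satisfied: nothing more happens
            break
    return before, mean_same_share(grid)

if __name__ == "__main__":
    for pp in (0.25, 0.5, 1.0):
        before, after = run(pp=pp)
        print(f"PP={pp}: mean own-kind share {before:.2f} -> {after:.2f}")
```

Calling `run` with different `pp` values is one way to explore how the clustering measure responds to the single behavioural parameter in the model.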
7. Initial state
8. Two questions
- What is the smallest PP (i.e. a number between 0 and 1) that will produce clusters?
- What happens when the PP is 1?
9. Simple individuals but complex system
10. Deconstructing this example
- Clearly unrealistic in some senses: Property values, decision processes, space, communication, neighbourhood knowledge.
- However, not unrealistic in the important sense that the simulation contains no arbitrary parameters and no impossible global knowledge (non computable, recursive). The only parameters in the model are individual PP values.
- The simulation also generates unintended consequences (PP = 1) and patterns that were not built in. For example, is the distribution of empty sites random or buffering? This emergence (surprise) allows the possibility of genuine falsification.
- Complex systems also have heuristic fertility: What do we mean by compatible desires?
11. Quantitative data collection approach
- Collect survey data: Cross sectional, time series or whatever.
- Choose a model and accept/reject it on grounds of statistical fit (adequate random sample, absence of non-normality in data).
- Model coefficients are results conditional on acceptable model.
- In what sense do models explain observed patterns?
- What is scientific status of coefficients? (Descriptive/generative.)
- Technical problems: Explanatory range depends on sample size.
- Basic problem doesn't go away even with fancier techniques like time series/MLM: A description isn't an explanation.
- Rarely heuristically fertile.
12. Deriving a quantitative coefficient
[Figure: number of strikes (units; axis values 80 and 50) plotted against unemployment (millions; axis values 1 and 2).]
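As a worked illustration of what "deriving a coefficient" means here, suppose the plotted line runs from 80 strikes at 1 million unemployed to 50 strikes at 2 million. That pairing is an assumption: it is only one reading of the axis values on this slide. Under it, the coefficient is simply the slope of the line:

```latex
\beta
  = \frac{\Delta\,\text{strikes}}{\Delta\,\text{unemployment}}
  = \frac{50 - 80}{2 - 1}
  = -30 \ \text{strikes per additional million unemployed}
```

The number summarises the pattern in the data, but by itself says nothing about the process that produced it.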
13. Quantitative example
- The most important empirical findings of this study can be summarized as follows:
- Contrary to Hypothesis 1, there is a moderate tendency for individuals with higher service class origins to be more likely than others to enrol in PhD programmes.
- The estimated effect of class drops to zero when controlling for parents' education and employment in research or higher education.
- The overall implication of these findings is that the transition from graduate to doctoral studies is influenced by social origins to a considerable degree. Thus, the notion that such effects disappear at transitions at higher educational levels - due either to changes over the life course or to differential social selection - is not supported. (Mastekaasa, Acta Sociologica, 2006, 49(4), pp. 448-449.)
14. Qualitative data collection approach
- Collect data (cognitive, behavioural, structural) by observation and interrogation.
- Try (though surprisingly rarely) to induce an overarching pattern from the data: Example of the addiction cycle and compare with amount/frequency account of drug use.
- Result is rich coherent narrative(s): What heroin addiction means from the inside and in a particular context.
- Are the results generalisable? (What is N?)
- Can we correctly envisage the consequences of complex social interaction sequences presented using narratives? (Compare Schelling case.)
- Often heuristically fertile.
15. Qualitative example
- Turkish interviewees do not include themselves when they are evaluating the status of Turkish women in general. While referring to Turkish women, most Turkish interviewees use the pronoun "they":
- "Turkish women are more home-oriented. I think that they are left in the backstage because they do not have education, because they are not given equal opportunities with men." (T3)
- One of the Turkish interviewees stated that it was difficult for her to answer the questions related to her status as a woman, because:
- "I don't think of myself as a Turkish women, but as a Turkish person. I mean I never think about what kind of role I have in the society as a woman." (T1)
- Most Norwegian interviewees, on the other hand, identify with Norwegian women in general, and they refer to Norwegian women as "we":
- "I think that in a way Norwegian women, that is we, at least have our rights on paper. We have equal rights for education and we have good welfare arrangements" (N1) (Sümer, Acta Sociologica, 1998, 41(1), p. 122)
16. The Gilbert and Troitzsch box
17. Ideal simulation methodology
- Choose a target system: Ethnic segregation in cities.
- Build a simulation of the target system and calibrate it, typically on micro level data: Ethnography and experiments? How do agents make relocation decisions and where do they go?
- Run simulation and look for regularities and their preconditions: Do we observe clusters (always, never, only with high PP, fixed, identical, moving) or buffer zones?
- Compare these regularities, perhaps with statistical data on real residential patterns. What tests do we have?
- If there is a good match then we haven't yet falsified the claim that the simulation generates the target system and therefore explains it.
18. A metaphor
- Think of the target system as a three dimensional object that casts shadows (data) depending on its orientation. Our simulation is an object that should cast the same shadows.
- Because we cannot hold the object all ways at once, there are always some orientations that we will not have tried.
- A regression coefficient or line of best fit has lower dimensionality than the target system. This means that although these methods can nearly always imitate shadows at fixed orientations, they don't match the shadows at any arbitrary orientation.
- By recreating the dynamic structure of the target system, a simulation doesn't just imitate arbitrary shadows but actually mirrors the object itself.
19. What is going on here?
- Qualitative research tells us how people interact and make decisions but can't usually tell us what large scale patterns result.
- Quantitative research tells us what the large scale patterns are but may not really explain them (ground them in micro foundations).
- Simulation attempts to bridge the gap between the levels of description with a generative social theory expressed as a computer programme.
- To do this, it needs to be ontologically clear about what different kinds of data contribute (cognitive, behavioural, structural, statistical) and avoid arbitrary parameter values. (Ideally, all parameters in a simulation should be fittable/fitted empirically?)
20. The catch
- Different approaches to simulation (types of simulation) incorporate (often tacitly) different behavioural assumptions.
- For example, a strict cellular automaton just has states and transition rules (no movement like that found in Schelling). This may be great for snowflake formation but is usually nothing like either social or geographic space. (Example: CA fitting GIS data.)
- These tacit behavioural assumptions may impact on our ability to falsify simulations effectively, either because they introduce arbitrary parameters or foreclose the collection of relevant data on how people actually behave/decide.
- Something like model choice in statistics: One can use expertise and social intuition but not test the choice directly.
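To make the contrast with Schelling concrete, here is a minimal Python sketch of a strict cellular automaton in which cells hold an opinion and never move. The simple majority rule, the wrap-around edges and the function names are assumptions made for illustration; the voting rule actually shown in the talk may differ.

```python
import random

def random_opinions(size, rng):
    """A size x size grid of binary opinions (0 or 1)."""
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]

def majority_step(grid):
    """Synchronous update: every cell adopts the majority opinion among
    itself and its eight neighbours (nine votes, so no ties).
    Assumption: the grid wraps around at the edges."""
    size = len(grid)
    new = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            votes = sum(grid[(r + dr) % size][(c + dc) % size]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            new[r][c] = 1 if votes >= 5 else 0
    return new

if __name__ == "__main__":
    rng = random.Random(1)
    grid = random_opinions(12, rng)
    for _ in range(5):
        grid = majority_step(grid)
    print("\n".join("".join("#" if v else "." for v in row) for row in grid))
```

The only things a modeller can vary here are the set of states and the transition rule. There is no agent who decides, remembers or relocates, which is the tacit behavioural assumption at issue.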
21. Voting Cellular Automata (CA)
22. Case study: Drug trends (DTI Foresight)
- How does drug use evolve over time?
- Comparing two approaches: broadly agent based and broadly system dynamics.
- The Caulkins et al. model of drug use involves (sort of) system dynamics: Pools of non users, light users and heavy users and various fixed transition probabilities between them.
- The DrugChat/DrugTalk simulations are (unusually) broadly based on ethnographic data (Michael Agar): Users may source and share drugs, transmit information about experiences and thus become more or less positive about drug use. They can also become addicted.
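23. The Caulkins et al. model
[Diagram: three boxes, NON USERS, LIGHT USERS and HEAVY USERS, linked by arrows labelled I (inflow to light users), a and b (outflows from light users, b going to heavy users) and g (outflow from heavy users).]
$L(t+1) = (1 - a - b)\,L(t) + I(t)$, $H(t+1) = (1 - g)\,H(t) + b\,L(t)$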
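The two update rules on this slide, read as L(t+1) = (1 - a - b) L(t) + I(t) and H(t+1) = (1 - g) H(t) + b L(t), can be iterated directly. The Python sketch below does only that. The function name, starting pools and parameter values are hypothetical and chosen to show the mechanics, not taken from Caulkins et al. or fitted to any data.

```python
def simulate(light0, heavy0, initiation, a, b, g):
    """Iterate L(t+1) = (1 - a - b) L(t) + I(t) and
    H(t+1) = (1 - g) H(t) + b L(t).

    initiation is the sequence I(0), I(1), ... of new light users per period.
    Returns the full trajectories of light and heavy users as two lists.
    """
    light, heavy = [light0], [heavy0]
    for inflow in initiation:
        l_now, h_now = light[-1], heavy[-1]
        light.append((1 - a - b) * l_now + inflow)
        heavy.append((1 - g) * h_now + b * l_now)
    return light, heavy

if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    light, heavy = simulate(light0=100.0, heavy0=10.0,
                            initiation=[20.0] * 10, a=0.10, b=0.05, g=0.20)
    for t, (l, h) in enumerate(zip(light, heavy)):
        print(f"t={t:2d}  light={l:7.2f}  heavy={h:6.2f}")
```

With a constant inflow I the recursion settles at L* = I / (a + b) and H* = b L* / g, so in this form the long-run pool sizes are fixed entirely by the transition constants.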
24. Deconstructing the model
- What is the status of the constant transition probabilities? Do these describe historical transitions (and thus require constant refitting) or generate transitions? If so, how?
- What determines the number of boxes and arrows? (What about ex-users?) Is there something independent of fit quality? (If not, there is a danger of data mining/over fitting.)
- Technical problem: Do we have adequate statistical tests for fitting this kind of model (rather than, say, a regression)?
- How falsifiable is the model? Will it fit any data and only visibly fail if outflow from heavy users appears to be greater than outflow from light users? Minimal behavioural plausibility.
25. The DrugTalk/DrugChat simulations
- Based on ethnographic work by Michael Agar.
- DrugChat is a LISP replication of DrugTalk (in NetLogo) for a DTI Foresight exercise in approaches to modelling drug trends.
- Agents structured in networks (many with few ties and few with many).
- Types (non users, users and addicts) defined behaviourally rather than in terms of levels of drug use: Users and addicts differ in drug sharing behaviour and users and non-users differ in the kind of information transmitted and its credibility. (This is ethnographic knowledge.)
26. Simulation assumptions
- Doses distributed differing by use status: probability and number.
- Decision process involves comparing attitude to risk (fixed) and attitude to drugs (socially influenced in several ways).
- Users party (share) but addicts use privately as a first approximation.
- Dose use (binges?): Experiences can be good and bad.
- Running experience count kept and updates drug attitude: Diminishing marginal returns to experience and bad experiences register more.
- Communication: Addicts have no communicative credibility but are themselves a warning. Current users influence directly by their attitude to drugs from experience. Former users or non users gossip (transmit good and bad experience counts to others), which has a much smaller (and indirect) effect.
- Addiction after five doses: Addicts don't listen, i.e. they do not change their attitude to drugs.
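A few of these assumptions can be sketched for a single agent in Python. This is not the DrugTalk/DrugChat code. Only the five-dose addiction threshold comes from the slide. The direction of the risk/attitude comparison, the rule that addicts always use, the `STEP` and `BAD_WEIGHT` constants and the 1/n form for diminishing returns are hypothetical choices, and networks, sharing and communication are left out entirely.

```python
from dataclasses import dataclass

ADDICTION_DOSES = 5   # from the slide: addiction after five doses
STEP = 0.1            # hypothetical size of the first experience's effect
BAD_WEIGHT = 2.0      # hypothetical: a bad experience counts double

@dataclass
class Agent:
    risk_attitude: float   # fixed
    drug_attitude: float   # updated here by experience only
    doses: int = 0
    good: int = 0
    bad: int = 0

    @property
    def addicted(self) -> bool:
        return self.doses >= ADDICTION_DOSES

    def will_use(self) -> bool:
        """Assumed decision rule: use if attitude to drugs exceeds attitude
        to risk. Addicts are assumed to use regardless."""
        return self.addicted or self.drug_attitude > self.risk_attitude

    def take_dose(self, good_experience: bool) -> None:
        """Record one dose and update the running experience counts."""
        self.doses += 1
        if good_experience:
            self.good += 1
        else:
            self.bad += 1
        if self.addicted:
            return  # addicts no longer change their attitude to drugs
        n = self.good + self.bad
        weight = 1.0 if good_experience else -BAD_WEIGHT
        # Dividing by n gives diminishing marginal returns to experience.
        self.drug_attitude += STEP * weight / n

if __name__ == "__main__":
    agent = Agent(risk_attitude=0.5, drug_attitude=0.6)
    for outcome in (False, True, True, True, True):
        agent.take_dose(outcome)
        print(agent.doses, round(agent.drug_attitude, 3), agent.addicted)
```

Each constant in the sketch is a parameter that would need an empirical source, which is the calibration question raised for the full simulation.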
27. Deconstructing the simulation
- Clearly oversimplified: Static networks (key result in question), decision process, communication content and so on.
- Ethnographic data needed: User biographies, levels of availability, sharing behaviour, stash sizes. Can the simulation be effectively calibrated? (Do data collection methods exist for each parameter? If not, why not?)
- Methods appropriate for real time dynamic change, i.e. attitudes.
- Can the model be falsified in terms of statistical data (recorded addicts, recorded deaths from overdose and so on)? How hard is it to generate an S shaped innovation curve? How hard is it to generate a population of plausible addict biographies?
28. What do we mean by agent based?
- Deconstructing the tacit homogeneity assumptions in Schelling.
- Different decision making with different inputs and behaviours.
- Different attributes (wealth).
- Different local perceptions, experiences and memories.
- Different/diverse environmental features: houses with different costs/facilities, travel to jobs, ease of access and so on.
- Fundamental question: Just how similar are people? Economic models (and Schelling) at one extreme and journalism/biography at the other. The agent based approach minimises the amount of built in similarity relative to other approaches.
29. What can we do with this simulation?
- Multiple empirically accessible outputs (another falsification opportunity): aggregate data, biographies.
- Exploring data quality issues: See paper.
- Sensitivity analysis: See paper.
- Examine plausibility of potential reductions for the simulation: Does a simulation with this level of social complexity demonstrate stable regularities in terms of variables or transition probabilities?
- Similar argument to Hendry in econometrics: Start with general model that is statistically adequate and then know how much you throw away by simplification. Specific to general (S to G) and general to specific (G to S) are not symmetrical processes.
- Important not to draw the wrong conclusion from this exercise but improves on the futile debate between realism and simplicity. (Realists cheat by not offering general conclusions. Simplicity types cheat by not stating how easy their models are to falsify. A mean is not a model.)
34. Design principles
- Do assumptions of the simulation approach reflect what we know about the social phenomenon? Is it predominantly spatial, network, local/global, communicative, cognitive/reactive or whatever?
- What is status of simulation parameters? Are they theoretical (discount rate), empirical (number of friends), descriptive (birth rate), generative (Schelling) or what?
- Do simulation style and data collection programme permit falsification? How tough a test is it? (Clusters? Out of sample prediction?) What degree of toughness is reasonable here?
- Don't let rigour challenges stop you: Unmeasured parameters are not unmeasurable. (Compare statistical approach of proxies and just fitting the data you've got.)
35. Weaknesses/challenges?
- Is this a naïve view of falsification? (Philosophy of science says there are always ceteris paribus clauses.)
- How do we use existing knowledge systematically to calibrate and falsify simulations? Because simulation is new, it has a backlog of data to tackle, which is a unique situation.
- What new methods should we be developing (head cameras) or adapting (experiments) to gather missing data?
- How can we afford and co-ordinate this kind of research? Are we in a Catch-22?
- How does a discipline pick itself up by its own bootstraps in terms of methodological quality? Does it?
36. Encouraging thought
- "To the man who has only a hammer, everything looks like a nail" (Abraham Maslow).
- Have we really, for all the technical and empirical challenges, found a new science and a radically new place to stand? (It's a new paradigm. Yawn!)
37. Further resources
- NetLogo <http://ccl.northwestern.edu/netlogo/>. Free and cross platform. Rapidly becoming a standard.
- Gilbert and Troitzsch (2005) Simulation for the Social Scientist (Open University Press). NOTE: Get the second edition with the exercises in NetLogo rather than LISP.
- JASSS (Journal of Artificial Societies and Social Simulation) <http://jasss.soc.surrey.ac.uk/JASSS.html>. Interdisciplinary peer reviewed free online journal devoted to social simulation.
- Chattoe, Hickman and Vickerman, Drugs Futures 2025? Modelling Drug Use <http://www.dti.gov.uk/files/file15388.pdf>.