Manchester RSS presentation


Title: Manchester RSS

1
A graphic account of
structure in models and data
Peter Green, University of Bristol
RSS Manchester Local Group, 5 June 2002
2
What do I mean by structure?

The key idea is conditional independence:
x and z are conditionally independent given y
if $p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$,
implying, for example, that
$p(x \mid y, z) = p(x \mid y)$
CI turns out to be a remarkably powerful and
pervasive idea in probability and statistics
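A small numerical check can make the definition concrete. The sketch below uses made-up probabilities for binary x, y, z and builds the joint as p(y) p(x|y) p(z|y), which makes x and z conditionally independent given y by construction; it then verifies both identities cell by cell. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Made-up distribution built as p(y) p(x|y) p(z|y): x and z are then
# conditionally independent given y by construction. All variables binary.
p_y = {0: 0.3, 1: 0.7}
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}

joint = {
    (x, y, z): p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(keep):
    """Sum the joint over all positions not in `keep` (0 = x, 1 = y, 2 = z)."""
    out = {}
    for cell, p in joint.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_xy, p_yz, p_y_marg = marginal((0, 1)), marginal((1, 2)), marginal((1,))

def factorises(tol=1e-12):
    """Check p(x, z | y) = p(x | y) p(z | y) in every cell."""
    return all(
        abs(p / p_y_marg[(y,)]
            - (p_xy[(x, y)] / p_y_marg[(y,)]) * (p_yz[(y, z)] / p_y_marg[(y,)])) < tol
        for (x, y, z), p in joint.items()
    )

def z_is_irrelevant(tol=1e-12):
    """Check the implied property p(x | y, z) = p(x | y) in every cell."""
    return all(
        abs(p / p_yz[(y, z)] - p_xy[(x, y)] / p_y_marg[(y,)]) < tol
        for (x, y, z), p in joint.items()
    )

print(factorises(), z_is_irrelevant())  # True True
```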

3
How to represent this structure?

The idea of graphical modelling: we draw graphs
in which nodes represent variables, connected by
lines and arrows representing relationships
We separate logical (the graph) and quantitative
(the assumed distributions) aspects of the model

4
Contingency tables
Markov chains
Spatial statistics
Genetics
Graphical models
Regression
AI
Statistical physics
Sufficiency
Covariance selection
5
Graphical modelling 1

Assuming structure to do probability calculations
Inferring structure to make substantive
conclusions
Structure in model building
Inference about latent variables

6
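Basic DAG
(figure: a → c ← b, c → d)
in general
$p(x_V) = \prod_{v \in V} p(x_v \mid x_{\mathrm{pa}(v)})$
for example
$p(a,b,c,d) = p(a)\,p(b)\,p(c \mid a,b)\,p(d \mid c)$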
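The factorisation can be evaluated directly. A minimal sketch, assuming binary variables and made-up conditional probability tables for the DAG a → c ← b, c → d: it builds the joint from the four factors, checks that it sums to 1, computes p(d = 1), and confirms that d depends on (a, b) only through c.

```python
from itertools import product

# Hypothetical CPTs for binary a, b, c, d on the DAG a -> c <- b, c -> d.
p_a1 = 0.2                                                    # p(a = 1)
p_b1 = 0.6                                                    # p(b = 1)
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # p(c = 1 | a, b)
p_d1 = {0: 0.3, 1: 0.8}                                       # p(d = 1 | c)

def bern(p1, v):
    """Probability that a binary variable with p(1) = p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1

def joint(a, b, c, d):
    """p(a,b,c,d) = p(a) p(b) p(c | a,b) p(d | c)."""
    return bern(p_a1, a) * bern(p_b1, b) * bern(p_c1[(a, b)], c) * bern(p_d1[c], d)

cells = list(product((0, 1), repeat=4))
total = sum(joint(*cell) for cell in cells)                   # should be 1
p_d_is_1 = sum(joint(*cell) for cell in cells if cell[3] == 1)

# d is independent of (a, b) given c: p(d = 1 | a, b, c) equals p(d = 1 | c)
d_only_needs_c = all(
    abs(joint(a, b, c, 1) / (joint(a, b, c, 0) + joint(a, b, c, 1)) - p_d1[c]) < 1e-12
    for a, b, c in product((0, 1), repeat=3)
)
print(round(total, 12), round(p_d_is_1, 4), d_only_needs_c)  # 1.0 0.506 True
```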
8
A natural DAG from genetics
(figure: pedigree DAG with ABO blood-group genotypes AB, AO, AO, OO, OO)
10
DNA forensics example (thanks to Julia Mortera)

A blood stain is found at a crime scene
A body is found somewhere else!
There is a suspect
DNA profiles on all three; the crime scene sample is
a mixed trace: is it a mix of the victim and
the suspect?

11
DNA forensics in Hugin

Disaggregate the problem in terms of paternal and
maternal genes of both victim and suspect.
Assume Hardy-Weinberg equilibrium
We have profiles on 8 STR markers - treated as
independent (linkage equilibrium)

12
DNA forensics in Hugin
13
DNA forensics

The data
2 of 8 markers show more than 2 alleles at the crime
scene ⇒ mixture of 2 or more people

14
DNA forensics

Population gene frequencies for D7S820 (used as
prior on founder nodes)

Hugin
16
DNA forensics

Results (suspect and victim vs. unknown and victim)

17
How does it work?

(1) Manipulate the DAG into the corresponding (undirected)
conditional independence graph
(draw an (undirected) edge between two variables
if they are not conditionally independent
given all other variables)
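For a DAG, step (1) is usually done by moralisation: join ("marry") every pair of parents that share a child, then drop the arrow directions; the DAG's distribution is then Markov with respect to the resulting undirected graph. A minimal sketch, assuming the DAG is stored as a dict of parent lists (a hypothetical representation chosen for brevity):

```python
from itertools import combinations

def moralise(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    `parents` maps each node to the list of its parents.
    """
    edges = set()
    for child, pa in parents.items():
        for p in pa:                       # keep every directed edge, now undirected
            edges.add(frozenset((p, child)))
        for u, v in combinations(pa, 2):   # "marry" the parents of a common child
            edges.add(frozenset((u, v)))
    return edges

# The basic DAG: a -> c <- b, c -> d
dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
moral = moralise(dag)
print(sorted(tuple(sorted(e)) for e in moral))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```

The only edge added is a-b, because a and b share the child c.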
18
How does it work?

(2) If necessary, add edges so it is triangulated
(decomposable)

19

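(3) Construct junction tree
(figure: graph on nodes 1 to 7 and its junction tree, with cliques {1,2}, {2,6,7}, {2,3,6} and {3,4,5,6} and separators {2}, {2,6} and {3,6}, labelled "a clique", "another clique", "a separator")
For any 2 cliques C and D, $C \cap D$ is a subset of
every clique on the path between them in the junction tree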
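The property just stated (the running intersection property) can be checked mechanically. The sketch below uses the cliques {1,2}, {2,6,7}, {2,3,6} and {3,4,5,6} from the figure labels and assumes they are linked in the chain {1,2}, {2,6,7}, {2,3,6}, {3,4,5,6}; the figure's exact layout did not survive, so that tree is an assumption, as are the helper names.

```python
from itertools import combinations

# Cliques read off the figure labels; the tree layout is an assumption.
cliques = {
    "12": {1, 2},
    "267": {2, 6, 7},
    "236": {2, 3, 6},
    "3456": {3, 4, 5, 6},
}
tree_edges = [("12", "267"), ("267", "236"), ("236", "3456")]

def path(tree, start, goal):
    """Return the cliques on the unique path between two nodes of the tree."""
    adj = {c: set() for c in cliques}
    for u, v in tree:
        adj[u].add(v)
        adj[v].add(u)
    stack = [(start, [start])]
    while stack:
        node, route = stack.pop()
        if node == goal:
            return route
        for nxt in adj[node] - set(route):
            stack.append((nxt, route + [nxt]))
    raise ValueError("nodes are not connected")

def has_running_intersection(tree):
    """Check that C ∩ D is contained in every clique on the path between C and D."""
    for c, d in combinations(cliques, 2):
        shared = cliques[c] & cliques[d]
        if not all(shared <= cliques[k] for k in path(tree, c, d)):
            return False
    return True

print(has_running_intersection(tree_edges))        # True
separators = [sorted(cliques[u] & cliques[v]) for u, v in tree_edges]
print(separators)                                   # [[2], [2, 6], [3, 6]]
```

Reordering the same cliques into a different tree can break the property, which is why step (3) needs a specific construction rather than any spanning tree.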
20

(4) Probability propagation - passing messages
around junction tree
21
Initialisation of potential representation
(figure: graph on A, B, C; junction tree with cliques AB and BC and separator B)
22
Passing message from BC to AB (1)
(figure: cliques AB and BC, separator B; steps labelled "marginalise" and "multiply")
23
Passing message from BC to AB (2)
(figure: cliques AB and BC, separator B; step labelled "assign")
24
After equilibration: marginal tables
(figure: cliques AB and BC, separator B)
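The marginalise, multiply and assign steps can be sketched in a few lines. This is a minimal Hugin-style propagation, assuming binary variables with a chain A → B → C and made-up conditional probability tables (the slides' own numbers are not recoverable). After one message in each direction the clique and separator tables are the marginals p(a,b), p(b,c) and p(b); entering evidence on C first makes them proportional to the conditional marginals instead.

```python
import numpy as np

# Made-up CPTs for binary A -> B -> C (index 0/1); not the slides' numbers.
p_a = np.array([0.6, 0.4])                     # p(a)
p_b_given_a = np.array([[0.7, 0.3],            # p(b | a): rows a, columns b
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],            # p(c | b): rows b, columns c
                        [0.4, 0.6]])

def initialise():
    """Potentials on AB -[B]- BC: phi_AB * phi_BC / phi_B equals p(a,b,c)."""
    phi_ab = p_a[:, None] * p_b_given_a        # p(a) p(b | a)
    phi_bc = p_c_given_b.copy()                # p(c | b)
    phi_b = np.ones(2)                         # separator starts at 1
    return phi_ab, phi_bc, phi_b

def propagate(phi_ab, phi_bc, phi_b):
    """One message BC -> AB, then one AB -> BC."""
    # BC -> AB: marginalise BC onto B, multiply AB by new/old separator, assign.
    new_b = phi_bc.sum(axis=1)
    phi_ab = phi_ab * (new_b / phi_b)[None, :]
    phi_b = new_b
    # AB -> BC: the same three steps in the other direction.
    new_b = phi_ab.sum(axis=0)
    phi_bc = phi_bc * (new_b / phi_b)[:, None]
    phi_b = new_b
    return phi_ab, phi_bc, phi_b

def enter_evidence_on_c(phi_bc, c_observed):
    """Zero the BC entries inconsistent with the observation C = c_observed."""
    mask = np.zeros(2)
    mask[c_observed] = 1.0
    return phi_bc * mask[None, :]

# No evidence: equilibrium tables are the marginals p(a,b), p(b,c), p(b).
phi_ab, phi_bc, phi_b = propagate(*initialise())
print(phi_b)                                   # p(b)

# Evidence C = 1: tables become p(a,b,C=1) etc.; normalise for conditionals.
ab, bc, b = initialise()
ab, bc, b = propagate(ab, enter_evidence_on_c(bc, 1), b)
p_c1 = ab.sum()                                # p(C = 1)
p_a_given_c1 = ab.sum(axis=1) / p_c1           # p(a | C = 1)
```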
25

Probabilistic expert systems: Hugin for
the Asia example
26
Limitations

of message passing: works only when all variables are discrete, or
have CG distributions (both continuous and discrete
variables, but discrete variables precede continuous ones,
which, given the discrete variables, have a multivariate
normal distribution)
of Hugin: complexity seems forbidding for truly realistic
medical expert systems

27
Graphical modelling 2

Assuming structure to do probability calculations
Inferring structure to make substantive
conclusions
Structure in model building
Inference about latent variables

28
Conditional independence graph

draw an (undirected) edge between two variables
if they are not conditionally independent given
all other variables
29
Infant mortality example

Data on infant mortality from 2 clinics, by level
of ante-natal care (Bishop, Biometrics, 1969)

30
Infant mortality example

Same data broken down also by clinic

31
Analysis of deviance

Term                  Df  Deviance  Resid. Df  Resid. Dev  P(>Chi)
NULL                                        7     1066.43
Clinic                 1     80.06          6      986.36  3.625e-19
Ante                   1      7.06          5      979.30  0.01
Survival               1    767.82          4      211.48  5.355e-169
Clinic:Ante            1    193.65          3       17.83  5.068e-44
Clinic:Survival        1     17.75          2        0.08  2.524e-05
Ante:Survival          1      0.04          1        0.04  0.84
Clinic:Ante:Survival   1      0.04          0   1.007e-12  0.84
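The P(>Chi) column of the analysis of deviance can be recovered from the Deviance column: each term has 1 df, and for a chi-squared variable on 1 df the upper tail is P(χ²₁ > d) = erfc(√(d/2)). A minimal check using only Python's standard library; small discrepancies come from the deviances being rounded to two decimals.

```python
import math

def chisq1_upper_tail(deviance):
    """P(chi-squared on 1 df > deviance), via P(|Z| > z) = erfc(z / sqrt(2)) with z = sqrt(deviance)."""
    return math.erfc(math.sqrt(deviance / 2.0))

# Deviance column of the analysis of deviance (each term on 1 df)
deviances = {
    "Clinic": 80.06,
    "Ante": 7.06,
    "Survival": 767.82,
    "Clinic:Ante": 193.65,
    "Clinic:Survival": 17.75,
    "Ante:Survival": 0.04,
}
for term, dev in deviances.items():
    print(f"{term:16s} {chisq1_upper_tail(dev):.3e}")
```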

32
Infant mortality example
(figure: undirected graph with edges ante-clinic and clinic-survival, and no edge ante-survival)
survival and clinic are dependent
and ante and clinic are dependent
but survival and ante are conditionally
independent given clinic
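The pattern on this slide, an association that disappears once clinic is held fixed, can be reproduced with a made-up 2 x 2 x 2 table. The counts below are hypothetical, not Bishop's data: within each clinic the odds ratio between ante-natal care and survival is exactly 1, yet collapsing over clinic produces an association, because care and survival both depend on clinic.

```python
# Hypothetical counts (not the Bishop data): counts[clinic][ante] = (died, survived)
counts = {
    1: {"less": (4, 196), "more": (8, 392)},
    2: {"less": (20, 380), "more": (2, 38)},
}

def odds_ratio(table):
    """Odds ratio of death for 'less' vs 'more' ante-natal care in a 2 x 2 table."""
    (d_less, s_less), (d_more, s_more) = table["less"], table["more"]
    return (d_less * s_more) / (s_less * d_more)

conditional = {clinic: odds_ratio(t) for clinic, t in counts.items()}
collapsed = {
    care: tuple(sum(counts[c][care][k] for c in counts) for k in (0, 1))
    for care in ("less", "more")
}
print(conditional)                       # {1: 1.0, 2: 1.0}
print(round(odds_ratio(collapsed), 3))   # 1.792
```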
33
Prognostic factors for coronary heart disease
Analysis of a $2^6$ contingency table (Edwards and
Havranek, Biometrika, 1985)
strenuous physical work?
smoking?
family history of CHD?
blood pressure > 140?
strenuous mental work?
ratio of β and α lipoproteins > 3?
34
How does it work?

Hypothesis testing approaches
Tests on deviances, possibly penalised (AIC/BIC,
etc.), MDL, cross-validation...
Problem is how to search model space when
dimension is large
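One way to see why search is hard: with p variables there are 2^(p(p-1)/2) labelled undirected graphs, one for each subset of the p(p-1)/2 possible edges. A quick count as a sketch; it counts all undirected graphs, not only decomposable ones, so it is an upper bound on the decomposable model space.

```python
from math import comb

def n_undirected_graphs(p):
    """Number of labelled undirected graphs on p vertices: each of the
    p(p-1)/2 possible edges is either present or absent."""
    return 2 ** comb(p, 2)

print(n_undirected_graphs(6))             # 32768 graphs for 6 variables
print(len(str(n_undirected_graphs(37))))  # 201 decimal digits for 37 variables
```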

35
How does it work?

Bayesian approaches
Typically place a prior on all graphs, and a
conjugate prior on parameters (hyper-Markov laws,
Dawid and Lauritzen), then use MCMC (see later) to
update both graphs and parameters, simulating the
posterior distribution

36

For example, Giudici and Green (Biometrika, 2000)
use the junction tree representation for fast local
updates to the graph
(figure: junction tree with cliques {1,2}, {2,6,7}, {2,3,6} and {3,4,5,6} and separators {2}, {2,6} and {3,6})
37

(figure: the junction tree after a local update: cliques and separators as before, plus clique {1,2,7} and separator {2,7})
38
Graphical modelling 3

Assuming structure to do probability calculations
Inferring structure to make substantive
conclusions
Structure in model building
Inference about latent variables

39
DAG for a trivial Bayesian model
(figure: a small DAG with two parameter nodes and the data y)
40
Modelling with undirected graphs

Directed acyclic graphs are a natural
representation of the way we usually specify a
statistical model - directionally
disease → symptom
past → future
parameters → data ...
However, sometimes (e.g. spatial models) there is
no natural direction

41
Scottish lip cancer data

The rates of lip cancer in 56 counties in
Scotland have been analysed by Clayton and Kaldor
(1987) and Breslow and Clayton (1993)
(the analysis here is based on the example in the
WinBugs manual)

42
Scottish lip cancer data (2)

The data include

the observed and expected cases (expected
numbers based on the population and its age and
sex distribution in the county),

a covariate measuring the percentage of the
population engaged in agriculture, fishing, or
forestry, and

the "position" of each county expressed as a
list of adjacent counties.

43
Scottish lip cancer data (3)

County  Obs cases  Exp cases  x (% in agric.)  SMR    Adjacent counties
1       9          1.4        16               652.2  5,9,11,19
2       39         8.7        16               450.3  7,10
...     ...        ...        ...              ...    ...
56      0          1.8        10               0.0    18,24,30,33,45,55
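The "Adjacent counties" column is what the car.normal prior in the Bugs code consumes, as three flat vectors: num (number of neighbours per area), adj (all neighbour lists concatenated) and weights (one weight per adjacency). A minimal sketch of that conversion, assuming unit weights and a hypothetical three-area map rather than the full 56-county list, which is not reproduced here; the helper names are illustrative.

```python
def car_arrays(neighbours):
    """Flatten neighbour lists into the adj, weights and num vectors for car.normal.

    neighbours maps each area index (1-based) to the list of its adjacent areas.
    Unit weights are an assumption of this sketch.
    """
    adj, weights, num = [], [], []
    for area in sorted(neighbours):
        nbrs = neighbours[area]
        num.append(len(nbrs))
        adj.extend(nbrs)
        weights.extend([1.0] * len(nbrs))
    return adj, weights, num

def is_symmetric(neighbours):
    """car.normal expects a symmetric neighbourhood: j lists i whenever i lists j."""
    return all(i in neighbours.get(j, []) for i, nbrs in neighbours.items() for j in nbrs)

# Hypothetical three-area map: area 1 borders areas 2 and 3.
toy_map = {1: [2, 3], 2: [1], 3: [1]}
assert is_symmetric(toy_map)
adj, weights, num = car_arrays(toy_map)
print(adj, num)   # [2, 3, 1, 1] [2, 1, 1]
```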

44
Model for lip cancer data
(1) Graph
(figure: DAG with nodes for the regression coefficient, covariate, random spatial effects, expected counts and observed counts)
45
Model for lip cancer data
(2) Distributions

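Data: $O_i \sim \mathrm{Poisson}(\mu_i)$
Link function: $\log \mu_i = \log E_i + \alpha_0 + \alpha_1 x_i / 10 + b_i$
Random spatial effects: $b_i \mid b_{-i} \sim N\big(\sum_j w_{ij} b_j / \sum_j w_{ij},\ 1/(\tau \sum_j w_{ij})\big)$, summing over counties $j$ adjacent to $i$
Priors: $\alpha_0$ flat, $\alpha_1 \sim N(0, 10^5)$, $\tau \sim \mathrm{Gamma}(r, d)$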

46
WinBugs for lip cancer data

Bugs and WinBugs are systems for estimating the
posterior distribution in a Bayesian model by
simulation, using MCMC
Data analytic techniques can be used to summarise
(marginal) posteriors for parameters of interest

47
Bugs code for lip cancer data
model {
   b[1:regions] ~ car.normal(adj[], weights[], num[], tau)
   b.mean <- mean(b[])
   for (i in 1 : regions) {
      O[i] ~ dpois(mu[i])
      log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * x[i] / 10 + b[i]
      SMRhat[i] <- 100 * mu[i] / E[i]
   }
   alpha1 ~ dnorm(0.0, 1.0E-5)
   alpha0 ~ dflat()
   tau ~ dgamma(r, d)
   sigma <- 1 / sqrt(tau)
}
52
WinBugs for lip cancer data
Dynamic traces for some parameters
53
WinBugs for lip cancer data
Posterior densities for some parameters
54
How does it work?

The simplest MCMC method is the Gibbs sampler:
in each sweep, visit each variable in turn, and
replace its current value by a random draw from
its full conditional distribution, i.e. its
conditional distribution given all other
variables including the data
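A minimal Gibbs sampler, assuming a standard bivariate normal target with correlation rho, chosen because both full conditionals are normal: x given y is N(rho y, 1 - rho²), and symmetrically for y. Each sweep visits x then y, drawing each from its full conditional given the current value of the other. The function names are illustrative.

```python
import random

def gibbs_bivariate_normal(rho, n_sweeps, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5      # s.d. of each full conditional
    x, y = 0.0, 0.0                   # arbitrary starting values
    draws = []
    for _ in range(n_sweeps):
        x = rng.gauss(rho * y, sd)    # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)    # draw y from p(y | x), using the new x
        draws.append((x, y))
    return draws

def correlation(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

draws = gibbs_bivariate_normal(rho=0.8, n_sweeps=20000)
xs = [x for x, _ in draws]
ys = [y for _, y in draws]
print(round(sum(xs) / len(xs), 2), round(correlation(xs, ys), 2))
```

The sample mean should be near 0 and the sample correlation near 0.8, up to Monte Carlo error.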

55
Full conditionals in a DAG

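Basic DAG factorisation: $p(x) = \prod_v p(x_v \mid x_{\mathrm{pa}(v)})$
Bayes' theorem gives full conditionals
$p(x_v \mid x_{-v}) \propto p(x_v \mid x_{\mathrm{pa}(v)}) \prod_{w:\, v \in \mathrm{pa}(w)} p(x_w \mid x_{\mathrm{pa}(w)})$,
involving only parents, children and spouses.
Often this is a standard distribution, by
conjugacy.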
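Which nodes a full conditional involves can be read off the DAG mechanically: the parents, children and spouses of a node (its Markov blanket). A small sketch, assuming the DAG is given as a dict of parent lists (a hypothetical representation), applied to the basic DAG a → c ← b, c → d:

```python
def markov_blanket(parents, v):
    """Parents, children and spouses (other parents of v's children) of node v."""
    children = {w for w, pa in parents.items() if v in pa}
    spouses = {p for w in children for p in parents[w]} - {v}
    return set(parents[v]) | children | spouses

# Basic DAG a -> c <- b, c -> d, stored as parent lists.
dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
for node in sorted(dag):
    print(node, sorted(markov_blanket(dag, node)))
# a ['b', 'c']
# b ['a', 'c']
# c ['a', 'b', 'd']
# d ['c']
```

Note that a and b are in each other's blanket even though neither is a parent of the other: they are spouses through c.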

56
Full conditionals for lip cancer

for example

57
Beyond the Gibbs sampler

Where the full conditional is not a standard
distribution, other MCMC updates can be used: the
Metropolis-Hastings methods use the full
conditionals algebraically, evaluating them up to a
normalising constant rather than sampling from them
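A random-walk Metropolis update needs the full conditional only up to a normalising constant, since only the ratio at the proposed and current values enters the acceptance step. A minimal sketch, assuming a simplified intercept-only version of the lip cancer model (O_i ~ Poisson(E_i e^α), flat prior on α, no covariate or spatial effects) and using just three (Obs, Exp) rows from the table as toy data; the function names are illustrative.

```python
import math
import random

# Toy data: three (Obs, Exp) rows from the lip cancer table.
O = [9, 39, 0]
E = [1.4, 8.7, 1.8]

def log_full_conditional(alpha):
    """log p(alpha | O) up to a constant, assuming O_i ~ Poisson(E_i * exp(alpha))
    and a flat prior on alpha (a simplified, intercept-only model)."""
    return alpha * sum(O) - math.exp(alpha) * sum(E)

def random_walk_metropolis(log_target, start, step, n_iter, seed=2):
    """Random-walk Metropolis with Gaussian proposals of s.d. `step`."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)     # symmetric proposal
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, ratio); the unknown normalising
        # constant cancels in proposal_lp - current_lp.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = random_walk_metropolis(log_full_conditional, start=0.0, step=0.3, n_iter=20000)
samples = samples[1000:]                               # discard burn-in
posterior_mean_rate = sum(math.exp(a) for a in samples) / len(samples)
print(round(posterior_mean_rate, 2))
```

In this intercept-only case the answer is known exactly: e^α given the data is Gamma(ΣO, ΣE) = Gamma(48, 11.9), with mean 48/11.9 ≈ 4.03, which gives a check on the sampler.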

58
Limitations of MCMC

You can't beat $1/\sqrt{n}$ errors
Autocorrelation limits efficiency
Possibly-undiagnosed failure to converge
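To put a number on "autocorrelation limits efficiency": for a stationary chain with AR(1)-type autocorrelation ρ at lag 1, n draws carry roughly the information of n(1 - ρ)/(1 + ρ) independent ones, and the Monte Carlo standard error scales like one over the square root of that effective sample size. A sketch, with the AR(1) form as the stated assumption:

```python
def ar1_effective_sample_size(n, rho):
    """Effective sample size of n draws from a stationary AR(1) chain with
    lag-1 autocorrelation rho: n divided by (1 + rho) / (1 - rho)."""
    return n * (1.0 - rho) / (1.0 + rho)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(ar1_effective_sample_size(10000, rho)))
# 0.0 10000
# 0.5 3333
# 0.9 526
# 0.99 50
```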

59
Graphical modelling 4

Assuming structure to do probability calculations
Inferring structure to make substantive
conclusions
Structure in model building
Inference about latent variables

60
Latent variable problems
variable unknown
variable known
edges known
edges unknown
value set unknown
value set known
61
Hidden Markov models
e.g. Hidden Markov chain (DLM, state space model)
(figure: hidden chain z0 → z1 → z2 → z3 → z4; each hidden z_t, t = 1, ..., 4, generates an observed y_t)
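For a discrete hidden chain like this one (z0 hidden, with y1 to y4 observed), the likelihood can be computed by the forward recursion, summing out one hidden state at a time instead of over all paths. A minimal sketch with made-up two-state parameters, checked against brute-force enumeration; all names and numbers are illustrative.

```python
from itertools import product

def forward(init, trans, emit, observations):
    """p(y_1, ..., y_T) for a discrete hidden Markov chain, by the forward recursion.

    init[i]     = p(z_0 = i)
    trans[i][j] = p(z_t = j | z_{t-1} = i)
    emit[j][y]  = p(y_t = y | z_t = j)
    """
    k = len(init)
    alpha = list(init)                   # z_0 has no observation attached
    for y in observations:
        # predict the next hidden state, then weight by the emission probability
        alpha = [sum(alpha[i] * trans[i][j] for i in range(k)) * emit[j][y]
                 for j in range(k)]
    return sum(alpha)

def brute_force(init, trans, emit, observations):
    """Same quantity by summing over every hidden path z_0..z_T (exponential cost)."""
    k, T = len(init), len(observations)
    total = 0.0
    for z in product(range(k), repeat=T + 1):
        p = init[z[0]]
        for t in range(1, T + 1):
            p *= trans[z[t - 1]][z[t]] * emit[z[t]][observations[t - 1]]
        total += p
    return total

# Made-up two-state chain with binary observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
print(forward(init, trans, emit, [0, 1]))   # approximately 0.191
```

The forward recursion costs O(T k²) for T observations and k states, against O(k^(T+1)) for enumeration.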
62
Hidden Markov models

Richardson and Green (2000) used a hidden Markov
random field model for disease mapping
(figure: DAG linking observed incidence, expected incidence, relative risk parameters and a hidden MRF)
63
Larynx cancer in females in France
SMRs
64
Latent variable problems
variable unknown
variable known
edges known
edges unknown
value set known
value set unknown
65
Wisconsin students' college plans
10,318 high school seniors (Sewell and Shah, 1968,
and many authors since)
5 categorical variables: sex (2), socioeconomic
status (ses, 4), IQ (iq, 4), parental encouragement
(pe, 2), college plans (cp, 2)
66
(Vastly) most probable graph according to an
exact Bayesian analysis by Heckerman (1999)
(figure: graph on sex, ses, iq, pe, cp)
67
Heckerman's most probable graph with one hidden
variable
(figure: graph on sex, ses, iq, pe, cp and a hidden variable h)
68
Latent variable problems
variable unknown
variable known
edges unknown
edges known
value set known
value set unknown
69
Alarm network
Learning a Bayesian network, for an
ICU ventilator management system, from 10000
cases on 37 variables (Spirtes and Meek, 1995)
70
Ion channel model choice
Hodgson and Green, Proc Roy Soc Lond A, 1999
71
Example hidden continuous time models
(figure: example hidden state diagrams on open states O1, O2 and closed states C1, C2, C3)
72
Ion channel model DAG
(figure: DAG with nodes model indicator, transition rates, hidden state, binary signal, levels and variances, and data)
73
(figure: the same DAG, with the states C1, C2, C3, O1, O2 shown beside the model indicator)

74
Posterior model probabilities
(figure: four candidate models on states {O1, C1}, {O2, O1, C1}, {O1, C1, C2} and {O2, O1, C1, C2}; probabilities shown: .41, .12, .36, .10)
75
Complex Stochastic Systems book (Semstat lectures)

Graphical models and Causality: S Lauritzen
Hidden Markov models: H Künsch
Monte Carlo and Genetics: E Thompson
MCMC: P Green
F den Hollander and G Reinert
ed. O Barndorff-Nielsen, D Cox and
C Klüppelberg, Chapman and Hall (2001)

76
Highly Structured Stochastic Systems book

Graphical models and causality
T Richardson/P Spirtes, S Lauritzen, P
Dawid, R Dahlhaus/M Eichler
Spatial statistics
S Richardson, A Penttinen,
H Rue/M Hurn/O Husby
MCMC
G Roberts, P Green, C Berzuini/W Gilks

77
Highly Structured Stochastic Systems book (ctd)

Biological applications
N Becker, S Heath, R Griffiths
Beyond parametrics
N Hjort, A O'Hagan
... with 30 discussants
editors N Hjort, S Richardson and P Green
OUP (2002?), to appear

78
Further reading

J Whittaker, Graphical models in applied
multivariate statistics, Wiley, 1990
D Edwards, Introduction to graphical modelling,
Springer, 1995
D Cox and N Wermuth, Multivariate dependencies,
Chapman and Hall, 1996
S Lauritzen, Graphical models, Oxford, 1996
M Jordan (ed), Learning in graphical models, MIT
Press, 1999
