Structure in data, and structure in models: PowerPoint presentation

Provided by: Peter647

1
Structure in data, and structure in models
  • uncertainty and complexity

2
What do I mean by structure?
  • The key idea is conditional independence
  • x and z are conditionally independent given y
    if $p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$
  • implying, for example, that
    $p(x \mid y, z) = p(x \mid y)$
  • CI turns out to be a remarkably powerful and
    pervasive idea in probability and statistics
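A minimal Python sketch of the definition, using made-up probability tables for three binary variables (all numbers are hypothetical): a joint built as p(y) p(x | y) p(z | y) passes the check, while a joint in which z simply copies x does not.

```python
from itertools import product

# Hypothetical probability tables for binary x, y, z (illustration only).
P_Y = {0: 0.3, 1: 0.7}
P_X_GIVEN_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_Z_GIVEN_Y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}


def marg(joint, keep):
    """Sum the joint over the variables not named in `keep` (a subset of 'xyz')."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(v for name, v in zip("xyz", xyz) if name in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def ci_given_y(joint, tol=1e-12):
    """True if p(x,z|y) = p(x|y) p(z|y) for every (x, y, z)."""
    py, pxy, pyz = marg(joint, "y"), marg(joint, "xy"), marg(joint, "yz")
    for (x, y, z), p in joint.items():
        lhs = p / py[(y,)]
        rhs = (pxy[(x, y)] / py[(y,)]) * (pyz[(y, z)] / py[(y,)])
        if abs(lhs - rhs) > tol:
            return False
    return True


# Joint built as p(y) p(x|y) p(z|y): CI holds by construction.
ci_joint = {(x, y, z): P_Y[y] * P_X_GIVEN_Y[y][x] * P_Z_GIVEN_Y[y][z]
            for x, y, z in product((0, 1), repeat=3)}
# Joint in which z copies x: x and z remain dependent given y.
copy_joint = {(x, y, z): P_Y[y] * P_X_GIVEN_Y[y][x] * (1.0 if z == x else 0.0)
              for x, y, z in product((0, 1), repeat=3)}

print(ci_given_y(ci_joint), ci_given_y(copy_joint))  # True False
```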

3
How to represent this structure?
  • The idea of graphical modelling: we draw graphs
    in which nodes represent variables, connected by
    lines and arrows representing relationships
  • We separate logical (the graph) and quantitative
    (the assumed distributions) aspects of the model

4
(Diagram: graphical models linked to contingency tables, Markov chains, spatial statistics, genetics, regression, AI, statistical physics, sufficiency and covariance selection)
5
Graphical modelling 1
  • Assuming structure to do probability calculations
  • Inferring structure to make substantive
    conclusions
  • Structure in model building
  • Inference about latent variables

6
Basic DAG
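(Figure: DAG with edges a → c, b → c, c → d)
In general, $p(x_V) = \prod_{v \in V} p(x_v \mid x_{\mathrm{pa}(v)})$;
for example,
$p(a,b,c,d) = p(a)\,p(b)\,p(c \mid a,b)\,p(d \mid c)$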
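The factorisation can be checked numerically. A minimal sketch, assuming binary nodes and hypothetical conditional probability tables: the joint is the product of one term per node given its parents, it sums to 1, and a and b come out marginally independent as the DAG implies.

```python
from itertools import product

# Parents of each node in the basic DAG: a -> c <- b, c -> d.
PARENTS = {"a": (), "b": (), "c": ("a", "b"), "d": ("c",)}

# Hypothetical CPTs for binary nodes: each returns P(node = 1 | parent values).
P_ONE = {
    "a": lambda: 0.4,
    "b": lambda: 0.7,
    "c": lambda a, b: 0.95 if (a and b) else 0.1,
    "d": lambda c: 0.8 if c else 0.3,
}


def joint(values):
    """p(a,b,c,d) as the product over nodes of p(node | its parents)."""
    prob = 1.0
    for node, pars in PARENTS.items():
        p1 = P_ONE[node](*(values[q] for q in pars))
        prob *= p1 if values[node] == 1 else 1.0 - p1
    return prob


states = [dict(zip("abcd", bits)) for bits in product((0, 1), repeat=4)]
total = sum(joint(s) for s in states)
p_a1_b1 = sum(joint(s) for s in states if s["a"] == 1 and s["b"] == 1)
print(round(total, 12), round(p_a1_b1, 12))  # 1.0 0.28
```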
8
A natural DAG from genetics
(Figure: pedigree DAG with ABO genotypes AB, AO, AO, OO, OO)
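In a pedigree DAG each child's genotype has its two parents' genotypes as parents, and the conditional probability table comes from Mendel's rules. A minimal sketch of that table, assuming genotypes written as two-letter strings such as "AO" (a representation chosen here for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def child_genotype_dist(mother, father):
    """Mendelian transmission: each parent passes one of its two alleles with
    probability 1/2, independently, so each allele pair has probability 1/4."""
    dist = Counter()
    for m, f in product(mother, father):
        dist["".join(sorted(m + f))] += Fraction(1, 4)
    return dict(dist)


print(child_genotype_dist("AO", "AO"))
# {'AA': Fraction(1, 4), 'AO': Fraction(1, 2), 'OO': Fraction(1, 4)}
```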
10
DAG for a trivial Bayesian model
(Figure: parameter nodes, whose symbols were lost in extraction, and the data node y)
11
DNA forensics example (thanks to Julia Mortera)
  • A blood stain is found at a crime scene
  • A body is found somewhere else!
  • There is a suspect
  • DNA profiles on all three; the crime scene sample is
    a mixed trace: is it a mix of the victim and
    the suspect?

12
DNA forensics in Hugin
  • Disaggregate the problem in terms of the paternal and
    maternal genes of both victim and suspect.
  • Assume Hardy-Weinberg equilibrium
  • We have profiles on 8 STR markers - treated as
    independent (linkage equilibrium)
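The two assumptions translate directly into arithmetic. A minimal sketch, assuming hypothetical markers "M1" and "M2" with made-up allele frequencies (not the D7S820 frequencies used in the case): Hardy-Weinberg gives each single-marker genotype probability, and linkage equilibrium lets the probabilities be multiplied across markers.

```python
from itertools import combinations_with_replacement


def hw_genotype_probs(allele_freqs):
    """Hardy-Weinberg equilibrium: P(ii) = p_i^2 and P(ij) = 2 p_i p_j for i != j.
    Keys are allele pairs in sorted order."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        probs[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return probs


def profile_prob(profile, freqs_by_marker):
    """Linkage equilibrium: multiply genotype probabilities across markers."""
    prob = 1.0
    for marker, genotype in profile.items():
        prob *= hw_genotype_probs(freqs_by_marker[marker])[tuple(sorted(genotype))]
    return prob


# Hypothetical allele frequencies for two hypothetical markers.
FREQS = {"M1": {"8": 0.2, "10": 0.3, "11": 0.5}, "M2": {"a": 0.6, "b": 0.4}}
print(profile_prob({"M1": ("8", "10"), "M2": ("a", "a")}, FREQS))
```

Here the profile probability is (2 × 0.2 × 0.3) × 0.6², about 0.043.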

13
DNA forensics in Hugin
14
DNA forensics
  • The data
  • 2 of the 8 markers show more than 2 alleles at the crime
    scene, suggesting a mixture of 2 or more people

15
DNA forensics
  • Population gene frequencies for D7S820 (used as
    prior on founder nodes)

17
DNA forensics
  • Results (suspect + victim vs. unknown + victim)

18
How does it work?
  • (1) Manipulate the DAG into the corresponding (undirected)
    conditional independence graph
  • (draw an (undirected) edge between two variables
    if they are not conditionally independent
    given all other variables)
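The usual way to carry out step (1) is moralisation: join ("marry") every pair of parents that share a child, then drop the arrow directions. A minimal sketch, assuming the DAG is stored as a dictionary from each node to its parents, applied to the basic DAG a → c ← b, c → d:

```python
from itertools import combinations


def moralize(parents):
    """Moral graph of a DAG given as {node: iterable of parents}: keep every
    parent-child edge, join each pair of parents of a common child, and drop
    directions. Returns a set of undirected edges as frozensets."""
    edges = set()
    for child, pars in parents.items():
        pars = list(pars)
        edges.update(frozenset((p, child)) for p in pars)
        edges.update(frozenset(pair) for pair in combinations(pars, 2))
    return edges


basic_dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(sorted(tuple(sorted(e)) for e in moralize(basic_dag)))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```

The new edge a - b records that a and b, though marginally independent, become dependent once c is given.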
19
How does it work?
  • (2) If necessary, add edges so it is triangulated
    (decomposable)
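One standard way to add the edges in step (2) is node elimination: visit the nodes in some order and, before removing each one, join all of its remaining neighbours. A minimal sketch, assuming an adjacency-set representation and an arbitrary elimination order (choosing an order with minimum fill-in is itself a hard problem, and this is not claimed to be what Hugin does):

```python
from itertools import combinations


def triangulate(adj, order):
    """Eliminate nodes in `order`; before removing a node, join all of its
    not-yet-eliminated neighbours. Returns the fill-in edges, which together
    with the original edges give a triangulated (chordal) graph."""
    adj = {v: set(nb) for v, nb in adj.items()}   # working copy
    fill = set()
    for v in order:
        nbrs = adj[v]
        for x, y in combinations(sorted(nbrs), 2):
            if y not in adj[x]:
                adj[x].add(y)
                adj[y].add(x)
                fill.add((x, y))
        for x in nbrs:
            adj[x].discard(v)
        del adj[v]
    return fill


# A 4-cycle a-b-c-d-a is not triangulated; eliminating 'a' first adds b-d.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(triangulate(cycle, ["a", "b", "c", "d"]))  # {('b', 'd')}
```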

20

(3) Construct junction tree
(Figure: triangulated graph on nodes 1 to 7, and its junction tree with cliques {1,2}, {2,6,7}, {2,3,6}, {3,4,5,6} and separators {2}, {2,6}, {3,6})
For any 2 cliques C and D, C ∩ D is a subset of
every node between them in the junction tree
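A common construction for step (3), sketched here as one option rather than the method used on the slide: take a maximum-weight spanning tree over the cliques, weighting each pair by the size of its intersection. The sketch uses the cliques read off the slide and then checks the property that C ∩ D lies in every clique on the path between them.

```python
from itertools import combinations


def junction_tree(cliques):
    """Kruskal maximum-weight spanning tree over the cliques, weighting each
    pair by the size of its intersection. For the maximal cliques of a
    triangulated graph this yields a junction tree. Returns (i, j, separator)."""
    cliques = [frozenset(c) for c in cliques]
    root = list(range(len(cliques)))          # union-find forest

    def find(i):
        while root[i] != i:
            i = root[i]
        return i

    pairs = sorted(combinations(range(len(cliques)), 2),
                   key=lambda ij: -len(cliques[ij[0]] & cliques[ij[1]]))
    tree = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                          # skip edges that would close a cycle
            root[ri] = rj
            tree.append((i, j, cliques[i] & cliques[j]))
    return tree


def has_running_intersection(cliques, tree):
    """Check that C ∩ D lies in every clique on the tree path from C to D."""
    cliques = [frozenset(c) for c in cliques]
    nbrs = {i: set() for i in range(len(cliques))}
    for i, j, _ in tree:
        nbrs[i].add(j)
        nbrs[j].add(i)

    def path(u, goal, seen):
        if u == goal:
            return [u]
        for w in nbrs[u] - seen:
            rest = path(w, goal, seen | {u})
            if rest:
                return [u] + rest
        return None

    return all(cliques[i] & cliques[j] <= cliques[k]
               for i, j in combinations(range(len(cliques)), 2)
               for k in path(i, j, set()))


cliques = [{1, 2}, {2, 6, 7}, {2, 3, 6}, {3, 4, 5, 6}]
tree = junction_tree(cliques)
print(sorted(sorted(sep) for _, _, sep in tree))  # [[2], [2, 6], [3, 6]]
print(has_running_intersection(cliques, tree))     # True
```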
21
How does it work?
  • (4) any joint distribution with a triangulated
    graph can be factorised as
    $p(x) = \prod_{C \in \text{cliques}} a_C(x_C) \Big/ \prod_{S \in \text{separators}} b_S(x_S)$
22
How does it work?
  • (5) pass messages along the junction tree, manipulating
    the terms of the expression
  • until the terms are the clique and separator marginals
    $p(x_C)$ and $p(x_S)$,
  • from which marginal probabilities can be read off

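Steps (4) and (5) can be seen on the smallest non-trivial junction tree. A minimal sketch, assuming a chain a → b → c of binary variables with hypothetical tables and a Hugin-style update (multiply the receiving clique by the ratio of new to old separator values); the clique/separator ratio stays equal to the joint throughout, and after one message each way every term is a marginal.

```python
from itertools import product

# Hypothetical CPTs for a chain a -> b -> c of binary variables.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key (c, b)

# Junction tree {a,b} - {b} - {b,c}, starting from
# p(a,b,c) = phi_ab(a,b) * phi_bc(b,c) / psi_b(b) with psi_b = 1.
phi_ab = {(a, b): p_a[a] * p_b_given_a[(b, a)] for a, b in product((0, 1), repeat=2)}
phi_bc = {(b, c): p_c_given_b[(c, b)] for b, c in product((0, 1), repeat=2)}
psi_b = {0: 1.0, 1: 1.0}


def pass_message(sender, receiver, sep, s_pos, r_pos):
    """Marginalise the sender onto the separator variable (index s_pos of its
    keys), rescale the receiver by new/old separator values (index r_pos of its
    keys), then store the new separator. The clique/separator ratio is unchanged."""
    new_sep = {v: 0.0 for v in sep}
    for key, value in sender.items():
        new_sep[key[s_pos]] += value
    for key in receiver:
        receiver[key] *= new_sep[key[r_pos]] / sep[key[r_pos]]
    sep.update(new_sep)


pass_message(phi_bc, phi_ab, psi_b, s_pos=0, r_pos=1)  # collect towards {a,b}
pass_message(phi_ab, phi_bc, psi_b, s_pos=1, r_pos=0)  # distribute back to {b,c}

# Now phi_ab = p(a,b), psi_b = p(b), phi_bc = p(b,c); read off p(c = 1).
p_c1 = sum(phi_bc[(b, 1)] for b in (0, 1))
print(round(p_c1, 10))  # 0.35
```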
23

Probabilistic expert systems: Hugin for the
Asia example
24
Limitations
  • of message passing
      • all variables discrete, or
      • CG (conditional Gaussian) distributions: both continuous and
        discrete variables, but discrete variables precede continuous
        ones, and given the discrete variables the continuous ones
        have a multivariate normal distribution
  • of Hugin
      • complexity seems forbidding for truly realistic
        medical expert systems

25
Graphical modelling 2
  • Assuming structure to do probability calculations
  • Inferring structure to make substantive
    conclusions
  • Structure in model building
  • Inference about latent variables

26
Conditional independence graph
  • draw an (undirected) edge between two variables
    if they are not conditionally independent given
    all other variables
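For Gaussian variables (the covariance selection case), the conditional independence graph can be read off the precision matrix: an edge is absent exactly when the corresponding off-diagonal entry of the inverse covariance is zero. A minimal sketch of that special case, which the slide does not spell out, using exact fractions and a hypothetical covariance for a chain x0 - x1 - x2:

```python
from fractions import Fraction


def invert(matrix):
    """Exact Gauss-Jordan inverse of a square matrix, using Fractions."""
    n = len(matrix)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gaussian_ci_graph(cov):
    """Edge i-j iff the (i, j) entry of the precision matrix is non-zero."""
    K = invert(cov)
    n = len(K)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if K[i][j] != 0}


# Hypothetical covariance: every pair is marginally correlated ...
cov = [[Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)],
       [Fraction(1, 2), Fraction(1), Fraction(1, 2)],
       [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]]
# ... but x0 and x2 are conditionally independent given x1.
print(sorted(gaussian_ci_graph(cov)))  # [(0, 1), (1, 2)]
```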
27
Infant mortality example
  • Data on infant mortality from 2 clinics, by level
    of ante-natal care (Bishop, Biometrics, 1969)

28
Infant mortality example
  • Same data broken down also by clinic

29
Analysis of deviance
  Term                  Df  Deviance  Resid. Df  Resid. Dev  P(>Chi)
  NULL                                        7     1066.43
  Clinic                 1     80.06          6      986.36  3.625e-19
  Ante                   1      7.06          5      979.30  0.01
  Survival               1    767.82          4      211.48  5.355e-169
  Clinic:Ante            1    193.65          3       17.83  5.068e-44
  Clinic:Survival        1     17.75          2        0.08  2.524e-05
  Ante:Survival          1      0.04          1        0.04  0.84
  Clinic:Ante:Survival   1      0.04          0   1.007e-12  0.84
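Each P(>Chi) value is the upper tail of a chi-squared distribution on 1 degree of freedom at the deviance reduction for that term. A minimal sketch of that calculation, using the identity P(χ²₁ > d) = P(|Z| > √d) = erfc(√(d/2)); because the deviances in the table are rounded, the results agree with the P(>Chi) column only up to that rounding.

```python
import math


def chi2_1df_upper_tail(deviance):
    """P(X > deviance) for X ~ chi-squared(1), via erfc(sqrt(deviance / 2))."""
    return math.erfc(math.sqrt(deviance / 2.0))


# Deviance reductions (each on 1 df) from the analysis of deviance table.
reductions = {"Clinic": 80.06, "Ante": 7.06, "Survival": 767.82,
              "Clinic:Ante": 193.65, "Clinic:Survival": 17.75,
              "Ante:Survival": 0.04, "Clinic:Ante:Survival": 0.04}
for term, dev in reductions.items():
    print(f"{term:22s} {chi2_1df_upper_tail(dev):.3e}")
```

The large p-value for Ante:Survival is what supports dropping that edge, leaving survival and ante conditionally independent given clinic.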

30
Infant mortality example
(Graph: ante - clinic - survival)
survival and clinic are dependent
and ante and clinic are dependent
but survival and ante are CI given clinic
31
Prognostic factors for coronary heart disease
Analysis of a $2^6$ contingency table (Edwards and
Havranek, Biometrika, 1985)
strenuous physical work?
smoking?
family history of CHD?
blood pressure > 140?
strenuous mental work?
ratio of β and α lipoproteins > 3?
32
How does it work?
  • Hypothesis testing approaches
  • Tests on deviances, possibly penalised (AIC/BIC,
    etc.), MDL, cross-validation...
  • Problem is how to search model space when
    dimension is large

33
How does it work?
  • Bayesian approaches
  • Typically place a prior on all graphs, and a
    conjugate prior on parameters (hyper-Markov laws,
    Dawid and Lauritzen), then use MCMC (see later) to
    update both graphs and parameters to simulate the
    posterior distribution

34

For example, Giudici and Green (Biometrika, 2000)
use the junction tree representation for fast local
updates to the graph
(Figure: junction tree with cliques 12, 267, 236, 3456 and separators 2, 26, 36)
35

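(Figure: the junction tree after a local update to the graph; clique labels include 127, 267, 236 and 3456, separator labels include 27, 26 and 36)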
36
Graphical modelling 3
  • Assuming structure to do probability calculations
  • Inferring structure to make substantive
    conclusions
  • Structure in model building
  • Inference about latent variables

37
Mixture modelling
  • DAG for a mixture model
(Figure: nodes k, w, a component-parameter node whose symbol was lost in extraction, and y)
38
Mixture modelling
  • DAG for a mixture model
(Figure: nodes k, w, a component-parameter node whose symbol was lost in extraction, the allocation z, and y)
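The allocation z is a latent variable, and its full conditional follows from Bayes' theorem on this DAG: p(z = j | y, w, θ) ∝ w_j f(y | θ_j). A minimal sketch, assuming Gaussian components and writing θ for the component parameters (the slide's symbol is lost; the component family and all numbers are hypothetical):

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def allocation_probs(y, weights, means, sds):
    """Full conditional of the latent label z for one observation y:
    p(z = j | y, w, theta) proportional to w_j * N(y; mean_j, sd_j^2)."""
    unnorm = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical mixture with k = 2 components.
probs = allocation_probs(0.2, weights=[0.3, 0.7], means=[0.0, 5.0], sds=[1.0, 1.0])
print(probs)  # nearly all mass on component 0, whose mean is close to y
```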
39
Modelling with undirected graphs
  • Directed acyclic graphs are a natural
    representation of the way we usually specify a
    statistical model - directionally
  • disease → symptom
  • past → future
  • parameters → data ..
  • However, sometimes (e.g. spatial models) there is
    no natural direction

40
Scottish lip cancer data
  • The rates of lip cancer in 56 counties in
    Scotland have been analysed by Clayton and Kaldor
    (1987) and Breslow and Clayton (1993)
  • (the analysis here is based on the example in the
    WinBugs manual)

41
Scottish lip cancer data (2)
  • The data include
  • the observed and expected cases (expected
    numbers based on the population and its age and
    sex distribution in the county),
  • a covariate measuring the percentage of the
    population engaged in agriculture, fishing, or
    forestry, and
  • the "position" of each county expressed as a
    list of adjacent counties.

42
Scottish lip cancer data (3)
  County  Obs cases  Exp cases  x (% in agric.)  SMR    Adjacent counties
  1       9          1.4        16               652.2  5,9,11,19
  2       39         8.7        16               450.3  7,10
  ...     ...        ...        ...              ...    ...
  56      0          1.8        10               0.0    18,24,30,33,45,55
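The adjacency column has to be flattened before it can be passed to a CAR prior; the GeoBUGS car.normal distribution takes vectors adj, weights and num. A minimal sketch of that conversion and of the SMR formula (100 × observed / expected), assuming a hypothetical three-area map rather than the real 56-county data:

```python
def car_normal_inputs(adjacency):
    """Flatten {area: [neighbours]} into the vectors used by car.normal:
    num[i] = number of neighbours of area i, adj = neighbour indices
    concatenated in area order, weights = one weight per adj entry (all 1).
    The adjacency must be symmetric (if i lists j, j must list i)."""
    areas = sorted(adjacency)
    num = [len(adjacency[a]) for a in areas]
    adj = [n for a in areas for n in adjacency[a]]
    weights = [1] * len(adj)
    return num, adj, weights


def smr(observed, expected):
    """Standardised morbidity ratio as a percentage: 100 * O / E."""
    return 100.0 * observed / expected


# Hypothetical map: area 1 borders areas 2 and 3.
print(car_normal_inputs({1: [2, 3], 2: [1], 3: [1]}))
# ([2, 1, 1], [2, 3, 1, 1], [1, 1, 1, 1])
print(smr(39, 8.7))
```

With the rounded expected count 8.7, county 2 gives an SMR of about 448 rather than the tabulated 450.3, which suggests the table's SMR column was computed from unrounded expected counts.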

43
Model for lip cancer data
(1) Graph
(Figure: DAG with nodes for the regression coefficient, the covariate, random spatial effects, relative risks and observed counts)
44
Model for lip cancer data
(2) Distributions
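  • Data: $O_i \sim \text{Poisson}(\mu_i)$
  • Link function: $\log \mu_i = \log E_i + \alpha_0 + \alpha_1 x_i / 10 + b_i$
  • Random spatial effects: $b \sim$ intrinsic CAR normal with
    precision $\tau$ (car.normal)
  • Priors: $\alpha_0$ flat, $\alpha_1 \sim N(0, 10^5)$ (precision $10^{-5}$),
    $\tau \sim \text{Gamma}(r, d)$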

45
WinBugs for lip cancer data
  • Bugs and WinBugs are systems for estimating the
    posterior distribution in a Bayesian model by
    simulation, using MCMC
  • Data analytic techniques can be used to summarise
    (marginal) posteriors for parameters of interest

46
Bugs code for lip cancer data
model {
   b[1:regions] ~ car.normal(adj[], weights[], num[], tau)
   b.mean <- mean(b[])
   for (i in 1 : regions) {
      O[i] ~ dpois(mu[i])
      log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * x[i] / 10 + b[i]
      SMRhat[i] <- 100 * mu[i] / E[i]
   }
   alpha1 ~ dnorm(0.0, 1.0E-5)
   alpha0 ~ dflat()
   tau ~ dgamma(r, d)
   sigma <- 1 / sqrt(tau)
}
51
WinBugs for lip cancer data
Dynamic traces for some parameters
52
WinBugs for lip cancer data
Posterior densities for some parameters
53
How does it work?
  • The simplest MCMC method is the Gibbs sampler
  • in each sweep, visit each variable in turn, and
    replace its current value by a random draw from
    its full conditional distribution - i.e. its
    conditional distribution given all other
    variables including the data
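A minimal sketch of the Gibbs sampler on a toy target where the full conditionals are known in closed form, assuming a standard bivariate normal with correlation ρ (not the lip cancer model): each sweep replaces x by a draw from p(x | y), then y by a draw from p(y | x).

```python
import random


def gibbs_bivariate_normal(rho, n_sweeps, seed=0):
    """Gibbs sampler for a standard bivariate normal (x, y) with correlation rho.
    Full conditionals: x | y ~ N(rho * y, 1 - rho^2), y | x ~ N(rho * x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n_sweeps):
        x = rng.gauss(rho * y, sd)   # replace x by a draw from p(x | y)
        y = rng.gauss(rho * x, sd)   # replace y by a draw from p(y | x)
        draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_sweeps=20000)
n = len(draws)
mx = sum(x for x, _ in draws) / n
my = sum(y for _, y in draws) / n
sxy = sum((x - mx) * (y - my) for x, y in draws) / n
sxx = sum((x - mx) ** 2 for x, _ in draws) / n
syy = sum((y - my) ** 2 for _, y in draws) / n
corr = sxy / (sxx * syy) ** 0.5
print(round(corr, 2))  # should be close to 0.8
```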

54
Full conditionals in a DAG
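  • Basic DAG factorisation
    $p(x) = \prod_{v} p(x_v \mid x_{\mathrm{pa}(v)})$
  • Bayes' theorem gives full conditionals
    $p(x_v \mid x_{-v}) \propto p(x_v \mid x_{\mathrm{pa}(v)}) \prod_{w:\, v \in \mathrm{pa}(w)} p(x_w \mid x_{\mathrm{pa}(w)})$
  • involving only parents, children and spouses.
  • Often this is a standard distribution, by
    conjugacy.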
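The set of variables that can appear in a full conditional (parents, children, and the children's other parents) is the node's Markov blanket. A minimal sketch, assuming the DAG is stored as a dictionary from each node to its parents, applied to the basic DAG a → c ← b, c → d:

```python
def markov_blanket(parents, v):
    """Nodes that can appear in the full conditional of v in a DAG:
    its parents, its children, and its children's other parents (spouses)."""
    children = {w for w, pars in parents.items() if v in pars}
    spouses = {p for w in children for p in parents[w]} - {v}
    return set(parents[v]) | children | spouses


basic_dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(sorted(markov_blanket(basic_dag, "a")))  # ['b', 'c']
print(sorted(markov_blanket(basic_dag, "d")))  # ['c']
```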

55
Full conditionals for lip cancer
  • for example

56
Beyond the Gibbs sampler
  • Where the full conditional is not a standard
    distribution, other MCMC updates can be used: the
    Metropolis-Hastings methods use the full
    conditionals algebraically (only ratios of the
    unnormalised density are needed)
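A minimal random-walk Metropolis sketch for a non-standard full conditional. The target is a hypothetical simplification: the log relative risk θ of a single county with a Poisson likelihood and a N(0, 1) prior, using the observed and expected counts of county 1 (9 and 1.4) purely for illustration.

```python
import math
import random


def log_target(theta, obs=9, expected=1.4, prior_sd=1.0):
    """Unnormalised log full conditional of a log relative risk theta:
    Poisson(obs | expected * exp(theta)) likelihood times a N(0, prior_sd^2)
    prior. Not a standard distribution, so no direct Gibbs draw is available."""
    return obs * theta - expected * math.exp(theta) - 0.5 * (theta / prior_sd) ** 2


def random_walk_metropolis(log_f, start, step, n_iter, seed=1):
    """Propose theta' = theta + N(0, step^2) and accept with probability
    min(1, f(theta') / f(theta)). Only ratios of f are used, so its
    normalising constant is never needed."""
    rng = random.Random(seed)
    theta, lf = start, log_f(start)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lf_prop = log_f(prop)
        if rng.random() < math.exp(min(0.0, lf_prop - lf)):
            theta, lf = prop, lf_prop
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter


chain, rate = random_walk_metropolis(log_target, start=0.0, step=0.5, n_iter=50000)
kept = chain[5000:]                 # discard a burn-in period
post_mean = sum(kept) / len(kept)
print(round(rate, 2), round(post_mean, 2))
```

Inside a larger sampler, such a step would simply replace the Gibbs draw for this one variable while the other variables keep their conjugate updates.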

57
Limitations of MCMC
  • You can't beat $1/\sqrt{n}$ errors
  • Autocorrelation limits efficiency
  • Possibly-undiagnosed failure to converge
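The first two limitations can be quantified together. A minimal sketch, assuming an AR(1) series as a stand-in for autocorrelated MCMC output and using the batch-means method (one of several standard estimators) for the Monte Carlo standard error: with no autocorrelation the effective sample size is close to n, while with lag-1 autocorrelation 0.9 it drops to roughly n(1 - ρ)/(1 + ρ), about n/19.

```python
import random


def ar1_chain(rho, n, seed=2):
    """Stationary AR(1) series with unit marginal variance and lag-1
    autocorrelation rho, standing in for autocorrelated MCMC output."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = rng.gauss(0.0, 1.0)          # start in the stationary distribution
    out = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, sd)
        out.append(x)
    return out


def batch_means_se(chain, n_batches=50):
    """Monte Carlo standard error of the chain mean by batch means: split the
    chain into contiguous batches and use the spread of the batch means."""
    size = len(chain) // n_batches
    means = [sum(chain[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return (var_of_means / n_batches) ** 0.5


n = 100_000
se = {rho: batch_means_se(ar1_chain(rho, n)) for rho in (0.0, 0.9)}
# Effective sample size: the marginal variance is 1, so ESS = 1 / se^2.
ess = {rho: 1.0 / s ** 2 for rho, s in se.items()}
print(ess)  # roughly n for rho = 0, roughly n / 19 for rho = 0.9
```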

58
Graphical modelling 4
  • Assuming structure to do probability calculations
  • Inferring structure to make substantive
    conclusions
  • Structure in model building
  • Inference about latent variables

59
Latent variable problems
(Diagram: latent variable problems classified by whether the variable, the edges and the value set are known or unknown)
60
Hidden Markov models
e.g. hidden Markov chain
(Figure: hidden states z0 → z1 → z2 → z3 → z4; each of z1, ..., z4 has an observed child y1, ..., y4)
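On a hidden Markov chain, junction-tree message passing reduces to the forward recursion. A minimal sketch, assuming a hypothetical two-state chain with binary observations and the layout of the figure (z0 has no observation); the brute-force sum over all hidden paths is included only as a check.

```python
from itertools import product


def hmm_likelihood(init, trans, emit, ys):
    """Forward algorithm: p(y_1..y_T) for a hidden chain z_0 -> ... -> z_T in
    which each z_t (t >= 1) emits y_t. init[i] = p(z_0 = i),
    trans[i][j] = p(z_t = j | z_{t-1} = i), emit[j][y] = p(y_t = y | z_t = j)."""
    states = range(len(init))
    alpha = list(init)               # z_0 has no observation attached
    for y in ys:
        # alpha[j] becomes p(y_1..y_t, z_t = j)
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][y]
                 for j in states]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, ys):
    """The same quantity by summing over every hidden path (exponential cost)."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(ys) + 1):
        p = init[path[0]]
        for t, y in enumerate(ys, start=1):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][y]
        total += p
    return total


# Hypothetical two-state chain with binary observations y_1..y_4.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
ys = [0, 1, 1, 0]
print(hmm_likelihood(init, trans, emit, ys), brute_force_likelihood(init, trans, emit, ys))
```

The forward pass costs time linear in the chain length, whereas enumeration grows exponentially.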
61
Hidden Markov models
  • Richardson and Green (2000) used a hidden Markov
    random field model for disease mapping

(Figure: DAG linking a hidden MRF, relative risk parameters, expected incidence and observed incidence)
62
Larynx cancer in females in France
SMRs
63
Latent variable problems
(Diagram: latent variable problems classified by whether the variable, the edges and the value set are known or unknown)
64
Ion channel model choice
Hodgson and Green, Proc Roy Soc Lond A, 1999
65
Example hidden continuous time models
(Figure: two example state diagrams, one with states C1, C2, O1, O2 and one with states C1, C2, C3, O1, O2)
66
Ion channel model DAG
(Figure: nodes for the model indicator, transition rates, hidden state, binary signal, levels and variances, and data)
67
(Figure: the same DAG, with the hidden-state node illustrated by the states C1, C2, C3, O1, O2)
68
Posterior model probabilities
  Model (states)      Posterior probability
  O1, C1              .41
  O2, O1, C1          .12
  O1, C1, C2          .36
  O2, O1, C1, C2      .10
69
Alarm network
Learning a Bayesian network, for an
ICU ventilator management system, from 10000
cases on 37 variables (Spirtes and Meek, 1995)
70
Latent variable problems
(Diagram: latent variable problems classified by whether the variable, the edges and the value set are known or unknown)
71
Wisconsin students' college plans
10,318 high school seniors (Sewell and Shah, 1968,
and many authors since)
5 categorical variables: sex (2 levels), socioeconomic
status, ses (4), IQ, iq (4), parental encouragement,
pe (2), college plans, cp (2)
(Figure: nodes sex, ses, iq, pe and cp)
72
(Vastly) most probable graph according to an
exact Bayesian analysis by Heckerman (1999)
(Figure: DAG over sex, ses, iq, pe and cp)
73
Heckerman's most probable graph with one hidden
variable
(Figure: DAG over sex, ses, iq, pe, cp and a hidden node h)
74
CSS book (Complex Stochastic Systems)
  • Graphical models and causality: S Lauritzen
  • Hidden Markov models: H Künsch
  • Monte Carlo and genetics: E Thompson
  • MCMC: P Green
  • F den Hollander and G Reinert
  • ed. O Barndorff-Nielsen, D Cox and
    C Klüppelberg, Chapman and Hall (2001)

75
HSSS book (Highly Structured Stochastic Systems)
  • Graphical models and causality
  • T Richardson/P Spirtes, S Lauritzen, P
    Dawid, R Dahlhaus/M Eichler
  • Spatial statistics
  • S Richardson, A Penttinen,
    H Rue/M Hurn/O Husby
  • MCMC
  • G Roberts, P Green, C Berzuini/W Gilks

76
HSSS book (ctd)
  • Biological applications
  • N Becker, S Heath, R Griffiths
  • Beyond parametrics
  • N Hjort, A O'Hagan
  • ... with 30 discussants
  • editors N Hjort, S Richardson and P Green
  • OUP (2002?), to appear

77
Further reading
  • J Whittaker, Graphical models in applied
    multivariate statistics, Wiley, 1990
  • D Edwards, Introduction to graphical modelling,
    Springer, 1995
  • D Cox and N Wermuth, Multivariate dependencies,
    Chapman and Hall, 1996
  • S Lauritzen, Graphical models, Oxford, 1996
  • M Jordan (ed), Learning in graphical models, MIT
    press, 1999