Title: Priors
1. Priors
- Trevor Sweeting
- Department of Statistical Science
- University College London
2. Structure of talk
- Bayesian inference: the basics
- Specification of the prior
- Examples
- Subjective priors
- Nonsubjective priors
- Examples
- Methods of prior construction
- Coverage probability bias
- Relative entropy loss
- Wrap-up
3. Bayesian inference: the basics
- X: the experimental or observational data to be observed
- Y: the future observations to be predicted
- Data model p(x | θ)
- (Possibly improper) prior distribution π(θ)
- The posterior density of θ is π(θ | x) ∝ π(θ) p(x | θ)
- Posterior density ∝ Prior density × Likelihood function
- From the posterior: posterior probabilities, moments, marginal densities, expected losses, predictive densities ...
4. Bayesian inference
- The predictive density of Y given X = x is
- p(y | x) = ∫ p(y | θ) π(θ | x) dθ
- Where are we? ...
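A minimal numerical sketch of the two displays above (my own illustration, not from the talk; the Bernoulli model, data and grid are assumptions made purely for the example):

    # Posterior ∝ prior × likelihood on a grid, then used for prediction,
    # for an assumed Bernoulli(theta) model with an assumed data set.
    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 1, 1, 1])      # assumed observed data
    theta = np.linspace(0.001, 0.999, 999)       # grid over the parameter
    dtheta = theta[1] - theta[0]

    prior = np.ones_like(theta)                  # uniform prior density on (0, 1)
    loglik = x.sum() * np.log(theta) + (len(x) - x.sum()) * np.log(1.0 - theta)
    posterior = prior * np.exp(loglik)           # posterior ∝ prior × likelihood
    posterior /= posterior.sum() * dtheta        # normalise numerically

    # predictive probability of a future success: p(y = 1 | x) = ∫ theta π(theta | x) dtheta
    pred_y1 = (theta * posterior).sum() * dtheta
    print(round(float(pred_y1), 3))

With the uniform prior this reproduces the Beta-Binomial answer (6 + 1)/(8 + 2) = 0.7, the Laplace rule of succession.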
5. Specification of the prior
- Approaches vary
- from fully Bayesian analyses based on fully elicited subjective priors
- to fully frequentist analyses based on nonsubjective (objective) priors
                 Fully Bayesian                        Fully Frequentist
Subjective       Elicited prior
Mixed            Performance                           Penalty fn
Nonsubjective    Default prior / Dual verification     Performance
6. Examples
- Four examples
- All taken from Applied Statistics, 52 (2003)
- Competing risks
- Image analysis
- Diagnostic testing
- Geostatistical modelling
7. Competing risks (Basu and Sen)
- System failure data: cause of failure not identified
- n systems, R competing risks
- Datum for each system is (T, S, C)
- T is the failure time, S are the possible causes of failure, C is a censoring indicator
- Parameters in the model are of location-scale type
- Use (i) informative conjugate priors
- Source: historical data
- or (ii) noninformative priors
- such that they have a minimal effect on the analysis
- Implementation via Gibbs sampling
8. Image analysis (Dryden, Scarr and Taylor)
- Segmentation of weed and crop textures
- Automatic identification of weeds in images of row crops
- Parameters are (k, C, φ)
- k is the number of texture components, C are texture labels, φ are parameters associated with the distribution of pixel intensities
- Highly structured prior for (k, C, φ)
- Markov random field for C, truncated conjugate priors for φ
- Hyperparameters set in context, e.g. to encourage relatively few textures
- Implementation via Markov chain Monte Carlo
9. Diagnostic testing (Georgiadis, Johnson, Gardner and Singh)
- Multiple-test screening data: the models are unidentifiable
- A Bayesian analysis therefore depends critically on prior information
- Parameters consist of various (at least 8) joint sensitivity and specificity probabilities
- Independent beta priors: two informative, the rest noninformative
- Investigate coverage performance and sensitivities for various choices of prior
- Implementation via Gibbs sampling
10. Geostatistical modelling (Kammann and Wand)
- Geostatistical mapping to study geographical variability of reproductive health outcomes (disease mapping)
- Geoadditive models
- Universal kriging model involves a stationary zero-mean stochastic process over sites
- leads to borrowing strength
- Non-Bayesian analysis, but the model could be formulated in a Bayesian way, with the mean responses at the given sites having a multivariate normal prior
- Implementation: residual ML and splines
11. Table for examples

                 Fully Bayesian        Fully Frequentist
Subjective
Nonsubjective

(the four examples, Competing risks, Image analysis, Diagnostic testing and Geostatistical modelling, placed in this grid)
12. Subjective priors
- To some extent, all the previous examples included subjective prior specification
- Methods of elicitation
- Industrial and medical contexts
- Scientific reporting
- Range of prior specifications: conduct sensitivity analyses
13. Subjective priors
- Psychological research should be taken into account when devising methods for prior elicitation
- Construction of questions
- Anchors
- Probability assessment by frequency
- Availability; inverse expertise effect
- Priors are often too narrow
Experimental Psychology, Behavioural Decision
Making, Management Science, Cognitive Psychology
14. Nonsubjective priors
- Nonsubjective (objective) priors: why?
- Sensible default priors for non-experts (and experts!)
- Recognise that the basis is often weak
- Possible nasty surprises!
- Reference priors for regulatory bodies
- Clinical trials, industrial standards, official statistics
- Safe default priors for high-dimensional problems
- Priors more difficult to specify, and possibly a more severe effect
15. Nonsubjective priors
- Some general problems
- Improper priors
- Improper posteriors
- e.g. hierarchical models
- Marginalisation and sampling-theory paradoxes
- Dutch books
- Inconsistency
- Posterior doesn't concentrate around the true value asymptotically
- Inadmissibility
- of Bayes decision rules/estimators
16. Nonsubjective priors
- Proper diffuse priors
- Near-impropriety of posterior
- Unintended large impact on posterior
- Example to follow ...
- Arbitrary choice of hyperparameters
- Non-objectivity
- Lack of invariance
- Egg on face ...
Two examples ...
17. WinBUGS - the Movie!
(φ is the precision)
- Data: 529.0, 530.0, 532.0, 533.1, 533.4, 533.6, 533.7, 534.1, 534.8, 535.3
- Prior parameters: a = b = c = 0.001
- Relatively diffuse prior
- Results ...
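For orientation, here is a minimal Gibbs sampler for a normal model of this kind: mean μ and precision φ, with φ ~ Gamma(a, b) and μ given a normal prior with precision c. The zero prior mean for μ, and therefore the exact output, are my own assumptions rather than the talk's specification.

    # Gibbs sketch: x_i ~ N(mu, 1/phi), phi ~ Gamma(a, b), mu ~ N(0, 1/c).
    # Hyperparameters follow the slide (a = b = c = 0.001); the prior mean 0 is assumed.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([529.0, 530.0, 532.0, 533.1, 533.4,
                  533.6, 533.7, 534.1, 534.8, 535.3])
    n, a, b, c = len(x), 0.001, 0.001, 0.001

    mu, phi = x.mean(), 1.0                      # starting values
    mus = []
    for _ in range(20000):
        # full conditional: mu | phi, x ~ N(m1, 1/p1), with precision p1 = c + n*phi
        p1 = c + n * phi
        m1 = n * phi * x.mean() / p1             # the assumed prior mean 0 adds nothing
        mu = rng.normal(m1, 1.0 / np.sqrt(p1))
        # full conditional: phi | mu, x ~ Gamma(a + n/2, rate b + sum((x - mu)^2)/2)
        phi = rng.gamma(a + n / 2, 1.0 / (b + 0.5 * np.sum((x - mu) ** 2)))
        mus.append(mu)

    print(np.mean(mus[10000:]))                  # crude posterior summary for mu

Because the prior mean is only an assumption, the sketch illustrates the mechanics rather than reproducing the numbers behind the slides; the next few slides make the point that apparently stable output from such a sampler need not be the true posterior.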
18. WinBUGS - the Movie!
Just another few iterations to make sure ...
19. WinBUGS - the Movie!
Oops!
20. WinBUGS - the Movie!
- Effect of the choice of c (the prior precision of μ)
- c = 0.001: WinBUGS eventually gets the right answer
- but presumably not the answer we wanted!
- The noninformative prior dominates the likelihood.
21. WinBUGS - the Movie!
- c = 0.0002: WinBUGS gives the right answer, with the likelihood dominating
- However, it's the wrong answer, as the true marginal posterior of μ is still dominated by the prior
22. WinBUGS - the Movie!
- c = 0.00016: WinBUGS again gives the right answer, with the likelihood dominating
- But it's still the wrong answer
- The true marginal posterior distribution of μ is bimodal
23. WinBUGS - the Movie!
- c = 0.00010: WinBUGS gives the right answer
- ... and presumably the one we wanted!
Care needed in the choice of prior parameters in diffuse but proper priors
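One way to see what is going on is to bypass MCMC altogether: in the semi-conjugate model sketched earlier, φ can be integrated out analytically, leaving the marginal posterior of μ up to proportionality. The check below is my own (and uses the assumed zero prior mean, so the values of c at which the behaviour changes need not match the slides); it shows how a seemingly diffuse proper prior can dominate, give a bimodal marginal posterior, or wash out as c varies.

    # Exact marginal posterior of mu on a grid, with phi integrated out:
    #   pi(mu | x) ∝ exp(-c*mu^2/2) * (b + S(mu)/2)^(-(a + n/2)),  S(mu) = sum_i (x_i - mu)^2.
    # Same assumed model as the Gibbs sketch above.
    import numpy as np

    x = np.array([529.0, 530.0, 532.0, 533.1, 533.4,
                  533.6, 533.7, 534.1, 534.8, 535.3])
    n, a, b = len(x), 0.001, 0.001
    mu = np.linspace(-200.0, 800.0, 10001)
    dmu = mu[1] - mu[0]

    for c in (1e-3, 5e-4, 3.5e-4, 1e-4):
        S = ((x[:, None] - mu[None, :]) ** 2).sum(axis=0)
        logpost = -0.5 * c * mu ** 2 - (a + n / 2) * np.log(b + S / 2)
        post = np.exp(logpost - logpost.max())
        post /= post.sum() * dmu                 # normalise on the grid
        print(c, "posterior mean of mu:", round(float((mu * post).sum() * dmu), 1))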
24. Normal regression
- Normal linear regression model (φ is the precision)
- Limit of the proper conjugate priors as they become diffuse
- Jeffreys' prior
- Here the prior π(β, φ) ∝ 1/φ gives exact matching in both posterior and predictive distributions
25. Normal regression
- Data: n = 25, R = residual sum of squares = 2.1
26. Normal regression
- Prediction. Let Y be a future observation and consider the usual predictive pivotal quantity.
Prediction less sensitive to prior than
estimation
27. Methods of prior construction
- Limits of proper priors
- Uniform priors/choice of scale
- Data-translated likelihood
- Constant asymptotic precision
- Canonical parameterisation
- Coverage Probability Bias
- Decision-theoretic
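In regular one-parameter problems several of these routes (data-translated likelihood, constant asymptotic precision, coverage probability bias) lead back to Jeffreys' rule, which for reference (my own addition, in standard notation) is

    \pi(\theta) \;\propto\; \sqrt{\det I(\theta)},

where I(θ) is the Fisher information.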
28. Coverage probability bias
- Sometimes investigated in papers via simulation (cf. the diagnostic testing example)
- Parametric CPB
- When do Bayesian credible intervals have the correct frequentist coverage?
- In regular one-parameter problems, matching is asymptotically achieved by Jeffreys' prior (Welch and Peers, 1963); see the sketch after this slide
- In multiparameter families we cannot in general achieve matching for all marginals using the same prior
- Usually contravenes the likelihood principle (see Sweeting, 2001, for a discussion)
- Avoid infinite confidence sets! (e.g. ratios of parameters)
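A small numerical illustration of parametric matching (my own sketch, not from the talk): for an exponential sample the rate λ is a scale-type parameter, so Jeffreys' prior π(λ) ∝ 1/λ gives exactly matching equal-tailed posterior intervals, whereas a flat prior does not. The exponential model, sample size and nominal level are assumptions made for the example.

    # Frequentist coverage of 95% equal-tailed posterior intervals for an exponential rate.
    # Jeffreys' prior 1/lambda -> posterior Gamma(n, sum x); flat prior -> Gamma(n + 1, sum x).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, lam_true, reps = 10, 2.0, 20000
    hits_jeff = hits_flat = 0
    for _ in range(reps):
        x = rng.exponential(1.0 / lam_true, size=n)
        s = x.sum()
        lo_j, hi_j = stats.gamma.ppf([0.025, 0.975], a=n, scale=1.0 / s)
        lo_f, hi_f = stats.gamma.ppf([0.025, 0.975], a=n + 1, scale=1.0 / s)
        hits_jeff += lo_j <= lam_true <= hi_j
        hits_flat += lo_f <= lam_true <= hi_f

    print("Jeffreys:", hits_jeff / reps, " flat:", hits_flat / reps)

The Jeffreys intervals should show coverage essentially at the nominal 0.95, while the flat-prior intervals typically fall somewhat short of it at this sample size.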
29. Coverage probability bias
- Predictive CPB
- When do Bayesian predictive intervals have the correct frequentist coverage?
- In regular one-parameter problems, there exists a unique prior for which there is no asymptotic CPB ...
- ... but in general this depends on the probability level α!
- If there does exist a matching prior that is free of α then it is Jeffreys' prior (Datta, Mukerjee, Ghosh and Sweeting, 2000)
- In the multiparameter case, if there exists a matching prior then it is usually not Jeffreys' prior
30. Relative entropy loss
- The reference prior (Bernardo, 1979) maximises the Shannon mutual information between θ and X
- Maximises the distance between the prior and the posterior: minimal effect of the prior
- Also arises as an asymptotically minimax solution under relative entropy loss (Clarke and Barron, 1994; Barron, 1998)
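The quantity being maximised can be written out explicitly; the display below is my own addition in generic notation rather than a reproduction of the talk's slide:

    I(\pi) \;=\; \iint \pi(\theta)\, p(x \mid \theta)\, \log\frac{\pi(\theta \mid x)}{\pi(\theta)}\, dx\, d\theta
           \;=\; E_X\!\left[\, \mathrm{KL}\bigl(\pi(\cdot \mid X) \,\|\, \pi\bigr) \right],

that is, the expected Kullback-Leibler divergence from the prior to the posterior, which is why maximising it is read as giving the data the greatest possible influence relative to the prior.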
31. Relative entropy loss
- Define the prior-predictive regret
- The minimax/reference prior solution for the full parameter is usually Jeffreys' prior
- Bernardo argues that when nuisance parameters are present the reference prior should depend on which parameter(s) are considered to be of primary interest
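For orientation (my own addition, not the talk's display): the standard asymptotic expansion of the relative entropy risk of the Bayes mixture m_π under a regular d-dimensional i.i.d. model, due to Clarke and Barron, is

    \mathrm{KL}\bigl(p_\theta^{\,n} \,\|\, m_\pi\bigr) \;=\; \frac{d}{2}\,\log\frac{n}{2\pi e}
      \;+\; \log\frac{\sqrt{\det I(\theta)}}{\pi(\theta)} \;+\; o(1),

where I(θ) is the Fisher information; the θ-dependent term is constant exactly when π is Jeffreys' prior, which is the sense in which the minimax/reference solution for the full parameter is usually Jeffreys' prior.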
32. Relative entropy loss
- A predictive relative entropy approach
- Geisser (1979) suggested a predictive information criterion introduced by Aitchison (1975)
- Standard argument for using log q(Y) as an operational/default utility function for q as a predictive density for a future observation Y (cf. Good, 1968)
33. Relative entropy loss
- The expected regret, under the log-score loss function, associated with using the predictive density based on the prior p when Y arises from the predictive density under a prior τ
- An appropriate object to study for constructing objective prior distributions when we are interested in the predictive performance of p under repeated use or under alternative subjective priors τ
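For completeness, the standard identity behind the log-score regret (my own addition, in generic notation): if Y has density g and the predictive density q is used, the expected regret relative to the optimal choice q = g is the relative entropy

    E_{Y \sim g}\bigl[\log g(Y) - \log q(Y)\bigr] \;=\; \int g(y)\,\log\frac{g(y)}{q(y)}\,dy \;=\; \mathrm{KL}(g \,\|\, q) \;\ge\; 0.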
34. Relative entropy loss
- Now define the predictive relative entropy loss (PREL)
- where J is Jeffreys' prior
- Studying the behaviour of the regret over τ in sets of constant 'predictive information' is equivalent to studying the behaviour of the PREL
35. Relative entropy loss
36. Relative entropy loss
- Under suitable regularity conditions we get
- Although the defined loss functions cover an infinite variety of possibilities for (a) the amount of data to be observed and (b) the predictions to be made, they are all approximately equivalent, provided that a sufficient amount of data is to be observed.
- Call this common limit the (asymptotic) predictive loss
37. Relative entropy loss
- More generally define
- This represents the asymptotically worst-case loss
- Investigate its behaviour
- Let
- The prior is minimax if
38. Relative entropy loss
- Example 1
- Consider the class of improper priors
- These all deliver constant risk, with
- All the priors with c nonzero are therefore inadmissible
- Jeffreys' prior (c = 0) is minimax
39. Relative entropy loss
- Example 2
- Consider the class of improper priors
- These all deliver constant risk, with
- L attains its minimum value when a = 1, which corresponds to Jeffreys' independence prior
- The minimum value is -1/2 < 0, so that Jeffreys' prior is inadmissible
40. Relative entropy loss
- Example 3
- Consider again the class of improper priors
- These all deliver constant risk, with
- L attains its minimum value when a = 1, which again corresponds to Jeffreys' independence prior
- The drop in predictive loss increases as the square of the number q of regressors in the model
41. Relative entropy loss
- The above predictive minimax priors also give rise to minimum predictive coverage probability bias (Datta, Mukerjee, Ghosh and Sweeting, 2000)
- Final note: an inappropriately elicited subjective prior may lead to very high predictive risk!
42. Wrap-up
- We have reviewed some common approaches to prior construction, from full elicitation to the use of default recipes
- Need to be aware of the dangers, whatever the approach
- As model complexity increases it becomes more difficult to make sensible prior assignments. At the same time, the effect of the prior specification can become more pronounced
- Important to have a sound methodology for the construction of priors in the multiparameter case
- Data-dependent priors may be justifiable (e.g. Box-Cox transformation model)
43. Wrap-up
- More extensive analysis of the predictive risk approach is needed
- Developing general methods of finding exact and approximate solutions for practical implementation
- Investigating connections with predictive coverage probability bias
- Analysing dependent and non-regular problems
- Investigating problems involving mixed subjective/nonsubjective priors
- Priors for model choice or model averaging ...
- ... another talk!
44. Wrap-up
Have a great conference!