1
Priors
  • Trevor Sweeting
  • Department of Statistical Science
  • University College London

2
Structure of talk
  • Bayesian inference: the basics
  • Specification of the prior
  • Examples
  • Subjective priors
  • Nonsubjective priors
  • Examples
  • Methods of prior construction
  • Coverage probability bias
  • Relative entropy loss
  • Wrap-up

3
Bayesian inference: the basics
  • X: the experimental or observational data to be
    observed
  • Y: the future observations to be predicted
  • Data model
  • (Possibly improper) prior distribution
  • The posterior density of θ is then (in symbols
    below)

Posterior density ∝ Prior density × Likelihood
function
from which follow posterior probabilities, moments,
marginal densities, expected losses, predictive
densities ...
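
In symbols (a standard reconstruction; the formula
itself is missing from the transcript), with prior
density π(θ) and data model f(x | θ):

\[ \pi(\theta \mid x) \propto \pi(\theta)\, f(x \mid \theta) \]
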
4
Bayesian inference
  • The predictive density of Y given X is the
    posterior average of the data model
    (reconstructed below)
  • Where are we?...
  • Philosophical basis
  • Practical implementation
  • Prior construction ...
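
The missing predictive formula is presumably the
standard posterior average of the data model (an
assumption from context):

\[ p(y \mid x) = \int f(y \mid \theta, x)\, \pi(\theta \mid x)\, d\theta \]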

5
Specification of the prior
  • Approaches vary
  • from fully Bayesian analyses based on fully
    elicited subjective priors
  • to fully frequentist analyses based on
    nonsubjective (objective) priors

                 Fully Bayesian                       Fully Frequentist
  Subjective     Elicited prior
  Mixed          Performance        Penalty fn
  Nonsubjective  Default prior      Dual verification Performance
6
Examples
  • Four examples
  • All taken from Applied Statistics, 52 (2003)
  • Competing risks
  • Image analysis
  • Diagnostic testing
  • Geostatistical modelling

7
Competing risks (Basu and Sen)
  • System failure data: cause of failure not
    identified
  • n systems, R competing risks
  • Datum for each system is (T, S, C)
  • T is failure time, S are the possible causes of
    failure, C is a censoring indicator
  • Parameters in the model are of location-scale
    type
  • Use (i) informative conjugate priors
  • Source: historical data
  • or (ii) noninformative priors
  • such that they have a minimal effect on the
    analysis
  • Implementation via Gibbs sampling

8
Image analysis (Dryden, Scarr and Taylor)
  • Segmentation of weed and crop textures
  • Automatic identification of weeds in images of
    row crops
  • Parameters are (k, C, φ)
  • k is the number of texture components, C are
    texture labels, φ are parameters associated with
    the distribution of pixel intensities
  • Highly structured prior for (k, C, φ)
  • Markov random field for C, truncated conjugate
    priors for φ
  • Hyperparameters set in context, e.g. to
    encourage relatively few textures
  • Implementation via Markov chain Monte Carlo

9
Diagnostic testing (Georgiadis, Johnson, Gardner
and Singh)
  • Multiple-test screening data: models are
    unidentifiable
  • A Bayesian analysis therefore depends critically
    on prior information
  • Parameters consist of various (at least 8) joint
    sensitivity and specificity probabilities
  • Independent beta priors: two informative, the
    rest noninformative
  • Investigate coverage performance and
    sensitivities for various choices of prior
  • Implementation via Gibbs sampling

10
Geostatistical modelling (Kammann and Wand)
  • Geostatistical mapping to study geographical
    variability of reproductive health outcomes
    (disease mapping)
  • Geoadditive models
  • Universal kriging model involves a stationary
    zero-mean stochastic process over sites
  • leads to borrowing strength
  • Non-Bayesian analysis, but model could be
    formulated in a Bayesian way, with the mean
    responses at the given sites having a
    multivariate normal prior
  • Implementation: residual ML and splines

11
Table for examples
                 Fully Bayesian        Fully Frequentist
  Subjective     Image analysis
                 Competing risks
                 Diagnostic testing
  Nonsubjective                        Geostatistical modelling
12
Subjective priors
  • To some extent, all the previous examples
    included subjective prior specification
  • Methods of elicitation
  • Industrial and medical contexts
  • Scientific reporting
  • Range of prior specifications: conduct
    sensitivity analyses

13
Subjective priors
  • Prior elicitation methods should take account of
    psychological research
  • Construction of questions
  • Anchors
  • Probability assessment by frequency
  • Availability; the inverse expertise effect
  • Priors are often too narrow

Relevant literatures: Experimental Psychology,
Behavioural Decision Making, Management Science,
Cognitive Psychology
14
Nonsubjective priors
  • Nonsubjective (objective) priors: why?
  • Sensible default priors for non-experts (and
    experts!)
  • Recognise that the basis is often weak
  • Possible nasty surprises!
  • Reference priors for regulatory bodies
  • Clinical trials, industrial standards, official
    statistics
  • Safe default priors for high-dimensional problems
  • Priors are more difficult to specify and may
    have a more severe effect

15
Nonsubjective priors
  • Some general problems
  • Improper priors
  • Improper posteriors
  • E.g. Hierarchical models
  • Marginalisation and sampling theory paradoxes
  • Dutch books
  • Inconsistency
  • Posterior doesn't concentrate around the true
    value asymptotically
  • Inadmissibility
  • of Bayes decision rules/estimators

16
Nonsubjective priors
  • Proper diffuse priors
  • Near-impropriety of posterior
  • Unintended large impact on posterior
  • Example to follow ...
  • Arbitrary choice of hyperparameters
  • Non-objectivity
  • Lack of invariance
  • Egg on face ...

Two examples ...
17
WinBUGS - the Movie!
(φ is the precision)
  • Data: 529.0, 530.0, 532.0, 533.1, 533.4, 533.6,
    533.7, 534.1, 534.8, 535.3
  • Prior parameters: a = b = c = 0.001
  • Relatively diffuse prior
  • Results ...

18
WinBUGS - the Movie!
Just another few iterations to make sure ...
19
WinBUGS - the Movie!
Oops!
20
WinBUGS - the Movie!
  • Effect of choice of c (the prior precision of μ)
  • c = 0.001: WinBUGS eventually gets the right
    answer
  • but presumably not the answer we wanted!
  • The noninformative prior dominates the
    likelihood.

21
WinBUGS - the Movie!
  • c = 0.0002: WinBUGS gives the right answer with
    the likelihood dominating
  • However, it's the wrong answer, as the true
    marginal posterior of μ is still dominated by the
    prior

22
WinBUGS - the Movie!
  • c = 0.00016: WinBUGS again gives the right
    answer with the likelihood dominating
  • But it's still the wrong answer
  • The true marginal posterior distribution of μ
    is bimodal

23
WinBUGS - the Movie!
  • c = 0.00010: WinBUGS gives the right answer
  • ... and presumably the one we wanted!

Care needed in the choice of prior
parameters in diffuse but proper priors
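
The transcript omits the model and the WinBUGS
output. A minimal numerical sketch of the
phenomenon, assuming the usual specification
X_i ~ N(μ, 1/φ) with independent priors μ ~ N(0, 1/c)
and φ ~ Gamma(a, b) (this exact model is an
assumption, not spelled out above): φ integrates out
analytically, so the marginal posterior of μ can be
inspected on a grid without MCMC.

```python
import numpy as np

# Data from the slide
x = np.array([529.0, 530.0, 532.0, 533.1, 533.4, 533.6,
              533.7, 534.1, 534.8, 535.3])
n = len(x)
a = b = 0.001  # assumed Gamma(a, b) prior on the precision phi

def log_post_mu(m, c):
    # Unnormalised log marginal posterior of mu with phi integrated out:
    #   pi(mu | x) propto exp(-c mu^2 / 2) * (b + S(mu)/2)^-(a + n/2),
    # where S(mu) = sum_i (x_i - mu)^2.
    S = ((x[:, None] - m) ** 2).sum(axis=0)
    return -0.5 * c * m ** 2 - (a + n / 2) * np.log(b + S / 2)

grid = np.linspace(-500.0, 1000.0, 30001)
for c in (0.001, 0.0002, 0.00016, 0.0001):
    lp = log_post_mu(grid, c)
    w = np.exp(lp - lp.max())
    w /= w.sum()  # discrete normalisation over the grid
    print(f"c = {c}: posterior mass in the data region (mu > 300) "
          f"= {w[grid > 300].sum():.3f}")
```

Because the marginal posterior can put mass both near
the prior centre and near the data, a sampler can
appear perfectly converged while exploring a single
mode, which is the point of the 'movie'.
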
24
Normal regression
(φ is the precision)
  • Conjugate prior
  • Limit as the hyperparameters tend to 0 (see the
    note below)
  • Jeffreys' prior
  • Here gives exact matching in both
    posterior and predictive distributions
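
The prior formulas on this slide are missing from the
transcript. A plausible reading (an assumption): the
limit of the conjugate normal-gamma prior as its
precision hyperparameters tend to zero is the
standard default prior

\[ \pi(\beta, \phi) \propto \phi^{-1}, \]

under which the usual t quantities have the same
distribution a posteriori as in repeated sampling,
the exact matching referred to above.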

25
Normal regression
  • Data: n = 25, R = residual sum of squares = 2.1
  • 1.

2.
26
Normal regression
  • Prediction. Let Y be a future observation and
    let T denote the usual predictive pivotal
    quantity (reconstructed below). Then
  • 1.

2.
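
The pivot and the two results are missing from the
transcript. A standard reconstruction of 'the usual
predictive pivotal quantity' in the normal linear
model (the notation x₀, β̂, s², h is mine, not the
slide's):

\[ T = \frac{Y - x_0^{\top}\hat\beta}{s\sqrt{1 + h}}, \qquad h = x_0^{\top}(X^{\top}X)^{-1}x_0, \]

where x₀ is the covariate vector of Y, β̂ the
least-squares estimate and s² the usual residual
variance estimate. T has an exact Student-t
distribution both in repeated sampling and a
posteriori under the default prior of the previous
slide; compare the caption below.
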
Prediction less sensitive to prior than
estimation
27
Methods of prior construction
  • Limits of proper priors
  • Uniform priors/choice of scale
  • Data-translated likelihood
  • Constant asymptotic precision
  • Canonical parameterisation
  • Coverage Probability Bias
  • Decision-theoretic

28
Coverage probability bias
  • Sometimes investigated in papers via simulation
    (cf. the diagnostic testing example)
  • Parametric CPB
  • When do Bayesian credible intervals have the
    correct frequentist coverage?
  • In regular one-parameter problems, matching is
    asymptotically achieved by Jeffreys' prior (Welch
    and Peers, 1963)
  • In multiparameter families one cannot in general
    achieve matching for all marginals using the same
    prior
  • Usually contravenes the likelihood principle (see
    Sweeting, 2001, for a discussion)
  • Avoid infinite confidence sets! (e.g. ratios of
    parameters)
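
Welch-Peers matching is easy to illustrate by
simulation. A minimal sketch in a one-parameter
exponential model of my choosing (not one of the
talk's examples): under Jeffreys' prior π(θ) ∝ 1/θ
for the rate θ, the posterior is Gamma(n, Σxᵢ), and
the equal-tailed 95% credible interval can be checked
for frequentist coverage.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta_true, alpha, n_rep = 10, 2.0, 0.05, 20000

covered = 0
for _ in range(n_rep):
    x = rng.exponential(scale=1.0 / theta_true, size=n)
    # Posterior under Jeffreys' prior pi(theta) propto 1/theta:
    #   theta | x ~ Gamma(shape=n, rate=sum(x)).
    lo, hi = stats.gamma.ppf([alpha / 2, 1 - alpha / 2],
                             a=n, scale=1.0 / x.sum())
    covered += (lo <= theta_true <= hi)

print(f"Frequentist coverage of the 95% credible interval: "
      f"{covered / n_rep:.3f}")
```

Here matching is in fact exact rather than merely
asymptotic, because the model is a scale
(transformation) family.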

29
Coverage probability bias
  • Predictive CPB
  • When do Bayesian predictive intervals have the
    correct frequentist coverage?
  • In regular one-parameter problems, there exists a
    unique prior for which there's no asymptotic CPB
    ...
  • ... but in general this depends on the
    probability level α!
  • If there does exist a matching prior that is free
    from α then it is Jeffreys' prior (Datta,
    Mukerjee, Ghosh and Sweeting, 2000)
  • In the multiparameter case, if there exists a
    matching prior then it is usually not Jeffreys'
    prior

30
Relative entropy loss
  • The reference prior (Bernardo, 1979) maximises
    the Shannon mutual information between θ and X
  • Maximises the distance between the prior and
    posterior: minimal effect of the prior
  • Also arises as an asymptotically minimax solution
    under relative entropy loss (Clarke and Barron,
    1994, Barron, 1998)
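
The defining expression is missing above. Bernardo's
criterion in its standard form (a reconstruction)
maximises the mutual information

\[ I(\pi) = \int \pi(\theta) \int f(x \mid \theta)\, \log \frac{\pi(\theta \mid x)}{\pi(\theta)}\, dx\, d\theta \]

over the prior π, in practice through a limit over
increasingly informative experiments.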

31
Relative entropy loss
  • Define the prior-predictive regret (reconstructed
    below)
  • Minimax/reference prior solution for the full
    parameter is usually Jeffreys' prior
  • Bernardo argues that when nuisance parameters
    are present the reference prior should depend on
    which parameter(s) are considered to be of
    primary interest
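
A plausible reconstruction of the missing definition,
following Clarke and Barron (1994): the
prior-predictive regret of a prior π at θ is

\[ R_n(\theta, \pi) = E_\theta\!\left[\log \frac{f(X \mid \theta)}{m_\pi(X)}\right], \qquad m_\pi(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta, \]

and the asymptotically minimax prior under this
regret is usually Jeffreys' prior, as stated above.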

32
Relative entropy loss
  • A predictive relative entropy approach
  • Geisser (1979) advocated the predictive
    information criterion introduced by Aitchison
    (1975)
  • Standard argument for using log q(Y) as an
    operational/default utility function for q as a
    predictive density for a future observation Y
    (cf. Good, 1968)

33
Relative entropy loss
  • Define the expected regret of one prior against
    another (reconstructed below)
  • It is the expected regret under the loss
    function -log q(Y)
  • associated with using the predictive density
    under the prior π
  • when Y arises from the predictive density under
    the prior τ
  • Appropriate object to study for constructing
    objective prior distributions when we are
    interested in the predictive performance of π
    under repeated use or under alternative
    subjective priors τ
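
Assembling the bullet points, the missing definition
is presumably (a reconstruction; the label D is mine)

\[ D(\tau, \pi) = E_\tau\!\left[\log \frac{p_\tau(Y \mid X)}{p_\pi(Y \mid X)}\right], \]

where p_π(y | x) and p_τ(y | x) are the posterior
predictive densities under the priors π and τ, and
the expectation is taken with (X, Y) generated under
τ.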

34
Relative entropy loss
  • Now define the predictive relative entropy loss
    (PREL) (a possible form is sketched below)
  • where J is Jeffreys' prior
  • Studying the behaviour of the regret
    over τ in sets of constant 'predictive
    information' is equivalent to studying the
    behaviour of the PREL
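
The PREL formula itself is missing. One plausible
reading, consistent with 'where J is Jeffreys'
prior' (an assumption from context, not the talk's
stated definition):

\[ \mathrm{PREL}(\tau, \pi) = D(\tau, \pi) - D(\tau, J) \]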

35
Relative entropy loss
36
Relative entropy loss
  • Under suitable regularity conditions we get a
    common limiting form
  • Although the defined loss functions cover an
    infinite variety of possibilities for (a) the
    amount of data to be observed and (b) the
    predictions to be made, they are all
    approximately equivalent to this limit provided
    that a sufficient amount of data is to be
    observed.
  • Call it L, the (asymptotic) predictive loss

37
Relative entropy loss
  • More generally, define the supremum of the loss
    over τ
  • This represents the asymptotically
    worst-case loss
  • Investigate its behaviour
  • The prior is minimax if it attains the smallest
    worst-case loss (standard form below)
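
With the worst-case loss written as the supremum over
τ of L(τ, π) (notation mine), the missing minimax
condition is presumably the standard one:

\[ \pi_0 \text{ is minimax if } \sup_\tau L(\tau, \pi_0) = \inf_\pi \sup_\tau L(\tau, \pi) \]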

38
Relative entropy loss
  • Example 1.
  • Consider the class of improper priors
  • These all deliver constant risk, with
  • All the priors with c ≠ 0 are therefore
    inadmissible
  • Jeffreys' prior (c = 0) is minimax

39
Relative entropy loss
  • Example 2.
  • Consider the class of improper priors
  • These all deliver constant risk, with
  • L attains its minimum value when a = 1, which
    corresponds to
  • Jeffreys' independence prior
  • The minimum value is -½ < 0, so Jeffreys' prior
    is inadmissible

40
Relative entropy loss
  • Example 3.
  • Consider again the class of improper priors
  • These all deliver constant risk, with
  • L attains its minimum value when a = 1, which
    again corresponds to Jeffreys' independence prior
  • The drop in predictive loss increases as the
    square of the number q of regressors in the model

41
Relative entropy loss
  • The above predictive minimax priors also give
    rise to minimum predictive coverage probability
    bias (Datta, Mukerjee, Ghosh and Sweeting, 2000)
  • Final note: an inappropriately elicited
    subjective prior may lead to very high predictive
    risk!

42
Wrap-up
  • We have reviewed some common approaches to prior
    construction, from full elicitation to using
    default recipes
  • Need to be aware of dangers, whatever the
    approach
  • As model complexity increases it becomes more
    difficult to make sensible prior assignments. At
    the same time, the effect of the prior
    specification can become more pronounced
  • Important to have a sound methodology for the
    construction of priors in the multiparameter case
  • Data-dependent priors may be justifiable (e.g.
    Box-Cox transformation model)

43
Wrap-up
  • More extensive analysis of the predictive risk
    approach needed
  • Developing general methods of finding exact and
    approximate solutions for practical
    implementation
  • Investigating connections with predictive
    coverage probability bias
  • Analysing dependent and non-regular problems
  • Investigating problems involving mixed
    subjective/nonsubjective priors
  • Priors for model choice or model averaging ...
  • ... another talk!

44
Wrap-up
  • And finally

Have a great conference!