1
Priors
  • Trevor Sweeting
  • Department of Statistical Science
  • University College London

2
Structure of talk
  • Bayesian inference: the basics
  • Specification of the prior
  • Examples
  • Subjective priors
  • Nonsubjective priors
  • Examples
  • Methods of prior construction
  • Coverage probability bias
  • Relative entropy loss
  • Wrap-up

3
Bayesian inference: the basics
  • X: the experimental or observational data to be
    observed
  • Y: the future observations to be predicted
  • Data model
  • (Possibly improper) prior distribution
  • The posterior density of θ is then (in symbols
    below)

Posterior density ∝ Prior density × Likelihood
function
from which follow posterior probabilities, moments,
marginal densities, expected losses, predictive
densities ...
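
In symbols (a standard reconstruction; the formula
itself is missing from the transcript), with prior
density π(θ) and data model f(x | θ):

\[ \pi(\theta \mid x) \propto \pi(\theta)\, f(x \mid \theta) \]
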
4
Bayesian inference
  • The predictive density of Y given X is the
    posterior average of the data model
    (reconstructed below)
  • Where are we?...
  • Philosophical basis
  • Practical implementation
  • Prior construction ...
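
The missing predictive formula is presumably the
standard posterior average of the data model (an
assumption from context):

\[ p(y \mid x) = \int f(y \mid \theta, x)\, \pi(\theta \mid x)\, d\theta \]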

5
Specification of the prior
  • Approaches vary
  • from fully Bayesian analyses based on fully
    elicited subjective priors
  • to fully frequentist analyses based on
    nonsubjective (objective) priors

                 Fully Bayesian                       Fully Frequentist
  Subjective     Elicited prior
  Mixed          Performance        Penalty fn
  Nonsubjective  Default prior      Dual verification Performance
6
Examples
  • Four examples
  • All taken from Applied Statistics, 52 (2003)
  • Competing risks
  • Image analysis
  • Diagnostic testing
  • Geostatistical modelling

7
Competing risks (Basu and Sen)
  • System failure data: cause of failure not
    identified
  • n systems, R competing risks
  • Datum for each system is (T, S, C)
  • T is failure time, S are the possible causes of
    failure, C is a censoring indicator
  • Parameters in the model are of location-scale
    type
  • Use (i) informative conjugate priors
  • Source: historical data
  • or (ii) noninformative priors
  • such that they have a minimal effect on the
    analysis
  • Implementation via Gibbs sampling

8
Image analysis (Dryden, Scarr and Taylor)
  • Segmentation of weed and crop textures
  • Automatic identification of weeds in images of
    row crops
  • Parameters are (k, C, φ)
  • k is the number of texture components, C are
    texture labels, φ are parameters associated with
    the distribution of pixel intensities
  • Highly structured prior for (k, C, φ)
  • Markov random field for C, truncated conjugate
    priors for φ
  • Hyperparameters set in context, e.g. to
    encourage relatively few textures
  • Implementation via Markov chain Monte Carlo

9
Diagnostic testing (Georgiadis, Johnson, Gardner
and Singh)
  • Multiple-test screening data: models are
    unidentifiable
  • A Bayesian analysis therefore depends critically
    on prior information
  • Parameters consist of various (at least 8) joint
    sensitivity and specificity probabilities
  • Independent beta priors: two informative, the
    rest noninformative
  • Investigate coverage performance and
    sensitivities for various choices of prior
  • Implementation via Gibbs sampling

10
Geostatistical modelling (Kammann and Wand)
  • Geostatistical mapping to study geographical
    variability of reproductive health outcomes
    (disease mapping)
  • Geoadditive models
  • Universal kriging model involves a stationary
    zero-mean stochastic process over sites
  • leads to borrowing strength
  • Non-Bayesian analysis, but model could be
    formulated in a Bayesian way, with the mean
    responses at the given sites having a
    multivariate normal prior
  • Implementation: residual ML and splines

11
Table for examples
                 Fully Bayesian        Fully Frequentist
  Subjective     Image analysis
                 Competing risks
                 Diagnostic testing
  Nonsubjective                        Geostatistical modelling
12
Subjective priors
  • To some extent, all the previous examples
    included subjective prior specification
  • Methods of elicitation
  • Industrial and medical contexts
  • Scientific reporting
  • Range of prior specifications: conduct
    sensitivity analyses

13
Subjective priors
  • Prior elicitation methods should take account of
    psychological research
  • Construction of questions
  • Anchors
  • Probability assessment by frequency
  • Availability; the inverse expertise effect
  • Priors are often too narrow

Relevant literatures: Experimental Psychology,
Behavioural Decision Making, Management Science,
Cognitive Psychology
14
Nonsubjective priors
  • Nonsubjective (objective) priors: why?
  • Sensible default priors for non-experts (and
    experts!)
  • Recognise that the basis is often weak
  • Possible nasty surprises!
  • Reference priors for regulatory bodies
  • Clinical trials, industrial standards, official
    statistics
  • Safe default priors for high-dimensional problems
  • Priors are more difficult to specify and may
    have a more severe effect

15
Nonsubjective priors
  • Some general problems
  • Improper priors
  • Improper posteriors
  • E.g. Hierarchical models
  • Marginalisation and sampling theory paradoxes
  • Dutch books
  • Inconsistency
  • Posterior doesn't concentrate around the true
    value asymptotically
  • Inadmissibility
  • of Bayes decision rules/estimators

16
Nonsubjective priors
  • Proper diffuse priors
  • Near-impropriety of posterior
  • Unintended large impact on posterior
  • Example to follow ...
  • Arbitrary choice of hyperparameters
  • Non-objectivity
  • Lack of invariance
  • Egg on face ...

Two examples ...
17
WinBUGS - the Movie!
(φ is the precision)
  • Data: 529.0, 530.0, 532.0, 533.1, 533.4, 533.6,
    533.7, 534.1, 534.8, 535.3
  • Prior parameters: a = b = c = 0.001
  • Relatively diffuse prior
  • Results ...

18
WinBUGS - the Movie!
Just another few iterations to make sure ...
19
WinBUGS - the Movie!
Oops!
20
WinBUGS - the Movie!
  • Effect of choice of c (the prior precision of μ)
  • c = 0.001: WinBUGS eventually gets the right
    answer
  • but presumably not the answer we wanted!
  • The noninformative prior dominates the
    likelihood.

21
WinBUGS - the Movie!
  • c = 0.0002: WinBUGS gives the right answer with
    the likelihood dominating
  • However, it's the wrong answer, as the true
    marginal posterior of μ is still dominated by the
    prior

22
WinBUGS - the Movie!
  • c = 0.00016: WinBUGS again gives the right
    answer with the likelihood dominating
  • But it's still the wrong answer
  • The true marginal posterior distribution of μ
    is bimodal

23
WinBUGS - the Movie!
  • c = 0.00010: WinBUGS gives the right answer
  • ... and presumably the one we wanted!

Care needed in the choice of prior
parameters in diffuse but proper priors
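
The transcript omits the model and the WinBUGS
output. A minimal numerical sketch of the
phenomenon, assuming the usual specification
X_i ~ N(μ, 1/φ) with independent priors μ ~ N(0, 1/c)
and φ ~ Gamma(a, b) (this exact model is an
assumption, not spelled out above): φ integrates out
analytically, so the marginal posterior of μ can be
inspected on a grid without MCMC.

```python
import numpy as np

# Data from the slide
x = np.array([529.0, 530.0, 532.0, 533.1, 533.4, 533.6,
              533.7, 534.1, 534.8, 535.3])
n = len(x)
a = b = 0.001  # assumed Gamma(a, b) prior on the precision phi

def log_post_mu(m, c):
    # Unnormalised log marginal posterior of mu with phi integrated out:
    #   pi(mu | x) propto exp(-c mu^2 / 2) * (b + S(mu)/2)^-(a + n/2),
    # where S(mu) = sum_i (x_i - mu)^2.
    S = ((x[:, None] - m) ** 2).sum(axis=0)
    return -0.5 * c * m ** 2 - (a + n / 2) * np.log(b + S / 2)

grid = np.linspace(-500.0, 1000.0, 30001)
for c in (0.001, 0.0002, 0.00016, 0.0001):
    lp = log_post_mu(grid, c)
    w = np.exp(lp - lp.max())
    w /= w.sum()  # discrete normalisation over the grid
    print(f"c = {c}: posterior mass in the data region (mu > 300) "
          f"= {w[grid > 300].sum():.3f}")
```

Because the marginal posterior can put mass both near
the prior centre and near the data, a sampler can
appear perfectly converged while exploring a single
mode, which is the point of the 'movie'.
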
24
Normal regression
(φ is the precision)
  • Conjugate prior
  • Limit as the hyperparameters tend to 0 (see the
    note below)
  • Jeffreys' prior
  • Here gives exact matching in both
    posterior and predictive distributions
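
The prior formulas on this slide are missing from the
transcript. A plausible reading (an assumption): the
limit of the conjugate normal-gamma prior as its
precision hyperparameters tend to zero is the
standard default prior

\[ \pi(\beta, \phi) \propto \phi^{-1}, \]

under which the usual t quantities have the same
distribution a posteriori as in repeated sampling,
the exact matching referred to above.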

25
Normal regression
  • Data: n = 25, R = residual sum of squares = 2.1
  • 1.

2.
26
Normal regression
  • Prediction. Let Y be a future observation and
    let T denote the usual predictive pivotal
    quantity (reconstructed below). Then
  • 1.

2.
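
The pivot and the two results are missing from the
transcript. A standard reconstruction of 'the usual
predictive pivotal quantity' in the normal linear
model (the notation x₀, β̂, s², h is mine, not the
slide's):

\[ T = \frac{Y - x_0^{\top}\hat\beta}{s\sqrt{1 + h}}, \qquad h = x_0^{\top}(X^{\top}X)^{-1}x_0, \]

where x₀ is the covariate vector of Y, β̂ the
least-squares estimate and s² the usual residual
variance estimate. T has an exact Student-t
distribution both in repeated sampling and a
posteriori under the default prior of the previous
slide; compare the caption below.
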
Prediction less sensitive to prior than
estimation
27
Methods of prior construction
  • Limits of proper priors
  • Uniform priors/choice of scale
  • Data-translated likelihood
  • Constant asymptotic precision
  • Canonical parameterisation
  • Coverage Probability Bias
  • Decision-theoretic

28
Coverage probability bias
  • Sometimes investigated in papers via simulation
    (cf. the diagnostic testing example)
  • Parametric CPB
  • When do Bayesian credible intervals have the
    correct frequentist coverage?
  • In regular one-parameter problems, matching is
    asymptotically achieved by Jeffreys' prior (Welch
    and Peers, 1963)
  • In multiparameter families one cannot in general
    achieve matching for all marginals using the same
    prior
  • Usually contravenes the likelihood principle (see
    Sweeting, 2001, for a discussion)
  • Avoid infinite confidence sets! (e.g. ratios of
    parameters)
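
Welch-Peers matching is easy to illustrate by
simulation. A minimal sketch in a one-parameter
exponential model of my choosing (not one of the
talk's examples): under Jeffreys' prior π(θ) ∝ 1/θ
for the rate θ, the posterior is Gamma(n, Σxᵢ), and
the equal-tailed 95% credible interval can be checked
for frequentist coverage.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta_true, alpha, n_rep = 10, 2.0, 0.05, 20000

covered = 0
for _ in range(n_rep):
    x = rng.exponential(scale=1.0 / theta_true, size=n)
    # Posterior under Jeffreys' prior pi(theta) propto 1/theta:
    #   theta | x ~ Gamma(shape=n, rate=sum(x)).
    lo, hi = stats.gamma.ppf([alpha / 2, 1 - alpha / 2],
                             a=n, scale=1.0 / x.sum())
    covered += (lo <= theta_true <= hi)

print(f"Frequentist coverage of the 95% credible interval: "
      f"{covered / n_rep:.3f}")
```

Here matching is in fact exact rather than merely
asymptotic, because the model is a scale
(transformation) family.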

29
Coverage probability bias
  • Predictive CPB
  • When do Bayesian predictive intervals have the
    correct frequentist coverage?
  • In regular one-parameter problems, there exists a
    unique prior for which there's no asymptotic CPB
    ...
  • ... but in general this depends on the
    probability level α!
  • If there does exist a matching prior that is free
    from α then it is Jeffreys' prior (Datta,
    Mukerjee, Ghosh and Sweeting, 2000)
  • In the multiparameter case, if there exists a
    matching prior then it is usually not Jeffreys'
    prior

30
Relative entropy loss
  • The reference prior (Bernardo, 1979) maximises
    the Shannon mutual information between θ and X
  • Maximises the distance between the prior and
    posterior: minimal effect of the prior
  • Also arises as an asymptotically minimax solution
    under relative entropy loss (Clarke and Barron,
    1994, Barron, 1998)
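
The defining expression is missing above. Bernardo's
criterion in its standard form (a reconstruction)
maximises the mutual information

\[ I(\pi) = \int \pi(\theta) \int f(x \mid \theta)\, \log \frac{\pi(\theta \mid x)}{\pi(\theta)}\, dx\, d\theta \]

over the prior π, in practice through a limit over
increasingly informative experiments.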

31
Relative entropy loss
  • Define the prior-predictive regret (reconstructed
    below)
  • Minimax/reference prior solution for the full
    parameter is usually Jeffreys' prior
  • Bernardo argues that when nuisance parameters
    are present the reference prior should depend on
    which parameter(s) are considered to be of
    primary interest
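
A plausible reconstruction of the missing definition,
following Clarke and Barron (1994): the
prior-predictive regret of a prior π at θ is

\[ R_n(\theta, \pi) = E_\theta\!\left[\log \frac{f(X \mid \theta)}{m_\pi(X)}\right], \qquad m_\pi(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta, \]

and the asymptotically minimax prior under this
regret is usually Jeffreys' prior, as stated above.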

32
Relative entropy loss
  • A predictive relative entropy approach
  • Geisser (1979) advocated the predictive
    information criterion introduced by Aitchison
    (1975)
  • Standard argument for using log q(Y) as an
    operational/default utility function for q as a
    predictive density for a future observation Y
    (cf. Good, 1968)

33
Relative entropy loss
  • Define the expected regret of one prior against
    another (reconstructed below)
  • It is the expected regret under the loss
    function -log q(Y)
  • associated with using the predictive density
    under the prior π
  • when Y arises from the predictive density under
    the prior τ
  • Appropriate object to study for constructing
    objective prior distributions when we are
    interested in the predictive performance of π
    under repeated use or under alternative
    subjective priors τ
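
Assembling the bullet points, the missing definition
is presumably (a reconstruction; the label D is mine)

\[ D(\tau, \pi) = E_\tau\!\left[\log \frac{p_\tau(Y \mid X)}{p_\pi(Y \mid X)}\right], \]

where p_π(y | x) and p_τ(y | x) are the posterior
predictive densities under the priors π and τ, and
the expectation is taken with (X, Y) generated under
τ.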

34
Relative entropy loss
  • Now define the predictive relative entropy loss
    (PREL) (a possible form is sketched below)
  • where J is Jeffreys' prior
  • Studying the behaviour of the regret
    over τ in sets of constant 'predictive
    information' is equivalent to studying the
    behaviour of the PREL
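
The PREL formula itself is missing. One plausible
reading, consistent with 'where J is Jeffreys'
prior' (an assumption from context, not the talk's
stated definition):

\[ \mathrm{PREL}(\tau, \pi) = D(\tau, \pi) - D(\tau, J) \]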

35
Relative entropy loss
36
Relative entropy loss
  • Under suitable regularity conditions we get a
    common limiting form
  • Although the defined loss functions cover an
    infinite variety of possibilities for (a) the
    amount of data to be observed and (b) the
    predictions to be made, they are all
    approximately equivalent to this limit provided
    that a sufficient amount of data is to be
    observed.
  • Call it L, the (asymptotic) predictive loss

37
Relative entropy loss
  • More generally, define the supremum of the loss
    over τ
  • This represents the asymptotically
    worst-case loss
  • Investigate its behaviour
  • The prior is minimax if it attains the smallest
    worst-case loss (standard form below)
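
With the worst-case loss written as the supremum over
τ of L(τ, π) (notation mine), the missing minimax
condition is presumably the standard one:

\[ \pi_0 \text{ is minimax if } \sup_\tau L(\tau, \pi_0) = \inf_\pi \sup_\tau L(\tau, \pi) \]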

38
Relative entropy loss
  • Example 1.
  • Consider the class of improper priors
  • These all deliver constant risk, with
  • All the priors with c ≠ 0 are therefore
    inadmissible
  • Jeffreys' prior (c = 0) is minimax

39
Relative entropy loss
  • Example 2.
  • Consider the class of improper priors
  • These all deliver constant risk, with
  • L attains its minimum value when a = 1, which
    corresponds to
  • Jeffreys' independence prior
  • The minimum value is -½ < 0, so Jeffreys' prior
    is inadmissible

40
Relative entropy loss
  • Example 3.
  • Consider again the class of improper priors
  • These all deliver constant risk, with
  • L attains its minimum value when a = 1, which
    again corresponds to Jeffreys' independence prior
  • The drop in predictive loss increases as the
    square of the number q of regressors in the model

41
Relative entropy loss
  • The above predictive minimax priors also give
    rise to minimum predictive coverage probability
    bias (Datta, Mukerjee, Ghosh and Sweeting, 2000)
  • Final note: an inappropriately elicited
    subjective prior may lead to very high predictive
    risk!

42
Wrap-up
  • We have reviewed some common approaches to prior
    construction, from full elicitation to using
    default recipes
  • Need to be aware of dangers, whatever the
    approach
  • As model complexity increases it becomes more
    difficult to make sensible prior assignments. At
    the same time, the effect of the prior
    specification can become more pronounced
  • Important to have a sound methodology for the
    construction of priors in the multiparameter case
  • Data-dependent priors may be justifiable (e.g.
    Box-Cox transformation model)

43
Wrap-up
  • More extensive analysis of the predictive risk
    approach needed
  • Developing general methods of finding exact and
    approximate solutions for practical
    implementation
  • Investigating connections with predictive
    coverage probability bias
  • Analysing dependent and non-regular problems
  • Investigating problems involving mixed
    subjective/nonsubjective priors
  • Priors for model choice or model averaging ...
  • ... another talk!

44
Wrap-up
  • And finally

Have a great conference!