1
Developments in Bayesian Priors
  • Roger Barlow
  • Manchester IoP meeting
  • November 16th 2005

2
Plan
  • Probability
  • Frequentist
  • Bayesian
  • Bayes Theorem
  • Priors
  • Prior pitfalls (1): Le Diberder
  • Prior pitfalls (2): Heinrich
  • Jeffreys Prior
  • Fisher Information
  • Reference Priors (Demortier)

3
Probability
  • Probability as limit of frequency
  • P(A) = limit of N_A/N_total
  • Usual definition taught to students
  • Makes sense
  • Works well most of the time…
  • But not all

4
Frequentist probability
  • "It will probably rain tomorrow."
  • "Mt = 174.3 ± 5.1 GeV means the top quark mass lies
    between 169.2 and 179.4, with 68% probability."
  • A frequentist cannot say these: probability attaches to
    repeatable events, not to a one-off statement or a fixed
    true value. The frequentist versions are:
  • "The statement 'It will rain tomorrow.' is probably true."
  • "Mt = 174.3 ± 5.1 GeV means the top quark mass lies
    between 169.2 and 179.4, at 68% confidence."

5
Bayesian Probability
  • P(A) expresses my belief that A is true
  • Limits: 0 (impossible) and 1 (certain)
  • Calibrated off clear-cut instances (coins, dice,
    urns)

6
Frequentist versus Bayesian?
  • Two sorts of probability, totally different.
    (Bayesian probability is also known as Inverse
    Probability.)
  • Rivals? Religious differences?
  • Particle physicists tend to be frequentists;
    cosmologists tend to be Bayesians.
  • No: two different tools for practitioners.
  • Important to
  • be aware of the limits and pitfalls of both, and
  • always be aware which you're using

7
Bayes Theorem (1763)
  • P(A|B) P(B) = P(A and B) = P(B|A) P(A)
  • P(A|B) = P(B|A) P(A) / P(B)
  • Frequentist use, e.g. Cerenkov counter (numerical sketch below):
  • P(π|signal) = P(signal|π) P(π) / P(signal)
  • Bayesian use:
  • P(theory|data) = P(data|theory) P(theory) / P(data)
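A minimal numerical sketch of the frequentist use above. The particle-ID numbers are invented for illustration and do not come from the talk:

```python
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).

def posterior(likelihood, prior, evidence):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical Cerenkov counter: 90% of pions fire the counter,
# pions make up 80% of the beam, and kaons fire it 5% of the time.
p_sig_given_pi = 0.90
p_pi = 0.80
p_sig = p_sig_given_pi * p_pi + 0.05 * (1 - p_pi)  # total probability

print(posterior(p_sig_given_pi, p_pi, p_sig))  # P(pi|signal) ~ 0.986
```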

8
Bayesian Prior
  • P(theory) is the Prior
  • Expresses prior belief that the theory is true
  • Can be a function of a parameter:
  • P(Mtop), P(MH), P(α, β, γ)
  • Bayes Theorem describes the way prior belief is
    modified by experimental data
  • But what do you take as the initial prior?

9
Uniform Prior
  • General usage: choose P(a) uniform in a
  • (principle of insufficient reason)
  • Often improper: ∫P(a)da = ∞. Though the posterior
    P(a|x) comes out sensible
  • BUT!
  • If P(a) is uniform, P(a²), P(ln a), P(√a)… are
    not
  • Insufficient reason is not valid (unless a is "most
    fundamental", whatever that means)
  • Statisticians handle this by checking results for
    robustness under different priors (see the sketch below)
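A quick Monte Carlo illustration of the non-invariance point, assuming nothing beyond a uniform prior on (0, 1):

```python
import numpy as np

# If a is uniform on (0, 1), a**2 is NOT uniform: by the change of
# variables formula its density is 1/(2*sqrt(u)), piling up near 0.
rng = np.random.default_rng(1)
a = rng.uniform(0.0, 1.0, 1_000_000)

counts, edges = np.histogram(a**2, bins=10, range=(0.0, 1.0), density=True)
for lo, d in zip(edges[:-1], counts):
    print(f"[{lo:.1f}, {lo + 0.1:.1f}): density {d:.2f}")
# First bin density ~ 3.2, last ~ 0.5: "insufficient reason" in a
# is real prior information about a**2.
```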

10
Example: Le Diberder
  • Sad Story
  • Fitting the CKM angle α from B→ππ
  • 6 observables
  • 3 amplitudes: 6 unknown parameters (magnitudes,
    phases)
  • α is the fundamentally interesting one

11
Results
(Plots.)
Frequentist.
Bayesian: set one phase to zero; uniform priors in the
other two phases and the 3 magnitudes.
12
More Results
(Plots.)
Bayesian: parametrise Tree and Penguin amplitudes.
Bayesian: 3 amplitudes as 3 real parts and 3
imaginary parts.
13
Interpretation
  • B→ρπ shows the same (mis)behaviour
  • Removing all experimental info gives a similar P(α)
  • The curse of high dimensions is at work

Uniformity in x, y, z makes P(r) peak at large r
(Monte Carlo illustration below). This result is not
robust under changes of prior.
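A hedged Monte Carlo sketch of the geometric effect; this is a toy of my own, not the B physics fit itself:

```python
import numpy as np

# Uniform priors on x, y, z induce a P(r) that peaks at large r:
# the volume element goes like r**2 dr. Simple MC check:
rng = np.random.default_rng(2)
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
r = np.linalg.norm(xyz, axis=1)

counts, edges = np.histogram(r[r < 1.0], bins=5, range=(0.0, 1.0), density=True)
for lo, d in zip(edges[:-1], counts):
    print(f"r in [{lo:.1f}, {lo + 0.2:.1f}): density {d:.2f}")
# Density grows like r**2 (first bin ~0.04, last ~2.4): "flat in the
# amplitudes" is far from flat in the magnitude.
```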
14
Example: Heinrich
  • CDF statistics group looking at the problem of
    estimating a signal cross section S in the presence of
    background and efficiency:
  • N = εS + b
  • Efficiency and background come from separate
    calibration experiments (sidebands or MC), whose
    scaling factors relative to the main experiment are known.
  • Everything done using Bayesian methods with
    uniform priors and the Poisson statistics formula.
    Calibration experiments use a uniform prior for ε
    and for b, yielding the posteriors used for S:
  • P(N|S) = (1/N!) ∫∫ e^−(εS+b) (εS+b)^N P(ε) P(b) dε db
  • Check coverage: all fine (simplified sketch below)
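A simplified coverage check in the spirit of the above, assuming a single channel with ε and b known exactly (the real study integrates over the calibration posteriors as in the formula); all numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
eps, b, s_true = 0.25, 0.75, 10.0
grid = np.linspace(0.0, 100.0, 4001)  # grid of S values

def upper_limit(n, cl=0.90):
    # Posterior for a uniform prior in S: proportional to Poisson(n; eps*S + b)
    log_post = n * np.log(eps * grid + b) - (eps * grid + b)
    post = np.exp(log_post - log_post.max())
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, cl)]  # smallest S with posterior CDF >= cl

# Coverage: fraction of pseudo-experiments whose limit lies above the true S
n_toys = 2000
covered = sum(upper_limit(rng.poisson(eps * s_true + b)) >= s_true
              for _ in range(n_toys))
print(f"coverage of 90% upper limit: {covered / n_toys:.3f}")
```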

15
But it all goes pear-shaped…
  • If the particle decays in several channels
  • H→γγ, H→τ+τ−, H→bb̄
  • Each channel with a different b and ε: in total 2N+1
    parameters, 2N+1 experiments
  • Heavy undercoverage!
  • e.g. with 4 channels,
  • all ε = 25 ± 10 %, b = 0.75 ± 0.25
  • for S = 10 the 90% upper limit lies
    above S in only 80% of cases

(Plot: coverage of the 90% upper limit versus S.)
16
The curse strikes again
  • Uniform prior in ε: fine
  • Uniform priors in ε1, ε2 … εN give
  • an ε^(N−1) prior in the total ε
  • Prejudice in favour of high efficiency
  • Signal size downgraded (see the MC sketch below)
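An illustrative Monte Carlo of the induced prior; a toy of my own, with each channel's efficiency taken Uniform(0, 1):

```python
import numpy as np

# Give each of N channels a Uniform(0, 1) prior on its efficiency and
# look at the implied prior on the total. Near zero the density falls
# like eps_tot**(N-1), so low total efficiency -- and hence a large
# signal -- is strongly disfavoured.
rng = np.random.default_rng(4)
for n_chan in (1, 2, 4):
    eps_tot = rng.uniform(0.0, 1.0, size=(1_000_000, n_chan)).sum(axis=1)
    frac_low = np.mean(eps_tot < 0.1 * n_chan)  # lowest tenth of the range
    print(f"N={n_chan}: P(total eps in lowest tenth) = {frac_low:.4f}")
# N=1 gives 0.1 as expected; N=4 gives ~0.001: the flat priors have
# quietly become a strong prejudice in favour of high efficiency.
```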

17
Happy ending
  • The effect is avoided by using Jeffreys priors instead
    of uniform priors for ε and b
  • Not uniform, but like 1/ε, 1/b
  • Not entirely realistic, but interesting
  • A uniform prior in S is not a problem, but maybe one
    should consider 1/√S?
  • Coverage (a very frequentist concept) is a useful
    tool for Bayesians

18
Fisher Information
An informative experiment is one for which a
measurement of x will give precise information
about the parameter a. Quantify: I(a) = −⟨∂² ln L/∂a²⟩
(second derivative: curvature).
P(x,a) describes everything: at fixed a, as a function of x,
it is the pdf; at fixed x, as a function of a, it is the
likelihood L(a).
19
Jeffreys Prior
A prior may be uniform in a, but if I(a) depends
on a it's still not "flat": special values of a
give better measurements.
  • Transform a → a′ such that I(a′) is constant.
    Then choose a prior uniform in a′:
  • location parameter: uniform prior OK
  • scale parameter: a′ = ln a, i.e. prior ∝ 1/a
  • Poisson mean: prior ∝ 1/√a (numerical check below)
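A short numerical check of the Poisson case, a sketch assuming a Poisson measurement of mean a:

```python
import numpy as np

# Estimate the Fisher information I(a) = -<d2 ln L / da2> for a Poisson
# mean by Monte Carlo, and compare with the exact 1/a, so that the
# Jeffreys prior sqrt(I(a)) is proportional to 1/sqrt(a).
rng = np.random.default_rng(5)

def fisher_info(a, n_samples=200_000, h=1e-3):
    n = rng.poisson(a, n_samples)
    # ln L(a) = n ln a - a  (the ln n! term does not depend on a)
    logl = lambda mu: n * np.log(mu) - mu
    d2 = (logl(a + h) - 2 * logl(a) + logl(a - h)) / h**2  # finite difference
    return -d2.mean()

for a in (1.0, 4.0, 9.0):
    print(f"a={a}: I(a) = {fisher_info(a):.4f}, 1/a = {1 / a:.4f}")
```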

20
Objective Prior?
  • Jeffreys called this an "objective" prior, as
    opposed to subjective belief or straight guesswork,
    but not everyone was convinced
  • For statisticians "flat prior" means the Jeffreys
    prior; for physicists it means a uniform prior
  • The prior depends on the likelihood. Your prior belief
    P(MH) (or whatever) depends on the analysis!
  • Equivalent to a prior proportional to √I

21
Reference Priors (Demortier)
  • 4 steps
  • 1) Intrinsic Discrepancy
  • Between two PDFs:
  • d(P1(z), P2(z)) = Min { ∫P1(z) ln(P1(z)/P2(z)) dz,
    ∫P2(z) ln(P2(z)/P1(z)) dz }
  • A sensible measure of difference:
  • d = 0 iff P1(z) and P2(z) are the same, else +ve
  • Invariant under all transformations of z
    (evaluated numerically below)
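A grid-integration sketch of the intrinsic discrepancy; the choice of two unit Gaussians is mine:

```python
import numpy as np

# Intrinsic discrepancy: the smaller of the two Kullback-Leibler
# divergences between two pdfs, here evaluated on a grid.
z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]

def gauss(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def intrinsic_discrepancy(p1, p2):
    kl12 = np.sum(p1 * np.log(p1 / p2)) * dz
    kl21 = np.sum(p2 * np.log(p2 / p1)) * dz
    return min(kl12, kl21)

p1, p2 = gauss(z, 0.0, 1.0), gauss(z, 1.0, 1.0)
print(intrinsic_discrepancy(p1, p2))  # 0.5 for unit Gaussians 1 sigma apart
# d = 0 only when the two pdfs coincide, and taking the min makes it symmetric.
```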

22
Reference Priors (2)
  • 2) Expected Intrinsic Information
  • Measurement M: x is sampled from p(x|a)
  • Parameter a has a prior p(a)
  • Joint distribution: p(x,a) = p(x|a) p(a)
  • Marginal distribution: p(x) = ∫p(x|a) p(a) da
  • I(p(a), M) = d( p(x,a), p(x) p(a) )
  • Depends on (i) the x–a relationship and (ii) the breadth
    of p(a)
  • This is the Expected Intrinsic (Shannon) Information from
    measurement M about parameter a (toy computation below)
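A toy computation of this quantity. Demortier's d is the min of the two KL directions; for the joint versus the product, the KL(joint || product) direction is exactly the Shannon mutual information, which is what this sketch computes. All model choices (Poisson likelihood, four-point uniform prior) are mine:

```python
import numpy as np
from scipy.stats import poisson

a_grid = np.array([1.0, 3.0, 5.0, 7.0])
p_a = np.full(len(a_grid), 0.25)             # prior p(a)
x = np.arange(0, 40)                         # x range covering the bulk

p_x_given_a = poisson.pmf(x[:, None], a_grid[None, :])
p_xa = p_x_given_a * p_a                     # joint p(x,a)
p_x = p_xa.sum(axis=1, keepdims=True)        # marginal p(x)

mask = p_xa > 0
info = np.sum(p_xa[mask] * np.log(p_xa[mask] / (p_x * p_a)[mask]))
print(f"expected information: {info:.3f} nats")
# A broader prior or a more discriminating p(x|a) both increase this,
# matching points (i) and (ii) above.
```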

23
Reference Priors (3)
  • 3) Missing Information
  • Measurement Mk: k samples of x
  • Enough measurements fix a completely
  • The limit k→∞ of I(p(a), Mk) is the difference between
    the knowledge encapsulated in the prior p(a) and complete
    knowledge of a. Hence it is the Missing Information given
    p(a).

24
Reference Priors(4)
  • 4) Family of priors P (e.g. Fourier series,
    polynomials, histograms), p(a) ∈ P
  • Ignorance principle: choose the least informative
    (dumbest) prior in the family, the one for which
    the missing information lim k→∞ I(p(a), Mk) is
    largest.
  • There are technical difficulties in taking the k→∞ limit
    and in integrating over an infinite range of a

25
Family of Priors (Google)
26
Reference Priors
  • Reference priors do not represent subjective belief; in
    fact the opposite (like jury selection). They allow the
    most input to come from the data: a formal consensus
    practitioners can use to arrive at a sensible
    posterior
  • They depend on the measurement p(x|a), cf. Jeffreys
  • They also require the family P of possible priors
  • May be improper, but this doesn't matter (they do not
    represent belief)
  • For 1 parameter (if the measurement is
    asymptotically Gaussian, which the CLT usually
    secures) they give the Jeffreys prior
  • But they can also (unlike Jeffreys) work for several
    parameters

27
Summary
  • Probability
  • Frequentist
  • Bayesian
  • Bayes Theorem
  • Priors
  • Prior pitfalls (1): Le Diberder
  • Prior pitfalls (2): Heinrich
  • Jeffreys Prior
  • Fisher Information
  • Reference Priors (Demortier)