1
Developments in Bayesian Priors
  • Roger Barlow
  • Manchester IoP meeting
  • November 16th 2005

2
Plan
  • Probability
  • Frequentist
  • Bayesian
  • Bayes Theorem
  • Priors
  • Prior pitfalls (1): Le Diberder
  • Prior pitfalls (2): Heinrich
  • Jeffreys Prior
  • Fisher Information
  • Reference Priors (Demortier)

3
Probability
  • Probability as limit of frequency
  • P(A) = limit of N_A/N_total
  • Usual definition taught to students
  • Makes sense
  • Works well most of the time…
  • But not all

4
Frequentist probability
  • "It will probably rain tomorrow."
  • "Mt = 174.3 ± 5.1 GeV means the top quark mass lies
    between 169.2 and 179.4, with 68% probability."
  • A frequentist cannot say these: probability attaches to
    repeatable events, not to a one-off statement or a fixed
    true value. The frequentist versions are:
  • "The statement 'It will rain tomorrow.' is probably true."
  • "Mt = 174.3 ± 5.1 GeV means the top quark mass lies
    between 169.2 and 179.4, at 68% confidence."

5
Bayesian Probability
  • P(A) expresses my belief that A is true
  • Limits: 0 (impossible) and 1 (certain)
  • Calibrated off clear-cut instances (coins, dice,
    urns)

6
Frequentist versus Bayesian?
  • Two sorts of probability, totally different.
    (Bayesian probability is also known as Inverse
    Probability.)
  • Rivals? Religious differences?
  • Particle physicists tend to be frequentists;
    cosmologists tend to be Bayesians.
  • No: two different tools for practitioners.
  • Important to
  • be aware of the limits and pitfalls of both, and
  • always be aware which you're using

7
Bayes Theorem (1763)
  • P(A|B) P(B) = P(A and B) = P(B|A) P(A)
  • P(A|B) = P(B|A) P(A) / P(B)
  • Frequentist use, e.g. Cerenkov counter (numerical sketch below):
  • P(π|signal) = P(signal|π) P(π) / P(signal)
  • Bayesian use:
  • P(theory|data) = P(data|theory) P(theory) / P(data)
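A minimal numerical sketch of the frequentist use above. The particle-ID numbers are invented for illustration and do not come from the talk:

```python
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).

def posterior(likelihood, prior, evidence):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical Cerenkov counter: 90% of pions fire the counter,
# pions make up 80% of the beam, and kaons fire it 5% of the time.
p_sig_given_pi = 0.90
p_pi = 0.80
p_sig = p_sig_given_pi * p_pi + 0.05 * (1 - p_pi)  # total probability

print(posterior(p_sig_given_pi, p_pi, p_sig))  # P(pi|signal) ~ 0.986
```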

8
Bayesian Prior
  • P(theory) is the Prior
  • Expresses prior belief that the theory is true
  • Can be a function of a parameter:
  • P(Mtop), P(MH), P(α, β, γ)
  • Bayes Theorem describes the way prior belief is
    modified by experimental data
  • But what do you take as the initial prior?

9
Uniform Prior
  • General usage: choose P(a) uniform in a
  • (principle of insufficient reason)
  • Often improper: ∫P(a)da = ∞. Though the posterior
    P(a|x) comes out sensible
  • BUT!
  • If P(a) is uniform, P(a²), P(ln a), P(√a)… are
    not
  • Insufficient reason is not valid (unless a is "most
    fundamental", whatever that means)
  • Statisticians handle this by checking results for
    robustness under different priors (see the sketch below)
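A quick Monte Carlo illustration of the non-invariance point, assuming nothing beyond a uniform prior on (0, 1):

```python
import numpy as np

# If a is uniform on (0, 1), a**2 is NOT uniform: by the change of
# variables formula its density is 1/(2*sqrt(u)), piling up near 0.
rng = np.random.default_rng(1)
a = rng.uniform(0.0, 1.0, 1_000_000)

counts, edges = np.histogram(a**2, bins=10, range=(0.0, 1.0), density=True)
for lo, d in zip(edges[:-1], counts):
    print(f"[{lo:.1f}, {lo + 0.1:.1f}): density {d:.2f}")
# First bin density ~ 3.2, last ~ 0.5: "insufficient reason" in a
# is real prior information about a**2.
```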

10
Example: Le Diberder
  • Sad Story
  • Fitting the CKM angle α from B→ππ
  • 6 observables
  • 3 amplitudes: 6 unknown parameters (magnitudes,
    phases)
  • α is the fundamentally interesting one

11
Results
(Plots.)
Frequentist.
Bayesian: set one phase to zero; uniform priors in the
other two phases and the 3 magnitudes.
12
More Results
(Plots.)
Bayesian: parametrise Tree and Penguin amplitudes.
Bayesian: 3 amplitudes as 3 real parts and 3
imaginary parts.
13
Interpretation
  • B→ρπ shows the same (mis)behaviour
  • Removing all experimental info gives a similar P(α)
  • The curse of high dimensions is at work

Uniformity in x, y, z makes P(r) peak at large r
(Monte Carlo illustration below). This result is not
robust under changes of prior.
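A hedged Monte Carlo sketch of the geometric effect; this is a toy of my own, not the B physics fit itself:

```python
import numpy as np

# Uniform priors on x, y, z induce a P(r) that peaks at large r:
# the volume element goes like r**2 dr. Simple MC check:
rng = np.random.default_rng(2)
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
r = np.linalg.norm(xyz, axis=1)

counts, edges = np.histogram(r[r < 1.0], bins=5, range=(0.0, 1.0), density=True)
for lo, d in zip(edges[:-1], counts):
    print(f"r in [{lo:.1f}, {lo + 0.2:.1f}): density {d:.2f}")
# Density grows like r**2 (first bin ~0.04, last ~2.4): "flat in the
# amplitudes" is far from flat in the magnitude.
```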
14
Example: Heinrich
  • CDF statistics group looking at the problem of
    estimating a signal cross section S in the presence of
    background and efficiency:
  • N = εS + b
  • Efficiency and background come from separate
    calibration experiments (sidebands or MC), whose
    scaling factors relative to the main experiment are known.
  • Everything done using Bayesian methods with
    uniform priors and the Poisson statistics formula.
    Calibration experiments use a uniform prior for ε
    and for b, yielding the posteriors used for S:
  • P(N|S) = (1/N!) ∫∫ e^−(εS+b) (εS+b)^N P(ε) P(b) dε db
  • Check coverage: all fine (simplified sketch below)
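A simplified coverage check in the spirit of the above, assuming a single channel with ε and b known exactly (the real study integrates over the calibration posteriors as in the formula); all numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
eps, b, s_true = 0.25, 0.75, 10.0
grid = np.linspace(0.0, 100.0, 4001)  # grid of S values

def upper_limit(n, cl=0.90):
    # Posterior for a uniform prior in S: proportional to Poisson(n; eps*S + b)
    log_post = n * np.log(eps * grid + b) - (eps * grid + b)
    post = np.exp(log_post - log_post.max())
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, cl)]  # smallest S with posterior CDF >= cl

# Coverage: fraction of pseudo-experiments whose limit lies above the true S
n_toys = 2000
covered = sum(upper_limit(rng.poisson(eps * s_true + b)) >= s_true
              for _ in range(n_toys))
print(f"coverage of 90% upper limit: {covered / n_toys:.3f}")
```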

15
But it all goes pear-shaped…
  • If the particle decays in several channels
  • H→γγ, H→τ+τ−, H→bb̄
  • Each channel with a different b and ε: in total 2N+1
    parameters, 2N+1 experiments
  • Heavy undercoverage!
  • e.g. with 4 channels,
  • all ε = 25 ± 10 %, b = 0.75 ± 0.25
  • for S = 10 the 90% upper limit lies
    above S in only 80% of cases

(Plot: coverage of the 90% upper limit versus S.)
16
The curse strikes again
  • Uniform prior in ε: fine
  • Uniform priors in ε1, ε2 … εN give
  • an ε^(N−1) prior in the total ε
  • Prejudice in favour of high efficiency
  • Signal size downgraded (see the MC sketch below)
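An illustrative Monte Carlo of the induced prior; a toy of my own, with each channel's efficiency taken Uniform(0, 1):

```python
import numpy as np

# Give each of N channels a Uniform(0, 1) prior on its efficiency and
# look at the implied prior on the total. Near zero the density falls
# like eps_tot**(N-1), so low total efficiency -- and hence a large
# signal -- is strongly disfavoured.
rng = np.random.default_rng(4)
for n_chan in (1, 2, 4):
    eps_tot = rng.uniform(0.0, 1.0, size=(1_000_000, n_chan)).sum(axis=1)
    frac_low = np.mean(eps_tot < 0.1 * n_chan)  # lowest tenth of the range
    print(f"N={n_chan}: P(total eps in lowest tenth) = {frac_low:.4f}")
# N=1 gives 0.1 as expected; N=4 gives ~0.001: the flat priors have
# quietly become a strong prejudice in favour of high efficiency.
```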

17
Happy ending
  • The effect is avoided by using Jeffreys priors instead
    of uniform priors for ε and b
  • Not uniform, but like 1/ε, 1/b
  • Not entirely realistic, but interesting
  • A uniform prior in S is not a problem, but maybe one
    should consider 1/√S?
  • Coverage (a very frequentist concept) is a useful
    tool for Bayesians

18
Fisher Information
An informative experiment is one for which a
measurement of x will give precise information
about the parameter a. Quantify: I(a) = −⟨∂² ln L/∂a²⟩
(second derivative: curvature).
P(x,a) describes everything: at fixed a, as a function of x,
it is the pdf; at fixed x, as a function of a, it is the
likelihood L(a).
19
Jeffreys Prior
A prior may be uniform in a, but if I(a) depends
on a it's still not "flat": special values of a
give better measurements.
  • Transform a → a′ such that I(a′) is constant.
    Then choose a prior uniform in a′:
  • location parameter: uniform prior OK
  • scale parameter: a′ = ln a, i.e. prior ∝ 1/a
  • Poisson mean: prior ∝ 1/√a (numerical check below)
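A short numerical check of the Poisson case, a sketch assuming a Poisson measurement of mean a:

```python
import numpy as np

# Estimate the Fisher information I(a) = -<d2 ln L / da2> for a Poisson
# mean by Monte Carlo, and compare with the exact 1/a, so that the
# Jeffreys prior sqrt(I(a)) is proportional to 1/sqrt(a).
rng = np.random.default_rng(5)

def fisher_info(a, n_samples=200_000, h=1e-3):
    n = rng.poisson(a, n_samples)
    # ln L(a) = n ln a - a  (the ln n! term does not depend on a)
    logl = lambda mu: n * np.log(mu) - mu
    d2 = (logl(a + h) - 2 * logl(a) + logl(a - h)) / h**2  # finite difference
    return -d2.mean()

for a in (1.0, 4.0, 9.0):
    print(f"a={a}: I(a) = {fisher_info(a):.4f}, 1/a = {1 / a:.4f}")
```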

20
Objective Prior?
  • Jeffreys called this an "objective" prior, as
    opposed to subjective belief or straight guesswork,
    but not everyone was convinced
  • For statisticians "flat prior" means the Jeffreys
    prior; for physicists it means a uniform prior
  • The prior depends on the likelihood. Your prior belief
    P(MH) (or whatever) depends on the analysis!
  • Equivalent to a prior proportional to √I

21
Reference Priors (Demortier)
  • 4 steps
  • 1) Intrinsic Discrepancy
  • Between two PDFs:
  • d(P1(z), P2(z)) = Min { ∫P1(z) ln(P1(z)/P2(z)) dz,
    ∫P2(z) ln(P2(z)/P1(z)) dz }
  • A sensible measure of difference:
  • d = 0 iff P1(z) and P2(z) are the same, else +ve
  • Invariant under all transformations of z
    (evaluated numerically below)
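A grid-integration sketch of the intrinsic discrepancy; the choice of two unit Gaussians is mine:

```python
import numpy as np

# Intrinsic discrepancy: the smaller of the two Kullback-Leibler
# divergences between two pdfs, here evaluated on a grid.
z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]

def gauss(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def intrinsic_discrepancy(p1, p2):
    kl12 = np.sum(p1 * np.log(p1 / p2)) * dz
    kl21 = np.sum(p2 * np.log(p2 / p1)) * dz
    return min(kl12, kl21)

p1, p2 = gauss(z, 0.0, 1.0), gauss(z, 1.0, 1.0)
print(intrinsic_discrepancy(p1, p2))  # 0.5 for unit Gaussians 1 sigma apart
# d = 0 only when the two pdfs coincide, and taking the min makes it symmetric.
```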

22
Reference Priors (2)
  • 2) Expected Intrinsic Information
  • Measurement M: x is sampled from p(x|a)
  • Parameter a has a prior p(a)
  • Joint distribution: p(x,a) = p(x|a) p(a)
  • Marginal distribution: p(x) = ∫p(x|a) p(a) da
  • I(p(a), M) = d( p(x,a), p(x) p(a) )
  • Depends on (i) the x–a relationship and (ii) the breadth
    of p(a)
  • This is the Expected Intrinsic (Shannon) Information from
    measurement M about parameter a (toy computation below)
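A toy computation of this quantity. Demortier's d is the min of the two KL directions; for the joint versus the product, the KL(joint || product) direction is exactly the Shannon mutual information, which is what this sketch computes. All model choices (Poisson likelihood, four-point uniform prior) are mine:

```python
import numpy as np
from scipy.stats import poisson

a_grid = np.array([1.0, 3.0, 5.0, 7.0])
p_a = np.full(len(a_grid), 0.25)             # prior p(a)
x = np.arange(0, 40)                         # x range covering the bulk

p_x_given_a = poisson.pmf(x[:, None], a_grid[None, :])
p_xa = p_x_given_a * p_a                     # joint p(x,a)
p_x = p_xa.sum(axis=1, keepdims=True)        # marginal p(x)

mask = p_xa > 0
info = np.sum(p_xa[mask] * np.log(p_xa[mask] / (p_x * p_a)[mask]))
print(f"expected information: {info:.3f} nats")
# A broader prior or a more discriminating p(x|a) both increase this,
# matching points (i) and (ii) above.
```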

23
Reference Priors (3)
  • 3) Missing Information
  • Measurement Mk: k samples of x
  • Enough measurements fix a completely
  • The limit k→∞ of I(p(a), Mk) is the difference between
    the knowledge encapsulated in the prior p(a) and complete
    knowledge of a. Hence it is the Missing Information given
    p(a).

24
Reference Priors(4)
  • 4) Family of priors P (e.g. Fourier series,
    polynomials, histograms), p(a) ∈ P
  • Ignorance principle: choose the least informative
    (dumbest) prior in the family, the one for which
    the missing information lim k→∞ I(p(a), Mk) is
    largest.
  • There are technical difficulties in taking the k→∞ limit
    and in integrating over an infinite range of a

25
Family of Priors (Google)
26
Reference Priors
  • Reference priors do not represent subjective belief; in
    fact the opposite (like jury selection). They allow the
    most input to come from the data: a formal consensus
    practitioners can use to arrive at a sensible
    posterior
  • They depend on the measurement p(x|a), cf. Jeffreys
  • They also require the family P of possible priors
  • May be improper, but this doesn't matter (they do not
    represent belief)
  • For 1 parameter (if the measurement is
    asymptotically Gaussian, which the CLT usually
    secures) they give the Jeffreys prior
  • But they can also (unlike Jeffreys) work for several
    parameters

27
Summary
  • Probability
  • Frequentist
  • Bayesian
  • Bayes Theorem
  • Priors
  • Prior pitfalls (1): Le Diberder
  • Prior pitfalls (2): Heinrich
  • Jeffreys Prior
  • Fisher Information
  • Reference Priors (Demortier)