Title: Developments in Bayesian Priors
1 Developments in Bayesian Priors
- Roger Barlow
- Manchester IoP meeting
- November 16th 2005
2 Plan
- Probability
- Frequentist
- Bayesian
- Bayes Theorem
- Priors
- Prior pitfalls (1): Le Diberder
- Prior pitfalls (2): Heinrich
- Jeffreys Prior
- Fisher Information
- Reference Priors: Demortier
3 Probability
- Probability as the limit of frequency
- P(A) = limit of N_A/N_total
- Usual definition taught to students
- Makes sense
- Works well most of the time
- But not all
4 Frequentist probability
- "It will probably rain tomorrow."
- "Mt = 174.3 ± 5.1 GeV" means: the top quark mass lies between 169.2 and 179.4 GeV, with 68% probability.
- Neither is a valid frequentist statement (a single event or a fixed parameter has no frequency); rephrased, they are acceptable:
- The statement "It will rain tomorrow." is probably true.
- "Mt = 174.3 ± 5.1 GeV" means: the statement "the top quark mass lies between 169.2 and 179.4 GeV" is true at 68% confidence.
5 Bayesian Probability
- P(A) expresses my belief that A is true
- Limits: 0 (impossible) and 1 (certain)
- Calibrated off clear-cut instances (coins, dice, urns)
6 Frequentist versus Bayesian?
- Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.)
- Rivals? Religious differences?
- Particle physicists tend to be frequentists; cosmologists tend to be Bayesians
- No. Two different tools for practitioners
- Important to
  - Be aware of the limits and pitfalls of both
  - Always be aware which you're using
7 Bayes Theorem (1763)
- P(A|B) P(B) = P(A and B) = P(B|A) P(A)
- P(A|B) = P(B|A) P(A) / P(B)
- Frequentist use, e.g. Cerenkov counter (numerical example below):
  P(π|signal) = P(signal|π) P(π) / P(signal)
- Bayesian use:
  P(theory|data) = P(data|theory) P(theory) / P(data)
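A tiny numerical illustration (not from the talk) of the Cerenkov-counter use; the beam composition and counter responses below are invented for the example:

```python
# Invented numbers: beam composition and Cerenkov response (assumptions).
p_pi = 0.6                   # prior probability that a track is a pion
p_k = 1.0 - p_pi             # ... or a kaon
p_sig_given_pi = 0.95        # P(signal | pi): counter fires for a pion
p_sig_given_k = 0.05         # P(signal | K): counter fires for a kaon

# Bayes Theorem: P(pi | signal) = P(signal | pi) P(pi) / P(signal)
p_sig = p_sig_given_pi * p_pi + p_sig_given_k * p_k
p_pi_given_sig = p_sig_given_pi * p_pi / p_sig
print(f"P(pi | signal) = {p_pi_given_sig:.3f}")
```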
8 Bayesian Prior
- P(theory) is the Prior
- Expresses prior belief that the theory is true
- Can be a function of a parameter
- P(Mtop), P(MH), P(α, β, γ)
- Bayes Theorem describes the way prior belief is modified by experimental data
- But what do you take as the initial prior?
9 Uniform Prior
- General usage: choose P(a) uniform in a
- (principle of insufficient reason)
- Often improper: ∫P(a) da = ∞. Though the posterior P(a|x) comes out sensible
- BUT!
- If P(a) is uniform, P(a²), P(ln a), P(√a) ... are not (see the sketch below)
- Insufficient reason is not valid (unless a is the "most fundamental" parameter, whatever that means)
- Statisticians handle this: check results for robustness under different priors
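A quick numerical illustration (my own, not from the talk) of why uniformity is not preserved under reparametrisation: sample a uniformly and look at the implied densities of a², ln a and √a.

```python
import numpy as np

# Sample a uniformly on (0, 1]; by the change-of-variables rule the implied
# density of f(a) is p(a)/|f'(a)|, so it cannot stay flat for a^2, ln a, sqrt(a).
rng = np.random.default_rng(1)
a = rng.uniform(1e-6, 1.0, size=1_000_000)

for name, values in [("a", a), ("a^2", a**2), ("ln a", np.log(a)), ("sqrt(a)", np.sqrt(a))]:
    dens, _ = np.histogram(values, bins=20, density=True)
    print(f"{name:8s}: binned density ranges from {dens.min():.2f} to {dens.max():.2f}")
# Only the first line is (roughly) flat; a prior uniform in a is far from
# uniform in the transformed parameters.
```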
10 Example: Le Diberder
- Sad story
- Fitting the CKM angle α from B→ππ
- 6 observables
- 3 amplitudes → 6 unknown parameters (magnitudes, phases)
- α is the fundamentally interesting one
11 Results
[Plot] Frequentist
[Plot] Bayesian: one phase set to zero, uniform priors in the other two phases and the 3 magnitudes
12 More Results
[Plot] Bayesian: parametrise Tree and Penguin amplitudes
[Plot] Bayesian: 3 amplitudes → 3 real parts, 3 imaginary parts
13 Interpretation
- B→ρρ shows the same (mis)behaviour
- Removing all experimental info gives a similar P(α)
- The curse of high dimensions is at work: uniformity in x, y, z makes P(r) peak at large r (see the sketch below)
- This result is not robust under changes of prior
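A small Monte Carlo sketch of the "curse of high dimensions" point (illustrative, not Le Diberder's analysis): components uniform in a cube make the implied prior on the radius r pile up at large r.

```python
import numpy as np

# Draw (x, y, z) uniformly in the unit cube and look at the implied
# distribution of r = sqrt(x^2 + y^2 + z^2): the density grows like the
# area of the spherical shell, so large r is favoured before any data arrive.
rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 1.0, size=(1_000_000, 3))
r = np.linalg.norm(xyz, axis=1)

dens, edges = np.histogram(r, bins=10, range=(0.0, 1.0), density=True)
for lo, hi, d in zip(edges[:-1], edges[1:], dens):
    print(f"r in [{lo:.1f}, {hi:.1f}): density {d:.2f}")
# The density rises roughly like r^2: "uniform" priors on the components are
# strongly non-uniform in the variable of physical interest.
```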
14 Example: Heinrich
- CDF statistics group looking at the problem of estimating a signal cross section S in the presence of background and efficiency
- N = εS + b
- Efficiency and background come from separate calibration experiments (sidebands or MC); the relevant scaling factors are known
- Everything is done using Bayesian methods with uniform priors and the Poisson formula. The calibration experiments use a uniform prior for ε and for b, yielding posteriors that are used for S:
- P(N|S) = (1/N!) ∫∫ e^-(εS+b) (εS+b)^N P(ε) P(b) dε db  (sketched numerically below)
- Check coverage: all fine
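A hedged numerical sketch of this construction (my own illustration, not Heinrich's code): the efficiency and background densities below are placeholder Beta/Gamma shapes standing in for the calibration posteriors, and all numbers are invented.

```python
import numpy as np
from scipy import integrate, stats

# Placeholder posteriors for efficiency and background (assumed shapes).
N_obs = 6
p_eps = stats.beta(25.0, 75.0)         # efficiency ~ 0.25 +- 0.04 (assumed)
p_b = stats.gamma(9.0, scale=1 / 3.0)  # background ~ 3 +- 1 (assumed)

def marginal_likelihood(S):
    """P(N|S) = (1/N!) integral of exp(-(eps*S+b)) (eps*S+b)^N p(eps) p(b)."""
    def integrand(b, eps):
        return stats.poisson.pmf(N_obs, eps * S + b) * p_eps.pdf(eps) * p_b.pdf(b)
    value, _ = integrate.dblquad(integrand, 0.0, 1.0, 0.0, 30.0)
    return value

# Posterior for S with a uniform prior, and the 90% credible upper limit.
S_grid = np.linspace(0.0, 60.0, 121)
post = np.array([marginal_likelihood(S) for S in S_grid])
post /= integrate.trapezoid(post, S_grid)
cdf = integrate.cumulative_trapezoid(post, S_grid, initial=0.0)
print(f"90% upper limit on S: {np.interp(0.90, cdf, S_grid):.1f}")
```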
15 But it all goes pear-shaped...
- If the particle decays in several channels
- H→γγ, H→τ+τ-, H→bb ...
- Each channel with different b and ε: in total 2N+1 parameters, 2N+1 experiments
- Heavy undercoverage!
- e.g. with 4 channels, all ε = 25 ± 10% and b = 0.75 ± 0.25,
- for S = 10 the 90% upper limit comes out above S in only 80% of cases
[Plot: coverage (%) of the nominal 90% upper limit versus S, falling to about 80% around S = 10]
16 The curse strikes again
- Uniform prior in ε: fine
- Uniform priors in ε1, ε2 ... εN → an ε^(N-1) prior in the total ε (see the sketch below)
- Prejudice in favour of high efficiency
- Signal size downgraded
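A quick sanity check of this point (illustrative; the "total" efficiency is taken here as the simple sum of the per-channel efficiencies, which is an assumption of this sketch):

```python
import math
import numpy as np

# N independent per-channel efficiencies, each with a uniform prior; the
# "total" is taken to be their simple sum (an assumption of this sketch).
rng = np.random.default_rng(2)
N = 4
eps = rng.uniform(0.0, 1.0, size=(1_000_000, N))
total = eps.sum(axis=1)

# Near zero the implied density of the total behaves like eps^(N-1), so low
# total efficiency is heavily disfavoured before any data are seen.
frac_low = (total < 1.0).mean()
print(f"P(total < 1) = {frac_low:.4f}   (analytic 1/N! = {1 / math.factorial(N):.4f})")
print(f"Prior mean of the total efficiency: {total.mean():.2f} out of a possible {N}")
```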
17 Happy ending
- The effect is avoided by using Jeffreys priors instead of uniform priors for ε and b
- Not uniform, but like 1/ε, 1/b
- Not entirely realistic, but interesting
- A uniform prior in S is not a problem, but maybe one should consider 1/√S?
- Coverage (a very frequentist concept) is a useful tool for Bayesians
18 Fisher Information
An informative experiment is one for which a measurement of x will give precise information about the parameter a.
Quantify: I(a) = -⟨∂² ln L / ∂a²⟩ (second derivative: curvature). A numerical check follows below.
P(x,a) is the whole story: for fixed a it is the pdf P(x|a); for fixed x it is the likelihood L(a).
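A quick numerical cross-check (my own sketch, not from the talk) of this definition for a Poisson distribution of mean a, where the exact answer is I(a) = 1/a:

```python
import numpy as np
from scipy import stats

# Estimate I(a) = -<d^2 ln L / d a^2> for a Poisson of mean a by averaging a
# finite-difference second derivative over simulated data; exact answer is 1/a.
def fisher_information(a, n_samples=200_000, h=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.poisson(a, size=n_samples)
    logl = lambda mu: stats.poisson.logpmf(x, mu)
    d2 = (logl(a + h) - 2.0 * logl(a) + logl(a - h)) / h**2
    return -d2.mean()

for a in (0.5, 2.0, 10.0):
    print(f"a = {a:5.1f}: I(a) estimate {fisher_information(a):.3f}, exact {1 / a:.3f}")
```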
19 Jeffreys Prior
A prior may be uniform in a, but if I(a) depends on a it is still not "flat": special values of a give better measurements.
- Transform a → a' such that I(a') is constant, then choose a prior uniform in a'
- Location parameter: uniform prior OK
- Scale parameter: a' is ln a, i.e. prior ∝ 1/a
- Poisson mean: prior ∝ 1/√a (derivation sketched below)
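A short worked version of the Poisson entry above, using the Fisher-information definition from the previous slide (standard textbook material, not reproduced from the slides):

```latex
% Poisson likelihood for a mean a:  P(x|a) = e^{-a} a^x / x!
\ln L(a) = x \ln a - a - \ln x!,
\qquad
\frac{\partial^2 \ln L}{\partial a^2} = -\frac{x}{a^2}

% Fisher information, using <x> = a:
I(a) = -\left\langle \frac{\partial^2 \ln L}{\partial a^2} \right\rangle = \frac{1}{a}

% Jeffreys prior, proportional to the square root of I(a):
P(a) \propto \sqrt{I(a)} = \frac{1}{\sqrt{a}}
```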
20 Objective Prior?
- Jeffreys called this an "objective" prior, as opposed to subjective priors or straight guesswork, but not everyone was convinced
- For statisticians a "flat prior" means the Jeffreys prior; for physicists it means a uniform prior
- The prior depends on the likelihood: your "prior belief" P(MH) (or whatever) depends on the analysis
- Equivalent to a prior proportional to √I
21 Reference Priors (Demortier)
- 4 steps
- 1) Intrinsic Discrepancy
- Between two PDFs (see the sketch below):
  δ{P1(z), P2(z)} = Min{ ∫P1(z) ln(P1(z)/P2(z)) dz, ∫P2(z) ln(P2(z)/P1(z)) dz }
- A sensible measure of difference
- δ = 0 iff P1(z) and P2(z) are the same, else positive
- Invariant under all transformations of z
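A small sketch of the intrinsic discrepancy on a discretised grid, using two arbitrary Gaussian densities as the example pair (my choice, not from the talk):

```python
import numpy as np
from scipy import stats

# Intrinsic discrepancy delta{P1, P2}: the smaller of the two KL divergences,
# evaluated on a grid for two arbitrary example densities.
z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]
p1 = stats.norm(0.0, 1.0).pdf(z)
p2 = stats.norm(1.0, 2.0).pdf(z)

def kl(p, q):
    """Discretised integral of p * ln(p/q) dz, skipping zero-density points."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dz

delta = min(kl(p1, p2), kl(p2, p1))
print(f"KL(P1||P2) = {kl(p1, p2):.3f}, KL(P2||P1) = {kl(p2, p1):.3f}, delta = {delta:.3f}")
```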
22 Reference Priors (2)
- 2) Expected Intrinsic Information
- Measurement M: x is sampled from p(x|a)
- Parameter a has a prior p(a)
- Joint distribution: p(x,a) = p(x|a) p(a)
- Marginal distribution: p(x) = ∫p(x|a) p(a) da
- I(p(a), M) = δ{p(x,a), p(x) p(a)}
- Depends on (i) the x-a relationship and (ii) the breadth of p(a)
- The Expected Intrinsic (Shannon) Information from measurement M about parameter a (see the sketch below)
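A discretised sketch of I(p(a), M) for an assumed binomial-type measurement (x successes in n trials with probability a); the grid, n and the uniform example prior are all choices made for this illustration:

```python
import numpy as np
from scipy.stats import binom

# Discretised I(p(a), M) for one measurement M: x successes in n trials with
# probability a.  Grid, n and the example (uniform) prior are illustrative.
n = 10
a_grid = np.linspace(0.005, 0.995, 100)
prior = np.full_like(a_grid, 1.0 / a_grid.size)           # p(a) on the grid

x = np.arange(n + 1)
p_x_given_a = binom.pmf(x[:, None], n, a_grid[None, :])    # p(x|a) table

joint = p_x_given_a * prior[None, :]                       # p(x, a)
p_x = joint.sum(axis=1, keepdims=True)                     # marginal p(x)
product = p_x * prior[None, :]                             # p(x) p(a)

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# delta between the joint and the product of marginals; the first KL term is
# the familiar mutual information between data x and parameter a.
info = min(kl(joint, product), kl(product, joint))
print(f"I(p(a), M) = {info:.3f} nats")
```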
23 Reference Priors (3)
- 3) Missing Information
- Measurement Mk: k samples of x
- Enough measurements fix a completely
- The limit k→∞ of I(p(a), Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a. Hence the Missing Information, given p(a).
24 Reference Priors (4)
- 4) Family of priors P (e.g. Fourier series, polynomials, histograms); p(a) ∈ P
- Ignorance principle: choose the least informative ("dumbest") prior in the family, the one for which the missing information lim k→∞ I(p(a), Mk) is largest (see the sketch below)
- Technical difficulties in taking the k→∞ limit and in integrating over an infinite range of a
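A toy version of this step (assumptions: the same binomial set-up as above, a finite k in place of the limit, and a small hand-picked family of Beta-shaped priors): compute the expected information from k repeated measurements for each member and keep the one for which it is largest.

```python
import numpy as np
from scipy.stats import binom, beta

# Toy ignorance principle: within a small hand-picked family of priors for a
# binomial probability a, keep the prior whose expected information from k
# repeated measurements is largest.  k is finite here; the real construction
# takes k -> infinity.
n, k = 10, 50                                    # trials per measurement, repeats
a_grid = np.linspace(0.005, 0.995, 200)
x = np.arange(n * k + 1)                         # k measurements <=> one of n*k trials
p_x_given_a = binom.pmf(x[:, None], n * k, a_grid[None, :])

def expected_information(prior):
    prior = prior / prior.sum()
    joint = p_x_given_a * prior[None, :]
    p_x = joint.sum(axis=1, keepdims=True)
    mask = joint > 0
    return np.sum(joint[mask] * np.log((joint / (p_x * prior[None, :]))[mask]))

family = {
    "uniform":       np.ones_like(a_grid),
    "Jeffreys-like": beta(0.5, 0.5).pdf(a_grid),
    "Beta(2,2)":     beta(2.0, 2.0).pdf(a_grid),
    "Beta(5,5)":     beta(5.0, 5.0).pdf(a_grid),
}
scores = {name: expected_information(p) for name, p in family.items()}
for name, s in scores.items():
    print(f"{name:13s}: I(p(a), M_k) = {s:.3f} nats")
print("Least informative member (reference choice at this finite k):",
      max(scores, key=scores.get))
```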
25 Family of Priors (Google)
26 Reference Priors
- Do not represent subjective belief; in fact the opposite (like jury selection). They allow the most input to come from the data: a formal consensus practitioners can use to arrive at a sensible posterior
- Depend on the measurement p(x|a), cf. Jeffreys
- Also require a family P of possible priors
- May be improper, but this doesn't matter (they do not represent belief)
- For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually ensures) they give the Jeffreys prior
- But they can also (unlike Jeffreys) work for several parameters
27 Summary
- Probability
- Frequentist
- Bayesian
- Bayes Theorem
- Priors
- Prior pitfalls (1): Le Diberder
- Prior pitfalls (2): Heinrich
- Jeffreys Prior
- Fisher Information
- Reference Priors: Demortier