Title: The small-n problem in High Energy Physics
1. The small-n problem in High Energy Physics
Glen Cowan
Department of Physics, Royal Holloway, University of London
g.cowan_at_rhul.ac.uk, www.pp.rhul.ac.uk/cowan
Statistical Challenges in Modern Astronomy IV
June 12 - 15, 2006
Glen Cowan
SCMA4, 12-15 June, 2006
2. Outline
I. High Energy Physics (HEP) overview
   Theory
   Experiments
   Data
II. The small-n problem, etc.
   Making a discovery
   Setting limits
   Systematic uncertainties
III. Conclusions
3. The current picture in particle physics
Matter...
force carriers: photon (γ), W, Z, gluon (g)
relativity, quantum mechanics, symmetries...
The Standard Model:
- almost certainly incomplete
- 25 free parameters (masses, coupling strengths, ...)
- should include Higgs boson (not yet seen)
- no gravity yet
- agrees with all experimental observations so far
4. Experiments in High Energy Physics
HEP mainly studies particle collisions in accelerators, e.g., the Large Electron-Positron (LEP) Collider at CERN, 1989-2000.
4 detectors, each with a collaboration of about 400 physicists.
5. More HEP experiments
LEP tunnel now used for the Large Hadron Collider (LHC): proton-proton collisions, $E_{\rm cm} = 14$ TeV, very high luminosity.
Two general purpose detectors: ATLAS and CMS.
Each detector collaboration has about 2000 physicists.
Data taking to start 2007.
(Figure: The ATLAS Detector)
6. HEP data
Basic unit of data: an event.
Ideally, an event is a list of momentum vectors and particle types.
In practice, particles are reconstructed as tracks, clusters of energy deposited in calorimeters, etc.
Resolution, angular coverage, particle id, etc. are imperfect.
(Figure: An event from the ALEPH detector at LEP)
7. Data samples
At LEP, event rates typically of order Hz or less; about $10^6$ Z boson events in 5 years for each of 4 experiments.
At LHC, about $10^9$ events/sec(!!!), mostly uninteresting: do quick sifting, record about 200 events/sec; single event about 1 Mbyte.
1 year $\approx 10^7$ s, $10^{16}$ pp collisions per year, 2 billion / year recorded (about 2 Pbyte / year).
For new/rare processes, rates at LHC can be vanishingly small: Higgs bosons detectable per year could be e.g. about $10^3$ → needle in a haystack.
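The yearly totals quoted on this slide follow from the per-second rates by simple multiplication. A minimal sketch of that arithmetic; the variable names are illustrative and the inputs are the round numbers from the slide, not precise machine parameters.

```python
# Round numbers taken from the slide (order-of-magnitude estimates only).
collision_rate_hz = 1e9      # pp collisions per second at the LHC
recorded_rate_hz = 200       # events per second kept after quick sifting
event_size_bytes = 1e6       # about 1 Mbyte per event
seconds_per_year = 1e7       # one running "year"

collisions_per_year = collision_rate_hz * seconds_per_year
recorded_per_year = recorded_rate_hz * seconds_per_year
bytes_per_year = recorded_per_year * event_size_bytes

print(f"collisions per year: {collisions_per_year:.0e}")
print(f"recorded per year:   {recorded_per_year:.0e}")
print(f"storage per year:    {bytes_per_year / 1e15:.0f} Pbyte")
print(f"fraction recorded:   {recorded_per_year / collisions_per_year:.0e}")
```

Only about two collisions in ten million are recorded, which is why the quick sifting step matters so much for rare processes.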
8. HEP game plan
Goals include:
- Fill in the gaps in the Standard Model (e.g. find the Higgs)
- Find something beyond the Standard Model (New Physics)
Example of an extension to SM: Supersymmetry (SUSY)
- For every SM particle → SUSY partner (none yet seen!)
- Minimal SUSY has 105 free parameters, constrained models about 5 parameters (plus the 25 from SM)
- Provides dark matter candidate (neutralino), unification of gauge couplings, solution to hierarchy problem, ...
- Lightest SUSY particle can be stable (effectively invisible)
9. Simulated HEP data
Monte Carlo event generators are available for essentially all Standard Model processes, and also for many possible extensions to the SM (supersymmetric models, extra dimensions, etc.).
SM predictions rely on a variety of approximations (perturbation theory to limited order, phenomenological modeling of non-perturbative effects, etc.).
Monte Carlo programs are also used to simulate detector response.
(Figure: Simulated event for ATLAS)
10. A simulated event
PYTHIA Monte Carlo: pp → gluino-gluino
11. The data stream
Experiment records events of different types,
with different numbers of particles, kinematic
properties, ...
12. Selecting events
To search for events of a given type ($H_0$: signal), need discriminating variable(s) distributed as differently as possible relative to unwanted event types ($H_1$: background).
Count number of events in acceptance region defined by cuts.
Expected number of signal events: $s = \sigma_s \varepsilon_s L$
Expected number of background events: $b = \sigma_b \varepsilon_b L$
$\sigma_s$, $\sigma_b$ = cross section for signal, background
Efficiencies: $\varepsilon_s = P(\mbox{accept} \mid s)$, $\varepsilon_b = P(\mbox{accept} \mid b)$
$L$ = integrated luminosity (related to beam intensity, data taking time)
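The expected counts are products of cross section, selection efficiency and integrated luminosity. A minimal sketch of that bookkeeping; the function name `expected_events` and all numerical inputs are hypothetical and chosen only to illustrate the formula.

```python
def expected_events(cross_section_pb, efficiency, luminosity_inv_pb):
    """Expected count = cross section x selection efficiency x integrated luminosity."""
    return cross_section_pb * efficiency * luminosity_inv_pb

# Hypothetical inputs (not taken from any real analysis).
lumi = 10_000.0                            # integrated luminosity in pb^-1
s = expected_events(0.05, 0.30, lumi)      # signal: small cross section, high efficiency
b = expected_events(20.0, 0.001, lumi)     # background: large cross section, strongly suppressed

print(f"s = {s:.1f}, b = {b:.1f}")
```

Units must match: a cross section in pb goes with a luminosity in pb^-1 so that the product is a pure number of events.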
13. Poisson data with background
Count $n$ events, e.g., in fixed time or integrated luminosity.
$s$ = expected number of signal events
$b$ = expected number of background events
$n \sim \mathrm{Poisson}(s + b)$
Sometimes $b$ is known, other times it is in some way uncertain.
Goals: (i) convince people that $s \neq 0$ (discovery); (ii) measure or place limits on $s$, taking into consideration the uncertainty in $b$.
Widely discussed in HEP community, see e.g. proceedings of PHYSTAT meetings, Durham, Fermilab, CERN workshops...
14. Making a discovery
Often compute p-value of the background only hypothesis $H_0$ using test variable related to a characteristic of the signal.
p-value = probability to see data as incompatible with $H_0$, or more so, relative to the data observed.
Requires definition of "incompatible with $H_0$".
HEP folklore: claim discovery if p-value equivalent to a $5\sigma$ fluctuation of Gaussian variable (one-sided).
Actual p-value at which discovery becomes believable will depend on signal in question (subjective).
Why not do Bayesian analysis? Usually don't know how to assign meaningful prior probabilities.
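The conversion between a p-value and an equivalent one-sided Gaussian significance can be sketched with the Python standard library alone. The helper names `p_from_z` and `z_from_p` are illustrative, not part of any HEP package.

```python
import math
from statistics import NormalDist

def p_from_z(z):
    """One-sided tail probability of a standard Gaussian beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_from_p(p):
    """Inverse relation: significance Z whose one-sided tail probability is p."""
    return NormalDist().inv_cdf(1.0 - p)

print(f"5 sigma -> p = {p_from_z(5.0):.3g}")
print(f"3 sigma -> p = {p_from_z(3.0):.3g}")
print(f"p = 0.09 -> Z = {z_from_p(0.09):.2f}")
```

The complementary error function is used for the forward direction because computing `1 - cdf` directly loses precision for very small tail probabilities.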
15. Computing p-values
For $n \sim \mathrm{Poisson}(s + b)$ we compute the p-value of $H_0$: $s = 0$,
$p = P(n \ge n_{\rm obs}; b) = \sum_{n = n_{\rm obs}}^{\infty} \frac{b^n}{n!} e^{-b}$
Often we don't simply count events but also measure for each event one or more quantities: the number of events observed $n$ is replaced by numbers of events $(n_1, ..., n_N)$ in a histogram.
Goodness-of-fit variable could be e.g. Pearson's $\chi^2$.
16. Example: search for the Higgs boson at LEP
Several usable signal modes.
Mass of jet pair = mass of Higgs boson; b jets contain tracks not from interaction point.
Important background from $e^+e^- \to ZZ$: b-jet pair of virtual Z can mimic Higgs.
17. A candidate Higgs event
17 "Higgs like" candidates seen but no claim of discovery: p-value of $s = 0$ (background only) hypothesis $\approx 0.09$.
18. Setting limits
Frequentist intervals (limits) for a parameter $s$ can be found by defining a test of the hypothesized value $s$ (do this for all $s$):
- Specify values of the data $n$ that are disfavoured by $s$ (critical region) such that $P(n \mbox{ in critical region}) \le \gamma$ for a prespecified $\gamma$, e.g., 0.05 or 0.1. (Because of discrete data, need inequality here.)
- If $n$ is observed in the critical region, reject the value $s$.
Now invert the test to define a confidence interval as the set of $s$ values that would not be rejected in a test of size $\gamma$ (confidence level is $1 - \gamma$).
The interval will cover the true value of $s$ with probability $\ge 1 - \gamma$.
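19. Setting limits: classical method
E.g. for upper limit on $s$, take critical region to be low values of $n$; limit $s_{\rm up}$ at confidence level $1 - \beta$ thus found from
$\beta = P(n \le n_{\rm obs}; s_{\rm up}, b)$
Similarly for lower limit at confidence level $1 - \alpha$,
$\alpha = P(n \ge n_{\rm obs}; s_{\rm lo}, b)$
Sometimes choose $\alpha = \beta = \gamma / 2$ → central confidence interval.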
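The classical upper limit can be found numerically by solving for the Poisson mean at which the probability of observing n_obs or fewer events equals the chosen tail probability. A minimal sketch using bisection; the function names and the bracketing range are assumptions of this example.

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total

def classical_upper_limit(n_obs, b, beta=0.05):
    """Solve P(n <= n_obs; s_up + b) = beta for s_up by bisection on the mean.

    The result can be negative when n_obs fluctuates well below b.
    """
    lo, hi = 0.0, n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0  # bracket for s + b
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > beta:
            lo = mid      # tail probability still too large: mean must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi) - b

for n_obs, b in [(0, 0.0), (3, 1.0), (0, 5.0)]:
    print(f"n = {n_obs}, b = {b}: s_up = {classical_upper_limit(n_obs, b):.3f}")
```

The last case (no events observed on an expected background of 5) returns a negative number, which illustrates how the classical recipe can produce an empty interval for a non-negative signal.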
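20. Calculating classical limits
To solve for $s_{\rm lo}$, $s_{\rm up}$, can exploit relation to $\chi^2$ distribution:
$s_{\rm lo} = \tfrac{1}{2} F_{\chi^2}^{-1}(\alpha; 2n) - b$
$s_{\rm up} = \tfrac{1}{2} F_{\chi^2}^{-1}(1 - \beta; 2(n + 1)) - b$
where $F_{\chi^2}^{-1}$ is the quantile of the $\chi^2$ distribution.
For low fluctuation of $n$ this can give negative result for $s_{\rm up}$; i.e. confidence interval is empty.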
21. Likelihood ratio limits (Feldman-Cousins)
Define likelihood ratio for hypothesized parameter value $s$:
$l(s) = \frac{L(n; s, b)}{L(n; \hat{s}, b)}$
Here $\hat{s}$ is the ML estimator, note $0 \le l(s) \le 1$.
Critical region defined by low values of likelihood ratio.
Resulting intervals can be one- or two-sided (depending on $n$).
(Re)discovered for HEP by Feldman and Cousins, Phys. Rev. D 57 (1998) 3873.
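The likelihood-ratio ordering can be sketched for a Poisson count with known background as follows. This is a simplified grid scan under stated assumptions (grid step, scan range and the cutoff `n_max` are choices of this example), not a reproduction of the published tables.

```python
import math

def pmf(n, mu):
    """Poisson probability P(n; mu), with the mu = 0 case handled explicitly."""
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def acceptance_region(s, b, cl, n_max):
    """Values of n accepted for signal s, added in order of decreasing likelihood ratio."""
    probs = [pmf(n, s + b) for n in range(n_max + 1)]
    # The ML estimate is s_hat = max(0, n - b), so the best-fit mean is max(n, b).
    ratios = [probs[n] / pmf(n, max(n, b)) for n in range(n_max + 1)]
    accepted, total = set(), 0.0
    for n in sorted(range(n_max + 1), key=lambda k: ratios[k], reverse=True):
        accepted.add(n)
        total += probs[n]
        if total >= cl:
            break
    return accepted

def fc_interval(n_obs, b, cl=0.90, s_max=15.0, step=0.01, n_max=80):
    """Scan a grid in s and keep the values whose acceptance region contains n_obs."""
    grid = [i * step for i in range(int(round(s_max / step)) + 1)]
    kept = [s for s in grid if n_obs in acceptance_region(s, b, cl, n_max)]
    return min(kept), max(kept)

lo, hi = fc_interval(0, 0.0)
print(f"n = 0, b = 0, 90% CL: [{lo:.2f}, {hi:.2f}]")
```

The scan range must extend beyond the upper end of the interval, so `s_max` would need to be raised for larger observed counts.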
22. Coverage probability of confidence intervals
Because of discreteness of Poisson data, probability for interval to include true value is in general > confidence level (over-coverage).
23. More on intervals from LR test (Feldman-Cousins)
Caveat with coverage: suppose we find $n \gg b$. Usually one then quotes a measurement.
If, however, $n$ isn't large enough to claim discovery, one sets a limit on $s$.
FC pointed out that if this decision is made based on $n$, then the actual coverage probability of the interval can be less than the stated confidence level (flip-flopping).
FC intervals remove this, providing a smooth transition from 1- to 2-sided intervals, depending on $n$.
But, suppose FC gives e.g. $0.1 < s < 5$ at 90% CL, p-value of $s = 0$ still substantial. Part of upper limit wasted?
24. Properties of upper limits
Example: take $b = 5.0$, confidence level 0.95.
(Figure panels: Upper limit $s_{\rm up}$ vs. $n$; Mean upper limit vs. $s$)
25. Upper limit versus b
(Figure: upper limit as a function of $b$, from Feldman and Cousins, PRD 57 (1998) 3873)
If $n = 0$ observed, should upper limit depend on $b$?
Classical: yes
Bayesian: no
FC: yes
26. Nuisance parameters and limits
In general we don't know the background $b$ perfectly.
Suppose we have a measurement of $b$, e.g., $b_{\rm meas} \sim N(b, \sigma_b)$.
So the data are really: $n$ events and the value $b_{\rm meas}$.
In principle the confidence interval recipe can be generalized to two measurements and two parameters.
Difficult and not usually attempted, but see e.g. talks by K. Cranmer at PHYSTAT03, G. Punzi at PHYSTAT05.
(Figure: G. Punzi, PHYSTAT05)
27. Bayesian limits with uncertainty on b
Uncertainty on $b$ goes into the prior, e.g.,
$\pi(s, b) = \pi_s(s)\, \pi_b(b)$
Put this into Bayes' theorem,
$p(s, b \mid n) \propto P(n \mid s, b)\, \pi(s, b)$
Marginalize over $b$, then use $p(s \mid n)$ to find intervals for $s$ with any desired probability content.
For $b = 0$, $\sigma_b = 0$, $\pi_s(s) =$ const. ($s > 0$), Bayesian upper limit coincides with Classical one.
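The marginalization can be sketched numerically on a grid. The example below assumes a flat prior for s >= 0 and a Gaussian prior for b centred on a measured value and truncated at zero; the function name, grid sizes and integration range are choices of this sketch rather than anything prescribed by the talk.

```python
import math

def bayes_upper_limit(n_obs, b_meas, sigma_b, cl=0.95, s_max=40.0, ns=2000, nb=101):
    """Upper limit on s from p(s|n): flat prior in s, truncated Gaussian prior in b."""
    if sigma_b > 0.0:
        b_lo = max(0.0, b_meas - 5.0 * sigma_b)
        b_hi = b_meas + 5.0 * sigma_b
        b_grid = [b_lo + (b_hi - b_lo) * j / (nb - 1) for j in range(nb)]
        weights = [math.exp(-0.5 * ((b - b_meas) / sigma_b) ** 2) for b in b_grid]
    else:
        b_grid, weights = [b_meas], [1.0]      # background known exactly

    def marginal(s):
        # Likelihood times prior summed over b; constant factors cancel on normalization.
        total = 0.0
        for b, w in zip(b_grid, weights):
            mu = s + b
            total += w * (mu ** n_obs) * math.exp(-mu)
        return total

    ds = s_max / ns
    dens = [marginal(i * ds) for i in range(ns + 1)]
    norm = sum(0.5 * (dens[i] + dens[i + 1]) * ds for i in range(ns))
    cum = 0.0
    for i in range(ns):
        cum += 0.5 * (dens[i] + dens[i + 1]) * ds / norm   # trapezoid rule
        if cum >= cl:
            return (i + 1) * ds
    return s_max

print(f"n = 0, b = 0 exactly:   s_up = {bayes_upper_limit(0, 0.0, 0.0):.2f}")
print(f"n = 0, b = 3 +- 1:      s_up = {bayes_upper_limit(0, 3.0, 1.0):.2f}")
print(f"n = 5, b = 3 +- 1:      s_up = {bayes_upper_limit(5, 3.0, 1.0):.2f}")
```

With a flat prior and n = 0 the posterior is proportional to exp(-s) whatever the background, so the first two results agree up to the grid resolution.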
28. Cousins-Highland method
Regard $b$ as random, characterized by pdf $\pi_b(b)$.
Makes sense in Bayesian approach, but in frequentist model $b$ is constant (although unknown).
A measurement $b_{\rm meas}$ is random but this is not the mean number of background events, rather, $b$ is.
Compute anyway
$P(n; s) = \int P(n; s, b)\, \pi_b(b)\, db$
This would be the probability for $n$ if Nature were to generate a new value of $b$ upon repetition of the experiment with $\pi_b(b)$.
Now e.g. use this $P(n; s)$ in the classical recipe for upper limit at CL $= 1 - \beta$:
$\beta = P(n \le n_{\rm obs}; s_{\rm up})$
Widely used method in HEP.
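A minimal numerical sketch of this recipe: the Poisson probability is averaged over a pdf for b, and the averaged probability is then used in the classical upper-limit condition. The Gaussian shape for the pdf of b, its truncation at zero, and all function names are assumptions of this example.

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total

def smeared_cdf(n_obs, s, b_meas, sigma_b, nb=201):
    """P(n <= n_obs; s) averaged over a Gaussian pdf for b, truncated at b >= 0."""
    b_lo = max(0.0, b_meas - 5.0 * sigma_b)
    b_hi = b_meas + 5.0 * sigma_b
    num = den = 0.0
    for j in range(nb):
        b = b_lo + (b_hi - b_lo) * j / (nb - 1)
        w = math.exp(-0.5 * ((b - b_meas) / sigma_b) ** 2)
        num += w * poisson_cdf(n_obs, s + b)
        den += w
    return num / den

def ch_upper_limit(n_obs, b_meas, sigma_b, beta=0.05):
    """Solve smeared P(n <= n_obs; s_up) = beta for s_up >= 0 by bisection."""
    lo, hi = 0.0, n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0
    if smeared_cdf(n_obs, lo, b_meas, sigma_b) <= beta:
        return 0.0        # even s = 0 is already disfavoured at this level
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if smeared_cdf(n_obs, mid, b_meas, sigma_b) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for sigma_b in (0.001, 0.3, 0.5):
    print(f"n = 3, b = 1.0 +- {sigma_b}: s_up = {ch_upper_limit(3, 1.0, sigma_b):.3f}")
```

For a very small uncertainty on b the result approaches the classical limit for known background, which serves as a basic sanity check of the integration.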
29. Integrated likelihoods
Consider again signal $s$ and background $b$, suppose we have uncertainty in $b$ characterized by a prior pdf $\pi_b(b)$.
Define integrated likelihood as
$L(s) = \int L(s, b)\, \pi_b(b)\, db$
also called modified profile likelihood, in any case not a real likelihood.
Now use this to construct likelihood-ratio test and invert to obtain confidence intervals.
Feldman-Cousins + Cousins-Highland (FHC2), see e.g. J. Conrad et al., Phys. Rev. D67 (2003) 012002 and Conrad/Tegenfeldt PHYSTAT05 talk.
Calculators available (Conrad, Tegenfeldt, Barlow).
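30. Digression: tangent plane method
Consider least-squares fit with parameter of interest $\theta_0$ and nuisance parameter $\theta_1$, i.e., minimize $\chi^2(\theta_0, \theta_1)$.
Standard deviations from tangent lines to contour $\chi^2 = \chi^2_{\min} + 1$.
Correlation between the estimators $\hat{\theta}_0$, $\hat{\theta}_1$ causes errors to increase.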
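31. The profile likelihood
The tangent plane method is a special case of using the profile likelihood
$L_p(\theta_0) = L(\theta_0, \hat{\hat{\theta}}_1)$
where $\hat{\hat{\theta}}_1$ is found by maximizing $L(\theta_0, \theta_1)$ for each $\theta_0$.
Equivalently use $\chi^2_p(\theta_0) = \chi^2(\theta_0, \hat{\hat{\theta}}_1)$.
The interval obtained from $\chi^2_p(\theta_0) = \chi^2_{p,\min} + 1$ is the same as what is obtained from the tangents to $\chi^2(\theta_0, \theta_1) = \chi^2_{\min} + 1$.
Well known in HEP as the MINOS method in MINUIT.
See e.g. talks by Reid, Cranmer, Rolke at PHYSTAT05.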
32. Interval from inverting profile LR test
Suppose we have a measurement $b_{\rm meas}$ of $b$.
Build the likelihood ratio test with profile likelihood
$l(s) = \frac{L(s, \hat{\hat{b}}(s))}{L(\hat{s}, \hat{b})}$
and use this to construct confidence intervals.
Not widely used in HEP but recommended in e.g. Kendall & Stuart; see also PHYSTAT05 talks by Cranmer, Feldman, Cousins, Reid.
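A sketch of the profile likelihood ratio for a Poisson count n with a Gaussian-distributed background measurement. The likelihood model, the function names and the numerical inputs are assumptions of this example; the interval uses the large-sample threshold of 1 on minus twice the log ratio, which is only approximate for small n. The sketch assumes n > 0.

```python
import math

def log_like(n, b_meas, sigma_b, s, b):
    """ln L(s, b) up to constants: Poisson term for n plus Gaussian term for b_meas."""
    mu = s + b
    return n * math.log(mu) - mu - 0.5 * ((b - b_meas) / sigma_b) ** 2

def profiled_b(n, b_meas, sigma_b, s):
    """Value of b maximizing ln L for fixed s (larger root of d lnL / db = 0)."""
    v = sigma_b ** 2
    p = s - b_meas + v
    q = v * s - b_meas * s - n * v
    return 0.5 * (-p + math.sqrt(p * p - 4.0 * q))

def q_profile(n, b_meas, sigma_b, s):
    """-2 ln of the profile likelihood ratio for hypothesized s."""
    s_hat = max(0.0, n - b_meas)                   # ML estimate, restricted to s >= 0
    b_hat = profiled_b(n, b_meas, sigma_b, s_hat)
    b_hathat = profiled_b(n, b_meas, sigma_b, s)
    return 2.0 * (log_like(n, b_meas, sigma_b, s_hat, b_hat)
                  - log_like(n, b_meas, sigma_b, s, b_hathat))

n, b_meas, sigma_b = 10, 3.0, 1.0                  # hypothetical inputs
grid = [0.01 * i for i in range(1, 3001)]
inside = [s for s in grid if q_profile(n, b_meas, sigma_b, s) <= 1.0]
print(f"s_hat = {max(0.0, n - b_meas):.1f}")
print(f"approximate 68.3% interval: [{inside[0]:.2f}, {inside[-1]:.2f}]")
```

Maximizing over b for each s widens the interval relative to the case of known background, which is how the uncertainty on b propagates into the result.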
33. Wrapping up,
Frequentist methods have been most widely used, but for many questions (particularly related to systematics) Bayesian methods are getting more notice.
Frequentist properties such as coverage probability of confidence intervals are seen as very important (overly so?).
Bayesian methods remain problematic in cases where it is difficult to enumerate alternative hypotheses and assign meaningful prior probabilities.
Tools widely applied at LEP; some work needed to extend these to LHC analyses (ongoing).
34. Finally,
The LEP programme was dominated by limit setting: Standard Model confirmed, No New Physics.
The Tevatron discovered the top quark and $B_s$ mixing (both parts of the SM) and also set many limits (but NNP).
By 2012 either we'll have discovered something new and interesting beyond the Standard Model, or, we'll still be setting limits and HEP should think seriously about a new approach!
35. Extra slides
36. A recent discovery: $B_s$ oscillations
Recently the D0 experiment (Fermilab) announced the discovery of $B_s$ mixing: Moriond talk by Brendan Casey, also hep-ex/0603029.
Produce a $B_q$ meson at time $t = 0$; there is a time dependent probability for it to decay as an anti-$B_q$ ($q = d$ or $s$).
$|V_{ts}| \gg |V_{td}|$ and so $B_s$ oscillates quickly compared to decay rate.
Sought but not seen at LEP; early on predicted to be visible at Tevatron.
Discovery quickly confirmed by the CDF experiment.
37. Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester
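38. Confidence interval from likelihood function
In the large sample limit it can be shown for ML estimators:
$\hat{\theta} \sim N(\theta, V)$ ($n$-dimensional Gaussian, covariance $V$)
$Q(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^T V^{-1} (\hat{\theta} - \theta)$ = const. defines a hyper-ellipsoidal confidence region.
If $\hat{\theta} \sim N(\theta, V)$ then $Q(\hat{\theta}, \theta) \sim \chi^2_n$.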
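39. Approximate confidence regions from $L(\theta)$
So the recipe to find the confidence region with CL $= 1 - \gamma$ is:
$\ln L(\theta) = \ln L_{\max} - \tfrac{1}{2} Q_\gamma$, where $Q_\gamma$ is the quantile of order $1 - \gamma$ of the $\chi^2_n$ distribution.
For finite samples, these are approximate confidence regions.
Coverage probability not guaranteed to be equal to $1 - \gamma$; no simple theorem to say by how far off it will be (use MC).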
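In the large-sample approximation the drop in ln L that bounds the region is half a chi-square quantile, with as many degrees of freedom as fitted parameters. A minimal sketch for one and two parameters, where the quantile has a closed form; the function name `chi2_quantile` is illustrative, and other numbers of parameters would need a numerical quantile routine.

```python
import math
from statistics import NormalDist

def chi2_quantile(prob, ndof):
    """Quantile of the chi-square distribution for 1 or 2 degrees of freedom."""
    if ndof == 1:
        # A chi-square variable with 1 dof is the square of a standard Gaussian.
        z = NormalDist().inv_cdf(0.5 * (1.0 + prob))
        return z * z
    if ndof == 2:
        # For 2 dof the cumulative distribution is 1 - exp(-q / 2).
        return -2.0 * math.log(1.0 - prob)
    raise ValueError("only ndof = 1 or 2 handled in this sketch")

for ndof in (1, 2):
    for cl in (0.683, 0.90, 0.95):
        q = chi2_quantile(cl, ndof)
        print(f"n = {ndof}, CL = {cl}: Q = {q:.2f}, ln L drops by {q / 2:.2f}")
```

For one parameter at 68.3% this reproduces the familiar rule that ln L drops by 0.5 from its maximum.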
40. Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester
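41. Upper limit from test of hypothesized $\Delta m_s$
Base test on likelihood ratio $l$ (here $\theta = \Delta m_s$).
Observed value is $l_{\rm obs}$, sampling distribution is $g(l; \theta)$ (from MC).
$\theta$ is excluded at CL $= 1 - \gamma$ if the probability under $g(l; \theta)$ of a likelihood ratio as low as $l_{\rm obs}$ or lower is less than $\gamma$.
D0 shows the distribution of $\ln l$ for $\Delta m_s = 25$ ps$^{-1}$: equivalent to $2.1\sigma$ effect.
(Figure label: 95% CL upper limit)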
42. The significance of an observed signal
Suppose $b = 0.5$, and we observe $n_{\rm obs} = 5$.
Often, however, $b$ has some uncertainty; this can have significant impact on p-value, e.g. if $b = 0.8$, p-value $= 1.4 \times 10^{-3}$.
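The sensitivity of the p-value to the assumed background can be checked directly from the Poisson tail probability for the background-only hypothesis. A minimal sketch; the function name `poisson_p_value` is illustrative.

```python
import math

def poisson_p_value(n_obs, b):
    """P(n >= n_obs) for a Poisson variable with mean b (background only, s = 0)."""
    term = math.exp(-b)       # P(0; b)
    below = 0.0
    for k in range(n_obs):
        below += term         # accumulate P(k; b) for k = 0 .. n_obs - 1
        term *= b / (k + 1)
    return 1.0 - below

for b in (0.5, 0.8):
    print(f"b = {b}: p-value = {poisson_p_value(5, b):.2g}")
```

A shift in the expected background from 0.5 to 0.8 events changes the p-value by close to an order of magnitude, which is why the uncertainty on b cannot be ignored in the small-n regime.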
43. The significance of a peak
Suppose we measure a value $x$ for each event and find:
(Figure: histogram of $x$ with a peak)
Each bin (observed) is a Poisson r.v., means are given by dashed lines.
In the two bins with the peak, 11 entries found with $b = 3.2$. We are tempted to compute the p-value for the $s = 0$ hypothesis as $P(n \ge 11; b = 3.2, s = 0)$.
44. The significance of a peak (2)
But... did we know where to look for the peak? → give $P(n \ge 11)$ in any 2 adjacent bins
Is the observed width consistent with the expected $x$ resolution? → take $x$ window several times the expected resolution
How many bins × distributions have we looked at? → look at a thousand of them, you'll find a $10^{-3}$ effect
Did we adjust the cuts to enhance the peak? → freeze cuts, repeat analysis with new data
How about the bins to the sides of the peak... (too low!)
Should we publish????
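The effect of looking in many places can be illustrated by converting the local p-value of the peak (11 entries on an expected 3.2) into the chance of seeing at least one such fluctuation among several looks. The sketch assumes the looks are statistically independent, which overlapping pairs of adjacent bins are not, so it gives only a rough guide; the function names and trial counts are illustrative.

```python
import math

def poisson_tail(n_obs, mean):
    """P(n >= n_obs) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    below = 0.0
    for k in range(n_obs):
        below += term
        term *= mean / (k + 1)
    return 1.0 - below

def global_p_value(p_local, n_trials):
    """Chance of at least one fluctuation this extreme in n_trials independent looks."""
    return 1.0 - (1.0 - p_local) ** n_trials

p_local = poisson_tail(11, 3.2)       # numbers from the peak example
print(f"local p-value: {p_local:.2g}")
for n_trials in (1, 10, 100, 1000):
    print(f"{n_trials:5d} looks: global p-value = {global_p_value(p_local, n_trials):.2g}")
```

With a thousand independent looks a local effect at this level is more likely than not to be unremarkable, in line with the warning on the slide.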
45. Statistical vs. systematic errors
Statistical errors: How much would the result fluctuate upon repetition of the measurement? Implies some set of assumptions to define probability of outcome of the measurement.
Systematic errors: What is the uncertainty in my result due to uncertainty in my assumptions, e.g., model (theoretical) uncertainty; modeling of measurement apparatus. The sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc.
46. Systematic errors and nuisance parameters
Response of measurement apparatus is never modeled perfectly.
(Figure: $y$ (measured value) vs. $x$ (true value), with curves labelled model and truth)
Model can be made to approximate better the truth by including more free parameters.
systematic uncertainty ↔ nuisance parameters