Comparison of data distributions: the power of Goodness-of-Fit Tests

Description:

Goodman test (Kolmogorov-Smirnov test in chi-squared approximation) Kolmogorov-Smirnov test ... Goodman (15.9 0.2) ms. Generalised Girone (16.3 0.2) ms (0.44 ...

1
Comparison of data distributions: the power of Goodness-of-Fit Tests

B. Mascialino (1), A. Pfeiffer (2), M.G. Pia (1), A. Ribon (2), P. Viarengo (3)
(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland
(3) IST National Institute for Cancer Research, Genova, Italy

IEEE NSS 2006, San Diego, October 29 - November 5, 2006
2
Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions

Use cases in experimental physics
Regression testing: throughout the software life-cycle
Online DAQ: monitoring detector behaviour w.r.t. a reference
Simulation validation: comparison with experimental data
Reconstruction: comparison of reconstructed vs. expected distributions
Physics analysis: comparisons of experimental distributions, comparison with theoretical distributions
3
(No Transcript)
4
GoF algorithms in the Statistical Toolkit
TWO-SAMPLE PROBLEM

Binned distributions
Anderson-Darling test
Chi-squared test
Fisz-Cramer-von Mises test
Tiku test (Cramer-von Mises test in chi-squared approximation)

Unbinned distributions
Anderson-Darling test
Anderson-Darling approximated test
Cramer-von Mises test
Generalised Girone test
Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
Kolmogorov-Smirnov test
Kuiper test
Tiku test (Cramer-von Mises test in chi-squared approximation)
Weighted Kolmogorov-Smirnov test
Weighted Cramer-von Mises test
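
The slides do not show the Statistical Toolkit's own interface, so as a stand-in the sketch below runs two of the unbinned tests listed above (Kolmogorov-Smirnov and Cramer-von Mises) through SciPy. The sample sizes, parent distributions and seed are illustrative assumptions, not values from the presentation.

```python
import numpy as np
from scipy import stats

# Illustrative samples (assumed): a standard Gaussian and a shifted, wider one.
rng = np.random.default_rng(seed=1)
sample1 = rng.normal(loc=0.0, scale=1.0, size=200)
sample2 = rng.normal(loc=1.0, scale=1.5, size=200)

# Two-sample Kolmogorov-Smirnov test: based on the largest distance
# between the two empirical distribution functions.
ks = stats.ks_2samp(sample1, sample2)

# Two-sample Cramer-von Mises test: based on the integrated squared
# distance between the two empirical distribution functions.
cvm = stats.cramervonmises_2samp(sample1, sample2)

for name, result in [("KS", ks), ("CvM", cvm)]:
    print(f"{name}: statistic = {result.statistic:.4f}, p-value = {result.pvalue:.3g}")
```

A small p-value from either test is evidence against the hypothesis that the two samples share a parent distribution.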

5
Performance of the GoF tests
6
Power of GoF tests

Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
How good is a test at recognizing truly equivalent distributions and rejecting fake ones?

No comprehensive study of the relative power of GoF tests exists in the literature
  novel research in statistics (not only in physics data analysis!)
Systematic study of all existing GoF tests in progress
  made possible by the extensive collection of tests in the Statistical Toolkit

7
Method for the evaluation of power
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false
Pseudo-experiment: a random drawing of two samples from two parent distributions
Sample 1 (size n) drawn from parent distribution 1
Sample 2 (size n) drawn from parent distribution 2
GoF test applied to the two samples
N = 10000 Monte Carlo replicas
Confidence level = 0.05
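
The procedure on this slide can be sketched as a short Monte Carlo loop. This is a minimal illustration, not the authors' implementation: it assumes SciPy's two-sample Kolmogorov-Smirnov test as the default GoF test, and the helper names (`empirical_power`, `gaussian`, `exponential`) and the choice of parents are hypothetical. The example call uses fewer replicas than the 10000 quoted above only to keep it quick.

```python
import numpy as np
from scipy import stats


def empirical_power(draw1, draw2, n, test=stats.ks_2samp,
                    n_replicas=10000, alpha=0.05, seed=0):
    """Fraction of pseudo-experiments in which `test` rejects the null hypothesis.

    draw1, draw2: callables (rng, n) -> array of n values from each parent.
    test: two-sample test returning an object with a `pvalue` attribute.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_replicas):
        sample1 = draw1(rng, n)
        sample2 = draw2(rng, n)
        # Reject when the p-value falls below the chosen level.
        if test(sample1, sample2).pvalue < alpha:
            rejections += 1
    return rejections / n_replicas


def gaussian(rng, n):
    return rng.normal(0.0, 1.0, size=n)


def exponential(rng, n):
    return rng.exponential(1.0, size=n)


# Different parents: the rejection fraction estimates the power.
power = empirical_power(gaussian, exponential, n=30, n_replicas=500)
# Same parent: the rejection fraction estimates the false-rejection rate.
false_alarm = empirical_power(gaussian, gaussian, n=30, n_replicas=500)
print(f"power = {power:.3f}, false-rejection rate = {false_alarm:.3f}")
```

Passing another two-sample test as `test` (for instance `stats.cramervonmises_2samp`) allows the same pseudo-experiments to be used for comparing tests, since the seed fixes the samples drawn.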
8
Analysis cases

Data samples drawn from different parent distributions
Data samples drawn from the same parent distribution
  Applying a scale factor
  Applying a shift
Use cases in experimental physics
  Signal over background
  Hot channel, dead channel
  etc.

Power analysis on a set of reference mathematical distributions
Power analysis on some typical physics applications
Is there any recipe to identify the best test to use?
9
Parent reference distributions
10
[Figure labels: tailweight, skewness]
11
Compare different distributions: Parent1 ≠ Parent2
Unbinned distributions
12
The power increases as a function of the sample size
No clear winner
13
The power varies as a function of the parent distributions' characteristics
General recipe
p < 0.0001
14
Quantitative evaluation of the power of GoF tests
We propose a quantitative method to evaluate the power of various GoF tests.
15
Binned distributions
Compare different distributions: Parent1 ≠ Parent2
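
For binned data, a comparison of two histograms can be sketched with a chi-squared test of homogeneity. The example below is an assumption-laden illustration rather than the toolkit's implementation: it uses SciPy's `chi2_contingency` on the 2 x k table of bin contents, and the parents, binning and seed are made up for the purpose.

```python
import numpy as np
from scipy import stats

# Illustrative samples (assumed): two Gaussians differing by a shift.
rng = np.random.default_rng(seed=2)
sample1 = rng.normal(0.0, 1.0, size=1000)
sample2 = rng.normal(0.5, 1.0, size=1000)

# Common binning for both samples (assumed range and bin count).
edges = np.linspace(-4.0, 4.0, 17)
counts1, _ = np.histogram(sample1, bins=edges)
counts2, _ = np.histogram(sample2, bins=edges)

# Drop bins that are empty in both histograms: they carry no information
# and would produce zero expected counts.
occupied = (counts1 + counts2) > 0
table = np.vstack([counts1[occupied], counts2[occupied]])

# Chi-squared test of homogeneity on the 2 x k table of bin contents.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3g}")
```

The chi-squared approximation is least reliable where expected counts are small, so sparsely populated tail bins are often merged before such a test is applied.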
16
Preliminary results
CvM test: more powerful, faster (CPU time)
17
Physics use case
18
[Figure: empirical power versus sample size, in three panels labelled "?0.25 µ2.0", "?0.25 µ0.5" and "?0.75 µ3.5"; curves labelled K, AD, KS, CvM, W and WKSAD]
19
Conclusions

No clear winner for all the considered distributions
  in general, the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared
Practical recommendations
  first classify the type of the distributions in terms of skewness and tailweight
  then choose the most appropriate test for that type of distribution, selecting it by means of the proposed quantitative model
Systematic study of the power in progress
  for both binned and unbinned distributions
Topic still subject to research activity in the domain of statistics
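
The first practical recommendation, classifying a distribution by skewness and tailweight, can be sketched with quantile-based indicators. The transcript does not give the definitions the authors used, so the formulas below are an assumption: one common quantile-based choice, normalised so that a Gaussian scores 1 on both. The function name `skewness_tailweight` is hypothetical.

```python
import numpy as np
from scipy import stats


def skewness_tailweight(sample):
    """Quantile-based shape indicators (assumed definitions, not from the slides).

    skewness   = (q97.5 - q50) / (q50 - q2.5), equal to 1 for a symmetric parent
    tailweight = (q97.5 - q2.5) / (q87.5 - q12.5), divided by its Gaussian value
    """
    q025, q125, q50, q875, q975 = np.quantile(
        sample, [0.025, 0.125, 0.5, 0.875, 0.975])
    skewness = (q975 - q50) / (q50 - q025)
    # For a Gaussian the quantile ratio reduces to z(0.975) / z(0.875).
    gaussian_ratio = stats.norm.ppf(0.975) / stats.norm.ppf(0.875)
    tailweight = (q975 - q025) / (q875 - q125) / gaussian_ratio
    return skewness, tailweight


rng = np.random.default_rng(seed=3)
parents = {
    "gaussian": rng.normal(size=100_000),
    "exponential": rng.exponential(size=100_000),
    "cauchy": rng.standard_cauchy(size=100_000),
}
results = {name: skewness_tailweight(x) for name, x in parents.items()}
for name, (s, t) in results.items():
    print(f"{name}: skewness = {s:.2f}, tailweight = {t:.2f}")
```

Quantiles are used here instead of moments because they remain well defined for heavy-tailed parents such as the Cauchy, whose moments do not exist.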