Title: The Statistical Testing Project
1. The Statistical Testing Project
Stefania Donadio and Barbara Mascialino
January 15th, 2003
2. Aim of the project
This project will provide a new way of analysing
physical distributions of real data. It was
conceived as a tool for the statistical testing of
Geant4; its application areas are physics
validation, regression testing and system
testing. However, its generality may also be of
interest in other experimental contexts.
At the moment, the core statistical component
is designed to be applicable to the problem of
comparing two distributions, independently of
their origin.
3. Distributions
By means of this statistical tool, the user shall
be able to compare Geant4 simulation results with:
- equivalent reference distributions,
- experimental measurements,
- data libraries from reference distribution sources,
- functions deriving from theoretical calculations,
- functions deriving from fits.
4. Goodness-of-Fit tests
Goodness-of-fit tests are introduced with the
aim of verifying the hypothesis that
experimental data come from a random variable
whose distribution is known. This problem
is very important in both theoretical and
experimental analysis. The researcher must
decide whether the theoretical and experimental
distributions follow the same functional law. In
other words, the problem is the choice between
two alternative hypotheses:
$H_0: F_0(x) = F_T(x)$
$H_1: F_0(x) \neq F_T(x)$, $F_0(x) < F_T(x)$, or $F_0(x) > F_T(x)$
Of course, in this kind of test the acceptance of the null
hypothesis $H_0$ means that the researcher will be
able to specify the distribution analyzed.
5. GOF tests inserted in the statistical package
- Pearson's χ² test
- Kolmogorov test
- Kolmogorov-Smirnov test
- Anderson-Darling test (for both continuous and discrete distributions)
6. Description of tests
Pearson's chi-squared test was introduced to
study how well discrete (both quantitative and
qualitative) distributions fit. The Kolmogorov-Smirnov
test is very useful for verifying the
fit of a sample coming from a continuous random
variable. The Anderson-Darling test is
designed to be suitable for any data set
(Aksenov and Savageau, 2002) with any skewness
(symmetric, left-skewed or right-skewed
distributions). Moreover, it appears to be sensitive
to the fat tails of distributions.
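As an illustration of the statistics behind these tests, the following is a minimal Python sketch, not the package's actual code. The function names (`pearson_chi2`, `ks_distance`, `anderson_darling`) are hypothetical. The sketch assumes binned counts for the chi-squared statistic, two unbinned samples for the Kolmogorov-Smirnov distance, and a known continuous reference CDF for the Anderson-Darling statistic.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson chi-squared statistic for binned counts (hypothetical helper).

    Bins with zero expected count are skipped (an assumption of this sketch).
    """
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def ks_distance(sample_a, sample_b):
    """Largest distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def anderson_darling(sample, cdf):
    """Anderson-Darling A^2 of a sample against a continuous reference CDF.

    Requires a non-empty sample and 0 < cdf(x) < 1 for every sample value.
    """
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        # x[i - 1] is the i-th smallest value, x[n - i] the i-th largest.
        total += (2 * i - 1) * (
            math.log(cdf(x[i - 1])) + math.log(1.0 - cdf(x[n - i]))
        )
    return -n - total / n
```

For example, `ks_distance([1, 2], [3, 4])` gives 1.0 because the two samples do not overlap, while identical samples give 0.0. Turning any of these statistics into a confidence level additionally requires the corresponding reference distribution, which is not shown here.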
7. Other tests planned for GOF
Of course, the statistical package could be
extended with other goodness-of-fit tests, for
instance:
- Lilliefors test
- Cramer-von Mises test
- Kuiper test
- Bayesian methods
8. Other methods
The Kolmogorov-Smirnov test can be applied only to
continuous distributions. Physical distributions
are usually binned, and therefore not continuous.
Following Dagum, these binned distributions could
be fitted (a mixture of more than one fitted
function would also be possible).
In this way, the Kolmogorov-Smirnov test statistic
could be computed between the fitted function
and the theoretical distribution, simply changing
the number of degrees of freedom of the test.
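One way to read the last step is as the largest distance between two cumulative distribution functions, evaluated on a grid. The sketch below assumes that both the fitted function and the theoretical distribution are available as CDF callables. `cdf_sup_distance` and its parameters are hypothetical names, the exponential CDFs are arbitrary stand-ins, and neither the fit itself nor the adjustment of the degrees of freedom is shown.

```python
import math


def cdf_sup_distance(cdf_fit, cdf_theory, lo, hi, n_points=1001):
    """Approximate sup |cdf_fit(x) - cdf_theory(x)| on a uniform grid over [lo, hi]."""
    if n_points < 2:
        raise ValueError("need at least two grid points")
    best = 0.0
    for k in range(n_points):
        x = lo + (hi - lo) * k / (n_points - 1)
        best = max(best, abs(cdf_fit(x) - cdf_theory(x)))
    return best


def fitted_cdf(x):
    # Stand-in for a CDF obtained from a fit (rate 1.1 is an arbitrary choice).
    return 1.0 - math.exp(-1.1 * x)


def theory_cdf(x):
    # Stand-in for the theoretical CDF (unit-rate exponential).
    return 1.0 - math.exp(-x)


distance = cdf_sup_distance(fitted_cdf, theory_cdf, 0.0, 10.0)
```

A grid evaluation can slightly underestimate the true supremum, so the grid should be fine compared with the scale on which the two functions vary.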
9. User requirements
- Comparing distributions
- Converting distributions
- Confidence levels
- Handling distributions
- Treatment of errors
- Plotting
10. Software Design
- User layer <-> Developer layer
- Based on AIDA interfaces
- It is a general tool with an object-oriented approach
11. The code
- Chi-squared test -> OK
- Anderson-Darling test (discrete distributions)
- Kolmogorov-Smirnov test -> OK
- Anderson-Darling test (continuous distributions)
12. Problems with the existing code
The Chi-Squared Quality Checker needs a
Gamma function. One was found in
the GNU Scientific Library, but it
does not work for N > 171. This
could be a problem!
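The limit is consistent with the Gamma function exceeding the largest double-precision number for arguments slightly above 171. A common workaround is to work with the logarithm of the Gamma function instead (GSL offers `gsl_sf_lngamma` for this). The Python sketch below illustrates the idea with `math.lgamma`: it computes a chi-squared probability through the regularized incomplete gamma function without ever forming Gamma(N) itself. The function names are hypothetical, and whether the package's checker can be restructured this way is an assumption.

```python
import math


def _log_prefactor(a, x):
    # log of x^a * exp(-x) / Gamma(a), using lgamma so large a cannot overflow.
    return -x + a * math.log(x) - math.lgamma(a)


def upper_incomplete_gamma_q(a, x, eps=1e-14, max_iter=100000):
    """Regularized upper incomplete gamma function Q(a, x) for a > 0, x >= 0."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return 1.0
    if x < a + 1.0:
        # Series expansion for P(a, x); then Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(_log_prefactor(a, x))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(_log_prefactor(a, x))


def chi2_p_value(chi2, dof):
    """Probability of a chi-squared value at least this large for dof degrees of freedom."""
    return upper_incomplete_gamma_q(dof / 2.0, chi2 / 2.0)
```

With 400 degrees of freedom the Gamma argument is 200, where `math.gamma` raises `OverflowError`, yet `chi2_p_value(400.0, 400)` still returns a finite probability because only the logarithm is ever evaluated.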
13. Unit tests
Unit tests are to be performed on the statistical
package. We would welcome suggestions on
reference distributions to use as test cases
for the code.
- Acceptance test
- Integration test
- System test
- Unit test
Any suggestion?