Multivariate Discriminant Analysis applied to classification of νe CC events in MINOS

Title: Multivariate Discriminant Analysis applied to classification of νe CC events in MINOS


1
Multivariate Discriminant Analysis applied to
classification of νe CC events in MINOS
  • Alex Sousa
  • Tufts University
  • MINOS Collaboration Meeting
  • Fermilab
  • 03/28/2004

2
Multivariate Discriminant Analysis
  • Introduction
  • Developed by Pearson, Fisher and Mahalanobis,
    among others.
  • Applied to data that can be described by several
    variables, possibly correlated
  • Aims to find combinations of those variables that
    maximize separation between two or more classes
    present in the data.
  • Main uses include signal/background distinction,
    taxonomic classification, medical diagnosis, etc.

Karl Pearson (1857-1936)
R.A. Fisher (1890-1962)
P.C. Mahalanobis (1893-1972)
Simple Example
3
MDA Procedure
  • Define a set of variables that
    appropriately describes the data sample.
  • Calculate the covariance matrix for each class
  • Determine the Mahalanobis distance to each class
    for each event
  • Compute the probabilities for an event to belong
    to each class (scores).
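
The four steps above can be sketched in a few lines. This is a minimal illustration, not the SAS DISCRIM procedure or the Tufts code: it assumes Gaussian class densities, takes priors from the training-sample fractions, and the function names (fit_classes, mahalanobis_sq, scores) are hypothetical.

```python
import numpy as np

def fit_classes(samples):
    """samples: dict mapping class label -> (n_events, n_vars) array.
    Returns, per class, the mean, inverse covariance, log-determinant, and prior."""
    total = sum(len(x) for x in samples.values())
    model = {}
    for label, x in samples.items():
        x = np.asarray(x, dtype=float)
        cov = np.atleast_2d(np.cov(x, rowvar=False))  # one covariance matrix per class
        model[label] = {
            "mean": x.mean(axis=0),
            "inv_cov": np.linalg.inv(cov),
            "log_det": np.linalg.slogdet(cov)[1],
            "prior": len(x) / total,  # assumption: prior = training fraction
        }
    return model

def mahalanobis_sq(event, params):
    """Squared Mahalanobis distance from one event to one class."""
    d = np.asarray(event, dtype=float) - params["mean"]
    return float(d @ params["inv_cov"] @ d)

def scores(event, model):
    """Probability for the event to belong to each class (Gaussian assumption)."""
    log_w = {
        label: np.log(p["prior"]) - 0.5 * (p["log_det"] + mahalanobis_sq(event, p))
        for label, p in model.items()
    }
    top = max(log_w.values())  # subtract the maximum for numerical stability
    w = {label: np.exp(v - top) for label, v in log_w.items()}
    norm = sum(w.values())
    return {label: v / norm for label, v in w.items()}

# Toy usage with made-up two-variable samples (not MINOS data).
rng = np.random.default_rng(0)
toy = {
    "signal": rng.normal([0.0, 0.0], 1.0, size=(500, 2)),
    "background": rng.normal([3.0, 3.0], 1.0, size=(500, 2)),
}
toy_model = fit_classes(toy)
toy_scores = scores([0.0, 0.0], toy_model)
```

Because each class keeps its own covariance matrix, the resulting decision boundary is quadratic; pooling the covariances instead would give a linear discriminant.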

4
Implementation
  • Use the SAS package
  • Very complete Statistical Analysis software
    bundle. Available at the amber cluster at Tufts.
  • Built-in procedures such as DISCRIM and STEPDISC
    perform most of the tasks necessary in MDA for an
    arbitrary number of variables and classes.
  • Write your own code
  • Validated and debugged with Soudan2 data and SAS
    for 2 classes and arbitrary number of variables.
  • Higher flexibility will be useful for future
    sensitivity studies.

5
Samples and Variables
  • Sample contents
  • Constructed from combination of two old Far MC
    files.
  • Variables
  • Total of 60 variables assembled from samples.

6
Tufts Variables
  • Proposed by J. Schneps, based on 3D Hits obtained
    from matching U and V views.
[Figure: 3D geometry diagram, labels: x, y, z, θ, νμ, μ]
[Plots, νe vs NC: E_Hit_long, ThetaL_Hit_Max, Ratio_Hit_Energ, ThetaE_Hit_Max]

7
Variable Selection
  • Occam's Razor: "Frustra fit per plura quod potest
    fieri per pauciora" (it is vain to do with more
    what can be done with fewer).
  • MDA provides more reliable results when highly
    redundant (correlated) variables with similar
    discriminating power are eliminated. A smaller set
    of variables also makes the method more manageable
    to apply.
  • SAS built-in Stepwise procedure
  • measures the discriminating power of each
    variable and adds the best one at each step.
  • Ideally the selected variables would be normally
    distributed. If they are not, continuous variables
    can still be "gaussianized". However, this needs
    to be looked at carefully.
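
The idea of adding the best variable at each step can be sketched as a greedy forward selection. This is a simplified stand-in, not SAS STEPDISC: it assumes Wilks' lambda (determinant of the within-class scatter over determinant of the total scatter) as the measure of discriminating power, stops at a fixed number of variables rather than at a significance test, and never removes a variable once added. The function names are hypothetical.

```python
import numpy as np

def wilks_lambda(x, y, cols):
    """det(within-class scatter) / det(total scatter) for the chosen columns.
    Smaller values mean better class separation."""
    sub = x[:, cols]
    centered = sub - sub.mean(axis=0)
    total = centered.T @ centered
    within = np.zeros_like(total)
    for label in np.unique(y):
        grp = sub[y == label]
        dev = grp - grp.mean(axis=0)
        within += dev.T @ dev
    return np.linalg.det(within) / np.linalg.det(total)

def forward_select(x, y, n_keep):
    """Greedily add the variable that lowers Wilks' lambda the most."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    chosen = []
    remaining = list(range(x.shape[1]))
    while remaining and len(chosen) < n_keep:
        best = min(remaining, key=lambda j: wilks_lambda(x, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy usage with made-up data: column 0 separates the classes strongly,
# column 1 is pure noise, column 2 separates them moderately,
# and column 3 is a noisy copy of column 0 (a redundant variable).
rng = np.random.default_rng(1)
n = 1000
labels = np.repeat([0, 1], n)
strong = rng.normal(0.0, 1.0, 2 * n) + 3.0 * labels
noise = rng.normal(0.0, 1.0, 2 * n)
moderate = rng.normal(0.0, 1.0, 2 * n) + 1.5 * labels
redundant = strong + rng.normal(0.0, 0.1, 2 * n)
data = np.column_stack([strong, noise, moderate, redundant])
picked = forward_select(data, labels, n_keep=2)
```

In this toy case the second pick is the moderately discriminating column rather than the noisy copy, because the copy adds almost nothing once the original is already in the set.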

8
Probability Distributions
Training Sample
Training Sample (E Cut)
  • We can make an educated guess on an appropriate
    threshold by calculating the Figure of Merit for
    each threshold value.
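
A threshold scan of this kind might look as follows. The slides do not define the Figure of Merit, so this sketch assumes FOM = S / sqrt(S + B), where S and B are the numbers of signal and background events passing the cut; efficiency and purity are reported alongside. Events are unweighted here, whereas a real analysis would scale each class to its expected rate. The function names are hypothetical.

```python
import numpy as np

def scan_thresholds(signal_scores, background_scores, thresholds):
    """Return (threshold, efficiency, purity, fom) for each threshold."""
    sig = np.asarray(signal_scores, dtype=float)
    bkg = np.asarray(background_scores, dtype=float)
    rows = []
    for t in thresholds:
        n_sig = int((sig >= t).sum())   # signal events passing the cut
        n_bkg = int((bkg >= t).sum())   # background events passing the cut
        selected = n_sig + n_bkg
        efficiency = n_sig / len(sig)
        purity = n_sig / selected if selected else 0.0
        # Assumed figure of merit: S / sqrt(S + B)
        fom = n_sig / np.sqrt(selected) if selected else 0.0
        rows.append((t, efficiency, purity, fom))
    return rows

def best_threshold(rows):
    """Threshold with the highest figure of merit."""
    return max(rows, key=lambda r: r[3])[0]

# Toy usage with made-up scores (not MINOS results).
toy_rows = scan_thresholds(
    signal_scores=[0.9, 0.8, 0.7, 0.6],
    background_scores=[0.1, 0.2, 0.3, 0.65],
    thresholds=[0.0, 0.5, 0.7],
)
toy_best = best_threshold(toy_rows)
```

Swapping in a different figure of merit only requires changing the one line that computes fom.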

9
Efficiency and Purity vs Energy
Training Sample
Training Sample (E Cut)
10
Results
Training Sample
Training Sample (E Cut)
Test Sample (E Cut)
Test Sample
11
Results (contd)
Test Sample (E Cut)
12
Results (contd)
  • Comparison with Numi-714

13
Conclusions
  • Preliminary results from applying a Multivariate
    Discriminant Analysis method to νe CC/NC/νμ CC
    separation were presented. Much work remains
    to be done.
  • Define a variable selection method. Make sure
    variables are close to gaussian behavior so as to
    improve separation efficiency.
  • Need to test on several samples for each training
    to assess possible biases.
  • Look at the evolution of results for the new MC νe
    files. Will start processing Robert's new files
    as soon as possible.
  • Work with Harvard group on developing a common
    framework to generate working samples.
  • Will count on the help of C. Gomez-Abajo.
  • W.A. Mann has volunteered his experience to do a
    limited blind manual scanning of a sample
    containing νe CC, NC, and νμ CC events.