Support Vector Machines: Get more Higgs out of your data Daniel Whiteson UC Berkeley - PowerPoint PPT Presentation

1
Support Vector Machines: Get more Higgs out of your data
Daniel Whiteson, UC Berkeley
2
Multivariate algorithms
Square cuts may work well for simpler tasks, but as the data become multivariate, the algorithms must be multivariate as well.
3
Multivariate Algorithms
  • HEP overlaps with Computer Science, Mathematics, and Statistics in this area.
  • How can we construct an algorithm that can be taught by example and generalize effectively?
  • We can use solutions from those fields:
    • Neural Networks
    • Probability Density Estimators
    • Support Vector Machines

4
Neural Networks
  • Constructed from very simple objects, they can learn complex patterns.
  • The decision function is learned using the freedom in the hidden layers.
  • Used very effectively as signal discriminators, particle identifiers, and parameter estimators.
  • Fast evaluation makes them well suited to triggers.
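As an illustration, here is a minimal sketch of such a discriminator: a single hidden layer trained by batch gradient descent to separate a toy one-dimensional "signal" from "background". The data, layer sizes, and learning rate are invented for the example, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (invented): "signal" clustered at +1, "background" at -1.
x = np.concatenate([rng.normal(+1.0, 0.5, 200), rng.normal(-1.0, 0.5, 200)])[:, None]
y = np.concatenate([np.ones(200), np.zeros(200)])   # 1 = signal, 0 = background

# One hidden layer of 4 sigmoid units; the output sigmoid is the discriminant.
W1, b1 = rng.normal(0, 1, (1, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

eta = 0.5
for _ in range(2000):                      # plain batch gradient descent
    h = sig(x @ W1 + b1)                   # hidden activations
    p = sig(h @ W2 + b2).ravel()           # network output in [0, 1]
    d_out = (p - y)[:, None] / len(y)      # cross-entropy + sigmoid gradient
    d_hid = d_out @ W2.T * h * (1 - h)     # backpropagate through the hidden layer
    W2 -= eta * (h.T @ d_out); b2 -= eta * d_out.sum(0)
    W1 -= eta * (x.T @ d_hid); b1 -= eta * d_hid.sum(0)

p = sig(sig(x @ W1 + b1) @ W2 + b2).ravel()
acc = float(np.mean((p > 0.5) == (y == 1)))
```

Once trained, classifying an event is just two small matrix products, which is why evaluation is fast enough for triggers.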

5
Probability Density Estimation
If we knew the distributions of the signal, fs(x), and the background, fb(x), then we could calculate the likelihood-ratio discriminant
D(x) = fs(x) / (fs(x) + fb(x))
and use it to discriminate.
[Figure: example discriminant surface]
6
Probability Density Estimation
Of course, we do not know the analytical distributions.
  • Given a set of points drawn from a
    distribution, put down a kernel centered at each
    point.
  • With high statistics, this approximates a
    smooth probability density.

[Figure: a surface built from many kernels]
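The kernel-density idea above, combined with the discriminant from the previous slide, can be sketched in a few lines. The Gaussian kernel, its bandwidth, and the toy signal/background samples here are all assumptions for illustration:

```python
import numpy as np

def kde(points, x, h=0.3):
    # Gaussian kernel of width h centred at each training point, then averaged.
    u = (x[:, None] - points[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
sig_train = rng.normal(+1.0, 0.5, 1000)   # stand-in for a signal sample
bkg_train = rng.normal(-1.0, 0.5, 1000)   # stand-in for a background sample

x = np.array([-1.0, 0.0, 1.0])
fs, fb = kde(sig_train, x), kde(bkg_train, x)
D = fs / (fs + fb)                         # likelihood-ratio discriminant
```

Note that every evaluation of D touches every training point, which is the cost the SVM section below sets out to avoid.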
7
Probability Density Estimation
  • Simple techniques have advanced to more sophisticated approaches:
    • Adaptive PDE: varies the width of the kernel for smoothness, and is generalized for regression analysis to measure the value of a continuous parameter.
    • GEM: measures the local covariance and adjusts the individual kernels to give a more accurate estimate.

8
Support Vector Machines
  • PDEs must evaluate a kernel at every training point for every classification of a data point.
  • Can we build a decision surface that uses only the relevant bits of information, the points in the training set that are near the signal-background boundary?

For a linearly separable case, this is not too difficult: we simply need to find the hyperplane that maximizes the separation.
9
Support Vector Machines
  • To find the hyperplane that gives the highest separation (lowest energy), we maximize the Lagrangian with respect to the α_i:

L = Σ_i α_i - (1/2) Σ_i,j α_i α_j y_i y_j (x_i · x_j)

(x_i, y_i) are the training data; the α_i are positive Lagrange multipliers.
The solution is

w = Σ_i α_i y_i x_i

where α_i = 0 for the non-support vectors.
(images from the applet at http://svm.research.bell-labs.com/)
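One simple way to sketch this maximization is gradient ascent on the dual with each α_i clipped to [0, C], in the style of the kernel-adatron algorithm; for simplicity the sketch omits the bias term, which removes the constraint Σ_i α_i y_i = 0. The toy data and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Linearly separable toy data (invented): two 2-D blobs with labels y = +/-1.
X = np.vstack([rng.normal(+2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

K = X @ X.T                        # linear kernel: K_ij = x_i . x_j
alpha = np.zeros(len(y))
eta, C = 0.001, 10.0

for _ in range(2000):
    # Gradient of L = sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j K_ij
    grad = 1.0 - y * (K @ (alpha * y))
    alpha = np.clip(alpha + eta * grad, 0.0, C)   # keep 0 <= a_i <= C

w = (alpha * y) @ X                # solution: w = sum_i a_i y_i x_i
n_sv = int(np.sum(alpha > 1e-6))   # only the support vectors keep a_i > 0
pred = np.sign(X @ w)
```

At convergence the α_i of points far from the boundary are driven to zero, so only the support vectors contribute to the decision function.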
10
Support Vector Machines
But not many problems of interest are linear.
  • Map the data to a higher-dimensional space where the separation can be made by hyperplanes.
  • We want to work in our original space, so we replace the dot product with a kernel function: K(x_i, x_j) = φ(x_i) · φ(x_j).
  • For these data, a nonlinear kernel is needed.
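The kernel substitution can be checked numerically. For the degree-2 polynomial kernel in two dimensions, the explicit feature map φ(x) = (x1², x2², √2·x1·x2) reproduces the kernel value exactly; the test points below are arbitrary:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # (x . y)^2 = phi(x) . phi(y), with phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
    x1, x2 = v
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def K(a, b):
    return float(a @ b) ** 2       # kernel evaluated in the original 2-D space

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = K(a, b)                      # (1*3 + 2*(-1))^2 = 1.0
rhs = float(phi(a) @ phi(b))       # same value via the explicit 3-D map
```

The kernel computes the dot product in the higher-dimensional space without ever constructing that space explicitly.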
11
Support Vector Machines
Nor are problems that are not entirely separable very difficult.
  • Allow an imperfect decision boundary, but add a penalty.
  • Training errors, points on the wrong side of the boundary, are indicated by crosses.
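One common way to realize such a penalty, sketched here as a hinge-loss (soft-margin) objective minimized in the primal by subgradient descent rather than the dual form of the earlier slides, with overlapping toy data invented for the example:

```python
import numpy as np

rng = np.random.default_rng(3)
# Overlapping toy classes (invented): no hyperplane separates them perfectly.
X = np.vstack([rng.normal(+1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = np.zeros(2), 0.0
lam, eta = 0.01, 0.05
for _ in range(1000):
    margins = y * (X @ w + b)
    viol = margins < 1                               # inside the margin or misclassified
    # Subgradient of (lam/2)|w|^2 + mean(max(0, 1 - y (w.x + b)))
    gw = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y)
    gb = -y[viol].sum() / len(y)
    w -= eta * gw
    b -= eta * gb

errors = int(np.sum(y * (X @ w + b) < 0))            # wrong side of the boundary
```

Some training errors remain by construction; the penalty simply keeps their number, and the margin violations, small.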

12
Support Vector Machines
We are not limited to linear or polynomial kernels.
  • A Gaussian kernel, K(x, y) = exp(-|x - y|² / 2σ²), gives a highly flexible SVM.
  • Gaussian-kernel SVMs outperformed PDEs in recognizing handwritten digits from the USPS database.
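A minimal sketch of the Gaussian kernel itself, showing that it equals 1 for identical points and falls off rapidly with distance (the width and test points are arbitrary):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # K(a, b) = exp(-|a - b|^2 / (2 sigma^2)): similarity decays with distance
    d2 = float(np.sum((a - b) ** 2))
    return np.exp(-d2 / (2.0 * sigma**2))

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
near = gaussian_kernel(a, a)       # identical points -> exactly 1.0
far = gaussian_kernel(a, b)        # |a - b| = 5 -> exp(-12.5), essentially 0
```

Because its implicit feature space is infinite-dimensional, the Gaussian kernel can carve out very flexible decision boundaries, at the price of having to tune σ by hand.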

13
Comparative study for HEP
Signal: Wh to bb. Backgrounds: Wbb, tt, WZ.
2-dimensional discriminant built from the variables Mjj and Ht.
[Figures: discriminator-value distributions for the Neural Net, PDE, and SVM]
14
Comparative study for HEP
[Figures: signal-to-noise enhancement vs. discriminator threshold, at efficiencies of 43%, 50%, and 49%]
All of these methods provide powerful signal enhancement.
15
Algorithm Comparisons
  • Neural Nets. Advantages: very fast evaluation. Disadvantages: structure must be built by hand; black box; local optimization.
  • PDE. Advantages: transparent operation. Disadvantages: slow evaluation; requires high statistics.
  • SVM. Advantages: fast evaluation; kernel positions chosen automatically; global optimization. Disadvantages: complex; training can be time-intensive; kernel selection by hand.
16
Conclusions
  • Difficult problems in HEP overlap with those in other fields. We can take advantage of our colleagues' years of thought and effort.
  • There are many areas of HEP analysis where intelligent multivariate algorithms like NNs, PDEs, and SVMs can help us conduct more powerful searches and make more precise measurements.