Support Vector Machines: Get more Higgs out of your data Daniel Whiteson UC Berkeley - PowerPoint PPT Presentation

1
Support Vector Machines: Get more Higgs out of your data
Daniel Whiteson, UC Berkeley
2
Multivariate algorithms
Square cuts may work well for simpler tasks, but as the data become multivariate, the algorithms must be multivariate as well.
3
Multivariate Algorithms
  • HEP overlaps with Computer Science, Mathematics, and Statistics in this area.
  • How can we construct an algorithm that can be taught by example and generalize effectively?
  • We can use solutions from those fields:
    • Neural Networks
    • Probability Density Estimators
    • Support Vector Machines

4
Neural Networks
  • Constructed from very simple objects, they can learn complex patterns.
  • The decision function is learned using the freedom in the hidden layers.
  • Used very effectively as signal discriminators, particle identifiers, and parameter estimators.
  • Fast evaluation makes them well suited to triggers.
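As an illustration, here is a minimal sketch of such a discriminator: a single hidden layer trained by batch gradient descent to separate a toy one-dimensional "signal" from "background". The data, layer sizes, and learning rate are invented for the example, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (invented): "signal" clustered at +1, "background" at -1.
x = np.concatenate([rng.normal(+1.0, 0.5, 200), rng.normal(-1.0, 0.5, 200)])[:, None]
y = np.concatenate([np.ones(200), np.zeros(200)])   # 1 = signal, 0 = background

# One hidden layer of 4 sigmoid units; the output sigmoid is the discriminant.
W1, b1 = rng.normal(0, 1, (1, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

eta = 0.5
for _ in range(2000):                      # plain batch gradient descent
    h = sig(x @ W1 + b1)                   # hidden activations
    p = sig(h @ W2 + b2).ravel()           # network output in [0, 1]
    d_out = (p - y)[:, None] / len(y)      # cross-entropy + sigmoid gradient
    d_hid = d_out @ W2.T * h * (1 - h)     # backpropagate through the hidden layer
    W2 -= eta * (h.T @ d_out); b2 -= eta * d_out.sum(0)
    W1 -= eta * (x.T @ d_hid); b1 -= eta * d_hid.sum(0)

p = sig(sig(x @ W1 + b1) @ W2 + b2).ravel()
acc = float(np.mean((p > 0.5) == (y == 1)))
```

Once trained, classifying an event is just two small matrix products, which is why evaluation is fast enough for triggers.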

5
Probability Density Estimation
If we knew the distributions of the signal, fs(x), and the background, fb(x), then we could calculate the likelihood-ratio discriminant
D(x) = fs(x) / (fs(x) + fb(x))
and use it to discriminate.
[Figure: example discriminant surface]
6
Probability Density Estimation
Of course, we do not know the analytical distributions.
  • Given a set of points drawn from a
    distribution, put down a kernel centered at each
    point.
  • With high statistics, this approximates a
    smooth probability density.

[Figure: a surface built from many kernels]
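The kernel-density idea above, combined with the discriminant from the previous slide, can be sketched in a few lines. The Gaussian kernel, its bandwidth, and the toy signal/background samples here are all assumptions for illustration:

```python
import numpy as np

def kde(points, x, h=0.3):
    # Gaussian kernel of width h centred at each training point, then averaged.
    u = (x[:, None] - points[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
sig_train = rng.normal(+1.0, 0.5, 1000)   # stand-in for a signal sample
bkg_train = rng.normal(-1.0, 0.5, 1000)   # stand-in for a background sample

x = np.array([-1.0, 0.0, 1.0])
fs, fb = kde(sig_train, x), kde(bkg_train, x)
D = fs / (fs + fb)                         # likelihood-ratio discriminant
```

Note that every evaluation of D touches every training point, which is the cost the SVM section below sets out to avoid.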
7
Probability Density Estimation
  • Simple techniques have advanced to more sophisticated approaches:
    • Adaptive PDE: varies the width of the kernel for smoothness, and is generalized for regression analysis to measure the value of a continuous parameter.
    • GEM: measures the local covariance and adjusts the individual kernels to give a more accurate estimate.

8
Support Vector Machines
  • PDEs must evaluate a kernel at every training point for every classification of a data point.
  • Can we build a decision surface that uses only the relevant bits of information, the points in the training set that are near the signal-background boundary?

For a linearly separable case, this is not too difficult: we simply need to find the hyperplane that maximizes the separation.
9
Support Vector Machines
  • To find the hyperplane that gives the highest separation (lowest energy), we maximize the Lagrangian with respect to the α_i:

L = Σ_i α_i - (1/2) Σ_i,j α_i α_j y_i y_j (x_i · x_j)

(x_i, y_i) are the training data; the α_i are positive Lagrange multipliers.
The solution is

w = Σ_i α_i y_i x_i

where α_i = 0 for the non-support vectors.
(images from the applet at http://svm.research.bell-labs.com/)
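One simple way to sketch this maximization is gradient ascent on the dual with each α_i clipped to [0, C], in the style of the kernel-adatron algorithm; for simplicity the sketch omits the bias term, which removes the constraint Σ_i α_i y_i = 0. The toy data and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Linearly separable toy data (invented): two 2-D blobs with labels y = +/-1.
X = np.vstack([rng.normal(+2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

K = X @ X.T                        # linear kernel: K_ij = x_i . x_j
alpha = np.zeros(len(y))
eta, C = 0.001, 10.0

for _ in range(2000):
    # Gradient of L = sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j K_ij
    grad = 1.0 - y * (K @ (alpha * y))
    alpha = np.clip(alpha + eta * grad, 0.0, C)   # keep 0 <= a_i <= C

w = (alpha * y) @ X                # solution: w = sum_i a_i y_i x_i
n_sv = int(np.sum(alpha > 1e-6))   # only the support vectors keep a_i > 0
pred = np.sign(X @ w)
```

At convergence the α_i of points far from the boundary are driven to zero, so only the support vectors contribute to the decision function.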
10
Support Vector Machines
But not many problems of interest are linear.
  • Map the data to a higher-dimensional space where the separation can be made by hyperplanes.
  • We want to work in our original space, so we replace the dot product with a kernel function: K(x_i, x_j) = φ(x_i) · φ(x_j).
  • For these data, a nonlinear kernel is needed.
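The kernel substitution can be checked numerically. For the degree-2 polynomial kernel in two dimensions, the explicit feature map φ(x) = (x1², x2², √2·x1·x2) reproduces the kernel value exactly; the test points below are arbitrary:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # (x . y)^2 = phi(x) . phi(y), with phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
    x1, x2 = v
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def K(a, b):
    return float(a @ b) ** 2       # kernel evaluated in the original 2-D space

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = K(a, b)                      # (1*3 + 2*(-1))^2 = 1.0
rhs = float(phi(a) @ phi(b))       # same value via the explicit 3-D map
```

The kernel computes the dot product in the higher-dimensional space without ever constructing that space explicitly.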
11
Support Vector Machines
Nor are problems that are not entirely separable very difficult.
  • Allow an imperfect decision boundary, but add a penalty.
  • Training errors, points on the wrong side of the boundary, are indicated by crosses.
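One common way to realize such a penalty, sketched here as a hinge-loss (soft-margin) objective minimized in the primal by subgradient descent rather than the dual form of the earlier slides, with overlapping toy data invented for the example:

```python
import numpy as np

rng = np.random.default_rng(3)
# Overlapping toy classes (invented): no hyperplane separates them perfectly.
X = np.vstack([rng.normal(+1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = np.zeros(2), 0.0
lam, eta = 0.01, 0.05
for _ in range(1000):
    margins = y * (X @ w + b)
    viol = margins < 1                               # inside the margin or misclassified
    # Subgradient of (lam/2)|w|^2 + mean(max(0, 1 - y (w.x + b)))
    gw = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y)
    gb = -y[viol].sum() / len(y)
    w -= eta * gw
    b -= eta * gb

errors = int(np.sum(y * (X @ w + b) < 0))            # wrong side of the boundary
```

Some training errors remain by construction; the penalty simply keeps their number, and the margin violations, small.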

12
Support Vector Machines
We are not limited to linear or polynomial kernels.
  • A Gaussian kernel, K(x, y) = exp(-|x - y|² / 2σ²), gives a highly flexible SVM.
  • Gaussian-kernel SVMs outperformed PDEs in recognizing handwritten digits from the USPS database.
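A minimal sketch of the Gaussian kernel itself, showing that it equals 1 for identical points and falls off rapidly with distance (the width and test points are arbitrary):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # K(a, b) = exp(-|a - b|^2 / (2 sigma^2)): similarity decays with distance
    d2 = float(np.sum((a - b) ** 2))
    return np.exp(-d2 / (2.0 * sigma**2))

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
near = gaussian_kernel(a, a)       # identical points -> exactly 1.0
far = gaussian_kernel(a, b)        # |a - b| = 5 -> exp(-12.5), essentially 0
```

Because its implicit feature space is infinite-dimensional, the Gaussian kernel can carve out very flexible decision boundaries, at the price of having to tune σ by hand.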

13
Comparative study for HEP
Signal: Wh to bb. Backgrounds: Wbb, tt, WZ.
2-dimensional discriminant built from the variables Mjj and Ht.
[Figures: discriminator-value distributions for the Neural Net, PDE, and SVM]
14
Comparative study for HEP
[Figures: signal-to-noise enhancement vs. discriminator threshold, at efficiencies of 43%, 50%, and 49%]
All of these methods provide powerful signal enhancement.
15
Algorithm Comparisons
  • Neural Nets. Advantages: very fast evaluation. Disadvantages: structure must be built by hand; black box; local optimization.
  • PDE. Advantages: transparent operation. Disadvantages: slow evaluation; requires high statistics.
  • SVM. Advantages: fast evaluation; kernel positions chosen automatically; global optimization. Disadvantages: complex; training can be time-intensive; kernel selection by hand.
16
Conclusions
  • Difficult problems in HEP overlap with those in other fields. We can take advantage of our colleagues' years of thought and effort.
  • There are many areas of HEP analysis where intelligent multivariate algorithms like NNs, PDEs, and SVMs can help us conduct more powerful searches and make more precise measurements.