Title: Support Vector Machines in Data Mining (AFOSR Software Systems Annual Meeting)
1. Support Vector Machines in Data Mining
AFOSR Software Systems Annual Meeting, Syracuse, NY
June 3-7, 2002
Data Mining Institute, University of Wisconsin - Madison
2. What is a Support Vector Machine?
- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function
3. What are Support Vector Machines Used For?
- Classification
- Regression (data fitting)
- Supervised and unsupervised learning
4. Principal Contributions
- Lagrangian support vector machine classification
  - Fast, simple, unconstrained iterative method
- Reduced support vector machine classification
  - Accurate nonlinear classifier using random sampling
- Proximal support vector machine classification
  - Classify by proximity to planes instead of halfspaces
- Massive incremental classification
  - Classify by retiring old data and adding new data
- Knowledge-based classification
  - Incorporate expert knowledge into the classifier
- Fast Newton method classifier
  - Finitely terminating fast algorithm for classification
- Breast cancer prognosis and chemotherapy
  - Classify patients on the basis of distinct survival curves
5. Principal Contributions
- Proximal support vector machine classification
6. Support Vector Machines: Maximize the Margin between Bounding Planes
[Figure: the two classes A+ and A- separated by bounding planes]
7. Proximal Support Vector Machines: Maximize the Margin between Proximal Planes
[Figure: the two classes A+ and A- fitted by proximal planes]
8. Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
9. Standard Support Vector Machine Formulation
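The displayed formulation did not survive this text export. As a reconstruction (hedged: this is the standard linear SVM in the notation common to the PSVM literature, with A the m-by-n data matrix, D the diagonal matrix of +1/-1 labels, e a vector of ones, y the slack vector, and nu the tradeoff parameter), it reads:

```latex
\min_{w,\gamma,y}\ \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad y \ge 0
```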
10. PSVM Formulation
Standard SVM formulation
This simple but critical modification changes the nature of the optimization problem tremendously!
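The modification itself is not visible in this export. In the published PSVM formulation, the inequality constraint becomes an equality, the 1-norm slack penalty becomes a squared 2-norm, and gamma squared joins the regularizer (with A the data matrix, D the diagonal label matrix, e a vector of ones):

```latex
\min_{w,\gamma,y}\ \nu\,\tfrac{1}{2}\,\|y\|^{2} + \tfrac{1}{2}\left(w^{\top} w + \gamma^{2}\right)
\quad \text{s.t.} \quad D(Aw - e\gamma) + y = e
```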
11. Advantages of New Formulation
- Objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
- Exact leave-one-out correctness can be obtained in terms of the problem data.
12. Linear PSVM
- Setting the gradient equal to zero gives a nonsingular system of linear equations.
- Solution of the system gives the desired PSVM classifier.
13. Linear PSVM Solution
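The displayed solution is missing from this export. From the gradient condition of the PSVM objective, writing H = [A  -e] and stacking the unknowns as r = (w; gamma), the solution is a single linear solve:

```latex
r \;=\; \begin{pmatrix} w \\ \gamma \end{pmatrix}
\;=\; \left(\frac{I}{\nu} + H^{\top} H\right)^{-1} H^{\top} D e,
\qquad H = \left[\,A \;\; -e\,\right]
```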
14. Linear and Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A, d, nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A, d, nu);
[m, n] = size(A); e = ones(m, 1); H = [A -e];
v = (d' * H)';                      % v = H'*D*e
r = (speye(n+1)/nu + H'*H) \ v;     % solve (I/nu + H'*H) r = v
w = r(1:n); gamma = r(n+1);         % getting w, gamma from r
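For readers without MATLAB, the same one-system solve can be sketched in NumPy. This is a minimal translation of the linear case only; the function name `psvm_linear` is mine, not from the talk:

```python
import numpy as np

def psvm_linear(A, d, nu):
    """Linear proximal SVM: solve (I/nu + H'H) r = H'De for r = [w; gamma].

    A  : (m, n) data matrix
    d  : (m,) vector of +1/-1 labels, i.e. the diagonal of D
    nu : positive tradeoff parameter
    """
    m, n = A.shape
    e = np.ones(m)
    H = np.hstack([A, -e[:, None]])   # H = [A  -e]
    v = H.T @ d                        # v = H' * D * e = H' * d
    r = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return r[:n], r[n]                 # w, gamma
```

A new point x is then classified as sign(x @ w - gamma).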
15. Numerical Experiments: One-Billion Two-Class Dataset
- Synthetic dataset consisting of 1 billion points in 10-dimensional input space
- Generated by the NDC (Normally Distributed Clustered) dataset generator
- Dataset divided into 500 blocks of 2 million points each
- Solution obtained in less than 2 hours and 26 minutes
- About 30% of the time was spent reading data from disk
- Testing set correctness: 90.79%
16. Principal Contributions
- Knowledge-based classification
17. Conventional Data-Based SVM
18. Knowledge-Based SVM via Polyhedral Knowledge Sets
19. Incorporating Knowledge Sets Into an SVM Classifier
- This implication is equivalent to a set of constraints that can be imposed on the classification problem.
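The implication referred to here did not survive this export. In the knowledge-based SVM setting, a polyhedral knowledge set {x : Bx <= b} assumed to lie in the +1 halfspace yields the implication below, which (by linear programming duality, assuming the knowledge set is nonempty) is equivalent to a set of linear constraints in a new variable u:

```latex
Bx \le b \;\Longrightarrow\; w^{\top} x \ge \gamma + 1
\quad\Longleftrightarrow\quad
\exists\, u \ge 0 :\;\; B^{\top} u + w = 0,\;\; b^{\top} u + \gamma + 1 \le 0
```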
20. Numerical Testing: The Promoter Recognition Dataset
- Promoter: a short DNA sequence that precedes a gene sequence
- A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}
- Important to distinguish between promoters and nonpromoters
- This distinction identifies starting locations of genes in long uncharacterized DNA sequences
21. The Promoter Recognition Dataset: Comparative Test Results
22. Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
- 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
- 32 numerical features
- The domain theory: two simple rules used by doctors
23. Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
- Doctors' rules applicable to only 32 out of 110 patients
- Only 22 of 32 patients are classified correctly by this rule (20% correctness)
- KSVM linear classifier applicable to all patients, with correctness of 66.4%
- Correctness comparable to best available results using conventional SVMs
- KSVM can obtain classifiers based on knowledge without using any data
24. Principal Contributions
- Fast Newton method classifier
25. Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation of SVM
26. Newton Algorithm
- Newton algorithm terminates in a finite number of steps
  - Termination at global minimum
  - Error rate decreases linearly
- Can generate complex nonlinear classifiers
  - By using nonlinear kernels K(x, y)
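The kernel itself is not specified on this slide. A common concrete choice is the Gaussian kernel K(x, y) = exp(-mu ||x - y||^2), sketched here in NumPy (the function name and the parameter `mu` are illustrative, not from the talk):

```python
import numpy as np

def gaussian_kernel(X, Y, mu=1.0):
    """Gaussian kernel matrix: K[i, j] = exp(-mu * ||X[i] - Y[j]||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq_dists)
```

A nonlinear kernel classifier then replaces the inner product of a point with w by a combination of kernel values of that point against the training data.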
27. Nonlinear Spiral Dataset: 94 Red Dots and 94 White Dots
28. Principal Contributions
- Breast cancer prognosis and chemotherapy
29. Kaplan-Meier Curves for Overall Patients, With and Without Chemotherapy
30. Breast Cancer Prognosis and Chemotherapy: Good, Intermediate and Poor Patient Clustering
31. Kaplan-Meier Survival Curves for Good, Intermediate and Poor Patients
32. Kaplan-Meier Survival Curves for the Intermediate Group, With and Without Chemotherapy
33. Conclusion
- New methods for classification proposed
- All based on a rigorous mathematical foundation
- Fast computational algorithms capable of classifying massive datasets
- Classifiers based on both abstract prior knowledge and conventional datasets
- Identification of breast cancer patients who can benefit from chemotherapy
34. Future Work
- Extend proposed methods to standard optimization problems
  - Linear and quadratic programming
  - Preliminary results beat state-of-the-art software
- Incorporate abstract concepts into optimization problems as constraints
- Develop fast online algorithms for intrusion and fraud detection
- Classify the effectiveness of new drug cocktails in combating various forms of cancer
  - Encouraging preliminary results