Title: Support Vector Machines in Data Mining (AFOSR Software Systems Annual Meeting)
1. Support Vector Machines in Data Mining
AFOSR Software Systems Annual Meeting, Syracuse, NY
June 3-7, 2002
Data Mining Institute, University of Wisconsin - Madison
2. What is a Support Vector Machine?
- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function
3. What are Support Vector Machines Used For?
- Classification
- Regression (data fitting)
- Supervised and unsupervised learning
4. Principal Contributions
- Lagrangian support vector machine classification
  - Fast, simple, unconstrained iterative method
- Reduced support vector machine classification
  - Accurate nonlinear classifier using random sampling
- Proximal support vector machine classification
  - Classify by proximity to planes instead of halfspaces
- Massive incremental classification
  - Classify by retiring old data and adding new data
- Knowledge-based classification
  - Incorporate expert knowledge into the classifier
- Fast Newton method classifier
  - Finitely terminating fast algorithm for classification
- Breast cancer prognosis and chemotherapy
  - Classify patients on the basis of distinct survival curves
5. Principal Contributions
- Proximal support vector machine classification
6. Support Vector Machines: Maximize the Margin between Bounding Planes
[Figure: the two classes A+ and A- separated by bounding planes]
7. Proximal Support Vector Machines: Maximize the Margin between Proximal Planes
[Figure: the two classes A+ and A- fitted by proximal planes]
8. Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
9. Standard Support Vector Machine Formulation
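The displayed formulation did not survive this text export. As a reconstruction (hedged: this is the standard linear SVM in the notation common to the PSVM literature, with A the m-by-n data matrix, D the diagonal matrix of +1/-1 labels, e a vector of ones, y the slack vector, and nu the tradeoff parameter), it reads:

```latex
\min_{w,\gamma,y}\ \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad y \ge 0
```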
10. PSVM Formulation
Standard SVM formulation
This simple but critical modification changes the nature of the optimization problem tremendously!
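The modification itself is not visible in this export. In the published PSVM formulation, the inequality constraint becomes an equality, the 1-norm slack penalty becomes a squared 2-norm, and gamma squared joins the regularizer (with A the data matrix, D the diagonal label matrix, e a vector of ones):

```latex
\min_{w,\gamma,y}\ \nu\,\tfrac{1}{2}\,\|y\|^{2} + \tfrac{1}{2}\left(w^{\top} w + \gamma^{2}\right)
\quad \text{s.t.} \quad D(Aw - e\gamma) + y = e
```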
11. Advantages of New Formulation
- Objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
- Exact leave-one-out correctness can be obtained in terms of the problem data.
12. Linear PSVM
- Setting the gradient equal to zero gives a nonsingular system of linear equations.
- Solution of the system gives the desired PSVM classifier.
13. Linear PSVM Solution
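The displayed solution is missing from this export. From the gradient condition of the PSVM objective, writing H = [A  -e] and stacking the unknowns as r = (w; gamma), the solution is a single linear solve:

```latex
r \;=\; \begin{pmatrix} w \\ \gamma \end{pmatrix}
\;=\; \left(\frac{I}{\nu} + H^{\top} H\right)^{-1} H^{\top} D e,
\qquad H = \left[\,A \;\; -e\,\right]
```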
14. Linear and Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A, d, nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A, d, nu);
[m, n] = size(A); e = ones(m, 1); H = [A -e];
v = (d' * H)';                      % v = H'*D*e
r = (speye(n+1)/nu + H'*H) \ v;     % solve (I/nu + H'*H) r = v
w = r(1:n); gamma = r(n+1);         % getting w, gamma from r
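For readers without MATLAB, the same one-system solve can be sketched in NumPy. This is a minimal translation of the linear case only; the function name `psvm_linear` is mine, not from the talk:

```python
import numpy as np

def psvm_linear(A, d, nu):
    """Linear proximal SVM: solve (I/nu + H'H) r = H'De for r = [w; gamma].

    A  : (m, n) data matrix
    d  : (m,) vector of +1/-1 labels, i.e. the diagonal of D
    nu : positive tradeoff parameter
    """
    m, n = A.shape
    e = np.ones(m)
    H = np.hstack([A, -e[:, None]])   # H = [A  -e]
    v = H.T @ d                        # v = H' * D * e = H' * d
    r = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return r[:n], r[n]                 # w, gamma
```

A new point x is then classified as sign(x @ w - gamma).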
15. Numerical Experiments: One-Billion Two-Class Dataset
- Synthetic dataset consisting of 1 billion points in 10-dimensional input space
- Generated by the NDC (Normally Distributed Clustered) dataset generator
- Dataset divided into 500 blocks of 2 million points each
- Solution obtained in less than 2 hours and 26 minutes
- About 30% of the time was spent reading data from disk
- Testing set correctness: 90.79%
16. Principal Contributions
- Knowledge-based classification
17. Conventional Data-Based SVM
18. Knowledge-Based SVM via Polyhedral Knowledge Sets
19. Incorporating Knowledge Sets Into an SVM Classifier
- This implication is equivalent to a set of constraints that can be imposed on the classification problem.
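The implication referred to here did not survive this export. In the knowledge-based SVM setting, a polyhedral knowledge set {x : Bx <= b} assumed to lie in the +1 halfspace yields the implication below, which (by linear programming duality, assuming the knowledge set is nonempty) is equivalent to a set of linear constraints in a new variable u:

```latex
Bx \le b \;\Longrightarrow\; w^{\top} x \ge \gamma + 1
\quad\Longleftrightarrow\quad
\exists\, u \ge 0 :\;\; B^{\top} u + w = 0,\;\; b^{\top} u + \gamma + 1 \le 0
```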
20. Numerical Testing: The Promoter Recognition Dataset
- Promoter: a short DNA sequence that precedes a gene sequence
- A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}
- Important to distinguish between promoters and nonpromoters
- This distinction identifies starting locations of genes in long uncharacterized DNA sequences
21. The Promoter Recognition Dataset: Comparative Test Results
22. Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
- 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
- 32 numerical features
- The domain theory: two simple rules used by doctors
23. Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
- Doctors' rules applicable to only 32 out of 110 patients
- Only 22 of 32 patients are classified correctly by this rule (20% correctness)
- KSVM linear classifier applicable to all patients, with correctness of 66.4%
- Correctness comparable to best available results using conventional SVMs
- KSVM can obtain classifiers based on knowledge without using any data
24. Principal Contributions
- Fast Newton method classifier
25. Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation of SVM
26. Newton Algorithm
- Newton algorithm terminates in a finite number of steps
  - Termination at global minimum
  - Error rate decreases linearly
- Can generate complex nonlinear classifiers
  - By using nonlinear kernels K(x, y)
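The kernel itself is not specified on this slide. A common concrete choice is the Gaussian kernel K(x, y) = exp(-mu ||x - y||^2), sketched here in NumPy (the function name and the parameter `mu` are illustrative, not from the talk):

```python
import numpy as np

def gaussian_kernel(X, Y, mu=1.0):
    """Gaussian kernel matrix: K[i, j] = exp(-mu * ||X[i] - Y[j]||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq_dists)
```

A nonlinear kernel classifier then replaces the inner product of a point with w by a combination of kernel values of that point against the training data.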
27. Nonlinear Spiral Dataset: 94 Red Dots and 94 White Dots
28. Principal Contributions
- Breast cancer prognosis and chemotherapy
29. Kaplan-Meier Curves for Overall Patients, With and Without Chemotherapy
30. Breast Cancer Prognosis and Chemotherapy: Good, Intermediate and Poor Patient Clustering
31. Kaplan-Meier Survival Curves for Good, Intermediate and Poor Patients
32. Kaplan-Meier Survival Curves for the Intermediate Group, With and Without Chemotherapy
33. Conclusion
- New methods for classification proposed
- All based on a rigorous mathematical foundation
- Fast computational algorithms capable of classifying massive datasets
- Classifiers based on both abstract prior knowledge and conventional datasets
- Identification of breast cancer patients who can benefit from chemotherapy
34. Future Work
- Extend proposed methods to standard optimization problems
  - Linear and quadratic programming
  - Preliminary results beat state-of-the-art software
- Incorporate abstract concepts into optimization problems as constraints
- Develop fast online algorithms for intrusion and fraud detection
- Classify the effectiveness of new drug cocktails in combating various forms of cancer
  - Encouraging preliminary results