Title: Knowledge-Based Support Vector Machine Classifiers
1 Knowledge-Based Support Vector Machine Classifiers
NIPS 2002, Vancouver, December 9-14, 2002
- Glenn Fung
- Olvi Mangasarian
- Jude Shavlik
University of Wisconsin-Madison
2 Outline of Talk
- Support Vector Machine (SVM) Classifiers
- LP formulation: 1-norm linear SVM classifier
- Polyhedral Knowledge Sets
- Incorporating knowledge sets into a classifier
- The promoter DNA sequence dataset
- Wisconsin breast cancer prognosis dataset
3 What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
- Used for:
- Regression and data fitting
- Supervised and unsupervised learning
4 Geometry of the Classification Problem: 2-Category Linearly Separable Case
[Figure: two linearly separable point sets, A+ and A-, separated by a plane]
5 Algebra of the Classification Problem: 2-Category Linearly Separable Case
- Given m points in n-dimensional space
- Represented by an m-by-n matrix A
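The equations on this slide did not survive extraction; the following is a hedged reconstruction using the standard notation of this line of work, where $D$ is the $m \times m$ diagonal matrix of $\pm 1$ class labels and $e$ is a vector of ones:
\[
x^\top w = \gamma \pm 1 \quad \text{(bounding planes)}, \qquad D(Aw - e\gamma) \ge e \quad \text{(separation constraints)}.
\]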
6 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- with the separating plane and the two bounding planes; the margin is the distance between the bounding planes]
7 Support Vector Machines: QP Formulation
- Solve the following quadratic program
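The quadratic program itself was lost in extraction; a hedged reconstruction of the standard soft-margin formulation in this notation ($y$ the slack vector, $\nu > 0$ a fixed penalty parameter) is:
\[
\min_{w,\gamma,y}\; \nu\, e^\top y + \tfrac{1}{2}\, w^\top w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.
\]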
8 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm
- This is equivalent to the following linear program
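The linear program was also lost; a hedged reconstruction of the usual 1-norm SVM LP, with a vector $p$ bounding $|w|$ componentwise, is:
\[
\min_{w,\gamma,y,p}\; \nu\, e^\top y + e^\top p
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad -p \le w \le p, \quad y \ge 0.
\]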
9 Conventional Data-Based SVM
10 Knowledge-Based SVM via Polyhedral Knowledge Sets
11 Incorporating Knowledge Sets Into an SVM Classifier
- We will show that this implication is equivalent to a set of constraints that can be imposed on the classification problem (see the sketch below).
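The implication referred to above did not survive extraction; a hedged reconstruction, for a polyhedral knowledge set $\{x : Bx \le b\}$ known to belong to class $+1$, is:
\[
Bx \le b \;\Longrightarrow\; x^\top w \ge \gamma + 1,
\]
i.e. the whole knowledge set must lie on the $+1$ side of its bounding plane (with the analogous implication $x^\top w \le \gamma - 1$ for class $-1$ knowledge sets).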
12 Knowledge Set Equivalence Theorem
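The statement of the theorem was lost in extraction; a hedged reconstruction of the equivalence established in this line of work is: for a nonempty knowledge set $\{x : Bx \le b\}$,
\[
\bigl(Bx \le b \;\Rightarrow\; x^\top w \ge \gamma + 1\bigr)
\;\Longleftrightarrow\;
\exists\, u \ge 0: \; B^\top u + w = 0, \quad b^\top u + \gamma + 1 \le 0,
\]
which replaces the implication by linear constraints in $(w, \gamma, u)$.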
13 Proof of Equivalence Theorem (via Nonhomogeneous Farkas or LP Duality)
- Proof by LP duality
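The proof itself was lost; a hedged one-line sketch of the LP duality argument: the implication holds iff $\min\{x^\top w : Bx \le b\} \ge \gamma + 1$; by LP duality this minimum equals $\max\{-b^\top u : B^\top u + w = 0,\; u \ge 0\}$, so the implication holds iff some $u \ge 0$ satisfies $B^\top u + w = 0$ and $-b^\top u \ge \gamma + 1$, i.e. $b^\top u + \gamma + 1 \le 0$.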
14 Knowledge-Based SVM Classification
15 Knowledge-Based SVM Classification
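The formulation on these slides did not survive extraction; a hedged reconstruction of the knowledge-based LP with the knowledge constraints imposed exactly, for class $+1$ sets $\{x : B^i x \le b^i\}$, $i = 1, \ldots, k$, and class $-1$ sets $\{x : C^j x \le c^j\}$, $j = 1, \ldots, \ell$, is:
\[
\begin{aligned}
\min_{w,\gamma,y,p,u^i,v^j}\;& \nu\, e^\top y + e^\top p \\
\text{s.t.}\;& D(Aw - e\gamma) + y \ge e, \quad -p \le w \le p, \quad y \ge 0,\\
& B^{i\top} u^i + w = 0, \quad b^{i\top} u^i + \gamma + 1 \le 0, \quad u^i \ge 0, \quad i = 1, \ldots, k,\\
& C^{j\top} v^j - w = 0, \quad c^{j\top} v^j - \gamma + 1 \le 0, \quad v^j \ge 0, \quad j = 1, \ldots, \ell.
\end{aligned}
\]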
16 Parametrized Knowledge-Based LP: Minimize Error in Knowledge Set Constraints
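The parametrized LP was lost in extraction; a hedged reconstruction, in which the knowledge constraints above are relaxed by nonnegative slacks whose 1-norm is penalized with a parameter $\mu > 0$, is:
\[
\begin{aligned}
\min\;& \nu\, e^\top y + e^\top p + \mu \Bigl( \sum_{i=1}^{k} \bigl(e^\top r^i + \rho^i\bigr) + \sum_{j=1}^{\ell} \bigl(e^\top s^j + \sigma^j\bigr) \Bigr)\\
\text{s.t.}\;& D(Aw - e\gamma) + y \ge e, \quad -p \le w \le p, \quad y \ge 0,\\
& -r^i \le B^{i\top} u^i + w \le r^i, \quad b^{i\top} u^i + \gamma + 1 \le \rho^i, \quad u^i, r^i, \rho^i \ge 0,\\
& -s^j \le C^{j\top} v^j - w \le s^j, \quad c^{j\top} v^j - \gamma + 1 \le \sigma^j, \quad v^j, s^j, \sigma^j \ge 0.
\end{aligned}
\]
A large $\mu$ forces the knowledge sets to be honored more strictly; a small $\mu$ lets the labeled data dominate.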
17 Knowledge-Based SVM via Polyhedral Knowledge Sets
18 Numerical Testing: The Promoter Recognition Dataset
- Promoter: a short DNA sequence that precedes a gene sequence.
- A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}.
- It is important to distinguish between promoters and nonpromoters.
- This distinction identifies starting locations of genes in long uncharacterized DNA sequences.
19 The Promoter Recognition Dataset: Numerical Representation
- Feature space mapped from the 57-dimensional nominal space to a real-valued 57 x 4 = 228-dimensional space.
- 57 nominal values become 57 x 4 = 228 binary values.
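A minimal sketch of this mapping (one-hot encoding each of the 57 nucleotide positions into 4 binary features); the function name and nucleotide ordering below are illustrative, not taken from the talk:

```python
import numpy as np

# Illustrative one-hot encoding of a 57-nucleotide sequence into a
# 57 x 4 = 228-dimensional binary feature vector.
NUCLEOTIDES = "ACGT"  # assumed ordering; any fixed ordering works

def encode_sequence(seq: str) -> np.ndarray:
    """Map a string over {A, C, G, T} of length 57 to a 0/1 vector of length 228."""
    assert len(seq) == 57, "promoter windows in this dataset have 57 positions"
    x = np.zeros(len(seq) * len(NUCLEOTIDES))
    for position, base in enumerate(seq.upper()):
        x[position * len(NUCLEOTIDES) + NUCLEOTIDES.index(base)] = 1.0
    return x

# Example with a made-up 57-base sequence.
example = "A" * 20 + "C" * 17 + "G" * 10 + "T" * 10
print(encode_sequence(example).shape)  # (228,)
```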
20 Promoter Recognition Dataset: Prior Knowledge Rules
- The prior knowledge consists of 64 rules (a hedged sketch of how such a rule becomes a knowledge set is given below).
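The rules themselves did not survive extraction. As an illustration only (the positions and nucleotides below are hypothetical, not one of the actual 64 rules), a conjunctive rule such as "position 17 is A and position 22 is T implies promoter" becomes, in the 228-dimensional binary feature space, the polyhedral knowledge set
\[
\{\, x : x_{(17,\mathrm{A})} \ge 1, \; x_{(22,\mathrm{T})} \ge 1, \; 0 \le x \le e \,\},
\]
which is of the required form $\{x : Bx \le b\}$ and can therefore be fed to the knowledge-based LP.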
21 Promoter Recognition Dataset: Sample Rules
22 The Promoter Recognition Dataset: Comparative Algorithms
- KBANN: knowledge-based artificial neural network (Shavlik et al.)
- BP: standard back propagation for neural networks (Rumelhart et al.)
- O'Neill's method: empirical method suggested by the biologist O'Neill (O'Neill)
- NN: nearest neighbor with k = 3 (Cost et al.)
- ID3: Quinlan's decision tree builder (Quinlan)
- SVM1: standard 1-norm SVM (Bradley et al.)
23 The Promoter Recognition Dataset: Comparative Test Results
24 Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
- 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
- 32 numerical features
- The domain theory: two simple rules used by doctors (sketched below)
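The two rules were not preserved in this transcript. A hedged sketch of their general form (the thresholds $\ell_1$, $t_1$, $t_2$ below are placeholders, not the values actually used by the doctors): with $T$ the tumor size and $L$ the number of metastasized lymph nodes, both among the 32 features,
\[
(L \ge \ell_1) \wedge (T \ge t_1) \;\Rightarrow\; \text{RECUR}, \qquad
(L = 0) \wedge (T \le t_2) \;\Rightarrow\; \text{NONRECUR}.
\]
Each rule describes a polyhedron in feature space, so it can be imposed as a knowledge set in the LP above.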
25 Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
- The doctors' rules are applicable to only 32 out of 110 patients.
- Only 22 of those 32 patients are classified correctly by these rules (22/110 = 20% correctness over the full dataset).
- The KSVM linear classifier is applicable to all patients, with a correctness of 66.4%.
- This correctness is comparable to the best available results using conventional SVMs.
- KSVM can obtain classifiers based on knowledge alone, without using any data.
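A hedged note on the last point: dropping the labeled-data term and its constraints from the parametrized LP leaves a knowledge-only problem of the form
\[
\min\; e^\top p + \mu \Bigl( \sum_i \bigl(e^\top r^i + \rho^i\bigr) + \sum_j \bigl(e^\top s^j + \sigma^j\bigr) \Bigr)
\quad \text{s.t. the knowledge constraints and } -p \le w \le p,
\]
whose solution $(w, \gamma)$ is a linear classifier derived purely from the rules.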
26 Conclusion
- Prior knowledge is easily incorporated into classifiers through polyhedral knowledge sets.
- The resulting problem is a simple LP.
- Knowledge sets can be used with or without conventional labeled data.
- In either case, KSVM is better than most classifiers tested.
27 Future Research
- Generate classifiers based on prior expert knowledge in various fields:
- Diagnostic rules for various diseases
- Financial investment rules
- Intrusion detection rules
- Extend knowledge sets to general convex sets
- Nonlinear kernel classifiers; challenges:
- Express prior knowledge nonlinearly
- Extend the equivalence theorem
28 Web Pages