Transcript: Support Vector Machine Classification Computation
1
Support Vector Machine Classification Computation
Informatics in Biology & Medicine, Madison
Retreat, November 15, 2002
  • Olvi L. Mangasarian
  • with
  • G. M. Fung, Y.-J. Lee, J.W. Shavlik, W. H.
    Wolberg
  • Collaborators at ExonHit Paris

Data Mining Institute, University of Wisconsin -
Madison
2
What is a Support Vector Machine?
  • An optimally defined surface
  • Linear or nonlinear in the input space
  • Linear in a higher dimensional feature space
  • Implicitly defined by a kernel function
  • K(A,B) → C

3
What are Support Vector Machines Used For?
  • Classification
  • Regression & Data Fitting
  • Supervised & Unsupervised Learning

4
Principal Topics
  • Proximal support vector machine classification
  • Classify by proximity to planes instead of
    halfspaces
  • Massive incremental classification
  • Classify by retiring old data & adding new data
  • Knowledge-based classification
  • Incorporate expert knowledge into a classifier
  • Fast Newton method classifier
  • Finitely terminating fast algorithm for
    classification
  • Breast cancer prognosis & chemotherapy
  • Classify patients on basis of distinct survival
    curves
  • Isolate a class of patients that may benefit
    from chemotherapy

5
Principal Topics
  • Proximal support vector machine classification

6
Support Vector Machines: Maximize the Margin
between Bounding Planes
[Figure: the two bounding planes separating point sets A+ and A-]
7
Proximal Support Vector Machines: Maximize the
Margin between Proximal Planes
[Figure: the two proximal planes through point sets A+ and A-]
8
Standard Support Vector Machine: Algebra of the
2-Category Linearly Separable Case
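A sketch of this algebra in LaTeX, using the talk's standard notation (A is the m x n data matrix, D is the diagonal matrix of +1/-1 labels, e is a column of ones):

% bounding planes and the margin between them
x^\top w = \gamma + 1, \qquad x^\top w = \gamma - 1, \qquad \text{margin} = \frac{2}{\|w\|}
% linear separability of the two classes, written compactly
D(Aw - e\gamma) \ge e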
9
Standard Support Vector Machine Formulation
10
Proximal SVM Formulation (PSVM)
Standard SVM formulation
This simple but critical modification changes
the nature of the optimization problem
tremendously: it becomes regularized least
squares, also known as ridge regression.
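A sketch of the two formulations in LaTeX, following the PSVM paper of Fung & Mangasarian; y is the slack (error) vector and \nu > 0 weighs error against margin:

% standard SVM with 2-norm error
\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|^2 + \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ y \ge 0
% PSVM: the inequality becomes an equality, and \gamma joins the regularizer
\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|^2 + \frac{1}{2}\big(\|w\|^2 + \gamma^2\big) \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e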
11
Advantages of New Formulation
  • Objective function remains strongly convex.
  • An explicit exact solution can be written in
    terms of the problem data.
  • PSVM classifier is obtained by solving a single
    system of linear equations in the usually small
    dimensional input space.
  • Exact leave-one-out correctness can be obtained
    in terms of the problem data.

12
Linear PSVM
  • Setting the gradient equal to zero gives a
    nonsingular system of linear equations.
  • Solution of the system gives the desired PSVM
    classifier.

13
Linear PSVM Solution
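A sketch of the explicit solution in LaTeX, consistent with the MATLAB code on the next slide (H = [A\ {-e}], and r stacks w and \gamma):

% substitute y = e - D(Aw - e\gamma) and set the gradient to zero
r = \begin{pmatrix} w \\ \gamma \end{pmatrix} = \Big(\frac{I}{\nu} + H^\top H\Big)^{-1} H^\top D e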
14
Linear & Nonlinear PSVM MATLAB Code
function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A);e=ones(m,1);H=[A -e];
v=(d'*H)';                 % v = H'*D*e
r=(speye(n+1)/nu+H'*H)\v;  % solve (I/nu+H'*H)r=v
w=r(1:n);gamma=r(n+1);     % getting w,gamma from r
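A hypothetical usage sketch on synthetic data (the names and sizes below are illustrative, not from the talk):

A = [randn(100,10)+1; randn(100,10)-1];  % two Gaussian clusters
d = [ones(100,1); -ones(100,1)];         % +1/-1 labels
[w, gamma] = psvm(A, d, 1);              % nu = 1
corr = mean(sign(A*w - gamma) == d)      % classify by sign(x'w - gamma)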
15
Numerical Experiments: One-Billion-Point
Two-Class Dataset
  • Synthetic dataset consisting of 1 billion points
    in 10-dimensional input space
  • Generated by NDC (Normally Distributed
    Clustered) dataset generator
  • Dataset divided into 500 blocks of 2 million
    points each (see the sketch after this list)
  • Solution obtained in less than 2 hours and 26
    minutes on a 400 MHz machine
  • About 30% of the time was spent reading data
    from disk
  • Testing set correctness: 90.79%
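A minimal sketch of the blockwise (incremental) computation this suggests: the linear system involves only the small (n+1) x (n+1) matrix H'H and the vector H'De, so both can be accumulated one block at a time. This illustrates the idea, not the authors' code; loadBlock is a hypothetical block loader, and numBlocks, n and nu are assumed to be defined.

HH = zeros(n+1); v = zeros(n+1,1);
for b = 1:numBlocks
    [Ab, db] = loadBlock(b);          % hypothetical block loader
    Hb = [Ab -ones(size(Ab,1),1)];    % block of H = [A -e]
    HH = HH + Hb'*Hb;                 % accumulate H'*H
    v  = v  + (db'*Hb)';              % accumulate H'*D*e
end
r = (speye(n+1)/nu + HH)\v;           % same small system as before
w = r(1:n); gamma = r(n+1);

Retiring old data amounts to subtracting a retired block's contributions from HH and v before re-solving.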

16
Principal Topics
  • Knowledge-based classification (NIPS 2002)

17
Conventional Data-Based SVM
18
Knowledge-Based SVM via Polyhedral Knowledge
Sets
19
Incorporating Knowledge Sets Into an SVM
Classifier
  • This implication is equivalent to a set of
    constraints that can be imposed on the
    classification problem, as sketched below.
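A sketch of that equivalence in LaTeX, following the knowledge-based SVM formulation of Fung, Mangasarian & Shavlik; a nonempty polyhedral knowledge set \{x \mid Bx \le b\} must lie on the positive side of the bounding plane:

% the implication to be enforced
Bx \le b \ \Longrightarrow\ x^\top w \ge \gamma + 1
% equivalent, by linear programming duality, to linear constraints
\exists\, u \ge 0:\quad B^\top u + w = 0, \qquad b^\top u + \gamma + 1 \le 0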

20
Numerical Testing: The Promoter Recognition Dataset
  • Promoter: a short DNA sequence that precedes a
    gene sequence.
  • A promoter consists of 57 consecutive DNA
    nucleotides belonging to {A,G,C,T}.
  • Important to distinguish between promoters and
    nonpromoters.
  • This distinction identifies starting locations
    of genes in long uncharacterized DNA sequences.

21
The Promoter Recognition Dataset: Numerical
Representation
  • Simple 1-of-N mapping scheme for converting
    nominal attributes into a real-valued
    representation.
  • Not the most economical representation, but
    commonly used.

22
The Promoter Recognition Dataset: Numerical
Representation
  • Feature space mapped from 57-dimensional nominal
    space to a real-valued 57 x 4 = 228 dimensional
    space (a sketch of the encoding follows).

57 nominal values
57 x 4 = 228 binary values
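A minimal MATLAB sketch of this 1-of-N encoding (illustrative; the toy sequence and the alphabet ordering are assumptions, not from the talk):

seq = repmat('ACGT', 1, 15); seq = seq(1:57);  % toy 57-nucleotide sequence
alphabet = 'AGCT';
x = zeros(1, 57*4);                  % 228 binary features
for i = 1:57
    j = find(alphabet == seq(i));    % position of nucleotide i in {A,G,C,T}
    x(4*(i-1)+j) = 1;                % one-hot block for position i
end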
23
Promoter Recognition Dataset: Prior Knowledge
Rules
  • Prior knowledge consists of the following 64
    rules:

24
Promoter Recognition Dataset: Sample Rules
25
The Promoter Recognition Dataset: Comparative
Algorithms
  • KBANN: knowledge-based artificial neural network
    [Shavlik et al.]
  • BP: standard backpropagation for neural networks
    [Rumelhart et al.]
  • O'Neill's method: empirical method suggested by
    biologist O'Neill [O'Neill]
  • NN: nearest neighbor with k=3 [Cost et al.]
  • ID3: Quinlan's decision tree builder [Quinlan]
  • SVM1: standard 1-norm SVM [Bradley et al.]

26
The Promoter Recognition Dataset: Comparative
Test Results
27
Wisconsin Breast Cancer Prognosis Dataset
Description of the data
  • 110 instances corresponding to 41 patients whose
    cancer had recurred and 69 patients whose cancer
    had not recurred
  • 32 numerical features
  • The domain theory: two simple rules used by
    doctors

28
Wisconsin Breast Cancer Prognosis Dataset
Numerical Testing Results
  • Doctors' rules are applicable to only 32 out of
    110 patients.
  • Only 22 of 32 patients are classified correctly
    by this rule (20% correctness over all 110).
  • KSVM linear classifier is applicable to all
    patients, with correctness of 66.4%.
  • Correctness comparable to best available
    results using conventional SVMs.
  • KSVM can obtain classifiers based on knowledge
    alone, without using any data.

29
Principal Topics
  • Fast Newton method classifier

30
Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation
of SVM
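A LaTeX sketch of the equivalent unconstrained problem that the fast Newton method minimizes, following Mangasarian's finite Newton classification paper; the plus function (\cdot)_+ absorbs the QP constraints:

\min_{w,\gamma}\ \frac{\nu}{2}\,\big\|\big(e - D(Aw - e\gamma)\big)_+\big\|^2 + \frac{1}{2}\big(\|w\|^2 + \gamma^2\big)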
31
Newton Algorithm
  • The Newton algorithm (sketched after this list)
    terminates in a finite number of steps
  • Termination at global minimum
  • Error rate decreases linearly
  • Can generate complex nonlinear classifiers
  • By using nonlinear kernels K(x,y)
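A minimal MATLAB sketch of the generalized Newton iteration for the unconstrained objective above; an illustration of the idea, not the authors' code (the Armijo stepsize safeguard from the paper is omitted):

function [w, gamma] = newton_svm(A, d, nu)
% finite Newton for min_z nu/2*||(e - D*H*z)_+||^2 + 1/2*||z||^2
[m, n] = size(A); e = ones(m,1);
DH = spdiags(d,0,m,m)*[A -e];        % D*H with H = [A -e], z = [w; gamma]
z = zeros(n+1,1);
for iter = 1:100
    p = max(e - DH*z, 0);            % plus function (e - D*H*z)_+
    grad = z - nu*(DH'*p);           % gradient of the objective
    if norm(grad) < 1e-8, break; end
    E = DH(p > 0, :);                % rows active in the plus function
    z = z - (speye(n+1) + nu*(E'*E))\grad;   % generalized Newton step
end
w = z(1:n); gamma = z(n+1);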

32
Nonlinear Spiral Dataset: 94 Red Dots & 94 White
Dots
33
Principal Topics
  • Breast cancer prognosis & chemotherapy

34
Kaplan-Meier Curves for Overall Patients: With &
Without Chemotherapy
35
Breast Cancer Prognosis & Chemotherapy: Good,
Intermediate & Poor Patient Groupings
(6 Input Features: 5 Cytological, 1 Histological)
(Grouping Utilizes 2 Histological Features & Chemotherapy)
36
Kaplan-Meier Survival Curves for Good,
Intermediate & Poor Patients
82.7% Classifier Correctness via 3 SVMs
37
Kaplan-Meier Survival Curves for Intermediate
Group: Note Reversed Role of Chemotherapy
38
Conclusion
  • New methods for classification
  • All based on rigorous mathematical foundation
  • Fast computational algorithms capable of
    classifying massive datasets
  • Classifiers based on abstract prior knowledge
    as well as conventional datasets
  • Identification of breast cancer patients that can
    benefit from chemotherapy

39
Future Work
  • Extend proposed methods to broader optimization
    problems
  • Linear & quadratic programming
  • Preliminary results beat state-of-the-art
    software
  • Incorporate abstract concepts into optimization
    problems as constraints
  • Develop fast online algorithms for intrusion and
    fraud detection
  • Classify the effectiveness of new drug cocktails
    in combating various forms of cancer
  • Encouraging preliminary results for breast cancer

40
Breast Cancer Treatment Response: Joint with
ExonHit (French Biotech)
  • 35 patients treated by a drug cocktail
  • 9 partial responders & 26 nonresponders
  • 25 gene expression measurements made on each
    patient
  • 1-Norm SVM classifier selected 12 out of 25
    genes
  • Combinatorially selected 6 genes out of 12
  • Separating plane obtained:
    2.7915 T11 + 0.13436 S24 - 1.0269 U23 - 2.8108 Z23
    - 1.8668 A19 - 1.5177 X05 + 2899.1 = 0.
  • Leave-one-out error: 1 out of 35 (97.1%
    correctness)

41
Detection of Alternative RNA Isoforms via
DATAS (Levels of mRNA that Correlate with
Sensitivity to Chemotherapy)
42
Talk Available
www.cs.wisc.edu/olvi