Title: Clustering High Dimensional Data Using SVM
1. Clustering High Dimensional Data Using SVM
- Tsau Young Lin and Tam Ngo
- Department of Computer Science
- San José State University
2. Overview
- Introduction
- Support Vector Machine (SVM)
- What is SVM
- How SVM Works
- Data Preparation Using SVD
- Singular Value Decomposition (SVD)
- Analysis of SVD
- The Project
- Conceptual Exploration
- Result Analysis
- Conclusion
- Future Work
3. Introduction
- World Wide Web
- No. 1 place for information
- contains billions of documents
- impossible to classify by humans
- Project Goals
- Cluster documents
- Reduce document size
- Get reasonable results compared to human classification
4. Support Vector Machine (SVM)
- a supervised learning machine
- outperforms many popular methods for text classification
- used for bioinformatics, signature/handwriting recognition, image and text classification, pattern recognition, and e-mail spam categorization
5. Motivation for SVM
- How do we separate these points?
- with a hyperplane
Source: Authors' Research
6. SVM Process Flow
[Figure: data mapped from the input space to the feature space]
Source: DTREG
7. Convex Hulls
Source: Bennett, K. P., Campbell, C., 2000
8. Simple SVM Example
Class X1
1 0
-1 1
-1 2
1 3
- How would SVM separate these points?
- Use the kernel trick
- Φ(X1) = (X1, X1²)
- The data becomes 2-dimensional
Source: Authors' Research
9. Simple Points in Feature Space
Class X1 X1²
1 0 0
-1 1 1
-1 2 4
1 3 9
- All points here are support vectors.
Source: Authors' Research
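A minimal Java sketch of the Φ(X1) = (X1, X1²) mapping used on the last two slides (class and method names here are ours, purely illustrative): each 1-D input is lifted into 2-D feature space, which is what makes the four points linearly separable.

    // Lift each 1-D input x into the 2-D feature space (x, x^2).
    public class KernelTrickDemo {
        static double[] phi(double x) {
            return new double[] { x, x * x };
        }

        public static void main(String[] args) {
            double[] inputs = { 0, 1, 2, 3 };
            int[] labels = { 1, -1, -1, 1 };
            for (int i = 0; i < inputs.length; i++) {
                double[] f = phi(inputs[i]);
                System.out.printf("class %2d: (%.0f, %.0f)%n", labels[i], f[0], f[1]);
            }
        }
    }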
10. SVM Calculation
- Positive: ⟨w · x⟩ + b = 1
- Negative: ⟨w · x⟩ + b = -1
- Hyperplane: ⟨w · x⟩ + b = 0
- Find the unknowns w and b
- Expanding the equations:
- w1x1 + w2x2 + b = 1
- w1x1 + w2x2 + b = -1
- w1x1 + w2x2 + b = 0
11. Use Linear Algebra to Solve for w and b
- w1x1 + w2x2 + b = 1
- → w1(0) + w2(0) + b = 1
- → w1(3) + w2(9) + b = 1
- w1x1 + w2x2 + b = -1
- → w1(1) + w2(1) + b = -1
- → w1(2) + w2(4) + b = -1
- Solution: w1 = -3, w2 = 1, b = 1
- The SVM algorithm finds the solution that returns a hyperplane with the largest margin
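As a quick sanity check (our own sketch, not part of LIBSVM), plugging w = (-3, 1) and b = 1 into ⟨w · x⟩ + b for the four feature-space points should give exactly +1 for the positive class and -1 for the negative class:

    // Verify that w = (-3, 1), b = 1 places every training point on its margin plane.
    public class CheckSolution {
        public static void main(String[] args) {
            double[] w = { -3, 1 };
            double b = 1;
            double[][] points = { { 0, 0 }, { 1, 1 }, { 2, 4 }, { 3, 9 } };
            int[] labels = { 1, -1, -1, 1 };
            for (int i = 0; i < points.length; i++) {
                double value = w[0] * points[i][0] + w[1] * points[i][1] + b;
                System.out.printf("(%.0f, %.0f): <w, x> + b = %.0f, label = %d%n",
                        points[i][0], points[i][1], value, labels[i]);
            }
        }
    }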
12. Use the Solutions to Draw the Planes
Positive plane: ⟨w · x⟩ + b = 1 → w1x1 + w2x2 + b = 1 → -3x1 + 1x2 + 1 = 1 → x2 = 3x1
Negative plane: ⟨w · x⟩ + b = -1 → w1x1 + w2x2 + b = -1 → -3x1 + 1x2 + 1 = -1 → x2 = 3x1 - 2
Hyperplane: ⟨w · x⟩ + b = 0 → w1x1 + w2x2 + b = 0 → -3x1 + 1x2 + 1 = 0 → x2 = 3x1 - 1
Positive plane (x2 = 3x1):
X1 X2
0 0
1 3
2 6
3 9
Negative plane (x2 = 3x1 - 2):
X1 X2
0 -2
1 1
2 4
3 7
Hyperplane (x2 = 3x1 - 1):
X1 X2
0 -1
1 2
2 5
3 8
Source: Authors' Research
13. Simple Data Separated by a Hyperplane
Source: Authors' Research
14. LIBSVM and Parameter C
- LIBSVM: A Java Library for SVM
- When C is very small, SVM only cares about maximizing the margin, and points may fall on the wrong side of the plane.
- When C is very large, SVM forces the slack variables to be very small, to make sure that all data points in each group are separated correctly.
15. Choosing Parameter C
Source: LIBSVM
16. Four Basic Kernel Types
- LIBSVM implements 4 basic kernel types: linear, polynomial, radial basis function, and sigmoid
- 0 -- linear: u'v
- 1 -- polynomial: (gamma*u'v + coef0)^degree
- 2 -- radial basis function: exp(-gamma*|u - v|^2)
- 3 -- sigmoid: tanh(gamma*u'v + coef0)
- We use the radial basis function with a large parameter C for this project (a configuration sketch follows below).
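A minimal sketch of how that choice can be expressed with LIBSVM's Java API (the gamma and C values below are illustrative placeholders, not the values tuned for the project):

    import libsvm.svm_parameter;

    // Configure a C-SVC with the radial basis function kernel and a large C.
    public class RbfConfig {
        static svm_parameter makeParameter() {
            svm_parameter param = new svm_parameter();
            param.svm_type = svm_parameter.C_SVC;
            param.kernel_type = svm_parameter.RBF;  // exp(-gamma * |u - v|^2)
            param.gamma = 0.5;                      // illustrative kernel width
            param.C = 1000;                         // large C penalizes misclassified points heavily
            param.cache_size = 100;                 // kernel cache size in MB
            param.eps = 1e-3;                       // stopping tolerance
            return param;
        }
    }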
17. Data Preparation Using SVD
- SVM is excellent for text classification, but it requires labeled documents for training
- Singular Value Decomposition (SVD)
- separates a matrix into three parts: left eigenvectors, singular values, and right eigenvectors
- decomposes data such as images and text
- reduces data size
- We will use SVD to cluster
18. SVD Example of 4 Documents
- D1 Shipment of gold damaged in a fire
- D2 Delivery of silver arrived in a silver truck
- D3 Shipment of gold arrived in a truck
- D4 Gold Silver Truck
Source: Garcia, E., 2006
19. Matrix A = USVᵀ
D1 D2 D3 D4
a 1 1 1 0
arrived 0 1 1 0
damaged 1 0 0 0
delivery 0 1 0 0
fire 1 0 0 0
gold 1 0 1 1
in 1 1 1 0
of 1 1 1 0
shipment 1 0 1 0
silver 0 2 0 1
truck 0 1 1 1
Given a matrix A, we can factor it into three parts: U, S, and Vᵀ.
Source: Garcia, E., 2006
20. Using JAMA to Decompose Matrix A
- U
- 0.3966 -0.1282 -0.2349 0.0941
- 0.2860 0.1507 -0.0700 0.5212
- 0.1106 -0.2790 -0.1649 -0.4271
- 0.1523 0.2650 -0.2984 -0.0565
- 0.1106 -0.2790 -0.1649 -0.4271
- 0.3012 -0.2918 0.6468 -0.2252
- 0.3966 -0.1282 -0.2349 0.0941
- 0.3966 -0.1282 -0.2349 0.0941
- 0.2443 -0.3932 0.0635 0.1507
- 0.3615 0.6315 -0.0134 -0.4890
- 0.3428 0.2522 0.5134 0.1453
- S
- 4.2055 0.0000 0.0000 0.0000
- 0.0000 2.4155 0.0000 0.0000
- 0.0000 0.0000 1.4021 0.0000
- 0.0000 0.0000 0.0000 1.2302
Source: JAMA (MathWorks and the National Institute of Standards and Technology (NIST))
21. Using JAMA to Decompose Matrix A (continued)
- V
- 0.4652 -0.6738 -0.2312 -0.5254
- 0.6406 0.6401 -0.4184 -0.0696
- 0.5622 -0.2760 0.3202 0.7108
- 0.2391 0.2450 0.8179 -0.4624
- VT
- 0.4652 0.6406 0.5622 0.2391
- -0.6738 0.6401 -0.2760 0.2450
- -0.2312 -0.4184 0.3202 0.8179
- -0.5254 -0.0696 0.7108 -0.4624
- Matrix A can be reconstructed by multiplying U, S, and Vᵀ
Source: JAMA
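The decomposition shown on the two slides above takes only a few lines with JAMA (a sketch; the signs of individual singular vectors may come out flipped relative to the values above, which does not affect the clustering):

    import Jama.Matrix;
    import Jama.SingularValueDecomposition;

    // Decompose the 11 x 4 term-document matrix A into U, S, and V with JAMA.
    public class SvdDemo {
        public static void main(String[] args) {
            double[][] a = {
                { 1, 1, 1, 0 },  // a
                { 0, 1, 1, 0 },  // arrived
                { 1, 0, 0, 0 },  // damaged
                { 0, 1, 0, 0 },  // delivery
                { 1, 0, 0, 0 },  // fire
                { 1, 0, 1, 1 },  // gold
                { 1, 1, 1, 0 },  // in
                { 1, 1, 1, 0 },  // of
                { 1, 0, 1, 0 },  // shipment
                { 0, 2, 0, 1 },  // silver
                { 0, 1, 1, 1 },  // truck
            };
            SingularValueDecomposition svd = new Matrix(a).svd();
            svd.getU().print(8, 4);  // left singular vectors
            svd.getS().print(8, 4);  // singular values on the diagonal
            svd.getV().print(8, 4);  // right singular vectors, one row per document
        }
    }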
22. Rank 2 Approximation (Reduced U, S, and V Matrices)
- U
- 0.3966 -0.1282
- 0.2860 0.1507
- 0.1106 -0.2790
- 0.1523 0.2650
- 0.1106 -0.2790
- 0.3012 -0.2918
- 0.3966 -0.1282
- 0.3966 -0.1282
- 0.2443 -0.3932
- 0.3615 0.6315
- 0.3428 0.2522
- S
- 4.2055 0.0000
- 0.0000 2.4155
- V
- 0.4652 -0.6738
- 0.6406 0.6401
- 0.5622 -0.2760
- 0.2391 0.2450
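With JAMA, the rank-2 truncation shown above is one getMatrix call per factor; this sketch (method name ours) keeps the first k columns of U and V and the top-left k x k block of S:

    import Jama.Matrix;
    import Jama.SingularValueDecomposition;

    // Truncate U, S, and V to rank k, giving the approximation A_k = U_k * S_k * V_k^T.
    public class RankKTruncation {
        static Matrix[] truncate(Matrix a, int k) {
            SingularValueDecomposition svd = a.svd();
            Matrix uk = svd.getU().getMatrix(0, a.getRowDimension() - 1, 0, k - 1);
            Matrix sk = svd.getS().getMatrix(0, k - 1, 0, k - 1);
            Matrix vk = svd.getV().getMatrix(0, a.getColumnDimension() - 1, 0, k - 1);
            return new Matrix[] { uk, sk, vk };
        }
    }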
23. Use Matrix V to Calculate Cosine Similarities
- Calculate cosine similarities between each pair of documents (a Java sketch follows below)
- sim(Di, Dj) = (Di · Dj) / (|Di| |Dj|)
- Example: calculate the similarities for D1
- sim(D1, D2) = (D1 · D2) / (|D1| |D2|)
- sim(D1, D3) = (D1 · D3) / (|D1| |D3|)
- sim(D1, D4) = (D1 · D4) / (|D1| |D4|)
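A small Java sketch of this computation (the two hard-coded rows are the rank-2 vectors for D1 and D3 from the truncated V matrix):

    // Cosine similarity between two document vectors (rows of the truncated V matrix).
    public class CosineSimilarity {
        static double sim(double[] di, double[] dj) {
            double dot = 0, normI = 0, normJ = 0;
            for (int k = 0; k < di.length; k++) {
                dot += di[k] * dj[k];
                normI += di[k] * di[k];
                normJ += dj[k] * dj[k];
            }
            return dot / (Math.sqrt(normI) * Math.sqrt(normJ));
        }

        public static void main(String[] args) {
            double[] d1 = { 0.4652, -0.6738 };
            double[] d3 = { 0.5622, -0.2760 };
            System.out.println(sim(d1, d3));  // about 0.87, the highest match for D1
        }
    }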
24. Results for Cosine Similarities
- Example results for D1:
- sim(D1, D2) = ((0.4652 × 0.6406) + (-0.6738 × 0.6401)) / (√(0.4652² + (-0.6738)²) × √(0.6406² + 0.6401²)) = -0.1797
- sim(D1, D3) = ((0.4652 × 0.5622) + (-0.6738 × -0.2760)) / (√(0.4652² + (-0.6738)²) × √(0.5622² + (-0.2760)²)) = 0.8727
- sim(D1, D4) = ((0.4652 × 0.2391) + (-0.6738 × 0.2450)) / (√(0.4652² + (-0.6738)²) × √(0.2391² + 0.2450²)) = -0.1921
- D3 returns the highest value, so pair D1 with D3
- Do the same for D2, D3, and D4.
25. Result of Simple Data Set
- label 1
- D1 Shipment of gold damaged in a fire
- D3 Shipment of gold arrived in a truck
- label 2
- D2 Delivery of silver arrived in a silver truck
- D4 Gold Silver Truck
26. Check the Clusters Using SVM
- Now that we have labels, we can use them to train SVM
- SVM input format for the original data (a Java sketch of building these instances follows below):
- 1 1:1.00 2:0.00 3:1.00 4:0.00 5:1.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:0.00
- 2 1:1.00 2:1.00 3:0.00 4:1.00 5:0.00 6:0.00 7:1.00 8:1.00 9:0.00 10:2.00 11:1.00
- 1 1:1.00 2:1.00 3:0.00 4:0.00 5:0.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:1.00
- 2 1:0.00 2:0.00 3:0.00 4:0.00 5:0.00 6:1.00 7:0.00 8:0.00 9:0.00 10:1.00 11:1.00
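A sketch of how such rows become training instances through LIBSVM's Java API rather than the text format (class and method names here are ours; feature indices are 1-based, matching the lines above):

    import libsvm.svm_node;
    import libsvm.svm_problem;

    // Build LIBSVM's sparse representation from labeled document vectors.
    public class LibsvmInput {
        static svm_node[] toNodes(double[] features) {
            svm_node[] nodes = new svm_node[features.length];
            for (int i = 0; i < features.length; i++) {
                nodes[i] = new svm_node();
                nodes[i].index = i + 1;       // LIBSVM feature indices start at 1
                nodes[i].value = features[i];
            }
            return nodes;
        }

        static svm_problem toProblem(double[] labels, double[][] vectors) {
            svm_problem prob = new svm_problem();
            prob.l = labels.length;           // number of training instances
            prob.y = labels;                  // cluster labels from the SVD step
            prob.x = new svm_node[vectors.length][];
            for (int i = 0; i < vectors.length; i++) {
                prob.x[i] = toNodes(vectors[i]);
            }
            return prob;
        }
    }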
27. Results from SVM's Prediction
Results from SVM's Prediction on the Original Data
Documents Used for Training / Document to Predict / SVM Prediction Result / SVD Cluster Result
D1, D2, D3 D4 1.0 2
D1, D2, D4 D3 1.0 1
D1, D3, D4 D2 2.0 2
D2, D3, D4 D1 1.0 1
Source: Authors' Research
28. Using the Truncated V Matrix
- We want to reduce the data size, so it is more practical to use the truncated V matrix
- SVM input format (truncated V matrix; a sketch that writes this format follows below):
- 1 1:0.4652 2:-0.6738
- 2 1:0.6406 2:0.6401
- 1 1:0.5622 2:-0.2760
- 2 1:0.2391 2:0.2450
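A small sketch (file path, class, and method names ours) that writes the truncated V rows in exactly this text format, so the file can be passed straight to LIBSVM's trainer:

    import java.io.IOException;
    import java.io.PrintWriter;

    // Write truncated V rows as LIBSVM-format lines: "<label> 1:<v1> 2:<v2> ...".
    public class WriteTruncatedV {
        static void write(String path, int[] labels, double[][] v) throws IOException {
            try (PrintWriter out = new PrintWriter(path)) {
                for (int i = 0; i < v.length; i++) {
                    StringBuilder line = new StringBuilder(String.valueOf(labels[i]));
                    for (int j = 0; j < v[i].length; j++) {
                        line.append(' ').append(j + 1).append(':')
                            .append(String.format("%.4f", v[i][j]));
                    }
                    out.println(line);
                }
            }
        }
    }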
29. SVM Results from the Truncated V Matrix
Results from SVM's Prediction on the Reduced Data
Documents Used for Training / Document to Predict / SVM Prediction Result / SVD Cluster Result
D1, D2, D3 D4 2.0 2
D1, D2, D4 D3 1.0 1
D1, D3, D4 D2 2.0 2
D2, D3, D4 D1 1.0 1
Using the truncated V matrix gives better results.
Source: Authors' Research
30. Document Vectors on a Graph
[Figure: the rank-2 document vectors D1, D2, D3, and D4 plotted as points on a graph]
Source: Authors' Research
31. Analysis of the Rank Approximation
Cluster Results from Different Rank Approximations (each document is listed with its closest match, followed by the resulting cluster labels)
Rank 1: D1→D4, D2→D4, D3→D4, D4→D3; label 1 = {D1, D4, D2, D3}
Rank 2: D1→D3, D2→D4, D3→D1, D4→D2; label 1 = {D1, D3}, label 2 = {D2, D4}
Rank 3: D1→D3, D2→D3, D3→D1, D4→D3; label 1 = {D1, D3, D2, D4}
Rank 4: D1→D2, D2→D3, D3→D2, D4→D2; label 1 = {D1, D2, D3, D4}
Source: Authors' Research
32. Program Process Flow
- Use the previous methods on larger data sets
- Compare the results with human classification
[Figure: program process flow diagram]
33. Conceptual Exploration
- Reuters-21578
- a collection of newswire articles that have been human-classified by Carnegie Group, Inc. and Reuters, Ltd.
- the most widely used data set for text categorization
34. Result Analysis
Clustering with SVD vs. Human Classification: First Data Set
First Data Set from Reuters-21578 (200 x 9928)
Rank / No. of Naturally Formed Clusters using SVD / SVD Cluster Accuracy (%) / SVM Prediction Accuracy (%)
Rank 002 80 75.0 65.0
Rank 005 66 81.5 82.0
Rank 010 66 60.5 54.0
Rank 015 64 52.0 51.5
Rank 020 67 38.0 46.5
Rank 030 72 60.0 54.0
Rank 040 72 62.5 58.5
Rank 050 73 54.5 51.5
Rank 100 75 45.5 58.5
Source: Authors' Research
35. Result Analysis
Clustering with SVD vs. Human Classification: Second Data Set
Second Data Set from Reuters-21578 (200 x 9928)
Rank / No. of Naturally Formed Clusters using SVD / SVD Cluster Accuracy (%) / SVM Prediction Accuracy (%)
Rank 002 76 67.0 84.5
Rank 005 73 67.0 84.5
Rank 010 64 70.0 85.5
Rank 015 64 63.0 81.0
Rank 020 67 59.5 50.0
Rank 030 69 68.5 83.5
Rank 040 69 59.0 79.0
Rank 050 76 44.5 25.5
Rank 100 71 52.0 47.0
Source: Authors' Research
36. Result Analysis
- The highest accuracy for SVD clustering is 81.5%
- A lower rank value seems to give better results
- SVM prediction accuracy is about the same as the SVD cluster accuracy
37. Result Analysis: Why are the results not higher?
- Human classification is more subjective than a program's
- Reducing many small clusters to only 2 clusters by computing the average may decrease the accuracy
38. Conclusion
- Showed how SVM works
- Explored the strengths of SVM
- Showed how SVD can be used for clustering
- Analyzed simple and complex data
- the method seems to cluster data reasonably
- Our method is able to
- reduce data size (by using truncated V matrix)
- cluster data reasonably
- classify new data efficiently (based on SVM)
- By combining known methods, we created a form of
unsupervised SVM
39. Future Work
- Extend SVD to very large data sets that can only be stored in secondary storage
- Look for more efficient SVM kernels
40. Thank You!
41. References
- Bennett, K. P., Campbell, C. (2000). Support Vector Machines: Hype or Hallelujah? ACM SIGKDD Explorations, Vol. 2, No. 2, 1-13.
- Chang, C., Lin, C. (2006). LIBSVM: a library for support vector machines. Retrieved November 29, 2006, from http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Cristianini, N. (2001). Support Vector and Kernel Machines. Retrieved November 29, 2005, from http://www.support-vector.net/icml-tutorial.pdf
- Cristianini, N., Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.
- Garcia, E. (2006). SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations. Retrieved November 28, 2006, from http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html
- Guestrin, C. (2006). Machine Learning. Retrieved November 8, 2006, from http://www.cs.cmu.edu/~guestrin/Class/10701/
- Hicklin, J., Moler, C., Webb, P. (2005). JAMA: A Java Matrix Package. Retrieved November 28, 2006, from http://math.nist.gov/javanumerics/jama/
42. References (continued)
- Joachims, T. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
- Joachims, T. (2004). Support Vector Machines. Retrieved November 28, 2006, from http://svmlight.joachims.org/
- Reuters-21578 Text Categorization Test Collection. Retrieved November 28, 2006, from http://www.daviddlewis.com/resources/testcollections/reuters21578/
- SVM - Support Vector Machines. DTREG. Retrieved November 28, 2006, from http://www.dtreg.com/svm.htm
- Vapnik, V. N. (2000, 1995). The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.