Title: Performance Comparison: Two Versions of Higher Order Naïve Bayes vs. Support Vector Machines
1. Performance Comparison: Two Versions of Higher Order Naïve Bayes vs. Support Vector Machines
- Advisor
- William M. Pottenger, PhD
- Associate Research Professor
- Computer Science and DIMACS, Rutgers University
- Presenter
- Phyo Thiha
- Swarthmore College
July 17, 2008, DIMACS, Rutgers University
2. Flashback (i)
- Data Mining / Machine Learning
- Discovering relevant patterns in large datasets
- Making predictions or forming rules from the patterns found
- Text Classification
- Assigning categories to documents based on their contents
- Naïve Bayes (see the minimal sketch below)
- A simple probabilistic classifier
- Assumes independence between instances
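As a quick refresher, here is a minimal multinomial Naïve Bayes text classifier with Laplace smoothing. This is a sketch only, not the implementation used in these experiments, and the toy documents are hypothetical:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count class frequencies and per-class term frequencies."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        for term in doc.split():
            term_counts[label][term] += 1
            vocab.add(term)
    return class_counts, term_counts, vocab

def classify_nb(doc, class_counts, term_counts, vocab):
    """Pick the class maximizing log P(c) + sum_t log P(t | c)."""
    n_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c in class_counts:
        score = math.log(class_counts[c] / n_docs)
        total = sum(term_counts[c].values())
        for term in doc.split():
            # Laplace smoothing keeps unseen terms from zeroing the product
            score += math.log((term_counts[c][term] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy data
docs = ["cheap pills online", "meeting agenda attached"]
labels = ["spam", "ham"]
model = train_nb(docs, labels)
print(classify_nb("cheap pills attached", *model))  # -> 'spam'
```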
3. Flashback (ii)
- Higher Order Naïve Bayes (HONB)
Fig. Forming a Higher Order Path Between Documents D1, D2, and D3 (see the path-enumeration sketch below)
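To make the figure concrete, here is one way second-order term paths could be enumerated: two terms are linked through a bridging term that co-occurs with each of them in different documents (a - D1 - b - D2 - c). This triple representation is my reading of the figure, not necessarily the exact construction in Ganiz and Pottenger (2008):

```python
def second_order_paths(docs):
    """Enumerate term triples (a, b, c) where a and b co-occur in one
    document and b and c co-occur in a *different* document, i.e. a
    second-order path a - D1 - b - D2 - c. `docs` is a list of term sets.
    (Paths are directed here, so a-b-c and c-b-a both appear.)"""
    paths = set()
    for i, d1 in enumerate(docs):
        for j, d2 in enumerate(docs):
            if i == j:
                continue
            for b in d1 & d2:               # bridging terms
                for a in d1 - {b}:
                    for c in d2 - {a, b}:
                        paths.add((a, b, c))
    return paths

# Toy documents D1, D2, D3 as term sets
docs = [{"w1", "w2"}, {"w2", "w3"}, {"w3", "w4"}]
for path in sorted(second_order_paths(docs)):
    print(" - ".join(path))   # e.g. w1 - w2 - w3
```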
4. Task One
- Using PURE Higher Order Paths
- Old: paths of mixed orders (2nd and 1st)
- New: filtered higher order paths (2nd order only)
Fig. Filtering Out Lower Order Paths (see the filtering sketch below)
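Building on the previous sketch, the filtering step might look like the function below: a second-order path is kept only if its endpoint terms never co-occur in a single document, so no first-order connection hides inside it. This is an assumed interpretation of "pure" higher order paths, not the authors' actual code:

```python
def pure_second_order(paths, docs):
    """Drop any path (a, b, c) whose endpoints a and c co-occur in some
    document; what remains is purely second order (assumed reading of
    'filtering out lower order paths')."""
    def first_order(a, c):
        return any(a in d and c in d for d in docs)
    return {(a, b, c) for (a, b, c) in paths if not first_order(a, c)}

# Reuses second_order_paths from the previous sketch. Here w1 and w3
# also co-occur in the last document, so the path w1 - w2 - w3 is
# filtered out, while w2 - w3 - w4 survives.
docs = [{"w1", "w2"}, {"w2", "w3"}, {"w3", "w4"}, {"w1", "w3"}]
print(sorted(pure_second_order(second_order_paths(docs), docs)))
```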
5. Task One
- 20 Newsgroups dataset
- Training/test split: 25/475
- 8 random trials for each subset (protocol sketched below)
Table: Average Percentage Accuracy of Naïve Bayes, Filtered HONB, and Unfiltered HONB on Different Datasets
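As an illustration of the trial protocol on this slide, the sketch below averages accuracy over repeated random 25/475 splits. It reuses the toy train_nb / classify_nb functions from the earlier Naïve Bayes sketch; the split sizes come from the slide, everything else is illustrative:

```python
import random

def random_trials(docs, labels, n_trials=8, n_train=25, n_test=475, seed=0):
    """Average accuracy over repeated random train/test splits, mirroring
    the 8-trial, 25/475 protocol on this slide."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        idx = list(range(len(docs)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:n_train + n_test]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(classify_nb(docs[i], *model) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return 100.0 * sum(accuracies) / len(accuracies)  # percentage accuracy
```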
6. Task One
Figures: Percentage Accuracy Comparisons for Filtered and Unfiltered HONB on Different Datasets
7. Task Two
- Support Vector Machines (SVM)
- A family of related supervised learning methods
- Widely used and known for good performance
Table: Preliminary Percentage Accuracy of Naïve Bayes, SVM, and Unfiltered HONB on Different Datasets
8. Task Two
- Preliminary results use default parameter values
- Parameters of interest:
- C (complexity parameter)
- Exponent and Lower_Order_Terms
- Radial Basis Function (RBF) kernel and its Gamma value
- Can we get better SVM performance? (see the sweep sketch below)
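The experiments here used Weka's SMO; purely as an illustration, the sketch below runs an analogous parameter sweep in scikit-learn on two arbitrarily chosen 20 Newsgroups categories. The mapping is approximate: Weka's C is SVC's C, Exponent roughly corresponds to the polynomial kernel degree, and the RBF/Gamma options to kernel='rbf' and gamma; Lower_Order_Terms has no direct scikit-learn equivalent:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Two arbitrary categories keep this toy sweep fast
data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

pipe = make_pipeline(TfidfVectorizer(), SVC())
grid = {
    "svc__C": [0.1, 1.0, 10.0],          # complexity parameter
    "svc__kernel": ["poly", "rbf"],
    "svc__degree": [1, 2],               # ~ 'Exponent' (poly kernel only)
    "svc__gamma": [0.01, "scale"],       # Weka's default Gamma was 0.01
}
search = GridSearchCV(pipe, grid, cv=3)
search.fit(data.data, data.target)
print(search.best_params_, search.best_score_)
```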
9. Task Two
- Dataset: 20 Newsgroups, 8 random trials per dataset
Table: Parameters for the Different Experiment Setups
Note: "-" means the parameter is set to its default value. Defaults: Exponent = 1, Lower_Order_Terms = F, RBF = F, Gamma = 0.01.
10. Task Two
- Best results obtained with Setup 1, with C in the 0.1 to 1.0 range
Figures: Accuracy Comparisons for Filtered HONB (average values) and SMO on Different Datasets
11. Future Work
- Information Gain (see the sketch below)
- Decision Trees
- Apply to higher order (HO) information to build better models
- Can we do better?
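For reference, information gain is the split criterion behind decision trees: IG(S, A) = H(S) minus the label entropy remaining after splitting S on attribute A. A minimal sketch with hypothetical toy data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: does a word's presence (1/0) predict the class?
labels = ["spam", "spam", "ham", "ham"]
present = [1, 1, 0, 1]
print(information_gain(labels, present))  # ~0.31 bits
```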
12. References
- Ganiz and Pottenger. A Novel Bayesian Classifier for Sparse Data (draft, 2008).
- Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco.
- Platt, J. (1998). "Fast Training of Support Vector Machines using Sequential Minimal Optimization". In Advances in Kernel Methods: Support Vector Learning, B. Schoelkopf, C. Burges, and A. Smola, eds., MIT Press.
- Support Vector Machine (SVM): http://en.wikipedia.org/wiki/Support_vector_machine
- 20 Newsgroups: http://people.csail.mit.edu/jrennie/20Newsgroups/
- Information Gain: http://en.wikipedia.org/wiki/Information_gain_in_decision_trees
13. THANKS!
Special thanks to:
- Professor Pottenger (my adviser)
- Cibin George (graduate student of my adviser)
- Murat Ganiz (for support and explanation of filtering out lower order paths in HONB)