Performance Comparison: Two Versions of Higher Order Naïve Bayes

1
Performance Comparison: Two Versions of Higher
Order Naïve Bayes and Support Vector Machines
  • Advisor
  • William M. Pottenger, PhD
  • Associate Research Professor
  • Computer Science and DIMACS, Rutgers University
  • Presenter
  • Phyo Thiha
  • Swarthmore College

July 17, 2008, DIMACS, Rutgers University
2
Flashback (i)
  • Data Mining / Machine Learning
  • Discovering relevant patterns in large datasets
  • Making predictions or forming rules from the patterns found
  • Text Classification
  • Assigning categories to documents based on their contents
  • Naïve Bayes
  • A simple probabilistic classifier
  • Assumes independence between instances (i.i.d.)
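A minimal sketch of the Naïve Bayes idea above, assuming a multinomial model with add-one (Laplace) smoothing; the toy documents and the function names are invented for illustration, not the code used in this work:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Collect the counts classification needs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)   # per-class term frequencies
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, len(docs)

def classify_nb(model, tokens):
    class_counts, word_counts, vocab, n_docs = model
    best, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(n_c / n_docs)
        total = sum(word_counts[label].values())
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [(["ball", "goal", "team"], "sports"),
        (["vote", "senate", "law"], "politics"),
        (["goal", "match"], "sports")]
model = train_nb(docs)
```

With this toy data, `classify_nb(model, ["goal", "team"])` returns `"sports"`.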

3
Flashback (ii)
  • Higher Order Naïve Bayes (HONB)

Fig. Forming a Higher Order Path Between Documents (D1-D2-D3)
4
Task One
  • Using PURE Higher Order Paths
  • Old Paths With Different Orders (2nd and 1st
    Orders)
  • New Filtered Higher Order Paths (Only 2nd Order)

Fig. Filtering Out Lower Order Paths
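One way to read "filtering out lower order paths" is to keep only paths whose endpooints do not also co-occur directly; that interpretation is an assumption for illustration, not necessarily the authors' exact procedure:

```python
from itertools import permutations

# Toy corpus; D4 shares terms with both D1 and D3, creating mixed-order paths.
docs = {
    "D1": {"apple", "banana"},
    "D2": {"banana", "cherry"},
    "D3": {"cherry", "date"},
    "D4": {"apple", "cherry"},
}

def paths(docs, pure_only=False):
    """Enumerate 2nd-order paths a-b-c. With pure_only, drop any path whose
    endpoints a and c also share a term directly (a mixed 1st/2nd-order path)."""
    out = []
    for a, b, c in permutations(docs, 3):
        if docs[a] & docs[b] and docs[b] & docs[c]:
            if pure_only and docs[a] & docs[c]:
                continue
            out.append((a, b, c))
    return out

unfiltered = paths(docs)
pure = paths(docs, pure_only=True)
```

Here `("D1", "D2", "D4")` appears only in `unfiltered`, because D1 and D4 already share "apple" directly, while `("D1", "D2", "D3")` survives the filter.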
6
Task One
  • 20 Newsgroups Dataset
  • Training/test set ratio: 25/475
  • 8 random trials for each subset

Table: Average Percentage Accuracy of Naïve Bayes, Filtered and Unfiltered HONB on Different Datasets
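The 25/475 split with 8 random trials can be mimicked with a stand-in classifier; `majority_baseline` and the toy pool below are placeholders, not the HONB implementation:

```python
import random
from collections import Counter

def majority_baseline(train, test):
    """Stand-in classifier: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)

def average_accuracy(pool, n_trials=8, n_train=25, seed=0):
    """Average accuracy over n_trials random train/test splits,
    mirroring the 8-random-trials-per-subset protocol above."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_trials):
        data = pool[:]
        rng.shuffle(data)
        train, test = data[:n_train], data[n_train:]
        accs.append(majority_baseline(train, test))
    return sum(accs) / len(accs)

# Toy pool: 500 labeled items, so each trial is a 25/475 split.
pool = [((i,), "A" if i % 3 else "B") for i in range(500)]
print(round(average_accuracy(pool), 3))
```

Swapping `majority_baseline` for a real classifier's train-and-score step reproduces the averaging used in the tables.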
7
Task One
Figures: Percentage Accuracy Comparisons for Filtered and Unfiltered HONB on Different Datasets
8
Task Two
  • Support Vector Machines (SVM)
  • Set of Related Supervised Learning Methods
  • Widely Used and Known for Good Performance

Table: Preliminary Percentage Accuracy of Naïve Bayes, SVM and Unfiltered HONB on Different Datasets
9
Task Two
  • Preliminary Results Use Default Values
  • Parameters of Interest
  • C: complexity parameter
  • Exponent and Lower_Order_Terms
  • Radial Basis Function kernel and Gamma value
  • Can We Get Better SVM Performance?

10
Task Two
  • Dataset: 20 Newsgroups; 8 random trials per dataset

Table: Parameter Table for Different Experiment
Setups. Note: '-' means set to the default value.
Defaults: Exponent = 1, Lower_Order_Terms = F, RBF = F, Gamma = 0.01
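The experiment setups can be enumerated programmatically. The parameter names below mirror the table (C, Exponent, Lower_Order_Terms, RBF, Gamma); treating them as a full Cartesian grid is an assumption for illustration, not the exact setups used:

```python
from itertools import product

# Hypothetical grid over the parameters named above; the values are examples
# that include the stated defaults (Exponent=1, Lower_Order_Terms=F, RBF=F,
# Gamma=0.01).
grid = {
    "C": [0.1, 0.5, 1.0],
    "exponent": [1, 2],                 # polynomial kernel degree
    "lower_order_terms": [False, True],
    "use_rbf": [False, True],
    "gamma": [0.01, 0.1],
}

def setups(grid):
    """Yield every parameter combination; polynomial-only options are
    cleared when the RBF kernel is selected, since they do not apply."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if cfg["use_rbf"]:
            cfg["exponent"] = None
            cfg["lower_order_terms"] = None
        yield cfg

configs = list(setups(grid))
print(len(configs))
```

Each `cfg` would then be passed to the SVM trainer and scored with the same 8-trial averaging as before.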
11
Task Two
  • Best results obtained with Setup 1, C in the 0.1 to 1.0 range

Figures: Accuracy Comparisons for Filtered HONB (average values) and SMO on Different Datasets
12
Future Work
  • Information Gain
  • Decision Trees
  • Apply to Higher Order Information for Building Better Models
  • Can We Do Better?

13
References
  • Ganiz and Pottenger. "A Novel Bayesian Classifier for Sparse Data" (draft, 2008).
  • Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco.
  • Platt, J. (1998). "Fast Training of Support Vector Machines using Sequential Minimal Optimization". In Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press.
  • Support Vector Machine (SVM): http://en.wikipedia.org/wiki/Support_vector_machine
  • 20 Newsgroups: http://people.csail.mit.edu/jrennie/20Newsgroups/
  • Information Gain: http://en.wikipedia.org/wiki/Information_gain_in_decision_trees

14
THANKS!
Special thanks to:
  • Professor Pottenger (my adviser)
  • Cibin George (graduate student of my adviser)
  • Murat Ganiz (for support and explanation in filtering out lower order paths from HONB)