Some Discussion about Text Categorization - PowerPoint PPT Presentation

Provided by: Keii

1
Some Discussion about Text Categorization
  • Hsin-Chen Chiao
  • 11/29/2001

2
Flow Chart
[Flow chart: compares References 1, 2 and 3 by feature selection (CHI, IG, DF, MI, TS), features (WORD), classifier (SVM, kNN, LLSF, NNet, NB) and evaluation (Macro, Micro; s, p, r, t; Error; F measure; Accuracy; Recall; Precision; 11-pt AVGP)]
3
Feature Selection Methods
  • Document Frequency (DF): remove rare terms
  • Information Gain (IG)
  • Mutual Information (MI)
  • χ² statistic (CHI)
  • Term Strength (TS)

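  • Counts for a term t and a category c:
      A = documents that contain t and belong to c
      B = documents that contain t but do not belong to c
      C = documents that belong to c but do not contain t
      D = documents that neither contain t nor belong to c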
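A minimal Python sketch of how DF, MI, CHI and IG can be scored from four counts, assuming A = documents containing term t that belong to category c, B = documents containing t outside c, C = documents in c without t, and D = documents with neither. The χ² and MI formulas are the usual ones from the feature-selection literature (as commonly stated for Reference 1). IG is simplified here to the two-category case {c, not c}, whereas a multi-category IG sums over all categories. TS is left out because it needs similarities between document pairs. All function names are hypothetical.

```python
import math

# Hypothetical helpers; A, B, C, D are the term/category counts described above.
# Natural logarithms are used throughout.

def document_frequency(A, B, C, D):
    """DF(t): number of documents that contain t, whatever their category."""
    return A + B

def mutual_information(A, B, C, D):
    """MI(t, c) approximated as log(A * N / ((A + C) * (A + B)))."""
    N = A + B + C + D
    if A == 0:
        return float("-inf")  # t never co-occurs with c
    return math.log(A * N / ((A + C) * (A + B)))

def chi_square(A, B, C, D):
    """chi^2(t, c) = N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def _plogp(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0

def information_gain(A, B, C, D):
    """IG(t) for two categories {c, not c}: H(C) - H(C | t present / t absent)."""
    N = A + B + C + D
    p_t = (A + B) / N
    ig = -(_plogp((A + C) / N) + _plogp((B + D) / N))
    if A + B:
        ig += p_t * (_plogp(A / (A + B)) + _plogp(B / (A + B)))
    if C + D:
        ig += (1 - p_t) * (_plogp(C / (C + D)) + _plogp(D / (C + D)))
    return ig
```

For example, chi_square(40, 10, 10, 40) returns 36.0, while counts that show no association between t and c, such as (25, 25, 25, 25), give 0.0 for both χ² and IG.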
4
Results
5
Feature Selection Discussion
  • Favoring common terms or rare terms
  • Task sensitive or task free
  • Using term absence to predict the category
    probability
  • The performance of kNN and LLSF

6
Classifier Methods
  • Support Vector Machine
  • k-Nearest Neighbor
  • Linear Least Squares Fit
  • Neural Network
  • Naïve Bayes
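Of these classifiers, kNN is the simplest to sketch for category ranking. A minimal Python version follows, assuming documents are sparse {term: weight} dictionaries with weights already computed, cosine similarity, and a category score equal to the summed similarity of the k nearest training documents that belong to that category. Per-category decision thresholds are not shown. The function names are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_category_scores(doc, training, k):
    """Score categories for `doc` from its k most similar training documents.

    training: list of (vector, set_of_categories) pairs.
    Each neighbour adds its similarity to every category it belongs to.
    """
    sims = [(cosine(doc, vec), cats) for vec, cats in training]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, cats in sims[:k]:
        for c in cats:
            scores[c] += sim
    return dict(scores)

training = [
    ({"wheat": 1.0, "grain": 1.0}, {"grain"}),
    ({"oil": 1.0, "crude": 1.0}, {"crude"}),
    ({"wheat": 1.0}, {"grain", "wheat"}),
]
scores = knn_category_scores({"wheat": 1.0}, training, k=2)
ranking = sorted(scores, key=scores.get, reverse=True)  # ['grain', 'wheat']
```

Sorting the categories by score yields the ranked list used by the category-ranking evaluation. Thresholding the scores instead yields binary decisions.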

7
Results
  • Micro-level (s-test)
  • SVM > kNN >> {LLSF, NNet} >> NB
  • Macro-level (S-test, T-test, Ts test)
  • {SVM, kNN, LLSF} >> {NB, NNet}
  • P-test
  • {SVM, kNN} > LLSF > NNet >> NB
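The micro-level s-test is a sign test. The Python sketch below assumes it compares two systems' correctness on each (document, category) decision. Only the decisions where the systems disagree are counted, and the one-sided p-value is an exact binomial tail with p = 0.5 rather than a normal approximation. The function name is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb

def sign_test(a_correct, b_correct):
    """One-sided sign test that system A beats system B.

    a_correct, b_correct: parallel lists of booleans, one entry per
    (document, category) decision, True when that system's decision is right.
    Returns P(K >= wins) for K ~ Binomial(n, 0.5), n = decisions where they differ.
    """
    wins = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    losses = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    n = wins + losses
    if n == 0:
        return 1.0  # the systems never disagree: no evidence either way
    return sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n

a = [True] * 9 + [False] + [True] * 5
b = [False] * 9 + [True] + [True] * 5
p_value = sign_test(a, b)  # 9 wins, 1 loss, 5 ties -> 11/1024, about 0.0107
```

Ties, where both systems are right or both are wrong, do not affect the result.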

8
Evaluation Method: Category Ranking
  • Recall, Precision
  • 11-point average precision
  • For each document, compute recall (r) and precision (p) at each position in the ranked category list
  • For each recall threshold from 0% to 100% (in steps of 10%), find the highest precision value
  • For the r = 100% threshold, use either 0 or the value
  • Average over documents to get 11 per-threshold average precisions
  • Average these 11 values to get the 11-point average precision
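The steps above can be sketched in Python. The sketch assumes the standard interpolation, in which the precision at a recall threshold is the highest precision reached at any recall at or above it. A threshold that is never reached scores 0, and a document with no true categories also scores 0 at every threshold. Both of these are choices made for the sketch. Function names are hypothetical.

```python
def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one document.

    ranked: categories in the order the system ranked them.
    relevant: set of the document's true categories.
    """
    if not relevant:
        return [0.0] * 11  # assumption: documents without categories score 0
    # Precision can only peak at ranks where a relevant category appears,
    # so only those ranks are recorded as (recall, precision) points.
    points, hits = [], 0
    for rank, cat in enumerate(ranked, start=1):
        if cat in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    per_level = []
    for j in range(11):
        level = j / 10
        # highest precision at a recall at or above this threshold, else 0
        candidates = [p for r, p in points if r >= level]
        per_level.append(max(candidates) if candidates else 0.0)
    return per_level

def eleven_point_avgp(docs):
    """docs: list of (ranked, relevant) pairs. Returns the 11-point AVGP."""
    per_doc = [eleven_point_precision(ranked, rel) for ranked, rel in docs]
    per_level_avg = [sum(col) / len(per_doc) for col in zip(*per_doc)]
    return sum(per_level_avg) / 11

print(eleven_point_avgp([(["a", "b", "c", "d"], {"a", "c"})]))  # 28/33, about 0.848
```

In this example, the two true categories appear at ranks 1 and 3. Precision is therefore 1.0 for the thresholds up to 50% recall and 2/3 for the five thresholds above that.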

9
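Evaluation Method: Binary Classifier
  • a: correctly assigned, b: incorrectly assigned, c: incorrectly rejected, d: correctly rejected, n = a + b + c + d
  • Recall r = a/(a+c)
  • Precision p = a/(a+b)
  • Fallout f = b/(b+d)
  • Accuracy Acc = (a+d)/n
  • Error Err = (b+c)/n (1.2/93 ≈ 1.3%)
  • F1 = 2rp/(r+p)
  • BEP (break-even point, where recall = precision)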
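A minimal Python sketch of these per-category measures, together with the two ways of averaging them over categories that correspond to the micro and macro levels in the results. Setting undefined ratios with zero denominators to 0.0 is an assumption made here. BEP is not computed, because it requires sweeping a decision threshold over ranked scores. Function names are hypothetical.

```python
def binary_measures(a, b, c, d):
    """Per-category measures from the 2x2 decision table.

    a: correctly assigned, b: incorrectly assigned,
    c: incorrectly rejected, d: correctly rejected.
    Undefined ratios (zero denominators) are set to 0.0, an assumption.
    """
    n = a + b + c + d
    r = a / (a + c) if a + c else 0.0
    p = a / (a + b) if a + b else 0.0
    f = b / (b + d) if b + d else 0.0
    f1 = 2 * r * p / (r + p) if r + p else 0.0
    return {"r": r, "p": p, "f": f, "Acc": (a + d) / n, "Err": (b + c) / n, "F1": f1}

def macro_micro_f1(tables):
    """tables: one (a, b, c, d) tuple per category.

    Macro-averaging: average the per-category F1 values.
    Micro-averaging: sum the tables first, then compute F1 once.
    """
    macro = sum(binary_measures(*t)["F1"] for t in tables) / len(tables)
    pooled = tuple(sum(col) for col in zip(*tables))
    micro = binary_measures(*pooled)["F1"]
    return macro, micro

macro, micro = macro_micro_f1([(8, 2, 2, 88), (0, 1, 1, 98)])  # 0.4, 8/11
```

Micro-averaging weights every decision equally, so common categories dominate it. Macro-averaging weights every category equally, which is why a poor result on a rare category pulls the macro score down much more in this example.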

10
Results
11
Summary
  • Feature selection: CHI, IG, DF(90%)
  • Classifier: SVM, kNN, LLSF
  • Naïve Bayes
  • Evaluation: 11-point AVGP, BEP

12
Reference
  • 1. Yiming Yang, Jan O. Pedersen. A comparative study on feature selection in text categorization.
  • 2. Yiming Yang, Xin Liu. A re-examination of text categorization methods.
  • 3. Yiming Yang. An evaluation of statistical approaches to text categorization.