Text classification: In Search of a Representation

1
Text classification: In Search of a Representation
  • Stan Matwin
  • School of Information Technology and Engineering
  • University of Ottawa
  • stan_at_site.uottawa.ca

2
Outline
  • Supervised learning (classification)
  • ML/DM at U of O
  • Classical approach
  • Attempt at a linguistic representation
  • N-grams: how to get them?
  • Labelling and co-learning
  • Next steps?

3
Supervised learning (classification)
  • Given
  • a set of training instances T = {<e, t>}, where each t
    is a class label: one of the classes C1, ..., Ck
  • a concept with k classes C1, ..., Ck (but the
    definition of the concept is NOT known)
  • Find
  • a description for each class which will perform
    well in determining (predicting) class membership
    for unseen instances

4
Classification
  • Prevalent practice
  • examples are represented as vectors of values
    of attributes
  • Theoretical wisdom, confirmed empirically: the more
    examples, the better the predictive accuracy

5
ML/DM at U of O
  • Learning from imbalanced classes: applications in
    remote sensing
  • A relational, rather than propositional,
    representation: learning the maintainability
    concept
  • Learning in the presence of background knowledge:
    Bayesian belief networks and how to get them.
    Application to distributed databases

6
Why text classification?
  • Automatic file saving
  • Internet filters
  • Recommenders
  • Information extraction

7
Text classification: standard approach
  1. Remove stop words and markings
  2. Remaining words are all attributes
  3. A document becomes a vector <word, frequency>
  4. Train a boolean classifier for each class
  5. Evaluate the results on an unseen sample

Bag of words
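
The five steps above can be sketched in a few lines. This is a minimal illustration only: the stop-word list, the tokenising regular expression, and the two sample documents are assumptions made for the example and do not come from the slides.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "to"}

def bag_of_words(text):
    """Return a Counter mapping each non-stop word to its frequency."""
    # Keeping only runs of letters also drops punctuation and markup symbols.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def to_vector(bag, vocabulary):
    """Lay the bag out as a frequency vector in a fixed word order."""
    return [bag[w] for w in vocabulary]

# Made-up sample documents.
docs = ["The price of wheat is rising.", "Wheat and corn exports to the EU."]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))          # every remaining word is an attribute
vectors = [to_vector(b, vocabulary) for b in bags]
```

Each vector could then be fed to one boolean classifier per class, as in step 4.
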
8
Text classification tools
  • RIPPER
  • A covering learner
  • Works well with large sets of binary features
  • Naïve Bayes
  • Efficient (no search)
  • Simple to program
  • Gives degree of belief
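
To make the Naïve Bayes bullets concrete, here is a small sketch of a multinomial Naïve Bayes text classifier. It assumes add-one (Laplace) smoothing and ignores words never seen in training; both choices, and the toy training data, are assumptions for illustration rather than details taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """docs: list of word lists; labels: parallel list of class names."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
    vocab = {w for words in docs for w in words}
    return class_counts, word_counts, vocab

def posterior(model, words):
    """Return {class: P(class | words)} using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for c in class_counts:
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        log_scores[c] = score
    # Normalise in log space to avoid underflow on long documents.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Toy data, made up for the example.
model = train_nb(
    [["wheat", "price"], ["wheat", "export"], ["goal", "match"], ["match", "score"]],
    ["grain", "grain", "sport", "sport"],
)
belief = posterior(model, ["wheat", "match", "price"])
```

Training is a single counting pass (no search), and `posterior` returns a probability per class rather than a bare label, which is the "degree of belief" mentioned above.
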

9
Prior art
  • Yang: best results using k-NN, 82.3%
    microaveraged accuracy
  • Joachims: results using Support Vector Machine and
    unlabelled data
  • SVM insensitive to high dimensionality,
    sparseness of examples

10
SVM in Text classification
SVM: maximum separation margin
Transductive SVM: maximum separation margin for the
test set
  • Training with 17 examples in the 10 most frequent
    categories gives test performance of 60% on 3000
    test cases available during training

11
Problem 1: aggressive feature selection
12
Problem 2: semantic relationships are missed
13
Proposed solution (Sam Scott)
  • Get noun phrases and/or key phrases (Extractor)
    and add to the feature list
  • Add hypernyms

14
Hypernyms - WordNet
15
Evaluation (Lewis)
  • Vary the loss ratio parameter
  • For each parameter value
  • Learn a hypothesis for each class (binary
    classification)
  • Micro-average the confusion matrices (add
    component-wise)
  • Compute precision and recall
  • Interpolate (or extrapolate) to find the point
    where micro- averaged precision and recall are
    equal

16
Results
  • No gain over bag of words (BW) from the alternative
    representations
  • But: comprehensibility

17
Combining classifiers
  • Comparable to best known results (Yang)

18
Other possibilities
  • Using hypernyms with a small training set (avoids
    ambiguous words)
  • Use Bayes + Ripper in a cascade scheme (Gama)
  • Other representations

19
Collocations
  • Do not need to be noun phrases, just pairs of
    words possibly separated by stop words
  • Only the well discriminating ones are chosen
  • These are added to the bag of words and given to
    Ripper
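
One way to read the collocation bullets is sketched here. The slides do not say how "well discriminating" is measured, so the score used below (the gap between the fraction of documents in each class that contain the pair) is an assumption, as are the stop-word list and the sample documents.

```python
STOP_WORDS = {"of", "the", "in", "a", "and", "to"}  # hypothetical, tiny list

def collocations(text):
    """Pairs of adjacent content words; stop words in between are skipped."""
    content = [w for w in text.lower().split() if w not in STOP_WORDS]
    return set(zip(content, content[1:]))

def discriminating(pos_docs, neg_docs, min_gap=0.5):
    """Keep pairs whose document-frequency rates in the two classes
    differ by at least min_gap (an assumed selection criterion)."""
    pos = [collocations(d) for d in pos_docs]
    neg = [collocations(d) for d in neg_docs]
    chosen = set()
    for pair in set().union(*pos, *neg):
        rate_pos = sum(pair in c for c in pos) / len(pos)
        rate_neg = sum(pair in c for c in neg) / len(neg)
        if abs(rate_pos - rate_neg) >= min_gap:
            chosen.add(pair)
    return chosen

finance = ["bank of canada raises rate", "rate set by bank of canada"]
nature = ["river bank flooded", "canada geese on the river bank"]
features = discriminating(finance, nature, min_gap=1.0)
```

The selected pairs would then be appended to each document's bag of words before learning.
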

20
N-grams
  • N-grams are substrings of a given length
  • Good results on Reuters (Mladenic, Grobelnik)
    with Bayes; we try RIPPER
  • A different task: classifying text files
  • Attachments
  • Audio/video
  • Coded
  • From n-grams to relational features

21
How to get good n-grams?
  • We use Ziv-Lempel for frequent substring
    detection (.gz!)

[Figure: Ziv-Lempel dictionary tree for the string abababa]
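
As a hedged illustration of how a Ziv-Lempel parse surfaces repeated substrings, the sketch below splits a string into dictionary phrases. The slides do not say which Ziv-Lempel variant was used; this example assumes an LZ78-style dictionary parse, which is not the LZ77-based scheme inside gzip.

```python
def lz78_phrases(text):
    """Parse text LZ78-style: each new phrase is the longest phrase already
    in the dictionary, extended by one more character."""
    seen = set()
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing piece that is already in the dictionary
        phrases.append(current)
    return phrases

phrases = lz78_phrases("abababa")  # the example string from this slide
```

Substrings that recur often grow into longer dictionary phrases, so the dictionary is a cheap source of candidate n-grams.
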
22
N-grams
  • Counting
  • Pruning
  • substring occurrence ratio < acceptance threshold
  • Building relations: string A almost always
    precedes string B
  • Feeding into relational learner (FOIL)
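
The counting, pruning, and relation-building steps might look like the sketch below. The exact definition of the occurrence ratio and of "almost always" is not given in the slides, so the share-of-all-n-grams ratio and the `min_ratio=0.9` cut-off are assumptions; feeding the relations to FOIL is not shown.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every substring of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def prune(counts, threshold):
    """Drop n-grams whose share of all n-gram occurrences is below
    the acceptance threshold."""
    total = sum(counts.values())
    return {g: c for g, c in counts.items() if c / total >= threshold}

def almost_always_precedes(a, b, docs, min_ratio=0.9):
    """True if, in at least min_ratio of the documents containing both
    strings, the first occurrence of a comes before the first of b."""
    both = [d for d in docs if a in d and b in d]
    if not both:
        return False
    ordered = sum(d.find(a) < d.find(b) for d in both)
    return ordered / len(both) >= min_ratio

kept = prune(ngram_counts("abababac", 2), threshold=0.2)

# Made-up file contents for the relation example.
files = ["%PDF-1.4 obj stream %%EOF", "%PDF-1.3 xref %%EOF"]
relation = almost_always_precedes("%PDF", "%%EOF", files)
```
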

23
Using grammar induction (text files)
  • Idea: detect patterns of substrings
  • Patterns are regular languages
  • Methods of automata induction: a recognizer for
    each class of files
  • We use a modified version of RPNI2 (Dupont,
    Miclet)

24
What's new
  • Work with marked up text (Word, Web)
  • XML with semantic tags: a mixed blessing for DM/TM
  • Co-learning
  • Text mining

25
Co-learning
  • How to use unlabelled data? Or, how to limit the
    number of examples that need to be labelled?
  • Two classifiers and two redundantly sufficient
    representations
  • Train both, run both on test set,
  • add best predictions to training set
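
A compact sketch of the loop described above, under several assumptions: both learners are small Naïve Bayes models (the slides do not fix the learner), each learner adds only its single most confident prediction per round, and the web-page examples are invented. Each example is a pair of word lists, one per representation.

```python
import math
from collections import Counter, defaultdict

def train(examples, view):
    """Naive Bayes counts over one view.
    examples: list of ((view0_words, view1_words), label)."""
    counts, priors = defaultdict(Counter), Counter()
    for views, label in examples:
        priors[label] += 1
        counts[label].update(views[view])
    return counts, priors

def predict(model, words):
    """Return (best_label, posterior of best_label), add-one smoothing."""
    counts, priors = model
    vocab = {w for c in counts.values() for w in c}
    n = sum(priors.values())
    scores = {}
    for label in priors:
        total = sum(counts[label].values())
        s = math.log(priors[label] / n)
        for w in words:
            if w in vocab:  # unseen words are ignored
                s += math.log((counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = s
    best = max(scores, key=scores.get)
    z = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / z

def co_train(labelled, unlabelled, rounds=5):
    """Each round, each view's learner labels its most confident example."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabelled:
                return labelled
            model = train(labelled, view)
            scored = [(predict(model, ex[view]), ex) for ex in unlabelled]
            (label, _conf), best = max(scored, key=lambda s: s[0][1])
            unlabelled.remove(best)
            labelled.append((best, label))
    return labelled

# Invented data: view 0 = anchor text, view 1 = page contents.
seed = [
    ((["prof", "homepage"], ["research", "publications", "teaching"]), "faculty"),
    ((["course", "page"], ["syllabus", "assignments", "lecture"]), "course"),
]
pool = [
    (["prof", "smith"], ["publications", "students"]),
    (["course", "cs101", "page"], ["lecture", "exam"]),
    (["smith", "lab"], ["research", "students"]),
]
grown = co_train(seed, pool)
```

Starting from two labelled pages, the training set grows to five as each learner hands its most confident prediction to the shared pool.
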

26
Co-learning
  • Training set grows as each learner predicts
    independently, thanks to redundant sufficiency
    (different representations)
  • Would it also work with our learners if we used
    Bayes?
  • Would work for classifying emails
27
Co-learning
  • Mitchell experimented with the task of
    classifying web pages (profs, students, courses,
    projects): a supervised learning task
  • Used
  • Anchor text
  • Page contents
  • Error rate halved (from 11% to 5%)

28
Cog-sci?
  • Co-learning seems to be cognitively justified
  • Model students learning in groups (pairs)
  • What other social learning mechanisms could
    provide models for supervised learning?

29
Conclusion
  • A practical task, needs a solution
  • No satisfactory solution so far
  • Fruitful ground for research