Text classification: In Search of a Representation

1
Text classification: In Search of a Representation

Stan Matwin
School of Information Technology and Engineering
University of Ottawa
stan_at_site.uottawa.ca

2
Outline

Supervised learning (classification)
ML/DM at U of O
Classical approach
Attempt at a linguistic representation
N-grams: how to get them?
Labelling and co-learning
Next steps?

3
Supervised learning (classification)

Given
a set of training instances T = {(e, t)}, where each t
is a class label, one of the classes C1, ..., Ck
a concept with k classes C1, ..., Ck (but the
definition of the concept is NOT known)
Find
a description for each class which will perform
well in determining (predicting) class membership
for unseen instances

4
Classification

Prevalent practice
examples are represented as vectors of values
of attributes
Theoretical wisdom, confirmed empirically:
the more examples, the better the predictive accuracy

5
ML/DM at U of O

Learning from imbalanced classes: applications in
remote sensing
A relational, rather than propositional,
representation: learning the maintainability
concept
Learning in the presence of background knowledge:
Bayesian belief networks and how to get them.
Application to distributed databases

6
Why text classification?

Automatic file saving
Internet filters
Recommenders
Information extraction

7
Text classification: standard approach

Remove stop words and markings
remaining words are all attributes
A document becomes a vector <word, frequency>
Train a boolean classifier for each class
Evaluate the results on an unseen sample

Bag of words
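
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the stop-word list, the tokenizing regular expressions, and the name bag_of_words are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "to"}

def bag_of_words(text):
    """Map a document to <word, frequency> pairs, ignoring stop words and markup."""
    no_markup = re.sub(r"<[^>]+>", " ", text)           # drop tags ("markings")
    tokens = re.findall(r"[a-z]+", no_markup.lower())   # crude tokenizer
    return Counter(t for t in tokens if t not in STOP_WORDS)

doc = "<b>The price</b> of wheat and the price of corn"
vector = bag_of_words(doc)
```

Each remaining word becomes one attribute, so the dimensionality of the vector equals the size of the vocabulary after stop-word removal.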
8
Text classification tools

RIPPER
A covering learner
Works well with large sets of binary features
Naïve Bayes
Efficient (no search)
Simple to program
Gives degree of belief
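
To illustrate why Naïve Bayes needs no search and returns a degree of belief, here is a small multinomial variant with add-one smoothing. It is a sketch under assumptions: the function names, the toy training data, and the choice of the multinomial model are not taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label). Training is counting only, no search."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def posterior(model, words):
    """Return P(class | words) for every class, with add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise in a numerically stable way to get a degree of belief per class.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

# Hypothetical toy data, for illustration only.
training = [
    (["wheat", "price", "export"], "grain"),
    (["corn", "wheat", "harvest"], "grain"),
    (["bank", "interest", "rate"], "money"),
    (["rate", "dollar", "bank"], "money"),
]
model = train_nb(training)
belief = posterior(model, ["wheat", "harvest"])
```

The returned dictionary sums to one, so each value can be read directly as the classifier's belief in that class.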

9
Prior art

Yang: best results using k-NN, 82.3% microaveraged
accuracy
Joachims: results using Support Vector Machine plus
unlabelled data
SVM is insensitive to high dimensionality and
sparseness of examples

10
SVM in Text classification
SVM
Transductive SVM: maximum separation margin for the
test set

Training with 17 examples in the 10 most frequent
categories gives test performance of 60% on 3000
test cases available during training

11
Problem 1: aggressive feature selection
12
Problem 2: semantic relationships are missed
13
Proposed solution (Sam Scott)

Get noun phrases and/or key phrases (Extractor)
and add to the feature list
Add hypernyms

14
Hypernyms - WordNet
15
Evaluation (Lewis)

Vary the loss ratio parameter
For each parameter value
Learn a hypothesis for each class (binary
classification)
Micro-average the confusion matrices (add
component-wise)
Compute precision and recall
Interpolate (or extrapolate) to find the point
where micro-averaged precision and recall are
equal
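
The evaluation procedure above can be sketched as follows. The confusion counts are made-up numbers for two hypothetical loss-ratio settings, the function names are illustrative, and only linear interpolation is shown (extrapolation is left out of this sketch).

```python
def micro_average(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per binary classifier.
    The confusion matrices are added component-wise before computing the metrics."""
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    return tp / (tp + fp), tp / (tp + fn)   # (precision, recall)

def breakeven(points):
    """points: (precision, recall) pairs, one per loss-ratio value.
    Interpolate linearly between the two neighbouring points where
    precision - recall changes sign. Returns None if there is no crossing."""
    pts = sorted(points, key=lambda pr: pr[1])  # order by recall
    for (p0, r0), (p1, r1) in zip(pts, pts[1:]):
        d0, d1 = p0 - r0, p1 - r1
        if d0 == 0:
            return p0
        if d1 == 0:
            return p1
        if d0 * d1 < 0:
            t = d0 / (d0 - d1)          # position of the crossing on the segment
            return p0 + t * (p1 - p0)
    return None

# Hypothetical (tp, fp, fn) counts for two classes at two loss-ratio settings.
setting_a = [(40, 5, 20), (20, 5, 20)]
setting_b = [(50, 30, 10), (30, 20, 10)]
points = [micro_average(setting_a), micro_average(setting_b)]
be = breakeven(points)
```

Micro-averaging weights every document decision equally, so frequent classes dominate the resulting figure.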

16
Results

No gain over bag of words (BW) in alternative representations
But
Comprehensibility

17
Combining classifiers

Comparable to best known results (Yang)

18
Other possibilities

Using hypernyms with a small training set (avoids
ambiguous words)
Use Bayes + Ripper in a cascade scheme (Gama)
Other representations

19
Collocations

Do not need to be noun phrases, just pairs of
words, possibly separated by stop words
Only the well discriminating ones are chosen
These are added to the bag of words, and
Ripper

20
N-grams

N-grams are substrings of a given length
Good results on Reuters (Mladenic, Grobelnik)
with Bayes; we try RIPPER
A different task: classifying text files
Attachments
Audio/video
Coded
From n-grams to relational features
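
As a concrete reading of the definition at the top of this slide, the following sketch counts all substrings of a given length. The function name char_ngrams is illustrative.

```python
from collections import Counter

def char_ngrams(text, n):
    """All substrings of length n, with their counts."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

grams = char_ngrams("abababa", 3)
```

Because the n-grams are taken over raw characters, the same function applies to files that are not natural-language text.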

21
How to get good n-grams?

We use Ziv-Lempel for frequent substring
detection (.gz!)

[Figure: tree of substrings built from the example string abababa, with branches labelled a and b]
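
A rough sketch of how a Ziv-Lempel parse surfaces repeated substrings, using the slide's example string. The slide does not say which variant was used (gzip itself is based on LZ77); the simpler LZ78 dictionary parse is assumed here purely for illustration, and the function name is hypothetical.

```python
def lz78_phrases(text):
    """Split text into LZ78 phrases: each new phrase is the shortest prefix of
    the remaining input that is not yet in the dictionary."""
    dictionary = set()
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:               # leftover that is already in the dictionary
        phrases.append(current)
    return phrases

phrases = lz78_phrases("abababa")
```

Phrases grow longer only when their prefixes have already been seen, so long dictionary entries point to substrings that recur in the input.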
22
N-grams

Counting
Pruning
substring occurrence ratio < acceptance threshold
Building relations: string A almost always
precedes string B
Feeding into relational learner (FOIL)
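
The counting, pruning, and relation-building steps might look like the sketch below. The slides do not define the occurrence ratio or "almost always", so the definitions used here (fraction of files containing the substring, and a 0.9 cut-off), the function names, and the toy files are all assumptions.

```python
from itertools import permutations

def occurrence_ratio(substring, docs):
    """Fraction of documents that contain the substring (one possible definition)."""
    return sum(substring in d for d in docs) / len(docs)

def prune(candidates, docs, threshold):
    """Drop candidates whose occurrence ratio is below the acceptance threshold."""
    return [s for s in candidates if occurrence_ratio(s, docs) >= threshold]

def precedes_relations(substrings, docs, min_fraction=0.9):
    """Emit (A, B) when A's first occurrence comes before B's in at least
    min_fraction of the documents that contain both."""
    relations = []
    for a, b in permutations(substrings, 2):
        both = [d for d in docs if a in d and b in d]
        if not both:
            continue
        ahead = sum(d.find(a) < d.find(b) for d in both)
        if ahead / len(both) >= min_fraction:
            relations.append((a, b))
    return relations

# Hypothetical file contents, for illustration only.
docs = ["HEADER..BODY..END", "HEADER.BODY", "HEADER-END", "BODY"]
kept = prune(["HEADER", "BODY", "END", "XYZ"], docs, threshold=0.5)
facts = precedes_relations(kept, docs)
```

Each resulting pair could then be written out as a ground fact such as precedes(A, B) for a relational learner.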

23
Using grammar induction (text files)

Idea: detect patterns of substrings
Patterns are regular languages
Methods of automata induction: a recognizer for
each class of files
We use a modified version of RPNI2 (Dupont,
Miclet)

24
What's new

Work with marked up text (Word, Web)
XML with semantic tags: mixed blessing for DM/TM
Co-learning
Text mining

25
Co-learning

How to use unlabelled data? Or: how to limit the
number of examples that need to be labelled?
Two classifiers and two redundantly sufficient
representations
Train both, run both on test set,
add best predictions to training set
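
The loop described on this slide can be sketched as below. Everything specific is an assumption: the toy WordVoteClassifier stands in for whatever learners are actually used, the confidence threshold and round limit are arbitrary, the predictions are made on a pool of unlabelled examples, and the two views in the sample data are invented.

```python
from collections import Counter, defaultdict

class WordVoteClassifier:
    """Toy stand-in for a real learner: each known word votes for its classes."""
    def fit(self, docs, labels):
        self.votes = defaultdict(Counter)
        for words, label in zip(docs, labels):
            for w in words:
                self.votes[w][label] += 1
        return self

    def predict(self, words):
        """Return (label, confidence); confidence is the winner's share of votes."""
        tally = Counter()
        for w in words:
            tally.update(self.votes.get(w, {}))
        if not tally:
            return None, 0.0
        label, top = tally.most_common(1)[0]
        return label, top / sum(tally.values())

def co_train(labelled, unlabelled, rounds=5, min_conf=0.8):
    """labelled: list of ((view1, view2), label); unlabelled: list of (view1, view2)."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        labels = [y for _, y in labelled]
        c1 = WordVoteClassifier().fit([x[0] for x, _ in labelled], labels)
        c2 = WordVoteClassifier().fit([x[1] for x, _ in labelled], labels)
        # Each classifier proposes its single most confident unlabelled example.
        picks = {}
        for clf, view in ((c1, 0), (c2, 1)):
            scored = [(clf.predict(x[view]), i) for i, x in enumerate(pool)]
            scored = [(conf, i, lab) for (lab, conf), i in scored if conf >= min_conf]
            if scored:
                conf, i, lab = max(scored)
                picks.setdefault(i, lab)
        if not picks:
            break
        labelled += [(pool[i], lab) for i, lab in picks.items()]
        pool = [x for i, x in enumerate(pool) if i not in picks]
    return c1, c2, labelled

# Hypothetical data: view 1 = anchor text, view 2 = page contents.
labelled = [
    ((["prof", "homepage"], ["research", "publications"]), "faculty"),
    ((["course", "page"], ["syllabus", "homework"]), "course"),
]
unlabelled = [
    (["prof", "smith"], ["teaching", "research", "grants"]),
    (["cs101"], ["syllabus", "exam", "homework"]),
    (["smith"], ["grants", "lab"]),
]
c1, c2, grown = co_train(labelled, unlabelled)
```

In this toy run the third unlabelled page shares no words with the initial training set; it can be labelled only after the first page has been added, which is the effect co-learning relies on.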

26
Co-learning

Training set grows as
each learner predicts independently, due to
redundant sufficiency (different representations)
Would it also work with our learners if we used
Bayes?
Would work with classifying emails

27
Co-learning

Mitchell experimented with the task of
classifying web pages (profs, students, courses,
projects): a supervised learning task
Used
Anchor text
Page contents
Error rate halved (from 11% to 5%)

28
Cog-sci?

Co-learning seems to be cognitively justified
Model students learning in groups (pairs)
What other social learning mechanisms could
provide models for supervised learning?

29
Conclusion