Intimate Learning with Very Few Labeled Data
1
Intimate Learning with Very Few Labeled Data
CMPT-825 Course Project
  • Presenter: Zhongmin Shi
  • Advisor: Anoop Sarkar

2
Road Map
  • Motivation
  • Intimate learning
  • Course name identification
  • EM
  • Co-training
  • Evaluation
  • Related work
  • Conclusion

3
Motivation
  • Fully unsupervised learning is ambitious
  • Combining labeled and unlabeled data
  • Situations when labeled data are too few

4
An Information Extraction Task
  • Identify course names from web pages
  • Seed information: 10 web pages
  • Unlabeled training data: 174 web pages
  • Test data: 40 web pages

</HEAD> <BODY> <h1> CS - 664 Machine Vision
</h1> <dt><b>Course Staff</b>
5
One Common Solution
[Diagram: seed information supplies course names; a course-name feature model and an unlabeled-data probability model are trained by alternating parameter estimation and weight adjustment (setup); the trained feature model is then applied to a test-data probability model.]
  • Experimental results: Precision 3.11, Recall 0.31
  • Diverse word combinations
  • Too many sparse words
  • Too little seed information

6
Intimate Learning
  • Target class
  • Intimate class: a class on which the target class has some dependence
  • Easier to identify than the target class
  • Identify intimate classes first; they provide strong hints for target-class identification
  • Example: course numbers (intimate class) for course names (target class)
  • Why does an FSM not work for identifying course numbers?
  • WMX 3510 (a course number) vs. MON 10 2pm (a meeting time)
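The FSM difficulty can be shown with a quick pattern match: a finite-state rule such as "an uppercased word followed by an integer" accepts both strings on the slide, so the rule alone cannot separate course numbers from meeting times (a minimal illustrative sketch, not the presented system):

```python
import re

# A naive finite-state-style rule for course numbers:
# an uppercased word followed by a positive integer.
pattern = re.compile(r"\b[A-Z]+[\s-]*\d+\b")

# "WMX 3510" is a genuine course number, but "MON 10" (a meeting
# time) matches the very same pattern.
print(pattern.findall("WMX 3510, MON 10 2pm"))  # → ['WMX 3510', 'MON 10']
```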

7
Identification with Intimate Learning
[Diagram: seed information supplies course numbers and names; the course-number feature model is trained first against an unlabeled-data probability model (parameter estimation and weight adjustment), its predicted course numbers feed the course-name feature model, and both models are applied to test-data probability models.]
8
Course Number Identification
  • Initialize the course number feature model by seeds
  • Assumption 1: a course number consists of an uppercased word and a positive integer.
  • Course number features fi
  • Probability of being a feature of a course number

[Diagram: seed information initializes the feature model (setup).]
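Assumption 1 and the seed initialization can be sketched as follows. The slide does not define the feature set or the probability formula, so treating the candidate's uppercased word as the feature and estimating P0(f) as a seed-data fraction are both assumptions:

```python
import re
from collections import Counter

def candidates(text):
    """Strings matching Assumption 1: an uppercased word
    followed by a positive integer."""
    return re.findall(r"\b[A-Z]{2,}[\s-]*\d+\b", text)

def init_feature_model(seed_pages, seed_numbers):
    """Estimate initial feature probabilities P0(f) from seed
    pages.  Here a feature is simply the candidate's uppercased
    word (e.g. 'CS') -- an illustrative choice."""
    seen, positive = Counter(), Counter()
    for page in seed_pages:
        for cand in candidates(page):
            feat = re.match(r"[A-Z]+", cand).group()
            seen[feat] += 1
            if cand in seed_numbers:
                positive[feat] += 1
    # P0(f): fraction of seed candidates carrying f that are
    # true course numbers.
    return {f: positive[f] / seen[f] for f in seen}
```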
9
(Cont'd: Course Number Identification)
  • Train the feature model on unlabeled training data
  • An EM-based approach:
  • E-step: re-estimate the course-number probability model for each word sequence wi
  • M-step: adjust the feature weight for each feature probability P0(fi,j)
  • A 2-means clustering allows a rough convergence

[Diagram: the course-number feature model and the unlabeled-data probability model iterate via parameter estimation and weight adjustment.]
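The E-step/M-step loop and the 2-means rough-convergence device might look like the sketch below. The slide does not show the actual update formulas, so both update rules here are assumptions (mean of feature probabilities for the E-step, mean candidate score per feature for the M-step):

```python
def em_train(cands, feats, p_feat, iters=10):
    """Hypothetical reading of the slide's EM loop.
    cands : list of candidate word sequences w_i
    feats : mapping w_i -> list of its features
    p_feat: initial feature probabilities P0(f), updated in place
    """
    p_cand = {}
    for _ in range(iters):
        # E-step: re-estimate each candidate's probability of
        # being a course number from its features' probabilities.
        p_cand = {w: sum(p_feat[f] for f in feats[w]) / len(feats[w])
                  for w in cands}
        # M-step: adjust each feature's weight toward the mean
        # score of the candidates that carry it.
        for f in p_feat:
            scores = [p_cand[w] for w in cands if f in feats[w]]
            if scores:
                p_feat[f] = sum(scores) / len(scores)
    # 2-means clustering on the candidate scores gives a rough
    # accept/reject split (the slide's rough-convergence device).
    hi, lo = max(p_cand.values()), min(p_cand.values())
    for _ in range(20):
        high = [s for s in p_cand.values() if s - lo >= hi - s]
        low = [s for s in p_cand.values() if s - lo < hi - s]
        if high:
            hi = sum(high) / len(high)
        if low:
            lo = sum(low) / len(low)
    return {w for w, s in p_cand.items() if s - lo >= hi - s}
```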
10
(Cont'd: Course Number Identification)
  • Evaluate the feature model on test data
  • Predict course numbers: same as the E-step
  • Select course-number candidates by 2-means

11
Course Name Identification
  • Initialize the course name feature model by seeds
  • Assumption 2: a course name appears together with a course number.
  • Course name features fi:
  • f1: associated course number
  • f2: HTML format <ltag, rtag>
  • f3, f5: course name and its length
  • f4, f6: separator 2 and its length
  • Probability of being a feature of a course name

[Diagram: seed information initializes the feature model (setup).]
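The six feature types f1-f6 listed above can be gathered into a simple record; the slide does not give their exact encoding, so the value types below are illustrative assumptions:

```python
def name_features(name, course_number, ltag, rtag, separator):
    """Render the slide's six course-name feature types as a
    dict (illustrative encoding, not the presented one)."""
    return {
        "f1": course_number,          # associated course number
        "f2": (ltag, rtag),           # HTML format <ltag, rtag>
        "f3": name,                   # the course name itself
        "f5": len(name.split()),      # ... and its length in words
        "f4": separator,              # separator before the name
        "f6": len(separator),         # ... and its length in characters
    }
```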
12
(Cont'd: Course Name Identification)
  • Train the course name feature model on unlabeled training data
  • Similar EM-based approach:
  • E-step: re-estimate the course name probability model for each word sequence wi
  • M-step: adjust the feature weight for each feature probability P0(fi,j)

13
(Cont'd: Course Name Identification)
  • Test the course name feature model
  • Predict course names: same as the E-step
  • Select course-name candidates by 2-means
  • Two evaluation metrics:
  • Phrase-based
  • Word-based
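The two metrics can be sketched as follows; the slide does not define them precisely, so "exact phrase match" and "per-word overlap credit" are assumed readings:

```python
def prf(tp, n_pred, n_gold):
    """Precision, recall, and F-measure from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(predicted, reference):
    """Phrase-based: a predicted name scores only on an exact
    match.  Word-based: every overlapping word earns credit."""
    phrase = prf(len(set(predicted) & set(reference)),
                 len(predicted), len(reference))
    pw = [w for p in predicted for w in p.split()]
    gw = [w for g in reference for w in g.split()]
    overlap = sum(min(pw.count(w), gw.count(w)) for w in set(pw))
    word = prf(overlap, len(pw), len(gw))
    return phrase, word
```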

14
Evaluation
  • Precision-recall w.r.t. the number of EM iterations
  • Best F-measure: 47.2

[Figures: a. phrase-based; b. word-based]
15
(Cont'd: Evaluation)
  • The EM-based learning needs improvement
  • Problems with the unlabeled training data:
  • Not typical enough to match features of the test data?
  • Trial: an EM training between the feature models and the class probability models of the test data

16
(Cont'd: Evaluation)
[Diagram: as in slide 7, with an additional parameter-estimation and weight-adjustment loop between each feature model and its test-data probability model.]
17
(Cont'd: Evaluation)
  • Best F-measure: 48.6

[Figures: a. phrase-based; b. word-based]
18
Co-training Based Approach
  • Co-training (Blum and Mitchell, 1998):
  • Provides two different views of the data
  • The views influence each other to make the final prediction
  • In our approach:
  • Two dependent classes: course name and course number
  • Each class can provide features to the other to improve identification performance.
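The exchange between the two classes can be sketched as a co-training loop in the spirit of Blum and Mitchell (1998). The classifier interface (`predict`/`add_example`) and the confidence threshold below are hypothetical, not the presented implementation:

```python
def co_train(unlabeled, number_clf, name_clf, rounds=5, threshold=0.9):
    """Schematic co-training loop over two dependent classes:
    course numbers and course names."""
    numbers, names = set(), set()
    for _ in range(rounds):
        for page in unlabeled:
            # Each view labels the page, using the other view's
            # accumulated predictions as extra features ("hints").
            num, num_conf = number_clf.predict(page, hints=names)
            nam, nam_conf = name_clf.predict(page, hints=numbers)
            # Confident predictions from one view become
            # training examples for the other.
            if num_conf >= threshold:
                numbers.add(num)
                name_clf.add_example(page, num)
            if nam_conf >= threshold:
                names.add(nam)
                number_clf.add_example(page, nam)
    return numbers, names
```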

19
(Cont'd: Co-training Based Approach)
[Diagram: as in slide 16, but the course-number and course-name models exchange their predictions (course numbers and course names) with each other during both training and testing.]
20
Related Work
  • Scoped learning:
  • Takes advantage of local regularities when training global features
  • Co-boosting:
  • Uses labeled and unlabeled data to build two classifiers in parallel
  • Balances the disagreements between the two classifiers on the unlabeled examples
  • Co-training

21
Conclusion
  • Intimate learning significantly improves classification performance.
  • Applicable when the target classes are not totally independent
  • For weakly intimate classes, a co-training based approach would be helpful
  • Scoped learning helps train global features using local regularities
  • The EM-based algorithm needs further improvement

22
References
  • [1] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
  • [2] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189-196, 1995.
  • [3] CMU World Wide Knowledge Base (Web->KB) project. http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
  • [4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38, 1977.
  • [5] David Blei, Drew Bagnell, and Andrew McCallum. Learning with Scope, with Application to Information Extraction and Classification. Conference on Uncertainty in Artificial Intelligence (UAI), 2002.
  • [6] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Conference on Computational Learning Theory, 1998.
  • [7] Michael Collins and Yoram Singer. Unsupervised Models for Named Entity Classification. EMNLP/VLC-99, 1999.