Intimate Learning with Very Few Labeled Data
1
Intimate Learning with Very Few Labeled Data
CMPT-825 Course Project
  • Presenter: Zhongmin Shi
  • Advisor: Anoop Sarkar

2
Road Map
  • Motivation
  • Intimate learning
  • Course name identification
  • EM
  • Co-training
  • Evaluation
  • Related work
  • Conclusion

3
Motivation
  • Fully unsupervised learning is ambitious
  • Combining labeled and unlabeled data
  • Situations when labeled data are too few

4
An Information Extraction Task
  • Identify course names from web pages
  • Seed information: 10 web pages
  • Unlabeled training data: 174 web pages
  • Test data: 40 web pages

</HEAD> <BODY> <h1> CS - 664 Machine Vision
</h1> <dt><b>Course Staff</b>
5
One Common Solution
[Diagram: seed information supplies course names; a course-name feature model and an unlabeled-data probability model are trained by alternating parameter estimation and weight adjustment (setup); the trained feature model is then applied to a test-data probability model.]
  • Experimental results: Precision 3.11, Recall 0.31
  • Diverse word combinations
  • Too many sparse words
  • Too little seed information

6
Intimate Learning
  • Target class
  • Intimate class: a class on which the target class has some dependence
  • Easier to identify than the target class
  • Identify intimate classes first; they provide strong hints for target-class identification
  • Example: course numbers (intimate class) for course names (target class)
  • Why does an FSM not work for identifying course numbers?
  • WMX 3510 (a course number) vs. MON 10 2pm (a meeting time)
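The FSM difficulty can be shown with a quick pattern match: a finite-state rule such as "an uppercased word followed by an integer" accepts both strings on the slide, so the rule alone cannot separate course numbers from meeting times (a minimal illustrative sketch, not the presented system):

```python
import re

# A naive finite-state-style rule for course numbers:
# an uppercased word followed by a positive integer.
pattern = re.compile(r"\b[A-Z]+[\s-]*\d+\b")

# "WMX 3510" is a genuine course number, but "MON 10" (a meeting
# time) matches the very same pattern.
print(pattern.findall("WMX 3510, MON 10 2pm"))  # → ['WMX 3510', 'MON 10']
```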

7
Identification with Intimate Learning
[Diagram: seed information supplies course numbers and names; the course-number feature model is trained first against an unlabeled-data probability model (parameter estimation and weight adjustment), its predicted course numbers feed the course-name feature model, and both models are applied to test-data probability models.]
8
Course Number Identification
  • Initialize the course number feature model by seeds
  • Assumption 1: a course number consists of an uppercased word and a positive integer.
  • Course number features fi
  • Probability of being a feature of a course number

[Diagram: seed information initializes the feature model (setup).]
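Assumption 1 and the seed initialization can be sketched as follows. The slide does not define the feature set or the probability formula, so treating the candidate's uppercased word as the feature and estimating P0(f) as a seed-data fraction are both assumptions:

```python
import re
from collections import Counter

def candidates(text):
    """Strings matching Assumption 1: an uppercased word
    followed by a positive integer."""
    return re.findall(r"\b[A-Z]{2,}[\s-]*\d+\b", text)

def init_feature_model(seed_pages, seed_numbers):
    """Estimate initial feature probabilities P0(f) from seed
    pages.  Here a feature is simply the candidate's uppercased
    word (e.g. 'CS') -- an illustrative choice."""
    seen, positive = Counter(), Counter()
    for page in seed_pages:
        for cand in candidates(page):
            feat = re.match(r"[A-Z]+", cand).group()
            seen[feat] += 1
            if cand in seed_numbers:
                positive[feat] += 1
    # P0(f): fraction of seed candidates carrying f that are
    # true course numbers.
    return {f: positive[f] / seen[f] for f in seen}
```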
9
(Cont'd: Course Number Identification)
  • Train the feature model on unlabeled training data
  • An EM-based approach:
  • E-step: re-estimate the course-number probability model for each word sequence wi
  • M-step: adjust the feature weight for each feature probability P0(fi,j)
  • A 2-means clustering allows a rough convergence

[Diagram: the course-number feature model and the unlabeled-data probability model iterate via parameter estimation and weight adjustment.]
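The E-step/M-step loop and the 2-means rough-convergence device might look like the sketch below. The slide does not show the actual update formulas, so both update rules here are assumptions (mean of feature probabilities for the E-step, mean candidate score per feature for the M-step):

```python
def em_train(cands, feats, p_feat, iters=10):
    """Hypothetical reading of the slide's EM loop.
    cands : list of candidate word sequences w_i
    feats : mapping w_i -> list of its features
    p_feat: initial feature probabilities P0(f), updated in place
    """
    p_cand = {}
    for _ in range(iters):
        # E-step: re-estimate each candidate's probability of
        # being a course number from its features' probabilities.
        p_cand = {w: sum(p_feat[f] for f in feats[w]) / len(feats[w])
                  for w in cands}
        # M-step: adjust each feature's weight toward the mean
        # score of the candidates that carry it.
        for f in p_feat:
            scores = [p_cand[w] for w in cands if f in feats[w]]
            if scores:
                p_feat[f] = sum(scores) / len(scores)
    # 2-means clustering on the candidate scores gives a rough
    # accept/reject split (the slide's rough-convergence device).
    hi, lo = max(p_cand.values()), min(p_cand.values())
    for _ in range(20):
        high = [s for s in p_cand.values() if s - lo >= hi - s]
        low = [s for s in p_cand.values() if s - lo < hi - s]
        if high:
            hi = sum(high) / len(high)
        if low:
            lo = sum(low) / len(low)
    return {w for w, s in p_cand.items() if s - lo >= hi - s}
```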
10
(Cont'd: Course Number Identification)
  • Evaluate the feature model on test data
  • Predict course numbers: same as the E-step
  • Select course-number candidates by 2-means

11
Course Name Identification
  • Initialize the course name feature model by seeds
  • Assumption 2: a course name appears together with a course number.
  • Course name features fi:
  • f1: associated course number
  • f2: HTML format <ltag, rtag>
  • f3, f5: course name and its length
  • f4, f6: separator 2 and its length
  • Probability of being a feature of a course name

[Diagram: seed information initializes the feature model (setup).]
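The six feature types f1-f6 listed above can be gathered into a simple record; the slide does not give their exact encoding, so the value types below are illustrative assumptions:

```python
def name_features(name, course_number, ltag, rtag, separator):
    """Render the slide's six course-name feature types as a
    dict (illustrative encoding, not the presented one)."""
    return {
        "f1": course_number,          # associated course number
        "f2": (ltag, rtag),           # HTML format <ltag, rtag>
        "f3": name,                   # the course name itself
        "f5": len(name.split()),      # ... and its length in words
        "f4": separator,              # separator before the name
        "f6": len(separator),         # ... and its length in characters
    }
```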
12
(Cont'd: Course Name Identification)
  • Train the course name feature model on unlabeled training data
  • Similar EM-based approach:
  • E-step: re-estimate the course name probability model for each word sequence wi
  • M-step: adjust the feature weight for each feature probability P0(fi,j)

13
(Cont'd: Course Name Identification)
  • Test the course name feature model
  • Predict course names: same as the E-step
  • Select course-name candidates by 2-means
  • Two evaluation metrics:
  • Phrase-based
  • Word-based
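The two metrics can be sketched as follows; the slide does not define them precisely, so "exact phrase match" and "per-word overlap credit" are assumed readings:

```python
def prf(tp, n_pred, n_gold):
    """Precision, recall, and F-measure from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(predicted, reference):
    """Phrase-based: a predicted name scores only on an exact
    match.  Word-based: every overlapping word earns credit."""
    phrase = prf(len(set(predicted) & set(reference)),
                 len(predicted), len(reference))
    pw = [w for p in predicted for w in p.split()]
    gw = [w for g in reference for w in g.split()]
    overlap = sum(min(pw.count(w), gw.count(w)) for w in set(pw))
    word = prf(overlap, len(pw), len(gw))
    return phrase, word
```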

14
Evaluation
  • Precision-recall w.r.t. the number of EM iterations
  • Best F-measure: 47.2

[Figures: a. phrase-based; b. word-based]
15
(Cont'd: Evaluation)
  • The EM-based learning needs improvement
  • Problems with the unlabeled training data:
  • Not typical enough to match features of the test data?
  • Trial: an EM training between the feature models and the class probability models of the test data

16
(Cont'd: Evaluation)
[Diagram: as in slide 7, with an additional parameter-estimation and weight-adjustment loop between each feature model and its test-data probability model.]
17
(Cont'd: Evaluation)
  • Best F-measure: 48.6

[Figures: a. phrase-based; b. word-based]
18
Co-training Based Approach
  • Co-training (Blum and Mitchell, 1998):
  • Provides two different views of the data
  • The views influence each other to make the final prediction
  • In our approach:
  • Two dependent classes: course name and course number
  • Each class can provide features to the other to improve identification performance.
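The exchange between the two classes can be sketched as a co-training loop in the spirit of Blum and Mitchell (1998). The classifier interface (`predict`/`add_example`) and the confidence threshold below are hypothetical, not the presented implementation:

```python
def co_train(unlabeled, number_clf, name_clf, rounds=5, threshold=0.9):
    """Schematic co-training loop over two dependent classes:
    course numbers and course names."""
    numbers, names = set(), set()
    for _ in range(rounds):
        for page in unlabeled:
            # Each view labels the page, using the other view's
            # accumulated predictions as extra features ("hints").
            num, num_conf = number_clf.predict(page, hints=names)
            nam, nam_conf = name_clf.predict(page, hints=numbers)
            # Confident predictions from one view become
            # training examples for the other.
            if num_conf >= threshold:
                numbers.add(num)
                name_clf.add_example(page, num)
            if nam_conf >= threshold:
                names.add(nam)
                number_clf.add_example(page, nam)
    return numbers, names
```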

19
(Cont'd: Co-training Based Approach)
[Diagram: as in slide 16, but the course-number and course-name models exchange their predictions (course numbers and course names) with each other during both training and testing.]
20
Related Work
  • Scoped learning:
  • Takes advantage of local regularities when training global features
  • Co-boosting:
  • Uses labeled and unlabeled data to build two classifiers in parallel
  • Balances the disagreements between the two classifiers on the unlabeled examples
  • Co-training

21
Conclusion
  • Intimate learning significantly improves classification performance.
  • Applicable when the target classes are not totally independent
  • For weakly intimate classes, a co-training based approach would be helpful
  • Scoped learning helps train global features using local regularities
  • The EM-based algorithm needs further improvement

22
References
  • [1] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
  • [2] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189-196, 1995.
  • [3] CMU World Wide Knowledge Base (Web->KB) project. http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
  • [4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38, 1977.
  • [5] David Blei, Drew Bagnell, and Andrew McCallum. Learning with Scope, with Application to Information Extraction and Classification. Conference on Uncertainty in Artificial Intelligence (UAI), 2002.
  • [6] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Conference on Computational Learning Theory, 1998.
  • [7] Michael Collins and Yoram Singer. Unsupervised Models for Named Entity Classification. EMNLP/VLC-99, 1999.