1
Course Summary
  • LING 572
  • Fei Xia
  • 03/06/07

2
Outline
  • Problem description
  • General approach
  • ML algorithms
  • Important concepts
  • Assignments
  • What's next?

3
Problem descriptions
4
Two types of problems
  • Classification problem
  • Sequence Labeling problem
  • In both cases
  • A predefined set of labels C = {c1, c2, ..., cn}
  • Training data: (xi, yi), where yi ∈ C, and yi is
    known or unknown.
  • Test data

5
NLP tasks
  • Classification problems
  • Document classification
  • Spam detection
  • Sentiment analysis
  • Sequence labeling problems
  • POS tagging
  • Word segmentation
  • Sentence segmentation
  • NE detection
  • Parsing
  • IGT detection

6
General approach
7
Step 1: Preprocessing
  • Converting the NLP task to a classification or
    sequence labeling problem
  • Creating the attribute-value table
  • Define feature templates
  • Instantiate feature templates and select features
  • Decide what kind of feature values to use (e.g.,
    binarizing features or not; sketch below)
  • Converting a multi-class problem to a binary
    problem (optional)
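
As an illustration of this step, here is a minimal sketch in Python (not course or Mallet code; the template names curWord/prevWord/prevTag are made up) of instantiating feature templates for one instance and binarizing the resulting values:

    # Sketch: instantiate feature templates for one word position and
    # optionally binarize the feature values.
    def instantiate_features(words, i, prev_tag):
        """Turn templates (current word, previous word, previous tag)
        into concrete feature: value pairs for the instance at position i."""
        feats = {"curWord=" + words[i]: 1, "prevTag=" + prev_tag: 1}
        if i > 0:
            feats["prevWord=" + words[i - 1]] = 1
        return feats

    def binarize(feats):
        """Replace count/frequency values with 0/1 presence values."""
        return {f: 1 for f, v in feats.items() if v > 0}

    print(binarize(instantiate_features(["the", "dog", "barks"], 1, "DT")))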

8
Feature selection
  • Dimensionality reduction
  • Feature selection
  • Wrapping methods
  • Filtering methods
  • Mutual information, χ², information gain, ...
    (χ² sketch below)
  • Feature extraction
  • Term clustering
  • Latent semantic indexing (LSI)
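
For example, the χ² filtering method scores each (feature, class) pair from a 2x2 contingency table of training counts; a minimal sketch (the example counts are invented):

    # Sketch: chi-square score of one feature for one class, from a 2x2 table.
    #   a: docs in the class that contain the feature
    #   b: docs not in the class that contain the feature
    #   c: docs in the class without the feature
    #   d: docs not in the class without the feature
    def chi_square(a, b, c, d):
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        return n * (a * d - b * c) ** 2 / denom if denom else 0.0

    # Keep the features whose best score passes a cutoff (counts are made up).
    print(chi_square(a=30, b=10, c=20, d=140))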

9
Multiclass → Binary
  • One-vs-all (sketch below)
  • All-pairs
  • Error-correcting Output Codes (ECOC)
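
A minimal sketch of the one-vs-all reduction (the binary learner interface is assumed, not Mallet's API): train one binary classifier per class on relabeled data, then pick the class whose classifier scores the instance highest.

    # Sketch: one-vs-all reduction of a k-class problem to k binary problems.
    # `train_binary(instances, labels)` is assumed to return a scoring function.
    def train_one_vs_all(train_binary, instances, labels):
        classifiers = {}
        for c in set(labels):
            binary_labels = [1 if y == c else -1 for y in labels]
            classifiers[c] = train_binary(instances, binary_labels)
        return classifiers

    def classify_one_vs_all(classifiers, x):
        # Choose the class whose binary classifier is most confident about x.
        return max(classifiers, key=lambda c: classifiers[c](x))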

10
Step 2: Training and decoding
  • Choose a ML learner
  • Train and test on the development set with
    different settings of the non-model parameters
    (tuning loop sketched below)
  • Choose the setting that performs best on the
    development set
  • Run the learner on the test data with the best
    setting
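
The tuning loop mentioned above, as a sketch (train, evaluate, and the list of candidate settings are placeholders): try each setting of the non-model parameters, keep whichever does best on the development set, and use only that setting for the single final run on the test data.

    # Sketch: pick non-model parameters (e.g., k for kNN, beam width) on dev data.
    def tune(candidate_settings, train, evaluate, train_data, dev_data):
        best_setting, best_score = None, float("-inf")
        for setting in candidate_settings:
            model = train(train_data, setting)
            score = evaluate(model, dev_data)      # e.g., accuracy on the dev set
            if score > best_score:
                best_setting, best_score = setting, score
        return best_setting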

11
Step 3: Post-processing
  • Label sequence → the output we want
  • System combination
  • Voting: majority voting, weighted voting (sketch below)
  • More sophisticated models
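
A sketch of the two voting schemes for system combination (labels and weights below are invented; a system's weight could be, e.g., its dev-set accuracy):

    from collections import Counter

    # Sketch: combine the outputs of several systems by voting.
    def majority_vote(predictions):
        """predictions: one predicted label per system."""
        return Counter(predictions).most_common(1)[0][0]

    def weighted_vote(predictions, weights):
        """Each system's vote counts with its weight."""
        scores = Counter()
        for label, weight in zip(predictions, weights):
            scores[label] += weight
        return scores.most_common(1)[0][0]

    print(majority_vote(["c1", "c2", "c1"]))                    # c1
    print(weighted_vote(["c1", "c2", "c2"], [0.9, 0.4, 0.3]))   # c1 (0.9 vs 0.7)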

12
Supervised algorithms
13
Main ideas
  • kNN and Rocchio: finding the nearest neighbors /
    prototypes
  • DT and DL: finding the right group
  • NB, MaxEnt: calculating P(y | x)
  • Bagging: reducing the instability
  • Boosting: forming a committee
  • TBL: improving the current guess

14
ML learners
  • Modeling
  • Training
  • Testing (a.k.a. decoding)

15
Modeling
  • NB: assuming features are conditionally
    independent.
  • MaxEnt: modeling P(y | x) with an exponential
    (log-linear) model.

16
Training
  • kNN: no training
  • Rocchio: calculate prototypes
  • DT: build a decision tree
  • Choose a feature and then split the data
  • DL: build a decision list
  • Choose a decision rule and then split the data
  • TBL: build a transformation list by
  • Choose a transformation and then update the
    current label field

17
Training (cont)
  • NB: calculate P(ci) and P(fj | ci) by simple
    counting (sketch below).
  • MaxEnt: calculate the weights of the feature
    functions by iteration.
  • Bagging: create bootstrap samples and learn base
    classifiers.
  • Boosting: learn base classifiers and their
    weights.
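
A minimal sketch of NB training by counting (add-one smoothing is my addition here, so unseen feature-class pairs do not get zero probability; the assignment's exact smoothing may differ):

    from collections import defaultdict

    # Sketch: Naive Bayes training = counting classes and (feature, class) pairs.
    def train_nb(instances):
        """instances: list of (feature_list, class_label) pairs."""
        class_count = defaultdict(int)
        feat_count = defaultdict(lambda: defaultdict(int))
        vocab = set()
        for feats, c in instances:
            class_count[c] += 1
            for f in feats:
                feat_count[c][f] += 1
                vocab.add(f)
        n = sum(class_count.values())
        p_c = {c: count / n for c, count in class_count.items()}
        # P(f | c) with add-one smoothing over the vocabulary.
        p_f_given_c = {
            c: {f: (feat_count[c][f] + 1) / (sum(feat_count[c].values()) + len(vocab))
                for f in vocab}
            for c in class_count
        }
        return p_c, p_f_given_c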

18
Testing
  • kNN: calculate the distances between x and each xi
    and find the closest neighbors (sketch below).
  • Rocchio: calculate the distances between x and the
    prototypes.
  • DT: traverse the tree.
  • DL: find the first matching decision rule.
  • TBL: apply the transformations one by one.
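
A sketch of kNN decoding with cosine similarity over sparse feature vectors (the similarity measure and k are non-model parameters chosen on dev data; this is illustrative code, not Mallet's):

    import math
    from collections import Counter

    # Sketch: kNN decoding; feature vectors are dicts from feature name to value.
    def cosine(u, v):
        dot = sum(value * v.get(f, 0.0) for f, value in u.items())
        norm = math.sqrt(sum(x * x for x in u.values())) * \
               math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    def knn_classify(x, training_data, k):
        """training_data: list of (feature_dict, label); majority label of the k nearest."""
        nearest = sorted(training_data, key=lambda inst: cosine(x, inst[0]), reverse=True)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]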

19
Testing (cont)
  • NB: calculate P(c) · Πj P(fj | c) for each class and
    pick the highest-scoring class.
  • MaxEnt: calculate P(c | x) for each class and pick
    the highest-scoring class.
  • Bagging: run the base classifiers and choose the
    class with the highest votes.
  • Boosting: run the base classifiers and calculate
    the weighted sum.

20
Sequence labeling problems
  • With classification algorithms
  • Having features that refer to previous tags
  • Using beam search to find good sequences (sketch below)
  • With sequence labeling algorithms
  • HMM
  • TBL
  • MEMM
  • CRF
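
A minimal beam-search sketch for using a classifier on a sequence labeling problem (the scoring function score(words, i, prev_tag, tag), assumed to return the classifier's log-probability for tag at position i, is a placeholder):

    # Sketch: beam search over tag sequences, keeping the top `beam_size`
    # partial hypotheses at each position so features can look at previous tags.
    def beam_search(words, tagset, score, beam_size):
        beam = [([], 0.0)]                       # (partial tag sequence, log prob)
        for i in range(len(words)):
            candidates = []
            for tags, logp in beam:
                prev_tag = tags[-1] if tags else "<s>"
                for t in tagset:
                    candidates.append((tags + [t], logp + score(words, i, prev_tag, t)))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beam = candidates[:beam_size]
        return beam[0][0]                        # highest-scoring complete sequence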

21
Semi-supervised algorithms
  • Self-training (sketch below)
  • Co-training
  • → Adding some unlabeled data to the labeled data
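
A sketch of the self-training loop (train, classify, the confidence threshold, and the number of rounds are all placeholders):

    # Sketch: self-training = label the unlabeled data with the current model,
    # move the confident predictions into the labeled set, and retrain.
    def self_train(labeled, unlabeled, train, classify, threshold, rounds):
        for _ in range(rounds):
            model = train(labeled)
            confident, remaining = [], []
            for x in unlabeled:
                label, confidence = classify(model, x)
                (confident if confidence >= threshold else remaining).append((x, label))
            if not confident:
                break
            labeled = labeled + confident
            unlabeled = [x for x, _ in remaining]
        return train(labeled)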

22
Unsupervised algorithms
  • MLE
  • EM
  • General algorithm: E-step, M-step
  • EM for PM models
  • Forward-backward for HMM
  • Inside-outside for PCFG
  • IBM models for MT

23
Important concepts
24
Concepts
  • Attribute-value table
  • Feature templates vs. features
  • Weights
  • Feature weights
  • Classifier weights
  • Instance weights
  • Feature values

25
Concepts (cont)
  • Maximum entropy vs. Maximum likelihood
  • Maximize likelihood vs. minimize training error
  • Training time vs. test time
  • Training error vs. test error
  • Greedy algorithm vs. iterative approach

26
Concepts (cont)
  • Local optima vs. global optima
  • Beam search vs. Viterbi algorithm
  • Sample vs. resample
  • Model parameters vs. non-model parameters

27
Assignments
28
Assignments
  • Read code
  • NB: binary features?
  • DT: difference between DT and C4.5
  • Boosting: AdaBoost and AdaBoostM2
  • MaxEnt: binary features?
  • Write code
  • Info2Vectors
  • BinVectors
  • χ²
  • Complete two projects

29
Projects
  • Steps
  • Preprocessing
  • Training and testing
  • Postprocessing
  • Two projects
  • Project 1: Document classification
  • Project 2: IGT detection

30
Project 1: Document classification
  • A typical classification problem
  • Data are prepared already
  • Feature template: words that appear in the doc
  • Feature value: word frequency

31
Project 2: IGT detection
  • Can be framed as a sequence labeling problem
  • Preprocessing: define label set
  • Postprocessing: tag sequence → spans
  • Sequence labeling problem → using classification
    algorithms with beam search
  • To use classification algorithms
  • Preprocessing
  • Define features
  • Choose feature values

32
Project 2 (cont)
  • Preprocessing
  • Define label set
  • Define feature templates
  • Decide on feature values
  • Training and decoding
  • Write beam search
  • Postprocessing
  • Convert label sequence → spans (sketch below)
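
A sketch of this conversion, assuming a BIO-style label set such as B-IGT / I-IGT / O (the actual label set is whatever you defined in preprocessing):

    # Sketch: convert a BIO-style label sequence into (start, end, type) spans.
    def labels_to_spans(labels):
        spans, start, span_type = [], None, None
        for i, label in enumerate(labels):
            if label.startswith("B-") or (label.startswith("I-") and start is None):
                if start is not None:            # close the previous span
                    spans.append((start, i - 1, span_type))
                start, span_type = i, label[2:]
            elif label == "O" and start is not None:
                spans.append((start, i - 1, span_type))
                start, span_type = None, None
        if start is not None:
            spans.append((start, len(labels) - 1, span_type))
        return spans

    print(labels_to_spans(["O", "B-IGT", "I-IGT", "O", "B-IGT"]))
    # -> [(1, 2, 'IGT'), (4, 4, 'IGT')]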

33
Project 2 (cont)
  • Presentation
  • Final report
  • A typical conference paper
  • Introduction
  • Previous work
  • Methodology
  • Experiments
  • Discussion
  • Conclusion

34
Using Mallet
  • Difficulties
  • Java
  • A large package
  • Benefits
  • Java
  • A large package
  • Many learning algorithms: comparing the
    implementations with the standard algorithms

35
Bugs in Mallet?
  • In Hw9, include a new section
  • Bugs
  • Complaints
  • Things you like about Mallet

36
Course summary
  • 9 weeks: 18 sessions
  • 2 kinds of problems
  • 9 supervised algorithms
  • 1 semi-supervised algorithm
  • 1 unsupervised algorithm
  • 4 related issues: feature selection, multiclass →
    binary, system combination, beam search
  • 2 projects
  • 1 well-known package
  • 9 assignments, including 1 presentation and 1
    final report
  • N papers

37
What's next?
  • Learn more about the algorithms covered in class.
  • Learn new algorithms
  • SVM, CRF, regression algorithms, graphical
    models, ...
  • Try new tasks
  • Parsing, spam filtering, reference resolution, ...

38
Misc
  • Hw7 due tomorrow 11pm
  • Hw8 due Thursday 11pm
  • Hw9 due 3/13 11pm
  • Presentation No more than 155 minutes

39
What must be included in the presentation?
  • Label set
  • Feature templates
  • Effect of beam search
  • 3 ways to improve the system and results on dev
    data (test_data/)
  • Best system results on dev data and the setting
  • Results on test data (more_test_data/)

40
Grades, etc.
  • 9 assignments + class participation
  • Hw1-Hw6
  • Total: 740
  • Max: 696.56
  • Min: 346.52
  • Ave: 548.74
  • Median: 559.08