Title: LING 572
1. Introduction
- LING 572
- Fei Xia
- Week 1, 1/3/06
2. Outline
- Course overview
- Problems and methods
- Mathematical foundation
- Probability theory
- Information theory
3. Course overview
4. Course objective
- Focus on statistical methods that produce state-of-the-art results
- Questions for each algorithm:
- How the algorithm works: input, output, steps
- What kinds of tasks can the algorithm be applied to?
- How much data is needed?
- Labeled data
- Unlabeled data
5. General info
- Course website
- Syllabus (incl. slides and papers): updated every week.
- Message board
- ESubmit
- Office hour: W 3-5pm.
- Prerequisites:
- Ling570 and Ling571.
- Programming: C, C++, or Java; Perl is a plus.
- Introduction to probability and statistics
6. Expectations
- Reading:
- Papers are online; who doesn't have access to printers?
- Reference book: Manning & Schütze (M&S)
- Finish reading before class. Bring your questions to class.
- Grade:
- Homework (3): 30%
- Project (6 parts): 60%
- Class participation: 10%
- No quizzes, exams
7. Assignments
- Hw1: FSA and HMM
- Hw2: DT, DL, and TBL
- Hw3: Boosting
- No coding
- Bring the finished assignments to class.
8. Project
- P1: Method 1 (Baseline): Trigram
- P2: Method 2: TBL
- P3: Method 3: MaxEnt
- P4: Method 4: choose one of four tasks.
- P5: Presentation
- P6: Final report
- Methods 1-3 are supervised methods.
- Method 4: bagging, boosting, semi-supervised learning, or system combination.
- P1 is an individual task; P2-P6 are group tasks.
- A group should have no more than three people.
- Use ESubmit.
- Need to use others' code and write your own code.
9. Summary of Ling570
- Overview: corpora, evaluation
- Tokenization
- Morphological analysis
- POS tagging
- Shallow parsing
- N-grams and smoothing
- WSD
- NE tagging
- HMM
10. Summary of Ling571
- Parsing
- Semantics
- Discourse
- Dialogue
- Natural language generation (NLG)
- Machine translation (MT)
11. 570/571 vs. 572
- 572 focuses more on statistical approaches.
- 570/571 are organized by tasks; 572 is organized by learning methods.
- I assume that you know:
- The basics of each task: POS tagging, parsing, ...
- The basic concepts: PCFG, entropy, ...
- Some learning methods: HMM, FSA, ...
12. An example
- 570/571:
- POS tagging: HMM
- Parsing: PCFG
- MT: Model 1-4 training
- 572:
- HMM: forward-backward algorithm
- PCFG: inside-outside algorithm
- MT: EM algorithm
- → All are special cases of the EM algorithm, one method of unsupervised learning.
13. Course layout
- Supervised methods
- Decision tree
- Decision list
- Transformation-based learning (TBL)
- Bagging
- Boosting
- Maximum Entropy (MaxEnt)
14. Course layout (cont)
- Semi-supervised methods
- Self-training
- Co-training
- Unsupervised methods
- EM algorithm
- Forward-backward algorithm
- Inside-outside algorithm
- EM for PM models
15. Outline
- Course overview
- Problems and methods
- Mathematical foundation
- Probability theory
- Information theory
16. Problems and methods
17. Types of ML problems
- Classification problem
- Estimation problem
- Clustering
- Discovery
- A learning method can be applied to one or more types of ML problems.
- We will focus on the classification problem.
18. Classification problem
- Given a set of classes and data x, decide which class x belongs to.
- Labeled data:
- {(xi, yi)} is a set of labeled data.
- xi is a list of attribute values.
- yi is a member of a pre-defined set of classes.
19. Examples of classification problems
- Disambiguation
- Document classification
- POS tagging
- WSD
- PP attachment: given a set of other phrases
- Segmentation
- Tokenization / Word segmentation
- NP Chunking
20. Learning methods
- Modeling: represent the problem as a formula and decompose the formula into a function of parameters
- Training stage: estimate the parameters
- Test (decoding) stage: find the answer given the parameters
21. Modeling
- Joint vs. conditional models:
- P(data, model)
- P(model | data)
- P(data | model)
- Decomposition:
- Which variable conditions on which variable?
- What independence assumptions?
22. An example of different modeling
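- For instance, POS tagging can be modeled jointly or conditionally (an illustrative sketch, assuming the HMM and MaxEnt methods covered later in the course):

$$P(w_{1..n}, t_{1..n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i) \quad \text{(joint, as in an HMM)}$$

$$P(t_{1..n} \mid w_{1..n}) = \prod_{i=1}^{n} P(t_i \mid w_{1..n}, t_{1..i-1}) \quad \text{(conditional, as in MaxEnt)}$$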
23. Training
- Objective functions:
- Maximize likelihood
- Minimize error rate
- Maximum entropy
- Supervised, semi-supervised, unsupervised
- Ex: maximize likelihood
- Supervised: simple counting (see the sketch below)
- Unsupervised: EM
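A minimal sketch of what "simple counting" looks like for supervised maximum-likelihood estimation, assuming a bigram HMM-style tag model; the function name, toy corpus, and absence of smoothing are illustrative assumptions:

```python
from collections import defaultdict

def mle_bigram_tag_model(tagged_sentences):
    """Estimate P(tag_i | tag_{i-1}) and P(word | tag) by relative-frequency counting."""
    trans_counts = defaultdict(lambda: defaultdict(int))  # prev_tag -> tag -> count
    emit_counts = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    for sentence in tagged_sentences:                     # sentence: list of (word, tag)
        prev = "<s>"
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
    # Turn counts into conditional probabilities (relative frequencies, no smoothing).
    trans = {p: {t: c / sum(ts.values()) for t, c in ts.items()} for p, ts in trans_counts.items()}
    emit = {t: {w: c / sum(ws.values()) for w, c in ws.items()} for t, ws in emit_counts.items()}
    return trans, emit

# Toy usage:
trans, emit = mle_bigram_tag_model([[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]])
print(trans["<s>"]["DT"])  # 1.0 on this toy corpus
```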
24. Decoding
- DP algorithms:
- CYK for PCFG
- Viterbi for HMM
- Pruning (see the sketch below):
- TopN: keep the top N hypotheses at each node.
- Beam: keep hypotheses whose weights > beam * max_weight.
- Threshold: keep hypotheses whose weights > threshold.
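A minimal sketch of the three pruning strategies applied to a list of scored hypotheses; the function name, the reading of "beam" as a fraction of the best weight, and the toy weights are illustrative assumptions:

```python
def prune(hyps, top_n=None, beam=None, threshold=None):
    """hyps: list of (hypothesis, weight) pairs; returns the pairs that survive pruning."""
    kept = sorted(hyps, key=lambda h: h[1], reverse=True)
    if threshold is not None:                  # Threshold: absolute cutoff on the weight
        kept = [(h, w) for h, w in kept if w > threshold]
    if beam is not None and kept:              # Beam: cutoff relative to the best weight
        max_weight = kept[0][1]
        kept = [(h, w) for h, w in kept if w > beam * max_weight]
    if top_n is not None:                      # TopN: keep at most N hypotheses
        kept = kept[:top_n]
    return kept

# Toy usage: keep hypotheses within half of the best weight, at most two of them.
print(prune([("a", 0.9), ("b", 0.5), ("c", 0.1)], top_n=2, beam=0.5))  # [('a', 0.9), ('b', 0.5)]
```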
25. Outline
- Course overview
- Problems and methods
- Mathematical foundation
- Probability theory
- Information theory
26. Probability Theory
27. Probability theory
- Sample space, event, event space
- Random variable and random vector
- Conditional probability, joint probability, marginal probability (prior)
28. Sample space, event, event space
- Sample space (Ω): a collection of basic outcomes.
- Ex: toss a coin twice: {HH, HT, TH, TT}
- Event: an event is a subset of Ω.
- Ex: {HT, TH}
- Event space (2^Ω): the set of all possible events.
29. Random variable
- The outcome of an experiment need not be a number.
- We often want to represent outcomes as numbers.
- A random variable is a function that associates a unique numerical value with every outcome of an experiment.
- A random variable is a function X: Ω → R.
- Ex: toss a coin once: X(H) = 1, X(T) = 0
30. Two types of random variables
- Discrete random variable: X takes on only a countable number of distinct values.
- Ex: toss a coin 10 times; X is the number of tails that are noted.
- Continuous random variable: X takes on an uncountable number of possible values.
- Ex: X is the lifetime (in hours) of a light bulb.
31. Probability function
- The probability function of a discrete variable X is a function which gives the probability p(xi) that the random variable equals xi, a.k.a. p(xi) = p(X = xi).
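- As a reminder (standard conditions, assumed here), a probability function must satisfy

$$p(x_i) \ge 0 \text{ for all } i, \qquad \sum_i p(x_i) = 1$$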
32. Random vector
- A random vector is a finite-dimensional vector of random variables X = (X1, ..., Xk).
- P(x) = P(x1, x2, ..., xn) = P(X1 = x1, ..., Xn = xn)
- Ex: P(w1, ..., wn, t1, ..., tn)
33. Three types of probability
- Joint prob P(x,y): prob of x and y happening together
- Conditional prob P(x|y): prob of x given a specific value of y
- Marginal prob P(x): prob of x over all possible values of y
34. Common equations
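- The standard identities relating the three (stated in their usual form):

$$P(x) = \sum_y P(x, y), \qquad P(x, y) = P(y)\,P(x \mid y) = P(x)\,P(y \mid x), \qquad P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$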
35. More general cases
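- For n variables, the chain rule and marginalization generalize in the standard way:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1}), \qquad P(x_1) = \sum_{x_2, \ldots, x_n} P(x_1, \ldots, x_n)$$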
36. Information Theory
37. Information theory
- It is the use of probability theory to quantify and measure information.
- Basic concepts:
- Entropy
- Joint entropy and conditional entropy
- Cross entropy and relative entropy
- Mutual information and perplexity
38. Entropy
- Entropy is a measure of the uncertainty associated with a distribution (defined below).
- It is the lower bound on the number of bits it takes to transmit messages.
- An example:
- Display the results of horse races.
- Goal: minimize the number of bits to encode the results.
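- The standard definition for a discrete random variable X:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$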
39. An example
- Uniform distribution: pi = 1/8.
- Non-uniform distribution: (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64)
- Code: (0, 10, 110, 1110, 111100, 111101, 111110, 111111)
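A quick numeric check of the two cases (a minimal sketch; the probabilities and codewords are the ones listed above):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 8] * 8
nonuniform = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
codes = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]

print(entropy(uniform))     # 3.0 bits -> a fixed 3-bit code per horse is optimal
print(entropy(nonuniform))  # 2.0 bits
# Expected length of the listed code matches the entropy:
print(sum(p * len(c) for p, c in zip(nonuniform, codes)))  # 2.0
```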
40. Entropy of a language
- The entropy of a language L:
- If we make certain assumptions that the language is "nice", then the entropy can be calculated as shown below.
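- In the usual formulation (the "nice" assumptions being stationarity and ergodicity):

$$H(L) = -\lim_{n \to \infty} \frac{1}{n} \sum_{w_{1n}} p(w_{1n}) \log_2 p(w_{1n})$$

and, under those assumptions, for a single long sample,

$$H(L) = -\lim_{n \to \infty} \frac{1}{n} \log_2 p(w_{1n})$$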
41. Joint and conditional entropy
- Joint entropy
- Conditional entropy
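- The standard definitions:

$$H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log_2 p(x, y)$$

$$H(Y \mid X) = -\sum_{x}\sum_{y} p(x, y) \log_2 p(y \mid x) = H(X, Y) - H(X)$$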
42. Cross Entropy
- Entropy
- Cross Entropy
- Cross entropy is a distance measure between p(x) and q(x): p(x) is the true probability; q(x) is our estimate of p(x).
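- In their standard forms:

$$H(p) = -\sum_{x} p(x) \log_2 p(x), \qquad H(p, q) = -\sum_{x} p(x) \log_2 q(x)$$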
43. Cross entropy of a language
- The cross entropy of a language L:
- If we make certain assumptions that the language is "nice", then the cross entropy can be calculated as shown below.
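- In the usual formulation, with q the model's estimate of the true distribution p:

$$H(L, q) = -\lim_{n \to \infty} \frac{1}{n} \sum_{w_{1n}} p(w_{1n}) \log_2 q(w_{1n})$$

and, under the same "nice" assumptions,

$$H(L, q) = -\lim_{n \to \infty} \frac{1}{n} \log_2 q(w_{1n})$$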
44. Relative Entropy
- Also called Kullback-Leibler distance
- Another distance measure between prob functions p and q.
- KL distance is asymmetric (not a true distance).
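- The standard definition, and its relation to cross entropy:

$$D(p \| q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)} = H(p, q) - H(p), \qquad D(p \| q) \ne D(q \| p) \text{ in general}$$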
45. Relative entropy is non-negative
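- A standard argument via Jensen's inequality (log is concave; sums run over x with p(x) > 0):

$$-D(p \| q) = \sum_{x} p(x) \log_2 \frac{q(x)}{p(x)} \le \log_2 \sum_{x} p(x)\,\frac{q(x)}{p(x)} = \log_2 \sum_{x} q(x) \le \log_2 1 = 0$$

so D(p || q) ≥ 0, with equality iff p = q.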
46. Mutual information
- It measures how much is in common between X and Y.
- I(X;Y) = KL(p(x,y) || p(x)p(y))
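- Written out, and related to entropy (standard identities):

$$I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$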
47. Perplexity
- Perplexity is 2^H.
- Perplexity is the weighted average number of choices a random variable has to make.
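- For example, with the uniform horse-race distribution above, H = 3 bits, so the perplexity is 2^3 = 8: the encoder faces, on average, eight equally weighted choices.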
48. Summary
- Course overview
- Problems and methods
- Mathematical foundation
- Probability theory
- Information theory
- → M&S Ch 2
49. Next time
- FSA
- HMM: M&S Ch 9.1 and 9.2