1
Natural Language Processing
  • Lecture Notes 5

2
Part of Speech Tagging
  • HMM POS tagging
  • Transformation-Based Tagging
  • Methodology: evaluation and data

3
Noisy Channel
  • An influential metaphor in natural language
    processing is the noisy channel (of
    communication) model
  • The channel introduces noise that makes it hard
    to recognize the true word
  • Build a model of how the channel modifies the
    word

4
Noisy Channel
  • Obvious applications include
  • Speech recognition (pronunciation, etc.)
  • POS tagging
  • Spelling correction (spelling errors)
  • Not so obvious
  • Semantic analysis (intended meaning versus how
    the speaker says it and how the listener
    interprets it)
  • Machine translation
  • i.e., translating German to English is a matter
    of recovering the uncorrupted original signal

5
Conditional Probability
  • P(A|B): your belief in A given that you know B is
    true
  • AND B is all you know that is relevant to A

6
Conditionals Defined
  • Conditionals: P(A|B) = P(A, B) / P(B)
  • Rearranging: P(A, B) = P(A|B) P(B)
  • And also: P(A, B) = P(B|A) P(A)

7
Bayes Rule
  • Memorize this: P(A|B) = P(B|A) P(A) / P(B)
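A quick numeric check of the rule as a short Python sketch (all numbers are invented for illustration):

  # Hypothetical numbers: P(A) is the chance a word is a noun,
  # P(B) the chance it is preceded by "the", and P(B|A) the
  # chance that a noun is preceded by "the".
  p_a = 0.40
  p_b = 0.25
  p_b_given_a = 0.50

  p_a_given_b = p_b_given_a * p_a / p_b   # Bayes rule
  print(p_a_given_b)                      # 0.8 = P(noun | preceded by "the")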

8
Bayes and the Noisy Channel
  • In applying Bayes to the noisy channel we want to
    compute the most likely source given some
    observed (corrupt) output signal
  • argmax_i P(Source_i | Signal)
  • Often (not always) this is hard to get, so we
    apply Bayes

9
Bayes and Noisy Channel
  • So argmax this instead:
    argmax_i P(Signal | Source_i) P(Source_i) / P(Signal)

10
Argmax and Bayes
  • What does this mean?
  • Argmax: plug in each possible source, compute the
    corresponding probability, and pick the source
    with the highest one
  • Note the denominator is the same for each source
    candidate so we can ignore it for the purposes of
    the argmax

11
Argmax and Bayes
  • Ignoring the denominator leaves us with two
    factors: P(Source) and P(Signal | Source)

12
Bayesian Decoding
  • P(Source): This is often referred to as a
    language model. It encodes information about the
    likelihood of particular sequences (or
    structures) independent of the observed signal.
  • P(Signal | Source): This encodes specific
    information about how the channel tends to
    introduce noise: how likely is it that a given
    source would produce the observed signal?
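A minimal decoding sketch in Python. The observed "signal" here is a misspelling, and the candidates and all probabilities are invented for illustration; a real system would get P(Source) from a trained language model and P(Signal | Source) from a channel (error) model:

  # Observed (corrupt) signal: the misspelling "acress".
  # candidate source: (P(source), P(signal | source)) -- made-up numbers.
  candidates = {
      "actress": (0.0005, 0.0010),
      "across":  (0.0020, 0.0002),
      "acres":   (0.0010, 0.0003),
  }

  # P(signal) is the same for every candidate, so the argmax ignores it.
  best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
  print(best)   # "actress" under these made-up numbers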

13
Note
  • This framework is general: it makes minimal
    assumptions about the nature of the application,
    the source, or the channel.
  • Now, back to POS tagging

14
Hidden Markov Models
15
An example
Short for P(plane | N) = 0.2
16
Viterbi Algorithm
(Note: L1 should be Li in the figure on this slide.)
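A compact Viterbi sketch for HMM POS tagging. The two-tag state set and all probability tables below are toy values invented for illustration; a real tagger estimates them from a tagged corpus:

  # Toy HMM: states are tags, observations are words.
  states  = ["N", "V"]
  start_p = {"N": 0.6, "V": 0.4}                    # P(first tag)
  trans_p = {"N": {"N": 0.3, "V": 0.7},             # P(tag_i | tag_i-1)
             "V": {"N": 0.8, "V": 0.2}}
  emit_p  = {"N": {"plane": 0.20, "flies": 0.05},   # P(word | tag)
             "V": {"plane": 0.01, "flies": 0.30}}

  def viterbi(words):
      # V[i][s]: probability of the best tag sequence for words[:i+1]
      # ending in state s; back[i][s] remembers the previous state.
      V = [{s: start_p[s] * emit_p[s][words[0]] for s in states}]
      back = [{}]
      for i in range(1, len(words)):
          V.append({})
          back.append({})
          for s in states:
              prev = max(states, key=lambda p: V[i-1][p] * trans_p[p][s])
              V[i][s] = V[i-1][prev] * trans_p[prev][s] * emit_p[s][words[i]]
              back[i][s] = prev
      # Follow the back-pointers from the best final state.
      tags = [max(states, key=lambda s: V[-1][s])]
      for i in range(len(words) - 1, 0, -1):
          tags.append(back[i][tags[-1]])
      return list(reversed(tags))

  print(viterbi(["plane", "flies"]))   # ['N', 'V'] with these toy numbers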
17
Viterbi: an example
18
Fall 2006
  • We did not cover the rest of these notes in
    class, to leave time for other topics.

19
The Brill tagger
  • An example of TRANSFORMATION-BASED LEARNING
  • Very popular (freely available, works fairly
    well)
  • A SUPERVISED method: requires a tagged corpus
  • Basic idea: do a quick job first (using
    frequency), then revise it using contextual rules

20
An example
  • Examples
  • It is expected to race tomorrow.
  • The race for outer space.
  • Tagging algorithm:
  • Tag all uses of "race" as NN (the most likely tag
    in the Brown corpus):
  • It is expected to race/NN tomorrow
  • the race/NN for outer space
  • Use a transformation rule to replace the tag NN
    with VB for all uses of "race" preceded by the
    tag TO:
  • It is expected to race/VB tomorrow
  • the race/NN for outer space
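A tiny Python sketch of this two-stage process. The unigram tag table and the single transformation are hard-coded for illustration:

  # Stage 1: tag every word with its most likely (unigram) tag.
  # Stage 2: apply the contextual transformation NN -> VB after TO.
  unigram_tag = {"it": "PRP", "is": "VBZ", "expected": "VBN",
                 "to": "TO", "race": "NN", "tomorrow": "NN"}

  def tag(words):
      tags = [unigram_tag[w] for w in words]
      for i in range(1, len(tags)):
          if tags[i] == "NN" and tags[i-1] == "TO":
              tags[i] = "VB"
      return list(zip(words, tags))

  print(tag(["it", "is", "expected", "to", "race", "tomorrow"]))
  # ... ('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')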

21
Transformation-based learning in the Brill tagger
  1. Tag the corpus with the most likely tag for
     each word.
  2. Choose a TRANSFORMATION that replaces an
     existing tag with a new one such that the
     resulting tagged corpus has the lowest error
     rate.
  3. Apply that transformation to the training
     corpus.
  4. Repeat steps 2-3.
  5. Return a tagger that:
  • first tags using unigrams
  • then applies the learned transformations in order
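The greedy learning loop itself might be sketched as follows. The rule representation (change from_tag to to_tag when the previous tag is prev_tag) covers only one template, and the brute-force scoring is simplified for illustration:

  def errors(tags, gold):
      # Number of tags that disagree with the gold-standard corpus.
      return sum(t != g for t, g in zip(tags, gold))

  def apply_rule(tags, rule):
      f, t, prev = rule
      return [t if tg == f and i > 0 and tags[i-1] == prev else tg
              for i, tg in enumerate(tags)]

  def learn(tags, gold, candidate_rules, max_rules=10):
      learned = []
      for _ in range(max_rules):
          # Pick the transformation that most reduces training error.
          best = min(candidate_rules,
                     key=lambda r: errors(apply_rule(tags, r), gold))
          if errors(apply_rule(tags, best), gold) >= errors(tags, gold):
              break                      # no candidate helps any more
          tags = apply_rule(tags, best)  # apply it and keep going
          learned.append(best)
      return learned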

22
Templates
Change tag a to tag b when, for example (typical
templates in Brill's tagger):
  • the preceding (following) word is tagged z
  • the word two before (after) is tagged z
  • one of the two preceding (following) words is
    tagged z
  • the preceding word is tagged z and the following
    word is tagged w
23
Examples of learned transformations
  • e.g., replace NN with VB when the previous tag is
    TO (as in the race example above)
24
Available Taggers
  • Quite a few taggers are freely available
  • Brill (TBL)
  • QTAG (HMM; can be trained for other languages)
  • LT POS (part of the Edinburgh LTG suite of tools)
  • See Chris Manning's Statistical NLP resources web
    page