Title: Natural Language Processing
1 Natural Language Processing
2 Word Prediction
3 Word Prediction
- Stocks plunged this morning, despite a cut in interest
4 Word Prediction
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall
5 Word Prediction
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began
6 Word Prediction
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last
7 Word Prediction
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last Tuesday's terrorist attacks.
8 Word Prediction
- So, we can predict future words in an utterance
- How?
- Domain knowledge
- Syntactic knowledge
- Lexical knowledge
- We will use probabilities
9 Word Prediction
- If you can predict the next word, you can predict the likelihood of sequences containing various alternative words.
- That will help us with POS tagging, WSD, spelling correction, handwriting recognition, speech recognition, and augmentative communication
10 N-Grams: "The big red dog"
- Unigrams: P(dog)
- Bigrams: P(dog | red)
- Trigrams: P(dog | big red)
- Four-grams: P(dog | the big red)
- In general, we'll be dealing with P(Word | Some fixed prefix), as in the sketch below
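As a quick illustration, here is a minimal sketch of enumerating the n-grams of "The big red dog"; the whitespace tokenizer and lowercasing are assumptions for illustration.

```python
# Minimal sketch: enumerate the n-grams of "the big red dog".

def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the big red dog".split()
print(ngrams(tokens, 1))  # unigrams: ('the',) ('big',) ('red',) ('dog',)
print(ngrams(tokens, 2))  # bigrams:  ('the','big') ('big','red') ('red','dog')
print(ngrams(tokens, 3))  # trigrams: ('the','big','red') ('big','red','dog')
```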
11 Using N-Grams
- P(I want to eat Chinese food) = P(I | start) P(want | I) P(to | I want) ... P(food | I want to eat Chinese)
- Markov assumptions (see the sketch after this list)
- Bigrams: P(I | start) P(want | I) P(to | want) ... P(food | Chinese)
- Trigrams: P(I | start) P(want | I) P(to | I want) ... P(food | eat Chinese)
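A rough sketch of the bigram Markov approximation, using a tiny hand-made corpus; the corpus, the start symbol <s>, and the unsmoothed MLE estimates are assumptions for illustration, not the BERP data.

```python
from collections import Counter

# Toy corpus; <s> marks the start of a sentence (an assumption for illustration).
corpus = [
    "<s> I want to eat Chinese food".split(),
    "<s> I want to eat lunch".split(),
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def p_mle(w2, w1):
    """Unsmoothed MLE estimate: P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

# Bigram (Markov) approximation of P(I want to eat Chinese food):
# P(I|<s>) P(want|I) P(to|want) P(eat|to) P(Chinese|eat) P(food|Chinese)
sentence = "<s> I want to eat Chinese food".split()
p = 1.0
for w1, w2 in zip(sentence, sentence[1:]):
    p *= p_mle(w2, w1)
print(p)  # 0.5 in this toy corpus (only P(Chinese|eat) = 1/2 is below 1)
```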
12 BERP Table: Counts (Berkeley Restaurant Project)
This isn't the complete table. E.g., "I" occurs 3437 times (see p. 201 in 1st edition)
13 BERP Table: Bigram Probabilities
14 An Aside on Logs
- You don't really do all those multiplies. The numbers are too small and lead to underflows
- Convert the probabilities to logs and then do additions, as sketched below.
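A small sketch of the log trick; the probabilities below are made up purely to show the underflow.

```python
import math

probs = [0.001] * 200                        # 200 small conditional probabilities

product = 1.0
for p in probs:
    product *= p                             # raw multiplication underflows...
print(product)                               # -> 0.0

log_total = sum(math.log(p) for p in probs)  # ...but summed log probabilities stay usable
print(log_total)                             # -> about -1381.6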
15 Generation
- Choose N-grams with non-zero probabilities and string them together to get a feeling for the accuracy of the N-gram model (a sampling sketch follows)
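One way to do this is to sample: starting at <s>, repeatedly draw the next word in proportion to the bigram counts until an end symbol or a length cap is reached. The counts and the <s>/</s> symbols below are toy assumptions.

```python
import random
from collections import Counter

# Toy bigram counts (an assumption for illustration).
bigram_counts = Counter({
    ("<s>", "I"): 2, ("I", "want"): 2, ("want", "to"): 2, ("to", "eat"): 2,
    ("eat", "Chinese"): 1, ("eat", "lunch"): 1,
    ("Chinese", "food"): 1, ("food", "</s>"): 1, ("lunch", "</s>"): 1,
})

def generate(max_len=20):
    word, output = "<s>", []
    for _ in range(max_len):
        # All continuations of the current word, weighted by their counts.
        candidates = [(w2, c) for (w1, w2), c in bigram_counts.items() if w1 == word]
        if not candidates:
            break
        words, weights = zip(*candidates)
        word = random.choices(words, weights=weights)[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. "I want to eat Chinese food"
```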
16 Shakespeare
- Unigrams
- Every enter now severally so, let
- Hill he late speaks or! A more to leg less first you enter
- Bigrams
- What means, sir. I confess she? Then all sorts, he is trim, captain.
- Why dost stand forth thy canopy, forsooth he is this palpable hit the King Henry.
17 Shakespeare
- Trigrams
- Sweet prince, Falstaff shall die.
- This shall forbid it should be branded, if renown made it empty
- Quadrigrams
- What! I will go seek the traitor Gloucester
- Will you not tell me who I am?
18 Observations
- A small number of events occur with high frequency
- You can collect reliable statistics on these events with relatively small samples
- A large number of events occur with low frequency
- You might have to wait a long time to gather statistics on the low-frequency events
19 Observations
- Some zeroes are really zeroes
- Meaning that they represent events that can't or shouldn't occur
- On the other hand, some zeroes aren't really zeroes
- They represent low-frequency events that simply didn't occur in the corpus
20 Dealing with the Problem of Zero Counts
- Don't use higher-order N-grams
- Smoothing
- Add-one
- Witten-Bell
- Backoff
21 Discounting or Smoothing
- MLE is usually unsuitable for NLP because of the sparseness of the data
- We need to allow for the possibility of seeing events not seen in training
- Must use a discounting or smoothing technique
- Decrease the probability of previously seen events to leave a little bit of probability for previously unseen events
22 Add-one Smoothing (Laplace's law)
- Pretend we have seen every n-gram at least once
- Intuitively:
- new_count(n-gram) = old_count(n-gram) + 1
- The idea is to give a little bit of the probability space to unseen events
- P(wi | wi-1) = (count(wi-1 wi) + 1) / (count(wi-1) + V), as in the sketch below
- In later slides count(wi-1) is referred to as N
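A minimal sketch of the add-one estimate above; the count of "I" is the 3437 mentioned on the BERP slide, but the other counts are assumed for illustration.

```python
from collections import Counter

V = 1616                                             # vocabulary size (from the slides)
unigram_counts = Counter({"I": 3437, "want": 1215})  # "want" count assumed for illustration
bigram_counts = Counter({("I", "want"): 1087})       # count assumed for illustration

def p_add_one(w2, w1):
    """P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + V)."""
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(p_add_one("want", "I"))      # seen bigram:   (1087 + 1) / (3437 + 1616)
print(p_add_one("lunch", "want"))  # unseen bigram: (0 + 1) / (1215 + 1616)
```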
23 Add-one Example (V = 1616)
unsmoothed bigram counts
unsmoothed normalized bigram probabilities
24 Add-one Example (V = 1616)
add-one smoothed bigram counts
add-one normalized bigram probabilities
25 Problem with add-one smoothing (V = 1616)
- Bigrams starting with "Chinese" are boosted by a factor of 8! (1829 / 213)
unsmoothed bigram counts
add-one smoothed bigram counts
26 Problem with add-one smoothing
- Every previously unseen n-gram is given a low probability
- But there are so many of them that too much probability mass is given to unseen events
- Adding 1 to a frequent bigram does not change it much
- But adding 1 to low-count bigrams (including unseen ones) boosts them too much!
- In NLP applications that are very sparse, Laplace's Law actually gives far too much of the probability space to unseen events.
27 Witten-Bell smoothing
- Intuition:
- An unseen n-gram is one that just did not occur yet
- When it does happen, it will be its first occurrence
- So give to unseen n-grams the probability of seeing a new n-gram
28 Witten-Bell: the equations
- Total probability mass assigned to zero-frequency unigrams (T = number of observed types, N = number of word instances/tokens):
- total unseen mass = T / (N + T)
- So each zero N-gram gets the probability
- T / (Z (N + T)), where Z = number of zero-count types
29 Witten-Bell: why discounting
- Now of course we have to take away something (the discount) from the probability of the previously seen events
30 Witten-Bell for bigrams
- We relativize the types to the previous word:
- total mass for unseen bigrams starting with w1 = T(w1) / (N(w1) + T(w1))
- this probability mass must be distributed in equal parts over all unseen bigrams
- Z(w1) = number of unseen bigrams starting with w1
- P(w2 | w1) = T(w1) / (Z(w1) (N(w1) + T(w1))) for each unseen event (see the sketch below)
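A sketch of these bigram formulas; the tiny vocabulary and counts are assumptions for illustration.

```python
from collections import Counter

vocab = {"a", "b", "c"}                                   # toy vocabulary (assumed)
bigram_counts = Counter({("a", "b"): 10, ("a", "c"): 5, ("b", "a"): 3})

def p_witten_bell(w2, w1):
    """Witten-Bell bigram estimate following the formulas above."""
    T = len({v for (u, v) in bigram_counts if u == w1})           # seen bigram types after w1
    N = sum(c for (u, _), c in bigram_counts.items() if u == w1)  # bigram tokens after w1
    Z = len(vocab) - T                                            # unseen bigram types after w1
    c = bigram_counts[(w1, w2)]
    if c > 0:
        return c / (N + T)          # discounted probability of a seen bigram
    return T / (Z * (N + T))        # equal share of the unseen mass

print(p_witten_bell("b", "a"))  # seen:   10 / (15 + 2)
print(p_witten_bell("a", "a"))  # unseen:  2 / (1 * (15 + 2))
```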
31 Small example
- all unseen bigrams starting with "a" will share a probability mass of T(a) / (N(a) + T(a))
- each unseen bigram starting with "a" will have an equal part of this: T(a) / (Z(a) (N(a) + T(a)))
32 Small example (cont)
- all unseen bigrams starting with "b" will share a probability mass of T(b) / (N(b) + T(b))
- each unseen bigram starting with "b" will have an equal part of this: T(b) / (Z(b) (N(b) + T(b)))
33 Small example (cont)
- all unseen bigrams starting with "c" will share a probability mass of T(c) / (N(c) + T(c))
- each unseen bigram starting with "c" will have an equal part of this: T(c) / (Z(c) (N(c) + T(c)))
34 Back to Counts
- Unseen bigrams
- To get from the probabilities back to the counts, we know that
- P(w2 | w1) = C(w2 | w1) / N(w1)    // N(w1) = number of bigram tokens starting with w1; C(w2 | w1) here means Count(w1 w2)
- so we get, for an unseen bigram,
- C(w2 | w1) = P(w2 | w1) × N(w1) = T(w1) / Z(w1) × N(w1) / (N(w1) + T(w1))
35 The restaurant example
- The original counts were:
- T(w) = number of different seen bigram types starting with w
- We have a vocabulary of 1616 words, so we can compute
- Z(w) = number of unseen bigram types starting with w
- Z(w) = 1616 - T(w)
- N(w) = number of bigram tokens starting with w
36 Witten-Bell smoothed count
- the count of the unseen bigram "I lunch":
- count*(I lunch) = T(I) / Z(I) × N(I) / (N(I) + T(I))
- the count of the seen bigram "want to":
- count*(want to) = count(want to) × N(want) / (N(want) + T(want))
- Witten-Bell smoothed bigram counts:
37 Witten-Bell smoothed probabilities
Witten-Bell normalized bigram probabilities
38 Simple Linear Interpolation
- Solve the sparseness in a trigram model by mixing with bigram and unigram models
- Also called:
- linear interpolation
- finite mixture models
- deleted interpolation
- Combine linearly (see the sketch after this list):
- Pli(wn | wn-2, wn-1) = λ1 P(wn) + λ2 P(wn | wn-1) + λ3 P(wn | wn-2, wn-1)
- where 0 ≤ λi ≤ 1 and Σi λi = 1
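A small sketch of the interpolation step; the component probabilities and the λ values below are assumed for illustration, and in practice the λs are tuned on held-out data.

```python
def p_interpolated(p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    """P_li(wn | wn-2, wn-1) = l1*P(wn) + l2*P(wn | wn-1) + l3*P(wn | wn-2, wn-1)."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9      # the weights must sum to 1
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# A trigram never seen in training (p_tri = 0) still gets probability mass:
print(p_interpolated(p_uni=0.002, p_bi=0.01, p_tri=0.0))   # 0.0032
```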