Transcript and Presenter's Notes

Title: Language Model


1
Language Model
2
Language Model
  • Major role
  • Language models help a speech recognizer figure
    out how likely a word sequence is, independent of
    the acoustics. Many candidate word sequences can
    be eliminated, and the remaining candidates can be
    given higher probabilities.

3
LM
  • This lets the recognizer make the right guess
    when two different sentences sound the same.
  • For example
  • It's fun to recognize speech?
  • It's fun to wreck a nice beach?

4
LM
  • The Bayesian rule
  • To maximize P(W|X) over word sequences W, we look
    today at P(W), the language model.
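
  A written-out form of the rule (the slide's equation is not
  reproduced in this transcript; X denotes the acoustic
  observation and W a candidate word sequence):

    \hat{W} = \arg\max_W P(W \mid X)
            = \arg\max_W \frac{P(X \mid W)\, P(W)}{P(X)}
            = \arg\max_W P(X \mid W)\, P(W)

  Since P(X) does not depend on W, only the acoustic model
  P(X|W) and the language model P(W) matter for the search.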

5
LM
  • The ultimate goal is that a speech recognizer
    performs as well as a human being.
  • In psychology a lot of research has been done.
  • The eel was on the shoe
  • The eel was on the car
  • People are capable of adjusting to the right
    context, which
  • removes ambiguities
  • limits the possible words
  • Very good language models already exist for
    dedicated applications (e.g. medical, where there
    is a lot of standardization)

6
classification
  • Language models used in speech recognition can be
    classified into the following categories
  • Uniform models: the chance that a word occurs is
    1 / V, where V is the size of the vocabulary
  • Finite state machines
  • Grammar models: they use context-free grammars
  • Stochastic models: they determine the probability
    of a word based on its preceding words (e.g.
    n-grams)

7
CFG
  • A grammar is defined by
  • G = (V, T, P, S), where
    V contains the set of all non-terminal symbols,
    T contains the set of all terminal symbols,
    P is a set of productions or production rules,
    S is a special symbol called the start symbol.
  • Example of rules
    S -> NP VP
    VP -> VERB NP
    NP -> NOUN
    NP -> NAME
    NOUN -> speech
    NAME -> Julie | Ethan
    VERB -> loves | chases
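
  As an aside (not on the slides), a minimal Python sketch of
  how such a grammar can be stored as data and used to
  generate sentences top-down; the dictionary layout and the
  function name are illustrative assumptions:

    import random

    # The example grammar as data: productions keyed by non-terminal.
    # Anything that is not a key is treated as a terminal word.
    GRAMMAR = {
        "S":    [["NP", "VP"]],
        "VP":   [["VERB", "NP"]],
        "NP":   [["NOUN"], ["NAME"]],
        "NOUN": [["speech"]],
        "NAME": [["Julie"], ["Ethan"]],
        "VERB": [["loves"], ["chases"]],
    }

    def generate(symbol="S"):
        """Expand `symbol` top-down by picking productions at random."""
        if symbol not in GRAMMAR:          # terminal: emit the word itself
            return [symbol]
        words = []
        for rhs_symbol in random.choice(GRAMMAR[symbol]):
            words.extend(generate(rhs_symbol))
        return words

    print(" ".join(generate()))            # e.g. "Julie loves speech"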

8
CFG
  • Parsing
  • Bottom up, where you start with the input sentence
    and try to reach the start symbol.
  • Top down, where you start with the start symbol
    and try to reach the input sentence by applying
    the appropriate rules. Left recursion is a problem
    (A -> Aa).
  • Advantage of bottom up
  • What is the weather forecast for this
    afternoon?
  • A lot of parsing algorithms are available from
    computer science (see the recognizer sketch below).

Problem: people don't follow the rules of grammar
strictly, especially in spoken language. Creating
a grammar that covers all these constructions is
infeasible.
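
  A companion Python sketch (again not from the slides): a
  naive top-down recognizer for the same toy grammar. It only
  illustrates the idea; practical systems use chart parsers
  such as CYK or Earley, and naive top-down parsing breaks on
  left-recursive rules such as A -> Aa, as noted above.

    GRAMMAR = {                            # same toy grammar as above
        "S":    [["NP", "VP"]],
        "VP":   [["VERB", "NP"]],
        "NP":   [["NOUN"], ["NAME"]],
        "NOUN": [["speech"]],
        "NAME": [["Julie"], ["Ethan"]],
        "VERB": [["loves"], ["chases"]],
    }

    def expand(symbol, words, start):
        """Return the set of positions where a derivation of `symbol`
        starting at index `start` can end (top-down, depth-first)."""
        if symbol not in GRAMMAR:                        # terminal symbol
            ok = start < len(words) and words[start] == symbol
            return {start + 1} if ok else set()
        ends = set()
        for production in GRAMMAR[symbol]:
            positions = {start}
            for rhs_symbol in production:                # match rhs left to right
                positions = {e for p in positions
                               for e in expand(rhs_symbol, words, p)}
            ends |= positions
        return ends

    def accepts(sentence):
        words = sentence.split()
        return len(words) in expand("S", words, 0)       # must consume every word

    print(accepts("Julie loves speech"))   # True
    print(accepts("speech loves"))         # False
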
9
probabilistic CFG
  • A mixture between formal language theory and
    probabilistic models is the PCFG.
  • If there are m rules for a left-hand-side
    non-terminal A, i.e. A -> α1, ..., A -> αm,
  • then the probability of rule j is estimated as
  • P(A -> αj) = C(A -> αj) / Σi C(A -> αi)
  • where C denotes the number of times each rule is
    used.
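
  A minimal Python sketch of this maximum-likelihood estimate;
  the rule counts below are made up purely for illustration:

    from collections import defaultdict

    # Hypothetical counts C(A -> alpha) of how often each rule was
    # used, e.g. collected from a parsed corpus (numbers invented).
    rule_counts = {
        ("NP", ("NOUN",)):        120,
        ("NP", ("NAME",)):         80,
        ("NP", ("NOUN", "NOUN")):  40,
    }

    # Total usage count per left-hand-side non-terminal.
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count

    # P(A -> alpha_j) = C(A -> alpha_j) / sum_i C(A -> alpha_i)
    rule_probs = {rule: count / lhs_totals[rule[0]]
                  for rule, count in rule_counts.items()}

    for (lhs, rhs), prob in rule_probs.items():
        print(f"P({lhs} -> {' '.join(rhs)}) = {prob:.2f}")   # 0.50, 0.33, 0.17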

10
Stochastic language models
  • In formal language theory P(W) can be regarded as
    1 if the word sequence is accepted or as 0 if it
    is rejected.
  • N-grams
  • P(wi | w1 ... wi-1), the probability that wi will
    follow, given that the word sequence w1 ... wi-1
    was presented previously.

11
N-grams
  • Unigram: P(wi)
  • Bigram: P(wi | wi-1)
  • Trigram: P(wi | wi-2, wi-1)

12
N-gram example
To calculate a probability such as P(here | I am), we
need to compute both the number of times "am" is
preceded by "I" and the number of times "here" is
preceded by "I am" (see the counting sketch below).
All four sound the same; the right decision can only
be made by the language model.
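
  A minimal Python sketch of that counting, on a made-up toy
  corpus (the corpus text and variable names are illustrative
  assumptions):

    # Toy corpus; a real system would use millions of words.
    corpus = "i am here because i am happy and i am here again".split()

    def count_ngram(words, ngram):
        """Count how often the n-gram occurs in the word list."""
        n = len(ngram)
        return sum(words[i:i + n] == list(ngram) for i in range(len(words) - n + 1))

    c_i_am      = count_ngram(corpus, ("i", "am"))          # "am" preceded by "i":      3
    c_i_am_here = count_ngram(corpus, ("i", "am", "here"))  # "here" preceded by "i am": 2

    # Maximum-likelihood estimate of P(here | i am)
    print(c_i_am_here / c_i_am)    # 0.666...
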
13
training
  • Training is done on very large training sets
    containing millions of words.
  • Still, a lot of legal word sequences won't be
    seen during training.
  • Because it is infeasible to train on every
    possible sequence of words, there will be legal
    sequences for which P(W) is zero.

14
training
  • Solutions to overcome this problem
  • A practical approach is to assume the probability
    depends only on an equivalence class, for example
    by grouping all nouns into one equivalence class.
  • A technique called smoothing adjusts very low and
    very high probabilities, so that 0 and 1 no longer
    occur (see the sketch below).
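
  The slides do not name a specific smoothing method; as one
  common choice, here is a minimal Python sketch of add-one
  (Laplace) smoothing for bigram probabilities:

    from collections import Counter

    corpus = "john read her book john read a book".split()
    vocab = sorted(set(corpus))
    V = len(vocab)                                   # vocabulary size, here 5

    bigram_counts  = Counter(zip(corpus, corpus[1:]))
    history_counts = Counter(corpus[:-1])            # words that have a successor

    def p_mle(word, prev):
        """Unsmoothed estimate: zero for any unseen bigram."""
        return bigram_counts[(prev, word)] / history_counts[prev]

    def p_laplace(word, prev):
        """Add-one smoothing: pretend every possible bigram was seen once more."""
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + V)

    print(p_mle("book", "her"), p_laplace("book", "her"))   # 1.0 vs. ~0.33
    print(p_mle("her", "book"), p_laplace("her", "book"))   # 0.0 vs. ~0.17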

15
evaluation
  • The most common metric for a LM is the word
    recognition error rate. This requires a complete
    speech recognition system.
  • Another method is known as perplexity.

16
perplexity
  • Encode the text W using -log2 P(W) bits.
  • Then the cross-entropy H(W) is
  • H(W) = -(1/N) log2 P(W)
  • where N is the length of the text (in words).
  • The perplexity is then defined as
  • PP(W) = 2^H(W)

17
example
  • Training set
  • John read her book
  • I read a different book
  • John read a book by Mulan

18
example
  • These bigram probabilities help us estimate the
    probability of the sentence as
  • P(John read a book)
  • = P(John|<s>) P(read|John) P(a|read) P(book|a)
    P(</s>|book)
  • = 0.148
  • Then cross-entropy = -(1/4) log2(0.148) = 0.689
  • So perplexity = 2^0.689 ≈ 1.61
  • Comparison: Wall Street Journal text (5,000 words)
    has a bigram perplexity of 128
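
  A minimal Python sketch that reproduces these numbers from
  the three training sentences, using unsmoothed bigram
  estimates with <s>/</s> sentence markers (the code structure
  is an illustrative assumption):

    import math
    from collections import Counter

    training = [
        "John read her book",
        "I read a different book",
        "John read a book by Mulan",
    ]

    # Collect bigram and history counts, with sentence-boundary markers.
    bigrams, histories = Counter(), Counter()
    for sentence in training:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1

    def p(word, prev):
        return bigrams[(prev, word)] / histories[prev]

    test = ["<s>", "John", "read", "a", "book", "</s>"]
    prob = math.prod(p(w, prev) for prev, w in zip(test, test[1:]))
    print(round(prob, 3))                       # 0.148

    N = 4                                       # words in "John read a book"
    cross_entropy = -math.log2(prob) / N
    print(round(cross_entropy, 3))              # 0.689
    print(round(2 ** cross_entropy, 2))         # 1.61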

19
evaluation
  • A high perplexity means that the number of words
    that can follow a previous word is larger on
    average.
  • A low perplexity does not guarantee good
    performance.
  • For example, the vocabulary B, C, D, E, G, P, T
    has a perplexity of 7, but perplexity does not
    take acoustic confusability into account.