Transcript and Presenter's Notes

Title: Language Model


1
Language Model
2
Language Model
  • Major role
  • Language models help a speech recognizer figure
    out how likely a word sequence is, independent of
    the acoustics. Many candidate word sequences can
    be eliminated, and the remaining candidates can be
    given higher probabilities.

3
LM
  • This lets the recognizer make the right guess
    when two different sentences sound the same.
  • For example
  • It's fun to recognize speech?
  • It's fun to wreck a nice beach?

4
LM
  • The Bayesian rule
  • To maximize P(W|X) over word sequences W, we look
    today at P(W), the language model.
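
  A written-out form of the rule (the slide's equation is not
  reproduced in this transcript; X denotes the acoustic
  observation and W a candidate word sequence):

    \hat{W} = \arg\max_W P(W \mid X)
            = \arg\max_W \frac{P(X \mid W)\, P(W)}{P(X)}
            = \arg\max_W P(X \mid W)\, P(W)

  Since P(X) does not depend on W, only the acoustic model
  P(X|W) and the language model P(W) matter for the search.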

5
LM
  • The ultimate goal is that a speech recognizer
    performs as well as a human being.
  • In psychology a lot of research has been done.
  • The eel was on the shoe
  • The eel was on the car
  • People are capable of adjusting to the right
    context, which
  • removes ambiguities
  • limits the possible words
  • Very good language models already exist for
    dedicated applications (e.g. medical, where there
    is a lot of standardization)

6
classification
  • Language models used in speech recognition can be
    classified into the following categories
  • Uniform models: the chance that a word occurs is
    1 / V, where V is the size of the vocabulary
  • Finite state machines
  • Grammar models: they use context-free grammars
  • Stochastic models: they determine the probability
    of a word based on its preceding words (e.g.
    n-grams)

7
CFG
  • A grammar is defined by
  • G = (V, T, P, S), where
    V contains the set of all non-terminal symbols,
    T contains the set of all terminal symbols,
    P is a set of productions or production rules,
    S is a special symbol called the start symbol.
  • Example of rules
    S -> NP VP
    VP -> VERB NP
    NP -> NOUN
    NP -> NAME
    NOUN -> speech
    NAME -> Julie | Ethan
    VERB -> loves | chases
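
  As an aside (not on the slides), a minimal Python sketch of
  how such a grammar can be stored as data and used to
  generate sentences top-down; the dictionary layout and the
  function name are illustrative assumptions:

    import random

    # The example grammar as data: productions keyed by non-terminal.
    # Anything that is not a key is treated as a terminal word.
    GRAMMAR = {
        "S":    [["NP", "VP"]],
        "VP":   [["VERB", "NP"]],
        "NP":   [["NOUN"], ["NAME"]],
        "NOUN": [["speech"]],
        "NAME": [["Julie"], ["Ethan"]],
        "VERB": [["loves"], ["chases"]],
    }

    def generate(symbol="S"):
        """Expand `symbol` top-down by picking productions at random."""
        if symbol not in GRAMMAR:          # terminal: emit the word itself
            return [symbol]
        words = []
        for rhs_symbol in random.choice(GRAMMAR[symbol]):
            words.extend(generate(rhs_symbol))
        return words

    print(" ".join(generate()))            # e.g. "Julie loves speech"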

8
CFG
  • Parsing
  • Bottom up, where you start with the input sentence
    and try to reach the start symbol.
  • Top down, where you start with the start symbol
    and try to reach the input sentence by applying
    the appropriate rules. Left recursion is a problem
    (A -> Aa).
  • Advantage of bottom up
  • What is the weather forecast for this
    afternoon?
  • A lot of parsing algorithms are available from
    computer science (see the recognizer sketch below).

Problem: people don't follow the rules of grammar
strictly, especially in spoken language. Creating
a grammar that covers all these constructions is
infeasible.
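
  A companion Python sketch (again not from the slides): a
  naive top-down recognizer for the same toy grammar. It only
  illustrates the idea; practical systems use chart parsers
  such as CYK or Earley, and naive top-down parsing breaks on
  left-recursive rules such as A -> Aa, as noted above.

    GRAMMAR = {                            # same toy grammar as above
        "S":    [["NP", "VP"]],
        "VP":   [["VERB", "NP"]],
        "NP":   [["NOUN"], ["NAME"]],
        "NOUN": [["speech"]],
        "NAME": [["Julie"], ["Ethan"]],
        "VERB": [["loves"], ["chases"]],
    }

    def expand(symbol, words, start):
        """Return the set of positions where a derivation of `symbol`
        starting at index `start` can end (top-down, depth-first)."""
        if symbol not in GRAMMAR:                        # terminal symbol
            ok = start < len(words) and words[start] == symbol
            return {start + 1} if ok else set()
        ends = set()
        for production in GRAMMAR[symbol]:
            positions = {start}
            for rhs_symbol in production:                # match rhs left to right
                positions = {e for p in positions
                               for e in expand(rhs_symbol, words, p)}
            ends |= positions
        return ends

    def accepts(sentence):
        words = sentence.split()
        return len(words) in expand("S", words, 0)       # must consume every word

    print(accepts("Julie loves speech"))   # True
    print(accepts("speech loves"))         # False
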
9
probabilistic CFG
  • A mixture between formal language theory and
    probabilistic models is the PCFG.
  • If there are m rules for a left-hand-side
    non-terminal A, i.e. A -> α1, ..., A -> αm,
  • then the probability of rule j is estimated as
  • P(A -> αj) = C(A -> αj) / Σi C(A -> αi)
  • where C denotes the number of times each rule is
    used.
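
  A minimal Python sketch of this maximum-likelihood estimate;
  the rule counts below are made up purely for illustration:

    from collections import defaultdict

    # Hypothetical counts C(A -> alpha) of how often each rule was
    # used, e.g. collected from a parsed corpus (numbers invented).
    rule_counts = {
        ("NP", ("NOUN",)):        120,
        ("NP", ("NAME",)):         80,
        ("NP", ("NOUN", "NOUN")):  40,
    }

    # Total usage count per left-hand-side non-terminal.
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count

    # P(A -> alpha_j) = C(A -> alpha_j) / sum_i C(A -> alpha_i)
    rule_probs = {rule: count / lhs_totals[rule[0]]
                  for rule, count in rule_counts.items()}

    for (lhs, rhs), prob in rule_probs.items():
        print(f"P({lhs} -> {' '.join(rhs)}) = {prob:.2f}")   # 0.50, 0.33, 0.17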

10
Stochastic language models
  • In formal language theory P(W) can be regarded as
    1 if the word sequence is accepted or as 0 if it
    is rejected.
  • N-grams
  • P(wi | w1 ... wi-1), the probability that wi will
    follow, given that the word sequence w1 ... wi-1
    was presented previously.

11
N-grams
  • Unigram: P(wi)
  • Bigram: P(wi | wi-1)
  • Trigram: P(wi | wi-2, wi-1)

12
N-gram example
To calculate a probability such as P(here | I am), we
need to compute both the number of times "am" is
preceded by "I" and the number of times "here" is
preceded by "I am" (see the counting sketch below).
All four sound the same; the right decision can only
be made by the language model.
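
  A minimal Python sketch of that counting, on a made-up toy
  corpus (the corpus text and variable names are illustrative
  assumptions):

    # Toy corpus; a real system would use millions of words.
    corpus = "i am here because i am happy and i am here again".split()

    def count_ngram(words, ngram):
        """Count how often the n-gram occurs in the word list."""
        n = len(ngram)
        return sum(words[i:i + n] == list(ngram) for i in range(len(words) - n + 1))

    c_i_am      = count_ngram(corpus, ("i", "am"))          # "am" preceded by "i":      3
    c_i_am_here = count_ngram(corpus, ("i", "am", "here"))  # "here" preceded by "i am": 2

    # Maximum-likelihood estimate of P(here | i am)
    print(c_i_am_here / c_i_am)    # 0.666...
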
13
training
  • Training is done on very large training sets
    containing millions of words.
  • Still, a lot of legal word sequences won't be
    seen during training.
  • Because it is infeasible to train on every
    possible sequence of words, there will be legal
    sequences for which P(W) is zero.

14
training
  • Solutions to overcome this problem
  • A practical approach is to assume the probability
    depends only on an equivalence class, for example
    by grouping all nouns into one equivalence class.
  • A technique called smoothing adjusts very low and
    very high probabilities, so that 0 and 1 no longer
    occur (see the sketch below).
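
  The slides do not name a specific smoothing method; as one
  common choice, here is a minimal Python sketch of add-one
  (Laplace) smoothing for bigram probabilities:

    from collections import Counter

    corpus = "john read her book john read a book".split()
    vocab = sorted(set(corpus))
    V = len(vocab)                                   # vocabulary size, here 5

    bigram_counts  = Counter(zip(corpus, corpus[1:]))
    history_counts = Counter(corpus[:-1])            # words that have a successor

    def p_mle(word, prev):
        """Unsmoothed estimate: zero for any unseen bigram."""
        return bigram_counts[(prev, word)] / history_counts[prev]

    def p_laplace(word, prev):
        """Add-one smoothing: pretend every possible bigram was seen once more."""
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + V)

    print(p_mle("book", "her"), p_laplace("book", "her"))   # 1.0 vs. ~0.33
    print(p_mle("her", "book"), p_laplace("her", "book"))   # 0.0 vs. ~0.17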

15
evaluation
  • The most common metric for a LM is the word
    recognition error rate. This requires a complete
    speech recognition system.
  • Another method is known as perplexity.

16
perplexity
  • Encode the text W using -log2 P(W) bits.
  • Then the cross-entropy H(W) is
  • H(W) = -(1/N) log2 P(W)
  • where N is the length of the text (in words).
  • The perplexity is then defined as
  • PP(W) = 2^H(W)

17
example
  • Training set
  • John read her book
  • I read a different book
  • John read a book by Mulan

18
example
  • These bigram probabilities help us estimate the
    probability of the sentence as
  • P(John read a book)
  • = P(John|<s>) P(read|John) P(a|read) P(book|a)
    P(</s>|book)
  • = 0.148
  • Then cross-entropy = -(1/4) log2(0.148) = 0.689
  • So perplexity = 2^0.689 ≈ 1.61
  • Comparison: Wall Street Journal text (5,000 words)
    has a bigram perplexity of 128
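
  A minimal Python sketch that reproduces these numbers from
  the three training sentences, using unsmoothed bigram
  estimates with <s>/</s> sentence markers (the code structure
  is an illustrative assumption):

    import math
    from collections import Counter

    training = [
        "John read her book",
        "I read a different book",
        "John read a book by Mulan",
    ]

    # Collect bigram and history counts, with sentence-boundary markers.
    bigrams, histories = Counter(), Counter()
    for sentence in training:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1

    def p(word, prev):
        return bigrams[(prev, word)] / histories[prev]

    test = ["<s>", "John", "read", "a", "book", "</s>"]
    prob = math.prod(p(w, prev) for prev, w in zip(test, test[1:]))
    print(round(prob, 3))                       # 0.148

    N = 4                                       # words in "John read a book"
    cross_entropy = -math.log2(prob) / N
    print(round(cross_entropy, 3))              # 0.689
    print(round(2 ** cross_entropy, 2))         # 1.61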

19
evaluation
  • A high perplexity means that the number of words
    that can follow a previous word is larger on
    average.
  • A low perplexity does not guarantee good
    performance.
  • For example, the vocabulary B, C, D, E, G, P, T
    has a perplexity of 7, but perplexity does not
    take acoustic confusability into account.