1
Language modelling
  • María Fernández Pajares
  • Spoken Language Processing (Verarbeitung gesprochener Sprache)

2
Index
  • 1. Introduction
  • 2. Regular grammars
  • 3. Stochastic languages
  • 4. N-gram models
  • 5. Perplexity

3
Introduction: language models
  • What is a language model?
  • It is a method for defining the structure of a language, in order
    to restrict recognition to the most probable sequences of
    linguistic units.
  • Language models are useful for applications with complex syntax
    and/or semantics.
  • A good LM should accept (with high probability) only correct
    sentences and reject (or assign a low probability to) wrong word
    sequences.
  • CLASSIC MODELS
  • - N-grams
  • - Stochastic grammars

4
Introduction: general scheme of a system
[Block diagram: speech signal → parameter measurement → comparison with models (acoustic and grammar models) → decision rule → text]
5
Introduction: measuring task difficulty
  • Determined by the real flexibility of the language that is admitted
  • Perplexity: average number of options
  • There are finer measures that take into account the difficulty of
    the words or of the acoustic models
  • Speech recognizers seek the word sequence W which is most likely
    to be produced from the acoustic evidence A
  • Speech recognition involves acoustic processing, acoustic
    modelling, language modelling, and search

6
  • Language models (LMs) assign a probability estimate P(W) to word
    sequences W = w1,...,wn, subject to the normalization
    Σ_W P(W) = 1 (see the sketch below)
  • Language models help guide and constrain the search among
    alternative word hypotheses during recognition
  • With huge vocabularies, the acoustic models and the language model
    are integrated into a single hidden Markov macro-model covering
    the whole language.

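A minimal sketch in Python of how a language model assigns P(W) to a word sequence with the chain rule; the probability table and the sentence markers <s>/</s> are made-up illustrations, not part of the original slides.

# Minimal sketch: assigning P(W) to a word sequence with the chain rule,
# using a toy table of conditional probabilities (values are made up).

toy_probs = {
    ("<s>",): {"the": 0.5, "a": 0.5},
    ("<s>", "the"): {"dog": 0.4, "cat": 0.6},
    ("<s>", "the", "dog"): {"</s>": 1.0},
}

def sequence_probability(words):
    """P(w1..wn) = product over i of P(w_i | w_1..w_{i-1})."""
    prob = 1.0
    history = ("<s>",)
    for w in words + ["</s>"]:
        prob *= toy_probs.get(history, {}).get(w, 0.0)
        history = history + (w,)
    return prob

print(sequence_probability(["the", "dog"]))  # 0.5 * 0.4 * 1.0 = 0.2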
7
Introduction: dimensions of task difficulty
  • Connectivity (noise, robustness)
  • Speakers
  • Vocabulary and language complexity
8
Introduction: MODELS BASED ON GRAMMARS
  • They represent language restrictions in a natural way
  • They allow the modelling of dependencies that are as long as
    required
  • Defining these models is very difficult for tasks whose languages
    are close to natural languages (pseudo-natural)
  • Integration with the acoustic model is not very natural
9
Introduction: kinds of grammars
  • Consider a grammar G = (N, Σ, P, S)
  • Chomsky hierarchy
  • 0. No restrictions on the rules → too complex to be useful
  • 1. Context-sensitive rules → too complex
  • 2. Context-free rules → used in experimental systems
  • 3. Regular or finite-state

10
Grammars and automata
  • Every kind of grammar corresponds to a kind of automaton that
    recognizes it
  • Type 0 (without restrictions): Turing machine
  • Type 1 (context-sensitive): linear bounded automaton
  • Type 2 (context-free): pushdown automaton
  • Type 3 (regular): finite-state automaton

11
Regular grammars
  • A regular grammar is any right-linear or left-linear grammar
  • Examples (see the sketch below for one)
  • Regular grammars generate regular languages
  • The languages generated by regular grammars are exactly the
    regular languages
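A minimal sketch with a hypothetical right-linear grammar for the regular language a+b+, showing how such a grammar generates strings; the rule table and the generate() helper are illustrative and not from the slides.

# Minimal sketch (hypothetical grammar): a right-linear grammar for the
# regular language a+ b+, with a simple recursive enumerator.

# Right-linear rules: each right-hand side is a terminal optionally
# followed by a single non-terminal.
rules = {
    "S": [("a", "S"), ("a", "B")],
    "B": [("b", "B"), ("b", None)],
}

def generate(symbol="S", max_len=4):
    """Yield all strings of the grammar with up to max_len terminals."""
    if max_len == 0:
        return
    for terminal, nonterminal in rules[symbol]:
        if nonterminal is None:
            yield terminal
        else:
            for tail in generate(nonterminal, max_len - 1):
                yield terminal + tail

print(sorted(set(generate())))  # e.g. 'ab', 'aab', 'abb', 'aabb', ...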
12
Search space
13
An example
14
Grammars and stochastic languages
  • Add a probability to each of the production rules
  • A stochastic grammar is a pair (G, p), where G is a grammar and p
    is a function p: P → [0, 1] with the property that, for every
    non-terminal A, the probabilities of the rules in P_A sum to 1,
    where P_A is the set of grammar rules whose antecedent (left-hand
    side) is A (see the sketch after this list).
  • A stochastic language over an alphabet Σ is a pair (L, p) of a
    language L ⊆ Σ* and a probability function p that fulfils the
    condition that the probabilities p(x) of all strings x in L sum
    to 1.

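A minimal sketch of the definition above: a toy stochastic grammar stored as antecedent → (right-hand side, probability) pairs, with a check that rule probabilities for each antecedent sum to 1 and a simple sampler; the rules and probabilities are made up.

# Minimal sketch (hypothetical rules/probabilities): a stochastic grammar
# with a check that rules sharing the same antecedent sum to 1, as
# required by the definition above.

import random

# Each antecedent maps to a list of (right-hand side, probability).
stochastic_rules = {
    "S": [(("a", "S"), 0.7), (("b",), 0.3)],
}

def is_properly_normalized(rules, tol=1e-9):
    """Check that rule probabilities per antecedent sum to 1."""
    return all(abs(sum(p for _, p in alts) - 1.0) < tol
               for alts in rules.values())

def sample(symbol="S"):
    """Sample one string by expanding non-terminals left to right."""
    options = stochastic_rules[symbol]
    rhs = random.choices([r for r, _ in options],
                         weights=[p for _, p in options])[0]
    return "".join(sample(s) if s in stochastic_rules else s for s in rhs)

print(is_properly_normalized(stochastic_rules))  # True
print(sample())  # e.g. "aab"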
15
example
16
N-gram models
P(W) can be broken down with the chain rule
  P(W) = P(w1) · P(w2 | w1) · ... · P(wn | w1,...,wn-1)
and approximated by conditioning each word only on the previous N-1
words
  P(wi | w1,...,wi-1) ≈ P(wi | wi-N+1,...,wi-1)
When N = 2 → bigrams; when N = 3 → trigrams (see the sketch below).
17
Example
  • Let us suppose that acoustic decoding assigns similar probabilities
    to the phrases "the pig dog" and "the big dog"
  • If P(pig | the) ≈ P(big | the), then the choice between one or the
    other depends on the word dog:
  • P(the pig dog) = P(the) · P(pig | the) · P(dog | the pig)
  • P(the big dog) = P(the) · P(big | the) · P(dog | the big)
  • Since P(dog | the big) > P(dog | the pig), the model helps decode
    the sentence correctly (see the sketch after this list)
  • Problems
  • Need for a very large number of learning samples
  • unigram
  • bigram
  • trigram

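A minimal sketch of the disambiguation above, with made-up probabilities chosen so that P(dog | the big) > P(dog | the pig):

# Minimal sketch of the example above: toy (made-up) probabilities where
# P(dog | the big) > P(dog | the pig), so the LM prefers "the big dog".

probs = {
    ("the",): 0.1,                 # P(the)
    ("the", "pig"): 0.01,          # P(pig | the)
    ("the", "big"): 0.01,          # P(big | the), same as for "pig"
    ("the", "pig", "dog"): 0.05,   # P(dog | the pig)
    ("the", "big", "dog"): 0.40,   # P(dog | the big)
}

def trigram_score(w1, w2, w3):
    """P(w1 w2 w3) = P(w1) * P(w2 | w1) * P(w3 | w1 w2)."""
    return probs[(w1,)] * probs[(w1, w2)] * probs[(w1, w2, w3)]

print(trigram_score("the", "pig", "dog"))  # 0.1 * 0.01 * 0.05 = 5e-05
print(trigram_score("the", "big", "dog"))  # 0.1 * 0.01 * 0.40 = 4e-04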
18
  • Advantages
  • Probabilities are based on data
  • Parameters determined automatically from
    corpora
  • Incorporate local syntax, semantics, and
    pragmatics
  • Many languages have a strong tendency toward
    standard word order and are thus substantially
    local
  • Relatively easy to integrate into forward
    search methods such as Viterbi (bigram) or A*
  • Disadvantages
  • Unable to incorporate long-distance
    constraints
  • Not well suited for flexible word order
    languages
  • Cannot easily accommodate
  • New vocabulary items
  • Alternative domains
  • Dynamic changes (e.g., discourse)
  • Not as good as humans at tasks of
  • Identifying and correcting recognizer errors
  • Predicting following words (or letters)
  • Do not capture meaning for speech understanding

19
Estimation of the Probabilities
  • Let us suppose that the N-gram model has been modelled with a
    finite-state automaton
  • unigram: w1; bigram: w1 w2; trigram: w1 w2 w3
  • Let us suppose we have a training sample, over which an N-gram
    model has been estimated and represented as a finite-state
    automaton.
  • A state of the automaton is q, and c(q) is the total number of
    events (N-grams) observed in the sample while the model is in
    state q.

20
  • c(w|q) is the number of times that the word w has been observed in
    the sample while the model is in the state q.
  • P(w|q) is the probability of observing the word w conditioned on
    the state q.
  • The set of words observed in the sample when the model is in the
    state q.
  • The total vocabulary of the language that has to be modelled.
  • For example, in a bigram model the maximum-likelihood estimate is
    P(w|q) = c(w|q) / c(q)
  • This approach assigns probability 0 to events that have not been
    observed, which causes coverage problems. The solution is to
    smooth the model, e.g. with flat, linear, non-linear, back-off or
    syntactic back-off smoothing (see the sketch below).

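A minimal sketch of the estimation step: maximum-likelihood bigram estimates P(w|q) = c(w|q)/c(q) from a tiny made-up corpus, plus add-one (Laplace) smoothing as one simple smoothing choice (the slide lists several others, such as back-off):

# Minimal sketch: maximum-likelihood bigram estimates P(w|q) = c(w|q)/c(q)
# from a tiny corpus, plus add-one (Laplace) smoothing as one simple way
# to avoid zero probabilities for unseen bigrams.

from collections import Counter

corpus = [["the", "big", "dog"], ["the", "dog"], ["a", "big", "dog"]]

bigram_counts = Counter()
state_counts = Counter()
vocab = set()
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    vocab.update(tokens)
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[(prev, cur)] += 1   # c(w|q)
        state_counts[prev] += 1           # c(q)

def p_ml(w, q):
    """Maximum-likelihood estimate: zero for unseen events."""
    return bigram_counts[(q, w)] / state_counts[q] if state_counts[q] else 0.0

def p_add_one(w, q):
    """Add-one smoothing: (c(w|q) + 1) / (c(q) + |V|)."""
    return (bigram_counts[(q, w)] + 1) / (state_counts[q] + len(vocab))

print(p_ml("cat", "the"))       # 0.0 (unseen bigram)
print(p_add_one("cat", "the"))  # small but non-zero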
21
  • Bigrams are easily incorporated in Viterbi search
  • Trigrams were used for large-vocabulary recognition in the
    mid-1970s and remain the dominant language model
  • IBM TRIGRAM EXAMPLE

22
  • Methods for estimating the probability of unseen N-grams
  • n-gram performance can be improved by clustering
    words (see the class-based sketch after this list)
  • Hard clustering puts a word into a single
    cluster
  • Soft clustering allows a word to belong to
    multiple clusters
  • Clusters can be created manually, or
    automatically
  • Manually created clusters have worked well
    for small domains
  • Automatic clusters have been created
    bottom-up or top-down

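A minimal sketch of a class-based (hard-clustered) bigram in the spirit of Brown et al. (1992), cited in the bibliography; the word classes and probabilities below are made up.

# Minimal sketch: a class-based (hard-clustered) bigram. The clusters and
# probabilities are made up for illustration.

word_to_class = {"dog": "ANIMAL", "cat": "ANIMAL", "big": "ADJ", "small": "ADJ"}

class_bigram = {("ADJ", "ANIMAL"): 0.5}        # P(class_i | class_{i-1})
word_given_class = {"dog": 0.6, "cat": 0.4,    # P(word | its class)
                    "big": 0.7, "small": 0.3}

def class_based_bigram(w, prev):
    """P(w | prev) ≈ P(class(w) | class(prev)) * P(w | class(w))."""
    pair = (word_to_class[prev], word_to_class[w])
    return class_bigram.get(pair, 0.0) * word_given_class[w]

# "small cat" gets a sensible probability even if that word pair was
# never seen, because its class pair (ADJ, ANIMAL) was.
print(class_based_bigram("cat", "small"))  # 0.5 * 0.4 = 0.2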
23
PERPLEXITY
  • Average number of options
  • Quantifying LM complexity
  • One LM is better than another if it can predict an n-word test
    corpus W with a higher probability
  • For LMs representable by the chain rule, comparisons are usually
    based on the average per-word log probability, LP:
    LP = -(1/n) log2 P(w1,...,wn)
  • A more intuitive representation of LP is the perplexity:
    PP = 2^LP
  • (a uniform LM will have PP equal to the vocabulary size)
  • PP is often interpreted as an average branching factor (see the
    sketch after this list)

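A minimal sketch computing the per-word log probability LP and the perplexity PP = 2^LP for a toy unigram model with made-up probabilities; it also checks that a uniform model over V words has perplexity V:

# Minimal sketch: per-word log probability LP and perplexity PP = 2**LP
# for a test corpus under a toy unigram model (made-up probabilities).

import math

unigram = {"the": 0.4, "dog": 0.3, "barks": 0.3}
test_corpus = ["the", "dog", "barks", "the", "dog"]

def perplexity(words, model):
    """PP = 2 ** ( -(1/n) * sum_i log2 P(w_i) )."""
    lp = -sum(math.log2(model[w]) for w in words) / len(words)
    return 2 ** lp

print(perplexity(test_corpus, unigram))  # ≈ 2.97

# Sanity check: a uniform model over V words gives PP = V.
uniform = {w: 1 / 3 for w in ["the", "dog", "barks"]}
print(perplexity(test_corpus, uniform))  # 3.0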
24
Perplexity Examples
25
Bibliography
  • P. Brown et al., Class-based n-gram models of
    natural language, Computational Linguistics,
    1992.
  • R. Lau, Adaptive Statistical Language
    Modelling, S.M. Thesis, MIT, 1994.
  • M. McCandless, Automatic Acquisition of
    Language Models for Speech Recognition, S.M.
    Thesis, MIT, 1994.
  • L.R. Rabiner and B.-H. Juang, Fundamentals of Speech
    Recognition, Prentice-Hall, 1993.
  • GOOGLE