Title: Language Modelling
Language Modelling
- María Fernández Pajares
- Verarbeitung gesprochener Sprache (Spoken Language Processing)
Index
- 1. Introduction
- 2. Regular grammars
- 3. Stochastic languages
- 4. N-gram models
- 5. Perplexity
Introduction: Language models
- What is a language model?
- It is a method for defining language structure, used to restrict the search to the most probable sequences of linguistic units.
- Language models are especially useful for applications with complex syntax and/or semantics.
- A good LM should accept only correct sentences (with a high probability) and reject wrong word sequences (or give them a low probability).
- CLASSIC MODELS
- - N-grams
- - Stochastic grammars
Introduction: general scheme of a system
[Block diagram: signal → measurement of parameters → comparison against models (acoustic and grammar models) → decision rule → text]
Introduction: measuring task difficulty
- Determined by the real flexibility of the admitted language
- Perplexity: the average number of options
- There are finer measures that take into account the difficulty of the words or of the acoustic models
- Speech recognizers seek the word sequence W which is most likely to have produced the acoustic evidence A (the decision rule is sketched below)
- Speech recognition involves acoustic processing, acoustic modelling, language modelling, and search
- Language models (LMs) assign a probability estimate P(W) to word sequences W = w1, ..., wn, subject to the normalization constraint that P(W) sums to 1 over all word sequences
- Language models help guide and constrain the search among alternative word hypotheses during recognition
- Huge vocabularies: integration of the acoustic models and of the language model into a single hidden Markov macro-model covering the whole language
Introduction: dimensions of problem difficulty
[Diagram axes: connectivity, environment (noise, robustness), speakers, vocabulary and language complexity]
Introduction: MODELS BASED ON GRAMMARS
- They represent language restrictions in a natural way
- They allow the modelling of dependencies of any required length
- Defining these models is very difficult for tasks whose languages are close to natural language (pseudo-natural)
- Integration with the acoustic model is not very natural
Introduction: kinds of grammars
- Consider a grammar G = (N, Σ, P, S): non-terminals N, terminals Σ, productions P, start symbol S
- Chomsky hierarchy:
- Type 0: no restrictions on the rules → too complex to be useful
- Type 1: context-sensitive rules → too complex
- Type 2: context-free → used in experimental systems
- Type 3: regular, or finite-state
Grammars and automata
- Every kind of grammar corresponds to a kind of automaton that recognizes it
- Type 0 (unrestricted): Turing machine
- Type 1 (context-sensitive): linear bounded automaton
- Type 2 (context-free): pushdown automaton
- Type 3 (regular): finite-state automaton
Regular grammars
- A regular grammar is any right-linear or left-linear grammar
- Examples (a minimal one is sketched below)
- Regular grammars generate regular languages: the languages generated by regular grammars are exactly the regular languages
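The example grammars on the original slide are not reproduced here; as a minimal illustration (my own, not from the slide), the following right-linear grammar generates the regular language a*b:

```
S → a S
S → b
```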
Search space
An example
Grammars and stochastic languages
- Add a probability to each of the production rules
- A stochastic grammar is a pair (G, p), where G is a grammar and p is a function p: P → [0, 1] with the property that Σ p(A → α) = 1, the sum running over P_A, the set of grammar rules whose antecedent is A (a check of this property is sketched below)
- A stochastic language over an alphabet Σ is a pair (L, p), where L ⊆ Σ* and p assigns each string of L a probability, fulfilling p(x) ≥ 0 and Σ_{x ∈ L} p(x) = 1
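A minimal Python sketch of the normalization property of p, using a hypothetical toy grammar chosen only for illustration:

```python
from collections import defaultdict

# Hypothetical toy grammar, for illustration only.
# Each rule is (antecedent, right-hand side, probability p(A -> alpha)).
rules = [
    ("S", ("a", "S"), 0.7),
    ("S", ("b",), 0.3),
]

def is_normalized(rules, tol=1e-9):
    """Check the defining property of p: for every antecedent A,
    the probabilities of all rules A -> alpha must sum to 1."""
    totals = defaultdict(float)
    for antecedent, _, prob in rules:
        totals[antecedent] += prob
    return all(abs(total - 1.0) < tol for total in totals.values())

print(is_normalized(rules))  # True
```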
Example
N-gram models
- P(W) can be decomposed with the chain rule and approximated by limiting each word's history to the previous n-1 words (the decomposition is given below)
- When n = 2 → bigrams; when n = 3 → trigrams
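The decomposition referred to on the slide is the standard chain-rule factorization with a truncated history (not spelled out on the slide):

```latex
P(W) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
     \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
```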
Example
- Suppose that acoustic decoding assigns similar probabilities to the phrases "the pig dog" and "the big dog"
- If P(pig | the) ≈ P(big | the), then the choice between them depends on the word "dog":
- P(the pig dog) = P(the) · P(pig | the) · P(dog | the pig)
- P(the big dog) = P(the) · P(big | the) · P(dog | the big)
- Since P(dog | the big) > P(dog | the pig), the model helps to decode the sentence correctly (a code sketch of this comparison follows)
- Problems
- Need for a very large number of training samples, which grows with the order of the model (unigram, bigram, trigram)
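A minimal Python sketch of this comparison; the log-probabilities are made up, chosen only to mirror the slide's argument:

```python
import math

# Hypothetical trigram log-probabilities, chosen so that
# P(pig | the) ≈ P(big | the) and P(dog | the big) > P(dog | the pig).
log_p = {
    ("<s>", "<s>", "the"): math.log(0.20),
    ("<s>", "the", "pig"): math.log(0.01),
    ("<s>", "the", "big"): math.log(0.01),
    ("the", "pig", "dog"): math.log(0.02),
    ("the", "big", "dog"): math.log(0.30),
}

def sentence_logprob(words):
    """Sum trigram log-probabilities over the sentence, padded with <s>."""
    padded = ["<s>", "<s>"] + words
    return sum(log_p[tuple(padded[i - 2:i + 1])] for i in range(2, len(padded)))

# The language model prefers "the big dog" over "the pig dog".
print(sentence_logprob(["the", "big", "dog"]) > sentence_logprob(["the", "pig", "dog"]))  # True
```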
- Advantages
- Probabilities are based on data
- Parameters determined automatically from corpora
- Incorporate local syntax, semantics, and pragmatics
- Many languages have a strong tendency toward standard word order and are thus substantially local
- Relatively easy to integrate into forward search methods such as Viterbi (bigram) or A*
- Disadvantages
- Unable to incorporate long-distance constraints
- Not well suited for flexible word order languages
- Cannot easily accommodate
- - New vocabulary items
- - Alternative domains
- - Dynamic changes (e.g., discourse)
- Not as good as humans at tasks of
- - Identifying and correcting recognizer errors
- - Predicting following words (or letters)
- Do not capture meaning for speech understanding
Estimation of the probabilities
- Suppose that the N-gram model has been represented as a finite automaton
- The events are: unigram w1, bigram w1 w2, trigram w1 w2 w3
- Suppose we have a training sample from which an N-gram model, represented as a finite automaton, has been estimated
- A state of the automaton is q, and c(q) is the total number of events (N-grams) observed in the sample while the model is in state q
- c(w|q) is the number of times that the word w has been observed in the sample while the model is in state q
- P(w|q) is the probability of observing word w conditioned on state q; its maximum-likelihood estimate is P(w|q) = c(w|q) / c(q)
- Two further quantities are needed: the set of words observed in the sample when the model is in state q, and the total vocabulary of the language to be modelled
- For example, in a bigram model the state q is simply the previous word, so P(w_i | w_{i-1}) = c(w_{i-1} w_i) / c(w_{i-1})
- This approach assigns probability 0 to events that have never been observed → this causes coverage problems → the solution is to smooth the model → smoothing can be flat, linear, non-linear, back-off, or syntactic back-off (an estimation sketch with a simple smoothing follows)
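A minimal Python sketch of this estimation, using add-one smoothing as one simple way to avoid zero probabilities (the slides list several other smoothing schemes; this particular choice is mine, for illustration only):

```python
from collections import defaultdict

def train_bigram(corpus):
    """Collect c(q) and c(w|q), where the state q is simply the previous word."""
    c_q = defaultdict(int)    # c(q): events observed while in state q
    c_wq = defaultdict(int)   # c(w|q): times word w was observed in state q
    vocab = set()
    for sentence in corpus:
        words = ["<s>"] + sentence + ["</s>"]
        vocab.update(words)
        for prev, word in zip(words, words[1:]):
            c_q[prev] += 1
            c_wq[(prev, word)] += 1
    return c_q, c_wq, vocab

def prob(word, state, c_q, c_wq, vocab):
    """Add-one smoothed estimate: P(w|q) = (c(w|q) + 1) / (c(q) + |V|),
    so unseen events get a small non-zero probability instead of 0."""
    return (c_wq[(state, word)] + 1) / (c_q[state] + len(vocab))

corpus = [["the", "big", "dog"], ["the", "pig", "dog"], ["the", "big", "dog"]]
c_q, c_wq, vocab = train_bigram(corpus)
print(prob("big", "the", c_q, c_wq, vocab))  # seen event: relatively high
print(prob("cat", "the", c_q, c_wq, vocab))  # unseen event: small but non-zero
```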
- Bigrams are easily incorporated in Viterbi search
- Trigrams have been used for large-vocabulary recognition since the mid-1970s and remain the dominant language model
- IBM TRIGRAM EXAMPLE
- Methods for estimating the probability of unseen N-grams
- N-gram performance can be improved by clustering words (a class-based formulation is sketched after this list)
- Hard clustering puts a word into a single cluster
- Soft clustering allows a word to belong to multiple clusters
- Clusters can be created manually or automatically
- - Manually created clusters have worked well for small domains
- - Automatic clusters have been created bottom-up or top-down
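A common hard-clustering formulation is the class-based n-gram of Brown et al. (1992), listed in the bibliography, where each word w belongs to a single class c(w); for the bigram case:

```latex
P(w_i \mid w_{i-1}) \approx P\bigl(w_i \mid c(w_i)\bigr)\, P\bigl(c(w_i) \mid c(w_{i-1})\bigr)
```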
PERPLEXITY
- Average number of options
- Quantifying LM complexity
- One LM is better than another if it can predict an n-word test corpus W with a higher probability
- For LMs representable by the chain rule, comparisons are usually based on the average per-word log probability, LP = -(1/n) log2 P(W)
- A more intuitive representation of LP is the perplexity, PP = 2^LP
- (a uniform LM will have PP equal to the vocabulary size)
- PP is often interpreted as an average branching factor (a small computation sketch follows)
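A minimal sketch of the perplexity computation under these definitions (illustrative only):

```python
import math

def perplexity(word_logprobs):
    """PP = 2**LP, where LP is the negative average per-word log2 probability."""
    lp = -sum(word_logprobs) / len(word_logprobs)
    return 2 ** lp

# A uniform LM over a 1000-word vocabulary assigns each word probability 1/1000,
# so its perplexity equals the vocabulary size.
uniform = [math.log2(1 / 1000)] * 50
print(perplexity(uniform))  # ≈ 1000
```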
Perplexity examples
Bibliography
- P. Brown et al., "Class-based n-gram models of natural language", Computational Linguistics, 1992.
- R. Lau, "Adaptive Statistical Language Modelling", S.M. Thesis, MIT, 1994.
- M. McCandless, "Automatic Acquisition of Language Models for Speech Recognition", S.M. Thesis, MIT, 1994.
- L. R. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition", Prentice-Hall, 1993.
- Google