Title: Advanced Smoothing, Evaluation of Language Models
1. Advanced Smoothing, Evaluation of Language Models
2. Witten-Bell Discounting
- A zero n-gram is just an n-gram you haven't seen yet... but every n-gram in the corpus was unseen once, so...
- How many times did we see an n-gram for the first time? Once for each n-gram type (T)
- Estimate the total probability mass of unseen bigrams as T / (N + T)
- View the training corpus as a series of events, one for each token (N) and one for each new type (T)
- We can divide that probability mass equally among unseen bigrams... or we can condition the probability of an unseen bigram on the first word of the bigram
- Discount values for Witten-Bell are much more reasonable than for Add-One (see the sketch below)
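A minimal sketch of the unconditioned Witten-Bell idea, assuming toy bigram counts and a hypothetical vocabulary size V: seen bigrams keep mass c / (N + T), and the reserved mass T / (N + T) is shared equally among the unseen ones (the conditioned variant would use per-history counts instead).

```python
from collections import Counter

# Toy bigram counts (hypothetical); in practice these come from a training corpus.
bigram_counts = Counter({("the", "cat"): 3, ("the", "dog"): 2, ("a", "cat"): 1})

N = sum(bigram_counts.values())   # number of bigram tokens observed
T = len(bigram_counts)            # number of distinct bigram types observed
V = 1000                          # assumed vocabulary size
Z = V * V - T                     # number of unseen bigram types

def witten_bell_prob(bigram):
    """Seen bigrams get c / (N + T); unseen ones share the reserved T / (N + T)."""
    c = bigram_counts.get(bigram, 0)
    if c > 0:
        return c / (N + T)
    return T / (Z * (N + T))

print(witten_bell_prob(("the", "cat")))       # a seen bigram
print(witten_bell_prob(("unicorn", "flew")))  # an unseen bigram
```

The seen bigrams sum to N / (N + T) and the unseen ones to T / (N + T), so the distribution still sums to one.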
3. Good-Turing Discounting
- Re-estimate the amount of probability mass for zero-count (or low-count) n-grams by looking at n-grams with higher counts
- Estimate: c* = (c + 1) N_{c+1} / N_c, where N_c is the number of n-gram types seen c times
- E.g. N0's adjusted count is a function of the count of n-grams that occur once, N1
- Assumes
  - word bigrams follow a binomial distribution
  - we know the number of unseen bigrams (V x V - seen); see the sketch below
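A minimal sketch of the adjusted-count formula, assuming toy bigram counts and a hypothetical vocabulary size V; real implementations also smooth the N_c curve so that higher counts (where N_{c+1} may be zero) stay well defined.

```python
from collections import Counter

# Toy bigram counts (hypothetical).
bigram_counts = Counter({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 2})

# Counts-of-counts: N_c = number of bigram types seen exactly c times.
counts_of_counts = Counter(bigram_counts.values())
V = 100                                           # assumed vocabulary size
counts_of_counts[0] = V * V - len(bigram_counts)  # unseen bigram types (V x V - seen)

def adjusted_count(c):
    """Good-Turing: c* = (c + 1) * N_{c+1} / N_c (falls back to c if N_{c+1} is 0)."""
    if counts_of_counts.get(c + 1, 0) == 0:
        return c
    return (c + 1) * counts_of_counts[c + 1] / counts_of_counts[c]

print(adjusted_count(0))   # adjusted count for unseen bigrams, driven by N1
print(adjusted_count(1))   # adjusted count for bigrams seen once
```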
4. Interpolation and Backoff
- Typically used in addition to smoothing techniques / discounting
- Example: trigrams
  - Smoothing gives some probability mass to all the trigram types not observed in the training data
  - We could make a more informed decision! How?
- If backoff finds an unobserved trigram in the test data, it will back off to bigrams (and ultimately to unigrams)
- Backoff doesn't treat all unseen trigrams alike
- When we have observed a trigram, we will rely solely on the trigram counts
5. Backoff methods (e.g. Katz '87)
- For e.g. a trigram model
  - Compute unigram, bigram and trigram probabilities
- In use
  - Where the trigram is unavailable, back off to the bigram if available, otherwise to the unigram probability
  - E.g. "An omnivorous unicorn" (see the sketch below)
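A minimal sketch of the fall-back order only, built from a hypothetical toy corpus; Katz backoff additionally discounts the higher-order counts and scales the lower-order estimate with a normalizing backoff weight alpha, both omitted here.

```python
from collections import Counter

# Toy corpus (hypothetical) from which to count n-grams.
tokens = "an omnivorous unicorn ate an apple".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
N = len(tokens)

def backoff_prob(w1, w2, w3):
    """Use the trigram estimate if observed, else the bigram, else the unigram."""
    if trigrams[(w1, w2, w3)] > 0:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
    if bigrams[(w2, w3)] > 0:
        return bigrams[(w2, w3)] / unigrams[w2]
    return unigrams[w3] / N   # a completely unseen word would still need smoothing

print(backoff_prob("an", "omnivorous", "unicorn"))  # observed trigram
print(backoff_prob("a", "purple", "unicorn"))       # falls back to the unigram
```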
6. Smoothing: Simple Interpolation
- Trigram is very context-specific, very noisy
- Unigram is context-independent, smooth
- Interpolate trigram, bigram, and unigram for the best combination (see the sketch below)
- Find the λs, 0 < λ_i < 1, by optimizing on held-out data
- Almost good enough
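A minimal sketch of linear interpolation with fixed, illustrative weights; the component probabilities and λ values are hypothetical, and tuning the λs on held-out data is sketched after the next slide.

```python
# Interpolate trigram, bigram, and unigram estimates:
# P(w | w1, w2) ~ l3 * P(w | w1, w2) + l2 * P(w | w2) + l1 * P(w)
LAMBDAS = (0.6, 0.3, 0.1)   # (l3, l2, l1), chosen to sum to 1

def interpolated_prob(p_trigram, p_bigram, p_unigram, lambdas=LAMBDAS):
    l3, l2, l1 = lambdas
    return l3 * p_trigram + l2 * p_bigram + l1 * p_unigram

# Hypothetical component estimates for one word in context: the trigram was
# unseen, but the bigram and unigram still contribute probability mass.
print(interpolated_prob(p_trigram=0.0, p_bigram=0.02, p_unigram=0.001))
```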
7. Smoothing: Held-out estimation
- Finding parameter values
  - Split the data into training, held-out, and test sets
  - Try lots of different values for the λs on the held-out data, pick the best (see the sketch below)
  - Test on the test data
- Sometimes, can use tricks like EM (expectation maximization) to find values
- How much data for training, held-out, test?
  - Answer: enough test data to be statistically significant (1000s of words perhaps)
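A minimal grid-search sketch for the interpolation λs, assuming a hypothetical `heldout` list of per-word (trigram, bigram, unigram) estimates computed from the training counts; EM would find these weights without an explicit grid.

```python
import math

# Hypothetical held-out data: per-word component estimates
# (p_trigram, p_bigram, p_unigram) computed from the training counts.
heldout = [(0.0, 0.02, 0.001), (0.1, 0.05, 0.002), (0.0, 0.0, 0.0005)]

def heldout_log_likelihood(lambdas):
    """Log-likelihood of the held-out words under the interpolated model."""
    l3, l2, l1 = lambdas
    return sum(math.log(l3 * p3 + l2 * p2 + l1 * p1) for p3, p2, p1 in heldout)

# Coarse grid over (l3, l2, l1) with every weight kept positive and summing to 1.
grid = [i / 10 for i in range(1, 9)]
candidates = [(round(1 - l2 - l1, 10), l2, l1)
              for l2 in grid for l1 in grid if l1 + l2 < 0.95]
best = max(candidates, key=heldout_log_likelihood)
print(best, heldout_log_likelihood(best))
```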
8. Summary
- N-gram probabilities can be used to estimate the likelihood
  - of a word occurring in a context (the preceding N-1 words)
  - of a sentence occurring at all
- Smoothing techniques deal with the problem of unseen words in a corpus
9. Practical Issues
- Represent and compute language model probabilities in log format (see the sketch below)
- p1 × p2 × p3 × p4 = exp(log p1 + log p2 + log p3 + log p4)
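A minimal sketch of the log-space computation with hypothetical per-word probabilities; summing logs avoids the underflow that multiplying many small probabilities would cause.

```python
import math

# p1 * p2 * p3 * p4 = exp(log p1 + log p2 + log p3 + log p4)
word_probs = [0.001, 0.02, 0.005, 0.0001]        # hypothetical per-word probabilities

log_prob = sum(math.log(p) for p in word_probs)  # accumulate in log space
print(log_prob)                                  # sentence log-probability
print(math.exp(log_prob))                        # matches the direct product, 1e-11
```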
10. Class-based n-grams
- P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) × P(w_i | c_i)  (see the sketch below)
- Factored Language Models
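A minimal sketch of the class-based bigram factorization, using hypothetical word classes and toy probability tables; the class assignments and numbers are illustrative only.

```python
# Class-based bigram: P(w_i | w_{i-1}) ~ P(c_i | c_{i-1}) * P(w_i | c_i)
word_class = {"monday": "DAY", "tuesday": "DAY", "on": "PREP"}  # hypothetical classes
p_class_given_class = {("PREP", "DAY"): 0.2}   # P(c_i | c_{i-1})
p_word_given_class = {("monday", "DAY"): 0.15} # P(w_i | c_i)

def class_bigram_prob(prev_word, word):
    c_prev, c = word_class[prev_word], word_class[word]
    return (p_class_given_class.get((c_prev, c), 0.0)
            * p_word_given_class.get((word, c), 0.0))

print(class_bigram_prob("on", "monday"))   # 0.2 * 0.15 = 0.03
```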
11. Evaluating language models
- We need evaluation metrics to determine how well our language models predict the next word
- Intuition: one should average over the probability of new words
12. Some basic information theory
- Evaluation metrics for language models
- Information theory: measures of information
  - Entropy
  - Perplexity
13. Entropy
- Average length of the most efficient coding for a random variable
- Binary encoding
14. Entropy
- Example: betting on horses
- 8 horses, each horse is equally likely to win
- (Binary) message required: 001, 010, 011, 100, 101, 110, 111, 000
- A 3-bit message is required
15. Entropy
- 8 horses, some horses are more likely to win
  - Horse 1: 1/2, code 0
  - Horse 2: 1/4, code 10
  - Horse 3: 1/8, code 110
  - Horse 4: 1/16, code 1110
  - Horses 5-8: 1/64 each, codes 111100, 111101, 111110, 111111
16. Perplexity
- Entropy H
- Perplexity = 2^H
- Intuitively: the weighted average number of choices a random variable has to make
- Equally likely horses: Entropy = 3, Perplexity = 2^3 = 8
- Biased horses: Entropy = 2, Perplexity = 2^2 = 4 (see the sketch below)
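A minimal sketch that recomputes the two horse-race examples above: entropy as -Σ p log2 p and perplexity as 2^H.

```python
import math

def entropy(probs):
    """H = -sum(p * log2 p) over the outcomes of the random variable."""
    return -sum(p * math.log2(p) for p in probs)

equal = [1 / 8] * 8                                      # 8 equally likely horses
biased = [1 / 2, 1 / 4, 1 / 8, 1 / 16] + [1 / 64] * 4    # the biased race above

for dist in (equal, biased):
    h = entropy(dist)
    print(h, 2 ** h)   # prints 3.0 8.0 and 2.0 4.0
```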
17. Entropy
- Uncertainty measure (Shannon)
  - Given a random variable X: H(X) = -Σ_i p_i log_r p_i
  - r = 2, p_i = probability that the event is i
- Biased coin (p = 0.8 / 0.2)
  - H = -0.8 lg 0.8 - 0.2 lg 0.2 = 0.258 + 0.464 = 0.722
- Unbiased coin
  - H = -2 × 0.5 lg 0.5 = 1
- lg = log2 (log base 2)
- Entropy H(X) = Shannon uncertainty
- Perplexity
  - (average) branching factor
  - weighted average number of choices a random variable has to make
  - Formula: 2^H, directly related to the entropy value H
- Examples
  - Biased coin: 2^0.722 ≈ 1.65
  - Unbiased coin: 2^1 = 2
18. Entropy and Word Sequences
- Given a word sequence W = w1 ... wn
- Entropy for word sequences of length n in language L:
  H(w1 ... wn) = -Σ p(w1 ... wn) log p(w1 ... wn)
  - summed over all sequences of length n in language L
- Entropy rate for word sequences of length n:
  (1/n) H(w1 ... wn) = -(1/n) Σ p(w1 ... wn) log p(w1 ... wn)
- Entropy rate of the language: H(L) = lim_{n→∞} -(1/n) Σ p(w1 ... wn) log p(w1 ... wn)
  - n is the number of words in the sequence
- Shannon-McMillan-Breiman theorem: H(L) = lim_{n→∞} -(1/n) log p(w1 ... wn)
  - select a sufficiently large n
  - it is then possible to take a single sequence instead of summing over all possible w1 ... wn
  - a long sequence will contain many shorter sequences
19. Entropy of a sequence
- Finite sequence: strings from a language L
- Entropy rate (per-word entropy)
20. Entropy of a language
- Entropy rate of language L
- Shannon-McMillan-Breiman Theorem
  - If a language is stationary and ergodic, a single sequence, if it is long enough, is representative of the language (see the sketch below)
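A minimal sketch of putting the theorem to work: estimate the per-word (cross-)entropy of a single long test sequence as H ≈ -(1/n) log2 p(w1 ... wn), and the perplexity as 2^H. The toy corpus and the add-one-smoothed unigram model standing in for the language model are assumptions for illustration.

```python
import math
from collections import Counter

# Toy training and test data (hypothetical).
train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the rug".split()

counts = Counter(train)
N = len(train)
V = len(counts)                     # vocabulary size of the toy model

def unigram_prob(w):
    """Add-one-smoothed unigram probability, so p > 0 for any test word."""
    return (counts[w] + 1) / (N + V)

# H ~= -(1/n) * log2 p(w1 ... wn), taken over one long sequence.
log2_prob = sum(math.log2(unigram_prob(w)) for w in test)
H = -log2_prob / len(test)          # per-word entropy estimate, in bits
print(H, 2 ** H)                    # entropy and perplexity of the test sequence
```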