1
Statistical NLP: Lecture 5
  • Mathematical Foundations II
  • Information Theory (Ch. 2)

2
Entropy
  • Entropy is the average uncertainty of a single random variable.
  • Let p(x) = P(X = x), where x ∈ X.
  • H(p) = H(X) = −Σx∈X p(x) log2 p(x)
  • In other words, entropy measures the amount of information in a random variable.
  • It is normally measured in bits. (A small numeric sketch follows below.)
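As a minimal sketch (not part of the original slides), the definition above can be computed directly for a small hand-made distribution; the distribution p below is invented for illustration.

```python
import math

def entropy(p):
    """Entropy in bits: H(p) = -sum over x of p(x) * log2 p(x), skipping zero-probability outcomes."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

# Hypothetical 4-outcome distribution (a fair 4-sided die would give exactly 2 bits).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(entropy(p))  # 1.75 bits
```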

3
Joint Entropy and Conditional Entropy
  • The joint entropy of a pair of discrete random variables X, Y ~ p(x,y) is the amount of information needed on average to specify both their values.
  • H(X,Y) = −Σx∈X Σy∈Y p(x,y) log2 p(x,y)
  • The conditional entropy of a discrete random variable Y given another X, for X, Y ~ p(x,y), expresses how much extra information you still need to supply on average to communicate Y given that the other party knows X.
  • H(Y|X) = −Σx∈X Σy∈Y p(x,y) log2 p(y|x)
  • Chain Rule for Entropy: H(X,Y) = H(X) + H(Y|X) (verified numerically in the sketch below)
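A small sketch (again with an invented joint distribution pxy) to check the chain rule H(X,Y) = H(X) + H(Y|X) numerically:

```python
import math

def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical joint distribution p(x, y) over two binary variables.
pxy = {("x0", "y0"): 0.4, ("x0", "y1"): 0.1,
       ("x1", "y0"): 0.2, ("x1", "y1"): 0.3}

# Marginal p(x); then H(Y|X) = H(X,Y) - H(X) by the chain rule.
px = {}
for (x, _), p in pxy.items():
    px[x] = px.get(x, 0.0) + p

print(H(pxy), H(px), H(pxy) - H(px))  # joint, marginal, and conditional entropy
```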

4
Mutual Information
  • By the chain rule for entropy, we have H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
  • Therefore, H(X) − H(X|Y) = H(Y) − H(Y|X).
  • This difference is called the mutual information between X and Y.
  • It is the reduction in uncertainty of one random variable due to knowing about another, or, in other words, the amount of information one random variable contains about another (see the sketch below).
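Using the identities above, I(X;Y) can be computed as H(X) + H(Y) − H(X,Y); the joint distribution below is the same invented example as before.

```python
import math

def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Invented joint distribution; marginals are accumulated from it.
pxy = {("x0", "y0"): 0.4, ("x0", "y1"): 0.1,
       ("x1", "y0"): 0.2, ("x1", "y1"): 0.3}
px, py = {}, {}
for (x, y), p in pxy.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

print(H(px) + H(py) - H(pxy))  # mutual information I(X;Y) in bits
```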

5
The Noisy Channel Model
  • Assuming that you want to communicate messages over a channel of restricted capacity, optimize the communication (in terms of throughput and accuracy) in the presence of noise in the channel.
  • A channel's capacity can be reached by designing an input code that maximizes the mutual information between the input and output over all possible input distributions.
  • This model can be applied to NLP (a toy decoding sketch follows below).
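As one hedged illustration of the NLP application (not spelled out on this slide), a noisy-channel decoder recovers the most likely intended input i from an observed output o by maximizing p(i) · p(o | i); the words and probabilities below are invented.

```python
# Toy noisy-channel decoder: choose the source word i maximizing p(i) * p(o | i).
prior = {"the": 0.6, "they": 0.3, "thee": 0.1}             # "language model" p(i), invented
channel = {"teh": {"the": 0.8, "they": 0.1, "thee": 0.1}}  # "channel model" p(o | i), invented

def decode(observed):
    return max(prior, key=lambda i: prior[i] * channel[observed].get(i, 0.0))

print(decode("teh"))  # "the"
```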

6
Relative Entropy or Kullback-Leibler Divergence
  • For two pmfs p(x) and q(x), their relative entropy is
  • D(p || q) = Σx∈X p(x) log(p(x)/q(x))
  • The relative entropy (also known as the Kullback-Leibler divergence) is a measure of how different two probability distributions (over the same event space) are.
  • The KL divergence between p and q can also be seen as the average number of bits that are wasted by encoding events from a distribution p with a code based on a not-quite-right distribution q (see the sketch below).
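A minimal sketch of the definition, using base-2 logs so the result is in bits; the two distributions are made up, share the same event space, and have q(x) > 0 wherever p(x) > 0.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits: sum over x of p(x) * log2(p(x) / q(x))."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Invented distributions over the same three events.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}
print(kl_divergence(p, q))  # 0.25 bits wasted on average by coding p with a code built for q
```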

7
The Relation to Language: Cross-Entropy
  • Entropy can be thought of as a matter of how surprised we will be to see the next word, given the previous words we already saw.
  • The cross entropy between a random variable X with true probability distribution p(x) and another pmf q (normally a model of p) is given by
  • H(X, q) = H(X) + D(p || q)
  • Cross-entropy can help us find out what our average surprise for the next word is (see the sketch below).
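A small numeric check of the identity H(X, q) = H(X) + D(p || q), reusing the invented p and q from the previous sketch:

```python
import math

def cross_entropy(p, q):
    """H(p, q) in bits: -sum over x of p(x) * log2 q(x)."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # "true" distribution (invented)
q = {"a": 0.25, "b": 0.25, "c": 0.5}   # model of p (invented)

H_p = -sum(px * math.log2(px) for px in p.values())
print(cross_entropy(p, q), H_p)  # 1.75 vs. 1.5: the gap is D(p || q) = 0.25 bits
```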

8
The Entropy of English
  • We can model English using n-gram models (also known as Markov chains).
  • These models assume limited memory, i.e., we assume that the next word depends only on the previous k words: a kth-order Markov approximation.
  • What is the entropy of English? (A rough estimation sketch follows below.)
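As a rough, hedged sketch of the kind of estimate involved (not from the slides): the per-character entropy of a text sample under its empirical unigram distribution gives a crude upper bound on the entropy rate of English; the sample string is arbitrary, and a real estimate would use a large corpus and higher-order n-grams.

```python
import math
from collections import Counter

def unigram_entropy(text):
    """Per-character entropy in bits under the empirical unigram distribution of `text`."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog"  # arbitrary toy sample
print(unigram_entropy(sample))
```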

9
Perplexity
  • A measure related to the notion of cross-entropy and used in the speech recognition community is called the perplexity.
  • Perplexity(x1n, m) = 2^H(x1n, m) = m(x1n)^(-1/n)
  • A perplexity of k means that you are as surprised on average as you would have been if you had had to guess between k equiprobable choices at each step (see the sketch below).
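A minimal sketch, assuming a hypothetical model m that assigns the per-word probabilities listed below to a 4-word sequence; the perplexity is 2 raised to the per-word cross-entropy.

```python
import math

def perplexity(word_probs):
    """2 ** (per-word cross-entropy) = (product of model probabilities) ** (-1 / n)."""
    n = len(word_probs)
    cross_entropy_per_word = -sum(math.log2(p) for p in word_probs) / n
    return 2 ** cross_entropy_per_word

probs = [0.25, 0.5, 0.125, 0.25]  # hypothetical p(word | context) values from model m
print(perplexity(probs))  # 4.0: as surprised as choosing among 4 equiprobable words per step
```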