CSA4050: Advanced Topics in NLP
Transcript and Presenter's Notes
1
CSA4050: Advanced Topics in NLP
  • Probability I
  • Experiments/Outcomes/Events
  • Independence/Dependence
  • Bayes Rule
  • Conditional Probability/Chain Rule

2
Acknowledgement
  • Much of this material is based on work by
    Mary Dalrymple, King's College London

3
Experiment, Basic Outcome, Sample Space
  • Probability theory is founded upon the notion of
    an experiment.
  • An experiment is a situation which can have one
    or more different basic outcomes.
  • Example: if we throw a die, there are six
    possible basic outcomes.
  • A Sample Space Ω is the set of all possible basic
    outcomes. For example:
  • If we toss a coin, Ω = {H,T}
  • If we toss a coin twice, Ω = {HT,TH,TT,HH}
  • If we throw a die, Ω = {1,2,3,4,5,6}

4
Event
  • An Event A ⊆ Ω is a set of basic outcomes, e.g.
  • tossing two heads: {HH}
  • throwing a 6: {6}
  • getting either a 2 or a 4: {2,4}.
  • Ω itself is the certain event, whilst ∅ is the
    impossible event.
  • Event Space = 2^Ω, the set of all subsets of the
    Sample Space.

5
Probability distribution
  • A probability distribution of an experiment is a
    function that assigns a number (or probability)
    between 0 and 1 to each basic outcome, such that
    the sum of all the probabilities is 1.
  • The probability p(E) of an event E is the sum of
    the probabilities of all the basic outcomes in E.
  • A uniform distribution is one in which each basic
    outcome is equally likely.

6
Probability of an Event: die example
  • Sample space = set of basic outcomes =
    {1,2,3,4,5,6}
  • If the die is not loaded, the distribution is
    uniform.
  • Thus each basic outcome, e.g. 6 (throwing a
    six), is assigned the same probability, 1/6.
  • So p({3,6}) = p(3) + p(6) = 2/6 = 1/3

7
Estimating Probability
  • Repeat the experiment T times and count the
    frequency of E.
  • Estimated p(E) = count(E)/T
  • This can be done over m runs, yielding estimates
    p1(E), ..., pm(E).
  • The best estimate is the (possibly weighted)
    average of the individual pi(E), as sketched
    below.
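
A minimal sketch of this relative-frequency estimation in Python; the
fair-die experiment, trial count, and number of runs are illustrative
assumptions, not part of the original slides.

```python
import random

def estimate_p(event, experiment, trials):
    """Relative-frequency estimate: count(E)/T over T repetitions."""
    count = sum(1 for _ in range(trials) if experiment() in event)
    return count / trials

# Illustrative experiment: one throw of a fair die, E = {6}.
def throw_die():
    return random.randint(1, 6)

# m = 5 independent runs of T = 1000 trials each, then averaged.
runs = [estimate_p({6}, throw_die, trials=1000) for _ in range(5)]
print(runs, sum(runs) / len(runs))  # each run is near 1/6 ≈ .167
```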

8
3 times coin toss
  • Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  • Cases with exactly 2 tails: E = {HTT, THT, TTH}
  • Each experiment i: 1000 cases (3000 tosses).
  • c1(E) = 386, p1(E) = .386
  • c2(E) = 375, p2(E) = .375
  • pmean(E) = (.386 + .375)/2 = .3805
  • Recall that a uniform distribution is one in which
    each basic outcome is equally likely.
  • Assuming a uniform distribution, p(E) = 3/8 = .375
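
A quick simulation of one such experiment; a sketch only, with random
outcomes, so each run's estimate fluctuates around 3/8.

```python
import random

# One experiment: 1000 cases of three coin tosses;
# E = exactly two tails.
def three_tosses():
    return "".join(random.choice("HT") for _ in range(3))

cases = 1000
c = sum(1 for _ in range(cases) if three_tosses().count("T") == 2)
print(c / cases)  # fluctuates around the uniform value 3/8 = .375
```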

9
Word Probability
  • General Problem: What is the probability of the
    next word/character/phoneme in a sequence, given
    the first N words/characters/phonemes?
  • To approach this problem we study an experiment
    whose sample space is the set of possible words.
  • N.B. The same approach could be used to study
    the probability of the next character or phoneme.

10
Word Probability
  • Approximation 1: all words are equally probable.
  • Then the probability of each word = 1/N, where N
    is the number of word types.
  • But words are not all equally probable.
  • Approximation 2: the probability of each word is
    its relative frequency of occurrence in a corpus.

11
Word Probability
  • Estimate p(w), the probability of word w.
  • Given corpus C: p(w) ≈ count(w)/size(C)
  • Example:
  • Brown corpus: 1,000,000 tokens
  • the: 69,971 tokens
  • Probability of the = 69,971/1,000,000 ≈ .07
  • rabbit: 11 tokens
  • Probability of rabbit = 11/1,000,000 ≈ .00001
  • Conclusion: the next word is most likely to be
    the.
  • Is this correct?
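
As a concrete sketch of the count-based estimate above; the toy corpus
here is illustrative, standing in for the Brown corpus.

```python
from collections import Counter

def unigram_probs(tokens):
    """p(w) ≈ count(w) / size(C), the relative frequency in corpus C."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

# Illustrative toy corpus standing in for the Brown corpus.
corpus = "look at the cute rabbit near the house".split()
p = unigram_probs(corpus)
print(p["the"], p["rabbit"])  # 2/8 = .25 and 1/8 = .125
```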

12
A counter example
  • Given the context "Look at the cute ..."
  • is "the" more likely than "rabbit"?
  • Context matters in determining what word comes
    next.
  • What is the probability of the next word in a
    sequence, given the first N words?

13
Independent Events
[Figure: sample space containing two independent events,
A = eggs and B = monday]
14
Sample Space
  • (eggs,mon) (cereal,mon) (nothing,mon)
  • (eggs,tue) (cereal,tue) (nothing,tue)
  • (eggs,wed) (cereal,wed) (nothing,wed)
  • (eggs,thu) (cereal,thu) (nothing,thu)
  • (eggs,fri) (cereal,fri) (nothing,fri)
  • (eggs,sat) (cereal,sat) (nothing,sat)
  • (eggs,sun) (cereal,sun) (nothing,sun)

15
Independent Events
  • Two events, A and B, are independent if the fact
    that A occurs does not affect the probability of
    B occurring.
  • When two events, A and B, are independent, the
    probability of both occurring p(A,B) is the
    product of the prior probabilities of each, i.e.
  • p(A,B) = p(A) × p(B)
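
A small check of the product rule on the breakfast/day sample space of
slide 14, assuming (purely for illustration) that all 21 outcomes are
equally likely:

```python
from fractions import Fraction
from itertools import product

# The 21-outcome sample space from slide 14, assumed uniform.
foods = ["eggs", "cereal", "nothing"]
days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
omega = list(product(foods, days))

def p(event):
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

A = lambda o: o[0] == "eggs"   # A = eggs
B = lambda o: o[1] == "mon"    # B = monday
both = lambda o: A(o) and B(o)

print(p(both) == p(A) * p(B))  # True: 1/21 = 1/3 × 1/7
```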

16
Dependent Events
  • Two events, A and B, are dependent if the
    occurrence of one affects the probability of the
    occurrence of the other.

17
Dependent Events
[Figure: sample space with overlapping events A and B;
their intersection is A ∩ B]
18
Conditional Probability
  • The conditional probability of an event A, given
    that event B has already occurred, is written
    p(A|B).
  • In general p(A|B) ≠ p(B|A).

19
Dependent Events: p(A|B) ≠ p(B|A)
[Figure: sample space with overlapping events A and B
and their intersection A ∩ B]
20
Example Dependencies
  • Consider the fair die example with:
  • A = outcome divisible by 2 = {2,4,6}
  • B = outcome divisible by 3 = {3,6}
  • C = outcome divisible by 4 = {4}
  • p(A|B) = p(A ∩ B)/p(B) = (1/6)/(1/3) = 1/2
  • p(A|C) = p(A ∩ C)/p(C) = (1/6)/(1/6) = 1
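
These values are easy to verify mechanically; a minimal sketch,
assuming a fair (uniform) die:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}            # fair die, uniform distribution
A = {o for o in omega if o % 2 == 0}  # divisible by 2
B = {o for o in omega if o % 3 == 0}  # divisible by 3
C = {o for o in omega if o % 4 == 0}  # divisible by 4

def p(event):
    return Fraction(len(event), len(omega))

def cond(a, b):
    """p(a|b) = p(a ∩ b) / p(b)"""
    return p(a & b) / p(b)

print(cond(A, B))  # 1/2
print(cond(A, C))  # 1
```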

21
Conditional Probability
  • Intuitively, after B has occurred, event A is
    replaced by A ∩ B, the sample space Ω is replaced
    by B, and probabilities are renormalised
    accordingly.
  • The conditional probability of an event A given
    that B has occurred (p(B) > 0) is thus given by
    p(A|B) = p(A ∩ B)/p(B).
  • If A and B are independent, p(A ∩ B) =
    p(A) × p(B), so p(A|B) = p(A) × p(B)/p(B) = p(A).

22
Bayesian Inversion
  • For both A and B to occur, either A must occur
    first, then B, or vice versa. We get the following
    possibilities:
  • p(A|B) = p(A ∩ B)/p(B) and p(B|A) = p(A ∩ B)/p(A)
  • Hence p(A|B) p(B) = p(B|A) p(A)
  • We can thus express p(A|B) in terms of p(B|A):
  • p(A|B) = p(B|A) p(A)/p(B)
  • This equivalence, known as Bayes' Theorem, is
    useful when one or the other quantity is difficult
    to determine.
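
Continuing the die example from slide 20, a quick numeric check of the
inversion (a sketch; the events are the ones defined there):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die
A = {2, 4, 6}                # divisible by 2
B = {3, 6}                   # divisible by 3

def p(e):
    return Fraction(len(e), len(omega))

direct = p(A & B) / p(B)              # p(A|B) by definition: 1/2
p_B_given_A = p(A & B) / p(A)         # p(B|A) = 1/3
inverted = p_B_given_A * p(A) / p(B)  # Bayes: p(B|A) p(A)/p(B)
print(direct == inverted)             # True: both are 1/2
```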

23
Bayes Theorem
  • p(B|A) = p(B ∩ A)/p(A) = p(A|B) p(B)/p(A)
  • The denominator p(A) can be ignored if we are
    only interested in which event out of some set is
    most likely.
  • Typically we are interested in the value of B
    that is most likely given an observation A, i.e.
  • arg maxB p(A|B) p(B)/p(A) = arg maxB p(A|B) p(B)
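
A minimal sketch of this argmax; the two candidate words and their
likelihoods/priors are invented for illustration, not corpus-derived:

```python
# Pick the most likely B given observation A. The denominator p(A)
# is the same for every candidate, so it can be dropped.
candidates = {
    # B: (p(A|B), p(B)) -- illustrative numbers only
    "rabbit": (0.30, 0.0001),   # 0.30 × 0.0001 = 3e-5
    "the":    (0.0001, 0.07),   # 0.0001 × 0.07 = 7e-6
}
best = max(candidates, key=lambda b: candidates[b][0] * candidates[b][1])
print(best)  # rabbit
```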

24
The Chain Rule
  • We can extend the definition of conditional
    probability to more than two events:
  • p(A1 ∩ ... ∩ An) = p(A1) p(A2|A1) p(A3|A1 ∩ A2)
    ... p(An|A1 ∩ ... ∩ An-1)
  • The chain rule allows us to talk about the
    probability of sequences of events p(A1,...,An).
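
A minimal sketch of the chain rule applied to a word sequence, with
each conditional probability estimated as a relative frequency over
the full history; the toy corpus is illustrative:

```python
# p(w1,...,wn) = p(w1) p(w2|w1) ... p(wn|w1,...,wn-1)
corpus = ["look", "at", "the", "cute", "rabbit"] * 3

def p_next(word, history, tokens):
    """p(word|history) ≈ count(history + word) / count(history + any)."""
    n = len(history)
    ctx = sum(1 for i in range(len(tokens) - n)
              if tokens[i:i + n] == history)
    both = sum(1 for i in range(len(tokens) - n)
               if tokens[i:i + n + 1] == history + [word])
    return both / ctx if ctx else 0.0

def p_sequence(words, tokens):
    """Chain-rule product of history-conditioned probabilities."""
    prob = 1.0
    for i, w in enumerate(words):
        prob *= p_next(w, words[:i], tokens)
    return prob

print(p_sequence(["look", "at", "the"], corpus))  # 3/15 × 1 × 1 = 0.2
```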