Naïve Bayesian Learning: slides by Francisco Iacobelli, modified by Junsong Yuan




1
Naïve Bayesian Learning
slides by Francisco Iacobelli / modified by Junsong Yuan
  • Thomas Bayes (c. 1702 - 17 April 1761) was a
    British mathematician and Presbyterian
    minister. (Wikipedia)

2
Bayes' Theorem
An Essay towards solving a Problem in the
Doctrine of Chances
  • Notation
  • P(h): probability that hypothesis h holds (prior
    probability).
  • P(D): probability of observing the training data
    D.
  • P(D|h): probability of observing D when h holds
    (read "probability of D given h").
  • P(h|D): probability that h holds given the
    observed data D (posterior probability).
  • This last probability is what interests us. Why?
  • Because we usually have data and need to find the
    most probable hypothesis given that data.

3
Bayes' Theorem
P(h|D) = P(D|h) P(h) / P(D)
4
An Example
  • Weather forecast.
  • Probability of a hurricane in Chicago: 0.008.
  • When a hurricane occurs, Tom Skilling predicts it
    98% of the time (p-h: predicts hurricane).
  • When no hurricane occurs, Skilling correctly
    predicts no hurricane 97% of the time
    (p-nh: predicts no hurricane).

P(hurricane) = 0.008
P(¬hurricane) = 0.992
5
An Example
  • Weather forecast.
  • Probability of a hurricane in Chicago: 0.008.
  • When a hurricane occurs, Tom Skilling predicts it
    98% of the time (p-h: predicts hurricane).
  • When no hurricane occurs, Skilling correctly
    predicts no hurricane 97% of the time
    (p-nh: predicts no hurricane).

P(hurricane) = 0.008
P(¬hurricane) = 0.992
P(p-h|hurricane) = 0.98
P(p-nh|hurricane) = 0.02
6
An Example
  • Weather forecast.
  • Probability of a hurricane in Chicago: 0.008.
  • When a hurricane occurs, Tom Skilling predicts it
    98% of the time (p-h: predicts hurricane).
  • When no hurricane occurs, Skilling correctly
    predicts no hurricane 97% of the time
    (p-nh: predicts no hurricane).

P(hurricane) = 0.008
P(¬hurricane) = 0.992
P(p-h|hurricane) = 0.98
P(p-h|¬hurricane) = 0.03
P(p-nh|hurricane) = 0.02
P(p-nh|¬hurricane) = 0.97
8
An Example
  • Any random day, Skilling says there will be a
    hurricane. Should we believe him?
  • What's the probability that he is right?
  • What's the probability that he is wrong?

Let P(hurricane) = P(h) and P(¬hurricane) = P(¬h).
P(h|p-h) = P(p-h|h) P(h) / P(p-h) = (0.98 × 0.008) / P(p-h) ≈ 0.0078 / P(p-h)
P(¬h|p-h) = P(p-h|¬h) P(¬h) / P(p-h) = (0.03 × 0.992) / P(p-h) ≈ 0.0298 / P(p-h)
P(¬h|p-h) ≈ 3.8 × P(h|p-h), so even after a hurricane forecast, no hurricane is the more probable outcome.
9
  • How does Tom predict?

P(h) = 0.008
P(¬h) = 0.992
P(p-h|h) = 0.98
P(p-nh|¬h) = 0.97
P(p-h) = P(p-h|h) P(h) + P(p-h|¬h) P(¬h) = 0.0376
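
The forecast example can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `posterior` and its parameter names are made up for illustration. It applies Bayes' theorem to a binary hypothesis and obtains P(p-h) by total probability.

```python
def posterior(prior_h, p_pred_given_h, p_pred_given_not_h):
    """Return (P(h | p-h), P(not h | p-h), P(p-h)) for a binary hypothesis."""
    joint_h = p_pred_given_h * prior_h                    # P(p-h | h) P(h)
    joint_not_h = p_pred_given_not_h * (1.0 - prior_h)    # P(p-h | not h) P(not h)
    evidence = joint_h + joint_not_h                      # P(p-h), total probability
    return joint_h / evidence, joint_not_h / evidence, evidence


# Numbers from the slides: P(h) = 0.008, P(p-h | h) = 0.98, P(p-h | not h) = 0.03
p_h, p_not_h, p_pred = posterior(0.008, 0.98, 0.03)
print(round(p_pred, 4))   # 0.0376
print(round(p_h, 3))      # 0.209
print(round(p_not_h, 3))  # 0.791
```

So when Skilling predicts a hurricane, the probability that one actually occurs is only about 21%, because hurricanes are so rare to begin with.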
10
Maximum A Posteriori hypothesis (MAP) and Maximum
Likelihood (ML)
  • h_MAP = argmax_h P(h|D)
  • But P(h|D) = P(D|h) P(h) / P(D), so
  • h_MAP = argmax_h P(D|h) P(h) / P(D). That's what we
    did!
  • h_MAP = argmax_h P(D|h) P(h), because P(D) does not
    depend on h.
  • h_ML = argmax_h P(D|h), when P(h) is constant.
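
The difference between the two rules shows up in the hurricane example. The sketch below is illustrative only; the dictionary layout and the function names `h_map` and `h_ml` are assumptions, not something the slides define. The observed data D is "Skilling predicts a hurricane".

```python
# Priors and likelihoods P(p-h | hypothesis) taken from the hurricane example.
prior = {"h": 0.008, "not_h": 0.992}
likelihood = {"h": 0.98, "not_h": 0.03}


def h_map(prior, likelihood):
    """MAP hypothesis: maximize P(D | h) P(h); P(D) is the same for every h."""
    return max(prior, key=lambda h: likelihood[h] * prior[h])


def h_ml(likelihood):
    """ML hypothesis: maximize P(D | h) alone, i.e. treat the prior as constant."""
    return max(likelihood, key=likelihood.get)


print(h_map(prior, likelihood))  # not_h
print(h_ml(likelihood))          # h
```

The ML rule ignores how rare hurricanes are and therefore picks "hurricane"; the MAP rule weighs the likelihood by the prior and picks "no hurricane".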
11
Naïve Bayes
  • One of the most practical methods, together with
    neural networks, decision trees, and nearest
    neighbor.
  • When to use:
  • Moderate to large training sets.
  • Attributes that describe the data are conditionally
    independent, e.g., P(a|c,b) = P(a|b).
  • Successful applications: authorship of texts,
    other text classification, diagnosis.

12
Naïve Bayes: What about more attributes?
  • V = {watch-TV, Do-Homework}
  • A = {tired, Heroes-on-TV, I-have-ML-hw}
  • v_MAP = argmax_vj P(vj|a1,a2,...,an)
  • But P(vj|a1,a2,...,an) = P(a1,a2,...,an|vj) P(vj) / P(a1,a2,...,an), so
  • v_MAP = argmax_vj P(a1,a2,...,an|vj) P(vj) / P(a1,a2,...,an)
    = argmax_vj P(a1,a2,...,an|vj) P(vj). We have done this!
  • Assuming conditional independence,
    P(a1,a2,...,an|vj) = P(a1|vj) ··· P(an|vj), so
  • v_NB = argmax_vj P(vj) P(a1|vj) ··· P(an|vj)
  • Let's look at an example.
13
Play Tennis
Tom Mitchell, Machine Learning, p. 59
14
New Observation: <Sunny, Cool, High, Strong>
  • Do you play tennis?
  • v_NB = argmax {P(yes|<s,c,h,s>), P(no|<s,c,h,s>)}
  • P(yes|<s,c,h,s>) ∝ P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes)
  • P(yes) = 9/14 ≈ 0.64
  • P(sunny|yes) = 2/9 ≈ 0.22
  • P(cool|yes) = 3/9 ≈ 0.33
  • P(high|yes) = 3/9 ≈ 0.33
  • P(strong|yes) = 3/9 ≈ 0.33
  • P(yes|<s,c,h,s>) ∝ 0.0051 (unnormalized product of the
    rounded factors; the exact fractions give 0.0053)

15
New Observation: <Sunny, Cool, High, Strong>
  • Do you play tennis?
  • v_NB = argmax {P(yes|<s,c,h,s>), P(no|<s,c,h,s>)}
  • P(no|<s,c,h,s>) ∝ P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no)
  • P(no) = 5/14 ≈ 0.36
  • P(sunny|no) = 3/5 = 0.6
  • P(cool|no) = 1/5 = 0.2
  • P(high|no) = 4/5 = 0.8
  • P(strong|no) = 3/5 = 0.6
  • P(no|<s,c,h,s>) ∝ 0.0207 (unnormalized product of the
    rounded factors; the exact fractions give 0.0206)

16
New Observation: <Sunny, Cool, High, Strong>
  • Do you play tennis?
  • v_NB = argmax {P(yes|<s,c,h,s>), P(no|<s,c,h,s>)}
  • P(yes|<s,c,h,s>) ∝ 0.0051
  • P(no|<s,c,h,s>) ∝ 0.0207
  • Therefore, given <sunny, cool, high, strong>, I
    will not play tennis.
  • Moreover, there is an 80% probability that I will
    not play tennis under the given forecast
    (normalize: 0.0207 / (0.0207 + 0.0051) ≈ 0.80).
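
The Play Tennis calculation can be reproduced with exact fractions. This is a minimal sketch that hard-codes the counts quoted on the slides rather than reading the training table; the names `priors`, `cond`, and `nb_scores` are illustrative assumptions.

```python
from fractions import Fraction as F

# Class priors and conditional probabilities as quoted on the slides.
priors = {"yes": F(9, 14), "no": F(5, 14)}
cond = {
    "yes": {"sunny": F(2, 9), "cool": F(3, 9), "high": F(3, 9), "strong": F(3, 9)},
    "no": {"sunny": F(3, 5), "cool": F(1, 5), "high": F(4, 5), "strong": F(3, 5)},
}


def nb_scores(observation):
    """Unnormalized naive Bayes score P(v) * prod_i P(a_i | v) for each class v."""
    scores = {}
    for v, prior in priors.items():
        score = prior
        for attribute_value in observation:
            score *= cond[v][attribute_value]
        scores[v] = score
    return scores


scores = nb_scores(["sunny", "cool", "high", "strong"])
total = sum(scores.values())
posterior = {v: s / total for v, s in scores.items()}  # normalize to sum to 1
v_nb = max(scores, key=scores.get)

print(v_nb)                               # no
print(scores["yes"], scores["no"])        # 1/189 18/875
print(round(float(posterior["no"]), 3))   # 0.795
```

With exact fractions the scores are about 0.0053 and 0.0206, slightly different from the values obtained by multiplying rounded factors, but the decision and the roughly 80% posterior for "no" are the same.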

17
Another Observation: <overcast, cool, low, strong>
  • P(no) P(overcast|no) P(cool|no) P(low|no) P(strong|no) = 0,
    because P(overcast|no) = 0/5 = 0.
  • So, P(no|<o,c,l,s>) = 0!?
  • Solve: m-estimate of probability, (nc + m p) / (n + m)
  • p = 1/|values|
  • m: weight. Usually |values|.
  • P(overcast|no) = (0 + 3 × 1/3) / (5 + 3) = 0.125
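
The m-estimate is a one-line formula. The sketch below is illustrative; the function name `m_estimate` and its argument names are assumptions chosen to mirror the symbols on the slide.

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of probability: (n_c + m * p) / (n + m).

    n_c: examples of the class that have the attribute value
    n:   examples of the class
    p:   prior estimate, e.g. 1 / number of attribute values
    m:   weight (how strongly the prior counts), e.g. number of attribute values
    """
    return (n_c + m * p) / (n + m)


# P(overcast | no): 0 of 5 "no" examples are overcast; Outlook has 3 values.
p_overcast_no = m_estimate(n_c=0, n=5, p=1 / 3, m=3)
print(round(p_overcast_no, 3))  # 0.125
```

With m = 0 the function reduces to the raw frequency n_c / n, which is what produced the zero probability in the first place.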
18
Naïve Bayes and Text
  • Naïve Bayes is used to classify text:
  • Authorship of documents
  • Interesting websites
  • And ...

19
SPAM!!!!
20
Sample emails
  • In just as little as 2 weeks you can have a
    master's degree from a national university. A
    better job, more income and a better life can all
    be yours in less than 2 weeks.
  • Yo Walter_balan!.!A Genuine Univers1ty Degree 1n
    4-6 weeks! Have you ever thought that the only
    thing stopping you from a great job and better pay
    was a few letters behind you name? Well now you
    can get them! BA BSc MA MSc MBA PhD

21
Issues
  • Words are not always in the same positions, e.g.,
    "weeks", "job", "better".
  • Not all messages have the same length.
  • Not all words are conditionally independent, e.g.,
    "University Degree".

22
How to do it?
  • One approach: N-grams (take n words at a time).
  • Another approach:
  • Assume conditional independence, and
  • Assume the probability of a word is the same in any
    given position:
  • P(ai = wk|vj) = P(am = wk|vj) for all positions i, m
  • Probability P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
23
Now, Let's Learn the Probabilities
  • Examples: a set of texts with target values. V is
    the set of possible target values (spam, no
    spam).
  • Collect all words:
  • Vocabulary: all distinct words and tokens in any
    text document.
  • Calculate P(vj) and P(wk|vj):
  • For each target value vj in V do
  • docsj: subset of Examples for which the target is vj
  • P(vj) = |docsj| / |Examples|
  • Textj: concatenation of the members of docsj
  • n: total number of word positions in Textj
  • For each word wk in Vocabulary
  • nk: number of times word wk occurs in Textj
  • P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
24
And Classify
  • Positions: all word positions in Doc that
    contain tokens found in Vocabulary
  • Return vNB, where
  • vNB = argmax over vj in V of P(vj) ∏ P(ai|vj),
    with the product taken over i in Positions
  • Note: ai is the word at the ith position.
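
The learning and classification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' implementation: the function names, whitespace tokenization with lowercasing, and the four toy emails are all invented for the example, and every target value is assumed to have at least one training text. The classifier sums log-probabilities instead of multiplying probabilities, a common substitution that avoids numerical underflow and leaves the argmax unchanged.

```python
import math
from collections import Counter


def learn_naive_bayes_text(examples, targets):
    """examples: list of (text, target) pairs; targets: the possible target values."""
    vocabulary = {w for text, _ in examples for w in text.lower().split()}
    priors, word_probs = {}, {}
    for v in targets:
        docs = [text for text, target in examples if target == v]
        priors[v] = len(docs) / len(examples)            # P(vj) = |docsj| / |Examples|
        counts = Counter(w for text in docs for w in text.lower().split())
        n = sum(counts.values())                         # word positions in Textj
        word_probs[v] = {                                # P(wk | vj), add-one smoothed
            w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary
        }
    return vocabulary, priors, word_probs


def classify_naive_bayes_text(doc, vocabulary, priors, word_probs):
    """Return the target value with the highest naive Bayes score for doc."""
    words = [w for w in doc.lower().split() if w in vocabulary]  # known tokens only

    def log_score(v):
        return math.log(priors[v]) + sum(math.log(word_probs[v][w]) for w in words)

    return max(priors, key=log_score)


# Hypothetical toy training set.
examples = [
    ("get a degree in 2 weeks", "spam"),
    ("better job better pay degree", "spam"),
    ("meeting notes for the ml homework", "no spam"),
    ("homework due in 2 weeks", "no spam"),
]
vocabulary, priors, word_probs = learn_naive_bayes_text(examples, ["spam", "no spam"])
print(classify_naive_bayes_text("degree better job", vocabulary, priors, word_probs))
print(classify_naive_bayes_text("ml homework notes", vocabulary, priors, word_probs))
```

Words that never appeared in training are skipped at classification time, which matches the definition of Positions on the slide.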
25
A Note on Performance
Kim, S. B., Seo, H. C., and Rim, H. C. (2003).
26
Final Notes
  • P(attribute) can have any distribution you want.
  • You may need to find variance and error
    estimation
  • But once that's figured out, you can have a
    classifier with a different probability
    distribution and maybe less training data.
  • To overcome the conditional independence
    restriction, one can build Bayesian Belief
    Networks
  • Bayes' theorem provides many possibilities for
    probabilistic machine learning and underlies
    algorithms used in many applications,
    including speech recognition, natural language
    generation, and text classification.