1
Artificial Intelligence
  • Lecture 17
  • Maximum Likelihood Learning

2
Overview
  • Bayes Theorem
  • MAP, ML hypotheses
  • Naïve Bayes classifier

3
Bayes Theorem
4
Candy Bags
  • Two flavors, cherry and lime, wrapped in the same
    opaque wrapper
  • Five candy bags
  • h1: 100% cherry
  • h2: 75% cherry, 25% lime
  • h3: 50% cherry, 50% lime
  • h4: 25% cherry, 75% lime
  • h5: 100% lime
  • Given a bag of candy, let H denote the type of
    the bag. We randomly open and inspect pieces of
    candy D1, D2, D3, ..., and predict the flavor of
    the next piece that is randomly picked (the
    per-piece likelihoods implied by each bag are
    listed below).
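The bag compositions above fix the per-piece likelihoods used in every calculation that follows; written in LaTeX (standard notation, implied directly by the slide):

\[
P(\text{lime} \mid h_1) = 0,\quad
P(\text{lime} \mid h_2) = 0.25,\quad
P(\text{lime} \mid h_3) = 0.5,\quad
P(\text{lime} \mid h_4) = 0.75,\quad
P(\text{lime} \mid h_5) = 1 .
\]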

5
Bayesian Learning
  • Given the data, calculate the probability of each
    hypothesis
  • Predict using all hypotheses, weighted by their
    probabilities
  • Let D be all the data; the probability of each
    hypothesis is P(h | D)

6
Bayes Theorem
  • P(h): prior probability of hypothesis h
  • P(D): prior probability of training data D
  • P(h | D): probability of h given D (posterior
    probability of h)
  • P(D | h): probability of D given h (likelihood of
    D under h)
  • Prediction is made according to the weighted
    average over the predictions of each hypothesis,
    P(vj | D)
  • The optimal classification of a new instance is
    the value vj for which P(vj | D) is maximum (the
    formulas are written out below)
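In LaTeX, the two quantities the slide refers to are Bayes' theorem and the Bayes-optimal (weighted-average) prediction:

\[
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}, \qquad
P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D),
\]

and the optimal classification is the value v_j that maximizes P(v_j | D).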

7
What Is the Next Flavor?
  • Suppose we randomly pick 10 samples from the bag
    and they are all lime. What is the flavor of the
    next piece?
  • The prior distribution over h1, h2, ..., h5 is
    0.1, 0.2, 0.4, 0.2, 0.1, as advertised by the
    manufacturer.
  • Assume the samples d1, d2, ... observed are i.i.d.
    (independent and identically distributed); the
    likelihood then factorizes as shown below.
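Under the i.i.d. assumption, the likelihood of the data under each hypothesis is a product of per-piece likelihoods; with 10 lime observations this is (standard notation):

\[
P(D \mid h_i) = \prod_{j} P(d_j \mid h_i) = P(\text{lime} \mid h_i)^{10}.
\]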

8
Posterior Prediction
  • P(hi | D1 = lime) = ?
  • P(hi | D1 = lime, D2 = lime) = ?
  • P(hi | D1 = lime, ..., D10 = lime) = ?
  • P(X = lime | D1, ..., D10) = ?
  • P(X = cherry | D1, ..., D10) = ?
    (a worked numerical sketch follows below)
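The transcript does not show the numbers, so the following is a minimal Python sketch, assuming the priors and bag compositions given earlier; the function and variable names are illustrative, not from the lecture.

priors = [0.1, 0.2, 0.4, 0.2, 0.1]      # P(h1), ..., P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]    # P(lime | h1), ..., P(lime | h5)

def posteriors(n_lime):
    # Bayes theorem with i.i.d. likelihood P(D | hi) = P(lime | hi)**n_lime,
    # followed by normalization over the five hypotheses.
    unnorm = [prior * (q ** n_lime) for prior, q in zip(priors, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def p_next_lime(n_lime):
    # Bayes-optimal prediction: P(next = lime | D) = sum_i P(lime | hi) P(hi | D).
    return sum(q * w for q, w in zip(p_lime, posteriors(n_lime)))

for n in (1, 2, 10):
    print(n, [round(p, 4) for p in posteriors(n)], round(p_next_lime(n), 4))

As the number of observed limes grows, the posterior mass concentrates on h5 (the all-lime bag) and the predicted probability that the next piece is lime approaches 1.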

9
Posterior Bayesian Prediction
10
Bayesian Prediction
  • The true hypothesis eventually dominates the
    Bayesian prediction
  • The Bayesian prediction is optimal whether the
    dataset is small or large; any other prediction
    is correct less often

11
MAP and ML Hypotheses
12
Problem with Optimal Bayes Classifier
  • When the hypothesis space is very large,
    computing the summation is intractable
  • Resort to approximation
  • Predict according to the most probable hypothesis,
    the hi that maximizes P(hi | D); this is often
    called the maximum a posteriori (MAP) hypothesis
    (written out below)
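In LaTeX, the MAP hypothesis and its maximum-likelihood (ML) counterpart, obtained when the priors are taken to be uniform, are:

\[
h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h),
\qquad
h_{ML} = \arg\max_{h \in H} P(D \mid h).
\]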

13
MAP and ML
14
Example
  • Does the patient have cancer or not?
  • A patient takes a lab test and the result is
    positive. The test returns a correct positive
    result in only 98% of the cases in which the
    disease is actually present, and a correct
    negative result in only 97% of the cases in which
    the disease is not present. Furthermore, 0.8% of
    the entire population have this cancer.

15
MAP Classification
  • P(cancer | +) = ?
  • P(not cancer | +) = ?
  • P(cancer) = 0.008
  • P(not cancer) = 0.992
  • P(+ | cancer) = 0.98
  • P(- | cancer) = 0.02
  • P(+ | not cancer) = 0.03
  • P(- | not cancer) = 0.97
    (the MAP decision is worked out below)
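For a positive test result, the MAP comparison follows directly from the numbers above (likelihood times prior; normalization is not needed to find the maximum):

\[
P(+ \mid \text{cancer})\,P(\text{cancer}) = 0.98 \times 0.008 \approx 0.0078,
\qquad
P(+ \mid \neg\text{cancer})\,P(\neg\text{cancer}) = 0.03 \times 0.992 \approx 0.0298,
\]

so h_MAP = not cancer; normalizing gives P(cancer | +) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21.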

16
Naïve Bayes Classifier
17
Naïve Bayes Classifier
  • One of the most practical learning methods
  • Used when a large training set is available and
    the attributes are conditionally independent given
    the hypothesis
  • Successfully applied to text classification

18
Naïve Bayes Classifier
  • Assume a new instance is described by attribute
    values a1, a2, ..., an; the naïve Bayes
    classification is then given by the formula below
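In LaTeX, the conditional-independence assumption and the resulting naïve Bayes classifier are (standard definitions, consistent with the slide):

\[
P(a_1, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j),
\qquad
v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j).
\]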

19
Naïve Bayes Algorithm
20
Example
  • Consider the new instance ⟨sunny, warm, high,
    strong, warm, same⟩
  • Compute
  • P(y) P(sunny | y) P(warm | y) P(high | y)
    P(strong | y) P(warm | y) P(same | y) = ?
  • P(n) P(sunny | n) P(warm | n) P(high | n)
    P(strong | n) P(warm | n) P(same | n) = ?
    (a small code sketch of this computation follows below)
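Not part of the transcript: a minimal Python sketch of this argmax computation, assuming the conditional probability tables have already been estimated from the training set. The labels "y"/"n" and the probability values below are hypothetical placeholders, since the lecture's training data is not reproduced here.

from math import prod

def naive_bayes_classify(instance, prior, cond):
    # cond[v][i] maps values of attribute i to the estimate of P(a_i = value | v);
    # the score for each label v is P(v) * prod_i P(a_i | v).
    scores = {v: prior[v] * prod(cond[v][i].get(a, 0.0)
                                 for i, a in enumerate(instance))
              for v in prior}
    return max(scores, key=scores.get), scores

instance = ["sunny", "warm", "high", "strong", "warm", "same"]  # the slide's instance
prior = {"y": 0.6, "n": 0.4}                                    # hypothetical priors
cond = {
    "y": [{"sunny": 0.7}, {"warm": 0.8}, {"high": 0.4},
          {"strong": 0.6}, {"warm": 0.7}, {"same": 0.8}],
    "n": [{"sunny": 0.3}, {"warm": 0.4}, {"high": 0.7},
          {"strong": 0.5}, {"warm": 0.4}, {"same": 0.5}],
}
label, scores = naive_bayes_classify(instance, prior, cond)
print(label, scores)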

21
Summary
  • Bayes Theorem
  • MAP, ML hypotheses
  • Naïve Bayes classifier