Handling Uncertainty - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Handling Uncertainty

1
Handling Uncertainty
2
Uncertain knowledge
  • Typical example: diagnosis. Consider:
  • ∀x Symptom(x, Toothache) ⇒ Disease(x, Cavity).
  • The problem is that this rule is wrong. Not all
    patients with toothache have cavities; some of
    them have gum disease, an abscess, etc.:
  • ∀x Symptom(x, Toothache) ⇒ Disease(x, Cavity) ∨
    Disease(x, GumDisease) ∨ Disease(x, Abscess) ∨
    ...
  • Unfortunately, in order to make the rule true, we
    have to add an almost unlimited list of possible
    causes.
  • We could try turning the rule into a causal rule:
  • ∀x Disease(x, Cavity) ⇒ Symptom(x, Toothache).
  • But this rule isn't right either: not all
    cavities cause pain. To make it logically
    exhaustive, we would have to augment the left
    side with all the qualifications required for a
    cavity to cause a toothache.

3
So, using FOL fails
  • In a domain like medical diagnosis, using FOL
    fails because of
  • Laziness
  • Too much work to list the complete set of
    antecedents or consequents to ensure an
    exceptionless rule.
  • Too hard to use such rules.
  • Theoretical Ignorance
  • Medical science has no complete theory for the
    domain.
  • Practical Ignorance
  • Even if we know all the rules, we might be
    uncertain about a particular patient because not
    all the necessary tests have been done.

4
Belief and Probability
  • The connection between toothaches and cavities is
    not a logical consequence in either direction.
  • However, we can provide a degree of belief on the
    sentences. Our main tool for this is probability
    theory.
  • E.g., we might not know for sure what afflicts a
    particular patient, but we believe that there is,
    say, an 80% chance (that is, probability 0.8)
    that the patient has a cavity if he has a
    toothache.
  • We usually get this belief from statistical data.
  • Assigning probability 0 to a sentence corresponds
    to an unequivocal belief that the sentence is
    false.
  • Assigning probability 1 to a sentence corresponds
    to an unequivocal belief that the sentence is
    true.

5
Syntax
  • Basic element: random variable
  • Similar to propositional logic: possible worlds
    defined by assignment of values to random
    variables.
  • Boolean random variables
  • e.g., Cavity (do I have a cavity?)
  • Discrete random variables
  • e.g., Weather is one of ⟨sunny, rainy, cloudy, snow⟩
  • Domain values must be exhaustive and mutually
    exclusive
  • Elementary propositions are constructed by
    assignment of a value to a random variable:
    e.g., Weather = sunny, Cavity = false
    (abbreviated as ¬cavity)
  • Complex propositions are formed from elementary
    propositions and standard logical connectives:
    e.g., Weather = sunny ∨ Cavity = false

6
Atomic events
  • Atomic event: a complete specification of the
    state of the world about which the agent is
    uncertain
  • E.g., if the world consists of only two Boolean
    variables Cavity and Toothache, then there are 4
    distinct atomic events:
  • Cavity = false ∧ Toothache = false
  • Cavity = false ∧ Toothache = true
  • Cavity = true ∧ Toothache = false
  • Cavity = true ∧ Toothache = true
  • Atomic events are mutually exclusive and
    exhaustive

7
Axioms of probability
  • For any propositions A, B:
  • 0 ≤ P(A) ≤ 1
  • Necessarily true (i.e. valid) propositions have
    probability 1, and necessarily false (i.e.
    unsatisfiable) propositions have probability 0.
  • P(true) = 1 and P(false) = 0
  • P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
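
The axioms can be checked numerically. Below is a minimal Python sketch (the four atomic-event probabilities are illustrative, not taken from the slides) verifying the third axiom on a toy world of two Boolean propositions A and B:

  # Toy world over two Boolean propositions A and B; the atomic-event
  # probabilities are illustrative and only need to be non-negative and sum to 1.
  world = {
      (True, True): 0.20,    # A and B
      (True, False): 0.30,   # A and not B
      (False, True): 0.10,   # not A and B
      (False, False): 0.40,  # not A and not B
  }

  p_a = sum(p for (a, b), p in world.items() if a)
  p_b = sum(p for (a, b), p in world.items() if b)
  p_a_and_b = world[(True, True)]
  p_a_or_b = sum(p for (a, b), p in world.items() if a or b)

  assert abs(sum(world.values()) - 1.0) < 1e-9          # P(true) = 1
  assert all(0 <= p <= 1 for p in world.values())        # 0 <= P(A) <= 1
  assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9  # P(A or B) = P(A) + P(B) - P(A and B)
  print(p_a, p_b, p_a_or_b)                              # approx 0.5 0.3 0.6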

8
Using the axioms of probability
  • We can derive a variety of useful facts from
    basic axioms. E.g.
  • P(a ∨ ¬a) = P(a) + P(¬a) - P(a ∧ ¬a) (by axiom 3)
  • P(true) = P(a) + P(¬a) - P(a ∧ ¬a) (by logical
    equivalence)
  • 1 = P(a) + P(¬a) (by axiom 2)
  • P(¬a) = 1 - P(a) (by algebra)
  • Also, we can prove that for a discrete variable D
    with domain ⟨d1, ..., dn⟩ we have
  • Σ_{i=1}^{n} P(D = di) = 1.

9
Prior probability and distribution
  • Prior or unconditional probability associated
    with a proposition is the degree of belief
    accorded to it in the absence of any other
    information.
  • e.g., P(Cavity = true) = 0.1 (or P(cavity) =
    0.1)
  • P(Weather = sunny) = 0.7 (or P(sunny) =
    0.7)
  • Probability distribution gives values for all
    possible assignments:
  • P(Weather = sunny) = 0.7
  • P(Weather = rain) = 0.2
  • P(Weather = cloudy) = 0.08
  • P(Weather = snow) = 0.02
  • As a shorthand we can use vector notation:
  • P(Weather) = ⟨0.7, 0.2, 0.08, 0.02⟩ (they sum
    up to 1)
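
A quick Python sketch of the distribution above as a plain dictionary; the assertion mirrors the remark that the values sum to 1:

  # The prior distribution P(Weather) from the slide.
  P_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

  # Domain values are exhaustive and mutually exclusive, so the values sum to 1.
  assert abs(sum(P_weather.values()) - 1.0) < 1e-9

  # An unconditional (prior) probability is just a lookup.
  print(P_weather["sunny"])  # 0.7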

10
Joint probability
  • Joint probability distribution for a set of
    random variables gives the probability of every
    atomic event on those random variables.
  • E.g. for the two random variables Weather and
    Cavity we have
  • P(Weather, Cavity), which is a 4 × 2 matrix of
    values:
  •   Weather:        sunny  rainy  cloudy  snow
  •   Cavity = true   0.144  0.02   0.016   0.02
  •   Cavity = false  0.576  0.08   0.064   0.08
  • We can consider the joint probability
    distribution of all the variables we use to
    describe the world. Such a joint distribution
    is called the full joint probability
    distribution.
  • A full joint distribution specifies the
    probability of every atomic event.
  • Any probabilistic question about a domain can be
    answered by the full joint distribution.
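
Below is a small Python sketch of the P(Weather, Cavity) table above, keyed by (weather, cavity); summing the relevant entries answers marginal queries:

  # Full joint distribution P(Weather, Cavity) from the slide,
  # keyed by (weather value, cavity value).
  joint = {
      ("sunny", True): 0.144, ("rainy", True): 0.02,
      ("cloudy", True): 0.016, ("snow", True): 0.02,
      ("sunny", False): 0.576, ("rainy", False): 0.08,
      ("cloudy", False): 0.064, ("snow", False): 0.08,
  }

  # A full joint assigns a probability to every atomic event, so its entries sum to 1.
  assert abs(sum(joint.values()) - 1.0) < 1e-9

  # Any probabilistic question is answered by summing atomic events,
  # e.g. marginalising out Cavity:
  p_sunny = sum(p for (w, c), p in joint.items() if w == "sunny")
  print(p_sunny)  # approx 0.72 with the numbers above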

11
Conditional probability
  • Conditional or posterior probabilities
  • e.g., P(cavity | toothache) = 0.8
  • i.e., given that toothache is all I know
  • Notation for conditional distributions:
  • P(Cavity | Toothache) is a 2-element vector of
    2-element vectors

12
Conditional probability
  • Definition of conditional probability
  • P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
  • Product rule gives an alternative formulation:
  • P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
  • A general version holds for whole distributions,
    e.g.,
  • P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
  • i.e. shorthand for
  • P(sunny ∧ cavity) = P(sunny | cavity) P(cavity)
  • P(rainy ∧ cavity) = P(rainy | cavity) P(cavity)
  • (View it as a set of 4 × 2 equations, not matrix
    multiplication)
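
A minimal sketch of the definition in Python, reusing the P(Weather, Cavity) table from the joint-probability slide (the particular query is my own choice):

  # P(a | b) = P(a ∧ b) / P(b), computed from the P(Weather, Cavity) table.
  joint = {
      ("sunny", True): 0.144, ("rainy", True): 0.02,
      ("cloudy", True): 0.016, ("snow", True): 0.02,
      ("sunny", False): 0.576, ("rainy", False): 0.08,
      ("cloudy", False): 0.064, ("snow", False): 0.08,
  }

  p_cavity_and_sunny = joint[("sunny", True)]
  p_sunny = sum(p for (w, c), p in joint.items() if w == "sunny")

  p_cavity_given_sunny = p_cavity_and_sunny / p_sunny
  print(p_cavity_given_sunny)  # 0.144 / 0.72, approx 0.2

  # Product rule, rearranged: P(a ∧ b) = P(a | b) P(b)
  assert abs(p_cavity_given_sunny * p_sunny - p_cavity_and_sunny) < 1e-12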

13
Chain rule
  • Chain rule is derived by successive application
    of the product rule:
  • P(X1, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2)
    ... P(Xn | X1, ..., Xn-1)
    = Π_{i=1..n} P(Xi | X1, ..., Xi-1)

14
Inference by enumeration
  • Start with the joint probability distribution
  • For any proposition φ, sum up the probabilities
    of the atomic events ω in which it is true:
  • P(φ) = Σ_{ω ⊨ φ} P(ω)
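
A sketch of this summation in Python. The slides' own full joint table (shown on the following slides as a figure) was not transcribed, so the numbers below are the illustrative Toothache/Catch/Cavity table commonly used with this example:

  # P(phi) = sum of P(omega) over the atomic events omega where phi holds.
  # Keys are (toothache, catch, cavity).
  full_joint = {
      (True,  True,  True):  0.108, (True,  False, True):  0.012,
      (False, True,  True):  0.072, (False, False, True):  0.008,
      (True,  True,  False): 0.016, (True,  False, False): 0.064,
      (False, True,  False): 0.144, (False, False, False): 0.576,
  }

  def prob(phi):
      """Sum the probabilities of the atomic events in which proposition phi is true."""
      return sum(p for event, p in full_joint.items() if phi(*event))

  print(prob(lambda toothache, catch, cavity: toothache))            # P(toothache), approx 0.2
  print(prob(lambda toothache, catch, cavity: cavity or toothache))  # P(cavity or toothache), approx 0.28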

15
Inference by enumeration
16
Inference by enumeration
17
Inference by enumeration
  • Can also compute conditional probabilities

18
Normalization
  • Denominator can be viewed as a normalization
    constant a and we can write in vector notation
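
A short sketch of the normalization trick, using the same illustrative Toothache/Catch/Cavity joint as in the enumeration sketch above; α is obtained by rescaling the unnormalized vector so it sums to 1:

  # Compute P(Cavity | toothache) without computing P(toothache) first:
  # take the unnormalised vector and rescale it so its entries sum to 1.
  full_joint = {  # keys: (toothache, catch, cavity)
      (True,  True,  True):  0.108, (True,  False, True):  0.012,
      (False, True,  True):  0.072, (False, False, True):  0.008,
      (True,  True,  False): 0.016, (True,  False, False): 0.064,
      (False, True,  False): 0.144, (False, False, False): 0.576,
  }

  # Unnormalised vector <P(cavity, toothache), P(not cavity, toothache)>
  unnormalised = [
      sum(p for (t, c, cav), p in full_joint.items() if t and cav),
      sum(p for (t, c, cav), p in full_joint.items() if t and not cav),
  ]
  alpha = 1.0 / sum(unnormalised)           # alpha = 1 / P(toothache)
  print([alpha * x for x in unnormalised])  # approx [0.6, 0.4]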

19
Inference by enumeration, contd
  • Typically, we are interested in
  • the posterior joint distribution of the query
    variables Y
  • given specific values e for the evidence
    variables E
  • Let the hidden variables be H = X - Y - E
  • Then the required summation of joint entries is
    done by summing out the hidden variables:
  • P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
  • The terms in the summation are joint entries
    because Y, E and H together exhaust the set of
    random variables
  • Obvious problems:
  • Worst-case time complexity O(d^n), where d is the
    largest arity
  • Space complexity O(d^n) to store the joint
    distribution
  • How to find the numbers for the O(d^n) entries?
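
A sketch of the whole procedure in Python (function and variable names are my own, and the joint is the same illustrative table as above); the query variable is Cavity, the evidence is Toothache = true, and Catch is the hidden variable that gets summed out:

  from itertools import product

  VARS = ("Toothache", "Catch", "Cavity")
  full_joint = {  # keyed by (toothache, catch, cavity)
      (True,  True,  True):  0.108, (True,  False, True):  0.012,
      (False, True,  True):  0.072, (False, False, True):  0.008,
      (True,  True,  False): 0.016, (True,  False, False): 0.064,
      (False, True,  False): 0.144, (False, False, False): 0.576,
  }

  def enumerate_ask(query_var, evidence):
      """P(query_var | evidence) = alpha * sum over the hidden variables of joint entries."""
      dist = {}
      for y in (True, False):                        # each value of the query variable
          total = 0.0
          for event in product((True, False), repeat=len(VARS)):
              assignment = dict(zip(VARS, event))
              if assignment[query_var] == y and all(
                      assignment[v] == val for v, val in evidence.items()):
                  total += full_joint[event]         # hidden variables are summed out here
          dist[y] = total
      alpha = 1.0 / sum(dist.values())               # normalisation constant
      return {y: alpha * p for y, p in dist.items()}

  print(enumerate_ask("Cavity", {"Toothache": True}))  # approx {True: 0.6, False: 0.4}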

20
Independence
  • Let's add a fourth variable, Weather.
  • The full joint probability distribution
  • P(Toothache, Catch, Cavity, Weather)
  • has 32 entries (because Weather has 4 values)!!
  • It contains 4 editions of the previous table, one
    for each kind of weather.
  • Naturally, we ask what relationship these
    editions have to each other and to the original
    table?
  • E.g. how are
  • P(toothache, catch, cavity, cloudy) and
  • P(toothache, catch, cavity) related? Let's use
    the product rule:
  • P(toothache, catch, cavity, cloudy) =
    P(cloudy | toothache, catch, cavity)
    P(toothache, catch, cavity)
  • Of course, one's dental problems don't influence
    the weather, so
  • P(cloudy | toothache, catch, cavity) = P(cloudy)

21
Independence (contd)
  • So, we can write
  • P(toothache, catch, cavity, cloudy) =
    P(cloudy | toothache, catch, cavity)
    P(toothache, catch, cavity)
  • = P(cloudy) P(toothache, catch, cavity)
  • Thus, the 32-element table for four variables can
    be constructed from one 8-element table and one
    4-element table!!
  • This property is called independence.
  • A and B are independent iff
  • P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) =
    P(A) P(B)
  • Absolute independence is powerful but rare.
  • Dentistry is a large field with hundreds of
    variables, none of which are independent.

22
Bayes' Rule
  • Product rule:
  • P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
  • Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
  • or in vector form:
  • P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
  • Useful for assessing diagnostic probability from
    causal probability:
  • P(Cause | Effect) = P(Effect | Cause) P(Cause) /
    P(Effect)

23
Applying Bayes rule
  • Bayes' rule is useful in practice because there
    are many cases where we do have good probability
    estimates for three of the numbers and need to
    compute the fourth.
  • For example:
  • A doctor knows that meningitis causes the
    patient to have a stiff neck 50% of the time.
  • The doctor also knows some unconditional facts:
  • the prior probability that a patient has
    meningitis is 1/50,000, and
  • the prior probability that any patient has a
    stiff neck is 1/20.

24
Bayes rule (contd)
  • Let s be the proposition that the patient has a
    stiff neck, and
  • m be the proposition that the patient has
    meningitis.
  • P(s | m) = 0.5
  • P(m) = 1/50000
  • P(s) = 1/20
  • P(m | s) = P(s | m) P(m) / P(s) =
    (0.5 × 1/50000) / (1/20) = 0.0002
  • That is, we expect only 1 in 5000 patients with a
    stiff neck to have meningitis.
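
The slide's arithmetic, checked in a few lines of Python:

  # Bayes' rule with the numbers from the slide: P(m | s) = P(s | m) P(m) / P(s).
  p_s_given_m = 0.5      # meningitis causes a stiff neck 50% of the time
  p_m = 1 / 50000        # prior probability of meningitis
  p_s = 1 / 20           # prior probability of a stiff neck

  p_m_given_s = p_s_given_m * p_m / p_s
  print(p_m_given_s)     # approx 0.0002, i.e. 1 in 5000 stiff-neck patients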

25
Bayes rule (contd)
  • Well, we might say that doctors know that a stiff
    neck implies meningitis in 1 out of 5000 cases.
  • That is, the doctor has quantitative information
    in the diagnostic direction, from symptoms
    (effects) to causes.
  • Such a doctor has no need for Bayes' rule?!
  • Unfortunately, diagnostic knowledge is more
    fragile than causal knowledge.
  • Imagine there is a sudden epidemic of meningitis.
    The prior probability, P(m), will go up.
  • The doctor who derives the diagnostic probability
    P(m | s) from his statistical observations of
    patients before the epidemic will have no idea
    how to update the value.
  • The doctor who derives the diagnostic
    probability P(m | s) from the other three values
    will see that P(m | s) goes up proportionally
    with P(m).
  • Clearly, P(s | m) is unaffected by the epidemic;
    it simply reflects the way meningitis works.

26
Difficulty with more than two vars
27
Conditional independence
  • P(Toothache, Cavity, Catch) has 2³ = 8
    entries
  • If I have a cavity, the probability that the
    probe catches in it doesn't depend on whether I
    have a toothache:
  • (1) P(catch | toothache, cavity) = P(catch |
    cavity)
  • The same independence holds if I haven't got a
    cavity:
  • (2) P(catch | toothache, ¬cavity) = P(catch |
    ¬cavity)
  • Catch is conditionally independent of Toothache
    given Cavity:
  • P(Catch | Toothache, Cavity) = P(Catch | Cavity)
  • Equivalent statements:
  • P(Toothache | Catch, Cavity) = P(Toothache |
    Cavity)
  • P(Toothache, Catch | Cavity) = P(Toothache |
    Cavity) P(Catch | Cavity)

28
Conditional independence contd.
  • Write out full joint distribution using chain
    rule
  • P(Toothache, Catch, Cavity)
  • = P(Toothache | Catch, Cavity) P(Catch, Cavity)
  • = P(Toothache | Catch, Cavity) P(Catch | Cavity)
    P(Cavity)
  • = P(Toothache | Cavity) P(Catch | Cavity)
    P(Cavity)
  • In most cases, the use of conditional
    independence reduces the size of the
    representation of the joint distribution from
    exponential in n to linear in n.
  • Conditional independence is our most basic and
    robust form of knowledge about uncertain
    environments.
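
A sketch of the factored representation in Python. The three small tables below are illustrative (chosen to be consistent with the Toothache/Catch/Cavity numbers used in the earlier sketches); together they rebuild any entry of the full joint:

  # P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
  p_cavity = {True: 0.2, False: 0.8}
  p_toothache_given_cavity = {True: 0.6, False: 0.1}   # P(toothache | Cavity = key)
  p_catch_given_cavity = {True: 0.9, False: 0.2}       # P(catch | Cavity = key)

  def joint(toothache, catch, cavity):
      """Rebuild one entry of the full joint from the factored form."""
      pt = p_toothache_given_cavity[cavity] if toothache else 1 - p_toothache_given_cavity[cavity]
      pc = p_catch_given_cavity[cavity] if catch else 1 - p_catch_given_cavity[cavity]
      return pt * pc * p_cavity[cavity]

  print(joint(True, True, True))     # approx 0.108 (= 0.6 * 0.9 * 0.2)
  print(joint(False, False, False))  # approx 0.576 (= 0.9 * 0.8 * 0.8)

Five small numbers stand in for the whole 2³ table, which is the "exponential to linear" saving described on the slide.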

29
In general
30
Bayes' Rule and conditional independence
  • P(Cavity | toothache ∧ catch)
  • = α P(toothache ∧ catch | Cavity) P(Cavity)
  • = α P(toothache | Cavity) P(catch | Cavity)
    P(Cavity)
  • This is an example of a naïve Bayes model:
  • P(Cause, Effect_1, ..., Effect_n) = P(Cause)
    Π_i P(Effect_i | Cause)
  • Total number of parameters is linear in n
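
A minimal numeric sketch of the same combination (the numbers are the illustrative ones from the previous factored-representation sketch):

  # P(Cavity | toothache ∧ catch) = alpha P(toothache | Cavity) P(catch | Cavity) P(Cavity)
  p_cavity = {True: 0.2, False: 0.8}
  p_toothache_given_cavity = {True: 0.6, False: 0.1}
  p_catch_given_cavity = {True: 0.9, False: 0.2}

  unnormalised = {c: p_toothache_given_cavity[c] * p_catch_given_cavity[c] * p_cavity[c]
                  for c in (True, False)}
  alpha = 1.0 / sum(unnormalised.values())
  print({c: alpha * p for c, p in unnormalised.items()})  # approx {True: 0.871, False: 0.129}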

31
Athens Example
  • Suppose you are a witness to a nighttime
    hit-and-run accident involving a taxi in Athens.
  • All taxis in Athens are blue or green.
  • You swear, under oath, that the taxi was blue.
  • Extensive testing shows that, under the dim
    lighting conditions, discrimination between blue
    and green is 75% reliable.
  • 9 out of 10 Athenian taxis are green.
  • What is the most likely color for the taxi?
  • Hint: distinguish carefully between the
    proposition that the taxi is blue and the
    proposition that the taxi appears blue.

32
Athens Example (contd)
  • Two random variables:
  • B: the taxi was blue, with domain {b, ¬b}
  • LB: the taxi looked blue, with domain {lb, ¬lb}
  • The information on the reliability of color
    identification can be written as
  • P(lb | b) = 0.75 and P(¬lb | ¬b) = 0.75
  • We need to know the probability that the taxi was
    blue, given that it looked blue.
  • Then, we need to know the probability that the
    taxi wasn't blue, given that it looked blue.
    Let's use Bayes' rule:
  • P(b | lb) = α P(lb | b) P(b) = α × 0.75 × 0.1 =
    α × 0.075
  • P(¬b | lb) = α P(lb | ¬b) P(¬b)
  • = α (1 - P(¬lb | ¬b)) (1 - P(b))
  • = α (1 - 0.75) (1 - 0.1) = α × 0.25 × 0.9 =
    α × 0.225
  • Hence, P(B | lb) = ⟨α 0.075, α 0.225⟩ =
    ⟨0.25, 0.75⟩. So, even though the taxi looked
    blue to the witness, it is more probable that
    the taxi was green.
  • α = 1/P(lb) = 1/( P(b, lb) + P(¬b, lb) ) =
    1/(0.075 + 0.225)
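
The same calculation in a few lines of Python:

  # P(B | lb) = alpha <P(lb | b) P(b), P(lb | not b) P(not b)>
  p_b = 0.1                     # 9 out of 10 taxis are green, so P(blue) = 0.1
  p_lb_given_b = 0.75           # blue taxis look blue 75% of the time
  p_lb_given_not_b = 1 - 0.75   # green taxis look blue 25% of the time

  unnormalised = [p_lb_given_b * p_b, p_lb_given_not_b * (1 - p_b)]  # [0.075, 0.225]
  alpha = 1.0 / sum(unnormalised)
  print([alpha * x for x in unnormalised])  # approx [0.25, 0.75]: green is more likely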

33
Text Categorization
  • Text categorization is the task of assigning a
    given document to one of a fixed set of
    categories, on the basis of the text it contains.
  • Naïve Bayes models are often used for this task.
  • In these models, the query variable is the
    document category, and the effect variables are
    the presence or absence of each word in the
    language.
  • How can such a model be constructed, given as
    training data a set of documents that have been
    assigned to categories?
  • The model consists of the prior probability
    P(Category) and the conditional probabilities
    P(Word_i | Category).
  • For each category c, P(Category = c) is estimated
    as the fraction of all the training documents
    that are of that category.
  • Similarly, P(Word_i = true | Category = c) is
    estimated as the fraction of documents of
    category c that contain word i.
  • Also, P(Word_i = true | Category = ¬c) is
    estimated as the fraction of documents not of
    category c that contain word i.

34
Text Categorization (contd)
  • Now we can use naïve Bayes for each c:
  • P(Category = c | Word_1 = true, ..., Word_n = true)
  • = α P(Category = c) Π_{i=1..n} P(Word_i = true |
    Category = c)
  • P(Category = ¬c | Word_1 = true, ..., Word_n = true)
  • = α P(Category = ¬c) Π_{i=1..n} P(Word_i = true |
    Category = ¬c)
  • where α is the normalization constant.
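
To make the recipe concrete, here is a toy Python sketch. The two tiny training sets and the add-one smoothing are my own illustrative choices (the slides do not specify training data or smoothing):

  # Toy Bernoulli naive Bayes text categoriser for a single category c vs not-c.
  docs_in_c = [{"cheap", "pills", "buy"}, {"buy", "now", "cheap"}]     # labelled c
  docs_not_c = [{"meeting", "agenda", "notes"}, {"project", "notes"}]  # labelled not-c

  vocab = set().union(*docs_in_c, *docs_not_c)
  n_c, n_not_c = len(docs_in_c), len(docs_not_c)
  p_c = n_c / (n_c + n_not_c)                     # prior P(Category = c)

  def word_prob(word, docs, n):
      # Fraction of documents containing the word, with add-one smoothing so an
      # unseen word does not zero out the whole product (my assumption, not the slides').
      return (sum(word in d for d in docs) + 1) / (n + 2)

  def classify(doc):
      """Return the normalised P(Category = c | words) and P(Category = not-c | words)."""
      s_c, s_not_c = p_c, 1 - p_c
      for w in vocab:
          present = w in doc
          pc = word_prob(w, docs_in_c, n_c)
          pn = word_prob(w, docs_not_c, n_not_c)
          s_c *= pc if present else 1 - pc
          s_not_c *= pn if present else 1 - pn
      alpha = 1.0 / (s_c + s_not_c)               # normalisation constant
      return alpha * s_c, alpha * s_not_c

  print(classify({"buy", "cheap", "pills"}))      # category c is far more probable
  print(classify({"meeting", "notes"}))           # not-c is far more probable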