Title: Introduction to Reasoning under Uncertainty


1
Introduction to Reasoning under Uncertainty
MINDLab. Seminars on Reasoning and Planning under
Uncertainty
  • Ugur Kuter
  • MIND Lab.
  • 8400 Baltimore Avenue, Ste. 200
  • College Park, Maryland, 20742
  • Web site for the seminars: http://www.cs.umd.edu/users/ukuter/uncertainty/

2
From Classical Logic to Uncertainty
  • Logical reasoning is the process of deriving previously unknown facts from known ones
  • However, it is not always possible to have access to the entire set of facts needed for the reasoning
  • What we lose under uncertainty:
  • Deduction (reasoning in stages)
  • Incrementality
  • Modality: locality, detachment

3
Existing Approaches
  • Semantic Models for Uncertainty
  • Rule-based systems (e.g., expert systems)
  • Organize knowledge as if-then rules, each associated with a numeric certainty measure
  • Suffer from most of the problems of classical logic under uncertainty
  • Declarative systems
  • Organize knowledge in terms of likelihood, relevance, and causation among events
  • Handle uncertainty flexibly

4
Basic Terminology
  • Random variables describe features of the world whose status may or may not be known
  • Propositions are binary-valued (i.e., Boolean) random variables
  • We will assume a finite set of propositional symbols, denoted A, B, …
  • Sometimes, we will use English sentences to denote propositions, e.g.,
  • Without loss of generality, we will assume that the world can be described by a finite set of propositions
  • An event in the world is an occurrence that is modeled by assigning truth values to one or more propositions
  • e.g., "the outcomes of rolling two dice are the same"

5
Probability Theory
  • Probabilities as a way of representing the structure of knowledge and of reasoning over that knowledge
  • Notation: $P(A)$ is the probability that proposition A is true in the knowledge base (equivalently, the probability that event A occurs in the world)
  • The Axioms of Probability Theory
  • $0 \le P(A) \le 1$
  • $P(\text{certain event}) = 1$
  • $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

6
Joint Probability Distributions
  • A joint event describes two occurrences at the
    same time
  • e.g., (A and B) specifies that both propositions A and B are true in the world
  • A joint probability distribution over a set of random variables specifies a probability for each possible combination of values of those variables
  • e.g., a joint probability distribution for Boolean variables X and Y specifies a probability for four cases:
  • $(X \text{ and } Y)$, $(X \text{ and } \neg Y)$, $(\neg X \text{ and } Y)$, and $(\neg X \text{ and } \neg Y)$
  • The joint probabilities of all cases must sum to 1
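A joint distribution over two Boolean variables can be sketched as a table keyed by the four truth-value combinations. The probabilities below are hypothetical, chosen only for illustration; `Fraction` keeps the arithmetic exact so the sum-to-1 check is not disturbed by rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over Boolean variables X and Y:
# one probability for each of the four combinations of truth values.
joint = {
    (True, True): Fraction(3, 10),    # X and Y
    (True, False): Fraction(2, 10),   # X and not Y
    (False, True): Fraction(1, 10),   # not X and Y
    (False, False): Fraction(4, 10),  # not X and not Y
}

# Every combination must be covered, and the probabilities must sum to 1.
assert set(joint) == set(product([True, False], repeat=2))
total = sum(joint.values())
print(total)  # 1
```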

7
Absolute Independence
  • Suppose two events in a joint probability distribution are independent of each other (note that independence is not mutual exclusivity: two mutually exclusive events with nonzero probabilities are always dependent)
  • Then we can decompose the joint probability distribution into smaller distributions
  • The joint probability of two events under absolute independence:
  • $P(A \text{ and } B) = P(A)\,P(B)$
  • However, in most problems, absolute independence does not hold
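The factorization test above can be sketched on two fair dice, assuming all 36 outcomes are equally likely; the event names are illustrative, not from the slides. The last check shows that mutually exclusive events with nonzero probability fail the test.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
worlds = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (die1, die2)."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

def independent(a, b):
    """Absolute independence test: P(a and b) == P(a) * P(b)."""
    return prob(lambda w: a(w) and b(w)) == prob(a) * prob(b)

first_even = lambda w: w[0] % 2 == 0
first_odd = lambda w: w[0] % 2 == 1
second_six = lambda w: w[1] == 6
sum_at_least_10 = lambda w: w[0] + w[1] >= 10

print(independent(first_even, second_six))       # True
print(independent(first_even, sum_at_least_10))  # False
print(independent(first_even, first_odd))        # False: mutually exclusive, so dependent
```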

8
Set-theoretic Interpretation of Probabilities
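$W$: the set of all possible worlds described by the knowledge base
$W_A$: the set of possible worlds in which A is true
  • The probability of A being true is the proportion of $W_A$ to $W$ (e.g., $|W_A| / |W|$ when all worlds are equally likely)
  • Set-theoretic operations correspond to logical connectives, i.e.,
  • $A \text{ and } B \leftrightarrow W_A \cap W_B$
  • $A \text{ or } B \leftrightarrow W_A \cup W_B$
  • $\neg A \leftrightarrow W - W_A$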
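A minimal sketch of this interpretation, assuming two fair dice with equally likely worlds: each proposition is represented by its set of worlds, the connectives become Python set operations, and the third axiom can be checked directly. The choice of A and B is illustrative only.

```python
from fractions import Fraction
from itertools import product

# W: all possible worlds (two fair dice, assumed equally likely).
W = set(product(range(1, 7), repeat=2))

# Worlds in which each proposition holds (illustrative choices).
W_A = {w for w in W if w[0] == w[1]}       # A: the two dice show the same value
W_B = {w for w in W if w[0] + w[1] == 8}   # B: the dice sum to 8 (hypothetical)

def P(worlds):
    """Proportion of a set of worlds to W."""
    return Fraction(len(worlds), len(W))

p_and = P(W_A & W_B)  # A and B  <->  W_A intersect W_B
p_or = P(W_A | W_B)   # A or B   <->  W_A union W_B
p_not_a = P(W - W_A)  # not A    <->  W minus W_A

# The third axiom (inclusion-exclusion) holds in this model.
assert p_or == P(W_A) + P(W_B) - p_and
print(P(W_A), p_and, p_or, p_not_a)  # 1/6 1/36 5/18 5/6
```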

9
Revisiting the Third Axiom
  • $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
  • that is, the proportion of $W_A \cup W_B$ to $W$
  • If A and B are mutually exclusive events (i.e., $W_A \cap W_B = \emptyset$), then we have
  • $P(A \text{ or } B) = P(A) + P(B)$

[Figure: the sets $W_A$ and $W_B$]
10
Using the Axioms of Probability
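  • Rule via the union of joint events:
  • $P(A) = P(A \text{ and } B) + P(A \text{ and } \neg B)$,
  • since
  • $A \text{ and } B \leftrightarrow W_A \cap W_B$
  • $A \text{ and } \neg B \leftrightarrow W_A - W_B$
  • and these two sets are disjoint, with union $W_A$
  • More generally,
  • $P(A) = \sum_i P(A \text{ and } B_i)$,
  • where $B_1, B_2, \ldots, B_k$ is an exhaustive and mutually exclusive set of events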
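Both forms of the rule can be checked on the two-dice model, assuming equally likely outcomes. The events A and B below are illustrative; $B_i$ ("the first die shows i") is one exhaustive, mutually exclusive family.

```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))  # two fair dice, equally likely

def prob(event):
    """Proportion of worlds in which the predicate `event` holds."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

A = lambda w: w[0] + w[1] == 7   # illustrative event A: the dice sum to 7
B = lambda w: w[0] % 2 == 0      # illustrative event B: the first die is even

# Two-part rule: P(A) = P(A and B) + P(A and not B).
two_part = prob(lambda w: A(w) and B(w)) + prob(lambda w: A(w) and not B(w))

# General rule with B_i = "the first die shows i" (exhaustive, mutually exclusive).
terms = [prob(lambda w, i=i: A(w) and w[0] == i) for i in range(1, 7)]

print(prob(A), two_part, sum(terms))  # 1/6 1/6 1/6
```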

11
Using the Axioms of Probability
  • Rule for absolute falsity
  • $P(A) = P(A \text{ and } A) + P(A \text{ and } \neg A)$, by the rule of the union of joint events
  • $P(A) = P(A) + P(\text{false})$, by logical equivalence
  • $P(\text{false}) = 0$, by algebra
  • Rule for negation
  • $P(A \text{ or } \neg A) = P(A) + P(\neg A) - P(A \text{ and } \neg A)$, by axiom 3
  • $P(\text{true}) = P(A) + P(\neg A) - P(\text{false})$, by logical equivalence
  • $1 = P(A) + P(\neg A) - 0$, by axiom 2
  • $P(A) = 1 - P(\neg A)$, by algebra

12
Conditional Probabilities
  • A conditional probability, $P(A \mid B)$, describes the belief in the event A under the assumption that another event B is known with absolute certainty
  • Formal definition (for $P(B) > 0$):
  • $P(A \mid B) = P(A \text{ and } B) / P(B)$
  • That is, the proportion of $W_A \cap W_B$ to $W_B$
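The set-theoretic reading of the definition can be sketched as follows, again assuming two fair dice with equally likely worlds; A and B are illustrative events, and the function raises an error when $P(B) = 0$, where the definition does not apply.

```python
from fractions import Fraction
from itertools import product

W = set(product(range(1, 7), repeat=2))   # two fair dice, equally likely worlds
W_A = {w for w in W if w[0] + w[1] == 8}  # A: the dice sum to 8 (illustrative)
W_B = {w for w in W if w[0] % 2 == 0}     # B: the first die is even (illustrative)

def P(worlds):
    """Proportion of a set of worlds to W."""
    return Fraction(len(worlds), len(W))

def conditional(wa, wb):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) == 0."""
    if not wb:
        raise ValueError("P(B) must be positive")
    return P(wa & wb) / P(wb)

# Same value as the proportion of W_A intersect W_B to W_B.
assert conditional(W_A, W_B) == Fraction(len(W_A & W_B), len(W_B))
print(conditional(W_A, W_B))  # 1/6
```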

13
The Bayesians (1763 -- present)
  • The basis of the Bayesian Theory is conditional
    probabilities
  • Bayesian Theory sees a conditional probability as
    a way to describe the structure and organization
    of human knowledge
  • In this view, $A \mid B$ stands for the event A in the context of the event B
  • E.g., the symptom A in the context of a disease B

14
Mathematicians vs. Bayesians, i.e.,
  • $P(A \mid B) = P(A \text{ and } B) / P(B)$ vs. $P(A \text{ and } B) = P(A \mid B)\,P(B)$
  • Example
  • The probability of the event A = "the outcomes of the two dice are equal"
  • A mathematician would apply the rule for joint events to A and $B_i$, where
  • each $B_i$ is the event "the outcome of the first die is i"
  • $P(A) = \sum_i P(A \text{ and } B_i) = 6 \times (1/36) = 1/6$

15
Mathematicians vs. Bayesians, contd
  • The Bayesian mindset for dice rolling
  • $P(\text{Equality}) = \sum_i P(\text{outcome of the second die is } i \mid B_i)\,P(B_i) = 6 \times (1/6)(1/6) = 1/6$
  • This form is more natural for the assumption-based mental processes of human reasoning, i.e., "given that I know"
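The two computations on these slides can be placed side by side in a short sketch, assuming two fair dice with equally likely outcomes. `prob` and `cond` are illustrative helpers; `cond` applies the formal definition of conditional probability.

```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))  # (first, second), equally likely

def prob(event):
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

def cond(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

equal = lambda w: w[0] == w[1]

# Mathematician: P(A) = sum_i P(A and B_i), with B_i = "first die shows i".
p_math = sum(prob(lambda w, i=i: equal(w) and w[0] == i) for i in range(1, 7))

# Bayesian: P(A) = sum_i P(second die shows i | B_i) P(B_i).
p_bayes = sum(
    cond(lambda w, i=i: w[1] == i, lambda w, i=i: w[0] == i) * prob(lambda w, i=i: w[0] == i)
    for i in range(1, 7)
)

print(p_math, p_bayes)  # 1/6 1/6
```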

16
The CHAIN Rule
  • The probability that a joint event $(A_1, \ldots, A_k)$ occurs can be computed via conditional probabilities:
  • $P(A_1, \ldots, A_k) = P(A_k \mid A_1, \ldots, A_{k-1})\,P(A_{k-1} \mid A_1, \ldots, A_{k-2}) \cdots P(A_2 \mid A_1)\,P(A_1)$
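The chain rule can be verified numerically on a small example. The sketch below assumes a hypothetical joint distribution over three Boolean variables (arbitrary positive weights, normalized) and rebuilds every joint probability from the conditional factors.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over Boolean variables A1, A2, A3
# (weights chosen arbitrarily, then normalized so they sum to 1).
weights = dict(zip(product([True, False], repeat=3), [1, 2, 3, 4, 5, 6, 7, 8]))
total = sum(weights.values())
joint = {k: Fraction(v, total) for k, v in weights.items()}

def marginal(prefix):
    """P(A1 = prefix[0], ..., Aj = prefix[j-1]), summing out the remaining variables."""
    return sum(p for k, p in joint.items() if k[:len(prefix)] == prefix)

def chain_rule(assignment):
    """P(A1..Ak) = P(Ak | A1..Ak-1) * ... * P(A2 | A1) * P(A1)."""
    result = Fraction(1)
    for j in range(1, len(assignment) + 1):
        # P(Aj | A1..Aj-1) = P(A1..Aj) / P(A1..Aj-1); the empty prefix has probability 1.
        result *= marginal(assignment[:j]) / marginal(assignment[:j - 1])
    return result

for assignment, p in joint.items():
    assert chain_rule(assignment) == p
print("chain rule reproduces all", len(joint), "joint probabilities")
```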

17
Evidential Reasoning: The Inversion Rule
  • Reasoning about hypotheses and the evidence that does or does not support them is the main use of Bayesian inference
  • $P(H \mid e)$: given that I know the evidence e, the probability that my hypothesis H is true
  • $P(H \mid e) = P(e \mid H)\,P(H) / P(e)$,
  • where $P(e \mid H)$ is the probability that evidence e will actually be observed in the world if the hypothesis H is true
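A minimal sketch of the inversion rule for a Boolean hypothesis, assuming $P(e)$ is obtained from the rule for joint events, $P(e) = P(e \mid H)P(H) + P(e \mid \neg H)P(\neg H)$. The numbers are hypothetical.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Inversion rule: P(H | e) = P(e | H) P(H) / P(e).

    P(e) is computed from the rule for joint events:
    P(e) = P(e | H) P(H) + P(e | not H) P(not H).
    """
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Hypothetical numbers: P(H) = 0.01, P(e | H) = 0.9, P(e | not H) = 0.05.
print(round(posterior(0.01, 0.9, 0.05), 4))
```

Even with strong evidence ($P(e \mid H) = 0.9$), the posterior stays far below 0.9 because the prior is small.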

18
Pooling of Evidence
  • Suppose the hypothesis H is that a patient has a
    particular disease
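  • Let $e_1, \ldots, e_k$ be the possible symptoms of that disease
  • If we observed some of the symptoms but not all of them, then the combined belief that the hypothesis is true can be computed as the posterior odds
  • $\frac{P(H \mid e_1, \ldots, e_k)}{P(\neg H \mid e_1, \ldots, e_k)} = \frac{P(e_1, \ldots, e_k \mid H)\,P(H)}{P(e_1, \ldots, e_k \mid \neg H)\,P(\neg H)}$
  • Specifying $P(e_1, \ldots, e_k \mid H)$ directly requires a number of conditional probabilities that is exponential in k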

19
Naïve Bayes
  • Use a conditional independence assumption
  • e.g., whether we observe a symptom depends only on whether the patient has the disease, not on the other symptoms
  • Then, the conditional probability $P(e_1, \ldots, e_k \mid H)$ can be computed as
  • $P(e_1, \ldots, e_k \mid H) = \prod_i P(e_i \mid H)$
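Combined with the posterior odds from the pooling slide, the naive Bayes factorization needs only k conditional probabilities per hypothesis. This sketch assumes the symptoms are conditionally independent given $\neg H$ as well as given H (needed for the denominator), and uses hypothetical numbers.

```python
from math import prod

def posterior_odds(prior_h, likelihoods_h, likelihoods_not_h):
    """O(H | e1..ek) = [P(H) prod_i P(ei | H)] / [P(not H) prod_i P(ei | not H)].

    Assumes conditional independence of the symptoms given H and given not H.
    """
    numerator = prior_h * prod(likelihoods_h)
    denominator = (1 - prior_h) * prod(likelihoods_not_h)
    return numerator / denominator

def odds_to_probability(odds):
    """Convert odds O = p / (1 - p) back into a probability p."""
    return odds / (1 + odds)

# Hypothetical numbers for two observed symptoms e1, e2.
odds = posterior_odds(0.1, [0.8, 0.7], [0.2, 0.3])
print(round(odds_to_probability(odds), 4))
```

Unobserved symptoms are simply left out of the two lists, which matches the "some but not all symptoms" case.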

20
Incremental Bayesian Updating
  • Let H be a hypothesis and $E = \{e_1, \ldots, e_k\}$ be the past data (evidence) observed for H
  • Suppose we observe a new piece of data (evidence) e
  • What is the probability that the hypothesis is true, given the past and the new evidence?
  • Computing this probability directly is inefficient, because it requires storing all the past evidence and combining it with the new evidence
  • Bayesian Theory allows us to reformulate the question as follows:
  • How do we update our belief in H given the new evidence and our past belief in H?

21
Incremental Bayesian Updating (contd)
  • Combining prior beliefs with new evidence
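  • $P(H \mid E \text{ and } e) = P(H \mid E)\,P(e \mid E, H) / P(e \mid E)$
  • Here $P(H \mid E \text{ and } e)$ is the updated belief in H, $P(H \mid E)$ is the prior belief in H, $P(e \mid E, H)$ is the probability that the new evidence will be observed given the old evidence and the hypothesis, and $P(e \mid E)$ is a normalization constant
  • Assuming conditional independence between the new evidence and the old evidence given the hypothesis (i.e., $P(e \mid E, H) = P(e \mid H)$), we get
  • $P(H \mid E \text{ and } e) \propto P(H \mid E)\,P(e \mid H)$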
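The incremental update can be sketched for a Boolean hypothesis, assuming the evidence items are conditionally independent given H and given $\neg H$; the likelihood values are hypothetical. Only the current belief is kept between steps, yet the result matches a batch computation over all the evidence.

```python
from math import prod

# Hypothetical stream of evidence: (P(e | H), P(e | not H)) for each item,
# assumed conditionally independent given H and given not H.
prior = 0.2
evidence = [(0.9, 0.3), (0.6, 0.5), (0.8, 0.1)]

def update(belief, p_e_h, p_e_not_h):
    """One step: new belief proportional to old belief * P(e | H), then normalized."""
    unnorm_h = belief * p_e_h
    unnorm_not_h = (1 - belief) * p_e_not_h
    return unnorm_h / (unnorm_h + unnorm_not_h)  # divide by the normalization constant

belief = prior
for p_e_h, p_e_not_h in evidence:
    belief = update(belief, p_e_h, p_e_not_h)  # only the current belief is stored

# Batch computation over all the evidence at once gives the same answer.
num = prior * prod(h for h, _ in evidence)
den = num + (1 - prior) * prod(n for _, n in evidence)
assert abs(belief - num / den) < 1e-12
print(round(belief, 4))
```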
22
Hierarchical Models
  • So far, we assumed that the evidence from the
    world is directly linked to the hypothesis we are
    evaluating
  • E.g., a disease and its symptoms
  • In many problems, this is not the case, e.g.,

[Figure: hypothesis H and evidence $e_1, e_2, \ldots, e_k$]
23
1. Cascading Inference
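  • What is the probability that a burglary (B) took place, given the two neighbors' testimonies (G and W)?
  • $P(B \mid G, W) \propto P(G, W \mid B)\,P(B)$
  • $\propto P(B)\,\big(P(G, W \mid B, S = \text{true})\,P(S = \text{true} \mid B) + P(G, W \mid B, S = \text{false})\,P(S = \text{false} \mid B)\big)$
  • From conditional independence (each testimony depends only on whether the alarm sound S occurred), we have
  • $P(G, W \mid B, S) = P(G \mid S)\,P(W \mid S)$ for $S \in \{\text{true}, \text{false}\}$
  • Thus, we have
  • $P(B \mid G, W) \propto P(B) \sum_{S \in \{\text{true}, \text{false}\}} P(G \mid S)\,P(W \mid S)\,P(S \mid B)$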
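The cascading computation can be sketched directly from the last formula. All CPT numbers below are hypothetical (the slides give none), and G and W are taken to mean that each neighbor testifies to having heard the alarm.

```python
# Hypothetical CPTs (not from the slides): B = burglary, S = alarm sounds,
# G, W = the two neighbors testify that they heard the alarm.
P_B = {True: 0.01, False: 0.99}
P_S_given_B = {True: 0.95, False: 0.02}   # P(S = true | B)
P_G_given_S = {True: 0.7, False: 0.1}     # P(G = true | S)
P_W_given_S = {True: 0.8, False: 0.05}    # P(W = true | S)

def p_s(s, b):
    """P(S = s | B = b)."""
    return P_S_given_B[b] if s else 1 - P_S_given_B[b]

def burglary_posterior():
    """P(B | G, W) proportional to P(B) * sum_S P(G | S) P(W | S) P(S | B), with G = W = true."""
    unnorm = {
        b: P_B[b] * sum(P_G_given_S[s] * P_W_given_S[s] * p_s(s, b) for s in (True, False))
        for b in (True, False)
    }
    z = sum(unnorm.values())  # normalization constant
    return {b: v / z for b, v in unnorm.items()}

print(round(burglary_posterior()[True], 4))
```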

24
2. Predicting Future
  • The daughter will call with some probability if she hears the alarm sound
  • What is the probability that the daughter will
    call, given the testimonies of the neighbors?

25
Predicting Future
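  • The probability that the daughter will call is
  • $P(D \mid e) = \sum_{S \in \{\text{true}, \text{false}\}} P(D \mid S)\,P(S \mid e)$
  • By the inversion rule,
  • $P(S \mid e) \propto P(e \mid S)\,P(S)$ for $S \in \{\text{true}, \text{false}\}$,
  • where $P(e \mid S) = P(G \mid S)\,P(W \mid S)$ as before, and
  • $P(S) = \sum_{B \in \{\text{true}, \text{false}\}} P(S \mid B)\,P(B)$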
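The prediction can be sketched with the same kind of hypothetical CPTs, plus a hypothetical $P(D \mid S)$ for the daughter calling. Here e is taken to be both testimonies G and W being true; none of the numbers come from the slides.

```python
# Hypothetical CPTs (not from the slides); D = the daughter calls,
# which depends only on whether she hears the alarm S.
P_B = {True: 0.01, False: 0.99}
P_S_given_B = {True: 0.95, False: 0.02}
P_G_given_S = {True: 0.7, False: 0.1}
P_W_given_S = {True: 0.8, False: 0.05}
P_D_given_S = {True: 0.6, False: 0.01}

def prior_s(s):
    """P(S) = sum_B P(S | B) P(B)."""
    return sum((P_S_given_B[b] if s else 1 - P_S_given_B[b]) * P_B[b] for b in (True, False))

def p_s_given_e():
    """Inversion rule: P(S | e) proportional to P(G | S) P(W | S) P(S), with e = (G = W = true)."""
    unnorm = {s: P_G_given_S[s] * P_W_given_S[s] * prior_s(s) for s in (True, False)}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

def p_daughter_calls():
    """P(D | e) = sum_S P(D | S) P(S | e)."""
    post = p_s_given_e()
    return sum(P_D_given_S[s] * post[s] for s in (True, False))

print(round(p_daughter_calls(), 4))
```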

26
3. Explaining Away
  • If an event has multiple causes, sometimes the
    occurrence of one cause reduces our belief in the
    occurrence of the other
  • If the Alarm is sensitive enough to go off when
    an earthquake occurs, the occurrence of the
    earthquake explains away the burglary hypothesis
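Explaining away can be demonstrated numerically with a hypothetical CPT in which burglary B and earthquake E are independent causes of the alarm sound S. Learning that an earthquake occurred lowers the belief in a burglary, given that the alarm went off.

```python
from itertools import product

# Hypothetical numbers (not from the slides): burglary B and earthquake E are
# independent causes of the alarm sound S.
P_B, P_E = 0.01, 0.02
P_S = {  # P(S = true | B, E)
    (True, True): 0.97,
    (True, False): 0.95,
    (False, True): 0.9,
    (False, False): 0.01,
}

def joint_with_alarm(b, e):
    """P(B = b, E = e, S = true) = P(S | B, E) P(B) P(E)."""
    return P_S[(b, e)] * (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)

# P(B | S): belief in a burglary when we only know that the alarm went off.
p_s = sum(joint_with_alarm(b, e) for b, e in product((True, False), repeat=2))
p_b_given_s = sum(joint_with_alarm(True, e) for e in (True, False)) / p_s

# P(B | S, E): belief in a burglary once we also learn that an earthquake occurred.
p_b_given_s_e = joint_with_alarm(True, True) / (
    joint_with_alarm(True, True) + joint_with_alarm(False, True)
)

assert p_b_given_s_e < p_b_given_s  # the earthquake "explains away" the burglary
print(round(p_b_given_s, 4), round(p_b_given_s_e, 4))
```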

27
Interactions between Multiple Causes
  • Conditional Probability Tables (CPTs)
  • $P(S \mid B, E)\,P(B)\,P(E)$
  • $P(S \mid \neg B, E)\,P(\neg B)\,P(E)$
  • $P(S \mid B, \neg E)\,P(B)\,P(\neg E)$
  • $P(S \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E)$
  • The size of a CPT is exponential in the number of causes
  • i.e., if there are k causes of an event X, then the CPT for X has $2^k$ entries
  • This creates serious efficiency problems for exact reasoning algorithms
  • Approximation techniques have been developed for reasoning over incomplete CPTs
  • E.g., Noisy-OR (NOR), Generalized-NOR,
    Recursive-NOR
  • We will discuss these techniques later
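Summing the four weighted CPT terms above gives $P(S)$ when B and E are independent of each other. The sketch below generalizes this to k causes (independence of the causes is an assumption of the sketch) and shows how the number of CPT entries grows as $2^k$; the CPT values are hypothetical.

```python
from itertools import product

def cpt_size(k):
    """Number of entries P(X = true | c1, ..., ck) for k Boolean causes."""
    return 2 ** k

def marginal_x(cpt, priors):
    """P(X) = sum over all 2**k cause assignments of P(X | causes) * prod_i P(cause_i).

    Assumes the causes are independent of one another.
    """
    total = 0.0
    for causes in product((True, False), repeat=len(priors)):
        weight = 1.0
        for present, p in zip(causes, priors):
            weight *= p if present else 1 - p
        total += cpt[causes] * weight
    return total

# Hypothetical CPT for S with causes (B, E); priors P(B) = 0.01, P(E) = 0.02.
cpt_s = {(True, True): 0.97, (True, False): 0.95, (False, True): 0.9, (False, False): 0.01}
print(marginal_x(cpt_s, [0.01, 0.02]))
print([cpt_size(k) for k in (2, 5, 10, 20)])  # [4, 32, 1024, 1048576]
```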

28
Summary
  • Reasoning with classical logic has problems with
    plausible reasoning under uncertainty
  • Probability theory provides machinery for uncertain reasoning, but relying solely on probability theory is often very inefficient:
  • Exponential-sized CPTs
  • No structural reasoning over the knowledge
  • Bayesians use probability theory in order to
    describe the organization (i.e., the structure)
    of human knowledge and develop reasoning
    machinery over such structures
  • Hierarchical Modeling
  • Next Week
  • Structural Models of the Bayesians: Networks of Plausible Reasoning