Title: Knowledge Representations
1. Knowledge Representations
- One large distinction between an AI system and a normal piece of software is that an AI system must reason using worldly knowledge
- What types of knowledge?
- Facts
- Axioms
- Statements (which may or may not be true)
- Rules
- Cases
- Experiences
- Associations (which may not be truth preserving)
- Descriptions
- Probabilities and Statistics
2. Types of Representations
- Early systems used either
- semantic networks or predicate calculus to represent knowledge
- or simple search spaces, if the domain/problem had very limited amounts of knowledge (e.g., simple planning as in blocks world)
- With the early expert systems in the 70s, a significant shift took place to production systems, which combined representation and process (chaining) and even uncertainty handling (certainty factors)
- Later, frames (an early version of OOP) were introduced
- Problem-specific approaches were introduced, such as scripts and CDs (conceptual dependencies) for language representation
- In the 1980s, there was a shift from rules to model-based approaches
- Since the 1990s, Bayesian networks and hidden Markov models have become popular
- First, we will take a brief look at some of these representations
3. Search Spaces
- Given a problem expressed as a state space (whether explicitly or implicitly)
- Formally, we define a search space as [N, A, S, GD]
- N: the set of nodes or states of a graph
- A: the set of arcs (edges) between nodes that correspond to the steps in the problem (the legal actions or operators)
- S: a nonempty subset of N that represents start states
- GD: a nonempty subset of N that represents goal states
- Our problem becomes one of traversing the graph from a node in S to a node in GD
- Example
- 3 missionaries and 3 cannibals are on one side of the river with a boat that can take exactly 2 people across the river
- how can we move the 3 missionaries and 3 cannibals across the river such that the cannibals never outnumber the missionaries on either side of the river (lest the cannibals start eating the missionaries!)
4. M/C Solution
- We can represent a state as a 6-item tuple (a, b, c, d, e, f)
- a/b: number of missionaries/cannibals on the left shore
- c/d: number of missionaries/cannibals in the boat
- e/f: number of missionaries/cannibals on the right shore
- where a + b + c + d + e + f = 6
- a >= b (unless a = 0), c >= d (unless c = 0), and e >= f (unless e = 0)
- Legal operations (moves) are
- 0, 1, or 2 missionaries get into the boat
- 0, 1, or 2 missionaries get out of the boat
- 0, 1, or 2 cannibals get into the boat
- 0, 1, or 2 cannibals get out of the boat
- the boat sails from the left shore to the right shore
- the boat sails from the right shore to the left shore
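Below is a minimal Python sketch (not from the original slides) of solving this problem as a state-space search. For brevity it uses a simplified state (missionaries on the left bank, cannibals on the left bank, boat on the left?) rather than the 6-tuple above, and it assumes the boat carries one or two people per crossing.

```python
from collections import deque

def mc_search():
    """Breadth-first search over missionaries-and-cannibals states.
    A state is (missionaries on left bank, cannibals on left bank, boat on left?)."""
    start, goal = (3, 3, True), (0, 0, False)
    boat_loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]   # (missionaries, cannibals) in the boat

    def safe(m, c):
        # missionaries are never outnumbered on either bank
        return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (m, c, boat_left), path = frontier.popleft()
        if (m, c, boat_left) == goal:
            return path
        for dm, dc in boat_loads:
            sign = -1 if boat_left else 1        # the boat carries people away from its bank
            nm, nc = m + sign * dm, c + sign * dc
            new_state = (nm, nc, not boat_left)
            if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc) and new_state not in seen:
                seen.add(new_state)
                frontier.append((new_state, path + [new_state]))

print(mc_search())   # a shortest sequence of states from (3, 3, True) to (0, 0, False)
```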
5. Relationships
- We often know things about objects (whether physical or abstract)
- These objects have attributes (components, values) and/or relationships with other things
- So, one way to represent knowledge is to enumerate the objects and describe them through their attributes and relationships
- Common forms of such relationship representations are
- semantic networks: a network consists of nodes, which are objects and values, and edges (links/arcs), which are annotated to indicate how the nodes are related
- predicate calculus: predicates are often relationships, and the arguments of the predicates are objects
- frames: in essence, objects (as in object-oriented programming) where attributes are the data members and the values are the specific values stored in those members; in some cases, they are pointers to other objects
6. Representations With Relationships
Here we see the same information being represented using two different representational techniques: a semantic network (above) and predicates (to the left).
7. Another Example: Blocks World
Here we see a real-world situation of three blocks and a predicate calculus representation for expressing this knowledge. We equip our system with rules, such as the rule below, to reason over how to draw conclusions and manipulate this blocks world.
This rule says that if there does not exist a Y that is on X, then X is clear: ∀X (¬∃Y on(Y, X) → clear(X)).
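A minimal Python sketch (not from the slides) of applying this "clear" rule over a handful of on(Y, X) facts; the block names are made up.

```python
# on(Y, X): block Y sits directly on block X (block names are hypothetical)
on_facts = {("a", "b"), ("b", "c")}       # a is on b, b is on c
blocks = {"a", "b", "c"}

def clear(x):
    """clear(X) holds if there does not exist a Y such that on(Y, X)."""
    return all((y, x) not in on_facts for y in blocks)

print([b for b in blocks if clear(b)])    # ['a'] -- only block a has nothing on top of it
```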
8. Semantic Networks
- Collins and Quillian were the first to use semantic networks in AI, storing in the network the objects and their relationships
- their intention was to represent English sentences
- edges would typically be annotated with these descriptors or relations
- isa: class/subclass
- instance: the first object is an instance of the class
- has: contains or has this as a physical property
- can: has the ability to
- made of, color, texture, etc.
A semantic network to represent the sentences "a canary can sing/fly", "a canary is a bird/animal", "a canary is a canary", "a canary has skin".
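A minimal Python sketch (not from the slides) of the canary network; the assumption that a query inherits properties by following isa links up the hierarchy reflects how the Collins and Quillian model is usually described.

```python
# Edges of a tiny semantic network: (node, relation, node/value)
edges = [
    ("canary", "isa", "bird"), ("bird", "isa", "animal"),
    ("canary", "can", "sing"), ("bird", "can", "fly"),
    ("animal", "has", "skin"),
]

def properties(node, relation):
    """Collect values of `relation` for `node`, inheriting up the isa links."""
    found, current = set(), node
    while current is not None:
        found |= {v for (n, r, v) in edges if n == current and r == relation}
        parents = [v for (n, r, v) in edges if n == current and r == "isa"]
        current = parents[0] if parents else None
    return found

print(properties("canary", "can"))   # {'sing', 'fly'} -- 'fly' inherited from bird
print(properties("canary", "has"))   # {'skin'} -- inherited from animal
```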
9. Representing Word Meanings
- Quillian demonstrated how to use the semantic network to represent word meanings
- each word would have one or more networks, with links that attach words to their definition planes
- the word "plant" is represented as three planes, each of which has links to additional word planes
10. Frames
- The semantic network requires a graph representation, which may not be a very efficient use of memory
- Another representation is the frame
- the idea behind a frame was originally that it would represent a "frame of memory", for instance by capturing the objects and their attributes for a given situation or moment in time
- a frame would contain slots, where a slot could contain
- identification information (including whether this frame is a subclass of another frame)
- relationships to other frames
- descriptors of this frame
- procedural information on how to use this frame (code to be executed)
- defaults for slots
- instance information (or an identification of whether the frame represents a class or an instance)
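A minimal Python sketch (not from the slides) of frames as slot dictionaries, loosely based on the hotel-room example on the next slide; the slot names and default values are hypothetical.

```python
# A frame is a named collection of slots; `isa` points to a parent frame,
# and missing slots fall back to the parent's defaults.
frames = {
    "room":       {"isa": None,   "slots": {"has_walls": 4}},
    "hotel_room": {"isa": "room", "slots": {"contains": ["chair", "bed", "phone"]}},
    "bed":        {"isa": None,   "slots": {"contains": ["mattress", "bed_frame"]}},
}

def get_slot(frame_name, slot):
    """Look up a slot, inheriting from parent frames when it is missing."""
    while frame_name is not None:
        frame = frames[frame_name]
        if slot in frame["slots"]:
            return frame["slots"][slot]
        frame_name = frame["isa"]
    return None

print(get_slot("hotel_room", "contains"))   # ['chair', 'bed', 'phone']
print(get_slot("hotel_room", "has_walls"))  # 4, inherited from the generic room frame
```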
11. Frame Example
Here is a partial frame representing a hotel room. The room contains a chair, bed, and phone, where the bed contains a mattress and a bed frame (not shown).
12. Production Systems
- A production system is
- a set of rules (if-then or condition-action statements)
- working memory
- the current state of the problem solving, which includes new pieces of information created by previously applied rules
- an inference engine (the author calls this a recognize-act cycle)
- forward chaining, backward chaining, a combination, or some other form of reasoning such as a sponsor-selector or agenda-driven scheduler
- a conflict resolution strategy
- when it comes to selecting a rule, there may be several applicable rules; which one should we select? the choice may be based on a conflict resolution strategy such as first rule, most specific rule, most salient rule, rule with most actions, random, etc.
13. Chaining
- The idea behind a production system's reasoning is that rules describe steps in the problem-solving space, where a rule might
- be an operation in a game, like a chess move
- translate a piece of input data into an intermediate conclusion
- piece together several intermediate conclusions into a specific conclusion
- translate a goal into substeps
- So a solution using a production system is a collection of rules that are chained together (a minimal forward-chaining sketch follows below)
- forward chaining: reasoning from data to conclusions, where working memory is searched for conditions that match the left-hand sides of the given rules
- backward chaining: reasoning from goals to operations, where an initial goal is unfolded into the steps needed to solve that goal; that is, the process is one of subgoaling
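A minimal Python sketch (not from the slides) of forward chaining: a rule fires when all of its conditions are in working memory, and its conclusion is added until nothing new can be derived. The two rules are made-up examples.

```python
# Each rule: (set of conditions, conclusion). These example rules are hypothetical.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "high_fever"}, "see_doctor"),
]

def forward_chain(working_memory, rules):
    """Repeatedly fire any rule whose conditions all hold, until a fixed point."""
    wm = set(working_memory)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= wm and conclusion not in wm:
                wm.add(conclusion)      # the rule fires, adding its conclusion
                changed = True
    return wm

print(forward_chain({"fever", "cough", "high_fever"}, rules))
# {'fever', 'cough', 'high_fever', 'flu_suspected', 'see_doctor'}
```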
14. Two Example Production Systems
15. Example System: Water Jugs
- Problem: given a 4-gallon jug (X) and a 3-gallon jug (Y), fill X with exactly 2 gallons of water
- assume an infinite amount of water is available
- Rules/operators
- 1. If X = 0 then X = 4 (fill X)
- 2. If Y = 0 then Y = 3 (fill Y)
- 3. If X > 0 then X = 0 (empty X)
- 4. If Y > 0 then Y = 0 (empty Y)
- 5. If X + Y > 3 and X > 0 then X = X - (3 - Y) and Y = 3 (fill Y from X)
- 6. If X + Y > 4 and Y > 0 then X = 4 and Y = Y - (4 - X) (fill X from Y)
- 7. If X + Y < 3 and X > 0 then X = 0 and Y = X + Y (empty X into Y)
- 8. If X + Y < 4 and Y > 0 then X = X + Y and Y = 0 (empty Y into X)
- rule numbers are used on the next slide
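A minimal Python sketch (not from the slides) that encodes the eight operators above as guarded state changes and searches breadth-first for a state in which X holds exactly 2 gallons.

```python
from collections import deque

# The eight operators above, written as (guard, successor) pairs over states (x, y).
rules = [
    (lambda x, y: x == 0,              lambda x, y: (4, y)),            # 1. fill X
    (lambda x, y: y == 0,              lambda x, y: (x, 3)),            # 2. fill Y
    (lambda x, y: x > 0,               lambda x, y: (0, y)),            # 3. empty X
    (lambda x, y: y > 0,               lambda x, y: (x, 0)),            # 4. empty Y
    (lambda x, y: x + y > 3 and x > 0, lambda x, y: (x - (3 - y), 3)),  # 5. fill Y from X
    (lambda x, y: x + y > 4 and y > 0, lambda x, y: (4, y - (4 - x))),  # 6. fill X from Y
    (lambda x, y: x + y < 3 and x > 0, lambda x, y: (0, x + y)),        # 7. empty X into Y
    (lambda x, y: x + y < 4 and y > 0, lambda x, y: (x + y, 0)),        # 8. empty Y into X
]

def solve(start=(0, 0)):
    """Breadth-first search for a state with exactly 2 gallons in the 4-gallon jug."""
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if x == 2:
            return path
        for guard, step in rules:
            if guard(x, y):
                nxt = step(x, y)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [nxt]))

print(solve())   # e.g. [(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0)]
```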
16. Conflict Resolution Strategies
- In a production system, what happens when more than one rule matches?
- a conflict resolution strategy dictates how to select from among multiple matching rules
- Simple conflict resolution strategies include
- random
- first match
- most/least recently matched rule
- rule which has matched for the longest/shortest number of cycles (refractoriness)
- most salient rule (each rule is given a salience before you run the production system)
- More complex resolution strategies might
- select the rule with the most/least number of conditions (specificity/generality)
- or the most/least number of actions (biggest/smallest change to the state)
17. MYCIN
- By the early 1970s, the production system approach was found to be more than adequate for constructing large-scale expert systems
- in 1971, researchers at Stanford began constructing MYCIN, a medical diagnostic system
- it contained a very large rule base
- it used backward chaining
- to deal with the uncertainty of medical knowledge, it introduced certainty factors (sort of like probabilities)
- in 1975, it was tested against medical experts and performed as well as or better than the doctors it was compared to

(defrule 52
  if   (site culture is blood)
       (gram organism is neg)
       (morphology organism is rod)
       (burn patient is serious)
  then .4 (identity organism is pseudomonas))

If the culture was taken from the patient's blood, and the Gram stain of the organism is negative, and the morphology of the organism is rods, and the patient is a serious burn patient, then conclude that the identity of the organism is pseudomonas (with .4 certainty).
18. MYCIN in Operation
- MYCIN's process starts with diagnose-and-treat
- repeat
- identify all rules that can provide the conclusion currently sought
- match right-hand sides (that is, search for rules whose right-hand sides match anything in working memory)
- use conflict resolution to identify a single rule
- fire that rule
- find and remove a piece of knowledge which is no longer needed
- find and modify a piece of knowledge now that more specific information is known
- add a new subgoal (left-hand side conditions that need to be proved)
- until the action "done" is added to working memory
- MYCIN would first identify the illness, possibly ordering more tests to be performed, and then, given the illness, generate a treatment
- MYCIN consisted of about 600 rules
19. R1/XCON
- Another success story is DEC's R1
- later renamed XCON
- This system would take customer orders and configure specific VAX computers for those orders, including
- completing the order if the order was incomplete
- how the various components (drive and tape units, motherboard(s), etc.) would be placed inside the mainframe cabinet
- how the wiring would take place among the various components
- R1 would perform forward chaining over about 10,000 rules
- over a 6-year period, it configured some 80,000 orders with a 95-98% accuracy rating
- ironically, whereas planning/design is viewed as a backward-chaining task, R1 used forward chaining because, in this particular case, the problem is data driven, starting with user input of the computer system's specifications
- R1's solutions were similar in quality to human solutions
20. R1 Sample Rules
- Constraint rules
- if device requires battery then select battery for device
- if select battery for device then pick battery with voltage(battery) = voltage(device)
- Configuration rules
- if we are in the floor-plan stage and there is space for a power supply and there is no power supply available, then add a power supply to the order
- if the step is "configuring: propose alternatives" and there is an unconfigured device and no container was chosen and no other device that can hold it was chosen and selecting a container wasn't proposed yet and no problems for selecting containers were identified, then propose selecting a container
- if the step is distributing a massbus device and there is a single-port disk drive that has not been assigned to a massbus and there are no unassigned dual-port disk drives and the number of devices that each massbus should support is known and there is a massbus that has been assigned at least one disk drive and that should support additional disk drives and the type of cable needed to connect the disk drive is known, then assign the disk drive to this massbus
21. Strong Slot-n-Filler Structures
- To avoid the difficulties with frames and nets, Schank and Rieger offered two network-like representations that would have implied uses and built-in semantics: conceptual dependencies and scripts
- the conceptual dependency (CD) was derived as a form of semantic network that would have specific types of links to be used for representing specific pieces of information in English sentences
- the action of the sentence
- the objects affected by the action or that brought about the action
- modifiers of both actions and objects
- they defined 11 primitive actions, called ACTs
- every possible action can be categorized as one of these 11
- an ACT would form the center of the CD, with links attaching the objects and modifiers
22. Example CD
- The sentence is "John ate the egg"
- The INGEST act means to ingest an object (eat, drink, swallow)
- the P above the double arrow indicates past tense
- the INGEST action must have an object (the O indicates it was the object Egg) and a direction (the object went from John's mouth to John's insides)
- we might infer that it was "an egg" instead of "the egg", as there is nothing specific to indicate which egg was eaten
- we might also infer that John swallowed the egg whole, as there is nothing to indicate that John chewed the egg!
23. The CD Theory ACTs
- Is this list complete?
- what actions are missing?
- Could we reduce this list to make it more concise?
- other researchers have developed other lists of primitive actions, including one with just 3: physical actions, mental actions, and abstract actions
24. Example CD Links
25. Example CDs
26. More Examples
27. Complex Example
- The sentence is "John prevented Mary from giving a book to Bill"
- This sentence has two ACTs, DO and ATRANS
- DO was not in the list of 11, but can be thought of as "caused to happen"
- The c/ means a negative conditional; in this case it means that John caused this not to happen
- The ATRANS is a giving relationship, with the object being a Book and the action being from Mary to Bill: Mary gave a book to Bill
- as with the previous example, there is no way of telling whether it is "a book" or "the book"
28. Scripts
- The other structured representation developed by Schank (along with Abelson) is the script
- a description of the typical actions that are involved in a typical situation
- they defined a script for going to a restaurant
- scripts provide an ability for default reasoning when information is not available that directly states that an action occurred
- so we may assume, unless otherwise stated, that a diner at a restaurant was served food, that the diner paid for the food, and that the diner was served by a waiter/waitress
- A script would contain
- entry condition(s) and results (exit conditions)
- actors (the people involved)
- props (physical items at the location used by the actors)
- scenes (individual events that take place)
- The script would use the 11 ACTs from CD theory
29. Restaurant Script
- The script does not contain atypical actions
- although there are options, such as whether the customer was pleased or not
- There are multiple paths through the scenes to make for a robust script
- what would a "going to the movies" script look like? would it have similar props, actors, scenes? how about "going to class"?
30. Knowledge Groups
- One of the drawbacks of the knowledge representations demonstrated thus far is that all knowledge is grouped into a single, large collection of representations
- the rules taken as a whole, for instance, don't denote which rules should be used in which circumstances
- Another approach is to divide the representations into logical groupings
- this permits easier design, implementation, testing, and debugging, because you know what that particular group is supposed to do and what knowledge should go into it
- it should be noted that by distributing the knowledge, we might use different problem-solving agents for each set of knowledge, so that the knowledge is stored using different representations
31. Knowledge Sources and Agents
- Which leads us to the idea of having multiple problem-solving agents
- each agent is responsible for solving some specialized type of problem(s) and knows where to obtain its own input
- each agent has its own knowledge sources, some internal, some external
- since external agents may have their own forms of representation, the agent must know
- how to find the proper agents
- how to properly communicate with these other agents
- how to interpret the information that it receives from these agents
- how to recover from a situation where the expected agent(s) is/are not available
32. What is an Agent?
- Agents are interactive problem solvers that have these properties
- situated: the agent is part of the problem-solving environment; it can obtain its own input from its environment and it can affect its environment through its output
- autonomous: the agent operates independently of other agents and can control its own actions and internal states
- flexible: the agent is both responsive and proactive; it can go out and find what it needs to solve its problem(s)
- social: the agent can interact with other agents, including humans
- Some researchers also insist that agents have
- mobility: the ability to move from their current environment to a new environment (e.g., migrate to another processor)
- delegation: hand off portions of the problem to other agents
- cooperation: if multiple agents are tasked with the same problem, can their solutions be combined?
33. The Semantic Web
- The WWW is a collection of data and knowledge in an unstructured format
- Humans often can take knowledge from disparate sources and put together a coherent picture; can problem-solving agents?
- Agents on the semantic web all have their own capabilities and know where to look for knowledge
- whether from a static source, or from an agent that can provide the needed information through its own processing, or from a human
- The common approach is to model the knowledge of a web site using an ontology
- ontologies give agents the ability to translate the results of another agent, or the data provided from a website, into a version of knowledge that they can understand and use
34. Knowledge Acquisition and Modeling
- Expert system construction used to be a trial-and-error sort of approach for the knowledge engineers
- once they had knowledge from the experts, they would fill in their knowledge base and test it out
- By the end of the 80s, it was discovered that creating an actual domain model was the way to go: build a model of the knowledge before implementing anything
- A model might be
- a dependency graph of what can cause what to happen
- or an associational model, which is a collection of malfunctions and the manifestations we would expect to see from those malfunctions
- or a functional model, where component parts are enumerated and described by function and behavior
- The emphasis changed to knowledge acquisition tools (KADS)
- domain experts enter their knowledge as a graphical model that contains the component parts of the item being diagnosed/designed, their functions, and rules for deciding how to diagnose or design each one
35. A NASA Example
- Here is a model developed by NASA for the Livingston propulsion system for rockets
- a reactive, self-configuring, autonomous system
- knowledge modeled using propositional calculus (instead of predicate calculus; there are a finite number of elements, so each can be modeled by its own proposition)
Helium is the fuel tank; oxidizer is mixed to cause the fuel to burn. Acc is the accelerometer which, along with sensors in the valves, is used as input to control the system. Pyro valves are used as controls: once they change state, they stay in that state, so they are used to change the flow of fuel when an error is detected, opening or closing a new pathway from tank to engine.
36. Model (Architecture) for the System
- The idea is that the configuration manager tries to keep the spacecraft moving, but at the lowest-cost configuration
- Sensors feed into the ME (mode estimator) to determine if the system is functioning and in the lowest configuration
- If not, the MR (mode reconfiguration) plans a new mode by determining which valves to open and close
- Since this is a spacecraft, the output of the MR is a set of actions that cause valves to open or close directly
The high-level planner generates a sequence of hardware configuration goals, such as the amount of propellant that should be used; it is the configuration manager that must translate these goals into actions.
37. VT: an Elevator Design System
The design of an elevator can be used to generate a diagnostic system for elevator problems or, in VT's case, a system that can design new elevators.
38. Reasoning with Uncertainty
- Representations generally represent knowledge as fact
- However, knowledge, and the use of that knowledge, often brings with it a degree of uncertainty
- how can we represent and reason with uncertainty?
- We find two forms of uncertainty
- unsure input
- unknown: we do not know the answer, so we have to say "unknown"
- unclear: the answer doesn't fit the question (e.g., not "yes" but "80% yes")
- vague data: is a 100-degree temp a high fever or just a fever?
- ambiguous/noisy data: the data may not be easily interpretable
- non-truth-preserving knowledge (most rules are associational, not truth preserving)
- unlike "if you are a man then you are mortal", a doctor might reason from symptoms to diseases
- "all men are mortal" denotes a class/subclass relationship, which is truth preserving
- but the symptom-to-disease reasoning is based on associations and is not guaranteed to be true
39. Certainty Factors
- First used in the MYCIN system; the idea is that we will attribute a measure of belief to any conclusion that we draw
- CF(H|E) = MB(H|E) - MD(H|E)
- the certainty factor for hypothesis H given evidence E is the measure of belief we have for H minus the measure of disbelief we have for H
- CFs are applied to hypotheses that are drawn from rules
- CFs can be combined, as we associate a CF with each condition and each conclusion of each rule
- To use CFs, we need
- to annotate every rule with a CF value (this comes from the expert)
- ways to combine CFs when we use AND, OR, and -> (rule application)
- Combining rules are straightforward
- for AND, use min
- for OR, use max
- for ->, use * (multiplication)
40. CF Example
- Assume we have the following rules
- A -> B (.7)
- A -> C (.4)
- D -> F (.6)
- B AND G -> E (.8)
- C OR F -> H (.5)
- We know A, D, and G are true (so each has a value of 1.0)
- B is .7 (A is 1.0, the rule is true at .7, so B is true at 1.0 * .7 = .7)
- C is .4
- F is .6
- B AND G is min(.7, 1.0) = .7 (G is 1.0, B is .7)
- E is .7 * .8 = .56
- C OR F is max(.4, .6) = .6
- H is .6 * .5 = .30
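A minimal Python sketch (not from the slides) that reproduces the certainty-factor arithmetic above.

```python
# Certainty-factor propagation for the rules above.
cf = {"A": 1.0, "D": 1.0, "G": 1.0}            # the known facts

cf["B"] = cf["A"] * 0.7                         # A -> B (.7)
cf["C"] = cf["A"] * 0.4                         # A -> C (.4)
cf["F"] = cf["D"] * 0.6                         # D -> F (.6)
cf["E"] = min(cf["B"], cf["G"]) * 0.8           # B AND G -> E (.8): AND takes the min
cf["H"] = max(cf["C"], cf["F"]) * 0.5           # C OR F -> H (.5): OR takes the max

print(cf["E"], cf["H"])                         # about .56 and .30 (up to floating-point rounding)
```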
41. Continued
- Another combining rule is needed when we can conclude the same hypothesis from two or more rules
- we already used C OR F -> H (.5) to conclude H with a CF of .30
- let's assume that we also have the rule E -> H (.5)
- since E is .56, we have H at .56 * .5 = .28
- We now believe H at .30 and at .28; which is true?
- the two rules both support H, so we want to draw a stronger conclusion in H, since we have two independent means of support for H
- We will use the formula CF1 + CF2 - CF1 * CF2
- CF(H) = .30 + .28 - .30 * .28 = .496
- our belief in H has been strengthened through two different chains of logic
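A small sketch of the same combination, assuming both certainty factors are positive (the case the slide's formula covers).

```python
def combine_cf(cf1, cf2):
    """Combine two positive certainty factors that support the same hypothesis."""
    return cf1 + cf2 - cf1 * cf2

print(combine_cf(0.30, 0.28))   # about .496 -- stronger than either rule alone
```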
42. Fuzzy Logic
- Prior to CFs, Zadeh introduced fuzzy logic to introduce shades of grey into logic
- other logics are two-valued: true or false only
- Here, any proposition can take on a value in the interval [0, 1]
- Being a logic, Zadeh introduced the algebra to support the logical operators AND, OR, NOT, and ->
- X AND Y = min(X, Y)
- X OR Y = max(X, Y)
- NOT X = 1 - X
- X -> Y = X * Y
- where the values of X and Y are determined by where they fall in the interval [0, 1]
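A minimal Python sketch (not from the slides) of the fuzzy connectives defined above.

```python
# Fuzzy connectives over degrees of truth in [0, 1].
def f_and(x, y):     return min(x, y)
def f_or(x, y):      return max(x, y)
def f_not(x):        return 1 - x
def f_implies(x, y): return x * y

print(f_and(0.3, 0.6), f_or(0.3, 0.6), f_not(0.6), f_implies(0.3, 0.6))
# 0.3 0.6 0.4 0.18 (up to floating-point rounding)
```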
43. Fuzzy Set Theory
- Fuzzy sets are to normal sets what fuzzy logic is to logic
- fuzzy set theory is based on fuzzy values from fuzzy logic, but includes set operations instead of logic operations
- The basis for fuzzy sets is defining a fuzzy membership function for a set
- a fuzzy set is a set of items along with their membership values in the set, where the membership value defines how close that item is to being in that set
- Example: the set "tall" might be denoted as
- tall = {x : f(x)} where f(x) = 1.0 if x > 6'2", .8 if x > 6', .6 if x > 5'10", .4 if x > 5'8", .2 if x > 5'6", 0 otherwise
- so we can say that a person is tall at .8 if they are 6'1", or we can say that the set of tall people is {Anne/.2, Bill/1.0, Chuck/.6, Fred/.8, Sue/.6}
44. Fuzzy Membership Function
- Typically, a membership function is a continuous function (often represented in graph form like the one above)
- given a value y, the membership value for y is u(y), determined by tracing the curve and seeing where it falls on the u(x) axis
- How do we define a membership function?
- this is an open question
45. Using Fuzzy Logic/Sets
- 1. fuzzify the input(s) using fuzzy membership functions
- 2. apply fuzzy logic rules to draw conclusions
- we use the previous rules for AND, OR, NOT, and ->
- 3. if conclusions are supported by multiple rules, combine the conclusions
- like CFs, we need a combining function; this may be done by computing a center of gravity using calculus
- 4. defuzzify the conclusions to get specific conclusions
- defuzzification requires translating a numeric value into an actionable item
- Fuzzy logic is often applied to domains where we can easily derive fuzzy membership functions and have a few rules, but not a lot
- fuzzy logic begins to break down when we have more than a dozen or two rules
46. Example
- We have an atmospheric controller which can increase or decrease the temperature of the air and can increase or decrease the fan, based on these simple rules
- if air is warm and dry, decrease the fan and increase the coolant
- if air is warm and not dry, increase the fan
- if air is hot and dry, increase the fan and increase the coolant slightly
- if air is hot and not dry, increase the fan and the coolant
- if air is cold, turn off the fan and decrease the coolant
- Our input obviously requires the air temperature and the humidity; the membership function for air temperature is shown to the right
If it is 60 degrees, it would be considered cold 0, warm 1, hot 0; if it is 85 degrees, it would be cold 0, warm .3, and hot .7.
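A minimal Python sketch (not from the slides) of the fuzzification step. The actual membership function is only shown as a graph on the original slide, so the breakpoints below are assumptions chosen to reproduce the numbers quoted above (60 degrees gives warm 1; 85 degrees gives warm .3 and hot .7).

```python
def fuzzify_temperature(t):
    """Fuzzify a temperature (degrees F) into cold/warm/hot membership values.
    Breakpoints are hypothetical, picked to match the slide's quoted values."""
    def ramp(x, lo, hi):                 # 0 below lo, 1 above hi, linear in between
        return max(0.0, min(1.0, (x - lo) / (hi - lo)))
    hot = ramp(t, 78, 88)
    cold = 1 - ramp(t, 45, 60)
    warm = min(ramp(t, 45, 60), 1 - hot)
    return {"cold": cold, "warm": warm, "hot": hot}

print(fuzzify_temperature(60))   # roughly {'cold': 0, 'warm': 1, 'hot': 0}
print(fuzzify_temperature(85))   # roughly {'cold': 0, 'warm': .3, 'hot': .7}
```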
47. Continued
- Temperature = 85, humidity indicates dry = .6
- hot = .7, warm = .3, cold = 0, dry = .6, not dry = .4 (not dry = 1 - dry = 1 - .6 = .4)
- Rule 1 has warm and dry
- warm is .3, dry is .6, so warm and dry = min(.3, .6) = .3
- Rule 2 has warm and not dry
- min(.3, .4) = .3
- Rule 3 has hot and dry: min(.7, .3) = .3
- our fourth and fifth rules give us 0, since cold is 0
- Our conclusions from the first three rules are to
- decrease the coolant and increase the fan at levels of .3
- increase the fan at a level of .3
- increase the fan at .3 and increase the coolant slightly
- To combine our results, we might increase the fan by .9 and decrease the coolant (assuming "increase slightly" means increase by 1/4) by .3 - .3/4 = .9/4
- Finally, we defuzzify "decrease by .9/4" and "increase by .9" into actionable amounts
48. Using Fuzzy Logic
- The most common applications for fuzzy logic are controllers
- devices that, based on input, make minor modifications to their settings, for instance
- an air conditioner controller that uses the current temperature, the desired temperature, and the number of open vents to determine how much to turn the blower up or down
- camera aperture control (up/down, focus, negating a shaky hand)
- a subway car's braking and acceleration
- Fuzzy logic has been used for expert systems
- but the systems tend to perform poorly when more than just a few rules are chained together
- in our previous example, we had just 5 stand-alone rules
- when we chain rules, the fuzzy values are multiplied (e.g., .5 from one rule * .3 from another rule * .4 from another rule gives a result of .06)
49. Dempster-Shafer Theory
- The D-S theory goes beyond CFs and fuzzy logic by providing us with two values to indicate the utility of a hypothesis
- belief: as before, like the CF or fuzzy membership value
- plausibility: adds to our belief by determining whether there is any evidence (belief) opposing the hypothesis
- We want to know if h is a reasonable hypothesis
- we have evidence in favor of h giving us a belief of .7
- we have no evidence against h; this would imply that the plausibility is greater than the belief
- pl(h) = 1 - b(¬h) = 1 (since we have no evidence against h, b(¬h) = 0)
- Consider two hypotheses, h1 and h2, where we have no evidence in favor of either, so b(h1) = b(h2) = .5
- we have evidence that suggests ¬h2 is less believable than ¬h1, so that b(¬h2) = .3 and b(¬h1) = .5
- h1 = [.5, .5] and h2 = [.5, .7], so h2 is more believable
50. Computing Multiple Beliefs
- D-S theory gives us a way to compute the belief for any number of subsets of the hypotheses, and to modify the beliefs as new evidence is introduced
- the formula to compute belief (given below) is a bit complex
- so we present an example to better understand it
- but the basic idea is this: we have a belief value for how well some piece of evidence supports a group (subset) of hypotheses
- we introduce new evidence and multiply the belief from the first with the belief in support of the new evidence, for those hypotheses that are in the intersection of the two subsets
- the denominator is used to normalize the computed beliefs, and it is 1 unless the intersection includes some null (empty) subsets
- m3(Z) = [ Σ over X ∩ Y = Z of m1(X) * m2(Y) ] / [ 1 - Σ over X ∩ Y = ∅ of m1(X) * m2(Y) ]
51. Example
- There are four possible hypotheses for a given patient: cold (C), flu (F), migraine (H), and meningitis (M)
- we introduce a piece of evidence, m1 = fever, which supports {C, F, M} at .6
- we also have Q (the entire set) with support 1 - .6 = .4
- now we add the evidence m2 = nausea, which supports {C, F, H} at .7, so that Q = .3
- we combine the two sets of beliefs into m3 as follows
Since m3 has no empty sets, the denominator is 1, so the set of values in m3 is already normalized and we do not have to do anything else.
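A minimal Python sketch (not from the slides) of Dempster's combination rule, applied to m1 and m2 above.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: multiply the masses of intersecting focal sets,
    then normalize by 1 minus the mass that fell on the empty set."""
    combined, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    return {s: v / (1 - conflict) for s, v in combined.items()}

Q = frozenset("CFHM")                       # cold, flu, migraine, meningitis
m1 = {frozenset("CFM"): 0.6, Q: 0.4}        # fever supports {C, F, M} at .6
m2 = {frozenset("CFH"): 0.7, Q: 0.3}        # nausea supports {C, F, H} at .7

m3 = dempster_combine(m1, m2)
for s, v in m3.items():
    print(sorted(s), round(v, 3))
# {C, F}: 0.42   {C, F, M}: 0.18   {C, F, H}: 0.28   Q: 0.12
```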
52. Continued
- When we had m1, we had two sets, {C, F, M} and Q
- When we combined it with m2 (with two sets of its own, {C, F, H} and Q), the result was four sets
- the intersection of {C, F, M} and {C, F, H} = {C, F}
- the intersection of {C, F, M} and Q = {C, F, M}
- the intersection of {C, F, H} and Q = {C, F, H}
- the intersection of Q and Q = Q
- We now add evidence m4, a lab culture result that suggests meningitis, with belief .8
- m4{M} = .8 and m4{Q} = .2
- In adding m4, with its sets {M} and Q, we intersect these with the four intersected sets above, which results in 8 sets
- shown on the next slide, with some empty sets, so our denominator will no longer be 1 and we will have to compute it after computing the numerators
53. End of Example
Sum of the empty-set products = .336 + .224 = .56, so the denominator is 1 - .56 = .44
m5{M} = (.096 + .144) / .44 = .545
m5{C, F, M} = .036 / .44 = .082
m5{C, F} = .084 / .44 = .191
m5{C, F, H} = .056 / .44 = .127
m5{Q} = .024 / .44 = .055
The most plausible single explanation is M (meningitis); the belief mass is spread out because the evidence tends to contradict itself (some evidence indicates meningitis, another symptom indicates no meningitis).
54. Bayesian Probabilities
- Bayes derived the following formula
- p(h|E) = p(E|h) * p(h) / Σ over all i of [p(E|hi) * p(hi)]
- the probability that h is true given evidence E
- p(h|E): conditional probability
- what is the probability that h is true given the evidence E?
- p(E|h): evidential probability
- what is the probability that evidence E will appear if h is true?
- p(h): prior probability (or a priori probability)
- what is the probability that h is true in general, without any evidence?
- the denominator normalizes the conditional probabilities so that they add up to 1
- To solve a problem with Bayesian probabilities
- we need to accumulate the probabilities for all hypotheses h1, h2, h3: p(h1|E), p(h2|E), p(h3|E), ..., p(E|h1), p(E|h2), p(E|h3), ..., and p(h1), p(h2), p(h3), ..., and then it's just a straightforward series of calculations
55. Example
- The sidewalk is wet; we want to determine the most likely cause
- it rained overnight (h1)
- we ran the sprinkler overnight (h2)
- wet sidewalk (E)
- Assume the following
- there was a 50% chance of rain: p(h1) = .5
- the sprinkler is run two nights a week: p(h2) = 2/7 ≈ .28
- p(wet sidewalk | rain overnight) = .8
- p(wet sidewalk | sprinkler) = .9
- Now we compute the two conditional probabilities
- p(h1|E) = (.5 * .8) / (.5 * .8 + .28 * .9) = .61
- p(h2|E) = (.28 * .9) / (.5 * .8 + .28 * .9) = .39
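A minimal Python sketch (not from the slides) of the wet-sidewalk computation.

```python
# Bayes' rule over two competing hypotheses for a wet sidewalk.
priors      = {"rain": 0.5, "sprinkler": 2 / 7}      # p(h)
likelihoods = {"rain": 0.8, "sprinkler": 0.9}         # p(wet sidewalk | h)

evidence = sum(priors[h] * likelihoods[h] for h in priors)      # normalizing denominator
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

print(posteriors)   # roughly {'rain': 0.61, 'sprinkler': 0.39}
```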
56. Independent Events
- There is a flaw with our previous example
- if it is likely that it will rain, we will probably not run the sprinkler even if it is the night we usually run it, and if it does not rain, we will probably be more likely to run the sprinkler the next night
- So we have to be aware of whether events are independent or not
- two events are independent if P(A ∩ B) = P(A) * P(B)
- where ∩ means intersection
- when P(B) ≠ 0, then P(A) = P(A | B)
- knowing B is true does not affect the probability of A being true
- We can also modify our computation by using the formula for conditionally independent events
- P(A ∩ B | C) = P(A | C) * P(B | C)
- again, ∩ is used to mean intersection
- we will expand on this shortly
57. Multiple Pieces of Evidence
- In our wet sidewalk example, E consisted of one piece of evidence, "wet sidewalk"
- what if we have many pieces of evidence?
- Consider a diagnostic case where there are 10 possible symptoms that we might look for to determine whether a patient has a cold (h1), flu (h2), or sinus infection (h3)
- E is some subset of {e1, e2, e3, e4, e5, e6, e7, e8, e9, e10}
- To use Bayes' formula, we need to know
- p(h1), p(h2), p(h3), as well as
- p(e1 | h1), p(e1 | h2), p(e1 | h3)
- p(e2 | h1), p(e2 | h2), p(e2 | h3)
- p(e3 | h1), p(e3 | h2), p(e3 | h3)
58. Continued
- But our patient may have several symptoms
- So we also need
- p(e1, e2 | h1), p(e1, e2 | h2), p(e1, e2 | h3)
- p(e1, e3 | h1), p(e1, e3 | h2), p(e1, e3 | h3)
- p(e2, e3 | h1), p(e2, e3 | h2), p(e2, e3 | h3)
- p(e1, e2, e3 | h1), p(e1, e2, e3 | h2), p(e1, e2, e3 | h3)
- How many different probabilities will we need?
- with 10 pieces of evidence, there are 2^10 = 1024 different combinations for E, so we will need 3 * 1024 = 3072 evidential probabilities (to go along with the 3 prior probabilities, one for each hypothesis)
- imagine if E comprised a set of 50 pieces of evidence instead!
59. Bayesian Net
- We can apply the Bayesian formulas for independent and conditionally dependent events in a network form
- we want to determine the likely cause for seeing orange barrels, flashing lights, and bad traffic on the highway
- two hypotheses: construction and accident (see the figure below)
- notice that T (bad traffic) can be caused by either construction or an accident; orange barrels are only evidence of construction, and flashing lights are only evidence of an accident (although it could also be that a driver has been pulled over)
- construction and accident are not directly related to each other; this will help simplify the problem
60. Dynamic Bayesian Networks
- Cause-effect situations are temporal
- at time i, an event arises and causes an event at time i+1
- the Bayesian belief network is static; it captures a situation at a single point in time
- we need a dynamic network instead
- The dynamic Bayesian network is similar to our previous networks, except that each edge represents not merely a dependency but a temporal change
- when you take the branch from state i to state i+1, you are not only indicating that state i can cause i+1 but also that i was at a time prior to i+1
Here is a state diagram that represents possible utterances of the word "tomato". Each node represents both a sound and a segment of time.