Title: Knowledge Representations
1. Knowledge Representations
- One large distinction between an AI system and a normal piece of software is that an AI system must reason using worldly knowledge
- What types of knowledge?
- Facts
- Axioms
- Statements (which may or may not be true)
- Rules
- Cases
- Experiences
- Associations (which may not be truth preserving)
- Descriptions
- Probabilities and Statistics
2. Types of Representations
- Early systems used either
- semantic networks or predicate calculus to represent knowledge
- or simple search spaces, if the domain/problem had very limited amounts of knowledge (e.g., simple planning as in blocks world)
- With the early expert systems in the 70s, a significant shift took place to production systems, which combined representation and process (chaining) and even uncertainty handling (certainty factors)
- Later, frames (an early version of OOP) were introduced
- Problem-specific approaches were introduced, such as scripts and CDs (conceptual dependencies) for language representation
- In the 1980s, there was a shift from rules to model-based approaches
- Since the 1990s, Bayesian networks and hidden Markov models have become popular
- First, we will take a brief look at some of these representations
3. Search Spaces
- Given a problem expressed as a state space (whether explicitly or implicitly)
- Formally, we define a search space as [N, A, S, GD]
- N: the set of nodes or states of a graph
- A: the set of arcs (edges) between nodes that correspond to the steps in the problem (the legal actions or operators)
- S: a nonempty subset of N that represents start states
- GD: a nonempty subset of N that represents goal states
- Our problem becomes one of traversing the graph from a node in S to a node in GD
- Example
- 3 missionaries and 3 cannibals are on one side of the river with a boat that can take exactly 2 people across the river
- how can we move the 3 missionaries and 3 cannibals across the river such that the cannibals never outnumber the missionaries on either side of the river (lest the cannibals start eating the missionaries!)
4. M/C Solution
- We can represent a state as a 6-item tuple (a, b, c, d, e, f)
- a/b: number of missionaries/cannibals on the left shore
- c/d: number of missionaries/cannibals in the boat
- e/f: number of missionaries/cannibals on the right shore
- where a + b + c + d + e + f = 6
- a >= b (unless a = 0), c >= d (unless c = 0), and e >= f (unless e = 0)
- Legal operations (moves) are
- 0, 1, or 2 missionaries get into the boat
- 0, 1, or 2 missionaries get out of the boat
- 0, 1, or 2 cannibals get into the boat
- 0, 1, or 2 cannibals get out of the boat
- the boat sails from the left shore to the right shore
- the boat sails from the right shore to the left shore
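Below is a minimal Python sketch (not from the original slides) of solving this problem as a state-space search. For brevity it uses a simplified state (missionaries on the left bank, cannibals on the left bank, boat on the left?) rather than the 6-tuple above, and it assumes the boat carries one or two people per crossing.

```python
from collections import deque

def mc_search():
    """Breadth-first search over missionaries-and-cannibals states.
    A state is (missionaries on left bank, cannibals on left bank, boat on left?)."""
    start, goal = (3, 3, True), (0, 0, False)
    boat_loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]   # (missionaries, cannibals) in the boat

    def safe(m, c):
        # missionaries are never outnumbered on either bank
        return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (m, c, boat_left), path = frontier.popleft()
        if (m, c, boat_left) == goal:
            return path
        for dm, dc in boat_loads:
            sign = -1 if boat_left else 1        # the boat carries people away from its bank
            nm, nc = m + sign * dm, c + sign * dc
            new_state = (nm, nc, not boat_left)
            if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc) and new_state not in seen:
                seen.add(new_state)
                frontier.append((new_state, path + [new_state]))

print(mc_search())   # a shortest sequence of states from (3, 3, True) to (0, 0, False)
```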
5. Relationships
- We often know things about objects (whether physical or abstract)
- These objects have attributes (components, values) and/or relationships with other things
- So, one way to represent knowledge is to enumerate the objects and describe them through their attributes and relationships
- Common forms of such relationship representations are
- semantic networks: a network consists of nodes, which are objects and values, and edges (links/arcs), which are annotated to indicate how the nodes are related
- predicate calculus: predicates are often relationships, and the arguments of the predicates are objects
- frames: in essence, objects (as in object-oriented programming) where attributes are the data members and the values are the specific values stored in those members; in some cases, they are pointers to other objects
6. Representations With Relationships
Here we see the same information being represented using two different representational techniques: a semantic network (above) and predicates (to the left).
7. Another Example: Blocks World
Here we see a real-world situation of three blocks and a predicate calculus representation for expressing this knowledge. We equip our system with rules, such as the rule below, to reason over how to draw conclusions and manipulate this blocks world.
This rule says that if there does not exist a Y that is on X, then X is clear: ∀X (¬∃Y on(Y, X) → clear(X)).
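A minimal Python sketch (not from the slides) of applying this "clear" rule over a handful of on(Y, X) facts; the block names are made up.

```python
# on(Y, X): block Y sits directly on block X (block names are hypothetical)
on_facts = {("a", "b"), ("b", "c")}       # a is on b, b is on c
blocks = {"a", "b", "c"}

def clear(x):
    """clear(X) holds if there does not exist a Y such that on(Y, X)."""
    return all((y, x) not in on_facts for y in blocks)

print([b for b in blocks if clear(b)])    # ['a'] -- only block a has nothing on top of it
```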
8. Semantic Networks
- Collins and Quillian were the first to use semantic networks in AI, storing in the network the objects and their relationships
- their intention was to represent English sentences
- edges would typically be annotated with these descriptors or relations
- isa: class/subclass
- instance: the first object is an instance of the class
- has: contains or has this as a physical property
- can: has the ability to
- made of, color, texture, etc.
A semantic network to represent the sentences "a canary can sing/fly", "a canary is a bird/animal", "a canary is a canary", "a canary has skin".
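A minimal Python sketch (not from the slides) of the canary network; the assumption that a query inherits properties by following isa links up the hierarchy reflects how the Collins and Quillian model is usually described.

```python
# Edges of a tiny semantic network: (node, relation, node/value)
edges = [
    ("canary", "isa", "bird"), ("bird", "isa", "animal"),
    ("canary", "can", "sing"), ("bird", "can", "fly"),
    ("animal", "has", "skin"),
]

def properties(node, relation):
    """Collect values of `relation` for `node`, inheriting up the isa links."""
    found, current = set(), node
    while current is not None:
        found |= {v for (n, r, v) in edges if n == current and r == relation}
        parents = [v for (n, r, v) in edges if n == current and r == "isa"]
        current = parents[0] if parents else None
    return found

print(properties("canary", "can"))   # {'sing', 'fly'} -- 'fly' inherited from bird
print(properties("canary", "has"))   # {'skin'} -- inherited from animal
```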
9. Representing Word Meanings
- Quillian demonstrated how to use the semantic network to represent word meanings
- each word would have one or more networks, with links that attach words to their definition planes
- the word "plant" is represented as three planes, each of which has links to additional word planes
10. Frames
- The semantic network requires a graph representation, which may not be a very efficient use of memory
- Another representation is the frame
- the idea behind a frame was originally that it would represent a "frame of memory", for instance by capturing the objects and their attributes for a given situation or moment in time
- a frame would contain slots, where a slot could contain
- identification information (including whether this frame is a subclass of another frame)
- relationships to other frames
- descriptors of this frame
- procedural information on how to use this frame (code to be executed)
- defaults for slots
- instance information (or an identification of whether the frame represents a class or an instance)
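A minimal Python sketch (not from the slides) of frames as slot dictionaries, loosely based on the hotel-room example on the next slide; the slot names and default values are hypothetical.

```python
# A frame is a named collection of slots; `isa` points to a parent frame,
# and missing slots fall back to the parent's defaults.
frames = {
    "room":       {"isa": None,   "slots": {"has_walls": 4}},
    "hotel_room": {"isa": "room", "slots": {"contains": ["chair", "bed", "phone"]}},
    "bed":        {"isa": None,   "slots": {"contains": ["mattress", "bed_frame"]}},
}

def get_slot(frame_name, slot):
    """Look up a slot, inheriting from parent frames when it is missing."""
    while frame_name is not None:
        frame = frames[frame_name]
        if slot in frame["slots"]:
            return frame["slots"][slot]
        frame_name = frame["isa"]
    return None

print(get_slot("hotel_room", "contains"))   # ['chair', 'bed', 'phone']
print(get_slot("hotel_room", "has_walls"))  # 4, inherited from the generic room frame
```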
11. Frame Example
Here is a partial frame representing a hotel room. The room contains a chair, bed, and phone, where the bed contains a mattress and a bed frame (not shown).
12. Production Systems
- A production system is
- a set of rules (if-then or condition-action statements)
- working memory
- the current state of the problem solving, which includes new pieces of information created by previously applied rules
- an inference engine (the author calls this a recognize-act cycle)
- forward chaining, backward chaining, a combination, or some other form of reasoning such as a sponsor-selector or agenda-driven scheduler
- a conflict resolution strategy
- when it comes to selecting a rule, there may be several applicable rules; which one should we select? the choice may be based on a conflict resolution strategy such as first rule, most specific rule, most salient rule, rule with most actions, random, etc.
13. Chaining
- The idea behind a production system's reasoning is that rules describe steps in the problem-solving space, where a rule might
- be an operation in a game, like a chess move
- translate a piece of input data into an intermediate conclusion
- piece together several intermediate conclusions into a specific conclusion
- translate a goal into substeps
- So a solution using a production system is a collection of rules that are chained together (a minimal forward-chaining sketch follows below)
- forward chaining: reasoning from data to conclusions, where working memory is searched for conditions that match the left-hand sides of the given rules
- backward chaining: reasoning from goals to operations, where an initial goal is unfolded into the steps needed to solve that goal; that is, the process is one of subgoaling
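A minimal Python sketch (not from the slides) of forward chaining: a rule fires when all of its conditions are in working memory, and its conclusion is added until nothing new can be derived. The two rules are made-up examples.

```python
# Each rule: (set of conditions, conclusion). These example rules are hypothetical.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "high_fever"}, "see_doctor"),
]

def forward_chain(working_memory, rules):
    """Repeatedly fire any rule whose conditions all hold, until a fixed point."""
    wm = set(working_memory)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= wm and conclusion not in wm:
                wm.add(conclusion)      # the rule fires, adding its conclusion
                changed = True
    return wm

print(forward_chain({"fever", "cough", "high_fever"}, rules))
# {'fever', 'cough', 'high_fever', 'flu_suspected', 'see_doctor'}
```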
14. Two Example Production Systems
15. Example System: Water Jugs
- Problem: given a 4-gallon jug (X) and a 3-gallon jug (Y), fill X with exactly 2 gallons of water
- assume an infinite amount of water is available
- Rules/operators
- 1. If X = 0 then X = 4 (fill X)
- 2. If Y = 0 then Y = 3 (fill Y)
- 3. If X > 0 then X = 0 (empty X)
- 4. If Y > 0 then Y = 0 (empty Y)
- 5. If X + Y > 3 and X > 0 then X = X - (3 - Y) and Y = 3 (fill Y from X)
- 6. If X + Y > 4 and Y > 0 then X = 4 and Y = Y - (4 - X) (fill X from Y)
- 7. If X + Y < 3 and X > 0 then X = 0 and Y = X + Y (empty X into Y)
- 8. If X + Y < 4 and Y > 0 then X = X + Y and Y = 0 (empty Y into X)
- rule numbers are used on the next slide
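A minimal Python sketch (not from the slides) that encodes the eight operators above as guarded state changes and searches breadth-first for a state in which X holds exactly 2 gallons.

```python
from collections import deque

# The eight operators above, written as (guard, successor) pairs over states (x, y).
rules = [
    (lambda x, y: x == 0,              lambda x, y: (4, y)),            # 1. fill X
    (lambda x, y: y == 0,              lambda x, y: (x, 3)),            # 2. fill Y
    (lambda x, y: x > 0,               lambda x, y: (0, y)),            # 3. empty X
    (lambda x, y: y > 0,               lambda x, y: (x, 0)),            # 4. empty Y
    (lambda x, y: x + y > 3 and x > 0, lambda x, y: (x - (3 - y), 3)),  # 5. fill Y from X
    (lambda x, y: x + y > 4 and y > 0, lambda x, y: (4, y - (4 - x))),  # 6. fill X from Y
    (lambda x, y: x + y < 3 and x > 0, lambda x, y: (0, x + y)),        # 7. empty X into Y
    (lambda x, y: x + y < 4 and y > 0, lambda x, y: (x + y, 0)),        # 8. empty Y into X
]

def solve(start=(0, 0)):
    """Breadth-first search for a state with exactly 2 gallons in the 4-gallon jug."""
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if x == 2:
            return path
        for guard, step in rules:
            if guard(x, y):
                nxt = step(x, y)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [nxt]))

print(solve())   # e.g. [(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0)]
```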
16. Conflict Resolution Strategies
- In a production system, what happens when more than one rule matches?
- a conflict resolution strategy dictates how to select from among multiple matching rules
- Simple conflict resolution strategies include
- random
- first match
- most/least recently matched rule
- rule which has matched for the longest/shortest number of cycles (refractoriness)
- most salient rule (each rule is given a salience before you run the production system)
- More complex resolution strategies might
- select the rule with the most/least number of conditions (specificity/generality)
- or the most/least number of actions (biggest/smallest change to the state)
17. MYCIN
- By the early 1970s, the production system approach was found to be more than adequate for constructing large-scale expert systems
- in 1971, researchers at Stanford began constructing MYCIN, a medical diagnostic system
- it contained a very large rule base
- it used backward chaining
- to deal with the uncertainty of medical knowledge, it introduced certainty factors (sort of like probabilities)
- in 1975, it was tested against medical experts and performed as well as or better than the doctors it was compared to

(defrule 52
  if   (site culture is blood)
       (gram organism is neg)
       (morphology organism is rod)
       (burn patient is serious)
  then .4 (identity organism is pseudomonas))

If the culture was taken from the patient's blood, and the Gram stain of the organism is negative, and the morphology of the organism is rods, and the patient is a serious burn patient, then conclude that the identity of the organism is pseudomonas (with .4 certainty).
18. MYCIN in Operation
- MYCIN's process starts with diagnose-and-treat
- repeat
- identify all rules that can provide the conclusion currently sought
- match right-hand sides (that is, search for rules whose right-hand sides match anything in working memory)
- use conflict resolution to identify a single rule
- fire that rule
- find and remove a piece of knowledge which is no longer needed
- find and modify a piece of knowledge now that more specific information is known
- add a new subgoal (left-hand side conditions that need to be proved)
- until the action "done" is added to working memory
- MYCIN would first identify the illness, possibly ordering more tests to be performed, and then, given the illness, generate a treatment
- MYCIN consisted of about 600 rules
19. R1/XCON
- Another success story is DEC's R1
- later renamed XCON
- This system would take customer orders and configure specific VAX computers for those orders, including
- completing the order if the order was incomplete
- how the various components (drive and tape units, motherboard(s), etc.) would be placed inside the mainframe cabinet
- how the wiring would take place among the various components
- R1 would perform forward chaining over about 10,000 rules
- over a 6-year period, it configured some 80,000 orders with a 95-98% accuracy rating
- ironically, whereas planning/design is viewed as a backward-chaining task, R1 used forward chaining because, in this particular case, the problem is data driven, starting with user input of the computer system's specifications
- R1's solutions were similar in quality to human solutions
20. R1 Sample Rules
- Constraint rules
- if device requires battery then select battery for device
- if select battery for device then pick battery with voltage(battery) = voltage(device)
- Configuration rules
- if we are in the floor-plan stage and there is space for a power supply and there is no power supply available, then add a power supply to the order
- if the step is "configuring: propose alternatives" and there is an unconfigured device and no container was chosen and no other device that can hold it was chosen and selecting a container wasn't proposed yet and no problems for selecting containers were identified, then propose selecting a container
- if the step is distributing a massbus device and there is a single-port disk drive that has not been assigned to a massbus and there are no unassigned dual-port disk drives and the number of devices that each massbus should support is known and there is a massbus that has been assigned at least one disk drive and that should support additional disk drives and the type of cable needed to connect the disk drive is known, then assign the disk drive to this massbus
21. Strong Slot-n-Filler Structures
- To avoid the difficulties with frames and nets, Schank and Rieger offered two network-like representations that would have implied uses and built-in semantics: conceptual dependencies and scripts
- the conceptual dependency (CD) was derived as a form of semantic network that would have specific types of links to be used for representing specific pieces of information in English sentences
- the action of the sentence
- the objects affected by the action or that brought about the action
- modifiers of both actions and objects
- they defined 11 primitive actions, called ACTs
- every possible action can be categorized as one of these 11
- an ACT would form the center of the CD, with links attaching the objects and modifiers
22. Example CD
- The sentence is "John ate the egg"
- The INGEST act means to ingest an object (eat, drink, swallow)
- the P above the double arrow indicates past tense
- the INGEST action must have an object (the O indicates it was the object Egg) and a direction (the object went from John's mouth to John's insides)
- we might infer that it was "an egg" instead of "the egg", as there is nothing specific to indicate which egg was eaten
- we might also infer that John swallowed the egg whole, as there is nothing to indicate that John chewed the egg!
23. The CD Theory ACTs
- Is this list complete?
- what actions are missing?
- Could we reduce this list to make it more concise?
- other researchers have developed other lists of primitive actions, including one with just 3: physical actions, mental actions, and abstract actions
24. Example CD Links
25. Example CDs
26. More Examples
27. Complex Example
- The sentence is "John prevented Mary from giving a book to Bill"
- This sentence has two ACTs, DO and ATRANS
- DO was not in the list of 11, but can be thought of as "caused to happen"
- The c/ means a negative conditional; in this case it means that John caused this not to happen
- The ATRANS is a giving relationship, with the object being a Book and the action being from Mary to Bill: Mary gave a book to Bill
- as with the previous example, there is no way of telling whether it is "a book" or "the book"
28. Scripts
- The other structured representation developed by Schank (along with Abelson) is the script
- a description of the typical actions that are involved in a typical situation
- they defined a script for going to a restaurant
- scripts provide an ability for default reasoning when information is not available that directly states that an action occurred
- so we may assume, unless otherwise stated, that a diner at a restaurant was served food, that the diner paid for the food, and that the diner was served by a waiter/waitress
- A script would contain
- entry condition(s) and results (exit conditions)
- actors (the people involved)
- props (physical items at the location used by the actors)
- scenes (individual events that take place)
- The script would use the 11 ACTs from CD theory
29. Restaurant Script
- The script does not contain atypical actions
- although there are options, such as whether the customer was pleased or not
- There are multiple paths through the scenes to make for a robust script
- what would a "going to the movies" script look like? would it have similar props, actors, scenes? how about "going to class"?
30. Knowledge Groups
- One of the drawbacks of the knowledge representations demonstrated thus far is that all knowledge is grouped into a single, large collection of representations
- the rules taken as a whole, for instance, don't denote which rules should be used in which circumstances
- Another approach is to divide the representations into logical groupings
- this permits easier design, implementation, testing, and debugging, because you know what that particular group is supposed to do and what knowledge should go into it
- it should be noted that by distributing the knowledge, we might use different problem-solving agents for each set of knowledge, so that the knowledge is stored using different representations
31. Knowledge Sources and Agents
- Which leads us to the idea of having multiple problem-solving agents
- each agent is responsible for solving some specialized type of problem(s) and knows where to obtain its own input
- each agent has its own knowledge sources, some internal, some external
- since external agents may have their own forms of representation, the agent must know
- how to find the proper agents
- how to properly communicate with these other agents
- how to interpret the information that it receives from these agents
- how to recover from a situation where the expected agent(s) is/are not available
32. What is an Agent?
- Agents are interactive problem solvers that have these properties
- situated: the agent is part of the problem-solving environment; it can obtain its own input from its environment and it can affect its environment through its output
- autonomous: the agent operates independently of other agents and can control its own actions and internal states
- flexible: the agent is both responsive and proactive; it can go out and find what it needs to solve its problem(s)
- social: the agent can interact with other agents, including humans
- Some researchers also insist that agents have
- mobility: the ability to move from their current environment to a new environment (e.g., migrate to another processor)
- delegation: hand off portions of the problem to other agents
- cooperation: if multiple agents are tasked with the same problem, can their solutions be combined?
33. The Semantic Web
- The WWW is a collection of data and knowledge in an unstructured format
- Humans often can take knowledge from disparate sources and put together a coherent picture; can problem-solving agents?
- Agents on the semantic web all have their own capabilities and know where to look for knowledge
- whether from a static source, or from an agent that can provide the needed information through its own processing, or from a human
- The common approach is to model the knowledge of a web site using an ontology
- ontologies give agents the ability to translate the results of another agent, or the data provided from a website, into a version of knowledge that they can understand and use
34. Knowledge Acquisition and Modeling
- Expert system construction used to be a trial-and-error sort of approach for the knowledge engineers
- once they had knowledge from the experts, they would fill in their knowledge base and test it out
- By the end of the 80s, it was discovered that creating an actual domain model was the way to go: build a model of the knowledge before implementing anything
- A model might be
- a dependency graph of what can cause what to happen
- or an associational model, which is a collection of malfunctions and the manifestations we would expect to see from those malfunctions
- or a functional model, where component parts are enumerated and described by function and behavior
- The emphasis changed to knowledge acquisition tools (KADS)
- domain experts enter their knowledge as a graphical model that contains the component parts of the item being diagnosed/designed, their functions, and rules for deciding how to diagnose or design each one
35. A NASA Example
- Here is a model developed by NASA for the Livingston propulsion system for rockets
- a reactive, self-configuring, autonomous system
- knowledge modeled using propositional calculus (instead of predicate calculus; there are a finite number of elements, so each can be modeled by its own proposition)
Helium is the fuel tank; oxidizer is mixed to cause the fuel to burn. Acc is the accelerometer which, along with sensors in the valves, is used as input to control the system. Pyro valves are used as controls: once they change state, they stay in that state, so they are used to change the flow of fuel when an error is detected, opening or closing a new pathway from tank to engine.
36. Model (Architecture) for the System
- The idea is that the configuration manager tries to keep the spacecraft moving, but at the lowest-cost configuration
- Sensors feed into the ME (mode estimator) to determine if the system is functioning and in the lowest configuration
- If not, the MR (mode reconfiguration) plans a new mode by determining which valves to open and close
- Since this is a spacecraft, the output of the MR is a set of actions that cause valves to open or close directly
The high-level planner generates a sequence of hardware configuration goals, such as the amount of propellant that should be used; it is the configuration manager that must translate these goals into actions.
37. VT: an Elevator Design System
The design of an elevator can be used to generate a diagnostic system for elevator problems or, in VT's case, a system that can design new elevators.
38. Reasoning with Uncertainty
- Representations generally represent knowledge as fact
- However, knowledge, and the use of that knowledge, often brings with it a degree of uncertainty
- how can we represent and reason with uncertainty?
- We find two forms of uncertainty
- unsure input
- unknown: we do not know the answer, so we have to say "unknown"
- unclear: the answer doesn't fit the question (e.g., not "yes" but "80% yes")
- vague data: is a 100-degree temp a high fever or just a fever?
- ambiguous/noisy data: the data may not be easily interpretable
- non-truth-preserving knowledge (most rules are associational, not truth preserving)
- unlike "if you are a man then you are mortal", a doctor might reason from symptoms to diseases
- "all men are mortal" denotes a class/subclass relationship, which is truth preserving
- but the symptom-to-disease reasoning is based on associations and is not guaranteed to be true
39. Certainty Factors
- First used in the MYCIN system; the idea is that we will attribute a measure of belief to any conclusion that we draw
- CF(H|E) = MB(H|E) - MD(H|E)
- the certainty factor for hypothesis H given evidence E is the measure of belief we have for H minus the measure of disbelief we have for H
- CFs are applied to hypotheses that are drawn from rules
- CFs can be combined, as we associate a CF with each condition and each conclusion of each rule
- To use CFs, we need
- to annotate every rule with a CF value (this comes from the expert)
- ways to combine CFs when we use AND, OR, and -> (rule application)
- Combining rules are straightforward
- for AND, use min
- for OR, use max
- for ->, use * (multiplication)
40. CF Example
- Assume we have the following rules
- A -> B (.7)
- A -> C (.4)
- D -> F (.6)
- B AND G -> E (.8)
- C OR F -> H (.5)
- We know A, D, and G are true (so each has a value of 1.0)
- B is .7 (A is 1.0, the rule is true at .7, so B is true at 1.0 * .7 = .7)
- C is .4
- F is .6
- B AND G is min(.7, 1.0) = .7 (G is 1.0, B is .7)
- E is .7 * .8 = .56
- C OR F is max(.4, .6) = .6
- H is .6 * .5 = .30
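A minimal Python sketch (not from the slides) that reproduces the certainty-factor arithmetic above.

```python
# Certainty-factor propagation for the rules above.
cf = {"A": 1.0, "D": 1.0, "G": 1.0}            # the known facts

cf["B"] = cf["A"] * 0.7                         # A -> B (.7)
cf["C"] = cf["A"] * 0.4                         # A -> C (.4)
cf["F"] = cf["D"] * 0.6                         # D -> F (.6)
cf["E"] = min(cf["B"], cf["G"]) * 0.8           # B AND G -> E (.8): AND takes the min
cf["H"] = max(cf["C"], cf["F"]) * 0.5           # C OR F -> H (.5): OR takes the max

print(cf["E"], cf["H"])                         # about .56 and .30 (up to floating-point rounding)
```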
41. Continued
- Another combining rule is needed when we can conclude the same hypothesis from two or more rules
- we already used C OR F -> H (.5) to conclude H with a CF of .30
- let's assume that we also have the rule E -> H (.5)
- since E is .56, we have H at .56 * .5 = .28
- We now believe H at .30 and at .28; which is true?
- the two rules both support H, so we want to draw a stronger conclusion in H, since we have two independent means of support for H
- We will use the formula CF1 + CF2 - CF1 * CF2
- CF(H) = .30 + .28 - .30 * .28 = .496
- our belief in H has been strengthened through two different chains of logic
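A small sketch of the same combination, assuming both certainty factors are positive (the case the slide's formula covers).

```python
def combine_cf(cf1, cf2):
    """Combine two positive certainty factors that support the same hypothesis."""
    return cf1 + cf2 - cf1 * cf2

print(combine_cf(0.30, 0.28))   # about .496 -- stronger than either rule alone
```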
42. Fuzzy Logic
- Prior to CFs, Zadeh introduced fuzzy logic to introduce shades of grey into logic
- other logics are two-valued: true or false only
- Here, any proposition can take on a value in the interval [0, 1]
- Being a logic, Zadeh introduced the algebra to support the logical operators AND, OR, NOT, and ->
- X AND Y = min(X, Y)
- X OR Y = max(X, Y)
- NOT X = 1 - X
- X -> Y = X * Y
- where the values of X and Y are determined by where they fall in the interval [0, 1]
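A minimal Python sketch (not from the slides) of the fuzzy connectives defined above.

```python
# Fuzzy connectives over degrees of truth in [0, 1].
def f_and(x, y):     return min(x, y)
def f_or(x, y):      return max(x, y)
def f_not(x):        return 1 - x
def f_implies(x, y): return x * y

print(f_and(0.3, 0.6), f_or(0.3, 0.6), f_not(0.6), f_implies(0.3, 0.6))
# 0.3 0.6 0.4 0.18 (up to floating-point rounding)
```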
43. Fuzzy Set Theory
- Fuzzy sets are to normal sets what fuzzy logic is to logic
- fuzzy set theory is based on fuzzy values from fuzzy logic, but includes set operations instead of logic operations
- The basis for fuzzy sets is defining a fuzzy membership function for a set
- a fuzzy set is a set of items along with their membership values in the set, where the membership value defines how close that item is to being in that set
- Example: the set "tall" might be denoted as
- tall = {x : f(x)} where f(x) = 1.0 if x > 6'2", .8 if x > 6', .6 if x > 5'10", .4 if x > 5'8", .2 if x > 5'6", 0 otherwise
- so we can say that a person is tall at .8 if they are 6'1", or we can say that the set of tall people is {Anne/.2, Bill/1.0, Chuck/.6, Fred/.8, Sue/.6}
44. Fuzzy Membership Function
- Typically, a membership function is a continuous function (often represented in graph form like the one above)
- given a value y, the membership value for y is u(y), determined by tracing the curve and seeing where it falls on the u(x) axis
- How do we define a membership function?
- this is an open question
45. Using Fuzzy Logic/Sets
- 1. fuzzify the input(s) using fuzzy membership functions
- 2. apply fuzzy logic rules to draw conclusions
- we use the previous rules for AND, OR, NOT, and ->
- 3. if conclusions are supported by multiple rules, combine the conclusions
- like CFs, we need a combining function; this may be done by computing a center of gravity using calculus
- 4. defuzzify the conclusions to get specific conclusions
- defuzzification requires translating a numeric value into an actionable item
- Fuzzy logic is often applied to domains where we can easily derive fuzzy membership functions and have a few rules, but not a lot
- fuzzy logic begins to break down when we have more than a dozen or two rules
46. Example
- We have an atmospheric controller which can increase or decrease the temperature of the air and can increase or decrease the fan, based on these simple rules
- if air is warm and dry, decrease the fan and increase the coolant
- if air is warm and not dry, increase the fan
- if air is hot and dry, increase the fan and increase the coolant slightly
- if air is hot and not dry, increase the fan and the coolant
- if air is cold, turn off the fan and decrease the coolant
- Our input obviously requires the air temperature and the humidity; the membership function for air temperature is shown to the right
If it is 60 degrees, it would be considered cold 0, warm 1, hot 0; if it is 85 degrees, it would be cold 0, warm .3, and hot .7.
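A minimal Python sketch (not from the slides) of the fuzzification step. The actual membership function is only shown as a graph on the original slide, so the breakpoints below are assumptions chosen to reproduce the numbers quoted above (60 degrees gives warm 1; 85 degrees gives warm .3 and hot .7).

```python
def fuzzify_temperature(t):
    """Fuzzify a temperature (degrees F) into cold/warm/hot membership values.
    Breakpoints are hypothetical, picked to match the slide's quoted values."""
    def ramp(x, lo, hi):                 # 0 below lo, 1 above hi, linear in between
        return max(0.0, min(1.0, (x - lo) / (hi - lo)))
    hot = ramp(t, 78, 88)
    cold = 1 - ramp(t, 45, 60)
    warm = min(ramp(t, 45, 60), 1 - hot)
    return {"cold": cold, "warm": warm, "hot": hot}

print(fuzzify_temperature(60))   # roughly {'cold': 0, 'warm': 1, 'hot': 0}
print(fuzzify_temperature(85))   # roughly {'cold': 0, 'warm': .3, 'hot': .7}
```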
47. Continued
- Temperature = 85, humidity indicates dry = .6
- hot = .7, warm = .3, cold = 0, dry = .6, not dry = .4 (not dry = 1 - dry = 1 - .6 = .4)
- Rule 1 has warm and dry
- warm is .3, dry is .6, so warm and dry = min(.3, .6) = .3
- Rule 2 has warm and not dry
- min(.3, .4) = .3
- Rule 3 has hot and dry: min(.7, .3) = .3
- our fourth and fifth rules give us 0, since cold is 0
- Our conclusions from the first three rules are to
- decrease the coolant and increase the fan at levels of .3
- increase the fan at a level of .3
- increase the fan at .3 and increase the coolant slightly
- To combine our results, we might increase the fan by .9 and decrease the coolant (assuming "increase slightly" means increase by 1/4) by .3 - .3/4 = .9/4
- Finally, we defuzzify "decrease by .9/4" and "increase by .9" into actionable amounts
48. Using Fuzzy Logic
- The most common applications for fuzzy logic are controllers
- devices that, based on input, make minor modifications to their settings, for instance
- an air conditioner controller that uses the current temperature, the desired temperature, and the number of open vents to determine how much to turn the blower up or down
- camera aperture control (up/down, focus, negating a shaky hand)
- a subway car's braking and acceleration
- Fuzzy logic has been used for expert systems
- but the systems tend to perform poorly when more than just a few rules are chained together
- in our previous example, we had just 5 stand-alone rules
- when we chain rules, the fuzzy values are multiplied (e.g., .5 from one rule * .3 from another rule * .4 from another rule gives a result of .06)
49. Dempster-Shafer Theory
- The D-S theory goes beyond CFs and fuzzy logic by providing us with two values to indicate the utility of a hypothesis
- belief: as before, like the CF or fuzzy membership value
- plausibility: adds to our belief by determining whether there is any evidence (belief) opposing the hypothesis
- We want to know if h is a reasonable hypothesis
- we have evidence in favor of h giving us a belief of .7
- we have no evidence against h; this would imply that the plausibility is greater than the belief
- pl(h) = 1 - b(¬h) = 1 (since we have no evidence against h, b(¬h) = 0)
- Consider two hypotheses, h1 and h2, where we have no evidence in favor of either, so b(h1) = b(h2) = .5
- we have evidence that suggests ¬h2 is less believable than ¬h1, so that b(¬h2) = .3 and b(¬h1) = .5
- h1 = [.5, .5] and h2 = [.5, .7], so h2 is more believable
50. Computing Multiple Beliefs
- D-S theory gives us a way to compute the belief for any number of subsets of the hypotheses, and to modify the beliefs as new evidence is introduced
- the formula to compute belief (given below) is a bit complex
- so we present an example to better understand it
- but the basic idea is this: we have a belief value for how well some piece of evidence supports a group (subset) of hypotheses
- we introduce new evidence and multiply the belief from the first with the belief in support of the new evidence, for those hypotheses that are in the intersection of the two subsets
- the denominator is used to normalize the computed beliefs, and it is 1 unless the intersection includes some null (empty) subsets
- m3(Z) = [ Σ over X ∩ Y = Z of m1(X) * m2(Y) ] / [ 1 - Σ over X ∩ Y = ∅ of m1(X) * m2(Y) ]
51. Example
- There are four possible hypotheses for a given patient: cold (C), flu (F), migraine (H), and meningitis (M)
- we introduce a piece of evidence, m1 = fever, which supports {C, F, M} at .6
- we also have Q (the entire set) with support 1 - .6 = .4
- now we add the evidence m2 = nausea, which supports {C, F, H} at .7, so that Q = .3
- we combine the two sets of beliefs into m3 as follows
Since m3 has no empty sets, the denominator is 1, so the set of values in m3 is already normalized and we do not have to do anything else.
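A minimal Python sketch (not from the slides) of Dempster's combination rule, applied to m1 and m2 above.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: multiply the masses of intersecting focal sets,
    then normalize by 1 minus the mass that fell on the empty set."""
    combined, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    return {s: v / (1 - conflict) for s, v in combined.items()}

Q = frozenset("CFHM")                       # cold, flu, migraine, meningitis
m1 = {frozenset("CFM"): 0.6, Q: 0.4}        # fever supports {C, F, M} at .6
m2 = {frozenset("CFH"): 0.7, Q: 0.3}        # nausea supports {C, F, H} at .7

m3 = dempster_combine(m1, m2)
for s, v in m3.items():
    print(sorted(s), round(v, 3))
# {C, F}: 0.42   {C, F, M}: 0.18   {C, F, H}: 0.28   Q: 0.12
```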
52. Continued
- When we had m1, we had two sets, {C, F, M} and Q
- When we combined it with m2 (with two sets of its own, {C, F, H} and Q), the result was four sets
- the intersection of {C, F, M} and {C, F, H} = {C, F}
- the intersection of {C, F, M} and Q = {C, F, M}
- the intersection of {C, F, H} and Q = {C, F, H}
- the intersection of Q and Q = Q
- We now add evidence m4, a lab culture result that suggests meningitis, with belief .8
- m4{M} = .8 and m4{Q} = .2
- In adding m4, with its sets {M} and Q, we intersect these with the four intersected sets above, which results in 8 sets
- shown on the next slide, with some empty sets, so our denominator will no longer be 1 and we will have to compute it after computing the numerators
53. End of Example
Sum of the empty-set products = .336 + .224 = .56, so the denominator is 1 - .56 = .44
m5{M} = (.096 + .144) / .44 = .545
m5{C, F, M} = .036 / .44 = .082
m5{C, F} = .084 / .44 = .191
m5{C, F, H} = .056 / .44 = .127
m5{Q} = .024 / .44 = .055
The most plausible single explanation is M (meningitis); the belief mass is spread out because the evidence tends to contradict itself (some evidence indicates meningitis, another symptom indicates no meningitis).
54. Bayesian Probabilities
- Bayes derived the following formula
- p(h|E) = p(E|h) * p(h) / Σ over all i of [p(E|hi) * p(hi)]
- the probability that h is true given evidence E
- p(h|E): conditional probability
- what is the probability that h is true given the evidence E?
- p(E|h): evidential probability
- what is the probability that evidence E will appear if h is true?
- p(h): prior probability (or a priori probability)
- what is the probability that h is true in general, without any evidence?
- the denominator normalizes the conditional probabilities so that they add up to 1
- To solve a problem with Bayesian probabilities
- we need to accumulate the probabilities for all hypotheses h1, h2, h3: p(h1|E), p(h2|E), p(h3|E), ..., p(E|h1), p(E|h2), p(E|h3), ..., and p(h1), p(h2), p(h3), ..., and then it's just a straightforward series of calculations
55. Example
- The sidewalk is wet; we want to determine the most likely cause
- it rained overnight (h1)
- we ran the sprinkler overnight (h2)
- wet sidewalk (E)
- Assume the following
- there was a 50% chance of rain: p(h1) = .5
- the sprinkler is run two nights a week: p(h2) = 2/7 ≈ .28
- p(wet sidewalk | rain overnight) = .8
- p(wet sidewalk | sprinkler) = .9
- Now we compute the two conditional probabilities
- p(h1|E) = (.5 * .8) / (.5 * .8 + .28 * .9) = .61
- p(h2|E) = (.28 * .9) / (.5 * .8 + .28 * .9) = .39
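A minimal Python sketch (not from the slides) of the wet-sidewalk computation.

```python
# Bayes' rule over two competing hypotheses for a wet sidewalk.
priors      = {"rain": 0.5, "sprinkler": 2 / 7}      # p(h)
likelihoods = {"rain": 0.8, "sprinkler": 0.9}         # p(wet sidewalk | h)

evidence = sum(priors[h] * likelihoods[h] for h in priors)      # normalizing denominator
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

print(posteriors)   # roughly {'rain': 0.61, 'sprinkler': 0.39}
```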
56. Independent Events
- There is a flaw with our previous example
- if it is likely that it will rain, we will probably not run the sprinkler even if it is the night we usually run it, and if it does not rain, we will probably be more likely to run the sprinkler the next night
- So we have to be aware of whether events are independent or not
- two events are independent if P(A ∩ B) = P(A) * P(B)
- where ∩ means intersection
- when P(B) ≠ 0, then P(A) = P(A | B)
- knowing B is true does not affect the probability of A being true
- We can also modify our computation by using the formula for conditionally independent events
- P(A ∩ B | C) = P(A | C) * P(B | C)
- again, ∩ is used to mean intersection
- we will expand on this shortly
57. Multiple Pieces of Evidence
- In our wet sidewalk example, E consisted of one piece of evidence, "wet sidewalk"
- what if we have many pieces of evidence?
- Consider a diagnostic case where there are 10 possible symptoms that we might look for to determine whether a patient has a cold (h1), flu (h2), or sinus infection (h3)
- E is some subset of {e1, e2, e3, e4, e5, e6, e7, e8, e9, e10}
- To use Bayes' formula, we need to know
- p(h1), p(h2), p(h3), as well as
- p(e1 | h1), p(e1 | h2), p(e1 | h3)
- p(e2 | h1), p(e2 | h2), p(e2 | h3)
- p(e3 | h1), p(e3 | h2), p(e3 | h3)
58. Continued
- But our patient may have several symptoms
- So we also need
- p(e1, e2 | h1), p(e1, e2 | h2), p(e1, e2 | h3)
- p(e1, e3 | h1), p(e1, e3 | h2), p(e1, e3 | h3)
- p(e2, e3 | h1), p(e2, e3 | h2), p(e2, e3 | h3)
- p(e1, e2, e3 | h1), p(e1, e2, e3 | h2), p(e1, e2, e3 | h3)
- How many different probabilities will we need?
- with 10 pieces of evidence, there are 2^10 = 1024 different combinations for E, so we will need 3 * 1024 = 3072 evidential probabilities (to go along with the 3 prior probabilities, one for each hypothesis)
- imagine if E comprised a set of 50 pieces of evidence instead!
59. Bayesian Net
- We can apply the Bayesian formulas for independent and conditionally dependent events in a network form
- we want to determine the likely cause for seeing orange barrels, flashing lights, and bad traffic on the highway
- two hypotheses: construction and accident (see the figure below)
- notice that T (bad traffic) can be caused by either construction or an accident; orange barrels are only evidence of construction, and flashing lights are only evidence of an accident (although it could also be that a driver has been pulled over)
- construction and accident are not directly related to each other; this will help simplify the problem
60. Dynamic Bayesian Networks
- Cause-effect situations are temporal
- at time i, an event arises and causes an event at time i+1
- the Bayesian belief network is static; it captures a situation at a single point in time
- we need a dynamic network instead
- The dynamic Bayesian network is similar to our previous networks, except that each edge represents not merely a dependency but a temporal change
- when you take the branch from state i to state i+1, you are not only indicating that state i can cause i+1 but also that i was at a time prior to i+1
Here is a state diagram that represents possible utterances of the word "tomato". Each node represents both a sound and a segment of time.