Python logic


1
Tell me, what do you do with witches? Burn! And what do you burn apart from witches? More witches! Shh! Wood! So, why do witches burn? [pause] B--... 'cause they're made of... wood? Good! Heh heh. Oh, yeah. Oh. So, how do we tell whether she is made of wood? ... Does wood sink in water? No. No. No, it floats! It floats! Throw her into the pond! The pond! Throw her into the pond! What also floats in water? Bread! Apples! Uh, very small rocks!
ARTHUR: A duck!
CROWD: Oooh.
BEDEVERE: Exactly. So, logically...
VILLAGER 1: If... she... weighs... the same as a duck,... she's made of wood.
BEDEVERE: And therefore?
VILLAGER 2: A witch!
VILLAGER 1: A witch!
Python logic
2
Problematic scenarios for hill-climbing (figure, including ridges)
Solution(s):
- Random restart hill-climbing
- Do the non-greedy thing with some probability p > 0
- Use simulated annealing

When the state-space landscape has local minima, any search that moves only in the greedy direction cannot be (asymptotically) complete.
Random walk, on the other hand, is asymptotically complete.
Idea: put random walk into greedy hill-climbing.

3
The middle ground between hill-climbing and
systematic search

Hill-climbing has a lot of freedom in deciding which node to expand next, but it is incomplete even for finite search spaces.
- Good for problems that have solutions, but where the solutions are non-uniformly clustered.
Systematic search is complete (because its search tree keeps track of the parts of the space that have been visited).
- Good for problems where solutions may not exist,
  - or where the whole point is to show that there are no solutions (e.g., the propositional entailment problem, to be discussed later),
  - or where the state space is densely connected (making repeated exploration of states a big issue).

Smart idea: try the middle ground between the two?
4
Tabu Search

A variant of hill-climbing search that attempts to reduce the chance of revisiting the same states.
Idea:
- Keep a tabu list of states that have been visited in the past.
- Whenever a node in the local neighborhood is found in the tabu list, remove it from consideration (even if it happens to have the best heuristic value among all neighbors).
Properties:
- As the size of the tabu list grows, hill-climbing will asymptotically become non-redundant (it won't look at the same state twice).
- In practice, a reasonably sized tabu list (say 100 or so) improves the performance of hill-climbing in many problems.

Hill-climbing: O(1) space complexity! But it has no termination or completeness guarantee (because it doesn't know where it has been, it can loop even in finite search spaces).
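The tabu idea can be sketched in Python as follows (a minimal sketch, not a definitive implementation). It assumes a toy 1-D landscape: states are the indices of a hypothetical list `values`, the neighbors of a state are the adjacent indices, and the goal is to maximize `values[i]`. The tabu list is a bounded `collections.deque`, so the oldest states fall off once `tabu_size` is reached; `tabu_search`, `tabu_size` and `max_iters` are illustrative names.

```python
from collections import deque

def tabu_search(values, start, tabu_size=100, max_iters=1000):
    """Maximize values[i]; the neighbors of index i are i-1 and i+1."""
    current = best = start
    tabu = deque([current], maxlen=tabu_size)  # oldest entries fall off automatically
    for _ in range(max_iters):
        neighbors = [n for n in (current - 1, current + 1)
                     if 0 <= n < len(values) and n not in tabu]
        if not neighbors:
            break  # every neighbor is tabu: the search is stuck
        # Move to the best non-tabu neighbor, even if it is worse than the current state.
        current = max(neighbors, key=lambda n: values[n])
        tabu.append(current)
        if values[current] > values[best]:
            best = current
    return best, values[best]

print(tabu_search([0, 1, 2, 3, 2, 1, 2, 5, 9, 4], start=0, tabu_size=5))  # (8, 9)
```

On this landscape, plain hill-climbing from index 0 stops at the local maximum at index 3; because the tabu list rules out stepping back, the search is pushed downhill past it and reaches the global maximum at index 8.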
5
Making Hill-Climbing Asymptotically Complete

Random restart hill-climbing
- Keep some bound B. When you have made more than B moves, reset the search with a new random initial seed and start again.
- Getting a random new seed in an implicit search space is non-trivial! In the 8-puzzle, if you generate a random state by making random moves from the current state, you are still not truly random (as you will continue to be in one of the two components).
Biased random walk: avoid being greedy when choosing the seed for the next iteration.
- With probability p, choose the best child, but with probability (1-p) choose one of the children randomly.
Use simulated annealing
- Similar to the previous idea: the probability p itself is increased asymptotically to one (so you are more likely to tolerate a non-greedy move in the beginning than towards the end).

With the random restart or the biased random walk strategies, we can solve very large problems: million-queen problems within minutes!
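A minimal simulated-annealing sketch, assuming a toy problem (minimizing a function over a bounded range of integers, with neighbors x-1 and x+1) and an assumed geometric cooling schedule; `simulated_annealing` and its parameters are hypothetical. Here the chance of accepting a non-greedy move, exp(-delta/t), falls as the temperature t cools, which plays the role of p rising toward one in the description above.

```python
import math
import random

def simulated_annealing(f, start, lo, hi, t0=10.0, cooling=0.995, steps=2000, seed=0):
    """Minimize f over the integers lo..hi; the neighbors of x are x-1 and x+1."""
    rng = random.Random(seed)
    x = best = start
    t = t0
    for _ in range(steps):
        candidate = min(hi, max(lo, x + rng.choice((-1, 1))))  # random neighbor, clamped
        delta = f(candidate) - f(x)
        # Improving moves are always taken; a worsening (non-greedy) move is taken with
        # probability exp(-delta / t), which shrinks as the temperature t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        if f(x) < f(best):
            best = x
        t *= cooling  # geometric cooling schedule (an assumption)
    return best

print(simulated_annealing(lambda x: (x - 3) ** 2, start=40, lo=-50, hi=50))
```

Near the end of the run the temperature is tiny, so the search behaves almost like pure greedy descent; on this unimodal example it should settle at x = 3.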
6
Ideas for improving convergence:
-- Random restart hill-climbing: after every N iterations, start with a completely random assignment.
-- Probabilistic greedy:
   - with probability p, do what the greedy strategy suggests;
   - with probability (1-p), pick a random variable and change its value randomly;
   - p can increase as the search progresses.
A greedier version of the above (pick both the best var and val):
-- For each variable v, let l(v) be the value that it can take so that the number of conflicts is minimized. Let n(v) be the number of conflicts with this value.
-- Pick the variable v with the lowest n(v) value.
-- Assign it the value l(v).
This one basically searches the 1-neighborhood of the current assignment (where the k-neighborhood is all assignments that differ from the current assignment in at most k variable values).
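The random restart and probabilistic greedy ideas can be combined in a short min-conflicts sketch for n-queens (one queen per row; the variable is a row and its value is a column). This is an assumed illustration, not the greedier variant above: it picks a random conflicted variable rather than the variable with the lowest n(v). `min_conflicts_queens`, `p`, `restart_every` and `max_restarts` are hypothetical names.

```python
import random

def conflicts(cols, row, col):
    """Number of other queens attacking square (row, col); cols[r] is row r's column."""
    return sum(1 for r, c in enumerate(cols)
               if r != row and (c == col or abs(c - col) == abs(r - row)))

def min_conflicts_queens(n, p=0.8, restart_every=200, max_restarts=100, seed=0):
    rng = random.Random(seed)
    for _ in range(max_restarts):
        cols = [rng.randrange(n) for _ in range(n)]  # completely random assignment
        for _ in range(restart_every):
            conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
            if not conflicted:
                return cols
            row = rng.choice(conflicted)
            if rng.random() < p:
                # Greedy step: a value with the fewest conflicts (ties broken randomly).
                scores = [conflicts(cols, row, c) for c in range(n)]
                fewest = min(scores)
                cols[row] = rng.choice([c for c in range(n) if scores[c] == fewest])
            else:
                cols[row] = rng.randrange(n)  # random step
    return None  # gave up after max_restarts restarts

print(min_conflicts_queens(8))
```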
7
Model-checking by Stochastic Hill-climbing
Clauses: 1. (p, s, u) 2. (¬p, q) 3. (¬q, r) 4. (q, ¬s, t) 5. (r, s) 6. (¬s, t) 7. (¬s, u)
Applying the min-conflicts idea to satisfiability:

Start with a model (a random t/f assignment to propositions)
For i = 1 to max_flips do
  If model satisfies clauses then return model
  Else clause = a randomly selected clause from clauses that is false in model
    With probability p, flip whichever symbol in clause maximizes the number of satisfied clauses /greedy step/
    With probability (1-p), flip the value in model of a randomly selected symbol from clause /random step/
Return Failure

Consider the all-false assignment: clauses 1 (p, s, u) and 5 (r, s) are violated.
-- Pick one, say 5 (r, s). If we flip r, clause 1 remains violated; if we flip s, clauses 4, 6 and 7 become violated.
-- So, the greedy thing is to flip r, and we get all false except r. Otherwise, pick either one randomly.
Remarkably good in practice!! So good that people started wondering if there actually are any hard problems out there.
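The flip procedure above (essentially WalkSAT) can be sketched in Python. Clauses are lists of signed integers with p=1, ..., u=6, and a negative number means a negated literal. The literal signs are an assumption: they are reconstructed from the all-false worked example (clauses 1 and 5 are the only ones violated, and flipping s violates 4, 6 and 7). The function names and the default p are illustrative.

```python
import random

# Clauses 1-7 with p=1, q=2, r=3, s=4, t=5, u=6 (negative = negated).
# The signs are reconstructed from the worked all-false example (an assumption).
CLAUSES = [[1, 4, 6], [-1, 2], [-2, 3], [2, -4, 5], [3, 4], [-4, 5], [-4, 6]]

def satisfied(clause, model):
    return any(model[abs(lit)] == (lit > 0) for lit in clause)

def num_satisfied(clauses, model):
    return sum(satisfied(c, model) for c in clauses)

def walksat(clauses, p=0.5, max_flips=10_000, seed=0):
    rng = random.Random(seed)
    symbols = sorted({abs(lit) for c in clauses for lit in c})
    model = {s: rng.random() < 0.5 for s in symbols}  # random t/f assignment
    for _ in range(max_flips):
        false_clauses = [c for c in clauses if not satisfied(c, model)]
        if not false_clauses:
            return model
        clause = rng.choice(false_clauses)
        if rng.random() < p:
            # Greedy step: the symbol whose flip maximizes the number of satisfied clauses.
            def score(sym):
                model[sym] = not model[sym]
                n = num_satisfied(clauses, model)
                model[sym] = not model[sym]  # undo the trial flip
                return n
            sym = max((abs(lit) for lit in clause), key=score)
        else:
            sym = abs(rng.choice(clause))  # random step
        model[sym] = not model[sym]
    return None  # failure

print(walksat(CLAUSES))
```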
8
If most sat problems are easy, then exactly
where are the hard ones?
9
Hardness of 3-SAT as a function of clauses/variables
(Figure: plotted against the clauses/variables ratio, the probability that there is a satisfying assignment and the cost of solving (either by finding a solution or showing there ain't one); the ratio 4.3 is marked.)
10
Phase Transition in SAT
Theoretically, we only know that the phase transition ratio lies between 3.26 and 4.596.
Experimentally, it seems to be close to 4.3. (We also have a proof that 3-SAT has a sharp threshold.)
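A small experiment in the spirit of the phase-transition plot, assuming random 3-SAT instances (three distinct variables per clause, each negated with probability 1/2) and brute-force satisfiability checking. With only 10 variables the transition is blurred, so this sketch can show the trend (mostly satisfiable at low clauses/variables ratios, mostly unsatisfiable at high ones) but not pin down the 4.3 value. All names are hypothetical.

```python
import itertools
import random

def random_3sat(n_vars, n_clauses, rng):
    """Each clause has 3 distinct variables, each negated with probability 1/2."""
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def satisfiable(clauses, n_vars):
    """Brute force over all 2**n_vars assignments (only viable for small n_vars)."""
    for bits in itertools.product((False, True), repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def fraction_satisfiable(n_vars, ratio, trials=30, seed=0):
    rng = random.Random(seed)
    n_clauses = round(ratio * n_vars)
    hits = sum(satisfiable(random_3sat(n_vars, n_clauses, rng), n_vars) for _ in range(trials))
    return hits / trials

for ratio in (2.0, 4.3, 8.0):
    print(ratio, fraction_satisfiable(10, ratio))
```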
11
Progress in nailing the bound.. (just FYI)
http://www.ipam.ucla.edu/publications/ptac2002/ptac2002_dachlioptas_formulas.pdf
12
Beam search for Hill-climbing

Hill-climbing, as described, uses one seed solution that is continually updated.
- Why not use multiple seeds?
Stochastic hill-climbing uses multiple seeds (k seeds, k > 1). In each iteration, the neighborhoods of all k seeds are evaluated. From the neighborhood, k new seeds are selected probabilistically.
- The probability that a seed is selected is proportional to how good it is.
- Not the same as running k hill-climbing searches in parallel.
Stochastic hill-climbing is quite close to the way evolution seems to work, with one difference:
- Define the neighborhood in terms of the combination of pairs of current seeds (sexual reproduction; crossover).
- The probability that a seed from the current generation gets to mate to produce offspring in the next generation is proportional to the seed's goodness.
- To introduce randomness, do mutation over the offspring.
Stochastic beam-search hill-climbing algorithms of this type are called genetic algorithms.
- Genetic algorithms limit the number of matings to keep the number of seeds the same.
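A minimal genetic-algorithm sketch, assuming a toy "OneMax" problem in which a seed is a bit string and its goodness is the number of 1s. Parents are chosen with probability proportional to goodness (plus 1, so every weight is positive), children come from single-point crossover followed by bit-flip mutation, and the population size stays fixed. The problem, parameters and names are all illustrative assumptions.

```python
import random

def fitness(bits):
    return sum(bits)  # toy "OneMax" goodness: the number of 1s

def genetic_algorithm(n_bits=20, pop_size=30, generations=200, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in pop]  # mating chance proportional to goodness
        children = []
        for _ in range(pop_size):  # keep the number of seeds the same
            mom, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]  # mutation
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # remember the best seed seen so far
    return best

best = genetic_algorithm()
print(best, fitness(best))
```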

13
Illustration of Genetic Algorithms in Action
Very careful modeling is needed so that the things emerging from crossover and mutation are still potential seeds (and not monkeys typing Hamlet). Is the genetic metaphor really buying anything?
14
Hill-climbing in continuous search spaces
Example: cube root finding using Newton-Raphson approximation

Gradient descent (that you study in calculus of variations) is a special case of hill-climbing search applied to continuous search spaces.
- The local neighborhood is defined in terms of the gradient or derivative of the error function.
- Since the error function gradient will be zero near the minimum, and higher farther from it, you tend to take smaller steps near the minimum and larger steps farther away from it, just as you would want.
Gradient descent is guaranteed to converge to the global minimum if alpha (the step size) is small and the error function is unimodal (i.e., has only one minimum).
- Versions of gradient-descent algorithms will be used in neural network learning.
- Unfortunately, the error function is NOT unimodal for multi-layer neural networks. So, you will have to augment gradient descent with ideas such as simulated annealing to increase the chance of reaching the global minimum.

(Figure: minimizing the error $Err = x^3 - a$, whose root is $a^{1/3}$, starting from $x_0$.)
Tons of variations based on how alpha is set
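Gradient descent for the cube-root example, as a minimal sketch: it assumes we minimize the squared error E(x) = (x^3 - a)^2 (the slide only names Err = x^3 - a), whose gradient is 6x^2(x^3 - a), with a fixed step size alpha. `cube_root_gd` and its defaults are hypothetical.

```python
def cube_root_gd(a, x0=1.0, alpha=0.001, iters=1000):
    """Gradient descent on E(x) = (x**3 - a)**2, whose minimum is at x = a**(1/3)."""
    x = x0
    for _ in range(iters):
        grad = 6 * x**2 * (x**3 - a)  # dE/dx
        x -= alpha * grad             # step against the gradient
    return x

print(cube_root_gd(27.0))
```

The step size matters: for a = 27, alpha = 0.001 converges (with a damped oscillation around 3), while a much larger alpha such as 0.01 overshoots and diverges from this starting point, which is one reason for the many variations in how alpha is set.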
15
Origins of gradient descent: Newton-Raphson applied to function minimization

The Newton-Raphson method is used for finding roots of a polynomial.
- To find roots of g(x), we start with some value of x and repeatedly do
  $x \leftarrow x - \frac{g(x)}{g'(x)}$
To minimize a function f(x), we need to find the roots of the equation f'(x) = 0:
  $x \leftarrow x - \frac{f'(x)}{f''(x)}$
If x is a vector, then
  $\mathbf{x} \leftarrow \mathbf{x} - [H_f(\mathbf{x})]^{-1} \nabla f(\mathbf{x})$

Because the Hessian is costly to compute (it will have $n^2$ second-derivative entries for an n-dimensional vector), we try approximations.
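A minimal Newton-Raphson sketch for the cube-root example, assuming g(x) = x^3 - a with g'(x) = 3x^2. The same helper also does minimization when given f' and f'' in place of g and g'; the quadratic f(x) = (x - 2)^2 + 1 is an illustrative choice, and `newton_root` is a hypothetical name.

```python
def newton_root(g, dg, x0, iters=50, tol=1e-12):
    """Newton-Raphson: repeatedly do x <- x - g(x)/g'(x) until the step is tiny."""
    x = x0
    for _ in range(iters):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

a = 27.0
cube_root = newton_root(lambda x: x**3 - a, lambda x: 3 * x**2, x0=1.0)

# Minimization: find a root of f'(x) = 0 using f'', here for f(x) = (x - 2)**2 + 1.
x_min = newton_root(lambda x: 2 * (x - 2), lambda x: 2.0, x0=10.0)
print(cube_root, x_min)
```

For a quadratic f, a single Newton step lands exactly on the minimum (x_min is 2.0 here), while the cube root converges within a handful of iterations from x0 = 1.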
16
Between Hill-climbing and systematic search

You can reduce the freedom of hill-climbing search to make it more complete:
- Tabu search
You can increase the freedom of systematic search to make it more flexible in following local gradients:
- Random restart search

18
Random restart search

Variant of depth-first search where
- When a node is expanded, its children are first randomly permuted before being introduced into the open list. The permutation may well be a biased random permutation.
- Search is restarted from scratch anytime a cutoff parameter is exceeded. There is a cutoff (which may be in terms of # of backtracks, # of nodes expanded, or amount of time elapsed).

Because of the random permutation, every time the search is restarted, you are likely to follow different paths through the search tree. This allows you to recover from bad initial moves.
The higher the cutoff value, the fewer the restarts (and thus the lower the freedom to explore different paths).
- When the cutoff is infinity, random restart search is just normal depth-first search; it will be systematic and complete.
- For smaller values of the cutoff, the search has higher freedom, but no guarantee of completeness.
A strategy to guarantee asymptotic completeness: start with a low cutoff value, but keep increasing it as time goes on.
Random restart search has been shown to be very good for problems that have a reasonable percentage of easy-to-find solutions (such problems are said to exhibit the heavy-tail phenomenon). Many real-world problems have this property.
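A sketch of random restart search on n-queens (an assumed example problem), in Python. Children are randomly permuted before expansion, each run aborts once it exceeds a cutoff on the number of backtracks, and the cutoff doubles after every restart, which is one way to realize the strategy of increasing the cutoff over time. `randomized_dfs`, `random_restart_search`, `CutoffExceeded` and the doubling rule are hypothetical choices.

```python
import random

class CutoffExceeded(Exception):
    """Raised when one randomized DFS run uses up its backtrack budget."""

def randomized_dfs(n, rng, cutoff):
    cols = []  # cols[r] = column of the queen placed in row r
    backtracks = 0

    def safe(col):
        row = len(cols)
        return all(c != col and abs(c - col) != row - r for r, c in enumerate(cols))

    def extend():
        nonlocal backtracks
        if len(cols) == n:
            return True
        children = list(range(n))
        rng.shuffle(children)  # randomly permute the children before expanding them
        for col in children:
            if safe(col):
                cols.append(col)
                if extend():
                    return True
                cols.pop()
                backtracks += 1
                if backtracks > cutoff:
                    raise CutoffExceeded
        return False

    return list(cols) if extend() else None

def random_restart_search(n, cutoff=1, growth=2, seed=0):
    rng = random.Random(seed)
    while True:
        try:
            return randomized_dfs(n, rng, cutoff)  # a solution, or None if none exists
        except CutoffExceeded:
            cutoff *= growth  # restart from scratch with a larger cutoff

print(random_restart_search(8))
```

Once the cutoff exceeds the number of backtracks a full depth-first search needs, a run can no longer be cut off, so the search returns a solution, or None if none exists.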

21
Representation
Reasoning
23
(Figure: Prop logic, FOPC, Prob prop logic and Prob FOPC placed along two axes. Ontological commitment: facts; facts, objects, relations. Epistemological commitment: assertions t/f; t/f/u; degree of belief.)
24
Think of a sentence as the stand-in for a set of
worlds (where it is true)
32
Proof by model checking
KB ⊨ α
(Figure: truth table enumerating the models of KB and α)
So, to check if KB entails α, negate α, add it to the KB, and try to show that the resultant (propositional) theory has no solutions (this requires systematic methods).
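Model checking by enumeration can be sketched in a few lines of Python, assuming the KB and the query are given as Python predicates over a model (a dict from proposition names to booleans); `entails` is a hypothetical helper. It returns False as soon as it finds a model of KB ∧ ¬α, which is exactly the check described above. The example KB is the WMD case-analysis KB used in these slides.

```python
import itertools

def entails(kb, alpha, symbols):
    """KB |= alpha iff no model makes KB true and alpha false."""
    for values in itertools.product((True, False), repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False  # this model satisfies KB and not-alpha: a counterexample
    return True

# W: "WMDs are found", J: "the war is justified"; KB = (W => J) and (not W => J)
kb = lambda m: ((not m["W"]) or m["J"]) and (m["W"] or m["J"])
print(entails(kb, lambda m: m["J"], ["W", "J"]))  # True
print(entails(kb, lambda m: m["W"], ["W", "J"]))  # False
```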
33
Connection between Entailment and Satisfiability

The Boolean Satisfiability problem is closely
connected to Propositional entailment
Specifically, propositional entailment is the conjugate problem of Boolean satisfiability (since we have to show that KB ∧ ¬f has no satisfying model to show that KB ⊨ f).
Of late, our ability to solve very large scale
satisfiability problems has increased quite
significantly

34
Entailment Satisfiability

SAT (boolean satisfiability) problem
Given a set of propositions
And a set of (CNF) clauses
Find a model (an assignment of t/f values to
propositions) that satisfies all clauses
k-SAT is a SAT problem where all clauses are of length less than or equal to k.
- SAT is NP-complete.
- 1-SAT and 2-SAT are polynomial.
- k-SAT for k > 2 is NP-complete (so 3-SAT is the smallest k-SAT that is NP-complete).
If we have a procedure for solving SAT problems, we can use it to compute entailment.
- The sentence S is entailed if and only if the negation of S, when added to the KB, gives a SAT theory that is unsatisfiable (NO MODEL).
- This makes entailment co-NP-complete.
SAT is useful for modeling many other assignment problems.
- We will see the use of SAT for planning; it can also be used for graph coloring, n-queens, scheduling, circuit verification, etc. (the last makes SAT VERY interesting for Electrical Engineering folks).
Our ability to solve very large-scale SAT problems has increased quite phenomenally in recent years.
- We can solve SAT instances with millions of variables and clauses very easily.
To use this technology for inference, we will have to consider systematic SAT solvers.

35
Davis-Putnam-Logemann-Loveland Procedure
(Figure: the DPLL procedure; annotation: detect failure)
36
DPLL Example
Pick p; set p = true
Unit propagation:
- (p, s, u) satisfied (remove)
- ¬p: (¬p, q) → q derived; set q = T
- (¬p, q) satisfied (remove)
- (q, ¬s, t) satisfied (remove)
- ¬q: (¬q, r) → r derived; set r = T
- (¬q, r) satisfied (remove)
- (r, s) satisfied (remove)
Pure literal elimination: in all the remaining clauses, s occurs negative, so set ¬s = True (i.e., s = False).
At this point all clauses are satisfied. Return p = T, q = T, r = T, s = False.
Clauses: (p, s, u) (¬p, q) (¬q, r) (q, ¬s, t) (r, s) (¬s, t) (¬s, u)
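A compact DPLL sketch in Python with unit propagation, pure literal elimination and branching, run on the example clauses. Clauses are lists of signed integers (p=1, ..., u=6; negative means negated); the literal signs are an assumption reconstructed from the worked example. The order in which this sketch picks units, pure literals and branching variables differs from the trace above, so it may return a different (but still satisfying) partial model. The function names are hypothetical.

```python
# Clauses as lists of signed integers: p=1, q=2, r=3, s=4, t=5, u=6 (negative = negated).
CLAUSES = [[1, 4, 6], [-1, 2], [-2, 3], [2, -4, 5], [3, 4], [-4, 5], [-4, 6]]

def simplify(clauses, lit):
    """Set lit to true: drop clauses it satisfies and delete its negation from the rest."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses, model=None):
    model = dict(model or {})
    while True:
        if any(not c for c in clauses):
            return None  # an empty clause: detect failure
        units = [c[0] for c in clauses if len(c) == 1]
        if units:
            lit = units[0]  # unit propagation
        else:
            lits = {l for c in clauses for l in c}
            pure = [l for l in lits if -l not in lits]
            if not pure:
                break
            lit = pure[0]  # pure literal elimination
        model[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
    if not clauses:
        return model  # all clauses satisfied
    var = abs(clauses[0][0])  # branch: try var = True, then var = False
    for lit in (var, -var):
        result = dpll(simplify(clauses, lit), {**model, var: lit > 0})
        if result is not None:
            return result
    return None

print(dpll(CLAUSES))
```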
37
Lots of work in SAT solvers

DPLL was the first (early 60s)
Circa 1994 came GSAT (hill-climbing search for SAT)
Circa 1997 came SATZ
Circa 1998-99 came RelSAT
2000 came CHAFF
Current best can be found at http://www.satlive.org/SATCompetition/2003/results.html

38
Inference rules
KB true but theorem not true?

Sound (but incomplete)
- Modus Ponens: A ⇒ B, A ⊢ B
- Modus Tollens: A ⇒ B, ¬B ⊢ ¬A
- Abduction (??): A ⇒ B, B ⊢ A
- Chaining: A ⇒ B, B ⇒ C ⊢ A ⇒ C

Complete (but unsound)
- Python logic

A  B  A⇒B  KB  ¬A
T  T  T    F   F
T  F  F    F   F
F  T  T    F   T
F  F  T    T   T

How about SOUND + COMPLETE? -- Resolution (needs normal forms)
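Soundness of each rule can be checked by truth tables, as in the table above: a rule is sound if its conclusion is true in every model where all its premises are true. A minimal Python sketch, with each formula written as a predicate over (a, b, c); `is_sound` and `RULES` are hypothetical names.

```python
import itertools

def implies(x, y):
    return (not x) or y

def is_sound(premises, conclusion):
    """Sound: the conclusion holds in every model (a, b, c) in which all premises hold."""
    for a, b, c in itertools.product((True, False), repeat=3):
        if all(p(a, b, c) for p in premises) and not conclusion(a, b, c):
            return False
    return True

RULES = {
    "modus ponens": ([lambda a, b, c: implies(a, b), lambda a, b, c: a],
                     lambda a, b, c: b),
    "modus tollens": ([lambda a, b, c: implies(a, b), lambda a, b, c: not b],
                      lambda a, b, c: not a),
    "abduction": ([lambda a, b, c: implies(a, b), lambda a, b, c: b],
                  lambda a, b, c: a),
    "chaining": ([lambda a, b, c: implies(a, b), lambda a, b, c: implies(b, c)],
                 lambda a, b, c: implies(a, c)),
}
results = {name: is_sound(prem, concl) for name, (prem, concl) in RULES.items()}
print(results)
# {'modus ponens': True, 'modus tollens': True, 'abduction': False, 'chaining': True}
```

Abduction fails on the model A = false, B = true: both premises hold there, but the conclusion A does not.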
39
Need something that does case analysis
If WMDs are found, the war is justified: W ⇒ J
If WMDs are not found, the war is still justified: ¬W ⇒ J
Is the war justified anyway? J?
Can Modus Ponens derive it?
41
Modus ponens, Modus Tollens etc are special
cases of resolution!
- Forward: apply resolution steps until the fact f you want to prove appears as a resolvent.
- Backward (Resolution Refutation): add the negation of the fact f you want to derive to the KB, then apply resolution steps until you derive an empty clause.
42
If WMDs are found, the war is justified: ¬W ∨ J
If WMDs are not found, the war is still justified: W ∨ J
Is the war justified anyway? J?
43
Resolution does case analysis
If WMDs are found, the war is justified: ¬W ∨ J
If WMDs are not found, the war is still justified: W ∨ J
Either WMDs are found or they are not found: W ∨ ¬W
Is the war justified anyway? J?
44
Conjunctive normal form (CNF): aka the product of sums form (from CSE/EEE 120)
Disjunctive normal form (DNF): aka the sum of products form
Prolog without variables and without the cut operator is doing Horn-clause theorem proving.
For any KB in Horn form, modus ponens is a sound and complete inference.
45
Conversion to CNF form
ANY propositional logic sentence can be converted into CNF form.
- Try: (P ∧ Q) ⇒ (R ∨ W)

CNF clause: disjunction of literals
- Literal: a proposition or a negated proposition
Conversion:
- Remove implication
- Pull negation in (use De Morgan's laws)
- Distribute disjunction over conjunction
- Separate conjunctions into clauses
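The conversion steps can be sketched in Python, assuming formulas are nested tuples such as ("=>", a, b), ("and", a, b), ("or", a, b) and ("not", a), with atoms as strings and negated literals written "~P" in the output. This representation and the function names are illustrative assumptions; note that the distribution step can blow up exponentially on large formulas.

```python
def remove_implications(f):
    if isinstance(f, str):
        return f
    if f[0] == "=>":
        return ("or", ("not", remove_implications(f[1])), remove_implications(f[2]))
    return (f[0],) + tuple(remove_implications(g) for g in f[1:])

def push_negation(f):
    """Pull negations inward with De Morgan's laws and double-negation elimination."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        g = f[1]
        if isinstance(g, str):
            return f  # already a literal
        if g[0] == "not":
            return push_negation(g[1])
        flipped = "or" if g[0] == "and" else "and"
        return (flipped, push_negation(("not", g[1])), push_negation(("not", g[2])))
    return (f[0], push_negation(f[1]), push_negation(f[2]))

def distribute(f):
    """Distribute OR over AND; f must already have negations pushed in."""
    if isinstance(f, str) or f[0] == "not":
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if f[0] == "and":
        return ("and", a, b)
    if not isinstance(a, str) and a[0] == "and":
        return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
    if not isinstance(b, str) and b[0] == "and":
        return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return ("or", a, b)

def literals(f):
    if isinstance(f, str):
        return {f}
    if f[0] == "not":
        return {"~" + f[1]}
    return literals(f[1]) | literals(f[2])

def clauses(f):
    """Separate the conjunction into clauses, each a frozenset of literals."""
    if not isinstance(f, str) and f[0] == "and":
        return clauses(f[1]) | clauses(f[2])
    return {frozenset(literals(f))}

def to_cnf(f):
    return clauses(distribute(push_negation(remove_implications(f))))

print(to_cnf(("=>", ("and", "P", "Q"), ("or", "R", "W"))))  # one clause: ~P, ~Q, R, W
```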

46
Need for resolution
Resolution does case analysis
Yankees win, it is Destiny: ¬Y ∨ D
Dbacks win, it is Destiny: ¬Db ∨ D
Yankees or Dbacks win: Y ∨ Db
Is it Destiny either way? D?
Can Modus Ponens derive it? Not until Sunday, when Db won.
47
Solving problems using propositional logic

Need to write what you know as propositional
formulas
Theorem proving will then tell you whether a
given new sentence will hold given what you know
Three kinds of queries:
- Is my knowledge base consistent? (i.e., is there at least one world where everything I know is true?) Satisfiability
- Is the sentence S entailed by my knowledge base? (i.e., is it true in every world where my knowledge base is true?)
- Is the sentence S consistent/possibly true with my knowledge base? (i.e., is S true in at least one of the worlds where my knowledge base holds?)
  - S is consistent if ¬S is not entailed.
But this cannot differentiate between degrees of likelihood among possible sentences.

48
Steps in Resolution Refutation
Is there search in inference? Yes!! Many possible inferences can be done; only a few are actually relevant.
-- Idea: Set of Support: at least one of the resolved clauses is a goal clause, or a descendant of a clause derived from a goal clause.
-- Used in the example here!!

Consider the following problem:
If the grass is wet, then it is either raining or the sprinkler is on
  GW ⇒ R ∨ SP        ¬GW ∨ R ∨ SP
If it is raining, then Timmy is happy
  R ⇒ TH             ¬R ∨ TH
If the sprinklers are on, Timmy is happy
  SP ⇒ TH            ¬SP ∨ TH
If Timmy is happy, then he sings
  TH ⇒ SG            ¬TH ∨ SG
Timmy is not singing
  ¬SG                ¬SG
Prove that the grass is not wet
  ¬GW?               GW (the negated goal, added to the KB)

49
Search in Resolution

Convert the database into clausal form Dc.
Negate the goal first, and then convert it into clausal form DG.
Let D = Dc ∪ DG.
Loop:
  Select a pair of clauses C1 and C2 from D.
    Different control strategies can be used to select C1 and C2 to reduce the number of resolutions tried:
    - Idea 1: Set of Support: at least one of C1 or C2 must be either the goal clause or a clause derived by doing resolutions on the goal clause (COMPLETE).
    - Idea 2: Linear input form: at least one of C1 or C2 must be one of the clauses in the input KB (INCOMPLETE).
  Resolve C1 and C2 to get C12.
  If C12 is the empty clause, QED!! Return Success (we proved the theorem).
  D = D ∪ {C12}
End loop
If we come here, we couldn't get the empty clause. Return Failure.

Finiteness is guaranteed if we make sure that
- we never resolve the same pair of clauses more than once, AND
- we use factoring, which removes multiple copies of literals from a clause (e.g., Q ∨ P ∨ P ⇒ Q ∨ P).
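The resolution loop can be sketched in Python with the set-of-support restriction, on the Timmy example. Clauses are frozensets of literal strings ("~R" for ¬R), so duplicate literals merge automatically (factoring), and a `tried` set ensures no pair of clauses is resolved twice. This is a minimal sketch with hypothetical names, not an efficient prover.

```python
from itertools import product

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All resolvents of two clauses; frozensets merge duplicate literals (factoring)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refute(kb, negated_goal):
    """Resolution refutation under set of support: every resolution uses at least
    one clause from the negated goal or its descendants."""
    kb, support = set(kb), set(negated_goal)
    tried = set()  # never resolve the same pair of clauses twice
    while True:
        new = set()
        for c1, c2 in product(list(support), list(kb | support)):
            if (c1, c2) in tried:
                continue
            tried.add((c1, c2))
            for r in resolvents(c1, c2):
                if not r:
                    return True  # empty clause derived: QED
                new.add(r)
        new -= kb | support
        if not new:
            return False  # nothing new can be derived: Failure
        support |= new

KB = [frozenset(c) for c in ({"~GW", "R", "SP"}, {"~R", "TH"}, {"~SP", "TH"},
                             {"~TH", "SG"}, {"~SG"})]
print(refute(KB, [frozenset({"GW"})]))  # True: the grass is not wet
```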

50
Mad chase for empty clause

You must have everything in CNF clauses before you can resolve.
- The goal must be negated first, before it is converted into CNF form.
- The goal (the fact to be proved) may become converted to multiple clauses (e.g., if we want to prove P ∨ Q, then we get two clauses, ¬P and ¬Q, to add to the database).
Resolution works by resolving away a single literal and its negation.
- P ∨ Q resolved with ¬P ∨ ¬Q is not empty!
- In fact, these clauses are not inconsistent (P true and Q false will make sure that both clauses are satisfied).
- ¬P ∨ ¬Q is the negation of P ∧ Q. The latter will become two separate clauses: P and Q. So, by doing two separate resolutions with these two clauses, we can derive the empty clause.

51
Complexity of Propositional Inference

Any sound and complete inference procedure has to be co-NP-complete (since model-theoretic entailment computation is co-NP-complete, since model-theoretic satisfiability is NP-complete).
Given a propositional database of size d:
- Any sentence S that follows from the database by modus ponens can be derived in linear time.
  - If the database has only HORN sentences (sentences whose CNF form has at most one +ve literal, e.g., A ∧ B ⇒ C), then MP is complete for that database.
  - PROLOG uses (first-order) Horn sentences.
- Deriving all sentences that follow by resolution is co-NP-complete (exponential).
- Anything that follows by unit resolution can be derived in linear time.
  - Unit resolution: at least one of the clauses should be a clause of length 1.
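Horn-clause inference by modus ponens can be done in time linear in the size of the KB with the counting (forward-chaining) scheme sketched below: each rule keeps a count of premises not yet proved and fires when the count reaches zero. The rule format and names are assumptions; the example rules are the burglar-alarm implications from the Pearl example, with identifiers such as `EarthQuake` chosen for illustration.

```python
from collections import deque

def forward_chain(rules, facts, query):
    """Horn-clause forward chaining. rules: list of (premises, conclusion) pairs."""
    count = [len(premises) for premises, _ in rules]  # premises not yet proved
    uses = {}  # atom -> indices of the rules that have it as a premise
    for i, (premises, _) in enumerate(rules):
        for p in premises:
            uses.setdefault(p, []).append(i)
    inferred = set()
    agenda = deque(facts)
    while agenda:
        p = agenda.popleft()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i in uses.get(p, []):
            count[i] -= 1
            if count[i] == 0:  # all premises proved: modus ponens fires the rule
                agenda.append(rules[i][1])
    return False

ALARM_RULES = [(["Burglary"], "Alarm"), (["EarthQuake"], "Alarm"),
               (["Alarm"], "JohnCalls"), (["Alarm"], "MaryCalls")]
print(forward_chain(ALARM_RULES, ["Burglary"], "MaryCalls"))  # True
```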

52
Example

Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John and Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.

Burglary ⇒ Alarm
Earth-Quake ⇒ Alarm
Alarm ⇒ John-calls
Alarm ⇒ Mary-calls
If there is a burglary, will Mary call?
- Check KB ∧ B ⊨ M
If Mary didn't call, is it possible that Burglary occurred?
- Check KB ∧ ¬M doesn't entail ¬B

53
Example (Real)

Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John and Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.
Pearl lives in the real world, where (1) burglars can sometimes disable alarms, (2) some earthquakes may be too slight to cause the alarm, (3) even in Los Angeles, burglaries are more likely than earthquakes, (4) John and Mary both have their own lives and may not always call when the alarm goes off, (5) between John and Mary, John is more of a slacker than Mary, and (6) John and Mary may call even without the alarm going off.

Burglary ⇒ Alarm
Earth-Quake ⇒ Alarm
Alarm ⇒ John-calls
Alarm ⇒ Mary-calls
If there is a burglary, will Mary call?
- Check KB ∧ B ⊨ M
If Mary didn't call, is it possible that Burglary occurred?
- Check KB ∧ ¬M doesn't entail ¬B
John already called. If Mary also calls, is it more likely that Burglary occurred?
You now also hear on the TV that there was an earthquake. Is Burglary more or less likely now?

55
How do we handle Real Pearl?

Eager way
- Model everything!
- E.g., model exactly the conditions under which John will call: he shouldn't be listening to loud music, he hasn't gone on an errand, he didn't recently have a tiff with Pearl, etc. etc.
  A ∧ c1 ∧ c2 ∧ c3 ∧ ... ∧ cn ⇒ J
  (also, the exceptions may have interactions: c1 ∧ c5 ⇒ c9)
- Qualification and Ramification problems make this an infeasible enterprise.

Ignorant (non-omniscient) and Lazy (non-omnipotent) way
- Model the likelihood: in 85% of the worlds where there was an alarm, John will actually call.
- How do we do this?
  - Non-monotonic logics
  - Certainty factors
  - Probability theory?
56
Probabilistic Calculus to the Rescue

Suppose we know the likelihood of each of the (propositional) worlds (aka the Joint Probability Distribution).
- Then we can use standard rules of probability to compute the likelihood of all queries (as I will remind you).
- So, the Joint Probability Distribution is all that you ever need!
In the case of the Pearl example, we just need the joint probability distribution over B, E, A, J, M (32 numbers).
-- In general, 2^n separate numbers (which should add up to 1).

Only 10 (instead of 32) numbers to specify!

If the joint distribution is sufficient for reasoning, what is domain knowledge supposed to help us with?
-- Answer: indirectly, by helping us specify the joint probability distribution with fewer than 2^n numbers.
--- The local relations between propositions can be seen as constraining the form the joint probability distribution can take!
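The point that 10 local numbers can determine all 32 joint entries can be sketched in Python. The structure assumed here (B and E independent; A depends on B and E; J and M depend only on A) matches the implications in the Pearl example, and the numbers are assumed, illustrative values, except P(J|A) = 0.85, which follows the 85% figure for John given earlier. The sketch builds the joint distribution and answers queries by summing its entries; `bern`, `prob` and the parameter names are hypothetical.

```python
import itertools

# Assumed, illustrative parameters: 10 numbers in all.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.85, False: 0.05}  # P(John calls | Alarm = key)
P_M = {True: 0.70, False: 0.01}  # P(Mary calls | Alarm = key)

def bern(p_true, value):
    return p_true if value else 1 - p_true

# All 32 entries of the joint P(B, E, A, J, M), built from the 10 local numbers.
joint = {
    (b, e, a, j, m): bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                     * bern(P_J[a], j) * bern(P_M[a], m)
    for b, e, a, j, m in itertools.product((True, False), repeat=5)
}

def prob(query, evidence):
    """P(query | evidence) by summing joint entries; a world is a tuple (b, e, a, j, m)."""
    num = sum(p for w, p in joint.items() if evidence(w) and query(w))
    den = sum(p for w, p in joint.items() if evidence(w))
    return num / den

burglary = lambda w: w[0]
p_jm = prob(burglary, lambda w: w[3] and w[4])            # John and Mary both call
p_jme = prob(burglary, lambda w: w[3] and w[4] and w[1])  # ... and there was an earthquake
print(len(joint), round(sum(joint.values()), 10), round(p_jm, 2), round(p_jme, 3))
# 32 1.0 0.28 0.003
```

With these assumed numbers, P(Burglary | John and Mary call) comes out at about 0.28, and adding the earthquake evidence drops it to about 0.003: the earthquake explains the alarm away.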

57
If B ⇒ A, then P(A|B) = ? P(B|A) = ? P(B|¬A) = ?
