Title: Python logic
1Tell me what you do with witches? Burn And what
do you burn apart from witches? More witches!
Shh! Wood! So, why do witches burn? pause
B--... 'cause they're made of... wood? Good!
Heh heh. Oh, yeah. Oh. So, how do we tell
whether she is made of wood? . Does wood sink
in water? No. No. No, it floats! It floats!
Throw her into the pond! The pond! Throw her
into the pond! What also floats in water?
Bread! Apples! Uh, very small rocks! ARTHUR
A duck! CROWD Oooh. BEDEVERE Exactly. So,
logically... VILLAGER 1 If... she... weighs...
the same as a duck,... she's made of
wood. BEDEVERE And therefore? VILLAGER 2 A
witch! VILLAGER 1 A witch!
Python logic
2Problematic scenarios for hill-climbing
Ridges
Solution(s):
- Random restart hill-climbing
- Do the non-greedy thing with some probability p > 0
- Use simulated annealing
- When the state-space landscape has local minima, any search that moves only in the greedy direction cannot be (asymptotically) complete
- Random walk, on the other hand, is asymptotically complete
- Idea: Put random walk into greedy hill-climbing
3The middle ground between hill-climbing and
systematic search
- Hill-climbing has a lot of freedom in deciding which node to expand next. But it is incomplete even for finite search spaces.
  - Good for problems which have solutions, but the solutions are non-uniformly clustered.
- Systematic search is complete (because its search tree keeps track of the parts of the space that have been visited).
  - Good for problems where solutions may not exist,
  - or the whole point is to show that there are no solutions (e.g. the propositional entailment problem to be discussed later),
  - or the state-space is densely connected (making repeated exploration of states a big issue).
Smart idea: Try the middle ground between the two?
4Tabu Search
- A variant of hill-climbing search that attempts to reduce the chance of revisiting the same states
- Idea
  - Keep a Tabu list of states that have been visited in the past.
  - Whenever a node in the local neighborhood is found in the tabu list, remove it from consideration (even if it happens to have the best heuristic value among all neighbors)
- Properties
  - As the size of the tabu list grows, hill-climbing will asymptotically become non-redundant (won't look at the same state twice)
  - In practice, a reasonably sized tabu list (say 100 or so) improves the performance of hill climbing in many problems
Hill climbing: O(1) space complexity! But it has no termination or completeness guarantee (because it doesn't know where it has been, it can loop even in finite search spaces)
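The tabu idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tabu_hill_climb`, the toy `LANDSCAPE`, and the choice to always move to the best non-tabu neighbor (even when it is worse than the current state) are not from the slides.

```python
from collections import deque

def tabu_hill_climb(start, neighbors, cost, tabu_size=100, max_steps=1000):
    """Cost-minimizing hill-climbing with a bounded tabu list (illustrative sketch)."""
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # oldest visited states fall off the list
    for _ in range(max_steps):
        # Neighbors found in the tabu list are removed from consideration.
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        # Assumption: move to the best remaining neighbor even if it is worse.
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Hypothetical 1-D landscape with a local minimum at index 2 and the global one at index 9.
LANDSCAPE = [5, 4, 3, 4, 5, 4, 3, 2, 1, 0]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

def cost(i):
    return LANDSCAPE[i]
```

With the default list size, the search started at index 0 walks past the local minimum because it cannot step back onto visited states. With `tabu_size=1` it only remembers its current state, so it bounces around the local minimum at index 2 until `max_steps` runs out.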
5Making Hill-Climbing Asymptotically Complete
- Random restart hill-climbing
  - Keep some bound B. When you have made more than B moves, reset the search with a new random initial seed. Start again.
  - Getting a random new seed in an implicit search space is non-trivial!
    - In 8-puzzle, if you generate a random state by making random moves from the current state, you are still not truly random (as you will continue to be in one of the two components)
- Biased random walk: Avoid being greedy when choosing the seed for the next iteration
  - With probability p, choose the best child, but with probability (1-p) choose one of the children randomly
- Use simulated annealing
  - Similar to the previous idea: the probability p itself is increased asymptotically to one (so you are more likely to tolerate a non-greedy move in the beginning than towards the end)
With random restart or the biased random walk strategies, we can solve very large problems: million-queen problems in under minutes!
6Ideas for improving convergence
-- Random restart hill-climbing: after every N iterations, start with a completely random assignment
-- Probabilistic greedy
   - with probability p do what the greedy strategy suggests
   - with probability (1-p) pick a random variable and change its value randomly
   -- p can increase as the search progresses
A greedier version of the above (pick both the best var and val): For each variable v, let l(v) be the value that it can take so that the number of conflicts is minimized. Let n(v) be the number of conflicts with this value.
-- Pick the variable v with the lowest n(v) value.
-- Assign it the value l(v)
This one basically searches the 1-neighborhood of the current assignment (where the k-neighborhood is all assignments that differ from the current assignment in at most k variable values)
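A minimal sketch of the probabilistic-greedy idea with periodic random restarts, applied to n-queens (one queen per row, the variable is the row and its value is the column). The function names, default parameters and the fixed seed are assumptions made for illustration.

```python
import random

def conflicts(cols, row, col):
    """Number of other queens that attack a queen placed at (row, col)."""
    return sum(1 for r, c in enumerate(cols)
               if r != row and (c == col or abs(c - col) == abs(r - row)))

def probabilistic_greedy_queens(n, p=0.9, restart_every=500, max_steps=20000, seed=0):
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]      # random initial assignment
    for step in range(1, max_steps + 1):
        conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
        if not conflicted:
            return cols                              # no conflicts left: solution
        if step % restart_every == 0:                # random restart
            cols = [rng.randrange(n) for _ in range(n)]
            continue
        if rng.random() < p:
            # Greedy step: give a conflicted variable its min-conflict value.
            row = rng.choice(conflicted)
            best = min(conflicts(cols, row, c) for c in range(n))
            cols[row] = rng.choice(
                [c for c in range(n) if conflicts(cols, row, c) == best])
        else:
            # Random step: pick any variable and change its value randomly.
            cols[rng.randrange(n)] = rng.randrange(n)
    return None
```

Here p is kept constant; letting it grow with `step` would give the "p can increase as the search progresses" variant.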
7Model-checking by Stochastic Hill-climbing
Clauses: 1. (p,s,u) 2. (~p,q) 3. (~q,r) 4. (q,~s,t) 5. (r,s) 6. (~s,t) 7. (~s,u)
Applying min-conflicts idea to Satisfiability
- Start with a model (a random t/f assignment to propositions)
- For i = 1 to max_flips do
  - If model satisfies clauses then return model
  - Else clause := a randomly selected clause from clauses that is false in model
    - With probability p, flip whichever symbol in clause maximizes the number of satisfied clauses /greedy step/
    - With probability (1-p), flip the value in model of a randomly selected symbol from clause /random step/
- Return Failure
Consider the assignment "all false"
-- clauses 1 (p,s,u) and 5 (r,s) are violated
-- Pick one, say 5 (r,s):
   if we flip r, 1 remains violated
   if we flip s, 4, 6, 7 are violated
   So, the greedy thing is to flip r: we get all false, except r
   otherwise, pick either randomly
Remarkably good in practice!!
-- So good that people started wondering if there actually are any hard problems out there
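The procedure on this slide can be sketched as follows. The clause set is the one from the slide, written with explicit negation signs as implied by the worked "all false" example; the literal encoding `(symbol, sign)`, the function names and the fixed seed are assumptions. As on the slide, p is the probability of the greedy step.

```python
import random

# Each literal is (symbol, sign); sign False means the literal is negated.
CLAUSES = [
    [("p", True), ("s", True), ("u", True)],    # 1. (p,s,u)
    [("p", False), ("q", True)],                # 2. (~p,q)
    [("q", False), ("r", True)],                # 3. (~q,r)
    [("q", True), ("s", False), ("t", True)],   # 4. (q,~s,t)
    [("r", True), ("s", True)],                 # 5. (r,s)
    [("s", False), ("t", True)],                # 6. (~s,t)
    [("s", False), ("u", True)],                # 7. (~s,u)
]

def satisfied(clause, model):
    return any(model[sym] == sign for sym, sign in clause)

def num_satisfied(clauses, model):
    return sum(satisfied(c, model) for c in clauses)

def stochastic_sat(clauses, p=0.5, max_flips=10000, seed=0):
    rng = random.Random(seed)
    symbols = sorted({sym for c in clauses for sym, _ in c})
    model = {s: rng.random() < 0.5 for s in symbols}   # random t/f assignment
    for _ in range(max_flips):
        false_clauses = [c for c in clauses if not satisfied(c, model)]
        if not false_clauses:
            return model
        clause = rng.choice(false_clauses)

        def score(sym):
            # Number of satisfied clauses if sym were flipped (flip, count, undo).
            model[sym] = not model[sym]
            result = num_satisfied(clauses, model)
            model[sym] = not model[sym]
            return result

        if rng.random() < p:
            sym = max((s for s, _ in clause), key=score)   # greedy step
        else:
            sym = rng.choice(clause)[0]                    # random step
        model[sym] = not model[sym]
    return None   # failure: no model found within max_flips
```

Checking the "all false" assignment against `CLAUSES` with `satisfied` reproduces the observation that only clauses 1 and 5 are violated.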
8If most sat problems are easy, then exactly
where are the hard ones?
9Hardness of 3-sat as a function of
clauses/variables
[Figure: probability that there is a satisfying assignment, and cost of solving (either by finding a solution or showing there ain't one), plotted against the clauses/variables ratio; the value 4.3 is marked on the ratio axis]
10Phase Transition in SAT
Theoretically we only know that the phase transition ratio occurs between 3.26 and 4.596. Experimentally, it seems to be close to 4.3. (We also have a proof that 3-SAT has a sharp threshold.)
11Progress in nailing the bound.. (just FYI)
http://www.ipam.ucla.edu/publications/ptac2002/ptac2002_dachlioptas_formulas.pdf
12Beam search for Hill-climbing
- Hill climbing, as described, uses one seed solution that is continually updated
  - Why not use multiple seeds?
- Stochastic hill-climbing uses multiple seeds (k seeds, k > 1). In each iteration, the neighborhoods of all k seeds are evaluated. From the neighborhood, k new seeds are selected probabilistically
  - The probability that a seed is selected is proportional to how good it is.
  - Not the same as running k hill-climbing searches in parallel
- Stochastic hill-climbing is sort of almost close to the way evolution seems to work, with one difference
  - Define the neighborhood in terms of the combination of pairs of current seeds (sexual reproduction; crossover)
  - The probability that a seed from the current generation gets to mate to produce offspring in the next generation is proportional to the seed's goodness
  - To introduce randomness, do mutation over the offspring
- Stochastic beam-search hill-climbing algorithms of this type are called Genetic algorithms.
- Genetic algorithms limit the number of matings to keep the number of seeds the same
13Illustration of Genetic Algorithms in Action
Very careful modeling is needed so that the things emerging from crossover and mutation are still potential seeds (and not monkeys typing Hamlet).
Is the genetic metaphor really buying anything?
14Hill-climbing in continuous search spaces
Example: cube root finding using Newton-Raphson approximation
- Gradient descent (that you study in calculus of variations) is a special case of hill-climbing search applied to continuous search spaces
  - The local neighborhood is defined in terms of the gradient or derivative of the error function.
  - Since the error function gradient will be zero near the minimum, and higher farther from it, you tend to take smaller steps near the minimum and larger steps farther away from it, just as you would want
  - Gradient descent is guaranteed to converge to the global minimum if alpha (the step size; see the figure) is small, and the error function is uni-modal (i.e., has only one minimum).
- Versions of gradient-descent algorithms will be used in neural network learning.
  - Unfortunately, the error function is NOT unimodal for multi-layer neural networks. So, you will have to combine gradient descent with ideas such as simulated annealing to increase the chance of reaching the global minimum.
[Figure: error function Err = x^3 - a plotted against x, with a^(1/3) and the starting point x0 marked on the x axis]
Tons of variations based on how alpha is set
15Origins of gradient descent: Newton-Raphson
applied to function minimization
- Newton-Raphson method is used for finding roots of a polynomial
  - To find roots of g(x), we start with some value of x and repeatedly do
    - $x \leftarrow x - g(x)/g'(x)$
- To minimize a function f(x), we need to find the roots of the equation $f'(x) = 0$
  - $x \leftarrow x - f'(x)/f''(x)$
- If x is a vector then
  - $x \leftarrow x - H_f(x)^{-1} \nabla f(x)$
Because the Hessian $H_f(x)$ is costly to compute (it will have $n^2$ second-derivative entries for an n-dimensional vector), we try approximations
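As a concrete instance of the root-finding update, here is a minimal sketch of Newton-Raphson for the cube-root example, taking g(x) = x^3 - a so that g'(x) = 3x^2. The function name, starting point and tolerance are illustrative assumptions; the starting point must be nonzero so that the derivative does not vanish.

```python
def cube_root_newton(a, x0=1.0, tol=1e-12, max_iter=100):
    """Approximate the cube root of a by iterating x <- x - g(x)/g'(x)."""
    x = x0
    for _ in range(max_iter):
        g = x ** 3 - a           # g(x): zero exactly at the cube root of a
        if abs(g) < tol:
            break
        x = x - g / (3 * x ** 2)  # Newton-Raphson step with g'(x) = 3x^2
    return x
```

The steps shrink automatically as g(x) approaches zero, which is the behaviour the previous slide describes for gradient-based updates near the minimum.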
16Between Hill-climbing and systematic search
- You can reduce the freedom of hill-climbing search to make it more complete
  - Tabu search
- You can increase the freedom of systematic search to make it more flexible in following local gradients
  - Random restart search
18Random restart search
- Variant of depth-first search where
  - When a node is expanded, its children are first randomly permuted before being introduced into the open list
    - The permutation may well be a "biased" random permutation
  - Search is restarted from scratch anytime a "cutoff" parameter is exceeded
    - There is a cutoff (which may be in terms of number of backtracks, number of nodes expanded or amount of time elapsed)
- Because of the random permutation, every time the search is restarted, you are likely to follow different paths through the search tree. This allows you to recover from bad initial moves.
- The higher the cutoff value, the lower the number of restarts (and thus the lower the freedom to explore different paths).
  - When the cutoff is infinity, random restart search is just normal depth-first search: it will be systematic and complete
  - For smaller values of cutoffs, the search has higher freedom, but no guarantee of completeness
  - A strategy to guarantee asymptotic completeness: start with a low cutoff value, but keep increasing it as time goes on.
- Random restart search has been shown to be very good for problems that have a reasonable percentage of easy-to-find solutions (such problems are said to exhibit the heavy-tail phenomenon). Many real-world problems have this property.
21Representation
Reasoning
23[Figure: logics arranged by ontological and epistemological commitment]
Ontological commitment      Epistemological commitment
                            t/f/u          Deg belief
facts                       Prop logic     Prob prop logic
Facts, Objects, relations   FOPC           Prob FOPC
Assertions: t/f
24Think of a sentence as the stand-in for a set of
worlds (where it is true)
32Proof by model checking
KB |= a
So, to check if KB entails a, negate a, add it to the KB, and try to show that the resultant (propositional) theory has no solutions (we have to use systematic methods for this)
33Connection between Entailment and Satisfiability
- The Boolean Satisfiability problem is closely connected to Propositional entailment
  - Specifically, propositional entailment is the conjugate problem of boolean satisfiability (since we have to show that KB & ~f has no satisfying model to show that KB |= f)
- Of late, our ability to solve very large scale satisfiability problems has increased quite significantly
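The connection can be made concrete with a brute-force check: KB entails f exactly when no truth assignment satisfies the KB together with the negation of f. The sketch below enumerates all models, so it is exponential in the number of symbols; the function name and the small example KBs are hypothetical.

```python
from itertools import product

def entails(kb, query, symbols):
    """True iff KB & ~query has no satisfying model (brute-force enumeration)."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False          # found a model of KB & ~query
    return True

# Hypothetical KB: (A => B) and ~B, written as a Python predicate over a model.
def kb_tollens(m):
    return ((not m["A"]) or m["B"]) and not m["B"]

# Hypothetical KB: A => B on its own.
def kb_implication(m):
    return (not m["A"]) or m["B"]
```

For example, `entails(kb_tollens, lambda m: not m["A"], ["A", "B"])` holds, while `A => B` alone does not entail `B`.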
34Entailment & Satisfiability
- SAT (boolean satisfiability) problem
  - Given a set of propositions
  - And a set of (CNF) clauses
  - Find a model (an assignment of t/f values to propositions) that satisfies all clauses
  - k-SAT is a SAT problem where all clauses are of length less than or equal to k
    - SAT is NP-complete
    - 1-SAT and 2-SAT are polynomial
    - k-SAT for k > 2 is NP-complete (so 3-SAT is the smallest k-SAT that is NP-Complete)
- If we have a procedure for solving SAT problems, we can use it to compute entailment
  - The sentence S is entailed if the negation of S, when added to the KB, gives a SAT theory that is unsatisfiable (NO MODEL)
    - CO-NP-Complete
- SAT is useful for modeling many other "assignment" problems
  - We will see the use of SAT for planning; it can also be used for graph coloring, n-queens, scheduling and circuit verification etc. (the last thing makes SAT VERY interesting for Electrical Engineering folks)
- Our ability to solve very large scale SAT problems has increased quite phenomenally in recent years
  - We can solve SAT instances with millions of variables and clauses very easily
  - To use this technology for inference, we will have to consider systematic SAT solvers.
35Davis-Putnam-Logemann-Loveland Procedure
[Figure: the DPLL procedure, with the step that detects failure marked]
36DPLL Example
Clauses: (p,s,u) (~p,q) (~q,r) (q,~s,t) (r,s) (~s,t) (~s,u)
Pick p; set p=true
Unit propagation:
  (p,s,u) satisfied (remove)
  p; (~p,q) -> q derived; set q=T
  (~p,q) satisfied (remove)
  (q,~s,t) satisfied (remove)
  q; (~q,r) -> r derived; set r=T
  (~q,r) satisfied (remove)
  (r,s) satisfied (remove)
Pure literal elimination:
  in all the remaining clauses, s occurs negative
  set ~s=True (i.e. s=False)
At this point all clauses are satisfied.
Return p=T, q=T, r=T, s=False
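A compact recursive sketch of DPLL, using unit propagation, pure literal elimination and splitting. It is an illustration, not the procedure from the slide's figure: the `(symbol, sign)` literal encoding and the order in which the rules are tried are assumptions, so the assignments may be made in a different order than in the trace above (which starts by picking p). The clause set is the one from the example, with negated literals written explicitly.

```python
# Each literal is (symbol, sign); sign False means the literal is negated.
CLAUSES = [
    [("p", True), ("s", True), ("u", True)],
    [("p", False), ("q", True)],
    [("q", False), ("r", True)],
    [("q", True), ("s", False), ("t", True)],
    [("r", True), ("s", True)],
    [("s", False), ("t", True)],
    [("s", False), ("u", True)],
]

def dpll(clauses, model=None):
    """Return a (possibly partial) satisfying model as a dict, or None if unsatisfiable."""
    model = dict(model or {})
    simplified = []
    for clause in clauses:
        if any(model.get(sym) == sign for sym, sign in clause):
            continue                               # clause satisfied: remove it
        rest = [(sym, sign) for sym, sign in clause if sym not in model]
        if not rest:
            return None                            # empty clause: detect failure
        simplified.append(rest)
    if not simplified:
        return model                               # all clauses satisfied
    # Unit propagation: a one-literal clause forces that literal's value.
    for clause in simplified:
        if len(clause) == 1:
            sym, sign = clause[0]
            return dpll(simplified, {**model, sym: sign})
    # Pure literal elimination: a symbol that occurs with only one sign.
    signs = {}
    for clause in simplified:
        for sym, sign in clause:
            signs.setdefault(sym, set()).add(sign)
    for sym, seen in signs.items():
        if len(seen) == 1:
            return dpll(simplified, {**model, sym: next(iter(seen))})
    # Splitting: try both values of the first unassigned symbol.
    sym = simplified[0][0][0]
    result = dpll(simplified, {**model, sym: True})
    if result is None:
        result = dpll(simplified, {**model, sym: False})
    return result
```

Symbols missing from the returned model are "don't cares": every clause is already satisfied by the assigned ones.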
37Lots of work in SAT solvers
- DPLL was the first (early 60s)
- Circa 1994 came GSAT (hill climbing search for SAT)
- Circa 1997 came SATZ
- Circa 1998-99 came RelSAT
- 2000 came CHAFF
- Current best can be found at
  - http://www.satlive.org/SATCompetition/2003/results.html
38Inference rules
- Sound (but incomplete)
  - Modus Ponens
    - A=>B, A |= B
  - Modus tollens
    - A=>B, ~B |= ~A
  - Abduction (??)
    - A=>B, ~A |= ~B
  - Chaining
    - A=>B, B=>C |= A=>C
- Complete (but unsound)
  - "Python" logic
KB true but theorem not true?
A  B  A=>B  KB  ~A
T  T  T     F   F
T  F  F     F   F
F  T  T     F   T
F  F  T     T   T
How about SOUND & COMPLETE? --Resolution (needs normal forms)
39Need something that does case analysis
If WMDs are found, the war is justified: W=>J
If WMDs are not found, the war is still justified: ~W=>J
Is the war justified anyway? |= J?
Can Modus Ponens derive it?
41Modus ponens, Modus Tollens etc. are special cases of resolution!
Forward: apply resolution steps until the fact f you want to prove appears as a resolvent
Backward (Resolution Refutation): add the negation of the fact f you want to derive to the KB; apply resolution steps until you derive an empty clause
42If WMDs are found, the war is justified: ~W V J
If WMDs are not found, the war is still justified: W V J
Is the war justified anyway? |= J?
43Resolution does case analysis
If WMDs are found, the war is justified: ~W V J
If WMDs are not found, the war is still justified: W V J
Either WMDs are found or they are not found: W V ~W
Is the war justified anyway? |= J?
44CNF: aka the product of sums form (from CSE/EEE 120)
DNF: aka the sum of products form
Prolog without variables and without the cut operator is doing Horn-clause theorem proving.
For any KB in Horn form, modus ponens is a sound and complete inference procedure.
45Conversion to CNF form
ANY propositional logic sentence can be converted into CNF form. Try: (P & Q) => (R V W)
- CNF clause: Disjunction of literals
  - Literal: a proposition or a negated proposition
- Conversion
  - Remove implication
  - Pull negation in (De Morgan's laws)
  - Distribute disjunction over conjunction
  - Separate conjunctions into clauses
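A worked instance of the conversion steps, reading the "Try" sentence on this slide as (P & Q) => (R V W) (the connectives are an assumption where the transcript dropped them). That sentence turns into a single clause, so a second, hypothetical sentence is included because it needs the distribution step.

```latex
\begin{align*}
(P \land Q) \Rightarrow (R \lor W)
  &\equiv \lnot (P \land Q) \lor (R \lor W) && \text{remove implication} \\
  &\equiv \lnot P \lor \lnot Q \lor R \lor W && \text{pull negation in: one clause} \\[4pt]
(P \lor Q) \Rightarrow R
  &\equiv \lnot (P \lor Q) \lor R && \text{remove implication} \\
  &\equiv (\lnot P \land \lnot Q) \lor R && \text{pull negation in} \\
  &\equiv (\lnot P \lor R) \land (\lnot Q \lor R) && \text{distribute } \lor \text{ over } \land
\end{align*}
```

Separating the final conjunction gives the two clauses (~P V R) and (~Q V R).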
46Need for resolution
Resolution does case analysis
Yankees win, it is Destiny: ~Y V D
Dbacks win, it is Destiny: ~Db V D
Yankees or Dbacks win: Y V Db
Is it Destiny either way? |= D?
Can Modus Ponens derive it? Not until Sunday, when Db won
47Solving problems using propositional logic
- Need to write what you know as propositional formulas
- Theorem proving will then tell you whether a given new sentence will hold given what you know
- Three kinds of queries
  - Is my knowledge base consistent? (i.e. is there at least one world where everything I know is true?) Satisfiability
  - Is the sentence S entailed by my knowledge base? (i.e., is it true in every world where my knowledge base is true?)
  - Is the sentence S consistent/possibly true with my knowledge base? (i.e., is S true in at least one of the worlds where my knowledge base holds?)
    - S is consistent if ~S is not entailed
- But cannot differentiate between degrees of likelihood among possible sentences
48Steps in Resolution Refutation
- Consider the following problem
  - If the grass is wet, then it is either raining or the sprinkler is on
    - GW => R V SP        ~GW V R V SP
  - If it is raining, then Timmy is happy
    - R => TH             ~R V TH
  - If the sprinklers are on, Timmy is happy
    - SP => TH            ~SP V TH
  - If Timmy is happy, then he sings
    - TH => SG            ~TH V SG
  - Timmy is not singing
    - ~SG                 ~SG
  - Prove that the grass is not wet
    - |= ~GW?             GW
Is there search in inference? Yes!! Many possible inferences can be done; only a few are actually relevant
-- Idea: Set of Support: At least one of the resolved clauses is a goal clause, or a descendant of a clause derived from a goal clause
-- Used in the example here!!
49Search in Resolution
- Convert the database into clausal form Dc
- Negate the goal first, and then convert it into clausal form DG
- Let D = Dc + DG
- Loop
  - Select a pair of clauses C1 and C2 from D
    - Different control strategies can be used to select C1 and C2 to reduce the number of resolution tries
      - Idea 1: Set of Support: At least one of C1 or C2 must be either the goal clause or a clause derived by doing resolutions on the goal clause (COMPLETE)
      - Idea 2: Linear input form: At least one of C1 or C2 must be one of the clauses in the input KB (INCOMPLETE)
  - Resolve C1 and C2 to get C12
  - If C12 is the empty clause, QED!! Return Success (we proved the theorem)
  - D = D + C12
- End loop
- If we come here, we couldn't get the empty clause. Return Failure
- Finiteness is guaranteed if we make sure that
  - we never resolve the same pair of clauses more than once, AND
  - we use factoring, which removes multiple copies of literals from a clause (e.g. QVPVP => QVP)
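The loop above can be sketched as a brute-force saturation procedure. This is a minimal illustration with no control strategy: it does not implement set of support or linear input form, and it simply resolves every pair once. Clauses are sets of string literals with "~" marking negation, so factoring happens automatically. The function names are assumptions, and the example clauses are the grass/sprinkler problem written with explicit negations.

```python
from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtained by resolving away one literal and its negation."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refutes(kb_clauses, negated_goal_clauses):
    """True iff the empty clause is derivable from KB clauses + negated goal clauses."""
    clauses = {frozenset(c) for c in kb_clauses}
    clauses |= {frozenset(c) for c in negated_goal_clauses}
    tried = set()                                   # never resolve the same pair twice
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            if (c1, c2) in tried or (c2, c1) in tried:
                continue
            tried.add((c1, c2))
            for resolvent in resolvents(c1, c2):
                if not resolvent:
                    return True                     # empty clause: theorem proved
                new.add(frozenset(resolvent))
        if new <= clauses:
            return False                            # nothing new: failure
        clauses |= new

# Grass/sprinkler KB in clausal form; the goal ~GW is negated to the clause GW.
GRASS_KB = [{"~GW", "R", "SP"}, {"~R", "TH"}, {"~SP", "TH"}, {"~TH", "SG"}, {"~SG"}]
```

Calling `refutes(GRASS_KB, [{"GW"}])` follows the chain GW, R V SP, SP V TH, TH, SG, and finally the empty clause against ~SG.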
50Mad chase for empty clause
- You must have everything in CNF clauses before you can resolve
  - Goal must be negated first before it is converted into CNF form
  - Goal (the fact to be proved) may become converted to multiple clauses (e.g. if we want to prove P V Q, then we get two clauses ~P ; ~Q to add to the database)
- Resolution works by resolving away a single literal and its negation
  - P V Q resolved with ~P V ~Q is not empty!
    - In fact, these clauses are not inconsistent (P true and Q false will make sure that both clauses are satisfied)
  - ~P V ~Q is the negation of P & Q. The latter will become two separate clauses: P, Q. So, by doing two separate resolutions with these two clauses we can derive the empty clause
51Complexity of Propositional Inference
- Any sound and complete inference procedure has to be Co-NP-Complete (since model-theoretic entailment computation is Co-NP-Complete (since model-theoretic satisfiability is NP-complete))
- Given a propositional database of size d
  - Any sentence S that follows from the database by modus ponens can be derived in linear time
    - If the database has only HORN sentences (sentences whose CNF form has at most one +ve literal per clause; e.g. A & B => C), then MP is complete for that database.
      - PROLOG uses (first order) horn sentences
  - Deriving all sentences that follow by resolution is Co-NP-Complete (exponential)
    - Anything that follows by unit-resolution can be derived in linear time.
      - Unit resolution: At least one of the clauses should be a clause of length 1
52Example
- Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. Alarm goes off when there is an earthquake.
- Burglary => Alarm
- Earth-Quake => Alarm
- Alarm => John-calls
- Alarm => Mary-calls
- If there is a burglary, will Mary call?
  - Check KB & B |= M
- If Mary didn't call, is it possible that Burglary occurred?
  - Check KB & ~M doesn't entail ~B
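Because every rule in this KB is a Horn sentence, the first query can be answered by repeated modus ponens (forward chaining). The sketch below is a simple fixed-point loop rather than a linear-time implementation; the function name and the rule encoding are illustrative assumptions.

```python
def forward_chain(rules, facts):
    """Apply modus ponens until nothing new can be derived; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# The four implications from the example, as (premises, conclusion) pairs.
RULES = [
    (["Burglary"], "Alarm"),
    (["Earth-Quake"], "Alarm"),
    (["Alarm"], "John-calls"),
    (["Alarm"], "Mary-calls"),
]
```

Adding Burglary as a fact derives Alarm and then Mary-calls. In this idealized KB the same chain read backwards means that Mary not calling rules out a burglary, which is exactly the kind of conclusion the "real" version of the example calls into question.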
53Example (Real)
- Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. Alarm goes off when there is an earthquake.
- Pearl lives in the real world where (1) burglars can sometimes disable alarms, (2) some earthquakes may be too slight to cause alarm, (3) even in Los Angeles, burglaries are more likely than earthquakes, (4) John and Mary both have their own lives and may not always call when the alarm goes off, (5) between John and Mary, John is more of a slacker than Mary, (6) John and Mary may call even without the alarm going off
- Burglary => Alarm
- Earth-Quake => Alarm
- Alarm => John-calls
- Alarm => Mary-calls
- If there is a burglary, will Mary call?
  - Check KB & B |= M
- If Mary didn't call, is it possible that Burglary occurred?
  - Check KB & ~M doesn't entail ~B
- John already called. If Mary also calls, is it more likely that Burglary occurred?
- You now also hear on the TV that there was an earthquake. Is Burglary more or less likely now?
55How do we handle Real Pearl?
- Eager way
  - Model everything!
  - E.g. Model exactly the conditions under which John will call
    - He shouldn't be listening to loud music, he hasn't gone on an errand, he didn't recently have a tiff with Pearl etc. etc.
    - A & c1 & c2 & c3 & .. & cn => J
    - (also the exceptions may have interactions
      - c1 & c5 => c9 )
  - Qualification and Ramification problems make this an infeasible enterprise
- Ignorant (non-omniscient) and Lazy (non-omnipotent) way
  - Model the likelihood
  - In 85% of the worlds where there was an alarm, John will actually call
  - How do we do this?
    - Non-monotonic logics
    - certainty factors
    - probability theory?
56Probabilistic Calculus to the Rescue
- Suppose we know the likelihood of each of the (propositional) worlds (aka Joint Probability distribution)
  - Then we can use standard rules of probability to compute the likelihood of all queries (as I will remind you)
  - So, Joint Probability Distribution is all that you ever need!
  - In the case of the Pearl example, we just need the joint probability distribution over B,E,A,J,M (32 numbers)
    - --In general 2^n separate numbers (which should add up to 1)
- If Joint Distribution is sufficient for reasoning, what is domain knowledge supposed to help us with?
  - --Answer: Indirectly by helping us specify the joint probability distribution with fewer than 2^n numbers
  - ---The local relations between propositions can be seen as "constraining" the form the joint probability distribution can take!
Only 10 (instead of 32) numbers to specify!
57If B=>A then P(A|B) = ? P(B|A) = ?
P(B|~A) = ?