Von Neumann - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Von Neumann


1
Claude Shannon (finite look-ahead)
Chaturanga, India (550 AD) (proto-chess)
Von Neumann (minimax theorem)
9/30
Donald Knuth (α-β analysis)
John McCarthy (α-β pruning)
2
Announcements etc.
  • Homework 2 returned
  • (!! Our TA doesn't sleep)
  • Average 33/60
  • Max 56/60
  • Solutions online
  • Homework 3 socket opened
  • Project 1 due today
  • Extra credit portion will be accepted until
    Thursday with late penalty
  • Any steam to be let off?
  • Today's class
  • It's all fun and GAMES

Steaming in Tempe
3
What if you see this as a game?
If you are a perpetual optimist, then V2 =
max(V3, V4)
Review
Min-Max!
4
Game Playing (Adversarial Search)
  • Perfect play
  • Do minimax on the complete game tree
  • Resource limits
  • Do limited-depth lookahead
  • Apply evaluation functions at the leaf nodes
  • Do minimax (a minimal sketch follows below)
  • Alpha-Beta pruning (a neat idea that is the bane
    of many a CSE471 student)
  • Miscellaneous
  • Games of Chance
  • Status of computer games..
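To make the limited-depth scheme concrete, here is
a minimal sketch of depth-limited minimax (not from
the slides; the game interface with successors,
is_terminal, and evaluate is an assumed placeholder):

  # Depth-limited minimax: recurse to a fixed depth, apply the
  # static evaluation function at the frontier, and back values
  # up by alternating max (our move) and min (opponent's move).
  def minimax(state, depth, max_turn, game):
      if depth == 0 or game.is_terminal(state):
          return game.evaluate(state)
      values = [minimax(s, depth - 1, not max_turn, game)
                for s in game.successors(state)]
      return max(values) if max_turn else min(values)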

5
Fun to try and find analogies between this and
environment properties
6
(No Transcript)
7
Searching Tic-Tac-Toe using minimax
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
Evaluation Functions: Tic-Tac-Toe
If win for Max: +∞. If loss for Max: -∞.
If draw for Max: 0. Else: (# rows/cols/diags
open for Max) - (# rows/cols/diags open for Min)
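As a sketch, the evaluation function above can be
coded directly; the 3x3 board representation (a list
of lists holding each player's mark or None) is an
assumption, not part of the slides:

  import math

  # The 8 winning lines of tic-tac-toe: 3 rows, 3 columns, 2 diagonals.
  LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
           [[(r, c) for r in range(3)] for c in range(3)] +
           [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

  def evaluate(board, maxp, minp):
      def won(p):
          return any(all(board[r][c] == p for (r, c) in line)
                     for line in LINES)
      def open_lines(p, opponent):
          # lines the opponent has not blocked are still open for p
          return sum(all(board[r][c] != opponent for (r, c) in line)
                     for line in LINES)
      if won(maxp):
          return math.inf
      if won(minp):
          return -math.inf
      if all(board[r][c] is not None for r in range(3) for c in range(3)):
          return 0                      # draw
      return open_lines(maxp, minp) - open_lines(minp, maxp)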
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
Why is deeper better?
  • Possible reasons
  • Taking mins/maxes of the evaluation values of the
    leaf nodes improves their collective accuracy
  • Going deeper makes the agent notice traps, thus
    significantly improving the evaluation accuracy
  • All evaluation functions first check for
    termination states before computing the
    non-terminal evaluation

16
(so is MDP policy)
17
[Figure: an alpha-beta example tree. As leaf values
2, 14, 5, and 2 come in, node bounds (≤ 2, ≤ 2,
≤ 5, ≤ 14) are updated and a cutoff prunes the
remaining children.]
  • Whenever a node gets its true value, its
    parent's bound gets updated
  • When all children of a node have been evaluated
    (or a cutoff occurs below that node), the
    current bound of that node is its true value
  • Two types of cutoffs:
  • If a min node n has bound ≤ k, and a max ancestor
    of n, say m, has a bound ≥ j, then a cutoff occurs
    as long as j ≥ k
  • If a max node n has bound ≥ k, and a min ancestor
    of n, say m, has a bound ≤ j, then a cutoff occurs
    as long as j ≤ k
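A sketch of the same pruning rule in code (using the
same assumed game interface as the earlier minimax
sketch): alpha carries the best bound of the Max
ancestors, beta the best bound of the Min ancestors,
and both cutoff cases above collapse into the single
test alpha >= beta:

  def alphabeta(state, depth, alpha, beta, max_turn, game):
      if depth == 0 or game.is_terminal(state):
          return game.evaluate(state)
      if max_turn:
          v = -float('inf')
          for s in game.successors(state):
              v = max(v, alphabeta(s, depth - 1, alpha, beta, False, game))
              alpha = max(alpha, v)
              if alpha >= beta:    # a Min ancestor's bound is violated: cut
                  break
          return v
      else:
          v = float('inf')
          for s in game.successors(state):
              v = min(v, alphabeta(s, depth - 1, alpha, beta, True, game))
              beta = min(beta, v)
              if alpha >= beta:    # a Max ancestor's bound is violated: cut
                  break
          return v

The top-level call is alphabeta(s0, d,
-float('inf'), float('inf'), True, game).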

18
(No Transcript)
19
"An eye for an eye only ends up making the whole
world blind." - Mohandas Karamchand Gandhi,
born October 2nd, 1869.
Lecture of October 2nd, 2003
20
Another alpha-beta example
Project 2 assigned
21
(order nodes in terms of their static eval
values)
Click for an animation of Alpha-beta search in
action on Tic-Tac-Toe
22
(No Transcript)
23
(No Transcript)
24
Multi-player Games
Everyone maximizes their utility. --How does
this compare to 2-player games? (Max's
utility is the negative of Min's)
25
Expecti-Max
26
(just as human weight lifters refuse to compete
against cranes)
27
What if you see this as a game?
If you are a perpetual optimist, then V2 =
max(V3, V4)
Min-Max!
28
RTA*
[Figure: an RTA* search tree from S with nodes n, m,
and k annotated with G/H/F values (G=1, H=2, F=3;
G=2, H=3, F=5; ∞) and goal G.]
RTA* is a special case of RTDP
--It is useful for acting in deterministic,
dynamic worlds
--While RTDP is useful for acting in stochastic,
dynamic worlds
--Grow the tree to depth d
--Apply f-evaluation for the leaf nodes
--Propagate f-values up to the parent nodes:
f(parent) = min(f(children))
29
MDPs and Deterministic Search
  • Problem-solving agent search corresponds to what
    special case of MDP?
  • Actions are deterministic; goal states are all
    equally valued, and are all sink states.
  • Is it worth solving the problem using MDPs?
  • The construction of an optimal policy is overkill
  • (The policy, in effect, gives us the optimal path
    from every state to the goal state(s))
  • The value function, or its approximations, on the
    other hand, is useful. How?
  • As a heuristic for the problem-solving agent's
    search
  • This shows an interesting connection between
    dynamic programming and state-search paradigms
  • DP solves many related problems on the way to
    solving the one problem we want
  • State search tries to solve just the problem we
    want
  • We can use DP to find heuristics to run state
    search..

30
Incomplete observability (the dreaded POMDPs)
  • To model partial observability, all we need to do
    is to look at the MDP in the space of belief
    states (belief states are fully observable even
    when world states are not)
  • The policy maps belief states to actions
  • In practice, this causes (humongous) problems
  • The space of belief states is continuous (even
    if the underlying world is discrete and finite).
    GET IT? GET IT??
  • Even approximate policies are hard to find
    (PSPACE-hard).
  • Problems with a few dozen world states are hard
    to solve currently
  • Depth-limited exploration (such as that done in
    adversarial games) is the only option

Belief state: {s1: 0.3, s2: 0.4, s4: 0.3}
[Figure: belief states after 5 LEFTs and after
5 UPs; the figure basically shows that belief
states change as we take actions.]
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Planning
  • Where states are transparent
  • and actions have preconditions and effects

35
Blocks world
Init: Ontable(A), Ontable(B), Clear(A),
Clear(B), hand-empty
Goal: ¬clear(B), hand-empty

State variables: Ontable(x), On(x,y), Clear(x),
hand-empty, holding(x)

Initial state: a complete specification of T/F
values to state variables
--By convention, variables with F values are
omitted
Goal state: a partial specification of the
desired state-variable/value combinations
--desired values can be both positive and
negative

Pickup(x): Prec: hand-empty, clear(x), ontable(x);
Eff: holding(x), ¬ontable(x), ¬hand-empty,
¬clear(x)

Putdown(x): Prec: holding(x); Eff: ontable(x),
hand-empty, clear(x), ¬holding(x)

Unstack(x,y): Prec: on(x,y), hand-empty, clear(x);
Eff: holding(x), ¬clear(x), clear(y), ¬hand-empty

Stack(x,y): Prec: holding(x), clear(y); Eff:
on(x,y), ¬clear(y), ¬holding(x), hand-empty

All the actions here have only positive
preconditions, but this is not necessary.
36
An action A can be applied to state S iff the
preconditions are satisfied in the current
state. The resulting state S' is computed as
follows:
--every variable that occurs in the action's
effects gets the value that the action said it
should have
--every other variable gets the value it had in
the state S where the action is applied
Progression
[Figure: progressing the initial state
{Ontable(A), Ontable(B), Clear(A), Clear(B),
hand-empty} over Pickup(A) yields {holding(A),
¬Clear(A), ¬Ontable(A), Ontable(B), Clear(B),
¬hand-empty}; over Pickup(B), symmetrically,
{holding(B), ¬Clear(B), ¬Ontable(B), Ontable(A),
Clear(A), ¬hand-empty}.]
37
Generic (progression) planner
  • Goal test(S,G): check if every state variable in
    S that is mentioned in G has the value that G
    gives it.
  • Child generator(S,A):
  • For each action a in A do:
  • If every variable mentioned in Prec(a) has the
    same value in Prec(a) and in S,
  • then return Progress(S,a) as one of the children
    of S
  • Progress(S,a) is a state S' where each state
    variable v has the value given by Eff(a) if it is
    mentioned in Eff(a), and the value it has in S
    otherwise
  • Search starts from the initial state (a minimal
    sketch of these operations follows below)
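A minimal sketch of these operations, representing a
state as a dict from state variables to True/False
(omitted variables are False, per the convention
above) and assuming a hypothetical Action object
with .prec and .eff dicts:

  def goal_test(S, G):
      # every variable mentioned in G has the value G gives it
      return all(S.get(v, False) == val for v, val in G.items())

  def progress(S, a):
      # effects override; every other variable keeps its value from S
      S2 = dict(S)
      S2.update(a.eff)
      return S2

  def children(S, actions):
      # an action is applicable iff its preconditions hold in S
      return [progress(S, a) for a in actions
              if all(S.get(v, False) == val for v, val in a.prec.items())]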

38
A state S can be regressed over an action A (or,
A is applied in the backward direction to S) iff:
--there is no variable v such that v is given
different values by the effects of A and by the
state S
--there is at least one variable v such that v is
given the same value by the effects of A as well
as by the state S.
The resulting state S' is computed as follows:
--every variable that occurs in S, and does not
occur in the effects of A, is copied over to S'
with its value as in S
--every variable that occurs in the precondition
list of A is copied over to S' with the value it
has in the precondition list
Regression
[Figure: regressing the goal {¬clear(B),
hand-empty} over Putdown(A) yields {¬clear(B),
holding(A)}; regressing it over Stack(A,B) yields
{holding(A), clear(B)}. Putdown(B)?? is not usable,
since its effect clear(B) conflicts with ¬clear(B).]
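A sketch of regression under the two conditions
above (same hypothetical state/action representation
as the progression sketch; here a state is a partial
assignment, so only mentioned variables are stored):

  def regressable(S, a):
      # no effect of a conflicts with S...
      no_conflict = all(S.get(v, val) == val for v, val in a.eff.items())
      # ...and at least one effect of a gives a value S asks for
      relevant = any(v in S and S[v] == val for v, val in a.eff.items())
      return no_conflict and relevant

  def regress(S, a):
      # copy over the parts of S the action does not provide,
      # then demand the action's preconditions
      S2 = {v: val for v, val in S.items() if v not in a.eff}
      S2.update(a.prec)
      return S2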
39
Heuristics to guide Progression/Regression
  • Set difference heuristic
  • Intuition: the cost of a state is the number
    of goals that are not yet present in it.
  • Progression: the cost of a state S is |G \ S|,
    the number of state-variable/value pairs in G
    which are not present in S
  • Regression: the cost of a state S is |S \ I|,
    the number of state-variable/value pairs in S
    that are not present in the initial state
    (a sketch of both follows below)
  • Problems with the set difference heuristic:
  • 1. Every literal is given the same cost. Some
    literals are harder to achieve than others!
  • 2. It is assumed that the cost of achieving
    n literals together is n. This ignores the
    interactions between literals (subgoals).
    -- It may be easier to achieve a set of literals
    together than to achieve each of them separately
    (+ve interactions)
    -- It may be harder to achieve a set of literals
    together than to achieve them separately
    (-ve interactions)
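A sketch of both directions of the set-difference
heuristic under the same representation (here G and
the regressed state S are partial assignments, while
the progression state S and initial state I are
complete):

  def h_setdiff_progression(S, G):
      # |G \ S|: goal pairs not yet true in the current state
      return sum(1 for v, val in G.items() if S.get(v, False) != val)

  def h_setdiff_regression(S, I):
      # |S \ I|: regressed-state pairs not true in the initial state
      return sum(1 for v, val in S.items() if I.get(v, False) != val)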

40
Subgoal interactions
Suppose we have a set of subgoals {G1, ..., Gn}.
Suppose the length of the shortest plan for
achieving the subgoals in isolation is l1, ..., ln.
We want to know the length of the shortest plan
for achieving the n subgoals together, l(1..n).
If subgoals are independent:
  l(1..n) = l1 + l2 + ... + ln
If subgoals have +ve interactions alone:
  l(1..n) < l1 + l2 + ... + ln
If subgoals have -ve interactions alone:
  l(1..n) > l1 + l2 + ... + ln
41
Estimating the cost of achieving individual
literals (subgoals)
Idea: Unfold a data structure called the planning
graph as follows:
1. Start with the initial state. This is called
the zeroth-level proposition list.
2. In the next level, called the first-level
action list, put all the actions whose
preconditions are true in the initial state.
   -- Have links between actions and their
   preconditions.
3. In the next level, called the first-level
proposition list, put:
   (Note: a literal appears at most once in a
   proposition list.)
   3.1. All the effects of all the actions in the
   previous level. Link the effects to the
   respective actions. (If multiple actions give a
   particular effect, have multiple links to that
   effect from all those actions.)
   3.2. All the conditions in the previous
   proposition list (in this case the zeroth
   proposition list). Put persistence links between
   the corresponding literals in the previous
   proposition list and the current proposition
   list.
4. Repeat steps 2 and 3 until there is no
difference between two consecutive proposition
lists. At that point the graph is said to have
"leveled off".
The next 2 slides show this expansion up to two
levels. (A sketch of the unfolding in code follows
below.)
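A sketch of this unfolding in code, with literals as
(variable, value) pairs and the same assumed action
representation as before; links are omitted and
persistence (noop) actions are implicit in carrying
the previous proposition list forward:

  def expand_pg(init, actions, max_levels):
      # props[k] is the level-k proposition list;
      # acts[k] holds the actions between levels k and k+1
      props = [frozenset(init.items())]
      acts = []
      for _ in range(max_levels):
          prev = props[-1]
          applicable = [a for a in actions
                        if all(p in prev for p in a.prec.items())]
          acts.append(applicable)
          nxt = set(prev)                   # persistence literals
          for a in applicable:
              nxt |= set(a.eff.items())     # effects of all applicable actions
          props.append(frozenset(nxt))
          if props[-1] == props[-2]:        # no difference: leveled off
              break
      return props, acts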
42
Lecture of 23 Oct
  • (the first four slides are reviewed from the
    previous class)

43
[Figure: the planning graph expanded to one level.
Level-0 propositions: onT-A, onT-B, cl-A, cl-B, he.
Level-1 actions: Pick-A, Pick-B (plus persistence
noops). Level-1 propositions add h-A and h-B (and
the negated effects) to the level-0 propositions.]
44
[Figure: the planning graph expanded to two levels.
Level-2 actions add St-A-B, St-B-A, Ptdn-A, and
Ptdn-B to Pick-A, Pick-B, and the noops; level-2
propositions add on-A-B and on-B-A to the level-1
propositions.]
45
Using the planning graph to estimate the cost of
single literals
1. We can say that the cost of a single literal
is the index of the first proposition level in
which it appears.
   --If the literal does not appear in any of the
   levels in the currently expanded planning graph,
   then the cost of that literal is
   -- l+1, if the graph has been expanded to l
   levels but has not yet leveled off
   -- infinity, if the graph has been expanded
   until it leveled off
   (basically, the literal cannot be achieved from
   the current initial state)
Examples: h(¬he) = 1; h(On(A,B)) = 2; h(he) = 0
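With the expand_pg sketch above, the single-literal
cost is a first-appearance lookup:

  def h_literal(props, lit, leveled_off):
      # cost = index of the first proposition level containing lit
      for k, level in enumerate(props):
          if lit in level:
              return k
      # never appeared: l+1 if the PG was stopped early,
      # infinity if it has leveled off
      return float('inf') if leveled_off else len(props)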
46
Estimating the cost of a set of literals (e.g. a
state in regression search)
Idea 0. Max heuristic:
  h_max({p,q,r,...}) = max{h(p), h(q), ...}
Admissible, but very weak in practice.
Idea 2. Sum heuristic: make the subgoal
independence assumption:
  h_ind({p,q,r,...}) = h(p) + h(q) + h(r) + ...
Much better than the set-difference heuristic in
practice.
--Ignores +ve interactions:
  h({¬he, h-A}) = h(¬he) + h(h-A) = 1 + 1 = 2
But we can achieve both the literals with just a
single action, Pickup(A). So the real cost is 1.
--Ignores -ve interactions:
  h({¬cl(B), he}) = 1 + 0 = 1
But there is really no plan that can achieve these
two literals in this problem. So the real cost
is infinity!
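Both ideas are one-liners on top of the h_literal
sketch above:

  def h_max(props, lits, leveled_off):
      # admissible but weak
      return max(h_literal(props, l, leveled_off) for l in lits)

  def h_sum(props, lits, leveled_off):
      # subgoal independence: ignores +ve and -ve interactions
      return sum(h_literal(props, l, leveled_off) for l in lits)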
47
We can do a better job of accounting for +ve
interactions if we define the cost of a set of
literals in terms of the level:
  h_lev({p,q,r,...}) = the index of the first level
  of the PG where p, q, r appear together
So h_lev({¬he, h-A}) = 1. Interestingly, h_lev is
an admissible heuristic, even though h_ind is not!
(Prove.)
To better account for -ve interactions, we need to
start looking into the feasibility of subsets of
literals actually being true together in a
proposition level. Specifically, in each
proposition level, we want to mark not just which
individual literals are feasible, but also which
pairs, which triples, which quadruples, and which
n-tuples are feasible. (It is quite possible that
two literals are independently feasible in level
k, but not feasible together in that level.)
--The idea then is to say that the cost of a set S
of literals is the index of the first level of the
planning graph where no subset of S is marked
infeasible.
--The full-scale mark-up is very costly, and makes
the cost of planning-graph construction equal the
cost of enumerating the full progression search
tree.
--Since we only want estimates, it is okay if we
talk of feasibility of up to k-tuples.
--For the special case of feasibility of k=2
(2-sized subsets), there are some very efficient
marking and propagation procedures. This is the
idea of marking and propagating mutual exclusion
(mutex) relations.
48
  • Rule 1. Two actions a1 and a2 are mutex if
  • both of the actions are non-noop actions, or
  • a1 is a noop action supporting P, and a2 either
    needs ¬P, or gives ¬P, or
  • some precondition of a1 is marked mutex with
    some precondition of a2

Rule 2. Two propositions P1 and P2 are marked
mutex if all actions supporting P1 are pairwise
mutex with all actions supporting P2.
(A sketch of both rules in code follows below.)
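A sketch of both rules for one level; the
representation (an is_noop flag, a noop's single
supported literal in .eff, prev_mutex as the set of
mutex proposition pairs from the previous level,
supports as a map from literals to their level-k
supporting actions) is an assumption:

  def actions_mutex(a1, a2, prev_mutex):
      # Rule 1, clause 1: both non-noop
      if not a1.is_noop and not a2.is_noop:
          return True
      # Rule 1, clause 2: a noop supporting P vs. an action
      # that needs ~P or gives ~P
      for noop, other in ((a1, a2), (a2, a1)):
          if noop.is_noop:
              (v, val), = noop.eff.items()
              if other.prec.get(v, val) != val or other.eff.get(v, val) != val:
                  return True
      # Rule 1, clause 3: some preconditions are mutex
      return any(frozenset((p1, p2)) in prev_mutex
                 for p1 in a1.prec.items() for p2 in a2.prec.items())

  def props_mutex(p1, p2, supports, mutex_action_pairs):
      # Rule 2: every supporter of p1 is mutex with every supporter
      # of p2 (a shared supporter makes them non-mutex, as with
      # Stack-A-B supporting both ~cl(B) and he)
      return all(frozenset((x, y)) in mutex_action_pairs
                 for x in supports[p1] for y in supports[p2])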
49
[Figure: the one-level planning graph again, now
with mutex arrows marked between the level-1
actions and between mutex level-1 propositions.]
50
[Figure: the two-level planning graph again, with
the mutex relations propagated to the level-2
actions and propositions.]
51
  • Here is how it goes. We know that at every time
    step we are really only going to do one non-noop
    action. So, at the first level either Pickup-A,
    Pickup-B, or Pickup-C is done. If one of them is
    done, the others can't be. So, we put red arrows
    to signify that each pair of actions is mutually
    exclusive.
  • Now, we can PROPAGATE the mutex relations to the
    proposition levels.
  • Rule 1. Two actions a1 and a2 are mutex if
  • both of the actions are non-noop actions, or
  • a1 is a noop action supporting P, and a2 either
    needs ¬P, or gives ¬P, or
  • some precondition of a1 is marked mutex with
    some precondition of a2
  • By this rule Pick-A is mutex with Pick-B.
    Similarly, the noop action for he is mutex with
    Pick-A.
  • Rule 2. Two propositions P1 and P2 are marked
    mutex if all actions supporting P1 are pairwise
    mutex with all actions supporting P2.
  • By this rule, h-A and h-B are mutex in level 1,
    since the only action giving h-A is mutex with
    the only action giving h-B.
  • ¬cl(B) and he are mutex in the first level, but
    are not mutex in the second level (note that
    ¬cl(B) is supported by a noop and Stack-A-B
    (among others) in level 2, and he is supported by
    Stack-A-B and a noop (among others). At least one
    action, Stack-A-B, supporting the first is
    non-mutex with one action, Stack-A-B, supporting
    the second).

52
Some observations about the structure of the PG:
1. If an action a is present in level l, it will
be present in all subsequent levels.
2. If a literal p is present in level l, it will
be present in all subsequent levels.
3. If two literals p, q are not mutex in level l,
they will never be mutex in subsequent levels.
--Mutex relations relax monotonically as we grow
the PG.
1, 2, and 3 imply that a PG can be represented
efficiently in a bi-level structure: one level for
propositions and one level for actions. For each
proposition/action, we just track the first time
instant it got into the PG. For mutex relations we
track the first time instant they went away.
A PG is said to have leveled off if there are no
differences between two consecutive proposition
levels in terms of propositions or mutex
relations.
--Even if you grow it further, no more changes can
occur..
53
Level-based heuristics on planning graph with
mutex relations
We now modify the h_lev heuristic as follows:
  h_lev({p1, ..., pn}) = the index of the first
  level of the PG where p1, ..., pn appear together
  and no pair of them is marked mutex.
  (If there is no such level, then h_lev is set to
  l+1 if the PG has been expanded to l levels, and
  to infinity if it has been expanded until it
  leveled off.)
This heuristic is admissible. With this heuristic,
we have a much better handle on both +ve and -ve
interactions. In our example, this heuristic gives
the following reasonable costs:
  h({¬he, cl-A}) = 1
  h({¬cl-B, he}) = 2
  h({he, h-A}) = infinity (because they will be
  marked mutex even in the final level of the
  leveled PG)
Works very well in practice.
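The modification is a direct extension of the
h_literal sketch; here prop_mutex[k] is assumed to
hold the set of mutex proposition pairs at level k,
from the marking above:

  def h_lev(props, prop_mutex, lits, leveled_off):
      lits = list(lits)
      pairs = [frozenset((p, q)) for i, p in enumerate(lits)
               for q in lits[i + 1:]]
      for k, level in enumerate(props):
          if (all(l in level for l in lits)
                  and not any(pr in prop_mutex[k] for pr in pairs)):
              return k                 # all present, no pair mutex
      return float('inf') if leveled_off else len(props)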
54
Lecture of Oct 25th
  • Agenda
  • Demo of AltAlt, a planner that uses the PG as a
    heuristic
  • Qns on PG
  • Use of PGs in Progression vs. Regression
  • Other uses of PG (action selection)
  • PGs as basis for planning as CSP

55
AltAlt
Uses a hybrid of level and sum heuristics
--sacrifices admissibility
--uses a partial PG to keep heuristic cost down
56
Empirical evaluation: Logistics domain
(HadjSum2M heuristic)
Problems and domains from the AIPS-2000 Planning
Competition (AltAlt was approximately in the top
four)
57
Do PG expansion only up to the level l where all
top-level goals come in without being mutex.
AIPS-00 Schedule Domain
[Charts: plan quality and search cost.]
58
Qns on PG?
  • Consider a set of subgoals {p,q,r,s}
  • If the set appears at level 12 without any pair
    being mutex, is there guaranteed to be a plan to
    achieve p,q,r,s using a 12-step plan?
  • If p,q appear in level 12 without being mutex,
    is there a guaranteed 12-step plan to achieve
    p,q?
  • If p appears in level 12 without being mutex,
    is there a guaranteed 12-step plan to achieve p?

The PG does approximate reachability analysis.
59
Progression
Regression
A PG-based heuristic can give two things:
1. Goal-directedness
2. Consistency
Progression needs 1 more, so it can get by without
mutex propagation. Regression needs 2 more, so it
may need even higher consistency information than
is provided by the normal PG.
60
Use of PG in Progression vs Regression
  • Progression
  • Need to compute a PG for each child state
  • As many PGs as there are leaf nodes!
  • Much higher cost for heuristic computation
  • Can try exploiting overlap between different PGs
  • However, the states in progression are
    consistent..
  • So, handling negative interactions is not that
    important
  • Overall, the PG gives better guidance
  • Regression
  • Need to compute the PG only once, for the given
    initial state.
  • Much lower cost in computing the heuristic
  • However, states in regression are partial states
    and can thus be inconsistent
  • So, taking negative interactions into account
    using mutexes is important
  • Costlier PG construction
  • Overall, the PG's guidance is not as good unless
    higher-order mutexes are also taken into account

Historically, the heuristic was first used with
progression planners. Then it was used with
regression planners. Then they found progression
planners do better. Then they found that combining
them is even better.
61
PGs for reducing actions
  • If you just use the action instances at the final
    action level of a leveled PG, then you are
    guaranteed to preserve completeness
  • Reason: any action that can be done in a state
    that is even possibly reachable from the init
    state is in that last level
  • Cuts down the branching factor significantly
  • Sometimes, you can take riskier gambles:
  • If you are considering the goals p,q,r,s, just
    look at the actions that appear in the level
    preceding the first level where p,q,r,s appear
    for the first time without mutex.

62
PGs can be used as a basis for finding plans
directly
If there exists a k-length plan, it will be a
subgraph of the k-length planning graph.
(See the highlighted subgraph of the PG for our
example problem.)
63
Finding the subgraphs that correspond to valid
solutions:
--Can use specialized graph traversal techniques
--Start from the end; put the vertices
corresponding to the goals in.
  --If they are mutex, no solution.
  --Else, put at least one of the supports of
  those goals in.
  --Make sure that the supports are not mutex.
    --If they are mutex, backtrack and choose
    another set of supports.
    (No backtracking is needed if we have no
    mutexes; this is the basis for relaxed plans.)
--At the next level, subgoal on the preconditions
of the support actions we chose.
--The recursion ends at the init level.
--Consider extracting the plan from the PG
directly. (A sketch follows below.)
--This search can also be cast as a CSP:
  Variables: literals in proposition lists
  Values: actions supporting them
  Constraints: mutex and activation

This is the idea behind Graphplan.
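A sketch of the backward extraction; supports(g, k)
(the level-k actions giving literal g) and act_mutex
(per-level sets of mutex action pairs) are assumed
helpers built during PG expansion:

  from itertools import product

  def extract(goals, k, supports, act_mutex, level0):
      if k == 0:
          # the recursion ends at the init level
          return [] if all(g in level0 for g in goals) else None
      for choice in product(*(supports(g, k) for g in goals)):
          chosen = set(choice)             # one support per goal
          if any(frozenset((a, b)) in act_mutex[k]
                 for a in chosen for b in chosen if a is not b):
              continue                     # mutex supports: backtrack
          subgoals = {p for a in chosen for p in a.prec.items()}
          prefix = extract(subgoals, k - 1, supports, act_mutex, level0)
          if prefix is not None:
              return prefix + [chosen]     # one action set per level
      return None

With no mutexes, the first support choice always
goes through, which is the no-backtracking,
relaxed-plan case mentioned above.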
64
(No Transcript)
65
Backward search in Graphplan (animated)
66
Extraction of a plan and n-ary mutexes
  • One can see the process of extracting the plan as
    verifying that at least one execution thread is
    devoid of n-ary mutexes
  • Example: actions A1..A100 give goals G1..G100,
    with no preconditions.
  • At what level are G1..G100 all true?
  • What is the length of the plan?

67
Conversion to CSP

--This search can also be cast as a CSP:
  Variables: literals in proposition lists
  Values: actions supporting them
  Constraints: mutex and activation constraints

Variables/Domains:
  ¬cl-B-2: {St-A-B-2, Pick-B-2}
  he-2: {St-A-B-2, St-B-A-2, Ptdn-A-2, Ptdn-B-2}
  h-A-1: {Pick-A-1}
  h-B-1: {Pick-B-1}
  ...
Constraints:
  he-2 = St-A-B-2 => h-A-1 ≠ ⊥
  (activation)
  On-A-B-2 = St-A-B-2 => On-B-A-2 ≠ St-B-A-2
  (mutex constraints)
Goals: ¬cl-B-2 ≠ ⊥, he-2 ≠ ⊥
68
Mutex propagation as CSP pre-processing
  • Suppose we start with a PG that only marks every
    pair of interfering actions as mutex
  • Any pair of non-noop actions are interfering
  • Any pair of actions are interfering if one gives
    ¬P and the other gives or requires P
  • No propagation is done
  • Converting this PG to a CSP and solving it will
    still give a valid solution (if there is one)
  • So what is mutex propagation doing?
  • It is explicating implicit constraints
  • A special subset of 3-consistency enforcement
  • Recall that enforcing k-consistency involves
    adding (k-1)-ary constraints
  • Not full 3-consistency (which can be much
    costlier)
  • So enforcing the consistency on the PG is cheaper
    than enforcing it after conversion to a CSP...

69
What else in planning
  • Conditional planning
  • Incomplete initial state
  • Naïve idea: consider all completions of the init
    state and solve each of the problems
  • Better idea: extract a tree plan directly
  • Sensing during planning
  • Interleaving planning and execution
  • Planning to sense
  • Stochastic planning
  • POMDPs
  • (stochastic and incomplete)
  • For classical problems
  • Other search algorithms
  • Partial order (least commitment) planning
  • Other compilation methods
  • Compile to SAT,
  • Integer Linear Programming
  • Binary Decision Diagrams
  • Exploiting additional domain knowledge
  • E.g. HTN planning
  • Metric/Temporal planning
  • Durative actions, resources
  • Combines planning and scheduling considerations
  • Extensions of PG heuristics seem to work (see
    SAPA)

Check out CSE 574 if interested...
70
Partial order planning
  • Partial order (least commitment) planning
  • Use precedence constraints instead of growing a
    prefix/suffix
  • Separates planning and scheduling aspects cleanly
  • Could be advantageous when we have concurrent
    execution
  • Was thought to be too hard to control
  • But see our RePOP system

71
Position, Relevance, and Commitment
(Planning's own version of the "Tastes Great" /
"Less Filling" debate)

FSR and BSR must commit to both position and
relevance of actions
  + Gives state information
  - Leads to premature commitment
Plan-space refinement (PSR) avoids constraining
position
  + Reduces commitment
  - Increases plan-handling costs