NMS PI meeting, September 27-29, 2000

Slides: 37

Provided by: edw116

Learn more at: https://www.public.asu.edu

Transcript and Presenter's Notes

1
GraphPlan, Satplan and Markov Decision Processes
Sungwook Yoon

Based in part on slides by Alan Fern

2
GraphPlan

Many planning systems use ideas from Graphplan
IPP, STAN, SGP, Blackbox, Medic
Can run much faster than POP-style planners
History
Before GraphPlan came out, most planning
researchers were working on POP-style planners
GraphPlan started them thinking about other more
efficient algorithms
Recent planning algorithms run much faster than
GraphPlan
However, most of them have been
influenced by GraphPlan

3
Big Picture

A big source of inefficiency in search algorithms
is the large branching factor
GraphPlan reduces the branching factor by
searching in a special data structure

Phase 1: Create a Planning Graph
built from the initial state
contains actions and propositions that are
possibly reachable from the initial state
does not include unreachable actions or
propositions
Phase 2: Solution Extraction
Backward search for a solution in
the planning graph, starting from the goal

4
Planning Graph
A literal is just a positive or negative
proposition

Sequence of levels that correspond to time-steps
in the plan
Each level contains a set of literals and a set
of actions
Literals are those that could possibly be true at
the time step
Actions are those whose preconditions could
be satisfied at the time step.
Idea: construct a superset of the literals that
could possibly be achieved after an n-level
layered plan
Gives a compact (but approximate) representation
of states that are reachable by n level plans

5
Planning Graph
[Figure: planning graph; layers labeled "propositions" and "actions"]
6
Planning Graph

Maintenance actions (persistence actions)
represent what happens if no action affects the
literal
include an action with precondition c and effect c,
for each literal c

[Figure: planning graph; layers labeled "propositions" and "actions"]
7
Graph expansion

Initial proposition layer
Just the initial conditions
Action layer n
If all of an action's preconditions are in
proposition layer n, then add the action to layer n
Proposition layer n+1
For each action at layer n (including persistence
actions)
Add all its effects (both positive and negative)
at layer n+1
(This also allows propositions at layer n to
persist to layer n+1)
Propagate mutex information (we'll talk about
this in a moment)
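
The expansion step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Action` tuple, the `expand` function, and the `~p` spelling for negative literals are hypothetical names, mutex propagation is left out, and the `stack(A,B)` operator is written with the usual blocks-world effects.

```python
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    pre: frozenset   # literals that must appear in proposition layer n
    eff: frozenset   # literals added to layer n+1 ("~p" marks a negative literal)

def expand(props, actions):
    """One expansion step: return (action layer n, proposition layer n+1).

    Mutex bookkeeping is deliberately omitted in this sketch.
    """
    # Persistence actions: precondition c and effect c, for each literal c.
    noops = [Action(f"noop({c})", frozenset([c]), frozenset([c]))
             for c in sorted(props)]
    # An action joins layer n only if all its preconditions are in layer n.
    layer = [a for a in actions if a.pre <= props] + noops
    # Layer n+1 collects every effect of every action in layer n.
    next_props = frozenset().union(*(a.eff for a in layer))
    return layer, next_props

# Assumed blocks-world operator (standard effects, not taken from the slides).
stack = Action("stack(A,B)",
               frozenset({"holding(A)", "clear(B)"}),
               frozenset({"~holding(A)", "~clear(B)", "on(A,B)",
                          "clear(A)", "handempty"}))
s0 = frozenset({"holding(A)", "clear(B)"})
a0, s1 = expand(s0, [stack])
print([a.name for a in a0])
print(sorted(s1))
```

Note that `s1` contains both `holding(A)` and `~holding(A)`: the layer is a superset of what any single plan can achieve, which is why mutex information is needed.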

8
Example
stack(A,B)
precondition: holding(A), clear(B)
effect: ¬holding(A), ¬clear(B), on(A,B), clear(A), handempty
[Figure: one-level planning graph]
s0: holding(A), clear(B)
a0: stack(A,B)
s1: holding(A), ¬holding(A), handempty, on(A,B), clear(B), ¬clear(B)
9
Example
Notice that not all literals in s1 can be made
true simultaneously after 1 level, e.g.
holding(A), ¬holding(A) and on(A,B), clear(B)
10
Mutual Exclusion (Mutex)

Between pairs of actions
no valid plan could contain both at layer n
E.g., stack(a,b), unstack(a,b)
Between pairs of literals
no valid plan could produce both at layer n
E.g., clear(a), ¬clear(a); on(a,b),
clear(b)
GraphPlan checks pairs only
mutex relationships can help rule out
possibilities during search in phase 2 of
Graphplan

11
Solution Extraction: Backward Search
Repeat until the goal set is empty
If the goals are present and non-mutex
1) Choose a set of non-mutex actions to achieve each goal
2) Add their preconditions to the next goal set
12
Searching for a solution plan

Backward chain on the planning graph
Achieve goals level by level
At level k, pick a subset of non-mutex actions to
achieve current goals. Their preconditions become
the goals for k-1 level.
Build the goal subset by picking each goal and
choosing an action to add. Use one already
selected if possible (backtrack if we can't pick
a non-mutex action)
If we reach the initial proposition level and the
current goals are in that level (i.e. they are
true in the initial state) then we have found a
successful layered plan

13
GraphPlan algorithm

Grow the planning graph (PG) to a level n such
that all goals are reachable and not mutex
necessary but insufficient condition for the
existence of an n level plan that achieves the
goals
if PG levels off before non-mutex goals are
achieved then fail
Search the PG for a valid plan
If none found, add a level to the PG and try
again
If the PG levels off and still no valid plan
found, then return failure
Correctness follows from PG properties

14
Important Ideas

Plan graph construction is polynomial time
Though construction can be expensive when there
are many objects and hence many propositions
The plan graph captures important properties of
the planning problem
Necessarily unreachable literals and actions
Possibly reachable literals and actions
Mutually exclusive literals and actions
Significantly prunes search space compared to POP
style planners
The plan graph provides a sound termination
procedure
Knows when no plan exists
Plan graphs can also be used for deriving
admissible (and good non-admissible) heuristics
See your book (we may come back to this idea
later)

15
Encoding Planning as Satisfiability Basic Idea

Bounded planning problem (P,n)
P is a planning problem n is a positive integer
Find a solution for P of length n
Create a propositional formula that represents
Initial state
Goal
Action Dynamics
for n time steps
We will define the formula for (P,n) such that
1) any model (i.e. satisfying truth assignment)
of the formula represents a solution to (P,n)
2) if (P,n) has a solution then the
formula is satisfiable

16
Example of Complete Formula for (P,1)

at(r1,l1,0) ∧ ¬at(r1,l2,0) ∧
at(r1,l2,1) ∧
(move(r1,l1,l2,0) → at(r1,l1,0)) ∧
(move(r1,l1,l2,0) → at(r1,l2,1)) ∧
(move(r1,l1,l2,0) → ¬at(r1,l1,1)) ∧
(move(r1,l2,l1,0) → at(r1,l2,0)) ∧
(move(r1,l2,l1,0) → at(r1,l1,1)) ∧
(move(r1,l2,l1,0) → ¬at(r1,l2,1)) ∧
(¬move(r1,l1,l2,0) ∨ ¬move(r1,l2,l1,0)) ∧
(¬at(r1,l1,0) ∧ at(r1,l1,1) → move(r1,l2,l1,0)) ∧
(¬at(r1,l2,0) ∧ at(r1,l2,1) → move(r1,l1,l2,0)) ∧
(at(r1,l1,0) ∧ ¬at(r1,l1,1) → move(r1,l1,l2,0)) ∧
(at(r1,l2,0) ∧ ¬at(r1,l2,1) → move(r1,l2,l1,0))

The formula has propositions for actions and state
variables at each possible timestep
We'll now discuss how to construct such a formula
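
As a sanity check on the (P,1) formula, one can enumerate all truth assignments by brute force. The sketch below assumes the garbled connectives are the usual ∧, ¬, → and ∨; the short variable names (`at1_0` for at(r1,l1,0), `mv12` for move(r1,l1,l2,0), and so on) are shorthand introduced here, not part of the slides.

```python
from itertools import product

VARS = ["at1_0", "at2_0", "at1_1", "at2_1", "mv12", "mv21"]

def implies(p, q):
    return (not p) or q

def phi(at1_0, at2_0, at1_1, at2_1, mv12, mv21):
    """Conjunction of all clauses of the (P,1) formula."""
    return all([
        at1_0 and not at2_0,                    # initial state
        at2_1,                                  # goal
        implies(mv12, at1_0),                   # move(r1,l1,l2,0): precondition
        implies(mv12, at2_1),                   #   add effect
        implies(mv12, not at1_1),               #   delete effect
        implies(mv21, at2_0),                   # move(r1,l2,l1,0): precondition
        implies(mv21, at1_1),                   #   add effect
        implies(mv21, not at2_1),               #   delete effect
        (not mv12) or (not mv21),               # at most one action per step
        implies((not at1_0) and at1_1, mv21),   # a fact changes only if an
        implies((not at2_0) and at2_1, mv12),   # action that changes it ran
        implies(at1_0 and not at1_1, mv12),
        implies(at2_0 and not at2_1, mv21),
    ])

# Enumerate all 2^6 assignments and keep the satisfying ones.
models = [dict(zip(VARS, vals))
          for vals in product([False, True], repeat=len(VARS))
          if phi(*vals)]
print(models)
```

Enumeration is only feasible because there are six variables; a real SAT solver would be used for anything larger.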
17
Overall Approach

Do iterative deepening like we did with
Graphplan
for n = 0, 1, 2, …
encode (P,n) as a satisfiability problem Φ
if Φ is satisfiable, then
From the set of truth values that satisfies Φ, a
solution plan can be constructed, so return it
and exit
With a complete satisfiability tester, this
approach will produce optimal layered plans for
solvable problems
We can use a GraphPlan analysis to determine an
upper bound on n, giving a way to detect
unsolvability
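
The iterative-deepening loop can be sketched as follows. The function name `satplan`, its three callback parameters, and the `max_n` bound are all hypothetical; the toy callbacks at the bottom only exercise the control flow and stand in for a real encoder and SAT solver.

```python
def satplan(encode, solve, extract, max_n):
    """Try n = 0, 1, 2, ... up to max_n; return the first plan found, else None."""
    for n in range(max_n + 1):
        phi = encode(n)        # formula for the bounded problem (P, n)
        model = solve(phi)     # satisfying assignment, or None if unsatisfiable
        if model is not None:
            return extract(model, n)
    return None                # no plan within the bound

# Toy stand-ins (assumptions): the "formula" is just n, satisfiable once n >= 2.
plan = satplan(
    encode=lambda n: n,
    solve=lambda phi: {"n": phi} if phi >= 2 else None,
    extract=lambda model, n: [f"step{i}" for i in range(n)],
    max_n=5,
)
print(plan)
```

Because n grows one step at a time, the first plan returned uses the fewest time steps, provided `solve` is complete.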

18
Fluents (will be used as propositions)

If plan ⟨a₀, a₁, …, aₙ₋₁⟩ is a solution to (P,n),
then it generates a sequence of states ⟨s₀, s₁,
…, sₙ⟩
A fluent is a proposition used to describe what's
true in each sᵢ
at(r1,loc1,i) is a fluent that's true iff
at(r1,loc1) is in sᵢ
We'll use eᵢ to denote the fluent for a fact e in
state sᵢ
e.g. if e = at(r1,loc1)
then eᵢ = at(r1,loc1,i)
aᵢ is a fluent saying that a is the action taken at
step i
e.g., if a = move(r1,loc2,loc1)
then aᵢ = move(r1,loc2,loc1,i)
The set of all possible fluents for (P,n) form
the set of primitive propositions used to
construct our formula for (P,n)

19
Encoding Planning Problems

We can encode (P,n) so that we consider either
layered plans or totally ordered plans
an advantage of considering layered plans is that
fewer time steps are necessary (i.e. smaller n
translates into smaller formulas)
for simplicity we first consider totally-ordered
plans
Encode (P,n) as a formula Φ such that
⟨a₀, a₁, …, aₙ₋₁⟩ is a solution for (P,n)
if and only if
Φ can be satisfied in a way that makes the
fluents a₀, …, aₙ₋₁ true
Φ will be a conjunction of many other formulas

20
Formulas in Φ

Formula describing the initial state (let E be
the set of possible facts in the planning
problem)
⋀{e₀ | e ∈ s₀} ∧ ⋀{¬e₀ | e ∈ E − s₀}
Describes the complete initial state (both
positive and negative facts)
E.g. on(A,B,0) ∧ ¬on(B,A,0)
Formula describing the goal (G is the set of goal
facts)
⋀{eₙ | e ∈ G}
says that the goal facts must be true in the
final state at timestep n
E.g. on(B,A,n)
Is this enough?
Of course not. The formulas say nothing about
actions.

21
Formulas in Φ

For every action a and timestep i, formulas
describing what fluents must be true if a were
the ith step of the plan
aᵢ → ⋀{eᵢ | e ∈ Precond(a)}, a's
preconditions must be true
aᵢ → ⋀{eᵢ₊₁ | e ∈ ADD(a)}, a's ADD effects
must be true at i+1
aᵢ → ⋀{¬eᵢ₊₁ | e ∈ DEL(a)}, a's DEL
effects must be false at i+1
Complete exclusion axiom
For all distinct actions a and b and timesteps i,
formulas saying a and b can't occur at the same time
¬aᵢ ∨ ¬bᵢ
this guarantees there can be only one action at a
time
Is this enough?
The formulas say nothing about what happens to
facts if they are not affected by an action
This is known as the frame problem
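
The action and exclusion axioms above are mechanical to generate. The sketch below emits them as readable strings; the function names and the `fact@i` spelling of a fluent at timestep i are conventions invented for this example, and nothing here addresses the frame problem.

```python
from itertools import combinations

def action_axioms(name, pre, add, delete, i):
    """Axioms saying what must hold if action `name` is step i of the plan."""
    a = f"{name}@{i}"
    axioms = [f"{a} -> {e}@{i}" for e in pre]              # preconditions at i
    axioms += [f"{a} -> {e}@{i + 1}" for e in add]         # ADD effects at i+1
    axioms += [f"{a} -> ~{e}@{i + 1}" for e in delete]     # DEL effects at i+1
    return axioms

def exclusion_axioms(names, i):
    """Complete exclusion: no two distinct actions at the same timestep."""
    return [f"~{a}@{i} | ~{b}@{i}" for a, b in combinations(names, 2)]

move12 = action_axioms("move(r1,l1,l2)",
                       pre=["at(r1,l1)"], add=["at(r1,l2)"],
                       delete=["at(r1,l1)"], i=0)
excl = exclusion_axioms(["move(r1,l1,l2)", "move(r1,l2,l1)"], 0)
for line in move12 + excl:
    print(line)
```

The number of exclusion axioms grows quadratically in the number of ground actions per timestep, which is one reason layered encodings are attractive.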

22
Example

Planning domain
one robot r1
two adjacent locations l1, l2
one operator (move the robot)
Encode (P,n) where n = 1
Initial state: at(r1,l1)
Encoding: at(r1,l1,0) ∧ ¬at(r1,l2,0)
Goal: at(r1,l2)
Encoding: at(r1,l2,1)
Action Schema see next slide

23
Extracting a Plan

Suppose we find an assignment of truth values
that satisfies Φ.
This means P has a solution of length n
For i = 0, …, n−1, there will be exactly one action a
such that aᵢ = true
This is the ith action of the plan.
Example (from the previous slides)
Φ can be satisfied with move(r1,l1,l2,0) = true
Thus ⟨move(r1,l1,l2,0)⟩ is a solution for (P,1)
It's the only solution - no other way to satisfy
Φ
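
Reading the plan off a satisfying assignment can be sketched like this. The representation is an assumption made for the example: the model is a dict keyed by `(action_name, timestep)` pairs, and `extract_plan` is a hypothetical helper name.

```python
def extract_plan(model, action_names, n):
    """Return the list of actions whose fluents are true at steps 0..n-1."""
    plan = []
    for i in range(n):
        chosen = [a for a in action_names if model.get((a, i), False)]
        # The totally ordered encoding should leave exactly one action per step.
        if len(chosen) != 1:
            raise ValueError(f"expected one action at step {i}, got {chosen}")
        plan.append(chosen[0])
    return plan

# The single model of the (P,1) example, restricted to its action fluents.
model = {("move(r1,l1,l2)", 0): True, ("move(r1,l2,l1)", 0): False}
plan = extract_plan(model, ["move(r1,l1,l2)", "move(r1,l2,l1)"], n=1)
print(plan)
```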

24
What SATPLAN Shows

General propositional reasoning can compete with
state of the art specialized planning systems
New, highly tuned variations of DP are surprisingly
powerful
Radically new stochastic approaches to SAT can
provide very low exponential scaling
Why does it work?
More flexible than forward or backward chaining
Randomized algorithms less likely to get trapped
along bad paths

25
BlackBox (GraphPlan + SatPlan)

The BlackBox procedure combines planning-graph
expansion and satisfiability checking
It is roughly as follows
for n = 0, 1, 2, …
Graph expansion
create a planning graph that contains n
levels
Check whether the planning graph satisfies a
necessary (but insufficient) condition for plan
existence
If it does, then
Encode (P,n) as a satisfiability problem Φ but
include only the actions in the planning graph
If Φ is satisfiable then return the solution

26
Blackbox
Can be thought of as an implementation of
GraphPlan that replaces GraphPlan's backward
chaining with a different plan extraction
technique.
[Diagram: STRIPS → Translator (Plan Graph, Mutex computation) → CNF → Simplifier → CNF → General Stochastic / Systematic SAT engines → Solution]
27
Classical Planning Assumptions
[Diagram: Actions, Percepts, World. Annotations: sole source of change, perfect, deterministic, fully observable, instantaneous]
28
Stochastic/Probabilistic Planning: Markov
Decision Process (MDP) Model
[Diagram: Actions, Percepts, World. Annotations: sole source of change, perfect, stochastic, fully observable, instantaneous]
29
Types of Uncertainty

Disjunctive (used by non-deterministic planning)
Next state could be one of a set of states.
Stochastic/Probabilistic
Next state is drawn from a probability
distribution over the set of states.
How are these models related?

30
Markov Decision Processes

An MDP has four components: S, A, R, T
(finite) state set S (|S| = n)
(finite) action set A (|A| = m)
(Markov) transition function T(s,a,s') = Pr(s' |
s,a)
Probability of going to state s' after taking
action a in state s
How many parameters does it take to represent?
bounded, real-valued reward function R(s)
Immediate reward we get for being in state s
For example in a goal-based domain R(s) may equal
1 for goal states and 0 for all others
Can be generalized to include action costs
R(s,a)
Can be generalized to be a stochastic function
Can easily generalize to countable or continuous
state and action spaces (but algorithms will be
different)
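
One possible in-memory representation of the four components is sketched below. The `MDP` class, its field layout, and the two-state example are assumptions made for illustration. On the parameter-count question: a full transition table has n·n·m entries, of which n·m·(n−1) are free because each row must sum to 1.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list
    actions: list
    T: dict   # T[(s, a)] -> {s_next: probability}; missing s_next means 0
    R: dict   # R[s] -> immediate reward for being in state s

    def check(self):
        """Each (s, a) row of T must be a probability distribution."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.T[(s, a)].values())
                assert abs(total - 1.0) < 1e-9, (s, a, total)
        return True

    def num_transition_entries(self):
        """Size of the full table T(s, a, s'): n * m * n."""
        return len(self.states) ** 2 * len(self.actions)

# Hypothetical goal-based domain: state 1 is the goal, reward 1 there, 0 elsewhere.
mdp = MDP(
    states=[0, 1],
    actions=["stay", "go"],
    T={(0, "stay"): {0: 1.0}, (0, "go"): {1: 0.9, 0: 0.1},
       (1, "stay"): {1: 1.0}, (1, "go"): {0: 1.0}},
    R={0: 0.0, 1: 1.0},
)
print(mdp.check(), mdp.num_transition_entries())
```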

31
Graphical View of MDP
[Figure: graphical model with action nodes Aₜ, Aₜ₊₁, state nodes Sₜ, Sₜ₊₁, Sₜ₊₂, and reward nodes Rₜ, Rₜ₊₁, Rₜ₊₂]
32
Assumptions

First-Order Markovian dynamics (history
independence)
Pr(Sₜ₊₁ | Aₜ, Sₜ, Aₜ₋₁, Sₜ₋₁, ..., S₀) = Pr(Sₜ₊₁ | Aₜ, Sₜ)
Next state only depends on current state and
current action
First-Order Markovian reward process
Pr(Rₜ | Aₜ, Sₜ, Aₜ₋₁, Sₜ₋₁, ..., S₀) = Pr(Rₜ | Aₜ, Sₜ)
Reward only depends on current state and action
As described earlier we will assume reward is
specified by a deterministic function R(s)
i.e. Pr(Rₜ = R(Sₜ) | Aₜ, Sₜ) = 1
Stationary dynamics and reward
Pr(Sₜ₊₁ | Aₜ, Sₜ) = Pr(Sₖ₊₁ | Aₖ, Sₖ) for all t, k
The world dynamics do not depend on the absolute
time
Full observability
Though we can't predict exactly which state we
will reach when we execute an action, once it is
realized, we know what it is

33
Policies (plans for MDPs)

Nonstationary policy
π: S × T → A, where T is the non-negative integers
π(s,t) is the action to do at state s with t
stages-to-go
What if we want to keep acting indefinitely?
Stationary policy
π: S → A
π(s) is the action to do at state s (regardless of
time)
specifies a continuously reactive controller
These assume or have these properties
full observability
history-independence
deterministic action choice

Why not just consider sequences of actions? Why
not just replan?
34
Value of a Policy

How good is a policy π?
How do we measure accumulated reward?
Value function V: S → ℝ associates a value with each
state (or each state and time for non-stationary
π)
Vπ(s) denotes the value of the policy at state s
Depends on immediate reward, but also what you
achieve subsequently by following π
An optimal policy is one that is no worse than
any other policy at any state
The goal of MDP planning is to compute an optimal
policy (method depends on how we define value)

35
Policy Evaluation

Value equation for fixed policy
How can we compute the value function for a
policy?
we are given R and Pr
simple linear system with n variables (each
variable is the value of a state) and n constraints
(one value equation for each state)
Use linear algebra (e.g. matrix inverse)
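
The transcript does not preserve the value equation itself. Assuming the common discounted infinite-horizon criterion with a discount factor β in [0,1) (an assumption, not stated on the slide), it reads Vπ(s) = R(s) + β Σ over s' of T(s, π(s), s') · Vπ(s'), which is the linear system (I − βP)V = R. The sketch below solves it with plain Gauss-Jordan elimination; `solve` and `evaluate_policy` are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def evaluate_policy(P, R, beta):
    """P[i][j] = Pr(next state j | state i, action chosen by the policy)."""
    n = len(R)
    A = [[(1.0 if i == j else 0.0) - beta * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, R)

# Hypothetical 2-state chain under a fixed policy: state 0 moves to state 1,
# state 1 stays put; reward 1 in state 1 only.
P = [[0.0, 1.0],
     [0.0, 1.0]]
V = evaluate_policy(P, R=[0.0, 1.0], beta=0.9)
print(V)
```

For this chain the equations give V(1) = 1 / (1 − 0.9) = 10 and V(0) = 0.9 · V(1) = 9, up to floating-point rounding.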

36
Value Iteration vs. Policy Iteration

Which is faster? VI or PI
It depends on the problem
VI takes more iterations than PI, but PI requires
more time on each iteration
PI must perform policy evaluation on each step
which involves solving a linear system
Complexity
There are at most exp(n) policies, so PI is no
worse than exponential time in number of states
Empirically O(n) iterations are required
Still no polynomial bound on the number of PI
iterations (open problem)!
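
The iteration-count trade-off can be observed on a toy problem. The sketch below assumes a discounted criterion with factor `beta`, uses a hypothetical two-state MDP, and approximates the policy-evaluation step of PI by repeated sweeps rather than an exact linear solve, so it illustrates the comparison and is not a benchmark.

```python
def q(s, a, T, V):
    """Expected next-state value of taking action a in state s."""
    return sum(p * V[t] for t, p in T[(s, a)].items())

def value_iteration(S, A, T, R, beta, eps=1e-8):
    V, iters = {s: 0.0 for s in S}, 0
    while True:
        iters += 1
        new_V = {s: R[s] + beta * max(q(s, a, T, V) for a in A) for s in S}
        delta = max(abs(new_V[s] - V[s]) for s in S)
        V = new_V
        if delta < eps:
            return V, iters

def policy_iteration(S, A, T, R, beta, sweeps=2000):
    pi, iters = {s: A[0] for s in S}, 0
    while True:
        iters += 1
        # Policy evaluation (approximate: fixed number of sweeps).
        V = {s: 0.0 for s in S}
        for _ in range(sweeps):
            V = {s: R[s] + beta * q(s, pi[s], T, V) for s in S}
        # Policy improvement: switch only on strict improvement, to avoid
        # cycling between equally good actions.
        new_pi = dict(pi)
        for s in S:
            best = max(A, key=lambda a: q(s, a, T, V))
            if q(s, best, T, V) > q(s, pi[s], T, V) + 1e-12:
                new_pi[s] = best
        if new_pi == pi:
            return pi, V, iters
        pi = new_pi

# Hypothetical MDP: state 1 is rewarding; "go" from 0 reaches it w.p. 0.9.
S, A = [0, 1], ["stay", "go"]
T = {(0, "stay"): {0: 1.0}, (0, "go"): {1: 0.9, 0: 0.1},
     (1, "stay"): {1: 1.0}, (1, "go"): {0: 1.0}}
R = {0: 0.0, 1: 1.0}
V_vi, vi_iters = value_iteration(S, A, T, R, beta=0.9)
pi, V_pi, pi_iters = policy_iteration(S, A, T, R, beta=0.9)
print(pi, pi_iters, vi_iters)
```

On this example PI needs far fewer outer iterations than VI, but each PI iteration hides a full policy evaluation, which is the cost the slide refers to.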