1
GraphPlan, Satplan and Markov Decision Processes
Sungwook Yoon
  • Based in part on slides by Alan Fern

2
GraphPlan
  • Many planning systems use ideas from Graphplan
  • IPP, STAN, SGP, Blackbox, Medic
  • Can run much faster than POP-style planners
  • History
  • Before GraphPlan came out, most planning
    researchers were working on POP-style planners
  • GraphPlan started them thinking about other more
    efficient algorithms
  • Recent planning algorithms run much faster than
    GraphPlan
  • However, most of them have been influenced by
    GraphPlan

3
Big Picture
  • A big source of inefficiency in search algorithms
    is the large branching factor
  • GraphPlan reduces the branching factor by
    searching in a special data structure
  • Phase 1 - Create a Planning Graph
  • built from initial state
  • contains actions and propositions that are
    possibly reachable from initial state
  • does not include unreachable actions or
    propositions
  • Phase 2 - Solution Extraction
  • Backward search for a solution in the
    planning graph
  • backward from the goal

4
Planning Graph
A literal is just a positive or negative
proposition
  • Sequence of levels that correspond to time-steps
    in the plan
  • Each level contains a set of literals and a set
    of actions
  • Literals are those that could possibly be true at
    the time step
  • Actions are those whose preconditions could
    be satisfied at the time step
  • Idea: construct a superset of literals that
    could possibly be achieved after an n-level
    layered plan
  • Gives a compact (but approximate) representation
    of states that are reachable by n level plans

5
Planning Graph
[Figure: planning graph with alternating
proposition and action layers]
6
Planning Graph
  • maintenance actions (persistence actions)
  • represent what happens if no action affects the
    literal
  • include an action with precondition c and effect
    c, for each literal c

[Figure: planning graph with persistence actions
carrying each proposition forward]
7
Graph expansion
  • Initial proposition layer
  • Just the initial conditions
  • Action layer n
  • If all of an action's preconditions are in
    proposition layer n, then add the action to
    layer n
  • Proposition layer n+1
  • For each action at layer n (including persistence
    actions)
  • Add all its effects (both positive and negative)
    at layer n+1
  • (Also allow propositions at layer n to
    persist to n+1)
  • Propagate mutex information (we'll talk about
    this in a moment; the expansion step is sketched
    below)
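To make the expansion rules concrete, here is a minimal Python sketch of one expansion step. The Action record, the "not <p>" string encoding of negative literals, and the persistence helper are all assumptions of this sketch, not part of the original Graphplan code; mutex propagation is deferred to the next slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]    # precondition literals
    add: FrozenSet[str]    # positive effects
    dele: FrozenSet[str]   # negative effects ("del" is a Python keyword)

def persistence(lit: str) -> Action:
    # maintenance (persistence) action: carries one literal forward unchanged
    return Action(f"persist({lit})", frozenset([lit]),
                  frozenset([lit]), frozenset())

def expand(prop_layer: FrozenSet[str],
           actions: List[Action]) -> Tuple[List[Action], FrozenSet[str]]:
    # Action layer n: every action whose preconditions all appear in
    # proposition layer n, plus one persistence action per literal.
    layer_actions = [a for a in actions if a.pre <= prop_layer]
    layer_actions += [persistence(lit) for lit in prop_layer]
    # Proposition layer n+1: all positive and negative effects of layer n,
    # with negative literals written as "not <p>".
    next_props = set()
    for a in layer_actions:
        next_props |= a.add
        next_props |= {f"not {p}" for p in a.dele}
    return layer_actions, frozenset(next_props)
```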

8
Example
stack(A,B): precondition: holding(A), clear(B);
effect: ¬holding(A), ¬clear(B), on(A,B), clear(A),
handempty
[Figure: one-level planning graph. Proposition
layer s0: holding(A), clear(B). Action layer a0:
stack(A,B) plus persistence actions. Proposition
layer s1: holding(A), ¬holding(A), handempty,
on(A,B), clear(B), ¬clear(B).]
9
Example
stack(A,B): precondition: holding(A), clear(B);
effect: ¬holding(A), ¬clear(B), on(A,B), clear(A),
handempty
[Figure: the same one-level planning graph, with
conflicting literals in s1 highlighted.]
Notice that not all literals in s1 can be made
true simultaneously after 1 level, e.g.
holding(A), ¬holding(A) and on(A,B), clear(B)
10
Mutual Exclusion (Mutex)
  • Between pairs of actions
  • no valid plan could contain both at layer n
  • E.g., stack(a,b), unstack(a,b)
  • Between pairs of literals
  • no valid plan could produce both at layer n
  • E.g., clear(a), ¬clear(a); on(a,b),
    clear(b)
  • GraphPlan checks pairs only
  • mutex relationships can help rule out
    possibilities during search in phase 2 of
    Graphplan
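The standard pairwise tests can be sketched on top of the Action record above. The three action-mutex conditions (inconsistent effects, interference, competing needs) are the usual Graphplan definitions; the set representations of the mutex relations are assumptions of this sketch.

```python
def actions_mutex(a: Action, b: Action, prop_mutex: set) -> bool:
    # Inconsistent effects: one action deletes an add-effect of the other
    if (a.add & b.dele) or (b.add & a.dele):
        return True
    # Interference: one action deletes a precondition of the other
    if (a.dele & b.pre) or (b.dele & a.pre):
        return True
    # Competing needs: some pair of preconditions is mutex one layer back
    return any(frozenset((p, q)) in prop_mutex for p in a.pre for q in b.pre)

def literals_mutex(p: str, q: str, layer_actions: list,
                   act_mutex: set) -> bool:
    # A literal and its negation are always mutex
    if p == f"not {q}" or q == f"not {p}":
        return True
    # Otherwise mutex iff every pair of actions producing them is mutex
    def producers(lit: str) -> list:
        return [a for a in layer_actions
                if lit in a.add
                or (lit.startswith("not ") and lit[4:] in a.dele)]
    return all(frozenset((a.name, b.name)) in act_mutex
               for a in producers(p) for b in producers(q))
```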

11
Solution Extraction Backward Search
Repeat until the goal set is empty:
If the goals are present and non-mutex:
1) Choose a set of non-mutex actions to
achieve each goal
2) Add their preconditions to the next goal set
12
Searching for a solution plan
  • Backward chain on the planning graph
  • Achieve goals level by level
  • At level k, pick a subset of non-mutex actions to
    achieve the current goals. Their preconditions
    become the goals for level k-1.
  • Build the action subset by picking each goal and
    choosing an action to achieve it; reuse one
    already selected if possible (backtrack if we
    can't pick a non-mutex action)
  • If we reach the initial proposition level and the
    current goals are in that level (i.e. they are
    true in the initial state) then we have found a
    successful layered plan
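A recursive sketch of this backward extraction. Here graph is assumed to expose the proposition layers as graph.props, and choose_nonmutex_cover is a hypothetical generator that enumerates sets of pairwise non-mutex actions at the given level whose effects cover the goals.

```python
def extract(graph, goals: frozenset, level: int):
    # Base case: at the initial level, succeed iff the remaining goals
    # already hold in the initial state.
    if level == 0:
        return [] if goals <= graph.props[0] else None
    # Try each non-mutex action set covering the goals at this level
    for action_set in choose_nonmutex_cover(graph, goals, level):
        subgoals = frozenset().union(*(a.pre for a in action_set))
        plan = extract(graph, subgoals, level - 1)
        if plan is not None:
            return plan + [action_set]  # layered plan: one action set per level
    return None                         # no cover worked: backtrack
```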

13
GraphPlan algorithm
  • Grow the planning graph (PG) to a level n such
    that all goals are reachable and not mutex
  • necessary but insufficient condition for the
    existence of an n level plan that achieves the
    goals
  • if PG levels off before non-mutex goals are
    achieved then fail
  • Search the PG for a valid plan
  • If none found, add a level to the PG and try
    again
  • If the PG levels off and still no valid plan
    found, then return failure
  • Correctness follows from PG properties

14
Important Ideas
  • Plan graph construction is polynomial time
  • Though construction can be expensive when there
    are many objects and hence many propositions
  • The plan graph captures important properties of
    the planning problem
  • Necessarily unreachable literals and actions
  • Possibly reachable literals and actions
  • Mutually exclusive literals and actions
  • Significantly prunes search space compared to POP
    style planners
  • The plan graph provides a sound termination
    procedure
  • Knows when no plan exists
  • Plan graphs can also be used for deriving
    admissible (and good non-admissible) heuristics
  • See your book (we may come back to this idea
    later)

15
Encoding Planning as Satisfiability: Basic Idea
  • Bounded planning problem (P,n)
  • P is a planning problem; n is a positive integer
  • Find a solution for P of length n
  • Create a propositional formula that represents
  • Initial state
  • Goal
  • Action Dynamics
  • for n time steps
  • We will define the formula for (P,n) such that:
    1) any model (i.e. satisfying truth assignment)
    of the formula represents a solution to (P,n)
    2) if (P,n) has a solution then the formula is
    satisfiable

16
Example of Complete Formula for (P,1)
  • at(r1,l1,0) ∧ ¬at(r1,l2,0) ∧
  • at(r1,l2,1) ∧
  • (move(r1,l1,l2,0) → at(r1,l1,0)) ∧
  • (move(r1,l1,l2,0) → at(r1,l2,1)) ∧
  • (move(r1,l1,l2,0) → ¬at(r1,l1,1)) ∧
  • (move(r1,l2,l1,0) → at(r1,l2,0)) ∧
  • (move(r1,l2,l1,0) → at(r1,l1,1)) ∧
  • (move(r1,l2,l1,0) → ¬at(r1,l2,1)) ∧
  • (¬move(r1,l1,l2,0) ∨ ¬move(r1,l2,l1,0)) ∧
  • (¬at(r1,l1,0) ∧ at(r1,l1,1) → move(r1,l2,l1,0))
    ∧
  • (¬at(r1,l2,0) ∧ at(r1,l2,1) → move(r1,l1,l2,0))
    ∧
  • (at(r1,l1,0) ∧ ¬at(r1,l1,1) → move(r1,l1,l2,0))
    ∧
  • (at(r1,l2,0) ∧ ¬at(r1,l2,1) → move(r1,l2,l1,0))

The formula has propositions for action and state
variables at each possible timestep.
We'll now discuss how to construct such a formula.
17
Overall Approach
  • Do iterative deepening like we did with
    Graphplan
  • for n = 0, 1, 2, …
  • encode (P,n) as a satisfiability problem Φ
  • if Φ is satisfiable, then
  • From the set of truth values that satisfies Φ, a
    solution plan can be constructed, so return it
    and exit
  • With a complete satisfiability tester, this
    approach will produce optimal layered plans for
    solvable problems
  • We can use a GraphPlan analysis to determine an
    upper bound on n, giving a way to detect
    unsolvability
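A sketch of the deepening loop. encode stands in for the formula construction of the coming slides, sat_solve for any complete SAT procedure, and decode_plan for the model-to-plan step sketched on a later slide; all three names are assumptions here.

```python
def satplan(problem, max_n: int):
    # max_n can come from a Graphplan analysis, giving a way
    # to detect unsolvability
    for n in range(max_n + 1):
        formula = encode(problem, n)  # the formula for (P, n)
        model = sat_solve(formula)    # satisfying assignment or None
        if model is not None:
            return decode_plan(model, problem.actions, n)
        # unsatisfiable: no plan of length n, so try n + 1
    return None
```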

18
Fluents (will be used as propositions)
  • If plan ⟨a0, a1, …, an-1⟩ is a solution to (P,n),
    then it generates a sequence of states ⟨s0, s1,
    …, sn⟩
  • A fluent is a proposition used to describe what's
    true in each si
  • at(r1,loc1,i) is a fluent that's true iff
    at(r1,loc1) is in si
  • We'll use ei to denote the fluent for a fact e in
    state si
  • e.g. if e = at(r1,loc1)
  • then ei = at(r1,loc1,i)
  • ai is a fluent saying that a is the action taken
    at step i
  • e.g., if a = move(r1,loc2,loc1)
  • then ai = move(r1,loc2,loc1,i)
  • The set of all possible fluents for (P,n) forms
    the set of primitive propositions used to
    construct our formula for (P,n)

19
Encoding Planning Problems
  • We can encode (P,n) so that we consider either
    layered plans or totally ordered plans
  • an advantage of considering layered plans is that
    fewer time steps are necessary (i.e. smaller n
    translates into smaller formulas)
  • for simplicity we first consider totally-ordered
    plans
  • Encode (P,n) as a formula Φ such that
  • ⟨a0, a1, …, an-1⟩ is a solution for (P,n)
  • if and only if
  • Φ can be satisfied in a way that makes the
    fluents a0, …, an-1 true
  • Φ will be a conjunction of many other formulas

20
Formulas in Φ
  • Formula describing the initial state (let E be
    the set of possible facts in the planning
    problem)
  • ⋀ {e0 | e ∈ s0} ∧ ⋀ {¬e0 | e ∈ E - s0}
  • Describes the complete initial state (both
    positive and negative facts)
  • E.g. on(A,B,0) ∧ ¬on(B,A,0)
  • Formula describing the goal (G is the set of goal
    facts)
  • ⋀ {en | e ∈ G}
  • says that the goal facts must be true in the
    final state at timestep n
  • E.g. on(B,A,n)
  • Is this enough?
  • Of course not. The formulas say nothing about
    actions.

21
Formulas in Φ
  • For every action a and timestep i, formulas
    describing what fluents must be true if a were
    the ith step of the plan
  • ai → ⋀ {ei | e ∈ Precond(a)}: a's
    preconditions must be true at i
  • ai → ⋀ {ei+1 | e ∈ ADD(a)}: a's ADD effects
    must be true at i+1
  • ai → ⋀ {¬ei+1 | e ∈ DEL(a)}: a's DEL
    effects must be false at i+1
  • Complete exclusion axiom
  • For all actions a and b and timesteps i, formulas
    saying a and b can't occur at the same time
  • ¬ai ∨ ¬bi
  • this guarantees there can be only one action at a
    time
  • Is this enough?
  • The formulas say nothing about what happens to
    facts if they are not affected by an action
  • This is known as the frame problem
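A sketch that emits these axioms as clauses, each clause being a list of signed fluents. The fluent-naming helper fl and the clause representation are assumptions of this sketch; frame axioms are deliberately omitted, exactly as the slide notes.

```python
from itertools import combinations

def fl(fact: str, i: int) -> str:
    # fluent naming convention for this sketch: fact e at step i -> "e@i"
    return f"{fact}@{i}"

def action_axioms(actions, n: int) -> list:
    clauses = []  # each clause: list of ('+', fluent) / ('-', fluent)
    for i in range(n):
        for a in actions:
            ai = fl(a.name, i)
            for e in a.pre:   # ai -> ei: preconditions true at i
                clauses.append([('-', ai), ('+', fl(e, i))])
            for e in a.add:   # ai -> e(i+1): ADD effects true at i+1
                clauses.append([('-', ai), ('+', fl(e, i + 1))])
            for e in a.dele:  # ai -> not e(i+1): DEL effects false at i+1
                clauses.append([('-', ai), ('-', fl(e, i + 1))])
        # Complete exclusion: not ai or not bi, for every action pair
        for a, b in combinations(actions, 2):
            clauses.append([('-', fl(a.name, i)), ('-', fl(b.name, i))])
    return clauses
```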

22
Example
  • Planning domain
  • one robot r1
  • two adjacent locations l1, l2
  • one operator (move the robot)
  • Encode (P,n) where n = 1
  • Initial state: at(r1,l1)
  • Encoding: at(r1,l1,0) ∧ ¬at(r1,l2,0)
  • Goal: at(r1,l2)
  • Encoding: at(r1,l2,1)
  • Action Schema see next slide

23
Extracting a Plan
  • Suppose we find an assignment of truth values
    that satisfies Φ.
  • This means P has a solution of length n
  • For i = 0, …, n-1, there will be exactly one
    action a such that ai = true
  • This is the ith action of the plan.
  • Example (from the previous slides)
  • Φ can be satisfied with move(r1,l1,l2,0) true
  • Thus ⟨move(r1,l1,l2,0)⟩ is a solution for (P,1)
  • It's the only solution; there is no other way to
    satisfy Φ
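Reading the plan off a model, as a sketch; model is assumed to map the fluent names of the fl convention above to truth values.

```python
def decode_plan(model: dict, actions, n: int) -> list:
    plan = []
    for i in range(n):
        # With the complete exclusion axioms, exactly one action
        # fluent can be true at each step i
        step = [a.name for a in actions if model.get(fl(a.name, i), False)]
        assert len(step) == 1
        plan.append(step[0])
    return plan
```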

24
What SATPLAN Shows
  • General propositional reasoning can compete with
    state of the art specialized planning systems
  • New, highly tuned variations of DP (Davis-Putnam)
    are surprisingly powerful
  • Radically new stochastic approaches to SAT can
    provide very low exponential scaling
  • Why does it work?
  • More flexible than forward or backward chaining
  • Randomized algorithms less likely to get trapped
    along bad paths

25
BlackBox (GraphPlan + SatPlan)
  • The BlackBox procedure combines planning-graph
    expansion and satisfiability checking
  • It is roughly as follows
  • for n = 0, 1, 2, …
  • Graph expansion
  • create a planning graph that contains n
    levels
  • Check whether the planning graph satisfies a
    necessary (but insufficient) condition for plan
    existence
  • If it does, then
  • Encode (P,n) as a satisfiability problem Φ, but
    include only the actions in the planning graph
  • If Φ is satisfiable then return the solution

26
Blackbox
Can be thought of as an implementation of
GraphPlan that uses an alternative
plan-extraction technique to the backward
chaining of GraphPlan.
[Figure: BlackBox pipeline. STRIPS problem → plan
graph (with mutex computation) → translator →
CNF → simplifier → CNF → general stochastic /
systematic SAT engines → solution.]
27
Classical Planning Assumptions
[Figure: agent-world loop. Actions are the sole
source of change, deterministic, and
instantaneous; percepts are perfect; the world is
fully observable.]
28
Stochastic/Probabilistic Planning: the Markov
Decision Process (MDP) Model
[Figure: the same agent-world loop, but actions
are now stochastic; they remain the sole source
of change and instantaneous, percepts are
perfect, and the world is fully observable.]
29
Types of Uncertainty
  • Disjunctive (used by non-deterministic planning)
  • Next state could be one of a set of states.
  • Stochastic/Probabilistic
  • Next state is drawn from a probability
    distribution over the set of states.
  • How are these models related?

30
Markov Decision Processes
  • An MDP has four components: S, A, R, T
  • (finite) state set S (|S| = n)
  • (finite) action set A (|A| = m)
  • (Markov) transition function T(s,a,s') =
    Pr(s' | s,a)
  • Probability of going to state s' after taking
    action a in state s
  • How many parameters does it take to represent?
  • bounded, real-valued reward function R(s)
  • Immediate reward we get for being in state s
  • For example in a goal-based domain R(s) may equal
    1 for goal states and 0 for all others
  • Can be generalized to include action costs:
    R(s,a)
  • Can be generalized to be a stochastic function
  • Can easily generalize to countable or continuous
    state and action spaces (but algorithms will be
    different)
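A minimal tabular representation, as a sketch with illustrative numbers only; it also answers the parameter-count question above: T needs n x m x n entries.

```python
import numpy as np

n, m = 4, 2  # |S| = n states, |A| = m actions
rng = np.random.default_rng(0)

# T[s, a, s'] = Pr(s' | s, a): normalize each T[s, a, :] into a distribution
T = rng.random((n, m, n))
T /= T.sum(axis=2, keepdims=True)

# Goal-based reward: R(s) = 1 for the goal state, 0 for all others
R = np.zeros(n)
R[-1] = 1.0

print(T.size)  # n * m * n = 32 parameters for the transition function
```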

31
Graphical View of MDP
[Figure: MDP as a dynamic Bayesian network. State
St and action At determine St+1; each state St
yields reward Rt.]
32
Assumptions
  • First-Order Markovian dynamics (history
    independence)
  • Pr(St+1 | At, St, At-1, St-1, …, S0) =
    Pr(St+1 | At, St)
  • Next state only depends on current state and
    current action
  • First-Order Markovian reward process
  • Pr(Rt | At, St, At-1, St-1, …, S0) =
    Pr(Rt | At, St)
  • Reward only depends on current state and action
  • As described earlier we will assume reward is
    specified by a deterministic function R(s)
  • i.e. Pr(Rt = R(St) | At, St) = 1
  • Stationary dynamics and reward
  • Pr(St+1 | At, St) = Pr(Sk+1 | Ak, Sk) for all
    t, k
  • The world dynamics do not depend on the absolute
    time
  • Full observability
  • Though we can't predict exactly which state we
    will reach when we execute an action, once it is
    realized, we know what it is

33
Policies (plans for MDPs)
  • Nonstationary policy
  • π: S × T → A, where T is the non-negative
    integers
  • π(s,t) is the action to do at state s with t
    stages-to-go
  • What if we want to keep acting indefinitely?
  • Stationary policy
  • π: S → A
  • π(s) is the action to do at state s (regardless
    of time)
  • specifies a continuously reactive controller
  • These policies assume or have the following
    properties:
  • full observability
  • history-independence
  • deterministic action choice

Why not just consider sequences of actions? Why
not just replan?
34
Value of a Policy
  • How good is a policy π?
  • How do we measure accumulated reward?
  • Value function V: S → ℝ associates a value with
    each state (or each state and time for a
    non-stationary π)
  • Vπ(s) denotes the value of policy π at state s
  • Depends on immediate reward, but also on what you
    achieve subsequently by following π
  • An optimal policy is one that is no worse than
    any other policy at any state
  • The goal of MDP planning is to compute an optimal
    policy (the method depends on how we define value)

35
Policy Evaluation
  • Value equation for fixed policy
  • How can we compute the value function for a
    policy?
  • we are given R and Pr
  • simple linear system with n variables (each
    variable is the value of a state) and n
    constraints (one value equation for each state)
  • Use linear algebra (e.g. matrix inversion), as
    sketched below
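A sketch of direct policy evaluation in the infinite-horizon discounted setting; the discount factor gamma is an assumption of this sketch, since the slides leave the exact value criterion open.

```python
import numpy as np

def evaluate_policy(T: np.ndarray, R: np.ndarray,
                    policy: np.ndarray, gamma: float = 0.95) -> np.ndarray:
    # P_pi[s, s'] = Pr(s' | s, policy(s)): the chain induced by the policy
    n = R.shape[0]
    P_pi = T[np.arange(n), policy, :]
    # Solve the n-variable linear system V = R + gamma * P_pi @ V
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R)

# Usage with the toy MDP sketched on slide 30:
# V = evaluate_policy(T, R, policy=np.zeros(n, dtype=int))
```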

36
Value Iteration vs. Policy Iteration
  • Which is faster, VI or PI?
  • It depends on the problem
  • VI takes more iterations than PI, but PI requires
    more time on each iteration
  • PI must perform policy evaluation on each step
    which involves solving a linear system
  • Complexity
  • There are at most exp(n) policies, so PI is no
    worse than exponential time in number of states
  • Empirically O(n) iterations are required
  • Still no polynomial bound on the number of PI
    iterations (open problem)!
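For comparison, a sketch of value iteration in the same assumed discounted setting: each VI iteration is a cheap Bellman backup, whereas each PI iteration solves a linear system as above.

```python
import numpy as np

def value_iteration(T: np.ndarray, R: np.ndarray,
                    gamma: float = 0.95, tol: float = 1e-8):
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R(s) + gamma * sum over s' of T[s, a, s'] * V(s')
        Q = R[:, None] + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and a greedy policy
        V = V_new
```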