4/1: Search Methods and Heuristics


Slides: 34
Provided by: min63
1
4/1: Search Methods and Heuristics
  • Progression: Sapa (TLPlan; FF)
  • Regression: TP4
  • Partial order: Zeno (IxTeT)

2
Reading List
  • (3/27) Papers on Metric Temporal Planning
  • Paper on the PDDL-2.1 standard (read up to, but not
    including, Section 6)
  • Paper on SAPA
  • Paper on Temporal TLPlan (see Section 3 for a
    slightly longer description of the progression
    search used in SAPA)
  • Paper on TP4 (regression search for Temporal
    Planning)
  • Paper on Zeno (plan-space search for Temporal
    Planning)

3
State-Space Search: Search is through
time-stamped states
Search states should have information about
-- what conditions hold at the current time slice
(P, M below)
-- what actions we have already
committed to put into the plan (Π, Q below)
S = (P, M, Π, Q, t)
In the initial state, P and M are non-empty;
Q is non-empty if we have exogenous
events
4
Light-match
Let the current state S be
  P = {have_light_at_0, at_steps_at_0}
  Q = {have_light_at_15}
  t = 0
(presumably after doing the light-candle action).
Applying cross_cellar to this state gives S with
  P = {have_light_at_0, crossing_at_0}
  Π = {<have_light, <0,10>>}
  Q = {at_fuse-box_at_10, have_light_at_15}
  t = 0
[Figure: timeline showing the time-stamp, Light-match and Cross-cellar; tick marks at 10 and 15]
5
Advancing the clock as a device for concurrency
control
In the cellar plan above, the clock, if advanced,
will be advanced to 15, where an event
(have-light) will occur. This means cross-cellar
can be done either at 0 or at 15 (and the latter
makes no sense).
  • To support concurrency, we need to consider
    advancing the clock
  • How far to advance the clock?
  • One shortcut is to advance the clock to the time
    of the earliest event in the event
    queue, since this is the least advance needed to
    make changes to P and M of S.
  • At this point, all the events happening at that
    time point are transferred from Q to P and M (to
    signify that they have happened)
  • This strategy will find a plan for every
    problem, but will have the effect of enforcing
    concurrency by aligning the concurrent actions
    on the left end
  • In the candle/cellar example, we will find plans
    where the crossing cellar action starts right
    when the light-match action starts
  • If we need slack in the start times, we will have
    to post-process the plan
  • If we want plans with arbitrary slacks on
    start times to appear in the search space, we
    will have to consider advancing the clock by
    arbitrary amounts (even if it changes nothing in
    the state other than the clock time itself).
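
The "advance to the earliest event" shortcut above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the State class and advance_time function are hypothetical names, M and Π are left out, and the event at 15 is assumed to be the one that ends have_light.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    P: dict                                  # fact -> time at which it became true
    Q: list = field(default_factory=list)    # pending events: (time, fact, adds)
    t: float = 0.0                           # current time stamp


def advance_time(s: State) -> State:
    """Move the clock to the earliest event in Q and apply every event due then."""
    if not s.Q:
        return s                             # nothing is scheduled, so nothing changes
    t_next = min(time for time, _, _ in s.Q)
    P = dict(s.P)
    for time, fact, adds in s.Q:
        if time == t_next:
            if adds:
                P[fact] = time               # the event makes the fact true
            else:
                P.pop(fact, None)            # the event makes the fact false
    Q = [e for e in s.Q if e[0] != t_next]   # transferred events leave the queue
    return State(P, Q, t_next)


# Cellar example: state right after committing to cross_cellar at time 0.
# Assumption: the queued event at 15 removes have_light.
s = State(P={"have_light": 0, "crossing": 0},
          Q=[(10, "at_fuse_box", True), (15, "have_light", False)],
          t=0)
s1 = advance_time(s)    # clock jumps to 10, at_fuse_box becomes true
s2 = advance_time(s1)   # clock jumps to 15, have_light goes away
```

Note that the clock only ever lands on event times, which is exactly why the resulting plans are left-aligned.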

[Figure: timeline showing have-light, Light-match and Cross-cellar (drawn twice); tick marks at 10 and 15]
6
Search Algorithm (cont.)
  • Goal Satisfaction
  • S = (P, M, Π, Q, t) satisfies G if, for every <pi, ti> ∈ G, either
  • ∃ <pi, tj> ∈ P with tj < ti, and no event in Q deletes
    pi, or
  • ∃ e ∈ Q that adds pi at time te < ti.
  • Action Application
  • Action A is applicable in S if
  • All instantaneous preconditions of A are
    satisfied by P and M.
  • A's effects do not interfere with Π and Q.
  • No event in Q interferes with persistent
    preconditions of A.
  • A does not lead to concurrent resource change
  • When A is applied to S
  • P is updated according to A's instantaneous
    effects.
  • Persistent preconditions of A are put in Π
  • Delayed effects of A are put in Q.
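
A minimal Python sketch of the applicability test and the state update listed above. Everything named here (Action, State, applicable, apply_action, the have_match fact, the exact action definitions) is an assumption made for illustration; resources (M) are ignored, and interference with Q is only checked for persistent preconditions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    name: str
    duration: float
    pre_start: frozenset = frozenset()   # instantaneous preconditions
    pre_over: frozenset = frozenset()    # persistent preconditions
    add_start: frozenset = frozenset()   # instantaneous effects
    del_start: frozenset = frozenset()
    add_end: frozenset = frozenset()     # delayed effects
    del_end: frozenset = frozenset()


@dataclass
class State:
    P: set                                   # facts true at the current time
    Pi: list = field(default_factory=list)   # protected: (fact, start, end)
    Q: list = field(default_factory=list)    # events: (time, fact, adds)
    t: float = 0.0


def protected(s, fact, time):
    return any(f == fact and a <= time <= b for f, a, b in s.Pi)


def applicable(a, s):
    end = s.t + a.duration
    if not a.pre_start <= s.P:                         # instantaneous preconditions
        return False
    if any(protected(s, f, s.t) for f in a.del_start):  # effects vs. Pi
        return False
    if any(protected(s, f, end) for f in a.del_end):
        return False
    for time, fact, adds in s.Q:                       # events vs. persistent preconditions
        if not adds and fact in a.pre_over and s.t <= time <= end:
            return False
    return True


def apply_action(a, s):
    end = s.t + a.duration
    P = (s.P - a.del_start) | a.add_start
    Pi = s.Pi + [(f, s.t, end) for f in sorted(a.pre_over)]
    Q = s.Q + [(end, f, True) for f in sorted(a.add_end)]
    Q += [(end, f, False) for f in sorted(a.del_end)]
    return State(P, Pi, sorted(Q), s.t)                # the clock does not move


light_match = Action("light_match", 15, pre_start=frozenset({"have_match"}),
                     add_start=frozenset({"have_light"}),
                     del_end=frozenset({"have_light"}))
cross_cellar = Action("cross_cellar", 10,
                      pre_start=frozenset({"at_steps", "have_light"}),
                      pre_over=frozenset({"have_light"}),
                      add_start=frozenset({"crossing"}),
                      del_start=frozenset({"at_steps"}),
                      add_end=frozenset({"at_fuse_box"}))

s0 = State(P={"have_match", "at_steps"})
s1 = apply_action(light_match, s0)
s2 = apply_action(cross_cellar, s1)
```

With a match that burns for only 5 time units, the queued loss of have_light would fall inside the crossing interval and cross_cellar would be rejected.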

S = (P, M, Π, Q, t)
[TLPlan; Sapa 2001]
7
Regression Search is similar
[Figure: subgoals R, W, X, Y]
  • In the case of regression over durative actions
    too, the main generalization we need is
    differentiating between advancing the clock and
    applying a relevant action
  • Can use the same state representation S = (P, M, Π, Q, t),
    with the semantics that
  • P and M are the binary and resource subgoals needed
    at the current time point
  • Q are the subgoals needed at earlier time points
  • Π are subgoals to be protected over specific
    intervals
  • We can either add an action to support something
    in P or Q, or push the clock backward before
    considering subgoals
  • If we push the clock backward, we push it to the
    time of the latest subgoal in Q
  • TP4 uses a slightly different representation
    (with State and Action information)
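
One regression step under this reading of (P, Π, Q, t) can be sketched as follows. The regress function and the tuple layout of an action are hypothetical, M is omitted, and only the case of supporting a subgoal in P is shown (not a subgoal in Q, and not pushing the clock back).

```python
def regress(action, P, Pi, Q, t):
    """Regress a durative action that ends at the current time t.

    action = (name, duration, start_preconds, persistent_preconds, end_effects)
    P  : set of subgoals needed at time t
    Pi : list of (fact, start, end) subgoals to protect over an interval
    Q  : list of (time, fact) subgoals needed at earlier time points
    """
    name, dur, pre_start, pre_over, eff_end = action
    if not (eff_end & P):
        raise ValueError(f"{name} is not relevant: it supports no subgoal in P")
    start = t - dur
    new_P = P - eff_end                                   # supported subgoals are done
    new_Pi = Pi + [(f, start, t) for f in sorted(pre_over)]
    new_Q = sorted(Q + [(start, f) for f in pre_start])   # needed when the action starts
    return new_P, new_Pi, new_Q, t                        # the clock is not moved here


# Assumed model of cross_cellar: 10 time units, needs at_stairs and have_light
# at its start, needs have_light throughout, gives at_fuse_box at its end.
cross_cellar = ("cross_cellar", 10, {"at_stairs", "have_light"},
                {"have_light"}, {"at_fuse_box"})

P1, Pi1, Q1, t1 = regress(cross_cellar, {"at_fuse_box"}, [], [], 0)
```

Because the action is anchored at its end point, every action supporting a subgoal at the same time ends together, which is the right-alignment mirror image of progression.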

[Figure labels: Q, A3W, A2X, A1Y]
We can either work on R at t_inf, or on R and Q at
t_inf - D(A3)
[TP4 1999]
8
Let the current state S be
  P = {at_fuse_box_at_0}
  t = 0
Regressing cross_cellar over this state gives S with
  P = {}
  Π = {<have_light, <0,-10>>}
  Q = {have_light_at_-10, at_stairs_at_-10}
  t = 0
[Figure: Cross_cellar and Have_light]
Notice that, in contrast to progression, regression
will align the end points of concurrent
actions (e.g. when we put in Light-match to
support have-light)
This example changed since the class
9
[Figure: Cross_cellar and Have_light]
S:
  P = {}
  Π = {<have_light, <0,-10>>}
  Q = {have_light_at_-10, at_stairs_at_-10}
  t = 0
If we now decide to support the subgoal in Q
using light-match, we get S with
  P = {}
  Q = {have-match_at_-15, at_stairs_at_-10}
  Π = {<have_light, <0,-10>>}
  t = 0
[Figure: Cross_cellar, Have_light and Light-match]
10
PO (Partial Order) Search
Involves LP solving over linear
constraints (temporal constraints are linear
too). Waits for nonlinear constraints to become
linear.
Involves posting temporal constraints
and durative goals.
Split the interval into multiple
overlapping intervals.
[Zeno 1994]
11
More on Temporal planning by plan-space planners
(Zeno)
  • The accommodation to complexity that Zeno makes
    by refusing to handle nonlinear constraints
    (waiting instead until they become linear) is
    sort of hilarious, given that it doesn't care much
    about heuristic control otherwise
  • Basically Zeno is trying to keep the per-node
    cost of the search down (and if you do nonlinear
    constraint consistency check, even that is quite
    hard)
  • Of course, we know now that there is no obvious
    reason to believe that reducing the per-node cost
    will, ipso facto, also lead to reduction in
    overall search.
  • The idea of goal reduction by splitting a
    temporal subgoal to multiple sub-intervals is
    used only in Zeno, and helps it support a
    temporal goal over a long duration with multiple
    actions. Neat idea.
  • Zeno doesn't have much of a problem handling
    arbitrary concurrency, since we are only posting
    constraints on temporal variables denoting the
    start points of the various actions. In
    particular, Zeno does not force either right or
    left alignment of actions.
  • In addition to Zeno, IxTeT is another influential
    metric temporal planner that uses the plan-space
    planning idea.

12
[Figure: partial plan with steps I, Cross_cellar (start t1, end t2) and G; labels: At_fusebox, Have_light_at_t1, Have_light_at_<t1,t2>, at_fuse_box_at_G]
Constraints: t2 - t1 = 10, t1 < tG, tI < t1
13
The have_light effect at t4 can violate the
<have_light, t3, t1> causal link! Resolve by
adding t4 < t3 ∨ t1 < t4
[Figure: partial plan with steps I, Burn_match (start t3, end t4), Cross_cellar (start t1, end t2) and G; labels: have-light, At_fusebox, Have_light_at_t1, Have_light_at_<t1,t2>, at_fuse_box_at_G]
Constraints: t2 - t1 = 10, t1 < tG, tI < t1, t4 < tG, t4 - t3 = 15, t3 < t1,
t4 < t3 ∨ t1 < t4
14
Notice that Zeno allows arbitrary slack
between the two actions
[Figure: partial plan with steps I, Burn_match (start t3, end t4), Cross_cellar (start t1, end t2) and G; labels: have-light, At_fusebox, Have_light_at_t1, Have_light_at_<t1,t2>, at_fuse_box_at_G]
Constraints: t2 - t1 = 10, t1 < tG, tI < t1, t4 < tG, t4 - t3 = 15, t3 < t1,
t4 < t3 ∨ t1 < t4, t3 < t2, t4 < t3 ∨ t2 < t4
To work on have_light_at_<t1,t2>, we can either
-- support the whole interval directly by
adding a causal link <have-light, t3, <t1,t2>>
-- or first split <t1,t2> into two subintervals
<t1,t> and <t,t2>, and work on supporting
have-light on both intervals
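
To see the "arbitrary slack" point concretely, the constraint set on this slide can be checked against candidate time assignments. This is a hedged sketch: the consistent function and the sample numbers are made up for illustration, and it only tests given assignments (model checking), whereas Zeno itself reasons over the constraints without fixing the times.

```python
def consistent(t):
    """Check the slide's constraints for one assignment of times (name -> number)."""
    return all([
        t["t2"] - t["t1"] == 10,                  # Cross_cellar lasts 10
        t["t1"] < t["tG"],
        t["tI"] < t["t1"],
        t["t4"] < t["tG"],
        t["t4"] - t["t3"] == 15,                  # Burn_match lasts 15
        t["t3"] < t["t1"],
        t["t4"] < t["t3"] or t["t1"] < t["t4"],   # protect have_light at t1
        t["t3"] < t["t2"],
        t["t4"] < t["t3"] or t["t2"] < t["t4"],   # protect have_light over <t1,t2>
    ])


# Three hypothetical schedules that differ only in when Cross_cellar starts.
tight = dict(tI=0, t3=1, t1=2, t2=12, t4=16, tG=20)   # starts 1 after the match
loose = dict(tI=0, t3=1, t1=5, t2=15, t4=16, tG=20)   # starts 4 after the match
late = dict(tI=0, t3=1, t1=7, t2=17, t4=16, tG=20)    # the match burns out first

results = {name: consistent(times)
           for name, times in [("tight", tight), ("loose", loose), ("late", late)]}
```

Both tight and loose satisfy every constraint, so neither left nor right alignment is forced; late fails because t2 < t4 no longer holds.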
15
4/3
  • Discussion of the Sapa/TP4/Zeno search algorithms
  • Heuristics for temporal planning

16
Q/A on Search Methods for Temporal Planning
  • Menkes: What is meant by the argument that
    resources are always easy to handle for
    progression planners?
  • The idea is that the partial plans in the search
    space of a progression planner are position
    constrained, so you know exactly when each action
    starts. Given that, it is a simple matter to
    check if a particular resource constraint
    (however complicated and nonlinear) holds over a
    time point or interval.
  • In contrast, partial order planners only have
    constraints on the start points. So, checking
    that a resource constraint is valid involves
    checking that it holds on every possible
    assignment of times to the temporal variables.
  • The difference is akin to the difference between
    model checking and theorem proving [Halpern &
    Vardi, KR-91]: you can check the consistency of
    more complicated formulas in more complicated
    logics if you only need to do model checking
    rather than inference/theorem proving.
17
Q/A contd.
  • Dan: Can the interval goal reduction used in
    Zeno be made more goal directed?
  • Yes. For example, regressing a goal have_light_at_<1,15>
    over an action that gives have_light_at_<1,7>
    will make it have_light_at_<7,15>
  • Making the reduction goal directed may
    actually be a smarter idea (especially for position
    constrained planners). For Zeno, it doesn't make
    much difference, since it splits the interval into
    two variable-sized intervals.
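
The goal-directed reduction in the answer above amounts to subtracting the supported interval from the goal interval. A small sketch, assuming a hypothetical reduce_interval_goal helper and (fact, start, end) triples with fixed numeric end points:

```python
def reduce_interval_goal(goal, support):
    """Return the parts of an interval goal that `support` leaves unsupported.

    goal, support: (fact, start, end) triples.
    """
    fact, gs, ge = goal
    sfact, ss, se = support
    if fact != sfact or se <= gs or ss >= ge:
        return [goal]                      # no overlap: nothing is reduced
    rest = []
    if gs < ss:
        rest.append((fact, gs, ss))        # uncovered piece before the support
    if se < ge:
        rest.append((fact, se, ge))        # uncovered piece after the support
    return rest


remaining = reduce_interval_goal(("have_light", 1, 15), ("have_light", 1, 7))
```

Here the split point is dictated by the supporting action, in contrast to Zeno's split into two variable-sized subintervals.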

18
Q/A contd
  • Romeo: The TLPlan paper says that their strategy is
    to keep adding concurrent actions until no more
    actions can be added at the current point, and
    only then advance the clock. Is this used in SAPA
    too?
  • Rao: I am surprised to hear that TLPlan does
    that. If this is used as a strategy rather than
    as a heuristic, then it can lead to loss of
    completeness. In general, just because an
    action can be done doesn't mean that it should be
    done.
  • For example, consider a problem where you want a
    goal G. Ultimately, all actions that give G wind
    up requiring, among other conditions, the
    condition P. P is present in the init state.
    There is an action A that deletes P, and no
    action gives P. A is applicable in the init
    state and doesn't interfere with ANY of the other
    actions. Now, if we put A in the plan just
    because it can be done concurrently, then we know
    we are doomed.
  • I (Rao) made this mistake in my ECP-97 paper on
    Graphplan (see Footnote 2 in
    http://rakaposhi.eas.asu.edu/pub/rao/ewsp-graphplan.ps), and figured
    out my error later

19
Tradeoffs Progression/Regression/PO Planning for
metric/temporal planning
  • Compared to PO, both progression and regression
    do a less than fully flexible job of handling
    concurrency (e.g. slacks may have to be handled
    through post-processing).
  • Progression planners have the advantage that the
    exact amount of a resource is known at any given
    state. So, complex resource constraints are
    easier to verify. PO (and to some extent
    regression), will have to verify this by posting
    and then verifying resource constraints.
  • Currently, SAPA (a progression planner) does
    better than TP4 (a regression planner). Both do
    oodles better than Zeno/IxTeT. However:
  • TP4 could possibly be improved significantly by
    giving up the insistence on admissible heuristics
  • Zeno (and IxTeT) could benefit by adapting ideas
    from RePOP.

20
Heuristic Control
Temporal planners have to deal with more
branching possibilities → More critical to have
good heuristic guidance
Design of heuristics depends on the objective
function
→ In temporal planning, heuristics focus on richer
objective functions that guide both planning and
scheduling
21
Objectives in Temporal Planning
  • Number of actions: Total number of actions in the
    plan.
  • Makespan: The shortest duration in which we can
    possibly execute all actions in the solution.
  • Resource consumption: Total amount of resource
    consumed by actions in the solution.
  • Slack: The duration between the time a goal is
    achieved and its deadline.
  • Optimize max, min or average slack values
  • Combinations thereof
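
For a position-constrained plan, the objectives above are simple to compute. The sketch below uses a made-up plan and deadline purely for illustration; plan_metrics and its tuple layout are assumptions, and the makespan is measured from time 0.

```python
def plan_metrics(plan, achieved, deadline):
    """plan: list of (action, start, duration, resource_used);
    achieved / deadline: goal -> time the goal is reached / must be reached by."""
    slacks = [deadline[g] - achieved[g] for g in deadline]
    return {
        "num_actions": len(plan),
        "makespan": max(start + dur for _, start, dur, _ in plan),
        "resource": sum(used for *_, used in plan),
        "min_slack": min(slacks),
        "max_slack": max(slacks),
        "avg_slack": sum(slacks) / len(slacks),
    }


# Hypothetical numbers: one match consumed, fuse box needed by time 12.
plan = [("light_match", 0, 15, 1), ("cross_cellar", 0, 10, 0)]
metrics = plan_metrics(plan, achieved={"at_fuse_box": 10},
                       deadline={"at_fuse_box": 12})
```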

22
Deriving heuristics for SAPA
We use a phased relaxation approach to derive
different heuristics
Relax the negative logical and resource
effects to build the Relaxed Temporal Planning
Graph
[AltAlt, AIJ 2001]
23
Heuristics in Sapa are derived from the
Graphplan-style bi-level relaxed temporal
planning graph (RTPG)
Progression, so constructed anew for each
state.
24
Relaxed Temporal Planning Graph
  • Relaxed Action
  • No delete effects
  • May be okay given progression planning
  • No resource consumption
  • Will adjust later

while (true)
  forall A ≠ advance-time applicable in S
    S = Apply(A, S)
      [Involves changing P, Π, Q, t. Update Q
       only with positive effects, and only when there
       is no other earlier event giving that effect]
    if S satisfies G then Terminate{solution}
  S' = Apply(advance-time, S)
  if ∃ (pi, ti) ∈ G such that ti < Time(S') and pi ∉ S
    then Terminate{non-solution}    [Deadline goals]
    else S = S'
end while
25
Details on RTPG Construction
  • All our heuristics are based on the relaxed
    temporal planning graph structure (RTPG). This is
    a Graphplan-style bi-level planning graph
    generalized to temporal domains.
  • Given a state S = (P, M, Π, Q, t), the RTPG is
    built from S using the set of relaxed actions,
    which are generated from the original actions by
    eliminating all effects which (1) delete some
    fact (predicate) or (2) reduce the level of some
    resource. Since delete effects are ignored, the RTPG
    will not contain any mutex relations, which
    considerably reduces the cost of constructing
    it. The algorithm to build the RTPG structure
    is summarized in Figure 4.
  • To build the RTPG, we need three main
    data structures: a fact level, an action level,
    and an unexecuted event queue.
  • Each fact f or action A is marked in, and
    appears in the RTPG's fact/action level at time
    instant tf / tA, if it can be
    achieved/executed at tf / tA.
  • In the beginning, only facts which appear in P
    are marked in at t, the action level is empty,
    and the event queue holds all the unexecuted
    events in Q that add new predicates.
  • Action A will be marked in if (1) A is not
    already marked in and (2) all of A's
    preconditions are marked in.
  • When action A is in, all of A's unmarked
    instant add effects will also be marked in at t.
  • Any delayed effect e of A that adds fact f
    is put into the event queue Q if (1) f is not
    marked in and (2) there is no event e' in Q that
    is scheduled to happen before e and which also
    adds f. Moreover, when an event e is added to Q,
    we will take out from Q any event e' which is
    scheduled to occur after e and also adds f.
  • When there are no more unmarked applicable
    actions in S, we will stop and return no-solution
    if either (1) Q is empty or (2) there exists some
    unmarked goal with a deadline that is smaller than
    the time of the earliest event in Q.
  • If none of the situations above occurs, then we
    will apply the advance-time action to S and
    activate all events at the time point te' of the
    earliest event e' in Q.
  • The process above will be repeated until all the
    goals are marked in or one of the conditions
    indicating non-solution occurs.
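
The construction steps above can be sketched compactly in Python. This is an illustrative reading of the description, not the Sapa implementation: build_rtpg, the tuple layout of relaxed actions and the example action definitions are all assumptions, and the event queue is simplified to a dict that keeps only the earliest pending add event per fact.

```python
def build_rtpg(P, Q, t, actions, goals):
    """Return {fact: earliest time it is marked in}, or None for no-solution.

    P: facts true now; Q: list of (time, fact) pending add events;
    actions: (name, duration, preconds, instant_adds, delayed_adds) relaxed actions;
    goals: {fact: deadline or None}.
    """
    inf = float("inf")
    fact_time = {f: t for f in P}
    marked_actions = set()
    queue = {}                                   # fact -> earliest pending add time
    for time, f in Q:
        if f not in fact_time and time < queue.get(f, inf):
            queue[f] = time
    while True:
        changed = True
        while changed:                           # mark in every applicable action at t
            changed = False
            for name, dur, pre, inst, delayed in actions:
                if name in marked_actions or not all(p in fact_time for p in pre):
                    continue
                marked_actions.add(name)
                changed = True
                for f in inst:
                    fact_time.setdefault(f, t)   # instant add effects appear at t
                for f in delayed:
                    if f not in fact_time and t + dur < queue.get(f, inf):
                        queue[f] = t + dur       # later events adding f are dropped
        if all(g in fact_time for g in goals):
            return fact_time
        if not queue:
            return None                          # (1) nothing left to wait for
        t_next = min(queue.values())
        if any(g not in fact_time and d is not None and d < t_next
               for g, d in goals.items()):
            return None                          # (2) a deadline goal is missed
        t = t_next                               # advance-time
        for f in [f for f, time in queue.items() if time == t]:
            fact_time.setdefault(f, t)
            del queue[f]


# Assumed relaxed versions of the cellar actions (delete effects dropped).
actions = [
    ("light_match", 15, {"have_match"}, {"have_light"}, set()),
    ("cross_cellar", 10, {"at_steps", "have_light"}, {"crossing"}, {"at_fuse_box"}),
]
times = build_rtpg({"have_match", "at_steps"}, [], 0, actions, {"at_fuse_box": 12})
```

With the deadline tightened to 5, the same call returns None, since the earliest event in the queue (time 10) is already past the deadline.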

[From Do & Kambhampati, ECP-01]
26
Heuristics directly from RTPG
ADMISSIBLE
  • For Makespan: Distance from a state S to the
    goals is equal to the duration between time(S)
    and the time the last goal appears in the RTPG.
  • For Min/Max/Sum Slack: Distance from a state to
    the goals is equal to the minimum, maximum, or
    summation of slack estimates for all individual
    goals using the RTPG.
  • Slack estimate is the difference between the
    deadline of the goal and the expected time of
    achievement of that goal.

Proof: All goals appear in the RTPG at times
smaller than or equal to their achievable times.
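
Given the times at which goals first appear in the RTPG, these estimates are direct to compute. A minimal sketch with hypothetical function names and made-up appearance times and deadlines:

```python
def makespan_heuristic(fact_time, goals, now):
    """Duration between time(S) and the time the last goal appears in the RTPG."""
    return max(fact_time[g] for g in goals) - now


def slack_heuristics(fact_time, deadlines):
    """Min, max and sum of (deadline - earliest appearance) over all goals."""
    slacks = [deadlines[g] - fact_time[g] for g in deadlines]
    return min(slacks), max(slacks), sum(slacks)


# Hypothetical RTPG appearance times and goal deadlines.
fact_time = {"at_fuse_box": 10, "have_light": 0}
deadlines = {"at_fuse_box": 12, "have_light": 3}

h_makespan = makespan_heuristic(fact_time, deadlines, now=0)
h_min_slack, h_max_slack, h_sum_slack = slack_heuristics(fact_time, deadlines)
```

Since goals appear in the RTPG no later than they can really be achieved, the makespan value never overestimates and the slack values never underestimate.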
27
Heuristics from Relaxed Plan Extracted from RTPG
The RTPG can be used to find a relaxed solution, which
is then used to estimate the distance from a given
state to the goals
Sum actions: Distance from a state S to the goals
equals the number of actions in the relaxed plan.
Sum durations: Distance from a state S to the
goals equals the summation of action durations in
the relaxed plan.
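
A rough sketch of how a relaxed plan might be pulled out and turned into the two estimates. It assumes we recorded, for each fact, the action that first achieved it in the RTPG; extract_relaxed_plan, the supporter map and the example data are hypothetical, and the backward chaining shown is one simple way to do the extraction, not necessarily Sapa's.

```python
def extract_relaxed_plan(goals, supporter, preconds):
    """Backward-chain from the goals through each fact's first achiever.

    supporter: fact -> action that first adds it (facts absent from the map
               are taken to hold in the current state);
    preconds:  action -> list of its preconditions.
    """
    plan, chosen, agenda = [], set(), list(goals)
    while agenda:
        fact = agenda.pop()
        action = supporter.get(fact)
        if action is None or action in chosen:
            continue                       # already true, or already supported
        chosen.add(action)
        plan.append(action)
        agenda.extend(preconds[action])    # now support the action's preconditions
    return plan


supporter = {"at_fuse_box": "cross_cellar", "crossing": "cross_cellar",
             "have_light": "light_match"}
preconds = {"cross_cellar": ["at_steps", "have_light"],
            "light_match": ["have_match"]}
duration = {"cross_cellar": 10, "light_match": 15}

relaxed_plan = extract_relaxed_plan(["at_fuse_box"], supporter, preconds)
h_sum_action = len(relaxed_plan)
h_sum_duration = sum(duration[a] for a in relaxed_plan)
```

Sum-duration adds up durations even for actions that could overlap, which is one way to see why it is not an admissible makespan estimate.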
28
Resource-based Adjustments to Heuristics
Resource-related information, ignored originally,
can be used to improve the heuristic values
Adjusted Sum-Action: $h = h + \sum_R \left\lceil \frac{Con(R) - (Init(R) + Pro(R))}{\Delta_R} \right\rceil$
Adjusted Sum-Duration: $h = h + \sum_R \frac{Con(R) - (Init(R) + Pro(R))}{\Delta_R} \cdot Dur(A_R)$
→ Will not preserve admissibility
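
A small numeric sketch of the two adjustments. The reading of the symbols is an assumption, since the slide does not define them: Con(R) is taken as the amount of R consumed in the relaxed plan, Init(R) as its level in the current state, Pro(R) as the amount produced in the relaxed plan, Δ_R as the most a single action A_R can produce, and Dur(A_R) as that action's duration. Clamping the shortfall at zero is also an assumption, as are the numbers.

```python
import math


def adjusted_heuristics(h_action, h_duration, resources):
    """resources: name -> (con, init, pro, delta, dur), read as described above."""
    for con, init, pro, delta, dur in resources.values():
        shortfall = max(0, con - (init + pro))      # assumed: no credit for surplus
        h_action += math.ceil(shortfall / delta)    # extra producer actions needed
        h_duration += shortfall / delta * dur       # extra time spent producing
    return h_action, h_duration


# Hypothetical relaxed plan: 5 actions, total duration 25, and it burns
# 55 units of fuel while only 10 are available and none are produced.
h_act, h_dur = adjusted_heuristics(5, 25, {"fuel": (55, 10, 0, 20, 3)})
```

The shortfall of 45 needs three refuelling actions of 20 units each, so the action count rises by 3 while the duration estimate rises by 2.25 × 3.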
29
Aims of Empirical Study
  • Evaluate the effectiveness of the different
    heuristics.
  • Ablation studies
  • Test if the resource adjustment technique helps
    different heuristics.
  • Compare with other temporal planning systems.

30
Empirical Results
  • Sum-action finds solutions faster than sum-dur
  • Admissible heuristics do not scale up to bigger
    problems
  • Sum-dur finds shorter duration solutions in most
    of the cases
  • Resource-based adjustment helps sum-action, but
    not sum-dur
  • Very few irrelevant actions. Better quality than
    TemporalTLPlan.
  • So, (transitively) better than LPSAT

31
Empirical Results (cont.)
Logistics domain with driving restricted to
intra-city (traditional logistics domain)
Sapa is the only planner that can solve all 80
problems
32
Empirical Results (cont.)
Logistics domain with inter-city driving actions
The sum-action heuristic used as the default
in Sapa can be misled by the long-duration
actions...
→ Future work on fixed point time/level propagation
33
Multi-objective search
Next Class
  • Multi-dimensional nature of plan quality in
    metric temporal planning
  • Temporal quality (e.g. makespan, slack)
  • Plan cost (e.g. cumulative action cost, resource
    consumption)
  • Necessitates multi-objective optimization
  • Modeling objective functions
  • Tracking different quality metrics and heuristic
    estimation
  • Challenge: There may be inter-dependent
    relations between different quality metrics