Deliberation Scheduling for Planning in Real-Time

1
Deliberation Scheduling for Planning in Real-Time
  • David J. Musliner
  • Honeywell Laboratories
  • Robert P. Goldman
  • SIFT, LLC
  • Kurt Krebsbach
  • Lawrence University

2
Outline
  • Application summary.
  • Deliberation scheduling problem.
  • Analytic experiments.
  • Demonstration tests.
  • Conclusions.

3
Planning and Action for Real-Time Control
  • Adaptive Mission Planner: decomposes an overall mission into multiple control problems, with limited performance goals designed to make the controller synthesis problem solvable with available time and available execution resources.
  • Controller Synthesis Module: for each control problem, synthesizes a real-time reactive controller according to the constraints sent from the AMP.
  • Real Time Subsystem: continuously executes synthesized control reactions in a hard real-time environment; does not pause waiting for new controllers.

4
How CIRCA Works
[Diagram: the Adaptive Mission Planner breaks down the task; the Controller Synthesis Module generates a controller for each subproblem; the Real Time System executes the controller as reaction rules, e.g. "if (state-1) then action-1; if (state-2) then action-2; ...".]
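The controllers executed by the Real Time System are essentially tables of state-to-action reaction rules. A minimal sketch of that idea (the names and structure are illustrative, not CIRCA's actual data structures):

```python
# Minimal sketch of a table-driven reactive controller, in the spirit of
# "if (state-1) then action-1". Illustrative only, not CIRCA's actual API.

def make_controller(rules):
    """rules: dict mapping an observed state to the action to execute."""
    def controller(state):
        # Look up the reaction for the current state; None if no rule fires.
        return rules.get(state)
    return controller

# Example rule table.
rules = {"state-1": "action-1", "state-2": "action-2"}
controller = make_controller(rules)

print(controller("state-1"))  # action-1
print(controller("state-3"))  # None (no reaction defined)
```

The key property is that execution is a constant-time lookup, which is what makes hard real-time response guarantees feasible.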
5
CIRCA Design Features
  • Flexible systems --- CIRCA reconfigures itself
    while it is operating.
  • Limited resources --- CIRCA dynamically
    synthesizes controllers for only the immediately
    relevant parts of the situation. CIRCA does this
    introspectively, reasoning about resource limits.
  • Time-critical, hazardous situations --- CIRCA
    guarantees that it will respond in a timely way
    to threats in its environment.

6
Controller Synthesis Module (CSM)
[Diagram: the problem configuration (initial state description, goal state description, available actions, uncontrollable transitions) feeds the Controller Synthesis Module, whose timed automata controller design produces an executable reactive controller.]
7
AMP Overview
  • Mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
  • Threats are safety-critical: must guarantee to maintain safety (sometimes probabilistically) in worst case, using real-time reactions.
  • Goals are best-effort: don't need to guarantee them.
  • Each mission phase requires a plan (or
    controller), built by the CSM to handle a problem
    configuration.
  • Changes in capabilities, mission, environment can
    lead to need for additional controller synthesis.

8
AMP Overview
[Diagram: the AMP turns threats and goals into a problem configuration and algorithm controls for the Controller Synthesis Module, and reads back algorithm performance.]
9
AMP Responsibilities
  • Divide mission into phases, subdividing them as
    necessary to handle resource restrictions.
  • Build problem configurations for each phase, to
    drive CSM.
  • Modify problem configurations, both internally
    and via negotiation with other AMPs, to handle
    resource limitations.
  • Capabilities (assets).
  • Bounded rationality: deliberation resources.
  • Bounded reactivity: execution resources.

10
AMP Deliberation Scheduling
  • MDP-based approach for AMP to adjust CSM problem
    configurations and algorithm parameters to
    maximize expected utility of deliberation.
  • Issues:
  • Complex utility function for overall mission plan.
  • Survival dependencies between sequenced controllers.
  • Require CSM algorithm performance profiles.
  • Planning that is expected to complete further in the future must be discounted.
  • Differences from other deliberation scheduling techniques:
  • CSM planning is not an anytime algorithm --- it's more a Las Vegas than a Monte Carlo algorithm.
  • It's not a problem of trading deliberation versus action: deliberation and action proceed in concert.
  • Survival of the platform is a key concern.

11
AMP Deliberation Scheduling
  • Mission phases characterized by:
  • Probability of survival/failure.
  • Expected reward.
  • Expected start time and duration.
  • Agent keeps reward from all executed phases.
  • Different CSM problem configuration operators yield different types of plan improvements:
  • Improve probability of survival.
  • Improve expected reward (number or likelihood of goals).
  • Configuration operators can be applied to same phase in different ways (via parameters).
  • Configuration operators have different expected resource requirements (computation time/space).

12
Expected Mission Utility
Markov chain behavior in the mission phases
Probability of surviving vs. entering absorbing
failure state. Reward expectations unevenly
distributed.
[Diagram: Markov chain over phases 1-5 with per-phase survival probabilities s1-s4; with probability 1-s1 the chain enters the absorbing FAILURE state; rewards R3 and R5 are attached to phases 3 and 5.]
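Under this chain, expected mission utility accumulates each phase's reward weighted by the probability of surviving through it. A sketch of that computation (the function name and the numeric survival/reward values are illustrative, not from the paper):

```python
def expected_mission_utility(survival, reward):
    """Expected utility = sum over phases of R_i times the probability of
    surviving through phase i (the product s_1 * ... * s_i). Entering the
    absorbing FAILURE state forfeits future rewards, but the agent keeps
    reward from all phases already executed."""
    eu, p_alive = 0.0, 1.0
    for s_i, r_i in zip(survival, reward):
        p_alive *= s_i          # probability of surviving through phase i
        eu += p_alive * r_i     # reward collected if still alive
    return eu

# Illustrative 5-phase mission with reward concentrated in phases 3 and 5
# (rewards unevenly distributed, as on the slide).
print(expected_mission_utility([0.9, 0.95, 0.8, 0.9, 0.85],
                               [0, 0, 10, 0, 20]))
```

Raising any s_i raises the survival product for every later phase too, which is the survival dependency between sequenced controllers noted earlier.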
13
The Actions: CSM Performance Profiles
AMP attempts to predict time-to-plan from domain
characteristics, so AMP can be smart about
configuring CSM problems in time-constrained
situations.
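A performance profile of this kind can be sketched as a simple regression of log-runtime against a domain characteristic such as the number of threats. The data points and model form below are illustrative assumptions, not the measured CSM profiles:

```python
import math

# Sketch of a performance profile: predict CSM time-to-plan from a domain
# characteristic (here, number of threats). The data are invented for
# illustration; real profiles come from measured CSM runs.
threats = [1, 2, 3, 4, 5]
runtime = [1.2, 2.9, 7.1, 16.8, 41.0]   # seconds (made up)

# Fit log(runtime) = a + b * threats by ordinary least squares, capturing
# the roughly exponential growth in planning time as the problem grows.
n = len(threats)
y = [math.log(t) for t in runtime]
xbar, ybar = sum(threats) / n, sum(y) / n
b = sum((x - xbar) * (yi - ybar) for x, yi in zip(threats, y)) / \
    sum((x - xbar) ** 2 for x in threats)
a = ybar - b * xbar

def predicted_seconds(num_threats):
    return math.exp(a + b * num_threats)

print(round(predicted_seconds(6), 1))  # extrapolated time for 6 threats
```

The growing spread of runtimes shown in the histograms means a single point prediction is optimistic; the paper's approach of committing to a high-percentile success time addresses exactly that.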
14
Histogram of Same Performance Results
Note increasing spread (uncertainty of runtime)
as problem grows.
15
Modeling the Problem as MDP
  • Actions: commit to 80% success time for CSM plan.
  • All actions have equal probability of success.
  • Durations vary.
  • States:
  • Sink states: destruction and mission completion.
  • Other states: vector of survival probabilities.
  • Utility model: goal achievement × survival.

16
Algorithms
  • Optimal MDP solution: Bellman backup (finite horizon problem).
  • Very computationally expensive.
  • Greedy: one-step lookahead.
  • Assume you do only one computational action, which is best.
  • Discounted variant.
  • Strawmen: shortest-action first, earliest-phase first, etc.
  • Conducted a number of comparison experiments
    (results published elsewhere).
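The greedy one-step lookahead can be sketched as follows: evaluate each candidate deliberation action as if it were the only one taken, and pick the one with the largest expected utility gain. All names and numbers are illustrative, and the uniform success probability follows the MDP model above:

```python
def mission_eu(survival, reward):
    """Expected utility of the phase sequence (reward kept per executed phase)."""
    eu, alive = 0.0, 1.0
    for s, r in zip(survival, reward):
        alive *= s
        eu += alive * r
    return eu

def greedy_one_step(survival, reward, actions, p_success=0.8):
    """One-step lookahead: assume only one computational action will be done,
    and choose the one with the best expected improvement.
    actions: list of (phase_index, improved_survival) candidate CSM runs,
    each assumed to succeed with the same probability p_success."""
    base = mission_eu(survival, reward)
    best, best_gain = None, 0.0
    for phase, s_new in actions:
        trial = list(survival)
        trial[phase] = s_new
        gain = p_success * (mission_eu(trial, reward) - base)
        if gain > best_gain:
            best, best_gain = (phase, s_new), gain
    return best

survival = [0.9, 0.7, 0.8]
reward = [5, 10, 20]
actions = [(1, 0.95), (2, 0.9)]   # candidate CSM runs (illustrative)
print(greedy_one_step(survival, reward, actions))  # (1, 0.95)
```

Improving phase 1 wins here because its survival boost also protects the large phase-2 reward downstream, which is what a per-phase strawman policy would miss.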

17
Deliberation Scheduling Experiments
  • Developed simulation of greedy algorithm and
    competitors to evaluate performance
  • Analytically compute expected utility of
    different algorithms.
  • Compare sampled performance on test scenarios, to
    validate analytic results.
  • Compare runtimes.

18
Optimal Deliberation Scheduling
  • Optimal policy accounts for all non-deterministic
    outcomes of deliberation and world state.

19
Bounded-Horizon Discrete Schedule
  • Only assign limited future deliberation time, in
    discretized intervals, to maximize expected
    utility of deliberation.
  • Execute one or more of the scheduled deliberation
    activities (CSM methods) and then re-derive
    schedule.
  • À la model predictive control.
  • Greedy approach reduces complexity of
    deliberation scheduling.
  • Reacts effectively to actual outcome of CSM
    processing.
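The bounded-horizon loop above can be sketched as a receding-horizon control loop: build a short deliberation schedule, run one CSM activity, observe its stochastic outcome, and re-derive the schedule. Everything here (task records, the shortest-first window heuristic, the probabilities) is an illustrative stand-in, not CIRCA's actual scheduler:

```python
import random

def schedule_window(pending, horizon=2):
    """Pick up to `horizon` pending CSM tasks, shortest predicted time first
    (a stand-in for maximizing expected utility of deliberation)."""
    return sorted(pending, key=lambda task: task["predicted_secs"])[:horizon]

def run_csm(task, rng):
    """Stand-in for a CSM run: succeeds with the task's success probability."""
    return rng.random() < task["p_success"]

rng = random.Random(0)
pending = [{"name": "ingress", "predicted_secs": 3, "p_success": 0.9},
           {"name": "attack",  "predicted_secs": 20, "p_success": 0.6},
           {"name": "egress",  "predicted_secs": 8, "p_success": 0.8}]

while pending:
    window = schedule_window(pending)   # bounded-horizon schedule
    task = window[0]                    # execute one scheduled activity
    if run_csm(task, rng):              # then react to the actual outcome
        pending.remove(task)            # plan built; drop the task
    # on failure the task stays pending, and the schedule is re-derived

print("all plans built")
```

Only the head of the schedule is ever committed to, so the scheduler reacts to the actual outcome of each CSM run rather than to its expectation.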

20
Discount Factors
  • Greedy use of basic expected utility formula
    requires discounting to take into account two
    important effects
  • Window of opportunity for deliberation you have
    more future time to deliberate on phases that
    start later.
  • Otherwise, large potential improvements in
    far-out phases can distract from near-term
    improvements.
  • Split phase when new plan downloaded during
    execution Amount of improvement limited by time
    remaining in phase.
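The two effects can be sketched as multiplicative discounts on an action's raw expected improvement. The exact formulas are not given on the slide, so the geometric window discount and linear split-phase scaling below are one plausible instantiation, labeled as assumptions:

```python
def discounted_gain(raw_gain, phase_start, now, phase_end, gamma=0.99):
    """Hedged sketch (assumption, not the paper's exact formula) of the two
    discounting effects on a deliberation action's expected improvement:
      1. Window of opportunity: geometrically discount improvements to
         phases that start further in the future, since there will be
         more chances to deliberate on them later.
      2. Phase splitting: if the phase is already executing, scale the
         improvement by the fraction of the phase still remaining.
    Times are in seconds from mission start."""
    delay = max(0.0, phase_start - now)
    window_discount = gamma ** delay
    if now > phase_start:  # phase already underway: split-phase discount
        remaining = max(0.0, phase_end - now) / (phase_end - phase_start)
    else:
        remaining = 1.0
    return raw_gain * window_discount * remaining

# The same raw improvement counts for less in a far-future phase than in
# the currently executing one.
print(discounted_gain(10.0, phase_start=0, now=30, phase_end=120))
print(discounted_gain(10.0, phase_start=200, now=30, phase_end=260))
```

Without the first discount, a large improvement 200 seconds out would outbid a modest near-term one even though there will be plenty of later chances to pursue it.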

21
Runtime Comparison of Optimal vs. Greedy
22
Quality Results for Medium Scenarios
23
Medium Quality Comparison Summary
  • Discounted greedy agent beats simple greedy agent 79 times, ties 3, loses 2.
  • Discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
  • More difficult domains challenge myopic policies, and crush random policy (73% overall). Discounted greedy beats random 83/84 times.
  • Even on easy scenarios, optimal is waaaay too slow!

24
Deliberation Scheduling Experiments Summary
  • Both greedy agents' runtime scales linearly with branching factor of graph at a single level, not the overall size of graph.
  • Branching factor corresponds to number of alternative deliberation actions (here, approximately the number of threats).
  • Note this will probably be too large to enumerate in more realistic domains (even our UAV demo domains).
  • Discounted greedy agent is able to robustly
    achieve the bulk of available deliberation
    utility with acceptable and scalable deliberation
    scheduling delays.
  • Hard domains have significantly higher threat
    profile, larger state space sizes.

25
Mission Testing
  • Modified AMP to incorporate deliberation
    scheduling algorithms.
  • Tested three different agents:
  • S: shortest problem first.
  • U: simple greedy DS.
  • DU: greedy with discounting.
  • Tested in mission with multiple threats and two
    goals.

26
AMP with Deliberation Scheduling
  • AMP-accessible CSM performance profiles added.
  • AMP characterizes problem configuration by number of threats and goals (no structural distinctions yet).
  • AMP estimates survival probability based on threat lethality (prior to planning).
  • Reward is assumed 100% probable if goal is achieved by plan.
  • Six deliberation scheduling modes available:
  • Shortest/earliest/earliest-shortest.
  • Marginal expected future payoff.
  • Incremental expected utility.
  • Discounted incremental expected utility.

27
Mission Overview
[Map: mission route through waypoints 0-6, spanning the ingress, attack, and egress phases.]
28
Demo Agents
  • Shortest: selects shortest deliberation task first.
  • Lacking utility measuring ability, can swap in less-useful plans.
  • Utility: selects highest incremental expected utility.
  • Discounted utility: includes time-discounting to avoid being distracted by high-reward goals far in the future.

29
Demo Outcome
  • Shortest:
  • Builds all the easy single-threat plans quickly.
  • Survives the entire mission.
  • Waits too long before building plans for goal achievement; fails to hit targets.
  • Utility:
  • Builds safe plans for most threats.
  • Gets distracted by high-reward goal in egress phase.
  • Dies in attack phase due to unhandled threat.
  • Discounted utility:
  • Completes entire mission successfully.

30
Expected Payoff vs. Time
Apparent drop in utility is due to phase update.
Utility chooses badly, tries to plan for egress
but ignores threat during attack.
Shortest chooses badly, discards good plans and
tries goal plans too late.
31
Demo 2: Ingress Phase
  • All three are attacked but defend themselves successfully.

32
Demo 2: Attack Phase
  • Utility and Discounted utility hit targets.
  • Utility dies from unhandled threat.
  • Shortest stays safe but does not strike target.

33
Demo 2: Second Attack Phase (Egress)
  • Only Discounted utility hits second target.
  • Shortest stays safe but does not strike target.

34
Summary
35
The End
36
Related Topics
  • Conventional Deliberation Scheduling Work:
  • Typically this work assumes the object-level computation is based on anytime algorithms.
  • CSM algorithms are not readily converted to anytime; performance improvements are discrete and all-or-nothing.
  • Because of the truly parallel Real Time System/AI System, we don't have conventional think/act tradeoffs.
  • Design-to-time is appropriate, but it builds full schedules versus our single action choices. Comparison may be possible.
  • MDP solvers: either infinite horizon, or finite horizon with offline policy computation. We have on-line decision making with a dynamic MDP.

37
Survival
  • Survival is related to plan safety.
  • Survival: the probability of successfully completing a mission phase without transitioning to failure.
  • s_i = P(surviving from start to end of individual phase i).
  • S_i = P(surviving through phase i, from the current point) = s_cur · s_{cur+1} · ... · s_i, where s_cur is the conditional probability of surviving the remainder of the current phase, given survival up to the current point.
38
Reward
  • Reward is related to team goal achievement.
  • Individual agents improve overall utility of
    mission plan by successfully bidding on,
    planning, and executing plans that achieve team
    goals.
  • Team shares rewards in all phases (ties local survival to distributed goal achievement).
  • If awarded goal contract, current plan taken into account.
  • If not awarded goal contract, assume full reward value.
  • Reward for ith phase containing n team goals:
39
Demo Scenario
  • Three types of threats (IR, radar, radar2) during
    ingress, attack, and egress phases.
  • Targets in attack and egress phases.
  • Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in allocated time.
  • Performance profiles are approximate:
  • Predicted planning times range from 1 to 60 seconds.
  • Some configurations take less than predicted.
  • Some take more, and time out rather than finishing.
  • Mission begins as soon as first plan available (< 1 second).
  • Mission lasts approximately 4 minutes.
  • Doing all plans would require 22.3 minutes.

40
Simulation Notes
  • AMP Information Display quality meters illustrate
    status of plans for different mission phases, and
    deliberation scheduling decisions about focus of
    attention/replanning.
  • R: reward achieved by best plan for phase.
  • S: survival probability of best plan for phase.
  • Overall payoff meter shows plan's expected utility.