Title: Deliberation Scheduling for Planning in Real-Time
1. Deliberation Scheduling for Planning in Real-Time
- David J. Musliner
- Honeywell Laboratories
- Robert P. Goldman
- SIFT, LLC
- Kurt Krebsbach
- Lawrence University
2. Outline
- Application summary.
- Deliberation scheduling problem.
- Analytic experiments.
- Demonstration tests.
- Conclusions.
3. Planning and Action for Real-Time Control
- Adaptive Mission Planner: decomposes an overall mission into multiple control problems, with limited performance goals designed to make the controller synthesis problem solvable with the available time and execution resources.
- Controller Synthesis Module: for each control problem, synthesizes a real-time reactive controller according to the constraints sent from the AMP.
- Real Time Subsystem: continuously executes synthesized control reactions in a hard real-time environment; it does not pause waiting for new controllers.

[Diagram: Adaptive Mission Planner, Controller Synthesis Module, Real Time System.]
4. How CIRCA Works

[Diagram: the Adaptive Mission Planner breaks down the task; the Controller Synthesis Module generates controllers; the Real Time System executes the current controller, e.g.:
  if (state-1) then action-1
  if (state-2) then action-2
  ...]
5. CIRCA Design Features
- Flexible systems --- CIRCA reconfigures itself while it is operating.
- Limited resources --- CIRCA dynamically synthesizes controllers for only the immediately relevant parts of the situation. CIRCA does this introspectively, reasoning about its resource limits.
- Time-critical, hazardous situations --- CIRCA guarantees that it will respond in a timely way to threats in its environment.
6. Controller Synthesis Module (CSM)

[Diagram: the problem configuration (goal state description, initial state description, available actions, uncontrollable transitions) feeds the Controller Synthesis Module, which produces a timed automata controller design and an executable reactive controller.]
7. AMP Overview
- The mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
- Threats are safety-critical: the system must guarantee to maintain safety (sometimes probabilistically) in the worst case, using real-time reactions.
- Goals are best-effort: they don't need to be guaranteed.
- Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration.
- Changes in capabilities, mission, or environment can lead to the need for additional controller synthesis.
8. AMP Overview

[Diagram: threats and goals flow into the Adaptive Mission Planner, which sends problem configurations and algorithm controls to the Controller Synthesis Module and receives algorithm performance feedback.]
9. AMP Responsibilities
- Divide the mission into phases, subdividing them as necessary to handle resource restrictions.
- Build problem configurations for each phase, to drive the CSM.
- Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations:
  - Capabilities (assets).
  - Bounded rationality: deliberation resources.
  - Bounded reactivity: execution resources.
10. AMP Deliberation Scheduling
- MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation.
- Issues:
  - Complex utility function for the overall mission plan.
  - Survival dependencies between sequenced controllers.
  - Requires CSM algorithm performance profiles.
  - Planning that is expected to complete further in the future must be discounted.
- Differences from other deliberation scheduling techniques:
  - CSM planning is not an anytime algorithm --- it's more a Las Vegas than a Monte Carlo algorithm.
  - It's not a problem of trading deliberation versus action: deliberation and action proceed in concert.
  - Survival of the platform is a key concern.
11. AMP Deliberation Scheduling
- Mission phases are characterized by:
  - Probability of survival/failure.
  - Expected reward.
  - Expected start time and duration.
- The agent keeps the reward from all executed phases.
- Different CSM problem configuration operators yield different types of plan improvements:
  - Improve the probability of survival.
  - Improve the expected reward (number or likelihood of goals).
- Configuration operators can be applied to the same phase in different ways (via parameters).
- Configuration operators have different expected resource requirements (computation time/space).
12. Expected Mission Utility

Markov chain behavior in the mission phases: probability of surviving each phase vs. entering the absorbing FAILURE state. Reward expectations are unevenly distributed.

[Diagram: Phases 1--5 in sequence; each phase i is survived with probability s_i or leads to FAILURE with probability 1 - s_i; rewards such as R3 and R5 attach to particular phases.]
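As a sketch, the expected utility of this Markov chain can be computed by weighting each phase's reward by the probability of reaching and surviving that phase (the function name, and the assumption that a phase's reward is kept only on surviving it, are illustrative, not taken from the slides):

```python
def expected_mission_utility(survival, reward):
    """Expected utility of a mission modeled as a chain of phases.

    survival[i] -- s_i, probability of surviving phase i (vs. entering
                   the absorbing FAILURE state)
    reward[i]   -- R_i, reward collected for completing phase i
    Assumes a phase's reward is kept only if the agent survives it.
    """
    eu, p_alive = 0.0, 1.0
    for s_i, r_i in zip(survival, reward):
        p_alive *= s_i        # probability of surviving through phase i
        eu += p_alive * r_i   # reward weighted by probability of earning it
    return eu
```

For example, with survival probabilities [0.9, 0.8] and rewards [10, 5], the expected utility is 0.9*10 + 0.9*0.8*5 = 12.6.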
13. The Actions: CSM Performance Profiles

The AMP attempts to predict time-to-plan from domain characteristics, so that the AMP can be smart about configuring CSM problems in time-constrained situations.
14. Histogram of Sample Performance Results

Note the increasing spread (uncertainty of runtime) as the problem grows.
15. Modeling the Problem as an MDP
- Actions: commit to the 80% success time for a CSM plan.
  - All actions have equal probability of success.
  - Durations vary.
- States:
  - Sink states: destruction and mission completion.
  - Other states: vector of survival probabilities.
- Utility model: goal achievement and survival.
16. Algorithms
- Optimal MDP solution: Bellman backup (finite-horizon problem).
  - Very computationally expensive.
- Greedy: one-step lookahead.
  - Assume you do only one computational action, and choose the best one.
  - Discounted variant.
- Strawmen: shortest-action first, earliest-phase first, etc.
- Conducted a number of comparison experiments (results published elsewhere).
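The greedy one-step lookahead can be sketched in a few lines; the `eu_after` helper, which estimates the expected mission utility if a given action were the only computational action performed, is an assumed stand-in for the full utility model:

```python
def greedy_one_step(actions, eu_after):
    """Pick the single deliberation action with the best expected payoff.

    actions  -- candidate CSM problem-configuration operations
    eu_after -- callable mapping an action to the expected mission
                utility if that action were the one computational
                action performed (hypothetical helper)
    """
    # One-step lookahead: evaluate each action in isolation, take the max.
    return max(actions, key=eu_after)
```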
17. Deliberation Scheduling Experiments
- Developed a simulation of the greedy algorithm and competitors to evaluate performance:
  - Analytically compute the expected utility of the different algorithms.
  - Compare sampled performance on test scenarios, to validate the analytic results.
  - Compare runtimes.
18. Optimal Deliberation Scheduling
- The optimal policy accounts for all non-deterministic outcomes of deliberation and world state.
19. Bounded-Horizon Discrete Schedule
- Only assign limited future deliberation time, in discretized intervals, to maximize the expected utility of deliberation.
- Execute one or more of the scheduled deliberation activities (CSM methods) and then re-derive the schedule.
  - À la model predictive control.
- The greedy approach reduces the complexity of deliberation scheduling.
- Reacts effectively to the actual outcome of CSM processing.
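This schedule/execute/re-derive loop resembles model predictive control, and can be sketched as follows (all function names here are illustrative assumptions, not the system's API):

```python
def mpc_deliberation_loop(schedule_fn, execute_fn, mission_over):
    """Bounded-horizon deliberation scheduling, a la model predictive control.

    schedule_fn  -- builds a discretized schedule of deliberation
                    activities (CSM methods) over a limited horizon
    execute_fn   -- runs one scheduled activity and returns the
                    updated deliberation state
    mission_over -- predicate on the deliberation state
    """
    state = None
    while not mission_over(state):
        schedule = schedule_fn(state)    # plan limited future deliberation time
        state = execute_fn(schedule[0])  # execute one activity, observe outcome
        # the schedule is re-derived from scratch on the next iteration,
        # reacting to the actual outcome of CSM processing
    return state
```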
20. Discount Factors
- Greedy use of the basic expected-utility formula requires discounting to take into account two important effects:
  - Window of opportunity for deliberation: there is more future time to deliberate on phases that start later.
    - Otherwise, large potential improvements in far-out phases can distract from near-term improvements.
  - Split phase when a new plan is downloaded during execution: the amount of improvement is limited by the time remaining in the phase.
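One way to realize the window-of-opportunity discount is to decay a candidate improvement's expected gain by how far in the future its phase starts; the exponential form and the default gamma below are illustrative assumptions, not the exact formula from the slides:

```python
def discounted_gain(gain, phase_start, now, gamma=0.9):
    """Discount a candidate improvement by its window of opportunity.

    gain        -- undiscounted expected-utility improvement
    phase_start -- when the improved phase begins
    now         -- current time (same units as phase_start)
    gamma       -- per-time-unit discount factor (assumed value)

    Improvements to far-out phases are discounted more, so they do not
    distract the greedy scheduler from near-term improvements.
    """
    delay = max(0.0, phase_start - now)
    return gain * gamma ** delay
```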
21. Runtime Comparison of Optimal and Greedy
22. Quality Results for Medium Scenarios
23. Medium Quality Comparison Summary
- The discounted greedy agent beats the simple greedy agent 79 times, ties 3, loses 2.
- The discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
- More difficult domains challenge myopic policies, and crush the random policy (73% overall). Discounted greedy beats random 83/84 times.
- Even on easy scenarios, optimal is waaaay too slow!
24. Deliberation Scheduling Experiments Summary
- Both greedy agents' runtime scales linearly with the branching factor of the graph at a single level, not the overall size of the graph.
- The branching factor corresponds to the number of alternative deliberation actions (here, approximately the number of threats).
  - Note: this will probably be too large to enumerate in more realistic domains (even our UAV demo domains).
- The discounted greedy agent is able to robustly achieve the bulk of the available deliberation utility with acceptable and scalable deliberation scheduling delays.
- Hard domains have a significantly higher threat profile and larger state space sizes.
25. Mission Testing
- Modified the AMP to incorporate deliberation scheduling algorithms.
- Tested three different agents:
  - S: shortest problem first.
  - U: simple greedy deliberation scheduling.
  - DU: greedy with discounting.
- Tested in a mission with multiple threats and two goals.
26. AMP with Deliberation Scheduling
- AMP-accessible CSM performance profiles added.
- The AMP characterizes a problem configuration by its threats and goals (no structural distinctions yet).
- The AMP estimates survival probability based on threat lethality (prior to planning).
- Reward is assumed 100% probable if the goal is achieved by the plan.
- Six deliberation scheduling modes are available:
  - Shortest / earliest / earliest-shortest.
  - Marginal expected future payoff.
  - Incremental expected utility.
  - Discounted incremental expected utility.
27. Mission Overview

[Diagram: mission map with waypoints 0--6 through the ingress, attack, and egress phases.]
28. Demo Agents
- Shortest: selects the shortest deliberation task first.
  - Lacking the ability to measure utility, it can swap in less-useful plans.
- Utility: selects the highest incremental expected utility.
- Discounted utility: includes time-discounting to avoid being distracted by high-reward goals far in the future.
29. Demo Outcome
- Shortest:
  - Builds all the easy single-threat plans quickly.
  - Survives the entire mission.
  - Waits too long before building plans for goal achievement; fails to hit targets.
- Utility:
  - Builds safe plans for most threats.
  - Gets distracted by a high-reward goal in the egress phase.
  - Dies in the attack phase due to an unhandled threat.
- Discounted utility:
  - Completes the entire mission successfully.
30. Expected Payoff vs. Time

[Plot: expected payoff over time for each agent.] The apparent drop in utility is due to a phase update. Utility chooses badly: it tries to plan for egress but ignores a threat during attack. Shortest chooses badly: it discards good plans and tries goal plans too late.
31. Demo 2: Ingress Phase
- All three agents are attacked but defend themselves successfully.
32. Demo 2: Attack Phase
- Utility and Discounted utility hit their targets.
- Utility dies from an unhandled threat.
- Shortest stays safe but does not strike the target.
33. Demo 2: Second Attack Phase (Egress)
- Only Discounted utility hits the second target.
- Shortest stays safe but does not strike the target.
34. Summary
35. The End
36. Related Topics
- Conventional deliberation scheduling work:
  - Typically this work assumes the object-level computation is based on anytime algorithms.
  - CSM algorithms are not readily converted to anytime; performance improvements are discrete and all-or-nothing.
  - Because of the truly parallel Real Time System / AI System, we don't have the conventional think/act tradeoffs.
- Design-to-time is appropriate, but it builds full schedules versus single action choices. Comparison may be possible.
- MDP solvers are either infinite-horizon, or finite-horizon with offline policy computation. We have on-line decision making with a dynamic MDP.
37. Survival
- Survival is related to plan safety.
- Survival: the probability of successfully completing a mission phase without transitioning to failure.
- s_i = P(surviving from the start to the end of an individual phase).
- S_i = P(surviving through phase i, from the current point), i.e.
  S_i = s_cur × s_{cur+1} × ... × s_i,
  where s_cur is the conditional probability of surviving the remainder of the current phase.
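Under these definitions, S_i is the product of the conditional survival probability for the remainder of the current phase and the per-phase survival probabilities of the intervening phases. A minimal sketch of that reconstruction (the function and argument names are illustrative):

```python
from math import prod

def survival_through(i, cur, s, s_cur):
    """S_i: P(surviving through phase i, from the current point).

    i     -- index of the last phase to survive through
    cur   -- index of the current phase
    s     -- list of per-phase survival probabilities s_j
    s_cur -- conditional P(surviving the remainder of the current phase)
    """
    # Multiply s_cur by the survival probabilities of phases cur+1 .. i.
    return s_cur * prod(s[j] for j in range(cur + 1, i + 1))
```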
38. Reward
- Reward is related to team goal achievement.
- Individual agents improve the overall utility of the mission plan by successfully bidding on, planning, and executing plans that achieve team goals.
- The team shares rewards in all phases (tying local survival to distributed goal achievement).
  - If awarded a goal contract, the current plan is taken into account.
  - If not awarded a goal contract, assume the full reward value.
- Reward for the ith phase containing n team goals.
39. Demo Scenario
- Three types of threats (IR, radar, radar2) during the ingress, attack, and egress phases.
- Targets in the attack and egress phases.
- Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time.
- Performance profiles are approximate:
  - Predicted planning times range from 1 to 60 seconds.
  - Some configurations take less time than predicted.
  - Some take more, and time out rather than finishing.
- The mission begins as soon as the first plan is available (< 1 second).
- The mission lasts approximately 4 minutes.
- Doing all plans would require 22.3 minutes.
40. Simulation Notes
- AMP Information Display quality meters illustrate the status of plans for the different mission phases, and the deliberation scheduling decisions about the focus of attention/replanning.
  - R: reward achieved by the best plan for a phase.
  - S: survival probability of the best plan for a phase.
- The overall payoff meter shows the plan's expected utility.