Title: Deliberation Scheduling for Planning in Real-Time
1. Deliberation Scheduling for Planning in Real-Time
- David J. Musliner
- Honeywell Laboratories
- Robert P. Goldman
- SIFT, LLC
- Kurt Krebsbach
- Lawrence University
2. Outline
- Application summary.
- Deliberation scheduling problem.
- Analytic experiments.
- Demonstration tests.
- Conclusions.
3. Planning and Action for Real-Time Control
- Adaptive Mission Planner: decomposes an overall mission into multiple control problems, with limited performance goals designed to make the controller synthesis problem solvable with the available time and execution resources.
- Controller Synthesis Module: for each control problem, synthesizes a real-time reactive controller according to the constraints sent from the AMP.
- Real Time Subsystem: continuously executes synthesized control reactions in a hard real-time environment; it does not pause waiting for new controllers.

[Diagram: Adaptive Mission Planner, Controller Synthesis Module, Real Time System.]
4. How CIRCA Works

[Diagram: the Adaptive Mission Planner breaks down the task; the Controller Synthesis Module generates controllers; the Real Time System executes the current controller, e.g.:
  if (state-1) then action-1
  if (state-2) then action-2
  ...]
5. CIRCA Design Features
- Flexible systems --- CIRCA reconfigures itself while it is operating.
- Limited resources --- CIRCA dynamically synthesizes controllers for only the immediately relevant parts of the situation. CIRCA does this introspectively, reasoning about its resource limits.
- Time-critical, hazardous situations --- CIRCA guarantees that it will respond in a timely way to threats in its environment.
6. Controller Synthesis Module (CSM)

[Diagram: the problem configuration (goal state description, initial state description, available actions, uncontrollable transitions) feeds the Controller Synthesis Module, which produces a timed automata controller design and an executable reactive controller.]
7. AMP Overview
- The mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
- Threats are safety-critical: the system must guarantee to maintain safety (sometimes probabilistically) in the worst case, using real-time reactions.
- Goals are best-effort: they don't need to be guaranteed.
- Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration.
- Changes in capabilities, mission, or environment can lead to the need for additional controller synthesis.
8. AMP Overview

[Diagram: threats and goals flow into the Adaptive Mission Planner, which sends problem configurations and algorithm controls to the Controller Synthesis Module and receives algorithm performance feedback.]
9. AMP Responsibilities
- Divide the mission into phases, subdividing them as necessary to handle resource restrictions.
- Build problem configurations for each phase, to drive the CSM.
- Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations:
  - Capabilities (assets).
  - Bounded rationality: deliberation resources.
  - Bounded reactivity: execution resources.
10. AMP Deliberation Scheduling
- MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation.
- Issues:
  - Complex utility function for the overall mission plan.
  - Survival dependencies between sequenced controllers.
  - Requires CSM algorithm performance profiles.
  - Planning that is expected to complete further in the future must be discounted.
- Differences from other deliberation scheduling techniques:
  - CSM planning is not an anytime algorithm --- it's more a Las Vegas than a Monte Carlo algorithm.
  - It's not a problem of trading deliberation versus action: deliberation and action proceed in concert.
  - Survival of the platform is a key concern.
11. AMP Deliberation Scheduling
- Mission phases are characterized by:
  - Probability of survival/failure.
  - Expected reward.
  - Expected start time and duration.
- The agent keeps the reward from all executed phases.
- Different CSM problem configuration operators yield different types of plan improvements:
  - Improve the probability of survival.
  - Improve the expected reward (number or likelihood of goals).
- Configuration operators can be applied to the same phase in different ways (via parameters).
- Configuration operators have different expected resource requirements (computation time/space).
12. Expected Mission Utility

Markov chain behavior in the mission phases: probability of surviving each phase vs. entering the absorbing FAILURE state. Reward expectations are unevenly distributed.

[Diagram: Phases 1--5 in sequence; each phase i is survived with probability s_i or leads to FAILURE with probability 1 - s_i; rewards such as R3 and R5 attach to particular phases.]
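As a sketch, the expected utility of this Markov chain can be computed by weighting each phase's reward by the probability of reaching and surviving that phase (the function name, and the assumption that a phase's reward is kept only on surviving it, are illustrative, not taken from the slides):

```python
def expected_mission_utility(survival, reward):
    """Expected utility of a mission modeled as a chain of phases.

    survival[i] -- s_i, probability of surviving phase i (vs. entering
                   the absorbing FAILURE state)
    reward[i]   -- R_i, reward collected for completing phase i
    Assumes a phase's reward is kept only if the agent survives it.
    """
    eu, p_alive = 0.0, 1.0
    for s_i, r_i in zip(survival, reward):
        p_alive *= s_i        # probability of surviving through phase i
        eu += p_alive * r_i   # reward weighted by probability of earning it
    return eu
```

For example, with survival probabilities [0.9, 0.8] and rewards [10, 5], the expected utility is 0.9*10 + 0.9*0.8*5 = 12.6.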
13. The Actions: CSM Performance Profiles

The AMP attempts to predict time-to-plan from domain characteristics, so that the AMP can be smart about configuring CSM problems in time-constrained situations.
14. Histogram of Sample Performance Results

Note the increasing spread (uncertainty of runtime) as the problem grows.
15. Modeling the Problem as an MDP
- Actions: commit to the 80% success time for a CSM plan.
  - All actions have equal probability of success.
  - Durations vary.
- States:
  - Sink states: destruction and mission completion.
  - Other states: vector of survival probabilities.
- Utility model: goal achievement and survival.
16. Algorithms
- Optimal MDP solution: Bellman backup (finite-horizon problem).
  - Very computationally expensive.
- Greedy: one-step lookahead.
  - Assume you do only one computational action, and choose the best one.
  - Discounted variant.
- Strawmen: shortest-action first, earliest-phase first, etc.
- Conducted a number of comparison experiments (results published elsewhere).
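The greedy one-step lookahead can be sketched in a few lines; the `eu_after` helper, which estimates the expected mission utility if a given action were the only computational action performed, is an assumed stand-in for the full utility model:

```python
def greedy_one_step(actions, eu_after):
    """Pick the single deliberation action with the best expected payoff.

    actions  -- candidate CSM problem-configuration operations
    eu_after -- callable mapping an action to the expected mission
                utility if that action were the one computational
                action performed (hypothetical helper)
    """
    # One-step lookahead: evaluate each action in isolation, take the max.
    return max(actions, key=eu_after)
```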
17. Deliberation Scheduling Experiments
- Developed a simulation of the greedy algorithm and competitors to evaluate performance:
  - Analytically compute the expected utility of the different algorithms.
  - Compare sampled performance on test scenarios, to validate the analytic results.
  - Compare runtimes.
18. Optimal Deliberation Scheduling
- The optimal policy accounts for all non-deterministic outcomes of deliberation and world state.
19. Bounded-Horizon Discrete Schedule
- Only assign limited future deliberation time, in discretized intervals, to maximize the expected utility of deliberation.
- Execute one or more of the scheduled deliberation activities (CSM methods) and then re-derive the schedule.
  - À la model predictive control.
- The greedy approach reduces the complexity of deliberation scheduling.
- Reacts effectively to the actual outcome of CSM processing.
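This schedule/execute/re-derive loop resembles model predictive control, and can be sketched as follows (all function names here are illustrative assumptions, not the system's API):

```python
def mpc_deliberation_loop(schedule_fn, execute_fn, mission_over):
    """Bounded-horizon deliberation scheduling, a la model predictive control.

    schedule_fn  -- builds a discretized schedule of deliberation
                    activities (CSM methods) over a limited horizon
    execute_fn   -- runs one scheduled activity and returns the
                    updated deliberation state
    mission_over -- predicate on the deliberation state
    """
    state = None
    while not mission_over(state):
        schedule = schedule_fn(state)    # plan limited future deliberation time
        state = execute_fn(schedule[0])  # execute one activity, observe outcome
        # the schedule is re-derived from scratch on the next iteration,
        # reacting to the actual outcome of CSM processing
    return state
```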
20. Discount Factors
- Greedy use of the basic expected-utility formula requires discounting to take into account two important effects:
  - Window of opportunity for deliberation: there is more future time to deliberate on phases that start later.
    - Otherwise, large potential improvements in far-out phases can distract from near-term improvements.
  - Split phase when a new plan is downloaded during execution: the amount of improvement is limited by the time remaining in the phase.
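One way to realize the window-of-opportunity discount is to decay a candidate improvement's expected gain by how far in the future its phase starts; the exponential form and the default gamma below are illustrative assumptions, not the exact formula from the slides:

```python
def discounted_gain(gain, phase_start, now, gamma=0.9):
    """Discount a candidate improvement by its window of opportunity.

    gain        -- undiscounted expected-utility improvement
    phase_start -- when the improved phase begins
    now         -- current time (same units as phase_start)
    gamma       -- per-time-unit discount factor (assumed value)

    Improvements to far-out phases are discounted more, so they do not
    distract the greedy scheduler from near-term improvements.
    """
    delay = max(0.0, phase_start - now)
    return gain * gamma ** delay
```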
21. Runtime Comparison of Optimal and Greedy
22. Quality Results for Medium Scenarios
23. Medium Quality Comparison Summary
- The discounted greedy agent beats the simple greedy agent 79 times, ties 3, loses 2.
- The discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
- More difficult domains challenge myopic policies, and crush the random policy (73% overall). Discounted greedy beats random 83/84 times.
- Even on easy scenarios, optimal is waaaay too slow!
24. Deliberation Scheduling Experiments Summary
- Both greedy agents' runtime scales linearly with the branching factor of the graph at a single level, not the overall size of the graph.
- The branching factor corresponds to the number of alternative deliberation actions (here, approximately the number of threats).
  - Note: this will probably be too large to enumerate in more realistic domains (even our UAV demo domains).
- The discounted greedy agent is able to robustly achieve the bulk of the available deliberation utility with acceptable and scalable deliberation scheduling delays.
- Hard domains have a significantly higher threat profile and larger state space sizes.
25. Mission Testing
- Modified the AMP to incorporate deliberation scheduling algorithms.
- Tested three different agents:
  - S: shortest problem first.
  - U: simple greedy deliberation scheduling.
  - DU: greedy with discounting.
- Tested in a mission with multiple threats and two goals.
26. AMP with Deliberation Scheduling
- AMP-accessible CSM performance profiles added.
- The AMP characterizes a problem configuration by its threats and goals (no structural distinctions yet).
- The AMP estimates survival probability based on threat lethality (prior to planning).
- Reward is assumed 100% probable if the goal is achieved by the plan.
- Six deliberation scheduling modes are available:
  - Shortest / earliest / earliest-shortest.
  - Marginal expected future payoff.
  - Incremental expected utility.
  - Discounted incremental expected utility.
27. Mission Overview

[Diagram: mission map with waypoints 0--6 through the ingress, attack, and egress phases.]
28. Demo Agents
- Shortest: selects the shortest deliberation task first.
  - Lacking the ability to measure utility, it can swap in less-useful plans.
- Utility: selects the highest incremental expected utility.
- Discounted utility: includes time-discounting to avoid being distracted by high-reward goals far in the future.
29. Demo Outcome
- Shortest:
  - Builds all the easy single-threat plans quickly.
  - Survives the entire mission.
  - Waits too long before building plans for goal achievement; fails to hit targets.
- Utility:
  - Builds safe plans for most threats.
  - Gets distracted by a high-reward goal in the egress phase.
  - Dies in the attack phase due to an unhandled threat.
- Discounted utility:
  - Completes the entire mission successfully.
30. Expected Payoff vs. Time

[Plot: expected payoff over time for each agent.] The apparent drop in utility is due to a phase update. Utility chooses badly: it tries to plan for egress but ignores a threat during attack. Shortest chooses badly: it discards good plans and tries goal plans too late.
31. Demo 2: Ingress Phase
- All three agents are attacked but defend themselves successfully.
32. Demo 2: Attack Phase
- Utility and Discounted utility hit their targets.
- Utility dies from an unhandled threat.
- Shortest stays safe but does not strike the target.
33. Demo 2: Second Attack Phase (Egress)
- Only Discounted utility hits the second target.
- Shortest stays safe but does not strike the target.
34. Summary
35. The End
36. Related Topics
- Conventional deliberation scheduling work:
  - Typically this work assumes the object-level computation is based on anytime algorithms.
  - CSM algorithms are not readily converted to anytime; performance improvements are discrete and all-or-nothing.
  - Because of the truly parallel Real Time System / AI System, we don't have the conventional think/act tradeoffs.
- Design-to-time is appropriate, but it builds full schedules versus single action choices. Comparison may be possible.
- MDP solvers are either infinite-horizon, or finite-horizon with offline policy computation. We have on-line decision making with a dynamic MDP.
37. Survival
- Survival is related to plan safety.
- Survival: the probability of successfully completing a mission phase without transitioning to failure.
- s_i = P(surviving from the start to the end of an individual phase).
- S_i = P(surviving through phase i, from the current point), i.e.
  S_i = s_cur × s_{cur+1} × ... × s_i,
  where s_cur is the conditional probability of surviving the remainder of the current phase.
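Under these definitions, S_i is the product of the conditional survival probability for the remainder of the current phase and the per-phase survival probabilities of the intervening phases. A minimal sketch of that reconstruction (the function and argument names are illustrative):

```python
from math import prod

def survival_through(i, cur, s, s_cur):
    """S_i: P(surviving through phase i, from the current point).

    i     -- index of the last phase to survive through
    cur   -- index of the current phase
    s     -- list of per-phase survival probabilities s_j
    s_cur -- conditional P(surviving the remainder of the current phase)
    """
    # Multiply s_cur by the survival probabilities of phases cur+1 .. i.
    return s_cur * prod(s[j] for j in range(cur + 1, i + 1))
```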
38. Reward
- Reward is related to team goal achievement.
- Individual agents improve the overall utility of the mission plan by successfully bidding on, planning, and executing plans that achieve team goals.
- The team shares rewards in all phases (tying local survival to distributed goal achievement).
  - If awarded a goal contract, the current plan is taken into account.
  - If not awarded a goal contract, assume the full reward value.
- Reward for the ith phase containing n team goals.
39. Demo Scenario
- Three types of threats (IR, radar, radar2) during the ingress, attack, and egress phases.
- Targets in the attack and egress phases.
- Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time.
- Performance profiles are approximate:
  - Predicted planning times range from 1 to 60 seconds.
  - Some configurations take less time than predicted.
  - Some take more, and time out rather than finishing.
- The mission begins as soon as the first plan is available (< 1 second).
- The mission lasts approximately 4 minutes.
- Doing all plans would require 22.3 minutes.
40. Simulation Notes
- AMP Information Display quality meters illustrate the status of plans for the different mission phases, and the deliberation scheduling decisions about the focus of attention/replanning.
  - R: reward achieved by the best plan for a phase.
  - S: survival probability of the best plan for a phase.
- The overall payoff meter shows the plan's expected utility.