Title: Probabilistic Planning via Determinization in Hindsight (FF-Hindsight)
1. Probabilistic Planning via Determinization in Hindsight (FF-Hindsight)
- Sungwook Yoon
- Joint work with
- Alan Fern, Bob Givan and Rao Kambhampati
2. Probabilistic Planning Competition
[Diagram: the client (a participant) sends actions to the server (the competition host), which simulates the actions.]
3. The Winner Was
- FF-Replan
  - A replanner that uses FF
  - The probabilistic domain is determinized
- Interesting contrast
  - Many probabilistic planning techniques work in theory but not in practice
  - FF-Replan has no theory, yet works in practice
4. The Paper's Objectives
- A better determinization approach (determinization in hindsight)
- Theoretical analysis of the new determinization (in hindsight)
- A new view of FF-Replan
- Experimental studies with determinization in hindsight (FF-Hindsight)
5. Probabilistic Planning (goal-oriented)
[Figure: a lookahead tree from the initial state I. Actions A1 and A2 each have probabilistic outcomes at Time 1 and Time 2, with the left outcomes more likely; some leaves are dead ends, others are goal states. The objective is to maximize goal achievement.]
6. All-Outcome Replanning (FFRA), ICAPS-07
[Figure: an action with two probabilistic effects, Effect 1 (Probability 1) and Effect 2 (Probability 2), is split into two deterministic actions: Action1, which always yields Effect 1, and Action2, which always yields Effect 2. A minimal sketch of this determinization follows.]
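A minimal sketch of all-outcome determinization, assuming hypothetical `ProbAction`/`DetAction` containers rather than the planner's actual PPDDL machinery:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class ProbAction:
    """A probabilistic action: one precondition, several weighted effects."""
    name: str
    precond: FrozenSet[str]
    outcomes: Tuple[Tuple[float, FrozenSet[str]], ...]  # (probability, effect literals)

@dataclass(frozen=True)
class DetAction:
    """A deterministic action produced by determinization."""
    name: str
    precond: FrozenSet[str]
    effect: FrozenSet[str]

def all_outcome_determinize(actions: List[ProbAction]) -> List[DetAction]:
    """Split every probabilistic effect into its own deterministic action.

    The probabilities are dropped entirely, which is exactly why the static
    determinization used by FF-Replan ignores how likely each outcome is.
    """
    det_actions = []
    for a in actions:
        for i, (_prob, effect) in enumerate(a.outcomes, start=1):
            det_actions.append(DetAction(f"{a.name}-{i}", a.precond, effect))
    return det_actions
```

Applied to the action in the figure, this yields Action1 (always Effect 1) and Action2 (always Effect 2).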
7. Probabilistic Planning: All-Outcome Determinization
[Figure: the same lookahead tree, but each probabilistic action A1, A2 is replaced by its deterministic outcome actions A1-1, A1-2, A2-1, A2-2 at Time 1 and Time 2; the planner then searches this determinized tree to find a goal state.]
9. Problem of FF-Replan and a Better Alternative: Sampling
FF-Replan's static determinization does not respect the outcome probabilities. We need a probabilistic, dynamic determinization: sample future outcomes and determinize in hindsight. Each sampled future becomes a known-future deterministic problem (a minimal sketch follows).
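One way to realize a sampled future as a known-future deterministic problem, sketched with assumed helper names (`mdp.outcomes`, `SampledFuture`); this illustrates the idea, not the paper's implementation:

```python
import random
from typing import Dict, Tuple

class SampledFuture:
    """A future fixes the outcome of every (state, action, time) query in advance,
    so planning against it is an ordinary deterministic problem."""

    def __init__(self, mdp, horizon: int, rng: random.Random):
        self.mdp = mdp          # assumed to expose outcomes(state, action) -> [(prob, next_state)]
        self.horizon = horizon
        self.rng = rng
        self._fixed: Dict[Tuple, object] = {}

    def next_state(self, state, action, t):
        """Return the (lazily) fixed successor for (state, action, t)."""
        key = (state, action, t)
        if key not in self._fixed:
            probs, succs = zip(*[(p, s) for p, s in self.mdp.outcomes(state, action)])
            self._fixed[key] = self.rng.choices(succs, weights=probs, k=1)[0]
        return self._fixed[key]
```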
10. Probabilistic Planning (goal-oriented)
[Figure: the same lookahead tree as Slide 5, shown again before sampling begins.]
11. Start Sampling
Note: sampling will reveal which action is better at state I, A1 or A2.
12. Hindsight Sample 1
[Figure: the first sampled future over the same tree; under this future, A1 reaches the goal and A2 does not. Running tally: A1: 1, A2: 0.]
13. Hindsight Sample 2
[Figure: the second sampled future. Running tally: A1: 2, A2: 1.]
14. Hindsight Sample 3
[Figure: the third sampled future. Running tally: A1: 2, A2: 1.]
15. Hindsight Sample 4
[Figure: the fourth sampled future. Running tally: A1: 3, A2: 1.]
16. Summary of the Idea: The Decision Process (Estimating the Q-Value, Q(s,a))
s: the current state; each action a maps s to a successor state a(s) ∈ S.
1. For each action a, draw future samples. Each sample is a deterministic planning problem.
2. Solve the deterministic problems. For goal-oriented problems, the solution length is used to score Q(s,a).
3. Aggregate the solutions for each action into Q(s,a).
4. Select the action with the best aggregated value, argmax_a Q(s,a). (A minimal sketch follows.)
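A minimal sketch of this decision process, with `sample_future` and `solve_deterministic` (e.g., a call out to FF on the determinized problem) left as placeholders. The aggregation shown counts goal-reaching futures, as in the running tallies of Slides 12-15; the paper also uses solution length for goal-oriented problems:

```python
import random

def hindsight_action(state, actions, horizon, num_samples,
                     sample_future, solve_deterministic, rng=random):
    """Pick an action by estimated hindsight Q-value Q(s, a).

    For each action: draw independent futures, solve the resulting known-future
    deterministic problems, and aggregate the per-future results (here, the
    fraction of futures in which a plan to the goal was found).
    """
    q = {}
    for a in actions:
        solved = 0
        for _ in range(num_samples):
            future = sample_future(state, horizon)        # sampled independently per action
            plan = solve_deterministic(state, a, future)  # deterministic planner call (e.g., FF)
            solved += plan is not None
        q[a] = solved / num_samples
    best = max(q.values())
    # Random tie breaking among the best actions is essential (see Slide 29).
    return rng.choice([a for a in actions if q[a] == best])
```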
17. Mathematical Summary of the Algorithm
- An H-horizon future $F_H$ for the MDP $M = (S, A, T, R)$ is a mapping from state, action, and time ($h < H$) to a state: $S \times A \times h \to S$.
- Value of a policy $\pi$ under $F_H$: $R(s, F_H, \pi)$.
- Hindsight value: $V_{HS}(s,H) = E_{F_H}[\max_{\pi} R(s, F_H, \pi)]$.
- Compare this with the real value: $V(s,H) = \max_{\pi} E_{F_H}[R(s, F_H, \pi)]$.
- $V_{FFRa}(s) = \max_{F} V(s,F) \ge V_{HS}(s,H) \ge V(s,H)$ (see the note below).
- $Q(s,a,H) = R(a) + E_{F_{H-1}}[\max_{\pi} R(a(s), F_{H-1}, \pi)]$.
- In our proposal, the computation of $\max_{\pi} R(s, F_{H-1}, \pi)$ is done approximately by FF [Hoffmann and Nebel 2001].
Each future is a deterministic problem, solved by FF.
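The inequality chain above rests on exchanging the maximization and the expectation; a short, standard justification (not specific to this paper):

```latex
% The max of expectations is bounded by the expectation of the max, and an
% expectation over futures is bounded by the best single future:
\max_{\pi} E_{F_H}\!\left[R(s, F_H, \pi)\right]
  \;\le\; E_{F_H}\!\left[\max_{\pi} R(s, F_H, \pi)\right]
  \;\le\; \max_{F}\max_{\pi} R(s, F, \pi),
\qquad\text{i.e.}\qquad
V(s,H) \;\le\; V_{HS}(s,H) \;\le\; \max_{F} V(s,F) = V_{FFRa}(s).
```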
18. Key Technical Results
- The importance of independent sampling across states, actions, and time
- The necessity of random tie breaking in decision making
- We characterize FF-Replan in terms of hindsight decision making: $V_{FFRa}(s) = \max_F V(s,F)$
- Theorem 1: when there is a policy that achieves the goal with probability 1 within the horizon, the hindsight decision-making algorithm reaches the goal with probability 1.
- Theorem 2: a polynomial number of samples suffices with respect to the horizon, the number of actions, and the minimum Q-value advantage.
19. Empirical Results
[Table: IPPC-04 problems; the numbers are counts of solved trials.]
For ZenoTravel, using importance sampling improved the number of solved trials to 26.
20. Empirical Results
These domains were developed specifically to beat FF-Replan. As expected, FF-Replan did not do well, but FF-Hindsight did very well, showing probabilistic reasoning ability while retaining scalability.
21. Conclusion
[Figure: determinization connects the deterministic planning side (classical planning, machine learning for planning, net-benefit optimization, temporal planning) to the probabilistic planning side (Markov decision processes, machine learning for MDPs, temporal MDPs), transferring scalability.]
22. Conclusion
- Devised an algorithm that exploits the significant advances in deterministic planning in the context of probabilistic planning
- Made many deterministic planning techniques available to probabilistic planning
- Most learning-for-planning techniques were developed solely for deterministic planning; these techniques are now relevant to probabilistic planning too
- Advanced net-benefit style planners can be used for reward-maximization style probabilistic planning problems
23. Discussion
- Mercier and Van Hentenryck analyzed the difference between $V(s,H) = \max_{\pi} E_{F_H}[R(s,F_H,\pi)]$ and $V_{HS}(s,H) = E_{F_H}[\max_{\pi} R(s,F_H,\pi)]$.
- Ng and Jordan analyzed the difference between $V(s,H) = \max_{\pi} E_{F_H}[R(s,F_H,\pi)]$ and its sample estimate $\hat{V}(s,H) = \max_{\pi} \frac{1}{m}\sum_{i=1}^{m} R(s,F_H^{(i)},\pi)$, where $m$ is the number of samples.
24. IPPC-2004 Results
[Table: the numbers are successful runs. Winner of IPPC-04: FFRs. Entries using human control knowledge, entries using learned knowledge, and the second-place winners are annotated.]
25. IPPC-2006 Results
[Table: the numbers are percentages of successful runs. Unofficial winner of IPPC-06: FFRa.]
27. Sampling Problem: Time Dependency Issue
[Figure: a small example with states Start, S1, S2, S3, a Goal, and a Dead End. From Start, actions A and B lead to different intermediate states; from there, actions C and D reach the Goal or the Dead End with probabilities p and 1-p, and these probabilities differ between the branch through S1/S2 and the branch through S3.]
28. Sampling Problem: Time Dependency Issue
[Figure: the same example as Slide 27, without the probability annotations.]
S3 is a worse state than S1, but under a shared sampled future it looks as if there is always a path to the Goal. We need to sample independently across actions.
29. Action Selection Problem: Random Tie Breaking Is Essential
[Figure: from Start, action A always stays in Start, while actions B and C lead toward S1 or the Goal, each succeeding with probability p and failing with probability 1-p.]
In the Start state, action C is clearly better, but A can be used to wait until the sampled future in which C's Goal-reaching effect is realized, so the two can tie in hindsight value; random tie breaking keeps the agent from waiting forever.
30. Sampling Problem: Importance Sampling (IS)
[Figure: from Start, action B leads to S1 with very high probability and to the Goal with extremely low probability.]
- Sampling uniformly would find the problem unsolvable.
- Use importance sampling (a sketch follows).
- Identifying the region that needs importance sampling is left for further study.
- In the benchmarks, ZenoTravel needs the IS idea.
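A hedged sketch of generic importance sampling over action outcomes; the proposal distribution and weight bookkeeping below are illustrative, not necessarily the exact scheme used for ZenoTravel:

```python
import random

def sample_outcome_importance(outcomes, proposal, rng=random):
    """Sample one outcome from `proposal` instead of the true distribution.

    `outcomes` and `proposal` are lists of (probability, next_state) over the
    same next_states; the returned weight p/q corrects the sampling bias so
    that weighted estimates remain unbiased.
    """
    q_probs = [q for q, _ in proposal]
    idx = rng.choices(range(len(proposal)), weights=q_probs, k=1)[0]
    p, next_state = outcomes[idx]
    q, _ = proposal[idx]
    return next_state, p / q  # importance weight
```

For the figure above, proposing the Goal outcome with, say, probability 0.5 instead of its tiny true probability makes it actually appear among the sampled futures, while the weight p/q keeps the resulting estimates unbiased.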
31. Theoretical Results
- Theorem 1: For goal-achieving probabilistic planning problems, if there is a policy that solves the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio of less than 1.
  - If there is a future under which no plan can achieve the goal, that future can be sampled.
- Theorem 2: The number of future samples w needed to correctly identify the best action satisfies $w > 4\Delta^{-2} T \ln(AH/\delta)$, where $\Delta$ is the minimum Q-value advantage of the best action over the other actions and $\delta$ is the confidence parameter. The bound follows from the Chernoff bound (an illustrative instantiation follows).
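A purely illustrative instantiation of the Theorem 2 bound, with assumed values $\Delta = 0.1$, $T = 20$, $A = 10$, $H = 20$, $\delta = 0.05$ (these numbers are not from the paper):

```latex
w \;>\; 4\,\Delta^{-2}\, T \,\ln\!\frac{AH}{\delta}
  \;=\; 4 \cdot 100 \cdot 20 \cdot \ln\!\frac{10 \cdot 20}{0.05}
  \;=\; 8000 \cdot \ln 4000 \;\approx\; 8000 \cdot 8.29 \;\approx\; 6.6 \times 10^{4},
% so the required number of sampled futures grows polynomially in the problem
% parameters and only logarithmically in A, H, and 1/\delta.
```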
32. Probabilistic Planning: Expectimax Solution
[Figure: the exact expectimax tree for the same problem: Max nodes over actions alternate with Expectation nodes over probabilistic outcomes at Time 1 and Time 2, terminating in goal states and dead ends. The objective is to maximize goal achievement.]