1
AN ADAPTIVE PLANNER BASED ON LEARNING OF
PLANNING PERFORMANCE
  • Kreshna Gopal and Thomas R. Ioerger
  • Department of Computer Science
  • Texas A&M University
  • College Station, TX

2
APPROACHES TO PLANNING
  • Planning as Problem Solving
  • Situation Calculus Planning
  • STRIPS
  • Partial Order Planning
  • Hierarchical Planning
  • Enhance Language Expressiveness
  • Planning with Constraints
  • Special Purpose Planning
  • Reactive Planning
  • Plan Execution and Monitoring
  • Distributed, Continual Planning
  • Planning Graphs
  • Planning as Satisfiability
  • Machine Learning Methods for Planning

3
MACHINE LEARNING METHODS FOR PLANNING
  • Learning Macro-operators
  • Learning Bugs and Repairs
  • Explanation-Based Learning
  • Reinforcement Learning
  • Case-Based Planning
  • Plan Reuse

4
PLAN REUSE ISSUES
  • Plan storage and indexing
  • Plan retrieval - matching the new problem with previously solved ones
  • Plan modification - adapting the retrieved plan to the requirements of the new problem
  • Nebel & Koehler showed that
  • Plan matching is NP-hard
  • Plan modification is worse than plan generation in the worst case
  • Motivations of the proposed method
  • Avoid plan modification
  • Very efficient matching using a neural network

5
COMPONENTS OF PROPOSED PLANNING SYSTEM
  • Default Planner
  • Plan Library (I = Initial State, G = Goal State, P = Solution Plan)
  • I1, G1, P1
  • I2, G2, P2
  • ...
  • In, Gn, Pn
  • Training: predict the default planner's performance using a neural network (see the data-structure sketch below)
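
A minimal Python sketch (the original system was implemented in Common Lisp) of how such a plan library might be stored; the Case class and add_case helper are illustrative names, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass
class Case:
    """One library entry: initial state I, goal state G, solution plan P."""
    initial: FrozenSet[str]   # e.g. {"ON(A,B)", "ON-TABLE(B)", "CLEAR(A)"}
    goal: FrozenSet[str]
    plan: List[str]           # sequence of operator applications

# The plan library is simply the list of solved problems <Ii, Gi, Pi>.
library: List[Case] = []

def add_case(initial, goal, plan):
    """Store a newly solved problem for later reuse."""
    library.append(Case(frozenset(initial), frozenset(goal), list(plan)))
```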

6
SCHEME OF REUSE
  • New problem <Inew, Gnew>
  • Retrieved plan <Ik, Gk, Pk>
  • (Diagram) Pnew: direct plan from Inew to Gnew; PI: plan from Inew to Ik; Pk: stored plan from Ik to Gk; PG: plan from Gk to Gnew
  • Proposed approach: use the default planner to generate PI and PG (instead of Pnew) and return the concatenation of PI, Pk and PG as the solution (sketched below)
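
A minimal sketch of that concatenation step, assuming a default_planner(initial, goal) callable that returns a plan as a list of operator applications (this interface is assumed for illustration, not taken from the presentation):

```python
def solve_by_reuse(i_new, g_new, case, default_planner):
    """Reuse scheme: instead of planning from Inew to Gnew directly, plan from
    Inew to the retrieved case's initial state Ik (PI), splice in the stored
    plan Pk, then plan from the case's goal state Gk to Gnew (PG)."""
    p_i = default_planner(i_new, case.initial)   # PI: Inew -> Ik
    p_g = default_planner(case.goal, g_new)      # PG: Gk -> Gnew
    return p_i + case.plan + p_g                 # concatenation PI ++ Pk ++ PG
```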

7
DISTANCE AND GAIN METRICS
  • Distance(I,G)
  • The time the default planner will take to solve <I,G>
  • Gain(Inew, Gnew, Ik, Gk) =
  •   Distance(Inew, Gnew) / (Distance(Inew, Ik) + Distance(Gk, Gnew))
  • Choose the case with maximum Gain
  • There should be a minimum Distance (MinTime) and a minimum Gain (MinGain) for reuse (a small Gain sketch follows this list)
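
A short sketch of the Gain metric, assuming a distance(I, G) function that returns the predicted planning time (the prediction is learned in the following slides):

```python
def gain(i_new, g_new, case, distance):
    """Gain = Distance(Inew, Gnew) / (Distance(Inew, Ik) + Distance(Gk, Gnew))."""
    return distance(i_new, g_new) / (distance(i_new, case.initial) +
                                     distance(case.goal, g_new))
```

Intuitively, a Gain above 1 means the two bridging plans are predicted to cost less, in total, than planning for <Inew, Gnew> from scratch.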

8
THE TRAINING PHASE
  • Target function
  • Time prediction, t
  • Training experience
  • Solved examples, D
  • Target function representation
  • t = Σ (i = 0 to n) wi fi    (n features, f0 = 1)
  • Learning algorithm
  • Gradient descent minimizes the error, E, of the weight vector w
  • E(w) = ½ Σ (d ∈ D) (td - od)²
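
In code, the linear target function is just a weighted sum of the problem's feature values (a sketch; the actual features are listed on a later slide):

```python
def predict_time(weights, features):
    """t = Σ wi·fi with f0 = 1, i.e. weights[0] acts as the bias weight."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))
```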

9
GRADIENT DESCENT ALGORITHM
  • Inputs: 1. Training examples, where each example is a pair <x, t>, where x is the input vector and t is the target output value
  •         2. Learning rate, η
  •         3. Number of iterations, m
  • Initialize each wi to some small random value
  • repeat m times
  •   Initialize each Δwi to 0
  •   for each training example <x, t> do
  •     Find the output o of the unit on input x
  •     for each linear unit weight wi do
  •       Δwi ← Δwi + η (t - o) xi
  •   for each linear unit weight wi do
  •     wi ← wi + Δwi
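
A runnable sketch of this batch gradient descent procedure (Python for illustration; the learning rate, iteration count and initialization range are arbitrary choices):

```python
import random

def train_linear_unit(examples, n_features, eta=0.01, iterations=1000):
    """Train the linear time-prediction unit by batch gradient descent.
    examples: list of (x, t) pairs, where x is a feature vector of length
    n_features and t is the target output (observed planning time)."""
    # Initialize each wi to a small random value; w[0] is the weight for f0 = 1.
    w = [random.uniform(-0.05, 0.05) for _ in range(n_features + 1)]
    for _ in range(iterations):
        delta = [0.0] * len(w)                    # initialize each Δwi to 0
        for x, t in examples:
            # Output of the unit on input x.
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            delta[0] += eta * (t - o)             # bias term (x0 = 1)
            for i, xi in enumerate(x, start=1):
                delta[i] += eta * (t - o) * xi    # Δwi ← Δwi + η(t - o)xi
        for i in range(len(w)):
            w[i] += delta[i]                      # wi ← wi + Δwi
    return w
```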

10
FEATURE EXTRACTION
  • Feature extraction by domain experts
  • Knowledge-acquisition bottleneck
  • Automatic feature extraction methods can be used
  • Domain dependence: domain knowledge is crucial for efficient planning systems

11
PLAN RETRIEVAL AND REUSE ALGORITHM
  • Inputs: LIBRARY, w, MinGain, MinTime and a new problem <Inew, Gnew>
  • if Distance(Inew, Gnew) < MinTime
  •   then call the default planner to solve <Inew, Gnew>
  •   else
  •     MaxGain ← -∞                /* MaxGain records the maximum Gain so far */
  •     for k = 1 to n do           /* There are n cases in LIBRARY */
  •       kth case = <Ik, Gk, Pk>
  •       Gain ← Distance(Inew, Gnew) / (Distance(Inew, Ik) + Distance(Gk, Gnew))
  •       if Gain > MaxGain then
  •         MaxGain ← Gain
  •         b ← k                   /* b is the index of the best case found so far */
  •     if MaxGain > MinGain
  •       then use the default planner to generate PI (from Inew to Ib) and PG (from Gb to Gnew) and return the concatenation of PI, Pb and PG
  •       else call the default planner to solve <Inew, Gnew>
  • (A Python sketch of this procedure follows.)
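
The sketch below restates the retrieval-and-reuse procedure in Python, reusing the Case structure and the assumed distance and default_planner callables from the earlier sketches; the threshold defaults are placeholders:

```python
import math

def retrieve_and_reuse(library, i_new, g_new, distance, default_planner,
                       min_gain=1.0, min_time=1.0):
    """Solve <Inew, Gnew> either from scratch or by reusing a library case."""
    # Easy problems (predicted time below MinTime) are solved from scratch.
    if distance(i_new, g_new) < min_time:
        return default_planner(i_new, g_new)
    max_gain, best = -math.inf, None      # MaxGain records the maximum Gain so far
    for case in library:                  # scan the n cases in the library
        g = distance(i_new, g_new) / (distance(i_new, case.initial) +
                                      distance(case.goal, g_new))
        if g > max_gain:
            max_gain, best = g, case      # best case found so far
    if max_gain > min_gain:
        # Reuse: bridge Inew -> Ib and Gb -> Gnew, splicing in the stored plan Pb.
        p_i = default_planner(i_new, best.initial)
        p_g = default_planner(best.goal, g_new)
        return p_i + best.plan + p_g
    return default_planner(i_new, g_new)  # otherwise fall back to the default planner
```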

12
EMPIRICAL EVALUATION
  • Default planner: STRIPS (Shorts & Dickens)
  • Learning: perceptron
  • Blocks-world domain (3-7 blocks)
  • Plan library (100 - 1000 cases)
  • Common LISP, SPARCStation

13
EXAMPLE OF REUSE: SUSSMAN ANOMALY PROBLEM
  • (Diagram) PI,k: plan from Inew to Ik; Pk: stored plan from Ik to Gk; PG,k: plan from Gk to Gnew
  • Pk = MOVE-BLOCK-TO-TABLE(Blue,Red),
  •      MOVE-BLOCK-FROM-TABLE(Red,Blue)
  • PI,k = MOVE-BLOCK-TO-BLOCK(Blue,Yellow,Red)
  • PG,k = MOVE-BLOCK-FROM-TABLE(Yellow,Red)

14
BLOCKS-WORLD DOMAIN
  • BLOCKS: A, B, C, ...
  • PREDICATES
  • ON(A,B): block A is on block B
  • ON-TABLE(B): block B is on the table
  • CLEAR(A): block A is clear
  • OPERATORS (a STRIPS-style sketch follows this list)
  • MOVE-BLOCK-TO-BLOCK(A,B,C)
  •   Move A from the top of B to the top of C
  • MOVE-BLOCK-TO-TABLE(A,B)
  •   Move A from the top of B to the table
  • MOVE-BLOCK-FROM-TABLE(A,B)
  •   Move A from the table to the top of B
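
A sketch of the three operators in STRIPS style (preconditions, add list, delete list), using the standard blocks-world encoding; the exact representation in the original Common Lisp system may differ:

```python
def move_block_to_block(a, b, c):
    """Move block a from the top of b to the top of c."""
    return {"pre":    {f"ON({a},{b})", f"CLEAR({a})", f"CLEAR({c})"},
            "add":    {f"ON({a},{c})", f"CLEAR({b})"},
            "delete": {f"ON({a},{b})", f"CLEAR({c})"}}

def move_block_to_table(a, b):
    """Move block a from the top of b to the table."""
    return {"pre":    {f"ON({a},{b})", f"CLEAR({a})"},
            "add":    {f"ON-TABLE({a})", f"CLEAR({b})"},
            "delete": {f"ON({a},{b})"}}

def move_block_from_table(a, b):
    """Move block a from the table to the top of b."""
    return {"pre":    {f"ON-TABLE({a})", f"CLEAR({a})", f"CLEAR({b})"},
            "add":    {f"ON({a},{b})"},
            "delete": {f"ON-TABLE({a})", f"CLEAR({b})"}}
```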

15
FEATURES
  • Domain-independent features
  • Size of problem
  • SIZE
  • Number of conditions in goal state already
    satisfied in initial state
  • SAT-CLEAR, SAT-ON, SAT-ON-TABLE
  • Number of conditions in goal state not satisfied
    in initial state
  • UNSAT-CLEAR, UNSAT-ON,
  • UNSAT-ON-TABLE
  • Domain-dependent features
  • Number of stacks in the initial and goal states
  • STACK-INIT, STACK-GOAL
  • Number of blocks already in place, i.e. blocks that need not be moved to reach the goal configuration
  • IN-PLACE
  • Heuristic function that estimates the number of planning steps
  • STEPS (a feature-extraction sketch follows this list)
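
A sketch of extracting a few of the listed features from a blocks-world problem, where states are sets of ground literals such as "ON(A,B)"; the feature names mirror the slide, but the exact definitions used by the authors are not given, so these are illustrative:

```python
def extract_features(initial, goal):
    """Count satisfied and unsatisfied goal conditions, per predicate."""
    on_goal    = {c for c in goal if c.startswith("ON(")}
    table_goal = {c for c in goal if c.startswith("ON-TABLE(")}
    clear_goal = {c for c in goal if c.startswith("CLEAR(")}
    return {
        "SIZE":           len(goal),                # one possible size measure
        "SAT-ON":         len(on_goal & initial),   # ON goals already true in I
        "UNSAT-ON":       len(on_goal - initial),
        "SAT-ON-TABLE":   len(table_goal & initial),
        "UNSAT-ON-TABLE": len(table_goal - initial),
        "SAT-CLEAR":      len(clear_goal & initial),
        "UNSAT-CLEAR":    len(clear_goal - initial),
    }
```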

16
CONCLUSIONS
  • Plan modification is avoided
  • Problem matching is done very efficiently
  • The planning system is domain-independent
  • Other target functions (like quality of plans)
    can be learned and predicted
  • Utility problem
  • Indexing the library
  • Selective storage
  • Integrate with other techniques