Transcript and Presenter's Notes

Title: Learning Relational Rules for Goal Decomposition


1
Learning Relational Rules for Goal Decomposition
  • Prasad Tadepalli
  • Oregon State University
  • Chandra Reddy
  • IBM T.J. Watson Research Center
  • Supported by Office of Naval Research

2
A Critique of Current Research
  • Most work is confined to learning in isolation
  • Predominantly employs propositional representations
  • The learner is passive and has to learn from random
    examples
  • The role of prior knowledge in learning is minimal

3
Our Approach
  • Learning in the context of hierarchical problem
    solving
  • The goals, states and actions are represented
    relationally
  • Active Learning: the learner can ask questions, pose
    problems to itself, and solve them
  • Declarative prior knowledge guides and speeds up
    learning

4
Air Traffic Control (ATC) Task (Ackerman and Kanfer study)
5
Goal Decomposition Rules (D-rules)
  • D-rules decompose goals into subgoals.
  • goal: land(?plane)
  • condition: plane-at(?plane, ?loc) & level(L3, ?loc)
  • subgoals: move(?plane, L2), move(?plane, L1),
    land1(?plane)
  • Problems are solved by recursive decomposition of
    goals into subgoals (sketched below).
  • Control knowledge guides the selection of
    appropriate decomposition rules.
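A minimal Python sketch (mine, not the authors' implementation) of how a d-rule like the one above can be stored and applied by recursive decomposition. The tuple encoding of literals and the unify/holds/decompose helpers are illustrative assumptions, and move/land1 are treated as primitive only to keep the toy run short.

# Literals are tuples like ("plane-at", "?p", "10"); names starting with "?" are variables.
def unify(pattern, fact, binding):
    """Try to extend `binding` so literal `pattern` matches ground literal `fact`."""
    if len(pattern) != len(fact):
        return None
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if binding.get(p, f) != f:
                return None
            binding[p] = f
        elif p != f:
            return None
    return binding

def holds(condition, state, binding):
    """All ways to extend `binding` so every condition literal appears in `state`."""
    results = [binding]
    for lit in condition:
        results = [b2 for b in results for fact in state
                   if (b2 := unify(lit, fact, b)) is not None]
    return results

def decompose(goal, state, d_rules, primitives):
    """Recursively decompose `goal` into a sequence of primitive actions."""
    if goal[0] in primitives:
        return [goal]                               # base case: executable step
    for rule in d_rules:
        b = unify(rule["goal"], goal, {})
        if b is None:
            continue
        for b2 in holds(rule["condition"], state, b):
            plan = []
            for sub in rule["subgoals"]:
                grounded = tuple(b2.get(t, t) for t in sub)   # substitute bindings
                plan += decompose(grounded, state, d_rules, primitives)
            return plan
    raise ValueError(f"no d-rule covers {goal}")

rules = [{
    "goal": ("land", "?p"),
    "condition": [("plane-at", "?p", "?loc"), ("level", "L3", "?loc")],
    "subgoals": [("move", "?p", "L2"), ("move", "?p", "L1"), ("land1", "?p")],
}]
state = [("plane-at", "p1", "10"), ("level", "L3", "10")]
# move/land1 are primitive only in this toy run; in the ATC domain they would
# be decomposed further by their own d-rules.
print(decompose(("land", "p1"), state, rules, primitives={"move", "land1"}))
# -> [('move', 'p1', 'L2'), ('move', 'p1', 'L1'), ('land1', 'p1')]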

6
Domain theory for ATC task
  • Domain Axioms (applied in the sketch after the operator list)
  • can-land-short(?p) :- type(?p propeller)
  • can-land-short(?p) :- type(?p DC10) & wind-speed(low)
    & runway-cond(dry)
  • Primitive Operators
  • jump(?cursor-from, ?cursor-to),
  • short-deposit(?plane, ?runway),
  • long-deposit(?plane, ?runway),
  • select(?loc, ?plane)
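As a rough illustration (my own encoding, not from the slides), the two axioms can be applied by checking their bodies against a set of ground facts written in the style of slide 7.

AXIOMS = [
    # can-land-short(?p) :- type(?p propeller)
    lambda facts, p: f"type({p}, propeller)" in facts,
    # can-land-short(?p) :- type(?p DC10) & wind-speed(low) & runway-cond(dry)
    lambda facts, p: (f"type({p}, DC10)" in facts
                      and "wind-speed(low)" in facts
                      and "runway-cond(dry)" in facts),
]

def derive_can_land_short(facts, planes):
    """Add can-land-short(?p) for every plane covered by some axiom."""
    derived = set(facts)
    for p in planes:
        if any(axiom(facts, p) for axiom in AXIOMS):
            derived.add(f"can-land-short({p})")
    return derived

state = {"type(p1, propeller)", "runway-cond(wet)", "wind-speed(high)"}
print(derive_can_land_short(state, ["p1"]))
# includes "can-land-short(p1)" via the propeller axiom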

7
Learning from Demonstration
  • Input Examples
  • State: at(p1, 10), type(p1, propeller),
    fuel(p1, 5), cursor-loc(4), free(1), free(2), ...,
    free(9), free(11), ..., free(15), runway-cond(wet),
    wind-speed(high), wind-dir(south)
  • Goal: land-plane(p1)
  • Solution: jump(4, 10), select(10, p1),
    jump(10, 14), short-deposit(p1, R2)
  • Output: the underlying D-rules (the input example is written out as data below)
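One possible encoding of this demonstration as structured data; the field names state/goal/solution are mine, not the authors'.

example = {
    "state": [
        "at(p1, 10)", "type(p1, propeller)", "fuel(p1, 5)", "cursor-loc(4)",
        # free(1) ... free(9), free(11) ... free(15) elided
        "runway-cond(wet)", "wind-speed(high)", "wind-dir(south)",
    ],
    "goal": "land-plane(p1)",
    "solution": ["jump(4, 10)", "select(10, p1)", "jump(10, 14)",
                 "short-deposit(p1, R2)"],
}
# The learner's job is to recover d-rules that, chained together,
# reproduce "solution" from "state" and "goal".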

8
Generalizing Examples
  • Examples are inductively generalized.
  • Examples map to D-rules as follows:
  • Example goal → D-rule goal
  • Initial state → Condition
  • Literals in other states → Subgoals
  • Least General Generalization (lgg), sketched below

(Figure: lgg of X and H)
Problem: The size of the lgg grows exponentially with
the number of examples.
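A minimal sketch of Plotkin-style lgg over two conjunctions of literals (tuple encoding as in the earlier sketch, and again my own illustration, not the authors' code). The pairwise construction also shows where the blow-up comes from: each lgg can contain up to |A| * |B| literals.

from itertools import count

def lgg_literals(a, b, table, fresh):
    """lgg of two literals with the same predicate and arity."""
    args = []
    for x, y in zip(a[1:], b[1:]):
        if x == y:
            args.append(x)                        # shared constant is kept
        else:
            if (x, y) not in table:               # same mismatch -> same variable
                table[(x, y)] = f"?v{next(fresh)}"
            args.append(table[(x, y)])
    return (a[0], *args)

def lgg(conj_a, conj_b):
    """lgg of two conjunctions: all pairwise lggs of compatible literals."""
    table, fresh, out = {}, count(), set()
    for a in conj_a:
        for b in conj_b:
            if a[0] == b[0] and len(a) == len(b):
                out.add(lgg_literals(a, b, table, fresh))
    return out

ex1 = [("at", "p1", "10"), ("type", "p1", "propeller")]
ex2 = [("at", "p2", "12"), ("type", "p2", "propeller")]
print(lgg(ex1, ex2))
# -> {('at', '?v0', '?v1'), ('type', '?v0', 'propeller')} (in some order)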
9
Learning from Queries
  • Use queries to prevent the exponential growth of
    the lgg
  • (Reddy and Tadepalli, 1997) Non-recursive,
    single-predicate Horn programs are learnable
    from queries and examples.
  • Prune each literal in the lgg and ask a membership
    query (a question) to confirm that the result is not
    over-general (pruning loop sketched below).
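A sketch of that pruning loop; ask_membership_query is a stand-in for the oracle (a teacher, or the testing procedure of slide 22).

def prune_condition(condition, ask_membership_query):
    """Greedily drop literals whose removal keeps the rule correct."""
    kept = list(condition)
    for lit in list(kept):
        candidate = [l for l in kept if l != lit]
        if ask_membership_query(candidate):   # "is the shorter rule still not over-general?"
            kept = candidate                  # literal was unnecessary
    return kept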

10-15
Need for queries
(Figure sequence: a training example x, the hypothesis D, their lgg, and the target concept; the lgg can overshoot the target and become over-general, which motivates the membership queries above.)
16
Using Prior Knowledge
  • Explanation-Based Pruning
  • Remove literals that do not play a causal role in
    the plan, e.g., free(1), free(2), ...
  • Abstraction by Forward Chaining
  • can-land-short(?p) :- type(?p propeller)
  • Helps learn a more general rule (both ideas are
    sketched below).
  • Learning subgoal order
  • Subgoal literals are maintained as a sequence of
    sets of literals. A set is refined into a
    sequence of smaller sets using multiple examples.
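Two small helpers, one per idea above; both formulations are my own illustrations of the slide, not the authors' code.

def explanation_based_prune(state_literals, explained_literals):
    """Keep only state literals that the explanation of the demonstrated plan
    actually uses (operator preconditions, bodies of fired axioms); literals
    such as free(1), free(2), ... that play no causal role are dropped."""
    return [lit for lit in state_literals if lit in explained_literals]

def abstract_by_forward_chaining(state_literals, axioms):
    """Forward-chain the domain axioms to a fixpoint, adding derived literals
    such as can-land-short(p1), so the learned condition can mention the more
    general derived predicate instead of its specific justification."""
    derived = set(state_literals)
    changed = True
    while changed:
        changed = False
        for axiom in axioms:                  # axiom: set of facts -> new facts
            for fact in axiom(derived):
                if fact not in derived:
                    derived.add(fact)
                    changed = True
    return derived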

17
Learning Multiple D-Rules
  • Maintain a list of d-rules for each goal.
  • Combine a new example x with the first d-rule
    h_i for which lgg(x, h_i) is not over-general.
  • Reduce the result and replace h_i (see the sketch below).
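A sketch of this per-goal update; lgg, is_over_general, and reduce_rule are stand-ins for the components described on the earlier slides.

def incorporate(example, hypotheses, lgg, is_over_general, reduce_rule):
    """Merge `example` into the first compatible d-rule, else start a new one."""
    for i, h in enumerate(hypotheses):
        g = lgg(example, h)
        if not is_over_general(g):
            hypotheses[i] = reduce_rule(g)    # prune/reduce, then replace h_i
            return hypotheses
    hypotheses.append(example)                # no compatible rule yet
    return hypotheses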

18
Results on learning from demonstration
19
Learning from Exercises
  • Supplying solved training examples is too
    demanding for the teacher.
  • Solving problems from scratch is computationally
    hard.
  • A compromise solution: learning from exercises.
  • Exercises are intermediate subproblems that help in
    solving the main problems.
  • Solving easier subproblems makes it possible to
    solve more difficult problems.

20
Difficulty Levels in ATC Domain
21
Solving Exercises
  • Use previously learned d-rules as operators.
  • Iterative-deepening DFS to find short rules
    (search sketched below).
  • Generalization is done as before.
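A generic iterative-deepening depth-first search sketch; expand(state) is an assumed interface yielding (successor_state, step) pairs, and nothing here is ATC-specific.

def iddfs(start, is_goal, expand, max_depth=20):
    """Return the first plan found under an increasing depth cutoff."""
    def dls(state, depth, path):
        if is_goal(state):
            return path
        if depth == 0:
            return None
        for nxt, step in expand(state):
            found = dls(nxt, depth - 1, path + [step])
            if found is not None:
                return found
        return None

    for limit in range(max_depth + 1):   # deepen the cutoff gradually,
        plan = dls(start, limit, [])     # so short solutions are found first
        if plan is not None:
            return plan
    return None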

22
Query Answering by Testing
  • Generate test problems (InitialState, Goal) that
    match the d-rule.
  • Use the decomposition that the d-rule suggests,
    and solve the problems.
  • If some problem cannot be solved, the rule is
    over-general (see the sketch below).
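A sketch of answering a membership query by testing; generate_matching_problems and solve_subgoals are assumed helpers, and the d-rule is the dict representation used in the earlier sketches.

def rule_is_over_general(rule, generate_matching_problems, solve_subgoals, n=20):
    """Approximate membership query: the rule is judged over-general if the
    decomposition it suggests fails on some sampled problem it covers."""
    for state, goal in generate_matching_problems(rule, n):
        if solve_subgoals(rule["subgoals"], state, goal) is None:
            return True      # decomposition failed on a covered problem
    return False             # passed every sampled test (an approximate answer)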

23
Results on learning from exercises
  • 14 d-rules

24
Conclusions
  • It is possible to learn useful problem-solving
    strategies in expressive representations.
  • Prior knowledge can be put to good use in
    learning.
  • Queries can be implemented approximately using
    heuristic techniques.
  • Learning from demonstration and learning from
    exercises make different tradeoffs with respect
    to learning and reasoning.

25
Learning for Training Environments (Ron Metoyer)
  • Task Training
  • Sports
  • Military

Boston Dynamics Inc.
Who creates the training content?
26
Research Challenges
  • Learning must be on-line. Must learn quickly,
    since users can only give a few examples.
  • Extension to more complex strategy languages that
    include concurrency, partial observability,
    real-time execution, multiple agents, e.g.,
    ConGolog
  • Provide a predictable model of generalization.
  • Allow learning from demonstrations, reinforcement,
    advice, and hints, e.g., improving strategies or
    learning to select between them.

27
Vehicle Routing and Product Delivery
28
Learning Challenges
  • Very large number of states and actions
  • Stochastic demands by customers and shops
  • Multiple agents (trucks, truck companies, shops,
    distribution centers)
  • Partial observability
  • Hierarchical decision making
  • Significant real-world impact

29
ICML Workshop on Relational Reinforcement Learning
  • Paper Deadline: April 2
  • Check ICML website