Title: Learning Relational Rules for Goal Decomposition
1 Learning Relational Rules for Goal Decomposition
- Prasad Tadepalli
- Oregon State University
- Chandra Reddy
- IBM T.J. Watson Research Center
- Supported by Office of Naval Research
2 A Critique of Current Research
- Most work is confined to learning in isolation
- Predominantly employs propositional representations
- The learner is passive and has to learn from random examples
- The role of prior knowledge in learning is minimal
3 Our Approach
- Learning in the context of hierarchical problem solving
- The goals, states, and actions are represented relationally
- Active learning: the learner can ask questions, pose problems to itself, and solve them
- Declarative prior knowledge guides and speeds up learning
4 Air Traffic Control (ATC) Task (Ackerman and Kanfer study)
5 Goal Decomposition Rules (D-rules)
- D-rules decompose goals into subgoals.
- goal: land(?plane)
- condition: plane-at(?plane, ?loc), level(L3, ?loc)
- subgoals: move(?plane, L2), move(?plane, L1), land1(?plane)
- Problems are solved by recursive decomposition of goals into subgoals (see the sketch after this list).
- Control knowledge guides the selection of appropriate decomposition rules.
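To make the recursive decomposition concrete, here is a minimal Python sketch, assuming a simple tuple encoding of literals; the names DRule, decompose, match, and substitute are illustrative and not taken from the system described here.

    # Minimal sketch: D-rules over tuple-encoded literals and recursive decomposition.
    # All names here (DRule, decompose, match, substitute) are illustrative.
    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple

    Literal = Tuple[str, ...]            # e.g. ("land", "?plane")

    @dataclass
    class DRule:
        goal: Literal                    # goal the rule decomposes
        condition: List[Literal]         # literals that must hold in the current state
        subgoals: List[Literal]          # ordered subgoals to pursue instead

    def match(pattern: Literal, fact: Literal, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Unify one pattern literal against a ground fact; variables start with '?'."""
        if len(pattern) != len(fact) or pattern[0] != fact[0]:
            return None
        binding = dict(binding)
        for p, f in zip(pattern[1:], fact[1:]):
            if p.startswith("?"):
                if binding.get(p, f) != f:
                    return None
                binding[p] = f
            elif p != f:
                return None
        return binding

    def substitute(lit: Literal, binding: Dict[str, str]) -> Literal:
        return (lit[0],) + tuple(binding.get(a, a) for a in lit[1:])

    def decompose(goal: Literal, state: List[Literal], rules: List[DRule],
                  primitives: set) -> List[Literal]:
        """Recursively decompose a goal into primitive operators using D-rules."""
        if goal[0] in primitives:
            return [goal]
        for rule in rules:
            binding = match(rule.goal, goal, {})
            if binding is None:
                continue
            # check the rule's condition against the state, extending the binding
            satisfied = True
            for cond in rule.condition:
                for fact in state:
                    b = match(cond, fact, binding)
                    if b is not None:
                        binding = b
                        break
                else:
                    satisfied = False
                    break
            if not satisfied:
                continue
            plan: List[Literal] = []
            for sg in rule.subgoals:
                plan += decompose(substitute(sg, binding), state, rules, primitives)
            return plan
        raise ValueError(f"no applicable D-rule for {goal}")

For instance, a rule with goal ("land", "?plane"), condition [("plane-at", "?plane", "?loc"), ("level", "L3", "?loc")], and subgoals [("move", "?plane", "L2"), ("move", "?plane", "L1"), ("land1", "?plane")] encodes the D-rule above.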
6 Domain theory for ATC task
- Domain Axioms
- can-land-short(?p) ← type(?p, propeller)
- can-land-short(?p) ← type(?p, DC10), wind-speed(low), runway-cond(dry)
- Primitive Operators
- jump(?cursor-from, ?cursor-to)
- short-deposit(?plane, ?runway)
- long-deposit(?plane, ?runway)
- select(?loc, ?plane)
7 Learning from Demonstration
- Input Examples
- State: at(p1, 10), type(p1, propeller), fuel(p1, 5), cursor-loc(4), free(1), free(2), ..., free(9), free(11), ..., free(15), runway-cond(wet), wind-speed(high), wind-dir(south)
- Goal: land-plane(p1)
- Solution: jump(4, 10), select(10, p1), jump(10, 14), short-deposit(p1, R2)
- Output: underlying D-rules
8 Generalizing Examples
- Examples are inductively generalized
- Mapping from examples to D-rules:
- Example goal → D-rule goal
- Initial state → Condition
- Literals in other states → Subgoals
- Least General Generalization (lgg) (see the sketch after this list)
[Figure: lgg of a new example X with the current hypothesis H]
- Problem: the size of the lgg grows exponentially with the number of examples.
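A minimal sketch of lgg over function-free literals, assuming the same tuple encoding as above; the helper names lgg_literals and lgg_clauses are illustrative.

    # Sketch of Plotkin-style least general generalization (anti-unification)
    # over function-free, tuple-encoded literals; names are illustrative.
    from typing import Dict, Set, Tuple

    Literal = Tuple[str, ...]

    def lgg_literals(a: Literal, b: Literal, table: Dict[Tuple[str, str], str]) -> Literal:
        """Anti-unify two literals with the same predicate and arity."""
        out = [a[0]]
        for x, y in zip(a[1:], b[1:]):
            if x == y:
                out.append(x)  # identical terms stay as constants
            else:
                # the same pair of differing terms always maps to the same variable
                out.append(table.setdefault((x, y), f"?v{len(table)}"))
        return tuple(out)

    def lgg_clauses(c1: Set[Literal], c2: Set[Literal]) -> Set[Literal]:
        """lgg of two clauses: pairwise lgg of every compatible literal pair.
        The result can hold up to |c1| * |c2| literals, which is why the lgg
        grows exponentially with the number of examples."""
        table: Dict[Tuple[str, str], str] = {}
        return {lgg_literals(l1, l2, table)
                for l1 in c1 for l2 in c2
                if l1[0] == l2[0] and len(l1) == len(l2)}

    # e.g. lgg of ("plane-at", "p1", "10") and ("plane-at", "p2", "12")
    # is ("plane-at", "?v0", "?v1")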
9 Learning from Queries
- Use queries to prevent the exponential growth of the lgg.
- (Reddy and Tadepalli, 1997): Non-recursive, single-predicate Horn programs are learnable from queries and examples.
- Prune each literal in the lgg and ask a membership query (a question) to confirm that the result is not over-general (see the sketch after this list).
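A minimal sketch of the pruning loop, assuming an oracle membership_query that answers whether a candidate rule is still correct (i.e., not over-general); both names are illustrative.

    # Sketch: greedily drop literals from the lgg, keeping each drop that the
    # membership oracle accepts; names are illustrative.
    from typing import Callable, List

    def prune_with_queries(clause: List, membership_query: Callable[[List], bool]) -> List:
        kept = list(clause)
        for lit in list(clause):
            if lit not in kept:
                continue
            candidate = [l for l in kept if l != lit]
            if membership_query(candidate):   # "is this more general rule still correct?"
                kept = candidate              # keep the shorter, more general clause
        return kept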
10-15 Need for queries
- [Figure sequence: a new example x is folded into the hypothesis D via the lgg; the result can overshoot the target concept and become over-general, which a membership query detects.]
16 Using Prior Knowledge
- Explanation-Based Pruning
- Remove literals that don't play a causal role in the plan, e.g., free(1), free(2), etc.
- Abstraction by Forward Chaining
- can-land-short(?p) ← type(?p, propeller)
- Helps learn a more general rule (see the sketch after this list).
- Learning subgoal order
- Subgoal literals are maintained as a sequence of sets of literals. A set is refined into a sequence of smaller sets using multiple examples.
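A minimal, ground (propositionalized) sketch of the first two uses of prior knowledge, assuming literals are tuples and domain axioms are (body, head) pairs; the function names are illustrative.

    # Sketch: explanation-based pruning and forward-chaining abstraction over
    # ground, tuple-encoded literals; names are illustrative.
    from typing import List, Set, Tuple

    Literal = Tuple[str, ...]

    def explanation_based_prune(state: Set[Literal], plan_preconditions: Set[Literal]) -> Set[Literal]:
        """Keep only literals that play a causal role in the plan,
        e.g. drop free(1), free(2), ... if no step of the plan relies on them."""
        return {lit for lit in state if lit in plan_preconditions}

    def forward_chain(facts: Set[Literal], axioms: List[Tuple[List[Literal], Literal]]) -> Set[Literal]:
        """Saturate the state with derived literals such as can-land-short(p1),
        so the learned condition can mention the more abstract literal."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for body, head in axioms:
                if head not in derived and all(b in derived for b in body):
                    derived.add(head)
                    changed = True
        return derived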
17 Learning Multiple D-Rules
- Maintain a list of D-rules for each goal.
- Combine a new example x with the first D-rule h_i for which lgg(x, h_i) is not over-general (see the sketch after this list).
- Reduce the result and replace h_i.
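A minimal sketch of the update step, assuming helpers lgg (generalization), is_over_general (for example, implemented with membership queries as above), and reduce (dropping redundant literals); all names are illustrative.

    # Sketch: fold a new example into the first compatible D-rule, else start a
    # new rule; lgg, is_over_general, and reduce are assumed helpers.
    def update_rules(rules, example, lgg, is_over_general, reduce):
        for i, h in enumerate(rules):
            g = lgg(example, h)
            if not is_over_general(g):
                rules[i] = reduce(g)     # replace h_i with the reduced generalization
                return rules
        rules.append(reduce(example))    # no compatible rule: keep the example itself
        return rules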
18 Results on learning from demonstration
19 Learning from Exercises
- Supplying solved training examples is too demanding for the teacher.
- Solving problems from scratch is computationally hard.
- A compromise solution: learning from exercises.
- Exercises are helpful intermediate subproblems that help solve the main problems.
- Solving easier subproblems makes it possible to solve more difficult problems.
20 Difficulty Levels in ATC Domain
21 Solving Exercises
- Use previously learned D-rules as operators.
- Iterative-deepening DFS to find short rules (see the sketch after this list).
- Generalization is done as before.
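A minimal sketch of the exercise solver, assuming helpers apply_op (returns the successor state, or None when the operator does not apply) and satisfies (goal test); both names are illustrative.

    # Sketch: iterative-deepening DFS over learned D-rules used as operators,
    # so short solutions, and hence short rules, are found first.
    from typing import Callable, List, Optional

    def iddfs(state, goal, operators: List, apply_op: Callable, satisfies: Callable,
              max_depth: int = 10) -> Optional[List]:
        def dfs(s, depth):
            if satisfies(s, goal):
                return []
            if depth == 0:
                return None
            for op in operators:
                s2 = apply_op(s, op)
                if s2 is None:             # operator not applicable in s
                    continue
                rest = dfs(s2, depth - 1)
                if rest is not None:
                    return [op] + rest
            return None

        for limit in range(max_depth + 1): # deepen the depth bound gradually
            plan = dfs(state, limit)
            if plan is not None:
                return plan
        return None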
22 Query Answering by Testing
- Generate test problems (InitialState, Goal) that match the D-rule.
- Use the decomposition that the D-rule suggests, and solve the problems.
- If some problem cannot be solved, the rule is over-general (see the sketch after this list).
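A minimal sketch of answering a membership query by testing, assuming a problem generator and a solver; sample_problem, decompose_with, and solvable are illustrative names.

    # Sketch: approximate a membership query empirically by sampling problems
    # that match the D-rule and checking that its decomposition solves them.
    def answer_query_by_testing(drule, sample_problem, decompose_with, solvable, n_tests=20):
        for _ in range(n_tests):
            initial_state, goal = sample_problem(drule)   # matches the rule's condition
            subproblems = decompose_with(drule, initial_state, goal)
            if not all(solvable(sp) for sp in subproblems):
                return False       # some subproblem fails: the rule is over-general
        return True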
23 Results on learning from exercises
24 Conclusions
- It is possible to learn useful problem-solving strategies in expressive representations.
- Prior knowledge can be put to good use in learning.
- Queries can be implemented approximately using heuristic techniques.
- Learning from demonstration and learning from exercises make different tradeoffs with respect to learning and reasoning.
25 Learning for Training Environments (Ron Metoyer)
- Task Training
- Sports
- Military
Boston Dynamics Inc.
Who creates the training content?
26 Research Challenges
- Learning must be on-line. Must learn quickly, since users can only give a few examples.
- Extension to more complex strategy languages that include concurrency, partial observability, real-time execution, and multiple agents, e.g., ConGolog.
- Provide a predictable model of generalization.
- Allow learning from demonstrations, reinforcement, advice, and hints, e.g., improving or learning to select between strategies.
27 Vehicle Routing / Product Delivery
28 Learning Challenges
- Very large number of states and actions
- Stochastic demands by customers and shops
- Multiple agents (trucks, truck companies, shops, distribution centers)
- Partial observability
- Hierarchical decision making
- Significant real-world impact
29 ICML Workshop on Relational Reinforcement Learning
- Paper Deadline: April 2
- Check ICML website