Title: Computational Discovery of Communicable Knowledge
1 Challenges in Learning Plan Knowledge
Pat Langley
School of Computing and Informatics, Arizona State University, Tempe, Arizona, USA
Institute for the Study of Learning and Expertise, Palo Alto, California, USA
Thanks to D. Choi, T. Konik, U. Kutur, N. Li, D.
Nau, N. Nejati, and D. Shapiro for their many
contributions. This talk reports research funded
by grants from DARPA IPTO, which is not
responsible for its contents.
2 Outline of the Talk
- Brief review of learning plan knowledge
- Learning from different sources
- Learning for new performance tasks
- Learning in different scenarios
- Learning with novel representations
- Some responses to these challenges
- Concluding remarks
3 The Problem: Learning Plan Knowledge
- Given: Basic knowledge about some action-oriented domain (e.g., state/goal representation, operators)
- Given: A set of training problems (e.g., initial states, goals, and possibly more)
- Given: Some performance task that the system must carry out
- Given: A performance mechanism that can use knowledge to carry out that task
- Learn: Knowledge that will let the system improve its ability to perform new tasks from the same or similar domain
4 Topics Not Covered
This talk will range widely, but I will not cover
issues related to:
- Learning with impoverished representations
  - Interested in human-like, intelligent behavior
  - Most work on reinforcement learning is irrelevant
- Acquiring basic knowledge about domain
  - Interested in building on such knowledge
  - Most work on learning action models is too basic
- Nonincremental learning from large data sets
  - Interested in human-like incremental learning
  - This rules out most data-mining approaches
5 Historical Topics
There has been a long history of work on learning
plan knowledge:
- Forming macro-operators
  - Fikes et al. (1972), Iba (1988), Mooney (1989), Botea et al. (2005)
- Inducing forward-chaining control rules
  - Anzai & Simon (1978), Mitchell et al. (1981), Langley (1982)
- Learning control rules analytically
  - Laird et al. (1986), Mitchell et al. (1986), Minton (1988)
- Problem solving by analogy
  - Veloso (1994), Jones & Langley (1995), VanLehn & Jones (1994)
- Inducing control rules for partial-order plans
  - Katukam & Kambhampati (1994), Estlin & Mooney (1997)
6 Historical Trends
Work on learning plan knowledge has seen many
shifts in fashion:
- Early hope for improving problem solvers/planners (1978-1985)
- Excitement/confusion introduced by EBL movement (1986-1992)
- Some doubts raised by the utility problem (1988-1993)
- Mass migration to reinforcement learning paradigm (1993-2003)
- Resurgence of interest in learning plan knowledge (2004-present)
Throughout these changes, the problems and
potential of learning plan knowledge have
remained.
7 Traditional Sources of Information
Most research on learning for planning has
assumed the system uses search to generate:
- Successful paths that achieve the goals (positive instances)
- Failed paths that do not achieve the goals (negative instances)
- Alternative paths of different desirability (preferred instances)
But humans learn from other sources of
information, and our AI systems should as well.
8 Challenge: Learn from Many Sources
There has been relatively little research on plan
learning from:
- Demonstrations of solved problems (Nejati et al., 2006)
- Explicit instruction from teacher (Blythe et al., 2007)
- Advice or hints from teacher (Mostow, 1983)
- Mental simulations or daydreaming (Mueller, 1985)
- Undesirable side effects during execution
Humans learn from all of these sources, and our
learning systems should support the same
capabilities. Moreover, we should develop
single systems that integrate plan knowledge
learned from all of them (Oblinger, 2006).
9 Traditional Performance Tasks
Most research on learning for planning has
assumed the system aims to improve:
- The efficiency of plan generation (nodes expanded, time)
- The quality of generated plans (path length, utility)
- The coverage of plan knowledge (problems solved)
But humans learn and use plan knowledge for
other purposes that are just as valid.
10 Challenge: Learn for Plan Execution
Many important domains require executing plan
knowledge in some environment that includes:
- operators with likely but nonguaranteed effects
- external events not directly under the agent's control
- other agents that are pursuing their own goals
Urban driving is one setting that raises all
three of these issues. Complex board games like
chess, although deterministic, still require
interleaving of planning and execution. We need
more research on plan learning in contexts of
this sort (e.g., Benson, 1995; Fern et al.,
2004).
11 Challenge: Learn for Plan Understanding
Another understudied problem is learning for plan
understanding.
- Given: A partially observed sequence of states influenced by another agent's actions
- Given: Learned knowledge about how to achieve goals
- Find: The other agent's goals and the plans it is pursuing to achieve them
Plan understanding is important not only in
complex games, but in military planning,
politics, and other settings. This performance
task suggests new learning problems, methods, and
evaluation criteria.
12 Traditional Learning Scenarios
Most research on learning for planning has
assumed the system:
- Trains on problems from a given distribution / domain
- Tests on problems from the same distribution / domain
Success depends on the extent to which the
learner generalizes well to new problems from the
same domain. But humans also use their learned
plan knowledge in other, more flexible ways to
improve performance.
13 Challenge: Cumulative Learning
In complex domains, humans learn plan knowledge
gradually:
- Starting with small, relatively easy problems
- Moving to complex problems after mastering simpler ones
Later acquisitions build naturally on earlier
experience, leading to cumulative learning.
Our education system depends heavily on such
vertical transfer of learned knowledge. We
need more learning systems that demonstrate this
form of cumulative improvement (e.g., Reddy &
Tadepalli, 1997).
14 Challenge: Cross-Domain Transfer
In other cases, humans exhibit a form of transfer
that involves
- Learning to solve problems in one domain
- Reusing this knowledge to solve problems in
another domain that is superficially quite
different
Such cross-domain transfer is related to
within-domain analogical reasoning, but it is far
more challenging. In its extreme form, the two
domains support similar solutions but have no
shared symbols or predicates. We need more
learning systems that demonstrate this radical
form of knowledge reuse.
15 Traditional Learned Representations
Most research on learning for planning has
focused on learning:
- Control rules that reduce effective branching factor
- Macro-operators that reduce effective solution depth
These grew naturally from representations used to
create hand-crafted expert problem solvers. But
now we have other representations of plan
knowledge that suggest new learning tasks and
methods. By this I do not mean POMDPs,
workflows, or other highly constrained
formalisms.
16 Challenge: Learn HTNs
Hierarchical task networks (HTNs) offer the most
effective planning available, but they are
expensive to build manually. HTNs provide an
ideal target for learning because they have:
- the modularity and flexibility of search-control rules
- the large-scale structure of macro-operators
Machine learning has automated the creation of
expert classifiers. We should do the same for
HTNs, which are effectively expert planning
systems.
17 Challenge: Learn HTNs
We can define the task of learning hierarchical
task networks as:
- Given: Basic knowledge about some action-oriented domain
- Given: A set of training problems (initial states and goals)
- Given: Some performance task the system must carry out
- Given: Some module that uses HTNs to perform this task
- Learn: An HTN that lets the system improve its performance on new tasks from the same or similar domain
We need more research on this important topic
(e.g., Reddy & Tadepalli, 1997; Ilghami et al.,
2005).
18 Some Responses
Our recent research attempts to respond to these
challenges by developing methods that:
- acquire a constrained but important class of HTNs
- that one can use for both planning and reactive control
- from both successful problem solving and expert traces
- that extend naturally to support cross-domain transfer
Moreover, these ideas are embedded in an
integrated architecture that supports many
capabilities: ICARUS (Langley, 2006).
19 Conceptual Knowledge in ICARUS
Nonprimitive concept: (patient-form-filled ?patient)
Primitive concept: (assigned-mission ?patient ?mission)
- Conceptual knowledge is cast as Horn clauses that specify relevant relations in the environment
- Memory is organized hierarchically
- Divided into primitive and non-primitive predicates
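As a rough illustration of concepts cast as Horn clauses, the sketch below forward-chains from primitive beliefs to nonprimitive ones. It is not ICARUS code: the clause bodies, the `ready-to-travel` concept, and every function name are hypothetical, chosen only to show how a hierarchy of concept definitions can be matched against a belief state.

```python
def is_var(term):
    """Variables are strings that start with '?', as in the slides."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Match one literal against one ground fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result

def match_all(body, facts, bindings):
    """Yield every binding set that satisfies all literals in a clause body."""
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from match_all(body[1:], facts, extended)

def infer(clauses, primitive_beliefs):
    """Apply the clauses bottom-up until no new beliefs appear."""
    beliefs = set(primitive_beliefs)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            for b in list(match_all(body, beliefs, {})):
                new = tuple(b.get(t, t) for t in head)
                if new not in beliefs:
                    beliefs.add(new)
                    changed = True
    return beliefs

# Hypothetical definitions: the slide names the concepts but not their bodies.
clauses = [
    (("patient-form-filled", "?p"),
     [("assigned-mission", "?p", "?m"), ("arrival-time", "?p")]),
    (("ready-to-travel", "?p"), [("patient-form-filled", "?p")]),
]
beliefs = infer(clauses, {("assigned-mission", "P1", "M1"), ("arrival-time", "P1")})
```

The second clause refers to the first, which is the sense in which the concept memory here is hierarchical.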
20 HTN Methods in ICARUS
[Figure labels: HTN goal concept, HTN method, precondition concept, subgoal, operator]
- Similar to SHOP2 but methods indexed by goals they achieve
- Each method decomposes a goal into subgoals
- If a method's goal is active and its precondition is satisfied, then try to achieve its subgoals or apply its operators
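The selection rule in the last bullet can be sketched as follows. This is a simplified, ground (variable-free) illustration and not the ICARUS interpreter: the `Method` class, the `next_action` function, and the three example methods are assumptions made for the sketch, with literals borrowed from the patient-transfer example elsewhere in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Method:
    goal: tuple                                    # concept the method achieves
    precondition: tuple                            # concept that must already hold
    subgoals: list = field(default_factory=list)   # ordered subgoals, or
    operator: tuple = None                         # a directly executable operator

def next_action(methods, goal, beliefs):
    """Return the next operator to execute toward goal, or None."""
    if goal in beliefs:
        return None                                # goal already satisfied
    for m in methods:
        # Methods are indexed by the goal they achieve.
        if m.goal != goal or m.precondition not in beliefs:
            continue
        if m.operator is not None:
            return m.operator
        for sub in m.subgoals:
            if sub not in beliefs:
                action = next_action(methods, sub, beliefs)
                if action is not None:
                    return action
                break                              # stuck on this subgoal; try another method
    return None

# Hypothetical methods for illustration only.
methods = [
    Method(goal=("location", "patient1", "SFO", "1pm"),
           precondition=("dest-airport", "patient1", "SFO"),
           subgoals=[("assigned", "patient1", "NW32"),
                     ("arrival-time", "NW32", "1pm")]),
    Method(goal=("assigned", "patient1", "NW32"),
           precondition=("scheduled", "NW32"),
           operator=("assign", "patient1", "NW32")),
    Method(goal=("arrival-time", "NW32", "1pm"),
           precondition=("flight-available",),
           operator=("query-arrival-time",)),
]
beliefs = {("dest-airport", "patient1", "SFO"), ("scheduled", "NW32"),
           ("flight-available",)}
first = next_action(methods, ("location", "patient1", "SFO", "1pm"), beliefs)
second = next_action(methods, ("location", "patient1", "SFO", "1pm"),
                     beliefs | {("assigned", "patient1", "NW32")})
```

Calling `next_action` again after each belief update gives reactive behavior: the choice is recomputed from the current beliefs rather than read from a stored plan.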
21 Operators in ICARUS
Action: (get-arrival-time ?patient ?from ?to)
Effects concept: (arrival-time ?patient)
Precondition concept: (patient ?p) and (travel-from ?p ?from) and (travel-to ?p ?to)
- Operators describe low-level actions that agents can execute directly in the environment
- Preconditions: legal conditions for action execution
- Effects: expected changes when action is executed
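A minimal sketch of how such an operator might be represented and applied is given below. The dictionary layout, the function names, and the constants `P2`, `LAX`, and `SFO` in the bindings are assumptions for illustration; the sketch also uses a single variable `?patient` throughout, whereas the slide writes `?p` in the precondition.

```python
def substitute(literal, bindings):
    """Replace variables in a literal with their bound values."""
    return tuple(bindings.get(term, term) for term in literal)

def apply_operator(op, bindings, beliefs):
    """Return the expected belief state after execution, or None if not legal."""
    preconditions = [substitute(p, bindings) for p in op["preconditions"]]
    if not all(p in beliefs for p in preconditions):
        return None
    # Effects are expected changes; a real environment may not deliver them.
    return set(beliefs) | {substitute(e, bindings) for e in op["effects"]}

get_arrival_time = {
    "action": ("get-arrival-time", "?patient", "?from", "?to"),
    "preconditions": [("patient", "?patient"),
                      ("travel-from", "?patient", "?from"),
                      ("travel-to", "?patient", "?to")],
    "effects": [("arrival-time", "?patient")],
}
bindings = {"?patient": "P2", "?from": "LAX", "?to": "SFO"}
before = {("patient", "P2"), ("travel-from", "P2", "LAX"), ("travel-to", "P2", "SFO")}
after = apply_operator(get_arrival_time, bindings, before)
```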
22 Training Input: Expert Traces and Goals
Operator instance: (get-arrival-time P2)
Goal concept: (all-patients-arranged)
State
Concept instance: (assigned-flight P1 M1)
- Expert demonstration traces
- Operators the expert uses and the resulting belief state
- State: Set of concept instances
- Goal is a concept instance in the final state
- ICARUS learns generalized skills that achieve similar goals
23 Learning Plan Knowledge from Demonstration
[Figure: diagram with components labeled Problem (initial state, goal), Reactive Executor, Plan Knowledge (HTNs), If Impasse, Learned plan knowledge, Demonstration Traces (states and actions) from Expert, and Background knowledge (operators, concept definitions)]
24 Learning HTNs by Trace Analysis
[Figure: trace of concepts and actions]
25 Learning HTNs by Trace Analysis
Operator Chaining
26 Learning HTNs by Trace Analysis
Concept Chaining
[Figure: trace of concepts and actions]
27 Explanation Structure for Trace
(transfer-hospital patient1 hospital2)
(arrange-ground-transportation SFO hospital2 1pm)
Time3
(location patient1 SFO 1pm)
(close-airport hospital2 SFO)
(assigned patient1 NW32)
(arrival-time NW32 1pm)
(dest-airport patient1 SFO)
(query-arrival-time)
(assign patient1 NW32)
Time1
Time2
(scheduled NW32)
(flight-available)
28 Hierarchical Task Network Structure
(transfer-hospital ?patient ?hospital)
(close-airport ?hospital ?loc)
(arrange-ground-transportation ?loc ?hospital ?time)
(location ?patient ?loc ?time)
(assigned ?patient ?flight)
(arrival-time ?flight ?time)
(dest-airport ?patient ?loc)
(scheduled ?flight)
(flight-available)
(query-arrival-time)
(assign ?patient ?flight)
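Comparing this structure with the ground explanation on the preceding slide, the visible difference is that each constant has been replaced consistently by a variable. The sketch below reproduces only that replacement step; the constant-to-variable table is read off the two slides, the function name is hypothetical, and nothing here should be taken as the procedure ICARUS itself uses to generalize.

```python
def variablize(literals, var_for):
    """Replace constant arguments by variables; predicate names are left untouched."""
    return [(lit[0],) + tuple(var_for.get(arg, arg) for arg in lit[1:])
            for lit in literals]

# Ground literals from the explanation structure for the trace.
ground = [
    ("transfer-hospital", "patient1", "hospital2"),
    ("close-airport", "hospital2", "SFO"),
    ("arrange-ground-transportation", "SFO", "hospital2", "1pm"),
    ("location", "patient1", "SFO", "1pm"),
    ("assigned", "patient1", "NW32"),
    ("arrival-time", "NW32", "1pm"),
    ("dest-airport", "patient1", "SFO"),
    ("scheduled", "NW32"),
    ("flight-available",),
]
# One variable per constant, so shared constants become shared variables.
var_for = {"patient1": "?patient", "hospital2": "?hospital", "SFO": "?loc",
           "1pm": "?time", "NW32": "?flight"}
general = variablize(ground, var_for)
```

Because the same table is applied to every literal, co-reference is preserved: `?flight` links `(assigned ?patient ?flight)` to `(arrival-time ?flight ?time)` just as `NW32` did in the trace.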
29 Transfer by Representation Mapping
[Figure: source domain and target domain, each with concepts and actions, linked by predicate mappings]
30 Challenge: Learn with Richer Goals
HTNs are more expressive than classical plans
(Erol et al., 1994). Our approach loses this
advantage because it assumes the head of each
method is a goal it achieves, but we can:
- Extend goal concepts to describe temporal behavior
- Revise the execution module to handle these structures
- Augment trace analysis to reason about temporal goals
- Learn new methods with temporal goals in their heads
This scheme should acquire the full class of HTNs
while still retaining the tractability of
goal-directed learning.
31 Challenge: Extend Conceptual Vocabularies
Our approach to learning HTNs relies on the
concept hierarchy used to explain solution
traces. The method would depend less on the
initial hierarchy if it could extend it:
- Given: A set of concepts used in goals, states, and methods
- Given: New methods acquired from sample solution traces
- Find: New concepts that produce improved performance as the result of future method learning
This would support a bootstrapped learner that
invents predicates to describe states, goals, and
methods.
32 Challenge: Extend Conceptual Vocabularies
Our approach to utilizing predicate invention has
three steps:
- Define a new concept for the precondition of each method learned by chaining off a concept definition.
- Check traces for states in which this concept becomes true and learn methods to achieve it.
- During performance, treat each method's precondition as its first subgoal, which it can achieve if submethods are known.
This technique would make an HTN more complete by
growing it downward, introducing nonterminal
symbols as necessary. We have partially
implemented this scheme and hope to report
results at the next meeting.
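The third step can be illustrated with a small ground sketch in which an unmet precondition is pursued like any other subgoal. The talk describes the scheme as only partially implemented, so this is an illustration of the idea and not the authors' code; the dictionary layout, the `schedule` operator, and the cycle guard are assumptions.

```python
def next_action(methods, goal, beliefs, active=()):
    """Pick the next operator, treating each precondition as the first subgoal."""
    if goal in beliefs or goal in active:
        return None                      # satisfied already, or a cyclic request
    for m in methods:
        if m["goal"] != goal:
            continue
        for sub in [m["precondition"]] + m["subgoals"]:
            if sub not in beliefs:
                action = next_action(methods, sub, beliefs, active + (goal,))
                if action is not None:
                    return action
                break                    # no submethod achieves it; try the next method
        else:
            # Precondition and all subgoals hold, so the operator can be applied.
            if m["operator"] is not None:
                return m["operator"]
    return None

# Hypothetical methods: the second one achieves the first one's precondition.
methods = [
    {"goal": ("assigned", "patient1", "NW32"),
     "precondition": ("scheduled", "NW32"),
     "subgoals": [], "operator": ("assign", "patient1", "NW32")},
    {"goal": ("scheduled", "NW32"),
     "precondition": ("flight-available",),
     "subgoals": [], "operator": ("schedule", "NW32")},
]
goal = ("assigned", "patient1", "NW32")
step1 = next_action(methods, goal, {("flight-available",)})
step2 = next_action(methods, goal, {("flight-available",), ("scheduled", "NW32")})
```

Without the second method, the first would simply be inapplicable whenever `(scheduled NW32)` is false; with it, the hierarchy has grown one level downward.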
33 Concluding Remarks: Research Style
Clearly, there remain many open problems to
address in learning plan knowledge. These
involve new abilities, not improvements on
existing ones, which suggests that we:
- Look at human behavior for ideas on how to proceed
- Develop integrated systems rather than component algorithms
- Demonstrate their behavior on challenging domains
These strategies will help us extend the reach of
our learning systems, not just strengthen their
grasp.
34 Concluding Remarks: Evaluation
We must evaluate our new plan learners, but this
does not mean:
- Measuring their speed in generating plans
- Showing they run faster than existing systems
- Entering them in planning competitions
More appropriate experiments would revolve
around:
- Demonstrating entirely new functionalities
- Running lesion studies to show new features are required
- Using performance measures appropriate to the task
These steps will produce conceptual advances and
scientific understanding far more than will
mindless bake-offs.
35 Concluding Remarks: Summary
Learning plan knowledge is a key area with many
open problems
- Learning from traces, advice, and other sources
- Transferring knowledge within and across domains
- Learning and extending rich structures like HTNs
These challenges will benefit from earlier work
on plan learning, but they also require new
ideas. Together, they should lead us toward
learning systems that rival humans in their
flexibility and power.
36 End of Presentation
37 ICARUS Concepts for In-City Driving
((in-rightmost-lane ?self ?clane)
 :percepts  ((self ?self) (segment ?seg) (line ?clane segment ?seg))
 :relations ((driving-well-in-segment ?self ?seg ?clane)
             (last-lane ?clane)
             (not (lane-to-right ?clane ?anylane))))

((driving-well-in-segment ?self ?seg ?lane)
 :percepts  ((self ?self) (segment ?seg) (line ?lane segment ?seg))
 :relations ((in-segment ?self ?seg)
             (in-lane ?self ?lane)
             (aligned-with-lane-in-segment ?self ?seg ?lane)
             (centered-in-lane ?self ?seg ?lane)
             (steering-wheel-straight ?self)))

((in-lane ?self ?lane)
 :percepts  ((self ?self segment ?seg) (line ?lane segment ?seg dist ?dist))
 :tests     ((gt ?dist -10) (lt ?dist 0)))
38 Representing Short-Term Beliefs/Goals
(current-street me A)
(current-segment me g550)
(lane-to-right g599 g601)
(first-lane g599)
(last-lane g599)
(last-lane g601)
(at-speed-for-u-turn me)
(slow-for-right-turn me)
(steering-wheel-not-straight me)
(centered-in-lane me g550 g599)
(in-lane me g599)
(in-segment me g550)
(on-right-side-in-segment me)
(intersection-behind g550 g522)
(building-on-left g288)
(building-on-left g425)
(building-on-left g427)
(building-on-left g429)
(building-on-left g431)
(building-on-left g433)
(building-on-right g287)
(building-on-right g279)
(increasing-direction me)
(buildings-on-right g287 g279)
39 ICARUS Skills for In-City Driving
((in-rightmost-lane ?self ?line)
 :percepts ((self ?self) (line ?line))
 :start    ((last-lane ?line))
 :subgoals ((driving-well-in-segment ?self ?seg ?line)))

((driving-well-in-segment ?self ?seg ?line)
 :percepts ((segment ?seg) (line ?line) (self ?self))
 :start    ((steering-wheel-straight ?self))
 :subgoals ((in-segment ?self ?seg)
            (centered-in-lane ?self ?seg ?line)
            (aligned-with-lane-in-segment ?self ?seg ?line)
            (steering-wheel-straight ?self)))

((in-segment ?self ?endsg)
 :percepts ((self ?self speed ?speed)
            (intersection ?int cross ?cross)
            (segment ?endsg street ?cross angle ?angle))
 :start    ((in-intersection-for-right-turn ?self ?int))
 :actions  ((?steer 1)))
40 ICARUS Interleaves Execution and Problem Solving
[Figure: flowchart. A problem enters reactive execution, which draws on the skill hierarchy. If there is no impasse, primitive skills produce the executed plan; if there is an impasse, control passes to problem solving.]
This organization reflects the psychological
distinction between automatized and controlled
behavior.
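The control loop this slide describes can be sketched roughly as below: try a stored skill first, and fall back to problem solving only on an impasse. All names and the toy domain are hypothetical, beliefs are simplified to plain strings, and the problem solver is a bare backward-chaining stand-in and not the ICARUS module.

```python
def reactive_step(goal, state, skills):
    """Return the action of a stored skill whose start condition holds, else None."""
    for start, action in skills.get(goal, []):
        if start <= state:
            return action
    return None

def problem_solve(goal, state, operators, depth=5):
    """Chain backward from the goal to an operator that is executable now."""
    if depth == 0:
        return None
    for name, (pre, eff) in operators.items():
        if goal not in eff:
            continue
        unmet = [p for p in sorted(pre) if p not in state]
        if not unmet:
            return name
        step = problem_solve(unmet[0], state, operators, depth - 1)
        if step is not None:
            return step
    return None

def run(goal, state, skills, operators, max_cycles=10):
    """Interleave reactive execution with problem solving on impasses."""
    state, log = set(state), []
    for _ in range(max_cycles):
        if goal in state:
            break
        action, source = reactive_step(goal, state, skills), "skill"
        if action is None:                         # impasse: no skill applies
            action, source = problem_solve(goal, state, operators), "solver"
        if action is None:
            break
        state |= operators[action][1]              # assume effects occur as expected
        log.append((action, source))
    return log

# Toy domain: operator name -> (preconditions, effects).
operators = {
    "assign": ({"scheduled"}, {"assigned"}),
    "query": ({"assigned"}, {"arrival-known"}),
}
# One stored skill: achieve arrival-known by querying once a flight is assigned.
skills = {"arrival-known": [({"assigned"}, "query")]}
log = run("arrival-known", {"scheduled"}, skills, operators)
```

In this run the first cycle hits an impasse and is handled by the solver, while the second is handled by the stored skill.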