Title: Experience-Based Identification
1 Experience-Based Identification and Control
via Higher-Level Reinforcement Learning
- George G. Lendaris
- NW Computational Intelligence Laboratory
- Portland State University, Portland, OR
- Supported by NSF ECS-0301022
2- Ideas Example (assume an experienced car driver)
- I. Car attributes:
- 1) driving own car
- 2) driving a friend's car.
- II. Environment: clear afternoon with
- 1) dry pavement
- 2) icy pavement.
- III. Performance criteria:
- 1) Road race: minimize time.
- 2) Elderly relative on excursion: maximize comfort.
- Use the same base set of driving skills, but when changing from 1 to 2, make adjustments to the control law and/or decision logic, drawn from a collection acquired via EXPERIENCE.
- CONTEXT comprises I, II, III.
NWCIL WCCI - IJCNN2006 2
3 CONTEXT: We formulate context as comprising three components: A) Plant, B) Environment, and C) Objectives (characterized via performance criteria, labeled CF). Specification of all three yields a specific context. To each specific context there corresponds a particular control law; a change in any of the components results in a different context.
[Figure: each CONTEXT corresponds to an entry in the CONTROL LAW REPOSITORY (EXPERIENCE).]
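The correspondence between contexts and control laws can be sketched as a simple lookup. This is only an illustrative sketch: the `Context` fields, the string labels, the proportional-gain control laws, and the gain values are all assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Context:
    """A context is fully specified by its three components."""
    plant: str        # A) Plant, e.g. "own_car" (hypothetical label)
    environment: str  # B) Environment, e.g. "dry_pavement" (hypothetical label)
    objective: str    # C) Objective / criterion function, e.g. "min_time"


# A control law is modeled here as a map from tracking error to command.
ControlLaw = Callable[[float], float]


def make_gain_law(gain: float) -> ControlLaw:
    """Toy proportional control law (assumed form, for illustration only)."""
    return lambda error: gain * error


# Experience repository: one control law per context (gains are made up).
repository: Dict[Context, ControlLaw] = {
    Context("own_car", "dry_pavement", "min_time"): make_gain_law(2.0),
    Context("own_car", "icy_pavement", "min_time"): make_gain_law(0.5),
    Context("own_car", "dry_pavement", "max_comfort"): make_gain_law(0.8),
}


def select_control_law(ctx: Context) -> ControlLaw:
    """Changing any one component yields a different context and hence a different law."""
    return repository[ctx]


dry = Context("own_car", "dry_pavement", "min_time")
icy = Context("own_car", "icy_pavement", "min_time")
print(select_control_law(dry)(1.0))  # 2.0
print(select_control_law(icy)(1.0))  # 0.5
```

An exact-match dictionary only covers contexts already seen; the manifold formulation later in the talk addresses nearness and generalization between stored contexts.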
4- CONTEXT is fundamental to the approach, so we performed a historical overview of the control field vis-à-vis the explicit role that context has (or has not) played in the various formulations and approaches.
- Phase 1: DESIGN BASED on INTUITION and INVENTION.
- Phase 2: DESIGN BASED on MATHEMATICAL TOOLS.
- Phase 3: DESIGN for CONTEXT DEPENDENCE.
- e.g., Adaptive Control and Learning Control
- accommodates a modicum of variation in context via on-line parameter adjustments
- Phase 4: (next slide)
5 Phase 4 (new): DESIGN for EXPERIENCE-BASED PROCESSES, including AUTONOMOUS CONTEXT DISCERNMENT and MODEL SELECTION.
Stipulated requirements for this phase: the Agent has the ability a) to use experience for model selection (plant or controller), and b) to do so effectively and efficiently.
Fundamental aspects to consider: 1) context, 2) discerning the current context, 3) selecting the appropriate model from the experience repository for the discerned context, and 4) doing the latter two in an effective and efficient manner. (Aspects 3, 4, and potentially 2 entail a memory property.)
6- KEY IDEA of HLLA (Higher-Level Learning Algorithm)
- Re-purpose the Reinforcement Learning method (to a higher level) such that:
- instead of using it to design an optimal controller for a given task,
- an already achieved collection of such solutions for a variety of related contexts is provided (as an experience repository), and
- HLLA creates a strategy for optimally selecting a solution from the repository.
7 Conceptual layout of EB Control process
[Figure: flowchart. Blocks: Starting Condition; Context Monitoring by Agent (context awareness); CONTROLLER; PLANT; Criterion Function Assessment (All OK / Off Nominal); Agent Performs SID (EB) (context discernment); EB-UPDATED PLANT MODEL; Agent Performs Controller SELECTION (EB); EB-UPDATED CONTROLLER MODEL; Run Simulation; Install Updated Controller Design.]
8- Populating the Repository
- In practice, one might build the repository piece by piece:
- via available design tools (e.g., Phase 2 tools), generate controllers for a given application, and
- collect them into a repository, along with a list of attributes (parameters) that can be used as an index to facilitate their selection, BUT
9- Populating the Repository (continued)
- it is difficult to define the list of attributes to serve as useful indexing mechanisms, and
- to come up with a useful parameterization of the context for the given task.
- We note that the choice of representations and associated mappings directly influences subsequent
- - efficiency of access,
- - notion of nearness, and
- - notion of generalization.
10- Populating the Repository (continued)
- For research purposes, start with a synthetic method:
- employ an analytic equation, neural network,
- their parameters provide built-in indexing mechanisms
- e.g., if the plant is known to be linear, employ, say, a fifth-order transfer function, with the function's coefficients used to define the indexing
- e.g., if known to be second-order linear, then use a second-order transfer function
- → access via 2-dim. vs. 5-dim. space
- → efficiency and generalization implications
11- MANIFOLDS
- Important to endow the index with
- the property of being searchable, and
- an operational notion of nearness.
- The mathematical construct of manifolds (from geometric topology) provides a useful formalism for this application:
- a set of elements, S, and
- a coordinate system in R^n
- (a one-to-one mapping from S to R^n that specifies each element in S via a vector of n real numbers, a.k.a. the coordinates of the element). R^n is searchable; nearness is Euclidean.
- (The terms index and coordinates are here synonymous.)
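A minimal sketch of what "searchable, with Euclidean nearness" can mean in practice: each repository element is stored under its coordinate vector, and a query returns the nearest stored element. The four coordinate vectors and the brute-force search are assumptions for illustration; the slides do not specify a search method.

```python
import numpy as np

# Hypothetical repository index: one row of coordinates in R^2 per stored model.
coordinates = np.array([
    [0.5, 0.1],
    [1.0, 0.1],
    [0.5, 0.3],
    [1.0, 0.3],
])


def nearest_model(query):
    """Return (row index, distance) of the stored element nearest to `query`."""
    diffs = coordinates - np.asarray(query, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)  # Euclidean nearness
    i = int(np.argmin(distances))
    return i, float(distances[i])


idx, dist = nearest_model([0.9, 0.28])
print(idx, coordinates[idx], round(dist, 4))  # 3 [1.  0.3] 0.102
```

Brute-force search costs one distance per stored element; the dimension of the coordinate system (e.g., 2 versus 5) directly affects both that cost and how well "near" coordinates generalize.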
12 MAPPINGS BETWEEN CONTEXT SPACE AND VARIOUS MANIFOLDS (? repository set part)
In general, how does one craft an appropriate mapping from the full Context Space (whatever form of representation is employed) to the coordinate system of the control manifold?
The strategy is to employ HLLA to learn the mappings.
13 EBSID Example: Neural Network as Plant. The goal of this SID process was to train a NN (called CDN) to select a NN from a neural manifold (plant model repository) to match the behavior of an observed NN (the plant/system).
14 EBSID Example: Pole-Cart Plant. This benchmark system was used to further develop the ideas. The repository was populated synthetically via a parameterized set of equations for the pole-cart plant. The parameters explicitly included the pole length and mass, which were varied during the experiment. The task was to discern changes in context (pole length, pole mass) when they occurred. After the CDN is trained, it functions to adjust the manifold coordinate values (pole length and mass). These parameter values instantiate a plant model in the repository. Once the CDN issues null adjustments, the appropriate model has been selected (specified via the current values of CD) and is to be used by the Agent to select a corresponding controller.
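The adjust-until-null loop described above can be sketched as follows. This is not the trained CDN from the talk: a simple pattern search stands in for it, purely to show the loop structure. The cart-pole equations are the commonly used textbook form (with `length` as the half pole length), and the cart mass, time step, sample states, and "true" parameter values are assumptions made for the example.

```python
import numpy as np

G, CART_MASS, DT = 9.8, 1.0, 0.02  # assumed constants


def step(state, force, length, mass):
    """One Euler step of the standard cart-pole equations."""
    x, x_dot, th, th_dot = state
    total = CART_MASS + mass
    tmp = (force + mass * length * th_dot ** 2 * np.sin(th)) / total
    th_acc = (G * np.sin(th) - np.cos(th) * tmp) / (
        length * (4.0 / 3.0 - mass * np.cos(th) ** 2 / total))
    x_acc = tmp - mass * length * th_acc * np.cos(th) / total
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     th + DT * th_dot, th_dot + DT * th_acc])


# Observations from the "actual" plant (hypothetical true length and mass).
rng = np.random.default_rng(0)
states = rng.uniform(-0.5, 0.5, size=(20, 4))
forces = rng.uniform(-10.0, 10.0, size=20)
TRUE_COORDS = (0.7, 0.2)
observed = np.array([step(s, f, *TRUE_COORDS) for s, f in zip(states, forces)])


def model_error(coords):
    """Squared state error between the model at `coords` and the observed plant."""
    predicted = np.array([step(s, f, *coords) for s, f in zip(states, forces)])
    return float(np.sum((predicted - observed) ** 2))


def discern(coords, step_size=0.1, tol=1e-4):
    """Stand-in for the CDN: adjust (length, mass) until no adjustment helps."""
    coords = np.array(coords, dtype=float)
    best = model_error(coords)
    while step_size > tol:
        moved = False
        for axis in range(coords.size):
            for sign in (1.0, -1.0):
                trial = coords.copy()
                trial[axis] += sign * step_size
                if trial.min() <= 0.01:  # keep length and mass positive
                    continue
                err = model_error(trial)
                if err < best:
                    coords, best, moved = trial, err, True
        if not moved:  # "null adjustment" at this step size: refine
            step_size /= 2.0
    return coords, best


start = (0.5, 0.1)
selected, final_error = discern(start)
print("selected (length, mass):", selected, "error:", final_error)
```

The search stops when no trial adjustment improves the match, which plays the role of the CDN issuing null adjustments; a trained CDN would presumably need far fewer model evaluations than this exhaustive probing.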
15 Demonstration of Context Discernment in response to a change in Pole-Cart parameter values (context change) at every 50th iteration.
[Figure: errors between state variable values for the pole-cart system and for the models selected during the discernment process.]
16 Extension of Example 3: first foray into mixed representations. Use NN models of the pole-cart plant in the repository instead of equations. A NN with 174 weights was trained to emulate the plant for a given length/mass combination. Using this trained NN as a starting point, eight new NNs were trained for different mass/length (M/L) combinations. The weights of the resulting NNs were analyzed for changes from the original (base) case. A sensitivity-variance metric was crafted and used to select 22 weights. A new NN with 152 weights frozen to the base-case design and with 22 adjustable weights was then trained for the same eight cases, yielding models all within tolerance.
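The weight-selection step can be illustrated with a small sketch. The slides do not define the sensitivity-variance metric, so the score used here (mean squared change of each weight from its base value across the retrained networks) is an assumption, and the weight data are synthetic stand-ins rather than trained pole-cart networks.

```python
import numpy as np

N_WEIGHTS, N_CASES, N_ADJUSTABLE = 174, 8, 22

rng = np.random.default_rng(0)
base = rng.normal(size=N_WEIGHTS)  # stand-in for the base-case NN weights

# Synthetic stand-in for the eight retrained NNs: most weights barely move,
# while a hidden subset of 22 weights changes substantially between cases.
truly_sensitive = rng.choice(N_WEIGHTS, size=N_ADJUSTABLE, replace=False)
retrained = base + rng.normal(scale=0.01, size=(N_CASES, N_WEIGHTS))
retrained[:, truly_sensitive] += rng.normal(size=(N_CASES, N_ADJUSTABLE))


def select_adjustable(base_weights, retrained_weights, k):
    """Pick the k weights whose values change most from the base case.

    Assumed metric: mean squared deviation from base across cases.
    """
    deviation = retrained_weights - base_weights
    score = np.mean(deviation ** 2, axis=0)
    return np.sort(np.argsort(score)[-k:])


adjustable = select_adjustable(base, retrained, N_ADJUSTABLE)
frozen_mask = np.ones(N_WEIGHTS, dtype=bool)
frozen_mask[adjustable] = False  # True = frozen to the base-case value

print(adjustable.size, int(frozen_mask.sum()))  # 22 152
```

The 22 selected indices then serve as the adjustable coordinates, with the remaining 152 weights held at their base-case values.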
17- Extension of Example 3: first foray into mixed representations (continued)
- These NNs now populate the repository, with the 22 adjustable weights as its coordinates.
- Even though the pole mass (M) and length (L) are the two context parameters that are changing, the representation available to the Agent comprises 22 parameters.
- This is a more difficult task than the previous example, as the model plant and the actual plant have different forms of representation.
- The HLLA is to design a CDN that selects the appropriate NN for a given M/L combination, and may have to develop mappings between the two sets of coordinates.
- NOT ACCOMPLISHED YET, as we moved on to the next task.
18 To move beyond working with just simulations, we have begun the initial stages of implementing a robot with Context Discerning and Controller Selecting capability at NWCIL.
A Sony AIBO robotic dog has been modified into a research platform. Its task is to learn to discern changes in walking surface type and adjust its gait accordingly.
19- AIBO experiments to date
- 1. Constructed five different surface types (4 long): hardwood, thin foam, thin carpet, thick shag carpet, reversed shag carpet.
- 2. Used a genetic algorithm to develop a good AIBO gait for each of the five surfaces.
- Note: each resulting gait yields a better performance measure on its respective surface than does the default gait; the differences are visually discernible.
- 3. AIBO made test walks on the five surfaces for each gait, and data streams from 17 joint-actuator sensors were recorded.
- 4. An AR (autoregressive) model was computed/stored for each of the 25 sets of sensory experiences.
20- AIBO experiments (cont.)
- 5. Now, when walking on a surface, AIBO discerns the surface type (CONTEXT) by processing its current kinesthetic experience through the models in its repository, and selects the most similar one.
- 6. It then selects the gait corresponding to this surface, and adjusts its walk accordingly.
- 7. Show video
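Steps 4 and 5 can be sketched with a least-squares AR fit per stored experience, followed by selection of the stored model that best predicts the current stream. This is a simplified illustration under stated assumptions: a single synthetic sensor channel instead of 17, an assumed AR order of 4, made-up signal frequencies per surface, and "most similar" interpreted as lowest one-step prediction error. None of these specifics come from the slides.

```python
import numpy as np

ORDER = 4  # assumed AR model order


def lagged(series, order):
    """Regression matrix whose row for time t holds x[t-1], ..., x[t-order]."""
    n = len(series)
    return np.column_stack(
        [series[order - 1 - k: n - 1 - k] for k in range(order)])


def fit_ar(series, order=ORDER):
    """Least-squares fit of x[t] ~ sum_k a[k] * x[t-1-k]."""
    coeffs, *_ = np.linalg.lstsq(lagged(series, order), series[order:], rcond=None)
    return coeffs


def prediction_error(series, coeffs):
    """Mean squared one-step prediction error of an AR model on a stream."""
    order = len(coeffs)
    residual = series[order:] - lagged(series, order) @ coeffs
    return float(np.mean(residual ** 2))


def synthetic_walk(frequency, rng, n=500):
    """Stand-in for one joint-sensor stream: noisy oscillation."""
    t = np.arange(n)
    return np.sin(frequency * t) + 0.1 * rng.normal(size=n)


# Hypothetical surfaces, each producing a different oscillation frequency.
SURFACE_FREQUENCY = {"hardwood": 0.3, "thin_carpet": 0.8, "thick_shag": 1.5}

rng = np.random.default_rng(0)
repository = {name: fit_ar(synthetic_walk(freq, rng))
              for name, freq in SURFACE_FREQUENCY.items()}


def discern_surface(stream):
    """Select the stored AR model that best predicts the current stream."""
    return min(repository,
               key=lambda name: prediction_error(stream, repository[name]))


current = synthetic_walk(SURFACE_FREQUENCY["thick_shag"], rng)
print(discern_surface(current))  # thick_shag
```

Once the surface label is discerned, gait selection reduces to a lookup from surface to the gait developed for it.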
21 CONCLUSION: We currently conjecture that the proposed experience-based approach will usher in a whole new phase of development of the decision and controls fields, making a significant stride toward the achievement of more human-like decision and control. We also conjecture that the context discernment concepts plus the manifolds representation will provide a basis for constructing learning agents capable of long-term, rapidly accessible memory. If so, this could pave the way for scaling neural systems to brain-like capabilities.
22 I wish to acknowledge the creative role filled by the various graduate students who have worked in the NWCIL, both present and past. These include, but are not limited to, in alphabetical order:
Michael Carroll, Lars Holmstrom, Steven Hutsell, Bryan Johnson, Joe Lotz, Shari Matzner, Jay McCarthy, Christian Paintz, Alec Rogers, Adreas Rustan, Larry Schultz, Steve Shervais, Andrew Toland, Sangsuree Vasupongayya.
Particular acknowledgements go to my Lab Managers: Thaddeus Shannon and Roberto Santiago.
Also, NSF Grants ECS-9904378 and ECS-0301022.
24 Demonstration of Context Discernment in response to a change in NN plant parameter values (context change) at every 100th iteration.
25- CONTEXT DISCERNMENT
- When done relative to the plant, entails a form of System Identification (SID).
- Employ the notion of an experience repository for the SID task as well (comprising relevant plant models).
- HLLA creates a strategy for optimally selecting a model from the repository.
26- SELECTION
- The selection process is triggered by the Agent becoming aware that a change in context has occurred (contextual awareness),
- followed by the Agent seeking information about what changed (context discernment),
- and finally, by selection.
27- At NWCIL, experience is being addressed
- via a notion of experience repository, and
- via a novel concept for applying Reinforcement Learning / Adaptive Critics vis-à-vis the experience repository,
- → Higher-Level Learning Algorithm (HLLA).
28 System configuration during DHP Design of Controller Augmentation System (2003)
29- Blue: LoFLYTE w/ Unaugmented Control
- Red: LoFLYTE w/ Augmented Control
- Black: LoFLYTE
Pitch w/ cg Shift
30- Blue: LoFLYTE w/ Unaugmented Control
- Red: LoFLYTE w/ Augmented Control
- Black: LoFLYTE
Roll 1
31- Blue: LoFLYTE w/ Unaugmented Control
- Red: LoFLYTE w/ Augmented Control
- Black: LoFLYTE
Roll 2
32 DEFINITIONS
1. Agent: computational intelligence device.
2. Context Variables (Agent centric): those attributes of i) the environment and ii) the plant/process whose variations could engender changes to the decision rule / control policy employed by the Agent while accomplishing the Agent's current objective or goal; and, in addition, iii) the criteria (representing the objective or goal) to be used for designing and subsequently selecting the decision rule or control law. We use the term Criterion Function (CF) to represent these criteria.
3. Context Space (Agent centric): a vector space in which each context variable is associated with a dimension. The Context Space is conceptualized as comprising three sub-spaces, one each associated with the i) Plant, ii) Environment, and iii) Criterion Function.
4. Context (Agent centric): a point in Context Space; the set of values taken on by the context variables in a given situation.
33- 5. Context Discernment: the act or process of determining the current values of the context variables (current point in Context Space) appropriate to the task being performed. Webster on-line for "discern": to recognize or identify as separate and distinct.
- 6. Experience: a two-component concept:
- Component A: Repository of previously developed context-specific models (controller, plant, or CF models), and
- Component B: Algorithms used by the Agent to effectively and efficiently select a model from the repository as changes in context occur. Note: a key task of the HLLA is to train the Agent to learn Component B.
- 7. Selection: the act of choosing/retrieving the appropriate element of the repository corresponding to the discerned context.
- 8. Higher-Level Learning Algorithm (HLLA): The reference level for the term "higher" is the case where learning algorithms are applied directly to the design of optimal controllers, as in Learning Control; these are the controllers that would be accumulated in the repository. "Higher-Level" here means applying the learning method to create a strategy for selecting a good controller from the repository, where the process of selection is optimized. Definition of the Utility function (CF) is key for application of this process.
34 OBSERVATION 1: In the case of humans, the more knowledge/experience attained, the greater the improvement in effectiveness of performing new related tasks, with little or no speed penalty.
OBSERVATION 2: In the case of AI rule-based systems, the more knowledge attained, the slower the processing.
RESEARCH OBJECTIVE: Develop a computational intelligence device (Agent) that employs experience to enhance the effectiveness and efficiency of certain processes, i.e., endow these processes with more human-like attributes.
35- Since CONTEXT is fundamental to the approach, we performed a historical overview of the control field vis-à-vis the explicit role that context has (or has not) played in the various formulations and approaches.
- The overview was also motivated by the belief that adding the capability to employ experience in the controller design/selection process will usher in a qualitatively new phase in the evolution of the controls field.
36- Phase 1: DESIGN BASED on INTUITION and INVENTION.
- control devices date to antiquity
- a well-known recent device is the flyball governor (James Watt, 1788)
- the design of such devices was the product of intuition and inventive genius, with little support from mathematically based tools, and with no explicit notion of context.
37- Phase 2: DESIGN BASED on MATHEMATICAL TOOLS.
- mathematics has played a fundamental role in developing the control field
- Maxwell used differential equations to analyze the flyball governor dynamics, ca. 1870
- followed by Fourier and Laplace transforms, state space methods, stochastic methods, Hilbert space methods, algebraic and geometric topological methods,
- design is done off-line
38 Phase 2 (continued)
- contains design methods where the controller is placed in service with no associated mechanism for modifying its design in response to changes in context
- each controller design is based on a single point in the Context Space, or at most, a small neighborhood of points
- this phase includes at least the following well-known design methods: Classical Control, Modern Control, Optimal Control, Stochastic Control, and Robust Control
39- Phase 3: DESIGN for CONTEXT DEPENDENCE.
- such large variation in context that fixed controllers are not sufficient
- first, discern the current context
- then, use a previously designated process to adjust controller parameters, based on observations
- e.g., Adaptive Control and Learning Control
- accommodates a modicum of variation in context via on-line parameter adjustments
- the mechanism for performing accommodations is distinct from that defined for Phase 4.
41 EB IDENTIFICATION AND CONTROL
Generic structure for developing the proposed Experience-Based System Identifier
42 EB IDENTIFICATION AND CONTROL (cont.)
Generic structure for developing the proposed Experience-Based Controller
A good EB-Algorithm minimizes the number of selection cycles: it has learned (via the Adaptive Critic method) to make optimal use of a priori knowledge.