Experience-Based Identification - PowerPoint PPT Presentation


1
Experience-Based Identification and Control
via Higher-Level Reinforcement Learning
  • George G. Lendaris
  • NW Computational Intelligence Laboratory
  • Portland State University, Portland, OR
  • Supported by NSF ECS-0301022

2
  • Ideas Example (assume an experienced car driver)
  • I. Car attributes:
  • 1) driving own car
  • 2) driving a friend's car.
  • II. Environment: clear afternoon with
  • 1) dry pavement
  • 2) icy pavement.
  • III. Performance criteria:
  • 1) Road race: minimize time.
  • 2) Elderly relative on excursion: maximize
    comfort.
  • Use the same base set of driving skills, but when
    changing from 1 to 2, make adjustments to the
    control law and/or decision logic, drawn from a
    collection acquired via EXPERIENCE.
  • CONTEXT comprises I, II, and III.

NWCIL WCCI - IJCNN2006 2
3
CONTEXT: We formulate context as comprising
three components: A) Plant, B) Environment, and
C) Objectives (characterized via performance
criteria, labeled CF). Specification of all three
yields a specific context; to each specific
context there corresponds a particular control
law; a change in any of the components results in
a different context.
[Diagram: CONTEXT, with components A. PLANT (.1, .2, ...), B. ENVIRONMENT (.1, .2, ...), and C. CF (.1, .2, ...), together with the CONTROL LAW REPOSITORY (EXPERIENCE)]
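The correspondence between a fully specified context and a control law can be sketched as a keyed lookup. This is only an illustration, not the NWCIL implementation: the `Context` type, the proportional control laws, and the gain values are all hypothetical.

```python
from typing import Callable, Dict, NamedTuple


class Context(NamedTuple):
    plant: str        # A) e.g. "own_car" or "friends_car"
    environment: str  # B) e.g. "dry" or "icy"
    criterion: str    # C) CF, e.g. "min_time" or "max_comfort"


# A control law maps a measured error to a control action.
ControlLaw = Callable[[float], float]


def make_proportional_law(gain: float) -> ControlLaw:
    """Hypothetical control law: a plain proportional controller."""
    return lambda error: gain * error


# Hypothetical repository: one control law per fully specified context.
# The gains are illustrative values, not results from the slides.
repository: Dict[Context, ControlLaw] = {
    Context("own_car", "dry", "min_time"): make_proportional_law(2.0),
    Context("own_car", "icy", "min_time"): make_proportional_law(0.5),
    Context("own_car", "dry", "max_comfort"): make_proportional_law(1.0),
}

ctx_dry = Context("own_car", "dry", "min_time")
ctx_icy = ctx_dry._replace(environment="icy")  # change a single component

print(ctx_dry != ctx_icy)        # changing one component gives a different context
print(repository[ctx_dry](1.0))  # control action from the dry-pavement law
print(repository[ctx_icy](1.0))  # control action from the icy-pavement law
```

An exact-match dictionary only works when the context is known precisely; the later slides on manifolds address how to index the repository when it is not.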
4
  • CONTEXT is fundamental to the approach, so we
    performed a historical overview of the control
    field vis-à-vis the explicit role that context
    has (or has not) played in the various
    formulations and approaches.
  • Phase 1: DESIGN BASED on INTUITION and
    INVENTION.
  • Phase 2: DESIGN BASED on MATHEMATICAL
    TOOLS.
  • Phase 3: DESIGN for CONTEXT DEPENDENCE.
  • e.g., Adaptive Control and Learning Control
    accommodate a modicum of variation in
    context via on-line parameter adjustments.
  • Phase 4: (next slide)

5
Phase 4 (new): DESIGN for EXPERIENCE-BASED
PROCESSES, including AUTONOMOUS CONTEXT
DISCERNMENT and MODEL SELECTION.
Stipulated requirements for this phase: the Agent
has the ability a) to use experience for model
selection (plant or controller), and b) to do so
effectively and efficiently.
Fundamental aspects to consider: 1) context,
2) discerning the current context, 3) selecting an
appropriate model from the experience repository
for the discerned context, and 4) doing the latter
two in an effective and efficient manner.
(Aspects 3, 4, and potentially 2 entail a memory
property.)
6
  • KEY IDEA of HLLA:
  • Re-purpose the Reinforcement Learning method
    (to a higher level) such that,
  • instead of using it to design an optimal
    controller for a given task,
  • an already achieved collection of such solutions
    for a variety of related contexts is provided
    (as an experience repository), and
  • HLLA creates a strategy for optimally selecting
    a solution from the repository.

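A minimal sketch of the key idea, under strong simplifying assumptions: the repository already holds designed controllers (here just hypothetical gains), and learning is applied only to the choice among them. The slides use Reinforcement Learning / Adaptive Critic methods; the epsilon-greedy running-average learner below is a much simpler stand-in, and the contexts, gains, and reward function are invented for illustration.

```python
import random

# Hypothetical repository: controller gains already designed for related contexts.
REPOSITORY = [0.5, 1.0, 2.0]
# Hypothetical contexts and the gain each one ideally calls for.
CONTEXT_IDEAL_GAIN = {"icy": 0.5, "wet": 1.0, "dry": 2.0}


def reward(context: str, choice: int) -> float:
    """Negative squared mismatch between the chosen gain and the ideal gain."""
    return -((REPOSITORY[choice] - CONTEXT_IDEAL_GAIN[context]) ** 2)


def learn_selection_strategy(episodes: int = 300, epsilon: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    contexts = list(CONTEXT_IDEAL_GAIN)
    # value[c][k]: running-average reward of selecting controller k in context c.
    value = {c: [0.0] * len(REPOSITORY) for c in contexts}
    count = {c: [0] * len(REPOSITORY) for c in contexts}
    for episode in range(episodes):
        c = contexts[episode % len(contexts)]  # visit the contexts in turn
        if rng.random() < epsilon:             # explore: random controller
            k = rng.randrange(len(REPOSITORY))
        else:                                  # exploit: best estimate so far
            k = max(range(len(REPOSITORY)), key=lambda j: value[c][j])
        count[c][k] += 1
        value[c][k] += (reward(c, k) - value[c][k]) / count[c][k]
    # Learned strategy: for each context, the index of the preferred controller.
    return {c: max(range(len(REPOSITORY)), key=lambda j: value[c][j]) for c in contexts}


strategy = learn_selection_strategy()
print({c: REPOSITORY[k] for c, k in strategy.items()})
```

Note that no controller is designed or tuned here; only the selection strategy is learned, which is the sense of "higher level" on this slide.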
7
Conceptual layout of EB Control process
[Flowchart; recoverable block labels: Starting Condition; Context Monitoring by Agent (context awareness); CONTROLLER; PLANT; Criterion Function Assessment (All OK / Off Nominal); Agent Performs SID (EB) (context discernment); EB-UPDATED PLANT MODEL; Agent Performs Controller SELECTION (EB); EB-UPDATED CONTROLLER MODEL; Install Updated Controller Design; Run Simulation]
8
  • Populating the Repository
  • In practice, one might build the repository
    piece by piece:
  • - generate controllers for a given application
    via available design tools (e.g., Phase 2
    tools), and
  • - collect them into a repository, along with a
    list of attributes (parameters) that can be used
    as an index to facilitate their selection, BUT

9
  • Populating the Repository (continued)
  • it is difficult to define the list of attributes
    to serve as useful indexing mechanisms, and
  • to come up with a useful parameterization of the
    context for the given task.
  • We note that the choice of representations and
    associated mappings directly influences
    subsequent
  • - efficiency of access,
  • - notion of nearness, and
  • - notion of generalization.

10
  • Populating the Repository (continued)
  • For research purposes, start with a synthetic
    method:
  • employ an analytic equation or neural network;
    their parameters provide built-in indexing
    mechanisms
  • e.g., if the plant is known to be linear, employ,
    say, a fifth-order transfer function, with the
    function's coefficients used to define the
    indexing
  • e.g., if known to be second-order linear, then
    use a second-order transfer function
  • → access via 2-dim. vs. 5-dim. space
  • → efficiency and generalization implications


11
  • MANIFOLDS
  • Important to endow the index with
  • the property of being searchable, and
  • an operational notion of nearness.
  • The mathematical construct of manifolds (from
    geometric topology) provides a useful formalism
    for this application:
  • a set of elements, S, and
  • a coordinate system in R^n
  • (a one-to-one mapping from S to R^n that
    specifies each element in S via a vector of n
    real numbers, a.k.a. the coordinates of the
    element). R^n is searchable, and nearness is
    Euclidean.
  • (The terms index and coordinates are here
    synonymous.)

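To make "searchable" and "nearness is Euclidean" concrete, here is a small sketch of a repository indexed by coordinates in R^2 and queried for its nearest element. The coordinate values and model names are hypothetical, and the linear scan is only the simplest possible search; a spatial index could replace it for large repositories.

```python
import math
from typing import Dict, Sequence, Tuple

Coordinates = Tuple[float, ...]


def nearest(index: Dict[Coordinates, str], query: Sequence[float]) -> Tuple[Coordinates, str]:
    """Return the repository element whose coordinates are closest to the query."""
    best = min(index, key=lambda coords: math.dist(coords, query))
    return best, index[best]


# Hypothetical repository: each element of S is specified by a vector in R^2.
index = {
    (0.5, 0.1): "model A",
    (1.0, 0.1): "model B",
    (1.0, 0.3): "model C",
}

coords, element = nearest(index, (0.9, 0.25))
print(coords, element)
```

Because the mapping from S to coordinates is one-to-one, the coordinate tuple can serve directly as the dictionary key, which is the sense in which "index" and "coordinates" are synonymous here.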
12
MAPPINGS BETWEEN CONTEXT SPACE AND VARIOUS
MANIFOLDS (? repository set part)
In general, how does one craft an appropriate
mapping from the full Context Space (whatever
form of representation is employed) to the
coordinate system of the control manifold?
The strategy is to employ HLLA to learn the mappings.
13
EBSID Example: Neural Network as Plant. The goal
of this SID process was to train a NN (called
CDN) to select a NN from a neural manifold (plant
model repository) to match the behavior of an
observed NN (the plant/system).
14
EBSID Example: Pole-Cart Plant. This benchmark
system was used to further develop the ideas.
The repository was populated synthetically via a
parameterized set of equations for the pole-cart
plant. The parameters explicitly included the pole
length and mass, which were varied during
the experiment. The task was to discern changes in
context (pole length, pole mass) when they
occurred. After the CDN is trained, it
functions to adjust the manifold coordinate
values (pole length and mass). These parameter
values instantiate a plant model in the
repository. Once the CDN issues null adjustments,
the appropriate model has been selected
(specified via current values of CD) and is to be
used by the Agent to select a corresponding
controller.
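The adjust-until-null loop described on this slide can be sketched as follows. Several things here are assumptions made only for illustration: the plant is a linearized damped pendulum rather than the pole-cart equations, the friction coefficient and bounds are invented, and a hand-written coordinate search takes the place of the trained CDN.

```python
from typing import List, Sequence, Tuple

G = 9.81  # gravitational acceleration, m/s^2


def simulate(length: float, mass: float, steps: int = 200, dt: float = 0.01) -> List[float]:
    """Toy stand-in plant (linearized damped pendulum), not the pole-cart equations."""
    theta, omega = 0.2, 0.0
    damping = 0.5 / (mass * length ** 2)  # 0.5 is a hypothetical friction coefficient
    trajectory = []
    for _ in range(steps):
        omega += dt * (-(G / length) * theta - damping * omega)
        theta += dt * omega
        trajectory.append(theta)
    return trajectory


def model_error(coords: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of squared differences between a repository model and the observed plant."""
    predicted = simulate(coords[0], coords[1])
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def discern(observed: Sequence[float], start: Sequence[float], step: float = 0.2,
            tol: float = 1e-3, lo: float = 0.2, hi: float = 5.0) -> Tuple[tuple, float]:
    """Adjust (length, mass) until no adjustment lowers the error (a null adjustment)."""
    coords = list(start)
    error = model_error(coords, observed)
    while step > tol:
        improved = False
        for i in range(len(coords)):
            for delta in (step, -step):
                trial = list(coords)
                trial[i] = min(hi, max(lo, trial[i] + delta))  # stay inside the bounds
                trial_error = model_error(trial, observed)
                if trial_error < error:  # accept only adjustments that help
                    coords, error, improved = trial, trial_error, True
        if not improved:
            step /= 2  # shrink the adjustment; below tol it is treated as null
    return tuple(coords), error


observed = simulate(length=1.2, mass=0.8)  # the "actual" plant after a context change
start = (0.8, 1.0)                         # coordinates of the previously selected model
initial_error = model_error(start, observed)
coords, final_error = discern(observed, start)
print(coords, final_error)
```

The final coordinates play the role of the selected repository model; in the slide's scheme they would then be handed to the Agent for controller selection.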
15
Demonstration of Context Discernment in response
to change in Pole-Cart parameter values (context
change) at every 50th iteration.
[Plot: errors between state variable values for the
pole-cart system and for the models selected during
the discernment process]
16
Extension of Example 3: first foray into mixed
representations. Use NN models of the pole-cart
plant in the repository instead of equations. A
NN with 174 weights was trained to emulate the plant
for a given length/mass combination. Using this
trained NN as a starting point, eight new NNs
were trained for different mass/length (M/L)
combinations. The weights of the resulting NNs
were analyzed for changes from the original
(base) case. A sensitivity-variance metric was
crafted and used to select 22 weights. A new NN
with 152 weights frozen to the base-case design
and with 22 adjustable weights was then trained
for the same eight cases, yielding models all
within tolerance.
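The slide does not define the sensitivity-variance metric, so the sketch below substitutes plain per-weight variance across the retrained networks as a stand-in; it only illustrates the idea of splitting weights into an adjustable subset and a frozen remainder. The toy data (3 networks, 5 weights) is invented and far smaller than the 8 networks and 174 weights on the slide.

```python
from statistics import pvariance
from typing import List


def select_adjustable_weights(weight_sets: List[List[float]], k: int) -> List[int]:
    """Indices of the k weights that vary most across the retrained networks."""
    n_weights = len(weight_sets[0])
    variances = [pvariance([ws[i] for ws in weight_sets]) for i in range(n_weights)]
    ranked = sorted(range(n_weights), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: each row is the weight vector of one retrained network.
retrained = [
    [0.10, 0.50, -0.30, 0.80, 0.00],
    [0.10, 0.90, -0.30, 0.20, 0.01],
    [0.10, 0.10, -0.31, 0.50, 0.00],
]

adjustable = select_adjustable_weights(retrained, k=2)
frozen = [i for i in range(len(retrained[0])) if i not in adjustable]
print(adjustable, frozen)
```

The adjustable indices would then become the coordinates of the neural manifold, with the frozen weights held at their base-case values.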
17
  • Extension of Example 3: first foray into mixed
    representations (continued)
  • - These NNs now populate the repository, with the
    22 adjustable weights as its coordinates.
  • - Even though the pole M and L are the two context
    parameters that are changing, the representation
    available to the Agent comprises 22 parameters.
  • This is a more difficult task than the previous
    example, as the model plant and actual plant have
    different forms of representation.
  • The HLLA is to design a CDN that selects the
    appropriate NN for a given M/L combination, and
    may have to develop mappings between the two sets
    of coordinates.
  • NOT ACCOMPLISHED YET, as we moved on to the next task.

18
To move beyond working with just simulations, we
have begun initial stages of implementing a Robot
with Context Discerning and Controller Selecting
capability at NWCIL.
A Sony AIBO robotic dog has been modified into a
research platform. Its task is to learn to
discern changes in walking surface types and
adjust its gait accordingly.
19
  • AIBO experiments to date
  • 1. Constructed five different surface types (4
    long):
  • hardwood, thin foam, thin carpet, thick
    shag carpet,
  • reversed shag carpet.
  • 2. Used a genetic algorithm to develop a good AIBO
    gait for each of the five surfaces.
  • Note: each resulting gait yields a better
    performance measure on its respective surface
    than does the default gait; differences are
    visually discernible.
  • 3. AIBO made test walks on the five surfaces for
    each gait, and data streams from 17
    joint-actuator sensors were recorded.
  • 4. An AR model was computed/stored for each of the
    25 sets of sensory experiences.

20
  • AIBO experiments (cont.)
  • 5. Now, when walking on a surface, AIBO discerns
    the surface type (CONTEXT) by processing its
    current kinesthetic experience through the
    models in its repository, and selects the
    most similar one.
  • 6. It then selects the gait corresponding to this
    surface, and adjusts its walk accordingly.
  • 7. Show video

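Steps 4 and 5 can be sketched with a one-channel, second-order AR model: fit one model per surface, then pick the model that best predicts the current stream. The slides do not state the AR order or the similarity measure, so AR(2) and summed squared one-step prediction error are assumptions, as are the synthetic sinusoidal "sensor" streams and their frequencies. The real setup uses 17 sensors and 25 surface/gait models.

```python
import math
from typing import Dict, List, Tuple


def fit_ar2(x: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] ~ a1*x[t-1] + a2*x[t-2] via the 2x2 normal equations."""
    ts = range(2, len(x))
    s11 = sum(x[t - 1] * x[t - 1] for t in ts)
    s22 = sum(x[t - 2] * x[t - 2] for t in ts)
    s12 = sum(x[t - 1] * x[t - 2] for t in ts)
    b1 = sum(x[t] * x[t - 1] for t in ts)
    b2 = sum(x[t] * x[t - 2] for t in ts)
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det


def prediction_error(model: Tuple[float, float], x: List[float]) -> float:
    """Summed squared one-step prediction error of an AR(2) model on a stream."""
    a1, a2 = model
    return sum((x[t] - a1 * x[t - 1] - a2 * x[t - 2]) ** 2 for t in range(2, len(x)))


def joint_stream(freq: float, phase: float = 0.0, amplitude: float = 1.0, n: int = 200) -> List[float]:
    """Synthetic single-sensor stream; real data would come from the robot."""
    return [amplitude * math.sin(freq * t + phase) for t in range(n)]


# Hypothetical: each surface induces a different dominant oscillation in one sensor.
SURFACE_FREQ = {"hardwood": 0.30, "thin foam": 0.45, "thick shag carpet": 0.60}
repository: Dict[str, Tuple[float, float]] = {
    surface: fit_ar2(joint_stream(freq)) for surface, freq in SURFACE_FREQ.items()
}


def discern_surface(x: List[float]) -> str:
    """Select the stored model that is most similar to the current experience."""
    return min(repository, key=lambda s: prediction_error(repository[s], x))


# Current walk: same surface dynamics as "thin foam", different phase and amplitude.
current = joint_stream(0.45, phase=1.0, amplitude=0.8)
print(discern_surface(current))
```

Once the surface is discerned, step 6 reduces to a lookup of the gait stored for that surface.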
21
CONCLUSION: We currently conjecture that the
proposed experience-based approach will usher in
a whole new phase of development of the decision
and controls fields, making a significant stride
toward the achievement of more human-like
decision and control. We also conjecture that
the context discernment concepts plus the
manifolds representation will provide a basis for
constructing learning agents capable of long-term,
rapidly accessible memory. If so, this could pave
the way for scaling neural systems to brain-like
capabilities.
22
I wish to acknowledge the creative role filled by
the various graduate students who have worked in
the NWCIL, both present and past. These include,
but are not limited to, in alphabetical order:
Michael Carroll, Lars Holmstrom, Steven Hutsell,
Bryan Johnson, Joe Lotz, Shari Matzner,
Jay McCarthy, Christian Paintz, Alec Rogers,
Adreas Rustan, Larry Schultz, Steve Shervais,
Andrew Toland, Sangsuree Vasupongayya.
Particular acknowledgements go to my Lab
Managers: Thaddeus Shannon and Roberto Santiago.
Also, NSF Grants ECS-9904378 and ECS-0301022.
24
Demonstration of Context Discernment in response
to change in NN plant parameter values (context
change) at every 100th iteration.
25
  • CONTEXT DISCERNMENT
  • When done relative to the plant, entails a
    form of System Identification (SID).
  • Employ the notion of an experience repository for
    the SID task as well (comprising relevant plant
    models).
  • HLLA creates a strategy for optimally selecting
    a model from the repository.

26
  • SELECTION
  • - The selection process is triggered by the Agent
    becoming aware that a change in context has
    occurred (contextual awareness),
  • followed by the Agent seeking information about
    what changed (context discernment),
  • and finally, by selection.

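The three stages on this slide (awareness, discernment, selection) can be laid out as one decision step. This is a schematic sketch only: the function and parameter names are hypothetical, the criterion function is assumed to be a cost where larger is worse, and the discernment step is passed in as a placeholder callable.

```python
from typing import Callable, Dict, Optional


def selection_step(
    cf_value: float,
    cf_threshold: float,
    discern_context: Callable[[], str],
    repository: Dict[str, str],
) -> Optional[str]:
    """Return a newly selected controller name, or None if no change is detected."""
    # 1) Contextual awareness: has the criterion function gone off nominal?
    if cf_value <= cf_threshold:
        return None
    # 2) Context discernment: find out what changed.
    context = discern_context()
    # 3) Selection: retrieve the controller for the discerned context.
    return repository[context]


repository = {"dry": "controller_dry", "icy": "controller_icy"}

nominal = selection_step(0.1, 0.5, lambda: "icy", repository)
off_nominal = selection_step(0.9, 0.5, lambda: "icy", repository)
print(nominal, off_nominal)
```

Discernment is only invoked when awareness fires, so the cost of identifying the context is paid only after a change has been detected.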
27
  • At NWCIL, experience is being addressed
  • - via a notion of experience repository, and
  • - via a novel concept for applying
    Reinforcement Learning / Adaptive Critics
    vis-à-vis the experience repository:
  • → Higher-Level Learning Algorithm
    (HLLA).

28
System configuration during DHP Design of
Controller Augmentation System (2003)
29
  • Blue: LoFLYTE w/ Unaugmented Control
  • Red: LoFLYTE w/ Augmented Control
  • Black: LoFLYTE

Pitch w/ cg Shift
30
  • Blue: LoFLYTE w/ Unaugmented Control
  • Red: LoFLYTE w/ Augmented Control
  • Black: LoFLYTE

Roll 1
31
  • Blue: LoFLYTE w/ Unaugmented Control
  • Red: LoFLYTE w/ Augmented Control
  • Black: LoFLYTE

Roll 2
32
DEFINITIONS
1. Agent: computational intelligence device.
2. Context Variables (Agent centric): those
attributes of i) the environment and ii) the
plant/process whose variations could engender
changes to the decision rule / control policy
employed by the Agent while accomplishing the
Agent's current objective or goal; and, in
addition, iii) the criteria (representing the
objective or goal) to be used for designing and
subsequently selecting the decision rule or
control law. We use the term Criterion Function
(CF) to represent these criteria.
3. Context Space (Agent centric): a vector space
in which each context variable is associated with
a dimension. The Context Space is conceptualized
as comprising three sub-spaces, one each
associated with the i) Plant, ii) Environment,
and iii) Criterion Function.
4. Context (Agent centric): a point in Context
Space; the set of values taken on by the context
variables in a given situation.
33
  • 5. Context Discernment: the act or process of
    determining the current values of the context
    variables (current point in Context Space)
    appropriate to the task being performed. Webster
    on-line for 'discern': to recognize or identify
    as separate and distinct.
  • 6. Experience: a two-component concept.
  • Component A: Repository of previously developed
    context-specific models
    (controller, plant, or CF models), and
  • Component B: Algorithms used by the Agent to
    effectively and efficiently select a model
    from the repository as changes in context
    occur. Note: a key task of the HLLA is to
    train the Agent to learn Component B.
  • 7. Selection: the act of choosing/retrieving the
    appropriate element of the repository
    corresponding to the discerned context.
  • 8. Higher-Level Learning Algorithm (HLLA): The
    reference level for the term 'higher' is the case
    where learning algorithms are applied directly to
    the design of optimal controllers, as in Learning
    Control, ones that would be accumulated in the
    repository. 'Higher-Level' here means applying
    the learning method to create a strategy for
    selecting a good controller from the repository,
    where the process of selection is optimized.
    Definition of the Utility function (CF) is key
    for application of this process.

34
OBSERVATION 1: In the case of humans, the
more knowledge / experience attained, the more
improvement in effectiveness of performing new
related tasks, with little or no speed
penalty.
OBSERVATION 2: In the case of AI
rule-based systems, the more knowledge attained,
the slower the processing.
RESEARCH OBJECTIVE: Develop a computational
intelligence device (Agent) that employs
experience to enhance the effectiveness and
efficiency of certain processes, i.e., endow
these processes with more human-like attributes.
35
  • Since CONTEXT is fundamental to the approach, we
    performed a historical overview of the control
    field vis-à-vis the explicit role that context
    has (or has not) played in the various
    formulations and approaches.
  • The overview was also motivated by the belief that
    adding the capability to employ experience in the
    controller design / selection process will usher
    in a qualitatively new phase in the evolution of
    the controls field.

36
  • Phase 1: DESIGN BASED on INTUITION and
    INVENTION.
  • control devices date to antiquity
  • a well-known recent device is the flyball
    governor (James Watt, 1788)
  • the design of such devices was the product of
    intuition and inventive genius, with little
    support from mathematically based tools, and with
    no explicit notion of context.

37
  • Phase 2: DESIGN BASED on MATHEMATICAL
    TOOLS.
  • mathematics has played a fundamental role in
    developing the control field
  • Maxwell used differential equations to analyze
    the flyball governor dynamics, ca. 1870
  • followed by Fourier and Laplace transforms, state
    space methods, stochastic methods, Hilbert space
    methods, algebraic and geometric topological
    methods,
  • - design is done off-line

38
Phase 2 (continued)
- contains design methods where the controller is
placed in service with no associated mechanism
for modifying its design in response to changes
in context
- each controller design is based on a single
point in the Context Space, or at most, a small
neighborhood of points
- this phase includes at least the following
well-known design methods: Classical Control,
Modern Control, Optimal Control, Stochastic
Control, and Robust Control
39
  • Phase 3: DESIGN for CONTEXT DEPENDENCE.
  • such large variation in context that
  • - fixed controllers are not sufficient
  • first, discern the current context
  • then, use a previously designated process to
    adjust controller parameters, based on
    observations
  • e.g., Adaptive Control and Learning Control
    accommodate a modicum of variation in context
    via on-line parameter adjustments
  • the mechanism for performing accommodations is
    distinct from that defined for Phase 4.

41
EB IDENTIFICATION AND CONTROL
Generic structure for developing proposed
Experience Based System Identifier
42
EB IDENTIFICATION AND CONTROL (cont.)
Generic structure for developing proposed
Experience Based Controller
A good EB-Algorithm minimizes the number of
selection cycles: it has learned (via the
Adaptive Critic method) to make optimal use of
a priori knowledge.