1. Closed-Loop Temporal Sequence Learning: Learning
to act in response to sequences of sensor events
2. Overview of different methods
You are here!
3. Natural Temporal Sequences in life and in
control situations
- Real Life
- Heat radiation predicts pain when touching a hot surface.
- The sound of a prey may precede its smell, which will precede its taste.
- Control
- Force precedes position change.
- An electrical (disturbance) pulse may precede a change in a controlled plant.
- Psychological Experiment
- Pavlovian and/or Operant Conditioning: the bell precedes the food.
4. What we had so far!
[Diagram: early input "Bell" with adaptable weight w and late input "Food" converging on a summation unit S.]
5. Conditioned Input
This is an open-loop system.
6. Closed loop
[Diagram: an adaptable neuron senses the environment (Env.) and acts on it; behaving and sensing close the loop.]
7. How to ensure behavioral learning convergence?
This is achieved by starting with a stable,
reflex-like action and learning to supersede it
by an anticipatory action.
8. Robot Application
9. Reflex Only (compare to an electronic closed-loop controller!)
Think of a thermostat!
This structure assures initial (behavioral)
stability (homeostasis).
10. Retraction reflex
The Basic Control Structure: schematic diagram of
a pure reflex loop (triggered by a bump signal).
11. Robot Application
Learning Goal: Correlate the vision signals with
the touch signals and navigate without collisions.
Initially built-in behavior: Retraction reaction
whenever an obstacle is touched.
12. Let's look at the Reflex first
13. This is an open-loop system
T represents the temporal delay between
vision and bump.
Closed Loop TS Learning
14. Let's see how learning continues
15. Anatomically this loop still exists, but ideally it
should never be active again!
This is the system after learning.
16. What has the learning done?
Elimination of the late reflex input corresponds
mathematically to the condition x0 → 0.
This was the major condition of convergence for
the ISO/ICO rules. (The other one was T → 0.)
What is interesting about this is that the
learning-induced behavior self-generates this
condition. This is non-trivial, as it assures that
the synaptic weight AND the behavior stabilize at
exactly the same moment in time.
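The self-generated convergence condition can be sketched as a toy simulation (an illustration under invented assumptions, not the authors' implementation): the weight on the early input grows while the reflex still fires, and both the weight and the behavior freeze at the same trial.

```python
# Minimal sketch of self-terminating convergence: an adaptable weight w1
# on the early ("vision") input grows as long as the late reflex input x0
# (the "bump") still occurs; once the anticipatory action is strong
# enough, x0 = 0 and the weight freezes.  The threshold and learning rate
# are invented for illustration.

def run_trials(n_trials=50, mu=0.5, threshold=1.0):
    w1 = 0.0                       # weight of the early (conditioned) input
    reflex_events = []
    for _ in range(n_trials):
        avoided = w1 >= threshold      # assumption: action strength >= 1 avoids the bump
        x0 = 0.0 if avoided else 1.0   # late reflex input fires only on failure
        w1 += mu * 1.0 * x0            # per-trial update: early input (u1 = 1)
                                       # correlated with the reflex event
        reflex_events.append(x0 > 0.0)
    return w1, reflex_events
```

Once the reflex stops firing, every later update is exactly zero: synapse and behavior stabilize in the same trial.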
17. The inner pathway has now become a pure feed-forward path.
What has happened in engineering terms?
18. Formally
19. The Learner has learned the inverse transfer
function of the world and can therefore compensate
the disturbance at the summation node!
[Diagram: disturbance D enters the plant P1 through a delay e^{-sT}; the learned pathway contributes -e^{-sT} (a phase shift), so the two signals sum to 0.]
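Written out in transfer-function notation (a sketch using the slide's symbols D and e^{-sT}; the exact block structure of the original diagram is assumed):

```latex
% Disturbance D reaches the summation node via the world's delay e^{-sT};
% after learning, the outer path has acquired the opposite transfer
% characteristic, so the two contributions cancel:
X_0(s) = D(s)\,e^{-sT} + D(s)\bigl(-e^{-sT}\bigr) = 0
```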
20. The outer path has, through learning, built a
Forward Model of the inner path. This is some
kind of a holy grail for an engineer,
called Model-Free Feed-Forward Compensation.
For example: If you want to keep the temperature
in an outdoor container constant, you can employ
a thermostat. You may, however, also first measure
how the sun warms the liquid in the container,
relating outside temperature to inside
temperature (a calibration curve, German "Eichkurve").
From this curve you can extract a control signal
for the cooling of the liquid and react BEFORE the
liquid inside starts warming up. As you react BEFORE,
this is called Feed-Forward Compensation (FFC). As you
have used a curve, you have performed Model-Based
FFC. Our robot learns the model, hence we do
Model-Free FFC.
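The container example can be put into code (a toy simulation; the calibration curve and all numbers are invented for illustration): the feedback controller reacts only after an inside-temperature error has appeared, while the feed-forward controller cancels the measured heat influx before any error can arise.

```python
# Feedback (thermostat) reacts AFTER an inside-temperature error appears;
# feed-forward compensation (FFC) uses the measured calibration curve
# (Eichkurve) from outside temperature to heat influx and cools BEFORE
# the inside temperature can change.

def heat_influx(t_outside):
    # assumed calibration curve: heat entering the liquid per time step
    return 0.1 * max(0.0, t_outside - 20.0)

def simulate(t_outside, steps=100, feed_forward=True):
    t_inside, setpoint = 20.0, 20.0
    max_error = 0.0
    for _ in range(steps):
        influx = heat_influx(t_outside)
        if feed_forward:
            cooling = influx                       # cancel the disturbance directly
        else:
            cooling = 0.5 * (t_inside - setpoint)  # proportional feedback
        t_inside += influx - cooling
        max_error = max(max_error, abs(t_inside - setpoint))
    return max_error
```

With feed-forward compensation the inside temperature never deviates from the setpoint; pure feedback settles with a residual steady-state error.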
21. Example of ISO instability as compared to ICO stability!
Remember the auto-correlation instability of ISO?
ICO: dw1/dt = μ u1 du0/dt
ISO: dw1/dt = μ u1 dv/dt
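The instability contrast can be demonstrated numerically. A hedged sketch (discrete-time versions of the two rules; the square-wave input, learning rate, and step count are invented for illustration): with the reflex input u0 silent, the ICO weight must stay fixed, while the ISO weight drifts, because ISO correlates u1 with the derivative of the output v = w0·u0 + w1·u1, which contains w1·u1 itself (the auto-correlation term).

```python
# ICO: dw1 = mu * u1 * d(u0)                      -- correlates with the reflex INPUT
# ISO: dw1 = mu * u1 * d(v), v = w0*u0 + w1*u1    -- correlates with the OUTPUT

def run(rule, steps=200, mu=0.01, w0=1.0):
    w1 = 1.0
    u0_prev = v_prev = 0.0
    for t in range(steps):
        u1 = 1.0 if (t // 10) % 2 == 0 else 0.0   # square-wave "vision" input
        u0 = 0.0                                   # reflex input stays silent
        v = w0 * u0 + w1 * u1                      # neuron output
        if rule == "ICO":
            w1 += mu * u1 * (u0 - u0_prev)         # derivative of u0: always 0 here
        else:
            w1 += mu * u1 * (v - v_prev)           # derivative of v: self-correlation
        u0_prev, v_prev = u0, v
    return w1
```

Even though nothing reflex-related ever happens, the ISO weight keeps growing at every onset of the u1 pulse; the ICO weight stays exactly where it started.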
23. Old ISO-Learning
24. New ICO-Learning
25. Full compensation
Overcompensation
26. Statistical Analysis: Conventional differential Hebbian learning
One-shot learning (some instances)
Highly unstable
27. Statistical Analysis: Input Correlation Learning
One-shot learning (common)
Stable in a wide domain
29. Two more applications: RunBot, with different convergence conditions!
Walking as such is a very difficult problem and
requires a good, multi-layered control structure,
on top of which learning can then take place.
30. An excursion into Multi-Layered Neural Network Control
- Some statements
- Animals are behaving agents. They receive inputs from MANY sensors and have to control the reaction of MANY muscles.
- This is a terrific problem, and it is solved by our brain through many control layers.
- Attempts to achieve this with robots have so far not been very successful, as network control is somewhat fuzzy and unpredictable.
31. Adaptive Control during Walking: Three Loops
Step control, Terrain control
[Diagram: Central Control (Brain) → Spinal Cord (Oscillations) → Motorneurons/Sensors (Reflex generation) → Muscles/Skeleton (Biomechanics), with feedback from ground contact and muscle length.]
32. Self-Stabilization by passive Biomechanical Properties (Pendulum)
[Diagram: Muscles/Skeleton (Biomechanics)]
33. Passive Walking Properties
34. RunBot's Network of the lowest loop
[Diagram: Motor neurons/Sensors (Reflex generation) driving Muscles/Skeleton (Biomechanics), with muscle-length feedback.]
35. Leg Control of RunBot: Reflexive Control (cf. Cruse)
36. Instantaneous Parameter Switching
37. Body (UBC) Control of RunBot
38. Long-Loop Reflex of one of the upper loops
Terrain control, Central Control (Brain)
Before or during a fall: leaning forward of rump and arms
39. Long-Loop Reflex of the topmost loop
Terrain control, Central Control (Brain)
Forward-leaning UBC
Backward-leaning UBC
40. The Leg Control as Target of the Learning
41. Learning in RunBot
42. RunBot: Learning to climb a slope
(Spektrum der Wissenschaft, Sept. 2006)
43. Human walking versus machine walking
(ASIMO by Honda)
44. RunBot's Joints
[Diagram labels: ramp, lower floor, upper floor]
45. Change of Gait when walking upwards
46. Learning-relevant parameters
[Plot annotations: "too late" vs. "early enough"]
47. A different Learning Control Circuitry
- AL (AR): stretch receptor for the anterior angle of the left (right) hip
- GL (GR): sensor neuron for ground contact of the left (right) foot
- EI (FI): extensor (flexor) reflex inter-neuron
- EM (FM): extensor (flexor) reflex motor-neuron
- ES (FS): extensor (flexor) reflex sensor neuron
48. Motor Neuron Signal Structure (spike rate)
[Plot: spike rates x0, x1 for the right and left leg]
Learning reverses the pulse sequence → stability of STDP
50. The actual Experiment
Note: Here our convergence condition has been
T → 0 (oscillatory!) and NOT, as usual, x0 → 0.
51. Speed Change by Learning
Note the oscillations!
T → 0!
54.
ICO: dw1/dt = μ u1 du0/dt
ISO: dw1/dt = μ u1 dv/dt
55. Driving School: Learn to follow a track and
develop RFs (receptive fields) in a closed-loop behavioral context
56. Differential Hebbian learning used for STDP
ISO learning for temporal sequence
learning (Porr and Wörgötter, 2003)
Central Features: Filtering of all inputs with
low-pass filters (membrane property);
reproduces the asymmetric weight-change curve
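How low-pass filtering plus a differential Hebbian rule yields the asymmetric weight-change curve can be sketched numerically (a hedged illustration, not the paper's exact model: the filter time constant, pulse timings, and integration step are invented): two unit impulses, "pre" at t = 0 and a second pulse at t = dt, are each low-pass filtered, and the update dw ∝ u1·du0/dt is integrated over time.

```python
import math

def lowpass(spike_t, t, tau=10.0):
    """Impulse response of a first-order low-pass filter (membrane property)."""
    return math.exp(-(t - spike_t) / tau) if t >= spike_t else 0.0

def weight_change(dt_pre_post, mu=1.0, t_end=200.0, h=0.1):
    # pre-synaptic pulse at t = 0, second ("late"/post) pulse at t = dt_pre_post
    dw, u0_prev = 0.0, 0.0
    t = -50.0
    while t < t_end:
        u1 = lowpass(0.0, t)              # filtered pre-synaptic input
        u0 = lowpass(dt_pre_post, t)      # filtered second input
        dw += mu * u1 * (u0 - u0_prev)    # differential Hebbian increment
        u0_prev = u0
        t += h
    return dw
```

Sweeping dt_pre_post traces out an asymmetric curve, as in STDP: pre-before-post (dt > 0) gives potentiation, post-before-pre (dt < 0) gives depression.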
57. Modelling STDP (plasticity) and using this
approach (in a simplified way) to control
behavioral learning.
An ecological approach to some aspects
of theoretical neuroscience.
Differential Hebbian Learning rule:
transfer of a computational-neuroscience approach
to a technical domain.
58. Pavlov in real life!
Adding a secondary loop
59. What has happened to the system during learning?
The primary reflex (re-)action has effectively been
eliminated and replaced by an anticipatory action.