1. Closed-Loop Temporal Sequence Learning: Learning
to act in response to sequences of sensor events
2. Overview of different methods
You are here!
3. Natural Temporal Sequences in life and in
control situations
- Real Life
- Heat radiation predicts pain when touching a hot surface.
- The sound of a prey may precede its smell, which will precede its taste.
- Control
- Force precedes position change.
- An electrical (disturbance) pulse may precede a change in a controlled plant.
- Psychological Experiment
- Pavlovian and/or Operant Conditioning: the bell precedes the food.
4. What we had so far!
[Diagram: early input "Bell" with adaptable weight w and late input "Food" converging on a summation unit S.]
5. Conditioned Input
This is an open-loop system.
6. Closed loop
[Diagram: an adaptable neuron senses the environment (Env.) and acts on it; behaving and sensing close the loop.]
7. How to ensure behavioral learning convergence?
This is achieved by starting with a stable,
reflex-like action and learning to supersede it
by an anticipatory action.
8. Robot Application
9. Reflex Only (compare to an electronic closed-loop controller!)
Think of a thermostat!
This structure assures initial (behavioral)
stability (homeostasis).
10. Retraction reflex
The Basic Control Structure: schematic diagram of
a pure reflex loop (triggered by a bump signal).
11. Robot Application
Learning Goal: Correlate the vision signals with
the touch signals and navigate without collisions.
Initially built-in behavior: Retraction reaction
whenever an obstacle is touched.
12. Let's look at the Reflex first
13. This is an open-loop system
T represents the temporal delay between
vision and bump.
Closed Loop TS Learning
14. Let's see how learning continues
15. Anatomically this loop still exists, but ideally it
should never be active again!
This is the system after learning.
16. What has the learning done?
Elimination of the late reflex input corresponds
mathematically to the condition x0 → 0.
This was the major condition of convergence for
the ISO/ICO rules. (The other one was T → 0.)
What is interesting about this is that the
learning-induced behavior self-generates this
condition. This is non-trivial, as it assures that
the synaptic weight AND the behavior stabilize at
exactly the same moment in time.
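The self-generated convergence condition can be sketched as a toy simulation (an illustration under invented assumptions, not the authors' implementation): the weight on the early input grows while the reflex still fires, and both the weight and the behavior freeze at the same trial.

```python
# Minimal sketch of self-terminating convergence: an adaptable weight w1
# on the early ("vision") input grows as long as the late reflex input x0
# (the "bump") still occurs; once the anticipatory action is strong
# enough, x0 = 0 and the weight freezes.  The threshold and learning rate
# are invented for illustration.

def run_trials(n_trials=50, mu=0.5, threshold=1.0):
    w1 = 0.0                       # weight of the early (conditioned) input
    reflex_events = []
    for _ in range(n_trials):
        avoided = w1 >= threshold      # assumption: action strength >= 1 avoids the bump
        x0 = 0.0 if avoided else 1.0   # late reflex input fires only on failure
        w1 += mu * 1.0 * x0            # per-trial update: early input (u1 = 1)
                                       # correlated with the reflex event
        reflex_events.append(x0 > 0.0)
    return w1, reflex_events
```

Once the reflex stops firing, every later update is exactly zero: synapse and behavior stabilize in the same trial.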
17. The inner pathway has now become a pure feed-forward path.
What has happened in engineering terms?
18. Formally
19. The Learner has learned the inverse transfer
function of the world and can therefore compensate
the disturbance at the summation node!
[Diagram: disturbance D enters the plant P1 through a delay e^{-sT}; the learned pathway contributes -e^{-sT} (a phase shift), so the two signals sum to 0.]
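Written out in transfer-function notation (a sketch using the slide's symbols D and e^{-sT}; the exact block structure of the original diagram is assumed):

```latex
% Disturbance D reaches the summation node via the world's delay e^{-sT};
% after learning, the outer path has acquired the opposite transfer
% characteristic, so the two contributions cancel:
X_0(s) = D(s)\,e^{-sT} + D(s)\bigl(-e^{-sT}\bigr) = 0
```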
20. The outer path has, through learning, built a
Forward Model of the inner path. This is some
kind of a holy grail for an engineer,
called Model-Free Feed-Forward Compensation.
For example: If you want to keep the temperature
in an outdoor container constant, you can employ
a thermostat. You may, however, also first measure
how the sun warms the liquid in the container,
relating outside temperature to inside
temperature (a calibration curve, German "Eichkurve").
From this curve you can extract a control signal
for the cooling of the liquid and react BEFORE the
liquid inside starts warming up. As you react BEFORE,
this is called Feed-Forward Compensation (FFC). As you
have used a curve, you have performed Model-Based
FFC. Our robot learns the model, hence we do
Model-Free FFC.
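The container example can be put into code (a toy simulation; the calibration curve and all numbers are invented for illustration): the feedback controller reacts only after an inside-temperature error has appeared, while the feed-forward controller cancels the measured heat influx before any error can arise.

```python
# Feedback (thermostat) reacts AFTER an inside-temperature error appears;
# feed-forward compensation (FFC) uses the measured calibration curve
# (Eichkurve) from outside temperature to heat influx and cools BEFORE
# the inside temperature can change.

def heat_influx(t_outside):
    # assumed calibration curve: heat entering the liquid per time step
    return 0.1 * max(0.0, t_outside - 20.0)

def simulate(t_outside, steps=100, feed_forward=True):
    t_inside, setpoint = 20.0, 20.0
    max_error = 0.0
    for _ in range(steps):
        influx = heat_influx(t_outside)
        if feed_forward:
            cooling = influx                       # cancel the disturbance directly
        else:
            cooling = 0.5 * (t_inside - setpoint)  # proportional feedback
        t_inside += influx - cooling
        max_error = max(max_error, abs(t_inside - setpoint))
    return max_error
```

With feed-forward compensation the inside temperature never deviates from the setpoint; pure feedback settles with a residual steady-state error.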
21. Example of ISO instability as compared to ICO stability!
Remember the auto-correlation instability of ISO?
ICO: dw1/dt = μ u1 du0/dt
ISO: dw1/dt = μ u1 dv/dt
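The instability contrast can be demonstrated numerically. A hedged sketch (discrete-time versions of the two rules; the square-wave input, learning rate, and step count are invented for illustration): with the reflex input u0 silent, the ICO weight must stay fixed, while the ISO weight drifts, because ISO correlates u1 with the derivative of the output v = w0·u0 + w1·u1, which contains w1·u1 itself (the auto-correlation term).

```python
# ICO: dw1 = mu * u1 * d(u0)                      -- correlates with the reflex INPUT
# ISO: dw1 = mu * u1 * d(v), v = w0*u0 + w1*u1    -- correlates with the OUTPUT

def run(rule, steps=200, mu=0.01, w0=1.0):
    w1 = 1.0
    u0_prev = v_prev = 0.0
    for t in range(steps):
        u1 = 1.0 if (t // 10) % 2 == 0 else 0.0   # square-wave "vision" input
        u0 = 0.0                                   # reflex input stays silent
        v = w0 * u0 + w1 * u1                      # neuron output
        if rule == "ICO":
            w1 += mu * u1 * (u0 - u0_prev)         # derivative of u0: always 0 here
        else:
            w1 += mu * u1 * (v - v_prev)           # derivative of v: self-correlation
        u0_prev, v_prev = u0, v
    return w1
```

Even though nothing reflex-related ever happens, the ISO weight keeps growing at every onset of the u1 pulse; the ICO weight stays exactly where it started.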
23. Old ISO-Learning
24. New ICO-Learning
25. Full compensation
Overcompensation
26. Statistical Analysis: Conventional differential Hebbian learning
One-shot learning (some instances)
Highly unstable
27. Statistical Analysis: Input Correlation Learning
One-shot learning (common)
Stable in a wide domain
29. Two more applications: RunBot, with different convergence conditions!
Walking as such is a very difficult problem and
requires a good, multi-layered control structure,
on top of which learning can then take place.
30. An excursion into Multi-Layered Neural Network Control
- Some statements
- Animals are behaving agents. They receive inputs from MANY sensors and have to control the reaction of MANY muscles.
- This is a terrific problem, and it is solved by our brain through many control layers.
- Attempts to achieve this with robots have so far not been very successful, as network control is somewhat fuzzy and unpredictable.
31. Adaptive Control during Walking: Three Loops
Step control, Terrain control
[Diagram: Central Control (Brain) → Spinal Cord (Oscillations) → Motorneurons/Sensors (Reflex generation) → Muscles/Skeleton (Biomechanics), with feedback from ground contact and muscle length.]
32. Self-Stabilization by passive Biomechanical Properties (Pendulum)
[Diagram: Muscles/Skeleton (Biomechanics)]
33. Passive Walking Properties
34. RunBot's Network of the lowest loop
[Diagram: Motor neurons/Sensors (Reflex generation) driving Muscles/Skeleton (Biomechanics), with muscle-length feedback.]
35. Leg Control of RunBot: Reflexive Control (cf. Cruse)
36. Instantaneous Parameter Switching
37. Body (UBC) Control of RunBot
38. Long-Loop Reflex of one of the upper loops
Terrain control, Central Control (Brain)
Before or during a fall: leaning forward of rump and arms
39. Long-Loop Reflex of the topmost loop
Terrain control, Central Control (Brain)
Forward-leaning UBC
Backward-leaning UBC
40. The Leg Control as Target of the Learning
41. Learning in RunBot
42. RunBot: Learning to climb a slope
(Spektrum der Wissenschaft, Sept. 2006)
43. Human walking versus machine walking
(ASIMO by Honda)
44. RunBot's Joints
[Diagram labels: ramp, lower floor, upper floor]
45. Change of Gait when walking upwards
46. Learning-relevant parameters
[Plot annotations: "too late" vs. "early enough"]
47. A different Learning Control Circuitry
- AL (AR): stretch receptor for the anterior angle of the left (right) hip
- GL (GR): sensor neuron for ground contact of the left (right) foot
- EI (FI): extensor (flexor) reflex inter-neuron
- EM (FM): extensor (flexor) reflex motor-neuron
- ES (FS): extensor (flexor) reflex sensor neuron
48. Motor Neuron Signal Structure (spike rate)
[Plot: spike rates x0, x1 for the right and left leg]
Learning reverses the pulse sequence → stability of STDP
50. The actual Experiment
Note: Here our convergence condition has been
T → 0 (oscillatory!) and NOT, as usual, x0 → 0.
51. Speed Change by Learning
Note the oscillations!
T → 0!
54.
ICO: dw1/dt = μ u1 du0/dt
ISO: dw1/dt = μ u1 dv/dt
55. Driving School: Learn to follow a track and
develop RFs (receptive fields) in a closed-loop behavioral context
56. Differential Hebbian learning used for STDP
ISO learning for temporal sequence
learning (Porr and Wörgötter, 2003)
Central Features: Filtering of all inputs with
low-pass filters (membrane property);
reproduces the asymmetric weight-change curve
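How low-pass filtering plus a differential Hebbian rule yields the asymmetric weight-change curve can be sketched numerically (a hedged illustration, not the paper's exact model: the filter time constant, pulse timings, and integration step are invented): two unit impulses, "pre" at t = 0 and a second pulse at t = dt, are each low-pass filtered, and the update dw ∝ u1·du0/dt is integrated over time.

```python
import math

def lowpass(spike_t, t, tau=10.0):
    """Impulse response of a first-order low-pass filter (membrane property)."""
    return math.exp(-(t - spike_t) / tau) if t >= spike_t else 0.0

def weight_change(dt_pre_post, mu=1.0, t_end=200.0, h=0.1):
    # pre-synaptic pulse at t = 0, second ("late"/post) pulse at t = dt_pre_post
    dw, u0_prev = 0.0, 0.0
    t = -50.0
    while t < t_end:
        u1 = lowpass(0.0, t)              # filtered pre-synaptic input
        u0 = lowpass(dt_pre_post, t)      # filtered second input
        dw += mu * u1 * (u0 - u0_prev)    # differential Hebbian increment
        u0_prev = u0
        t += h
    return dw
```

Sweeping dt_pre_post traces out an asymmetric curve, as in STDP: pre-before-post (dt > 0) gives potentiation, post-before-pre (dt < 0) gives depression.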
57. Modelling STDP (plasticity) and using this
approach (in a simplified way) to control
behavioral learning.
An ecological approach to some aspects
of theoretical neuroscience.
Differential Hebbian Learning rule:
transfer of a computational-neuroscience approach
to a technical domain.
58. Pavlov in real life!
Adding a secondary loop
59. What has happened to the system during learning?
The primary reflex (re-)action has effectively been
eliminated and replaced by an anticipatory action.