Title: Representation Learning and Modular Self-Organization for an Autonomous Agent
1. Representation Learning and Modular Self-Organization for an Autonomous Agent
- Bruno Scherrer
- Supervisors: F. Alexandre, F. Charpillet
2. Build an autonomous agent
- Compute a strategy/policy
- Examples
- walk
- drive a car
- play backgammon
3. Representation and Modular Organization
Perception
Representation
Modular Organization
Centralized Organization
4. Copy an efficient system
- autonomous
- robust
- anytime
- dynamical
- distributed, parallel
- graceful degradation
Connectionist algorithms: massively interconnected networks of elementary parallel processors
5. Aims of the thesis
- Show that the following problems
- compute a strategy/policy
- learn a representation
- organize a system into modules
- have connectionist solutions
- Understand the computational stakes of such an approach
6. This talk
- Introduction
- A connectionist computation
- Optimal control and reinforcement learning
- Representation learning
- Modular Self-Organization
- Conclusions
7. Connectionist algorithms
- Connectivity
- Activation functions
- Learning law(s)
- (A)synchrony?
A dynamical system that is hard to analyze and design!
8. A connectionist computation
(figure: activation of the network units over time, from t0)
9. A connectionist computation
- Computation of contraction fixed points
- Traditional solution
- Connectionist solution: distributed, parallel, asynchronous
(Bertsekas & Tsitsiklis, 1989)
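The distributed asynchronous fixed-point scheme in the spirit of Bertsekas & Tsitsiklis can be sketched as follows. The affine map F below is an illustrative contraction (not from the talk), and the random unit-by-unit update stands in for the network's asynchronous elementary processors.

```python
import random

def async_fixed_point(F, x, n_updates=200):
    """Asynchronous fixed-point iteration: at each step a single 'unit' i
    updates its own coordinate, x[i] <- F(x)[i], using the current values
    of the other units, mimicking a network of elementary parallel
    processors."""
    random.seed(0)  # deterministic run for the example
    n = len(x)
    for _ in range(n_updates):
        i = random.randrange(n)  # one unit fires asynchronously
        x[i] = F(x)[i]           # local update with current neighbor values
    return x

# Illustrative contraction: F(x) = 0.5 * x + c, whose fixed point is 2c.
c = [1.0, 2.0, 3.0]
F = lambda x: [0.5 * xi + ci for xi, ci in zip(x, c)]
x = async_fixed_point(F, [0.0, 0.0, 0.0])
# x approaches the fixed point [2.0, 4.0, 6.0]
```

Because F is a contraction, the iteration converges regardless of the order in which units update, which is what makes a fully asynchronous implementation safe.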
10. Summary
- Properties of a fixed point computation
- anytime
- dynamical
- with a connectionist approach
- massively parallel
- Tractability: the network size
- The number of iterations to reach the fixed point is the same
11. This talk
- Introduction
- A connectionist computation
- Optimal control and reinforcement learning
- Representation learning
- Modular Self-Organization
- Conclusions
12. Optimal control
One looks for a policy π : S → A that maximizes the long-term expected amount of rewards. One computes the value function V.
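The value function is the fixed point of the Bellman operator, V(s) = max_a [R(s,a) + γ Σ_s' T(s,a,s') V(s')]. A minimal value-iteration sketch on a hypothetical two-state, two-action problem (the states, rewards, and transitions below are illustrative, not taken from the talk):

```python
# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V(s') ]
# Hypothetical problem: action 1 moves to (or stays in) state 1,
# which is the only rewarding state.
T = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},   # T[s][a] = [(s', prob), ...]
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 0.0},
     1: {0: 0.0, 1: 1.0}}                  # reward for staying in state 1
gamma = 0.9

def backup(V, s, a):
    """One-step Bellman backup for state s and action a."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])

V = {s: 0.0 for s in T}
for _ in range(500):                       # iterate the contraction
    V = {s: max(backup(V, s, a) for a in T[s]) for s in T}

policy = {s: max(T[s], key=lambda a: backup(V, s, a)) for s in T}
# V is approximately {0: 9.0, 1: 10.0}; the policy picks action 1 everywhere
```

Since the Bellman operator is a γ-contraction, this is exactly the kind of fixed-point computation the connectionist scheme of the previous slides can carry out.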
13. Example
(figure: actions)
14. Example
(figure)
15. Example
(figure: reward)
16. Example
(figure: value function)
17. Relation with connectionism
18. A dynamical computation
19. Reinforcement learning
- An optimal control problem for which some parameters are incompletely known
- Parameter estimation (learning)
- Exploration/exploitation dilemma
20. Relation with connectionism
- In the network:
- Estimation of R: learning law 1
- Estimation of T: learning law 2, similar to Hebb's law
(figure: network linking state s, value V, and reward R through transition weights T(s,a,s') and T(s,a,s''))
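A minimal sketch of the two learning laws, assuming R and T are estimated by simple counts and running averages over observed transitions (the exact update rules used in the thesis are not reproduced here):

```python
from collections import defaultdict

class ModelEstimator:
    """Estimate transition probabilities T(s,a,s') and rewards R(s,a)
    from experience. Law 2 counts co-occurrences of (s,a) and s',
    a Hebb-like rule; law 1 averages observed rewards."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> s' -> count
        self.r_sum = defaultdict(float)                      # (s,a) -> summed reward
        self.n = defaultdict(int)                            # (s,a) -> visit count

    def observe(self, s, a, r, s2):
        self.counts[(s, a)][s2] += 1   # law 2: co-activation count
        self.r_sum[(s, a)] += r        # law 1: accumulate reward
        self.n[(s, a)] += 1

    def T(self, s, a, s2):
        return self.counts[(s, a)][s2] / self.n[(s, a)]

    def R(self, s, a):
        return self.r_sum[(s, a)] / self.n[(s, a)]

m = ModelEstimator()
for _ in range(3):
    m.observe(0, 1, 1.0, 1)   # (s=0, a=1) led to s'=1 with reward 1, three times
m.observe(0, 1, 0.0, 0)       # and once to s'=0 with reward 0
# m.T(0, 1, 1) == 0.75 and m.R(0, 1) == 0.75
```

The estimated T and R then feed the fixed-point computation of the value function, closing the model-based reinforcement-learning loop.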
21. Summary
- A connectionist architecture for reinforcement learning
- Tractability: size of the state space
- number of iterations for the fixed point
- estimation of R and T
(figure: architecture with environment, parameter estimation of T and R, and control computing the policy π over S × A)
22. This talk
- Introduction
- A connectionist computation
- Optimal control and reinforcement learning
- Representation learning
- Modular Self-Organization
- Conclusions
23. Representation
→ Tractability
24. Representation
→ Sub-optimal
25. Representation
→ Optimal
26. What's a good representation?
27. Measuring the approx. error
- A bound on the approximation error
- depends on the interpolation error
- and is itself the fixed point of an operator
- Most uncertain policy
(Munos & Moore, 1999)
28. Measuring the approx. error
29. Measuring the approx. error
30. Measuring the approx. error
31. Reducing the approx. error
32. Reducing the approx. error
- One can improve an approximation...
- by using gradient descent
(figure: long-term vs. instantaneous error)
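A minimal sketch of the gradient-descent improvement, assuming a linear approximation of V over Gaussian features and stochastic gradient steps on the instantaneous squared error; the target values and feature centers below are illustrative, not from the experiments:

```python
import math
import random

def phi(s, centers, width=0.5):
    """Gaussian features over hypothetical centers (illustrative)."""
    return [math.exp(-((s - c) / width) ** 2) for c in centers]

def sgd_fit(targets, centers, lr=0.1, epochs=2000):
    """Stochastic gradient descent on the instantaneous squared error
    (V_hat(s) - V(s))^2; summed over samples this reduces the long-term
    approximation error."""
    w = [0.0] * len(centers)
    pts = list(targets.items())
    random.seed(0)  # deterministic run for the example
    for _ in range(epochs):
        s, v = random.choice(pts)                      # pick one sample
        f = phi(s, centers)
        err = sum(wi * fi for wi, fi in zip(w, f)) - v  # instantaneous error
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# Illustrative targets sampled from V(s) = s^2.
targets = {0.0: 0.0, 0.5: 0.25, 1.0: 1.0}
w = sgd_fit(targets, centers=[0.0, 0.5, 1.0])
```

After training, the weighted feature sum reproduces the sampled values closely; in the thesis the same gradient idea is applied to adapt the representation itself.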
33. Reducing the approx. error
(figure: zone of interest)
34. Reducing the approx. error
- New representation, new errors
35. Reducing the approx. error
- New representation, new errors
36. Reducing the approx. error
- New representation, new errors
37. Reducing the approx. error
- New representation, new errors
38. Experiments (1/2)
39. Experiments (1/2)
40. Experiments (2/2)
41. Experiments (2/2)
42. Summary
- A new connectionist functional layer
(figure: architecture with environment, parameter estimation of T and R, and control computing the policy π over S × A)
Optimization of the quality / complexity ratio
43. This talk
- Introduction
- A connectionist computation
- Optimal control and reinforcement learning
- Representation learning
- Modular Self-Organization
- Conclusions
44. Learning a representation
(figure: a single representation module M)
45. Learning a representation
(figure: modules M1, M2, M3, M4)
One representation may not be enough when there are several tasks!
46. Learning representations
(figure: modules M1, M2, M3, M4)
47. A modular approach
(figure: modules M1, M2, M3, M4)
48. Learning a modular architecture
- Modular self-organization is a straightforward generalization of representation learning
- It can be cast as a clustering problem (assigning tasks to modules)
49. Experiment
- 6 tasks to perform
- 3 modules
50. Experiment
(figure: the learned assignment of tasks 1, 2, and 3 to Modules 1, 2, and 3)
51. Summary
(figure: full architecture with environment, representation learning of S from T and R, parameter estimation, and control computing the policy π over S × A)
Improvement of the quality / complexity ratio
52. This talk
- Introduction
- A connectionist computation
- Optimal control and reinforcement learning
- Representation learning
- Modular Self-Organization
- Conclusions
53. Conclusions
- Designing connectionist algorithms → fixed-point computation → application to optimal control and reinforcement learning (massive parallelism)
- Large state space → Representation Learning (optimization of the quality / complexity ratio)
- Several tasks → Modular Self-Organization (improvement of the quality / complexity ratio)
54. Conclusions
- Theoretically sound approximation techniques → generic results
- Experimental validation on continuous problems
- driving a car
- multi-goal navigation
55. Possible future of this work
- Extensions/improvements
- Modular cooperation
- Parallel implementation
- Powerful approximation frameworks
- The exploration/exploitation dilemma
- Relations with cognitive science