Representation Learning and Modular Self-Organization for an Autonomous Agent


Title: Representation Learning and Modular Self-Organization for an Autonomous Agent


1
Representation Learning and Modular
Self-Organization for an Autonomous Agent
  • Bruno Scherrer
  • Supervisors: F. Alexandre, F. Charpillet

2
Build an autonomous agent
  • Compute a strategy/policy
  • Examples
  • walk
  • drive a car
  • play backgammon

3
Representation and Modular Organization
Perception
Representation
Modular Organization
Centralized Organization
4
Copy an efficient system
  • autonomous
  • robust
  • anytime
  • dynamical
  • distributed and parallel
  • graceful degradation

Connectionist algorithms
Massively interconnected networks of elementary
parallel processors
5
Aims of the thesis
  • Show that the following problems
  • compute a strategy/policy
  • learn a representation
  • organize a system into modules
  • have connectionist solutions
  • Understand the computational stakes of such an
    approach

6
This talk
  • Introduction
  • A connectionist computation
  • Optimal control and reinforcement learning
  • Representation learning
  • Modular Self-Organization
  • Conclusions

7
Connectionist algorithms
  • Connectivity
  • Activation functions
  • Learning law(s)
  • (A)synchronism?

A dynamical system that is hard to analyze and
design!
8
A connectionist computation
[Figure: activation of the units at t0]
9
A connectionist computation
  • Computation of contraction fixed points
  • Traditional solution
  • Connectionist solution

Distributed, parallel, asynchronous
M
Bertsekas & Tsitsiklis, 89
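
The contrast between the traditional and the connectionist solution can be sketched as follows. This is only an illustration, assuming a hypothetical affine contraction M(x) = A x + b on three units; the matrix, the vector and the function names are not from the talk. The first routine recomputes every component from the previous iterate; the second lets one randomly chosen unit update at a time, in place, which is the asynchronous setting.

```python
import random

def apply_component(A, b, x, i):
    """Component i of the affine map M(x) = A x + b."""
    return sum(A[i][j] * x[j] for j in range(len(x))) + b[i]

def synchronous_fixed_point(A, b, sweeps=200):
    """Traditional scheme: all components are recomputed from the previous iterate."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = [apply_component(A, b, x, i) for i in range(len(b))]
    return x

def asynchronous_fixed_point(A, b, updates=5000, seed=0):
    """Connectionist-style scheme: one randomly chosen unit updates at a time,
    in place, using whatever values the other units currently hold."""
    rng = random.Random(seed)
    x = [0.0] * len(b)
    for _ in range(updates):
        i = rng.randrange(len(b))
        x[i] = apply_component(A, b, x, i)
    return x

# Hypothetical example (assumption): each row of |A| sums to less than 1,
# so M is a contraction in the max norm and has a unique fixed point.
A = [[0.2, 0.3, 0.1],
     [0.1, 0.4, 0.2],
     [0.3, 0.1, 0.3]]
b = [1.0, 2.0, 3.0]

x_sync = synchronous_fixed_point(A, b)
x_async = asynchronous_fixed_point(A, b)
print(x_sync)
print(x_async)
```

Both schemes reach the same fixed point here. Stopping either loop early still yields a usable estimate, which is the anytime property.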
10
Summary
  • Properties of a fixed point computation
  • anytime
  • dynamical
  • with a connectionist approach
  • massively parallel
  • Tractability depends on the network size
  • The number of iterations to reach the fixed point
    is the same

11
This talk
  • Introduction
  • A connectionist computation
  • Optimal control and reinforcement learning
  • Representation learning
  • Modular Self-Organization
  • Conclusions

12
Optimal control
One looks for a policy that maximizes the
long-term expected amount of rewards.
One computes the value function.
π: S → A
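
A minimal sketch of computing a value function and reading a policy off it, using standard value iteration on a hypothetical four-state corridor. The states, actions, rewards and the discount factor `gamma` are assumptions chosen for illustration, not the example of the talk.

```python
def backup(s, a, T, R, V, gamma):
    """One-step lookahead: expected reward plus discounted value of the successors."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())

def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-10):
    """T[s][a] maps next states to probabilities; R[s][a] is the expected reward."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(backup(s, a, T, R, V, gamma) for a in actions) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

def greedy_policy(states, actions, T, R, V, gamma=0.9):
    """Policy that picks, in each state, the action with the best backup."""
    return {s: max(actions, key=lambda a: backup(s, a, T, R, V, gamma)) for s in states}

# Hypothetical corridor (assumption): states 0..3, state 3 is absorbing,
# and stepping right from state 2 yields a reward of 1.
states = [0, 1, 2, 3]
actions = ["left", "right"]

def step(s, a):
    if s == 3:
        return 3
    return max(s - 1, 0) if a == "left" else s + 1

T = {s: {a: {step(s, a): 1.0} for a in actions} for s in states}
R = {s: {a: 1.0 if (s == 2 and a == "right") else 0.0 for a in actions} for s in states}

V = value_iteration(states, actions, T, R)
policy = greedy_policy(states, actions, T, R, V)
print(V)
print(policy)
```

The update inside `value_iteration` is itself a contraction for `gamma` below 1, which is what connects this computation to the fixed point view.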
13
Example
Actions
14
Example
  • Reward

15
Example
  • Value function

Reward
16
Example
  • Optimal policy

Value function
17
Relation with connectionism
18
A dynamical computation
19
Reinforcement learning
  • An optimal control problem for which some
    parameters are incompletely known
  • Parameter estimation (learning)
  • Exploration/exploitation dilemma
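
One common heuristic for the exploration/exploitation dilemma is ε-greedy action selection, sketched below. It is a stand-in chosen for illustration, not necessarily the strategy used in the thesis; the action names and value estimates are hypothetical.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon explore (uniform random action),
    otherwise exploit the action with the highest current estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))
    return max(estimates, key=estimates.get)

# Hypothetical current estimates of the long-term reward of each action (assumption).
estimates = {"left": 0.1, "right": 0.7, "stay": 0.3}
rng = random.Random(0)

print(epsilon_greedy(estimates, 0.0, rng))                    # pure exploitation
print([epsilon_greedy(estimates, 0.2, rng) for _ in range(10)])  # mostly greedy
```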

20
Relation with connectionism
  • In the network
  • Estimation of R: learning law 1
  • Estimation of T: learning law 2

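[Figure: network fragment in which state s is linked to successor states s' and s'' by weights T(s,?,s') and T(s,?,s''), with units for V and R]
Law 2 is similar to the Hebb law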
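
A minimal sketch of the two estimations, assuming plain empirical averages: a running mean for R and normalized co-occurrence counts for T. The class and method names are hypothetical, and the talk's actual learning laws may differ. The count update for T strengthens the link between (s, a) and s' each time they occur together, which is the sense in which it resembles a Hebb-like rule.

```python
from collections import defaultdict

class ModelEstimator:
    """Empirical estimates of R(s, a) and T(s, a, s') from observed transitions."""

    def __init__(self):
        self.visits = defaultdict(int)         # N(s, a)
        self.transitions = defaultdict(int)    # N(s, a, s')
        self.reward_mean = defaultdict(float)  # running mean of rewards for (s, a)

    def observe(self, s, a, r, s_next):
        self.visits[(s, a)] += 1
        # Co-occurrence count: the link (s, a) -> s' is reinforced when both occur.
        self.transitions[(s, a, s_next)] += 1
        # Incremental mean of the observed rewards.
        n = self.visits[(s, a)]
        self.reward_mean[(s, a)] += (r - self.reward_mean[(s, a)]) / n

    def T(self, s, a, s_next):
        n = self.visits.get((s, a), 0)
        return self.transitions.get((s, a, s_next), 0) / n if n else 0.0

    def R(self, s, a):
        return self.reward_mean.get((s, a), 0.0)

# Hypothetical experience (assumption): four transitions from the same state-action pair.
model = ModelEstimator()
for r, s_next in [(1.0, "s1"), (1.0, "s1"), (1.0, "s1"), (0.0, "s2")]:
    model.observe("s", "go", r, s_next)

print(model.T("s", "go", "s1"), model.T("s", "go", "s2"), model.R("s", "go"))
```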
21
Summary
  • A connectionist architecture for reinforcement
    learning
  • Tractability depends on the size of the state space
  • number of iterations for the fixed point
  • estimation of R and T

[Diagram: environment, Parameter estimation and Control blocks; arrow labels: π, T R, S A T R, S A]
22
This talk
  • Introduction
  • A connectionist computation
  • Optimal control and reinforcement learning
  • Representation learning
  • Modular Self-Organization
  • Conclusions

23
Representation
→ Tractability
24
Representation
→ Sub-optimal
25
Representation
→ Optimal
26
What's a good representation?
27
Measuring the approx. error
  • A bound on the approximation error
  • depends on the interpolation error
  • and is the fixed point of
  • Most uncertain policy

Munos & Moore, 99
28
Measuring the approx. error
  • Interpolation error

29
Measuring the approx. error
  • Approximation error

30
Measuring the approx. error
  • Most uncertain policy

31
Reducing the approx. error
32
Reducing the approx. error
  • One can improve an approximation...
  • by using gradient descent

long-term
instantaneous
33
Reducing the approx. error
Zone of interest
34
Reducing the approx. error
  • New representation, new errors

35
Reducing the approx. error
  • New representation, new errors

36
Reducing the approx. error
  • New representation, new errors

37
Reducing the approx. error
  • New representation, new errors

38
Experiments (1/2)
39
Experiments (1/2)
40
Experiments (2/2)
41
Experiments (2/2)
42
Summary
  • A new connectionist functional layer

[Diagram: environment, Parameter estimation and Control blocks; arrow labels: π, T R, S A T R, S A]
Optimization of the quality / complexity ratio
43
This talk
  • Introduction
  • A connectionist computation
  • Optimal control and reinforcement learning
  • Representation learning
  • Modular Self-Organization
  • Conclusions

44
Learning a representation
M
45
Learning a representation
M4
M2
M3
M1
One representation may not be enough when there
are several tasks!
46
Learning representations
M4
M2
M3
M1
47
A modular approach
M4
M2
M3
M1
48
Learning a modular architecture
  • Representation learning is
  • Modular self-organization is

A straight generalization / A clustering problem
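
The clustering view can be illustrated by grouping tasks into modules. The sketch below assumes each task is summarized by a hypothetical 2-D feature vector (for instance a goal position) and uses plain k-means as a stand-in; it is not the algorithm of the thesis, and the data are invented for illustration.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start: the first k points are the initial centers."""
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each task goes to the module with the closest center.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of the tasks assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centers

# Hypothetical task descriptors (assumption): 6 tasks to be shared among 3 modules.
tasks = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (0.5, 0.2), (5.2, 0.4), (0.3, 5.1)]
assignment, centers = kmeans(tasks, k=3)
print(assignment)
```

Each module then only has to represent the tasks assigned to it, rather than one representation serving all tasks.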
49
Experiment
6 tasks to perform
3 modules
50
Experiment
3
2
1
Module 1
Module 3
Module 2
51
Summary
[Diagram: environment, Rep. Learning, Parameter estimation and Control blocks; arrow labels: π, T R, S, S A T R, S A]
Improvement of the quality / complexity ratio
52
This talk
  • Introduction
  • A connectionist computation
  • Optimal control and reinforcement learning
  • Representation learning
  • Modular Self-Organization
  • Conclusions

53
Conclusions
  • Designing connectionist algorithms → fixed point computation (massive parallelism)
  • Application to optimal control and reinforcement learning
  • Large state space → representation learning (optimization of the quality / complexity ratio)
  • Several tasks → modular self-organization (improvement of the quality / complexity ratio)
54
Conclusions
  • Theoretically sound approximation techniques → generic results
  • Experimental validation on continuous problems
  • driving a car
  • multi-goal navigation
55
Possible future of this work
  • Extensions/improvements
  • Modular cooperation
  • Parallel implementation
  • Powerful approximation frameworks
  • The exploration/exploitation dilemma
  • Relations with cognitive science