Title: Emotion-Based Decision and Learning
Worst-case agent scenario
- Complex world, with a large number of perceptions
- Minimal a priori knowledge
- Very limited computational power (both computation time and memory size)
- Possibly non-stationary world
- Discretization of the perception space leads to an exponential growth of the computational resources needed as the number of perceptions increases.
- Only the most important information must be preserved.
- Solution: apply the concept of somatic markers to build an associative memory capable of dealing with such problems.
Emotions in human decision-making
- Somatic markers store situation/connotation associations (feelings) in human memory.
- When a decision has to be made, several possible scenarios are built in the mind, associated with the different behaviors the subject may adopt.
- Somatic markers, according to their similarity to these hypothetical situations, induce a body response (the emotion) that corresponds to the situation's desirability.
[Diagram: from the Present Situation, the Decision selects among actions a1, a2, a3, each leading to a Future Situation 1, 2, or 3 whose connotations u1, u2, u3 are supplied by the Somatic Markers.]
Decision and learning process
- To implement such an emotion-based decision process in an artificial agent, at least three mechanisms are required:
  - An associative memory
  - A memory management system
  - A connotation estimation procedure
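A minimal sketch of how these three mechanisms could fit together. All names, and the FIFO placeholder discard policy, are illustrative assumptions, not the implementation described here:

```python
from dataclasses import dataclass, field

@dataclass
class EmotionBasedAgent:
    capacity: int                                # finite associative-memory size
    memory: list = field(default_factory=list)   # (situation, connotation) records

    def store(self, situation, connotation):
        """Associative memory: keep (situation, connotation) pairs."""
        if len(self.memory) >= self.capacity:
            self.discard_one()                   # memory management system
        self.memory.append((situation, connotation))

    def discard_one(self):
        """Memory management: drop the oldest record (FIFO placeholder)."""
        self.memory.pop(0)

    def estimate(self, situation, similarity):
        """Connotation estimation: similarity-weighted average of records."""
        if not self.memory:
            return 0.0
        weights = [similarity(situation, s) for s, _ in self.memory]
        total = sum(weights)
        if total == 0.0:
            return 0.0
        return sum(w * c for w, (_, c) in zip(weights, self.memory)) / total
```

The point of the skeleton is only the division of labor: storage, discarding, and estimation are independent mechanisms that can be swapped out separately.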
Associative memory

What should be stored in the associative memory?
- Situation: the (Perception, Action) pair
- Desirability: the connotation C, or its variation dC

One must know where to find invariances. Example: filling the tank vs. putting in only 5 l.
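A toy illustration of why storing the connotation change dC can expose such an invariance; the helper and its arguments are hypothetical, not from the source:

```python
def make_record(perception, action, c_before, c_after):
    """Associate a (perception, action) pair with the connotation change dC."""
    return ((perception, action), c_after - c_before)

# "Putting only 5 l" yields the same dC whether the tank started nearly
# empty or half full, so the association generalizes across start levels,
# whereas the absolute connotation C after the action does not.
r1 = make_record(perception=(0.1,), action="add_5l", c_before=0.1, c_after=0.2)
r2 = make_record(perception=(0.5,), action="add_5l", c_before=0.5, c_after=0.6)
```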
Estimation Procedure

Non-parametric regression problem with K samples (x_i, y_i): given a query point x, estimate the corresponding value y. There is no reference model!
Proposed Estimation Procedure

- Similarity-measure-based estimate, with x = (P, A), y = u(P, A), and samples y_i = u(P_i, A_i) = dC_i
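One standard way to realize such a similarity-based estimate is Nadaraya-Watson kernel regression; the Gaussian similarity measure and the bandwidth h below are assumptions for illustration, not prescribed by the text:

```python
import math

def kernel_estimate(x, samples, h=1.0):
    """Estimate y at query x from K samples (x_i, y_i), without a reference model."""
    def similarity(a, b):
        # Gaussian kernel on squared Euclidean distance (assumed form)
        d2 = sum((u - v) ** 2 for u, v in zip(a, b))
        return math.exp(-d2 / (2.0 * h * h))

    weights = [similarity(x, xi) for xi, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0                     # no informative neighbours
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / total
```

With a small bandwidth the estimate follows the nearest stored sample; with a large one it averages broadly, which is exactly the trade-off the similarity measure must encode.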
Relation to classical decision
Design issues
- Continuous-time signal sampling and reconstruction
  - Cut-off frequency of the low-pass filter
  - Sampling rate
- Associative memory
  - Distance measure (similarity)
  - Memory capacity
Finite-Resources Memory Management
- The agent must start picking and discarding memory records when the associative memory reaches its full capacity. The policy for choosing the record to be discarded is crucial:
- Agent performance should increase, i.e., the estimation should improve in the long run.
- Discarding mechanisms must be fast, and must have, in the worst case, the same computational complexity as the estimation mechanisms.
First Approach

Distribute the memory records as uniformly as possible in the perception space. Discarding records in crowded areas should do the trick.
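This first approach could be sketched as follows (an illustrative implementation: the victim is the record whose nearest neighbour is closest, i.e. the one sitting in the most crowded region):

```python
def discard_most_crowded(memory, distance):
    """Return a copy of memory with one record removed from the densest area."""
    if len(memory) < 2:
        return list(memory)

    def nn_dist(i):
        # Distance from record i to its nearest neighbour in perception space
        xi = memory[i][0]
        return min(distance(xi, memory[j][0])
                   for j in range(len(memory)) if j != i)

    victim = min(range(len(memory)), key=nn_dist)
    return [r for k, r in enumerate(memory) if k != victim]
```

Note the cost: a naive nearest-neighbour scan is O(K^2), matching the complexity bound the previous slide imposes on discarding relative to estimation.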
Second Approach

Eliminate memory points that hardly make a difference in the estimation / interpolation process. Local variance could be a possible heuristic, but care must be taken, since the order in which memory points are acquired does matter.
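A possible sketch of that heuristic (names are illustrative): score each record by how far its value deviates from its local neighbourhood, so a record whose neighbours already predict it is a removal candidate, subject to the acquisition-order caveat above:

```python
def least_informative(memory, distance, k=2):
    """Index of the record whose value is best predicted by its k nearest neighbours."""
    def deviation(i):
        xi, yi = memory[i]
        # k nearest neighbours of record i in perception space
        others = sorted(
            (j for j in range(len(memory)) if j != i),
            key=lambda j: distance(xi, memory[j][0]),
        )[:k]
        local_mean = sum(memory[j][1] for j in others) / len(others)
        return abs(yi - local_mean)    # low deviation = low local variance
    return min(range(len(memory)), key=deviation)
```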
Third Approach

Take into account non-stationary environments. This is the hardest case. Time must then be considered in the interpolation function, and the removal policy must be reformulated (in the limit, FIFO). Obtaining the environment change rate (is it slow-varying or fast-varying?) can become a major problem.
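One assumed way of bringing time into the interpolation is an exponential recency weight on each sample; as the decay rate lam grows, old records stop contributing and removal tends toward the pure-FIFO limit mentioned above:

```python
import math

def time_weighted_estimate(x, samples, now, similarity, lam=0.1):
    """samples are (x_i, y_i, t_i) triples; recent records dominate the estimate."""
    weights = [similarity(x, xi) * math.exp(-lam * (now - ti))
               for xi, _, ti in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * yi for w, (_, yi, _) in zip(weights, samples)) / total
```

Choosing lam is exactly the open problem stated above: it encodes the (unknown) environment change rate.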
Conclusions
- Major advantages:
  - No need to discretize a continuous perception space (unlike Reinforcement Learning)
  - Ability to deal with arbitrarily large environments under any computational / memory restrictions
  - No need for prior world examples (unlike Neural Networks): the agent learns from the very beginning.
Conclusions
- Major drawbacks:
  - A similarity measure is needed
  - It is difficult to choose an appropriate memory size
  - This is a greedy architecture.
Major Questions
- Self-adjustment of the similarity measure
  - (A particular case: identification of irrelevant perception-vector elements. There are statistical tools that do that, but ...)
- Choosing an adequate memory size, possibly based on:
  - Perception-vector dimension
  - Bounds on each perception-vector element
  - Variability of the true unknown function we are trying to estimate (bandwidth)
- Exploration vs. exploitation problem
Current Work
- Sequences of actions
- Application of this architecture to:
  - Hidden Markov Chains
  - Inverted Pendulum control
  - Dynamic obstacle avoidance