Title: Emotion-Based Decision and Learning
Worst-case agent scenario
- Complex world, with a large number of perceptions
- Minimal a priori knowledge
- Very limited computational power (both computation time and memory size)
- Possibly non-stationary world
- Discretization of the perception space leads to an exponential growth of the computational resources needed as the number of perceptions increases.
- Only the most important information must be preserved.
- Solution: apply the concept of somatic markers to build an associative memory capable of dealing with such problems.
Emotions in human decision-making
- Somatic markers store situation/connotation associations (feelings) in human memory.
- When a decision has to be made, several possible scenarios are built in the mind, associated with the different behaviors the subject may adopt.
- Somatic markers, according to their similarity to these hypothetical situations, induce a body response (the emotion) that corresponds to the situation's desirability.
[Diagram: from the Present Situation, the Decision selects among actions a1, a2, a3, each leading to a Future Situation 1, 2, or 3 whose connotations u1, u2, u3 are supplied by the Somatic Markers.]
Decision and learning process
- To implement such an emotion-based decision process in an artificial agent, at least three mechanisms are required:
  - An associative memory
  - A memory management system
  - A connotation estimation procedure
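A minimal sketch of how these three mechanisms could fit together. All names, and the FIFO placeholder discard policy, are illustrative assumptions, not the implementation described here:

```python
from dataclasses import dataclass, field

@dataclass
class EmotionBasedAgent:
    capacity: int                                # finite associative-memory size
    memory: list = field(default_factory=list)   # (situation, connotation) records

    def store(self, situation, connotation):
        """Associative memory: keep (situation, connotation) pairs."""
        if len(self.memory) >= self.capacity:
            self.discard_one()                   # memory management system
        self.memory.append((situation, connotation))

    def discard_one(self):
        """Memory management: drop the oldest record (FIFO placeholder)."""
        self.memory.pop(0)

    def estimate(self, situation, similarity):
        """Connotation estimation: similarity-weighted average of records."""
        if not self.memory:
            return 0.0
        weights = [similarity(situation, s) for s, _ in self.memory]
        total = sum(weights)
        if total == 0.0:
            return 0.0
        return sum(w * c for w, (_, c) in zip(weights, self.memory)) / total
```

The point of the skeleton is only the division of labor: storage, discarding, and estimation are independent mechanisms that can be swapped out separately.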
Associative memory

What should be stored in the associative memory?
- Situation: the (Perception, Action) pair
- Desirability: the connotation C, or its variation dC

One must know where to find invariances. Example: filling the tank vs. putting in only 5 l.
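A toy illustration of why storing the connotation change dC can expose such an invariance; the helper and its arguments are hypothetical, not from the source:

```python
def make_record(perception, action, c_before, c_after):
    """Associate a (perception, action) pair with the connotation change dC."""
    return ((perception, action), c_after - c_before)

# "Putting only 5 l" yields the same dC whether the tank started nearly
# empty or half full, so the association generalizes across start levels,
# whereas the absolute connotation C after the action does not.
r1 = make_record(perception=(0.1,), action="add_5l", c_before=0.1, c_after=0.2)
r2 = make_record(perception=(0.5,), action="add_5l", c_before=0.5, c_after=0.6)
```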
Estimation Procedure

Non-parametric regression problem with K samples (x_i, y_i): given a query point x, estimate the corresponding value y. There is no reference model!
Proposed Estimation Procedure

- Similarity-measure-based estimate, with x = (P, A), y = u(P, A), and samples y_i = u(P_i, A_i) = dC_i
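One standard way to realize such a similarity-based estimate is Nadaraya-Watson kernel regression; the Gaussian similarity measure and the bandwidth h below are assumptions for illustration, not prescribed by the text:

```python
import math

def kernel_estimate(x, samples, h=1.0):
    """Estimate y at query x from K samples (x_i, y_i), without a reference model."""
    def similarity(a, b):
        # Gaussian kernel on squared Euclidean distance (assumed form)
        d2 = sum((u - v) ** 2 for u, v in zip(a, b))
        return math.exp(-d2 / (2.0 * h * h))

    weights = [similarity(x, xi) for xi, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0                     # no informative neighbours
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / total
```

With a small bandwidth the estimate follows the nearest stored sample; with a large one it averages broadly, which is exactly the trade-off the similarity measure must encode.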
Relation to classical decision
Design issues
- Continuous-time signal sampling and reconstruction
  - Cut-off frequency of the low-pass filter
  - Sampling rate
- Associative memory
  - Distance measure (similarity)
  - Memory capacity
Finite-Resources Memory Management
- The agent must start picking and discarding memory records when the associative memory reaches its full capacity. The policy for choosing the record to be discarded is crucial:
- Agent performance should increase, i.e., the estimation should improve in the long run.
- Discarding mechanisms must be fast, and must have, in the worst case, the same computational complexity as the estimation mechanisms.
First Approach

Distribute the memory records as uniformly as possible in the perception space. Discarding records in crowded areas should do the trick.
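This first approach could be sketched as follows (an illustrative implementation: the victim is the record whose nearest neighbour is closest, i.e. the one sitting in the most crowded region):

```python
def discard_most_crowded(memory, distance):
    """Return a copy of memory with one record removed from the densest area."""
    if len(memory) < 2:
        return list(memory)

    def nn_dist(i):
        # Distance from record i to its nearest neighbour in perception space
        xi = memory[i][0]
        return min(distance(xi, memory[j][0])
                   for j in range(len(memory)) if j != i)

    victim = min(range(len(memory)), key=nn_dist)
    return [r for k, r in enumerate(memory) if k != victim]
```

Note the cost: a naive nearest-neighbour scan is O(K^2), matching the complexity bound the previous slide imposes on discarding relative to estimation.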
Second Approach

Eliminate memory points that hardly make a difference in the estimation / interpolation process. Local variance could be a possible heuristic, but care must be taken, since the order in which memory points are acquired does matter.
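A possible sketch of that heuristic (names are illustrative): score each record by how far its value deviates from its local neighbourhood, so a record whose neighbours already predict it is a removal candidate, subject to the acquisition-order caveat above:

```python
def least_informative(memory, distance, k=2):
    """Index of the record whose value is best predicted by its k nearest neighbours."""
    def deviation(i):
        xi, yi = memory[i]
        # k nearest neighbours of record i in perception space
        others = sorted(
            (j for j in range(len(memory)) if j != i),
            key=lambda j: distance(xi, memory[j][0]),
        )[:k]
        local_mean = sum(memory[j][1] for j in others) / len(others)
        return abs(yi - local_mean)    # low deviation = low local variance
    return min(range(len(memory)), key=deviation)
```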
Third Approach

Take into account non-stationary environments. This is the hardest case. Time must then be considered in the interpolation function, and the removal policy must be reformulated (in the limit, FIFO). Obtaining the environment change rate (is it slow-varying or fast-varying?) can become a major problem.
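One assumed way of bringing time into the interpolation is an exponential recency weight on each sample; as the decay rate lam grows, old records stop contributing and removal tends toward the pure-FIFO limit mentioned above:

```python
import math

def time_weighted_estimate(x, samples, now, similarity, lam=0.1):
    """samples are (x_i, y_i, t_i) triples; recent records dominate the estimate."""
    weights = [similarity(x, xi) * math.exp(-lam * (now - ti))
               for xi, _, ti in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * yi for w, (_, yi, _) in zip(weights, samples)) / total
```

Choosing lam is exactly the open problem stated above: it encodes the (unknown) environment change rate.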
Conclusions
- Major advantages:
  - No need to discretize a continuous perception space (unlike Reinforcement Learning)
  - Ability to deal with arbitrarily large environments under any computational / memory restrictions
  - No need for prior world examples (unlike Neural Networks): the agent learns from the very beginning.
Conclusions
- Major drawbacks:
  - A similarity measure is needed
  - It is difficult to choose an appropriate memory size
  - This is a greedy architecture.
Major Questions
- Self-adjustment of the similarity measure
  - (A particular case: identification of irrelevant perception-vector elements. There are statistical tools that do that, but ...)
- Choosing an adequate memory size, possibly based on:
  - Perception-vector dimension
  - Bounds on each perception-vector element
  - Variability of the true unknown function we are trying to estimate (bandwidth)
- Exploration vs. exploitation problem
Current Work
- Sequences of actions
- Application of this architecture to:
  - Hidden Markov Chains
  - Inverted Pendulum control
  - Dynamic obstacle avoidance