Title: Reinforcement learning
1Reinforcement learning
- Formal Modeling Approach and Neurophysiology
2Reinforcement learning
- An agent interacts with an environment and tries
to achieve a goal
- The agent receives feedback from the environment,
can sense its state, and is able to take actions
to change the state of the environment
- Learning how to map situations to actions, so as
to maximize reward
3Main elements of a reinforcement learning system
A policy: Mapping from perceived states of the
environment to actions to be taken when in those
states
A reward function: Mapping each state (or
state-action pair) of the environment to a
single number, a reward. It indicates the
intrinsic desirability of the state, unalterable
by the agent.
Defines what is desirable in the IMMEDIATE SENSE
A value function: Mapping a state to a single
number, a value, indicating the total amount of
reward that one can expect to accumulate over
the future, starting from that state.
Defines what is desirable in THE LONG RUN
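The three elements above can be sketched on a toy problem. This is a minimal illustration, not something taken from the slides: the four-state corridor, the names `REWARD`, `POLICY`, `step`, and `value`, and the discount factor `gamma` are all assumptions made for the example.

```python
# Hypothetical corridor with states 0..3; state 3 is the goal and is terminal.
GOAL = 3

# Reward function: maps each state to a single number (received on arrival).
REWARD = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}

# Policy: maps each non-terminal state to the action taken in that state.
POLICY = {0: "right", 1: "right", 2: "right"}


def step(state, action):
    """Toy deterministic dynamics: move one cell, staying inside the corridor."""
    if action == "right":
        return min(state + 1, GOAL)
    return max(state - 1, 0)


def value(state, gamma=0.9):
    """Discounted sum of rewards collected from `state` onward under POLICY.

    The discount factor gamma is an assumption of this sketch.
    """
    total, discount = 0.0, 1.0
    while state != GOAL:
        state = step(state, POLICY[state])
        total += discount * REWARD[state]
        discount *= gamma
    return total
```

In this sketch state 2 has zero reward of its own but the highest value among the non-terminal states, because the goal is one step away. That is the difference between what is desirable immediately and what is desirable in the long run.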
4Exploration and exploitation
- The agent has to exploit what it already knows
(in order to obtain reward using known methods)
- However, it has to explore, in order to make
possible better action selections in the future
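One simple way to balance the two, named here as an example rather than as the method the slides have in mind, is epsilon-greedy selection: with a small probability the agent picks any action at random, otherwise it picks the action with the best current estimate. The function name, the `estimates` list, and the `rng` parameter are assumptions of this sketch.

```python
import random


def epsilon_greedy(estimates, epsilon, rng=random):
    """Return an action index given the current reward estimates per action.

    With probability `epsilon` explore (uniformly random action);
    otherwise exploit (action with the highest current estimate).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])
```

With `epsilon = 0` the agent only exploits and can never discover that an untried action is better. With `epsilon = 1` it only explores and never uses what it has learned.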
6An example of successful machine learning with the
help of the reinforcement learning principle
7Rescorla-Wagner model
8Rescorla-Wagner model
$V_{n+1} = V_n + C\,(V_{\max} - V_n)$
$C$: Learning rate (usually salience of the stimulus x
attractiveness of the reinforcer)
$V_n$: The associative strength in the current trial
$V_{n+1}$: The associative strength for the next trial
(new, updated value)
$V_{\max}$: Maximal value of associative strength that
the unconditioned stimulus can support (the strength
of association with the reinforcer that is
required to fully predict the occurrence of the
reinforcer)
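The update rule can be sketched in a few lines of code. The function names and the starting strength `v0 = 0` are assumptions of this sketch. The values Vmax = 100 and C = 0.5 are the ones used in the illustration slides.

```python
def rescorla_wagner_update(v_n, c, v_max):
    """One trial: move the associative strength a fraction c toward v_max."""
    return v_n + c * (v_max - v_n)


def simulate(trials, c, v_max, v0=0.0):
    """Associative strength before the first trial and after each trial."""
    values = [v0]
    for _ in range(trials):
        values.append(rescorla_wagner_update(values[-1], c, v_max))
    return values


print(simulate(4, c=0.5, v_max=100.0))
# prints [0.0, 50.0, 75.0, 87.5, 93.75]
```

Each trial closes half of the remaining gap to Vmax, so learning is fast at first and then slows as the reinforcer becomes better predicted.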
9Rescorla-Wagner model
Illustration: $V_{\max} = 100$, $C = 0.5$
$V_{n+1} = V_n + C\,(V_{\max} - V_n)$
Thorndike's cat
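For the illustration values Vmax = 100 and C = 0.5, the update rule can be unrolled into a closed form. The derivation below assumes a starting strength of $V_0 = 0$, which the slides do not state.

```latex
\begin{align*}
V_{n+1} &= V_n + C\,(V_{\max} - V_n) \\
V_{\max} - V_{n+1} &= (1 - C)\,(V_{\max} - V_n) \\
V_{\max} - V_n &= (1 - C)^n\,(V_{\max} - V_0) \\
V_n &= V_{\max}\bigl(1 - (1 - C)^n\bigr) \quad \text{for } V_0 = 0 \\
V_n &= 100\,\bigl(1 - 0.5^{\,n}\bigr) \quad \text{for } V_{\max} = 100,\ C = 0.5
\end{align*}
```

The remaining gap $V_{\max} - V_n$ shrinks by the factor $(1 - C)$ on every trial, so the learning curve is negatively accelerated and approaches $V_{\max}$ without overshooting it.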
16Dopamine neuron activity
Unpredicted reward
Increasingly predicted reinforcer
Fully predicted reinforcer
Omission of reinforcer due to an erroneous response
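One common reading of these four cases, offered here as an assumption rather than something the slide spells out, is that dopamine activity tracks the difference between the reward received and the reward predicted. That difference plays the same role as the error term (Vmax - Vn) in the Rescorla-Wagner update. The numbers below are hypothetical and chosen only to show the sign of the error in each case.

```python
def prediction_error(received, predicted):
    """Signed difference between the reward obtained and the reward expected."""
    return received - predicted


# (received, predicted) pairs; hypothetical values on a 0..1 scale.
cases = {
    "unpredicted reward": (1.0, 0.0),
    "increasingly predicted reinforcer": (1.0, 0.5),
    "fully predicted reinforcer": (1.0, 1.0),
    "omission of reinforcer": (0.0, 1.0),
}

for label, (received, predicted) in cases.items():
    print(f"{label}: {prediction_error(received, predicted):+.1f}")
```

Under this reading the error is positive for an unpredicted reward, shrinks as the reward becomes predicted, reaches zero when it is fully predicted, and turns negative when an expected reinforcer is omitted.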
17Dopamine pathways
Each dopamine cell body in the substantia nigra or
ventral tegmental area sends an axon to several hundred
neurons in the striatum or frontal cortex, and
has about 500,000 dopamine-releasing varicosities
in the striatum