Reinforcement learning

Slides: 18

Provided by: queensuCa

Transcript and Presenter's Notes

1
Reinforcement learning

Formal Modeling Approach and Neurophysiology

2
Reinforcement learning

An agent interacts with an environment and tries to achieve a goal
The agent receives feedback from the environment, can sense its state, and is able to take actions that change the state of the environment
Learning how to map situations to actions, so as to maximize reward
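
The interaction loop described on this slide can be sketched in a few lines of Python. This is only an illustration: the two-state environment, the action names "flip" and "stay", and the reward of 1.0 in the goal state are all assumptions made for the example, not part of the presentation.

```python
class TwoStateEnvironment:
    """Hypothetical toy environment: the state is 0 or 1, and state 1 is the goal."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # The action changes the state of the environment:
        # "flip" toggles it, anything else leaves it unchanged.
        if action == "flip":
            self.state = 1 - self.state
        # Feedback from the environment: reward 1.0 while in the goal state.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def run_episode(policy, steps=5):
    """Let the agent sense the state, act, and collect reward for a few steps."""
    env = TwoStateEnvironment()
    state, total_reward = env.state, 0.0
    for _ in range(steps):
        action = policy(state)            # map the situation to an action
        state, reward = env.step(action)  # the environment responds
        total_reward += reward
    return total_reward


def goal_seeking_policy(state):
    """Move to the goal state, then stay there."""
    return "flip" if state == 0 else "stay"


def always_flip_policy(state):
    """A worse mapping from situations to actions, for comparison."""
    return "flip"
```

Under these assumptions, `run_episode(goal_seeking_policy)` collects more reward than `run_episode(always_flip_policy)`, which is the sense in which one mapping from situations to actions is better than another.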

3
Main elements of a reinforcement learning system
A policy: Mapping from perceived states of the environment to actions to be taken when in those states
A reward function: Mapping each state (or state-action pair) of the environment to a single number, a reward. It indicates the intrinsic desirability of the state and is unalterable by the agent
Defines what is desirable in the IMMEDIATE SENSE
A value function: Mapping a state to a single number, a value, indicating the total amount of reward that one can expect to accumulate over the future, starting from that state
Defines what is desirable in THE LONG RUN
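
To make the difference between reward and value concrete, here is a minimal sketch on a hypothetical four-state corridor. The state names, the transition table, and the reward numbers are invented for illustration. The value is computed as a plain undiscounted sum that includes the reward of the starting state, which is one simple reading of "total amount of reward" and an assumption of this sketch.

```python
# Hypothetical deterministic corridor: A -> B -> C -> goal.
NEXT_STATE = {
    ("A", "right"): "B",
    ("B", "right"): "C",
    ("C", "right"): "goal",
}

# Policy: maps each non-terminal state to the action taken there.
policy = {"A": "right", "B": "right", "C": "right"}

# Reward function: maps each state to a single number (immediate desirability).
reward = {"A": 0.0, "B": -1.0, "C": 0.0, "goal": 10.0}


def value(state):
    """Total reward accumulated from `state` onward when following `policy`."""
    total = reward[state]
    while state in policy:  # "goal" has no policy entry, so the walk stops there
        state = NEXT_STATE[(state, policy[state])]
        total += reward[state]
    return total
```

In this toy setting state B has a negative immediate reward but a high value, because it lies on the way to the goal: the reward says what is desirable right now, the value says what is desirable in the long run.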
4
Exploration and exploitation

The agent has to exploit what it already knows (in order to obtain reward using known methods)
However, it also has to explore, in order to make better action selections possible in the future
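
One common way to balance the two is the epsilon-greedy rule. The slides do not name a specific method, so this is a swapped-in example: the function names, the Gaussian reward noise, and the default parameters are all assumptions of this sketch.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: try any action, regardless of current estimates.
        return rng.randrange(len(estimates))
    # Exploit: take the action with the highest estimated reward so far.
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Hypothetical multi-armed bandit with noisy rewards around `true_means`."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts
```

With `epsilon=0.0` the agent only exploits and can get stuck on the first action that looks good; with `epsilon=1.0` it only explores and never uses what it has learned. Values in between trade some immediate reward for better estimates later.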

5
(No Transcript)
6
An example of successful machine learning with the help of the reinforcement learning principle
7
Rescorla-Wagner model

8
Rescorla-Wagner model
V_{n+1} = V_n + C (V_max - V_n)
C: Learning rate (usually salience of the stimulus × attractiveness of the reinforcer)
V_n: The associative strength in the current trial
V_{n+1}: The associative strength for the next trial (new, updated value)
V_max: Maximal value of associative strength that the unconditioned stimulus can support (the strength of association with the reinforcer that is required to fully predict the occurrence of the reinforcer)
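
The update rule translates directly into code. The sketch below is a minimal version under one added assumption: the associative strength starts at `v_start = 0.0` (no prior association), which the slide does not state. The function and parameter names are illustrative.

```python
def rescorla_wagner_update(v_n, c, v_max):
    """One trial of the rule: V_{n+1} = V_n + C * (V_max - V_n)."""
    return v_n + c * (v_max - v_n)


def acquisition_curve(trials, c, v_max, v_start=0.0):
    """Associative strength before the first trial and after each of `trials` trials."""
    values = [v_start]
    for _ in range(trials):
        values.append(rescorla_wagner_update(values[-1], c, v_max))
    return values
```

Calling `acquisition_curve(trials, c, v_max)` returns a list of `trials + 1` values that rises toward `v_max`, with each step closing a fraction `c` of the remaining gap.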
9
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
10
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
V_{n+1} = V_n + C (V_max - V_n)
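
The figures for this illustration did not survive in the transcript, but the numbers can be worked out from the rule. Assuming the associative strength starts at V_0 = 0 (an assumption, not stated on the slide), the first trials give:

```latex
\begin{align*}
V_1 &= 0 + 0.5\,(100 - 0) = 50 \\
V_2 &= 50 + 0.5\,(100 - 50) = 75 \\
V_3 &= 75 + 0.5\,(100 - 75) = 87.5 \\
V_4 &= 87.5 + 0.5\,(100 - 87.5) = 93.75
\end{align*}
% Each trial closes half of the remaining gap to V_max, so with V_0 = 0:
\[
V_n = V_{\max}\,\bigl(1 - (1 - C)^n\bigr)
\]
```

The increments shrink from trial to trial (50, 25, 12.5, 6.25), which gives the negatively accelerated learning curve typical of this model.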
11
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
12
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
13
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
14
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
15
Rescorla-Wagner model
Illustration: V_max = 100, C = 0.5
Thorndike's cat
16
Dopamine neuron activity
Unpredicted reward
Increasingly predicted reinforcer
Fully predicted reinforcer
Omission of reinforcer due to an erroneous response
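
The slide lists only the four panel titles. One common reading, which is an assumption here rather than something the transcript states, is that these four cases correspond to different signs and sizes of a prediction error, in the same spirit as the (V_max - V_n) term of the Rescorla-Wagner rule. The numbers below are arbitrary and serve only to illustrate that reading.

```python
def prediction_error(reward_received, reward_predicted):
    """Difference between what happened and what was expected."""
    return reward_received - reward_predicted


# Hypothetical values on a 0..1 scale, one per panel title of the slide.
cases = {
    "unpredicted reward": prediction_error(1.0, 0.0),
    "increasingly predicted reinforcer": prediction_error(1.0, 0.5),
    "fully predicted reinforcer": prediction_error(1.0, 1.0),
    "omission of reinforcer": prediction_error(0.0, 1.0),
}
```

Under this reading the error is large and positive for an unpredicted reward, shrinks as the reinforcer becomes predicted, is zero when it is fully predicted, and turns negative when an expected reinforcer is omitted.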
17
Dopamine pathways
Each dopamine cell body in the substantia nigra (SN) or ventral tegmental area sends an axon to several hundred neurons in the striatum or frontal cortex, and has about 500,000 dopamine-releasing varicosities in the striatum
