Reinforcement Learning
Rutgers CS440, Fall 2003

Transcript and Presenter's Notes

Title: Reinforcement Learning


1
Reinforcement Learning
  • Reading: Ch. 21, AIMA, 2nd Ed.

2
Outline
  • What is RL?
  • Methods for RL.
  • Note: only a brief overview, no in-depth coverage.

3
What is Reinforcement Learning (RL)?
  • Learning so far: learning probabilistic models
    (BNs) or functions (NNs).
  • Learning what/how to do from feedback
    (reward/reinforcement).
  • Chess playing: learn how to play from feedback
    (won/lost game).
  • Learning to speak, crawl, ...
  • Learning user preferences for web searching.
  • MDP: find the optimal policy using a known model.
  • The optimal policy maximizes the expected total reward.
  • RL: learn the optimal policy from rewards.
  • Do not know the environment model.
  • Do not know the reward function.
  • Know how well something is done (e.g., won / lost).

4
Types of RL
  • MDP: actions, states, rewards.
  • Passive learning: the policy is fixed; learn the utility
    of states (and the rest of the model).
  • Active learning: the policy is not fixed; learn the utility
    as well as the optimal policy.

5
Passive RL
  • The policy is known and fixed; need to learn how good
    it is and the environment model.
  • Learn U(st), but do not know P(st | st-1, at-1)
    or R(st).
  • Method: conduct trials, receive a sequence of
    actions, states and rewards (at, st, rt), and compute
    the model parameters and the utility. One example trial
    is shown below; a small code sketch of its
    representation follows it.

at:  NA    A    A    A
st:  NL    NL   L    L
rt:  -20   0    20   20
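
For the sketches on the following slides, a trial like the one above can be stored as a time-ordered list of (action, state, reward) triples. This is only an assumed representation for illustration; the state labels NL/L and action labels NA/A are taken from the trace above.

```python
# One observed trial as a list of (action, state, reward) triples,
# matching the trace above (assumed representation, not from the slides).
trial = [
    ("NA", "NL", -20),
    ("A",  "NL",   0),
    ("A",  "L",   20),
    ("A",  "L",   20),
]
```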
6
Direct utility estimation
  • Observe (at, st, rt); estimate U(st) from counts
    (inductive learning). A small sketch follows this slide.

Sample 1:
at:  NA    A    A    A
st:  NL    NL   L    L
rt:  -20   0    20   20

Sample 2:
at:  NA    A    A    NA   A
st:  NL    NL   L    L    L
rt:  -20   0    20   5    20
  • Example (γ = 1):
  • Sample 1: U(NL) = -20 + 0 + 20 + 20 = 20 (first visit of NL),
    U(NL) = 0 + 20 + 20 = 40 (second visit).
  • Sample 2: U(NL) = -20 + 0 + 20 + 5 + 20 = 25,
    U(NL) = 0 + 20 + 5 + 20 = 45.
  • On average, U(NL) = (20 + 40 + 25 + 45) / 4 = 32.5.
  • Drawback: does not use the fact that the utilities of
    states are dependent (Bellman equations)!
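
A minimal sketch of direct utility estimation under the stated assumption γ = 1, using the trial representation from the earlier sketch: the utility of a state is the observed reward-to-go averaged over every visit to that state.

```python
from collections import defaultdict

def direct_utility_estimate(trials, gamma=1.0):
    """Average the observed reward-to-go over every visit to each state."""
    totals = defaultdict(float)   # sum of observed reward-to-go per state
    visits = defaultdict(int)     # number of visits per state
    for trial in trials:
        rewards = [r for (_, _, r) in trial]
        for t, (_, state, _) in enumerate(trial):
            # Reward-to-go from time t to the end of the trial.
            reward_to_go = sum(gamma ** k * r for k, r in enumerate(rewards[t:]))
            totals[state] += reward_to_go
            visits[state] += 1
    return {s: totals[s] / visits[s] for s in totals}

sample1 = [("NA", "NL", -20), ("A", "NL", 0), ("A", "L", 20), ("A", "L", 20)]
sample2 = [("NA", "NL", -20), ("A", "NL", 0), ("A", "L", 20), ("NA", "L", 5), ("A", "L", 20)]
print(direct_utility_estimate([sample1, sample2])["NL"])   # (20 + 40 + 25 + 45) / 4 = 32.5
```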

7
Adaptive dynamic programming
  • Take into account the constraints described by the
    Bellman equations.
  • Algorithm: for each sample, at each time step:
  • Estimate P(st | st-1, at-1), e.g.,
    P(L | NL, A) = #(L, NL, A) / #(NL, A).
  • Compute U(st) from R(st) and P(st | st-1, at-1)
    using the Bellman equations, solved exactly or by
    iterative updates.
  • Drawback: usually (too) many states. A small sketch
    of the idea follows this slide.
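
A minimal sketch of the adaptive dynamic programming idea, assuming the same trial representation as above: count transitions to estimate P(st | st-1, at-1) and the empirical action frequencies of the fixed policy, then evaluate the policy by repeated Bellman-equation sweeps. The absorbing end-of-trial marker is an added assumption (not on the slide) so that the γ = 1 sums stay finite.

```python
from collections import defaultdict

END = "<end>"  # absorbing end-of-trial marker (an assumption, not from the slide)

def adp_policy_evaluation(trials, gamma=1.0, sweeps=100):
    """Estimate P(s_t | s_t-1, a_t-1) and R(s) from counts, then solve the
    fixed-policy Bellman equations by repeated sweeps."""
    trans = defaultdict(lambda: defaultdict(int))       # (s, a) -> {s': count}
    act_count = defaultdict(lambda: defaultdict(int))   # s -> {a: count}, empirical fixed policy
    r_sum, r_cnt = defaultdict(float), defaultdict(int)
    for trial in trials:
        for a, s, r in trial:
            r_sum[s] += r
            r_cnt[s] += 1
        steps = list(trial) + [(None, END, 0.0)]        # terminate each trial in END
        for (a_prev, s_prev, _), (_, s_next, _) in zip(steps, steps[1:]):
            trans[(s_prev, a_prev)][s_next] += 1        # counts for P(s_t | s_t-1, a_t-1)
            act_count[s_prev][a_prev] += 1
    reward = {s: r_sum[s] / r_cnt[s] for s in r_sum}    # average observed R(s)
    utility = {s: 0.0 for s in reward}
    utility[END] = 0.0
    for _ in range(sweeps):                             # iterative Bellman updates
        for s in reward:
            n_pi = sum(act_count[s].values())
            expected = 0.0
            for a, n_a in act_count[s].items():
                n_sa = sum(trans[(s, a)].values())
                for s_next, n in trans[(s, a)].items():
                    # P(a | s) * P(s' | s, a) * U(s'), both estimated from counts
                    expected += (n_a / n_pi) * (n / n_sa) * utility[s_next]
            utility[s] = reward[s] + gamma * expected
    del utility[END]
    return utility

# E.g. adp_policy_evaluation([sample1, sample2]) evaluates the fixed policy
# behind the two sample trials from the previous slide.
```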

8
TD-Learning
  • Only update U-values for observed transitions.
  • Algorithm:
  • Receive a new sample pair (st, st+1).
  • Assume that only the transition st → st+1 can occur.
  • Compute the update of U (shown below, followed by a
    small sketch).
  • Does not need to compute model parameters! (Yet it
    converges to the right solution.)

U(st) ← U(st) + α ( R(st) + γ U(st+1) − U(st) )
(new value = old value + α × (value computed from the Bellman equation − old value))
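
A minimal sketch of this TD update for passive policy evaluation, again assuming the trial representation from above; the learning rate alpha and the number of replay passes are illustrative choices, not values from the slide.

```python
from collections import defaultdict

def td_policy_evaluation(trials, alpha=0.05, gamma=1.0, passes=500):
    """TD learning: update U only along observed transitions; no model parameters."""
    utility = defaultdict(float)
    for _ in range(passes):                      # replay the recorded trials repeatedly
        for trial in trials:
            for (_, s, r), (_, s_next, _) in zip(trial, trial[1:]):
                # U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s))
                utility[s] += alpha * (r + gamma * utility[s_next] - utility[s])
            _, s_last, r_last = trial[-1]
            # Treat the last state of a trial as terminal: the target is just R(s).
            utility[s_last] += alpha * (r_last - utility[s_last])
    return dict(utility)

# E.g. td_policy_evaluation([sample1, sample2]); a decaying alpha would be
# needed for exact convergence, but a small fixed alpha gives similar values.
```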