Title: Neural Networks Chapter 7
1Neural Networks, Chapter 7
- Joost N. Kok
- Universiteit Leiden
2Recurrent Networks
- Learning Time Sequences
- Sequence Recognition
- Sequence Reproduction
- Temporal Association
3Recurrent Networks
- Tapped Delay Lines
- Keep several old values in a buffer
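A tapped delay line can be sketched as a fixed-length buffer whose contents are presented to the network as one input vector per time step. The snippet below is a minimal illustration, not the course's own code; the function name `tapped_delay_inputs`, the zero padding of the initial buffer, and the newest-first ordering are assumptions made for the example.

```python
from collections import deque

def tapped_delay_inputs(signal, n_taps):
    """Yield, for each time step, the current value plus the n_taps - 1 previous ones."""
    # Assumption: the history is zero-padded before the first sample arrives.
    buffer = deque([0.0] * n_taps, maxlen=n_taps)
    for x in signal:
        buffer.appendleft(x)  # newest value first; the oldest value drops off the end
        yield list(buffer)

# Each window would be fed to an ordinary feedforward network as its input vector.
windows = list(tapped_delay_inputs([1.0, 2.0, 3.0, 4.0], n_taps=3))
```

The buffer length `n_taps` has to be fixed before training, and the number of input units grows with it.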
4Recurrent Networks
- Drawbacks
- Length must be chosen in advance, leads to large number of input units, large number of training patterns, etc.
- Replace fixed time delays by filters
5Recurrent Networks
- Partially recurrent networks
6Recurrent Networks
7Recurrent Networks
8Recurrent Networks
- Expanded Hierarchical Elman Network
9Recurrent Networks
11Recurrent Networks
- Back-Propagation Through Time
12Reinforcement Learning
- Supervised learning with some feedback
- Reinforcement Learning Problems
- Class I: reinforcement signal is always the same for a given input-output pair
- Class II: stochastic environment, fixed probability for each input-output pair
- Class III: reinforcement and input patterns depend on the past history of the network output
13Associative Reward-Penalty
- Stochastic Output Units
- Reinforcement Signal
- Target
- Error
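The four ingredients listed above can be put together in a short sketch of one associative reward-penalty step for a single stochastic unit. The slide's equations did not survive in this transcript, so the formulas below are an assumption based on the usual textbook formulation: outputs S = ±1 with P(S = +1) = 1/(1 + exp(-2βh)), target ζ = S after a reward and ζ = -S after a penalty, and error ζ - ⟨S⟩ with ⟨S⟩ = tanh(βh). The names `arp_step`, `eta_plus`, `eta_minus` and the toy task are hypothetical.

```python
import math
import random

def arp_step(w, x, reward_fn, rng, beta=1.0, eta_plus=0.5, eta_minus=0.05):
    """One reward-penalty update for a single stochastic +/-1 unit (assumed formulation)."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * h))   # stochastic output unit
    s = 1 if rng.random() < p_plus else -1
    r = reward_fn(x, s)                                # reinforcement signal: +1 or -1
    target = s if r > 0 else -s                        # keep the output if rewarded, flip if penalised
    error = target - math.tanh(beta * h)               # target minus the average output
    eta = eta_plus if r > 0 else eta_minus             # penalties use a smaller learning rate
    return [wi + eta * error * xi for wi, xi in zip(w, x)], s, r

# Toy Class I task (assumption): reward when the output equals the second input component.
rng = random.Random(0)
weights = [0.0, 0.0]
patterns = [(1.0, 1.0), (1.0, -1.0)]   # the first component acts as a bias input
for _ in range(500):
    x = rng.choice(patterns)
    weights, s, r = arp_step(weights, x, lambda x, s: 1 if s == x[1] else -1, rng)
```

In this toy task every update pushes the second weight upward, so the unit ends up copying the second input component with high probability.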
14Associative Reward-Penalty
15Models and Critics
Environment
16Reinforcement Comparison
Environment
Critic
17Reinforcement Learning
- Reinforcement-Learning Model
- Agent receives input I, which is some indication of the current state s of the environment
- Then the agent chooses an action a
- The action changes the state of the environment, and the value is communicated through a scalar reinforcement signal r
18Reinforcement Learning
- Environment: You are in state 65. You have four possible actions.
- Agent: I'll take action 2.
- Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
- Agent: I'll take action 1.
- Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
- Agent: I'll take action 2.
19Reinforcement Learning
- The environment is non-deterministic
- The same action in the same state may result in different states and different reinforcements
- The environment is stationary
- Probabilities of making state transitions or receiving specific reinforcement signals do not change over time
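The interaction model and the two environment properties above can be illustrated with a small simulation. This is a sketch under stated assumptions: the two-state table `TRANSITIONS`, its probabilities and reinforcements, and the helper names `step`, `run_episode` and `random_policy` are all hypothetical.

```python
import random

# Hypothetical environment: TRANSITIONS[state][action] lists
# (probability, next_state, reinforcement) triples. Outcomes are random
# (non-deterministic), but the table itself never changes (stationary).
TRANSITIONS = {
    0: {0: [(0.8, 0, 0.0), (0.2, 1, 1.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(0.5, 0, 2.0), (0.5, 1, -1.0)], 1: [(1.0, 0, 0.0)]},
}

def step(state, action, rng):
    """Sample (next_state, reinforcement) for taking `action` in `state`."""
    u, cumulative = rng.random(), 0.0
    for prob, next_state, reward in TRANSITIONS[state][action]:
        cumulative += prob
        if u < cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point round-off

def run_episode(policy, n_steps, rng, start_state=0):
    """Agent observes the state, picks an action, and receives r and the new state."""
    state, history = start_state, []
    for _ in range(n_steps):
        action = policy(state, rng)
        next_state, reward = step(state, action, rng)
        history.append((state, action, reward, next_state))
        state = next_state
    return history

def random_policy(state, rng):
    return rng.choice(sorted(TRANSITIONS[state]))

history = run_episode(random_policy, n_steps=5, rng=random.Random(0))
```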
20Reinforcement Learning
- Two types of learning
- Model-free learning
- Model based learning
- Typical application areas
- Robots
- Mazes
- Games
21Reinforcement Learning
- Paper: A Short Introduction to Reinforcement Learning (Stephan ten Hagen and Ben Kröse)
22Reinforcement Learning
- The environment is a Markov Decision Process
23Reinforcement Learning
- Optimize interaction with environment
- Optimize action selection mechanism
- Temporal Credit Assignment Problem
- Policy: action selection mechanism
- Value function
24Reinforcement Learning
- Optimal value function: based on the optimal policy
25Reinforcement Learning
- Policy Evaluation: approximate the value function for a given policy
- Policy Iteration: start with an arbitrary policy and improve it
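Both steps can be sketched on a small MDP whose model is known. The three-state `MDP` table, the discount factor `GAMMA = 0.9`, and the function names are assumptions chosen for the example; evaluation is done here by repeated sweeps until the values stop changing.

```python
GAMMA = 0.9

# Hypothetical MDP: MDP[state][action] lists (probability, next_state, reinforcement).
MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 1.0), (0.1, 0, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing state
}

def q_value(V, s, a):
    """Expected reinforcement plus discounted value of the successor state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in MDP[s][a])

def evaluate_policy(policy, tol=1e-10):
    """Policy evaluation: sweep over the states until V stops changing."""
    V = {s: 0.0 for s in MDP}
    while True:
        delta = 0.0
        for s in MDP:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(MDP[s])) for s in MDP}  # arbitrary start: first listed action
    while True:
        V = evaluate_policy(policy)
        improved = {s: max(MDP[s], key=lambda a: q_value(V, s, a)) for s in MDP}
        if improved == policy:
            return policy, V
        policy = improved

best_policy, best_values = policy_iteration()
```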
26Reinforcement Learning
27Reinforcement Learning
- Value Iteration: combine the policy evaluation and policy improvement steps
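A minimal sketch of value iteration, assuming a known model: each sweep backs up the best action value directly, so evaluation and improvement happen in one step. The `MDP` table, `GAMMA`, and the function names are hypothetical and serve only as an example.

```python
GAMMA = 0.9

# Hypothetical MDP: MDP[state][action] lists (probability, next_state, reinforcement).
MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 1.0), (0.1, 0, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing state
}

def backup(V, s, a):
    """One-step lookahead value of action a in state s."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in MDP[s][a])

def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in MDP}
    while True:
        delta = 0.0
        for s in MDP:
            best = max(backup(V, s, a) for a in MDP[s])  # evaluation and improvement at once
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {s: max(MDP[s], key=lambda a: backup(V, s, a)) for s in MDP}
    return policy, V

vi_policy, vi_values = value_iteration()
```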
28Reinforcement Learning
- Monte Carlo: use if the state-transition probabilities and expected reinforcements are not known
- Given a policy, several complete iterations are performed
- Exploration/Exploitation Dilemma
- Extract Information
- Optimize Interaction
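Monte Carlo evaluation can be sketched as averaging the returns observed over complete episodes, without ever consulting a model. The five-state random walk (terminal states 0 and 4, reinforcement 1 on reaching state 4), the first-visit averaging, and all names below are assumptions for illustration.

```python
import random

TERMINALS = {0, 4}

def sample_step(state, action):
    """Hypothetical environment: the learner only sees sampled outcomes, not a model."""
    next_state = state + (1 if action == "right" else -1)
    return next_state, (1.0 if next_state == 4 else 0.0)

def uniform_policy(state, rng):
    return rng.choice(("left", "right"))

def mc_evaluate(policy, n_episodes, rng, gamma=1.0, start_state=2):
    """First-visit Monte Carlo estimate of V for the given policy."""
    totals, counts = {}, {}
    for _ in range(n_episodes):
        state, episode = start_state, []
        while state not in TERMINALS:           # run one complete episode
            next_state, reward = sample_step(state, policy(state, rng))
            episode.append((state, reward))
            state = next_state
        ret, first_visit_return = 0.0, {}
        for s, r in reversed(episode):          # accumulate returns backwards
            ret = r + gamma * ret
            first_visit_return[s] = ret         # the earliest visit is written last
        for s, g in first_visit_return.items():
            totals[s] = totals.get(s, 0.0) + g
            counts[s] = counts.get(s, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

V_mc = mc_evaluate(uniform_policy, 5000, random.Random(0))
```

For this walk the exact values are 0.25, 0.5 and 0.75 for states 1, 2 and 3, so the estimates can be checked against them.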
29Reinforcement Learning
- Temporal Difference (TD) Learning
- During interaction, part of the update can be calculated
- Information from previous interactions is used
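A minimal sketch of TD(0) evaluation: the value of a state is nudged toward the observed reinforcement plus the current estimate of the next state, so updates happen during the episode rather than at its end. The random-walk environment, the step size `alpha`, and the function names are assumptions made for the example.

```python
import random

TERMINALS = {0, 4}

def sample_step(state, action):
    """Hypothetical random walk: reinforcement 1 on reaching state 4, else 0."""
    next_state = state + (1 if action == "right" else -1)
    return next_state, (1.0 if next_state == 4 else 0.0)

def uniform_policy(state, rng):
    return rng.choice(("left", "right"))

def td0_evaluate(policy, n_episodes, rng, alpha=0.01, gamma=1.0, start_state=2):
    V = {s: 0.0 for s in range(5)}              # terminal values stay at 0
    for _ in range(n_episodes):
        state = start_state
        while state not in TERMINALS:
            next_state, reward = sample_step(state, policy(state, rng))
            # The update uses the estimate V[next_state] built from earlier interactions.
            td_error = reward + gamma * V[next_state] - V[state]
            V[state] += alpha * td_error
            state = next_state
    return V

V_td = td0_evaluate(uniform_policy, 20000, random.Random(0))
```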
30Reinforcement Learning
- TD(λ) learning: discount factor λ; the longer ago a state was visited, the less it is affected by the present update
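The decaying influence on earlier states can be sketched with eligibility traces. The version below uses accumulating traces, which is one common choice and an assumption here; the function name `td_lambda_episode`, the fixed trajectory, and the parameter values are hypothetical.

```python
def td_lambda_episode(V, transitions, alpha=0.5, gamma=1.0, lam=0.8):
    """Apply TD(lambda) with accumulating traces along one recorded episode.

    `transitions` is a list of (state, reinforcement, next_state) triples.
    """
    trace = {s: 0.0 for s in V}
    for state, reward, next_state in transitions:
        td_error = reward + gamma * V[next_state] - V[state]
        trace[state] += 1.0                     # mark the state just visited
        for s in V:
            V[s] += alpha * td_error * trace[s] # older visits receive a smaller share
            trace[s] *= gamma * lam             # traces fade with every step
    return V

# Trajectory 1 -> 2 -> 3 -> 4 (terminal), with reinforcement 1 on the last step.
V_trace = td_lambda_episode({s: 0.0 for s in range(5)},
                            [(1, 0.0, 2), (2, 0.0, 3), (3, 1.0, 4)])
```

With these settings the single reinforcement at the end changes V(3) by 0.5, V(2) by 0.4 and V(1) by 0.32: each step further back scales the update by another factor of λ.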
31Reinforcement Learning
- Q-learning: combine actor and critic
32Reinforcement Learning
- Use temporal difference learning
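A minimal tabular Q-learning sketch: a single table Q(s,a) serves both to select actions and to evaluate them, and it is updated with a temporal-difference error. The four-state chain, the epsilon-greedy exploration, the random tie-breaking, and the parameter values are assumptions made for this example.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3

def env_step(state, action):
    """Hypothetical chain 0-1-2-3: reinforcement 1 on reaching the goal state 3."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    return next_state, (1.0 if next_state == GOAL else 0.0)

def choose_action(Q, state, rng, epsilon):
    """Epsilon-greedy selection: explore sometimes, otherwise exploit the table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])  # random tie-break

def q_learning(n_episodes, rng, alpha=0.5, gamma=0.9, epsilon=0.2):
    Q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(n_episodes):
        state = 0
        while state != GOAL:
            action = choose_action(Q, state, rng, epsilon)
            next_state, reward = env_step(state, action)
            future = 0.0 if next_state == GOAL else max(Q[(next_state, a)] for a in ACTIONS)
            # Temporal-difference update toward reinforcement plus best next value.
            Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning(500, random.Random(0))
greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```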
33Reinforcement Learning
34Reinforcement Learning
- Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large
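As a rough sketch of the idea, a small feedforward network can replace the value table and be trained on the temporal-difference error. The class `TinyValueNet`, its single tanh hidden layer, the one-hot state encoding, and the choice to treat the TD target as a constant during the gradient step are all assumptions made for illustration.

```python
import math
import random

class TinyValueNet:
    """One-hidden-layer feedforward net mapping a feature vector to a scalar V(s)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2, hidden

    def value(self, x):
        return self.forward(x)[0]

    def td_update(self, x, reward, x_next, terminal, alpha=0.05, gamma=0.9):
        """Move V(x) toward reward + gamma * V(x_next); the target is held fixed."""
        v, hidden = self.forward(x)
        target = reward + (0.0 if terminal else gamma * self.value(x_next))
        td_error = target - v
        for j, h in enumerate(hidden):
            grad_pre = self.w2[j] * (1.0 - h * h)   # dV/d(pre-activation), using the old w2
            self.w2[j] += alpha * td_error * h
            self.b1[j] += alpha * td_error * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] += alpha * td_error * grad_pre * xi
        self.b2 += alpha * td_error
        return td_error

# Usage: a state (one-hot encoded) whose transition ends the episode with reinforcement 1.
net = TinyValueNet(n_in=3, n_hidden=4, rng=random.Random(0))
x = [0.0, 0.0, 1.0]
errors = [abs(net.td_update(x, 1.0, x, terminal=True)) for _ in range(300)]
```

A network for Q(s,a) could be built the same way by feeding the action in as part of the input or by using one output unit per action.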