Neural Networks Chapter 7

Transcript and Presenter's Notes

1
Neural Networks: Chapter 7
  • Joost N. Kok
  • Universiteit Leiden

2
Recurrent Networks
  • Learning Time Sequences
  • Sequence Recognition
  • Sequence Reproduction
  • Temporal Association

3
Recurrent Networks
  • Tapped Delay Lines
  • Keep several old values of the input in a buffer and feed them to a feedforward network (sketch below)
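
A minimal sketch of the buffer idea in Python (NumPy); the function name and the zero-padding of the earliest steps are illustrative assumptions:

    import numpy as np

    def tapped_delay_line(signal, d):
        """Rows of the last d values: row t = (x[t], x[t-1], ..., x[t-d+1]).

        An ordinary feedforward network trained on these rows sees a
        fixed window of temporal context.
        """
        X = np.zeros((len(signal), d))
        for t in range(len(signal)):
            for k in range(d):
                if t - k >= 0:
                    X[t, k] = signal[t - k]  # zero-pad before the series starts
        return X

    x = np.sin(np.linspace(0, 10, 100))  # toy time series
    X = tapped_delay_line(x, d=5)[:-1]   # network inputs
    y = x[1:]                            # one-step-ahead targets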

4
Recurrent Networks
  • Drawbacks
  • The buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc.
  • Alternative: replace the fixed time delays by filters

5
Recurrent Networks
  • Partially recurrent networks

6
Recurrent Networks
  • Jordan Network: context units hold a copy of the previous outputs (with self-connections) and feed back into the hidden layer

7
Recurrent Networks
  • Elman Network: context units hold a copy of the previous hidden-layer activations and feed back into the hidden layer (sketch below)
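
A minimal NumPy sketch of the Elman forward pass; the layer sizes, tanh nonlinearity, and weight scales are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 3, 5, 2               # illustrative sizes

    W_xh = rng.normal(0, 0.1, (n_hid, n_in))   # input   -> hidden
    W_ch = rng.normal(0, 0.1, (n_hid, n_hid))  # context -> hidden
    W_hy = rng.normal(0, 0.1, (n_out, n_hid))  # hidden  -> output

    def elman_forward(inputs):
        """Run a sequence through the network; the context units are a
        copy of the previous hidden state (a simple short-term memory)."""
        context = np.zeros(n_hid)
        outputs = []
        for x in inputs:
            hidden = np.tanh(W_xh @ x + W_ch @ context)
            outputs.append(W_hy @ hidden)
            context = hidden               # context := previous hidden state
        return outputs

    ys = elman_forward(rng.normal(size=(10, n_in)))  # toy input sequence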

8
Recurrent Networks
  • Expanded Hierarchical Elman Network

9
Recurrent Networks
10
(No Transcript)
11
Recurrent Networks
  • Back-Propagation Through Time: unroll the network over the time steps and apply ordinary backpropagation to the unrolled graph, with weights shared across time (sketch below)
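
A minimal sketch for a single recurrent unit h[t] = tanh(w*h[t-1] + u*x[t]); the toy series and final-step loss are assumptions:

    import numpy as np

    w, u = 0.5, 0.3                        # shared recurrent/input weights
    x = np.array([1.0, 0.2, -0.5])         # toy input sequence
    target = 0.7                           # target for the final output

    # forward pass, storing hidden states for the backward pass
    h = [0.0]
    for t in range(len(x)):
        h.append(np.tanh(w * h[-1] + u * x[t]))
    loss = 0.5 * (h[-1] - target) ** 2

    # backward pass through time: the same weights appear at every step,
    # so their gradients accumulate over the unrolled graph
    dh, dw, du = h[-1] - target, 0.0, 0.0
    for t in reversed(range(len(x))):
        dpre = dh * (1 - h[t + 1] ** 2)    # derivative through the tanh
        dw += dpre * h[t]
        du += dpre * x[t]
        dh = dpre * w                      # propagate to the previous step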

12
Reinforcement Learning
  • Supervised learning, but the only feedback is a scalar reinforcement signal
  • Reinforcement Learning Problems:
  • Class I: the reinforcement signal is always the same for a given input-output pair
  • Class II: stochastic environment, fixed probability for each input-output pair
  • Class III: reinforcement and input patterns depend on the past history of the network output

13
Associative Reward-Penalty
  • Stochastic Output Units
  • Reinforcement Signal
  • Target
  • Error

14
Associative Reward-Penalty
  • Learning Rule (a sketch of the classic A_R-P update follows)
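
The equations on the original slide are not in the transcript. Below is a minimal sketch of the classic A_R-P rule (Barto and Anandan) for a single stochastic ±1 output unit; the logistic form of P(S = +1) and the learning-rate values are assumptions:

    import numpy as np

    rng = np.random.default_rng(1)

    def arp_output(w, x, beta=1.0):
        """Stochastic unit: emit S = +1 with probability p = g(w @ x)."""
        p = 1.0 / (1.0 + np.exp(-2 * beta * (w @ x)))
        S = 1.0 if rng.random() < p else -1.0
        return S, 2 * p - 1                # sampled output and its mean <S>

    def arp_update(w, x, S, mean_S, r, eta_r=0.1, eta_p=0.01):
        """Reward: push <S> toward the emitted output; penalty: push it
        away, with a much smaller learning rate (eta_p << eta_r)."""
        if r > 0:
            return w + eta_r * (S - mean_S) * x
        return w + eta_p * (-S - mean_S) * x

    x, w = rng.normal(size=4), np.zeros(4)
    S, mean_S = arp_output(w, x)
    r = 1.0 if S > 0 else -1.0             # stand-in environment (assumption)
    w = arp_update(w, x, S, mean_S, r)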

15
Models and Critics
  (figure: block diagram; labels include "Environment")
16
Reinforcement Comparison
  (figure: block diagram; labels include "Environment" and "Critic")
17
Reinforcement Learning
  • Reinforcement-Learning Model:
  • The agent receives an input I, which is some indication of the current state s of the environment
  • The agent then chooses an action a
  • The action changes the state of the environment, and the value of this change is communicated through a scalar reinforcement signal r

18
Reinforcement Learning
  • Environment: You are in state 65. You have four possible actions.
  • Agent: I'll take action 2.
  • Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
  • Agent: I'll take action 1.
  • Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
  • Agent: I'll take action 2.

19
Reinforcement Learning
  • Environment is non-deterministic: the same action in the same state may result in different states and different reinforcements
  • The environment is stationary: probabilities of making state transitions or receiving specific reinforcement signals do not change over time

20
Reinforcement Learning
  • Two types of learning:
  • Model-free learning
  • Model-based learning
  • Typical application areas:
  • Robots
  • Mazes
  • Games

21
Reinforcement Learning
  • Paper: "A Short Introduction to Reinforcement Learning" (Stephan ten Hagen and Ben Kröse)

22
Reinforcement Learning
  • The environment is a Markov Decision Process (MDP)
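
In standard notation (not in the transcript), this means the environment is described by a tuple (S, A, P, R): a set of states S, a set of actions A, transition probabilities P(s' | s, a), and a reward function R(s, a).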

23
Reinforcement Learning
  • Optimize interaction with environment
  • Optimize action selection mechanism
  • Temporal Credit Assignment Problem
  • Policy action selection mechanism
  • Value function
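
The slide's formula is not in the transcript; in standard notation, the value of state s under policy \pi is the expected discounted return

    V^\pi(s) = E_\pi\left[ \sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s \right]

with discount factor 0 \le \gamma < 1.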

24
Reinforcement Learning
  • Optimal value function: based on the optimal policy (equation below)
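
In standard notation, V^* satisfies the Bellman optimality equation

    V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \, V^*(s') \right]

and the optimal policy acts greedily with respect to V^*.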

25
Reinforcement Learning
  • Policy Evaluation: approximate the value function for a given policy (sketch below)
  • Policy Iteration: start with an arbitrary policy and improve it
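
A minimal sketch of iterative policy evaluation under stated assumptions: P[s, a, s'] holds transition probabilities, R[s, a] expected rewards, and the tolerance is illustrative:

    import numpy as np

    def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-6):
        """Approximate V for a fixed policy by sweeping the Bellman
        expectation update until the value estimates stop changing."""
        V = np.zeros(P.shape[0])
        while True:
            V_new = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                              for s in range(P.shape[0])])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new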

26
Reinforcement Learning
  • Improve the policy: act greedily with respect to the current value function (sketch below)
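
Continuing the same sketch, greedy improvement picks, in each state, the action with the best one-step lookahead value:

    import numpy as np

    def improve_policy(P, R, V, gamma=0.9):
        """Greedy policy improvement with respect to the estimate V."""
        n_states, n_actions = R.shape
        Q = np.array([[R[s, a] + gamma * P[s, a] @ V
                       for a in range(n_actions)] for s in range(n_states)])
        return Q.argmax(axis=1)        # new policy: greedy action per state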

27
Reinforcement Learning
  • Value Iteration: combine the policy evaluation and policy improvement steps (sketch below)
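
With the same assumed P and R arrays, a minimal value iteration sketch:

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        """Fold the max over actions (improvement) directly into each
        evaluation sweep, converging to V* and a greedy policy."""
        n_states, n_actions = R.shape
        V = np.zeros(n_states)
        while True:
            Q = np.array([[R[s, a] + gamma * P[s, a] @ V
                           for a in range(n_actions)] for s in range(n_states)])
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new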

28
Reinforcement Learning
  • Monte Carlo: use if the transition probabilities and rewards are not known
  • Given a policy, several complete episodes are performed (sketch below)
  • Exploration/Exploitation Dilemma:
  • Extract information
  • Optimize interaction
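
The slide's symbols for the unknown quantities did not survive the transcript; below is a minimal first-visit Monte Carlo sketch, assuming an episode is a list of (state, reward) pairs with the reward received on leaving the state:

    import numpy as np

    def mc_update(V, counts, episode, gamma=0.9):
        """First-visit Monte Carlo update from one complete episode."""
        G, rets = 0.0, []
        for s, r in reversed(episode):
            G = r + gamma * G
            rets.append((s, G))          # return observed from state s
        seen = set()
        for s, G in reversed(rets):      # forward order: first visits first
            if s not in seen:
                seen.add(s)
                counts[s] += 1
                V[s] += (G - V[s]) / counts[s]  # running average of returns
        return V, counts

    V, counts = np.zeros(4), np.zeros(4)
    V, counts = mc_update(V, counts, [(0, 1.0), (2, 0.0), (1, 5.0)])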

29
Reinforcement Learning
  • Temporal Difference (TD) Learning
  • During interaction, part of the update can be
    calculated
  • Information from previous interactions is used (sketch below)
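
A minimal sketch of the tabular TD(0) update from a single observed transition; the array size and step size are assumptions:

    import numpy as np

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """Update V[s] toward the one-step bootstrapped return: part of
        the update is computed during interaction, using the current
        estimate V[s_next] instead of waiting for a complete episode."""
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        return V

    V = np.zeros(5)
    V = td0_update(V, s=0, r=1.0, s_next=2)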

30
Reinforcement Learning
  • TD(λ) learning, with trace-decay factor λ: the longer ago a state was visited, the less it will be affected by the present update (sketch below)
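
A minimal sketch with accumulating eligibility traces; λ, the step size, and the array sizes are assumptions:

    import numpy as np

    def td_lambda_update(V, e, s, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
        """Every state carries an eligibility trace that decays by
        gamma * lam per step, so long-ago visits are affected less."""
        delta = r + gamma * V[s_next] - V[s]   # TD error
        e[s] += 1.0                            # mark the visited state
        V += alpha * delta * e                 # update all eligible states
        e *= gamma * lam                       # decay the traces
        return V, e

    V, e = np.zeros(5), np.zeros(5)
    V, e = td_lambda_update(V, e, s=1, r=1.0, s_next=2)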

31
Reinforcement Learning
  • Q-learning: combines the actor and the critic in a single action-value function Q(s,a)

32
Reinforcement Learning
  • Uses temporal difference learning to update Q (sketch below)
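
A minimal sketch of the tabular Q-learning update; the table shape and step size are assumptions:

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
        """Temporal-difference update toward the greedy one-step return
        (max over next actions), regardless of the action taken next."""
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        return Q

    Q = np.zeros((5, 2))                   # 5 states, 2 actions (toy sizes)
    Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)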

33
Reinforcement Learning
  • Q(λ) learning: Q-learning combined with eligibility traces

34
Reinforcement Learning
  • Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are too large for tables (sketch below).
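
A minimal sketch of this idea, with linear function approximation standing in for the feedforward net (semi-gradient TD(0); the feature map phi and the step sizes are assumptions):

    import numpy as np

    def semi_gradient_td0(w, phi, s, r, s_next, alpha=0.01, gamma=0.9):
        """TD(0) with a parameterized value estimate v(s) = w @ phi(s);
        the same pattern applies when v is a feedforward network: the
        TD error scales the gradient of the value estimate at s."""
        delta = r + gamma * w @ phi(s_next) - w @ phi(s)  # TD error
        return w + alpha * delta * phi(s)   # gradient of w @ phi(s) is phi(s)

    phi = lambda s: np.eye(4)[s]            # one-hot features (illustrative)
    w = np.zeros(4)
    w = semi_gradient_td0(w, phi, s=0, r=1.0, s_next=2)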