Transcript and Presenter's Notes

Title: Preventing Catastrophic Interference in Multiple-Sequence Learning Using Coupled Reverberating Elman Networks


1

Preventing Catastrophic Interference in
Multiple-Sequence Learning Using Coupled
Reverberating Elman Networks
  • Bernard Ans, Stéphane Rousset,
  • Robert M. French & Serban Musca
  • (European Commission grant HPRN-CT-1999-00065)

2
The Problem of Multiple-Sequence Learning
  • Real cognition requires the ability to learn
    sequences of patterns (or actions). (This is why
    SRNs, a.k.a. Elman networks, were originally
    developed.)
  • But learning sequences really means being able to
    learn multiple sequences without the most
    recently learned ones erasing the previously
    learned ones.
  • Catastrophic interference is a serious problem
    for the sequential learning of individual
    patterns. It is far worse when multiple
    sequences of patterns have to be learned
    consecutively.

3
The Solution
  • We have developed a dual-network system using
    coupled Elman networks that completely solves
    this problem.
  • These two separate networks exchange information
    by means of reverberated pseudopatterns.

4
Pseudopatterns
Assume a network-in-a-box learns a series of
patterns produced by a function f(x). These
original patterns are no longer available. How
can you approximate f(x)?
5–7
(Figure, built up over three slides: a random binary input, 1 0 0 1 1, is fed through the network, and the associated output, 1 1 0, is collected.)
This creates a pseudopattern ψ1: 1 0 0 1 1 → 1 1 0
8
A large enough collection of these pseudopatterns
ψ1: 1 0 0 1 1 → 1 1 0
ψ2: 1 1 0 0 0 → 0 1 1
ψ3: 0 0 0 1 0 → 1 0 0
ψ4: 0 1 1 1 1 → 0 0 0
etc., will approximate the originally learned function.
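
The idea is easy to sketch in code. Below is a minimal numpy illustration (not the authors' implementation): `net` stands for any callable wrapping the trained network, and each pseudopattern simply pairs a random binary input with whatever output the network produces for it.

import numpy as np

def make_pseudopatterns(net, n_inputs, n_patterns, rng=None):
    """Approximate a trained network's function with pseudopatterns.

    `net` is any callable mapping an input vector to an output vector
    (a stand-in for the trained "network-in-a-box").  Each pseudopattern
    pairs a random binary input with whatever output the network
    currently associates with it.
    """
    rng = rng or np.random.default_rng()
    pseudopatterns = []
    for _ in range(n_patterns):
        x = rng.integers(0, 2, size=n_inputs).astype(float)  # random input
        y = net(x)                                           # associated output
        pseudopatterns.append((x, y))
    return pseudopatterns

# Toy usage with a stand-in "network" (echoes its first three inputs):
demo_net = lambda x: x[:3]
pseudos = make_pseudopatterns(demo_net, n_inputs=5, n_patterns=4)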
9
Transferring information from Net 1 to Net 2
with pseudopatterns
(Figure: a random input, 1 0 0 1 1, is fed to Net 1, which produces the associated output 1 1 0; the same random input is then presented to Net 2, with Net 1's output 1 1 0 serving as Net 2's target.)
10
Information transfer by pseudopatterns in
dual-network systems
  • New information is presented to one network (Net
    1).
  • Pseudopatterns are generated by Net 2 where
    previously learned information is stored.
  • Net 1 then trains not only on the new pattern(s)
    to be learned, but also on the pseudopatterns
    produced by Net 2.
  • Once Net 1 has learned the new information, it
    generates (lots of) pseudopatterns that train Net
    2.

This is why we say that information is
continually transferred between the two networks
by means of pseudopatterns.
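
A structural sketch of this exchange, reusing make_pseudopatterns from the sketch above; the Net objects and their .train method are hypothetical stand-ins, not the authors' implementation.

def learn_with_transfer(net1, net2, new_patterns, n_pseudo=10_000):
    """One round of dual-network learning (hypothetical Net interface).

    `net1` and `net2` are assumed callable (forward pass) and trainable
    via .train(list_of_input_target_pairs), with an .n_inputs attribute.
    """
    # Net 2, the long-term store, emits pseudopatterns of what it knows.
    old_knowledge = make_pseudopatterns(net2, net2.n_inputs, n_pseudo)
    # Net 1 trains on the new patterns together with that old knowledge.
    net1.train(list(new_patterns) + old_knowledge)
    # Once done, Net 1 sends everything it now knows back to Net 2.
    net2.train(make_pseudopatterns(net1, net1.n_inputs, n_pseudo))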
11
Are all pseudopatterns created equal? No.
  • Even though the simple dual-network system (i.e.,
    new learning in one network, long-term storage in
    the other) using simple pseudopatterns does
    eliminate catastrophic interference, we can do
    better using reverberated pseudopatterns.

12
Building a Network that uses reverberated
pseudopatterns.
Start with a standard backpropagation network
(Figure: a three-layer network with an input layer, a hidden layer, and an output layer.)
13
Add an autoassociator
(Figure: the same network with additional autoassociative output units that reconstruct the input.)
14
A new pattern to be learned, P: Input → Target, will be learned as shown below.
(Figure: the Input is presented on the input layer; the teaching signal is the Target on the normal output units plus a copy of the Input on the autoassociative output units.)
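
As a minimal sketch (assuming a hypothetical backprop_step(input, teacher) method performing one feedforward-backprop pass), the teaching signal is simply the target concatenated with a copy of the input:

import numpy as np

def train_pattern(net, x, target):
    """One pass on pattern P: Input -> Target, with the autoassociator."""
    # Teaching signal: the target on the normal output units plus a
    # copy of the input on the autoassociative output units.
    teacher = np.concatenate([target, x])
    net.backprop_step(x, teacher)  # hypothetical single FF-BP pass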
15
What are reverberated pseudopatterns and how are they generated?
16
We start with a random input î0, feed it through the network and collect the output on the autoassociative side of the network. This output is fed back into the input layer (reverberated) and, again, the output on the autoassociative side is collected. This is done R times.
17–20
(No transcript: figure-only slides, presumably illustrating the reverberation steps just described.)
21
After R reverberations, we associate the reverberated input and the target output. This forms the reverberated pseudopattern.
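
A minimal sketch of the whole procedure; autoassoc and heteroassoc are hypothetical handles on the two output sides of the network (autoassociative reconstruction and target prediction, respectively):

import numpy as np

def reverberated_pseudopattern(autoassoc, heteroassoc, n_inputs, R, rng=None):
    """Generate one reverberated pseudopattern (sketch)."""
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=n_inputs).astype(float)  # random input i0
    for _ in range(R):
        x = autoassoc(x)  # autoassociative output fed back to the input
    # Associate the reverberated input with the target-side output.
    return x, heteroassoc(x)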
22

This dual-network approach, using reverberated-pseudopattern information transfer between the two networks, effectively overcomes catastrophic interference in multiple-pattern learning.
23
But what about multiple-sequence learning?
Elman networks are designed to learn sequences of patterns, but they forget catastrophically when they attempt to learn multiple sequences. Can we generalize the dual-network, reverberated-pseudopattern technique to dual Elman networks and eliminate catastrophic interference in multiple-sequence learning? Yes.
24
Elman networks (a.k.a. Simple Recurrent Networks)
(Figure: the hidden layer H(t) receives the standard input S(t) together with context units containing H(t-1), a copy of the hidden-unit activations from the previous time-step; the output is the prediction S(t+1).)
Learning a sequence S(1), S(2), …, S(n).
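
For concreteness, one SRN time-step in numpy (layer sizes, weight initialization, and activation functions are illustrative assumptions, not the paper's exact settings):

import numpy as np

def srn_step(s_t, h_prev, W_in, W_ctx, W_out):
    """One Elman-network time-step: predict S(t+1) from S(t) and H(t-1)."""
    h_t = np.tanh(s_t @ W_in + h_prev @ W_ctx)   # hidden H(t)
    s_next = 1 / (1 + np.exp(-(h_t @ W_out)))    # predicted S(t+1)
    return s_next, h_t

# Stepping through a sequence S(1), ..., S(n): the context carries H(t-1).
rng = np.random.default_rng(0)
n_units, n_hidden = 100, 50
W_in = rng.normal(scale=0.1, size=(n_units, n_hidden))
W_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.1, size=(n_hidden, n_units))
h = np.zeros(n_hidden)
sequence = rng.integers(0, 2, size=(11, n_units)).astype(float)
for s_t in sequence:
    prediction, h = srn_step(s_t, h, W_in, W_ctx, W_out)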
25
A Reverberated Simple Recurrent Network (RSRN): an Elman network with an autoassociative part
26
RSRN technique for sequentially learning two sequences A(t) and B(t).
  • Net 1 learns A(t) completely.
  • Reverberated pseudopattern transfer to Net 2.
  • Net 1 makes one weight-change pass through B(t).
  • Net 2 generates a few static reverberated
    pseudopatterns.
  • Net 1 does one learning epoch on these
    pseudopatterns from Net 2.
  • Continue until Net 1 has learned B(t).
  • Test how well Net 1 has retained A(t). (This loop
    is sketched in code below.)
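
A structural sketch of this loop; the method names (train_to_criterion, one_epoch, has_learned, reverberated_pseudopattern) are hypothetical, not the authors' API.

def learn_two_sequences(net1, net2, seq_A, seq_B,
                        n_transfer=10_000, n_pseudo=4):
    """Sequentially learn sequences A and B without erasing A (sketch)."""
    # 1. Net 1 learns sequence A completely.
    net1.train_to_criterion(seq_A)
    # 2. Reverberated pseudopattern transfer from Net 1 to Net 2.
    net2.train([net1.reverberated_pseudopattern() for _ in range(n_transfer)])
    # 3-6. Interleave sequence B with Net 2's pseudopatterns.
    while not net1.has_learned(seq_B):
        net1.one_epoch(seq_B)  # one weight-change pass through B
        pseudos = [net2.reverberated_pseudopattern() for _ in range(n_pseudo)]
        net1.one_epoch(pseudos)  # one epoch on Net 2's pseudopatterns
    # 7. Finally, test how well Net 1 has retained sequence A.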

27

Two sequences to be learned: A(0), A(1), …, A(10) and B(0), B(1), …, B(10).
(Figure: Net 1 and Net 2 side by side.)
Net 1 learns (completely) sequence A(0), A(1), …, A(10).
28
Transferring the learning to Net 2
(Figure: a random input, e.g. 010110100110010, is presented to Net 1; Net 1's output, e.g. 1110010011010, serves as the teacher for Net 2, which receives the same random input.)
Net 1 produces 10,000 pseudopatterns.
29
Transferring the learning to Net 2
(Figure: Net 2 makes a feedforward pass on the pseudopattern input.)
30
Transferring the learning to Net 2
(Figure: Net 2 then makes a backprop weight change toward Net 1's teacher output.)
For each of the 10,000 pseudopatterns produced by Net 1, Net 2 makes one FF-BP pass.
31
Learning B(0), B(1), …, B(10) by Net 1
(Figure: Net 1 and Net 2.)
1. Net 1 does ONE learning epoch on sequence B(0), B(1), …, B(10).
2. Net 2 generates a few pseudopatterns ψNET 2.
3. Net 1 does one FF-BP pass on each ψNET 2.
32
Learning B(0), B(1), …, B(10) by Net 1
(Figure: Net 1 and Net 2.)
1. Net 1 does ONE learning epoch on sequence B(0), B(1), …, B(10).
2. Net 2 generates a few pseudopatterns ψNET 2.
3. Net 1 does one FF-BP pass on each ψNET 2.
Continue until Net 1 has learned B(0), B(1), …, B(10).
33
Sequences chosen
  • Twenty-two distinct random binary vectors of
    length 100 are created.
  • Half of these vectors are used to produce the
    first ordered sequence of items, A, denoted by
    A(0), A(1), …, A(10).
  • The remaining 11 vectors are used to create a
    second sequence of items, B, denoted by B(0),
    B(1), …, B(10).
  • In order to introduce a degree of ambiguity into
    each sequence (so that a simple BP network would
    not be able to learn them), we modify each
    sequence so that A(8) = A(5) and B(5) = B(1), as
    in the sketch below.
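
A runnable sketch of this construction (numpy; the seed is arbitrary):

import numpy as np

rng = np.random.default_rng(0)

# Twenty-two random binary vectors of length 100 (distinct with
# overwhelming probability at this length).
items = rng.integers(0, 2, size=(22, 100)).astype(float)

seq_A = [v.copy() for v in items[:11]]   # A(0), ..., A(10)
seq_B = [v.copy() for v in items[11:]]   # B(0), ..., B(10)

# Introduce ambiguity so a simple feedforward BP network could not
# learn the sequences: repeat an earlier item later in each sequence.
seq_A[8] = seq_A[5].copy()   # A(8) = A(5)
seq_B[5] = seq_B[1].copy()   # B(5) = B(1)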

34
Test method
  • First, sequence A is completely learned by the
    network.
  • Then sequence B is learned.
  • During the course of learning, we monitor at
    regular intervals how much of sequence A has been
    forgotten by the network.

35
Normal Elman networks Catastrophic forgetting
(a) Learning of sequence B (after having previously learned sequence A). By 450 epochs (an epoch corresponds to one pass through the entire sequence), sequence B has been completely learned. (b) The number of incorrect units (out of 100) for each serial position of sequence A during learning of sequence B. After 450 epochs, the SRN has, for all intents and purposes, completely forgotten the previously learned sequence A.
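
The per-position error in panel (b) could be measured along these lines (sketch; net.step is a hypothetical method that consumes S(t), updates the network's context, and returns the prediction of S(t+1)):

def recall_errors(net, sequence):
    """Incorrect units (out of 100) at each serial position (sketch)."""
    net.reset_context()                        # start from a blank context
    errors = []
    for t in range(len(sequence) - 1):
        prediction = net.step(sequence[t])     # predict S(t+1) from S(t)
        binary = (prediction > 0.5).astype(int)
        errors.append(int((binary != sequence[t + 1]).sum()))
    return errors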
36
Dual-RSRNs Catastrophic forgetting is eliminated
Recall performance for sequences B and A during
learning of sequence B by a dual-network RSRN.
(a) By 400 epochs, the second sequence B has
been completely learned. (b) The previously
learned sequence A shows virtually no forgetting.
Catastrophic forgetting of the previously learned
sequence A has been completely overcome.
37
(Figure: error on sequence A while sequence B is being learned. Normal Elman network: massive forgetting of sequence A. Dual RSRN: no forgetting of sequence A.)
38
Cognitive/Neurobiological plausibility?
  • The brain, somehow, does not forget
    catastrophically.
  • Separating new learning from previously learned
    information seems necessary.
  • McClelland, McNaughton, and O'Reilly (1995) have
    suggested the hippocampal-neocortical separation
    may be Nature's way of solving this problem.
  • Pseudopattern transfer is not so far-fetched if
    we accept results that claim that neo-cortical
    memory consolidation is due, at least in part,
    to REM sleep.

39
Conclusions
  • The RSRN reverberating dual-network architecture
    (Ans & Rousset, 1997, 2000) can be generalized to
    sequential learning of multiple temporal
    sequences.
  • When learning multiple sequences of patterns,
    interleaving simple reverberated input-output
    pseudopatterns, each of which reflects the entire
    previously learned sequence(s), reduces (or
    eliminates entirely) forgetting of the initially
    learned sequence(s).