Transcript and Presenter's Notes

Title: Pattern Association X→Y


1
Pattern Association X→Y
  • Linear Methods

2
Goal
  • Given a set of x's, X, produce their associated
    set of y's, Y.
  • The model should incorporate associations between
    multiple pattern pairs.
  • i.e., X1 should produce Y1, X2 → Y2, X3 → Y3.
  • And all of these associations should be encoded
    within the same model.

3
Autoassociation
  • A special form of association is autoassociation.
  • When given X1, the model should produce X1.
  • Why? It's a form of fuzzy memory retrieval.
  • If only part of X1 is given, can the system
    retrieve the rest?
  • If a distortion of X1 is given, can the system
    retrieve the learned pattern X1 (i.e., clean up
    the pattern)?

4
How are patterns represented?
  • Conceptually, each feature is represented by a
    node in the system.
  • Each node is connected to every other node
    through some bidirectional weighted connection.
  • For any given pattern, X1, the nodes of the
    network are set to a particular activation value.

5
Autoassociative Memory
6
Other representations - Vectors and Matrices
  • Rather than represent all of those activations
    and connections in a diagram, you can use vectors
    and matrices.
  • X1 is represented as an n-length vector,
  • e.g., (1, 1, 1, -1)
  • The connections are represented by an n x n
    symmetric matrix.

7
Weight matrix, W
8
An aside: Linear algebra
  • Vector representations
  • Collections of features/microfeatures at the
    input level

9
Measuring vector similarity
  • Many methods leverage the dot product
  • Sum of pairwise products
  • (a,b,c,d,e) · (f,g,h,i,j) = af + bg + ch + di + ej
  • (1,2,1,-1,0) · (2,-1,3,2,4) = 2 - 2 + 3 - 2 + 0 = 1
  • Note, when two vectors point in the same direction,
    their dot product is maximal.
  • (1,2,1,-1,0) · (1,1,1,-1,0) = 5
  • (1,2,1,-1,0) · (-1,-1,-1,3,0) = -7
  • The dot product is similar to correlation
  • Positive values = similar, negative values =
    opposites, 0 = unrelated.
  • However, the dot product is not standardized to
    [-1, 1] (see the R sketch below).
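A short R sketch of these dot products (the dot helper name is just for illustration; base R's sum(a * b) or crossprod(a, b) computes the same thing):

    # Dot product: sum of pairwise products
    dot <- function(a, b) sum(a * b)

    dot(c(1, 2, 1, -1, 0), c(2, -1, 3, 2, 4))    # 2 - 2 + 3 - 2 + 0 = 1
    dot(c(1, 2, 1, -1, 0), c(1, 1, 1, -1, 0))    # 5, similar direction
    dot(c(1, 2, 1, -1, 0), c(-1, -1, -1, 3, 0))  # -7, roughly opposite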

10
Orthogonal vectors
  • Vectors which have a dot product of 0 are
    orthogonal.
  • This is analogous to two vectors being at right
    angles.
  • Indicates that the sets of features are
    uncorrelated.
  • This is different from being negatively correlated.

11
Standardizing dot products: vector length
  • Square root of the sum of squares of all components.
  • This is the multivariate extension of the Pythagorean
    theorem (a² + b² = c²)
  • c = √(a² + b²)
  • Length of any vector: |v| = √(v · v) (sketched in R
    below)
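A one-line R version of this (vnorm is an illustrative name, chosen to avoid clashing with base R's norm()):

    # Euclidean length: square root of the dot product of a vector with itself
    vnorm <- function(v) sqrt(sum(v * v))

    vnorm(c(3, 4))        # 5, the familiar 3-4-5 right triangle
    vnorm(c(1, 0, 1, 2))  # sqrt(6), about 2.449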

12
Computing similarity between two vectors
  • Method 1: based on the angle between the two
    vectors.
  • If the angle is 0°, then perfectly similar (1.0).
  • If the angle is 90°, then orthogonal / not similar
    (0.0).
  • If the angle is > 90°, then negative similarity
    (< 0.0).
  • Computed from the normalized dot product
  • Cosine of the angle between the vectors; range of
    [-1, 1] (see the sketch below)
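A minimal sketch of the normalized dot product in R (cos_sim is an illustrative helper, not a base R function):

    # Cosine of the angle: dot product divided by the product of the lengths
    cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))

    cos_sim(c(1, 0, 1, 2), c(2, 0, 2, 4))     #  1.0, same direction
    cos_sim(c(1, 0), c(0, 1))                 #  0.0, orthogonal
    cos_sim(c(1, 0, 1, 2), c(-1, 0, -1, -2))  # -1.0, opposite direction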

13
Other methods of computing similarity
  • There are many methods; we'll encounter a couple of
    others (e.g., the Minkowski metric) later in the
    course.

14
Propagation of activity in a network: Matrix algebra
  • Propagation of activations is easily computed
    using matrix algebra.
  • The product of a matrix and a vector is a vector.
  • A matrix is designated using a capital letter, a
    vector with a small letter.
  • W and a: W has i rows and j columns; a has j
    values. W1 designates the first row of W.
  • The first value of the resultant vector is W1 · a,
    the second value is W2 · a, etc. (see the sketch
    below).
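In R, matrix-vector propagation is the %*% operator; a small sketch with made-up numbers (not taken from the slides):

    # Each entry of W %*% a is the dot product of one row of W with a
    W <- matrix(c(1, 0, 2,
                  0, 1, 1), nrow = 2, byrow = TRUE)
    a <- c(1, 2, 3)

    W %*% a        # (7, 5): row 1 gives 1*1 + 0*2 + 2*3 = 7, row 2 gives 5
    W[1, ] %*% a   # the first entry on its own, one row at a time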

15
Example
  • Picture form and matrix form

16
Autoassociation: how to encode a vector?
  • We want a set of weights, W, that satisfies the
    relationship W · x = x for a number of
    different x vectors.
  • The weights are analogous to the b's of linear
    regression.
  • In other words, we want a network that encodes
    the set of vectors, X, and can retrieve them.
  • Remember why: we can give the network partial,
    noisy, or similar inputs and have it complete,
    clean up, or retrieve similar memories.

17
Finding W
  • When relationships are linear, we could just
    solve for W, but Hebbian learning is incremental.
  • Hebbian learning encodes correlations:
  • W = x xᵀ
  • W <- x %o% x, or outer(x, x) (outer product), in R
  • When two units are on or off at the same time, the
    weight between them is increased; when one is off
    while the other is on, the weight is decreased
    (see the sketch below).
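A sketch of the rule for a single pattern (the example pattern is arbitrary; both forms compute the same outer product):

    x <- c(1, 1, -1, -1)
    W <- outer(x, x)      # equivalently: x %o% x
    W
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    1   -1   -1
    # [2,]    1    1   -1   -1
    # [3,]   -1   -1    1    1
    # [4,]   -1   -1    1    1
    # Units that are on (or off) together get positive weights;
    # mismatched units get negative weights.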

18
Hebbian learning, cont
  • When more than one pattern, x, must be learned,
    just repeat (see the sketch below):
  • W = x xᵀ for the first pattern
  • Wnew = Wold + x xᵀ for subsequent patterns
  • There are problems with this approach:
  • 1. Weights grow without bound.
  • 2. The retrieved vector will be a multiple of the
    original (pointing in the right direction, but the
    wrong length).
  • 3. Interference between memories if the x's are not
    orthogonal.
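The incremental update, sketched as a loop over the patterns to be stored (keeping the patterns as rows of a matrix is just one convenient layout):

    # Accumulate one outer product per pattern: Wnew = Wold + x %o% x
    patterns <- rbind(c( 1, 0,  1, 2),
                      c(-1, 1, -1, 0))

    n <- ncol(patterns)
    W <- matrix(0, n, n)
    for (i in seq_len(nrow(patterns))) {
      x <- patterns[i, ]
      W <- W + outer(x, x)
    }
    W   # the same weight matrix used in the examples that follow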

19
Example 1
  • Want to encode the following vectors:
  • (1,0,1,2) and (-1,1,-1,0)
  • Compute W:
  • x1 <- c(1, 0, 1, 2)
  • x2 <- c(-1, 1, -1, 0)
  • W <- outer(x1, x1)
  • W <- W + outer(x2, x2)
  • W is then:
         [,1] [,2] [,3] [,4]
    [1,]    2   -1    2    2
    [2,]   -1    1   -1    0
    [3,]    2   -1    2    2
    [4,]    2    0    2    4

20
Retrieving Example 1
  • W %*% x1 = (8, -2, 8, 12)
  • Original was (1, 0, 1, 2)
  • W %*% x2 = (-5, 3, -5, -4)
  • Original was (-1, 1, -1, 0)
  • How close to the right direction?
  • Use the normalized dot product and show the
    standardized vectors (see the sketch below):
  • x1: cosine = .983; original (0.41, 0.00, 0.41, 0.82),
    retrieved (0.48, -0.12, 0.48, 0.72)
  • x2: cosine = .867; original (-0.58, 0.58, -0.58, 0.00),
    retrieved (-0.58, 0.35, -0.58, -0.46)
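These numbers can be reproduced with a short sketch (std and cos_sim are illustrative helpers; W, x1, and x2 are the ones defined on the previous slide):

    std     <- function(v) v / sqrt(sum(v * v))   # rescale to unit length
    cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))

    r1 <- as.vector(W %*% x1)   # (8, -2, 8, 12)
    cos_sim(x1, r1)             # ~0.983
    round(std(x1), 2)           #  0.41  0.00  0.41  0.82
    round(std(r1), 2)           #  0.48 -0.12  0.48  0.72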

21
Example 2
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a partial version of x1.
  • x1p <- c(1, 0, 1, 0)
  • x1pr <- W %*% x1p
  • Std(orig):  0.41,  0.00, 0.41, 0.82
  • Std(retr):  0.55, -0.28, 0.55, 0.55
  • Not too bad, but let's run it through again:
  • x1prr <- W %*% x1pr
  • Std(again): 0.52, -0.20, 0.52, 0.64
  • And again, and again, and again... 200 times (see
    the sketch below):
  • Std(mult):  0.51, -0.15, 0.51, 0.68
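A sketch of the repeated retrieval, reusing W from Example 1 and the std helper above (rescaling to unit length on every pass is an assumption about how the 200 iterations were run; Hebbian weights otherwise make the raw values grow without bound):

    v <- std(c(1, 0, 1, 0))           # partial version of x1
    for (i in 1:200) {
      v <- std(as.vector(W %*% v))    # propagate, then rescale
    }
    round(v, 2)   # ~ the Std(mult) direction reported above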

22
Example 3
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a noisy version of x1.
  • x1n <- c(1.2, -.1, .78, 2.3)
  • x1nr <- W %*% x1n
  • Std(orig): 0.41,  0.00, 0.41, 0.82
  • Std(retr): 0.48, -0.11, 0.48, 0.73
  • Not too bad, but let's run it through again, 200
    times:
  • Std(mult): 0.51, -0.15, 0.51, 0.68
  • Huh? Same answer as last time!

23
Example 4 - Let's try that again
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a noisy version of x2.
  • x2n <- c(-1.2, .9, -.8, .1)
  • x2nr <- W %*% x2n
  • Std(orig): -0.58, 0.58, -0.58,  0.00
  • Std(retr): -0.58, 0.36, -0.58, -0.44
  • Not too bad, but let's do it again, 200 times:
  • Std(mult): -0.51, 0.15, -0.51, -0.68
  • Huh? That's just the negative of the previous
    answer.

24
Autoassociators as attractor networks
  • Effectively, the network encodes attractors (in
    the chaos theory sense).
  • Hebbian learning creates these attractors at
    locations based on the vectors to be encoded.
  • During each retrieval, the system tries to
    minimize energy (not error).
  • Attractor states are minimal energy states.
  • Any particular weight matrix is limited in the
    number of attractors that it can encode.

25
Capacity of autoassociators
  • Patterns that are orthogonal are easy to encode
    with no interference, but there are a limited
    number of orthogonal patterns.
  • Theoretically, the largest number of distinct
    patterns that you can encode in a Hebbian network
    is the maximum number of orthogonal patterns that
    can be represented in N units.
  • So, pmax = N (see the sketch below).
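A small sketch of the no-interference claim (the two patterns here are arbitrary orthogonal examples; with orthogonal x's, each retrieval is an exact multiple of the stored pattern):

    p1 <- c(1,  1, 1,  1)
    p2 <- c(1, -1, 1, -1)     # p1 · p2 = 0, so the patterns are orthogonal
    Wo <- outer(p1, p1) + outer(p2, p2)

    as.vector(Wo %*% p1)      # (4, 4, 4, 4): exactly 4 * p1, no interference
    as.vector(Wo %*% p2)      # (4, -4, 4, -4): exactly 4 * p2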

26
Spurious Attractors
  • A spurious attractor is an attractor that doesn't
    correspond to any of the patterns that you used
    during training.
  • A noisy or partial version of a pattern could
    settle into a spurious attractor.

27
Typical maximum capacity
  • Rule of thumb from prior research:
  • For P(error) = .001, a system with N units can be
    expected to have up to .105N stable attractors
    (i.e., 10 units will have about 1 good stable
    attractor).
  • For P(error) = .05, a system with N units can be
    expected to have up to .37N stable attractors.
  • For P(error) = .10, a system with N units can be
    expected to have up to .61N stable attractors.
  • If you exceed that maximum number, the
    performance of the system rapidly degrades.
  • BUT, even if you're below it, you'll still get
    errors if the to-be-encoded patterns are highly
    similar.

28
Hopfield networks
  • An aside - a Hopfield network is a Hebbian
    autoassociator that uses threshold
    (McCulloch-Pitts) units.
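A rough sketch of that idea (a simplified synchronous update with ±1 threshold units; Hopfield's network actually updates units asynchronously, a detail not covered on this slide):

    # Hebbian weights plus threshold (sign) units
    p1 <- c(1, 1, 1, 1, -1, -1, -1, -1)
    p2 <- c(1, 1, -1, -1, 1, 1, -1, -1)
    Wh <- outer(p1, p1) + outer(p2, p2)
    diag(Wh) <- 0                       # no self-connections

    v <- p1
    v[8] <- -v[8]                       # flip one unit to make a noisy probe
    for (i in 1:10) {
      v <- as.vector(sign(Wh %*% v))    # threshold each unit at 0 to +1 / -1
    }
    all(v == p1)                        # TRUE: the probe settles back onto p1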