Transcript and Presenter's Notes

Title: Pattern Association X→Y


1
Pattern Association X→Y
  • Linear Methods

2
Goal
  • Given a set of x's, X, produce their associated
    set of y's, Y.
  • The model should incorporate associations between
    multiple pattern pairs.
  • i.e., X1 should produce Y1, X2 → Y2, X3 → Y3.
  • And all of these associations should be encoded
    within the same model.

3
Autoassociation
  • A special form of association is autoassociation.
  • When given X1, the model should produce X1.
  • Why? It's a form of fuzzy memory retrieval.
  • If only part of X1 is given, can the system
    retrieve the rest?
  • If a distortion of X1 is given, can the system
    retrieve the learned pattern X1 (i.e., clean up
    the pattern)?

4
How are patterns represented?
  • Conceptually, each feature is represented by a
    node in the system.
  • Each node is connected to every other node
    through some bidirectional weighted connection.
  • For any given pattern, X1, the nodes of the
    network are set to a particular activation value.

5
Autoassociative Memory
6
Other representations - Vectors and Matrices
  • Rather than represent all of those activations
    and connections in a diagram, you can use vectors
    and matrices.
  • X1 is represented as an n-length vector,
  • e.g., (1, 1, 1, -1)
  • The connections are represented by an n x n
    symmetric matrix.

7
Weight matrix, W
8
An aside: Linear algebra
  • Vector representations
  • Collections of features/microfeatures at the
    input level

9
Measuring vector similarity
  • Many methods leverage the dot product
  • Sum of pairwise products
  • (a,b,c,d,e) · (f,g,h,i,j) = af + bg + ch + di + ej
  • (1,2,1,-1,0) · (2,-1,3,2,4) = 2 - 2 + 3 - 2 + 0 = 1
  • Note, when two vectors point in the same direction,
    their dot product is maximal.
  • (1,2,1,-1,0) · (1,1,1,-1,0) = 5
  • (1,2,1,-1,0) · (-1,-1,-1,3,0) = -7
  • The dot product is similar to correlation
  • Positive values = similar, negative values =
    opposites, 0 = unrelated.
  • However, the dot product is not standardized to
    [-1, 1] (see the R sketch below).
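A short R sketch of these dot products (the dot helper name is just for illustration; base R's sum(a * b) or crossprod(a, b) computes the same thing):

    # Dot product: sum of pairwise products
    dot <- function(a, b) sum(a * b)

    dot(c(1, 2, 1, -1, 0), c(2, -1, 3, 2, 4))    # 2 - 2 + 3 - 2 + 0 = 1
    dot(c(1, 2, 1, -1, 0), c(1, 1, 1, -1, 0))    # 5, similar direction
    dot(c(1, 2, 1, -1, 0), c(-1, -1, -1, 3, 0))  # -7, roughly opposite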

10
Orthogonal vectors
  • Vectors which have a dot product of 0 are
    orthogonal.
  • This is analogous to two vectors being at right
    angles.
  • Indicates that the sets of features are
    uncorrelated.
  • This is different from being negatively correlated.

11
Standardizing dot products: vector length
  • Square root of the sum of squares of all components.
  • This is the multivariate extension of the Pythagorean
    theorem (a² + b² = c²)
  • c = √(a² + b²)
  • Length of any vector: |v| = √(v · v) (sketched in R
    below)
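A one-line R version of this (vnorm is an illustrative name, chosen to avoid clashing with base R's norm()):

    # Euclidean length: square root of the dot product of a vector with itself
    vnorm <- function(v) sqrt(sum(v * v))

    vnorm(c(3, 4))        # 5, the familiar 3-4-5 right triangle
    vnorm(c(1, 0, 1, 2))  # sqrt(6), about 2.449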

12
Computing similarity between two vectors
  • Method 1: based on the angle between the two
    vectors.
  • If the angle is 0°, then perfectly similar (1.0).
  • If the angle is 90°, then orthogonal / not similar
    (0.0).
  • If the angle is > 90°, then negative similarity
    (< 0.0).
  • Computed from the normalized dot product
  • Cosine of the angle between the vectors; range of
    [-1, 1] (see the sketch below)
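A minimal sketch of the normalized dot product in R (cos_sim is an illustrative helper, not a base R function):

    # Cosine of the angle: dot product divided by the product of the lengths
    cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))

    cos_sim(c(1, 0, 1, 2), c(2, 0, 2, 4))     #  1.0, same direction
    cos_sim(c(1, 0), c(0, 1))                 #  0.0, orthogonal
    cos_sim(c(1, 0, 1, 2), c(-1, 0, -1, -2))  # -1.0, opposite direction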

13
Other methods of computing similarity
  • There are many methods; we'll encounter a couple of
    others (e.g., the Minkowski metric) later in the
    course.

14
Propagation of activity in a network: Matrix algebra
  • Propagation of activations is easily computed
    using matrix algebra.
  • The product of a matrix and a vector is a vector.
  • A matrix is designated using a capital letter, a
    vector with a small letter.
  • W and a: W has i rows and j columns; a has j
    values. W1 designates the first row of W.
  • The first value of the resultant vector is W1 · a,
    the second value is W2 · a, etc. (see the sketch
    below).
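In R, matrix-vector propagation is the %*% operator; a small sketch with made-up numbers (not taken from the slides):

    # Each entry of W %*% a is the dot product of one row of W with a
    W <- matrix(c(1, 0, 2,
                  0, 1, 1), nrow = 2, byrow = TRUE)
    a <- c(1, 2, 3)

    W %*% a        # (7, 5): row 1 gives 1*1 + 0*2 + 2*3 = 7, row 2 gives 5
    W[1, ] %*% a   # the first entry on its own, one row at a time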

15
Example
  • Picture form and matrix form

16
Autoassociation: how to encode a vector?
  • We want a set of weights, W, that satisfies the
    relationship W · x = x for a number of
    different x vectors.
  • The weights are analogous to the b's of linear
    regression.
  • In other words, we want a network that encodes
    the set of vectors, X, and can retrieve them.
  • Remember why: we can give the network partial,
    noisy, or similar inputs and have it complete,
    clean up, or retrieve similar memories.

17
Finding W
  • When relationships are linear, we could just
    solve for W, but Hebbian learning is incremental.
  • Hebbian learning encodes correlations:
  • W = x xᵀ
  • W <- x %o% x, or outer(x, x) (outer product), in R
  • When two units are on or off at the same time, the
    weight between them is increased; when one is off
    while the other is on, the weight is decreased
    (see the sketch below).
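A sketch of the rule for a single pattern (the example pattern is arbitrary; both forms compute the same outer product):

    x <- c(1, 1, -1, -1)
    W <- outer(x, x)      # equivalently: x %o% x
    W
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    1   -1   -1
    # [2,]    1    1   -1   -1
    # [3,]   -1   -1    1    1
    # [4,]   -1   -1    1    1
    # Units that are on (or off) together get positive weights;
    # mismatched units get negative weights.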

18
Hebbian learning, cont
  • When more than one pattern, x, must be learned,
    just repeat (see the sketch below):
  • W = x xᵀ for the first pattern
  • Wnew = Wold + x xᵀ for subsequent patterns
  • There are problems with this approach:
  • 1. Weights grow without bound.
  • 2. The retrieved vector will be a multiple of the
    original (pointing in the right direction, but the
    wrong length).
  • 3. Interference between memories if the x's are not
    orthogonal.
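The incremental update, sketched as a loop over the patterns to be stored (keeping the patterns as rows of a matrix is just one convenient layout):

    # Accumulate one outer product per pattern: Wnew = Wold + x %o% x
    patterns <- rbind(c( 1, 0,  1, 2),
                      c(-1, 1, -1, 0))

    n <- ncol(patterns)
    W <- matrix(0, n, n)
    for (i in seq_len(nrow(patterns))) {
      x <- patterns[i, ]
      W <- W + outer(x, x)
    }
    W   # the same weight matrix used in the examples that follow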

19
Example 1
  • Want to encode the following vectors:
  • (1,0,1,2) and (-1,1,-1,0)
  • Compute W:
  • x1 <- c(1, 0, 1, 2)
  • x2 <- c(-1, 1, -1, 0)
  • W <- outer(x1, x1)
  • W <- W + outer(x2, x2)
  • W is then:
         [,1] [,2] [,3] [,4]
    [1,]    2   -1    2    2
    [2,]   -1    1   -1    0
    [3,]    2   -1    2    2
    [4,]    2    0    2    4

20
Retrieving Example 1
  • W %*% x1 = (8, -2, 8, 12)
  • Original was (1, 0, 1, 2)
  • W %*% x2 = (-5, 3, -5, -4)
  • Original was (-1, 1, -1, 0)
  • How close to the right direction?
  • Use the normalized dot product and show the
    standardized vectors (see the sketch below):
  • x1: cosine = .983; original (0.41, 0.00, 0.41, 0.82),
    retrieved (0.48, -0.12, 0.48, 0.72)
  • x2: cosine = .867; original (-0.58, 0.58, -0.58, 0.00),
    retrieved (-0.58, 0.35, -0.58, -0.46)
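These numbers can be reproduced with a short sketch (std and cos_sim are illustrative helpers; W, x1, and x2 are the ones defined on the previous slide):

    std     <- function(v) v / sqrt(sum(v * v))   # rescale to unit length
    cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))

    r1 <- as.vector(W %*% x1)   # (8, -2, 8, 12)
    cos_sim(x1, r1)             # ~0.983
    round(std(x1), 2)           #  0.41  0.00  0.41  0.82
    round(std(r1), 2)           #  0.48 -0.12  0.48  0.72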

21
Example 2
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a partial version of x1.
  • x1p <- c(1, 0, 1, 0)
  • x1pr <- W %*% x1p
  • Std(orig):  0.41,  0.00, 0.41, 0.82
  • Std(retr):  0.55, -0.28, 0.55, 0.55
  • Not too bad, but let's run it through again:
  • x1prr <- W %*% x1pr
  • Std(again): 0.52, -0.20, 0.52, 0.64
  • And again, and again, and again... 200 times (see
    the sketch below):
  • Std(mult):  0.51, -0.15, 0.51, 0.68
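A sketch of the repeated retrieval, reusing W from Example 1 and the std helper above (rescaling to unit length on every pass is an assumption about how the 200 iterations were run; Hebbian weights otherwise make the raw values grow without bound):

    v <- std(c(1, 0, 1, 0))           # partial version of x1
    for (i in 1:200) {
      v <- std(as.vector(W %*% v))    # propagate, then rescale
    }
    round(v, 2)   # ~ the Std(mult) direction reported above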

22
Example 3
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a noisy version of x1.
  • x1n <- c(1.2, -.1, .78, 2.3)
  • x1nr <- W %*% x1n
  • Std(orig): 0.41,  0.00, 0.41, 0.82
  • Std(retr): 0.48, -0.11, 0.48, 0.73
  • Not too bad, but let's run it through again, 200
    times:
  • Std(mult): 0.51, -0.15, 0.51, 0.68
  • Huh? Same answer as last time!

23
Example 4 - Let's try that again
  • Same weight matrix as the previous example, which
    encoded (1,0,1,2) and (-1,1,-1,0).
  • Let's try to retrieve a noisy version of x2.
  • x2n <- c(-1.2, .9, -.8, .1)
  • x2nr <- W %*% x2n
  • Std(orig): -0.58, 0.58, -0.58,  0.00
  • Std(retr): -0.58, 0.36, -0.58, -0.44
  • Not too bad, but let's do it again, 200 times:
  • Std(mult): -0.51, 0.15, -0.51, -0.68
  • Huh? That's just the negative of the previous
    answer.

24
Autoassociators as attractor networks
  • Effectively, the network encodes attractors (in
    the chaos theory sense).
  • Hebbian learning creates these attractors at
    locations based on the vectors to be encoded.
  • During each retrieval, the system tries to
    minimize energy (not error).
  • Attractor states are minimal energy states.
  • Any particular weight matrix is limited in the
    number of attractors that it can encode.

25
Capacity of autoassociators
  • Patterns that are orthogonal are easy to encode
    with no interference, but there are a limited
    number of orthogonal patterns.
  • Theoretically, the largest number of distinct
    patterns that you can encode in a Hebbian network
    is the maximum number of orthogonal patterns that
    can be represented in N units.
  • So, pmax = N (see the sketch below).
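A small sketch of the no-interference claim (the two patterns here are arbitrary orthogonal examples; with orthogonal x's, each retrieval is an exact multiple of the stored pattern):

    p1 <- c(1,  1, 1,  1)
    p2 <- c(1, -1, 1, -1)     # p1 · p2 = 0, so the patterns are orthogonal
    Wo <- outer(p1, p1) + outer(p2, p2)

    as.vector(Wo %*% p1)      # (4, 4, 4, 4): exactly 4 * p1, no interference
    as.vector(Wo %*% p2)      # (4, -4, 4, -4): exactly 4 * p2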

26
Spurious Attractors
  • A spurious attractor is an attractor that doesn't
    correspond to any of the patterns that you used
    during training.
  • A noisy or partial version of a pattern could
    settle into a spurious attractor.

27
Typical maximum capacity
  • Rule of thumb from prior research:
  • For P(error) = .001, a system with N units can be
    expected to have up to .105N stable attractors
    (i.e., 10 units will have about 1 good stable
    attractor).
  • For P(error) = .05, a system with N units can be
    expected to have up to .37N stable attractors.
  • For P(error) = .10, a system with N units can be
    expected to have up to .61N stable attractors.
  • If you exceed that maximum number, the
    performance of the system rapidly degrades.
  • BUT, even if you're below it, you'll still get
    errors if the to-be-encoded patterns are highly
    similar.

28
Hopfield networks
  • An aside - a Hopfield network is a Hebbian
    autoassociator that uses threshold
    (McCulloch-Pitts) units.
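A rough sketch of that idea (a simplified synchronous update with ±1 threshold units; Hopfield's network actually updates units asynchronously, a detail not covered on this slide):

    # Hebbian weights plus threshold (sign) units
    p1 <- c(1, 1, 1, 1, -1, -1, -1, -1)
    p2 <- c(1, 1, -1, -1, 1, 1, -1, -1)
    Wh <- outer(p1, p1) + outer(p2, p2)
    diag(Wh) <- 0                       # no self-connections

    v <- p1
    v[8] <- -v[8]                       # flip one unit to make a noisy probe
    for (i in 1:10) {
      v <- as.vector(sign(Wh %*% v))    # threshold each unit at 0 to +1 / -1
    }
    all(v == p1)                        # TRUE: the probe settles back onto p1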