An Exact Approach for Learning Vector Quantization

1
An Exact Approach for Learning Vector Quantization
  • Aree Witoelar
  • Michael Biehl
  • University of Groningen

Barbara Hammer, Clausthal University of Technology
2
Outline
  • Learning Vector Quantization
  • Mathematical treatment
  • Performance of LVQ algorithms
  • LVQ1, LVQ 2.1
  • Unsupervised algorithms
  • Winner Takes All, Neural Gas
  • Summary, Outlook

3
Learning Vector Quantization
  • Classification of data using prototype vectors

Assign data to nearest prototype vector
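As a minimal illustration of nearest-prototype classification (the variable names are illustrative, not taken from the slides):

```python
import numpy as np

def classify(xi, prototypes, labels):
    """Assign a data point to the class of its nearest prototype.

    xi         : (N,) data vector
    prototypes : (P, N) array of prototype vectors w_1 .. w_P
    labels     : (P,) class labels c_1 .. c_P of the prototypes
    """
    d = np.sum((prototypes - xi) ** 2, axis=1)   # squared Euclidean distances
    return labels[np.argmin(d)]                  # class of the winner
```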
4
Online learning
A sequence of independent random data points ξ^µ is presented, drawn according to the density P(ξ).
Learning is described by a given LVQ algorithm through updates of the prototype(s):
the learning rate sets the step size, a modulation term sets the strength and direction of the update,
and each step moves a prototype towards (or away from) the current data point. Distances are Euclidean.
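Written out, the generic online update typically takes the form below; the 1/N scaling of the learning rate is an assumption of this sketch, chosen so that the limit N → ∞ is well defined:

```latex
w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\bigl(\{d_t^{\mu}\}, \sigma^{\mu}, \ldots\bigr)\,
            \bigl(\xi^{\mu} - w_s^{\mu-1}\bigr),
\qquad
d_t^{\mu} = \bigl\|\xi^{\mu} - w_t^{\mu-1}\bigr\|^{2},
```

where η is the learning rate, the modulation function f_s encodes the particular algorithm (LVQ1, LVQ 2.1, ...), and d_t^µ are squared Euclidean distances.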
5
Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data: random vectors ξ ∈ R^N drawn according to P(ξ).
Projected onto Span(B1, B2), the cluster structure is visible; projected onto Span(v1, v2), with v1, v2 random vectors, the data are not separable.
The structure is separable in (at most) M dimensions.
6
Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data.
Random vectors ξ ∈ R^N drawn according to P(ξ).
  • For cluster k:
  • class σ_k ∈ {+1, −1}
  • prior probability p_k, with Σ_k p_k = 1
  • offset of the center from the origin: distance ℓ_k
  • orientation B_k, with |B_k| = 1
  • variance v_k

[Figure: example clusters k = 1, ..., 5 with center directions B_k, distances ℓ_k and variances v_k]
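A minimal sketch of how such data could be generated, assuming the usual form of the model in which cluster k is an isotropic Gaussian of variance v_k centred at ℓ_k B_k (function and variable names are illustrative):

```python
import numpy as np

def sample_mixture(n_samples, N, priors, ells, variances, rng=None):
    """Draw vectors xi in R^N from a mixture of spherical Gaussian clusters.

    Cluster k has prior priors[k], center ells[k] * B[k] and variance variances[k].
    The orthonormal directions B[k] are drawn at random here; in the model they
    are fixed unit vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    M = len(priors)
    B = np.linalg.qr(rng.standard_normal((N, M)))[0].T       # orthonormal B_1 .. B_M
    k = rng.choice(M, size=n_samples, p=priors)              # cluster memberships
    noise = rng.standard_normal((n_samples, N)) * np.sqrt(np.asarray(variances)[k])[:, None]
    xi = np.asarray(ells)[k][:, None] * B[k] + noise         # xi = l_k * B_k + noise
    return xi, k

# example with the parameters used on later slides: p1 = 0.7, p2 = 0.3, v1 = v2 = 1, l1 = l2 = 1
xi, k = sample_mixture(1000, 100, priors=[0.7, 0.3], ells=[1.0, 1.0], variances=[1.0, 1.0])
```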
7
Mathematical analysis
8
Mathematical analysis
In the thermodynamic limit N → ∞ ...
  • the projections h_s = w_s · ξ and b_σ = B_σ · ξ
  • become correlated Gaussian quantities (Central Limit Theorem)

9
Mathematical analysis
3. Derive ordinary differential equations, closed in the order parameters R_{sσ}, Q_{st} (see the sketch below)
  • 4. Solve for R_{sσ}(α), Q_{st}(α)
  • dynamics / asymptotic behavior (α → ∞)
  • generalization error
  • sensitivity to initial conditions, learning rates, structure of the data
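As a sketch of what these order parameters are and how the ODEs look for an update of the form Δw_s = (η/N) f_s (ξ − w_s); the O(η²) contribution to Q_{st} depends on the details of the model and is only indicated schematically here:

```latex
R_{s\sigma} = \mathbf{w}_s \cdot \mathbf{B}_\sigma, \qquad
Q_{st} = \mathbf{w}_s \cdot \mathbf{w}_t, \qquad
\alpha = \mu / N,
\\[4pt]
\frac{dR_{s\sigma}}{d\alpha} = \eta\,\bigl(\langle b_\sigma f_s \rangle - R_{s\sigma}\langle f_s \rangle\bigr),
\qquad
\frac{dQ_{st}}{d\alpha} = \eta\,\bigl(\langle h_s f_t \rangle + \langle h_t f_s \rangle
                          - Q_{st}\langle f_s + f_t \rangle\bigr) + \mathcal{O}(\eta^{2}),
```

with the projections h_s = w_s · ξ, b_σ = B_σ · ξ and ⟨·⟩ denoting the average over the data density.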

10
Results
11
Supervised algorithms
LVQ1: update the winner (the closest prototype) towards or away from the data point, depending on whether its class label agrees with that of the data (w_s: winner; Θ: Heaviside function).
LVQ 2.1: update the two closest prototypes w_J, w_K if c_J ≠ c_K and ξ^µ falls inside a window around the decision boundary (δ: window parameter, 0 < δ ≤ 1).
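A minimal sketch of both update rules, assuming squared Euclidean distances and the common window condition min(d_J/d_K, d_K/d_J) > (1 − δ)/(1 + δ); the exact form of the window is an assumption of this sketch:

```python
import numpy as np

def lvq1_step(xi, label, W, C, eta):
    """LVQ1: move the winner towards the data if the classes agree, away otherwise."""
    d = np.sum((W - xi) ** 2, axis=1)        # squared Euclidean distances
    s = np.argmin(d)                         # winner
    sign = 1.0 if C[s] == label else -1.0
    W[s] += eta * sign * (xi - W[s])
    return W

def lvq21_step(xi, label, W, C, eta, delta):
    """LVQ 2.1: update the two closest prototypes if their classes differ, one of
    them carries the correct class, and xi falls inside the window."""
    d = np.sum((W - xi) ** 2, axis=1)
    J, K = np.argsort(d)[:2]                 # two closest prototypes
    in_window = min(d[J] / d[K], d[K] / d[J]) > (1 - delta) / (1 + delta)
    if C[J] != C[K] and label in (C[J], C[K]) and in_window:
        for idx in (J, K):
            sign = 1.0 if C[idx] == label else -1.0
            W[idx] += eta * sign * (xi - W[idx])
    return W
```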
12
Supervised algorithms
LVQ1: update the winner (the closest prototype) towards or away from the data point (w_s: winner; Θ: Heaviside function).
LVQ+/-: update the two closest prototypes w_J, w_K if their classes are different.
13
LVQ1
Order parameters
A simple model: 2 prototypes, 2 clusters
p_1 = 0.7, p_2 = 0.3, v_1 = v_2 = 1, ℓ_1 = ℓ_2 = 1, B_1 · B_2 = 0
14
Learning rate
(Asymptotic) Performance depends on learning rates
[Plot: generalization error ε_g vs. α for learning rates η = 2, 1, 0.5; ε_g in the limit η → 0; optimal decision boundary shown for reference]
15
LVQ1, 3 prototypes
  • Does adding more prototypes always increase performance?

[Plot: generalization error ε_g; ℓ_1 = ℓ_2 = 1.0, p_1 = 0.7, p_2 = 0.3, v_1 = 0.81, v_2 = 0.25]
16
Asymptotics
[Plots: ε_g vs. variance (p_+ = p_− = 0.5) and ε_g vs. prior p (v_+ = 0.81 > v_− = 0.25)]
The optimal class assignment does not depend on p.
The optimal assignment places more prototypes on the class with larger variance.
17
Multiple prototypes
Many order parameters have to be observed, and further problems arise
18
Computing averages ⟨·⟩
Integrations in N dimensions reduce to low-dimensional Gaussian integrals, depending on the LVQ algorithm:
S = 2: 1-dim. Gaussian integration → analytical
S = 3: 2-dim. Gaussian integration → numerical
[Plot: integration domain in the (y_1, y_2) plane]
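For the 2-dim. case, such a Gaussian average can be computed numerically, for instance with Gauss-Hermite quadrature; the integrand and the covariance below are placeholders, not the actual quantities from the analysis:

```python
import numpy as np

def gaussian_average_2d(g, mean, cov, n=40):
    """Approximate <g(y1, y2)> under a 2-D Gaussian N(mean, cov)
    using a tensor-product Gauss-Hermite rule."""
    x, w = np.polynomial.hermite_e.hermegauss(n)   # nodes/weights for exp(-x^2/2)
    w = w / np.sqrt(2 * np.pi)                     # normalise to the standard normal
    L = np.linalg.cholesky(cov)                    # introduces the correlations
    Y1, Y2 = np.meshgrid(x, x, indexing="ij")
    Z = np.stack([Y1.ravel(), Y2.ravel()])         # standard-normal nodes
    Y = (L @ Z).T + mean                           # transformed nodes
    W = np.outer(w, w).ravel()                     # product weights
    return np.sum(W * g(Y[:, 0], Y[:, 1]))

# example: average of a Heaviside-type indicator under correlated Gaussian variables
avg = gaussian_average_2d(lambda y1, y2: (y1 > y2).astype(float),
                          mean=np.array([0.0, 0.5]),
                          cov=np.array([[1.0, 0.3], [0.3, 1.0]]))
```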
19
More complex models
[Example: configuration with 1 + 1 prototypes, ε_g = 0.1802]
There exists an optimal number of prototypes for LVQ1.
20
LVQ 2.1
  • Without a window, the prototypes diverge because of repulsion
  • But for very high N, the window does not work
  • Alternatives:
  • Early stopping
  • Alternative window for N → ∞
always < s
1 < s < ∞
21
LVQ 2.1 vs LVQ1
[Plot: generalization error ε_g]
  • The window can improve performance (with the right parameter δ)

22
Unsupervised algorithms
  • Clustering problem
  • Minimize quantization error E

(E: distance to the nearest prototype, averaged over the data)
Minimize the quantization error directly
Less sensitive to initialization?
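As a reference point, the empirical quantization error over a finite data set can be written as below; this is a sample-average stand-in for the density average used in the analysis:

```python
import numpy as np

def quantization_error(X, W):
    """Empirical quantization error: mean squared Euclidean distance of each
    data point in X to its nearest prototype in W."""
    d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)   # (n_points, n_prototypes)
    return np.mean(np.min(d, axis=1))
```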
23
Sensitivity to initialization
  • For an initialization far away from the cluster means:

[Plot: quantization error E(W) over the order parameters R_{S1}, R_{S2}, for Neural Gas]
  • WTA
  • some prototypes rarely win
  • Neural Gas
  • fewer prototypes get left behind (update rules sketched below)
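For comparison with WTA, a minimal online Neural Gas step with a rank-based neighborhood might look like this; the exponential neighborhood function and its width λ are the standard choice, assumed here:

```python
import numpy as np

def wta_step(xi, W, eta):
    """Winner-Takes-All: only the closest prototype is updated."""
    d = np.sum((W - xi) ** 2, axis=1)
    s = np.argmin(d)
    W[s] += eta * (xi - W[s])
    return W

def neural_gas_step(xi, W, eta, lam):
    """Neural Gas: every prototype is updated, weighted by its distance rank."""
    d = np.sum((W - xi) ** 2, axis=1)
    ranks = np.argsort(np.argsort(d))          # rank 0 = closest prototype
    h = np.exp(-ranks / lam)                   # neighborhood function
    W += eta * h[:, None] * (xi - W)
    return W
```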

24
Robustness
Winner Takes All: the asymptotic configuration is sensitive to the initial conditions.
Neural Gas: more robust with respect to the initial conditions.
25
Global minima
  • However, Neural Gas is not guaranteed to find the global minimum

p_3 = 0.51, p_1 = p_2 = 0.49, v_1 = v_2 = v_3 = 1
(here initialized at the global minimum) E_∞ < E_0
26
Summary
  • An exact approach to investigate the typical learning behavior of LVQ algorithms for certain data structures, e.g.:
  • LVQ1
  • An optimal number of prototypes exists and depends on the structure of the data
  • More prototypes at the class with larger variance
  • LVQ2.1
  • Windows can slow down divergence
  • Good performance with the right parameter
  • Neural Gas (vs. Winner Takes All)
  • Better for initialization
  • More robust w.r.t. initial conditions
  • Does not guarantee optimal solution

27
Outlook
  • Analysis with more prototypes allows a more general study of LVQ algorithms
  • Extensions
  • LVQ 2.1, Generalized LVQ, Robust Soft LVQ
  • Self Organising Maps
  • Offline (batch) learning
  • An optimal LVQ algorithm, regardless of data
    structure, number of prototypes, etc.?

28
The End
  • Thank you for your attention

29
Central Limit Theorem
The joint density of the projections (h_1, ..., h_S, b_+, b_−) becomes a correlated Gaussian as N → ∞.
[Histograms: Monte Carlo simulations, N = 100, over 10,000 data points]
30
Validity
  • Good agreement with Monte Carlo simulations using the same parameters (N = 200)

31
Self-averaging
Fluctuations decrease with a larger number of degrees of freedom N.
As N → ∞, the fluctuations vanish (the variance becomes zero).
Monte Carlo simulations over 100 independent runs.
32
Optimal class assignment, LVQ1 with 3 prototypes
unequal variances, v_+ > v_−
The best decision boundary is a hyperquadric.
[Plot: class assignments C = (1, 1, −1) (optimal) and C = (1, −1, −1)]
More prototypes can produce lower performance.