Title: Dynamics of Learning VQ and Neural Gas
Slide 1: Dynamics of Learning VQ and Neural Gas
- Aree Witoelar, Michael Biehl
- Mathematics and Computing Science, University of Groningen, Netherlands
- in collaboration with Barbara Hammer (Clausthal) and Anarta Ghosh (Groningen)
Slide 2: Outline
- Vector Quantization (VQ)
- Analysis of (L)VQ Dynamics
- Results
- Multiple prototypes
- Neural Gas
- Learning Vector Quantization
- Summary
Slide 3: Vector Quantization
Slide 4: Vector Quantization
- Objective: representation of (many) data points with (few) prototype vectors
- Find the optimal prototype set W for the lowest quantization error, i.e. the average distance of the data to the nearest prototype
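The quantization error described above can be sketched numerically; `quantization_error` is a hypothetical helper written for illustration, not code from the talk:

```python
import numpy as np

def quantization_error(data, prototypes):
    """Mean squared distance of each sample to its nearest prototype.

    data: (n_samples, N) array; prototypes: (K, N) array.
    """
    # pairwise squared distances, shape (n_samples, K)
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # each sample contributes its distance to the closest prototype
    return d2.min(axis=1).mean()
```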
Slide 5: Example: Winner-Takes-All (WTA)
- prototypes end up in areas with high data density
- (stochastic) on-line gradient descent with respect to a cost function
Slide 6: Problems
Slide 7: Analysis of VQ Dynamics
Slide 8: Model: two Gaussian clusters of high-dimensional data, with classes σ = +1, −1
- random vectors ξ ∈ ℝ^N drawn according to the classes +1, −1
- prior probabilities p₊, p₋ with p₊ + p₋ = 1
- center vectors B₊, B₋ ∈ ℝ^N
- variances v₊, v₋
- separation ℓ
- only separable in 2 dimensions → a simple, but not trivial, model
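A minimal sketch of drawing one example from this mixture model. The function name and the choice of B₊, B₋ as the first two unit vectors are assumptions made here for illustration (the slides only state B₊, B₋ ∈ ℝ^N):

```python
import numpy as np

def sample_example(N, p_plus=0.6, ell=1.0, v_plus=1.5, v_minus=1.0, rng=None):
    """Draw one example xi in R^N from the two-cluster mixture.

    Class sigma = +1 with probability p_plus, else -1; the cluster center
    is ell * B_sigma with isotropic variance v_sigma.  B_+ and B_- are
    taken as the first two unit vectors (orthonormal centers, an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = 1 if rng.random() < p_plus else -1
    center = np.zeros(N)
    center[0 if sigma == 1 else 1] = ell      # ell * B_sigma
    v = v_plus if sigma == 1 else v_minus
    xi = center + np.sqrt(v) * rng.standard_normal(N)
    return xi, sigma
```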
Slide 9: Online learning
- sequence of independent random data ξ ∈ ℝ^N, drawn according to the model density
- prototype vectors w_s ∈ ℝ^N
- update of the prototype vector, controlled by:
  - the learning rate η (step size)
  - a modulation function f_s (strength, direction of the update, etc.), which depends on the prototype class, the data class, and which prototype is the winner
- f_s describes the algorithm used
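For the plain VQ case, one winner-takes-all step of this online update can be sketched as follows (a hypothetical helper; only the nearest prototype moves, i.e. f_s is the winner indicator):

```python
import numpy as np

def wta_update(prototypes, xi, eta=0.01):
    """One Winner-Takes-All step: move only the closest prototype toward xi."""
    d2 = ((prototypes - xi) ** 2).sum(axis=1)
    s = int(d2.argmin())                      # index of the winner
    prototypes[s] += eta * (xi - prototypes[s])
    return s
```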
Slide 10: 1. Define a few characteristic quantities of the system
- projections onto the cluster centers: R_sσ = w_s · B_σ
- lengths and overlaps of the prototypes: Q_st = w_s · w_t
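With prototypes stacked as rows of W and centers as rows of B, these order parameters are plain inner products; a small sketch (function name is illustrative):

```python
import numpy as np

def order_parameters(W, B):
    """Characteristic quantities: R[s, sigma] = w_s . B_sigma and
    Q[s, t] = w_s . w_t (diagonal of Q gives the squared lengths)."""
    R = W @ B.T
    Q = W @ W.T
    return R, Q
```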
Slide 11: 2. Average over examples
- in the thermodynamic limit N → ∞ ...
Slide 12: 3. Derive ordinary differential equations
4. Solve for R_sσ(t), Q_st(t)
- dynamics / asymptotic behavior (t → ∞)
- quantization / generalization error
- sensitivity to initial conditions, learning rates, and the structure of the data
Slide 13: Results
Slide 14: Vector Quantization, 2 prototypes
- characteristic quantities: numerical integration of the ODEs (w_s(0) ≈ 0, p₊ = 0.6, ℓ = 1.0, v₊ = 1.5, v₋ = 1.0, η = 0.01)
- quantization error
Slide 15: 2 prototypes
- projections of the prototypes onto the (B₊, B₋) plane at t = 50
- asymptotic positions (t → ∞): the prototypes lie in the (B₊, B₋) plane
Slide 16: Neural Gas: a winner-take-most algorithm, 3 prototypes
- rank-based update with width λ(t), decreased over time; for λ(t) → 0 it is identical to WTA
- λ_i = 2, λ_f = 10⁻²
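A sketch of the rank-based Neural Gas step and of an annealing schedule for λ(t). The exponential schedule between λ_i and λ_f is a common choice assumed here; the slides only state the endpoint values:

```python
import numpy as np

def neural_gas_update(prototypes, xi, eta=0.01, lam=2.0):
    """One Neural Gas step: all prototypes move, weighted by distance rank."""
    d2 = ((prototypes - xi) ** 2).sum(axis=1)
    ranks = d2.argsort().argsort()          # rank 0 = winner, 1 = runner-up, ...
    h = np.exp(-ranks / lam)                # winner-take-most weighting
    prototypes += eta * h[:, None] * (xi - prototypes)

def annealed_lambda(t, t_max, lam_i=2.0, lam_f=1e-2):
    """lambda(t) decreased over time; as lambda -> 0 the step tends to WTA."""
    return lam_i * (lam_f / lam_i) ** (t / t_max)
```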
Slide 17: Sensitivity to initialization
[Figure: projections R_{S−} vs. R_{S+} at t = 50, for Neural Gas and for WTA; cost E(W) vs. t, showing a plateau for WTA]
Slide 18: Summary of VQ and NG
- WTA
  - (eventually) reaches the minimum of E(W)
  - depends on the initialization; possibly large learning times
- Neural Gas
  - more robust w.r.t. initialization
Slide 19: Learning Vector Quantization
Slide 20: Learning Vector Quantization (LVQ)
- Objective: classification of data using prototype vectors
Slide 21: LVQ1
- update the winner towards the data (same class) or away from the data (different class)
- two prototypes
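The LVQ1 rule above can be sketched as a small variation of the WTA step, with the sign of the update set by whether the winner's class matches the data label (a hypothetical helper for illustration):

```python
import numpy as np

def lvq1_update(prototypes, proto_labels, xi, y, eta=0.01):
    """One LVQ1 step: the winner moves toward xi if its class matches the
    data label y, and away from xi otherwise."""
    d2 = ((prototypes - xi) ** 2).sum(axis=1)
    s = int(d2.argmin())
    sign = 1.0 if proto_labels[s] == y else -1.0
    prototypes[s] += eta * sign * (xi - prototypes[s])
    return s
```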
Slide 22: Generalization error
[Figure: misclassified data per class; generalization error ε_g vs. t; p₊ = 0.6, p₋ = 0.4, v₊ = 1.5, v₋ = 1.0]
Slide 23: Optimal decision boundary
- the (hyper)plane where both classes are equally likely
- equal variances (v₊ = v₋): linear decision boundary
- unequal variances (v₊ > v₋): curved decision boundary
- K = 2
Slide 24: LVQ1, three prototypes
- v₊ > v₋ (v₊ = 0.81, v₋ = 0.25)
[Figure: two panels of generalization error ε_g vs. prior p₊, prototype class labels {1, 1, −1}; annotations: LVQ1 K = 3 better / Optimal K = 3 better, and LVQ1 K = 3 worse / Optimal K = 3 equal to K = 2]
- more prototypes are not always better for LVQ1
- best: more prototypes on the class with the larger variance
Slide 25: Summary
- dynamics of (Learning) Vector Quantization for high-dimensional data
- Neural Gas
  - more robust w.r.t. initialization than WTA
- LVQ1
  - more prototypes are not always better
  - variances matter
Outlook
- study other algorithms, e.g. LVQ+/−, LFM, RSLVQ
- more complex models
- multi-prototype, multi-class problems
Slide 26: Questions?
Slide 27: (no transcript)
Slide 28: Example: LVQ1
Slide 29: Self-averaging
- fluctuations decrease with a larger number of degrees of freedom N
- as N → ∞, the fluctuations vanish (the variance becomes zero)
- Monte Carlo simulations over 100 independent runs
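The self-averaging property can be illustrated with a toy Monte Carlo experiment (an illustration constructed here, not the simulation from the talk): the variance of an overlap-type quantity x = w · ξ / N over random inputs shrinks like 1/N.

```python
import numpy as np

def overlap_fluctuation(N, runs=500, seed=0):
    """Empirical variance of the overlap x = w . xi / N over independent
    random inputs xi ~ N(0, I); it shrinks like 1/N (self-averaging)."""
    rng = np.random.default_rng(seed)
    w = np.ones(N)
    x = rng.standard_normal((runs, N)) @ w / N
    return x.var()
```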
Slide 30: LVQ+/−
- update the correct and the incorrect winner:
  - d_s = min_k d_k with σ_s = σ^μ (closest prototype of the correct class)
  - d_t = min_k d_k with σ_t ≠ σ^μ (closest prototype of an incorrect class)
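A sketch of one LVQ+/− step under these definitions (hypothetical helper; the closest correct prototype is attracted and the closest incorrect one repelled):

```python
import numpy as np

def lvq_pm_update(prototypes, proto_labels, xi, y, eta=0.01):
    """One LVQ+/- step: attract the closest prototype of the correct class
    and repel the closest prototype of any incorrect class."""
    labels = np.asarray(proto_labels)
    d2 = ((prototypes - xi) ** 2).sum(axis=1)
    correct = np.flatnonzero(labels == y)
    wrong = np.flatnonzero(labels != y)
    s = correct[d2[correct].argmin()]   # d_s = min d_k with sigma_s = sigma^mu
    t = wrong[d2[wrong].argmin()]       # d_t = min d_k with sigma_t != sigma^mu
    prototypes[s] += eta * (xi - prototypes[s])
    prototypes[t] -= eta * (xi - prototypes[t])
```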
Slide 31: Comparison of LVQ1 and LVQ2.1
- LVQ2.1: performance depends on the initial conditions
Slide 32: LVQ1, three prototypes
- ℓ = 1.0, v₊ = 0.81, v₋ = 0.25
[Figure: two panels of generalization error ε_g vs. prior p₊, prototype class labels {1, 1, −1}; for v₊ > v₋ three prototypes are better, for v₊ < v₋ three prototypes are worse]