Title: LEARNING VECTOR QUANTIZATION. Presentation by: Mihajlo Grbovic
1 LEARNING VECTOR QUANTIZATION
Presentation by Mihajlo Grbovic
2 Learning Vector Quantization
INTRODUCTION
Learning Vector Quantization (LVQ) was introduced by Kohonen as a simple, universal and efficient learning classifier. LVQ represents a family of algorithms that are widely used in the classification of potentially high-dimensional data. Their popularity and success in numerous applications is closely related to their easy implementation and their intuitively clear approach.
3 Learning Vector Quantization
INTRODUCTION
(Figure: training data set with four classes. Class 1 - green, Class 2 - blue, Class 3 - red, Class 4 - yellow.)
4 Learning Vector Quantization
INTRODUCTION
LVQ's task is to build a model using a training data set. Each test point is labeled based on the label of the closest prototype.
(Figure: LVQ prototypes and test points labeled according to the closest LVQ prototype.)
5 Learning Vector Quantization
INTRODUCTION
LVQ classification is based on the Euclidean distance as a measure of how similar the given data is to the so-called prototypes. The prototypes are determined during the training procedure using a labeled dataset. The idea is to start with some initial positions of the prototypes in the feature space, and then improve them in such a way that in the end they represent the labeled data in the best possible way.
An attractive feature of LVQ is that it can easily be applied to multi-class problems.
Depending on the complexity of the labeled data, we choose the number of prototypes that are involved in the representation of each class. This number can vary from only a single prototype per class (if class separations are simple) to a large number of prototypes per class (if class separations are complex). Also, different classes can involve different numbers of prototypes, depending on their distribution in space.
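To make the nearest-prototype rule concrete, here is a minimal sketch of the classification step in Python/NumPy; the names classify, prototypes and prototype_labels are illustrative and not part of the original slides.

```python
import numpy as np

def classify(x, prototypes, prototype_labels):
    """Assign x the class label of its closest prototype (Euclidean distance).

    x                : (d,)   test point
    prototypes       : (m, d) prototype positions
    prototype_labels : (m,)   class label of each prototype
    """
    distances = np.linalg.norm(prototypes - x, axis=1)  # distance to every prototype
    return prototype_labels[np.argmin(distances)]       # label of the closest one
```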
6 Learning Vector Quantization
INTRODUCTION
During the training procedure, positions of the prototypes are updated based on the distance from the points in the given dataset. Basically, we scan through the dataset and for every point determine the closest prototype. Once the closest prototype is found, it is moved towards (away from) the point if their classes match (differ), respectively. LVQ is an on-line learning algorithm; its computational effort scales linearly with the size of the dataset. Once one scan through the data is finished, the prototypes should be in their optimal positions. However, there are some applications where multiple scans are needed.
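The scan described above can be sketched as a simple loop. Here update_prototype stands in for whichever LVQ update rule is used (LVQ1, LVQ2, LFM, ...); all names are illustrative and X, y, prototypes and prototype_labels are assumed to be NumPy arrays.

```python
import numpy as np

def train_one_scan(X, y, prototypes, prototype_labels, update_prototype):
    """One pass through the labeled data: for every point, find the closest
    prototype and let the chosen LVQ rule move it towards or away from the point."""
    for x, label in zip(X, y):
        distances = np.linalg.norm(prototypes - x, axis=1)
        i = int(np.argmin(distances))                    # closest prototype
        same_class = (prototype_labels[i] == label)
        update_prototype(prototypes, i, x, same_class)   # move it (rule-specific)
    return prototypes
```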
7 Learning Vector Quantization
INTRODUCTION
There are several different LVQ algorithms that deal with the updates of the prototypes in different ways. The three main variants are LVQ1, LVQ2, and LVQ3. There are also LVQ2.1, LFM, LFMW, weighted LVQ, etc.
8 Learning Vector Quantization
LVQ 1
For each training point x(t), all of the reference vectors (prototypes) are searched and the reference vector closest to the point is found, using the Euclidean distance measure. If this reference vector (prototype) mi belongs to the same class as the training point x(t), it is moved closer to the point, in proportion to the distance between the two vectors. If the closest reference vector (prototype) mi belongs to a class other than that of the point x(t), it is moved away, again in proportion to the distance between the two vectors:
mi(t+1) = mi(t) + a(t) (x(t) - mi(t))   if mi and x(t) belong to the same class
mi(t+1) = mi(t) - a(t) (x(t) - mi(t))   if mi and x(t) belong to different classes
where a(t) is a monotonically decreasing function of time.
(Figure: a class-1 training point with a class-1 prototype and a class-2 prototype, illustrating the two update cases.)
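A minimal sketch of the LVQ1 update rule in the same Python/NumPy conventions; the decaying learning-rate schedule is an assumption for illustration (only the initial value 0.03 appears later in the slides).

```python
import numpy as np

def lvq1_update(prototypes, prototype_labels, x, x_label, t, a0=0.03):
    """LVQ1: move the single closest prototype towards x if the classes match,
    away from x if they differ; a(t) decreases monotonically with time."""
    a = a0 / (1.0 + 0.01 * t)                            # assumed decay of a(t)
    distances = np.linalg.norm(prototypes - x, axis=1)
    i = int(np.argmin(distances))
    if prototype_labels[i] == x_label:
        prototypes[i] += a * (x - prototypes[i])         # attract
    else:
        prototypes[i] -= a * (x - prototypes[i])         # repel
    return prototypes
```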
9 Learning Vector Quantization
LVQ2
For a certain training point x(t), three conditions must be met for LVQ2 learning to occur:
1) The closest prototype to x(t) has to be of the wrong class - mi.
2) The next closest prototype to x(t) has to be of the correct class - mj.
3) The training point x(t) must fall inside a small symmetric window defined around the midpoint of mi and mj.
UPDATE STEP
mi(t+1) = mi(t) - a(t) (x(t) - mi(t))
mj(t+1) = mj(t) + a(t) (x(t) - mj(t))
where x(t) is a training vector belonging to
class j, mi is the reference vector for the
incorrect category, mj is the reference vector
for the correct category and a(t) is a
monotonically decreasing function of time.
It can be seen that this scheme assures that the
decision line between the two vectors will
eventually attain a near-optimal position given
the probability distributions of the categories,
namely, the place where the distributions cross.
A common initial value for a(0) is 0.03. Let di and dj be the distances from the training point x(t) to the prototypes mi and mj, respectively. Then x(t) falls inside the window if
min(di/dj, dj/di) > s,
where s is a constant factor, commonly chosen between 0.4 and 0.8.
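Putting the three conditions and the window test together, a sketch of the LVQ2 update step could look as follows; s = 0.6 is only an example value from the stated 0.4-0.8 range, and all names are illustrative.

```python
import numpy as np

def lvq2_update(prototypes, prototype_labels, x, x_label, a, s=0.6):
    """LVQ2: update only when the closest prototype is of the wrong class,
    the second closest is of the correct class, and x lies inside the window."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    i, j = np.argsort(distances)[:2]                     # closest and second closest
    di, dj = distances[i], distances[j]
    conditions = (prototype_labels[i] != x_label and     # 1) closest is wrong
                  prototype_labels[j] == x_label and     # 2) next closest is right
                  min(di / dj, dj / di) > s)             # 3) inside the window
    if conditions:
        prototypes[i] -= a * (x - prototypes[i])         # push the wrong prototype away
        prototypes[j] += a * (x - prototypes[j])         # pull the correct prototype closer
    return prototypes
```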
10 Learning Vector Quantization
LFM
For each training point x(t), all of the reference vectors (prototypes) are searched and the reference vector closest to the point is found, using the Euclidean distance measure. If this reference vector (prototype) belongs to the same class as the training point, do NOTHING! If the closest reference vector (prototype) belongs to a class other than that of the training point, it is moved away, in proportion to the distance between the two vectors. After that, find the closest prototype mj of the same class as the training point. This prototype is then moved closer to the training point, again in proportion to the distance between the two vectors:
mi(t+1) = mi(t) - a(t) (x(t) - mi(t))
mj(t+1) = mj(t) + a(t) (x(t) - mj(t))
(Figure: a class-4 training point with class-1, class-2 and class-4 prototypes, illustrating the LFM update.)
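A sketch of the LFM rule under the same conventions (prototype_labels is assumed to be a NumPy array; names are illustrative):

```python
import numpy as np

def lfm_update(prototypes, prototype_labels, x, x_label, a):
    """LFM: if the closest prototype already has the correct class, do nothing;
    otherwise push it away and pull the closest correct-class prototype closer."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    i = int(np.argmin(distances))
    if prototype_labels[i] == x_label:
        return prototypes                                # correctly classified: do nothing
    prototypes[i] -= a * (x - prototypes[i])             # repel the wrong prototype
    same = np.where(prototype_labels == x_label)[0]      # prototypes of the true class
    j = same[np.argmin(distances[same])]                 # closest among them
    prototypes[j] += a * (x - prototypes[j])             # attract it
    return prototypes
```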
11 PROBLEMS WITH LVQ
12 Problems with LVQ
Some Issues
- How to initialize the positions of the prototypes?
- How many prototypes per class to choose? 10, 20, 30? It depends on the situation.
- Some classes have a more complicated distribution in the feature space than others, so they need more prototypes. How to detect this?
- If the data set is unbalanced (90% of the training data is of class 1 and 10% of class 2), how many prototypes to assign to each class? More of them to class 1 or more of them to class 2?
- As a result of noise, some prototypes end up in positions where they increase the classification error instead of decreasing it. They do more harm than good. (Example: 2 Gaussians in 2D; a prototype initially chosen in the overlap region will remain trapped there.)
- If we are working on a budget (100 prototypes), do we use them all right away, or do we start with a smaller number of prototypes and smartly increase their number during training?
13 Problems with LVQ
Complicated Data Sets
- It can be shown that regular LVQ doesn't cope well with complicated distributions in feature space, even in the 2D case.
- Example:
After 0 LVQ iterations (based on initial prototype positions): accuracy 0.6808, number of misclassified points 3192.
After 30 LVQ iterations: accuracy 0.88, number of misclassified points 1173.
After 60 LVQ iterations: accuracy 0.87, number of misclassified points 1207.
Training data set: 10,000 points, 4 classes. Initially choose 100 points of each class as prototypes.
14 Problems with LVQ
Complicated Data Sets
- Why are these points misclassified after so many iterations? There must be learning going on, but nevertheless these points remain misclassified.
- No matter how much these points pull the prototypes of the correct class towards them, the prototypes never seem to arrive.
- There is a simple explanation for this: some other points are dragging the prototypes back so that they themselves remain correctly classified. This means we don't have enough prototypes.
- So we come to the conclusion that we have to add some more prototypes at certain places.
15 Adaptive LVQ
LVQ add / LVQ remove
- We introduced a novel modification of LVQ called Adaptive LVQ.
- The idea is to start with an equal initial number of prototypes per class, then add prototypes to better describe more complicated class regions and remove prototypes that increase the classification error instead of decreasing it.
- We add two steps at the end of every LVQ iteration: LVQremove and LVQadd.
16 Adaptive LVQ
LVQ ADD
- LVQadd concentrates on the misclassified points of each class during LVQ training.
- Using hierarchical clustering we find whole clusters of such points that are misclassified due to an insufficient number of prototypes of that class.
- Then, we add prototypes at the positions of the cluster centroids to improve classification accuracy.
- We can control the size of the clusters we want to take into consideration and the number of prototypes we are allowed to add.
17 Adaptive LVQ
LVQ ADD
- First we isolate the training points that are misclassified by the existing prototypes.
- Then we concentrate on each class separately to find clusters of misclassified points and determine their centroids.
(Figure: misclassified points of class 1, class 2, etc., with "interesting" and "not interesting" clusters marked.)
18 Adaptive LVQ
LVQ ADD
- We are not interested in small clusters of data. We can control the sensitivity of our algorithm (for example, consider only clusters with 4 or more points).
- After LVQadd, the new prototypes are added to the existing ones (a code sketch of LVQadd follows below).
- There is usually some budget involved. Let's say we start with 10 prototypes all together; we can set a limiting budget of 50 prototypes.
- So if LVQadd has already added 40 prototypes during the first 30 iterations, in order to add more it has to wait for LVQremove to remove some of them.
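A possible sketch of LVQadd for one class, assuming SciPy's agglomerative (hierarchical) clustering; the distance threshold, the minimum cluster size of 4 and the function name lvq_add are illustrative choices, not values prescribed by the slides.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def lvq_add(misclassified, max_new, min_cluster_size=4, dist_threshold=1.0):
    """Cluster the misclassified points of one class and return the centroids
    of the sufficiently large clusters as candidate new prototypes."""
    if len(misclassified) < min_cluster_size or max_new <= 0:
        return np.empty((0, misclassified.shape[1]))
    Z = linkage(misclassified, method='ward')                      # hierarchical clustering
    cluster_ids = fcluster(Z, t=dist_threshold, criterion='distance')
    centroids = []
    for c in np.unique(cluster_ids):
        members = misclassified[cluster_ids == c]
        if len(members) >= min_cluster_size:                       # ignore small clusters
            centroids.append(members.mean(axis=0))                 # centroid -> new prototype
    centroids = centroids[:max_new]                                # respect the budget
    if not centroids:
        return np.empty((0, misclassified.shape[1]))
    return np.vstack(centroids)
```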
19 Adaptive LVQ
LVQ REMOVE
- LVQremove is introduced to deal with outcast prototypes, trapped prototypes, and prototypes that are stuck in positions where they classify more training points incorrectly than correctly.
- This can also happen to the prototypes added as a result of LVQadd.
- We gather statistics about each prototype during LVQ training and combine these statistics into a unique prototype score.
- For each prototype i:
Scorei = Ai - Bi + Ci
Ai counts how many times prototype i classified correctly (and hasn't been moved); Bi counts how many times prototype i has been moved away as a prototype of the wrong class; Ci counts how many times prototype i has been moved towards as a prototype of the correct class.
20 Adaptive LVQ
LVQ REMOVE
- Prototypes with a negative score increase the classification error instead of decreasing it, and as a result they are removed (a sketch follows below).
- Based on the SCORE, a prototype is a good prototype if it has to be moved a small number of times AND it classifies correctly a large number of times. It is STABLE!
- The purpose of LVQremove is to detect bad prototypes and remove them:
1. Outcast prototypes - never, or almost never, selected as the closest prototype. They have small Ai and small Ci and are not influencing any point. These prototypes can be removed simply, without the use of the SCORE.
2. Prototypes that are too close to one another - we merge them.
3. Trapped prototypes - selected as the closest prototype a large number of times, but they usually misclassify. They have large Bi and small Ai. They can never escape their destiny and will always be moved around (the 2D Gauss case).
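A minimal sketch of the score-based removal, assuming the counters Ai, Bi, Ci are kept as NumPy arrays updated during training; the merging of near-duplicate prototypes is left out, and all names are illustrative.

```python
import numpy as np

def lvq_remove(prototypes, prototype_labels, A, B, C):
    """Drop every prototype whose score Ai - Bi + Ci is negative, i.e. a
    prototype that hurts classification more than it helps."""
    score = A - B + C
    keep = score >= 0                        # negative-score prototypes are removed
    return (prototypes[keep], prototype_labels[keep],
            A[keep], B[keep], C[keep])
```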
21 Adaptive LVQ
IMPLEMENTATION
- LVQadd and LVQremove together form Adaptive LVQ, which can be applied to any algorithm in the LVQ family (with slight adjustments).
- For example: LVQ2 + LVQadd + LVQremove = Adaptive LVQ2.
- LVQremove and LVQadd are performed after each LVQ iteration, respectively (see the skeleton after this list).
- Adaptive LVQ has many interesting applications. We can use it to:
  - form a multi-class BUDGET classification algorithm
  - determine which class needs more prototypes and which needs fewer
  - determine how many prototypes are enough for good classification
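Taken together, one possible skeleton of the Adaptive LVQ loop looks like this; the three callables stand for the building blocks sketched earlier (with simplified signatures), and the iteration count and budget are example values only.

```python
def adaptive_lvq(X, y, prototypes, labels, lvq_scan, lvq_remove, lvq_add,
                 n_iterations=30, budget=100):
    """Adaptive LVQ skeleton: a regular LVQ pass over the data, followed by
    the remove and add steps after every iteration, within a prototype budget."""
    for t in range(n_iterations):
        prototypes, labels = lvq_scan(X, y, prototypes, labels, t)   # LVQ1/LVQ2/LFM pass
        prototypes, labels = lvq_remove(prototypes, labels)          # drop harmful prototypes
        if len(prototypes) < budget:                                 # stay within the budget
            prototypes, labels = lvq_add(X, y, prototypes, labels,
                                         max_new=budget - len(prototypes))
    return prototypes, labels
```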
22 EXPERIMENTS AND RESULTS
23 RESULTS
COMPLICATED 2D CASE
- We use the same data set as before. This time we start with 20 training points (5 of each class) as initial prototypes and use Adaptive LVQ to build a model.
- Our limit is 100 prototypes, since we did the previous experiments with this number of prototypes.
After 30 LVQ iterations: accuracy 0.982, number of misclassified points 174.
Training data set: 10,000 points, 4 classes.
24 RESULTS
MAJOR DATA SETS
- We compared Adaptive LVQ to regular LVQ in classification results on 10 major data sets.
- Adaptive LVQ brings a 6.4% accuracy improvement on average.
DATA SET     classes  prototypes  LVQ2 only  LVQ2 + LVQ add/remove
Adult           2        100       0.8017     0.8136
Letter         26        858       0.7750     0.8968
Usps           10        280       0.8854     0.9253
Shuttle         7        105       0.8974     0.9954
Ijcnn           2        242       0.7633     0.9339
Pendigits       2         84       0.9703     0.9834
Gauss          56         52       0.8842     0.9152
Ionosphere      2         14       0.906      0.9602
Iris            3          6       0.9133     0.9666
Vowel          11         82       0.4480     0.4913
AVERAGE                            0.82446    0.88817