FEATURE WEIGHTING THROUGH A GENERALIZED LEAST SQUARES ESTIMATOR

1
FEATURE WEIGHTING THROUGH A GENERALIZED LEAST SQUARES ESTIMATOR
J.M. Sotoca (Pattern Recognition in Information Systems, PRIS, 2003)
2
Feature selection process with validation
[Flowchart: the original set of features passes through Selection to a
subset of features; Evaluation measures the goodness of the subset; a
stopping criterion either loops back to Selection (no) or sends the
selected subset on to Validation (yes).]
3
Filter and wrapper methods
4
Validation weighting-selection
5
Comparison of feature weighting methods
  • Nearest hit: for each instance x, search for the
    nearest neighbour with the same class.
  • Nearest miss: for each instance x, search for the
    nearest neighbour with a different class.
  • ReliefF algorithm (Kononenko, 1994)
  • For each feature, this algorithm computes, over m
    randomly selected instances of the TS, the
    difference between the nearest miss and the nearest
    hit. ReliefF is an extension for multi-class data
    sets (see the sketch after this list).
  • Class Weighted-L2 (CW_L2) (Paredes and Vidal, 2000)
  • This method obtains a set of weights (one weight
    per attribute and class) by gradient-descent
    minimisation of a criterion function based on the
    ratio of the nearest-hit distance to the
    nearest-miss distance.
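As a concrete illustration of the nearest-hit/nearest-miss search and the Relief-style weight update, here is a minimal NumPy sketch. It implements only the basic two-class Relief update (ReliefF's multi-class averaging over several nearest misses per class is omitted), and the function names are ours:

```python
import numpy as np

def nearest_hit_miss(X, y, idx):
    """Indices of the nearest hit (same class) and nearest miss
    (different class) of instance X[idx], by Euclidean distance."""
    dist = np.linalg.norm(X - X[idx], axis=1)
    dist[idx] = np.inf                       # exclude the instance itself
    same = (y == y[idx])
    hit = int(np.argmin(np.where(same, dist, np.inf)))
    miss = int(np.argmin(np.where(~same, dist, np.inf)))
    return hit, miss

def relief_weights(X, y, m, seed=0):
    """Relief-style weights: for m randomly drawn instances of the TS,
    accumulate the per-feature difference |x - miss| - |x - hit|."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for idx in rng.choice(len(y), size=m, replace=False):
        hit, miss = nearest_hit_miss(X, y, idx)
        w += np.abs(X[idx] - X[miss]) - np.abs(X[idx] - X[hit])
    return w / m
```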

6
The Generalized Least Squares (GLS) estimator
  • Initialisation:
  • w_i = 1.0.
  • n = (d x K) + 2 is the number of observations for
    each instance x.
  • Q_ll is set equal to the identity matrix, assuming
    isotropic error in the observations ℓ.
  • In each iteration t, do (see the sketch after this
    list):
  • Calculate the matrices A, B, Q_ww = B Q_ll B^T and
    the vector of residual functions W.
  • Calculate the new weights w_t.
  • Until the residual or the leaving-one-out error
    rate is minimal.
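A minimal sketch of one GLS update in the usual normal-equations form, assuming A and B are the Jacobians of the residual functions W with respect to the weights and the observations respectively; the sign convention of the correction is an assumption, not taken from the paper:

```python
import numpy as np

def gls_step(A, B, Q_ll, W, w_prev):
    """One generalized least squares update of the weight vector.
    A     : Jacobian of the residual functions w.r.t. the weights w
    B     : Jacobian of the residual functions w.r.t. the observations
    Q_ll  : cofactor matrix of the observations (identity if isotropic)
    W     : vector of residual functions at the current (w_prev, obs)"""
    Q_ww = B @ Q_ll @ B.T                      # propagate observation errors
    Qi = np.linalg.inv(Q_ww)
    N = A.T @ Qi @ A                           # normal-equations matrix
    dw = np.linalg.solve(N, A.T @ Qi @ W)      # weighted LS correction
    return w_prev - dw                         # sign convention assumed
```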

7
A class-intensity-based model
  • Class intensity: the sum of the influences of each
    neighbour p_k, with class label c(p_k), over an
    instance x of the Training Set (TS). This influence
    is inversely proportional to the squared distance D.
  • w: the weights vector, i.e. the parameters of the
    model.
  • ℓ: the observations vector in the TS. It is formed
    by the set of d x K differences that take part in
    the neighbourhood, where K is the number of
    neighbours and d is the number of dimensions.
  • The class charge C is defined for each class label.
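The slide's formulas appeared as images and are not preserved in the transcript. A plausible reconstruction of the class intensity from the text above, with the class charge C(·) left abstract, would be:

```latex
E_x(w,\ell) \;=\; \sum_{k=1}^{K} \frac{C\bigl(c(p_k)\bigr)}{D^2(x, p_k; w)}
```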

8
A class-intensity-based model
  • The squared criterion distance D is normalised per
    feature, where max(x_i) and min(x_i) are the
    maximum and minimum of feature i.
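The original equation was an image; assuming min-max normalisation of each feature and one weight per feature (the squared placement of the weight is our assumption), a plausible reconstruction is:

```latex
D^2(x, p; w) \;=\; \sum_{i=1}^{d} w_i^2
  \left( \frac{x_i - p_i}{\max(x_i) - \min(x_i)} \right)^{2}
```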

9
Feature Weight Estimation
  • For each instance x ∈ TS, a criterion function is
    minimised in which E_x1(w, ℓ) is the class
    intensity at the current iteration and E_x2(w_a, ℓ)
    is the class intensity when all neighbours have the
    same class label; w_a is the weights vector
    obtained by the model in the previous iteration.
  • The model parameters w = (w_1, ..., w_d) in the
    d-dimensional feature space capture the relevance
    of the features.
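The criterion function itself was an image on the slide; given the two intensities above, a natural squared-error form, summed over the training set, would be (an assumption, not the paper's exact formula):

```latex
\min_{w} \; \sum_{x \in TS} \bigl( E_{x1}(w,\ell) - E_{x2}(w_a,\ell) \bigr)^{2}
```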

10
Feature Weight Estimation
  • The observations vector is the set of all ℓ_ki,
    k = 1, ..., K, i = 1, ..., d. We also add E_x1 and
    E_x2 to our observations over the instance x.
  • The vector of residual functions W is defined over
    these observations.

11
Descriptions of the data sets
  • The main characteristics are summarised in the
    table (the number of irrelevant features is given
    in brackets).
  • Six artificial databases (Led17, Monk 1-3, Waveform
    and Waveform40) have been chosen to evaluate
    performance under controlled conditions.

12
Empirical Results
  • Validation with the k-NN classifier rule. We write
    (w_i = 1.0) for the non-weighted k-NN
    classification.
  • The first five columns correspond to the results
    obtained with the 1-NN rule, while the last columns
    are those of the best k-NN classifiers (1 ≤ k ≤ 21).
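A minimal NumPy sketch of this validation, using the leave-one-out error of a k-NN rule over feature-weighted Euclidean distances (scaling each feature by its weight is one common way to apply the weights; the function name is ours):

```python
import numpy as np

def loo_knn_error(X, y, w, k=1):
    """Leave-one-out error rate of a k-NN classifier on
    feature-weighted Euclidean distances; w = 1 gives the
    plain (non-weighted) k-NN classification."""
    Xw = X * w                          # scale every feature by its weight
    errors = 0
    for i in range(len(y)):
        dist = np.linalg.norm(Xw - Xw[i], axis=1)
        dist[i] = np.inf                # leave the test point out
        nn = np.argsort(dist)[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        errors += (labels[np.argmax(counts)] != y[i])
    return errors / len(y)
```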

13
Learning capability
14
Concluding remarks
  • A new feature weighting method has been introduced.
    It basically consists of minimising a criterion
    function through generalised least squares (GLS).
  • The behaviour of the GLS algorithm proposed here is
    similar to that of the well-known ReliefF approach.
  • Studying the learning rates of the ReliefF and GLS
    models, both obtain good results in the presence of
    irrelevant attributes, while GLS is able to obtain
    better results when all attributes are relevant.

15
Further works
  • Movement of the set of observed data ?.
  • Detection of outliers.
  • Simultaneous fit of multiple models. Feature
    selection by class.