FEATURE WEIGHTING THROUGH A GENERALIZED LEAST SQUARES ESTIMATOR

1
FEATURE WEIGHTING THROUGH A GENERALIZED LEAST SQUARES ESTIMATOR
J.M. Sotoca (Pattern Recognition in Information Systems, PRIS, 2003)
2
Feature selection process with validation
[Flowchart: the original set of features passes through Selection to a
subset of features; Evaluation measures the goodness of the subset; a
stopping criterion either loops back to Selection (no) or sends the
selected subset on to Validation (yes).]
3
Filter and wrapper methods
4
Validation weighting-selection
5
Comparison of feature weighting methods
  • Nearest hit: for each instance x, search for the
    nearest neighbour with the same class.
  • Nearest miss: for each instance x, search for the
    nearest neighbour with a different class.
  • ReliefF algorithm (Kononenko, 1994)
  • For each feature, this algorithm computes, over m
    randomly selected instances of the TS, the
    difference between the nearest miss and the nearest
    hit. ReliefF is an extension for multi-class data
    sets (see the sketch after this list).
  • Class Weighted-L2 (CW_L2) (Paredes and Vidal, 2000)
  • This method obtains a set of weights (one weight
    per attribute and class) by gradient-descent
    minimisation of a criterion function based on the
    ratio of the nearest-hit distance to the
    nearest-miss distance.
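As a concrete illustration of the nearest-hit/nearest-miss search and the Relief-style weight update, here is a minimal NumPy sketch. It implements only the basic two-class Relief update (ReliefF's multi-class averaging over several nearest misses per class is omitted), and the function names are ours:

```python
import numpy as np

def nearest_hit_miss(X, y, idx):
    """Indices of the nearest hit (same class) and nearest miss
    (different class) of instance X[idx], by Euclidean distance."""
    dist = np.linalg.norm(X - X[idx], axis=1)
    dist[idx] = np.inf                       # exclude the instance itself
    same = (y == y[idx])
    hit = int(np.argmin(np.where(same, dist, np.inf)))
    miss = int(np.argmin(np.where(~same, dist, np.inf)))
    return hit, miss

def relief_weights(X, y, m, seed=0):
    """Relief-style weights: for m randomly drawn instances of the TS,
    accumulate the per-feature difference |x - miss| - |x - hit|."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for idx in rng.choice(len(y), size=m, replace=False):
        hit, miss = nearest_hit_miss(X, y, idx)
        w += np.abs(X[idx] - X[miss]) - np.abs(X[idx] - X[hit])
    return w / m
```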

6
The Generalized Least Squares (GLS) estimator
  • Initialisation:
  • w_i = 1.0.
  • n = (d x K) + 2 is the number of observations for
    each instance x.
  • Q_ll is set equal to the identity matrix, assuming
    isotropic error in the observations ℓ.
  • In each iteration t, do (see the sketch after this
    list):
  • Calculate the matrices A, B, Q_ww = B Q_ll B^T and
    the vector of residual functions W.
  • Calculate the new weights w_t.
  • Until the residual or the leaving-one-out error
    rate is minimal.
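A minimal sketch of one GLS update in the usual normal-equations form, assuming A and B are the Jacobians of the residual functions W with respect to the weights and the observations respectively; the sign convention of the correction is an assumption, not taken from the paper:

```python
import numpy as np

def gls_step(A, B, Q_ll, W, w_prev):
    """One generalized least squares update of the weight vector.
    A     : Jacobian of the residual functions w.r.t. the weights w
    B     : Jacobian of the residual functions w.r.t. the observations
    Q_ll  : cofactor matrix of the observations (identity if isotropic)
    W     : vector of residual functions at the current (w_prev, obs)"""
    Q_ww = B @ Q_ll @ B.T                      # propagate observation errors
    Qi = np.linalg.inv(Q_ww)
    N = A.T @ Qi @ A                           # normal-equations matrix
    dw = np.linalg.solve(N, A.T @ Qi @ W)      # weighted LS correction
    return w_prev - dw                         # sign convention assumed
```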

7
A class-intensity-based model
  • Class intensity: the sum of the influences of each
    neighbour p_k, with class label c(p_k), over an
    instance x of the Training Set (TS). This influence
    is inversely proportional to the squared distance D.
  • w: the weights vector, i.e. the parameters of the
    model.
  • ℓ: the observations vector in the TS. It is formed
    by the set of d x K differences that take part in
    the neighbourhood, where K is the number of
    neighbours and d is the number of dimensions.
  • The class charge C is defined for each class label.
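The slide's formulas appeared as images and are not preserved in the transcript. A plausible reconstruction of the class intensity from the text above, with the class charge C(·) left abstract, would be:

```latex
E_x(w,\ell) \;=\; \sum_{k=1}^{K} \frac{C\bigl(c(p_k)\bigr)}{D^2(x, p_k; w)}
```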

8
A class-intensity-based model
  • The squared criterion distance D is normalised per
    feature, where max(x_i) and min(x_i) are the
    maximum and minimum of feature i.
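The original equation was an image; assuming min-max normalisation of each feature and one weight per feature (the squared placement of the weight is our assumption), a plausible reconstruction is:

```latex
D^2(x, p; w) \;=\; \sum_{i=1}^{d} w_i^2
  \left( \frac{x_i - p_i}{\max(x_i) - \min(x_i)} \right)^{2}
```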

9
Feature Weight Estimation
  • For each instance x ∈ TS, a criterion function is
    minimised in which E_x1(w, ℓ) is the class
    intensity at the current iteration and E_x2(w_a, ℓ)
    is the class intensity when all neighbours have the
    same class label; w_a is the weights vector
    obtained by the model in the previous iteration.
  • The model parameters w = (w_1, ..., w_d) in the
    d-dimensional feature space capture the relevance
    of the features.
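The criterion function itself was an image on the slide; given the two intensities above, a natural squared-error form, summed over the training set, would be (an assumption, not the paper's exact formula):

```latex
\min_{w} \; \sum_{x \in TS} \bigl( E_{x1}(w,\ell) - E_{x2}(w_a,\ell) \bigr)^{2}
```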

10
Feature Weight Estimation
  • The observations vector is the set of all ℓ_ki,
    k = 1, ..., K, i = 1, ..., d. We also add E_x1 and
    E_x2 to our observations over the instance x.
  • The vector of residual functions W is defined over
    these observations.

11
Descriptions of the data sets
  • The main characteristics are summarised in the
    table (the number of irrelevant features is given
    in brackets).
  • Six artificial databases (Led17, Monk 1-3, Waveform
    and Waveform40) have been chosen to evaluate
    performance under controlled conditions.

12
Empirical Results
  • Validation with the k-NN classifier rule. We write
    (w_i = 1.0) for the non-weighted k-NN
    classification.
  • The first five columns correspond to the results
    obtained with the 1-NN rule, while the last columns
    are those of the best k-NN classifiers (1 ≤ k ≤ 21).
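A minimal NumPy sketch of this validation, using the leave-one-out error of a k-NN rule over feature-weighted Euclidean distances (scaling each feature by its weight is one common way to apply the weights; the function name is ours):

```python
import numpy as np

def loo_knn_error(X, y, w, k=1):
    """Leave-one-out error rate of a k-NN classifier on
    feature-weighted Euclidean distances; w = 1 gives the
    plain (non-weighted) k-NN classification."""
    Xw = X * w                          # scale every feature by its weight
    errors = 0
    for i in range(len(y)):
        dist = np.linalg.norm(Xw - Xw[i], axis=1)
        dist[i] = np.inf                # leave the test point out
        nn = np.argsort(dist)[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        errors += (labels[np.argmax(counts)] != y[i])
    return errors / len(y)
```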

13
Learning capability
14
Concluding remarks
  • A new feature weighting method has been introduced.
    It basically consists of minimising a criterion
    function through generalised least squares (GLS).
  • The behaviour of the GLS algorithm proposed here is
    similar to that of the well-known ReliefF approach.
  • Studying the learning rates of the ReliefF and GLS
    models, both obtain good results in the presence of
    irrelevant attributes, while GLS is able to obtain
    better results when all attributes are relevant.

15
Further works
  • Movement of the set of observed data ?.
  • Detection of outliers.
  • Simultaneous fit of multiple models. Feature
    selection by class.