36x48 vertical poster template - PowerPoint PPT Presentation

Filter Support Vector Machine for NIPS 2003 Challenge
Jiwen Li
Department of Informatics, University of Zurich
The NIPS 2003 challenge was organized to find feature extraction
algorithms that can improve the performance of learning machines on
five well-chosen datasets. In this project, we show that, using single
filter methods, learning performance can match or outperform that of
the best entries in NIPS 2003. We mainly used TP, S2N and Relief as
filter methods, and a support vector machine as the classifier. The
experimental results show that this simple combination is sufficient to
obtain a dimensionality reduction comparable to what the winners
obtained while keeping learning performance similar to theirs.
Support Vector Machine
Support vector machines are learning machines that can perform binary
classification and regression estimation tasks. Two results make this
approach successful:
- The decision function depends only on the support vectors.
- The n-dimensional input space is mapped non-linearly into a
  high-dimensional feature space.

Filter Methods
The T-statistic compares the means of the two Gaussian distributions in
projection on a given variable. The T-statistic follows a Student
distribution with m1 + m2 - 2 degrees of freedom, where m1 and m2 are
the numbers of examples of each distribution, and s1 and s2 are the
estimates of the standard deviations of the distributions obtained from
the available examples.

The S2N coefficient measures the ratio of the "signal" (the difference
between the mean values of the two classes) to the "noise" (the
within-class standard deviation).

Relief is based on feature weighting: it estimates how well the value
of a given feature helps to distinguish between instances that are near
to each other. For a randomly selected sample x, two nearest neighbors
are found: xs from the same class and xd from a different class. The
Relief relevance index for feature X is then increased by a small
amount proportional to the difference |X(x) - X(xd)| and decreased by a
small amount proportional to |X(x) - X(xs)|. After a large number of
iterations, this index captures local correlations between feature
values and their ability to help discriminate vectors from different
classes.
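The three filter criteria described above can be sketched in plain Python. This is an illustrative sketch, not the poster's implementation: the pooled form of the T-statistic (chosen because it matches the m1 + m2 - 2 degrees of freedom), the s1 + s2 denominator for S2N, the L1 distance used to find neighbors, and all function names are assumptions.

```python
import math
import random
from statistics import mean, stdev


def t_statistic(a, b):
    """Pooled two-sample t-statistic for one feature (a, b: values per class)."""
    m1, m2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)
    # Assumption: pooled variance, consistent with m1 + m2 - 2 degrees of freedom.
    pooled_var = ((m1 - 1) * s1 ** 2 + (m2 - 1) * s2 ** 2) / (m1 + m2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / m1 + 1 / m2))


def s2n(a, b):
    """Signal-to-noise ratio: |mean difference| over the summed class std devs."""
    # Assumption: the "noise" term is s1 + s2.
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))


def relief(X, y, n_iter=100, seed=0):
    """Relief relevance index per feature for two-class labels y."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])

    def dist(u, v):
        # Assumption: L1 distance for the nearest-neighbor search.
        return sum(abs(p - q) for p, q in zip(u, v))

    for _ in range(n_iter):
        i = rng.randrange(len(X))
        same = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        diff = [k for k in range(len(X)) if y[k] != y[i]]
        hit = X[min(same, key=lambda k: dist(X[i], X[k]))]   # nearest, same class
        miss = X[min(diff, key=lambda k: dist(X[i], X[k]))]  # nearest, other class
        for j in range(len(weights)):
            # Reward separation from the other class, penalize spread within the class.
            weights[j] += (abs(X[i][j] - miss[j]) - abs(X[i][j] - hit[j])) / n_iter
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [0.9, 0.2], [1.1, 0.5]]
y = [0, 0, 0, 1, 1, 1]
scores = relief(X, y)
```

On the toy data the informative feature receives a positive Relief index and the noise feature a negative one, so ranking features by score and keeping the top f_max would retain feature 0.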
NIPS 2003 Datasets
ARCENE (cancer diagnosis)
- features: 10000
- training data: 100
- validation data: 100
- test data: 700
GISETTE (digit recognition)
- features: 5000
- training data: 6000
- validation data: 1000
- test data: 6500
DEXTER (text categorization)
- features: 20000
- training data: 300
- validation data: 300
- test data: 2000
DOROTHEA (drug discovery)
- features: 100000
- training data: 800
- validation data: 350
- test data: 800
MADELON (artificial data)
- features: 500
- training data: 2000
- validation data: 600
- test data: 1800
Experiment Methods
ARCENE
  my_svc = svc('coef0=2', 'degree=3', 'gamma=0', 'shrinkage=0.1')
  my_model = chain(relief('f_max=1400'), normalize, my_svc)
  with 5-fold cross-validation
GISETTE
  my_svc = svc('coef0=1', 'degree=5', 'gamma=0', 'shrinkage=1')
  my_model = chain(normalize, match_filter(pca_bank('f_max=50')), my_svc)
  with 5-fold cross-validation
DEXTER
  my_svc = svc('coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5')
  my_model = chain(s2n('f_max=4500'), normalize, my_svc)
  with 5-fold cross-validation
DOROTHEA
  my_model = chain(TP('f_max=15000'), normalize, relief('f_max=700'), naive, bias)
MADELON
  my_svc = svc('coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1')
  my_model = chain(probe(relief, 'p_num=2000', 'f_max=15'), standardize, my_svc)
  with 10-fold cross-validation
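To make the svc hyperparameters above easier to read, here is a small Python sketch of one plausible interpretation. It assumes the kernel convention k(x, y) = (coef0 + x.y)^degree * exp(-gamma * ||x - y||^2), with shrinkage acting as a ridge term added to the diagonal of the kernel matrix. The poster does not state this convention, so treat both the formula and the function names as assumptions.

```python
import math


def kernel(x, y, coef0=1.0, degree=1, gamma=0.0):
    """Assumed form: (coef0 + <x, y>)^degree * exp(-gamma * ||x - y||^2)."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return (coef0 + dot) ** degree * math.exp(-gamma * sq_dist)


def gram_matrix(X, shrinkage=0.0, **kernel_params):
    """Kernel matrix with `shrinkage` added to the diagonal (assumed ridge term)."""
    n = len(X)
    K = [[kernel(X[i], X[j], **kernel_params) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += shrinkage
    return K


X = [[1.0, 0.0], [0.0, 1.0]]
# ARCENE-like setting: gamma=0 leaves a pure polynomial kernel of degree 3.
K_poly = gram_matrix(X, shrinkage=0.1, coef0=2, degree=3, gamma=0)
# MADELON-like setting: degree=0 leaves a pure Gaussian (RBF) kernel.
K_rbf = gram_matrix(X, shrinkage=1, coef0=1, degree=0, gamma=1)
```

Under this reading, the ARCENE, GISETTE and DEXTER settings (gamma=0) use polynomial kernels of degree 3, 5 and 1, while the MADELON setting (degree=0, gamma=1) uses a Gaussian kernel.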
Experiment Results

Dataset    Selected features  Training BER  Validation BER  Test BER
ARCENE     1400               0.0089        0.0000          0.1048
GISETTE    -                  0.0000        0.0000          0.0103
DEXTER     4500               0.0000        0.0000          0.0325
DOROTHEA   700                0.0258        0.1015          0.0930
MADELON    15                 0.0080        0.0050          0.0667
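BER in the table is read here as the balanced error rate used to score the NIPS 2003 challenge: the average of the error rates on the positive and the negative class. A minimal Python sketch, assuming labels coded as +1 and -1 (the function name and the toy labels are illustrative, not taken from the poster):

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive and negative classes."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    err_pos = sum(p != 1 for p in pos) / len(pos)
    err_neg = sum(p != -1 for p in neg) / len(neg)
    return 0.5 * (err_pos + err_neg)


# Imbalanced toy case: 2 positives, 8 negatives, classifier always predicts -1.
y_true = [1, 1] + [-1] * 8
y_pred = [-1] * 10
ber = balanced_error_rate(y_true, y_pred)
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the plain error rate is 0.2 while the BER is 0.5, which is why BER is the more informative score on imbalanced data.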