Transcript and Presenter's Notes

Title: RESULTS OF THE WCCI 2006


1
  • RESULTS OF THE WCCI 2006
  • PERFORMANCE PREDICTION
  • CHALLENGE
  • Isabelle Guyon
  • Amir Reza Saffari Azar Alamdari
  • Gideon Dror

2
Part I
  • INTRODUCTION

3
Model selection
  • Selecting models (neural net, decision tree, SVM, ...)
  • Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, ...)
  • Selecting variables or features (space dimensionality reduction).
  • Selecting patterns (data cleaning, data reduction, e.g. by clustering).

4
Performance prediction
  • How good are you at predicting how good you are?
  • Practically important in pilot studies.
  • Good performance predictions render model
    selection trivial.

5
Why a challenge?
  • Stimulate research and push the state-of-the art.
  • Move towards fair comparisons and give a voice to
    methods that work but may not be backed up by
    theory (yet).
  • Find practical solutions to true problems.
  • Have fun!

6
History
  • USPS/NIST.
  • Unipen (with Lambert Schomaker): 40 institutions share 5 million handwritten characters.
  • KDD cup, TREC, CASP, CAMDA, ICDAR, etc.
  • NIPS challenge on unlabeled data.
  • Feature selection challenge (with Steve Gunn): a success! 75 entrants, thousands of entries.
  • Pascal challenges.
  • Performance prediction challenge.
[Timeline: 1980-2005]
7
Challenge
  • Date started: Friday September 30, 2005.
  • Date ended: Monday March 1, 2006.
  • Duration: 21 weeks.
  • Estimated number of entrants: 145.
  • Number of development entries: 4228.
  • Number of ranked participants: 28.
  • Number of ranked submissions: 117.

8
Datasets
  Type           Dataset  Domain           Features  Training ex.  Validation ex.  Test ex.
  Dense          ADA      Marketing              48          4147             415     41471
  Dense          GINA     Digits                970          3153             315     31532
  Dense          HIVA     Drug discovery       1617          3845             384     38449
  Sparse binary  NOVA     Text classif.       16969          1754             175     17537
  Dense          SYLVA    Ecology               216         13086            1308    130858

http://www.modelselect.inf.ethz.ch/
9
BER distribution
[Figure: distribution of the test BER (x-axis: test BER)]
10
Results
  • Overall winners for ranked entries:
  • Ave rank: Roman Lutz with LB tree mix cut adapted
  • Ave score: Gavin Cawley with Final #2
  • ADA: Marc Boullé with SNB(CMA) 10k F(2D) tv or SNB(CMA) 100k F(2D) tv
  • GINA: Kari Torkkola & Eugene Tuv with ACE RLSC
  • HIVA: Gavin Cawley with Final #3 (corrected)
  • NOVA: Gavin Cawley with Final #1
  • SYLVA: Marc Boullé with SNB(CMA) 10k F(3D) tv
  • Best AUC: Radford Neal with Bayesian Neural Networks

11
Part II
  • PROTOCOL and SCORING

12
Protocol
  • Data split: training/validation/test.
  • Data proportions: 10/1/100.
  • Online feedback on validation data.
  • Validation labels released one month before the end of the challenge.
  • Final ranking on test data, using the five last complete submissions of each entrant.

13
Performance metrics
  • Balanced Error Rate (BER): the average of the error rates of the positive class and the negative class.
  • Guess error: dBER = abs(testBER - guessedBER)  (see the sketch after this list).
  • Area Under the ROC Curve (AUC).
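A minimal Matlab sketch of the first two metrics, assuming y_true and y_pred are vectors of +1/-1 labels and guessedBER is the entrant's guess (all three variable names are illustrative, not part of the challenge kit):

    % Balanced Error Rate: average of the per-class error rates.
    err_pos = mean(y_pred(y_true == 1)  ~= 1);   % error rate on the positive class
    err_neg = mean(y_pred(y_true == -1) ~= -1);  % error rate on the negative class
    testBER = (err_pos + err_neg) / 2;
    % Guess error: gap between the actual test BER and the guessed BER.
    dBER = abs(testBER - guessedBER);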

14
Optimistic guesses
[Figure: one panel per dataset (HIVA, ADA, GINA, NOVA, SYLVA) illustrating the optimistic guesses]
15
Scoring method
  • E = testBER + dBER * (1 - exp(-g * dBER / s))  (sketched in Matlab below)
  • dBER = abs(testBER - guessedBER)
  • g = 1
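A sketch of this score in Matlab; the function name challenge_score is illustrative, and sigma stands for the error bar s on the test BER, whose estimation is not detailed on this slide:

    function E = challenge_score(testBER, guessedBER, sigma)
    % Score: the test BER plus a penalty that grows with the guess error
    % and saturates at testBER + dBER once dBER >> sigma.
    g = 1;                              % gamma, as set in the challenge
    dBER = abs(testBER - guessedBER);   % guess error
    E = testBER + dBER * (1 - exp(-g * dBER / sigma));
    end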

[Figure: challenge score as a function of the test BER and the guessed BER]
16
[Figure: dBER/s vs. test BER for ADA, HIVA, GINA, NOVA, SYLVA; where dBER/s is large, E ≈ testBER + dBER]
17
Score
E = testBER + dBER * (1 - exp(-g * dBER / s))
[Figure: the score E compared with testBER + dBER, as a function of testBER]
18
Score (continued)
[Figure: score curves for ADA, GINA, SYLVA, HIVA, NOVA]
19
Part III
  • RESULT ANALYSIS

20
What did we expect?
  • Learn about new competitive machine learning
    techniques.
  • Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
  • Drive research in the direction of refining such methods (an ongoing benchmark).

21
Method comparison
[Figure: dBER vs. test BER for the compared methods]
22
Danger of overfitting
Solid lines: test BER. Dashed lines: validation BER.
[Figure: BER (0 to 0.5) vs. time (0 to 160 days) for HIVA, ADA, NOVA, GINA, SYLVA]
23
How to estimate the BER?
  • Statistical tests (Stats): compute the BER on training data and compare it with a null hypothesis, e.g. the results obtained with a random permutation of the labels.
  • Cross-validation (CV): split the training data many times into training and validation sets, and average the validation results (sketched after this list).
  • Guaranteed risk minimization (GRM): use theoretical performance bounds.
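A minimal Matlab sketch of the CV option, assuming X is an n-by-d data matrix, Y a vector of +1/-1 labels, and train_classifier / predict_classifier are placeholder names for any learning machine (none of these names come from the challenge kit):

    K = 10;                                  % number of folds (illustrative choice)
    n = size(X, 1);
    fold = mod(randperm(n), K) + 1;          % random assignment of examples to K folds
    ber = zeros(K, 1);
    for k = 1:K
        tr = (fold ~= k);  va = (fold == k);
        model = train_classifier(X(tr,:), Y(tr));    % placeholder trainer
        Yhat  = predict_classifier(model, X(va,:));  % placeholder predictor
        ep = mean(Yhat(Y(va) == 1)  ~= 1);           % positive-class error rate
        en = mean(Yhat(Y(va) == -1) ~= -1);          % negative-class error rate
        ber(k) = (ep + en) / 2;                      % BER on this validation fold
    end
    cvBER = mean(ber);                       % CV estimate of the test BER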

24
Stats / CV / GRM ???
25
Top ranking methods
  • Performance prediction:
  • CV with many splits, 90% training / 10% validation
  • Nested CV loops
  • Model selection:
  • Use of a single model family
  • Regularized risk / Bayesian priors
  • Ensemble methods
  • Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO); see the sketch after this list
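A sketch of the nested-CV idea in plain Matlab. The hyperparameter grid, cv_ber (an inner CV loop like the one sketched earlier, returning a CV BER for one hyperparameter value), balanced_error, and the train/predict calls are all hypothetical placeholders, not the winners' code:

    candidates = [0.01 0.1 1 10];             % hypothetical hyperparameter grid
    Kout = 5;
    outer = mod(randperm(size(X, 1)), Kout) + 1;
    outer_ber = zeros(Kout, 1);
    for k = 1:Kout
        tr = (outer ~= k);  te = (outer == k);
        % Inner loop: pick a hyperparameter using the outer-training data only.
        inner_ber = arrayfun(@(h) cv_ber(X(tr,:), Y(tr), h), candidates);
        [~, best] = min(inner_ber);
        % Retrain with the chosen value and score on the held-out outer fold.
        model = train_classifier(X(tr,:), Y(tr), candidates(best));
        outer_ber(k) = balanced_error(predict_classifier(model, X(te,:)), Y(te));
    end
    guessedBER = mean(outer_ber);             % performance guess for the entry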

26
Other methods
  • Use of training data only:
  • Training BER.
  • Statistical tests.
  • Bayesian evidence.
  • Performance bounds.
  • Bilevel optimization.

27
Part IV
  • CONCLUSIONS AND FURTHER WORK

28
Open problems
  • Bridge the gap between theory and practice:
  • What are the best estimators of the variance of CV?
  • What should k be in k-fold?
  • Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV)?
  • Are there better hybrid methods?
  • What search strategies are best?
  • More than 2 levels of inference?

29
Future work
  • Game of model selection.
  • JMLR special topic on model selection.
  • IJCNN 2007 challenge!

30
Benchmarking model selection?
  • Performance prediction: participants just need to provide a guess of their test performance. If they can solve that problem, they can perform model selection efficiently. Easy and motivating.
  • Selection of a model from a finite toolbox: in principle a more controlled benchmark, but less attractive to participants.

31
CLOP
  • CLOP = Challenge Learning Object Package.
  • Based on the Spider package developed at the Max Planck Institute.
  • Two basic abstractions:
  • Data object
  • Model object

http://clopinet.com/isabelle/Projects/modelselect/MFAQ.html
32
CLOP tutorial
At the Matlab prompt:
  • D = data(X, Y)                        % wrap inputs X and labels Y into a data object
  • hyper = {'degree=3', 'shrinkage=0.1'} % hyperparameters as name=value strings
  • model = kridge(hyper)                 % kernel ridge regression learning object
  • [resu, model] = train(model, D)       % train the model; resu holds the training outputs
  • tresu = test(model, testD)            % apply the trained model to test data testD
  • model = chain({standardize, kridge(hyper)})  % preprocessing + classifier chain

33
Conclusions
  • Twice as much participation as in the feature selection challenge.
  • Top methods as before (in a different order):
  • Ensembles of trees
  • Kernel methods (RLSC/LS-SVM, SVM)
  • Bayesian neural networks
  • Naïve Bayes.
  • Danger of overfitting.
  • Triumph of cross-validation?