RESULTS OF THE NIPS 2006 (transcript of a 39-slide PowerPoint presentation)

Transcript and Presenter's Notes

1
  • RESULTS OF THE NIPS 2006
  • MODEL SELECTION GAME
  • Isabelle Guyon, Amir Saffari, Gideon Dror,
  • Gavin Cawley, Olivier Guyon,
  • and many other volunteers, see
    http://www.agnostic.inf.ethz.ch/credits.php

2
Thanks
3
Part I
  • INTRODUCTION

4
Model selection
  • Selecting models (neural net, decision tree, SVM, ...)
  • Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, ...)
  • Selecting variables or features (space dimensionality reduction).
  • Selecting patterns (data cleaning, data reduction, e.g. by clustering).

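The hyperparameter case above can be made concrete with a minimal sketch (my addition, not part of the slides): selecting a single hyperparameter, here a hypothetical decision threshold on toy 1-D data, by minimizing error on a validation split.

```python
import random

random.seed(0)

# Toy 1-D two-class data (a hypothetical stand-in for a real dataset):
# class 1 is centered at 1.0, class 0 at 0.0.
def make_data(n):
    return [(label + random.gauss(0, 0.6), label)
            for label in (random.randint(0, 1) for _ in range(n))]

valid = make_data(200)

def error_rate(threshold, split):
    """Fraction of points misclassified by 'predict 1 if x > threshold'."""
    return sum((x > threshold) != bool(y) for x, y in split) / len(split)

# Hyperparameter selection: try candidate thresholds and keep the one
# with the lowest validation error.
candidates = [i / 10 for i in range(-5, 16)]
best = min(candidates, key=lambda t: error_rate(t, valid))
```

The same loop structure applies whether the hyperparameter is a threshold, a ridge coefficient, or a kernel width: train/score each candidate, keep the best by validation error.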
5
Performance prediction challenge
  • How good are you at predicting
  • how good you are?
  • Practically important in pilot studies.
  • Good performance predictions render model
    selection trivial.

6
Model Selection Game
  • Find which model works best
  • in a well-controlled environment.
  • A given sandbox: the CLOP Matlab toolbox.
  • Focus only on devising the model selection strategy.
  • Same datasets as the performance prediction challenge, but reshuffled.
  • Two $500 prizes offered.

7
Agnostic Learning vs. Prior Knowledge challenge
  • When everything else fails,
  • ask for additional domain knowledge.
  • Two tracks:
  • Agnostic learning: Preprocessed datasets in a nice feature-based representation, but no knowledge about the identity of the features.
  • Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.

8
Game rules
  • Date started: October 1st, 2006.
  • Date ended: December 1st, 2006.
  • Duration: 3 months.
  • Submissions in the Agnostic track only.
  • Optional use of CLOP or Spider.
  • Five last complete entries ranked.
  • Total ALvsPK challenge entrants: 22.
  • Total ALvsPK development entries: 546.
  • Number of game-ranked participants: 10.
  • Number of game-ranked submissions: 39.

9
Datasets

Type           Dataset  Domain          Features  Training  Validation  Test
                                                  examples  examples    examples
Dense          ADA      Marketing             48      4147         415     41471
Dense          GINA     Digits               970      3153         315     31532
Dense          HIVA     Drug discovery      1617      3845         384     38449
Sparse binary  NOVA     Text classif.      16969      1754         175     17537
Dense          SYLVA    Ecology              216     13086        1308    130858

http://www.agnostic.inf.ethz.ch
10
Baseline BER distribution (Performance prediction challenge, 145 entrants)
[Figure: histogram of test BER]
11
Agnostic track on Dec. 1st 2006
  • Yellow: used a CLOP model.
  • CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER).
  • Best ave. BER still held by Reference (Gavin Cawley) with the_bad.

12
Part II
  • PROTOCOL and SCORING

13
Protocol
  • Data split: training/validation/test.
  • Data proportions: 10/1/100.
  • Online feedback on validation data.
  • Validation labels released one month before the end of the challenge.
  • Final ranking on test data using the five last complete submissions for each entrant.

14
Performance metrics
  • Balanced Error Rate (BER): average of the error rates of the positive class and the negative class.
  • Area Under the ROC Curve (AUC).
  • Guess error (for the performance prediction challenge only):
  • dBER = abs(testBER - guessedBER)

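The two scoring quantities above can be written out in a short sketch (Python here, not the challenge's Matlab toolbox); labels are assumed binary in {0, 1}.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive and the negative class."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    err_pos = sum(t != p for t, p in pos) / len(pos)
    err_neg = sum(t != p for t, p in neg) / len(neg)
    return (err_pos + err_neg) / 2

def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER)."""
    return abs(test_ber - guessed_ber)

# 1 error out of 4 positives, 2 errors out of 4 negatives:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
ber = balanced_error_rate(y_true, y_pred)  # (1/4 + 2/4) / 2 = 0.375
```

Unlike the plain error rate, the BER weights both classes equally, which matters on unbalanced datasets such as HIVA.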
15
CLOP
  • CLOP: Challenge Learning Object Package.
  • Based on the Spider package developed at the Max Planck Institute.
  • Two basic abstractions:
  • Data object
  • Model object

http://www.agnostic.inf.ethz.ch/models.php
16
CLOP tutorial
At the Matlab prompt:
  • D = data(X, Y);
  • hyper = {'degree=3', 'shrinkage=0.1'};
  • model = kridge(hyper);
  • [resu, model] = train(model, D);
  • tresu = test(model, testD);
  • model = chain({standardize, kridge(hyper)});

17
CLOP models
18
Preprocessing and FS
19
Model grouping
for k = 1:10
    base_model{k} = chain({standardize, naive});
end
my_model = ensemble(base_model);
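The grouping idea, building several base models and combining their outputs, can be sketched outside CLOP as well. This hypothetical Python analogue averages the votes of ten slightly different threshold classifiers; the model names and thresholds are illustrative, not from the slides.

```python
def make_threshold_model(threshold):
    """A stand-in 'base model': classify x as 1 if x > threshold."""
    return lambda x: 1 if x > threshold else 0

# Ten slightly different base models, as in the for-loop above.
base_models = [make_threshold_model(0.4 + 0.02 * k) for k in range(10)]

def ensemble_predict(models, x):
    """Average the base-model votes and threshold the mean (majority vote)."""
    mean = sum(m(x) for m in models) / len(models)
    return 1 if mean >= 0.5 else 0
```

Averaging many imperfect but diverse models typically reduces variance relative to any single base model, which is why ensembles recur among the top entries.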
20
Part III
  • RESULT ANALYSIS

21
What did we expect?
  • Learn about new competitive machine learning
    techniques.
  • Identify competitive methods of performance
    prediction, model selection, and ensemble
    learning (theory put into practice).
  • Drive research in the direction of refining such
    methods (on-going benchmark).

22
Method comparison (PPC)
Agnostic track: no significant improvement so far.
[Figure: dBER vs. test BER]
23
LS-SVM
Gavin Cawley, July 2006
24
Logitboost
Roman Lutz, July 2006
25
CLOP models (best entrant)
 
Juha Reunanen, cross-indexing-7
 
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown)
26
CLOP models (2nd best entrant)
 
Hugo Jair Escalante Balderas, BRun2311062
 
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown). Note: entry Boosting_1_001_x900 gave better results, but was older.
27
Danger of overfitting (PPC)
Full line: test BER. Dashed line: validation BER.
[Figure: BER (0 to 0.5) vs. time (0 to 160 days) for HIVA, ADA, NOVA, GINA, SYLVA]
28
Two best CLOP entrants (game)
[Figure: ave. test BER vs. time for H._Jair_Escalante and Juha Reunanen]
Statistically significant difference for 3/5 datasets.
29
Stats / CV / bounds ???
30
Top ranking methods
  • Performance prediction:
  • CV with many splits, 90% train / 10% validation.
  • Nested CV loops.
  • Model selection:
  • Performance prediction challenge:
  • Use of a single model family.
  • Regularized risk / Bayesian priors.
  • Ensemble methods.
  • Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO).
  • Model selection game:
  • Cross-indexing.
  • Particle swarm.

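As one illustration of the nested CV loops listed above (a sketch under toy assumptions, not any entrant's actual code): the inner loop selects a hyperparameter, and the outer loop estimates the performance of the whole selection procedure on held-out folds.

```python
import random

random.seed(1)

# Toy 1-D two-class data: class 1 centered at 1.0, class 0 at 0.0.
def make_data(n):
    return [(label + random.gauss(0, 0.6), label)
            for label in (random.randint(0, 1) for _ in range(n))]

def err(threshold, split):
    """Error of the classifier 'predict 1 if x > threshold' on a split."""
    return sum((x > threshold) != bool(y) for x, y in split) / len(split)

def k_folds(data, k):
    """(held-out fold, remaining data) pairs for k-fold CV."""
    return [(data[i::k],
             [d for j in range(k) if j != i for d in data[j::k]])
            for i in range(k)]

data = make_data(300)
candidates = [i / 10 for i in range(0, 11)]  # candidate thresholds

outer_scores = []
for held_out, rest in k_folds(data, 5):          # outer loop: performance
    inner_scores = {t: 0.0 for t in candidates}
    for val, _train in k_folds(rest, 5):         # inner loop: model selection
        # (the toy 'model' needs no fitting, so _train is unused here)
        for t in candidates:
            inner_scores[t] += err(t, val)
    best_t = min(candidates, key=lambda t: inner_scores[t])
    outer_scores.append(err(best_t, held_out))

estimate = sum(outer_scores) / len(outer_scores)
```

Because the hyperparameter is chosen without ever seeing the outer held-out fold, the outer-loop average is an (almost) unbiased estimate of the selected model's error.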
31
Part IV
  • COMPETE NOW
  • in the
  • PRIOR KNOWLEDGE TRACK

32
ADA
  • ADA is the marketing database.
  • Task: Discover high-revenue people from census data. Two-class problem.
  • Source: Census Bureau, "Adult" database from the UCI machine-learning repository.
  • Features: 14 original attributes including age, workclass, education, education-num, marital status, occupation, native country. Continuous, binary and categorical features.

33
GINA
GINA is the digit database.
  • Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class problem with heterogeneous classes.
  • Source: MNIST database formatted by LeCun and Cortes.
  • Features: 28x28 pixel map.

34
HIVA
  • HIVA is the HIV database.
  • Task: Find compounds active against the HIV/AIDS infection. We brought it back to a two-class problem (active vs. inactive), but provide the original labels (active, moderately active, and inactive).
  • Data source: National Cancer Inst.
  • Data representation: The compounds are represented by their 3D molecular structure.

35
NOVA
Subject: Re: Goalie masks
Lines: 21
Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the "new" logo. A great mask done in by a goalie's superstition. Lori
  • NOVA is the text classification database.
  • Task: Classify newsgroup emails into politics or religion vs. other topics.
  • Source: The 20-Newsgroup dataset from the UCI machine-learning repository.
  • Data representation: The raw text with an estimated 17000 words of vocabulary.

36
SYLVA
  • SYLVA is the ecology database.
  • Task: Classify forest cover types into Ponderosa pine vs. everything else.
  • Source: US Forest Service (USFS).
  • Data representation: Forest cover type for 30 x 30 meter cells encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.)

37
How to enter?
  • Enter results on any dataset in either track until March 1st 2007 at http://www.agnostic.inf.ethz.ch.
  • Only complete entries (on 5 datasets) will be ranked. The 5 last will count.
  • Seven prizes:
  • Best overall agnostic entry.
  • Best overall prior knowledge entry.
  • Best prior knowledge result on each dataset (5 prizes).
  • Best paper.

38
Conclusions
  • Less participation volume than in the previous challenges:
  • Entry level higher.
  • Other on-going competitions.
  • Top methods in the agnostic track as before:
  • LS-SVMs and boosted logistic trees.
  • Top ranking entries closely followed by CLOP entries, showing great advances in model selection.
  • To do: upgrade CLOP with LS-SVMs and LogitBoost.

39
Open problems
  • Bridge the gap between theory and practice:
  • What are the best estimators of the variance of CV?
  • What should k be in k-fold?
  • Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV)?
  • Are there better hybrid methods?
  • What search strategies are best?
  • More than 2 levels of inference?
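The first two questions can at least be probed empirically. A toy sketch (my addition, using a hypothetical fixed threshold classifier): repeat k-fold CV on freshly drawn data and look at the spread of the resulting estimates.

```python
import random
import statistics

random.seed(2)

def make_data(n):
    """Toy 1-D two-class data: class 1 centered at 1.0, class 0 at 0.0."""
    return [(label + random.gauss(0, 0.6), label)
            for label in (random.randint(0, 1) for _ in range(n))]

def cv_estimate(data, k=5, threshold=0.5):
    """Plain k-fold CV error of a fixed threshold classifier."""
    fold_errs = []
    for i in range(k):
        fold = data[i::k]
        fold_errs.append(
            sum((x > threshold) != bool(y) for x, y in fold) / len(fold))
    return sum(fold_errs) / k

# Distribution of the CV estimate over 50 independent draws of the data:
estimates = [cv_estimate(make_data(200)) for _ in range(50)]
spread = statistics.stdev(estimates)
```

Varying k (or swapping in bootstrap or 5x2CV splits) in this harness gives a crude empirical handle on the variance questions above, though a principled estimator of that variance is exactly what is still open.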