Lab 1 - PowerPoint PPT Presentation
Provided by: Isabell47

1
Lab 1
Getting started with Basic Learning
Machines and the Overfitting Problem
2
Lab 1
Polynomial regression
3
Matlab POLY_GUI
  • The code implements the ridge regression
    algorithm w = argmin_w Σi (1 − yi f(xi))² + γ ‖w‖²
  • f(x) = w1 x + w2 x² + … + wn xⁿ = w · xᵀ
  • x = [x, x², …, xⁿ]
  • w = X†Y
  • X† = Xᵀ(XXᵀ + γI)⁻¹ = (XᵀX + γI)⁻¹Xᵀ
  • X = [x(1); x(2); …; x(p)] (matrix of size (p, n))
  • The leave-one-out error (LOO) is obtained with the
    PRESS statistic (Predicted REsidual Sum of
    Squares):
  • LOO error = (1/p) Σk [rk / (1 − (XX†)kk)]², where
    rk = yk − f(xk) are the training residuals
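The slide's formulas are Matlab-oriented; as a cross-check, here is a small numpy sketch of the ridge solution and the PRESS leave-one-out error (the data, gamma, and degree are illustrative choices, not poly_gui's defaults):

```python
import numpy as np

# Polynomial features x, x^2, ..., x^n for p sample points.
rng = np.random.default_rng(0)
p, n = 20, 5
x = rng.uniform(-1, 1, p)
X = np.vander(x, n + 1, increasing=True)[:, 1:]   # columns x, x^2, ..., x^n
w_true = rng.normal(size=n)
Y = X @ w_true + 0.1 * rng.normal(size=p)

gamma = 0.1
# Ridge solution w = (X'X + gamma I)^-1 X'Y (equal to X'(XX' + gamma I)^-1 Y).
w = np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T @ Y)

# Hat matrix H = X (X'X + gamma I)^-1 X' and training residuals r.
H = X @ np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T)
r = Y - X @ w

# PRESS: LOO error = (1/p) sum_k (r_k / (1 - H_kk))^2.
loo = np.mean((r / (1.0 - np.diag(H))) ** 2)
```

Since every diagonal entry of H lies in [0, 1), each PRESS term is at least the squared residual, so the LOO estimate is never below the training MSE.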

4
Matlab POLY_GUI
5
Matlab POLY_GUI
  • At the prompt type poly_gui
  • Vary the parameters. Refrain from hitting CV.
    Explain what happens in the following situations
  • Sample num. << Target degree (small noise)
  • Large noise, small sample num.
  • Target degree << Model degree
  • Why is the LOO error sometimes larger than the
    training and test error?
  • Are there local minima in the LOO error? Is the
    LOO error flat near the optimum?
  • Propose ways of getting a better solution.

6
CLOP Data Objects
The poly_gui emulates CLOP objects of type data
  • X = rand(10,5)
  • Y = rand(10,1)
  • D = data(X,Y)  % constructor
  • methods(D)
  • get_x(D)
  • get_y(D)
  • plot(D)

7
CLOP Model Objects
poly_ridge is a model object.
  • P = poly_ridge; h = plot(P);
  • D = gene(P); plot(D, h);
  • [resu, P] = train(P, D);
  • mse(resu)
  • Dt = gene(P);
  • [tresu, P] = test(P, Dt);
  • mse(tresu)
  • plot(P, h)

8
Lab 1
Support Vector Machines
9
Support Vector Classifier
Boser-Guyon-Vapnik-1992
10
Matlab SVC_GUI
  • At the prompt type svc_gui
  • The code implements the Support Vector Machine
    algorithm with kernel
  • k(s, t) = (1 + s · t)^q exp(−γ ‖s − t‖²)
  • Regularization similar to ridge regression
  • Hinge loss L(xi) = max(0, 1 − yi f(xi))
  • Empirical risk = Σi L(xi)
  • w = argmin_w (1/C) ‖w‖² + Σi L(xi)

shrinkage
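A minimal sketch of the penalized objective above, minimized by subgradient descent on toy data (a linear SVC without the kernel; the data, C, and step size are illustrative, not what svc_gui does internally):

```python
import numpy as np

# Toy linear SVC trained by subgradient descent on
# w = argmin (1/C)||w||^2 + sum_i max(0, 1 - y_i w.x_i).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + X[:, 1])           # linearly separable labels
C, eta = 10.0, 0.05
w = np.zeros(2)
for _ in range(200):
    margins = y * (X @ w)
    viol = margins < 1                   # patterns with nonzero hinge loss
    grad = (2.0 / C) * w - (y[viol, None] * X[viol]).sum(axis=0)
    w -= eta * grad

objective = (1.0 / C) * w @ w + np.maximum(0, 1 - y * (X @ w)).sum()
train_err = np.mean(np.sign(X @ w) != y)
```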
11
Lab 1
More loss functions
12
Loss Functions
13
Exercise Gradient Descent
  • Linear discriminant f(x) = Σj wj xj
  • Functional margin z = y f(x), y = ±1
  • Compute ∂z/∂wj
  • Derive the learning rules Δwj = −η ∂L/∂wj
    corresponding to the following loss functions:

SVC loss: max(0, 1 − z)
Adaboost loss: e^−z
square loss: (1 − z)²
logistic loss: log(1 + e^−z)
Perceptron loss: max(0, −z)
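One way to check a derivation like this numerically: since ∂z/∂wj = y xj, the rule is Δwj = −η (dL/dz) y xj. A Python sketch of the five losses with their margin derivatives (names are ours, not CLOP's):

```python
import numpy as np

# Each loss as a function of the margin z = y f(x), paired with dL/dz.
losses = {
    "svc":        (lambda z: np.maximum(0, 1 - z),
                   lambda z: -(np.asarray(z) < 1).astype(float)),
    "adaboost":   (lambda z: np.exp(-z),
                   lambda z: -np.exp(-z)),
    "square":     (lambda z: (1 - z) ** 2,
                   lambda z: -2 * (1 - z)),
    "logistic":   (lambda z: np.log(1 + np.exp(-z)),
                   lambda z: -1 / (1 + np.exp(z))),
    "perceptron": (lambda z: np.maximum(0, -z),
                   lambda z: -(np.asarray(z) < 0).astype(float)),
}

def delta_w(name, x, y, w, eta=0.1):
    """Learning rule Delta w = -eta * dL/dz * y * x for f(x) = w.x."""
    z = y * (w @ x)
    return -eta * losses[name][1](z) * y * x
```

The smooth losses can be verified against a finite-difference quotient; the hinge and perceptron derivatives are subgradients away from their kinks.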
14
Exercise Dual Algorithms
  • From the Δwj, derive the Δw
  • w = Σi αi xi
  • From the Δw, derive the Δαi of the dual
    algorithms.
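As a worked instance of this exercise, the perceptron loss gives the classic dual update: a primal step Δw = η yi xi (taken when the loss is active) becomes Δαi = η yi under w = Σi αi xi. A sketch on toy separable data (the margin filter is our addition, to guarantee convergence within the epoch budget):

```python
import numpy as np

# Dual perceptron on 2-D points separable by the direction (1, -0.5).
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
X = X[np.abs(X[:, 0] - 0.5 * X[:, 1]) > 0.3]   # keep a margin
y = np.sign(X[:, 0] - 0.5 * X[:, 1])
a = np.zeros(len(X))
eta = 1.0
for _ in range(300):                    # epochs
    for i in range(len(X)):
        w = a @ X                       # w = sum_i a_i x_i
        if y[i] * (w @ X[i]) <= 0:      # mistake: loss max(0, -z) active
            a[i] += eta * y[i]          # dual update Delta a_i = eta y_i

train_err = np.mean(np.sign(X @ (a @ X)) != y)
```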

15
Summary
  • Modern ML algorithms optimize a penalized risk
    functional

16
Lab 2
Getting started with CLOP
17
Lab 2
CLOP tutorial
18
What is CLOP?
  • CLOP = Challenge Learning Object Package.
  • Based on the Spider developed at the Max Planck
    Institute.
  • Two basic abstractions
  • Data object
  • Model object
  • Put the CLOP directory in your path.
  • At the prompt type use_spider_clop
  • If you have used poly_gui before, type
  • clear classes

19
CLOP Data Objects
At the Matlab prompt
  • addpath(<clop_dir>)
  • use_spider_clop
  • X = rand(10,8)
  • Y = [1 1 1 1 1 -1 -1 -1 -1 -1]'
  • D = data(X,Y)  % constructor
  • [p,n] = get_dim(D)
  • get_x(D)
  • get_y(D)

20
CLOP Model Objects
D is a data object previously defined.
  • model = kridge  % constructor
  • [resu, model] = train(model, D)
  • resu, model.W, model.b0
  • Yhat = D.X*model.W' + model.b0
  • testD = data(rand(3,8), [-1 -1 1]')
  • tresu = test(model, testD)
  • balanced_errate(tresu.X, tresu.Y)
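In numpy terms, the prediction line and a balanced error rate compute roughly the following (W, b0, and the data here are illustrative stand-ins, not a trained CLOP model):

```python
import numpy as np

# Linear prediction Yhat = X W' + b0, then the balanced error rate:
# the average of the two per-class error rates for labels in {-1, +1}.
def balanced_errate(yhat, y):
    """Mean of per-class error rates."""
    return np.mean([np.mean(np.sign(yhat[y == c]) != c) for c in (-1, 1)])

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
W = np.array([[1.0, 1.0]])               # one output, two features
b0 = 0.0
Yhat = (X @ W.T + b0).ravel()            # like D.X*model.W' + model.b0
y = np.array([1, 1, -1])
ber = balanced_errate(Yhat, y)
```

Averaging per-class error rates keeps a majority-class predictor from looking good on unbalanced data.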

21
Hyperparameters and Chains
A model often has hyperparameters
  • default(kridge)
  • hyper = {'degree=3', 'shrinkage=0.1'}
  • model = kridge(hyper)
  • model = chain({standardize, kridge(hyper)})
  • [resu, model] = train(model, D)
  • tresu = test(model, testD)
  • balanced_errate(tresu.X, tresu.Y)

Models can be chained
22
Hyper-parameters
  • Kernel methods: kridge and svc
  • k(x, y) = (coef0 + x · y)^degree exp(−gamma ‖x − y‖²)
  • kij = k(xi, xj)
  • kii ← kii + shrinkage
  • Naïve Bayes: naive (no hyperparameters)
  • Neural network: neural
  • units, shrinkage, maxiter
  • Random Forest: rf (Windows only)
  • mtry
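Reading the kernel above as a product of the polynomial and Gaussian parts (so degree=0 gives a pure RBF kernel and gamma=0 a pure polynomial kernel), a Python sketch with the shrinkage rule on the diagonal might look like this (names mirror the CLOP hyperparameters; the product reading is our assumption):

```python
import numpy as np

def kernel(x, y, coef0=0.0, degree=1, gamma=0.0):
    """k(x, y) = (coef0 + x.y)^degree * exp(-gamma ||x - y||^2)."""
    return (coef0 + x @ y) ** degree * np.exp(-gamma * np.sum((x - y) ** 2))

def gram(X, shrinkage=0.1, **hp):
    """Kernel matrix with the diagonal rule k_ii <- k_ii + shrinkage."""
    p = len(X)
    K = np.array([[kernel(X[i], X[j], **hp) for j in range(p)]
                  for i in range(p)])
    return K + shrinkage * np.eye(p)
```

Adding shrinkage to the diagonal plays the same regularizing role as gamma in the ridge formulas of Lab 1.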

23
Exercise
  • Here are some of the pattern recognition CLOP objects:
  • @rf  @naive
  • @svc  @neural
  • @gentleboost  @lssvm
  • @gkridge  @kridge
  • @klogistic  @logitboost
  • Try at the prompt: example(neural)
  • Try other pattern recognition objects.
  • Try different sets of hyperparameters, e.g.,
    example(svc({'gamma=1', 'shrinkage=0.001'}))
  • Remember: use default(method) to get the HP.

24
Lab 2
Example Digit Recognition
Subset of the MNIST data of LeCun and Cortes used
for the NIPS2003 challenge
25
data(X, Y)
  • Go to the Gisette directory:
  • cd('GISETTE')
  • Load the validation data:
  • Xt = load('gisette_valid.data')
  • Yt = load('gisette_valid.labels')
  • Create a data object and examine it:
  • Dt = data(Xt, Yt)
  • browse(Dt, 2)
  • Load the training data (longer):
  • X = load('gisette_train.data')
  • Y = load('gisette_train.labels')
  • [p, n] = get_dim(Dt)
  • D = train(subsample(['p_max=' num2str(p)]), data(X, Y))
  • clear X Y Xt Yt

26
model(hyperparam)
  • Define some hyperparameters:
  • hyper = {'degree=3', 'shrinkage=0.1'}
  • Create a kernel ridge regression model:
  • model = kridge(hyper)
  • Train it and test it:
  • [resu, Model] = train(model, D)
  • tresu = test(Model, Dt)
  • Visualize the results:
  • roc(tresu)
  • idx = find(tresu.X.*tresu.Y < 0)
  • browse(get(D, idx), 2)

27
Exercise
  • Here are some pattern recognition CLOP objects:
  • @rf  @naive  @gentleboost
  • @svc  @neural  @logitboost
  • @kridge  @lssvm  @klogistic
  • Instantiate a model with some hyperparameters
    (use default(method) to get the HP).
  • Vary the HP and the number of training examples
    (Hint: use get(D, 1:n) to restrict the data to n
    examples).

28
chain({model1, model2, …})
  • Combine preprocessing and kernel ridge
    regression:
  • my_prepro = normalize
  • model = chain({my_prepro, kridge(hyper)})
  • Combine replicas of a base learner:
  • for k=1:10
  •   base_model{k} = neural
  • end
  • model = ensemble(base_model)

ensemble({model1, model2, …})
29
Exercise
  • Here are some preprocessing CLOP objects
  • @normalize  @standardize  @fourier
  • Chain a preprocessing and a model, e.g.,
  • model = chain({fourier, kridge('degree=3')})
  • my_classif = svc({'coef0=1', 'degree=4', 'gamma=0',
    'shrinkage=0.1'})
  • model = chain({normalize, my_classif})
  • Train, test, visualize the results. Hint: you can
    browse the preprocessed data:
  • browse(train(standardize, D), 2)

30
Summary
  • After creating your complex model, just one
    command: train
  • model = ensemble({chain({standardize, kridge(hyper)}),
    chain({normalize, naive})})
  • [resu, Model] = train(model, D)
  • After training your complex model, just one
    command: test
  • tresu = test(Model, Dt)
  • You can use a cv object to perform
    cross-validation:
  • cv_model = cv(model)
  • [resu, Model] = train(cv_model, D)
  • roc(resu)
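The cv object's job can be sketched as a plain k-fold loop; here with a ridge fit standing in for an arbitrary CLOP model (fold count, gamma, and the data are illustrative):

```python
import numpy as np

def kfold_mse(X, Y, n_folds=5, gamma=0.1):
    """k-fold cross-validation estimate of the test MSE of a ridge fit."""
    p, n = X.shape
    idx = np.random.default_rng(0).permutation(p)
    errs = []
    for held_out in np.array_split(idx, n_folds):
        tr = np.setdiff1d(np.arange(p), held_out)
        # Fit on the training folds only.
        w = np.linalg.solve(X[tr].T @ X[tr] + gamma * np.eye(n),
                            X[tr].T @ Y[tr])
        # Score on the held-out fold.
        errs.append(np.mean((X[held_out] @ w - Y[held_out]) ** 2))
    return np.mean(errs)
```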

31
Lab 3
Getting started with Feature Selection
32
POLY_GUI again
  • clear classes
  • poly_gui
  • Check the Multiplicative updates (MU) box.
  • Play with the parameters.
  • Try CV
  • Compare with no MU

33
Lab 3
Exploring feature selection methods
34
Re-load the GISETTE data
  • Start CLOP
  • clear classes
  • use_spider_clop
  • Go to the Gisette directory
  • cd('GISETTE')
  • load('gisette')

35
Visualization
  • 1) Create a heatmap of the data matrix or a
    subset:
  • show(D)
  • show(get(D, 1:10, 1:2500))
  • 2) Look at individual patterns:
  • browse(D)
  • browse(D, 2)  % for 2d data
  • Display feature positions:
  • browse(D, 2, [212, 463, 429, 239])
  • 3) Make a scatter plot of a few features:
  • scatter(D, [212, 463, 429, 239])

36
Example
  • my_classif = svc({'coef0=1', 'degree=3', 'gamma=0',
    'shrinkage=1'})
  • model = chain({normalize, s2n('f_max=100'),
    my_classif})
  • [resu, Model] = train(model, D)
  • tresu = test(Model, Dt)
  • roc(tresu)
  • Show the misclassified first:
  • [s,idx] = sort(tresu.X.*tresu.Y)
  • browse(get(Dt, idx), 2, Model{2})

37
Some Filters in CLOP
  • Univariate:
  • @s2n (signal-to-noise ratio)
  • @Ttest (T statistic; similar to s2n)
  • @Pearson (uses Matlab corrcoef; gives the same
    results as Ttest if the classes are balanced)
  • @aucfs (ranksum test)
  • Multivariate:
  • @relief (no elimination of redundancy)
  • @gs (Gram-Schmidt orthogonalization; selects
    complementary features)
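A common form of the s2n criterion scores each feature as |μ+ − μ−| / (σ+ + σ−). A small Python stand-in for the ranking (the exact CLOP formula may differ in details):

```python
import numpy as np

def s2n_rank(X, y):
    """Rank features by decreasing signal-to-noise ratio."""
    pos, neg = X[y == 1], X[y == -1]
    score = np.abs(pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0) + 1e-12)
    return np.argsort(-score)            # most informative feature first

# Toy data: feature 0 separates the classes, feature 1 is noise.
y = np.array([1, 1, 1, -1, -1, -1])
X = np.array([[2.0, 0.1], [2.1, -0.2], [1.9, 0.05],
              [-2.0, 0.1], [-2.1, -0.1], [-1.9, 0.0]])
rank = s2n_rank(X, y)
```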

38
Exercise
  • Change the feature selection algorithm
  • Visualize the features
  • What can you say of the various methods?
  • Which one gives the best results for 2, 10, 100
    features?
  • Can you improve by changing the preprocessing?
    (Hint try _at_pc_extract)

39
Lab 3
Feature significance
40
T-test
[Figure: class-conditional densities P(Xi|Y=1) and P(Xi|Y=−1) over feature xi, with means μ+, μ− and standard deviations σ+, σ−]
  • Normally distributed classes, equal variance σ²
    unknown, estimated from data as σ²within.
  • Null hypothesis H0: μ+ = μ−
  • T statistic: if H0 is true,
  • t = (μ+ − μ−)/(σwithin √(1/m+ + 1/m−)) ~
    Student(m+ + m− − 2 d.f.)
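The T statistic can be computed directly from its definition (m+ and m− are the class sizes, σwithin the pooled within-class standard deviation):

```python
import numpy as np

def tstat(xpos, xneg):
    """t = (mu+ - mu-) / (sigma_within * sqrt(1/m+ + 1/m-))."""
    mp, mn = len(xpos), len(xneg)
    # Pooled within-class variance with m+ + m- - 2 degrees of freedom.
    sw2 = (np.sum((xpos - xpos.mean()) ** 2) +
           np.sum((xneg - xneg.mean()) ** 2)) / (mp + mn - 2)
    return (xpos.mean() - xneg.mean()) / np.sqrt(sw2 * (1.0 / mp + 1.0 / mn))

t = tstat(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 2.0]))
```

For this small example the pooled variance is 1, so t = 1/√(2/3) = √1.5 ≈ 1.22.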

41
Evaluation of pval and FDR
  • Ttest object:
  • computes pval analytically
  • FDR = pval × nsc/n
  • probe object:
  • takes any feature ranking object as an argument
    (e.g. s2n, relief, Ttest)
  • pval = nsp/np
  • FDR = pval × nsc/n
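The probe idea can be sketched as follows: mix random "probe" features in with the real ones, rank everything, and estimate the pvalue at each cutoff from the fraction of probes selected. The FDR line below uses the generic pval·N/k normalization, which is our assumption and may differ from the slide's convention:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, n_probes = 100, 20, 200
y = np.sign(rng.normal(size=p))
X = rng.normal(size=(p, n))
X[:, :5] += 0.8 * y[:, None]             # 5 genuinely informative features
probes = rng.normal(size=(p, n_probes))  # pure-noise probe features

# Rank real features and probes together by |correlation with y|.
A = np.hstack([X, probes])
score = np.abs(np.corrcoef(A.T, y)[:-1, -1])
order = np.argsort(-score)
is_probe = np.arange(n + n_probes) >= n

nsp = np.cumsum(is_probe[order])         # probes among the top-k candidates
k = np.arange(1, n + n_probes + 1)
pval = nsp / n_probes                    # estimated pvalue at each cutoff
fdr = np.minimum(1.0, pval * (n + n_probes) / k)
```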

42
Analytic vs. probe
43
Example
  • [resu, FS] = train(Ttest, D)
  • [resu, PFS] = train(probe(Ttest), D)
  • figure('Name', 'pvalue')
  • plot(get_pval(FS, 1), 'r')
  • hold on; plot(get_pval(PFS, 1))
  • figure('Name', 'FDR')
  • plot(get_fdr(FS, 1), 'r')
  • hold on; plot(get_fdr(PFS, 1))

44
Exercise
  • What could explain the differences between the
    pvalue and fdr obtained with the analytic and
    probe methods?
  • Replace Ttest with chain({rmconst('w_min=0'),
    Ttest})
  • Recompute the pvalue and fdr curves. What do you
    notice?
  • Choose an optimum number fnum of features based
    on pvalue or FDR. Visualize with browse(D, 2, FS,
    fnum)
  • Create a model with fnum features. Is fnum
    optimal? Do you get something better with CV?

45
Lab 3
Local feature selection
46
Exercise
  • Consider the 1-nearest-neighbor algorithm. We
    define the following score:
  • where s(k) (resp. d(k)) is the index of the
    nearest neighbor of xk belonging to the same
    class (resp. a different class) as xk.

47
Exercise
  1. Motivate the choice of such a cost function to
    approximate the generalization error (qualitative
    answer)
  2. How would you derive an embedded method to
    perform feature selection for 1 nearest neighbor
    using this functional?
  3. Motivate your choice (what makes your method an
    embedded method and not a wrapper method)

48
Relief
Relief score = ⟨Dmiss/Dhit⟩ (averaged over patterns)
Local_Relief score = Dmiss/Dhit (per pattern)
[Figure: for each pattern, Dhit is the distance to its nearest hit (nearest neighbor of the same class) and Dmiss the distance to its nearest miss (nearest neighbor of the other class)]
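A Python stand-in for the per-pattern ratio (the real relief/local_relief objects also score individual features; this only illustrates the Dmiss/Dhit idea):

```python
import numpy as np

def local_relief_ratio(X, y):
    """Per-pattern Dmiss/Dhit; the mean gives the global Relief-style score."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a pattern is not its own hit
    ratios = []
    for k in range(len(X)):
        d_hit = D[k, y == y[k]].min()    # nearest hit
        d_miss = D[k, y != y[k]].min()   # nearest miss
        ratios.append(d_miss / d_hit)
    return np.array(ratios)

# Two tight, well-separated pairs: every ratio is large.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([1, 1, -1, -1])
r = local_relief_ratio(X, y)
```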
49
Exercise
  • [resu, FS] = train(relief, D)
  • browse(D, 2, FS, 20)
  • [resu, LFS] = train(local_relief, D)
  • browse(D, 2, LFS, 20)
  • Propose a modification to the nearest neighbor
    algorithm that uses features relevant to
    individual patterns (like those provided by
    local_relief).
  • Do you anticipate such an algorithm to perform
    better than the non-local version using relief?

50
Epilogue
Becoming a pro and playing with other datasets
51
Some CLOP objects
52
http://clopinet.com/challenges/
  • Challenges in
  • Feature selection
  • Performance prediction
  • Model selection
  • Causality
  • Large datasets

53
NIPS 2003 Feature Selection Challenge
54
NIPS 2006 Model Selection Game
NOVA
First place: Juha Reunanen, cross-indexing-7
Sample NOVA text:
Subject: Re: Goalie masks  Lines: 21
Tom Barrasso wore a great mask, one time, last
season. It was all black, with Pgh city scenes on
it. The "Golden Triangle" graced the top, along
with a steel mill on one side and the Civic Arena
on the other. On the back of the helmet was the
old Pens' logo, the current (at the time) Pens
logo, and a space for the "new" logo. Lori
sns = shift'n'scale, std = standardize, norm =
normalize (some details of hyperparameters not
shown)
Second place Hugo Jair Escalante Balderas,
BRun2311062
 
GINA
Proc. IJCNN07, Orlando, FL, Aug. 2007: "PSMS for
Neural Networks", H. Jair Escalante, Manuel Montes
y Gomez, and Luis Enrique Sucar; "Model Selection
and Assessment Using Cross-indexing", Juha Reunanen

sns = shift'n'scale, std = standardize, norm =
normalize (some details of hyperparameters not
shown). Note: entry Boosting_1_001_x900 gave
better results, but was older.