Title: LS-SVMlab
LS-SVMlab: Large-scale modeling
Kristiaan Pelckmans, J.A.K. Suykens, B. De Moor
ESAT-SCD/SISTA
Content
- I. Overview
- II. Classification
- III. Regression
- IV. Unsupervised Learning
- V. Time-series
- VI. Conclusions and Outlook
People
- Contributors to LS-SVMlab
- Kristiaan Pelckmans
- Johan Suykens
- Tony Van Gestel
- Jos De Brabanter
- Lukas Lukas
- Bart Hamers
- Emmanuel Lambert
- Supervisors
- Bart De Moor
- Johan Suykens
- Joos Vandewalle
Acknowledgements. Our research is supported by
grants from several funding agencies and sources:
Research Council K.U.Leuven Concerted Research
Action GOA-Mefisto 666 (Mathematical
Engineering), IDO (IOTA Oncology, Genetic
networks), several PhD/postdoc fellow grants
Flemish Government Fund for Scientific Research
FWO Flanders (several PhD/postdoc grants,
projects G.0407.02 (support vector machines),
G.0080.01 (collective intelligence), G.0256.97
(subspace), G.0115.01 (bio-i and microarrays),
G.0240.99 (multilinear algebra), G.0197.02 (power
islands), research communities ICCoS, ANMMM), AWI
(Bil. Int. Collaboration South Africa, Hungary
and Poland), IWT (Soft4s (softsensors),
STWW-Genprom (gene promotor prediction), GBOU
McKnow (Knowledge management algorithms),
Eureka-Impact (MPC-control), Eureka-FLiTE
(flutter modeling), several PhD-grants) Belgian
Federal Government DWTC (IUAP IV-02 (1996-2001)
and IUAP V-10-29 (2002-2006) Dynamical Systems
and Control Computation, Identification
Modelling), Program Sustainable Development
PODO-II (CP-TR-18 Sustainability effects of
Traffic Management Systems) Direct contract
research Verhaert, Electrabel, Elia, Data4s,
IPCOS. JS is a professor at K.U.Leuven Belgium
and a postdoctoral researcher with FWO Flanders.
BDM and JWDW are full professors at K.U.Leuven
Belgium.
I. Overview
- Goal of the presentation
- Overview and intuition
- Demonstration of LS-SVMlab
- Pinpointing research challenges
- Preparation for NIPS 2002
- Research results and challenges
- Towards applications
- Overview LS-SVMlab
I.2 Overview research
- Learning, generalization, extrapolation, identification, smoothing, modeling
- Prediction (black-box modeling)
- Points of view: statistical learning, machine learning, neural networks, optimization, SVM
I.2 Type, Target, Topic
I.3 Towards applications
- System identification
- Financial engineering
- Biomedical signal processing
- Data mining
- Bio-informatics
- Text mining
- Adaptive signal processing
I.4 LS-SVMlab
I.4 LS-SVMlab (2)
- Starting points
- Modularity
- Object-oriented and functional interface
- Basic building blocks for advanced research
- Website and tutorial
- Reproducibility (preprocessing)
II. Classification
- Learn the decision function associated with a set of labeled data points in order to predict the labels of unseen data
- Least Squares Support Vector Machines
- Bayesian framework
- Different norms
- Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM(γ,σ²))
- Least-squares cost function, regularization, equality constraints
- Non-linearity through Mercer kernels
- Primal-dual interpretation (Lagrange multipliers)
Primal: parametric model
Dual: non-parametric model
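The primal-dual pair sketched on this slide can be written out explicitly; the following is the standard LS-SVM classifier formulation (as in the Suykens et al. book cited in the references), with regularization constant γ, feature map φ of the Mercer kernel K, and Lagrange multipliers α:

```latex
% Primal (parametric in w, b):
\min_{w,b,e} \; \frac{1}{2} w^\top w + \gamma\,\frac{1}{2} \sum_{i=1}^{N} e_i^2
\quad \text{s.t.} \quad y_i\,(w^\top \varphi(x_i) + b) = 1 - e_i, \quad i = 1,\dots,N.

% Eliminating w and e through the Lagrangian yields the dual linear system
% (non-parametric, in the multipliers alpha):
\begin{bmatrix} 0 & y^\top \\ y & \Omega + I/\gamma \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad \Omega_{ij} = y_i\, y_j\, K(x_i, x_j),

% so the resulting classifier is
y(x) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x, x_i) + b \Big).
```

Note that, unlike the standard SVM quadratic program, training reduces to solving one linear system.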
II.1 LS-SVM(γ,σ²)
Learning representations from relations
II.2 Bayesian Inference
- Bayes' rule (MAP)
- Closed-form formulas
- Approximations:
  - Hessian in the optimum
  - Gaussian distribution
- Three levels of posteriors
II.3 SVM formulations: norms
- 1-norm, inequality constraints: SVM
  - extensions to any convex cost function
- 2-norm, equality constraints: LS-SVM
  - weighted versions
II.4 Coding schemes
Multi-class classification task → (multiple) binary classifiers
Labels
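One common coding scheme is one-vs-rest: each class gets one binary ±1 problem, and outputs are decoded back to a class label. A minimal numpy sketch (function names here are illustrative, not the LS-SVMlab toolbox API):

```python
import numpy as np

def code_onevsrest(labels, classes):
    """Encode class labels into an (n_samples, n_classes) matrix of +/-1
    targets: column k is the binary problem 'class k vs. the rest'."""
    Y = -np.ones((len(labels), len(classes)))
    for k, c in enumerate(classes):
        Y[np.asarray(labels) == c, k] = 1.0
    return Y

def decode_onevsrest(outputs, classes):
    """Decode binary classifier outputs back to class labels by taking,
    per sample, the column with the largest (signed) output."""
    return [classes[k] for k in np.argmax(outputs, axis=1)]

classes = [0, 1, 2]
labels = [0, 2, 1, 1]
Y = code_onevsrest(labels, classes)   # shape (4, 3), entries +/-1
decoded = decode_onevsrest(Y, classes)
```

In practice each column of Y would be the target of one binary LS-SVM, and decoding is applied to the classifiers' real-valued outputs; other codes (one-vs-one, error-correcting output codes) follow the same encode/decode pattern.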
III. Regression
- Learn the underlying function from a set of data points and their corresponding noisy targets in order to predict the values of unseen data
- LS-SVM(γ,σ²)
- Cross-validation (CV)
- Bayesian inference
- Robustness
III.1 LS-SVM(γ,σ²)
- Least-squares cost function, regularization, equality constraints
- Mercer kernels
- Lagrange multipliers
- Primal parametric ↔ dual non-parametric
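A minimal numpy sketch of LS-SVM regression in the dual, assuming the standard formulation with an RBF kernel, regularization constant gamma, and kernel width sigma2 (this is an illustration, not the LS-SVMlab toolbox code):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2):
    """RBF (Gaussian) Mercer kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

def lssvm_train(X, y, gamma, sigma2):
    n = len(y)
    Omega = rbf_kernel(X, X, sigma2)
    # Dual linear system:  [0   1^T        ] [b]       [0]
    #                      [1   Omega + I/g] [alpha] = [y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, multipliers alpha

def lssvm_predict(X, Xtrain, b, alpha, sigma2):
    return rbf_kernel(X, Xtrain, sigma2) @ alpha + b

# Smooth target function with mild noise (toy data):
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60)[:, None]
y = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(60)
b, alpha = lssvm_train(X, y, gamma=100.0, sigma2=0.5)
yhat = lssvm_predict(X, X, b, alpha, sigma2=0.5)
```

Every training point carries a multiplier (no sparsity), which is exactly what the fixed-size and Nyström ideas later in the deck address for large data sets.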
III.1 LS-SVM(γ,σ²) (2)
- Regularization parameter
- Do not fit the noise (overfitting)!
- Trade-off between noise and information
III.2 Cross-validation (CV)
- How to estimate the generalization power of a model?
- Division into a training set and a test set
- Repeated division:
  - Leave-one-out CV (fast implementation)
  - L-fold cross-validation
  - Generalized cross-validation (GCV)
- Complexity criteria: AIC, BIC, ...
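The L-fold scheme can be sketched generically: split the data into L folds, train on L−1 of them, score on the held-out fold, and average. A minimal illustration (LS-SVMlab ships its own cross-validation routines; this generic loop, shown here with ordinary least squares as the inner model, is only a sketch):

```python
import numpy as np

def l_fold_cv(X, y, L, train_fn, predict_fn):
    """Return the average squared prediction error over L folds."""
    folds = np.array_split(np.arange(len(y)), L)
    errs = []
    for k in range(L):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(L) if j != k])
        model = train_fn(X[train], y[train])
        resid = y[test] - predict_fn(model, X[test])
        errs.append(np.mean(resid ** 2))
    return float(np.mean(errs))

# Tiny usage example with least squares as the inner model:
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(50)
fit = lambda Xtr, ytr: np.linalg.lstsq(Xtr, ytr, rcond=None)[0]
pred = lambda w, Xte: Xte @ w
cv_err = l_fold_cv(X, y, 5, fit, pred)
```

Tuning γ and σ² then amounts to minimizing `cv_err` over a grid or with an optimization routine, which is the topic of the next slide.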
III.2 Cross-validation procedure (CVP)
- How to tune the model for optimal generalization performance?
- Trade-off between fit and model complexity
- Kernel parameters
- Which optimization routine?
III.1 LS-SVM(γ,σ²) (3)
- Kernel type and parameters
- Zoölogy as elephantism and non-elephantism
- Model comparison: by cross-validation or Bayesian inference
III.3 Applications
- OK, but does it work?
- Soft4s
  - Together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
  - Soft sensor
- ELIA
  - Together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
  - Prediction of short- and long-term electricity consumption
III.2 Bayesian Inference
- Bayes' rule (MAP)
- Closed-form formulas
- Three levels of posteriors
III.4 Robustness
- How to build good models in the presence of non-Gaussian noise or outliers?
  - Influence function
  - Breakdown point
- How?
  - Down-weighting the influence of large residuals
  - Mean → trimmed mean → median
  - Robust CV, GCV, AIC, ...
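The down-weighting idea behind weighted LS-SVM can be sketched on any least-squares model: fit, compute residuals, shrink the weight of large residuals, refit. Shown below on plain ridge regression for brevity; the Huber-style weight function and constants are assumptions for illustration, not the toolbox's exact weighting:

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Weight 1 for small residuals, decaying as c/|r| for large ones.
    Scale is estimated robustly via the MAD."""
    s = 1.4826 * np.median(np.abs(r - np.median(r)))
    u = np.abs(r) / max(s, 1e-12)
    return np.where(u <= c, 1.0, c / u)

def weighted_ridge(X, y, w, lam=1e-3):
    """Solve the weighted, ridge-regularized normal equations."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X + lam * np.eye(X.shape[1]),
                           X.T @ W @ y)

rng = np.random.default_rng(2)
X = np.c_[np.ones(40), np.linspace(0, 1, 40)]
y = X @ np.array([0.5, 2.0]) + 0.05 * rng.standard_normal(40)
y[::10] += 5.0                               # inject gross outliers
beta = weighted_ridge(X, y, np.ones(40))     # initial unweighted fit
for _ in range(5):                           # iterative reweighting
    beta = weighted_ridge(X, y, huber_weights(y - X @ beta))
```

After a few reweighting passes the outliers carry almost no weight, so the estimate recovers the clean intercept and slope; the same loop applies with the LS-SVM dual system in place of ridge regression.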
IV. Unsupervised Learning
- Extract important features from the unlabeled data
- Kernel PCA and related methods
- Nyström approximation
- From dual to primal
- Fixed-size LS-SVM
IV.1 Kernel PCA
- Principal component analysis → kernel-based PCA
IV.1 Kernel PCA (2)
- Primal-dual LS-SVM style formulations
- For kernel PCA, CCA, PLS
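A compact numpy sketch of the standard kernel PCA algorithm (the textbook version, not the LS-SVMlab implementation): eigendecompose the centered Gram matrix and project the data onto the leading components.

```python
import numpy as np

def rbf_gram(X, sigma2):
    """RBF Gram matrix of a point set."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

def kernel_pca(X, n_comp, sigma2=1.0):
    n = len(X)
    K = rbf_gram(X, sigma2)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H                             # center in feature space
    vals, vecs = np.linalg.eigh(Kc)            # ascending eigenvalues
    vals = vals[::-1][:n_comp]                 # keep the leading ones
    vecs = vecs[:, ::-1][:, :n_comp]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # unit-norm in feature space
    return Kc @ alphas                         # component scores

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 3))
Z = kernel_pca(X, n_comp=2)
```

The columns of `Z` are mutually orthogonal score vectors, mirroring ordinary PCA; the "from dual to primal" step on the next slides replaces the full Gram matrix with a Nyström-approximated feature map.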
IV.2 Nyström approximation
- Sampling of the integral equation
- Approximating the feature map for a Mercer kernel
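The textbook Nyström construction, sketched in numpy (subsample size and kernel width are illustrative): approximate the full n×n Gram matrix from a random subsample of m landmark points, avoiding the O(n²) kernel evaluation and storage.

```python
import numpy as np

def rbf(X1, X2, sigma2=1.0):
    """RBF kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 2))
m = 50
sub = rng.choice(200, size=m, replace=False)      # landmark subset

K_nm = rbf(X, X[sub])                             # n x m cross-kernel
K_mm = rbf(X[sub], X[sub])                        # m x m landmark Gram
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T   # rank-m Nystrom estimate
K_full = rbf(X, X)                                # exact Gram, for comparison
rel_err = float(np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```

Equivalently, `K_nm @ pinv(sqrtm(K_mm))` gives an explicit m-dimensional approximate feature map, which is what lets fixed-size LS-SVM estimate in the primal.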
IV.3 Fixed-Size LS-SVM
V. Time-series
- Learn to predict future values given a sequence of past values
- NARX
- Recurrent vs. feedforward
V.1 NARX
- Reducible to static regression
- CV and complexity criteria
- Prediction in recurrent mode
- Fixed-size LS-SVM (sparse representation)
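The "reducible to static regression" point can be made concrete: stack lagged values into a regressor matrix, fit any static regressor, then iterate the one-step model on its own outputs for recurrent-mode prediction. A sketch with a linear ARX model for brevity (the lag order and model are assumptions; an LS-SVM would replace the least-squares fit):

```python
import numpy as np

def lag_matrix(y, p):
    """Row t holds the regressor [y[t-1], ..., y[t-p]]; targets are y[t]."""
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    return X, y[p:]

def narx_fit(y, p):
    X, t = lag_matrix(y, p)
    return np.linalg.lstsq(X, t, rcond=None)[0]   # linear ARX weights

def recurrent_predict(w, history, steps):
    """Feed the model's own predictions back in as inputs."""
    buf = list(history)
    out = []
    for _ in range(steps):
        x = np.array(buf[-1:-len(w) - 1:-1])      # most recent p values
        yhat = float(x @ w)
        out.append(yhat)
        buf.append(yhat)
    return out

t = np.arange(300)
y = np.sin(2 * np.pi * t / 25)                    # noiseless sinusoid
w = narx_fit(y, p=2)                              # AR(2) captures a sinusoid
pred = recurrent_predict(w, y[:100], steps=50)
```

On noiseless data the recurrent iteration tracks the true continuation; with noise, small one-step errors compound, which is exactly the training-cost-vs-prediction-cost issue raised on the recurrent-models slide.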
V.1 NARX (2)
- Santa Fe Time-series competition
V.2 Recurrent models?
- How to learn recurrent dynamical models?
- Training cost vs. prediction cost?
- Non-parametric model class?
- Convex or non-convex?
- Hyper-parameters?
VI.0 References
- J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
- V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
- B. Schölkopf and A. Smola (2002), Learning with Kernels, MIT Press.
- T. Poggio and F. Girosi (1990), "Networks for approximation and learning", Proceedings of the IEEE, 78, 1481-1497.
- N. Cristianini and J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.
VI. Conclusions
- Non-linear, non-parametric learning as a generalized methodology
- Non-parametric learning
- Intuition and formulations
- Hyper-parameters
- LS-SVMlab
Questions?