Title: A New Learning Paradigm: LtDAHP (Learning through Deterministic Assignment of Hidden Parameters)
3. A New Learning Paradigm: LtDAHP (Learning through Deterministic Assignment of Hidden Parameters)
Zongben Xu (Xi'an Jiaotong University, Xi'an, China)
Email: zbxu@mail.xjtu.edu.cn
Homepage: http://zbxu.gr.xjtu.edu.cn
4.
- Is a supervised learning problem difficult or easy?
- Can a difficult learning problem be solved more simply?
- Is a linear machine universal?
5. Outline
- Some Related Concepts
- LtRAHP: Learning through Random Assignment of Hidden Parameters
- LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
- Concluding Remarks
6. Outline
- Some Related Concepts
- LtRAHP: Learning through Random Assignment of Hidden Parameters
- LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
- Concluding Remarks
7. Some Related Concepts: Supervised Learning
Supervised Learning: given a finite number of input/output samples, find a function f in a machine H that approximates the unknown relation between the input and output spaces.
[Figure: learning as a black box; ERM.]
Example applications: social networks, face recognition, stock index tracking.
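As a concrete illustration of the definition above, a minimal ERM sketch over a linear machine with squared loss (the function name and data are illustrative, not from the talk):

```python
import numpy as np

def erm_linear(X, y, lam=1e-8):
    """Empirical risk minimization over the linear machine
    H = {f(x) = w . x}: minimize (1/m) * sum_i (f(x_i) - y_i)^2.
    A tiny ridge term lam keeps the normal equations well posed."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Samples generated by an exactly linear relation are recovered by ERM.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])
w = erm_linear(X, y)
```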
8. Some Related Concepts: HP vs. BP
- Hidden Parameters (HPs): determine the hidden predictors (the non-linear mechanism).
- Bright Parameters (BPs): determine how the hidden predictors are linearly combined (the linear mechanism).
[Figure: machine architecture; FNNs.]
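In a single-hidden-layer FNN the split reads as follows; this snippet is only a notational sketch (the symbols W, b, beta are my notation, not the talk's):

```python
import numpy as np

def fnn(x, W, b, beta):
    """f(x) = sum_i beta_i * tanh(w_i . x + b_i).
    Hidden parameters (HPs): W, b -- they shape the hidden
    predictors, i.e. the non-linear mechanism.
    Bright parameters (BPs): beta -- they only combine the hidden
    predictors linearly."""
    hidden = np.tanh(W @ x + b)   # non-linear hidden predictors
    return float(beta @ hidden)   # linear combination

x = np.zeros(3)
W = np.ones((4, 3)); b = np.zeros(4); beta = np.ones(4)
out = fnn(x, W, b, beta)   # tanh(0) = 0, so the output is 0.0
```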
9. Some Related Concepts: OSL vs. TSL
- One-Stage Learning (OSL): HPs and BPs are trained simultaneously in one stage.
- Two-Stage Learning (TSL): HPs and BPs are trained separately in two stages.
[Figure: machine schematic distinguishing hidden parameters and bright parameters.]
10. Some Related Concepts: Main Concerns
Q1: How should the assignment function be specified?
- T_assign(a): ADM
- T_assign(µ): random assignment (LtRAHP)
- T_assign(n): deterministic assignment (LtDAHP)
Q2: Can TSL work?
- Universal approximation?
- Does it degrade the generalization ability?
- Consistency / convergence?
- Effectiveness and efficiency?
11. Outline
- Some Related Concepts
- LtRAHP: Learning through Random Assignment of Hidden Parameters
- LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
- Concluding Remarks
12. LtRAHP: An Overview
Typical LtRAHP schemes:
- Random vector functional-link networks (RVFLs): Y. H. Pao, Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley, 1989.
- Echo-state neural networks (ESNs): H. Jaeger and H. Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science, 304: 78-80, 2004.
- Extreme learning machine (ELM): G. B. Huang, Q. Y. Zhu and C. K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70: 489-501, 2006.
LtRAHP training:
- Stage 1: random assignment of the HPs.
- Stage 2: training of the BPs.
13. LtRAHP: Experimental Evidence
Experimental support (Huang et al., 2006), on UCI data sets:

Training time:
Data set    BP      SVM     ELM
Triazines   0.5484  0.0086  < 10^-4
Housing     6.532   74.184  1.1177
Abalone     1.7562  1.6123  0.0125
Ailerons    2.7525  0.6726  0.0591
Census      8.0647  11.251  1.0795

Test RMSE:
Data set    BP      SVM     ELM
Triazines   0.2197  0.1289  0.2002
Housing     0.1285  0.1180  0.1267
Abalone     0.0874  0.0784  0.0824
Ailerons    0.0481  0.0429  0.0431
Census      0.0685  0.0746  0.0660

Application support:
- Object recognition (Xu et al., 2012)
- Handwritten character recognition (Chacko et al., 2012)
- Face recognition (Marques et al., 2012)
14. LtRAHP: Really Feasible?
A precise theoretical assessment (Liu et al., 2014):

Property                       FNN (OSL) learning   ELM learning
Approximation: density         universal            universal
Approximation: complexity      —                    —
Generalization: consistency    universal            universal
Generalization: learning rate  —                    —
Computational complexity       very high            —

Xia Liu, Shaobo Lin and Zongben Xu, Is extreme learning machine feasible? A theoretical assessment (Part I, Part II), IEEE TNNLS, 2014.
15. LtRAHP: The Uncertainty Problem
- Difference in theoretical assertions: for OSL the guarantees attach to the trained machine itself, while for LtRAHP they hold only when the HPs are randomly assigned according to the prescribed distribution, that is, in expectation (or in probability) over the random assignment. A single trained machine may therefore deviate from the asserted behavior: this is the uncertainty problem.
16. LtRAHP: The Uncertainty Problem
[Figure: test error versus number of samples (m) and number of hidden nodes (N), with panels labeled "Uncertainty" and "Non-uncertainty".]
17. LtRAHP: The Uncertainty Problem
Is there another TSL scheme that has the same complexity as LtRAHP while avoiding the uncertainty problem?
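The question is motivated by an effect that is easy to reproduce: with randomly assigned HPs, re-training on the same data with different seeds gives different test errors. A small illustrative experiment (the set-up is mine, not the talk's simulation):

```python
import numpy as np

def random_hp_test_rmse(X, y, Xt, yt, N, seed):
    """Train with randomly assigned HPs, return the test RMSE."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((N, X.shape[1]))
    b = rng.standard_normal(N)
    H = np.tanh(X @ W.T + b)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    pred = np.tanh(Xt @ W.T + b) @ beta
    return float(np.sqrt(np.mean((pred - yt) ** 2)))

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (100, 1)); y = np.sin(np.pi * X[:, 0])
Xt = rng.uniform(-1.0, 1.0, (100, 1)); yt = np.sin(np.pi * Xt[:, 0])
errs = [random_hp_test_rmse(X, y, Xt, yt, N=10, seed=s) for s in range(10)]
spread = max(errs) - min(errs)   # nonzero: the trained machine is a random object
```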
18. Outline
- Some Related Concepts
- LtRAHP: Learning through Random Assignment of Hidden Parameters
- LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
- Concluding Remarks
19. LtDAHP: Main Idea
Replace the uniformly random assignment with a deterministic one: assign the HPs as Equally Spaced Points (ESPs).
- Separation radius: the radius of the smallest ball containing at least two hidden parameters.
- Covering radius: the radius of the largest ball containing no hidden parameters.
- ESPs: point sets whose mesh ratio (covering radius over separation radius) stays bounded (Wendland, Scattered Data Approximation, 2006).
Question: can ESPs be practically constructed for any subset of an arbitrarily high-dimensional space?
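The two balls above are the standard separation and covering radii of scattered-data approximation; a small numerical check (estimating the covering radius by sampling the domain on a grid is my simplification):

```python
import numpy as np

def separation_radius(pts):
    """Radius of the smallest ball containing at least two of the
    points: half the minimal pairwise distance."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min() / 2.0

def covering_radius(pts, domain):
    """Radius of the largest ball (centred in the domain) containing
    no point: the largest distance from the domain to the point set."""
    d = np.linalg.norm(domain[:, None, :] - pts[None, :, :], axis=-1)
    return d.min(axis=1).max()

# Equally spaced points on [0, 1] have mesh ratio (close to) 1,
# the hallmark of an ESP set.
pts = np.linspace(0.0, 1.0, 11)[:, None]
grid = np.linspace(0.0, 1.0, 1001)[:, None]
mesh_ratio = float(covering_radius(pts, grid) / separation_radius(pts))
```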
20. LtDAHP: Mathematical Foundations (I)
Homeomorphism: a continuous function between two topological spaces that has a continuous inverse.
ESP decomposition: Y. Xu, Orthogonal polynomials and cubature formulae on spheres and on balls, SIAM J. Math. Anal., 1998.
This machinery reduces the construction of ESPs to the hard sphere problem.
21. LtDAHP: Mathematical Foundations (II)
- Hard sphere problem: given an integer N, find a configuration of N points so as to maximize the smallest distance among the points (W. Habicht and B. L. van der Waerden, Math. Ann., 1951).
- Minimal Riesz t-energy configuration problem (B. Dahlberg, Duke Math. J., 1978).
Smale's 7th problem: how to solve the N-point minimal Riesz t-energy problem over S^{d-1} in polynomial time for arbitrary N and t (S. Smale, Mathematical problems for the next century, Math. Intell., 1998).
22. LtDAHP: Mathematical Foundations (II)
The minimal Riesz t-energy (t > d-1) configuration problem can be approximately solved by:
- Equal-area partition (EAP): D. Hardin and E. Saff, Discretizing manifolds via minimum energy points, Notices of Amer. Math. Soc., 2004.
- Recursive zonal sphere partition (RZSP): P. Leopardi, Distributing points on the sphere: partitions, separation, quadrature and energy, doctoral dissertation, University of New South Wales, 2007.
http://www.mathworks.com/matlabcentral/fileexchange/13356-eqsp-recursive-zonal-sphere-partitioning-toolbox
Computational complexity: [bound lost in extraction]
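The toolbox above is MATLAB; as a rough Python stand-in, a golden-angle (Fibonacci) lattice also produces deterministic, well-separated, nearly equal-area points on S^2. This is not Leopardi's RZSP construction, just an easy substitute for illustration:

```python
import numpy as np

def fibonacci_sphere(N):
    """Deterministic, nearly uniform points on S^2: equal-area
    latitudes combined with golden-angle longitudes."""
    i = np.arange(N)
    golden = (1.0 + 5.0 ** 0.5) / 2.0
    theta = 2.0 * np.pi * i / golden        # longitudes
    z = 1.0 - (2.0 * i + 1.0) / N           # equal-area latitudes
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

pts = fibonacci_sphere(500)                 # 500 points, all on the sphere
```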
23. LtDAHP: FNN Instance
[Figure: architecture of a conventional FNN versus the architecture of an LtDAHP-based FNN.]
24. LtDAHP: Learning Procedure (FNN Instance)
LtDAHP algorithm:
- Stage 1 (deterministic assignment): minimal Riesz (d-1)-energy points on S^{d-1} (via RZSP); best-packing points on S^1.
- Stage 2: train the BPs.
[Figure: architecture of the LtDAHP machine.]
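For d = 2 the Stage-1 assignment is fully explicit, since the best-packing points on S^1 are just N equally spaced directions. A minimal end-to-end sketch of the two stages (the target function, sample size and tanh activation are my choices for illustration):

```python
import numpy as np

def ltdahp_fnn_train(X, y, N):
    """LtDAHP, FNN instance, input dimension d = 2.
    Stage 1 (deterministic): HPs = N equally spaced directions on S^1.
    Stage 2: BPs solved by linear least squares."""
    angles = 2.0 * np.pi * np.arange(N) / N                  # best packing on S^1
    W = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # N x 2 HPs
    H = np.tanh(X @ W.T)                                     # hidden outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)             # BPs
    return W, beta

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (300, 2))
y = np.sin(X[:, 0] + X[:, 1])
W, beta = ltdahp_fnn_train(X, y, N=40)
rmse = float(np.sqrt(np.mean((np.tanh(X @ W.T) @ beta - y) ** 2)))
# Re-running yields the same W, beta and rmse: no uncertainty across trials.
```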
25. LtDAHP: Theoretical Assessment (FNN Instance)
Generalization capability: [theorem statement lost in extraction]
26. LtDAHP: Theoretical Assessment (FNN Instance)
Generalization capability:
- LtDAHP: the learning-rate bound holds for the trained machine itself [conditions lost in extraction].
- ELM: if T is randomly fixed according to the prescribed distribution, the bound holds only in expectation, so multiple trials are required in practice.
28. LtDAHP: Toy Simulations (FNN Instance)
[Figure: test error and training time of ELM (LtRAHP) versus LtDAHP, each plotted against the number of samples (m) and the number of hidden nodes (N).]
29. LtDAHP: Simulations on UCI Data Sets

Data set         Training samples  Testing samples  Attributes
Auto_Price       106               53               15
Stock            633               317              9
Bank (Bank8FM)   2999              1500             8
Delta_Ailerons   3565              3564             5
Delta_Elevators  4759              4758             6

Data set         TestRMSE (SVM / ELM / LtDAHP)  TrainMT (SVM / ELM / LtDAHP)  MSparsity (SVM / ELM / LtDAHP)
Auto_Price       0.0427 / 0.0324 / 0.0357       160 / 3.22 / 3.22             116.2 / 240.1 / 72.2
Stock            0.0478 / 0.0347 / 0.0306       5.64 / 0.325 / 0.325          26.7 / 108.1 / 148.3
Bank8FM          0.0454 / 0.0446 / 0.0421       82.1 / 1.42 / 1.42            112.9 / 88.4 / 60.5
Delta_Ailerons   0.0422 / 0.0387 / 0.0399       60.1 / 2.32 / 2.32            169.3 / 56.2 / 48.1
Delta_Elevators  0.0534 / 0.0535 / 0.0537       684 / 3.10 / 3.10             597.6 / 52.6 / 52.1
30. LtDAHP: Real-World Data Experiments

Method   TestRMSE  TrainMT  MSparsity
ELM      10.89     1989     601
LtDAHP   9.21      1989     512

Million Song Dataset (Bertin et al., 2011): the task is to predict the year in which a song was released from audio features associated with the song. The dataset consists of 463,715 training examples and 51,630 testing examples with d = 90. Each example is a song released between 1922 and 2011, represented as a vector of timbre information computed from the song.
31. LtDAHP: Real-World Data Experiments

Method   TestRMSE  TrainMT  MSparsity
ELM      0.0037    1523     534
LtDAHP   0.0017    1523     186

Buzz in Social Media: the Buzz Prediction dataset is collected from Twitter, a popular social network and micro-blogging platform with exponential growth and extremely fast dynamics. The task is to predict the mean number of active discussions (NAD) from d = 77 primary features, including the number of created discussions, the average number of author interactions, and the average discussion length. The dataset contains m = 583,250 samples, making it a genuinely large-scale problem.
32. Concluding Remarks
- LtDAHP provides a very efficient way of overcoming both the high computational burden of OSL and the uncertainty difficulty of LtRAHP.
- LtDAHP establishes a new paradigm in which supervised learning problems can be solved very simply yet effectively: the hidden parameters are preassigned deterministically and only the bright parameters are solved for, without sacrificing generalization capability.
- Many problems on LtDAHP remain open and deserve further study.
33. Thank You!