Title: Minimal Neural Networks
1. Minimal Neural Networks
Support vector machines and Bayesian learning for neural networks
Peter Andras (andrasp_at_ieee.org)
2. Bayesian neural networks I.
The Bayes rule
Let's consider a model of a system and an observation of the system (an event). The a posteriori probability that the model is correct, after the observation of the event, is proportional to the product of the a priori probability that the model is correct and the probability of the event conditioned on the correctness of the model.
Mathematically:
P(H_θ | D) ∝ P(D | H_θ) · P(H_θ)
where θ is the parameter of the model H_θ and D is the observed event.
3. Bayesian neural networks II.
Best model: the model with the highest a posteriori probability of correctness.
Model selection by optimizing the formula
θ* = argmax_θ P(D | H_θ) · P(H_θ)
4. Bayesian neural networks III.
Application to neural networks
g_θ is the function represented by the neural network, where θ is the vector of all parameters of the network.
D = {(x_i, y_i), i = 1, …, n} is the observed event (the training data).
We suppose a normal distribution for the data conditioned on the validity of a model, i.e., the observed values y_i are normally distributed around g_θ(x_i) if θ is the correct parameter vector.
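Under this assumption the likelihood of the data can be written out as follows (a sketch; the noise variance σ² is a symbol introduced here, not taken from the slides):
P(D \mid H_\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - g_\theta(x_i))^2}{2\sigma^2}\right)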
5. Bayesian neural networks IV.
By carrying out the calculation (taking the negative logarithm of the a posteriori probability) we get a sum-of-squared-errors term plus a term coming from the prior, and the new formula for optimization is the minimization of this combined error (see the sketch below).
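A sketch of the resulting optimization formula, assuming the Gaussian likelihood above and a prior of the form P(H_θ) ∝ exp(−λ c(θ)) (σ, λ and c(θ) are illustrative symbols, not taken from the slides):
\theta^{*} = \arg\min_{\theta} \; \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - g_\theta(x_i))^2 + \lambda\, c(\theta)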
6. Bayesian neural networks V.
The equivalence of regularization and Bayesian model selection
Regularization formula: squared-error term plus a regularization term.
Bayesian optimization formula: squared-error term plus a term derived from the prior.
Equivalence: the regularization term corresponds to the negative logarithm of the prior; both represent a priori information about the correct solution.
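Written side by side (a sketch; Ω denotes a generic regularizer, a symbol chosen here):
\sum_{i=1}^{n} (y_i - g_\theta(x_i))^2 + \lambda\, \Omega(\theta)
\qquad \text{vs.} \qquad
\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - g_\theta(x_i))^2 - \log P(H_\theta)
The regularization term λ·Ω(θ) plays the role of −log P(H_θ).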
7. Bayesian neural networks VI.
Bayesian pruning by regularization
Gauss pruning
Laplace pruning
Cauchy pruning
N is the number of components of the θ vectors.
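The penalty terms usually associated with these three pruning schemes are, as a sketch (constants and exact notation are not taken from the slides):
\text{Gauss: } \lambda \sum_{j=1}^{N} \theta_j^2 \qquad
\text{Laplace: } \lambda \sum_{j=1}^{N} |\theta_j| \qquad
\text{Cauchy: } \lambda \sum_{j=1}^{N} \log(1 + \theta_j^2)
These correspond to Gaussian, Laplacian and Cauchy priors on the parameters; parameters pushed towards zero can then be pruned from the network.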
8. Support vector machines - SVM I.
Linearly separable classes
- many separators
- there is an optimal separator
9. Support vector machines - SVM II.
How to find the optimal separator?
- support vectors
- overspecification
Property: with one less support vector we get a new optimal separator.
10. Support vector machines - SVM III.
We look for minimal and robust separators. These are minimal and robust models of the data. The full data set is equivalent to the set of support vectors with respect to the specification of the minimal robust model.
11. Support vector machines - SVM IV.
Mathematical problem formulation I.
We represent the separator as a pair (w, b), where w is a vector and b is a scalar.
We look for w and b such that they satisfy
y_i (w^T x_i + b) ≥ 1 for all i.
The support vectors are those x_i for which this inequality is in fact an equality.
12. Support vector machines - SVM V.
Mathematical problem formulation II.
The distances from the origin of the hyper-planes through the support vectors, w^T x + b = 1 and w^T x + b = -1, are |1 - b| / ||w|| and |1 + b| / ||w||.
The distance between the two planes is 2 / ||w||.
13. Support vector machines - SVM VI.
Mathematical problem formulation III.
Optimal separator: the distance between the two hyper-planes is maximal.
Optimization: minimize (1/2) ||w||^2
with the restrictions that y_i (w^T x_i + b) ≥ 1, or in other form y_i (w^T x_i + b) - 1 ≥ 0, for all i.
14. Support vector machines - SVM VII.
Mathematical problem formulation IV.
Complete optimization formula, using Lagrange
multipliers
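The primal Lagrangian of this problem, written out as a sketch (the multipliers α_i ≥ 0 follow the standard notation; the slide's own symbols are not preserved):
L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]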
15. Support vector machines - SVM VIII.
Mathematical problem formulation V.
Writing the optimality conditions for w and b we get
w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0.
The dual problem is obtained by substituting these back into the Lagrangian (sketched below).
The support vectors are those x_i for which α_i is strictly positive.
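The resulting dual problem, as a sketch consistent with the primal above:
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^T x_j
\quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0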
16. Support vector machines - SVM IX.
Graphical interpretation
We search for the point where a hyper-ellipsoid (a level set of the quadratic dual objective) touches the positive orthant of the α space.
17. Support vector machines - SVM X.
How to solve the support vector problem?
Optimization with respect to the α-s:
- gradient method
- Newton and quasi-Newton methods
We get as result
- the support vectors
- the optimal linear separator
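As an illustration (not from the original slides), a minimal runnable sketch that solves the linear support vector problem with scikit-learn and reads off the support vectors and the optimal separator (w, b); the toy data and all names below are chosen for the example:

import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (toy data for the sketch)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(20, 2)),
               rng.normal(loc=+2.0, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# A linear SVM with a large C approximates the hard-margin problem
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("support vectors:", clf.support_vectors_)  # the x_i with alpha_i > 0
print("w =", clf.coef_[0])                       # normal vector of the separator
print("b =", clf.intercept_[0])                  # offset of the separator

scikit-learn solves the dual internally with an SMO-type solver, so the optimization over the α-s is hidden behind fit().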
18. Support vector machines - SVM XI.
Implications for artificial neural networks
- robust perceptron (low sensitivity to noise)
- minimal linear classification neural network
19. Support vector machines - SVM XII.
What can we do if the boundary is nonlinear?
Idea: transform the data vectors to a space where the separator is linear.
20. Support vector machines - SVM XIII.
The transformation is often made to an infinite dimensional space, usually a function space. Example: x → cos(u^T x).
21. Support vector machines - SVM XIV.
The new optimization formulas are the same as before, with the transformed vectors Φ(x_i) in place of the x_i, so the dual now contains the products Φ(x_i)^T Φ(x_j).
22. Support vector machines - SVM XV.
How to handle the products of the transformed vectors?
Idea: use a transformation that fits the Mercer theorem.
Mercer theorem: let K be a continuous, symmetric kernel; then K has a decomposition
K(x, z) = Φ(x)^T Φ(z),
where Φ maps the data into H and H is a function space,
if and only if
∫∫ K(x, z) f(x) f(z) dx dz ≥ 0
for each square-integrable function f.
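A small numerical illustration (not from the slides): on any finite sample a Mercer kernel must produce a positive semi-definite Gram matrix, which can be checked directly. The Gaussian kernel and the random sample below are chosen here as an example:

import numpy as np

def gaussian_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), a standard Mercer kernel
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))                                   # 30 sample points in R^3
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])  # Gram matrix

eigvals = np.linalg.eigvalsh(K)               # K is symmetric, so eigvalsh
print("smallest eigenvalue:", eigvals.min())  # non-negative up to round-off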
23. Support vector machines - SVM XVI.
Optimization formula with a transformation that fits the Mercer theorem: the dual problem is the same as before, with the kernel values K(x_i, x_j) in place of the products x_i^T x_j.
The form of the solution is a kernel expansion over the support vectors (sketched below);
b is determined from an equation valid for a support vector.
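A sketch of the kernelized formulas, consistent with the linear case above (x_s denotes any support vector; the notation is chosen here, not copied from the slides):
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j),
\qquad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0
g(x) = \operatorname{sign}\left( \sum_i \alpha_i y_i\, K(x_i, x) + b \right),
\qquad
y_s \left( \sum_i \alpha_i y_i\, K(x_i, x_s) + b \right) = 1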
24. Support vector machines - SVM XVII.
Examples of transformations and kernels
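Standard examples of such transformation/kernel pairs (filled in here as a sketch; the specific examples of the original slide are not preserved):
- Φ(x) = x (identity map): K(x, z) = x^T z
- for x in R^2, Φ(x) = (x_1^2, sqrt(2)·x_1·x_2, x_2^2): K(x, z) = (x^T z)^2
- Φ mapping into a Gaussian reproducing-kernel Hilbert space: K(x, z) = exp(-||x - z||^2 / (2σ^2))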
25. Support vector machines - SVM XVIII.
Other typical kernels
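Listed here as a sketch of the standard choices (the original slide's own list is not preserved):
- polynomial kernel: K(x, z) = (x^T z + 1)^d
- Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2σ^2))
- sigmoid kernel: K(x, z) = tanh(κ x^T z + c), a Mercer kernel only for certain κ and c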
26. Support vector machines - SVM XIX.
Summary of main ideas
- look for a minimal complexity classification
- transform the data to another space where the class boundaries are linear
- use Mercer kernels
27. Support vector machines - SVM XX.
Practical issues
- the global optimization doesn't work with large amounts of data → sequential optimization with chunks of the data (a sketch follows below)
- the resulting models are minimal complexity models; they are insensitive to noise and keep the generalization ability of the more complex models
- applications: character recognition, economic forecasting
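As an illustration of chunk-wise training only (this is stochastic gradient descent on the hinge loss, not the chunking algorithm referred to above), a minimal sketch with scikit-learn's SGDClassifier; the data and all names below are chosen for the example:

import numpy as np
from sklearn.linear_model import SGDClassifier

# Large toy data set, shuffled, for sequential training
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=-2.0, size=(5000, 2)),
               rng.normal(loc=+2.0, size=(5000, 2))])
y = np.array([-1] * 5000 + [+1] * 5000)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Hinge loss gives a linear SVM objective, optimized one chunk at a time
clf = SGDClassifier(loss="hinge")
for start in range(0, len(y), 1000):   # 1000-sample chunks
    chunk = slice(start, start + 1000)
    clf.partial_fit(X[chunk], y[chunk], classes=np.array([-1, 1]))

print("w =", clf.coef_[0], "b =", clf.intercept_[0])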
28. Regularization neural networks
General optimization vs. optimization over the grid
The regularization operator T specifies the grid:
- we look for functions that satisfy ||Tg||^2 = 0
- in the relaxed case the regularization operator is incorporated as a penalty term in the error function
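In the relaxed case the error function to be minimized takes the standard form (a sketch; the regularization parameter λ is a symbol introduced here):
E(g) = \sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda\, \|Tg\|^2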