Title: Sieci neuronowe (Neural Networks)
1. Neural Networks: Model-Independent Data Analysis?
- K. M. Graczyk
- IFT, Uniwersytet Wroclawski
- Poland
2. Why Neural Networks?
- Inspired by C. Giunti (Torino)
- PDFs by Neural Networks
- Papers of Forte et al. (JHEP 0205 (2002) 062; JHEP 0503 (2005) 080; JHEP 0703 (2007) 039; Nucl. Phys. B809 (2009) 1-63)
- A kind of model-independent way of fitting data and computing the associated uncertainty
- Learn, Implement, Publish (the LIP rule)
- Cooperation with R. Sulej (IPJ, Warszawa) and P. Plonski (Politechnika Warszawska): NetMaker
- GrANNet: my own C++ library
3. Road Map
- Artificial Neural Networks (NN) idea
- Feed Forward NN
- PDFs by NN
- Bayesian statistics
- Bayesian approach to NN
- GrANNet
4. Inspired by Nature
The human brain consists of around 10^11 neurons, which are highly interconnected with around 10^15 connections.
5. Applications
- Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling.
- Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
- Data processing, including filtering, clustering, blind source separation and compression.
- Robotics, including directing manipulators, computer numerical control.
6. Artificial Neural Network
The simplest example: with linear activation functions the whole network reduces to a matrix (see the sketch below).
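This is the standard observation behind the slide: composing purely linear layers collapses into a single linear map,

\[
\mathbf{y} = W_2 \left( W_1 \mathbf{x} \right) = \left( W_2 W_1 \right) \mathbf{x} \equiv W \mathbf{x},
\]

so stacking linear layers adds no expressive power beyond one matrix W; nonlinear activations are what make deeper networks useful.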
7. Threshold
8. Activation functions
- Heaviside step function θ(x) → a 0 or 1 signal
- sigmoid function
- tanh()
- linear
[Figure: activation response curves; in one region the signal is amplified, in another it is weakened.]
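For reference, the standard definitions of the activations listed above (a math note, with the convention θ(x) = 1 for x ≥ 0):

\[
\theta(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}, \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad
f(x) = x .
\]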
9. Architecture
- A 3-layer network, two hidden layers: e.g. the 1-2-1-1 architecture, 9 parameters in total (counting the bias weights)
- Bias neurons (emitting a constant signal of one) are used instead of thresholds
[Figure: the network response F(x) as a function of the input x; hidden units use the symmetric sigmoid activation function, the output unit a linear one.]
10. Neural Networks: Function Approximation
- The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. for the sigmoidal functions. (Wikipedia.org)
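In formula form, the theorem concerns approximants of the one-hidden-layer type (standard notation, not from the slide):

\[
F(x) = \sum_{j=1}^{M} c_j \, \sigma\!\left( w_j x + b_j \right),
\]

which can approximate any continuous function on a compact interval arbitrarily well for a sufficiently large number of hidden units M.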
11. A map from one vector space to another
12. Supervised Learning
- Propose the error function: in principle any continuous function which has a global minimum
- Motivated by statistics: the standard error function, chi2, etc.
- Consider a set of data
- Train a given NN by showing it the data → minimize the error function (see the formulas below)
- Back-propagation algorithms: an iterative procedure which fixes the weights
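For concreteness, the chi2 error mentioned above and the simplest iterative update (standard formulas):

\[
\chi^2(w) = \sum_{i=1}^{N_{dat}} \frac{\left( y_i - F(x_i; w) \right)^2}{\sigma_i^2},
\qquad
w^{(t+1)} = w^{(t)} - \eta \, \nabla_w \chi^2\!\left( w^{(t)} \right),
\]

where η is the learning rate; back-propagation is the standard way of computing the gradient layer by layer.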
13. Learning Algorithms
- Gradient algorithms:
- Gradient descent
- RPROP (Riedmiller, Braun)
- Conjugate gradients
- Algorithms which look at the curvature:
- QuickProp (Fahlman)
- Levenberg-Marquardt (Hessian)
- Newton's method (Hessian)
- Monte Carlo algorithms (based on Markov chain methods)
A sketch of the RPROP update follows below.
14. Overfitting
- More complex models describe the data better, but lose generality: the bias-variance trade-off
- Overfitting → large values of the weights
- Compare with the test set (must be about twice as large as the training set)
- Regularization → an additional penalty term in the error function, controlled by the decay rate (see the formula below)
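The standard weight-decay form of this penalty, consistent with the decay rate α used in the Bayesian part below:

\[
\widetilde{E}(w) = E_D(w) + \alpha E_W(w), \qquad
E_W(w) = \frac{1}{2} \sum_{i=1}^{W} w_i^2 ,
\]

where E_D is the data error (e.g. chi2/2) and the decay rate α penalizes large weights.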
15. What about physics?
Problems: some general constraints; a model-independent analysis; a statistical model → data → uncertainty of the predictions.
16. Fitting data with Artificial Neural Networks
- The goal of network training is not to learn an exact representation of the training data itself, but rather to build a statistical model of the process which generates the data. (C. Bishop, Neural Networks for Pattern Recognition)
17. Parton Distribution Functions with NN
18. Parton Distribution Functions: S. Forte, L. Garrido, J. I. Latorre and A. Piccione, JHEP 0205 (2002) 062
- A kind of model-independent analysis of the data
- Construction of the probability density P[F] in the space of the structure functions F(x, Q²)
- In practice, only one neural network architecture
- Probability density in the space of parameters of one particular NN
But in reality Forte et al. did:
19. The idea comes from W. T. Giele and S. Keller
Train N_rep neural networks, one for each set of N_dat pseudo-data (Monte Carlo replicas of the original measurements).
The N_rep trained neural networks → provide a representation of the probability measure in the space of the structure functions.
20. Uncertainty and correlation
[The slide showed the replica-ensemble estimators of the uncertainty and the correlation; they are reconstructed below.]
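The standard Monte Carlo replica estimators implied by the slide (usual definitions, reconstructed):

\[
\langle F \rangle = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} F^{(k)}, \qquad
\sigma_F^2 = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \left( F^{(k)} - \langle F \rangle \right)^2 ,
\]

\[
\rho\left( F_i, F_j \right) = \frac{\langle F_i F_j \rangle - \langle F_i \rangle \langle F_j \rangle}{\sigma_{F_i} \, \sigma_{F_j}} ,
\]

i.e. averages over the ensemble of trained networks replace averages over the unknown probability measure.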
21. 10, 100 and 1000 replicas
22. Length of training
[Panels: short, long enough, and too long training compared on 30 data points; training too long leads to overfitting.]
24. My criticism
- The simultaneous use of artificial data and the chi2 error function overestimates the uncertainty?
- Other NN architectures are not discussed
- Problems with overfitting (a test set is needed)
- A relatively simple approach compared with present techniques in NN computing
- The uncertainty of the model predictions should be generated by the probability distribution obtained for the model, rather than by the data itself
25. GrANNet. Why?
- I stole some ideas from FANN
- C++ library, easy to use
- User-defined error function (any you wish)
- Easy access to the units and their weights
- Several ways of initializing a network of a given architecture
- Bayesian learning
- Main objects: the classes NeuralNetwork and Unit
- Learning algorithms so far: QuickProp, Rprop+, Rprop-, iRprop-, iRprop+
- Network response uncertainty (based on the Hessian)
- Some simple restarting and stopping solutions
26. Structure of GrANNet
- Libraries:
- Unit class
- Neural_Network class
- Activation (activation and error function structures)
- Learning algorithms: RProp+, RProp-, iRProp+, iRProp-, QuickProp, BackProp
- generatormt
- TNT inverse matrix package
(an illustrative sketch of the Unit/Neural_Network pair follows below)
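The slides name the Unit and Neural_Network classes but show no code; here is a self-contained sketch of how such a pair might fit together. It is hypothetical: every member name used below (w, net_input, layers, response) is invented for illustration and need not match the real GrANNet API.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical Unit: w[0] is the bias weight (fed by the constant "signal
// one" bias neuron), w[i+1] multiplies the i-th input.
struct Unit {
    std::vector<double> w;
    double net_input(const std::vector<double>& in) const {
        double a = w[0];                                   // bias contribution
        for (std::size_t i = 0; i < in.size(); ++i) a += w[i + 1] * in[i];
        return a;
    }
};

double sigmoid_sym(double a) { return std::tanh(a); }      // symmetric sigmoid

// Hypothetical Neural_Network: hidden layers use the symmetric sigmoid,
// the output layer is linear, as on the architecture slide.
struct Neural_Network {
    std::vector<std::vector<Unit>> layers;
    double response(double x) const {
        std::vector<double> signal{x};
        for (std::size_t l = 0; l < layers.size(); ++l) {
            std::vector<double> next;
            for (const Unit& u : layers[l]) {
                const double a = u.net_input(signal);
                next.push_back(l + 1 < layers.size() ? sigmoid_sym(a) : a);
            }
            signal = next;
        }
        return signal[0];
    }
};

int main() {
    Neural_Network net;
    // a 1-2-1 toy network with hand-set weights, showing the "easy access
    // to units and their weights" promised above
    net.layers = {{Unit{{0.1, 0.5}}, Unit{{-0.2, 0.7}}},   // hidden layer
                  {Unit{{0.0, 1.0, -1.0}}}};               // linear output
    std::printf("F(0.25) = %f\n", net.response(0.25));
    return 0;
}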
27. Bayesian Approach
- "Common sense reduced to calculations" (Laplace)
28. Bayesian Framework for BackProp NN (MacKay, Bishop)
- Objective criteria for comparing alternative network solutions, in particular with different architectures
- Objective criteria for setting the decay rate α
- Objective choice of the regularizing function E_W
- Comparison with test data is not required.
29. Notation and Conventions
30. Model Classification
- A collection of models: H1, H2, ..., Hk
- We believe that the models are classified by prior probabilities P(H1), P(H2), ..., P(Hk) (summing to 1)
- After observing data D → Bayes' rule (below) →
- Usually at the beginning P(H1) = P(H2) = ... = P(Hk)
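The Bayes rule referred to above (the formula itself appeared on the slide as an image):

\[
P(H_i \mid D) = \frac{P(D \mid H_i) \, P(H_i)}{\sum_k P(D \mid H_k) \, P(H_k)} ,
\]

so after the data arrive the models are re-ranked by their evidences P(D|Hi).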
31. Single Model Statistics
- Assume that model Hi is the correct one
- The neural network A with weights w is considered
- Task 1: assuming some prior probability of w, after including the data, construct the posterior
- Task 2: consider the space of hypotheses and construct the evidence for them (both tasks are summarized by the formulas below)
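In formulas, using standard Bayesian notation matching MacKay's framework:

\[
\text{Task 1:} \quad P(w \mid D, H_i) = \frac{P(D \mid w, H_i) \, P(w \mid H_i)}{P(D \mid H_i)},
\qquad
\text{Task 2:} \quad P(D \mid H_i) = \int \! dw \; P(D \mid w, H_i) \, P(w \mid H_i) ,
\]

where the normalization of Task 1 is exactly the evidence needed in Task 2.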
32. Hierarchy
33. Constructing prior and posterior functions
The weight distribution!
[Figure: prior, likelihood, and posterior probability densities in weight space; the posterior is peaked at w0. The standard forms are reconstructed below.]
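The standard functional forms behind these curves (MacKay's conventions; with the chi2 error, E_D = χ²/2):

\[
P(w \mid \alpha, H) = \frac{e^{-\alpha E_W(w)}}{Z_W(\alpha)}, \qquad
P(D \mid w, H) = \frac{e^{-E_D(w)}}{Z_D}, \qquad
P(w \mid D, \alpha, H) = \frac{e^{-\left( E_D(w) + \alpha E_W(w) \right)}}{Z_M(\alpha)} ,
\]

so the posterior combines the data error and the weight-decay penalty in a single exponent.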
34. Computing the Posterior
[The slide showed the Gaussian approximation of the posterior: the Hessian of the error determines the covariance matrix; see the reconstruction below.]
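The usual Laplace (Gaussian) approximation implied here: expanding the total error around the most probable weights w_MP,

\[
E(w) \approx E(w_{MP}) + \tfrac{1}{2} \left( w - w_{MP} \right)^{T} A \left( w - w_{MP} \right),
\qquad
A = \nabla \nabla E \big|_{w_{MP}} ,
\]

so the posterior is approximately Gaussian around w_MP with covariance matrix A^{-1}, where A is the Hessian.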
35. How to fix the proper α?
- Two ideas:
- Evidence approximation (MacKay):
- find w_MP
- find α_MP
- (valid if the evidence is sharply peaked!)
- Hierarchical:
- perform the integrals over α analytically
36. Getting α_MP
The effective number of well-determined parameters; an iterative procedure during the training (see below).
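MacKay's standard result for the evidence-approximation update (reconstructed; λ_i are the eigenvalues of the Hessian of the data error):

\[
\gamma = \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha}, \qquad
\alpha_{MP} = \frac{\gamma}{2 E_W(w_{MP})} ,
\]

where γ counts the well-determined parameters; in practice α is re-estimated from this formula repeatedly as the training proceeds.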
37. Bayesian Model Comparison: the Occam Factor
- The log of the Occam factor ↔ the amount of information we gain after the data have arrived
- A small Occam factor ↔ complex models: the posterior occupies only a small fraction of the large accessible prior phase space
- A large Occam factor ↔ simple models: a small accessible prior phase space, so the posterior fills a larger fraction of it
- Evidence ≈ best-fit likelihood × Occam factor (see below)
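The decomposition alluded to by the last line (MacKay's standard formula):

\[
P(D \mid H_i) \approx \underbrace{P(D \mid w_{MP}, H_i)}_{\text{best-fit likelihood}} \times \underbrace{P(w_{MP} \mid H_i) \, \Delta w}_{\text{Occam factor}},
\qquad
\text{Occam factor} = \frac{\Delta w_{posterior}}{\Delta w_{prior}} ,
\]

where Δw_posterior is the width of the posterior peak and Δw_prior the accessible prior range.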
38. Evidence
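A reconstruction of the log-evidence this slide presumably showed (MacKay's formula for a network with W weights, in the conventions used above):

\[
\ln P(D \mid \alpha, H) = -\alpha E_W^{MP} - E_D^{MP} - \tfrac{1}{2} \ln \det A + \tfrac{W}{2} \ln \alpha + \text{const} ,
\]

and for a network with M hidden units the evidence is further multiplied by the permutation-symmetry factor 2^M M!, which matters when comparing different architectures.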
40. The 1-3-1 network is preferred by the data
42. The 1-3-1 network seems to be preferred by the data