Title: Neural Networks
1. Neural Networks
2. Previously in Statistical Methods...
- Agents can handle uncertainty by using the methods of probability and decision theory
- But first they must learn their probabilistic theories of the world from experience...
3. Previously in Statistical Methods...
- Key concepts
- Data: evidence, i.e., instantiations of one or more random variables describing the domain
- Hypotheses: probabilistic theories of how the domain works
4. Previously in Statistical Methods...
- Outline
- Bayesian learning
- Maximum a posteriori and maximum likelihood learning
- Instance-based learning
- Neural networks...
5. Today's Remainder
- Some more from the previous lecture and some new material
- Network structure
- Perceptrons
- Multilayer Feed-Forward Neural Networks
- Learning Networks?
6-13. Neural Networks and Games (image slides)
14. So...
- ...how about this dark art?
- According to Robert Hecht-Nielsen, a neural network is simply "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs"
- We skip the biology and provide the bare basics
15. Network Structure
- Input units
- Hidden units
- Output units
16. Network Structure
- Feed-forward networks
- Recurrent networks
- Feedback from output units to input
17. Feed-Forward Network
- A feed-forward network is a parameterized family of nonlinear functions
- g is the activation function
- The Ws are the weights to be adapted; adapting them is the learning
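In the usual notation (assumed here; the slide's own formula is not in the text), each unit j computes its output from the outputs a_i of the units feeding into it:

  a_j = g(in_j) = g\Big( \sum_i W_{i,j} \, a_i \Big)

so the network as a whole is a nonlinear function h_W(x) of the inputs, parameterized by the weights W_{i,j}.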
18. Activation Functions
- Often have the form of a step function (a threshold) or a sigmoid
- N.B. thresholding is a degenerate sigmoid
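The standard definitions (assumed here) make the remark concrete:

  g_{\text{sigmoid}}(in) = \frac{1}{1 + e^{-in}}, \qquad g_{\text{threshold}}(in) = \begin{cases} 1 & \text{if } in \ge 0 \\ 0 & \text{otherwise} \end{cases}

As the sigmoid is made steeper it approaches the threshold function, which is why thresholding can be viewed as a degenerate sigmoid.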
19. Perceptrons
- Single-layer neural network
- Expressiveness
- A perceptron with a step-function g can learn AND, OR, NOT, and majority, but not XOR (XOR is not linearly separable); see the sketch below
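A minimal Python sketch (hand-set weights, purely for illustration; not from the slides) of a step-function perceptron computing AND and OR, and why XOR is out of reach:

  def perceptron(weights, bias, x):
      # Step-function perceptron: output 1 iff the weighted input sum clears the bias.
      return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

  inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
  print([perceptron([1, 1], -1.5, x) for x in inputs])  # AND: [0, 0, 0, 1]
  print([perceptron([1, 1], -0.5, x) for x in inputs])  # OR:  [0, 1, 1, 1]
  # XOR would require [0, 1, 1, 0]; no single linear decision boundary separates those points.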
20. Learning in Sigmoid Perceptrons
- The idea is to adjust the weights so as to minimize some measure of error on the training set
- Learning = optimization of the weights
- This can be done using general optimization routines for continuous spaces
21. Learning in Sigmoid Perceptrons
- The idea is to adjust the weights so as to minimize some measure of error on the training set
- The error measure most often used for NNs is the sum of squared errors
22. Learning in Sigmoid Perceptrons
- The error measure most often used for NNs is the sum of squared errors
- Perform the optimization search by gradient descent
- Weight update rule: α is the learning rate (see below)
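Written out in the standard notation (assumed here, since the formulas are not in the text above): for one training example (x, y), the squared error and the gradient-descent update for weight W_j are

  E = \tfrac{1}{2}\,(y - h_{\mathbf{W}}(\mathbf{x}))^2, \qquad h_{\mathbf{W}}(\mathbf{x}) = g\Big(\sum_j W_j x_j\Big)

  W_j \leftarrow W_j + \alpha \,(y - h_{\mathbf{W}}(\mathbf{x}))\, g'(in)\, x_j

where in = \sum_j W_j x_j and g' is the derivative of the activation function; for the sigmoid, g'(in) = g(in)(1 - g(in)).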
23. Simple Comparison
24. Some Remarks
- The thresholded perceptron learning rule converges to a consistent function for any linearly separable data set
- A sigmoid perceptron's output can be interpreted as a conditional probability
- An interpretation in terms of maximum likelihood (ML) estimation is also possible
25. Multilayer Feed-Forward NN
- Network with hidden units
- Adding hidden layers enlarges the hypothesis space
- Most common: a single hidden layer
26. Expressiveness
- 2-input perceptron
- 2-input single-hidden-layer neural network obtained by combining perceptron outputs (see the sketch below)
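A minimal illustration (hand-set weights; reuses the hypothetical perceptron helper from the sketch under slide 19) of how combining perceptron outputs in a single hidden layer yields XOR, which a lone perceptron cannot represent:

  def xor_net(x):
      h1 = perceptron([1, 1], -0.5, x)            # hidden unit 1: OR(x1, x2)
      h2 = perceptron([-1, -1], 1.5, x)           # hidden unit 2: NAND(x1, x2)
      return perceptron([1, 1], -1.5, (h1, h2))   # output unit: AND(h1, h2) = XOR(x1, x2)

  print([xor_net(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]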
27. Expressiveness
- With a single, sufficiently large hidden layer it is possible to approximate any continuous function
- With two layers, discontinuous functions can be approximated as well
- For a particular network it is hard to say what exactly can be represented
28. Learning in Multilayer NN
- Back-propagation is used to perform the weight updates in the network
- Similar to perceptron learning
- The major difference: the error at the output is clear, but how do we measure the error at the nodes in the hidden layers?
- Additionally, we should deal with multiple outputs
29. Learning in Multilayer NN
- At the output layer the weight-update rule is similar to the one for the perceptron... but now for multiple outputs i
- Idea of back-propagation: every hidden unit contributes some fraction to the error of the output node to which it connects
30. Learning in Multilayer NN
- ...contributes some fraction to the error of the output node to which it connects
- Thus errors are divided according to connection strength, i.e., the weights
- Update rule (see below)
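The update rules referred to above, in the usual AIMA-style notation (reconstructed, not copied from the slides; α is the learning rate, a_j the output of unit j, in_j its weighted input sum):

  \Delta_k = g'(in_k)\,(y_k - a_k)                     (error term at output unit k)
  W_{j,k} \leftarrow W_{j,k} + \alpha\, a_j\, \Delta_k
  \Delta_j = g'(in_j) \sum_k W_{j,k}\, \Delta_k         (error passed back to hidden unit j, split by connection strength)
  W_{i,j} \leftarrow W_{i,j} + \alpha\, a_i\, \Delta_j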
31. E.g. Learning...
- Training curve for 100 restaurant examples: exact fit
32. Learning NN Structures?
- How to find the best network structure?
- Too big results in lookup-table behavior / overtraining
- Too small results in undertraining / not exploiting the full expressiveness
- Possibility: try different structures and validate using, for example, cross-validation (see the sketch below)
- But which different structures to consider?
- Start with a fully connected network and remove nodes: optimal brain damage
- Grow larger networks from smaller ones, e.g., tiling and NEAT
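A minimal sketch of the "try structures and cross-validate" idea, assuming scikit-learn and a toy two-dimensional data set (names and numbers are illustrative, not from the lecture):

  from sklearn.datasets import make_moons
  from sklearn.model_selection import cross_val_score
  from sklearn.neural_network import MLPClassifier

  X, y = make_moons(n_samples=300, noise=0.25, random_state=0)  # toy data set

  # Candidate structures: a single hidden layer of increasing size.
  for hidden in (1, 2, 4, 8, 16):
      net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=5000, random_state=0)
      scores = cross_val_score(net, X, y, cv=5)  # 5-fold cross-validated accuracy
      print(hidden, round(scores.mean(), 3))

Too small a hidden layer underfits; beyond some size the cross-validated score stops improving, which is the signal to stop growing the network.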
33. Learning NN Structures?
34. Finally, Some Remarks
- An NN is a possibly complex nonlinear function with many parameters that have to be tuned
- Problems: slow convergence, local minima
- Back-propagation was explained, but other optimization schemes are possible
- A perceptron can handle linearly separable functions
- A multilayer NN can represent any kind of function
35. And More...
- It is hard to come up with the optimal network
- The learning rate, initial weights, etc. have to be set
- The activation function, etc., has to be chosen
- Not much magic there... nor black art...
- No sorcery, just nimbleness! ("Keine Hexerei, nur Behändigkeit!")
36. And with that Disappointing Message...