Title: Neural Networks
1. Neural Networks
2. Previously in Statistical Methods...
- Agents can handle uncertainty by using the methods of probability and decision theory
- But first they must learn their probabilistic theories of the world from experience...
3. Previously in Statistical Methods...
- Key concepts
- Data: evidence, i.e., instantiations of one or more random variables describing the domain
- Hypotheses: probabilistic theories of how the domain works
4. Previously in Statistical Methods...
- Outline
- Bayesian learning
- Maximum a posteriori and maximum likelihood learning
- Instance-based learning
- Neural networks...
5. Today's Remainder
- Some more from the previous lecture and some new material
- Network structure
- Perceptrons
- Multilayer Feed-Forward Neural Networks
- Learning Networks?
6-13. Neural Networks and Games (image slides)
14. So...
- ...how about this dark art?
- According to Robert Hecht-Nielsen, a neural network is simply "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs"
- We skip the biology and provide the bare basics
15. Network Structure
- Input units
- Hidden units
- Output units
16. Network Structure
- Feed-forward networks
- Recurrent networks
- Feedback from output units to input
17. Feed-Forward Network
- A feed-forward network is a parameterized family of nonlinear functions
- g is the activation function
- The Ws are the weights to be adapted; adapting them is the learning
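In the usual notation (assumed here; the slide's own formula is not in the text), each unit j computes its output from the outputs a_i of the units feeding into it:

  a_j = g(in_j) = g\Big( \sum_i W_{i,j} \, a_i \Big)

so the network as a whole is a nonlinear function h_W(x) of the inputs, parameterized by the weights W_{i,j}.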
18. Activation Functions
- Often have the form of a step function (a threshold) or a sigmoid
- N.B. thresholding is a degenerate sigmoid
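The standard definitions (assumed here) make the remark concrete:

  g_{\text{sigmoid}}(in) = \frac{1}{1 + e^{-in}}, \qquad g_{\text{threshold}}(in) = \begin{cases} 1 & \text{if } in \ge 0 \\ 0 & \text{otherwise} \end{cases}

As the sigmoid is made steeper it approaches the threshold function, which is why thresholding can be viewed as a degenerate sigmoid.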
19. Perceptrons
- Single-layer neural network
- Expressiveness
- A perceptron with a step-function g can learn AND, OR, NOT, and majority, but not XOR (XOR is not linearly separable); see the sketch below
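A minimal Python sketch (hand-set weights, purely for illustration; not from the slides) of a step-function perceptron computing AND and OR, and why XOR is out of reach:

  def perceptron(weights, bias, x):
      # Step-function perceptron: output 1 iff the weighted input sum clears the bias.
      return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

  inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
  print([perceptron([1, 1], -1.5, x) for x in inputs])  # AND: [0, 0, 0, 1]
  print([perceptron([1, 1], -0.5, x) for x in inputs])  # OR:  [0, 1, 1, 1]
  # XOR would require [0, 1, 1, 0]; no single linear decision boundary separates those points.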
20. Learning in Sigmoid Perceptrons
- The idea is to adjust the weights so as to minimize some measure of error on the training set
- Learning = optimization of the weights
- This can be done using general optimization routines for continuous spaces
21. Learning in Sigmoid Perceptrons
- The idea is to adjust the weights so as to minimize some measure of error on the training set
- The error measure most often used for NNs is the sum of squared errors
22. Learning in Sigmoid Perceptrons
- The error measure most often used for NNs is the sum of squared errors
- Perform the optimization search by gradient descent
- Weight update rule: α is the learning rate (see below)
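Written out in the standard notation (assumed here, since the formulas are not in the text above): for one training example (x, y), the squared error and the gradient-descent update for weight W_j are

  E = \tfrac{1}{2}\,(y - h_{\mathbf{W}}(\mathbf{x}))^2, \qquad h_{\mathbf{W}}(\mathbf{x}) = g\Big(\sum_j W_j x_j\Big)

  W_j \leftarrow W_j + \alpha \,(y - h_{\mathbf{W}}(\mathbf{x}))\, g'(in)\, x_j

where in = \sum_j W_j x_j and g' is the derivative of the activation function; for the sigmoid, g'(in) = g(in)(1 - g(in)).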
23. Simple Comparison
24. Some Remarks
- The thresholded perceptron learning rule converges to a consistent function for any linearly separable data set
- A sigmoid perceptron's output can be interpreted as a conditional probability
- An interpretation in terms of maximum likelihood (ML) estimation is also possible
25. Multilayer Feed-Forward NN
- Network with hidden units
- Adding hidden layers enlarges the hypothesis space
- Most common: a single hidden layer
26. Expressiveness
- 2-input perceptron
- 2-input single-hidden-layer neural network obtained by combining perceptron outputs (see the sketch below)
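A minimal illustration (hand-set weights; reuses the hypothetical perceptron helper from the sketch under slide 19) of how combining perceptron outputs in a single hidden layer yields XOR, which a lone perceptron cannot represent:

  def xor_net(x):
      h1 = perceptron([1, 1], -0.5, x)            # hidden unit 1: OR(x1, x2)
      h2 = perceptron([-1, -1], 1.5, x)           # hidden unit 2: NAND(x1, x2)
      return perceptron([1, 1], -1.5, (h1, h2))   # output unit: AND(h1, h2) = XOR(x1, x2)

  print([xor_net(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]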
27. Expressiveness
- With a single, sufficiently large hidden layer it is possible to approximate any continuous function
- With two layers, discontinuous functions can be approximated as well
- For a particular network it is hard to say what exactly can be represented
28. Learning in Multilayer NN
- Back-propagation is used to perform the weight updates in the network
- Similar to perceptron learning
- The major difference: the error at the output is clear, but how do we measure the error at the nodes in the hidden layers?
- Additionally, we should deal with multiple outputs
29. Learning in Multilayer NN
- At the output layer the weight-update rule is similar to the one for the perceptron... but now for multiple outputs i
- Idea of back-propagation: every hidden unit contributes some fraction to the error of the output node to which it connects
30. Learning in Multilayer NN
- ...contributes some fraction to the error of the output node to which it connects
- Thus errors are divided according to connection strength, i.e., the weights
- Update rule (see below)
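The update rules referred to above, in the usual AIMA-style notation (reconstructed, not copied from the slides; α is the learning rate, a_j the output of unit j, in_j its weighted input sum):

  \Delta_k = g'(in_k)\,(y_k - a_k)                     (error term at output unit k)
  W_{j,k} \leftarrow W_{j,k} + \alpha\, a_j\, \Delta_k
  \Delta_j = g'(in_j) \sum_k W_{j,k}\, \Delta_k         (error passed back to hidden unit j, split by connection strength)
  W_{i,j} \leftarrow W_{i,j} + \alpha\, a_i\, \Delta_j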
31. E.g. Learning...
- Training curve for 100 restaurant examples: exact fit
32. Learning NN Structures?
- How to find the best network structure?
- Too big results in lookup-table behavior / overtraining
- Too small results in undertraining / not exploiting the full expressiveness
- Possibility: try different structures and validate using, for example, cross-validation (see the sketch below)
- But which different structures to consider?
- Start with a fully connected network and remove nodes: optimal brain damage
- Grow larger networks from smaller ones, e.g., tiling and NEAT
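A minimal sketch of the "try structures and cross-validate" idea, assuming scikit-learn and a toy two-dimensional data set (names and numbers are illustrative, not from the lecture):

  from sklearn.datasets import make_moons
  from sklearn.model_selection import cross_val_score
  from sklearn.neural_network import MLPClassifier

  X, y = make_moons(n_samples=300, noise=0.25, random_state=0)  # toy data set

  # Candidate structures: a single hidden layer of increasing size.
  for hidden in (1, 2, 4, 8, 16):
      net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=5000, random_state=0)
      scores = cross_val_score(net, X, y, cv=5)  # 5-fold cross-validated accuracy
      print(hidden, round(scores.mean(), 3))

Too small a hidden layer underfits; beyond some size the cross-validated score stops improving, which is the signal to stop growing the network.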
33. Learning NN Structures?
34. Finally, Some Remarks
- An NN is a possibly complex nonlinear function with many parameters that have to be tuned
- Problems: slow convergence, local minima
- Back-propagation was explained, but other optimization schemes are possible
- A perceptron can handle linearly separable functions
- A multilayer NN can represent any kind of function
35. And More...
- It is hard to come up with the optimal network
- The learning rate, initial weights, etc. have to be set
- The activation function, etc., has to be chosen
- Not much magic there... nor black art...
- No sorcery, just nimbleness! ("Keine Hexerei, nur Behändigkeit!")
36. And with that Disappointing Message...