Title: Training Neural Networks
1Training Neural Networks
- Robert Turetsky
- Columbia University rjt72_at_columbia.edu
- Systems, Man and Cybernetics Society
- IEEE North Jersey Chapter
- December 12, 2000
2Objective
- Introduce fundamental concepts in Artificial Neural Networks
- Discuss methods of training ANNs
- Explore some uses of ANNs
- Assess the accuracy of artificial neurons as models for biological neurons
- Discuss current views, ideas and research
3Organization
- Why Neural Networks?
- Single TLUs
- Training Neural Nets: Backpropagation
- Working with Neural Networks
- Modeling the neuron
- The multi-agent architecture
- Directions and destinations
4Why Neural Networks?
5The Von Neumann architecture
- Memory for programs and data
- CPU for math and logic
- Control unit to steer program flow
6Von Neumann vs. ANNs
Von Neumann
- Follows rules
- Solution can/must be formally specified
- Cannot generalize
- Not error tolerant
Neural Net
- Learns from data
- Rules on data are not visible
- Able to generalize
- Copes well with noise
7Circuits that LEARN
- Three types of learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Hebbian networks reward good paths, punish bad paths
- Train a neural net by adjusting weights
- PAC (Probably Approximately Correct) theory (Kearns & Vazirani 1994, Haussler 1990)
8Supervised Learning Concepts
- Training set Ξ: input/output pairs
- Supervised learning because we know the correct action for every input in Ξ
- We want our Neural Net to act correctly on as many training vectors as possible
- Choose the training set to be a typical set of inputs
- The Neural Net will (hopefully) generalize to all inputs based on the training set
- Validation set: check to see how well our training can generalize (see the sketch below)
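A minimal sketch of the training-set / validation-set idea above. The toy data, the 80/20 split ratio, and the helper name split_dataset are illustrative assumptions, not part of the original talk.

```python
import random

def split_dataset(pairs, train_fraction=0.8, seed=0):
    """Split input/output pairs into a training set and a validation set.

    The training set is used to adjust the weights; the validation set is
    held out to check how well the training generalizes.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Illustrative input/output pairs (x, d); any typical set of inputs would do.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)] * 10
training_set, validation_set = split_dataset(data)
print(len(training_set), "training pairs,", len(validation_set), "validation pairs")
```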
9Neural Net Applications
- Miros Corp.: face recognition
- Handwriting recognition
- BrainMaker: medical diagnosis
- Bushnell: neural net for combinational automatic test pattern generation
- ALVINN: Knight Rider in real life!
- Getting rich: LBS Capital Management predicts the S&P 500
10History of Neural Networks
- 1943: McCulloch and Pitts - modeling the neuron for parallel distributed processing
- 1958: Rosenblatt - Perceptron
- 1969: Minsky and Papert publish limits on the ability of a perceptron to generalize
- 1970s and 1980s: ANN renaissance
- 1986: Rumelhart, Hinton & Williams present backpropagation
- 1989: Tsividis - neural network on a chip
11Threshold Logic Units
- The building blocks of Neural Networks
12The TLU at a glance
- TLU: Threshold Logic Unit (sketched below)
- Loosely based on the firing of biological neurons
- Many inputs, one binary output
- Threshold: biasing function
- Squashing function compresses an infinite input range into the range 0 - 1
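A minimal sketch of a TLU as just described: a weighted sum of many inputs compared against a threshold to produce one binary output. The weight and threshold values are arbitrary illustrative assumptions.

```python
def tlu(inputs, weights, threshold):
    """Threshold Logic Unit: many inputs, one binary output."""
    s = sum(x * w for x, w in zip(inputs, weights))  # weighted sum of the inputs
    return 1 if s >= threshold else 0                # fire only at or above the threshold

# Two inputs with equal weights and a threshold of 1.5 (assumed values).
print(tlu([1, 1], [1.0, 1.0], threshold=1.5))  # -> 1
print(tlu([1, 0], [1.0, 1.0], threshold=1.5))  # -> 0
```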
13The TLU in Action
14Training TLUs Notation
- θ: threshold of the TLU
- X: input vector
- W: weight vector
- s = X·W; if s ≥ θ, output 1; if s < θ, output 0
- d: desired output of the TLU
- f: actual output of the TLU with the current X and W
15Augmented Vectors
- Motivation: train the threshold θ at the same time as the input weights
- X·W ≥ θ is the same as X·W - θ ≥ 0
- Set the threshold of the TLU to 0
- Augment W: W = (w1, w2, ..., wn, -θ)
- Augment X: X = (x1, x2, ..., xn, 1)
- New TLU equation: X·W ≥ 0 (for augmented X and W; see the sketch below)
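A small sketch of the augmentation trick: fold the threshold θ into the weight vector as -θ and append a constant 1 to the input, so the firing test becomes X·W ≥ 0. The numeric values are assumptions for illustration.

```python
def augment_input(x):
    """Append a constant 1 to the input vector."""
    return list(x) + [1]

def augment_weights(w, theta):
    """Append -theta to the weight vector, so the threshold is trained like any weight."""
    return list(w) + [-theta]

def tlu_augmented(x_aug, w_aug):
    """TLU with augmented vectors: output 1 iff X.W >= 0."""
    return 1 if sum(xi * wi for xi, wi in zip(x_aug, w_aug)) >= 0 else 0

# X.W >= theta and the augmented test X.W >= 0 agree on the same input:
x, w, theta = [1, 0], [0.7, 0.4], 0.5
plain = 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0
augmented = tlu_augmented(augment_input(x), augment_weights(w, theta))
print(plain, augmented)  # both print 1
```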
16Gradient Descent Methods
- Error function: how far off are we?
- Example error function: ε = (d - f)²
- ε depends on the weight values
- Gradient descent: minimize the error by moving the weights along the decreasing slope of the error
- The idea: iterate through the training set and adjust the weights to minimize the gradient of the error (see the sketch below)
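A bare-bones sketch of the gradient descent idea: repeatedly nudge a weight against the slope of the error. The quadratic toy error and the learning rate are stand-ins, not the slide's own example.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=50):
    """Repeatedly move the weight against the gradient of the error."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)   # step down the slope of the error
    return w

# Toy error e(w) = (w - 3)^2, so the gradient is 2*(w - 3) and the minimum sits at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))  # converges near 3.0
```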
17Gradient Descent The Math
- We have ε = (d - f)²
- Gradient of ε: ∂ε/∂W
- Using the chain rule: ∂ε/∂W = (∂ε/∂f)(∂f/∂s)(∂s/∂W)
- Since s = X·W, we have ∂s/∂W = X
- Also ∂ε/∂f = -2(d - f)
- Which finally gives ∂ε/∂W = -2(d - f)(∂f/∂s) X
18Gradient Descent Back to reality
- So we have the weight update W ← W + c (d - f)(∂f/∂s) X
- The problem: the threshold output f is not differentiable, so ∂f/∂s does not exist
- Three solutions (each sketched below):
- Ignore it: the error-correction procedure
- Fudge it: Widrow-Hoff
- Approximate it: the generalized delta procedure
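A sketch contrasting the three update procedures named above for a single augmented unit, in the spirit of Nilsson 1998. The learning rate and starting values are assumptions; only the shape of each update rule matters here.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def error_correction_step(w, x, d, c=0.1):
    """'Ignore it': use the thresholded output f itself (error-correction procedure)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    f = 1 if s >= 0 else 0
    return [wi + c * (d - f) * xi for wi, xi in zip(w, x)]

def widrow_hoff_step(w, x, d, c=0.1):
    """'Fudge it': treat the unit as linear and use s in place of f (Widrow-Hoff)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + c * (d - s) * xi for wi, xi in zip(w, x)]

def generalized_delta_step(w, x, d, c=0.1):
    """'Approximate it': use a sigmoid, so df/ds = f(1 - f) (generalized delta)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    f = sigmoid(s)
    return [wi + c * (d - f) * f * (1 - f) * xi for wi, xi in zip(w, x)]

# One update on an augmented input (the trailing 1 pairs with the -theta weight).
w, x, d = [0.2, -0.3, -0.1], [0, 1, 1], 1
print(error_correction_step(w, x, d))
print(widrow_hoff_step(w, x, d))
print(generalized_delta_step(w, x, d))
```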
19Training a TLU Example
- Train a neural network to match the following linearly separable training set
20Behind the scenes Planes and Hyperplanes
21What can a TLU learn?
22Linearly Separable Functions
- A single TLU can implement any linearly separable function
- A AND B is linearly separable
- A XOR B is not (see the check below)
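A quick check of the claim above: a TLU with assumed weights (1, 1) and threshold 1.5 realizes A AND B, while a brute-force search over a small grid of weights and thresholds finds none that realizes A XOR B (a sketch, not a proof).

```python
from itertools import product

def tlu(a, b, w1, w2, theta):
    return 1 if w1 * a + w2 * b >= theta else 0

# A AND B is linearly separable: weights (1, 1) with threshold 1.5 reproduce it.
print([tlu(a, b, 1, 1, 1.5) for a, b in product([0, 1], repeat=2)])  # [0, 0, 0, 1]

# A XOR B is not: no single TLU on this coarse grid of weights/thresholds reproduces it.
xor_found = any(
    all(tlu(a, b, w1, w2, t) == (a ^ b) for a, b in product([0, 1], repeat=2))
    for w1 in range(-3, 4) for w2 in range(-3, 4) for t in range(-3, 4)
)
print(xor_found)  # False
```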
23NEURAL NETWORKS
- An Architecture for Learning
24Neural Network Fundamentals
- Chain multiple TLUs together
- Three layers
- Input Layer
- Hidden Layers
- Output Layer
- Two classifications
- Feed-Forward
- Recurrent
25Neural Network Terminology
26Training ANNs Backpropagation
- Main idea: distribute the error function across the hidden layers, corresponding to their effect on the output
- Works on feed-forward networks
- Use sigmoid units to train, and then we can replace them with threshold functions (see the sigmoid sketch below)
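A small sketch of the sigmoid unit mentioned above: differentiable while training (its derivative f(1 - f) is what backpropagation needs), and replaceable by a hard threshold afterwards.

```python
import math

def sigmoid(s):
    """Differentiable squashing function used while training."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """df/ds = f(1 - f): the factor that shows up in the backprop weight updates."""
    f = sigmoid(s)
    return f * (1.0 - f)

def threshold(s):
    """Hard threshold that can replace the sigmoid once training is finished."""
    return 1 if s >= 0 else 0

for s in (-2.0, 0.0, 2.0):
    print(s, round(sigmoid(s), 3), round(sigmoid_derivative(s), 3), threshold(s))
```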
27Back-Propagation Bird's-eye view
- Repeat:
- Choose a training pair and copy it to the input layer
- Cycle that pattern through the net
- Calculate the error derivative between the output activation and the target output
- Back-propagate the summed product of the weights and errors in the output layer to calculate the error on the hidden units
- Update the weights according to the error on that unit
- Until the error is low or the net settles
28Back-Prop Sharing the Blame
- We want to assign blame for the output error to each weight
- Wij: weights of the i-th sigmoid in the j-th layer
- X^(j-1): inputs to our TLU (the outputs from the previous layer)
- cij: learning rate constant of the i-th sigmoid in the j-th layer
- δij: sensitivity of the network output to changes in the input of our TLU
- Important equation: Wij ← Wij + cij δij X^(j-1)
29Back-Prop Calculating δij
- For the output layer, δij = δk
- δk = (d - f) ∂f/∂sk
- δk = (d - f) f (1 - f) for a sigmoid
- Therefore Wk ← Wk + ck (d - f) f (1 - f) X^(k-1)
- For the hidden layers:
- See Nilsson 1998 for the calculation
- Recursive formula, with base case δk = (d - f) f (1 - f)
30Back-Prop Example
- Train a 2-layer Neural net with the following input (a worked sketch follows below):
- x1 = 1, x2 = 0, x3 = 1, d = 0
- x1 = 0, x2 = 0, x3 = 1, d = 1
- x1 = 0, x2 = 1, x3 = 1, d = 0
- x1 = 1, x2 = 1, x3 = 1, d = 1
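A worked sketch tying slides 27-30 together: a 2-layer net (a hidden layer of sigmoids plus one sigmoid output) trained with the backpropagation updates from slides 28-29 on the four patterns above. The hidden-layer size, learning rate, number of epochs, and initial weights are assumptions made for illustration.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Training set from the slide: the third input is the constant 1 (augmented input).
TRAINING_SET = [
    ([1, 0, 1], 0),
    ([0, 0, 1], 1),
    ([0, 1, 1], 0),
    ([1, 1, 1], 1),
]

def train(n_hidden=4, c=1.0, epochs=10000, seed=1):
    rng = random.Random(seed)
    # Small random initial weights (an assumption; the slides do not specify them).
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]  # +1 for a bias unit

    for _ in range(epochs):
        for x, d in TRAINING_SET:
            # Forward pass: cycle the pattern through the net.
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
            h_aug = h + [1.0]                          # augmented hidden-layer output
            f = sigmoid(sum(w * hi for w, hi in zip(w_out, h_aug)))

            # Output-layer sensitivity: delta = (d - f) f (1 - f).
            delta_out = (d - f) * f * (1 - f)
            # Hidden-layer sensitivities, backpropagated through the output weights.
            delta_hidden = [delta_out * w_out[i] * h[i] * (1 - h[i]) for i in range(n_hidden)]

            # Weight updates: W <- W + c * delta * (inputs feeding that unit).
            for i in range(n_hidden + 1):
                w_out[i] += c * delta_out * h_aug[i]
            for i in range(n_hidden):
                for j in range(3):
                    w_hidden[i][j] += c * delta_hidden[i] * x[j]
    return w_hidden, w_out

w_hidden, w_out = train()
for x, d in TRAINING_SET:
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden] + [1.0]
    f = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    # With these assumed settings the outputs typically end up close to the targets.
    print(x, "target", d, "output", round(f, 2))
```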
31Back-Prop Problems
- The learning rate is non-optimal
- One solution: learn the learning rate
- Network paralysis: weights grow so large that f(1 - f) → 0, and the net never learns
- Local extrema: gradient descent is a greedy method
- These problems are acceptable in many cases, even if workarounds can't be found
32Back-Prop Momentum
- We want to choose a learning rate that is as large as possible
- Speed up convergence
- Avoid oscillations
- Add a momentum term dependent on the past weight change (see the sketch below)
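A small sketch of the momentum idea: each weight change blends the current gradient step with a fraction of the previous change. The momentum coefficient of 0.9 is a common choice assumed here, not a value from the slides.

```python
def momentum_step(w, gradient, prev_delta, learning_rate=0.1, momentum=0.9):
    """Return the new weight and the weight change, to be remembered for next time."""
    delta = -learning_rate * gradient + momentum * prev_delta
    return w + delta, delta

# Toy use on e(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, prev_delta = 0.0, 0.0
for _ in range(200):
    w, prev_delta = momentum_step(w, 2 * (w - 3), prev_delta)
print(round(w, 2))  # settles close to 3; the momentum term damps oscillation
```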
33Another Method ALOPEX
- Used for visual receptive field mapping by Tzanakou and Harth, 1973
- Originally developed for receptive field mapping in the visual pathway of frogs
- The main ideas (sketched below):
- Use cross-correlation to determine a direction of movement in the gradient field
- Add a random element to avoid local extrema
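A rough sketch of the ALOPEX idea as summarized above, not a faithful reproduction of Tzanakou and Harth's procedure: correlate the previous weight change with the previous change in error to pick each step's direction, and keep a random element to escape local extrema. The step size, flip probability, and toy error surface are assumptions.

```python
import random

def alopex_minimize(error_fn, w0, step=0.01, p_random=0.1, iters=3000, seed=0):
    """Crude ALOPEX-style minimization of error_fn over a list of weights."""
    rng = random.Random(seed)
    w = list(w0)
    prev_dw = [rng.choice([-step, step]) for _ in w]   # the first move is random
    prev_err = error_fn(w)
    w = [wi + dwi for wi, dwi in zip(w, prev_dw)]
    for _ in range(iters):
        err = error_fn(w)
        d_err = err - prev_err
        dw = []
        for dwi in prev_dw:
            # Cross-correlation of the last move with the last error change:
            # keep a move that lowered the error, reverse one that raised it.
            direction = 1.0 if dwi * d_err < 0 else -1.0
            # Random element: occasionally ignore the correlation to escape local extrema.
            if rng.random() < p_random:
                direction = rng.choice([-1.0, 1.0])
            dw.append(direction * abs(dwi))
        w = [wi + dwi for wi, dwi in zip(w, dw)]
        prev_dw, prev_err = dw, err
    return w

# Toy error surface with its minimum at (1, -2); purely illustrative.
error = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
w_final = alopex_minimize(error, [0.0, 0.0])
print(round(error([0.0, 0.0]), 2), "->", round(error(w_final), 2))  # the error drops sharply
```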
34WORKING WITH NEURAL NETS
35ANN Project Lifecycle
- Task identification and design
- Feasibility
- Data Coding
- Network Design
- Data Collection
- Data Checking
- Training and Testing
- Error Analysis
- Network Analysis
- System Implementation
36ANN Design Tradeoffs
- A good design will find a balance between these two extremes!
37ANN Design Balance Depth
- Too few hidden layers will cause errors in accuracy
- Too many hidden layers will cause errors in generalization!
38CLICK!
39Wetware Biological Neurons
40The Process Neuron Firing
- Each electrical signal received at a synapse causes neurotransmitter release
- The neurotransmitter travels across the synaptic cleft and is received by the other neuron at a receptor site
- The Post-Synaptic Potential (PSP) either increases (hyperpolarizes) or decreases (depolarizes) the polarization of the post-synaptic membrane (the receptors)
- In hyperpolarization, the spike train is inhibited. In depolarization, the spike train is excited.
41The Process Part 2
- Each PSP travels along the dendrite of the new neuron and spreads itself over the cell body
- When the effects of the PSP reach the axon hillock, they are summed with other PSPs
- If the sum is greater than a certain threshold, the neuron fires a spike along the axon
- Once the spike reaches the synapse of an efferent neuron, the process starts in that neuron
42The neuron to the TLU
- Cell body (soma): accumulator plus its threshold function
- Dendrites: inputs to the TLU
- Axon: output of the TLU
- Information Encoding
- Neurons use frequency
- TLUs use value
43Modeling the Neuron Capabilities
- Humans and Neural Nets are both
- Good at pattern recognition
- Bad at mathematical calculation
- Good at compressing lots of information into a yes/no decision
- Taught via a training period
- TLUs win because neurons are slow
- Wetware wins because we have a cheap source of billions of neurons
44Do ANNs model neuron structures?
- No: hundreds of types of specialized neurons, only one type of TLU
- No: the weights and the neural threshold are controlled by many neurotransmitters, not just one
- Yes: most of the complexity in the neuron is devoted to sustaining life, not information processing
- Maybe: there is no real mechanism for backpropagation in the brain. Instead, the firing of neurons increases connection strength
45High Level Agent Architecture
- Our minds are composed of a series of non-intelligent agents
- The hierarchy, interconnections, and interactions between the agents create our intelligence
- There is no one agent in control
- We learn by forming new connections between agents
- We improve by dealing with agents at a higher level, i.e. creating mental scripts
46Agent Hierarchy Playing with Blocks
From the outside, Builder knows how to build towers. From inside, Builder just turns on other agents.
47How We Remember K-Line Theory
48New Knowledge Connections
- Sandcastles in the sky: everything we know is connected to everything else we know
- Knowledge is acquired by making new connections between things we already know
49Learning Meaning
- Uniframing: combining several descriptions into one
- Accumulating: collecting incompatible descriptions
- Reformulating: modifying a description's character
- Transforming: bridging between structures and functions or actions
50The Exception Principle
- It rarely pays to tamper with a rule that nearly always works. It is better to complement it with an accumulation of exceptions.
- Birds can fly
- Birds can fly unless they are penguins and ostriches
51The Exception Principle Overfitting
- Birds can fly, unless they are penguins and ostriches, or if they happen to be dead, or have broken wings, or are confined to cages, or have their feet stuck in cement, or have undergone experiences so dreadful as to render them psychologically incapable of flight
- In real thought, finding exceptions to everything is usually unnecessary.
52Minsky's Principles
- Most new knowledge is simply finding a new way to relate things we already know
- There is nothing wrong with circular logic or having imperfect rules
- Any idea will seem self-evident... once you've forgotten learning it.
- Easy things are hard: we're least aware of what our minds do best
53TO THE FUTURE AND BEYOND
- Why you should be nice to your computer
54I'm lonely and I'm bored. Come play with me!
55Computers are Dumb
- Deep Blue might be able to win at chess, but it won't know to come in from the rain.
- Computers can only know what they're told, or what they're told to learn
- Computers lack a sense of mortality and a physical self to preserve
- All of this will change when computers can reach consciousness
56I, Silicon Consciousness
- Kurzweil: by 2019, a $1000 computer will be equivalent to the human brain.
- By 2029, machines will claim to be conscious. We will believe them.
- By 2049, nanobot swarms will make virtual reality obsolete in real reality.
- By 2099, man and machine will have completely merged.
57You mean to tell me?????
- We humans will gradually introduce machines into our bodies, as implants
- Our machines will grow more human as they learn, and learn to design themselves
- The Neo-Luddite scenarios:
- AI succeeds in creating conscious beings. All life is at the mercy of the machines.
- Humans retain control, but workers are obsolete. The power to decide the fate of the masses is now completely in the hands of the elite.
58Neural Networks Conclusions
- Neural Networks are a powerful tool for
- Pattern recognition
- Generalizing to a problem
- Machine learning
- Training Neural Networks
- Can be done, but exercise great care
- Still has room for improvement
- Understanding and creating consciousness?
- Still working on it :)