Title: A brief history of connectionism and information processing
1. A brief history of connectionism and information processing
2. The Role of the Brain
- Neural inspirations
- Neurons are the basic computational tools of the brain: simple and dumb processors
- Basic structure
- Dendrite (carries information in)
- Cell body (integrates the information)
- Axon (carries information out)
- Synapse
- The near-contact area between an axon and a dendrite
3. Basic operation
- Intercell communication via the synapse
- Can be excitatory (making a receiving neuron more likely to fire) or inhibitory (making it less likely to fire)
- Typically communicate via neurotransmitters
- Released on the axon side; trigger electrical changes on the dendrite side
- Neurologists believed that the basic unit of information is the rate of firing of a neuron
- This is usually discussed in terms of a neuron's activation level
4. Representing info in our wetware
- Method 1: Assume that each neuron is a "grandmother cell"
- This is a local representation
- The pattern of activation tells you what is currently being thought of
- Note that we haven't dealt with how those thoughts connect up
- e.g. grandma, blue, dress, glasses, apple pie
5. Representing info in our wetware, cont.
- Method 2: Patterns of activation
- Assume that your grandmother is instead represented across a number of cells
- e.g. the pattern 110011010010 across 12 neurons represents grandma
- This is a distributed representation (see the toy example below)
- Patterns of connectivity
- May be the method by which associations are encoded
- When one pattern is active, it may trigger a different pattern
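A toy illustration of the two coding schemes (the code and the choice of unit are my own; the 12-bit pattern is the one from the slide):

    import numpy as np

    # Local code: one dedicated "grandmother cell" per concept.
    local = np.zeros(12, dtype=int)
    local[3] = 1  # unit 3 alone stands for "grandma" (arbitrary choice)

    # Distributed code: the slide's 12-neuron pattern for "grandma".
    distributed = np.array([int(b) for b in "110011010010"])

    # In the distributed scheme no single unit means "grandma";
    # only the whole pattern of activation does.
    print(local)        # [0 0 0 1 0 0 0 0 0 0 0 0]
    print(distributed)  # [1 1 0 0 1 1 0 1 0 0 1 0]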
6. McCulloch & Pitts (1943)
- They explored the formal properties of neuron-like devices
- What logical operations could neurons compute?
- Five assumptions based on then-current knowledge of neurons:
- 1. The activity of a neuron is all-or-none (binary coding).
- 2. Each neuron has a fixed threshold on the required number of synapses that must be excited before the neuron itself will be excited. Weights are identical.
- 3. Synaptic action causes a time delay before firing.
- 4. Inhibition is absolute.
- 5. The physical structure of a network of neurons doesn't change with time: connections and their strengths are static.
7. McCulloch/Pitts neurons
- McCulloch/Pitts neurons can then be used to compute any (finite) logical function, as in the sketch below
- BUT, McCulloch/Pitts networks can't learn
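A minimal sketch of such a unit under assumptions 2 and 4 above (identical weights, absolute inhibition); the function names and threshold settings are my own illustrations, not notation from the lecture:

    # A McCulloch-Pitts unit: binary inputs and output, identical
    # excitatory weights, and any active inhibitory input vetoes firing.
    def mp_neuron(excitatory, inhibitory, threshold):
        """Fire (1) iff no inhibitory input is active and the number of
        active excitatory inputs reaches the fixed threshold."""
        if any(inhibitory):
            return 0  # inhibition is absolute
        return 1 if sum(excitatory) >= threshold else 0

    # Logical operations fall out of threshold choices:
    AND = lambda x, y: mp_neuron([x, y], [], threshold=2)
    OR  = lambda x, y: mp_neuron([x, y], [], threshold=1)
    NOT = lambda x:    mp_neuron([],     [x], threshold=0)  # fires unless inhibited

    assert [AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
    assert [OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
    assert [NOT(x) for x in (0, 1)] == [1, 0]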
8. Hebb (1949)
- Aimed to set out the psychological implications of particular neural models; he was also very interested in developing a physiological theory of learning.
9. Learning in a Hebbian network
- "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
10. Hebbian learning, more formally
- Δw_ij = η · a_i · a_j
- where the a's are activation values (−1 or +1) and η is a learning rate parameter
- The equation is applied until weights saturate (typically at 1) and do not keep increasing as inputs are presented
- Think of Hebbian learning as picking up on correlations between features in the environment (see the sketch below)
- Features that co-occur will have strong positive weights, features that never occur together will have strong negative weights, and random pairings produce zero weights
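A minimal sketch of this update, assuming ±1 activation coding and an arbitrary learning rate η = 0.1 (my choices); it reproduces the three correlation cases in the last bullet:

    import numpy as np

    def hebbian_update(w, a_pre, a_post, eta=0.1, cap=1.0):
        """One application of Δw = η · a_pre · a_post, clipped so the
        weight saturates at ±cap instead of growing without bound."""
        return float(np.clip(w + eta * a_pre * a_post, -cap, cap))

    rng = np.random.default_rng(0)
    w_corr = w_anti = w_rand = 0.0
    for _ in range(200):
        a = rng.choice([-1, 1])
        w_corr = hebbian_update(w_corr, a, a)                    # always co-occur -> saturates at +1
        w_anti = hebbian_update(w_anti, a, -a)                   # never co-occur  -> saturates at -1
        w_rand = hebbian_update(w_rand, a, rng.choice([-1, 1]))  # random pairing  -> hovers near 0

    print(w_corr, w_anti, w_rand)  # ~1.0, ~-1.0, near 0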
11. Perceptron (Rosenblatt, 1958, 1962)
- Rosenblatt explored the properties of networks of McCulloch-Pitts neurons (linear-threshold units) with connections that could be modified by learning
12. Perceptron
- Most commonly discussed architecture
- Only the connections between the feature units and the output unit were modifiable (the w_i's). The input feature unit values (x_i) were set by hand.
- [Figure: feature units x_i feeding a single output unit through modifiable weights w_0 … w_n]
13. Multiple Perceptrons
14. How were the connections learned?
- Start with random connections
- Present an input pattern
- Propagate activation through the network to the output
- If the output is correct, then don't change anything
- If incorrect, then change weights only on connections between active feature units and the output units
15. Change weights how? How much?
- Rule:
- If the output unit is on when it should be off, then decrease the weights from those active feature units by some constant amount
- If the output unit is off when it should be on, then increase the weights from those active feature units by some constant amount
- The perceptron was a very powerful method for learning various relationships (see the sketch below)
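A minimal sketch of the procedure on slides 14-15, assuming 0/1 feature values, a constant step of 0.1, and logical OR as an illustrative (linearly separable) target; these specifics are my own, not from the lecture:

    import numpy as np

    def train_perceptron(X, targets, epochs=25, step=0.1):
        """Perceptron rule as on the slides: leave weights alone when the
        output is correct, otherwise raise/lower the weights on the active
        feature units (x_i = 1) by a constant amount."""
        X = np.hstack([np.ones((len(X), 1)), X])  # prepend always-on bias input for w_0
        w = np.zeros(X.shape[1])                  # (slides say random; zeros also work here)
        for _ in range(epochs):
            for x, t in zip(X, targets):
                out = 1 if w @ x > 0 else 0
                if out > t:       # on when it should be off: decrease active weights
                    w -= step * x
                elif out < t:     # off when it should be on: increase active weights
                    w += step * x
        return w

    # OR is linearly separable, so the rule converges:
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    w = train_perceptron(X, targets=np.array([0, 1, 1, 1]))
    preds = [(1 if w @ np.r_[1, x] > 0 else 0) for x in X]
    print(preds)  # [0, 1, 1, 1]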
16. Minsky & Papert (1969)
- Presented a formal analysis of the properties of perceptrons and revealed several fundamental limitations
- Limitations:
- Can't learn nonlinearly separable problems like XOR
- More…
17. Linearly separable
18. Nonlinearly separable
19. Minsky & Papert, cont.
- Limitations:
- So… can't learn nonlinearly separable problems like XOR
- Although including hidden layers allows one to hand-design a network that can represent XOR and related problems (see the sketch below), they showed that the perceptron learning rule can't learn the required weights
- They also showed that even those functions that can be learned by perceptron-rule learning may require huge amounts of learning time
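A sketch of the kind of hand-designed hidden-layer solution mentioned above; the particular units and thresholds are a standard textbook choice, not from the lecture. Each unit is a McCulloch-Pitts style linear-threshold neuron:

    def step(z):
        return 1 if z > 0 else 0

    def xor(x1, x2):
        h_or  = step(x1 + x2 - 0.5)       # hidden unit computing OR(x1, x2)
        h_and = step(x1 + x2 - 1.5)       # hidden unit computing AND(x1, x2)
        return step(h_or - h_and - 0.5)   # output: OR but not AND = XOR

    assert [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)] == [0, 1, 1, 0]

Note that the hidden-unit weights here are exactly what the perceptron rule cannot find, since it only adjusts the connections into the output unit.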
20. Fallout of Minsky & Papert's analysis
- This paper was nearly the death of this budding field
- Subsequent research was largely done "in garages", i.e., only in obscure academic circles
21. Connectionist (subsymbolic) vs. symbolic processing
- Newell (1980) articulated the role of the mathematical theory of symbolic processing
- Cognition involves the manipulation of symbols analogous to words, concepts, schemas, etc.
- What are symbols?
- The definition is hard to pin down
- Roughly, a symbol is like the value of a categorical variable (male, female, red, blue, dog, cat)
- Operators on those symbols would then be things like is-a, a-kind-of, purpose, shape, part-of, object
22. McClelland & Rumelhart's alternative: subsymbolic processing
- Cognition involves the spreading of activation, relaxation, and statistical correlation
- Represents a method for how symbolic systems might be implemented
- Hypothesized that apparently symbolic processing is an emergent property of subsymbolic operations
- The subsymbolic elements of computation are numbers
- Philosophers of mind continue to debate the distinction between symbolic and subsymbolic, and which is fundamentally correct
23. Should we toss out symbolic approaches?
- No: they do offer a different level of analysis and can be very helpful, especially when your interest is in high-level cognition
- Example: do you want to build a connectionist model of chess playing? Very complex.
- But how would you build a symbolic model of vision?
24. Terms you may encounter
- Distributed vs. local representations
- Symbolic: typically local
- Connectionist: typically distributed
- Parallel vs. serial processing
- Symbolic: typically serial
- Connectionist: typically parallel
25. Why use connectionist models?
- Strong generalization
- Fault tolerance
- Can be used to model learning
- More naturally capture nonlinear relationships
- Fuzzy information retrieval
- The gap between neural processing and connectionist models is smaller (but still large)
26. Next week
- Refresher on linear techniques for associating input(s) with an output: x → y
- Simple regression (single predictor)
- Multiple regression (multiple predictors)