Title: Connectionism in 2 hours
Connectionism in 2 hours
- Christer Johansson
- Computational Linguistics
- Bergen University
 
Why should Linguists be interested in Connectionism?
- Alternative to good old AI (rules)
- Learning: knowledge is acquired
- Biological plausibility (?)
- Practical applications: handles uncertain data, only needs representative exemplars.
The main point of Connectionism
- There is no central processor
 
Processing is interaction
- Each neuron is a simple processor that receives information from other neurons and sends activation or deactivation to other neurons, depending on how much it was itself activated (a minimal sketch of such a unit follows below).
- Processing is an emergent phenomenon arising from the activity of large quantities of such simple processors.
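
A minimal sketch, not from the talk, of the standard artificial abstraction of such a unit: a weighted sum of inputs passed through an activation function. The inputs, weights and bias are made-up illustrative values.

    import numpy as np

    def neuron(inputs, weights, bias):
        """A single artificial neuron: weighted sum of inputs,
        squashed by a sigmoid activation into the range (0, 1)."""
        net = np.dot(inputs, weights) + bias
        return 1.0 / (1.0 + np.exp(-net))

    # Illustrative values: two excitatory inputs, one inhibitory.
    x = np.array([0.9, 0.3, 0.7])
    w = np.array([0.5, 0.8, -0.6])     # negative weight = inhibitory synapse
    print(neuron(x, w, bias=-0.1))     # activation sent on to other neurons
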
Hebbian Learning (1949)
- "Neurons that fire together wire together." (A sketch of the corresponding weight-update rule follows below.)
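
A minimal sketch (my own illustration, not from the talk) of the simplest form of Hebb's rule: the weight between two units grows in proportion to the product of their activations, delta_w = lr * x * y.

    import numpy as np

    def hebbian_update(w, x, y, lr=0.1):
        """Hebb's rule: strengthen the connection between units
        whose activations are correlated (delta_w = lr * y * x)."""
        return w + lr * np.outer(y, x)

    # Illustrative values: pre-synaptic activations x, post-synaptic y.
    x = np.array([1.0, 0.0, 1.0])
    y = np.array([1.0, 0.0])
    w = np.zeros((2, 3))
    for _ in range(5):                  # repeated co-activation...
        w = hebbian_update(w, x, y)
    print(w)                            # ...wires the co-active pairs together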
 
Processing is subsymbolic
- The information neurons work with has little or no content in itself.
- This allows for easy interaction between different modalities (the McGurk effect).
Biological inspirations
- Some numbers
  - The human brain contains about 10 billion nerve cells (neurons)
  - Each neuron is connected to other neurons through about 10,000 synapses (on average; some have more, some far fewer)
- Properties of the brain
  - It can learn and reorganize itself from experience
  - It adapts to the environment
  - It is robust and fault tolerant
 
Connectionism vs. AI
Connectionist Successes
- Graceful degradation
  - Connectionist models can be damaged and still keep (some) functionality, as the sketch below illustrates.
- Models of reaction-time studies.
- The learning path emerges from the complexity of the data and the learning law; U-shaped learning is common.
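
A rough sketch of this property (my own illustration, not a model from the talk; it assumes NumPy and scikit-learn are available): train a small network, then "lesion" it by zeroing random weights, and watch performance fall off gradually rather than all at once.

    import copy
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                        random_state=0).fit(X_tr, y_tr)

    intact = copy.deepcopy(net.coefs_)           # keep the undamaged weights
    rng = np.random.default_rng(0)
    for frac in (0.0, 0.1, 0.3, 0.5):
        for W, W0 in zip(net.coefs_, intact):
            W[:] = W0                            # restore, then lesion anew
            W[rng.random(W.shape) < frac] = 0.0  # zero a fraction of weights
        print(f"{frac:.0%} of weights zeroed: "
              f"accuracy {net.score(X_te, y_te):.2f}")
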
Not so good (yet)
- Systematicity, information structure and encapsulation of information.
  - If you understand "the boy ate the fish" you also understand "the fish ate the boy".
- Typically a neural net would allow global information to affect the interpretation (thus disregarding structural information).
  - "The tomato ate the boy."
  - "I love apples."
 
Not so good (yet)
- Fast mapping
  - A child may observe the use of a word once or twice, and still be able to use the word correctly several weeks later.
  - Sound is mapped to meaning fast.
  - The mapping degrades slowly.
- Neural networks do not typically show these behaviors.
- Radial basis networks? Instance-based learning.
 
Philosophical issues
- Is connectionism a better model of intelligence (mind) than symbolic AI?
- What do we mean by better? Researchers do not agree on what intelligence is.
- Are none of the models correct?
- AI models handle symbolic information better, but have little to say about how symbolic behavior emerges.
- Connectionist models are better at pattern recognition (noisy input, missing values, redundancy).
Other arguments for connectionism
- Many relevant phenomena have been modeled. The modeling activity in itself leads to new knowledge, and to some insights into possible mechanisms.
- Models of aphasia, dyslexia, etc., many with detailed predictions, and even implications for remedies.
- Interaction between information sources is taken seriously.
Connectionism in Linguistics
- Case in point: rule-based behavior in the past tense, and U-shaped learning.
Chomsky & Pinker
- Learning language is done by acquiring rules (setting the parameters in a fixed format), which are processed by a specific, fixed mechanism and expressed in an internal language (compare machine language).
- Chomsky's main interest: description of language complexity.
- Pinker tries to push the point that language depends on innate machinery; we only acquire the data the machinery processes and set the parameters of the processor.
- (Neural networks have an innate mechanism for how to learn from input.)
U-shaped Learning
- Children often use correct past-tense forms before over-generalizing the rule, and later recover to correct usage.
- Neural networks have a tendency toward similar behavior (see the sketch below).
- At first the free capacity in the net allows memorization.
- When the regularity is discovered it is applied generally, and it interacts with previous knowledge, which then gets an error signal that makes recovery possible.
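
A toy sketch of this dynamic (my own construction, not a model from the talk; the features and sizes are made up): a small softmax classifier maps verbs, represented by shared random "phonological" features, to an inflection class. Because regulars dominate the data, runs often pass through a phase where irregulars are pulled into the regular "+ed" class before being (partly) recovered.

    import numpy as np

    rng = np.random.default_rng(1)
    n_reg, n_irr = 40, 8                       # regulars dominate the data
    n_feat = 20
    # Made-up "phonological" features, shared across verbs, so that
    # knowledge of one verb interferes with knowledge of another.
    X = rng.integers(0, 2, size=(n_reg + n_irr, n_feat)).astype(float)
    # Class 0 = regular "+ed"; classes 1..n_irr = idiosyncratic past forms.
    y = np.concatenate([np.zeros(n_reg, dtype=int), np.arange(1, n_irr + 1)])
    T = np.eye(n_irr + 1)[y]                   # one-hot targets
    W = rng.normal(0.0, 0.1, size=(n_feat, n_irr + 1))

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    for epoch in range(1, 501):
        P = softmax(X @ W)
        W -= 0.5 * X.T @ (P - T) / len(y)      # cross-entropy gradient step
        if epoch % 100 == 0:
            pred = P.argmax(axis=1)
            overreg = (pred[n_reg:] == 0).mean()   # irregulars forced into "+ed"
            correct = (pred[n_reg:] == y[n_reg:]).mean()
            print(f"epoch {epoch}: irregulars correct {correct:.2f}, "
                  f"over-regularized {overreg:.2f}")
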
U-shaped learning
- On the downside for connectionism:
  - The input needs to contain a clear signal.
  - This means some preprocessing, and in effect building the solution into the representation.
- Still, U-shaped learning seems a fairly robust characteristic of many different neural models.
U-shaped learning
- Pinker has proposed a so-called dual-route model. He argues that
  - regular forms are handled by symbolic processing;
  - irregular forms are handled by association, à la connectionism.
- But this makes little sense: if we are allowed to separate the regular from the irregular, then a neural network can easily learn the regular alternations (as well as the irregulars, though it cannot learn gaps).
Fodor & Modularity
- Fodor, among others, argues that compositionality and systematicity can only be achieved by symbolic machinery.
- /S/ /P/ /I/ /L/ -> spill + ed (past) -> spilled.
- The kangaroo jumped over the elephant. ->
- The elephant jumped over the kangaroo.
 
Not truth-based
- "Adam loves Eve."
- does NOT imply
- "Eve loves Adam."
- But still, if the first is understood, the second should also be understood. (The role of syntax.)
Fodor & Modularity
- Information needs to be encapsulated.
  - Different levels should not interact (each level is encapsulated).
- Leaky modules
- Modules in the brain: are different anatomical areas specialized for
  - different general tasks?
  - functionally specific tasks (say, syntactic processing)?
Outline
- The rest of the talk falls into two categories:
  - Biology: neurons, the brain and language areas
  - Technology: practical applications of neural networks
Looking at the Brain
- What about the argument for specialized modules for language?
- Broca's area.
 
Biological neuron
- A neuron has
  - input (via dendrites)
  - output (via the axon)
- Information is mediated from the dendrites to the axon via the cell body.
- The axon connects to dendrites (of other neurons) via synapses.
- Synapses vary in strength.
- Synapses may be excitatory or inhibitory.
 
Neurons
[Figure: micrographs of neurons, showing cell machinery and surface structure]
Schematic Neuron
[Figure: schematic neuron with inputs, weights, a summation function, and an output]
Example Neurons
Neurons come in a variety of flavours. 
Neuronal organisation
- Neurons are organised into hierarchical layers. Within each layer we often have inhibitory connections (see the sketch below).
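
A minimal sketch (my own illustration, not from the talk) of what such within-layer inhibition can do: each unit subtracts a fraction of the other units' activity, which sharpens a nearly uniform input pattern into a winner-take-most response.

    import numpy as np

    def lateral_inhibition(a, strength=0.2, steps=20):
        """Each unit inhibits every other unit in its layer;
        iterating sharpens the initial activity pattern."""
        for _ in range(steps):
            inhibition = strength * (a.sum() - a)   # input from all other units
            a = np.clip(a - inhibition, 0.0, None)  # activations stay non-negative
        return a

    a = np.array([0.50, 0.55, 0.40, 0.52])   # nearly uniform input
    print(lateral_inhibition(a))              # the strongest unit dominates
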
The Brain
Outline
Neuroimaging Confirmation? YES
- reading complex sentences vs. letter strings
 
Points
- confirmation of left hemisphere dominance 
 - confirmation of classical language areas 
 - modification 
 - involvement of additional areas
 
Broca's area
- Broca's area is involved in the comprehension of complex sentences
Simple Sentences vs. Passive Fixation
The role of Broca's area?
- That Broca's area is involved does not mean that syntactic processing is located in the left inferior frontal lobe:
  - simple sentences do not reliably activate this area
  - other tasks with similar cognitive components also activate this area
Wijers et al., WM task
[Figures: three slides of results from the working-memory task]
Conclusions
- Language areas are not specific to language.
- Language may depend on interaction.
- Modules? (The brain is functionally structured.)
- Neurons? (All neurons may contribute.)
 
Technical Applications
Properties of Neural Networks
- Supervised networks are universal approximators.
- Theorem: any continuous function on a bounded domain can be approximated to arbitrary precision by a neural network with a finite number of hidden neurons.
- This could be useful :-)
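
A quick illustration of the theorem in action (my own sketch, not from the talk; it assumes NumPy and scikit-learn are available): fit a one-hidden-layer network to a stretch of sin(x) and check the worst-case error.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
    y = np.sin(X).ravel()

    # One hidden layer of tanh units, per the universal-approximation setup.
    net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                       solver="lbfgs", max_iter=5000, random_state=0)
    net.fit(X, y)

    err = np.max(np.abs(net.predict(X) - y))
    print(f"max approximation error on the interval: {err:.4f}")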
 
Other properties
- Adaptivity
  - Adapt the weights to the environment (examples)
  - Easily retrainable
- Generalization ability
  - May counteract lack of data
- Fault tolerance
  - Graceful degradation of performance if damaged.
  - Damage might also be faulty input (noise, missing values, etc.)
  - The information is distributed within the entire net.
Classification (Discrimination)
- Estimation of the probability that a given object belongs to a specific class (see the sketch below).
- Can be used for data mining.
- Applications: economics, speech and visual pattern recognition, sociology, etc.
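
A brief sketch of this use (mine, not from the talk; it assumes scikit-learn): the trained network returns an estimated probability for each class rather than a bare label.

    from sklearn.datasets import load_iris
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0).fit(X, y)

    # Estimated class-membership probabilities for one new sample.
    probs = net.predict_proba([[5.8, 2.8, 4.5, 1.3]])
    print(probs.round(3))    # one probability per iris species
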
Example
Examples of handwritten postal codes drawn from a database available from the US Postal Service.
What do we need to use NNs?
- Determine what the input should be (what information is available, what info do we need?)
- A representative collection of data for the learning and testing phases of the neural network
- Find an optimal number of hidden nodes
- Estimate the parameters (learning = running the algorithm)
- Evaluate the performance of the network
- IF (when) performance is not satisfactory, review (all) the preceding points (a sketch of this loop follows below)
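
A compressed sketch of that recipe (my own, not from the talk; it assumes scikit-learn, and the candidate sizes are arbitrary): split the data, try several hidden-layer sizes, and keep the one that performs best on held-out data.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    best_size, best_score = None, 0.0
    for size in (4, 16, 64):          # candidate numbers of hidden nodes
        net = MLPClassifier(hidden_layer_sizes=(size,), max_iter=1000,
                            random_state=0).fit(X_train, y_train)
        score = net.score(X_test, y_test)   # evaluate the performance
        # (A separate validation set, distinct from the final test set,
        #  would be cleaner for choosing the size.)
        print(f"{size} hidden nodes: accuracy {score:.3f}")
        if score > best_score:
            best_size, best_score = size, score
    print(f"best: {best_size} hidden nodes")
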
What are NNs used for?
- Prediction
  - The weather tomorrow
- Classification
  - Does an X-ray show cancer or not?
- Association / error correction
  - Associate a pattern with another pattern / with itself (see the sketch below).
- Filtering
  - Take noise / echo out of a telephone signal
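
A minimal sketch of auto-association (my own illustration, not from the talk): a Hopfield-style net stores two patterns with a Hebbian outer-product rule, then cleans up a corrupted version of one of them.

    import numpy as np

    patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                         [1, 1, 1, 1, -1, -1, -1, -1]])
    n = patterns.shape[1]

    # Hebbian storage: sum of outer products, no self-connections.
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)

    x = patterns[0].copy()
    x[0] = -x[0]                      # corrupt one unit ("noisy input")
    for _ in range(5):                # repeated updates settle the net
        x = np.sign(W @ x)
    print(x)                          # recovered pattern: matches patterns[0]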
 
What are NNs used for in Language Technology?
- Text-to-speech
  - NetTalk
- Speech recognition (as part of larger systems)
  - Estimate probability distributions
- Word <-> document association
  - Information retrieval