Title: Other and related for 2 hours
1. Other and related for 2 hours
- Christer Johansson
- Computational Linguistics
- Bergen University
2. New Developments
- Local learning
- Cautious generalization (instance-based learning)
3. Radial Basis Functions (RBFs)
- Features
- One hidden layer
- The activation of a hidden unit is determined by the distance between the input vector and a prototype vector
[Figure: RBF network architecture, inputs → radial units → outputs]
4. Learning
- The training is performed by deciding on
- how many hidden nodes there should be
- the centers and the sharpness of the Gaussians
- Training proceeds in 2 steps:
- In the 1st stage, the input data set is used to determine the parameters of the basis functions
- In the 2nd stage, the basis functions are kept fixed while the second-layer weights are estimated (a simple BP algorithm, as for MLPs); see the sketch below
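A minimal sketch of this two-stage scheme, assuming k-means for placing the centers and a least-squares fit for the second-layer weights (the slide only mentions a simple BP-like estimation for stage 2, so these concrete choices are assumptions):

```python
# Sketch of two-stage RBF training: stage 1 places the basis functions,
# stage 2 fits only the second-layer weights while the basis stays fixed.
import numpy as np

def rbf_activations(X, centers, width):
    # Gaussian basis: exp(-||x - c||^2 / (2 * width^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def train_rbf(X, y, k=10, width=1.0):
    centers = kmeans(X, k)                       # stage 1: basis-function centers
    Phi = rbf_activations(X, centers, width)     # fixed radial-unit activations
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # stage 2: second-layer weights only
    return centers, W

def predict_rbf(X, centers, W, width=1.0):
    return rbf_activations(X, centers, width) @ W
```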
5. MLPs versus RBFs
- Classification
- MLPs separate classes via hyperplanes
- RBFs separate classes via hyperspheres
- Learning
- MLPs use distributed learning
- RBFs use localized learning
- RBFs train faster
- Structure
- MLPs have one or more hidden layers
- RBFs have only one hidden layer
- RBFs require more hidden neurons → curse of dimensionality
[Figure: decision regions in (x1, x2) space, an MLP separating the classes with a hyperplane and an RBF with a hypersphere]
6. Temporal Processing
- Simple Recurrent Networks (SRN)
7. SRN
- Uses the back-propagation algorithm
- Feeds back (a copy of) the hidden layer's activation
- This becomes part of the input in the next processing step
- Initially the activation of the hidden layer is undefined; it may take a little while to stabilize
8. SRN
9. SRN: an example
An SRN can be used to predict the next character of a sequence. The prediction error typically drops from the start of a word onwards. A method to detect words? (A minimal sketch follows below.)
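A minimal Elman-style SRN sketch for next-character prediction; the toy corpus, layer sizes, learning rate, and the use of plain one-step backprop (the context treated as a fixed extra input) are illustrative assumptions:

```python
# Minimal SRN (Elman network) for next-character prediction.
import numpy as np

rng = np.random.default_rng(0)
text = "the cat sat on the mat " * 50          # toy corpus (assumption)
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V, H, lr = len(chars), 20, 0.1

W_in = rng.normal(0, 0.1, (H, V))    # input -> hidden
W_ctx = rng.normal(0, 0.1, (H, H))   # context (previous hidden state) -> hidden
W_out = rng.normal(0, 0.1, (V, H))   # hidden -> output

def one_hot(i):
    v = np.zeros(V); v[i] = 1.0; return v

for epoch in range(5):
    context = np.zeros(H)            # initially undefined; start from zeros
    for t in range(len(text) - 1):
        x, target = one_hot(idx[text[t]]), idx[text[t + 1]]
        h = np.tanh(W_in @ x + W_ctx @ context)   # hidden sees input + context copy
        z = W_out @ h
        p = np.exp(z - z.max()); p /= p.sum()     # softmax over the next character
        # plain backprop, with the context treated as a fixed extra input
        d_out = p.copy(); d_out[target] -= 1.0
        d_h = (W_out.T @ d_out) * (1 - h ** 2)
        W_out -= lr * np.outer(d_out, h)
        W_in -= lr * np.outer(d_h, x)
        W_ctx -= lr * np.outer(d_h, context)
        context = h                               # feed the hidden activation back
```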
10. XOR in time (spurious regularities)
If we run a loop over this data set, we typically get good prediction of the input bits, but not of the XOR function. To learn XOR in time we must make sure the input is random, to give XOR a chance (see the sketch below).
[Example bit sequence from the slide; garbled in this extraction.]
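A small data-generator sketch, assuming the usual formulation of the task (two random input bits followed by their XOR); it illustrates why the input bits must be random:

```python
# XOR-in-time data generator (assumed task layout: a, b, a XOR b, a, b, ...).
import numpy as np

def xor_in_time(n_triples, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, n_triples)
    b = rng.integers(0, 2, n_triples)
    return np.stack([a, b, a ^ b], axis=1).reshape(-1)

# Only every third bit (the XOR) is predictable from its predecessors; looping
# over one fixed, repeated sequence would instead let the net memorize the loop.
print(xor_in_time(5))
```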
11. Neural Nets with a Lexicon
- Can symbolic and connectionist processing be combined?
12. NN with lexicon
- Early models of language acquisition stressed that connectionist models didn't need a separate lexicon; everything was stored in the net.
- Miikkulainen & Dyer showed that adding a lexicon could help the net invent a useful input representation.
13. NN with lexicon
- Their model mapped short sentences (the task was to assign thematic roles to words).
- Words were index numbers.
- The index pointed to a representation that was itself trained.
- The error signal was sent one step further, and used to update the representations in the lexicon (!); see the sketch below.
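A minimal sketch of the "error one step further" idea: a word index looks up a trainable representation, and the gradient with respect to that representation updates the lexicon entry itself. The sizes and the toy role-assignment step are assumptions, not Miikkulainen & Dyer's actual FGREP setup:

```python
# Backprop extended one step into a trainable lexicon (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
V, D, H, R = 50, 8, 16, 4              # vocab size, repr. dim, hidden units, roles
lexicon = rng.normal(0, 0.1, (V, D))   # trainable word representations
W1 = rng.normal(0, 0.1, (H, D))
W2 = rng.normal(0, 0.1, (R, H))
lr = 0.1

word_id, target_role = 7, 2            # toy training pair: word index -> role
x = lexicon[word_id].copy()            # look the representation up by index
h = np.tanh(W1 @ x)
p = np.exp(W2 @ h); p /= p.sum()       # softmax over thematic roles

d_out = p.copy(); d_out[target_role] -= 1.0
d_h = (W2.T @ d_out) * (1 - h ** 2)
d_x = W1.T @ d_h                       # error signal sent one step further, to the input
W2 -= lr * np.outer(d_out, h)
W1 -= lr * np.outer(d_h, x)
lexicon[word_id] -= lr * d_x           # ...and used to update the lexicon entry itself
```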
14. Inventing features
15. Miikkulainen: FGREP and DISCERN
- The combination of a lexicon and a neural net proved successful.
- Interesting because it marries symbolic AI with connectionism.
- There is suddenly room for hybrid models.
16. An alternative Learning Law
- Winner Takes All
- Kohonen Maps (SOM, LVQ)
17. Self-organizing maps
- The purpose of a SOM is to map a multidimensional input space onto a topology-preserving map of neurons.
- The topology is preserved so that neighboring neurons respond to similar input patterns.
- The topological structure is often a 2- or 3-dimensional space.
- Each neuron is assigned a weight vector with the same dimensionality as the input space.
- Input patterns are compared to each weight vector, and the closest wins (Euclidean distance).
18. Self-organizing maps
- The result of a SOM is a clustering of the data, so that similar inputs appear closer together in the map space.
- An implicit categorization is discovered without feedback.
- Problems with winner-takes-all:
- If the training sequence is ordered in a certain way, one neuron could form a universal class for everything.
- Remedy: conscience. Neurons are discouraged from being greedy; their probability of being the winner is influenced by how many times they have won previously (see the sketch below).
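A sketch of such a conscience bias in the winner selection; the specific penalty (proportional to how far a neuron's win frequency exceeds its fair share) is an assumed, common formulation, not necessarily the one intended on the slide:

```python
# Winner-takes-all with a "conscience": frequent winners are handicapped.
import numpy as np

def winner_with_conscience(x, weights, wins, beta=10.0):
    # weights: (n_neurons, dim); wins: array counting how often each neuron has won
    dist = np.linalg.norm(weights - x, axis=1)         # Euclidean distance to the input
    total = max(int(wins.sum()), 1)
    bias = beta * (wins / total - 1.0 / len(weights))  # above-average winners pay a penalty
    return int(np.argmin(dist + bias))
```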
19.
- The activation of the neuron spreads to its direct neighborhood → neighbors become sensitive to similar input patterns
- The size of the neighborhood is initially large, but is reduced over time → specialization of the network
- Other measures of distance are possible, which define the neighborhood
[Figure: first and second neighborhoods around the winning neuron]
20. Adaptation
- During training, the winning neuron and its neighborhood adapt to make their weight vectors more similar to the input pattern that caused the activation.
- The neurons are moved closer to the input pattern (the weights from the inputs to the neurons are adapted).
- The magnitude of the adaptation is controlled via a learning parameter which decays over time (see the sketch below).
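A minimal SOM training sketch pulling slides 17-20 together; the 10x10 map, the Gaussian neighborhood function and the exponential decay schedules are illustrative assumptions:

```python
# Minimal self-organizing map: winner-takes-all plus neighborhood adaptation.
import numpy as np

def train_som(weights, grid, data, epochs=20, lr0=0.5, radius0=5.0):
    t, t_max = 0, epochs * len(data)
    for _ in range(epochs):
        for x in data:
            lr = lr0 * np.exp(-t / t_max)           # learning parameter decays over time
            radius = radius0 * np.exp(-t / t_max)   # neighborhood shrinks over time
            d = np.linalg.norm(weights - x, axis=2) # Euclidean distance to the input
            win = np.unravel_index(d.argmin(), d.shape)     # winner takes all
            g2 = ((grid - np.array(win)) ** 2).sum(axis=2)  # grid distance to the winner
            h = np.exp(-g2 / (2.0 * radius ** 2))   # Gaussian neighborhood function
            weights += lr * h[..., None] * (x - weights)    # move toward the input
            t += 1
    return weights

rows, cols, dim = 10, 10, 3
rng = np.random.default_rng(0)
weights = rng.random((rows, cols, dim))   # one weight vector per map neuron
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
train_som(weights, grid, rng.random((200, dim)))
```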
21. Interpretation of Neural Nets
- Exclusive or inclusive? Probabilities / Fuzzy Logic
22. Fuzzy Logic
Values in neural networks are usually shades, as neurons gradually become activated. Fuzzy Logic uses shades of truth, and can combine some of the strengths of symbolic AI with the strengths of connectionist AI.
23. Fuzzy is not Probabilistic
In Fuzzy Logic we can say that the car is 0.70 in parking pocket A and 0.30 in parking pocket B. This is not the same as saying it is in A with probability 0.70 (it would then be either in A or somewhere else).
[Figure: a car straddling parking pockets A and B]
24. Vagueness
When do we get old? Are we not old at 39, but old at 40? In crisp logic there would have to be one sharp dividing line between the "old" and the "not old". Fuzzy logic allows us to be young and old at the same time, to some degree (cf. the non-linear activation function that helped save the perceptron for multi-layered networks). A sketch follows below.
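A small sketch of a fuzzy membership function for "old"; the 30-60 age range and the smoothstep shape are illustrative assumptions, the point being only that membership changes gradually rather than at one sharp line:

```python
# Gradual fuzzy membership in "old" (no single sharp dividing line).
import numpy as np

def membership_old(age, lower=30.0, upper=60.0):
    x = np.clip((age - lower) / (upper - lower), 0.0, 1.0)
    return 3 * x**2 - 2 * x**3            # smoothstep ramp from 0 to 1

def membership_young(age):
    return 1.0 - membership_old(age)      # one can be young AND old to some degree

for a in (25, 39, 40, 70):
    print(a, round(float(membership_old(a)), 2), round(float(membership_young(a)), 2))
```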
25. Neural Nets
- Fuzzy Logic or Probabilistic Reasoning?
- NNs are often used to estimate parameters in Fuzzy Logic systems. The main question is to what degree the input is a member of the valid classes.
26. Neural Nets
- When we provide NNs as a knowledge source, the information about the detected vagueness is preserved.
- It is also possible to allow some adaptation of that information in the final product.
- NNs can also be used to approximate probability distributions.
27. Neural Nets and Hidden Markov Models
- Alike and different
- Hybrid models
28. Neural Nets
- NNs can be used to approximate probability distributions.
- This is a main problem in probabilistic modeling (Hidden Markov Models).
- Probabilities are estimated from large databases, but there will always be a need for more data. Larger contexts mean sparser data; neural networks may help generalize to unseen data.
29. HMM
- Works with probabilities of state transitions
- Derived from a corpus?
- Either you are in a state or you are not.
30. HMM
- Assumes that new input is going to be like the old input (from a corpus).
- What happens if we see a new unit (word)?
- Markov assumption: the probability of the next state depends only on the current state (see the sketch below).
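A small sketch of the Markov assumption, with transition probabilities estimated from a toy tag sequence by relative-frequency counting (the corpus and tag set are made up for illustration):

```python
# Transition probabilities from counts, under the Markov assumption.
from collections import Counter

tag_sequence = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]

bigrams = Counter(zip(tag_sequence, tag_sequence[1:]))
unigrams = Counter(tag_sequence[:-1])

def p_next(state, next_state):
    # Markov assumption: P(next | whole history) = P(next | current state)
    return bigrams[(state, next_state)] / unigrams[state] if unigrams[state] else 0.0

print(p_next("DET", "NOUN"))   # estimated from counts in the corpus
print(p_next("DET", "VERB"))   # zero for unseen transitions -> needs smoothing
```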
31. HMM
- What happens if we see a new unit (word)?
- Or a completely new sequence?
- We might estimate the probabilities associated with this new word based on probabilities observed for new (low-frequency) words previously.
- We might use a neural network to integrate information about word form, position, etc. into an activation vector for this word. This vector could be used to approximate the probabilities we need (see the sketch below).
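A hedged sketch of this hybrid idea: a tiny network maps crude word-form and position cues to an activation vector over tags that could stand in for the missing probabilities. The features, tag set and (untrained) weights are all illustrative assumptions:

```python
# Activation vector over tags from word-form and position cues (illustrative).
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ"]

def word_features(word, position):
    return np.array([
        word.endswith("ing"),      # crude word-form cues...
        word.endswith("ed"),
        word.endswith("s"),
        word[0].isupper(),
        position == 0,             # ...plus position in the sentence
    ], dtype=float)

rng = np.random.default_rng(0)
W = rng.normal(0, 0.5, (len(TAGS), 5))   # would be trained on known words

def tag_distribution(word, position):
    z = W @ word_features(word, position)
    p = np.exp(z - z.max())
    return p / p.sum()                   # activation vector ~ P(tag | cues)

print(dict(zip(TAGS, tag_distribution("blorking", 3).round(2))))
```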
32. Comparison
- HMM
- Maximize the probability of the observation (etc.)
- Probabilities from observation
- Discrete units
- Smoothing techniques for assigning probability to unseen events
- NN
- Minimize errors for recognition (etc.)
- Activation vectors / fuzzy membership
- Sub-symbolic
- Smoothing based on regularities and cues in the corpus
33. LINKS
- Tlearn is a free neural network simulator (tlearn, CRL, UCSD). Jeffrey Elman popularized neural networks in Cognitive Science / Linguistics.
- Course: http://www.cs.wisc.edu/dyer/cs540/notes/nn.html
- The PDP book: Rumelhart & McClelland, Parallel Distributed Processing I & II.
- Fuzzy Logic: Bart Kosko, 1994, Fuzzy Thinking; Michael Negnevitsky, 2002, Artificial Intelligence. Search for Lotfi Zadeh.
- Neurons: http://cti.itc.virginia.edu/psyc220
34. More resources (some of the illustrations were found on the internet)
- Connectionism (fact-laden)
- http://www.dcs.shef.ac.uk/yorick/ai_course/com1070.9.ppt, COM1070.10.ppt
- Technical
- http://acat02.sinp.msu.ru/presentations/prevotet/tutorial.ppt (ACAT2002.ppt)
- The tlearn simulator
- http://crl.ucsd.edu/innate/tlearn.html (check out the book: http://crl.ucsd.edu/innate/tlearn.html)