1
Other and related for 2 hours
  • Christer Johansson
  • Computational Linguistics
  • Bergen University

2
New Developments
  • Local Learning
  • cautious generalization (Instance Based
    learning)

3
Radial Basis Functions (RBFs)
  • Features
  • One hidden layer
  • The activation of a hidden unit is determined by
    the distance between the input vector and a
    prototype vector

[Figure: RBF network architecture with inputs, radial (hidden) units, and outputs]
4
Learning
  • The training is performed by deciding on
  • How many hidden nodes there should be
  • The centers and the sharpness of the Gaussians
  • 2 steps
  • In the 1st stage, the input data set is used to
    determine the parameters of the basis functions
  • In the 2nd stage, the basis functions are kept
    fixed while the second-layer weights are
    estimated (a simple BP algorithm, as for MLPs);
    a sketch follows this list
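A minimal sketch of this two-stage procedure in Python, assuming NumPy and scikit-learn's KMeans for placing the centres; the width parameter and the use of least squares instead of back-propagation in the second stage are illustrative choices, not from the slides.

import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, y, n_hidden=10, width=1.0):
    # Stage 1: use the input data alone to place the Gaussian centres.
    centres = KMeans(n_clusters=n_hidden, n_init=10).fit(X).cluster_centers_
    # Hidden activations depend on the distance between inputs and prototypes.
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    H = np.exp(-(d ** 2) / (2 * width ** 2))
    # Stage 2: keep the basis functions fixed, estimate the output weights.
    W, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centres, W

def predict_rbf(X, centres, W, width=1.0):
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    H = np.exp(-(d ** 2) / (2 * width ** 2))
    return H @ W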

5
MLPs versus RBFs
  • Classification
  • MLPs separate classes via hyperplanes
  • RBFs separate classes via hyperspheres
  • Learning
  • MLPs use distributed learning
  • RBFs use localized learning
  • RBFs train faster
  • Structure
  • MLPs have one or more hidden layers
  • RBFs have only one hidden layer
  • RBFs require more hidden neurons → curse of
    dimensionality

[Figure: decision regions in the x1/x2 plane; the MLP separates classes with hyperplanes, the RBF with hyperspheres]
6
Temporal Processing
  • Simple Recurrent Networks (SRN)

7
SRN
  • Uses the back-propagation algorithm
  • Feeds back the hidden layer's activation
  • This becomes part of the input in the next
    processing step (a minimal sketch follows).
  • Initially the activation of the hidden layer is
    undefined; it may take a little while to stabilize.
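A minimal sketch of one SRN processing step in Python, assuming NumPy; the weight matrices, sizes, and the character example are illustrative rather than taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 26, 20, 26                     # e.g. one-hot characters
W_ih = rng.normal(0, 0.1, (n_hid, n_in + n_hid))    # input + context -> hidden
W_ho = rng.normal(0, 0.1, (n_out, n_hid))           # hidden -> output

def srn_step(x, context):
    # The previous hidden activation is fed back as part of the input.
    h = np.tanh(W_ih @ np.concatenate([x, context]))
    y = W_ho @ h
    return y, h                                     # h becomes the next context

context = np.zeros(n_hid)                           # initially undefined, here zero
for x in np.eye(n_in)[[7, 4, 11, 11, 14]]:          # the word "hello" as one-hot rows
    y, context = srn_step(x, context)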

8
SRN
9
SRN: an example
An SRN can be used to predict the next character
of a sequence. The prediction error typically drops
after the start of a word. A method to detect words?
10
xor in time (spurious regularities)
If we run a loop over this data set we typically
get good prediction for the input bits, but not the
xor function. To learn xor in time we must make
sure the input is random to give xor a chance
(a data-generation sketch follows).
[Example bit stream in which every third bit is the
XOR of the two preceding bits]
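A minimal sketch of generating such a stream in Python, assuming the standard xor-in-time setup: random bit pairs followed by their XOR, so that only every third bit is predictable.

import random

def xor_stream(n_triples):
    bits = []
    for _ in range(n_triples):
        a, b = random.randint(0, 1), random.randint(0, 1)
        bits += [a, b, a ^ b]     # the pair must be random; the third bit is their xor
    return bits

print(xor_stream(4))              # e.g. [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0]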
11
Neural Nets with a Lexicon
  • Can symbolic and connectionistprocessing be
    combined?

12
NN with lexicon
  • Early models of language acquisition stressed
    that connectionist models didn't need a separate
    lexicon. Everything was stored in the net.
  • Miikkulainen & Dyer showed that adding a lexicon
    could help the net invent a useful input
    representation.

13
NN with lexicon
  • Their model mapped short sentences (with the
    task of assigning thematic roles to words).
  • Words were index numbers.
  • The index pointed to a representation that was
    trained.
  • The error signal was sent one step further, and
    used to update the representations in the
    lexicon (!); see the sketch below.
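A minimal sketch of that idea in Python, assuming NumPy: the error is propagated one step past the input layer and used to change the lexicon entry itself. The single linear layer and all names are illustrative, not the actual FGREP architecture.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, rep_dim, out_dim = 100, 8, 4
lexicon = rng.normal(0, 0.1, (vocab_size, rep_dim))   # trainable word representations
W = rng.normal(0, 0.1, (out_dim, rep_dim))            # the network proper (linear for brevity)

def train_step(lexicon, W, word_index, target, lr=0.1):
    x = lexicon[word_index].copy()         # look up the current representation
    y = W @ x                              # forward pass
    err = y - target                       # output error
    grad_x = W.T @ err                     # error sent one step further back
    W -= lr * np.outer(err, x)             # usual weight update
    lexicon[word_index] = x - lr * grad_x  # ...and the lexicon entry is updated too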

14
Inventing features
15
Miikkulainen: FGREP & DISCERN
  • The combination of a lexicon and a neural net
    proved successful.
  • Interesting because it marries symbolic AI with
    connectionism.
  • There is suddenly room for hybrid models.

16
An alternative Learning Law
  • Winner Takes All
  • Kohonen Maps (SOM, LVQ)

17
Self-organizing maps
  • The purpose of SOM is to map a multidimensional
    input space onto a topology-preserving map of
    neurons
  • Preserve the topological ordering so that
    neighboring neurons respond to similar input
    patterns
  • The topological structure is often a 2- or
    3-dimensional space
  • Each neuron is assigned a weight vector with the
    same dimensionality as the input space
  • Input patterns are compared to each weight vector
    and the closest wins (Euclidean distance); see
    the sketch below
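A minimal sketch of picking the winner in Python, assuming NumPy; the 10x10 map and 3-dimensional input are illustrative.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((10, 10, 3))        # 10x10 map of neurons, 3-dimensional input space

def winner(weights, x):
    dist = np.linalg.norm(weights - x, axis=-1)            # Euclidean distance to every neuron
    return np.unravel_index(np.argmin(dist), dist.shape)   # map coordinates of the closest

print(winner(weights, np.array([0.2, 0.5, 0.9])))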

18
Self-organizing maps
  • The result of SOM is a clustering of data, so
    that similar inputs appear closer in the map
    space.
  • An implicit categorization is discovered without
    feedback.
  • Problems with winner-takes-all
  • If the training sequence is ordered in a certain
    way, one neuron could form a universal class for
    everything.
  • Remedy: conscience. Neurons are discouraged
    from being greedy; their probability of being
    the winner is influenced by how many times they
    have won previously (see the sketch below).
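A minimal sketch of a conscience mechanism in Python, assuming NumPy; the exact bias formula varies between implementations, so the one below (penalize neurons in proportion to how often they have already won) is only illustrative.

import numpy as np

def winner_with_conscience(weights, x, win_counts, beta=10.0):
    flat = weights.reshape(-1, weights.shape[-1])
    dist = np.linalg.norm(flat - x, axis=-1)
    freq = win_counts / max(win_counts.sum(), 1)            # how often each neuron has won
    biased = dist + beta * (freq - 1.0 / win_counts.size)   # greedy neurons are handicapped
    i = np.argmin(biased)
    win_counts[i] += 1
    return np.unravel_index(i, weights.shape[:-1])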

19
  • The activation of the neuron spreads to its
    direct neighborhood → neighbors become sensitive
    to similar input patterns
  • The size of the neighborhood is initially large
    but is reduced over time → specialization of the
    network
  • Other measures of distance are possible, which
    define the neighborhood.

[Figure: first and second neighborhoods around the winning neuron]
20
Adaptation
  • During training, the winning neuron and its
    neighborhood adapt to make their weight vectors
    more similar to the input pattern that caused the
    activation
  • The neurons are moved closer to the input
    pattern (the weights of the neurons to the input
    are adapted).
  • The magnitude of the adaptation is controlled via
    a learning parameter which decays over time;
    see the sketch below
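A minimal sketch of one adaptation step in Python, assuming NumPy, a 2-D grid, and a Gaussian neighborhood; the decay schedule and parameter values are illustrative.

import numpy as np

def som_step(weights, x, t, n_steps, lr0=0.5, sigma0=3.0):
    rows, cols, _ = weights.shape
    lr = lr0 * (1 - t / n_steps)                # learning parameter decays over time
    sigma = sigma0 * (1 - t / n_steps) + 1e-3   # neighborhood shrinks over time

    dist = np.linalg.norm(weights - x, axis=-1)
    wr, wc = np.unravel_index(np.argmin(dist), dist.shape)   # the winner

    # Grid distance from the winner decides how strongly each neuron adapts.
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing='ij')
    h = np.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))

    # Move the winner and its neighborhood closer to the input pattern.
    weights += lr * h[..., None] * (x - weights)
    return weights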

21
Interpretation of Neural Nets
  • Exclusive or inclusive probabilities?
  • Fuzzy Logic

22
Fuzzy Logic
Values in neural networks are usually shaded, as
neurons gradually become activated. Fuzzy Logic
uses shades of truth, and can combine some of the
strengths of symbolic AI with the strengths of
connectionist AI.
23
Fuzzy is not Probabilistic
In Fuzzy Logic we can say that the car is 0.70 in
parking pocket A and 0.30 in parking pocket B.
This is not the same as saying it is in A with
0.70 probability (it would then be either in A or
somewhere else).
[Figure: a car straddling parking pockets A and B]
24
Vagueness
When do we get old? Are we not old at 39, but
old at 40? In crisp logic there would have to be
one sharp dividing line between the old and the
not old. Fuzzy allows us to be young and old at
the same time, to some degree (cf. the non-linear
activation function that helped save the perceptron
for multi-layered networks). A sketch of such
membership functions follows.
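A minimal sketch of fuzzy membership for "young" and "old" in Python, assuming NumPy and simple piecewise-linear membership functions; the breakpoints 30 and 50 are illustrative, not from the slides.

import numpy as np

def young(age):
    return float(np.clip((50 - age) / (50 - 30), 0.0, 1.0))   # fully young below 30

def old(age):
    return float(np.clip((age - 30) / (50 - 30), 0.0, 1.0))   # fully old above 50

age = 39
print(young(age), old(age))   # 0.55 and 0.45: young and old at the same time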
25
Neural Nets
  • Fuzzy Logic or Probabilistic Reasoning?
  • NNs are often used to estimate parameters in
    Fuzzy Logic systems. The main question is to
    what degree the input is a member of the valid
    classes.

26
Neural Nets
  • When we use NNs as a knowledge source, the
    information about the detected vagueness is
    preserved.
  • It is also possible to allow some adaptation of
    that information in the final product.
  • NNs can also be used to approximate probability
    distributions.

27
Neural Nets & Hidden Markov Models
  • Alike & Different
  • Hybrid Models

28
Neural Nets
  • NNs can be used to approximate probability
    distributions.
  • This is a main problem in probabilistic modeling
    (Hidden Markov Models).
  • Probabilities are estimated from large
    databases, but there will always be a need for
    more data. Larger contexts mean sparse data;
    neural networks may help generalize to unseen
    data.

29
HMM
  • Work with probabilities of state transitions
  • Derived from a corpus?
  • Either you are in a state or you are not.

30
HMM
  • Assume that new input is going to be like the old
    input (from a corpus).
  • What happens if we see a new unit (word)?
  • Markov assumption: the probability of the next
    state depends only on the current state.

31
HMM
  • What happens if we see a new unit (word)?
  • Or a completely new sequence?
  • We might estimate the probabilities associated
    with this new word based on probabilities
    observed for new (low-frequency) words
    previously.
  • We might use a neural network to integrate
    information about word form, position, etc. into
    an activation vector for this word. This vector
    could be used to approximate the probabilities we
    need; see the sketch below.
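A minimal sketch of that hybrid idea in Python, assuming NumPy: a small network maps cues about a word (word form, position, etc.) to a softmax activation vector, which can stand in for the emission probabilities an HMM would otherwise estimate from counts. All sizes and names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_cues, n_hidden, n_tags = 12, 16, 5        # e.g. suffix/position cues mapped to tags
W1 = rng.normal(0, 0.1, (n_hidden, n_cues))
W2 = rng.normal(0, 0.1, (n_tags, n_hidden))

def emission_probs(cues):
    # Approximate P(tag | word cues) for a word never seen in the corpus.
    h = np.tanh(W1 @ cues)
    z = W2 @ h
    e = np.exp(z - z.max())
    return e / e.sum()                       # softmax gives a proper probability vector

probs = emission_probs(rng.random(n_cues))   # plug into the HMM in place of counts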

32
Comparison
  • HMM
  • Maximizes the probability of the observation
    (etc.)
  • Probabilities from observations
  • Discrete units
  • Smoothing techniques for assigning probability
    to unseen events
  • NN
  • Minimizes errors for recognition (etc.)
  • Activation vectors: fuzzy membership
  • Sub-symbolic
  • Smoothing based on regularities and cues in the
    corpus

33
LINKS
Tlearn is a free neural network simulator (tlearn,
crl, ucsd). Jeffrey Elman popularized neural
networks in Cognitive Science / Linguistics.
Course: http://www.cs.wisc.edu/dyer/cs540/notes/nn.html
The PDP book: Rumelhart & McClelland, Parallel
Distributed Processing I & II.
Fuzzy Logic: Bart Kosko, 1994, Fuzzy Thinking;
Michael Negnevitsky, 2002, Artificial Intelligence.
Search for Lotfi Zadeh.
Neurons: http://cti.itc.virginia.edu/psyc220
34
More resources (some illustrations were found on
the internet)
  • Connectionism (fact laden)
  • http://www.dcs.shef.ac.uk/yorick/ai_course/com1070.9.ppt
    (COM1070.10.ppt)
  • Technical
  • http://acat02.sinp.msu.ru/presentations/prevotet/tutorial.ppt
    (ACAT2002.ppt)
  • The tlearn simulator
  • http://crl.ucsd.edu/innate/tlearn.html (check
    out the book at http://crl.ucsd.edu/innate/tlearn.html)