1
Spring 2006 Artificial Intelligence
COSC 40503 week 5
  • Antonio Sanchez
  • Texas Christian University

2
About Learning and Behavior
Herbert Simon (1916-2001)
  • Learning is any change in a system that produces
    a more or less permanent change in its capacity
    for adapting to its environment.
  • Human beings, viewed as behaving systems, are
    quite simple. The apparent complexity of our
    behavior over time is largely a reflection of the
    complexity of the environment in which we find
    ourselves.

3
Connectionism
  • The concept of a Neuron has invited many lines of research.
  • Yet the power of living neurons lies in both
  • their huge number: 10^10 in humans, 10^4 in a small bug
  • their connectivity: 10^5 in humans
  • It is therefore by the power of the ganglia (10^15) that we
    present complex and intelligent behavior
  • Neurons can be modeled as digital or continuous systems

4
A simple Neuron Model
  • Orthogonality: f · h = 0
  • Normality: f · f = 1; to enforce it, execute
    f_i = f_i / sqrt(f · f)
  • Soma: n_i,j = Σ_k a_i,k · d_k,j
    for all k dendrites
  • Linear axon: a_i,j = n_i,j
  • Non-linear axon: if n_i,j > Threshold_i,j
    then a_i,j = 1, else a_i,j = 0
  • Linear compensation: Δd_i,j = α · a_i,k · a_i,j,
    with 0 < α < 1
  • k is the entry axon, j the exit axon
  • N-pattern compensation: Δd_i,j = α · Σ_h a_i,k,h · a_i,j,h
    for all h patterns
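A minimal sketch of this neuron model in Python; the weights, inputs, threshold, and learning rate below are illustrative placeholders, not values from the slides.

```python
import numpy as np

def soma(a, d):
    """Soma: n = sum over all dendrites k of a_k * d_k."""
    return np.dot(a, d)

def linear_axon(n):
    """Linear axon: the activation equals the net input."""
    return n

def nonlinear_axon(n, threshold):
    """Non-linear axon: fire (1) only if the net input exceeds the threshold."""
    return 1.0 if n > threshold else 0.0

def linear_compensation(d, a_in, a_out, alpha=0.1):
    """Linear compensation: delta d = alpha * a_in * a_out, with 0 < alpha < 1."""
    return d + alpha * a_in * a_out

# Illustrative values
a = np.array([1.0, 0.0, 1.0])            # activations arriving on the dendrites
d = np.array([0.5, -0.3, 0.8])           # dendritic weights
n = soma(a, d)                           # 1.3
out = nonlinear_axon(n, threshold=1.0)   # fires, since 1.3 > 1.0
d = linear_compensation(d, a, out)       # strengthen the active connections
```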

5
A Linear Neuronal Network
It can learn a linearly separable function, but not one such as
Exclusive OR
6
A Hidden Layer Neuronal Network
Exclusive OR
(Diagram: XOR network with hidden unit, thresholds Th = 1; see the Excel file)
Source: Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986).
Learning internal representations by error propagation. In D. E.
Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, Vol. 1: Foundations
(pp. 318-362). Cambridge, MA: MIT Press.
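Since the slide only sketches the diagram, here is a minimal hand-wired 2-2-1 threshold network that computes Exclusive OR; the weights and thresholds are one standard solution and are not taken from the slide.

```python
import numpy as np

def step(n, threshold):
    """Step axon: fire only above the threshold."""
    return 1.0 if n > threshold else 0.0

def xor_net(x1, x2):
    x = np.array([x1, x2], dtype=float)
    h_or  = step(x @ np.array([1.0, 1.0]), 0.5)   # hidden unit: x1 OR x2
    h_and = step(x @ np.array([1.0, 1.0]), 1.5)   # hidden unit: x1 AND x2
    # output unit: OR but not AND, which is exactly XOR
    return step(h_or - h_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_net(a, b)))   # prints the XOR truth table
```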
7
Credit Assignment: using the Gradient Descent method
to change the dendritic weights
  • Change the firing function from a step function to a
    continuous one with similar behavior, e.g. a sigmoid axon
  • a(j) = 1 / (1 + e^(-n(j)))
  • Calculate the derivative of the sigmoid:
  • da(j)/dn(j) = a(j) · (1 - a(j))
  • Determine the output error:
  • e(output) = TrueValue - a(j)
  • Determine the axon error:
  • ea(j) = a(j) · (1 - a(j)) · e(output)
  • Minimize E by gradient descent: change each weight by an
    amount proportional to the partial derivative
  • Δd = α · ea(j) · a(h)

If the slope is negative, increase n(j). If the slope is positive,
decrease n(j). Local minima of the Error are the places where the
derivative equals zero.
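A minimal sketch of this update for a single sigmoid unit; the variable names, data, and learning rate are illustrative.

```python
import numpy as np

def sigmoid(n):
    """Sigmoid axon: a(j) = 1 / (1 + e^(-n(j)))."""
    return 1.0 / (1.0 + np.exp(-n))

def gradient_step(d, a_h, true_value, alpha=0.5):
    """One gradient-descent change of the dendritic weights d."""
    n_j = np.dot(a_h, d)                 # net input n(j)
    a_j = sigmoid(n_j)                   # activation a(j)
    e_out = true_value - a_j             # output error
    ea_j = a_j * (1.0 - a_j) * e_out     # axon error: sigmoid derivative times error
    return d + alpha * ea_j * a_h        # delta d = alpha * ea(j) * a(h)

# Illustrative usage: drive the unit's output toward 1 for one input pattern
d = np.array([0.1, -0.2, 0.05])
a_h = np.array([1.0, 0.0, 1.0])
for _ in range(100):
    d = gradient_step(d, a_h, true_value=1.0)
print(sigmoid(np.dot(a_h, d)))           # approaches 1 after training
```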
8
Faster Convergence: the Momentum rule
  • Add a fraction (a momentum term, β) of the last change to
    the current change

Δd(t) = α · ea(j) · a(h) + β · Δd(t-1)
Study and Run Excel file
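A minimal sketch of the momentum update, reusing the delta-rule quantities from the previous slide; alpha, beta, and the names are illustrative.

```python
def momentum_update(d, delta_prev, ea_j, a_h, alpha=0.5, beta=0.9):
    """delta_d(t) = alpha * ea(j) * a(h) + beta * delta_d(t-1)."""
    delta_d = alpha * ea_j * a_h + beta * delta_prev
    return d + delta_d, delta_d          # keep delta_d as delta_prev for the next step

# Illustrative scalar usage
d, prev = momentum_update(d=0.1, delta_prev=0.0, ea_j=0.2, a_h=1.0)
```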
9
Extended Hidden Layer Neuronal Network
(Diagram: inputs, hidden layer, and outputs)
10
A Lesson from Mother Nature, Using the Scientific Method: Observation
11
Hypothesis
When an axon of cell A is near enough to excite cell B and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells such
that A's efficiency, as one of the cells firing B, is increased.
Donald O. Hebb, 1949
12
Synthesis and Validation
The connectivity of a brain is a product of its interaction with
its environment
(Figure: neuron connectivity under high interaction vs. low interaction)
13
Argumentation
  • In any case the connectivity is due to at least the
    following three factors:
  • Time: Born, Child, Young, Adult
  • Interaction: Low, Medium, High
  • Activity of the brain: Low, Medium, High

14
How about storing Information?
2-bit net for
(Diagram: units with thresholds Th = 1, Th = 0, Th = 1 and connection values of -1)
15
How about storing Knowledge?
2-bit net for >
(Diagram: units with thresholds Th = 1, Th = 1, Th = 0 and connection values of +1, +1, -1)
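The slide's diagram is not fully recoverable from this transcript, so as a hedged illustration here is one way a single threshold unit can encode the ">" relation on two 2-bit numbers; the weights are chosen for this sketch, not read off the slide.

```python
def step(n, threshold):
    return 1 if n > threshold else 0

def greater_than(a1, a0, b1, b0):
    """A = a1a0 and B = b1b0 as 2-bit numbers; the net input is A - B."""
    n = 2 * a1 + 1 * a0 - 2 * b1 - 1 * b0
    return step(n, 0)                     # fires exactly when A > B

# Check the unit against all 16 pairs of 2-bit numbers
for a in range(4):
    for b in range(4):
        assert greater_than(a >> 1, a & 1, b >> 1, b & 1) == (1 if a > b else 0)
```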
16
Digital Neural Network requirements
  • Extrapolating the previous data to Gigabytes of knowledge and
    information, we would require an enormous number of cells and
    connections, something like 10^11 for a full Gigabyte of
    information and knowledge
  • However we must take into account three important facts for
    the case of natural neurons:
  • They are not binary digital, but continuous
  • There is a lot of implicit coding in the ganglia
  • They do not store all the information, but only the
    important patterns in it (aka knowledge)

17
Some important knowledge on Artificial Neural
Networks
  • They are slow to train, but once they learn they perform
    beautifully
  • Treat the threshold as a negative dendrite whose input axon
    has a constant value of 1 (see the sketch after this list)
  • To speed up convergence, try random seeds for the initial
    weights
  • If they do not converge, use another set of random weights
  • To reduce the size of the network, use binary coding for
    both the inputs and the outputs
  • Use any other deterministic filters you deem necessary,
    such as the mean and standard deviation
  • Do not overload the network with more patterns than a
    maximum of 20 to 25% of the number of neurons
  • A single hidden layer is enough; it should comprise about
    half the number of input cells and/or twice the number of
    output cells
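A minimal sketch of the threshold-as-negative-dendrite trick mentioned above; the numbers are illustrative.

```python
import numpy as np

def fire(inputs, weights, threshold):
    """Explicit threshold: compare the weighted sum against it."""
    return float(np.dot(inputs, weights) > threshold)

def fire_with_bias(inputs, weights, threshold):
    """Equivalent form: append a constant-1 axon whose dendrite carries -threshold,
    so the unit compares against zero and the threshold is learned like any weight."""
    x = np.append(inputs, 1.0)
    w = np.append(weights, -threshold)
    return float(np.dot(x, w) > 0.0)

x = np.array([1.0, 0.0])
w = np.array([0.7, 0.4])
assert fire(x, w, threshold=0.5) == fire_with_bias(x, w, threshold=0.5)
```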

18
Linear Recurrent Networks
  • Associative Memory
  • Hopfield Network

Excel files
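Since the slide defers to the Excel files, here is a minimal sketch of a Hopfield-style associative memory with Hebbian outer-product storage; the stored patterns are made up for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Store +/-1 patterns with the Hebbian outer-product rule (zero diagonal)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def recall(W, probe, sweeps=10):
    """Asynchronously update units until a noisy probe settles on a stored pattern."""
    s = probe.astype(float).copy()
    for _ in range(sweeps):
        for i in range(len(s)):                      # one asynchronous sweep
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]], dtype=float)
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1], dtype=float)  # first pattern with last bit flipped
print(recall(W, noisy))                              # settles back on the first pattern
```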
19
Unsupervised Learning
So far we have talked about training with a purpose, i.e. showing
some examples and giving some type of compensation; this is called
supervised learning. Yet following Donald O. Hebb's hypothesis, we
do not even need to supervise the learning process: it will take
place anyway!
  • Examples
  • Kohonen Maps
  • Bayesian Nets
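As a hedged illustration of unsupervised learning, here is a minimal one-dimensional Kohonen-map update; the grid size, learning rate, and neighborhood width are illustrative choices.

```python
import numpy as np

def som_step(weights, x, alpha=0.3, sigma=1.0):
    """Pull the best-matching unit (and its grid neighbors) toward the input x.
    No teacher signal is used anywhere: learning is driven by the data alone."""
    dists = np.linalg.norm(weights - x, axis=1)       # distance of every unit to x
    winner = int(np.argmin(dists))                    # best-matching unit
    for i in range(len(weights)):
        h = np.exp(-((i - winner) ** 2) / (2 * sigma ** 2))   # neighborhood on a 1-D grid
        weights[i] += alpha * h * (x - weights[i])
    return weights

rng = np.random.default_rng(0)
weights = rng.random((5, 2))              # 5 map units with 2-D weight vectors
for x in rng.random((100, 2)):            # unlabeled data
    weights = som_step(weights, x)
```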

20
Bayesian Belief Networks
Based on the joint probability concepts of Thomas Bayes, these
networks are used to obtain inferences based on the connections
that occur among related events. Here are the basic equations
(try to remember them):
p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
p(A ∩ B) = p(A) · p(B|A) = p(B) · p(A|B) = p(A,B)
p(¬A ∩ ¬B) = 1 - p(A ∪ B)
If p(A ∩ B) = p(A) · p(B), then A and B are independent events and
no inference can be obtained from having one event happen.
If p(A ∩ B) = 0, then A and B are mutually exclusive events, that
is, one cannot happen if the other is present.
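A quick numeric check of these identities on a small made-up joint distribution; the probabilities are illustrative.

```python
# Joint probabilities for two events A and B (made-up numbers that sum to 1)
p_A_and_B       = 0.12
p_A_and_notB    = 0.28
p_notA_and_B    = 0.18
p_notA_and_notB = 0.42

p_A = p_A_and_B + p_A_and_notB            # 0.40
p_B = p_A_and_B + p_notA_and_B            # 0.30
p_A_or_B = p_A + p_B - p_A_and_B          # union rule: 0.58

p_B_given_A = p_A_and_B / p_A             # 0.30
p_A_given_B = p_A_and_B / p_B             # 0.40

# p(A ∩ B) = p(A) p(B|A) = p(B) p(A|B)
assert abs(p_A * p_B_given_A - p_A_and_B) < 1e-12
assert abs(p_B * p_A_given_B - p_A_and_B) < 1e-12
# p(¬A ∩ ¬B) = 1 - p(A ∪ B)
assert abs(p_notA_and_notB - (1 - p_A_or_B)) < 1e-12
# Here p(A ∩ B) = 0.12 = p(A) p(B), so A and B happen to be independent
assert abs(p_A * p_B - p_A_and_B) < 1e-12
```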
21
Bayesian Belief Networks 2
An important extension is that
p(A,B,C) = p(A) · p(B|A) · p(C|A,B)
         = p(B) · p(A|B) · p(C|A,B)
         = p(A) · p(C|A) · p(B|A,C)
         = p(B) · p(C|B) · p(A|C,B)
         = p(C) · p(A|C) · p(B|A,C)
         = p(A ∩ B ∩ C)
For the general case: p(x1, ..., xn) = Π_i p(x_i | E), where E is
the required a priori evidence.
22
Bayesian Belief Networks 3
The key aspect is to arrange the events in such a fashion that we
obtain an adequate tree of the joint probabilities for the various
events, such that we can assume that
p(F,P,C) = p(C) · p(P|C) · p(F|C,P)
is computed only as
p(C) · p(P|C) · p(F|P)
And
p(E,P,C,S,F) = p(C) · p(P|C) · p(S|C) · p(E|C,P,S) · p(F|C,P,S,E)
is computed only as
p(C) · p(S|C) · p(P|C) · p(E|P,S) · p(F|P)
See Excel files
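A minimal sketch of how the reduced factorization is evaluated; the conditional probability tables and all numbers below are hypothetical placeholders, not taken from the slides or the Excel files.

```python
# Hypothetical conditional probability tables (all numbers are made up).
# Each entry gives the probability that the child event is True given its parents.
p_C = 0.3                                       # p(C)
p_P_given_C = {True: 0.8, False: 0.2}           # p(P | C)
p_S_given_C = {True: 0.1, False: 0.5}           # p(S | C)
p_E_given_PS = {(True, True): 0.95, (True, False): 0.7,
                (False, True): 0.6, (False, False): 0.05}   # p(E | P, S)
p_F_given_P = {True: 0.9, False: 0.1}           # p(F | P)

def pr(prob_true, value):
    """Probability of a single True/False outcome given p(event = True)."""
    return prob_true if value else 1.0 - prob_true

def joint(c, p, s, e, f):
    """p(E,P,C,S,F) computed only as p(C) p(S|C) p(P|C) p(E|P,S) p(F|P)."""
    return (pr(p_C, c) * pr(p_S_given_C[c], s) * pr(p_P_given_C[c], p)
            * pr(p_E_given_PS[(p, s)], e) * pr(p_F_given_P[p], f))

# Sanity check: the joint distribution over all 32 assignments sums to 1.
total = sum(joint(c, p, s, e, f)
            for c in (True, False) for p in (True, False)
            for s in (True, False) for e in (True, False) for f in (True, False))
assert abs(total - 1.0) < 1e-12
```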