Title: Spring 2006 Artificial Intelligence, COSC 40503, week 5
1 Spring 2006 Artificial Intelligence
COSC 40503, week 5
- Antonio Sanchez
- Texas Christian University
2 About Learning and Behavior
Herbert Simon (1916-2001)
- "Learning is any change in a system that produces a more or less permanent change in its capacity for adapting to its environment."
- "Human beings, viewed as behaving systems, are quite simple. The apparent complexity of our behavior over time is largely a reflection of the complexity of the environment in which we find ourselves."
3 Connectionism
- The concept of a neuron has invited many lines of research.
- Yet the power of living neurons lies in both:
  - Their huge number: 10^10 in humans, 10^4 in a small bug
  - Their connectivity: 10^5 connections each in humans
- It is therefore by the power of the ganglia (10^15 connections) that we present complex and intelligent behavior.
- Neurons can be modeled as digital or continuous systems.
4A simple Neuron Model
- Orthogonally fh 0
- Normality ff 1 execute fi
nfi/sqrt(nf.nf) - Soma ni,j ? ai,kdk,j
- for all k dendrites
- Linear Axon ai,j ni,j
- Non Linear Axon if ni,j gt Thresholdi,j
- then ai,j 1 else
ai,j 0 - Lineal compensation ?di,j ?ai,kai,j
0 lt ? lt 1 - k is entry axon
j exit axon - N pattern compensation ?di,j
??i,k,hai,j,h for all h patterns
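A minimal sketch of these equations in Python, assuming one neuron with three dendrites; the activations, weights, threshold, and learning rate are made-up illustrative values, not values from the slide.

    # Minimal sketch of the simple neuron model (illustrative values only).
    def soma(activations, weights):
        # n = sum over all dendrites k of a(k) * d(k)
        return sum(a * d for a, d in zip(activations, weights))

    def linear_axon(n):
        return n

    def nonlinear_axon(n, threshold):
        # step axon: fire (1) only when the soma exceeds the threshold
        return 1 if n > threshold else 0

    def linear_compensation(weights, activations, out, alpha=0.1):
        # delta d(k) = alpha * a(k) * a(out), with 0 < alpha < 1
        return [d + alpha * a * out for d, a in zip(weights, activations)]

    inputs = [1, 0, 1]           # activations arriving on three dendrites (assumed)
    weights = [0.4, -0.2, 0.7]   # dendritic weights (assumed)
    n = soma(inputs, weights)
    out = nonlinear_axon(n, threshold=1.0)
    weights = linear_compensation(weights, inputs, out)
    print(n, out, weights)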
5 A Linear Neuronal Network
It can learn a linear function, but not a non-linear one such as the Exclusive OR.
6 A Hidden Layer Neuronal Network
Exclusive OR
[Diagram: two inputs feeding a hidden layer of threshold units (Th = 1) and an output unit computing the Exclusive OR]
Excel file (an illustrative code sketch follows the source below)
Source: Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 318-362). Cambridge, MA: MIT Press.
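The Excel file itself is not reproduced here. As an illustration only, the sketch below builds a two-input XOR network with a small hidden layer of step-threshold units; the weights and thresholds are a common textbook choice and are not necessarily those shown in the slide's diagram.

    # Illustrative XOR network: 2 inputs, 2 hidden threshold units, 1 output.
    # Weights and thresholds are a standard textbook choice (assumed).
    def step(n, threshold):
        return 1 if n > threshold else 0

    def xor_net(x1, x2):
        h_or  = step(1 * x1 + 1 * x2, 0.5)      # hidden unit computing OR
        h_and = step(1 * x1 + 1 * x2, 1.5)      # hidden unit computing AND
        return step(1 * h_or - 1 * h_and, 0.5)  # output: OR and not AND = XOR

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_net(x1, x2))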
7 Credit Assignment: using the Gradient Descent method to change the dendritic weights
- Change the firing function of the neurons from a step function to a continuous one with similar behavior, e.g. a sigmoid axon: a(j) = 1 / (1 + e^(-n(j)))
- Calculate the derivative of the sigmoid: da(j)/dn(j) = a(j) (1 - a(j))
- Determine the output error: e(output) = TrueValue - a(j)
- Determine the axon error: ea(j) = a(j) (1 - a(j)) e(output)
- Minimize the error E by gradient descent: change each weight by an amount proportional to the partial derivative, delta d = alpha ea(j) a(h)
If the slope is negative, increase n(j); if the slope is positive, decrease n(j). Local minima of the error are the places where the derivative equals zero.
(A small code sketch of this update follows below.)
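A minimal sketch of the sigmoid axon and the output-weight update above for a single output neuron; the learning rate, starting weight, and incoming activation are assumed illustrative values.

    import math

    def sigmoid(n):
        # continuous axon: a(j) = 1 / (1 + e^(-n(j)))
        return 1.0 / (1.0 + math.exp(-n))

    def weight_change(target, a_j, a_h, alpha=0.5):
        # axon error: ea(j) = a(j) (1 - a(j)) (target - a(j))
        ea_j = a_j * (1.0 - a_j) * (target - a_j)
        # gradient-descent step: delta d = alpha * ea(j) * a(h)
        return alpha * ea_j * a_h

    a_h = 0.8                 # activation arriving from unit h (assumed)
    d = 0.3                   # current dendritic weight (assumed)
    a_j = sigmoid(d * a_h)    # output of the sigmoid axon
    d += weight_change(target=1.0, a_j=a_j, a_h=a_h)
    print(a_j, d)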
8 Faster Convergence: the Momentum rule
- Add a fraction (a momentum term, beta) of the last change to the current change:
delta d(t) = alpha ea(j) a(h) + beta delta d(t-1)
Study and run the Excel file. (A small code sketch follows below.)
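A short sketch of the momentum rule above; the axon errors, activation, and the alpha and beta values are made up for illustration.

    def momentum_update(ea_j, a_h, prev_delta, alpha=0.5, beta=0.9):
        # delta d(t) = alpha * ea(j) * a(h) + beta * delta d(t-1)
        return alpha * ea_j * a_h + beta * prev_delta

    delta = 0.0
    for ea_j in (0.12, 0.10, 0.08):   # assumed axon errors over three steps
        delta = momentum_update(ea_j, a_h=0.8, prev_delta=delta)
        print(delta)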
9 Extended Hidden-Layer Neuronal Network
[Diagram: inputs, hidden layer, outputs]
10 A Lesson from Mother Nature: Using the Scientific Method - Observation
11 Hypothesis
"When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." - Donald O. Hebb, 1949
12 Synthesis and Validation
The connectivity of a brain is a product of its interaction with its environment.
[Diagram: neural connectivity under high interaction vs. low interaction]
13 Argumentation
- In any case, the connectivity is due to at least the following three factors:
  - Time: born, child, young, adult
  - Interaction: low, medium, high
  - Activity of the brain: low, medium, high
14 How about storing Information?
2-bit net for ...
[Diagram: threshold units with Th = 1, 0, 1 and connection values of -1]
15 How about storing Knowledge?
2-bit net for > (greater than)
[Diagram: threshold units with Th = 1, 1, 0 and connection values of 1, 1, -1]
(A small code sketch of the idea follows below.)
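The diagram's exact wiring is not recoverable from the text. As a hedged illustration of storing the relation "greater than" in a net, a single threshold unit with weights +1 and -1 and threshold 0 already encodes a > b for one-bit inputs; the slide's 2-bit version presumably uses more units.

    # Illustrative only: one threshold unit encoding the relation a > b for single bits.
    def greater_than(a, b):
        n = 1 * a + (-1) * b       # weights +1 and -1 (assumed)
        return 1 if n > 0 else 0   # threshold 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, greater_than(a, b))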
16 Digital Neural Network requirements
- Extrapolating the previous data to gigabytes of knowledge and information, we would require an enormous number of cells and connections, something like 10^11 for a full gigabyte of information and knowledge.
- However, we must take into account three important facts for the case of natural neurons:
  - They are not binary digital, but continuous
  - There is a lot of implicit coding in the ganglia
  - They do not store all the information, but only important patterns of it (a.k.a. knowledge)
17 Some important knowledge on Artificial Neural Networks
- They are slow to train, but once they learn they perform beautifully
- Treat the threshold as a negative dendrite attached to an axon with a value of 1
- To speed up convergence, try random seeds for the initial weights
- If they do not converge, use another set of random weights
- To reduce the size of the network, use binary coding for both the inputs and the outputs (see the sketch after this list)
- Use any other deterministic filters you deem necessary, such as the mean and standard deviation
- Do not overload the network with more patterns than a maximum of 20 to 25 percent of the number of neurons
- A single hidden layer is enough; it should comprise about half the number of input cells and/or twice the number of output cells
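To make the binary-coding tip concrete, here is a small sketch (with assumed category labels) that packs a categorical input into bits, so that n categories need only about log2(n) input neurons instead of n.

    # Illustrative: encode a category index as binary inputs for the network.
    def binary_code(index, bits):
        return [(index >> k) & 1 for k in reversed(range(bits))]

    categories = ["red", "green", "blue", "yellow", "purple"]   # assumed labels
    bits = (len(categories) - 1).bit_length()                   # 3 bits cover 5 categories
    for i, name in enumerate(categories):
        print(name, binary_code(i, bits))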
18 Linear Recurrent Networks
- Associative Memory
- Hopfield Network (see the sketch below)
Excel files
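The Excel files are not reproduced here. As a hedged sketch, the following builds a tiny Hopfield-style associative memory with Hebbian (outer-product) weights over +1/-1 units; the stored patterns and the noisy probe are made up for illustration.

    # Minimal Hopfield-style associative memory (illustrative patterns).
    def train(patterns):
        n = len(patterns[0])
        w = [[0.0] * n for _ in range(n)]
        for p in patterns:
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w[i][j] += p[i] * p[j]   # Hebbian outer-product rule
        return w

    def recall(w, probe, passes=5):
        s = list(probe)
        for _ in range(passes):
            for i in range(len(s)):              # asynchronous update in index order
                net = sum(w[i][j] * s[j] for j in range(len(s)))
                s[i] = 1 if net >= 0 else -1
        return s

    patterns = [[1, 1, 1, -1, -1, -1], [1, -1, 1, -1, 1, -1]]   # stored patterns (assumed)
    w = train(patterns)
    print(recall(w, [1, 1, -1, -1, -1, -1]))     # probe: first pattern with one bit flipped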
19 Unsupervised Learning
So far we have talked about training with a purpose, i.e. showing some examples and giving some type of compensation; this is called supervised learning. Yet following Donald O. Hebb's hypothesis, we do not even need to supervise the learning process: it will take place anyway!
- Examples
  - Kohonen Maps (a small sketch follows below)
  - Bayesian Nets
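As a hedged illustration of learning without a teacher, the sketch below runs a tiny Kohonen-style winner-take-all update: the prototype closest to each input simply moves toward it, with no target value ever supplied. The data points and the learning rate are made up.

    import random

    # Tiny winner-take-all (Kohonen-style) sketch: no teacher signal is used.
    random.seed(0)
    units = [[random.random(), random.random()] for _ in range(4)]   # 4 prototype vectors

    def winner(x):
        return min(range(len(units)),
                   key=lambda i: sum((units[i][k] - x[k]) ** 2 for k in range(2)))

    data = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]]       # assumed inputs
    for epoch in range(20):
        for x in data:
            w = winner(x)                        # unit closest to the input
            for k in range(2):                   # move the winner toward the input
                units[w][k] += 0.2 * (x[k] - units[w][k])

    print(units)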
20 Bayesian Belief Networks
Based on the joint probability concepts of Thomas Bayes, these networks are used to draw inferences from the connections that occur between related events. Here are the basic equations (try to remember them):
p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
p(A ∩ B) = p(A) p(B|A) = p(B) p(A|B) = p(A,B)
p(¬A ∩ ¬B) = 1 - p(A ∪ B)
If p(A ∩ B) = p(A) p(B), then A and B are independent events and no inference can be obtained from having one event happen.
If p(A ∩ B) = 0, then A and B are mutually exclusive events, that is, one cannot happen if the other is present.
(A small numeric check follows below.)
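A quick numeric check of these identities, using assumed probability values.

    # Assumed numbers for a quick check of the joint-probability identities.
    pA, pB = 0.4, 0.5
    pA_and_B = 0.2                          # chosen so that p(A ∩ B) = p(A) p(B)

    pA_or_B = pA + pB - pA_and_B            # p(A ∪ B) = 0.7
    pB_given_A = pA_and_B / pA              # p(B|A)
    pA_given_B = pA_and_B / pB              # p(A|B)

    print(pA * pB_given_A, pB * pA_given_B, pA_and_B)    # all equal 0.2
    print(1 - pA_or_B)                      # p(¬A ∩ ¬B) = 0.3
    print(abs(pA_and_B - pA * pB) < 1e-12)  # True: A and B are independent here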
21 Bayesian Belief Networks 2
An important extension is that
p(A,B,C) = p(A) p(B|A) p(C|A,B)
         = p(B) p(A|B) p(C|A,B)
         = p(A) p(C|A) p(B|A,C)
         = p(B) p(C|B) p(A|C,B)
         = p(C) p(A|C) p(B|A,C)
         = p(A ∩ B ∩ C)
For the general case, p(x1, ..., xn) = Π p(xi | E) over all i, where E is the required a priori evidence.
22 Bayesian Belief Networks 3
The key aspect is to arrange the events in such a fashion that we obtain an adequate tree of the joint probabilities for the various events, such that we can assume that
p(F,P,C) = p(C) p(P|C) p(F|C,P) is computed only as p(C) p(P|C) p(F|P)
and
p(E,P,C,S,F) = p(C) p(P|C) p(S|C) p(E|C,P,S) p(F|C,P,S,E) is computed only as p(C) p(S|C) p(P|C) p(E|P,S) p(F|P)
See Excel files. (A small sketch of the first factorization follows below.)
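The Excel files are not reproduced here. As a hedged sketch with made-up conditional probability values, the code below evaluates the reduced factorization p(F,P,C) = p(C) p(P|C) p(F|P) for one assignment and checks that the joint still sums to 1.

    # Assumed conditional probability tables for three binary events C, P, F.
    p_C = {True: 0.3, False: 0.7}
    p_P_given_C = {True: {True: 0.8, False: 0.2},     # outer key is C, inner key is P
                   False: {True: 0.1, False: 0.9}}
    p_F_given_P = {True: {True: 0.6, False: 0.4},     # outer key is P, inner key is F
                   False: {True: 0.05, False: 0.95}}

    def joint(f, p, c):
        # reduced factorization: p(F,P,C) = p(C) * p(P|C) * p(F|P)
        return p_C[c] * p_P_given_C[c][p] * p_F_given_P[p][f]

    print(joint(True, True, True))                    # 0.3 * 0.8 * 0.6 = 0.144
    print(sum(joint(f, p, c) for f in (True, False)
                             for p in (True, False)
                             for c in (True, False))) # sums to 1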