
1
Soft Computing
  • Lecture 10
  • Boltzmann machine

2
Definition from Wikipedia
A Boltzmann machine is a type of stochastic
recurrent neural network originally invented by
Geoffrey Hinton and Terry Sejnowski. Boltzmann
machines can be seen as the stochastic,
generative counterpart of Hopfield nets. They
were an early example of neural networks capable
of forming internal representations. Because
they are very slow to simulate, they are not very
useful for most practical purposes. However,
they are theoretically intriguing due to the
biological plausibility of their training
algorithm.
3
Definition of BM (2)
  • A Boltzmann machine, like a Hopfield net, is a network of binary
    units with an "energy" defined for the network. Unlike Hopfield
    nets, though, Boltzmann machines only ever have units that take
    values of 1 or 0.
  • The global energy, E, in a Boltzmann machine is identical to that
    of a Hopfield network, that is

        E = -\sum_{i<j} w_{ij} s_i s_j + \sum_i \theta_i s_i

  • Where
  • w_{ij} is the connection weight from unit j to unit i.
  • s_i is the state (1 or 0) of unit i.
  • \theta_i is the threshold of unit i.
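As a minimal sketch of this formula (reusing the NET structure of the C simulator listed at the end of this lecture; GlobalEnergy itself is a hypothetical helper, not part of that code), the global energy could be computed as:

REAL GlobalEnergy(NET* Net) {
  INT  i, j;
  REAL E = 0;
  for (i = 0; i < Net->Units; i++) {
    for (j = 0; j < i; j++)                    /* each pair (i,j) counted once */
      E -= Net->Weight[i][j] * Net->Output[i] * Net->Output[j];
    E += Net->Threshold[i] * Net->Output[i];   /* the +theta_i * s_i term */
  }
  return E;
}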

4
Definition of BM (3)
Thus, the difference in the global energy that
results from a single unit i being 0 or 1,
written \Delta E_i, is given by

    \Delta E_i = \sum_j w_{ij} s_j - \theta_i

A Boltzmann machine is made up of stochastic
units. The probability, p_i, of the i-th unit being
on is given by

    p_i = \frac{1}{1 + e^{-\Delta E_i / T}}

(The scalar T is referred to as the
"temperature" of the system.)
5
Definition of BM (4)
Notice that temperature T plays a crucial role in
the equation, and that in the course of running
the network, the value of T will start high and
gradually 'cool down' to a lower value.
This is a continuous function that transforms any
input - from minus infinity to plus infinity - into a real
number in the interval (0, 1). This is the
logistic function, which has a characteristic
sigmoid shape.
So when Net = 0, e^{-Net} = 1, because any number
raised to the power 0 is 1. This is true for all
temperatures. So prob(A = 1) = 1/(1 + 1) = 1/2 always;
i.e. if the net input is 0, the unit is as likely to fire as
not.
6
Definition of BM (5)
  • With very low temperatures, e.g. 0.001, if the unit
    gets even a little positive activation, the
    probability that it will fire goes to 1;
    conversely, if the net input goes negative, the
    probability goes to 0. At very low temperatures - as T
    approaches 0 - the Boltzmann machine becomes deterministic.
    The higher the temperature, the more it diverges from this
    deterministic behaviour, as the sketch below illustrates.
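A minimal, self-contained sketch of this behaviour (plain C; the
logistic helper here is illustrative and not part of the simulator at
the end of the lecture):

#include <math.h>
#include <stdio.h>

/* Temperature-scaled logistic: probability that a unit fires. */
double logistic(double net, double T) {
  return 1.0 / (1.0 + exp(-net / T));
}

int main(void) {
  printf("%f\n", logistic(1.0, 0.001));  /* ~1.0: low T, positive input  */
  printf("%f\n", logistic(-1.0, 0.001)); /* ~0.0: low T, negative input  */
  printf("%f\n", logistic(0.0, 1.0));    /* 0.5 at zero net input, any T */
  printf("%f\n", logistic(1.0, 10.0));   /* ~0.52: high T, nearly random */
  return 0;
}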

7
Definition of BM (6)
Units are divided into "visible" units, V, and
"hidden" units, H. The visible units are those
which receive information from the
"environment", i.e. those units that receive
binary state vectors for training.
The connections in a Boltzmann machine have three
restrictions on them:

    w_{ii} = 0 \quad \forall i            (No unit has a connection with itself)
    w_{ij} = w_{ji} \quad \forall i, j    (All connections are symmetric)
    w_{ij} = 0 \quad \forall i, j \in V   (Visible units have no connections between them)
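As an illustrative consistency check for the first two restrictions
(assuming the simulator's NET layout; CheckConstraints is hypothetical,
and the visible/visible restriction is omitted because the simulator
below does not label visible units):

#include <assert.h>

void CheckConstraints(NET* Net) {
  INT i, j;
  for (i = 0; i < Net->Units; i++) {
    assert(Net->Weight[i][i] == 0);                   /* no self-connections */
    for (j = 0; j < Net->Units; j++)
      assert(Net->Weight[i][j] == Net->Weight[j][i]); /* symmetric weights */
  }
}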
8
Structure of BM
9
Alternative structure of BM
10
Training of BM
Boltzmann machines can be viewed as a type of
maximum likelihood model, i.e. training involves
modifying the parameters (weights) in the
network to maximize the probability of the
network producing the data as it was seen in the
training set. In other words, the network must
successfully model the probabilities of the data
in the environment.
There are two phases to Boltzmann machine
training. One is the "positive" phase, where the
visible units' states are clamped to a particular
binary state vector from the training set. The
other is the "negative" phase, where the network
is allowed to run freely, i.e. no units have
their state determined by external data. A
vector over the visible units is denoted V_\alpha and a
vector over the hidden units is denoted H_\beta.
The probabilities P^+(S) and P^-(S) represent the
probability of a given state, S, in the positive
and negative phases respectively. Note that
this means that P^+(V_\alpha) is determined by the
environment for every V_\alpha, because the visible
units are set by the environment in the positive
phase.
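A rough sketch of the two phases in code, reusing PropagateUnit from
the simulator at the end of the lecture (everything else here is an
assumption: the Visible index array, NumVisible count, and IsHidden
test are hypothetical fields and helpers):

/* Positive phase: clamp visible units to a training vector,
   then let only the hidden units run. */
void PositivePhaseStep(NET* Net, INT* TrainingVector) {
  INT k, i;
  for (k = 0; k < Net->NumVisible; k++)
    Net->Output[Net->Visible[k]] = TrainingVector[k];  /* clamp */
  for (i = 0; i < Net->Units; i++)
    if (IsHidden(Net, i))
      PropagateUnit(Net, i);     /* hidden units updated stochastically */
}

/* Negative phase: no clamping, every unit runs freely. */
void NegativePhaseStep(NET* Net) {
  INT i;
  for (i = 0; i < Net->Units; i++)
    PropagateUnit(Net, i);
}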
11
Training of BM (2)
Boltzmann machines are trained using a gradient
descent algorithm, so a given weight, w_{ij}, is
changed by subtracting the partial derivative of
a cost function with respect to that weight. The
cost function used for Boltzmann machines, G, is
given as

    G = \sum_\alpha P^+(V_\alpha) \ln \frac{P^+(V_\alpha)}{P^-(V_\alpha)}

This means that the cost function is lowest when
the probability of a vector in the negative
phase is equal to the probability of the
same vector in the positive phase. It also
ensures that the most probable vectors in the
data have the greatest effect on the cost.
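A small sketch of G as code (illustrative only: the distributions
Pplus/Pminus over the nStates visible vectors are assumed to have been
estimated elsewhere, with Pminus[a] > 0 wherever Pplus[a] > 0). G is
the Kullback-Leibler divergence between the clamped and free-running
distributions:

#include <math.h>

double CostG(const double* Pplus, const double* Pminus, int nStates) {
  double G = 0.0;
  int a;
  for (a = 0; a < nStates; a++)
    if (Pplus[a] > 0.0)
      G += Pplus[a] * log(Pplus[a] / Pminus[a]);
  return G;  /* G >= 0; G == 0 exactly when the distributions match */
}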
12
Training of BM (3)
This cost function would seem to be complicated
to perform gradient descent with. Suprisingly
though, the gradient with respect to a given
weight, wij, at thermal equilibrium is given by
the very simple equation
  • Where
  •      is the probability of units i and j
    both being on in the positive phase.
  •      is the probability of units i and j
    both being on in the negative phase.
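A minimal sketch of the resulting update rule (the co-activation
statistics pij_plus / pij_minus are assumed to have been measured at
thermal equilibrium in each phase; ETA, a learning-rate constant, is
hypothetical):

#define ETA 0.1   /* hypothetical learning rate */

void UpdateWeights(NET* Net, REAL** pij_plus, REAL** pij_minus) {
  INT i, j;
  for (i = 0; i < Net->Units; i++)
    for (j = 0; j < Net->Units; j++)
      if (i != j)   /* purely local: only statistics of units i and j */
        Net->Weight[i][j] += ETA * (pij_plus[i][j] - pij_minus[i][j]);
}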

13
Training of BM (4)
This result follows from the fact that at thermal
equilibrium the probability of any state when
the network is free-running is given by the
Boltzmann distribution (hence the name
"Boltzmann machine"). A state of thermal
equilibrium is crucial for this though, hence the
network must be brought to thermal equilibrium
before the probabilities of two units both being
on are calculated. Thermal equilibrium is
achieved with simulated annealing in a Boltzmann
machine. It is the necessity of simulated
annealing which can make training a Boltzmann
machine on a digital computer a very slow
process. However, this learning rule is fairly
biologically plausible because the only
information needed to change the weights is
provided by "local" information. That is, the
connection (or synapse biologically speaking)
does not need information about anything other
than the two neurons it connects. This is far
more biologically realistic than the information
needed by a connection in many other neural
network training algorithms, such as
backpropagation.
14
Training of BM (5)
Simulated annealing (SA) is a generic
probabilistic meta-algorithm for the global
optimization problem, namely locating a good
approximation to the global optimum of a given
function in a large search space. It was
independently invented by S. Kirkpatrick, C. D.
Gelatt and M. P. Vecchi in 1983, and by V. Cerny
in 1985. The name and inspiration come from
annealing in metallurgy, a technique involving
heating and controlled cooling of a material to
increase the size of its crystals and reduce
their defects. The heat causes the atoms to
become unstuck from their initial positions (a
local minimum of the internal energy) and wander
randomly through states of higher energy; the
slow cooling gives them more chances of finding
configurations with lower internal energy than
the initial one.
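The same idea as a generic skeleton (illustrative only: Energy,
Propose and Undo are hypothetical problem-specific callbacks, and the
constants are arbitrary; the actual TSP annealing loop used by the
simulator appears at the end of this lecture):

#include <math.h>
#include <stdlib.h>

double Energy(double state[], int n);   /* hypothetical: cost of a state   */
void   Propose(double state[], int n);  /* hypothetical: random move       */
void   Undo(double state[], int n);     /* hypothetical: revert last move  */

double AnnealSketch(double state[], int n) {
  double T = 100.0, E = Energy(state, n), Enew;
  while (T > 0.01) {
    Propose(state, n);                  /* random neighbouring state */
    Enew = Energy(state, n);
    /* Metropolis rule: always accept downhill moves; accept uphill
       moves with probability exp(-(Enew - E) / T). */
    if (Enew <= E || exp(-(Enew - E) / T) > (double) rand() / RAND_MAX)
      E = Enew;                         /* accept */
    else
      Undo(state, n);                   /* reject: restore old state */
    T *= 0.99;                          /* slow cooling */
  }
  return E;
}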
15
Training of BM (6)
16
(No Transcript)
17
Similarities and differences between the BM and the
Hopfield model
18
Sometimes Boltzmann machines are used in
combination with the Hopfield model or the perceptron,
as a device for finding the global minimum of the energy
function or the error function, respectively. In the first
case, during operation (recall), the state of a neuron
is changed stochastically according to the temperature T, and if
the energy function decreases, the change is
accepted and the process continues. In the perceptron,
during learning, a weight is changed and the
change is accepted or rejected according
to the resulting value of the error function. This process
of stochastic changes may be executed interleaved with the
usual operation or learning process, or after it,
to improve the result.
19
(No Transcript)
20
/******************************************************************
 * Boltzmann Machine Simulator
 *
 * Network:     Boltzmann Machine with Simulated Annealing
 * Application: Optimization - Traveling Salesman Problem
 *
 * (Supporting definitions - NET, INT, REAL, NUM_CITIES, PI, sqr,
 * Gamma, Distance, the output file f, and the random helpers -
 * live in the full simulator source and are not shown here.)
 ******************************************************************/

void InitializeApplication(NET* Net) {
  INT  n1, n2;
  REAL x1, x2, y1, y2;
  REAL Alpha1, Alpha2;

  Gamma = 7;
  /* Place the cities evenly on the unit circle and precompute the
     distance between every pair of cities. */
  for (n1 = 0; n1 < NUM_CITIES; n1++) {
    for (n2 = 0; n2 < NUM_CITIES; n2++) {
      Alpha1 = ((REAL) n1 / NUM_CITIES) * 2 * PI;
      Alpha2 = ((REAL) n2 / NUM_CITIES) * 2 * PI;
      x1 = cos(Alpha1);  y1 = sin(Alpha1);
      x2 = cos(Alpha2);  y2 = sin(Alpha2);
      Distance[n1][n2] = sqrt(sqr(x1 - x2) + sqr(y1 - y2));
    }
  }
  f = fopen("BOLTZMAN.txt", "w");
  fprintf(f, "Temperature    Valid    Length    Tour\n\n");
}
21
void CalculateWeights(NET* Net) {
  INT  n1, n2, n3, n4;
  INT  i, j;
  INT  Pred_n3, Succ_n3;
  REAL Weight;

  /* Unit i = (n1,n2) means "tour position n1 is occupied by city n2".
     The weights encode the TSP constraints and the tour length. */
  for (n1 = 0; n1 < NUM_CITIES; n1++) {
    for (n2 = 0; n2 < NUM_CITIES; n2++) {
      i = n1 * NUM_CITIES + n2;
      for (n3 = 0; n3 < NUM_CITIES; n3++) {
        for (n4 = 0; n4 < NUM_CITIES; n4++) {
          j = n3 * NUM_CITIES + n4;
          Weight = 0;
          if (i != j) {
            Pred_n3 = (n3 == 0 ? NUM_CITIES - 1 : n3 - 1);
            Succ_n3 = (n3 == NUM_CITIES - 1 ? 0 : n3 + 1);
            if ((n1 == n3) || (n2 == n4))
              Weight = -Gamma;              /* same position or same city */
            else if ((n1 == Pred_n3) || (n1 == Succ_n3))
              Weight = -Distance[n2][n4];   /* adjacent positions: tour leg */
          }
          Net->Weight[i][j] = Weight;
        }
      }
      Net->Threshold[i] = -Gamma / 2;
    }
  }
}
22
void PropagateUnit(NET* Net, INT i) {
  INT  j;
  REAL Sum, Probability;

  /* Net input of unit i minus its threshold ... */
  Sum = 0;
  for (j = 0; j < Net->Units; j++)
    Sum += Net->Weight[i][j] * Net->Output[j];
  Sum -= Net->Threshold[i];
  /* ... then fire stochastically with the temperature-scaled logistic. */
  Probability = 1 / (1 + exp(-Sum / Net->Temperature));
  if (RandomEqualREAL(0, 1) < Probability)
    Net->Output[i] = TRUE;
  else
    Net->Output[i] = FALSE;
}
23
void BringToThermalEquilibrium(NET* Net) {
  INT n, i;

  for (i = 0; i < Net->Units; i++) {
    Net->On[i]  = 0;
    Net->Off[i] = 0;
  }
  /* Let the network settle at the current temperature ... */
  for (n = 0; n < 1000 * Net->Units; n++)
    PropagateUnit(Net, i = RandomEqualINT(0, Net->Units - 1));
  /* ... then gather on/off statistics for every unit. */
  for (n = 0; n < 100 * Net->Units; n++) {
    PropagateUnit(Net, i = RandomEqualINT(0, Net->Units - 1));
    if (Net->Output[i])
      Net->On[i]++;
    else
      Net->Off[i]++;
  }
  /* Freeze each unit into its more probable state. */
  for (i = 0; i < Net->Units; i++)
    Net->Output[i] = Net->On[i] > Net->Off[i];
}
24
void Anneal(NET* Net) {
  Net->Temperature = 100;
  do {
    BringToThermalEquilibrium(Net);
    WriteTour(Net);
    Net->Temperature *= 0.99;   /* slow cooling schedule */
  } while (!ValidTour(Net));
}

void main() {
  NET Net;
  InitializeRandoms();
  GenerateNetwork(&Net);
  InitializeApplication(&Net);
  CalculateWeights(&Net);
  SetRandom(&Net);
  Anneal(&Net);
  FinalizeApplication(&Net);
}