Title: Building NeuroSearch
Building NeuroSearch: Intelligent Evolutionary Search Algorithm for Peer-to-Peer Environment
Master's Thesis by Joni Töyrylä
3.9.2004
- Mikko Vapa, researcher student
- InBCT 3.2 Cheese Factory / P2P Communication - Agora Center
- http://tisu.it.jyu.fi/cheesefactory
Contents
- Resource Discovery Problem
- Related Work
- Peer-to-Peer Network
- Neural Networks
- Evolutionary Computing
- NeuroSearch
- Research Environment
- Research Cases
  - Fitness
  - Population
  - Inputs
  - Resources
  - Queriers
  - Brain Size
- Summary and Future
Resource Discovery Problem
- In the peer-to-peer (P2P) resource discovery problem, a P2P node decides based on local knowledge which neighbors (if any) would be the best targets for a query to find the needed resource
- A good solution locates a predetermined number of resources using a minimal number of packets
NeuroSearch
- The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:
  - a neural network for deciding whether or not to pass the query further down a link (a minimal sketch of this per-link decision follows the figure below)
  - evolution for breeding and finding the best neural network in a large class of local search algorithms
[Figure: a query arrives at a node, and for each neighbor node the algorithm decides whether to forward the query]
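As a rough illustration, here is a minimal sketch of the per-link forwarding decision in Python. Node, forward_query and decide are hypothetical names, not the thesis implementation; decide stands in for the trained neural network, and the input vector is reduced for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Barebones P2P node for illustration."""
    name: str
    neighbors: list = field(default_factory=list)

def forward_query(node: Node, hops: int, decide) -> list:
    """Return the neighbors that `decide` chooses to forward the query to.

    `decide` stands in for the trained neural network: it maps one
    link's input vector to a forward / do-not-forward decision.
    """
    targets = []
    for neighbor in node.neighbors:
        # Reduced input vector: bias, hops, own and target's neighbor counts
        inputs = [1.0, float(hops), float(len(node.neighbors)),
                  float(len(neighbor.neighbors))]
        if decide(inputs):
            targets.append(neighbor)
    return targets

# Toy decision rule standing in for the neural network: forward under 4 hops
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors, b.neighbors, c.neighbors = [b, c], [a], [a]
print([n.name for n in forward_query(a, hops=0, decide=lambda x: x[1] < 4)])
# -> ['b', 'c']
```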
NeuroSearch's Inputs
- The internal structure of the NeuroSearch algorithm
- Multiple layers enable the algorithm to express non-linear behavior
- With enough neurons the algorithm can universally approximate any decision function (a sketch of such a layered decision follows below)
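To make the layered structure concrete, here is a small forward-pass sketch. The layer sizes, the sigmoid activations, and the 0.5 threshold are assumptions; the slides only state that the network is layered and that the output decides forwarding:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mlp_decision(inputs, w1, w2, w3) -> bool:
    """Forward pass through two hidden layers and a thresholded output.

    w1 and w2 hold one weight row per hidden neuron; w3 is the weight
    row of the single output neuron.
    """
    h1 = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    h2 = [sigmoid(sum(w * x for w, x in zip(row, h1))) for row in w2]
    out = sigmoid(sum(w * x for w, x in zip(w3, h2)))
    return out > 0.5  # forward the query only if the output neuron fires

# Example shapes: 7 inputs -> 2 hidden -> 2 hidden -> 1 output
w1 = [[0.1] * 7, [-0.2] * 7]
w2 = [[0.3, -0.1], [0.2, 0.4]]
w3 = [0.5, -0.6]
print(mlp_decision([1.0, 2.0, 4.0, 3.0, 0.5, 0.0, 0.0], w1, w2, w3))
```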
NeuroSearch's Inputs
- Bias is always 1 and provides a means for a neuron to produce non-zero output with zero inputs
- Hops is the number of links the message has travelled so far
- Neighbors (also known as currentNeighbors or MyNeighbors) is the number of neighbor nodes this node has
- Target's neighbors (also known as toNeighbors) is the number of neighbor nodes the message's target has
- Neighbor rank (also known as NeighborsOrder) tells the target's neighbor amount relative to the current node's other neighbors
- Sent is a flag telling whether this message has already been forwarded to the target node by this node
- Received (also known as currentVisited) is a flag describing whether the current node has received this message earlier (the sketch below assembles these seven inputs into one vector)
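A small sketch assembling the seven inputs; build_inputs is a hypothetical helper, and any scaling or normalization the thesis applies to the raw values is not reproduced here:

```python
def build_inputs(hops, my_neighbors, to_neighbors, neighbor_rank,
                 sent, received):
    """Assemble the seven NeuroSearch inputs for one (node, target) link."""
    return [
        1.0,                        # Bias: always 1
        float(hops),                # Hops: links travelled so far
        float(my_neighbors),        # Neighbors of the current node
        float(to_neighbors),        # Target's neighbors
        float(neighbor_rank),       # Neighbor rank among this node's neighbors
        1.0 if sent else 0.0,       # Sent: already forwarded to the target?
        1.0 if received else 0.0,   # Received: message seen earlier?
    ]

print(build_inputs(hops=2, my_neighbors=4, to_neighbors=7,
                   neighbor_rank=1, sent=False, received=False))
```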
NeuroSearch's Training Program
- The neural network weights define how the neural network behaves, so they must be adjusted to the right values
- This is done using an iterative optimization process based on evolution and Gaussian mutation (a sketch of the loop follows the flowchart below)
[Flowchart: Define the network conditions -> Define the quality requirements for the algorithm -> Create candidate algorithms randomly -> Iterate thousands of generations (select the best ones for the next generation, breed a new population) -> Finally select the best algorithm for these conditions]
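A sketch of that loop, assuming Gaussian mutation as the only variation operator. The truncation selection, mutation strength, and generation count are assumptions; the population size of 24 and the loop's overall shape come from the slides:

```python
import random

def evolve(fitness, weight_count, population_size=24,
           generations=1000, sigma=0.1):
    """Evolve neural network weight vectors toward higher fitness."""
    # Create candidate algorithms randomly
    population = [[random.gauss(0.0, 1.0) for _ in range(weight_count)]
                  for _ in range(population_size)]
    for _ in range(generations):              # iterate thousands of generations
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]      # select the best ones
        children = [[w + random.gauss(0.0, sigma)    # Gaussian mutation
                     for w in random.choice(parents)]
                    for _ in range(population_size - len(parents))]
        population = parents + children              # breed a new population
    return max(population, key=fitness)       # finally select the best one
```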
Research Environment
- The peer-to-peer network being tested contained:
  - 100 power-law distributed P2P nodes with 394 links and 788 resources
  - Resources were distributed based on the number of connections a node has, meaning that high-connectivity nodes were more likely to answer queries
  - The topology was static, so nodes were not disappearing or moving
  - The querier and the queried resource were selected randomly, and 10 different queries were used in each generation (this was found to be enough to determine the overall performance of the neural network)
- Requirements for the fitness function (sketched below) were:
  - The algorithm should locate half of the available resources for every query (each obtained resource increased fitness by 50 points)
  - The algorithm should use as few packets as possible (each used packet decreased fitness by 1 point)
  - The algorithm should always stop (the stop limit for the number of packets was set to 300)
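A sketch of a fitness function meeting these requirements. Whether replies beyond the half-of-resources goal still earn points, and how the 300-packet stop limit is enforced, are assumptions here:

```python
def query_fitness(replies, packets, available):
    """Per-query fitness: +50 points per located resource (goal: half of
    the available ones, so extra replies are not rewarded here), -1 point
    per used packet. The 300-packet stop limit is assumed to be enforced
    by aborting the query, so `packets` stays bounded.
    """
    goal = available // 2
    return 50 * min(replies, goal) - packets

def generation_fitness(query_results, available):
    """Total fitness over one generation's queries (10 on the slide)."""
    return sum(query_fitness(replies, packets, available)
               for replies, packets in query_results)

# The "poor NeuroSearch" example on the Fitness slide sums to
# 50 * 239 - 1290 = 10660 over one generation's queries.
```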
Research Environment
Research Cases - Fitness
- The fitness value determines how good a neural network is compared to others
- Even the smallest and simplest neural networks manage to reach fitness values over 10000
- The fitness value for a poor NeuroSearch is calculated as follows:
  Fitness = 50 * replies - packets = 50 * 239 - 1290 = 10660
- Note: Because of a bug, the Steiner tree does not locate half of the replies and thus gets a lower fitness than HDS
Research Cases - Random Weights
- 10 million new neural networks were randomly generated (a sketch of this baseline follows below)
- It seems that fitness values over 16000 cannot be obtained purely by guessing, and therefore an optimization method is needed
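A sketch of that random baseline; the sampling distribution is an assumption, and fitness is whatever function evaluates a weight vector (such as generation_fitness above, wired to a simulation):

```python
import random

def random_search(fitness, weight_count, trials=10_000_000):
    """Sample weight vectors blindly and keep the best one. On the slide,
    10 million samples never exceeded a fitness of roughly 16000.
    """
    best, best_fit = None, float("-inf")
    for _ in range(trials):
        candidate = [random.gauss(0.0, 1.0) for _ in range(weight_count)]
        f = fitness(candidate)
        if f > best_fit:
            best, best_fit = candidate, f
    return best, best_fit
```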
Research Cases - Inputs
- Different inputs were tested individually and together to get a feeling for which inputs are important
- Using Hops we can, for example, design rules such as: "I have travelled 4 hops, I will not send further" (expressed as a single neuron in the sketch after this list)
- Further example rules:
  - "Target node contains 10 neighbors, I will send further"
  - "Target node contains the most neighbors compared to all my neighbors, I will not send further"
  - "I have 7 neighbors, I will send further"
  - "I have received this query earlier, I will not send further"
- The results indicate that using only one piece of topological information is more efficient than combining it with other topological information (the explanation for this behavior is still unclear)
- The results also indicate that using only one piece of query-related information is more efficient than combining it with other query-related information (the explanation for this behavior is also unclear)
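Such single-input rules map directly onto single threshold neurons. A sketch of the Hops rule with hand-picked (not evolved) weights over the Bias and Hops inputs:

```python
def hops_rule(bias, hops):
    """"I have travelled 4 hops, I will not send further" as one neuron:
    with weights (3.5, -1.0) the weighted sum stays positive while fewer
    than 4 hops have been travelled. Hand-picked weights for illustration.
    """
    return 3.5 * bias - 1.0 * hops > 0.0  # True = forward the query

print(hops_rule(1.0, 3))  # True: under 4 hops, keep forwarding
print(hops_rule(1.0, 4))  # False: 4 hops travelled, stop
```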
Research Cases - Resources
- The needed percentage of resources was varied and the results were compared to other local search algorithms (Highest Degree Search and Breadth-First Search) and to near-optimal search trees (Steiner)
- Note: The Breadth-First Search curve needs to be halved, because the percentage was calculated against half of the resources and not all available resources
Research Cases - Queriers
- The effect of lowering the number of queriers per generation used for calculating the fitness value of a neural network was examined
- It was found that the number of queriers can be dropped from 50 to 10 while still getting reliable fitness values, which speeds up the optimization process significantly
Research Cases - Brain Size
- The number of neurons on the first and second hidden layers was varied
- It was found that many different kinds of NeuroSearch algorithms exist
- Optimization of larger neural networks also takes more time
- There also exists an interesting breadth-first search vs. depth-first search dilemma, where:
  - smaller networks obtain the best fitness values with a breadth-first search strategy,
  - medium-sized networks obtain the best fitness values with a depth-first search strategy, and
  - large networks obtain the best fitness values with a breadth-first search strategy
- Overall it seems that the best fitness, 18091.0, can be obtained with a breadth-first strategy using 5 hops and a neuron size of 25-10 (25 neurons on the first hidden layer and 10 on the second)
- 20-10 had the greatest average hops value. What happens if the number of neurons on the 2nd hidden layer is increased? Will the average number of hops decrease?
- 25-10 had the greatest fitness value. Would more generations than 100,000 increase the fitness when the 1st hidden layer contains more than 25 neurons?
Summary and Future
- The main findings of the thesis were that:
  - A population size of 24 and a query amount of 10 are sufficient
  - An optimization algorithm needs to be used, because randomly guessing neural network weights does not give good results
  - Individual inputs give better results than combinations of two inputs (however, the best fitnesses can be obtained by using all 7 inputs)
  - By choosing a specific set of inputs, NeuroSearch may imitate any existing search algorithm or behave as a combination of any of those
  - The optimal algorithm (Steiner) has an efficiency of 99%, whereas the best known local search algorithm (HDS) achieves 33% and NeuroSearch 25%
  - The breadth-first search vs. depth-first search dilemma exists, but no good explanation can be given yet
Summary and Future
- In addition to the problems shown thus far, for the future work of NeuroSearch it is suggested that:
  - More inputs be designed that provide useful information, e.g., the number of received replies, the inputs used by the Highest Degree Search algorithm, and inputs that define how many forwarding decisions have already been made in the current decision round and how many are still left
  - A probability-based output could be tested instead of the threshold function (sketched after this list)
  - The correct neural network architecture and the size of the population could be dynamically adjusted during evolution to find an optimal structure more easily
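A sketch of the probability-based output idea; this is only the suggestion's shape, not a design tested in the thesis:

```python
import math
import random

def probabilistic_forward(activation: float) -> bool:
    """Instead of thresholding the output neuron, treat its squashed
    activation as a forwarding probability.
    """
    p = 1.0 / (1.0 + math.exp(-activation))  # squash to (0, 1)
    return random.random() < p               # forward with probability p
```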