Title: Building NeuroSearch
Building NeuroSearch: Intelligent Evolutionary Search Algorithm for Peer-to-Peer Environment
Master's Thesis by Joni Töyrylä
3.9.2004
- Mikko Vapa, researcher student
- InBCT 3.2 Cheese Factory / P2P Communication - Agora Center
- http://tisu.it.jyu.fi/cheesefactory
Contents
- Resource Discovery Problem
- Related Work
- Peer-to-Peer Network
- Neural Networks
- Evolutionary Computing
- NeuroSearch
- Research Environment
- Research Cases
  - Fitness
  - Population
  - Inputs
  - Resources
  - Queriers
  - Brain Size
- Summary and Future
Resource Discovery Problem
- In the peer-to-peer (P2P) resource discovery problem, a P2P node decides based on local knowledge which neighbors (if any) would be the best targets for a query to find the needed resource
- A good solution locates a predetermined number of resources using a minimal number of packets
NeuroSearch
- The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:
  - a neural network for deciding whether or not to pass the query further down a link (a minimal sketch of this per-link decision follows the figure below)
  - evolution for breeding and finding the best neural network in a large class of local search algorithms
[Figure: a query arrives at a node, and for each neighbor node the algorithm decides whether to forward the query]
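As a rough illustration, here is a minimal sketch of the per-link forwarding decision in Python. Node, forward_query and decide are hypothetical names, not the thesis implementation; decide stands in for the trained neural network, and the input vector is reduced for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Barebones P2P node for illustration."""
    name: str
    neighbors: list = field(default_factory=list)

def forward_query(node: Node, hops: int, decide) -> list:
    """Return the neighbors that `decide` chooses to forward the query to.

    `decide` stands in for the trained neural network: it maps one
    link's input vector to a forward / do-not-forward decision.
    """
    targets = []
    for neighbor in node.neighbors:
        # Reduced input vector: bias, hops, own and target's neighbor counts
        inputs = [1.0, float(hops), float(len(node.neighbors)),
                  float(len(neighbor.neighbors))]
        if decide(inputs):
            targets.append(neighbor)
    return targets

# Toy decision rule standing in for the neural network: forward under 4 hops
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors, b.neighbors, c.neighbors = [b, c], [a], [a]
print([n.name for n in forward_query(a, hops=0, decide=lambda x: x[1] < 4)])
# -> ['b', 'c']
```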
NeuroSearch's Inputs
- The internal structure of the NeuroSearch algorithm
- Multiple layers enable the algorithm to express non-linear behavior
- With enough neurons the algorithm can universally approximate any decision function (a sketch of such a layered decision follows below)
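To make the layered structure concrete, here is a small forward-pass sketch. The layer sizes, the sigmoid activations, and the 0.5 threshold are assumptions; the slides only state that the network is layered and that the output decides forwarding:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mlp_decision(inputs, w1, w2, w3) -> bool:
    """Forward pass through two hidden layers and a thresholded output.

    w1 and w2 hold one weight row per hidden neuron; w3 is the weight
    row of the single output neuron.
    """
    h1 = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    h2 = [sigmoid(sum(w * x for w, x in zip(row, h1))) for row in w2]
    out = sigmoid(sum(w * x for w, x in zip(w3, h2)))
    return out > 0.5  # forward the query only if the output neuron fires

# Example shapes: 7 inputs -> 2 hidden -> 2 hidden -> 1 output
w1 = [[0.1] * 7, [-0.2] * 7]
w2 = [[0.3, -0.1], [0.2, 0.4]]
w3 = [0.5, -0.6]
print(mlp_decision([1.0, 2.0, 4.0, 3.0, 0.5, 0.0, 0.0], w1, w2, w3))
```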
NeuroSearch's Inputs
- Bias is always 1 and provides a means for a neuron to produce non-zero output with zero inputs
- Hops is the number of links the message has travelled so far
- Neighbors (also known as currentNeighbors or MyNeighbors) is the number of neighbor nodes this node has
- Target's neighbors (also known as toNeighbors) is the number of neighbor nodes the message's target has
- Neighbor rank (also known as NeighborsOrder) tells the target's neighbor amount relative to the current node's other neighbors
- Sent is a flag telling whether this message has already been forwarded to the target node by this node
- Received (also known as currentVisited) is a flag describing whether the current node has received this message earlier (the sketch below assembles these seven inputs into one vector)
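A small sketch assembling the seven inputs; build_inputs is a hypothetical helper, and any scaling or normalization the thesis applies to the raw values is not reproduced here:

```python
def build_inputs(hops, my_neighbors, to_neighbors, neighbor_rank,
                 sent, received):
    """Assemble the seven NeuroSearch inputs for one (node, target) link."""
    return [
        1.0,                        # Bias: always 1
        float(hops),                # Hops: links travelled so far
        float(my_neighbors),        # Neighbors of the current node
        float(to_neighbors),        # Target's neighbors
        float(neighbor_rank),       # Neighbor rank among this node's neighbors
        1.0 if sent else 0.0,       # Sent: already forwarded to the target?
        1.0 if received else 0.0,   # Received: message seen earlier?
    ]

print(build_inputs(hops=2, my_neighbors=4, to_neighbors=7,
                   neighbor_rank=1, sent=False, received=False))
```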
NeuroSearch's Training Program
- The neural network weights define how the neural network behaves, so they must be adjusted to the right values
- This is done using an iterative optimization process based on evolution and Gaussian mutation (a sketch of the loop follows the flowchart below)
[Flowchart: Define the network conditions -> Define the quality requirements for the algorithm -> Create candidate algorithms randomly -> Iterate thousands of generations (select the best ones for the next generation, breed a new population) -> Finally select the best algorithm for these conditions]
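A sketch of that loop, assuming Gaussian mutation as the only variation operator. The truncation selection, mutation strength, and generation count are assumptions; the population size of 24 and the loop's overall shape come from the slides:

```python
import random

def evolve(fitness, weight_count, population_size=24,
           generations=1000, sigma=0.1):
    """Evolve neural network weight vectors toward higher fitness."""
    # Create candidate algorithms randomly
    population = [[random.gauss(0.0, 1.0) for _ in range(weight_count)]
                  for _ in range(population_size)]
    for _ in range(generations):              # iterate thousands of generations
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]      # select the best ones
        children = [[w + random.gauss(0.0, sigma)    # Gaussian mutation
                     for w in random.choice(parents)]
                    for _ in range(population_size - len(parents))]
        population = parents + children              # breed a new population
    return max(population, key=fitness)       # finally select the best one
```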
Research Environment
- The peer-to-peer network being tested contained:
  - 100 power-law distributed P2P nodes with 394 links and 788 resources
  - Resources were distributed based on the number of connections a node has, meaning that high-connectivity nodes were more likely to answer queries
  - The topology was static, so nodes were not disappearing or moving
  - The querier and the queried resource were selected randomly, and 10 different queries were used in each generation (this was found to be enough to determine the overall performance of the neural network)
- Requirements for the fitness function (sketched below) were:
  - The algorithm should locate half of the available resources for every query (each obtained resource increased fitness by 50 points)
  - The algorithm should use as few packets as possible (each used packet decreased fitness by 1 point)
  - The algorithm should always stop (the stop limit for the number of packets was set to 300)
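A sketch of a fitness function meeting these requirements. Whether replies beyond the half-of-resources goal still earn points, and how the 300-packet stop limit is enforced, are assumptions here:

```python
def query_fitness(replies, packets, available):
    """Per-query fitness: +50 points per located resource (goal: half of
    the available ones, so extra replies are not rewarded here), -1 point
    per used packet. The 300-packet stop limit is assumed to be enforced
    by aborting the query, so `packets` stays bounded.
    """
    goal = available // 2
    return 50 * min(replies, goal) - packets

def generation_fitness(query_results, available):
    """Total fitness over one generation's queries (10 on the slide)."""
    return sum(query_fitness(replies, packets, available)
               for replies, packets in query_results)

# The "poor NeuroSearch" example on the Fitness slide sums to
# 50 * 239 - 1290 = 10660 over one generation's queries.
```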
Research Environment
Research Cases - Fitness
- The fitness value determines how good a neural network is compared to others
- Even the smallest and simplest neural networks manage to reach fitness values over 10000
- The fitness value for a poor NeuroSearch is calculated as follows:
  Fitness = 50 * replies - packets = 50 * 239 - 1290 = 10660
- Note: Because of a bug, the Steiner tree does not locate half of the replies and thus gets a lower fitness than HDS
Research Cases - Random Weights
- 10 million new neural networks were randomly generated (a sketch of this baseline follows below)
- It seems that fitness values over 16000 cannot be obtained purely by guessing, and therefore an optimization method is needed
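A sketch of that random baseline; the sampling distribution is an assumption, and fitness is whatever function evaluates a weight vector (such as generation_fitness above, wired to a simulation):

```python
import random

def random_search(fitness, weight_count, trials=10_000_000):
    """Sample weight vectors blindly and keep the best one. On the slide,
    10 million samples never exceeded a fitness of roughly 16000.
    """
    best, best_fit = None, float("-inf")
    for _ in range(trials):
        candidate = [random.gauss(0.0, 1.0) for _ in range(weight_count)]
        f = fitness(candidate)
        if f > best_fit:
            best, best_fit = candidate, f
    return best, best_fit
```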
Research Cases - Inputs
- Different inputs were tested individually and together to get a feeling for which inputs are important
- Using Hops we can, for example, design rules such as: "I have travelled 4 hops, I will not send further" (expressed as a single neuron in the sketch after this list)
- Further example rules:
  - "Target node contains 10 neighbors, I will send further"
  - "Target node contains the most neighbors compared to all my neighbors, I will not send further"
  - "I have 7 neighbors, I will send further"
  - "I have received this query earlier, I will not send further"
- The results indicate that using only one piece of topological information is more efficient than combining it with other topological information (the explanation for this behavior is still unclear)
- The results also indicate that using only one piece of query-related information is more efficient than combining it with other query-related information (the explanation for this behavior is also unclear)
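Such single-input rules map directly onto single threshold neurons. A sketch of the Hops rule with hand-picked (not evolved) weights over the Bias and Hops inputs:

```python
def hops_rule(bias, hops):
    """"I have travelled 4 hops, I will not send further" as one neuron:
    with weights (3.5, -1.0) the weighted sum stays positive while fewer
    than 4 hops have been travelled. Hand-picked weights for illustration.
    """
    return 3.5 * bias - 1.0 * hops > 0.0  # True = forward the query

print(hops_rule(1.0, 3))  # True: under 4 hops, keep forwarding
print(hops_rule(1.0, 4))  # False: 4 hops travelled, stop
```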
Research Cases - Resources
- The needed percentage of resources was varied and the results were compared to other local search algorithms (Highest Degree Search and Breadth-First Search) and to near-optimal search trees (Steiner)
- Note: The Breadth-First Search curve needs to be halved, because the percentage was calculated against half of the resources and not all available resources
Research Cases - Queriers
- The effect of lowering the number of queriers per generation used for calculating the fitness value of a neural network was examined
- It was found that the number of queriers can be dropped from 50 to 10 while still getting reliable fitness values, which speeds up the optimization process significantly
Research Cases - Brain Size
- The number of neurons on the first and second hidden layers was varied
- It was found that many different kinds of NeuroSearch algorithms exist
- Optimization of larger neural networks also takes more time
- There also exists an interesting breadth-first search vs. depth-first search dilemma, where:
  - smaller networks obtain the best fitness values with a breadth-first search strategy,
  - medium-sized networks obtain the best fitness values with a depth-first search strategy, and
  - large networks obtain the best fitness values with a breadth-first search strategy
- Overall it seems that the best fitness, 18091.0, can be obtained with a breadth-first strategy using 5 hops and a neuron size of 25-10 (25 neurons on the first hidden layer and 10 on the second)
- 20-10 had the greatest average hops value. What happens if the number of neurons on the 2nd hidden layer is increased? Will the average number of hops decrease?
- 25-10 had the greatest fitness value. Would more generations than 100,000 increase the fitness when the 1st hidden layer contains more than 25 neurons?
Summary and Future
- The main findings of the thesis were that:
  - A population size of 24 and a query amount of 10 are sufficient
  - An optimization algorithm needs to be used, because randomly guessing neural network weights does not give good results
  - Individual inputs give better results than combinations of two inputs (however, the best fitnesses can be obtained by using all 7 inputs)
  - By choosing a specific set of inputs, NeuroSearch may imitate any existing search algorithm or behave as a combination of any of those
  - The optimal algorithm (Steiner) has an efficiency of 99%, whereas the best known local search algorithm (HDS) achieves 33% and NeuroSearch 25%
  - The breadth-first search vs. depth-first search dilemma exists, but no good explanation can be given yet
Summary and Future
- In addition to the problems shown thus far, for the future work of NeuroSearch it is suggested that:
  - More inputs be designed that provide useful information, e.g., the number of received replies, the inputs used by the Highest Degree Search algorithm, and inputs that define how many forwarding decisions have already been made in the current decision round and how many are still left
  - A probability-based output could be tested instead of the threshold function (sketched after this list)
  - The correct neural network architecture and the size of the population could be dynamically adjusted during evolution to find an optimal structure more easily
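A sketch of the probability-based output idea; this is only the suggestion's shape, not a design tested in the thesis:

```python
import math
import random

def probabilistic_forward(activation: float) -> bool:
    """Instead of thresholding the output neuron, treat its squashed
    activation as a forwarding probability.
    """
    p = 1.0 / (1.0 + math.exp(-activation))  # squash to (0, 1)
    return random.random() < p               # forward with probability p
```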