1Adaptation of Neural Nets
for Resource Discovery Problem in
Dynamic and Distributed P2P
Environment
- Yevgeniy Ivanchenko, University of Jyväskylä
- yeivanch_at_cc.jyu.fi
2OBJECTIVES (I)
- Since nothing is known about the decision mechanism of NeuroSearch, we need to look inside the algorithm to understand its behavior.
- Since nothing is known about the behavior of NeuroSearch in a dynamic environment, we need to study its behavior under conditions that approximate real-life situations.
3OBJECTIVES (II)
- To understand the behavior of NeuroSearch, data analysis techniques were used. The Self-Organizing Map (SOM) is a well-known tool for data mining tasks.
- A set of rules was obtained from the analysis of NeuroSearch. The rules were tested in a static environment. The question that arises here: is it possible to use an algorithm that exploits properties of a static environment in a dynamic scenario?
4OBJECTIVES (III)
- If we know the inner structure of the decision mechanism of NeuroSearch, we can determine the contribution of every input to a particular decision of the algorithm. This can be used, for example, to remove unnecessary input information.
- It can also help to evaluate the complexity and robustness of the algorithm.
5SOM (I)
- SOM is a neural network model that maps a high-dimensional space onto a low-dimensional space (usually two-dimensional).
- After SOM training, similar vectors from the input space are located near each other in the output space. This helps to investigate the properties of the resulting clusters and, consequently, the causes that produced these clusters on the output map.
6SOM (II)
- A SOM is usually a hexagonal or rectangular grid of neurons. In the figure, R1 and R2 denote different neighborhood sizes.
- During training, the size of the neighborhood is gradually decreased to allow more accurate adjustment of the neuron weights.
[Figure: grid of neurons with neighborhoods R1 and R2]
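The shrinking neighborhood described above can be sketched as follows. This is only an illustration, assuming a rectangular grid, a square neighborhood (Chebyshev distance on the grid) and a linear shrinking schedule; the function names and the schedule are assumptions and are not taken from the slides.

```python
def grid_neighborhood(rows, cols, center, radius):
    """Grid positions within `radius` of `center` on a rectangular grid.

    Distance is measured as Chebyshev distance (an assumption), so the
    neighborhood is a square of side 2 * radius + 1 clipped to the grid.
    """
    r0, c0 = center
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if max(abs(r - r0), abs(c - c0)) <= radius
    ]


def radius_schedule(r_start, r_end, n_steps):
    """Shrink the radius linearly from r_start to r_end (n_steps >= 2)."""
    step = (r_start - r_end) / (n_steps - 1)
    return [r_start - i * step for i in range(n_steps)]


# Two neighborhood sizes around the same neuron, as in the figure.
small = grid_neighborhood(5, 5, center=(2, 2), radius=1)
large = grid_neighborhood(5, 5, center=(2, 2), radius=2)
```

A larger radius early in training moves many neurons at once; a smaller radius later adjusts only the neurons closest to the winner.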
7SOM (III)
- In the figure one can see that the neurons covered by the neighborhood kernel function move closer to the input vector.
- The Best Matching Unit (BMU) is the neuron closest to the current input vector.
- The weights of the neurons are updated according to the kernel function and the distance to the BMU.
[Figure: neurons around the BMU moving toward the input vector]
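One training step of this kind can be sketched as below. The slides do not say which kernel function is used; a Gaussian kernel over grid distance is assumed here, and the dictionary layout of the weights, the function names and the parameters `learning_rate` and `sigma` are likewise assumptions made for illustration.

```python
import math


def find_bmu(weights, x):
    """Return the grid position of the neuron whose weights are closest to x.

    `weights` maps a grid position (row, col) to a weight vector (list).
    """
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def som_update(weights, x, learning_rate, sigma):
    """Move every neuron toward x, scaled by a Gaussian kernel around the BMU."""
    bmu = find_bmu(weights, x)
    for pos, w in weights.items():
        # Squared distance to the BMU measured on the grid, not in input space.
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        h = math.exp(-grid_d2 / (2 * sigma ** 2))  # assumed Gaussian kernel
        weights[pos] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

The BMU itself receives the full update (kernel value 1), while neurons farther away on the grid move less.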
8DATA ANALYSIS (I)
- NeuroSearch can be considered the main part of the information model of the system. A black-box method was used to build this system: we model the external behavior of the system without knowing the causes of any particular behavior.
- To investigate the decision mechanism of NeuroSearch, input-output pairs were analyzed using SOM.
9DATA ANALYSIS (II)
- To perform the analysis we used component planes and the U-matrix with the hit distribution on it. Component planes visualize the values of the individual vector components over the output map. The U-matrix is one of the possible ways to visualize the output map. The hits on the U-matrix correspond to the decisions of NeuroSearch.
- This approach allows us to investigate not only the contribution of each component to a particular decision, but also the correlations between components.
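The three visualizations named above can be computed from a trained map roughly as follows. This sketch assumes a rectangular grid, a 4-connected neighborhood and a dictionary of weights keyed by grid position; these choices and the function names are assumptions, and the actual analysis may have used a different U-matrix variant.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean distance from each neuron's weights to those of its grid neighbors."""
    u = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            dists = [math.dist(weights[(r, c)], weights[n]) for n in nbrs]
            u[(r, c)] = sum(dists) / len(dists) if dists else 0.0
    return u


def component_plane(weights, k):
    """Value of the k-th weight component at every grid position."""
    return {pos: w[k] for pos, w in weights.items()}


def hit_counts(weights, samples):
    """Number of samples whose best matching unit is each neuron."""
    hits = {pos: 0 for pos in weights}
    for x in samples:
        bmu = min(weights, key=lambda pos: math.dist(weights[pos], x))
        hits[bmu] += 1
    return hits
```

Calling `hit_counts` separately on the samples of each decision (forward or stop) gives the kind of per-decision hit distribution that can be overlaid on the U-matrix.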
10DATA ANALYSIS (III)
- The figure shows the U-matrix (left side of the figure) and a fragment of the component planes (right side of the figure).
- It is easy to see that the variable From is responsible for stopping further forwarding of the queries where it is 1.
- Other variables take different values in the area where From is 1; for example, the variable toUnsearchedNeighbors has different values in this area.
[Figure: U-matrix (left) and component planes of From and toUnsearchedNeighbors (right)]
11DATA ANALYSIS (IV)
- The analysis showed that 4 variables (From, toVisited, Sent and currentVisited) are responsible for stopping further forwarding of the queries.
- Variables toUnsearchedNeighbors and Neighbors are correlated.
- Variables packetsNow and Hops are highly correlated.
- Variables fromNeighborAmount, packetsNow and Hops are correlated to some degree.
- NeuroSearch mostly does not forward the queries if Neighbors or toUnsearchedNeighbors is small.
12DATA ANALYSIS (V)
- Further investigation of the algorithm is based on Hops, because only this variable shows the state of the algorithm at a particular point in time. In other words, by analyzing intervals of this variable we can follow the queries along their path.
- The maximum length of a query's path is 7. Thus we have 7 different cases to analyze.
- The data for each case contains only samples with the Hops value currently under investigation. All samples where at least one of the From, Sent, currentVisited or toVisited variables is equal to 1 were removed as well, because we already know the behavior of the algorithm in these areas.
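The per-case data selection described above can be sketched as follows. The representation of a sample as a dictionary keyed by variable name, the function names, and the assumption that Hops takes the values 1 to 7 are all illustrative; the slides only state that there are 7 cases.

```python
# Variables that, when equal to 1, already explain a "stop" decision.
STOP_VARS = ("From", "Sent", "currentVisited", "toVisited")
MAX_HOPS = 7  # maximum query path length stated in the slides


def samples_for_hops(samples, hops):
    """Keep samples with the given Hops value and no stop variable set to 1."""
    return [
        s
        for s in samples
        if s["Hops"] == hops and all(s[v] != 1 for v in STOP_VARS)
    ]


def split_by_hops(samples):
    """Build one data set per Hops value (assumed to range over 1..MAX_HOPS)."""
    return {h: samples_for_hops(samples, h) for h in range(1, MAX_HOPS + 1)}
```

Each of the resulting data sets can then be analyzed with its own map.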
13DATA ANALYSIS (VI)
- After investigating the algorithm for the different values of Hops, we produced the Rule Based Algorithm (RBA). RBA is based on rules that were extracted by analyzing the U-matrix and the corresponding component planes.
- The general strategy of the algorithm is quite simple: a decision is mostly based on the interconnection between the Hops, Neighbors/toUnsearchedNeighbors and NeighborsOrder values. In the beginning the algorithm sends the queries to the most connected nodes. As the number of hops of the query increases, NeuroSearch gradually starts to forward the queries to low-connected nodes.
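The general shape of such a rule can be illustrated with the sketch below. It is not the actual RBA: the slides do not list the extracted rules, so the threshold formula is purely hypothetical, NeighborsOrder is left out, and the variables are assumed to be raw counts stored in a dictionary. Only the two qualitative properties come from the slides: the four stop variables end forwarding, and the demand for well-connected nodes relaxes as Hops grows.

```python
STOP_VARS = ("From", "Sent", "currentVisited", "toVisited")


def rba_like_decision(sample, max_hops=7):
    """Return True to forward the query, False to stop (illustrative only)."""
    # From the slides: any of these variables equal to 1 stops forwarding.
    if any(sample[v] == 1 for v in STOP_VARS):
        return False
    # Hypothetical threshold: require many unsearched neighbors on early
    # hops and progressively fewer on later hops.
    min_unsearched = max(1, max_hops - sample["Hops"])
    return sample["toUnsearchedNeighbors"] >= min_unsearched
```

With this made-up threshold, a node with three unsearched neighbors is skipped on the first hop but accepted on the sixth, which mirrors the drift from highly connected to low-connected nodes.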
14DATA ANALYSIS (VII)
The table shows the efficiency of four algorithms. NeuroSearch and RBA have almost the same level of performance (979 vs. 963 replies, using 4703 vs. 4904 packets). This means that RBA has adopted the behavior of NeuroSearch, and we can say that SOM is well suited for analyzing NeuroSearch. Both algorithms perform better than BFS2 and BFS3.
Comparison between algorithms
Algorithm     Packets  Replies
BFS-2            3000      619
BFS-3           12464     1325
NeuroSearch      4703      979
RBA              4904      963
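One way to read the table is to relate replies to packets. The sketch below computes replies per packet from the figures above; this ratio is only one possible efficiency measure and is an assumption, since the slides do not define how efficiency was judged.

```python
# (packets, replies) per algorithm, copied from the comparison table.
results = {
    "BFS-2": (3000, 619),
    "BFS-3": (12464, 1325),
    "NeuroSearch": (4703, 979),
    "RBA": (4904, 963),
}


def replies_per_packet(table):
    """Replies obtained per packet sent, for each algorithm."""
    return {name: replies / packets for name, (packets, replies) in table.items()}


efficiency = replies_per_packet(results)
for name, value in efficiency.items():
    print(f"{name:12s} {value:.3f}")
```

With these figures the ratio is roughly 0.21 for BFS-2 and NeuroSearch, 0.20 for RBA and 0.11 for BFS-3, while NeuroSearch and RBA return noticeably more replies than BFS-2 in absolute terms.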
15DYNAMIC ENVIRONMENT (I)
- Since RBA is based on the decision mechanism of NeuroSearch, it is possible to evaluate the behavior of NeuroSearch in a dynamic environment using RBA.
- A P2P extension for NS-2 was built as the simulation environment.
- The environment is highly dynamic. Two different classes of probabilities define the dynamic changes in the network. The first class is defined randomly before the simulation starts. The second is defined by the formulas
16DYNAMIC ENVIRONMENT (II)
To evaluate performance qualitatively, RBA was compared to BFS2 and BFS3 in static and dynamic environments. The number of replies and the number of packets used in the static environment are shown in the figures
17DYNAMIC ENVIRONMENT (III)
- Analyzing the behavior of the algorithms in the static environment, one can see that RBA mostly locates more resources than BFS2 and significantly fewer than BFS3.
- In general RBA uses more packets than BFS2 and significantly fewer than BFS3.
- This result is satisfactory, because RBA is based on NeuroSearch's decision mechanism, which is trained to locate only half of the available resources.
- At some points RBA locates more resources than BFS3 and at the same time uses fewer packets. This means that if some resource is not common in the network, RBA, and consequently NeuroSearch, can find enough instances of this resource.
18DYNAMIC ENVIRONMENT (IV)
The number of replies and the number of packets used in the dynamic environment are shown in the figures
Analyzing the figures, one can see that the performance of the algorithms did not degrade much in the dynamic environment.
19DYNAMIC ENVIRONMENT (V)
The total number of located resources and of packets used in the static and dynamic environments is shown in the table
Algorithm  Packets (static)  Packets (dynamic)  Replies (static)  Replies (dynamic)
BFS2                   3000               2515               619                528
BFS3                  12464              10040              1325               1245
RBA                    4904               4865               963                900
The algorithms can still find enough resources in the dynamic environment. There are two possible causes that can explain why all investigated algorithms found slightly fewer resources:
1) Some nodes in offline mode could contain the queried resources.
2) Some nodes in offline mode could lie on a possible path of the query.
20DYNAMIC ENVIRONMENT (VI)
- The algorithms used fewer packets in the dynamic environment than in the static environment.
- The BFS strategy is very sensitive to the size of the network: BFS-based algorithms used significantly fewer packets in the dynamic environment, where the network was smaller throughout the simulation.
- RBA used approximately the same number of packets in both environments. Therefore we can say that RBA is not strongly sensitive to the size of the network.
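The sensitivity claims above can be checked directly against the static/dynamic table. The sketch below computes the relative change of each quantity between the two environments; the data structure and function name are assumptions, and the numbers are copied from the table.

```python
# (static, dynamic) values per algorithm, copied from the table.
table = {
    "BFS2": {"packets": (3000, 2515), "replies": (619, 528)},
    "BFS3": {"packets": (12464, 10040), "replies": (1325, 1245)},
    "RBA": {"packets": (4904, 4865), "replies": (963, 900)},
}


def relative_change(static, dynamic):
    """Fractional change from the static to the dynamic environment."""
    return (dynamic - static) / static


changes = {
    name: {metric: relative_change(*pair) for metric, pair in row.items()}
    for name, row in table.items()
}

for name, row in changes.items():
    print(f"{name}: packets {row['packets']:+.1%}, replies {row['replies']:+.1%}")
```

Packet usage falls by about 16% for BFS2 and 19% for BFS3 but by less than 1% for RBA, while replies fall by about 15%, 6% and 7% respectively.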
21FUTURE WORK
- Developing a supervised approach to training NeuroSearch.
- Developing a modification of the algorithm for ad hoc wireless P2P networks.
- Paying closer attention to the inner structure of the algorithm, using knowledge discovery methods.
- Investigating the properties of other P2P algorithms to determine whether they can be added to NeuroSearch.
22Thank you!