Title:
1 Hybrid Search Schemes for Unstructured Peer-to-Peer Networks
Random Walks in Peer-to-Peer Networks
- Christos Gkantsidis, Milena Mihail, Amin Saberi
- Presented by Paul Bogdan
- February 28th, 2007
2 Hybrid Search Schemes for Unstructured Peer-to-Peer Networks
Christos Gkantsidis, Milena Mihail, Amin Saberi
3 Outline
- Random Graph Models
- Flooding and Normalization
- Random Walks and Replication
- Generalized Search Schemes
- Experimental evaluation
4 Motivation
- Flooding with a small time-to-live (TTL) performs well in regular graphs
- Performance metric: number of exchanged messages per distinct response
- Flooding performance degrades when the TTL increases or when the network is irregular
- Random walks perform better than flooding
- scalability, granularity
- Hybrid, generalized search schemes
- Random walks with lookahead, random walks with 1-step replication
5 Contribution
- Random walks (RW) with shallow flooding offer good performance (analytic justification)
- R1: In a random graph model with $O(n)$ nodes of constant degree and $O(n^{1/2})$ nodes of degree $O(n^{1/2})$, the expected time to discover $O(n)$ nodes is $O(n^{1/2})$.
- R2: Random walks with look-ahead 1 or 1-step replication perform better when there is a discrepancy in the degrees of the underlying topology.
- Normalized Flooding (NF) solution
- R3: NF achieves performance comparable to flooding in regular graphs.
- R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication.
- R5: Local information about the network (node degrees) offers global benefit.
- Generalized Search Schemes
6 Random Graph Models
- Random regular graphs $G_{n,d}$
- $G_{n,d}$ denotes a graph with $n$ nodes in which every node has degree $d$.
- $G_{n,d}$ has sum of degrees $D = nd$.
- Random graphs with super-nodes $G_{n,d,\alpha,\beta}$
- Given constants $\alpha$ and $\beta$, $G_{n,d,\alpha,\beta}$ denotes a graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (i.e. large vertices) and the remaining nodes of degree $d$ (i.e. small vertices).
- $G_{n,d,\alpha,\beta}$ has sum of degrees $D = (\alpha\beta + d)n$.
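As a rough check on the degree sums above, the following sketch builds the degree sequence of the super-node model and compares its sum with $(\alpha\beta + d)n$. The function name, the example parameters, and the rounding of $\alpha n^{1/2}$ and $\beta n^{1/2}$ to integers are assumptions made for illustration; the slides do not specify them.

```python
import math

def supernode_degrees(n, d, alpha, beta):
    """Degree sequence with round(alpha*sqrt(n)) large vertices of degree
    round(beta*sqrt(n)); the remaining vertices have degree d."""
    root = math.sqrt(n)
    num_large = round(alpha * root)
    large_degree = round(beta * root)
    return [large_degree] * num_large + [d] * (n - num_large)

# Hypothetical example parameters.
n, d, alpha, beta = 10_000, 3, 2, 5
degrees = supernode_degrees(n, d, alpha, beta)
exact_sum = sum(degrees)                 # sum of all vertex degrees
formula = (alpha * beta + d) * n         # the closed form from the slide
relative_gap = abs(exact_sum - formula) / formula
```

The two values differ only by the lower-order term $d \alpha n^{1/2}$, which appears because the large vertices are counted among the $n$ nodes.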
7 Flooding and Normalization
- Theorem 3.1. Consider a random regular graph $G_{n,d}$ and flooding from a node $v$ with time-to-live $t$; let $S$ be the set of distinct nodes queried by the flood, with $|S| \le |V|/2$.
- Claims (1), (2), (3): the formulas are not preserved in this transcript.
8-11 (No Transcript)
12 Flooding and Normalization
- Theorem 3.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider flooding from a node $v$ of degree $d$ with time-to-live $t$.
- Claim: for some $t = O(\log\log n)$, the number of distinct responses is $O(n)$.
- Proof
- Consider flooding with $t = \log_{d-1}(c \log n) + 1$ and the vertices visited with TTL $t-1$.
- Assumption: this set of visited nodes does not contain a large-degree vertex.
- From the analysis of $d$-regular graphs, this set contains at least $(d-1)^{t-1}$ edges.
- The probability that no vertex in $\Gamma(S_{t-1}(v))$ is a large vertex is bounded by $(d/(d+\alpha\beta))^{(d-1)^{t-1}} = (d/(d+\alpha\beta))^{c \log n}$, so within the first $O(\log\log n)$ steps we see a large vertex.
13 Flooding and Normalization
- Theorem 3.3. Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider normalized flooding from a node $v$ with TTL $t$ (the condition on $t$ is not preserved in this transcript). Then the number of distinct responses is $O((d-1)^{t-1})$ and the number of messages per response is $O(1)$.
- Proof
- From Theorem 3.1, the number of minigroups seen is $(d-1)^{t-1}$.
- The expected number of small vertices is $Q = d(d-1)^{t-1}/(d+\alpha\beta)$.
- Let $X_i$, $i = 1, \dots, N$, be random variables with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$.
- By the Chernoff bound, the probability that fewer than $Q/2$ are seen is vanishingly small.
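The difference between standard and normalized flooding can be sketched as below. The slides do not define normalized flooding, so this sketch assumes the common reading in which every node forwards the query to at most a fixed number of randomly chosen neighbors (typically the minimum degree of the network) instead of to all of them. The function name `flood`, the `fanout` parameter, and the star-shaped example graph are illustrative assumptions.

```python
import random

def flood(adj, source, ttl, fanout=None, rng=None):
    """Return (visited_nodes, message_count) for a TTL-limited flood.

    fanout=None forwards to every neighbor except the sender (standard
    flooding). An integer fanout forwards to at most that many randomly
    chosen neighbors (normalized flooding, under the assumption above).
    """
    rng = rng or random.Random(0)
    visited = {source}
    frontier = [(source, None)]  # (node, node it received the query from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            targets = [v for v in adj[node] if v != sender]
            if fanout is not None and len(targets) > fanout:
                targets = rng.sample(targets, fanout)
            for v in targets:
                messages += 1
                if v not in visited:  # duplicates are counted but not re-forwarded
                    visited.add(v)
                    next_frontier.append((v, node))
        frontier = next_frontier
    return visited, messages

# Star graph: hub 0 with ten leaves, i.e. one "large" vertex.
star = {0: list(range(1, 11))}
star.update({leaf: [0] for leaf in range(1, 11)})

flood_nodes, flood_msgs = flood(star, source=1, ttl=2)
nf_nodes, nf_msgs = flood(star, source=1, ttl=2, fanout=1)
```

On this toy graph the standard flood fans out through the hub to every leaf, while the normalized variant sends a single message per hop, which is the sense in which NF keeps the message count under control at high-degree nodes.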
14 Random Walks and Replication
- Random Walk with Look-Ahead
- a random walk with shallow flooding at each step of the walk
- RW with lookahead 1 visits $O(n)$ nodes with response time $O(n^{1/2})$
- Theorem 4.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider a random walk from a node $v$. Then, in the 1-step replication scenario, the theorem bounds the expected number of messages and the response time needed to obtain a given number of distinct responses (the number of responses and the bound are not preserved in this transcript).
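The random walk with look-ahead 1 described on this slide can be sketched as follows. The message accounting (one message per queried neighbor plus one message per move of the walker) and the function name are assumptions for illustration, not taken from the slides.

```python
import random

def random_walk_with_lookahead(adj, source, steps, rng=None):
    """Walk `steps` hops; at every hop query all neighbors of the current node.

    Returns (discovered_nodes, message_count).
    """
    rng = rng or random.Random(0)
    discovered = {source}
    messages = 0
    current = source
    for _ in range(steps):
        # Look-ahead 1: a depth-1 flood from the current position.
        discovered.update(adj[current])
        messages += len(adj[current])
        # Then move the walker one hop to a uniformly random neighbor.
        current = rng.choice(adj[current])
        messages += 1
    return discovered, messages

# Star graph: hub 0 with ten leaves (illustrative topology).
star = {0: list(range(1, 11))}
star.update({leaf: [0] for leaf in range(1, 11)})

found, cost = random_walk_with_lookahead(star, source=1, steps=2)
```

As soon as the walker reaches the high-degree hub, a single look-ahead step discovers all of its neighbors, which is the intuition behind the benefit of look-ahead in topologies with a few large vertices.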
15
- Theorem 4.3. Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider normalized flooding from $v$ with TTL $t = (\log n)/(2\log(d-1))$. Then, in the 1-step replication scenario, the theorem gives a lower bound on the number of distinct responses and an upper bound on the number of messages (both bounds are not preserved in this transcript).
- Proof
- The number of minigroups seen is $(d-1)^{t-1}$ and, using the Chernoff bounds, a certain number of these minigroups (the count is not preserved in this transcript) will correspond to large vertices.
16 Generalized Search Schemes
- Searching procedure
- A node of degree $d$ initiates a search with a budget $k$
- budget: the number of messages that are propagated in the network
- For its $d$ neighbors the node picks quantities $k_1, k_2, \dots, k_d$ such that $k_1 + k_2 + \dots + k_d = k$
- To every neighbor $i$ the initiating node forwards the message with budget $k_i$ (for $k_i = 0$ the message is not transmitted)
- Each neighbor $i$ reduces the budget by 1 unit and repeats the process as long as the budget is greater than 0
- A node that receives the message a second time, from another neighbor, forwards it again with the corresponding budget
- Random walks, flooding
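The budget-based procedure above can be sketched as a single routine parameterized by a splitting rule. One natural reading of the last bullet is that a random walk hands the whole budget to one neighbor while a flood-like search splits it across all neighbors; that reading, the names `budget_search`, `walk_split`, and `even_split`, and the assumption that every node has at least one neighbor are illustrative and not taken from the slides.

```python
import random
from collections import deque

def walk_split(neighbors, k, rng):
    """Random-walk-like rule: the whole budget goes to one random neighbor."""
    return {rng.choice(neighbors): k}

def even_split(neighbors, k, rng):
    """Flood-like rule: spread the budget as evenly as possible."""
    base, extra = divmod(k, len(neighbors))
    return {v: base + (1 if i < extra else 0) for i, v in enumerate(neighbors)}

def budget_search(adj, source, budget, split, rng=None):
    """Propagate a query under a message budget.

    Returns (visited_nodes, messages_sent). A node that receives the query
    again simply forwards it again with the budget it was given.
    """
    rng = rng or random.Random(0)
    visited = {source}
    messages = 0
    queue = deque([(source, budget)])
    while queue:
        node, k = queue.popleft()
        if k <= 0:
            continue
        for neighbor, share in split(adj[node], k, rng).items():
            if share <= 0:
                continue  # a zero share means no message is sent
            messages += 1
            visited.add(neighbor)
            queue.append((neighbor, share - 1))  # the receiver pays 1 unit
    return visited, messages

# Ring of six peers (illustrative topology).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
walk_nodes, walk_msgs = budget_search(ring, 0, 4, walk_split)
even_nodes, even_msgs = budget_search(ring, 0, 4, even_split)
```

Because every transmitted message consumes one unit of the share it carries, the total number of messages never exceeds the initial budget, whichever splitting rule is used.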
17 Experimental Evaluation
- Methodology
- Performance Metrics
- Median and mean number of distinct peers discovered (hits)
- Minimum, maximum, and standard deviation of the number of hits
- Number of messages
- Granularity of the number of messages
- Response time
- Topologies
- Random d-Regular Graphs
- Power Law Graphs
- Bimodal topologies
- Clustered topologies
18 Normalized Flooding (NF)
- Mean number of unique peers discovered as a function of the initial TTL
- NF and Standard Flooding behave similarly in regular graphs
- NF controls the number of messages and provides higher efficiency
19 Normalized Flooding (NF)
- With NF, the number of unique peers discovered increases exponentially with the TTL
- In topologies with high-degree nodes, the number of peers increases faster than exponentially with the TTL
20 Random Walk with 1-step replication
21 Random Walk with LookAhead (RWLA)
- RWLA performance is similar to that of a long RW without lookahead (in terms of unique peers discovered)
- RWLA response time is much smaller than that of a standard RW
22 Edge Criticality Searching with weights
- Generalized Searching performs similarly to Standard Flooding in regular graphs
- Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.
23 Conclusions
- Normalized Flooding (NF) could substitute for Standard Flooding in irregular graphs
- RW with 1-step replication performs better than RW and NF in irregular graphs
- Open for improvements
- Generalized schemes (analytic investigation)
- Quantifying Directional flooding
24 Random Walks in Peer-to-Peer (P2P) Networks
- Christos Gkantsidis, Milena Mihail, Amin Saberi
25 Outline
- Motivation
- Statistical Estimation and Random Walks (RW)
- Searching
- Methodology and Topologies importance
- Construction and Summary
26 Motivation
- Random Walks (RW) have been proposed for building search and topology-maintenance protocols in P2P networks
- RW improve search performance compared to flooding (Cao et al., 2002)
- A RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion)
- Claim: the RW approach is a good candidate
- to simulate uniform sampling
- the number of simulation steps required can be as low as the number of samples in independent uniform sampling
- Searching and Overlay Topology Construction
- RW searching performs better than flooding for the same number of messages in clustered and slowly changing topologies
- Construction of P2P networks by random walks
27 Statistical Estimation and Random Walks
- Coupon collection and Chernoff bounds
- $n$: number of coupon types; each draw yields one coupon, uniformly distributed over the types
- $T_n$: time by which coupons of all $n$ types have been drawn
- $T_{\alpha n}$: time by which $\alpha n$ distinct types have been encountered, $0 < \alpha < 1$
- $X_1, \dots, X_k$: independent Bernoulli trials with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$
- $p$: probability that a randomly drawn object has a particular property
- Chernoff bounds limit both the probability that the property is found in substantially fewer draws than its frequency in the search space suggests and the error of the estimator $X/k$ (the bounds themselves are not preserved in this transcript)
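The coupon-collection quantities $T_n$ and $T_{\alpha n}$ can be explored with a small simulation. The sketch below is illustrative only; the function names are assumptions, and the closed form used for the expectation is the standard one for uniform draws, namely $\sum_{i=0}^{m-1} n/(n-i)$ for $m$ distinct types.

```python
import random

def draws_until_distinct(n, target, rng):
    """Count uniform draws from n coupon types until `target` distinct types are seen."""
    seen = set()
    draws = 0
    while len(seen) < target:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

def expected_draws(n, target):
    """Exact expectation: the i-th new type takes n / (n - i) draws on average."""
    return sum(n / (n - i) for i in range(target))

rng = random.Random(0)
n = 1000
half = expected_draws(n, n // 2)   # expected T_{alpha n} for alpha = 1/2
full = expected_draws(n, n)        # expected T_n, about n * ln(n)
one_run = draws_until_distinct(n, n // 2, rng)
```

Collecting a constant fraction of the types takes a number of draws linear in $n$, whereas collecting all of them takes on the order of $n \log n$ draws.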
28 Statistical Estimation and Random Walks
- Random Walks (RW), Convergence and Cover Time
- $G = (V, E)$: undirected graph with $|V| = n$; $d_i$: degree of vertex $i$
- $A_{ij}$: adjacency matrix; $P$: transition matrix (its defining formula is not preserved in this transcript)
- $f : V \to \{0, 1\}$, subject to a condition that is not preserved in this transcript
- Convergence rate metric: the rate at which the RW approaches the stationary distribution
- Cover time metric: the time by which all nodes have been visited
- Trajectory sample average: the rate at which the value of $f$, averaged over successive vertices of the RW trajectory, approaches $p$
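To make the notions of transition matrix and stationary distribution concrete, the sketch below assumes the standard simple random walk, $P_{ij} = A_{ij}/d_i$, whose stationary distribution on a connected undirected graph is $\pi_i = d_i/(2|E|)$. The slide's own formula for $P$ is missing, so this choice, the helper names, and the four-vertex example graph are assumptions.

```python
def transition_matrix(adj_matrix):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / d_i."""
    P = []
    for row in adj_matrix:
        degree = sum(row)
        P.append([a / degree for a in row])
    return P

def step(distribution, P):
    """One step of the walk: new[j] = sum_i distribution[i] * P[i][j]."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(A)
total_degree = sum(sum(row) for row in A)           # equals 2|E|
pi = [sum(row) / total_degree for row in A]         # d_i / (2|E|)

# Start at vertex 0 and iterate; the distribution approaches pi.
dist = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    dist = step(dist, P)
```

The example graph contains a triangle, so the walk is not periodic and the iterated distribution converges to `pi`; the speed of that convergence is what the convergence-rate metric measures.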
29 Statistical Estimation and Random Walks
- Convergence rate is related to the second eigenvalue of $P$: bound (1)
- $y_t$: the vertex that the RW visits at time $t$
- Cover time: bound (2)
- Trajectory sample average: bound (3)
- The formulas for (1), (2), (3) are not preserved in this transcript; references: (1) [11], (2) [12, 13], (3) [3, 4, 5, 6]
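30 Statistical Estimation and Random Walks
- Second Eigenvalue, Expansion and Conductance
- $S$: subset of $V$; $C(S)$: cutset of $S$ (i.e. the edges with one endpoint in $S$ and the other in $V \setminus S$); $\mathrm{vol}(S)$: volume of $S$ (i.e. the sum of the degrees of the vertices in $S$)
- Expansion (formula not preserved in this transcript)
- Conductance (formula not preserved in this transcript)
- Known bound (formula not preserved in this transcript)
- References: [11, 14, 15, 16, 17, 18, 19]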
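The cutset and volume defined above can be computed directly. Since the slide's formulas for expansion and conductance are missing, the sketch assumes the usual definitions, $|C(S)|/|S|$ for expansion and $|C(S)|/\mathrm{vol}(S)$ for conductance, evaluated for a single set $S$; the helper names and the example graph are illustrative.

```python
def cut_size(adj, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    """Sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)

def expansion(adj, S):
    """Assumed definition: |C(S)| / |S|."""
    return cut_size(adj, S) / len(S)

def conductance(adj, S):
    """Assumed definition: |C(S)| / vol(S)."""
    return cut_size(adj, S) / volume(adj, S)

# Two triangles joined by the single bridge edge 2-3.
barbell = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
S = {0, 1, 2}
exp_S = expansion(barbell, S)
cond_S = conductance(barbell, S)
```

The graph-level quantities are obtained by minimizing over all admissible sets $S$; a small value, as for this bridge-connected example, signals a bottleneck that slows down the random walk.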
31 Searching
- Performance metrics for Flooding and RW
- average number of distinct copies of an item located by the search
- number of messages used by the search algorithm
- RW performs better than flooding if
- multiple search requests for the same item are issued over a slowly changing topology
- peers are clustered (see [20, 21, 22, 23, 24, 25] for details)
- Searching analysis
- Methodology
- Flat topologies with Uniformly Distributed Content
- Topologies with Peer Clustering
- Re-issuing the Same Query
- Real topologies
32 Searching - Methodology
- Performance Metrics
- mean number of distinct copies found (i.e. Mean)
- dispersion around the mean (i.e. Std) and the failure probability
- Cost
- number of messages or queries performed during the search
- Peer-to-peer topologies (1 million nodes)
- Flat regular expanders, two-tier topologies with clustering, power-law graphs, samples from real topologies
- Dynamic topologies
- rewiring
- Content placement
- Content clustering affects search performance
33 Searching - Flat Topologies
- Experiment
- one request in a network of 500K peers
- Mean hits, minimum hits, and Std are similar for Flooding and RW
- the entire distribution of hits is similar for Flooding and RW
34 Searching - Topologies with Peer Clustering
- Cluster topology consists of
- 5 flat regular graphs of size 40K; from each one, 1000 nodes are picked at random to construct another flat regular graph
- The number of hits for RW is more concentrated around the mean than for Flooding
35 Searching - Reissuing the Same Query
- Experiment setup: repeat the following procedure 4 times
- each peer sends a request and waits for the response
- between requests, 2 of the links are rewired
- each peer initiates a new search
- RW performs better than Flooding
- Mean Hits and Failure Probability
36 Searching - Reissuing the Same Query
- Performance of successive searches depends on the number of topology changes between consecutive searches
- Performance of Flooding increases as the rate of topological changes increases
- RW performance remains the same for small variations
37 Searching - Real Topologies
- The number of hits for RW is more concentrated around the mean than for Flooding
- P2P networks have good expansion properties
38 Construction
- P2P network construction is concerned with
- peers arriving at and leaving the network dynamically
- strong and weak decentralization
- low network overhead per addition or deletion
39 Baseline Construction of Expander Graphs
- ABASE (undirected graph) consists of
- $n$ vertices, each of which chooses $d$ vertices at random
- total number of edges $nd$ and expected vertex degree $2d$
- Theorem 4.1. Let $G(V,E)$ be a graph constructed by ABASE. Then $G$ is an expander with high probability, for a positive constant $\alpha < 1$ (the expansion inequality is not preserved in this transcript).
40 Baseline Construction of Expander Graphs with Constant Overhead in Random Bits
- ABASE construction algorithm
- start a RW at a random vertex of $H$ (a constant-degree expander graph)
- whenever ABASE needs a random number, it is taken from the RW on $H$
- Theorem 4.2. Let $G(V,E)$ be a graph constructed by ABASE. There are positive constants $\alpha$ and $0 < \beta < 0.5$ such that any subset $S$ of size at least $\beta|V|$ and at most $0.5|V|$ has cutset expansion $\alpha$ almost surely.
41 Distributed Construction of Expanders with Constant Overhead on Network Resources
- AH construction
- $d$ daemons, one for each Hamilton cycle
- a newly arriving node contacts the daemon associated with the $i$-th Hamilton cycle
- after $c$ steps, it attaches between the peer that currently hosts daemon $i$ and one of that peer's neighbors in cycle $i$
42 Distributed Construction of Expanders with Constant Overhead on Network Resources
- AM construction
- $d$ daemons, one for each Hamilton cycle
- a new arrival consists of two nodes, X and Y; X and Y contact the central server to discover the location of the $d$ daemons
- X becomes the neighbor of daemon $i$ and Y the neighbor of the initial daemon's neighbor
43 Summary
- For Searching
- Random Walks (RW) are superior to Flooding
- For Construction
- RW add new peers with constant overhead
- Open Problems
- Strongly decentralized construction algorithm
- Can we better handle deletions and expansions of small sets?
- How do the P2P network parameters (e.g. capacities) affect the performance of RW?