Replication Strategies in Unstructured Peer-to-Peer Networks

1
Replication Strategies in Unstructured
Peer-to-Peer Networks
  • Edith Cohen @ AT&T
  • Scott Shenker @ ICSI
  • March 26, 2003
  • UNC COMP 290-86

2
What are the types of P2P Networks?
  • Three Structural Classifications
  • Centralized
  • Structured Decentralized
  • Unstructured Decentralized

3
Centralized
  • The network has a central index that all peers
    contact
  • Ex. Napster
  • Think of it as a DNS / directory
  • Advantages: minimizes network traffic, searches
    the whole network quickly
  • Disadvantages: central point of failure,
    limited scalability, not a "true" p2p system

(Figure labels: Query, Response, Request)
4
Structured Decentralized
  • No central point
  • Nodes organize themselves in relation to data
  • Ex. Chord, Pastry, CAN
  • Advantages: true p2p, expected search times, load
    balancing
  • Disadvantages: extra overhead for structure,
    discovering keys

5
Unstructured Decentralized
  • No central point
  • No relation between topology and data
  • Most popular system
  • Ex. Gnutella, Kazaa
  • Advantages: true p2p, low cost to create
  • Disadvantages: no overall knowledge of the data,
    search efficiency

6
Fundamental Point of Unstructured
  • A host receives queries that have no relationship
    to the content it stores
  • Searches are equivalent to randomly probing the
    network
  • The goal is to minimize the number of hosts
    contacted to resolve a query

7
How to minimize?
  • One way: replication
  • Can replicate either when an item is stored or
    when it is searched for
  • FastTrack protocol uses super nodes that
    replicate indices of other nodes
  • Note: the paper says Gnutella does not support this,
    but several clients have UltraPeer / SuperNode /
    Reflector technology

8
Replication Strategies
  • Even though FastTrack searches are faster, it has
    the same replication strategy as vanilla
    Gnutella - number of index entries per item is
    proportional to number of hosts with item
  • Fundamental question: what is the optimal
    replication strategy?

9
Metric for Replication Strategies
  • Expected search size for successful queries
  • Performance on insoluble queries
  • Related to maximum search size

10
Two Natural Strategies
  • Uniform - replicate everything equally
  • Minimizes maximum search size
  • Proportional - replicate based on search
    popularity
  • Minimizes search size for more queries
  • Minimizes maximum utilization rate

11
Optimal Strategy
  • Proportional and Uniform are the two extremes of
    strategy range
  • Both have same expected search size!
  • However, all the strategies in between are better
  • Will show Square-root minimizes expected
    successful search size
  • Optimal strategy between Uniform and Square-root

12
Model Legend
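  • $n$: number of nodes in network
  • $\rho$: capacity of node (homogeneous)
  • $R = n\rho$: total capacity of network
  • $m$: number of distinct items in network
  • $r_i$: number of copies of $i$th item
  • $p_i = r_i / R$: fraction of total capacity
    allocated to $i$th item
  • $p = (p_1, \dots, p_m)$: allocation, with $\sum_i p_i = 1$
  • $q_i$: fraction of queries for $i$th item
  • $q = (q_1, \dots, q_m)$: query rates, with $\sum_i q_i = 1$
    and $q_1 \ge q_2 \ge \dots \ge q_m$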

13
Model (cont.)
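  • A replication strategy is a mapping from the
    query rate distribution $q$ to the allocation $p$
  • $\ell = 1/R$: minimum fraction of total copies
    (at least one copy of each item, $R$ being the total capacity)
  • $u = n/R$: maximum fraction of total copies
    (at most one copy per node, $n$ being the number of nodes)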

14
Expected Search Size (ESS)
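  • The number of probes needed to find item $i$ is
    geometric with success probability $\rho p_i$
    ($\rho$ is the node capacity) and therefore has an
    expected value of $A_i = 1 / (\rho p_i)$
  • ESS: $A = \sum_i q_i A_i = \frac{1}{\rho} \sum_i q_i / p_i$
  • Want to find the allocation that minimizes
    $\sum_i q_i / p_i$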
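
The ESS can be computed directly from the query rates and an allocation. The following is a minimal Python sketch, assuming the random-probe model of this slide, in which each probe finds item i with probability rho * p_i. The function name `expected_search_size` and the example numbers are illustrative and do not come from the paper.

```python
def expected_search_size(q, p, rho):
    """ESS = sum_i q_i * A_i, where A_i = 1 / (rho * p_i) is the
    mean of the geometric number of probes needed to find item i."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    # Items that are never queried contribute nothing to the ESS.
    return sum(qi / (rho * pi) for qi, pi in zip(q, p) if qi > 0)


# Hypothetical example: 3 items, each node holds rho = 1 copy,
# so p_i is simply the fraction of nodes holding item i.
q = [0.6, 0.3, 0.1]   # query rates, sum to 1
p = [0.5, 0.3, 0.2]   # allocation, sum to 1
print(expected_search_size(q, p, rho=1))
```

Popular items are found in few probes and rare items in many; the ESS weights each item's expected search size by how often it is queried.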

15
Bounded Search Size
  • $L$: maximum search size
  • An item is locatable if found by a search with
    high probability
  • The probability of an unsuccessful search for a
    locatable item $i$ is $(1 - \rho p_i)^L$
  • The cost of insoluble queries is related to $L$
  • Uniform allocation ($p_i = 1/m$) is the solution
    that minimizes $L$
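
To make the link between allocation and maximum search size concrete, here is a small Python sketch. It assumes independent probes that each succeed with a fixed probability (rho * p_i in the model), so a search of L probes fails with probability (1 - hit_probability)^L. The function name and the numbers are illustrative assumptions.

```python
import math


def min_search_size(hit_probability, max_failure):
    """Smallest L such that (1 - hit_probability) ** L <= max_failure."""
    if not (0 < hit_probability < 1 and 0 < max_failure < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(max_failure) / math.log(1 - hit_probability))


# Hypothetical example: an item held by 1% of nodes, and a search
# that may fail at most 1% of the time.
print(min_search_size(0.01, 0.01))
# Halving the item's share roughly doubles the required search size.
print(min_search_size(0.005, 0.01))
```

Because L is driven by the least-replicated locatable item, raising the smallest p_i is what lowers L, which is why Uniform does best on this metric.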

16
Heterogeneity
  • Model assumes constant capacity and bandwidths
  • Variations can be accounted for by using mean of
    capacities weighted by visitation

17
Two Allocation Strategies
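  • An allocation is defined when $\sum_i p_i = 1$ and every
    $p_i$ lies between the minimum and maximum fraction
    of total copies ($\ell \le p_i \le u$)
  • An allocation is uniform when $p_i = 1/m$
  • An allocation is proportional when $p_i = q_i$
  • Lemma 3.1: Both have the same ESS, $m / \rho$, independent
    of query distribution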
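
Lemma 3.1 can be checked in two lines. The derivation below is a sketch that assumes the ESS has the form $\frac{1}{\rho} \sum_i q_i / p_i$ (with $\rho$ the node capacity and $m$ the number of items), that the query rates sum to 1, and that every $q_i > 0$.

```latex
\begin{align*}
\text{Uniform } (p_i = 1/m):\quad
  & \frac{1}{\rho} \sum_{i=1}^{m} \frac{q_i}{1/m}
    = \frac{m}{\rho} \sum_{i=1}^{m} q_i
    = \frac{m}{\rho}, \\
\text{Proportional } (p_i = q_i):\quad
  & \frac{1}{\rho} \sum_{i=1}^{m} \frac{q_i}{q_i}
    = \frac{m}{\rho}.
\end{align*}
```

Neither result depends on the individual $q_i$, which is why the two strategies tie for every query distribution.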

18
Characterizing Allocations
  • Choose two items
  • Let
  • Uniform: $p_1 = p_2$; Proportional: $p_1 / p_2 = q_1 / q_2$
  • ESS proportional to $q_1 / p_1 + q_2 / p_2$
  • Use 1st derivative to find minimum
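
The first-derivative step can be worked out explicitly. The sketch below assumes the two items share all of the capacity, writes $x$ for the first item's share ($x$ is notation introduced here, not taken from the slides), and uses the ESS up to a constant factor.

```latex
\begin{align*}
f(x) &= \frac{q_1}{x} + \frac{q_2}{1 - x}, \qquad 0 < x < 1, \\
f'(x) &= -\frac{q_1}{x^2} + \frac{q_2}{(1 - x)^2} = 0
  \;\Longrightarrow\;
  \frac{x}{1 - x} = \sqrt{\frac{q_1}{q_2}}, \\
f''(x) &= \frac{2 q_1}{x^3} + \frac{2 q_2}{(1 - x)^3} > 0 .
\end{align*}
```

Since $f'' > 0$ the stationary point is the minimum, and the ratio of the two shares there is the square root of the ratio of the query rates, strictly between the Uniform ratio (1) and the Proportional ratio ($q_1 / q_2$) whenever $q_1 \neq q_2$.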

19
(No Transcript)
20
Allocation Space
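  • An allocation lies between uniform and
    proportional if $1 \le p_i / p_j \le q_i / q_j$ for every
    pair of items $i < j$ (items ordered by decreasing
    query rate)
  • Theorem 3.1: ESS is less for allocations between
    uniform and proportional
  • Lemma 3.2: ESS is larger for allocations outside
    this range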

21
Sketch of Theorem 3.1
  • The solution space is the bounded region of
    allocations constrained by $p_i / p_j \ge 1$ and
    $p_i / p_j \le q_i / q_j$ for $i < j$
  • The ESS function $\sum_i q_i / p_i$ is at a maximum
    when at a vertex
  • The allocation is Uniform or Proportional at that point
  • If a vertex were to be a hybrid of these,
    allocations with larger ESS could be constructed
    iteratively until Uniform or Proportional is reached

22
Square-root Allocation
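  • A square-root allocation has $p_i / p_j = \sqrt{q_i / q_j}$,
    i.e. $p_i = \sqrt{q_i} / \sum_j \sqrt{q_j}$
  • Square-root allocation minimizes ESS
  • $\frac{1}{\rho} \left( \sum_i \sqrt{q_i} \right)^2$ is the exact value
  • The gain factor (relative to Uniform and Proportional)
    is $m / \left( \sum_i \sqrt{q_i} \right)^2$
  • Lemma 4.2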
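
The three allocations can be compared numerically. This is a minimal Python sketch, assuming the ESS is proportional to the sum of q_i / p_i (the common factor 1 / rho is dropped). All function names and the example query rates are illustrative.

```python
import math


def uniform_allocation(q):
    return [1 / len(q)] * len(q)


def proportional_allocation(q):
    return list(q)


def square_root_allocation(q):
    roots = [math.sqrt(qi) for qi in q]
    total = sum(roots)
    return [r / total for r in roots]


def search_cost(q, p):
    """sum_i q_i / p_i, i.e. the ESS up to the constant factor 1 / rho."""
    return sum(qi / pi for qi, pi in zip(q, p))


q = [0.64, 0.36]  # hypothetical query rates
for name, alloc in [("uniform", uniform_allocation(q)),
                    ("proportional", proportional_allocation(q)),
                    ("square-root", square_root_allocation(q))]:
    print(name, alloc, search_cost(q, alloc))
```

For these rates Uniform and Proportional give the same cost, and Square-root gives a smaller one, equal to the square of the sum of the square roots of the query rates.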

23
Gain Factors for Query Distributions
Zipf-like query distributions, with the query rate of the
$i$th item proportional to $i^{-w}$
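
The gain factor for a Zipf-like distribution can be tabulated with a few lines of Python. This sketch assumes the gain factor is the ESS of Uniform (or Proportional) divided by the ESS of Square-root, which works out to m divided by the squared sum of the square roots of the query rates. The function names, the item count, and the exponents tried are illustrative assumptions, not values read off the original figure.

```python
import math


def zipf_query_rates(m, w):
    """Normalized query rates with the i-th rate proportional to i ** -w."""
    weights = [i ** -w for i in range(1, m + 1)]
    total = sum(weights)
    return [x / total for x in weights]


def gain_factor(q):
    """ESS of Uniform/Proportional over ESS of Square-root."""
    return len(q) / sum(math.sqrt(qi) for qi in q) ** 2


for w in (0.5, 1.0, 1.5, 2.0):
    print(w, gain_factor(zipf_query_rates(1000, w)))
```

The gain is 1 when all items are equally popular and grows as the distribution becomes more skewed.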
24
Gain Factor for Real-life Distributions
25
Square-root Allocation
  • Fix the set of locatable items and maximum search
    size
  • Square-root may not be defined
  • Square-root minimizes the ESS for these
    constraints
  • Closer to Uniform if insoluble queries dominate
  • Closer to Square-root if few queries are insoluble

26
Square-root Construction
  • Lemma 5.1: There exists a unique allocation p
    s.t. conditions 1) and 2) hold (No Transcript).
    This allocation minimizes the ESS

27
Proportional Allocation
  • When the set of locatable items and the maximum
    search size are fixed as before, proportional may
    not be defined
  • Proportional allocation is the unique allocation
    satisfying the following conditions

28
Replication Algorithms
  • These protocols perform two main tasks
  • Creation: the requesting node creates C copies of
    the item after successfully querying for it
  • Deletion: mechanisms may vary, but the more
    recent of two copies has the lower probability of
    being deleted at any instant

29
Deletion in Existing Networks
  • Copy deletion occurs in two ways
  • Node going offline - content unavailable
  • Replacement Policy
  • Replacement should be independent of queries
  • Least Recently Used (LRU) and Least Frequently
    Used (LFU) do not satisfy this
  • First In First Out (FIFO), fixed lifetime, and
    random do satisfy it

30
Steady State
  • System is in steady state when lifetime
    distributions of items do not change with time
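  • Let $\langle C_i \rangle$ be the average value of C for item i
  • Claim 6.1: If $\langle C_i \rangle / \langle C_j \rangle$ remains fixed, then
    $p_i / p_j$ approaches $q_i \langle C_i \rangle / (q_j \langle C_j \rangle)$
  • The allocation is Proportional when C is the same
    for all items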
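
Claim 6.1 can be illustrated with a small simulation. The Python sketch below makes several assumptions that are not in the slides: a single pool of `capacity` copy slots stands in for the whole network, each query for item i creates a fixed number of copies C_i, and each new copy overwrites a uniformly random slot (random replacement, one of the query-independent policies). All names and parameter values are hypothetical.

```python
import random


def simulate_replication(q, copies_per_query, capacity=10000,
                         queries=100000, seed=0):
    """Return the fraction of slots held by each item after many queries."""
    rng = random.Random(seed)
    items = list(range(len(q)))
    # Start from an arbitrary allocation: slots assigned round-robin.
    slots = [s % len(q) for s in range(capacity)]
    for _ in range(queries):
        # Pick the queried item according to the query rates.
        item = rng.choices(items, weights=q)[0]
        # Create C_i copies, each replacing a random existing copy.
        for _ in range(copies_per_query[item]):
            slots[rng.randrange(capacity)] = item
    return [slots.count(i) / capacity for i in items]


q = [0.5, 0.3, 0.2]
print(simulate_replication(q, [1, 1, 1]))  # same C for all items
```

With the same C for every item the measured fractions should settle near the query rates (Proportional); with different C values they should settle near the normalized products of q_i and C_i.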

31
Designing Square-root Allocations
  • The challenging task is designing square-root
    algorithms with only local information and no
    exterior protocol
  • Corollary 6.1: if $\langle C_i \rangle \propto 1 / \sqrt{q_i}$, then
    $p_i / p_j$ approaches $\sqrt{q_i / q_j}$
  • Propose three algorithms to achieve this
  • Path Replication
  • Replication with Sibling-Number Memory
  • Probe Memory

32
Path Replication
  • Remember the ESS $A_i$ for an item is inversely
    proportional to $p_i$, which is proportional to
    $q_i \langle C_i \rangle$
  • If $C_i$ is set to $A_i$, then at the fixed point both
    are proportional to $1 / \sqrt{q_i}$
  • Let $\langle A_i \rangle$ be the fixed point value. Then new
    copies are generated at $A_i / \langle A_i \rangle$ of the optimal rate
  • May overshoot - converge slowly
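
The fixed-point argument can be checked with a deterministic, expected-value sketch in Python. It is not a simulation of the protocol: it assumes that in each round a fraction `turnover` of all copies is replaced by new copies, and that new copies of item i appear in proportion to q_i times C_i, with C_i set to the search size (proportional to 1 / p_i). The `turnover` parameter and the function name are assumptions of this sketch.

```python
import math


def path_replication_fixed_point(q, turnover=0.5, rounds=50):
    """Iterate the expected allocation under path replication."""
    m = len(q)
    p = [1 / m] * m  # start from the Uniform allocation
    for _ in range(rounds):
        # Copy creation rate for item i: q_i * C_i, with C_i
        # proportional to the search size 1 / p_i.
        created = [qi / pi for qi, pi in zip(q, p)]
        total = sum(created)
        # Replace a fraction `turnover` of the copies with new ones.
        p = [(1 - turnover) * pi + turnover * ci / total
             for pi, ci in zip(p, created)]
    return p


q = [0.64, 0.36]
roots = [math.sqrt(qi) for qi in q]
print(path_replication_fixed_point(q))
print([r / sum(roots) for r in roots])  # square-root allocation
```

Under these assumptions the iteration settles on the square-root allocation, which is the behaviour the slide describes for the fixed point.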

33
Replication with Sibling-Number Memory
  • At a given time, search size may not be a good
    indicator for query rate
  • Need additional bookkeeping to stay close to
    fixed point
  • Sibling-Number Memory (SNM)
  • Each copy records the number of sibling copies
    generated at the same time, plus a timestamp
  • Age of copy related to survival-rate

34
Replication w/ SNM (cont.)
  • Let d be the number of sibling copies generated
    after some past query a and be the survival
    rate, with each copy storing these values
  • Over some time T, is an estimator
    for the number of requests for the item, with PT
    the set of copies within this range
  • is the expectation of over
    copies

35
Replication with Probe Memory
  • Each node records the number of recent probes
    per item and the cumulative search size
  • Probe rate: $r_i = q_i A_i$
  • Can estimate qi based on estimates for Ai and ri
    (could aggregate over several nodes)
  • The natural node set is a query path

36
Replication w/ PM (cont.)
  • Let kv be the number of probes v has witnessed
    and Sv be the sum of search sizes. Stored for
    each item.
  • Let V be a set of nodes

37
Optimal Algorithm
  • Remember that Square-root is not the optimal
    algorithm
  • The optimal algorithm is a hybrid of uniform and
    square-root
  • Uniform marks which nodes should have permanent
    copies of an item
  • Square-root indicates which nodes should have
    transient copies
  • Parameters need to be adjusted to balance the cost
    of insoluble and soluble queries

38
Simulation
  • The square-root allocation algorithms for path
    replication and replication w/ SNM were simulated
    to illustrate convergence
  • 10,000 nodes with fixed lifetimes and queries
    issued at fixed intervals

39
(No Transcript)
40
(No Transcript)
41
Summary
  • In unstructured peer-to-peer networks, the
    natural strategies of uniform and proportional
    perform equally, with all strategies between
    performing better
  • For soluble queries, Square-root performs best
    with the optimal between this and uniform in
    general
  • Despite its nonlinear and non-local nature, three
    Square-root algorithms are given

42
Realism
  • Like most peer-to-peer algorithms, question of
    performance in realistic environment
  • Is their heterogeneity model accurate? How would
    this correspond to a true Internet environment?
    Mobile environment?
  • What are the effects of a dynamic network?
  • How could this be related to structured p2p
    networks?