Scalability Lecture - PowerPoint PPT Presentation

Provided by: iant158
1
Scalability Lecture
  • Optimizing P2P Networks: Lessons learned from
    social networking
  • Social Networks
  • Lessons Learned
  • Are P2P Networks Social??
  • Organizing P2P Networks
  • Peer Topologies
  • Centralized, Ring, Hierarchical, Decentralized
  • Hybrid
  • Centralized-Ring
  • Centralized-Centralized
  • Centralized-Decentralized
  • Reflector Nodes
  • Gnutella Case Studies
  • 3 case studies

2
Social Networks
  • Stanley Milgram (Harvard professor), 1967
    social networking experiment
  • How many social hops would it take for
    messages to traverse through the US population
    (200 million)
  • Posted 160 letters to randomly chosen people in
    Omaha, Nebraska
  • Asked them to try to pass these letters to a
    stockbroker working in Boston, Massachusetts
  • Rules
  • use intermediaries whom they know on a first
    name basis
  • chosen intelligently
  • make a note at each hop
  • 42 letters made it !!
  • Average of 5.5 hops
  • Demonstrated the small world effect

Showed that the social network of the United
States is indeed connected, with a path length
(number of hops) of around 6: the "six degrees of
separation"!
Does this mean that it takes 6 hops to traverse
200 million people??
3
Lessons Learned from Milgram's Experiment
  • Social circles are highly clustered
  • A few members have wide-ranging connections
  • these form a bridge between far-flung social
    clusters
  • this bridging plays a critical role in bringing
    the network closer together
  • For example
  • A quarter of all letters passed through a local
    storekeeper
  • A half were mediated by just 3 people
  • Lessons Learned
  • These people acted as gateways or hubs between
    the source and the wider world
  • A small number of bridges dramatically reduces
    the number of hops
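
The effect of a few bridges can be illustrated with a small simulation. The sketch below is not Milgram's data: it assumes a toy "clustered" network (a ring of 200 nodes, each linked only to its two nearest neighbours on either side) and then adds 10 random long-range links to play the role of the well-connected hubs. The function names and all sizes are illustrative choices.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on each side (a clustered, 'local' network)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_bridges(adj, count, rng):
    """Add `count` random long-range links, standing in for well-connected 'hub' people."""
    nodes = list(adj)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1


def average_hops(adj):
    """Mean shortest-path length over all ordered pairs, using BFS from every node."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(1)  # fixed seed so the run is repeatable
network = ring_lattice(200, 2)
hops_before = average_hops(network)
add_bridges(network, 10, rng)
hops_after = average_hops(network)
print(f"average hops without bridges: {hops_before:.1f}")
print(f"average hops with 10 bridges: {hops_after:.1f}")
```

Only 10 links are added to a network that already has 400, yet every shortest path that can use one of them gets shorter, which is the bridging effect described above.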

4
From Social Networks to Computer Networks
  • There are a number of similarities to social
    networks
  • People = peers
  • Intermediaries = Hubs, Gateways or Rendezvous
    Nodes (JXTA speak...)
  • Number of intermediaries passed through = number
    of hops
  • Are P2P Networks Special then?
  • P2P networks are more like social networks than
    other types of computer network because they are
    often
  • Self Organizing
  • Ad-Hoc
  • Employ clustering techniques based on prior
    interactions (like we form relationships)
  • Decentralized discovery and communication (like
    we form neighbourhoods, villages, cities etc)

5
Peer to Peer: What's the problem?
  • Problem: how do we organize peers within ad-hoc,
    multi-hop pervasive P2P networks?
  • network of self-organizing peers organized in a
    decentralized fashion
  • such networks can rapidly expand from a few
    hundred peers to several thousand or even
    millions
  • P2P Environment: Recap
  • Unreliable Environments
  • Peers connecting/disconnecting; network
    failures to participation
  • Random failures, e.g. power outages, cable or DSL
    failure, hackers
  • Personal machines are much more vulnerable than
    servers
  • algorithms have to cope with this continuous
    restructuring of the network core.
  • P2P systems need to treat failures as normal
    occurrences not freak exceptions
  • must be designed in a way that promotes
    redundancy with the tradeoff of a degradation of
    performance
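
A minimal sketch of what "failures as normal occurrences" can look like in code, assuming a file is replicated on several peers. The peer names, the `PeerUnavailable` exception and both functions are hypothetical and only simulate a transfer; the point is that a dead peer is handled in the ordinary control flow, and that redundancy is paid for with extra attempts.

```python
class PeerUnavailable(Exception):
    """Raised when a (simulated) peer has dropped off the network."""


def fetch_from_peer(peer, online):
    """Hypothetical transfer: succeeds only if the peer is currently online."""
    if peer not in online:
        raise PeerUnavailable(peer)
    return f"file-from-{peer}"


def fetch_with_redundancy(replicas, online):
    """Try each replica in turn; returns (data, attempts), with data None if all fail."""
    attempts = 0
    for peer in replicas:
        attempts += 1
        try:
            return fetch_from_peer(peer, online), attempts
        except PeerUnavailable:
            continue  # expected case: move on to the next copy
    return None, attempts


replicas = ["peer-a", "peer-b", "peer-c"]
online = {"peer-c"}  # two of the three replicas have disconnected
data, attempts = fetch_with_redundancy(replicas, online)
print(data, "after", attempts, "attempts")
```

The request still succeeds, but it costs three attempts instead of one: redundancy bought at the price of performance.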

6
So, how do we Organize Networks in Order to Get
Optimum Performance?
  • For P2P
  • This does not mean abstract numerical benchmarks
    e.g. how many milliseconds will it take to
    compute this many millions of FFTs?
  • Rather, it means asking questions like
  • How long will it take to retrieve this
    particular file?
  • How much bandwidth will this query consume?
  • How many hops will it take for my packet to get
    to a peer on the far side of the network?
  • If I add/remove a peer to the network will the
    network still be fault tolerant?
  • Does the network scale as we add more peers? Such
    networks can rapidly expand from a few hundred
    peers to several thousand or even millions

7
Performance Issues in P2P Networks
Three main factors make P2P networks more
sensitive to performance issues
  • 1. Communication
  • Fundamental necessity
  • Users connected via different connection speeds
  • Multi-hop
  • 2. Searching
  • No central control, so more effort is needed
  • Each hop adds to total bandwidth; problems with
    time-outs
  • 3. Equal Peers
  • Free riders unbalance the harmony of the
    network
  • Degrades performance for others
  • Need to get this right to adjust accordingly

8
Peer Topologies
  • Core
  • Centralized
  • Ring
  • Hierarchical
  • Decentralized
  • Hybrid
  • Centralized-Ring
  • Centralized-Centralized
  • Centralized-Decentralized

9
Centralized
  • Client/server
  • Web servers
  • Databases
  • Napster search
  • Instant Messaging
  • Popular Power

10
Ring
  • Fail-over clusters
  • Simple load balancing
  • Assumption
  • Single owner

11
Hierarchical
  • Tree structure
  • DNS
  • Usenet (sort of)

12
Decentralized
  • Gnutella
  • Freenet
  • Internet routing

13
Centralized-Ring
  • Robust web applications
  • High availability of servers

14
Centralized-Centralized
  • N-tier apps
  • Database heavy systems
  • Web services gateways
  • Google.com uses this topology to deliver their
    service

15
Centralized-Decentralized
  • New Wave of P2P
  • Clip2 Gnutella Reflector (next)
  • FastTrack
  • KaZaA
  • Morpheus
  • Email
  • Like Social Networks perhaps ?

16
Reflector Nodes
  • Known as super peers; in JXTA these are
    Rendezvous peers
  • cache the file lists of connected users and
    maintain an index
  • When a query is issued, the Reflector does not
    retransmit it - it answers the query from its own
    memory
  • Do they remind you of anything ?
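
The behaviour described above can be sketched as a small in-memory index. This is an illustrative toy, not the Clip2 Reflector's actual implementation: the `Reflector` class, its method names and the substring matching rule are all assumptions.

```python
class Reflector:
    """Toy super peer: indexes the files of connected peers and answers queries locally."""

    def __init__(self):
        self.index = {}  # file name (lower case) -> set of peer ids sharing it

    def connect(self, peer_id, file_list):
        """Cache the file list a peer announces when it connects."""
        for name in file_list:
            self.index.setdefault(name.lower(), set()).add(peer_id)

    def disconnect(self, peer_id):
        """Drop a departed peer from the index, removing entries nobody shares any more."""
        for name in list(self.index):
            self.index[name].discard(peer_id)
            if not self.index[name]:
                del self.index[name]

    def query(self, keyword):
        """Answer from the cached index; the query is not retransmitted to the peers."""
        keyword = keyword.lower()
        return {name: sorted(peers)
                for name, peers in self.index.items()
                if keyword in name}


reflector = Reflector()
reflector.connect("peer-1", ["Lecture-Notes.pdf", "song.mp3"])
reflector.connect("peer-2", ["song.mp3"])
print(reflector.query("song"))
reflector.disconnect("peer-2")
print(reflector.query("song"))
```

Each query costs one lookup at the reflector rather than one message per connected peer, which is where the bandwidth saving comes from.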

17
Napster Gnutella?
[Figure: Gnutella and Napster compared; diagram labels: User, Napster.com, ?]
1. Natural??
2. Reflector (clip2.com)
18
The Gnutella Network Today
The figure below is a view of the topology of a
Gnutella network as shown on the LimeWire web
site, the popular Gnutella file-sharing client.
Notice how the power-law or centralized-decentralized
structure is demonstrated.
19
Another View of the Gnutella Network
20
Gnutella Studies 1: Free Riding
E. Adar and B.A. Huberman (2000), "Free Riding
on Gnutella," First Monday 5(10),
http://firstmonday.org/issues/issue5_10/adar/index.html
Two types of free riding
  • download files but never provide any files for
    others to download
  • users that have undesirable content
  • They found that 22,084 of the 33,335 peers in the
    network (66%) share no files
  • 24,347 (73%) share ten or fewer files
  • the top 1 percent (333 hosts) represent 37 percent
    of the total files shared
  • the top 20 percent (6,667 hosts) share 98% of the
    files

This shows that, even without Gnutella Reflector nodes,
the Gnutella network naturally converges into a
centralized-decentralized topology, with the top
20% of nodes acting as super peers or reflectors
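
As a quick check of the figures quoted from the study, the percentages and host counts can be recomputed from the raw peer counts. The variable names below are illustrative; the numbers are the ones given on this slide.

```python
total_peers = 33_335
sharing_none = 22_084
sharing_ten_or_fewer = 24_347


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100 * part / whole


print(f"share no files:          {percent(sharing_none, total_peers):.1f}%")
print(f"share ten or fewer:      {percent(sharing_ten_or_fewer, total_peers):.1f}%")

# Host counts behind the "top 1 percent" and "top 20 percent" figures.
top_1_percent_hosts = total_peers // 100
top_20_percent_hosts = total_peers // 5
print("hosts in top 1 percent: ", top_1_percent_hosts)
print("hosts in top 20 percent:", top_20_percent_hosts)
```
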
21
Gnutella Studies 2: Equal Peers
Study on Reflector Nodes: Clip2 (www.clip2.com)
Studied Gnutella for one month
  • Noted an apparent scalability barrier when query
    rates went above 10 per second.

Why??
  • A Gnutella query is 560 bits long and queries make
    up approximately one quarter of traffic.
  • Each peer is connected to three peers, so 560 x
    10 x 3 = 16,800 bits per second
  • This is a quarter of the traffic, so total
    traffic = 67,200 bits per second.
  • a 56K link (56,000 bits per second) cannot keep
    up with this amount of traffic
  • one node connected in the incorrect place can
    grind the whole network to a halt.
  • This is why P2P networks place slower nodes at
    the edges
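
The arithmetic behind the barrier can be written out as a short calculation. It assumes all figures are in bits per second (the query length is given in bits and a 56K modem carries 56,000 bits per second); the constant names are illustrative.

```python
QUERY_BITS = 560                 # length of one Gnutella query
QUERIES_PER_SECOND = 10          # rate at which the barrier was observed
NEIGHBOURS = 3                   # peers each node is connected to
QUERY_SHARE_OF_TRAFFIC = 0.25    # queries are about a quarter of all traffic
MODEM_BITS_PER_SECOND = 56_000   # capacity of a 56K link

query_traffic = QUERY_BITS * QUERIES_PER_SECOND * NEIGHBOURS
total_traffic = query_traffic / QUERY_SHARE_OF_TRAFFIC

print("query traffic (bits/s):", query_traffic)
print("total traffic (bits/s):", total_traffic)
print("exceeds 56K link:", total_traffic > MODEM_BITS_PER_SECOND)
```

Changing `QUERIES_PER_SECOND` or `NEIGHBOURS` shows how quickly the load grows: the total scales linearly with both.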

22
Gnutella Studies 3: Communication
"Peer-to-Peer Architecture Case Study: Gnutella
Network," Matei Ripeanu, on-line at
http://people.cs.uchicago.edu/matei/PAPERS/P2P2001.pdf
Studied the topology of Gnutella over several months
and reported two findings
  • Gnutella network shares the benefits and
    drawbacks of a power-law structure
  • - networks that organize themselves so that most
    nodes have a few links and a small number of
    nodes have many
  • - found to show an unexpected degree of
    robustness when facing random node failures.
  • - vulnerable to attacks, e.g. removing a few of
    the super nodes can have a massive effect on the
    function of the network as a whole.
  • Gnutella network topology does not match well
    with the underlying Internet topology leading to
    inefficient use of network bandwidth.
  • He gave 2 suggestions
  • use an agent to monitor network and intervene by
    asking servents to drop/add links to keep the
    topology optimal.
  • replace the Gnutella flooding mechanism with a
    smarter routing and group communication
    mechanism.
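
The first finding (robust against random failures, vulnerable to targeted attacks) can be illustrated with a small simulation. This is not Ripeanu's measurement or data: the sketch assumes a synthetic graph grown by preferential attachment, a common way to obtain a power-law-like degree distribution, and compares the largest surviving connected group after removing 50 random nodes versus the 50 best-connected nodes. All names and sizes are illustrative.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes, favouring busy ones."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # each node appears once per link it has, so busy nodes are picked more often
    for i in range(m + 1):  # start from a small fully connected core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected group among the nodes that survive."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


rng = random.Random(7)  # fixed seed so the run is repeatable
graph = preferential_attachment_graph(500, 2, rng)
failures = 50
random_loss = set(rng.sample(sorted(graph), failures))
hubs = set(sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:failures])

print("largest group after random failures:", largest_component(graph, random_loss))
print("largest group after losing the hubs:", largest_component(graph, hubs))
```

Random failures mostly hit nodes with few links, so the rest stays connected; removing the same number of hubs cuts off many of the nodes that relied on them.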

23
What about other topologies: The Future?
  • Centralized-Hierarchical?
  • Back end tree of information
  • Caching architectures
  • Decentralized-Ring?
  • P2P network of fail-over clusters
  • More ??

24
Closing Remarks
  • Summary
  • Centralized-Decentralized: understand it, from the
    original Gnutella to the new models
  • The role of Reflector nodes
  • Further Information: Distributed Hash Table Models
  • Pastry http://research.microsoft.com/antr/pastry
  • Chord http://www.pdos.lcs.mit.edu/chord/