Title: Peer-to-Peer Applications, Part 2: Peer selection and dynamics of a p2p network
1. Peer-to-Peer Applications, Part 2: Peer selection and dynamics of a p2p network
- CS 7270
- Networked Applications Services
- Lecture-10
2. Reading
- Understanding KaZaA by J. Liang et al.
- Based on 2004 paper
- Measurement-based study
- Focuses on KaZaA (FastTrack architecture), but
most of the observations are important for
general p2p unstructured systems
3. Understanding KaZaA
- Jian Liang
- Rakesh Kumar
- Keith Ross
- Polytechnic University
- Brooklyn, N.Y.
4. Internet Traffic
(cf. CacheLogic)
5. KaZaA/FastTrack Operation
- Top file sharing system
- 3 million active nodes
- Four clients: KaZaA, KaZaA-lite, Grokster and iMesh
- Good availability and scalability
- Proprietary protocol; signaling traffic encrypted
- in contrast with Gnutella and e-mule
6. Purpose of Measurement Study
- Try to understand a highly successful file-sharing system
- Overlay topology and dynamics
- Peer selection
- Index management
7. Existing Tools and Projects
- FastTrack encryption algorithm
- available from a Web site: http://gift-fasttrack.berlios.de/
- KaZaA Media Desktop (KMD) software architecture
- http://kazaasearch.narod.ru/
8. Big Picture of Overlay
- Two layer hierarchy
- Ordinary Node (ON)
- Super Node (SN)
- SNs are generally more powerful machines (CPU,
network bw) and they are NOT behind NATs
9. FastTrack architecture
- Each ON has a parent SN node
- For each shared file, ON uploads to parent SN
- Filename, ContentHash, file descriptors (metadata)
- Parent SN provides ON with SN refresh list
- Up to 200 alive SNs, then stored at ON cache
- For each SN, the list includes IP address, port number, SN workload (defined as ?), freshness, and timestamp
- SNs also exchange SN refresh lists
- Each SN maintains local index for all children ONs
- Each SN maintains TCP connections with other SNs
- Overlay net
- If an SN cannot answer a query, it forwards query to other SN peers
- TTL-limited flooding
- Actual file transfer is directly between peers (not through overlay) using HTTP
- All signaling traffic is encrypted
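The query path on this slide (answer from the local index, otherwise forward with a TTL) can be sketched roughly as below. This is a minimal illustration, not the FastTrack protocol: the SuperNode class, the keyword index, and the flood_query function are all hypothetical names, and the real message format is encrypted and not described here.

```python
from collections import deque

class SuperNode:
    """Hypothetical supernode: a local index plus links to neighbor SNs."""
    def __init__(self, name):
        self.name = name
        self.index = {}       # keyword -> set of (owner ON, filename) from child ONs
        self.neighbors = []   # other SNs reachable over the overlay

    def add_file(self, owner, filename):
        # Index every word of the filename (an assumed, simplistic scheme).
        for word in filename.lower().split():
            self.index.setdefault(word, set()).add((owner, filename))

def flood_query(start, keyword, ttl):
    """TTL-limited flood: an SN forwards only if it cannot answer locally."""
    hits = set()
    seen = {start.name}
    queue = deque([(start, ttl)])
    while queue:
        sn, remaining = queue.popleft()
        local = sn.index.get(keyword.lower(), set())
        hits |= local
        if local or remaining == 0:
            continue          # answered locally, or TTL exhausted
        for nb in sn.neighbors:
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append((nb, remaining - 1))
    return hits

# Three SNs in a chain: a - b - c
a, b, c = SuperNode("a"), SuperNode("b"), SuperNode("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
b.add_file("on1", "blue song")
c.add_file("on2", "blue movie")
```

With this toy overlay, a query for "movie" issued at SN a finds nothing with ttl=1 and reaches SN c only with ttl=2, which is the trade-off TTL-limited flooding makes between coverage and signaling load.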
10. Measurement Apparatus
- KaZaA Sniffing Platform
- KaZaA Probing Tool
11. KaZaA Sniffing Platform
- Poly (Ethernet)
- Home (cable modem)
12. KaZaA Probing Tool
- Campus- and home-based probing
- Probe arbitrary SNs
- Retrieve their SN refresh lists
- Obtain workload of probed SN
13. Signaling Protocol
ON-SN session initial (repeat for 5 SNs)
SN-SN session initial
14. TCP Connections Evolution at instrumented SN node
Poly campus: 4-6 hour measurement
Cable modem: 7-11 hour measurement
15. Some basic calculations
- Estimate total number of SNs, assuming about 3M users (typical in 2004)
- About 25,000-40,000 SNs
- Estimate probability of SN-SN link
- About 0.1%
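The slide gives only the results of these estimates. A back-of-the-envelope version might look like the sketch below. The number of child ONs per SN and the number of SN-SN connections per SN are illustrative assumptions chosen to land in the stated ranges; they are not values given in the slides.

```python
USERS = 3_000_000   # from the slide: about 3M users

# ASSUMPTION: average number of child ONs per SN (not given in the slides).
for children_per_sn in (75, 100, 120):
    num_sns = USERS / children_per_sn
    print(f"{children_per_sn} ONs per SN -> about {num_sns:,.0f} SNs")

def link_probability(sn_degree, num_sns):
    """Chance that two given SNs are directly connected, if each SN keeps
    sn_degree links spread uniformly over the other SNs."""
    return sn_degree / (num_sns - 1)

# ASSUMPTION: about 30 SN-SN connections per SN and 30,000 SNs in total.
p = link_probability(30, 30_000)
print(f"link probability is about {p:.2%}")
```

Under these assumed inputs the SN count falls between 25,000 and 40,000 and the link probability comes out near 0.1%, i.e. the SN overlay is very sparse.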
16. Signaling Sessions Lifetime
- Measured over a period of 12 hours
- Avg duration: 34 mins (ON-SN) and 11 mins (SN-SN)
- 30-40% of connections (both types) last for less than 30 seconds!
- What causes short-lived ON-SN connections?
- What causes short-lived SN-SN connections?
17. Parent selection
- Recall that ON receives a list of 200 SNs from its parent SN
- Then, it can select a new parent
- How would you select the parent SN?
18. SN workload vs. number of connections
7-11 hours: TCP connections evolution
7-11 hours: workload values evolution
19. Peer Selection: the workload of the SN clearly matters
20. Locality in Peer Selection (graphs show percentage of SNs in the SN list having common prefix with child ON and parent SN)
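To make the locality metric concrete, the sketch below computes how many leading bits two IPv4 addresses share and what fraction of an SN list shares a prefix with a given peer. The function names and the 8-bit threshold are assumptions for illustration; the slides do not state which prefix length the graphs use.

```python
import ipaddress

def common_prefix_len(ip_a, ip_b):
    """Number of leading bits shared by two IPv4 addresses (0..32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # The highest set bit of the XOR marks the first differing bit.
    return 32 - (a ^ b).bit_length()

def share_with_common_prefix(own_ip, sn_list, min_bits=8):
    """Fraction of SN addresses sharing at least min_bits leading bits with own_ip.
    ASSUMPTION: min_bits=8 is an arbitrary illustrative threshold."""
    if not sn_list:
        return 0.0
    matches = sum(common_prefix_len(own_ip, sn) >= min_bits for sn in sn_list)
    return matches / len(sn_list)
```

For example, share_with_common_prefix("192.0.2.1", ["192.0.2.129", "198.51.100.1"]) counts only the first SN as local, since the second shares just 5 leading bits.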
21. Peer Selection: it appears that RTT also matters
40% of ON-SN connections have RTT
22. Index Management
1) No index exchange between SNs
2) SN purges metadata of ON as soon as that child disconnects from parent
3) Highly skewed contribution of metadata by different peers
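Points 1 and 2 above suggest a purely local, per-child index at each SN. A minimal sketch of such a structure follows; the class and method names are hypothetical, and substring matching on filenames is an assumed stand-in for whatever matching KaZaA actually performs.

```python
class SuperNodeIndex:
    """Hypothetical in-memory index kept by one SN for its child ONs only."""
    def __init__(self):
        self.by_child = {}   # child ON id -> list of (filename, content_hash)

    def upload(self, child, filename, content_hash):
        # A child ON reports one shared file to its parent SN.
        self.by_child.setdefault(child, []).append((filename, content_hash))

    def disconnect(self, child):
        # Point 2: metadata is purged as soon as the child disconnects.
        self.by_child.pop(child, None)

    def search(self, keyword):
        # Point 1: only local metadata is searched; nothing comes from other SNs.
        kw = keyword.lower()
        return [(child, name, h)
                for child, files in self.by_child.items()
                for name, h in files
                if kw in name.lower()]
```

Keying the index by child makes the purge a single dictionary removal, which fits the highly dynamic ON-SN connections reported earlier.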
23. Firewall evasion and NAT circumvention
- Found only 3.6% of SNs use the default 1214 port number
- Earlier KaZaA clients used this default port, but it was easy for net-admins to block them
- 18,887 SNs (96.3%) use non-default and dynamically assigned port numbers
- How is this done?
- NAT traversal: if peer B is behind NAT, then peer A contacts B's parent, and the latter asks B to initiate connection to A
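The NAT traversal step described in the last bullet is a connection reversal. The toy simulation below walks through it with plain objects instead of sockets; the Peer class and its methods are hypothetical, and it assumes a NATed peer can never accept an inbound connection but can always dial out.

```python
class Peer:
    def __init__(self, name, behind_nat=False, parent=None):
        self.name = name
        self.behind_nat = behind_nat
        self.parent = parent    # parent SN (another Peer), if any
        self.log = []           # inbound connections this peer accepted

    def accept(self, initiator):
        """Inbound connection attempt; fails if this peer is behind a NAT."""
        if self.behind_nat:
            return False
        self.log.append(f"{initiator.name} -> {self.name}")
        return True

    def relay_connect_request(self, child, requester):
        """Run on a parent SN: ask child, over its existing session,
        to open the connection to requester itself."""
        return requester.accept(child)

def connect(a, b):
    """Peer a wants a connection with peer b. Returns the initiator's name, or None."""
    if b.accept(a):
        return a.name           # direct connection works
    if b.parent and b.parent.relay_connect_request(b, a):
        return b.name           # reversed: b dialed out to a
    return None
```

With sn = Peer("SN"), a = Peer("A") and b = Peer("B", behind_nat=True, parent=sn), connect(a, b) returns "B" because B ends up initiating. If both peers are behind NATs, this reversal alone cannot help and connect returns None.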
24. Summary of Results
- 20,000-40,000 active supernodes
- Each SN connects to approx. 0.1% of other SNs
- Highly dynamic connections: over 35% of SN-SN durations are less than 30 sec.
25. Summary of results
- Peer selection uses IP prefix match, workload, RTT and freshness
- No index exchange between SNs, but query forwarding
- Skewed content distribution: 20% of peers provide 70% of metadata for sharing
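The first bullet names four inputs to peer selection but not how they are combined. One possible way to rank candidate parents is sketched below. Everything about the scoring (the weighted sum, the weights, the normalization constants, the field scales) is an assumption made for illustration, not the KaZaA algorithm.

```python
from dataclasses import dataclass

@dataclass
class SNEntry:
    ip: str
    port: int
    workload: int      # lower = less loaded (scale is an assumption)
    freshness: int     # higher = more recently seen (scale is an assumption)
    rtt_ms: float      # measured by the ON; not part of the refresh list
    prefix_bits: int   # leading address bits shared with the ON

def score(e, w_prefix=1.0, w_load=1.0, w_rtt=1.0, w_fresh=1.0):
    """Higher is better. Weights and normalization are illustrative only."""
    return (w_prefix * e.prefix_bits / 32
            - w_load * e.workload / 100
            - w_rtt * min(e.rtt_ms, 500) / 500
            + w_fresh * e.freshness / 100)

def pick_parent(candidates):
    """Return the candidate SN with the highest score."""
    return max(candidates, key=score)

near = SNEntry("192.0.2.10", 2000, workload=20, freshness=90, rtt_ms=10, prefix_bits=24)
far = SNEntry("198.51.100.7", 3000, workload=80, freshness=50, rtt_ms=200, prefix_bits=5)
```

Here pick_parent([far, near]) prefers the nearby, lightly loaded SN. Changing the weights shifts the balance between locality and load, which is the design question the measurements raise.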
26. Design Principles for Unstructured P2P Overlays
- Distributed design
- No infrastructure
- Also helps with legal attacks...
- Exploit heterogeneity
- Hierarchy
- Self organization
- Load balancing based on workload metric
- Explicit locality awareness
- Shuffle connections in core overlay
27. Design Principles for Unstructured P2P Overlays
- Properly designed gossip mechanisms
- peers have a fresh list of SNs
- Firewall circumvention
- dynamic port numbers
- improves availability
- NAT circumvention
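As an illustration of the gossip principle (peers keep a fresh list of SNs), the sketch below merges a received SN refresh list into a local cache. The 200-entry limit comes from the earlier slides; representing the list as a map from (ip, port) to timestamp and evicting the oldest entries first are assumptions.

```python
MAX_ENTRIES = 200   # from the slides: up to 200 SNs in the refresh list

def merge_refresh_lists(cache, received, limit=MAX_ENTRIES):
    """Merge two {(ip, port): timestamp} maps, keeping the newest timestamp
    per SN and at most limit of the most recently seen SNs."""
    merged = dict(cache)
    for sn, ts in received.items():
        if ts > merged.get(sn, float("-inf")):
            merged[sn] = ts
    # ASSUMPTION: when the cache is full, the oldest entries are dropped.
    newest = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return dict(newest)
```

Each exchange can only raise the timestamps a peer holds, so repeated gossip rounds push stale SNs out of the cache, which matters given how short-lived many connections are.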