Title: Data in P2P Systems
1Data in P2P Systems
- Deepak Verma
- Sanjay Prasad
2What are P2P systems
- Every node is designed to provide some service that helps other nodes in the network obtain service.
- The main goal of peer-to-peer networking is to allow users to share files without putting them on central servers. P2P systems are used mainly for file sharing.
- P2P is not a new concept: IP routers are peer to peer.
3Client Server vs Peer-to-Peer
4P2P System Goals
- Cost sharing/reduction.
- Improved scalability/reliability.
- Resource aggregation and interoperability
- Increased autonomy
- Anonymity/privacy
- Dynamism
- Enabling ad-hoc communication and collaboration.
5Computer system Taxonomy
- Computer Systems
  - Centralized Systems (mainframes, SMPs, workstations)
  - Distributed Systems
    - Client-server
      - Flat
      - Hierarchical
    - Peer to Peer
      - Pure
      - Hybrid
6P2P Taxonomy
- P2P systems
  - Distributed computing
  - File sharing
  - Collaboration
  - Platforms
7
- Parallelizable: Parallelizable P2P applications split a large task into smaller sub-pieces that can execute in parallel over a number of independent peer nodes.
- Content and file management: Content and file management P2P applications focus on storing information on and retrieving information from various peers in the network.
8
- Collaborative: Collaborative P2P applications allow users to collaborate in real time without relying on a central server to collect and relay information.
9P2P System Architecture
- Application-specific Layer: tools, applications, services
- Class-specific Layer: scheduling, meta-data, messaging, management
- Robustness Layer: security, resource aggregation, reliability
- Group Management Layer: discovery, locating and routing
- Communication Layer: communication
10
- Communication: The fundamental challenge of communication in a P2P community is overcoming the problems associated with the dynamic nature of peers.
- Group Management: This includes discovery of other peers in the community, and location and routing between those peers.
- Robustness: Three main components are essential to maintaining robust P2P systems: security, resource aggregation, and reliability.
11
- Class specific: Application-specific components abstract functionality from each class of P2P application.
- Application specific: Tools, applications, and services implement application-specific functionality, such as content and file management.
12P2P system characteristics
- Systems discover topology and maintain it.
- Systems are neither client nor server.
- Systems continually talk to each other.
- They are inherently fault tolerant.
- Systems are autonomous.
13P2P system characteristics
- Systems have no distinguished role.
- No single point of bottleneck or failure.
- Routing will depend on service and data.
- Some applications (like Napster) are a mix of P2P
and centralized systems.
14P2P file sharing vs. web resource sharing
- For an ordinary user, a P2P file-sharing system is easier to set up than maintaining a web server. Applications like Napster provide an easy-to-use interface.
- P2P users can share specific types of content, such as music and video.
- Massive storage is possible because of distributed sharing.
15P2P Architecture Classification
- Centralized service location (CSL)
  - Napster
- Distributed service location with flooding (DSLF)
  - Gnutella
- Distributed service location with hashing (DSLH)
  - CAN, Pastry, Tapestry, Chord
16Centralized Service Architecture
17Distributed Search/Flooding
18Distributed Search/Flooding
19Distributed search with hash table
20Distributed search with hash table
21Comparison
22Comparison
- CSL has a service bottleneck.
- DSLF occasionally fails to find a file.
- DSLH is more scalable.
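The occasional failure of flooding-based search to find a file can be illustrated with a small simulation. This is a minimal sketch, not Gnutella's actual protocol: the overlay graph, the `flood_search` helper, the file name, and the TTL (time-to-live) values are all assumptions made for illustration.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Flood a query breadth-first from `start`; each hop uses up one TTL unit.

    Returns the set of peers holding `wanted` that the query actually reached.
    """
    seen = {start}
    hits = set()
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in files.get(peer, set()):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for nxt in neighbors[peer]:
            if nxt not in seen:  # peers drop queries they have already seen
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits


# Hypothetical overlay: a chain A - B - C - D, with the file stored only at D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"song.mp3"}}

found_short = flood_search(neighbors, files, "A", "song.mp3", ttl=2)
found_long = flood_search(neighbors, files, "A", "song.mp3", ttl=3)
```

With a TTL of 2 the query stops at C and misses the copy at D, even though the file exists in the network; with a TTL of 3 it is found. A larger TTL improves comprehensiveness at the cost of more messages.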
23Data-Sharing P2P Systems Open Problems
24Implementation Choices
- Topology
  - How elements are put together
  - From free to rigid topologies
- Data and meta-data placement
  - Gnutella, super-peer networks, Chord
- Message routing
- Query language
  - Keyword-based (less expressive), SQL-like (more expressive)
25Resulting properties
- Autonomy
  - How, to whom, and what to share, how to answer, etc.
- Robustness
  - Maintain the quality of searches in the presence of failures
- Efficiency
  - Absolute resources consumed (processing power, disk storage, bandwidth, etc.)
  - More efficiency means higher throughput
- Accuracy of answers
  - Depends mostly on the query language
- Comprehensiveness
  - Sometimes not reachable (partial search)
26Research Challenges
- Autonomy/Efficiency correlation
- Autonomy/Robustness correlation
- Finding techniques to decouple them, or the best tradeoffs
- More autonomy, but more complex and sophisticated search algorithms
- Data and meta-data replication techniques
27Research Challenges, contd
- Quality of Service (QoS)
  - Different metrics to measure (number of results, response time, comprehensiveness, etc.)
  - Problem of achieving the required QoS as efficiently as possible
  - E.g., number of results is important for QoS in Gnutella: the QoS/efficiency tradeoff
  - Problem of making QoS invariant to changes in other factors (topology, number of nodes, etc.)
28The Lookup Problem
- The lookup problem is simple to state: given a data item X stored at some dynamic set of nodes in the system, find it.
- One approach is to maintain a central database that maps a file name to the locations of the servers that store the file.
- The traditional approach to achieving scalability is to use hierarchy.
- Symmetric lookup algorithms: unlike the hierarchy, no node is more important than any other node as far as the lookup process is concerned.
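The central-database approach can be sketched in a few lines. This is only an illustration under stated assumptions: the `CentralDirectory` class, its method names, and the peer addresses are hypothetical and do not describe any real system's API.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical central index mapping a file name to the peers storing it."""

    def __init__(self):
        self._locations = defaultdict(set)

    def publish(self, file_name, peer):
        """Record that `peer` stores `file_name`."""
        self._locations[file_name].add(peer)

    def remove_peer(self, peer):
        """Forget a peer that left the system (the node set is dynamic)."""
        for holders in self._locations.values():
            holders.discard(peer)

    def lookup(self, file_name):
        """Return a copy of the set of peers currently known to store the file."""
        return set(self._locations.get(file_name, set()))


directory = CentralDirectory()
directory.publish("song.mp3", "10.0.0.1:6699")  # made-up peer addresses
directory.publish("song.mp3", "10.0.0.2:6699")
before = directory.lookup("song.mp3")
directory.remove_peer("10.0.0.1:6699")
after = directory.lookup("song.mp3")
```

Every publish and lookup goes through this one object, which is why the approach is simple but makes the directory a bottleneck and a single point of failure.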
29Distributed Hash Table (DHT)
- A hash-table interface is an attractive foundation for a distributed lookup algorithm because it places few constraints on the structure of keys or the values they name.
- The main requirements are that data be identified using unique numeric keys, and that nodes be willing to store keys for each other.
30Distributed Hash Table (cont.)
- A DHT implements just one operation: lookup(key) yields the network location of the node currently responsible for the given key.
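A minimal sketch of the lookup(key) interface follows. It assumes a successor-style assignment (a key belongs to the first node whose ID equals or follows the key's ID) and a single object that knows every node, which a real DHT avoids. The `ToyDHT` class, the 16-bit ID space, the use of SHA-1, and the addresses are all illustrative assumptions.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumed small ID space, chosen only for readability


def to_id(name):
    """Hash a string (key or node address) to a numeric ID."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyDHT:
    """Global-knowledge stand-in for a DHT: maps a key to the responsible node."""

    def __init__(self, node_addresses):
        self._ring = sorted((to_id(addr), addr) for addr in node_addresses)
        self._ids = [node_id for node_id, _ in self._ring]

    def lookup(self, key):
        """Return the address of the node currently responsible for `key`."""
        # First node ID >= key ID; wrap to the start of the ring if none.
        index = bisect_left(self._ids, to_id(key)) % len(self._ring)
        return self._ring[index][1]


addresses = ["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"]
dht = ToyDHT(addresses)
owner = dht.lookup("report.pdf")
```

The caller only ever sees `lookup`; how keys are mapped to nodes and how the request is routed stay hidden behind that one operation.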
31Distributed Hash Table (cont.)
- Mapping keys to nodes in a load-balanced way.
- Forwarding a lookup for a key to an appropriate node.
- Distance function.
- Building routing tables adaptively.
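To make the distance-function and forwarding issues concrete, here is one possible sketch that uses clockwise distance on a ring of IDs (the style of distance Chord uses). The 3-bit ring size and the helper names `clockwise_distance` and `next_hop` are assumptions for illustration; other DHTs define distance differently.

```python
RING_SIZE = 2 ** 3  # assumed 3-bit ID space: IDs 0..7


def clockwise_distance(src, dst, ring_size=RING_SIZE):
    """Number of steps from `src` to `dst` going clockwise around the ring."""
    return (dst - src) % ring_size


def next_hop(known_nodes, current, key, ring_size=RING_SIZE):
    """Pick the known node that gets closest to `key` without passing it.

    Returns `current` if no known node makes progress toward the key.
    """
    to_key = clockwise_distance(current, key, ring_size)
    best, best_remaining = current, to_key
    for node in known_nodes:
        travelled = clockwise_distance(current, node, ring_size)
        if 0 < travelled <= to_key:  # the node lies between us and the key
            remaining = to_key - travelled
            if remaining < best_remaining:
                best, best_remaining = node, remaining
    return best


hop_for_6 = next_hop([1, 3], current=0, key=6)
hop_for_2 = next_hop([1, 3], current=0, key=2)
```

The distance is asymmetric: going from 6 to 1 takes 3 steps, while going from 1 to 6 takes 5. Forwarding to the known node with the smallest remaining distance is what lets each hop make progress toward the key.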
32P2P Algorithms
- Routing in One Dimension
  - Chord: skiplist-like routing
  - Pastry: tree-like data structure
- Routing in Multiple Dimensions
  - Content-Addressable Network (CAN)
33Example of Chord
34Lookup in Chord
- Each node in Chord has a finger table containing the IP address of a node halfway around the ID space from it, a quarter of the way, and so forth in powers of two.
- A node forwards a query for key k to the node in its finger table with the highest ID not exceeding k. The node responsible for k, the first node whose ID equals or follows k in the ID space, is called the successor of k.
35Lookup in Chord (cont.)
- The power-of-two structure of the finger table ensures that the node can always forward the query at least half of the remaining ID-space distance to k.
- As a result, Chord lookups use O(log N) messages to resolve a query.
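The finger-table routing described above can be sketched as a small simulation. It assumes finger tables that are computed with global knowledge and are always correct, a 6-bit ID space, and an arbitrary set of ten node IDs; the `ChordRing` class and its method names are illustrative, not a reference implementation.

```python
from bisect import bisect_left


class ChordRing:
    """Toy Chord ring with globally computed (always correct) finger tables."""

    def __init__(self, node_ids, bits):
        self.bits = bits
        self.size = 2 ** bits
        self.nodes = sorted(node_ids)
        # finger[i] of node n is successor(n + 2^i); finger[0] is n's successor.
        self.fingers = {
            n: [self.successor((n + 2 ** i) % self.size) for i in range(bits)]
            for n in self.nodes
        }

    def successor(self, ident):
        """First node whose ID equals or follows `ident` on the ring."""
        return self.nodes[bisect_left(self.nodes, ident) % len(self.nodes)]

    def _strictly_between(self, x, a, b):
        """True if x lies in the open ring interval (a, b)."""
        span = (b - a) % self.size or self.size
        return 0 < (x - a) % self.size < span

    def closest_preceding_finger(self, node, key):
        # Scan from the farthest finger down to the nearest one.
        for finger in reversed(self.fingers[node]):
            if self._strictly_between(finger, node, key):
                return finger
        return node

    def lookup(self, start, key):
        """Return (responsible node, list of nodes the query visited)."""
        node, path = start, [start]
        while True:
            succ = self.fingers[node][0]
            span = (succ - node) % self.size or self.size
            offset = (key - node) % self.size or self.size
            if offset <= span:  # key lies in (node, succ]: succ is responsible
                if succ != node:
                    path.append(succ)
                return succ, path
            node = self.closest_preceding_finger(node, key)
            path.append(node)


ring = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56], bits=6)
owner, path = ring.lookup(start=8, key=54)  # -> (56, [8, 42, 51, 56])
```

In this run the query from node 8 for key 54 needs three hops on a ten-node ring. Each forwarding step jumps to the farthest finger that still precedes the key, which is what keeps the number of hops logarithmic in the number of nodes.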
36Lookup in Chord (cont.)
- Chord ensures correct lookups in the face of node failures and arrivals by using a successor list: each node keeps track of the IP addresses of the next r nodes immediately after it in ID space.
- This allows a query to make incremental progress in ID space even if many finger-table entries turn out to point to failed or nonexistent nodes.
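The role of the successor list can be illustrated with a short sketch. The node IDs, the values of r, and the helper names `build_successor_lists` and `first_live_successor` are assumptions; failure detection is reduced to membership in an `alive` set.

```python
def build_successor_lists(node_ids, r):
    """Give every node the IDs of the next r nodes after it on the ring."""
    ring = sorted(node_ids)
    count = len(ring)
    length = min(r, count - 1)  # a node never lists itself
    return {
        node: [ring[(i + step) % count] for step in range(1, length + 1)]
        for i, node in enumerate(ring)
    }


def first_live_successor(successor_list, alive):
    """Return the nearest successor that is still alive, or None if all failed."""
    for candidate in successor_list:
        if candidate in alive:
            return candidate
    return None


nodes = [1, 8, 14, 21, 32]
alive = {1, 21, 32}  # nodes 8 and 14 have failed

with_r3 = first_live_successor(build_successor_lists(nodes, r=3)[1], alive)
with_r2 = first_live_successor(build_successor_lists(nodes, r=2)[1], alive)
```

With r = 3, node 1 can skip its two failed successors and continue at node 21; with r = 2 its whole list is dead and it has nowhere to forward. A longer list tolerates more simultaneous failures at the cost of more state to maintain.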
37Lookup in Chord (cont.)
[Figure: finger tables and key locations for a network with nodes 0, 1, and 3, and keys 1, 2, and 6. Successor(2) = 3; successor(6) = 0.]
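The finger tables and key locations for this example can be recomputed with a short script. It assumes a 3-bit ID space (IDs 0 to 7), which the slide does not state explicitly, and that finger i of node n is the successor of (n + 2^i) mod 8; the function names are illustrative.

```python
from bisect import bisect_left

M = 3  # assumed number of ID bits, giving IDs 0..7
NODES = [0, 1, 3]
KEYS = [1, 2, 6]


def successor(ident, nodes=NODES):
    """First node whose ID equals or follows `ident`, wrapping around the ring."""
    ring = sorted(nodes)
    return ring[bisect_left(ring, ident) % len(ring)]


def finger_table(node, nodes=NODES, m=M):
    """List of (start, successor(start)) pairs for finger entries 0..m-1."""
    starts = [(node + 2 ** i) % (2 ** m) for i in range(m)]
    return [(start, successor(start, nodes)) for start in starts]


tables = {node: finger_table(node) for node in NODES}
key_locations = {key: successor(key) for key in KEYS}

for node, table in tables.items():
    print(f"node {node}: {table}")
print(f"key locations: {key_locations}")
```

Under these assumptions key 1 is stored at node 1, key 2 at node 3, and key 6 wraps around the ring to node 0.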
38Node Joins Chord
- When a node n joins the network:
  - Initialize the predecessor and fingers of node n.
  - Update the fingers and predecessors of existing nodes to reflect the addition of n.
  - Copy to n all keys for which node n has become the successor.
39Example for Addition
Finger tables and key locations after node 6
joins.
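The key-transfer part of a join can be sketched for this example. The code assumes the same network as the earlier example (nodes 0, 1, and 3 holding keys 1, 2, and 6) and models only the movement of keys, not the finger or predecessor updates; `place_keys` and `join` are hypothetical helper names.

```python
from bisect import bisect_left


def successor(ident, nodes):
    """First node whose ID equals or follows `ident`, wrapping around the ring."""
    ring = sorted(nodes)
    return ring[bisect_left(ring, ident) % len(ring)]


def place_keys(keys, nodes):
    """Assign every key to its successor node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[successor(key, nodes)].append(key)
    return placement


def join(placement, new_node):
    """Return the placement after `new_node` joins.

    Only the node that used to be the successor of `new_node` can lose keys.
    """
    old_nodes = list(placement)
    all_nodes = old_nodes + [new_node]
    old_owner = successor(new_node, old_nodes)
    updated = {node: list(keys) for node, keys in placement.items()}
    updated[new_node] = []
    for key in placement[old_owner]:
        if successor(key, all_nodes) == new_node:
            updated[old_owner].remove(key)
            updated[new_node].append(key)
    return updated


before = place_keys([1, 2, 6], [0, 1, 3])
after = join(before, 6)
```

In this sketch only key 6 moves, from node 0 to the new node 6; keys 1 and 2 stay where they were, which shows that a join disturbs only the keys of one neighboring node.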
40Ad-hoc networks and Peer to Peer
- Wireless ad hoc networks have many similarities to peer-to-peer systems.
- No a priori knowledge.
- No given infrastructure.
41References
- www.oreilly.com
- P2P Journal
- P2P tutorial, Don Towsley