Data in P2P Systems presentation

1
Data in P2P Systems

Deepak Verma
Sanjay Prasad

2
What are P2P systems

Every node is designed to provide some service
that helps other nodes in the network to get
service.
The main goal of peer-to-peer networking is to
allow users to share files without putting them
on central servers. P2P systems are used mainly
for file sharing.
P2P is not a new concept. IP routers are peer to
peer.

3
Client Server vs Peer-to-Peer
4
P2P System Goals

Cost sharing/reduction.
Improved scalability/reliability.
Resource aggregation and interoperability
Increased autonomy
Anonymity/privacy
Dynamism
Enabling ad-hoc communication and collaboration.

5
Computer System Taxonomy
Computer Systems
  Centralized Systems (mainframes, SMPs,
  workstations)
  Distributed Systems
    Client-Server
      Flat
      Hierarchical
    Peer-to-Peer
      Pure
      Hybrid
6
P2P Taxonomy
P2P systems
  Distributed computing
  File sharing
  Collaboration
  Platforms
7

Parallelizable: Parallelizable P2P applications
split a large task into smaller sub-pieces that
can execute in parallel over a number of
independent peer nodes.
Content and file management: Content and file
management P2P applications focus on storing
information on and retrieving information from
various peers in the network.

Collaborative: Collaborative P2P applications
allow users to collaborate, in real time, without
relying on a central server to collect and relay
information.

9
P2P System Architecture
Application-specific Layer: tools, applications,
services
Class-specific Layer: scheduling, meta-data,
messaging, management
Robustness Layer: security, resource aggregation,
reliability
Group Management Layer: discovery, locating and
routing
Communication Layer: communication
10

Communication: The fundamental challenge of
communication in a P2P community is overcoming
the problems associated with the dynamic nature
of peers.
Group Management: This includes discovery of
other peers in the community and location and
routing between those peers.
Robustness: There are three main components that
are essential to maintaining robust P2P systems:
security, resource aggregation, and reliability.

Class specific: Application-specific components
abstract functionality from each class of P2P
application.
Application specific: Tools, applications, and
services implement application-specific
functionality, such as content and file
management.

12
P2P system characteristics

Systems discover topology and maintain it.
Systems are neither client nor server.
Systems continually talk to each other.
They are inherently fault tolerant.
Systems are autonomous.

13
P2P system characteristics

Systems have no distinguished role.
No single point of bottleneck or failure.
Routing will depend on service and data.
Some applications (like Napster) are a mix of P2P
and centralized systems.

14
P2P file sharing vs. web resource sharing

For an ordinary user, a P2P file-sharing system
is easier to set up than a web server is to
maintain.
Applications like Napster provide an easy-to-use
interface.
P2P users can share specific types of content,
such as music and video.
Massive storage is possible because of
distributed sharing.

15
P2P Architecture Classification

Centralized service location (CSL)
Napster
Distributed service location with flooding (DSLF)
Gnutella
Distributed service location with hashing (DSLH)
CAN, Pastry, Tapestry, Chord

16
Centralized Service Architecture
17
Distributed Search/Flooding
18
Distributed Search/Flooding
19
Distributed search with hash table
20
Distributed search with hash table
21
Comparison
22
Comparison

CSL has a service bottleneck
DSLF occasionally fails to find a file
DSLH is more scalable
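
The occasional DSLF failure can be illustrated with a minimal sketch of flooding search. It assumes a made-up overlay graph, a hypothetical flood_search helper, and a hop limit (TTL) in the style of Gnutella; none of these names come from the slides.

```python
from collections import deque

def flood_search(neighbors, files, start, filename, ttl):
    """Breadth-first flood: return the peers holding `filename`
    that are reachable from `start` within `ttl` hops."""
    seen = {start}
    hits = set()
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if filename in files.get(peer, ()):
            hits.add(peer)
        if hops_left == 0:
            continue  # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:  # drop duplicate queries
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits

# Hypothetical overlay: a chain A - B - C - D, with the file only on D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"song.mp3"}}

found_far = flood_search(neighbors, files, "A", "song.mp3", ttl=3)
found_near = flood_search(neighbors, files, "A", "song.mp3", ttl=2)
```

With a TTL of 3 the query reaches D and finds the file; with a TTL of 2 it stops at C and reports nothing, even though the file exists in the network.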

23
Data-Sharing P2P Systems Open Problems
24
Implementation Choices

Topology
How elements are put together
From free to rigid topologies
Data and meta-data placement
Gnutella, super-peer networks, Chord
Message routing
Query language
Keyword-based (less expressive), SQL-like (more
expressive)

25
Resulting properties

Autonomy
How, with whom, and what to share; how to answer;
etc.
Robustness
Maintain the quality of searches in the
presence of failures
Efficiency
Absolute resources consumed (processing power,
disk storage, bandwidth, etc.)
Higher efficiency means higher throughput
Accuracy of answers
Depends mostly on the query language
Comprehensiveness
Sometimes not achievable (partial search)

26
Research Challenges

Autonomy/Efficiency correlation
Autonomy/Robustness correlation
Finding techniques to decouple them or to reach
the best tradeoffs
More autonomy requires a more complex,
sophisticated search algorithm
Data and meta-data replication techniques

27
Research Challenges, contd

Quality of Service (QoS)
Different metrics to measure (number of results,
response time, comprehensiveness, etc)
Problem of achieving the required QoS as
efficiently as possible
E.g., the number of results is important for QoS
in Gnutella: the QoS/efficiency tradeoff
Problem of making QoS invariant to changes in
other factors (topology, number of nodes, etc.)

28
The Lookup Problem

The lookup problem is simple to state: given a
data item X stored at some dynamic set of nodes
in the system, find it.
One approach is to maintain a central database
that maps a file name to the locations of the
servers that store the file.
The traditional approach to achieving scalability
is to use hierarchy.
Symmetric lookup algorithms. Unlike the
hierarchy, no node is more important than any
other node as far as the lookup process is
concerned.
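
The central-database approach can be sketched in a few lines. This is an illustrative toy, not Napster's actual protocol: the CentralIndex class, its method names, and the peer addresses are all hypothetical.

```python
from collections import defaultdict

class CentralIndex:
    """Hypothetical central database: file name -> set of peer addresses."""

    def __init__(self):
        self._locations = defaultdict(set)

    def publish(self, filename, peer):
        """Record that `peer` stores `filename`."""
        self._locations[filename].add(peer)

    def unpublish_peer(self, peer):
        """Forget a departing peer; the set of nodes is dynamic."""
        for peers in self._locations.values():
            peers.discard(peer)

    def lookup(self, filename):
        """Return the known locations of `filename`, possibly empty."""
        return sorted(self._locations.get(filename, ()))

index = CentralIndex()
index.publish("x.dat", "10.0.0.1:6699")
index.publish("x.dat", "10.0.0.2:6699")
before = index.lookup("x.dat")
index.unpublish_peer("10.0.0.1:6699")
after = index.lookup("x.dat")
```

Every lookup goes through the one index, which is what makes this design simple and also what makes it a bottleneck and a single point of failure.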

29
Distributed Hash Table (DHT)

A hash-table interface is an attractive
foundation for a distributed lookup algorithm
because it places few constraints on the
structure of keys or the values they name.
The main requirements are that data be identified
using unique numeric keys, and that nodes be
willing to store keys for each other.

30
Distributed Hash Table (cont.)

A DHT implements just one operation: lookup(key)
yields the network location of the node currently
responsible for the given key.
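
A minimal sketch of the lookup(key) interface follows. It assumes SHA-1 hashing into a 16-bit ID space and a "first node at or after the key" rule; the ToyDHT class and the addresses are hypothetical. Unlike a real DHT, this toy keeps a global view of all nodes in one place, so it shows only the interface and the key-to-node mapping, not distributed routing.

```python
import hashlib
from bisect import bisect_left

M = 16  # assumed ID-space size in bits (illustrative only)

def numeric_id(name: str) -> int:
    """Hash an arbitrary string to a numeric ID in [0, 2**M)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class ToyDHT:
    def __init__(self, addresses):
        # Sorted list of (node ID, network address) pairs.
        self.ring = sorted((numeric_id(a), a) for a in addresses)

    def lookup(self, key: str) -> str:
        """Return the address of the node responsible for `key`."""
        ids = [node_id for node_id, _ in self.ring]
        i = bisect_left(ids, numeric_id(key))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

addresses = ["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"]
dht = ToyDHT(addresses)
owner = dht.lookup("song.mp3")
```

Because keys and node addresses are hashed into the same ID space, any string can serve as a key, and keys spread across nodes without any node choosing them.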

31
Distributed Hash Table (cont.)

Mapping keys to nodes in a load-balanced way.
Forwarding a lookup for a key to an appropriate
node.
Distance function.
Building routing tables adaptively.

32
P2P Algorithms

Routing in One Dimension
Chord: skiplist-like routing
Pastry: tree-like data structure
Routing in Multiple Dimensions
Content-Addressable Network (CAN)

33
Example of Chord
34
Lookup in Chord

Each node in Chord has a finger table containing
the IP address of a node halfway around the ID
space from it, a quarter of the way, and so forth
in powers of two.
A node forwards a query for k to the node in its
finger table with the highest ID not exceeding k.
The node responsible for k, the first node whose
ID equals or follows k, is called the successor
of k.
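
The finger-table lookup can be sketched for a 3-bit ID space with nodes 0, 1, and 3. This is a simplified, single-process illustration: the helper names (successor, finger_table, dist, lookup) are hypothetical, finger i is assumed to point at successor(n + 2**i), and the global NODES list stands in for state that real Chord nodes learn over the network.

```python
M = 3                 # bits in the ID space, so IDs are 0..7
SIZE = 2 ** M
NODES = [0, 1, 3]     # node IDs of the example network

def successor(k):
    """First node whose ID equals or follows k on the ring."""
    k %= SIZE
    return min((n for n in NODES if n >= k), default=min(NODES))

def finger_table(n):
    """Finger i points at successor(n + 2**i), for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]

def dist(a, b):
    """Clockwise ID-space distance from a to b."""
    return (b - a) % SIZE

def lookup(start, k):
    """Return (node responsible for k, list of nodes visited)."""
    n, path = start, [start]
    while True:
        if dist(n, k) == 0:
            return n, path            # n's own ID is k
        succ = finger_table(n)[0]
        if succ == n or dist(n, k) <= dist(n, succ):
            return succ, path         # k lies in (n, succ]
        # Forward to the finger closest to k that does not pass k.
        n = max((f for f in finger_table(n) if 0 < dist(n, f) <= dist(n, k)),
                key=lambda f: dist(n, f))
        path.append(n)

fingers_of_0 = finger_table(0)
owner, path = lookup(1, 6)
```

Here node 1 has no finger between 3 and 6, so it forwards to node 3, which sees that key 6 falls between itself and its immediate successor 0.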

35
Lookup in Chord (cont.)

The power-of-two structure of the finger table
ensures that the node can always forward the
query at least half of the remaining ID-space
distance to k.
As a result, Chord lookups use O(log N) messages
to resolve a query.
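
The step from "halves the distance" to O(log N) can be written out as a short, informal derivation. The symbols are assumptions introduced here: m is the number of bits in the ID space, N the number of nodes, and d_i the remaining ID-space distance to k after i hops.

```latex
\begin{align*}
d_0 &< 2^m, \\
d_{i+1} &\le \tfrac{1}{2}\, d_i
  \quad\Longrightarrow\quad d_i < \frac{2^m}{2^i}, \\
d_i &< \frac{2^m}{N} \quad \text{once } i \ge \log_2 N .
\end{align*}
```

If node IDs are spread roughly evenly around the ring, 2^m / N is the typical gap between adjacent nodes, so after about log_2 N hops the query is within roughly one node spacing of k and only a few more hops are expected.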

36
Lookup in Chord (cont.)

Chord ensures correct lookups in the face of node
failures and arrivals using a successor list:
each node keeps track of the IP addresses of the
next r nodes immediately after it in ID space.
This allows a query to make incremental progress
in ID space even if many finger-table entries
turn out to point to failed or nonexistent nodes.
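
The role of the successor list can be sketched as follows. The example assumes a hypothetical ring with nodes 0, 1, 3, and 6, a list length of r = 2, and a walk that uses successor lists only (no finger tables); the helper names are illustrative and not part of Chord's specification.

```python
SIZE = 8                # 3-bit ID space
NODES = [0, 1, 3, 6]    # hypothetical ring
R = 2                   # assumed successor-list length r

def successor_list(n, r=R):
    """The next r nodes immediately after n in ID space."""
    ring = sorted(NODES)
    i = ring.index(n)
    return [ring[(i + j) % len(ring)] for j in range(1, r + 1)]

def next_live(n, failed):
    """First entry in n's successor list that has not failed, or None."""
    for s in successor_list(n):
        if s not in failed:
            return s
    return None

def walk(start, k, failed):
    """Find the live node responsible for k using successor lists only."""
    n, path = start, [start]
    while True:
        if (k - n) % SIZE == 0:
            return n, path                 # n's own ID is k
        nxt = next_live(n, failed)
        if nxt is None:
            return None, path              # all r successors have failed
        if (k - n) % SIZE <= (nxt - n) % SIZE:
            return nxt, path               # k lies in (n, nxt]
        n = nxt
        path.append(n)

skips_one = walk(1, 2, failed={3})
two_hops = walk(0, 5, failed={1})
stuck = walk(0, 2, failed={1, 3})
```

With one failed neighbor the query still advances, one live successor at a time. Progress stops only when all r entries of a node's successor list have failed at once, which is why a larger r tolerates more simultaneous failures.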

37
Lookup in Chord (cont.)
Successor(2) = 3
Successor(6) = 0
[Figure: finger tables and key locations for a
net with nodes 0, 1, and 3, and keys 1, 2, and 6.]
38
Node Joins Chord

When a node n joins the network:
Initialize the predecessor and fingers of node n.
Update the fingers and predecessors of existing
nodes to reflect the addition of n.
Copy to n all keys for which node n has become
the successor.
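
The last of these steps, the key transfer, can be sketched on its own. The example assumes a 3-bit ID space with nodes 0, 1, and 3 holding keys 1, 2, and 6, and a new node 6; the helpers (successor, place_keys, join) are hypothetical, and the finger and predecessor updates are deliberately left out.

```python
SIZE = 8  # 3-bit ID space

def successor(k, nodes):
    """First node in `nodes` whose ID equals or follows k on the ring."""
    k %= SIZE
    return min((n for n in nodes if n >= k), default=min(nodes))

def place_keys(keys, nodes):
    """Store every key at its successor node."""
    store = {n: set() for n in nodes}
    for k in keys:
        store[successor(k, nodes)].add(k)
    return store

def join(store, n):
    """Add node n and move to it the keys it is now the successor of."""
    old_nodes = list(store)
    owner = successor(n, old_nodes)   # n's successor held those keys so far
    all_nodes = old_nodes + [n]
    moved = {k for k in store[owner] if successor(k, all_nodes) == n}
    store[owner] -= moved
    store[n] = moved
    return moved

store = place_keys([1, 2, 6], [0, 1, 3])
initial = {n: set(keys) for n, keys in store.items()}
moved = join(store, 6)
```

Only the new node's immediate successor gives up keys: key 6 moves from node 0 to node 6, while keys 1 and 2 stay where they were.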

39
Example for Addition
Finger tables and key locations after node 6
joins.
40
Ad-hoc networks and Peer to Peer

Wireless ad-hoc networks have many similarities to
peer-to-peer systems.
No a priori knowledge.
No given infrastructure.
