Title: Computer networks
1 Computer networks
- Lecture 11: P2P, Semantic P2P
- Prof. Younghee Lee
2 Peer-to-Peer?
- Centralized server
- Distributed server
- Client-server paradigm
- Flat: RPC
- Hierarchical: DNS, mount
- Peer-to-peer paradigm
- Each peer is both a client and a transient server (the easy part)
- How does a peer determine which peers have the desired content? (the difficult part)
- Connect to peers that have copies of the desired object
- A dynamic member list makes it more difficult
- Pure: Gnutella, Chord
- Hybrid: Napster, Groove
- Other challenges
- Scalability: up to hundreds of thousands or millions of machines
- Dynamicity: machines can come and go at any time
3 P2P file sharing
- Napster
- Centralized, sophisticated search
- Client-server (C-S) search
- Point-to-point file transfer
- Gnutella
- Decentralized directory
- Flooding, TTL, unreachable nodes
- FastTrack (KaZaA)
- Heterogeneous peers
- Freenet
- Anonymity, caching, replication
4 Routing: Structured Approaches
- Goal: make sure that an identified item (file) is always found in a reasonable number of steps
- Abstraction: a distributed hash table (DHT) data structure
- insert(id, item)
- item = query(id)
- Note: an item can be anything: a data object, document, file, or a pointer to a file
- Proposals
- CAN (ICIR/Berkeley)
- Chord (MIT/Berkeley)
- Pastry (Rice)
- Tapestry (Berkeley)
5 High-level idea: Indirection
- Indirection in space
- Logical IDs (Content based)
- Routing to those IDs
- Content addressable network
- Tolerant of nodes joining and leaving the network
- Indirection in time
- Scheme to temporally decouple send and receive
- Soft state
- publisher requests TTL on storage
- Distributed Hash Table
6 Distributed Hash Table (DHT)
- Hash table
- Data structure that maps keys to values
- DHT
- A hash table, but spread across the Internet
- Interface
- insert(key, value)
- lookup(key)
- Every DHT node supports a single operation
- Given a key as input, route messages toward the node holding that key
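The following is a minimal Python sketch of that interface, with illustrative names only (key_id and DHTNode are not from any particular system): a hash maps the key into a flat identifier space, and each node keeps the slice of the table it is responsible for; the actual routing step is elided.

```python
import hashlib

M = 2 ** 16  # size of the identifier space (an assumption for this sketch)

def key_id(key: str) -> int:
    """Map an arbitrary key into the identifier space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % M

class DHTNode:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}  # this node's slice of the global table

    def insert(self, key: str, value) -> None:
        # A real DHT would route this message toward the node owning key_id(key).
        self.store[key_id(key)] = value

    def lookup(self, key: str):
        return self.store.get(key_id(key))
```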
7 DHT in action: put()
- insert(K1, V1)
- Operation: take the key as input, route the message to the node holding that key
8 DHT in action: get()
- retrieve(K1)
- Operation: take the key as input, route the message to the node holding that key
9 Routing: Chord
- Associate with each node and item a unique ID in a one-dimensional space
- Goals
- Scales to hundreds of thousands of nodes
- Handles rapid arrival and failure of nodes
- Properties
- Routing table size O(log(N)), where N is the total number of nodes
- Guarantees that a file is found in O(log(N)) steps
10 Aside: Consistent Hashing [Karger 97]
- A key is stored at its successor: the node with the next higher ID
- This is designed to let nodes enter and leave the network with minimal disruption
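A small sketch of the successor rule, assuming the same flat identifier space as the earlier DHT sketch (successor is a hypothetical helper, not Chord's actual API):

```python
import bisect

def successor(node_ids, kid, M=2 ** 16):
    """Return the node ID responsible for key ID kid: the first node >= kid."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, kid % M)
    return ids[i] if i < len(ids) else ids[0]  # wrap past the largest ID

# Key ID 300 goes to node 5000; key ID 6000 wraps around to node 10.
assert successor([10, 200, 5000], 300) == 5000
assert successor([10, 200, 5000], 6000) == 10
```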
11 Routing: Chord Basic Lookup
12 Routing: Finger Table - Faster Lookups
13 Routing: Join Operation
14 Routing: Chord Summary
- Assume the identifier space is 0..2^m
- Each node maintains
- Finger table
- Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- Predecessor node
- An item identified by id is stored on the successor node of id
- Pastry
- Similar to Chord
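A self-contained sketch of that finger-table rule, with m = 6 and a made-up set of node IDs to keep the example readable:

```python
import bisect

m = 6
M = 2 ** m

def successor(node_ids, kid):
    """First node ID >= kid, wrapping around the ring (as in the previous sketch)."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, kid % M)
    return ids[i] if i < len(ids) else ids[0]

def finger_table(n, node_ids):
    # Entry i: first node that succeeds or equals n + 2^i (mod 2^m).
    # During lookup, a node forwards the query to the finger closest to the key,
    # roughly halving the remaining distance each hop -> O(log N) hops.
    return [successor(node_ids, (n + 2 ** i) % M) for i in range(m)]

# Node 8 on a ring with nodes 1, 8, 14, 21, 32, 38, 42, 48, 51, 56:
print(finger_table(8, [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]))
# -> [14, 14, 14, 21, 32, 42]   (finger targets 9, 10, 12, 16, 24, 40)
```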
15 CAN
- Associate with each node and item a unique ID in a d-dimensional space
- Virtual Cartesian coordinate space
- The entire space is partitioned amongst all the nodes
- Every node owns a zone in the overall space
- Abstraction
- Can store data at points in the space
- Can route from one point to another
- Point -> the node that owns the enclosing zone
- Properties
- Routing table size O(d)
- Guarantees that a file is found in at most d * n^(1/d) steps, where n is the total number of nodes
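A minimal sketch of that abstraction, assuming the 8 x 8 two-dimensional space used in the example on the next slides (key_to_point and owner are hypothetical names):

```python
import hashlib

SIZE, D = 8, 2  # 8 x 8 space, two dimensions

def key_to_point(key: str):
    """Hash a key to a point in the d-dimensional space (one byte per axis)."""
    digest = hashlib.sha1(key.encode()).digest()
    return tuple(digest[i] % SIZE for i in range(D))

def owner(point, zones):
    """zones: {node: ((x_lo, y_lo), (x_hi, y_hi))}, half-open bounds.
    The node whose zone encloses the point stores the item."""
    for node, (lo, hi) in zones.items():
        if all(lo[i] <= point[i] < hi[i] for i in range(D)):
            return node
```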
16 CAN Example: Two-Dimensional Space
- The space is divided between the nodes
- Together, the nodes cover the entire space
- Each node covers either a square or a rectangular area with side ratio 1:2 or 2:1
- Example
- Assume a space of size 8 x 8
- Node n1(1, 2) is the first node to join -> it covers the entire space
(Figure: 8 x 8 coordinate grid; n1's zone covers the whole space)
17 CAN Example: Two-Dimensional Space
- Node n2(4, 2) joins -> the space is divided between n1 and n2
(Figure: 8 x 8 grid divided between n1 and n2)
18 CAN Example: Two-Dimensional Space
- Node n3(3, 5) joins -> the space is further divided between n1 and n3
(Figure: 8 x 8 grid with zones for n1, n2, and n3)
19 CAN Example: Two-Dimensional Space
- Nodes n4(5, 5) and n5(6, 6) join
(Figure: 8 x 8 grid with zones for n1 through n5)
20 CAN Example: Two-Dimensional Space
- Nodes: n1(1, 2), n2(4, 2), n3(3, 5), n4(5, 5), n5(6, 6)
- Items: f1(2, 3), f2(5, 1), f3(2, 1), f4(7, 5)
(Figure: nodes n1-n5 and items f1-f4 placed in the 8 x 8 grid)
21 CAN Example: Two-Dimensional Space
- Each item is stored by the node that owns its mapping in the space
(Figure: same layout; each item resides in the zone that encloses it)
22 CAN Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query ID (see the sketch after the figure)
- Example: assume n1 queries f4
(Figure: n1's query for f4 is forwarded greedily across neighboring zones)
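A sketch of that greedy forwarding step, assuming each node tracks only the centre of each neighbour's zone (the neighbour list and coordinates below are illustrative):

```python
def forward(query_point, neighbours):
    """neighbours: {node_name: zone_centre}; return the next hop, i.e. the
    neighbour whose zone centre is closest (squared Euclidean distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(neighbours, key=lambda n: dist(neighbours[n], query_point))

# n1 forwarding a query for f4 at (7, 5) with two hypothetical neighbours:
print(forward((7, 5), {"n2": (5, 2), "n3": (3, 6)}))  # -> n2
```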
23 Routing: Concerns / Optimizations
- Each hop in a routing-based P2P network can be expensive
- No correlation between neighbors and their physical location
- A query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance
- CAN/Chord optimizations
- Weight neighbor nodes by RTT
- When routing, choose the neighbor that is closer to the destination and has the lowest RTT from me (see the sketch after this list)
- Reduces path latency
- Multiple physical nodes per virtual node
- Reduces path length (fewer virtual nodes)
- Reduces path latency (can choose the physical node of a virtual node with the lowest RTT)
- Improves fault tolerance (only one node per zone needs to survive to allow routing through the zone)
- What type of lookups?
- Only exact match!
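A sketch of that RTT-weighted neighbor choice for a Chord-style ring: among the neighbors that still make progress toward the key, pick the one with the lowest measured RTT (the IDs, RTT values, and ring size are assumptions):

```python
M = 2 ** 16  # identifier space of the ring (assumption)

def next_hop(my_id, key, neighbours):
    """neighbours: {node_id: rtt_ms}. Among the nodes closer to the key than
    we are, pick the one with the lowest RTT."""
    progress = [n for n in neighbours
                if (key - n) % M < (key - my_id) % M]  # strictly closer than me
    return min(progress, key=lambda n: neighbours[n]) if progress else None

# Two neighbours make progress toward key 9000; the one with 12 ms RTT wins.
print(next_hop(1000, 9000, {4000: 80, 6000: 12, 15000: 5}))  # -> 6000
```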
24 BitTorrent
- A P2P file sharing system
- Load sharing through file splitting
- Uses the bandwidth of peers instead of a server
- Successfully used
- Used to distribute Red Hat 9 ISOs (about 80 TB)
- Setup
- A seed node has the file
- The file is split into fixed-size segments (typically 256 KB)
- A hash is calculated for each segment
- A tracker node is associated with the file
- A .torrent meta-file is built for the file; it identifies the address of the tracker node
- The .torrent file is passed around the web
25 BitTorrent
- Download
- A client contacts the tracker identified in the .torrent file (using HTTP)
- The tracker sends the client a (random) list of peers who have / are downloading the file
- The client contacts peers on the list to see which segments of the file they have
- The client requests segments from peers (via TCP)
- The client uses the hash from the .torrent to confirm that a segment is legitimate (see the sketch below)
- The client reports to other peers on the list that it has the segment
- Other peers start to contact the client to get the segment (while the client is getting other segments)
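A small sketch of that per-segment integrity check: the .torrent carries one SHA-1 hash per fixed-size segment, and the client recomputes the hash of each downloaded segment before accepting and announcing it (the function names are illustrative):

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # typical segment size from the slide

def segment_hashes(data: bytes):
    """Hashes stored in the .torrent metadata, one per segment."""
    return [hashlib.sha1(data[i:i + SEGMENT_SIZE]).digest()
            for i in range(0, len(data), SEGMENT_SIZE)]

def segment_is_legitimate(segment: bytes, expected: bytes) -> bool:
    """Client-side check before reporting the segment to other peers."""
    return hashlib.sha1(segment).digest() == expected
```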
26 Conclusions
- Distributed hash tables are a key component of scalable and robust overlay networks
- CAN: O(d) state, O(d * n^(1/d)) distance
- Chord: O(log n) state, O(log n) distance
- Both can achieve stretch
- Simplicity is key
- Services built on top of distributed hash tables
- P2P file storage, i3 (Chord)
- Multicast (CAN, Tapestry)
- Persistent storage (OceanStore using Tapestry)
27 Semantic-based P2P systems [1]
- P2P systems have no notion of semantics -> difficulty in knowledge sharing
- Semantic web
- Knowledge sharing among different nodes with possibly different schemas; uses a centralized repository
- Combining P2P and semantic systems -> a large-scale collection of structured data
- Now
- Several efforts in such directions
- No efficient, scalable semantic P2P system yet -> an important area of research
28 Semantic-based P2P systems [1]
- Efforts in this direction fall into 3 categories
- Addressing the storing and querying of RDF stores
- Resource Description Framework (RDF): a general method of modeling information through a variety of syntax formats
- Addressing issues of semantic interoperability in a peering setting
- Attempts to build scalable infrastructures for semantic systems
- GridVine: semantic interoperability and scalability issues
- (Benchmarking systems for this are being worked on now)
29 Various approaches
- Addressing the storing and querying of RDF stores
- RDF data in a distributed environment
- A mediator stores a central schema and does query reformulation
- Different mediator architectures, e.g. the Hierarchical Mediator Architecture (HMA)
- Semantic query routing in unstructured networks using social metaphors
- A human being may observe the communication between others and learn who may be able to answer the queries
- Expertise-based peer selection
- Addressing issues of semantic interoperability in a peering setting
- The semantic interoperability problem is tackled by focusing on the (pairwise) mapping between dynamic schemas
- Semantic mapping by approximation
- An approximation value that indicates how strongly one concept is a subconcept of another is calculated for each pair of concepts
- Satisficing ontology mapping: a satisficing decision maker [3]
30 Various approaches
- Attempts to build scalable infrastructures for semantic systems
- Active XML
- Dynamic XML documents over web services for distributed data integration
- Edutella
- Attempts to design and implement a schema-based P2P infrastructure for the semantic web; it focuses on the exchange of learning material
- Piazza
- A peer data management system that facilitates decentralized sharing of heterogeneous data
- PIER
- P2P Information Exchange and Retrieval (PIER): a P2P query engine for query processing in Internet-scale distributed systems
- PeerDB
- An object management system that provides sophisticated searching capabilities
- GridVine [4]
- The first attempt at using a structured overlay network (namely P-Grid) to realize semantic overlays
- It realizes semantic overlays by separating a logical layer from a physical layer, applying the well-known database principle of data independence
31 Various approaches
- Semantic-based peer-to-peer systems
- SWAP project (Semantic Web and Peer-to-Peer)
- X-Leges system: for legislative document exchange
- Semantic-based P2P system for local e-Government (Madrid university)
- Many e-government efforts within semantic web P2P
- DIP projects, IFIP WG8.5, ... e-government
- Semantic Grid resource discovery using DHTs in Atlas (Athens university)
- Resource discovery in the semantic Grid: RDF-based query answering on top of P2P networks
- A semantic web service based P2P infrastructure for the interoperability of medical information systems (METU, Turkey)
32 Semantic Overlay Networks in P2P [2]
- Hash-based queries scale, but semantic queries do not
- How about forwarding queries to peers that are more likely to have what you are looking for? (see the sketch below)
- Ontological structure
- A query is routed within each relevant cluster only
- Reduces flood messages (compared with the case where you search the contents by flooding)
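A toy sketch of routing within the relevant cluster only: the query is classified into a concept, and it is forwarded just to the peers registered in that concept's overlay instead of being flooded (the concept names and membership table below are made up):

```python
def route_query(concept, sons):
    """sons: {concept: [peers in that semantic overlay]}; return query targets."""
    return sons.get(concept, [])  # forward only inside the relevant SON

# A 'jazz' query reaches two peers instead of being flooded to all four.
print(route_query("jazz", {"jazz": ["p1", "p4"], "rock": ["p2", "p3"]}))
# -> ['p1', 'p4']
```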
33 Semantic Overlay Networks in P2P
- Semantic Overlay Network (SON)
- An overlay network associated with a concept in a classification hierarchy
34 Semantic Overlay Networks in P2P
- Generating semantic overlay networks
- Fewer nodes per SON -> more results sooner
- Fewer SONs per node -> fewer connections
- Coverage?
35 Semantic Overlay Networks in P2P
- More SONs per node -> high coverage
- But too many links -> many query messages
- Layered SONs
- Choosing which SONs to join (see the sketch below)
- e.g., c1, c2, c9 (for c3, c4), c12 (for c5, c6, ..., c8)
(Figure: hierarchy of concepts)
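A toy sketch of one possible rule behind that choice: when a node needs several leaf concepts that share the same ancestor, it joins the ancestor's SON once instead of each leaf SON. The hierarchy, names, and threshold below are assumptions chosen to reproduce the slide's example, not the paper's algorithm:

```python
from collections import defaultdict

def choose_sons(leaf_concepts, parent_of, threshold=1):
    """Join a parent SON when more than `threshold` of its children are needed."""
    by_parent = defaultdict(list)
    for c in leaf_concepts:
        by_parent[parent_of.get(c, c)].append(c)
    chosen = []
    for parent, children in by_parent.items():
        if len(children) > threshold:
            chosen.append(parent)    # one membership covers the whole group
        else:
            chosen.extend(children)  # few enough: join each leaf SON directly
    return chosen

# Made-up hierarchy: c3, c4 sit under c9; c5..c8 sit under c12.
parents = {"c3": "c9", "c4": "c9", "c5": "c12", "c6": "c12",
           "c7": "c12", "c8": "c12"}
print(choose_sons(["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"], parents))
# -> ['c1', 'c2', 'c9', 'c12']
```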
36 Semantic Overlay Networks in P2P
37 References
- [1] Vijay Srinivas Agneeswaran. A Survey of Semantic-Based Peer-to-Peer Systems. Distributed Information Systems Lab (LSIR), Ecole Polytechnique Federale de Lausanne.
- [2] Arturo Crespo et al. Semantic Overlay Networks for P2P Systems. Google Technologies Inc., Stanford University.
- [3] Marc Ehrig and Steffen Staab. Satisficing Ontology Mapping. In Steffen Staab and Heiner Stuckenschmidt, editors, Semantic Web and Peer-to-Peer. Springer-Verlag, Berlin Heidelberg, 2006.
- [4] Karl Aberer, Philippe Cudre-Mauroux, Manfred Hauswirth, and Tim Van Pelt. GridVine: Building Internet-Scale Semantic Overlay Networks. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors, International Semantic Web Conference, volume 3298 of Lecture Notes in Computer Science, pages 107-121. Springer, 2004.
2009-07-15, ICE0602 Ubiquitous Networking