P2P Databases - PowerPoint PPT Presentation

About This Presentation

Title:

P2P Databases

Slides: 42

Provided by: ashish9

Learn more at: https://users.cs.northwestern.edu

Transcript and Presenter's Notes

1
P2P Databases
2
Overview

0. Data objects, pointers (URLs), and attributes
1. Freeform versus structured attribute data
2. Centralized indices for attribute data and pointers (e.g., Napster)
3. Query by flooding (e.g., Gnutella)
4. DHTs (e.g., Chord)
5. Problems with DHTs
6. Keyword queries in DHTs (Magnolia)
7. Popularity queries
8. Demo of system
9. (if time) Data transmission: overlay vs. DHT multicast (BitTorrent / SplitStream)
10. (if time) P2P file systems and versioning (precursor to undo/redo logging later in the course)

3
P2P Today
edonkey
bittorrent
pastry
jxta
can
fiorana
napster
freenet
united devices
open cola
aim
ocean store
netmeeting
farsite
gnutella
icq
ebay
morpheus
limewire
seti@home
bearshare
uddi
grove
jabber
popular power
kazaa
folding@home
tapestry
mojo nation
process tree
chord
4
Object representation and storage
Objects
Attributes: Name, Artist, Album, Genre
Pointer to object
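A minimal sketch of how one shared object might be described, assuming the music-file example above: the searchable attributes plus a pointer (URL) to the peer that actually holds the bytes. The class name `ObjectRecord`, its fields, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical index entry: searchable attributes plus a pointer to the data."""
    attributes: dict = field(default_factory=dict)  # e.g. Name, Artist, Album, Genre
    pointer: str = ""                               # URL of the peer holding the object

    def matches(self, **query: str) -> bool:
        # Structured match: every queried attribute must equal the stored value.
        return all(self.attributes.get(k) == v for k, v in query.items())


song = ObjectRecord(
    attributes={"Name": "Song A", "Artist": "Band B", "Album": "Album C", "Genre": "Rock"},
    pointer="http://peer.example.org/songs/song-a.mp3",  # hypothetical peer URL
)
```

The index only needs the small record; the object itself is fetched from `pointer`, which is why searching and data transfer can be handled by different mechanisms.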
5
P2P vs. Distributed DBMS
Traditional DDBMS Issues

Transactions
Distributed Query Optimization
Interoperation of heterogeneous data sources
Reliability/failure of nodes

Complex features do not scale
6
P2P vs. Distributed DBMS

Example application: file sharing
Simple data model and query language
No complex query optimization
Easy interoperation
No guarantee on quality of results
Individual site availability unimportant
Local updates
No transactions
Network partitions OK

Simple, hence amenable to a large-scale network of PCs
7
Example file sharing

Challenge 1: Performance
Asking everyone is expensive!
If I am smart, I only need to ask one peer
How can I be smart?

File X?
8
Search in P2P

The system can control:
Connections made by users/topology
Data placement
Query type
Tight control: structured
Efficient, comprehensive
Loose control: unstructured
Inefficient, not comprehensive, simple, expressive
Used in real life

9
Centralized

Napster model
Benefits
Efficient search
Limited bandwidth usage
No per-node state
Drawbacks
Central point of failure
Limited scale

Bob
Alice
Jane
Judy
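A minimal sketch of the centralized (Napster-style) model, assuming a single in-memory index server: peers register the files they share, and a search is one request to the server, which returns the peers holding a match. The download itself would then go directly peer to peer. `CentralIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical single-server index mapping file names to the peers sharing them."""

    def __init__(self) -> None:
        self._holders = defaultdict(set)  # file name -> set of peer names

    def register(self, peer: str, files: list) -> None:
        # Each peer uploads the list of files it shares when it connects.
        for name in files:
            self._holders[name].add(peer)

    def unregister(self, peer: str) -> None:
        # Remove a departing peer from every entry.
        for holders in self._holders.values():
            holders.discard(peer)

    def search(self, name: str) -> list:
        # One round trip to the server answers the query; no flooding needed.
        return sorted(self._holders.get(name, set()))


index = CentralIndex()
index.register("Bob", ["x.mp3", "y.mp3"])
index.register("Jane", ["x.mp3"])
index.register("Judy", ["z.mp3"])
```

The drawbacks listed above show up directly: every registration and query goes through the one `CentralIndex`, so it is a single point of failure and its capacity bounds the scale of the system.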
10
http://www.snocap.com/
11
Unstructured Query Flooding
12
Problems with unstructured

Inefficient
Query messages are flooded
Even if routing is intelligent, the worst-case load is still O(n), where n is the number of nodes in the system
Not comprehensive
If I do not get a result for my query, is it
because none exists?
(Of course, many optimizations are possible)
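Both problems can be seen in a small flooding sketch, assuming an overlay given as an adjacency dict and a Gnutella-like hop limit (TTL). It counts every forwarded message; the graph, TTL values, and the function name `flood` are illustrative assumptions.

```python
from collections import deque


def flood(graph: dict, start: int, ttl: int, has_file: set):
    """Breadth-first flood of a query; returns (nodes with hits, messages sent)."""
    seen = {start}
    hits = set()
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node in has_file:
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            messages += 1  # every forward costs a message, even to already-seen nodes
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Toy overlay: 8 peers in a ring, each connected to its two neighbors.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
```

With a generous TTL, `flood(ring, 0, 10, {5})` finds node 5 but sends 16 messages, two per peer, so cost grows linearly with the network. With TTL 2, only 6 messages are sent but node 5 is never reached: an empty result does not prove the file is absent.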

13
Distributed Hash Table (DHTs)

Model
Key/object pair; the key is hashed to get an ID
Example
Objects are files
The key is the content of the file
The ID is the hash of the file contents
Single operation: Lookup(ID)
Input: integer ID
Output: the object with the corresponding ID
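A minimal sketch of the content-hash example, assuming SHA-1 (a base hash of the kind Chord uses) reduced to an m-bit ID, and a single local dict standing in for the whole distributed table. `M_BITS`, `content_id`, `store`, and `lookup` are hypothetical names.

```python
import hashlib
from typing import Optional

M_BITS = 16  # hypothetical identifier size m; deployed systems use far more bits


def content_id(data: bytes, m: int = M_BITS) -> int:
    """Map an object's content to an m-bit integer ID."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest % (2 ** m)  # keep the low m bits


# A single dict stands in for the distributed table in this sketch.
table = {}


def store(data: bytes) -> int:
    oid = content_id(data)
    table[oid] = data
    return oid


def lookup(oid: int) -> Optional[bytes]:
    # The DHT's single operation: integer ID in, object out.
    return table.get(oid)
```

Because the ID depends only on the bytes, any peer holding the same file computes the same ID without coordination.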

14
Identifiers

IDs are m-bit integers
Nodes are also assigned IDs
Commonly assigned by hashing a node's IP address, although this approach has many problems
An object is stored on the node with the smallest ID greater than or equal to the object's ID
This node is called the successor of the object's ID
IDs are arranged on a circle, so ID 0 follows 2^m - 1

15
Data Placement
[Figure: identifier ring with m = 3 (IDs 0 to 7); nodes 0, 1, 3; data items 1, 2, 6]
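Placement by the successor rule can be sketched directly, assuming the toy ring from the figure (m = 3, nodes 0, 1, 3, data items 1, 2, 6). The helper name `successor` is hypothetical.

```python
from bisect import bisect_left


def successor(node_ids: list, key: int, m: int) -> int:
    """Return the first node ID >= key on a ring of size 2**m, wrapping past 2**m - 1."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % (2 ** m))  # index of first ID >= key
    return ring[i] if i < len(ring) else ring[0]  # wrap around to the smallest ID


m = 3
nodes = [0, 1, 3]
placement = {key: successor(nodes, key, m) for key in [1, 2, 6]}
```

Item 1 stays on node 1, item 2 goes to node 3, and item 6 lands on node 0 because no node ID lies in 6..7, so the search wraps past 2^m - 1 = 7 back to 0.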
16
Connections

Distance: 2^0, 2^1, ..., 2^(m-1)
[Figure: identifier ring 0 to 7 showing a node's finger pointers]
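A minimal sketch of the finger pointers, assuming global knowledge of all node IDs (a real node would fill these entries through lookups). Entry i of node n points to the successor of (n + 2^i) mod 2^m for i = 0 .. m-1, matching the distances 2^0 through 2^(m-1); indices here start at 0, so with 1-based indices the same entry is the successor of n + 2^(i-1). The function names are hypothetical.

```python
from bisect import bisect_left


def successor(ring: list, key: int, m: int) -> int:
    """First node ID >= key (mod 2**m) in the sorted ring, wrapping to the start."""
    i = bisect_left(ring, key % (2 ** m))
    return ring[i] if i < len(ring) else ring[0]


def finger_table(n: int, node_ids: list, m: int) -> list:
    """Entry i is successor((n + 2**i) mod 2**m), for i = 0 .. m-1."""
    ring = sorted(node_ids)
    return [successor(ring, (n + 2 ** i) % (2 ** m), m) for i in range(m)]
```

On a fully populated ring with m = 3, node 0's fingers are nodes 1, 2, 4. With only nodes 0, 1, 3 present, node 0's fingers become 1, 3, 0. Entry 0 is always the immediate successor, the one pointer correctness depends on.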
17
Query

Lookup(objectID)
objectID is typically the ID of the object you are looking for, but not necessarily
Approach:
Find the predecessor of the object, i.e., the node with the largest ID that is smaller than the object ID
Return the successor of the predecessor
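The approach can be sketched as an iterative lookup, assuming every node's finger table is available (computed here from global knowledge) and using a closest-preceding-finger step, as in Chord, to jump toward the predecessor. On a full ring with m = 3 it reproduces the node 0 → 4 → 6 path of the query example, returning node 7. All names are hypothetical.

```python
from bisect import bisect_left


def successor(ring: list, key: int, m: int) -> int:
    i = bisect_left(ring, key % (2 ** m))
    return ring[i] if i < len(ring) else ring[0]


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero (a == b means: everything but a)


def lookup(start: int, key: int, node_ids: list, m: int) -> tuple:
    """Return (successor of key, nodes visited while finding the predecessor)."""
    ring = sorted(node_ids)
    fingers = {n: [successor(ring, n + 2 ** i, m) for i in range(m)] for n in ring}
    node, path = start, [start]
    # FindPred: stop once key lies in (node, successor(node)].
    while not (in_open(key, node, fingers[node][0]) or key == fingers[node][0]):
        # Closest preceding finger: the farthest finger that still precedes key.
        nxt = next((f for f in reversed(fingers[node]) if in_open(f, node, key)), None)
        if nxt is None:
            break  # defensive guard; the successor finger always qualifies here
        node = nxt
        path.append(node)
    return fingers[node][0], path
```

Each step moves to the farthest finger that does not overshoot the key, roughly halving the remaining distance, which is the intuition behind the O(log N) bound. On the sparse ring (nodes 0, 1, 3), `lookup(0, 6, [0, 1, 3], 3)` visits nodes 0 and 3 and returns node 0, matching the placement rule.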

18
Query Example

Say node 0 wants to find the object with ID 7
For simplicity, we will assume a node exists at
every ID in the space

19
Query Example
Node 0: Lookup(7)
Node 0: FindPred(7)
20
Query Example
Node 4: FindPred(7)
21
Query Example
Node 6: FindPred(7)
Node 6 is the predecessor; return its successor, node 7
22
Query characteristics

With high probability, a query can be answered by contacting O(log N) nodes
N: total number of nodes in the network
Efficient!
Also notice that if an object with the ID exists in the network, it will be found
Comprehensive!
Per-node state is also O(log N) in size

23
Query characteristics

Note that finger pointers are not required for correct operation
Only successor pointers are needed
But then the cost of a query increases
O(N) in the worst case
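To put numbers on the O(N) versus O(log N) contrast, a toy hop count, assuming a fully populated ring of 2^m nodes as in the query example: with successor pointers only, a lookup walks clockwise one node per hop; with fingers, greedy routing takes one jump per set bit of the clockwise distance. Both functions are hypothetical and count hops to the key's own node.

```python
def hops_successor_only(start: int, key: int, m: int) -> int:
    """Hops when every ID is a node and each node knows only its successor."""
    return (key - start) % (2 ** m)  # walk clockwise one node per hop


def hops_with_fingers(start: int, key: int, m: int) -> int:
    """Hops when each node can also jump by any power of two up to 2**(m-1)."""
    remaining = (key - start) % (2 ** m)
    # Greedy largest-jump routing removes the highest set bit on each hop.
    return bin(remaining).count("1")
```

With m = 10 (N = 1024), the worst case is 1023 hops with successors only but 10 hops with fingers.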

24
Advantages of Structured?

Scalability/Efficiency
load grows as O(log N)
Comprehensiveness

25
Disadvantages? (cont)

Availability of Data
If a node dies suddenly, what happens to the data
it was storing?
MUST replicate data across multiple nodes
Query Language
How can we express keyword queries efficiently?
Many useful applications require different query languages
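One common way to meet the replication requirement in Chord-like systems is successor-list replication: keep each object on its successor and the next r - 1 nodes clockwise, so a lookup can fall back to a live replica. A minimal sketch, assuming global knowledge of node IDs; `replica_nodes`, `surviving_holder`, and the replication factor `r` are hypothetical.

```python
from bisect import bisect_left
from typing import Optional


def replica_nodes(node_ids: list, key: int, m: int, r: int) -> list:
    """The successor of key plus the next r - 1 nodes clockwise."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % (2 ** m))  # may equal len(ring): wraps via modulo below
    return [ring[(i + j) % len(ring)] for j in range(min(r, len(ring)))]


def surviving_holder(node_ids: list, key: int, m: int, r: int, failed: set) -> Optional[int]:
    """First live replica for key, or None if all r replicas have failed."""
    return next((n for n in replica_nodes(node_ids, key, m, r) if n not in failed), None)
```

With nodes 0, 1, 3 and r = 2, item 2 lives on nodes 3 and 0; if node 3 dies, node 0 (the key's new successor) still has it. With r = 1 the item is simply lost.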

26
Magnolia
27
Resulting Distribution
28
Prefix hashing
29
Balancing
Innovation: balanced over the sibling group
[Figure: sibling group with ID 100]
All siblings in a group share the same prefix
30
Insert
Keyword → h_P → sibling group ID → random sibling
Locate a sibling node via SIFT
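The insert path can be sketched as follows, assuming B-bit node IDs whose M most significant bits name the sibling group, SHA-1 as a stand-in for the prefix hash h_P, and a plain list of node IDs in place of the SIFT lookup (SIFT itself is not modeled). All names and sizes are hypothetical.

```python
import hashlib
import random

B_BITS = 8  # hypothetical node-ID length
M_BITS = 3  # hypothetical prefix length: 2**3 = 8 sibling groups


def group_of(node_id: int) -> int:
    """Sibling-group ID = the M most significant bits of a B-bit node ID."""
    return node_id >> (B_BITS - M_BITS)


def h_p(keyword: str) -> int:
    """Hypothetical prefix hash: keyword -> M-bit sibling-group ID (SHA-1 assumed)."""
    digest = hashlib.sha1(keyword.encode("utf-8")).digest()  # 160-bit digest
    return int.from_bytes(digest, "big") >> (160 - M_BITS)


def insert_target(keyword: str, node_ids: list, rng: random.Random) -> int:
    """Pick a random sibling in the keyword's group to store the entry."""
    group = h_p(keyword)
    siblings = [n for n in node_ids if group_of(n) == group]
    if not siblings:
        raise LookupError(f"no live node in sibling group {group:0{M_BITS}b}")
    return rng.choice(siblings)
```

With M = 3, node 0b10010110 belongs to sibling group 100. Choosing a random sibling rather than one fixed node is what spreads a popular keyword's load over the whole group.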
31
Advantages

Good Balancing Properties

32
Advantages

Low Traffic Load on nodes for popular queries
Quick Lookup
Popularity Ranking of Objects
Distributed Replication for resilience

33
Implementing Magnolia

Developed on top of a Chord clone written in Python
If you're going to write a peer-to-peer app, why not leverage existing modules and libraries?
Challenge: how do we implement group-based stores and queries without requiring additional network maintenance?

34
Chord's Finger Table

A Chord node maintains a finger table of M IPs pointing to nodes ahead of it in the ring.
A pointer at index i is the successor of (node id + 2^(i-1)). This lets us reach any node in the network in O(log N) hops, where N is the number of nodes.
We use the M most significant bits in a node's id to indicate its group. We want to reach any group in a logarithmic number of hops as well.
Do we need another table?
Nope. The last M entries in our finger table provide this.
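A minimal sketch of why no extra table is needed, assuming B-bit IDs (B = 8 here), an M-bit group prefix (M = 3), and a fully populated ring. Adding 2^i with i ≥ B - M changes only the top M bits, so the last M finger distances jump exactly 1, 2, 4, ... groups ahead, and any group is reachable in at most M hops. `B_BITS`, `M_BITS`, and the function names are hypothetical.

```python
B_BITS = 8  # hypothetical node-ID length
M_BITS = 3  # hypothetical group-prefix length


def group_of(node_id: int) -> int:
    return node_id >> (B_BITS - M_BITS)


def group_route(start: int, target_group: int) -> list:
    """Hop toward target_group using only the last M finger distances."""
    path = [start]
    node = start
    for i in reversed(range(B_BITS - M_BITS, B_BITS)):  # finger distances 2**i
        if group_of(node) == target_group:
            break
        gap = (target_group - group_of(node)) % (2 ** M_BITS)  # groups still to cross
        jump_groups = 2 ** (i - (B_BITS - M_BITS))                # groups this finger crosses
        if gap >= jump_groups:
            node = (node + 2 ** i) % (2 ** B_BITS)
            path.append(node)
    return path
```

Routing from node 0 to group 5 (binary 101) takes the 2^7 and 2^5 fingers: nodes 0, 128, 160.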

35
Talking to Siblings

How do we propagate queries through the group?
Naïve solution: send to our predecessor and successor.
A better solution: we can send a query throughout the group by treating the sibling group as a tree.

36
Sibling Tree
N = 16, M = 4
[Figure: sibling tree spanning a group of 16 nodes (0 to 15); each edge is labeled as a jump of the form n + 2^k, e.g. 0 + 2^3 and 8 + 2^2]
Every edge can be found in the finger table!
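One way to build such a tree, sketched under assumptions: number the siblings by their offset from the first member, assume every offset in a power-of-two group is occupied (as in the 16-node figure), and let each node forward the query to offsets +span/2, +span/4, ..., +1, handing each child its own sub-range. Every edge is a jump of 2^k and so is available as a finger. This is the standard binomial-tree broadcast; its shape may differ from the exact tree drawn in the figure, and `broadcast_tree` is a hypothetical name.

```python
def broadcast_tree(root: int, size: int) -> dict:
    """Children of each node when root floods a block of `size` (a power of two) siblings."""
    children = {}

    def cover(node: int, span: int) -> None:
        # node is responsible for offsets node .. node + span - 1
        children[node] = []
        step = span // 2
        while step >= 1:
            child = node + step          # a finger jump of 2**k
            children[node].append(child)
            cover(child, step)           # child covers child .. child + step - 1
            step //= 2

    cover(root, size)
    return children
```

For a 16-node group the root sends to offsets 8, 4, 2, 1, every sibling receives the query exactly once (15 messages), and the tree has depth log2(16) = 4, whereas forwarding only to predecessor and successor needs a number of rounds that grows linearly with the group size.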
37
Sibling Tree Problems