PeerDB: A P2P-based System for Distributed Data Sharing - PowerPoint PPT Presentation

About This Presentation
Title:

PeerDB: A P2P-based System for Distributed Data Sharing

Description:

Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou. Shawn Jeffery ... Allows content-based search. No global schema. Utilizes mobile agents ... – PowerPoint PPT presentation

Number of Views:116
Avg rating:3.0/5.0
Slides: 18
Provided by: shawnj8
Category:

less

Transcript and Presenter's Notes

Title: PeerDB: A P2P-based System for Distributed Data Sharing


1
PeerDB A P2P-based System for Distributed Data
Sharing
  • Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying
    Zhou

Shawn Jeffery CS294-4 Peer-to-Peer
Systems 11/05/03
2
Overview
  • A P2P database system
  • Allows content-based search
  • No global schema
  • Utilizes mobile agents
  • Provides flexibility and extensibility
  • Dynamically adjusts topology

3
Background P2P vs Distributed Databases
P2P DDBMS
Membership Ad-hoc Controlled
Schema No global schema Shared (or at least some way to mediate)
Query result set Incomplete Complete
Content location Word of mouth Shared catalog
4
BestPeer
  • Generic P2P platform
  • Mobile Agents
  • Carry code and data
  • Collect stats
  • Security issues?
  • Dynamic Reconfiguration
  • How does this compare to Gia?
  • Location Independent Global Names Lookup (LIGLO)
    Servers
  • Small number
  • Provides a global identity for peers and peer
    status
  • Why not use a DHT/KBR/DOLR?

5
BestPeer Security
  • Private and sharable data
  • Agents only able to access sharable data
  • Does this adequately restrict the power of mobile
    agents?
  • Communications on the wire also encrypted
  • Whats missing?

6
Architecture
Sharable Data
Local Data
Database
7
Schema Mediation
  • Problems with supporting SQL queries
  • No global schema information
  • Different nodes could name the same
    table/attribute differently (len, length)
  • Solution User supplies metadata for each
    relation name and attribute
  • Users expected to do a lot
  • Formula based on matching relation keywords and
    attribute keywords to determine if a query
    matches a table
  • What about other schema mediation work (such as
    Piazza)?

8
Local Query Processing Phase I
  • Master Agent coordinates the entire affair
  • Check Local Dictionary for matching relations
  • Use the relation matching strategy even for the
    local DB
  • Create Relation Matching Agents and flood to
    all neighbors
  • Wait for responses
  • Display results to user as they arrive

9
Local Query Processing Phase II
  • User selects the relations he/she wants
  • Create a Data Retrieval Agent
  • Rewrite query in terms of new relations
  • If local, submit SQL to local db
  • Contact remote nodes directly to access the data
  • Creates remote join plans locally - optimization?

10
Remote Query Processing
  • Phase I Find relations
  • Relation Matching Agents flood with TTL
  • Check Export Dictionary for a match
  • Return matches directly
  • Phase II Get data
  • Data Retrieval Agent submits SQL to DBMS
  • Return data to the requesting node directly
  • Run further data processing before returning
  • Again, security issues

11
Statistics
  • Master Agents monitor stats in the network
  • Keywords for some relations returned during Phase
    I
  • Update metadata
  • Number of objects returned for selected relations
  • Can be used for topology change decisions
  • Use most recently returned results as metric to
    determine who to connect with
  • Frequent updates might need to change neighbors
    after each result returned

12
Caches
  • Cache all query results locally
  • Soft state
  • LRU replacement
  • Users choose which copy they want
  • Only provided with peer id and an indication of
    which is the source
  • What about timestamp, etc?
  • Again, user heavily involved

13
Relation Matching Performance
  • Significant tradeoff between precision and recall
  • Which is more important?
  • Is their approach acceptable?

14
Experimental Methodology
  • Compare P2P Model vs Client/Server model
  • CS returns via the search path (?)
  • Compare static vs reconfigurable networks
  • Compare agent vs message based approach
  • 32 Nodes
  • Is this enough?

15
Evaluation Scenarios (Metrics?)
  • Fixed set of nodes
  • Easily test P2P protocols, Reconfiguration
    strategies
  • Latency
  • Quality and Quantity
  • What else is important?

16
Performance
  • As you increase the amount of storage on each
    node, latency decrease
  • Due to caching
  • In general, reconfiguration performs better
  • Response times O(1 Minute)
  • Is this acceptable?
  • Agent based shown to be better
  • What if agent produces more data than it
    processes?

17
Discussion A P2P DBMS?
  • PeerDB represents a tiny step towards a P2P DB
    (also PIER, Piazza)
  • What does it do right?
  • What else is needed?
  • Is it ideal to have a P2P DB?
  • Is it feasible?
Write a Comment
User Comments (0)
About PowerShow.com