ORCHESTRA: Facilitating Collaborative Data Sharing

1
ORCHESTRA: Facilitating Collaborative Data Sharing
Nicholas Taylor, Department of Computer and
Information Science, University of Pennsylvania
Joint work with T.J. Green, Z. Ives,
G. Karvounarakis, and V. Tannen
  • Research Forum @ Penn Engineering
  • February 20, 2007

2
The Data Integration Problem

3
Standards-Based Data Sharing
  • Works well for applications that
      • Query but do not make updates
      • Are not worried about contradictory information
  • Downsides
      • Need to design standard
      • Difficult to change standard once implemented
      • Can be difficult to represent all information
      • Sources may lack some of the data

4
Conflicting Data is Inevitable
  • Independent sources, conflicting data
  • Database constraints often reveal conflicts
      • e.g., more than one rating for a given restaurant
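
As a rough illustration of how a constraint exposes a conflict, the sketch below merges rows from two sources and reports every restaurant that would end up with more than one rating, that is, a violation of a key constraint on the restaurant name. The sample rows and the function name `find_key_conflicts` are assumptions made for this example, not data or code from ORCHESTRA.

```python
from collections import defaultdict

# Hypothetical sample data: (restaurant, rating) rows from two sources.
inquirer = [("Pat's", 3), ("Alma de Cuba", 2)]
daily_news = [("Pat's", 2), ("Geno's", 4)]


def find_key_conflicts(*sources):
    """Return restaurants that would violate a key constraint on the name
    (more than one distinct rating) if the sources were merged."""
    ratings = defaultdict(set)
    for rows in sources:
        for name, rating in rows:
            ratings[name].add(rating)
    return {name: sorted(vals) for name, vals in ratings.items() if len(vals) > 1}


conflicts = find_key_conflicts(inquirer, daily_news)
print(conflicts)  # {"Pat's": [2, 3]}
```

Each source is consistent on its own; the conflict only appears once the two are combined.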

[Figure: Inquirer DB and Daily News DB]

5
Multiple Users, Conflicting Data!
  • Synchronizing PDA and computer calendars
      • HotSync, iSync, etc.
      • Many possible conflicts
  • Collaborators share citation databases
      • Different abbreviation styles
      • Different citation programs
  • Biologists' databases
      • Collect new information from published databases
      • Store information from own experiments
      • Disagree about key data points

6
A New Model for Data Sharing
  • Single global database not possible
      • Participants prefer their own data
        representations
      • Conflicting data is present, but conflicts are
        localized
  • Collaborative Data Sharing System (CDSS)
      • Synchronize databases by sharing updates made to
        individual databases
      • Each participant creates its own global
        database by deciding which updates to apply
  • ORCHESTRA is our implementation of a CDSS

7
CDSS Overview
[Figure: Participant 1 and Participant 2, each connected to the CDSS (ORCHESTRA)]
  • CDSS coordinates with other participants
      • Stores updates (publication)
      • Finds updates participant wants to apply
        (reconciliation)
  • Other participants may use a different data
    representation
      • or introduce contradictory information
8
Trust Policies in a CDSS
  • Allow participants to trust all, some, or none of
    another participant's updates
  • Specify preferences in the event of conflicting
    updates

[Figure: restaurant ratings Pasion! ★★★, Alma de Cuba ★★, Pat's ★★★, and Pat's ★★; two participants state trust policies, and the result for Pat's is marked "?"]
  • "I trust the Inquirer, and the Daily News only for
    restaurants in South Philly."
  • "I trust both papers, but I prefer the Daily News
    for steak shops."

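
One way to picture the two trust policies on this slide is as functions that assign a priority to each incoming update, with 0 meaning "not trusted". The sketch below is only an illustration under that assumption: the `Update` fields, the priority numbers, which paper gave which rating, and the `resolve` helper are all hypothetical and are not taken from ORCHESTRA.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Update:
    source: str        # which paper published the rating
    restaurant: str
    rating: int
    neighborhood: str  # assumed attribute, used by the first policy
    category: str      # assumed attribute, used by the second policy


# A policy maps an update to a priority; 0 means "not trusted".
Policy = Callable[[Update], int]


def policy_a(u: Update) -> int:
    # "I trust the Inquirer, and the Daily News only for restaurants in South Philly."
    if u.source == "Inquirer":
        return 1
    if u.source == "Daily News" and u.neighborhood == "South Philly":
        return 1
    return 0


def policy_b(u: Update) -> int:
    # "I trust both papers, but I prefer the Daily News for steak shops."
    if u.source == "Daily News" and u.category == "steak shop":
        return 2
    return 1 if u.source in ("Inquirer", "Daily News") else 0


def resolve(updates: List[Update], policy: Policy) -> Optional[Update]:
    """Pick the highest-priority trusted update for one restaurant.

    Returns None if nothing is trusted, or if the top priority is shared
    by updates that disagree on the rating (an unresolved conflict).
    """
    trusted = [(policy(u), u) for u in updates if policy(u) > 0]
    if not trusted:
        return None
    best = max(priority for priority, _ in trusted)
    winners = [u for priority, u in trusted if priority == best]
    if len({u.rating for u in winners}) > 1:
        return None
    return winners[0]


# Hypothetical conflicting ratings for the same restaurant.
pats = [
    Update("Inquirer", "Pat's", 3, "South Philly", "steak shop"),
    Update("Daily News", "Pat's", 2, "South Philly", "steak shop"),
]

print(resolve(pats, policy_a))         # None: both trusted equally, ratings differ
print(resolve(pats, policy_b).rating)  # 2: the Daily News wins for steak shops
```

Under these assumptions the first participant is left with an open conflict for Pat's, while the second participant's stated preference settles it.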
9
Challenges solved in a CDSS
  • Intermittent participation
  • Need for consistent, predictable behavior
  • Translating updates between different data
    representations
  • Reference: Z. Ives et al., "ORCHESTRA: Rapid,
    Collaborative Sharing of Dynamic Data," CIDR '05.

10
Data Sharing Operations
  • Operations involve only one participant
  • Publishing
  • Reconciliation
      • Participant applies consistent subset of updates
      • May get its own unique instance
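
The two operations can be sketched in a few lines. This is a deliberately simplified model, not the reconciliation algorithm from the SIGMOD '06 paper: the class names, the key/value view of the database, the set-based trust, and the rule "reject an incoming update if it disagrees with the local value or with another incoming update" are all assumptions chosen for illustration.

```python
class UpdateStore:
    """Stand-in for the shared store of published updates."""

    def __init__(self):
        self.published = []  # (publisher, key, value), in publication order


class Participant:
    def __init__(self, name, store, trusts):
        self.name = name
        self.store = store
        self.trusts = trusts  # names of publishers this participant trusts
        self.db = {}          # local database: key -> value
        self.log = []         # local updates not yet published
        self.seen = 0         # how much of the store has been reconciled

    def update(self, key, value):
        """Change the local database and record the change in the update log."""
        self.db[key] = value
        self.log.append((key, value))

    def publish(self):
        """Push the local update log to the shared store."""
        self.store.published.extend((self.name, k, v) for k, v in self.log)
        self.log.clear()

    def reconcile(self):
        """Apply a consistent subset of newly published, trusted updates.

        Returns the keys whose incoming updates were rejected.
        """
        new = self.store.published[self.seen:]
        self.seen = len(self.store.published)
        candidates = {}
        for publisher, key, value in new:
            if publisher != self.name and publisher in self.trusts:
                candidates.setdefault(key, set()).add(value)
        rejected = []
        for key, values in candidates.items():
            local = self.db.get(key)
            if len(values) > 1 or (local is not None and local not in values):
                rejected.append(key)  # conflict: keep the local instance as is
            else:
                self.db[key] = next(iter(values))
        return rejected


store = UpdateStore()
p1 = Participant("p1", store, trusts={"p2"})
p2 = Participant("p2", store, trusts={"p1"})

p1.update("Pat's", 3)
p1.publish()
p2.update("Pat's", 2)
p2.update("Geno's", 4)
p2.publish()

rejected1 = p1.reconcile()
rejected2 = p2.reconcile()
print(p1.db, rejected1)  # {"Pat's": 3, "Geno's": 4} ["Pat's"]
print(p2.db, rejected2)  # {"Pat's": 2, "Geno's": 4} ["Pat's"]
```

Both `publish` and `reconcile` touch only the calling participant and the shared store, and the two participants end up with different instances because each keeps its own rating for Pat's.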

[Figure: Local Database, Update Log, Publish New Updates, Reconciliation Requests, Published Updates]

11
Implementation Details
  • Peer-to-Peer Java implementation
  • Efficient algorithm for reconciliation
  • Distributed processing
  • First implementation scales to 10s to 100s of
    participants
  • Reference: N.E. Taylor and Z.G. Ives, "Reconciling
    while Tolerating Disagreement in Collaborative
    Data Sharing," SIGMOD '06.

12
Ongoing work
  • Improving performance using caching and
    pre-computation
  • Improving reliability using replication
  • Incorporate update translation
      • First implementation assumes a single data
        representation

13
Conclusions
  • Flexible model of data sharing
  • Synchronization tolerates disagreement
  • P2P network for reliable data storage and
    distributed processing

http://www.cis.upenn.edu/~zives/orchestra/

14
Related Work
  • Inconsistency repair
      • [Bry97], [ABC99]
  • Causal ordering in distributed DBs with
    replication
      • Optimistic Concurrency Control [KR81],
        Version vectors [PPR83]
  • Distributed file systems
      • Ivy [MMGC02], Coda [Braam98, KS95],
        Bayou [TTP96]
  • File synchronization
      • Unison [PV04], Harmony [BVP06]
  • Version control (CVS, Subversion, etc.)