Title: ORCHESTRA: Facilitating Collaborative Data Sharing
Nicholas Taylor
Department of Computer and Information Science
University of Pennsylvania
Joint work with T.J. Green, Z. Ives, G. Karvounarakis, and V. Tannen
- Research Forum @ Penn Engineering
- February 20, 2007
The Data Integration Problem
Standards-Based Data Sharing
- Works well for applications that
  - Query but do not make updates
  - Are not worried about contradictory information
- Downsides
  - Need to design a standard
  - Difficult to change the standard once implemented
  - Can be difficult to represent all information
  - Sources may lack some of the data
Conflicting Data is Inevitable
- Independent sources, conflicting data
- Database constraints often reveal conflicts
  - e.g., more than one rating for a given restaurant
[Figure: Inquirer DB and Daily News DB]
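As a minimal sketch of how a key constraint exposes such conflicts, the Python below combines two sources and reports every restaurant that ends up with more than one distinct rating. The tuple layout, the function name `key_conflicts`, and the sample ratings are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (restaurant, rating) tuples from two independent sources.
INQUIRER = [("Pasion!", 3), ("Pat's", 3)]
DAILY_NEWS = [("Alma de Cuba", 2), ("Pat's", 2)]


def key_conflicts(*sources):
    """Map each restaurant that violates the key constraint restaurant -> rating,
    once all sources are combined, to the set of ratings it received."""
    ratings = defaultdict(set)
    for source in sources:
        for restaurant, rating in source:
            ratings[restaurant].add(rating)
    # A violation is any restaurant with more than one distinct rating.
    return {name: values for name, values in ratings.items() if len(values) > 1}


if __name__ == "__main__":
    print(key_conflicts(INQUIRER, DAILY_NEWS))  # {"Pat's": {2, 3}}
```

Each source is consistent on its own; the conflict only appears when the two are combined, which is why independent sources make conflicts hard to avoid.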
Multiple Users, Conflicting Data!
- Synchronizing PDA and computer calendars
  - HotSync, iSync, etc.
  - Many possible conflicts
- Collaborators share citation databases
  - Different abbreviation styles
  - Different citation programs
- Biologists' databases
  - Collect new information from published databases
  - Store information from their own experiments
  - Disagree about key data points
A New Model for Data Sharing
- Single global database not possible
  - Participants prefer their own data representations
  - Conflicting data is present, but conflicts are localized
- Collaborative Data Sharing System (CDSS)
  - Synchronize databases by sharing updates made to individual databases
  - Each participant creates its own global database by deciding which updates to apply
- ORCHESTRA is our implementation of a CDSS
CDSS Overview
[Figure: Participant 1 and Participant 2 connected through the CDSS (ORCHESTRA)]
- CDSS coordinates with other participants
  - Stores updates (publication)
  - Finds updates the participant wants to apply (reconciliation)
- Other participants may use a different data representation or introduce contradictory information
Trust Policies in a CDSS
[Figure: restaurant ratings from the Inquirer and the Daily News, e.g., Pasion! (3 stars), Alma de Cuba (2 stars), and Pat's, rated 3 stars in one source and 2 in the other; the rating a participant sees for Pat's depends on its trust policy]
- Allow participants to trust all, some, or none of another participant's updates
- Specify preferences in the event of conflicting updates
- Example policies:
  - "I trust the Inquirer, and the Daily News only for restaurants in South Philly."
  - "I trust both papers, but I prefer the Daily News for steak shops."
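One way to make the two example policies concrete is to give each trusted update a numeric priority, let the highest priority win, and leave equal-priority disagreements open. The Python sketch below assumes hypothetical `neighborhood` and `category` attributes, invented sample updates, and the illustrative names `priority` and `resolve`; it shows the idea only and is not ORCHESTRA's policy language.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Update:
    source: str        # participant (newspaper) that published the update
    restaurant: str
    rating: int
    neighborhood: str  # hypothetical attribute used by the first policy
    category: str      # hypothetical attribute used by the second policy


# A trust rule is (source, condition, priority); priority 0 means "not trusted".
Rule = Tuple[str, Callable[[Update], bool], int]


def priority(rules: List[Rule], u: Update) -> int:
    """Highest priority among rules matching the update, or 0 if none match."""
    return max((p for src, cond, p in rules if src == u.source and cond(u)), default=0)


def resolve(rules: List[Rule], updates: List[Update], restaurant: str) -> Optional[int]:
    """Rating chosen under a policy; None if no trusted update or a tie remains."""
    scored = [(priority(rules, u), u.rating) for u in updates if u.restaurant == restaurant]
    trusted = [(p, r) for p, r in scored if p > 0]
    if not trusted:
        return None
    best = max(p for p, _ in trusted)
    values = {r for p, r in trusted if p == best}
    # Disagreement at the highest priority is left open rather than guessed.
    return values.pop() if len(values) == 1 else None


def always(u: Update) -> bool:
    return True


# "I trust the Inquirer, and the Daily News only for restaurants in South Philly."
POLICY_1 = [("Inquirer", always, 1),
            ("Daily News", lambda u: u.neighborhood == "South Philly", 1)]

# "I trust both papers, but I prefer the Daily News for steak shops."
POLICY_2 = [("Inquirer", always, 1),
            ("Daily News", always, 1),
            ("Daily News", lambda u: u.category == "steak shop", 2)]

# Hypothetical published updates; the ratings and attributes are invented.
UPDATES = [
    Update("Inquirer", "Pasion!", 3, "Center City", "restaurant"),
    Update("Inquirer", "Pat's", 3, "South Philly", "steak shop"),
    Update("Daily News", "Pat's", 2, "South Philly", "steak shop"),
    Update("Daily News", "Alma de Cuba", 2, "Center City", "restaurant"),
]

if __name__ == "__main__":
    for name in ("Pasion!", "Pat's", "Alma de Cuba"):
        print(name, resolve(POLICY_1, UPDATES, name), resolve(POLICY_2, UPDATES, name))
```

Under these assumptions, the first policy trusts both papers equally for Pat's and so leaves its rating unresolved, while the second policy prefers the Daily News rating.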
Challenges Solved in a CDSS
- Intermittent participation
- Need for consistent, predictable behavior
- Translating updates between different data representations
- Reference: Z. Ives et al., "ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data," CIDR 2005.
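To illustrate what translating updates between representations involves, here is a minimal Python sketch that maps insertions and deletions between two hypothetical schemas holding the same ratings. The schemas, the `Update` type, and the functions `p1_to_p2` and `p2_to_p1` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    op: str     # "insert" or "delete"
    row: tuple

    def __post_init__(self):
        if self.op not in ("insert", "delete"):
            raise ValueError(f"unknown operation: {self.op}")


# Hypothetical representations of the same information:
#   participant 1: Rating(restaurant, stars)
#   participant 2: Review(name, neighborhood, stars)
def p1_to_p2(u: Update) -> Update:
    restaurant, stars = u.row
    # Participant 1 does not record a neighborhood, so it is left unknown (None).
    return Update(u.op, (restaurant, None, stars))


def p2_to_p1(u: Update) -> Update:
    name, _neighborhood, stars = u.row
    # Participant 1 has no neighborhood column, so that value is dropped.
    return Update(u.op, (name, stars))


if __name__ == "__main__":
    u = Update("insert", ("Pat's", 3))
    print(p1_to_p2(u))
    print(p2_to_p1(p1_to_p2(u)) == u)
```

Even this toy mapping shows a difficulty: a deletion translated from participant 1 carries an unknown neighborhood, so it cannot simply be matched against participant 2's rows, which do record one.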
Data Sharing Operations
- Operations involve only one participant
  - Publishing
  - Reconciliation
- Participant applies a consistent subset of updates
  - May get its own unique instance
[Figure: a participant publishes new updates from its local database to the update log and sends reconciliation requests that return published updates]
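The publish and reconcile operations can be sketched in Python as follows, assuming a hypothetical shared `UpdateLog` and a `Participant` that remembers how far into the log it has reconciled. Each operation involves only one participant and the log; reconciliation applies only updates from trusted participants that do not touch the participant's own edits and do not disagree with each other within the batch. All class, method, and field names are illustrative, not ORCHESTRA's API.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateLog:
    """Hypothetical shared, append-only log of published updates."""
    entries: list = field(default_factory=list)

    def append(self, participant, updates):
        self.entries.extend((participant, key, value) for key, value in updates)

    def since(self, position):
        """Entries published after `position`, plus the new end position."""
        return self.entries[position:], len(self.entries)


@dataclass
class Participant:
    name: str
    trusted: set                                  # participants whose updates are accepted
    db: dict = field(default_factory=dict)        # local database instance
    own_keys: set = field(default_factory=set)    # keys this participant edited itself
    pending: list = field(default_factory=list)   # local edits not yet published
    cursor: int = 0                               # log position already reconciled
    deferred: list = field(default_factory=list)  # conflicts left for the user

    def update(self, key, value):
        self.db[key] = value
        self.own_keys.add(key)
        self.pending.append((key, value))

    def publish(self, log):
        # Publishing touches only this participant and the log.
        log.append(self.name, self.pending)
        self.pending = []

    def reconcile(self, log):
        new, self.cursor = log.since(self.cursor)
        candidates = {}
        for source, key, value in new:
            # Skip own and untrusted updates; local edits take precedence.
            if source == self.name or source not in self.trusted or key in self.own_keys:
                continue
            candidates.setdefault(key, set()).add(value)
        for key, values in candidates.items():
            if len(values) == 1:
                self.db[key] = next(iter(values))
            else:
                self.deferred.append((key, values))  # apply only a consistent subset


if __name__ == "__main__":
    log = UpdateLog()
    alice = Participant("alice", trusted={"bob", "carol"})
    bob = Participant("bob", trusted={"alice"})
    carol = Participant("carol", trusted={"alice", "bob"})
    bob.update("Pat's", 3)
    bob.publish(log)
    carol.update("Pat's", 2)
    carol.update("Pasion!", 3)
    carol.publish(log)
    alice.reconcile(log)
    print(alice.db)        # {'Pasion!': 3}
    print(alice.deferred)  # [("Pat's", {2, 3})]
```

Because the cursor records how far each participant has read, a participant that was offline simply receives everything published since its last reconciliation, and different participants can end up with different instances.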
Implementation Details
- Peer-to-peer Java implementation
- Efficient algorithm for reconciliation
- Distributed processing
- First implementation scales to tens to hundreds of participants
- Reference: N.E. Taylor and Z.G. Ives, "Reconciling while Tolerating Disagreement in Collaborative Data Sharing," SIGMOD 2006.
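The slide does not spell out the reconciliation algorithm. As a rough, simplified sketch of the general idea (not the algorithm from the SIGMOD paper), the Python below processes transactions in decreasing priority: it rejects untrusted transactions and those that conflict with an already accepted one, and defers transactions that disagree at equal priority or conflict with a deferred one. The `Txn` type, the write-write conflict test, and the sample transactions are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Txn:
    tid: str
    priority: int  # assigned by the reconciling participant's trust policy; 0 = untrusted
    writes: tuple  # (key, value) pairs written by the transaction


def conflicts(a: Txn, b: Txn) -> bool:
    """Two transactions conflict if they write different values to the same key."""
    other = dict(b.writes)
    return any(k in other and other[k] != v for k, v in a.writes)


def reconcile(txns):
    """Split transactions into (accepted, rejected, deferred) lists."""
    accepted, deferred = [], []
    rejected = [t for t in txns if t.priority <= 0]  # untrusted
    trusted = sorted((t for t in txns if t.priority > 0),
                     key=lambda t: t.priority, reverse=True)
    for _, level in groupby(trusted, key=lambda t: t.priority):
        undecided = []
        for t in level:
            if any(conflicts(t, a) for a in accepted):
                rejected.append(t)  # loses to a higher-priority accepted transaction
            elif any(conflicts(t, d) for d in deferred):
                deferred.append(t)  # depends on a conflict the user must resolve first
            else:
                undecided.append(t)
        for t in undecided:
            if any(conflicts(t, o) for o in undecided if o is not t):
                deferred.append(t)  # equal-priority disagreement: leave it to the user
            else:
                accepted.append(t)
    return accepted, rejected, deferred


# Hypothetical transactions as seen by one reconciling participant.
SAMPLE = [
    Txn("t1", 2, (("Pat's", 2),)),
    Txn("t2", 1, (("Pat's", 3),)),
    Txn("t3", 1, (("Pasion!", 3),)),
    Txn("t4", 1, (("Alma de Cuba", 2),)),
    Txn("t5", 1, (("Alma de Cuba", 1),)),
    Txn("t6", 0, (("Pasion!", 1),)),
]

if __name__ == "__main__":
    accepted, rejected, deferred = reconcile(SAMPLE)
    print([t.tid for t in accepted])  # ['t1', 't3']
    print([t.tid for t in rejected])  # ['t6', 't2']
    print([t.tid for t in deferred])  # ['t4', 't5']
```

Deferring rather than guessing keeps the outcome predictable: the participant's instance only changes in ways its trust policy fully determines.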
Ongoing Work
- Improving performance using caching and pre-computation
- Improving reliability using replication
- Incorporating update translation
  - First implementation assumes a single data representation
Conclusions
- Flexible model of data sharing
- Synchronization tolerates disagreement
- P2P network for reliable data storage and distributed processing
http://www.cis.upenn.edu/~zives/orchestra/
Related Work
- Inconsistency repair
  - [Bry97], [ABC99]
- Causal ordering in distributed DBs with replication
  - Optimistic concurrency control [KR81], version vectors [PPR83]
- Distributed file systems
  - Ivy [MMGC02], Coda [Braam98, KS95], Bayou [TTP96]
- File synchronization
  - Unison [PV04], Harmony [BVP06]
- Version control (CVS, Subversion, etc.)