Title: Peer-to-Peer Data Integration Using Distributed Bridges
1. Peer-to-Peer Data Integration Using Distributed Bridges
Candidate Thesis for M.A.Sc. in Electrical Engineering
- Neal Arthorne
- B. Eng. Computer Systems (2002)
- Supervisor: Babak Esfandiari
- April 12, 2005
2. Introduction
- Multiple autonomous, heterogeneous data sources
- E.g. chemistry and genetics databases, digital repositories, astronomy databases
- Data is distributed and network-accessible
- Each data source may use a different syntax or query language (SQL, Web Services, etc.)
3. Related Work
- Federated database systems (Sheth, 1990)
- Global federated schema
- Mediator approach (Wiederhold, 1992)
- Databases wrapped in a software layer that translates to a common information model
- Middleware lies between user applications and data sources
- Theoretical description of data integration
- Schemas mapped with first-order logic (FOL) statements, using the local-as-view (LAV) or global-as-view (GAV) approach
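As a rough illustration of the two mapping styles named above, the general form of each can be written as a first-order implication. The symbols are placeholders rather than notation from the thesis: g is a relation of the global schema, s is a relation of a source, and q_S, q_G are queries over the source and global schemas respectively.

```latex
% GAV: a global relation g is characterized by a query over the source schema S
\forall \mathbf{x}\; \bigl( q_{S}(\mathbf{x}) \rightarrow g(\mathbf{x}) \bigr)

% LAV: a source relation s is characterized by a query over the global schema G
\forall \mathbf{x}\; \bigl( s(\mathbf{x}) \rightarrow q_{G}(\mathbf{x}) \bigr)
```

In GAV the mapping is attached to the global schema, so adding a source can mean revising it; in LAV each source is described on its own terms against the global schema.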
4. Related Work (Cont'd)
- OWL/RDF/RDFS used to describe semantic relationships (ontologies)
- Peer-to-Peer Data Integration: a WWW approach to integrating data
- PIAZZA (Halevy et al.), Lenzerini, Franconi
- Focused on query optimization and decidability in FOL systems
5. Limitations of Related Work
- Global shared schemas are fragile and not scalable
- Centrally located and administered
- Changes affect all component databases or middleware
- P2P data integration is limited
- Semantic differences not addressed
- Centrally stored mappings
- Large databases not compatible with centralized metadata
6. Proposed Solutions
- User-contributed mappings between schemas (bridges)
- Fully decentralized distribution of mappings
- Anyone can publish a new mapping
- No global schema means improved scalability
- Provide semantic mappings for data
- Distributed searching compatible with large databases
- Use existing Universal Peer-to-Peer (U-P2P) framework
7. Universal Peer-to-Peer (U-P2P)
- Peers share XML metadata with binary attachments
- Communities formed around a shared XML Schema
- Community itself is published: anyone can create a community
- Flexible deployment: pluggable Network Adapters
[Figure: Book Community, Resource, and Attachments, related by 1 and 0..* multiplicities. Example resource:]
<?xml version="1.0"?>
<book>
  <title>War and Peace</title>
  <e-text>file://...</e-text>
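A minimal sketch of how metadata shared in a community might be searched, assuming each resource is a small XML document shaped like the book example and using only Python's standard `xml.etree.ElementTree`. The `search` helper and the sample resources are hypothetical; U-P2P's actual search interface is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical resources shared in a "book" community (attachment paths elided).
RESOURCES = [
    "<book><title>War and Peace</title><e-text>file://...</e-text></book>",
    "<book><title>Anna Karenina</title><e-text>file://...</e-text></book>",
]

def search(resources, field, keyword):
    """Return parsed resources whose `field` element contains `keyword`."""
    hits = []
    for xml_text in resources:
        root = ET.fromstring(xml_text)
        value = root.findtext(field, default="")
        if keyword.lower() in value.lower():
            hits.append(root)
    return hits

matches = search(RESOURCES, "title", "peace")
titles = [m.findtext("title") for m in matches]
```

Because every resource in a community follows the same schema, a search only needs to know the element names of that one schema.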
8. P2P Data Integration with U-P2P
- Proposed Bridge Community and bridge schema
- Anyone can publish a bridge
- Includes simple semantic relation
- Attached mappings and/or transforms
- U-P2P modularized for database proxies
- Distributed Network Adapter (Gnutella)
- Compatible with large databases
- No central indexing servers
9. Bridges in U-P2P
[Figure: relationships among Resource, Community, and Bridge]
10. Example Bridge
<bridge>
  <title>DSpace to Fedora bridge</title>
  <description></description>
  <bridgeMapping>
    <source>
      <community>d2a9d6f78dcf91828f68a52f78260e05</community>
      <resource>134d8f8ecd57acb35206b4cd13e38622</resource>
    </source>
    <relation>owl:sameAs</relation>
    <target>
      <community>d2a9d6f78dcf91828f68a52f78260e05</community>
      <resource>da1058314b7d8890fc7df7f879a0a7db</resource>
    </target>
  </bridgeMapping>
  <transformList>
    <transform>
      <file>file://</file>
    </transform>
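A minimal sketch of how a peer might read such a bridge document, using Python's standard `xml.etree.ElementTree`. The slide's example is cut off after the first transform, so the closing `</transformList>` and `</bridge>` tags below are an assumption, and `parse_bridge` is a hypothetical helper, not part of U-P2P.

```python
import xml.etree.ElementTree as ET

# Bridge from the slide; the last two closing tags are assumed, since the
# original example is truncated after the first <transform>.
BRIDGE_XML = """
<bridge>
  <title>DSpace to Fedora bridge</title>
  <description></description>
  <bridgeMapping>
    <source>
      <community>d2a9d6f78dcf91828f68a52f78260e05</community>
      <resource>134d8f8ecd57acb35206b4cd13e38622</resource>
    </source>
    <relation>owl:sameAs</relation>
    <target>
      <community>d2a9d6f78dcf91828f68a52f78260e05</community>
      <resource>da1058314b7d8890fc7df7f879a0a7db</resource>
    </target>
  </bridgeMapping>
  <transformList>
    <transform><file>file://</file></transform>
  </transformList>
</bridge>
"""

def parse_bridge(xml_text):
    """Extract the source, relation, target, and transform files of a bridge."""
    root = ET.fromstring(xml_text)
    mapping = root.find("bridgeMapping")
    return {
        "title": root.findtext("title"),
        "source": mapping.findtext("source/resource"),
        "relation": mapping.findtext("relation"),
        "target": mapping.findtext("target/resource"),
        "transforms": [f.text for f in root.iterfind("transformList/transform/file")],
    }

bridge = parse_bridge(BRIDGE_XML)
```

The source and target are identified by resource IDs inside a community, so a bridge can be published and searched for like any other resource.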
11. Case Study: Digital Repositories
[Figure: deployment diagram. Labels: Peer A, Peer B, Peer C, DSpace Community, Fedora Community, Generic Central Server, Centralized P2P, Proxy, Fedora Database, Gnutella Protocol]
12. Conclusion
- P2P approach to integration: anyone can create a bridge
- Fully distributed network adapter brings in large data sources via proxies
- Demonstrated integration with digital repositories
- Simple semantic relationship (OWL)
- Query translation
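To make the query translation step concrete, here is a minimal sketch that rewrites an XPath query from a source schema to a target schema by replacing a mapped path prefix. The element paths in `PATH_MAP` and the `translate_xpath` helper are hypothetical; the mappings actually attached to the thesis's bridges are not shown in these slides.

```python
# Hypothetical element-path mapping that a bridge's attached mapping might carry.
PATH_MAP = {
    "/item/title": "/object/label",
    "/item/author": "/object/creator",
}

def translate_xpath(query, path_map):
    """Rewrite the longest mapped path prefix of `query`.

    Returns None when no mapping applies, so the caller can skip the bridge.
    """
    for src in sorted(path_map, key=len, reverse=True):
        # Match the whole path, or the path followed by a predicate or a child step.
        if query == src or query.startswith((src + "[", src + "/")):
            return path_map[src] + query[len(src):]
    return None

translated = translate_xpath("/item/title[text()='Thesis']", PATH_MAP)
untranslatable = translate_xpath("/item/date", PATH_MAP)
```

This prefix rewrite leaves predicates untouched, so element names that appear inside a predicate are not translated; a more robust translator would need to parse the XPath expression.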
13. Future Work
- Manual navigation between schemas
- Need to automate retrieving bridges
- XPath query translation is limited
- Need to provide robust query translation modules
- Translate instance data
- Semantic relationships not exploited
- Use OWL ontologies to give bridges a context
- Software agents can be introduced to discover and use bridges
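One way the retrieval of bridges could be automated is to treat schemas as nodes and bridges as directed edges, then search for a chain of bridges between two schemas. The sketch below uses breadth-first search. The bridge list is hypothetical (only a DSpace to Fedora bridge appears in these slides), and `find_bridge_path` is an illustrative helper, not part of U-P2P.

```python
from collections import deque

# Hypothetical bridges as (source schema, target schema) pairs.
BRIDGES = [("DSpace", "Fedora"), ("Fedora", "EPrints"), ("DSpace", "MARC")]

def find_bridge_path(bridges, start, goal):
    """Breadth-first search for the shortest chain of bridges from start to goal."""
    graph = {}
    for src, dst in bridges:
        graph.setdefault(src, []).append(dst)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of bridges connects the two schemas

path = find_bridge_path(BRIDGES, "DSpace", "EPrints")
```

Bridges are treated as one-directional here; a symmetric relation such as owl:sameAs could justify adding the reverse edge as well.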