Title: Semantic Integration of Heterogeneous NASA Mission Data Sources
1. Semantic Integration of Heterogeneous NASA Mission Data Sources
Rich Keller, Dan Berrios, Shawn Wolfe, David Hall, Ian Sturken
Semantic Technologies Applied Research Team
Information Sharing & Integration Group
Intelligent Systems Division
NASA Ames Research Center
2. Goal: Virtual data integration
- Virtual integration: enable construction of a single virtual data source that presents a semantically unified view across a set of heterogeneous data sources
WebDAV servers
Web page
RDF repository
SQL database
XML file
ftp server
Web services application
Excel file
document repository
3. Outline
- Semantic Integration Basics
- SemanticIntegrator (SI) Architecture
- Application to Science Ops Planning
- Conclusions
4. What is Information Integration?
- Integration involves
  - bringing together information from multiple sources
  - synthesizing a unified view of the information
- Different shades of integration
  - Syntactic vs. Semantic
  - Shallow vs. Deep
5. Syntactic vs. Semantic Integration
- Syntactic integration: integrate based on surface commonalities across data labels and values
  - E.g., match the temperature field in one DB with the temperature field in another based on identical field name and datatype (numeric)
- Semantic integration: integrate based on commonalities in the meaning behind the data
  - E.g., match temperature fields based on the fact that both measure the same property of the same physical subsystem and their scientific units are compatible
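The contrast between the two kinds of matching can be sketched in a few lines of Python. This is only an illustration: the `Field` record, the `UNIT_DIMENSION` table, and the sample fields are assumptions made up for the example, not part of SemanticIntegrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str       # column label in the source
    datatype: str   # storage type, e.g. "numeric"
    prop: str       # measured property (hypothetical ontology term)
    subsystem: str  # physical subsystem being measured
    unit: str       # scientific unit


# Illustrative table: which physical dimension each unit measures.
UNIT_DIMENSION = {"celsius": "temperature", "kelvin": "temperature", "volt": "voltage"}


def syntactic_match(a: Field, b: Field) -> bool:
    """Match on surface features only: identical label and datatype."""
    return a.name == b.name and a.datatype == b.datatype


def semantic_match(a: Field, b: Field) -> bool:
    """Match on meaning: same property, same subsystem, compatible units."""
    dim_a = UNIT_DIMENSION.get(a.unit)
    dim_b = UNIT_DIMENSION.get(b.unit)
    return (
        a.prop == b.prop
        and a.subsystem == b.subsystem
        and dim_a is not None
        and dim_a == dim_b
    )


# Hypothetical fields from three databases.
db1_temp = Field("temperature", "numeric", "Temperature", "fuel_tank", "celsius")
db2_temp = Field("temperature", "numeric", "Temperature", "cabin", "kelvin")
db3_temp = Field("tank_t", "numeric", "Temperature", "fuel_tank", "kelvin")

print(syntactic_match(db1_temp, db2_temp), semantic_match(db1_temp, db2_temp))
print(syntactic_match(db1_temp, db3_temp), semantic_match(db1_temp, db3_temp))
```

In this sketch the first pair shares a label but measures different subsystems, so only the syntactic test accepts it; the second pair has different labels but the same meaning, so only the semantic test accepts it.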
6. Shallow vs. Deep Integration
- Shallow integration: retrieve the union of all potentially relevant information from all data sources and present everything to the user (Google-like approach)
- Deep integration: synthesize a single view from all available data sources and present that integrated view to the user
  - Requires defining a common integrated view of the data
  - Requires identification and disambiguation of similar data across sources
  - Is challenging!
7. Deep, Semantic Integration Example
Sources: Excel TI Staff List; Keller Home Page from TI Web Site
- Interleaves data from both sources
- Referees conflicts across sources
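A minimal sketch of the difference between returning everything (shallow) and producing one interleaved, conflict-free record (deep). The record contents and the source-priority policy are invented for illustration; the slides do not say how conflicts are actually refereed.

```python
# Hypothetical records describing the same person in two sources.
RECORDS = {
    "excel": {"name": "Keller", "phone": "555-0100", "office": "Room 12"},
    "web": {"name": "Keller", "phone": "555-0199", "email": "keller@example.org"},
}

# Assumed policy: when sources disagree, the earlier source in this list wins.
SOURCE_PRIORITY = ["excel", "web"]


def shallow_union(records):
    """Shallow integration: return every value from every source, unreconciled."""
    return [
        (source, field, value)
        for source, fields in records.items()
        for field, value in fields.items()
    ]


def deep_merge(records, priority):
    """Deep integration: one record that interleaves fields and resolves conflicts."""
    merged = {}
    # Apply the lowest-priority source first so higher-priority values overwrite it.
    for source in reversed(priority):
        for field, value in records.get(source, {}).items():
            merged[field] = (value, source)
    return merged


merged = deep_merge(RECORDS, SOURCE_PRIORITY)
for field in sorted(merged):
    value, source = merged[field]
    print(f"{field}: {value} (from {source})")
```

Keeping the originating source next to each merged value is a design choice of this sketch; it lets a user see which source supplied each field.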
8. Ontologies: Key to Deep, Semantic Integration
- Semantic integration requires a deep understanding of the implicit meaning and context surrounding the data
- Semantic data models based on ontologies capture the underlying meaning and context needed to support deep cross-source data integration
10. Generic Integration Architecture
[Diagram: Native Data Sources (Database, Web Service, Excel file); Integrated Data Source; User]
11. Integration Problems/Approaches
- Problems
  - Data sources store data in different formats and speak different languages
  - Data sources don't capture meaning of data
  - Data sources don't capture user's world view
- Approach
  - Use the language of ontologies (RDF) as the common format in which to perform integration
  - Wrap all data sources so they appear to store their data in RDF
  - Develop data source ontologies to capture meaning of data in native sources
  - Develop an integrated ontology to describe the user's world view of the merged data
  - Develop translation rules that allow you to map vocabularies across sources
[Diagram: Native Data Sources (Database, Web Service, Excel file); Integrated Data Source; User]
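The wrapping and translation steps of this approach might look roughly like the following. It is a sketch under stated assumptions: triples are plain Python tuples rather than a real RDF library, and the column names, the `lab:` and `integrated:` prefixes, and the rule table are all hypothetical.

```python
import csv
import io

Triple = tuple[str, str, str]  # (subject, predicate, object)


def wrap_csv_as_triples(csv_text: str, id_column: str, prefix: str) -> list[Triple]:
    """Wrapper: make tabular data appear as RDF-style triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{prefix}:{row[id_column]}"
        for column, value in row.items():
            if column != id_column:
                triples.append((subject, f"{prefix}:{column}", value))
    return triples


# Hypothetical translation rules: source vocabulary -> integrated vocabulary.
TRANSLATION_RULES = {
    "lab:SiO2_pct": "integrated:silicaPercent",
    "lab:sample_site": "integrated:collectedAt",
}


def translate(triples: list[Triple], rules: dict[str, str]) -> list[Triple]:
    """Rewrite predicates into the integrated ontology; drop unmapped ones."""
    return [(s, rules[p], o) for s, p, o in triples if p in rules]


# Made-up spreadsheet export standing in for a native source.
SPREADSHEET = "sample_id,SiO2_pct,sample_site\nS1,52.3,SiteA\n"

source_triples = wrap_csv_as_triples(SPREADSHEET, "sample_id", "lab")
integrated_triples = translate(source_triples, TRANSLATION_RULES)
print(integrated_triples)
```

Keeping the wrapper and the rule table separate mirrors the split described above: the wrapper only changes format, while the rules carry the vocabulary mapping.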
12. SemanticIntegrator Architecture
Query flow:
- Client requests data from virtual integrated data source (VIDS)
- Query routed via VIDS to data source mediator (DSM)
- DSM uses translation rules to transform the query so it can be posed to the appropriate RDF data sources
- Triple Store passes query through to wrapper
- Query is rewritten into native format for the data source by a Wrapper
- External data source is queried
- Results are returned; Wrapper translates to RDFS and passes back
- DSM translates results via rules back to language used by VIDS
- Interface displays results
Diagram components:
- Interface
- Virtual Integrated Data Source (RDFS format)
- VIDS Ontology
- Translation Rules
- Native Data Sources, each with its own ontology, RDFS Triple Store, and Wrapper:
  - DS1: Database
  - DS2: Web Service
  - DS3: Web Page
  - DS4: Excel file
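The round trip through the mediator and wrappers can be sketched as below. The class names, the predicate-only query form, and the in-memory "native" sources are assumptions for illustration and do not reflect the actual SemanticIntegrator code or query language.

```python
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object)


class Wrapper:
    """Makes one native source answer predicate queries with triples."""

    def __init__(self, prefix: str, native_lookup: Callable[[str], list]):
        self.prefix = prefix
        self.native_lookup = native_lookup  # field name -> [(record id, value)]

    def query(self, predicate: str) -> list[Triple]:
        field = predicate.split(":", 1)[1]   # rewrite query into native form
        rows = self.native_lookup(field)     # query the external source
        # Translate native rows back into triples.
        return [(f"{self.prefix}:{rid}", predicate, value) for rid, value in rows]


class DataSourceMediator:
    """Routes a VIDS-level query to sources and translates the results back."""

    def __init__(self, rules: dict[str, list[str]], wrappers: dict[str, Wrapper]):
        self.rules = rules        # VIDS predicate -> source predicates
        self.wrappers = wrappers  # source prefix -> wrapper

    def query(self, vids_predicate: str) -> list[Triple]:
        results = []
        for source_predicate in self.rules.get(vids_predicate, []):
            prefix = source_predicate.split(":", 1)[0]
            for subject, _pred, obj in self.wrappers[prefix].query(source_predicate):
                # Restate the result in the vocabulary used by the VIDS.
                results.append((subject, vids_predicate, obj))
        return results


# Hypothetical native sources held in memory.
excel_data = {"SiO2_pct": [("S1", "52.3")]}
field_data = {"site": [("S1", "SiteA")]}

dsm = DataSourceMediator(
    rules={"vids:silicaPercent": ["lab:SiO2_pct"], "vids:collectedAt": ["field:site"]},
    wrappers={
        "lab": Wrapper("lab", lambda f: excel_data.get(f, [])),
        "field": Wrapper("field", lambda f: field_data.get(f, [])),
    },
)

print(dsm.query("vids:silicaPercent"))
print(dsm.query("vids:collectedAt"))
```

The sketch leaves out the per-source triple stores and treats the query as a single predicate, which is far simpler than a real RDF query.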
14. Info. Integration for Science Operations Planning
Mobile Agents: Analog Simulation of Mars Surface Exploration by Human-Robot Teams
- Utah Field Tests, 2003 and 2004
- 50 participants over 17 days
- 3 NASA centers, 2 universities
- Diverse scenarios, rough terrain
- 2 geologists, authentic science
Co-Is: Bill Clancey, Maarten Sierhuis
Photo captions:
- Robot Mule tracks astronauts, takes photos when commanded
- Robot in "follow me" mode
- Voice annotation is recorded and transmitted to database in habitat, to RST on Earth
- Astronauts can work fully in parallel, talking to personal agents
15. Mobile Agents Data Sources
- Field Data
  - Images of field sites, environmental features, mineral samples
  - Voice notes
  - Site data (lat/lon, topography)
  - Stored in ScienceOrganizer semantic repository
- Analytic Data
  - Sample analysis data (e.g., composition)
  - Stored in Excel spreadsheet
- Mineralogy Data
  - Chemical composition, atomic weight
  - Available at minerals.com
- GIS Data
  - Satellite images
  - Geographic data (e.g., population, features)
  - Available from Microsoft's TerraServer Web service
16. Applying SemanticIntegrator to Mobile Agents
- 4 data sources
- 5 ontologies
  - 4 different ontologies to impart meaning to data sources
  - 1 ontology represents the integrated source
- Rules capture translations between sources
- Simple interface to display integrated data (SIMA: SemanticIntegrator for Mobile Agents)
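The inventory above can be written down as a small registry, which makes the "4 sources, 5 ontologies" count explicit. The `DataSource` record and its field names are assumptions for this sketch; the source, storage, and ontology labels are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str      # kind of data
    storage: str   # where the native data lives
    ontology: str  # data source ontology that imparts meaning to it


SOURCES = [
    DataSource("Field data", "ScienceOrganizer semantic repository", "Field Data"),
    DataSource("Analytic data", "Excel spreadsheet", "Analysis Data"),
    DataSource("Mineralogy data", "minerals.com", "Mineralogy Data"),
    DataSource("GIS data", "TerraServer Web service", "GIS Data"),
]

# The one additional ontology that describes the merged view.
INTEGRATED_ONTOLOGY = "Integrated Data"


def all_ontologies(sources: list[DataSource], integrated: str) -> list[str]:
    """One ontology per source plus the integrated ontology."""
    return [s.ontology for s in sources] + [integrated]


ontologies = all_ontologies(SOURCES, INTEGRATED_ONTOLOGY)
print(len(SOURCES), "data sources;", len(ontologies), "ontologies")
```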
17. Ontologies
Field Data
GIS Data
Analysis Data
Integrated Data
Mineralogy Data
18. SIMA Interface
[Screenshot of the SIMA interface; visible label: Satellite]
20. Conclusions
- Goal: avoid expensive hard-coded, one-off integration strategies
- SemanticIntegrator's explicit integration framework enables reuse of components and knowledge, reducing incremental integration overhead
  - Data source wrappers/ontologies can be reused
  - Rules can be reused