Title: Ontologies and Semantic Applications in Earth Sciences
1 Ontologies and Semantic Applications in Earth Sciences
- Peter Fox (TWC/RPI, formerly HAO/NCAR)
- Thanks to many.
- Projects funded by NSF/OCI and NASA/ACCESS/ESTO
2 Background
- Scientists should be able to access a global, distributed knowledge base of scientific data that
  - appears to be integrated
  - appears to be locally available
- But data is obtained by multiple means (models and instruments), using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed
- And there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology
3 Data-types as service
Limited interoperability
- VOTable
- Simple Image Access Protocol
- Simple Spectrum Access Protocol
- Simple Time Access Protocol
The Open Geospatial Consortium Web Feature, Coverage, and Mapping Services, and the Sensor Web Enablement Sensor Observation, Planning, and Analysis Services use the same approach
[Architecture diagram: VO App1, VO App2 and VO App3 access DB1-DBn through a VO layer]
4 Knowledge as service!
[Architecture diagram: VO Portal, Web Services and VO API]
Query, access and use of data
- Mediation Layer
- Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
- Maps queries to underlying data (see the SPARQL sketch below)
- Generates access requests for metadata, data
- Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
[Diagram: semantic mediation layer (VSTO, low level) over standard, or not, vocabularies and schema; metadata, schema and data held in DB1-DBn]
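To make the mediation layer concrete, here is a minimal SPARQL sketch of the kind of query it could answer; the vsto: namespace and the class/property names (Instrument, DataProduct, measures, hasParameter) are illustrative assumptions for this example, not the published VSTO terms.

```sparql
# Sketch of a mediated query: which parameters does each instrument
# measure, and which data products carry them?
# All vsto: terms below are placeholders, not the real VSTO ontology.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX vsto: <http://example.org/vsto#>

SELECT ?instrument ?parameter ?product
WHERE {
  ?instrument rdf:type          vsto:Instrument ;
              vsto:measures     ?parameter .
  ?product    rdf:type          vsto:DataProduct ;
              vsto:hasParameter ?parameter .
}
ORDER BY ?instrument
```

The point of the mediation layer is that a single query like this can be rewritten into access requests against DB1-DBn without the user needing to know their individual schemas.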
5 Semantic Web Methodology and Technology Development Process
- Establish and improve a well-defined methodology vision for Semantic Technology based application development
- Leverage any existing vocabularies
[Methodology diagram: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Use Tools; Science/Expert Review and Iteration; Open World - Evolve, Iterate, Redesign, Redeploy]
6 E.g. Science and technical use cases
- Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
- Extract information from the use case - encode knowledge
- Translate this into a complete query for data - inference and integration of data from instruments, indices and models (see the SPARQL sketch after this list)
- Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
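Purely as an illustration of the "translate into a complete query" step, a SPARQL sketch of the use case might look as follows; every ex: class and property, and the use of Kp >= 5 as a proxy for "high geomagnetic activity", are assumptions made for this sketch rather than the actual VSTO vocabulary or service interface.

```sparql
# Sketch: datasets describing the neutral atmosphere above 100 km and
# above 45N latitude during high geomagnetic activity (approximated
# here as Kp >= 5). All ex: terms are hypothetical.
PREFIX ex: <http://example.org/vsto#>

SELECT ?dataset ?instrument ?start ?end
WHERE {
  ?dataset  ex:observesDomain     ex:NeutralAtmosphere ;
            ex:minimumAltitude_km ?alt ;
            ex:minimumLatitude    ?lat ;
            ex:observedBy         ?instrument ;
            ex:startTime          ?start ;
            ex:endTime            ?end .
  ?kpRecord ex:indexValue         ?kp ;
            ex:validAt            ?start .
  FILTER (?alt >= 100 && ?lat >= 45.0 && ?kp >= 5)
}
```

Because the constraints are independent graph patterns and FILTER clauses, they can be supplied in any order and in any combination, which is exactly the behaviour the smart query services expose.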
7 VSTO - semantics and ontologies in an operational environment (vsto.hao.ucar.edu, www.vsto.org)
8 Semantic Web Services
9 Semantic Web Services
OWL document returned using the VSTO ontology - can be used either syntactically or semantically
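A sketch of what "semantic" (as opposed to purely syntactic) use of the returned OWL document can mean: the client asks questions over the class hierarchy instead of matching element names. The vsto:OpticalInstrument class is an invented placeholder, and the rdfs:subClassOf* property path assumes a SPARQL 1.1 engine or an RDFS reasoner on the client side.

```sparql
# Sketch: find every instrument that is, directly or indirectly, a
# kind of OpticalInstrument, using the class hierarchy in the
# returned OWL document. vsto: terms are placeholders.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vsto: <http://example.org/vsto#>

SELECT ?instrument ?class
WHERE {
  ?class      rdfs:subClassOf* vsto:OpticalInstrument .
  ?instrument rdf:type         ?class .
}
```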
10 Semantic Web Benefits
- Unified/abstracted query workflow: Parameters, Instruments, Date-Time across widely different disciplines
- Decreased input requirements for query: in one case reducing the number of selections from eight to three
- Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to only expose coherent queries (portal and services)
- Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
- Understanding of coordinate systems, relationships, data synthesis, transformations, etc. - returns independent variables and related parameters
- A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
- VSTO: http://vsto.hao.ucar.edu, http://www.vsto.org
11 http://dataportal.ucar.edu/schemas/vsto_all.owl (1.0, 2.0 coming)
12 Ingest/pipelines problem definition
- Data is coming in faster, in greater volumes, and is outstripping our ability to perform adequate quality control
- Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
- We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
- Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects
- The task of event determination and feature classification is onerous and we don't do it until after we get the data
13 (No Transcript)
14 Use cases
- Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005 18:49:09 UT taken by the ACOS Mark IV polarimeter? (see the provenance query sketch after this list)
- What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
- Find all good images on March 21, 2008.
- Why are the quick look images from March 21, 2008, 19:00 UT missing?
- Why does this image look bad?
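To show how the first question above could become a machine-answerable provenance query, here is a hedged SPARQL sketch; the ex: properties and individuals (commentsAddedBy, ACOS_MarkIV_Polarimeter, and so on) are invented for the example and are not the actual PML or SPCDIS vocabulary.

```sparql
# Sketch: who (person or program) added the comments to the science
# data file for the Mark IV image taken at 2005-01-26 18:49:09 UT?
# All ex: terms are hypothetical placeholders.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/provenance#>

SELECT ?agent
WHERE {
  ?file  ex:describesImage  ?image ;
         ex:commentsAddedBy ?agent .
  ?image ex:observedBy      ex:ACOS_MarkIV_Polarimeter ;
         ex:observationTime "2005-01-26T18:49:09Z"^^xsd:dateTime .
}
```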
15 (No Transcript)
16 (No Transcript)
17 Provenance
- Origin or source from which something comes, intention for use, who/what it was generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
- Knowledge provenance: enrich with ontologies and ontology-aware tools
18 (No Transcript)
19 (No Transcript)
20 Quick look browse
21 (No Transcript)
22 Visual browse
23 (No Transcript)
24 (No Transcript)
25 Search and structured query
[Screenshots: Structured Query; Search]
26 Search
27 Data Integration Use Case
- Determine the statistical signatures of both
volcanic and solar forcings on the height of the
tropopause
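A hedged sketch of the integration step this use case implies, once volcanic and atmospheric sources are registered against shared ontologies; the vo: and atm: vocabularies, and the VEI threshold, are placeholders for illustration rather than SWEET, SESDI or actual database terms.

```sparql
# Sketch: pair large eruptions (VEI >= 5) with subsequent tropopause
# height records so their statistical signature can be analysed.
# vo: and atm: terms are illustrative only.
PREFIX vo:  <http://example.org/volcano#>
PREFIX atm: <http://example.org/atmosphere#>

SELECT ?volcano ?eruptionDate ?obsDate ?tropopauseHeight
WHERE {
  ?eruption vo:atVolcano        ?volcano ;
            vo:eruptionDate     ?eruptionDate ;
            vo:explosivityIndex ?vei .
  ?record   atm:variable        atm:TropopauseHeight ;
            atm:observationDate ?obsDate ;
            atm:value           ?tropopauseHeight .
  FILTER (?vei >= 5 && ?obsDate >= ?eruptionDate)
}
ORDER BY ?volcano ?obsDate
```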
28 Detection and attribution relations
29 (No Transcript)
30 SWEET 2.0
31 Semantic framework indicating how volcano and
atmospheric parameters and databases can
immediately be plugged in to the semantic data
framework to enable data integration.
32 Faceted Search
33 Summary
- Level of ontology encoding relates to use, e.g.
  - VSTO
  - SPCDIS
  - SESDI: data integration needs a higher level of curation of ontologies and mapping to data
- Languages and tools
  - Rapid prototyping (PHP, Semantic MediaWiki)
  - Clean and simple (RDFS, Perl and SPARQL)
  - Complex and rich (Java, Protégé, Jena, Pellet, ELMO, Maven, Eclipse)
34 Modified GEON Solution Framework
[Diagram: Data Discovery and Data Integration]
- Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
- Level 2: Data Registration at the Inventory Level, e.g. list of datasets by types, times, products
- Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
- Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved (schema-based integration)
- Ontology-based Data Integration
(A. K. Sinha, Virginia Tech, 2006)
35 Spare material
36 Example 1: Registration of Volcanic Data
- Location Codes
  - U - Above the 180° turn at Holei Pali (upper Chain of Craters Road)
  - L - Below Holei Pali (lower Chain of Craters Road)
  - UL - Individual traverses were made both above and below the 180° turn at Holei Pali
  - H - Highway 11
SO2 emission from the Kilauea east rift zone - vehicle-based (Source: HVO)
Abbreviations: t/d = metric tonne (1000 kg)/day, SD = standard deviation, WS = wind speed, WD = wind direction east of true north, N = number of traverses
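To make "registration" concrete, the sketch below shows how one row of the table above might be captured as RDF with a SPARQL INSERT DATA request; the hvo: namespace, the property names, and the numeric values are all invented for this example.

```sparql
# Sketch: register a single vehicle-based SO2 traverse measurement
# from the Kilauea east rift zone. Namespace, properties and values
# are illustrative placeholders only.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hvo: <http://example.org/hvo#>

INSERT DATA {
  hvo:traverse_2005_01_26_U
      a                        hvo:SO2EmissionMeasurement ;
      hvo:volcanoName          "Kilauea" ;
      hvo:locationCode         "U" ;   # above the 180-degree turn at Holei Pali
      hvo:emissionRate_td      "1600"^^xsd:decimal ;
      hvo:standardDeviation_td "200"^^xsd:decimal ;
      hvo:windSpeed            "5.0"^^xsd:decimal ;
      hvo:windDirection_deg    "90"^^xsd:decimal ;
      hvo:numberOfTraverses    4 .
}
```

Note that, as the next slide points out, such a record carries only a volcano name and a location code, not explicit coordinates.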
37 Registering Volcanic Data (2)
- No explicit lat/long data
- Volcano identified by name
- Volcano ontology framework will link name to
location
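A sketch of how that link could be made in practice: the query below joins name-only emission records (as in the registration sketch above) to a volcano ontology that carries coordinates. The vo: terms and the exact rdfs:label matching are assumptions for this example.

```sparql
# Sketch: attach latitude/longitude to SO2 records that identify the
# volcano only by name, via a hypothetical volcano ontology.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vo:   <http://example.org/volcano#>
PREFIX hvo:  <http://example.org/hvo#>

SELECT ?measurement ?lat ?long
WHERE {
  ?measurement hvo:volcanoName ?name .
  ?volcano     rdfs:label      ?name ;
               vo:latitude     ?lat ;
               vo:longitude    ?long .
}
```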
38 Registering Atmospheric Data (2)
39 Building blocks
- Data formats and metadata: IAU standard FITS with the SoHO keyword convention, JPEG, GIF
- Ontologies: OWL-DL and RDF
- The Proof Markup Language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results.
- The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML, in an effort to provide a set of tools that will let end users understand information and its derivation, thereby facilitating trust in and reuse of information.
- Capturing semantics of data quality, event, and feature detection within suitable community ontology packages (SWEET, VSTO)