Title: Semantic Web and Retrieval of Scientific Data Semantics
1. Semantic Web and Retrieval of Scientific Data Semantics
Goran Soldar, University of Brighton, UK
Dan Smith, University of East Anglia, UK
2. Introduction
- Semantic Web
- Introduced by Tim Berners-Lee
- Data and resources described, interchanged, and processed
- Machine understanding of heterogeneous data
- Most search engines on the Web are oriented towards human use
- Finding and processing scientific data on the web is a time-consuming process
- Example
- Search Web pages containing the word "temperature"
- Search engine: Google
- Search domain: www.cru.uea.ac.uk
- Results: 773 web pages
3. Introduction
- Inefficiency of the traditional search
- Humans have to browse through web pages
- No guarantee that the wanted information will be found
- Preferred approach
- Describe the semantics of data using RDF/XML format
- Store the data in a DBMS
- Automatically retrieve desired information based on users' requests
- Enable client machines to learn the semantics of data described in RDF format
4. Introduction
- Objectives of this work
- To address the problem of extracting semantics from data files within the meteorology domain.
- To build the ontology for the meteorology domain.
- To create semantic cases with RDF Model/RDF Schema.
- To employ DB2 DBMS as the data repository.
- To enhance standard DBMS with RDF Triples Engine.
- To manage the RDF graph structure with RDF Triples Engine.
5. RDF and Domain Ontology
- RDF is a framework for describing metadata.
- It enables interoperability between machines by interchanging information about information resources
- It is represented with a Directed Labeled Graph
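As an illustration of the directed labeled graph view, the minimal sketch below holds a few statements as (subject, predicate, object) tuples and builds an adjacency map in which each predicate labels an edge. The values are taken from the HeaderCase example elsewhere in this deck; the tuple order and the helper name `build_graph` are assumptions made for this sketch, not part of the authors' system.

```python
from collections import defaultdict

# Triples as (subject, predicate, object). The tuple order is a choice made
# for this sketch; the values come from the deck's HeaderCase example.
TRIPLES = [
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:FormatType", "ASCII"),
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:DataParameter",
     "GeopotentialHeight_AtPressure"),
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "rdfs:domain", "cru:Height"),
]


def build_graph(triples):
    """Return an adjacency map: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(TRIPLES)
# Each printed line is one labeled edge leaving the data-file node.
for label, target in graph["hgt.1958.1000.6h.w1.53x21.dat.gz"]:
    print(f"--{label}--> {target}")
```

Each resource becomes a node, each property an edge label, and each value a target node, which is all that "directed labeled graph" requires.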
6. RDF and Domain Ontology
- Specific domains represented with RDF
- Our focus: the Meteorology domain
- The concepts, semantics and the relations between the concepts are defined with RDF Schema.
- Ontology: an explicit specification of an information domain
- RDF Schema: uses the syntax of RDF Model
- Corresponds to XML's DTD or XML Schema
- RDF Schema is a basis for RDF instances
7. Modelling RDF Model for Meteorology
- Three phases of modelling
- Development of the vocabulary (ontology)
- Design of semantic cases to capture resource description
- Creation of semantic case instances
- The vocabulary comprises the main concepts of the domain, represented by classes and properties
- RDF Schema uses RDF Model encoding syntax
- rdf:type separates RDF classes from properties
- rdfs:subClassOf allows expression of inheritance relationships between RDF classes
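The roles of rdf:type and rdfs:subClassOf can be sketched over a small list of schema triples. In the sketch below, only cru:Height and cru:FormatType appear in this deck; cru:Pressure, cru:MetData and the hierarchy between them are hypothetical names invented for illustration, as are the helper functions.

```python
# Schema triples as (subject, predicate, object).
# cru:Pressure and cru:MetData are hypothetical classes used only to give
# the example a hierarchy; they are not taken from the original ontology.
SCHEMA = [
    ("cru:MetData", "rdf:type", "rdfs:Class"),
    ("cru:Pressure", "rdf:type", "rdfs:Class"),
    ("cru:Height", "rdf:type", "rdfs:Class"),
    ("cru:Pressure", "rdfs:subClassOf", "cru:MetData"),
    ("cru:Height", "rdfs:subClassOf", "cru:Pressure"),
    ("cru:FormatType", "rdf:type", "rdf:Property"),
]


def instances_of(rdf_type, triples):
    """Names whose rdf:type is the given type (e.g. all classes)."""
    return [s for s, p, o in triples if p == "rdf:type" and o == rdf_type]


def superclasses(cls, triples):
    """All direct and inherited superclasses, following rdfs:subClassOf."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


print(instances_of("rdfs:Class", SCHEMA))
print(instances_of("rdf:Property", SCHEMA))
print(superclasses("cru:Height", SCHEMA))
```

Filtering on rdf:type is what separates classes from properties, and walking rdfs:subClassOf transitively is what gives the inheritance relationship.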
8. Modelling RDF Model for Meteorology
- The Meteorology domain at cru.sys.uea.ac.uk
- Contains about 1000 data files
- Made up of 9 meteorological topics (sub-domains)
- All sub-domains are designed as RDF classes
- All concepts and elements are defined in its namespace
- The ontology is defined in two RDF files
- Class.rdf
- Property.rdf
- Semantic cases are based on the existing vocabulary
- Simple semantic cases are designed first
- Complex cases are combinations of simple ones
9. Modelling RDF Model for Meteorology
- Our prototype model
- Describes 100 data sets
- Contains 4 semantic cases
The semantic cases
- HeaderCase: URL, FormatType, DataParameter, Comment, Domain
- ObservationCase: Frequency, TimePeriod, Value
- PeriodCase: TimeRange, TimePeriod, Value
- SizeCase: Compression, FileSize, Value
10. Modelling RDF Model for Meteorology
<rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
  <cru:URL>http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h</cru:URL>
  <cru:FormatType>ASCII</cru:FormatType>
  <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
  <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  <rdfs:domain>cru:Height</rdfs:domain>
</rdf:Description>
RDF Instance of HeaderCase for a data file
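To show how such an instance breaks down into triples, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the parser used by the authors. The rdf and rdfs namespace URIs are the standard W3C ones; the cru namespace URI is a hypothetical placeholder, since the deck does not give it, and the enclosing `rdf:RDF` element is added so that the fragment is a well-formed document. The sketch handles only flat literal properties.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
CRU_NS = "http://example.org/cru#"  # hypothetical placeholder URI

PREFIXES = {RDF_NS: "rdf", RDFS_NS: "rdfs", CRU_NS: "cru"}

DOCUMENT = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
    <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  </rdf:Description>
</rdf:RDF>
"""


def qname(tag):
    """Turn ElementTree's '{uri}local' tag into 'prefix:local'."""
    uri, local = tag[1:].split("}", 1)
    return f"{PREFIXES[uri]}:{local}"


def to_triples(xml_text):
    """Return (property, resource, value) tuples for flat descriptions."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get("about")
        for child in description:
            value = (child.text or "").strip()
            triples.append((qname(child.tag), resource, value))
    return triples


for triple in to_triples(DOCUMENT):
    print(triple)
```

Each child element of a description yields one triple: the element name is the property, the `about` attribute is the resource, and the element text is the value.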
11. From RDF to Relational Model
- Our prototype model
- Comprises 12 RDF files
- One holds semantic case descriptions
- Two hold RDF Schema descriptions
- Nine contain RDF instances of semantic cases
- Management of RDF-described data
- W3C does not recommend any method for manipulating RDF Triples
- RDF structure is similar to XML
- XML comes with APIs for data manipulation (SAX, DOM); RDF does not
12. Modelling RDF Model for Meteorology
- We utilise the RDF triple structure to achieve the manipulation of data
- XML parsers check the syntax of RDF
- RDF parsers convert it into triples
- RDF tags removed
- Triples converted into the Relational model
- Stored in DB2 DBMS
Mapping RDF model for Meteorology into RDBMS
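The mapping from triples to a relational table can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Python's built-in `sqlite3` stands in for DB2, the three-column table follows the Property/Resource/Value layout of the MetInstance table shown in this deck, and the helper names and the shortened `genidN` node identifiers are hypothetical. A list value is treated as an rdf:Seq container and given a generated node id.

```python
import sqlite3
from itertools import count


def to_rows(resource, properties, first_id=2):
    """Flatten a description into (property, resource, value) rows.

    A list value becomes an rdf:Seq node with a generated id (hypothetical
    'genidN' naming) whose members are stored as rdf:_1, rdf:_2, ...
    """
    ids = count(first_id)
    rows = []
    for prop, value in properties.items():
        if isinstance(value, list):
            node = f"genid{next(ids)}"
            rows.append(("rdf:type", node, "rdf:Seq"))
            for index, member in enumerate(value, start=1):
                rows.append((f"rdf:_{index}", node, str(member)))
            # The resource points at the container node, not at a literal.
            rows.append((prop, resource, node))
        else:
            rows.append((prop, resource, str(value)))
    return rows


def store(rows):
    """Load the rows into an in-memory table (sqlite3 standing in for DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MetInstance (Property TEXT, Resource TEXT, Value TEXT)"
    )
    conn.executemany("INSERT INTO MetInstance VALUES (?, ?, ?)", rows)
    return conn


rows = to_rows(
    "hgt.1958.1000.6h.w1.53x21.dat.gz",
    {"cru:FormatType": "ASCII", "cru:size": ["Compressed", "Kilobyte", 2593]},
)
conn = store(rows)
for row in conn.execute("SELECT * FROM MetInstance"):
    print(row)
```

Note that a container spreads over several rows, so a single row is not a self-contained statement about the data file; any query layer on top has to follow the generated node ids.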
13. Modelling RDF Model for Meteorology
14. Retrieval of Semantic Information
- RDF Triples Engine (RTE) is responsible for manipulating triples and executing semantic queries
- Based on Client/Server architecture with specialised RDF servers
- Records in DBMS have graph structure
- Not semantically atomic
- Additional query processing added to RTE
- RTE is aware of graph structure of triples
- Able to produce results that reconstruct the graph structure and present them in the format specified by users
15. Retrieval of Semantic Information
16. Retrieval of Semantic Information
| Property | Resource | Value |
|---|---|---|
| cru:URL | hgt.1958.1000.6h.w1.53x21.dat.gz | http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h |
| cru:FormatType | hgt.1958.1000.6h.w1.53x21.dat.gz | ASCII |
| cru:DataParameter | hgt.1958.1000.6h.w1.53x21.dat.gz | GeopotentialHeight_AtPressure |
| rdfs:comment | hgt.1958.1000.6h.w1.53x21.dat.gz | 6-Hourly GeopotentialHeight at 1000mb |
| rdfs:domain | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Height |
| rdf:type | cru:Heightgenid2 | rdf:Seq |
| rdf:_1 | cru:Heightgenid2 | Compressed |
| rdf:_2 | cru:Heightgenid2 | Kilobyte |
| rdf:_3 | cru:Heightgenid2 | 2593 |
| cru:size | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Heightgenid2 |
| rdf:type | cru:Heightgenid3 | rdf:Seq |
| rdf:_1 | cru:Heightgenid3 | Frequency |
| rdf:_2 | cru:Heightgenid3 | Hour |
| rdf:_3 | cru:Heightgenid3 | 6 |
| cru:observation | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Heightgenid3 |
| rdf:type | cru:Heightgenid4 | rdf:Seq |
| rdf:_1 | cru:Heightgenid4 | TimeRange |
| rdf:_2 | cru:Heightgenid4 | Year |
| rdf:_3 | cru:Heightgenid4 | 1958 |
| cru:period | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Heightgenid4 |
RDF instance converted into the relational table MetInstance
17. Retrieval of Semantic Information
- RTE relies on the SQL query processor to extract relevant triples
- Semantics Retrieval Language (SRL) prototype developed
- SQL-like syntax
- Example
- DESCRIBE RESOURCE hgt.1958.1000.6h.w1.53x21.dat.gz
Processing of the above SRL query
Step 1: Transform the query into a standard SQL statement and submit it to DB2
SELECT * FROM MetInstance WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz'
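Step 1 can be sketched as a small translator from the SRL statement to a parameterised SQL query. The deck does not give the SRL grammar, so the regular expression below is an assumption that covers only the `DESCRIBE RESOURCE <name>` form shown here; `srl_to_sql` is a hypothetical helper, and `sqlite3` again stands in for DB2.

```python
import re
import sqlite3

# Assumed shape of the one SRL statement shown in the deck.
SRL_DESCRIBE = re.compile(r"^\s*DESCRIBE\s+RESOURCE\s+(\S+)\s*$", re.IGNORECASE)


def srl_to_sql(srl):
    """Translate 'DESCRIBE RESOURCE <name>' into (sql, parameters)."""
    match = SRL_DESCRIBE.match(srl)
    if match is None:
        raise ValueError(f"unsupported SRL query: {srl!r}")
    # A bound parameter avoids pasting the resource name into the SQL text.
    return "SELECT * FROM MetInstance WHERE Resource = ?", (match.group(1),)


# Tiny stand-in for the MetInstance table (sqlite3 instead of DB2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetInstance (Property TEXT, Resource TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO MetInstance VALUES (?, ?, ?)",
    [
        ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
        ("cru:size", "hgt.1958.1000.6h.w1.53x21.dat.gz", "genid2"),
        ("rdf:_1", "genid2", "Compressed"),
    ],
)

sql, params = srl_to_sql("DESCRIBE RESOURCE hgt.1958.1000.6h.w1.53x21.dat.gz")
matching = conn.execute(sql, params).fetchall()
print(sql, params)
print(matching)
```

The query returns only the rows whose Resource is the data file itself. Rows that belong to container nodes (here the `genid2` row) need follow-up lookups, which is the kind of graph-aware processing the deck attributes to RTE.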
18. Retrieval of Semantic Information
Step 2: RTE applies the rules to generate XML as the output
1. Extract namespace prefixes and generate the XML namespace node.
2. For all (real) atomic values, create XML elements with Property values as XML elements.
3. For all non-atomic values, create XML nodes as sub-elements of the resources where they appear as values.
4. Ensure that if the node type is a Seq container, all elements are ordered.
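The four rules can be illustrated with a short sketch that rebuilds nested XML from (property, resource, value) rows. It is one possible reading of the rules, not the RTE implementation. The cru namespace URI and the shortened `genid2` node id are hypothetical, and a value is treated as non-atomic simply when it also occurs as a resource in the rows, which assumes no literal happens to equal a node id and that the rows contain no cycles.

```python
import re
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cru": "http://example.org/cru#",  # hypothetical placeholder URI
}

# (property, resource, value) rows, deliberately out of order.
ROWS = [
    ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
    ("rdf:_3", "genid2", "2593"),
    ("rdf:_1", "genid2", "Compressed"),
    ("rdf:type", "genid2", "rdf:Seq"),
    ("rdf:_2", "genid2", "Kilobyte"),
    ("cru:size", "hgt.1958.1000.6h.w1.53x21.dat.gz", "genid2"),
]

SEQ_MEMBER = re.compile(r"rdf:_(\d+)")


def group_by_resource(rows):
    grouped = {}
    for prop, resource, value in rows:
        grouped.setdefault(resource, []).append((prop, value))
    return grouped


def add_properties(element, resource, grouped):
    pairs = grouped[resource]
    if ("rdf:type", "rdf:Seq") in pairs:
        # Rule 4: emit Seq members in index order (rdf:_1, rdf:_2, ...).
        members = [(p, v) for p, v in pairs if SEQ_MEMBER.fullmatch(p)]
        pairs = sorted(
            members, key=lambda pv: int(SEQ_MEMBER.fullmatch(pv[0]).group(1))
        )
    for prop, value in pairs:
        child = ET.SubElement(element, prop)
        if value in grouped:
            # Rule 3: a non-atomic value becomes a nested node.
            is_seq = ("rdf:type", "rdf:Seq") in grouped[value]
            node = ET.SubElement(child, "rdf:Seq" if is_seq else "rdf:Description")
            add_properties(node, value, grouped)
        else:
            # Rule 2: an atomic value becomes the element text.
            child.text = value


def describe(resource, rows):
    grouped = group_by_resource(rows)
    root = ET.Element("rdf:RDF")
    # Rule 1: declare every namespace prefix used by a property.
    prefixes = {"rdf"} | {prop.split(":", 1)[0] for prop, _, _ in rows}
    for prefix in sorted(prefixes):
        root.set(f"xmlns:{prefix}", NAMESPACES[prefix])
    description = ET.SubElement(root, "rdf:Description", {"about": resource})
    add_properties(description, resource, grouped)
    return ET.tostring(root, encoding="unicode")


print(describe("hgt.1958.1000.6h.w1.53x21.dat.gz", ROWS))
```

The prefixed names are written as literal tag strings with explicit `xmlns:` attributes, which keeps the sketch short; a fuller version would use proper namespace-qualified names.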
19. Conclusion
- The RTE-DBS approach enables querying and retrieval of semantic information from scientific data files available on the Web
- Such retrieved information can be further processed by a machine or used by humans
- Future work will build a user interface into RTE for maintaining individual triples and preventing the removal of triples that are nodes
- A method for identifying the data semantics of data sets, based on reasoning over semantic cases, will be developed