Semantic Web and Retrieval of Scientific Data Semantics - PowerPoint PPT Presentation

Slides: 20
Provided by: lily83

Transcript and Presenter's Notes

1
Semantic Web and Retrieval of Scientific Data
Semantics
Goran Soldar University of Brighton UK
Dan Smith University of East Anglia UK
2
Introduction
  • Semantic Web
  • Introduced by Tim Berners-Lee
  • Data and resources described, interchanged, and
    processed
  • Machine understanding of heterogeneous data
  • Most search engines on the Web are oriented
    toward human use
  • Finding and processing scientific data on the
    Web is a time-consuming process
  • Example
  • Search for Web pages containing the word
    "temperature"
  • Search engine: Google
  • Search domain: www.cru.uea.ac.uk
  • Results: 773 web pages

3
Introduction
  • Inefficiency of the traditional search
  • Humans have to browse through web pages
  • No guarantee that the wanted information will
    be found
  • Preferred approach
  • Describe the semantics of data using RDF/XML
    format
  • Store the data in a DBMS
  • Automatically retrieve desired information
    based on users' requests
  • Enable client machines to learn the semantics
    of data described in RDF format

4
Introduction
  • Objectives of this work
  • To address the problem of extracting semantics
    from data files within the meteorology domain.
  • To build the ontology for the meteorology
    domain.
  • To create semantic cases with RDF Model/RDF
    Schema.
  • To employ DB2 DBMS as the data repository.
  • To enhance the standard DBMS with an RDF Triples
    Engine.
  • To manage the RDF graph structure with the
    RDF Triples Engine.

5
RDF and Domain Ontology
  • RDF is a framework for describing metadata.
  • It enables interoperability between machines by
    interchanging information about information
    resources
  • It is represented with a Directed Labeled Graph
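
As a minimal sketch of the directed labeled graph idea, each RDF statement can be held as a (subject, predicate, object) triple, where the predicate labels an edge from the subject node to the object node. The resource and property names are taken from the HeaderCase example later in this presentation; the `build_graph` helper is a hypothetical name introduced only for illustration.

```python
from collections import defaultdict

FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"

# Each triple is (subject, predicate, object): an edge labelled
# `predicate` that points from node `subject` to node `object`.
triples = [
    (FILE, "cru:FormatType", "ASCII"),
    (FILE, "cru:DataParameter", "GeopotentialHeight_AtPressure"),
    (FILE, "rdfs:domain", "cru:Height"),
]


def build_graph(triples):
    """Group outgoing labelled edges by their subject node."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for predicate, obj in graph[FILE]:
    print(f"{FILE} --{predicate}--> {obj}")
```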

6
RDF and Domain Ontology
  • Specific domains represented with RDF
  • Our focus: the Meteorology domain
  • The concepts, semantics and the relations
    between the concepts are defined with RDF Schema.
  • Ontology: an explicit specification of an
    information domain
  • RDF Schema: uses the syntax of RDF Model
  • Corresponds to XML's DTD or XML Schema
  • RDF Schema is a basis for RDF instances

7
Modelling RDF Model for Meteorology
  • Three phases of modelling
  • Development of the vocabulary (ontology)
  • Design of semantic cases to capture resource
    description
  • Creation of semantic case instances
  • The vocabulary comprises the main concepts,
    represented by classes and properties
  • RDF Schema uses RDF Model encoding syntax
  • rdf:type separates RDF classes from properties
  • rdfs:subClassOf allows expression of
    inheritance relationships between RDF classes

8
Modelling RDF Model for Meteorology
  • The Meteorology domain at cru.sys.uea.ac.uk
  • Contains about 1000 data files
  • Made of 9 meteorological topics (sub-domains)
  • All sub-domains are designed as RDF classes
  • Each has all concepts and elements defined in its
    namespace
  • The ontology is defined in two RDF files
  • Class.rdf
  • Property.rdf
  • Semantic cases are based on the existing
    vocabulary
  • Simple semantic cases designed first
  • Complex cases are combinations of simple
    ones

9
Modelling RDF Model for Meteorology
  • Our prototype model
  • Describes 100 data sets
  • Contains 4 semantic cases

The semantic cases
  • HeaderCase: URL, FormatType, DataParameter,
    Comment, Domain
  • ObservationCase: Frequency, TimePeriod, Value
  • PeriodCase: TimeRange, TimePeriod, Value
  • SizeCase: Compression, FileSize, Value

10
Modelling RDF Model for Meteorology
<rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
  <cru:URL>http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h</cru:URL>
  <cru:FormatType>ASCII</cru:FormatType>
  <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
  <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  <rdfs:domain>cru:Height</rdfs:domain>
</rdf:Description>
RDF Instance of HeaderCase for a data file
11
From RDF to Relational Model
  • Our prototype model
  • Comprises 12 RDF files
  • One holds semantic case descriptions
  • Two hold RDF Schema descriptions
  • Nine contain RDF instances of semantic cases
  • Management of RDF-described data
  • W3C does not recommend any method for
    manipulating RDF Triples
  • RDF structure is similar to XML
  • XML comes with APIs for data manipulation (SAX,
    DOM); RDF does not

12
Modelling RDF Model for Meteorology
  • We utilise the RDF triple structure to
    achieve the manipulation of data
  • XML parsers check the syntax of RDF
  • RDF parsers convert it into triples
  • RDF tags removed
  • Triples converted into the relational model
  • Stored in DB2 DBMS
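
The pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: Python's standard `xml.etree.ElementTree` stands in for the XML and RDF parsers, SQLite stands in for DB2, and the `cru` namespace URI, the column names, and the helper names `qname`, `to_triples` and `store` are all hypothetical. Only flat descriptions with literal values are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
CRU_NS = "http://example.org/cru#"  # hypothetical URI, not given in the slides

PREFIXES = {RDF_NS: "rdf", RDFS_NS: "rdfs", CRU_NS: "cru"}

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
    <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  </rdf:Description>
</rdf:RDF>"""


def qname(tag):
    """Turn ElementTree's '{uri}local' tag into 'prefix:local'."""
    uri, local = tag[1:].split("}")
    return f"{PREFIXES[uri]}:{local}"


def to_triples(rdf_xml):
    """Check XML syntax, then strip the RDF tags down to plain triples."""
    root = ET.fromstring(rdf_xml)  # raises ParseError on malformed XML
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get("about")
        for prop in desc:
            value = (prop.text or "").strip()
            triples.append((qname(prop.tag), resource, value))
    return triples


def store(triples, conn):
    """Map the triples onto a three-column relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS MetInstance "
        "(PROPERTY TEXT, RESOURCE TEXT, VALUE TEXT)"
    )
    conn.executemany("INSERT INTO MetInstance VALUES (?, ?, ?)", triples)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite used here in place of DB2
store(to_triples(RDF_XML), conn)
```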



Mapping RDF model for Meteorology into RDBMS
13
Modelling RDF Model for Meteorology
14
Retrieval of Semantic Information
  • RDF Triple Engine (RTE) is responsible for
    manipulating triples and executing semantic queries
  • Based on Client/Server architecture with
    specialised RDF servers
  • Records in DBMS have graph structure
  • Not semantically atomic
  • Additional query processing added to RTE
  • RTE is aware of graph structure of triples
  • Able to produce results that reconstruct the
    graph structure and present them in the format
    specified by users
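
One way to read "aware of graph structure" is that a value which itself appears as a resource (for example a generated container node) has to be followed to collect the full description. The sketch below illustrates that reading only; it is an assumption about the behaviour, not the RTE algorithm. The `describe` function and the `other.dat.gz` row are hypothetical.

```python
FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"

# (property, resource, value) rows as they might sit in the triple table.
ROWS = [
    ("cru:FormatType", FILE, "ASCII"),
    ("cru:size", FILE, "cru:Height#genid2"),
    ("rdf:type", "cru:Height#genid2", "rdf:Seq"),
    ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ("rdf:_2", "cru:Height#genid2", "Kilobyte"),
    ("rdf:_3", "cru:Height#genid2", "2593"),
    ("cru:FormatType", "other.dat.gz", "ASCII"),  # hypothetical unrelated file
]


def describe(rows, resource):
    """Collect every row reachable from `resource`, following values
    that are themselves resources (non-atomic values)."""
    resources = {res for _, res, _ in rows}
    seen, pending, result = set(), [resource], []
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # guard against cycles in the graph
        seen.add(current)
        for prop, res, value in rows:
            if res == current:
                result.append((prop, res, value))
                if value in resources:
                    pending.append(value)
    return result


description = describe(ROWS, FILE)
```

A single lookup on the file name alone would return only the first two rows; following `cru:Height#genid2` brings in the four container rows as well.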

15
Retrieval of Semantic Information
16
Retrieval of Semantic Information

Property            Resource                            Value
cru:URL             hgt.1958.1000.6h.w1.53x21.dat.gz    http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h
cru:FormatType      hgt.1958.1000.6h.w1.53x21.dat.gz    ASCII
cru:DataParameter   hgt.1958.1000.6h.w1.53x21.dat.gz    GeopotentialHeight_AtPressure
rdfs:comment        hgt.1958.1000.6h.w1.53x21.dat.gz    6-Hourly GeopotentialHeight at 1000mb
rdfs:domain         hgt.1958.1000.6h.w1.53x21.dat.gz    cru:Height
rdf:type            cru:Height#genid2                   rdf:Seq
rdf:_1              cru:Height#genid2                   Compressed
rdf:_2              cru:Height#genid2                   Kilobyte
rdf:_3              cru:Height#genid2                   2593
cru:size            hgt.1958.1000.6h.w1.53x21.dat.gz    cru:Height#genid2
rdf:type            cru:Height#genid3                   rdf:Seq
rdf:_1              cru:Height#genid3                   Frequency
rdf:_2              cru:Height#genid3                   Hour
rdf:_3              cru:Height#genid3                   6
cru:observation     hgt.1958.1000.6h.w1.53x21.dat.gz    cru:Height#genid3
rdf:type            cru:Height#genid4                   rdf:Seq
rdf:_1              cru:Height#genid4                   TimeRange
rdf:_2              cru:Height#genid4                   Year
rdf:_3              cru:Height#genid4                   1958
cru:period          hgt.1958.1000.6h.w1.53x21.dat.gz    cru:Height#genid4

RDF instance MetInstance converted into a relational table

17
Retrieval of Semantic Information
  • RTE relies on the SQL query processor to extract
    relevant triples
  • Semantics Retrieval Language (SRL) prototype
    developed
  • SQL-like syntax
  • Example
  • DESCRIBE RESOURCE hgt.1958.1000.6h.w1.53x21.dat.gz

Processing of the above SRL query
Step 1: Transform the query into a standard SQL statement
and submit it to DB2
SELECT * FROM MetInstance
WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz'
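
Step 1 can be sketched as a small translation function. This is a hedged illustration: only the `DESCRIBE RESOURCE` form shown in the slide is handled, SQLite stands in for DB2, and the function name `srl_to_sql` and the column names are assumptions. A bound parameter is used instead of pasting the resource name into the SQL text.

```python
import re
import sqlite3

FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"


def srl_to_sql(srl):
    """Translate 'DESCRIBE RESOURCE <name>' into SQL plus its parameters."""
    match = re.fullmatch(
        r"\s*DESCRIBE\s+RESOURCE\s+(\S+)\s*", srl, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unsupported SRL query: {srl!r}")
    return "SELECT * FROM MetInstance WHERE RESOURCE = ?", (match.group(1),)


# A tiny in-memory table so the generated statement can be executed.
conn = sqlite3.connect(":memory:")  # SQLite used here in place of DB2
conn.execute("CREATE TABLE MetInstance (PROPERTY TEXT, RESOURCE TEXT, VALUE TEXT)")
conn.executemany(
    "INSERT INTO MetInstance VALUES (?, ?, ?)",
    [
        ("cru:FormatType", FILE, "ASCII"),
        ("cru:DataParameter", FILE, "GeopotentialHeight_AtPressure"),
        ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ],
)

sql, params = srl_to_sql(f"DESCRIBE RESOURCE {FILE}")
rows = conn.execute(sql, params).fetchall()
```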
18
Retrieval of Semantic Information
Step 2: RTE applies the following rules to generate XML as
the output
  1. Extract namespace prefixes and generate the XML
     namespace node.
  2. For all (real) atomic values, create XML elements,
     with the Property values as the element names.
  3. For all non-atomic values, create XML nodes as
     sub-elements of the resources where they appear
     as values.
  4. If the node type is a Seq container, ensure that
     all its elements are ordered.
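
The four rules might be applied along the following lines. This is a rough sketch under assumptions rather than the RTE implementation: the `cru` namespace URI is hypothetical, the helper names `build_xml` and `add_container` are invented for illustration, containers are assumed to hold only atomic members, and Seq members are assumed to use `rdf:_n` property names.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cru": "http://example.org/cru#",  # hypothetical URI, not given in the slides
}

FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"
ROWS = [  # (property, resource, value), deliberately out of order
    ("rdf:_3", "cru:Height#genid2", "2593"),
    ("cru:FormatType", FILE, "ASCII"),
    ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ("cru:size", FILE, "cru:Height#genid2"),
    ("rdf:type", "cru:Height#genid2", "rdf:Seq"),
    ("rdf:_2", "cru:Height#genid2", "Kilobyte"),
]


def add_container(parent, props):
    """Emit a container node; Seq members are sorted by index (rule 4)."""
    node_type = next((v for p, v in props if p == "rdf:type"), None)
    members = [(p, v) for p, v in props if p != "rdf:type"]
    if node_type == "rdf:Seq":
        members.sort(key=lambda pv: int(pv[0].split("_", 1)[1]))
    container = ET.SubElement(parent, node_type or "rdf:Description")
    for prop, value in members:
        ET.SubElement(container, prop).text = value


def build_xml(rows, resource):
    by_resource = {}
    for prop, res, value in rows:
        by_resource.setdefault(res, []).append((prop, value))

    # Rule 1: collect the prefixes in use and declare them on the root node.
    prefixes = sorted({"rdf"} | {prop.split(":", 1)[0] for prop, _, _ in rows})
    root = ET.Element("rdf:RDF", {f"xmlns:{p}": NAMESPACES[p] for p in prefixes})
    desc = ET.SubElement(root, "rdf:Description", {"about": resource})

    for prop, value in by_resource.get(resource, []):
        element = ET.SubElement(desc, prop)
        if value in by_resource:
            # Rule 3: a non-atomic value becomes a nested node.
            add_container(element, by_resource[value])
        else:
            # Rule 2: an atomic value becomes the element's text.
            element.text = value
    return root


root = build_xml(ROWS, FILE)
xml_text = ET.tostring(root, encoding="unicode")
```

Prefixed names are written as literal tag strings and declared by hand on the root element, which keeps the sketch independent of ElementTree's own namespace handling.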
19
Conclusion
  • The RTE-DBS approach enables querying and retrieval
    of semantic information from scientific data files
    available on the Web
  • Such retrieved information can be further
    processed by a machine or used by humans
  • Future work will be based on building a user
    interface into RTE to maintain individual triples
    and prevent removal of triples that are nodes
  • A method for identifying the semantics of
    data sets, based on reasoning over semantic cases,
    will be developed