Metadata and Information Services for an Earthquake Simulation Grid presentation

1
Metadata and Information Services for an
Earthquake Simulation Grid

Mehmet Aktas, Marlon Pierce, and Geoffrey Fox
Community Grids Lab
Indiana University

2
SERVOGrid Background

Web services based grid for supporting earthquake
simulation
Components
Databases: faults, GPS, seismic catalogs
Simulation codes: Monte Carlo, FEM, mesh
generation tools
Web services: data access, job management
(workflow), file transfer, session management
Portlet-based user interface
Information services

3
Information Service Complaints

I have never been happy with the various
information services.
In my experience, data models have problems
Tree model forces arbitrary decisions about
container organization
Overuse of <any> tags
Poorly maintained information
UDDI problem: registries are filled with obsolete
information.
Information servers tend to be very centralized.
Peer approaches need to be examined

4
Semantic Information Services

After reviewing the Semantic Web specifications,
I became interested in using them for information
services.
Graph models seem to be a more natural way to
extend interlinked information.
Using URIs, it is potentially easier to support
fragmented data.
Centralized data services can be used, but so can
P2P approaches.
I am not so interested in artificial
intelligence, reasoning, etc.
Interesting problems, but for someone else.
We see two different activities:
Designing an RDFS/OWL ontology to act as the
SERVOGrid information services data model
Implementing the information services middleware.

5
An Ontology Overview
6
Sample Simulation Codes

Disloc: calculates surface stress displacements
caused by a fault placed in an elastic
half-space. Surface data can be either on a grid
or at defined scattered points. Can also create
InSAR-style surface displacements.
Simplex: inverts Disloc to estimate fault
parameters from observed surface displacements.
Surface displacements can be either on a grid or
at defined points.
GeoFEST: builds a realistic model of the stresses
created by a fault, using the finite element
method and realistic material properties.
AKIRA: converts a geometry (layers, faults)
specification into a finite element mesh.
Successive calls refine the mesh. Needed as a
helper application for GeoFEST.
Virtual California: simulates interacting fault
systems, based on realistic fault and fault
friction models.

7
Visualization Codes

We associate simulation codes with zero or more
visualization systems.
GMT (Generic Mapping Tools)
IDL
RIVA
Web Map Service (GIS)
In practice, we usually refer to scripts for
specific tasks rather than the entire toolkit.

8
Sample Compute Resources

Grids: a Sun Ultra 60 with Disloc, Simplex, and
VC installed.
Danube: a dual-processor Linux machine with
GeoFEST, Akira, and GMT installed.
Jabba: an 8-processor SGI machine with RIVA
installed.

9
Data Types and Formats

This is a mixture of data objects and
representations. As always, the data itself is
not represented, but information about it, such
as its creator, is.
Faults
GPS data
Seismicity
Surface stress data
InSAR data
Surface data representation: grid or point data

10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Managing Distributed Metadata
14
Managing Distributed Metadata

A small problem with the Semantic Web/Grid:
how do you manage fragments of dynamic metadata?
(Assume a uniform data model.)
In our case, we need a medium-sized distributed
information system:
not the entire web, but dynamic enough to benefit
from distributed information systems.
We want to strike a balance between response-time
efficiency and reliability.

15
Cache Nodes

Instances of the SERVO ontology are initially
spread over several distributed cache nodes.
No one cache has all the instances.
Caches are accessed as peer-to-peer nodes.

(grids,hasCode,disloc)
(danube,hasCode,geoFEST)
(kamet,hasCode,Slider)
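
The fragmentation described above can be sketched in a few lines of Python. This is only an illustration, not the SERVOGrid implementation: the node names (cache1, cache2, cache3), the Triple type, and the helper functions are assumptions made for the example, and None is used as a wildcard in query patterns.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def matches(triple, pattern):
    """A pattern is a 3-tuple in which None acts as a wildcard."""
    return all(p is None or p == v for v, p in zip(triple, pattern))


# Hypothetical cache nodes; each one holds only a fragment of the instances.
cache_nodes = {
    "cache1": {Triple("grids", "hasCode", "disloc")},
    "cache2": {Triple("danube", "hasCode", "geoFEST")},
    "cache3": {Triple("kamet", "hasCode", "Slider")},
}


def local_query(node, pattern):
    """Answer a pattern using only the triples stored at one node."""
    return {t for t in cache_nodes[node] if matches(t, pattern)}


# No single cache can answer every query; the union of all caches can.
all_triples = set().union(*cache_nodes.values())
```

Asking cache1 for (None, "hasCode", "geoFEST") returns an empty set, while the same pattern asked of cache2 returns the danube triple.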
16
Querying a Proxy Cache

Clients can connect to any of the cache nodes via
a Web service connection.
Queries and responses are just SOAP requests.
If the Proxy cache cannot answer a query, it does
a P2P search of all its neighbors.
If and when the query is answered, the initiating
proxy cache augments its RDF store with the new
information from the peer.
It can henceforth answer that query without
searching.
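
A minimal Python sketch of this query-then-cache behavior follows. It assumes in-memory objects in place of SOAP calls and a direct list of neighbors in place of the real P2P layer; the ProxyCache class and the peer_searches counter are hypothetical names used only for illustration. As a simplification, a cache that finds any local match answers from its own store.

```python
class ProxyCache:
    def __init__(self, name, triples):
        self.name = name
        self.store = set(triples)   # local RDF store, as (s, p, o) tuples
        self.neighbors = []         # other ProxyCache objects
        self.peer_searches = 0      # counts how often we had to ask peers

    def local(self, pattern):
        """Match a pattern against the local store; None is a wildcard."""
        return {t for t in self.store
                if all(p is None or p == v for v, p in zip(t, pattern))}

    def query(self, pattern):
        found = self.local(pattern)
        if found:
            return found
        # Local miss: search all neighbors, then keep what they returned.
        self.peer_searches += 1
        for peer in self.neighbors:
            found |= peer.local(pattern)
        self.store |= found
        return found


cache1 = ProxyCache("cache1", {("Grids", "hasCode", "Disloc")})
cache2 = ProxyCache("cache2", {("Danube", "hasCode", "GeoFEST")})
cache1.neighbors = [cache2]

first = cache1.query((None, "hasCode", "GeoFEST"))   # answered by the peer
second = cache1.query((None, "hasCode", "GeoFEST"))  # answered locally
```

After the first call, cache1 holds the Danube triple itself, so the second call does not trigger another peer search.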

17
Diagram: the Client sends the query
(?,hasCode,GeoFEST) to Proxy Cache 1 in a SOAP
call. Proxy Cache 1 holds (Grids,hasCode,Disloc),
so it runs a peer search. Proxy Cache 2 holds
(Danube,hasCode,GeoFEST), which is returned.
18
Notification Updates to Proxy Caches

Proxy caches acquire larger sets of metadata over
time in response to client queries.
The problem now is that caches can become out of
sync.
For example, Disloc may be removed from Grids, so
all caches have to be notified.
This is handled through a publish/subscribe
system based on topics.
There is one topic for each property.
Caches subscribe to topics for each property.
Origin caches are allowed to publish changes.
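
The notification scheme can be sketched as a small in-process publish/subscribe loop. This is an assumption-laden illustration rather than the NaradaBrokering API: the Broker and Cache classes, the ("remove", triple) message shape, and the callback style are all invented for the example, and only removals are handled.

```python
from collections import defaultdict


class Broker:
    """Toy topic-based broker: one topic per property (predicate)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


class Cache:
    def __init__(self, broker, triples):
        self.store = set(triples)
        # Subscribe once to the topic of every property held in the store.
        for predicate in {p for _, p, _ in self.store}:
            broker.subscribe(predicate, self.on_update)

    def on_update(self, message):
        action, triple = message
        if action == "remove":
            self.store.discard(triple)


broker = Broker()
disloc_on_grids = ("Grids", "hasCode", "Disloc")
cache_a = Cache(broker, {disloc_on_grids})
cache_b = Cache(broker, {disloc_on_grids, ("Danube", "hasCode", "GeoFEST")})

# The origin cache announces that Disloc was removed from Grids.
broker.publish("hasCode", ("remove", disloc_on_grids))
```

Both caches drop the stale triple, while the unrelated Danube triple in cache_b is left alone.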

19
More Information

QuakeSim: http://www-aig.jpl.nasa.gov/public/dus/quakesim/
Semantic Web Work: http://grids.ucs.indiana.edu/maktas/servo/
NASA CT and AIST support the QuakeSim project,
and NASA Ames supported the Semantic Grid
investigations.

20
Querying Cache Space
(Diagram: Client, Web Service, Broker Cloud, Cache Space)
21
The Picture

Each peer of the P2P network works as a Proxy
Cache. A Proxy Cache forms a door between the
client and the Cache Space.
Clients interact with peers through a Web Service
interface.
When a client queries a peer where the cache is
installed, this peer queries its own cache and
then forwards the query to the rest of the Cache
Space.
Forwarding simply means publishing the query
to the available topics. With this method, the
query is distributed stepwise to the nodes that
are semantically connected to the origin Proxy
Cache.
Each query message carries the unique identifier
of the peer that originated the query, so that
results can be propagated back to it.
Each cache repeats the querying and forwarding
process until there are results.
When there are results for the query, they are
propagated back as an RDF Model.
The distributed search stops when there are
results satisfying the query, or when no results
are found within a customizable threshold on the
number of exploration steps.

22
What About WS-<any>?

We are examining the feasibility of using RDF and
related languages to describe our information
requirements.
We are building a testbed infrastructure for
decentralized metadata management as a proof of
concept.
There are many activities and specifications in
this general area that we do not want to use in
the proof-of-concept phase.
WS-Notification and WSRF, obviously.

23
Edutella P2P Network Infrastructure

Edutella uses the JXTA framework for P2P
functionality and provides services that
complement the JXTA service layer.
Edutella uses RDF syntax for metadata. Each peer
provides a Query Service to search its RDF
repository.
There is no stepwise exploration of the peers. A
JXTA peer sends a query only to its JXTA
neighborhood, regardless of the link structure of
the metadata.
There is no caching of the metadata on the peers
to decrease the search time. Each JXTA peer
performs a query on its own data repository.

24
High-level classification of classes in the
SERVOGrid Ontology
25
Classification of ServoComputePlatform
26
Classification of Servo Code Characteristics
27
Associated properties with ServoCode Class
28
Associated properties with ServoData Class
29
Associated properties with ServoComputePlatform
Class
31
Notification-Based Caching

We want to strike a balance between centralized
and decentralized content management.
Metadata instances may be distributed over
several hosts.
We are investigating caching based on
breadth-first search.
Each node stores its own data and all of the
immediate property values, one node deep.
This allows each node to maintain a moderate
amount of information sufficient to satisfy
immediate (one-hop) RDF queries.
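
One way to read "one node deep" is as a breadth-first walk over the RDF graph that stops after a fixed number of levels. The sketch below follows that reading; it is an assumption, not the authors' code. The neighborhood function is hypothetical, and the hasHelper and producesFormat predicates are made up purely to give the sample graph some depth.

```python
def neighborhood(triples, start, depth=1):
    """Collect triples reachable from `start` within `depth` BFS levels.

    With depth=1 this returns the resource's own triples, that is, its
    properties together with their immediate values.
    """
    collected = set()
    visited = {start}
    frontier = {start}
    for _ in range(depth):
        level = {t for t in triples if t[0] in frontier}
        collected |= level
        # Move on to the values we have just discovered.
        frontier = {t[2] for t in level} - visited
        visited |= frontier
    return collected


# Hypothetical sample graph; only hasCode appears in the slides.
graph = {
    ("danube", "hasCode", "geoFEST"),
    ("geoFEST", "hasHelper", "akira"),
    ("akira", "producesFormat", "mesh"),
}

one_hop = neighborhood(graph, "danube", depth=1)
two_hop = neighborhood(graph, "danube", depth=2)
```

Here one_hop is enough to answer (danube, hasCode, ?) locally, while anything about geoFEST itself needs a second level or a query to another node.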

32
Distributed RDF Queries based on properties

Queries are formed as triples to find available
metadata about a resource.
Results are metadata (set of triples) regarding
the requested resource.
A query may be issued to any P2P node where the
cache is installed.
Starting from the first cache, each cache is
queried via stepwise exploration.
The first cache to interact with the client is
the Proxy cache.
The Proxy cache distributes the client's query
together with its own unique identifier so that
it can receive the results.

33
Distributed Query Steps occur as follows

The first cache in the cache space is queried.
When there are no results, the query is published
to the available topics with the unique
identifier of the first P2P node.
Each cache repeats the querying and forwarding
process until there are results.
When there are results satisfying the query, they
are propagated back as an RDF Model.
The distributed search stops when there are
results satisfying the query, or when no results
are found within a customizable threshold on the
number of exploration steps.
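
The steps above can be sketched as a level-by-level search with a step limit. This is a simplified illustration under stated assumptions: explicit neighbor links stand in for forwarding through topics, results are returned directly instead of being routed back by peer identifier, and the names distributed_query, caches, links, and max_steps are hypothetical.

```python
def matches(triple, pattern):
    """None in the pattern acts as a wildcard."""
    return all(p is None or p == v for v, p in zip(triple, pattern))


def distributed_query(caches, links, first, pattern, max_steps=2):
    """Query `first`, then forward stepwise until results or the threshold."""
    visited = {first}
    frontier = [first]
    for _ in range(max_steps + 1):      # step 0 is the first cache itself
        results = {t for name in frontier
                   for t in caches[name] if matches(t, pattern)}
        if results:
            return results              # stop as soon as anything matches
        next_frontier = []
        for name in frontier:
            for peer in links.get(name, []):
                if peer not in visited:
                    visited.add(peer)
                    next_frontier.append(peer)
        if not next_frontier:
            break                       # nobody left to ask
        frontier = next_frontier
    return set()


caches = {
    "c1": {("Grids", "hasCode", "Disloc")},
    "c2": set(),
    "c3": {("Danube", "hasCode", "GeoFEST")},
}
links = {"c1": ["c2"], "c2": ["c3"]}
query = (None, "hasCode", "GeoFEST")

found = distributed_query(caches, links, "c1", query, max_steps=2)
gave_up = distributed_query(caches, links, "c1", query, max_steps=1)
```

With a threshold of two steps the query reaches c3 and returns the Danube triple; with a threshold of one step the search stops at c2 and returns nothing.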

34
Initialization of system with fragmented RDF

When a node is bootstrapped, a triple store is
created out of available RDF Models at that node.
Predicates of available triples (where the object
of the triple is a Resource) form topics.
Topics are created at the broker node dynamically
by publishing a message to static topics such as
createTopic.
A Resource metadata provider can be a publisher
for the topics where the Resource is the Domain
of that topic.
A node can be a subscriber to a topic when there
are Resource Objects (in the cached triple store)
that are in the Range of that topic.
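
As a rough sketch of this bootstrap step, topics can be derived by scanning the local triples for predicates whose object is a Resource. Everything named here is an assumption for illustration: the Resource wrapper, the servo: prefixes, the hasName and hasVisualization predicates, the split into origin and cached triples, and the shape of the createTopic messages.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    """Marks an RDF resource; plain strings are treated as literals."""
    uri: str


def topics_for(triples):
    """Predicates whose object is a Resource become topics."""
    return {p for _, p, o in triples if isinstance(o, Resource)}


# Triples this node provides itself (it is their origin).
origin = {
    (Resource("servo:danube"), "hasCode", Resource("servo:geoFEST")),
    (Resource("servo:danube"), "hasName", "Danube"),   # literal: no topic
}
# Triples this node has cached from its peers.
cached = {
    (Resource("servo:grids"), "hasCode", Resource("servo:disloc")),
    (Resource("servo:disloc"), "hasVisualization", Resource("servo:GMT")),
}

publish_topics = topics_for(origin)      # node may publish changes here
subscribe_topics = topics_for(cached)    # node listens for changes here

# Messages the node would send to the static createTopic topic.
create_messages = [("createTopic", t)
                   for t in sorted(publish_topics | subscribe_topics)]
```

In this reading a node publishes on the topics of the metadata it originates and subscribes to the topics of the metadata it has cached; hasName never becomes a topic because its value is a literal.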

35
Notification based updates

Static topics, such as createTopic and
deleteTopic, are used to create dynamic topics.
A finite number of topics is available:
the SERVOGrid ontology defines all possible
topics (predicates).
The publisher node of a topic stores the origin
metadata (set of triples).
The metadata provider is responsible for
propagating updates.
The subscriber node of a topic listens for
updates to the resources that are in the range
of that topic.

36
Topic-Based Publish/Subscribe Systems

Publish/subscribe systems are a way to distribute
messages to many different listeners.
Publishers and subscribers are associated with
topics.
We use the in-house developed NaradaBrokering system
JMS, WS-Notification
S. Pallickara

(Diagram: a Publisher and three Subscribers connected through the Broker Cloud)
