Title: Metadata and Information Services for an Earthquake Simulation Grid
1 Metadata and Information Services for an Earthquake Simulation Grid
- Mehmet Aktas, Marlon Pierce, and Geoffrey Fox
- Community Grids Lab
- Indiana University
2 SERVOGrid Background
- Web services based grid for supporting earthquake simulation
- Components
- Databases: faults, GPS, seismic catalogs
- Simulation codes: Monte Carlo, FEM, mesh generation tools
- Web services: data access, job management (workflow), file transfer, session management
- Portlet-based user interface
- Information services
3 Information Service Complaints
- I have never been happy with the various information services.
- In my experience, data models have problems
- Tree model forces arbitrary decisions about container organization
- Overuse of <any> tags
- Poorly maintained information
- UDDI problem: registries are filled with obsolete information.
- Information servers tend to be very centralized.
- Peer approaches need to be examined
4 Semantic Information Services
- After reviewing the Semantic Web specifications, I became interested in using them for information services.
- Graph models seem to be a more natural way to extend interlinked information.
- Using URIs, it is potentially easier to support fragmented data
- Centralized data services can be used, but so can P2P approaches.
- I am not so interested in artificial intelligence, reasoning, etc.
- Interesting problems, but for someone else.
- We see two different activities
- Designing an RDFS/OWL ontology to act as the SERVOGrid information services data model
- Implementing the information services middleware.
5 An Ontology Overview
6 Sample Simulation Codes
- Disloc: calculates surface stress displacements caused by a fault placed in an elastic half-space. Surface data can be either on a grid or on defined scattered points. Can also create InSAR-style surface displacements.
- Simplex: inverts Disloc to estimate fault parameters from observed surface displacements. Surface displacements can be either on a grid or at defined points.
- GeoFEST: builds a realistic model of the stresses created by a fault. Uses the finite element method and realistic material properties.
- AKIRA: converts a geometry (layers, faults) specification into a finite element mesh. Successive calls refine the mesh. Needed as a helper application for GeoFEST.
- Virtual California: based on realistic fault and fault friction models, simulates interacting fault systems.
7 Visualization Codes
- We associate simulation codes with zero or more visualization systems.
- GMT (General Mapping Tool)
- IDL
- RIVA
- Web Map Service (GIS)
- In practice, we usually refer to scripts for specific tasks rather than the entire toolkit.
8 Sample Compute Resources
- Grids: a Sun Ultra 60 with Disloc, Simplex, and VC installed.
- Danube: a Linux dual-processor machine with GeoFEST, Akira, and GMT installed.
- Jabba: an SGI 8-processor machine with RIVA installed.
9 Data Types and Formats
- This is a mixture of data objects and representations. As always, the data itself is not represented, but information like the creator of the data is.
- Faults
- GPS data
- Seismicity
- Surface stress data
- InSAR data
- Surface data representation: grid or point data
13 Managing Distributed Metadata
14 Managing Distributed Metadata
- Small problem with the Semantic Web/Grid
- How do you manage fragments of dynamic metadata?
- (Assume a uniform data model)
- In our case, we need a medium-sized distributed information system
- Not the entire web, but dynamic enough to benefit from distributed information systems.
- We want to strike a balance between response time efficiency and reliability.
15 Cache Nodes
- Instances of the SERVO ontology are initially spread over several distributed cache nodes.
- No one cache has all the instances.
- Caches are accessed as peer-to-peer nodes.
[Diagram: cache nodes holding the triples (grids,hasCode,disloc), (danube,hasCode,geoFEST), and (kamet,hasCode,Slider)]
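The fragmentation described on this slide can be sketched in a few lines. This is a minimal illustration, assuming triples are plain (subject, predicate, object) tuples, that `None` acts as a wildcard, and that each node starts with one of the triples shown. The `CacheNode` class, its `match` method, and the node names are hypothetical and not part of SERVOGrid.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class CacheNode:
    """Hypothetical cache node holding one fragment of the ontology instances."""

    def __init__(self, name: str, triples: Set[Triple]) -> None:
        self.name = name
        self.triples = set(triples)

    def match(self, pattern: Pattern) -> Set[Triple]:
        """Return local triples matching the pattern; None matches anything."""
        return {
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        }


# Assumed starting state: each node holds only its own fragment.
nodes = [
    CacheNode("cache1", {("grids", "hasCode", "disloc")}),
    CacheNode("cache2", {("danube", "hasCode", "geoFEST")}),
    CacheNode("cache3", {("kamet", "hasCode", "Slider")}),
]

# "Which machine has geoFEST installed?"
query: Pattern = (None, "hasCode", "geoFEST")
holders = [n.name for n in nodes if n.match(query)]
print(holders)
```

Only one node can answer the query locally, which is why the later slides introduce peer search and caching.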
16 Querying a Proxy Cache
- Clients can connect to any of the cache nodes via a Web service connection.
- Queries and responses are just SOAP requests.
- If the Proxy cache cannot answer a query, it does a P2P search of all neighbors.
- If/when the query is answered, the initiating proxy cache augments its RDF store with the new info from the peer.
- It can henceforth answer that query without searching.
17 [Diagram: a Client sends the query (?,hasCode,GeoFEST) to Proxy Cache 1 by SOAP call. Proxy Cache 1 holds (Grids,hasCode,Disloc); a peer search finds (Danube,hasCode,GeoFEST) at Proxy Cache 2, and Proxy Cache 1 keeps a copy.]
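The query-then-cache behavior of the two preceding slides might look like the following sketch. It assumes an in-process stand-in for the SOAP and P2P transport; the `ProxyCache` class, the `peer_searches` counter, and the `None` wildcard convention are illustrative assumptions rather than the project's actual interfaces.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(pattern: Pattern, triple: Triple) -> bool:
    """A None slot in the pattern plays the role of '?' on the slide."""
    return all(p is None or p == v for p, v in zip(pattern, triple))


class ProxyCache:
    """Hypothetical proxy cache: answers locally, else asks its neighbors."""

    def __init__(self, name: str, triples: Set[Triple]) -> None:
        self.name = name
        self.store: Set[Triple] = set(triples)
        self.neighbors: List["ProxyCache"] = []
        self.peer_searches = 0  # counts how often a peer search was needed

    def local(self, pattern: Pattern) -> Set[Triple]:
        return {t for t in self.store if matches(pattern, t)}

    def query(self, pattern: Pattern) -> Set[Triple]:
        found = self.local(pattern)
        if found:
            return found
        # Local miss: search all neighbors, then keep a copy of the answer.
        self.peer_searches += 1
        for peer in self.neighbors:
            found |= peer.local(pattern)
        self.store |= found
        return found


cache1 = ProxyCache("Proxy Cache 1", {("Grids", "hasCode", "Disloc")})
cache2 = ProxyCache("Proxy Cache 2", {("Danube", "hasCode", "GeoFEST")})
cache1.neighbors = [cache2]

first = cache1.query((None, "hasCode", "GeoFEST"))   # needs a peer search
second = cache1.query((None, "hasCode", "GeoFEST"))  # answered from the copy
print(first, cache1.peer_searches)
```

The second call is answered from Proxy Cache 1's own store, so the peer-search counter stays at one.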
18 Notification Updates to Proxy Caches
- Proxy caches acquire larger sets of metadata over time in response to client queries.
- The problem now is that caches can become out of sync.
- Disloc may be removed from Grids, so all caches have to be notified.
- This is handled through a publish/subscribe system based on topics.
- There is one topic for each property.
- Caches subscribe to topics for each property.
- Origin caches are allowed to publish changes.
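The update path above can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions: the `Broker` and `Cache` classes, the `("remove", triple)` message shape, and the `retract` method are hypothetical, and the real system uses NaradaBrokering rather than anything shown here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Message = Tuple[str, Triple]


class Broker:
    """Toy topic-based broker: one topic per property name."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[Message], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, topic: str, callback: Callable[[Message], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: Message) -> None:
        for callback in self.subscribers[topic]:
            callback(message)


class Cache:
    """Hypothetical cache that keeps copies in sync via notifications."""

    def __init__(self, name: str, broker: Broker, triples: Iterable[Triple] = ()) -> None:
        self.name = name
        self.broker = broker
        self.store: Set[Triple] = set(triples)

    def subscribe_to(self, prop: str) -> None:
        self.broker.subscribe(prop, self.on_update)

    def on_update(self, message: Message) -> None:
        action, triple = message
        if action == "remove":
            self.store.discard(triple)

    def retract(self, triple: Triple) -> None:
        """Origin cache drops a triple and publishes the change on its property topic."""
        self.store.discard(triple)
        self.broker.publish(triple[1], ("remove", triple))


broker = Broker()
fact = ("Grids", "hasCode", "Disloc")
origin = Cache("origin", broker, {fact})
proxy = Cache("proxy", broker, {fact})  # holds a cached copy of the fact
proxy.subscribe_to("hasCode")

origin.retract(fact)  # Disloc is removed from Grids
print(proxy.store)
```

After the origin retracts the triple, the subscribed proxy no longer holds the stale copy.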
19 More Information
- QuakeSim: http://www-aig.jpl.nasa.gov/public/dus/quakesim/
- Semantic Web Work: http://grids.ucs.indiana.edu/maktas/servo/
- NASA CT and AIST support the QuakeSim project, and NASA Ames supported the Semantic Grid investigations.
20 Querying Cache Space
[Diagram labels: Client, Web Service, Cache Space, Broker Cloud]
21 The Picture
- Each peer of the P2P network works as a Proxy Cache. A Proxy Cache forms a door between the client and the Cache Space.
- Clients interact with peers through a Web Service interface.
- When a client queries a peer where the cache is installed, this peer queries its own cache and then forwards the query to the rest of the Cache Space.
- Forwarding simply means publishing the query to the available topics. With this method the query is distributed stepwise to the nodes that are semantically connected to the origin Proxy Cache.
- Each query message carries the unique identifier of the peer that originated the query, so that results can be propagated back to that peer.
- Each cache repeats the querying and forwarding process until there are results.
- When there are results for the query, they are propagated back as an RDF Model.
- Distributed search stops when there are results satisfying the query, or when no results are found after a customizable threshold on the number of exploration steps.
22 What About WS-<any>?
- We are examining the feasibility of using RDF and related languages to describe our information requirements.
- Build a testbed infrastructure for decentralized metadata management as a proof of concept.
- There are many activities and specifications in this general area that we do not want to use in the proof-of-concept phase.
- WS-Notification and WSRF, obviously.
23 Edutella P2P Network Infrastructure
- Edutella uses the JXTA framework for P2P functionality and provides services that complement the JXTA service layer.
- Edutella uses RDF syntax for metadata. Each peer provides a Query Service to search its RDF repository.
- There is no stepwise exploration of the peers. A JXTA peer sends a query only to its JXTA neighborhood, regardless of the link structure of the metadata.
- There is no caching of the metadata on the peers to decrease the search time. Each JXTA peer performs a query on its own data repository.
24 High level classification of classes in Servo Grid Ontology
25 Classification of ServoComputePlatform
26 Classification of Servo Code Characteristics
27 Associated properties with ServoCode Class
28 Associated properties with ServoData Class
29 Associated properties with ServoComputePlatform Class
31 Notification-Based Caching
- We want to strike a balance between centralized and decentralized content management.
- Metadata instances may be distributed over several hosts.
- We are investigating caching based on breadth-first search.
- Each node stores its own data and all of the immediate property values, one node deep.
- This allows each node to maintain a moderate amount of information sufficient to satisfy immediate (one-hop) RDF queries.
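One way to read the "one node deep" rule is sketched below. The interpretation is an assumption: a node keeps every triple whose subject it owns, plus the triples describing the resources those triples point to, and nothing further. The function name `one_hop_cache` and the properties `hasHelper` and `hasInput` are hypothetical and do not come from the SERVOGrid ontology.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def one_hop_cache(owned: Set[str], graph: Iterable[Triple]) -> Set[Triple]:
    """Return the triples a node would hold under the assumed one-hop rule."""
    all_triples = list(graph)
    # The node's own data: triples whose subject is a resource it owns.
    own = {t for t in all_triples if t[0] in owned}
    # Immediate property values: the objects those triples point to.
    neighbors = {obj for (_, _, obj) in own}
    # One node deep: the descriptions of those neighbors, but no further.
    one_hop = {t for t in all_triples if t[0] in neighbors}
    return own | one_hop


graph = [
    ("danube", "hasCode", "geoFEST"),
    ("geoFEST", "hasHelper", "akira"),      # hypothetical property
    ("akira", "hasInput", "geometrySpec"),  # hypothetical property, two hops away
]

cache = one_hop_cache({"danube"}, graph)
print(sorted(cache))
```

The triple about `akira` is two hops from `danube`, so it stays out of that node's cache and would have to be fetched by a distributed query.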
32 Distributed RDF Queries based on properties
- Queries are formed as triples to find available metadata about a resource.
- Results are metadata (a set of triples) regarding the requested resource.
- A query may be issued to any P2P node where the cache is installed.
- Starting from the first cache, each cache is queried via stepwise exploration.
- The first cache interacting with the client is the Proxy cache.
- The Proxy cache distributes the client's query with its unique identifier to be able to receive the results.
33 Distributed Query Steps occur as follows
- The first cache in the cache space is queried.
- When there are no results, the query is published to available topics with the unique identifier of the first P2P node.
- Each cache repeats the querying and forwarding process until there are results.
- When there are results satisfying the query, they are propagated back as an RDF Model.
- Distributed search stops when there are results satisfying the query, or when no results are found after a customizable threshold on the number of exploration steps.
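The steps above can be sketched as a bounded breadth-first exploration. This is a single-process stand-in, not the actual middleware: topic-based forwarding and the routing of results by the originator's identifier are collapsed into an ordinary loop, and the `Cache` class, `distributed_query` function, and `max_steps` parameter are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Cache:
    """Hypothetical cache node with a local triple store and neighbor links."""

    def __init__(self, name: str, triples: Set[Triple]) -> None:
        self.name = name
        self.store = set(triples)
        self.neighbors: List["Cache"] = []

    def local(self, pattern: Pattern) -> Set[Triple]:
        return {t for t in self.store
                if all(p is None or p == v for p, v in zip(pattern, t))}


def distributed_query(
    origin: Cache, pattern: Pattern, max_steps: int
) -> Tuple[Set[Triple], Optional[int]]:
    """Explore the cache space step by step; stop on results or at the threshold."""
    frontier = [origin]
    visited = {origin.name}
    for step in range(max_steps + 1):
        results: Set[Triple] = set()
        for cache in frontier:
            results |= cache.local(pattern)
        if results:
            return results, step  # results go back to the originating cache
        # No results yet: forward the query one step further out.
        next_frontier = []
        for cache in frontier:
            for peer in cache.neighbors:
                if peer.name not in visited:
                    visited.add(peer.name)
                    next_frontier.append(peer)
        if not next_frontier:
            break
        frontier = next_frontier
    return set(), None  # threshold reached or cache space exhausted


c1 = Cache("c1", {("grids", "hasCode", "disloc")})
c2 = Cache("c2", {("danube", "hasCode", "geoFEST")})
c3 = Cache("c3", {("kamet", "hasCode", "Slider")})
c1.neighbors, c2.neighbors = [c2], [c3]

pattern: Pattern = (None, "hasCode", "Slider")
found, steps = distributed_query(c1, pattern, max_steps=2)
print(found, steps)
```

With `max_steps=1` the same query gives up before reaching `c3`, which illustrates the trade-off between response time and completeness that the threshold controls.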
34 Initialization of system with fragmented RDF
- When a node is bootstrapped, a triple store is created out of available RDF Models at that node.
- Predicates of available triples (where the object of the triple is a Resource) form topics.
- Topics are created at the broker node dynamically by publishing a message to static topics such as createTopic.
- A Resource metadata provider can be a publisher for the topics where the Resource is the Domain of that topic.
- A node can be a subscriber to a topic when there are Resource Objects (in the cached triple store) that are in the Range of that topic.
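The bootstrap rules above can be sketched as follows. The reading is an assumption: a node publishes on the predicates of triples it originates, and subscribes to the predicates of triples it merely caches, in both cases only when the object is a Resource rather than a literal. The `Resource` wrapper, the `topics_for` and `roles` helpers, and the `hasName` and `visualizedBy` properties are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple, Union


@dataclass(frozen=True)
class Resource:
    """Marks a node in the RDF graph, as opposed to a plain string literal."""
    uri: str


Triple = Tuple[Resource, str, Union[Resource, str]]


def topics_for(triples: Iterable[Triple]) -> Set[str]:
    """Predicates whose object is a Resource form the topics."""
    return {p for (_, p, o) in triples if isinstance(o, Resource)}


def roles(owned: Iterable[Triple], cached: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Return (topics to publish on, topics to subscribe to) for one node."""
    return topics_for(owned), topics_for(cached)


# Triples this node originates (it is the metadata provider for "grids").
owned = [
    (Resource("grids"), "hasCode", Resource("disloc")),
    (Resource("grids"), "hasName", "Grids"),  # literal object: no topic
]
# Triples this node only holds as cached copies from other providers.
cached = [
    (Resource("geoFEST"), "visualizedBy", Resource("GMT")),
]

publish_topics, subscribe_topics = roles(owned, cached)
print(publish_topics, subscribe_topics)
```

The literal-valued `hasName` triple produces no topic, matching the rule that only predicates with Resource objects form topics.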
35 Notification based updates
- Static topics, such as createTopic and deleteTopic, are used to create dynamic topics.
- Finite number of topics available
- SERVOGrid ontology defines all possible topics (predicates) available
- Publisher node of a topic stores the origin metadata (set of triples)
- Metadata provider is responsible for propagating updates
- Subscriber node of a topic listens for updates to the resources that are in the range of that topic.
36 Topic-Based Publish/Subscribe Systems
- Publish/subscribe systems are a way to distribute messages to many different listeners.
- Publishers and subscribers are associated with topics.
- We use the in-house developed NaradaBrokering system
- JMS, WS-Notification
- S. Pallickara
[Diagram: one Publisher and three Subscribers connected through the Broker Cloud]