Metadata and Information Services for an Earthquake Simulation Grid



1
Metadata and Information Services for an
Earthquake Simulation Grid
  • Mehmet Aktas, Marlon Pierce, and Geoffrey Fox
  • Community Grids Lab
  • Indiana University

2
SERVOGrid Background
  • Web services based grid for supporting earthquake
    simulation
  • Components
  • Databases: faults, GPS, seismic catalogs
  • Simulation codes: Monte Carlo, FEM, mesh
    generation tools
  • Web services
  • Data access, job management (workflow), file
    transfer, session management.
  • Portlet-based user interface
  • Information services

3
Information Service Complaints
  • I have never been happy with the various
    information services.
  • In my experience, data models have problems
  • Tree model forces arbitrary decisions about
    container organization
  • Overuse of <any> tags
  • Poorly maintained information
  • UDDI problem: registries are filled with obsolete
    information.
  • Information servers tend to be very centralized.
  • Peer approaches need to be examined

4
Semantic Information Services
  • After reviewing the Semantic Web specifications,
    I became interested in using them for information
    services.
  • Graph models seem to be a more natural way to
    extend interlinked information.
  • Using URIs, potentially easier to support
    fragmented data
  • Centralized data services can be used, but so
    can P2P approaches.
  • I am not so interested in artificial
    intelligence, reasoning, etc.
  • Interesting problems, but for someone else.
  • We see two different activities
  • Designing an RDFS/OWL ontology to act as
    SERVOGrid information services data models
  • Implementing the information services middleware.

5
An Ontology Overview
6
Sample Simulation Codes
  • Disloc: calculates surface stress displacements
    caused by a fault placed in an elastic
    half-space. Surface data can be either on a grid
    or at defined scattered points. Can also create
    InSAR-style surface displacements.
  • Simplex: inverts Disloc to estimate fault
    parameters from observed surface displacements.
    Surface displacements can be either on a grid or
    at defined points.
  • GeoFEST: builds a realistic model of stresses
    created by a fault. Uses the finite element
    method and realistic material properties.
  • AKIRA: converts a geometry (layers, faults)
    specification into a finite element mesh.
    Successive calls refine the mesh. Needed as a
    helper application for GeoFEST.
  • Virtual California: simulates interacting fault
    systems, based on realistic fault and fault
    friction models.

7
Visualization Codes
  • We associate simulation codes with zero or more
    visualization systems.
  • GMT (Generic Mapping Tools)
  • IDL
  • RIVA
  • Web Map Service (GIS)
  • In practice, we usually refer to scripts for
    specific tasks rather than the entire toolkit.

8
Sample Compute Resources
  • Grids: a Sun Ultra 60 with Disloc, Simplex, and
    VC installed.
  • Danube: a dual-processor Linux machine with
    GeoFEST, Akira, and GMT installed.
  • Jabba: an SGI 8-processor machine with RIVA
    installed.

9
Data Types and Formats
  • This is a mixture of data objects and
    representations. As always, the data itself is
    not represented but information like the creator
    of the data is.
  • Faults
  • GPS data
  • Seismicity
  • Surface stress data
  • InSAR data
  • Surface data representation: grid or point data

10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Managing Distributed Metadata
14
Managing Distributed Metadata
  • Small problem with the Semantic Web/Grid
  • How do you manage fragments of dynamic metadata?
  • (Assume a uniform data model)
  • In our case, we need a medium-sized distributed
    information system
  • Not the entire web, but dynamic enough to benefit
    from distributed information systems.
  • We want to strike a balance between response time
    efficiency and reliability.

15
Cache Nodes
  • Instances of the SERVO ontology are initially
    distributed over several cache nodes.
  • No one cache has all the instances.
  • Caches are accessed as peer-to-peer nodes.

(grids,hasCode,disloc)
(danube,hasCode,geoFEST)
(kamet,hasCode,Slider)
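
The fragmentation described on this slide can be illustrated with a minimal sketch. It assumes each cache node is just an in-memory set of (subject, predicate, object) triples. The names Triple, CacheNode, and match are hypothetical and are not taken from the SERVOGrid implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class CacheNode:
    """Hypothetical cache node holding a fragment of the ontology instances."""

    def __init__(self, name, triples=()):
        self.name = name
        self.triples = set(triples)

    def match(self, subject=None, predicate=None, obj=None):
        """Return local triples matching the pattern; None acts as a wildcard."""
        return {
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        }


# The three triples from the slide, one per cache node.
cache1 = CacheNode("cache1", [Triple("grids", "hasCode", "disloc")])
cache2 = CacheNode("cache2", [Triple("danube", "hasCode", "geoFEST")])
cache3 = CacheNode("cache3", [Triple("kamet", "hasCode", "Slider")])

# No single cache has all the instances: cache1 cannot answer this locally.
local_hits = cache1.match(predicate="hasCode", obj="geoFEST")
remote_hits = cache2.match(predicate="hasCode", obj="geoFEST")
```

Here local_hits is empty while remote_hits contains the danube triple, which is why a query must be able to reach other peers.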
16
Querying a Proxy Cache
  • Clients can connect to any of the cache nodes via
    a Web service connection.
  • Queries and responses are just SOAP requests.
  • If the Proxy cache cannot answer a query, it does
    a P2P search of all neighbors.
  • If/when the query is answered, the initiating
    proxy cache augments its RDF store with the new
    info from the peer.
  • It can henceforth answer that query without
    searching.
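
A rough sketch of this behavior follows, assuming in-memory sets in place of real RDF stores and direct method calls in place of SOAP requests. The ProxyCache class, its neighbors list, and the peer_searches counter are hypothetical names introduced only for illustration.

```python
class ProxyCache:
    """Hypothetical proxy cache: answers locally, else asks its neighbors."""

    def __init__(self, name, triples=()):
        self.name = name
        self.store = set(triples)   # stand-in for an RDF store
        self.neighbors = []         # other ProxyCache instances
        self.peer_searches = 0      # how many times a P2P search was needed

    def local_match(self, pattern):
        # A pattern is a (subject, predicate, object) tuple; None is a wildcard.
        return {
            t for t in self.store
            if all(p is None or p == v for p, v in zip(pattern, t))
        }

    def query(self, pattern):
        hits = self.local_match(pattern)
        if hits:
            return hits
        # Local miss: search all neighbors and remember what they return.
        self.peer_searches += 1
        for peer in self.neighbors:
            hits |= peer.local_match(pattern)
        self.store |= hits
        return hits


cache1 = ProxyCache("Proxy Cache 1", [("Grids", "hasCode", "Disloc")])
cache2 = ProxyCache("Proxy Cache 2", [("Danube", "hasCode", "GeoFEST")])
cache1.neighbors.append(cache2)

pattern = (None, "hasCode", "GeoFEST")
first = cache1.query(pattern)    # answered by the peer search
second = cache1.query(pattern)   # answered from the augmented local store
```

After the first call, cache1 holds the Danube triple itself, so the second call does not trigger another peer search.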

17
Client
(?,hasCode,GeoFEST)
Proxy Cache 1
(Grids,hasCode,Disloc)
SOAP Call
(Danube,hasCode,GeoFEST)
Peer Search
Proxy Cache 2
(Danube,hasCode,GeoFEST)
18
Notification Updates to Proxy Caches
  • Proxy caches acquire larger sets of metadata over
    time in response to client queries.
  • The problem now is that caches can get out of
    sync.
  • Disloc may be removed from Grids, so all caches
    have to be notified.
  • This is handled through a publish/subscribe
    system based on topics.
  • There is one topic for each property.
  • Caches subscribe to topics for each property.
  • Origin caches are allowed to publish changes.
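
The update path can be sketched as below, assuming a toy in-process broker with one topic per property. Broker, Cache, and publish_removal are hypothetical names; this is not the NaradaBrokering API, and the message format is an assumption.

```python
from collections import defaultdict


class Broker:
    """Toy topic-based broker: maps a topic name to subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


class Cache:
    """Hypothetical cache that subscribes to a topic for each property it holds."""

    def __init__(self, name, broker, triples=()):
        self.name = name
        self.broker = broker
        self.store = set(triples)
        for prop in {p for _, p, _ in self.store}:
            broker.subscribe(prop, self.on_update)

    def on_update(self, message):
        action, triple = message
        if action == "remove":
            self.store.discard(triple)

    def publish_removal(self, triple):
        # Assumed to be called only by the origin cache of the triple.
        self.broker.publish(triple[1], ("remove", triple))


broker = Broker()
triple = ("Grids", "hasCode", "Disloc")
origin = Cache("origin", broker, [triple])
proxy = Cache("proxy", broker, [triple])   # holds a cached copy

# Disloc is removed from Grids: the origin publishes on the hasCode topic.
origin.publish_removal(triple)
```

Both caches drop the triple because both subscribed to the hasCode topic when they were created.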

19
More Information
  • QuakeSim: http://www-aig.jpl.nasa.gov/public/dus/quakesim/
  • Semantic Web Work: http://grids.ucs.indiana.edu/maktas/servo/
  • NASA CT and AIST support the QuakeSim project,
    and NASA Ames supported the Semantic Grid
    investigations.

20
Querying Cache Space
Broker Cloud
Web Service
Client
Cache Space
21
The Picture
  • Each peer of the P2P network works as a Proxy
    Cache. A Proxy Cache forms a door between the
    client and the Cache Space.
  • Clients interact with peers through a Web Service
    interface.
  • When a client queries a peer where the cache is
    installed, this peer queries its own cache and
    then forwards the query to the rest of the Cache
    Space.
  • Forwarding simply means publishing the query to
    the available topics. With this method, the query
    is distributed stepwise to the nodes that are
    semantically connected to the origin Proxy Cache.
  • Each query message carries the unique identifier
    of the peer that originates the query, so that
    results can be propagated back to that peer.
  • Each cache repeats the querying and forwarding
    process until results are found.
  • When there are results for the query, they are
    propagated back as an RDF Model.
  • Distributed search stops when there are results
    satisfying the query or when no results are found
    within a configurable threshold on the number of
    exploration steps.

22
What About WS-<any>?
  • We are examining the feasibility of using RDF and
    related languages to describe our information
    requirements.
  • Build a testbed infrastructure for decentralized
    metadata management as a proof of concept.
  • There are many activities and specifications in
    this general area that we do not want to use in
    the proof-of-concept phase.
  • WS-Notification and WSRF obviously.

23
Edutella P2P Network Infrastructure
  • Edutella uses the JXTA framework for P2P
    functionality and provides services that
    complement the JXTA service layer.
  • Edutella uses RDF syntax for metadata. Each peer
    provides a Query Service to search its RDF
    repository.
  • There is no stepwise exploration of the peers. A
    JXTA peer sends a query only to its JXTA
    neighborhood regardless of the link structure of
    the metadata.
  • There is no caching of the metadata on the peers
    to decrease the search time. Each JXTA peer
    performs a query on its own data repository.

24
High level classification of classes in Servo
Grid Ontology
25
Classification of ServoComputePlatform
26
Classification of Servo Code Characteristics
27
Associated properties with ServoCode Class
28
Associated properties with ServoData Class
29
Associated properties with ServoComputePlatform
Class
31
Notification-Based Caching
  • We want to strike a balance between centralized
    and decentralized content management.
  • Metadata instances may be distributed over
    several hosts.
  • We are investigating caching based on
    breadth-first search.
  • Each node stores its own data and all of the
    immediate property values, one node deep.
  • This allows each node to maintain a moderate
    amount of information sufficient to satisfy
    immediate (one-hop) RDF queries.
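
One possible reading of "one node deep" is a breadth-first expansion from a resource that stops after a single hop. The sketch below assumes that reading. The function name one_hop_cache and the hasVisualization predicate are hypothetical and do not come from the SERVO ontology.

```python
from collections import deque


def one_hop_cache(triples, root, max_depth=1):
    """Collect triples reachable from root by breadth-first search.

    With max_depth=1 this keeps only the root's own properties and their
    immediate values, which is enough to answer one-hop queries about root.
    """
    cached = set()
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for s, p, o in triples:
            if s == node:
                cached.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, depth + 1))
    return cached


graph = [
    ("danube", "hasCode", "geoFEST"),
    ("danube", "hasCode", "akira"),
    ("geoFEST", "hasVisualization", "gmt"),  # hypothetical predicate
]

cached = one_hop_cache(graph, "danube")              # one node deep
deeper = one_hop_cache(graph, "danube", max_depth=2)
```

With the default depth, the node for danube keeps its two hasCode triples but not the triple about geoFEST, so the cache stays moderate in size.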

32
Distributed RDF Queries based on properties
  • Queries are formed as triples to find available
    metadata about a resource.
  • Results are metadata (set of triples) regarding
    the requested resource.
  • A query may be issued to any P2P node where the
    cache is installed.
  • Starting from the first cache, each cache is
    queried via stepwise exploration.
  • The first cache interacting with the client is
    the Proxy cache.
  • The Proxy cache distributes the client's query
    with its unique identifier to be able to receive
    the results.

33
Distributed Query Steps occur as follows
  • The first cache in the cache space is queried.
  • When there are no results, the query is published
    to available topics with the unique identifier of
    the first P2P node.
  • Each cache repeats the querying and forwarding
    process until results are found.
  • When there are results satisfying the query, they
    are propagated back as an RDF Model.
  • Distributed search stops when there are results
    satisfying the query or when no results are found
    within a configurable threshold on the number of
    exploration steps.
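
The steps above can be sketched as a bounded breadth-first search. This is a simplification under stated assumptions: forwarding through topics is modeled as a plain adjacency map (links), caches are in-memory sets, and the names distributed_query, stores, links, and max_steps are all hypothetical.

```python
def matches(pattern, triple):
    # None in the pattern acts as a wildcard.
    return all(p is None or p == v for p, v in zip(pattern, triple))


def distributed_query(origin, pattern, stores, links, max_steps=2):
    """Query the origin cache, then forward stepwise until results or the limit.

    stores maps a cache name to its set of triples; links maps a cache name
    to the caches it forwards to. Results are returned to the origin, which
    also adds them to its own store.
    """
    frontier = [origin]
    visited = {origin}
    for _ in range(max_steps + 1):          # step 0 is the origin itself
        results = set()
        for name in frontier:
            results |= {t for t in stores[name] if matches(pattern, t)}
        if results:
            stores[origin] |= results       # propagate back to the origin
            return results
        next_frontier = []
        for name in frontier:
            for peer in links.get(name, []):
                if peer not in visited:
                    visited.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return set()                            # threshold reached, nothing found


stores = {
    "A": {("grids", "hasCode", "disloc")},
    "B": {("kamet", "hasCode", "Slider")},
    "C": {("danube", "hasCode", "geoFEST")},
}
links = {"A": ["B"], "B": ["C"]}
pattern = (None, "hasCode", "geoFEST")

missed = distributed_query("A", pattern, stores, links, max_steps=1)
found = distributed_query("A", pattern, stores, links, max_steps=2)
```

With a threshold of one step the search stops at cache B and returns nothing. With two steps it reaches cache C and the result is copied into the store of cache A.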

34
Initialization of system with fragmented RDF
  • When a node is bootstrapped, a triple store is
    created out of available RDF Models at that node.
  • Predicates of available triples (where the object
    of the triple is a Resource) form topics.
  • Topics are created at the broker node dynamically
    by publishing a message to static topics such as
    createTopic.
  • A Resource metadata provider can be a publisher
    for the topics where the Resource is the Domain
    of that topic.
  • A node can be a subscriber to a topic when there
    are Resource Objects (in the cached triple store)
    that are in the Range of that topic.
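
A small sketch of how topics might be derived at bootstrap, assuming literals are marked with a wrapper class so they can be told apart from Resources. The names Literal, derive_topics, and the hasOperatingSystem predicate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object value that is a plain literal rather than a Resource."""
    value: str


def derive_topics(triples):
    """Map each predicate with a Resource object to its domain and range.

    Resources in the domain are candidates for publishing on the topic;
    Resources in the range are what subscribers listen for.
    """
    topics = {}
    for s, p, o in triples:
        if isinstance(o, Literal):
            continue  # only predicates pointing at Resources form topics
        entry = topics.setdefault(p, {"domain": set(), "range": set()})
        entry["domain"].add(s)
        entry["range"].add(o)
    return topics


store = [
    ("danube", "hasCode", "geoFEST"),
    ("grids", "hasCode", "disloc"),
    ("danube", "hasOperatingSystem", Literal("linux")),  # hypothetical predicate
]

topics = derive_topics(store)
```

Only hasCode becomes a topic here, because the other predicate points at a literal value.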

35
Notification based updates
  • Static topics, such as createTopic and
    deleteTopic, are used to create dynamic topics.
  • A finite number of topics is available
  • The SERVOGrid ontology defines all possible
    topics (predicates) available
  • The publisher node of a topic stores the origin
    metadata (set of triples)
  • The metadata provider is responsible for
    propagating updates
  • The subscriber node of a topic listens for
    updates for the resources that are in the range
    of that topic.

36
Topic-Based Publish/Subscribe Systems
  • Publish/subscribe systems are a way to distribute
    messages to many different listeners.
  • Publishers and subscribers are associated with
    topics.
  • We use the in-house developed NaradaBrokering
    system
  • JMS, WS-Notification
  • S. Pallickara

(Figure labels: Publisher, Broker Cloud, three Subscribers)