A Software Architecture for Highly Data-Intensive Systems

1
A Software Architecture for Highly Data-Intensive
Systems
  • Chris A. Mattmann
  • mattmann_at_usc.edu
  • USC Center for Software Engineering
  • Annual Research Review
  • March 2004

Special thanks to Dan Crichton, Steve Hughes, and
Sean Kelly for some of the slides!
2
Overview
  • Motivation
  • Problem Statement
  • OODT: A Software Architecture and Middleware for
    Data-Intensive Systems
  • Evaluation: Science Problems
  • Planetary Science
  • Cancer Research
  • Conclusion

3
Motivation
4
Problem Statement
  • Information Integration in Data-Intensive Systems
  • Needed to support data access, distribution,
    processing and retrieval across existing
    heterogeneous data sources
  • NASA's Planetary Data System
  • NCI's Early Detection Research Network
  • Software and Techniques exist to perform
    Information Integration
  • But...
  • No Software Re-use
  • No Design Methods to start from
  • No mapping of integration techniques to software
    components, interaction mechanisms, or
    arrangements of components
  • Lack of Re-use and software standards for
    information integration in data-intensive systems
    has forced systems to be built from scratch
  • Little or no interoperability with other software
    systems
  • Programmer almost always in the loop
  • New GDS proposal accompanies most new NASA
    mission proposals

5
Our Approach
  • A Software Architecture for Data-Intensive
    Systems
  • Data Architecture
  • Data Dictionary
  • Resource Profiles
  • Software Architecture
  • Components: Product Servers, Profile Servers,
    Query Servers
  • Connector: Messaging Layer
  • Configurations of Product/Profile/Query Servers
  • ...and a middleware implementation based on the
    software architecture
  • Middleware leverages existing distributed object
    middleware frameworks such as CORBA, RMI
  • We're currently working on a SOAP version
  • Built and maintained at the Jet Propulsion
    Laboratory
  • Yes, the Mars folks
  • Architecture + middleware = OODT (Object Oriented
    Data Technology)
  • Middleware being developed at JPL
  • Architecture being formalized at USC-CSE

6
Data Dictionary
  • Common Data Model containing:
  • Data Elements which the user is interested in
    querying for
  • Data Elements which the user would like to
    retrieve
  • Challenge
  • Integrate data sources linked in by exploiting
    the Data Dictionary structure
  • Map common data model to data source models
    across data-intensive system
  • Use a common data element structure
  • ISO-11179: Specification and Standardization of
    Data Elements
  • Handles the integration of data models across the
    system, but still need to integrate software
    interfaces
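
The mapping from a common data model to each source's own model can be sketched roughly as follows. This is only an illustration, not OODT's implementation: the `DataElement` fields, the source names (`imaging_node`, `atmospheres_node`) and the local field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One entry in the common data dictionary (fields are illustrative)."""
    name: str
    definition: str
    datatype: str


# Hypothetical common data model shared by every data source.
DATA_DICTIONARY = {
    e.name: e
    for e in (
        DataElement("TARGET_NAME", "Body that was observed", "string"),
        DataElement("MISSION_NAME", "Mission that produced the data", "string"),
    )
}

# Hypothetical mappings from common element names to each source's local fields.
SOURCE_MAPPINGS = {
    "imaging_node": {"TARGET_NAME": "tgt", "MISSION_NAME": "msn"},
    "atmospheres_node": {"TARGET_NAME": "target_body", "MISSION_NAME": "mission"},
}


def translate(source: str, common_query: dict) -> dict:
    """Rewrite a query keyed by common element names into one source's field names."""
    unknown = sorted(k for k in common_query if k not in DATA_DICTIONARY)
    if unknown:
        raise KeyError(f"elements not in the data dictionary: {unknown}")
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[name]: value for name, value in common_query.items()}


if __name__ == "__main__":
    query = {"TARGET_NAME": "MARS"}
    for source in SOURCE_MAPPINGS:
        print(source, translate(source, query))
```

The user only ever writes queries against the common element names; the per-source mapping tables carry the differences between data models.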

7
Resource Profiles
  • Provide mechanisms for describing data systems,
    data products, etc., including:
  • Common data attributes using Dublin Core (e.g.,
    Title, Author, Subject) data elements to describe
    electronic resources
  • Mechanisms for describing where the data is
    located and how to access it
  • Domain data elements that are useful for
    describing the product (e.g., TARGET_NAME,
    MISSION_NAME, INSTRUMENT_NAME)
  • Enable search and retrieval of distributed
    data products
  • A search against a Profile Server yields
    information regarding the characteristics of
    distributed resources (e.g., descriptive
    information about the product, access information)
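
As a rough sketch of the three kinds of information a resource profile carries (Dublin Core style attributes, location, and domain elements), consider the following. The class and field names, the `urn:example:...` locations and the sample values are assumptions made for illustration; they are not OODT's actual profile schema.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProfile:
    """Illustrative resource profile; field names are assumptions, not OODT's schema."""
    # Dublin Core style attributes common to all resources
    title: str
    author: str
    subject: str
    # Where the resource lives and how to reach it
    location: str
    # Domain-specific data elements
    domain: dict = field(default_factory=dict)


class ProfileIndex:
    """Stands in for a Profile Server: stores profiles and answers searches."""

    def __init__(self):
        self._profiles = []

    def add(self, profile: ResourceProfile) -> None:
        self._profiles.append(profile)

    def search(self, **criteria) -> list:
        """Return profiles whose domain elements equal every given criterion."""
        return [
            p for p in self._profiles
            if all(p.domain.get(k) == v for k, v in criteria.items())
        ]


index = ProfileIndex()
index.add(ResourceProfile("Orbiter images", "Imaging team", "imaging",
                          "urn:example:imaging",
                          {"TARGET_NAME": "MARS", "MISSION_NAME": "MISSION_A"}))
index.add(ResourceProfile("Ring occultations", "Rings team", "rings",
                          "urn:example:rings",
                          {"TARGET_NAME": "SATURN", "MISSION_NAME": "MISSION_B"}))

hits = index.search(TARGET_NAME="MARS")
print([(p.title, p.location) for p in hits])
```

A search returns descriptions of resources, including where to find them, rather than the data products themselves.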

8
Resource Profiles Example
  • country = US and windspeed > 120

<profile>
  <resAttributes>
    <resLocation>urn:eda:rmi:Western</resLocation>
  </resAttributes>
  <profileElement>
    <elemName>country</elemName>
    <elemValue>US</elemValue>
  </profileElement>
  <profileElement>
    <elemName>state</elemName>
    <elemValue>WA</elemValue>
    <elemValue>CA</elemValue>
  </profileElement>
  <profileElement>
    <elemName>windspeed</elemName>
    <elemMinValue>3</elemMinValue>
    <elemMaxValue>146</elemMaxValue>
  </profileElement>
</profile>
<profile>
  <resAttributes>
    <resLocation>urn:eda:rmi:Southern</resLocation>
  </resAttributes>
  <profileElement>
    <elemName>country</elemName>
    <elemValue>US</elemValue>
  </profileElement>
  <profileElement>
    <elemName>state</elemName>
    <elemValue>LA</elemValue>
    <elemValue>TX</elemValue>
  </profileElement>
  <profileElement>
    <elemName>windspeed</elemName>
    <elemMinValue>1</elemMinValue>
    <elemMaxValue>89</elemMaxValue>
  </profileElement>
</profile>
Matches! Only the Western profile can satisfy the query: its windspeed range (3 to 146) reaches above 120, while the Southern range (1 to 89) does not.
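
The matching step on this slide can be sketched as below. The sketch assumes the query reads "country = US and windspeed > 120" (the operators are an assumed reading of the slide text), represents each profile as a plain dictionary keyed by a short label, and uses helper names (`may_satisfy`, `matching`) that are hypothetical rather than part of OODT.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# The two profiles from the slide, reduced to plain dictionaries.
PROFILES = {
    "Western": {
        "country": {"values": ["US"]},
        "state": {"values": ["WA", "CA"]},
        "windspeed": {"min": 3, "max": 146},
    },
    "Southern": {
        "country": {"values": ["US"]},
        "state": {"values": ["LA", "TX"]},
        "windspeed": {"min": 1, "max": 89},
    },
}


def may_satisfy(element: dict, op: str, value) -> bool:
    """Could any value described by this profile element satisfy `op value`?"""
    if "values" in element:
        return any(OPS[op](v, value) for v in element["values"])
    low, high = element["min"], element["max"]
    if op == "=":
        return low <= value <= high
    if op == ">":
        return high > value
    if op == "<":
        return low < value
    raise ValueError(f"unsupported operator: {op}")


def matching(profiles: dict, predicates: list) -> list:
    """Return labels of profiles that could satisfy every predicate (logical AND)."""
    return [
        label
        for label, elements in profiles.items()
        if all(
            name in elements and may_satisfy(elements[name], op, value)
            for name, op, value in predicates
        )
    ]


query = [("country", "=", "US"), ("windspeed", ">", 120)]
print(matching(PROFILES, query))  # prints ['Western']
```

A profile match only says that a resource might hold qualifying data; the resource itself still has to be queried to find out.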
9
Components
  • Product Server
  • Responsible for abstracting heterogeneous data
    source interfaces
  • Attach a Product Server to each data source that
    is integrated
  • Provides a common query interface across
    heterogeneous data sources
  • Profile Server
  • Describe data resources using resource profiles
  • Allow data resources to be discovered and located
    at query-time
  • Query Server
  • Tie it all together
  • Uses Profile Servers to discover data resources
    which could potentially satisfy a query
  • Queries discovered data resources (such as
    Product Servers) and collects obtained data
    products to return to the user
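
How the three component types might cooperate can be sketched as follows. This is a toy, in-process illustration under stated assumptions: the class interfaces, the `urn:example:...` names and the sample records are hypothetical, and the real middleware distributes these components over the network.

```python
class ProductServer:
    """Wraps one data source behind a common query interface."""

    def __init__(self, records):
        self._records = records  # stands in for a heterogeneous back end

    def query(self, keyword, value):
        return [r for r in self._records if r.get(keyword) == value]


class ProfileServer:
    """Describes which resources hold which keyword values."""

    def __init__(self):
        self._profiles = {}  # resource URN -> {keyword: set of described values}

    def describe(self, urn, described):
        self._profiles[urn] = described

    def discover(self, keyword, value):
        return [urn for urn, d in self._profiles.items()
                if value in d.get(keyword, ())]


class QueryServer:
    """Discovers candidate resources, queries them, and collects the results."""

    def __init__(self, profile_servers, product_servers):
        self._profile_servers = profile_servers
        self._product_servers = product_servers  # URN -> ProductServer

    def query(self, keyword, value):
        results = []
        for profile_server in self._profile_servers:
            for urn in profile_server.discover(keyword, value):
                results.extend(self._product_servers[urn].query(keyword, value))
        return results


products = {
    "urn:example:west": ProductServer([{"state": "WA", "id": 1},
                                       {"state": "CA", "id": 2}]),
    "urn:example:south": ProductServer([{"state": "TX", "id": 3}]),
}
profiles = ProfileServer()
profiles.describe("urn:example:west", {"state": {"WA", "CA"}})
profiles.describe("urn:example:south", {"state": {"LA", "TX"}})

query_server = QueryServer([profiles], products)
print(query_server.query("state", "TX"))  # prints [{'state': 'TX', 'id': 3}]
```

Only the resources whose profiles could satisfy the query are contacted, which is the point of consulting the Profile Server first.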

10
Connectors
  • Messaging Layer
  • Each OODT component registers itself with a
    Component Registry
  • Allows Components to define and provide services
  • Components defined by unique URNs
  • Transfers OODT Query Object containing:
  • OODT Style Query
  • (Keyword Value) predicates joined by logical
    operators (AND, OR, etc.)
  • The result list to be populated
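
A minimal sketch of the messaging idea follows: components register under unique URNs, and a query object carrying predicates plus an initially empty result list is handed to them. The `Query` and `ComponentRegistry` classes, the equality-only predicates and the `urn:example:...` name are assumptions for illustration, not the OODT API.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Query object: (keyword, value) predicates, a joining operator, and results."""
    predicates: list
    operator: str = "AND"
    results: list = field(default_factory=list)

    def __post_init__(self):
        if self.operator not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {self.operator}")


class ComponentRegistry:
    """Maps unique URNs to the components that provide a service."""

    def __init__(self):
        self._components = {}

    def register(self, urn, handler):
        if urn in self._components:
            raise ValueError(f"URN already registered: {urn}")
        self._components[urn] = handler

    def send(self, urn, query):
        """Deliver the query to the named component, which fills in query.results."""
        self._components[urn](query)
        return query


def make_product_handler(records):
    """Build a handler that appends matching records to the query's result list."""
    def handle(query):
        combine = all if query.operator == "AND" else any
        for record in records:
            if combine(record.get(k) == v for k, v in query.predicates):
                query.results.append(record)
    return handle


registry = ComponentRegistry()
registry.register("urn:example:products", make_product_handler([
    {"country": "US", "state": "WA"},
    {"country": "US", "state": "TX"},
    {"country": "CA", "state": "BC"},
]))

answered = registry.send("urn:example:products",
                         Query([("country", "US"), ("state", "WA")], "AND"))
print(answered.results)  # prints [{'country': 'US', 'state': 'WA'}]
```

Because the same object carries both the request and the results, any registered component can answer it without the caller knowing what kind of component it is.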

11
Configurations Example
12
Configurations Example (2)
13
Configurations Example (3)
14
Planetary Science
  • Planetary Data System
  • Official NASA Active Archive for all Planetary
    Data
  • Data ingestion required as part of Announcement
    of Opportunity (AO) for a mission
  • 9 Nodes with data located at discipline sites
  • Common Data Architecture
  • Different data systems located at the sites
  • Prior to October 2002, no ability to find and
    share data between PDS nodes
  • Data distribution via CD-ROM
  • Limited electronic distribution

15
OODT PDS Deployment
16
Early Detection Research Network
  • OODT's success has led to interagency agreements
    with both NIH and NCI
  • OODT has provided the NCI with a bioinformatics
    infrastructure for sharing data across the nation
  • Currently deployed at 10 of 31 NCI Research
    Institutions for the Early Detection Research
    Network (EDRN)
  • Providing real-time access to distributed,
    heterogeneous databases
  • Created a national virtual repository for
    biospecimens (now an NCI Director Initiative)
  • Now integrating new datasets: validation studies,
    images, biomarkers, etc.
  • Meets Federal security regulations
  • Operational September 2002
  • Same core software framework as deployed in
    planetary, earth and engineering

17
OODT EDRN Deployment
18
Conclusion
  • OODT is...
  • A novel software architecture to describe
    data-intensive systems
  • Integration, search, retrieval and discovery of
    heterogeneous data stored in heterogeneous domain
    data sources
  • A reference implementation of the above software
    architecture
  • Java-based middleware
  • C. Perl, Python, PHP Client APIs
  • A process for annotating and creating standard
    metadata models to describe heterogeneous data
    based on data standards
  • Dublin Core
  • ISO-11179

19
Refereed Papers
  • Mattmann C, Ramirez P, Crichton D, Hughes J S.
    Packaging Data Products using Data Grid
    Middleware for Deep Space Mission Systems.
    Accepted for publication at the 8th International
    Conference on Space Operations, Montreal, Canada,
    2004.
  • Mattmann C, Freeborn D, Crichton D. Towards a
    Distributed Information Architecture for Avionics
    Data. In Proceedings of the 2nd International
    IADIS Conference on the World-Wide-Web and
    Internet, Volume II, pp. 829-832. Algarve,
    Portugal, 2003.
  • Crichton D, Hughes J S, Kelly S. A Science
    Data System Architecture for Information
    Retrieval. In Clustering and Information
    Retrieval. Kluwer Academic Publishers, December
    2003. (Book chapter on OODT)
  • Crichton D, Hughes J S, Kelly S, Ramirez P.
    A Component Framework Supporting Peer Services
    for Space Data Management. 2002 IEEE Aerospace
    Conference, Big Sky, Montana, March 2002.
  • Crichton D, Downing G, Hughes J S, Kincaid H,
    Srivastava S. An Interoperable Data
    Architecture for Data Exchange in a Biomedical
    Research Network. 14th IEEE Symposium on
    Computer-Based Medical Systems, July 2001.
  • Crichton D, Hughes J S, Hardman S, Kelly S. A
    Distributed Component Framework for Data Product
    Interoperability. 17th CODATA International
    Conference, Baveno, Italy, October 2000.
  • Crichton D, Hughes J S, Kelly S, Hyon J.
    Science Search and Retrieval using XML. Second
    National Conference on Scientific and Technical
    Data, National Academy of Sciences, Washington
    D.C., March 2000.

20
Questions?
  • Contacts
  • OODT Website: http://oodt.jpl.nasa.gov
  • Principal Investigator
  • Dan Crichton (Dan.Crichton_at_jpl.nasa.gov)
  • Co-Investigator
  • Steve Hughes (Steve.Hughes_at_jpl.nasa.gov)
  • Programmer/Research Grunt
  • Me (chris.mattmann_at_jpl.nasa.gov)
  • Thanks for your attention!

21
Backup Slides
22
Resource Profiles Example
  • country = US and windspeed > 120

<profile>
  <resAttributes>
    <resLocation>urn:eda:rmi:Western</resLocation>
  </resAttributes>
  <profileElement>
    <elemName>country</elemName>
    <elemValue>US</elemValue>
  </profileElement>
  <profileElement>
    <elemName>state</elemName>
    <elemValue>WA</elemValue>
    <elemValue>CA</elemValue>
  </profileElement>
  <profileElement>
    <elemName>windspeed</elemName>
    <elemMinValue>3</elemMinValue>
    <elemMaxValue>146</elemMaxValue>
  </profileElement>
</profile>
<profile>
  <resAttributes>
    <resLocation>urn:eda:rmi:Southern</resLocation>
  </resAttributes>
  <profileElement>
    <elemName>country</elemName>
    <elemValue>US</elemValue>
  </profileElement>
  <profileElement>
    <elemName>state</elemName>
    <elemValue>LA</elemValue>
    <elemValue>TX</elemValue>
  </profileElement>
  <profileElement>
    <elemName>windspeed</elemName>
    <elemMinValue>1</elemMinValue>
    <elemMaxValue>89</elemMaxValue>
  </profileElement>
</profile>
Matches! Only the Western profile can satisfy the query: its windspeed range (3 to 146) reaches above 120, while the Southern range (1 to 89) does not.
23
Object Oriented Data Technology
  • Object-Oriented Data Technology (OODT)
  • Funded in 1998 by NASA's Office of Space Science
    to develop a national software framework for
    sharing data across heterogeneous, distributed
    data repositories
  • Develop:
  • A common data and software framework to enable
    data sharing across multiple science and
    engineering disciplines
  • A reusable software architecture across data
    management projects
  • Reusable software components with common
    interfaces
  • Interfaces to enable new components to be plugged
    in
  • Mechanism to wrap legacy data system components
    with minimal impact
  • OODT should provide...
  • Science domain independence (use in engineering,
    science and biomedicine)
  • Data location independence (describe what you
    want, not how/where to get it)