Title: Interoperability and Data Architecture for Metadata Development
1Interoperability and Data Architecture for Metadata Development
Biomarkers Knowledge System Meeting
Bethesda, MD
September 8, 2000
Dan Crichton, Manager, Enterprise Data Architecture Task; Principal Investigator, Object Oriented Data Technology Task
Steve Hughes, Lead System Engineer, Planetary Data System; Co-Investigator, Object Oriented Data Technology Task
Thuy Tran, Senior Software Engineer, Enterprise Data Architecture and Object Oriented Data Technology Tasks
Jet Propulsion Laboratory, California Institute of Technology
National Aeronautics and Space Administration
2Outline
D. Crichton S. Hughes T. Tran
- Describe JPL and enterprise computing
- Define an Enterprise Data Architecture
- Define why an EDA is Critical to NASA
- Address Data Interoperability Challenges
- Describe the Object Oriented Data Technology task
- Case Study: The Planetary Data System
- Provide an Overview of PDS and Objectives
- Describe the PDS Organizational Structure
- What has the PDS accomplished?
- Discuss the role of Metadata within the PDS
- Demo search of PDS data sets and PTI data sets
3About JPL and Enterprise Computing
- JPL is a federally funded research and development center (FFRDC) run by Caltech for NASA
- JPL is NASA's lead center for robotic exploration of the universe
- JPL has an enormous amount of data to manage, ranging from scientific to engineering to institutional data
- We represent several efforts on both the research and enterprise sides of JPL that are addressing enterprise architectures for integrating data at both JPL and NASA. Such efforts include
- Knowledge Management
- Enterprise Data Architecture Task
- Planetary Data System
- Object Oriented Data Technology
4What is an Enterprise Data Architecture?
- An enterprise data architecture provides the infrastructure necessary to enable development of interoperable, enterprise-wide applications
- Data Interoperability
- Data Sharing
- Data Access
- Facilitate access
- Reduce complexity
- Data Management
- Data Archiving
- Basic infrastructure to support knowledge
discovery
5Why is an EDA Critical to NASA?
- Interoperability is an important key to unlocking knowledge discovery
- Gives scientists the ability to locate critical information
- Enables knowledge management across NASA
- A key to scientific discovery
- State of data systems at NASA agency-wide
- Difficult to access (no standard interfaces)
- Geographically distributed
- Have no standard language or protocol for interchange (no EDI) agency-wide
- No common metadata language agency-wide
- Have no system for registration of data products
- Have little or no interoperability
- Have few common terms for describing data
6Interoperability Challenges and Needs
- Space scientists cannot easily locate or use data across the hundreds, if not thousands, of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.
- Heterogeneous Systems
- Data Management - RDBMS, ODBMS, Home Grown DBMS, Binary Files
- Platforms - UNIX, Linux, WIN3.x/9x/NT, Mac, VMS
- Interfaces - Web, Windows, Command Line
- Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICR, ASCII, ...
- Data Volume - Kilobytes to Terabytes
- Heterogeneous Disciplines
- Moving targets and stationary targets
- Multiple coordinate systems
- Multiple data object types (images, cubes, time series, binary, document)
- Multiple interpretations of single object types
- Multiple software solutions to the same problem
- Incompatible and/or missing metadata
7Solution: Build a Data Architecture
- Solution: build a data architecture by initially focusing on
- Metadata management
- A middleware framework for interoperability
8Focus on Metadata
- Metadata is data about data
- Provides descriptive information about the data
- Classification, identification, etc.
- Metadata Example
- Data Value: 55 (not descriptive)
- Metadata Values
- Data Element Name: Vehicle_Speed
- Unit: Miles per Hour
- Description: The average velocity of a vehicle.
- Build metadata repositories that manage information about distributed data products (e.g. location, target, observation date, etc.)
- Use standards where appropriate
- ISO/IEC 11179: A framework for the Specification and Standardization of Data Elements
- Dublin Core: A metadata element set intended to facilitate discovery of electronic resources.
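The Vehicle_Speed example above can be sketched in Python as a small record that pairs a bare value with the metadata that makes it meaningful. The class name, field names, and `describe` helper are illustrative assumptions; they are not taken from ISO/IEC 11179, Dublin Core, or any JPL system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Descriptive metadata for one kind of value (hypothetical structure)."""
    name: str
    unit: str
    description: str


def describe(value, element: DataElement) -> str:
    """Render a bare value together with the metadata that explains it."""
    return f"{element.name} = {value} {element.unit} ({element.description})"


# The metadata values from the slide's example.
vehicle_speed = DataElement(
    name="Vehicle_Speed",
    unit="Miles per Hour",
    description="The average velocity of a vehicle.",
)

# On its own, 55 says nothing; with the element attached it is interpretable.
print(describe(55, vehicle_speed))
```

Keeping the element definition separate from the value means one definition can describe every value of that kind, which is the role a metadata repository would play.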
9Focus on Middleware
- Middleware defined as
- "In the computer industry, middleware is a general term for any programming that serves to glue together or mediate between two separate and usually already existing programs. A common application of middleware is to allow programs written for access to a particular database to access other databases."
- "Messaging is a common service provided by middleware programs so that different applications can communicate. The systematic tying together of disparate applications is known as enterprise application integration."
- http://www.whatis.com
- Middleware allows for the encapsulation of individual data systems
- Hides uniqueness by introducing the data architecture layer
- Ties distributed applications together and often works with an Electronic Data Interchange (EDI) type mechanism
- Enables reuse and promotes standards
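The encapsulation idea above can be illustrated with a minimal Python sketch in which two differently shaped data systems are wrapped behind one common interface. All class names, the `find` method, and the sample records are hypothetical; this is one way to picture the idea, not the OODT design.

```python
from typing import Dict, List, Protocol, Tuple


class DataSystem(Protocol):
    """Common interface that the middleware layer presents to applications."""

    def find(self, target: str) -> List[str]: ...


class FileCatalogAdapter:
    """Wraps a hypothetical file-based system keyed by upper-case target name."""

    def __init__(self, files_by_target: Dict[str, List[str]]):
        self._files = files_by_target

    def find(self, target: str) -> List[str]:
        return list(self._files.get(target.upper(), []))


class TableAdapter:
    """Wraps a hypothetical relational-style system held as (id, target) rows."""

    def __init__(self, rows: List[Tuple[str, str]]):
        self._rows = rows

    def find(self, target: str) -> List[str]:
        return [pid for pid, tgt in self._rows if tgt.lower() == target.lower()]


class Middleware:
    """Fans one request out to every wrapped system and merges the answers."""

    def __init__(self, systems: List[DataSystem]):
        self._systems = list(systems)

    def find(self, target: str) -> List[str]:
        results: List[str] = []
        for system in self._systems:
            results.extend(system.find(target))
        return results


middleware = Middleware([
    FileCatalogAdapter({"MARS": ["mars_001.img"]}),
    TableAdapter([("P-17", "mars"), ("P-18", "venus")]),
])
print(middleware.find("Mars"))
```

The caller never learns that one system stores upper-case keys and the other stores rows; each adapter hides that uniqueness.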
10Role of Middleware
[Figure: Applications, User Interface, and Data connected through Middleware]
Middleware can tie applications, data, and user interfaces together and hide their unique interfaces
11NIST I.T. Architecture for Federal Govt
Redrawn from Federal Enterprise Architecture Framework version 1.1, September 1999, Chief Information Officers Council
[Figure: stack of architecture layers, each reached through the relationship shown]
Drives: Information Architecture
Prescribes: Information Systems Architecture
Identifies: Enterprise Data Architecture
Supported by: Delivery Systems Architecture
12Object Oriented Data Technology Task
- Research task funded by the Office of Space Science (OSS) at NASA
- Provides a framework for managing data access and interoperability
- Archive Service: For managing data sets
- Profile Service: For managing metadata profiles about data systems, data sets, and data products
- Product Service: To tie individual data systems into a larger enterprise data system
- Presented a paper at CODATA in March 2000 called "Science Search and Retrieval using XML"
13OODT Focus
- Focus on building middleware components for an enterprise data architecture
- Focus on building profiles for managing metadata information about cross-disciplinary resources
- Provide sufficient layers of abstraction in the architecture to isolate technology choices from the architecture choices
- XML for the data content
- CORBA for the data transport
- Research technologies for implementing a distributed data architecture
- Distributed Object Computing (CORBA, DCOM, etc.)
- Database Technology (RDBMS, ODBMS)
- Data Access Technologies (O/JDBC, STEP, XML, etc.)
- Directory Implementations (LDAP)
- Data Interchange (XML)
- Communication Technologies (Web/HTTP, MOM, RPC, etc.)
14OODT Pilot Activity
- Partner with the Planetary Data System (PDS) to address interoperability across 10 PDS silos
- Build a generic XML Document Type Definition (DTD) that will support the PDS data dictionary and metadata infrastructure
- Demonstrate how a science query can return data across the PDS nodes
- Demonstrate how the same interface can return information from both planetary and astrophysics data systems
15OODT Metadata Development
- Metadata Registry: Develop a data management system for managing the semantics of data that is shared within and between domains.
- Terminology Base: Domain-specific name space.
- Data Dictionary: Inventory of domain terms with definitions and other distinguishing attributes.
- Ontology: A set of concepts, their relationships, and constraints, all within the scope of a domain.
- XML for metadata registry and communication
- The PDS experience with the Planetary Science Data Dictionary has shown the criticality of metadata in enabling data sharing and system interoperability.
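A data dictionary of this kind can be pictured as a lookup table that a label is checked against before it is shared. The Python sketch below assumes a toy dictionary: the term names TARGET_NAME and ARCHIVE_STATUS and the values MARS and ARCHIVED appear elsewhere in this deck, while the definitions, the other permitted values, and the `validate` function are invented for illustration and do not reproduce the Planetary Science Data Dictionary.

```python
from typing import Dict, List

# Toy data dictionary: each domain term has a definition and permitted values.
# Definitions and most values here are assumptions made for the example.
DATA_DICTIONARY = {
    "TARGET_NAME": {
        "definition": "Name of the body that was observed.",
        "values": {"MARS", "VENUS", "JUPITER"},
    },
    "ARCHIVE_STATUS": {
        "definition": "Where a data set stands in the archiving process.",
        "values": {"ARCHIVED", "IN PEER REVIEW"},
    },
}


def validate(label: Dict[str, str]) -> List[str]:
    """Return a list of problems found when checking a label against the dictionary."""
    problems = []
    for term, value in label.items():
        entry = DATA_DICTIONARY.get(term)
        if entry is None:
            problems.append(f"{term}: not in the data dictionary")
        elif value not in entry["values"]:
            problems.append(f"{term}: value {value!r} is not permitted")
    return problems


print(validate({"TARGET_NAME": "MARS", "ARCHIVE_STATUS": "ARCHIVED"}))
print(validate({"TARGET_NAME": "MARZ", "SPEED": "55"}))
```

Two systems that both validate against the same dictionary can exchange labels without negotiating vocabulary, which is the sense in which shared metadata enables interoperability.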
16What is the PDS?
- PDS is the official planetary science data archive for the NASA Office of Space Science (OSS) Solar System Exploration (SSE).
- PDS is chartered to ensure that SSE planetary data are archived and available to the scientific community.
- PDS is a distributed system designed to optimize scientific oversight in the archiving process.
- The PDS has been in existence for 10 years.
17Objectives of the PDS
- Publish and disseminate documented data sets for use in scientific analysis
- Work with projects to help design, generate, and validate data products for placement in the archive
- Develop and maintain archive data standards to ensure future usability.
- Provide expert scientific help to the user community.
18What is meant by a documented Data Set?
- The goal of the PDS archiving system is for each data set to be autonomous, i.e., all information required to understand and interpret the data should be included in the archive.
- To that end, an archive package includes
- Raw data
- Data calibrated to physical units
- Calibration data and algorithms
- Ancillary data, e.g. observation geometry
- Higher level data products (maps, projections, other aggregations)
- Documentation for the data, instrument, flight project, etc. (metadata)
19What is the structure of the PDS?
- PDS is a distributed system designed to optimize scientific oversight in the archiving process
- The PDS is managed by discipline scientists working with the project manager
- PDS Science Discipline Nodes
- Archival of data and supporting documentation
- Expertise in researching and interpreting the data
- Expertise in the planning and design of future observations and data sets
- Distribution of data to the community
- PDS Central Node
- Program management
- Project engineering
- Standards development
20PDS Nodes and Institutions (Silos)
Geosciences/Washington University
Rings/Ames
Radio Science/Stanford
Small Bodies/UMD
Planetary Plasma/UCLA
Imaging/JPL
Central Node/JPL
Imaging/USGS
Atmospheres/New Mexico State
NAIF/JPL
21What has the PDS Accomplished?
- Produced a high-quality, peer-reviewed archive of Solar System Exploration data
- Stored for long-term viability
- Described by metadata
- Distributed either online or on CD media
- Developed a robust standards architecture
- Planetary Science Data Dictionary - Provides the domain of discourse for the planetary science community.
- Planetary Community Model - Provides formalized descriptions of the entities and their relationships within the planetary science community.
- Developed science-driven management structure
- Responsive to changing mission project environment through distributed, science-discipline-oriented nodes.
22The Use of Metadata in the PDS
Locate and Use Data
- Use context to find data
- Use context to understand data
System Interoperability
- Use context to share data
Correlative Science
- Use context to find new relationships between data
[Figure: Planetary Science Model relating Mission, Target, Data Set Collection, Spacecraft, Data Set, Instrument, Spectrum, Time Series, Image, Document, Model Attributes, Label, and Data]
23What is OODT doing for PDS?
- Problem Statement - In spite of the Web and a common standards architecture, the PDS continues to be a collection of heterogeneous data systems with little resource sharing.
- Solution
- Prototype a PDS profile service that will manage metadata profiles for data sets, data products, and data systems.
- Prototype PDS product servers to integrate individual data systems.
- Promote the use of archive services by mission projects for more efficient production of data products.
24OODT Demonstration
- Search for PDS data sets
- Search for PDS images (granules)
- Search for Astrophysics data by star
- Searches the Palomar Testbed Interferometer (PTI)
archive
25OODT Query Flow
[Figure: OODT query flow]
Components: Search Web Page; Web server (search.jsp); Query Client; Query Server; Profile Server (jpl) with Profile DB; Product Server (jpl.pti) with PTI Repository; Product Server (jpl.pds) with PDS DVD Jukebox; Product Server (jpl.pds.mola) with PDS MOLA Oracle DB
Messages: User query; XMLQuery (no results); XMLQuery (profiles of resources to handle query); XMLQuery (product search); XMLQuery (data results); XMLQuery (profiles or data results as requested); XSL (profiles or data products formatted)
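The two-stage flow in this figure, where a profile lookup first identifies which product servers can handle a query and only those servers are then searched, can be sketched in Python as follows. The server names jpl.pds, jpl.pds.mola, and jpl.pti come from the figure; the profile contents, the STAR_NAME element, the function names, and the made-up product identifiers are assumptions, and the real system exchanges XML over CORBA rather than calling local functions.

```python
from typing import Callable, Dict, List

# Hypothetical profiles: which element values each product server can answer for.
PROFILES = [
    {"server": "jpl.pds", "elements": {"TARGET_NAME": {"MARS", "VENUS"}}},
    {"server": "jpl.pds.mola", "elements": {"TARGET_NAME": {"MARS"}}},
    {"server": "jpl.pti", "elements": {"STAR_NAME": {"VEGA"}}},
]


def profile_search(element: str, value: str) -> List[str]:
    """Profile server step: name the product servers able to handle the query."""
    return [
        profile["server"]
        for profile in PROFILES
        if value in profile["elements"].get(element, set())
    ]


def make_product_server(name: str) -> Callable[[str, str], List[str]]:
    """Build a stand-in product server that returns a made-up product id."""
    def search(element: str, value: str) -> List[str]:
        return [f"{name}:{element}={value}"]
    return search


PRODUCT_SERVERS: Dict[str, Callable[[str, str], List[str]]] = {
    profile["server"]: make_product_server(profile["server"]) for profile in PROFILES
}


def run_query(element: str, value: str) -> Dict[str, List[str]]:
    """Query server step: consult profiles, then ask only the matching servers."""
    results: Dict[str, List[str]] = {}
    for server in profile_search(element, value):
        results[server] = PRODUCT_SERVERS[server](element, value)
    return results


print(run_query("TARGET_NAME", "MARS"))
print(run_query("STAR_NAME", "VEGA"))
```

Routing through profiles first means a planetary query never reaches the astrophysics repository and vice versa, while the caller still uses a single interface.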
26More Information
- "Science Search and Retrieval using XML" by OODT Team. Presented at Second National Conference on Scientific and Technical Data, National Academy of Sciences, Washington D.C.
- http://oodt.jpl.nasa.gov/doc/papers/codata/paper.pdf
- Planetary Data System
- http://pds.jpl.nasa.gov
- Dublin Core
- http://purl.oclc.org/dc
- Extensible Markup Language
- http://www.w3c.org/XML
- ISO/IEC 11179 Specification and Standardization of Data Elements
- Object Management Group (CORBA and UML standards)
- http://www.omg.org
- Federal CIO Statement on Metadata
- http://www.cio.gov/docs/metadata.htm
- National Information Standards Organization Z39.50 Information Retrieval Protocol
- http://www.niso.org/z3950.html
27Backup Slides
28JPL Org Chart (partial)
Caltech President
JPL Office of the Director
Institutional Computing/ Chief Information
Officer
Engineering and Science Directorate
Science and Earth Science Programs
Enterprise Infrastructure Office
Enterprise Applications Office
Object Oriented Data Technology
Planetary Data System
Science Data Management and Archiving
29Org Chart - Responsibility Flow
Program Offices
Implementation Organizations
Enterprise Applications Office
Deliverables
Science Data Management and Archiving
Funding, Programmatic Oversight
Deliverables
Planetary Data System
Deliverables
Object Oriented Data Technology
Task Management Design and Implementation
Responsibility
30Institutional Enterprise Data Architecture Breakdown
- Paradigm shift from stovepipe implementations to horizontal solutions that cross organizational boundaries
- Include such services as
- Enterprise Application Standards
- Object Services
- Data Infrastructure Services
- Database Hosting
- Metadata Management
- Data Interchange
- Information Architecture Services
- Institutional directory and security access
- Data system APIs for access
- Data mining and data warehousing
- Data Management Services
- Data archiving
31JPL Enterprise Architecture (Logical View)
32What is a profile?
- A profile is a set of resource definitions implemented in XML for data products residing in one or more distributed systems
- Profile servers are CORBA servers that manage XML profile definitions
- Profile servers communicate via XML-over-CORBA
- Developed Java classes that map XML profiles to a Java object
Profile Distributed Node Architecture
33Profile Server Architecture
34Profile DTD
<!ELEMENT profiles (profile)>
<!ELEMENT profile (profAttributes, resAttributes, profElement)>
<!ELEMENT profAttributes (profId, profVersion, profTitle, profDesc,
    profType, profStatusId, profSecurityType, profParentId, profChildId,
    profRegAuthority, profRevisionNote, profDataDictId)>
<!ELEMENT resAttributes (Identifier, Title, Format, Description,
    Creator, Subject, Publisher, Contributor, Date, Type, Source,
    Language, Relation, Coverage, Rights, resContext, resAggregation,
    resClass, resLocation)>
<!ELEMENT profElement (elemId, elemName, elemDesc, elemType,
    elemUnit, elemEnumFlag, (elemValue (elemMinValue, elemMaxValue)),
    elemSynonym, elemObligation, elemMaxOccurrence, elemComment)>
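The first fifteen children listed for resAttributes are the fifteen Dublin Core elements, followed by four OODT-specific ones. A DTD sequence also fixes the order of children, which the Python sketch below checks for a resAttributes fragment. No occurrence indicators are visible in the declarations above, so the sketch assumes every child is optional and may repeat; that assumption, the function name, and the sample fragments are not from the source.

```python
import xml.etree.ElementTree as ET

# Child order copied from the resAttributes declaration in the slide.
RES_ATTRIBUTES_ORDER = [
    "Identifier", "Title", "Format", "Description", "Creator", "Subject",
    "Publisher", "Contributor", "Date", "Type", "Source", "Language",
    "Relation", "Coverage", "Rights", "resContext", "resAggregation",
    "resClass", "resLocation",
]


def in_declared_order(res_attributes: ET.Element) -> bool:
    """True if all children are known names and appear in the declared order."""
    positions = []
    for child in res_attributes:
        if child.tag not in RES_ATTRIBUTES_ORDER:
            return False
        positions.append(RES_ATTRIBUTES_ORDER.index(child.tag))
    return positions == sorted(positions)


good = ET.fromstring(
    "<resAttributes><Identifier>X</Identifier><Language>en</Language>"
    "<resContext>PDS</resContext><resClass>data.dataSet</resClass></resAttributes>"
)
bad = ET.fromstring(
    "<resAttributes><resContext>PDS</resContext>"
    "<Identifier>X</Identifier></resAttributes>"
)
print(in_declared_order(good), in_declared_order(bad))
```

A validating XML parser would enforce the full content model, including cardinality; this check covers ordering only.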
35Profile Example
<profile>
  <profAttributes>
    <profId>OODT_PDS_DATA_SET_INV_82</profId>
    <profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId>
  </profAttributes>
  <resAttributes>
    <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier>
    <Title>VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL</Title>
    <Format>text/html</Format>
    <Language>en</Language>
    <resContext>PDS</resContext>
    <resAggregation>dataSet</resAggregation>
    <resClass>data.dataSet</resClass>
    <resLocation>http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?OBJECT_IDPDS100751</resLocation>
  </resAttributes>
  <profElement>
    <elemId>ARCHIVE_STATUS</elemId>
    <elemName>ARCHIVE_STATUS</elemName>
    <elemType>ENUMERATION</elemType>
    <elemEnumFlag>T</elemEnumFlag>
    <elemValue>ARCHIVED</elemValue>
  </profElement>
  <profElement>
    <elemId>TARGET_NAME</elemId>
    <elemName>TARGET_NAME</elemName>
    <elemType>ENUMERATION</elemType>
    <elemEnumFlag>T</elemEnumFlag>
    <elemValue>MARS</elemValue>
  </profElement>
</profile>
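The deck mentions Java classes that map an XML profile to an object. A rough Python analogue, using a trimmed copy of the profile example above, might look like the sketch below. The `Profile` class, its field names, and `parse_profile` are hypothetical; only the XML element names and values come from the slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List

# Trimmed version of the profile example; element names follow the slide.
PROFILE_XML = """
<profile>
  <profAttributes>
    <profId>OODT_PDS_DATA_SET_INV_82</profId>
  </profAttributes>
  <resAttributes>
    <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier>
    <resContext>PDS</resContext>
    <resClass>data.dataSet</resClass>
  </resAttributes>
  <profElement>
    <elemName>ARCHIVE_STATUS</elemName>
    <elemValue>ARCHIVED</elemValue>
  </profElement>
  <profElement>
    <elemName>TARGET_NAME</elemName>
    <elemValue>MARS</elemValue>
  </profElement>
</profile>
"""


@dataclass
class Profile:
    """In-memory view of one profile (hypothetical shape)."""
    prof_id: str
    identifier: str
    res_class: str
    elements: Dict[str, List[str]] = field(default_factory=dict)


def parse_profile(xml_text: str) -> Profile:
    """Map a <profile> document onto a Profile object."""
    root = ET.fromstring(xml_text.strip())
    profile = Profile(
        prof_id=root.findtext("profAttributes/profId"),
        identifier=root.findtext("resAttributes/Identifier"),
        res_class=root.findtext("resAttributes/resClass"),
    )
    # Collect every elemValue under each profElement, keyed by element name.
    for elem in root.findall("profElement"):
        name = elem.findtext("elemName")
        profile.elements[name] = [v.text for v in elem.findall("elemValue")]
    return profile


profile = parse_profile(PROFILE_XML)
print(profile.identifier, profile.elements)
```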
36OODT Product Server
- The Product Server plugs into the OODT framework and manages the handshake between the data system and the OODT system.
- Extensible by dynamically loading objects at runtime that are specific to the data system model
- Queries and results are passed using an OODT XML Query structure
Generic Server
Implementation Class
File Sys
Query
Result
Database
Product Server
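Loading a data-system-specific implementation class at runtime, as this slide describes, can be sketched in Python with `importlib`. The handler classes, the `demo_handlers` module name, and the dotted-path configuration string are all assumptions made so the example is self-contained; in a real deployment the handlers would live in ordinary importable modules.

```python
import importlib
import sys
import types
from typing import List


class FileSystemHandler:
    """Hypothetical implementation class for a file-based data system."""

    def query(self, text: str) -> List[str]:
        return [f"filesys:{text}"]


class DatabaseHandler:
    """Hypothetical implementation class for a database-backed data system."""

    def query(self, text: str) -> List[str]:
        return [f"database:{text}"]


# Register the handlers under a module name so they can be found by dotted
# path. This stands in for a real package on the import path.
_handlers = types.ModuleType("demo_handlers")
_handlers.FileSystemHandler = FileSystemHandler
_handlers.DatabaseHandler = DatabaseHandler
sys.modules["demo_handlers"] = _handlers


class ProductServer:
    """Generic server that delegates to a handler chosen by name at runtime."""

    def __init__(self, handler_path: str):
        module_name, _, class_name = handler_path.rpartition(".")
        module = importlib.import_module(module_name)
        self._handler = getattr(module, class_name)()

    def handle(self, text: str) -> List[str]:
        return self._handler.query(text)


print(ProductServer("demo_handlers.FileSystemHandler").handle("MARS"))
print(ProductServer("demo_handlers.DatabaseHandler").handle("MARS"))
```

Because the generic server only knows the `query` method, a new data system can be attached by supplying a new class name in configuration rather than by changing the server.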
37XML Query Structure
- Defined as follows
- The query description
- The results
- Result 1
- Result Header
- Result Data
- Result 2
- Result Header
- Result Data
- ...
- Result N
- Result Header
- Result Data
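The outline above, a query description followed by a list of results that each carry a header and data, can be built with Python's standard `xml.etree.ElementTree`. The element names used here (`query`, `queryDescription`, `results`, `result`, `resultHeader`, `resultData`) are placeholders chosen for the sketch; the slide does not give the actual OODT element names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_query(description: str, results: List[Tuple[str, str]]) -> ET.Element:
    """Build a query document: one description plus N (header, data) results."""
    root = ET.Element("query")
    ET.SubElement(root, "queryDescription").text = description
    results_elem = ET.SubElement(root, "results")
    for header, data in results:
        result = ET.SubElement(results_elem, "result")
        ET.SubElement(result, "resultHeader").text = header
        ET.SubElement(result, "resultData").text = data
    return root


# A query with no results yet, as it might leave the client.
request = build_query("TARGET_NAME = MARS", [])

# The same structure after two results have been attached.
response = build_query(
    "TARGET_NAME = MARS",
    [("text/plain", "first product"), ("text/plain", "second product")],
)
print(ET.tostring(response, encoding="unicode"))
```

Using one structure for both the request and the response lets each server along the path add results without needing a separate reply format.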
38Why XML for OODT?
- XML doesn't provide a silver bullet, but it does allow us to refocus the problem on metadata
- Metadata is a key to interoperability
- XML is language neutral
- Allows the designer to separate the data and the transport (re: CORBA vs XML-over-CORBA)
- Transport mechanism and data are not tied together
- Could be XML/HTTP
- Simpler deployments
- Simpler interfaces
- Allows technologies to grow and change independently
- Real value of XML is the content
39CORBA vs XML
- XML over CORBA/IIOP
module jpl {
  module user {
    interface UserManager {
      string do(in string xml);
    };
  };
};

<transaction>
  <findUser>
    <user>
      <surname>Doe</surname>
    </user>
  </findUser>
</transaction>
- CORBA method
module jpl {
  module user {
    interface User {
      string getName();
    };
    interface UserManager {
      User findUser(in string name);
    };
  };
};
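The contrast between the two interface styles can be illustrated in Python: a typed method whose signature names the operation, and a single generic `do(xml)` entry point that reads the operation out of the XML payload. The user list, the `findUserResult` reply element, and the error handling are assumptions for the sketch; no CORBA transport is involved.

```python
import xml.etree.ElementTree as ET
from typing import List

USERS = ["Jane Doe", "John Doe", "Ann Smith"]  # hypothetical sample data


class UserManager:
    def find_user(self, surname: str) -> List[str]:
        """Typed style: the operation and its argument are part of the interface."""
        return [name for name in USERS if name.split()[-1] == surname]

    def do(self, xml: str) -> str:
        """Generic style: the operation is described inside the XML payload."""
        transaction = ET.fromstring(xml)
        request = transaction.find("findUser")
        if request is None:
            raise ValueError("unsupported transaction")
        surname = request.findtext("user/surname")
        reply = ET.Element("transaction")
        found = ET.SubElement(reply, "findUserResult")
        for name in self.find_user(surname):
            ET.SubElement(found, "user").text = name
        return ET.tostring(reply, encoding="unicode")


manager = UserManager()
request_xml = (
    "<transaction><findUser><user><surname>Doe</surname></user>"
    "</findUser></transaction>"
)
print(manager.find_user("Doe"))
print(manager.do(request_xml))
```

With the generic style, adding a new operation changes only the XML vocabulary and the dispatch code, so the transport-level interface stays fixed; the cost is that type checking moves from the interface definition into the message handling.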
40Middleware Framework for OODT
Archive Client
OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK