Title: Semantic Interoperability Between Distributed Science Data Registries
1Semantic Interoperability Between Distributed
Science Data Registries
- Open Forum 2003 on Metadata Registries
- 4:00 - 4:45 PM
- January 2003
J. Steve Hughes Jet Propulsion Laboratory
2Topics
- Introduction and Background
- Challenges
- Solution
- Implementation
- Benefits
- Next Steps
3PDS Overview
Key PDS Products and Services
- High quality peer-reviewed data archives
- Data distribution to planetary community
- Archiving expertise to planetary missions
- Scientific expertise and support for users
- Value-added aggregated data products
- Education and outreach data products and services
Node structure provides focus on key disciplines
4PDS Role for Planetary Science Data
missions
2003 Mars Exploration Rovers
Muses-C
Mars Scouts
Mars Pathfinder
NEAR
Galileo
Mars Express
2001 Mars Odyssey
Messenger
Voyager
Deep Impact
Rosetta
Ulysses
Lunar Prospector
Deep Space 1
MGS
Cassini
Stardust
DAWN
MRO
Planetary Data System
data products
scientists
the public
mission planners
educators
5Recently Archived PDS Products
- The Environs of NEAR Shoemaker's Landing Site on Eros, Catalog PIA03141, 2/07/01
- Galileo SSI Global image of Io (true color), Catalog PIA02308, 8/27/99
- MGS Pre-Mapping Phase Pilot DVD Set
- MGS Martian North Polar Cap on September 12, 1998, Catalog PIA01471
- Dark Dunes Over-riding Bright Dunes, MGS MOC Release No. MOC2-201, 1/31/2000
- Clementine Observes the Moon, Solar Corona, and Venus, Catalog PIA00434, 11/04/97
6Data Archiving Life Cycle
- Planning Phase
- Data archiving requirements written into mission Announcement of Opportunity
- Pre-proposal briefing on PDS data archiving requirements given to potential proposers
- Proposal data archiving section reviewed by PDS
- PDS orientation to flight project staff
- Data archiving working groups formed
- Definition and Design Phase
- Project Data Management and Archive Plans define data to be archived
- Data Product and Volume Organization Software Interface Specifications detail the data and volume structure
- Preliminary metadata labels loaded into PDS catalog
- Production Phase
- Raw and processed data products, labels (metadata) and documentation produced
- Preliminary and quick-look data made accessible via Project and PDS web pages
- Data archive products validated and peer-reviewed; liens corrected
- Distribution and Maintenance Phase
- Final data products made available on-line
- PDS adds the data to the archive
- Physical copies sent to NSSDC
- PDS provides data, documentation and science expertise to users
- Data archive maintained via periodic media refreshes, addition of new / updated data products
7Previous PDS Archive Production and Distribution
Process
- PDS receives data from flight projects for archive and distribution
- PDS helps planetary missions to create high quality data archives and to release them in a timely manner
- PDS validates data for compliance with PDS standards
- PDS assembles, publishes, distributes, and maintains peer-reviewed, documented planetary data sets
- PDS archive data also available on-line at PDS discipline nodes
- Problem: Planetary missions are producing larger data volumes
- CD-ROM distribution is too expensive
- Even if DVDs replace CDs, there will still be hundreds, even thousands, of volumes
- Difficult for users to store; difficult to locate data of interest
- A new paradigm for archive and distribution is needed
8 Current Challenges
- More missions
- Smaller, more frequent missions; more orbiters
- New programs (Mars Exploration, Discovery, New Frontiers)
- Inadequate mission archiving budgets
- New PIs with little experience in data archiving
- Larger data volumes
- Bigger payloads (Cassini-18 instruments, Galileo-16, Rosetta-20)
- More complex instruments/better resolution/higher data rates
- THEMIS will return 5TB of data (2 Magellans)
- Mars 05: 300TB (100 Magellans!)
- Increased user expectations
- Demand for instant internet access and modern interfaces
- Need for sophisticated methods to access larger data volumes and to locate data of interest
9Archive Growth with Mars Exploration Program
10PDS Data Distribution New Paradigm
- Online access is the primary method for data distribution, with improved tools to support users
- Find out what data exist
- Select data of interest
- Retrieve data
- Correlate data across instruments, missions, and nodes
- Data are publicly available as soon as possible
- Copies on physical media are available on demand using limited resources
- Special collections containing data of high interest can be published on physical media from time to time
- Copies of complete data sets on cost-effective physical media for long term archives at PDS and NSSDC (minimum 3 copies) are still required from the flight projects
11The Data
- Variety and Volume
- 5TB of data from 30 years of exploration
- 700 Data Sets (hundreds of product types)
- 1700 Archive Volumes CD/DVDs
- Camera, Spectrometer, LECP, SAR, RS,
- Images, Time_Series, Spectra, Qubes, Tables,
- Binary and ASCII
- Spacecraft and Earth Based
- Many data representations
- Geographically distributed
- Multi-disciplinary
- Maintain original bits and convert as needed
12The Data Model
Level  Group/Element Structure
-----  -----------------------
1      spacecraft instrument identification group
2      instrument identification
2      instrument name
2      spacecraft identification
2      instrument type
1      instrument description
...
1      filter group
2      filter name
2      filter number
2      filter type
...
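The level numbers in the model imply a nesting of groups and elements. As a minimal sketch, the flat listing can be turned into a nested structure by attaching each row to the nearest preceding row with a smaller level. The function name `build_tree` and the dictionary layout are illustrative assumptions, not part of the PDS data model.

```python
def build_tree(rows):
    """Nest (level, name) rows: each row becomes a child of the nearest
    preceding row whose level is smaller."""
    root = {"name": "root", "children": []}
    stack = [(0, root)]  # (level, node) path from the root to the current node
    for level, name in rows:
        node = {"name": name, "children": []}
        # Climb back up until the top of the stack is a valid parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


rows = [
    (1, "spacecraft instrument identification group"),
    (2, "instrument identification"),
    (2, "instrument name"),
    (2, "spacecraft identification"),
    (2, "instrument type"),
    (1, "instrument description"),
    (1, "filter group"),
    (2, "filter name"),
    (2, "filter number"),
    (2, "filter type"),
]
tree = build_tree(rows)
for group in tree["children"]:
    print(group["name"], "->", [child["name"] for child in group["children"]])
```

Only the rows shown on the slide are included; the elided entries are left out.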
13An Image Label
DATA_SET_ID = "VO1/VO2-M-VIS-5-DIM-V1.0"
SPACECRAFT_NAME = {VIKING_ORBITER_1, ...}
TARGET_NAME = MARS
IMAGE_ID = MG88S045
^IMAGE = 2
SOURCE_IMAGE_ID = {"383B23", "421B23", ...}
INSTRUMENT_NAME = VISUAL_IMAGING_SUBSYSTEM
...
NOTE = "MARS DIGITAL IMAGE ...
OBJECT = IMAGE
  LINES = 160
  LINE_SAMPLES = 252
  SAMPLE_TYPE = UNSIGNED_INTEGER
  SAMPLE_BITS = 8
  SAMPLE_BIT_MASK = 2#11111111#
  CHECKSUM = 2636242
END_OBJECT
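A label of this kind can be read by software as keyword/value pairs. The sketch below assumes one `KEYWORD = VALUE` statement per line and `OBJECT = NAME ... END_OBJECT` grouping; it is not a full label parser, and `parse_label` is a hypothetical helper name. The sample uses only keywords and values from the label on this slide.

```python
def parse_label(text):
    """Parse simple 'KEYWORD = VALUE' lines into nested dictionaries.

    Assumption: one statement per line, with OBJECT = NAME ... END_OBJECT
    delimiting a nested group. Real labels have more syntax than this.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("END_OBJECT"):
            if len(stack) > 1:
                stack.pop()
            continue
        if "=" not in line:
            continue  # skip anything this sketch does not understand
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "OBJECT":
            child = {}
            stack[-1][value] = child
            stack.append(child)
        else:
            stack[-1][key] = value.strip('"')
    return root


sample = """
DATA_SET_ID = "VO1/VO2-M-VIS-5-DIM-V1.0"
TARGET_NAME = MARS
IMAGE_ID = MG88S045
OBJECT = IMAGE
  LINES = 160
  LINE_SAMPLES = 252
  SAMPLE_TYPE = UNSIGNED_INTEGER
  SAMPLE_BITS = 8
END_OBJECT
"""
label = parse_label(sample)
image = label["IMAGE"]
# The image size in bytes follows from the data organization keywords.
size = int(image["LINES"]) * int(image["LINE_SAMPLES"]) * int(image["SAMPLE_BITS"]) // 8
print(label["DATA_SET_ID"], size)
```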
14Categories of Meta-Data
- Data Representation
- Data Representation
- ITEM_TYPE = VAX_INTEGER
- File Attributes
- RECORD_TYPE = FIXED_LENGTH
- RECORD_BYTES = 252
- Data Organization
- LINES = 160
- LINE_SAMPLES = 252
- SAMPLE_TYPE = UNSIGNED_INTEGER
- SAMPLE_BITS = 8
- Catalog
- Identification
- DATA_SET_ID = "VO1/VO2-M-VIS-5-DIM-V1.0"
- IMAGE_ID = MG88S045
- INSTRUMENT_NAME = VISUAL_IMAGING_...
- Observation Context
- FILTER_NAME = CLEAR
15PDS-D Implementation
- Multi-tiered information architecture
- Application Clients (Browsers/Interfaces)
- Middleware (OODT)
- Data and Metadata Servers (product server, profile server)
- Data Repositories and Catalogs
- Existing PDS subsystems
- Data and resources remain physically distributed and locally managed
- Underlying heterogeneity is encapsulated and hidden from the users
- User Interfaces (Image Atlas, DITDOS, etc.)
- Data repositories (disk farms, databases, CD Jukeboxes)
- Catalogs
- Separate data and technology architectures
- PDS archive metadata used to its full potential
- Evolved technology architecture deployed
- Internet used for data distribution
16PDS-D Architecture for Mars Odyssey
Users
Educational
Science
General Public
Data Set View
IDL, WIPE
Distributed Clients
Standard Interfaces (OODT Middleware)
Data Products and Metadata
MARIE, PDS PPI
Documents and Ancillary Files PDS CN
THEMIS ASU
Radio Science PDS GEO
SPICE PDS NAIF
GRS PDS GEO
ACCEL PDS ATMOS
17Data Product Retrieval
18Conceptual Architecture
[Diagram. Regions: Client Environment, Central Node, Discipline Node. Components: Web I/F, Desktop I/F, QueryClient, Web Server, Web server Plugins, Name Server, Query Server, DSCAT Profile Server, DS Catalog, Node 1 Profile Server, Node 1 Catalog, Node 1 Product Server, Node 1 Products. Connections between components are labeled XMLQuery.]
19Information Architecture in a Nutshell
- Profiles describe and provide location for anything of interest
- Things of interest
- Data Sets, Data Set Browsers, Data Products, Volumes, Websites, Web Applications, etc.
- Written as XML documents
- Provide sufficient information to describe and locate resources
- Help determine whether the resource can resolve a user query
- Profile Servers serve profiles
- Allow search and retrieval of profiles
- Distributed for local management and scalability
- Access profile databases
- Static profiles stored as XML documents
- Dynamic profiles generated from information stored in databases
20System Architecture in a Nutshell
- Product Servers serve Data Products
- Data Products are served from data repositories
- Input unique product identifiers
- Output products in the requested formats
- Middleware
- Uses XML documents for communication
- Common language and protocols
- XML profiles for resource descriptions
- XMLQUERY for queries
- Implements message passing protocol for distributed processing
- Web service encapsulation of existing resources
- Product servers for data repositories
- Profile servers for catalogs
21PROFILE DTD
<!ELEMENT profiles (profile)>
<!ELEMENT profile (profAttributes, resAttributes, profElement)>
<!ELEMENT profAttributes (profId, profVersion?, profType, profStatusId, profSecurityType?, profParentId?, profChildId, profRegAuthority?, profRevisionNote, profDataDictId?)>
<!ELEMENT resAttributes (Identifier, Title?, Format, Description?, Creator, Subject, Publisher, Contributor, Date, Type, Source, Language, Relation, Coverage, Rights, resContext, resAggregation?, resClass, resLocation)>
<!ELEMENT profElement (elemId?, elemName, elemDesc?, elemType?, elemUnit?, elemEnumFlag, (elemValue | (elemMinValue, elemMaxValue)), elemSynonym, elemObligation?, elemMaxOccurrence?, elemComment?)>
22Data Product Profile (ODYSSEY HEND)
<profile>
  <profAttributes>
    <profId>1.3.6.1.4.1.1306.2.104.10018791</profId>
    <profVersion>null</profVersion>
    <profType>profile</profType>
  </profAttributes>
  <resAttributes>
    <Identifier>ODY-M-HEND-EDR-2-V1.0H0133</Identifier>
    <Title>Data_Set_Name ODYSSEY-MARS-HEND-EDR-2-V1.0 Product_IdH0133</Title>
    <Description>null</Description>
    <resContext>NASA.PDS</resContext>
    <resAggregation>null</resAggregation>
    <resClass>data.granule</resClass>
    <resLocation>iiop://PDS.ProfServer.GEO.ODY.GRS</resLocation>
  </resAttributes>
  <profElement>
    <elemName>FILE_SPECIFICATION_NAME</elemName>
    <elemValue>/ody_2001/xxx/H0133.DAT</elemValue>
  </profElement>
</profile>
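Because profiles are XML documents, a client can read one with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the HEND profile; the `iiop://` form of the location and the helper name `summarize_profile` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the HEND data product profile shown on this slide.
PROFILE_XML = """
<profile>
  <profAttributes>
    <profId>1.3.6.1.4.1.1306.2.104.10018791</profId>
    <profType>profile</profType>
  </profAttributes>
  <resAttributes>
    <Identifier>ODY-M-HEND-EDR-2-V1.0H0133</Identifier>
    <resContext>NASA.PDS</resContext>
    <resClass>data.granule</resClass>
    <resLocation>iiop://PDS.ProfServer.GEO.ODY.GRS</resLocation>
  </resAttributes>
  <profElement>
    <elemName>FILE_SPECIFICATION_NAME</elemName>
    <elemValue>/ody_2001/xxx/H0133.DAT</elemValue>
  </profElement>
</profile>
"""


def summarize_profile(xml_text):
    """Return the resource class, location, and element name/value pairs."""
    profile = ET.fromstring(xml_text.strip())
    res = profile.find("resAttributes")
    elements = {
        elem.findtext("elemName"): [value.text for value in elem.findall("elemValue")]
        for elem in profile.findall("profElement")
    }
    return {
        "class": res.findtext("resClass"),
        "location": res.findtext("resLocation"),
        "elements": elements,
    }


summary = summarize_profile(PROFILE_XML)
print(summary["class"], summary["location"])
print(summary["elements"])
```

The resource attributes say what the resource is and where it lives; the profile elements carry the metadata a query can match against.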
23Profile Server
- Profile Servers serve profiles
- Allow search and retrieval of profiles
- Retrieves from profile databases
- Static profiles stored as XML documents
- Dynamic profiles generated from information stored in databases
- Distributed for local management and scalability
24Profile Server Requirements
- A profile server shall search and retrieve profiles from a profile database
- For search, a profile server shall allow any profile attribute as a query constraint. Profile attributes include those from the profile element, resource attribute, and profile attribute sections of the profile document.
- For retrieval, a profile server shall return matching profiles. The user can request the complete profile or any subset of the profile.
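The two requirements (any attribute may constrain a search; retrieval may return a whole profile or a subset) can be sketched with an in-memory list standing in for a profile database. The function `search_profiles`, the flat dictionary layout, and the sample identifiers and locations are hypothetical and only illustrate the behavior.

```python
def search_profiles(profiles, constraints, fields=None):
    """Return profiles whose attributes equal every constraint.

    profiles: list of flat dicts mixing profile attributes, resource
    attributes, and profile elements (an assumed stand-in for a
    profile database).
    fields: optional list of keys; when given, only that subset of each
    matching profile is returned.
    """
    matches = []
    for profile in profiles:
        if all(profile.get(key) == value for key, value in constraints.items()):
            if fields is None:
                matches.append(dict(profile))
            else:
                matches.append({key: profile[key] for key in fields if key in profile})
    return matches


# Made-up sample profiles for illustration only.
profiles = [
    {"profId": "p1", "resClass": "data.granule", "TARGET_NAME": "MARS", "resLocation": "loc-a"},
    {"profId": "p2", "resClass": "data.granule", "TARGET_NAME": "IO", "resLocation": "loc-b"},
    {"profId": "p3", "resClass": "example.other", "TARGET_NAME": "MARS", "resLocation": "loc-c"},
]
hits = search_profiles(profiles, {"TARGET_NAME": "MARS"}, fields=["profId", "resLocation"])
print(hits)
```

Passing `fields=None` returns complete profiles; adding more entries to `constraints` narrows the match.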
25Demo
http://starbrite.jpl.nasa.gov/pds
26Data Set View Results
27Custom Data Set Browser THEMIS Search
28Custom Data Set Browser THEMIS Results
29Default Data Set Browser
30Benefits
- New system architecture provides seamless search and retrieval of all PDS data products in the system
- Users can access all PDS resources without knowing their location
- Users are presented with an integrated set of PDS Nodes (one PDS, not seven)
- Primary method of data distribution is now electronic and saves media costs
- Heterogeneous data repositories can be located anywhere for optimum performance and cost savings (e.g., THEMIS data node at ASU)
- PDS-D provides a standard interface for software developers, thereby increasing the availability of user clients
- Supports plug-ins for analysis tools and graphical user interfaces
- PDS-D supports evolution and scaling to incorporate new information technology and requirements changes
- Missions are now involved with the PDS sooner, and data are released through the PDS as soon as they become available
- Mars Odyssey data were released to the public through the PDS on October 1st, the same day they were delivered!
31Scalability
- Number of system component interconnections increases linearly
- Nodes added as needed
- One-to-one connections from each component to middleware
- Exponential number of inter-operational connections made dynamically via message passing
- Since distribution system is built as a light layer on top of the archive system, it will scale as long as the archive system scales
- Continue to distribute archive as needed to support larger data repositories (e.g., MRO)
- Parallel load balancing
- Smaller frequently used data repositories can be mirrored
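The linear-growth point can be checked with simple arithmetic. If each of n components holds one connection to the middleware, n connections are needed; if every component instead linked directly to every other, n(n-1)/2 links would be needed. The sketch below only computes these two counts. The function names are illustrative, and comparing against direct pairwise links is an assumption about what the slide has in mind.

```python
def middleware_links(n):
    """One connection per component to the shared middleware."""
    return n


def pairwise_links(n):
    """Direct links needed if every component connected to every other."""
    return n * (n - 1) // 2


for n in (7, 20, 100):
    print(n, middleware_links(n), pairwise_links(n))
```

For 7 components the counts are 7 versus 21; for 100 components they are 100 versus 4950.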
32Correlative Search the Simple Way
- All data resources in the system are profiled
- Submit a query that describes what you want
- Not how to get what you want
- System returns all matching data profiles
- Provides identification and description information
- Provides location information
- Provides all PDS metadata to support correlative science
- Information is machine and human readable
- Submit query to retrieve data
33Next Steps
- Collect and analyze requirements from upcoming planetary missions (e.g., Cassini, MER, Mars Express, MRO)
- Gather user community feedback from PDS D-01
- Incorporate both of these to determine future releases of PDS-D
- Automate the data archiving processes to streamline getting data into the PDS
- Automated archive product creation work flow
- Product generation, labeling, validation, and ingestion
- Derived product processing and versioning
- Upgrade the PDS data model to support new requirements
- XML modeling and interfaces
- Ground-based data sets
- Wavelength regimes
- Targets with multiple identifiers and types
34PDS Development Timeline
- PDS-D D02
- Data Set View for entire archive
- Mars Database product search
- NEAR Data Set
- 3 sigma Data Set
- On-request CD/DVD creation
- Cassini
35For More Information
J. Steven Hughes
Jet Propulsion Laboratory
Steve.Hughes_at_jpl.nasa.gov
36Backup
37Product Server Architecture
[Diagram. Distributed Product Servers, each shown as a stack of layers: HTTP, IIOP, Java, C APIs; Java Server Framework; Data Source Interface for Dynamically Loaded Query Handlers; File System Access/Zip/ReadLabel. The servers sit on top of Distributed Data Repositories.]
38The Product Server in a Nutshell
- If a product server has read privileges to a file, it can return that file.
- If a product server has read privileges to a directory, it can return all files in the directory, packaged as a zip file.
- If a product server has read privileges to a PDS labeled product, it can return all files referenced within the label of the product, packaged as a zip file.
- The PDS/OODT product server is capable of serving the vast majority of all products in the PDS archive (i.e., the product server is not constrained by any target body, mission, data set, or the data repository layout).
- The currently released product server is installed at six nodes (i.e., all product server capabilities are at all nodes).
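The first two rules (a readable file is returned as is; a readable directory is returned as a zip of its files) can be sketched with the Python standard library. This is an illustration, not the product server's code: `serve_path` is a hypothetical name, the labeled-product case is omitted because it requires reading the label, and a missing read privilege simply surfaces as the operating system's error.

```python
import io
import os
import tempfile
import zipfile


def serve_path(path):
    """Return (kind, payload bytes) for a readable file or directory."""
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return "file", handle.read()
    if os.path.isdir(path):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if os.path.isfile(full):
                    archive.write(full, arcname=name)
        return "zip", buffer.getvalue()
    raise FileNotFoundError(path)


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    for name, body in (("a.lbl", b"label"), ("a.dat", b"\x00\x01")):
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(body)
    kind, payload = serve_path(root)
    names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
    file_kind, file_payload = serve_path(os.path.join(root, "a.dat"))
print(kind, names, file_kind, file_payload)
```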
39Product Server Query Handlers
- Return_Types
- PDS_LABEL: return PDS label
- PDS_ZIP: return PDS labeled file and all associated files in a ZIP package
- PDS_ZIPN: same as PDS_ZIP except for 1-n PDS labeled files
- RAW (mime_type): return specified file
- DIRLIST: return list of all files in a directory
- PDS_ZIPN_TES: return TES product in a ZIP package
- PDS_JPEG: convert PDS image to JPEG
- Under consideration
- PDS_CSV: convert PDS binary TABLE to comma-separated value ASCII file
- PDS_PDS: normalize data representation of a PDS product
- PDS_FITS: convert PDS product to FITS
- http://buttons.jpl.nasa.gov:9002/index.html
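One way to picture query handlers keyed by return type is a registry that maps each RETURN_TYPE name to a function. The sketch below is an assumed design, not the product server's implementation; the decorator `handler`, the function `handle_query`, and the shape of the `product` dictionary are all hypothetical, and only two of the listed return types are shown.

```python
HANDLERS = {}


def handler(return_type):
    """Register a function as the handler for one RETURN_TYPE name."""
    def register(func):
        HANDLERS[return_type] = func
        return func
    return register


@handler("PDS_LABEL")
def return_label(product):
    # Return only the label text of the product.
    return product["label"]


@handler("DIRLIST")
def return_dirlist(product):
    # Return the list of files, sorted for a stable result.
    return sorted(product["files"])


def handle_query(return_type, product):
    """Dispatch to the registered handler, rejecting unknown types."""
    try:
        func = HANDLERS[return_type]
    except KeyError:
        raise ValueError(f"unsupported RETURN_TYPE: {return_type}") from None
    return func(product)


product = {"label": "IMAGE_ID = MG88S045", "files": ["b.img", "a.lbl"]}
print(handle_query("DIRLIST", product))
```

Adding a new return type then means registering one more function, without touching the dispatch code.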
40Standard Product Server Interface
- HTTP protocol link to product query servlet
- http://starbrite.jpl.nasa.gov/servlet/jpl.oodt.servlets.ProductServlet
- Target product server specification
- object=PDS.ASU.Product
- Keyword query
- ONLINE_FILE_SPECIFICATION_NAME = data/odtie0_xxxx/i009xxedr/I00900003EDR.QUB AND RETURN_TYPE = PDS_ZIP
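Putting the three parts together (servlet link, target product server, keyword query) gives a single request URL. The sketch below only builds the URL string and makes no network call. The parameter names `object` and `keywordQuery` and the exact `KEY = VALUE AND ...` query syntax are assumptions about the interface, and `product_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SERVLET = "http://starbrite.jpl.nasa.gov/servlet/jpl.oodt.servlets.ProductServlet"


def product_url(server, file_spec, return_type):
    """Build a product request URL.

    Assumption: the parameter names 'object' and 'keywordQuery' and the
    'KEY = VALUE AND ...' query syntax are guesses at the interface.
    """
    query = (
        f"ONLINE_FILE_SPECIFICATION_NAME = {file_spec} "
        f"AND RETURN_TYPE = {return_type}"
    )
    # urlencode percent-escapes spaces, slashes, and '=' inside the values.
    return SERVLET + "?" + urlencode({"object": server, "keywordQuery": query})


url = product_url(
    "PDS.ASU.Product",
    "data/odtie0_xxxx/i009xxedr/I00900003EDR.QUB",
    "PDS_ZIP",
)
print(url)
```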