Title: Mike Smorul
1. Digital Preservation and Archiving at the Institute for Advanced Computer Studies
University of Maryland, College Park
- Mike Smorul
- Saurabh Channan
2. Overview
- Digital Preservation Research
- ADAPT Project and Components
- Pilot Persistent Archive
- Digital Library and Production Data Distribution
- Global Land Cover Facility
- Conclusion and Questions
3. A Digital Approach to Preservation Technology (ADAPT)
- Premise
- Preservation of digital entities into self-describing objects
- OAIS Information Package model as a framework
- Separation of management into three layers: bitstream, semantic, and access/discovery
- Distributed and Secure Infrastructure
- Automatic ingestion and replication
- Policy-Driven Management of Preservation Processes
- Global Format Registry
- Separate Peer-to-Peer Deep Archive
4. ADAPT Architecture
5. ADAPT Components
- Ingestion
- Producer-Archive Workflow Network (PAWN)
- Management of Preservation Processes
- Lightweight Preservation Environment (LPE)
- Access and Discovery
- Grid Retrieval and Search Platform (GRASP)
- EAP Collection browser
6. Overall Principles (PAWN)
- Distributed, secure ingestion
- OAIS-based Information Package creation
- Use of platform-independent web/grid technologies
- Minimal client-side requirements
- Ease of integration with archive and data grid systems
- Designed to satisfy data integrity requirements of scientific collections and digital preservation
7. Distributed Ingestion (PAWN)
8. Ingestion Workflow (PAWN)
- Negotiate Submission Agreement.
- Workflow Initialization and Submission Information Package (SIP) creation.
- Transfer of SIPs to Data Grid site.
- Validation of SIP transfer.
- Organization of data into collections and transfer into Data Grid.
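The slides do not say how a SIP transfer is validated. One common approach, shown here only as a minimal sketch, is to record a checksum manifest when the SIP is created and recompute it at the receiving site. The function names `build_manifest` and `validate_transfer`, the directory layout, and the choice of SHA-256 are assumptions for illustration, not part of PAWN.

```python
import hashlib
from pathlib import Path


def build_manifest(sip_dir):
    """Map each file path (relative to the SIP root) to its SHA-256 digest."""
    sip_dir = Path(sip_dir)
    manifest = {}
    for path in sorted(sip_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(sip_dir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def validate_transfer(manifest, received_dir):
    """Return the relative paths that are missing or whose digest differs."""
    received = build_manifest(received_dir)
    return sorted(
        name for name, digest in manifest.items()
        if received.get(name) != digest
    )
```

An empty list from `validate_transfer` means every file listed in the producer's manifest arrived intact. Any returned path would need to be re-sent before the data is organized into collections.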
9. Component Overview (PAWN)
10. Target Collections (PAWN)
- Digital Image Collection
- Rich metadata in various formats
- Web site crawling
- Online and interactive content
- GLCF Landsat data
- Spatial and temporal metadata
- Large quantity (over 15,000 objects)
11. Lightweight Preservation Environment (LPE)
- The Lightweight Preservation Environment is an archival system based on a modular design using grid and web services.
- The current implementation relies mostly on Globus technologies.
- Primarily, we've focused on wrapping logic around those components.
12. Developed Components (LPE)
- Data Manager (DM): Organizes data and queries between the user and the other components
- Policy Manager (PM): Ensures that a minimum number of copies exist for any given file
- Transformation Manager (TM): Executes specific transformations on a named file on a given storage node and returns the results
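The Policy Manager's minimum-copies rule can be illustrated with a small sketch. This is not the LPE implementation: `plan_replication`, the replica catalog shape, and the node names are hypothetical, and a real policy would also weigh node capacity and location.

```python
def plan_replication(replicas, nodes, min_copies):
    """Plan the copies needed to bring every file up to min_copies.

    replicas: dict mapping file name -> set of nodes currently holding it
    nodes: ordered list of all available storage nodes
    Returns a dict mapping file name -> list of nodes to copy the file to.
    """
    plan = {}
    for name, holders in sorted(replicas.items()):
        missing = min_copies - len(holders)
        if missing <= 0:
            continue  # policy already satisfied for this file
        candidates = [node for node in nodes if node not in holders]
        plan[name] = candidates[:missing]
    return plan


catalog = {
    "scene-001.tif": {"node1"},
    "scene-002.tif": {"node1", "node2", "node3"},
}
plan = plan_replication(catalog, ["node1", "node2", "node3"], min_copies=2)
# plan == {"scene-001.tif": ["node2"]}
```

If fewer candidate nodes exist than copies are missing, the plan lists only the nodes available, so a caller would need to compare the plan against the policy to detect a shortfall.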
13. Grid Retrieval and Search Platform (GRASP)
- Based on concepts from the Earth Science Data Interface (ESDI), developed at the UMIACS GLCF.
- Provides a graphical interface into data grid holdings.
- Access to entire GLCF holdings through the Storage Resource Broker (SRB)
14. GRASP Architecture
15. GRASP Architecture
- GRASP uses a data grid as an abstract storage repository.
- Metadata in the grid is mined from the grid itself or from external sources and published into a browsable form.
- Data grids may allow for platform-independent metadata, but may not be optimal for access.
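As a rough illustration of publishing mined metadata into a browsable form, the sketch below builds a simple facet index from metadata records. The record fields (`id`, `sensor`, `year`) and the function `build_browse_index` are hypothetical. The slides do not describe how GRASP stores or indexes metadata.

```python
from collections import defaultdict


def build_browse_index(records, facets):
    """Group object ids by the value of each facet for quick browsing.

    records: iterable of dicts, each with an "id" key plus metadata fields
    facets: metadata field names to index
    Returns {facet: {value: [ids...]}}.
    """
    index = {facet: defaultdict(list) for facet in facets}
    for record in records:
        for facet in facets:
            value = record.get(facet)
            if value is not None:
                index[facet][value].append(record["id"])
    return index


records = [
    {"id": "scene-001", "sensor": "Landsat", "year": 2000},
    {"id": "scene-002", "sensor": "MODIS", "year": 2001},
    {"id": "scene-003", "sensor": "Landsat"},
]
index = build_browse_index(records, ["sensor", "year"])
# index["sensor"]["Landsat"] == ["scene-001", "scene-003"]
```

Keeping a separate index like this reflects the trade-off named above: the grid remains the authoritative store, while browsing queries hit a structure built for access.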
16. GRASP Screenshot
17. Global Land Cover Facility
- Mission
- The GLCF Mission is to encourage the use of remotely sensed imagery, derived products and applications within a broad range of science communities in a manner that improves comprehension of the nature and causes of land cover change and its impact on the Earth.
- Goal
- The GLCF Goal is to provide free access to an integrated collection of critical land cover and Earth science data through systems that are designed to maximize user outreach and that promote development of novel tools for ordering, visualizing and manipulating spatial data.
18. Data Collections
The majority of the holdings are Landsat and MODIS data.
19. Data Distribution
- Data at the GLCF
- Approximately 5.1 TB compressed
- Approximately 13 TB uncompressed
- Anticipated Production Rate
- Triple or quadruple current data holdings within the next two years
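To put the anticipated growth in concrete terms, the short calculation below applies the tripling and quadrupling factors to the holdings quoted above. It is plain arithmetic on the slide's figures, and the helper name `project` is illustrative only.

```python
current_tb = {"compressed": 5.1, "uncompressed": 13.0}


def project(holdings_tb, factors=(3, 4)):
    """Scale each holding by every growth factor, rounded to 0.1 TB."""
    return {
        kind: tuple(round(size * factor, 1) for factor in factors)
        for kind, size in holdings_tb.items()
    }


projected = project(current_tb)
# projected == {"compressed": (15.3, 20.4), "uncompressed": (39.0, 52.0)}
```

Under these figures, storage planning would need to allow for roughly 15 to 20 TB compressed, or 39 to 52 TB uncompressed, within two years.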
20. Data Discovery Applications
- ESDI
- Web Interface
- User friendly
- Search
- Retrieve
- Discover
- Scalable
- Over 9 TB a month!
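If the 9 TB figure is read as data served per month, it can be converted into an average sustained rate. The sketch assumes decimal terabytes and a 30-day month. Neither assumption comes from the slides, and real traffic would be burstier than an average suggests.

```python
def average_rate(tb_per_month, days_per_month=30):
    """Convert a monthly volume in TB to average (MB/s, Mbit/s)."""
    seconds = days_per_month * 24 * 60 * 60
    bytes_per_second = tb_per_month * 1e12 / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_per_s, mbit_per_s = average_rate(9)
# roughly 3.5 MB/s, or about 28 Mbit/s, sustained around the clock
```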
21. GLCF Archive
Scalable and Reliable
22. Participation Possibilities
- PAWN ingestion component
- Minimal geospatial metadata support planned; can be expanded to support NGDA endpoint
- GRASP display component
- Solid core components; end-user interfaces need additional polishing
- GLCF data holdings
- Additional hardware required if additional data and access mechanisms (grid, etc.) are required
- Other possibilities include grid infrastructure, GSI security, format registry, etc.
23. Questions