Title: Building the Archives of the Future: NARA's Electronic Records Archives Program
1 Building the Archives of the Future: NARA's Electronic Records Archives Program
Managing Electronic Records
Kenneth Thibodeau, Director, ERA Program, National Archives and Records Administration
Richard Marciano, Research Scientist, San Diego Supercomputer Center
3 October 2000
2 IT'S OUR FUTURE
-- John Carlin, Archivist of the United States
3 Why do we need an Electronic Records Archives?
- The conduct of business is increasingly enabled by, and dependent on, digital computer and communications technologies.
- The records that are being created in this environment are increasingly electronic.
- Many of these records cannot be expressed in non-electronic form.
- Digital technology is both necessary and advantageous for discovering and delivering information.
4 Technical Challenges in Building ERA
- Overcome technological obsolescence in a way that enables the preservation of demonstrably authentic records.
- Find ways to take advantage of continuing progress in information technology in order to maintain and improve customer service.
- Build solutions that recognize that today's progress is tomorrow's obsolescence.
5 What is the Electronic Records Archives?
The Electronic Records Archives is a
comprehensive, systematic, and dynamic means of
accomplishing the archival work that must be done
to provide continuing access to authentic
electronic records over time.
6 Archival Business Model: Context
The Life Cycle of Records
7 Digital Preservation Strategies
- Maintain original technology
- Imitate original technology
- Re-engineer software
- Migrate data formats
- Standardize data formats
Original Technology
State-of-the-art Technology
- Collection-based Persistent Object Preservation
8 Collection-Based Persistent Object Preservation: Method
- Create meta-data models
- the internal components of objects
- the sequence of components within objects
- the attributes of presentation of preserved objects
- Apply models by marking up objects
- Express links among records and collections as persistent data values
- Define the semantics of components
- Preserve the models, the transformed records and procedures to apply the models.
- Provide rich, comprehensive and flexible meta-data management for discovery, retrieval & preservation
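The method on this slide can be illustrated with a minimal sketch. It assumes a hypothetical model for e-mail records (the component names in `EMAIL_MODEL`, the `record` and `link` element names, and the record identifiers are all invented for illustration and do not come from the ERA prototype). The model fixes the components and their sequence, the record is marked up against it, and a link to another record is stored as a plain data value.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: the components of an e-mail record, in sequence.
EMAIL_MODEL = ["sender", "recipient", "date", "subject", "body"]

def mark_up(record, model, record_id):
    """Apply the model: tag each component in the order the model prescribes."""
    root = ET.Element("record", {"id": record_id, "model": "email"})
    for name in model:
        if name not in record:
            raise ValueError(f"record {record_id} lacks component {name!r}")
        ET.SubElement(root, name).text = record[name]
    return root

def link(root, target_id, relation):
    """Express a link to another record as a persistent data value."""
    ET.SubElement(root, "link", {"relation": relation, "target": target_id})

record = {
    "sender": "archivist@example.org",
    "recipient": "records.officer@example.org",
    "date": "2000-10-03",
    "subject": "Transfer schedule",
    "body": "The transfer is scheduled for next month.",
}
elem = mark_up(record, EMAIL_MODEL, "rec-0002")
link(elem, "rec-0001", "reply-to")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Because the marked-up record is plain tagged text, it can be read back without the mail software that created it, provided the model itself is preserved alongside it.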
9 Open Archival Information System
10 Persistent Object Preservation: Implementation
- Comprehensive
- All types of computer applications
- All types of electronic records
- Collections as well as individual records
- All required archival processes
- Infrastructure Independence
- Objects and Collections of Objects
- Enable replacement of any component
- Scalable
- Up to >> 100,000,000 objects
- Down for small collections & institutions
- Metacomputing - over the Internet
- Extensible over the Records Lifecycle
11 Basic Process: Ingest
Archival Information Package
Submission Information Package
12 Basic Process: Access
Retrieve Records
Dissemination Information Package
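The two basic processes can be sketched as a pair of functions over the three package types named on the slides. This is only an illustration: the field names, the SHA-256 fixity check, and the `ingest`/`access` signatures are assumptions, not part of the OAIS model text or the ERA design.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    producer: str
    files: dict  # file name -> bytes

@dataclass
class AIP:
    """Archival Information Package: content plus preservation information."""
    aip_id: str
    content: dict
    preservation_info: dict

@dataclass
class DIP:
    """Dissemination Information Package: what the consumer receives."""
    aip_id: str
    files: dict

def ingest(sip, aip_id):
    """Ingest: turn a SIP into an AIP, recording provenance and fixity."""
    fixity = {name: hashlib.sha256(data).hexdigest()
              for name, data in sip.files.items()}
    info = {"provenance": sip.producer, "fixity": fixity}
    return AIP(aip_id, dict(sip.files), info)

def access(aip, wanted):
    """Access: retrieve records and package them as a DIP after a fixity check."""
    files = {}
    for name in wanted:
        data = aip.content[name]
        digest = hashlib.sha256(data).hexdigest()
        if digest != aip.preservation_info["fixity"][name]:
            raise ValueError(f"fixity check failed for {name}")
        files[name] = data
    return DIP(aip.aip_id, files)

sip = SIP("Agency X", {"memo.txt": b"Transfer approved."})
aip = ingest(sip, "aip-0001")
dip = access(aip, ["memo.txt"])
print(dip.files["memo.txt"])
```

The fixity check on access is one simple way to support the claim that a delivered record is the one that was ingested.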
13 ERA Design Strategy
14 ERA Concept model
15 Technology DETOUR: Persistent Archive Infrastructure
- Data object management
- Ability to work with multiple types of storage systems, across separate administration domains
- Richard Fisher 4 Legacy Data Base Archiving
-- data records/objects
-- conceptual Approach for Data Archives where a technology neutral format (XML) is used and where through a reverse process one can restore a collection and query it (data records, metadata, audit trails)
- Information management
- Ability to define a collection independent of database choice
- Ability to migrate collection onto new databases
- Jeff Rothenberg 3 Digital Records Last Forever, or Five Years, Whichever Comes First
-- encapsulated document and metadata
- Gregory Hunter, Charles Dollar 13 Strategies & Best Practices for Managing the Storage & Preservation of E-Records
- PRINCIPLE 8 Encapsulated Electronic Records
- Store raw data, processed data, analysis parameters, correspondence, and metadata as a single physical entity
- Use XML-based software to define the components of the electronic wrapper, including indexing terms for retrieval
- Knowledge management
- Mark Gilbert 2 Content Management, XML & Records Management
-- knowledge map technology for navigation
- July Gable 8 Document Management Update, What's New, What's Hot and What to be Wary About
-- from DM to KM
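The idea of archiving a database collection in a technology-neutral format and later restoring and querying it can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the `memos` table, its columns, and the XML layout are hypothetical, every value is restored as text, and table and column names are trusted rather than sanitized.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_table(conn, table, columns):
    """Write a table out as XML so it no longer depends on the database product."""
    root = ET.Element("collection", {"name": table})
    for row in conn.execute(f"SELECT {', '.join(columns)} FROM {table}"):
        rec = ET.SubElement(root, "record")
        for col, value in zip(columns, row):
            ET.SubElement(rec, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

def restore_table(conn, xml_text):
    """Reverse process: rebuild the collection in a new database from the XML."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    records = root.findall("record")
    columns = [child.tag for child in records[0]]  # assumes at least one record
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    marks = ", ".join("?" for _ in columns)
    for rec in records:
        conn.execute(f"INSERT INTO {table} VALUES ({marks})",
                     [child.text for child in rec])

# Legacy database (hypothetical content).
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE memos (id INTEGER, subject TEXT)")
old.executemany("INSERT INTO memos VALUES (?, ?)",
                [(1, "Budget"), (2, "Transfer schedule")])
archived = export_table(old, "memos", ["id", "subject"])

# Later: restore onto a new database and query it.
new = sqlite3.connect(":memory:")
restore_table(new, archived)
subjects = [row[0] for row in
            new.execute("SELECT subject FROM memos ORDER BY id")]
print(subjects)
```

A fuller version would also carry the schema, metadata, and audit trails in the XML, as the slide suggests.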
16 ERA Synergy & Beyond
- A uniform architecture is emerging across
- persistent archives (NARA)
- digital libraries (NSF)
- NSF -- DLI2, National SMET Education Digital Library
-- NPACI data grid for neuroscience brain image federation
- grid development (DOE, NASA, NLM)
- DOE -- ASCI Data Visualization Corridor remote data processing
-- Particle Physics Data Grid object replication
- NASA -- Information Power Grid distributed data processing
- NLM -- Digital Embryo Project data grid for image processing and storage
17 ERA Research Benefits
- Validation mechanism for the
- common data management architecture
- differentiation between knowledge, information, and data and the choice of representation standards
- Integration vehicle for tying together
- persistent archives with grid environments
- grid environments with digital libraries
- digital libraries with persistent archives
18 Knowledge-based Persistent Archives
Ingest
Manage
Access
(Topic Maps / Model-based Access) -> 9 SLIDES
(Data Handling System - Storage Resource Broker) -> 3 SLIDES
19 Data Handling System (1/3): Storage Resource Broker & Meta-data Catalog
Application
Resource
Third-party copy
User
Remote Proxies
MCAT
Dublin Core
DataCutter
Application Meta-data
20 Collection Based Access (2/3)
- Abstract data set naming and administration away from physical storage resource
- Data sets defined by attributes
- Logical collection used to group data sets across storage systems
- Enables support for replication of data
- Collection owned data
- Authentication controlled by data handling system
- Persistence controlled by data handling system
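The separation of logical names from physical storage described on this slide can be sketched with a small in-memory catalog. This is not the SRB or MCAT interface: the `Catalog` class, its method names, and the resource names below are assumptions made for illustration.

```python
class Catalog:
    """Toy meta-data catalog: logical names, attributes, and replica locations."""

    def __init__(self):
        self._attrs = {}     # logical name -> attribute dict
        self._replicas = {}  # logical name -> list of (resource, physical path)

    def register(self, logical_name, attrs, resource, path):
        """Name a data set logically and record where its first copy lives."""
        self._attrs[logical_name] = dict(attrs)
        self._replicas[logical_name] = [(resource, path)]

    def replicate(self, logical_name, resource, path):
        """Record another copy; the logical name does not change."""
        self._replicas[logical_name].append((resource, path))

    def find(self, **wanted):
        """Select data sets by attributes rather than by storage location."""
        return sorted(name for name, attrs in self._attrs.items()
                      if all(attrs.get(k) == v for k, v in wanted.items()))

    def locate(self, logical_name, available):
        """Resolve a logical name to the first replica on an available resource."""
        for resource, path in self._replicas[logical_name]:
            if resource in available:
                return resource, path
        raise LookupError(f"no available replica of {logical_name}")

catalog = Catalog()
catalog.register("/era/email/0001", {"type": "e-mail", "year": 1999},
                 "unix-disk", "/data/a/0001.xml")
catalog.register("/era/gis/0001", {"type": "GIS", "year": 1999},
                 "unix-disk", "/data/b/0001.xml")
catalog.replicate("/era/email/0001", "tape-archive", "/archive/c17/0001.xml")

print(catalog.find(type="e-mail"))
print(catalog.locate("/era/email/0001", available={"tape-archive"}))
```

Users and applications refer only to the logical name, so a storage system can be replaced by registering new replicas without renaming anything in the collection.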
21 SRB Containers (3/3): Managing Archive Latency
SRB client
- Create container in a logical storage resource containing at least one cacheable resource
- Create objects in containers
- Cache daemon will move filled containers to archive
- synch and purge APIs
SRB Server
UNIX
HPSS
HPSS
container
Distributed Storage Resources
cached containers
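The container idea can be sketched as follows: many small objects are packed into one container, the container is synchronized to the archive and purged from cache, and a later read stages the whole container back in a single archive access. The `ContainerStore` class and its method names are hypothetical and only echo the slide's "synch and purge" wording; they are not the SRB API.

```python
class ContainerStore:
    """Toy cache-plus-archive pair standing in for a disk cache and a tape archive."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum bytes per container
        self.cache = {}            # container name -> bytearray
        self.archive = {}          # container name -> bytes
        self.index = {}            # object name -> (container, offset, length)
        self.archive_reads = 0     # counts slow archive accesses

    def create_container(self, name):
        self.cache[name] = bytearray()

    def create_object(self, container, obj_name, payload):
        """Append an object to a cached container and index its position."""
        buf = self.cache[container]
        if len(buf) + len(payload) > self.capacity:
            raise ValueError("container full")
        self.index[obj_name] = (container, len(buf), len(payload))
        buf.extend(payload)

    def sync(self, container):
        """Copy the cached container to the archive."""
        self.archive[container] = bytes(self.cache[container])

    def purge(self, container):
        """Drop the cached copy, but only if the archive copy is current."""
        if self.archive.get(container) != bytes(self.cache[container]):
            raise RuntimeError("container not synchronized")
        del self.cache[container]

    def read(self, obj_name):
        container, offset, length = self.index[obj_name]
        if container not in self.cache:
            # One archive access stages every object in the container.
            self.cache[container] = bytearray(self.archive[container])
            self.archive_reads += 1
        return bytes(self.cache[container][offset:offset + length])

store = ContainerStore(capacity=1024)
store.create_container("c1")
store.create_object("c1", "memo-a", b"first memo")
store.create_object("c1", "memo-b", b"second memo")
store.sync("c1")
store.purge("c1")
print(store.read("memo-a"), store.read("memo-b"), store.archive_reads)
```

Reading the second object after the first costs no further archive access, which is the latency saving the slide title refers to.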
22 Knowledge-based Persistent Archives
Ingest
Manage
Access
(Topic Maps / Model-based Access) -> 9 SLIDES
(Data Handling System - Storage Resource Broker) -> 3 SLIDES
23 Knowledge-based Access (1/9)
- The relationships between knowledge and information layers define
- Rules that can be applied to the collection
- Rules for defining collection attributes
- Rules for organizing attributes into a schema
- Rules for feature extraction
- Relationships that quantify associations
- Organization of concepts into topic maps
- Ontology mapping between concept maps
- Mapping of concepts to collection attributes
- Etc.
24 Knowledge Standards (2/9) using TOPIC MAPS
ISO/IEC 13250 (Jan. 2000)
Bridging knowledge representation & information management
- STANDARD FOR
- describing knowledge structures
- associating them with information resources
- solution for organizing and navigating large information pools
- XTM SPECIFICATION
25 TOPIC MAPS (3/9)
- Paradigm for knowledge navigation & synthesis
- Concept of creating style sheets for knowledge-based information access and navigation
- TMs define semantically customized views
26 The TAO of Topic Maps (4/9) (from XML Europe 2000 papers) NEXT 4 SLIDES
T is for Topic
Topics
Topic types
Topic names
27 The TAO of Topic Maps (cont.) (5/9)
A is for Association
Topic associations
Association types
28 The TAO of Topic Maps (cont.) (6/9)
O is for Occurrence
Occurrences
Occurrence Roles
29 The TAO of Topic Maps (cont.) (7/9)
-> Independence of topic associations & topic occurrences (information resources)
Topic maps as portable semantic networks
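The three TAO constructs can be sketched as a small data structure. The class and method names are hypothetical and this is not an implementation of ISO/IEC 13250 or XTM; it only shows that associations connect topics to each other while occurrences point from a topic out to information resources, so the two can change independently.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    topic_id: str
    topic_type: str
    names: list
    occurrences: list = field(default_factory=list)  # (role, resource) pairs

@dataclass
class Association:
    assoc_type: str
    members: tuple  # ids of the topics the association connects

class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, topic_id, topic_type, *names):
        self.topics[topic_id] = Topic(topic_id, topic_type, list(names))

    def add_occurrence(self, topic_id, role, resource):
        """Attach an information resource to a topic under an occurrence role."""
        self.topics[topic_id].occurrences.append((role, resource))

    def associate(self, assoc_type, *topic_ids):
        self.associations.append(Association(assoc_type, tuple(topic_ids)))

    def related(self, topic_id, assoc_type):
        """Navigate: topics sharing an association of the given type."""
        found = []
        for assoc in self.associations:
            if assoc.assoc_type == assoc_type and topic_id in assoc.members:
                found.extend(t for t in assoc.members if t != topic_id)
        return found

tm = TopicMap()
tm.add_topic("nara", "agency", "National Archives and Records Administration", "NARA")
tm.add_topic("era", "program", "Electronic Records Archives", "ERA")
tm.associate("runs-program", "nara", "era")
tm.add_occurrence("nara", "web site", "http://www.nara.gov")

print(tm.related("nara", "runs-program"))
print(tm.topics["nara"].occurrences)
```

Because the map holds only references to resources, the same set of topics and associations could be laid over a different collection, which is the sense in which a topic map is a portable semantic network.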
30 Model-Based Archival Collection Management (8/9)
31 Towards a Model-based ERA? (9/9)
- Using XML (XML, DTD, TM, ...)
- Introducing rules (e.g. retention schedule rule)
- Inference rules -> to derive implicit knowledge
- Validation rules -> to express constraints
- Presentation rules -> style sheets / views
- Archiving rules & models
- Migrating collections & models: restoring a collection and querying it!
- END OF DETOUR: Back to the ERA Infrastructure Concept!
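Two of the rule types listed on this slide can be sketched in a few lines. Everything specific here is invented for illustration: the record fields, the series names, and the retention periods in `RETENTION_YEARS` are hypothetical and do not reflect any actual NARA retention schedule.

```python
from datetime import date

# Hypothetical retention schedule: record series -> years to retain after closing.
RETENTION_YEARS = {"correspondence": 7, "case-file": 30}

REQUIRED_FIELDS = ("id", "series", "closed")

def validate(record):
    """Validation rule: return the required fields that the record is missing."""
    return [name for name in REQUIRED_FIELDS if name not in record]

def disposition(record, today):
    """Inference rule: derive the implicit disposition from series and closing date."""
    years = RETENTION_YEARS.get(record["series"])
    if years is None:
        return "permanent"  # assumption: unscheduled series are kept
    closed = record["closed"]
    due = (closed.year + years, closed.month, closed.day)
    reached = (today.year, today.month, today.day) >= due
    return "eligible for disposal" if reached else "retain"

records = [
    {"id": "r1", "series": "correspondence", "closed": date(1990, 5, 1)},
    {"id": "r2", "series": "case-file", "closed": date(1990, 5, 1)},
    {"id": "r3", "series": "treaty", "closed": date(1990, 5, 1)},
]
today = date(2000, 10, 3)
for rec in records:
    assert validate(rec) == []
    print(rec["id"], disposition(rec, today))
```

Keeping the rules as data and small functions, separate from the records, is what would let them be archived and migrated together with the collection.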
33 Getting to ERA
- Build on core technologies of the emerging National Information Infrastructure
- Leverage efforts in the physical sciences, life sciences, spatial data, digital government, and digital library communities
- Develop the Information Management Architecture for digital archives
- Articulate and refine the archival business model
34 Partnerships
- ISO draft Model of Open Archival Information System
- NASA/Consultative Committee on Space Data Systems
- International research on Permanent Authentic Records in Electronic Systems (InterPARES)
- 7 international research teams, 10 national archives
- Intelligent processing of electronic records
- Army Research Laboratory, Georgia Tech Research Institute
- Distributed Object Computation Testbed
- Defense Advanced Research Projects Agency, U.S. Patent and Trademark Office
- National Partnership for Advanced Computational Infrastructure
- National Science Foundation
- Archivist's Workbench
- NHPRC Grant to San Diego Supercomputer Center
35 How do these activities fit together?
- OAIS Model: High level framework for entities, functions, data flows
- InterPARES: Archival requirements, electronic records typology, preservation model, best practices
- Intelligent processing: Tool sets for archival processes
- DOCT: Persistent Object Preservation
- NPACI: Core technologies for ERA
- Archivist's Workbench: Scale ERA for smaller archives
36 What have we accomplished?
- Research prototype
- migratable information architecture
- scalable archive
- Demonstrated application
- Process from ingest through access
- Multiple types of collections
- Databases, e-mail, GIS, digital images, office automation files.
- Experiments
- Application of knowledge-based, natural language processing, and other technologies to archival processing of records
37 Additional Information
- http://www.nara.gov
- http://www.sdsc.edu/NARA
- http://www.ces.btc.gatech.edu/research.htm
- Digital Strategies 2000
- National Archives at College Park, MD
- Nov 16-17, 2000
- http://www.nara.gov/program