Building the Archives of the Future: Self-Describing Records

1
Building the Archives of the Future
Self-Describing Records
  • Kenneth Thibodeau
  • Director, Electronic Records Archives Program
  • National Archives and Records Administration
  • July 18, 2001

2
The Electronic Records Archives Vision
  • Overcome technological obsolescence in a way that
    preserves demonstrably authentic records.
  • Build a dynamic solution that incorporates the
    expectation of continuing change in information
    technology and in the records it produces.
  • Find ways to take advantage of continuing
    progress in information technology in order to
    maintain and improve both performance and
    customer service.

3
Critical Challenge
  • Proven methods for preserving digital information
    across generations of technology are limited to
    the simplest formats.
  • Available methods are increasingly inadequate.
  • The market has not delivered solutions.

4
How will we develop the Electronic Records
Archives?
5
ERA Infrastructure Concept
[Diagram: a scalable infrastructure (Gb/sec Internet, grid security, distributed processing, mediation among systems, distributed and redundant storage, infrastructure independence) linking records creators, government users, NARA users, and public users through workbenches, a trusted repository, and a digital library]
6
ERA Infrastructure
  • In NARA (using NARANET)
      • Archival workstations for staff
      • Reference workstations for researchers
  • On the National Information Infrastructure, under NARA's control
      • ERA Ingest and Distribution portal (Internet and media)
      • POP repositories (Normal, Trusted, Special)
      • Affiliated Archives
  • On the National Information Infrastructure
      • Agency systems with access to NARA portal
      • Digital Libraries
      • Public Users

7
NARA Partnerships
  • Open Archival Information System (OAIS) Reference Model
      • NASA, Consultative Committee for Space Data Systems
  • Distributed Object Computation Testbed (DOCT)
      • Defense Advanced Research Projects Agency, U.S. Patent and Trademark Office
  • National Partnership for Advanced Computational Infrastructure (NPACI)
      • National Science Foundation
  • Presidential Electronic Records Processing Operational System (PERPOS)
      • Army Research Laboratory, Georgia Tech Research Institute
  • Archivists' Workbench
      • NHPRC Grant to San Diego Supercomputer Center
  • International research on Permanent Authentic Records in Electronic Systems (InterPARES)
      • 7 international, multidisciplinary research teams, 10 national archives

8
ERA Functional Model: An Open Archival Information System Implementation
[Diagram: Producer → Submission Information Packages → OAIS (Archival Information Packages); the Consumer sends queries and orders and receives result sets; OAIS → Dissemination Information Packages → Consumer]
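The package flow in this OAIS functional model (a Producer submits SIPs, the archive keeps AIPs, a Consumer queries, receives result sets, orders, and gets DIPs) can be illustrated with a toy model. This is a minimal sketch, not OAIS or ERA code: the class names, the metadata-as-dictionary representation, and the `aip-N` identifier scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionInformationPackage:
    """What a Producer sends to the archive (hypothetical minimal fields)."""
    producer: str
    content: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class ArchivalInformationPackage:
    """What the archive keeps: the content plus preservation metadata."""
    aip_id: str
    content: bytes
    metadata: dict


@dataclass
class DisseminationInformationPackage:
    """What a Consumer receives when an order is filled."""
    aip_id: str
    content: bytes
    metadata: dict


class Archive:
    """A toy OAIS: ingest turns SIPs into AIPs; access returns result sets and DIPs."""

    def __init__(self):
        self._store = {}  # aip_id -> ArchivalInformationPackage

    def ingest(self, sip):
        """Convert a SIP into an AIP, recording the Producer in its metadata."""
        aip_id = f"aip-{len(self._store) + 1}"
        metadata = dict(sip.metadata, producer=sip.producer)
        self._store[aip_id] = ArchivalInformationPackage(aip_id, sip.content, metadata)
        return aip_id

    def query(self, **criteria):
        """Answer a query with a result set: IDs of AIPs matching every criterion."""
        return [
            aip.aip_id
            for aip in self._store.values()
            if all(aip.metadata.get(key) == value for key, value in criteria.items())
        ]

    def order(self, aip_id):
        """Fill an order by packaging a copy of the AIP as a DIP."""
        aip = self._store[aip_id]
        return DisseminationInformationPackage(aip.aip_id, aip.content, dict(aip.metadata))


archive = Archive()
archive.ingest(SubmissionInformationPackage("Agency X", b"record bytes", {"series": "E-mail"}))
results = archive.query(series="E-mail")
dip = archive.order(results[0])
print(results, dip.metadata["producer"])  # ['aip-1'] Agency X
```

Keeping the AIP and DIP as separate types mirrors the model's point that what is stored and what is delivered need not be the same package, even though this sketch simply copies one into the other.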
9
InterPARES Preserve Electronic Records Model
10
Information Management Architecture for Persistent Object Preservation
[Diagram: three layers, each spanning Ingest Services, Management, and Access Services]
  • Knowledge: relationships between concepts; XTM DTD; rules (KQL); knowledge repository for rules; knowledge- or topic-based query/browse (Topic Maps / Buckets / Model-based Access)
  • Information: attributes and semantics; XML DTD; SDLIP; information repository; attribute-based query (Data Handling System: SRB / FTP / HTTP)
  • Data: fields, containers, folders; MCAT/HDF; storage (replicas, persistent IDs) on grids; feature-based query
11
ERA Concept model
12
ERA Processes
ACCESSIONING
13
Accept an Accession?
[Diagram: Accessioning Workbench comparing Transferred Records, Expected Records, and Transfer Documentation]
  • What should the agency have transferred?
  • What did the agency say it transferred?
  • What was transferred?
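The three questions amount to comparing three inventories of the same accession. A minimal sketch, assuming each inventory can be reduced to a set of record identifiers (a simplification: a real comparison would also cover counts, formats, and date ranges) and using a hypothetical, deliberately strict acceptance rule:

```python
def check_accession(expected, documented, transferred):
    """Compare three inventories of one accession, each a set of record identifiers.

    expected:    what the agency should have transferred
    documented:  what the agency's transfer documentation says it transferred
    transferred: what actually arrived
    """
    return {
        "expected_not_documented": expected - documented,
        "expected_not_received": expected - transferred,
        "documented_not_received": documented - transferred,
        "received_not_documented": transferred - documented,
        "received_not_expected": transferred - expected,
    }


def accept(report):
    """A deliberately strict, hypothetical rule: accept only if no discrepancy was found."""
    return all(not items for items in report.values())


report = check_accession(expected={"R1", "R2", "R3"},
                         documented={"R1", "R2"},
                         transferred={"R1", "R2", "R4"})
print(accept(report), sorted(report["expected_not_received"]))  # False ['R3']
```

In practice an archivist would review the discrepancy report rather than reject automatically; the strict rule only shows where the decision plugs in.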

14
SELF-DESCRIBING
  • Records
  • Files
  • Series
  • Record Systems

15
Persistent Object Method
  • Characterize significant properties of the things
    that are to be preserved.
  • Express these properties in formal models
  • Encapsulate objects in metadata defined in the
    models.
  • Use software mediators to enable future
    technologies to interpret the models and metadata
      • to rebuild and repopulate collections
      • to re-present the records
      • to support information discovery and delivery.
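The four steps of the method can be illustrated end to end on a toy record. This is a sketch under stated assumptions, not ERA's implementation: the memo model, the convention of carrying the model in a `model` attribute, and the functions `encapsulate`, `mediate`, and `present_as_text` are hypothetical, and ERA's formal models were DTDs rather than Python tuples.

```python
import xml.etree.ElementTree as ET

# A formal model: the significant properties of a hypothetical memo record,
# in the order they are to be preserved. ERA's own models were DTDs.
MEMO_MODEL = ("Date", "From", "To", "Subject", "Body")


def encapsulate(record, model, root_tag="Memo"):
    """Wrap each significant property in a tag named by the model.

    The model itself travels with the object (in the 'model' attribute),
    so the object describes its own structure.
    """
    root = ET.Element(root_tag, model=" ".join(model))
    for name in model:
        ET.SubElement(root, name).text = record[name]
    return ET.tostring(root, encoding="unicode")


def mediate(xml_text):
    """A mediator: rebuild the record using only what the object says about itself."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in root.get("model").split()}


def present_as_text(record):
    """Re-present a rebuilt record as plain text with a header block."""
    header = "\n".join(f"{key}: {value}" for key, value in record.items() if key != "Body")
    return f"{header}\n\n{record['Body']}"


memo = {"Date": "2001-07-18", "From": "A. Sender", "To": "Readers",
        "Subject": "ERA", "Body": "Hello."}
print(present_as_text(mediate(encapsulate(memo, MEMO_MODEL))))
```

The mediator never consults `MEMO_MODEL` directly; it reads the model from the object, which is what lets a future tool interpret the record without the software that created it.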

16
E-mail Groupwise view
17
E-mail text editor view
18
E-mail MIME-aware view
19
Tagged MIME E-mail Message
<Message-Id>p05010429b6e7e938e0b4@10.2.68.205</Message-Id>
<X-Organization>USC/Information Sciences Institute</X-Organization>
<X-Phone>(310) 822-1511 ext. 766</X-Phone>
<X-Fax>(310) 822-0751</X-Fax>
<Date>Wed, 28 Mar 2001 22:22:30 -0500</Date>
<To>Readers</To>
<From>Yigal Arens arens@ISI.EDU</From>
<Subject>Announcing DG Online, the magazine of digital government research</Subject>
<Content-Type multipart/alternative boundary="_-1226283781_ma" --_-1226283781_ma>
<Content-Type text/plain charset="us-ascii" format="flowed">
<Message_Body>DG Online http://www.dgrc.org/dg-online/. .
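A tagged form like the one above can be produced mechanically from the raw message. A minimal sketch using Python's standard `email` and `xml.etree.ElementTree` modules; the tag names follow the slide, while skipping absent headers, keeping only the preferred plain-text part as `Message_Body`, and ignoring attachments are simplifying assumptions. The sample message is invented for illustration.

```python
import email
import xml.etree.ElementTree as ET
from email import policy

# Header fields to carry over, in the order shown on the slide.
HEADER_FIELDS = ("Message-Id", "X-Organization", "X-Phone", "X-Fax",
                 "Date", "To", "From", "Subject")


def tag_message(raw):
    """Parse an RFC 822/MIME message and re-express it as tagged XML.

    Simplifications (assumptions, not ERA rules): absent headers are skipped,
    only the preferred plain-text part becomes Message_Body, and
    attachments are ignored.
    """
    msg = email.message_from_string(raw, policy=policy.default)
    root = ET.Element("Email_Message")
    for name in HEADER_FIELDS:
        value = msg.get(name)
        if value is not None:
            ET.SubElement(root, name).text = str(value)
    body = msg.get_body(preferencelist=("plain",))
    ET.SubElement(root, "Message_Body").text = body.get_content() if body is not None else ""
    return ET.tostring(root, encoding="unicode")


# A hypothetical sample message.
RAW = """\
Message-Id: <example-1@example.org>
X-Organization: Example Institute
Date: Wed, 28 Mar 2001 22:22:30 -0500
To: Readers
From: Example Sender <sender@example.org>
Subject: Announcing a newsletter
Content-Type: text/plain; charset="us-ascii"

Hello, readers.
"""

print(tag_message(RAW))
```

Because `ElementTree` escapes markup characters, angle brackets in header values such as the Message-Id come out as `&lt;` and `&gt;` and the result stays well-formed XML.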
20
Structure of E-mail Message, a.k.a. Document Type Definition
  • <!ELEMENT Email_Message (Header, Message_Body, Attachment)>
  • <!ELEMENT Header (Internal_Header, External_Header)>
  • <!ELEMENT Internal_Header (Message-Id, X-Organization, X-Phone, X-Fax)>
  • <!ELEMENT Message-Id (#PCDATA)>
  • <!ELEMENT X-Organization (#PCDATA)>
  • <!ELEMENT X-Phone (#PCDATA)>
  • <!ELEMENT X-Fax (#PCDATA)>
  • <!ELEMENT External_Header (Date, To, From, Subject)>
  • <!ELEMENT Date (Weekday, Day_of_Month, Month, Time)>
  • <!ELEMENT To (#PCDATA)>
  • <!ELEMENT From (#PCDATA)>
  • <!ELEMENT Subject (#PCDATA)>
  • <!ELEMENT Message_Body (#PCDATA)>
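A content model like this one can be checked mechanically. A minimal sketch, not a DTD validator: the structure is reduced to a dictionary mapping each element to the exact sequence of children it requires. Names that contain spaces on the slide (Internal Header, Message Body) are written with underscores, since XML names cannot contain spaces; elements the slide leaves undeclared (Attachment and the parts of Date) are treated as text-only leaves; and DTD operators such as `?`, `*`, and `|` are not modelled. All of these are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The e-mail structure as a dictionary: each element maps to the exact
# sequence of child elements it must contain. Elements not listed are
# treated as (#PCDATA) leaves. Only plain sequences are modelled.
EMAIL_MODEL = {
    "Email_Message": ("Header", "Message_Body", "Attachment"),
    "Header": ("Internal_Header", "External_Header"),
    "Internal_Header": ("Message-Id", "X-Organization", "X-Phone", "X-Fax"),
    "External_Header": ("Date", "To", "From", "Subject"),
    "Date": ("Weekday", "Day_of_Month", "Month", "Time"),
}


def check(element, model):
    """Return a list of problems; an empty list means the element fits the model."""
    expected = model.get(element.tag)  # None means a leaf holding only text
    actual = tuple(child.tag for child in element)
    if expected is None:
        return [f"{element.tag}: leaf should not contain elements"] if actual else []
    problems = []
    if actual != expected:
        problems.append(f"{element.tag}: expected {expected}, found {actual}")
    for child in element:
        problems.extend(check(child, model))
    return problems


incomplete = ET.fromstring(
    "<Email_Message><Header/><Message_Body>hi</Message_Body></Email_Message>")
for problem in check(incomplete, EMAIL_MODEL):
    print(problem)
```

The incomplete message is reported twice: once because the Attachment is missing from Email_Message, and once because Header is empty.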

21
File of Electronic Records
Types of Records
REPORT OF INDEPENDENT ACCOUNTANTS To the
Board of Directors and Stockholders of Great
Plains Software, Inc. In our opinion, the
consolidated financial statements listed in the
accompanying index present fairly, in all
material respects, the financial position of
Great Plains Software, Inc. and its subsidiaries
at May 31, 1999 and 1998, and the results of
their operations and their cash flows for each of
the three years in the period ended May 31,
1999, in conformity with generally accepted
accounting principles. In addition, in our
opinion, the financial statement schedules
listed in the accompanying index present fairly,
in all material respects, the information set
forth therein when read in conjunction with the
related consolidated financial statements. These
financial statements and financial statement
schedules are the responsibility of the
Company's management; our responsibility is to
express an opinion on these financial statements
and financial statement schedules based on our
audits. We conducted our audits of these
statements in accordance with generally accepted
auditing standards, which require that we plan
and perform the audit to obtain reasonable
assurance about whether the financial statements
are free of material misstatement. An audit
includes examining, on a test basis, evidence
supporting the amounts and disclosures in the
financial statements, assessing the accounting
principles used and significant estimates made by
management, and evaluating the overall financial
statement presentation. We believe that our
audits provide a reasonable basis for the opinion
expressed above.
/s/ PricewaterhouseCoopers LLP
PricewaterhouseCoopers LLP
Minneapolis, Minnesota
June 25, 1999
22
eXtensible Business Reporting Language (XBRL) Example
23
XBRL DTD
24
Structure expressed as Tree
[Diagram: the e-mail structure drawn as a tree, from Electronic Mail Message at the root down to Internal Header and External Header]

25
Accept an Accession?
[Diagram: Accessioning Workbench comparing Transferred Records, Expected Records, and Transfer Documentation]
26
How does ERA determine the dates of records?
  • E-mail
      • Every e-mail message contains a field indicating the date it was sent. For the sender, that is the date of the record. ERA needs to search the date-sent fields.
      • (Technology solution)
  • Attachments to e-mail messages
      • Attachments to a record are parts of that record. The date of the message is the date of the record.
      • (Archival principle)
  • Records forwarded, via e-mail, for filing in a recordkeeping system
      • E-mail is used only to transmit a record to the system. The date of the attached record depends on the record itself.
      • (Archival principle)
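The three dating rules can be expressed as one decision function. A minimal sketch: the case labels and the `record_date` function are hypothetical, the date-sent field is assumed to be in RFC 2822 form, and how the "record's own date" is extracted from a forwarded record is left to the caller, since the slide says only that it depends on the record.

```python
from email.utils import parsedate_to_datetime

# Hypothetical labels for the three cases on this slide.
STANDALONE_EMAIL = "email"
ATTACHMENT = "attachment"
FORWARDED_FOR_FILING = "forwarded_for_filing"


def record_date(kind, date_sent_field, own_date=None):
    """Return the date of a record under the three rules on this slide.

    kind:            STANDALONE_EMAIL, ATTACHMENT or FORWARDED_FOR_FILING
    date_sent_field: the message's Date field, in RFC 2822 form
    own_date:        the date carried by the record itself, if known
    """
    if kind in (STANDALONE_EMAIL, ATTACHMENT):
        # The message and its attachments share the date the message was sent.
        return parsedate_to_datetime(date_sent_field)
    if kind == FORWARDED_FOR_FILING:
        # E-mail was only the transport; the record's own date governs.
        if own_date is None:
            raise ValueError("a record forwarded for filing needs its own date")
        return own_date
    raise ValueError(f"unknown kind of record: {kind!r}")


sent = record_date(STANDALONE_EMAIL, "Wed, 28 Mar 2001 22:22:30 -0500")
print(sent.isoformat())  # 2001-03-28T22:22:30-05:00
```

The first two cases are resolved by technology alone; the third needs an archival judgment about which kind of record is in hand, which is why the function takes `kind` as an input rather than guessing it.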

27
Defining models for electronic records, e.g., E-mail
  • All E-mail
      • Groupwise mail
      • cc:Mail
      • USENET mail
      • User-defined fields
      • ...
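One way to read this hierarchy is as a general model with specializations that add fields. A minimal sketch, assuming simple field-list inheritance: the `RecordModel` class and the field lists are illustrative, not NARA's models. `Newsgroups` is a standard USENET article header (RFC 1036), while `Case_Number` is a hypothetical user-defined field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordModel:
    """A model of a record type; a specialized model inherits its parent's fields."""
    name: str
    fields: tuple
    parent: Optional["RecordModel"] = None

    def all_fields(self):
        """Fields required by this model, parents' fields first."""
        inherited = self.parent.all_fields() if self.parent else ()
        return inherited + tuple(f for f in self.fields if f not in inherited)


def conforms(record, model):
    """True if the record supplies every field its model requires."""
    return set(model.all_fields()) <= set(record)


# A general model for all e-mail, plus specializations. The field lists are
# illustrative assumptions, not NARA's actual models.
ALL_EMAIL = RecordModel("All E-mail", ("Message-Id", "Date", "From", "Subject", "Message_Body"))
USENET_MAIL = RecordModel("USENET mail", ("Newsgroups",), parent=ALL_EMAIL)
AGENCY_MAIL = RecordModel("Agency mail with user-defined fields",
                          ("To", "Case_Number"), parent=ALL_EMAIL)

print(USENET_MAIL.all_fields())
```

A record that conforms to a specialized model also conforms to the general one, so ERA-style tools written against "All E-mail" would still work on records tagged with a more specific model.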

28
Transformation
[Diagram: Accessioning Workbench; Transferred Records and Metadata → Persistent Records; Exceptions: Archival Exceptions, Mistakes]
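The transformation step can be sketched as a partition: records that satisfy their model become persistent records wrapped in metadata, and the rest become exceptions. A minimal sketch under stated assumptions: conformance is reduced to "all required fields present", and the line between an archival exception and a mistake is delegated to a hypothetical `classify_exception` callback, because the slide does not say how that call is made.

```python
def transform(transferred, required_fields, classify_exception):
    """Split transferred records into persistent records and two kinds of exceptions.

    classify_exception(record, missing) is supplied by the archivist and returns
    "archival" or "mistake"; this sketch does not attempt to encode that judgment.
    """
    persistent, archival_exceptions, mistakes = [], [], []
    for record in transferred:
        missing = [name for name in required_fields if name not in record]
        if not missing:
            # Wrap the conforming record with metadata naming the model it satisfies.
            persistent.append({"metadata": {"model": list(required_fields)}, "record": record})
        elif classify_exception(record, missing) == "archival":
            archival_exceptions.append({"record": record, "missing": missing})
        else:
            mistakes.append({"record": record, "missing": missing})
    return persistent, archival_exceptions, mistakes


records = [{"Date": "2001-03-28", "From": "A"}, {"Date": "2001-03-29"}, {"From": "B"}]
persistent, archival, mistakes = transform(
    records, ("Date", "From"),
    lambda record, missing: "archival" if missing == ["From"] else "mistake",
)
print(len(persistent), len(archival), len(mistakes))  # 1 1 1
```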
29
Aggregation
[Diagram: Accessioning Workbench; Persistent Records → Container, with Container Metadata; Exceptions]
30
Aggregation risk management
[Diagram: Accessioning Workbench; Persistent Records → Container, with Container Metadata; Exceptions]
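Aggregation and its risk management can be illustrated by packing records into a container whose metadata includes a per-record checksum manifest, so later damage or loss is detectable. A minimal sketch: JSON serialization, SHA-256 digests, the `record-NNNNN` member names, and the in-memory container layout are all assumptions, not ERA's container format.

```python
import hashlib
import json


def aggregate(records, container_id):
    """Pack serialized records into a container with a checksum manifest."""
    members = []
    for i, rec in enumerate(records):
        data = json.dumps(rec, sort_keys=True).encode("utf-8")
        members.append({"name": f"record-{i:05d}", "data": data})
    metadata = {
        "container_id": container_id,
        "record_count": len(members),
        "manifest": {m["name"]: hashlib.sha256(m["data"]).hexdigest() for m in members},
    }
    return {"metadata": metadata, "members": members}


def verify(container):
    """Return the names of members whose content no longer matches the manifest."""
    manifest = container["metadata"]["manifest"]
    bad = [m["name"] for m in container["members"]
           if hashlib.sha256(m["data"]).hexdigest() != manifest.get(m["name"])]
    if len(container["members"]) != container["metadata"]["record_count"]:
        bad.append("<record count mismatch>")
    return bad


container = aggregate([{"Subject": "a"}, {"Subject": "b"}], "container-0001")
print(verify(container))  # []
```

Keeping the manifest in the container metadata, rather than in a separate database, keeps the container self-describing: it can be checked wherever it is copied.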
31
Risk Management: Multi-Valent Documents
32
ERA Reference Process

[Diagram: a User's Query against the Archival Repository returns a Collection, which is rebuilt using Collection Metadata and presented to the User]
33
Process: Check metadata for the series to identify relevant DTDs
[Diagram: Reference Workbench; Repository; Series Metadata, Files, Records; Metadata]
34
Translate E-mail DTD to Relational Database
Structure
Reference Workbench
35
Process: Retrieve the records and place them in the target structure
[Diagram: Reference Workbench; Repository; Metadata]
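The last two reference steps (translate the e-mail structure into a relational structure, then retrieve the records and place them in it) can be sketched with Python's built-in `sqlite3`. Assumptions: the target is a single hypothetical `email` table with one text column per leaf element, records arrive as tagged XML like the example on slide 19, and only leaf text is kept. Column names are double-quoted because `Message-Id` contains a hyphen and `From` and `To` are SQL keywords.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Leaf elements of the e-mail structure that become columns (an assumed mapping).
COLUMNS = ("Message-Id", "Date", "To", "From", "Subject", "Message_Body")


def create_table(conn):
    """Translate the e-mail structure into a relational table definition."""
    cols = ", ".join(f'"{c}" TEXT' for c in COLUMNS)
    conn.execute(f"CREATE TABLE email ({cols})")


def load(conn, tagged_messages):
    """Retrieve tagged records and place them in the relational target structure."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    rows = []
    for xml_text in tagged_messages:
        root = ET.fromstring(xml_text)
        # './/name' finds the element at any depth, so header nesting does not matter;
        # a missing element yields None, which SQLite stores as NULL.
        rows.append(tuple(root.findtext(f".//{c}") for c in COLUMNS))
    conn.executemany(f"INSERT INTO email VALUES ({placeholders})", rows)


# A hypothetical tagged record.
SAMPLE = ("<Email_Message><Message-Id>example-1@example.org</Message-Id>"
          "<Date>Wed, 28 Mar 2001 22:22:30 -0500</Date><To>Readers</To>"
          "<From>Example Sender</From><Subject>Announcing a newsletter</Subject>"
          "<Message_Body>Hello, readers.</Message_Body></Email_Message>")

conn = sqlite3.connect(":memory:")
create_table(conn)
load(conn, [SAMPLE])
print(conn.execute('SELECT "Date", "Subject" FROM email').fetchall())
```

Once the records sit in a table, ordinary SQL queries (by sender, subject, or date) become available without changing what is preserved in the repository.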
36
Persistent Object Preservation
  • Aims at independence of technological
    infrastructure
  • Reduces threats to integrity and authenticity by
    minimizing changes over time.
  • Embeds changes in a comprehensive information
    management architecture designed for preservation
  • Inherently extensible
  • Facilitates use of future, advanced technologies,
    without requiring change in what is preserved.
  • Currently beyond the state of the art of information
    technology.

37
Self-describing Objects for Records Management
  • Facilitate management, exchange, and disposition of records
      • explicitly identify the content of records, files, series, ...
      • express how content is organized
      • allow the content to be stored once and used in different documents
      • separate, but link, management of content and presentation
      • capture the relationships among documents and collections of documents
      • and support multiple views of a collection of documents
      • all in plain language

38
  • Thank you.

For more information: www.nara.gov/era