Title: DCC Representation Information Registry
1DCC Representation Information Registry
Digital Curation Centre
a centre of expertise in data curation and
preservation
- D Giaretta
- http://www.dcc.ac.uk
- http://dev.dcc.ac.uk
Funders
2Outline
- OAIS key points
- Representation Information
- Registry
- Conclusions
3Fundamentals
- The OAIS Reference Model is a key (the only?)
standard for the long-term preservation of
information
- Digital Preservation/Curation covers many issues,
including financial, scientific, technical, legal
and sociological ones
- OAIS does not cover all these issues and so
addresses only part of the solution, but a vital
part
4OAIS Reminder
- OAIS is a standard about the long-term
preservation of information
- Information
- not just bits
- Must be usable at least by the Designated
Community
- How can this be done in a manageable way?
5OAIS Responsibilities
- The OAIS must
- Negotiate for and accept appropriate information
from information Producers.
- Obtain sufficient control of the information
provided to the level needed to ensure Long-Term
Preservation.
- Determine, either by itself or in conjunction
with other parties, which communities should
become the Designated Community and, therefore,
should be able to understand the information
provided.
- Ensure that the information to be preserved is
Independently Understandable to the Designated
Community. In other words, the community should
be able to understand the information without
needing the assistance of the experts who
produced the information.
- Follow documented policies and procedures which
ensure that the information is preserved against
all reasonable contingencies, and which enable
the information to be disseminated as
authenticated copies of the original, or as
traceable to the original.
- Make the preserved information available to the
Designated Community.
6OAIS Reference Model Functional Model
7OAIS Information Definition
- Information is always expressed (i.e.
represented) by some type of data
- Data interpreted using its Representation
Information yields Information
- Information Object preservation requires clear
identification and understanding of the Data
Object and its associated Representation
Information
[Diagram: Data Object, interpreted using its Representation Information, yields Information Object]
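The relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the `RepresentationInfo` record, its fields, and the station/temperature example are hypothetical and are not defined by OAIS.

```python
import struct
from dataclasses import dataclass


@dataclass
class RepresentationInfo:
    # Hypothetical, illustrative fields; OAIS does not prescribe this layout.
    struct_format: str   # structure: how the bits are laid out
    field_names: tuple   # semantics: what each decoded value means


def interpret(data_object: bytes, rep_info: RepresentationInfo) -> dict:
    """Data Object interpreted using its RepInfo yields an Information Object."""
    values = struct.unpack(rep_info.struct_format, data_object)
    return dict(zip(rep_info.field_names, values))


# Six opaque bytes: a big-endian 16-bit integer followed by a 32-bit float.
data_object = struct.pack(">hf", 42, 21.5)
rep_info = RepresentationInfo(">hf", ("station_id", "temperature_celsius"))
information = interpret(data_object, rep_info)
```

Without `rep_info` the six bytes are just bits; with a different `struct_format` the same bytes would yield different (and wrong) values, which is why the Data Object and its Representation Information have to be preserved together.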
8Information Objects
9Representation Information
- The Data Object is interpreted using the
Representation Information (RepInfo)
- The Reference Model is designed to ensure that an
OAIS is not set the impossible task of having to
provide all possible RepInfo immediately
- Hence take account of the Designated Community and
its associated Knowledge Base
- Note that RepInfo may itself need further RepInfo
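One way to picture how the Knowledge Base bounds the task is to walk the chain of RepInfo until every branch reaches something the Designated Community already understands. The sketch below assumes a hypothetical dependency table (`needs`) and hypothetical community knowledge bases; the entries are illustrative only.

```python
def required_rep_info(data_object, needs, knowledge_base):
    """Collect the RepInfo that must be supplied for `data_object`, following
    dependencies until each branch reaches the community's Knowledge Base."""
    required = []
    pending = list(needs.get(data_object, []))
    while pending:
        item = pending.pop()
        if item in knowledge_base or item in required:
            continue  # already understood, or already collected
        required.append(item)
        # RepInfo may itself need further RepInfo.
        pending.extend(needs.get(item, []))
    return required


# Hypothetical dependency table: each entry lists what is needed to understand it.
needs = {
    "FITS file": ["FITS standard"],
    "FITS standard": ["PDF specification", "ASCII definition"],
    "PDF specification": [],
    "ASCII definition": [],
}

astronomers = {"FITS standard"}   # assumed Knowledge Base of one community
general_public = set()            # a community assumed to know none of this

for_astronomers = required_rep_info("FITS file", needs, astronomers)
for_public = required_rep_info("FITS file", needs, general_public)
```

For the astronomers nothing extra has to be supplied, while the second community needs the whole chain; the same Data Object therefore implies different archive obligations depending on who the Designated Community is.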
10Designated Community
- General English reading public educated to High
School and above, with access to a Web Browser
(HTML 4.0 capable)
- GIS data: GIS researchers (undergraduates and
above) having an understanding of the concepts of
Geographic data and access to current (2005,
USA) GIS tools/computer software, e.g. ArcInfo
(2005)
- Astronomer (undergraduate and above) with access
to FITS software such as FITSIO, familiar with
astronomical spectrographic instruments
- Student of Middle English with an understanding
of TEI encoding and access to an XML rendering
environment
- Variant 1: Cannot understand TEI
- Variant 2: Cannot understand TEI and no access to
XML rendering environment
- Variant 3: No understanding of Middle English but
does understand TEI and XML
11Representation Information
- The information that maps a Data Object into more
meaningful concepts. An example is the ASCII
definition that describes how a sequence of bits
(i.e., a Data Object) is mapped into a symbol.
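As a small illustration of the ASCII example, the sketch below maps a sequence of bits to symbols, eight bits per symbol. The function name and the sample bit string are made up for this illustration.

```python
def bits_to_text(bits: str) -> str:
    """Map a string of '0'/'1' characters to ASCII symbols, 8 bits per symbol."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a whole number of 8-bit groups")
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # The ASCII definition is the Representation Information applied here.
    return bytes(codes).decode("ascii")


# A Data Object: 32 bits with no meaning until the ASCII mapping is applied.
data_object = "01001111" "01000001" "01001001" "01010011"
text = bits_to_text(data_object)
```

Here `text` is the four-character string `OAIS`; the same bits read under a different character encoding or as a binary integer would mean something else entirely.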
12Representation Information
- The Representation Information accompanying a
physical object, like a moon rock, may give
additional meaning
- It typically is a result of some analysis of the
physically observable attributes of the rock
- The Representation Information accompanying a
digital object, or sequence of bits, is used to
provide additional meaning.
- It typically maps the bits into commonly
recognized data types such as character, integer,
and real and into groups of these data types.
- It associates these with higher level meanings
which can have complex inter-relationships that
are also described
13RepInfo Classification
14Structure including Formats
- Distinguish
- formats which are used mainly for rendering, to
be followed by human inspection, and
- formats used for automated processing,
particularly important for science data
- Distinguish
- Things with unknown structure, which need software
- proprietary software e.g. MS Word
- Open Source software e.g. CDF
- Things with known/well described structure
- ASCII file, FITS file, telemetry etc.
- Document the format
- Use a description language if possible, e.g. EAST,
DFDL
- The EAST tools are themselves Representation
Information which in due course will have to be
fully defined; the closure of their
Representation Nets will be the EAST standard
- Higher level definitions should include useful
scientific objects and humanities objects
15Layered Model from OAIS
16Semantics
- Meaning/ Relationships
- Data Dictionaries
- Thesauri
- Ontologies
- Semantic interoperability
17Time Dependent Information
- Many, perhaps most, datasets change over time and
the state at each particular moment in time may
be important. It may be useful to break the issue
into separate parts.
- At each moment in time we could, in principle,
take a snapshot and store it. That snapshot has
its associated Representation Net.
- Efficient storage of a series of snapshots may
lead one to store differences or include time
tags in the data
- Additional Representation Information would be
needed which describes how to get to a particular
time's snapshot from the efficiently encoded
version.
- Also applies to ANNOTATION: who said what about
which, and when did they say it
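The "store differences" option can be sketched as follows. This is an assumed, simplified encoding (a base snapshot plus time-tagged changes and deletions, with time tags in ascending order), not a scheme described in the slides; the point is that the rule in `snapshot_at` is exactly the kind of additional Representation Information the slide says must be preserved.

```python
def make_deltas(snapshots):
    """Encode (time_tag, dict) snapshots as a base plus per-step differences."""
    base = dict(snapshots[0][1])
    previous, deltas = snapshots[0][1], []
    for time_tag, current in snapshots[1:]:
        changed = {k: v for k, v in current.items()
                   if k not in previous or previous[k] != v}
        removed = [k for k in previous if k not in current]
        deltas.append((time_tag, changed, removed))
        previous = current
    return base, deltas


def snapshot_at(base, deltas, time_tag):
    """Rebuild the state as of `time_tag` by replaying differences in order.
    A time tag earlier than the first delta simply yields the base snapshot."""
    state = dict(base)
    for tag, changed, removed in deltas:
        if tag > time_tag:
            break
        state.update(changed)
        for key in removed:
            state.pop(key, None)
    return state


# Hypothetical dataset states tagged by year.
snapshots = [
    (2003, {"a": 1, "b": 2}),
    (2004, {"a": 1, "b": 3, "c": 4}),
    (2005, {"b": 3, "c": 4}),
]
base, deltas = make_deltas(snapshots)
state_2004 = snapshot_at(base, deltas, 2004)
```

Anyone holding only `base` and `deltas` cannot recover the 2004 state without also knowing the replay rule, so that rule belongs in the Representation Net alongside the format description.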
18Actions and Processes (Behaviour)
- Some information has, as an integral part of its
content, an implicit or explicit process
associated with it
- An example of this is a database or other time
dependent or reactive system such as a Neural
Net.
- Emulations
- Limited but may be adequate for rendered
document-type data
19Is saying it's XML enough?
<?xml version='1.0'?>
<VOTABLE version="1.1"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1"
 xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<!--
 ! VOTable written by uk.ac.starlink.votable.VOTableWriter
 !-->
<RESOURCE>
<TABLE name="6dfgs_E7_subset" nrows="875">
<PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz">
<DESCRIPTION>URL of data file used to create this table.</DESCRIPTION>
</PARAM>
<PARAM arraysize="*" datatype="char" name="Credits" value="Column explanations provided by Mike Read (ROE) from 6dfGS project."/>
<PARAM arraysize="*" datatype="char" name="Conversion" value="Converted from 6dfgs_E7.fld.gz by Mark Taylor (Starlink) using STIL."/>
<PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/>
<FIELD arraysize="15" datatype="char" name="TARGET">
<DESCRIPTION>Target name</DESCRIPTION>
Or here
NO!
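To make the "NO!" concrete, the sketch below parses a cut-down version of the VOTable excerpt with Python's standard XML parser. The closing tags are added here only so the fragment is well-formed; they are an assumption, since the slide excerpt is truncated.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.1"}

# Cut-down fragment of the excerpt; closing tags supplied for well-formedness.
doc = """<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE>
    <TABLE name="6dfgs_E7_subset" nrows="875">
      <FIELD arraysize="15" datatype="char" name="TARGET">
        <DESCRIPTION>Target name</DESCRIPTION>
      </FIELD>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

root = ET.fromstring(doc)
field = root.find(".//v:FIELD", NS)
field_name = field.get("name")
description = field.find("v:DESCRIPTION", NS).text

# A generic parser gets this far: elements, attributes and text.
# It does not know that FIELD declares a table column, what arraysize
# and datatype imply for decoding the rows, or what a "target" is.
```

Knowing that the file is XML yields only the element tree. The VOTable definition, and beyond it the astronomical semantics of each column, are further Representation Information that the label "XML" does not supply.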
20Preservation Issues
- Given a file or a stream of bits, how does one
know what Representation Information is needed?
(This question applies to Representation
Information itself as well as to the digital
objects we are primarily interested in preserving
and using.) How does one know, for example, if
this thing is in FITS format?
- Someone may simply know what it is and how to
deal with it, i.e. the bits are within the
Knowledge Base
- One may be able to recognise the format by
looking for various types of patterns
- One may feed the bits into all available
interpreters to see which accept the data as
valid
- Other means
- The only safe way: have an associated label which
points to the appropriate Representation
Information
- Note this does not exclude the other methods, e.g.
for data rescue
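A rough sketch of how a label and pattern recognition might be combined, with the label preferred and the pattern kept as a fallback for data rescue. The registry contents and the `example:fits` key are hypothetical; the only format fact relied on is that a standard FITS file begins with the `SIMPLE` keyword padded to eight characters and followed by `=`.

```python
FITS_MAGIC = b"SIMPLE  ="  # first nine bytes of a standard FITS primary header

# Hypothetical registry: label -> description of where the RepInfo lives.
REGISTRY = {"example:fits": "RepInfo describing the FITS format"}


def identify(data: bytes, label=None):
    """Return (registry key, how it was determined)."""
    if label in REGISTRY:
        return label, "label"            # the safe route: an explicit pointer
    if data.startswith(FITS_MAGIC):
        return "example:fits", "pattern"  # fallback: recognise a byte pattern
    return None, "unknown"


labelled = identify(b"\x00\x01\x02", label="example:fits")
guessed = identify(b"SIMPLE  =                    T")
unknown = identify(b"hello world")
```

The pattern route only works for formats someone has already taught the code to recognise, and it can be fooled by coincidental bytes, which is why the slide treats the label as the only safe method.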
21Registry for Representation Info
The Digital Object could have RepInfo packed with
it
Support automated access processing
Example of use of Representation Information
Labelling
22CPID
Registry
External
23Preservation Perspectives
- Migration
- Refresh
- Replicate
- Repackage
- Transform
- Access Service
- Dissemination API
- Data Virtualisation
- Source code
- Emulation
- Archive Interoperability
- Federated
- Resource sharing
24Types of Information Used in OAIS
25Preservation Description Info
Issues of Trust
26Archival Information Package
27Preservation Description Information
- Provenance Information
- Describes the source of Content Information, who
has had custody of it, what is its history
- Context Information
- Describes how the Content Information relates to
other information outside the Information Package
- Reference Information
- Provides one or more identifiers, or systems of
identifiers, by which the Content Information may
be uniquely identified
- Fixity Information
- Protects the Content Information from
undocumented alteration
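Fixity Information is commonly implemented as a stored digest that is recomputed on each check. The sketch below uses SHA-256 from Python's standard library; the choice of algorithm and the function names are assumptions for illustration, since OAIS does not mandate a particular mechanism.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Compute a digest to record alongside the Content Information."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    return fixity_digest(content) == recorded_digest


original = b"Content Information as ingested"
recorded = fixity_digest(original)          # stored as Fixity Information

still_ok = is_unaltered(original, recorded)
tampered = is_unaltered(b"Content Information as ingesteD", recorded)
```

A digest detects alteration but does not say who changed what or why; that history belongs to Provenance Information.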
28Conclusions
- OAIS provides the framework for information
preservation
- Representation Information is key
- Representation Information is more than just
format
- Desirable to share the effort
- Also need PDI, Packaging, etc.