Office formats

1

Practical Experiences of the Digital Preservation Testbed: Office formats
Jacqueline Slats
File Formats for Preservation, ERPANET, May 10-11 2004, Vienna, Austria
2
  • Digital Preservation Testbed is performing
    experiments on three strategies for preserving
    records without affecting the authenticity of the
    records:
      - Migration
      - XML
      - Emulation (Universal Virtual Computer)
  • It is assessing their practical use for the Dutch
    situation.

3
  • Experiments are taking place on
  • text documents
  • spreadsheets
  • electronic mail
  • databases.

4

Research Questions
  • Advantages of each preservation approach?
  • Factors affecting each approach?
  • Effectiveness of each approach?
  • Basic Requirements for Preservation (context,
    content, structure, appearance, behaviour)
  • Which metadata are essential for preservation?

5
Experiment Process (1)
  • Step 1 Definition of the process
  • Step 2 Preparation for the process
  • Step 3 Authenticity requirements and evaluation
    checklist
  • Step 4 Design of the experiments
  • Step 5 Resource specification
  • Step 6 Go/no go decision

6
Experiment Process (2)
  • Step 7 Development of the experiment
  • Step 8 Test experiment
  • Step 9 Go/no go decision
  • Step 10 Run experiment
  • Step 11 Evaluate experiment
  • Step 12 Consider results

7
Testbed Team
8
Digital record as a combination of ...
  • Hardware
  • Software
  • Computer file
9
Basic Requirements for Preservation
  • Context
  • Content
  • Structure
  • Appearance
  • Behaviour

10

Basic Requirements for Preservation of text
documents: Context
  • Organisational context, such as name of
    organisation, business process, date, relation
    with other documents
  • Preservation Log File, with information about
    original and current file formats, name and
    version of hardware, software and OS,
    preservation actions
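
The fields listed for the Preservation Log File can be sketched as a small data structure. This is only an illustration: the class names, field names and sample values below are hypothetical and are not taken from the Testbed's own log format.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class PreservationAction:
    # One preservation action, e.g. a migration (hypothetical structure).
    performed_on: str  # ISO 8601 date
    description: str


@dataclass
class PreservationLog:
    # Fields mirror the bullet above; the names are assumptions.
    original_format: str
    current_format: str
    hardware: str
    software: str
    operating_system: str
    actions: List[PreservationAction] = field(default_factory=list)

    def record_action(self, performed_on: str, description: str, new_format: str) -> None:
        # Append the action and update the current file format.
        self.actions.append(PreservationAction(performed_on, description))
        self.current_format = new_format


# Sample values are invented for illustration only.
log = PreservationLog(
    original_format="WordPerfect 5.1",
    current_format="WordPerfect 5.1",
    hardware="example PC",
    software="example word processor",
    operating_system="example OS",
)
log.record_action("2004-05-10", "Migrated to PDF", "PDF")
print(json.dumps(asdict(log), indent=2))
```

Keeping the original format separate from the current format means the log still shows where the record came from after several preservation actions.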

11

Basic Requirements for Preservation of text
documents: Content
  • All content must be preserved, including headers
    and footers, table of contents, document
    properties, remarks
  • Plain text must always be readable

12

Basic Requirements for Preservation of text
documents: Structure
  • Structure of the document must be preserved, in
    order to represent the logical relations between
    the components of the document, such as the order
    of chapters and paragraphs, and also the correct
    position of inserted remarks, footnotes and images

13
Basic Requirements for Preservation of text
documents: Appearance
  • The appearance of the original and the preserved
    version do not have to be identical, but the new
    appearance must not in any way affect the meaning
    of the original record

14

Basic Requirements for Preservation of text
documents: Behaviour
  • Description of active links must be preserved
  • Active behaviour that updates the content need not
    be preserved, but proof of the behaviour-driven
    content must be

15
Text documents (1)
  • Approach: Migration from an older version of an
    application to a newer version of this application
    Result: For the short term; needs to be repeated
    every few years; manual checking only if migration
    is automated
  • Approach: Migration from an application to a
    standard format (PDF)
    Result: PDF is suitable to represent text documents
    authentically, especially the appearance
  • Approach: Migration of old records created in one
    word processor to another (WP to Word)
    Result: Met authenticity requirements only after
    manual intervention

16
Text documents (2)
  • Approach: Conversion to XML
    Result: XML is able to represent the context,
    content, structure and behaviour of text documents
    authentically. To represent appearance an
    additional stylesheet is required.
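
As a minimal sketch of this split, the snippet below builds an XML record that carries context, content and structure, and refers to a separate stylesheet for appearance. The element names, the sample text and the stylesheet name record.xsl are assumptions made for illustration; they are not the Testbed's schema.

```python
import xml.etree.ElementTree as ET


def build_record(title, organisation, created, paragraphs):
    # Context and content live in separate branches of the tree.
    record = ET.Element("record")
    context = ET.SubElement(record, "context")
    ET.SubElement(context, "organisation").text = organisation
    ET.SubElement(context, "date").text = created
    content = ET.SubElement(record, "content")
    ET.SubElement(content, "title").text = title
    # Structure: paragraph order is made explicit with a number attribute.
    for number, text in enumerate(paragraphs, start=1):
        ET.SubElement(content, "paragraph", number=str(number)).text = text
    return record


# Sample values are invented for illustration only.
record = build_record(
    title="Example memo",
    organisation="Example organisation",
    created="2004-05-10",
    paragraphs=["First paragraph.", "Second paragraph."],
)
xml_text = ET.tostring(record, encoding="unicode")

# Appearance is not stored in the XML itself; a processing instruction
# points to a separate (hypothetical) stylesheet.
document = '<?xml-stylesheet type="text/xsl" href="record.xsl"?>\n' + xml_text
print(document)
```

Because the appearance sits in a separate file, the stylesheet has to be preserved together with the XML file for the record to be rendered as intended.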

17
File format XML
Pros
  • Open standard, controlled through W3C
  • Platform-independent
  • Self-describing and human-readable
  • Well equipped to preserve content, context and
    structure
Cons
  • Difficult to fully preserve the appearance of a
    textual document
  • XML, its related standards and their use form a
    complex field; much pioneering work still
    needs to be done

18
File format PDF
Pros
  • PDF is openly and freely published
  • Platform-independent
  • Widely used standard
  • Well equipped to preserve content, context and
    appearance
Cons
  • Adobe controls the development of PDF
  • Sometimes loss of information (e.g. diacritic
    characters)
  • No full-bodied ASCII basis
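
Loss of diacritic characters is the kind of problem an automated check after migration could flag. The sketch below is one possible check, not the Testbed's procedure: it assumes the text of the original and of the migrated file has already been extracted as strings with the same number of words, and the function names are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    # Decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lost_diacritics(original, migrated):
    # Return the original words whose diacritics are missing after migration.
    original = unicodedata.normalize("NFC", original)
    migrated = unicodedata.normalize("NFC", migrated)
    lost = []
    for before, after in zip(original.split(), migrated.split()):
        if before != after and strip_diacritics(before) == after:
            lost.append(before)
    return lost


print(lost_diacritics("coördinatie van het beleid", "coordinatie van het beleid"))
```

A word is reported only when it differs from its migrated counterpart and the two match once the diacritics are removed, so other kinds of change are not mistaken for diacritic loss.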

19
Decision table for preservation of text documents

Text document, implicit structure
  • P < 10 years: PDF (alternative: backwards
    compatibility)
  • P > 10 years: PDF
Text document, explicit structure
  • P < 10 years: PDF or XML (alternative: backwards
    compatibility)
  • P > 10 years: PDF or XML
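
The decision table can be read as a small lookup function. The sketch below assumes that P is a period expressed in years and treats a period of exactly 10 years as the longer case, which the slide does not specify; the function and parameter names are hypothetical.

```python
def recommend_formats(explicit_structure, period_years):
    # Returns (recommended formats, alternative or None).
    # Assumption: P on the slide is a period in years.
    formats = ["PDF", "XML"] if explicit_structure else ["PDF"]
    # Assumption: exactly 10 years counts as the longer case.
    alternative = "backwards compatibility" if period_years < 10 else None
    return formats, alternative


print(recommend_formats(explicit_structure=False, period_years=5))
print(recommend_formats(explicit_structure=True, period_years=25))
```

In the table, backwards compatibility appears as an alternative only in the shorter case, and XML appears only for documents with explicit structure.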
20
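Preserved Object
[Diagram: a Preserved Object made up of Original file, XML file, PDF file, Preservation log file, Metadata, Image file, DTD or schema, and Style-sheet; multiplicity labels in the diagram: 1.., 1.., 1, 0.., 1, 1]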
21
For further information about the Testbed:
  • Website: www.digitaleduurzaamheid.nl
  • E-mail: testbed_at_nationaalarchief.nl