
Title: Long term preservation: an overview


1
Long term preservation: an overview
Digital Curation Centre
a centre of expertise in data curation and
preservation
  • Michael Day, Digital Curation Centre, UKOLN,
    University of Bath, http://www.ukoln.ac.uk/
  • Joint Workshop on Electronic Publishing, Lund,
    Sweden, 15 April 2005

2
Session overview
  • Quick introduction
  • A fifteen year view
  • Overview of current issues

3
What is digital preservation?
  • Dealing with the potential technical problems
    that impede continued access to digital resources
    (of all types)
  • No longer possible to place physical artefact on
    a shelf and ignore for 100 years
  • Not just a technical problem
  • "... The planning, resource allocation, and
    application of preservation methods and
    technologies to ensure that digital information
    of continuing value remains accessible and
    usable" - Margaret Hedstrom (1998)

4
What is digital curation?
  • New(ish) term, from science data world (e.g.
    bioinformatics)
  • Reflects those extra things that need to be done
    to facilitate access and reuse
  • "The activity of managing and promoting the use
    of data from its point of creation, to ensure it
    is fit for contemporary purpose, and available
    for discovery and reuse" - Philip Lord, et al.
    (2004)

5
Why is it a problem? (1)
  • An increasing flood of 'born-digital' data
  • The World Wide Web
  • Comprises billions of pages, plus the "deep Web"
  • Internet Archive: >1 petabyte, and growing at 20
    TB per month (http://www.archive.org/)
  • Data deluge in science and engineering
  • Petabytes generated by high throughput
    instruments, streamed from sensors and
    satellites, etc.
  • 5 exabytes of new information created in 2002
  • http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
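
To give a feel for the Internet Archive figures quoted on this slide, the following rough sketch projects holdings forward. It assumes (these are illustrative assumptions, not claims from the slides) that growth stays constant at 20 TB per month and that 1 petabyte is counted as 1,000 TB.

```python
# Assumptions (not from the slides): constant growth, decimal units.
TB_PER_PB = 1000


def projected_holdings_tb(start_tb: float, growth_tb_per_month: float, months: int) -> float:
    """Linear projection of archive size after a number of months."""
    return start_tb + growth_tb_per_month * months


def months_to_double(start_tb: float, growth_tb_per_month: float) -> float:
    """Months until holdings reach twice the starting size at a constant rate."""
    return start_tb / growth_tb_per_month


start = 1 * TB_PER_PB
print(projected_holdings_tb(start, 20, 12))  # size in TB after one year
print(months_to_double(start, 20))           # months needed to reach 2 PB
```

Under these assumptions a 1 PB archive would take 50 months to double. Real growth is unlikely to be linear, so this is a lower bound on the scale of the problem rather than a forecast.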

6
Why is it a problem? (2)
  • Need for (open) access to this data
  • Results in added scientific value
  • New analytic techniques
  • 2004 - OECD member states endorsed the principle
    that publicly funded research data should be
    openly available to the maximum extent possible

7
Technical problems
  • Media longevity
  • Estimated lifetimes are short compared to paper
    or good quality microform
  • Solutions: more durable media, 'refreshing'
    regimes
  • Hardware and software obsolescence
  • Relatively short obsolescence cycles for
    hardware, peripherals, media, and software
  • For example, BBC Domesday Project (1986) - hybrid
    videodisc
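
A 'refreshing' regime, as mentioned on this slide, amounts to copying data onto fresh media before the old media decay. The sketch below is one minimal way to do that for a single file, with a checksum comparison to confirm the copy is bit-identical. The function names and the choice of SHA-256 are assumptions made for illustration and are not taken from the presentation.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file to new storage and confirm the copy is bit-identical.

    Returns the checksum so it can be recorded for the next refresh cycle.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"Fixity check failed for {target}")
    return after
```

Refreshing of this kind only addresses media longevity. It does nothing about hardware and software obsolescence, which is why the other strategies discussed in this presentation are still needed.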

8
Preservation strategies (1)
  • Technology preservation
  • The preservation of an information object
    together with all of the hardware and software
    needed to interpret it
  • But will lead to museums of "ageing and
    incompatible computer hardware" - Mary Feeney
    (1999)
  • Has key role in the rescue of digital objects
    (digital archaeology)
  • Emulation
  • Preserving the original application software
    and running it on emulators that mimic the
    behaviour of obsolete hardware and operating
    systems
  • Development of virtual machines that will be
    migrated to work on different platforms (Jeff
    Rothenberg, 1998)
  • Universal Virtual Computer (UVC) concept

9
Preservation strategies (2)
  • Migration
  • Managed transformations
  • The periodic transfer of digital information from
    one hardware and software configuration to
    another, or from one generation of computer
    technology to a subsequent one - CPA/RLG report
    (1996)
  • Widely used strategy, e.g. on ingest into a
    repository
  • Problems with preserving the integrity of an
    object
  • Encapsulation
  • Self-describing objects, e.g. information package
    in OAIS model, METS, Buckets, Universal
    Preservation Format
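
Migration and its integrity problem can be illustrated with the simplest possible case: transcoding text from a legacy character encoding to UTF-8. The sketch below is an assumption-laden example, not a method described in the slides. The function name is hypothetical, and the default of code page 437 was chosen only because it was common on MS-DOS systems. Migrating word-processor or database formats is far harder than this.

```python
def migrate_text(raw: bytes, source_encoding: str = "cp437") -> bytes:
    """Transcode legacy text to UTF-8 and check the change is reversible."""
    text = raw.decode(source_encoding)
    migrated = text.encode("utf-8")
    # Round trip: if converting back does not reproduce the original bytes,
    # the migration has altered the object and should not be trusted.
    if migrated.decode("utf-8").encode(source_encoding) != raw:
        raise ValueError("migration is not reversible; integrity cannot be confirmed")
    return migrated


legacy = b"caf\x82"  # byte 0x82 is 'é' in code page 437
print(migrate_text(legacy).decode("utf-8"))
```

For a one-to-one code page such as cp437 the round-trip check should always pass. It becomes useful with encodings where several byte sequences map to the same character, which is one small instance of the integrity problems noted above.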

10
Preservation strategies (3)
  • Metadata and documentation
  • All digital preservation strategies depend - to
    some extent - on the creation, capture and
    maintenance of metadata
  • "Preserving the right metadata is key to
    preserving digital objects" (ERPANET Briefing
    Paper, 2003)
  • The various types of data that will allow the
    re-creation and interpretation of the structure
    and content of digital data over time (Ludäscher,
    Marciano & Moore, 2001)
  • Reference Model for an Open Archival Information
    System (OAIS) - ISO 14721:2003
  • PREMIS working group

11
A fifteen year retrospective
  • Based on my dissertation
  • "Preservation problems of electronic text and
    data" - Loughborough University (1989)
  • Overview of the state of the art in digital
    preservation in the late 1980s
  • Hardware and software used: IBM PC XT, MS-DOS,
    5¼" floppy disks, shareware word processing
    program (Galaxy)

12
The 1980s - contexts
  • Still faith in the "paperless" future
  • Electronic publishing in its infancy
  • Online databases (mainly bibliographic)
  • Viewdata systems (e.g., Minitel, Prestel)
  • Experiments with electronic journals (e.g. BLEND,
    Project Quartet) and electronic document delivery
    systems (ADONIS)
  • CD-ROM databases

13
The 1980s - issues (1)
  • Digital preservation issues
  • Major focus on the longevity of media
  • e.g., BNB Research Fund funded comparison of
    microfilm, magnetic media, and optical disks for
    archival storage (1983)
  • Interest in the potential value of new types of
    optical media, e.g. videodisc, Compact Disc
    (CD-ROM, CD-R)
  • No promising results from initial research

14
The 1980s - issues (2)
  • Knowledge that media longevity was not the only
    issue
  • "The problem with machine-readable records is the
    long term availability of the machines rather
    than the physical decay of the recording
    mechanism" - John Mallinson (1986)
  • Brief consideration of COM (microform) for
    long-term storage

15
The 1980s - experiences (1)
  • National archives
  • A focus in some countries on machine-readable
    records from the 1960s
  • The principle that machine-readable records
    should be treated in the same manner as
    conventional records was established very early
    on, e.g. by Meyer Fishbein (1972)
  • Also, there was an early recognition of the
    importance of documentation and economic factors

16
The 1980s - experiences (2)
  • Data archives
  • Storage of social science survey data started in
    the punched-card era (1940s)
  • ESRC Data Archive established 1967
  • Recognised the importance of developing
    procedures to manage data (e.g., migration on
    ingest) and of standardised descriptions
    (metadata)
  • National libraries
  • Were considering legal deposit obligations

17
The 1980s - summing up
  • Some differences with the position today, e.g.
  • General lack of awareness
  • Focus on media longevity, 'refreshing' strategies
  • Little practical experience (except for data
    archives)
  • Some continuity, e.g. it was recognised:
  • That the obsolescence of hardware (and software
    environments) was a serious problem
  • That data management strategies and
    documentation/metadata were important
  • That digital resources were not conceptually
    different to non-digital ones

18
The current context (1)
  • The World Wide Web
  • Changes in scholarly communication, e.g.
  • Increased use of electronic journals, e-print
    repositories
  • Changes in scientific practice: data-intensive
    science, Grid computing, petabyte-scale storage,
    e-research
  • Current focus on open access
  • Similar developments elsewhere, e.g.
  • Broadcasting, e-commerce, e-government, ...

19
The current context (2)
  • Task Force on Archiving of Digital Information
    (1996)
  • In the UK, led to influential research projects
    like Cedars, and eventually to the Digital
    Preservation Coalition (DPC)
  • Major current initiatives
  • US National Digital Information Infrastructure
    and Preservation Program (NDIIPP)
  • ERPANET, NESTOR, KB's e-Depot, etc.
  • UK Digital Curation Centre

20
Digital Curation Centre (1)
  • Funded from 2004 for three years by the JISC and
    the e-Science Core Programme
  • Main aim: "continuing improvement in the quality
    of data curation and digital preservation"
  • Will focus on all aspects of the research
    process, e.g. from data creation to publication
    and beyond, also on the work of repositories and
    data archives
  • Not itself a digital repository, but offering
    outreach and practical services to assist those
    who curate data

21
Digital Curation Centre (2)
  • Main activities
  • Advisory services and outreach
  • Development
  • Registries of Representation Information, testing
    of tools, ...
  • Research programme
  • Role of annotation, legal and socioeconomic
    issues, ...
  • Collaborative network of associates
  • Partners: Universities of Edinburgh (lead),
    Glasgow and Bath (UKOLN), CCLRC
  • http://www.dcc.ac.uk/

22
Key developments (1)
  • Greater awareness of the issues
  • Digital preservation now beginning to be taken
    seriously by governments and NGOs (e.g. Unesco
    Charter on the Preservation of Digital Heritage,
    World Summit on the Information Society)
  • More experience with developing systems and
    tools, e.g.
  • DIAS (IBM), DSpace, Fedora, Internet Archive,
    LOCKSS, OCLC Digital Archive, PANDAS, PubMed
    Central, Storage Resource Broker, etc.
  • Journal publishers co-operating with KB on e-Depot

23
Key developments (2)
  • Standards
  • Reference Model for an Open Archival Information
    System (OAIS) - ISO 14721:2003
  • A reference model, not a blueprint - but
    increasingly influential
  • Preservation metadata
  • Current focus on PREMIS working group, supported
    by OCLC and Research Libraries Group
  • Other activity ongoing, e.g. in scientific
    research domains

24
Research (1)
  • Some key requirements identified in:
  • It's about time: research challenges in digital
    archiving and long-term preservation, National
    Science Foundation and Library of Congress
    (2003), http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf
  • Invest to Save: report and recommendations of the
    NSF-DELOS Working Group on Digital Archiving and
    Preservation (2003), http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf

25
Research (2)
  • DELOS preservation cluster
  • Frameworks for the analysis of preservation
    strategies
  • Building preservation functionality into digital
    libraries
  • File formats and metadata
  • Workshop on 'Digital repositories:
    interoperability and common services', Crete,
    11-13 May 2005, http://www.ukoln.ac.uk/events/delos-rep-workshop/

26
Research (3)
  • Current JISC research programmes
  • Supporting Digital Preservation and Asset
    Management in Institutions
  • Relatively small-scale projects: assessment
    tools, training, user guides, etc.
  • Digital Repositories (deadline last week)
  • Building on Focus on Access to Institutional
    Resources (FAIR) programme
  • http://www.jisc.ac.uk/

27
Some issues (1)
  • Open access repositories and preservation
  • Exact role of repositories still evolving
  • Some advocates of open access treat digital
    preservation concerns as a distraction from the
    primary task of "filling up the archives"
  • But the recent National Institutes of Health
    public access policy requests grantees to submit
    publications to PubMed Central - emphasising its
    role for permanent preservation
  • Disaggregated model proposed, whereby not all
    repositories will have preservation
    responsibilities
  • Possible need for mechanisms for transferring
    content to third parties, e.g. national libraries

28
Some issues (2)
  • Trusted repositories
  • Attributes and responsibilities of 'trusted
    repositories' defined by RLG and OCLC working
    group (2002)
  • Builds on 1996 Task Force report and OAIS model
  • Attributes include the viability and financial
    sustainability of the organisation, and the need
    for accountability
  • Whether these (and other criteria) could be used
    as a basis for certification is being explored
    by the Task Force on Digital Repository
    Certification, supported by RLG and the National
    Archives and Records Administration (NARA)

29
Some issues (3)
  • Collection development
  • Selection/appraisal, storage, access,
    'de-selection'
  • Preservation issues need to be considered early
    in an object's life-cycle (the traditional
    'transfer to repository' model will not work)
  • Rethinking concept of 'custody'
  • Cannot be done in isolation
  • Sharing responsibilities across repositories
    while maintaining useful redundancy

30
Some issues (4)
  • Legal issues
  • Repositories need the legal right to copy,
    migrate, reverse engineer software, etc.
  • Problems with identifying rights holders
  • Access - are "dark archives" the answer?

31
Some issues (5)
  • Economic issues
  • Still very little known about costs over the long
    term
  • No widely used economic models
  • Research-type funding is not long-term
  • Recent draft report for National Science
    Foundation asks whether digital collections
    should be treated like scientific facilities

32
Summing up (1)
  • Major differences from the late 1980s
  • Problem has grown, but awareness of it is now
    much higher
  • Many research projects, vendors, services, etc.
    now investigating this problem - not always
    particularly co-ordinated
  • Encouraging signs in funding of NDIIPP, DCC and
    other recent initiatives

33
Summing up (2)
  • Co-operation is essential
  • Some progress, e.g. DPC, ERPANET
  • Need to work out how trusted repositories will
    work together in a distributed network
  • Need for training
  • Many problems remain to be resolved
  • Research (e.g. into provenance of data, the role
    of file format registries)
  • Development of tools
  • Integrating existing work

34
More information
  • National Library of Australia's Preserving Access
    to Digital Information (PADI) gateway,
    http://www.nla.gov.au/padi/
  • Joint DPC and PADI bulletin: What's New in Digital
    Preservation, http://www.dpconline.org/graphics/whatsnew/
  • UK Digital Curation Centre, http://www.dcc.ac.uk/

35
Acknowledgements
  • The Digital Curation Centre is funded by the
    Joint Information Systems Committee (JISC) of the
    UK higher and further education funding councils
    and the Core e-Science Programme of the UK
    research councils. The consortium comprises the
    University of Edinburgh (lead partner), the
    University of Glasgow, the Council for the
    Central Laboratory of the Research Councils, and
    the University of Bath (UKOLN).
  • http://www.dcc.ac.uk/
  • UKOLN is funded by the Council for Museums,
    Libraries and Archives (MLA) and the JISC, as
    well as by project funding from the JISC, the
    European Union and other sources. UKOLN also
    receives support from the University of Bath,
    where it is based.
  • http://www.ukoln.ac.uk/