Permanent access to the record of science

1
Permanent access to the record of science
  • The KB e-Depot and its role in information
    preservation
  • Hilde van Wijngaarden
  • Head, Digital Preservation Department
  • National Library of the Netherlands
  • DPE/Driver/NESTOR/DCC Workshop
  • Berlin, 28 November 2007

2
Introducing the National Library of the
Netherlands
  • Koninklijke Bibliotheek
  • Medium-sized national library, founded in 1798
  • Financed by Ministry of Education, Culture and
    Science
  • Annual budget: 50 million; 270 FTE
  • Digital archiving and preservation embedded in
    organization
  • Financing for digital preservation:
  • 1.1 million for staff and system maintenance
  • 1.3 million for R&D
  • Additional external funding for collaborative
    projects

3
Organisation of digital preservation
  • e-Depot department:
  • 7 persons, 1 vacancy
  • Daily operations: ingest and error handling
  • Publisher contacts
  • Metadata conversions
  • Digital Preservation department:
  • 8 persons, 5 vacancies
  • R&D of tools to ensure permanent access
  • Projects to develop new services for the e-Depot
  • IT department:
  • 2½ persons
  • Maintenance of e-Depot system
  • Programming and technical architecture

4
The e-Depot and DIAS
  • DIAS is the technical heart of the e-Depot
    archiving infrastructure
  • Integrated with other library modules
  • Functionalities:
  • Ingest of e-journals, e-books, digitized
    publications, and CDRs
  • Automatic batch ingest
  • Authentic publications are archived, all formats
    in common use are accepted
  • Automatic validation (checksums, integrity
    checks), error handling
  • Metadata conversion
  • Batch delivery
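
The automatic validation step (checksums, integrity checks) could be sketched roughly as follows. This is a minimal illustration, not the DIAS implementation: the manifest layout, the choice of SHA-256 and the function names `sha256_of` and `validate_batch` are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_batch(manifest: dict[str, str], base_dir: Path) -> list[str]:
    """Return the names of files that are missing or fail their checksum.

    `manifest` maps a file name to its expected SHA-256 hex digest
    (a hypothetical format, chosen only for this sketch).
    """
    errors = []
    for name, expected in manifest.items():
        target = base_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            errors.append(name)
    return errors
```

Files returned by `validate_batch` would go to error handling; everything else can continue through the ingest batch.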

5
Scale
  • Volume:
  • 10 million e-publications currently
  • Size:
  • 1 e-publication equals 1 MB on average
  • 1 Terabyte for every 1 million publications
  • Capacity:
  • 5,000 to 50,000 e-publications ingested per day
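
The scale figures above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB), which is what the "1 Terabyte per 1 million publications" rule implies; the function names are illustrative only.

```python
MB_PER_TB = 1_000_000  # decimal units, matching "1 TB per 1 million publications"
MB_PER_GB = 1_000


def archive_size_tb(publications: int, avg_size_mb: float = 1.0) -> float:
    """Total archive size in terabytes for a given number of publications."""
    return publications * avg_size_mb / MB_PER_TB


def daily_growth_gb(per_day: int, avg_size_mb: float = 1.0) -> float:
    """Storage added per day, in gigabytes, at a given ingest rate."""
    return per_day * avg_size_mb / MB_PER_GB


def days_to_ingest(publications: int, per_day: int) -> float:
    """Days needed to ingest a backlog at a constant daily rate."""
    return publications / per_day


if __name__ == "__main__":
    print(archive_size_tb(10_000_000))        # 10.0 TB for the current 10 million
    print(daily_growth_gb(5_000))             # 5.0 GB per day at the low rate
    print(daily_growth_gb(50_000))            # 50.0 GB per day at the high rate
    print(days_to_ingest(1_000_000, 50_000))  # 20.0 days per million at the high rate
```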

6
R&D Digital preservation
  • Three questions have to be answered:
  • What do you have?
  • What do you want to preserve?
  • What can you do?
  • Preservation involves the whole process from
    creation to access
  • Important steps in the digital preservation
    workflow:
  • Characterisation
  • Risk assessment
  • Defining significant properties
  • Preservation action: migration or emulation

7
R&D Digital preservation at the KB (1)
  • Characterisation:
  • Developing modules for identification, validation
    and characterisation
  • Using JHOVE, DROID and in-house development
  • Setting up a JHOVE error database
  • Risk assessment:
  • File format information
  • New requirements for the Preservation Manager
  • Interoperability with external file format
    registries
  • Defining significant properties for different
    collections
  • Set policies and strategies
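
The identification part of characterisation can be illustrated with a small signature-based check. This is a simplified sketch under stated assumptions: it is not how JHOVE or DROID are implemented, and the signature table and the name `identify` are chosen only for the example. It covers a handful of well-known leading byte sequences.

```python
# Well-known leading byte sequences ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Guess a file format from the first bytes of its content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

Identification only answers "what do you have?"; validation against the format specification and extraction of properties are separate steps.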

8
R&D Digital preservation at the KB (2)
  • What to do when obsolescence threatens?
  • Adapt the object: migration
  • Development and implementation in three steps:
  • Migration on ingest: normalisation (operational
    module spring 2008)
  • Batch migration
  • Migration on access
  • Adapt the environment: emulation
  • Project to develop a modular emulator for digital
    preservation together with Nationaal Archief of
    the Netherlands and Tessella Support Services
  • Dioscuri delivered in July 2007, available as
    open source
  • Further development in PLANETS

11
DIOSCURI: Emulator for digital preservation
  • Project together with Dutch National Archives
  • Full hardware emulation in software (Java)
  • Using existing emulators as examples (Bochs,
    QEMU)
  • Open-source
  • International experts involved (Jeff Rothenberg)

Dioscuri available: http://dioscuri.sourceforge.net/
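
To make the idea of modular hardware emulation in software concrete, here is a toy sketch with separate memory and CPU modules and a fetch-decode-execute loop. It is purely illustrative: the four-instruction machine, the class names and the opcodes are invented for this example and have nothing to do with Dioscuri's actual code or with any real processor.

```python
class Memory:
    """A byte-addressable memory module."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def read(self, address: int) -> int:
        return self.cells[address]

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value & 0xFF

    def load(self, program: list[int], at: int = 0) -> None:
        self.cells[at:at + len(program)] = bytes(program)


class CPU:
    """A toy CPU module with one accumulator and four invented opcodes."""

    LOAD, ADD, STORE, HALT = 0x01, 0x02, 0x03, 0xFF

    def __init__(self, memory: Memory) -> None:
        self.memory = memory
        self.pc = 0          # program counter
        self.acc = 0         # accumulator register
        self.halted = False

    def step(self) -> None:
        """Fetch, decode and execute a single instruction."""
        opcode = self.memory.read(self.pc)
        if opcode == self.HALT:
            self.halted = True
            return
        operand = self.memory.read(self.pc + 1)
        if opcode == self.LOAD:
            self.acc = operand
        elif opcode == self.ADD:
            self.acc = (self.acc + operand) & 0xFF
        elif opcode == self.STORE:
            self.memory.write(operand, self.acc)
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
        self.pc += 2

    def run(self, max_steps: int = 1000) -> None:
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1
```

Because the CPU only talks to memory through `read` and `write`, either module could be swapped for a different implementation, which is the point of a modular design.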
12
[Diagram, labels only (they match the OAIS functional model): PRODUCER, SIP, Ingest, AIP, Archival Storage, AIP, Access, DIP, CONSUMER, Data Management, Administration, Preservation Planning]
13
[Diagram, labels only (they match the OAIS functional model): PRODUCER, SIP, Ingest, AIP, Archival Storage, AIP, Access, DIP, CONSUMER, Data Management, Administration]
14
[Diagram, labels only (they match the OAIS functional model): PRODUCER, SIP, Ingest, AIP, Archival Storage, AIP, Access, DIP, CONSUMER, Data Management, Administration, Preservation Planning]
15
The international e-Depot
  • Mission: to ensure permanent access to the
    published records of science
  • 2002: Landmark archiving agreement with Elsevier
  • → The e-Depot goes international
  • More archiving agreements (2003-2007):
  • Kluwer Academic Publishers, BioMed Central,
    Blackwell, Oxford University Press, Taylor &
    Francis, Sage, Springer, Brill Academic
    Publishers, Dutch Publishers Association
  • KB offers:
  • Long-term archiving, permanent access, metadata
    conversion
  • Access procedures depend on type of
    publisher/depositor
  • Publishers offer:
  • Their journals and money?

16
The Safe Places Network
  • A network of institutions dedicated to permanent
    archiving and preservation of the published
    records of science
  • To share responsibility for complete, world-wide
    coverage and allocate tasks accordingly
  • Safe Places Network secures systematic,
    coordinated preservation
  • In case of loss, libraries know where to go

17
DARE (Digital Academic Repositories)
  • Digital Academic Repositories (DARE)
  • Goal: to make research output digitally
    accessible and available for different
    services
  • Partners:
  • 14 Universities and Organisations of Higher
    Education
  • Netherlands Organisation for Scientific Research
  • Royal Netherlands Academy of Arts and Sciences
  • KB, National Library of the Netherlands
  • Coordination: SURF Foundation

18
DARE (Digital Academic Repositories)
  • DARE-net: network of university repositories in
    the Netherlands (2004)
  • Agreement on repository principles:
  • Repository is OAI compatible (Open Archives
    Initiative)
  • Descriptive metadata in Dublin Core Simple
  • Structural metadata in MPEG21-DIDL
  • Metadata in XML
  • Special attention for preservation:
  • Role for KB, National Library of the Netherlands
  • e-Depot
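
As an illustration of descriptive metadata in simple Dublin Core expressed as XML, the sketch below builds an `oai_dc` record with the Python standard library. The two namespace URIs and the fifteen element names are the standard ones; the function name `build_dc_record` and the idea of passing fields as a dictionary are assumptions made for this example, not part of the DARE agreements.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build an oai_dc:dc element; every DC element may repeat."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


if __name__ == "__main__":
    record = build_dc_record({
        "title": ["Permanent access to the record of science"],
        "creator": ["Hilde van Wijngaarden"],
        "date": ["2007-11-28"],
    })
    print(ET.tostring(record, encoding="unicode"))
```

In a DARE-style setup a record like this would sit inside an MPEG21-DIDL container alongside references to the object files; that wrapping is not shown here.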

19
Agreement KB and DARE
  • No official contract, but a practical approach:
    work agreements
  • In accordance with e-Depot current practice
  • Submission on voluntary basis
  • What is a publication? What is scientific?
  • Those objects that are made available on the
    public Internet through Institutional
    Repositories
  • Return-delivery will be made available
  • e-Depot is not a back-up, but will take necessary
    actions to keep the documents accessible

20
Harvesting of DARE material
  • Harvesting by KB, in consultation with university
    library (daytime)
  • Based on OAI-PMH protocol
  • Containers based on MPEG21-DIDL
  • Initial harvest: November 2006
  • Second harvest: 2007
  • Regular updates
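
An OAI-PMH harvest of this kind boils down to repeated `ListRecords` requests, following the `resumptionToken` until the repository returns no more pages. The sketch below shows only the request construction and response parsing, without network access. The base URL and the `didl` metadata prefix are assumptions for illustration (prefixes are defined per repository), and the function names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.iterfind(
            ".//oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_element is not None and token_element.text
    return identifiers, (token_element.text if has_token else None)
```

A harvester would fetch `list_records_url(...)`, pass the response to `parse_list_records`, and repeat with the returned token until it is `None`.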

21
File formats in DARE
  • Mostly PDF files (90%)
  • Around 20 popular other file formats
  • Dynamic webpages received as HTML pages
  • Written agreement allows this set
  • Zip files allowed uncompressed
  • Only those 20 file formats in Zip
  • Negotiation about new file formats
  • Normalisation to be applied, original will always
    be maintained
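
The rule that a Zip file may only contain the agreed file formats could be enforced with a check like the one below. This is a sketch under assumptions: the slides do not list the roughly 20 agreed formats, so `ALLOWED_EXTENSIONS` is a hypothetical placeholder set, and checking by file extension is a simplification (a real check would identify formats by content).

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical placeholder; the actual agreed set is not given in the slides.
ALLOWED_EXTENSIONS = {".pdf", ".html", ".txt", ".xml"}


def disallowed_members(zip_bytes: bytes, allowed=ALLOWED_EXTENSIONS) -> list[str]:
    """List archive members whose extension is not in the allowed set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in allowed
        ]
```

An empty result means the container can be accepted; any listed member would be a case for error handling or for negotiation about a new file format.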

22
New content, new challenges
  • Web archiving:
  • Project started in 2005
  • Selection of the Dutch web
  • Not just harvesting but PRESERVATION
  • Websites to be stored in e-Depot
  • Challenge: keep websites as complete units
    accessible over time
  • Digitised master images:
  • Preservation scanning
  • In what format to preserve the images, and the
    quality of the scanning process
  • High volumes: do master images always have to be
    TIFFs?

23
Preservation/curation of complex objects
  • Complex objects: publications with underlying
    data?
  • Publications in one place, link to data in
    another place
  • What underlying data? Publications can also
    contain interactive models, moving images etc.
  • And what about linking publications and data to
    websites?
  • Definitely complex!

24
Cooperation in Digital Preservation
  • Work together to address these complex issues
  • PLANETS: European project to develop Preservation
    Planning tools
  • Collaboration with Portico
  • Development of preservation systems:
  • DIAS/IBM: enhance the system, international user
    group
  • Set common requirements with SUB and DNB
  • Web archiving: International Internet Preservation
    Consortium (IIPC)
  • The Netherlands: set up National Digital
    Preservation Coalition
  • Connect with e-science community:
  • Alliance for Permanent Access
  • PARSE-Insight
  • DRIVER II

25
Contact: Hilde.vanwijngaarden_at_kb.nl, www.kb.nl/e-Depot