Title: Permanent access to the record of science
1. Permanent access to the record of science
- The KB e-Depot and its role in information preservation
- Hilde van Wijngaarden
- Head, Digital Preservation Department
- National Library of the Netherlands
- DPE/DRIVER/NESTOR/DCC Workshop
- Berlin, 28 November 2007
2. Introducing the National Library of the Netherlands
- Koninklijke Bibliotheek
- Medium-sized national library, founded in 1798
- Financed by Ministry of Education, Culture and Science
- Annual budget 50 million, 270 FTE
- Digital archiving and preservation embedded in organization
- Financing for digital preservation
  - 1.1 million for staff and system maintenance
  - 1.3 million for R&D
  - Additional external funding for collaborative projects
3. Organisation of digital preservation
- e-Depot department
  - 7 persons, 1 vacancy
  - Daily operations: ingest and error handling
  - Publisher contacts
  - Metadata conversions
- Digital Preservation department
  - 8 persons, 5 vacancies
  - R&D of tools to ensure permanent access
  - Projects to develop new services for the e-Depot
- IT department
  - 2 ½ persons
  - Maintenance of e-Depot system
  - Programming and technical architecture
4. The e-Depot and DIAS
- DIAS is the technical heart of the e-Depot archiving infrastructure
- Integrated with other library modules
- Functionalities
  - Ingest of e-journals, e-books, digitized publications, and CDRs
  - Automatic batch ingest
  - Authentic publications are archived, all formats in common use are accepted
  - Automatic validation (checksums, integrity checks), error handling
  - Metadata conversion
  - Batch delivery
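
The automatic validation step (checksums, integrity checks) can be illustrated with a minimal sketch. It is not DIAS code: the manifest layout, the function names `sha256_of` and `validate_batch`, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_batch(manifest):
    """Split a batch into accepted and rejected files.

    `manifest` is a hypothetical mapping of file path -> expected
    checksum, as a producer might supply with a delivery.
    """
    accepted, rejected = [], []
    for path, expected in manifest.items():
        candidate = Path(path)
        if candidate.is_file() and sha256_of(candidate) == expected.lower():
            accepted.append(path)
        else:
            # Missing file or checksum mismatch: route to error handling.
            rejected.append(path)
    return accepted, rejected
```

In such a setup the rejected list would feed the error-handling queue, while accepted files continue to metadata conversion.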
5. Scale
- Volume
  - 10 million e-publications currently
- Size
  - 1 e-publication equals 1 MB on average
  - 1 terabyte for every 1 million publications
- Capacity
  - 5,000 to 50,000 e-publications ingested per day
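
The figures on this slide can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 1,000,000 MB). The "days to ingest" figure is only an illustration of the stated daily capacity, not a number from the presentation.

```python
import math

MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB


def storage_tb(publications, avg_mb=1.0):
    """Storage needed, in terabytes, at a given average size per publication."""
    return publications * avg_mb / MB_PER_TB


def ingest_days(publications, per_day):
    """Whole days needed to ingest a collection at a fixed daily rate."""
    return math.ceil(publications / per_day)


print(storage_tb(10_000_000))            # 10.0
print(ingest_days(10_000_000, 5_000))    # 2000
print(ingest_days(10_000_000, 50_000))   # 200
```

At 1 MB per publication, the current 10 million e-publications correspond to roughly 10 TB.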
6. R&D Digital preservation
- Three questions have to be answered
  - What do you have?
  - What do you want to preserve?
  - What can you do?
- Preservation involves the whole process from creation to access
- Important steps in the digital preservation workflow
  - Characterisation
  - Risk assessment
  - Defining significant properties
  - Preservation action: migration, emulation
7. R&D Digital preservation at the KB (1)
- Characterisation
  - Developing modules for identification, validation and characterisation
  - Using JHOVE, DROID and in-house development
  - Setting up a JHOVE error database
- Risk assessment
  - File format information
  - New requirements for the Preservation Manager
  - Interoperability with external file format registries
- Defining significant properties for different collections
- Set policies and strategies
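
Format identification of the kind listed under characterisation is commonly based on byte signatures at the start of a file. The sketch below is not DROID or JHOVE code and does not use their APIs. The signature table is a small hand-written assumption covering a few well-known formats, and `identify` is a hypothetical helper.

```python
# Hypothetical, hand-written signature table: (leading bytes, format label).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(identify(b"%PDF-1.4\n"))      # PDF
print(identify(b"plain text"))      # unknown
```

Identification only says what a file claims to be. Validation and characterisation still have to check whether it conforms to the format and which properties it has.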
8. R&D Digital preservation at the KB (2)
- What to do when obsolescence threatens?
- Adapt the object: migration
  - Development and implementation in three steps
    - Migration on ingest: normalization (operational module spring 2008)
    - Batch migration
    - Migration on access
- Adapt the environment: emulation
  - Project to develop a modular emulator for digital preservation together with Nationaal Archief of the Netherlands and Tessella Support Services
  - Dioscuri delivered in July 2007, open source available
  - Further development in PLANETS
11. DIOSCURI: Emulator for digital preservation
- Project together with Dutch National Archives
- Full hardware emulation in software (Java)
- Using existing emulators as examples (Bochs, QEMU)
- Open-source
- International experts involved (Jeff Rothenberg)
- Dioscuri available: http://dioscuri.sourceforge.net/
12. [Diagram: OAIS functional model. Labels: Producer, SIP, Ingest, Data Management, Archival Storage, AIP, Access, DIP, Consumer, Administration, Preservation Planning]
13. [Diagram: OAIS functional model. Labels: Producer, SIP, Ingest, Data Management, Archival Storage, AIP, Access, DIP, Consumer, Administration]
14. [Diagram: OAIS functional model. Labels: Producer, SIP, Ingest, Data Management, Archival Storage, AIP, Access, DIP, Consumer, Administration, Preservation Planning]
15. The international e-Depot
- Mission: ensure permanent access to the published records of science
- 2002: Landmark archiving agreement with Elsevier
  - The e-Depot goes international
- More archiving agreements (2003-2007)
  - Kluwer Academic Publishers, BioMed Central, Blackwell, Oxford University Press, Taylor & Francis, Sage, Springer, Brill Academic Publishers, Dutch Publishers Association
- KB offers
  - Long-term archiving, permanent access, metadata conversion
  - Access procedures depend on type of publisher/depositor
- Publishers offer
  - Their journals and money?
16. The Safe Places Network
- A network of institutions dedicated to permanent archiving and preservation of the published records of science
- To share responsibility for complete, world-wide coverage and allocate tasks accordingly
- Safe Places Network secures systematic, coordinated preservation
- In case of loss, libraries know where to go
17. DARE (Digital Academic Repositories)
- Digital Academic Repositories (DARE)
- Goal: to make the research output digitally accessible and available for different services
- Partners
  - 14 Universities and Organisations of Higher Education
  - Netherlands Organisation for Scientific Research
  - Royal Netherlands Academy of Arts and Sciences
  - KB, National Library of the Netherlands
- Coordination: SURF Foundation
18. DARE (Digital Academic Repositories)
- DARE-net: network of university repositories in the Netherlands (2004)
- Agreement on repository principles
  - Repository is OAI compatible (Open Archives Initiative)
  - Descriptive metadata in Dublin Core Simple
  - Structural metadata in MPEG21-DIDL
  - Metadata in XML
- Special attention for preservation
  - Role for KB, National Library of the Netherlands
  - e-Depot
19. Agreement KB and DARE
- No official contract, but practical approach: work-agreements
- In accordance with e-Depot current practice
- Submission on voluntary basis
- What is a publication? What is scientific?
  - Those objects that are made available on the public Internet through Institutional Repositories
- Return-delivery will be made available
- e-Depot is not a back-up, but will take necessary actions to keep the documents accessible
20. Harvesting of DARE material
- Harvesting by KB, in consultation with university library (day time)
- Based on OAI-PMH protocol
- Containers based on MPEG21-DIDL
- Initial harvest: November 2006
- Second harvest: 2007
- Regular updates
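
An OAI-PMH harvest of this kind boils down to repeated `ListRecords` requests, each continued with the `resumptionToken` from the previous response. The sketch below only builds request URLs and reads the token from a response. It is not the KB harvester: the base URL and the `didl` metadata prefix are assumptions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def list_records_url(base_url, metadata_prefix=None, from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    given, no other argument besides the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvest for regular updates
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# Hypothetical repository endpoint and metadata prefix.
print(list_records_url("https://repository.example.org/oai",
                       metadata_prefix="didl", from_date="2006-11-01"))
```

A harvester would fetch each URL, store the returned containers, and loop until `next_token` returns `None`. The `from` argument is what makes regular incremental updates possible after the initial harvest.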
21. File formats in DARE
- Mostly PDF files (90%)
- Around 20 popular other file formats
- Dynamic webpages received as HTML pages
- Written agreement allows this set
- Zip files allowed uncompressed
  - Only those 20 file formats in Zip
- Negotiation about new file formats
- Normalisation to be applied, original will always be maintained
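
The rule that Zip files may only contain the agreed formats can be sketched as an allow-list check on archive members. The slides do not enumerate the roughly 20 agreed formats, so the extension set below is a hypothetical placeholder. Checking by file extension is also a simplification chosen for illustration, and `disallowed_members` is a hypothetical helper.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical placeholder: the actual agreed format list is not given in the slides.
ALLOWED_EXTENSIONS = {".pdf", ".html", ".txt"}


def disallowed_members(zip_bytes, allowed=ALLOWED_EXTENSIONS):
    """Return names of archive members whose extension is not on the allow-list."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in allowed
        ]
```

An empty result means the archive contains only agreed formats. Any name returned would be a case for negotiation about new file formats.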
22. New content, new challenges
- Webarchiving
  - Project started in 2005
  - Selection of the Dutch web
  - Not just harvesting but PRESERVATION
  - Websites to be stored in e-Depot
  - Challenge: keep websites as complete units accessible over time
- Digitised master images
  - Preservation scanning
  - Preserve the images: in what format, quality of scanning process
  - High volumes: do master images always have to be TIFFs?
23. Preservation/curation of complex objects
- Complex objects: publications with underlying data?
- Publications in one place, link to data in another place
- What underlying data? Publications can also contain interactive models, moving images etc.
- And what about linking publications and data to websites?
- Definitely complex!
24. Cooperation in Digital Preservation
- Work together to address these complex issues
  - PLANETS: European project to develop Preservation Planning tools
  - Collaboration with Portico
- Development of preservation systems
  - DIAS/IBM: enhance the system, international user group
  - Set common requirements with SUB and DNB
- Webarchiving: International Internet Preservation Consortium (IIPC)
- The Netherlands: set up National Digital Preservation Coalition
- Connect with e-science community
  - Alliance for permanent access
  - PARSE-Insight
  - DRIVER II
25. Contact
- Hilde.vanwijngaarden_at_kb.nl
- www.kb.nl/e-Depot