Presentación - PowerPoint PPT Presentation

Learn more at: https://www.erpanet.org
1
  • Experiences on Migration of Data in Digitization
    Projects
  • Julián Bescós

Presentation for the ERPANET Workshop "Workflow in
Digital Preservation", Budapest, 13-15 October 2004
2
  • The Migration Issue
  • Our Experience
  • Migration Tasks
  • Best Practices for Preservation
  • Planning and Schedule

3
  • Migration is the set of tasks needed to periodically
    transfer digital materials from one hardware/software
    configuration to another
  • Purpose
  • Long term preservation of the digital information
    created and stored using digital technology
  • Allow broad access
  • Retrieve, display and use
  • Origin
  • New devices, processes and software replace the
    existing methods of recording, storing and accessing
  • New standards
  • Enhancement of service

4
  • Technology obsolescence
  • Hardware
  • More powerful computers and higher density
    storage
  • Upgrade components are no longer available
    (additional storage, memory, etc.)
  • Basic software
  • Operating systems
  • Database managers
  • Media
  • Media lifetime is rarely the constraining factor for
    digital preservation (DP)
  • Old storage media become obsolete as newer and
    better media reach the market
  • Obsolescence of the access software
  • Access on new platforms and media
  • Programs are not available in the long term
  • Changes in metadata and in image formats
  • New functions of the software

5
  • In practice it is a combination of
  • Technology obsolescence
  • New functionalities of the software
  • Derived from information and communication
    technology
  • Daily work on digitisation, storage and access
    requiring
  • Higher density storage
  • Faster computers
  • It is a consequence of
  • The digital world of information and
    communication technology is still relatively
    young and immature

6
  • Work began in 1988 with the design and development
    of the Information System for the Archivo de
    Indias in Seville
  • Computerization of 66 archives and libraries of
    different kinds and sizes in Spain and abroad
  • Digitization of more than 20 million pages of
    ancient documents
  • Installation of more than 320 workstations
  • Development of our own products, ArchiDOC-ArchiGES,
    for archives
  • A team covering consulting, management,
    development, installation, training and
    maintenance of systems for archives

Archivo General de Indias, Sevilla
Access Room in 1992
7
  • MAIN PROJECTS WITH DIGITIZATION

  • Archivo General de Indias, Sevilla
  • Archivo General de Simancas
  • Archivo Histórico Nacional, Madrid
  • Archivo Histórico Nacional - Sección Nobleza, Toledo
  • Archivo Histórico Nacional - Sección Guerra Civil, Salamanca
  • Archivo de la Corona de Aragón, Barcelona
  • Archivo General de Navarra
  • Archivo del Reino de Valencia
  • Archivo del Reino de Mallorca
  • Biblioteca Sancho el Sabio, Vitoria
  • Archivo Virtual de la Corona de Aragón (with images from the ACA and AHN)
  • Archivo Eclesiástico de Poblet
  • Archivo Histórico Universidad de Salamanca
  • Archivo Histórico de la Universidad de Santiago de Compostela
  • Archivo Histórico de la Universidad de Oviedo
  • Archivo General de la Nación, Colombia
  • Archivo Histórico Ultramarino, Lisboa
  • Archivo del Nacionalismo de la Fundación Sabino Arana, Vizcaya
  • Biblioteca Valenciana
  • Archivo del Ilustre Colegio Notarial de Granada
  • Real Academia Española (Diccionarios Histórico)
  • Diccionario Biográfico Real Academia Historia
  • Archivo General Militar, Segovia
  • Archivo General Militar, Ávila
  • Instituto de Historia y Cultura Militar
  • Archivo General de la Marina, El Viso del Marqués, Ciudad Real
  • Archivo Histórico Provincial de Murcia
  • Sistema de Información del Archivo, Biblioteca, Fototeca y Videoteca de Cruz Roja Española
  • Biblioteca de la Fundación Francisco de Zabalburu, Madrid
  • Biblioteca Parlamento Vasco
  • Archivo-Biblioteca de la Diputación de Cáceres
  • Digitization of 11 newspapers for 11 Basque institutions (retrospective and current press)
  • Archivo Municipal de Castellón de la Plana
  • Archivo Histórico del Excmo. Ayuntamiento de La Laguna, Tenerife
  • Archivo del Ayuntamiento Oviedo
  • Archivo del Komintern, Moscow, and its replica in 6 National Archives, LOC and Open Society Archives
Archivo General Militar, Segovia
Archivo General de Navarra
Zabalburu Library
10
  • 1. Projects from 1988 to 1992
  • Computer System for Archivo General de Indias
  • The Archive contains 86 million pages of
    original manuscripts related to the Spanish
    Administration in America (15th-19th centuries), in
    43,000 bundles
  • The Computer System integrated:
  • A Textual Database with 400,000 descriptive
    entries
  • A Digital Image Archive with 11 million digital
    images in 1995
  • A Module for User and Document Management:
    control of user management, the consultation room,
    document movements and statistics
  • Access by researchers and archivists from 50
    workstations
  • About 30% of present consultations are on
    screen (1 million pages/year)
  • About 35% of printing is digital (85,000/year)
  • Access system in service since 1992

11
  • Architecture
  • The database for descriptions, in SQL/400, keeps
    the hierarchical structure of fonds
  • Standalone digitization workstations with flatbed
    scanners and optical disk drives under DOS
  • Image servers based on PCs with optical disk
    drives
  • Access from PCs under OS/2
  • Image Acquisition and Storage
  • 11 million images digitized in gray levels with
    high fidelity with respect to the original
    manuscripts
  • Low cost workstations
  • Legibility enhancements applied by users at
    consultation time
  • Non-expert digitization operators
  • Digitization at 100 dpi, 16 gray levels
  • 1 page/minute, 15 workstations, 2 shifts, 4 years
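
The production figures on this slide can be cross-checked against the 11 million images reported for the archive. The sketch below is only a rough estimate: the rate, the number of workstations, the two daily shifts and the four years come from the slide, while the shift length and the number of working days per year are assumptions introduced here.

```python
def pages_digitized(pages_per_minute, workstations, shifts_per_day,
                    hours_per_shift, working_days_per_year, years):
    """Total pages produced by a scanning operation running at a steady rate."""
    pages_per_shift = pages_per_minute * 60 * hours_per_shift
    return (pages_per_shift * shifts_per_day * workstations
            * working_days_per_year * years)


# From the slide: 1 page/minute, 15 workstations, 2 shifts per day, 4 years.
# ASSUMPTIONS (not in the slide): 7-hour shifts, 220 working days per year.
total = pages_digitized(1, 15, 2, 7, 220, 4)
print(f"{total:,} pages")
```

With these assumed values the estimate is 11,088,000 pages, which is consistent with the 11 million images quoted in the talk.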

12
  • Image Acquisition and Storage
  • Images stored on WORM optical disks
  • The structure at the low level (bundle/documents)
    was also reflected in directories on the WORM disks
  • Access to the images on a disk is through the
    call number of the document
  • Image path as metadata: image names carried
    the document call number and the page
    number.
  • No standard compression was available for gray-level
    images. Images were compressed losslessly in
    software using DPCM.
  • Compressed image size for an A4 page: 300-350 KB
  • Storage for 1 bundle: 2,000 × 350 KB ≈ 700 MB
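
To illustrate what lossless DPCM means for 16-level gray images, here is a minimal sketch. It is a generic textbook DPCM (predict each pixel from the previous one and store the difference modulo the number of levels), not the codec used in the project, whose predictor and entropy coding are not described in the slides.

```python
def dpcm_encode(row, levels=16):
    """Replace each pixel by its difference from the previous pixel (mod levels)."""
    previous = 0
    residuals = []
    for pixel in row:
        residuals.append((pixel - previous) % levels)
        previous = pixel
    return residuals


def dpcm_decode(residuals, levels=16):
    """Rebuild the pixels by accumulating the residuals (mod levels)."""
    previous = 0
    row = []
    for residual in residuals:
        previous = (previous + residual) % levels
        row.append(previous)
    return row


row = [7, 7, 8, 8, 8, 9, 3, 3]          # one scan line, values 0..15
encoded = dpcm_encode(row)
assert dpcm_decode(encoded) == row       # lossless: the round trip is exact
print(encoded)
```

Because neighbouring pixels in a manuscript page are usually similar, most residuals are zero or small, and a later entropy coding step can store them in fewer bits than the raw pixels.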

13
  • Image Acquisition and Storage
  • Media for storage of digital images
  • Bundles | Media | Start year | Number of disks | Images
  • 1,729 | IBM optical disks (200 MB) | 1989 | 6,916 | 3,458,000
  • 3,732 | Plasmon optical disks (940 MB) | 1991 | 3,732 | 7,464,000
  • 50 | CD-R (640 MB) | 1996 | - | 100,000
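
The figures in this table are internally consistent, which a few lines of arithmetic can confirm. The sketch below uses only the numbers on the slide for the two optical media; the CD-R row is left out because its disk count does not appear in the transcript. Capacity is computed in decimal units (1 GB = 1,000 MB), which is an assumption.

```python
MEDIA = [
    # (medium, bundles, disks, capacity per disk in MB, images)
    ("IBM optical disk", 1_729, 6_916, 200, 3_458_000),
    ("Plasmon optical disk", 3_732, 3_732, 940, 7_464_000),
]


def summarize(row):
    """Derive per-bundle ratios and total raw capacity for one medium."""
    medium, bundles, disks, capacity_mb, images = row
    return {
        "medium": medium,
        "images_per_bundle": images / bundles,
        "disks_per_bundle": disks / bundles,
        "raw_capacity_gb": disks * capacity_mb / 1000,
    }


for row in MEDIA:
    print(summarize(row))
```

Both media work out to 2,000 images per bundle, with four IBM disks or one Plasmon disk per bundle.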

17
Example of blotch removal that the user can
apply
19
Example of reduction of ink bleeding through the
paper
20
Archivo General de Indias
Digitization Room of Archivo de Indias in 1989
21
Archivo General de Indias
Shelf with optical disks
22
  • 2. Projects from 1992 to 1996
  • Database server under OS/2 and DB2
  • Access and digitization workstations on PCs
    with OS/2
  • The relational database keeps the hierarchical
    structure of the documentation
  • Images stored on CD-Rs
  • Directory structures and image names changed.
  • Metadata in binary control files: each image has
    information about signature (call number), position
    in the hierarchical structure, page number, notes
  • Image compression: JPEG
  • Metadata in images: resolution, date, dimensions

23
Example metadata in Binary Control File
  • The file keeps information about the hierarchical
    structure
  • It maintains the relationship between each image
    file and its position in the document.
  • The control file and its metadata can be imported
    into the database
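
As an illustration of how a binary control file can carry this information and still be importable into a database, here is a minimal sketch using fixed-size records. The record layout, field names and sample call number are entirely hypothetical; the slides do not describe the real file format.

```python
import struct

# HYPOTHETICAL layout (not from the slides): call number, id of the parent
# node in the hierarchy, page number, free-text notes. Little-endian, 80 bytes.
RECORD = struct.Struct("<24sIH50s")


def pack_record(call_number, node_id, page, notes=""):
    """Serialize one image entry; short strings are padded with NUL bytes."""
    return RECORD.pack(call_number.encode("ascii"), node_id, page,
                       notes.encode("latin-1"))


def read_control_file(data):
    """Turn the raw bytes of a control file into rows ready for a database."""
    rows = []
    for call_number, node_id, page, notes in RECORD.iter_unpack(data):
        rows.append({
            "call_number": call_number.rstrip(b"\0").decode("ascii"),
            "node_id": node_id,
            "page": page,
            "notes": notes.rstrip(b"\0").decode("latin-1"),
        })
    return rows


data = pack_record("BUNDLE-0001/DOC-03", 17, 1) + \
       pack_record("BUNDLE-0001/DOC-03", 17, 2, "stained")
for row in read_control_file(data):
    print(row)
```

Each dictionary corresponds to one image and keeps its link to a position in the hierarchy, so the rows can be inserted into a relational table without further parsing.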

24
  • Migration of images of the Archivo de Indias from
    10,600 optical disks to 6,000 CD-Rs
  • The images of a bundle are stored on 1 or 2 CD-Rs
  • Reading of optical disks through the network
  • No direct connectivity between optical disks and
    Windows NT
  • Main operation tasks
  • Decompression of the DPCM format
  • Recompression in JPEG format
  • Temporary storage on magnetic disk
  • All images of the bundle are copied to CD-R
  • Verification of images by reading them back
  • 6,000 CD-Rs, plus 6,000 CD-Rs as a backup copy
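
The operation tasks listed above can be sketched as a small per-bundle pipeline: transcode each image, stage it on magnetic disk, copy it to the target medium, and verify the copy by reading it back. This is a minimal sketch, not the project's migration software. The function names are hypothetical, `transcode` is a caller-supplied stand-in for "DPCM decode plus JPEG encode", and the checksum comparison is an assumed way of implementing "verification by reading".

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Checksum of a file, obtained by reading it back from its medium."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def migrate_bundle(source_dir, staging_dir, target_dir, transcode):
    """Migrate every image of one bundle; return the names that failed verification."""
    staging_dir, target_dir = Path(staging_dir), Path(target_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    target_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file():
            continue
        staged = staging_dir / src.name
        staged.write_bytes(transcode(src.read_bytes()))  # temporary magnetic-disk copy
        copied = target_dir / src.name
        shutil.copyfile(staged, copied)                  # write to the target medium
        if sha256(copied) != sha256(staged):             # verify by reading back
            failed.append(src.name)
    return failed


# Tiny demonstration with a placeholder transcoder (it just reverses the bytes).
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "bundle"
    bundle.mkdir()
    (bundle / "0001.img").write_bytes(b"page one")
    (bundle / "0002.img").write_bytes(b"page two")
    problems = migrate_bundle(bundle, Path(tmp) / "staging", Path(tmp) / "cdr",
                              transcode=lambda raw: raw[::-1])
    print("failed images:", problems)
```

Returning the list of failed names, rather than stopping at the first error, lets an operator recover those images from alternative media afterwards.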

25
  • Migration of images from 6,916 IBM WORM disks to
    CD-Rs
  • Typically 4 WORM disks (200 MB each) onto 1 or 2
    CD-Rs

Diagram: IBM disks to CD-R
  • Reading side: Microchannel IBM PS/2 with IBM optical drives, file system driver for OS/2, OS/2 1.3 and LAN Server, Token-Ring Microchannel card
  • Network: Token Ring
  • Writing side: Pentium PC with Windows NT, Token-Ring PCI card, 3 GB disk, SCSI interface, CD-R drives
26
  • Migration of images from 3,732 Plasmon WORM disks to
    CD-Rs
  • 1 Plasmon WORM disk (940 MB) onto 1 or 2 CD-Rs

Diagram: Plasmon disks to CD-R
  • Reading side: i486 PC with Plasmon drives, SCSI interface, file system driver for OS/2, OS/2 3.0, Ethernet card
  • Network: Ethernet with hub
  • Writing side: Pentium PC with Windows NT, Token-Ring PCI card, 3 GB disk, SCSI interface, CD-R drives
27
  • Migration of images of the Archivo de Indias from
    10,600 optical disks to 6,000 CD-Rs
  • Requirements of personnel and time
  • 3 operators for 4 months
  • Similar migration schemes with fewer images
  • Sancho el Sabio Library (Vitoria): 1,000,000
    images
  • University of Salamanca: 700,000 images
  • Archivo General Militar, Segovia: 200,000
    images
  • Archivo del Monasterio Poblet: 100,000 images

28
  • 3. Projects from 1996 to the present
  • Oracle database
  • Access and digitization workstations on PCs
    running Windows NT through Windows XP
  • Images can also be captured, with their metadata,
    using standard programs
  • Images stored on magnetic disks, with CD-ROMs as backup
  • Metadata in database: scanning operator, date of
    creation, signature (call number), path, size in bytes;
    data for control of the information
  • Metadata in image: resolution, dimensions; data
    for presentation on computers and for printing
  • Image quality
  • 200-300 dpi, 256 gray levels
  • Color images
  • Standard formats
  • TIFF, CCITT Group IV
  • JPEG, PDF,

29
Example metadata in database
Modes of Image Display
Management of Image Access
30
Example metadata XML File
  • Same functionality as the binary control file
  • Standard: virtually any program can import this
    metadata
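
A minimal sketch of what such an XML metadata file could look like, and of how easily a standard library reads it back. The element and attribute names (`bundle`, `image`, `document`, `notes`) and the sample values are hypothetical; the slides do not show the actual schema.

```python
import xml.etree.ElementTree as ET


def to_xml(bundle_call_number, images):
    """Serialize the per-image metadata of one bundle as an XML string."""
    root = ET.Element("bundle", callNumber=bundle_call_number)
    for img in images:
        node = ET.SubElement(root, "image", file=img["file"], page=str(img["page"]))
        ET.SubElement(node, "document").text = img["document"]
        if img.get("notes"):
            ET.SubElement(node, "notes").text = img["notes"]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the metadata back into plain dictionaries."""
    root = ET.fromstring(text)
    return [
        {
            "file": node.get("file"),
            "page": int(node.get("page")),
            "document": node.findtext("document"),
            "notes": node.findtext("notes") or "",
        }
        for node in root.iter("image")
    ]


images = [
    {"file": "0001.jpg", "page": 1, "document": "DOC-03", "notes": ""},
    {"file": "0002.jpg", "page": 2, "document": "DOC-03", "notes": "stained"},
]
xml_text = to_xml("BUNDLE-0001", images)
assert from_xml(xml_text) == images   # nothing is lost in the round trip
print(xml_text)
```

Unlike a binary control file, this text can be parsed by any XML-aware tool without knowledge of a private record layout.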

31
  • Migration of the Archivo de Indias from CD-R to
    magnetic disk in 2000
  • Project for online access and Internet
  • A plain copy: the images are already
    JPEG-compressed
  • 10 RAID cabinets of 350 GB each (8 disks × 50 GB)
  • 1 operator was required for 1 month to copy
    from a CD-ROM tower to magnetic disks
  • Transfer rate from different media
  • Media | Transfer rate | Image | Bundle
  • IBM optical disk | 60 KB/s | 6 seconds | 4 hours
  • Plasmon optical disk | 100 KB/s | 3 seconds | 1 hour
  • CD-R 16x | 2.5 MB/s | <1 second | 5 minutes
  • Magnetic disk | 80 MB/s | - | 1 minute
  • Similar Migrations
  • Sancho el Sabio Library (Vitoria): 1 million images
  • Zabalburu Library: 700,000 images
  • Military Archives: 500,000 images
  • Archivo General Navarra: 600,000 images
  • Komintern Archives (Moscow): 1 million images
  • ........
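
The per-image and per-bundle times in the transfer rate table follow roughly from dividing size by rate. The sketch below assumes the 350 KB compressed page and the 700 MB bundle quoted earlier in the talk, and decimal units (1 MB = 1,000 KB). Real transfers also include seek, mount and network overhead, and JPEG files may be smaller than the original DPCM ones, so the results are order-of-magnitude checks and do not reproduce every figure on the slide.

```python
IMAGE_KB = 350            # compressed A4 page, from the storage slide
BUNDLE_KB = 700 * 1000    # about 2,000 images per bundle

RATES_KB_PER_S = {
    "IBM optical disk": 60,
    "Plasmon optical disk": 100,
    "CD-R 16x": 2_500,
    "Magnetic disk": 80_000,
}


def transfer_seconds(size_kb, rate_kb_per_s):
    """Ideal transfer time, ignoring seek, mount and protocol overhead."""
    return size_kb / rate_kb_per_s


for medium, rate in RATES_KB_PER_S.items():
    per_image = transfer_seconds(IMAGE_KB, rate)
    per_bundle_min = transfer_seconds(BUNDLE_KB, rate) / 60
    print(f"{medium:22s} image: {per_image:7.3f} s   bundle: {per_bundle_min:7.1f} min")
```

The ideal figures show the same ordering as the slide: hours per bundle on the first optical disks, minutes on CD-R, and seconds to about a minute on magnetic disk.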

Komintern Archives, Moscow
32
Archivo General de Indias
33
Archivo General de Indias
34
  • Analysis of origin and destination data models
  • Equivalence between the fields of the origin
    and destination models
  • New versions include new metadata not available
    before
  • Development of migration software
  • Testing with a limited number of objects
  • Display of information in a destination card
  • Application of migration to all data
  • Verification of results
  • Correction of errors
  • Sometimes some images cannot be copied and must
    be recovered from alternative media or even
    digitized again
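
The data migration tasks above (field equivalence, new metadata, a trial run, full application, verification and error correction) can be sketched as follows. This is an illustrative outline only: the field names, the default for the new field and the sample records are hypothetical and do not come from the project's data models.

```python
# HYPOTHETICAL equivalence between origin and destination fields.
FIELD_MAP = {"SIGNATURA": "call_number", "FECHA": "date", "TITULO": "title"}
# New metadata that the origin model did not have; filled with a default.
NEW_FIELD_DEFAULTS = {"scanning_operator": None}


def migrate_record(old):
    """Map one origin record to the destination model."""
    missing = [field for field in FIELD_MAP if field not in old]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    new = {dest: old[src] for src, dest in FIELD_MAP.items()}
    new.update(NEW_FIELD_DEFAULTS)
    return new


def migrate_all(records):
    """Migrate every record, collecting errors instead of stopping at the first."""
    migrated, errors = [], []
    for index, record in enumerate(records):
        try:
            migrated.append(migrate_record(record))
        except ValueError as exc:
            errors.append((index, str(exc)))
    return migrated, errors


origin = [
    {"SIGNATURA": "A-1", "FECHA": "1700", "TITULO": "Carta"},
    {"SIGNATURA": "A-2"},                       # incomplete: needs correction
]
sample_ok, sample_errors = migrate_all(origin[:1])   # trial on a limited sample
migrated, errors = migrate_all(origin)               # then the full data set
assert len(migrated) + len(errors) == len(origin)    # verification of results
print(migrated, errors)
```

Running the same function first on a small sample and then on the full set mirrors the testing step, and the error list is what an operator would work through in the correction step.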

Komintern Archives, Moscow
35
  • Preparation of the system for migration
  • Hardware and Basic Software
  • Magnetic disk storage for images
  • PCs with appropriate OS and DB manager
  • Development of software (1 programmer, 2-3 weeks
    of work)
  • Software development for migration
  • Testing of migration of data
  • Operation (usually less than 1 week)
  • Significant operator work when removable media are involved

Komintern Archives, Moscow
36
  • General principles
  • Based on PCs and mainstream commercial equipment
  • Key hardware provided by first class IT companies
  • Database managers of widespread use
  • Consultations with institutions undertaking
    projects
  • Based on standard components and formats, official
    or de facto, such as TIFF, JPEG, XML, etc.
  • Modular, allowing a progressive installation and
    easy update of elements
  • Selection of software
  • Functionalities
  • Number of installations
  • Maintenance
  • Provided by an IT company established in the sector
  • Key factors
  • Server, operating system, database manager
  • Backup policies

37
  • Digitization
  • Capture systems
  • Robust flatbed scanners (A3)
  • Overhead (zenithal) scanners. Digital cameras, with
    limitations.
  • Use of standard compression formats: JPEG,
    CCITT Group IV
  • Ensure that digital images will allow a broad
    range of future use
  • Capture the highest quality image technically
    possible and economically feasible for
    large-scale production
  • Capture the informational content / physical
    appearance
  • Fast and easy correction of errors
  • Criteria for selecting holdings
  • Value
  • Condition
  • Use
  • Acceptability of the digital object
  • Access aids

38
  • Storage
  • Widely used, low-cost media
  • Magnetic disk for online image service
    (especially under high demand)
  • Disks with redundancy
  • Backup on high-capacity tapes (10/20 GB)
  • One or two units available as hot swap
  • This allows migration without operator intervention
  • In a distributed network they may need to be
    stored online in multiple locations
  • CD-R or DVD as backup for offline access in case
    of system failure
  • In general there is little experience in storing
    massive quantities of culturally valuable
    materials
  • Backup and Recovery
  • Use industry standard backup and recovery
    procedures
  • Periodic backup to magnetic tape
  • A copy held on site for near term recovery
  • A copy stored off-site for disaster recovery

39
  • Traditional approach of Computer Science
  • Migration of media
  • Refreshing digital information by copying it from
    medium to medium
  • Conversion of files to a reduced number of
    standard formats that new programs can
    interpret
  • Migration of technology platform
  • Server and PCs
  • Peripherals
  • Capture devices and CD-R writers
  • Operating system and database manager
  • Migration of the digitising and access software
  • Maintenance of software in new platform
  • New software versions for digitising and access

40
  • Planning for migration is difficult because
  • experience is limited
  • we cannot predict when media, software and hardware
    will become obsolete
  • No single strategy applies to all formats of
    digital information
  • It varies across application environments,
    formats of digital materials, and the degree of
    computation, display and retrieval to be
    preserved
  • It requires a unique new solution for each new
    format and process
  • Automatic conversion is only partially possible
  • In general there are no firm plans for migration
    beyond staying up to date with current technologies
    by migrating the content
  • Migration usually involves urgency, driven by
    the obsolescence of software and hardware

41
  • Schedule
  • New releases of software, databases, etc. can be
    expected every 2-3 years, with minor updates more
    often
  • Migration from one storage medium to another every
    4-5 years, if not online
  • Migration to new hardware and software occurs less
    frequently but can be expected every 5-10 years
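
These intervals can be turned into a simple planning calendar. The sketch below takes the upper bound of each range on the slide (3, 5 and 10 years) as the interval, which is an assumption, as are the component labels and the start year used in the example.

```python
# Upper bounds of the ranges given on the slide, in years (an assumption:
# a cautious plan might use the lower bounds instead).
INTERVALS = {"software release": 3, "storage media": 5, "hardware platform": 10}


def plan(start_year, horizon_year, intervals=INTERVALS):
    """Map each year up to the horizon to the migrations expected in that year."""
    timeline = {}
    for component, interval in intervals.items():
        for year in range(start_year + interval, horizon_year + 1, interval):
            timeline.setdefault(year, []).append(component)
    return dict(sorted(timeline.items()))


for year, components in plan(2004, 2014).items():
    print(year, ", ".join(components))
```

For a system installed in 2004, this lists software releases in 2007, 2010 and 2013, a media migration in 2009, and both a media and a platform migration in 2014.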

42
  • Best practices for Digital Preservation
  • Mainstream commercial equipment
  • Use of standard formats
  • Storage in magnetic disk with redundancy
  • Backup policies
  • Maintenance
  • Periodic update policy
  • Hardware
  • Media
  • Basic software
  • Application software