Storage - PowerPoint PPT Presentation

About This Presentation
Title:

Storage

Slides: 15
Provided by: JonD6
Learn more at: https://dlib.indiana.edu

Transcript and Presenter's Notes

Title: Storage


1
Storage
  • Why is storage an issue?
  • Space requirements
  • Persistence
  • Accessibility
  • Needs depend on purpose of storage
  • Capture/encoding
  • Access/delivery
  • Preservation

2
Storage: Working Space
  • Space for storage of digital files during
    capture/encoding/quality control process
  • Possibilities
  • PC hard drive
  • File server, e.g. marengo (LIT)
  • DLP file server
  • Issues
  • Capacity, backup, speed, accessibility

3
Storage: Access/Delivery
  • Storage for web delivery of images, audio, text,
    etc.
  • Possibilities
  • UITS web server, under library account
  • UITS streaming media server (audio/video)
  • DLP web server
  • Issues: capacity, backup, performance, software
    integration, maintenance/migration

4
Storage: Preservation
  • Much harder problem
  • Longer term
  • Issues of longevity of media, hardware, file
    format
  • Where are the files?
  • Larger files
  • Hard disk storage, traditional backup methods not
    cost-effective
  • Infrequency of access
  • Problems do not become immediately evident

5
Long-Term Storage Options
  • Removable media
  • e.g. CD-R, DVD-R
  • Pros: cheap, easy, produces tangible item
  • Cons: low capacity, physical space requirements,
    unknown longevity, migration
  • Nearline storage
  • UITS Massive Data Storage Service

6
UITS MDSS
  • Massive Data Storage Service
  • HPSS (High Performance Storage System) software
  • Developed as collaboration of IBM and US national
    labs
  • Four tape robots (two at IUB, two at IUPUI)
  • Data can be mirrored
  • 540 TB total storage
  • 75 TB used as of April 2001
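
As a quick check on the two capacity figures above, the sketch below works out how much of MDSS was in use as of April 2001. The function and constant names are illustrative and not part of the original slides.

```python
def utilization(used_tb: float, total_tb: float) -> float:
    """Return the fraction of total capacity that is in use."""
    if total_tb <= 0:
        raise ValueError("total capacity must be positive")
    return used_tb / total_tb

TOTAL_TB = 540  # total MDSS storage, from the slide
USED_TB = 75    # used as of April 2001, from the slide

fraction = utilization(USED_TB, TOTAL_TB)
free_tb = TOTAL_TB - USED_TB
print(f"{fraction:.1%} used, {free_tb} TB free")  # prints: 13.9% used, 465 TB free
```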

7
MDSS: A Sense of Scale
  • 2 Kilobytes: a typewritten page
  • 5 Megabytes: complete works of Shakespeare OR
    30 seconds of TV-quality video
  • 1 Gigabyte (1000 MB): 1 pickup truck filled with
    paper OR a symphony in hi-fi sound
  • 1 Terabyte (1000 GB): all the X-ray films in a
    large hospital OR paper from 50,000 trees
  • 10 Terabytes: the printed collection of the US
    Library of Congress
  • 50 Terabytes: the contents of a large mass
    store system
  • 8 Petabytes (8000 TB): all information available
    on the web
  • 200 Petabytes: all the printed material (in the
    world!)

8
MDSS: Storage Infrastructure

9
MDSS
  • Access
  • FTP/PFTP: (Parallel) File Transfer Protocol
  • DFS: Distributed File System (being phased out)
  • HSI
  • Not practical for delivery
  • Hierarchical storage (metadata on disk, data on
    tape -> 30-90 seconds to start transfer)
  • File size: chunks of 50 MB or greater work best
  • Small files aggregated into larger .tar or .zip
    files
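
The aggregation advice above can be sketched as follows: small files are grouped into batches of at least 50 MB, and each batch is written to one uncompressed .tar archive before transfer. This is a minimal sketch, not the procedure MDSS users actually followed; the function names, the `bundle_NNNN.tar` naming, and the use of decimal megabytes are assumptions.

```python
import tarfile
from pathlib import Path

# 50 MB target from the slide; decimal megabytes are an assumption.
MIN_BUNDLE_BYTES = 50 * 1000 * 1000

def plan_bundles(files, min_bytes=MIN_BUNDLE_BYTES):
    """Group (path, size) pairs into batches of at least min_bytes each.

    The final batch may be smaller if the files run out.
    """
    bundles, current, current_size = [], [], 0
    for path, size in files:
        current.append(path)
        current_size += size
        if current_size >= min_bytes:
            bundles.append(current)
            current, current_size = [], 0
    if current:
        bundles.append(current)
    return bundles

def write_bundles(src_dir, out_dir, min_bytes=MIN_BUNDLE_BYTES):
    """Write one uncompressed .tar per batch and return the archive paths.

    out_dir should sit outside src_dir so archives are not re-bundled.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = sorted((p, p.stat().st_size) for p in src.rglob("*") if p.is_file())
    archives = []
    for i, batch in enumerate(plan_bundles(files, min_bytes), start=1):
        archive = out / f"bundle_{i:04d}.tar"  # hypothetical naming scheme
        with tarfile.open(archive, "w") as tar:
            for p in batch:
                # Store paths relative to src_dir so the layout survives.
                tar.add(p, arcname=str(p.relative_to(src)))
        archives.append(archive)
    return archives
```

Calling `write_bundles("scans", "to_mdss")` (both directory names hypothetical) would leave a set of .tar files ready to send over FTP, PFTP, or HSI. Greedy grouping keeps each archive close to the threshold; a single file already larger than 50 MB ends up closing whatever batch it lands in.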

10
DL Objects
  • Digital library objects have many parts
  • Metadata
  • Preservation files
  • Delivery files
  • How do we keep them connected?
  • Now: good practice in file naming, directory
    organization, project documentation - not
    scalable!
  • Future: digital object repository
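
To illustrate the "good practice in file naming" approach, the sketch below derives the expected location of each part of a digital object from a single object ID, so the parts stay connected by convention. The directory names, file extensions, project name, and object ID are all hypothetical; the slides do not specify a layout.

```python
from pathlib import PurePosixPath

# Hypothetical layout: <project>/<subdir>/<object_id>.<ext>
ROLES = {
    "metadata": ("metadata", "xml"),
    "preservation": ("master", "tif"),
    "delivery": ("web", "jpg"),
}

def related_paths(project: str, object_id: str) -> dict:
    """Map each part of a digital object to its expected path."""
    return {
        role: PurePosixPath(project) / subdir / f"{object_id}.{ext}"
        for role, (subdir, ext) in ROLES.items()
    }

paths = related_paths("sample_project", "obj0001")
for role, path in paths.items():
    print(f"{role:13} {path}")
```

The weakness the slide points out is visible here: nothing enforces the convention, so every project and every tool has to agree on it by hand.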

11
Data Persistence
  • Key is migration
  • Keeping the bits alive - MDSS responsibility
  • Physical media
  • Logical media format
  • Keeping the bits understandable - MDSS user
    responsibility
  • File format
  • Metadata
  • Small pockets of digital content pose a problem
    for migration
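
One common way to confirm that bits survived a migration is to compare checksums of the old and new copies. The slides do not mention checksums, so the sketch below is an assumption about how such a check might look, with SHA-256 chosen arbitrarily and hypothetical function names.

```python
import hashlib

def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_migration(old_path, new_path):
    """Return True if both copies have identical contents."""
    return file_digest(old_path) == file_digest(new_path)
```

A matching digest only shows the bits are unchanged; whether the file format is still understandable remains the user's responsibility, as the slide notes.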

12
DL Object Repository
Preservation version in MDSS
Repository System
Users and applications
Delivery version on web server
Metadata records

13
DL Repository Models
  • OAIS: Open Archival Information System Reference
    Model
  • Fedora: Flexible and Extensible Digital Object
    and Repository Architecture
  • Developed at Cornell and UVa
  • IU DLP in deployment group

14
DLP Storage Services
  • Consulting
  • Server space for production and access
  • Persistent naming service (PURL server)
  • Facilitation of access to UITS services
  • Streaming media
  • MDSS
  • Developing repository service
  • Contact: diglib_at_indiana.edu