Preservation Environments - PowerPoint PPT Presentation

Slides: 12
Provided by: arunj9
Transcript and Presenter's Notes


1
Preservation Environments
  • Reagan W. Moore
  • San Diego Supercomputer Center
  • http://www.npaci.edu/DICE/

2
Distributed Data Management Using Storage Resource Broker
  • Data collecting
      • Sensor systems, object ring buffers and portals
  • Data organization
      • Collections, manage data context
  • Data sharing
      • Data grids, manage heterogeneity
  • Data publication
      • Digital libraries, support discovery
  • Data preservation
      • Persistent archives, manage technology evolution
  • Data analysis
      • Processing pipelines, manage knowledge extraction

3
Persistent Archives
  • Implements abstractions required to manage
    technology evolution
  • Logical name space - location independent naming
  • Storage abstraction - replicate data onto any
    storage system
  • Information Repository abstraction - manage
    preservation attributes in multiple databases
  • Access abstraction - support multiple access
    mechanisms
  • Authenticity mechanisms
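
The logical name space and storage abstraction listed above can be illustrated with a minimal sketch. This is not SRB code: the class names, paths, and storage system names below are hypothetical, chosen only to show how a location-independent name can stay fixed while the physical copies behind it move to newer storage.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a digital entity (hypothetical record layout)."""
    storage_system: str   # name of the system holding the copy
    physical_path: str    # path that is meaningful only to that system


@dataclass
class LogicalNameSpace:
    """Maps location-independent logical names to physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        # A logical name may point at any number of replicas.
        self.entries.setdefault(logical_name, []).append(replica)

    def migrate(self, logical_name, old_system, new_replica):
        # Drop copies on the retired system and add the new copy.
        # The logical name that users see does not change.
        kept = [r for r in self.entries[logical_name]
                if r.storage_system != old_system]
        self.entries[logical_name] = kept + [new_replica]

    def resolve(self, logical_name):
        return list(self.entries[logical_name])


ns = LogicalNameSpace()
ns.register("/archive/report-001", Replica("tape-silo", "/vol7/a91f.dat"))
ns.register("/archive/report-001", Replica("disk-farm", "/d3/report-001"))

# Technology evolution: the tape silo is retired, its copy moves elsewhere.
ns.migrate("/archive/report-001", "tape-silo", Replica("new-tape", "/t1/0001"))

systems = [r.storage_system for r in ns.resolve("/archive/report-001")]
print(systems)
```

Anything that refers to "/archive/report-001" keeps working after the migration, which is the point of separating logical names from physical locations.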

4
SDSC Storage Resource Broker Meta-data Catalog
[Architecture diagram; recoverable labels, listed from the top layer down]
  • Application
  • Access APIs: C, C++ Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; OAI, WSDL; GridFTP
  • Consistency Management / Authorization-Authentication
  • Prime Server
  • Logical Name Space, Latency Management, Data Transport, Metadata Transport
  • Storage Abstraction
  • Catalog Abstraction
  • Databases: DB2, Oracle, Sybase, SQLServer, Informix
  • Servers
  • HRM

5
Current Implementations
  • NSF National Science Digital Library
      • Persistent archive of material retrieved from web crawls
      • Map each digital entity onto the logical name space
      • Keep the original URL as an attribute
  • California Digital Library
      • Snapshot of federal agency web sites
      • Applied same technology (Storage Resource Broker)
      • Preserved 17 million digital entities, 1.5 TB of data
      • Retrieved 233 different MIME types
      • Observed retrieval error rate of 3.68% (mostly file not found)
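
The web-crawl ingestion described on this slide (map each entity onto the logical name space, keep the original URL as an attribute, tally MIME types and retrieval errors) might be sketched as follows. The function name, the collection path, the logical naming scheme, and the sample URLs are all assumptions made for illustration; the slide does not say how the actual system names entities or counts errors.

```python
from collections import Counter


def ingest(crawl_results, collection="/crawl/snapshot-1"):
    """Register each successful retrieval under a logical name.

    crawl_results is a list of (url, mime_type, ok) tuples, a hypothetical
    input format. Returns the catalog and the error rate in percent.
    """
    catalog = {}
    errors = 0
    for i, (url, mime_type, ok) in enumerate(crawl_results):
        if not ok:
            errors += 1          # e.g. file not found
            continue
        # The logical name is independent of where the entity came from;
        # the original URL survives only as a preservation attribute.
        logical_name = f"{collection}/entity-{i:06d}"
        catalog[logical_name] = {"original_url": url, "mime_type": mime_type}
    error_rate = 100.0 * errors / len(crawl_results) if crawl_results else 0.0
    return catalog, error_rate


results = [
    ("http://example.gov/index.html", "text/html", True),
    ("http://example.gov/report.pdf", "application/pdf", True),
    ("http://example.gov/missing.html", None, False),
    ("http://example.gov/logo.gif", "image/gif", True),
]
catalog, error_rate = ingest(results)
mime_counts = Counter(entry["mime_type"] for entry in catalog.values())
print(len(catalog), len(mime_counts), error_rate)
```

On this toy input, three entities are preserved with three distinct MIME types, and one of four retrievals fails.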

6
Current Implementations
  • NARA Prototype Persistent Archive
      • Distributed between NARA, U Maryland, SDSC
      • Built using Grid Bricks, Storage Resource Broker
      • Migrated the EAP image collection (1.2 TB) from obsolete system
  • NHPRC Persistent Archive Testbed
      • Proposed system to federate archives between:
          • U. Michigan (electronic records)
          • Ohio Historical Society (e-mail)
          • Kentucky State Department for Libraries and Archives (web collection)
          • Minnesota Historical Society (land use records)
          • Stanford Linear Accelerator (electronic documents)

7
Storage Resource Broker at SDSC
50 Terabytes and Counting
8
Persistent Archives Built from Grid Bricks
  • Grid Brick
      • Commodity-based disk storage
      • Minimize cost by using data grid replication for reliability
  • Grid Brick management via data grid
      • Logical name space for unifying access across multiple grid bricks
      • User authentication through data grid
      • User authorization through data grid
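
The idea of getting reliability from data grid replication rather than from expensive hardware can be sketched as below. This is a toy model under stated assumptions: the GridBrick and BrickGrid classes, the round-robin placement, and the default of two copies are hypothetical and are not taken from SRB.

```python
class GridBrick:
    """Stand-in for one commodity disk server (hypothetical interface)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.files = {}

    def put(self, key, data):
        self.files[key] = data

    def get(self, key):
        if not self.online:
            raise OSError(f"{self.name} is offline")
        return self.files[key]


class BrickGrid:
    """One logical name space over several bricks, with replication."""

    def __init__(self, bricks, copies=2):
        # copies must not exceed the number of bricks.
        self.bricks = bricks
        self.copies = copies
        self.locations = {}   # logical name -> names of bricks with a copy

    def store(self, logical_name, data):
        # Simple round-robin placement (an assumption for this sketch).
        start = len(self.locations) % len(self.bricks)
        chosen = [self.bricks[(start + i) % len(self.bricks)]
                  for i in range(self.copies)]
        for brick in chosen:
            brick.put(logical_name, data)
        self.locations[logical_name] = [b.name for b in chosen]

    def read(self, logical_name):
        # Try each replica in turn; a failed brick is skipped.
        by_name = {b.name: b for b in self.bricks}
        for name in self.locations[logical_name]:
            try:
                return by_name[name].get(logical_name)
            except OSError:
                continue
        raise OSError("no replica available")


bricks = [GridBrick("brick-1"), GridBrick("brick-2"), GridBrick("brick-3")]
grid = BrickGrid(bricks)
grid.store("/archive/img-0001", b"scan")

bricks[0].online = False                 # one commodity brick fails
data = grid.read("/archive/img-0001")    # served from the second replica
print(data)
```

The caller only ever uses the logical name; which brick answers is decided inside the grid.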

9
Data Grid Brick
  • Components
      • Intel Celeron 1.7 GHz CPU
      • SuperMicro P4SGA PCI Local bus ATX mainboard
      • 1 GB memory (266 MHz DDR DRAM)
      • 3Ware Escalade 7500-12 port PCI bus IDE RAID
      • 10 Western Digital Caviar 200-GB IDE disk drives
      • 3Com Etherlink 3C996B-T PCI bus 1000Base-T
      • Redstone RMC-4F2-7 4U ten bay ATX chassis
      • Linux operating system
  • Cost is $2,200 per Tbyte plus tax
  • Gig-E network switch costs $500 per brick
  • Effective cost is about $2,700 per Tbyte
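
The cost figures on this slide can be checked with a little arithmetic. The slide does not state how much usable capacity one brick provides, so the calculation below only works out what the quoted numbers imply; the decimal conversion of 1000 GB per Tbyte is an assumption.

```python
# Figures taken from the slide.
drives = 10
drive_gb = 200
cost_per_tb = 2200          # hardware cost per Tbyte
switch_per_brick = 500      # Gig-E switch cost per brick
effective_per_tb = 2700     # quoted effective cost per Tbyte

# Raw capacity of one brick, assuming 1000 GB per Tbyte.
raw_tb = drives * drive_gb / 1000

# Capacity per brick implied by the quoted effective cost:
# effective = cost_per_tb + switch_per_brick / capacity
implied_tb_per_brick = switch_per_brick / (effective_per_tb - cost_per_tb)

# Effective cost if the switch were spread over the full raw capacity.
effective_if_raw = cost_per_tb + switch_per_brick / raw_tb

print(raw_tb, implied_tb_per_brick, effective_if_raw)
```

The quoted effective cost is consistent with charging the switch against about 1 Tbyte per brick, while the ten drives add up to 2 Tbytes raw. Whether the gap reflects RAID redundancy, rounding, or something else is not stated in the source.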

10
SDSC SRB Team - Data R Us :-)
  • Camera-shy
      • Wayne Schroeder
      • Vicky Rowley (BIRN)
      • Lucas Gilbert
      • Marcio Faerman (SCEC)
  • Students emeritus
      • Erik Vandekieft
      • Reena Mathew
      • Xi (Cynthia) Sheng
      • Allen Ding
      • Grace Lin
      • Qiao Xin
      • Antoine De Torcy
      • Daniel Moore

11
For More Information
  • Reagan W. Moore
  • San Diego Supercomputer Center
  • moore@sdsc.edu
  • http://www.npaci.edu/DICE/
  • http://www.npaci.edu/DICE/SRB/
  • http://www.npaci.edu/dice/srb/mySRB/mySRB.html