Storage Resource Broker Persistent Management of Distributed Data - PowerPoint PPT Presentation

Provided by: greggr8
Learn more at: http://www.sdsc.edu

1
Storage Resource Broker
Persistent Management of Distributed Data
Reagan W. Moore
General Atomics, Inc.
San Diego Supercomputer Center
moore@sdsc.edu
http://www.nirvanastorage.com
2
Topics
  • Data management systems
    • Data collections, digital libraries
  • Distributed data management
    • Data grids
  • Persistent data management
    • Persistent archives
  • Common infrastructure for data management

3
Data Collections
  • Astronomy
    • CACR Computing Resource (NPACI)
    • National Virtual Observatory (NSF)
    • 2 Micron All Sky Survey (NPACI)
    • DPOSS Collection (NSF-NVO)
    • Hayden Planetarium
  • Ecology and Environmental Sciences
    • CEED (NPACI)
    • Bionome
    • HyperLTER (NPACI)
    • Land Data Assimilation System
    • Knowledge Networks for BioComplexity (NSF)
  • Medical Sciences
    • Digital Embryo (NLM)
  • Molecular Sciences
    • JCSG, Synchrotron Data Repository (NSF)
    • AFCS, Alliance for Cell Signaling (NIH)
  • NeuroSciences
    • Biomedical Information Research Network (NIH)

4
Data Collections
  • Physics and Chemistry
    • PPDG, Particle Physics Data Grid (DOE)
    • GriPhyN (NSF)
    • BaBar (DOE)
    • GAMESS (NPACI)
  • Digital Libraries and Archives
    • SIO Digital Libraries (NSF)
    • California Digital Library
    • ADEPT (NSF)
    • Stanford Digital Library Project (NSF)
    • National Archives and Records Administration
      (NARA)
  • Data Grids
    • ROADNet, Real-time Observatories, Applications,
      and Data management Network
    • E-Science at CLRC, UK Grid Starter Kit (UK)
    • Library of Congress data grid
    • DOE ASCI Data Visualization Corridor
    • NASA Information Power Grid
    • DOE SciDAC - Portal Web Services
    • NPACI Portal Projects

5
Data Collections
  • Define the context for describing a collection of
    digital entities
  • Context specified by metadata attributes
    • Provenance, origin of the digital entities
    • Administrative, location of the digital entities
    • Technical, purpose of the digital entities
  • Support organization of attributes as a hierarchy
    of sub-collections

6
Digital Libraries
  • Provide services on the data collection
    • Ingestion, loading of attribute values
    • Extensibility, definition of new attributes
    • Discovery, queries on attributes
    • Browsing, hierarchical listing
    • Presentation, formatting according to specified
      data models
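The collection services above can be illustrated with a minimal Python sketch, assuming a collection is an in-memory tree of sub-collections whose digital entities carry free-form attribute dictionaries. The class `Collection`, its methods, and the sample entity names are hypothetical, chosen only to show ingestion, extensibility, discovery, and browsing; they are not the SRB or MCAT interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection node: sub-collections plus entities with attributes."""
    name: str
    children: dict = field(default_factory=dict)  # sub-collection name -> Collection
    entities: dict = field(default_factory=dict)  # entity name -> attribute dict

    def subcollection(self, name):
        # Create the sub-collection on first use, then return it.
        return self.children.setdefault(name, Collection(name))

    def ingest(self, entity, **attrs):
        # Ingestion and extensibility: any new attribute name is accepted as-is.
        self.entities[entity] = dict(attrs)

    def browse(self, prefix=""):
        # Browsing: depth-first hierarchical listing of paths.
        path = f"{prefix}/{self.name}"
        listing = [path] + [f"{path}/{e}" for e in sorted(self.entities)]
        for child in sorted(self.children):
            listing += self.children[child].browse(path)
        return listing

    def discover(self, prefix="", **query):
        # Discovery: paths of entities whose attributes match every query term.
        path = f"{prefix}/{self.name}"
        hits = [f"{path}/{e}" for e, attrs in sorted(self.entities.items())
                if all(attrs.get(k) == v for k, v in query.items())]
        for child in sorted(self.children):
            hits += self.children[child].discover(path, **query)
        return hits


root = Collection("survey")
plates = root.subcollection("plates")
plates.ingest("p001.fits", band="J", year=1998)
plates.ingest("p002.fits", band="K", year=1999)
print(root.browse())
print(root.discover(band="K"))  # ['/survey/plates/p002.fits']
```

Keeping attributes as plain dictionaries is what makes "extensibility" trivial here; a real catalog would also need attribute definitions and types.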

7
Data Grids
  • Manage data in a distributed environment
    • Logical name space, provide a global identifier
    • Data access, storage system abstraction
    • Replication, disaster backup
    • Uniform access, common API across file systems,
      archives, and databases
    • Single sign-on, authenticate across
      administrative domains
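A logical name space combined with replication can be sketched as a map from one global name to several physical copies, with reads falling back to the next copy when a storage system is unavailable. This is a minimal illustration under the assumption of in-memory toy stores; `Store`, `LogicalNameSpace`, `StorageUnavailable`, and all paths are hypothetical names, not SRB calls.

```python
class StorageUnavailable(Exception):
    """Raised when a storage system cannot serve a request."""


class Store:
    """Toy storage system: local file name -> bytes, with an on/off switch."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.online = True

    def read(self, local_name):
        if not self.online:
            raise StorageUnavailable(self.name)
        return self.files[local_name]


class LogicalNameSpace:
    """Global logical name -> ordered list of (store, local name) replicas."""

    def __init__(self):
        self.replicas = {}

    def put(self, logical_name, data, targets):
        # Replication: write the same bits to every target storage system.
        for store, local_name in targets:
            store.files[local_name] = data
        self.replicas[logical_name] = list(targets)

    def get(self, logical_name):
        # Try each replica in order; skip storage systems that are down.
        for store, local_name in self.replicas[logical_name]:
            try:
                return store.read(local_name)
            except StorageUnavailable:
                continue
        raise StorageUnavailable(f"no reachable replica of {logical_name}")


disk, archive = Store("disk-cache"), Store("tape-archive")
ns = LogicalNameSpace()
ns.put("/home/survey/p001.fits", b"plate bits",
       [(disk, "/scratch/p001"), (archive, "/vol7/000042")])
disk.online = False                      # simulate losing the primary copy
print(ns.get("/home/survey/p001.fits"))  # still served from the archive replica
```

The caller only ever uses the logical name, so the loss of one storage system is invisible as long as any replica survives.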

8
Persistent Archives
  • Manage technology evolution
    • Storage system abstraction, support data
      migration across storage systems
    • Information repository abstraction, support
      catalog migration to new databases
    • Logical name space, support a global persistent
      identifier
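Data migration under a persistent identifier can be sketched as: copy the bits to the new storage system, verify them, repoint the catalog, then retire the old copy, while the logical name never changes. This is a minimal sketch assuming plain dictionaries as the catalog and storage systems; the SHA-256 check and every name below (`migrate`, `checksum`, the paths) are assumptions added for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def migrate(catalog, storage, logical_name, new_system, new_path):
    """Move one digital entity to a new storage system without renaming it.

    catalog: logical name -> (storage system, local path, checksum)
    storage: storage system name -> {local path: bytes}  (toy stand-in)
    """
    old_system, old_path, digest = catalog[logical_name]
    data = storage[old_system][old_path]
    storage.setdefault(new_system, {})[new_path] = data
    # Verify the copy before the catalog is repointed.
    if checksum(storage[new_system][new_path]) != digest:
        raise OSError(f"checksum mismatch while migrating {logical_name}")
    catalog[logical_name] = (new_system, new_path, digest)
    del storage[old_system][old_path]  # retire the copy on the old system


storage = {"old-tape": {"/vol1/rec1": b"record bits"}}
catalog = {"/archive/rec1": ("old-tape", "/vol1/rec1", checksum(b"record bits"))}
migrate(catalog, storage, "/archive/rec1", "new-disk", "/pool/rec1")
print(catalog["/archive/rec1"][:2])  # ('new-disk', '/pool/rec1')
```

Ordering matters: the catalog is updated only after the new copy is verified, so a failed migration leaves the entity reachable at its old location.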

9
SRB
  • Integration of collection-based management of
    digital entities, with
    • Remote data access through storage system
      abstraction
    • Catalog access through information repository
      abstraction
    • Automation through collection-owned data

Storage Resource Broker
10
Capabilities
  • Support legacy systems
  • Integrate archives with file systems
  • Share distributed data
  • Maintain persistent collection
  • Control data access

11
Digital Entities
  • Digital entities are images of reality made of:
    • Data, the bits (zeros and ones) put on a storage
      system
    • Information, the attributes used to assign
      semantic meaning to the data
    • Knowledge, the structural relationships described
      by a data model
  • Every digital entity requires information and
    knowledge to be correctly interpreted and displayed

12
Digital Entities
  • Files
    • Text documents, images, spreadsheets, binary
      files
  • URLs
  • Database query commands
  • Databases
  • Directories

13
Digital Entities
  • Register digital entities into a catalog
  • Assign metadata to describe each digital entity
  • Separate management of the associated data bits
    from management of the metadata
  • Support manipulation of each digital entity data
    type
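Registration that keeps metadata apart from the data bits, with type-specific handling, can be sketched as a catalog entry recording only where and how to reach an entity. This minimal sketch assumes two entity types, a file and a database query command (using Python's built-in `sqlite3` as a stand-in database); `Registry`, `Entry`, and the sample table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Catalog entry: how to reach the bits, kept apart from descriptive metadata."""
    kind: str                 # "file" or "sql" in this sketch
    target: str               # local path, or query text
    metadata: dict = field(default_factory=dict)


class Registry:
    def __init__(self, db):
        self.db = db          # connection to a registered database
        self.entries = {}     # logical name -> Entry

    def register(self, logical_name, kind, target, **metadata):
        # Registration records where the entity lives; the bits are not copied.
        self.entries[logical_name] = Entry(kind, target, dict(metadata))

    def fetch(self, logical_name):
        # Dispatch on the entity type so each type is manipulated correctly.
        entry = self.entries[logical_name]
        if entry.kind == "file":
            with open(entry.target, "rb") as f:
                return f.read()
        if entry.kind == "sql":
            return self.db.execute(entry.target).fetchall()
        raise ValueError(f"unsupported entity type: {entry.kind}")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE plates (id TEXT, band TEXT)")
db.executemany("INSERT INTO plates VALUES (?, ?)", [("p001", "J"), ("p002", "K")])
registry = Registry(db)
registry.register("/survey/j-band", "sql",
                  "SELECT id FROM plates WHERE band = 'J'", creator="survey team")
print(registry.fetch("/survey/j-band"))  # [('p001',)]
```

Here the "data" of the registered entity is the result of a query, which is why the catalog stores the query text rather than a copy of any bits.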

14
Technology Management
[Diagram labels: New Application, New Operating System, Wrap Storage System, Wrap Display System, Old Storage System, Old Display System, Migrate Encoding Format, Digital Object]
15
Preservation of Data
  • Migration
    • Preserve the data bits
    • Preserve the digital entity name
    • Preserve the information and knowledge content
      for presentation by new applications

16
Migration Advantages
  • By migrating the digital entity encoding format
    to new standards, more sophisticated technologies
    can be applied to express the information and
    knowledge content inherent in collections of
    digital entities.
  • Requires the ability to associate a data model
    with each digital entity

17
Uniform API
  • Provide common access semantics
  • Map from the interface preferred by your
    application to the interfaces required by legacy
    storage systems
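The mapping from one application interface to many storage systems can be sketched as an abstract driver interface with one implementation per storage type. This minimal sketch assumes a local file system and a toy tape archive that stages reads into a cache; `StorageDriver`, `FileSystemDriver`, `ArchiveDriver`, and `copy` are hypothetical names, not the SRB client API.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Common access semantics that every storage system driver provides."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class FileSystemDriver(StorageDriver):
    """Maps the common API onto a local file system below `root`."""

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()


class ArchiveDriver(StorageDriver):
    """Stand-in for a tape archive: reads are staged into a disk cache first."""

    def __init__(self):
        self.tape = {}
        self.cache = {}

    def write(self, path, data):
        self.tape[path] = data
        self.cache.pop(path, None)   # drop any stale cached copy

    def read(self, path):
        if path not in self.cache:   # simulate staging from tape
            self.cache[path] = self.tape[path]
        return self.cache[path]


def copy(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """Application logic written once against the uniform API."""
    dst.write(path, src.read(path))


with tempfile.TemporaryDirectory() as tmp:
    fs, archive = FileSystemDriver(tmp), ArchiveDriver()
    fs.write("/survey/p001", b"plate bits")
    copy(fs, archive, "/survey/p001")
    print(archive.read("/survey/p001"))  # b'plate bits'
```

The `copy` function never learns which kind of storage it is talking to; adding a new storage system means writing one more driver, not changing applications.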

18
SRB and MCAT: Uniform APIs
[Diagram: SRB and MCAT architecture layers]
Application
Access APIs: Linux I/O, Web WSDL, DLL / Python, Java, NT Browsers, GridFTP
Consistency Management / Authorization-Authentication
Prime Server: Logical Name Space, Latency Management, Data Transport, Metadata Transport
Storage Abstraction; Catalog Abstraction
Databases (DB2, Oracle, Sybase); HRM; Servers
19
Discovery Transparencies
  • Naming transparency - find a data set without
    knowing its name
    • Map from attributes to a global file name
  • Location transparency - access a data set without
    knowing where it is
    • Map from global file name to local file name
  • Access transparency - access a data set without
    knowing the type of storage system
    • Federated client-server architecture
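The three transparencies chain together: attributes resolve to a global name, the global name resolves to a storage resource and local file name, and the resource type selects an access method. This minimal sketch uses hypothetical in-memory tables and drivers that only describe the access they would perform rather than doing real I/O; all names and paths are illustrative assumptions.

```python
# Hypothetical catalog contents for the sketch.
ATTRIBUTES = {  # naming: global name -> descriptive attributes
    "/home/nvo/2mass/p001.fits": {"survey": "2MASS", "band": "J"},
    "/home/nvo/dposs/f123.fits": {"survey": "DPOSS", "band": "R"},
}
LOCATIONS = {  # location: global name -> (storage resource, local file name)
    "/home/nvo/2mass/p001.fits": ("unix-fs", "/scratch/2mass/p001.fits"),
    "/home/nvo/dposs/f123.fits": ("hpss", "/DPOSS/f123"),
}
DRIVERS = {  # access: storage resource type -> access method (descriptive only)
    "unix-fs": lambda local: f"POSIX read of {local}",
    "hpss": lambda local: f"archive stage and read of {local}",
}


def find(**query):
    """Naming transparency: attributes -> global file names."""
    return [name for name, attrs in ATTRIBUTES.items()
            if all(attrs.get(k) == v for k, v in query.items())]


def access(global_name):
    """Location and access transparency: global name -> local name -> driver."""
    resource, local_name = LOCATIONS[global_name]
    return DRIVERS[resource](local_name)


for name in find(survey="DPOSS"):
    print(access(name))  # archive stage and read of /DPOSS/f123
```

Each table can change independently: moving a file updates only `LOCATIONS`, and supporting a new storage type adds only an entry to `DRIVERS`.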

20
SRB and MCAT: Transparencies
[Diagram: the SRB and MCAT architecture shown on slide 18, with the same layers and labels.]
21
Persistent Collection
  • Maintain authenticity
    • Authenticate all accesses
    • Assign roles for access control lists (curation,
      write, annotate, read)
    • Manage audit trails of all operations
  • Collection-owned data
    • All accesses through the data management system
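Role-based access control with an audit trail can be sketched as a lookup of each user's role on a logical name, followed by an append-only log entry for every attempt, allowed or not. This minimal sketch covers authorization only (authentication is assumed to have happened already), and it assumes the four roles are ordered so each includes the rights below it; that ordering, the operation table, and every name here are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: roles are ordered, each including the rights of those below it.
ROLE_RANK = {"read": 1, "annotate": 2, "write": 3, "curation": 4}
REQUIRED_ROLE = {"read": "read", "annotate": "annotate",
                 "write": "write", "delete": "curation"}


class PersistentCollection:
    def __init__(self):
        self.acl = {}    # (user, logical name) -> role
        self.audit = []  # append-only trail of every attempted operation

    def grant(self, user, logical_name, role):
        # Administrative setup; a fuller design would require the curation role here.
        self.acl[(user, logical_name)] = role

    def authorize(self, user, operation, logical_name):
        role = self.acl.get((user, logical_name))
        allowed = (role is not None
                   and ROLE_RANK[role] >= ROLE_RANK[REQUIRED_ROLE[operation]])
        # Record the attempt whether or not it succeeds.
        self.audit.append((datetime.now(timezone.utc).isoformat(),
                           user, operation, logical_name, allowed))
        return allowed


pc = PersistentCollection()
pc.grant("alice", "/nara/record-1", "curation")
pc.grant("bob", "/nara/record-1", "annotate")
print(pc.authorize("bob", "read", "/nara/record-1"))      # True
print(pc.authorize("bob", "write", "/nara/record-1"))     # False
print(pc.authorize("alice", "delete", "/nara/record-1"))  # True
print(len(pc.audit))                                      # 3
```

Because the collection owns the data, routing every request through `authorize` is what makes the audit trail complete.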

22
SRB and MCAT: Persistency
[Diagram labels: Access APIs, Prime Server, Servers]
23
Preservation
  • Name transparency
    • Find a file by attributes (map from attributes to
      global name)
  • Location transparency
    • Access a file by a global identifier (map from
      global to local file name)
  • Access transparency
    • Use the same API to access data in an archive or
      file cache
  • Authenticity
  • Disaster recovery, replicate data across storage
    systems
  • Audit and process management

(Similar requirements to a data grid)
24
SRB and MCAT: Preservation
[Diagram: the SRB and MCAT architecture shown on slide 18, with the same layers and labels.]
25
Technology Convergence
  • Data grids as the basis for distributed data
    management
    • Federation of distributed resources
    • Creation of a logical name space to automate
      discovery
  • Distributed data collections
    • Discovery based on attributes
    • Distributed data storage systems
  • Digital libraries
    • Development of services for manipulating and
      viewing data
  • Persistent archives
    • Management of technology evolution

26
Data Naming Ontologies
27
Knowledge Creation
  • Knowledge syntax (consensus)
    • RDF, XMI, Topic Map
  • Knowledge management (recursive operations)
    • Oracle parallel database
  • Knowledge manipulation (spatial/procedural rules)
    • Generation of inference rules and mapping to data
      models
  • Knowledge generation (scalable inference engine)
    • Application of inference rules in an inference
      engine
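Applying inference rules in an inference engine can be illustrated with naive forward chaining over (subject, relation, object) triples: apply every rule repeatedly until no new facts appear. This is a minimal sketch of that general technique, not the scalable engine the slide calls for; the rule shown (transitivity) and the sample concepts are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply every rule until no new facts appear (a fixed point).

    facts: iterable of (subject, relation, object) triples
    rules: functions mapping the current fact set to inferred triples
    """
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def transitive(relation):
    """Build a rule: (a R b) and (b R c) imply (a R c)."""
    def rule(facts):
        pairs = [(s, o) for s, r, o in facts if r == relation]
        return {(a, relation, c) for a, b in pairs for b2, c in pairs if b == b2}
    return rule


facts = {("neuron", "part_of", "cortex"), ("cortex", "part_of", "brain")}
inferred = forward_chain(facts, [transitive("part_of")])
print(("neuron", "part_of", "brain") in inferred)  # True
```

The pairwise join inside `transitive` is quadratic in the number of matching facts, which is exactly why large knowledge bases need a more scalable engine.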

28
Knowledge-based Data Grid
[Diagram: three layers, each spanning Ingest Services, Management, and Access Services]
Knowledge: Relationships Between Concepts; XTM DTD; Rules - KQL; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
(Model-based Access)
Information: Attributes Semantics; XML DTD; SDLIP; Information Repository; Attribute-based Query
Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); MCAT/HDF; Grids; Data Handling System - SRB; Feature-based Query
29
Reagan W. Moore
General Atomics
San Diego Supercomputer Center
moore@sdsc.edu
http://www.nirvanastorage.com