
1
Storage Resource Broker
A solution for Managing Distributed Data in a Grid
  • Bing Zhu
  • San Diego Supercomputer Center
  • Dec 10, 2002 Korea

SDSC/UCSD/NPACI
2
San Diego Supercomputer Center
  • San Diego Supercomputer Center (SDSC) was founded
    in 1985. It is a research unit of University of
    California at San Diego with a staff of over 400
    scientists, software developers and research
    support personnel.
  • Working with researchers from 350 Institutions
    and 50 industrial partners, SDSC focuses on five
    program areas.
  • Integrative Biosciences
  • Data and Knowledge Systems (SRB, )
  • Environmental Sciences
  • High-end Computing and Communications
  • Grid and Cluster Computing

3
Participating Institutions in PRAGMA
  • Australia Partnership for Advanced Computing and
    its many partners including Monash University and
    Sydney Visualization Lab
  • Bioinformatics Institute of Singapore
  • Computer Network Information Center, CAS
  • Global Scientific Information and Computing
    Center, Tokyo Institute of Technology
  • Grid Technology Research Center and Tsukuba
    Advanced Computing Center, National Institute of
    Advanced Industrial Science and Technology
  • Korea Institute of Science and Technology
    Information
  • National Center for High Performance Computing,
    National Science Council
  • Research Center for Ultra-High Voltage Electron
    Microscopy and the Cybermedia Center, Osaka
    University
  • STAR TAP/StarLight initiative, supported by NSF
    and organized by the University of Illinois at
    Chicago, Northwestern University and Argonne
    National Laboratory
  • Thai Social/Scientific Academic and Research
    Network (ThaiSARN-3), National Electronics and
    Computer Technology Center
  • TransPAC initiative, supported by NSF at Indiana
    University
  • Universiti Sains Malaysia
  • UCSD SDSC, CalIT2, CRBS
  • University of Hyderabad

4
Topics
  • Data Grids, Digital Libraries, Persistent
    Archives
  • Case studies
  • SRB Software Architecture
  • SRB interface
  • Authentication
  • Logical Name Space
  • File Replications
  • Data Container
  • Metadata

5
Digital Libraries
  • Provide services on the data collection
  • Ingestion, loading of attribute values
  • Extensibility, definition of new attributes
  • Discovery, queries on attributes
  • Browsing, hierarchical listing
  • Presentation, formatting specified data models
  • Communities
  • Digital library
  • Global Grid Forum, Databases and the Grid working
    group
  • OMG, Common Warehouse Metamodel

6
Data Grids
  • Manage data in a distributed environment
  • Logical name space, provide global identifier
  • Data access, storage system abstraction
  • Replication, disaster backup
  • Uniform access, common API across file systems,
    archives, and databases
  • Single sign-on, authenticate across
    administration domains
  • Communities
  • Global Grid Forum, data grids
  • Discipline specific data management systems

7
Persistent Archives
  • Manage technology evolution
  • Storage system abstraction, support data
    migration across storage systems
  • Information repository abstraction, support
    catalog migration to new databases
  • Logical name space, support global persistent
    identifier
  • Communities
  • Persistent archive community
  • Global Grid Forum, Persistent archive working
    group

8
SRB: A Data Grid Solution
  • Storage Resource Broker (SRB)
  • Federated client-server system that integrates
    distributed heterogeneous resources using a
    uniform interface
  • Provides a simple tool that integrates data and
    metadata handling, enabling attribute-based access
  • Blends browsing and searching
  • Developed at SDSC
  • Operational for 4 years
  • Brokering over 40 terabytes

9
Case Study: Hayden Planetarium
10
Data involved in Hayden
  • ISM Interstellar Medium Simulation
  • run by Mordecai Mac Low of AMNH at NCSA
  • 2.5 Terabytes sent from NCSA to SDSC.
  • Data stored in SRB (HPSS, GPFS).
  • Ionization
  • Simulation run at AMNH
  • 117 Gigabytes sent from AMNH to SDSC.
  • Data stored in SRB.
  • Star motion
  • Simulation run at AMNH by Ryan Wyatt
  • 38 Megabytes sent from AMNH to SDSC.
  • Rendering Movies
  • Intermediate Steps produced 7.5 Terabytes.
  • Data stored in SRB (SDSC, CalTech)

11
Case Study: SRB in BIRN
[Figure: BIRN Toolkit diagram. Components shown: Applications; Queries/Results; Data Management; Collaboration; Viewing/Visualization; Mediator; GridPort; Grid Management; Data Model; Database; Scheduler; Data Grid; Computational Grid; NMI; MCAT; Globus; SRB; Data Access; HPSS; File System; Distributed Resources]
12
Using a Data Grid - in Abstract
  • User asks for data from the data grid

13
Using a Data Grid - Details
  • User asks for data
  • Data request goes to SRB Server
  • Server looks up data in catalog
  • Catalog tells which SRB server has data
  • 1st server asks 2nd for data
  • The data is found and returned
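The request path above can be sketched as a toy, in-memory Python model. Everything here is hypothetical: `SrbServer`, the `catalog` dictionary (standing in for MCAT), the server names and the paths are illustrations, not the SRB client API. The point is only that the user talks to one server, the catalog resolves the logical name to the server that owns the data, and the first server forwards the request to the second.

```python
from dataclasses import dataclass, field


@dataclass
class SrbServer:
    """Toy stand-in for an SRB server (hypothetical, not the SRB API)."""
    name: str
    local_store: dict = field(default_factory=dict)  # physical path -> bytes
    peers: dict = field(default_factory=dict)        # server name -> SrbServer

    def serve_local(self, physical_path: str) -> bytes:
        # Final hop: read from the storage this server manages.
        return self.local_store[physical_path]

    def get(self, logical_name: str, catalog: dict) -> bytes:
        # Look up the logical name; the catalog says which server has the data.
        owner, physical_path = catalog[logical_name]
        if owner == self.name:
            return self.serve_local(physical_path)
        # Otherwise this (1st) server asks the 2nd server for the data.
        return self.peers[owner].serve_local(physical_path)


# Usage: the user contacts "sdsc", but the bytes live on "caltech".
catalog = {"/home/hayden/frame001": ("caltech", "/data/frame001.raw")}
sdsc = SrbServer("sdsc")
caltech = SrbServer("caltech", local_store={"/data/frame001.raw": b"pixels"})
sdsc.peers["caltech"] = caltech
data = sdsc.get("/home/hayden/frame001", catalog)
```

Because the user only ever names `/home/hayden/frame001`, the data could move to another server by updating one catalog entry, which is the location transparency the later slides describe.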

14
Using a Data Grid - Details
[Figure: a data grid with an MCAT catalog (DB) and six SRB servers]
  • A data grid can have an arbitrary number of servers
  • Complexity is hidden from users

15
SDSC Storage Resource Meta-data Catalog
16
SRB Interface
[Figure: SRB interface diagram. Components shown: Applications; SRB Server (SRB Master, SRB Agent); MCAT (MCAT Core, Dublin Core, Eco Core); additional SRB Servers]
17
Federated SRB Operation
[Figure: Peer-to-peer brokering. A read application in Boston issues a request by logical name or attribute condition; SRB servers and spawned SRB agents in San Diego and Durham provide parallel data access to replicas R1 and R2, in numbered steps 1 to 6. MCAT provides: 1. Logical-to-Physical mapping, 2. Identification of Replicas, 3. Access & Audit Control]
18
SRB Concepts
  • Abstraction of Data and Collections: Virtual
    Data Organization
  • Virtual Collections: Persistent Identifier and
    Global Name Space
  • Organization independent of physical location
  • Virtual Data Management
  • Replication, Segmentation
  • Data Aggregation: Containers
  • Seamless Cache Management and Data Placement
  • Metadata: Data Discovery, semantic linking
  • System Metadata - metadata needed to run a data
    grid
  • User-defined Metadata: Structural, Descriptive
  • Application, Schema-based, Domain-centric
  • extensible and dynamic
  • Attribute-based Access (path names become
    irrelevant)

19
SRB Concepts (Continued)
  • Abstraction of User Space: Global User Space
  • Single sign-on, Seamless Authorization
  • Certificates, (secure) passwords, tickets, group
    permissions, roles
  • Abstraction of Methods
  • APIs, Command Line, GUI Browsers, Web-Access
    (Portal, WSDL, CGI)
  • Parallel Access with both Client and
    Server-driven strategies
  • Fault-tolerant and Reliable data management
  • Proxy and Remote Operations
  • Abstraction of Resources: Resource
    Virtualisation
  • Resource Location, Type, Access transparency
  • Logical Resource Definitions: bundling

20
SRB Features
  • Resource Transparency - uniform access interface
  • Location Transparency - logical naming in
    collections
  • Cross-Domain Authentication - single account
    management
  • Rich Access Control - Users, Groups, Resources
  • User Transparency
  • Uniform User Name Space
  • Replicated Data
  • fault tolerant
  • Data Discovery
  • User-defined Metadata
  • Multiple Access Methods
  • GUI, API, Command-line
  • Supported on Heterogeneous
    Platforms
  • Unix, Windows, Linux

21
Authentication Management
  • GSI
  • Encrypted Password
  • GSS-API for Kerberos or DCE
  • Collection-owned Data
  • Collection ID installed at each storage system
  • Users authenticate themselves to the SRB
  • SRB authenticates to local server
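The collection-owned data model above (users authenticate to SRB; SRB authenticates to the storage system under the collection ID) can be sketched with Python's standard library. This is a hypothetical illustration only: it does not model GSI, SRB's encrypted-password scheme, or GSS-API, and substitutes an HMAC challenge-response for the broker-to-storage step. `COLLECTION_ID`, `StorageSystem`, `Broker` and the user name are all invented names, and the plain SHA-256 password digest is for brevity, not a recommendation.

```python
import hashlib
import hmac
import secrets

COLLECTION_ID = "srb-collection"           # hypothetical collection account
COLLECTION_KEY = secrets.token_bytes(32)   # installed at each storage system


class StorageSystem:
    """Trusts only the collection account, never individual users."""

    def __init__(self, collection_key: bytes):
        self._key = collection_key
        self._nonce = None
        self.files = {"/archive/run1.dat": b"results"}

    def challenge(self) -> bytes:
        self._nonce = secrets.token_bytes(16)
        return self._nonce

    def read(self, account: str, path: str, response: bytes) -> bytes:
        nonce, self._nonce = self._nonce, None   # each challenge is single-use
        if nonce is None or account != COLLECTION_ID:
            raise PermissionError("unknown account or no pending challenge")
        expected = hmac.new(self._key, nonce + path.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(response, expected):
            raise PermissionError("bad collection credentials")
        return self.files[path]


class Broker:
    """Authenticates users itself, then acts on their behalf as the collection."""

    def __init__(self, users: dict, storage: StorageSystem, collection_key: bytes):
        self._users = users          # user name -> SHA-256 hex digest (toy only)
        self._storage = storage
        self._key = collection_key

    def read(self, user: str, password: str, path: str) -> bytes:
        # Step 1: the user authenticates to the broker.
        digest = hashlib.sha256(password.encode()).hexdigest()
        if not hmac.compare_digest(self._users.get(user, ""), digest):
            raise PermissionError("bad user credentials")
        # Step 2: the broker answers the storage system's challenge.
        nonce = self._storage.challenge()
        response = hmac.new(self._key, nonce + path.encode(), hashlib.sha256).digest()
        return self._storage.read(COLLECTION_ID, path, response)


storage = StorageSystem(COLLECTION_KEY)
broker = Broker({"bzhu": hashlib.sha256(b"secret").hexdigest()}, storage, COLLECTION_KEY)
data = broker.read("bzhu", "secret", "/archive/run1.dat")
```

The storage system never sees a user account, so adding or removing users only touches the broker's records.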

22
Virtual Hierarchical Collection Management
23
Attributes
  • SRB metadata
  • Location, protocol
  • Unix semantics
  • Authorization, authentication
  • Latency management
  • Container aggregation
  • Administrative
  • Dublin core, provenance
  • Annotations, comments
  • Discipline specific attributes
  • Collection
  • User defined

24
Logical File Name
One of the major functions of SRB is the
mapping between a logical file name and its
physical file. The mapped info of a logical
filename includes
  • Location of name in collection hierarchy
  • Physical file location: host name and path
  • Protocol for fetching local file
  • Unix semantics for file manipulation
  • Location in container
  • Audit trail
  • Access control list
  • Locking status
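The mapping information listed above can be pictured as one catalog record per logical file name. The sketch below is a hypothetical Python data structure whose fields mirror the bullets; it is not MCAT's actual schema, and the host name, paths and field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContainerLocation:
    container: str   # logical name of the container
    offset: int      # byte offset of this object inside the container
    size: int


@dataclass
class LogicalFileEntry:
    logical_path: str                  # location in the collection hierarchy
    host: str                          # physical location: host name ...
    physical_path: str                 # ... and path
    protocol: str                      # protocol/driver used to fetch the file
    mode: int = 0o644                  # Unix-style permission bits
    container: Optional[ContainerLocation] = None    # set if stored in a container
    acl: dict = field(default_factory=dict)          # user -> access level
    audit_trail: list = field(default_factory=list)  # (user, action) records
    locked_by: Optional[str] = None                  # locking status

    def record_access(self, user: str, action: str) -> None:
        self.audit_trail.append((user, action))

    @property
    def collection(self) -> str:
        # Parent collection of this logical name ("/" at the root).
        return self.logical_path.rsplit("/", 1)[0] or "/"


entry = LogicalFileEntry(
    logical_path="/home/hayden/ism/step0042.h5",
    host="hpss.example.org",
    physical_path="/hpss/hayden/step0042.h5",
    protocol="hpss",
)
entry.acl["bzhu"] = "read"
entry.record_access("bzhu", "read")
```

Users only ever see `logical_path`; moving the file means rewriting `host`, `physical_path` and `protocol` in this one record.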

25
Digital Entities
Digital entities which can be registered into a
logical name space are
  • Files/directories
  • Outside files/directories (shadow links)
  • URLs
  • SQL commands
  • Relational databases

26
Data Access Control
  • Access Control for Data Collections
  • User level access control
  • Domain level access control
  • Group level access control
  • Ticket based data access
  • Multi-level Access
  • Read, Annotate, Write, Curate, Own
  • Audit Access
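A minimal sketch of the access-control ideas above, assuming (this is not stated on the slide) that the five levels are ordered so each one includes the rights below it, that a user's effective level is the highest granted to the user, any of their groups, or their domain, and that a ticket grants read access with an expiry time and a use count. All class and attribute names are hypothetical; every check is appended to an audit list.

```python
import time
from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    # Assumption: each level includes the rights of the levels below it.
    NONE = 0
    READ = 1
    ANNOTATE = 2
    WRITE = 3
    CURATE = 4
    OWN = 5


@dataclass
class Ticket:
    path: str
    expires_at: float
    uses_left: int


class AccessControl:
    def __init__(self):
        self.user_acl = {}     # (user, path) -> Access
        self.group_acl = {}    # (group, path) -> Access
        self.domain_acl = {}   # (domain, path) -> Access
        self.tickets = {}      # ticket string -> Ticket
        self.audit = []        # (principal, path, requested level, granted?)

    def level(self, user, groups, domain, path):
        # Effective level: the highest of user, domain and group grants.
        candidates = [self.user_acl.get((user, path), Access.NONE),
                      self.domain_acl.get((domain, path), Access.NONE)]
        candidates += [self.group_acl.get((g, path), Access.NONE) for g in groups]
        return max(candidates)

    def check(self, user, groups, domain, path, wanted):
        ok = self.level(user, groups, domain, path) >= wanted
        self.audit.append((user, path, wanted.name, ok))
        return ok

    def check_ticket(self, ticket, path, now=None):
        # Assumption: tickets grant read-only access, limited by time and uses.
        now = time.time() if now is None else now
        t = self.tickets.get(ticket)
        ok = t is not None and t.path == path and t.uses_left > 0 and now < t.expires_at
        if ok:
            t.uses_left -= 1
        self.audit.append((f"ticket:{ticket}", path, Access.READ.name, ok))
        return ok


acl = AccessControl()
acl.user_acl[("bzhu", "/home/hayden")] = Access.WRITE
acl.group_acl[("curators", "/home/hayden")] = Access.CURATE
acl.tickets["t-123"] = Ticket("/home/hayden", expires_at=time.time() + 3600, uses_left=1)
```

With this ordering, `bzhu` can read and annotate `/home/hayden` but can curate it only when acting as a member of `curators`.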

27
Storage Resource Management
  • Plug-and-play model
  • Well-defined framework for developing a new
    driver for a new storage system
  • Easy registration of a new resource into SRB
  • Each physical resource has a logical name which
    is mapped to host name and resource type.
  • Resource Access Control: r/w permission for user,
    domain, or group
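The plug-and-play model can be sketched as a driver interface plus two registries: one from resource type to driver class, and one from logical resource name to host name, resource type and r/w permissions. This is a hypothetical Python illustration, not SRB's driver framework (SRB itself is written in C); `StorageDriver`, `register_driver`, `MemoryDriver` and the host name are invented for the example.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface that every storage system implements."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


DRIVERS = {}     # resource type -> driver class
RESOURCES = {}   # logical resource name -> host, type, driver, permissions


def register_driver(resource_type: str):
    """Class decorator: plug a new driver in under a resource type."""
    def decorator(cls):
        DRIVERS[resource_type] = cls
        return cls
    return decorator


@register_driver("memory")
class MemoryDriver(StorageDriver):
    """Stand-in for a real driver such as a Unix file system or HPSS."""

    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = data

    def get(self, path):
        return self._blobs[path]


def register_resource(name, host, resource_type, permissions):
    # The logical resource name maps to a host name and a resource type.
    RESOURCES[name] = {
        "host": host,
        "type": resource_type,
        "driver": DRIVERS[resource_type](),
        "perms": permissions,  # user/domain/group -> subset of "rw"
    }


def write(principal, resource, path, data):
    res = RESOURCES[resource]
    if "w" not in res["perms"].get(principal, ""):
        raise PermissionError(f"{principal} may not write to {resource}")
    res["driver"].put(path, data)


def read(principal, resource, path):
    res = RESOURCES[resource]
    if "r" not in res["perms"].get(principal, ""):
        raise PermissionError(f"{principal} may not read from {resource}")
    return res["driver"].get(path)


register_resource("scratch-sdsc", "srb1.example.org", "memory", {"bzhu": "rw", "guests": "r"})
write("bzhu", "scratch-sdsc", "/tmp/a", b"abc")
```

Supporting a new storage system then means writing one `StorageDriver` subclass and registering a resource that uses it; callers keep using the logical resource name.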

28
Replica Management
  • Files can be replicated into any valid physical
    storage resource registered in SRB.
  • Each replica is managed by the same logical
    filename as the original one and a unique
    replication number. Each replica can have unique
    metadata.
  • 1-to-many Replication: a logical resource can
    contain several physical storage resources.
  • Multiple replicas can be made to the same storage
    resource
  • Many Modes of Replication
  • Synchronous Replication
  • Asynchronous Replication - Offline
  • Out of Band Replication - outside SRB
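The replica rules above (one logical filename, a unique replication number per copy, per-replica metadata, and 1-to-many replication through a logical resource) can be sketched as follows. The names are hypothetical, numbering from 0 is an assumption, and only the synchronous mode is shown; an asynchronous mode could queue the same `add_replica` calls for later.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    number: int            # replication number, unique within one logical file
    resource: str          # physical resource holding this copy
    physical_path: str
    metadata: dict = field(default_factory=dict)   # per-replica metadata


@dataclass
class LogicalFile:
    name: str                                      # shared by every replica
    replicas: List[Replica] = field(default_factory=list)

    def add_replica(self, resource: str, physical_path: str, **metadata) -> Replica:
        # Assumption: numbers start at 0 and increase; the same resource may repeat.
        number = max((r.number for r in self.replicas), default=-1) + 1
        replica = Replica(number, resource, physical_path, dict(metadata))
        self.replicas.append(replica)
        return replica


# Hypothetical logical resource bundling two physical resources.
LOGICAL_RESOURCES = {"hayden-lr": ["hpss-sdsc", "unix-caltech"]}


def replicate(f: LogicalFile, logical_resource: str, physical_path: str) -> List[Replica]:
    """Synchronous 1-to-many replication: one new copy per physical member."""
    return [f.add_replica(res, physical_path) for res in LOGICAL_RESOURCES[logical_resource]]


movie = LogicalFile("/home/hayden/movie/frame0001")
movie.add_replica("unix-sdsc", "/srb/frame0001", note="original copy")
copies = replicate(movie, "hayden-lr", "/srb/frame0001")
```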

29
Containers
  • Initially designed for storing many small
    files in HPSS
  • Physical Grouping of Objects
  • Similar to tar but has significant differences
  • Multiple Uses
  • To take advantage of resource characteristics
  • To aid access patterns
  • Move data sets together
  • Tie together logically different files
  • Automatic Archiving/Caching
  • Chaining of Containers
  • Sharing of metadata
  • Containers for Collections
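The core of a container, many small objects physically grouped into one large object, can be sketched in a few lines. This hypothetical `Container` class keeps its index (offset and size per logical name) outside the packed bytes, in the spirit of the "location in container" entry the catalog keeps for each logical file; that separate index, not a tar-style in-band header, is what allows one member to be read directly. It is an illustration, not SRB's container format.

```python
import io


class Container:
    """Hypothetical container: many small objects packed into one byte stream."""

    def __init__(self, name: str):
        self.name = name
        self._buf = io.BytesIO()
        self.index = {}   # logical name -> (offset, size), kept outside the payload

    def add(self, logical_name: str, data: bytes) -> None:
        offset = self._buf.seek(0, io.SEEK_END)   # append at the end
        self._buf.write(data)
        self.index[logical_name] = (offset, len(data))

    def read(self, logical_name: str) -> bytes:
        # Random access to one member without scanning the others.
        offset, size = self.index[logical_name]
        self._buf.seek(offset)
        return self._buf.read(size)

    def payload(self) -> bytes:
        """The single large object that would be written to the archive."""
        return self._buf.getvalue()


c = Container("hayden-small-files")
for i, text in enumerate([b"header", b"star 1", b"star 2"]):
    c.add(f"/home/hayden/stars/{i}", text)
blob = c.payload()
```

An archive such as HPSS then stores one large file instead of thousands of tiny ones, which is the original motivation given above.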

30
Metadata Management
  • Metadata Insertion Through User Interfaces
  • Bulk Metadata Insertion from XML files
  • Template Based Metadata Extraction
  • Metadata Search
  • system data
  • user-defined metadata
  • File Content Search: keywords are pre-extracted
    by a template and saved as user-defined metadata.
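Template-based extraction followed by attribute search can be sketched as below. The template format (attribute name mapped to a regular expression with one capture group), the header keywords, the `CATALOG` dictionary and the file names are all hypothetical; the sketch only shows that extracted keywords become user-defined metadata and that a search can match system and user-defined attributes alike.

```python
import re

# Hypothetical template: attribute name -> regex with one capture group.
TEMPLATE = {
    "instrument": re.compile(r"^INSTRUMENT\s*=\s*(\S+)", re.MULTILINE),
    "date": re.compile(r"^DATE-OBS\s*=\s*(\S+)", re.MULTILINE),
}

CATALOG = {}   # logical name -> {"system": {...}, "user": {...}}


def extract(text: str, template=TEMPLATE) -> dict:
    """Pull keyword values out of file content using the template."""
    meta = {}
    for attr, pattern in template.items():
        match = pattern.search(text)
        if match:
            meta[attr] = match.group(1)
    return meta


def ingest(name: str, text: str, size: int) -> None:
    # System metadata comes from the data grid; user metadata from the template.
    CATALOG[name] = {"system": {"size": size}, "user": extract(text)}


def search(**conditions) -> list:
    """Return logical names whose system or user metadata match every condition."""
    hits = []
    for name, meta in CATALOG.items():
        merged = {**meta["system"], **meta["user"]}
        if all(merged.get(k) == v for k, v in conditions.items()):
            hits.append(name)
    return sorted(hits)


ingest("/home/2mass/a.hdr", "INSTRUMENT = 2MASS\nDATE-OBS = 1998-06-01\n", size=42)
ingest("/home/2mass/b.hdr", "INSTRUMENT = other\n", size=17)
```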

31
Database Access Interface (DAI)
  • Facility to access tabular data using SRB API
  • View SQL queries as Locators (Path Names or URI)
  • Apply open, close, read, write operations
  • Support queries ranging from very general to
    specific
  • From any query on a database, to soft queries,
    to hard-coded queries
  • Access Result Table as a Stream
  • Provide Server-side operations to present results
  • Forms, HTML, XML
  • Data Wetting, Charting, Visualization
  • Multi-modal Ingestion
  • SQL ingestion
  • Packed Ingestion - useful in data movement and
    replication
  • Directly ingest data marked up in HTML, XML, ...
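The DAI idea of treating a SQL query as a locator that can be opened, read as a stream, and closed, with server-side presentation of the results, can be sketched with Python's built-in `sqlite3`. The `QUERY_LOCATORS` registry, the `QueryStream` class, the table and the logical path are hypothetical and are not the SRB DAI interface; the rows loosely echo the Hayden data volumes listed earlier, converted to gigabytes for illustration.

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical registry: a logical path names a stored SQL query.
QUERY_LOCATORS = {
    "/dai/hayden/large_runs":
        "SELECT name, size_gb FROM runs WHERE size_gb > ? ORDER BY name",
}


class QueryStream:
    """open/read/close over a query result, one row per read()."""

    def __init__(self, conn, locator, params=()):
        self._cur = conn.execute(QUERY_LOCATORS[locator], params)   # "open"
        self.columns = [d[0] for d in self._cur.description]

    def read(self):
        return self._cur.fetchone()   # None signals end of stream

    def close(self):
        self._cur.close()


def as_xml(stream: QueryStream) -> str:
    """Server-side presentation: render the streamed rows as simple XML."""
    parts = ["<rows>"]
    while (row := stream.read()) is not None:
        cells = "".join(f"<{c}>{escape(str(v))}</{c}>" for c, v in zip(stream.columns, row))
        parts.append(f"<row>{cells}</row>")
    stream.close()
    parts.append("</rows>")
    return "".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (name TEXT, size_gb REAL)")
conn.executemany("INSERT INTO runs VALUES (?, ?)",
                 [("ism", 2500.0), ("ionization", 117.0), ("stars", 0.038)])
doc = as_xml(QueryStream(conn, "/dai/hayden/large_runs", (100,)))
```

Reading row by row keeps memory use flat for large result tables, which is the reason for exposing results as a stream rather than a single block.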

32
Software SRB
  • Federated Server Architecture
  • Uniform Access Interface: Thread-safe Client
  • Programmatic API (C, C++, Java, Perl-through-C,
    Python)
  • GUI (Java browser for Unix; Windows browser inQ
    for NT, Me, 98, 2000)
  • Web Support (CGI-Scripts, Portals)
  • Command Line Interface (Unix, DOS)
  • Metadata Catalog (Oracle, DB2, Sybase, SQLServer)
  • Handles transparencies, authentication, access
    control, replication, container support
  • User-defined Metadata
  • Multi-Platform Support
  • Unix, Linux, Windows, MacOSX (from Cray to
    Desktop)
  • HPSS, ADSM, UniTree, UnixFS, NTFS, Oracle,
    DB2

33
SRB Software (continued)
  • Command-line Executables (Sls, Sput, Sget,
    Schmod, Smeta, etc.)
  • available in UNIX, Linux, Windows
  • Suitable for uploading large amounts of data
  • Can be used to build other script-based
    applications, such as GridPortal
  • InQ: an SRB Windows browser
  • srbBrowser: a Java browser using JNI on UNIX
  • MySRB: a Web interface browser
  • Client Libraries
  • libSrbClient.a on UNIX/Linux platforms
  • srbClient.dll for Windows
  • srb.so: dynamic library with Python binding

34
SRB Projects
  • Digital Libraries
  • UCB, Umich, UCSB, Stanford, CDL
  • NSF NSDL - UCAR / DLESE
  • NASA Information Power Grid
  • Astronomy
  • National Virtual Observatory
  • 2MASS Project (2 Micron All Sky Survey)
  • Particle Physics
  • Particle Physics Data Grid (DOE)
  • GriPhyN
  • SLAC Synchrotron Data Repository
  • Medicine
  • Digital Embryo (NLM)
  • Earth Systems Sciences
  • ESIPS
  • LTER
  • Persistent Archives
  • NARA
  • LOC

Over 40 terabytes in 6.6 million files
35
SRB Scalability
36
Team SRB @ San Diego
  • Reagan Moore
  • Michael Wan
  • Arcot Rajasekar
  • George Kremenek
  • Bing Zhu
  • Sheau-Yen Chen
  • Charles Cowart
  • Arun Jagatheesan
  • Lucas Gilbert
  • Wayne Schroeder
  • Roman Olsachnowsky (BIRN)
  • Vicky Rowley (BIRN)

37
Contacts
  • For Additional Information
  • Web: http://www.npaci.edu/dice/srb
  • Mail: srb@sdsc.edu