Title: Storage Resource Broker
1 Storage Resource Broker
A solution for Managing Distributed Data in a Grid
- Bing Zhu
- San Diego Supercomputer Center
- Dec 10, 2002 Korea
SDSC/UCSD/NPACI
2 San Diego Supercomputer Center
- San Diego Supercomputer Center (SDSC) was founded
in 1985. It is a research unit of the University of
California at San Diego with a staff of over 400
scientists, software developers and research
support personnel.
- Working with researchers from 350 institutions
and 50 industrial partners, SDSC focuses on five
program areas:
- Integrative Biosciences
- Data and Knowledge Systems (SRB, )
- Environmental Sciences
- High-end Computing and Communications
- Grid and Cluster Computing
3 Pacific Rim Applications and Grid Middleware
Assembly
Participating Institutions in PRAGMA
- Australian Partnership for Advanced Computing
and its many partners including Monash University
and Sydney Visualization Lab
- Bioinformatics Institute of Singapore
- Computer Network Information Center, CAS
- Global Scientific Information and Computing
Center, Tokyo Institute of Technology
- Grid Technology Research Center and Tsukuba
Advanced Computing Center, National Institute of
Advanced Industrial Science and Technology
- Korea Institute of Science and Technology
Information
- National Center for High Performance Computing,
National Science Council
4 Participating Institutions in PRAGMA (cont.)
- Research Center for Ultra-High Voltage Electron
Microscopy and the Cybermedia Center, Osaka
University
- STAR TAP/StarLight initiative, supported by NSF
and organized by the University of Illinois at
Chicago, Northwestern University and Argonne
National Laboratory
- Thai Social/Scientific Academic and Research
Network (ThaiSARN-3), National Electronics and
Computer Technology Center
- TransPAC initiative, supported by NSF at
Indiana University
- Universiti Sains Malaysia
- UCSD SDSC, CalIT2, CRBS
- University of Hyderabad
5 Topics
- Data Grids, Digital Libraries, Persistent
Archives
- Case studies
- SRB Software Architecture
- SRB interface
- Authentication
- Logical Name Space
- File Replications
- Data Container
- Metadata
6 Digital Libraries
- Provide services on the data collection
- Ingestion, loading of attribute values
- Extensibility, definition of new attributes
- Discovery, queries on attributes
- Browsing, hierarchical listing
- Presentation, formatting specified data models
- Communities
- Digital library
- Global Grid Forum, Databases and the Grid working
group
- OMG, Common Warehouse Metamodel
7 Data Grids
- Manage data in a distributed environment
- Logical name space, provide global identifier
- Data access, storage system abstraction
- Replication, disaster backup
- Uniform access, common API across file systems,
archives, and databases
- Single sign-on, authenticate across
administration domains
- Communities
- Global Grid Forum, data grids
- Discipline specific data management systems
8 Persistent Archives
- Manage technology evolution
- Storage system abstraction, support data
migration across storage systems
- Information repository abstraction, support
catalog migration to new databases
- Logical name space, support global persistent
identifier
- Communities
- Persistent archive community
- Global Grid Forum, Persistent archive working
group
9 SRB: A Data Grid Solution
- Storage Resource Broker (SRB)
- Federated client-server system that integrates
distributed heterogeneous resources using a
uniform interface
- Provides a simple tool to integrate data and
metadata handling, with attribute-based access
- Blends browsing and searching
- Developed at SDSC
- operational for 4 years
- brokering over 40 TeraBytes
10 Case Study: Hayden Planetarium
11 Data involved in Hayden
- ISM: Interstellar Medium Simulation
- run by Mordecai Mac Low of AMNH at NCSA
- 2.5 Terabytes sent from NCSA to SDSC.
- Data stored in SRB (HPSS, GPFS).
- Ionization
- Simulation run at AMNH
- 117 Gigabytes sent from AMNH to SDSC.
- Data stored in SRB.
- Star motion
- Simulation run at AMNH by Ryan Wyatt
- 38 Megabytes sent from AMNH to SDSC.
- Rendering Movies
- Intermediate Steps produced 7.5 Terabytes.
- Data stored in SRB (SDSC, CalTech)
12 Case Study: SRB in BIRN
[Figure: BIRN Toolkit architecture diagram. Labels: Queries/Results, Applications, Data Management, Collaboration, Viewing/Visualization, Mediator, GridPort, Grid Management, Data Model, Database, Scheduler, Data Grid, Computational Grid, NMI, MCAT, Globus, SRB, Data Access, HPSS, File System, Distributed Resources.]
13 Using a Data Grid - in Abstract
[Figure: the Data Grid]
- User asks for data from the data grid
14 Using a Data Grid - Details
- Data request goes to SRB Server
- Server looks up data in catalog
- Catalog tells which SRB server has data
- 1st server asks 2nd for data
- The data is found and returned
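The request flow on this slide can be sketched in a few lines of Python. This is only an illustration of the steps listed above, not SRB code: the `Catalog` and `Server` classes, the server names `srb1` and `srb2`, and the logical name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Catalog:
    """Hypothetical stand-in for the catalog: logical name -> owning server name."""
    locations: Dict[str, str] = field(default_factory=dict)

    def lookup(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical broker server holding some data and knowing its peers."""
    name: str
    catalog: Catalog
    peers: Dict[str, "Server"] = field(default_factory=dict)
    store: Dict[str, bytes] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # The server looks up the data in the catalog,
        # and the catalog tells which server has it.
        owner = self.catalog.lookup(logical_name)
        if owner == self.name:
            return self.store[logical_name]
        # The first server asks the second server for the data.
        return self.peers[owner].fetch_local(logical_name)

    def fetch_local(self, logical_name: str) -> bytes:
        # The data is found and returned.
        return self.store[logical_name]


catalog = Catalog({"/home/demo/report.txt": "srb2"})
srb1 = Server("srb1", catalog)
srb2 = Server("srb2", catalog, store={"/home/demo/report.txt": b"hello"})
srb1.peers["srb2"] = srb2

# The user sends the request to srb1 and never needs to know about srb2.
data = srb1.get("/home/demo/report.txt")
print(data)
```

The caller only ever talks to one server; which peer actually holds the bytes stays hidden behind `get`.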
15 Using a Data Grid - Details
[Figure: a DB, the MCAT, and six SRB servers]
- Data Grid has arbitrary number of servers
- Complexity is hidden from users
16 SDSC Storage Resource Meta-data Catalog
17 SRB Interface
[Figure labels: Application (x2), SRB Server (x3), SRB Master, SRB Agent, MCAT, MCAT Core, Dublin Core, Eco Core]
18 Federated SRB Operation
[Figure: Peer-to-peer Brokering and Parallel Data Access. Labels: Read Application in Boston; Logical Name Or Attribute Condition; SRB servers and SRB agents at Durham and San Diego; Server(s) Spawning; replicas R1 and R2; Data Access; numbered steps 1 to 6.]
MCAT functions shown in the figure:
1. Logical-to-Physical mapping
2. Identification of Replicas
3. Access and Audit Control
19 SRB Concepts
- Abstraction of Data and Collections: Virtual
Data Organization
- Virtual Collections: Persistent Identifier and
Global Name Space
- Organization independent of physical location
- Virtual Data Management
- Replication and Segmentation
- Data Aggregation: Containers
- Seamless Cache Management and Data Placement
- Metadata: Data Discovery and semantic linking
- System Metadata: metadata needed to run a data
grid
- User-defined Metadata: Structural and Descriptive
- Application, Schema-based, Domain-centric
- extensible and dynamic
- Attribute-based Access (path names become
irrelevant)
20 SRB Concepts (Continued)
- Abstraction of User Space: Global User Space
- Single sign-on and Seamless Authorization
- Certificates, (secure) passwords, tickets, group
permissions, roles
- Abstraction of Methods
- APIs, Command Line, GUI Browsers, Web-Access
(Portal, WSDL, CGI)
- Parallel Access with both Client and
Server-driven strategies
- Fault-tolerant and Reliable data management
- Proxy and Remote Operations
- Abstraction of Resources: Resource
Virtualisation
- Resource Location, Type and Access transparency
- Logical Resource Definitions and bundling
21 SRB Features
- Resource Transparency: uniform access interface
- Location Transparency: logical naming in
collections
- Cross-Domain Authentication: single account
management
- Rich Access Control: Users, Groups, Resources
- User Transparency
- Uniform User Name Space
- Replicated Data
- fault tolerant
- Data Discovery
- User-defined Metadata
- Multiple Access Methods
- GUI, API, Command-line
- Supported on Heterogeneous Platforms
- Unix, Windows, Linux
22 Authentication Management
- GSI
- Encrypted Password
- GSS-API for Kerberos or DCE
- Collection-owned Data
- Collection ID installed at each storage system
- Users authenticate themselves to the SRB
- SRB authenticates to local server
23 Virtual Hierarchical Collection Management
24 Attributes
- SRB metadata
- Location, protocol
- Unix semantics
- Authorization, authentication
- Latency management
- Container aggregation
- Administrative
- Dublin core, provenance
- Annotations, comments
- Discipline specific attributes
- Collection
- User defined
25 Logical File Name
One of the major functions of SRB is the
mapping between a logical file name and its
physical file. The information mapped to a logical
file name includes:
- Location of name in collection hierarchy
- Physical file location: host name and path
- Protocol for fetching local file
- Unix semantics for file manipulation
- Location in container
- Audit trail
- Access control list
- Locking status
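As an illustration of the mapping described on this slide, the sketch below models one catalog record per logical file name. It is a minimal sketch under stated assumptions: the `LogicalFileEntry` fields, the `register` and `resolve` helpers, the host name and the paths are hypothetical and are not the MCAT schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LogicalFileEntry:
    """Hypothetical record holding the items listed on the slide."""
    logical_name: str                        # location of name in collection hierarchy
    host: str                                # physical file location: host name
    physical_path: str                       # physical file location: path
    protocol: str                            # protocol for fetching the local file
    unix_mode: int                           # Unix semantics for file manipulation
    container_offset: Optional[int] = None   # location in container, if any
    audit_trail: List[str] = field(default_factory=list)
    acl: Dict[str, str] = field(default_factory=dict)  # access control list
    locked: bool = False                     # locking status


NAMESPACE: Dict[str, LogicalFileEntry] = {}


def register(entry: LogicalFileEntry) -> None:
    """Add an entry to the logical name space."""
    NAMESPACE[entry.logical_name] = entry


def resolve(logical_name: str, user: str) -> Tuple[str, str, str]:
    """Map a logical name to (protocol, host, physical path) and record the access."""
    entry = NAMESPACE[logical_name]
    if user not in entry.acl:
        raise PermissionError(f"{user} has no access to {logical_name}")
    entry.audit_trail.append(f"resolve by {user}")
    return entry.protocol, entry.host, entry.physical_path


register(LogicalFileEntry(
    logical_name="/home/demo/images/m31.fits",
    host="storage1.example.org",
    physical_path="/vault/0001/m31.fits",
    protocol="unixfs",
    unix_mode=0o640,
    acl={"alice": "read"},
))
location = resolve("/home/demo/images/m31.fits", "alice")
print(location)
```

Because clients only hold the logical name, the physical path or host in the record can change without affecting them.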
26 Digital Entities
Digital entities that can be registered into a
logical name space are:
- Files/directories
- Outside files and directories (shadow links)
- URLs
- SQL commands
- Relational databases
27 Data Access Control
- Access Control for Data Collections
- User level access control
- Domain level access control
- Group level access control
- Ticket based data access
- Multi-level Access
- Read, Annotate, Write, Curate, Own
- Audit Access
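One way to picture user, domain and group level grants combined with multi-level access is the sketch below. It assumes that the five levels are ordered and cumulative (Own implies Curate, which implies Write, and so on); the slide does not state this, and the `Access` enum, the `grants` table and the helper names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class Access(IntEnum):
    """Access levels from the slide, ordered by an assumed cumulative ranking."""
    NONE = 0
    READ = 1
    ANNOTATE = 2
    WRITE = 3
    CURATE = 4
    OWN = 5


# Grants are keyed by (kind, name), where kind is "user", "domain" or "group".
Grants = Dict[Tuple[str, str], Access]


def effective_access(grants: Grants, user: str, domain: str,
                     groups: Iterable[str]) -> Access:
    """Return the highest level granted through the user, domain or any group."""
    keys = [("user", user), ("domain", domain)]
    keys += [("group", group) for group in groups]
    return max(grants.get(key, Access.NONE) for key in keys)


def is_allowed(grants: Grants, required: Access, user: str, domain: str,
               groups: Iterable[str] = ()) -> bool:
    """Check whether the effective level meets the required level."""
    return effective_access(grants, user, domain, groups) >= required


grants: Grants = {
    ("user", "alice"): Access.OWN,
    ("group", "curators"): Access.CURATE,
    ("domain", "sdsc"): Access.READ,
}

print(is_allowed(grants, Access.WRITE, "alice", "sdsc"))
print(is_allowed(grants, Access.ANNOTATE, "bob", "sdsc"))
```

Ticket-based access and auditing are left out of this sketch to keep it short.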
28 Storage Resource Management
- Plug-and-play model
- Well-defined framework for developing a new
driver for a new storage system
- Easy registration of a new resource into SRB
- Each physical resource has a logical name which
is mapped to host name and resource type.
- Resource Access Control: r/w permission for user,
domain, or group
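The plug-and-play idea can be illustrated with a small driver registry. This is a sketch of the general pattern only: the `StorageDriver` interface, the `MemoryDriver` class, the `ResourceRegistry` and the resource names are hypothetical and do not reproduce the SRB driver framework.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple, Type


class StorageDriver(ABC):
    """Hypothetical interface that every storage-system driver implements."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Toy driver that keeps files in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


class ResourceRegistry:
    """Maps a logical resource name to host name, resource type and driver."""

    def __init__(self) -> None:
        self._driver_types: Dict[str, Type[StorageDriver]] = {}
        self._resources: Dict[str, Tuple[str, str, StorageDriver]] = {}

    def register_driver(self, resource_type: str,
                        driver_cls: Type[StorageDriver]) -> None:
        # Supporting a new storage system means registering one more driver class.
        self._driver_types[resource_type] = driver_cls

    def register_resource(self, logical_name: str, host: str,
                          resource_type: str) -> None:
        driver = self._driver_types[resource_type]()
        self._resources[logical_name] = (host, resource_type, driver)

    def describe(self, logical_name: str) -> Tuple[str, str]:
        host, resource_type, _ = self._resources[logical_name]
        return host, resource_type

    def driver_for(self, logical_name: str) -> StorageDriver:
        return self._resources[logical_name][2]


registry = ResourceRegistry()
registry.register_driver("memory", MemoryDriver)
registry.register_resource("demo-cache", "node1.example.org", "memory")

registry.driver_for("demo-cache").write("/a.txt", b"abc")
print(registry.describe("demo-cache"), registry.driver_for("demo-cache").read("/a.txt"))
```

Callers use only the logical resource name; the host and resource type behind it can be changed in the registry.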
29 Replica Management
- Files can be replicated into any valid physical
storage resource registered in SRB.
- Each replica is managed by the same logical
filename as the original one and a unique
replication number. Each replica can have unique
metadata.
- 1-to-many Replication: A logical resource can
contain several physical storage resources.
- Multiple replicas can be made to the same storage
resource
- Many Modes of Replication
- Synchronous Replication
- Asynchronous Replication - Offline
- Out of Band Replication - outside SRB
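The replica bookkeeping described on this slide might look like the following sketch. It models only the synchronous case, and everything in it is an assumption made for illustration: the `Replica` and `ReplicaCatalog` classes, numbering replicas from 0, and the resource names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    number: int                    # unique replication number per logical file
    resource: str                  # physical storage resource holding this copy
    metadata: Dict[str, str] = field(default_factory=dict)


class ReplicaCatalog:
    """Hypothetical catalog of replicas keyed by logical file name."""

    def __init__(self, logical_resources: Dict[str, List[str]]) -> None:
        # A logical resource name expands to several physical resources.
        self._logical_resources = logical_resources
        self._replicas: Dict[str, List[Replica]] = {}

    def replicate(self, logical_name: str, resource: str,
                  **metadata: str) -> List[Replica]:
        """Create one replica per physical resource behind `resource`."""
        targets = self._logical_resources.get(resource, [resource])
        replicas = self._replicas.setdefault(logical_name, [])
        created = []
        for target in targets:
            replica = Replica(len(replicas), target, dict(metadata))
            replicas.append(replica)
            created.append(replica)
        return created

    def replicas(self, logical_name: str) -> List[Replica]:
        return list(self._replicas.get(logical_name, []))

    def pick(self, logical_name: str, available: Set[str]) -> Replica:
        """Return the first replica stored on a resource that is reachable."""
        for replica in self._replicas.get(logical_name, []):
            if replica.resource in available:
                return replica
        raise LookupError(f"no reachable replica of {logical_name}")


replica_catalog = ReplicaCatalog({"archive-pair": ["hpss-a", "hpss-b"]})
replica_catalog.replicate("/home/demo/sim.dat", "unix-a", note="original")
replica_catalog.replicate("/home/demo/sim.dat", "archive-pair")  # 1-to-many
replica_catalog.replicate("/home/demo/sim.dat", "unix-a")        # same resource again

for r in replica_catalog.replicas("/home/demo/sim.dat"):
    print(r.number, r.resource, r.metadata)
```

All copies share one logical file name, so a reader can fall back to another replica with `pick` when a resource is unavailable.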
30 Containers
- Initially were designed for storing many small
files in HPSS
- Physical Grouping of Objects
- Similar to tar but has significant differences
- Multiple Uses
- To take advantage of resource characteristics
- To aid access patterns
- Move data sets together
- Tie together logically different files
- Automatic Archiving/Caching
- Chaining of Containers
- Sharing of metadata
- Containers for Collections
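The physical grouping idea can be shown with a toy container that packs many small files into one blob and keeps an index of offsets. This is a sketch of the general technique only; the `Container` class and its methods are hypothetical and say nothing about the actual SRB container format.

```python
from typing import Dict, List, Tuple


class Container:
    """Toy container: one contiguous blob plus an index of (offset, length)."""

    def __init__(self) -> None:
        self._blob = bytearray()
        self._index: Dict[str, Tuple[int, int]] = {}

    def add(self, name: str, data: bytes) -> None:
        """Append a small file to the blob and remember where it sits."""
        if name in self._index:
            raise KeyError(f"{name} is already in the container")
        self._index[name] = (len(self._blob), len(data))
        self._blob.extend(data)

    def read(self, name: str) -> bytes:
        """Return one member without unpacking the others."""
        offset, length = self._index[name]
        return bytes(self._blob[offset:offset + length])

    def names(self) -> List[str]:
        return list(self._index)

    def size(self) -> int:
        """Size of the single physical object that would be moved or archived."""
        return len(self._blob)


container = Container()
container.add("/home/demo/a.txt", b"alpha")
container.add("/home/demo/b.txt", b"bravo!")

print(container.read("/home/demo/b.txt"), container.size())
```

The storage system sees a single large object, while each member stays individually addressable through the index.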
31 Metadata Management
- Metadata Insertion Through User Interfaces
- Bulk Metadata Insertion from XML files
- Template Based Metadata Extraction
- Metadata Search
- system data
- user-defined metadata
- File Content Search: keywords are pre-extracted
by a template and saved as user-defined metadata.
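Template-based extraction followed by an attribute search can be sketched as below. The template here is a plain regular expression for "Key: value" lines, which is an assumption chosen for illustration; the `extract_metadata` and `search` helpers and the sample files are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical template: every line of the form "Key: value" becomes an attribute.
TEMPLATE = re.compile(r"^(?P<key>[A-Za-z_]+):\s*(?P<value>.+)$", re.MULTILINE)


def extract_metadata(text: str) -> Dict[str, str]:
    """Pre-extract keywords from file content using the template."""
    return {m["key"].lower(): m["value"].strip() for m in TEMPLATE.finditer(text)}


def search(catalog: Dict[str, Dict[str, str]], **conditions: str) -> List[str]:
    """Return logical names whose metadata satisfies every condition."""
    return sorted(
        name for name, attrs in catalog.items()
        if all(attrs.get(key) == value for key, value in conditions.items())
    )


files = {
    "/home/demo/m31.txt": "Title: Andromeda field\nInstrument: 2MASS\nfree text here",
    "/home/demo/m33.txt": "Title: Triangulum field\nInstrument: 2MASS",
    "/home/demo/notes.txt": "Title: Observing notes\nInstrument: none",
}

# Save the extracted keywords as user-defined metadata in a toy catalog.
metadata_catalog = {name: extract_metadata(text) for name, text in files.items()}

print(search(metadata_catalog, instrument="2MASS"))
```

Once the keywords are stored as attributes, the search never has to open the files themselves.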
32 Database Access Interface (DAI)
- Facility to access tabular data using SRB API
- View SQL queries as Locators (Path Names or URI)
- Apply open, close, read, write operations
- Provide for very general queries to specific
queries
- any query on a database to soft queries to
hard-coded queries
- Access Result Table as a Stream
- Provide Server-side operations to present results
- Forms, HTML, XML, ...
- Data Wetting, Charting, Visualization
- Multi-modal Ingestion
- SQL ingestion
- Packed Ingestion - useful in data movement and
replication
- Directly ingest data marked by HTML, XML, ...
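The idea of treating a SQL query as a locator with open, read and close operations can be sketched with Python's standard `sqlite3` module standing in for the database. This is not the DAI API: the `QueryStream` class, the table and the comma-separated output format are all hypothetical.

```python
import sqlite3


class QueryStream:
    """Hypothetical file-like view of a SQL query's result table."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self._conn = conn
        self._sql = sql          # the query plays the role of a path name
        self._cursor = None

    def open(self) -> None:
        self._cursor = self._conn.execute(self._sql)

    def read(self, max_rows: int) -> str:
        """Return up to max_rows rows as comma-separated lines; '' at the end."""
        if self._cursor is None:
            raise ValueError("stream is not open")
        rows = self._cursor.fetchmany(max_rows)
        return "".join(",".join(str(value) for value in row) + "\n" for row in rows)

    def close(self) -> None:
        if self._cursor is not None:
            self._cursor.close()
            self._cursor = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

stream = QueryStream(conn, "SELECT id, name FROM samples ORDER BY id")
stream.open()
first = stream.read(2)   # first two rows of the result table
rest = stream.read(2)    # the remaining row
end = stream.read(2)     # empty string once the table is exhausted
stream.close()
print(first + rest, end="")
```

A client that already knows how to read a file in chunks can consume a result table the same way.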
33 Software: SRB
- Federated Server Architecture
- Uniform Access Interface, Thread-safe Client
- Programmatic API (C, C++, Java, Perl-through-C,
Python)
- GUI (Java for Unix; Windows browser inQ for NT,
Me, 98, 2000)
- Web Support (CGI-Scripts, Portals, ...)
- Command Line Interface (Unix, DOS)
- Metadata Catalog (Oracle, DB2, Sybase, SQLServer)
- Handles transparencies, authentication, access
control, replication, container support, ...
- User-defined Metadata
- Multi-Platform Support
- Unix, Linux, Windows, MacOSX (from Cray to
Desktop)
- HPSS, ADSM, UniTree, ..., UnixFS, NTFS, ..., Oracle,
DB2, ...
34 SRB Software (continued)
- Command-line Executables (Sls, Sput, Sget,
..., Schmod, Smeta, etc.)
- available in UNIX, Linux, Windows
- for uploading large amounts of data
- can be used to build other script-based
applications such as GridPortal
- InQ: an SRB Windows browser
- srbBrowser: a Java browser using JNI in UNIX
- MySRB: a web interface browser
- Client Libraries
- libSrbClient.a on UNIX/Linux platforms
- srbClient.dll for Windows
- srb.so: the dynamic library with Python binding
35 SRB Projects
- Digital Libraries
- UCB, Umich, UCSB, Stanford, CDL
- NSF NSDL - UCAR / DLESE
- NASA Information Power Grid
- Astronomy
- National Virtual Observatory
- 2MASS Project (2 Micron All Sky Survey)
- Particle Physics
- Particle Physics Data Grid (DOE)
- GriPhyN
- SLAC Synchrotron Data Repository
- Medicine
- Digital Embryo (NLM)
- Earth Systems Sciences
- ESIPS
- LTER
- Persistent Archives
- NARA
- LOC
Over 40 Terabytes in 6.6 million files
36 SRB Scalability
37 TeamSRB@SanDiego
- Reagan Moore
- Michael Wan
- Arcot Rajasekar
- George Kremenek
- Bing Zhu
- Sheau-Yen Chen
- Charles Cowart
- Arun Jagatheesan
- Lucas Gilbert
- Wayne Schroeder
- Roman Olsachnowsky (BIRN)
- Vicky Rowley (BIRN)
38 Contacts
- For Additional Information
- Web: http://www.npaci.edu/dice/srb
- Mail: srb@sdsc.edu