Title: SRB
SRB and SRB Rack: Components of a Virtual Data Grid Architecture
Arcot (Raja) Rajasekar, Mike Wan
San Diego Supercomputer Center
{sekar, mwan}@sdsc.edu
Why do we need Data Grids?
- Data is distributed
- Computation is done remotely
- Distributed Computation
- Large-scale Data Movement
- Sharing of Data across Security Realms
- Large Data stored beyond computation
- Need to access multiple data sets
What are Data Grids?
- Power Grid Analogy
  - Multiple power generators
  - Complex transmission networks with switching
  - Simple usage interface: plug and play
  - Guaranteed supply: meeting demands (peak and lull)
  - Complex cost function
- More than one data provider
- Best movement of data across computer networks
- Seamless access to data with good finding aids
- Guarantee of data access
- Access control, quotas, complex usage costing
Hayden Planetarium
[Figure: Hayden Planetarium data flow. Sites: NCSA, AMNH (NYC), SDSC, CalTech, UVa, BIRN. Systems: SGI, IBM SP2, UniTree (2.5 TB), GPFS (7.5 TB), HPSS (7.5 TB). Stages: data simulation, visualization, rendering.]
Data involved in Hayden
- ISM: Interstellar Medium Simulation
  - Run by Mordecai Mac Low of AMNH at NCSA
  - 2.5 terabytes sent from NCSA to SDSC; data stored in SRB (HPSS, GPFS)
- Ionization
  - Simulation run at AMNH
  - 117 gigabytes sent from AMNH to SDSC; data stored in SRB
- Star motion
  - Simulation run at AMNH by Ryan Wyatt
  - 38 megabytes sent from AMNH to SDSC
- Rendering movies
  - Intermediate steps produced 7.5 terabytes; data stored in SRB (SDSC, CalTech)
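The volumes above can be tallied to show the scale one project pushed through SRB. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes); the slide does not say which convention it uses.

```python
# Volumes as listed on the slide; decimal (SI) units are an assumption.
TB, GB, MB = 10**12, 10**9, 10**6

transfers_to_sdsc = {
    "ISM (NCSA -> SDSC)": 2.5 * TB,
    "Ionization (AMNH -> SDSC)": 117 * GB,
    "Star motion (AMNH -> SDSC)": 38 * MB,
}
intermediate_rendering = 7.5 * TB  # stored in SRB at SDSC and CalTech

total_sent = sum(transfers_to_sdsc.values())
print(f"Sent to SDSC: {total_sent / TB:.3f} TB")
print(f"Including intermediate rendering steps: {(total_sent + intermediate_rendering) / TB:.3f} TB")
```

The ISM run dominates the wide-area transfers; the intermediate rendering data is larger still.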
Data Grid Requirements
- Functional
- Logical
- Service-oriented
- Administrative
- Physical
Data Grid Requirements 1 (Functional)
- Seamless Access
- Scale in Size and Number
- Guaranteed Delivery
  - Fault tolerance, load sharing
  - Replication, consistency maintenance
- Handle Heterogeneity and Multiplicity
  - Platforms, systems, vendors, types of storage, types of services, types of processes, and users
- Controlled Data Movement
  - Demand-driven data placement
  - Caching, archiving, versions and locks
  - Third-party data movement
  - Parallel data transfer
  - Server-driven or client-controlled
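Client-controlled parallel transfer can be sketched as follows: the client splits an object into byte ranges, fetches them concurrently, and reassembles them in order. `fetch_range`, `split_ranges`, and `parallel_get` are hypothetical names, and the ranged read is simulated on an in-memory buffer rather than any real SRB call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_range(source: bytes, start: int, end: int) -> bytes:
    # Hypothetical stand-in for a remote ranged read; here it slices an in-memory buffer.
    return source[start:end]


def split_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    # Divide [0, size) into at most `streams` contiguous byte ranges of near-equal length.
    if size == 0:
        return []
    step = -(-size // streams)  # ceiling division
    return [(off, min(off + step, size)) for off in range(0, size, step)]


def parallel_get(source: bytes, streams: int = 4) -> bytes:
    # Fetch every range concurrently; pool.map yields results in submission order,
    # so the segments can be reassembled with a plain join.
    ranges = split_ranges(len(source), streams)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        parts = list(pool.map(lambda r: fetch_range(source, *r), ranges))
    return b"".join(parts)


data = bytes(range(256)) * 1000
assert parallel_get(data, streams=8) == data
```

In a server-driven variant, the server rather than the client would choose the number of streams and the segment boundaries.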
Data Grid Requirements 2 (Logical)
- Handle Autonomous Authentication
  - Multiple authentication realms, single sign-on
  - Uniform user name space
- Authorization and Access Control
  - Seamless one-stop authorization
  - Roles, tickets, inheritance, longevity
- Virtual Data Organization
  - Data location independence
  - Uniform data name space, persistent identifiers
  - Collections hierarchy
- Integrate with Metadata
  - Finding aids: complex querying, browsing
  - System, user-defined, domain-specific, application
  - Access control for metadata
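Tickets, inheritance, and longevity can be combined in one small structure. This is a hypothetical sketch, not SRB's ticket format: a ticket names a logical object or collection, expires at a given time, and allows a limited number of uses. Access is inherited by everything below the named collection in the hierarchy.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical read ticket: valid for one logical object or collection,
    # until an expiry time, for a limited number of uses.
    logical_name: str
    expires_at: float
    uses_left: int

    def covers(self, logical_name: str) -> bool:
        # A ticket on a collection is inherited by everything below it in the hierarchy.
        prefix = self.logical_name.rstrip("/") + "/"
        return logical_name == self.logical_name or logical_name.startswith(prefix)

    def redeem(self, logical_name: str, now: Optional[float] = None) -> bool:
        # Grant access only if the name is covered, the ticket is unexpired, and uses remain.
        now = time.time() if now is None else now
        if not self.covers(logical_name) or now >= self.expires_at or self.uses_left <= 0:
            return False
        self.uses_left -= 1
        return True
```

Because the check works on logical names, the ticket stays valid when the physical copy moves, which is the point of location independence.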
Data Grid Requirements 3 (Service-oriented)
- Data Services
  - Third-party services
  - Web accessibility (HTTP GET, WSDL, SOAP)
  - Language APIs
  - Computational grid interaction: Globus
  - Examples
    - Ingestion, certification and authenticity
    - Value-added integration
- Server-side Operations
  - Close-to-data
  - Proxy operation (security/access considerations)
  - Bulk operations: batch
  - JIT operations: interactive or on-demand
  - Seamless chaining and composition
  - Examples
    - Data filtering, e.g. DataCutter
    - Format conversion, e.g. thumbnail creation
    - Metadata extraction
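Seamless chaining and composition of server-side operations amounts to function composition: each operation takes an object and returns a transformed one, so a chain can run close to the data in one pass. The operation names and the dict-shaped object below are hypothetical illustrations, not an SRB interface.

```python
from functools import reduce
from typing import Callable

Operation = Callable[[dict], dict]


def compose(*ops: Operation) -> Operation:
    # Chain operations left to right into a single operation.
    return lambda obj: reduce(lambda acc, op: op(acc), ops, obj)


def filter_rows(obj: dict) -> dict:
    # Data filtering: keep only rows above a threshold.
    return {**obj, "rows": [r for r in obj["rows"] if r > 10]}


def to_csv(obj: dict) -> dict:
    # Format conversion: render the rows as a CSV line.
    return {**obj, "payload": ",".join(map(str, obj["rows"]))}


def extract_metadata(obj: dict) -> dict:
    # Metadata extraction: record facts about the converted object.
    return {**obj, "metadata": {"count": len(obj["rows"]), "format": "csv"}}


pipeline = compose(filter_rows, to_csv, extract_metadata)
result = pipeline({"rows": [3, 12, 7, 40]})
```

Only `result` would need to cross the network, which is the benefit of running the chain close to the data.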
Data Grid Requirements 4 (Administrative)
- Virtual Administration
  - Single-point administration
  - Autonomous local control
  - Multiple levels of administration
  - Roles and responsibilities
- Policy Management
  - Distributed caching, archiving, replication and data placement
  - Locking, pinning, backup
  - Data movement
  - Preferences, priorities and administration
  - Auditing, quotas, pricing
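Quotas and auditing interact in a simple way: every storage request is checked against the user's limit, and the decision is logged whether or not it is granted. The `QuotaManager` class is a hypothetical sketch, assuming per-user byte limits. It does not describe how SRB enforces quotas.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaManager:
    # Hypothetical per-user storage quota with an audit trail of every decision.
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)
    audit_log: list[tuple[str, str, int, bool]] = field(default_factory=list)

    def request_store(self, user: str, nbytes: int) -> bool:
        current = self.used.get(user, 0)
        # Users without a configured limit get a quota of zero.
        allowed = current + nbytes <= self.limits.get(user, 0)
        if allowed:
            self.used[user] = current + nbytes
        self.audit_log.append((user, "store", nbytes, allowed))
        return allowed
```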
Data Grid Requirements 5 (Physical)
- Storage
  - Hierarchical storage systems, tapes, disks, SAN, NAS, NFS, databases, FTP servers, HTTP servers, WSDL services
  - Integration based on device characteristics
  - Storage bricks
  - Distributed cluster storage
- Network
  - Characteristics
  - NWS (Network Weather Service)
  - Guaranteed service
SRB: A Data Grid Solution
- The Storage Resource Broker (SRB) is middleware
- It provides uniform access to data in heterogeneous resources
- It uses a MetaCatalog (MCAT) to facilitate the brokering
[Figure: an Application connects to the SRB Server, which uses MCAT and brokers access to HRM; DB2, Oracle, Illustra, ObjectStore; HPSS, ADSM, UniTree; UNIX, NTFS, HTTP, FTP.]
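The brokering idea can be sketched as one abstract driver interface that every storage type implements, plus a catalog that maps logical names to a resource and physical path. Everything here is a hypothetical illustration: the class names, the resource names, and the in-memory drivers standing in for a file system and an archive. It is not SRB's actual driver API.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    # Uniform interface every storage back end must implement (hypothetical).
    @abstractmethod
    def read(self, physical_path: str) -> bytes: ...

    @abstractmethod
    def write(self, physical_path: str, data: bytes) -> None: ...


class InMemoryFileSystem(StorageDriver):
    # Stand-in for a UNIX file system driver.
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def read(self, physical_path: str) -> bytes:
        return self.files[physical_path]

    def write(self, physical_path: str, data: bytes) -> None:
        self.files[physical_path] = data


class InMemoryArchive(InMemoryFileSystem):
    """Stand-in for an archival system; a real one would stage data from tape first."""


class Broker:
    def __init__(self, drivers: dict[str, StorageDriver]) -> None:
        self.drivers = drivers
        # Logical name -> (resource name, physical path); this plays the MCAT role.
        self.catalog: dict[str, tuple[str, str]] = {}

    def put(self, logical: str, resource: str, physical: str, data: bytes) -> None:
        self.drivers[resource].write(physical, data)
        self.catalog[logical] = (resource, physical)

    def get(self, logical: str) -> bytes:
        # The application names data logically; the broker picks the driver.
        resource, physical = self.catalog[logical]
        return self.drivers[resource].read(physical)
```

Adding a new storage type means writing one more `StorageDriver` subclass; applications calling `Broker.get` do not change.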
SRB Concepts
- Abstraction of User Space
  - Single sign-on
  - Multiple authentication schemes
  - Certificates, passwords, tickets, group permissions, roles
- Virtualization of Resources
  - Resource location, type, and access transparency
  - Logical resource definitions: bundling
- Abstraction of Data and Collections
  - Virtual collections: persistent identifier and global name space
  - Replication and segmentation
- Data Discovery
  - System and application metadata
  - User-defined metadata: structural and descriptive
  - Attribute-based access (path names become irrelevant)
- Uniform Access Methods
  - APIs, command line, GUI browsers, web access (portal, WSDL, CGI)
  - Parallel access with both client- and server-driven strategies
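Attribute-based access replaces path names with conditions on metadata. The sketch below assumes a hypothetical in-memory catalog of metadata entries. A condition is either an exact value or a predicate, and the query returns the matching logical names.

```python
from typing import Any

# Hypothetical catalog entries: logical name plus user-defined attributes.
catalog: list[dict[str, Any]] = [
    {"logical": "/hayden/ism/t0001", "attrs": {"project": "hayden", "stage": "simulation", "timestep": 1}},
    {"logical": "/hayden/ism/t0002", "attrs": {"project": "hayden", "stage": "simulation", "timestep": 2}},
    {"logical": "/hayden/movies/frame0001", "attrs": {"project": "hayden", "stage": "rendering", "timestep": 1}},
    {"logical": "/birn/scan042", "attrs": {"project": "birn", "stage": "acquisition"}},
]


def matches(attrs: dict[str, Any], conditions: dict[str, Any]) -> bool:
    # Every condition must hold; a callable is a predicate, anything else must be equal.
    for key, want in conditions.items():
        if key not in attrs:
            return False
        have = attrs[key]
        if callable(want):
            if not want(have):
                return False
        elif have != want:
            return False
    return True


def query(entries: list[dict[str, Any]], **conditions: Any) -> list[str]:
    # Return logical names whose attributes satisfy all conditions.
    return [e["logical"] for e in entries if matches(e["attrs"], conditions)]
```

For example, `query(catalog, project="hayden", stage="simulation")` finds both ISM timesteps without the caller knowing where they are stored.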
Federated SRB Operation
[Figure: peer-to-peer brokering. A read application issues a request by logical name or attribute condition to an SRB server; federated SRB servers consult MCAT, spawn SRB agents, and provide parallel data access to resources R1 and R2. The figure numbers the steps 1 to 6.]
MCAT:
1. Logical-to-physical mapping
2. Identification of replicas
3. Access and audit control
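The three MCAT functions can be walked through in one read path: resolve the logical name, check and audit access, then try the known replicas until one is reachable. The catalog contents, user names, and paths below are hypothetical, and the code evaluates access control before replica selection, which is one plausible ordering rather than the documented SRB sequence.

```python
# Hypothetical catalog: logical name -> replicas (resource, physical path) and authorized readers.
MCAT = {
    "/hayden/ism/run1": {
        "replicas": [("R1", "/gpfs/hayden/run1.dat"), ("R2", "/hpss/hayden/run1.dat")],
        "readers": {"raja", "mike"},
    },
}
# Hypothetical contents of the two storage resources.
RESOURCES = {
    "R1": {"/gpfs/hayden/run1.dat": b"ism-run1"},
    "R2": {"/hpss/hayden/run1.dat": b"ism-run1"},
}
AUDIT_LOG: list[tuple[str, str, bool]] = []


def federated_read(user: str, logical: str, reachable: set[str]) -> tuple[str, bytes]:
    entry = MCAT.get(logical)  # 1. logical-to-physical mapping
    if entry is None:
        raise FileNotFoundError(logical)
    allowed = user in entry["readers"]  # 3. access control ...
    AUDIT_LOG.append((user, logical, allowed))  # ... and audit
    if not allowed:
        raise PermissionError(f"{user} may not read {logical}")
    for resource, path in entry["replicas"]:  # 2. identification of replicas, tried in order
        if resource in reachable:
            # The figure shows an SRB agent near the data serving this step; here we read directly.
            return resource, RESOURCES[resource][path]
    raise OSError(f"no reachable replica of {logical}")
```

If R1 is down, the same call transparently returns the R2 copy, which is how replicas give fault tolerance.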
SRB Rack
- Brick of Disk Storage
  - 1 TB to 4 TB, possibly more
  - Linux-based
  - Pluggable
  - Self-organizing
  - Each brick may be internally RAIDed
- Processor for Data-side Operations
- High-speed Network Connection
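The slides do not say how redundancy across bricks (the collection-level RAIB) works. One plausible reading, assumed here, is RAID-5-style XOR parity applied across bricks instead of disks: data is striped over several bricks plus one parity brick, and any single lost brick can be rebuilt from the survivors. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length chunks.
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, n_data_bricks: int) -> list[bytes]:
    # Split data into equal-size chunks (zero-padded), then append one XOR parity chunk.
    chunk = -(-len(data) // n_data_bricks)  # ceiling division
    padded = data.ljust(chunk * n_data_bricks, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(n_data_bricks)]
    return chunks + [reduce(xor_bytes, chunks)]


def rebuild(bricks: list[bytes], lost: int) -> bytes:
    # XOR of all surviving chunks (data and parity) equals the missing chunk.
    survivors = [b for i, b in enumerate(bricks) if i != lost]
    return reduce(xor_bytes, survivors)
```

With four data bricks and one parity brick, the overhead is 25% extra storage, and any single brick can fail without data loss.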
SRB Rack
Biomedical Informatics Research Network (BIRN)
Conclusion
- SRB provides
  - a uniform interface to heterogeneous data resources
  - logical name space management
  - replica management and mapping
  - attribute-based data discovery and access
  - parallel access to replicated data
- SRB Rack
  - Data brick that can be plugged into data grids
  - Storage as a service component
  - Easy management: lights-out
  - Collection-level RAIB