SRB Tutorial NPACI All Hands Meeting 1999

SAN DIEGO SUPERCOMPUTER CENTER. A National Laboratory for Computational Science & Engineering

Provided by: sdsc

1
SRB Tutorial NPACI All Hands Meeting 1999
2
WWW
  • Exchange of information
  • specifically text, images and multi-media
  • hyper-links to navigate through documents
  • search engines for indexing
  • Not friendly for exchange of meta-information -
    yet
  • Not easy to integrate with computation

3
DATA
  • Data - any body of information that can be used
    for computation and communication
  • Scientific Data
  • data from experiments
  • images (scans)
  • genetic strings (DNA)
  • simulation

4
METADATA
  • Meta Data - data that qualifies data
  • information that captures the semantics of data
  • date of experiment, reactants used, result
    obtained,
  • useful for communication and computation
  • Example
  • Titanic
  • Leonardo DiCaprio
  • James Cameron

5
Data Handling System
  • Require knowledge of file name
  • Distributed file systems
  • Persistent object environments
  • Require special interface for data access
  • Database systems
  • Local solution with well-known file name
  • Data migration systems

6
Data-Intensive Computing
  • Support new modes of science
  • Enable analysis of very large data sets
  • Improve the ability to conduct science
  • Build discipline specific data collections
  • Build tools that decrease time needed to transfer
    information
  • Automate information discovery
  • Enable Information Based Computing

7
Information Based Computing
  • Enable information discovery from scientific
    applications
  • Metadata Catalogs
  • Enable data management and access to
    heterogeneous, distributed data sources
  • Storage Resource Brokers
  • Provide scalable systems, terabyte data access
    from petabyte archives
  • Parallel I/O

8
Common Middleware
  • Distributed computing environment for remote
    execution of procedures
  • Distributed data handling environment for access
    to archives, databases and file systems
  • Inter-realm authentication system
  • Distributed information discovery
  • Collaboration environment

9
Evolution of Data Handling Environment
  • Tightly coupled database / archival storage
  • Metadata catalog implemented to identify
    resources
  • Separation of data identification from dataset
    access
  • Separation of services from repositories to
    improve interoperability
  • Integration with digital library technology

10
What is SRB?
  • SRB is an Intelligent Data Access System
  • SRB provides federated access to datasets
  • SRB provides protocol transparency to diverse and
    distributed storage systems
  • SRB provides location transparency to distributed
    datasets
  • SRB provides access transparency to remote users

11
What is SRB?
  • Extends File Systems
  • Extends Database Systems
  • Extends I/O protocol
  • Extends WWW
  • Extends Digital Library Systems

12
SRB Concepts(1)
  • Provide Scalability
  • Hosts
  • Resource Types
  • Resources
  • Collections
  • Data Objects - size and number
  • Users and Groups
  • Methods
  • MetaData

13
SRB Concepts(2)
  • Provide Logical Abstractions
  • srbSpace - an abstract storage space
  • Resource Types - resource defined by properties
  • Resources - resource identified by name and type
  • multiple resources tied together as a single
    resource
  • Collections - abstraction over directory
    structure
  • distributed and curated
  • Datasets - identified by properties
  • Users - authenticated across hosts/networks
  • Domain - abstraction over physical domains
  • Metadata Schema/Attributes

14
SRB Concepts(3)
  • Provide Uniform Interfaces
  • Uniform API to Resources - archival, file and DB
  • Uniform API to federated Resources
  • Uniform Access to Collections and Datasets
  • Uniform Authentication across SRB space

15
SRB Concepts(4)
  • Replication of Datasets
  • Access Control Lists
  • Ticket-based Access
  • Auditing
  • Authentication and Encryption (SEA)
  • Server-side proxy Operations
  • Metadata-based Discovery
  • Rich Interface - programmatic and interactive

16
What is MCAT?
  • Cataloging System
  • Metadata Repository
  • Digital Object Metadata
  • type, format, lineage, usage methods,
    domain-specific attributes, collection info, etc
  • System-level Metadata
  • access control, audit trails, location,
    replication, resource types, user groups, etc
  • Schema-level Metadata
  • ontology, relationships among attributes/schemas,
    semantics of attributes, etc
  • Uniform Access and Federation interface

17
The Storage Resource Broker is Middleware
[Figure labels: Application (SRB client); SRB Server; MCAT; databases (DB2, Oracle, Illustra, ObjectStore); archives (HPSS, UniTree); file access (UNIX, ftp)]
18
Software Architecture of the SRB
[Figure labels: SRB Client; Resource; User; FILE SID; DB SID; Object SID; SRB; MCAT; Dublin Core; Application Metadata; DB2; Oracle; Unix; ADSM; HPSS]
19
The SRB Process Model
[Figure labels: Application; (Host, port); SRB Master (port); SRB agents; MCAT]
20
Federated SRB Operation
[Figure labels: Application; two SRB servers, each with an SRB agent; MCAT; numbered steps 1 to 6]
21
SRB Space
[Figure: multiple SRB nodes. Legend: DR - Data Repository, DL - Digital Library, MC - Meta Catalog, CP - Comp Process / SRB Client]
22
SRB V1.0 Features
  • Multi-platform (clients and servers)
  • SunOS/Solaris, AIX, Cray C90, DEC OSF
  • API and command line interfaces
  • Low-level and high-level APIs
  • Storage systems supported
  • DB2, Illustra, Unitree, HPSS, UNIX
  • Support for federated servers
  • Released early September, 1997

23
SRB V1.1 Features
  • In beta in DOCT. To be released in January, 1998
  • Ported to additional platforms - SGI, Cray T3E
  • Incorporates the SDSC Encryption and
    Authentication (SEA) Library
  • Ticket-based access control
  • Graphical user interface - SRBTool
  • Additional storage systems supported
  • Oracle, Objectstore, ftp, http
  • Oracle-based MCAT
  • Support for proxy operations, e.g. move, copy,
    replicate
  • Data replication using Logical Storage Resource

24
New SRB Features
  • Java-based SRB browser
  • C API
  • SRBIO - C library for redirecting stdio
  • Proxy functions for meta data extraction
  • System Monitor for remote auto-startup
  • System Parameters stored in MCAT

25
MCAT Metadata Catalog
  • Stores metadata about
  • Users, Data sets, Resources, Methods
  • Provides collection abstraction
  • Stores detailed access control information
  • Maintains audit trail information on data sets
  • Implemented as a relational database with
    referential integrity constraints (currently uses
    DB2, ported to Oracle)

26
MCAT Architecture
[Figure labels: MCAT Interface Functions; MAPS to Schema Convertor; Schema to MAPS Convertor; MAPS Initialization; MAPS Semantics; Answer Extractor; Cursor Control; Dynamic Query Generator; Schema Initialization; Schema Semantics; Oracle Query System; DB2 Query System]
27
Federated Catalog Architecture
[Figure labels: MAPS; MCAT; MAPS Interface; External CATALOG Interface; Internal Catalogs; CAT-1; CAT-2; CATALOG; Local Interface; Local Routines; Semantics Definitions]
28
New MCAT Features
  • Meta-Schema to hold System and User meta data
    schema information
  • Extensible meta data schema
  • Distributed meta data schema
  • Metadata exchange Interface Protocol
  • MAPS - Metadata Attribute Presentation Structure
  • query, update and result structures
  • Close to Z39.50

29
New MCAT Features (contd.)
  • Core Schema Implemented
  • MCAT Core - Data, Resources, Users and Methods
  • Dublin Core
  • IV Core - Image Visualization attributes
  • Web-based Prototype User Interface
  • extensible schema functions
  • query, insert and update of meta data
  • integrated presentation of meta data and data

30
SRB Data Replication Support
  • Replication via Resource Set definition
  • Replication support integrated into write
    function
  • srbObjReplicate API can be used for post facto
    replication
  • Synchronous replication across all sites. Can
    choose any k out of n
  • Can choose specific replica on read operation
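
The "any k out of n" bullet is terse, so the following is only one possible reading, sketched in Python rather than with the SRB API: a write is attempted on every resource in a set and counts as successful once at least k of the n resources hold a copy. The function `replicated_write`, the helper `make_writer`, and the resource names are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List


def replicated_write(
    data: bytes,
    resources: Dict[str, Callable[[bytes], None]],
    k: int,
) -> List[str]:
    """Try to write data to every resource; succeed if at least k writes succeed.

    Returns the names of the resources that now hold a replica.
    Raises OSError when fewer than k writes succeeded.
    """
    if not 1 <= k <= len(resources):
        raise ValueError("k must be between 1 and the number of resources")
    written = []
    for name, write in resources.items():
        try:
            write(data)
            written.append(name)
        except OSError:
            continue  # this site failed; keep trying the others
    if len(written) < k:
        raise OSError(
            f"only {len(written)} of {len(resources)} replicas written, need {k}"
        )
    return written


# Hypothetical in-memory stand-ins for three storage resources.
stores: Dict[str, List[bytes]] = {"unix-a": [], "hpss-a": [], "hpss-b": []}


def make_writer(name: str, fail: bool = False) -> Callable[[bytes], None]:
    def write(data: bytes) -> None:
        if fail:
            raise OSError(f"{name} unavailable")
        stores[name].append(data)
    return write


resource_set = {
    "unix-a": make_writer("unix-a"),
    "hpss-a": make_writer("hpss-a", fail=True),  # simulate one failed site
    "hpss-b": make_writer("hpss-b"),
}
replicas = replicated_write(b"payload", resource_set, k=2)
print(replicas)
```

With one of the three simulated sites down, the call still succeeds for k=2 and reports which resources hold a copy; a read could then pick any name from that list.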

31
Data Replication (DOCT)
[Figure labels: Application (SAIC); MCAT; SRB servers; sites SDSC, Caltech, NCSA; logical resources LogRsrc1, LogRsrc2; storage HPSS, HPSS, Oracle, DB2, Unix]
32
SEA (SDSC Encryption and Authentication)
  • Developed as part of DOCT
  • Designed for Supercomputing/ MetaComputing
    Environment
  • Based On RSA Public/Private Keys and RC5
    Encryption Algorithm
  • Integrated into SRB
  • Being integrated into pftp and hsi for
    remote HPSS access

33
SEA Features
  • Secure User/Process Authentication Across Network
    (TCP Sockets)
  • Optional Encryption As Independent Function
  • Simple API
  • Batch Support - Long-term Certificates
  • Adjustable Key Lengths (Speed/Security Tradeoff)
  • User-Adjustable Encryption Levels (Speed/Security
    Tradeoff)
  • Multiple Initial User Registration Methods (Set
    By Administrator)
  • Self-Introduction
  • Trusted Host
  • Password
  • Available for Cray T90, C90, T3E, SunOS, Solaris,
    IRIX, OSF1, AIX, CS6400, NextStep
  • More Information: http://www.sdsc.edu/schroede/sea.html

34
Ticket-based Access Control
  • Owner can request ticket for a data set
  • Ticket can be issued for a data set or a
    collection
  • Ticket controls access by
  • time-period (start and expire timestamps)
  • number of accesses (count)
  • user names (any, single or group users)
  • Non-registered Users can also access using
    tickets
  • Useful for sharing data and access through the
    web
  • Tickets generated and stored in MCAT
  • Currently supports read-only tickets
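
The three ticket controls listed above (time period, access count, user names) can be sketched as a small check. This is a minimal Python model, not the SRB implementation: the `Ticket` class, its field names, the inclusive time bounds, and the sample path and users are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class Ticket:
    """Hypothetical in-memory model of a read-only ticket."""
    target: str                        # data set or collection the ticket covers
    start: datetime                    # start timestamp
    expire: datetime                   # expire timestamp
    remaining: int                     # number of accesses still allowed
    users: Optional[Set[str]] = None   # None models "any" user

    def permits(self, user: str, op: str, now: datetime) -> bool:
        if op != "read":               # only read-only tickets are modelled
            return False
        if not (self.start <= now <= self.expire):
            return False
        if self.remaining <= 0:
            return False
        return self.users is None or user in self.users

    def use(self, user: str, op: str, now: datetime) -> bool:
        """Check the ticket and, if access is granted, consume one access."""
        if not self.permits(user, op, now):
            return False
        self.remaining -= 1
        return True


ticket = Ticket(
    target="/home/demo/report.txt",
    start=datetime(1999, 1, 1),
    expire=datetime(1999, 12, 31),
    remaining=2,
    users={"alice", "bob"},
)
when = datetime(1999, 6, 1)
results = [
    ticket.use("alice", "read", when),   # inside the window, count 2 -> 1
    ticket.use("carol", "read", when),   # not on the user list
    ticket.use("bob", "write", when),    # write is not covered by the ticket
    ticket.use("bob", "read", when),     # count 1 -> 0
    ticket.use("alice", "read", when),   # count exhausted
]
print(results)  # [True, False, False, True, False]
```

Setting `users=None` models the "any" case, which is how a non-registered user could be admitted on the strength of the ticket alone.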

35
SRB API
  • Programmatic API
  • High-level API
  • Low-level API
  • SRB Manager API
  • Command Level Interface - Scommands
  • Graphical User Interface - srbBrowser
  • Web Utilities

36
SRB API Interface
[Figure labels: Application; SRB Master; MCAT]
37
High and Low-level API
  • Low-level API
  • talks to resource drivers
  • no registration of data sets in MCAT
  • no authentication through MCAT
  • User provides all information
  • High-level API
  • Uses low-level API to access resources
  • Registers data management information in MCAT
  • Uses MCAT for authentication and meta information
  • Uses MCAT for resource and data discovery
  • Access/store data in remote SRB
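
The split described above can be illustrated with a toy model in Python. It is a sketch under stated assumptions, not the SRB API: two dictionaries stand in for physical storage and for MCAT, and every function name, host, and path below is hypothetical. The low-level calls need the full physical location from the caller, while the high-level calls register and resolve a logical name through the catalog.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for physical storage: (storType, host, fileName) -> bytes
STORAGE: Dict[Tuple[str, str, str], bytes] = {}
# Hypothetical stand-in for MCAT: (collectionName, objName) -> physical location
CATALOG: Dict[Tuple[str, str], Tuple[str, str, str]] = {}


def low_level_write(stor_type: str, host: str, file_name: str, data: bytes) -> None:
    """Low-level style: the caller supplies every physical detail."""
    STORAGE[(stor_type, host, file_name)] = data


def low_level_read(stor_type: str, host: str, file_name: str) -> bytes:
    return STORAGE[(stor_type, host, file_name)]


def high_level_create(obj_name: str, collection: str,
                      resource: Tuple[str, str], path: str, data: bytes) -> None:
    """High-level style: store the data, then register it in the catalog."""
    stor_type, host = resource
    low_level_write(stor_type, host, path, data)
    CATALOG[(collection, obj_name)] = (stor_type, host, path)


def high_level_read(obj_name: str, collection: str) -> bytes:
    """High-level style: the catalog resolves the logical name to a location."""
    stor_type, host, path = CATALOG[(collection, obj_name)]
    return low_level_read(stor_type, host, path)


high_level_create(
    obj_name="plot1.gif",
    collection="/home/demo",
    resource=("unix", "host.example.org"),
    path="/data/srb/plot1.gif",
    data=b"GIF89a",
)
# The reader only needs the logical name and collection.
print(high_level_read("plot1.gif", "/home/demo"))

# A low-level write stores data but leaves no trace in the catalog.
low_level_write("unix", "host.example.org", "/tmp/scratch.dat", b"raw")
print(("/home/demo", "scratch.dat") in CATALOG)
```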

38
Low-level API
  • srbFileOpen(conn, storType, host, fileName, mode)
  • srbFileCreate(conn, storType, host, fileName,
    mode)
  • srbFileClose(conn, fd)
  • srbFileUnlink(conn, storType, host, fileName)
  • srbFileRead(conn, fd, buffer, length)
  • srbFileWrite(conn, fd, buffer, length)
  • srbFileSeek(conn, fd, offset, whence)
  • srbFileSync(conn, fd)
  • srbFileStat(conn, storType, host, fileName,
    statBuf)
  • srbFileMkdir(conn, storType, host, dirName, mode)
  • srbFileRmdir(conn, storType, host, dirName, mode)
  • srbFileChmod(conn, storType, host, fileName, mode)

39
Low-Level API (contd.)
  • srbDbLobjOpen(conn, storType, resourceLoc,
    positionName, mode)
  • srbDbLobjCreate(conn, storType, resourceLoc,
    positionName, mode)
  • srbDbLobjClose(conn, dd)
  • srbDbLobjUnlink(conn, storType, host, fileName)
  • srbDbLobjRead(conn, dd, buffer, length)
  • srbDbLobjWrite(conn, dd, buffer, length)
  • srbDbLobjSeek(conn, dd, offset, whence)

40
High-level API
  • srbObjOpen(conn, objChar, mode, collectionName)
  • srbObjCreate(conn, objName, objType,
    resourceName, collectionName,
    pathName, size)
  • srbObjClose(conn, od)
  • srbObjUnlink(conn, objChar, collectionName)
  • srbObjRead(conn, od, buffer, length)
  • srbObjWrite(conn, od, buffer, length)
  • srbObjSeek(conn, od, offset, whence)
  • srbObjMove(conn, objChar, collectionName,
    newResourceName, newPathName)
  • srbObjReplicate(conn, objChar, collectionName,
    newResourceName, newPathName)
  • srbObjProxyOpr(conn, Operation, sourceDesc,
    targetDesc)

41
High-Level API (contd.)
  • srbGetDatasetInfo(conn, objChar, collectionName,
    resultStruct, requiredNumber)
  • srbGetMoreInfo(resDesc, resultStruct,
    requiredNumber)
  • srbGetDataDirInfo(conn, conditionList,
    selectList, resultStruct)
  • srbModifyDataset(conn, objId, collectionName,
    newValue1, newValue2, modifyType, resourceName,
    pathName)
  • srbCreateCollect(conn, parentCollectionName,
    childCollectionName)
  • srbListCollect(conn, CollectionName, flag,
    resultStruct)
  • srbModifyCollect(conn, CollectionName, newValue1,
    newValue2, newValue3, modifyType)
  • srbModifyUser(conn, newValue1, newValue2,
    modifyType)
  • srbSetAuditTrail(conn, setValue)
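
The srbGetDataDirInfo signature above pairs a condition list with a select list. The idea behind such a call can be sketched in Python over an in-memory table. This is an assumption-laden model, not the real call: `query_catalog`, the attribute names, and the sample rows are all made up, and the real result structure is not reproduced.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def query_catalog(
    rows: List[Row],
    condition_list: Dict[str, Callable[[Any], bool]],
    select_list: List[str],
) -> List[Row]:
    """Return the selected attributes of every row that meets all conditions."""
    matches = []
    for row in rows:
        if all(attr in row and test(row[attr])
               for attr, test in condition_list.items()):
            matches.append({attr: row.get(attr) for attr in select_list})
    return matches


# Hypothetical catalog rows describing three data sets.
rows = [
    {"name": "plot1.gif", "collection": "/home/demo", "type": "gif image", "size": 2048},
    {"name": "notes.txt", "collection": "/home/demo", "type": "text", "size": 120},
    {"name": "plot2.gif", "collection": "/home/other", "type": "gif image", "size": 4096},
]

hits = query_catalog(
    rows,
    condition_list={
        "type": lambda v: v == "gif image",
        "size": lambda v: v > 1000,
    },
    select_list=["name", "collection"],
)
print(hits)
```

Conditions are combined with a logical AND here; whether the real interface supports other combinations is not stated on the slide.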

42
System Manager API
  • srbChkMdasAuth(conn, userName, userAuth, domain)
  • srbChkMdasSysAuth(conn, userName, userAuth,
    domain)
  • srbRegisterUser(conn, userName, domain, password,
    userType, userAddress, userPhone, userEmail)
  • srbRegisterUserGrp(conn, userGrpName,
    userGrpPassword, userGrpType,
    userGrpAddress, userGrpPhone, userGrpEmail)

43
srbBrowser - An SRB Graphical Interface
  • A Java GUI
  • Interfaces with SRB servers using the client
    API library.
  • Performs most SRB operations - cp, replicate,
    import, export, metadata query, etc.

[Figure labels: USER; Java GUI (obtains the user's metadata information via SRB, invokes SRB operations); SRB Agent; MCAT; Proxy operation]
44
SRB Command Line Interface
[Figure labels: USER; Environment File; SRB shell commands (Sls, Scp, Scat, Sput, Sget, ...); SRB Agent; MCAT; Proxy operation]
45
Scommands
  • Sinit - initialize S-environment
  • Sexit - clean up
  • Sman - get manpage for Scommand
  • Scat - display srbObject on screen
  • Sput - copy local file into srbSpace
  • Sget - copy srbObject to local space
  • Sappend - append to srbObject
  • Srename - change srbObject name
  • Srm - remove srbObject
  • Schmod - change/grant access to srbObject
  • Scd - change collection
  • Spwd - display current collection
  • Sls - list collection
  • Smkdir - make new collection
  • Srmdir - remove old collection
  • SgetD - get srbObject information
  • SgetR - get resource information
  • SgetU - get user information
  • SmodD - modify srbObject info
  • SmodU - modify user info
  • Stoken - get native type information
  • Scopy - copy srbObject in another
    collection and under another name
  • Sreplicate - clone object in new resource -
    same internal id
  • Smove - move srbObject to new collection or
    resource

46
Scommands (contd.)
  • ingestUser - adding a new user or group
  • ingestResource - adding a new resource
  • ingestLogicalResource - making a new resource
    grouping
  • addLogicalResource - adding to a resource
    grouping
  • ingestLocation - adding new location
    information
  • ingestToken - adding new native types
    (e.g. resourceType, objectType, userType,
    domainName, ActionType, ...)

47
Scommands
  • Sls
  • Sls -h -L number -Y number -r-f
    collection ...
  • Sls -L number -Y number srbObj
  • Sput
  • Sput -p -D dataType -R resourceName
    -P pathName localFileName ...
    TargetName
  • Sput -p -D dataType -R resourceName
    -P pathName -i TargetName
  • Sget
  • Sget -C_n -p srbObj ... localFile
  • Sreplicate
  • Sreplicate -Cn -p -R resourceName
    -P pathName srbObj ...

49
SRBIO
50
SRBIO
  • open
  • creat
  • read
  • write
  • close
  • lseek
  • fopen
  • fread
  • fwrite
  • fclose
  • fseek
  • fflush
  • fgetc
  • fgets
  • fputc
  • fputs
  • getc
  • putc
  • ungetc
  • rewind
  • vfprintf
  • fprintf
  • fscanf

51
Web Utilities
  • Sgetw - copies a SRBobject into server site
  • Sputw - copies local file in SRBspace
  • Scatw - displays SRBobject on browser
    (handles types)
  • Slsw - displays information of SRBobjects

52
SRB Case Studies
  • Digital Libraries
  • ELIB - Berkeley Digital Library (UCB)
  • ADL - Alexandria Digital Library (UCSB)
  • Ecological Archives Data Repository (UCSD)
  • Environmental Archives
  • International Satellite Cloud Climatology Project
    Data
  • TIES Data Atlas - Chesapeake Bay Estuarine
    Studies
  • DOCT - Patent Workflow System

53
Digital Libraries
  • Access to images, documents and tools
  • Large Number of files -
  • Images of various resolution
  • Documents of various types (valences)
  • Web-based access - form and spatial queries
  • Domain Metadata - External DB
  • Digital Objects replicated
  • Uses SRB web interface and low-level API

54
Digital Libraries and SRB
ADL
ELIB
SRB
55
DOCT - Patent Workflow
  • Archiving Applications and Office-Actions
  • Replicated Archiving
  • Storage of Issued Patents in multiple forms -
    SGML, DB Schema, HTML
  • Access of Patents from replicated storage
  • Controlled Access for Applications and
    Office-Actions
  • Uses SRB web utilities and high-Level API
  • URL: http://www.sdsc.edu/DOCT

56
[Figure: DOCT patent workflow. Labels: Applicant; Examiner; Mail Room; Work Flow; Search; Public Search; Public; Other Works; SRB Client; SRB]
Legend
  • A - Secure Electronic Filing
  • B - App As Filed SRB/SEA
  • C - Replication SRB/SEA
  • D - Mailroom To Workflow
  • E - User To Workflow
  • F - Search Other Other
  • G - Workflow Agent FrameW
  • H - WorkF To Arch SRB/SEA
  • I - Applicant Search Auth Web
  • J - Pub Search Elec Commerce
57
ESADR
  • Data Archive for Ecological Society
  • Scientists can publish, update and control
    access to data sets
  • Domain meta data kept in external database
    (Oracle)
  • Web-based upload and download mechanism
  • FTP and Email support for very large data sets
  • Spatial, temporal and bibliographic search
  • Uses SRB web-utilities and high-level API
  • ESADR Homepage: http://esadr.sdsc.edu

58
ESADR Dataflow
[Figure labels: Register; Login; Connect; Login Retrieve; SRB]
59
TIES
  • Distributed Data Atlas for Cruise-Transects
  • Data collected at Chesapeake Bay Estuarine
    Studies
  • 26 transects, 3 times/year, 6 variables
  • Reference atlas of 2-D color plots
  • Domain metadata stored in Oracle
  • Each Object registered in MCAT
  • 3 partner sites - replication and staging in
    hierarchical storage (Unix and HPSS)
  • Uses Scommands and srb web utilities
  • URL: http://www.sdsc.edu/marciano/DOCT/Atlas/doct.html

66
Global Clouds Database
  • Storing cloud information throughout the world
  • Tabular data - made online through SRB
  • 6,596 grid-cells over the globe
  • 200 variables per grid-cell
  • Data collected every 3 hours over 4 years (89-92)
  • Small metadata - stored in Flat file
  • Each cell dataset (4 yrs data) is stored in SRB
    (HPSS)
  • Uses Scommands and srb web utilities
  • URL: http://www.sdsc.edu/marciano/clouds/clouds.htm
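
The figures on this slide imply a data volume that is easy to work out. The Python sketch below does the arithmetic, assuming that "89-92" means the four full calendar years 1989 through 1992 and that every variable is recorded at every 3-hour step for every cell; neither assumption is stated on the slide.

```python
from datetime import date

GRID_CELLS = 6596          # from the slide
VARIABLES_PER_CELL = 200   # from the slide
HOURS_BETWEEN_SAMPLES = 3  # from the slide

# Assumption: four full calendar years, 1989-01-01 up to 1993-01-01.
days = (date(1993, 1, 1) - date(1989, 1, 1)).days
samples_per_day = 24 // HOURS_BETWEEN_SAMPLES
time_steps = days * samples_per_day
values_per_cell = time_steps * VARIABLES_PER_CELL
total_values = values_per_cell * GRID_CELLS

print(f"days: {days}")                        # 1461 (1992 is a leap year)
print(f"time steps per cell: {time_steps}")   # 11688
print(f"values per cell: {values_per_cell}")  # 2337600
print(f"total values: {total_values}")        # 15418809600
```

Under these assumptions each per-cell data set holds about 2.3 million values, and the whole database about 15.4 billion.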

69
Summary
  • Storing, Publishing, Sharing and Cooperating
  • Distributed, Replicated, Heterogeneous Data
    Cache
  • Unifies access to Archival Storage, Database
    Storage, Disk Storage
  • Information Discovery (application-level
    metadata)
  • unifying meta-catalogs (future work)
  • Secure, encrypted, controlled access and data
    movement
  • integrate with other security systems (future
    work)
  • No new environment requirement

70
Future Work
  • Integration with IBM digital library software
    (e.g. federation of MCAT and DL metadata catalog)
  • Replication and partitioning of MCAT across wide
    area
  • Integration with NWS
  • Schema evolution, i.e. extending the extensibility
    feature to cover the versioning concept
  • Parallel I/O

71
Future Work
  • Caching
  • Incorporating concept of data set "resolution" in
    the system and APIs
  • Streaming I/O and access to video and large
    visualization data sets
  • Extending the IVCore concept to extract metadata
    for other types of data sets

72
The SRB/MCAT Team
  • Design
  • Reagan Moore, Chaitan Baru, Richard Frost,
    Richard Marciano, Arcot Rajasekar, Wayne
    Schroeder, Michael Wan
  • Implementation
  • Michael Wan (SRB client/server, many drivers,
    srbBrowser)
  • Arcot Rajasekar (MCAT, DB drivers, Scommands,
    SRBIO)
  • Wayne Schroeder (SEAL, Illustra and ftp drivers)
  • Mike Gleicher (HPSS driver)
  • Rob Tempelton/Randy Sharpe (Oracle driver) --
    NCSA
  • Dave Wade (Objectstore driver) -- SAIC
  • Tom Hacker (Error management) -- U. Mich
  • Marcio Faerman (NWS integration) -- UCSD

73
Client Registration
  • Get software at http://www.npaci.edu/DICE/SRB/tarfiles/???
  • Fill form at http://www.npaci.edu/DICE/SRB/install/SRBUserRegister.html
  • SRB Admin will respond with your
  • authorization password (should be changed
    immediately using Spasswd command)
  • domain
  • home collection

74
Setting the Client Environs
  • Set paths and environment variables
    set path=($path $SRBDIR/utilities/bin)
    set path=($path $SRBDIR/java/bin)
    setenv CLASSPATH /sdsc/local/generic/lib/java/swing/swing.jar
    setenv THREADS_FLAG native

75
Setting the Client Environs
  • Two environment files
  • .srb/.MdasEnv
    mdasCollectionHome '/home/u26072.test'
    mdasDomainHome 'test'
    srbUser 'u26072'
    srbHost 'ghidorah.sdsc.edu'
    defaultResource 'unix-sdsc'
    SEA_OPT '0'
  • .srb/.MdasAuth
    SRBTEST
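
For scripts that need these settings, a small reader for the .MdasEnv file may be handy. The Python sketch below assumes the file holds one setting per line as a key followed by a single-quoted value; that layout is inferred from the slide, and the function name `parse_mdas_env` is hypothetical. The sample text reuses the values shown on the slide.

```python
import shlex
from typing import Dict


def parse_mdas_env(text: str) -> Dict[str, str]:
    """Parse lines of the assumed form: key 'value'."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = shlex.split(line)  # shlex strips the surrounding quotes
        if len(parts) != 2:
            raise ValueError(f"expected key 'value', got: {line!r}")
        key, value = parts
        settings[key] = value
    return settings


sample = """
mdasCollectionHome '/home/u26072.test'
mdasDomainHome 'test'
srbUser 'u26072'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'unix-sdsc'
SEA_OPT '0'
"""

env = parse_mdas_env(sample)
print(env["srbHost"])
print(env["defaultResource"])
```

In practice the text would come from reading the file under the user's home directory rather than from an inline string.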

76
Hands On Demo
  • srbBrowser
  • Import a File into SRBspace
  • Import a Directory as a Collection
  • Display a SRBObject
  • Create a Collection
  • Remove SRBObjects
  • Remove a Collection
  • Copy a SRBObject
  • Replicate a SRBObject

77
Hands On Demo
  • srbBrowser (contd.)
  • Display Replicated Object
  • Display Access Permission
  • Enable Access for Another User
  • Change Access for Another User
  • Display Metadata Attributes

78
Hands On Demo
  • Scommands
  • Sinit -v
  • Sls
  • Spwd
  • Scat <object-name>
  • SgetU <user-name>
  • SgetR
  • Sman <Scommand>
  • Srm <object-name>
  • Sexit
  • Scd <collection-name>
  • SgetD <object-name>
  • Sput <file-name> <object-name>