Title: NPACI Data Intensive Computing Environment


1
NPACI Data Intensive Computing Environment
  • Reagan W. Moore
  • Associate Director, Data Intensive Computing
  • San Diego Supercomputer Center
  • moore@sdsc.edu
  • http://www.npaci.edu/DICE

2
Current Infrastructure Development
  • UCSD / SDSC
  • Chaitan Baru: MIX (Mediation of Information using
    XML)
  • Amarnath Gupta: XML wrappers, video/image sources
  • Bertram Ludaescher: BBQ (Blended Browsing and
    Querying)
  • Richard Marciano: BBQ/GIS interfaces
  • Arcot Rajasekar: MCAT (Meta-data Catalog)
  • Wayne Schroeder: GSI (Grid Security Infrastructure)
  • Michael Wan: SRB (Storage Resource Broker)
  • UCSD / CSE
  • Yannis Papakonstantinou: XML Matching and
    Structuring Language
  • Victor Vianu: MIX
  • Stanford
  • Andreas Paepcke: SDLIP (Simple Digital Library
    Interoperability Protocol)
  • U Md
  • Joel Saltz: ADR (Active Data Repository)

3
Themes
  • Scientific Data Collections
  • Publication of scientific data sets
  • Information discovery mechanisms
  • Application of NSF DLI-II InterLib technology to
    NPACI
  • Information Models for Data
  • eXtensible Markup Language (XML) Document Type
    Definition (DTD)
  • Information model for digital objects, data
    collections, and presentation interfaces
  • Application to scientific data collections
  • Digital sky, Protein Data Bank, Neuroscience
    brain images
  • California Digital Library - Art Museum Image
    Consortium

4
Distributed Scientific Data Collections
5
Data Collections
6
Context Management using Collections
  • For data to be useful, the context must be
    defined
  • Data format - binary/integer representation
  • Physical meaning - units
  • Structure - geometry
  • Relevance - feature annotation
  • Semantics - data dictionary for attributes
  • Context is preserved as meta-data attributes
    within a collection
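As an illustration of context preserved as meta-data attributes, the Python sketch below flattens an XML description of one data set into attribute/value pairs that a collection catalog could store. The element names, sample values, and the function `context_attributes` are hypothetical; Python's standard `xml.etree.ElementTree` parser does not validate against a DTD, so the required-element check stands in for validation here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of one data set. The element names are
# illustrative assumptions, not a DTD defined by NPACI.
DESCRIPTION = """
<dataset name="brain_slice_042">
  <format>big-endian float32</format>
  <units>micrometres</units>
  <geometry dims="512 512 64"/>
  <annotation>hippocampus, CA1 region</annotation>
  <dictionary>
    <attribute name="intensity" meaning="normalized fluorescence"/>
  </dictionary>
</dataset>
"""

# One element per kind of context listed on the slide.
REQUIRED = (
    "format",      # data format: binary/integer representation
    "units",       # physical meaning
    "geometry",    # structure
    "annotation",  # relevance: feature annotation
    "dictionary",  # semantics: data dictionary for attributes
)


def context_attributes(xml_text):
    """Flatten an XML data set description into catalog meta-data attributes."""
    root = ET.fromstring(xml_text)
    # ElementTree does not validate against a DTD, so check required context by hand.
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    if missing:
        raise ValueError(f"missing context elements: {missing}")
    return {
        "name": root.get("name"),
        "format": root.findtext("format"),
        "units": root.findtext("units"),
        "dims": tuple(int(d) for d in root.find("geometry").get("dims").split()),
        "annotation": root.findtext("annotation"),
        "dictionary": {
            attr.get("name"): attr.get("meaning")
            for attr in root.find("dictionary").iter("attribute")
        },
    }


if __name__ == "__main__":
    for key, value in context_attributes(DESCRIPTION).items():
        print(f"{key}: {value}")
```

A data set that lacks any of the five kinds of context is rejected rather than silently catalogued without it.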

7
XML Query Language
Joint development effort with UCSD CSE Database
Lab (Yannis Papakonstantinou)
8
Themes
  • Integration of Digital Library and Computational
    Grid Technology
  • Information discovery mechanisms - SDLIP
  • Inter-realm authentication - Grid Security
    Infrastructure
  • Data handling systems - Storage Resource Broker
  • Integration promoted through
  • NSF DLI-II InterLib project
  • Grid Forum

9
Information Management Architecture
  • Digital library community technologies
  • Distributed information resources
  • Digital library interoperability protocols -
    SDLIP
  • Mediation of information using XML - MIX
  • Grid Forum technologies
  • Support for distributed services / procedures
  • Inter-realm authentication
  • GSI Grid Security Infrastructure
  • Data handling system
  • Storage Resource Broker, Meta-data Catalog

10
Evolution of Grid Architectures
[Figure: grid stages labeled Single Compute Grid, Multiple Data and Compute Grids, and Open Grid Architecture; user environments labeled Heterogeneous User Environment and Common User Environment]
11
Digital Library Architecture
Meta-data manipulation services
12
Open Grid Architecture
13
Open Grid Architecture
[Figure: layered architecture. Layers: Application; Data Model Management; Remote Procedure Execution; Data Handling Systems; Information Discovery; Storage System Description; Dynamic Info Discovery; Storage Resources. Technologies shown: DTD, ADR, object class; Armada, Dagents, FEL, ADR, GRAM, SRB; Condor, GASS, NILE, SRB, I-2 caching, ADR; LDAP, database, flat file, object database (e.g., filtering); GloPerf, Netlogger, NWS; DPSS, DFS, NFS, HPSS, ADSM, DMF, Unitree, NASstore, DB2, Oracle, Informix, Sybase, O2, ObjectStore, Objectivity]
14
Open Grid Architecture
[Figure: the layered architecture of the previous slide, additionally annotated with: an API that provides glue to underlying data handling systems (security, scheduling, QoS, access protocol, data format/model, adaptivity, info discovery, location control); authentication and authorization; an API that provides glue to underlying storage, QoS, etc. (GASS, IBP, SRB)]
15
Data Handling System
  • SDSC Storage Resource Broker
  • Protocol transparency
  • Common API for access to remote data resources
  • Explicit drivers for each type of storage system
  • Name transparency
  • Attribute based access to data
  • Location transparency
  • Distribution of collection across multiple
    physical resources
  • Time transparency
  • Minimization of latency for data access
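The protocol, name, and location transparencies listed above can be sketched in Python as a broker that hides per-storage drivers behind one interface, maps logical names to physical locations, and answers attribute queries from its own catalog. All class and method names are hypothetical and are not the SRB client API; the two in-memory drivers stand in for real storage systems such as a Unix file system and an archive like HPSS.

```python
import itertools
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """One driver per storage system type; the broker only sees this interface."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class UnixFSDriver(StorageDriver):
    """Hypothetical stand-in for a Unix file system driver (kept in memory)."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = bytes(data)


class ArchiveDriver(StorageDriver):
    """Hypothetical stand-in for an archive driver; counts reads served."""

    def __init__(self):
        self._objects = {}
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self._objects[path]

    def write(self, path, data):
        self._objects[path] = bytes(data)


class Broker:
    """Maps logical names to (resource, physical path, attributes)."""

    def __init__(self, drivers):
        self._drivers = drivers          # resource name -> StorageDriver
        self._catalog = {}               # logical name -> (resource, path, attrs)
        self._ids = itertools.count()    # unique physical path suffixes

    def put(self, logical_name, data, resource, **attributes):
        path = f"/{resource}/obj{next(self._ids)}"
        self._drivers[resource].write(path, data)
        self._catalog[logical_name] = (resource, path, attributes)

    def get(self, logical_name):
        # The caller never sees which resource or physical path holds the data.
        resource, path, _ = self._catalog[logical_name]
        return self._drivers[resource].read(path)

    def query(self, **wanted):
        """Attribute-based access: logical names whose attributes match all of `wanted`."""
        return sorted(
            name
            for name, (_, _, attrs) in self._catalog.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )
```

Because callers use only logical names and attributes, a data set could later be moved to another resource by updating its catalog entry, without changing the code that reads it.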

16
SDSC Storage Resource Broker Meta-data Catalog
[Figure: MCAT meta-data categories: Application, Resource, User, Dublin Core, Application Meta-data]
17
SRB Production Sites
  • SRB servers: 18 sites, 24 hosts, 45 resources, 90
    users, 350,000 data sets
  • SDSC - 4 hosts V1.1.4 (HPSS, DB2, Oracle, Illustra,
    UnixFS, C-90 Unicos)
  • U. Maryland V1.1.4 (HPSS, UnixFS)
  • U. Michigan V1.1.3 (ADSM, UnixFS)
  • UIUC(NCSA) V1.1 (Oracle, UnixFS)
  • Rutgers U. V1.1.2 (UnixFS)
  • CalTech V1.1.4 (HPSS, UnixFS)
  • UC Berkeley V1.1.4 (UnixFS)
  • Montana State U V1.1.4 (UnixFS)
  • UCLA V1.1.4 (UnixFS)
  • UCSB V1.1.3 (UnixFS)
  • U Texas, Austin V1.1.3 (DMF, UnixFS)
  • UC Davis V1.1.3 (UnixFS)
  • Washington U, St. Louis V1.1.4 (UnixFS)
  • U Houston V1.1.3 (UnixFS)
  • UCSC V1.1.4 (Oracle, UnixFS)
  • UCSD - 2 hosts V1.1.4 (UnixFS)
  • LBL V1.1.3 (UnixFS)
  • LLNL V1.1.3 (DB2, UnixFS)

18
Time Transparency
  • How to minimize access time
  • Prefetch data to local high-performance disk, so
    that all accesses can be done at high speed from
    local resources
  • How to maximize data delivery
  • Composite or aggregate data into a single data
    set to avoid multiple accesses
  • Stream data at high rates using parallel I/O,
    amortizing the access latency over the volume of
    data delivered
  • How to avoid congestion
  • Replicate data across multiple servers
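A back-of-the-envelope model shows why aggregation and streaming help: if every access costs a fixed latency L plus size divided by bandwidth B, then N separate accesses of s bytes cost N(L + s/B), while one aggregated access costs L + Ns/B. The Python sketch below encodes that simple model together with a naive replica choice. The cost model, the latency and bandwidth figures, the server labels, and the least-loaded selection rule are all illustrative assumptions, not measurements or SRB behavior; parallel I/O is modelled only as a higher aggregate bandwidth.

```python
def access_time(latency_s, size_bytes, bandwidth_bps):
    """Seconds for one access: fixed latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bps


def separate_accesses(n, latency_s, size_bytes, bandwidth_bps):
    """n independent accesses each pay the latency."""
    return n * access_time(latency_s, size_bytes, bandwidth_bps)


def aggregated_access(n, latency_s, size_bytes, bandwidth_bps):
    """The same n objects composited into one data set pay the latency once."""
    return access_time(latency_s, n * size_bytes, bandwidth_bps)


def effective_rate(latency_s, size_bytes, bandwidth_bps):
    """Delivered bytes per second; approaches bandwidth_bps as size grows."""
    return size_bytes / access_time(latency_s, size_bytes, bandwidth_bps)


def pick_replica(replicas):
    """Choose the replica with the fewest queued requests, then the lowest latency.

    replicas: iterable of (server, latency_s, queued_requests) tuples.
    """
    return min(replicas, key=lambda r: (r[2], r[1]))[0]


if __name__ == "__main__":
    # Hypothetical figures: 1 s archive latency, 1 MB objects, 10 MB/s stream.
    L, s, B, n = 1.0, 1e6, 1e7, 1000
    print("separate  :", separate_accesses(n, L, s, B), "s")
    print("aggregated:", aggregated_access(n, L, s, B), "s")
    # Four parallel streams modelled as four times the bandwidth.
    print("parallel  :", aggregated_access(n, L, s, 4 * B), "s")
    print("replica   :", pick_replica([("sdsc", 0.02, 5), ("umd", 0.08, 1)]))
```

Under these assumed figures the model gives about 1100 s for a thousand separate accesses, about 101 s once the objects are aggregated, and about 26 s with four parallel streams, which is the latency amortization the slide describes.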

19
Integrating Cache and Collections (Collection
Controlled Data)
[Figure: stack with layers Application; Data Model Management; GASS local data cache, ADR compositing cache, DPSS network data cache; SRB Collection Access; Database Collection, Archive Collection, File System Collection]
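One way to read this cache/collection stack is as a tiered lookup: try the fastest cache first, fall back to slower tiers, and finally to the collection that controls the master copy, copying data into faster tiers on the way back. The Python sketch below assumes two LRU tiers named after the GASS local data cache and the DPSS network data cache; it does not model ADR's compositing, and none of its names are the real GASS, DPSS, or SRB interfaces.

```python
from collections import OrderedDict


class CacheTier:
    """A fixed-capacity LRU cache standing in for one caching layer (hypothetical)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss (stored values are never None)."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key):
        return key in self._items


def read(key, tiers, collection):
    """Look up key tier by tier (fastest first); return (value, where_found)."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for faster in tiers[:i]:         # promote into the faster tiers
                faster.put(key, value)
            return value, tier.name
    value = collection[key]                  # collection-controlled master copy
    for tier in tiers:
        tier.put(key, value)
    return value, "collection"
```

Keeping the collection as the authority means a cache miss at every tier still resolves to the controlled copy, while repeated reads are served from the nearest tier.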
20
Grid Security Infrastructure (GSI)
  • Inter-realm certificate support
  • X.509 certificate support
  • Support for Kerberos, DCE access
  • Secure communication
  • SSL
  • SDSC LibAID - Authentication and Integrity of
    Data
  • Simplified interface library to GSS-API
  • Authentication through two calls
  • Provided in release 1.1.5 of Storage Resource
    Broker

21
DICE Roadmap
[Figure: roadmap plotted against time (1999 to 2002) toward the goal "Advancement of Scientific Discovery". Capability labels: Info Discovery, Interactive Browsing, Metacomputing Services, Data Handling. Technology levels: Files, DB, RDBMS, UNIX, Internet; Distributed Data Resources (SRB/MCAT); Data Collections (XMAS/BBQ); Information Discovery (SDLIP); Distributed Collections (MIX/ICE); Digital Libraries/InterLib (CDL FindingAids, CDL Distributed Query)]
22
Roadmap - Goals
  • Application of digital library technology to
    scientific data collections
  • Creation of data collections (NS, AMICO, ESS)
  • Support for education through CDL
  • Common information structure model across
    presentation, collection, digital objects
  • Application of MIX to construct user interfaces,
    define structure of data collection, and
    structure of objects
  • Common information discovery interface

23
Roadmap - Information Management Hierarchy
  • Presentation / Information Discovery
  • Collaboration/Visualization - ICE
  • Visualization - Shastra, 3D visualization tools
  • Information model - MIX using XML DTDs
  • Collection organization
  • Meta-data catalog - MCAT
  • Information model - XML DTD and database DDL
  • Data handling
  • Storage Resource Broker - SRB
  • Storage
  • Archival storage system - HPSS
  • Digital object model - XML DTD

24
Roadmap - Integration with Metacomputing
  • Common security infrastructure - GSI
  • Integration of SRB with GSI - FY99
  • Interoperable certificate authorities (NCSA,
    NPACI, CDL) - FY99
  • Interoperable data access systems
  • Integration of SRB with GASS - FY99
  • Integration of SRB with Legion - FY00
  • Remote procedure execution
  • Naming, discovery, and application of procedures
    - FY00
  • Linkage of procedures - FY00
  • Application of XML DTD for object definition

25
Roadmap - DICE
  • Digital Library
  • Archive support at SDSC via SRB - ADL, ELIB, CDL
    (FY99)
  • User Interfaces / Electronic notebooks - UCB,
    UCSB, UCSD (FY00)
  • Data collection support
  • AMICO (FY99) - support educational access to
    images
  • NS (FY00) - develop data dictionary, schema
  • Digital Sky (FY00) - develop XML DTDs for
    structure and access, store 2-20 TB of digital
    sky images
  • ESS (FY00) - integration of HPSS archives (U
    Md, SDSC)

26
Management of Scientific Data
[Figure: technology timeline, 1998 to 2002. Layers: Presentation, Collection, Data Handling, Archive. Systems and milestones shown: DX, ICE, AVS, Notebook, GIS wrapper, XMAS, XML structure, MIX; CDL, ADL, ELIB, AMICO, PDB, NS, Publication API, Spatial query, ADEPT, Extensible Schema, MCAT; Containers, GSI, GASS interface, ADR, Parallel I/O, Remote Proc., Globus directory, Info. Discovery API, SRB, Distributed Nameserver; HPSS, MPI interface, GPFS interface]
27
Coordination of Digital Library and Metacomputing
Environments
  • Grid Forum
  • Common implementation practice defined by working
    groups
  • Data Access Working Group - Chair Reagan Moore
  • Security Working Group - Co-chair Andrew Grimshaw
  • NPACI Database Workshop
  • Integration of PTE and DICE data handling systems
  • NPACI Storage Resource Broker Workshop
  • Integration of data collections and archival
    storage

28
PTE / DICE Data Handling Integration
XML DTD for data set description
29
SRB Containers - Managing Archive Latency
SRB client
  • Create container in a logical storage resource
    containing at least one cacheable resource
  • Create objects in containers
  • Cache daemon will move filled containers to
    archive
  • synch and purge APIs

[Figure: SRB server with distributed storage resources (UNIX, HPSS, HPSS); labels: container, cached containers]
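The container workflow above (create a container on a logical resource that includes a cache, create objects in it, let a daemon move filled containers to the archive, then sync and purge) can be sketched in Python as follows. The `LogicalResource` class, its methods, `cache_daemon_pass`, and the fill threshold are hypothetical illustrations of the idea, not SRB's container API; the point is that the archive is written once per container rather than once per small object.

```python
class LogicalResource:
    """Hypothetical logical resource pairing a cache disk with an archive."""

    def __init__(self):
        self.cache = {}     # container -> bytearray held on cache disk
        self.archive = {}   # container -> bytes held in archival storage
        self.index = {}     # container -> {object name: (offset, length)}
        self.limits = {}    # container -> maximum size in bytes

    def create_container(self, name, max_bytes):
        self.cache[name] = bytearray()
        self.index[name] = {}
        self.limits[name] = max_bytes

    def _stage(self, name):
        """Bring a purged container back from the archive into the cache."""
        if name not in self.cache:
            self.cache[name] = bytearray(self.archive[name])
        return self.cache[name]

    def put_object(self, container, obj_name, data):
        buf = self._stage(container)
        if len(buf) + len(data) > self.limits[container]:
            raise ValueError(f"container {container!r} is full")
        self.index[container][obj_name] = (len(buf), len(data))
        buf.extend(data)

    def get_object(self, container, obj_name):
        buf = self._stage(container)
        offset, length = self.index[container][obj_name]
        return bytes(buf[offset:offset + length])

    def sync(self, container):
        """Copy the cached container to the archive as one large transfer."""
        self.archive[container] = bytes(self.cache[container])

    def purge(self, container):
        """Free cache space, refusing if the archive copy is stale."""
        if self.archive.get(container) != bytes(self.cache[container]):
            raise RuntimeError(f"sync {container!r} before purging it")
        del self.cache[container]


def cache_daemon_pass(resource, fill_fraction=0.9):
    """One pass of a hypothetical daemon: archive containers that are nearly full."""
    moved = []
    for name, buf in resource.cache.items():
        full = len(buf) >= fill_fraction * resource.limits[name]
        if full and resource.archive.get(name) != bytes(buf):
            resource.sync(name)
            moved.append(name)
    return moved
```

Reading an object from a purged container stages the whole container back to cache disk, so later reads of its other objects avoid further archive latency.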
30
NPACI Collaborations
  • NASA - Information Power Grid
  • Promote integration of Globus and SRB
    authentication
  • DOE ASCI Data Visualization Corridor
  • Promote use of XML DTDs for scientific data
  • NARA - Persistent Archive
  • Collection based data management
  • DOE NGI - Particle Physics Data Grid
  • Replication of data across multiple servers
  • NSF DLI-II - InterLib
  • Interoperable services between digital libraries
  • California Digital Library - AMICO
  • Educational access to image collections

31
Education and Outreach
  • California Digital Library - AMICO
  • Educational access to image collections
  • 1.5 TB of images
  • Tunable interfaces for students, educators,
    researchers
  • Digital Insight - U Wisconsin
  • Provide access to class videos archived at SDSC
  • 10-20 TB of videos
  • NARA - Historical Collections
  • Mediate information between local collections and
    NARA collections

32
For More Information
  • http://www.npaci.edu/DICE/