Data Collection Management within the NPACI Toolkit

1
Data Collection Management within the NPACI
Toolkit
  • Mary Thomas
  • Texas Advanced Computing Center
  • The University of Texas at Austin
  • and
  • NPACI
  • Presented at the NPACI AHM 2003, San Diego, CA
  • NPACI AHM 2003, Parallel Session 4 - Collection
    Management

2
Abstract
  • The GridPort Toolkit is a simple portal
    developer's API that accesses a large set of grid
    services and software, including the Globus
    Toolkit, the NWS, GridFTP, and the Storage
    Resource Broker. GridPort portals include
    Telescience, several NBCR portals, and the
    "Cosmic Web Portal," a portal system developed
    for astrophysicist Mike Norman and his research
    community. We will discuss the approach currently
    used by GridPort, the unique solutions applied to
    the Cosmic Web Portal, and our future plans.

3
Outline
  • Motivation
  • GridPort Architecture
  • Data Management Tools
  • GridFTP
  • SRB
  • GridPort-based Data Portals Examples

4
Portals Provide Simple Interfaces
  • Portals are web based, which has advantages:
  • Users know and understand the web
  • Can serve as a layer in the middle-tier
    infrastructure of the Grid
  • Integrate various Grid services and resources
  • Users can be isolated from resource specific
    details
  • Single web interface isolates system
    changes/differences
  • Not an end-all solution - several
    issues/challenges here
  • Performance, scalability
  • Tradeoffs

5
Simple Computational Grid
LSF
  • Resource View
  • Full Functionality
  • Very Complex

6
Conceptual Web Grid

7
GridPort Architecture

8
Services Provided by Grid Computing Portals
  • Grid-Specific
  • Security
  • Account and allocation management
  • Information, discovery and monitoring
  • Resource scheduling and management
  • Data and collection management
  • Application support
  • Portal specific
  • view customization, user session management, and
    portal logging.
  • Groups, roles, sharing, access control
  • collaboration and communication systems -
    chat/instant messaging services, whiteboards,
    calendars, newsgroups, citation browsers
  • Ubiquitous access: browsers, command line, cell phone, PDA.

9
GridPort SRB
  • With SRB capabilities, file access is direct,
    virtualized
  • Single SRB account access allows for more
    flexible data management

10
GridPort 2.0
  • Part of NPACKage
  • Perl/CGI
  • Easy to install
  • Dynamic
  • Multiportal architecture
  • Account data
  • manage certs/keys, session info for users
  • Grid: Globus, SRB, NWS, etc.
  • Thin client

11
Recommended Technologies
  • Collection Management
  • GridFTP
  • SDSC Storage Resource Broker (SRB)
  • for file collection management
  • High speed network interfaces
  • Info. Services
  • Globus MDS 2.2, GIIS, GRIS
  • NWS, data from LSF, United Devices, etc.
  • Web service based GIS archival system Grid-IAS
  • Custom information provider scripts
  • Grid Monitoring System (Java enhanced version of
    NCSA)
  • Portals
  • Java, Jetspeed, portlets, CC
  • Web services (in addition to grid)
  • Database back end
  • XML
  • Grid
  • OGSI/OGSA Globus 3.0
  • NPACKage
  • Globus GT 2.x (NMI R1, R2; also earlier versions)
  • Security
  • GSI is the key enabling technology
  • Grid Security Infrastructure
  • MyProxy for remote proxies
  • Job Execution
  • Globus GRAM Gatekeeper (key)
  • used to run batch, interactive jobs and tasks on
    remote resources
  • Scheduler: Platform Computing (LSF,
    Multi-cluster)
  • Integration with SGE, AVAKI, others (Texas grid)
  • Queuing systems: PBS, LSF, etc.

12
PACI HotPage
  • Access portal to all resources
  • Information Portal to all users
  • Secure access for authorized users
  • PACI Grid Software used
  • Globus Toolkit (GRAM, GSI, GRIS, GIIS), SRB,
    MyProxy, NWS
  • Built with the GridPort Toolkit
  • GP 2.0 Perl/CGI
  • Services provided
  • Resource information/status
  • job control
  • data collection management,
  • command execution
  • personalization

13
Path Forward for GridPort
  • OGSA -> huge impact
  • Software packages compliance
  • GT 3.0 integration
  • NPACKage
  • Continue with GridFTP, SRB integration, others
  • NMI Releases
  • Emerging Portal Technologies -> standards
  • GCE Component Portal Architecture and repository
  • Portlets/Jetspeed
  • GridPort 3.0 Toolkit
  • New architecture based on grid services and
    workflow
  • Open source/team approach
  • PyGridPort = GridPort + LBL PyGlobus (DOE)

14
Web Services
  • Architecture mechanisms for
  • dynamic service discovery (UDDI)
  • Separation of implementation from function (WSDL)
  • Known protocols (SOAP/HTTP, SOAP/RPC)
  • Service provider encapsulates implementation
    details
  • Client doesn't need details, just where/how to
    send request
  • Commercial world developing P2P web services
  • In some ways, Globus/GRAM is a web service
  • Advantage: language independent, so can run on
    any system
  • Community pursuing Python, Java, C at this time
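
The point that a client only needs to know where and how to send a request can be illustrated with a minimal sketch that builds a SOAP 1.1 request envelope using the Python standard library. The service namespace, the `submitJob` operation, and its parameters are hypothetical names chosen for illustration; they are not part of GridPort or Globus.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, assumed for this example only.
SERVICE_NS = "urn:example:job-service"


def build_soap_request(operation, params):
    """Wrap an operation name and its string parameters in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


# The client states what it wants; how the provider runs it stays hidden.
request = build_soap_request(
    "submitJob", {"host": "example-host", "command": "/bin/date"}
)
parsed = ET.fromstring(request)
print(request.decode("utf-8"))
```

In practice the operation names and message shapes would come from the provider's WSDL description, and the envelope would be sent over HTTP to the service endpoint.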

15
Open Grid Services Architecture
  • IBM and Globus team integrated key concepts of
    Grid and web
  • Taking Grid community to next level: services
    are interoperable
  • Protocol based rather than implementation based
  • Protocol examples: telnet, ftp, ssh
  • telnet
  • Login
  • password
  • Grid
  • Security (PKI, GSI)
  • Persistence: stateless web is gone; track task,
    user info, etc.
  • Handles to instances
  • Web
  • HTTP transport layer
  • Simple Object Access Protocol (SOAP)
  • XML
  • Web Services Description Language (WSDL)

16
OGSA Component Approach Workflow
  • Grid Web services components
  • Standard interface
  • Dynamic composition, transfer, exchange of data

17
Jetspeed and Portlets
  • New direction for grid computing portal community
    based on Apache and open source
  • Uses Java plug-in software behind web servers
  • Builds dynamic web pages based on client request
  • Executes set of components (Java Portlets)
  • Composites them into a web page
  • Returns page to user
  • Portlets exchanged by sharing code
  • WSDL will be employed
  • NCSA has developed a GridFTP portlet; the GridPort
    team is developing an SRB portlet
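
The request cycle described above (execute a set of portlets, composite their output into a page, return the page) can be sketched roughly as follows. This is an illustration in Python, not the Jetspeed API, which is Java based; the portlet functions and request fields are hypothetical.

```python
from html import escape


def proxy_status_portlet(request):
    """Hypothetical portlet: returns an HTML fragment about the user's proxy."""
    user = escape(request.get("user", "anonymous"))
    return f"<p>Proxy certificate status for {user}</p>"


def file_list_portlet(request):
    """Hypothetical portlet: returns an HTML fragment for a remote directory."""
    path = escape(request.get("path", "/"))
    return f"<p>Remote files under {path}</p>"


def render_page(portlets, request):
    """Run each portlet on the request and composite the fragments into a page."""
    fragments = [f'<div class="portlet">{p(request)}</div>' for p in portlets]
    return "<html><body>" + "".join(fragments) + "</body></html>"


# A user's chosen portlet list stands in for per-user customization.
page = render_page(
    [proxy_status_portlet, file_list_portlet],
    {"user": "alice", "path": "/home/alice"},
)
print(page)
```

Because each portlet is an independent unit with the same calling convention, installing someone else's portlet amounts to adding one more entry to the list.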

18
Jetspeed Advantages
  • Overall portal customization
  • Java portlets: small pieces of code that perform tasks.
  • Can install someone else's portlets
  • Individual user customization
  • This will fulfill a need for users to tailor
    their portal interface to their liking.
  • Open Source
  • Always being debugged, re-released.
  • One downside of Open Source is that documentation
    is limited. But, tight user/developer community
    provides some assistance.
  • Template interfaces such as Velocity and JSP
    allow the presentation layer to be separated from
    the Java program layer.

19
Portlet-based Tools and Technology
  • Provided Capability
  • Management of user proxy certificates
  • Remote file management via GridFTP
  • Collaboration tools - news/message systems
  • Event/Logging service
  • Access to OGSA services
  • Specialized Application Factories
  • Access to directory services and Metadata tools

See http://www.extreme.indiana.edu/xportlets

20
Jetspeed and GridPort 2.0
  • Path forward allows adoption of new portal
    technologies while supporting production NPACI
    infrastructure
  • Only minor modifications made to GridPort
  • Perl modules - authentication
  • pass Jetspeed session data - set GridPort cookies
  • Current Progress
  • GridPort Login/Logout
  • Globus Run

21
GridPort 3.0 GCE Portal
  • Expanded CE Layer (thin client, GCE Shell,
    Portals, Portlets, Apps, etc.)
  • Distributed grid and web services (OGSA)
  • NPACKage compliant
  • Workflow interaction between components
  • Component Approach
  • need OOP capability -> Java
  • Python, PHP/Perl
  • XML, database at core

22
GridPort-Based Portals

23
GridPort Data Intensive Portals
  • Cosmic Web Portal (PI Mike Norman, UCSD)
  • Astrophysics Code ENZO
  • Example of large collection browser
  • Telescience (PI Mark Ellisman, UCSD)
  • https://gridport.npaci.edu/Telescience
  • Example of complex data
  • Real time data acquisition system visualization
  • Requires high bandwidth and metadata cataloguing
  • Biomedical Infrastructure Research Network (BIRN,
    PI Peter Arzberger, UCSD)

24
Cosmic Data Portal (PI Mike Norman, UCSD)
  • Astrophysics code ENZO
  • Generates TBs of data per run
  • Blue Horizon
  • Data output is on the order of TBytes per run
  • Output dumps occur about 35 min apart
  • about 14 GB/dump
  • about 75 dumps/run
  • Data Portal to view
  • Enzo data collections
  • Secure access
  • Store user searches for future apps (viz,
    compute)
  • Portal developer Cathie Mills (SDSC)
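
The per-run figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 1000 GB) and one dump at the end of each 35-minute interval; both are assumptions, since the slide gives only approximate values.

```python
# Approximate figures quoted on the slide.
DUMP_SIZE_GB = 14
DUMPS_PER_RUN = 75
MINUTES_BETWEEN_DUMPS = 35

# Total output per run: 14 GB x 75 dumps.
total_gb = DUMP_SIZE_GB * DUMPS_PER_RUN
total_tb = total_gb / 1000  # assumes 1 TB = 1000 GB

# Wall-clock span of the dumps, assuming one dump per 35-minute interval.
run_hours = DUMPS_PER_RUN * MINUTES_BETWEEN_DUMPS / 60

print(f"Output per run: {total_gb} GB (about {total_tb:.2f} TB)")  # 1050 GB, 1.05 TB
print(f"Dump span per run: {run_hours:.2f} hours")  # 43.75 hours
```

This agrees with the "TBytes per run" estimate: roughly 1 TB written over a little under two days.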

25
Cosmic SRB web browser
  • Virtualized Views
  • Top: view is by logical location rather than
    physical location
  • Bottom: view is by attributes rather than
    location
  • Future plans include Enzo job submission and using
    SRB to migrate files during data runs
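
The two virtualized views can be illustrated with a toy catalog that separates logical names from physical storage. This is a sketch of the idea only, not the SRB API; the entry layout, paths, storage names, and attribute keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    logical_path: str        # name the user sees
    physical_location: str   # where the bytes live (hidden from the user)
    attributes: dict = field(default_factory=dict)


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    CatalogEntry("/cosmic/run01/dump0001", "archive-a:/vol3/x91f",
                 {"code": "ENZO", "dump": 1}),
    CatalogEntry("/cosmic/run01/dump0002", "disk-b:/scratch/77c2",
                 {"code": "ENZO", "dump": 2}),
    CatalogEntry("/cosmic/run02/dump0001", "archive-a:/vol7/a001",
                 {"code": "ENZO", "dump": 1}),
]


def view_by_collection(catalog, collection):
    """Logical view: list entries under a collection, ignoring physical storage."""
    prefix = collection.rstrip("/") + "/"
    return [e.logical_path for e in catalog if e.logical_path.startswith(prefix)]


def view_by_attributes(catalog, **wanted):
    """Attribute view: list entries whose metadata matches every given key/value."""
    return [
        e.logical_path
        for e in catalog
        if all(e.attributes.get(k) == v for k, v in wanted.items())
    ]


print(view_by_collection(CATALOG, "/cosmic/run01"))
print(view_by_attributes(CATALOG, dump=1))
```

In both views the physical location never appears in the result, so files can be migrated between storage systems without changing what the user browses.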

27
Telescience for Advanced Tomography Applications
(PI Mark Ellisman, UCSD)
  • Example of complex data
  • Real time data acquisition system visualization
  • Requires high bandwidth and metadata cataloguing
  • NPACI Alpha Project: create a set of tools
  • Remote control of UCSD high energy microscope
  • Computation of 3-D structures: electron
    tomographic volumes
  • Deposit results into a database forming a library
    of computerized cell-level brain structures
  • Tomography (a layperson's view)
  • High energy (400 keV) electron microscope is used
    to scan a physical sample, sent by user to
    facility
  • Series of projections are taken under user
    control
  • Specimen successively tilted in small angular
    increments
  • Data analyzed/reconstructed to form 3-D image

28
Telescience Data Portal
  • Access to high performance and long term storage
    facilities across computational domains with a
    point and click interface
  • NPACI (SRB collections located at SDSC)
  • NASA/IPG (SRB collections)
  • others
  • Seamless integration with SRB (Storage Resource
    Broker)
  • High speed access to data utilizing advanced
    networks such as Internet2.
  • NREN and Abilene networks
  • > 60 Mbits/sec data transfer rate
  • Portal (in production)
  • On-line certificate generation (@NPACI)
  • On-line SRB collection creation and mgmt
  • Compute (O2K, Globus)
  • Integrates existing tools
  • GridPort provides grid access

29
Telescience Access to Instruments/Data

30
Data Performance
  • Telescience Portal couples NASA/IPG and
    SDSC/UCSD, NPACI resources
  • Mass storage (SRB) for collections
  • Compute (O2K, Globus)
  • NREN and Abilene networks
  • Tests ran successfully on September 24, 2001
  • > 60 Mbits/sec data transfer rate
  • Portal in production
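
To put the quoted rate in perspective, the sketch below converts a dataset size into transfer time at 60 Mbits/sec. The 1 GB dataset is a hypothetical size chosen for illustration, and decimal units are assumed (1 Mbit = 1,000,000 bits). Since the slide reports a rate above 60 Mbits/sec, the result is an upper bound on the time.

```python
def transfer_seconds(size_bytes, rate_mbits_per_sec):
    """Time to move size_bytes at a sustained rate given in megabits per second."""
    bits = size_bytes * 8
    return bits / (rate_mbits_per_sec * 1_000_000)


# Hypothetical 1 GB dataset at the quoted lower-bound rate.
seconds = transfer_seconds(1_000_000_000, 60)
print(f"1 GB at 60 Mbits/sec: about {seconds / 60:.1f} minutes")
```

The same function can be reused for other dataset sizes or measured rates.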

31
BIRN Portal
  • Production Portal
  • GridPort 2.2/NPACKage
  • Extends Telescience Architecture
  • Uses GridFTP and SRB
  • Uses GridPort to Integrate Telescience
    technologies with the Grid
  • Access to instruments
  • Globus job control
  • SRB data collections

32
Future Directions
  • Portals Workshop on Friday (open to all)
  • Goal is to bring NPACI Portal developers and users
    together
  • GridPort 2.2 is part of NPACKage, compliant with NMI
    program (NSF)
  • Perl has no GSI security capabilities, so moving
    away from it
  • Developing Jetspeed/Portlet solutions for
    GridPort
  • Planning a pyGlobus-based version: pyGridPort
  • Collaborating with U. Mich, Indiana, Argonne,
    NCSA to develop grid portlet repository
  • Developing GridPort GCE
  • OGSA, Java/portlets, GCEShell interfaces

33
GridPort Project Team
  • GridPort Project represents collaboration efforts
    spanning the PACI Program
  • Mary Thomas, Jay Boisseau, Maytal Dahan, Eric
    Roberts, Tomislav Urban (TACC)
  • Cathie Mills, Steve Mock, Kurt Mueller (SDSC)
  • Charles Severance, Joseph Hardin (U. Mich)
  • Dennis Gannon, Geoffrey Fox, Marlon Pierce
    (Indiana)
  • Argonne/ISI Globus development team
  • And input from other Institutions/Projects
  • NASA/IPG, GGF/GCE Research Group
  • NBCR, Telescience, etc.

34
References
  • Related AHM Sessions
  • Tutorial 7: SRB
  • Tutorial 9: Grid Portals
  • Parallel Session 2 (Weds): Grid Experiences
  • Workshop on Portals (Friday)
  • GridPort Toolkit contact: Mary Thomas
    (mthomas@tacc.utexas.edu)
  • Project website: http://gridport.npaci.edu
  • Download: http://gridport.npaci.edu/download