1
Campus & State Grids in Texas
  • Jay Boisseau
  • Texas Advanced Computing Center
  • The University of Texas at Austin
  • June 23, 2005

2
TACC Grid Deployment Projects
  • TACC is involved in grids at five scales
  • Campus: UT Grid
  • State: Texas Internet Grid for Research &
    Education (new)
  • Regional: SURA Grid (planning phases)
  • National: TeraGrid
  • International: Open Science Grid (just joining)

3
TACC Grid Technology Projects
  • GridPort grid portal toolkit
  • Also building grid portals, of course
  • GPIR
  • Web services-based grid resource info system
  • GridShell
  • Shell environment for managing jobs and data on
    grids
  • MyCluster
  • Virtualizing grid resources for local clusters
  • Scheduling Prediction Services
  • Providing estimates for queue waits, execution
    times, data transfer time

4
UT GRID
5
UT Grid: Develop and Provide a Unique,
Comprehensive Cyberinfrastructure
  • The strategy of the UT Grid project is to
    integrate
  • common security/authentication
  • scheduling and provisioning
  • aggregation and coordination
  • diverse campus resources
  • computational (PCs, servers, clusters)
  • storage (Local HDs, NASes, SANs, archives)
  • visualization (PCs, workstations, displays,
    projection rooms)
  • data collections (sci/eng, social sciences,
    communications, etc.)
  • instruments & sensors (CT scanners, telescopes,
    etc.)
  • from personal scale to terascale
  • personal laptops and desktops
  • department servers and labs
  • institutional (and national) high-end facilities

6
That Provides Maximum Opportunity & Capability
for Impact in Research & Education
  • into a campus cyberinfrastructure
  • evaluate existing grid computing technologies
  • develop new grid technologies
  • deploy and support appropriate technologies for
    production use
  • continue evaluation, R&D on new technologies
  • share expertise, experiences, software &
    techniques
  • that provides simple access to all resources
  • through web portals
  • from personal desktop/laptop PCs, via custom CLIs
    and GUIs
  • to the entire community for maximum impact on
  • computational research in applications domains
  • educational programs
  • grid computing R&D

7
Add Services Incrementally, Driven By User
Requirements
8
Hub & Spoke Approach
  • Deploying a P2P campus grid requires overcoming two
    trust issues
  • trusting grid software for reliability, security, and
    performance
  • trusting each other not to abuse one's own resources
  • An advanced computing center presents an opportunity to
    build a centrally managed grid as a step toward a P2P grid
  • already has trust relationships with users
  • so, when facing both issues, install grid
    software centrally first
  • create centrally managed services
  • create spokes from central hub
  • then, when grid software is trusted
  • show usage and capability data to demonstrate
    opportunity
  • show policies and procedures to ensure fairness
  • negotiate spokes among willing participants

9
UT Grid Logical View
  • Integrate a set of resources (clusters, storage
    systems, etc.) within TACC first

(Diagram: TACC compute, vis, storage, and data resources,
actually spread across two campuses)
10
UT Grid Logical View
  • Next add other UT resources using same tools
    and procedures

(Diagram: TACC hub with spokes to the ACES cluster, data,
and PCs)
11
UT Grid Logical View
  • Next add other UT resources using same tools
    and procedures

(Diagram: TACC hub with spokes to ACES and GEO clusters,
data, and PCs)
12
UT Grid Logical View
  • Next add other UT resources using same tools
    and procedures

(Diagram: TACC hub with spokes to ACES, GEO, PGE, and BIO
resources: clusters, data, PCs, and instruments)
13
UT Grid Logical View
  • Finally negotiate connections between spokes for
    willing participants to develop a P2P grid.

(Diagram: the same hub-and-spoke grid with peer connections
added among the ACES, GEO, PGE, and BIO spokes)
14
Distributed Serial Computing: Roundup
  • Roundup consists of UT Austin campus desktops and
    servers running the United Devices Grid MP
    software
  • Clients are pooled together to make up a single
    UT Grid resource
  • Resources contributed by several UT
    organizations: TACC, ICES, CoE, ITS, etc.
  • 1500 CPUs available today
  • Integrated into TACC user portal
  • Production usage began April 1
  • Identified future R&D opportunities for/with UD

15
Distributed Serial Computing: Rodeo
  • Rodeo is a set of Condor pools of dedicated and
    non-dedicated resources
  • Dedicated resources
  • Condor Central Manager (collector and negotiator)
  • TACC Condor Pool can flock to CS and ICES pools
    as needed
  • Non-dedicated resources
  • Linux, Windows, and Mac resources are managed by
    Condor (similar to United Devices)
  • Usage policy is configured by the resource owner
    (see the policy sketch after this list), e.g.
  • when there is no other activity
  • when load (utilization) is low
  • give preference to certain group or users
  • 700 CPUs across multiple pools
  • In production since April 1
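The owner policies listed above are typically expressed as ClassAd expressions in a machine's local Condor configuration. The sketch below is a hedged illustration only, assuming the standard KeyboardIdle/LoadAvg expressions from the Condor manual; the thresholds, the preferred user, and the condor_config.local path are assumptions, not the actual Rodeo settings.

    import textwrap

    # Hypothetical owner policy for a non-dedicated Rodeo machine;
    # thresholds, the preferred user, and the file path are
    # illustrative, not the actual Rodeo settings.
    POLICY = textwrap.dedent("""
        # run jobs only when the console has been idle and load is low
        START    = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
        # suspend jobs as soon as the owner is active again
        SUSPEND  = (KeyboardIdle < 60) || (LoadAvg > 1.0)
        CONTINUE = $(START)
        # give preference to jobs from a designated user
        RANK     = (Owner == "geo_user")
    """)

    # append the policy to the machine's local Condor configuration
    with open("/etc/condor/condor_config.local", "a") as config:
        config.write(POLICY)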

16
Distributed Parallel Computing: CSF
  • Community Scheduling Framework (CSF) is an open
    source framework for meta-scheduling
  • Coordinates communications between multiple
    heterogeneous resource managers
  • LSF, GRAM
  • Issues
  • provides metascheduler framework (only)
  • current functionality inadequate
  • tightly coupled with Globus Toolkit
  • requires significant investment in development,
    maintenance, and support

17
Distributed Parallel Computing: CSF
  • UT Grid team
  • ported CSF to Globus Toolkit 3.2, 3.2.1
  • started development of scheduler plug-in
  • contributed development back to the CSF project
  • completed technology evaluation, DeveloperWorks
    article
  • Current status: stop and monitor
  • current functionality inadequate
  • requires significant investment in development,
    maintenance, and support
  • CSF is tightly coupled to the Globus Toolkit, which
    makes it hard to upgrade
  • currently monitoring CSF discussion lists to keep
    track of the project

18
Distributed Parallel Computing: Metascheduler
Evaluation
                             Condor          CSF             Moab
  RM coverage
    LSF                      Y               Y               Y
    PBS                      Y               Y               Y
    SGE                      Y               N               Y
    Condor                   Y               N               N
    LoadLeveler              Y               N               Y
  Grid integration
    GSI                      Y               Y               Y
    GRAM                     Y               Y               Y
    GridFTP                  Y               Y               Y
    Index Service            N               Y               N
  Customizable policies      Y               N               Y
  Availability & licensing   Open source,    Free: Y         Commercial: N
                             free: Y ()
  Administration
    Interface                Y               Command line    Y
    Tools                    Y               N               Y
    Dynamic updates          Y               N               Y
  NMI                        Y               N               N
  Research opportunities     Y               Y               N
  • Results of evaluation
  • CSF
  • development resources required
  • monitor evolution
  • Condor
  • use as resource broker for UT Grid
  • MOAB, LSF MultiCluster
  • too expensive
  • Portable designs to support other metaschedulers
    in future

19
Condor for Distributed Parallel Computing
  • MPI Universe can be used for running parallel
    jobs
  • MPI jobs can be run only on dedicated Condor
    resources
  • Condor does not preempt MPI jobs
  • a single Condor submit description file is required (see
    the sketch after this list)
  • support for staging specified files to each
    compute node (in case shared file system does not
    exist)
  • Alternate method: submit MPI job using Condor-G
  • submit a Globus Universe job to a native resource
    manager (such as LSF)
  • Globus job uses bsub to submit MPI job to LSF
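A minimal sketch of the single submit description file mentioned above, generated and handed to condor_submit from Python. The MPI-universe keywords follow the Condor documentation of that era; the executable name, machine count, and file names are hypothetical.

    import subprocess
    import textwrap

    # Hypothetical MPI-universe submit description file; the
    # executable, machine count, and file names are illustrative.
    SUBMIT = textwrap.dedent("""
        universe      = MPI
        executable    = my_mpi_app
        machine_count = 8
        # stage input explicitly in case no shared file system exists
        should_transfer_files   = YES
        when_to_transfer_output = ON_EXIT
        transfer_input_files    = input.dat
        output = job.out
        error  = job.err
        log    = job.log
        queue
    """)

    with open("mpi_job.submit", "w") as f:
        f.write(SUBMIT)
    subprocess.run(["condor_submit", "mpi_job.submit"], check=True)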

20
Resource Broker Service
  • Resource Broker is a central service of UT Grid
  • advertises capabilities and specifics of
    resources
  • Resource broker components
  • Catalog of resources
  • Resource broker query from GUN, GUP
  • will send query string to catalog service
  • selected resources will be returned, based on the input
    string
  • Users select resources based on some criteria
  • user will get names of qualified resources
    returned
  • scheduling decisions can be based on these
    results
  • example: GridShell job submission (see the sketch after
    this list)
  • In development!
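Since the broker is still in development, the following is only an illustrative sketch of the query flow described above; the catalog entries, attribute names, and the select_resources helper are hypothetical, not part of the actual service.

    # Hypothetical in-memory stand-in for the resource catalog; the
    # real broker would query the catalog service over the network.
    CATALOG = [
        {"name": "tacc-cluster", "arch": "x86_64", "free_cpus": 128},
        {"name": "rodeo-pool",   "arch": "x86",    "free_cpus": 300},
        {"name": "roundup-pool", "arch": "x86",    "free_cpus": 900},
    ]

    def select_resources(min_cpus, arch=None):
        """Return the names of resources matching the user's criteria."""
        return [r["name"] for r in CATALOG
                if r["free_cpus"] >= min_cpus
                and (arch is None or r["arch"] == arch)]

    # a GridShell job submission could then target one of these names
    print(select_resources(min_cpus=64))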

21
Scheduling Logic for Resource Broker
  • Serial jobs can be scheduled dynamically based on
    availability; parallel jobs are usually queued to
    busy systems
  • Ideally, use the same model for parallel as for serial,
    but can't if data transfer times are long, which is
    common for big parallel jobs
  • Can't stage data to all systems due to resource
    limitations
  • Solution: predict the system likely to complete the job
    first
  • estimate data transfer time to each possible
    system
  • estimate queue wait time on each system
  • estimate execution time on each system
  • calculate the minimum over systems of max(t_trans,
    t_queue) + t_exec (see the sketch after this list)
  • Currently working to design, then develop broker
    to include predictions based on these three
    variables
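A minimal sketch of the selection rule above, assuming the three per-system estimates are already available; the system names and numbers are made up, and data staging is assumed to overlap with the queue wait.

    def pick_system(estimates):
        """estimates maps system name -> (t_trans, t_queue, t_exec)
        in seconds.  Staging is assumed to overlap with the queue
        wait, so predicted completion is max(t_trans, t_queue) +
        t_exec; pick the system with the minimum."""
        def completion(system):
            t_trans, t_queue, t_exec = estimates[system]
            return max(t_trans, t_queue) + t_exec
        return min(estimates, key=completion)

    # illustrative numbers only
    print(pick_system({
        "cluster_a": (1200, 3600, 1800),   # slow staging, long queue
        "cluster_b": (300, 5400, 1500),    # fast staging, longer queue
    }))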

22
File Transfer Services
  • GridFTP
  • high-performance, secure, reliable data transfer
    protocol
  • incorporates GSI for enabling secure
    authentication and communication over an open
    network
  • enables third-party transfers between remote
    servers while the client manages the transfer (see the
    sketch after this list)
  • Comprehensive File Transfer Portlet
  • developed multiple file transfer capabilities
  • uses NWS to estimate file transfer times
  • enables monitoring and persistent storage of file
    transfers
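As a concrete illustration of the third-party transfer capability above, the sketch below drives globus-url-copy from the client between two GridFTP servers; the host names and paths are placeholders, and a valid GSI proxy (e.g. from grid-proxy-init) is assumed.

    import subprocess

    # Third-party transfer: the client orchestrates a server-to-server
    # copy, so data flows directly between the two GridFTP servers.
    # Host names and paths below are placeholders.
    src = "gsiftp://cluster-a.example.edu/scratch/user/input.dat"
    dst = "gsiftp://cluster-b.example.edu/scratch/user/input.dat"
    subprocess.run(["globus-url-copy", src, dst], check=True)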

23
Grid Visualization
  • Network bandwidth growth means remote
    visualization is possible
  • bandwidth growing faster than display sizes!
  • now possible to leverage powerful central/remote
    rendering/visualization resources, just like HPC
  • requires s/w tools, demand/reservation
    scheduling, etc.
  • With remote visualization enabled, collaborative
    visualization is possible
  • requires further advances in tools, incl.
    integration of multiple keyboard/mouse inputs
  • Want to enable both for grid visualization
    resources!

24
Grid Visualization
  • Grid rendering is like traditional grid batch
    computing
  • already used by animation studios
  • Grid remote/collaborative visualization is our goal
  • identify rendering resource based on data,
    technique, availability
  • move data to rendering system based on
    reservation or demand
  • calculate geometries and push geometry to the local
    device if bandwidth is not sufficient but local
    graphics hardware is
  • render images and push pixels to the local display if
    bandwidth is sufficient (see the sketch after this
    list)
  • still requires GSI, scheduling (on demand and
    advanced reservation), data management, etc.
  • requires multi-platform clients for remote and
    collaborative vis
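The geometry-versus-pixels decision above amounts to a simple rule; the sketch below is a purely illustrative restatement, not the project's actual scheduling logic.

    def choose_delivery(bandwidth_sufficient, local_graphics_capable):
        """Illustrative restatement of the rule above."""
        if bandwidth_sufficient:
            # enough network capacity: render remotely, push pixels
            return "render remotely, push pixels to local display"
        if local_graphics_capable:
            # limited bandwidth but a capable local GPU: push geometry
            return "compute geometry remotely, render on local device"
        # neither case above applies; fall back to batch-style rendering
        return "no interactive path; use batch grid rendering"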

25
Initial UT Grid Visualization Services
  • Installed Maverick as terascale visualization
    resource with parallelism over commodity graphics
  • TeraBurst V2D hardware for remote visualization
  • high performance
  • multi-tiled displays
  • Sun 3D Server software for remote visualization
  • evaluating versions of 3D Server not based on X
    protocol to increase interactive performance
  • Leveraging NSF TeraGrid activities heavily

26
Deploying Initial Remote Visualization Tools On
Maverick
(Diagram: Maverick running Sun 3D Server software and a
TeraBurst V2D transmitter, connected over Ethernet to a 3D
Server client and a TeraBurst V2D receiver)
27
Data Collections Services
  • TACC is hosting four UT scientific data collections
  • Data were already available/used by researchers
    at low b/w
  • Multiple data sets will be used in Flood Modeling
    SGW
  • Leverages strong relationships with UT
    Geosciences
  • Enables researchers to use these data in high-end
    simulations and analyses, science gateways, etc.

Collection                Initial Size   Projected Growth
NEXRAD Precipitation      200 GB         1-1.5 TB/year
MODIS Satellite Imagery   6 TB           5-6 TB/year
LiDAR Terrain             15 GB          ?
X-Ray CT Scan             1 TB           2 TB/year
28
Data Collections Activities
  • Evaluating issues for using DBs for sci data
  • data schemas
  • extensions for scientific data types (bio, geo)
  • clusters for database I/O performance
  • TeraGrid
  • Leveraging NSF TeraGrid activities heavily
  • Currently analyzing data collection requirements
    in TeraGrid, will utilize in UT Grid

29
Grid User Node (GUN)
  • Campus users have PCs for research & education
    projects and they are used to their local systems
  • Researchers also often need additional resources
  • need to be able to keep doing what they know best
  • Issuing same commands, yet reaching additional
    resources
  • would like access to those resources easily and
    transparently
  • Data available to both local and remote
    resources, etc.
  • The Grid User Node (GUN) concept is designed to
    address these needs by integrating local
    resources into UT Grid
  • removes distinction of local vs. remote resource
  • GUN will probably be adopted in TeraGrid, TIGRE,
    etc.

30
Grid User Node (GUN)
  • Two types of GUNs are available
  • TACC-hosted GUNs (Linux for now; Windows, Mac
    coming)
  • allows easy start and testing of environment
  • hosted GUNs are already fully integrated into UT
    Grid
  • Personal GUN via downloadable GUN software
  • links to downloadable packages, user guides to
    make it easy
  • can then be further customized to suit needs and
    tastes
  • the user's PC is now fully integrated into UT Grid
  • Currently have Linux and Mac versions in
    production; Windows version under discussion.

31
Grid User Node (GUN)
  • Developed GridShell software to enable GUNs
  • GridShell incorporates features to transparently
    execute commands and data transfers across
    computational resources integrated by grid
    computing technologies.
  • Built on top of GSI, GRAM, GridFTP, Condor, LSF
  • GridShell v1.0
  • bash and tcsh
  • Linux and Mac OS X
  • Implementing GridShell for TeraGrid as well as UT
    Grid
  • Already in use by researchers on UT Grid and
    TeraGrid

32
Grid User Node (GUN)
  • GUN already enables
  • information queries about grid resources
  • Roundup and Rodeo job submission
  • monitoring job status
  • reviewing job results
  • resource brokering based on ClassAd catalogs (see the
    sketch after this list)
  • GridFTP enabled GSIFTP
  • On-Demand glide-in of UD resources into Condor
    pool
  • expand, generalize resource broker design
    implementation
  • integrated real-life applications: NAMD,
    SNOOP3D, POV-Ray
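One way the ClassAd-based brokering above can be exercised from a GUN is a constrained condor_status query; the sketch below is a hypothetical example, and the particular constraint is illustrative rather than the broker's actual logic.

    import subprocess

    # Ask the Condor collector for machines matching a ClassAd
    # constraint; the constraint itself is illustrative only.
    constraint = 'OpSys == "LINUX" && Memory >= 1024 && State == "Unclaimed"'
    result = subprocess.run(
        ["condor_status", "-constraint", constraint,
         "-format", "%s\\n", "Name"],
        capture_output=True, text=True, check=True)
    print(result.stdout)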

33
Grid User Portal (GUP)
  • Portals lower the barrier of entry for novice
    users
  • Also provide alternatives to CLI for advanced
    users
  • Enable easy access to multiple resources through
    a single interface
  • Offer simple GUI interface to complex grid
    computing capabilities
  • Can host applications for domain-specific
    scientific research using grid technology
  • Present a Virtual Organization view of the Grid
    as a whole
  • Increase productivity of UT researchers: do more
    science!

34
Grid User Portal (GUP)
  • Developed UT Grid-specific portal based on
    GridPort3 that focused on the following
    functionality
  • View information on resources within UT Grid,
    including status, load, jobs, queues, etc.
  • View network bandwidth and latency between
    systems, aggregate capabilities for all systems.
  • Submit user jobs and run hosted applications
  • Manage files across systems, and move/copy
    multiple files between resources with transfer
    time estimates
  • Browse Data Collections

35
Grid User Portal (GUP)
  • Incorporated UT Grid resources into the current
    production TACC User Portal
  • Developing GUP components (portlets) using GPv3
    JSR-168
  • Portlets will be compatible with other
    JSR-168 frameworks (WebSphere, GridSphere,
    uPortal, etc.)
  • Enables sharing of portlets with other
    communities (IBM, OGCE)
  • Current JSR168 implementations
  • GPIR Browser
  • Comprehensive File Transfer
  • Comprehensive Job Management
  • Data Collection browsing
  • NAMD hosted application (in progress)
  • Leading development of TeraGrid User Portal
  • Driving requirements for GPv4

36
Serial Compute Services Plans
  • Increase client counts to 10,000 through UT
    BevoWare
  • Increase user community through training, docs,
    consulting
  • Simplify usage through hosted applications,
    application portals
  • Add educational content to the UD screen saver
  • Develop Condor glide-in for United Devices
  • Develop UD support for multiple grids, GSI
  • Long term plans/possibilities
  • explore P2P algorithms for traditional sci/eng
    apps
  • integrate each into TIGRE, TeraGrid
  • explore development environments for
    multiplatform client execution
  • ask Texas Exes to support, distribute clients

37
Parallel Compute Services Plans
  • Develop UT Grid Resource Broker for integrating
    clusters with different queuing systems
  • use Condor, Globus Toolkit v4, and our own
    development
  • address TeraGrid requirements, other partner
    requirements
  • use data transfer times, queue wait predictions,
    application execution time predictions
  • provide to TeraGrid, TIGRE, partners
  • Integrate UT Grid Resource Broker into GUP, GUN
  • Install UT clusters as spokes, later convert to
    peer clusters
  • Long term plans/possibilities
  • explore more sophisticated scheduling algorithms
  • explore WS-Agreement based scheduling
  • evaluate and develop workflow tools

38
Storage Services Plans
  • File Services
  • explore network performance issues and their impact on
    GridFTP
  • harden, distribute file transfer portlet
  • integrate comprehensive file transfer services
    into GridShell
  • Grid File Systems
  • GPFS: discuss results with SDSC, set up TACC
    testbed, evaluate ease of deployment, robustness,
    performance
  • GridNFS: track development, invite speaker from
    project to visit TACC
  • select one technology for campus deployment

39
Visualization Services Plans
  • Complete installation of Maverick
  • begin measuring effectiveness, impact of remote
    visualization
  • Complete determination, documentation of campus
    vis tools for personal scale vis and high-end
    vis, provide support
  • Work with IBM to develop Linux remote
    visualization tools
  • continue meeting with Deep graphics team
  • deploy Linux visualization cluster? SMP?
  • Define and develop remote and collaborative
    visualization software in 2Q05 and beyond
  • leverage TeraGrid funding
  • prepare and submit NSF proposals on grid
    visualization

40
Data Collections Plans
  • Continue work with SRB
  • develop better interfaces for collections such as
    query mechanism
  • complete technology evaluations
  • complete hosting of initial four data collections
  • Continue evaluating Avaki for collections
  • will this technology meet our needs for data
    collections?
  • complete technology evaluations
  • Evaluate DB2, SQL Server, Oracle, etc. for
    collection hosting capabilities
  • Solicit UT community for additional data
    collections and user requirements

41
Grid User Portal & Grid User Node
  • Complete and distribute
  • GridShell v1.0 (bash and tcsh)
  • GridPort v4.0
  • GridPort portlets, application portlets
  • Develop and deploy
  • TeraGrid User Portal 1.0
  • TACC User Portal v3.0
  • TACC GUN v1.0
  • Write DeveloperWorks articles on
  • GridShell v1.0 and GridPort v4.0
  • GUN and GUP concepts, value, implementations
  • Evangelize GUN concept to TeraGrid for
    deployment, support
  • Develop GridShell agents for additional grid
    technologies

42
TIGRE
43
About TIGRE
  • High Performance Computing Across Texas (HiPCAT)
    is a consortium of Texas higher ed and medical
    research institutions
  • Texas Internet Grid for Research & Education
    (TIGRE) is a new project of HiPCAT to build a
    state grid for higher ed and med research
  • Lonestar Education And Research Network (LEARN)
    will connect 30 higher ed and med research
    institutions in Texas

44
About TIGRE
  • TIGRE is a $2.5M two-year project for UT Austin,
    Texas Tech, Texas A&M, Rice, and U. Houston to
    deploy a grid
  • Limited-funding, limited-duration project
  • must be lightweight, easy to extend to all Texas
    institutions
  • Must be reliable, easy to support at Texas
    institutions
  • TIGRE needs GRIDS Center!

45
TIGRE Requirements
  • Initial applications communities include
  • atmospheric modeling/environmental issues
  • biomedical research (diverse)
  • petroleum modeling/engineering
  • Technology requirements
  • Sharing data in these domains
  • Aggregating compute resources
  • Maximizing throughput of compute jobs
  • Integration with campus grids
  • Integration with TeraGrid

46
TIGRE Request to GRIDS Center
  • TIGRE/GRIDS design meeting late July
  • Determine minimal complete software stack for
  • setting up grid usage
  • conducting grid accounting
  • deploying user portal
  • enabling data sharing
  • distributing compute jobs
  • Develop aggressive plan, timeline for deployment
  • SC05 as a driver for initial capabilities,
    demonstrations?
  • Summer 06 for initial production with at least
    one app domain?
  • Summer 07 for completion
  • Regular consulting meetings with TIGRE teams
  • Journal entire process, publish jointly as case
    study

47
More About TACC
  • Texas Advanced Computing Center
  • www.tacc.utexas.edu
  • info@tacc.utexas.edu
  • (512) 475-9411