Title: Jay Boisseau boisseau@tacc.utexas.edu
1. TeraGrid: A Terascale Distributed Discovery Environment
- Jay Boisseau
- TeraGrid Executive Steering Committee (ESC) Member
- Director, Texas Advanced Computing Center, The University of Texas at Austin
2. Outline
- What is TeraGrid?
- User Requirements
- TeraGrid Software
- TeraGrid Resources & Support
- Science Gateways
- Summary
3. What is TeraGrid?
4. The TeraGrid Vision
- Integrating the Nation's Most Powerful Resources
  - Provide a unified, general-purpose, reliable set of services and resources.
  - Strategy: an extensible virtual organization of people and resources across TeraGrid partner sites.
- Enabling the Nation's Terascale Science
  - Make science more productive through a unified set of very-high-capability resources.
  - Strategy: leverage TeraGrid's unique resources to create new capabilities, driven and prioritized by science partners.
- Empowering Communities to Leverage TeraGrid Capabilities
  - Bring TG capabilities to the broad science community (no longer just "big science").
  - Strategy: Science Gateways connecting communities; integrated roadmap with peer Grids and software efforts.
5. The TeraGrid Strategy
- Building a distributed system of unprecedented scale
  - 40 teraflops compute
  - 1 petabyte storage
  - 10-40 Gb/s networking
- Creating a unified user environment across heterogeneous resources
  - User software environment, user support resources
  - Created an initial community of over 500 users, 80 PIs
- Integrating new partners to introduce new capabilities
  - Additional computing and visualization capabilities
  - New types of resources: data collections, instruments
- Make it extensible!
6. The TeraGrid Team
- The TeraGrid team has two major components:
- 9 Resource Providers (RPs), who provide resources and expertise
  - Seven universities
  - Two government laboratories
  - Expected to grow
- The Grid Integration Group (GIG), which provides leadership in grid integration among the RPs
  - Led by a Director, assisted by the Executive Steering Committee, Area Directors, and a Project Manager
  - Includes participation by staff at each RP
- Funding now provided for people, not just networks and hardware!
7. Integration: Converging NSF Initiatives
- High-End Capabilities: U.S. Core Centers, TeraGrid
  - Integrating high-end, production-quality supercomputer centers
  - Building tightly coupled, unique large-scale resources
  - STRENGTH: time-critical and/or unique high-end capabilities
- Communities: GriPhyN, iVDGL, LEAD, GEON, NEESGrid
  - ITR and MRI projects integrate science communities
  - Building community-specific capabilities and tools
  - STRENGTH: community integration and tailored capabilities; high-capacity, loosely coupled capabilities
- Common Software Base: NSF/NMI, DOE, NASA programs
  - Projects integrating, packaging, and distributing software and tools from the Grid community
  - Building common middleware components and integrated distributions
  - STRENGTH: large-scale deployment, common software base, assured-quality software components and component sets
8. User Requirements
9. Coherence: A Unified User Environment
- Do I have to learn how to use 9 systems?
  - Coordinated TeraGrid Software and Services (CTSS)
  - Transition toward services and a service-oriented architecture
  - From software stack to software and services
- Do I have to submit proposals for 9 allocations?
  - Unified NRAC for Core and TeraGrid resources; roaming allocations
- Can I use TeraGrid the way I use other Grids?
  - Partnership with the Globus Alliance, NMI GRIDS Center, and other Grids
  - History of collaboration and successful interoperation with other Grids
10. TeraGrid User Survey
- TeraGrid capabilities must be user-driven
- Undertook a needs analysis in Summer 2004
  - 16 science partner teams
  - These may not be widely representative, so the analysis will be repeated every year with an increasing number of groups
- 62 items considered; the top 10 needs are reflected in the TeraGrid roadmap
11. TeraGrid User Input
[Chart: top survey needs in the Data, Grid Computing, and Science Gateways areas, ranked by overall score and number of partners in need:]
- Remote file read/write
- High-performance file transfer
- Coupled applications, co-scheduling
- Grid portal toolkits
- Grid workflow tools
- Batch metascheduling
- Global file system
- Client-side computing tools
- Batch-scheduled parameter sweep tools
- Advanced reservations
12. Some Common Grid Computing Use Cases
- Submitting a large number of individual jobs
  - Requires grid scheduling to multiple systems
  - Requires automated data movement or a common file system
- Running on-demand jobs for time-critical applications (e.g. weather forecasts, medical treatments)
  - Requires preemptive scheduling
  - Requires fault tolerance (checkpoint/recovery)
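The first use case above, fanning many independent jobs out across several systems, can be illustrated with a toy greedy metascheduler. This is only a sketch: real TeraGrid scheduling relied on grid tools such as Condor and Globus, and the site names, slot counts, and runtimes below are made up.

```python
# Toy metascheduler: assign independent jobs to the site slot that
# frees up earliest (longest-processing-time-first heuristic).
# Site names and job runtimes are illustrative, not TeraGrid data.
import heapq

def schedule(jobs, sites):
    """Greedily assign each job to the slot that becomes free soonest.

    jobs  -- list of estimated runtimes (seconds)
    sites -- dict: site name -> number of concurrent slots
    Returns dict: site name -> list of assigned job runtimes.
    """
    # Priority queue of (time this slot becomes free, site name).
    heap = [(0.0, name) for name, slots in sites.items() for _ in range(slots)]
    heapq.heapify(heap)
    assignment = {name: [] for name in sites}
    for runtime in sorted(jobs, reverse=True):  # place longest jobs first
        free_at, name = heapq.heappop(heap)
        assignment[name].append(runtime)
        heapq.heappush(heap, (free_at + runtime, name))
    return assignment

plan = schedule([30, 10, 20, 40], {"NCSA": 1, "SDSC": 1})
print(plan)
```

The same greedy idea underlies batch-scheduled parameter sweeps: the driver only needs runtime estimates and a queue of free slots, not knowledge of each site's local scheduler.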
13. Highest-Priority Items
- Common to many projects that are quite different in their specific usage scenarios:
- Efficient cross-site data management
- Efficient cross-site computing
- Capabilities to customize Science Gateways to the needs of specific user communities
- Simplified management of accounts, allocations, and security credentials across sites
14. Bringing TeraGrid Capabilities to Communities
15. Bringing TeraGrid Capabilities to Communities
A new generation of users will access TeraGrid via Science Gateways, scaling well beyond the traditional user with a shell login account. [Figure: projected user community size for each science gateway project.] The impact on society from gateways that enable decision support is much larger!
16. Exploiting TeraGrid's Unique Capabilities
- Aquaporin mechanism: Water moves through aquaporin channels in single file, oxygen leading the way in. At the most constricted point of the channel, the water molecule flips; protons can't do this. The animation was pointed to by the 2003 Nobel Prize in Chemistry announcement. (Klaus Schulten, UIUC)
- ENZO (astrophysics): Enzo is an adaptive-mesh-refinement, grid-based hybrid code designed for simulations of cosmological structure formation. (Mike Norman, UCSD)
- Reservoir modeling: Given an (unproduced) oil field's permeability and other material properties (based on geostatistical models) and the locations of a few producer/injector wells, where is the best place for a third injector? Goal: fully automatic methods of injector well placement optimization. (J. Saltz, OSU)
- GAFEM (groundwater modeling): GAFEM is a parallel code, developed at North Carolina State University, for the solution of large-scale groundwater inverse problems.
17. Exploiting TeraGrid's Unique Capabilities: Flood Modeling
Merry Maisel (TACC), Gordon Wells (UT)
18. Exploiting TeraGrid's Unique Capabilities: Flood Modeling
- Flood modeling needs more than traditional batch-scheduled HPC systems!
  - Precipitation data, groundwater data, terrain data
  - Rapid large-scale data visualization
  - On-demand scheduling
  - Ensemble scheduling
  - Real-time visualization of simulations
  - Computational steering of possible remedies
  - Simplified access to results via web portals for field agents, decision makers, etc.
- TeraGrid adds the necessary data and visualization systems, portals, and grid services
19. Harnessing TeraGrid for Education
Example: nanoHUB is used to complete coursework by undergraduate and graduate students in dozens of courses at 10 universities.
20. User Input Determines the TeraGrid Roadmap
- Top priorities are reflected in the Grid Capabilities and Software Integration roadmap. First targets:
- User-defined reservations
- Resource matching and wait-time estimation
- Grid interfaces for on-demand and reserved access
- Parallel/striped data movers
- Co-scheduling service defined for high-performance data transfers
- Dedicated GridFTP transfer nodes available to production users
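The parallel/striped data-mover target can be sketched as a toy multi-stream copy: split a file into regions and move each region on its own thread, analogous to (but far simpler than) the multiple parallel streams a GridFTP transfer uses across a wide-area network. The file names and stream count below are illustrative.

```python
# Toy "striped" data mover: copy a file in parallel chunks using
# threads and local files. Real striped movers split transfers
# across network streams and hosts; this only shows the chunking.
import os
import threading

def striped_copy(src, dst, streams=4):
    size = os.path.getsize(src)
    chunk = (size + streams - 1) // streams
    # Pre-size the destination so each thread can write its own region.
    with open(dst, "wb") as f:
        f.truncate(size)

    def mover(offset, length):
        with open(src, "rb") as fin, open(dst, "r+b") as fout:
            fin.seek(offset)
            fout.seek(offset)
            fout.write(fin.read(length))

    threads = [threading.Thread(target=mover,
                                args=(i * chunk, min(chunk, size - i * chunk)))
               for i in range(streams) if i * chunk < size]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Demo: round-trip a small random file through the striped copier.
with open("demo.bin", "wb") as f:
    f.write(os.urandom(1 << 16))
striped_copy("demo.bin", "demo_copy.bin", streams=4)
```

For disk-to-disk copies the threads buy little, but over a long, fat network path each stream keeps its own TCP window full, which is why striping helps wide-area transfers.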
21. TeraGrid Roadmap Defined 5 Years Out
23. Working Groups and Requirements Analysis Teams (RATs)
- Working Groups
- Applications
- Data
- External Relations
- Grid
- Interoperability
- Networks
- Operations
- Performance Evaluation
- Portals
- Security
- Software
- Test Harness and Information Services (THIS)
- User Services
- Visualization
- RATs
- Science Gateways
- Security
- Advanced Application Support
- User Portal
- CTSS Evolution
- Data Transport Tools
- Job Scheduling Tools
- TeraGrid Network
24. TeraGrid Software
25. Software Strategy
- Identify existing solutions; develop solutions only as needed
- Some solutions are frameworks
  - We need to tailor software to our goals
  - Information services / site interfaces
- Some solutions do not exist
  - Software function verification
    - INCA project: scripted implementation of the docs
  - Global account / accounting management
    - AMIE
  - Data movers
  - Etc.
- Deploy, integrate, harden, and support!
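The software-function-verification idea (the role INCA plays above) can be sketched as a toy checker that compares what a site reports against the expected common software stack. The package names and version numbers below are invented for illustration; this is not INCA's actual interface.

```python
# Toy stack verifier: diff a site's reported software against the
# expected common stack. All names/versions here are made up.
EXPECTED = {"globus": "4.0", "gridftp": "1.2", "mpich": "1.2.7"}

def verify(site_report, expected=EXPECTED):
    """Return (missing, mismatched) package-name lists for one site."""
    missing = sorted(p for p in expected if p not in site_report)
    mismatched = sorted(p for p in expected
                        if p in site_report and site_report[p] != expected[p])
    return missing, mismatched

missing, mismatched = verify({"globus": "4.0", "mpich": "1.2.5"})
print(missing, mismatched)  # → ['gridftp'] ['mpich']
```

Run periodically against every RP, a report like this is what lets a virtual organization promise users a common environment on heterogeneous machines.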
26. TeraGrid Software Stack Offerings
- Core software
  - Grid service servers and clients
  - Data management and access tools
  - Authentication services
  - Environment commonality and management
  - Applications: a springboard for workflow and service-oriented work
- Platform-specific software
  - Compilers
  - Binary compatibility opportunities
  - Performance tools
  - Visualization software
- Services
  - Databases
  - Data archives
  - Instruments
27. TeraGrid Software Development
- Consortium of leading project members
- Define primary goals and targets
- Mine helpdesk data
- Review pending software request candidates
- Transition test environments to production
- Eliminate software workarounds
- Implement solutions derived from user surveys
- Deployment testbeds
- Separate environments as well as alternate access points
- Internal staff testing from applications teams
- Initial Beta users
29. Software Roadmap
- Near-term work (in progress)
  - Co-scheduled file transfers
  - Production-level GridFTP resources
  - Metascheduling (grid scheduling)
  - Simple workflow tools
- Future directions
  - On-demand integration with Open Science Grid
  - Grid checkpoint/restart
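The grid checkpoint/restart item can be illustrated with a minimal application-level sketch: periodically persist state so a preempted job resumes from its last checkpoint instead of restarting from scratch. The file name and the "work" loop below are illustrative only; production systems checkpoint at the system or library level.

```python
# Sketch of application-level checkpoint/restart: save progress to a
# file every few steps so a killed job can resume where it left off.
import json
import os

CKPT = "state.json"  # illustrative checkpoint file name

def long_computation(steps, ckpt=CKPT):
    # Resume from the last checkpoint if one exists.
    if os.path.exists(ckpt):
        with open(ckpt) as f:
            state = json.load(f)
    else:
        state = {"i": 0, "total": 0}
    while state["i"] < steps:
        state["total"] += state["i"]      # one unit of "work"
        state["i"] += 1
        if state["i"] % 10 == 0:          # checkpoint every 10 steps
            with open(ckpt, "w") as f:
                json.dump(state, f)
    if os.path.exists(ckpt):
        os.remove(ckpt)                   # clean up after a full run
    return state["total"]

print(long_computation(100))  # sum of 0..99 = 4950
```

With preemptive scheduling for on-demand jobs (slide 12), this pattern is what keeps the preempted batch job from losing its accumulated work.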
30. Grid Roadmap
- Near term
  - User-defined reservations
  - Web services testbeds
  - Resource wait-time estimation (to be used by workflow tools)
  - Striped data movers
  - WAN file system prototypes
- Longer term
  - Integrated tools for workflow scheduling
  - Commercial grid middleware opportunities
31. TeraGrid Resources & Support
32. TeraGrid Resource Partners
33. TeraGrid Resources
Sites (in column order): ANL/UC, Caltech, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC
- Compute resources: Itanium2 (0.5 TF) + IA-32 (0.5 TF); Itanium2 (0.8 TF); Itanium2 (0.2 TF) + IA-32 (2.0 TF); Itanium2 (10 TF) + SGI SMP (6.5 TF); IA-32 (0.3 TF); XT3 (10 TF) + TCS (6 TF) + Marvel (0.3 TF); heterogeneous (1.7 TF); Itanium2 (4.4 TF) + Power4 (1.1 TF); IA-32 (6.3 TF) + Sun (visualization)
- Online storage: 20 TB; 155 TB; 32 TB; 600 TB; 1 TB; 150 TB; 540 TB; 50 TB (8 of 9 sites)
- Mass storage: 1.2 PB; 3 PB; 2.4 PB; 6 PB; 2 PB (5 of 9 sites)
- Data collections: 5 sites
- Visualization: 5 sites
- Instruments: 3 sites
- Network (Gb/s, hub): 30 CHI; 30 LA; 10 CHI; 30 CHI; 10 ATL; 30 CHI; 10 CHI; 30 LA; 10 CHI
Partners will add resources, and TeraGrid will add partners!
34. TeraGrid Usage by NSF Division
[Chart: usage broken down by NSF division: CDA, IBN, CCR, ECS, DMS, BCS, ASC, DMR, MCB, AST, CHE, PHY, CTS. Includes DTF/ETF clusters only.]
35. TeraGrid User Support Strategy
- Proactive and rapid response for general user needs
- Sustained assistance for groundbreaking applications
- GIG coordination, with staffing from all RP sites
- Area Director (AD): Sergiu Sanielevici (PSC)
- Peering with Core Centers' user support teams
36. User Support Team (UST): Trouble Tickets
- Filter TeraGrid Operations Center (TOC) trouble tickets: system issue or possible user issue?
- For each ticket, designate a Point of Contact (POC) to contact the user within 24 hours
  - Communicate status if known
  - Begin a dialog to consult on a solution or workaround
- Designate a Problem Response Squad (PRS) to assist the POC
  - Experts who respond to the POC's postings to the UST list, and/or are requested by the AD
  - All UST members monitor progress reports and contribute their expertise
  - PRS membership may evolve with our understanding of the problem, including support from hardware and software teams
- Ensure all of GIG/RP/Core helps and learns
  - Weekly review of user issues selected by the AD; decide on escalation
  - Inform TG development plans
37. User Support Team (UST): Advanced Support
- For applications/groups judged by TG management to be groundbreaking in exploiting DEEP/WIDE TG infrastructure
- Embedded Point of Contact (labor intensive)
  - Becomes a de facto member of the application group
  - A prior working relationship with the application group is a plus
  - Can write and test code, redesign algorithms, optimize, etc.
  - But no "throwing over the fence"
  - Represents the needs of the application group to systems people, if required
  - Alerts the AD to success stories
38. Science Gateways
39. The Gateway Concept
- The goal and approach
  - Engage advanced scientific communities that are not traditional users of the supercomputing centers
  - We will build Science Gateways providing community-tailored access to TeraGrid services and capabilities
- Science Gateways take two forms:
  - Web-based portals that front-end Grid services providing TeraGrid-deployed applications used by a community
  - Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
40. Grid Portal Gateways
- The portal, accessed through a browser or desktop tools
  - Provides Grid authentication and access to services
  - Provides direct access to TeraGrid-hosted applications as services
- The required support services:
  - Searchable metadata catalogs
  - Information space management
  - Workflow managers
  - Resource brokers
  - Application deployment services
  - Authorization services
- Builds on NSF and DOE software
  - NMI portal framework, GridPort
  - NMI grid tools: Condor, Globus, etc.
  - OSG, HEP tools: Clarens, MonALISA
41. Gateways that Bridge to Community Grids
- Many community grids already exist or are being built
  - NEESGrid, LIGO, Earth System Grid, NVO, Open Science Grid, etc.
- TeraGrid will provide a service framework to enable access in ways that are transparent to their users
  - The community maintains and controls the gateway
- Different communities have different requirements
  - NEES and LEAD will use TeraGrid to provision compute services
  - LIGO and NVO have substantial data distribution problems
  - All of them require remote execution of complex workflows
[Diagram: LEAD-style pipeline — as storms form, streaming observations feed data mining and a forecast model, triggering on-demand grid computing.]
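The "remote execution of complex workflows" requirement can be illustrated with a toy dependency-ordered workflow runner. The task names echo the LEAD-style pipeline sketched on this slide but are purely illustrative; real deployments used dedicated grid workflow tools rather than anything this simple.

```python
# Toy workflow runner: execute a small DAG of tasks in dependency
# order, the core job of grid workflow/orchestration tools.
def run_workflow(tasks, deps):
    """tasks: name -> callable; deps: name -> list of prerequisite names.
    Returns the order in which tasks were executed."""
    done, order = set(), []

    def run(name):
        if name in done:
            return
        for d in deps.get(name, []):  # run prerequisites first
            run(d)
        tasks[name]()                 # then execute the task itself
        done.add(name)
        order.append(name)

    for name in tasks:
        run(name)
    return order

# Illustrative LEAD-like pipeline: ingest -> forecast -> mine -> visualize.
log = []
tasks = {name: (lambda name=name: log.append(name))
         for name in ["ingest", "forecast", "mine", "visualize"]}
deps = {"forecast": ["ingest"], "mine": ["forecast"], "visualize": ["mine"]}
order = run_workflow(tasks, deps)
print(order)
```

In a gateway, each callable would instead submit a remote job and stage its inputs/outputs; the dependency bookkeeping stays the same.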
42. The Architecture of Gateway Services
[Diagram: layered gateway architecture]
- Grid portal server
- TeraGrid gateway services: proxy certificate server/vault, user metadata catalog, application workflow, application deployment, application events, resource broker, replica management, application resource catalogs
- Core grid services: security, notification service, data management service, grid orchestration, resource allocation, accounting service, policy, administration and monitoring, reservations and scheduling
- Web Services Resource Framework / Web Services Notification
- Physical resource layer
43. Flood Modeling Gateway
- University of Texas at Austin
  - TACC
  - Center for Research in Water Resources
  - Center for Space Research
- Oak Ridge National Lab
- Purdue University
Large-scale flooding along Brays Bayou in central Houston, triggered by heavy rainfall during Tropical Storm Allison (June 9, 2001), caused more than $2 billion of damage.
Gordon Wells (UT), David Maidment (UT), Budhu Bhaduri (ORNL), Gilbert Rochon (Purdue)
44. Biomedical and Biology
- Building biomedical communities: Dan Reed (UNC)
- National Evolutionary Synthesis Center
- Carolina Center for Exploratory Genetic Analysis
- Portals and federated databases for the biomedical research community
45. Neutron Science Gateway
- Matching instrument science with TeraGrid
- Focusing on application use cases that can be uniquely served by TeraGrid, e.g. a proposed scenario from the March 2003 SETENS proposal
- Neutron Science TeraGrid Gateway (NSTG): John Cobb, ORNL
46. Summary
47. SURA Opportunities with TeraGrid
- Identify applications in SURA universities
- Leverage TeraGrid technologies in SURA grid activities
- Provide tech transfer back to TeraGrid
- Deploy grids in the SURA region that interoperate with TeraGrid and allow users to scale up to TeraGrid
48. Summary
- TeraGrid is a national cyberinfrastructure partnership for world-class computational research, with many types of resources for knowledge discovery
- TeraGrid aims to integrate with other grids and other researchers around the world
- The All Hands Meeting in April will yield new details on roadmaps, software, capabilities, and opportunities
49. For More Information
- TeraGrid: http://www.teragrid.org
- TACC: http://www.tacc.utexas.edu
- Feel free to contact me directly:
  - Jay Boisseau, boisseau@tacc.utexas.edu
- Note: TACC is about to announce the new International Partnerships for Advanced Computing (IPAC) program, with initial members from Latin America and Spain, which can serve as a gateway into TeraGrid.