Title: Computing on the Grid
1. Computing on the Grid
- Sandra Bittner, ANL <bittner@mcs.anl.gov>
- Sharon Brunett, Caltech <sharon@cacr.caltech.edu>
- Derek Simmel, PSC <dsimmel@psc.edu>
- and many others participating in the TeraGrid Project
2. The TeraGrid Vision: Distributing the resources is better than putting them at one site
- Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
- New hardware, new networks, new software, new practices, new policies
- Expand centers to support cyberinfrastructure
- Distributed, coordinated operations center
- Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
- Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
- Run a single job across the entire TeraGrid
- Move executables between sites
3. TeraGrid Objectives
- Create unprecedented capability
- integrated with extant PACI capabilities
- supporting a new class of scientific research
- Deploy a balanced, distributed system
- not a distributed computer, but rather
- a distributed system using Grid technologies
- computing and data management
- visualization and scientific application analysis
- Define an open and extensible infrastructure
- an enabling cyberinfrastructure for scientific research
- extensible beyond the original sites
- NCSA, SDSC, ANL, Caltech, PSC (under ETF)
- ETF2 awards to TACC, Indiana/Purdue, ORNL
4. Measuring Success
- Breakthrough science via new capabilities
- integrated capabilities more powerful than existing PACI resources
- current PACI users and new communities requiring Grids
- An extensible Grid
- design principles assume heterogeneity and more than four sites
- Grid hierarchy, scalable, replicable, and interoperable
- formally documented design, standard protocols and specifications
- encourage, support, and leverage open source software
- A pathway for current users
- evolutionary paths from current practice
- provide examples, tools, and training to exploit Grid capabilities
- user support, user support, and user support
5. TeraGrid Application Targets
- Multiple classes of user support
- each with differing implementation complexity
- minimal change from current practice
- new models, software, and applications
- Usage exemplars
- traditional supercomputing made simpler
- remote access to data archives and computers
- distributed data archive access and correlation
- remote rendering and visualization
- remote sensor and instrument coupling
6. TeraGrid Components
- Compute hardware
- Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster
- Large-scale storage systems
- hundreds of terabytes for secondary storage
- Very high-speed network backbone
- bandwidth for rich interaction and tight coupling
- Grid middleware
- Globus, data management
- Next-generation applications
7. Wide Variety of Usage Scenarios
- Tightly coupled jobs storing vast amounts of data, performing visualization remotely as well as making data available through online collections (ENZO)
- Thousands of independent jobs using data from a distributed data collection (NVO)
- Applications employing novel latency-hiding algorithms adapting to a changing number of processors (PPM)
- High-throughput applications of loosely coupled jobs (MCell)
8. Prioritization to Ensure Success
- Diagnostic apps to test functionality (ENZO, PPM)
- Flagship apps provide early requirements for software and hardware functionality
- Cactus, ENZO, EOL, Gadu, LSMS, MCell, MM5, Montage, NAMD, NekTar, PPM, Quake, real-time brain mapping
- Plans to approach existing grid communities
- GriPhyN, NEES, BIRN, etc.
9. TeraGrid Roaming
- Apply for a TeraGrid account
- Receive account info, pointers to training, POC for user services and operations, pointers to login resources, and an atlas of TG resources
- Attend a TeraGrid training class or access web-based TG training materials
- Develop and optimize code at Caltech
- Run a large job at SDSC, storing data using SRB
- Run a large job at NCSA, moving data from SRB to local scratch and storing results in SRB
- Run a larger job using both SDSC and PSC systems together, moving data from SRB to local scratch and storing results in SRB
- Move a small output set from SRB to the ANL cluster, do visualization experiments, render a small sample, store results in SRB
- Move a large output data set from SRB to a remote-access storage cache at SDSC, render using ANL hardware, store results in SRB
- (A recompile may be necessary in some cases)
10. Strategy: Define and Build Standard Services
- Finite number of TeraGrid services
- defined as specifications, protocols, APIs
- separate from implementation
- Extending TeraGrid
- adoption of TeraGrid specifications, protocols, APIs
- protocols, data formats, behavior specifications, SLAs
- Engineering and verification
- shared software repository
- build sources, scripts
- each service must be accompanied by a test module
11. If Your Site Wants to Join the TeraGrid
- You must be this high to ride the TeraGrid
- fast network
- non-trivial resources
- meet SLAs (testing and QA requirements)
- become a member of the virtual organization
- capable of TG hosting (peering arrangements)
- TG software environment
- user (download, configure, install, and run TG 1.0)
- developer (join the distributed engineering team)
- TG virtual organization
- operations, user services
- Add new capability
- make the whole greater than the sum of its parts
- repo.teragrid.org
12. TeraGrid Resources and Services
- Compute Resources
- Data Resources and Data Management Services
- Visualization Resources
- Network Resources
- Grid Services
- Grid Scheduling and Reservations
13. Compute Resources Today
[Site diagram: LA and Chicago hubs connected by 4 lambdas, linking the ETF sites. Hardware shown includes Pentium 4 and 2p Madison (Itanium 2) clusters with Myrinet interconnects at Caltech, ANL, NCSA, and SDSC; 96 GeForce4 graphics pipes; a 100 TB DataWulf; 20 TB, 230 TB, and 500 TB FCS SAN storage; a 1.1 TF POWER4 Federation system; and PSC resources.]
Charlie Catlett <catlett@mcs.anl.gov>, Pete Beckman <beckman@mcs.anl.gov>
14. Common Data Services
- Database systems: five systems (5x32 IBM Regatta) acquired at SDSC for DB2 and other related DB apps; Oracle and DB2 clients planned at NCSA
15. The TeraGrid Visualization Strategy
- Combine existing resources and current technology
- commodity clustering and commodity graphics
- Grid technology
- Access Grid collaborative tools
- Efforts, expertise, and tools from each of the ETF sites
- Volume Rendering (SDSC)
- Coupled Visualization (PSC)
- Volume Rendering (Caltech)
- VisBench (NCSA)
- Grid and Visualization Services (ANL)
- Goal: enable new and novel ways of visually interacting with simulations and data
16. Two Types of Loosely Coupled Visualization
- Interactive visualization: the user computationally steers through pre-computed data from a TeraGrid simulation over the TeraGrid network
- Batch visualization: batch jobs (such as movie generation) process simulation output held in short-term and long-term storage
17. On-Demand and Collaborative Visualization
- On-demand visualization: coupling a TeraGrid simulation with interaction
- Collaborative visualization: preprocessing, filtering, and feature detection feeding multi-party viewing and collaboration over the Access Grid (AG), with Voyager recording
18. ETF Network Today
19. TeraGrid Optical Network
[Network diagram: DTF backbone core routers in Los Angeles (818 W. 7th St., CENIC hub) and Chicago (455 N. Cityfront Plaza, Qwest fiber collocation facility), roughly 2200 miles apart, connected by Ciena CoreStream long-haul DWDM operated by Qwest. Cisco long-haul DWDM operated by CENIC, Ciena metro DWDM operated by each site, site border routers, and cluster aggregation switches tie Caltech, SDSC, ANL, NCSA, and PSC local resources and external network connections to the backbone, along with Starlight and additional sites and networks.]
20. ETF Network Expansion
- ETF network segments: blue 4x 10 Gb/s, white 13x 10 Gb/s
21. Grid Services: A Layered Grid Architecture
- Connectivity layer: talking to things; communication (Internet protocols) and security
- Fabric layer: controlling things locally; access to, and control of, resources
22. TeraGrid Runtime Environment
- Single sign-on via a grid id (Globus credential/certificate)
- Assignment of credentials to user proxies
- Mutual user-resource authentication
- Authenticated interprocess communication
- Mapping of grid ids to local ids at each site
23. Common Authentication Service
- Standardized GSI authentication across all TeraGrid systems allows use of the same certificate
- Developed a coordinated certificate acceptance policy
- today accept
- NCSA/Alliance
- SDSC
- PSC
- DOE Science Grid
- Developing procedures and tools to simplify the management of certificates
- grid-mapfile distribution (example entry below)
- simplified certificate request/retrieval
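For illustration (the distinguished name and local username here are hypothetical), a grid-mapfile entry simply maps a certificate's DN to a local account:

    "/C=US/O=National Computational Science Alliance/CN=Jane User" juser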
24. TeraGrid Software Stack
- A social contract with the user
- LORA: Learn Once, Run Anywhere
- Precise definitions
- services
- software
- user environment
- Reproducible
- standard configure, build, and install
- single CVS repository for software
- initial releases for IA-64, IA-32, Power4, Alpha
25. Inca Test Harness
- Example pre-production screenshots
26. Grid Scheduling and Job Management: Condor-G, the User Interface
- Condor-G is the preferred job management interface (see the sample submit description below)
- job scheduling, submission, tracking, etc.
- allows for complex job relationships and data staging issues
- interfaces to Globus layers transparently
- allows you to use your workstation as your interface to the grid
- The ability to determine current system loads and queue status will come in the form of a web interface
- allows for user-driven load balancing across resources
- might look a lot like the PACI HotPage, https://hotpage.paci.org/
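A minimal sketch of a Condor-G submit description (the host name, jobmanager, and file names are illustrative; the globus universe and globusscheduler attributes are standard Condor-G usage):

    universe        = globus
    globusscheduler = some-teragrid-head-node.teragrid.org/jobmanager-pbs
    executable      = a.out
    output          = a.out.stdout
    error           = a.out.stderr
    log             = a.out.log
    queue

Submit it with condor_submit and track it with condor_q, as with any Condor job.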
27. Advanced Reservations
- Users need an automated means to request reservations for specific resources
- Co-scheduling resources
- instruments, detectors
- Multi-site single execution, peer scheduling
- across heterogeneous sites and platforms
- Local scheduling: PBS/Maui, Condor-G
- Manual process today: help@teragrid.org
- A hot topic in research
28. TG Nuts and Bolts
29. Approaches to TeraGrid Use
- Log in interactively to a login node at a TeraGrid site and work from there
- no client software to install/maintain yourself
- execute tasks from your interactive session
- Work from your local workstation and authenticate remotely to TeraGrid resources (sketch below)
- comfort and convenience of working "at home"
- may have to install/maintain additional TG software
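As a sketch of the second approach (host names are illustrative; grid-proxy-init, gsissh, and globus-job-run are standard GSI/Globus client tools):

    grid-proxy-init                                   # create a short-lived proxy from your certificate
    gsissh login.sdsc.teragrid.org                    # GSI-authenticated interactive login
    globus-job-run login.ncsa.teragrid.org /bin/date  # run a one-off remote command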
30. Requesting a TeraGrid Allocation
31. Allocations Policies
- TG resources allocated via the PACI allocations and review process
- modeled after the NSF process
- TG considered as a single resource for grid allocations
- Different levels of review for different-size allocation requests
- DAC: up to 10,000
- PRAC/AAB: <200,000 SUs/year
- NRAC: 200,000 SUs/year and above
- Policies/procedures posted at http://www.paci.org/Allocations.html
- Proposal submission through the PACI On-Line Proposal System (POPS), https://pops-submit.paci.org/
32. User Certificates for TeraGrid
- Why use certificates for authentication?
- Facilitates single sign-on
- enter your pass-phrase only once per session, regardless of how many systems and services you access on the Grid during that session
- one pass-phrase to remember (to protect your private key), instead of one for each system
- Widespread use and acceptance
- certificate-based authentication is standard for modern Web commerce and secure services
33. Certificate-Based Authentication
[Diagram: a client obtains a certificate through a Registration Authority (RA) and a Certificate Authority (CA), and uses it to authenticate to services.]
34. TeraGrid Authentication -> Tasks
[Diagram: an authenticated user's tasks reach HPC, data, and visualization resources, with the RA/CA providing credentials and the GIIS providing resource information.]
35. Integrating Complex Resources
- SRB
- Visualization Resources
- ANL booth demos
- fractal demo during hands-on session
- Real-time equipment
- shake tables
- microscopy
- haptic devices
- Integration work in progress
- More research topics
36. SoftEnv System
- A software package management system that builds user environments from symbolic keys (example ~/.soft file below)
- Replaces traditional UNIX dot files
- Supports community keys
- Programmable, similar to other dot files
- Integrated user environment transfer
- Well suited to software lifecycles
- Offers a unified view of heterogeneous platforms
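A minimal sketch of a ~/.soft file, assuming illustrative key names (the actual keys vary by site); run resoft afterwards to rebuild the environment:

    # ~/.soft -- symbolic keys that SoftEnv expands into PATH, MANPATH, etc.
    @teragrid-basic       # community macro key (illustrative name)
    +intel-compilers      # add the Intel compiler suite (illustrative key)
    +mpich-gm             # add MPICH over Myrinet GM (illustrative key)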
37. TG Users' Data Responsibilities
- Storage lifetimes
- check local policy and TG documentation
- Data transfer: SRB, GridFTP, scp (example transfer below)
- Data restoration services/backups
- vary by site
- Job checkpointing
- responsibility rests with the user
- Email: relay only, no local delivery
- forwarded to the address of registration
- Parallel file systems: GPFS, PVFS
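For example (host and file names are hypothetical), a GridFTP transfer with globus-url-copy takes a source URL and a destination URL:

    globus-url-copy gsiftp://tg-login.sdsc.teragrid.org/scratch/juser/run01.dat \
                    file:///home/juser/run01.dat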
38. Onion Layers of MPI...
- Cross-site MPI (MPICH-G2, PACX, etc.)
- between administrative domains (sites)
- Inter-cluster MPI (VMI)
- within a site, across multiple clusters
- Intra-cluster MPI
- portable MPI (MPICH)
- OS vendor MPI (SGI, Cray, IBM, ...)
- interconnect-vendor MPI (MPICH-GM, Quadrics MPI, ...)
- http://www.paci.org
39. Multi-Site, Single Execution
40. Computing Models
41. TeraGrid Computing Paradigm
- Traditional parallel processing
- Distributed parallel processing
- Pipelined/dataflow processing
42. Traditional Parallel Processing
- Tightly coupled multicomputers meet the traditional needs of large-scale scientific applications
- compute-bound codes
- faster and more CPUs
- memory-hungry codes
- deeper cache, more local memory
- tightly coupled, communication-intensive codes
- high-bandwidth, low-latency interconnect for message passing between tasks
- I/O-bound codes
- large-capacity, high-performance disk subsystems
43. Traditional Parallel Processing: When Have We Hit the Wall?
- Applications can outgrow or be limited by a single parallel computer
- heterogeneity desirable due to application components
- storage, memory, and/or computing demands exceed the resources of a single system
- more robustness desired
- need to integrate remote instruments
44. Traditional Parallel Processing
- Single executable run on a single remote machine
- big assumptions
- runtime necessities (e.g. executables, input files, shared objects) are available on the remote system!
- Log in to a head node and choose a submission mechanism
- Direct, interactive execution
- mpirun -np 16 ./a.out
- Through a batch job manager
- qsub my_script
- where my_script describes the executable location, runtime duration, redirection of stdout/stderr, and the mpirun specification
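A minimal sketch of such a my_script for PBS (the job name, node counts, and walltime are illustrative; adjust to the site's queue policies):

    #!/bin/sh
    #PBS -N my_job                            # job name
    #PBS -l nodes=8:ppn=2,walltime=01:00:00   # 8 nodes x 2 processors per node, 1 hour
    #PBS -o my_job.out                        # redirect stdout
    #PBS -e my_job.err                        # redirect stderr
    cd $PBS_O_WORKDIR                         # run from the directory the job was submitted from
    mpirun -np 16 ./a.out                     # launch the MPI executable

Submit it with qsub my_script and monitor it with qstat.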
45. Traditional Parallel Processing II
- Through Globus
- globusrun -r some-teragrid-head-node.teragrid.org/jobmanager -f my_rsl_script
- where my_rsl_script describes the same details as in the qsub my_script!
- Through Condor-G
- condor_submit my_condor_script
- where my_condor_script describes the same details as the Globus my_rsl_script!
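For comparison, a sketch of what my_rsl_script might contain, using standard GRAM RSL attributes (paths and limits are illustrative; count is the number of processes, jobtype=mpi asks the jobmanager to launch via the local mpirun, and maxWallTime is in minutes):

    &(executable=/home/juser/a.out)
     (count=16)
     (jobtype=mpi)
     (maxWallTime=60)
     (stdout=/home/juser/a.out.stdout)
     (stderr=/home/juser/a.out.stderr)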
46. Distributed Parallel Processing
- Decompose the application over geographically distributed resources
- functional or domain decomposition fits well
- take advantage of load-balancing opportunities
- think about latency impact
- Improved utilization of many resources
- Flexible job management
47. Distributed Parallel Processing II
- Multiple executables run on multiple remote systems
- tools for pushing runtime necessities to remote sites
- Storage Resource Broker, gsiscp, ftp, globus-url-copy (copies files between sites)
- globus-job-submit my_script
- returns an https address (job contact) for monitoring and post-processing control
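A sketch of that workflow with standard Globus client commands (the host name and script are illustrative; the job contact, i.e. the host/jobmanager, comes first, and the executable path is resolved on the remote system):

    globus-job-submit tg-login.ncsa.teragrid.org/jobmanager-pbs ./my_script
    # prints a job contact URL; use it with the commands below
    globus-job-status <job-contact-URL>       # poll the job state
    globus-job-get-output <job-contact-URL>   # retrieve stdout after completion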
48. Distributed Parallel Processing III
- Multi-site runs need co-allocated resources
- VMI-MPICH jobs can run multi-site
- vmirun -np local_cpus -grid_vmi -gnp total_cpus -crm crm_name -key key_value ./a.out
- server/client socket-based data exchanges between sites
- Globus- and Condor-G-based multi-site job submission
- create an appropriate RSL script (sketch below)
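A sketch of such an RSL using the standard multi-request ("+") form understood by globusrun and the co-allocator (host names, jobmanagers, counts, and paths are illustrative):

    +( &(resourceManagerContact="tg-login.sdsc.teragrid.org/jobmanager-pbs")
        (count=32)
        (label="subjob_sdsc")
        (executable=/home/juser/a.out) )
     ( &(resourceManagerContact="tg-login.psc.teragrid.org/jobmanager-pbs")
        (count=32)
        (label="subjob_psc")
        (executable=/home/juser/a.out) )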
49. Pipelined/Dataflow Processing
- Suited to problems that can be divided into a series of sequential tasks, where
- multiple instances of the problem need executing
- a series of data needs processing, with multiple operations on each series
- information from one processing phase can be passed to the next phase before the current phase is complete
50. Pipelined/Dataflow Processing
- Key requirement for efficiency
- fast communication between adjacent processes in a pipeline
- the interconnect on TeraGrid resources meets this need
- Common examples
- frequency filters
- Monte Carlo
51. Pipelined CMS Job Flow
[Diagram: a master Condor job on a Caltech workstation drives a secondary Condor job on a remote pool and reconstruction jobs on a TG or other Linux cluster, with data flowing through a TeraGrid Globus-enabled FTP server.]
- 2) Launch secondary job on a remote pool of nodes; get input files via Globus tools (GASS)
- 3a) 75 Monte Carlo jobs on the remote Condor pool
- 3b) 25 Monte Carlo jobs on remote nodes via Condor
- 4) 100 data files transferred via gsiftp, 1 GB each
- 5) Secondary job reports complete to master
- 6) Master starts reconstruction jobs via the Globus jobmanager on the cluster
- 7) gsiftp fetches data from mass storage
- 8) Processed database stored to mass storage
- 9) Reconstruction job reports complete to master
- Credits: Vladimir Litvin, Caltech; Scott Koranda, NCSA/Univ. of Wisconsin-Milwaukee
52. Academic and Professional Development
53. Help Your Students To
- Master problem decomposition
- Develop a discipline focus
- Be curious and explore
- Identify problems and appropriate solutions
- Experiment in and out of the classroom
- Perform simulations
- Play with Legos and other gadgets
- Find mentors
- Learn to ask the right questions!
54. University Students
- Develop advanced skills
- systematic problem solving and design
- professional discipline
- team building
- Practice skills in a controlled academic environment
- Begin building a professional network
- Field work: exercise skills in the real world
- internships, co-ops, part/full-time jobs
- research collaborations
- study abroad
- Find mentors
55. Professional
- Develop professional relationships, expand your professional network, find mentors
- Participation in standards groups
- Professional societies and continuing education
- Scientific understanding of problems
- Communication skills
- verbal, written, cultural awareness, conflict resolution
- Presentation skills
- Engineering/technical skills
- Time and project management
56. TeraGrid Gallery
57. IBM Itanium Nodes (SDSC, NCSA, ANL, Caltech)
58. IBM Itanium Nodes
59. Myrinet Interconnect High-Speed Network
- Force-10 aggregator
- 128-way Myrinet switch
60. Terascale Computing System
- LeMieux.psc.edu: 6 teraflops, HP/Compaq Tru64 Unix, 3000 Alpha EV68 processors (750 x 4p), 4 GB RAM/node (3 TB total), dual-rail Quadrics interconnect
61. HP GS1280 "Marvel" Systems
- Rachel.psc.edu: 0.4 teraflops, HP/Compaq Tru64 Unix, 128 Alpha EV7 processors, 512 GB shared memory, Quadrics connectivity to LeMieux
62. µMural2 (Argonne)
63. Visualization Example
- This fun parallel application demo computes and renders Mandelbrot fractals
- The fractal demo is provided courtesy of Dan Nurmi, Mathematics and Computer Science Division, Argonne National Laboratory
64. Colliding Spheres Demonstration
- Parallel finite element computation of contact-impact problems
- large deformations
- Frictional contact done in parallel
- orthogonal range queries
- Typical contact enforcement method
- start at the continuum level
- discretize non-smooth differential equations
- Alternative method
- start with discretized contact surfaces
- consider collisions between triangles of the finite element surface mesh
- Collision search can be costly, since surface elements can collide with any other element
- Store the contact surface mesh on all nodes
65. Colliding Spheres Demo
- Back-end simulation running on 10 Itanium 2 TeraGrid nodes at Caltech
- job launched with a simple Globus script
- Visualization running in Iris Explorer on a laptop
- allows the user to change velocities and initial properties of the colliding spheres
66. Resource References
- TeraGrid: http://www.teragrid.org
- Condor-G: http://www.cs.wisc.edu/condorg
- DAGMan: http://www.cs.wisc.edu/dagman
- Globus: http://www.globus.org
- PBS: http://www.openpbs.org
- SRB: http://www.npaci.edu/DICE/SRB
- SoftEnv: http://www.mcs.anl.gov/systems/software
67. Resource References
- MPICH: http://www.mcs.anl.gov/mpi
- MPICH-G2: http://www.niu.edu/mpi
- VMI: http://www.ncsa.uiuc.edu
- PACX-MPI: http://www.hlrs.de/organization/pds/projects/pacx-mpi/