Title: State of the Grid 2002
1State of the Grid 2002
GGF, July 2002
- Dr. Francine Berman
- Director, NPACI and SDSC
- Professor, Department of Computer Science and Engineering, UCSD
2Grid Computing in the News
3Has the Grid been oversold?
4State of the Grid 2002
- How did we get here?
- Short history
- Trends
- Grids Today -- Where are we now?
- Drivers
- Applications
- Technology
- Has the Grid been oversold?
- The Grid: the next 10 years
- New application paradigms
- New devices
- Policy and social dynamics
- New research
5A short history of the Grid in 2 slides
- Science as a team sport
- Grand Challenge Problems of the 80s
- Parallel computation
- First serious study of program coordination
- Gigabit Testbed program
- Focus on applications for the local to wide area
- I-Way at SC 95
- First large-scale grid experiment
- Provided the basis for modern grid infrastructure
efforts
6 1995-2000: Maturation of Grid Computing
- Grid book gave a comprehensive view of the state of the art
- Important infrastructure and middleware efforts initiated
- Globus, Legion, Condor, NWS, SRB, NetSolve, AppLeS, etc.
- 2000: Beginnings of a Global Grid
- Evolution of the Global Grid Forum
- Some projects evolving to de facto standards (e.g. Globus, Condor, NWS)
7Current Trends 1
- Proliferation of resources
- Everyone has computers
- Multiple IP addresses per person
- Increasing Application Complexity
- Multi-scale
- Multi-disciplinary
- Immense amounts of data
Arpanet 1969
Internet 2002
8Current Trends 2
- Coordination/collaboration is default mode of interaction
- The Internet
- Globalization, virtualization
- Open source movement
- At scale, heterogeneity is a fact of life
9Grid Computing Today
DISCOM SinRG APGrid IPG
10It Can Be Done: Real World Distributed Applications
- Walmart Inventory Control
- Satellite technology used to track every item
- Bar code information sent to remote data centers to update inventory database and cash flow estimates
- Satellite networking used to coordinate vast operations
- Inventory adjusted in real time to avoid shortages and predict demand
- Data management, prediction, real-time, wide-area synchronization
11Real World Distributed Applications
- Everquest
- 45 communal world servers (26 high-end PCs per server) supporting 430,000 players
- Real-time interaction, individualized database management, back channel communication between players
- Data management adapted to span both client PC and server to mitigate communication delays
- Game masters interact with players for real-time game management
12Real World Distributed Applications
- SETI@home
- 3.8M users in 226 countries
- 1200 CPU years/day
- 38 TF sustained (Japanese Earth Simulator is 40 TF peak)
- 1.7 ZETTAflop over last 3 years (10^21, beyond peta and exa)
- Highly heterogeneous, >77 different processor types
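As a rough consistency check, the figures on this slide can be converted to common units. The sketch below assumes a 365-day year, reads "TF" as 10^12 floating-point operations per second, and reads "1.7 ZETTAflop" as 1.7 x 10^21 operations in total; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: leap days ignored

total_ops = 1.7e21         # "1.7 ZETTAflop over last 3 years"
years = 3
cpu_years_per_day = 1200   # "1200 CPU years/day"
sustained_flops = 38e12    # "38 TF sustained"

# Average rate implied by the three-year total.
avg_flops = total_ops / (years * DAYS_PER_YEAR * SECONDS_PER_DAY)

# 1200 CPU-years delivered per day is the same as this many CPUs
# kept busy around the clock.
equivalent_cpus = cpu_years_per_day * DAYS_PER_YEAR

# Sustained rate spread evenly over that many CPUs.
flops_per_cpu = sustained_flops / equivalent_cpus

print(f"average over 3 years: {avg_flops / 1e12:.1f} TF")
print(f"equivalent full-time CPUs: {equivalent_cpus:,}")
print(f"per-CPU sustained rate: {flops_per_cpu / 1e6:.0f} MFLOPS")
```

Under these assumptions the three-year average comes out below the quoted sustained figure, which would be expected if participation grew over the period.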
13From Distributed Applications to Grid
applications
- Real world applications demonstrate that it is technically, commercially, and economically viable to deploy robust, large-scale distributed applications
- The Grid should accelerate progress
- Applications currently developed as stand-alone entities
- Availability of Grid services should allow designers to build on existing infrastructure and evolving technologies
- Applications developed for the Grid will likely contribute to community infrastructure, standards, progress
- Stability/performance of multiple applications currently not addressed
- Grid applications will help evolve policies for a scalable Grid
14Unifying the Infrastructure: A Community Grid Model
- Roll your own SW but agree on interfaces, service
architecture, standards
NPACI, TeraGrid Grid Applications
Grid Applications
NPACI Grid Middleware
User-focused and targeted grid middleware,
tools, and services
Common Infrastructure layer (NMI, GGF standards,
OGSA etc.)
Grid Resources
15Common Grid Application Paradigms
- Minimal Communication applications
- Includes embarrassingly parallel apps, parameter sweeps
- Staged/linked applications (do part A then do part B)
- Includes remote instrument applications (get input from instrument at site A, compute/analyze data at site B)
- Access to resources (get stuff from/do something at site A)
- Portals, access mechanisms
16Minimal Communication Applications
- MCell -- Simulation of neuromuscular synaptic transmission
- Uses Monte Carlo diffusion and chemical reaction algorithm in 3D to simulate complex biochemical interactions of molecules
- Molecular environment represented as 3D space in which trajectories of ligands against cell membranes tracked
- Ultimate Goal: A complete molecular model of neuro-transmission at level of entire cell
MCell Animation
17Grid Software Provides the Distribution Mechanism
- MCell is a parameter sweep application
- Parameter Sweeps: class of applications that are structured as multiple instances of an experiment (task) with distinct parameter sets
- APST middleware developed to schedule/deploy Grid parameter sweeps and promote application performance
Shared input files
CLIENT
APST Middleware
experiments
Shared output files
MCell Software agent
Grid of clusters and MPPs
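The parameter sweep structure described on this slide (many independent instances of one task, each with its own parameter set) can be sketched as follows. This is a minimal local illustration, not APST or MCell code: `run_experiment`, `parameter_sweep`, and the parameter names are hypothetical, and a thread pool stands in for the grid of clusters and MPPs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(params):
    """Hypothetical stand-in for one independent simulation task."""
    rate, sites = params
    return {"rate": rate, "sites": sites, "result": rate * sites}


def parameter_sweep(task, grid, workers=4):
    """Run `task` once per parameter set; tasks never communicate."""
    # Cartesian product of all parameter values gives the distinct sets.
    param_sets = list(product(*grid.values()))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with param_sets.
        return list(pool.map(task, param_sets))


grid = {"rate": [0.1, 0.2, 0.5], "sites": [10, 100]}
results = parameter_sweep(run_experiment, grid)
print(len(results), "experiments completed")
```

Because the tasks share nothing but their input files, a scheduler is free to place each one on whatever resource is available, which is what makes this class of application a natural fit for grid middleware.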
18Linking Applications
- Telescience -- Derivation of 3D information about a sample from a series of 2D projections
- Links computation and data management to unique, expensive instrumentation
- Requires advanced visualization tools for segmentation and analysis of the data
- Provides critical database of biological structure info for neuroscientists
3D Model of the Node of Ranvier
19Grid Software used to coordinate resources
PORTALS
SECURITY, RESOURCE MANAGEMENT, DATA
MANAGEMENT SERVICES
20Access to Resources
- NASA Information Power Grid
- Computing resources: ~800 CPU nodes in half a dozen SGI Origin 2000s and several workstation clusters at Ames, Glenn, and Langley, ~200 nodes in a Condor pool
- Storage resources: 50-100 Terabytes of archival information/data storage uniformly and securely accessible from all IPG systems via MCAT/SRB and GSIftp / Gridftp
- Globus provides Grid common services
- Stable and supported operational environment (help desk, consulting, training)
21Grid portals used to provide resource access and
information
- Grid portals provide
- submission, tracking, and management of jobs running on IPG resources
- integration of IPG services with the user desktop environment
- access to persistent user profiles
22Community Grid Model
23Resources and Infrastructure Driver
- TeraGrid will provide in aggregate
- 13.6 trillion calculations per second
- Over 600 trillion bytes of immediately accessible data
- 40 gigabit per second network speed
- TeraGrid will provide a new paradigm for data-oriented computing
- Critical for disaster response, genomics, environmental modeling, ...
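To give a sense of scale for the data and network figures on this slide, the following back-of-the-envelope sketch estimates how long it would take to move the full data holdings once across the network. It is an idealized bound that assumes the full quoted line rate with no protocol overhead or contention; the variable names are illustrative.

```python
BITS_PER_BYTE = 8

data_bytes = 600e12       # "over 600 trillion bytes"
link_bits_per_s = 40e9    # "40 gigabit per second"

# Idealized transfer time: full line rate, no overhead, no contention.
seconds = data_bytes * BITS_PER_BYTE / link_bits_per_s
hours = seconds / 3600

print(f"{seconds:,.0f} s = {hours:.1f} h")
```

Even at the full network rate, a complete pass over the data takes more than a day, which is one way to see why data placement matters for data-oriented computing.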
24TeraGrid
[Figure: TeraGrid site diagram. Sites: Caltech (data collection and analysis applications), ANL (visualization), SDSC (data-oriented computing), NCSA (compute-intensive). Labeled resources: 574p IA-32 Chiba City, 256p HP X-Class, 128p Origin, 128p HP V2500, 92p IA-32, 1024p IA-32, 320p IA-64, 1176p IBM SP Blue Horizon, 1500p Origin, Sun E10K, HR display and VR facilities, HPSS and UniTree storage, Myrinet interconnects.]
25TeraGrid Common Infrastructure Environment
- Linux Operating Environment
- Basic and Core Globus Services
- GSI (Grid Security Infrastructure)
- GSI-enabled SSH and GSIFTP
- GRAM (Grid Resource Allocation Management)
- GridFTP
- Information Service
- Distributed accounting
- MPICH-G2
- Science Portals
- Advanced and Data Services
- Replica Management Tools
- GRAM-2 (GRAM extensions)
- CAS (Community Authorization Service)
- Condor-G (as brokering super scheduler)
- SDSC SRB (Storage Resource Broker)
- APST user middleware, etc.
26Measures of Success
- Use a single node on TeraGrid
- Portals, SW, scheduling should allow access to designated individual resources
- Use as a wide-area cluster computer
- Use multiple designated resources of the same type for a single computation
- Use as a simple grid
- Use multiple resources of different types in a staged or concurrent computation
- Use as a full grid
- Use multiple nodes as an ensemble via advanced SW environment
27Scaling TeraGrid -- ETF
- 4 TeraGrid sites and PSC have just responded to NSF Dear Colleague letter for Extensible Terascale Facility (ETF)
- ETF will contain
- More networking
- More data
- Larger nodes
- Heterogeneity
ETF Team: Fran Berman (SDSC), Charlie Catlett (ANL), Ian Foster (ANL), Paul Messina (Caltech, ANL), Mike Levine (PSC), Dan Reed (NCSA), Ralph Roskies (PSC), Rick Stevens (ANL)
28Heterogeneity
- What does it mean to add heterogeneous nodes to TeraGrid?
- Ensure that basic services supported on all architectures
- GRAM, GridFTP, etc.
- Ensure that core services supported on all architectures
- GIIS, NWS, SRB, MPICH-G, etc.
- Develop mechanism for scheduling between architectures
- sophisticated techniques will require research
- Continually monitor system to ensure SW compatibility
- Ensure that SW mitigates differences
- data formats, byte ordering, etc.
- Deploy consistent user interfaces and portals
- Help support a growing application community
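One common way for software to mitigate byte-ordering differences between architectures is to fix the wire format explicitly rather than rely on each host's native layout. The sketch below uses Python's standard `struct` module for this; the record layout (a 32-bit id followed by a 64-bit float) and the function names are hypothetical and are not taken from any TeraGrid service.

```python
import struct

# Hypothetical record layout: 32-bit unsigned id, then a 64-bit float.
# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the encoding does not depend on the host architecture.
RECORD = struct.Struct("!Id")


def encode(record_id, value):
    """Serialize one record into an architecture-independent byte string."""
    return RECORD.pack(record_id, value)


def decode(payload):
    """Recover (record_id, value) from bytes produced by encode()."""
    return RECORD.unpack(payload)


payload = encode(7, 2.5)
print(payload.hex())
print(decode(payload))
```

A little-endian sender and a big-endian receiver both see the same 12 bytes, so neither needs to know the other's architecture.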
29Has the Grid been oversold?
- The promise of the Grid has not been oversold, but the difficulty of developing the requisite Grid infrastructure has been underestimated.
30The Grid is more than just a development and
integration project
- E.g. TeraGrid was developed as a vision for the future, which needs to be accomplished
- Within a short time frame (3 years)
- Using current and emerging products
- Leveraging current research
- Targeting a current set of cutting edge applications
- There are many questions not addressed by TeraGrid and other projects that must be addressed to develop a usable and useful Grid information infrastructure
31We have barely scratched the surface on
- Program development environments (e.g. compiling for the grid)
- Debugging
- Fault tolerance
- Modeling of dynamic, unpredictable environments
- Grid market economy (allocation, accounting, cost models)
- Extreme heterogeneity (sensors, supercomputers, cell phones, cars, etc.)
32Grids The Next 10 Years
Picture of digital sky
33Ultimate Goal: A useful, usable, stable Grid that is
- High-capacity (rich in resources)
- High-capability (rich in options)
- Persistent (promoting stable infra and knowledgeable workforce)
- Evolutionary (able to adapt to new technologies and uses)
- Usable (accessible, robust, easy-to-use)
- Scalable (growth must be a part of the design)
- Adequately supported (both in funding and commitment)
- Useful, able to support/promote new science
- More cooperative than competitive
34Applications are key to the Grid's success
- Applications will use whatever parts of the infrastructure that can really deliver
- Apps developers are willing to be dedicated and creative but it has to be worth their while
- Goal is for Grid infrastructure to some day be as natural a part of the picture as the OS
- Grid will be considered oversold if the only people who can productively use it are the techies
35New Application Paradigms for the Grid
- Next generation Grid applications
- Adaptive applications (run where you can find resources satisfying criteria X)
- Real-time applications (do something right now)
- Coordinated applications (dynamic programming, branch and bound)
- Poly-applications (choice of resources for different components)
- We still can't throw any application at the grid and have SW determine where and how it will run
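The adaptive pattern of running wherever resources satisfy some criteria X can be sketched as a filter over a snapshot of available resources. Everything here is hypothetical: the `Resource` fields, the resource names, and the specific criteria are made up for illustration and do not correspond to any particular grid information service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    free_cpus: int
    memory_gb: int


def pick_resources(available, criteria):
    """Return resources that currently satisfy `criteria`, most free CPUs first."""
    matches = [r for r in available if criteria(r)]
    return sorted(matches, key=lambda r: r.free_cpus, reverse=True)


# Hypothetical snapshot of what a resource information service might report.
snapshot = [
    Resource("cluster-a", free_cpus=64, memory_gb=128),
    Resource("mpp-b", free_cpus=8, memory_gb=512),
    Resource("workstation-c", free_cpus=2, memory_gb=4),
]


def criteria_x(r):
    """Example "criteria X": at least 8 free CPUs and 64 GB of memory."""
    return r.free_cpus >= 8 and r.memory_gb >= 64


chosen = pick_resources(snapshot, criteria_x)
print([r.name for r in chosen])
```

An adaptive application would re-run a selection like this as the snapshot changes, rather than binding to a fixed set of machines at launch.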
36Focus on Adaptive Applications
Everyware -- a highly adaptive Grid application
which investigated solutions to the Ramsey Number
Problem
- Everyware [Wolski, SC98] ran on
- Berkeley NOW
- Convex Exemplar
- Cray T3E
- HPVM/NT Supercluster
- IBM SP2
- Intel X86
- SGI
- Sun SPARC
- Tera MTA
- Laptops
- Batch Systems
- Condor
- Globus
- Java
- Legion
- NetSolve
- Unix
- Windows NT
all at the same time
37Everyware adapted to whatever resources were
available
- Application was
- Ubiquitous -- able to run everywhere
- Resource Aware -- capable of managing heterogeneity
- Adaptive -- able to dynamically tailor its behavior to optimize performance
- NOT embarrassingly parallel -- Branch-and-Bound and Simulated Annealing used
38Focus on Performance: Grid Programming Environments
- Grid-friendly libraries, compilers, schedulers, performance tools
- Program performance through adaptation
- Contract-based grid performance economy
- The GrADS (Grid Application Development Software) Project
- Design and development of a Grid program development and execution environment
39Focus on Data: A Killer App for the Grid
- Over the next decade, data will come from everywhere
- Scientific instruments
- Experiments
- Sensors and sensornets
- New devices (personal digital devices, computer-enabled clothing, cars, ...)
- And be used by everyone
- Scientists
- Consumers
- Educators
- General public
- SW environment will need to support unprecedented diversity, globalization, integration, scale, and use
Data from instruments
Data from simulations
Data from analysis
40From Data to Information to Knowledge
High speed networking
Networked Storage (SAN)
instruments
Storage hardware
sensornets
SDSC Data and Knowledge Systems Program
41Developing a Data Condominium
- Well-defined interfaces for tools, services
- Services to be added, swapped in as they evolve
A Data Condominium: Grid Data Tools/Services Framework
42Next generation Grids will include new
technologies
- New devices
- PDAs, sensors, cars, clothes, smart dust, smart bandaids, ...
- Wired and Wireless
- HPWREN (Hans-Werner Braun, Frank Vernon et al.)
- 45 Mbps between Mount Laguna telescope and SDSU
- Wireless access to Pala, Rincon, La Jolla Indian
Reservations, etc.
43Fiber, Wireless, Compute, Data, Software
UCSD campus Grid: From Sensors to Supercomputers
44Global Information Infrastructure: A Grid of Grids
45The Global Information Infrastructure must cross
technical, political, social boundaries
46Policy and Social Dynamics
- Policy issues must be considered up front
- Social engineering will be at least as important as software engineering
- Well-defined interfaces will be critical for successful software development
- Application communities will need to participate from the beginning
47A New Model of Interaction is Needed
- The Grid is about cooperation
- Process of building and using the Grid predicated on shared resources, agreement, coordination
- Will need to identify/adopt systems that incentivize the individual to contribute to the success of the group
- Examples: highway driving, EOT PACI, single line bank queues
- Cooperation must bridge technological, political, social boundaries
- Will need to settle issues of turf, credit, resources
48We've made a great start but there is much farther to go
- New Research
- Fault tolerance
- Compilers, performance prediction, scheduling
- Agent-based computing
- Location-independence
- Extreme heterogeneity
- Applications which push the envelope
- Applications with dependences
- Adaptivity, poly-algorithms, commercial
applications
- Policy and economics for grid environments
- Sharing as a default mode of interaction
- Trust, policy, negotiation, payment
- Usability and performance
- Programming environments for the Grid, portals
- Adaptivity as the prevalent mode for performance
Drivers Wanted: We should be developing a new generation of scientists, technologists and solutions to address the challenges of a Global Grid Infrastructure
49Thank You