Title: Grid Computing 1
1 Grid Computing 1
- Grid Book, Chapters 1, 2, 3, 22
- "Implementing Distributed Synthetic Forces
Simulations in Metacomputing Environments" - Brunett, Davis, Gottschalk, Messina, Kesselman
- http://www.globus.org
2 Outline
- What is Grid computing?
- Grid computing applications
- Grid computing history
- Issues in Grid Computing
- Condor, Globus, Legion
- The next step
3 What is Grid Computing?
- A Computational Grid is a collection of
distributed, possibly heterogeneous resources
which can be used as an ensemble to execute
large-scale applications
- A Computational Grid is also called a metacomputer
4 Computational Grids
- The term "computational grid" comes from an analogy
with the electric power grid
- Electric power is ubiquitous
- Don't need to know the source (transformer,
generator) of the power or the power company that
serves it
- Analogy falls down in the area of performance
- Ever-present search for cycles in HPC. Two foci
of research
- "In the box" parallel computers -- PetaFLOPS
architectures
- Increasing development of infrastructure and
middleware to leverage the performance potential
of distributed Computational Grids
5 Grid Applications
- Distributed Supercomputing
- Distributed Supercomputing applications couple
multiple computational resources: supercomputers
and/or workstations
- Examples include
- SF Express (large-scale modeling of battle
entities with complex interactive behavior for
distributed interactive simulation)
- Climate Modeling (high resolution, long time
scales, complex models)
6 Distributed Supercomputing Example: SF Express
- SF Express (Synthetic Forces Express): large
scale distributed simulation of behavior and
movement of entities (tanks, trucks, airplanes,
etc.) for interactive battle simulation.
- Entities require information about
- State of terrain
- Location and state of other entities
- Info updated several times a second
- Interest management allows entities to only look
at relevant information, enabling scalability
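The interest-management idea above can be illustrated with a region-based filter. This is only a sketch under stated assumptions, not SF Express's actual design: the square-cell terrain partition, the `InterestManager` class, the entity names, and the cell size are all hypothetical and introduced here for illustration.

```python
from collections import defaultdict


class InterestManager:
    """Delivers a state update only to entities interested in the sender's cell.

    Assumption: terrain is split into square cells, and each entity registers
    interest in every cell that overlaps its sensing square.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.subscribers = defaultdict(set)  # cell -> set of entity ids

    def cell_of(self, x, y):
        # Floor division maps a coordinate to the index of its cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def subscribe(self, entity_id, x, y, radius):
        """Register interest in all cells within `radius` of (x, y)."""
        cx0, cy0 = self.cell_of(x - radius, y - radius)
        cx1, cy1 = self.cell_of(x + radius, y + radius)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.subscribers[(cx, cy)].add(entity_id)

    def recipients(self, sender_id, x, y):
        """Entities that should receive an update originating at (x, y)."""
        interested = self.subscribers.get(self.cell_of(x, y), set())
        return interested - {sender_id}


im = InterestManager(cell_size=10.0)
im.subscribe("tank1", x=5.0, y=5.0, radius=3.0)
im.subscribe("truck7", x=12.0, y=4.0, radius=4.0)
im.subscribe("plane3", x=95.0, y=95.0, radius=2.0)
print(sorted(im.recipients("tank1", 5.0, 5.0)))  # prints ['truck7']
```

Because an update is sent only to the subscribers of one cell, message traffic grows with local entity density rather than with the total entity count, which is the scalability benefit the slide refers to.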
7 SF Express
- Large scale SF Express run goals
- Simulation of 50,000 entities in 8/97, 100,000
entities in 3/98
- Increase fidelity and resolution of simulation
over previous runs
- Improve
- Refresh rate
- Training environment responsiveness
- Number of automatic behaviors
- Ultimately use simulation for real-time planning
as well as training
- Large scale runs extremely resource-intensive
8 SF Express Programming Issues
- How should entities be mapped to computational
resources?
- Entities receive information based on interests
- Communication reduced and localized based on
interest management
- Consistency model for entity information must be
developed
- Which entities can/should be replicated?
- How should updates be performed?
9 SF Express Distributed Application Architecture
- D = data server, I = interest management, R =
router, S = simulation node
10 50,000 entity SF Express Run
- 2 large-scale simulations run on August 11, 1997
11 50,000 entity SF Express Run
- Simulation decomposed terrain (Saudi Arabia,
Kuwait, Iraq) contiguously among supercomputers
- Each supercomputer simulated a specific area and
exchanged interest and state information with
other supercomputers
- All data exchanges were flow-controlled
- Supercomputers fully interconnected, dedicated
for experiment
- Success depended on moderate to significant
system administration, interventions, competent
system support personnel, and numerous phone
calls.
- Subsequent Globus runs focused on improving data,
control management and operational issues for
wide area
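A contiguous terrain decomposition of the kind described above might look like the following. The slides say only that terrain was split "contiguously among supercomputers"; the one-dimensional strip layout, the function names, and the sizes below are assumptions made for illustration.

```python
def decompose_columns(num_columns, num_machines):
    """Split terrain columns 0..num_columns-1 into contiguous, near-equal strips.

    Assumption: terrain is a grid and each machine owns one vertical strip.
    """
    base, extra = divmod(num_columns, num_machines)
    blocks, start = [], 0
    for machine in range(num_machines):
        # The first `extra` machines take one additional column each.
        width = base + (1 if machine < extra else 0)
        blocks.append(range(start, start + width))
        start += width
    return blocks


def owner_of(column, blocks):
    """Return the index of the machine that simulates the given column."""
    for machine, block in enumerate(blocks):
        if column in block:
            return machine
    raise ValueError(f"column {column} is outside the terrain")


blocks = decompose_columns(num_columns=10, num_machines=3)
print([list(b) for b in blocks])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(owner_of(5, blocks))        # 1
```

Keeping each machine's area contiguous means most entity interactions stay inside one machine, and only entities near a strip boundary generate wide-area traffic.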
12 High-Throughput Applications
- Grid used to schedule large numbers of
independent or loosely coupled tasks with the
goal of putting unused cycles to work
- High-throughput applications include RSA
key cracking, SETI@home (detection of
extra-terrestrial intelligence), MCell
13 High-Throughput Applications
- Biggest master/slave parallel program in the
world: the master is a website, the slaves are
individual computers
14 High-Throughput Example - MCell
- MCell: Monte Carlo simulation of cellular
microphysiology. Simulation implemented as
large-scale parameter sweep.
15 MCell
- MCell architecture: simulations performed by
independent processors with distinct parameter
sets and shared input files
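The parameter-sweep structure described above can be sketched as a set of independent tasks, one per parameter combination. This is not MCell code: the `simulate` function is a toy stand-in for a Monte Carlo run, the parameter names are hypothetical, and a local thread pool stands in for the independent processors of a grid.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Toy Monte Carlo task: estimate the fraction of particles that bind."""
    binding_prob, n_particles, seed = params
    rng = random.Random(seed)  # per-task seed keeps runs reproducible
    hits = sum(rng.random() < binding_prob for _ in range(n_particles))
    return params, hits / n_particles


def sweep(binding_probs, particle_counts, seed=0, workers=4):
    """Run one independent task per parameter combination."""
    combos = product(binding_probs, particle_counts)
    tasks = [(p, n, seed + i) for i, (p, n) in enumerate(combos)]
    # On a real grid each task would be shipped to a different machine;
    # the thread pool here only illustrates that tasks share nothing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(simulate, tasks))


results = sweep([0.1, 0.5, 0.9], [1000, 5000])
for params, estimate in sorted(results.items()):
    print(params, round(estimate, 3))
```

Because no task depends on another, the sweep tolerates slow or failed workers: a lost task can simply be resubmitted elsewhere.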
16 MCell Programming Issues
- How should we assign tasks to processors to
optimize locality?
- How can we use partial results during execution
to steer the computation?
- How do we mine all the resulting data from
experiments for results?
- During execution
- After execution
- How can we use all available resources?
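One simple answer to the locality question above is to keep tasks that share an input file on the same processor, so the file is transferred once. The greedy heuristic below is an illustrative assumption, not the scheduler actually used for MCell; the file names and the `assign_tasks` function are hypothetical.

```python
from collections import defaultdict


def assign_tasks(tasks, num_procs):
    """Map tasks to processors, co-locating tasks that share an input file.

    tasks: list of (task_id, input_file) pairs.
    Returns {processor index: [task_id, ...]}.
    """
    groups = defaultdict(list)
    for task_id, input_file in tasks:
        groups[input_file].append(task_id)

    assignment = {p: [] for p in range(num_procs)}
    # Place the largest groups first, each on the least-loaded processor.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _input_file, ids in ordered:
        target = min(assignment, key=lambda p: (len(assignment[p]), p))
        assignment[target].extend(ids)
    return assignment


tasks = [(0, "a.geom"), (1, "b.geom"), (2, "a.geom"),
         (3, "c.geom"), (4, "a.geom"), (5, "b.geom")]
plan = assign_tasks(tasks, num_procs=2)
print(plan)  # {0: [0, 2, 4], 1: [1, 5, 3]}
```

The trade-off is between locality and balance: a very large group can overload one processor, at which point splitting the group and paying for a second file transfer may be the better choice.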
17 Data-Intensive Applications
- Focus is on synthesizing new information from
large amounts of physically distributed data
- Examples include NILE (distributed system for
high energy physics experiments using data from
CLEO), SAR/SRB applications (Grid version of MS
Terraserver), digital library applications
18 Data-Intensive Example - SARA
- SARA: Synthetic Aperture Radar Atlas
- application developed at JPL and SDSC
- Goal: Assemble/process files for user's desired
image
- Radar organized into tracks
- User selects track of interest and properties to
be highlighted
- Raw data is filtered and converted to an image
format
- Image displayed in web browser
19 SARA Application Architecture
- Application structure focused around optimizing
the delivery and processing of distributed data
20 SARA Programming Issues
- Which data server should replicated data be
accessed from?
- Should computation be done at the data server or
data moved to a compute server or something in
between?
- How big are the data files and how often will
they be accessed?
AppLeS/NWS
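The first two questions above can be framed as a comparison of predicted completion times. The sketch below assumes a simple latency-plus-bandwidth cost model and hard-coded performance numbers; in practice a forecasting service such as the Network Weather Service would presumably supply them. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    bandwidth_mb_s: float  # predicted bandwidth to the client, MB/s
    latency_s: float       # predicted connection latency, seconds


def transfer_time(size_mb, server):
    """Predicted seconds to fetch size_mb from the given server."""
    return server.latency_s + size_mb / server.bandwidth_mb_s


def best_replica(size_mb, servers):
    """Pick the replica with the smallest predicted transfer time."""
    return min(servers, key=lambda s: transfer_time(size_mb, s))


def placement(raw_mb, image_mb, bandwidth_mb_s, server_rate_mb_s, client_rate_mb_s):
    """Filter at the data server (ship the small image) or at the client
    (ship the raw data)? Rates are processing throughput in MB/s."""
    at_server = raw_mb / server_rate_mb_s + image_mb / bandwidth_mb_s
    at_client = raw_mb / bandwidth_mb_s + raw_mb / client_rate_mb_s
    if at_server <= at_client:
        return "data server", at_server
    return "client", at_client


replicas = [Server("near", bandwidth_mb_s=2.0, latency_s=0.05),
            Server("far", bandwidth_mb_s=8.0, latency_s=0.30)]
print(best_replica(0.1, replicas).name)    # near: latency dominates small files
print(best_replica(100.0, replicas).name)  # far: bandwidth dominates large files
print(placement(100.0, 5.0, 2.0, 10.0, 50.0)[0])    # data server
print(placement(100.0, 5.0, 100.0, 10.0, 50.0)[0])  # client
```

The third question on the slide (file size and access frequency) enters through `size_mb`: the best replica and the best place to compute can both flip as files grow or as the network forecast changes.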
21 TeleImmersion
- Focus is on use of immersive virtual reality
systems over a network
- Combines generators, data sets and simulations
remote from user's display environment
- Often used to support collaboration
- Examples include
- Interactive scientific visualization ("being
there with the data"), industrial design, art and
entertainment
22 Teleimmersion Example: Combustion System Modeling
- A shared collaborative space
- Link people at multiple locations
- Share and steer scientific simulations on
supercomputer
- Combustion code developed by Lori Freitag at ANL
- Boiler application used to troubleshoot and
design better products
[Figure labels: Chicago, San Diego]
23 Early Experiences with Grid Computing
- Gigabit Testbeds Program
- Late 80s, early 90s, gigabit testbed program
was developed as joint NSF, DARPA, CNRI
(Corporation for National Research Initiatives, Bob Kahn)
initiative
- Goals were to
- investigate potential architecture for a
gigabit/sec network testbed
- explore usefulness for end-users
24 Gigabit Testbeds: Early 90s
- 6 testbeds formed
- CASA (southwest)
- MAGIC (midwest)
- BLANCA (midwest)
- AURORA (northeast)
- NECTAR (northeast)
- VISTANET (southeast)
- Each had a unique blend of research in
applications, networking, and computer
science
25 Gigabit Testbeds
26 Gigabit Testbeds
27 I-WAY
- First large-scale modern Grid experiment
- Put together for SC95 (the Supercomputing
Conference)
- I-WAY consisted of a Grid of 17 sites connected
by vBNS
- Over 60 applications ran on the I-WAY during SC95
28 I-WAY Architecture
- Each I-WAY site served by an I-POP (I-WAY Point
of Presence) used for
- authentication of distributed applications
- distribution of associated libraries and other
software
- monitoring the connectivity of the I-WAY virtual
network
- Users could use single authentication and job
submission across multiple sites or they could
work directly with end-users
- Scheduling done with a human-in-the-loop
29 I-Soft: Software for I-WAY
- Kerberos based authentication
- I-POP initiated rsh to local resources
- AFS for distribution of software and state
- Central scheduler
- Dedicated I-WAY nodes on resource
- Interface to local scheduler
- Nexus based communication libraries
- MPI, CAVEcomm, CC++
- In many ways, the I-WAY experience formed the
foundation of Globus
30 I-WAY Application: Cloud Detection
- Cloud detection from multimodal satellite data
- Want to determine if satellite image is clear,
partially cloudy or completely cloudy
- Used remote supercomputer to enhance instruments
with
- Real-time response
- Enhanced function, accuracy (of pixel image)
- Developed by C. Lee, Aerospace Corporation,
Kesselman, Caltech et al.
31 PACIs
- 2 NSF Supercomputer Centers (PACIs): SDSC/NPACI
and NCSA/Alliance, both committed to Grid
computing
- vBNS backbone between NCSA and SDSC running at
OC-12 with connectivity to over 100 locations at
speeds ranging from 45 Mb/s to 155 Mb/s or more
32 PACI Grid
33 NPACI Grid Activities
- Metasystems Thrust Area: one of the NPACI
technology thrust areas
- Goal is to create an operational metasystem for
NPACI
- Metasystems players
- Globus (Kesselman)
- Legion (Grimshaw)
- AppLeS (Berman and Wolski)
- Network Weather Service (Wolski)
34 Alliance Grid Activities
- Grid Task Force and Distributed Computing team
are Alliance teams
- Globus supported as exclusive grid infrastructure
by Alliance
- Grid concept pervasive throughout Alliance
- Access Grid developed for use by distributed
collaborative groups
- Alliance grid players include Foster (Globus),
Livny (Condor), Stevens (ANL), Reed (Pablo), etc.
35 Other Efforts
- Centurion Cluster: Legion testbed
- Legion cluster housed at UVA
- 128 DEC Alphas (533 MHz)
- 128 dual Pentium II nodes (400 MHz)
- Fast Ethernet and Myrinet
- Globus testbed GUSTO, which supports Globus
infrastructure and application development
- 125 sites in 23 countries as of 2/2000
- Testbed aggregated from partner sites (including
NPACI)
36 GUSTO (Globus) Computational Grid
37 IPG
- IPG: Information Power Grid
- NASA effort in grid computing
- Globus supported as underlying infrastructure
- Application focus includes aerospace design,
environmental and space applications
38 Research and Development Foci for the Grid
- Applications
- Questions revolve around design and development
of Grid-aware applications
- Different programming models: polyalgorithms,
components, mixed languages, etc.
- Program development environment and tools
required for development and execution of
performance-efficient applications
[Figure: Grid layers: Applications, Middleware, Infrastructure, Resources]
39 Research and Development Foci for the Grid
- Middleware
- Questions revolve around the development of tools
and environments which facilitate application
performance
- Software must be able to assess and utilize
dynamic performance characteristics of resources
to support application
- Agent-based computing and resource negotiation
40 Research and Development Foci for the Grid
- Infrastructure
- Development of infrastructure that presents a
"virtual machine" view of the Grid to users
- Questions revolve around providing basic services
to user: security, remote file transfer,
resource management, etc., as well as exposing
performance characteristics.
- Services must be supported by heterogeneous
resources and must interoperate
41 Research and Development Foci for the Grid
- Resources
- Questions revolve around heterogeneity and scale.
- New challenges focus on combining wireless and
wired, static and dynamic, low-power and
high-power, cheap and expensive resources
- Performance characteristics of grid resources
vary dramatically; integrating them to support
performance of individual and multiple
applications is extremely challenging
42 What is the difference between Grid Computing,
Cluster Computing and the Web?
- Cluster computing focuses on platforms consisting
of often homogeneous interconnected nodes in a
single administrative domain.
- Clusters often consist of PCs or workstations
and relatively fast networks
- Cluster components can be shared or dedicated
- Application focus is on cycle-stealing
computations, high-throughput computations,
distributed computations
- Beowulfs are clusters essentially administered
as a single multicomputer
43 The Web and Grids
- Web focuses on platforms consisting of large
number of resources and networks which support
naming services, protocols, search engines, etc.
- Web consists of very diverse set of
computational, storage, communication, and other
resources shared by an immense number of users
- Application focus is on access to information,
electronic commerce, etc.
- Grid focuses on computing on ensembles of
distributed heterogeneous resources
- Typically used as a platform for high performance
computing
- Some grid resources may be shared, others may be
dedicated or reserved
- Application focus is on high-performance,
resource-intensive applications