Title: The GRID Era Vanguard, Miami 23 September 2002
1 The GRID Era, Vanguard, Miami, 23 September 2002
- Gordon Bell, gbell@microsoft.com
- Bay Area Research Center
- Microsoft Corporation
3 Grid Technology
- Background
- Taxonomy
- Grids from seti@home to arbitrary cluster platform
- Grid-type examples and web services
- Summary
4 Bright spots in the evolution from prototypes to early suppliers
- Early efforts
  - UC/Berkeley NOW; U of WI Condor; NASA Beowulf > Airframes
  - Argonne (Foster et al): Grid, Globus Toolkit, Grid Forum
  - Entropia startup (Andrew Chien)
  - Andrew Grimshaw, Avaki: making the Legion vision real. A reality check.
  - United Devices MetaProcessor Platform
  - UK e-Sciences research program: apps-based funding; web-services-based Grid; data orientation
  - Nimrod at Monash University
- Parameter scans and other low-hanging fruit
  - Encapsulate apps! Excel as language/control mgmt.
  - Legacy apps: no time or resources to modify code, independent of age, author, or language (e.g. Java)
- Grid Services: Gray et al, SkyService and TerraService
- Goal: providing a web service must be as easy as publishing and using a web page, and it will occur!!!
5 Grid Taxonomy c2002
- Taxonomy: interesting vs necessity
- Cycle scavenging and object evaluation (e.g. seti@home, QCD)
- File distribution/sharing for IP theft, e.g. Napster
- Databases and programs for a community (astronomy, bioinformatics, CERN, NCAR)
- Workbenches: web workflow (chem, bio)
- Exchanges: many sites operating together
- Single, large objectified pipeline, e.g. NASA
- Grid as a cluster platform! Transparent arbitrary access, including load balancing
  - Homogeneous/heterogeneous computers
  - Fixed or variable network loading
  - Intranet, extranet, internet (many organizations)
- Web SVCs
6 Grids: Ready for prime time.
- Economics: thief, scavenger, power, efficiency, or resource (e.g. programs and database sharing)?
- Embarrassingly parallel apps, e.g. parameter scans: killer apps
- Coupling large, separated apps
- Entry points for web services
- Research funding: that's where the money is.
7 Grid Computing: Concepts, Applications, and Technologies
- Ian Foster
- Mathematics and Computer Science Division, Argonne National Laboratory
- and Department of Computer Science, The University of Chicago
- www.mcs.anl.gov/foster/talks.htm
Grid Computing in Canada Workshop, University of Alberta, May 1, 2002
8 Globus Toolkit
- A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
  - Offer a modular set of orthogonal services
  - Enable incremental development of grid-enabled tools and applications
  - Implement standard Grid protocols and APIs
  - Available under liberal open source license
  - Large community of developers and users
  - Commercial support
9 Globus Toolkit Core Services (work in progress since 1996)
- Small, standards-based set of protocols. Embedded in Open Source Toolkit. Enabling web services and applications.
- Scheduling (Globus Resource Allocation Manager)
  - Low-level scheduler API
- Information (Directory Service): UDDI
  - Uniform access to structure/state information
- Communications (Nexus)
  - Multimethod communication and QoS management
- Security (Globus Security Infrastructure)
  - Single sign-on, key management
- Health and status (Heartbeat monitor)
- Remote file access (Global Access to Secondary Storage)
10 Living in an Exponential World: (1) Computing and Sensors
- Moore's Law: transistor count doubles each 18 months
[Figure: magnetohydrodynamics star-formation simulation]
11 The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Diagram: site resources (HPSS, UniTree) and external network links at the four sites]
- NCSA/PACI: 8 TF, 240 TB
- SDSC: 4.1 TF, 225 TB
- Caltech
- Argonne
NCSA, SDSC, Caltech, Argonne
www.teragrid.org
12 Access Grid
- High-end group work and collaboration technology
- Grid services being used for discovery, configuration, authentication
- O(50) systems deployed worldwide
- Basis for SC2001 SC Global event in November 2001
- www.scglobal.org
- www.accessgrid.org
14 Grids at NASA: Aviation Safety
Wing Models
- Lift capabilities
- Drag capabilities
- Responsiveness
Stabilizer Models
- Deflection capabilities
- Responsiveness
Airframe Models
Crew Capabilities
- accuracy
- perception
- stamina
- reaction times
- SOPs
Engine Models
- Thrust performance
- Reverse thrust performance
- Responsiveness
- Fuel consumption
Landing Gear Models
- Braking performance
- Steering capabilities
- Traction
- Dampening capabilities
15 A Large Virtual Organization: CERN's Large Hadron Collider
- 1800 physicists, 150 institutes, 32 countries
- 100 PB of data by 2010; 50,000 CPUs?
16 Life Sciences: Telemicroscopy
[Diagram: imaging instruments, data acquisition, network, computational resources, processing/analysis, large databases, advanced visualization]
17 Nimrod/G and GriddLeS: Grid Programming with Ease
- David Abramson
- Monash University
- DSTC
18 Building on Legacy Software
- Nimrod
  - Support parametric computation without programming
  - High performance distributed computing
  - Clusters (1994-1997)
  - The Grid (1997- ) (added QoS through Computational Economy)
  - Nimrod/O: optimisation on the Grid
  - Active Sheets: spreadsheet interface
- GriddLeS
  - General Grid applications using legacy software
  - Whole applications as components
  - Using no new primitives in application
19 Parametric Execution
- Study the behaviour of some of the output variables against a range of different input scenarios
- Allows real-time analysis for many applications
- More realistic simulations
- More rigorous science
- More robust engineering
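The parameter-scan pattern that Nimrod automates can be sketched with a local worker pool standing in for grid nodes; `run_model` is a hypothetical placeholder for a legacy simulation, not Nimrod's API.

```python
from itertools import product
from multiprocessing import Pool

def run_model(params):
    """Hypothetical stand-in for one legacy-simulation run."""
    angle, speed = params
    return (angle, speed, angle * speed)  # pretend output metric

if __name__ == "__main__":
    # Cartesian product of input scenarios; a grid scheduler would fan
    # these out to remote nodes instead of local worker processes.
    scenarios = list(product([0, 5, 10], [100, 200]))
    with Pool(4) as pool:
        results = pool.map(run_model, scenarios)
    print(len(results))  # one output record per scenario (6 here)
```

Each scenario is independent, which is exactly why parameter scans are the "embarrassingly parallel killer app" of the earlier slides.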
20 Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray)
- You can FTP 1 MB in 1 sec
- You can FTP 1 GB / min
  - 1 TB: 2 days and $1K
  - 1 PB: 3 years and $1M
- You can GREP 1 GB in a minute
- You can GREP 1 TB in 2 days
- You can GREP 1 PB in 3 years
- 1 PB is ~10,000 >> 1,000 disks
- At some point you need indices to limit search: parallel data search and analysis
- Goal: using databases, make it easy to
  - Publish: record structured data
  - Find data anywhere in the network
  - Get the subset you need!
  - Explore datasets interactively
- Database becomes the file system!!!
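The slide's round numbers follow from simple scaling; a sketch using its nominal 1 GB/min rate (the "2 days"/"3 years" figures on the slide fold in real-world overheads beyond this idealized constant rate):

```python
GB = 1
TB = 1_000 * GB
PB = 1_000_000 * GB

rate = GB / 60.0  # the slide's round number: ~1 GB per minute, in GB/sec

def days(size_gb, rate_gb_per_s):
    """Transfer or scan time in days at a constant rate."""
    return size_gb / rate_gb_per_s / 86_400

print(f"1 TB: {days(TB, rate):.1f} days")         # under a day at the nominal rate
print(f"1 PB: {days(PB, rate) / 365:.1f} years")  # roughly two years
```

The petabyte line is the punchline: at sequential-scan rates the job takes years, which is why the slide concludes you need indices and parallel I/O.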
21 SkyServer: delivering a web service to the astronomy community. Prototype for other sciences? (Gray, Szalay, et al)
- First paper on the SkyServer
  - http://research.microsoft.com/gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
  - http://research.microsoft.com/gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
- Later, more detailed paper for database community
  - http://research.microsoft.com/gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
  - http://research.microsoft.com/gray/Papers/MSR_TR_01_104_SkyServer_V1.doc
22 What can be learned from SkyServer?
- It's about data, not about harvesting flops
- 1-2 hr query programs versus 1 wk programs based on grep
- 10 minute runs versus 3 day compute searches
- Database viewpoint: 100x speed-ups
  - Avoid costly re-computation and searches
  - Use indices and PARALLEL I/O. Read/Write >> 1.
  - Parallelism is automatic, transparent, and just depends on the number of computers/disks
- Limited experience and talent to use databases
23 Sloan Digital Sky Survey Analysis
Size distribution of galaxy clusters?
24 Network concerns
- Very high cost
  - ~$1/GByte to send on the net; FedEx and 160 GByte disk shipments are cheaper
  - DSL at home is $0.15 - $0.30
  - Disks cost less than $2/GByte to purchase
- Low availability of fast links (last mile problem)
  - Labs and universities have DS3 links at most, and they are very expensive
- Traffic: instant messaging, music stealing
- Performance at desktop is poor
  - 1-10 Mbps: very poor communication links
- Manage: trade in fast links for cheap links!!
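The "FedEx is cheaper" claim is simple arithmetic. If the receiving site needs the storage anyway, the disk's purchase price buys both the storage and the transfer, so the marginal transfer cost is just the courier fee (the $50 flat fee here is an assumption, not from the slide):

```python
def transfer_cost_per_gb_network(net_cost_per_gb=1.0):
    """Slide's figure: roughly $1 per GByte sent over the net."""
    return net_cost_per_gb

def transfer_cost_per_gb_shipped(disk_size_gb=160, courier_cost=50.0):
    """If the receiver keeps the disk as storage (which they would buy
    anyway at <$2/GB), the marginal *transfer* cost is the courier fee."""
    return courier_cost / disk_size_gb

print(transfer_cost_per_gb_network())            # $1.00/GB over the wire
print(round(transfer_cost_per_gb_shipped(), 2))  # ~$0.31/GB by courier
```

The gap widens with disk capacity: the courier fee is fixed while the bytes per box keep growing, which is Gray's sneakernet argument in miniature.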
25 For More Information
- www.gridtoday.com: Grid concepts, projects
- www.mcs.anl.gov/foster
- The Globus Project: www.globus.org
- Open Grid Services Arch.: www.globus.org/ogsa
- Global Grid Forum: www.gridforum.org
- GriPhyN project: www.griphyn.org
- Avaki, Entropia, UK eSciences, Condor,
- Grid books in press
Published July 1998
26 The End: Are GRIDs already a real, useful computing structure? When will Grids be ubiquitous?
27 Toward a Framework for Preparing and Executing Adaptive Grid Programs: An Overview of the GrADS Project
- Sponsored by NSF NGS
- Ken Kennedy, Center for High Performance Software, Rice University
- http://www.cs.rice.edu/ken/Presentations/GrADSOverview.pdf
28 GrADS Vision
- Build a National Problem-Solving System on the Grid
  - Transparent to the user, who sees a problem-solving system
- Software support for application development on Grids
- Goal: design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
- Challenges
  - Presenting a high-level application development interface: if programming is hard, the Grid will not reach its potential
  - Designing and constructing applications for adaptability
  - Late mapping of applications to Grid resources
  - Monitoring and control of performance: when should the application be interrupted and remapped?
29 Today: Globus
- Developed by Ian Foster and Carl Kesselman
- Grew from the I-Way (SC-95)
- Basic services for distributed computing
  - Resource discovery and information services
  - User authentication and access control
  - Job initiation
  - Communication services (Nexus and MPI)
- Applications are programmed by hand
  - Many applications
  - User responsible for resource mapping and all communication
  - Existing users acknowledge how hard this is
30 GrADSoft Architecture
[Diagram: Program Preparation System]
31 Configurable Object Program
- Goal: provide the minimum needed to automate resource selection and program launch
- Code
  - Today: MPI program
  - Tomorrow: more general representations
- Mapper
  - Defines required resources and affinities to specialized resources
  - Given a set of resources, maps computation to those resources
  - Optimal performance, given all requirements met
- Performance Model
  - Given a set of resources and a mapping, estimates performance
  - Serves as objective function for Resource Negotiator/Scheduler
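The mapper/performance-model split can be sketched abstractly: the negotiator enumerates candidate resource sets, asks the mapper for a mapping, and uses the model's estimate as its objective. All names and formulas here are illustrative toys, not GrADS APIs.

```python
from itertools import combinations

def performance_model(resources, mapping):
    """Toy estimate: runtime = total work / total speed of chosen nodes."""
    total_speed = sum(speed for _, speed in resources)
    return mapping["work"] / total_speed

def mapper(resources, work=1000.0):
    """Toy mapping: spread the work evenly over the chosen resources."""
    return {"work": work, "per_node": work / len(resources)}

def negotiate(available, k):
    """Pick the k-resource subset minimizing the model's runtime estimate."""
    return min(combinations(available, k),
               key=lambda rs: performance_model(rs, mapper(rs)))

nodes = [("a", 100.0), ("b", 50.0), ("c", 200.0)]
print(negotiate(nodes, 2))  # the two fastest nodes win
```

The point of the architecture is that `performance_model` and `mapper` travel with the application object, so the scheduler can stay generic.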
32 GrADSoft Architecture
[Diagram: Execution Environment]
33 Grid, n. An arbitrary, distributed cluster platform
- A geographical and multi-organizational collection of diverse computers, dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs thrown at it
- Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer the data
- Latency and bandwidth are non-deterministic => a cluster with unknown, dynamic parameters
- Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources
  - Large datasets and I/O-bound programs need to be with their data, or be database accesses
  - But are there resources there to share?
- Costs may vary, depending on organization
34 Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
- Modular, portable framework for parallel, multidimensional simulations
- Construct codes by linking
  - Small core (flesh): mgmt services
  - Selected modules (thorns): numerical methods, grids and domain decomps, visualization and steering, etc.
  - Custom linking/configuration tools
- Developed for astrophysics, but not astrophysics-specific
[Diagram: thorns plugged into the Cactus flesh]
www.cactuscode.org
35 Cactus Example: Terascale Computing
- Solved Einstein's equations for gravitational waves (real code)
- Tightly coupled: communications required through derivatives
- Must communicate 30 MB/step between machines
- Time step takes 1.6 sec
- Used 10 ghost zones along the direction between machines; communicate every 10 steps
- Compression/decompression on all data passed in this direction
- Achieved 70-80% scaling, 200 GF (only 14% scaling without tricks)
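The slide's numbers imply a modest sustained bandwidth but a punishing per-step latency cost; a quick check (the 60 ms wide-area round trip is an assumed figure for illustration, and this sketch is arithmetic only, not the Cactus code):

```python
mb_per_step = 30.0  # boundary data per time step (from the slide)
step_time = 1.6     # seconds per time step (from the slide)

# Sustained bandwidth if boundary data moves every step.
bandwidth_needed = mb_per_step / step_time
print(f"{bandwidth_needed:.1f} MB/s sustained")  # ~18.8 MB/s, ~150 Mbps

# With 10 ghost zones, 10 steps advance between exchanges, so a
# wide-area round trip is paid once per 10 steps instead of every step.
latency = 0.06  # assumed WAN round trip, seconds
print(f"latency overhead per step, naive:   {latency:.3f} s")
print(f"latency overhead per step, ghosted: {latency / 10:.3f} s")
```

Deeper ghost zones trade redundant boundary computation for fewer, larger messages, which is why the scaling jumped from 14% to 70-80% with the tricks.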
36 Grid Projects in eScience
37 Nimrod/G and GriddLeS: Grid Programming with Ease
- David Abramson
- Monash University
- DSTC
38 Distributed computing comes to the rescue...
For each scenario:
- Generate input files
- Copy them to remote node
- Run SMOG model
- Post-process output files
- Copy results back to root
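The by-hand loop on this slide is exactly what Nimrod scripts away. A literal dry-run sketch of it, with hypothetical host and file names, and `scp`/`ssh` standing in for whatever transport a site actually uses:

```python
def staging_plan(name, node="compute1.example.org"):
    """Commands for one by-hand iteration (dry run: builds the list only)."""
    return [
        ["python", "make_input.py", name],      # generate input files
        ["scp", f"{name}.in", f"{node}:"],      # copy them to the remote node
        ["ssh", node, "./smog", f"{name}.in"],  # run the SMOG model
        ["ssh", node, "./post.sh", name],       # post-process output files
        ["scp", f"{node}:{name}.out", "."],     # copy results back to root
    ]

for scenario in ["low_emission", "high_emission"]:  # hypothetical names
    for cmd in staging_plan(scenario):
        print(" ".join(cmd))
```

Multiply this by hundreds of scenarios and a handful of flaky nodes and the next slide's "It's just too hard!" writes itself.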
39 It's just too hard!
- Doing it by hand
  - Nightmare!!
- Programming with (say) MPI
  - Overkill
  - No fault tolerance
  - Codes no longer work as standalone code
- Scientists don't want to know about underlying technologies
40 Building on Legacy Software
- Nimrod
  - Support parametric computation without programming
  - High performance distributed computing
  - Clusters (1994-1997)
  - The Grid (1997- ) (added QoS through Computational Economy)
  - Nimrod/O: optimisation on the Grid
  - Active Sheets: spreadsheet interface
- GriddLeS
  - General Grid applications using legacy software
  - Whole applications as components
  - Using no new primitives in application
41 Parametric Execution
- Study the behaviour of some of the output variables against a range of different input scenarios
- Allows real-time analysis for many applications
- More realistic simulations
- More rigorous science
- More robust engineering
42 In Nimrod, an application doesn't know it has been Grid enabled
[Diagram: input files flow from the root machine, via substitution, to computational nodes; output files flow back]
43 How does a user develop an application using Nimrod?
- Description of parameters: the PLAN FILE
44 GriddLeS
- Significant body of useful applications that are not Grid enabled
- Lessons from Nimrod
  - Users will avoid rewriting applications if possible
  - Applications need to function in the Grid and standalone
  - Users are not experts in parallel/distributed computing
- General Grid computations have much more general interconnections than possible with Nimrod
- Legacy applications are components!
45 GriddLeS
- Specification of the interconnections between components
- Interfaces for discovering resources and mapping the computations to them:
  - Locate data files in the grid and connect the applications to them
  - Schedule computations on the underlying platforms, making sure the network bandwidth is available, and
  - Monitor the progress of the grid computation and reassign work to other parts of the Grid as necessary
46 Today: Condor
- Support for matching application requirements to resources
  - User and resource provider write ClassAd specifications
  - System matches ClassAds for applications with ClassAds for resources
  - Selects the best match based on a user-specified priority
  - Can extend to Grid via Globus (Condor-G)
- What is missing?
  - User must handle application mapping tasks
  - No dynamic resource selection
  - No checkpoint/migration (resource re-selection)
  - Performance matching is straightforward: priorities coded into ClassAds
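ClassAd matchmaking pairs a job ad with machine ads whose requirements are satisfied, then ranks the survivors. A toy of the select-then-rank idea follows; real ClassAds are a declarative expression language evaluated symmetrically (machine requirements also constrain the job), not the Python dicts and lambdas used here.

```python
job_ad = {
    "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 512,
    "rank": lambda m: m["mips"],  # user-specified preference
}

machines = [
    {"name": "m1", "os": "LINUX",   "memory_mb": 256,  "mips": 900},
    {"name": "m2", "os": "LINUX",   "memory_mb": 1024, "mips": 700},
    {"name": "m3", "os": "SOLARIS", "memory_mb": 2048, "mips": 800},
]

def match(job, pool):
    """Keep machines satisfying the job's Requirements; pick the best Rank."""
    ok = [m for m in pool if job["requirements"](m)]
    return max(ok, key=job["rank"]) if ok else None

print(match(job_ad, machines)["name"])  # m2: the only Linux box with enough memory
```

This is exactly static, one-shot selection, which is why the slide's "what is missing" list centers on dynamic re-selection and migration.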