Title: Why we need the Grid - the successor to the WWW
1. Why we need the Grid - the successor to the WWW
- IOP Talk
- Dr Paul Jeffreys
- Particle Physics Department
- CLRC - Rutherford Appleton Laboratory
- p.jeffreys@rl.ac.uk
2. Outline
- e-Science and the Grid
- The Grid Vision
- What are Grids ?
- Grid vs web
- Challenges
- Applications
- Technical
- Example Projects
- LHC Requirements
- DataGrid
- Conclusions
3. e-Science and the Grid
4. e-Science and the Grid
- e-Science
- means science increasingly done through distributed global collaborations enabled by the Internet, using very large data collections, terascale computing resources and high performance visualisation
- Grid ...
- the word Grid is chosen by analogy with the electric power grid, which provides pervasive access to power and, like the computer and a small number of other advances, has had a dramatic impact on human capabilities and society. We believe that by providing pervasive, dependable, consistent and inexpensive access to advanced computational capabilities, databases, sensors and people, computational grids will have a similar transforming effect, allowing new classes of applications to emerge. (Foster & Kesselman, 1999)
- Web??
5. Front page of the FT, 7th Mar 2000
6. Who will use the Grid?
- Computational scientists & engineers
- real-time visualisation, rapid turn-round simulation & evaluation
- Experimental scientists
- remote instrumentation & supercomputers, advanced visualisation
- Collaborations
- high-end VC, distributed resources (CPU, data, people)
- Corporations
- global enterprises, virtual environments, CAD, ...
- Environmentalists
- ozone depletion, climate change, pollution --> coupled models, knowledge databases
- Training & education
- virtual lecture rooms - distributed classes
7. What does this all mean?
- Enabling collaboration of dispersed communities
- Enable large-scale applications (10,000 CPUs, pipelines)
- Transparent access to high-end resources
- Uniform look & feel to a wide range of resources
- High availability and fault tolerance
- High performance / throughput
- Location independence of resources
- Span multiple administrative domains
- Scalability of resources
- Respect for local implementations & policies
8. ...and what Grids are not!
- NOT free
- organisations place LARGE resources on the Grid
- NOT good for closely coupled computation
- e.g. CFD calculations
- NOT here yet!
- Grids are built by collaborative effort to enable collaboration
- incrementally built - no Big Bang
- many good ideas exist, some implemented
- very early days of Grid integration
9. The Vision
10. Jack Dongarra's Vision
11. JD - Concomitant Issues
12. JD - Pictorial View
13. Guofei Jiang's Vision
14. Promise of ubiquitous computing?
- Talk from RAL.
- Download from http://www.escience.clrc.ac.uk/ !!
- Global virtual computer
- Grid middleware as the OS of this computer
- Accessible via a plug on the wall
- Dependable - predictable, sustained performance
- Consistent - standard interfaces, services and operating parameters
- Pervasive - widely available via controlled access
- Inexpensive - perceived cost relative to alternative solutions
- We will eventually take the Grid for granted
15. Where the Grid is coming from
- Work over the last decade on several distinct topics
- Metacomputing - combining distributed heterogeneous computing resources on a single problem
- Data Archiving - building collections of data on specific topics with open metadata catalogues and well-documented data formats
- Collaborative Working - network-based facilities for distributed, concurrent working on shared information
- Data visualization - large-scale immersive facilities for 3D visual interaction with data coupled to computational analysis
- Network-based information systems - WAIS, Gopher, WWW
- Instrumentation control - remote control of experimental equipment and real-time data gathering systems
- A consistent way of combining many types of computing resource located anywhere to solve complex problems
16. The Web vs The Grid
- The Web supports wide-area data/information location and retrieval
- The Grid supports complete process initiation and execution, including any necessary data location and retrieval
- offers the potential to carry out significantly larger tasks
- opens up new capabilities for knowledge generation
- Currently Grid tools provide a relatively low level of operational control
- higher-level tools will be developed to automate low-level processes
- agent technology will eventually support real-time dynamic process optimisation
17. NASA IPG Motivation
- NASA Information Power Grid
- Large-scale science and engineering is done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
- The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
- Motivation
- Many facilities are moving toward making resources available on the Grid
- The Information Power Grid is NASA's push for a persistent, secure, robust implementation of the Grid
18. IPG Case Study 1 - Online Instrumentation
[Diagram: Unitary Plan Wind Tunnel - multi-source data analysis, with desktop VR clients with shared controls, real-time collection, archival storage and computer simulations]
19. IPG Case Study 2 - Distributed Supercomputing
- OVERFLOW-D2 with Latency Tolerant Algorithm Mods
- Grid mechanisms for
- Resource allocation
- Distributed startup
- I/O and configuration
- Advance Reservation
- 13.5 Million Grid Points over computers at 3 or
more sites
[Diagram: Origin systems at LaRC, ANL, GRC and ARC]
20. IPG Case Study 3 - Collaborative Engineering
[Diagram: Boeing Rapid Response Center, United MOC and SFO hangars linked via the NASA Ames Aviation ExtraNet / IPG - remote displays, digital white boards, a Virtual Iron Bird, CAD/CAE tables and printers, United and Boeing maintenance systems, and a line mechanic on a wireless bridge carrying digital video, audio, non-destructive imaging, a sketchpad and a portable maintenance aid]
21. IPG Case Study 6 - Data Grid
- Numerous data sources generating PB/yr
- Simulations: 100-1000 MB/s
- Instruments & satellites: 100 MB/s
- Larger, more distributed user communities
- Data are increasingly community resources
- Data analysis is increasingly multidisciplinary
22. CERN & The Grid
- Dependable, consistent, pervasive access to high-end resources
- Dependable
- provides performance and functionality guarantees
- Consistent
- uniform interfaces to a wide variety of resources
- Pervasive
- ability to plug in from anywhere
23. CERN & The Grid from a Services View
24. Challenges
25. Science-driven Grid applications
- Environmental science
- coupled atmosphere and ocean simulations with long simulated timescales at high resolution
- Biological science
- multiple protein folding simulations to generate statistically valid models of complex molecules
- Astronomy - Virtual Observatory
- searching across many instrument-specific data archives to study a new class of object at all wavelengths
- Materials science
- combining and analysing data from different experimental facilities to derive the structure of complex new materials
26. Application Challenges
- Computational modelling, data analysis by dispersed communities, multi-disciplinary simulations
- aviation, HEP data analysis, climate modelling
- whole-system simulation (e.g. aircraft)
- whole living cell simulation
- Online instrumentation access & real-time analysis
- national facilities (e.g. synchrotron light source at ANL)
- Shared data archives
- EO data, genome data
- Collaborative visualisation & analysis
- shared virtual worlds...
27. Issues
- Collaborating but dispersed research communities
- world-wide ?
- Heterogeneous resources
- desktops to supercomputers
- Multi-disciplinary
- Coupled simulation models/codes
- Data archives, curation
- Remote access to large resources
- Fast turn-round for online instruments
28. Technical Challenges
- Cross-admin domain, multi-national
- security, access policies, reimbursement
- no central control
- Resource stability
- characteristics change over time and location
- Complex distributed applications
- co-allocation, advance reservation
- optimisations (e.g. caches)
- Guaranteed end-to-end performance
- heterogeneity, fault-tolerance
- Hidden complexity
29. Technical Details - 1
- Security - a matter of trust
- map global to local domain - PKI-based certificates (see the identity-mapping sketch below)
- distinguish authentication from authorisation
- Data management
- caching, high-throughput transfers, metadata, objects
- Performance
- grid view, application view
- multi-host, multi-site, non-repeatable, archives
- Workload
- multi-site, forecasting, resource/job description (ClassAds)
- advance resource reservation, co-allocation
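To make the "map global to local domain" bullet concrete, here is a minimal sketch of a grid-map style lookup from a certificate's distinguished name to a local account; the file path, entry format and error handling are illustrative assumptions, not the actual middleware interface.

```python
# Minimal sketch: map a globally authenticated identity (certificate DN)
# to a local account, in the spirit of a grid map file.
# The file name and entry format below are illustrative assumptions.

GRID_MAPFILE = "grid-mapfile"   # hypothetical path, for illustration only

def load_grid_map(path=GRID_MAPFILE):
    """Parse lines of the form:  "/C=UK/O=eScience/CN=Some User" localuser"""
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            dn, sep, local = line.rpartition('" ')
            if sep:                          # only accept well-formed entries
                mapping[dn.lstrip('"')] = local.strip()
    return mapping

def authorise(dn, mapping):
    """Authentication (proving who you are, e.g. via a PKI certificate) is
    distinct from authorisation (being granted a local account at this site)."""
    local_account = mapping.get(dn)
    if local_account is None:
        raise PermissionError(dn + " is authenticated but not authorised at this site")
    return local_account
```

The split is visible in authorise(): a certificate proves identity everywhere, but each administrative domain still decides locally what that identity may do.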
30. Technical Details - 2
- Computing Fabric
- scalability, fault tolerance
- concept of service vs. system (cluster)
- cluster hiding
- Information Services
- resource discovery
- metadata catalogue
31. Example Projects
32. SETI - Searching for Life
- Arecibo telescope in Puerto Rico
- Screensaver (idle-cycle work loop sketched below)
- Home page in 33 languages
- http://setiathome.ssl.berkeley.edu/
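Screensaver projects like SETI@home (and the cycle-scavenging brokerages on the next slide) follow a simple fetch-compute-report loop. A minimal sketch of that pattern follows; the server URL, work-unit format and analysis step are hypothetical placeholders, not the real SETI@home protocol.

```python
# Minimal sketch of the volunteer-computing pattern behind screensaver
# projects: fetch a work unit, crunch it with otherwise idle cycles,
# report the result, repeat. URL, payload format and analysis step are
# hypothetical placeholders, not the real SETI@home protocol.
import json
import time
import urllib.request

SERVER = "http://example.org/workunits"   # placeholder, not a real endpoint

def fetch_work_unit():
    with urllib.request.urlopen(SERVER) as resp:
        return json.load(resp)            # e.g. {"id": 42, "samples": [...]}

def analyse(samples):
    # Stand-in for the real signal analysis: a trivial reduction.
    return sum(samples) / max(len(samples), 1)

def report(result):
    data = json.dumps(result).encode()
    req = urllib.request.Request(SERVER, data=data,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

if __name__ == "__main__":
    while True:                           # real clients also check the CPU is idle
        unit = fetch_work_unit()
        report({"id": unit["id"], "score": analyse(unit["samples"])})
        time.sleep(1)
```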
33. Entropia™
- Brokerage - free download, use idle cycles
- For-Profit, Not-For-Profit & Philanthropic Membership
- Certified project code
- Great Internet Mersenne Prime Search
- Entropia I - proof of concept
- 124,380 machines, 302,449 tasks
- Fight AIDS@Home
- Entropia 2000 - production
- 5,589 machines, 46,716 tasks
- Future projects
- environmental, economic, scientific,
mathematical, entertainment, product design
34. High Throughput Computing - Condor
- HPC delivers a large number of cycles in bursts
- HTC delivers sustained cycles over a long period
- Job management
- Both traditional clusters & idle workstations
- Heterogeneous
- UNIX
- Porting to WNT
- Sensitive to workstation use
- Checkpointing & job migration
- Resource brokerage - ClassAds (matching sketched below)
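The ClassAd idea is that jobs and machines both advertise attributes plus a requirements expression, and the matchmaker pairs them only when both sides are satisfied. A minimal sketch follows; the attribute names and expression style are simplified assumptions, not Condor's actual ClassAd language.

```python
# Minimal sketch of ClassAd-style matchmaking: jobs and machines each
# advertise attributes plus a requirement; a match needs both requirements
# to hold. Attribute names and expressions are illustrative assumptions,
# not Condor's real ClassAd syntax.

machines = [
    {"Name": "wkst01", "OpSys": "LINUX", "Memory": 512, "KeyboardIdle": 1200,
     "Requirements": lambda job, m: m["KeyboardIdle"] > 600},   # only run when idle
    {"Name": "wkst02", "OpSys": "SOLARIS", "Memory": 256, "KeyboardIdle": 30,
     "Requirements": lambda job, m: m["KeyboardIdle"] > 600},
]

job = {
    "Owner": "physicist",
    "Requirements": lambda job, m: m["OpSys"] == "LINUX" and m["Memory"] >= 256,
}

def match(job, machines):
    """Return machines where both the job's and the machine's requirements hold."""
    return [m for m in machines
            if job["Requirements"](job, m) and m["Requirements"](job, m)]

for m in match(job, machines):
    print("job can run on", m["Name"])    # -> wkst01
```

The machine-side requirement is how an owner keeps a workstation available only while it is idle, which corresponds to the "sensitive to workstation use" point above.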
35. The Globus Project
- Basic research
- resource management, storage, security, networking, QoS, policy, etc.
- Toolkit - bag of services, NOT an integrated solution
- Information Service (MDS) - GRIS/GIIS
- Remote file management - GASS
- Process monitoring - HBM
- Executable management - GEM
- Resource management - GRAM
- Security - GSI
- Many users of Globus - and increasing
36. LHC Experiments' Requirements
37. LHC Computing Challenge
- Data written to tape: 5 Petabytes/year (1 PB = 10^15 bytes) - total for the LHC experiments
- 0.1 to 1 Exabyte (1 EB = 10^18 bytes) by 2010 (2020?)
- Higgs
- New Particles
- Quark-Gluon Plasma
- CP Violation
38. An LHC Collaboration
- 1850 Physicists
- 150 Institutes
- 34 countries
39. CPU estimation
[Chart: CPU capacity that can be purchased for the value of the equipment present in 2000 - Non-LHC vs LHC, following a technology-price curve (40% annual price improvement); 10K SI95 ~ 1200 processors]
40. Disk estimation
[Chart: Non-LHC vs LHC disk capacity, following the same technology-price curve (40% annual price improvement); compounding sketched below]
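The technology-price curve on the last two slides assumes prices per unit of capacity fall by roughly 40% a year, so a fixed budget buys about 1/0.6 ≈ 1.7 times more capacity each year. A small sketch of that compounding; the 2000 baseline of 1.0 relative units is an assumption for illustration.

```python
# Sketch of the technology-price curve: a ~40% annual price improvement means
# a fixed budget buys capacity(year) = capacity(2000) / 0.6**(year - 2000).
# The baseline of 1.0 (relative units) is an illustrative assumption.

ANNUAL_PRICE_IMPROVEMENT = 0.40   # prices fall 40% per year (from the slides)
BASELINE_CAPACITY_2000 = 1.0      # relative units; assumed for illustration

def purchasable_capacity(year, baseline=BASELINE_CAPACITY_2000):
    """Capacity a fixed budget buys in `year`, relative to 2000."""
    return baseline / (1.0 - ANNUAL_PRICE_IMPROVEMENT) ** (year - 2000)

for year in (2000, 2003, 2005, 2007):
    print(year, round(purchasable_capacity(year), 1))
# 2000 1.0, 2003 ~4.6, 2005 ~12.9, 2007 ~35.7
```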
41. Long-term tape estimates
42. Funding
- Requirements are growing faster than Moore's law
- CERN's overall budget is fixed
Estimated cost of facility at CERN: ~30% of offline requirements
Budget level in 2000 for all physics data handling
(assumes physics in July 2005, rapid ramp-up of luminosity)
43. Regional Computing Centres
- Exploit established computing expertise & infrastructure
- In national labs, universities
- National funding
- Reduce dependence on links to CERN
- Active data available nearby, maintained through a fat, fast, reliable network link
- Devolve control over resource allocation
- national interests?
- regional interests?
- at the expense of physics interests?
44. LHC Data Access Patterns
Typical particle physics experiment in 2000-2005: one year of acquisition and analysis of data (totals sketched below)
Data volumes:
- Raw Data: 1000 Tbytes
- Reco-V1, Reco-V2: 1000 Tbytes each
- ESD-V1.1, ESD-V1.2, ESD-V2.1, ESD-V2.2: 100 Tbytes each
- AOD (9 sets): 10 Tbytes each
Access rates (aggregate, average):
- 100 Mbytes/s (2-5 physicists)
- 500 Mbytes/s (5-10 physicists)
- 1000 Mbytes/s (50 physicists)
- 2000 Mbytes/s (150 physicists)
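Summing the per-year tiers above gives a feel for the scale a single experiment implies; the values are taken from the slide and the script below is just that arithmetic.

```python
# Sum the per-year data tiers listed on the slide (sizes in Tbytes).
tiers = {
    "Raw Data": 1000,
    "Reco-V1": 1000, "Reco-V2": 1000,
    "ESD-V1.1": 100, "ESD-V1.2": 100, "ESD-V2.1": 100, "ESD-V2.2": 100,
}
tiers.update({f"AOD-{i}": 10 for i in range(1, 10)})   # nine AOD sets of 10 Tbytes

total_tb = sum(tiers.values())
print(f"total per year: {total_tb} Tbytes (~{total_tb/1000:.1f} PB)")
# -> total per year: 3490 Tbytes (~3.5 PB)
```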
45. Regional Centres - a Multi-Tier Model
46. More realistically - a Grid Topology
47. The DataGrid Project
- 21 partners, 3 years, EU-sponsored, OpenSource
- Middleware
- workload management
- data management
- performance monitoring
- compute, storage & network fabrics
- Testbed
- link national PP Grids together
- EO & Bioscience workpackages
- Production environment for LHC simulation
- Link with GriPhyN, PPDG in US
48. DataGrid Testbed sites
Acknowledgment: A. Ghiselli
49. DataGrid Summary
- A grand vision!!
- Much technology still to develop
- Significant challenge to break down barriers between admin domains
- Built through collaboration to enable collaboration
- Will affect many disciplines (...across disciplines?)
- Significant growth in the number of Grid projects
- EU & UK government support
50. The LHC Challenge - Summary
- Scalability -> cost -> management
- Thousands of processors, thousands of disks, PetaBytes of data, Terabits/second of I/O bandwidth
- Wide-area distribution
- WANs are only, and will only be, ~1% of LANs
- Distribute, replicate, cache, synchronise the data (replica-selection sketched below)
- Multiple ownership, policies, ...
- Integration of this amorphous collection of Regional Centres...
- ...with some attempt at optimisation
- Adaptability
- We shall only know how analysis is done once the data arrives
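As a toy illustration of the "distribute, replicate, cache, synchronise" point, here is a minimal sketch of picking the cheapest replica of a dataset for the site where a job runs; the catalogue, site names and cost numbers are invented for illustration and are not part of any DataGrid component.

```python
# Toy sketch of replica selection: given several copies of a dataset held at
# different Regional Centres, pick the one that is cheapest to read from the
# site where the job runs. Site names, costs and the catalogue itself are
# illustrative assumptions, not part of any real DataGrid component.

replica_catalogue = {
    "reco-v2/run1234": ["cern", "ral", "in2p3"],   # sites holding a copy
}

# Relative cost of moving data to the job's site (made-up numbers).
transfer_cost = {
    ("ral", "ral"): 0, ("ral", "cern"): 5, ("ral", "in2p3"): 7,
}

def best_replica(dataset, job_site):
    """Choose the replica with the lowest transfer cost to the job's site."""
    sites = replica_catalogue[dataset]
    return min(sites, key=lambda s: transfer_cost.get((job_site, s), float("inf")))

print(best_replica("reco-v2/run1234", "ral"))   # -> "ral": use the local copy
```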
51. CONCLUSIONS
- The overall Grid Vision is to facilitate the routine interactions of heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed, in order to support large-scale science and engineering
- (NASA paraphrase)
- To meet the computing demands of the LHC experiments, a distributed computing model must be embraced; the one foreseen is based on a multi-tier model
- (Required for financial and sociological reasons)
- Implementing a Grid for the LHC will present new computing challenges
- The Grid may lead to a revolution as profound as the World Wide Web
- "Science will never be the same again" - DGRC, Sept 2000