Title: The Grid: a brief briefing
1The Grid: a brief briefing
- Carole Goble
- Information Management Group
2Roadmap
- What is the Grid?
- Example projects
- Relationship to the Semantic Web
- Example architectures
- The international programme
3Take Home
- The Grid is an international activity
- The Grid has attracted high profile industrial
and government support and funding
- The Information/Knowledge Grid is in many ways
indistinguishable from the Semantic Web
- The Grid Community's understanding of generic and
theoretical issues for the IK Grid is immature
and hackery.
4So what's the Grid?
- Isn't it just High Performance Computing for High
Energy Physicists?
5Why Grids?
- Large-scale science and engineering are done
through the interaction of people, heterogeneous
computing resources, information systems, and
instruments, all of which are geographically and
organizationally dispersed.
- The overall motivation for Grids is to
facilitate the routine interactions of these
resources in order to support large-scale science
and engineering.
From Bill Johnston 27 July 01
6CERN Large Hadron Collider (LHC)
Raw data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD-ROMs
CMS Detector
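The figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes roughly 10^7 seconds of data-taking per year and a 650 MB CD-ROM; neither number appears on the slide, so both are illustrative assumptions rather than the presenter's own working.

```python
# Back-of-envelope check of the LHC data-rate figures quoted above.
# Assumptions (NOT from the slide): ~1e7 seconds of data-taking per
# year, and 650 MB per CD-ROM.
FILTERED_RATE_BYTES_PER_SEC = 100 * 10**6  # 100 Mbyte / sec (from the slide)
SECONDS_PER_YEAR = 10**7                   # assumed running time per year
CD_ROM_BYTES = 650 * 10**6                 # assumed CD-ROM capacity
PETABYTE = 10**15

bytes_per_year = FILTERED_RATE_BYTES_PER_SEC * SECONDS_PER_YEAR
petabytes_per_year = bytes_per_year / PETABYTE
cd_roms_per_year = bytes_per_year / CD_ROM_BYTES

print(f"{petabytes_per_year:.1f} PB per year")
print(f"{cd_roms_per_year / 1e6:.1f} million CD-ROMs per year")
```

Under these assumptions the filtered stream comes to 1 Petabyte per year and about 1.5 million CD-ROMs, which is the same order of magnitude as the slide's figure.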
7Why Grids?
- A biochemist exploits 10,000 computers to screen
100,000 compounds in an hour
- A biologist combines a range of diverse and
distributed resources (databases, tools,
instruments) to answer complex questions
- 1,000 physicists worldwide pool resources for
petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute,
analyze shake table experiments
From Steve Tuecke 12 Oct. 01
8Why Grids? (contd.)
- Climate scientists visualize, annotate, analyze
terabyte simulation datasets
- An emergency response team couples real time
data, weather model, population data
- A multidisciplinary analysis in aerospace couples
code and data in four companies
- A home user invokes architectural design
functions at an application service provider
From Steve Tuecke 12 Oct. 01
9Why Grids? (contd.)
- An application service provider purchases cycles
from compute cycle providers
- Scientists working for a multinational soap
company design a new product
- A community group pools members' PCs to analyze
alternative designs for a local road
From Steve Tuecke 12 Oct. 01
10The Grid Vision
- "flexible, secure, coordinated resource-sharing
among dynamic collections of individuals,
institutions, and resources, what we refer to as
virtual organisations"
- The Anatomy of the Grid: Enabling Scalable
Virtual Organizations, Foster, Kesselman and
Tuecke, 2001
11The Grid Problem
- Enable communities (virtual organizations) to
share geographically distributed resources as
they pursue common goals -- assuming the absence
of
- central location,
- central control,
- omniscience,
- existing trust relationships.
From Steve Tuecke 12 Oct. 01
12Large scale
- Multi-disciplinary simulation
- Decision support and optimization
- Virtual prototyping
- Collaborative analysis and visualization
- Large scale distributed data management
- Large scale distributed computation
- High speed communications
- Dynamic collaborative virtual organisations
13What is it? Where is it? How to get it? When did
it happen?
Who knows it? Why does it? What are you doing?
interrogation
results
workflows
Governance Control
Collaboration Grid
Technology Grid
14Online Access to Scientific Instruments
Advanced Photon Source
wide-area dissemination
desktop VR clients with shared controls
archival storage
real-time collection
tomographic reconstruction
DOE X-ray grand challenge: ANL, USC/ISI, NIST,
U.Chicago
From Steve Tuecke 12 Oct. 01
15Supernova Cosmology
16- GRID Software Components
- An efficient data transfer mechanism
- A resource broker
- An interface for coupled applications
- An interface for "computing-on-demand"
- An interface for interactive use
- Distributed Simulation Codes for e-Science
Testbed
- Biomolecular simulations
- Weather prediction
- Coupled CAE simulations
- ASP-type services
- Real-time data processing
17Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple
earthquake engineers with experimental
facilities, databases, computers, each other
- On-demand access to experiments, data streams,
computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From Steve Tuecke 12 Oct. 01
18Home Computers Evaluate AIDS Drugs
- Community
- 1000s of home computer users
- Philanthropic computing vendor (Entropia)
- Research group (Scripps)
- Common goal: advance AIDS research
From Steve Tuecke 12 Oct. 01
19myGrid
- Personalised extensible environments for
data-intensive in silico experiments in biology
- Straightforward discovery, interoperation,
sharing
- Workflow oriented
- provenance
- propagating change
- Individual creativity and collaborative working
- personalisation
20myGrid resources
- Question
- Nucleotide binding protein in mouse
- Answer
- P12345 in Swiss-Prot is an ATPase
- Terri Attwood is an expert on this
- Jackson Labs have a database but you need to
register
- A paper has just been published in Proteins by
the Stanford lab on this.
21GeoDISE: engineering design optimisation
- Access to knowledge repository
- Access to optimisation and search tools
- Industrial analysis codes
- Distributed computing and data resources in
design optimisation
- Applied to industrial problems - large scale CFD
codes
- Demonstrate scalability across distributed
computational and data resources and teams of
designers
22GeoDISE: Modern engineering firms are global and
distributed
How to ...?
- CAD and analysis tools, user interfaces, PSEs,
and visualization: improve design environments,
cope with legacy code / systems
- Optimisation methods: produce optimized designs
- Management of distributed compute and data
resources: integrate large-scale systems in a
flexible way
- Data archives (e.g. design / system usage):
archive and re-use design history
- Knowledge repositories, knowledge capture and
reuse tools: capture and re-use knowledge
- Not just a problem of using HPC
23Virtual Sky http://virtualsky.org/
24Broader Context
- Grid Computing has much in common with major
industrial thrusts
- Business-to-business, Peer-to-peer, Application
Service Providers, Storage Service Providers,
Distributed Computing, Internet Computing
- Sharing issues not adequately addressed by
existing technologies
- Complicated requirements: "run program X at site
Y subject to community policy P, providing access
to data at Z according to policy Q"
- High performance: unique demands of advanced
high-performance systems
From Steve Tuecke 12 Oct. 01
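The "complicated requirements" bullet describes a request that must satisfy two independent policies at once. A minimal sketch of that idea follows. All names here (`Request`, `authorize`, the two example policies and their contents) are hypothetical illustrations, not part of Globus or any other Grid toolkit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    user: str
    program: str  # "program X"
    site: str     # "site Y"
    data: str     # "data at Z"


# A policy is modelled (as an assumption) as a yes/no function of the request.
Policy = Callable[[Request], bool]


def authorize(request: Request, community_policy: Policy, data_policy: Policy) -> bool:
    """Grant the request only if both policy P and policy Q allow it."""
    return community_policy(request) and data_policy(request)


# Hypothetical policy P: the community runs only approved programs at member sites.
def community_policy(req: Request) -> bool:
    approved_programs = {"simulate"}
    member_sites = {"site-y"}
    return req.program in approved_programs and req.site in member_sites


# Hypothetical policy Q: the data owner releases each dataset to named users only.
def data_policy(req: Request) -> bool:
    readers = {"dataset-z": {"alice"}}
    return req.user in readers.get(req.data, set())


allowed = authorize(Request("alice", "simulate", "site-y", "dataset-z"),
                    community_policy, data_policy)
denied = authorize(Request("bob", "simulate", "site-y", "dataset-z"),
                   community_policy, data_policy)
print(allowed, denied)
```

The point of the sketch is only that the compute site and the data owner each keep their own policy, and neither has to trust the other's decision.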
25Elements of the Problem
From Steve Tuecke 12 Oct. 01
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing always conditional: issues of trust,
policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis,
computation, collaboration, ...
- Dynamic, multi-institutional virtual
organisations
- Community overlays on classic org structures
- Large or small, static or dynamic
- Problem Solving Environments
27The Globus Project
- Close collaboration with real Grid projects in
science and industry
- Development and promotion of standard Grid
protocols to enable interoperability and shared
infrastructure
- Development and promotion of standard Grid
software APIs and SDKs to enable portability and
code sharing
- The Globus Toolkit: Open source, reference
software base for building grid infrastructure
and applications
- Global Grid Forum: Development of standard
protocols and APIs for Grid computing
From Steve Tuecke 12 Oct. 01
28Doesn't Globus solve it all?
- Globus Toolkit is focused on the
Data/Computational layer
- No database connectivity
- Little brokering, and static not dynamic
- Weak metadata management, workflow
- Trashes firewalls
- No, not everything is JCL, FTP and LDAP
- Distributed computation dominates, etc. etc.
29Is it done?
- NASA Information Power Grid is the only one really working
- http://www.ipg.nasa.gov
- Linking similar supercomputers owned by the same
organisation
- Computation-focused
- High Energy Physics is atypical
30Example Application Projects
- AstroGrid: astronomy, etc. (UK)
- Earth Systems Grid: environment (US DOE)
- EU DataGrid: physics, environment, etc. (EU)
- EuroGrid: various (EU)
- Fusion Collaboratory (US DOE)
- GridLab: astrophysics, etc. (EU)
- Grid Physics Network (US NSF)
- MetaNEOS: numerical optimization (US NSF)
- NEESgrid: civil engineering (US NSF)
- RealityGrid (UK)
- DAME (UK)
- Comb-e-Chem (UK)
- GeoDISE (UK)
- iVDGL, StarLight (US/EU)
- DiscoveryNet (UK)
- myGrid (UK)
- GridPP (UK)
- Particle Physics Data Grid (US DOE)
- etc
31- Since the early days of mankind the primary
motivation for the establishment of communities
has been the idea that by being part of an
organized group the capabilities of an individual
are improved. The great progress in the area of
inter-computer communication led to the
development of means by which stand-alone
processing sub-systems can be integrated into
multi-computer communities.
Miron Livny, Study of Load Balancing Algorithms
for Decentralized Distributed Processing
Systems, Ph.D. thesis, July 1983.
32Every Community needs a Matchmaker!
- Condor uses Matchmakers to build Computing
Communities out of Commodity Components
- ... someone has to bring together community
members who have requests for goods and services
with members who offer them.
- Both sides are looking for each other
- Both sides have constraints
- Both sides have preferences
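The matchmaking idea on this slide (two sides, each with constraints and preferences) can be sketched in a few lines. This is a generic illustration under assumed names (`Ad`, `matchmake`, and the attribute keys are all hypothetical); it is not Condor's actual ClassAd language or API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Ad:
    """One side's advertisement: its attributes, a constraint, a preference."""
    name: str
    attrs: Dict[str, Any]
    constraint: Callable[[Dict[str, Any]], bool]   # must hold for the other side
    preference: Callable[[Dict[str, Any]], float]  # higher means more preferred


def matchmake(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Return the best offer for which BOTH sides' constraints are satisfied."""
    compatible = [
        offer for offer in offers
        if request.constraint(offer.attrs) and offer.constraint(request.attrs)
    ]
    if not compatible:
        return None
    # Rank by the requester's preference; use the offer's preference to break ties.
    return max(
        compatible,
        key=lambda offer: (request.preference(offer.attrs),
                           offer.preference(request.attrs)),
    )


# Hypothetical request: a job that needs 512 MB and prefers faster machines.
job = Ad(
    name="job",
    attrs={"owner": "alice"},
    constraint=lambda m: m["memory_mb"] >= 512,
    preference=lambda m: m["mips"],
)
# Hypothetical offers: m1 is too small, m3 only accepts jobs owned by bob.
machines = [
    Ad("m1", {"memory_mb": 256, "mips": 900}, lambda j: True, lambda j: 0.0),
    Ad("m2", {"memory_mb": 1024, "mips": 500}, lambda j: True, lambda j: 0.0),
    Ad("m3", {"memory_mb": 2048, "mips": 700},
       lambda j: j["owner"] == "bob", lambda j: 0.0),
]

best = matchmake(job, machines)
print(best.name if best else "no match")
```

Here the fastest machine loses on the job's constraint and the largest one refuses the job's owner, so the match is decided by both sides rather than by the requester alone.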
33Let's look at some Architectures
34Desiderata (adapted from Globus)
- Software development toolkits, e.g. Globus Toolkit
- Standard protocols, services and APIs
- A modular bag of technologies
- Enable incremental development of grid-enabled
tools and applications
- Reference implementations
- Learn through deployment and applications
- Open source
Applications
Diverse global services
Core services
Local OS
36Globus Layered Grid Architecture
CERN - High Energy Physics
From Steve Tuecke 12 Oct. 01
37Keith Jeffery
38"Reproduced by permission of the IT Innovation
Centre, University of Southampton."
http://www.it-innovation.soton.ac.uk
Three Layer Grid Abstraction
Interoperability, higher level ontologies,
reasoning, discovery, Reasoning services,
Discovery services
Fulfillment
Grid
Scientific Problems
Knowledge
Knowledge / capability
Processes
Information
Value chain
Semantics / process
Jobs and Data
Data
Data / applications
Raw Resources
39Architecture of a Grid
Discipline Specific Portals and Scientific
Workflow Management Systems
Applications: Simulations, Data Analysis, etc.
Toolkits: Visualization, Data
Publication/Subscription, etc.
Grid Common Services: Standardized Services and
Resources Interfaces
Collaboration and Remote Instrument Services
Grid Information Service
Uniform Resource Access
Co-Scheduling
Network Cache
Authentication Authorization
Security Services
Communication Services
Global Queuing
Global Event Services
Data Cataloguing
Uniform Data Access
Fault Management
Monitoring
Brokering
Auditing
Globus services
clusters
Distributed Resources
national supercomputer facilities
tertiary storage
national user facilities
Condor pools
network caches
High-speed Networks and Communications Services
40Architecture of a Grid: upper layers
- Knowledge based query
- Tools to implement the human interfaces, e.g.
SciRun, ECCE, WebFlow, ...
- Mechanisms to express, organize, and manage the
workflow of problem solutions
(frameworks)
- Access control
Problem Solving Environments
Applications and Supporting Tools
Grid enabled libraries (security, communication
services, data access, global event management,
etc.)
Application Development and Execution Support
Grid Common Services
Distributed Resources
41Knowledge Based Data Grids
Ingest Services
Management
Access Services
Knowledge or Topic-Based Query / Browse
Knowledge Repository for Rules
Relationships Between Concepts
Knowledge
XTM DTD
Rules - KQL
(Model-based Access)
Information Repository
Attribute-based Query
Attributes Semantics
XML DTD
SDLIP
Information
(Data Handling System - SRB)
Data
Fields Containers Folders
Storage (Replicas, Persistent IDs)
Grids
Feature-based Query
MCAT/HDF
42Astronomy Sky Survey Data Grid
1. Portals and Workbenches
2.Knowledge Resource Management
Bulk Data Analysis
Metadata View
Data View
Catalog Analysis
3.
Standard APIs and Protocols
Concept space
4. Grid Security, Caching, Replication, Backup, Scheduling
Information Discovery
Metadata delivery
Data Discovery
Data Delivery
5.
Standard Metadata format, Data model, Wire format
Catalog Mediator
6.
Data mediator
Catalog/Image Specific Access
Compute Resources
Catalogs
Data Archives
Derived Collections
7.
43User Interfaces
NSDL
Usage Enhancement
Delivery Presentation Aggregation - Channels
Information about collections
Core NSDL Bus
Meta-data delivery Data delivery Query Global
Ids Security Network
Metadata data access-based services
Virtual Collections Mediators
Collection Building
44ERA Concept model
46The De Roure Triangle
Grid Computing
?
e-Science
Agents
Web Services Semantic Web
e-Business
47Roy Williams Paul Messina
California Institute of Technology
48So what is going on?
- UK: http://www.escience-grid.org.uk/
- International: http://www.gridforum.org/
49E-Science Programme
DG Research Councils
Grid TAG
E-Science Steering Committee
Director
Director's Management Role
Director's Awareness and Co-ordination Role
Generic Challenges: EPSRC (£15m), DTI (£15m)
Academic Application Support Programme: Research
Councils (£74m), DTI (£5m)
PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m)
ESRC (£3m) EPSRC (£17m) CLRC (£5m)
£80m Collaborative projects
Industrial Collaboration (£40m)
From Tony Hey 27 July 01
50Key Elements of UK Grid Development Plan
- Development of Generic Grid Middleware
- Network of Grid Core Programme e-Science Centres
- National Centre: http://www.nesc.ac.uk/
- Regional Centres: http://www.esnw.ac.uk/
- Grid IRC Grand Challenge Project
- Support for e-Science Pilots
- Short term funding for e-Science demonstrators
- Grid Network Team, Grid Engineering Team
- Grid Support Centre, Task Forces
Adapted from Tony Hey 27 July 01
51Take Home
- The Grid is an international activity
- The Grid has attracted high profile industrial
and government support and funding
- The Information/Knowledge Grid is in many ways
indistinguishable from the Semantic Web
- The Grid Community's understanding of generic and
theoretical issues for the IK Grid is immature
and hackery.
52Spares
55Grid viewpoints
What is it? Where is it? How to get it? When did
it happen?
Who knows it? Why does it? What are you doing?
interrogation
results
private
New Biology
workflows
public
Governance Control
Access Grid
Technology Grid
56Particle Physics and Astronomy Research Council
(PPARC)
- GridPP (http://www.gridpp.ac.uk/)
- to develop the Grid technologies required to meet
the LHC computing challenge
- ASTROGRID (http://www.astrogrid.ac.uk/)
- a £4M project aimed at building a data-grid for
UK astronomy, which will form the UK contribution
to a global Virtual Observatory
57Infrastructure Deployments
- Institutional Grid deployments deploying
services and network infrastructure
- DISCOM, IPG, TeraGrid, DOE Science Grid, DOD
Grid, NEESgrid, ASCI (Netherlands)
- International deployments supporting
international experiments and science
- iVDGL, StarLight
- Support centers
- U.K. Grid Center
- U.S. GRIDS Center