Title: What is eScience
1 What is e-Science? What is the Grid?
W T Hewitt, Monday, November 23, 2009
UCISA Meeting, Edinburgh
2 Agenda
- What is the Grid and e-Science?
- The Global Programme
- The UK eScience Programme
- Impacts
3
- What is e-Science and the Grid?
4 Why Grids?
- Large-scale science and engineering are done through
  - the interaction of people,
  - heterogeneous computing resources, information systems, and instruments,
  - all of which are geographically and organizationally dispersed.
- The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
From Bill Johnston, 27 July 01
5 The Grid
- "is the web on steroids."
- "is Napster for Scientists" (of data grids)
- "is the solution to all your problems."
- "is evil." (a system manager, of Globus)
- "is distributed computing re-badged."
- "is distributed computing across multiple administrative domains"
  - Dave Snelling, senior architect of UNICORE
6
- provides "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
  - From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"
- "enables communities (virtual organizations) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."
7 CERN Large Hadron Collider (LHC)
- Raw data: 1 Petabyte/sec
- Filtered: 100 Mbyte/sec; 1 Petabyte/year; 1 Million CD-ROMs
[Figure: CMS Detector]
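The LHC figures on this slide can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope check: it assumes decimal units, a 650 MB CD-ROM, and roughly 10^7 seconds of data taking per year. None of those three assumptions appear on the slide.

```python
# Back-of-envelope check of the LHC data volumes quoted on the slide.
# Assumptions (NOT from the slide): decimal units, 650 MB per CD-ROM,
# and about 1e7 seconds of data taking per year.
MB = 10**6
PB = 10**15

RAW_RATE = 1 * PB                # bytes per second off the detector
FILTERED_RATE = 100 * MB         # bytes per second after filtering
LIVE_SECONDS_PER_YEAR = 10**7    # assumed accelerator live time
CD_CAPACITY = 650 * MB           # assumed CD-ROM capacity


def yearly_volume(rate_bytes_per_s: int, seconds: int) -> int:
    """Bytes accumulated at a constant rate over the given time."""
    return rate_bytes_per_s * seconds


def reduction_factor(raw: int, filtered: int) -> int:
    """How many times smaller the filtered stream is than the raw one."""
    return raw // filtered


if __name__ == "__main__":
    volume = yearly_volume(FILTERED_RATE, LIVE_SECONDS_PER_YEAR)
    print(f"Stored per year:  {volume / PB:.1f} PB")
    print(f"CD-ROMs per year: {volume / CD_CAPACITY:,.0f}")
    print(f"Filter reduction: {reduction_factor(RAW_RATE, FILTERED_RATE):,}x")
```

Under these assumptions the filtered stream comes to 1 PB per year, which is consistent with the slide, and to somewhat more than a million CD-ROMs.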
8 Why Grids?
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions
- 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute, and analyze shake table experiments
From Steve Tuecke, 12 Oct. 01
9 Why Grids? (contd.)
- Climate scientists visualize, annotate, and analyze terabyte simulation datasets
- An emergency response team couples real time data, weather model, population data
- A multidisciplinary analysis in aerospace couples code and data in four companies
- A home user invokes architectural design functions at an application service provider
From Steve Tuecke, 12 Oct. 01
10 Broader Context
- Grid Computing has much in common with major industrial thrusts
  - Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing
- Sharing issues not adequately addressed by existing technologies
  - Complicated requirements: "run program X at site Y subject to community policy P, providing access to data at Z according to policy Q"
  - High performance: unique demands of advanced high-performance systems
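The "run program X at site Y subject to community policy P, providing access to data at Z according to policy Q" requirement can be made concrete with a small sketch. Everything here is hypothetical (the class names, the shape of the policies, the sample users and sites); it only illustrates that two independently owned policies must both agree before a request goes ahead.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    user: str
    program: str    # X
    site: str       # Y
    data_site: str  # Z


@dataclass
class CommunityPolicy:
    """Policy P (hypothetical form): which programs a user may run where."""
    allowed: dict = field(default_factory=dict)  # user -> {(program, site)}

    def permits(self, req: Request) -> bool:
        return (req.program, req.site) in self.allowed.get(req.user, set())


@dataclass
class DataPolicy:
    """Policy Q (hypothetical form): which users may read data at a site."""
    readers: dict = field(default_factory=dict)  # data_site -> {user}

    def permits(self, req: Request) -> bool:
        return req.user in self.readers.get(req.data_site, set())


def authorize(req: Request, p: CommunityPolicy, q: DataPolicy) -> bool:
    """A request proceeds only if the community AND the data owner agree."""
    return p.permits(req) and q.permits(req)


if __name__ == "__main__":
    p = CommunityPolicy({"alice": {("blast", "manchester")}})
    q = DataPolicy({"edinburgh": {"alice"}})
    ok = Request("alice", "blast", "manchester", "edinburgh")
    denied = Request("alice", "blast", "manchester", "oxford")
    print(authorize(ok, p, q), authorize(denied, p, q))
```

The point of keeping P and Q as separate objects is that, on a Grid, they are typically owned and changed by different organisations.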
11 What is the Grid?
- "Grid computing is distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we review the "Grid problem", which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations."
  - From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
12 New Book
13 What is the Grid?
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- On-demand, ubiquitous access to computing, data, and all kinds of services
- New capabilities constructed dynamically and transparently from distributed services
- No central location, no central control, no existing trust relationships, little predetermination
- Uniformity
- Pooling Resources
14 e-Science and the Grid
- e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.
- e-Science will change the dynamic of the way science is undertaken.
  - John Taylor, Director General of Research Councils, Office of Science and Technology
15 Why GRID?
- VERY VERY IMPORTANT
- The GRID is one way to realise the e-Science vision.
- WE ARE TRYING TO DO E-SCIENCE!
16 Diverse global services
Grid services
Local OS
17 Common principles
- Single sign-on
  - Often implying Public Key Infrastructure (PKI)
- Standard protocols and services
- Respect for autonomy of resource owner
- Layered architectures
  - Higher-level infrastructures hiding heterogeneity of lower levels
- Interoperability is paramount
18 Grid Middleware
- Middleware
  - Globus
  - UNICORE
  - Legion and Avaki
- Scheduling
  - Sun Grid Engine
  - Load Sharing Facility (LSF)
    - from Platform Computing
  - OpenPBS and PBS(Pro)
    - from Veridian
  - Maui scheduler
  - Condor
    - could also go under middleware
- Data
  - Storage Resource Broker (SRB)
  - Replica Management
  - OGSA-DAI
- Web services (WSDL, SOAP, UDDI)
  - IBM Websphere
  - Microsoft .NET
  - Sun Open Net Environment (Sun ONE)
- PC Grids
  - Peer-to-Peer computing
19
20 Data-oriented middleware
- Wide-area distributed file systems (e.g. AFS)
- Storage Resource Broker (SRB)
  - UCSD and SDSC
  - Provide transparent access to data storage
  - Centralised architecture
  - Motivated by experiences of HPC users, not database users
  - Little enthusiasm from UK e-Science programme
- OGSA-DAI
  - Database Access and Integration
  - Strategic contribution of UK e-Science programme
  - Universities of Edinburgh, Manchester, Newcastle; IBM, Oracle
  - Alpha release January 2003
- Globus Replica Management software
  - Next up!
21 Data Grids for High Energy Physics
22 Data Intensive Issues Include
- Harness potentially large numbers of data, storage, and network resources located in distinct administrative domains
- Respect local and global policies governing what can be used for what
- Schedule resources efficiently, again subject to local and global constraints
- Achieve high performance, with respect to both speed and reliability
- Catalog software and virtual data
23 Desired Data Grid Functionality
- High-speed, reliable access to remote data
- Automated discovery of best copy of data
- Manage replication to improve performance
- Co-schedule compute, storage, network
- Transparency with respect to delivered performance
- Enforce access control on data
- Allow representation of global resource allocation policies
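"Automated discovery of best copy of data" can be illustrated with a toy replica catalogue. This is a minimal sketch, not how SRB or the Globus replica software works: the `Replica` fields, the cost model (latency plus size over bandwidth), and the site names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    bandwidth_mb_s: float  # assumed measured throughput to the client
    latency_s: float       # assumed start-up cost of a transfer
    online: bool = True


def estimated_transfer_time(replica: Replica, size_mb: float) -> float:
    """Simple cost model: fixed start-up latency plus streaming time."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def best_replica(catalog: dict, logical_name: str, size_mb: float) -> Replica:
    """Pick the online copy of a logical file with the lowest estimated cost."""
    candidates = [r for r in catalog.get(logical_name, []) if r.online]
    if not candidates:
        raise LookupError(f"no online replica for {logical_name!r}")
    return min(candidates, key=lambda r: estimated_transfer_time(r, size_mb))


if __name__ == "__main__":
    # Hypothetical catalogue: one logical file name, three physical copies.
    catalog = {
        "lfn:run42": [
            Replica("cern", bandwidth_mb_s=10, latency_s=0.2),
            Replica("ral", bandwidth_mb_s=50, latency_s=0.05),
            Replica("fnal", bandwidth_mb_s=80, latency_s=0.3, online=False),
        ]
    }
    print(best_replica(catalog, "lfn:run42", size_mb=1000).site)
```

A real data grid would also fold in the access control and global allocation policies listed on the slide before choosing a copy; the sketch leaves those out.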
24 Grid Standards
- Grid Standards Bodies
  - IETF: Home of the Network Infrastructure Standards
  - W3C: Home of the Internet
  - GGF: Home of the Grid
- GGF Defines the Open Grid Services Architecture
  - OGSI is the Infrastructure part of OGSA
  - OGSI Public comment draft submitted 14 February 2003
- Key OGSA Areas of Standards Development
  - Job management interfaces
  - Resources Discovery
  - Security
  - Grid Economy and Brokering
25 What is OGSA?
Web Services with Attitude!
Also known as "Open Grid Services Architecture"
26 Aside: What are Web Services?
- Loosely Coupled Distributed Computing
  - Think Java RMI or C remote procedure call
- Text Based Serialization
  - XML: "Human Readable" serialization of objects
- IBM and Microsoft lead
  - Web Services Description Language (WSDL)
  - W3C Standardization
- Three Parts
  - Messages (SOAP)
  - Definition (WSDL)
  - Discovery (UDDI)
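To make "text based serialization" and "Messages (SOAP)" concrete, the sketch below builds and parses a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `urn:example:jobs`, the `SubmitJob` operation and its parameters are hypothetical and stand in for whatever a real WSDL definition would describe.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:jobs"  # hypothetical application namespace


def build_request(operation: str, params: dict) -> str:
    """Serialise a remote call as a human-readable SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text: str):
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    message = build_request("SubmitJob", {"program": "blast", "cpus": 16})
    print(message)
    print(parse_request(message))
```

Because the message is plain XML, parameter values come back as strings; turning them into typed values is the job of the WSDL definition, which this sketch does not model.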
27 Web Services in Action
28 Enter Grid Services
- Experiences of Grid computing (and business process integration) suggest similar extensions to Web Services
- State
  - Service Data Model
- Persistence and Naming
  - Two Level Naming (GSH, GSR)
  - Allows dynamic migration and QoS adaptation
- Lifetime Management
  - Self healing and soft garbage collection.
- Standard PortTypes
  - Guarantee of minimal level of service
  - Beyond P2P is Federation through Mediation
- Explicit Semantics
  - Grid Services specify semantics on top of Web Service syntax.
  - PortType Inheritance
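Two-level naming and soft-state lifetime management can be sketched in a few lines. In this toy model a permanent Grid Service Handle (GSH) is resolved to a current Grid Service Reference (GSR), the GSR can change when the service migrates, and entries that are not kept alive are garbage collected. The `HandleResolver` class, its method names and the sample handles are hypothetical, and folding the service lifetime into the resolver is a simplification rather than the OGSI design.

```python
class HandleResolver:
    """Toy resolver: stable handle (GSH) -> current reference (GSR)."""

    def __init__(self):
        self._bindings = {}  # gsh -> (gsr, termination_time)

    def register(self, gsh: str, gsr: str, now: float, lifetime: float) -> None:
        self._bindings[gsh] = (gsr, now + lifetime)

    def keep_alive(self, gsh: str, now: float, lifetime: float) -> None:
        """Soft state: a client must keep extending the termination time."""
        gsr, _ = self._bindings[gsh]
        self._bindings[gsh] = (gsr, now + lifetime)

    def migrate(self, gsh: str, new_gsr: str) -> None:
        """The service moved; the handle stays valid, the reference changes."""
        _, expires = self._bindings[gsh]
        self._bindings[gsh] = (new_gsr, expires)

    def sweep(self, now: float) -> list:
        """Garbage-collect every binding whose lifetime has run out."""
        expired = [h for h, (_, t) in self._bindings.items() if t <= now]
        for handle in expired:
            del self._bindings[handle]
        return expired

    def resolve(self, gsh: str, now: float) -> str:
        self.sweep(now)
        if gsh not in self._bindings:
            raise LookupError(f"no live service for handle {gsh!r}")
        return self._bindings[gsh][0]


if __name__ == "__main__":
    resolver = HandleResolver()
    resolver.register("gsh:sim-1", "http://hostA/sim", now=0, lifetime=10)
    print(resolver.resolve("gsh:sim-1", now=5))
    resolver.migrate("gsh:sim-1", "http://hostB/sim")
    print(resolver.resolve("gsh:sim-1", now=6))
```

Time is passed in explicitly so that the behaviour is deterministic; a real implementation would read a clock and would attach the termination time to the service instance itself.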
29
- If one GRID is good then many GRIDS must be better
30 US Grid Projects
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF DTF TeraGrid
- DOE ASCI DISCOM Grid
- DOE Earth Systems Grid
- DOE FusionGrid
- NEESGrid
- NIH BIRN
- NSF iVDGL
31 National Grid Projects
- Japan: Grid Data Farm, ITBL
- Netherlands: VLAM, DutchGrid
- Germany: UNICORE, Grid proposal
- France: Grid funding approved
- Italy: INFN Grid
- Eire: Grid-Ireland
- Poland: PIONIER Grid
- Switzerland: Grid proposal
- Hungary: DemoGrid, Grid proposal
- ApGrid: AsiaPacific Grid proposal
32 EU Grid Projects
- DataGrid (CERN, ..)
- EuroGrid (Unicore)
- DataTag (TTT)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)
- COG (Semantic Grid)
33
34 UK e-Science Programme
[Organisation chart]
- DG Research Councils
- Grid TAG
- E-Science Steering Committee
- Director
- Director's Management Role
- Director's Awareness and Co-ordination Role
- Generic Challenges: EPSRC (£15m), DTI (£15m)
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  - PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
- £80m Collaborative projects
- Industrial Collaboration (£40m)
From Tony Hey, 27 July 01
35 Key Elements
- Development of Generic Grid Middleware
- Network of Grid Core Programme e-Science Centres
  - National Centre: http://www.nesc.ac.uk
  - Regional Centres: http://www.esnw.ac.uk/
- Grid IRC Grand Challenge Project
- Support for e-Science Pilots
- Short term funding for e-Science demonstrators
- Grid Network Team
- Grid Engineering Team
- Grid Support Centre
- Task Forces
  - Database: led by Norman Paton
  - Architecture: led by Malcolm Atkinson
- International Involvement
Adapted from Tony Hey, 27 July 01
36 National and Regional Centres
- Centres donate equipment to make a Grid
Edinburgh
Glasgow
Newcastle
Belfast
Manchester
DL
Cambridge
Oxford
Hinxton
RAL
Cardiff
London
Southampton
37 e-Science Demonstrators
- Dynamic Brain Atlas
- Biodiversity
- Chemical Structures
- Mouse Genes
- Robotic Astronomy
- Collaborative Visualisation
- Climateprediction.com
- Medical Imaging/VR
38 Grid Middleware R&D
- £16M funding available for industrial collaborative projects
  - £11M allocated to Centres projects plus £5M for Open Call projects
- Set up Task Forces
  - Database Task Force
  - Architecture Task Force
  - Security Task Force
39 Grid Network Team
- Expert group to identify end-to-end network bottlenecks and other network issues
  - e.g. problems with multicast for Access Grid
- Identify e-Science project requirements
- Funding £0.5M traffic engineering/QoS project with PPARC, UKERNA and CISCO
  - investigating MPLS using SuperJANET network
- Funding DataGrid extension project investigating bandwidth scheduling with PPARC
- Proposal for UKLight lambda connection to Chicago and Amsterdam
40 UK e-Science Pilot Projects
- GRIDPP (PPARC)
- ASTROGRID (PPARC)
- Comb-e-Chem (EPSRC)
- DAME (EPSRC)
- DiscoveryNet (EPSRC)
- GEODISE (EPSRC)
- myGrid (EPSRC)
- RealityGrid (EPSRC)
- Climateprediction.com (NERC)
- Oceanographic Grid (NERC)
- Molecular Environmental Grid (NERC)
- NERC DataGrid (OST-CP)
- Biomolecular Grid (BBSRC)
- Proteome Annotation Pipeline (BBSRC)
- High-Throughput Structural Biology (BBSRC)
- Global Biodiversity (BBSRC)
41 e-Science Centres of Excellence
- Birmingham/Warwick: Modelling
- Bristol: Media
- UCL: Networking
- White Rose Grid: Leeds, York, Sheffield
- Lancaster: Social Science
- Leicester: Astronomy
- Reading: Environment
42 UK e-Science Grid
Edinburgh
Glasgow
Newcastle
Belfast
Manchester
DL
Cambridge
Oxford
RL
Hinxton
Cardiff
London
Soton
43 UK e-Science Funding
- First Phase: 2001-2004
  - Application Projects
    - £74M
    - All areas of science and engineering
  - Core Programme
    - £15M + £20M (DTI)
    - Collaborative industrial projects
- Second Phase: 2003-2006
  - Application Projects
    - £96M
    - All areas of science and engineering
  - Core Programme
    - £16M
    - Core Grid Middleware
  - DTI follow-on?
44
- EPSRC Computer Science for e-Science
  - £9M, 18 projects so far
- ESRC National e-Social Science Centre and 3 hubs
  - £6M
- PPARC
- MRC
- BBSRC
45 Core Programme Phase 2
- UK e-Science Grid/Centres and e-Science Institute
- Grid Operation Centre and Network Monitoring
- Core Middleware engineering
- National Data Curation Centre
- e-Science Exemplars/New Opportunities
- Outreach and International involvement
46 Other Activities
- Security Task Force
  - Joint fund key security projects with EPSRC and JCSR, and coordinated effort with NSF NMI and Internet2 projects
  - JCSR £2M call in preparation
- UK Digital Curation Centre
  - £3M, Core e-Science and JCSR
- JCSR
  - £3M per annum
47 SR2004 e-Science Infrastructure
- Persistent UK e-Science Research Grid
- Grid Operations Centre
- UK Open Middleware Infrastructure Institute
- National e-Science Institute
- UK Digital Curation Centre
- AccessGrid Support Service
- e-Science/Grid collaboratories Legal Service
- International Standards Activity
48
49 Today's Grid
- A Single System Image
  - Transparent wide-area access to large data banks
  - Transparent wide-area access to applications on heterogeneous platforms
  - Transparent wide-area access to processing resources
- Security, certification, single sign-on authentication, AAA
  - Grid Security Infrastructure
- Data access, Transfer and Replication
  - GridFTP, Giggle
- Computational resource discovery, allocation and process creation
  - GRAM, Unicore, Condor-G
50 Reality Checks!!
- The Technology is Ready
  - Not true, it's emerging
  - Building middleware, Advancing Standards, Developing Dependability
  - Building demonstrators.
- The computational grid is in advance of the data intensive middleware
  - Integration and curation are probably the obstacles
  - But!! It doesn't have to be all there to be useful.
- We know how we will use grid services
  - No: disruptive technology
  - Lower the barriers of entry.
51 Grid Evolution
- 1st Generation Grid
  - Computationally intensive, file access/transfer
  - Bag of various heterogeneous protocols and toolkits
  - Recognises internet, ignores Web
  - Academic teams
- 2nd Generation Grid
  - Data intensive -> knowledge intensive
  - Services-based architecture
  - Recognises Web and Web services
  - Global Grid Forum
  - Industry participation
We are here!
52 Impacts
- It's all about interoperability, really.
- Web and Grid Services are creating a new marketplace for components
- If you're concerned with systems integration or internet delivery of services, embrace Web Services technologies now. You'll be ready for Grid Services when they're ready for you.
- If you're a developer, get Web Services on your CV
- If you're an IT manager, collect Web Service expertise through hiring or training
- Software license models must adapt
53 I don't want to share! Do I need a grid?
54 In conclusion
- The GRID is not, and will not be, free
  - must pay for resources
- What have we to show for £250M?
55 Acknowledgements
- Carole Goble
- Stephen Pickles
- Paul Jeffreys
- University of Manchester
- Academic collaborators
- Industrial collaborators
- Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC
56 SVE @ Manchester Computing
World Leading Supercomputing Service, Support and Research
Bringing Science and Supercomputers Together
www.man.ac.uk/sve
sve@man.ac.uk