Title: The impact of grid computing on UK research
1 The impact of grid computing on UK research
- R Perrott
- Queen's University
- Belfast
2 The Grid: The Web on Steroids
- Grid: flexible, high-performance access to all significant resources
- On-demand creation of powerful virtual computing systems
3 Why Now?
- The Internet as infrastructure
- Increasing bandwidth, advanced services
- Advances in storage capacity
- A terabyte for < £15,000
- Increased availability of compute resources
- Clusters, supercomputers, etc.
- Advances in application concepts
- Simulation-based design, advanced scientific instruments, collaborative engineering, ...
4 Grids
- Computational grid: provides the raw computing power, high-speed bandwidth interconnection and associated data storage
- Information grid: allows easily accessible connections to major sources of information and tools for its analysis and visualisation
- Knowledge grid: gives added value to the information and also provides intelligent guidance for decision-makers
5 Grid Architecture: Data to Knowledge
(Diagram: Knowledge Grid layered over Information Grid, over Computation/Data Grid, over Communications and Control)
6 Application Users, Software Suppliers
7 UK Research Councils: approx. funding for 2000/01 (£M)
- Biotechnology and Biological Sciences Research Council (BBSRC): 200
- Engineering and Physical Sciences Research Council (EPSRC): 400
- Economic and Social Research Council (ESRC): 70
- Medical Research Council (MRC): 350
- Natural Environment Research Council (NERC): 225
- Particle Physics and Astronomy Research Council (PPARC): 200
- Council for the Central Laboratory of the Research Councils (CCLRC): 100
9 UK Grid Development Plan
- Network of Grid Core Programme e-Science Centres
- Development of Generic Grid Middleware
- Grid Grand Challenge Project
- Support for e-Science Projects
- International Involvement
- Grid Network Team
10 1. Grid Core Programme Centres
- National e-Science Centre to achieve international visibility
- National Centre will host international e-Science seminars, similar to the Newton Institute
- Funding 8 Regional e-Science Centres to form a coherent UK Grid
- DTI funding requires matching industrial involvement
- Good overlap with Particle Physics and AstroGrid Centres
11 (Map of e-Science Centres: Edinburgh, Glasgow, Newcastle, DL, Belfast, Manchester, Cambridge, Oxford, RL, Hinxton, Cardiff, London, Soton)
12 Centres will be Access Grid Nodes
Access Grid
- Access Grid will enable informal and formal group-to-group collaboration
- It enables:
- Distributed lectures and seminars
- Virtual meetings
- Complex distributed grid demos
- Will improve the user experience (sense of presence): natural interactions (natural audio, big display)
13 2. Generic Grid Middleware
- Continuing dialogue with major industrial players: IBM, Microsoft, Oracle, Sun, HP, ...
- IBM press announcement, August 2001
- Open Call for Proposals from July 2001, plus Centre industrial projects
- Funding Computer Science involvement in EU DataGrid middleware work packages
14 3. Grid Interdisciplinary Research Centres Project
- 4 IT-centric IRCs funded:
- DIRC (Dependability)
- EQUATOR (HCI)
- AKT (Knowledge Management)
- Medical Informatics
- Grand Challenge in Medical/Healthcare Informatics
- Issues of security, privacy and trust
15 4. Support for e-Science Projects
- Grid Starter Kit Version 1.0 available for distribution from July 2001
- Set up Grid Support Centre
- Training courses
- National e-Science Centre Research Seminar Programme
16 5. International Involvement
- GridNet at the National Centre for UK participation in the Global Grid Forum
- Funding CERN and iVDGL Grid Fellowships
- Participation/leadership in EU Grid activities
- New FP5 Grid projects (DataTag, GRIP, ...)
- Establishing links with major US centres: San Diego Supercomputer Center, NCSA
17 6. Grid Network Team
- Tasked with ensuring adequate end-to-end bandwidth for e-Science projects
- Identify/fix network bottlenecks
- Identify network requirements of e-Science projects
- Funding traffic engineering project
- Upgrade SuperJANET4 connection to sites
18 Network Issues
- Upgrading SJ4 backbone from 2.5 Gbps to 10 Gbps
- Installing 2.5 Gbps link to the GEANT pan-European network
- Transatlantic bandwidth procurement
- 2.5 Gbps dedicated fibre
- Connections to Abilene and ESNet
- EU DataTAG project: 2.5 Gbps link from CERN to Chicago
19 Early e-Science Demonstrators
- Funded
- Dynamic Brain Atlas
- Biodiversity
- Chemical Structures
- Under Development/Consideration
- Grid-Microscopy
- Robotic Astronomy
- Collaborative Visualisation
- Mouse Genes
- 3D Engineering Prototypes
- Medical Imaging/VR
20 Particle Physics and Astronomy Research Council (PPARC)
- GridPP (http://www.gridpp.ac.uk/)
- To develop the Grid technologies required to meet the LHC computing challenge
- Collaboration with international grid developments in Europe and the US
21 Particle Physics and Astronomy Research Council (PPARC)
- ASTROGRID (http://www.astrogrid.ac.uk/)
- A £4M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory
22 EPSRC Testbeds (1)
- DAME: Distributed Aircraft Maintenance Environment
- RealityGrid: closely couple high-performance computing, high-throughput experiment and visualization
- GEODISE: Grid Enabled Optimisation and Design Search for Engineering
23 EPSRC Testbeds (2)
- CombiChem: combinatorial chemistry structure-property mapping
- MyGrid: personalised extensible environments for data-intensive experiments in biology
- Discovery Net: high throughput sensing
24 Distributed Aircraft Maintenance Environment
- Jim Austin, University of York
- Peter Dew, Leeds
- Graham Hesketh, Rolls-Royce
25 (Diagram: in-flight data flows over a global network to a ground station and data centre, feeding the airline, a DSS engine health centre and the maintenance centre via Internet, e-mail and pager)
26 Aims
- To build a generic grid test bed for distributed diagnostics on a global scale
- To demonstrate this on distributed aircraft maintenance
- To evaluate the effectiveness of the grid for this task
- To deliver grid-enabled technologies that underpin the application
- To investigate performance issues
27 Computational Infrastructure
(Diagram: White Rose Computational Grid (SAN) running across YHMAN and SuperJanet. Leeds local grid: 3D interactive graphics conferencing, lab machines, Onyx 3, Teradata, shared-memory cluster. York: shared memory. Sheffield: distributed memory.)
28 MyGrid
- Personalised, extensible environments for data-intensive experiments in biology
- Professor Carole Goble, University of Manchester
- Dr Alan Robinson, EBI
29 Consortium
- Scientific Team
- Biologists
- GSK, AZ, Merck KGaA, Manchester, EBI
- Technical Team
- Manchester, Southampton, Newcastle, Sheffield, EBI, Nottingham
- IBM, SUN
- GeneticXchange
- Network Inference, Epistemics Ltd
30 Comparative Functional Genomics
- Vast amounts of data, escalating
- Highly heterogeneous
- Data types
- Data forms
- Community
- Highly complex and inter-related
- Volatile
31 MyGrid e-Science Objectives
- Revolutionise scientific practice in biology
- Straightforward discovery, interoperation, sharing
- Improving quality of both experiments and data
- Individual creativity and collaborative working
- Enabling genomic-level bioinformatics
- From cottage industry to an industrial scale
32 On the shoulders of giants
- We are not starting from scratch
- Globus Starter Kit
- Web Service initiatives
- Our own environments
- Integration platforms for bioinformatics
- Standards e.g. OMG LSR, I3C
- Experience with Open Source
33 Specific Outcomes
- E-Scientists
- Environment built on toolkits for service access, personalisation and community
- Gene function expression analysis
- Annotation workbench for the PRINTS pattern database
- Developers
- MyGrid-in-a-Box developers kit
- Re-purposing existing integration platforms
34 Discovery Net
- Yike Guo, John Darlington (Dept. of Computing)
- John Hassard (Depts. of Physics and Bioengineering)
- Bob Spence (Dept. of Electrical Engineering)
- Tony Cass (Department of Biochemistry)
- Sevket Durucan (T. H. Huxley School of Environment)
- Imperial College London
35 AIM
- To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.
36 The Consortium
- Industry connection: 4 spin-off companies and related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, ...)
37 Industrial Contribution
- Hardware: sensors (photodiode arrays), systems (optics, mechanical systems, DSPs, FPGAs)
- Software: analysis packages, algorithms, data warehousing and mining systems
- Intellectual Property: access to IP portfolio suite at no cost
- Data: raw and processed data from biotechnology, pharmacogenomics and remote sensing (GUSTO installations, satellite data from geo-hazard programmes) and renewable energy data (from remote tidal power systems)
38 High Throughput Sensing Characteristics
- Different devices, but the same computational characteristics
- Data intensive
- Data dispersive: large-scale, heterogeneous, distributed data
- Real-time data manipulation: need to calibrate, integrate and analyse (see the sketch below)
- Discovery issues, information issues and data issues are all GRID issues
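To make the calibrate/integrate/analyse requirement concrete, here is a minimal Python sketch of that pattern. The data model, calibration constants and alert threshold are illustrative assumptions for a single sensor batch, not part of any Discovery Net API.

```python
# Illustrative sketch only: a minimal calibrate -> integrate -> analyse step
# for one batch of sensor readings. All names and constants are hypothetical.

from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Reading:
    sensor_id: str      # which high-throughput device produced the value
    raw_value: float    # uncalibrated measurement


def calibrate(reading: Reading, offset: float = 0.1, gain: float = 1.05) -> float:
    """Apply a simple linear correction (assumed calibration model)."""
    return (reading.raw_value - offset) * gain


def integrate(values: List[float]) -> float:
    """Combine calibrated values from distributed sources into one summary figure."""
    return sum(values) / len(values) if values else 0.0


def analyse(summary: float, threshold: float = 5.0) -> str:
    """A toy real-time decision step: flag the integrated value against a threshold."""
    return "alert" if summary > threshold else "normal"


def process_batch(readings: Iterable[Reading]) -> str:
    calibrated = [calibrate(r) for r in readings]
    return analyse(integrate(calibrated))


if __name__ == "__main__":
    batch = [Reading("gusto-01", 4.2), Reading("gusto-02", 6.8)]
    print(process_batch(batch))   # prints "normal" or "alert"
```

In a grid setting the same three steps would run close to the data sources, with only the integrated summaries shipped across the network.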
39 Testbed Applications
Requirements per application class: throughput (GB/s), size (petabytes), node number, operations
- HTS applications: large-scale dynamic real-time decision support; large-scale dynamic system knowledge discovery
- 1-10 GB/s, 1-10 PB, >20,000 nodes: structuring, mining, optimisation, RT decisions
- Bio chip applications: protein-folding chips, SNP chips, diff. gene chips using LFII; protein-based fluorescent micro arrays
- Renewable energy applications: tidal energy; connections to other renewable initiatives (solar, biomass, fuel cells), to CHP and baseload stations
- Remote sensing applications: air sensing, GUSTO; geological, geohazard analysis
- 1-100 GB/s, 10-100 PB, >50,000 nodes: image registration, visualisation, predictive modelling, RT decisions
- 1-1000 GB/s, 10-1000 PB, >10,000 nodes: data quality, visualisation, structuring, clustering, distributed dynamic knowledge management
40 Large-scale urban air sensing applications
Each GUSTO air pollution system produces 1 kbit per second, or 10^10 bits per year. We expect to increase the number (from the present 2 systems) to over 20,000 over the next 3 years, to reach a total of 0.6 petabytes of data within the 3-year ramp-up.
The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
(Figure: NO simulant measurement, 6.7.2001)
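As a rough check of the data-volume arithmetic above, the sketch below reproduces the per-system and programme-wide figures. The simplifying assumptions are mine: a constant 1 kbit/s per system, the slide's rounded 10^10 bits/year per-system figure, and all 20,000 systems counted for the full three years rather than a detailed ramp-up profile.

```python
# Back-of-envelope reproduction of the GUSTO data-volume estimate.
# Assumptions (not from the slide): constant data rate, no ramp-up profile.

SECONDS_PER_YEAR = 365 * 24 * 3600

bits_per_second = 1_000                              # "1 kbit per second" per GUSTO system
bits_per_year = bits_per_second * SECONDS_PER_YEAR   # ~3.2e10 bits, i.e. order 10^10 as quoted
print(f"per system: {bits_per_year:.1e} bits/year")

systems = 20_000   # target number of systems after the ramp-up
years = 3          # ramp-up period
total_bits = systems * 1e10 * years                  # using the slide's rounded per-system figure
print(f"programme total: {total_bits:.1e} bits")     # ~6e14 bits, i.e. roughly 0.6e15
```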
41 The IC Advantage
The IC infrastructure: a microgrid for the testbed
- Over 12,000 end devices
- 10 Mb/s to 1 Gb/s to end devices
- 1 Gb/s between floors
- 10 Gb/s to backbone
- 10 Gb/s between the backbone router matrix, and wireless capability
ICPC resource
- 150 Gflops processing
- >100 GB memory
- 5 TB of disk storage
£3m SRIF funding
- Network upgrade
- 20 TB of disk storage
- 2x1 Gb/s to LMAN II (10 Gb/s scheduled 2004)
- 25 TB of tape storage
- 3 clusters (>1 teraflops)
42 Conclusions
- Good buy-in from scientists and engineers
- Considerable industrial interest
- Reasonable buy-in from a good fraction of the Computer Science community, but not all
- Serious interest in Grids from IBM, HP, Oracle and Sun
- On paper, the UK now has the most visible and focussed e-Science/Grid programme in Europe
- Now have to deliver!
43 US Grid Projects/Proposals
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF Distributed Terascale Facility
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- NSF BIRN
- NSF iVDGL
44 EU Grid Projects
- DataGrid (CERN, ..)
- EuroGrid (Unicore)
- DataTag (TTT)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)