Title: The Challenge of e-Science and the Grid for Universities
1 The Challenge of e-Science and the Grid for Universities
- Tony Hey
- Director of UK e-Science Core Programme
- Tony.Hey_at_epsrc.ac.uk
2 J.C.R. Licklider's Vision
- "Lick had this concept of the intergalactic
network which he believed was everybody could use
computers anywhere and get at data anywhere in
the world. He didn't envision the number of
computers we have today by any means, but he had
the same concept: all of the stuff linked
together throughout the world, that you can use a
remote computer, get data from a remote computer,
or use lots of computers in your job. The vision
was really Lick's originally."
- Larry Roberts, Principal Architect of the ARPANET
3 A Definition of e-Science
- "e-Science is about global collaboration in
key areas of science, and the next generation of
infrastructure that will enable it."
- John Taylor, Director General of Research Councils,
Office of Science and Technology
- Purpose of the e-Science initiative is to allow
scientists to do faster, different, better
research
4 The e-Science Paradigm
- The Integrative Biology Project involves the
University of Oxford (and others) in the UK and
the University of Auckland in New Zealand
- Models of electrical behaviour of heart cells
developed by Denis Noble's team in Oxford
- Mechanical models of beating heart developed in
Auckland
- Need to be able to build a Virtual Organisation
allowing routine access for researchers to
specific resources in the UK and New Zealand
5 e-Infrastructure/Cyberinfrastructure for Research
Diagram: Group A and Group B, each with private resources, sharing generic services built on a common fabric of resources
6 Educational Grids
- Education is a classic distributed organization
requiring integration of data sources with people
and computing
- New multi-disciplinary curricula require
distributed experts interacting with mentors and
students
- Grids will democratize resources, enabling
universal and ubiquitous access
- Learning Management Systems such as WebCT,
Blackboard, Placeware, WebEx and Groove all have
natural Grid implementations
7 Overlapping Heterogeneous Dynamic Grid Islands
Diagram: Enterprise Grid, Campus Grid, Compute Grid, Information Grid and Training Grid connecting teachers and students, with dynamic light-weight peer-to-peer collaboration
8 The Grid as an Enabler for Virtual Organisations
- Ian Foster, Carl Kesselman and Steve Tuecke
- "The Grid is a software infrastructure that
enables flexible, secure, coordinated resource
sharing among dynamic collections of individuals,
institutions and resources"
- includes computational systems and data
storage resources and specialized facilities
- Enabling infrastructure for transient Virtual
Organisations
9 A Definition of e-Research?
- "e-Research is about global collaboration in
key research areas, and the next generation of
infrastructure that will enable it."
- John Taylor, Director General of Research Councils,
Office of Science and Technology
10 Motivations
- Scientific community developed the Web as a
collaboration technology
  - Transformed the modern business world!
- John Taylor brought the HP vision of the
information utility to the scientific context
  - Global infrastructure for scientific R&D
- Scientific community is now developing the Grid
as a collaboration technology
  - Will this be as relevant to Arts and Humanities
as the Web?
11 NSF Atkins Report on Cyberinfrastructure
- "the primary access to the latest findings in a
growing number of fields is through the Web, then
through classic preprints and conferences, and
lastly through refereed archival papers."
- "archives containing hundreds or thousands of
terabytes of data will be affordable and
necessary for archiving scientific and
engineering information."
12 MIT DSpace Vision
- "As more and more research and educational
material is born digital, institutions and
organizations are increasingly realizing the need
for a stable place in which such material may be
stored and accessed long-term. The Massachusetts
Institute of Technology is a perfect example of
an organization with this need. Much of the
material produced by faculty, such as datasets,
experimental results and rich media data as well
as more conventional document-based material
(e.g. articles and reports) is housed on an
individual's hard drive or department Web server.
Such material is often lost forever as faculty
and departments change over time."
13 UK e-Science Funding
- First Phase: 2001-2004
  - Application Projects: £74M, all areas of science and engineering
  - Core Programme: £15M OST
  - £20M DTI collaborative industrial projects
- Second Phase: 2003-2006
  - Application Projects: £96M, all areas of science and engineering
  - Core Programme: £16M OST
  - DTI Technology Fund
14 CERN's Users in the World
- Europe: 267 institutes, 4603 users
- Elsewhere: 208 institutes, 1632 users
15 Powering the Virtual Universe
http://www.astrogrid.ac.uk (Edinburgh, Belfast, Cambridge,
Leicester, London, Manchester, RAL)
Multi-wavelength images showing the jet in M87, from top
to bottom: Chandra X-ray, HST optical, Gemini
mid-IR, VLA radio. AstroGrid will provide
advanced, Grid-based federation and data mining
tools to facilitate better and faster scientific
output.
Picture credits: NASA/Chandra X-ray
Observatory/Herman Marshall (MIT),
NASA/HST/Eric Perlman (UMBC), Gemini
Observatory/OSCIR, VLA/NSF/Eric Perlman
(UMBC)/Fang Zhou, Biretta (STScI)/F Owen (NRA)
Printed 09/11/2009
16 Comb-e-Chem Project
Diagram components: X-Ray e-Lab, Properties e-Lab, diffractometer, video, simulation, analysis, properties and a structures database, linked by Grid middleware
17 DAME Project
Diagram components: in-flight data, global network (e.g. SITA), ground station, airline, DSS Engine Health Center, maintenance centre and data centre, communicating via Internet, e-mail and pager
18 myGrid Project
- Imminent deluge of data
- Highly heterogeneous
- Highly complex and inter-related
- Convergence of data and literature archives
19 Discovery Net Project
Interactive editor and visualisation of nucleotide annotation workflows: download sequence from Reference Server, save to Distributed Annotation Server
- 1800 clicks
- 500 Web accesses
- 200 copy/paste operations
- 3 weeks' work
- replaced by 1 workflow and a few seconds' execution
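The slide contrasts weeks of manual clicking with a single workflow. As a minimal sketch, assuming each stage can be modelled as a function that takes and returns a shared state dictionary, such a pipeline might look like the following. The step functions (`download_sequence`, `annotate`, `save_to_annotation_server`) and the sample sequence are hypothetical stand-ins, not Discovery Net's actual services or API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the steps named on the slide; a real
# workflow engine would call remote Grid/Web services instead.
def download_sequence(state: Dict) -> Dict:
    # Pretend to fetch a nucleotide sequence from a reference server.
    state["sequence"] = "ATGGCCATTGTAATGGGCCGCTGA"
    return state

def annotate(state: Dict) -> Dict:
    # Toy annotation: record the positions of every ATG (start codon).
    seq = state["sequence"]
    state["annotations"] = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    return state

def save_to_annotation_server(state: Dict) -> Dict:
    # Stand-in for a Distributed Annotation Server: keep an in-memory record.
    state.setdefault("saved", []).append(list(state["annotations"]))
    return state

Step = Callable[[Dict], Dict]

def run_workflow(steps: List[Step], state: Dict) -> Dict:
    # Run each step in order, passing the accumulated state along.
    for step in steps:
        state = step(state)
    return state

result = run_workflow([download_sequence, annotate, save_to_annotation_server], {})
print(result["annotations"])
```

The point of the design is that the steps are declared once and re-run automatically, rather than repeated by hand for each sequence.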
20 CLEF: Integrating information
- High quality, integrated clinical information is
key to
  - clinical research
  - evidence-based health care
  - the clinical application of genetic and genomic
research
- Capture, integration, and presentation of
descriptive information is a major barrier to
achieving an integrated framework
- Data includes
  - clinical histories
  - radiology and pathology reports
  - annotations on genomic and image databases
  - technical literature and Web based resources
21 eDiaMoND Project
Mammograms have different appearances, depending
on image settings and acquisition systems.
Diagram panels: Standard Mammo Format, Temporal mammography, Computer Aided Detection, 3D View
22 MIAS-Devices Project
- Continuous monitoring of multiple signals via
wearable devices
- Periodic monitoring using Java phones and blood
glucose measures
Diagram labels: sensor bus, GPS aerial
23 Support for e-Science Projects
- Grid Support Centre
  - Supports Grid middleware users
  - See www.grid-support.ac.uk
- National e-Science Institute
  - Research Seminars and Training Programme
  - See www.nesc.ac.uk
- e-Science Certificate Authority
  - Issues digital certificates for projects
  - Goal is 'single sign-on'
24 Single Sign-On Digital Certificates
Diagram: certificate fields, comprising a public key, a text string (ABCDEFGHIJKLMNOPQRSTUV), validity data, extensions and a signature from the CA's private key
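The certificate diagram rests on one idea: the CA signs the certificate's fields with its private key, and anyone holding the CA's public key can check both the signature and the validity dates. The toy sketch below assumes textbook RSA with two fixed, well-known Mersenne primes and a JSON serialisation of the fields. It is insecure and is not X.509 or the e-Science CA's actual format; the field names (`subject`, `public_key`, `not_before`, `not_after`, `extensions`) are hypothetical.

```python
import hashlib
import json
from datetime import date

# Toy textbook-RSA key pair for the CA (illustration only, NOT secure:
# real CAs use X.509 certificates, random large primes and padded signatures).
P, Q = 2**127 - 1, 2**89 - 1      # well-known Mersenne primes
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest(fields: dict) -> int:
    # Canonical serialisation of the certificate body, hashed and reduced mod N.
    body = json.dumps(fields, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(body).digest(), "big") % N

def ca_sign(fields: dict) -> dict:
    # The CA signs the certificate body with its private key (D, N).
    return {**fields, "signature": pow(digest(fields), D, N)}

def verify(cert: dict, today: date) -> bool:
    # Anyone with the CA's public key (E, N) can check signature and validity.
    fields = {k: v for k, v in cert.items() if k != "signature"}
    sig_ok = pow(cert["signature"], E, N) == digest(fields)
    in_date = (date.fromisoformat(fields["not_before"]) <= today
               <= date.fromisoformat(fields["not_after"]))
    return sig_ok and in_date

cert = ca_sign({"subject": "CN=Example User", "public_key": "ABCDEFGHIJKLMNOPQRSTUV",
                "not_before": "2004-01-01", "not_after": "2004-12-31", "extensions": []})
print(verify(cert, date(2004, 6, 1)))
```

Because every Grid resource trusts the same CA public key, one signed certificate can be presented to many sites, which is what makes single sign-on possible.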
25 UK e-Science Grid
Map of sites: Edinburgh, Glasgow, Newcastle, DL, Belfast, Manchester, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton
26 Access Grid Group Conferencing
- All UK e-Science Centres have AG rooms
- Widely used for technical and management meetings
- Multi-site group-to-group conferencing system
- Continuous audio and video contact with all participants
- Globally deployed
27 e-Science Centres of Excellence
- Birmingham/Warwick: Modelling
- Bristol: Media
- UCL: Networking
- White Rose Grid (Leeds, York, Sheffield)
- Lancaster: Social Science
- Leicester: Astronomy
- Reading: Environment
28 UK e-Science Grid: Second Phase OGSA Grid
Map of sites: Edinburgh, Glasgow, Newcastle, DL, Belfast, Manchester, Cambridge, Oxford, RL, Hinxton, Cardiff, London, Southampton
29 Identifiable UK Focus
- Data Access and Integration
  - OGSA-DAI and DAIT project (£1.5M)
  - Key grid data services
  - Workflow, Provenance, Notification
  - Distributed Query, Knowledge Management
- Data Curation and Data Handling
  - Digital Curation Centre (£3M)
  - Data Handling (£1M)
- Security, AA and all that
  - Short/Medium Term Problems (£3M)
  - Medium/Long Term Issues (£2M)
30 Semantic Web
31 Metadata and Ontologies
- Metadata: computationally accessible data about
the services
- Ontologies: the shared and common understanding
of a domain
  - A vocabulary of terms
  - Definition of what those terms mean
  - A shared understanding for people and machines
  - Usually organised into a taxonomy
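To illustrate how metadata and an ontology can work together, here is a minimal sketch assuming a tiny hand-written taxonomy: each term has a definition and a parent term, and service metadata is expressed in those terms so that a program can answer a question such as "which services are compute resources?". All term and service names are hypothetical; Semantic Web work would normally use standards such as RDF and OWL rather than Python dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical ontology: a vocabulary of terms, each with a definition and
# an optional parent, so the terms form a taxonomy (an is-a hierarchy).
ONTOLOGY: Dict[str, Dict[str, Optional[str]]] = {
    "Resource":        {"parent": None, "definition": "Anything that can be shared on the Grid"},
    "ComputeResource": {"parent": "Resource", "definition": "A resource that runs jobs"},
    "DataResource":    {"parent": "Resource", "definition": "A resource that stores data"},
    "Cluster":         {"parent": "ComputeResource", "definition": "A set of networked compute nodes"},
    "Database":        {"parent": "DataResource", "definition": "A structured, queryable data store"},
}

def is_a(term: str, ancestor: str) -> bool:
    # Walk up the taxonomy from `term`, checking whether we reach `ancestor`.
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY[current]["parent"]
    return False

# Hypothetical machine-readable metadata about services, using ontology terms.
SERVICE_METADATA = [
    {"name": "svc-a", "type": "Cluster"},
    {"name": "svc-b", "type": "Database"},
]

def find_services(category: str) -> List[str]:
    # Return services whose declared type falls under `category`.
    return [s["name"] for s in SERVICE_METADATA if is_a(s["type"], category)]

print(find_services("ComputeResource"))
```

The shared vocabulary is what lets the query match `svc-a`: its metadata only says "Cluster", but the taxonomy records that a Cluster is a ComputeResource.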
32 The Semantic Grid
Diagram: Classical Web, Classical Grid, Semantic Web and the Semantic Grid positioned along axes of data complexity and computational complexity
33 The UK e-Science Experience: Phase 1
- All Research Council e-Science funds committed
- e-Science pilots launched covering many areas of
science, engineering and medicine
- UK e-Science Core Programme
  - DTI £20M for collaborative industrial R&D
  - About 80 UK companies participating
  - Over £30M industrial contributions
  - Engineering, Pharmaceutical, Petrochemical
  - IT companies, Commerce, Media
34 UK e-Science Next Steps
- Deploy production National Grid Service
based on four dedicated compute and data nodes
plus the two UK supercomputers
  - Develop operational policies, security, ...
  - Gain experience with genuine users
- Develop OGSA-based e-Science Grid
  - Based on two OGSA Grid projects and e-Science Centres
- Work with EU EGEE project and NSF
Cyberinfrastructure Program
35 Open Grid Services Architecture
- Development of Web Services
- OGSA/WSRF will provide
  - Naming / Authorization / Security / Privacy / ...
- Projects should look at higher level services
  - Workflow, Transactions, Data Mining, Knowledge
Discovery
- Exploit synergy: commercial Internet
with Grid Services
36 The Key Problem: Research Prototype Middleware to
Production Quality
- Research projects are not funded to do the
regression testing, configuration and QA required
to produce production-quality middleware
- Common rule of thumb is that it requires at least
10 times more effort to take proof-of-concept
research software to production quality
- Key issue for UK e-Science projects is to ensure
that there is some documented, maintainable,
robust grid middleware by the end of the 5-year
£250M initiative
37 The UK Open Middleware Infrastructure Institute
(OMII)
- Repository for UK-developed Open Source
e-Science/Cyberinfrastructure Middleware
  - Documentation, specification, QA and standards
- Fund work to bring research project software up
to production strength
- Fund middleware projects for identified gaps
- Work with US NSF, EU Projects and others
- Supported by major IT companies
- Southampton selected as the OMII site
38 2.4 Petabytes Today
39 Digital Curation Centre (DCC)
- In the next 5 years e-Science projects will produce
more scientific data than has been collected in
the whole of human history
- In 20 years we can guarantee that the operating system,
the spreadsheet program and the hardware used to
store data will not exist
- Research curation technologies and best practice
- Need to liaise closely with individual research
communities, data archives and libraries
- Edinburgh, with Glasgow, CLRC and UKOLN, selected
as site of DCC
40 The UK Dual Support System
- Provides two streams of public funding for
university research
  - Funding provided by the HEFCs for research
infrastructure: salaries of permanent academic
staff, premises, libraries and central computing
costs
  - Funding from the Research Councils for specific
projects in response to proposals submitted and
approved through peer review
- Well Founded Laboratory concept
41 SuperJANET4
42 JISC Committee for Support of Research (JCSR)
- Established in 2002 after the Follett Review
- Remit is to ensure JISC retains focus on the research
community
- Budget of £3M p.a.
- Seeking research support requirements from
Research Councils
- Funded analysis of research data curation
requirements
- Funded scoping study on legal, IPR and provenance
issues for e-Science collaboratories
43 Initial JCSR Portfolio
- Grid Middleware Testbed with Compute and Data
Clusters with CLRC
- AAA Initiative with JCIE
- Autonomic Computing/Semantic Grid initiative with
EPSRC
- Access Grid Support Service
- e-Social Science Training material with ESRC
- Intelligent Text Mining Service for Biosciences
with BBSRC
- Digital Curation Centre with e-Science Core
Programme
44 UK e-Science Timeframes
Timeline chart, 2001-2007: SR2000, SR2002, SR2004, SJ5/AAA Service, LHC/LCG
45 e-Science Infrastructure beyond 2006
- Persistent UK e-Science Research Grid
- Grid Operations Centre
- Open Middleware Infrastructure Institute
- National e-Science Institute
- Digital Curation Centre
- AccessGrid Support Service
- e-Science/Grid Legal Service
- International Standards Activity
46 e-Science and the University
- "e-Science will change the dynamic of the way
science is undertaken." - John Taylor, 2001
- Need to break down the barriers between the
Victorian bastions of science: biology,
chemistry, physics, ...
- Develop permeable structures that promote
rather than hinder multidisciplinary
collaboration
- Key role for Computer Science Departments
  - e.g. Cornell model
- Need to engage University IT Service Departments
  - Computing, Library, ...
47 e-Government and the Grid
- "The Grid intends to make access to computing
power, scientific data repositories and
experimental facilities as easy as the Web makes
access to information." - Tony Blair, 2002