Title: Session 2: Overview of e-Science and Distributed Systems
1Session 2 Overview of e-Science and
Distributed Systems
7 July 2008
2Overview
- e-Science & Computational thinking
- A turning point in the history of science
- Modern challenges
- Combined approaches
- Grids in context
- What can a grid do?
- What can't it do?
- Principles
- Scenarios
3New modes in Research, Thought and Collaboration
4Vision
- We are undergoing a transition in
- the power of affordable computing
- the wealth of accessible data and
- the capacity of digital communication
- e-Science provides leadership in
interdisciplinary collaboration
- By combining these we will provide unprecedented
ability to address pressing research challenges
5Definition of e-Science
Computing has become a fundamental tool in all
research disciplines, which often proceed by
assembling and managing large data collections
and exploiting computer models and simulations
(a topic called e-Science). Phil Wadler, 2008
e-Science is the invention and application of
computer-enabled methods to achieve new, better,
faster or more efficient research in any
discipline. It draws on advances in mathematical
sciences, informatics, computation and digital
communications. As such it has been an important
tool for researchers for many decades. The data
deluge and the scale and complexity of today's
research challenges have greatly increased its
importance for researchers. As a consequence, in
2001 the UK led the world by initiating a
coordinated e-Science research programme to
stimulate the development of e-Science across all
fields of research.
6Strengths of e-Science
Communities and e-Infrastructure supporting
research and innovation
7Computational thinking
- Transforming the way we think
- Incremental refinement
- Solution by composition
- Layers of abstractions
- Process models
- Notations
- Recursive thinking
- Simulation, Randomisation
- Enabled by ubiquitous computers
- Analogue of the printing press
Jeannette Wing, Computational Thinking,
Communications of the ACM, March 2006, Vol 49,
No. 3, p33-35
8WWW acting
- The Long Tail
- Data is the Next Intel Inside
- Users Add Value
- Network Effects by Default
- Some Rights Reserved
- The Perpetual Beta
- Cooperate, Don't Control
- Software Above the Level of a Single Device
- Transforming the way we act
- Data is key ingredient
- Community action
- Global collaboration
- Community thinking
- Minimal (?) control
- Minimal reserved rights
- Composition via wikis
- Mash ups
- Enabled by ubiquitous digital communication
- Analogue of the radio
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
9Is e-Science making a difference?
10Tremendous global challenges
11Scale, Urgency, Complexity,
12Achieving the CI Vision requires synergy between
3 types of Foundation-wide activities
Transformative Application - to enhance discovery &
learning
Provisioning - Creation, deployment and operation
of advanced CI
R&D to enhance technical and social dimensions of
future CI systems
Cyberinfrastructure Vision for 21st Century
Discovery, NSF Cyberinfrastructure Council, March
2007
13The Information Explosion
[Figure: the information explosion. Labelled values: 161EB (2006, by IDC), 988EB (2010), 1ZB. Other labels, including one beginning "GRID/" and "ITS", are not recoverable.]
Slide Satoshi Matsuoka
14The 21st Century
This is the century of information
Prime Minister Gordon Brown, University of
Westminster, 25 October 2007
Thanks for images to Mark Birkin (MoSeS
Genysis projects) and Michael Batty (GeoVue
project)
15Historical perspective
16Timeline
Foundations for Collaborative Behaviour
Today
Wellbeing the global-scale killer app., Sir
Robin Saxby Oct. 2006
17Healthcare @ Home
[Diagram: referrals between the people involved in a patient's case. Other labels: ILLNESS, REFERRAL, CASE, VARIABLES ACCESS MATRIX]
- Patient: Home-mobile-clinic via TV-PDA-laptop-PC-Paper
- GP: Home-mobile-clinic via PDA-laptop-PC-Paper
- Diabetician: Home-mobile-clinic via PDA-laptop-PC-Paper
- Diabetes Specialist / Other Specialist Nurses: Home-mobile-clinic via TV-PDA-laptop-PC-Paper
- Various Clinical Specialists (Distributed) e.g. Ophthalmologist, Podiatrist, Vascular Surgeons, Renal Specialists, Wound clinic, Foot care clinic, Neurologists, Cardiologists
- Dietitian
- Biochemist
- Community Nurses / Health Visitors
Slide from Alex Hardisty
18Distributed Systems History
[Timeline figure: 1960 to 2000 in decades, starting with ARPANET]
19Distributed Systems to Grids
[Timeline figure: 1960 to 2000 in decades]
20e-Infrastructure
- A shared resource
- That enables science, research, engineering,
medicine, industry, ...
- It will improve UK / European / ... productivity
- Lisbon Accord 2000
- e-Science Vision SR2000 John Taylor
- Commitment by UK government
- Sections 2.23-2.25
- Always there & multi-purpose
- c.f. telephones, transport, power
- OSI report
www.nesc.ac.uk/documents/OSI/index.html
21A Grid Computing Timeline
[Timeline figure; event labels as extracted, not necessarily in date order:]
- US Grid Forum forms at SC 98
- Grid Forums merge, form GGF
- European & AP Grid Forums
- I-WAY, SuperComputing 95
- OGSA-WG formed
- Physiology paper
- Anatomy paper
- GGF & EGA form OGF
- OGSA v1.0
Source: Hiro Kishimoto, GGF17 Keynote, May 2006
22What is a Grid?
- A grid is a system consisting of
- Distributed but connected resources and
- Software and/or hardware that provides and
manages logically seamless access to those
resources to meet desired objectives
Handheld
Supercomputer
Server
Data Center
Cluster
Workstation
Source: Hiro Kishimoto, GGF17 Keynote, May 2006
23Grid Related Paradigms
- Cluster
- Tightly coupled
- Homogeneous
- Cooperative working
- Distributed Computing
- Loosely coupled
- Heterogeneous
- Single Administration
- Grid Computing
- Large scale
- Cross-organizational
- Geographical distribution
- Distributed Management
Source: Hiro Kishimoto, GGF17 Keynote, May 2006
24Views of Grids
25Grids integrating providing homogeneity
- Grids are (potentially) Generic & Industry
Supported
- Grids combine many heterogeneous distributed
resources
- Data & Information
- Computation & software
- Instruments, sensors & actuators
- Research processes & procedures
- System operations processes & procedures
- Grids restrict choices
- Harder for provider to make localised decisions
- Deployment can be challenging
- Grids provide virtual homogeneity through
virtualisation
- Should be easier to compose services
- More opportunity to amortise costs
- A component of e-Infrastructure
Deliberately choosing consistent interfaces,
protocols & management controls across a set of
compatible services. Giving up some freedom to
differ.
26Grids as a Foundation for Solutions
- The grid per se doesn't provide
- Supported e-Science methods
- Supported data & information resources
- Computations
- Convenient access
- Collaborative behaviour
- Grids help organisations provide
- International & national secure e-Infrastructure
- Standards for interoperation
- Standard APIs to promote re-use
- But Research Support must be built
- What is needed?
- Who should build it?
27Grids as a Foundation for Solutions
Much to be done by developers of applications &
services and by resource providers
28Grids as a Foundation for Solutions
Much to be done by developers of applications &
services and by resource providers
- Must support many categories of user
- Application & Service developers
- Tool builders
- Deployers & Operations teams
- Gateway developers
- App, tool & gateway users
- The grid per se doesn't provide
- Supported e-Science methods
- Supported data & information resources
- Computations
- Convenient access
- Grids help providers of these
- International & national secure e-Infrastructure
- Standards for interoperation
- Standard APIs to promote re-use
- But Research Support must be built
- What is needed?
- Who should do it?
29Motives for Grids
30Why use / build Grids?
- Research Arguments
- Enables new ways of working
- New distributed collaborative research
- Unprecedented scale and resources
- Economic & Ecological Arguments
- Reduced system management costs
- Shared resources → better utilisation
- Pooled resources → increased capacity
- Greener / less power consumption →
environmentally acceptable computing
- Load sharing & utility computing
- Cheaper disaster recovery
31Why use / build Grids?
- Computer Science Arguments
- New attempt at an old hard problem
- Frustrating ignorance about existing results
- New scale, new dynamics, new scope
- Engineering Arguments
- Enable autonomous organisations to
- Write complementary software components
- Set up, run & use complementary services
- Share operational responsibility
- General & consistent environment for Abstraction,
Automation, Optimisation & Tools
- Generally available code mobility
32Why use / build Grids?
- Political & Management Arguments
- Stimulate innovation
- Promote intra-organisation collaboration
- Promote inter-enterprise collaboration
33Collaboration is key
34Biomedical Research Informatics Delivered by Grid
Enabled Services
Portal
http://www.brc.dcs.gla.ac.uk/projects/bridges/
Slide by Richard Sinnott
35eDiaMoND Screening for Breast Cancer
1 Trust → Many Trusts: collaborative working, audit
capability, epidemiology
- Other Modalities
- MRI
- PET
- Ultrasound
Better access to case information and digital
tools
Supplement mentoring with access to digital
training cases and sharing of information
across clinics
Provided by eDiaMoND project, Prof. Sir Mike
Brady et al.
36climateprediction.net and GENIE
- Largest climate model ensemble
- >45,000 users, >1,000,000 model years
Response of Atlantic circulation to freshwater
forcing
[Figure labels: 10K, 2K]
37Integrative Biology
- Tackling two Grand Challenge research questions
- What causes heart disease?
- How does a cancer form and grow?
- Together these diseases cause 61% of all UK
deaths
Will build a powerful, fault-tolerant Grid
infrastructure for biomedical science, enabling
biomedical researchers to use distributed
resources such as high-performance computers,
databases and visualisation tools to develop
complex models of how these killer diseases
develop.
Slide: David Gavaghan & IB team, Oxford
38Foundations of Collaboration
- Strong commitment by individuals
- To work together
- To take on communication challenges
- Mutual respect & mutual trust
- Strong leadership
- Distributed technology
- To support information interchange
- To support resource sharing
- To support data integration
- To support trust building
- Sufficient time
- Common goals
- Complementary knowledge, skills & data
Can we predict when it will work? Can we
find remedies when it doesn't?
39Grid Collaboration Questions
- Without collaboration little is achievable
- Must collaboration precede successful grid
applications?
- Or will persistently and pervasively available
grids stimulate collaborations?
- If we deliver support for collaborative teams,
will we also support the individual researcher?
- Can we use grids to democratise computation?
- Broadening access
- Open science
40CARMEN - Scales of Integration
Understanding the brain may be the greatest
informatics challenge of the 21st century
See talk by Paul Watson at Google Scalability
conf., Seattle, June 2008:
www.youtube.com/watch?v=2m4EvnlgL8Q
Slide from Colin Ingram & Paul Watson
41CARMEN Consortium
Leadership e-Infrastructure
Colin Ingram
Paul Watson
Leslie Smith
Jim Austin
Slide from Colin Ingram & Paul Watson
42CARMEN Consortium
International Partners
Slide from Colin Ingram & Paul Watson
43CARMEN Consortium
Commercial Partners
- applications in the pharmaceutical sector
- interfacing of data acquisition software
- application of infrastructure
- commercialisation of tools
Slide from Colin Ingram & Paul Watson
44Summary
45Grids in context
- Technology is transforming research
- Computer power, network speed, data bonanza,
pervasive devices
- Social and commercial impact of web-based
computing
- Part of a long-term drive for distributed
computing
- A new and ambitious form
- Search for trade-offs & multiple uses
- Leads to many varieties
- Multiple stakeholders
- Many good reasons for building & using grids
- Questions
- Will we have many grids?
- A consistent general purpose foundation grid?
- What are the minimum standards across the grids?
- Collaboration is a key driver & enabler
46Minimum Grid Functionalities
- Supports distributed computation
- Data and computation
- Over a variety of
- Hardware components (servers, data stores, ...)
- Software components (services: resource managers,
computation and data services)
- With regularity that can be exploited
- By applications
- By other middleware tools
- By providers and operations
- It will normally have security mechanisms
- To develop and sustain trust regimes
Users want uniform and consistent access to
computing and data: desktop, cloud, cluster,
institutional and regional grids, national and
international facilities
47Distributed Systems: Introduction, Principles &
Foundations
48Principles of Distributed Computing
- Issues you can't avoid
- Lack of Complete Knowledge (LoCK)
- Latency
- Heterogeneity
- Autonomy
- Unreliability
- Change
- A Challenging goal
- balance technical feasibility
- against virtual homogeneity, stability and
reliability
- Balance between usability and productivity
- Affordable
- Wide user base to amortise costs
- Manageable and maintainable
This is NOT easy
49Lack of Complete Knowledge
- Technical origins of LoCK
- Dynamics of systems involve very large state
spaces
- Can't track or explore all the states
- Latency prevents up-to-date knowledge being
available
- By the time a notification of a state change
arrives the state may have changed again
- Failures inhibit information propagation
- Unanticipated failure modes
- If you ask a remote system
- By the time the answer arrives it may be wrong
Never assume you know the state of a remote
system
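One common way to act on this advice is to make every update conditional on the version of the state that was observed. The sketch below is illustrative only: `RemoteCounter`, `Observation` and `write_if_unchanged` are hypothetical names, and the "remote" service is simulated in-process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What a client learned about the remote state, and at which version."""
    value: int
    version: int


class RemoteCounter:
    """Hypothetical stand-in for a remote service holding one integer."""

    def __init__(self) -> None:
        self._value = 0
        self._version = 0

    def read(self) -> Observation:
        return Observation(self._value, self._version)

    def write_if_unchanged(self, new_value: int, seen: Observation) -> bool:
        # Reject the update if the state moved on after `seen` was taken.
        if seen.version != self._version:
            return False
        self._value = new_value
        self._version += 1
        return True


service = RemoteCounter()
alice_view = service.read()
bob_view = service.read()

alice_ok = service.write_if_unchanged(alice_view.value + 1, alice_view)
# Bob's observation is now out of date, although Bob cannot know that locally.
bob_ok = service.write_if_unchanged(bob_view.value + 10, bob_view)

# Bob recovers by observing again and retrying.
fresh_view = service.read()
bob_retry_ok = service.write_if_unchanged(fresh_view.value + 10, fresh_view)
final_value = service.read().value
```

The client never trusts its old observation; the service decides whether the observation was still current when the request arrived.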
50Lack of Complete Knowledge 2
- Human origins of LoCK
- lack of understanding
- Incomplete & simplified models
- Intractable models
- Poor & incomplete descriptions
- Erroneous descriptions
- Socio-Economic effects generate LoCK
- Autonomous owners do not reveal all
- About services, resources and performance
- Intermediaries aggregate & simplify
- Present services they want to sell or you to use
favourably
51LoCK Counter Strategies
- Improve the quality of the available knowledge
- Better static information
- Better information collection & dissemination
- Improve quality of Distributed System Models
- Prove invariants that algorithms can exploit
- Test axioms with real systems
- Build algorithms that behave reasonably well
- When they have incomplete knowledge
52Latency
- It is always going to be there
- Consequence of signal transmission times
- Consequence of messages / packets in queues
- Consequence of message processing time
- Errors cause retries → multiplied delays
- It gets worse
- Geographic scale increases latency
- System complexity increases number of queues
- Scale & complexity increase processing time
- Think about
- How many operations a system can do while a
message it sent reaches its destination, a reply
is formed and the reply travels back
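The "think about" exercise can be made concrete with a back-of-envelope calculation. The figures are assumptions chosen for illustration (a 100 ms wide-area round trip and a processor doing 10^9 simple operations per second); they are not taken from the slides.

```python
def operations_per_round_trip(round_trip_seconds: float, ops_per_second: float) -> float:
    """Operations a processor could complete while waiting for one reply."""
    return round_trip_seconds * ops_per_second


# Assumed, illustrative figures: 100 ms round trip, 1e9 operations per second.
idle_operations = operations_per_round_trip(0.1, 1e9)
print(f"about {idle_operations:.0e} operations per round trip")
```

Under these assumptions a single request and reply costs roughly a hundred million operations of waiting.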
53Latency Counter Strategies
- Design algorithms that require fewer round trips
- This is THE complexity measure!
- Batch requests and responses
- Shorten distance to get information
- Caching, pre-fetching & replication
- But may be stale data!
- Move data to computation
- But be smart about which data & when
- Move computation to data
- Succinct computation & volumes of data
- But safety and privacy issues arise
Communication is very expensive
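Two of these strategies, batching and caching, can be sketched with a simulated remote store that counts round trips instead of sleeping. `RemoteStore` and `TtlCache` are hypothetical names invented for this illustration, and the time-to-live policy is just one possible way to bound staleness.

```python
import time


class RemoteStore:
    """Stand-in for a remote service; counts round trips instead of sleeping."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data[key]

    def get_many(self, keys):
        # One request and one response carry the whole batch.
        self.round_trips += 1
        return {key: self._data[key] for key in keys}


class TtlCache:
    """Cache in front of a RemoteStore; entries older than the TTL are refetched."""

    def __init__(self, store, ttl_seconds, clock=time.monotonic):
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time fetched)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # may be stale relative to the remote copy
        value = self._store.get(key)
        self._entries[key] = (value, now)
        return value


store = RemoteStore({"a": 1, "b": 2, "c": 3})

one_by_one = [store.get(key) for key in "abc"]
trips_unbatched = store.round_trips

batched = store.get_many(["a", "b", "c"])
trips_batched = store.round_trips - trips_unbatched

cache = TtlCache(store, ttl_seconds=60.0)
before = store.round_trips
cache.get("a")
cache.get("a")  # served locally, no round trip
trips_cached = store.round_trips - before
```

Counting round trips rather than bytes reflects the slide's point that the number of round trips is the complexity measure that matters.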
54Heterogeneity
Some of the variation is wanted and exploited
- Hardware variation
- Different computer architectures
- Big endians v little endians
- Number representation
- Address length
- Performance
- Different Storage systems
- Architectures
- Technologies
- Available operations
- Different Instrument systems
- Accepting different control inputs
- Generating different output data streams
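The big-endian versus little-endian item is easy to demonstrate with Python's standard `struct` module. The value `0x01020304` is an arbitrary example; the final lines show the usual remedy of agreeing one byte order for data on the wire.

```python
import struct

value = 0x01020304

big_endian_bytes = struct.pack(">I", value)     # most significant byte first
little_endian_bytes = struct.pack("<I", value)  # least significant byte first

# A little-endian reader that is handed big-endian bytes gets a different number.
misread = struct.unpack("<I", big_endian_bytes)[0]

# Agreeing one wire format (here big-endian, "network order") removes the ambiguity.
wire = struct.pack("!I", value)
round_tripped = struct.unpack("!I", wire)[0]
```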
55Heterogeneity 2
Some of the variation is unnecessary
- Operating System variation
- Different O/S architectures
- Unix families & versions
- Windows families and versions
- Specialised O/S, e.g. for Instruments & Mobile
devices
- Implementation system variation
- Programming languages
- Scripting languages
- Workflow systems
- Data models
- Description languages
- Grid systems
- Many implementations of same functionality
56Heterogeneity Counter Measures
- Invest in virtual Homogeneity
- Agree standards (formally or de facto)
- Introduce intermediate code
- That hides unwanted variation
- Presenting it in standard form
- But this has high cost
- Developing the standard
- Developing the intermediate code
- Executing the intermediate code
- It may hide variations some want
- Provide direct access to facilities as well
- But this may inhibit optimisation & automation
57Heterogeneity Counter Measures 2
- Automatically manage diversity
- Manual agreement and construction of virtual
homogeneity will not scale & compose
- Develop abstract and higher level models
- Describe each component
- Generate adaptations as needed from descriptions
- Not yet achievable for general complete systems
- Relevant for specific domains
58Autonomy and Change
- Necessary
- To persuade organisations & individuals to engage
- They need to control their own facilities
- They have best knowledge to develop their
services
- Their business opportunity
- Because coordinated change is unachievable
- Systems & workloads are busy
- Service commitments must be met
- Large-scale scheduling of work is very hard
- To correct errors
- To plug vulnerabilities
- To obtain new capabilities
59Autonomy and Change 2
- What changes: local decisions
- The underlying technology delivering a service
- The operations available from a service
- The semantics of the operations
- Policy changes, e.g. authorisation rules, costs, ...
- What changes: corporate decisions
- Some agreed standard is changed
- E.g. a new version of a protocol is introduced
60Autonomy and change Counter Measures
- Users & other providers expect stability
- Agree some standards that are rarely changed
- As a platform & framework
- As a means of communicating change
- Introduce change-absorbing technology
- Mark the protocols and services with version
information
- Transform between protocols when changes occur
- Anneal the change out of the system
- Develop algorithms tolerant to change
- Revalidate dependencies where they may change
- Handle failures due to change
Change is an asset. Embrace and manage it. Ignore
it at your peril.
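Marking messages with version information and transforming between versions might look like the sketch below. The message fields and the change from version 1 to version 2 (renaming "cpus" to "cores") are invented for illustration and do not describe any real grid protocol.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(message: dict) -> dict:
    # Hypothetical change: version 2 renamed "cpus" to "cores".
    upgraded = {key: val for key, val in message.items() if key != "cpus"}
    upgraded["cores"] = message["cpus"]
    upgraded["version"] = 2
    return upgraded


# One upgrader per old version; a new version adds one more entry.
UPGRADERS = {1: upgrade_v1_to_v2}


def normalise(message: dict) -> dict:
    """Bring a message up to CURRENT_VERSION, one step at a time."""
    version = message.get("version")
    if not isinstance(version, int) or not 1 <= version <= CURRENT_VERSION:
        raise ValueError(f"unsupported protocol version: {version!r}")
    while message["version"] < CURRENT_VERSION:
        message = UPGRADERS[message["version"]](message)
    return message


old_request = {"version": 1, "job": "sim-42", "cpus": 8}
new_request = normalise(old_request)
```

Old clients keep working while the service itself only handles the current version, and an unknown version fails loudly rather than being misinterpreted.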
61Unreliability
- Failures are inevitable
- Equipment, software & operations errors
- Network outages, power outages, ...
- Their effects must be localised
- Cannot afford total system outages
- This is not easy
- Each error may occur when system is in any state
- The system is an unknown composition of
subsystems
- Errors often occur while other errors are still
active
- Errors often occur during error recovery actions
- Errors may be caused by deliberate attack
- Attackers may continue their attack
62Unreliability Counter Measures
- Requires much R&D
- Continuous arms race as scale of Grids grows
- Ideal of a continuously available stable service
- Not achievable: recognise that drops in response
and local failures must be dealt with
- Design resilient architectures
- Design resilient algorithms
- Improve reliability of each component
- Distribute the responsibility
- For failure detection
- For recovery action
Invest heavily in error detection and recovery
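As one small instance of error detection and recovery, a caller can retry a failed request with growing delays and give up after a bounded number of attempts. This is a minimal sketch: `TransientError`, `FlakyService` and the doubling delay policy are assumptions made for the example, not something prescribed by the slides.

```python
import time


class TransientError(Exception):
    """A failure that may succeed if the call is repeated."""


def call_with_retries(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait and retry, doubling the delay."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # give up and let the caller handle the failure
            sleep(base_delay * 2 ** attempt)


class FlakyService:
    """Hypothetical service that fails a fixed number of times, then answers."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TransientError("temporarily unavailable")
        return "ok"


delays = []  # record the waits instead of actually sleeping
service = FlakyService(failures_before_success=2)
result = call_with_retries(service, base_delay=1.0, sleep=delays.append)
```

The retry limit matters as much as the retry itself: a persistent failure is passed up to a layer that can take a different recovery action, which is one way of distributing responsibility.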
63Service Oriented Architectures
64Three Components
Registries
Register an available service: send name &
description
Service Consumers
Services
65Three Components
Registries
Request a service: send a description
Service Consumers
Services
66Three Components
Registries
Set (possibly empty) of matching services
Service Consumers
Services
67Three Components
Registries
Service Consumers
Request service operation
Services
68Three Components
Registries
Service Consumers
Services
Return result or Error
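The five interactions shown on the "Three Components" slides (register, request, matching set, invoke, result or error) can be sketched in a single process. Everything here is a simplification for illustration: the `Registry` class, keyword-set descriptions and the `invoke` helper are hypothetical, and a real registry would be a network service.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Service = Callable[[float], float]


class Registry:
    """In-memory registry mapping service names to descriptions and services."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Set[str], Service]] = {}

    def register(self, name: str, description: Set[str], service: Service) -> None:
        # A service registers itself by sending its name and description.
        self._entries[name] = (set(description), service)

    def find(self, wanted: Set[str]) -> List[Service]:
        # A consumer sends a description; the reply is a possibly empty
        # collection of services whose descriptions cover it.
        return [svc for desc, svc in self._entries.values() if set(wanted) <= desc]


def invoke(service: Service, argument: float) -> Tuple[str, object]:
    """Request a service operation; the reply is a result or an error."""
    try:
        return ("result", service(argument))
    except Exception as error:
        return ("error", str(error))


registry = Registry()
registry.register("sqrt-service", {"math", "sqrt"}, math.sqrt)

matches = registry.find({"sqrt"})
no_matches = registry.find({"render"})

ok = invoke(matches[0], 9.0)
failed = invoke(matches[0], -1.0)  # math.sqrt rejects negative input
```

The consumer has to handle both the empty set of matches and an error reply, since neither the registry nor the service is guaranteed to deliver.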
69Composed behaviour
- Services are themselves consumers
- They may compose and wrap other services
- The registry is itself a consumer
- A federation of registries may deal with registry
services reliability & performance
- Observer services may report on quality of
services and help with diagnostics
- Agreements between services may be set up
- Service-Level Agreements
- Permitting sustained interaction
70Composed behaviour
Requires Organising as an Architecture
71Scenarios
72Why Scenarios
- Abstraction of what people want to do
- Catches the essence of their requirement
- Framework for
- Discussion
- Comparison
- Elaboration
- Check how technologies cover scenarios
- Scenarios should not be about implementation
- Scenario can be decomposed into steps
- Possibly in many ways
- These are less abstract requirements
73Job submission scenario
1 Create or revise a job description
Q: In what language?
Q: What must it / can it say?
74Job submission scenario
2 Submit the job description
Q: How?
Q: With what extra parameters?
75Job submission scenario
3 Ask about progress
Q: How?
Q: What can they learn and when?
Q: Is the reply in user or system terms?
76Job submission scenario
4 Retrieve results
Q: How?
Q: Where can they be found?
Q: Are there helpful diagnostics?
77Job submission scenario
Q: Who provides and runs this system?
Q: How does it get paid for?
Q: What are its policies for allocating resources to JD submissions?
Q: How reliable and efficient is it? Users' view? Managers' view?
78Job submission scenario
Q: How much effort does it take to submit the same job to another system?
Q: How does the code for the application get to be executed?
Q: How are data read or created during the computation handled?
Q: How will this system evolve? Will users need to learn new tricks?
79Ensemble run scenario
Computing resources: any type, anywhere
80Ensemble run scenario
Computing resources: any type, anywhere
Coordination system
Results store
81Ensemble run scenario
Computing resources: any type, anywhere
Results store
1 Create plan for the ensemble run, e.g.
parameter space to sweep and sampling method
82Ensemble run scenario
Computing resources: any type, anywhere
Results store
2 Initiate the production and submission of jobs
83Ensemble run scenario
Computing resources: any type, anywhere
Results store
3 Result accumulation
84Ensemble run scenario
Computing resources: any type, anywhere
Results store
4 Researcher monitors and steers progress
85Ensemble run scenario
Computing resources: any type, anywhere
Results store
5 Researcher recovers and analyses results -
computes derivatives
86Ensemble run scenario
Computing resources: any type, anywhere
Results store
6 Researcher completes analyses & discards or
archives results
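Steps 1 to 3 of the ensemble run can be sketched on one machine, with a thread pool standing in for "computing resources: any type, anywhere" and a dictionary standing in for the results store. The parameter names, `toy_model` and the full-grid sampling method are assumptions made for the example.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def plan_sweep(parameter_space):
    """Step 1: enumerate every combination of the listed parameter values."""
    names = sorted(parameter_space)
    value_lists = [parameter_space[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def toy_model(params):
    """Hypothetical stand-in for one model run in the ensemble."""
    return params["forcing"] * params["sensitivity"]


def run_ensemble(plan, model, max_workers=4):
    """Steps 2 and 3: submit one job per plan entry and accumulate the results."""
    results_store = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for params, output in zip(plan, pool.map(model, plan)):
            results_store[tuple(sorted(params.items()))] = output
    return results_store


plan = plan_sweep({"forcing": [1.0, 2.0], "sensitivity": [0.5, 1.5, 3.0]})
results = run_ensemble(plan, toy_model)
```

Keying each result by its parameters is a very small version of the metadata needed to manage the output of thousands of runs.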
87Ensemble run scenario with context
Computing resources: any type, anywhere
Everything as before, plus interleaved requests
for context data from each job as it runs
Runs draw data from context stores: boundary
conditions, pre-computed data, observations
88Ensemble run scenario with metadata
Computing resources: any type, anywhere
Everything as before, plus use and generate
metadata as each job runs
Runs organised using metadata and jobs generate
metadata: helps manage 1000s of files
89Repetition of Scenario
- Normally, users repeatedly perform the same
scenario
- Analysis of the next sample
- Re-analysis by other researchers & designers
- Calibration and normalisation of the latest
observational run
- Re-verification against the latest data
- Evaluation of the risk of the next share purchase
- (Revising the) design of an(other similar) engine
component
- Often with parametric variations
- Often with progressive refinements
- A better pattern recogniser
- A refinement in calibration
- Code fixes, updates to reference data, ...
- How well do the solutions on offer support
repetition?
90Data integration scenario
Researcher wants to obtain specified data from
multiple distributed data sources and to supply
the result to a process and then view its output.
1 Researcher formulates query
2 Researcher submits query
3 Query system transforms and distributes query
4 Data services send back local results
5 Query system combines these to form requested
data
6 Query system sends data to process
7 Process system sends derived data to researcher
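The seven steps above can be sketched as a scatter-gather query over in-memory sources. The source names, the row layout and the use of a mean as the downstream "process" are all invented for illustration; real data services would sit behind network interfaces and would not share a schema so conveniently.

```python
from statistics import mean


def local_query(rows, wanted_site):
    """Step 4: a data service evaluates the query against its own rows."""
    return [row for row in rows if row["site"] == wanted_site]


def integrate(sources, wanted_site):
    """Steps 3 and 5: distribute the query, then combine the local results."""
    combined = []
    for source_name, rows in sources.items():
        for row in local_query(rows, wanted_site):
            combined.append({**row, "source": source_name})
    return combined


def process(rows):
    """Steps 6 and 7: a process derives data to return to the researcher."""
    return mean(row["temperature"] for row in rows)


sources = {
    "archive_a": [
        {"site": "S1", "temperature": 10.0},
        {"site": "S2", "temperature": 30.0},
    ],
    "archive_b": [{"site": "S1", "temperature": 14.0}],
}

requested = integrate(sources, "S1")  # steps 1 and 2: the query is "site S1"
derived = process(requested)
```

Tagging each combined row with its source keeps provenance visible after integration.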
91Summary Conclusions
92Grids
- Many reasons motivating investment in grids
- Collaboration for Global Science & Business
- Resource integration & sharing
- New approach to large-scale distributed systems
- Large coordinated effort necessary
- Industry & Academia
- Economic & Creative niches
- Can they be assembled to provide all that is
needed?
- Many technical and socio-economic challenges
- Work for you all
- Many new opportunities
- Work for you all
93Summary Take home message
- e-Infrastructure is arriving
- Built on Grids & Web Services
- Data and Information grow in importance
- Must include user support
- Must be based on good socio-economic
understanding
- There is a dramatic rate of change
- An opportunity for everyone
Can you ride the wave?
94?
Picture composition by Luke Humphry based on
prior art by Frans Hals
www.omii.ac.uk