Distributed Systems: Architectures, Principles and Scenarios presentation

1
Session 2
Distributed Systems: Architectures, Principles and Scenarios
Monday 10th July
Malcolm Atkinson
2
Distributed Systems: Introduction, Principles and
Foundations
3
Principles of Distributed Computing

Issues you can't avoid
Lack of Complete Knowledge (LoCK)
Latency
Heterogeneity
Autonomy
Unreliability
Change
A Challenging goal
balance technical feasibility
against virtual homogeneity, stability and
reliability
Appropriate balance between usability and
productivity
while remaining affordable, manageable and
maintainable

This is NOT easy
4
Lack of Complete Knowledge

Technical origins of LoCK
Dynamics of systems involve very large state
spaces
Can't track or explore all the states
Latency prevents up-to-date knowledge being
available
By the time a notification of a state change
arrives the state may have changed again
Failures inhibit information propagation
Unanticipated failure modes
If you ask a remote system
By the time the answer arrives it may be wrong

Never assume you know the state of a remote
system
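
One way to act on this rule is to store what a remote system last reported together with the local time the report arrived, and to treat it as an ageing hint rather than a fact. The sketch below is illustrative only: the `Observation` class, its fields and the 2-second freshness bound are assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What a remote system reported, plus when we received the report."""
    value: str
    received_at: float  # local clock reading, in seconds

    def age(self, now: float) -> float:
        return now - self.received_at

    def is_fresh(self, now: float, max_age: float) -> bool:
        # Even a "fresh" observation may already be wrong; this only bounds
        # how old the hint may be before we ask again.
        return self.age(now) <= max_age


# Hypothetical usage: a queue-status report received at t = 100.0 s.
obs = Observation(value="queue empty", received_at=100.0)
print(obs.is_fresh(now=101.0, max_age=2.0))  # True
print(obs.is_fresh(now=105.0, max_age=2.0))  # False
```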
5
Lack of Complete Knowledge 2

Human origins of LoCK
Lack of understanding
Incomplete, simplified models
Intractable models
Poor, incomplete descriptions
Erroneous descriptions
Socio-Economic effects generate LoCK
Autonomous owners do not reveal all
About services, resources and performance
Intermediaries aggregate and simplify

6
LoCK Counter Strategies

Improve the quality of the available knowledge
Better static information
Better information collection and dissemination
Improve quality of Distributed System Models
Prove invariants that algorithms can exploit
Test axioms with real systems
Build algorithms that behave reasonably well
When they have incomplete knowledge

7
Latency

It is always going to be there
Consequence of signal transmission times
Consequence of messages / packets in queues
Consequence of message processing time
Errors cause retries
It gets worse
Geographic scale increases latency
System complexity increases number of queues
Scale and complexity increase processing time
Think about
How many operations a system can do while a
message it sent reaches its destination, a reply
is formed and the reply travels back
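
As a rough worked example of the question above, multiply the processing rate by the round-trip time. Both figures below are assumed for illustration (a processor doing one billion simple operations per second and a 100 ms wide-area round trip); they are not taken from the presentation.

```python
def operations_per_round_trip(ops_per_second: float, round_trip_seconds: float) -> float:
    """Operations a processor could complete while waiting for one reply."""
    return ops_per_second * round_trip_seconds


# Assumed figures, for illustration only.
OPS_PER_SECOND = 1e9       # one billion simple operations per second
ROUND_TRIP_SECONDS = 0.1   # 100 ms wide-area round trip

waiting_cost = operations_per_round_trip(OPS_PER_SECOND, ROUND_TRIP_SECONDS)
print(f"{waiting_cost:.0f} operations per round trip")
```

Under these assumptions a single round trip costs on the order of a hundred million operations.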

8
Latency Counter Strategies

Design algorithms that require fewer round trips
This is THE complexity measure!
Batch requests and responses
Shorten distance to get information
Caching, pre-fetching and replication
But may be stale data!
Move data to computation
But be smart about which data and when
Move computation to data
Succinct computation and volumes of data
But safety and privacy issues arise

Communication is very expensive
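
The effect of batching on the round-trip count can be sketched with an in-memory stand-in for a remote service. `RemoteStore`, `get` and `get_many` are hypothetical names used only for this illustration; a real service would add network delay to every call.

```python
from typing import Dict, List


class RemoteStore:
    """In-memory stand-in for a remote service that counts round trips."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> int:
        self.round_trips += 1   # one request, one reply
        return self._data[key]

    def get_many(self, keys: List[str]) -> List[int]:
        self.round_trips += 1   # one batched request, one batched reply
        return [self._data[k] for k in keys]


keys = [f"k{i}" for i in range(50)]
store = RemoteStore({k: i for i, k in enumerate(keys)})

one_by_one = [store.get(k) for k in keys]
trips_unbatched = store.round_trips

store.round_trips = 0
batched = store.get_many(keys)
trips_batched = store.round_trips

print(trips_unbatched, trips_batched)  # 50 1
```

Both approaches return the same values; only the number of round trips differs.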
9
Heterogeneity
Some of the variation is wanted and exploited

Hardware variation
Different computer architectures
Big endians v little endians
Number representation
Address length
Performance
Different Storage systems
Architectures
Technologies
Available operations
Different Instrument systems
Accepting different control inputs
Generating different output data streams
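
The byte-order difference can be seen directly with Python's standard `struct` module. This is a small illustration, not something from the presentation: the same 32-bit integer is laid out differently by big-endian and little-endian conventions, and reading it with the wrong convention silently gives a different number.

```python
import struct

value = 1

big = struct.pack(">I", value)     # big-endian unsigned 32-bit integer
little = struct.pack("<I", value)  # little-endian unsigned 32-bit integer

print(big.hex())     # 00000001
print(little.hex())  # 01000000

# Reading bytes with the wrong byte order yields a different number.
misread = struct.unpack("<I", big)[0]
print(misread)       # 16777216
```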

10
Heterogeneity 2
Some of the variation is just make-work

Operating System variation
Different O/S architectures
Unix families and versions
Windows families and versions
Specialised O/S, e.g. for Instruments and Mobile
devices
Implementation system variation
Programming languages
Scripting languages
Workflow systems
Data models
Description languages
Grid systems
Many implementations of same functionality

11
Heterogeneity Counter Measures

Invest in virtual Homogeneity
Agree standards (formally or de facto)
Introduce intermediate code
That hides unwanted variation
Presenting it in standard form
But this has high cost
Developing the standard
Developing the intermediate code
Executing the intermediate code
It may hide variations some want
Provide direct access to facilities as well
But this may inhibit optimisation and automation
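
The idea of intermediate code that hides unwanted variation can be sketched as an adapter behind an agreed interface. Everything named here (`Storage`, `TapeArchive`, `TapeStorage`, `stage_in`, `stage_out`) is hypothetical and stands in for whatever standard and native systems are actually involved.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Storage(ABC):
    """The agreed standard form presented to users."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class TapeArchive:
    """Hypothetical native system with its own operation names."""

    def __init__(self) -> None:
        self._tapes: Dict[str, bytes] = {}

    def stage_in(self, label: str) -> bytes:
        return self._tapes[label]

    def stage_out(self, label: str, payload: bytes) -> None:
        self._tapes[label] = payload


class TapeStorage(Storage):
    """Intermediate code: hides the tape-specific operations."""

    def __init__(self, archive: TapeArchive) -> None:
        self.archive = archive  # left public: direct access for those who want it

    def read(self, name: str) -> bytes:
        return self.archive.stage_in(name)

    def write(self, name: str, data: bytes) -> None:
        self.archive.stage_out(name, data)


store: Storage = TapeStorage(TapeArchive())
store.write("run-1", b"results")
print(store.read("run-1"))  # b'results'
```

The extra method call in each operation is a small instance of the cost of executing the intermediate code.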

12
Heterogeneity Counter Measures 2

Automatically manage diversity
Manual agreement and construction of virtual
homogeneity will not scale or compose
Develop abstract and higher level models
Describe each component
Generate the adaptations as needed from these
descriptions
Not yet achievable for general and complete
systems
Relevant for specific domains

13
Autonomy and Change

Necessary
To persuade organisations and individuals to engage
They need to control their own facilities
They have best knowledge to develop their
services
Their business opportunity
Because coordinated change is unachievable
Systems and workloads are busy
Service commitments must be met
Large-scale scheduling of work is very hard
To correct errors
To plug vulnerabilities
To obtain new capabilities

14
Autonomy and Change 2

What changes: local decisions
The underlying technology delivering a service
The operations available from a service
The semantics of the operations
Policy changes, e.g. authorisation rules, costs,
What changes: corporate decisions
Some agreed standard is changed
E.g. a new version of a protocol is introduced

15
Autonomy and Change Counter Measures

Users and other providers expect stability
Agree some standards that are rarely changed
As a platform and framework
As a means of communicating change
Introduce change-absorbing technology
Mark the protocols and services with version
information
Transform between protocols when changes occur
Anneal the change out of the system
Develop algorithms tolerant to change
Revalidate dependencies where they may change
Handle failures due to change

Change is an asset. Embrace and manage it. Ignore
it at your peril.
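
Marking messages with version information and transforming between protocol versions might look like the sketch below. The message fields, the three versions and the changes between them are all invented for illustration; they do not describe any real protocol.

```python
from typing import Callable, Dict

Message = Dict[str, object]


def v1_to_v2(msg: Message) -> Message:
    # Assumed change: version 2 renamed "cpu" to "cores".
    out = {k: v for k, v in msg.items() if k != "cpu"}
    out["cores"] = msg["cpu"]
    out["version"] = 2
    return out


def v2_to_v3(msg: Message) -> Message:
    # Assumed change: version 3 added a required "priority" field.
    out = dict(msg)
    out.setdefault("priority", "normal")
    out["version"] = 3
    return out


UPGRADES: Dict[object, Callable[[Message], Message]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(msg: Message) -> Message:
    """Apply one-step transforms until the message reaches CURRENT_VERSION."""
    while msg["version"] != CURRENT_VERSION:
        step = UPGRADES.get(msg["version"])
        if step is None:
            raise ValueError(f"unsupported protocol version: {msg['version']}")
        msg = step(msg)
    return msg


old = {"version": 1, "cpu": 4}
print(upgrade(old))  # {'version': 3, 'cores': 4, 'priority': 'normal'}
```

Chaining one-step transforms means each new version needs only one new function, and unknown versions fail loudly instead of being misread.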
16
Unreliability

Failures are inevitable
Equipment, software and operations errors
Network outages, Power outages,
Their effects must be localised
Cannot afford total system outages
This is not easy
Each error may occur when system is in any state
The system is an unknown composition of
subsystems
Errors often occur while other errors are still
active
Errors often occur during error recovery actions
Errors may be caused by deliberate attack
Attackers may continue their attack

17
Unreliability Counter Measures

Requires much R&D
Continuous arms race as scale of Grids grows
Ideal of a continuously available stable service
Not achievable: recognise that drops in response
and local failures must be dealt with
Design resilient architectures
Design resilient algorithms
Improve reliability of each component
Distribute the responsibility
For failure detection
For recovery action

Invest heavily in error detection and recovery
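
A common building block for failure detection is a heartbeat timeout. The sketch below assumes a hypothetical `HeartbeatDetector` with an explicit `now` argument, so that it can be run without real clocks or networks. It is one possible technique, not one prescribed by the presentation.

```python
from typing import Dict, List


class HeartbeatDetector:
    """Suspects a component when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str, now: float) -> None:
        self._last_seen[component] = now

    def suspected(self, now: float) -> List[str]:
        # A suspect may merely be slow or unreachable; this is a hint, not proof.
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self.timeout
        )


detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("storage-a", now=0.0)
detector.heartbeat("storage-b", now=0.0)
detector.heartbeat("storage-a", now=4.0)  # storage-b stays silent

print(detector.suspected(now=7.0))  # ['storage-b']
```

Because of latency and lack of complete knowledge, such a detector can only suspect failure; recovery actions should tolerate a suspect that turns out to be alive.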
18
Service Oriented Architectures
19
Three Components
Registries
Register an available service: send name and
description
Service Consumers
Services
20
Three Components
Registries
Request a service: send a description
Service Consumers
Services
21
Three Components
Registries
Set (possibly empty) of matching services
Service Consumers
Services
22
Three Components
Registries
Service Consumers
Request service operation
Services
23
Three Components
Registries
Service Consumers
Services
Return result or Error
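
The five interactions shown across the preceding slides (register, request, matching set, invoke, result or error) can be sketched in memory. Representing a description as a set of keywords, and the names `Registry`, `find` and `upper_service`, are assumptions made for this illustration only.

```python
from typing import Callable, List, Set, Tuple

Operation = Callable[[str], str]


class Registry:
    def __init__(self) -> None:
        self._entries: List[Tuple[str, Set[str], Operation]] = []

    def register(self, name: str, description: Set[str], operation: Operation) -> None:
        """A service announces itself: name plus description."""
        self._entries.append((name, description, operation))

    def find(self, wanted: Set[str]) -> List[Tuple[str, Operation]]:
        """Return the (possibly empty) list of services matching the description."""
        return [(n, op) for n, d, op in self._entries if wanted <= d]


def upper_service(text: str) -> str:
    if not text:
        raise ValueError("empty request")  # the "Error" branch of the reply
    return text.upper()


registry = Registry()
registry.register("upper-1", {"text", "uppercase"}, upper_service)

matches = registry.find({"uppercase"})  # consumer asks the registry
name, operation = matches[0]
print(name, operation("hello"))         # upper-1 HELLO
print(registry.find({"translate"}))     # []
```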
24
Composed behaviour

Services are themselves consumers
They may compose and wrap other services
The registry is itself a consumer
A federation of registries may deal with registry
service reliability and performance
Observer services may report on quality of
services and help with diagnostics
Agreements between services may be set up
Service-Level Agreements
Permitting sustained interaction

25
Composed behaviour

Requires Organising as an Architecture
26
OGF Open Grid Services Architecture
27
The Open Grid Services Architecture

An open, service-oriented architecture (SOA)
Resources as first-class entities
Dynamic service/resource creation and destruction
Built on a Web services infrastructure
Resource virtualization at the core
Build grids from small number of standards-based
components
Replaceable, coarse-grained
e.g. brokers
Customizable
Support for dynamic, domain-specific content
within the same standardized framework

Hiro Kishimoto Keynote GGF17
28
Why Use an SOA?

Logical view of capabilities
Relatively coarse-grained functions
Reusable and composable behaviors
Encapsulation of complex operations
Naturally extendable framework
Platform-neutral
machine and OS

Hiro Kishimoto Keynote GGF17
29
SOA and Web Services: Key Benefits

SOA
Flexible
Locate services on any server
Relocate as necessary
Prospective clients find services using
registries
Scalable
Add and remove services as demand varies
Replaceable
Update implementations without disruption to
users
Fault-tolerant
On failure, clients query registry for alternate
services
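
The fault-tolerance point can be sketched as a client that asks a registry for candidates and falls through to the next one on failure. `lookup`, `broken` and `healthy` are hypothetical stand-ins; a real registry query and real services would replace them.

```python
from typing import Callable, List

Service = Callable[[int], int]


def broken(x: int) -> int:
    raise ConnectionError("service unavailable")


def healthy(x: int) -> int:
    return x * 2


def lookup(description: str) -> List[Service]:
    """Hypothetical registry query; returns candidates in preference order."""
    return [broken, healthy]


def call_with_failover(description: str, argument: int) -> int:
    errors = []
    for service in lookup(description):
        try:
            return service(argument)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure, try the next alternate
    raise RuntimeError(f"all {len(errors)} candidate services failed")


print(call_with_failover("doubler", 21))  # 42
```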

Web Services
Interoperable
Growing number of industry standards
Strong industry support
Reduce time-to-value
Harness robust development tools for Web services
Decrease learning and implementation time
Embrace and extend
Leverage effort in developing and driving
consensus on standards
Focus limited resources on augmenting and adding
standards as needed

Hiro Kishimoto Keynote GGF17
30
Virtualizing Resources
Access
Type-specific interfaces
Storage
Sensors
Applications
Information

Computers
Common Interfaces
Resource-specific Interfaces
Hiro Kishimoto Keynote GGF17
31
Specifications Landscape April 2006
Warning: volatile data!
[Diagram; labels in extraction order]
SYSTEMS MANAGEMENT, UTILITY COMPUTING, GRID COMPUTING
Use Cases and Applications, Distributed query processing, Data Centre, ASP, Collaboration, Multi Media, Persistent Archive
VO Management, OGSA-EMS, OGSA Self Mgmt, WS-DAI, WSDM, Discovery, Information, WS-BaseNotification, Naming, GGF-UR
Core Services, Privacy, GFD-C.16, Trust, Data Model, WSRF-RL, WSRF-RP
Web Services Foundation, WSRF-RAP, WS-Security, SAML/XACML, X.509, WS-Addressing, CIM/JSIM, HTTP(S)/SOAP, WSDL, Data Transport
Legend: Standard, Evolving, Gap, Hole
Hiro Kishimoto Keynote GGF17
32
Summary and Conclusions
33
Grids

Many reasons motivating investment in grids
Collaboration for Global Science and Business
Resource integration and sharing
New approach to large scale distributed systems
Large coordinated effort
Industry and Academia
Many technical and socio-economic challenges
Work for you all
Many new opportunities
Work for you all

34
Summary: take-home message

E-Infrastructure is arriving
Built on Grids and Web Services
Data and Information grow in importance
It must include user support
It must be based on good socio-economic
understanding
There is a dramatic rate of change
An opportunity for everyone

Can you ride the wave?
35
Scenarios
36
Why Scenarios

Abstraction of what people want to do
Catches the essence of their requirement
Framework for
Discussion
Comparison
Elaboration
Check how technologies cover scenarios
Opportunity not part of the scenario
Scenarios should not be about implementation
Scenario can be decomposed into steps
Possibly in many ways
These are less abstract requirements

37
Job submission scenario
1. Create or revise a job description
Q: In what language?
Q: What must it / can it say?
38
Job submission scenario
2. Submit the job description
Q: How?
Q: With what extra parameters?
39
Job submission scenario
3. Ask about progress
Q: How?
Q: What can they learn and when?
Q: Is the reply in user or system terms?
40
Job submission scenario
4. Retrieve results
Q: How?
Q: Where can they be found?
Q: Are there helpful diagnostics?
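
Steps 1 to 4 above can be made concrete with an in-memory stand-in for a job submission system. Every name here (`JobDescription`, `JobService`, the `QUEUED` and `DONE` states) is hypothetical; real systems answer the questions on these slides in their own ways.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobDescription:
    executable: str
    arguments: List[str] = field(default_factory=list)


class JobService:
    """In-memory stand-in for a job submission system (hypothetical API)."""

    def __init__(self) -> None:
        self._jobs: Dict[int, JobDescription] = {}
        self._state: Dict[int, str] = {}

    def submit(self, description: JobDescription) -> int:
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = description
        self._state[job_id] = "QUEUED"
        return job_id

    def run_pending(self) -> None:
        # Stands in for whatever the real system does to execute queued jobs.
        for job_id, state in self._state.items():
            if state == "QUEUED":
                self._state[job_id] = "DONE"

    def status(self, job_id: int) -> str:
        return self._state[job_id]

    def results(self, job_id: int) -> str:
        if self._state[job_id] != "DONE":
            raise RuntimeError("results not available yet")
        job = self._jobs[job_id]
        return f"ran {job.executable} {' '.join(job.arguments)}".strip()


service = JobService()
job = service.submit(JobDescription("simulate", ["--steps", "10"]))  # steps 1-2
print(service.status(job))   # step 3: QUEUED
service.run_pending()
print(service.results(job))  # step 4: ran simulate --steps 10
```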
41
Job submission scenario
Q: Who provides and runs this system?
Q: How does it get paid for?
Q: What are its policies for allocating resources to JD submissions?
Q: How reliable and efficient is it? User's view? Manager's view?
42
Job submission scenario
Q: How much effort does it take to submit the same job to another system?
Q: How does the code for the application get to be executed?
Q: How are data read or created during the computation handled?
Q: How will this system evolve? Will users need to learn new tricks?
43
Ensemble run scenario
Computing resources: any type, anywhere
44
Ensemble run scenario
Computing resources: any type, anywhere
Coordination system
results store
45
Ensemble run scenario
Computing resources: any type, anywhere
results store
1. Create plan for the ensemble run, e.g.
parameter space to sweep and sampling method
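
A plan for an ensemble run can be as simple as a list of parameter sets. The sketch below shows two possible sampling methods, a full grid sweep and seeded random sampling. The parameter names and values are invented for illustration.

```python
import itertools
import random
from typing import Dict, List, Sequence

Space = Dict[str, Sequence[float]]


def grid_sweep(space: Space) -> List[Dict[str, float]]:
    """Every combination of the listed parameter values (full factorial)."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]


def random_sample(space: Space, count: int, seed: int) -> List[Dict[str, float]]:
    """Independent random picks per parameter; seeded so the plan is repeatable."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(count)
    ]


space = {"viscosity": [0.1, 0.2, 0.4], "temperature": [280.0, 300.0]}
plan = grid_sweep(space)
print(len(plan))  # 6
print(plan[0])    # {'viscosity': 0.1, 'temperature': 280.0}
```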
46
Ensemble run scenario
Computing resources: any type, anywhere
results store
2. Initiate the production and submission of jobs
47
Ensemble run scenario
Computing resources: any type, anywhere
results store
3. Result accumulation
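
Job production, submission and result accumulation can be sketched on a single machine with Python's standard `concurrent.futures`. The thread pool stands in for the distributed computing resources, `run_job` is a placeholder model, and `results_store` is a plain list. All of these are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def run_job(params: Dict[str, float]) -> float:
    """Placeholder model: stands in for one simulation run."""
    return params["x"] ** 2 + params["y"]


plan = [{"x": float(x), "y": float(y)} for x in range(3) for y in range(2)]
results_store: List[Tuple[Dict[str, float], float]] = []
failures: List[Tuple[Dict[str, float], Exception]] = []

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit every job, remembering which parameters each future belongs to.
    futures = {pool.submit(run_job, p): p for p in plan}
    for future in as_completed(futures):
        params = futures[future]
        try:
            results_store.append((params, future.result()))
        except Exception as exc:
            failures.append((params, exc))  # record and carry on

print(len(results_store), len(failures))  # 6 0
```

Results arrive in completion order, not submission order, so each result is stored with the parameters that produced it.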
48
Ensemble run scenario
Computing resources: any type, anywhere
results store
4. Researcher monitors and steers progress
49
Ensemble run scenario
Computing resources: any type, anywhere
results store
5. Researcher recovers and analyses results,
computes derivatives
50
Ensemble run scenario
Computing resources: any type, anywhere
results store
6. Researcher completes analyses, discards or
archives results
51
Ensemble run scenario with context
Computing resources: any type, anywhere
Everything as before, plus interleaved requests for context data from each job as it runs
Runs draw data from context stores: boundary
conditions, pre-computed data, observations
52
Ensemble run scenario with metadata
Computing resources: any type, anywhere
Everything as before, plus use and generate metadata as each job runs
Runs organised using metadata, and jobs generate
metadata; this helps manage 1000s of files
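
A minimal sketch of how generated metadata helps manage many output files: each job records a small metadata entry alongside its file, and files are later found by querying the entries rather than by guessing file names. The catalogue layout, field names and paths are hypothetical.

```python
from typing import Dict, List

Catalogue = List[Dict[str, object]]


def record(catalogue: Catalogue, path: str, **metadata: object) -> None:
    """Called by a job when it writes an output file."""
    catalogue.append({"path": path, **metadata})


def query(catalogue: Catalogue, **criteria: object) -> List[str]:
    """Paths of all files whose metadata matches every criterion."""
    return [
        str(entry["path"]) for entry in catalogue
        if all(entry.get(key) == wanted for key, wanted in criteria.items())
    ]


catalogue: Catalogue = []
record(catalogue, "out/run-001.dat", run=1, viscosity=0.1, status="ok")
record(catalogue, "out/run-002.dat", run=2, viscosity=0.2, status="failed")
record(catalogue, "out/run-003.dat", run=3, viscosity=0.1, status="ok")

print(query(catalogue, viscosity=0.1, status="ok"))
# ['out/run-001.dat', 'out/run-003.dat']
```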
53
Repetition of Scenario

Normally, users repeatedly perform the same
scenario
Analysis of the next sample
Re-analysis by other researchers and designers
Calibration and normalisation of the latest
observational run
Re-verification against the latest data
Evaluation of the risk of the next share purchase
(Revising the) design of an(other similar) engine
component
Often with parametric variations
Often with progressive refinements
A better pattern recogniser
A refinement in calibration
Code fixes, updates to reference data,
How well do the solutions on offer support
repetition?

54
Questions and Comments Please
