CS 582 CMPE 481 Distributed Systems - PowerPoint PPT Presentation

1 / 59

About This Presentation

Title:

CS 582 CMPE 481 Distributed Systems

Description:

Explicit communications property ... modeling of relative and temporal event ordering. Properties of Distributed Systems (cont. ... – PowerPoint PPT presentation

Number of Views:66

Avg rating:3.0/5.0

Slides: 60

Provided by: steve1824

Category:

more less

Transcript and Presenter's Notes

Title: CS 582 CMPE 481 Distributed Systems

1
CS 582 / CMPE 481Distributed Systems

Spring 2004 - 2005
Shahab Baqai

2
Administrative Information

Lecture TR at 1015 1130 hours
Homeworks Maybe
selected paper(s) review.
Usually due in 1 week.
Project
Three projects
2 extra days for emergencies
Exams
Mid-Term Exam 11th lecture
Final Exam
Web page http//suraj.lums.edu.pk/cs582s04
Evaluation
HW/Quizzes 10
Project 30
Mid Final 60

W04
3
Academic Integrity

Permitted Collaboration
encouraged and allowed at all times
Examples
Discussion of material covered during lecture /
handouts
Discussion of the requirements of an assignment
Discussion of the use of tools or development
environments
Discussion of general approaches to solving
problems
Discussion of general techniques of coding or
debugging
Discussion between a student a TA or instructor

4
Academic Integrity

Collaboration Requiring Citation
must be able to explain the solution
properly credit contribution, just like citing a
reference in a paper
Examples
Discussing the key to a problem
Discussing the design of a programming project
Assistance in debugging code
Sharing advice for testing
Research from alternative sources

5
Academic Integrity

Non-permitted Collaboration
submissions must represent original, independent
work
Examples
Copying solutions from others
Using work from past quarters
Studying another student's solution
Debugging code for someone else
Cut paste of solution/code from Internet

6
Overview

The course will focus on the principles
underlying modern distributed systems such as
networking, naming, security, synchronization,
concurrency, fault tolerance, etc. along with
case studies.
Text and reference books
CDK Distributed Systems - Concept and Design,
G. Coulouris, J, Dollimore, and T. Kindberg,
Addison-Wesley, 3rd Edition, 2001.
DS Distributed Systems Principles and
Paradigms, A. Tanenbaum and M. van Steen,
Prentice Hall, 2002.

7
Syllabus

Introduction
Communications
Naming
Security
Distributed File Systems
Synchronization
Concurrency Control
Distributed Transactions
Replication Consistency
Fault Tolerance
Advanced topic Distributed Multimedia Systems

8
Class Overview

Definition and Motivation of Distributed Systems
Properties of Distributed Systems
Challenges of Distributed Systems
Design Principles of Distributed Systems
Distributed System Architecture and Model

9
Introduction

A system is
an autonomous whole
an owner of a set of resources
something that can perform information processing
something that may be able to communicate with
other systems
A distributed system
a system
is composed of other systems
requires explicit communication among the
components
its components have a common goal set
Asynchronous system
Concurrency
No global clock
Independent failures

10
Definition(s) of a Distributed System

A distributed system is
one in which hardware or software components at
networked computers communicate and coordinate
their actions only by passing messages CDK
collection of independent computers that appears
to its users as a single coherent system. DS
several computers doing something together
multiple components, interconnections and shared
state Schroeder
fundamental properties are fault tolerance and
parallelism Mullender
one that stops you from getting any work done
when a machine youve never heard of crashes
Lamport
A set of communicating autonomous entities
functioning together to achieve a common goal
Cheriton
Collection of computer nodes connected by a
network running software that makes it function
as one system.

11
Definition of a Distributed System
1.1
A distributed system organized as
middleware.Note that the middleware layer
extends over multiple machines.
12
Why Build Distributed Systems?

inherently distributed
people, information, etc.
performance/cost
users requires more CPU cycles e.g. interactive
UI
network bandwidth growth gtgt CPU clock speed
concurrency
Modularity
standard interfaces
DCE, CORBA, and COM are examples
expandability
incremental growth
Availability
partial failure
load sharing
Scalability
Ideally no limit
no qualitative change as the system scales
careful design required to scale to very large
numbers of componentse.g. naming, addressing,
etc.
Reliability

13
Examples of Distributed Systems

Internet
Heterogeneity
Scalability
Fault tolerance
Intranet
Security
Availability
Resource sharing
Mobile and ubiquitous computing
Location hand-off management
Service discovery integration management
Disconnected operation support
Security

14
Properties of Distributed Systems

Fundamental property of distributed systems
Separation
Derived properties
Isolation property
Explicit communication property
Location property
Heterogeneity property
Multiple authority property
Concurrency property
Incremental change property
Partial failure property

15
Properties of Distributed Systems (cont.)

Isolation property
explicit access potential for control over
accessibility of components
Explicit communications property
components have disjoint storage explicit
communications between components
communication mechanism between components
Location property
components are potentially separable
requires explicit communications
possibility of relocation - location change

16
Properties of Distributed Systems (cont.)

Heterogeneity property
implication of diverse implementation technology
usage
mechanism to accommodate heterogeneous components
Multiple authority property
implication of multiple autonomous management or
control authorities
mechanism to make systems consistent
multiple security domains
Concurrency property
timely parallel activity to occur
Synchronization
modeling of relative and temporal event ordering

17
Properties of Distributed Systems (cont.)

Incremental change property (extensibility)
potential to add or remove components
Partial failure property
potential to continue after failure of individual
property
failure recovery and fault modeling

18
Challenges of Distributed Systems

Heterogeneity
Openness
Scalability
Concurrency
Security
Distribution Transparency

19
Challenges of Distributed Systems (cont.)

Heterogeneity
Applies to
Networks
interfaces, protocols
Hardware
data representation
Operating systems
system calls
Programming languages
data representation
Support for heterogeneity
Middleware
DCE, CORBA, DCOM
Virtual machine/mobile code
Java

20
Challenges of Distributed Systems (cont.)

Openness
Criteria to determine the extensibility or
re-configurability of a system
Be able to interact with services from other
components, regardless of heterogeneity of the
underlying environment
Key factors to openness coherence
public interfaces to key functions
uniform communication mechanism and public
interfaces for access to shared resources
conformance test of components to integrate
Implementing openness
provide only mechanisms, not policies
Examples
level of consistency for client-cached data
operations for mobile code download
network QoS requirement adjustment
level of security

21
Challenges of Distributed Systems (cont.)

Scalability
Said to be scalable if a system
remains effective in the presence of a
significant increase in the number of resources
users CDK
can handle the addition of users and resources
without suffering a noticeable loss of
performance or increase in administrative
complexity Neuman
Scalability components Neuman
size scalability
geographic scalability
administrative scalability
Design considerations for scalability
cost of physical resources
performance loss
software resource shortage
performance bottleneck

22
Challenges of Distributed Systems (cont.)

Scalability (cont.)
Techniques for scaling
Distribution
distribute data and computations across multiple
sites
Replication
replicate data to multiple sites
caching
make copies available locally
Scalability trade-offs
consistency vs. global synchronization

23
Challenges of Distributed Systems (cont.)

Failure handling
A system is considered faulty once its behavior
is no longer consistent with its specification
Schneider
partial failure property
Failure handling techniques
Fault detection
omission failure, timing failure (performance
failure), response failure, crash failure
Fault masking
retransmission, checksum, roll-back
Fault tolerance
can detect a fault and either fail predictably or
mask the fault from users
Recovery from failures
roll-back
Redundancy
k-resilience

24
Challenges of Distributed Systems (cont.)

Concurrency
Concurrent access to a shared resource may cause
inconsistency of the resource
Inconsistency examples
lost updates
two transactions concurrently perform an update
operation
inconsistent retrievals
performing retrieval operation before or during
an update operation
To avoid possible problems due to concurrent
access, operations of related transactions must
be serialized (one-at-a-time)

25
Challenges of Distributed Systems (cont.)

Security
Authentication
verification of source
Authorization
access rights to the resource
Encryption and decryption
public vs. private key

26
Challenges of Distributed Systems (cont.)

Distribution transparency
abstraction concepts/mechanisms to make
distributed systems appear as if they are a
single united system by concealing properties
derived from separation
Various forms of distribution transparency
Access transparency
Location transparency
Replication transparency
Concurrency transparency
Migration/Mobility transparency
Failure transparency
Performance transparency
Scaling transparency

27
Distribution transparency

Access transparency
hiding the use of explicit communications
no difference in access between local and remote
data
access may be restricted by controls in support
of some policy
generic functions communications and security
mechanisms

Logical Shared Resource
A
B
C
D
28
Distribution transparency (cont.)

Location transparency
hiding the details of topology in a system
bindings between objects are independent of the
routes connecting them
data is identified by logical or functional names
rather than addresses
generic functions naming, addressing, and routing

Logical Shared Resource
A
C
D
B
29
Distribution transparency (cont.)

Replication transparency
hiding the replication of state in a system
increase dependability and/or performance without
knowledge of replica visibility
active vs. passive replicas
generic functions active and passive replication
mechanisms

Logical Shared Resource
A
B
C
D
30
Distribution transparency (cont.)

Concurrency transparency
hiding the effect of parallel execution potential
consistency ensuring mechanisms may vary
generic functions synchronization, event
ordering mechanisms, along with atomic operations

Logical Shared Resource
A
B
C
D
31
Distribution transparency (cont.)

Migration transparency
hiding the effect of migration or reconfiguration
of resources in a system
generic functions configuration and dynamic
reconfiguration mechanisms

Logical Shared Resource
A
B
C
D
32
Distribution transparency (cont.)

Failure transparency
hiding the occurrence of errors in system
components and communications
restart or start the job on the replica just
after where it stopped
generic functions fault management and recovery
mechanism

Logical Shared Resource
A
B
C
D
33
Distribution transparency (cont.)

Performance transparency
hiding the performance penalty of distribution
exploits other forms of transparency to
reconfigure system resources to optimize
performance

Logical Shared Resource
A
B
C
D
34
Distribution transparency (cont.)

Scaling transparency
hiding the effect of changes in system size
allows the resizing of a distributed system to be
independent of its structures and algorithms

Logical Shared Resource
A
C
D
B
E
G
H
F
35
Considerations in Distributed System Design

Widely varying modes of use
Workload
Connectivity
Timeliness
Wide range of system environments
Heterogeneity
Performance
Scalability
Internal Problems
Synchronization
Failure
External Threats
Security

36
Distributed Systems Design Principles

Replicate to increase availability
replication and consistency vs. availability
Tradeoff availability and consistency
network name service vs. bank transaction
Cache hints if possible
vital technique for high-performance distributed
system design and implementation
Stashing to allow autonomous operation
conceal the temporal disconnection from networks
Exploit locality with caches
cache coherence protocols

37
Distributed Systems Design Principles

Use timeout for revocation
resource locking, validity of cache, etc.
Use a standard remote invocation mechanism
Trust only programs on physically secure machines
Use encryption for authentication and data
security
Try to prove distributed algorithms
formal specification of operations
Capabilities might be useful
authentication and access right embedded in the
clients request

38
Distributed Systems Architecture HW
Multi-processors bus-based vs.
switch-based ?not a distributed system
39
Distributed Systems Architecture HW (cont.)

Multi-computers
homogeneous systems
bus-based vs. switch-based
heterogeneous systems
node heterogeneity
network heterogeneity
? distributed systems hide heterogeneity

40
Distributed Systems Architecture SW

DOS (Distributed Operating Systems)
NOS (Network Operating Systems)
Middleware

41
Distributed Systems Architecture SW (cont.)

Distributed operating systems a single system
view
Multi-computer OS
Distributed shared memory system

42
Distributed Systems Architecture SW (cont.)

Network operating systems
transparency via network OS services

43
Distributed Systems Architecture SW (cont.)

Middleware
Harmonization of pros of DOS and NOS
DOS transparency ease of use
NOS scalability openness
Improve distribution transparency of NOS.

44
Distributed Systems Architecture SW (cont.)

Comparison

45
Distributed Systems Architecture SW (cont.)

Client-Server Model
Client a process wishing to access the
resources on a different computer
Server a process managing the shared resources
which is allowed to a client
HTTP server vs. Web browser
Services provided by multiple servers
Functional distribution
Replication
Proxy servers and caches
Increase availability and performance of the
service
Peer processes
Processes without any distinction between clients
and servers
Distribution of control and load

46
Distributed Systems Architecture SW (cont.)

Multi-tier architecture
Alternatives (a) thru (e)

47
Variations of Client-Server Model

Mobile code
Code at the client downloaded to the server
e.g., Applets
Mobile agents
Executing program (code and data) that goes from
one computer to another in a network carrying out
a task
e.g., worm program (Xerox PARC)
Network computers
Applications are run locally but the files are
managed by a remote file server
e.g., NFS
Thin clients
executing windows-based user interface on local
computer while application executes on compute
server
e.g., X11 Server

48
Variations of Client-Server Model (cont.)

Mobile devices
capable of wireless networking
e.g. personal digital assistants (PDAs)
Spontaneous networking
the form of distribution that integrates mobile
devices and other devices into a given network
Key features
easy connection to a local network
A device brought into a new network environments
is transparently reconfigured to obtain
connectivity there
easy integration with local services
automatic discovery of available services with no
special configuration
discovery services

49
Variations of Client-Server Model (cont.)

Spontaneous networking (cont.)
Issues
supporting convenient connection and integration
limited connectivity
how the system can support the user so that they
can continue to work while disconnected
security and privacy
track of users location
Discovery services
to accept and store details of services that are
available on the network and to respond to
queries from clients about them
Interfaces of discovery services
registration service accept registration
requests from servers, stores properties in
database of currently available services
lookup service match requested services with
available servers

50
Fundamental Models of Distributed Systems

Main consideration points
Interaction
Communication and coordination
Failure
Classification of faults and tolerance methods
Security
Classification of attacks and protection
mechanisms

51
Interaction Model

Process interaction in a distributed system
Separation transparency over a communication
subsystem
two factors
Communication performance
Latency
Bandwidth
Jitter
Clock synchronization
Adjust clock drifts via either physical clock or
logical clock
Two variants of interaction model
synchronous distributed systems
Bounded time on process execution, message
delivery, and clock drift
Use timeout for detecting failures
asynchronous distributed systems
no bounds on process execution, message delivery,
and clock drift

52
Interaction Model (cont.)

Event ordering
Example
X sends a message Meeting(m1)
Y reads Meeting and then replies Re
Meeting(m2)
Z reads Meeting(m1) and Re Meeting(m2) and
then replies ReMeeting(m3)
A sees m3, m1, and m2 in wrong order (due to
independent delay)

53
Interaction Model (cont.)

Event ordering (cont.)
using physical time
not feasible in real since perfect clock
synchronization is impossible
using logical time Lamport78
execution of a distributed system can be
described in terms of events and their ordering
despite the lack of accurate clocks
event ordering rules (partial ordering relation)
if e1 and e2 happen in the same process, and e2
happens after e1, then e1?e2
Y receives m1 before sending m2 (logical time 2
and 3)
if e1 is the sending of a message and e2 is the
receiving of the message, then e1?e2
X sends m1 before Y receives m1 (logical time 1
and 2)
Y sends m2 before X receives m2 (logical time 3
and 4)

54
Failure Model

Omission failures
fail to perform actions a process or
communication channel is supposed to do
process omission failures
crash halt and remain halted
a process crash is fail-stop if other processes
can detect certainly that the process has crashed
detection by timeout in synchronous systems
cf. asynchronous systems
communication omission failures
fail to transport a message from a senders
outgoing buffer to a receivers incoming buffer
possible causes
buffer overflow and/or transmission error
Derived failures
send-omission failure
channel failures
receive-omission failures

55
Failure Model (cont.)

Arbitrary failures
the term arbitrary or Byzantine to describe the
worst possible failure semantics (cf. omission
and timing failures are called benign)
a process arbitrarily omits intended steps or
takes unintended steps
set or return wrong values
cant detect by timeout
communication arbitrary failures
message contents corruption or delivery of
non-existent messages and duplicate messages
detect by checksums or sequence numbers

56
Failure Model (cont.)

Timing failures
applicable only in synchronous systems
time limits are set on process execution, message
delivery, and clock drift rate
clock failures
exceeding the bounds on clock drift rate
performance failures
exceeding the bounds on the interval between two
processing steps or message transmission

57
Failure Model (cont.)

Masking failures
by hiding failures or by converting them into a
more acceptable type of failures (e.g.,
checksums)
retransmission - masking communication omission
failures
replication - masking process crashes
reliable communication (masking communication
omission failures)
validity any message is eventually delivered
integrity the identical message is delivered
exactly once
duplicate checking by sequence number
security measures against spurious message and
replaying or tampering with messages

58
Security Model

Protecting objects
access right allows a specified principal to
perform a specified operation of an object
principal authority (user or process) on which a
invocation/result is issued
Securing processes and channels
to identify/defeat security threats to processes
interaction by message exchange
identifying security threats
threats to processes message source spoofing
threats to channels message copying, altering,
replaying

59
Security Model (cont.)

Protection mechanisms
encryption scrambles a message to hide its
contents by using secret keys shared only by
processes involved
message authentication proves the identities of
senders
an encrypted part included in a message
secure channel
reliable knowledge of the identity of the
principal behind a invocation/result
privacy and integrity (protection against
tampering)
prevention of message replaying or reordering
(physical or logical time stamp)
e.g. Virtual Private Networks (VPN), Secure
Sockets Layer (SSL) protocol
Other possible threats
denial of service interfering with the
activities of authorized users
making excessive, invalid invocations or
transmission -gt resource overloading,
e.g., pings to cnn.com
mobile code
e.g., email worm as a Trojan horse