Outline

About This Presentation

Title:

Outline

Description:

Tightly-coupled operating system for multi-processors and homogeneous multicomputers ... This layer is the last part of a basic network protocol stack ...

Slides: 92

Provided by: xiuwe

Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes

1
Outline

Announcement
Midterm Review
Distributed File Systems continued
If we have time

2
Announcements

Please turn in your homework 3 at the beginning
of class
The midterm will be on March 20
This coming Thursday
It will be an open-book, open-note exam

3
Operating System

An operating system is a layer of software on a
bare machine that performs two basic functions
Resource management
To manage resources so that they are used in an
efficient and fair manner
User friendliness

4
Distributed Systems

A distributed system is a collection of
independent computers that appears to its users
as a single coherent system
Independent computers mean that they do not share
memory or clock
The computers communicate with each other by
exchanging messages over a communication network

5
Distributed Systems cont.
6
Distributed Systems cont.

Advantages
The computing power of a group of cheap
workstations can be enormous
Decisive price/performance advantage over
traditional time-sharing systems
Resource sharing
Enhanced performance
Improved reliability and availability
Modular expandability

7
Distributed System Architecture cont.

Distributed systems are often classified based on
the hardware
Multiprocessor systems
Homogeneous multi-computer systems
Heterogeneous multi-computer systems

8
Distributed Operating Systems

Hardware for distributed systems is important,
but the software largely determines what a
distributed system looks like to a user
Distributed operating systems are much like the
traditional operating systems
Resource management
User friendliness
The key concept is transparency

9
Distributed Operating Systems cont.

In a truly distributed operating system, the user
views the system as a virtual uniprocessor system
even though physically it consists of multiple
computers
In other words, the use of multiple computers and
accessing remote data and resources should be
invisible to the user

10
Overview of Different Kinds of Distributed Systems
11
Multicomputer Operating Systems

General structure of a multicomputer operating
system

12
Network Operating System
13
Middleware and Openness

In an open middleware-based distributed system,
the protocols used by each middleware layer
should be the same, as well as the interfaces
they offer to applications.

14
Comparison Between Systems
15
Issues in Distributed Operating Systems

Absence of global knowledge
In a distributed system, due to the
unavailability of a global memory and a global
clock and due to unpredictable message delays, it
is practically impossible for a computer to
collect up-to-date information about the global
state of the distributed system
Therefore a fundamental problem is to develop
efficient techniques to implement decentralized
system-wide control
Another problem is how to order all the events

16
Issues in Distributed Operating Systems cont.

Naming
Plays an important role in achieving location
transparency
A name service maps a logical name into a
physical address by making use of a table lookup,
an algorithm, or a combination of both
In distributed systems, the tables may be
replicated and stored at many places
Consider naming in a distributed file system

17
Issues in Distributed Operating Systems cont.

Scalability
Systems generally grow with time, especially
distributed systems
Scalability requires that the growth should not
result in system unavailability or degraded
performance
This puts additional constraints on design
approaches

18
Issues in Distributed Operating Systems cont.

Compatibility
Refers to the interoperability among the
resources in a system
Three different levels
Binary level
All processors execute the same binary
instruction repertoire
Virtual binary level
Execution level
Same source code can be compiled and executed
properly
Protocol level
A common set of protocols

19
Issues in Distributed Operating Systems cont.

Process synchronization
The synchronization of processes in distributed
systems is difficult because of the
unavailability of shared memory
The system needs to synchronize processes running on
different computers when they try to concurrently
access a shared resource
This is the mutual exclusion problem, as in
classical operating systems

20
Issues in Distributed Operating Systems cont.

Resource management
Resource management needs to make both local and
remote resources available to users in an
effective manner
Data migration
Distributed file system
Distributed shared memory
Computation migration
Remote procedure call
Distributed scheduling

21
Issues in Distributed Operating Systems cont.

Structuring
The distributed operating system requires some
additional constraints on the structure of the
underlying operating system
The collective kernel structure
An operating system is structured as a collection
of processes that are largely independent of each
other
Object-oriented operating system
The operating system's services are implemented
as objects

22
Clients and Servers

General interaction between a client and a server.

23
Layered Protocols

Layers, interfaces, and protocols in the OSI
model.

24
Network Layer

The primary task of a network layer is routing
The most widely used network protocol is the
connection-less IP (Internet Protocol)
Each IP packet is routed to its destination
independent of all others
A connection-oriented protocol is gaining
popularity
Virtual channel in ATM networks

25
Transport Layer

This layer is the last part of a basic network
protocol stack
In other words, this layer can be used by
application developers
An important aspect of this layer is to provide
end-to-end communication
The connection-oriented Internet transport protocol
is called TCP (Transmission Control Protocol)
The Internet protocol suite also supports a
connectionless transport protocol called UDP
(User Datagram Protocol)

26
Sockets

Socket primitives for TCP/IP.

27
Sockets cont.

Connection-oriented communication pattern using
sockets.
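
The connection-oriented pattern can be sketched with Python's standard socket module. This is only an illustration under stated assumptions: the echo service, the loopback address, and the helper names serve_one and echo_once are hypothetical and not part of the slides. The comments mark where each socket primitive appears.

```python
import socket
import threading

def serve_one(server_sock: socket.socket) -> None:
    """Server side after listen: accept one client, echo its data, close."""
    conn, _addr = server_sock.accept()        # accept: block until a client connects
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(1024)           # receive until the client stops sending
            if not chunk:
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks))        # send the reply; leaving the block closes

def echo_once(message: bytes) -> bytes:
    """Run one request/reply exchange over a loopback TCP connection."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket
    server_sock.bind(("127.0.0.1", 0))        # bind; port 0 lets the OS pick a free port
    server_sock.listen(1)                     # listen
    port = server_sock.getsockname()[1]
    server = threading.Thread(target=serve_one, args=(server_sock,))
    server.start()

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client: # socket
        client.connect(("127.0.0.1", port))   # connect
        client.sendall(message)               # send
        client.shutdown(socket.SHUT_WR)       # tell the server the request is complete
        reply = b""
        while True:
            chunk = client.recv(1024)         # receive
            if not chunk:
                break
            reply += chunk

    server.join()
    server_sock.close()                       # close
    return reply
```

Calling echo_once(b"hello") performs the full socket, bind, listen, accept, connect, send, receive, close sequence once and returns the echoed bytes.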

28
Socket Programming

Review
IP
TCP
UDP
Port
Server Design Issues
Iterative vs. concurrent server
Stateless vs. stateful server
Multithreaded server

29
A Multithreaded Server
30
The Message Passing Model

The message passing model provides two basic
communication primitives
Send and receive
Send has two logical parameters, a message and
its destination
Receive has two logical parameters, the source
and a buffer for storing the message

31
Semantics of Send and Receive Primitives

There are several design issues regarding send
and receive primitives
Buffered or un-buffered
Blocking vs. non-blocking primitives
With blocking primitives, the send does not
return control until the message has been sent or
received and the receive does not return control
until a message is copied to the buffer
With non-blocking primitives, the send returns
control as soon as the message has been copied out
of the sender's buffer, and the receive
signals its intention to receive a message and
provides a buffer for it

32
Semantics of Send and Receive Primitives cont.

Synchronous vs. asynchronous primitives
With synchronous primitives, a SEND primitive is
blocked until a corresponding RECEIVE primitive
is executed
With asynchronous primitives, a SEND primitive
does not block if there is no corresponding
execution of a RECEIVE primitive
The messages are buffered
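
The difference between the two SEND semantics can be sketched inside a single program, with threads standing in for processes. The Channel class and its method names are assumptions made for illustration; a real system would implement these primitives in the kernel or a communication library.

```python
import queue
import threading

class Channel:
    """A one-way channel offering asynchronous and synchronous SEND."""

    def __init__(self):
        self._buffer = queue.Queue()           # unbounded message buffer

    def send_async(self, msg):
        # Asynchronous SEND: returns as soon as the message is buffered.
        self._buffer.put((msg, None))

    def send_sync(self, msg):
        # Synchronous SEND: blocks until a matching RECEIVE has taken the message.
        received = threading.Event()
        self._buffer.put((msg, received))
        received.wait()

    def receive(self):
        # RECEIVE: blocks until a message is available.
        msg, received = self._buffer.get()
        if received is not None:
            received.set()                     # release the blocked synchronous sender
        return msg

ch = Channel()
log = []

ch.send_async("m1")                            # no receiver yet, but this returns at once

def sync_sender():
    ch.send_sync("m2")                         # blocks here until the main thread receives
    log.append("sender resumed")

t = threading.Thread(target=sync_sender)
t.start()
first = ch.receive()
second = ch.receive()
t.join()
```

The asynchronous send completes before any receive is issued, while the thread calling send_sync resumes only after the second receive has picked up its message.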

33
Remote Procedure Call

RPC is designed to hide all the details from
programmers
Overcome the difficulties with message-passing
model
It extends the conventional local procedure calls
to calling procedures on remote computers

34
Steps of a Remote Procedure Call cont.
35
Remote Procedure Call cont.

Design issues
Structure
Mostly based on stub procedures
Binding
Through a binding server
The client specifies the machine and service
required
Parameter and result passing
Representation issues
By value and by reference

36
Remote Object Invocation

Extend RPC principles to objects
The key feature of an object is that it
encapsulates data (called state) and the
operations on those data (called methods)
Methods are made available through an interface
The separation between interfaces and the objects
implementing these interfaces allows us to place
an interface at one machine, while the object
itself resides on another machine

37
Distributed Objects

Common organization of a remote object with
client-side proxy.

38
Inherent Limitations of a Distributed System

Absence of a global clock
In a centralized system, time is unambiguous
In a distributed system, there exists no system
wide common clock
In other words, the notion of global time does
not exist
Impact of the absence of global time
Difficult to reason about temporal order of
events
Makes it harder to collect up-to-date information
on the state of the entire system

39
Inherent Limitations of a Distributed System

Absence of shared memory
An up-to-date state of the entire system is not
available to any individual process
This information, however, is necessary to reason
about the system's behavior, for debugging, and for
recovering from failures

40
Lamport's Logical Clocks

Logical clocks
For a wide class of algorithms, what matters is the
internal consistency of clocks, not whether they
are close to the real time
For these algorithms, the clocks are often called
logical clocks
Lamport proposed a scheme to order events in a
distributed system using logical clocks

41
Lamport's Logical Clocks cont.

Definitions
Happened before relation
The happened before relation ($\rightarrow$) captures the causal
dependencies between events
It is defined as follows
$a \rightarrow b$, if a and b are events in the same process
and a occurred before b.
$a \rightarrow b$, if a is the event of sending a message m
in a process and b is the event of receipt of the
same message m by another process
If $a \rightarrow b$ and $b \rightarrow c$, then $a \rightarrow c$, i.e., $\rightarrow$ is
transitive

42
Lamport's Logical Clocks cont.

Definitions continued
Causally related events
Event a causally affects event b if $a \rightarrow b$
Concurrent events
Two distinct events a and b are said to be
concurrent (denoted by $a \parallel b$) if $a \nrightarrow b$ and $b \nrightarrow a$
For any two distinct events, either $a \rightarrow b$, $b \rightarrow a$, or $a \parallel b$

43
Lamport's Logical Clocks cont.

Implementation rules
IR1 Clock $C_i$ is incremented between any two
successive events in process $P_i$
$C_i := C_i + d \quad (d > 0)$
IR2 If event a is the sending of message m by
process $P_i$, then message m is assigned a
timestamp $t_m = C_i(a)$. On receiving the same
message m by process $P_j$, $C_j$ is set to
$C_j := \max(C_j, t_m + d)$
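
The two implementation rules can be sketched as a small class. The class name LamportClock and its methods are hypothetical; the sketch also assumes that a receive is itself an event, so it applies IR1 before IR2.

```python
class LamportClock:
    """Scalar logical clock for one process, with increment d > 0."""

    def __init__(self, d: int = 1):
        self.time = 0
        self.d = d

    def tick(self) -> int:
        # IR1: advance the clock between successive local events.
        self.time += self.d
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned value is the timestamp t_m carried by m.
        return self.tick()

    def receive(self, tm: int) -> int:
        # The receive is an event (IR1), then IR2: C_j := max(C_j, t_m + d).
        self.tick()
        self.time = max(self.time, tm + self.d)
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.tick()                  # local event at P1, clock becomes 1
tm = p1.send()             # send event at P1, timestamp 2
at_receive = p2.receive(tm)
```

With d = 1, the receive event at P2 gets clock value 3, which is greater than the send timestamp 2, so the clock values respect the happened before relation for this message.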

44
An Example
45
Total Ordering Using Lamport's Clocks

If a is any event at process $P_i$ and b is any
event at process $P_j$, then $a \Rightarrow b$ if and only if
either $C_i(a) < C_j(b)$, or $C_i(a) = C_j(b)$ and $P_i \prec P_j$
Where $\prec$ is any arbitrary relation that
totally orders the processes to break ties
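
One way to realize this total order is to compare (clock value, process id) pairs, using the numeric order of process ids as the tie-breaking relation. That choice of tie-breaker, and the function name totally_before, are assumptions for illustration.

```python
def totally_before(a, b):
    """a and b are (clock_value, process_id) pairs for two events."""
    (ca, pa), (cb, pb) = a, b
    # Smaller clock value first; equal clock values are ordered by process id.
    return ca < cb or (ca == cb and pa < pb)

events = [(2, 1), (1, 2), (2, 0), (1, 1)]
# Python compares tuples lexicographically, which is the same rule.
ordered = sorted(events)
```

Because no two events in the same process share a clock value, every pair of distinct events is ordered one way or the other.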

46
A Limitation of Lamport's Clocks

In Lamport's system of logical clocks
If $a \rightarrow b$, then $C(a) < C(b)$
The reverse is not necessarily true if the events
have occurred on different processes

47
A Limitation of Lamport's Clocks cont.
48
Vector Clocks

Implementation rules
IR1 Clock $C_i$ is incremented between any two
successive events in process $P_i$
$C_i[i] := C_i[i] + d \quad (d > 0)$
IR2 If event a is the sending of message m by
process $P_i$, then message m is assigned a
vector timestamp $t_m = C_i(a)$. On receiving the same
message m by process $P_j$, $C_j$ is set to
$\forall k,\ C_j[k] := \max(C_j[k], t_m[k])$
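
The vector clock rules can be sketched the same way as the scalar ones. The class name VectorClock is hypothetical, and the sketch assumes that a receive counts as an event at the receiver, so its own component is incremented before the component-wise maximum is taken.

```python
class VectorClock:
    """Vector clock for process i in a system of n processes."""

    def __init__(self, n: int, i: int, d: int = 1):
        self.v = [0] * n
        self.i = i
        self.d = d

    def tick(self):
        # IR1: only the process's own component advances on a local event.
        self.v[self.i] += self.d
        return list(self.v)

    def send(self):
        # Sending is an event; the returned copy is the vector timestamp t_m.
        return self.tick()

    def receive(self, tm):
        # The receive is an event (IR1), then IR2: component-wise maximum with t_m.
        self.tick()
        self.v = [max(own, other) for own, other in zip(self.v, tm)]
        return list(self.v)

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
tm = p0.send()             # P0 sends with timestamp [1, 0]
p1.tick()                  # local event at P1, clock [0, 1]
after = p1.receive(tm)     # P1 now knows about P0's first event
```

After the receive, P1's vector records both its own two events and the one event of P0 that it has learned about.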

49
Vector Clocks cont.
50
Vector Clocks cont.

Assertion
At any instant, $\forall i, \forall j: C_i[i] \ge C_j[i]$
Events a and b are causally related if $t_a < t_b$ or
$t_b < t_a$. Otherwise, these events are concurrent
In a system of vector clocks, $a \rightarrow b$ if and only if $t_a < t_b$
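
Comparing two vector timestamps can be sketched as follows, assuming the usual definition that one vector is less than another when every component is less than or equal and the vectors are not identical. The function names are hypothetical, and the concurrency test is meant for timestamps of distinct events.

```python
def vc_less(ta, tb):
    """True if ta < tb: every component <=, and the vectors differ."""
    return all(x <= y for x, y in zip(ta, tb)) and list(ta) != list(tb)

def concurrent(ta, tb):
    """Two distinct events are concurrent if neither timestamp is less than the other."""
    return not vc_less(ta, tb) and not vc_less(tb, ta)
```

For example, [1, 0] is less than [1, 2], so those events are causally related, whereas [1, 0] and [0, 1] are not comparable, so the events are concurrent.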

51
Causal Ordering of Messages

The causal ordering of messages tries to maintain
the same causal relationship that holds among
message send events with the corresponding
message receive events
In other words, if Send(M1) $\rightarrow$ Send(M2), then
Receive(M1) $\rightarrow$ Receive(M2)
This is different from causal ordering of events

52
Causal Ordering of Messages cont.
53
Causal Ordering of Messages cont.

The basic idea
It is very simple
Deliver a message only when no causality
constraints are violated
Otherwise, the message is not delivered
immediately but is buffered until all the
preceding messages are delivered
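
The buffering idea can be sketched for the broadcast case, in the style of vector-timestamp protocols such as Birman-Schiper-Stephenson. The delivery condition used here is an assumption stated in the code comments, and the class and attribute names are hypothetical; the sketch is not taken from the slides.

```python
class CausalReceiver:
    """Delivers broadcast messages at one site in causal order."""

    def __init__(self, n: int):
        self.delivered = [0] * n   # delivered[k]: broadcasts from site k delivered here
        self.buffer = []           # messages received but not yet deliverable
        self.log = []              # payloads in delivery order

    def _deliverable(self, sender, tm):
        # Assumed condition: m is the next broadcast from its sender, and every
        # broadcast the sender had seen from other sites is already delivered here.
        return (tm[sender] == self.delivered[sender] + 1 and
                all(tm[k] <= self.delivered[k]
                    for k in range(len(tm)) if k != sender))

    def on_receive(self, sender, tm, payload):
        self.buffer.append((sender, tm, payload))
        progress = True
        while progress:            # one delivery may unblock buffered messages
            progress = False
            for item in list(self.buffer):
                s, t, p = item
                if self._deliverable(s, t):
                    self.buffer.remove(item)
                    self.delivered[s] = t[s]
                    self.log.append(p)
                    progress = True

# Site 0 broadcasts M1; site 1 delivers M1 and then broadcasts M2.
# Site 2 happens to receive M2 first.
site2 = CausalReceiver(3)
site2.on_receive(1, [1, 1, 0], "M2")   # buffered: M1 has not been delivered yet
buffered_first = len(site2.buffer)
site2.on_receive(0, [1, 0, 0], "M1")   # M1 is delivered, which then releases M2
```

Although M2 arrives first, site 2 delivers M1 before M2, matching the order of the send events.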

54
Birman-Schiper-Stephenson Protocol
55
Schiper-Eggli-Sandoz Protocol
56
Schiper-Eggli-Sandoz Protocol cont.
57
Schiper-Eggli-Sandoz Protocol cont.
58
Local State

Local state
For a site $S_i$, its local state at a given time is
defined by the local context of the distributed
application, denoted by $LS_i$.
More notations
$m_{ij}$ denotes a message sent by $S_i$ to $S_j$
$send(m_{ij})$ and $rec(m_{ij})$ denote the corresponding
sending and receiving events.

59
Definitions cont.
60
Definitions cont.
61
Global State cont.
62
Definitions cont.
Strongly consistent global state: A global state
is strongly consistent if it is consistent and
transitless
63
Global State cont.
64
Chandy-Lamport's Global State Recording Algorithm
65
Cuts of a Distributed Computation

A cut is a graphical representation of a global
state
A consistent cut is a graphical representation of
a consistent global state
Definition
A cut of a distributed computation is a set
$C = \{c_1, c_2, \ldots, c_n\}$, where $c_i$ is a cut event at
site $S_i$ in the history of the distributed
computation

66
Cuts of a Distributed Computation cont.
67
Cuts of a Distributed Computation cont.
68
Cuts of a Distributed Computation cont.
69
Cuts of a Distributed Computation cont.
70
Cuts of a Distributed Computation cont.
71
The Critical Section Problem

When processes (centralized or distributed)
interact through shared resources, the integrity
of the resources may be violated if the accesses
are not coordinated
The resources may not record all the changes
A process may obtain inconsistent values
The final state of the shared resource may be
inconsistent

72
Mutual Exclusion

One solution to the problem is that at any time
at most one process can access the shared
resources
This solution is known as mutual exclusion
A critical section is a code segment in a process
in which shared resources are accessed
A process can have more than one critical section
There are problems which involve shared resources
where mutual exclusion is not the optimal solution

73
The Structure of Processes

Structure of process Pi
repeat
    entry section
    critical section
    exit section
    remainder section
until false
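
On a single machine, the same structure can be sketched with threads and a lock. This sketch uses a shared-memory lock as a stand-in for whatever entry and exit protocol a real algorithm provides, and it bounds the loop so that it terminates; the names process and counter are hypothetical.

```python
import threading

lock = threading.Lock()
counter = 0                     # the shared resource

def process(iterations: int) -> None:
    global counter
    for _ in range(iterations): # bounded stand-in for "repeat ... until false"
        lock.acquire()          # entry section
        try:
            counter += 1        # critical section
        finally:
            lock.release()      # exit section
        # remainder section: work that does not touch the shared resource

threads = [threading.Thread(target=process, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update happens inside the critical section, no increment is lost and the counter ends at 4 x 1000.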

74
Requirements of Mutual Exclusion Algorithms

Freedom from deadlocks
Two or more sites should not endlessly wait for
messages
Freedom from starvation
A site should not be forced to wait indefinitely
to execute its critical section
Fairness
Requests are executed in the order based on
logical clocks
Fault tolerance
The algorithm continues to work when some failures occur

75
Performance Measure for Distributed Mutual
Exclusion

The number of messages per CS invocation
Synchronization delay
The time required after a site leaves the CS and
before the next site enters the CS
System throughput = $1/(sd + E)$, where sd is the
synchronization delay and E the average CS
execution time
Response time
The time interval a request waits for its CS
execution to be over after its request messages
have been sent out
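
As a quick worked example of the throughput measure, assume (purely for illustration) a synchronization delay of 2 ms and an average CS execution time of 8 ms:

```latex
\[
\text{throughput} = \frac{1}{sd + E}
                  = \frac{1}{2\,\text{ms} + 8\,\text{ms}}
                  = \frac{1}{10\,\text{ms}}
                  = 100 \text{ CS executions per second}
\]
```

Halving the synchronization delay to 1 ms would raise this only to about 111 per second, since the CS execution time dominates in this example.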

76
Performance Measure for Distributed Mutual
Exclusion
77
A Centralized Algorithm

It is a simple solution
One site, called the control site, is responsible
for granting permission to the CS execution
To request the CS, a site sends a REQUEST message
to the control site
When a site is done with CS execution, it sends a
RELEASE message to the control site
The control site queues up the requests for the
CS and grants them permission one at a time
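
The control site's bookkeeping can be sketched as follows. Message passing is replaced by direct method calls, and the class name ControlSite and its attributes are hypothetical; the sketch assumes requests are granted in arrival order.

```python
from collections import deque

class ControlSite:
    """Grants permission to enter the CS to one site at a time."""

    def __init__(self):
        self.holder = None      # site currently in the CS, if any
        self.queue = deque()    # pending REQUESTs in arrival order
        self.grants = []        # order in which permission was granted

    def request(self, site):
        # Handle a REQUEST message.
        if self.holder is None:
            self._grant(site)
        else:
            self.queue.append(site)

    def release(self, site):
        # Handle a RELEASE message from the site leaving the CS.
        if site != self.holder:
            raise ValueError("RELEASE from a site that does not hold the CS")
        self.holder = None
        if self.queue:
            self._grant(self.queue.popleft())

    def _grant(self, site):
        self.holder = site
        self.grants.append(site)

control = ControlSite()
for s in ("S1", "S2", "S3"):
    control.request(s)
control.release("S1")
control.release("S2")
control.release("S3")
```

Each CS invocation costs three messages in this scheme (request, grant, release), and the control site is a single point of failure.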

78
Distributed Solutions

Non-token-based algorithms
Use timestamps to order requests and resolve
conflicts between simultaneous requests
Lamport's algorithm and the Ricart-Agrawala algorithm
Token-based algorithms
A unique token is shared among the sites
A site is allowed to enter the CS if it possesses
the token, and it continues to hold the token until
its CS execution is over; then it passes the
token to the next site

79
Lamport's Distributed Mutual Exclusion Algorithm

This algorithm is based on the total ordering
using Lamport's clocks
Each process keeps a Lamport's logical clock
Each process is associated with a unique id that
can be used to break the ties
In the algorithm, each process keeps a queue,
request_queue_i, which contains mutual exclusion
requests ordered by their timestamp and
associated id
The request set $R_i$ of each process consists of all the processes
The communication channel is assumed to be FIFO

80
Lamport's Distributed Mutual Exclusion Algorithm cont.
81
Lamport's Distributed Mutual Exclusion Algorithm cont.
82
Ricart-Agrawala Algorithm
83
A Simple Token Ring Algorithm

When the ring is initialized, one process is
given the token
The token circulates around the ring
It is passed from process k to process k+1 (modulo the ring size)
When a process acquires the token from its
neighbor, it checks to see if it is waiting to
enter its critical section
If so, it enters its CS
When exiting from its CS, it passes the token to
the next process
Otherwise, it passes the token to the next process
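
The circulation of the token can be simulated in a few lines. The function name token_ring and its parameters are hypothetical, and the simulation assumes each process wants the CS at most once.

```python
def token_ring(n, wants_cs, start=0, rounds=1):
    """Simulate token circulation; return the order in which processes enter the CS.

    n        -- ring size
    wants_cs -- ids of processes waiting to enter the CS
    start    -- process holding the token initially
    """
    pending = set(wants_cs)
    entered = []
    k = start
    for _ in range(n * rounds):
        if k in pending:         # holder is waiting: it enters and then exits its CS
            entered.append(k)
            pending.discard(k)
        k = (k + 1) % n          # pass the token from k to k+1 (modulo the ring size)
    return entered
```

With five processes, the token starting at process 2, and processes 1 and 3 waiting, the token visits 2, 3, 4, 0, 1, so process 3 enters first and process 1 second.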

84
Suzuki-Kasami's Algorithm

Data structures
Each site maintains a vector consisting of the
largest sequence number received so far from
each of the other sites
The token consists of a queue of requesting sites
and an array of integers, consisting of the
sequence number of the request that each site
executed most recently
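
These data structures can be sketched as two small classes. The attribute names rn and ln follow the RN and LN arrays commonly used when describing this algorithm; the is_outstanding test is an assumption drawn from the standard description, not from the slide, and message passing is not modeled.

```python
from collections import deque

class Token:
    """The unique token: a queue of requesting sites plus the array LN."""

    def __init__(self, n: int):
        self.queue = deque()    # sites waiting for the token
        self.ln = [0] * n       # ln[j]: sequence number of site j's most recently executed request

class Site:
    """Per-site state: the vector RN of largest sequence numbers seen."""

    def __init__(self, site_id: int, n: int):
        self.site_id = site_id
        self.rn = [0] * n       # rn[j]: largest sequence number received from site j

    def on_request(self, j: int, seq: int) -> None:
        # Record a REQUEST(j, seq); an outdated request never lowers rn[j].
        self.rn[j] = max(self.rn[j], seq)

def is_outstanding(site: Site, token: Token, j: int) -> bool:
    # Assumed rule: site j has an unserved request exactly when rn[j] is
    # one ahead of the token's record ln[j].
    return site.rn[j] == token.ln[j] + 1

n = 3
holder = Site(0, n)
token = Token(n)
holder.on_request(2, 1)                               # site 2 issues its first request
pending_before = is_outstanding(holder, token, 2)
token.ln[2] = 1                                       # site 2 has executed that request
pending_after = is_outstanding(holder, token, 2)
```

Comparing the two arrays is what lets the token holder tell a fresh request from an outdated one that has already been served.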

85
Suzuki-Kasami's Algorithm cont.
86
Distributed Deadlock Detection

In distributed systems, the system state can be
represented by a wait-for graph (WFG)
In WFG, nodes are processes and there is a
directed edge from node P1 to node P2 if P1 is
blocked and is waiting for P2 to release some
resource
The system is deadlocked if there is a directed
cycle or knot in its WFG
The problem is how to maintain the WFG and detect
cycle/knot in the graph
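
When the whole WFG is available at one place, as in a centralized detection scheme, finding a cycle is a standard depth-first search. The dictionary representation and the function name has_cycle are assumptions for illustration; knot detection and the harder problem of maintaining the graph across sites are not covered.

```python
def has_cycle(wfg):
    """wfg maps each process to the processes it is waiting for."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current DFS path, finished
    color = {}

    def visit(p):
        color[p] = GREY
        for q in wfg.get(p, ()):
            state = color.get(q, WHITE)
            if state == GREY:        # edge back into the current path: a cycle
                return True
            if state == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color.get(p, WHITE) == WHITE and visit(p) for p in list(wfg))
```

For instance, P1 waiting for P2, P2 for P3, and P3 for P1 forms a cycle, while the chain P1, P2, P3 without the last edge does not.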

87
Distributed Deadlock Detection cont.

Centralized detection algorithms
Distributed deadlock algorithms
Path-pushing
Edge-chasing
Diffusion computation
Global state detection
You need to know the basic ideas but not the
details about those algorithms

88
Agreement Protocols

In distributed systems, sites are often required
to reach mutual agreement
In distributed database systems, data managers
must agree on whether to commit or to abort a
transaction
Reaching an agreement requires that the sites have
knowledge about values at other sites
Agreement when the system is free from failures
Agreement when the system is prone to failure

89
Agreement Problems