Title: Computer Science 328 Distributed Systems
1. Computer Science 328: Distributed Systems
- Lecture 19
- Replication Control
2. Replication
- Enhancing services by replicating data
- Performance Enhancement
  - Example: the workload is shared between the servers by binding all the server IP addresses to the site's DNS name. A DNS lookup of the site returns one of the servers' IP addresses, in round-robin fashion.
- Fault Tolerance
  - Under the fail-stop model, if up to $f$ of $f + 1$ servers crash, at least one remains to supply the service.
  - Under the Byzantine failure model, if up to $f$ servers can exhibit failures, then a group of $2f + 1$ servers can provide a correct service.
- Increased Availability
  - The service may not be available when servers fail or when the network is partitioned.
  - $P$ = probability that one server fails; $1 - P$ = availability of that server. E.g. $P = 5\%$: the server is available 95% of the time.
  - $P^n$ = probability that all $n$ servers fail (assuming independent failures); $1 - P^n$ = availability of the service. E.g. $P = 5\%$, $n = 3$: the service is available 99.9875% of the time.
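The availability formula above can be checked numerically. The following is a minimal sketch, assuming servers fail independently with the same probability; the function name `service_availability` is illustrative and not part of the lecture.

```python
def service_availability(p_fail: float, n: int) -> float:
    """Probability that at least one of n servers is up.

    Assumes each server fails independently with probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The service is unavailable only when all n servers have failed.
    return 1.0 - p_fail ** n


for n in (1, 2, 3):
    print(f"n = {n}: availability = {service_availability(0.05, n):.6f}")
```

Each additional replica multiplies the probability of total failure by $P$, so the gain from each extra server shrinks quickly.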
3. Basic Model of Replication
- Replication Transparency
  - Users need not know that multiple physical copies of the data exist.
- Replication Consistency
  - Data is consistent on all (or some) of the replicas.
[Figure: basic replication model. Clients, front ends, and replica managers (RMs) on the servers that make up the service.]
4. Replication Management
- Request Communication
  - Requests can be made to a single RM or to multiple RMs.
- Coordination: the RMs decide
  - whether the request is to be applied
  - the order of requests
    - FIFO ordering: if an FE issues $r$ and then $r'$, then any correct RM handles $r$ and then $r'$.
    - Causal ordering: if the issue of $r$ happened before the issue of $r'$, then any correct RM handles $r$ and then $r'$.
    - Total ordering: if a correct RM handles $r$ and then $r'$, then any correct RM handles $r$ and then $r'$.
- Execution: the RMs execute the request tentatively.
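One common way to obtain FIFO ordering is a hold-back queue keyed by per-front-end sequence numbers. The sketch below assumes each front end numbers its requests 0, 1, 2, ...; the class `FifoReplicaManager` and its fields are hypothetical names used only for illustration.

```python
from collections import defaultdict


class FifoReplicaManager:
    """Handles each front end's requests in the order they were issued."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # next expected sequence number per FE
        self.held_back = defaultdict(dict)  # fe_id -> {seq: request} waiting to be handled
        self.handled = []                   # (fe_id, request) in the order handled

    def receive(self, fe_id, seq, request):
        if seq < self.next_seq[fe_id]:
            return  # duplicate of a request that was already handled
        self.held_back[fe_id][seq] = request
        # Handle every request from this FE that is now in sequence.
        while self.next_seq[fe_id] in self.held_back[fe_id]:
            current = self.next_seq[fe_id]
            self.handled.append((fe_id, self.held_back[fe_id].pop(current)))
            self.next_seq[fe_id] = current + 1


rm = FifoReplicaManager()
rm.receive("fe1", 1, "r'")  # arrives early, so it is held back
rm.receive("fe1", 0, "r")   # releases r and then r'
print(rm.handled)
```

Requests from different front ends are not ordered relative to each other here; that would require causal or total ordering.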
5. Replication Management
- Agreement: the replica managers reach consensus on the effect of the request.
- Response
  - One or more replica managers respond to the front end.
  - Under the fail-stop model, the FE returns the first response to arrive; under the Byzantine failure model, it returns a response that a majority of the replica managers provide.
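The two response rules above can be sketched as follows. This is an illustrative sketch only: responses are assumed to be hashable values collected in arrival order, and the function names are hypothetical.

```python
from collections import Counter


def first_response(responses):
    """Fail-stop model: every response that arrives is correct, so use the first."""
    if not responses:
        raise RuntimeError("no response received")
    return responses[0]


def majority_response(responses, num_rms):
    """Byzantine model: accept a value only if a majority of all RMs returned it."""
    if not responses:
        raise RuntimeError("no response received")
    value, count = Counter(responses).most_common(1)[0]
    if count > num_rms // 2:
        return value
    raise RuntimeError("no value was returned by a majority of replica managers")


# Three RMs (2f + 1 with f = 1); one of them returns a faulty value.
print(first_response([42, 42, 42]))
print(majority_response([42, 7, 42], num_rms=3))
```

With $2f + 1$ replica managers and at most $f$ faulty ones, the correct value is always returned by at least $f + 1$ of them, which is a majority.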
6. Group Communication
- Static groups: group membership is pre-defined.
- Dynamic groups: members may join and leave as necessary, e.g. RMs.
[Figure: services for a process group. Group address expansion, multicast communication, and membership management, with group send, join, leave, and fail events.]
7. Views
- A group membership service maintains group views, which are lists of current group members.
- A new group view is generated when a member joins or leaves.
- A view $V_p(g)$ is process $p$'s understanding of its group (list of members).
- Example: $V_{p,0}(g) = (p)$, $V_{p,1}(g) = (p, q)$, $V_{p,2}(g) = (p, q, r)$, $V_{p,3}(g) = (p, r)$
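The view sequence in the example can be reproduced with a small membership tracker. This is a sketch under simplifying assumptions (a single, centralized service and no failures); `GroupMembershipService` is a hypothetical name, not an API from the lecture.

```python
class GroupMembershipService:
    """Generates a new view each time a member joins or leaves."""

    def __init__(self):
        self.views = []  # views[i] is the i-th view, a tuple of members

    def _current(self):
        return list(self.views[-1]) if self.views else []

    def join(self, process):
        members = self._current()
        if process not in members:
            self.views.append(tuple(members + [process]))

    def leave(self, process):
        members = self._current()
        if process in members:
            members.remove(process)
            self.views.append(tuple(members))


gms = GroupMembershipService()
gms.join("p")
gms.join("q")
gms.join("r")
gms.leave("q")
for i, view in enumerate(gms.views):
    print(i, view)
```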
8. Views
- An event occurs in a view $v_{p,i}(g)$ if, at the time of the event's occurrence, $p$ has delivered $v_{p,i}(g)$ but has not yet delivered $v_{p,i+1}(g)$.
- $p$ delivers a view by multicasting it to all the processes.
- Requirements for view delivery
  - Order: if $p$ delivers $v_i(g)$ and then $v_{i+1}(g)$, then no other process $q$ delivers $v_{i+1}(g)$ before $v_i(g)$.
  - Integrity: if $p$ delivers $v_i(g)$, then $p$ is in $v_i(g)$.
  - Non-triviality: if process $q$ joins a group and becomes reachable from process $p$, then eventually $q$ is always in the views that $p$ delivers.
9. View Synchronous Communication
- Extends reliable multicast to dynamic groups (with respect to changing views).
- The following guarantees are provided:
  - Agreement: correct processes deliver the same set of messages in any view.
    - If $p$ delivers $m$ in $V$ and then delivers $V'$, then all processes in $V \cap V'$ deliver $m$ in view $V$.
  - Integrity: if $p$ delivers message $m$, $p$ does not deliver $m$ again; also $p \in \mathrm{group}(m)$.
  - Validity: correct processes always deliver the messages they send. That is, if $p$ delivers message $m$ in view $v(g)$, and some process $q \in v(g)$ does not deliver $m$ in view $v(g)$, then the next view $v'(g)$ that $p$ delivers excludes $q$.
  - Order: if $p$ delivers $v_i(g)$ and then $v_{i+1}(g)$, then no other process $q$ delivers $v_{i+1}(g)$ before $v_i(g)$.
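The agreement guarantee can be tested against recorded executions. The sketch below assumes each process keeps a log of `("view", members)` and `("msg", m)` events in delivery order; this log format and the function names are assumptions made for illustration. It checks only processes that delivered both views.

```python
def messages_per_view(log):
    """Return (view, next_view, messages) triples for one process's log."""
    result = []
    current, msgs = None, set()
    for kind, value in log:
        if kind == "view":
            if current is not None:
                result.append((current, value, frozenset(msgs)))
            current, msgs = value, set()
        else:
            msgs.add(value)
    return result  # the last view has no successor yet, so it is omitted


def satisfies_agreement(logs):
    """logs maps each process to its event log; views are tuples of members."""
    delivered = {}  # (view, next_view) -> {process: messages delivered in view}
    for proc, log in logs.items():
        for view, nxt, msgs in messages_per_view(log):
            delivered.setdefault((view, nxt), {})[proc] = msgs
    for (view, nxt), by_proc in delivered.items():
        survivors = set(view) & set(nxt)
        distinct = {by_proc[p] for p in survivors if p in by_proc}
        if len(distinct) > 1:
            return False  # two survivors disagree on what was delivered in view
    return True


old, new = ("p", "q", "r"), ("q", "r")  # p crashes after multicasting m
allowed = {
    "q": [("view", old), ("msg", "m"), ("view", new)],
    "r": [("view", old), ("msg", "m"), ("view", new)],
}
not_allowed = {
    "q": [("view", old), ("msg", "m"), ("view", new)],
    "r": [("view", old), ("view", new), ("msg", "m")],
}
print(satisfies_agreement(allowed), satisfies_agreement(not_allowed))
```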
10. Example: View Synchronous Communication
[Figure: four view-synchronous delivery scenarios; two are allowed and two are not allowed.]
11. Linearizability
- Let the sequence of read and update operations that client $i$ performs in some execution be $o_{i1}, o_{i2}, \ldots$
- A single server managing a single copy of the objects would serialize the operations of the clients.
- A virtual interleaving of the operations does not necessarily physically occur at any particular replica manager, but it should establish the correctness of the execution.
- The strictest criterion is linearizability. A replicated shared object service is linearizable if, for any execution, there is some interleaving of the operations issued by all clients that
  - meets the specification of a single correct copy of the objects
  - is consistent with the real times at which each operation occurred during the execution
12. Sequential Consistency
- The real-time requirement of linearizability is hard, if not impossible, to achieve for most systems.
- A less strict criterion is sequential consistency. A replicated shared object service is sequentially consistent if, for any execution, there is some interleaving of the clients' operations that
  - meets the specification of a single correct copy of the objects
  - is consistent with the program order in which each individual client executes those operations
- The criterion does not require absolute time or total order, only that for each client the order in the sequence be consistent with the request order.
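The definition above suggests a direct, brute-force test: search for an interleaving that respects each client's program order and behaves like a single copy. The sketch below assumes a simple read/write store and operations written as `("write", key, value)` or `("read", key, observed_value)`; this encoding is an assumption for illustration. The search is exponential and suitable only for tiny histories.

```python
def sequentially_consistent(histories, initial=None):
    """histories: one list of operations per client, in program order."""

    def search(positions, store):
        if all(pos == len(h) for pos, h in zip(positions, histories)):
            return True  # every operation has been placed in the interleaving
        for i, history in enumerate(histories):
            pos = positions[i]
            if pos == len(history):
                continue
            kind, key, value = history[pos]
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            if kind == "write":
                if search(advanced, {**store, key: value}):
                    return True
            elif store.get(key, initial) == value:
                # A read is legal only if it observes the latest written value.
                if search(advanced, store):
                    return True
        return False

    return search(tuple(0 for _ in histories), {})


ok = [
    [("write", "x", 1)],
    [("read", "x", 0), ("read", "x", 1)],
]
bad = [
    [("write", "x", 1), ("write", "x", 2)],
    [("read", "x", 2), ("read", "x", 1)],
]
print(sequentially_consistent(ok, initial=0))
print(sequentially_consistent(bad, initial=0))
```

Because the search ignores real time, it tests sequential consistency rather than linearizability.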
13. Passive (Primary-Backup) Replication
- Request Communication: the request is issued to the primary RM and carries a unique request id.
- Coordination: the primary takes requests atomically, in order, and checks the id (it resends the stored response if the id is not new).
- Execution: the primary executes the request and stores the response.
- Agreement: if the request is an update, the primary sends the updated state, the request id, and the response to all backups.
- Response: the primary sends the response to the front end.
[Figure: passive replication. Clients and front ends communicate with the primary RM, which propagates updates to the backup RMs.]
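The five phases of passive replication can be sketched in a single process. This is a simplified illustration, not a fault-tolerant implementation: there is no network, no failover, and the classes `Primary` and `Backup` and the key-value operations are assumptions. Running single-threaded stands in for taking requests atomically and in order.

```python
class Backup:
    def __init__(self):
        self.state = {}
        self.responses = {}  # request id -> response, kept for a future takeover

    def apply_update(self, state, req_id, response):
        self.state = dict(state)
        self.responses[req_id] = response


class Primary:
    def __init__(self, backups):
        self.state = {}
        self.responses = {}
        self.backups = backups

    def handle(self, req_id, op, key, value=None):
        # Coordination: a known id means a retransmission, so resend the response.
        if req_id in self.responses:
            return self.responses[req_id]
        # Execution: apply the request and store the response.
        if op == "write":
            self.state[key] = value
            response = "ok"
        else:
            response = self.state.get(key)
        self.responses[req_id] = response
        # Agreement: updates are propagated to every backup.
        if op == "write":
            for backup in self.backups:
                backup.apply_update(self.state, req_id, response)
        # Response: returned to the front end.
        return response


backups = [Backup(), Backup()]
primary = Primary(backups)
primary.handle("fe1-1", "write", "x", 1)
primary.handle("fe1-1", "write", "x", 1)  # retransmission, not executed again
print(primary.handle("fe1-2", "read", "x"), backups[0].state)
```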
14. Fault Tolerance in Passive Replication
- The system implements linearizability, since the primary sequences operations in order.
- If the primary fails, a backup becomes the primary by agreement, and the surviving replica managers agree on which operations had been performed at the point when the new primary takes over.
- The above requirement is met if the replica managers (primary and backups) are organized as a group and the primary uses view-synchronous group communication to send updates to the backups.
- The system remains linearizable after the primary crashes.
- To tolerate up to $f$ Byzantine failures, $2f + 1$ replica managers are needed; otherwise $f + 1$ are needed for up to $f$ failures.
15. Active Replication
- Request Communication: the request contains a unique identifier and is multicast to all RMs by a reliable, totally ordered multicast.
- Coordination: group communication delivers the request to each RM in the same order.
- Execution: each replica executes the request.
- Agreement: none.
- Response: each replica sends its response directly to the FE; the FE can use the first response, $n$ responses, or all responses.
[Figure: active replication. Clients and front ends multicast requests to all the RMs.]
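Active replication can be sketched with deterministic replica managers and a stand-in for totally ordered multicast. In this sketch the `TotalOrderGroup` class simply calls every RM in one global order inside a single process; a real system would need a reliable, totally ordered multicast protocol. All names are hypothetical.

```python
class ReplicaManager:
    """A deterministic state machine over a key-value store."""

    def __init__(self):
        self.state = {}
        self.responses = {}  # request id -> response, to ignore duplicates

    def deliver(self, req_id, op, key, value=None):
        if req_id not in self.responses:
            if op == "write":
                self.state[key] = value
                self.responses[req_id] = "ok"
            else:
                self.responses[req_id] = self.state.get(key)
        return self.responses[req_id]


class TotalOrderGroup:
    """Stand-in for totally ordered multicast: same delivery order at every RM."""

    def __init__(self, rms):
        self.rms = rms

    def multicast(self, request):
        return [rm.deliver(*request) for rm in self.rms]


def front_end_request(group, request):
    # Crash-failure case: the first response is sufficient.
    return group.multicast(request)[0]


rms = [ReplicaManager() for _ in range(3)]
group = TotalOrderGroup(rms)
front_end_request(group, ("fe1-1", "write", "x", 1))
front_end_request(group, ("fe2-1", "write", "x", 2))
result = front_end_request(group, ("fe1-2", "read", "x"))
print(result, [rm.state for rm in rms])
```

Because every RM applies the same requests in the same order, all replicas end in the same state and no separate agreement phase is needed.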
16. Fault Tolerance in Active Replication
- Replica managers work as state machines, playing equivalent roles. If any one crashes, state is maintained by the others.
- The system implements sequential consistency.
  - The total order ensures that all correct replica managers process the same set of requests in the same order.
  - Each front end's requests are served in FIFO order (because the front end awaits a response before making the next request).
- If clients are multi-threaded and communicate with one another while waiting for responses from the service, we may need to incorporate causal ordering as well.
- Front ends can use consensus algorithms to check for Byzantine failures.
- The system can handle crashes of up to $f$ of $2f + 1$ nodes.