Reliable multicast - PowerPoint PPT Presentation

About This Presentation

Slides: 23

Provided by: Suku80

Learn more at: http://homepage.cs.uiowa.edu

Transcript and Presenter's Notes

1
Reliable multicast

Tolerates process crashes. The additional
requirements are:
All correct processes will receive multicasts
from all correct processes in the group.
Multicasts by faulty processes will be received
either by every correct process, or by none at
all.
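A common way to meet both requirements, which the slide does not spell out, is to have every process re-multicast a message the first time it receives it, before delivering it. If any correct process delivers m, it has already forwarded m to everyone, so all correct processes deliver m even when the original sender crashed part-way through its multicast. The sketch below simulates this with an in-memory queue. The function name `relay_multicast` and the crash model (a faulty sender reaches only a prefix of its destinations) are assumptions made for illustration.

```python
from collections import deque


def relay_multicast(group, sender, crash_after=None, crashed=frozenset()):
    """Simulate reliable multicast by relaying on first receipt.

    group: process ids; sender: the originator.
    crash_after: if not None, the sender crashes after reaching only the
    first `crash_after` destinations (a faulty sender).
    crashed: processes that are already down and never receive anything.
    Returns the set of processes that deliver the message.
    """
    down = set(crashed)
    targets = [p for p in group if p != sender]
    delivered = set()
    if crash_after is None:
        delivered.add(sender)  # a correct sender delivers its own message
    else:
        down.add(sender)
        targets = targets[:crash_after]  # crash part-way through the send loop
    queue = deque(targets)
    while queue:
        p = queue.popleft()
        if p in down or p in delivered:
            continue  # crashed process, or a duplicate copy
        # First receipt: forward to every other member, then deliver.
        queue.extend(q for q in group if q != p)
        delivered.add(p)
    return delivered


group = [0, 1, 2, 3]
for k in range(len(group)):
    # A faulty sender reaches k destinations before crashing.
    print(k, sorted(relay_multicast(group, sender=0, crash_after=k)))
```

For every k the correct processes 1, 2, 3 either all deliver the message or none do, which is the second requirement. The price is O(n²) messages per multicast, which is one reason scalability comes up next.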

2
A theorem on reliable multicast

In an asynchronous distributed system, total
order reliable multicast cannot be implemented
if even a single process may undergo a crash
failure.
Why? Because total order reliable multicast
could be used to solve consensus, which would
violate the FLP impossibility result.
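The reduction behind the FLP argument can be sketched in a few lines: every process multicasts its proposal, and each process decides the proposal carried by the first message it delivers. Total order makes that first message identical at every correct process, so all decisions agree. Here the total order layer is hypothetical and is modeled only by the per-process delivery sequences passed to `decide`, a name chosen for illustration.

```python
def decide(deliveries):
    """Consensus on top of a (hypothetical) total order multicast.

    Every process has already multicast its proposal; deliveries maps each
    process to the list of (origin, proposal) messages in the order it
    delivered them. Each process decides the proposal in the first message.
    """
    return {p: msgs[0][1] for p, msgs in deliveries.items() if msgs}


proposals = {0: "commit", 1: "abort", 2: "commit"}

# Total order: every process delivers the proposals in the same sequence.
common = [(1, proposals[1]), (0, proposals[0]), (2, proposals[2])]
print(decide({p: list(common) for p in proposals}))
# {0: 'abort', 1: 'abort', 2: 'abort'}

# Without total order, first deliveries can differ, and so can decisions.
print(decide({0: common, 1: common[1:] + common[:1], 2: common}))
```

Since this construction would give a crash-tolerant consensus algorithm, and FLP shows none exists in an asynchronous system, total order reliable multicast cannot exist there either.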

3
Scalable Reliable Multicast

IP multicast or application layer multicast has
to detect the loss of messages and use
retransmission for achieving reliability. For
large groups (like distance learning
applications) scalability is a major problem.

4
Scalable Reliable Multicast

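Difficult to scale:
Sender state explosion (the sender must keep
acknowledgement state for every receiver)
Message implosion (acknowledgements from all
receivers converge on the sender)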

[Figure: the sender keeps state for receiver 1, receiver 2, ..., receiver n]

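To make both problems concrete, here is a minimal sketch of a purely ACK-based sender. The class and its methods are hypothetical and use cumulative ACKs. The sender keeps one entry per receiver, must buffer every message until the slowest receiver has acknowledged it, and receives one ACK per receiver per message.

```python
class AckTrackingSender:
    """Hypothetical ACK-based reliable multicast sender, for illustration."""

    def __init__(self, receivers):
        # Per-receiver state: highest sequence number acked (cumulative ACKs).
        self.acked = {r: -1 for r in receivers}
        self.buffer = {}  # seq -> payload, kept until every receiver has acked
        self.next_seq = 0
        self.acks_received = 0

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.buffer[seq] = payload
        return seq

    def on_ack(self, receiver, seq):
        self.acks_received += 1
        self.acked[receiver] = max(self.acked[receiver], seq)
        # A message can be discarded only when the slowest receiver has it.
        stable = min(self.acked.values())
        for s in [s for s in self.buffer if s <= stable]:
            del self.buffer[s]


sender = AckTrackingSender(receivers=range(100))
for i in range(10):
    seq = sender.multicast(f"msg {i}")
    for r in range(99):  # receiver 99 is slow and never acks
        sender.on_ack(r, seq)
print(sender.acks_received, len(sender.acked), len(sender.buffer))  # 990 100 10
```

Ten multicasts already produce 990 ACKs at one sender, and a single slow receiver keeps all ten messages buffered.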
5
Scalable Reliable Multicast

If omission failures are rare, then receivers
report only the non-receipt of messages, using
NACKs; a NACK triggers a selective point-to-point
retransmission. Reducing acknowledgements is the
underlying principle of Scalable Reliable
Multicast (SRM).
If several members of a group fail to receive a
message, then each such member waits for a random
period of time before sending its NACK, and drops
its own NACK if it first hears another member's
NACK for the same message. This suppresses
redundant NACKs. The sender multicasts the
missing copy only once.
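The effect of the random wait can be estimated with a small simulation. This is a sketch under simplifying assumptions: every receiver that lost the message draws a uniform backoff, the earliest NACK reaches all others after a fixed `delay`, and any receiver whose timer fires before that moment sends a redundant NACK. The function `count_nacks` and its parameters are hypothetical.

```python
import random


def count_nacks(num_missing, max_backoff, delay, seed=0):
    """Count NACKs sent when num_missing receivers lose the same message.

    Each receiver waits a random backoff in [0, max_backoff] before sending
    its NACK. The earliest NACK is multicast and reaches the others after
    `delay` time units; a receiver whose timer has not yet fired by then
    suppresses its own NACK.
    """
    rng = random.Random(seed)
    timers = [rng.uniform(0.0, max_backoff) for _ in range(num_missing)]
    first = min(timers)
    # The earliest receiver always sends; others send only if their timer
    # fires before the first NACK reaches them.
    return sum(1 for t in timers if t == first or t < first + delay)


print(count_nacks(100, max_backoff=0.0, delay=0.01))  # no random wait: 100
print(count_nacks(100, max_backoff=1.0, delay=0.01))  # randomized backoff
```

With no random wait every receiver sends; with a backoff range much larger than the propagation delay, only the few receivers whose timers fall within `delay` of the earliest one send.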

6
Dealing with open groups

Processes may join or leave an open group. Life
will be simpler if everyone has a consistent
view of the current membership.
(A view is the current membership of the group.)
What problems can arise if members do not have
identical views?

7
Membership service

A group membership service looks after the
following:
Joining and leaving groups.
Updating all members about the latest view of the
group
Failure detection
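The three duties can be sketched as a toy service that numbers each new view. Everything here is a hypothetical illustration: failure detection uses a heartbeat timeout, and sending each view to the members is not modeled. In an asynchronous system a timeout can also suspect a process that is merely slow.

```python
class MembershipService:
    """Toy group membership service (hypothetical; for illustration only)."""

    def __init__(self, timeout):
        self.timeout = timeout  # heartbeat silence that triggers suspicion
        self.members = set()
        self.last_heartbeat = {}
        self.views = [(0, frozenset())]  # (view id, members), in install order

    def _install_view(self):
        view_id = self.views[-1][0] + 1
        self.views.append((view_id, frozenset(self.members)))

    def join(self, p, now):
        self.last_heartbeat[p] = now
        if p not in self.members:
            self.members.add(p)
            self._install_view()

    def leave(self, p):
        if p in self.members:
            self.members.discard(p)
            self.last_heartbeat.pop(p, None)
            self._install_view()

    def heartbeat(self, p, now):
        if p in self.members:
            self.last_heartbeat[p] = now

    def detect_failures(self, now):
        """Remove members silent for longer than timeout; return them."""
        suspects = {p for p, t in self.last_heartbeat.items()
                    if now - t > self.timeout}
        if suspects:
            self.members -= suspects
            for p in suspects:
                del self.last_heartbeat[p]
            self._install_view()
        return suspects


svc = MembershipService(timeout=3)
for p in (0, 1, 2):
    svc.join(p, now=0)
svc.heartbeat(0, now=4)
svc.heartbeat(1, now=4)
print(svc.detect_failures(now=5))  # {2}
print(svc.views[-1])  # (4, frozenset({0, 1}))
```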

8
Dealing with open groups

Views should propagate in the same order to all
members.
Example.
Current view v0(g) = {0, 1, 2, 3}.
Let 1 and 2 leave and 4 join the group concurrently.
This view change can be serialized in many ways:
{0,1,2,3} → {0,1,3} → {0,3,4}, OR
{0,1,2,3} → {0,2,3} → {0,3} → {0,3,4}, OR
{0,1,2,3} → {0,3} → {0,3,4}
Send these view changes by total order multicast,
so that every member installs the same sequence
of views.
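All three serializations end in {0,3,4} but pass through different intermediate views, which is why every member must apply one common order. The sketch replays a chosen order of membership changes; the function `serialize` and the step format (changes grouped into one new view) are assumptions for illustration, and choosing the order is left to the total order multicast.

```python
def serialize(initial, steps):
    """Replay membership changes in one fixed order and list the views.

    steps: list of steps; each step is a list of ("join" | "leave", pid)
    changes that are installed together as a single new view.
    """
    current = set(initial)
    views = [frozenset(current)]
    for step in steps:
        for op, pid in step:
            if op == "join":
                current.add(pid)
            elif op == "leave":
                current.discard(pid)
            else:
                raise ValueError(f"unknown change: {op!r}")
        views.append(frozenset(current))
    return views


v0 = {0, 1, 2, 3}
orders = [
    [[("leave", 2)], [("leave", 1), ("join", 4)]],
    [[("leave", 1)], [("leave", 2)], [("join", 4)]],
    [[("leave", 1), ("leave", 2)], [("join", 4)]],
]
for order in orders:
    print([sorted(v) for v in serialize(v0, order)])
```

The three printed sequences are exactly the three serializations listed on the slide.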

9
View propagation

[Figure: view timelines, with v0(g) = {0,1,2,3}, v1(g) = {0,1,3}, v2(g) = {0,3,4}]
Process 0: v0(g), send m1, ..., v1(g), send m2, send m3, v2(g)
Process 1: v0(g), send m4, send m5, v1(g), send m6, v2(g), ...

10
View-synchronous communication

With respect to each message, all correct
processes have the same view.
m sent in view V ⇒ m received in view V
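The rule can be checked mechanically on a recorded trace. This minimal sketch assumes a hypothetical trace format in which each delivery records the id of the view the receiver was in at delivery time.

```python
def view_sync_violations(sent_in, deliveries):
    """Check 'm sent in view V => m received in view V' on a recorded trace.

    sent_in: message id -> id of the view in which it was multicast.
    deliveries: (process, message id, id of the view the process was in
    when it delivered the message).
    Returns the deliveries that break the rule.
    """
    return [(p, m, v) for p, m, v in deliveries if sent_in[m] != v]


sent_in = {"m1": 0, "m2": 1}
deliveries = [(1, "m1", 0), (2, "m1", 1), (3, "m2", 1)]
print(view_sync_violations(sent_in, deliveries))  # [(2, 'm1', 1)]
```

Process 2 delivered m1 only after moving to view 1, so that delivery breaks view synchrony.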

11
View delivery guidelines

If a process j joins and thereafter continues its
membership in a group g that already contains a
process i, then eventually j appears in all views
delivered by process i.
If a process j permanently leaves a group g that
contains a process i, then eventually j is
excluded from all views delivered by process i.

12
View-synchronous communication

Agreement. If a correct process k delivers a
message m in vi(g) before delivering the next
view vi+1(g), then every correct process
j ∈ vi(g) ∩ vi+1(g) must deliver m before
delivering vi+1(g).
Integrity. If a process j delivers a view vi(g),
then vi(g) must include j.
Validity. If a process k delivers a message m in
view vi(g) and another process j ∈ vi(g) does not
deliver that message m, then the next view
vi+1(g) delivered by k must exclude j.

13
Example

Let process 1 deliver m and then crash.
Possibility 1. No other process delivers m, but
each delivers the new view {0,2,3}.
Possibility 2. Processes 0, 2, 3 deliver m and
then deliver the new view {0,2,3}.
Possibility 3. Processes 2, 3 deliver m and
then deliver the new view {0,2,3}, but process 0
first delivers the view {0,2,3} and then delivers
m.
Are these acceptable?

[Figure: timelines of processes 0, 1, 2, 3 delivering m; views {0,1,2,3} and {0,2,3}]

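The three possibilities can be tested against Agreement, Integrity, and Validity as stated above. In this sketch each process history is a list of `("view", members)` and `("msg", id)` events, only processes 0, 2, 3 are correct, and the helper names are hypothetical. Under this reading of the definitions, possibilities 1 and 2 pass and possibility 3 does not.

```python
def segments(history):
    """Split one process's history into (view, messages, next view) triples."""
    out, view, msgs = [], None, set()
    for kind, item in history:
        if kind == "view":
            if view is not None:
                out.append((view, frozenset(msgs), item))
            view, msgs = item, set()
        else:  # kind == "msg"
            msgs.add(item)
    if view is not None:
        out.append((view, frozenset(msgs), None))  # crashed or trace ended
    return out


def violations(histories, correct):
    """Names of the view-synchrony properties that a trace breaks."""
    segs = {p: segments(h) for p, h in histories.items()}
    # Messages each process delivered while it was in a given view.
    in_view = {p: {v: msgs for v, msgs, _ in s} for p, s in segs.items()}
    broken = set()
    for k, s in segs.items():
        for v, msgs, nxt in s:
            if k not in v:
                broken.add("integrity")
            if nxt is None:
                continue  # no next view delivered, so no obligation
            for m in msgs:
                for j in v - {k}:
                    got_m = m in in_view.get(j, {}).get(v, frozenset())
                    if got_m or j not in nxt:
                        continue
                    broken.add("validity")  # k's next view still contains j
                    if k in correct and j in correct:
                        broken.add("agreement")
    return broken


V0, V1 = frozenset({0, 1, 2, 3}), frozenset({0, 2, 3})
crashed_1 = [("view", V0), ("msg", "m")]  # process 1 delivers m, then crashes
possibilities = {
    1: {p: [("view", V0), ("view", V1)] for p in (0, 2, 3)},
    2: {p: [("view", V0), ("msg", "m"), ("view", V1)] for p in (0, 2, 3)},
    3: {0: [("view", V0), ("view", V1), ("msg", "m")],
        2: [("view", V0), ("msg", "m"), ("view", V1)],
        3: [("view", V0), ("msg", "m"), ("view", V1)]},
}
for name, hist in possibilities.items():
    hist[1] = crashed_1
    print(name, sorted(violations(hist, correct={0, 2, 3})))
```

Process 1 is not correct and installs no next view, so its delivery of m imposes nothing in possibility 1. In possibility 3, process 0 is in both views yet delivers m only after {0,2,3}, which breaks Agreement and Validity.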
14
Overview of Transis

Group communication system developed by Danny
Dolev at the Hebrew University of Jerusalem.
Deals with open groups
Supports scalable reliable multicast
Tolerates network partition

15
Overview of Transis

IP multicast (or an Ethernet LAN) is used to
support high-bandwidth multicast.
Acks are piggybacked and message loss is detected
transparently, leading to selective
retransmission.
The sequence of messages P1, P2, p2Q1, Q2, q3R1,
... (where p2Q1 denotes message Q1 carrying a
piggybacked ack of P2) received by a member
i ∈ {P, Q, R, S} shows that the recipient did not
receive the message Q3.
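Loss detection from that sequence can be sketched as follows, under one reading of the notation: a label such as `q3R1` is R's first message carrying an ack showing that R received Q3, and each sender numbers its messages from 1 without gaps. This is an illustration, not Transis code; the regular expression and function name are assumptions.

```python
import re

LABEL = re.compile(r"(?:([a-z])(\d+))?([A-Z])(\d+)")


def missing_messages(received):
    """Find lost messages using sequence numbers and piggybacked acks.

    Labels follow the slide's notation: 'Q2' is the 2nd message from Q, and
    'q3R1' is R's 1st message carrying an acknowledgement of Q3. Per-sender
    numbering is assumed to start at 1 with no gaps.
    """
    have = {}     # sender -> sequence numbers actually received
    highest = {}  # sender -> highest sequence number known to exist
    for label in received:
        match = LABEL.fullmatch(label)
        if match is None:
            raise ValueError(f"bad label: {label!r}")
        ack_from, ack_seq, sender, seq = match.groups()
        have.setdefault(sender, set()).add(int(seq))
        highest[sender] = max(highest.get(sender, 0), int(seq))
        if ack_from is not None:
            # This message's sender received ack_from's message ack_seq.
            src = ack_from.upper()
            highest[src] = max(highest.get(src, 0), int(ack_seq))
    return [f"{s}{n}" for s in sorted(highest)
            for n in range(1, highest[s] + 1)
            if n not in have.get(s, set())]


print(missing_messages(["P1", "P2", "p2Q1", "Q2", "q3R1"]))  # ['Q3']
```

Member i learns from R's ack that Q3 exists, although it holds only Q1 and Q2, so it can request a selective retransmission of Q3.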

16
Overview of Transis

Causal mode (maintains causal order)
Agreed mode (maintains total order that does not
conflict with the causal order)
Safe mode (Delivers a message only when the lower
levels of the system have acknowledged its
reception at all the destination machines. All
messages are delivered relative to a safe
message)
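The slide does not say how Transis implements causal mode. To illustrate what that mode must guarantee, here is the textbook vector-clock delivery rule, a swapped-in technique that is not necessarily the Transis mechanism. Each message carries its sender's vector clock, and a receiver buffers it until every causally earlier message has been delivered. The function names are hypothetical.

```python
def deliverable(sender, msg_clock, local_clock):
    """Vector-clock test for causal delivery.

    A message is deliverable if it is the next one from its sender and the
    receiver has already delivered everything the sender had delivered.
    """
    for p, count in msg_clock.items():
        seen = local_clock.get(p, 0)
        if p == sender and count != seen + 1:
            return False
        if p != sender and count > seen:
            return False
    return True


def causal_deliver(incoming, local_clock):
    """Deliver (sender, vector clock, payload) messages in causal order.

    Messages whose causal predecessors are missing stay in the buffer.
    Returns (delivered payloads, still-buffered messages).
    """
    pending, delivered = list(incoming), []
    progress = True
    while progress:
        progress = False
        for msg in list(pending):
            sender, clock, payload = msg
            if deliverable(sender, clock, local_clock):
                local_clock[sender] = clock[sender]
                delivered.append(payload)
                pending.remove(msg)
                progress = True
    return delivered, pending


# The reply from B depends on A's question but arrives first.
clock = {"A": 0, "B": 0}
inbox = [("B", {"A": 1, "B": 1}, "reply"), ("A", {"A": 1, "B": 0}, "question")]
print(causal_deliver(inbox, clock))  # (['question', 'reply'], [])
```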

17
Overview of Transis

Dealing with partitions:
Each partition assumes that the machines in the
other partition have failed, and
maintains consistency within its own partition
only.
After repair, consistency is restored in the
entire system.

18
Replication

Improves reliability
Improves availability
(What good is a reliable system if it is not
available?)
Replication must be transparent and create the
illusion of a single copy.

19
Updating replicated data

[Figure: copies of a file F, with users Alice and Bob]
Updates and consistency are the primary issues.

20
Passive replication

At most one replica can be the primary server
Each client maintains a variable L (leader) that
specifies the replica to which it will send
requests. Requests are queued at the primary
server.
Backup servers ignore client requests.

[Figure: clients, the primary, and the backup replicas]

21
Primary-backup protocol

Receive. Receive the request from the client and
update the state if appropriate.
Broadcast. Broadcast an update of the state to
all other replicas.
Reply. Send a response to the client.

[Figure: message flow among client, primary, and backup: req, update, reply]

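The three steps, together with the client's leader variable L and backups that ignore client requests, can be sketched with direct method calls standing in for network messages. This is a hypothetical illustration: the request queue at the primary, acknowledgements from backups, and failures are omitted, and the sketch simply assumes that at most one replica is primary.

```python
class Replica:
    """One server in a primary-backup group (hypothetical sketch)."""

    def __init__(self, rid, is_primary=False):
        self.rid = rid
        self.is_primary = is_primary
        self.state = {}

    def on_client_request(self, key, value, others):
        if not self.is_primary:
            return None  # backup servers ignore client requests
        self.state[key] = value  # Receive: update local state
        for r in others:  # Broadcast: send the update to the other replicas
            r.on_update(key, value)
        return ("ok", key, value)  # Reply: respond to the client

    def on_update(self, key, value):
        self.state[key] = value


class Client:
    def __init__(self, leader):
        self.L = leader  # id of the replica that receives this client's requests

    def write(self, replicas, key, value):
        target = replicas[self.L]
        others = [r for r in replicas.values() if r is not target]
        return target.on_client_request(key, value, others)


replicas = {0: Replica(0, is_primary=True), 1: Replica(1), 2: Replica(2)}
print(Client(leader=0).write(replicas, "x", 42))  # ('ok', 'x', 42)
print([r.state for r in replicas.values()])  # [{'x': 42}, {'x': 42}, {'x': 42}]
print(Client(leader=1).write(replicas, "x", 7))  # None: sent to a backup
```

Broadcasting before replying means that once the client sees a reply, every backup already holds the update.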
22
Primary-backup protocol

If the client fails to get a response due to the
crash of the primary, then the request is
retransmitted until a backup is promoted to
primary.
Failover time is the interval during which there
is no primary server.

[Figure: message flow among client, primary, and backup (req, update, reply) when the primary fails]
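A small worked example of failover time and client retransmission, under a hypothetical timing model: a backup detects the crash after a fixed delay, is promoted after a further fixed delay, the client retransmits at a fixed interval, and network delay is ignored. All parameter names are assumptions.

```python
import math


def failover(t_crash, detect_delay, promote_delay, first_send, retry_interval):
    """Return (failover time, time at which a retransmitted request is served).

    Hypothetical timing model: the primary crashes at t_crash, a backup
    notices after detect_delay and is promoted promote_delay later. The
    client's request, first sent at first_send (assumed >= t_crash), is
    retransmitted every retry_interval until a primary exists; network
    delay is ignored.
    """
    promoted_at = t_crash + detect_delay + promote_delay
    failover_time = promoted_at - t_crash  # interval with no primary server
    retries = max(0, math.ceil((promoted_at - first_send) / retry_interval))
    served_at = first_send + retries * retry_interval
    return failover_time, served_at


print(failover(t_crash=10, detect_delay=3, promote_delay=1,
               first_send=11, retry_interval=2))  # (4, 15)
```

The client waits longer than the failover time itself whenever its retransmission schedule does not line up with the promotion; here the new primary exists at time 14 but the next retransmission arrives at 15.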
