Title: Reliable Distributed Systems
1. Reliable Distributed Systems
2. Group Membership
- A foundational concept for high-speed data replication protocols.
- Essential for large-scale grid-based virtual organizations, resource discovery, and scheduling.
- Solution: a group membership service (GMS).
- Two-tier architecture: first manage the membership of the GMS itself, then let the GMS manage the general membership of other services (a minimal sketch of this split follows below).
- The Group Membership Protocol (GMP) is used among the GMS members to manage their own membership.
- The GMS then works on behalf of its group.
- Another issue is static vs. dynamic membership.
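A minimal sketch of the two-tier split, in Python; the CoreGMS class and its field names are illustrative assumptions, not part of the original protocol description.

    # Hypothetical sketch of the 2-tier structure: a small core GMS group runs
    # the GMP among its own members, and separately maintains the membership
    # list of the much larger overall system on behalf of everyone else.

    from dataclasses import dataclass, field
    from typing import List, Set


    @dataclass
    class CoreGMS:
        # Tier 1: the few processes that make up the GMS itself.
        gms_members: Set[str] = field(default_factory=set)

        # Tier 2: the system-wide membership list managed for all clients.
        system_view: Set[str] = field(default_factory=set)
        view_history: List[Set[str]] = field(default_factory=list)

        def gmp_update(self, new_gms_members: Set[str]) -> None:
            """Run by the GMP among GMS members only (tier 1)."""
            self.gms_members = set(new_gms_members)

        def admit(self, pid: str) -> None:
            """Tier 2: the GMS adds an application process to the system view."""
            self.system_view.add(pid)
            self.view_history.append(set(self.system_view))

        def exclude(self, pid: str) -> None:
            """Tier 2: the GMS drops a process that seems faulty or has left."""
            self.system_view.discard(pid)
            self.view_history.append(set(self.system_view))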
3. Agreement on Membership
- Detecting failure accurately is a lost cause: too many things can mimic failure.
- To be accurate we would end up waiting for a "failed" process to recover.
- Substitute agreement on membership for failure detection.
- Now we can drop a process simply because it isn't fast enough.
- This can seem arbitrary, e.g. A kills B.
- The GMS implements this service for everyone else.
4. Architecture
- Applications use replicated data for high availability.
- 2PC-like protocols use membership changes instead of failure notifications.
- Membership agreement covers joins, leaves, and reports that some process P seems to be unresponsive.
5. Architecture
(Figure: application processes A, B, C, and D issue join and leave requests to the GMS processes X, Y, and Z; the reported membership views evolve through {A}, {A,B,D}, {A,D}, {A,D,C}, {D,C} as B leaves, C joins, and A seems to have failed.)
7. GMS API
- See p. 278.
- Three operations:
  - Join(process-id, callback)
  - Leave(process-id)
  - Monitor(process-id, callback)
- The GMS itself needs to be highly available.
- Here is a problem: adapt it to grid services and virtual organizations (VOs). A sketch of the interface appears below.
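A minimal sketch of the three-operation interface in Python; the GMSClient name and the callback payloads are illustrative assumptions about how the operations listed above might look in code.

    # Hypothetical client-side view of the three GMS operations listed above.
    # Callbacks are invoked by the GMS when the membership view changes or when
    # a monitored process is declared faulty; the exact payloads are assumptions.

    from typing import Callable, Dict, List, Set

    ViewCallback = Callable[[Set[str]], None]   # receives the new membership view
    FailureCallback = Callable[[str], None]     # receives the id of the failed process


    class GMSClient:
        def __init__(self) -> None:
            self._view: Set[str] = set()
            self._view_callbacks: Dict[str, ViewCallback] = {}
            self._monitors: Dict[str, List[FailureCallback]] = {}

        def join(self, process_id: str, callback: ViewCallback) -> None:
            """Join(process-id, callback): add the caller, register for view updates."""
            self._view.add(process_id)
            self._view_callbacks[process_id] = callback
            self._notify_view()

        def leave(self, process_id: str) -> None:
            """Leave(process-id): remove the caller from the membership view."""
            self._view.discard(process_id)
            self._view_callbacks.pop(process_id, None)
            self._notify_view()

        def monitor(self, process_id: str, callback: FailureCallback) -> None:
            """Monitor(process-id, callback): be told if process_id is declared faulty."""
            self._monitors.setdefault(process_id, []).append(callback)

        def declare_faulty(self, process_id: str) -> None:
            """Called internally once the GMS agrees that process_id has failed."""
            self._view.discard(process_id)
            for cb in self._monitors.pop(process_id, []):
                cb(process_id)
            self._notify_view()

        def _notify_view(self) -> None:
            for cb in self._view_callbacks.values():
                cb(set(self._view))

For example, a replication protocol could register a Monitor callback on its peer and treat that callback, rather than an ad-hoc timeout, as the definitive failure notification.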
8. Example
- One distributed system that could use the GMS is an air-traffic control system: after the failure of a process, it must reconfigure itself around the surviving processes.
- In other cases, such as a grid VO, it is simply a fact of life that membership changes dynamically.
9. Contrast dynamic with static model
- Static model: a fixed set of processes tied to resources.
  - Processes may be unreachable (while failed or partitioned away) but later recover.
  - Think of a cluster of PCs.
- Dynamic model: a changing set of processes launched while the system runs; some fail or terminate.
  - Failed processes never recover (a partitioned process may reconnect, but it uses a new pid).
  - A dynamic process can still own a physical resource, allowing us to emulate a static model (see the sketch below).
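A minimal sketch of the "new pid after reconnect" rule in Python; the incarnation-counter scheme is an illustrative assumption about one common way to mint fresh pids, not a mechanism prescribed by the slides.

    # Hypothetical illustration of the dynamic model: a process that reconnects
    # after being partitioned away comes back under a fresh pid, so the old
    # identity is never "resurrected". An incarnation counter is one simple way
    # to mint such pids (an assumption, not the only possibility).

    from itertools import count

    _incarnations = count(1)


    def fresh_pid(host: str) -> str:
        """Return a never-reused process id for a process launched on `host`."""
        return f"{host}:{next(_incarnations)}"


    # The same physical machine rejoining the system gets a brand-new identity,
    # even though it may still own the same physical resource (disk, address).
    old_pid = fresh_pid("node7")   # e.g. "node7:1"
    new_pid = fresh_pid("node7")   # e.g. "node7:2" after the reconnect
    assert old_pid != new_pid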
10. Commit protocol
(Figure: a two-phase commit run. The coordinator asks "ok to commit?"; some participants answer "ok" while one vote remains unknown, and later the decision itself remains unknown to a participant that missed the outcome message.)
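A minimal two-phase commit coordinator sketch in Python, to make the two "unknown" windows in the figure concrete; the function name and the way votes are gathered are illustrative assumptions.

    # Hypothetical 2PC decision step. The two places a participant can be left
    # "unknown" are (1) before its vote reaches the coordinator, and (2) after
    # voting ok but before it learns the decision -- the windows in the figure.

    from typing import Dict, Optional

    Vote = Optional[bool]   # True = ok, False = abort, None = vote not received


    def two_phase_commit(votes: Dict[str, Vote]) -> str:
        """Decide commit/abort from the collected votes of all participants."""
        # Phase 1: "ok to commit?" has been sent; `votes` holds what came back.
        if any(v is None for v in votes.values()):
            # A vote is unknown: the coordinator must either wait or abort.
            return "abort"
        if all(votes.values()):
            # Phase 2: every participant said ok, so the decision is commit.
            # Any participant that misses this message is in the "decision
            # unknown" state until it can ask someone who knows the outcome.
            return "commit"
        return "abort"


    print(two_phase_commit({"p1": True, "p2": True}))   # commit
    print(two_phase_commit({"p1": True, "p2": None}))   # abort (vote unknown)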
11. Suppose this is a partitioning failure (or merging)
(Figure: the same commit run, but now the missing vote and the unknown decision are caused by a network partition rather than a crash.)
Do these processes actually need to be consistent with the others?
12. Primary partition concept
- The idea is to identify the notion of "the system" with a unique component of the partitioned system.
- Call this distinguished component the primary partition of the system as a whole.
- The primary partition can speak with authority for the system as a whole (a minimal majority test is sketched below).
- Non-primary partitions have weaker consistency guarantees and limited ability to initiate new actions.
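A minimal sketch, assuming the common rule that a partition is primary only if it contains a majority of the most recent primary view; the rule and names here are illustrative, not a quote of the protocol.

    # Hypothetical test for "primaryness": a partition may claim to be the
    # primary partition only if it holds a majority of the members of the last
    # primary view. The majority rule guarantees at most one primary exists.

    from typing import Set


    def is_primary(partition: Set[str], last_primary_view: Set[str]) -> bool:
        survivors = partition & last_primary_view
        return len(survivors) * 2 > len(last_primary_view)


    last_view = {"A", "B", "C", "D", "E"}
    print(is_primary({"A", "B", "C"}, last_view))   # True: 3 of 5
    print(is_primary({"D", "E"}, last_view))        # False: only 2 of 5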
13. Ricciardi Group Membership Protocol
- For use in a group membership service (usually just a few processes that run on behalf of the whole system).
- The service tracks its own membership; its members use this to maintain the membership list for the whole system.
- All users of the service see subsequences of a single system-wide group membership history.
- The GMS also tracks the primary partition.
14. GMP protocol itself
- Used only to track the membership of the core GMS.
- Designates one GMS member as the coordinator.
- Switches between 2PC and 3PC:
  - 2PC if the coordinator didn't fail and other members failed or are joining.
  - 3PC if the coordinator failed and some other member is taking over as the new coordinator (see the sketch below).
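A minimal sketch of that switch in Python; reducing the choice to a single predicate over "did the coordinator survive?" is an illustrative simplification of the GMP, not its full logic.

    # Hypothetical, simplified view of the 2PC/3PC switch described above: if
    # the old coordinator is still alive it can drive an ordinary 2-phase view
    # change; if it failed, the member taking over runs a 3-phase protocol so
    # it can first learn what the old coordinator may already have started.

    def view_change_protocol(coordinator_alive: bool) -> str:
        if coordinator_alive:
            return "2PC"   # coordinator proposes the new view, members ack, commit
        return "3PC"       # new coordinator adds an inquiry phase before committing


    print(view_change_protocol(True))    # 2PC
    print(view_change_protocol(False))   # 3PC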
15. GMS majority requirement
- To move from system view i to view i+1, the GMS requires explicit acknowledgement by a majority of the processes in view i.
- Failing to get a majority causes the GMS to lose its primaryness information (a minimal sketch follows below).
- The GMP can be extended to support partitioning and remerging.
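A minimal sketch of the majority-acknowledgement rule in Python; the shape of the acks set and the way "losing primaryness" is signalled are illustrative assumptions.

    # Hypothetical view-change step: view i+1 is installed only if a majority
    # of the members of view i explicitly acknowledged the proposal. Without
    # such a majority the GMS can no longer be sure it is the primary partition.

    from typing import Optional, Set


    def try_install_next_view(view_i: Set[str],
                              proposed_view: Set[str],
                              acks: Set[str]) -> Optional[Set[str]]:
        """Return view i+1 if a majority of view i acknowledged, else None."""
        if len(acks & view_i) * 2 > len(view_i):
            return proposed_view
        return None   # no majority: primaryness information is lost


    view_i = {"p0", "p1", "p2", "p3", "p4"}
    print(try_install_next_view(view_i, {"p1", "p2", "p3", "p4"},
                                acks={"p1", "p2", "p4"}))   # installs view i+1
    print(try_install_next_view(view_i, {"p1", "p2"},
                                acks={"p1", "p2"}))         # None: no majority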
16. GMS in Action
(Figure: a timeline for processes p0 ... p5.) p0 is the initial coordinator. p1 and p2 join, then p3 ... p5 join. But p0 fails during the join protocol, and later so does p3. Majority consent is used to avoid partitioning!
17. GMS in Action
(Figure: the same timeline for p0 ... p5. While p0 is coordinator, view changes run as 2-phase commits; when p0 fails, p1 takes over via a 3-phase protocol; once p1 is the new coordinator, view changes return to 2-phase.)
18. What if the system has thousands of processes?
- The idea is to build a GMS subsystem that runs on just a few nodes.
- The GMS members track themselves.
- Other processes ask to be admitted to the system, or ask for faulty processes to be excluded.
- The GMS treats overall system membership as a form of replicated data that it manages and reports to its listeners (a listener sketch follows below).
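A minimal sketch of the listener side in Python; the publish/subscribe shape and the MembershipPublisher name are assumptions about how "reports to its listeners" might look.

    # Hypothetical listener interface: the GMS keeps the system membership as
    # replicated data and pushes each new view to every registered listener.

    from typing import Callable, List, Set

    ViewListener = Callable[[int, Set[str]], None]   # (view number, members)


    class MembershipPublisher:
        def __init__(self) -> None:
            self._listeners: List[ViewListener] = []
            self._view_no = 0

        def subscribe(self, listener: ViewListener) -> None:
            self._listeners.append(listener)

        def publish(self, members: Set[str]) -> None:
            """Called by the GMS whenever the system view changes."""
            self._view_no += 1
            for listener in self._listeners:
                listener(self._view_no, set(members))


    pub = MembershipPublisher()
    pub.subscribe(lambda n, m: print(f"view {n}: {sorted(m)}"))
    pub.publish({"A", "B", "D"})
    pub.publish({"A", "D"})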
19. Uses of membership?
- If we rewire TCP and RPC to use membership changes as the trigger for breaking connections, we can eliminate many problems (a sketch follows below).
- But nobody really does this.
- The problem is that today's networks lack standard GMS subsystems.
- But we could try using it in a Grid/Web services environment?!
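A minimal sketch of the rewiring idea in Python, reusing the hypothetical GMSClient.monitor callback sketched earlier; treating exclusion from the membership view as the sole trigger for closing a connection is the assumption being illustrated.

    # Hypothetical wrapper: a connection to a peer is torn down only when the
    # GMS declares that peer faulty, so the membership service (not an ad-hoc
    # timeout inside TCP or RPC) decides when the connection is broken.

    import socket
    from typing import Optional


    class MembershipAwareConnection:
        def __init__(self, gms, peer_id: str, host: str, port: int) -> None:
            self.peer_id = peer_id
            self.sock: Optional[socket.socket] = socket.create_connection((host, port))
            # Break the connection when (and only when) the GMS excludes the peer.
            gms.monitor(peer_id, self._on_peer_excluded)

        def _on_peer_excluded(self, failed_pid: str) -> None:
            if self.sock is not None:
                self.sock.close()
                self.sock = None
            print(f"connection to {failed_pid} broken by membership change")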
20. Summary
- We know how to build a GMS that tracks its own membership.
- Examine how this can be applied to grid services?
- An M.S. or Ph.D. problem.