Virtual Synchrony

Description:

Virtual Synchrony. Justin W. Hart, CS 614, 11/17/2005. Papers: The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53.

1
Virtual Synchrony
  • Justin W. Hart
  • CS 614
  • 11/17/2005

2
Papers
  • The Process Group Approach to Reliable
    Distributed Computing. Birman. CACM, Dec 1993,
    36(12):37-53.
  • Understanding the Limitations of Causally and
    Totally Ordered Communication.  Cheriton and
    Skeen.  14th SOSP, 1993.

3
Background
  • Lamport Logical Clocks
  • Consistent Cuts
  • Distributed Snapshots (Chandy-Lamport)
  • Publish/Subscribe
  • Fail-Stop

4
Fail Stop
  • Group Membership Service
  • Processes appear to fail by halting
  • How does this affect the FLP result?

5
Motivation
  • Information Backplane
  • Customization
  • Hierarchical Structure
  • Fault-Tolerance
  • Reliability

6
Process Groups
  • Types of groups
    • Anonymous groups
    • Explicit groups
  • Implementation Requirements
    • Group communication
    • Group membership as input
    • Synchronization

7
Anonymous Groups
  • Group addressing
  • Messages are delivered exactly once, to all
    recipients or to none
  • Ordering
  • Logging

8
Explicit Groups
  • Group members cooperate directly
  • May execute algorithms based on membership
    knowledge
  • Communication is sensitive to membership changes

9
Building groups over conventional technology
  • Conventional message passing technologies
  • Group addressing
  • Logical time and causal dependency
  • Message delivery ordering
  • State transfer
  • Fault tolerance

10
Close Synchrony
  • Close Synchrony
  • 100% lock-step execution model

11
A synchronous execution
[Figure: time-line of processes p, q, r, s, t, u]
  • With true synchrony, executions run in genuine
    lock-step.

12
So what's wrong with that?
  • Under close synchrony, execution is limited by
    the slowest process in the group!

13
Virtual Synchrony
  • Relax synchronization requirements where possible
  • Benefit by allowing for asynchronous interactions
  • Do this where the result is identical to close
    synchrony

14
A few protocols
  • fbcast
  • cbcast
  • abcast
  • gbcast

15
Four protocols!?!?
  • "But Justin, the paper only discussed 2
    protocols. You're getting off-topic!"

16
A few protocols
  • fbcast
    • Simple protocol upon which we'll build the
      others.
    • Delivery is FIFO ordered, with respect to the
      original sender
    • Accomplished easily with a logical timestamp
  • cbcast
  • abcast
  • gbcast
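
The FIFO rule for fbcast can be illustrated with a small receiver-side sketch. It assumes the "logical timestamp" is a per-sender sequence number starting at 0; the class and method names (FifoReceiver, receive) are made up for this example and are not taken from ISIS.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in the order that sender sent them.

    Assumption: every sender stamps its messages 0, 1, 2, ...
    """

    def __init__(self):
        self.next_seq = defaultdict(int)  # next expected number, per sender
        self.held = defaultdict(dict)     # sender -> {seq: payload} that arrived early
        self.delivered = []               # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return self.delivered         # duplicate of something already delivered
        self.held[sender][seq] = payload
        # Deliver as long as the next expected message from this sender is here.
        while self.next_seq[sender] in self.held[sender]:
            expected = self.next_seq[sender]
            self.delivered.append((sender, self.held[sender].pop(expected)))
            self.next_seq[sender] += 1
        return self.delivered


rx = FifoReceiver()
rx.receive("p", 1, "second update")  # arrives early, so it is held back
rx.receive("p", 0, "first update")   # fills the gap, both are delivered
```

Messages from different senders are not ordered against each other here; that is exactly the gap cbcast and abcast close.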

17
Single updater
  • If p is the only update source, the need is a bit
    like TCP's FIFO ordering
  • fbcast is a good choice for this case

[Figure: time-line of processes p, r, s, t with updates numbered 1 to 4]
18
A few protocols
  • fbcast
  • cbcast
    • Delivery is causally ordered
    • Protocol in paper uses token passing
    • Another simple protocol uses vector timestamps
  • abcast
  • gbcast

19
Causally ordered updates
  • Simple protocol based on token passing

20
Causally ordered updates
  • Example messages from p and s arrive out of
    order at t

[Figure: time-line of processes p, r, s, t with messages a, b, c]
  • VT(a) = [0,0,0,1]
  • VT(b) = [1,0,0,1]
  • VT(c) = [1,0,1,1]
  • c is early: VT(c) = [1,0,1,1] but
    VT(t) = [0,0,0,1], so clearly we are missing one
    message from s
  • When b arrives, we can deliver both it and
    message c, in order
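
The timestamps on this slide can be replayed with a minimal sketch of the usual vector-timestamp delivery rule: a message from the sender at position j is deliverable once its entry j is exactly one ahead of the receiver's, and no other entry is ahead at all. The class name CausalReceiver is hypothetical, and because the figure did not survive, the sketch refers to senders by vector position only (b's sender at position 0, c's at position 2) rather than guessing which process name sits where.

```python
class CausalReceiver:
    """Holds back messages until everything they causally depend on is delivered."""

    def __init__(self, vt):
        self.vt = list(vt)    # receiver's current vector timestamp
        self.pending = []     # (name, sender_index, vt) waiting for dependencies
        self.delivered = []   # message names in delivery order

    def _deliverable(self, sender, msg_vt):
        # Next message from its sender, and nothing else is missing.
        return all(
            msg_vt[k] == self.vt[k] + 1 if k == sender else msg_vt[k] <= self.vt[k]
            for k in range(len(self.vt))
        )

    def receive(self, name, sender, msg_vt):
        self.pending.append((name, sender, list(msg_vt)))
        progress = True
        while progress:       # one delivery may unblock others
            progress = False
            for item in list(self.pending):
                n, s, v = item
                if self._deliverable(s, v):
                    self.pending.remove(item)
                    self.vt[s] = v[s]
                    self.delivered.append(n)
                    progress = True
        return self.delivered


t = CausalReceiver([0, 0, 0, 1])   # VT(t) from the slide
t.receive("c", 2, [1, 0, 1, 1])    # early: position 0 is ahead of VT(t)
t.receive("b", 0, [1, 0, 0, 1])    # fills the gap; b is delivered, then c
```

After the second call the receiver's timestamp equals VT(c), which is the state the slide describes once both messages have been delivered.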
21
Causally ordered updates
  • Each thread corresponds to a different lock
  • In effect red events never conflict with green
    ones!

[Figure: time-line of processes p, r, s, t showing two independent threads of numbered updates, one red and one green]
22
Hey, that sped things up!
  • Now I get it! Processes only have to wait for
    the processes that they depend on, not for the
    slowest in the group!

23
A few protocols
  • fbcast
  • cbcast
  • abcast
    • Atomic delivery ordering
      • With respect to other abcasts
    • More costly than cbcast, but with a stronger
      ordering property
    • ISIS builds abcast over cbcast
  • gbcast
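
The slides do not spell out how ISIS layers abcast on cbcast, so the following is not that protocol. It is a sketch of one common way to obtain the same property (every receiver delivers abcasts in one agreed order): a single sequencer hands out slot numbers, and receivers deliver strictly by slot. Sequencer, TotalOrderReceiver and the method names are assumptions made for illustration.

```python
class Sequencer:
    """Assigns each message id the next slot in a single global order."""

    def __init__(self):
        self.next_slot = 0

    def assign(self, msg_id):
        slot = self.next_slot
        self.next_slot += 1
        return slot, msg_id


class TotalOrderReceiver:
    """Delivers payloads by slot number, whatever order they arrived in."""

    def __init__(self):
        self.bodies = {}     # msg_id -> payload, from the underlying multicast
        self.order = {}      # slot -> msg_id, announced by the sequencer
        self.next_slot = 0
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.bodies[msg_id] = payload
        self._drain()

    def on_order(self, slot, msg_id):
        self.order[slot] = msg_id
        self._drain()

    def _drain(self):
        # Deliver while both the next slot's announcement and its body are here.
        while (self.next_slot in self.order
               and self.order[self.next_slot] in self.bodies):
            msg_id = self.order.pop(self.next_slot)
            self.delivered.append(self.bodies.pop(msg_id))
            self.next_slot += 1


seq = Sequencer()
first, second = seq.assign("m1"), seq.assign("m2")
a, b = TotalOrderReceiver(), TotalOrderReceiver()
a.on_message("m1", "x"); a.on_message("m2", "y")
a.on_order(*first); a.on_order(*second)
b.on_order(*second); b.on_message("m2", "y")   # b sees everything reversed
b.on_message("m1", "x"); b.on_order(*first)
```

Both receivers end with the same delivery sequence even though b saw the messages and the ordering announcements in the opposite order. The extra round through the sequencer is one way to see why the slide calls abcast more costly than cbcast.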

24
A few protocols
  • fbcast
  • cbcast
  • abcast
  • gbcast
    • Atomic delivery ordering
      • With respect to everything

25
Three Round Multicast
26
As a time-line picture
[Figure: two-phase commit time-line for processes p, q, r, s, t. Phase 1: the 2PC initiator asks "Vote?" and all vote commit. Phase 2: the initiator sends "Commit!"]
27
Just one more
28
Flush protocol
  • We say that a message is unstable if some
    receiver has it but (perhaps) others don't
  • For example, q's message is unstable at process r
  • If q fails we want to flush unstable messages
    out of the system
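
A heavily simplified, in-memory sketch of the flush idea: when a process fails, each survivor re-sends the unstable messages it holds, and only once every survivor has all of them is the new membership view installed. The function name flush, its arguments, and the dictionary layout are assumptions for illustration; a real flush is a message exchange among the survivors, not a shared data structure.

```python
def flush(view, failed, unstable):
    """Return the new view after flushing unstable messages.

    view     -- list of process names in the current view
    failed   -- name of the process that failed
    unstable -- dict: process name -> {msg_id: payload} it holds but cannot
                yet prove that every other member holds
    """
    survivors = [p for p in view if p != failed]

    # Every survivor contributes the unstable messages it holds.
    union = {}
    for p in survivors:
        union.update(unstable.get(p, {}))

    # Re-send step: afterwards every survivor holds every unstable message.
    for p in survivors:
        unstable.setdefault(p, {}).update(union)

    # Only now is it safe to install the view without the failed process.
    return survivors


# q sent message "q:7" and failed after only r had received it.
held = {"p": {}, "r": {"q:7": "hello"}}
new_view = flush(["p", "q", "r"], "q", held)
```

After the call, p also holds q's message, so no survivor enters the new view having missed something another survivor saw.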

29
Styles of groups
  • Peer Groups
    • Processes cooperate closely
  • Client-Server Groups
    • Group acts as a server
    • Client multicasts repeatedly to the group
  • Diffusion Groups
    • Group serves information
    • Clients connect to receive data from group
  • Hierarchical Groups
    • Offer scalability through a hierarchy of
      connected groups

30
Historical Aside
  • Two major classes of real systems
    • Virtual synchrony
      • Weaker properties, not quite FLP consensus
      • Much higher performance (orders of magnitude)
      • Requires that a majority of the system remain
        connected. Partitioning failures force protocols
        to wait for repair
    • Quorum-based state machine protocols
      • Closer to FLP definition of consensus
      • Slower (by orders of magnitude)
      • Sometimes can make progress in partitioning
        situations where virtual synchrony can't

31
Names of some famous systems
  • Isis was the first practical virtual synchrony system
  • Later followed by Transis, Totem, Horus
  • Today: best options are JGroups, Spread, Ensemble
  • Technology is now used in IBM WebSphere and
    Microsoft Windows Clusters products!
  • Paxos was the first major state machine system
  • BASE and other Byzantine Quorum systems are now
    getting attention from the security community
  • (End of historical aside)

32
Sounds good. What's wrong with it?
  • Tries to solve state problems at communication
    level
  • This violates the end-to-end argument!
  • Consistency requirements are typically stated
    with respect to application state

33
Stable vs Durable
  • A message is stable once it has been received by
    all group members; it is buffered until then
  • A durable message will be delivered even if the
    sender dies

34
Ordering semantics
  • Incidental Ordering
  • Semantic Ordering
  • Prescriptive Ordering

35
The problem with CATOCS
  • It can't say for sure
  • It can't say the whole story
  • It can't say together
  • It can't say it efficiently

36
It can't say for sure
  • Processes communicating over a hidden channel
  • Common database
  • Shared memory
  • Two threads reacting to external event

37
It can't say together
  • Standard solution: locking
  • Transaction models allow for abort and rollback
  • Higher-level conditions: what happens if a
    message arrives but is not successfully processed?

38
Stock trading example
39
It can't say the whole story
  • Not everything can be expressed through the
    happens-before relationship
  • Semantic ordering constraints
  • Causal memory, the weakest of these, cannot be
    expressed in causal multicast
  • Total ordering helps some of these, but is far
    too expensive
  • Inexpensive, state-level protocols with logical
    clocks can solve these

40
It can't say it efficiently
  • False causality
  • Potential causality ≠ actual causality
  • Memory requirements for buffering unstable
    messages
  • Ordering information during transmission and
    reception

41
And what of the end to end argument?
  • All of this considers our communication channels.
    Isn't the application-level check far more
    important?

42
Classes of distributed applications
  • Data dissemination
    • Netnews
    • Trading application example
  • Global predicate evaluation
  • Transactional applications
  • Replicated data
  • Replication in the large
  • Distributed real-time applications

43
Implementing only part of the messaging?
  • Can you cut down on overhead by implementing only
    part of the messaging using CATOCS?

44
Semantics
  • Are the semantics of state-based approaches
    superior to those of virtual synchrony?

45
Scalability
  • N processes
  • Time T to propagate a message across the system
    • Grows roughly in proportion to the square root
      of the number of processes
  • Arcs in the active causal graph grow
    quadratically
  • Quadratic causal graph

46
Buffering grows
  • Quadratic arcs
  • Linear communication of causal dependencies
  • Linear growth in required buffering
  • Changing topologies doesn't help
  • CATOCS would require separate process groups for
    read and write to accomplish optimization of
    updates vs queries

47
Group membership protocols
  • Must enforce atomic delivery semantics
  • Run our most expensive protocol: gbcast
  • Failures increase with the size of the system,
    increasing load on the GMS

48
Who uses ISIS?
  • Brokerage
  • Database replication and triggers

49
ISIS-based utilities
  • NEWS
    • A pub/sub application that will replay
      histories
  • NMGR
    • Manages batch-style jobs and performs load
      sharing
  • Parallel make

50
ISIS-based utilities
  • DECEIT
    • NFS-compatible file system
  • META/LOMITA
    • Sensors and actuators
    • Abstract sensors
    • Specify control actions in high-level terms
  • SPOOLER/LONG-HAUL FACILITY

51
Now somewhat supported
  • ISIS/Horus/Ensemble/QuickSilver
  • JGroups
  • Spread
  • Totem
  • Transis
  • WebSphere, Windows Cluster (internally)

52
and people actually use it.
  • NYSE
  • French ATC System
  • AEGIS

53
An ongoing debate
  • The effort continues here at Cornell with the
    QuickSilver effort
  • You've been presented the options. What are your
    conclusions?

54
References
  • Some slides borrowed from Ken Birman's CS 614
    slide sets on Virtual Synchrony:
    http://www.cs.cornell.edu/courses/cs514/2005sp/Slide%20Sets.htm
  • Images have been borrowed from The Process Group
    Approach to Reliable Distributed Computing.
    Birman. CACM, Dec 1993, 36(12):37-53.
  • Images have been borrowed from Understanding the
    Limitations of Causally and Totally Ordered
    Communication.  Cheriton and Skeen.  14th SOSP,
    1993.
  • Statements and ideas have been borrowed verbatim
    from both papers, including section headings and
    statements in notes. This has been mostly for
    coherence between the slides and papers
  • Also sourced data from
    http://www.cs.cornell.edu/ken/