1
Consistent Cuts
  • Ken Birman

2
Idea
  • We would like to take a snapshot of the state of
    a distributed computation
  • We'll do this by asking participants to jot down
    their states
  • Under what conditions can the resulting puzzle
    pieces be assembled into a consistent whole?

3
An instant in real-time
  • Imagine that we could photograph the system in
    real-time at some instant
  • Process state
      • A set of variables and values
  • Channel state
      • Messages in transit through the network
  • In principle, the system is fully defined by the
    set of such states

4
Problems?
  • Real systems don't have real-time snapshot
    facilities
  • In fact, real systems may not have channels in
    this sense, either
  • How can we approximate the real-time concept of a
    cut using purely logical time?

5
Deadlock detection
  • Need to detect cycles

[Figure: graph of processes A, B, C, D]

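
Detecting a cycle can be sketched as a depth-first search over a wait-for graph. This is only an illustration: the `find_cycle` name and the dictionary encoding (each process mapped to the processes it waits on) are assumptions, and the edges in the usage note are hypothetical because the original figure is not available.

```python
def find_cycle(wait_for):
    """Return a list of processes forming a wait cycle, or None.

    wait_for maps each process name to the processes it is waiting on.
    (Hypothetical encoding chosen for this sketch.)
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the DFS path, finished
    color = {p: WHITE for p in wait_for}
    path = []                             # current DFS path

    def visit(p):
        color[p] = GRAY
        path.append(p)
        for q in wait_for.get(p, ()):
            state = color.get(q, WHITE)
            if state == GRAY:             # back edge: q is already on the path
                return path[path.index(q):]
            if state == WHITE:
                found = visit(q)
                if found:
                    return found
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]})` reports a cycle through all four processes, while removing the edge out of C makes it return `None`.
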
6
Deadlock is a stable property
  • Once a deadlock occurs, it holds in all future
    states of the system
  • Easy to prove that if a snapshot is computed
    correctly, a stable condition detected in the
    snapshot will continue to hold
  • The insight is that adding events can't undo the
    condition

7
Leads us to define consistent cut and snapshot
  • Think of the execution of a process as a history
    of events, Lamport-style
  • Events can be local, msg-send, msg-rcv
  • A consistent snapshot is a set of history
    prefixes and messages closed under causality
  • A consistent cut is the frontier of a consistent
    snapshot: the process states
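
The closure condition can be checked mechanically. The sketch below assumes a simple encoding that is not in the slides: a cut gives, for each process, how many of its events are included (a history prefix), and each message is a tuple of sender, send event index, receiver, and receive event index. The names `is_consistent` and `channel_state` are hypothetical.

```python
def is_consistent(cut, messages):
    """True if the cut is closed under causality.

    cut[p]   = number of events of process p included (a history prefix)
    messages = iterable of (sender, send_idx, receiver, recv_idx),
               with 0-based event indices within each process history
    """
    for sender, send_idx, receiver, recv_idx in messages:
        received = recv_idx < cut[receiver]
        sent = send_idx < cut[sender]
        if received and not sent:
            return False      # a receive is included but its send is not
    return True


def channel_state(cut, messages):
    """Messages sent inside the cut but not yet received inside it."""
    return [m for m in messages
            if m[1] < cut[m[0]] and not m[3] < cut[m[2]]]
```

Because each process contributes a prefix of its history, local ordering is closed automatically; only message edges need checking.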

8
Deadlock detection
  • Need to detect cycles

[Figure: graph of processes A, B, C, D]

9
Deadlock detection
  • Need to detect cycles

[Figure: graph of processes A, B, C, D]

10
Deadlock detection
  • Need to detect cycles

[Figure: graph of processes A, B, C, D]

11
Deadlock detection
  • A ghost or false cycle!

[Figure: graph of processes A, B, C, D]

12
A ghost deadlock
  • Occurs when we accidentally snapshot process states
    so as to include some events while omitting prior
    events
  • Can't occur if the cut is computed consistently,
    since this violates the causal closure requirement

13
A ghost deadlock
[Figure: processes A, B, C, D]

14
A ghost deadlock
[Figure: processes A, B, C, D]

15
A ghost deadlock
[Figure: processes A, B, C, D]

16
Algorithms for computing consistent cuts
  • Paper focuses on a flooding algorithm
  • We'll consider several other methods too
      • Logical timestamps
      • Flooding algorithm without blocking
      • Two-phase commit with blocking
  • Each pattern arises commonly in the distributed
    systems we'll look at in coming weeks

17
Cuts using logical clocks
  • Suppose that we have Lamport's basic logical
    clocks
  • But we add a new operation called "snap"
      • Write down your process state
      • Create an empty channel state structure
      • Set your logical clock to some big value
      • Think of the clock as (epoch-number, counter)?
      • Record channel state until you receive a message
        with a big incoming clock value

18
How does this work?
  • Recall that with Lamport's clocks, if e is
    causally prior to e' then LT(e) < LT(e')
  • Our scheme creates a snapshot for each process at
    the instant it reaches logical time t
  • Easy to see that these events are concurrent: a
    possible instant in real-time
  • Depends upon FIFO channels, and can't easily tell
    when the cut is complete: a sort of lazy version of
    the flooding algorithm
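
A minimal sketch of the clock-based scheme, under assumptions that are not in the slides: the "big value" is a fixed constant `BIG`, process state is a token count, and a message carries its sender's name and timestamp. The `Process` class and its method names are hypothetical.

```python
BIG = 1_000_000   # hypothetical "big value" that marks the cut


class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens        # toy application state
        self.clock = 0              # Lamport logical clock
        self.snapshot = None        # recorded process state
        self.channel_state = {}     # sender -> amounts in transit at the cut

    def snap(self):
        """Write down the state once, then jump the clock to BIG."""
        if self.snapshot is None:
            self.snapshot = self.tokens
            self.clock = max(self.clock, BIG)

    def send(self, amount):
        self.clock += 1
        self.tokens -= amount
        return (self.name, self.clock, amount)

    def receive(self, msg):
        sender, ts, amount = msg
        if ts >= BIG:
            self.snap()             # the cut falls before this receive
        elif self.snapshot is not None:
            # sent before the sender's cut, received after ours
            self.channel_state.setdefault(sender, []).append(amount)
        self.clock = max(self.clock, ts) + 1
        self.tokens += amount
```

Every message sent after a snap carries a timestamp above `BIG`, so a receiver can tell which side of the cut the send was on. With FIFO channels, the first big timestamp on a channel means no more pre-cut messages will arrive on it.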

19
Flooding algorithm
  • To make a cut, an observer sends out "snap" messages
  • On receiving "snap" for the first time, A
      • Writes down its state, creates an empty channel
        state record for all incoming channels
      • Sends "snap" to all neighbor processes
      • Waits for "snap" on all incoming channels
  • A's piece of the snapshot is its state and the
    channel contents once it receives "snap" from all
    neighbors
  • Note: this also assumes FIFO channels
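
The steps above can be sketched as a small simulation. Everything here is illustrative: the `Node` and `Network` classes, the token-count state, and the choice to let a process start the snapshot itself (rather than a separate observer) are assumptions made for the sketch. Channels are FIFO queues, as the algorithm requires.

```python
from collections import deque

SNAP = "snap"   # marker message


class Network:
    def __init__(self):
        self.channels = {}          # (src, dst) -> FIFO queue of messages

    def send(self, src, dst, msg):
        self.channels.setdefault((src, dst), deque()).append(msg)

    def deliver(self, src, dst, nodes):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        nodes[dst].on_message(src, msg, self)


class Node:
    def __init__(self, name, tokens, neighbors):
        self.name = name
        self.tokens = tokens
        self.neighbors = neighbors  # names of all neighbor processes
        self.saved_state = None     # written down on the first snap
        self.recording = {}         # incoming channel -> recorded messages
        self.closed = set()         # incoming channels whose snap arrived

    def transfer(self, dst, amount, net):
        self.tokens -= amount
        net.send(self.name, dst, amount)

    def start_snapshot(self, net):
        self.saved_state = self.tokens
        self.recording = {n: [] for n in self.neighbors}
        for n in self.neighbors:
            net.send(self.name, n, SNAP)

    def on_message(self, sender, msg, net):
        if msg == SNAP:
            if self.saved_state is None:
                self.start_snapshot(net)
            self.closed.add(sender)  # stop recording this channel
        else:
            if self.saved_state is not None and sender not in self.closed:
                self.recording[sender].append(msg)
            self.tokens += msg

    def done(self):
        return (self.saved_state is not None
                and self.closed == set(self.neighbors))
```

In a two-node run where A and B each start with 10 tokens, A sends 3, B sends 4, and A then starts the snapshot, the saved states (7 and 9) plus the recorded channel content (4) add up to the original 20 tokens.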

20
With 2-phase commit
  • In this, the initiator sends "please halt" to all
    neighbors
  • A halts computation, sends "please halt" to all
    downstream neighbors
      • Waits for "halted" from all of them
      • Replies "halted" to upstream caller
  • Now the initiator sends "snap"
  • A forwards "snap" downstream
      • Waits for replies
      • Collects them into its own state
      • Sends own state to upstream caller and resumes
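
The two phases can be sketched with ordinary method calls standing in for messages and replies. This is a simplification under stated assumptions: the processes form a tree (no cycles among downstream links), channel contents are ignored, and the `Proc` class with its method names is hypothetical.

```python
class Proc:
    def __init__(self, name, state, downstream=()):
        self.name = name
        self.state = state
        self.downstream = list(downstream)   # downstream neighbors
        self.halted = False

    def step(self):
        """One unit of application work; does nothing while halted."""
        if not self.halted:
            self.state += 1

    def please_halt(self):
        """Phase 1: halt, halt everything downstream, then reply."""
        self.halted = True
        for child in self.downstream:
            child.please_halt()              # returns once child has halted
        return "halted"

    def snap(self):
        """Phase 2: gather downstream states, add our own, resume."""
        collected = {self.name: self.state}
        for child in self.downstream:
            collected.update(child.snap())
        self.halted = False
        return collected


def snapshot(initiator):
    initiator.please_halt()
    return initiator.snap()
```

Between the two phases nothing changes, which is why the collected states fit together; the price is that every process sits idle until phase 2 reaches it.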

21
Why does this work?
  • Forces the system into an idle state
  • In this situation, nothing is changing
  • Usually, the sender in this scheme records
    unacknowledged outgoing channel state
  • Alternative: the upstream process tells the receiver
    how many incoming messages to await; the receiver
    does so and includes them in its state
  • So a snapshot can be safely computed and there is
    nothing unaccounted for in the channels

22
Observation
  • Suppose we use a two-phase property detection
    algorithm
  • In the first phase, it asks (for example) "what is
    your current state?"
  • You reply "waiting for a reply from B" and give a
    wait counter
  • If a second round of the same algorithm detects
    the same condition with the same wait-counter
    values, the condition is stable

23
A ghost deadlock
[Figure: processes A, B, C, D]

24
Look twice and it goes away
  • But we could see new wait edges mimicking the
    old ones
  • This is why we need some form of counter to
    distinguish same-old condition from new edges on
    the same channels
  • Easily extended to other conditions
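
A sketch of the look-twice check with wait counters. The report format is an assumption made for illustration: each process reports either `None` or a pair of the process it waits on and its wait counter, and each process waits on at most one other. The function names are hypothetical.

```python
def wait_edges(reports):
    """reports: process -> (waiting_on, wait_counter), or None if not waiting."""
    return {(p, r[0], r[1]) for p, r in reports.items() if r is not None}


def stable_waits(round1, round2):
    """Wait edges seen in both rounds with the same counter value."""
    return wait_edges(round1) & wait_edges(round2)


def confirmed_cycle(round1, round2):
    """Return a cycle made only of stable wait edges, or None."""
    stable = {p: q for (p, q, _) in stable_waits(round1, round2)}
    for start in stable:
        seen = []
        p = start
        while p in stable and p not in seen:
            seen.append(p)
            p = stable[p]
        if p in seen:
            return seen[seen.index(p):]
    return None
```

If A reports waiting on B with counter 1 in the first round and counter 2 in the second, the edge is a new wait on the same channel, so it is dropped and no cycle through A is confirmed.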

25
Consistent cuts
  • Offer the illusion that you took a picture of the
    system at an instant in real-time
  • A powerful notion widely used in real systems
  • Especially valuable after a failure
  • Allows us to reconstruct the state so that we can
    repair it, e.g. recreate missing tokens
  • But has awkward hidden assumptions

26
Hidden assumptions
  • Use of FIFO channels is a problem
  • Many systems use some form of datagram
  • Many systems have multiple concurrent senders on
    same paths
  • These algorithms assume knowledge of system
    membership
  • Hard to make them fault-tolerant
  • Recall that a slow process can seem faulty

27
High costs
  • With the flooding algorithm, n^2 messages
  • With the 2-phase commit algorithm, the system pauses
    for a long time
  • We'll see some tricky ways to hide these costs,
    either by continuing to run but somehow delaying
    delivery of messages to the application, or by
    treating the cut algorithm as a background task
  • Could have concurrent activities that view same
    messages in different ways

28
Fault-tolerance
  • Many issues here
  • Who should run the algorithm?
  • If we decide that a process is faulty, what
    happens if a message from it then turns up?
  • What if failures leave a hole in the system
    state: missing messages or missing process state?
  • Problems are overcome in virtual synchrony
    implementations of group communication tools

29
Systems issues
  • Suppose that I want to add notions such as
    real-time, logical time, consistent cuts, etc. (the
    list goes on) to a complex real-world operating
    system
  • How should these abstractions be integrated with
    the usual O/S interfaces, like the file system,
    the process subsystem, etc?
  • Only virtual synchrony has really tackled these
    kinds of questions, but one could imagine much
    better solutions. A possible research topic, for
    a PhD in software engineering

30
Theory issues
  • Lamport's ideas are fundamentally rooted in
    static notions of system membership
  • Later, with his work on Paxos, he adds the idea of
    dynamically changing subsets of a static maximum
    set
  • Does true dynamism, of the sort used when we look
    at virtual synchrony, have fundamental
    implications?

31
Example of a theory question
  • Suppose that I want to add a location type to a
    language like Java
  • Object o is at process p at computer x
  • Objects a,b,c are replicas of ?
  • Now notions of system membership and location are
    very fundamental to the type system
  • Need a logic of locations. How should it look?
  • Extend to a logic of replication and self-defined
    membership? But FLP lurks in the shadows

32
FLP
33
Other questions
  • Checkpoint/rollback
  • Processes make checkpoints, probably when
    convenient
  • Some systems try to tell a process when to make
    them, using some form of signal or interrupt
  • But this tends to result in awkward, large
    checkpoints
  • Later if a fault occurs we can restart from the
    most recent checkpoint

34
So, where's the question?
  • The issue arises when systems use message passing
    and want to checkpoint/restart
  • Few applications are deterministic
      • Clocks, signals, thread scheduling,
        interrupts, multiple I/O channels, order in which
        messages arrived, user input
  • When rolling forward from a checkpoint, actions
    might not be identical
  • Hence anyone who saw my actions may be in a
    state that won't be recreated!

35
Technical question
  • Suppose we make checkpoints in an uncoordinated
    manner
  • Now process p fails
  • Which other processes should roll back?
  • And how far might this rollback cascade?
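
One way to explore the cascade is to compute a rollback line by repeatedly undoing orphan receives, that is, receives whose matching send has been rolled back. The model below is an assumption made for this sketch, not something stated in the slides: a checkpoint at index c saves the state before event c runs, index 0 is the initial state, and `rollback_line` is a hypothetical name.

```python
def rollback_line(checkpoints, messages, failed):
    """Return process -> event index each process restarts from.

    checkpoints[p]: sorted event indices at which p saved its state
                    (must include 0, the initial state)
    messages:       (sender, send_idx, receiver, recv_idx) tuples
    A value of float("inf") means the process keeps its current state.
    """
    line = {p: float("inf") for p in checkpoints}
    line[failed] = checkpoints[failed][-1]   # restart from latest checkpoint
    changed = True
    while changed:
        changed = False
        for sender, send_idx, receiver, recv_idx in messages:
            send_undone = send_idx >= line[sender]
            recv_kept = recv_idx < line[receiver]
            if send_undone and recv_kept:
                # orphan: roll the receiver back to before the receive
                line[receiver] = max(c for c in checkpoints[receiver]
                                     if c <= recv_idx)
                changed = True
    return line
```

With checkpoints `{"P": [0, 2], "Q": [0, 2]}` and messages `("P", 0, "Q", 1)`, `("Q", 2, "P", 1)`, `("P", 3, "Q", 3)`, a failure of P drags both processes back to their initial states, which is the domino effect the question is asking about.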

36
Rollback scenario
37
Avoiding cascaded rollback?
  • Both making checkpoints, and rolling back, should
    happen along consistent cuts
  • In the mid-1980s several papers developed this into
    simple 2-phase protocols
  • Today we would recognize them as algorithms that
    simply run on consistent cuts
  • For those who are interested, sender-based
    logging is the best algorithm in this area
    (Alvisi's work)