EEC 688/788 Secure and Dependable Computing - PowerPoint PPT Presentation

EEC 688/788 Secure and Dependable Computing, Lecture 7. Wenbing Zhao, Department of Electrical and Computer Engineering, Cleveland State University, wenbing_at_ieee.org

1
EEC 688/788 Secure and Dependable Computing
  • Lecture 7
  • Wenbing Zhao
  • Department of Electrical and Computer Engineering
  • Cleveland State University
  • wenbing_at_ieee.org

2
Outline
  • Checkpointing and logging
  • Checkpoint-based protocols
  • Uncoordinated checkpointing
  • Coordinated checkpointing
  • Logging-based protocols
  • Pessimistic logging
  • Optimistic logging
  • Causal logging

3
Chandy and Lamport Distributed Snapshot Protocol
  • The CL snapshot protocol is nonblocking
  • The Tamir and Sequin (TS) checkpointing protocol is
    blocking
  • The CL protocol is more desirable for applications
    that do not wish to suspend normal operation
  • However, the CL protocol is only concerned with how
    to obtain a consistent global checkpoint
  • The CL protocol has no coordinator; any node may
    initiate a global checkpoint
  • Data structures:
  • Marker message: equivalent to the CHECKPOINT
    message
  • Marker certificate: keeps track of whether a marker
    has been received from every incoming channel
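
A minimal Python sketch of how one process might handle markers under the CL protocol, assuming a toy integer application state. The names SnapshotProcess, MARKER, marker_seen, and channel_state are illustrative and do not come from the slides. Sending markers on outgoing channels is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

MARKER = "MARKER"  # hypothetical tag for the marker control message


@dataclass
class SnapshotProcess:
    """One process taking part in a Chandy-Lamport style snapshot (sketch)."""
    incoming: list                       # ids of the incoming channels
    state: int = 0                       # toy application state: a running sum
    checkpoint: Optional[int] = None     # local state recorded for the snapshot
    marker_seen: set = field(default_factory=set)      # the "marker certificate"
    channel_state: dict = field(default_factory=dict)  # channel id -> logged msgs

    def take_checkpoint(self):
        """Record the local state and start logging every incoming channel."""
        self.checkpoint = self.state
        self.channel_state = {ch: [] for ch in self.incoming}
        # A real process would now send MARKER on every outgoing channel.

    def receive(self, channel, msg):
        if msg == MARKER:
            if self.checkpoint is None:
                self.take_checkpoint()
            self.marker_seen.add(channel)  # stop logging this channel
            return
        # Normal operation is never suspended (the protocol is nonblocking).
        self.state += msg
        if self.checkpoint is not None and channel not in self.marker_seen:
            self.channel_state[channel].append(msg)

    def done(self):
        """True once a marker has arrived on every incoming channel."""
        return self.checkpoint is not None and self.marker_seen == set(self.incoming)
```

A message that arrives on a channel after the local checkpoint but before that channel's marker is recorded as channel state; messages that arrive earlier are already reflected in the checkpointed state.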

4
CL Distributed Snapshot Protocol

5
Example
  • P0 channel state: m0 (on the P1-to-P0 channel)
  • P1 channel state: m1 (on the P2-to-P1 channel)
  • P2 channel state: empty

6
Comparison of TS and CL Protocols
  • Similarities:
  • Both rely on control msgs to coordinate
    checkpointing
  • Both capture channel state in virtually the same
    way
  • Start logging channel state upon receiving the
    1st checkpoint msg from another channel
  • Stop logging a channel's state after receiving the
    checkpoint msg on that incoming channel
  • Communication overhead is similar

7
Comparison of TS and CL Protocols
  • Differences: strategies in producing a global
    checkpoint
  • The TS protocol suspends normal operation upon the
    1st checkpoint msg, while CL does not
  • The TS protocol captures channel state prior to
    taking a checkpoint, while CL captures channel
    state after taking a checkpoint
  • The TS protocol is more complete and robust than CL:
    it has a fault handling mechanism

8
Log Based Protocols
  • Work might be lost upon recovery using
    checkpoint-based protocols
  • By logging messages, we may be able to recover
    the system to where it was prior to the failure
  • System model: the execution of a process is
    modeled as a set of consecutive state intervals
  • Each interval is initiated by a nondeterministic
    event or by the initial state
  • We assume the only type of nondeterministic event
    is the receiving of a message

9
Log Based Protocols
  • In practice, logging is always used together with
    checkpointing
  • Limits the recovery time: start with the latest
    checkpoint instead of from the initial state
  • Limits the size of the log: after taking a
    checkpoint, previously logged events can be
    purged
  • Logging protocol types:
  • Pessimistic logging: msgs are logged prior to
    execution
  • Optimistic logging: msgs are logged
    asynchronously
  • Causal logging: nondeterministic events that are
    not yet logged (to stable storage) are piggybacked
    on each msg sent
  • For optimistic and causal logging, the dependencies
    among processes have to be tracked => more
    complexity, longer recovery time

10
Pessimistic Logging
  • Synchronously log every incoming message to
    stable storage prior to execution
  • Each process periodically checkpoints its state:
    no need for coordination
  • Recovery: a process restores its state using the
    last checkpoint and replays all logged incoming
    msgs
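
The logging and recovery steps above can be sketched in Python as follows. This is a minimal illustration, assuming a toy integer state and local files standing in for stable storage; the class name LoggedProcess and the file names are hypothetical.

```python
import json
import os
import tempfile


class LoggedProcess:
    """Toy process that logs each message synchronously before executing it."""

    def __init__(self, log_path, ckpt_path):
        self.log_path = log_path
        self.ckpt_path = ckpt_path
        self.state = 0

    def deliver(self, msg):
        # Pessimistic logging: force the message to disk first, then execute.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.execute(msg)

    def execute(self, msg):
        self.state += msg

    def checkpoint(self):
        # Save the state atomically, then purge the log entries it covers.
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)
        open(self.log_path, "w").close()

    @classmethod
    def recover(cls, log_path, ckpt_path):
        # Restore the last checkpoint, then replay every logged message.
        p = cls(log_path, ckpt_path)
        if os.path.exists(ckpt_path):
            with open(ckpt_path) as f:
                p.state = json.load(f)
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    p.execute(json.loads(line))
        return p


with tempfile.TemporaryDirectory() as d:
    log, ckpt = os.path.join(d, "msgs.log"), os.path.join(d, "state.ckpt")
    p = LoggedProcess(log, ckpt)
    p.deliver(1)
    p.deliver(2)
    p.checkpoint()
    p.deliver(3)
    p.deliver(4)
    # Simulate a crash: discard p and rebuild the state from disk alone.
    recovered = LoggedProcess.recover(log, ckpt)
```

Recovery here involves no other process, which matches the point that pessimistic logging needs no coordination.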

11
Pessimistic Logging Example
  • Pessimistic logging can cope with concurrent
    failures and the recovery of two or more processes

12
Benefits of Pessimistic Logging
  • Processes do not need to track their dependencies
  • Logging mechanism is easy to implement and less
    error prone
  • Output commit is automatically ensured
  • No need to carry out coordinated global
    checkpointing
  • By replaying the logged msgs, a process can
    always bring itself to be consistent with other
    processes
  • Recovery can be done completely locally
  • Only impact on other processes: duplicate msgs
    (can be discarded)

13
Pessimistic Logging Discussion
  • Reconnection:
  • A process must be able to cope with temporary
    connection failures and be ready to accept
    reconnections from other processes
  • Application logic should be made independent of
    transport-level events: use an event-based or
    document-based computing paradigm
  • Message duplicate detection:
  • Messages may be replayed during recovery =>
    duplicate messages
  • Transport-level duplicate detection is irrelevant;
    a mechanism must be added in application-level
    protocols, e.g., WS-ReliableMessaging
  • Atomic message receiving and logging:
  • A process may fail right after receiving a
    message, before it has a chance to log it to
    stable storage
  • Need an application-level reliable messaging
    mechanism

14
Application-Level Reliable Messaging
  • The sender buffers each message sent until it
    receives an application-level ack
  • Benefits of application-level reliable messaging:
  • Atomic message receiving and logging
  • Facilitates distributed system recovery from
    process failures: enables reconnection
  • Enables optimization: a message received can be
    executed immediately and the logging can be
    deferred until another message is to be sent
  • Logging and msg execution can be done
    concurrently
  • If a process sends out a message after receiving
    several msgs, logging of those msgs can be batched
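
A rough Python sketch of the buffering and deferred, batched logging described above. The class and method names are hypothetical, a list stands in for stable storage, and sending the application-level ack only after a message is logged is an assumption made here to keep receiving and logging atomic.

```python
class ReliableSender:
    """Buffers every message until an application-level ack arrives."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq -> payload

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return (seq, payload)

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def to_retransmit(self):
        # Everything still buffered can be resent after a reconnection.
        return sorted(self.unacked.items())


class DeferredLoggingReceiver:
    """Executes messages at once; logs them in a batch before sending."""

    def __init__(self):
        self.stable_log = []  # stand-in for stable storage
        self.pending = []     # executed but not yet logged
        self.state = 0

    def receive(self, seq, payload):
        self.state += payload  # execute immediately
        self.pending.append((seq, payload))

    def flush(self):
        """Log all pending messages as one batch; return the seqs to ack."""
        self.stable_log.extend(self.pending)
        acked = [seq for seq, _ in self.pending]
        self.pending.clear()
        return acked

    def send(self, payload):
        acks = self.flush()  # the log is written before any output leaves
        return acks, payload
```

If the receiver crashes before flush runs, the sender still holds the unacknowledged messages and can retransmit them, so nothing is lost.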

15
Sender Based Message Logging
  • Basic idea:
  • Log the message at the sending side in volatile
    memory
  • Should the receiving process fail, it can
    obtain the messages logged at the sending
    processes for recovery
  • To avoid restarting from the initial state after
    a failure, a process can periodically checkpoint
    its local state and asynchronously write the
    message log to stable storage (as part of the
    checkpoint)
  • Tradeoffs:
  • The relative ordering of messages must be
    explicitly supplied by the receiver to the sender
    (quite counter-intuitive!)
  • The receiver must wait for an explicit ack for
    the ordering message before it sends any msgs to
    other processes (however, it can execute the
    message received immediately without delay)
  • This mechanism prevents the formation of
    orphan messages and orphan processes

16
Orphan Message and Orphan Process
  • An orphan message is one that was sent by a
    process prior to a failure, but cannot be
    guaranteed to be regenerated upon the recovery of
    the process
  • An orphan process is a process that receives an
    orphan message
  • If a process sends out a message and subsequently
    fails before the determinants of the messages it
    has received are properly logged, the message
    sent becomes an orphan message

17
Sender Based Message Logging Protocol: Data Structures
  • A counter, seq_counter, used to assign a sequence
    number (using the current value of the counter)
    to each outgoing message
  • Needed for duplicate detection
  • A table for duplicate detection
  • Each entry has the form <process_id, max_seq>,
    where max_seq is the maximum sequence number that
    the current process has received from the process
    with identifier process_id
  • A message is deemed a duplicate if it carries
    a sequence number lower than or equal to max_seq
    for the corresponding process
  • Another counter, rsn_counter, used to record the
    receiving/execution order of incoming messages
  • The counter is initialized to 0 and incremented
    by one for each message received

18
Sender Based Message Logging Protocol: Data Structures
  • A message log (in volatile memory) for msgs sent
    by the process. In addition to the msg sent, the
    following metadata is also recorded:
  • Destination process id, receiver_id
  • Sending sequence number, seq
  • Receiving sequence number, rsn
  • A history list for the messages received since
    the last checkpoint. It is used to find the
    receiving order number (rsn) of a duplicate msg
  • Upon receiving a duplicate message, the process
    should supply the corresponding (original)
    receiving order number so that the sender of the
    message can log the ordering information
    properly
  • Each entry in the list has the following
    information:
  • Sending process id, sender_id
  • Sending sequence number, seq
  • Receiving sequence number, rsn (assigned by the
    current process)
19
What Should be Checkpointed?
  • All the data structures described above except
    the history list must be checkpointed together
    with the process state
  • The two counters, one for assigning the message
    sequence number and the other for assigning the
    message receiving order, are needed so that the
    process can continue doing so upon recovery using
    the checkpoint
  • The table for duplicate detection is needed for a
    similar reason.
  • Why must the message log be checkpointed?
  • The log is needed for the receiving processes to
    recover from a failure and hence cannot be
    garbage collected upon a checkpointing operation
  • An additional mechanism is necessary to ensure that
    the message log does not grow indefinitely

20
Sender Based Message Logging Protocol: Message Types
  • REGULAR: used for sending regular messages
    generated by the application process; it has
    the form <REGULAR, seq, rsn, m>
  • ORDER: used by the receiving process to notify
    the sending process of the receiving order of
    the message. An order message has the form
    <ORDER, m, rsn>
  • In order and ack messages, m is the message
    identifier, consisting of the tuple
    <sender_id, receiver_id, seq>
  • ACK: used by the sending process (of a
    regular message) to acknowledge the receipt of
    the order message. It has the form <ACK, m>

21
Sender Based Message Logging Protocol: Normal Operation
  • The protocol operates in three steps for each
    message:
  • A regular message, <REGULAR, seq, rsn, m>, is sent
    from one process, e.g., Pi, to another process,
    e.g., Pj
  • Process Pj determines the receiving/execution
    order, rsn, of the regular message and reports
    this determinant information to Pi in an order
    message <ORDER, m, rsn>
  • Process Pj waits until it has received the
    corresponding acknowledgment message, <ACK, m>,
    before it sends out any regular message
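
The three-step exchange can be sketched in Python as below. This is an illustration under assumptions, not the protocol's reference implementation: the class Proc and its method names are hypothetical, the rsn field of a first-time REGULAR message is set to None, and rsn_counter is incremented before its value is assigned.

```python
class Proc:
    """One process in the REGULAR / ORDER / ACK handshake (sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.seq_counter = 0         # next sending sequence number
        self.rsn_counter = 0         # receiving order counter
        self.message_log = {}        # (receiver_id, seq) -> [payload, rsn]
        self.unacked_orders = set()  # ids of msgs whose ORDER has no ACK yet

    def can_send(self):
        return not self.unacked_orders

    def send_regular(self, receiver_id, payload):
        # Step 1: log the message locally and send it.
        if not self.can_send():
            raise RuntimeError("must wait for ACK before sending")
        seq = self.seq_counter
        self.seq_counter += 1
        self.message_log[(receiver_id, seq)] = [payload, None]
        return ("REGULAR", seq, None, payload)

    def on_regular(self, sender_id, msg):
        # Step 2: assign the receiving order and report it to the sender.
        _, seq, _, _payload = msg
        self.rsn_counter += 1
        m = (sender_id, self.pid, seq)
        self.unacked_orders.add(m)
        return ("ORDER", m, self.rsn_counter)

    def on_order(self, msg):
        # The sender records the rsn in its log and acknowledges.
        _, m, rsn = msg
        _sender_id, receiver_id, seq = m
        self.message_log[(receiver_id, seq)][1] = rsn
        return ("ACK", m)

    def on_ack(self, msg):
        # Step 3: the receiver may send regular messages again.
        _, m = msg
        self.unacked_orders.discard(m)
```

Between on_regular and on_ack the receiver can still execute the message; only its outgoing regular messages are held back.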

22
(No Transcript)

23
Sender Based Message Logging Protocol: Recovery Mechanism
  • On recovering from a failure, a process first
    restores its state using the latest local
    checkpoint, and then it must broadcast a request
    to all other processes in the system to
    retransmit all their logged messages that were
    sent to the process
  • The recovering process also retransmits the regular
    messages or the ack messages recorded in its own
    restored message log, based on the following rule:
  • If the entry in the log for a message contains no
    rsn value, then the regular message is
    retransmitted because the intended receiving
    process might not have received it
  • If the entry in the log for a message contains a
    valid rsn value, then an ack message is sent so
    that the receiving process can resume sending
    regular messages
  • When a process receives a regular message, it
    always sends a corresponding order message in
    response
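
The retransmission rule can be written as a small Python function. This is a sketch only: the function name and the dictionary layout of a log entry are assumptions, and the rsn slot of a retransmitted REGULAR message is set to None because no rsn is known.

```python
def retransmissions(sender_id, message_log):
    """Return (destination, message) pairs a recovering process resends.

    Each log entry is assumed to be a dict with the keys
    receiver_id, seq, payload, and rsn (None when no rsn was recorded).
    """
    out = []
    for entry in message_log:
        m = (sender_id, entry["receiver_id"], entry["seq"])
        if entry["rsn"] is None:
            # No rsn logged: the receiver might never have seen the message.
            msg = ("REGULAR", entry["seq"], None, entry["payload"])
        else:
            # rsn logged: only the ack may be missing at the receiver.
            msg = ("ACK", m)
        out.append((entry["receiver_id"], msg))
    return out
```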

24
Actions upon Receiving a Regular Message
  • A process always sends a corresponding order msg
    in response
  • Three scenarios with recovery:
  • The msg is not a duplicate: the current rsn
    counter value is assigned to the msg and the
    order msg is sent. The process must wait until it
    receives the ack msg before it can send any
    regular msg
  • The msg is a duplicate, and the corresponding rsn
    is found in the history list: the actions are
    identical to the above, except that the rsn is
    not newly assigned
  • The msg is a duplicate, and no rsn is found in
    the history list: the process must have
    checkpointed its state after receiving the msg,
    so the msg is no longer needed for recovery.
    Hence, the order msg includes a special constant
    indicating so. The sender can then purge the msg
    from its log
  • The recovering process may receive two types of
    retransmitted regular messages:
  • Those with a valid rsn value: the rsn must
    already be part of the checkpoint. The process
    executes the msg according to that order
  • Those without: the process can assign the msg any
    order
25
Limitations of Sender Based Msg Logging Protocol
  • Won't work in the presence of 2 or more
    concurrent failures
  • The determinants of some regular msgs (i.e., the
    rsn) might be lost => orphan processes and
    cascading rollbacks

P2 may become an orphan process if P0 and P1 both
crash: P2 has then received mt, a message that no
one has sent

26
Truncating the Sender's Message Log
  • Once a process completes a local checkpoint, it
    broadcasts a message containing the highest rsn
    value for the messages that it has executed prior
    to the checkpoint
  • All messages sent by other processes to this
    process that were assigned an rsn value smaller
    than or equal to this value can now be purged
    from their message logs (including those in
    stable storage as part of a checkpoint)
  • Alternatively, this highest rsn value can be
    piggybacked with each message (regular or control
    messages) sent to another process to enable
    asynchronous purging of the logged messages that
    are no longer needed
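
A short Python sketch of the purge step on the sender side. The function name and the dictionary layout of a log entry are assumptions for illustration.

```python
def purge_log(message_log, receiver_id, highest_rsn):
    """Drop entries the receiver no longer needs after its checkpoint.

    An entry is purged only if it was sent to receiver_id and carries
    an rsn that is less than or equal to highest_rsn. Entries with no
    rsn yet (None) are always kept.
    """
    return [
        e for e in message_log
        if not (
            e["receiver_id"] == receiver_id
            and e["rsn"] is not None
            and e["rsn"] <= highest_rsn
        )
    ]
```

The same function serves the piggybacking variant: the sender calls it whenever an incoming message carries a newer highest rsn value.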